{"id":138,"date":"2015-03-14T16:29:06","date_gmt":"2015-03-14T20:29:06","guid":{"rendered":"http:\/\/csclub.uwaterloo.ca\/~k55chen\/?p=138"},"modified":"2017-11-10T23:18:57","modified_gmt":"2017-11-11T04:18:57","slug":"look-into-crspcompustat-link-table","status":"publish","type":"post","link":"https:\/\/www.kaichen.work\/?p=138","title":{"rendered":"Look into CRSP\/Compustat link table"},"content":{"rendered":"<p>The link history table (CCMXPF_LNKHIST) is the primary table used for WRDS CCM web queries. In this post, I explain this table in detail.<\/p>\n<p><strong>Background<\/strong><\/p>\n<p>We know that a company may issue multiple securities, one of which is considered primary for the company. On CRSP, this suggests that one PERMCO (the company-lever identifier) may have multiple PERMNOs (the security-level identifier). Because CRSP collects security-level data such as price and trading volume, we should consider\u00a0CRSP as breaking down to the security level.<\/p>\n<p>It is well known that Compustat provides financial statement data of a company. The micro unit on Compustat is each and every company. However, it is less known that Compustat also provides security data. In addition, because the coverage of Compustat is more extensive than that of CRSP, Compustat contains addtional security data that are unavailable on CRSP.<\/p>\n<p>Historically, Compustat included only one (primary) security per company. Since mid-April in 2007, all securities issued by a company are available on Compustat with a new identifier, IID, which is used along with GVKEY to identify all securities tracked by Compustat. A marker item, PRIMISS, indicates whether a security is primary or secondary. Therefore, like PERMCO on CRSP, one GVKEY may have multiple IIDs currently.<\/p>\n<p>Let me summarize the identifiers used by Compustat and CRSP:<\/p>\n\n<table id=\"tablepress-1\" class=\"tablepress tablepress-id-1\">\n<thead>\n<tr class=\"row-1\">\n\t<th class=\"column-1\">Database<\/th><th class=\"column-2\">Identifier<\/th><th class=\"column-3\">Description<\/th>\n<\/tr>\n<\/thead>\n<tbody class=\"row-striping\">\n<tr class=\"row-2\">\n\t<td class=\"column-1\">Compustat<\/td><td class=\"column-2\">GVKEY<\/td><td class=\"column-3\">Compustat\u2019s permanent company identifier.<\/td>\n<\/tr>\n<tr class=\"row-3\">\n\t<td class=\"column-1\">Compustat<\/td><td class=\"column-2\">IID<\/td><td class=\"column-3\">Compustat\u2019s permanent security identifier. An identifying relationship exists between IID and GVKEY. Both must be accessed as a pair to properly identify a Compustat security. One GVKEY may have multiple IIDs.<\/td>\n<\/tr>\n<tr class=\"row-4\">\n\t<td class=\"column-1\">Compustat<\/td><td class=\"column-2\">PRIMISS<\/td><td class=\"column-3\">This item indicates whether a security is primary (P) or secondary (J). P identifies a primary security with the highest average trading volume over a period of time. J identifies a joiner (secondary) security.<\/td>\n<\/tr>\n<tr class=\"row-5\">\n\t<td class=\"column-1\">CRSP<\/td><td class=\"column-2\">PERMCO<\/td><td class=\"column-3\">CRSP\u2019s permanent company identifier.<\/td>\n<\/tr>\n<tr class=\"row-6\">\n\t<td class=\"column-1\">CRSP<\/td><td class=\"column-2\">PERMNO<\/td><td class=\"column-3\">CRSP\u2019s permanent security identifier. One PERMNO belongs to only one PERMCO. One PERMCO may have one or more PERMNOs.<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<!-- #tablepress-1 from cache -->\n<p>The last piece of background information is that the link between CRSP and Compustat (at both company level and security level) may change over time.<\/p>\n<p><strong>The linking process<\/strong><\/p>\n<p>Prior to the introduction of IID, Compustat included only one (primary) security per company. The link between CRSP and Compustat was between CRSP PERMNO and Compustat GVKEY. Because PERMNO is a security identifier and GVKEY is a company identifier, this link may be a many-to-one relationship, i.e., multiple PERMNOs may be linked to a single GVKEY.<\/p>\n<p>Because Compustat security-level information is now available, CRSP started to build security-level links in April 2007.<\/p>\n<p><strong>The linking history table<\/strong><\/p>\n<p>The main product of CRSP&#8217;s laborious linking efforts is the link history table. This table is Compustat-centric, that is, this table is organized and identified by Compustat identifiers which are then linked to CRSP identifiers. All Compustat records are retained, regardless of whether or not the securities (defined by GVKEY-IID) are in the CRSP universe.The following is a slice of the table (Please note that IID, PERMCO, and PERMNO have the prefix &#8220;L&#8221; in the link history table.):<\/p>\n\n<table id=\"tablepress-2\" class=\"tablepress tablepress-id-2\">\n<thead>\n<tr class=\"row-1\">\n\t<th class=\"column-1\">GVKEY<\/th><th class=\"column-2\">LINKPRIM<\/th><th class=\"column-3\">LIID<\/th><th class=\"column-4\">LINKTYPE<\/th><th class=\"column-5\">LPERMNO<\/th><th class=\"column-6\">LPERMCO<\/th><th class=\"column-7\">LINKDT<\/th><th class=\"column-8\">LINKENDDT<\/th>\n<\/tr>\n<\/thead>\n<tbody class=\"row-striping\">\n<tr class=\"row-2\">\n\t<td class=\"column-1\">COMPUSTAT global company key<\/td><td class=\"column-2\">Primary link marker<\/td><td class=\"column-3\">Security-level identifier<\/td><td class=\"column-4\">Link type code<\/td><td class=\"column-5\">Historical CRSP PERMNO link to COMPUSTAT record<\/td><td class=\"column-6\">Historical CRSP PERMCO link to COMPUSTAT record<\/td><td class=\"column-7\">First effective date of link<\/td><td class=\"column-8\">Last effective date of link<\/td>\n<\/tr>\n<tr class=\"row-3\">\n\t<td class=\"column-1\">10411<\/td><td class=\"column-2\">P<\/td><td class=\"column-3\">1<\/td><td class=\"column-4\">LC<\/td><td class=\"column-5\">63773<\/td><td class=\"column-6\">5230<\/td><td class=\"column-7\">19811215<\/td><td class=\"column-8\">.E<\/td>\n<\/tr>\n<tr class=\"row-4\">\n\t<td class=\"column-1\">10411<\/td><td class=\"column-2\">P<\/td><td class=\"column-3\">1<\/td><td class=\"column-4\">NR<\/td><td class=\"column-5\"><\/td><td class=\"column-6\"><\/td><td class=\"column-7\">19741129<\/td><td class=\"column-8\">19811214<\/td>\n<\/tr>\n<tr class=\"row-5\">\n\t<td class=\"column-1\">10411<\/td><td class=\"column-2\">J<\/td><td class=\"column-3\">7<\/td><td class=\"column-4\">LC<\/td><td class=\"column-5\">90655<\/td><td class=\"column-6\">5230<\/td><td class=\"column-7\">20050516<\/td><td class=\"column-8\">20120131<\/td>\n<\/tr>\n<tr class=\"row-6\">\n\t<td class=\"column-1\">10411<\/td><td class=\"column-2\">J<\/td><td class=\"column-3\">6<\/td><td class=\"column-4\">NR<\/td><td class=\"column-5\"><\/td><td class=\"column-6\"><\/td><td class=\"column-7\">20050429<\/td><td class=\"column-8\">20060131<\/td>\n<\/tr>\n<tr class=\"row-7\">\n\t<td class=\"column-1\">10411<\/td><td class=\"column-2\">J<\/td><td class=\"column-3\">2<\/td><td class=\"column-4\">NR<\/td><td class=\"column-5\"><\/td><td class=\"column-6\"><\/td><td class=\"column-7\">19940331<\/td><td class=\"column-8\">.E<\/td>\n<\/tr>\n<tr class=\"row-8\">\n\t<td class=\"column-1\">10411<\/td><td class=\"column-2\">J<\/td><td class=\"column-3\">4<\/td><td class=\"column-4\">NR<\/td><td class=\"column-5\"><\/td><td class=\"column-6\"><\/td><td class=\"column-7\">20020131<\/td><td class=\"column-8\">20060131<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<!-- #tablepress-2 from cache -->\n<p>LIID is based on Compustat&#8217;s IID. Because Compustat&#8217;s company data range extends earlier than its security data range, there are time periods during which no IID is assigned by Compustat for a GVKEY. In these cases, CRSP assigns a dummy IID ending in \u201cX\u201d as a placeholder in the link table. This GVKEY-dummy IID may or may not be linked to a CRSP PERMNO.<\/p>\n<p>LINKPRIM is a marker item that indicates whether a GVKEY-LIID is a primary security. This marker is based on Compustat&#8217;s Primary\/Joiner flag (PRIMISS). However, due to missing primary issue markers from Compustat for early history, calendar ranges of overlapping, and different treatment for US and Canadian security issues, CRSP overides Compustat&#8217;s primary issue marker in many cases. The purpose is to produce one primary security throughout the company history. &#8220;P&#8221; represents the primary security issue identified by Compustat, while &#8220;C&#8221; represents the primary security issue identified or overridden by CRSP. In most applications, we only need the primary security.<\/p>\n<p>Another important item is LINKTYPE. In short, LC and LU are considered as the most accurate links. They are also the default link types used for WRDS CCM web queries. LX and LD are considered as &#8220;soft&#8221; links of low accuracy. Old merging sample codes also include LS in addition to LC and LU. But by definition below, I do not think that LS should be included.<\/p>\n<p>LINKDT and LINKENDDT are straightforward. They mark the period during which the link is valid.<\/p>\n<p>Please see the detailed description of each item:<\/p>\n\n<table id=\"tablepress-3\" class=\"tablepress tablepress-id-3\">\n<thead>\n<tr class=\"row-1\">\n\t<th class=\"column-1\">ITEM NAME<\/th><th class=\"column-2\">TYPE<\/th><th class=\"column-3\">DESCRIPTION<\/th>\n<\/tr>\n<\/thead>\n<tbody class=\"row-striping\">\n<tr class=\"row-2\">\n\t<td class=\"column-1\">GVKEY<\/td><td class=\"column-2\">integer, primary key (1)<\/td><td class=\"column-3\">Compustat GVKEY<\/td>\n<\/tr>\n<tr class=\"row-3\">\n\t<td class=\"column-1\">LIID<\/td><td class=\"column-2\">char(3), primary key (2)<\/td><td class=\"column-3\">Compustat IID. A dummy IID with an \u201cX\u201d suffix is assigned by CRSP as a placeholder if no IID is assigned by Compustat for a GVKEY in early history.<\/td>\n<\/tr>\n<tr class=\"row-4\">\n\t<td class=\"column-1\">LINKDT<\/td><td class=\"column-2\">integer (date), primary key (3)<\/td><td class=\"column-3\">First effective calendar date of link record range<\/td>\n<\/tr>\n<tr class=\"row-5\">\n\t<td class=\"column-1\">LINKENDDT<\/td><td class=\"column-2\">integer (date)<\/td><td class=\"column-3\">Last effective calendar date of link record range<\/td>\n<\/tr>\n<tr class=\"row-6\">\n\t<td class=\"column-1\">LPERMNO<\/td><td class=\"column-2\">integer<\/td><td class=\"column-3\">Linked CRSP PERMNO, 0 if no CRSP security link exists<\/td>\n<\/tr>\n<tr class=\"row-7\">\n\t<td class=\"column-1\">LPERMCO<\/td><td class=\"column-2\">integer<\/td><td class=\"column-3\">Linked CRSP PERMCO, 0 if no CRSP company link exists<\/td>\n<\/tr>\n<tr class=\"row-8\">\n\t<td class=\"column-1\">LINKPRIM<\/td><td class=\"column-2\">char(3)<\/td><td class=\"column-3\">Primary issue marker for the link. This marker is based on Compustat Primary\/Joiner flag (PRIMISS), but may be overridden by CRSP in some cases. Values are:<br \/>\n<br \/>\nP \u2013 Primary, identified by Compustat in monthly security data.<br \/>\n<br \/>\nJ \u2013 Joiner secondary issue of a company, identified by Compustat in monthly security data.<br \/>\n<br \/>\nC \u2013 Primary, assigned by CRSP to resolve ranges of overlapping or missing primary markers from Compustat in order to produce one primary security throughout the company history.<br \/>\n<br \/>\nN \u2013 Secondary, assigned by CRSP to override Compustat. Compustat allows a US security and a Canadian security issued by the same company to both be marked as Primary at the same time. For purposes of the link, CRSP allows only one primary at a time and marks the others as N.<\/td>\n<\/tr>\n<tr class=\"row-9\">\n\t<td class=\"column-1\">LINKTYPE<\/td><td class=\"column-2\">char(3)<\/td><td class=\"column-3\">Link type code. Each link is given a code describing the connection between the CRSP and Compustat data. Values are:<br \/>\n<br \/>\nLC \u2013 Link research complete. Standard connection between databases.<br \/>\n<br \/>\nLU \u2013 Unresearched link to issue by CUSIP<br \/>\n<br \/>\nLX \u2013 Link to a security that trades on another exchange system not included in CRSP data.<br \/>\n<br \/>\nLD \u2013 Duplicate Link to a security. Another GVKEY\/IID is a better link to that CRSP record.<br \/>\n<br \/>\nLN \u2013 Primary link exists but Compustat does not have prices.<br \/>\n<br \/>\nLS \u2013 Link valid for this security only. Other CRSP PERMNOs with the same PERMCO will link to other GVKEYs.<br \/>\n<br \/>\nNR \u2013 No link available, confirmed by research<br \/>\n<br \/>\nNU \u2013 No link available, not yet confirmed<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<!-- #tablepress-3 from cache -->\n<p>I download the link history table as of January 30, 2015. I delete records without a link found (about 67% of all records; remember the link history table is Compustat-centric and the Compustat universe is bigger than the CRSP universe). For remaining records with a link found, I present the following statistics to give you a sense of the values of LINKPRIM, LINKTYPE, and LIID. As you can see, the vast majority of the primary issue marker is identified by Compustat, and &#8220;LC&#8221; and &#8220;LU&#8221; types of links constitute about 90% of all identified\u00a0links.<\/p>\n<pre class=\"theme:github font:courier-new nums:false lang:default highlight:0 decode:true \">. tab linkprim\r\n\r\n    Primary |\r\nLink Marker |      Freq.     Percent        Cum.\r\n------------+-----------------------------------\r\n          C |      7,345       23.41       23.41\r\n          J |        359        1.14       24.55\r\n          N |        907        2.89       27.45\r\n          P |     22,764       72.55      100.00\r\n------------+-----------------------------------\r\n      Total |     31,375      100.00\r\n\r\n. tab linktype\r\n\r\n  Link Type |\r\n       Codd |      Freq.     Percent        Cum.\r\n------------+-----------------------------------\r\n         LC |     11,524       36.73       36.73\r\n         LD |        141        0.45       37.18\r\n         LN |        274        0.87       38.05\r\n         LS |      1,845        5.88       43.93\r\n         LU |     16,622       52.98       96.91\r\n         LX |        969        3.09      100.00\r\n------------+-----------------------------------\r\n      Total |     31,375      100.00\r\n\r\n. tab liid\r\n\r\nSecurity-le |\r\n        vel |\r\n Identifier |      Freq.     Percent        Cum.\r\n------------+-----------------------------------\r\n        00X |      2,369        7.55        7.55\r\n         01 |     25,645       81.74       89.29\r\n        01C |        897        2.86       92.15\r\n         02 |        961        3.06       95.21\r\n        02C |         36        0.11       95.32\r\n         03 |        209        0.67       95.99\r\n        03C |         21        0.07       96.06\r\n         04 |         62        0.20       96.25\r\n        04C |          3        0.01       96.26\r\n         05 |         23        0.07       96.34\r\n        05C |          1        0.00       96.34\r\n         06 |          7        0.02       96.36\r\n        06C |          2        0.01       96.37\r\n         07 |          4        0.01       96.38\r\n        07C |          1        0.00       96.39\r\n         08 |          2        0.01       96.39\r\n         09 |          1        0.00       96.40\r\n         10 |          2        0.01       96.40\r\n         19 |          1        0.00       96.40\r\n        19C |          1        0.00       96.41\r\n         90 |        929        2.96       99.37\r\n        90C |          7        0.02       99.39\r\n         91 |         72        0.23       99.62\r\n         92 |         14        0.04       99.67\r\n         93 |          9        0.03       99.69\r\n         95 |          2        0.01       99.70\r\n         96 |          1        0.00       99.70\r\n         99 |          2        0.01       99.71\r\n        99X |         91        0.29      100.00\r\n------------+-----------------------------------\r\n      Total |     31,375      100.00<\/pre>\n<p><span style=\"color: #000000;\"><b>The merging code<\/b><\/span><\/p>\n<p>You may notice the following announcement on the CCM product:<\/p>\n<blockquote><p>As of the February 2014 release, USEDFLAG is no longer used in the WRDS CCM web queries.\u00a0 Please select LINKTYPES LC, LU, and LS for the same results.\u00a0These represent the vast majority of the links between CRSP securities and Compustat companies, without introducing duplicate data.<\/p>\n<p>The WRDS-created linking dataset (ccmxpf_linktable) has been deprecated. It will continue to be created for a transition period of 1 year. SAS programmers should use the Link History dataset (ccmxpf_lnkhist) from CRSP.<\/p><\/blockquote>\n<p>This suggests that many old merging codes should be updated accordingly. Based on the explanation above, the most important query filters are LINKPRIM, LINKTYPE, LINKDT and LINKENDDT. LINKPRIM is used to select only primary security. LINKTYPE is used to ensure accuracy. LINKDT and LINKENDDT are used to ensure validity of a link at a give time. I believe that the following code is better than most sample codes I have seen:<\/p>\n<pre class=\"lang:sas decode:true \">\/************************************************************************************\r\n* STEP ONE: Extract Compustat data;\r\n************************************************************************************\/\r\n\r\ndata comp1;\r\n  set comp.funda;\r\n  where fyear&gt;=1986\r\n    and indfmt='INDL' and datafmt='STD' and popsrc='D' and consol='C';\r\n  keep gvkey datadate fyear;\r\nrun;\r\n \r\n\/*******************************************************************************************\r\n* STEP TWO: Link GVKEYS to CRSP identifiers;                                               *\r\n* Use CCMXPF_LNKHIST table to obtain CRSP identifiers                                      *\r\n********************************************************************************************\/\r\n\r\nproc sql;\r\n  create table comp2 as\r\n  select a.*, b.lpermno as permno\r\n  from comp1 as a, crsp.ccmxpf_lnkhist as b\r\n  where a.gvkey=b.gvkey and b.linkprim in ('P', 'C') and\r\n  b.LINKTYPE in ('LU', 'LC') and\r\n  a.datadate &gt;= b.LINKDT and (a.datadate &lt;= b.LINKENDDT or missing(b.LINKENDDT))\r\n  order by gvkey, datadate;\r\nquit;<\/pre>\n<p>&nbsp;<\/p>\n<p>Finally, I acknowledge that the information mainly comes from the official\u00a0<a href=\"http:\/\/www.crsp.com\/files\/ccm_data_guide_0.pdf\" target=\"_blank\" rel=\"noopener\">CRSP\/Compustat Merged Database Guide<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The link history table (CCMXPF_LNKHIST) is the primary table used for WRDS CCM web queries. In this post, I explain this table in detail. Background We know that a company may issue multiple securities, one of which is considered primary &hellip; <a href=\"https:\/\/www.kaichen.work\/?p=138\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[11,8],"tags":[],"_links":{"self":[{"href":"https:\/\/www.kaichen.work\/index.php?rest_route=\/wp\/v2\/posts\/138"}],"collection":[{"href":"https:\/\/www.kaichen.work\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.kaichen.work\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.kaichen.work\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.kaichen.work\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=138"}],"version-history":[{"count":89,"href":"https:\/\/www.kaichen.work\/index.php?rest_route=\/wp\/v2\/posts\/138\/revisions"}],"predecessor-version":[{"id":987,"href":"https:\/\/www.kaichen.work\/index.php?rest_route=\/wp\/v2\/posts\/138\/revisions\/987"}],"wp:attachment":[{"href":"https:\/\/www.kaichen.work\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=138"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.kaichen.work\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=138"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.kaichen.work\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=138"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}