{"id":1591,"date":"2022-02-12T10:50:35","date_gmt":"2022-02-12T15:50:35","guid":{"rendered":"http:\/\/kaichen.work\/?p=1591"},"modified":"2023-05-11T16:14:03","modified_gmt":"2023-05-11T20:14:03","slug":"link-factset-and-crsp","status":"publish","type":"post","link":"https:\/\/www.kaichen.work\/?p=1591","title":{"rendered":"Link FactSet and CRSP"},"content":{"rendered":"<p>Both FactSet and CRSP offer identifier files that contain primary identifiers at the entity level and security level (note: an entity may issue multiple securities). These files provide a mapping between their primary identifiers and all other historical identifiers such as CUSIP and ticker. By utilizing these identifiers, we can construct a link table.<\/p>\n<h4><strong>Relevant CRSP Variables<\/strong><\/h4>\n<p>The CRSP STOCKNAMES file provides a mapping between CRSP permanent identifiers and all historical CUSIPs, company names, and exchange tickers, along with their effective date ranges.<\/p>\n<ul>\n<li><code>PERMCO<\/code> is the unique identifier for CRSP entities. <code>PERMCO<\/code> remains constant over time, even if the company changes its name, CUSIP, or exchange ticker.<\/li>\n<li><code>PERMNO<\/code> is the unique identifier for CRSP securities. <code>PERMNO<\/code> also does not change over time too.<\/li>\n<li><code>COMNAM<\/code> represents the company name in CRSP at a specific point in time.<\/li>\n<li><code>NCUSIP<\/code> is the 8-digit historical CUSIP for CRSP securities. A change in CUSIP can occur due to various reasons, including non-fundamental events like splits and company name changes. CRSP also has a header variable named <code>CUSIP<\/code> that reflects only the most recent CUSIP for the entire time series.<\/li>\n<li><code>NAMEDT<\/code> and <code>NAMEENDDT<\/code> represent the first and last effective dates of each name structure, which includes CUSIP, company name, exchange code, exchange ticker, share class, and SIC code. Each <code>PERMNO<\/code> has at least one name structure in the STOCKNAMES file.<\/li>\n<\/ul>\n<h4><strong>Relevant FactSet Variables<\/strong><\/h4>\n<p>The FactSet SYM_V1_SYM_CUSIP_HIST file provides a mapping between FactSet permanent identifiers and all historical CUSIPs, along with their effective date ranges.<\/p>\n<ul>\n<li><code>FSYM_ID<\/code> is the unique identifier for FactSet securities (similar to <code>PERMNO<\/code> in CRSP). <code>FSYM_ID<\/code> remains unchanged over time. In another identifier file, ENT_V1_ENT_SCR_SEC_ENTITY_HIST, FactSet provides a mapping between FactSet securities and their issuing entities, uniquely identified by <code>FACTSET_ENTITY_ID<\/code>, which, like <code>PERMCO<\/code> in CRSP, does not change over time.<\/li>\n<li><code>CUSIP<\/code> is the 9-digit historical CUSIP for FactSet securities. FactSet also has a header variable named <code>MOST_RECENT<\/code> that reflects only the most recent CUSIP for the entire time series.<\/li>\n<li><code>START_DATE<\/code> and <code>END_DATE<\/code> represent the first and last effective dates of each name structure.<\/li>\n<\/ul>\n<h4><strong>Linking FactSet FSYM_ID and CRSP PERMNO using CUSIP<\/strong><\/h4>\n<p>Both FactSet <code>FSYM_ID<\/code> and CRSP <code>PERMNO<\/code> are used to uniquely identify securities. We can link the two using the associated historical CUSIPs, which serve as a common identifier in both FactSet and CRSP.<\/p>\n<p>The following two-part code generates a link table that provides a mapping between FactSet <code>FSYM_ID<\/code> and CRSP <code>PERMNO<\/code>, along with their effective date ranges (<code>START_DATE<\/code> and <code>END_DATE<\/code>). The first part uses SAS, and the second part uses Stata. An SAS expert should be able to complete the task entirely in SAS. I use Stata for the second part simply because a handy command (<code>tsspell<\/code>) is available in Stata.<\/p>\n<h5>First-part SAS code:<\/h5>\n<pre class=\"lang:sas decode:true\">proc sort data=sym_v1_sym_cusip_hist out=fs (keep=fsym_id cusip start_date end_date);\r\nwhere not(missing(cusip));\r\nby fsym_id cusip start_date;\r\nrun;\r\n\r\nproc sort data=stocknames out=crsp1 (keep=permno ncusip comnam namedt nameenddt);\r\nwhere not missing(ncusip);\r\nby permno ncusip namedt;\r\nrun;\r\n\r\ndata crsp2;\r\nset crsp1;\r\ndo i=0 to (nameenddt-namedt);\r\ndate=namedt+i;\r\noutput;\r\nend;\r\nformat namedt nameenddt date date9.;\r\ndrop i;\r\nrun;\r\n\r\nproc sql;\r\ncreate table linktable_temp as\r\nselect distinct a.permno, a.date, b.fsym_id\r\nfrom crsp2 a, fs b\r\nwhere a.ncusip=substr(b.cusip,1,8) and b.start_date&lt;=a.date&lt;=coalesce(b.end_date,today());\r\nquit;\r\n\r\nproc export data= linktable_temp\r\noutfile= \"C:\/linktable_temp\"\r\ndbms= dta replace;\r\nrun;\r\n\r\n<\/pre>\n<h5>Second-part Stata code:<\/h5>\n<pre class=\"lang:default decode:true\">use linktable_temp, clear\r\n\r\negen pid=group(permno fsym_id)\r\n\r\ntsset pid date\r\n\r\ntsspell pid\r\n\r\nbysort pid _spell: egen start_date=min(date)\r\nbysort pid _spell: egen end_date=max(date)\r\n\r\nformat start_date end_date %td\r\n\r\nduplicates drop pid _spell, force\r\n\r\nkeep permno fsym_id start_date end_date\r\n\r\nsaveold linktable, replace v(12)\r\n\r\n<\/pre>\n<p>Please set up the <code>LIBNAME<\/code> and directories in SAS and Stata to properly reference the relevant datasets. In Stata, use the command <code class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">ssc install tsspell, replace<\/code> to install the third-party command.<\/p>\n<p>Please note that in the link table, the <code>END_DATE<\/code> column represents the most recent update date of the CRSP STOCKNAMES file (rather than a missing value) if the link is still active. Therefore, it is important to use the most recent CRSP STOCKNAMES file to avoid any unexpected error.<\/p>\n<p>I find that the link table only covers a small portion of the FactSet universe, as CRSP only keeps historical CUSIP data for securities traded in major exchanges, while FactSet extracts data from a much large security universe. Specifically, only 36,560 <code>FSYM_ID<\/code>s can be linked to a <code>PERMNO<\/code>, whereas there are 310,234 <code>FSYM_ID<\/code>s in FactSet with <code>FACTSET_ENTITY_ID<\/code>s.<\/p>\n<p>I would like to express my gratitude to two WRDS articles: <a href=\"https:\/\/wrds-www.wharton.upenn.edu\/pages\/support\/applications\/linking-databases\/linking-crsp-and-compustat\/\" target=\"_blank\" rel=\"noopener\">Merging CRSP and COMPUSTAT Data<\/a> and <a href=\"https:\/\/wrds-www.wharton.upenn.edu\/pages\/support\/applications\/linking-databases\/linking-ibes-and-crsp-data\/\" target=\"_blank\" rel=\"noopener\">Linking IBES and CRSP Data<\/a>\u00a0for their valuable insights.<\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Both FactSet and CRSP offer identifier files that contain primary identifiers at the entity level and security level (note: an entity may issue multiple securities). These files provide a mapping between their primary identifiers and all other historical identifiers such &hellip; <a href=\"https:\/\/www.kaichen.work\/?p=1591\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[7],"tags":[14],"_links":{"self":[{"href":"https:\/\/www.kaichen.work\/index.php?rest_route=\/wp\/v2\/posts\/1591"}],"collection":[{"href":"https:\/\/www.kaichen.work\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.kaichen.work\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.kaichen.work\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.kaichen.work\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1591"}],"version-history":[{"count":29,"href":"https:\/\/www.kaichen.work\/index.php?rest_route=\/wp\/v2\/posts\/1591\/revisions"}],"predecessor-version":[{"id":1949,"href":"https:\/\/www.kaichen.work\/index.php?rest_route=\/wp\/v2\/posts\/1591\/revisions\/1949"}],"wp:attachment":[{"href":"https:\/\/www.kaichen.work\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1591"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.kaichen.work\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1591"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.kaichen.work\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1591"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}