Link FactSet and CRSP

Both FactSet and CRSP offer identifier files that contain primary identifiers at the entity level and security level (note: an entity may issue multiple securities). These files provide a mapping between their primary identifiers and all other historical identifiers such as CUSIP and ticker. By utilizing these identifiers, we can construct a link table.

Relevant CRSP Variables

The CRSP STOCKNAMES file provides a mapping between CRSP permanent identifiers and all historical CUSIPs, company names, and exchange tickers, along with their effective date ranges.

  • PERMCO is the unique identifier for CRSP entities. PERMCO remains constant over time, even if the company changes its name, CUSIP, or exchange ticker.
  • PERMNO is the unique identifier for CRSP securities. PERMNO also does not change over time too.
  • COMNAM represents the company name in CRSP at a specific point in time.
  • NCUSIP is the 8-digit historical CUSIP for CRSP securities. A change in CUSIP can occur due to various reasons, including non-fundamental events like splits and company name changes. CRSP also has a header variable named CUSIP that reflects only the most recent CUSIP for the entire time series.
  • NAMEDT and NAMEENDDT represent the first and last effective dates of each name structure, which includes CUSIP, company name, exchange code, exchange ticker, share class, and SIC code. Each PERMNO has at least one name structure in the STOCKNAMES file.

Relevant FactSet Variables

The FactSet SYM_V1_SYM_CUSIP_HIST file provides a mapping between FactSet permanent identifiers and all historical CUSIPs, along with their effective date ranges.

  • FSYM_ID is the unique identifier for FactSet securities (similar to PERMNO in CRSP). FSYM_ID remains unchanged over time. In another identifier file, ENT_V1_ENT_SCR_SEC_ENTITY_HIST, FactSet provides a mapping between FactSet securities and their issuing entities, uniquely identified by FACTSET_ENTITY_ID, which, like PERMCO in CRSP, does not change over time.
  • CUSIP is the 9-digit historical CUSIP for FactSet securities. FactSet also has a header variable named MOST_RECENT that reflects only the most recent CUSIP for the entire time series.
  • START_DATE and END_DATE represent the first and last effective dates of each name structure.

Linking FactSet FSYM_ID and CRSP PERMNO using CUSIP

Both FactSet FSYM_ID and CRSP PERMNO are used to uniquely identify securities. We can link the two using the associated historical CUSIPs, which serve as a common identifier in both FactSet and CRSP.

The following two-part code generates a link table that provides a mapping between FactSet FSYM_ID and CRSP PERMNO, along with their effective date ranges (START_DATE and END_DATE). The first part uses SAS, and the second part uses Stata. An SAS expert should be able to complete the task entirely in SAS. I use Stata for the second part simply because a handy command (tsspell) is available in Stata.

First-part SAS code:

Second-part Stata code:

Please set up the LIBNAME and directories in SAS and Stata to properly reference the relevant datasets. In Stata, use the command ssc install tsspell, replace to install the third-party command.

Please note that in the link table, the END_DATE column represents the most recent update date of the CRSP STOCKNAMES file (rather than a missing value) if the link is still active. Therefore, it is important to use the most recent CRSP STOCKNAMES file to avoid any unexpected error.

I find that the link table only covers a small portion of the FactSet universe, as CRSP only keeps historical CUSIP data for securities traded in major exchanges, while FactSet extracts data from a much large security universe. Specifically, only 36,560 FSYM_IDs can be linked to a PERMNO, whereas there are 310,234 FSYM_IDs in FactSet with FACTSET_ENTITY_IDs.

I would like to express my gratitude to two WRDS articles: Merging CRSP and COMPUSTAT Data and Linking IBES and CRSP Data for their valuable insights.

 

This entry was posted in Data and tagged . Bookmark the permalink.

7 Responses to Link FactSet and CRSP

  1. Bree Neal says:

    Super useful code! Thanks a lot. Do you know how to get which Factset entity id is which institutional investor?

  2. curious_student says:

    Professor,
    Thank you for generously sharing your knowledge, expertise, and insights.

    Another approach to explore is using Compustat gvkey (an issuer or company level identifier) and Compustat gvkey combined with Compustat iid, “gvkey_iid” (an issue or security level identifier) as a “common connector”.

    I know there is a CRSP-Compustat linking table that can be used to link a Compustat permco to a gvkey, and a Compustat permno to a Compustat gvkey_iid . https://www.crsp.org/products/documentation/link-actions

    I will explore if there is a way to link Facset identifiers (ex. FACTSET_ENTITY_ID, FSYM_ID) to Compustat identifiers (gvkey and gvkey_iid).

    I will report what I find here.

  3. Curious Student says:

    Professor,
    Thank you for generously sharing your knowledge, expertise, and insights.

    Another approach to explore for linking Factset and CRSP is using Compustat gvkey (an issuer or company level identifier) and Compustat gvkey combined with Compustat iid, “gvkey_iid” (an issue or security level identifier) as a “common connector”.

    I know there is a CRSP-Compustat linking table that can be used to link a Compustat permco to a gvkey, and a Compustat permno to a Compustat gvkey_iid.  https://www.crsp.org/products/documentation/link-actions

    I will explore if there is a way to link Facset identifiers (FACTSET_ENTITY_ID, FSYM_ID) to Compustat identifiers (gvkey and gvkey_iid   combined with iid).

    I will report what I find here.

    • Amy says:

      Hi,
      Have you found a way to link Facset identifiers (FACTSET_ENTITY_ID, FSYM_ID) to Compustat identifiers (gvkey and gvkey_iid combined with iid)? Thank you.

      Amy

      • Kai Chen says:

        After you follow this post to get PERMNO, you can use Compustat-CRSP link table to get GVKEY.

        • Amy says:

          Thank you so much Kai.

          I’m trying to keep SIC code, but failed.

          “proc sort data=prac.stocknames out=prac.crsp1 (keep=permno ncusip comnam namedt nameenddt SIC);”
          Could you help out? Thank you so much.
          Amy

Leave a Reply to Curious Student Cancel reply

Your email address will not be published. Required fields are marked *