Both FactSet and CRSP offer identifier files that contain primary identifiers at the entity level and security level (note: an entity may issue multiple securities). These files provide a mapping between their primary identifiers and all other historical identifiers such as CUSIP and ticker. By utilizing these identifiers, we can construct a link table.

Relevant CRSP Variables

The CRSP STOCKNAMES file provides a mapping between CRSP permanent identifiers and all historical CUSIPs, company names, and exchange tickers, along with their effective date ranges.

  • PERMCO is the unique identifier for CRSP entities. PERMCO remains constant over time, even if the company changes its name, CUSIP, or exchange ticker.
  • PERMNO is the unique identifier for CRSP securities. PERMNO also does not change over time too.
  • COMNAM represents the company name in CRSP at a specific point in time.
  • NCUSIP is the 8-digit historical CUSIP for CRSP securities. A change in CUSIP can occur due to various reasons, including non-fundamental events like splits and company name changes. CRSP also has a header variable named CUSIP that reflects only the most recent CUSIP for the entire time series.
  • NAMEDT and NAMEENDDT represent the first and last effective dates of each name structure, which includes CUSIP, company name, exchange code, exchange ticker, share class, and SIC code. Each PERMNO has at least one name structure in the STOCKNAMES file.

Relevant FactSet Variables

The FactSet SYM_V1_SYM_CUSIP_HIST file provides a mapping between FactSet permanent identifiers and all historical CUSIPs, along with their effective date ranges.

  • FSYM_ID is the unique identifier for FactSet securities (similar to PERMNO in CRSP). FSYM_ID remains unchanged over time. In another identifier file, ENT_V1_ENT_SCR_SEC_ENTITY_HIST, FactSet provides a mapping between FactSet securities and their issuing entities, uniquely identified by FACTSET_ENTITY_ID, which, like PERMCO in CRSP, does not change over time.
  • CUSIP is the 9-digit historical CUSIP for FactSet securities. FactSet also has a header variable named MOST_RECENT that reflects only the most recent CUSIP for the entire time series.
  • START_DATE and END_DATE represent the first and last effective dates of each name structure.

Linking FactSet FSYM_ID and CRSP PERMNO using CUSIP

Both FactSet FSYM_ID and CRSP PERMNO are used to uniquely identify securities. We can link the two using the associated historical CUSIPs, which serve as a common identifier in both FactSet and CRSP.

The following two-part code generates a link table that provides a mapping between FactSet FSYM_ID and CRSP PERMNO, along with their effective date ranges (START_DATE and END_DATE). The first part uses SAS, and the second part uses Stata. An SAS expert should be able to complete the task entirely in SAS. I use Stata for the second part simply because a handy command (tsspell) is available in Stata.

First-part SAS code:
Second-part Stata code:
Please set up the LIBNAME and directories in SAS and Stata to properly reference the relevant datasets. In Stata, use the command ssc install tsspell, replace to install the third-party command.

Please note that in the link table, the END_DATE column represents the most recent update date of the CRSP STOCKNAMES file (rather than a missing value) if the link is still active. Therefore, it is important to use the most recent CRSP STOCKNAMES file to avoid any unexpected error.

I find that the link table only covers a small portion of the FactSet universe, as CRSP only keeps historical CUSIP data for securities traded in major exchanges, while FactSet extracts data from a much large security universe. Specifically, only 36,560 FSYM_IDs can be linked to a PERMNO, whereas there are 310,234 FSYM_IDs in FactSet with FACTSET_ENTITY_IDs.

I would like to express my gratitude to two WRDS articles: Merging CRSP and COMPUSTAT Data and Linking IBES and CRSP Data for their valuable insights.

 

Categorized in:

Tagged in: