Both FactSet and CRSP offer identifier files that contain primary identifiers at the entity level and security level (note: an entity may issue multiple securities). These files provide a mapping between their primary identifiers and all other historical identifiers such as CUSIP and ticker. By utilizing these identifiers, we can construct a link table.
Relevant CRSP Variables
The CRSP STOCKNAMES file provides a mapping between CRSP permanent identifiers and all historical CUSIPs, company names, and exchange tickers, along with their effective date ranges.
PERMCOis the unique identifier for CRSP entities.
PERMCOremains constant over time, even if the company changes its name, CUSIP, or exchange ticker.
PERMNOis the unique identifier for CRSP securities.
PERMNOalso does not change over time too.
COMNAMrepresents the company name in CRSP at a specific point in time.
NCUSIPis the 8-digit historical CUSIP for CRSP securities. A change in CUSIP can occur due to various reasons, including non-fundamental events like splits and company name changes. CRSP also has a header variable named
CUSIPthat reflects only the most recent CUSIP for the entire time series.
NAMEENDDTrepresent the first and last effective dates of each name structure, which includes CUSIP, company name, exchange code, exchange ticker, share class, and SIC code. Each
PERMNOhas at least one name structure in the STOCKNAMES file.
Relevant FactSet Variables
The FactSet SYM_V1_SYM_CUSIP_HIST file provides a mapping between FactSet permanent identifiers and all historical CUSIPs, along with their effective date ranges.
FSYM_IDis the unique identifier for FactSet securities (similar to
FSYM_IDremains unchanged over time. In another identifier file, ENT_V1_ENT_SCR_SEC_ENTITY_HIST, FactSet provides a mapping between FactSet securities and their issuing entities, uniquely identified by
FACTSET_ENTITY_ID, which, like
PERMCOin CRSP, does not change over time.
CUSIPis the 9-digit historical CUSIP for FactSet securities. FactSet also has a header variable named
MOST_RECENTthat reflects only the most recent CUSIP for the entire time series.
END_DATErepresent the first and last effective dates of each name structure.
Linking FactSet FSYM_ID and CRSP PERMNO using CUSIP
FSYM_ID and CRSP
PERMNO are used to uniquely identify securities. We can link the two using the associated historical CUSIPs, which serve as a common identifier in both FactSet and CRSP.
The following two-part code generates a link table that provides a mapping between FactSet
FSYM_ID and CRSP
PERMNO, along with their effective date ranges (
END_DATE). The first part uses SAS, and the second part uses Stata. An SAS expert should be able to complete the task entirely in SAS. I use Stata for the second part simply because a handy command (
tsspell) is available in Stata.
First-part SAS code:
proc sort data=sym_v1_sym_cusip_hist out=fs (keep=fsym_id cusip start_date end_date);
by fsym_id cusip start_date;
proc sort data=stocknames out=crsp1 (keep=permno ncusip comnam namedt nameenddt);
where not missing(ncusip);
by permno ncusip namedt;
do i=0 to (nameenddt-namedt);
format namedt nameenddt date date9.;
create table linktable_temp as
select distinct a.permno, a.date, b.fsym_id
from crsp2 a, fs b
where a.ncusip=substr(b.cusip,1,8) and b.start_date<=a.date<=coalesce(b.end_date,today());
proc export data= linktable_temp
dbms= dta replace;
Second-part Stata code:
use linktable_temp, clear
egen pid=group(permno fsym_id)
tsset pid date
bysort pid _spell: egen start_date=min(date)
bysort pid _spell: egen end_date=max(date)
format start_date end_date %td
duplicates drop pid _spell, force
keep permno fsym_id start_date end_date
saveold linktable, replace v(12)
Please set up the
LIBNAME and directories in SAS and Stata to properly reference the relevant datasets. In Stata, use the command
ssc install tsspell, replace to install the third-party command.
Please note that in the link table, the
END_DATE column represents the most recent update date of the CRSP STOCKNAMES file (rather than a missing value) if the link is still active. Therefore, it is important to use the most recent CRSP STOCKNAMES file to avoid any unexpected error.
I find that the link table only covers a small portion of the FactSet universe, as CRSP only keeps historical CUSIP data for securities traded in major exchanges, while FactSet extracts data from a much large security universe. Specifically, only 36,560
FSYM_IDs can be linked to a
PERMNO, whereas there are 310,234
FSYM_IDs in FactSet with