Link Audit Analytics, Compustat, CRSP and I/B/E/S

The following program is used to link each financial restatement in Audit Analytics to Compustat, CRSP, and I/B/E/S. The resultant dataset aa contains unique identifiers of Audit Analytics (res_notify_key), Compustat (gvkey), CRSP (permno), and I/B/E/S (ibtic).

Please note this program uses the macro ICLINK. In order to use this macro, you need to add the following line to your autoexec.sas file in your WRDS home directory (see here for details):

options sasautos=('/wrds/wrdsmacros/', SASAUTOS) MAUTOSOURCE;

 

Posted in SAS | 15 Comments

Regression with a large dummy-variable set (e.g., firm and year fixed effects) in Stata

I run a regression in Stata with a Compustat dataset that contains fyear and gvkey. I want to add firm and year fixed effects, so I type the following command:

regress DV IV i.fyear i.gvkey

But Stata returns the error r(103), which suggests “too many variables specified”. It turns out that the culprit is i.gvkey.

I find two work-arounds:

(1) Use both xtset and xtreg:
xtset gvkey
xtreg DV IV i.fyear, fe

(2) Use areg:
areg DV IV i.fyear, absorb(gvkey)

xtreg is the Stata command for fixed-, between-, and random-effects linear models, and areg is the Stata command for linear regression with a large dummy-variable set.

In my example, I find that both commands returns exactly same results.

Posted in Stata | Leave a comment

Stata command to create Fama-French industry classifications based on SIC codes

[Update on January 18, 2023] I was advised that the installation command in the original post didn’t work anymore. The author now instructed users, on his website, to find packages here.

You can download a package, as well as its help file, from the above link and place them in your ado directory.

Please type sysdir in Stata to find your ado directory. You may have multiple ado directories, and I would recommend using either PLUS or Personal.

Thank you Judson for creating this handy command!

[Original post] See ffind written by Judson Caskey. Here is his website:
https://sites.google.com/site/judsoncaskey/data

The syntax is as follows (download Help file here):
ffind sic, newvar(ffi) type(30)

type(30) means Fama_French 30 industries classification. Type must be 5, 10, 12, 17, 30, 38, 48 or 49.

This command does not support installation using ssc install. Type the following in Stata to install the command:
net from https://sites.google.com/site/judsoncaskey/data
net install utilities.pkg

See here for equivalent SAS macro:
http://faculty-gsb.stanford.edu/dehaan/SAS.html

Posted in Stata | 30 Comments

Fuzzy match in Stata

Nice article. http://ebp-projects.isr.umich.edu/NCRN/papers/wasi_flaaen_statarecordlinkageutilities.pdf

Author’s website: http://www-personal.umich.edu/~nwasi/programs.html

For installation, type the following command in Stata:
net from http://www-personal.umich.edu/~nwasi/programs
net install stnd_compname.pkg

Update on August 30, 2018—The foregoing URLs seem broken. Please use the following command to install the package:
net from http://www.stata-journal.com/software/sj15-3
net install dm0082.pkg, replace
net get dm0082.pkg, replace

Reference see https://www.stata-journal.com/article.html?article=dm0082.

Posted in Stata | 2 Comments

Calculate CFO tenure with Execucomp in Stata

Nice article and nice blog! Happy to find that someone built a similar blog.

https://robsonglasscock.wordpress.com/2014/04/01/determining-cfo-tenure-with-execucomp-data/

Posted in Stata | 1 Comment

Calculate market value of equity: use CRSP or Compustat?

I encountered a problem today: I had to calculate market value of equity. On Compustat Quarterly I found three variables that seemed to be what I need: PRCCQ, CSHOQ, and MKVALTQ. On CRSP, I found two variables: PRC and SHROUT. So I calculated three market values:

MV1 = PRCCQ * CSHOQ

MV2 = MKVALTQ

MV3 = ABS(PRC) * SHROUT (Note PRC may be negative so we have to get absolute value)

I suppose the three measures should be at least quite close, but they are not. Specifically, MV1 and MV2 are quite close, but many many MV2 are missing. The difference between MV1 and Mv3 is big in many cases. So I wonder why.

Then I find this article (link) that gives a good analysis. Thanks Guan!

Guan’s short answer seems to be: use COMPUSTAT.

Another application (link) on WRDS notes:

Market value of equity as of the December end of any given fiscal year can be obtained as a product of the close market price at the calendar year end (PRCC_C variable in FUNDA) times the shares outstanding (CSHO). CSHO represents the net number of all common shares outstanding at year-end, excluding treasury shares and scrip. If this market value of equity is not available, we take the product of monthly close market price (PRCCM) and the quarterly shares outstanding (CSHOQ) as of December from the Compustat Security pricing table (SECM). Note that in case a company has more than one common stock issue, the program selects the price of the primary issue identified by Compustat as the issue with the highest average trading volume over a period of time (PRIMISS=’P’). One potential concern here is that the price used represents the price of a single issue, whereas the shares oustanding reflect all common shares. According to WRDS analysis, it does not appear to affect the results as the aggregation of shares outstanding across various stock issues is done by Compustat primarily in cases when issues have similar price.

Estimating market value of equity with CRSP and Compustat

Estimating a firm’s market value of common equity (MVE) at a given point in time is something that a finance researcher does quite often. It can be surprisingly hard using only the two commonly available datasets for financial and stock information, CRSP and Compustat.

The most foolproof way is to use the prcc_f and csho fields in Compustat Fundamentals Annual file (or their counterparts prccq and cshoq), which are the closing stock price and the number of shares outstanding (in millions of shares) on the balance sheet date.

For example, here are the values and corresponding market value of equity for Apple, Berkshire Hathaway and the New York Times Company on the latest datadate in my local copy of Compustat:

CompanyDateGVKEYPERMCOcsho (m)prcc_fMVE ($m)
Apple2010-09-300016907915.97283.75259,906
Berkshire Hathaway2009-12-310021765401.55299,200153,958
New York Times Co.2009-12-3100786621280144.51312.361,786

I can tell you that these numbers are, at least in my opinion, correct. This method is fine if you happen to need MVE on the annual balance sheet date, or the quarterly balance sheet date for firms that are in the Fundamentals Quarterly file, but what if you need MVE at the end of a different month or some arbitrary date? For example, you may want to construct portfolios sorted by book-to-market, and since accounting information is not released on datadate, you want to make sure you are not using information that is not available to outsider investors.

The simplest way is, of course, to look in the CRSP Monthly (or Daily) Stock File. The PERMNO for Apple common stock is 14593, so we could multiply abs(prc) (because bid/ask midpoint prices are reported as negative numbers in CRSP, don’t forget that!) and shrout, shares outstanding in thousands. For Apple on 2010-09-30, this gives us 283.75×915,970=$259.9 billion, the same result as Compustat.

What about Berkshire Hathaway? Remember that Berkshire Hathaway has two share classes traded on the New York Stock Exchange, A shares (PERMNO 17778) and B shares (PERMNO 83443). We simply have to multiply price and shares outstanding for each class and sum them. The generic way to do this is to start with a PERMCO, get every security that has share code 10 or 11 (common stock), and multiply price with shares outstanding. Here’s sample SAS code for this, in this case hard coded for Berkshire, but you can figure out how to merge with a separate list of PERMCOs and dates (and properly use group by):

How come we had to use 4 different figures to calculate Berkshire’s market value using CRSP data, and only two figures with Compustat data? This is because csho and prcc_f are stated in terms of only a single class of stock. Berkshire Hathaway had 1.552 million class A equivalent shares outstanding, weighted by the relative economic interest of the different share classes, a number that the poorly paid S&P research assistants probably found in Item 6, “Class A equivalent common shares outstanding, in thousands” in the Berkshire 10-K. (Some companies make this a lot harder to find, especially on the balance sheet date rather than reporting date.)

Now if you do this for the Times Company on 2009-12-31, you would find only one share class on CRSP, PERMNO 47466, and a market value of 12.36×143,715=$1,776 million. This is almost $10 million less than what Compustat has. What happened?

If you looked very closely at the CRSP names file, you would notice that PERMNO 47466 represents class A common stock. Some companies have only a single class of common stock that is still designated class A, but the Times Company has a separate class B that is not traded on any exchange and is not on the CRSP tape.

So how do we estimate MVE for the Times Company at the end of January 2010, which is not in Compustat? My solution is to multiply the CRSP market value with the last known ratio of Compustat to CRSP market value. This ratio was 1,786/1,776=1.0056. CRSP market value on 2010-01-29 was 12.92×143,715=$1,858 million. Multiplied with our ratio, this gives an estimate of total market value of equity of $1,867 million. This estimate doesn’t account for the special voting rights of class B stock, but it is probably the best practical value we can assign to Times Company equity at market prices.

My adjustment may not be such a huge deal for the Times Company, but I can tell you that at the end of 2002, Ascent Media Group (PERMNO 88354, GVKEY 006948) had a total market value of equity, as implied by issues on CRSP, that was 11.7 times the market value of issues on the CRSP tape.

Posted in Learning Resources | 9 Comments

Calculate delta (pay-performance sensitivity), vega (risktaking incentives), and firm-specific wealth (inside equity) for executives on Execucomp

Several papers use measures of delta (pay-performance sensitivity), vega (risktaking incentives), and firm-specific wealth (inside equity) for executives on Execucomp. For example,

1.  Core, J., Guay, W., 2002. Estimate the value of employee stock option portfolios and their sensitivities to price and volatility. Journal of Accounting Research 40, 613-630.

  • develops a method to calculate delta and vega using information provided by Execucomp in the pre-2006 period

2.  Coles, J., Daniel, N., Naveen, L., 2006. Managerial incentives and risk-taking. Journal of Financial Economics 79, 431-468.

  • estimates delta and vega

3.  Daniel, N., Li, Y., and Naveen, L. 2013. No asymmetry in pay for luck. Working Paper.

  • calculates a measure of firm-specific wealth using executives’ stock and option portfolios

Naveen makes publicly available her SAS program used to calculate delta, vega, and firm-specific wealth. See her homepage here. However,

  • Her program is not self-executable because she uses three external datasets for which she does not provide SAS codes used to create them.
  • Her program is to be executed locally so not portable (does not support PC SAS or SSH connection).
  • Her program calculates these measures only up to fiscal year 2010.

I improve Naveen’s program to make it self-contained and executable on its own. Specifically, I recreate the three datasets within the new program and update dataset references to point to the sever end. Now you can run the program via PC SAS or SSH connection, and specify the start year and end year of the period of interest. So you can easily update the data up to the most recent date.

I write a little more details in the overview section in the new program. As evidenced in the overview, I believe that I successfully replicate Naveen’s data using the new program. However, if you decide to use the new program, the accuracy of the generated data is your own responsibility.

Lastly, please cite Naveen’s work if you use the new program. I would be appreciated if you are generous enough to acknowledge my work.

Posted in SAS | 49 Comments

Look into CRSP/Compustat link table

The link history table (CCMXPF_LNKHIST) is the primary table used for WRDS CCM web queries. In this post, I explain this table in detail.

Background

We know that a company may issue multiple securities, one of which is considered primary for the company. On CRSP, this suggests that one PERMCO (the company-lever identifier) may have multiple PERMNOs (the security-level identifier). Because CRSP collects security-level data such as price and trading volume, we should consider CRSP as breaking down to the security level.

It is well known that Compustat provides financial statement data of a company. The micro unit on Compustat is each and every company. However, it is less known that Compustat also provides security data. In addition, because the coverage of Compustat is more extensive than that of CRSP, Compustat contains addtional security data that are unavailable on CRSP.

Historically, Compustat included only one (primary) security per company. Since mid-April in 2007, all securities issued by a company are available on Compustat with a new identifier, IID, which is used along with GVKEY to identify all securities tracked by Compustat. A marker item, PRIMISS, indicates whether a security is primary or secondary. Therefore, like PERMCO on CRSP, one GVKEY may have multiple IIDs currently.

Let me summarize the identifiers used by Compustat and CRSP:

DatabaseIdentifierDescription
CompustatGVKEYCompustat’s permanent company identifier.
CompustatIIDCompustat’s permanent security identifier. An identifying relationship exists between IID and GVKEY. Both must be accessed as a pair to properly identify a Compustat security. One GVKEY may have multiple IIDs.
CompustatPRIMISSThis item indicates whether a security is primary (P) or secondary (J). P identifies a primary security with the highest average trading volume over a period of time. J identifies a joiner (secondary) security.
CRSPPERMCOCRSP’s permanent company identifier.
CRSPPERMNOCRSP’s permanent security identifier. One PERMNO belongs to only one PERMCO. One PERMCO may have one or more PERMNOs.

The last piece of background information is that the link between CRSP and Compustat (at both company level and security level) may change over time.

The linking process

Prior to the introduction of IID, Compustat included only one (primary) security per company. The link between CRSP and Compustat was between CRSP PERMNO and Compustat GVKEY. Because PERMNO is a security identifier and GVKEY is a company identifier, this link may be a many-to-one relationship, i.e., multiple PERMNOs may be linked to a single GVKEY.

Because Compustat security-level information is now available, CRSP started to build security-level links in April 2007.

The linking history table

The main product of CRSP’s laborious linking efforts is the link history table. This table is Compustat-centric, that is, this table is organized and identified by Compustat identifiers which are then linked to CRSP identifiers. All Compustat records are retained, regardless of whether or not the securities (defined by GVKEY-IID) are in the CRSP universe.The following is a slice of the table (Please note that IID, PERMCO, and PERMNO have the prefix “L” in the link history table.):

GVKEYLINKPRIMLIIDLINKTYPELPERMNOLPERMCOLINKDTLINKENDDT
COMPUSTAT global company keyPrimary link markerSecurity-level identifierLink type codeHistorical CRSP PERMNO link to COMPUSTAT recordHistorical CRSP PERMCO link to COMPUSTAT recordFirst effective date of linkLast effective date of link
10411P1LC63773523019811215.E
10411P1NR1974112919811214
10411J7LC9065552302005051620120131
10411J6NR2005042920060131
10411J2NR19940331.E
10411J4NR2002013120060131

LIID is based on Compustat’s IID. Because Compustat’s company data range extends earlier than its security data range, there are time periods during which no IID is assigned by Compustat for a GVKEY. In these cases, CRSP assigns a dummy IID ending in “X” as a placeholder in the link table. This GVKEY-dummy IID may or may not be linked to a CRSP PERMNO.

LINKPRIM is a marker item that indicates whether a GVKEY-LIID is a primary security. This marker is based on Compustat’s Primary/Joiner flag (PRIMISS). However, due to missing primary issue markers from Compustat for early history, calendar ranges of overlapping, and different treatment for US and Canadian security issues, CRSP overides Compustat’s primary issue marker in many cases. The purpose is to produce one primary security throughout the company history. “P” represents the primary security issue identified by Compustat, while “C” represents the primary security issue identified or overridden by CRSP. In most applications, we only need the primary security.

Another important item is LINKTYPE. In short, LC and LU are considered as the most accurate links. They are also the default link types used for WRDS CCM web queries. LX and LD are considered as “soft” links of low accuracy. Old merging sample codes also include LS in addition to LC and LU. But by definition below, I do not think that LS should be included.

LINKDT and LINKENDDT are straightforward. They mark the period during which the link is valid.

Please see the detailed description of each item:

ITEM NAMETYPEDESCRIPTION
GVKEYinteger, primary key (1)Compustat GVKEY
LIIDchar(3), primary key (2)Compustat IID. A dummy IID with an “X” suffix is assigned by CRSP as a placeholder if no IID is assigned by Compustat for a GVKEY in early history.
LINKDTinteger (date), primary key (3)First effective calendar date of link record range
LINKENDDTinteger (date)Last effective calendar date of link record range
LPERMNOintegerLinked CRSP PERMNO, 0 if no CRSP security link exists
LPERMCOintegerLinked CRSP PERMCO, 0 if no CRSP company link exists
LINKPRIMchar(3)Primary issue marker for the link. This marker is based on Compustat Primary/Joiner flag (PRIMISS), but may be overridden by CRSP in some cases. Values are:

P – Primary, identified by Compustat in monthly security data.

J – Joiner secondary issue of a company, identified by Compustat in monthly security data.

C – Primary, assigned by CRSP to resolve ranges of overlapping or missing primary markers from Compustat in order to produce one primary security throughout the company history.

N – Secondary, assigned by CRSP to override Compustat. Compustat allows a US security and a Canadian security issued by the same company to both be marked as Primary at the same time. For purposes of the link, CRSP allows only one primary at a time and marks the others as N.
LINKTYPEchar(3)Link type code. Each link is given a code describing the connection between the CRSP and Compustat data. Values are:

LC – Link research complete. Standard connection between databases.

LU – Unresearched link to issue by CUSIP

LX – Link to a security that trades on another exchange system not included in CRSP data.

LD – Duplicate Link to a security. Another GVKEY/IID is a better link to that CRSP record.

LN – Primary link exists but Compustat does not have prices.

LS – Link valid for this security only. Other CRSP PERMNOs with the same PERMCO will link to other GVKEYs.

NR – No link available, confirmed by research

NU – No link available, not yet confirmed

I download the link history table as of January 30, 2015. I delete records without a link found (about 67% of all records; remember the link history table is Compustat-centric and the Compustat universe is bigger than the CRSP universe). For remaining records with a link found, I present the following statistics to give you a sense of the values of LINKPRIM, LINKTYPE, and LIID. As you can see, the vast majority of the primary issue marker is identified by Compustat, and “LC” and “LU” types of links constitute about 90% of all identified links.

The merging code

You may notice the following announcement on the CCM product:

As of the February 2014 release, USEDFLAG is no longer used in the WRDS CCM web queries.  Please select LINKTYPES LC, LU, and LS for the same results. These represent the vast majority of the links between CRSP securities and Compustat companies, without introducing duplicate data.

The WRDS-created linking dataset (ccmxpf_linktable) has been deprecated. It will continue to be created for a transition period of 1 year. SAS programmers should use the Link History dataset (ccmxpf_lnkhist) from CRSP.

This suggests that many old merging codes should be updated accordingly. Based on the explanation above, the most important query filters are LINKPRIM, LINKTYPE, LINKDT and LINKENDDT. LINKPRIM is used to select only primary security. LINKTYPE is used to ensure accuracy. LINKDT and LINKENDDT are used to ensure validity of a link at a give time. I believe that the following code is better than most sample codes I have seen:

 

Finally, I acknowledge that the information mainly comes from the official CRSP/Compustat Merged Database Guide.

Posted in Learning Resources, SAS | 45 Comments

Errors in CRSP, Compustat and other popular data source

I find a web page providing references for articles that study specific aspects of CRSP, Compustat and other popular sources of data used in academic research.

https://www.kellogg.northwestern.edu/researchcomputing/crsp-cstat-references.htm

Some interesting topics include: error rates in CRSP and COMPUSTAT, differences between COMPUSTAT and CRSP SIC codes, etc. …

Posted in Learning Resources | 2 Comments

EDGAR index files in Stata dataset (from 1993 Q1 to March 2, 2017)

SEC makes all EDGAR filings publicly available. We can download all 10-Ks, 10-Qs, 8-Ks filed since 1993. However, SEC makes this far away from just a few mouse clicks (in order to reduce the server load and avoid the possible abuse I guess). To download EDGAR filings, we have to download EDGAR index files first to get the full path of each 10-K, 10-Q, 8-K, etc. We cannot download any file without the full path information. See technical details here.

I downloaded all EDGAR index files and converted them into Stata datasets. You can download here: Stata format (1993–2000); Stata format (2001–2005); Stata format (2006–2010); Stata format (2011–2015); Stata format (2016–2019/03/16).

If you want to know how I do this, please read my another blog here.

Posted in Data | 16 Comments