{"id":1522,"date":"2021-08-30T14:56:20","date_gmt":"2021-08-30T18:56:20","guid":{"rendered":"http:\/\/www.kaichen.work\/?p=1522"},"modified":"2023-08-01T20:44:20","modified_gmt":"2023-08-02T00:44:20","slug":"use-stata-to-do-propensity-score-matching","status":"publish","type":"post","link":"https:\/\/www.kaichen.work\/?p=1522","title":{"rendered":"Stata command to perform propensity score matching (PSM)"},"content":{"rendered":"<p>Most propensity score matching (PSM) examples use cross-sectional data rather than panel data. However, in accounting research, panel data (observations with two subscripts <em>i<\/em> and <em>t<\/em>, e.g., firm-years) are often used in a difference-in-differences (DID) research design. This involves two dummy variables, <code>TREATMENT<\/code> and <code>POST<\/code>, in the following regression:<\/p>\n<p><em>Outcome = TREATMENT + POST + TREATMENT * POST<\/em><\/p>\n<p>where <code>TREATMENT<\/code> indicates whether a firm is subject to the treatment event and <code>POST<\/code> indicates whether an observation falls after that event. In this context, it is common to perform one-to-one matching using selected <strong>pre-event<\/strong> and <strong>firm-level<\/strong> variables (<code>X<\/code>s). These pre-event variables can be measured either at the most recent date before the event (e.g., total assets at the most recent quarter end before the event) or as an average over the pre-event period (e.g., average total assets in the four quarters preceding the event).<\/p>\n<p>To conduct PSM, a probit or logit regression is needed to estimate the propensity score:<\/p>\n<p><em>TREATMENT = X1 + X2 + &#8230;<\/em><\/p>\n<p>For each treatment observation, the single nearest neighbour by propensity score is selected as its matched control observation. The treatment observations and their respective matched control observations then form the sample for subsequent DID regressions.<\/p>\n<p>In Stata, the third-party module <code>psmatch2<\/code> is commonly used to find matched control observations using PSM. 
To install the module, the following command can be used:<\/p>\n<p><code>ssc install psmatch2<\/code><\/p>\n<p>Once installed, the following command is typically used:<\/p>\n<p><code>psmatch2 TREATMENT X1 X2 ..., [noreplacement logit descending]<\/code><\/p>\n<p>There are three options in the above command:<\/p>\n<ul>\n<li><code>noreplacement<\/code> \u2013 Performs one-to-one matching without replacement. I would add this option to obtain unique matches, i.e., each control observation is used at most once.<\/li>\n<li><code>logit<\/code> \u2013 Uses logit instead of the default probit regression to estimate the propensity score. I am indifferent between logit and probit.<\/li>\n<li><code>descending<\/code> \u2013 More details about this option can be found in <a href=\"https:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC3873103\/\" target=\"_blank\" rel=\"noopener\">Lunt (2014)<\/a>. The author concludes that &#8220;in the absence of a caliper (another option that I would omit to maximize the number of matches), the descending method provides the best matches, particularly when there is a large separation between exposed (treated) and unexposed (untreated) subjects.&#8221; Therefore, I would add this option.<\/li>\n<\/ul>\n<p><code>psmatch2<\/code> creates several variables, with <code>_id<\/code> and <code>_n1<\/code> being the most useful for subsequent DID regressions:<\/p>\n<ul>\n<li><code>_id<\/code> is a new identifier created for all observations in the case of one-to-one and nearest-neighbors matching.<\/li>\n<li><code>_n1<\/code> stores the new identifier (<code>_id<\/code>) of the matched control observation for every treatment observation.<\/li>\n<\/ul>\n<p>There is one limitation with <code>psmatch2<\/code>. Sometimes, we may want the treatment and its matched control to have the same value of a variable <code>X<\/code>. For example, we may want the treatment and its matched control to be drawn from the same industry, or to be of the same gender. 
<code>psmatch2<\/code> lacks a direct solution for this requirement. Some imperfect workarounds, such as adding <code>i.industry<\/code> or <code>i.gender<\/code> to the <code>X<\/code>s, are discussed in <a href=\"https:\/\/www.stata.com\/statalist\/archive\/2012-08\/msg00531.html\" target=\"_blank\" rel=\"noopener\">this post<\/a>. In contrast, the <code>PSMATCH<\/code> procedure in SAS solves this directly by offering the <code>EXACT=<\/code> option. I am not sure if SAS achieves this by implementing a stratification method, but if it does, it is possible that <code>psmatch2<\/code> in Stata could achieve similar results by tweaking its options. More details on the <code>PSMATCH<\/code> procedure in SAS can be found in <a href=\"https:\/\/support.sas.com\/documentation\/onlinedoc\/stat\/142\/psmatch.pdf\" target=\"_blank\" rel=\"noopener\">this manual<\/a>.<\/p>\n<p>It is worth noting that <code>psmatch2<\/code> is preferable to Stata&#8217;s built-in command <code>teffects<\/code> because the variables generated by <code>psmatch2<\/code> (particularly <code>_id<\/code> and <code>_n1<\/code>) are necessary for subsequent DID regressions, whereas <code>teffects<\/code> does not return such variables.<\/p>\n<p>This article aims to provide a quick how-to and may omit some necessary steps for PSM, such as assessing covariate balance. 
A more rigorous discussion of PSM in accounting research can be found in <a href=\"https:\/\/doi.org\/10.2308\/accr-51449\" target=\"_blank\" rel=\"noopener\">Shipman, Swanquist, and Whited (2017)<\/a>.<\/p>\n<p>I would like to thank the authors of the following articles, which were helpful in preparing this post:<\/p>\n<ul>\n<li><a href=\"https:\/\/www.ssc.wisc.edu\/sscc\/pubs\/stata_psmatch.htm\" target=\"_blank\" rel=\"noopener\">Propensity Score Matching in Stata using teffects<\/a><\/li>\n<li><a href=\"https:\/\/rstudio-pubs-static.s3.amazonaws.com\/365980_0961bedca33748d0a707740b0fde2e93.html\" target=\"_blank\" rel=\"noopener\">Propensity score matching in Stata<\/a><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Most propensity score matching (PSM) examples use cross-sectional data rather than panel data. However, in accounting research, panel data (observations with two subscripts i and t, e.g., firm-years) are often used in a difference-in-differences (DID) research design. 
This involves &hellip; <a href=\"https:\/\/www.kaichen.work\/?p=1522\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[9],"tags":[17],"_links":{"self":[{"href":"https:\/\/www.kaichen.work\/index.php?rest_route=\/wp\/v2\/posts\/1522"}],"collection":[{"href":"https:\/\/www.kaichen.work\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.kaichen.work\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.kaichen.work\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.kaichen.work\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1522"}],"version-history":[{"count":27,"href":"https:\/\/www.kaichen.work\/index.php?rest_route=\/wp\/v2\/posts\/1522\/revisions"}],"predecessor-version":[{"id":2170,"href":"https:\/\/www.kaichen.work\/index.php?rest_route=\/wp\/v2\/posts\/1522\/revisions\/2170"}],"wp:attachment":[{"href":"https:\/\/www.kaichen.work\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1522"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.kaichen.work\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1522"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.kaichen.work\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1522"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}