{"id":811,"date":"2017-04-23T00:36:16","date_gmt":"2017-04-23T04:36:16","guid":{"rendered":"http:\/\/www.kaikaichen.com\/?p=811"},"modified":"2021-05-24T20:55:32","modified_gmt":"2021-05-25T00:55:32","slug":"use-python-to-download-dtcc-data","status":"publish","type":"post","link":"https:\/\/www.kaichen.work\/?p=811","title":{"rendered":"Use Python to download data from the DTCC\u2019s Swap Data Repository"},"content":{"rendered":"<p>I helped a friend download data from the DTCC\u2019s Swap Data Repository. I am not familiar with the data and treated this simply as programming practice.<\/p>\n<p>This article gives an introduction to the origin of the data: <a href=\"http:\/\/www.dtcc.com\/news\/2013\/january\/03\/swap-data-repository-real-time\" target=\"_blank\" rel=\"noopener noreferrer\">http:\/\/www.dtcc.com\/news\/2013\/january\/03\/swap-data-repository-real-time<\/a><\/p>\n<p>The Python script will:<\/p>\n<ol>\n<li>download the daily Credit zip files; and<\/li>\n<li>extract the CSV from each zip file and combine the contents into a single large CSV (220 MB), which can then be imported into Stata or another statistical package.<\/li>\n<\/ol>\n<p>As of April 22, 2016, there were around one million historical records. The data seem to be available from April 6, 2013, with days missing sporadically from then on. 
The Python script prints the dates for which the daily data are not available.<\/p>\n<pre class=\"lang:python decode:true \">import io\r\nimport zipfile\r\nfrom datetime import date\r\n\r\nimport pandas as pd\r\nimport requests\r\n\r\nstart = date(2013, 1, 1)\r\nend = date.today()\r\n\r\n# Build (download URL, local file name) pairs, one per calendar day\r\nurls = []\r\n\r\nfor i in range(start.toordinal(), end.toordinal()):\r\n    datestr = date.fromordinal(i).isoformat().replace('-', '_')\r\n    url = ('https:\/\/kgc0418-tdw-data2-0.s3.amazonaws.com\/slices\/CUMULATIVE_CREDITS_' + datestr + '.zip',\r\n           'CUMULATIVE_CREDITS_' + datestr + '.zip')\r\n    urls.append(url)\r\n\r\nbadurls = []\r\n\r\n# Collect the daily DataFrames and concatenate once at the end;\r\n# DataFrame.append was removed in pandas 2.0\r\nframes = []\r\n\r\nfor url in urls:\r\n    request = requests.get(url[0])\r\n    if not zipfile.is_zipfile(io.BytesIO(request.content)):\r\n        print(url[1], 'is non-existent!')\r\n        badurls.append(url)\r\n    else:\r\n        with open(url[1], 'wb') as f:\r\n            f.write(request.content)\r\n        print(url[1], 'downloaded!')\r\n        z = zipfile.ZipFile(io.BytesIO(request.content))\r\n        df_ = pd.read_csv(z.open(z.namelist()[0]))\r\n        # Characters 19-28 of the file name hold the date (YYYY_MM_DD)\r\n        df_['DATE'] = url[1][19:29]\r\n        frames.append(df_)\r\n\r\ndf = pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()\r\ndf.to_csv('dtcc.csv')\r\n\r\nprint(badurls)<\/pre>\n","protected":false},"excerpt":{"rendered":"<p>I helped a friend download data from the DTCC\u2019s Swap Data Repository. I am not familiar with the data and treated this simply as programming practice. 
This article gives an introduction to the origin of the data: http:\/\/www.dtcc.com\/news\/2013\/january\/03\/swap-data-repository-real-time The &hellip; <a href=\"https:\/\/www.kaichen.work\/?p=811\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[7,10],"tags":[],"_links":{"self":[{"href":"https:\/\/www.kaichen.work\/index.php?rest_route=\/wp\/v2\/posts\/811"}],"collection":[{"href":"https:\/\/www.kaichen.work\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.kaichen.work\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.kaichen.work\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.kaichen.work\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=811"}],"version-history":[{"count":10,"href":"https:\/\/www.kaichen.work\/index.php?rest_route=\/wp\/v2\/posts\/811\/revisions"}],"predecessor-version":[{"id":1494,"href":"https:\/\/www.kaichen.work\/index.php?rest_route=\/wp\/v2\/posts\/811\/revisions\/1494"}],"wp:attachment":[{"href":"https:\/\/www.kaichen.work\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=811"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.kaichen.work\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=811"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.kaichen.work\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=811"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}