To enable a more automated approach to gathering information about public companies, a set of utilities (mkdbcache.py and falsh.py) and a RESTful service (edgar_svc.py) have been created. The initial focus is the SEC's EDGAR repository, which holds data on public companies listed on stock exchanges in the United States. Over time, additional open repositories, such as the one available in the UK, will be considered.
The intention of these utilities, and the associated RESTful service, is to enable open access, under the Apache License, Version 2.0, to open data maintained by the SEC. Other API overlays are available, at a cost; after some research into these tools and services, we concluded that the best alternative was to open source this set of tools. We hope that you will review and potentially work with us as we improve this tool set over time. There are a few things we'll be working on soon as we pull this tool into our developing application. Notably, here are the things that are likely to be worked on:
- Ensure that this will properly run in docker using docker-compose
- Beef up the installation and configuration documentation so anyone can partake
- Enable TLS for all of the communications with the service
- Get NGINX working properly; it is currently broken
- Create some super simple wrapping scripts to operationalize the service
- Have fun refactoring the code, and potentially invite other developers in to participate
Since this code falls under the permissive Apache license, there is no warranty or guarantee of support. Additionally, since the development work we're pursuing doesn't yet need to pull data in from this service, this code has been "hibernating" a bit. This is a nice way of saying it may not work, and we're not yet in a position to improve or fix the code base. However, if you get it running please let us know, and when we come back here we will endeavor to update this README accordingly. So stay tuned...
- PyEdgar - used to interface with the SEC's EDGAR repository
- SQLite - lets the utilities and the RESTful service respond quickly and expressively to queries for company data
- Flask and associated utilities - used to realize the RESTful service
- NGINX - enables hosting of the RESTful service
- mkdbcache.py - interacts with the SEC EDGAR repository through PyEdgar and generates a SQLite database cache file that the other utilities can use
- falsh.py - provides a command shell for interacting with the SQLite database cache file, enabling queries that range from simple to expressive
- cutils.py - a set of common helper functions and utilities used by other functions
- edgar_svc.py - implements a RESTful service with 3 API calls so that an end user can find information about a company they are interested in
usage: mkdbcache.py [-h] [--cleanall] [--cleandb] [--cleancache] [--getmaster]
[--year Y] [--verbose {50,40,30,20,10}]
A utility to create a db cache for select SEC edgar data.
optional arguments:
-h, --help show this help message and exit
--cleanall, -a Clean up the cache files and db cache and exit.
--cleandb, -d Clean up the db cache only and exit.
--cleancache, -c Clean up the cache files only and exit.
--getmaster, -g Get the master.gz files only and exit.
--year Y, -y Y Define the year to start from, defaults to 2010.
--verbose {50,40,30,20,10}, -v {50,40,30,20,10}
Set the logging verbosity.
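The numeric choices for --verbose match the standard Python logging levels, so the sketch below assumes (it is not confirmed by the code shown here) that mkdbcache.py passes the value straight to the logging module. The helper names `describe_verbosity` and `build_command` are hypothetical and exist only for illustration; only the `--year` and `--verbose` flags come from the usage text above.

```python
import logging
from typing import List

# The numeric choices accepted by --verbose, as listed in the usage text.
VERBOSE_CHOICES = (50, 40, 30, 20, 10)


def describe_verbosity(level: int) -> str:
    """Return the standard logging level name for a --verbose value.

    Assumption: the values are interpreted as Python logging levels.
    """
    if level not in VERBOSE_CHOICES:
        raise ValueError(f"unsupported verbosity: {level}")
    return logging.getLevelName(level)


def build_command(year: int = 2010, verbose: int = 20) -> List[str]:
    """Assemble an argv list for mkdbcache.py (hypothetical helper)."""
    describe_verbosity(verbose)  # raises ValueError on an unsupported value
    return ["python3", "mkdbcache.py", "--year", str(year), "--verbose", str(verbose)]


if __name__ == "__main__":
    for level in VERBOSE_CHOICES:
        print(level, describe_verbosity(level))
    # The resulting list could be handed to subprocess.run().
    print(" ".join(build_command(year=2015, verbose=10)))
```

Building the argument list separately from running it keeps the flag handling easy to check before the cache build, which may take a while, is started.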
falsh> help
Documented commands (type help <topic>):
________________________________________
exit get10kurl getfilings getheaders help
falsh> help getfilings
getcompanies [company name OR string]
Query either a company name or partial name.
falsh> help get10kurl
get10kurl [https://...html]
Retrieve and print the URL for the 10k filing taking in the
URL from the 10k index HTML file.
falsh> help getheaders
getheaders [cik:accession]
Retrieve and print the headers for the document described by
CIK:Accession.
falsh>
Example 'getfilings' query and result for the company "google".
falsh> getfilings google
{1288776: {'10-K': [[2,
11,
2016,
'0001652044-16-000012',
'https://www.sec.gov/Archives/edgar/data/1288776/000165204416000012/',
'https://www.sec.gov/Archives/edgar/data/1288776/000165204416000012/0001652044-16-000012-index.html']],
'10-K/A': [[3,
29,
2016,
'0001193125-16-520367',
'https://www.sec.gov/Archives/edgar/data/1288776/000119312516520367/',
'https://www.sec.gov/Archives/edgar/data/1288776/000119312516520367/0001193125-16-520367-index.html']],
'name': 'GOOGLE INC.'}}
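The result above is a dictionary keyed by CIK, with one list of filings per form type plus a 'name' entry. A minimal sketch of flattening that structure into rows might look like the following. It assumes the three leading integers are month, day and year (consistent with the filing date 20160211 reported by getheaders for the same accession number); `flatten_filings` is a hypothetical helper, not part of falsh.py.

```python
from datetime import date

# Sample shaped like the getfilings output above.
result = {
    1288776: {
        "10-K": [[2, 11, 2016, "0001652044-16-000012",
                  "https://www.sec.gov/Archives/edgar/data/1288776/000165204416000012/",
                  "https://www.sec.gov/Archives/edgar/data/1288776/000165204416000012/0001652044-16-000012-index.html"]],
        "10-K/A": [[3, 29, 2016, "0001193125-16-520367",
                    "https://www.sec.gov/Archives/edgar/data/1288776/000119312516520367/",
                    "https://www.sec.gov/Archives/edgar/data/1288776/000119312516520367/0001193125-16-520367-index.html"]],
        "name": "GOOGLE INC.",
    }
}


def flatten_filings(result):
    """Turn the nested getfilings result into a date-sorted list of dicts."""
    rows = []
    for cik, entry in result.items():
        name = entry.get("name", "")
        for form, filings in entry.items():
            if form == "name":  # the company name sits beside the form types
                continue
            for month, day, year, accession, folder_url, index_url in filings:
                rows.append({
                    "cik": cik,
                    "name": name,
                    "form": form,
                    "filed": date(year, month, day),
                    "accession": accession,
                    "folder_url": folder_url,
                    "index_url": index_url,
                })
    rows.sort(key=lambda row: row["filed"])
    return rows


if __name__ == "__main__":
    for row in flatten_filings(result):
        print(row["filed"].isoformat(), row["form"], row["accession"])
```

The accession number and index URL in each row are the values that getheaders and get10kurl take as input.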
Example 'getheaders' query for the 2016 10-K filing.
falsh> getheaders 1288776:0001652044-16-000012
{'<sec-document>0001652044-16-000012.txt': '20160211',
'<sec-header>0001652044-16-000012.hdr.sgml': '20160211',
'accession-number': '0001652044-16-000012',
'business-phone': '650-253-0000',
'central-index-key': '0001652044',
'city': 'MOUNTAIN VIEW',
'company-conformed-name': 'Alphabet Inc.',
'conformed-period-of-report': '20151231',
'conformed-submission-type': '10-K',
'date-as-of-change': '20160211',
'date-of-name-change': '20040428',
'filed-as-of-date': '20160211',
'filer': {'business-address': {'business-phone': '650-253-0000',
'city': 'MOUNTAIN VIEW',
'state': 'CA',
'street1': '1600 AMPHITHEATRE PARKWAY',
'zip': '94043'},
'company-data': {'central-index-key': '0001652044',
'company-conformed-name': 'Alphabet Inc.',
'fiscal-year-end': '1231',
'irs-number': '611767919',
'standard-industrial-classification': 'SERVICES-COMPUTER '
'PROGRAMMING, '
'DATA '
'PROCESSING, '
'ETC. [7370]',
'state-of-incorporation': 'DE'},
'filing-values': {'film-number': '161412149',
'form-type': '10-K',
'sec-act': '1934 Act',
'sec-file-number': '001-37580'},
'mail-address': {'city': 'MOUNTAIN VIEW',
'state': 'CA',
'street1': '1600 AMPHITHEATRE PARKWAY',
'zip': '94043'}},
'filer_0': {'business-address': {'business-phone': '650 253-0000',
'city': 'MOUNTAIN VIEW',
'state': 'CA',
'street1': '1600 AMPHITHEATRE PARKWAY',
'zip': '94043'},
'company-data': {'central-index-key': '0001288776',
'company-conformed-name': 'GOOGLE INC.',
'fiscal-year-end': '1231',
'irs-number': '770493581',
'standard-industrial-classification': 'SERVICES-COMPUTER '
'PROGRAMMING, '
'DATA '
'PROCESSING, '
'ETC. '
'[7370]',
'state-of-incorporation': 'DE'},
'filing-values': {'film-number': '161412150',
'form-type': '10-K',
'sec-act': '1934 Act',
'sec-file-number': '001-36380'},
'former-company': {'date-of-name-change': '20040428',
'former-conformed-name': 'Google Inc.'},
'mail-address': {'city': 'MOUNTAIN VIEW',
'state': 'CA',
'street1': '1600 AMPHITHEATRE PARKWAY',
'zip': '94043'}},
'film-number': '161412149',
'fiscal-year-end': '1231',
'flat': False,
'form-type': '10-K',
'former-conformed-name': 'Google Inc.',
'irs-number': '611767919',
'public-document-count': '119',
'sec-act': '1934 Act',
'sec-file-number': '001-37580',
'standard-industrial-classification': 'SERVICES-COMPUTER PROGRAMMING, DATA '
'PROCESSING, ETC. [7370]',
'state': 'CA',
'state-of-incorporation': 'DE',
'street1': '1600 AMPHITHEATRE PARKWAY',
'zip': '94043'}
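The header dictionary above repeats some values at the top level and nests one block per filer under the keys 'filer' and 'filer_0'. The sketch below shows one way to pull out the filers and the reporting period. It assumes that any additional filers follow the same 'filer_N' key pattern, which this single sample cannot confirm; `list_filers` and `period_of_report` are hypothetical helpers.

```python
from datetime import datetime

# Abbreviated sample shaped like the getheaders output above.
headers = {
    "accession-number": "0001652044-16-000012",
    "conformed-period-of-report": "20151231",
    "conformed-submission-type": "10-K",
    "filer": {
        "company-data": {"central-index-key": "0001652044",
                         "company-conformed-name": "Alphabet Inc."},
    },
    "filer_0": {
        "company-data": {"central-index-key": "0001288776",
                         "company-conformed-name": "GOOGLE INC."},
    },
}


def list_filers(headers):
    """Return (CIK, name) pairs for every filer block in the headers."""
    filers = []
    for key, value in headers.items():
        # Assumption: filer blocks are keyed 'filer', 'filer_0', 'filer_1', ...
        if (key == "filer" or key.startswith("filer_")) and isinstance(value, dict):
            data = value.get("company-data", {})
            filers.append((data.get("central-index-key"),
                           data.get("company-conformed-name")))
    return filers


def period_of_report(headers):
    """Parse the YYYYMMDD reporting period into a date object."""
    raw = headers["conformed-period-of-report"]
    return datetime.strptime(raw, "%Y%m%d").date()


if __name__ == "__main__":
    for cik, name in list_filers(headers):
        print(cik, name)
    print(period_of_report(headers).isoformat())
```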
Example response for 'get10kurl'.
falsh> get10kurl https://www.sec.gov/Archives/edgar/data/51143/000104746919000712/0001047469-19-000712-index.html
https://www.sec.gov/Archives/edgar/data/51143/000104746919000712/a2237254z10-k.htm
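The index URLs shown in the examples follow a visible pattern: the CIK without leading zeros, the accession number with its dashes removed as the folder name, and the dashed accession number followed by `-index.html`. The sketch below rebuilds the get10kurl input from a CIK and an accession number under that assumption. The pattern is inferred from the sample URLs in this README, and `index_url` and `split_header_key` are hypothetical helpers rather than functions provided by these utilities.

```python
ARCHIVE_ROOT = "https://www.sec.gov/Archives/edgar/data"


def index_url(cik, accession):
    """Build the filing index URL from a CIK and a dashed accession number."""
    folder = accession.replace("-", "")  # folder name drops the dashes
    return f"{ARCHIVE_ROOT}/{int(cik)}/{folder}/{accession}-index.html"


def split_header_key(key):
    """Split a 'cik:accession' string, the form getheaders takes as input."""
    cik, accession = key.split(":", 1)
    return int(cik), accession


if __name__ == "__main__":
    cik, accession = split_header_key("1288776:0001652044-16-000012")
    print(index_url(cik, accession))
    print(index_url(51143, "0001047469-19-000712"))
```

Both calls reproduce URLs that appear in the examples above, which is the only evidence for the pattern offered here.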