This is a website to showcase our final project for FIN 377 - Data Science for Finance course at Lehigh University.
This is not intended as investment advice.
This project is inspired by the research paper titled Lazy Prices, written by Lauren Cohen, Christopher Malloy, and Quoc Nguyen. The paper analyzes S&P 500 companies’ quarterly and annual SEC filings between 1995-2014 to determine significant changes in language and structure over time. They found that changes to the reports predict a pessimistic future for the company in areas like earnings, profitability, and even future firm-level bankruptcies. On the other hand, firms that do not make significant changes to their quarterly and annual reports experience positive returns over time.
The main goal of our project is to replicate the work in Lazy Prices. This will lead us to generate sell or buy signals depending on the documents’ differences or lack thereof. The original research focuses on the Management’s Discussion and Analysis (MD&A) and Risk Factors sections, which we are also investigating. Our objective is to prove that greater magnitude of changes will lead to significant underperfomance.
There are 3 main parts of our project: data grab, data clean, and data analyze.
In data grab, we use sec_edgar_downloader to load the S&P 500 companies’ 10k’s and 10q’s in 2009-2010 and 2020. For each SEC document, we create a csv file that counts the number of times each word is used, excluding the most basic “filler words”, which are ‘and’, ‘the’, ‘of’, ‘to’, ‘in’, ‘a’, ‘for’, ‘on’, ‘or’, ‘as’, ‘is’, and ‘this’.
In data clean, in order to create a cosine similarity between 2010 and 2020 for each ticker, we create vectors which are also numpy arrays. We then convert these arrays from multidimensional to two-dimensional in order to plot each vector on a graph. After that, we are able to measure the cosine similarity by passing both vectors, and the result will be a value between [0,1].
In data analyze, all of the data from data clean is pulled and aggregated to an individual dataframe stored in quintiles. The cosine similarities are compared to the companies’ respective stock returns from Yahoo Finance to determine whether we should short “changers” or buy “non-changers”.
We found that a long-short portfolio constructed using this strategy did generally generate returns of 4.92% per annum. That said, we did not observe significant market outperformance for the periods we analyzed. We list the returns we generated and market returns for the period in the table below
Note: These are listed as quarterly returns and are not an annualized figure
Quarter | Return | Market Return |
---|---|---|
Q1 2010 | 4.00% | 11.09% |
Q2 2010 | 2.42% | -6.75% |
Q3 2010 | -1.70% | 8.03% |
Q4 2010 | -2.63% | 8.36% |
Q1 2020 | 6.48% | -10.54% |
Q2 2020 | 0.63$ | 12.02% |
Q3 2020 | -4.46% | 2.27% |
Q4 2020 | 4.92% | 14.93% |
To conclude, this is a valid strategy, but back end data leakage is likely leading to weak results in our analysis. Given more time, we would implement more functionality to read filing meta data, which would allow specific names to be tied to returns starting on the day after the filing date. Additionally, there would be more put into cleaning through all the filings downloaded in the downloader file.
Jack is completing his Financial Engineering Master’s degree at Lehigh University in May 2022 and will be joining Millennium as a Fixed Income Quantitative Analyst in New York City in June.
Noah is in his senior year of undergrad at Lehigh University. Following his graduation in May 2022, he will be attending UCLA’s Master’s of Financial Engineering program.
Matt is finishing up his Master’s of Business Administration degree with a concentration in Business Analytics at Lehigh University in August 2022. Following his program, he will be joining PwC as a mergers and acquisitions consultant in New York City.
Sherzod is completing his Master’s of Business Administration degree with a concentration in Business Analytics at Lehigh University in August 2022. He has experience in Uzbekistan as a Financial Analyst at VEON.
To view the GitHub repo for this website, click here.