.chapter28<-function(i=0){
"
                      Chapter 28: Term Projects

  i  Projects                               i  Projects
 --- ------------------------------        --- -----------------------------
  1  Requirements                           21 KMV model and default probability
  2  Retirement calculator                  22 Financial statement analysis
  3  Best one: CAPM, FF3, FFC4, or FF5?     23 Black-Litterman model
  4  Test of the January Effect             24 Brandt, Santa-Clara, Valkanov Model (2009)
  5  Bankruptcy prediction: Z-score         25 Exploring the TORQ database
  6  Updating a monthly data set            26 SEC filings (dealing with index files)
  7  Momentum trading strategy              27 R package called Rattle
  8  52-week high trading strategy          28 SEC 10-K: BS, IS or CF
  9  Max trading strategy                   29 SEC 10-K (Forms 3, 4 and 5)
 10  Spread from daily price                30 SEC 10-K (13-f)
 11  Event study using R                    31 SEC Mutual Fund Prospectus
 12  Monte Carlo: a slot machine            32 Census Summary Form 1 (SF1)
 13  Monte Carlo: Black Jack                33 Census Summary Form 2 (SF2)
 14  Benford Law and accounting fraud       34 Census Demographic profile
 15  Readability of 10-K filings            35 Census Redistribution
 16  Business cycle indicator               36 Census Congressional Districts 113
 17  Illiquidity, Amihud (2002)             37 Census Congressional Districts 115
 18  Liquidity, Pastor/Stambaugh (2003)     38 SCF (Survey of Consumer Finance)
 19  Spread estimation from TAQ             39 Supporting data sets and codes
 20  A reverse mortgage calculator          40 Topics taken already (updated on)

 Example #1:>.c28      # see the above list
 Example #2:>.c28()    # the same as the above
 Example #3:>.c28(1)   # see the first explanation
";.zchapter28(i)}
.c30<<-.c31<-"There are only 28 chapters."
.n28chapter<-40
.zchapter28<-function(i){
    if(i==0){
        print(.c28)
    }else{
        .printEachQ(28,i,.n28chapter)
    }
}
.c28<-.chapter28
.termProjects<-.c28
.tp<-.c28
.C28EXPLAIN1<-"Requirement of a term project
//////////////////////////////////////
Objective: This is an integral part of this course. It can be viewed as the
application of what you have learned from this course to a real-world
situation.
Format: Group project (each group could have up to three members)
Topic: The 1st type (from my list)
   1) theory and background of the topic,
   2) R programs with a short explanation of the codes,
   3) final data set (plus the codes to process the data and the source of
      the raw data). Note: please do not send me your raw data.
The 2nd type of project is to study one R package.
   1) why this package is useful
   2) a summary of the most important included functions
   3) examples of how to use them
The 3rd type of topic is to generate 20 data sets.
   1) why those data sets are important
   2) how to retrieve those data sets efficiently
   3) applications of those data sets
Each group chooses one topic from the list of potential term projects
(first come, first served, since each topic should be chosen by only one
group).
Files: each group should submit the following files
   a) a text file containing your program and final results
   b) final data set(s)
   c) a short report (maximum 15 pages, double spaced, font size 11)
   d) a PowerPoint file
Dropbox: submit your files to the dropbox on UBlearns
Presentation: each group will present their term project in front of the
whole class.
Due date: if you want my comments, you should submit your files before
your presentation. If not, you could submit your files after your
presentation.
//////////////////////////
"
.C28EXPLAIN2<-"Retirement calculator
//////////////////////////////////////
Source: http://money.cnn.com/calculator/retirement/retirement-need/
Step 1: estimate John Doe's final annual salary when he retires.
   Input variables:
   a) current salary
   b) salary growth rate (factor in the inflation rate)
   c) number of years before his retirement
   For example, if John is 35 years old and earning $50,000 now, and he
   plans to retire at 67, his final annual salary will be
   50000*(1+g)^(67-35), where g is the annual salary growth rate.
Step 2: estimate the required annual cash inflow for the first retirement
year.
   For example, we could assume that the expected cash inflow for the
   first year after retirement is 80% or 85% of his/her last annual
   salary.
Step 3: estimate the number of years of retirement. For instance, this
   value is 25 if John's life expectancy is 92 and he retires at 67
   (92-67).
Step 4: estimate the present value, at the time he retires, of a growing
   annuity.
   Input values: the 1st cash flow, a growth rate and an appropriate
   discount rate
Step 5: factor in the Social Security benefit (this could be another data
   case). Estimate the present value, at the time of retirement, of the
   Social Security benefit.
   Input values: monthly benefit, discount rate
Step 6: John's net required cumulative wealth when he retires (the result
   of Step 4 minus the result of Step 5)
Step 7: estimate John's required saving from now until that year, such as
   the annual saving or the percentage saving.
Primary Insurance Amount
https://www.ssa.gov/oact/COLA/piaformula.html
//////////////////////////
"
.C28EXPLAIN3<-"Which one is the best? CAPM, FF3, FFC4, or FF5
//////////////////////////
Objectives of this term project
   1) understand the different models: CAPM, FF3, FFC4 and FF5
   2) understand how to download and process data
   3) understand t-values, F-values and the adjusted R2
a) CAPM:
      R(IBM) = Rf + beta*(Rm - Rf)                            (1)
   where R(IBM) is IBM's mean (expected) return, Rf is the risk-free rate
   and Rm is the market mean (expected) return.
b) FF3, the Fama-French 3-factor model:
      R(IBM) = Rf + beta1*(Rm - Rf) + beta2*SMB + beta3*HML   (2)
   where SMB is small minus big, and HML is the high book-to-market ratio
   portfolio minus the low-ratio portfolio.
c) FFC4, the Fama-French-Carhart 4-factor model:
      R(IBM) = FF3 + beta4*MOM                                (3)
   where MOM is the momentum factor.
d) FF5, the Fama-French 5-factor model:
      R(IBM) = FF3 + beta4*RMW + beta5*CMA                    (4)
   where RMW is robust minus weak, and CMA is conservative minus
   aggressive.
Three questions:
   1) Which criterion?
   2) Is the performance time-period independent?
   3) In-sample estimation vs.
out-of-sample prediction
We use the adjusted R2 as our criterion to measure the performance of
each model.
Step 1: download monthly price data from Yahoo!Finance
Step 2: choose a period to run the various models
Step 3: summarize your testing results (sample statistics)
Step 4: (optional) out-of-sample prediction
Sources of data:
   1) CRSP monthly data
   2) http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html
//////////////////////////
"
.C28EXPLAIN4<-"Test of the January Effect using Excel
//////////////////////////
If the Efficient Market Hypothesis (EMH) holds, we should not expect many
market anomalies such as the January Effect, the Weekday Effect, or the
momentum strategy (buy winners and sell losers). However, many researchers
and professionals have found that returns in January are quite different
from those of other months.
Question: Are January returns statistically different from other months?
   mean return for January = mean return for non-January months   (1)
Choose about a dozen stocks to test the existence of the so-called January
Effect. A few companies are listed below. Note that the S&P500 is listed
as well.
Step 1: for one given ticker (or PERMNO), retrieve return data from the
   CRSP monthly data set
Step 2: sort those monthly returns into two groups: returns in January
   and returns in other months
Step 3: for each stock/index, test whether its two means are equal, see
   the above equation
Step 4: repeat the above procedure for 100 stocks. Comment on your
   results.
//////////////////////////
"
.C28EXPLAIN5<-"Bankruptcy prediction by using the Z-score
//////////////////////////
Altman's Z-score is used to predict the possibility that a firm goes
bankrupt. The score is a weighted average of 5 ratios based on a firm's
balance sheet and income statement. For public firms, Altman (1968) offers
the following formula.
   Z = 3.3*X1 + 0.99*X2 + 0.6*X3 + 1.2*X4 + 1.4*X5,   (1)
where the definitions of X1, X2, X3, X4 and X5 are given in the following
table.
   Variable  Definition
   --------  ----------------------------------------
   X1        EBIT/Total Assets
   X2        Net Sales/Total Assets
   X3        Market Value of Equity/Total Liabilities
   X4        Working Capital/Total Assets
   X5        Retained Earnings/Total Assets
Based on the ranges of Z-scores, we could classify public firms into the
following 4 categories. Eidleman (1995) finds that the Z-score correctly
predicted 72% of bankruptcies two years prior to the event.
   Z-score range  Description
   -------------  ----------------------------------------------
   > 3.0          Safe
   2.7 to 2.99    On alert
   1.8 to 2.7     Good chance of going bankrupt within 2 years
   < 1.80         Probability of financial distress is very high
References
Altman, Edward I., 2000, Predicting Financial Distress of Companies,
   retrieved on September 4th, 2009 from
   http://pages.stern.nyu.edu/~ealtman/Zscores.pdf
Altman, Edward I., 1968, Financial Ratios, Discriminant Analysis and the
   Prediction of Corporate Bankruptcy, Journal of Finance, 189-209.
Eidleman, Gregory J., 1995, Z-Scores: A Guide to Failure Prediction, The
   CPA Journal Online,
   https://www.easycalculation.com/statistics/altman-z-score.php
//////////////////////////
"
.C28EXPLAIN6<-"Updating a monthly Excel data set and writing an instruction
//////////////////////////
First, let's download the Excel data set
   http://canisius.edu/~yany/data/monthlyYan.xlsx
The structure of this data set is very simple: three columns named ID,
Date and Value; see the first several lines below.
   ID  date        value
   --  ----------  ------
   A   11/30/1999  38.96
   A   12/31/1999  71.39
   A   1/31/2000   61.12
   A   2/29/2000   95.91
   A   3/31/2000   96.03
   A   4/28/2000   81.83
   A   5/31/2000   67.98
   A   6/30/2000   68.1
   A   7/31/2000   37.63
   A   8/31/2000   56.33
Note (1) The frequency of the data set is monthly.
     (2) \"A\" is the stock ticker.
     (3) For stocks, the last column, called value, is the monthly
         adjusted price.
     (4) For SMB (the Fama-French factor), value is the factor itself,
         i.e., a return.
Do the following things:
   (a) find all unique IDs
   (b) update the data set
   (c) write a 2-page manual on how to use this data set
       i)   how to estimate monthly returns
       ii)  how to estimate annual returns
       iii) how to generate an n-stock matrix, such as a 5-stock matrix
//////////////////////////
"
.C28EXPLAIN7<-"Momentum trading strategy
//////////////////////////
One-phrase summary: buy winners and sell losers.
Implied assumption: within a short term (between 3 and 12 months), a
winner will remain a winner while a loser will continue to be a loser.
Two related questions:
----------------------
   1) how to tell a winner from a loser?
   2) how to conduct a test?
Objectives of this term project:
   1) learn the CRSP monthly data set
   2) learn how to use R to test the trading strategy
Source of data: CRSP monthly data
Basic logic: According to Jegadeesh and Titman (1993), it is a profitable
trading strategy to buy the past winners and sell the past losers.
Notation: check the past K-month returns, then hold a portfolio for L
months, where K = 3, 6, 9 or 12 and L = 3, 6, 9 or 12. Below we use
K = L = 6 as an example.
Trading strategy: estimate all stocks' past 6-month returns and sort the
stocks into 10 groups (deciles) according to their 6-month total returns.
Long the top decile (winners) and short the bottom decile (losers) for
the next 6 months.
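As a minimal R sketch of the 6-month formation sort described above: the data frame `ret`, its column names and the simulated numbers are illustrative assumptions, not CRSP data.

```r
# One formation date: rank stocks by past 6-month cumulative returns,
# then long the top decile and short the bottom decile.
set.seed(1)
ret <- data.frame(permno = rep(1:50, each = 6),          # 50 stocks
                  ret    = rnorm(300, 0.01, 0.08))       # 6 monthly returns each
# cumulative 6-month return for each stock
cum6 <- tapply(ret$ret, ret$permno, function(r) prod(1 + r) - 1)
# decile assignment: 1 = losers, 10 = winners
decile  <- cut(rank(cum6), breaks = 10, labels = FALSE)
winners <- names(cum6)[decile == 10]
losers  <- names(cum6)[decile == 1]
```

In a full replication, this sort is repeated every month and the winner-minus-loser portfolio return is tracked over the following 6 months.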
Procedure:
Step 0: starting month: January 1965
Step 1: retrieve CRSP data (PERMNO, DATE and RET)
Step 2: estimate the past 6-month cumulative returns
Step 3: sort all stocks into deciles according to their cumulative
        6-month returns
Step 4: long the winners (best-return group) and short the losers for the
        next 6 months
Step 5: estimate the portfolio returns
Step 6: move to the next month and repeat the above steps until 12/1989
References
----------------------
Jegadeesh, Narasimhan, and Sheridan Titman, 1993, Returns to Buying
   Winners and Selling Losers: Implications for Stock Market Efficiency,
   Journal of Finance 48 (1), 65-91.
   http://canisius.edu/~yany/doc/momentumJF1993.pdf
Appendix A: Table 1 from Jegadeesh and Titman (1993).
   http://canisius.edu/~yany/doc/momentumTable1.pdf
   http://canisius.edu/~yany/doc/momentumTable1.png
//////////////////////////
"
.C28EXPLAIN8<-"Replicate the 52-week high trading strategy
//////////////////////////////////////
George and Hwang (2004) show that we could design a profitable trading
strategy based on the 52-week high. First, they estimate a ratio by
dividing today's price by its 52-week high. Based on such a ratio, all
stocks are sorted from the highest to the lowest. The stocks belonging to
the top (bottom) 30% are labeled winners (losers). Again, the trading
strategy is to buy winners and sell losers. They demonstrate that such a
trading strategy is quite profitable, with an average return difference of
0.45% per month between the winner and loser portfolios.
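A minimal R sketch of the price-to-52-week-high ratio described above; the simulated price path is an illustrative assumption (in practice you would use daily adjusted closes, e.g., from Yahoo!Finance).

```r
# Ratio of today's price to the highest price over the previous 252
# trading days (roughly 52 weeks).
set.seed(2)
price <- 100 * cumprod(1 + rnorm(300, 0.0005, 0.02))  # simulated daily closes
n <- length(price)
ratio <- sapply(253:n, function(i) price[i] / max(price[(i - 252):(i - 1)]))
# values near (or above) 1 mean the stock trades near its 52-week high
```

With CRSP data, this ratio would be computed for every stock each month before sorting into the top and bottom 30%.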
Objectives of this term project:
   1) understand how to download daily data from Yahoo!Finance [see
      versions #1 and #2 below]
   2) understand how to use R to process data
   3) confirm or reject the so-called 52-week high trading strategy
Time period: as long as possible [versions #1 and #2]
             July 1963 to December 2018 [version #3]
Basic logic: According to George and Hwang (2004), a profitable trading
strategy can be based on the ratio of the current stock price to its
52-week high.
Trading strategy: estimate each stock's 52-week high, estimate the ratio
of today's price over its 52-week high, and sort the stocks from the
highest ratio to the lowest. Treat the top 30% as winners and the bottom
30% as losers. Buy winners and sell losers.
Procedure for version #1:
Step 0: formulate your trading strategy: for example,
           ratio > 0.8 you buy
           ratio < 0.3 you sell
        where
                    price - 52wLow
           ratio = ------------------
                   52wHigh - 52wLow
Step 1: download one stock from Yahoo!Finance [choose max for the time
        period]
Step 2: estimate returns
Step 3: sort the data from the earliest to the latest
Step 4: starting from observation #253, estimate the 52wHigh and 52wLow
Step 5: calculate the ratio
Step 6: based on your trading strategy, long or short the stock for the
        next period
Step 7: generate a column of returns for this trading strategy
Step 8: test whether this is a profitable trading strategy (the benchmark
        is the long-only trading strategy)
Procedure for versions #2 and #3 (CRSP data):
Step 1: load the data sets stockDaily and stockMonthly
Step 2: starting month: July 1963
Step 3: estimate each stock's 52-week high and the ratio price/52-week
        high
Step 4: sort all stocks from the highest ratio to the lowest
Step 5: choose the top 30% as winners and the bottom 30% as losers
Step 6: estimate equal-weighted portfolio returns for both the winner and
        loser portfolios
Step 7: move to the next month and repeat the above steps until the last
        month (December 2001)
Step 8: conduct a test
Reference
George, Thomas J, and Chuan-Yang
Hwang, 2004, The 52-week High and Momentum Investing, Journal of Finance
   54, 5, 2145-2176.
//////////////////////////////////////
"
.C28EXPLAIN9<-"Replicate the so-called Max trading strategy
//////////////////////////////////////
Bali, Cakici and Whitelaw (2011) find that sorting stocks by their maximum
daily returns (MAX) in the previous month could produce a monthly return
difference of more than 1% between the lowest and highest MAX deciles. In
addition, the alphas from running the Fama-French-Carhart 4-factor model
for those two extreme portfolios are significantly different. Thus, we
could design a profitable trading strategy based on stocks' last-month
extreme daily returns.
Sources of data: stockDaily and stockMonthly.RData from CRSP
Objectives of this term project (version #1):
   1) understand how to download data from Yahoo!Finance
   2) prove or disprove the so-called Max trading strategy
Objectives of this term project (versions #2 and #3):
   1) understand the CRSP database
   2) understand how to use Excel or R to retrieve and process data
   3) prove or disprove the so-called Max trading strategy by replicating
      Table 1 of Bali et al. (2011)
Basic logic: According to Bali et al. (2011), some investors like stocks
with lottery-type payoffs, which have big past returns with a small
probability.
Period: July 1962 to December 2005
Trading strategy: estimate each stock's maximum daily return in the last
month and sort stocks into 10 groups (deciles) according to their last
month's maximum daily returns. Long the top decile (winners) and short
the bottom decile (losers) for one month.
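A minimal R sketch of the MAX sort described above; the data frame `daily`, its column names and the simulated returns are illustrative assumptions, not CRSP data.

```r
# Previous-month MAX per stock: the maximum daily return within the month,
# then a decile sort on that measure.
set.seed(3)
daily <- data.frame(permno = rep(1:20, each = 21),   # 20 stocks
                    ret    = rnorm(420, 0, 0.02))    # 21 trading days each
mx  <- tapply(daily$ret, daily$permno, max)          # last month's MAX
dec <- cut(rank(mx), breaks = 10, labels = FALSE)    # MAX deciles
highMAX <- names(mx)[dec == 10]                      # highest-MAX decile
lowMAX  <- names(mx)[dec == 1]                       # lowest-MAX decile
```

The full replication repeats this sort each month and compares the subsequent monthly returns of the two extreme deciles.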
Procedure (for version #1):
Step 0: choose your trading strategy, e.g.,
           if ratio > 0.9, long
           if ratio < 0.1, short
Step 1: download daily data for one stock
Step 2: estimate the maximum daily return of the previous month
Step 3: based on your trading strategy, long or short
Step 4: repeat until the last month
Step 5: test
Procedure (for version #3):
Step 0: starting month: July 1962
Step 1: load stockDaily
Step 2: estimate the maximum daily returns of the previous month, i.e.,
        June 1962
Step 3: sort all stocks into deciles according to their maximum last-month
        daily returns
Step 4: long the top 10% and short the bottom 10%
Step 5: load stockMonthly and estimate the portfolio returns and their
        difference
Step 6: move to the next month and repeat the above steps until the last
        month (December 2005)
References
Bali, Turan G., Nusret Cakici, and Robert F. Whitelaw, 2011, Maxing Out:
   Stocks as Lotteries and the Cross-Section of Expected Returns, Journal
   of Financial Economics 99, 427-446.
//////////////////////////////////////
"
.C28EXPLAIN10<-"Spread estimation from daily prices
//////////////////////////////////////
Spread is defined as the difference between the ask and the bid. More
generally, it is the difference between two prices or interest rates. In
stock trading, it is the difference between the current bid and ask prices
for a stock (the bid/ask or bid/offer spread). In futures trading, it is
the price difference between delivery months for the same commodity or
asset. In bond trading, it is the difference between yields of bonds with
similar quality and different maturities, or of different quality and the
same maturity. In underwriting, it is the difference between what the
issuer receives from the underwriter and what the underwriter receives
from the public (the underwriting spread).
   http://lexicon.ft.com/Term?term=spread
Roll (1984) designs a method to estimate the spread by using the first
order covariance of price changes.
   S = 2*sqrt(-cov(A, B))        (1)
where
   A = deltaP(t-1)
   B = deltaP(t)
Objectives of this term project
   1) understand how to download and process daily data from
      Yahoo!Finance
   2) understand the logic behind the above formula
   3) estimate Roll's spread for a dozen stocks
   4) comment on your results
Source of data: CRSP
//////////////////////////////////////
"
.C28EXPLAIN11<-"Event Study using R
//////////////////////////////////////
One example: testing the impact of HSIC being added to the S&P500 on
March 18, 2015. The basic idea of an event study is to test whether the
AR (Abnormal Return) is statistically significant. The definition of
abnormal return is given below.
   AR = realized return - expected return        (1)
To estimate the expected return, we apply the following linear regression.
   y = alpha + beta*x                            (2)
where y is the expected return and x is the market return on that day. To
estimate the two parameters, alpha and beta, we run a linear regression
(or apply the related formulae) over an estimation period 252 days long,
starting the day before our event window and counting backward, see below.
        Estimation period                 Event window
   |-------------------------------|-------------|----------|
        n days before                  event day    m days after
Here is the design.
   i)   The event day is 3/18/2015
   ii)  Our event window: 10 days before and 10 days after
   iii) The estimation period: from 253 days before to one day before our
        event window
Thus, roughly, we could download daily data from 2/1/2014 to 4/22/2015.
Step 1: download daily price data for HSIC and the S&P500 (^GSPC).
Step 2: keep only the adjusted price, then sort the data from the oldest
        to the latest.
Step 3: estimate daily returns.
Step 4: highlight the event day, a window around the event and the
        estimation period. For example, we could use red for the event
        day, the 10 days before and the 10 days after, and highlight the
        estimation period in green.
Step 5: based on the estimation period, apply the following formulae to
        estimate the intercept, slope, R2 and standard error (assume
        column B holds the stock returns and column C the S&P500 returns)
        i)   intercept       =intercept(B,C)
        ii)  slope           =slope(B,C)
        iii) R2              =rsq(B,C)
        iv)  standard error  =steyx(B,C)
Step 6: estimate the expected return, AR (abnormal return), CAR
        (cumulative abnormal return) and the t-value for the abnormal
        return
        Expected return = intercept + slope * market return
        AR (abnormal return) = realized return - expected return
        CAR (cumulative abnormal return) = sum of all ARs up to today
        T-value for AR = AR/standard error
Note: to keep our spreadsheet clean, we have two choices:
        i)  hide many rows
        ii) copy the above four output values to a place near our event
            window
Comment on your results.
//////////////////////////////////////
"
.C28EXPLAIN12<-"Monte Carlo simulation to mimic a slot machine
//////////////////////////////////////
Objectives:
   1) understand the related statistics
   2) apply the Excel randbetween() function
   3) learn to link a picture to a cell
   4) use the vlookup() function to search a table of pictures
Task #1: a simple case with just three numbers. Assume that we have three
objects: apple, banana and eggplant. We enter three numbers and try to
output the three corresponding fruits by using the Excel vlookup()
function.
   Q1: What is the probability of winning, defined as matching all three?
   Q2: Assuming the cost of one play is $1, what is the winning prize if
       this is a fair game?
   Q3: What is the expected value if the cost of one play is $1 and the
       winning prize is $7?
Task #2: design a slot machine with 3 objects with pictures.
Step 1: generate the entries. Below, we use C16 for apple as an example.
        Search online to find an apple image. Right-click the picture of
        the apple, then choose \"Format Picture\".
Step 2: manually enter three numbers in cells B3, C3 and D3. Our objective
        is to search our picture table (fruit pictures) to output the
        three corresponding fruits. In this case, we expect to see apple,
        apple and banana.
Step 3: click cell C16 (the cell, not the apple), copy, then select our
        destination cell, i.e., F3, and from Paste choose \"Linked
        Picture (I)\".
Step 4: click \"Formula\", \"Define Name\", where X will be our image
        column, i.e., C16:C18, Y is our indicator, B3, and Z is our
        number column, i.e., B16:B18. Below, we define a name called
        firstNumber.
Step 5: click the picture in F3 and replace =$C$16 with =firstNumber (or
        another name you defined). Repeat the same procedure for the
        other two cells.
Task #3: build a slot machine with 10 different fruits and assume that the
machine has a slight advantage for its owner; for example, over 1 million
plays the casino would make a profit of $100.
   Q4: What is the winning prize if we have three identical pictures?
   Q5: What is your result after playing 100 times?
References
http://en.wikipedia.org/wiki/Slot_machine
//////////////////////////////////////
"
.C28EXPLAIN13<-"Monte Carlo simulation to mimic Black Jack
//////////////////////////////////////
This is a 2-player game: a dealer and a player. Below, we assume that you
are the player.
Rule #1: cards 2 to 10 have their face value, while J, Q, and K are worth
10 points and an Ace is worth either 1 or 11 points (player's choice).
Terminology:
   Blackjack: one Ace plus any card worth 10 points.
   Lose: the player's bet is taken by the dealer.
   Win: the player wins as much as he bet.
   Blackjack (natural): the player wins 1.5 times the bet.
   Push: the player keeps his bet, neither winning nor losing money.
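One deal under the card values above can be simulated in R; counting an Ace as 11 is a simplifying assumption here (the 1-or-11 choice is ignored in this minimal sketch).

```r
# Card values: 2-10 at face value, J/Q/K worth 10, Ace counted as 11.
deck <- rep(c(2:10, 10, 10, 10, 11), 4)   # 52 cards in four suits
set.seed(4)
hand  <- sample(deck, 2)                  # the player draws two cards
total <- sum(hand)
natural <- (total == 21)                  # a natural blackjack pays 1.5x
```

Repeating the deal many times (and adding the dealer's hand and the hit/stand decisions) turns this into a Monte Carlo estimate of the player's winning probability.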
Step 1: the dealer draws two cards, one face up, while the player draws
        two cards (face up)
Step 2: the player could draw a third card
Win or lose: if the sum of your cards is less than 21 and bigger than the
dealer's, you win.
http://www.pagat.com/banking/blackjack.html
//////////////////////////////////////
"
.C28EXPLAIN14<-"Benford Law and accounting fraud detection
//////////////////////////////////////
Benford's Law, also called the First-Digit Law, gives different
frequencies for the 9 first digits from 1 to 9. Conventional wisdom would
conclude that each (first) digit should have roughly the same frequency,
i.e., 1/9 = 0.1111 = 11%. However, according to Benford's Law, the lower
the value of a digit, the higher its probability. In other words, we will
see more values with a leading digit of 1 than with a leading digit of 2.
The probability of each digit is given by the following formula.
   Prob(d) = log10((d+1)/d)        (1)
where Prob() is the probability (frequency), d is the digit, and log10()
is the log function with a base of 10. For Excel, log10() is the same as
log().
   Digit  Formula       Probability
   -----  ------------  -----------
   1      =log10(2/1)   0.301
   2      =log10(3/2)   0.176
   3      =log10(4/3)   0.125
   4      =log10(5/4)   0.097
   5      =log10(6/5)   0.079
   6      =log10(7/6)   0.067
   7      =log10(8/7)   0.058
   8      =log10(9/8)   0.051
   9      =log10(10/9)  0.046
   ------------------   -----------
   Total                100%
Objectives:
   1) understand Benford's Law
   2) download about a dozen companies' annual reports
   3) estimate the distributions of the first digits
   4) report your results and discuss
Procedure to download annual financial statements:
Step 1: go to Yahoo!Finance, http://finance.yahoo.com/
Step 2: enter a ticker, such as IBM
Step 3: find the three types of financial statements
Step 4: download those financial statements
Note 1: the Excel function to get the first digit is =left(cell, 1)
Note 2: you could use the Excel countif() function.
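The same first-digit probabilities and counts can be produced in R; the numbers in x are illustrative placeholders, not real financial-statement figures.

```r
# Expected Benford frequencies from Prob(d) = log10((d+1)/d)
d <- 1:9
benford <- log10((d + 1)/d)
round(benford, 3)                     # 0.301 0.176 0.125 ... 0.046
# First digit of each (positive) number in a vector x
x <- c(1234, 58, 902, 13, 7000)       # illustrative figures
firstDigit <- as.integer(substr(as.character(x), 1, 1))
table(factor(firstDigit, levels = 1:9))
```

Comparing the observed table against `benford` (e.g., with a chi-squared test) is the core of the fraud-detection exercise.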
References
Accounting Web, 2014, 20 Ways You Can Detect Fraud,
   http://www.accountingweb.com/aa/law-and-enforcement/20-ways-you-can-detect-fraud
Sharma, Anuj, and Prabin Kumar Panigrahi, 2012, A Review of Financial
   Accounting Fraud Detection based on Data Mining Techniques,
   International Journal of Computer Applications 39, 1,
   https://arxiv.org/ftp/arxiv/papers/1309/1309.3944.pdf
McGinty, Jo Craven, 2014, Accountants Increasingly Use Data Analysis to
   Catch Fraud; Auditors Wield Mathematical Weapons to Detect Cheating,
   http://www.wsj.com/articles/accountants-increasingly-use-data-analysis-to-catch-fraud-1417804886
Testing Benford Law, http://testingbenfordslaw.com/
What is Benford Law,
   https://en.wikipedia.org/wiki/Benford%27s_law#cite_note-Nigrini-19
//////////////////////////////////////
"
.C28EXPLAIN15<-"Readability of 10-K filings and a firm's performance
//////////////////////////////////////
Objectives:
   1) understand the usage of 10-K filings
   2) learn how to parse a 10-K
   3) understand the Fog index and learn how to calculate it for each
      10-K filing
   4) comment on your results
Sources of data
   a) SEC EDGAR (Electronic Data Gathering, Analysis and Retrieval)
   b) I have all 10-K filings from Q1 1993 to Q2 2016 (the number of
      filings is 210,842 and the size is 440G)
Structured vs. unstructured data
Unstructured information has the lion's share of all information, 70% to
80%, and it is reported that 80% of structured information came from
unstructured information. On the other hand, SEC filings are an important
source of information (a gold mine) since public companies, certain
insiders, and broker-dealers are required to make regular SEC filings.
Text analysis
Text is one of the most important types of unstructured information. Text
analysis, also called text mining or text data mining and roughly
equivalent to text analytics, refers to the process of deriving
high-quality information from text.
For example, we could look at the frequency of each word, keywords, the
number of lines and sentences, the frequency of positive vs. negative
words, the tone of the speech, etc. For example, we could compare the top
words used by Reagan in 1994 with those used by Obama in 2008.
Text analysis for finance and accounting
Applying text analysis to finance and accounting does not have a long
history. Li (2008) shows that the readability of 10-K filings has a
statistically significant impact on a firm's subsequent performance. The
readability measure used by Li (2008) is called the Fog index, defined
below.
   Fog index = 0.4*(n + p)        (1)
where n is the average number of words per sentence, and p is the
percentage of complex words. A complex word is a word with more than two
syllables.
Because defining and measuring readability in the context of financial
disclosures becomes important with the increasing use of textual analysis
and the SEC's plain English initiative, Loughran and McDonald (2014) show
that the Fog index, the most commonly applied readability measure, is
poorly specified in financial applications. Of Fog's two components, one
is misspecified and the other is difficult to measure. They suggest using
the file size of a 10-K filing as a simple readability proxy and show
that it outperforms the Fog index. Another advantage is that it does not
require document parsing, which facilitates replication.
According to Loughran and McDonald (2014), there are 632 different SEC
forms. On the other hand, most researchers use only one or two forms,
such as the 10-K. Thus, the SEC filings database is a gold mine waiting
to be explored.
Reference
Li, Feng, 2008, Annual report readability, current earnings, and earnings
   persistence, Journal of Accounting and Economics 45, 221-247.
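A rough R sketch of the Fog index formula above; the sample sentence and the vowel-group syllable counter are crude illustrative assumptions, not a production 10-K parser.

```r
# Fog = 0.4 * (n + p): n = average words per sentence,
# p = percentage of words with more than two syllables.
text <- 'The firm reported satisfactory operational performance. Revenue increased.'
sentences <- unlist(strsplit(text, '[.!?]+'))
sentences <- sentences[nchar(trimws(sentences)) > 0]
words <- unlist(strsplit(tolower(text), '[^a-z]+'))
words <- words[nchar(words) > 0]
# crude syllable proxy: count vowel groups in each word
nSyl <- sapply(gregexpr('[aeiouy]+', words), function(m) sum(m > 0))
n <- length(words)/length(sentences)   # average words per sentence
p <- 100*mean(nSyl > 2)                # percentage of complex words
fog <- 0.4*(n + p)
```

A real application would loop this over the cleaned text of each 10-K filing and relate the resulting index to subsequent firm performance.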
source(\"http://datayyy.com/textAnalysis.txt\") ////////////////////////////////////// " .C28EXPLAIN16<-"business cycle indicator ////////////////////////////////////// There exist many profitable trading strategies, such as the individual stock momentum, first documented in Jegadeesh and Titman (1993), the industry momentum in Moskowitz and Grinblatt (1999), the effect of the 52-week high price in George and Hwang (2004), and the effect of the maximum daily return in a month in Bali et al. (2011). However, Yan and Zhang (2016) argue that those profitable trading strategies would not be profitable during difficult time. In other words, investors would change their behavior during a recession. Objectives: 1) understand the concept of business cycle 2) generate a business cycle indicator 3) if possible run CAPM by including this indicator Source of data: 1) Business cycle data is from the National Bureau of Economic Research center. The original starting date is June 1854. 2) stock data is from Yahoo!Finance. Comments on your result References Bali, Turan G., Nusret Cakici, and Robert F. Whitelaw, 2011, Maxing Out: Stocks as Lotteries and the Cross-Section of Expected Returns, Journal of Financial Economics 99 427-446. George, Thomas J, and Chuan-Yang Hwang, 2004, The 52-week High and Momentum Investing, Journal of Finance 54, 5, 2145-2176. Grinblatt, Mark, and Bing Han, 2005, Prospect theory, mental accounting, and momentum, Journal of Financial Economic 78, 311-339. Jegadeesh, N., and S. Titman, 1993, Returns to Buying Winners and Selling Losers: Implications for Stock Market Efficiency, Journal of Finance 48, 65-91. Moskowitz, Tobias, and Mark Grinblatt, 1999, Do industries explain momentum? Journal of Finance 54, 2017-2069. 
Yan, Yuxing, and Shaojun Zhang, 2016, Business cycle, investors'
   preferences and trading strategies, Frontiers of Business Research in
   China (forthcoming).
Table 1 from Yan and Zhang (2016)
For a peak, we assign a positive 1, while for a trough, we assign a
negative 1. Any months between those peaks and troughs are linearly
interpolated, see Panel B below. P stands for Peak and T for Trough;
T(t-1) is the previous trough and P(t-1) is the previous peak.
                                       Contraction  Expansion       Cycle
 Peak (P)          Trough (T)          P to T  T(t-1) to P  T(t-1) to T  P(t-1) to P
 ----------------- ------------------  ------  -----------  -----------  -----------
 May 1923(II)      July 1924 (III)         14           22           36           40
 October 1926(III) November 1927 (IV)      13           27           40           41
 August 1929(III)  March 1933 (I)          43           21           64           34
 May 1937(II)      June 1938 (II)          13           50           63           93
 February 1945(I)  October 1945 (IV)        8           80           88           93
 November 1948(IV) October 1949 (IV)       11           37           48           45
 July 1953(II)     May 1954 (II)           10           45           55           56
 August 1957(III)  April 1958 (II)          8           39           47           49
 April 1960(II)    February 1961 (I)       10           24           34           32
 December 1969(IV) November 1970 (IV)      11          106          117          116
 November 1973(IV) March 1975 (I)          16           36           52           47
 January 1980(I)   July 1980 (III)          6           58           64           74
 July 1981(III)    November 1982 (IV)      16           12           28           18
 July 1990(III)    March 1991(I)            8           92          100          108
 March 2001(I)     November 2001 (IV)       8          120          128          128
 December 2007(IV) June 2009 (II)          18           73           91           81
//////////////////////////////////////
"
.C28EXPLAIN17<-"Illiquidity measure, Amihud (2002)
//////////////////////////////////////
Objective: estimate 12 stocks' illiquidity measures for each month in
2016. Note: you choose the other 6 stocks yourself. Comment on your
findings. 6 stock symbols are given below.
     Company name             Ticker  Industry
     ----------------------   ------  ------------------------
   1 Microsoft Corporation    MSFT    Application software
   2 Apple Inc.               AAPL    Personal computers
   3 Citigroup Inc.           C       Money center banks
   4 Wal-Mart Stores, Inc.    WMT     Discount, variety stores
   5 Home Depot, Inc.         HD      Home improvement services
  .. ......................   ...     ......
12 General Electric Corp    GE      Technology
Amihud's (2002) illiquidity measure uses the absolute daily return over its corresponding dollar trading volume. A monthly stock illiquidity measure is the mean of the daily illiquidity measures.

               1          |Ri|
  illiq(t) =  --- * sum( ------- )        (1)
               n         Pi * Vi

where illiq(t) is a monthly illiquidity measure, n is the number of trading days within the month, Ri is the daily return on day i, Vi is the trading volume on day i, and Pi is the closing price of the underlying stock on day i. The Amihud illiquidity measure includes two components: the spread and the impact of trading. Illiquidity is the opposite of liquidity, i.e., a higher value indicates lower liquidity and a smaller value indicates higher liquidity. Why?
Step 1: download daily price data from Yahoo Finance
Step 2: estimate daily returns and dollar trading volume
Step 3: estimate the ratio
Step 4: estimate monthly illiquidity measures
References
Amihud, Yakov, 2002, Illiquidity and stock returns, Journal of Financial Markets 5, 31-56. "
.C28EXPLAIN18<-"liquidity measure, Pastor and Stambaugh (2003)
//////////////////////////////////////
Objectives:
 1) Understand the logic of the measure
 2) Learn how to download and process data from Yahoo!Finance
 3) Estimate an individual stock's liquidity
Basic logic: Pastor and Stambaugh (2003) design the following regression to estimate an individual stock's liquidity.

  y(t) = alpha + beta1*x1(t-1) + beta2*x2(t-1) + error(t)        (1)

where y(t) is the excess stock return on day t, defined as R(t)-Rm(t), R(t) is the stock return and Rm(t) is the market return at time t; x1(t-1) is the lagged stock return, i.e., R(t-1); and x2(t-1) is the lagged dollar trading volume, i.e., x2(t-1)=P(t-1)*V(t-1), where P(t-1) is the daily closing price of the stock at t-1 and V(t-1) is the daily trading volume at t-1. The regression is based on the daily data within each month, with a minimum of 15 observations.
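Regression (1) above can be sketched in base R. The following is a minimal illustration on simulated daily data for one stock-month; all numbers are made up, and the real exercise would use data downloaded from Yahoo!Finance.

```r
# Sketch of regression (1) for one stock-month, using simulated data.
set.seed(1)
n  <- 21                             # trading days in the month (>= 15 required)
R  <- rnorm(n, 0, 0.02)              # daily stock returns
Rm <- rnorm(n, 0, 0.01)              # daily market returns
P  <- 50 * cumprod(1 + R)            # daily closing prices
V  <- round(runif(n, 1e5, 1e6))      # daily share volume

y  <- (R - Rm)[2:n]                  # excess return on day t
x1 <- R[1:(n - 1)]                   # lagged stock return
x2 <- (P * V)[1:(n - 1)]             # lagged dollar trading volume
fit   <- lm(y ~ x1 + x2)
beta2 <- coef(fit)['x2']             # slope on lagged dollar volume
beta2
```

Running this regression once per stock-month gives the monthly series of beta2 estimates.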
The liquidity measure for an individual stock in each month is defined as:

  liquidity measure = beta2        (2)

For the first trial, we ignore other constraints. The market liquidity is the equally weighted average of individual stocks' liquidity, scaled by the market capitalization.
Procedure:
Step 1: retrieve daily data
Step 2: generate y, x1, x2 for each stock
Step 3: run regression (1) to estimate beta2 for each month
References
Pastor, L. and Stambaugh, R., 2003, Liquidity risk and expected stock returns, Journal of Political Economy 111, 642-685.
////////////////////////////////////// "
.C28EXPLAIN19<-"Spread estimation from TAQ (Trade and Quote) high-frequency data
//////////////////////////////////////
Objectives:
 1) Understand the structure of the TAQ database
 2) Use Excel to retrieve data from one day's data sets
 3) Estimate the spread for 10 stocks
High-frequency trading has attracted lots of attention because of its huge profits, because it is not clear whether it is fair to small investors, and because of its impact on the health of the stock market. According to Investopedia, HFT (High-Frequency Trading) is defined as: A program trading platform that uses powerful computers to transact a large number of orders at very fast speeds. High-frequency trading uses complex algorithms to analyze multiple markets and execute orders based on market conditions. Typically, the traders with the fastest execution speeds will be more profitable than traders with slower execution speeds. As of 2009, it was estimated that more than 50% of exchange volume came from high-frequency trading orders. To understand HFT, we have to understand the TAQ (Trade and Quote) database.
Data Sets: November 1, 2004 is randomly selected as our day; see its 4 data sets below. Two index files have an extension of .idx, while two data files have an extension of .bin.
12/01/2004  04:03 PM     1,800,548,334 Q200411a.bin
12/01/2004  04:03 PM           182,424 Q200411a.idx
12/01/2004  04:08 PM       184,899,853 T200411a.bin
12/01/2004  04:08 PM           169,334 T200411a.idx
               4 File(s) 1,985,799,945 bytes
References
Philips, Matthew, What Michael Lewis Gets Wrong About High-Frequency Trading, 4/1/2014
http://www.bloomberg.com/bw/articles/2014-04-01/what-michael-lewis-gets-wrong-about-high-frequency-trading
Appendix A: first several lines from CQ (Consolidated Quotes) from TAQ
symbol date     time    bid   ofr   bidsiz ofrsiz mode EX MMID
A    20040401 8:00:02 30.62 32.64      1      1   12  P
A    20040401 8:11:40 29.68 33.58     20     20   12  P
A    20040401 8:12:56 30.7  33.58      2     20   12  P
A    20040401 8:30:02  0     0         0      0   12  T  BRUT
A    20040401 8:30:02  1   100         1      1   12  T  CAES
A    20040401 8:30:02  0     0         0      0   12  T  DATA
A    20040401 8:30:02  0     0         0      0   12  T  MADF
Table 1: Structure of a binary index file. Each record is 22 bytes with 4 variables.
 # Name of the variable  Meaning           Size Type
 1 Ticker                Stock symbol       10  Character
 2 Date                  Trading date        4  Integer
 3 Begrec                Beginning record    4  Integer
 4 Endrec                Ending record       4  Integer
Table 2: Structure of a binary CT (Consolidated Trade) file. Each record is 29 bytes with 8 variables.
 # Name of the variable  Meaning           Size Type
 1 Time                  Trading time        4  Integer
 2 Price                 Trading price       8  Float
 3 Tseq                  Sequence number     4  Integer
 4 Size                  Trading size        4  Integer
 5 G127                  G127 rule           2  Integer
 6 CORR                  Correction          2  Integer
 7 COND                  Sale condition      4  Character
 8 Ex                    Exchange            1  Character
Table 3: Structure of a binary CQ (Consolidated Quote) file. Each record is 39 bytes with 9 variables.
 # Name of the variable  Meaning              Size Type
 1 Time                  Trading time           4  Integer
 2 Bid                   Bid price              8  Float
 3 Ofr                   Ask price              8  Float
 4 Qseq                  Sequence number        4  Integer
 5 Bidsiz                Bid size               4  Integer
 6 Asksiz                Ask size               4  Integer
 7 MODE                  Quote condition        2  Integer
 8 EX                    Exchange               1  Character
 9 MMID                  NASDAQ market maker    4  Character
////////////////////////////////////// "
.C28EXPLAIN20<-"Reverse mortgage calculator
//////////////////////////////////////
EXAMPLE #1: John Bosworth, Age 68
Home Value - $250,000
Home Equity - $210,000
Approximate Mortgage Balance - $40,000
Challenge: John is a widower who lives at home alone. He would like to keep his home, but is having trouble making payments and meeting expenses. His monthly mortgage payment is $611. Even with both Social Security income and a pension, he is still short by $187 per month.
Solution: John takes out a tax-free reverse mortgage for $142,496. He takes a lump sum of $40,000 and applies it to his existing mortgage, and takes the balance in monthly payments of $681. After paying the mortgage off entirely, John's monthly income rises to $1,291. That's $611 per month saved on the mortgage payment, plus another $681 from the reverse mortgage.
EXAMPLE #2: Craig Jenkins, Age 82, and Sylvia Jenkins, Age 79 (reverse mortgages are calculated using the age of the youngest home owner)
Home Value - $375,000
Home Equity - $375,000
Challenge: Craig and Sylvia both take medication to stay in good health. The cost of monthly meds and treatments makes it difficult for them to find the money needed to maintain the quality of life they once enjoyed.
Solution: They take out a tax-free reverse mortgage with the option of one lump sum totaling $218,419, or a monthly income of $1,495. The extra cash flow from their reverse mortgage more than covers their monthly cost for medication, and allows Craig and Sylvia more freedom with much less stress.
EXAMPLE #3: Kathy Tobias, Age 63, and Rinaldi Tobias, Age 71 (reverse mortgages are calculated using the age of the youngest home owner)
Home Value - $165,000
Home Equity - $165,000
Challenge: Kathy and Rinaldi would like to spend their retirement traveling around the U.S. in their RV, but don't have the extra money they would need to help pay for rising gas prices and other added travel expenses.
Solution: They take out a tax-free reverse mortgage of $82,419. This will give them an extra $519 per month which they can use any way they'd like, and more than supplements their need for gas and RV maintenance.
EXAMPLE #4: Gordon Penilla, Age 62, and Joanne Penilla, Age 65 (reverse mortgages are calculated using the age of the youngest home owner)
Home Value - $850,000
Home Equity - $850,000
Challenge: Gordon and Joanne have no real debts, and their monthly income is adequate for them to live life as planned, but they would like to help out with the cost of college tuition for a grandchild. For that, their monthly income and savings do not suffice.
Solution: Gordon and Joanne take out a tax-free reverse mortgage credit line allowing up to $265,411. Each grandparent can now bestow a monetary gift to the grandchild, the amount being that which is currently allowed by law.
Note 1: Reverse mortgage proceeds are based upon the current interest rates at the time the loan closes, the age of the youngest borrower, and the equity in the home. The examples above are based on an interest rate of 6.26%.
Note 2: Borrowers can lock in rates for 60 days from the date of application to the closing. All rates adjust weekly; the rate for closing is determined by the weekly rate set on Tuesdays of each week (excluding Federal Holidays) and stays valid until the following Monday.
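The fixed monthly payments in the examples above are essentially annuity payments drawn against the reverse-mortgage principal. A minimal base-R sketch follows; the function name annuityPayment is ours, the 6.26% rate comes from Note 1, and the 20-year horizon is an assumption, so the result will not match the vendor's actuarial figures exactly.

```r
# Monthly payment that amortizes a principal PV over years*12 months:
# pmt = PV * i / (1 - (1 + i)^-n), where i is the monthly rate.
annuityPayment <- function(PV, annualRate = 0.0626, years = 20) {
  i <- annualRate / 12          # monthly interest rate
  n <- years * 12               # number of monthly payments
  PV * i / (1 - (1 + i)^(-n))
}
annuityPayment(82419)           # principal from Example #3
```

A real reverse-mortgage tenure payment also depends on the borrower's life expectancy, which is why the quoted $519 differs from a plain annuity.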
http://www.seacoastreversemortgage.com/loanOptions/Custom%20Pages/Scenario%20Examples/
http://www.kiplinger.com/article/retirement/T035-C000-S001-reverse-mortgages-risky-for-boomers.html
////////////////////////////////////// "
.C28EXPLAIN21<-"KMV model and default probability
//////////////////////////////////////
Objective:
 1) Estimate the market value of total assets and its volatility for the KMV model
 2) Estimate the default point
 3) Estimate the default probability
KMV stands for Kealhofer, McQuown and Vasicek, who founded a company focusing on measuring default risk. The KMV methodology is one of the most important methods to estimate the probability of default for a given company by using its balance sheet information and the equity market information. The objective here is to estimate the market value of total assets (A) and its corresponding volatility (sigmaA). The result will be used to estimate the distance to default and the default probability. The basic idea is to treat the equity of a firm as a call option with the debt as its strike price. Let us look at the simplest example. For a firm, if its debt is $80 and equity is $20, then the total assets will be $100. If the assets jump to $110 and the debt remains the same, the equity increases to $30. On the other hand, if the assets drop to $90, the equity will be only $10. Since the equity holders are the residual claimants, their value has the following expression.

  E = max(assets - debt, 0) = max(A - D, 0)        (1)

Recall that for a call option, we have the following payoff function.

  Payoff(call) = max(ST - K, 0)        (2)

This means that we could treat equity as a call option with debt as the exercise price. With appropriate notation, we will have the following formulae for a firm's equity. The KMV model is defined below.
  E = A*N(d1) - D*e^(-rT)*N(d2)

        ln(A/D) + (r + 0.5*sigmaA^2)*T
  d1 = --------------------------------        (3)
              sigmaA * sqrt(T)

  d2 = d1 - sigmaA*sqrt(T)

On the other hand, the following relationship between the volatilities of the equity and the total assets holds, where delta = dE/dA = N(d1).

            N(d1)*A*sigmaA
  sigmaE = ----------------        (4)
                  E

Since d1 and d2 are defined by the above equations, we have two equations for two unknowns (A and sigmaA). Thus, we could use a trial-and-error or simultaneous-equation method to solve for those two unknowns. Eventually, we want to solve the following two equations for A and sigmaA.

  E = A*N(d1) - D*e^(-rT)*N(d2)
            A
  sigmaE = --- * N(d1)*sigmaA        (5)
            E

We should pay attention that the estimated A (market value of total assets) from Equation (5) is different from the summation of the market value of equity and the book value of the debt. The two derived values (A and sigmaA) will be used in Equations (6)-(8). Here is a KMV example: E=110,688 (shares outstanding * price of stock), D=64,062 (total debt), Rf=0.07 (risk-free rate), T=1 (1 year). The result is A=170,558 and sigmaA=0.29. Based on the codes (not shown here), we got A=170,393.78 and sigmaA=0.2615. Please pay attention that the summation of the book value of debt and the market value of equity is 174,750 (not 170,558).
Distance to Default
Distance to default (DD) is defined by the following formula, where A is the market value of the total assets and sigmaA is its risk. The interpretation of this measure is clear: the higher the DD, the safer the firm.

        A - Default Point
  DD = -------------------        (6)
           A * sigmaA

In terms of the Default Point, there is no theoretical fixed default point. However, we could use all short-term debts plus half of the long-term debts as our default point. After we have the values of the market value of assets and its volatility, we could use the following equation to estimate the distance to default.
The A and sigmaA are the outputs from Equation (5). On the other hand, if the default point equals the total debt D, we would have the following formula (this is simply d2, evaluated at the estimated A and sigmaA).

        ln(A/D) + (r - 0.5*sigmaA^2)*T
  DD = --------------------------------        (7)
              sigmaA*sqrt(T)

According to the Black-Scholes model, the relationship between DD and the default probability is given below.

  DP (Default Probability) = N(-DD)        (8)

////////////////////////////////////// "
.C28EXPLAIN22<-"Financial statement analysis
//////////////////////////////////////
Objective:
 1) Understand the importance of financial statement analysis
 2) Understand the definitions of various ratios, such as the debt/equity ratio, ROE, ROA, and the DuPont identity
 3) Compare the performance of the firm with itself and with peers
 4) Give your recommendations
Note: If you could \"automate\" your process, it would be more meaningful. For example, suppose you spend one day to finish one company. How long would you need to finish the next one, or the 10th one? Here are some potential aids.
 1) get financial statements easily, see c28
 2) use some simple macros, such as recording your operations, see c26
Tool: Excel
Procedure:
 1) Download a company's several years' financial statements
 2) Conduct analysis, such as ratio analysis
 3) Compare its performance with itself and with peers
 4) Write your comments and recommendations
////////////////////////////////////// "
.C28EXPLAIN23<-"Black and Litterman model (1992)
//////////////////////////////////////
Objective:
----------
 1) Understand the shortcomings of our optimization model
 2) Understand the contributions of Black and Litterman (1992)
 3) Use Excel to illustrate a few examples
 4) Extension?
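The core of the model is the posterior expected-return formula, which blends equilibrium returns with investor views. It can be illustrated in a few lines of R; this is a toy two-asset example and every number below is invented purely for illustration.

```r
# Black-Litterman posterior mean:
# muBL = inverse[(tau*Sigma)^-1 + P' Omega^-1 P] *
#        [(tau*Sigma)^-1 piEq + P' Omega^-1 q]
Sigma <- matrix(c(0.04, 0.01,
                  0.01, 0.09), 2, 2)   # covariance matrix of the two assets
piEq  <- c(0.05, 0.07)                 # equilibrium (CAPM-implied) returns
tau   <- 0.05                          # scales uncertainty about the prior
P     <- matrix(c(1, -1), 1, 2)        # one view: asset 1 outperforms asset 2
q     <- 0.02                          # ... by 2% per year
Omega <- matrix(1e-4, 1, 1)            # view uncertainty (small = high confidence)

A    <- solve(tau * Sigma)             # precision of the equilibrium prior
B    <- t(P) %*% solve(Omega) %*% P    # precision contributed by the view
muBL <- solve(A + B) %*% (A %*% piEq + t(P) %*% solve(Omega) %*% q)
round(muBL, 4)                         # posterior expected returns
```

Because the view is held with high confidence, the posterior spread between the two assets moves close to the stated 2%, even though the equilibrium returns imply the opposite ranking.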
Sources
-------
blacklitterman.org http://blacklitterman.org
Black and Litterman example http://canisius.edu/~yany/excel/blacklitterman.xlsx
////////////////////////////////////// "
.C28EXPLAIN24<-"Brandt, Santa-Clara and Valkanov model (2009)
//////////////////////////////////////
Objectives:
----------
 1) Understand the traditional Markowitz mean-variance optimization
    http://www.effisols.com/basics/MVO.htm
    https://en.wikipedia.org/wiki/Modern_portfolio_theory
 2) Understand the shortcomings and limitations of the current optimization model
 3) Understand the contributions of Brandt, Santa-Clara and Valkanov (2009)
    http://www.nber.org/papers/w10996
 4) Use R to implement their approach
 5) Extension
Data sources
-------
 1) CRSP
 2) Compustat
 3) Prof. French's Data Library
Parametric Portfolio Policies: Exploiting Characteristics in the Cross Section of Equity Returns
Abstract
We propose a novel approach to optimizing portfolios with large numbers of assets. We model directly the portfolio weight in each asset as a function of the asset's characteristics. The coefficients of this function are found by optimizing the investor's average utility of the portfolio's return over the sample period. Our approach is computationally simple, easily modified and extended, produces sensible portfolio weights, and offers robust performance in and out of sample. In contrast, the traditional approach of first modeling the joint distribution of returns and then solving for the corresponding optimal portfolio weights is not only difficult to implement for a large number of assets but also yields notoriously noisy and unstable results. Our approach also provides a new test of the portfolio choice implications of equilibrium asset pricing models. We present an empirical implementation for the universe of all stocks in the CRSP-Compustat dataset, exploiting the size, value, and momentum anomalies.
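The abstract's idea, making each weight an equal-weight benchmark plus a characteristic-driven tilt and picking the tilt coefficient to maximize average utility, fits in a few lines of R. Below is a sketch on simulated data with one characteristic; all numbers are invented, whereas the paper uses CRSP-Compustat size, value, and momentum characteristics.

```r
# Parametric portfolio policy (one characteristic, simulated data):
# w(i,t) = 1/N + theta * x(i,t) / N, with theta maximizing average CRRA utility.
set.seed(1)
N   <- 50                  # number of stocks
Tn  <- 120                 # number of months
gam <- 5                   # CRRA risk-aversion coefficient
x <- matrix(rnorm(N * Tn), N, Tn)                             # standardized characteristic
r <- 0.01 + 0.02 * x + matrix(rnorm(N * Tn, 0, 0.10), N, Tn)  # next-month returns

avgUtility <- function(theta) {
  w  <- 1 / N + theta * x / N            # policy weights each month
  rp <- colSums(w * r)                   # portfolio return each month
  mean((1 + rp)^(1 - gam) / (1 - gam))   # average CRRA utility
}
opt <- optimize(avgUtility, c(-5, 5), maximum = TRUE)
opt$maximum                              # estimated theta
```

Since the simulated returns load positively on the characteristic, the optimizer picks a positive theta, i.e., it tilts toward high-x stocks. The single theta is shared by all stocks and months, which is what keeps the approach tractable for large cross sections.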
////////////////////////////////////// "
.C28EXPLAIN25<-"TORQ database
//////////////////////////////////////
The TORQ database contains transactions, quotes, order processing data and audit trail data for a sample of 144 NYSE stocks for the three months November 1990 through January 1991. This document covers installation, formatting and use of the data. Conceptual and institutional details concerning the data are given in a companion publication, Hasbrouck and Sosebee (1992). These data are distributed for purposes of academic research. No warranty is made that they are free of errors. The user assumes all responsibility for the consequences of any errors.
Objectives:
----------
 1) Understand the TORQ database
 2) Understand how to retrieve data from the binary data sets
 3) Illustrate a few applications
 4) Comments
Sources
-------
Manual, http://people.stern.nyu.edu/jhasbrou/Research/Working%20Papers/TORQDOC3.PDF
Source of data http://people.stern.nyu.edu/jhasbrou/Research/Working%20Papers/
////////////////////////////////////// "
.C28EXPLAIN26<-"SEC filings (dealing with index files)
//////////////////////////////////////
Objectives:
----------
 1) Understand the usage of SEC filings
 2) Understand how to search the SEC EDGAR platform
 3) Download one quarterly index file and use Excel to explore:
    a) how many companies
    b) how many CIKs
    c) how many forms
    d) the frequency of those forms
    e) others
 4) Collect all observations related to 10-K filings and generate an R data set
 5) Potential applications
Sources
-------
https://www.sec.gov/edgar.shtml
Quarterly index files https://www.sec.gov/Archives/edgar/full-index/
The first several lines from Q3 2017
-------------------------------------------
Description: Master Index of EDGAR Dissemination Feed by Company Name
Last Data Received: September 30, 2017
Comments: webmaster@sec.gov
Anonymous FTP: ftp://ftp.sec.gov/edgar/
Company Name Form Type CIK Date Filed File Name
--------------------------------------------------------------------------------------------------------------------------------------------- (OurCrowd Investment in MST) L.P. D 1599496 2017-08-24 edgar/data/1599496/0001465818-17-000048.txt 1 800 FLOWERS COM INC 10-K 1084869 2017-09-15 edgar/data/1084869/0001437749-17-015969.txt 1 800 FLOWERS COM INC 3 1084869 2017-07-27 edgar/data/1084869/0001140361-17-028807.txt 1 800 FLOWERS COM INC 3 1084869 2017-07-27 edgar/data/1084869/0001140361-17-028809.txt 1 800 FLOWERS COM INC 3 1084869 2017-07-27 edgar/data/1084869/0001140361-17-028810.txt 1 800 FLOWERS COM INC 3 1084869 2017-07-27 edgar/data/1084869/0001140361-17-028811.txt 1 800 FLOWERS COM INC 3 1084869 2017-07-27 edgar/data/1084869/0001140361-17-028812.txt 1 800 FLOWERS COM INC 3 1084869 2017-07-27 edgar/data/1084869/0001140361-17-028813.txt 1 800 FLOWERS COM INC 3 1084869 2017-07-27 edgar/data/1084869/0001140361-17-028814.txt 1 800 FLOWERS COM INC 3 1084869 2017-07-27 edgar/data/1084869/0001140361-17-028815.txt 1 800 FLOWERS COM INC 3 1084869 2017-07-27 edgar/data/1084869/0001140361-17-028816.txt Note: combine a) and b) below a) https://www.sec.gov/Archives/ b) 2017-08-24 edgar/data/1599496/0001465818-17-000048.txt we have https://www.sec.gov/Archives/edgar/data/1599496/0001465818-17-000048.txt ////////////////////////////////////// " .C28EXPLAIN27<-"Rattle R package ////////////////////////////////////// Welcome to Rattle (rattle.togaware.com). Rattle is a free graphical user interface for Data Science, developed using R. R is a free software environment for statistical computing, graphics, machine learning and artificial intelligence. Together Rattle and R provide a sophisticated environment for data science, statistical analyses, and data visualization. See the Help menu for extensive support in using Rattle. The books Data Mining with Rattle and R and Essential Data Science are available from Amazon. 
The Togaware Desktop Data Mining Survival Guide includes Rattle documentation and is available from datamining.togaware.com
Rattle works with open source R, which is limited to datasets and processing that fit into your computer's memory. Further details from https://docs.microsoft.com/en-us/r-server/
Rattle is licensed under the GNU General Public License, Version 2. Rattle comes with ABSOLUTELY NO WARRANTY. See Help -> About for details. Rattle Version 5.1.0. Copyright 2006-2017 Togaware Pty Ltd. Rattle is a registered trademark of Togaware Pty Ltd. Rattle was created and implemented by Graham Williams with contributions as acknowledged in 'library(help=rattle)'.
-----------------------------------------
First, install an R package called rattle
>install.packages(\"rattle\")
-----------------------------------------
> library(rattle)
Rattle: A free graphical interface for data science with R.
Version 5.1.0 Copyright (c) 2006-2017 Togaware Pty Ltd.
Type 'rattle()' to shake, rattle, and roll your data.
> rattle()
https://cran.r-project.org/web/packages/rattle/vignettes/rattle.pdf
////////////////////////////////////// "
.C28EXPLAIN28<-"SEC 10-K: BS, IS or CF
//////////////////////////////////////
This is a very interesting project. If you could generate the BS or IS, it would be more than enough.
Step 1: download all SEC financial statement data sets at https://www.sec.gov/dera/data/financial-statement-data-sets.html
Step 2: unzip one and write a SAS program to retrieve data
Step 3: work on one zip file
Step 4: write SAS programs to generate many individual SAS data sets, or generate one big SAS data set
Step 5: generate your own BS
Method I: download the latest several years' BS from Yahoo!Finance and replicate them with your data
Method II: generate your own BS
Advantages with your data sets
a) 10 years' data from 2009 to 2018
b) next year, we will have one more year
c) easily estimate industry means, such as the current ratio

                   CA
  current ratio = ----
                   CL

   where CA is the current assets and CL is the current liabilities
d) you could generate some SAS, R or Python data sets
////////////////////////// "
.C28EXPLAIN29<-"SEC Forms 3, 4 and 5
//////////////////////////////////////
What are Forms 3, 4, and 5? Corporate insiders (meaning a company's officers and directors, and any beneficial owners of more than ten percent of a class of the company's equity securities registered under Section 12 of the Securities Exchange Act of 1934) must file with the SEC a statement of ownership regarding those securities. On August 27, 2002, the SEC adopted rules and amendments to Section 16 of the Exchange Act, implementing the provisions of the Sarbanes-Oxley Act of 2002 that accelerated the deadline for filing most insider ownership reports. The initial filing is on Form 3. An insider of an issuer that is registering equity securities for the first time under Section 12 of the Exchange Act must file this Form no later than the effective date of the registration statement. If the issuer is already registered under Section 12, the insider must file a Form 3 within ten days of becoming an officer, director, or beneficial owner. Changes in ownership are reported on Form 4 and must be reported to the SEC within two business days.
You can find the limited categories of transactions not subject to the two-day reporting requirement in the new rule. Insiders must file a Form 5 to report any transactions that should have been reported earlier on a Form 4 or were eligible for deferred reporting. If a Form must be filed, it is due 45 days after the end of the company's fiscal year. Note that typical financial statement analysis has nothing to do with insider trading.
Step 1: download all SEC financial statement data sets by using the .dumpSECfinS function, from 2009 to 2018
Step 2: unzip one and write a SAS program to retrieve data
Step 3: work on one zip file
Step 4: write SAS programs to generate many individual SAS data sets for Forms 3, 4 and 5
Step 5: make your data sets quite user-friendly
Advantages with your data sets
a) 10 years' data from 2009 to 2018
b) next year, we will have one more year
c) easily estimate all insiders' trades
d) you could generate some SAS, R or Python data sets
https://www.sec.gov/fast-answers/answersform345htm.html
////////////////////////// "
.C28EXPLAIN30<-"SEC 10-K (13-f)
//////////////////////////////////////
What is 13-f?
--------------
Form 13F: Reports Filed by Institutional Investment Managers
An institutional investment manager that uses the U.S. mail (or other means or instrumentality of interstate commerce) in the course of its business, and exercises investment discretion over $100 million or more in Section 13(f) securities (explained below), must report its holdings on Form 13F with the Securities and Exchange Commission (SEC). In general, an institutional investment manager is: (1) an entity that invests in, or buys and sells, securities for its own account; or (2) a natural person or an entity that exercises investment discretion over the account of any other natural person or entity. Institutional investment managers can include investment advisers, banks, insurance companies, broker-dealers, pension funds, and corporations.
Form 13F is required to be filed within 45 days of the end of a calendar quarter. The Form 13F report requires disclosure of the name of the institutional investment manager that files the report, and, with respect to each Section 13(f) security over which it exercises investment discretion, the name and class, the CUSIP number, the number of shares as of the end of the calendar quarter for which the report is filed, and the total market value. Note that typical financial statement analysis does not consider the holdings of financial institutions.
Step 1: download all SEC financial statement data sets by using the .dumpSECfinS function, from 2009 to 2018
Step 2: unzip one and write a SAS program to retrieve data
Step 3: work on one zip file
Step 4: write SAS programs to generate many individual SAS data sets for Form 13F filings
Step 5: make your data sets quite user-friendly
Advantages with your data sets
a) 10 years' data from 2009 to 2018
b) next year, we will have one more year
c) easily estimate all institutional holdings
d) you could generate some SAS, R or Python data sets
https://www.sec.gov/fast-answers/answers-form13fhtm.html
////////////////////////// "
.C28EXPLAIN31<-"SEC Mutual Fund Prospectus Risk/Return Summary Data Sets
//////////////////////////
The Mutual Fund Prospectus Risk/Return Summary Data Sets provide text and numeric information extracted from the risk/return summary section of mutual fund prospectuses. The data is extracted from exhibits to mutual fund prospectuses tagged in eXtensible Business Reporting Language (XBRL). The information is presented without change from the \"as filed\" submissions by each registrant as of the date of the submission. The data is presented in a flattened format to help users analyze and compare corporate disclosure information over time and across registrants. The data sets will be updated quarterly.
Data contained in documents filed after 5:30PM Eastern on the last business day of a quarter will be included in the subsequent quarterly posting. https://www.sec.gov/dera/data/mutual-fund-prospectus-risk-return-summary-data-sets The Mutual Fund Prospectus Risk-Return Summary Data Sets (PDF, 207 kb) https://www.sec.gov/dera/data/rr1.pdf provides documentation of scope, organization, file formats and table definitions. ////////////////////////// " .C28EXPLAIN32<-"Census Summary Form 1 (SF1) ////////////////////////// What is SF 1? Summary File 1 (SF 1) contains the data compiled from the questions asked of all people and about every housing unit. Population items include sex, age, race, Hispanic or Latino origin, household relationship, household type, household size, family type, family size, and group quarters. Housing items include occupancy status, vacancy status, and tenure (whether a housing unit is owner-occupied or renter-occupied). There are 177 population tables (identified with a \"P\") and 58 housing tables (identified with an \"H\") shown down to the block level; 82 population tables (identified with a \"PCT\") and 4 housing tables (identified with an \"HCT\") shown down to the census tract level; and 10 population tables (identified with a \"PCO\") shown down to the county level, for a total of 331 tables. The SF 1 Urban/Rural Update added 2 PCT tables,increasing the total number to 333 tables. There are 14 population tables and 4 housing tables shown down to the block level and 5 population tables shown down to the census tract level that are repeated by the major race and Hispanic or Latino groups. SF 1 includes population and housing characteristics for the total population, population totals for an extensive list of race (American Indian and Alaska Native tribes, Asian, and Native Hawaiian and Other Pacific Islander) and Hispanic or Latino groups, and population and housing characteristics for a limited list of race and Hispanic or Latino groups. 
Population and housing items may be cross-tabulated. Selected aggregates and medians also are provided. A complete listing of subjects in this file is found in the \"Subject Locator\" chapter. To download all data, type .dumpCensusSF1 source of data https://www2.census.gov/census_2010/04-Summary_File_1/ Manual https://www.census.gov/prod/cen2010/doc/sf1.pdf ////////////////////////// " .C28EXPLAIN33<-"Census Summary Form 2 (SF2) ////////////////////////// What is SF2? Summary File 2 (SF 2) contains the data compiled from the questions asked of all people and about every housing unit. SF 2 includes population characteristics, such as sex, age, average household size, household type, and relationship to householder such as nonrelative or child. The file includes housing characteristics, such as tenure (whether a housing unit is owner-occupied or renter-occupied), age of householder, and household size for occupied housing units. Selected aggregates and medians also are provided. A complete listing of subjects in SF 2 is found in Chapter 3, Subject Locator. The layout of the tables in SF 2 is similar to those in SF 1. These data are presented in 47 population tables (identified with a \"PCT\") and 14 housing tables (identified with an \"HCT\") shown down to the census tract level; and 10 population tables (identified with a \"PCO\") shown down to the county level, for a total of 71 tables. Each table is iterated for 331 population groups: the total population, 75 race categories, 114 American Indian and Alaska Native categories (reflecting 60 tribal groupings), 47 Asian categories (reflecting 24 Asian groups), 43 Native Hawaiian and Other Pacific Islander categories (reflecting 22 Native Hawaiian and Other Pacific Islander groups) and 51 Hispanic/not Hispanic groups. The presentation of SF 2 tables for any of the 331 population groups is subject to a population threshold of 100 or more people. 
That is, if there are fewer than 100 people in a specific population group in a specific geographic area, their population and housing characteristics data are not available for that geographic area in SF 2. To download all data, type .dumpCensusSF2 Source of data https://www2.census.gov/census_2010/05-Summary_File_2/ Manual https://www.census.gov/prod/cen2010/doc/sf2.pdf ////////////////////////// " .C28EXPLAIN34<-"Census Demographic profile ////////////////////////// A short intro ------------- The Demographic Profile Summary File contains 100 percent data asked of all people and about every housing unit on topics such as sex, age, race, Hispanic or Latino origin, household relationship, household type, group quarters population, housing occupancy, and housing tenure. GEOGRAPHIC CONTENT The Demographic Profile Summary File is released as individual files for the United States, each of the 50 states, the District of Columbia, and Puerto Rico. The data items are identical for all files, but the geographic coverage differs. The summary level sequence chart outlines the hierarchical and geographic summaries in their entirety. 
To download all data, type
---------------------------
.dumpCensusDemographicProfile
Source of the data
---------------------------
https://www2.census.gov/census_2010/03-Demographic_Profile/
Manual
---------------------------
https://www.census.gov/prod/cen2010/doc/dpsf.pdf
Manual about the data structure
---------------------------
https://www2.census.gov/census_2010/03-Demographic_Profile/0README_DPSF.pdf
////////////////////////// "
.C28EXPLAIN35<-"Census Redistricting (P.L. 94-171)
//////////////////////////
To download all data, type
    .dumpCensusRedistribution
Source of data
    https://www2.census.gov/census_2010/redistricting_file--pl_94-171/
Manual
////////////////////////// "
.C28EXPLAIN36<-"Census Congressional Districts 113
//////////////////////////
To download all data, type
    .dumpCensusCongressionalDistricts113
Source of data
    https://www2.census.gov/census_2010/08-SF1_Congressional_Districts_113/
////////////////////////// "
.C28EXPLAIN37<-"Census Congressional Districts 115
//////////////////////////
To download all data, type
    .dumpCensusCongressionalDistricts115
Source of data
    https://www2.census.gov/census_2010/08-SF1_Congressional_Districts_115/
////////////////////////// "
.C28EXPLAIN38<-"Survey of Consumer Finances (SCF)
//////////////////////////
To download all data, type
    .dumpSCF
Source of data
    https://www.federalreserve.gov/econres/scfindex.htm
////////////////////////// "
"
 1 R package Shiny                  Ryan, Jiawen, Dan and Dan
 2 Business cycle indicator         Huanyuan, Yixiang, and Yilin
 3 Retirement calculator            Xiaobing and Tian
 4 Momentum trading strategy        Bhrigu, Samkit and Maharsh
 5 Financial statement analysis     Arunmohan, Darshan, Husham and Heta
 6 Black-Litterman model            Kannan, Yuqiong and Jingyu
 7 Which one is the best?
Brandon Pritchard
 8 Simulation for Black Jack        Brandon, Wen-chien, Yujie
 9 Predict bankruptcy using Z-score Xiufeng, Chenyu and Wei
10 KMV model & default probability  Kunal, Zuzar and Zhoongjian "
.C28EXPLAIN39<-"Supporting data sets and codes
//////////////////////////////////////
To help students finish various topics, I have generated many data sets and written some basic functions. Below is a partial list.
Data sets for CRSP:
---------
CRSP monthly stock data set
CRSP monthly index data set
CRSP daily stock data sets
CRSP daily index data set
CRSP information data set
CRSP S&P500 add and delete data set
Data sets for Fama-French factors:
---------
FF3monthly : Fama-French 3 factor monthly data set
FFC4monthly: Fama-French-Carhart 4 factor monthly data set
FF5monthly : Fama-French 5 factor monthly data set
FF3daily   : Fama-French 3 factor daily data set
FFC4daily  : Fama-French-Carhart 4 factor daily data set
FF5daily   : Fama-French 5 factor daily data set
Other data sets
---------
tradingDaysM: trading days for monthly data set
tradingDaysD: trading days for daily data set
Programs for CRSP:
--------
loadCRSP
explainCRSP
show_crspInfo
show_sp500monthly
show_sp500daily
show_sp500add
showIndexMonthly
showIndexDaily
showStockMonthly
showStockDaily
findInfoGivenTickers
findPERMNOgivenTickers
getCRSPmonthlyGivenPERMNO
getStockMonthlyGivenTickerAddTicker
getCRSPmonthlyIndexRet
getStockMonthlySeveralPERMNOs
getStockMonthlySeveralTickers
oneStockPlusSP500monthly
ewPortfolio
capWeighted
Programs for others:
--------
getYear
getMonth
getDay
saveYan
////////////////////////////////////// "
.C28EXPLAIN40<-"Projects taken already
//////////////////////////////////////
 # Name of the topic             Group                       Date to present
-- ----------------------        --------------------------- ----------------
 1 Test of the January Effect    Amarendra
 2 52-week high trading strategy Steve, Ed, Guodong, Rick
 3 Financial statement analysis  Hariston, Eric
 4 R package quantmod            Bennett and Sharan
 5 PerformanceAnalytics
Hongjian, Chris, ZiAng
////////////////////////////////////// "
.C28EXPLAIN40_2018<-"Projects taken already (updated on 11/30/2018)
//////////////////////////////////////
 # Name of the topic            Group                                Date to present
-- ----------------------       ---------------------------          ----------------
 1 Market correlations          Sam, Valerie, Ryan                   12/3
 2 VaR estimation using R       Mohit, Madhavi, Dhruv Mistry         12/3
 3 Fin Statement Analysis       Linh, Chandra, Aditya                12/3
 4 Black Jack Simulation        Karthik, Lakshmi, Priya, Sumaita     12/3
 5 KMV default prob.            Ankita, Sarang, Nidhi, Sonal         11/26
 6 52-w high/low strategy       Trusha, Sindhu, Yogesh               12/3
 7 PortfolioAnalytics package   Nick, Alex                           11/26
 8 Max trading strategy         Saharsh, Alex, Morgan, Gina
 9 Bankruptcy prediction        Kapil, Will, Prateek, Yousuf, Edward 12/3
10 SEC 10-K filings->BS         Bill, Mark                           12/3
11 Black-Litterman model        Owen, Yongjicai, Zhong               12/3
12 MonteCarlo for slot machine  Ruby, Supreet, Ashher                12/3
13 Illiquidity measure, Amihud  Raaj, Jingwei, Hongjun               12/3
14 Retirement calculator        Qi, Juncen, Zhenglin                 12/3
15 Spread from daily prices     Matt, Niveditha                      12/3
////////////////////////////////////// "
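# The supporting-code list in .C28EXPLAIN39 mentions ewPortfolio and
# capWeighted, but their interfaces are not documented in this chapter.  The
# standalone sketch below shows the equal-weighted idea with an illustrative
# function of my own (not the course's ewPortfolio):

```r
# Minimal equal-weighted portfolio return (illustrative only).
.ewReturn<-function(retMatrix){
  # retMatrix: one row per month, one column per stock, returns as decimals;
  # an equal-weighted portfolio return is simply the cross-sectional mean
  rowMeans(retMatrix,na.rm=TRUE)
}
rets<-matrix(c(0.02,-0.01,
               0.00, 0.04),nrow=2,byrow=TRUE)
.ewReturn(rets)   # month 1: 0.005, month 2: 0.02
```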