.chapter28<-function(i=0){
" i Chapter 28: Term Projects i Projects
- ------------------------------ -- -----------------------------
1 Requirements 21 KMV model and default probability
2 Retirement calculator 22 Financial statement analysis
3 Best one:CAPM, FF3, FFC4, or FF5? 23 Black-Litterman model
4 Test of the January Effect 24 Brandt, Santa-Clara, Valkanov Model (2009)
5 Bankruptcy prediction: Z-score 25 Exploring the TORQ database
6 Updating a monthly data set 26 SEC filings (dealing with index files)
7 Momentum trading strategy 27 R package called Rattle
8 52-week high trading strategy 28 SEC 10-K: BS, IS or CF
9 Max trading strategy 29 SEC 10-K (Forms 3, 4 and 5)
10 Spread from daily price 30 SEC 10-K (13-f)
11 Event study using R 31 SEC Mutual Fund Prospectus
12 Monte Carlo: a slot machine 32 Census Summary Form 1 (SF1)
13 Monte Carlo: Black Jack 33 Census Summary Form 2 (SF2)
14 Benford Law and accounting fraud 34 Census Demographic profile
15 Readability of 10-K filings 35 Census Redistribution
16 Business cycle indicator 36 Census Congressional Districts 113
17 Illiquidity, Amihud (2002) 37 Census Congressional Districts 115
18 Liquidity, Pastor/Stambaugh (2003) 38 SCF (Survey of Consumer Finances)
19 Spread estimation from TAQ 39 Supporting data sets and codes
20 A reverse mortgage calculator 40 Topics taken already (updated on)
Example #1:>.c28 # see the above list
Example #2:>.c28() # the same as the above
Example #3:>.c28(1) # see the first explanation
";.zchapter28(i)}
.c30<-.c31<-"There are only 28 chapters."
.n28chapter<-40
.zchapter28<-function(i){
if(i==0){
print(.c28)
}else{
.printEachQ(28,i,.n28chapter)
}
}
.c28<-.chapter28
.termProjects<-.c28
.tp<-.c28
.C28EXPLAIN1<-"Requirement of a term project
//////////////////////////////////////
Objective: This is an integral part of this course. It could be viewed
as the application of what you have learnt from this course
to a real-world situation.
Format: Group project (each group could have up to three members)
Topic: The 1st type (from my list)
1) theory and background of the topic,
2) R programs with a short explanation of the codes,
3) final data set (plus the codes to process the data, the source of raw data)
Note: please do not send me your raw data.
The 2nd type of projects is to study one R package.
1) why this package is useful
2) a summary of most important included functions
3) examples to use them
The 3rd type of topics is to generate 20 data sets
1) why those data sets are important
2) how to retrieve those data sets efficiently
3) applications of those data sets
Each group chooses one topic from a list of potential term projects
(first come, first served, since each topic can be chosen by only one group).
Four files: Each group should submit four files
 a) a text file containing your program and final results
 b) final data set(s)
 c) a short report (maximum 15 pages, double-spaced, font size 11)
 d) a PowerPoint file
Dropbox : submit your files to the dropbox on UBlearns
Presentation:
Each group would present their term project in front of the whole class
Due date: if you want my comments, submit your files
 before your presentation; otherwise, you could submit them after
 your presentation.
//////////////////////////
"
.C28EXPLAIN2<-"Retirement calculator
//////////////////////////////////////
Source: http://money.cnn.com/calculator/retirement/retirement-need/
Step 1: estimate John Doe's final annual salary when he retires
Input variables:
a) current salary
b) salary growth rate (factor in the inflation rate)
c) number of years before his retirement
For example, if John is 35 years old and earning $50,000 now, and he plans
to retire at 67, his final annual salary will be 50000*(1+g)^(67-35),
where g is the annual salary growth rate.
Step 2: Estimate the required annual cash inflow for the first retirement year.
For example, we could assume that the expected cash inflow for the
first year after retirement is 80% or 85% of his/her last annual salary.
Step 3: estimate the number of years the person will live after retirement.
For instance, this value is 25 if John's life expectancy is 92 and he retires at 67 (92-67).
Step 4: estimate the present value, at the time he retires, of a growing annuity
Input values: the 1st cash flow, a growth rate and an appropriate discount rate
Step 5: Factor in the social security benefit (this could be another data case)
Estimate the present value, at time of your retirement, of your Social Security benefit
Input values:
Monthly benefit
Discount rate
Step 6: John's net required cumulative wealth when he retires
(the result of Step 4 minus the result of Step 5)
Step 7: estimate John's required saving from now until retirement,
 such as the annual dollar saving or the percentage of salary saved
Primary Insurance Amount
https://www.ssa.gov/oact/COLA/piaformula.html
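The steps above can be sketched in a few lines of R; every input value below (salary, rates, ages, Social Security benefit) is an illustrative assumption, not part of the original calculator.

```r
# A minimal sketch of Steps 1-6; all inputs are illustrative assumptions
salary  <- 50000                 # current annual salary
g       <- 0.03                  # assumed salary growth rate
r       <- 0.05                  # assumed annual discount rate
nWork   <- 67 - 35               # years until retirement
nRetire <- 92 - 67               # years in retirement
finalSalary <- salary*(1 + g)^nWork          # Step 1
c1 <- 0.80*finalSalary                       # Step 2: first-year cash need
# Step 4: PV at retirement of a growing annuity (requires r != g)
pvNeed <- c1/(r - g)*(1 - ((1 + g)/(1 + r))^nRetire)
# Step 5: PV at retirement of Social Security (monthly benefit assumed)
ssMonthly <- 1500
pvSS <- ssMonthly/(r/12)*(1 - (1 + r/12)^(-12*nRetire))
netWealth <- pvNeed - pvSS                   # Step 6
round(c(finalSalary, pvNeed, pvSS, netWealth))
```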
//////////////////////////
"
.C28EXPLAIN3<-"Which one is the best? CAPM, FF3, FFC4, or FF5
//////////////////////////
Objectives of this term project
1) understand different models: CAPM, FF3, FF4 and FF5
2) Understand how to download and process data
3) Understand T-values, F-values and the adjusted R2
a) CAPM: R(IBM) = Rf + beta*(Rm - Rf) (1)
where R(IBM) is the IBM's mean return or expected return
Rf is the risk-free rate
Rm is the market mean return or expected return
b) FF3 Fama-French 3-factor model:
R(IBM) = Rf + beta1*(Rm - Rf)+beta2*SMB + beta3*HML (2)
where SMB is small minus big, HML is high book-to-market
ratio portfolio minus low ratio portfolio
c) FF4 is the Fama-French-Carhart 4 factor model
R(IBM) = ff3 + beta4*MOM (3)
Where MOM is momentum factor
d) FF5 is the Fama-French 5-factor model:
R(IBM) = ff3 + beta4*RMW + beta5*CMA (4)
where RMW is robust minus weak (profitability) and CMA is
conservative minus aggressive (investment)
Three questions:
1) Which criterion?
2) Is the performance time-period independent?
3) In-sample estimation vs. out-of-sample prediction
We use the adjusted R2 as our criterion to measure the performance of each model.
Step 1: download monthly price data from Yahoo!Finance
Step 2: choose a period to run various models
Step 3: summarize your testing results (sample statistics)
Step 4: (Optional: out-of-sample prediction)
source of data:
1) CRSP monthly data
2) http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html
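The model comparison in Steps 2-3 boils down to two calls to lm(). A hedged sketch on simulated monthly excess returns (real factors would come from CRSP or the Ken French library above; all numbers here are made up):

```r
# Compare CAPM and FF3 by adjusted R-squared on simulated excess returns
set.seed(1)
n <- 120                                   # 10 years of monthly data
mktRf <- rnorm(n, 0.005, 0.04)             # market excess return
smb   <- rnorm(n, 0.002, 0.03)             # size factor
hml   <- rnorm(n, 0.002, 0.03)             # value factor
y <- 0.001 + 1.1*mktRf + 0.5*smb - 0.3*hml + rnorm(n, 0, 0.02)
capm <- lm(y ~ mktRf)                      # equation (1)
ff3  <- lm(y ~ mktRf + smb + hml)          # equation (2)
summary(capm)$adj.r.squared
summary(ff3)$adj.r.squared                 # criterion: higher adjusted R2 wins
```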
//////////////////////////
"
.C28EXPLAIN4<-"Test of the January Effect using Excel
//////////////////////////
If the Efficient Market Hypothesis (EMH) holds, we should not expect many market
anomalies such as January Effect, Weekday Effect, momentum strategy (buy winners
and sell losers). However, many researchers and professionals have found that
returns in January are quite different from other months.
Question: Are January returns statistically different from other months?
mean return for January = mean return for non-January months (1)
Choose about a dozen stocks to test the existence of so-called
January effect. A few companies are listed below. Note that S&P500
is listed as well.
Step 1: For one given ticker (or PERMNO), retrieve return data
from CRSP monthly data set
Step 2: Sort those monthly returns into two groups: returns
in January and returns in other months.
Step 3: For each stock/index, test whether its two means
are equal, see the above equation.
Step 4: repeat the above procedure for 100 stocks.
Comment on your results.
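Steps 2 and 3 can be sketched as below; the returns are simulated placeholders for the CRSP data.

```r
# Split monthly returns by month and test equality of the two means
set.seed(1)
date <- seq(as.Date('1990-01-01'), by = 'month', length.out = 360)
ret <- rnorm(360, 0.01, 0.05)              # placeholder monthly returns
jan   <- ret[format(date, '%m') == '01']   # Step 2: January returns
other <- ret[format(date, '%m') != '01']   # Step 2: non-January returns
t.test(jan, other)                         # Step 3: test equation (1)
```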
//////////////////////////
"
.C28EXPLAIN5<-"Bankruptcy prediction by using Z-score
//////////////////////////
The Altman Z-score is used to predict the probability that a firm goes
bankrupt. This score is a weighted average of 5 ratios based
on a firm's balance sheet and income statement. For public firms,
Altman (1968) offers the following formula.
Z=3.3*X1+0.99*X2+0.6*X3+1.2*X4+1.4*X5, (1)
where the definitions of X1,X2,X3,X4 and X5 are given in the following table.
Variable Definition
-- -------------
X1 EBIT/Total Assets
X2 Net Sales/Total Assets
X3 Market Value of Equity/Total Liabilities
X4 Working Capital/Total Assets
X5 Retained Earnings/Total Assets
Based on the ranges of z-scores, we could classify public firms
into the following 4 categories. Eidleman (1995) finds that the
Z score correctly predicted 72% of bankruptcies two years
prior to the event.
Z-score range Description
------------- --------------
> 3.0 Safe
2.7 to 2.99 On Alert.
1.8 to 2.7 Good chances of going bankrupt within 2 years.
< 1.80 Probability of Financial distress is very high
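Equation (1) and the classification table translate directly into a small R function; the ratios in the example call are hypothetical.

```r
# Altman (1968) Z-score, equation (1); example inputs are made up
altmanZ <- function(x1, x2, x3, x4, x5) {
  # x1 = EBIT/TA, x2 = Sales/TA, x3 = MVE/TL, x4 = WC/TA, x5 = RE/TA
  3.3*x1 + 0.99*x2 + 0.6*x3 + 1.2*x4 + 1.4*x5
}
zone <- function(z) {
  if (z > 3.0) 'Safe'
  else if (z >= 2.7) 'On alert'
  else if (z >= 1.8) 'Good chance of bankruptcy within 2 years'
  else 'Financial distress very likely'
}
z <- altmanZ(0.15, 1.10, 1.50, 0.20, 0.30)
zone(z)
```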
References
Altman, Edward I.,2000,Predicting Financial Distress of Companies,
Retrieved on September 4th, 2009 from http://pages.stern.nyu.edu/~ealtman/Zscores.pdf
Altman, Edward I,1968,Financial Ratios, Discriminant Analysis and the
Prediction of Corporate Bankruptcy, Journal of Finance 23, 589-609.
Eidleman, Gregory J.,1995,Z-Scores A Guide to Failure Prediction,
The CPA Journal Online, https://www.easycalculation.com/statistics/altman-z-score.php
//////////////////////////
"
.C28EXPLAIN6<-"Updating a monthly Excel data set and write an instruction
//////////////////////////
First, let's download the Excel data set
http://canisius.edu/~yany/data/monthlyYan.xlsx
The structure of this data set is very simple.
Three columns: ID, Date and Value; see the first several lines below.
ID date value
-- --------- ------
A 11/30/1999 38.96
A 12/31/1999 71.39
A 1/31/2000 61.12
A 2/29/2000 95.91
A 3/31/2000 96.03
A 4/28/2000 81.83
A 5/31/2000 67.98
A 6/30/2000 68.1
A 7/31/2000 37.63
A 8/31/2000 56.33
Note
(1) The frequency of the data set is monthly.
(2) \"A\" is the stock ticker
(3) For stocks, the last column, called value,
 is the monthly adjusted price.
(4) For SMB (a Fama-French factor), value is the
 factor itself, i.e., a return
Do the following things:
(a) find out all unique ID's
(b) update the data set
(c) write a 2-page manual on how to use this data set
i) how to estimate monthly returns
ii) how to estimate annual returns
iii) how to generate an n-stock matrix, such as a 5-stock matrix
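Tasks (a) and (i) can be sketched on an in-line sample of the data; reading the real monthlyYan.xlsx would need a package such as readxl.

```r
# A tiny in-line sample with the same three columns: ID, date, value
x <- data.frame(ID    = 'A',
                date  = c('11/30/1999', '12/31/1999', '1/31/2000'),
                value = c(38.96, 71.39, 61.12))
unique(x$ID)                                       # task (a): unique IDs
x <- x[order(x$ID, as.Date(x$date, '%m/%d/%Y')), ]
x$ret <- c(NA, diff(x$value)/head(x$value, -1))    # (i): monthly returns
```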
//////////////////////////
"
.C28EXPLAIN7<-"Momentum trading strategy
//////////////////////////
One phrase summary: buy winners and sell losers.
Implied assumption: within a short-term (between 3 months and 12 months),
the winner will remain a winner while a loser would
continue to be a loser.
Two related questions:
----------------------
1) how to distinguish a winner from a loser?
2) how to conduct a test?
Objectives of this term project:
1) learn CRSP monthly data set
2) learn how to use R to test the trading strategy
Source of data: CRSP monthly data
Basic logic: According to Jegadeesh and Titman (1993) it is a profitable trading
strategy if we buy the past winners and sell the past losers.
Notations: Check the past K-month returns, and then form a portfolio for L months,
 where K=3, 6, 9 and 12 and L=3, 6, 9 and 12. Below we use K=L=6 as an example.
Trading strategy: Estimate all stocks' past 6-month returns and sort stocks
into 10 groups (deciles) according to their 6-month total returns. Long
the top decile (winners) and short the bottom decile (losers) for the next 6 months.
Procedure:
Step 0: Starting month: January 1965
Step 1: Retrieve CRSP data (PERMNO, DATE and RET)
Step 2: Estimate past 6-month cumulative returns R_t^6month
Step 3: Sort all stocks into deciles according to their cumulative 6-month returns
Step 4: Long winners (best return group) and short losers for the next 6-month
Step 5: Estimate portfolio returns
Step 6: Move to the next month and repeat the above steps until 12/1989
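Steps 2-5 for a single formation month can be sketched as follows, with simulated returns standing in for the CRSP data.

```r
# Decile sort on past 6-month cumulative returns, then long-short
set.seed(1)
nStock <- 500
past <- matrix(rnorm(nStock*6, 0.01, 0.08), nStock, 6)   # past 6 monthly returns
cumRet <- apply(1 + past, 1, prod) - 1                   # Step 2
decile <- cut(cumRet, quantile(cumRet, 0:10/10),
              labels = FALSE, include.lowest = TRUE)     # Step 3
holding <- rnorm(nStock, 0.01, 0.08)     # placeholder holding-period returns
winners <- mean(holding[decile == 10])   # Steps 4-5: long winners,
losers  <- mean(holding[decile == 1])    # short losers
winners - losers                         # momentum portfolio return
```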
References
----------------------
Jegadeesh Narasimhan and Sheridan Titman, 1993, Returns to Buying Winners and
Selling Losers: Implications for Stock Market Efficiency, Journal of Finance
48 (1), 65-91.
http://canisius.edu/~yany/doc/momentumJF1993.pdf
Appendix A: Table 1 from Jegadeesh and Titman (1993).
http://canisius.edu/~yany/doc/momentumTable1.pdf
http://canisius.edu/~yany/doc/momentumTable1.png
//////////////////////////
"
.C28EXPLAIN8<-"Replicate 52-week high trading strategy
//////////////////////////////////////
George and Hwang (2004) show that we could design a profitable trading strategy
based on the 52-week high. First, they estimate a ratio by dividing today's price
by its 52-week high. Based on this ratio, all stocks are sorted from the highest
to the lowest. Stocks belonging to the top (bottom) 30% are labeled winners (losers).
Again, the trading strategy is to buy winners and sell losers. They demonstrate
that such a trading strategy is quite profitable, with an average return difference
of 0.45% per month between the winner and loser portfolios.
Objectives of this term project:
1) Understand how to download daily data from Yahoo!Finance [see the versions #1 or #2 below]
2) Understand how to use R to process data
3) Confirm or reject the so-called 52-week High trading strategy
Time period: as long as possible [versions #1 or #2]
July 1963 to December 2018 [Version #3]
Basic logic: According to George and Hwang (2004), a profitable trading strategy can
 be built on the ratio of the current stock price to its 52-week high.
Trading strategy: Estimate each stock's 52-week high, estimate the ratio of today's
 price over its 52-week high, and sort stocks from the highest to the lowest. Treat
 the top 30% as winners and the bottom 30% as losers. Buy winners and sell losers.
Procedure for version #1:
Step 0: formulate your trading strategy, for example,
        ratio = (price - 52wLow)/(52wHigh - 52wLow)
        if ratio > 0.8, you buy; if ratio < 0.3, you sell
Step 1: Download one stock from Yahoo!finance [choose max for the time period]
Step 2: Estimate returns
Step 3: Sort data from the earliest to the latest
Step 4: Starting from observation # 253, estimate 52wHigh and 52wLow
Step 5: Calculate the ratio
Step 6: Based on your trading strategy, long or short the stock for the next period
Step 7: Generate a column for returns for this trading strategy
Step 8: Test whether this is a profitable trading strategy (benchmark is the long-only trading strategy)
Procedure for versions #2 and #3(CRSP data):
Step 1: Load data sets stockDaily and stockMonthly
Step 2: Starting month: July 1963
Step 3: Estimate all stocks 52-week High and estimate the ratio Price/52-week high
Step 4: Sort all stocks from highest to lowest
Step 5: Choose top 30% as winners and bottom 30% as losers
Step 6: Estimate equal-weighted portfolio returns for both winner and loser portfolios
Step 7: Move to the next month and repeat the above steps until the last month (December 2001)
Step 8: Conduct a test
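Steps 4-5 of version #1 (the rolling 52-week window and the Step 0 ratio) can be sketched with simulated daily prices.

```r
# Rolling 52-week high/low and the Step 0 ratio on simulated prices
set.seed(1)
p <- 100*cumprod(1 + rnorm(600, 0.0005, 0.02))  # simulated daily prices
ratio <- rep(NA, length(p))
for (t in 253:length(p)) {
  w <- p[(t - 252):(t - 1)]                     # trailing 252 trading days
  ratio[t] <- (p[t] - min(w))/(max(w) - min(w))
}
# Step 0 rule: 1 = long, -1 = short, 0 = no position
signal <- ifelse(ratio > 0.8, 1, ifelse(ratio < 0.3, -1, 0))
```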
George, Thomas J, and Chuan-Yang Huang, 2004, The 52-week High and Momentum
Investing, Journal of Finance 54, 5, 2145-2176.
//////////////////////////////////////
"
.C28EXPLAIN9<-"Replicate a so-called Max trading strategy
//////////////////////////////////////
Bali, Cakici and Whitelaw (2011) find that sorting stocks by their
maximum daily returns (MAX) in the previous month could produce a
monthly return difference of more than 1% between the lowest and
highest MAX deciles. In addition, the alphas from running Fama-French-Carhart
4-factor model for those two extreme portfolios are significantly different.
Thus, we could design a profitable trading strategy based on stocks' extreme daily returns in the last month.
Sources of data: stockDaily and stockMonthly.RData from CRSP
Objectives of this term project (version #1):
1) Understand how to download data from Yahoo!Finance
2) Prove or disprove the so-called max trading strategy
Objectives of this term project (versions #2 and #3):
1) Understand the CRSP database
2) Understand how to use Excel or R to retrieve and process data
3) Prove or disprove the so-called max trading strategy by
replicating Table 1 of Bali et al. (2011).
Basic logic: According to Bali et al. (2011) some investors like stocks with
lottery-type payoffs which have big past returns with a small probability.
Period: July 1962 to December 2005
Trading strategy: Estimate all stocks' maximum daily returns in the last month,
 sort stocks into 10 groups (deciles) according to their last month's
maximum daily returns. Long the top decile (winners) and short the
bottom decile (losers) for one month.
Procedure (for versions #1):
Step 0: choose your trading strategy, e.g.,
if ratio > 0.9 long
if ratio < 0.1 short
Step 1: download daily data for one stock
Step 2: Estimate the maximum daily returns of the previous month
Step 3: based on your trading strategy, long or short
Step 4: repeat until the last month
Step 5: test
Procedure (for versions #3):
Step 0: Starting month: July 1962
Step 1: load stockDaily
Step 2: Estimate the maximum daily returns of the previous month, i.e., June 1962
Step 3: Sort all stocks into deciles according to their maximum last month daily returns
Step 4: Long the top 10% and short the bottom 10%
Step 5: load stockMonthly and estimate portfolio returns and their difference
Step 6: Move to the next month and repeat the above steps until the last month (December, 2005)
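Step 2 (the MAX sorting variable) can be sketched for one stock as follows, with simulated data in place of CRSP stockDaily.

```r
# Maximum daily return within each month for one stock
set.seed(1)
d <- seq(as.Date('1962-06-01'), as.Date('1962-12-31'), by = 'day')
ret <- rnorm(length(d), 0, 0.02)     # placeholder daily returns
ym <- format(d, '%Y-%m')
maxByMonth <- tapply(ret, ym, max)   # the MAX measure, month by month
maxByMonth['1962-06']                # used to form portfolios in July 1962
```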
References
Bali, Turan G., Nusret Cakici, and Robert F. Whitelaw, 2011, Maxing Out:
Stocks as Lotteries and the Cross-Section of Expected Returns, Journal
of Financial Economics 99, 427-446.
//////////////////////////////////////
"
.C28EXPLAIN10<-"Spread estimation from daily price
//////////////////////////////////////
Spread is defined as the difference between ask and bid
Generally, the difference between two prices or interest rates. In stock trading,
the difference between the current bid and ask prices for a stock (the bid/ask or
bid/offer spread). In futures trading, the price difference between delivery
months for the same commodity or asset. In bond trading, the difference between
yields of bonds with similar quality and different maturities, or of different
quality and the same maturity. In underwriting, the difference between what the
issuer receives from the underwriter and what the underwriter receives from the
public (underwriting spread).
http://lexicon.ft.com/Term?term=spread
Roll (1984) designs a method to estimate the spreads by using the
first order covariance of price changes.
S = 2*sqrt(-cov(A, B)) (1)
 where A = deltaP(t-1), the price change on day t-1
 B = deltaP(t), the price change on day t
Objectives for this term-project
1) understand how to download and process daily data from Yahoo!finance
2) understand the logic behind the above formula
3) estimate Roll's spread for a dozen stocks
4) comment on your results
Source of data: CRSP
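Equation (1) can be checked on simulated trade prices; the true spread of $0.10 and all other numbers below are assumptions of the simulation.

```r
# Recover Roll's spread from the serial covariance of price changes
set.seed(1)
s <- 0.10                                    # true (assumed) spread
m <- 50 + cumsum(rnorm(2000, 0, 0.05))       # efficient (mid) price
q <- sample(c(-1, 1), 2000, replace = TRUE)  # +1 buy, -1 sell
p <- m + q*s/2                               # observed trade prices
dp <- diff(p)                                # price changes
cv <- cov(dp[-1], dp[-length(dp)])           # first-order covariance
est <- if (cv < 0) 2*sqrt(-cv) else NA       # Roll's spread estimate
est                                          # should be close to 0.10
```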
//////////////////////////////////////
"
.C28EXPLAIN11<-"Event Study using R
//////////////////////////////////////
One example: testing the impact of HSIC added to the S&P500 on March 18, 2015.
The basic idea for Event Study is to test whether our AR (Abnormal Return) is
statistically significant. The definition of abnormal return is given below.
AR = realized return - expected return (1)
To estimate our expected return, we apply the following linear regression.
y = alpha + beta* x (2)
where y is the expected return and x is the market return on that day.
To estimate the two parameters, alpha and beta, we run a linear regression or apply
related formulae by choosing an estimation period 252 days long,
starting the day before our event window and counting backward, see below.
         Estimation period              Event window
 |-------------------------------|------------|------------|
                          n days before   event-day   m days after
Here is the design.
i) The event day is 3/18/2015
ii) Our event window: 10 days before and 10 days after
iii) The estimation period: from 253 days before to one-day before our event window
Thus, roughly we could download daily data from 2/1/2014 to 4/22/2015.
Step 1: download daily price data from HSIC and S&P500 (^GSPC).
Step 2: Choose only adjusted price, see the left panel below.
Then sort data from oldest to the latest, see the right panel below.
Step 3: estimate daily returns, see the formula in D3.
Step 4: Highlight our event day, a window around the event and the estimation period
For example, we could use red color for event day, 10-day before and
10-day after our event day. In addition, we could highlight our
estimation period green.
Step 5: Based on the estimation period, apply following formulae to estimate
intercept, slope, R2 and standard error
assume B column is for stock returns, C column is S&P500 returns
i) intercept =intercept(B,C)
ii) slope =slope (B,C)
iii) R2 =rsq (B,C)
iv) standard error =steyx (B,C)
Step 6: Estimate the expected return, AR (abnormal return),
 CAR (cumulative abnormal return) and the T-value for each AR:
 Expected return = intercept + slope * market return
 AR (abnormal return) = realized return - expected return
 CAR (cumulative abnormal return) = sum of all ARs up to today
 T-value for AR = AR/standard error
Note: to make our spreadsheet clean, we have two choices:
i) Hide many rows
ii) Copy above four output values to a place near our event window
Comment on your results.
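Steps 5-6 can also be done in R; the sketch below uses simulated daily returns and illustrative window sizes in place of the HSIC data.

```r
# Estimation-period regression, then AR, CAR and T-values over the event window
set.seed(1)
mkt <- rnorm(300, 0.0005, 0.01)                  # market daily returns
stk <- 0.0002 + 1.2*mkt + rnorm(300, 0, 0.015)   # stock daily returns
est <- 1:252                                     # estimation period
evt <- 280:300                                   # event window
fit <- lm(stk[est] ~ mkt[est])                   # Step 5: intercept and slope
expRet <- coef(fit)[1] + coef(fit)[2]*mkt[evt]   # expected returns
ar  <- stk[evt] - expRet                         # abnormal returns
car <- cumsum(ar)                                # cumulative abnormal returns
tAR <- ar/summary(fit)$sigma                     # Step 6: T-value for each AR
```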
//////////////////////////////////////
"
.C28EXPLAIN12<-"Monte Carlo Simulation to mimic a slot machine
//////////////////////////////////////
Objectives:
1) understand related statistics
2) apply the Excel randbetween() function
3) learn to link picture to a cell and
4) using the vlookup() function to search a table of pictures
Task #1: A simple case with just three numbers
Assume that we have three objects: apple, banana and eggplant, see below.
We enter three numbers and try to output three corresponding fruits by using the Excel vlookup() function.
Q1: What is the probability of winning, defined as matching three?
Q2: Assume the cost of one play is $1; what is the winning prize if this is a fair game?
Q3: What is the expected value if the cost of one play is $1 and the winning prize is $7?
Task #2: Design a slot machine with 3 objects with pictures.
Step 1: generate the following entries.
Below, we use C16 for apple as an example. Search online
to find an apple image. Right-click the apple picture, then choose 'Format Picture'.
Step 2: we manually enter three numbers in cells B3, C3 and D3,
Our objective is to search our picture table (fruit pictures)
to output corresponding three fruits. In this case, we expect
to see apple, apple and banana.
Step 3: Click cell C16 (not the apple but the cell), copy, then select
 our destination cell, i.e., F3, then from Paste link choose
 \"Linked Picture (I)\", see the right image below.
Step 4: Click \"Formula\", \"Define Name\", see below, where X will
be our image column, i.e., C16:C18, Y is our indicator, B3,
Z is our number columns, i.e., B16:B18. Below, we define a
name called firstNumber.
Step 5: Click picture in F3 and we replace =$C$16 with =firstNumber
(or other name you defined), see the right image above. Repeat
the same procedure for other two cells.
Task 3: Build a slot machine with 10 different fruits and assume that
the machine would have a slight advantage to the owner of the
machine, such as for 1 million plays, the casino would have a profit of $100.
Q4: What is the winning prize if we get three identical pictures?
Q5: What is your result after playing 100 times?
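Q1 and Q3 can also be checked by simulation in R; the number of plays below is arbitrary.

```r
# Simulate the three-fruit machine
set.seed(1)
nPlay <- 100000
win <- replicate(nPlay, {
  s <- sample(1:3, 3, replace = TRUE)   # three reels, three fruits each
  s[1] == s[2] && s[2] == s[3]
})
mean(win)                      # Q1: should be close to 1/9
mean(ifelse(win, 7, 0) - 1)    # Q3: average profit with a $7 prize, $1 cost
```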
References
http://en.wikipedia.org/wiki/Slot_machine
//////////////////////////////////////
"
.C28EXPLAIN13<-"Monte Carlo Simulation to mimic Black Jack
//////////////////////////////////////
This is a 2-player game: a dealer and a player. Below, we assume
that you are the player.
Rule #1: cards 2 to 10 have their face value, while J, Q, and K
are worth 10 points and Ace is worth either 1 or 11
points (player's choice).
Terminology:
Blackjack : one A plus any card worth 10 points.
Lose : the player's bet is taken by the dealer.
Win : the player wins as much as he bet.
Blackjack (natural): the player wins 1.5 times the bet.
Push : the player keeps his bet, neither winning nor losing money.
Step 1: the dealer draws two cards, one face up, while the player draws two cards (face up)
Step 2: the player could draw a third card
Win or lose: if the sum of your cards is no more than 21 and
 is bigger than the dealer's, you win.
http://www.pagat.com/banking/blackjack.html
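Rule #1 (Ace as 11 unless that busts the hand) can be sketched as a small scoring function:

```r
# Score a hand; Aces are downgraded from 11 to 1 only as needed
score <- function(h) {
  s <- sum(h)
  while (s > 21 && 11 %in% h) {
    h[match(11, h)] <- 1
    s <- sum(h)
  }
  s
}
deck <- rep(c(2:10, 10, 10, 10, 11), 4)  # J, Q, K = 10; Ace = 11
set.seed(1)
score(sample(deck, 2))                   # score a random two-card hand
score(c(11, 10))                         # a blackjack scores 21
```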
//////////////////////////////////////
"
.C28EXPLAIN14<-"Benford Law and accounting fraud detection
//////////////////////////////////////
Benford Law is also called the First-Digit Law which gives different frequencies
for the 9 first digits, 1 to 9. Conventional wisdom would suggest that each
first digit occurs with roughly the same frequency, i.e., 1/9 = 0.1111 = 11.11%.
However, according to the Benford Law, the lower the value of a digit,
the higher its probability. In other words, we will see more values
with a leading digit of 1 than with a leading digit of 2. The probability
of each digit is given by the following formula.
Prob(d)=log10((d+1)/d) (1)
where Prob() is the probability (frequency), d is the digit, and log10()
is the log function with a base of 10. For Excel, log10() is the same as log().
Digit Formula Probability
----- ------------- -----------
 1 =log10(2/1) 0.301
 2 =log10(3/2) 0.176
 3 =log10(4/3) 0.125
 4 =log10(5/4) 0.097
 5 =log10(6/5) 0.079
 6 =log10(7/6) 0.067
 7 =log10(8/7) 0.058
 8 =log10(9/8) 0.051
 9 =log10(10/9) 0.046
 ------------- -----
 Total 1.000
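Equation (1) and a first-digit helper take one line each in R; the lognormal sample below is only an illustration of data that follows the Benford Law closely.

```r
# Benford probabilities (equation 1) and empirical first-digit frequencies
d <- 1:9
prob <- round(log10((d + 1)/d), 3)        # the table above, in one line
firstDigit <- function(x) floor(x/10^floor(log10(abs(x))))
firstDigit(c(38.96, 0.0071, 210842))      # 3 7 2
set.seed(1)
obs <- table(firstDigit(rlnorm(10000, 0, 3)))/10000
rbind(benford = prob, observed = as.numeric(obs))
```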
Objectives:
1) understand Benford Law
2) download about a dozen companies' annual reports
3) estimate the distributions of the 1st digits
4) report your results and discuss
Procedure:
To download annual financial statements.
Step 1: go to Yahoo!Finance http://finance.yahoo.com/
Step 2: enter a ticker, such as IBM
Step 3: find three types of financial statements.
Step 4: download those financial statements
Note 1: the function to get the first digit is =left(cell, 1)
Note 2: you could use the Excel countif() function.
References
Accounting Web, 20 Ways You Can Detect Fraud, 2014,
http://www.accountingweb.com/aa/law-and-enforcement/20-ways-you-can-detect-fraud
Sharma, Anuj, and Prabin Kumar Panigrahi, 2012, A Review of Financial Accounting Fraud
Detection based on Data Mining Techniques, International Journal of Computer
Applications 39, 1, https://arxiv.org/ftp/arxiv/papers/1309/1309.3944.pdf
McGinty, Jo Craven, 2014, Accountants Increasingly Use Data Analysis to Catch
Fraud, Auditors Wield Mathematical Weapons to Detect Cheating, http://www.wsj.com/articles/accountants-increasingly-use-data-analysis-to-catch-fraud-1417804886
Testing Benford Law, http://testingbenfordslaw.com/
What is Benford Law, https://en.wikipedia.org/wiki/Benford%27s_law#cite_note-Nigrini-19
//////////////////////////////////////
"
.C28EXPLAIN15<-"Readability of 10-K filings and firm's performance
//////////////////////////////////////
Objectives:
1) Understand the usage of 10-K
2) learn how to parse 10-K
3) understand the Fog-index and learn how to calculate it for each 10-K filing
4) Comments on your result
Source of data
a) SEC EDGAR (Electronic Data Gathering, Analysis and Retrieval)
b) I have all 10-K filings from Q1 1993 to Q2 2016 (the number of filings is
210,842 and the size is 440G)
Structured vs. unstructured data
Unstructured information has the lion's share of all information,
70% to 80%, and it is reported that 80% of structured information
comes from unstructured data. On the other hand, SEC filings are an
important source of information (a gold mine) since public companies,
certain insiders, and broker-dealers are required to make regular SEC filings.
Text analysis
Text is one of the most important forms of unstructured
information. Text analysis, also called text mining or text data
mining, and roughly equivalent to text analytics, refers
to the process of deriving high-quality information from text.
For example, we could look at the frequency of each word, keywords,
the number of lines and sentences, the frequency of positive vs. negative
words, the tone of the speech, etc. For instance, one could compare the top
words used by Reagan in 1994 and by Obama in 2008 and ask which list belongs to Obama.
Text analysis for finance and accounting
Applying text analysis to finance and accounting does not have a long history.
Li (2008) shows that the readability of 10-K filings has a statistically
significant impact on a firm's subsequent performance.
The readability measure used by Li (2008) is called the Fog index, defined below.
Fog index=0.4*(n+p) (1)
where, n is the average number of words per sentence, while p is
the percentage of complex words. A complex word is a word with
more than two syllables.
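Equation (1) can be sketched in R; syllables are approximated here by vowel groups, which is crude, but enough to illustrate the formula.

```r
# Fog index = 0.4*(average words per sentence + percent complex words)
fogIndex <- function(text) {
  sentences <- unlist(strsplit(text, '[.!?]+'))
  sentences <- sentences[grepl('[A-Za-z]', sentences)]
  words <- unlist(strsplit(tolower(text), '[^a-z]+'))
  words <- words[nchar(words) > 0]
  syl <- sapply(words, function(w) length(gregexpr('[aeiouy]+', w)[[1]]))
  n <- length(words)/length(sentences)   # average words per sentence
  p <- 100*mean(syl > 2)                 # percentage of complex words
  0.4*(n + p)
}
fogIndex('The readability of annual reports matters. Complex language obscures information.')
```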
Because defining and measuring readability in the context of financial
disclosures has become important with the increasing use of textual analysis
and the SEC's plain English initiative, Loughran and McDonald (2014) show
that the Fog Index, the most commonly applied readability measure, is poorly
specified in financial applications. Of Fog's two components, one is
misspecified and the other is difficult to measure. They suggest using
the file size of the 10-K filing as a simple readability proxy and show that it
outperforms the Fog Index. Another advantage is that it does not
require document parsing, thus facilitating replication.
According to Loughran and McDonald (2014), there are 632 different forms.
On the other hand, most researchers used only one or two forms, such
as 10-K. Thus, the SEC filings database is a gold mine waiting to
be explored.
Reference
Li, Feng, 2008, Annual report readability, current earnings, and earnings
persistence, Journal of Accounting and Economics 45, 221 247.
source(\"http://datayyy.com/textAnalysis.txt\")
//////////////////////////////////////
"
.C28EXPLAIN16<-"Business cycle indicator
//////////////////////////////////////
There exist many profitable trading strategies, such as the individual stock
momentum, first documented in Jegadeesh and Titman (1993), the industry momentum
in Moskowitz and Grinblatt (1999), the effect of the 52-week high price in George
and Hwang (2004), and the effect of the maximum daily return in a month in
Bali et al. (2011). However, Yan and Zhang (2016) argue that those profitable
trading strategies would not be profitable during difficult times. In other words,
investors would change their behavior during a recession.
Objectives:
1) understand the concept of business cycle
2) generate a business cycle indicator
3) if possible run CAPM by including this indicator
Source of data:
1) Business cycle data is from the National Bureau of Economic Research center.
The original starting date is June 1854.
2) stock data is from Yahoo!Finance.
Comments on your result
References
Bali, Turan G., Nusret Cakici, and Robert F. Whitelaw, 2011, Maxing Out:
Stocks as Lotteries and the Cross-Section of Expected Returns,
Journal of Financial Economics 99, 427-446.
George, Thomas J., and Chuan-Yang Hwang, 2004, The 52-Week High and Momentum
Investing, Journal of Finance 59, 5, 2145-2176.
Grinblatt, Mark, and Bing Han, 2005, Prospect theory, mental accounting,
and momentum, Journal of Financial Economic 78, 311-339.
Jegadeesh, N., and S. Titman, 1993, Returns to Buying Winners and Selling
Losers: Implications for Stock Market Efficiency, Journal of Finance 48, 65-91.
Moskowitz, Tobias, and Mark Grinblatt, 1999, Do industries explain momentum?
Journal of Finance 54, 2017-2069.
Yan, Yuxing and Shaojun Zhang, 2016, Business cycle, investors' preferences
and trading strategies, Frontiers of Business Research in China (forthcoming)
Table 1 from Yan and Zhang(2016)
For a peak, we assign a positive 1 while for a trough, we assign a
negative 1. Any months between those peaks and troughs, we linearly
interpolate, see Panel B below. P for Peak and T for Trough. T(t-1)
is for the previous Trough and P(t-1) is for the previous Peak.
 Contraction Expansion Cycle
 Peak (P) Trough (T) P to T T(t-1) to P T(t-1) to T P(t-1) to P
 ----------- ----------------- ------ ----------- ----------- -----------
May 1923(II) July 1924 (III) 14 22 36 40
October 1926(III) November 1927 (IV) 13 27 40 41
August 1929(III) March 1933 (I) 43 21 64 34
May 1937(II) June 1938 (II) 13 50 63 93
February 1945(I) October 1945 (IV) 8 80 88 93
November 1948(IV) October 1949 (IV) 11 37 48 45
July 1953(II) May 1954 (II) 10 45 55 56
August 1957(III) April 1958 (II) 8 39 47 49
April 1960(II) February 1961 (I) 10 24 34 32
December 1969(IV) November 1970 (IV) 11 106 117 116
November 1973(IV) March 1975 (I) 16 36 52 47
January 1980(I) July 1980 (III) 6 58 64 74
July 1981(III) November 1982 (IV) 16 12 28 18
July 1990(III) March 1991(I) 8 92 100 108
March 2001(I) November 2001 (IV) 8 120 128 128
December 2007(IV) June 2009 (II) 18 73 91 81
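The interpolation scheme described above (+1 at a peak, -1 at a trough, linear in between) can be sketched in R; the turning-point month indices below are hypothetical, not the NBER dates in Table 1:

```r
# Assign +1 to a peak month and -1 to a trough month, then linearly
# interpolate the months in between (month indices are hypothetical)
turning <- data.frame(month = c(5, 19, 32),  # turning-point months
                      value = c(1, -1, 1))   # +1 = peak, -1 = trough
months    <- seq(min(turning$month), max(turning$month))
indicator <- approx(turning$month, turning$value, xout = months)$y
```

The resulting indicator declines linearly from 1 to -1 over a contraction and rises back over an expansion, giving a monthly series that can be merged with stock returns.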
//////////////////////////////////////
"
.C28EXPLAIN17<-"illiquidity measure, Amihud (2002)
//////////////////////////////////////
Objective: estimate the illiquidity measures of 12 stocks for each month in 2016.
Note: you choose the last 6 stocks yourself.
Comments on your findings
The first 6 stock symbols are given below.
Company name Ticker Industry
---------------------- ------ ------------------
1 Microsoft Corporation MSFT Application software
2 Apple Inc. AAPL Personal Computer
3 Citigroup Inc. C Money Center Banks
4 Wal-Mart Stores, Inc. WMT Discount, Variety Stores
5 Home Depot, Inc. HD Home improvement services
.. ................. ... ......
12 General Electric Corp GE Technology
Amihud (2002) illiquidity measure uses the absolute daily return over
its corresponding dollar trading volume. A monthly stock illiquidity
measure is the mean of the daily illiquidity measures.
                1          |Ri|
    illiq(t) = --- * sum( --------- )              (1)
                n          Pi * Vi
where illiq(t) is the monthly illiquidity measure, n is the number of
trading days within the month, Ri is the daily return on day i,
Vi is the trading volume on day i, and Pi is the closing
price of the underlying stock on day i.
The Amihud illiquidity measure includes two components:
spread and the impact of trading. Illiquidity is the
opposite of liquidity, i.e., a higher value indicates lower
liquidity and a smaller value indicates higher liquidity. Why?
Step 1: download daily price data from Yahoo Finance
Step 2: estimate daily returns and dollar trading volume
Step 3: estimate the ratio
Step 4: estimate monthly illiquidity measures
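Steps 2-4 can be sketched in R; the daily price and volume series below are simulated, standing in for the Yahoo!Finance data of Step 1:

```r
# Equation (1) on simulated daily data; in the project, prices and
# volumes come from Yahoo!Finance (Step 1) instead
set.seed(123)
n <- 21                                    # trading days in the month
P <- 100 * cumprod(1 + rnorm(n, 0, 0.01))  # daily closing prices
V <- round(runif(n, 1e5, 1e6))             # daily share volume
R <- P[-1] / P[-n] - 1                     # daily returns, days 2..n
illiq <- mean(abs(R) / (P[-1] * V[-1]))    # monthly Amihud measure
```

Repeating the last two lines month by month and stock by stock gives the 12 monthly series the project asks for.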
References
Amihud, Yakov, 2002, Illiquidity and stock returns: cross-section and time-series effects, Journal of Financial Markets 5, 31-56.
"
.C28EXPLAIN18<-"liquidity measure, Pastor and Stambaugh (2003)
//////////////////////////////////////
Objectives:
1) Understand the logic of the measure
2) Learn how to download and process data from Yahoo!Finance
3) estimate an individual stock's liquidity
Basic logic: Pastor and Stambaugh (2003) design the following
regression to estimate an individual stock's liquidity.
y(t) = alpha + beta1*x1(t-1) + beta2*x2(t-1) + error(t)   (1)
where,y(t) is the excess stock return on day t, the excess return is
defined as R(t)-Rm(t), R(t) is the stock return, Rm(t) is the
market return at time t; x1(t-1) is the lagged stock return,
i.e., R(t-1), x2(t-1) is the lagged dollar trading volume, i.e.,
x2(t-1)=P(t-1)*V(t-1), P(t-1) is the daily closing price of the stock
at t-1 and V(t-1) is the daily trading volume at t-1.
The regression is based on the daily data within each month with a
minimum number of observations of 15. The liquidity measure for an
individual stock in each month is defined as:
liquidity measure=beta2 (2)
For the first trial, we ignore other constraints. Market liquidity
is the equally weighted average of individual stocks' liquidity
measures, scaled by market capitalization.
Procedure:
Step 1: Retrieve daily data
Step 2: generate y, x1, x2 for each stock
Step 3: Run regression (1) to estimate beta2 for each month
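Steps 2-3 can be sketched in R for one stock-month; the daily series below are simulated, standing in for the downloaded data:

```r
# Regression (1) for one stock-month on simulated data; beta2 is
# the liquidity measure defined in Equation (2)
set.seed(1)
n  <- 21
R  <- rnorm(n, 0, 0.02)          # daily stock returns
Rm <- rnorm(n, 0, 0.01)          # daily market returns
P  <- 50 * cumprod(1 + R)        # daily closing prices
V  <- round(runif(n, 1e5, 1e6))  # daily share volume
y  <- (R - Rm)[-1]               # excess return on day t
x1 <- R[-n]                      # lagged stock return
x2 <- (P * V)[-n]                # lagged dollar trading volume
fit   <- lm(y ~ x1 + x2)
beta2 <- coef(fit)['x2']         # monthly liquidity measure
```

With 20 usable daily observations, the month passes the minimum-of-15 screen mentioned above; months with fewer observations would be dropped.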
References
Pastor, L. & Stambaugh, R., 2003, Liquidity risk and expected stock returns.
Journal of Political Economy 111, 642-685.
//////////////////////////////////////
"
.C28EXPLAIN19<-"Spread estimation from TAQ (Trade and Quote) high-frequency data
//////////////////////////////////////
Objectives:
1) Understand the structure of TAQ database
2) Using Excel to retrieve data from one day's data sets
3) estimate the spread for 10 stocks
High-frequency trading has attracted a lot of attention because of its
huge profits, concerns over whether it is fair to small investors, and
its impact on the health of the stock market. According to Investopedia,
HFT (High-Frequency Trading) is 'a program trading platform that uses
powerful computers to transact a large number of orders at very fast
speeds. High-frequency trading uses complex algorithms to analyze
multiple markets and execute orders based on market conditions.
Typically, the traders with the fastest execution speeds will be more
profitable than traders with slower execution speeds.' As of 2009, it
was estimated that more than 50% of exchange volume came from
high-frequency trading orders. To understand HFT, we have to
understand the TAQ (Trade and Quote) database.
Data Sets: November 1, 2004 is randomly selected as our day; see its 4
data sets below. The two index files have the extension '.idx', while
the two data files have the extension '.bin'.
12/01/2004 04:03 PM 1,800,548,334 Q200411a.bin
12/01/2004 04:03 PM 182,424 Q200411a.idx
12/01/2004 04:08 PM 184,899,853 T200411a.bin
12/01/2004 04:08 PM 169,334 T200411a.idx
4 File(s) 1,985,799,945 bytes
References
Philips, Matthew, What Michael Lewis Gets Wrong About High-Frequency Trading, 4/1/2014
http://www.bloomberg.com/bw/articles/2014-04-01/what-michael-lewis-gets-wrong-about-high-frequency-trading
Appendix A: the first several lines of CQ (Consolidated Quotes) data from TAQ
symbol date time bid ofr bidsiz ofrsiz mode EX MMID
A 20040401 8:00:02 30.62 32.64 1 1 12 P
A 20040401 8:11:40 29.68 33.58 20 20 12 P
A 20040401 8:12:56 30.7 33.58 2 20 12 P
A 20040401 8:30:02 0 0 0 0 12 T BRUT
A 20040401 8:30:02 1 100 1 1 12 T CAES
A 20040401 8:30:02 0 0 0 0 12 T DATA
A 20040401 8:30:02 0 0 0 0 12 T MADF
Table 1: Structure of a binary index file. Each index record is 22 bytes with 4 variables.
# Name of the variable Meaning Size Type
1 Ticker Stock symbol 10 Character
2 Date Trading date 4 Integer
3 Begrec Beginning record 4 Integer
4 Endrec Ending record 4 Integer
Table 2: Structure of a binary CT (Consolidated Trade) file. Each trade record is 29 bytes with 8 variables.
# Name of the variable Meaning Size Type
1 Time Trading time 4 Integer
2 Price Trading price 8 Float
3 Tseq Sequence number 4 Integer
4 Size Trading size 4 Integer
5 G127 G127 rule 2 Integer
6 CORR Correction 2 Integer
7 COND Sale condition 4 Character
8 Ex Exchange 1 Character
Table 3: Structure of a binary CQ (Consolidated Quote) file. Each quote record is 39 bytes with 9 variables.
# Name of the variable Meaning Size Type
1 Time Trading time 4 Integer
2 Bid Bid price 8 Float
3 Ofr Ask price 8 Float
4 Qseq Sequence number 4 Integer
5 Bidsiz Bid size 4 Integer
6 Asksiz Ask size 4 Integer
7 MODE quote condition 2 Integer
8 EX Exchange 1 Character
9 MMID NASDAQ market maker 4 Character
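Records with the layout of Table 1 can be read in R with readBin()/readChar(). The sketch below writes one made-up 22-byte index record (Ticker, Date, Begrec, Endrec) to a temporary file and reads it back, standing in for a real '.idx' file:

```r
# Round-trip one 22-byte index record (Table 1): a 10-byte ticker
# padded with spaces, followed by three 4-byte integers
f   <- tempfile()
con <- file(f, 'wb')
writeChar(formatC('IBM', width = -10), con, nchars = 10, eos = NULL)
writeBin(as.integer(c(20041101, 1, 500)), con, size = 4)
close(con)
con <- file(f, 'rb')
ticker <- trimws(readChar(con, nchars = 10))  # Ticker
nums   <- readBin(con, what = 'integer', n = 3, size = 4)
close(con)
```

A real index file is read the same way in a loop, one 22-byte record at a time; Begrec and Endrec then tell you which records to pull from the matching '.bin' file.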
//////////////////////////////////////
"
.C28EXPLAIN20<-"Reverse mortgage calculator
//////////////////////////////////////
EXAMPLE #1: John Bosworth, Age 68
Home Value - $250,000
Home Equity - $210,000
Approximate Mortgage Balance - $40,000
Challenge: John is a widower who lives at home alone. He would like to
keep his home, but is having trouble making payments and meeting expenses.
His monthly mortgage payment is $611. Even with both Social Security
income and his pension, he is still short by $187 per month.
Solution: John takes out a tax-free reverse mortgage for $142,496. He takes a
lump sum of $40,000, applies it to his existing mortgage, and takes the
balance in monthly payments of $681. After paying the mortgage off
entirely, John's monthly income rises by $1,292: the $611 per
month no longer spent on the mortgage payment, plus another $681 from
the reverse mortgage.
EXAMPLE #2
Craig Jenkins, Age 82, and Sylvia Jenkins, Age 79 (reverse mortgages are
calculated using the age of the youngest home owner)
Home Value - $375,000
Home Equity - $375,000
Challenge:
Craig and Sylvia both take medication to stay in good health. The cost
of monthly meds and treatments makes it difficult for them to find the
money needed to maintain the quality of life they once enjoyed.
Solution:
They take out a tax-free reverse mortgage with the option of one lump sum
totaling $218,419, or a monthly income of $1,495. The extra cash flow from
their reverse mortgage more than covers their monthly cost for medication,
and allows Craig and Sylvia more freedom with much less stress.
EXAMPLE #3
Kathy Tobias, Age 63, and Rinaldi Tobias, Age 71 (reverse mortgages are
calculated using the age of the youngest home owner)
Home Value - $165,000
Home Equity - $165,000
Challenge:
Kathy and Rinaldi would like to spend their retirement traveling around
the U.S. in their RV, but don't have the extra money they would need to
help pay for rising gas prices and other added travel expenses.
Solution:
They take out a tax-free reverse mortgage of $82,419. This will give them
an extra $519 per month, which they can use any way they'd like, and more
than covers their need for gas and RV maintenance.
EXAMPLE #4
Gordon Penilla, Age 62, and Joanne Penilla, Age 65 (reverse mortgages are
calculated using the age of the youngest home owner)
Home Value - $850,000
Home Equity - $850,000
Challenge:
Gordon and Joanne have no real debts, and their monthly income is
adequate for them to live life as planned, but they would like to
help out with the cost of college tuition for a grandchild. For that,
their monthly income and savings do not suffice.
Solution:
Gordon and Joanne take out a tax-free reverse mortgage credit line allowing
up to $265,411. Each grandparent can now give the grandchild a monetary
gift up to the amount currently allowed by law.
Note 1: Reverse mortgage proceeds are based upon the current interest rates at the time the
loan closes, the age of the youngest borrower, and the equity in the home. The examples
above are based on an interest rate of 6.26%.
Note 2: Borrowers can lock rates in for 60 days from the date of application to the closing.
All rates adjust weekly, and the rate for closing is determined by the weekly rate
set on Tuesday of each week (excluding federal holidays) and stays valid until the following Monday.
http://www.seacoastreversemortgage.com/loanOptions/Custom%20Pages/Scenario%20Examples/
http://www.kiplinger.com/article/retirement/T035-C000-S001-reverse-mortgages-risky-for-boomers.html
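The payment side of such a calculator can be sketched as an ordinary annuity. This is a simplification: real HECM calculators also use age tables, fees, and expected-rate rules, and the rate, amount, and term below are illustrative only:

```r
# Convert a reverse-mortgage amount into a fixed monthly payment
# (ordinary annuity); all inputs are illustrative, not HECM rules
tenurePayment <- function(principal, annualRate, years) {
  i <- annualRate / 12                 # monthly interest rate
  n <- years * 12                      # number of monthly payments
  principal * i / (1 - (1 + i)^(-n))
}
round(tenurePayment(82419, 0.0626, 30), 2)   # e.g., over 30 years
```

A full term project would invert this logic as well: given the home value, the youngest borrower's age, and the rate, compute the principal limit first and then the monthly payment.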
//////////////////////////////////////
"
.C28EXPLAIN21<-"KMV model and default probability
//////////////////////////////////////
Objective:
1) Estimate market value and its volatility for KMV model
2) estimate default point
3) estimate default probability
KMV stands for Kealhofer, McQuown and Vasicek, who founded a company
focusing on measuring default risk. The KMV methodology is one of the
most important methods for estimating the probability of default of a
given company by using its balance sheet information and equity market
information.
The objective here is to estimate the market value of total assets (A) and
its corresponding volatility (sigmaA). The result will be used to estimate
default distance and default probability.
The basic idea is to treat the equity of a firm as a call option with
the firm's debt as the strike price. Let us look at the simplest
example. For a firm, if its debt is $80 and equity is $20, then the
total assets will be $100. Assume that the assets jump to $110 and the
debt remains the same; the equity increases to $30. On the other hand,
if the assets drop to $90, the equity will be only $10. Since the
equity holders are the residual claimants, their value has the
following expression.
    E = max(assets - debt, 0) = max(A - D, 0)      (1)
Recall that for a call option, we have the following payoff function.
    Payoff(call) = max(ST - K, 0)                  (2)
This means that we could treat equity as a call option with debt as
the exercise price. With appropriate notation, we have the
following formulae for a firm's equity. The KMV model is defined below.
    E = A*N(d1) - D*e^(-rT)*N(d2)
         ln(A/D) + (r + 0.5*sigmaA^2)*T
    d1 = ------------------------------            (3)
               sigmaA * sqrt(T)
    d2 = d1 - sigmaA*sqrt(T)
On the other hand, the following relationship between the volatilities
of the equity and the total assets holds, where the option delta is
delta = dE/dA = N(d1).
              N(d1)*A*sigmaA
    sigmaE = ----------------                      (4)
                     E
Since d1 and d2 are defined by the above equations, we have two equations
for two unknowns (A and sigmaA), see below. Thus, we could use a
trial-and-error or simultaneous-equation method to solve for those
two unknowns. Eventually, we want to solve the following two equations
for A and sigmaA.
    E = A*N(d1) - D*e^(-rT)*N(d2)
              A
    sigmaE = --- * N(d1)*sigmaA                    (5)
              E
Note that the estimated A (market value of total assets) from
Equation (5) is different from the market value of equity plus the
book value of the debt. The two derived values (A and sigmaA) are
used in Equations (6-8).
Here is a KMV example: E=110,688 (shares outstanding * price of stock),
D=64,062 (total debt), Rf=0.07 (risk-free rate), T=1 (1 year).
The published result is A=170,558 and sigmaA=0.29; based on the
codes, we got A=170,393.78 and sigmaA=0.2615. Note that the summation
of the book value of debt and the market value of equity is 174,750,
which differs from the estimated A.
Distance to Default
Distance to default (DD) is defined by the following formula, where A
is the market value of the total assets and sigmaA is its risk.
The interpretation of this measure is clear: the higher the DD, the safer the firm.
A - Default Point
DD= ----------------------- (6)
A *sigmaA
In terms of the Default Point, there is no theoretically fixed default
point. However, we could use all short-term debts plus half of the
long-term debts as our default point. After we have the market value of
assets and its volatility, we could use the following equation to
estimate the Distance to Default; A and sigmaA are the output from
Equation (5). On the other hand, if the default point equals the debt D,
we would have the following formula.
         ln(A/D) + (r - 0.5*sigmaA^2)*T
    DD = -------------------------------           (7)
                sigmaA*sqrt(T)
Note that this DD is the d2 of the option formula evaluated at the
default point.
According to Black-Scholes model, the relationship between DD and Default Probability
is given below.
DP(Default Probability) = N(-DD) (8)
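The trial-and-error solution of Equation (5), followed by Equations (7)-(8), can be sketched in R. E, D, r, and T follow the example above, while sigmaE = 0.4 is an assumed input (the excerpt does not list the equity volatility):

```r
# Solve Equation (5) for A and sigmaA by minimizing scaled squared
# errors, then compute DD and DP via Equations (7)-(8).
# sigmaE = 0.4 is an assumption, not a value given in the text.
E <- 110688; D <- 64062; r <- 0.07; T <- 1
sigmaE <- 0.4
sse <- function(p) {
  A <- p[1]; sigmaA <- p[2]
  d1 <- (log(A / D) + (r + 0.5 * sigmaA^2) * T) / (sigmaA * sqrt(T))
  d2 <- d1 - sigmaA * sqrt(T)
  e1 <- (A * pnorm(d1) - D * exp(-r * T) * pnorm(d2) - E) / E
  e2 <- (A / E) * pnorm(d1) * sigmaA - sigmaE
  e1^2 + e2^2
}
out <- optim(c(E + D, sigmaE * E / (E + D)), sse)   # trial-and-error
A <- out$par[1]; sigmaA <- out$par[2]
DD <- (log(A / D) + (r - 0.5 * sigmaA^2) * T) / (sigmaA * sqrt(T))  # (7)
DP <- pnorm(-DD)                                                    # (8)
```

The starting point (book debt plus market equity for A, a deleveraged sigmaE for sigmaA) is a natural first guess, and the optimizer then pushes both equations' errors toward zero.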
//////////////////////////////////////
"
.C28EXPLAIN22<-"Financial statement analysis
//////////////////////////////////////
Objective:
1) Understand the importance of financial statement analysis
2) understand the definitions of various ratios, such as
Debt/equity ratio, ROE, ROA, DuPont Identity
3) Compare the performance of the firm with itself and with peers
4) Give your recommendations
Note: If you could \"automate\" your process, it will be more
meaningful. For example, suppose you spend one day to finish one
company. How long would it take to finish the next one, or the 10th one?
Here are two potential helps:
1) to get financial statements easily, type
c28
2) you can use a simple macro, such as recording your operations, see
c26
Tool: Excel
Procedure:
1) Download a company's several years' financial statements
2) Conduct analysis such as ratio analysis
3) Compare its performance with itself and with peers
4) Write your comments and recommendation
//////////////////////////////////////
"
.C28EXPLAIN23<-"Black and Litterman model (1992)
//////////////////////////////////////
Objective:
----------
1) Understand the shortcomings of our optimization model
2) understand the contributions of Black and Litterman (1992)
3) using Excel to illustrate a few examples
4) Extension?
Sources
-------
blacklitterman.org
http://blacklitterman.org
Black-Litterman example
http://canisius.edu/~yany/excel/blacklitterman.xlsx
//////////////////////////////////////
"
.C28EXPLAIN24<-"Brandt, Santa-Clara and Valkanov model (2009)
//////////////////////////////////////
Objectives:
----------
1) Understand the traditional Markowitz mean-variance optimization
http://www.effisols.com/basics/MVO.htm
https://en.wikipedia.org/wiki/Modern_portfolio_theory
2) shortcomings and limitations of the current optimization model
3) understand the contributions of Brandt et al. (2009)
Brandt, Santa-Clara, Valkanov approach (2009)
http://www.nber.org/papers/w10996
4) using R to implement their approach
5) Extension
Data sources
-------
1) CRSP
2) Compustat
3) Prof. French's Data Library
Parametric Portfolio Policies: Exploiting Characteristics in the
Cross Section of Equity Returns
Abstract
We propose a novel approach to optimizing portfolios with large numbers of assets.
We model directly the portfolio weight in each asset as a function of the asset's
characteristics. The coefficients of this function are found by optimizing the
investor's average utility of the portfolio's return over the sample period.
Our approach is computationally simple, easily modified and extended, produces
sensible portfolio weights, and offers robust performance in and out of sample.
In contrast, the traditional approach of first modeling the joint distribution
of returns and then solving for the corresponding optimal portfolio weights is
not only difficult to implement for a large number of assets but also yields
notoriously noisy and unstable results. Our approach also provides a new test
of the portfolio choice implications of equilibrium asset pricing models.
We present an empirical implementation for the universe of all stocks in the
CRSP-Compustat dataset, exploiting the size, value, and momentum anomalies.
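A one-characteristic toy version of the policy can be sketched in R on simulated data. The characteristic x and the return process below are made up (the paper uses size, value, and momentum from CRSP-Compustat); weights are modeled as w_i = 1/N + theta*x_i/N, and theta maximizes average CRRA utility in-sample:

```r
# Parametric portfolio policy, one simulated characteristic:
# weights tilt away from 1/N in proportion to x, and theta is
# chosen to maximize the investor's average utility in-sample
set.seed(42)
N <- 50; T <- 60; gamma <- 5          # assets, months, risk aversion
x <- matrix(rnorm(N * T), N, T)       # standardized characteristic
R <- 0.01 + 0.002 * x + matrix(rnorm(N * T, 0, 0.10), N, T)
negU <- function(theta) {
  w  <- 1 / N + theta * x / N         # policy weights each period
  rp <- colSums(w * R)                # portfolio return each period
  -mean((1 + rp)^(1 - gamma) / (1 - gamma))  # negative avg CRRA utility
}
theta <- optimize(negU, c(-10, 10))$minimum
```

Because the simulated characteristic predicts returns positively, the estimated tilt is toward high-x assets; with several characteristics, theta becomes a vector and optim() replaces optimize().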
//////////////////////////////////////
"
.C28EXPLAIN25<-"TORQ database
//////////////////////////////////////
The TORQ database contains transactions, quotes, order processing data and
audit trail data for a sample of 144 NYSE stocks for the three months
from November 1990 through January 1991. This document covers installation,
formatting and use of the data. Conceptual and institutional details
concerning the data are given in a companion publication Hasbrouck
and Sosebee (1992).
These data are distributed for purposes of academic research. No warranty is
made that they are free of errors. The user assumes all responsibility for
the consequences of any errors.
Objectives:
----------
1) Understand TORQ database
2) understand how to retrieve data from the binary data sets
3) illustrate a few applications
4) comments
Sources
-------
Manual,
http://people.stern.nyu.edu/jhasbrou/Research/Working%20Papers/TORQDOC3.PDF
Source of data
http://people.stern.nyu.edu/jhasbrou/Research/Working%20Papers/
//////////////////////////////////////
"
.C28EXPLAIN26<-"SEC filings (dealing with index files)
//////////////////////////////////////
Objectives:
----------
1) Understand the usages of SEC filings
2) understand how to search the SEC EDGAR platform
3) download one quarterly index file and use Excel to explore:
a) how many companies
b) how many CIKs
c) how many forms
d) frequency of those forms
e) others
4) collect all observations related to 10-K filings and
generate an R data set
5) potential applications
Sources
-------
https://www.sec.gov/edgar.shtml
Quarterly index files
https://www.sec.gov/Archives/edgar/full-index/
The first several lines from Q3 2017
-------------------------------------------
Description: Master Index of EDGAR Dissemination Feed by Company Name
Last Data Received: September 30, 2017
Comments: webmaster@sec.gov
Anonymous FTP: ftp://ftp.sec.gov/edgar/
Company Name Form Type CIK Date Filed File Name
---------------------------------------------------------------------------------------------------------------------------------------------
(OurCrowd Investment in MST) L.P. D 1599496 2017-08-24 edgar/data/1599496/0001465818-17-000048.txt
1 800 FLOWERS COM INC 10-K 1084869 2017-09-15 edgar/data/1084869/0001437749-17-015969.txt
1 800 FLOWERS COM INC 3 1084869 2017-07-27 edgar/data/1084869/0001140361-17-028807.txt
1 800 FLOWERS COM INC 3 1084869 2017-07-27 edgar/data/1084869/0001140361-17-028809.txt
1 800 FLOWERS COM INC 3 1084869 2017-07-27 edgar/data/1084869/0001140361-17-028810.txt
1 800 FLOWERS COM INC 3 1084869 2017-07-27 edgar/data/1084869/0001140361-17-028811.txt
1 800 FLOWERS COM INC 3 1084869 2017-07-27 edgar/data/1084869/0001140361-17-028812.txt
1 800 FLOWERS COM INC 3 1084869 2017-07-27 edgar/data/1084869/0001140361-17-028813.txt
1 800 FLOWERS COM INC 3 1084869 2017-07-27 edgar/data/1084869/0001140361-17-028814.txt
1 800 FLOWERS COM INC 3 1084869 2017-07-27 edgar/data/1084869/0001140361-17-028815.txt
1 800 FLOWERS COM INC 3 1084869 2017-07-27 edgar/data/1084869/0001140361-17-028816.txt
Note: combine a) and b) below
a) https://www.sec.gov/Archives/
b) 2017-08-24 edgar/data/1599496/0001465818-17-000048.txt
we have
https://www.sec.gov/Archives/edgar/data/1599496/0001465818-17-000048.txt
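The URL construction in the Note, and the counting in part 3), can be sketched in R; the toy index rows below are illustrative, not a full quarterly file:

```r
# Combine the archive prefix with the File Name field of one
# index record, then count forms and CIKs on toy index rows
prefix   <- 'https://www.sec.gov/Archives/'
fileName <- 'edgar/data/1599496/0001465818-17-000048.txt'
fullURL  <- paste0(prefix, fileName)
idx <- data.frame(form = c('10-K', '3', '3', 'D'),
                  cik  = c(1084869, 1084869, 1084869, 1599496))
table(idx$form)                  # frequency of each form type
length(unique(idx$cik))          # number of distinct CIKs
```

For a real quarterly index, read the downloaded file with readLines() and split the columns before applying the same table()/unique() counts.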
//////////////////////////////////////
"
.C28EXPLAIN27<-"Rattle R package
//////////////////////////////////////
Welcome to Rattle (rattle.togaware.com).
Rattle is a free graphical user interface for Data Science, developed using R.
R is a free software environment for statistical computing, graphics, machine
learning and artificial intelligence. Together Rattle and R provide a
sophisticated environment for data science, statistical analyses, and
data visualization.
See the Help menu for extensive support in using Rattle. The books Data Mining with
Rattle and R and Essential Data Science are available from Amazon. The Togaware
Desktop Data Mining Survival Guide includes Rattle documentation and is available
from datamining.togaware.com
Rattle works with open source R which is limited to datasets and processing
that fit into your computers memory. Further details from
https://docs.microsoft.com/en-us/r-server/
Rattle is licensed under the GNU General Public License, Version 2. Rattle comes
with ABSOLUTELY NO WARRANTY. See Help -> About for details.
Rattle Version 5.1.0. Copyright 2006-2017 Togaware Pty Ltd. Rattle is a registered
trademark of Togaware Pty Ltd. Rattle was created and implemented by Graham Williams
with contributions as acknowledged in 'library(help=rattle)'.
-----------------------------------------
First, install an R package called rattle
>install.packages(\"rattle\")
-----------------------------------------
> library(rattle)
Rattle: A free graphical interface for data science with R.
Version 5.1.0 Copyright (c) 2006-2017 Togaware Pty Ltd.
Type 'rattle()' to shake, rattle, and roll your data.
>
> rattle()
https://cran.r-project.org/web/packages/rattle/vignettes/rattle.pdf
//////////////////////////////////////
"
.C28EXPLAIN28<-"SEC 10-K: BS, IS or CF
//////////////////////////////////////
This is a very interesting project.
If you could generate a BS or IS, it will be more than enough.
Step 1: download all SEC Financial Statements at
https://www.sec.gov/dera/data/financial-statement-data-sets.html
Step 2: unzip one and write a SAS program to retrieve data
Step 3: work on one zip file
Step 4: write SAS programs to generate many
individual SAS data sets, or generate
one big SAS data set
Step 5: Generate your own BS
Method I: download latest several years
BS from Yahoo!Finance
replicate with your data
Method II: generate your own BS
Advantage with your data sets
a) 10 years' data from 2009 to 2018
b) next year, we will have one more year
c) Easily estimate industry means, such as the
                     CA
   current ratio:   ----
                     CL
   CA is the current assets
   CL is the current liabilities
d) you could generate some SAS, R or Python
data sets
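As an R alternative to the SAS steps above, the unzipped text files can be read with read.delim(); the sketch below uses a made-up two-row stand-in for the sub.txt file in each quarterly archive:

```r
# Toy stand-in for sub.txt from an unzipped quarterly archive; a
# real file is tab-delimited and read with read.delim('sub.txt')
sub <- data.frame(adsh = c('0001-17-000001', '0002-17-000002'),
                  form = c('10-K', '10-Q'),
                  cik  = c(1084869, 320193))
tenK <- subset(sub, form == '10-K')   # keep only the 10-K filings
nrow(tenK)
```

Joining the filtered submissions to the numeric file on the accession number (adsh) then yields the line items needed to rebuild a BS or IS.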
//////////////////////////
"
.C28EXPLAIN29<-"SEC Forms 3, 4 and 5
//////////////////////////////////////
What are Forms 3, 4, and 5?
Corporate insiders (meaning a company's officers and directors, and any
beneficial owners of more than ten percent of a class of the company's equity
securities registered under Section 12 of the Securities Exchange Act of 1934)
must file with the SEC a statement of ownership regarding those securities.
On August 27, 2002, the SEC adopted rules and
amendments to Section 16 of the Exchange Act, implementing the provisions of the
Sarbanes-Oxley Act of 2002 that accelerated the deadline for filing most insider
ownership reports.
The initial filing is on Form 3. An insider of an issuer that is registering equity
securities for the first time under Section 12 of the Exchange Act must file this
Form no later than the effective date of the registration statement. If the issuer
is already registered under Section 12, the insider must file a Form 3 within ten
days of becoming an officer, director, or beneficial owner.
Changes in ownership are reported on Form 4 and must be reported to the SEC within
two business days. You can find the limited categories of transactions not subject
to the two-day reporting requirement in the new rule.
Insiders must file a Form 5 to report any transactions that should have been reported
earlier on a Form 4 or were eligible for deferred reporting. If a Form must be filed,
it is due 45 days after the end of the company's fiscal year.
Today, typical financial statement analysis does not take insider
trading into account.
Step 1: download all SEC Financial Statements by using
.dumpSECfinS function from 2009 to 2018
Step 2: unzip one and write a SAS program to retrieve data
Step 3: work on one zip file
Step 4: write SAS programs to generate many
individual SAS data sets for Forms 3, 4 and 5
Step 5: Make your data sets quite user friendly
Advantages with your data sets
a) 10 years' data from 2009 to 2018
b) next year, we will have one more year
c) Easily estimate all insiders' trades
d) you could generate some SAS, R or Python data sets
https://www.sec.gov/fast-answers/answersform345htm.html
//////////////////////////
"
.C28EXPLAIN30<-"SEC 10-K (13-f)
//////////////////////////////////////
What is 13-f?
--------------
Form 13F: Reports Filed by Institutional Investment Managers
An institutional investment manager that uses the U.S. mail (or other means
or instrumentality of interstate commerce) in the course of its business,
and exercises investment discretion over $100 million or more in Section
13(f) securities (explained below) must report its holdings on Form 13F
with the Securities and Exchange Commission (SEC).
In general, an institutional investment manager is: (1) an entity that
invests in, or buys and sells, securities for its own account; or (2)
a natural person or an entity that exercises investment discretion over
the account of any other natural person or entity. Institutional
investment managers can include investment advisers, banks, insurance
companies, broker-dealers, pension funds, and corporations.
Form 13F is required to be filed within 45 days of the end of a calendar
quarter. The Form 13F report requires disclosure of the name of the
institutional investment manager that files the report, and, with respect
to each section 13(f) security over which it exercises investment discretion,
the name and class, the CUSIP number, the number of shares as of the end of
the calendar quarter for which the report is filed, and the total market value.
Today, typical financial statement analysis does not consider the
holdings of financial institutions.
Step 1: download all SEC Financial Statements by using
.dumpSECfinS function from 2009 to 2018
Step 2: unzip one and write a SAS program to retrieve data
Step 3: work on one zip file
Step 4: write SAS programs to generate many
individual SAS data sets for Form 13F filings
Step 5: Make your data sets quite user friendly
Advantages with your data sets
a) 10 years' data from 2009 to 2018
b) next year, we will have one more year
c) Easily estimate all institutional holdings
d) you could generate some SAS, R or Python data sets
https://www.sec.gov/fast-answers/answers-form13fhtm.html
//////////////////////////
"
.C28EXPLAIN31<-"SEC Mutual Fund Prospectus Risk/Return Summary Data Sets
//////////////////////////
The Mutual Fund Prospectus Risk/Return Summary Data Sets provides text and
numeric information extracted from the risk/return summary section of
mutual fund prospectuses. The data is extracted from exhibits to mutual
fund prospectuses tagged in eXtensible Business Reporting Language (XBRL).
The information is presented without change from the \"as filed\" submissions
by each registrant as of the date of the submission. The data is presented
in a flattened format to help users analyze and compare corporate disclosure
information over time and across registrants.
The data sets will be updated quarterly. Data contained in documents filed
after 5:30PM Eastern on the last business day of a quarter will be included
in the subsequent quarterly posting.
https://www.sec.gov/dera/data/mutual-fund-prospectus-risk-return-summary-data-sets
The Mutual Fund Prospectus Risk-Return Summary Data Sets (PDF, 207 kb)
https://www.sec.gov/dera/data/rr1.pdf
provides documentation of scope, organization, file formats and table definitions.
//////////////////////////
"
.C28EXPLAIN32<-"Census Summary Form 1 (SF1)
//////////////////////////
What is SF 1?
Summary File 1 (SF 1) contains the data compiled from the questions asked of
all people and about every housing unit. Population items include sex, age,
race, Hispanic or Latino origin, household relationship, household type,
household size, family type, family size, and group quarters.
Housing items include occupancy status, vacancy status, and tenure (whether
a housing unit is owner-occupied or renter-occupied).
There are 177 population tables (identified with a \"P\") and 58 housing tables
(identified with an \"H\") shown down to the block level; 82 population tables
(identified with a \"PCT\") and 4 housing tables (identified with an \"HCT\")
shown down to the census tract level; and 10 population tables (identified with
a \"PCO\") shown down to the county level, for a total of 331 tables. The SF 1
Urban/Rural Update added 2 PCT tables, increasing the total number to 333 tables.
There are 14 population tables and 4 housing tables shown down to the block level
and 5 population tables shown down to the census tract level that are repeated by
the major race and Hispanic or Latino groups.
SF 1 includes population and housing characteristics for the total population,
population totals for an extensive list of race (American Indian and Alaska
Native tribes, Asian, and Native Hawaiian and Other Pacific Islander) and
Hispanic or Latino groups, and population and housing characteristics for
a limited list of race and Hispanic or Latino groups. Population and housing
items may be cross-tabulated. Selected aggregates and medians also are provided.
A complete listing of subjects in this file is found in the \"Subject Locator\" chapter.
To download all data, type
.dumpCensusSF1
source of data
https://www2.census.gov/census_2010/04-Summary_File_1/
Manual
https://www.census.gov/prod/cen2010/doc/sf1.pdf
//////////////////////////
"
.C28EXPLAIN33<-"Census Summary Form 2 (SF2)
//////////////////////////
What is SF2?
Summary File 2 (SF 2) contains the data compiled from the questions asked of
all people and about every housing unit. SF 2 includes population characteristics,
such as sex, age, average household size, household type, and relationship to
householder such as nonrelative or child. The file includes housing characteristics,
such as tenure (whether a housing unit is owner-occupied or renter-occupied),
age of householder, and household size for occupied housing units.
Selected aggregates and medians also are provided. A complete listing of
subjects in SF 2 is found in Chapter 3, Subject Locator. The layout of the
tables in SF 2 is similar to those in SF 1.
These data are presented in 47 population tables (identified with a \"PCT\")
and 14 housing tables (identified with an \"HCT\") shown down to the census
tract level; and 10 population tables (identified with a \"PCO\") shown
down to the county level, for a total of 71 tables. Each table is iterated
for 331 population groups: the total population, 75 race categories, 114
American Indian and Alaska Native categories (reflecting 60 tribal groupings),
47 Asian categories (reflecting 24 Asian groups), 43 Native Hawaiian and Other
Pacific Islander categories (reflecting 22 Native Hawaiian and Other Pacific
Islander groups) and 51 Hispanic/not Hispanic groups. The presentation of SF 2
tables for any of the 331 population groups is subject to a population threshold
of 100 or more people. That is, if there are fewer than 100 people in a specific
population group in a specific geographic area, their population and housing
characteristics data are not available for that geographic area in SF 2.
To download all data, type
.dumpCensusSF2
Source of data
https://www2.census.gov/census_2010/05-Summary_File_2/
Manual
https://www.census.gov/prod/cen2010/doc/sf2.pdf
//////////////////////////
"
.C28EXPLAIN34<-"Census Demographic profile
//////////////////////////
A short intro
-------------
The Demographic Profile Summary File contains 100-percent data collected
from all people and about every housing unit on topics such as sex, age, race,
Hispanic or Latino origin, household relationship, household type, group
quarters population, housing occupancy, and housing tenure.
GEOGRAPHIC CONTENT
The Demographic Profile Summary File is released as individual files for
the United States, each of the 50 states, the District of Columbia, and
Puerto Rico. The data items are identical for all files, but the geographic
coverage differs.
The summary level sequence chart outlines the hierarchical and geographic
summaries in their entirety.
To download all data, type
---------------------------
.dumpCensusDemographicProfile
Source of the data
---------------------------
https://www2.census.gov/census_2010/03-Demographic_Profile/
Manual
---------------------------
https://www.census.gov/prod/cen2010/doc/dpsf.pdf
Manual about the data structure
---------------------------
https://www2.census.gov/census_2010/03-Demographic_Profile/0README_DPSF.pdf
//////////////////////////
"
.C28EXPLAIN35<-"Census Redistribution (Redistricting Data, P.L. 94-171)
//////////////////////////
To download all data, type
.dumpCensusRedistribution
Source of data
https://www2.census.gov/census_2010/redistricting_file--pl_94-171/
Manual
//////////////////////////
"
.C28EXPLAIN36<-"Census Congressional Districts 113
//////////////////////////
To download all data, type
.dumpCensusCongressionalDistricts113
Source of data
https://www2.census.gov/census_2010/08-SF1_Congressional_Districts_113/
//////////////////////////
"
.C28EXPLAIN37<-"Census Congressional Districts 115
//////////////////////////
To download all data, type
.dumpCensusCongressionalDistricts115
Source of data
https://www2.census.gov/census_2010/08-SF1_Congressional_Districts_115/
//////////////////////////
"
.C28EXPLAIN38<-"Survey of Consumer Finances (SCF)
//////////////////////////
To download all data, type
.dumpSCF
Source of data
https://www.federalreserve.gov/econres/scfindex.htm
//////////////////////////
"
"
1 R package Shiny Ryan, Jiawen, Dan and Dan
2 Business cycle indicator Huanyuan, Yixiang, and Yilin
3 Retirement Calculator Xiaobing and Tian
4 Momentum trading strategy Bhrigu, Samkit and Maharsh
5 Financial Statement Analysis Arunmohan, Darshan, Husham and Heta
6 Black-Litterman Model KANNAN, YUQIONG and JINGYU
7 Which one is the best? Brandon Pritchard
8 Simulation for Black Jack Brandon, Wen-chien, Yujie
9 Predict bankruptcy using Z-score Xiufeng, Chenyu and Wei
10 KMV model & default probability Kunal, Zuzar and Zhoongjian
"
.C28EXPLAIN39<-"Supporting data sets and codes
//////////////////////////////////////
To help students finish various topics, I have generated many
data sets and written some basic functions. Below is a partial
list.
Data Sets for CRSP:
---------
CRSP monthly stock data set
CRSP monthly index data set
CRSP daily stock data sets
CRSP daily index data set
CRSP information data set
CRSP S&P500 add and delete data set
Data Sets for Fama-French factors:
---------
FF3monthly : Fama-French 3 factor monthly data set
FFC4monthly: Fama-French-Carhart 4 factor monthly data set
FF5monthly : Fama-French 5 factor monthly data set
FF3daily : Fama-French 3 factor daily data set
FFC4daily : Fama-French-Carhart 4 factor daily data set
FF5daily : Fama-French 5 factor daily data set
Other data Sets
---------
tradingDaysM: trading days for monthly data set
tradingDaysD: trading days for daily data set
Programs for CRSP:
--------
loadCRSP
explainCRSP
show_crspInfo
show_sp500monthly
show_sp500daily
show_sp500add
showIndexMonthly
showIndexDaily
showStockMonthly
showStockDaily
findInfoGivenTickers
findPERMNOgivenTickers
getCRSPmonthlyGivenPERMNO
getStockMonthlyGivenTickerAddTicker
getCRSPmonthlyIndexRet
getStockMonthlySeveralPERMNOs
getStockMonthlySeveralTickers
oneStockPlusSP500monthly
ewPortfolio
capWeighted
Programs for others:
--------
getYear
getMonth
getDay
saveYan
//////////////////////////////////////
"
.C28EXPLAIN40<-"Projects taken already
//////////////////////////////////////
# Name of the topic Group Date to present
-- ---------------------- --------------------------- ----------------
1 Test of the January Effect Amarendra
2 52-week high trading strategy Steve,Ed,Guodong, Rick
3 Financial statement analysis Hariston, Eric
4 R package quantmod Bennett and Sharan
5 PerformanceAnalytics Hongjian, Chris, ZiAng
//////////////////////////////////////
"
.C28EXPLAIN40_2018<-"Projects taken already (updated on 11/30/2018)
//////////////////////////////////////
# Name of the topic Group Date to present
-- ---------------------- --------------------------- ----------------
1 Market correlations Sam, Valerie,Ryan 12/3
2 VaR estimation using R Mohit, Madhavi,Dhruv Mistry 12/3
3 Fin Statement Analysis Linh, Chandra, Aditya 12/3
4 Black Jack Simulation Karthik,Lakshmi,Priya,Sumaita 12/3
5 KMV default prob. Ankita,Sarang,Nidhi,Sonal 11/26
6 52-w high/low strategy Trusha, Sindhu,Yogesh 12/3
7 Portfolio Analytics packages Nick, Alex 11/26
8 Max trading strategy Saharsh,Alex,Morgan,Gina
9 Bankruptcy prediction Kapil,Will,Prateek,Yousuf,Edward 12/3
10 SEC 10-K filings->BS Bill,Mark 12/3
11 Black-Litterman model Owen, Yongjicai, Zhong 12/3
12 Monte Carlo for slot machine Ruby, Supreet, Ashher 12/3
13 Illiquidity Measure,Amihud Raaj, Jingwei,Hongjun 12/3
14 Retirement calculator Qi,Juncen,Zhenglin 12/3
15 Spread from daily prices Matt, Niveditha 12/3
//////////////////////////////////////
"