Using some series of stock prices, in this notebook we will learn how to …
We will also learn how to … vector or list.
First of all, we use read.csv() to read the following data files:
+ data/IBMStock.csv
+ data/GEStock.csv
+ data/ProcterGambleStock.csv
+ data/CocaColaStock.csv
+ data/BoeingStock.csv

into data frames IBM, GE, PnG, CocaCola and Boeing, and put these data frames in a list object L.
L = list(
  IBM = read.csv("data/IBMStock.csv"),
  GE = read.csv("data/GEStock.csv"),
  PnG = read.csv("data/ProcterGambleStock.csv"),
  CocaCola = read.csv("data/CocaColaStock.csv"),
  Boeing = read.csv("data/BoeingStock.csv"))
We say a list is a collective object because it accommodates more than one sub-element. Collectives are also regarded as …
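To make the idea of a collective concrete, here is a minimal sketch on a toy list. The data frames `a` and `b` and their values are invented for illustration and are not the stock data.

```r
# Toy list of two small data frames (hypothetical values, not the stock data)
toy = list(
  a = data.frame(Date = c("1/1/70", "2/1/70"), StockPrice = c(10, 12)),
  b = data.frame(Date = c("1/1/70", "2/1/70"), StockPrice = c(20, 18)))

length(toy)        # number of elements held by the list
names(toy)         # the element names
toy$a              # extract one element by name with $
toy[["b"]]         # ... or with double brackets
class(toy["b"])    # single brackets return a (shorter) list, not the element
```

The distinction between `[[ ]]` and `[ ]` matters later: functions such as `nrow` expect the data frame itself, not a one-element list wrapping it.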
🌻 lapply(x, fun) applies fun to each element of x and returns the results in a list.
lapply(L, class)
$IBM
[1] "data.frame"
$GE
[1] "data.frame"
$PnG
[1] "data.frame"
$CocaCola
[1] "data.frame"
$Boeing
[1] "data.frame"
🌻 sapply(x, fun) does the same thing as lapply, plus it simplifies the resulting object whenever possible …
sapply(L, class)
IBM GE PnG CocaCola Boeing
"data.frame" "data.frame" "data.frame" "data.frame" "data.frame"
Here it returns a named vector which is simpler than a list.
sapply(L, names)
IBM GE PnG CocaCola Boeing
[1,] "Date" "Date" "Date" "Date" "Date"
[2,] "StockPrice" "StockPrice" "StockPrice" "StockPrice" "StockPrice"
In one line of code, we see that each of these data frames has two columns, Date and StockPrice.
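The two sapply results above have different shapes (a named vector, then a matrix) because the simplification depends on the length of what fun returns. A minimal sketch on toy lists, with values invented for illustration:

```r
x = list(a = 1:3, b = 4:6)      # toy input, not the stock data

r1 = lapply(x, sum)             # lapply always returns a list
r2 = sapply(x, sum)             # length-1 results collapse to a named vector
r3 = sapply(x, range)           # equal-length results become a matrix,
                                # one column per list element

# When the results differ in length, sapply cannot simplify and returns a list
y = list(a = 1:3, b = 4:9)
r4 = sapply(y, function(v) v[v > 2])
```

Because the return type can change with the data, lapply is the safer choice inside code that must always receive a list.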
Besides the built-in functions, we can define our own functions. For example …
L = lapply(L, function(df) {
  df$Date = as.Date(df$Date, format="%m/%d/%y")
  df } )
We define and apply a function that takes a data frame df, converts the Date column and returns the data frame. When we apply it to L, we accomplish 5 date conversion operations in one shot.
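The same pattern can be tried on a small scale. The sketch below uses a toy list with invented dates and prices (an assumption for illustration, not the stock data) and checks the column class before and after the conversion.

```r
# Toy list with dates stored as text in month/day/two-digit-year form
toy = list(
  a = data.frame(Date = c("1/1/70", "3/1/00"), StockPrice = c(10, 12)),
  b = data.frame(Date = c("12/1/99", "6/1/05"), StockPrice = c(20, 18)))

before = sapply(toy, function(df) class(df$Date))   # class before conversion

toy = lapply(toy, function(df) {
  df$Date = as.Date(df$Date, format = "%m/%d/%y")
  df })                         # the last expression is the return value

after = sapply(toy, function(df) class(df$Date))    # class after conversion
toy$a$Date                      # with %y, years 69-99 map to 19xx and 00-68 to 20xx
```

Returning df on the last line is essential: without it the function would return only the result of the assignment, and the list elements would no longer be data frames.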
You will find it easier to answer the following questions with lapply and sapply.
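As a hint for the questions that follow, here is a minimal sketch of the kind of one-liners that work on a list of data frames. The list `toy` and its values are invented for illustration; substitute L to work on the real data.

```r
# Toy list whose Date columns are already of class Date
dates = as.Date(c("1970-01-01", "1970-02-01", "1970-03-01"))
toy = list(
  a = data.frame(Date = dates, StockPrice = c(10, 12, 11)),
  b = data.frame(Date = dates, StockPrice = c(20, 18, 25)))

n  = sapply(toy, nrow)                                    # observations per data frame
y0 = sapply(toy, function(df) format(min(df$Date), "%Y")) # earliest year in each
mu = sapply(toy, function(df) mean(df$StockPrice))        # one statistic per data frame
sm = sapply(toy, function(df) summary(df$StockPrice))     # several statistics at once
```

Swapping mean for min, max, median or sd gives the other summary statistics asked for below.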
§ 1.1 Our five datasets all have the same number of observations. How many observations are there in each data set?
#
#
§ 1.2 What is the earliest year in our datasets?
#
#
§ 1.3 What is the latest year in our datasets?
#
#
§ 1.4 What is the mean stock price of IBM over this time period?
#
#
§ 1.5 What is the minimum stock price of General Electric (GE) over this time period?
#
#
§ 1.6 What is the maximum stock price of Coca-Cola over this time period?
#
#
§ 1.7 What is the median stock price of Boeing over this time period?
#
#
§ 1.8 What is the standard deviation of the stock price of Procter & Gamble over this time period?
#
#
§ 2.1 Around what year did Coca-Cola have its highest stock price in this time period? Around what year did Coca-Cola have its lowest stock price in this time period?
#
#
§ 2.2 In March of 2000, the technology bubble burst, and a stock market crash occurred. According to this plot, which company’s stock dropped more?
#
#
§ 2.3 (a) Around 1983, the stock for one of these companies (Coca-Cola or Procter and Gamble) was going up, while the other was going down. Which one was going up?
#
#
#
#
§ 3.1 Which stock fell the most right after the technology bubble burst in March 2000?
#
#
§ 3.2 Which stock reaches the highest value in the time period 1995-2005?
#
#
§ 3.3 In October of 1997, there was a global stock market crash that was caused by an economic crisis in Asia. Comparing September 1997 to November 1997, which companies saw a decreasing trend in their stock price? (Select all that apply.)
#
#
§ 3.4 In the last two years of this time period (2004 and 2005) which stock seems to be performing the best, in terms of increasing stock price?
#
#
§ 4.1 For IBM, compare the monthly averages to the overall average stock price. In which months has IBM historically had a higher stock price (on average)? Select all that apply.
#
#
§ 4.2 General Electric and Coca-Cola both have their highest average stock price in the same month. Which month is this?
#
#
§ 4.3 For the months of December and January, every company’s average stock is higher in one month and lower in the other. In which month are the stock prices lower?
#
#
By combining tapply and sapply, we can acquire the monthly average stock prices for every stock …
mx = sapply(L, function(df) {
  tapply(df$StockPrice, format(df$Date, "%m"), mean) })
mx
IBM GE PnG CocaCola Boeing
01 150.2 62.05 79.62 60.37 46.51
02 152.7 62.52 79.03 60.73 46.89
03 152.4 63.15 77.35 62.07 46.88
04 152.1 64.48 77.69 62.69 47.05
05 151.5 60.87 77.86 61.44 48.14
06 139.1 56.47 77.39 60.81 47.39
07 139.1 56.73 76.65 58.98 46.55
08 140.1 56.50 76.82 58.88 46.86
09 139.1 56.24 76.62 57.60 46.30
10 137.3 56.24 76.68 57.94 45.22
11 138.0 57.29 78.46 59.10 45.15
12 140.8 59.10 78.30 59.73 46.17
❓ Can you see how it works?
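One way to see how it works is to run the inner tapply call on a single data frame first and then wrap it in sapply. The sketch below does this on toy data; the data frame `df`, the list `toy` and all values are invented for illustration.

```r
df = data.frame(
  Date = as.Date(c("1970-01-01", "1970-02-01", "1971-01-01", "1971-02-01")),
  StockPrice = c(10, 20, 30, 40))

month = format(df$Date, "%m")               # grouping key: the month as text
avg = tapply(df$StockPrice, month, mean)    # mean price within each month

# sapply repeats the same call for every data frame in a list;
# the equal-length results are bound together as columns of a matrix
toy = list(a = df, b = transform(df, StockPrice = StockPrice * 2))
m = sapply(toy, function(d) tapply(d$StockPrice, format(d$Date, "%m"), mean))
m
```

The row names of the result come from the month labels produced by tapply, and the column names come from the names of the list.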
🌻 apply() applies a function to a two-dimensional object in either row or column direction (specified by its second argument; see the online help for details).
apply(mx, 2, which.max)
IBM GE PnG CocaCola Boeing
2 4 1 4 5
So we can answer questions 4.1 and 4.2 in one shot.
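Note that which.max returns row positions rather than month labels, and that a question about months "above average" needs a comparison rather than a maximum. A minimal sketch on a toy matrix `m` (values invented for illustration) shows both, along with the effect of the margin argument:

```r
# Toy matrix: rows are months, columns are two hypothetical stocks
m = matrix(c(5, 9, 7,
             8, 6, 4), nrow = 3,
           dimnames = list(c("01", "02", "03"), c("a", "b")))

pos = apply(m, 2, which.max)     # margin 2: row position of each column's maximum
lab = rownames(m)[pos]           # the same answer expressed as month labels
rmean = apply(m, 1, mean)        # margin 1: the function runs across each row

# Months in which stock "a" sits above the mean of its monthly averages.
# This only approximates a comparison with the overall average when the
# months contain similar numbers of observations.
above = rownames(m)[m[, "a"] > mean(m[, "a"])]
```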
To answer 4.3 we need to compare the monthly averages of January and December for the 5 stocks. We simply use an index that selects the 1st and the 12th rows of mx …
mx[c(1,12),]
IBM GE PnG CocaCola Boeing
01 150.2 62.05 79.62 60.37 46.51
12 140.8 59.10 78.30 59.73 46.17
🌷 This is what we mean by analysis - organizing data in a way that serves our interest.
💡 The apply Family: R's major mechanism for iteration is the apply() family of functions …
■ tapply(x, category, fun) applies fun to x by category
■ lapply(x, fun) applies fun to each element of x and returns a list
■ sapply(x, fun) same as lapply but simplifies the returned object whenever possible
■ apply(x, margin, fun) applies fun to every row (column) of x when margin is set to 1 (2)