AS3-2: Stock Market Dynamics

Using several series of stock prices, in this notebook we will learn how to:

handle time series data
use line plot to visualize sequential data
draw multiple lines in the same plot and
add assisting lines for comparison

We will also learn how to

apply functions to every element of collective objects such as vectors or lists

First of all, we use read.csv() to read the following data files into data frames named IBM, GE, PnG, CocaCola and Boeing, and put these data frames in a list object L:

data/IBMStock.csv
data/GEStock.csv
data/ProcterGambleStock.csv
data/CocaColaStock.csv
data/BoeingStock.csv

L = list(
  IBM = read.csv("data/IBMStock.csv"),
  GE = read.csv("data/GEStock.csv"),
  PnG = read.csv("data/ProcterGambleStock.csv"),
  CocaCola = read.csv("data/CocaColaStock.csv"),
  Boeing = read.csv("data/BoeingStock.csv"))

We say a list is a collective object because it can hold more than one element. Collectives are also called iterables, because we can apply a function to each of their elements in turn. For example:

🌻 lapply(x, fun) applies fun to each element of x and returns the results in a list.

lapply(L, class)

$IBM
[1] "data.frame"

$GE
[1] "data.frame"

$PnG
[1] "data.frame"

$CocaCola
[1] "data.frame"

$Boeing
[1] "data.frame"

🌻 sapply(x, fun) does the same thing as lapply, but it also simplifies the result whenever possible:

sapply(L, class)

         IBM           GE          PnG     CocaCola       Boeing 
"data.frame" "data.frame" "data.frame" "data.frame" "data.frame"

Here it returns a named character vector, which is simpler than a list.
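To see what "simplifies whenever possible" means, here is a small sketch on a made-up list (`toy` is hypothetical, not one of our stock data frames). When every result has length 1, sapply returns a vector; when every result has the same length greater than 1, it returns a matrix (as sapply(L, names) does next); when the lengths differ, it cannot simplify and returns a list, just like lapply.

```r
# Hypothetical toy list whose elements have different lengths
toy <- list(a = 1:3, b = 4:6, c = 7:8)

# Every result has length 1 -> simplified to a named vector
s1 <- sapply(toy, length)
print(s1)

# Every result has length 2 -> simplified to a 2 x 3 matrix
s2 <- sapply(toy, range)
print(s2)

# Results have different lengths (the odd numbers in each element) -> a list
s3 <- sapply(toy, function(v) v[v %% 2 == 1])
print(s3)
```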

sapply(L, names)

     IBM          GE           PnG          CocaCola     Boeing      
[1,] "Date"       "Date"       "Date"       "Date"       "Date"      
[2,] "StockPrice" "StockPrice" "StockPrice" "StockPrice" "StockPrice"

With one line of code, we see that each of these data frames has two columns: Date and StockPrice.

Besides the built-in functions, we can also apply functions we define ourselves. For example:

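# Convert the Date column of every data frame in L from text to Date objects
L = lapply(L, function(df) {
  df$Date = as.Date(df$Date, format = "%m/%d/%y")  # month/day/two-digit year
  df                                               # return the modified data frame
})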

We define and apply a function that takes a data frame df, converts its Date column from text to the Date class, and returns the data frame. Applying it to L with lapply performs all 5 date conversions in one shot.
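One detail worth checking in this conversion is the `%y` code, which reads two-digit years. According to R's strptime documentation, on input the values 00 to 68 get the prefix 20 and the values 69 to 99 get the prefix 19, so a string like 1/1/70 becomes 1970 rather than 2070. The strings below are made-up examples in the same month/day/year layout as the format string, not rows copied from the data files.

```r
# Made-up date strings in the m/d/yy layout assumed by format = "%m/%d/%y"
x <- c("1/1/70", "12/1/99", "1/1/00", "12/1/09", "1/1/68", "1/1/69")

d <- as.Date(x, format = "%m/%d/%y")
print(d)

# Extract the four-digit year to see which century %y chose for each string
yr <- as.integer(format(d, "%Y"))
print(yr)
```

Note how "68" lands in 2068 while "69" lands in 1969; for data that start in 1970 this pivot happens to give the intended century.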

You'll find the following questions easier to answer with lapply and sapply.
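As a sketch of the pattern (not the answers), the block below runs sapply over a hypothetical two-stock list, `L_toy`, with made-up prices. When the applied function returns one number, sapply gives a named vector; when it returns several named statistics, sapply binds them into a matrix with one column per stock.

```r
# Hypothetical mini version of L: two tiny data frames with made-up prices
dates <- as.Date(c("1970-01-01", "1970-02-01", "1970-03-01"))
L_toy <- list(
  A = data.frame(Date = dates, StockPrice = c(10, 12, 14)),
  B = data.frame(Date = dates, StockPrice = c(5, 4, 9)))

# Number of observations in every data frame -> named vector
n_obs <- sapply(L_toy, nrow)
print(n_obs)

# Earliest year in every data frame -> named integer vector
years <- sapply(L_toy, function(df) as.integer(format(min(df$Date), "%Y")))
print(min(years))

# Several statistics per stock -> 4 x 2 matrix, one column per stock
summ <- sapply(L_toy, function(df) {
  c(mean = mean(df$StockPrice), median = median(df$StockPrice),
    sd = sd(df$StockPrice), max = max(df$StockPrice))
})
print(summ)
```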

Section-1 Summary Statistics

§ 1.1 Our five datasets all have the same number of observations. How many observations are there in each data set?

#
#

§ 1.2 What is the earliest year in our datasets?

#
#

§ 1.3 What is the latest year in our datasets?

#
#

§ 1.4 What is the mean stock price of IBM over this time period?

#
#

§ 1.5 What is the minimum stock price of General Electric (GE) over this time period?

#
#

§ 1.6 What is the maximum stock price of Coca-Cola over this time period?

#
#

§ 1.7 What is the median stock price of Boeing over this time period?

#
#

§ 1.8 What is the standard deviation of the stock price of Procter & Gamble over this time period?

#
#

Section-2 Visualizing Stock Dynamics
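The questions in this section are answered by reading line plots of the stock prices. As a hedged sketch of the techniques listed at the start of the notebook (a line plot, several lines in one plot, assisting lines), the block below uses base R graphics on two made-up series. The series, colours and the horizontal mean line are illustrative assumptions; only the March 2000 date comes from the questions.

```r
# Hypothetical toy series: 36 monthly dates and made-up prices for two stocks
dates <- seq(as.Date("1999-01-01"), by = "month", length.out = 36)
price_a <- 50 + 10 * sin(seq_along(dates) / 6)   # made-up values
price_b <- 40 + 0.5 * seq_along(dates)           # made-up values

# Draw on a null PDF device so the sketch also runs without a screen
pdf(NULL)

# type = "l" draws a line plot; ylim leaves room for both series
plot(dates, price_a, type = "l", col = "red",
     ylim = range(c(price_a, price_b)),
     xlab = "Date", ylab = "Stock price")

# lines() adds a second series to the same plot
lines(dates, price_b, col = "blue", lty = 2)

# abline() adds assisting lines: a vertical one at March 2000 ...
abline(v = as.Date("2000-03-01"), col = "gray", lwd = 2)
# ... and a horizontal one at the mean of the first series
abline(h = mean(price_a), lty = 3)

legend("topleft", legend = c("A", "B"), col = c("red", "blue"), lty = c(1, 2))
invisible(dev.off())
```

In an interactive session, drop the pdf(NULL) and dev.off() lines to see the plot on screen.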

§ 2.1 Around what year did Coca-Cola have its highest stock price in this time period? Around what year did Coca-Cola have its lowest stock price in this time period?

#
#

§ 2.2 In March of 2000, the technology bubble burst, and a stock market crash occurred. According to this plot, which company’s stock dropped more?

#
#

§ 2.3 (a) Around 1983, the stock for one of these companies (Coca-Cola or Procter and Gamble) was going up, while the other was going down. Which one was going up?

#
#

§ 2.3 (b) In the time period shown in the plot, which stock generally has lower values?

#
#

Section-3 Visualizing Stock Dynamics 1995-2005
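This section zooms into 1995-2005. One way to restrict a data frame to that window is a logical index on the Date column, which works once Date has been converted with as.Date as above. The sketch uses a hypothetical toy data frame with one made-up price per year.

```r
# Hypothetical toy data frame: one made-up price each January, 1990-2009
df <- data.frame(Date = seq(as.Date("1990-01-01"), by = "year", length.out = 20),
                 StockPrice = 1:20)

# Logical index keeping rows from 1995-01-01 through 2005-12-31
keep <- df$Date >= as.Date("1995-01-01") & df$Date <= as.Date("2005-12-31")
sub <- df[keep, ]
print(range(sub$Date))

# The same filter written with subset(), which evaluates the condition inside df
sub2 <- subset(df, Date >= as.Date("1995-01-01") & Date <= as.Date("2005-12-31"))
```

Putting the same condition inside lapply(L, function(df) ...) would trim all five data frames at once.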

§ 3.1 Which stock fell the most right after the technology bubble burst in March 2000?

#
#

§ 3.2 Which stock reaches the highest value in the time period 1995-2005?

#
#

§ 3.3 In October of 1997, there was a global stock market crash that was caused by an economic crisis in Asia. Comparing September 1997 to November 1997, which companies saw a decreasing trend in their stock price? (Select all that apply.)

#
#

§ 3.4 In the last two years of this time period (2004 and 2005), which stock seems to be performing the best in terms of increasing stock price?

#
#

Section-4 Monthly Trends

§ 4.1 For IBM, compare the monthly averages to the overall average stock price. In which months has IBM historically had a higher stock price (on average)? Select all that apply.

#
#

§ 4.2 General Electric and Coca-Cola both have their highest average stock price in the same month. Which month is this?

#
#

§ 4.3 For the months of December and January, every company's average stock price is higher in one month and lower in the other. In which month are the stock prices lower?

#
#

● NINJA’s DOJO ●

By combining tapply and sapply, we can obtain the monthly average stock prices for every stock:

mx = sapply(L, function(df) {
  tapply(df$StockPrice, format(df$Date, "%m"), mean) })
mx

     IBM    GE   PnG CocaCola Boeing
01 150.2 62.05 79.62    60.37  46.51
02 152.7 62.52 79.03    60.73  46.89
03 152.4 63.15 77.35    62.07  46.88
04 152.1 64.48 77.69    62.69  47.05
05 151.5 60.87 77.86    61.44  48.14
06 139.1 56.47 77.39    60.81  47.39
07 139.1 56.73 76.65    58.98  46.55
08 140.1 56.50 76.82    58.88  46.86
09 139.1 56.24 76.62    57.60  46.30
10 137.3 56.24 76.68    57.94  45.22
11 138.0 57.29 78.46    59.10  45.15
12 140.8 59.10 78.30    59.73  46.17

❓ Can you see how it works?
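As a hint, the inner step can be tried on a single made-up data frame: format(df$Date, "%m") labels every row with its two-digit month, and tapply() averages StockPrice within each label. In the mx code, sapply() repeats this for each data frame in L and binds the resulting 12-element vectors as the columns of a matrix.

```r
# Hypothetical toy data: six monthly prices over two years (made-up values)
df <- data.frame(Date = as.Date(c("2000-01-15", "2000-02-15", "2000-03-15",
                                  "2001-01-15", "2001-02-15", "2001-03-15")),
                 StockPrice = c(10, 20, 30, 14, 22, 36))

# format(..., "%m") turns each date into its two-digit month label
month <- format(df$Date, "%m")
print(month)   # "01" "02" "03" "01" "02" "03"

# tapply() groups StockPrice by month label and averages each group
m <- tapply(df$StockPrice, month, mean)
print(m)
```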

🌻 apply() applies a function to a two-dimensional object along either its rows or its columns (specified by its second argument, MARGIN: 1 for rows, 2 for columns; see the online help for details).

apply(mx, 2, which.max)

     IBM       GE      PnG CocaCola   Boeing 
       2        4        1        4        5

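So, together with the table mx, we can answer questions 4.1 and 4.2 in one shot: which.max gives the peak month of each stock for 4.2, while 4.1 needs the IBM column of mx compared against IBM's overall mean stock price.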
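The MARGIN argument is easiest to see on a tiny made-up matrix whose rows play the role of months and whose columns play the role of stocks, like mx. The built-in constant month.name can turn the row positions returned by which.max into month names, assuming, as in mx, that row i holds month i.

```r
# A small made-up matrix: 3 rows (months) x 2 columns (stocks)
m <- matrix(c(1, 5, 3,
              4, 2, 6), nrow = 3,
            dimnames = list(c("01", "02", "03"), c("X", "Y")))

# MARGIN = 2: which.max down each column -> peak row for each stock
by_col <- apply(m, 2, which.max)
print(by_col)

# Row positions -> month names (valid only because row i is month i)
print(month.name[by_col])

# MARGIN = 1: max across each row -> largest value in each month
by_row <- apply(m, 1, max)
print(by_row)
```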

To answer 4.3, we need to compare the January and December monthly averages of the 5 stocks. We simply use an index that selects the 1st and 12th rows of mx:

mx[c(1,12),]

     IBM    GE   PnG CocaCola Boeing
01 150.2 62.05 79.62    60.37  46.51
12 140.8 59.10 78.30    59.73  46.17
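Instead of reading the two rows by eye, we can compare them element by element. With the real mx in memory this is simply mx[12, ] < mx[1, ]; the self-contained sketch below copies the January and December values from the output above into two named vectors.

```r
# January and December averages, copied from the mx output above
jan <- c(IBM = 150.2, GE = 62.05, PnG = 79.62, CocaCola = 60.37, Boeing = 46.51)
dec <- c(IBM = 140.8, GE = 59.10, PnG = 78.30, CocaCola = 59.73, Boeing = 46.17)

# Element-wise comparison: TRUE where December's average is lower
dec_lower <- dec < jan
print(dec_lower)

# all() summarizes the five comparisons in a single answer
print(all(dec_lower))
```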

🌷 This is what we mean by analysis: organizing data in a way that serves our purpose.

💡 The apply Family:
R's main mechanism for iteration is the apply() family of functions.
■ tapply(x, category, fun) applies fun to the subsets of x defined by category
■ lapply(x, fun) applies fun to each element of x and returns a list
■ sapply(x, fun) is the same as lapply but simplifies the returned object whenever possible
■ apply(x, margin, fun) applies fun to every row (column) of x when margin is set to 1 (2)
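To see the four members side by side, the sketch below runs each of them on one made-up data frame (plus a small matrix for apply). The data are hypothetical; stringsAsFactors = FALSE is set so the group column stays character in any R version.

```r
# One made-up data frame used to try the whole family
df <- data.frame(group = c("a", "a", "b", "b"), value = c(1, 3, 10, 20),
                 stringsAsFactors = FALSE)

# tapply: mean of value within each group -> named array
t_out <- tapply(df$value, df$group, mean)

# lapply: class of every column -> list
l_out <- lapply(df, class)

# sapply: same as lapply, simplified to a named character vector
s_out <- sapply(df, class)

# apply: sum of each row of a numeric matrix (MARGIN = 1)
mat <- matrix(1:6, nrow = 2)
a_out <- apply(mat, 1, sum)

print(t_out); print(l_out); print(s_out); print(a_out)
```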