1. Intro to Basics

Take your first steps with R. In this chapter, you will learn how to use the console as a calculator and how to assign variables. You will also get to know the basic data types in R. Let’s get started.

How it works

In the editor on the right there is already some sample code. Can you see which lines are actual R code and which are comments?

# Calculate 3 + 4
3 + 4
[1] 7
# Calculate 6 + 12
6 + 12
[1] 18

🌻 Type an expression and R prints the result of evaluating it.

Arithmetic with R
# An addition
5 + 5 
[1] 10
# A subtraction
5 - 5 
[1] 0
# A multiplication
3 * 5
[1] 15
# A division
(5 + 5) / 2 
[1] 5


# Exponentiation: 2^5 calculates 2 to the power 5.
2^5
[1] 32
# Modulo: 28 %% 6 calculates 28 modulo 6.
28 %% 6
[1] 4

🌻 Quick-R has a lot of useful information for beginners.

Variable assignment

We can use <- or = to store the result of a computation in a data object (variable).

# Assign the value 42 to x
x <- 42
# Print out the value of the variable x
x
[1] 42
Variable assignment (2)
# Assign the value 5 to the variable my_apples
my_apples <- 5
# Print out the value of the variable my_apples
my_apples
[1] 5
Variable assignment (3)
# Assign a value to the variables my_apples and my_oranges
my_apples <- 5

# Assign to my_oranges the value 6.
my_oranges <- 6

# Add these two variables together
my_apples + my_oranges
[1] 11
# Create the variable my_fruit
my_fruit = my_apples + my_oranges


Apples and oranges

Arithmetic operators work on objects of numeric data types, but not on character objects.
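The point above can be checked with a minimal sketch. The names `apples`, `oranges_text`, `result`, and `total` are illustrative and not part of the exercise. Adding a number to a character string raises an error, which `tryCatch()` captures here, while `as.numeric()` converts text that looks like a number so the addition works.

```r
apples <- 5
oranges_text <- "6"   # a character string, not a number

# Arithmetic on a character value fails; tryCatch() captures the error message
result <- tryCatch(apples + oranges_text,
                   error = function(e) conditionMessage(e))
print(result)

# Converting the text to numeric first makes the addition work
total <- apples + as.numeric(oranges_text)
print(total)
```

Note that `as.numeric("six")` would return `NA` with a warning, so spelled-out words still need to be replaced by hand, as the exercise below does.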

# Assign a value to the variable my_apples
my_apples <- 5 

# Fix the assignment of my_oranges, so that it can be added with `my_apples`
# my_oranges <- "six"
my_oranges <- 6

# Create the variable my_fruit and print it out
my_fruit <- my_apples + my_oranges
my_fruit
[1] 11
Basic datatypes in R
# Change my_numeric to be 42
my_numeric <- 42

# Change my_character to be "universe"
my_character <- "universe"

# Change my_logical to be FALSE
my_logical <- FALSE

🌻 Commonly used data types in R

  • Integer (integer), real number (numeric)
  • Text, string (character)
  • Category (factor)
  • Logical (logical/boolean)
  • Date (Date), date-time (POSIXct, …)
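As a rough illustration of the types listed above, the sketch below builds one example value per type and asks R for its class. The values and the names `examples` and `types` are made up for this illustration.

```r
# One illustrative value for each common type
examples <- list(
  integer   = 5L,                                   # the L suffix makes an integer
  numeric   = 3.14,
  character = "universe",
  factor    = factor(c("low", "high", "low")),
  logical   = TRUE,
  date      = as.Date("2024-01-15"),
  datetime  = as.POSIXct("2024-01-15 08:30:00", tz = "UTC")
)

# class() can return several classes (POSIXct does), so keep only the first one
types <- vapply(examples, function(x) class(x)[1], character(1))
print(types)
```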

What’s that data type?
# Declare variables of different types
my_numeric <- 42
my_character <- "universe"
my_logical <- FALSE 

# Check class of my_numeric
class(my_numeric)
[1] "numeric"
# Check class of my_character
class(my_character)
[1] "character"
# Check class of my_logical
class(my_logical)
[1] "logical"

2. Vectors

A vector is the most basic data structure in R. A vector object is a sequence of values that all share the same data type.

We take you on a trip to Vegas, where you will learn how to analyze your gambling results using vectors in R. After completing this chapter, you will be able to create vectors in R, name them, select elements from them, and compare different vectors.

Create a vector
# Assign the value "Go!" to the variable `vegas`. Remember: R is case sensitive!
vegas <- "Go!"
Create a vector (2)

In R, you create a vector with the combine function c(). You place the vector elements separated by a comma between the parentheses. For example:
numeric_vector <- c(1, 10, 49)
character_vector <- c("a", "b", "c")

🌻 Use the c() function to create a vector.

🌻 All elements of a vector must have the same data type.
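A short sketch of what happens when the inputs to `c()` do not share a type: R does not raise an error but silently coerces everything to one common type. The variable names here are illustrative.

```r
# Mixing numbers, text and logicals: everything becomes character
mixed_chr <- c(1, "a", TRUE)
print(mixed_chr)
print(class(mixed_chr))

# Mixing numbers and logicals: TRUE/FALSE become 1/0
mixed_num <- c(1.5, TRUE, FALSE)
print(mixed_num)
print(class(mixed_num))
```

Because the coercion is silent, a single stray string in a numeric vector is a common reason why arithmetic on that vector suddenly fails.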


vector1 <- c(1.2, 3.5, 15.2)            # `vector1` is a numeric vector
vector2 = c("Alice", "Bob", "Cindy")    # `vector2` is a character vector
# Complete the code so that `boolean_vector` contains the three elements `TRUE`, `FALSE` and `TRUE` (in that order).
boolean_vector <- c(TRUE, FALSE, TRUE)
Create a vector (3)
# Poker winnings (or losses) from Monday to Friday
poker_vector <- c(140, -50, 20, -120, 240)
# Roulette winnings: Monday lost $24, Tuesday lost $50,
# Wednesday won $100, Thursday lost $350, and Friday won $10.
roulette_vector <- c(-24, -50, 100, -350, 10)
Naming vector elements


[1]  140  -50   20 -120  240


# Assign days as names of poker_vector
names(poker_vector) <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
   Monday   Tuesday Wednesday  Thursday    Friday 
      140       -50        20      -120       240 

🌻 As shown above, names make objects (and their elements) easier to interpret.

# Assign days as names of roulette_vector
names(roulette_vector) = c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
   Monday   Tuesday Wednesday  Thursday    Friday 
      -24       -50       100      -350        10 
Naming a vector (2)
# Poker winnings from Monday to Friday
poker_vector <- c(140, -50, 20, -120, 240)

# Roulette winnings from Monday to Friday
roulette_vector <- c(-24, -50, 100, -350, 10)

# The variable days_vector
# If we store the Monday-to-Friday names in `days_vector` first,
days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
# Assign the names of the day to roulette_vector and poker_vector
# the assignments below become more concise
names(poker_vector) <- days_vector
names(roulette_vector) <- days_vector
Calculating total winnings (vector arithmetic)
A_vector <- c(1, 2, 3)
B_vector <- c(4, 5, 6)

# Take the sum of A_vector and B_vector
total_vector <- A_vector + B_vector
# Print out total_vector
total_vector
[1] 5 7 9
Calculating total winnings (2)
# Assign to total_daily how much you won/lost on each day
total_daily <- poker_vector + roulette_vector
   Monday   Tuesday Wednesday  Thursday    Friday 
      116      -100       120      -470       250 
Calculating total winnings (3)

R denotes a function call as function_name(). For example, sum(vector1) returns the sum of all values in vector1. For commonly used built-in R functions, see: Built-in Functions
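A minimal sketch of a few base R function calls on a numeric vector. The vector reuses the `vector1` values shown earlier; the result names are illustrative.

```r
vector1 <- c(1.2, 3.5, 15.2)

total   <- sum(vector1)      # total of all values
average <- mean(vector1)     # arithmetic mean
n       <- length(vector1)   # number of elements
largest <- max(vector1)      # largest value

# Calls can be nested, and arguments can be passed by name
rounded <- round(mean(vector1), digits = 1)

print(c(total = total, average = average, n = n, largest = largest, rounded = rounded))
```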

# Total winnings with poker
total_poker <- sum(poker_vector)

# Total winnings with roulette
total_roulette <- sum(roulette_vector)

# Total winnings overall for the week
total_week <- total_poker + total_roulette

# Print out total_week
total_week
[1] -84
Comparing total winnings
# Calculate total gains for poker and roulette
total_poker <- sum(poker_vector)
total_roulette <-  sum(roulette_vector)

# Check if you realized higher total gains in poker than in roulette
total_poker > total_roulette
[1] TRUE

🌻 Comparison Operators

  • < for less than
  • > for greater than
  • <= for less than or equal to
  • >= for greater than or equal to
  • == for equal to each other
  • != not equal to each other

🌻 A comparison expression evaluates to a logical value: TRUE or FALSE.
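Comparisons also work element by element on whole vectors. The sketch below uses made-up values and illustrative names (`winnings`, `is_win`, `n_wins`).

```r
winnings <- c(140, -50, 20)   # illustrative values

# The comparison is applied to every element and returns a logical vector
is_win <- winnings > 0
print(is_win)
print(class(is_win))

# TRUE counts as 1 and FALSE as 0, so sum() counts the winning entries
n_wins <- sum(is_win)
print(n_wins)
```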

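🌷 Indexing extracts selected elements from an object; R uses [] for indexing.

🌷 Indexing in R is very flexible. There are three ways to index:

  • Positional indexing: [integer vector], e.g. [c(3,5,10)], [2]
  • Name indexing: [character vector], e.g. [c("Monday","Friday")]
  • Logical (conditional) indexing: [logical vector], e.g. [poker_vector > 0]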
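The three indexing styles can be compared side by side. This sketch assumes a small named vector `scores` (a hypothetical name, with the same values as the poker winnings) and selects the same two elements in three different ways.

```r
scores <- c(Monday = 140, Tuesday = -50, Wednesday = 20,
            Thursday = -120, Friday = 240)

by_position  <- scores[c(1, 5)]                # positional: integer vector
by_name      <- scores[c("Monday", "Friday")]  # by name: character vector
by_condition <- scores[scores > 100]           # logical: keeps elements where TRUE

print(by_position)
print(by_name)
print(by_condition)
```

All three calls return Monday and Friday here, which makes it easy to see that the styles are interchangeable when they describe the same elements.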

Vector selection: the good times (positional integer indexing)
# Assign the poker results of Wednesday to the variable poker_wednesday, using index notation `[]`
poker_wednesday <- poker_vector[3]
Vector selection: the good times (2) (positional integer indexing)
# Assign the poker results of Tuesday, Wednesday and Thursday to the variable poker_midweek, using `[c(2,3,4)]`
poker_midweek <- poker_vector[c(2,3,4)]
Vector selection: the good times (3) (positional integer indexing)


# Poker and roulette winnings from Monday to Friday:
# Assign to roulette_selection_vector the roulette results from Tuesday up to Friday; make use of `[2:5]`
roulette_selection_vector <- roulette_vector[2:5]
Vector selection: the good times (4) (name indexing)
# Poker and roulette winnings from Monday to Friday:
# Select poker results by names `[c("Monday", "Tuesday", "Wednesday")]`
poker_start <- poker_vector[c("Monday", "Tuesday", "Wednesday")]
   Monday   Tuesday Wednesday 
      140       -50        20 


# Calculate the average of the elements in poker_start with `mean()`
mean(poker_start)
[1] 36.67
Selection by comparison - Step 1

🌷 Logical (conditional) indexing is the most commonly used kind of indexing.

# Poker and roulette winnings from Monday to Friday:
poker_vector <- c(140, -50, 20, -120, 240)
roulette_vector <- c(-24, -50, 100, -350, 10)
days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
names(poker_vector) <- days_vector
names(roulette_vector) <- days_vector

# Which days did you make money on poker?
selection_vector <- poker_vector > 0
# Print out selection_vector
selection_vector
   Monday   Tuesday Wednesday  Thursday    Friday 
     TRUE     FALSE      TRUE     FALSE      TRUE 

Selection by comparison - Step 2
# Select from poker_vector these days using the indexing vector `[selection_vector]`
poker_winning_days <- poker_vector[selection_vector]
   Monday Wednesday    Friday 
      140        20       240 
Advanced selection
# Poker and roulette winnings from Monday to Friday:
poker_vector <- c(140, -50, 20, -120, 240)
roulette_vector <- c(-24, -50, 100, -350, 10)
days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
names(poker_vector) <- days_vector
names(roulette_vector) <- days_vector

# Which days did you make money on roulette?
selection_vector <- roulette_vector > 0

# Select from roulette_vector these days
roulette_winning_days <- roulette_vector[selection_vector]
Wednesday    Friday 
      100        10 

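3. Matrices

Like a vector, a matrix requires all of its elements to have the same data type. The difference is that a matrix is a two-dimensional data structure: the elements of a vector are arranged in one dimension, while the elements of a matrix are arranged in two. In this chapter we practice defining and using matrices in R.

A matrix is a two-dimensional data object: a collection of elements of the same data type (numeric, character, or logical) arranged into a fixed number of rows and columns.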

You can construct a matrix in R with the matrix() function. Consider the following example: matrix(1:9, byrow = TRUE, nrow = 3)
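To make the effect of `byrow` concrete, the sketch below fills the same six numbers into a 2-row matrix both ways. The names `by_row` and `by_col` are illustrative.

```r
# Fill row by row: 1 2 3 in the first row, 4 5 6 in the second
by_row <- matrix(1:6, byrow = TRUE, nrow = 2)
print(by_row)

# Fill column by column (the default): 1 2 in the first column, and so on
by_col <- matrix(1:6, byrow = FALSE, nrow = 2)
print(by_col)

# Both have the same dimensions: 2 rows, 3 columns
print(dim(by_row))
```

Only `nrow` is given; R works out the number of columns from the length of the data.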

What’s a matrix?


# Construct a matrix with 3 rows containing the numbers 1 up to 9, filled row-wise.
matrix(1:9, byrow = TRUE, nrow = 3)
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9
Analyze matrices, you shall


# Box office Star Wars (in millions!)
new_hope <- c(460.998, 314.4)
empire_strikes <- c(290.475, 247.900)
return_jedi <- c(309.306, 165.8)

# Create box_office
# Concatenate the 3 vectors `c(new_hope, empire_strikes, return_jedi)` 
# Then create a matrix by `matrix()`. Remember to specify `byrow` and `nrow`
box_office <- c(new_hope,empire_strikes,return_jedi)
[1] 461.0 314.4 290.5 247.9 309.3 165.8
# Construct star_wars_matrix
star_wars_matrix <- matrix(box_office, byrow = TRUE, nrow = 3)
# Print out the matrix
star_wars_matrix
      [,1]  [,2]
[1,] 461.0 314.4
[2,] 290.5 247.9
[3,] 309.3 165.8
Naming a matrix


# Box office Star Wars (in millions!)
new_hope <- c(460.998, 314.4)
empire_strikes <- c(290.475, 247.900)
return_jedi <- c(309.306, 165.8)

# Construct matrix
star_wars_matrix <- matrix(c(new_hope, empire_strikes, return_jedi), nrow = 3, byrow = TRUE)

# Vectors region and titles, used for naming
region <- c("US", "non-US")
titles <- c("A New Hope", "The Empire Strikes Back", "Return of the Jedi")
# Name the columns with region with `colnames()`
colnames(star_wars_matrix) = region

# Name the rows with titles  with `rownames()`
rownames(star_wars_matrix) = titles

# Print out star_wars_matrix
                           US non-US
A New Hope              461.0  314.4
The Empire Strikes Back 290.5  247.9
Return of the Jedi      309.3  165.8
Calculating the worldwide box office


# Calculate worldwide box office figures for each movie with `rowSums()`
worldwide_vector <- rowSums(star_wars_matrix) 

             A New Hope The Empire Strikes Back      Return of the Jedi 
                  775.4                   538.4                   475.1 
Adding a column for the Worldwide box office


# Construct worldwide box office vector
# Bind the new variable worldwide_vector as a column to star_wars_matrix with `cbind()`
all_wars_matrix <- cbind(star_wars_matrix, worldwide_vector)

                           US non-US worldwide_vector
A New Hope              461.0  314.4            775.4
The Empire Strikes Back 290.5  247.9            538.4
Return of the Jedi      309.3  165.8            475.1
Adding rows


star_wars_matrix2 = matrix(
  c(474.5,  552.5, 310.7,  338.7, 380.3,  468.5),
  byrow=T, nrow=3)
rownames(star_wars_matrix2) = c(
  "The Phantom Menace","Attack of the Clones",
  "Revenge of the Sith") 
colnames(star_wars_matrix2)=c("US", "non-US")

                        US non-US
The Phantom Menace   474.5  552.5
Attack of the Clones 310.7  338.7
Revenge of the Sith  380.3  468.5


# Combine both Star Wars trilogies in one matrix with `rbind()`
all_wars_matrix <- rbind(star_wars_matrix, star_wars_matrix2) 
                           US non-US
A New Hope              461.0  314.4
The Empire Strikes Back 290.5  247.9
Return of the Jedi      309.3  165.8
The Phantom Menace      474.5  552.5
Attack of the Clones    310.7  338.7
Revenge of the Sith     380.3  468.5
The total box office revenue for the entire saga


# Total revenue for US and non-US with `colSums()`
total_revenue_vector <- colSums(all_wars_matrix)
# Print out total_revenue_vector
total_revenue_vector
    US non-US 
  2226   2088 
Selection of matrix elements (matrix indexing)


# Select the non-US revenue for all movies with index notation `[,2]`
non_us_all <- all_wars_matrix[,2]
# Average non-US revenue with `mean()`
mean(non_us_all)
[1] 348


# Select the non-US revenue for first two movies with index notation `[1:2,2]`
non_us_some <- all_wars_matrix[1:2,2]
# Average non-US revenue for first two movies with `mean()`
mean(non_us_some)
[1] 281.1


A little arithmetic with matrices


# Estimate the visitors, assuming a ticket price of $5
visitors <- all_wars_matrix / 5
# Print the estimate to the console
visitors
                           US non-US
A New Hope              92.20  62.88
The Empire Strikes Back 58.10  49.58
Return of the Jedi      61.86  33.16
The Phantom Menace      94.90 110.50
Attack of the Clones    62.14  67.74
Revenge of the Sith     76.06  93.70
A little arithmetic with matrices (2)


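ticket_prices_matrix <- matrix(
  c(5, 5, 6, 6, 7, 7, 4, 4, 4.5, 4.5, 4.9, 4.9), byrow = TRUE, nrow = 6,
  # reuse the row and column names of all_wars_matrix so the printout matches the output below
  dimnames = dimnames(all_wars_matrix))
ticket_prices_matrix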
                         US non-US
A New Hope              5.0    5.0
The Empire Strikes Back 6.0    6.0
Return of the Jedi      7.0    7.0
The Phantom Menace      4.0    4.0
Attack of the Clones    4.5    4.5
Revenge of the Sith     4.9    4.9


# Estimated number of visitors 
visitors <- all_wars_matrix / ticket_prices_matrix
                            US non-US
A New Hope               92.20  62.88
The Empire Strikes Back  48.41  41.32
Return of the Jedi       44.19  23.69
The Phantom Menace      118.62 138.12
Attack of the Clones     69.04  75.27
Revenge of the Sith      77.61  95.61


# US visitors 
us_visitors <- visitors[,1] 

# Average number of US visitors
mean(us_visitors)
[1] 75.01

4. Factors


Data often falls into a limited number of categories. For example, human hair color can be categorized as black, brown, blond, red, grey, or white—and perhaps a few more options for people who color their hair. In R, categorical data is stored in factors. Factors are very important in data analysis, so start learning how to create, subset, and compare them now.

What’s a factor and why would you use it?
# Assign to variable theory the value "factors"
theory = "factors"
What’s a factor and why would you use it? (2)
# Create `sex_vector`
sex_vector <- c("Male", "Female", "Female", "Male", "Male")
[1] "Male"   "Female" "Female" "Male"   "Male"  



# Convert `sex_vector` to a factor
factor_sex_vector <- factor(sex_vector)

# Print out factor_sex_vector
[1] Male   Female Female Male   Male  
Levels: Female Male


🌻 When a factor is printed, the categories it contains are listed below the values (Levels:).

What’s a factor and why would you use it? (3)


# Animals
animals_vector <- c("Elephant", "Giraffe", "Donkey", "Horse")
factor_animals_vector <- factor(animals_vector)
[1] Elephant Giraffe  Donkey   Horse   
Levels: Donkey Elephant Giraffe Horse


# Temperature
temperature_vector <- c("High", "Low", "High","Low", "Medium")
factor_temperature_vector <- factor(
  temperature_vector, ordered = TRUE,
  levels = c("Low", "Medium", "High"))
[1] High   Low    High   Low    Medium
Levels: Low < Medium < High

🌻 Levels: Low < Medium < High means that High is greater than Medium, and Medium is greater than Low.
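Once the levels are ordered, comparison operators and functions such as `max()` and `sort()` follow that order rather than alphabetical order. The sketch below uses a small illustrative factor `temps`; the name and values are assumptions made for the example.

```r
temps <- factor(c("High", "Low", "Medium"),
                ordered = TRUE,
                levels = c("Low", "Medium", "High"))

# Is the first element (High) greater than the third (Medium)?
high_gt_medium <- temps[1] > temps[3]
print(high_gt_medium)

# max() and sort() respect the level order Low < Medium < High
hottest <- max(temps)
print(hottest)
print(sort(temps))
```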

Factor levels
# Code to build factor_survey_vector
survey_vector <- c("M", "F", "F", "M", "M")
factor_survey_vector <- factor(survey_vector)
[1] M F F M M
Levels: F M


# Specify the levels of factor_survey_vector
levels(factor_survey_vector) <- c("Female","Male")
[1] Male   Female Female Male   Male  
Levels: Female Male
Summarizing a factor

Take a summary() of the survey_vector and factor_survey_vector. Interpret the results of both vectors. Are they both equally useful in this case?


# Generate summary for survey_vector
summary(survey_vector)
   Length     Class      Mode 
        5 character character 
# Generate summary for factor_survey_vector
summary(factor_survey_vector)
Female   Male 
     2      3 
Battle of the sexes
# Build factor_survey_vector with clean levels
survey_vector <- c("M", "F", "F", "M", "M")
factor_survey_vector <- factor(survey_vector)
levels(factor_survey_vector) <- c("Female", "Male")

# Male
male <- factor_survey_vector[1]

# Female
female <- factor_survey_vector[2]

# Battle of the sexes: Male 'larger' than female?
male > female
Warning in Ops.factor(male, female): '>' not meaningful for factors
[1] NA

🌷 Because we did not specify ordered = TRUE when creating factor_survey_vector, its elements cannot be compared by magnitude!

Ordered factors

Define speed_vector as a character vector with 5 entries, one for each analyst. Each entry should be either "slow", "medium", or "fast". Use the list below:

  • Analyst 1 is medium,
  • Analyst 2 is slow,
  • Analyst 3 is slow,
  • Analyst 4 is medium and
  • Analyst 5 is fast.


# Create speed_vector
speed_vector <- c("medium", "slow", "slow", "medium", "fast")
Ordered factors (2)

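Then use factor(..., ordered = TRUE) to convert it into an ordered factor vector.

# Convert speed_vector to ordered factor vector
factor_speed_vector <- factor(
  speed_vector, ordered = TRUE,
  levels = c("slow", "medium", "fast"))

# Print factor_speed_vector
factor_speed_vector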
[1] medium slow   slow   medium fast  
Levels: slow < medium < fast
Comparing ordered factors
# Factor value for second data analyst
da2 <- factor_speed_vector[2]
# Factor value for fifth data analyst
da5 <- factor_speed_vector[5]

# Is data analyst 2 faster than data analyst 5?
da2 > da5


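5. Data Frame

Like a matrix, a data frame is a two-dimensional data structure. However, all elements of a matrix must share the same data type, whereas a data frame has no such restriction: its columns can hold different data types. In later units we will see that the data frame is the most common and most important data structure in business data analysis. This chapter introduces how to define and use data frames in R.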
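The difference between the two structures can be sketched with a tiny example. The names `people_df` and `people_matrix` and their values are made up for illustration. A data frame keeps a separate type per column, while a matrix built from the same values is coerced to a single type.

```r
# Each column of a data frame keeps its own type
people_df <- data.frame(
  name   = c("Alice", "Bob", "Cindy"),
  age    = c(31, 45, 28),
  member = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)
column_types <- sapply(people_df, class)
print(column_types)

# A matrix holds one type only, so the numbers are coerced to character
people_matrix <- cbind(name = c("Alice", "Bob", "Cindy"),
                       age  = c(31, 45, 28))
print(typeof(people_matrix))
```

`stringsAsFactors = FALSE` is spelled out so that the text column stays character on older R versions as well.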


What’s a data frame?
# Print out the built-in data frame mtcars
mtcars


                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
Toyota Corona       21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
Fiat X1-9           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
Porsche 914-2       26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
Volvo 142E          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2


Quick, have a look at your dataset


# Call head() on mtcars
head(mtcars)
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

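🌻 Business data usually comes in tabular form: each row typically represents one unit of analysis (here, a car model), and each column (mpg for fuel consumption, cyl for number of cylinders, …) represents one attribute of that unit.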

Have a look at the structure

str() shows the type of a data object (here, a data frame) and its internal structure.

# Investigate the structure of mtcars with `str()`
str(mtcars)
'data.frame':   32 obs. of  11 variables:
 $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
 $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
 $ disp: num  160 160 108 258 360 ...
 $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
 $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
 $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
 $ qsec: num  16.5 17 18.6 19.4 17 ...
 $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
 $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
 $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
 $ carb: num  4 4 1 1 2 1 4 2 2 4 ...
Creating a data frame


# Definition of vectors
name <- c("Mercury", "Venus", "Earth", 
          "Mars", "Jupiter", "Saturn", 
          "Uranus", "Neptune")
type <- c("Terrestrial planet", 
          "Terrestrial planet", 
          "Terrestrial planet", 
          "Terrestrial planet", "Gas giant", 
          "Gas giant", "Gas giant", "Gas giant")
diameter <- c(0.382, 0.949, 1, 0.532, 
              11.209, 9.449, 4.007, 3.883)
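rotation <- c(58.64, -243.02, 1, 1.03, 
              0.41, 0.43, -0.72, 0.67)
# rings must be defined before data.frame() can use it; values match the output below
rings <- c(FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE)

# Create a data frame from the vectors
planets_df <- data.frame(name, type, diameter, rotation, rings)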
     name               type diameter rotation rings
1 Mercury Terrestrial planet    0.382    58.64 FALSE
2   Venus Terrestrial planet    0.949  -243.02 FALSE
3   Earth Terrestrial planet    1.000     1.00 FALSE
4    Mars Terrestrial planet    0.532     1.03 FALSE
5 Jupiter          Gas giant   11.209     0.41  TRUE
6  Saturn          Gas giant    9.449     0.43  TRUE
7  Uranus          Gas giant    4.007    -0.72  TRUE
8 Neptune          Gas giant    3.883     0.67  TRUE
Creating a data frame (2)
# Check the structure of planets_df with `str()`
str(planets_df)
'data.frame':   8 obs. of  5 variables:
 $ name    : chr  "Mercury" "Venus" "Earth" "Mars" ...
 $ type    : chr  "Terrestrial planet" "Terrestrial planet" "Terrestrial planet" "Terrestrial planet" ...
 $ diameter: num  0.382 0.949 1 0.532 11.209 ...
 $ rotation: num  58.64 -243.02 1 1.03 0.41 ...
 $ rings   : logi  FALSE FALSE FALSE FALSE TRUE TRUE ...
Selection of data frame elements


# Positional indexing: print out the diameter of Mercury (row 1, column 3) with `[1,3]`
planets_df[1, 3]
[1] 0.382
# Positional indexing: print out the data for Mars (entire fourth row) with `[4,]`
planets_df[4, ]
  name               type diameter rotation rings
4 Mars Terrestrial planet    0.532     1.03 FALSE
Selection of data frame elements (2)


# Select first 5 values of diameter column
planets_df[1:5, "diameter"]
[1]  0.382  0.949  1.000  0.532 11.209
Only planets with rings

The $ operator extracts a single column from a data frame. Note that a column of a data frame is itself a vector.
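A minimal sketch of column extraction, assuming a small hypothetical data frame `small_df`. It checks that the extracted column is an ordinary vector and that `$`, `[[ ]]`, and `[, "name"]` return the same thing.

```r
small_df <- data.frame(
  planet   = c("Mercury", "Venus", "Earth"),
  diameter = c(0.382, 0.949, 1),
  stringsAsFactors = FALSE
)

d <- small_df$diameter
print(is.vector(d))   # the column is a plain numeric vector
print(mean(d))        # so vector functions work on it directly

# Three equivalent ways to extract the same column
same_as_double_bracket <- identical(small_df$diameter, small_df[["diameter"]])
same_as_matrix_style   <- identical(small_df$diameter, small_df[, "diameter"])
print(c(same_as_double_bracket, same_as_matrix_style))
```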

# Select the rings variable from planets_df
rings_vector <- planets_df$rings
# Print out rings_vector
rings_vector
Only planets with rings (2)


# Adapt the code to select all columns for planets with rings
planets_df[rings_vector, ]
     name      type diameter rotation rings
5 Jupiter Gas giant   11.209     0.41  TRUE
6  Saturn Gas giant    9.449     0.43  TRUE
7  Uranus Gas giant    4.007    -0.72  TRUE
8 Neptune Gas giant    3.883     0.67  TRUE
Only planets with rings but shorter


# Select planets with diameter < 1 with `subset(df, condition)`
subset(planets_df, subset = diameter < 1)
     name               type diameter rotation rings
1 Mercury Terrestrial planet    0.382    58.64 FALSE
2   Venus Terrestrial planet    0.949  -243.02 FALSE
4    Mars Terrestrial planet    0.532     1.03 FALSE
# The same kind of selection with logical indexing: planets with diameter > 1
planets_df[planets_df$diameter > 1, ]
     name      type diameter rotation rings
5 Jupiter Gas giant   11.209     0.41  TRUE
6  Saturn Gas giant    9.449     0.43  TRUE
7  Uranus Gas giant    4.007    -0.72  TRUE
8 Neptune Gas giant    3.883     0.67  TRUE


# Play around with the `order()` function in the console
[1] 2 4 5 3 1
# The smallest value is `10`, at position 2
# The second smallest is `20`, at position 4
# ...
# The largest value is `50`, at position 1
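The input vector for the `order()` call above is not shown. As a sketch, the vector `a` below is an assumption chosen to be consistent with that output: 50 at position 1, 10 at position 2, and 20 at position 4 come from the comments, while the values 40 and 30 at positions 3 and 5 are invented.

```r
# Hypothetical input: only the values 50, 10 and 20 are taken from the notes
a <- c(50, 10, 40, 20, 30)

# order() returns the positions of the elements from smallest to largest
idx <- order(a)
print(idx)

# Using those positions as an index sorts the vector
sorted_a <- a[idx]
print(sorted_a)

# decreasing = TRUE gives the positions from largest to smallest
idx_desc <- order(a, decreasing = TRUE)
print(idx_desc)
```

The same pattern, `df[order(df$column), ]`, is what sorts the rows of a data frame.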
Sorting your data frame


# Use order() to create order index by diameter
positions <- order(planets_df$diameter)


# Use positions to sort planets_df with `df[order_index,]`
planets_df[positions, ]
     name               type diameter rotation rings
1 Mercury Terrestrial planet    0.382    58.64 FALSE
4    Mars Terrestrial planet    0.532     1.03 FALSE
2   Venus Terrestrial planet    0.949  -243.02 FALSE
3   Earth Terrestrial planet    1.000     1.00 FALSE
8 Neptune          Gas giant    3.883     0.67  TRUE
7  Uranus          Gas giant    4.007    -0.72  TRUE
6  Saturn          Gas giant    9.449     0.43  TRUE
5 Jupiter          Gas giant   11.209     0.41  TRUE

6. Lists


As opposed to vectors, lists can hold components of different types, just as your to-do lists can contain different categories of tasks. This chapter will teach you how to create, name, and subset these lists.

Lists, why would you need them?

Congratulations! At this point in the course you are already familiar with:

  • Vectors (one dimensional array): can hold numeric, character or logical values. The elements in a vector all have the same data type.
  • Matrices (two dimensional array): can hold numeric, character or logical values. The elements in a matrix all have the same data type.
  • Data frames (two-dimensional objects): can hold numeric, character or logical values. Within a column all elements have the same data type, but different columns can be of different data types.

Pretty sweet for an R newbie, right?
Lists, why would you need them? (2)

A list in R is similar to your to-do list at work or school: the different items on that list most likely differ in length, characteristic, and type of activity that has to be done.

A list in R allows you to gather a variety of objects under one name (that is, the name of the list) in an ordered way. These objects can be matrices, vectors, data frames, even other lists, etc. It is not even required that these objects are related to each other in any way.

You could say that a list is some kind of super data type: you can store practically any piece of information in it!
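As a small illustration of this flexibility, the sketch below builds a list whose components differ in type and length and even include another list. The name `todo` and its contents are made up for this example.

```r
todo <- list(
  title   = "groceries",                             # a single string
  amounts = c(2, 5, 1),                              # a numeric vector
  done    = FALSE,                                   # a logical value
  details = list(store = "market", urgent = TRUE)    # a nested list
)

print(length(todo))   # number of top-level components

# Each component keeps its own class
component_classes <- sapply(todo, class)
print(component_classes)

# Components can be reached by name, including nested ones
print(todo$details$store)
print(todo[["amounts"]][2])
```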

Creating a list


# Vector with numerics from 1 up to 10
my_vector <- 1:10 

# Matrix with numerics from 1 up to 9
my_matrix <- matrix(1:9, ncol = 3)

# First 10 elements of the built-in data frame mtcars
my_df <- mtcars[1:10,]

# Construct list with these different elements:
my_list <- list(my_vector,my_matrix,my_df)

 [1]  1  2  3  4  5  6  7  8  9 10

     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9

                   mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
Duster 360        14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
Merc 240D         24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
Merc 230          22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
Merc 280          19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4


Creating a named list


# Adapt list() call to change the components names to `vec`, `mat` and `df` 
my_list <- list(vec=my_vector, mat=my_matrix, df=my_df)

# Print out my_list
 [1]  1  2  3  4  5  6  7  8  9 10

     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9

                   mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
Duster 360        14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
Merc 240D         24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
Merc 230          22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
Merc 280          19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4


     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
Creating a named list (2)

Put the data for one movie ("The Shining") into a list (shining_list).

# The variables mov, act and rev are available
mov="The Shining"
act = c("Jack Nicholson","Shelley Duvall","Danny Lloyd",
        "Scatman Crothers","Barry Nelson")
rev = data.frame(
  scores = c(4.5,4.0,5.0),
  sources = c("IMDb1","IMDb2","IMDb3"),
  comments = c(
    "Best Horror Film I Have Ever Seen",
    "A truly brilliant and scary film from Stanley Kubrick",
    "A masterpiece of psychological horror"))

# Finish the code to build shining_list
shining_list <- list(
  moviename = mov,actors=act,reviews=rev)

[1] "The Shining"

[1] "Jack Nicholson"   "Shelley Duvall"   "Danny Lloyd"      "Scatman Crothers"
[5] "Barry Nelson"    

  scores sources                                              comments
1    4.5   IMDb1                     Best Horror Film I Have Ever Seen
2    4.0   IMDb2 A truly brilliant and scary film from Stanley Kubrick
3    5.0   IMDb3                 A masterpiece of psychological horror
Selecting elements from a list
# Print out the vector representing the actors
shining_list$actors
[1] "Jack Nicholson"   "Shelley Duvall"   "Danny Lloyd"      "Scatman Crothers"
[5] "Barry Nelson"    
# Print the second element of the vector representing the actors
shining_list$actors[2]
[1] "Shelley Duvall"
Creating a new list for another movie

Put the data for another movie ("The Departed") into another list (departed_list).

# define the comments and scores vectors
scores <- c(4.6, 5, 4.8, 5, 4.2)
comments <- c("I would watch it again", "Amazing!", "I liked it", 
              "One of the best movies","Fascinating plot")
movie_title = "The Departed"
movie_actors = c( "Leonardo DiCaprio","Matt Damon","Jack Nicholson",
                  "Mark Wahlberg","Vera Farmiga","Martin Sheen")

# Save the average of the scores vector as avg_review
avg_review = mean(scores)

# Combine scores and comments into the reviews_df data frame
reviews_df = data.frame(scores, comments)

# Create a list, called `departed_list`, 
# that contains the `movie_title`, `movie_actors`, 
# reviews data frame as `reviews_df`, 
# and the average review score as `avg_review`, and print it out.
departed_list = list( 
  movie_title, movie_actors, 
  reviews_df, avg_review)

[1] "The Departed"

[1] "Leonardo DiCaprio" "Matt Damon"        "Jack Nicholson"   
[4] "Mark Wahlberg"     "Vera Farmiga"      "Martin Sheen"     

  scores               comments
1    4.6 I would watch it again
2    5.0               Amazing!
3    4.8             I liked it
4    5.0 One of the best movies
5    4.2       Fascinating plot

[1] 4.72