Take your first steps with R. In this chapter, you will learn how to use the console as a calculator and how to assign variables. You will also get to know the basic data types in R. Let’s get started.
Anything after a # on a line is a comment, which R ignores. Can you tell which lines of the sample code are actual R code and which are comments?
[1] 7
[1] 18
🌻 Input:
[1] 10
[1] 0
[1] 15
[1] 5
Besides the four basic arithmetic operations and parentheses, R has a few other commonly used operators:
[1] 32
[1] 4
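The expressions that produced the outputs above are not shown. The sketch below uses illustrative inputs, chosen so the results line up with those outputs (the original exercise may have used different numbers), to show the basic arithmetic operators together with exponentiation `^` and modulo `%%`:

```r
# Basic arithmetic operators
addition <- 5 + 5        # 10
subtraction <- 5 - 5     # 0
multiplication <- 3 * 5  # 15
division <- (5 + 5) / 2  # 5: parentheses are evaluated first

# Exponentiation: raise a number to a power
power <- 2^5             # 32

# Modulo: the remainder of a division
remainder <- 28 %% 6     # 4

print(c(addition, subtraction, multiplication, division, power, remainder))
```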
🌻 Quick-R has a lot of information that is useful for beginners.
We can use <- or = to store the result of a computation in a variable (a data object).
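A minimal sketch of the two assignment operators; the names x and y are illustrative. It also shows the one place where `=` behaves differently: inside a function call it names an argument rather than creating a variable.

```r
# Both operators store a value under a name
x <- 42
y = 42

# Typing a variable's name prints its stored value
x

# The two variables hold the same value
identical(x, y)

# Inside a function call, `=` matches an argument (here `digits`);
# it does not assign anything
round(3.14159, digits = 2)
```

Most R style guides prefer `<-` for assignment, which keeps the two roles of `=` visually distinct.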
[1] 42
[1] 5
# Assign a value to the variables my_apples and my_oranges
my_apples <- 5
# Assign to my_oranges the value 6.
my_oranges <- 6
# Add these two variables together
my_apples + my_oranges
[1] 11
We can use data objects inside expressions and assign the result of the computation to a new object.
Arithmetic operators work on objects of numeric data types, but not on character data.
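To see why `"six"` had to be replaced, here is a small sketch that triggers the error on purpose and captures it with `tryCatch()` so the script keeps running. The exact message text depends on R's locale; in English it reads "non-numeric argument to binary operator".

```r
my_apples <- 5
my_oranges <- "six"   # a character string, not a number

# Adding a number and a string fails; tryCatch() returns the error message instead
result <- tryCatch(
  my_apples + my_oranges,
  error = function(e) conditionMessage(e)
)
print(result)

# Storing the value as a number fixes the problem
my_oranges <- 6
my_apples + my_oranges   # 11
```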
# Assign a value to the variable my_apples
my_apples <- 5
# Fix the assignment of my_oranges, so that it can be added with `my_apples`
# my_oranges <- "six"
my_oranges <- 6
# Create the variable my_fruit and print it out
my_fruit <- my_apples + my_oranges
my_fruit
[1] 11
# Change my_numeric to be 42
my_numeric <- 42
# Change my_character to be "universe"
my_character <- "universe"
# Change my_logical to be FALSE
my_logical <- FALSE
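The three variables above each hold one of R's basic data types. A quick way to check the type of an object is `class()`:

```r
my_numeric <- 42
my_character <- "universe"
my_logical <- FALSE

# class() reports the data type of each object
class(my_numeric)    # "numeric"
class(my_character)  # "character"
class(my_logical)    # "logical"
```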
🌻 Other commonly used R data types include dates (Date), date-times (POSIXct), …
We take you on a trip to Vegas, where you will learn how to analyze your gambling results using vectors in R. After completing this chapter, you will be able to create vectors in R, name them, select elements from them, and compare different vectors.
In R, you create a vector with the combine function c(). You place the vector elements, separated by commas, between the parentheses. For example, c(140, -50, 20, -120, 240) creates a numeric vector with five elements.
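🌻 Use the c() function to create a vector.
🌻 All elements of a vector share the same data type; if you mix types in c(), R coerces them to a common type (for example, numbers become character strings).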
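A short sketch of what happens when types are mixed inside `c()`: R silently converts every element to the most flexible type present (logical, then numeric, then character), so the result is still a single-type vector.

```r
# Mixing a number, a string and a logical: everything becomes character
mixed <- c(1, "a", TRUE)
mixed          # "1" "a" "TRUE"
class(mixed)   # "character"

# Mixing numbers and logicals: TRUE becomes 1, FALSE becomes 0
num_logical <- c(1.5, TRUE, FALSE)
num_logical          # 1.5 1.0 0.0
class(num_logical)   # "numeric"
```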
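We can use c() to create this vector and then use <- or = to give it a name. Objects are not the only things that can have names: the elements inside a vector can be named too.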
[1] 140 -50 20 -120 240
names() assigns a name to each element of an object (here, poker_vector):
# Assign days as names of poker_vector
names(poker_vector) <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
poker_vector
Monday Tuesday Wednesday Thursday Friday
140 -50 20 -120 240
🌻 As shown above, names make objects (and their elements) easier to interpret.
# Assign days as names of roulette_vector
names(roulette_vector) = c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
roulette_vector
Monday Tuesday Wednesday Thursday Friday
-24 -50 100 -350 10
# Poker winnings from Monday to Friday
poker_vector <- c(140, -50, 20, -120, 240)
# Roulette winnings from Monday to Friday
roulette_vector <- c(-24, -50, 100, -350, 10)
# The variable days_vector
# Store the Monday-to-Friday labels in `days_vector` first, so they can be reused as names
days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
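The vectors above are not yet named. Assuming the point of storing the labels in `days_vector` is to reuse them, a sketch of naming both vectors with the same character vector:

```r
poker_vector <- c(140, -50, 20, -120, 240)
roulette_vector <- c(-24, -50, 100, -350, 10)
days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")

# Reuse one character vector to name both vectors
names(poker_vector) <- days_vector
names(roulette_vector) <- days_vector

poker_vector["Friday"]      # Friday 240
roulette_vector["Thursday"] # Thursday -350
```

Defining the labels once avoids typing the same five strings twice and keeps the two vectors' names consistent.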
A_vector <- c(1, 2, 3)
B_vector <- c(4, 5, 6)
# Take the sum of A_vector and B_vector
total_vector <- A_vector + B_vector
# Print out total_vector
total_vector
[1] 5 7 9
# Assign to total_daily how much you won/lost on each day
total_daily <- poker_vector + roulette_vector
total_daily
Monday Tuesday Wednesday Thursday Friday
116 -100 120 -470 250
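In R, a function is called with the syntax function_name(). For example, sum(vector1) returns the sum of all the values in vector1. For a list of commonly used built-in R functions, see: Built-in Functions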
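A few other built-in functions follow the same `function_name(vector)` pattern. A sketch using the poker results:

```r
poker_vector <- c(140, -50, 20, -120, 240)

sum(poker_vector)     # 230: total winnings
mean(poker_vector)    # 46: average per day
max(poker_vector)     # 240: best day
min(poker_vector)     # -120: worst day
length(poker_vector)  # 5: number of days
```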
# Total winnings with poker: how much did we win playing poker?
total_poker <- sum(poker_vector)
# Total winnings with roulette: how much did we win playing roulette?
total_roulette <- sum(roulette_vector)
# Total winnings overall: the net result for the week
total_week <- total_poker + total_roulette
# Print out total_week
total_week
[1] -84
# Calculate total gains for poker and roulette
total_poker <- sum(poker_vector)
total_roulette <- sum(roulette_vector)
# Check if you realized higher total gains in poker than in roulette
total_poker > total_roulette
[1] TRUE
🌻 Comparison operators:
< for less than
> for greater than
<= for less than or equal to
>= for greater than or equal to
== for equal to each other
!= for not equal to each other
🌻 The result of a comparison expression is a logical value: TRUE or FALSE.
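Applied to a vector, a comparison operator works element by element and returns a logical vector of the same length. A short sketch:

```r
poker_vector <- c(140, -50, 20, -120, 240)

# One TRUE/FALSE per element
poker_vector > 0     # TRUE FALSE TRUE FALSE TRUE
poker_vector == 20   # FALSE FALSE TRUE FALSE FALSE
poker_vector != 20   # TRUE TRUE FALSE TRUE TRUE

# TRUE counts as 1 and FALSE as 0, so sum() counts the winning days
sum(poker_vector > 0)  # 3
```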
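🌷 Use square brackets [] for indexing.
🌷 R's indexing is very flexible; there are three ways to index: by position, by name, and by condition (a logical vector). Positions can be written in the form a:b; for example, 2:6 stands for c(2, 3, 4, 5, 6).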
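A minimal sketch of the three indexing styles on the named poker vector:

```r
poker_vector <- c(140, -50, 20, -120, 240)
names(poker_vector) <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")

# 1. By position: 2:4 is the same as c(2, 3, 4)
poker_vector[2:4]                    # Tuesday -50, Wednesday 20, Thursday -120

# 2. By name
poker_vector[c("Monday", "Friday")]  # Monday 140, Friday 240

# 3. By condition: a logical vector of the same length keeps the TRUE positions
poker_vector[poker_vector > 0]       # Monday 140, Wednesday 20, Friday 240
```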
# Poker and roulette winnings from Monday to Friday:
# Select poker results by names `[c("Monday", "Tuesday", "Wednesday")]`
poker_start <- poker_vector[c("Monday", "Tuesday", "Wednesday")]
poker_start
Monday Tuesday Wednesday
140 -50 20
mean() computes the average of all the values in a numeric vector:
[1] 36.67
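The call that printed 36.67 is not shown. Assuming it was `mean()` applied to `poker_start` from the previous example, a sketch:

```r
poker_vector <- c(140, -50, 20, -120, 240)
names(poker_vector) <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
poker_start <- poker_vector[c("Monday", "Tuesday", "Wednesday")]

# (140 - 50 + 20) / 3
avg_midweek <- mean(poker_start)
round(avg_midweek, 2)   # 36.67
```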
# Poker and roulette winnings from Monday to Friday:
poker_vector <- c(140, -50, 20, -120, 240)
roulette_vector <- c(-24, -50, 100, -350, 10)
days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
names(poker_vector) <- days_vector
names(roulette_vector) <- days_vector
# Which days did you make money on poker?
selection_vector <- poker_vector > 0
# Print out selection_vector
selection_vector
Monday Tuesday Wednesday Thursday Friday
TRUE FALSE TRUE FALSE TRUE
# Select from poker_vector these days using the indexing vector `[selection_vector]`
poker_winning_days <- poker_vector[selection_vector]
poker_winning_days
Monday Wednesday Friday
140 20 240
# Poker and roulette winnings from Monday to Friday:
poker_vector <- c(140, -50, 20, -120, 240)
roulette_vector <- c(-24, -50, 100, -350, 10)
days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
names(poker_vector) <- days_vector
names(roulette_vector) <- days_vector
# Which days did you make money on roulette?
selection_vector <- roulette_vector > 0
# Select from roulette_vector these days
roulette_winning_days <- roulette_vector[selection_vector]
roulette_winning_days
Wednesday Friday
100 10
A matrix is a two-dimensional data object: a collection of elements of the same data type (numeric, character, or logical) arranged into a fixed number of rows and columns.
You can construct a matrix in R with the matrix() function. Consider the following example: matrix(1:9, byrow = TRUE, nrow = 3)
matrix() turns a one-dimensional vector into a two-dimensional matrix.
# Construct a matrix with 3 rows containing the numbers 1 up to 9, filled row-wise.
matrix(1:9, byrow = TRUE, nrow = 3)
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 4 5 6
[3,] 7 8 9
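The example above sets `byrow = TRUE`. A sketch of what the default (`byrow = FALSE`) does with the same numbers, for comparison:

```r
# Default: values fill the matrix column by column
by_col <- matrix(1:9, nrow = 3)
by_col
#      [,1] [,2] [,3]
# [1,]    1    4    7
# [2,]    2    5    8
# [3,]    3    6    9

# byrow = TRUE fills row by row instead
by_row <- matrix(1:9, byrow = TRUE, nrow = 3)

# For a square matrix the two results are transposes of each other
identical(t(by_col), by_row)   # TRUE
```

Writing `TRUE` rather than the shorthand `T` is safer, because `T` is an ordinary variable that can be overwritten.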
Build a box-office matrix for the first three Star Wars films:
# Box office Star Wars (in millions!)
new_hope <- c(460.998, 314.4)
empire_strikes <- c(290.475, 247.900)
return_jedi <- c(309.306, 165.8)
# Create box_office
# Concatenate the 3 vectors `c(new_hope, empire_strikes, return_jedi)`
# Then create a matrix by `matrix()`. Remember to specify `byrow` and `nrow`
box_office <- c(new_hope,empire_strikes,return_jedi)
box_office
[1] 461.0 314.4 290.5 247.9 309.3 165.8
# Construct star_wars_matrix
star_wars_matrix <- matrix(box_office, byrow = TRUE, nrow = 3)
# print out the matrix
star_wars_matrix
[,1] [,2]
[1,] 461.0 314.4
[2,] 290.5 247.9
[3,] 309.3 165.8
Each row and each column of a matrix can have a name.
# Box office Star Wars (in millions!)
new_hope <- c(460.998, 314.4)
empire_strikes <- c(290.475, 247.900)
return_jedi <- c(309.306, 165.8)
# Construct matrix
star_wars_matrix <- matrix(c(new_hope, empire_strikes, return_jedi), nrow = 3, byrow = TRUE)
# Vectors region and titles, used for naming
region <- c("US", "non-US")
titles <- c("A New Hope", "The Empire Strikes Back", "Return of the Jedi")
# Name the columns with region with `colnames()`
colnames(star_wars_matrix) = region
# Name the rows with titles with `rownames()`
rownames(star_wars_matrix) = titles
# Print out star_wars_matrix
star_wars_matrix
US non-US
A New Hope 461.0 314.4
The Empire Strikes Back 290.5 247.9
Return of the Jedi 309.3 165.8
Use rowSums() to compute the worldwide box office of the first three Star Wars films:
# Calculate worldwide box office figures for each movie with `rowSums()`
worldwide_vector <- rowSums(star_wars_matrix)
worldwide_vector
A New Hope The Empire Strikes Back Return of the Jedi
775.4 538.4 475.1
cbind() combines objects column-wise. Use it to append the worldwide box-office vector worldwide_vector to the box-office matrix as a new column, creating all_wars_matrix:
# Construct worldwide box office vector
# Bind the new variable worldwide_vector as a column to star_wars_matrix with `cbind()`
all_wars_matrix <- cbind(star_wars_matrix, worldwide_vector)
all_wars_matrix
US non-US worldwide_vector
A New Hope 461.0 314.4 775.4
The Empire Strikes Back 290.5 247.9 538.4
Return of the Jedi 309.3 165.8 475.1
rbind() combines objects row-wise. First, build a box-office matrix (star_wars_matrix2) for the next three Star Wars films:
star_wars_matrix2 = matrix(
c(474.5, 552.5, 310.7, 338.7, 380.3, 468.5),
byrow = TRUE, nrow = 3)
rownames(star_wars_matrix2) = c(
"The Phantom Menace","Attack of the Clones",
"Revenge of the Sith")
colnames(star_wars_matrix2)=c("US", "non-US")
star_wars_matrix2
US non-US
The Phantom Menace 474.5 552.5
Attack of the Clones 310.7 338.7
Revenge of the Sith 380.3 468.5
Then use rbind() to stack it below star_wars_matrix, creating all_wars_matrix:
# Combine both Star Wars trilogies in one matrix with `rbind()`
all_wars_matrix <- rbind(star_wars_matrix, star_wars_matrix2)
all_wars_matrix
US non-US
A New Hope 461.0 314.4
The Empire Strikes Back 290.5 247.9
Return of the Jedi 309.3 165.8
The Phantom Menace 474.5 552.5
Attack of the Clones 310.7 338.7
Revenge of the Sith 380.3 468.5
Use colSums() to compute the total US and non-US box office of the Star Wars series:
# Total revenue for US and non-US with `colSums()`
total_revenue_vector <- colSums(all_wars_matrix)
# Print out total_revenue_vector
total_revenue_vector
US non-US
2226 2088
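To index a two-dimensional object, use the form [row_index, column_index] to specify which rows and columns to extract; leaving an index empty selects all rows (or all columns).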
# Select the non-US revenue for all movies with index notation `[,2]`
non_us_all <- all_wars_matrix[,2]
# Average non-US revenue with `mean()`
mean(non_us_all)
[1] 348
This is the average non-US box office of the Star Wars films (in millions).
# Select the non-US revenue for first two movies with index notation `[1:2,2]`
non_us_some <- all_wars_matrix[1:2,2]
# Average non-US revenue for first two movies with `mean()`
mean(non_us_some)
[1] 281.1
This is the average non-US box office of the first two Star Wars films (in millions).
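Assuming the ticket price is $5 for every film in every region, we can estimate each film's number of visitors from the box-office matrix (in millions, like the box-office figures):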
# Estimate the visitors, assuming ticket price is $5
visitors <- all_wars_matrix/5
# Print the estimate to the console
visitors
US non-US
A New Hope 92.20 62.88
The Empire Strikes Back 58.10 49.58
Return of the Jedi 61.86 33.16
The Phantom Menace 94.90 110.50
Attack of the Clones 62.14 67.74
Revenge of the Sith 76.06 93.70
If ticket prices are not the same for every film, we first build a ticket-price matrix (ticket_prices_matrix) with one price per film and region:
ticket_prices_matrix = matrix(
c(5, 5, 6, 6, 7, 7, 4, 4, 4.5, 4.5, 4.9, 4.9), byrow = TRUE, nrow = 6,
dimnames = list(
rownames(all_wars_matrix),
colnames(all_wars_matrix)))
ticket_prices_matrix
US non-US
A New Hope 5.0 5.0
The Empire Strikes Back 6.0 6.0
Return of the Jedi 7.0 7.0
The Phantom Menace 4.0 4.0
Attack of the Clones 4.5 4.5
Revenge of the Sith 4.9 4.9
Then estimate the number of visitors for each film in each region:
US non-US
A New Hope 92.20 62.88
The Empire Strikes Back 48.41 41.32
Return of the Jedi 44.19 23.69
The Phantom Menace 118.62 138.12
Attack of the Clones 69.04 75.27
Revenge of the Sith 77.61 95.61
What is the average number of US visitors across the Star Wars films?
[1] 75.01
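The code behind the visitor table and the 75.01 result is not shown. A sketch, assuming visitors are estimated by element-wise division of the revenue matrix by the ticket-price matrix (both rebuilt here so the block runs on its own):

```r
films <- c("A New Hope", "The Empire Strikes Back", "Return of the Jedi",
           "The Phantom Menace", "Attack of the Clones", "Revenge of the Sith")

# Box office in millions: US and non-US revenue per film
all_wars_matrix <- matrix(
  c(460.998, 314.4, 290.475, 247.9, 309.306, 165.8,
    474.5, 552.5, 310.7, 338.7, 380.3, 468.5),
  byrow = TRUE, nrow = 6, dimnames = list(films, c("US", "non-US")))

# One ticket price per film (the same in both regions)
ticket_prices_matrix <- matrix(
  c(5, 5, 6, 6, 7, 7, 4, 4, 4.5, 4.5, 4.9, 4.9),
  byrow = TRUE, nrow = 6, dimnames = dimnames(all_wars_matrix))

# `/` on two matrices of the same shape divides cell by cell
visitors <- all_wars_matrix / ticket_prices_matrix

# Average US visitors (in millions) across the six films
us_visitors <- visitors[, "US"]
round(mean(us_visitors), 2)   # 75.01
```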
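Some data, such as a customer's sex ("male", "female") or mode of transport ("car", "train", "plane"), is recorded as text, but its values are limited to a fixed set of categories rather than arbitrary strings. In programming languages this kind of data is called categorical data, and in R it is stored as a factor. Here we first introduce how to define and use factors in R.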
Data often falls into a limited number of categories. For example, human hair color can be categorized as black, brown, blond, red, grey, or white—and perhaps a few more options for people who color their hair. In R, categorical data is stored in factors. Factors are very important in data analysis, so start learning how to create, subset, and compare them now.
[1] "Male" "Female" "Female" "Male" "Male"
sex_vector is a character vector.
The factor() function converts a character or numeric vector into a factor.
# Convert `sex_vector` to a factor
factor_sex_vector <- factor(sex_vector)
# Print out factor_sex_vector
factor_sex_vector
[1] Male Female Female Male Male
Levels: Female Male
After the conversion, factor_sex_vector is a factor.
🌻 When a factor is printed, a Levels: line below the values lists the categories it contains.
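In an ordinary (nominal) factor, the categories have no order. Although R lists the levels in alphabetical order when printing, that order does not imply that one category is larger than another.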
# Animals
animals_vector <- c("Elephant", "Giraffe", "Donkey", "Horse")
factor_animals_vector <- factor(animals_vector)
factor_animals_vector
[1] Elephant Giraffe Donkey Horse
Levels: Donkey Elephant Giraffe Horse
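If we want the categories to be ordered, we need to pass ordered = TRUE when calling factor(), and list the levels from lowest to highest with the levels argument.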
# Temperature
temperature_vector <- c("High", "Low", "High","Low", "Medium")
factor_temperature_vector <- factor(
temperature_vector, ordered = TRUE,
levels = c("Low", "Medium", "High"))
factor_temperature_vector
[1] High Low High Low Medium
Levels: Low < Medium < High
🌻 Levels: Low < Medium < High means that High is greater than Medium, and Medium is greater than Low.
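Because this factor is ordered, comparison operators and functions such as `max()` follow the level order. A short sketch:

```r
temperature_vector <- c("High", "Low", "High", "Low", "Medium")
factor_temperature_vector <- factor(
  temperature_vector, ordered = TRUE,
  levels = c("Low", "Medium", "High"))

# Compare two elements: High > Low
factor_temperature_vector[1] > factor_temperature_vector[2]   # TRUE

# Compare every element with one level
factor_temperature_vector >= "Medium"   # TRUE FALSE TRUE FALSE TRUE

# max() respects the level order
max(factor_temperature_vector)          # High
```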
# Code to build factor_survey_vector
survey_vector <- c("M", "F", "F", "M", "M")
factor_survey_vector <- factor(survey_vector)
factor_survey_vector
[1] M F F M M
Levels: F M
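levels() can be used to rename the categories (levels). The new names are assigned in the order of the existing levels, which here is alphabetical (F, then M), so "Female" replaces F and "Male" replaces M.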
# Specify the levels of factor_survey_vector
levels(factor_survey_vector) <- c("Female","Male")
factor_survey_vector
[1] Male Female Female Male Male
Levels: Female Male
Take a summary() of survey_vector and factor_survey_vector. Interpret the results for both vectors. Are they equally useful in this case?
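For the factor factor_survey_vector, summary() counts how many observations fall into each category; for the character vector survey_vector it only reports the length, class, and mode, which is not useful here.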
Length Class Mode
5 character character
Female Male
2 3
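The calls behind the two summaries above are not shown; a sketch that reproduces them, plus `table()`, which counts categories for a character vector as well:

```r
survey_vector <- c("M", "F", "F", "M", "M")
factor_survey_vector <- factor(survey_vector)
levels(factor_survey_vector) <- c("Female", "Male")

# Character vector: only length, class and mode
summary(survey_vector)

# Factor: number of observations per level
summary(factor_survey_vector)   # Female 2, Male 3

# table() produces counts for either kind of vector
table(survey_vector)            # F 2, M 3
```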
# Build factor_survey_vector with clean levels
survey_vector <- c("M", "F", "F", "M", "M")
factor_survey_vector <- factor(survey_vector)
levels(factor_survey_vector) <- c("Female", "Male")
# Male
male <- factor_survey_vector[1]
# Female
female <- factor_survey_vector[2]
# Battle of the sexes: Male 'larger' than female?
male > female
Warning in Ops.factor(male, female): '>' not meaningful for factors
[1] NA
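🌷 Because we did not specify ordered = TRUE when creating factor_survey_vector, its elements cannot be compared by size: R warns that '>' is not meaningful for factors and returns NA.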
Define speed_vector as a character vector with 5 entries, one for each analyst. Each entry should be either “slow”, “medium”, or “fast”: analyst 1 is medium, analysts 2 and 3 are slow, analyst 4 is medium, and analyst 5 is fast.
We use speed_vector to record the working speed of the five analysts, and then use factor(…, ordered = TRUE) to convert it into an ordered (ordinal) factor.
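# Speed of each analyst: medium, slow, slow, medium, fast
speed_vector <- c("medium", "slow", "slow", "medium", "fast")
# Convert speed_vector to ordered factor vector
factor_speed_vector <- factor(
speed_vector, levels = c("slow", "medium", "fast"), ordered = TRUE)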
# Print factor_speed_vector
factor_speed_vector
[1] medium slow slow medium fast
Levels: slow < medium < fast
Like a matrix, a data frame is a two-dimensional structure.
R has a built-in data frame, mtcars, shown below, which records various attributes of 32 car models:
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
The definition of each attribute is documented in R's help page, ?mtcars.
Look at the first few records of this data frame:
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
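The six rows above match the default output of `head()`. A sketch of `head()` and a few related helpers for getting a first look at a data frame (mtcars ships with R's datasets package, so nothing needs to be loaded):

```r
head(mtcars)          # first 6 rows (the default)
head(mtcars, n = 3)   # first 3 rows
tail(mtcars, n = 2)   # last 2 rows

# Dimensions: 32 rows (car models) and 11 columns (attributes)
dim(mtcars)
```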
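🌻 Business data usually comes in tabular form: each row typically represents one unit of analysis (here, a car model), and each column (mpg, miles per gallon; cyl, number of cylinders; …) represents one attribute of that unit.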
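str() shows an object's type (here, a data.frame) and its internal structure: the number of observations and variables, and the type and first values of each column.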
'data.frame': 32 obs. of 11 variables:
$ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
$ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
$ disp: num 160 160 108 258 360 ...
$ hp : num 110 110 93 110 175 105 245 62 95 123 ...
$ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
$ wt : num 2.62 2.88 2.32 3.21 3.44 ...
$ qsec: num 16.5 17 18.6 19.4 17 ...
$ vs : num 0 0 1 1 0 1 0 1 1 1 ...
$ am : num 1 1 1 0 0 0 0 0 0 0 ...
$ gear: num 4 4 4 3 3 3 3 4 4 4 ...
$ carb: num 4 4 1 1 2 1 4 2 2 4 ...
Record the attributes of the eight planets in a data frame, planets_df:
# Definition of vectors
name <- c("Mercury", "Venus", "Earth",
"Mars", "Jupiter", "Saturn",
"Uranus", "Neptune")
type <- c("Terrestrial planet",
"Terrestrial planet",
"Terrestrial planet",
"Terrestrial planet", "Gas giant",
"Gas giant", "Gas giant", "Gas giant")
diameter <- c(0.382, 0.949, 1, 0.532,
11.209, 9.449, 4.007, 3.883)
rotation <- c(58.64, -243.02, 1, 1.03,
0.41, 0.43, -0.72, 0.67)
rings <- c(FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE)
# Create a data frame from the vectors
planets_df <- data.frame(name,type,diameter,rotation,rings)
planets_df
name type diameter rotation rings
1 Mercury Terrestrial planet 0.382 58.64 FALSE
2 Venus Terrestrial planet 0.949 -243.02 FALSE
3 Earth Terrestrial planet 1.000 1.00 FALSE
4 Mars Terrestrial planet 0.532 1.03 FALSE
5 Jupiter Gas giant 11.209 0.41 TRUE
6 Saturn Gas giant 9.449 0.43 TRUE
7 Uranus Gas giant 4.007 -0.72 TRUE
8 Neptune Gas giant 3.883 0.67 TRUE
'data.frame': 8 obs. of 5 variables:
$ name : chr "Mercury" "Venus" "Earth" "Mars" ...
$ type : chr "Terrestrial planet" "Terrestrial planet" "Terrestrial planet" "Terrestrial planet" ...
$ diameter: num 0.382 0.949 1 0.532 11.209 ...
$ rotation: num 58.64 -243.02 1 1.03 0.41 ...
$ rings : logi FALSE FALSE FALSE FALSE TRUE TRUE ...
Because a data frame is two-dimensional, extracting part of it also takes two indices: [row, column].
[1] 0.382
name type diameter rotation rings
4 Mars Terrestrial planet 0.532 1.03 FALSE
Position, name, and logical (condition) indexes can be mixed.
[1] 0.382 0.949 1.000 0.532 11.209
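The calls that produced the three outputs above are not shown. A sketch of likely equivalents, using a compact rebuild of planets_df (the `rep()` call for `type` is a shortcut for the same eight values):

```r
planets_df <- data.frame(
  name = c("Mercury", "Venus", "Earth", "Mars",
           "Jupiter", "Saturn", "Uranus", "Neptune"),
  type = rep(c("Terrestrial planet", "Gas giant"), each = 4),
  diameter = c(0.382, 0.949, 1, 0.532, 11.209, 9.449, 4.007, 3.883),
  rotation = c(58.64, -243.02, 1, 1.03, 0.41, 0.43, -0.72, 0.67),
  rings = c(FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE))

# Row 1, column 3: Mercury's diameter
planets_df[1, 3]              # 0.382

# Empty column index: the whole row for Mars
planets_df[4, ]

# Positions for rows, a name for the column
planets_df[1:5, "diameter"]   # 0.382 0.949 1.000 0.532 11.209
```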
The $ operator extracts a single column from a data frame. Note that a column of a data frame is in fact a vector.
# Select the rings variable from planets_df
rings_vector <- planets_df$rings
# Print out rings_vector
rings_vector
[1] FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE
Because rings_vector is a logical vector, we can use it as a logical row index to select the planets that have rings:
name type diameter rotation rings
5 Jupiter Gas giant 11.209 0.41 TRUE
6 Saturn Gas giant 9.449 0.43 TRUE
7 Uranus Gas giant 4.007 -0.72 TRUE
8 Neptune Gas giant 3.883 0.67 TRUE
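A sketch of the logical row index described above, alongside the equivalent `subset()` call (planets_df is rebuilt compactly so the block runs on its own):

```r
planets_df <- data.frame(
  name = c("Mercury", "Venus", "Earth", "Mars",
           "Jupiter", "Saturn", "Uranus", "Neptune"),
  type = rep(c("Terrestrial planet", "Gas giant"), each = 4),
  diameter = c(0.382, 0.949, 1, 0.532, 11.209, 9.449, 4.007, 3.883),
  rotation = c(58.64, -243.02, 1, 1.03, 0.41, 0.43, -0.72, 0.67),
  rings = c(FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE))

rings_vector <- planets_df$rings

# Keep the rows where rings_vector is TRUE; the empty column index keeps all columns
ringed <- planets_df[rings_vector, ]
ringed

# subset() expresses the same filter with the column name directly
subset(planets_df, subset = rings)
```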
subset() filters the rows of a data frame according to a condition.
# Select planets with diameter < 1 with `subset(df, condition)`
subset(planets_df, subset = diameter < 1)
name type diameter rotation rings
1 Mercury Terrestrial planet 0.382 58.64 FALSE
2 Venus Terrestrial planet 0.949 -243.02 FALSE
4 Mars Terrestrial planet 0.532 1.03 FALSE
name type diameter rotation rings
5 Jupiter Gas giant 11.209 0.41 TRUE
6 Saturn Gas giant 9.449 0.43 TRUE
7 Uranus Gas giant 4.007 -0.72 TRUE
8 Neptune Gas giant 3.883 0.67 TRUE
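order() returns an integer vector of positions that would sort the vector in ascending order: the first element is the position of the smallest value, the second is the position of the next smallest, and so on.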
[1] 2 4 5 3 1
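A small sketch that separates `order()` from `rank()`, which are easy to confuse; the vector `x` is illustrative:

```r
x <- c(30, 10, 20)

# order(): positions of the values from smallest to largest
order(x)      # 2 3 1  (x[2] = 10 is smallest, then x[3] = 20, then x[1] = 30)

# Using order() as an index sorts the vector
x[order(x)]   # 10 20 30

# rank(): where each value would land, reported in the original positions
rank(x)       # 3 1 2
```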
We can order the eight planets by diameter (from smallest to largest) and store the resulting ordering vector in positions.
Using the ordering vector positions as the row index sorts the data frame by diameter, from smallest to largest:
name type diameter rotation rings
1 Mercury Terrestrial planet 0.382 58.64 FALSE
4 Mars Terrestrial planet 0.532 1.03 FALSE
2 Venus Terrestrial planet 0.949 -243.02 FALSE
3 Earth Terrestrial planet 1.000 1.00 FALSE
8 Neptune Gas giant 3.883 0.67 TRUE
7 Uranus Gas giant 4.007 -0.72 TRUE
6 Saturn Gas giant 9.449 0.43 TRUE
5 Jupiter Gas giant 11.209 0.41 TRUE
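The sorting code itself is not shown above. A sketch, assuming `positions` comes from `order()` on the diameter column; `decreasing = TRUE` reverses the direction:

```r
planets_df <- data.frame(
  name = c("Mercury", "Venus", "Earth", "Mars",
           "Jupiter", "Saturn", "Uranus", "Neptune"),
  type = rep(c("Terrestrial planet", "Gas giant"), each = 4),
  diameter = c(0.382, 0.949, 1, 0.532, 11.209, 9.449, 4.007, 3.883),
  rotation = c(58.64, -243.02, 1, 1.03, 0.41, 0.43, -0.72, 0.67),
  rings = c(FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE))

# Row positions from smallest to largest diameter
positions <- order(planets_df$diameter)
positions   # 1 4 2 3 8 7 6 5

# Reorder all rows at once
planets_df[positions, ]

# Largest first
largest_first <- planets_df[order(planets_df$diameter, decreasing = TRUE), ]
largest_first
```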
As opposed to vectors, lists can hold components of different types, just as your to-do lists can contain different categories of tasks. This chapter will teach you how to create, name, and subset these lists.
Congratulations! At this point in the course you are already familiar with vectors, matrices, factors, and data frames.
A list in R is similar to your to-do list at work or school: the different items on that list most likely differ in length, characteristic, and type of activity that has to be done.
A list in R allows you to gather a variety of objects under one name (that is, the name of the list) in an ordered way. These objects can be matrices, vectors, data frames, even other lists, etc. It is not even required that these objects are related to each other in any way.
You could say that a list is a kind of super data type: you can store practically any piece of information in it!
Put a vector (my_vector), a matrix (my_matrix), and a data frame (my_df) into a single list (my_list):
# Vector with numerics from 1 up to 10
my_vector <- 1:10
# Matrix with numerics from 1 up to 9
my_matrix <- matrix(1:9, ncol = 3)
# First 10 elements of the built-in data frame mtcars
my_df <- mtcars[1:10,]
# Construct list with these different elements:
my_list <- list(my_vector, my_matrix, my_df)
my_list
[[1]]
[1] 1 2 3 4 5 6 7 8 9 10
[[2]]
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
[[3]]
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
my_list contains three components, but none of them has a name.
We can give each component a name when creating my_list:
# Adapt list() call to change the components names to `vec`, `mat` and `df`
my_list <- list(vec=my_vector, mat=my_matrix, df=my_df)
# Print out my_list
my_list
$vec
[1] 1 2 3 4 5 6 7 8 9 10
$mat
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
$df
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
We can then extract a component of the list by its name, for example the matrix:
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
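A sketch of the ways to pull a component out of a list. `$` and `[[ ]]` return the component itself, while single brackets `[ ]` return a smaller list that contains it:

```r
my_list <- list(vec = 1:10, mat = matrix(1:9, ncol = 3), df = mtcars[1:10, ])

# These three return the matrix itself
my_list$mat
my_list[["mat"]]
my_list[[2]]

# Single brackets keep the list wrapper
is.list(my_list["mat"])      # TRUE
is.matrix(my_list[["mat"]])  # TRUE
```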
Put the data for one film (“The Shining”) into a list (shining_list):
# Define the variables mov, act and rev
mov = "The Shining"
act = c("Jack Nicholson","Shelley Duvall","Danny Lloyd",
"Scatman Crothers","Barry Nelson")
rev = data.frame(
scores = c(4.5,4.0,5.0),
sources = c("IMDb1","IMDb2","IMDb3"),
comments = c(
"Best Horror Film I Have Ever Seen",
"A truly brilliant and scary film from Stanley Kubrick",
"A masterpiece of psychological horror"))
# Finish the code to build shining_list
shining_list <- list(
moviename = mov, actors = act, reviews = rev)
shining_list
$moviename
[1] "The Shining"
$actors
[1] "Jack Nicholson" "Shelley Duvall" "Danny Lloyd" "Scatman Crothers"
[5] "Barry Nelson"
$reviews
scores sources comments
1 4.5 IMDb1 Best Horror Film I Have Ever Seen
2 4.0 IMDb2 A truly brilliant and scary film from Stanley Kubrick
3 5.0 IMDb3 A masterpiece of psychological horror
[1] "Jack Nicholson" "Shelley Duvall" "Danny Lloyd" "Scatman Crothers"
[5] "Barry Nelson"
[1] "Shelley Duvall"
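The calls behind the last two outputs are not shown. A sketch of likely equivalents: first extract the actors component, then index into that vector. For brevity this rebuilt shining_list keeps only the moviename and actors components (an assumption; the reviews data frame is omitted).

```r
shining_list <- list(
  moviename = "The Shining",
  actors = c("Jack Nicholson", "Shelley Duvall", "Danny Lloyd",
             "Scatman Crothers", "Barry Nelson"))

# The whole actors vector
shining_list[["actors"]]

# Component first, then element: the second actor
shining_list[["actors"]][2]   # "Shelley Duvall"
shining_list$actors[2]        # same result
```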
Put the data for another film (“The Departed”) into another list (departed_list):
# define the comments and scores vectors
scores <- c(4.6, 5, 4.8, 5, 4.2)
comments <- c("I would watch it again", "Amazing!", "I liked it",
"One of the best movies","Fascinating plot")
movie_title = "The Departed"
movie_actors = c( "Leonardo DiCaprio","Matt Damon","Jack Nicholson",
"Mark Wahlberg","Vera Farmiga","Martin Sheen")
# Save the average of the scores vector as avg_review
avg_review = mean(scores)
# Combine scores and comments into the reviews_df data frame
reviews_df = data.frame(scores, comments)
# Create a list, called `departed_list`,
# that contains the `movie_title`, `movie_actors`,
# reviews data frame as `reviews_df`,
# and the average review score as `avg_review`, and print it out.
departed_list = list(
movie_title, movie_actors,
reviews_df, avg_review)
departed_list
[[1]]
[1] "The Departed"
[[2]]
[1] "Leonardo DiCaprio" "Matt Damon" "Jack Nicholson"
[4] "Mark Wahlberg" "Vera Farmiga" "Martin Sheen"
[[3]]
scores comments
1 4.6 I would watch it again
2 5.0 Amazing!
3 4.8 I liked it
4 5.0 One of the best movies
5 4.2 Fascinating plot
[[4]]
[1] 4.72
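The list above prints its components as [[1]] to [[4]] because none of them was named. A sketch of the same list with names, so each component can be retrieved with `$`:

```r
scores <- c(4.6, 5, 4.8, 5, 4.2)
comments <- c("I would watch it again", "Amazing!", "I liked it",
              "One of the best movies", "Fascinating plot")
movie_title <- "The Departed"
movie_actors <- c("Leonardo DiCaprio", "Matt Damon", "Jack Nicholson",
                  "Mark Wahlberg", "Vera Farmiga", "Martin Sheen")
avg_review <- mean(scores)
reviews_df <- data.frame(scores, comments)

# name = value pairs label each component
departed_list <- list(
  movie_title = movie_title,
  movie_actors = movie_actors,
  reviews_df = reviews_df,
  avg_review = avg_review)

names(departed_list)
departed_list$avg_review   # 4.72
```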