### 1. Intro to Basics

Take your first steps with R. In this chapter, you will learn how to use the console as a calculator and how to assign variables. You will also get to know the basic data types in R. Let’s get started.

##### How it works

In the editor on the right there is already some sample code. Can you see which lines are actual R code and which are comments?

# Calculate 3 + 4
3+4
[1] 7
# Calculate 6 + 12
6+12
[1] 18

🌻 Enter an expression, and R prints the result of evaluating it.

##### Arithmetic with R
5 + 5
[1] 10
# A subtraction
5 - 5
[1] 0
# A multiplication
3 * 5
[1] 15
# A division
(5 + 5) / 2
[1] 5

# Exponentiation: Type 2^5 in the editor to calculate 2 to the power 5.
2^5
[1] 32
# Modulo: Type 28 %% 6 to calculate 28 modulo 6.
28 %% 6
[1] 4

🌻 Quick-R has a lot of useful information for beginners.

##### Variable assignment

# Assign the value 42 to x
x <- 42
# Print out the value of the variable x
x
[1] 42
##### Variable assignment (2)
# Assign the value 5 to the variable my_apples
my_apples=5
# Print out the value of the variable my_apples
my_apples
[1] 5
##### Variable assignment (3)
# Assign a value to the variables my_apples and my_oranges
my_apples <- 5

# Assign to my_oranges the value 6.
my_oranges <- 6

# Add these two variables together
my_apples + my_oranges
[1] 11
# Create the variable my_fruit
my_fruit = my_apples + my_oranges

##### Apples and oranges

Arithmetic operators work on numeric values but not on character strings: adding a number to a string such as "six" raises an error.
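A minimal sketch of what goes wrong, assuming the commented-out assignment `my_oranges <- "six"` had been kept. `tryCatch()` is used here only to capture the error message instead of stopping the script, and the variable `result` is introduced purely for illustration.

```r
my_apples <- 5
my_oranges <- "six"   # a character value, not a number

# Adding a numeric and a character value raises an error;
# tryCatch() captures the message so the script keeps running.
result <- tryCatch(
  my_apples + my_oranges,
  error = function(e) conditionMessage(e)
)
result

# is.numeric() reports in advance whether arithmetic can work.
is.numeric(my_apples)    # TRUE
is.numeric(my_oranges)   # FALSE
```

Checking with `is.numeric()` before doing arithmetic is one simple way to catch this kind of mistake early.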

# Assign a value to the variable my_apples
my_apples <- 5

# Fix the assignment of my_oranges, so that it can be added with `my_apples`
# my_oranges <- "six"
my_oranges <- 6

# Create the variable my_fruit and print it out
my_fruit <- my_apples + my_oranges
my_fruit
[1] 11
##### Basic datatypes in R
# Change my_numeric to be 42
my_numeric <- 42

# Change my_character to be "universe"
my_character <- "universe"

# Change my_logical to be FALSE
my_logical <- FALSE

🌻 Commonly used R data types

• Integer (integer), real number (numeric)
• Text, string (character)
• Category (factor)
• Logical (logical/boolean)
• Date (Date), time (POSIXct, …)
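As a rough illustration of the types listed above, the sketch below creates one value of each kind and asks `class()` what it is. All values and variable names here are made up for the example.

```r
# One example value per type; every value is hypothetical.
an_integer  <- 5L                                   # the L suffix marks an integer
a_numeric   <- 3.14
a_character <- "universe"
a_factor    <- factor(c("Low", "High", "Low"))
a_logical   <- TRUE
a_date      <- as.Date("2020-01-31")
a_time      <- as.POSIXct("2020-01-31 12:00:00", tz = "UTC")

class(an_integer)    # "integer"
class(a_numeric)     # "numeric"
class(a_character)   # "character"
class(a_factor)      # "factor"
class(a_logical)     # "logical"
class(a_date)        # "Date"
class(a_time)        # "POSIXct" "POSIXt"
```

Note that a plain number such as `5` is stored as numeric; it only becomes an integer when written as `5L` or converted with `as.integer()`.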

##### What’s that data type?
# Declare variables of different types
my_numeric <- 42
my_character <- "universe"
my_logical <- FALSE

# Check class of my_numeric
class(my_numeric)
[1] "numeric"
# Check class of my_character
class(my_character)
[1] "character"
# Check class of my_logical
class(my_logical)
[1] "logical"

### 2. Vectors

We take you on a trip to Vegas, where you will learn how to analyze your gambling results using vectors in R. After completing this chapter, you will be able to create vectors in R, name them, select elements from them, and compare different vectors.

##### Create a vector
# Assign the value "Go!" to the variable `vegas`. Remember: R is case sensitive!
vegas <- "Go!"
##### Create a vector (2)

In R, you create a vector with the combine function c(). You place the vector elements separated by a comma between the parentheses. For example:

numeric_vector <- c(1, 10, 49)
character_vector <- c("a", "b", "c")

🌻 Use the c() function to create a vector.

🌻 All elements of a vector must share the same data type; if you pass mixed types to c(), R coerces them to a common type.
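A short sketch of what "same data type" means in practice: when `c()` receives mixed inputs, R silently converts them to one common type rather than raising an error. The vector names below are hypothetical.

```r
# Mixing numbers, strings and logicals: everything becomes character.
mixed_vector <- c(1, "a", TRUE)
mixed_vector          # "1" "a" "TRUE"
class(mixed_vector)   # "character"

# Mixing numbers and logicals: TRUE/FALSE become 1/0.
num_and_logical <- c(1.5, TRUE, FALSE)
num_and_logical          # 1.5 1.0 0.0
class(num_and_logical)   # "numeric"
```

Because the conversion is silent, it is worth checking `class()` when a vector does not behave as expected.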

vector1 <- c(1.2, 3.5, 15.2)            # `vector1` is a numeric vector
vector2 = c("Alice", "Bob", "Cindy")    # `vector2` is a character vector
# Complete the code so that `boolean_vector` contains the three elements `TRUE`, `FALSE` and `TRUE` (in that order).
boolean_vector <- c(TRUE, FALSE, TRUE)
##### Create a vector (3)
# Poker winnings (or losses) from Monday to Friday
poker_vector <- c(140, -50, 20, -120, 240)
# Roulette winnings: Monday lost $24, Tuesday lost $50,
# Wednesday won $100, Thursday lost $350, and Friday won $10.
roulette_vector <- c(-24, -50, 100, -350, 10)
##### Naming vector elements

poker_vector
[1]  140  -50   20 -120  240

names() can be used to assign a name to each element of an object (here, poker_vector).

# Assign days as names of poker_vector
names(poker_vector) <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
poker_vector
Monday   Tuesday Wednesday  Thursday    Friday
140       -50        20      -120       240

🌻 As shown above, names make objects (and their elements) easier to interpret.
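A small sketch of the other things `names()` can do: besides assigning names, it reads them back, and a named element can be looked up by its name. The vector `winnings` and its values are made up for illustration.

```r
winnings <- c(140, -50, 20)   # hypothetical three-day results
names(winnings) <- c("Monday", "Tuesday", "Wednesday")

# names() without an assignment returns the current names.
day_names <- names(winnings)
day_names             # "Monday" "Tuesday" "Wednesday"

# A named element can be looked up by its name.
tuesday_result <- winnings["Tuesday"]
tuesday_result

# Names can also be supplied directly when the vector is created.
same_winnings <- c(Monday = 140, Tuesday = -50, Wednesday = 20)
identical(winnings, same_winnings)   # TRUE
```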

# Assign days as names of roulette_vector
names(roulette_vector) = c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
roulette_vector
Monday   Tuesday Wednesday  Thursday    Friday
-24       -50       100      -350        10
##### Naming a vector (2)
# Poker winnings from Monday to Friday
poker_vector <- c(140, -50, 20, -120, 240)

# Roulette winnings from Monday to Friday
roulette_vector <- c(-24, -50, 100, -350, 10)

# The variable days_vector
# Storing the weekday names in `days_vector` ahead of time
days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
# Assign the names of the day to roulette_vector and poker_vector
# Reusing `days_vector` keeps the code more concise
names(poker_vector) <- days_vector
names(roulette_vector) <- days_vector
##### Calculating total winnings: vector arithmetic
A_vector <- c(1, 2, 3)
B_vector <- c(4, 5, 6)

# Take the sum of A_vector and B_vector
total_vector <- A_vector + B_vector

# Print out total_vector
total_vector
[1] 5 7 9
##### Calculating total winnings (2)
# Assign to total_daily how much you won/lost on each day
total_daily <- poker_vector + roulette_vector
total_daily
Monday   Tuesday Wednesday  Thursday    Friday
116      -100       120      -470       250
##### Calculating total winnings (3)

R writes a function call as function_name(). For example, sum(vector1) returns the sum of all values in vector1. For commonly used built-in R functions, see: Built-in Functions
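A brief sketch of the function-call syntax using a few base R functions. The vector `scores` is a made-up example, not part of the exercise data.

```r
scores <- c(4, 8, 15, 16, 23, 42)   # hypothetical values

sum(scores)      # 108: total of all elements
mean(scores)     # 18:  arithmetic average
max(scores)      # 42:  largest element
length(scores)   # 6:   number of elements

# Arguments can also be passed by name.
round(3.14159, digits = 2)   # 3.14
```

The result of a call can be stored in a variable in the usual way, for example `total <- sum(scores)`.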

# Total winnings with poker
total_poker <- sum(poker_vector)

# Total winnings with roulette
total_roulette <- sum(roulette_vector)

# Total winnings overall for the week
total_week <- total_poker + total_roulette

# Print out total_week
total_week
[1] -84
##### Comparing total winnings
# Calculate total gains for poker and roulette
total_poker <- sum(poker_vector)
total_roulette <-  sum(roulette_vector)

# Check if you realized higher total gains in poker than in roulette
total_poker > total_roulette
[1] TRUE

🌻 Comparison Operators

• < for less than
• > for greater than
• <= for less than or equal to
• >= for greater than or equal to
• == for equal to each other
• != not equal to each other

🌻 A comparison expression evaluates to a logical value: TRUE or FALSE.
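A quick sketch of the operators listed above. Comparisons work on single values and, element by element, on whole vectors. The vector `daily` is a hypothetical example.

```r
3 < 5                  # TRUE
"apple" == "orange"    # FALSE

daily <- c(140, -50, 20)   # hypothetical daily results

won_money <- daily > 0     # one logical value per element
won_money                  # TRUE FALSE TRUE

not_twenty <- daily != 20
not_twenty                 # TRUE TRUE FALSE

# TRUE counts as 1 and FALSE as 0, so sum() counts the TRUE values.
sum(won_money)             # 2
```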

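🌷 Indexing extracts selected elements from an object; R uses [] for indexing.

🌷 R's indexing is very flexible. There are three ways to index:

• Positional index: [integer vector], e.g. [c(3,5,10)], [2]
• Name index: [character vector], e.g. [c("Monday","Friday")]
• Conditional index: [logical vector], e.g. [poker_vector > 0]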
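The three indexing styles can be sketched side by side on the poker data. The selections below are chosen so that all three return the same two elements; the variable names `by_position`, `by_name`, and `by_condition` are introduced only for this example.

```r
poker_vector <- c(140, -50, 20, -120, 240)
names(poker_vector) <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")

# 1. Positional index: an integer vector of positions.
by_position <- poker_vector[c(1, 5)]

# 2. Name index: a character vector of element names.
by_name <- poker_vector[c("Monday", "Friday")]

# 3. Conditional index: a logical vector, here built by a comparison.
by_condition <- poker_vector[poker_vector > 100]

by_position
identical(by_position, by_name)        # TRUE
identical(by_name, by_condition)       # TRUE
```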

##### Vector selection: the good times (positional indexing)
# Assign the poker results of Wednesday to the variable poker_wednesday. Using index notation `[]`
poker_wednesday <- poker_vector[3]
##### Vector selection: the good times (2) (positional indexing)
# Assign the poker results of Tuesday, Wednesday and Thursday to the variable poker_midweek, using `[c(2,3,4)]`
poker_midweek <- poker_vector[c(2,3,4)]
##### Vector selection: the good times (3) (positional indexing)

# Poker and roulette winnings from Monday to Friday:
# Assign to roulette_selection_vector the roulette results from Tuesday up to Friday; make use of `[2:5]`
roulette_selection_vector <- roulette_vector[2:5]
##### Vector selection: the good times (4) (name indexing)
# Poker and roulette winnings from Monday to Friday:
# Select poker results by names `[c("Monday", "Tuesday", "Wednesday")]`
poker_start <- poker_vector[c("Monday", "Tuesday", "Wednesday")]
poker_start
Monday   Tuesday Wednesday
140       -50        20

mean() computes the average of all values in a numeric vector.

# Calculate the average of the elements in poker_start by `mean()`
mean(poker_start)
[1] 36.67
##### Selection by comparison - Step 1

🌷 Conditional (logical) indexing is the most commonly used kind of indexing.

# Poker and roulette winnings from Monday to Friday:
poker_vector <- c(140, -50, 20, -120, 240)
roulette_vector <- c(-24, -50, 100, -350, 10)
days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
names(poker_vector) <- days_vector
names(roulette_vector) <- days_vector

# Which days did you make money on poker?
selection_vector <- poker_vector > 0

# Print out selection_vector
selection_vector
Monday   Tuesday Wednesday  Thursday    Friday
TRUE     FALSE      TRUE     FALSE      TRUE

##### Selection by comparison - Step 2
# Select from poker_vector these days using the indexing vector `[selection_vector]`
poker_winning_days <- poker_vector[selection_vector]
poker_winning_days
Monday Wednesday    Friday
140        20       240
# Poker and roulette winnings from Monday to Friday:
poker_vector <- c(140, -50, 20, -120, 240)
roulette_vector <- c(-24, -50, 100, -350, 10)
days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
names(poker_vector) <- days_vector
names(roulette_vector) <- days_vector

# Which days did you make money on roulette?
selection_vector <- roulette_vector > 0

# Select from roulette_vector these days
roulette_winning_days <- roulette_vector[selection_vector]
roulette_winning_days
Wednesday    Friday
100        10
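The comparison-based selection above can be taken one step further by combining two conditions with `&` (element-wise AND). This is a sketch that goes slightly beyond the exercise; `both_won` and `poker_on_good_days` are hypothetical names.

```r
poker_vector <- c(140, -50, 20, -120, 240)
roulette_vector <- c(-24, -50, 100, -350, 10)
days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
names(poker_vector) <- days_vector
names(roulette_vector) <- days_vector

# Days on which BOTH games made money.
both_won <- poker_vector > 0 & roulette_vector > 0
both_won

# Use the combined logical vector as an index.
poker_on_good_days <- poker_vector[both_won]
poker_on_good_days          # Wednesday and Friday
sum(poker_on_good_days)     # 260
```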

### 3. Matrices

A matrix is a two-dimensional data object: a collection of elements of the same data type (numeric, character, or logical) arranged into a fixed number of rows and columns.

You can construct a matrix in R with the matrix() function. Consider the following example: matrix(1:9, byrow = TRUE, nrow = 3)
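To make the effect of `byrow` concrete, here is a small sketch that fills the same six numbers in both directions. The names `by_row` and `by_col` are made up for the example.

```r
# Fill row by row: 1 2 3 / 4 5 6
by_row <- matrix(1:6, nrow = 2, byrow = TRUE)

# Fill column by column (the default): 1 3 5 / 2 4 6
by_col <- matrix(1:6, nrow = 2, byrow = FALSE)

by_row[1, ]   # 1 2 3
by_col[1, ]   # 1 3 5

# Both have the same shape: 2 rows, 3 columns.
dim(by_row)   # 2 3
dim(by_col)   # 2 3
```

R works out the number of columns from the length of the data and `nrow`, so only one of `nrow` or `ncol` needs to be given here.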

##### What’s a matrix?

matrix() can turn a one-dimensional vector into a two-dimensional matrix.

# Construct a matrix with 3 rows containing the numbers 1 up to 9, filled row-wise.
matrix(1:9, byrow = TRUE, nrow = 3)
[,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9
##### Analyze matrices, you shall

# Box office Star Wars (in millions!)
new_hope <- c(460.998, 314.4)
empire_strikes <- c(290.475, 247.900)
return_jedi <- c(309.306, 165.8)

# Create box_office
# Concatenate the 3 vectors `c(new_hope, empire_strikes, return_jedi)`
# Then create a matrix by `matrix()`. Remember to specify `byrow` and `nrow`
box_office <- c(new_hope,empire_strikes,return_jedi)
box_office
[1] 461.0 314.4 290.5 247.9 309.3 165.8
# Construct star_wars_matrix
star_wars_matrix <- matrix(box_office, byrow = TRUE, nrow = 3)

# print out the matrix
star_wars_matrix
[,1]  [,2]
[1,] 461.0 314.4
[2,] 290.5 247.9
[3,] 309.3 165.8
##### Naming a matrix

# Box office Star Wars (in millions!)
new_hope <- c(460.998, 314.4)
empire_strikes <- c(290.475, 247.900)
return_jedi <- c(309.306, 165.8)

# Construct matrix
star_wars_matrix <- matrix(c(new_hope, empire_strikes, return_jedi), nrow = 3, byrow = TRUE)

# Vectors region and titles, used for naming
region <- c("US", "non-US")
titles <- c("A New Hope", "The Empire Strikes Back", "Return of the Jedi")
# Name the columns with region with `colnames()`
colnames(star_wars_matrix) = region

# Name the rows with titles  with `rownames()`
rownames(star_wars_matrix) = titles

# Print out star_wars_matrix
star_wars_matrix
US non-US
A New Hope              461.0  314.4
The Empire Strikes Back 290.5  247.9
Return of the Jedi      309.3  165.8
##### Calculating the worldwide box office

# Calculate worldwide box office figures for each movie with `rowSums()`
worldwide_vector <- rowSums(star_wars_matrix)

worldwide_vector
A New Hope The Empire Strikes Back      Return of the Jedi
775.4                   538.4                   475.1
##### Adding a column for the Worldwide box office

cbind() combines matrices column-wise. Use it to bind the worldwide box office vector worldwide_vector onto the box office matrix, giving all_wars_matrix.
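A small standalone sketch of `cbind()` and its row-wise counterpart `rbind()`. The matrix `m` and the values in it are hypothetical and unrelated to the box office data.

```r
m <- matrix(1:4, nrow = 2,
            dimnames = list(c("a", "b"), c("x", "y")))

# cbind() adds a column; the new column is named after the variable.
total <- rowSums(m)
with_col <- cbind(m, total)
with_col
dim(with_col)        # 2 3

# rbind() adds a row; here the new row is given the name "c".
extra_row <- c(x = 10, y = 20)
with_row <- rbind(m, c = extra_row)
with_row
dim(with_row)        # 3 2
```

For `cbind()` the pieces need the same number of rows, and for `rbind()` the same number of columns.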

# Construct worldwide box office vector
# Bind the new variable worldwide_vector as a column to star_wars_matrix with `cbind()`
all_wars_matrix <- cbind(star_wars_matrix, worldwide_vector)

all_wars_matrix
US non-US worldwide_vector
A New Hope              461.0  314.4            775.4
The Empire Strikes Back 290.5  247.9            538.4
Return of the Jedi      309.3  165.8            475.1

rbind() combines matrices row-wise. First we build the box office matrix for the next three Star Wars films (star_wars_matrix2).

star_wars_matrix2 = matrix(
c(474.5,  552.5, 310.7,  338.7, 380.3,  468.5),
byrow=T, nrow=3)
rownames(star_wars_matrix2) = c(
"The Phantom Menace","Attack of the Clones",
"Revenge of the Sith")
colnames(star_wars_matrix2)=c("US", "non-US")

star_wars_matrix2
US non-US
The Phantom Menace   474.5  552.5
Attack of the Clones 310.7  338.7
Revenge of the Sith  380.3  468.5

# Combine both Star Wars trilogies in one matrix with `rbind()`
all_wars_matrix <- rbind(star_wars_matrix, star_wars_matrix2)

all_wars_matrix
US non-US
A New Hope              461.0  314.4
The Empire Strikes Back 290.5  247.9
Return of the Jedi      309.3  165.8
The Phantom Menace      474.5  552.5
Attack of the Clones    310.7  338.7
Revenge of the Sith     380.3  468.5
##### The total box office revenue for the entire saga

colSums() computes the total US and non-US box office for the Star Wars series.
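As a sketch of how `colSums()` relates to `rowSums()` and `sum()`, the example below uses a tiny made-up matrix named `sales` whose totals are easy to verify by hand.

```r
sales <- matrix(c(1, 2,
                  3, 4,
                  5, 6),
                nrow = 3, byrow = TRUE,
                dimnames = list(c("r1", "r2", "r3"), c("c1", "c2")))

colSums(sales)   # c1 = 9,  c2 = 12        (one total per column)
rowSums(sales)   # r1 = 3,  r2 = 7, r3 = 11 (one total per row)
sum(sales)       # 21                       (grand total)
```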

# Total revenue for US and non-US with `colSums()`
total_revenue_vector <- colSums(all_wars_matrix)

# Print out total_revenue_vector
total_revenue_vector
US non-US
2226   2088
##### Selection of matrix elements: matrix indexing

# Select the non-US revenue for all movies with index notation `[,2]`
non_us_all <- all_wars_matrix[,2]

# Average non-US revenue with `mean()`
mean(non_us_all)
[1] 348

# Select the non-US revenue for first two movies with index notation `[1:2,2]`
non_us_some <- all_wars_matrix[1:2,2]

# Average non-US revenue for first two movies with `mean()`
mean(non_us_some)
[1] 281.1
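The `[row, column]` notation used above can be sketched on a small matrix. The matrix `m` and its row names are hypothetical; leaving the row or column position empty selects everything along that dimension.

```r
m <- matrix(1:6, nrow = 3, byrow = TRUE,
            dimnames = list(c("r1", "r2", "r3"), c("US", "non-US")))

m[2, 1]             # single element: row 2, column 1 -> 3
m[, 2]              # whole second column, returned as a named vector
m[1:2, ]            # first two rows, all columns (still a matrix)
m["r3", "non-US"]   # rows and columns can also be selected by name -> 6

# By default a single column is simplified to a vector;
# drop = FALSE keeps the result as a one-column matrix.
is.matrix(m[, 2])                 # FALSE
is.matrix(m[, 2, drop = FALSE])   # TRUE
```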

##### A little arithmetic with matrices

# Estimate the visitors, assuming ticket price is $5
visitors <- all_wars_matrix/5

# Print the estimate to the console
visitors
US non-US
A New Hope              92.20  62.88
The Empire Strikes Back 58.10  49.58
Return of the Jedi      61.86  33.16
The Phantom Menace      94.90 110.50
Attack of the Clones    62.14  67.74
Revenge of the Sith     76.06  93.70
##### A little arithmetic with matrices (2)

ticket_prices_matrix = matrix(
c(5,5,6,6,7,7,4,4,4.5,4.5,4.9,4.9), byrow=T, nrow=6,
dimnames=list(
rownames(all_wars_matrix),
colnames(all_wars_matrix))
); ticket_prices_matrix
US non-US
A New Hope              5.0    5.0
The Empire Strikes Back 6.0    6.0
Return of the Jedi      7.0    7.0
The Phantom Menace      4.0    4.0
Attack of the Clones    4.5    4.5
Revenge of the Sith     4.9    4.9

# Estimated number of visitors
visitors <- all_wars_matrix / ticket_prices_matrix
visitors
US non-US
A New Hope               92.20  62.88
The Empire Strikes Back  48.41  41.32
Return of the Jedi       44.19  23.69
The Phantom Menace      118.62 138.12
Attack of the Clones     69.04  75.27
Revenge of the Sith      77.61  95.61

# US visitors
us_visitors <- visitors[,1]

# Average number of US visitors
mean(us_visitors)
[1] 75.01

### 4. Factors

Data often falls into a limited number of categories. For example, human hair color can be categorized as black, brown, blond, red, grey, or white—and perhaps a few more options for people who color their hair. In R, categorical data is stored in factors. Factors are very important in data analysis, so start learning how to create, subset, and compare them now.

##### What’s a factor and why would you use it?
# Assign to variable theory the value "factors"
theory = "factors"
##### What’s a factor and why would you use it? (2)
# Create `sex_vector`
sex_vector <- c("Male", "Female", "Female", "Male", "Male")
sex_vector
[1] "Male"   "Female" "Female" "Male"   "Male"

sex_vector is a character vector.

The factor() function converts a character or numeric object into a factor.

# Convert `sex_vector` to a factor
factor_sex_vector <- factor(sex_vector)

# Print out factor_sex_vector
factor_sex_vector
[1] Male   Female Female Male   Male
Levels: Female Male

🌻 When a factor is printed, the line below the values (Levels:) lists the categories the factor contains.
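A short sketch of how to inspect a factor beyond printing it. The vector `sizes` is a made-up example; `levels()`, `as.integer()`, and `table()` are base R functions.

```r
sizes <- c("S", "M", "S", "L", "M", "S")   # hypothetical data
size_factor <- factor(sizes)

# Levels are the distinct categories, sorted alphabetically by default.
levels(size_factor)       # "L" "M" "S"

# Internally each element is stored as the position of its level.
as.integer(size_factor)   # 3 2 3 1 2 3

# table() counts how often each level occurs.
size_counts <- table(size_factor)
size_counts               # L: 1, M: 2, S: 3
```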

##### What’s a factor and why would you use it? (3)

# Animals
animals_vector <- c("Elephant", "Giraffe", "Donkey", "Horse")
factor_animals_vector <- factor(animals_vector)
factor_animals_vector
[1] Elephant Giraffe  Donkey   Horse
Levels: Donkey Elephant Giraffe Horse

# Temperature
temperature_vector <- c("High", "Low", "High","Low", "Medium")
factor_temperature_vector <- factor(
temperature_vector, ordered = TRUE,
levels = c("Low", "Medium", "High"))
factor_temperature_vector
[1] High   Low    High   Low    Medium
Levels: Low < Medium < High

🌻 Levels: Low < Medium < High means that High is greater than Medium, and Medium is greater than Low.
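Because the levels of an ordered factor have a defined order, its elements can be compared with the usual comparison operators. The sketch below rebuilds the temperature factor and tries a few comparisons; the result variables are named only for illustration.

```r
temperature_vector <- c("High", "Low", "High", "Low", "Medium")
factor_temperature_vector <- factor(
  temperature_vector, ordered = TRUE,
  levels = c("Low", "Medium", "High"))

# Compare individual elements: is "High" greater than "Low"?
first_gt_second <- factor_temperature_vector[1] > factor_temperature_vector[2]
first_gt_second        # TRUE

# Compare every element against one level.
at_least_medium <- factor_temperature_vector >= "Medium"
at_least_medium        # TRUE FALSE TRUE FALSE TRUE

# max() respects the level order as well.
hottest <- max(factor_temperature_vector)
as.character(hottest)  # "High"
```

With an unordered factor these comparisons are not meaningful: R returns NA with a warning.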

##### Factor levels
# Code to build factor_survey_vector