### 1. Vector (1-Dimensional) Indexing

• We need an index to access specific elements within a collective object.
• In R, indexes are written in square brackets. For example:
  • x[5] takes the 5th element of x
  • x[c(2,5)] takes the 2nd and the 5th elements of x
  • x[2:5] takes the 2nd, 3rd, 4th and 5th elements of x
• A vector is one-dimensional, so it needs only one index.
• Matrices and data frames are two-dimensional, so they need two indexes.
• There are three types of index, each elaborated below.
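As a quick illustration of the bracket notation above, here is a minimal sketch. The vector `x` is a made-up example and is not part of the data used in the rest of this section.

```r
# x is a hypothetical vector used only for illustration
x <- c(10, 20, 30, 40, 50, 60)

x[5]          # the 5th element: 50
x[c(2, 5)]    # the 2nd and 5th elements: 20 50
x[2:5]        # the 2nd through 5th elements: 20 30 40 50
```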

First, let’s define some vectors.

freq = c(3L, 5L, 1L, 1L, 3L)                # integer vector
amount = c(100, 168, 180, 280, 199)         # numeric
member = c(FALSE, TRUE, FALSE, TRUE, TRUE)  # logical
name = c("Amy", "Bob", "Cindy", "Danny", "Edward")  # character
gender = as.factor( c("F", "M", "F", "M", "M") )    # factor
class = as.factor(c('A','B','A','B','A'))           # factor

##### 1.1 Position (Integer) Index

🌻 A position index is an integer vector

freq[c(1,5)]
[1] 3 3
gender[2:4]
[1] M F M
Levels: F M
gender[5]      
[1] M
Levels: F M

A single value is simply a vector of length 1, which is why gender[5] works the same way as gender[2:4].

i = c(1:3, 5)   # we can define an integer vector object
amount[i]       # and use it as a position index
[1] 100 168 180 199

An index can be used to select

amount[ c(1,2) ]
[1] 100 168

replicate

amount[ c(1,2,2,3,3,3,4,4,4,4) ] 
 [1] 100 168 168 180 180 180 280 280 280 280
# a position index may be longer than the target vector

or reorder elements in the targeted object

amount[ c(5,4,3,2,1)  ]
[1] 199 280 180 168 100
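One common application of reordering by position is sorting. The sketch below assumes base R's `order()`, which returns the position index that would sort a vector; it is offered as an illustration rather than as part of the original notes.

```r
amount <- c(100, 168, 180, 280, 199)

idx <- order(amount, decreasing = TRUE)  # positions from largest to smallest
idx                                      # 4 5 3 2 1
amount[idx]                              # 280 199 180 168 100
```

Because `idx` is an ordinary integer vector, the same index can reorder any other vector of the same length in a matching way.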

##### 1.2 Name (Character) Index

amount is an unnamed numeric vector

amount
[1] 100 168 180 280 199
names(amount)
NULL

We can assign a name to each element of a collective object

names(amount) =  c("Amy", "Bob", "Cindy", "Danny", "Edward")
amount
   Amy    Bob  Cindy  Danny Edward 
   100    168    180    280    199 

Now amount is a named numeric vector, and we can access its elements by name.

🌻 A name index is a character vector

i = c("Bob", "Cindy")
amount[ i ]
  Bob Cindy 
  168   180 
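Like position indexes, name indexes can reorder or repeat elements. The sketch below also shows what happens when a name does not exist; the name "Zed" is a hypothetical value chosen for illustration.

```r
amount <- c(100, 168, 180, 280, 199)
names(amount) <- c("Amy", "Bob", "Cindy", "Danny", "Edward")

amount[c("Edward", "Amy")]   # reorder by name: 199 100
amount[c("Bob", "Bob")]      # repeat by name: 168 168
amount["Zed"]                # no such name, so the result is NA
unname(amount["Bob"])        # drop the name and keep only the value: 168
```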

##### 1.3 Conditional (Logical) Index

🌻 Conditional indexes are logical vectors, normally of the same length as the vector they index; elements at the TRUE positions are kept.

freq[ c(T,T,F,F,T) ]
[1] 3 5 3

🌻 Logical indexes let us select elements by condition

amount[ freq <= 2 ]
Cindy Danny 
  180   280 
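Conditions can be combined with `&` (and), `|` (or) and `!` (not) before being used as an index. The following sketch re-creates the unnamed versions of the vectors defined at the start of this section so that it runs on its own; `which()` and `sum()` are base R functions added here for illustration.

```r
freq   <- c(3L, 5L, 1L, 1L, 3L)
amount <- c(100, 168, 180, 280, 199)
member <- c(FALSE, TRUE, FALSE, TRUE, TRUE)

freq <= 2                      # the condition itself is a logical vector
amount[freq <= 2 & member]     # both conditions must hold: 280
amount[freq >= 5 | !member]    # either condition may hold: 100 168 180
which(freq <= 2)               # positions where the condition is TRUE: 3 4
sum(member)                    # TRUE counts as 1, so this counts members: 3
```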

🗿 QUIZ：
With the vectors defined below …

noBuy = c(3L, 5L, 1L, 1L, 3L)                       # Integer
height = c(175, 168, 180, 181, 169)                 # numeric
isMale = c(FALSE, TRUE, FALSE, TRUE, TRUE)          # logical
name = c("Amy", "Bob", "Cindy", "Danny", "Edward")  # character
gender = factor( c("F", "M", "F", "M", "M") )       # factor
skin_color = factor( c("black", "black", "white", "yellow", "white") )  # factor

Use indexing and math functions to answer the following questions …

🗿: list the names of the males

name[isMale]
[1] "Bob"    "Danny"  "Edward"

🗿: list the names of those who are taller than 180

#

🗿: list the names of those who are taller than 180 and whose skin color is “yellow”

#

🗿: calculate the average height of males

mean( height[gender == "M"] )
[1] 172.7

🗿: calculate the total number of buys (noBuy) by females

#

🗿: count the number of white females

#

### 2. Indexing Data Frames (2-Dimensional)

##### 2.1 The Benefits of Data Frames

The data frame is the most common and useful data structure. Usually

• each row of a data frame represents a subject (unit of analysis) and
• each column represents an attribute or measure of interest.

df = data.frame(
  noBuy = c(3L, 5L, 1L, 1L, 3L),
  height = c(175, 168, 180, 181, 169),
  isMale = c(FALSE, TRUE, FALSE, TRUE, TRUE),
  name = c("Amy", "Bob", "Cindy", "Danny", "Edward"),
  gender = factor( c("F", "M", "F", "M", "M") ),
  skin_color = factor( c("black", "black", "white", "yellow", "white")),
  stringsAsFactors=FALSE
)

A data frame is easier to examine

df
  noBuy height isMale   name gender skin_color
1     3    175  FALSE    Amy      F      black
2     5    168   TRUE    Bob      M      black
3     1    180  FALSE  Cindy      F      white
4     1    181   TRUE  Danny      M     yellow
5     3    169   TRUE Edward      M      white

to select

subset(df, isMale & skin_color == "black")
  noBuy height isMale name gender skin_color
2     5    168   TRUE  Bob      M      black

to summarize

mean(df$height)
[1] 174.6

to count by category

table(df$gender)

F M 
2 3 

to compute statistics by group

tapply(df$height, df$gender, mean)
    F     M
177.5 172.7 
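tapply() returns a named vector. If a data frame result is preferred, base R's `aggregate()` with a formula is one alternative; the sketch below rebuilds only the two columns it needs and is meant as an illustration, not as the approach these notes prescribe.

```r
# A reduced copy of df with only the columns used here
df <- data.frame(
  height = c(175, 168, 180, 181, 169),
  gender = factor(c("F", "M", "F", "M", "M"))
)

# Mean height per gender, returned as a data frame with columns gender and height
res <- aggregate(height ~ gender, data = df, FUN = mean)
res
```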

##### 2.2 Indexing Data Frames

Data frames are two-dimensional objects, so they take two indexes:

• the indexes go between square brackets, separated by a comma: df[ row_idx, col_idx ]
• rows are usually selected by condition (logical index)
• columns are usually selected by name (name index)

We can index a data frame with all three forms of index:

• Positional/Integer Index: df[c(1,2), c(2,3)]
• Name/Character Index: df[c(1,2), c("noBuy","height")]
• Condition/Logical Index: df[df$gender=="M", c("noBuy","height")]

and with some extra indexing forms:

• Empty Index: df[c(1,2), ] leaves the column index empty to select all columns (an empty row index likewise selects all rows)
• Column name ($): df$name selects a specific column
• subset() & filter():
  • subset(df, height<175 & isMale)
  • subset(df, height<175 & isMale, name)
  • subset(df, height<175 & isMale)$name
  • subset(df, height<175 & isMale, c(name, noBuy))

Below are some examples …

df[c(1,2), c(2,3)]
  height isMale
1    175  FALSE
2    168   TRUE
df[c(1,2), ]
  noBuy height isMale name gender skin_color
1     3    175  FALSE  Amy      F      black
2     5    168   TRUE  Bob      M      black
df[df$height < 175 & df$isMale, ]
  noBuy height isMale   name gender skin_color
2     5    168   TRUE    Bob      M      black
5     3    169   TRUE Edward      M      white
df[df$height < 175 & df$isMale, "name"]
[1] "Bob"    "Edward"
df$name[df$height < 175 & df$isMale]
[1] "Bob"    "Edward"
subset(df, height<175 & isMale)
  noBuy height isMale   name gender skin_color
2     5    168   TRUE    Bob      M      black
5     3    169   TRUE Edward      M      white
subset(df, height<175 & isMale, name)
    name
2    Bob
5 Edward
subset(df, height<175 & isMale)$name
[1] "Bob"    "Edward"
subset(df, height<175 & isMale, c(name, noBuy))
    name noBuy
2    Bob     5
5 Edward     3
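The examples above return two different kinds of result: selecting a single column with `df[rows, "name"]` gives a plain vector, while `subset(df, ..., name)` gives a one-column data frame. The sketch below makes the difference explicit on a reduced copy of `df`; the `drop = FALSE` argument is a base R option added here for illustration and does not appear in the original examples.

```r
# A reduced copy of df with only the columns used here
df <- data.frame(
  height = c(175, 168, 180, 181, 169),
  isMale = c(FALSE, TRUE, FALSE, TRUE, TRUE),
  name   = c("Amy", "Bob", "Cindy", "Danny", "Edward"),
  stringsAsFactors = FALSE
)

rows <- df$height < 175 & df$isMale            # logical row index

v <- df[rows, "name"]                          # single column: a character vector
d <- df[rows, "name", drop = FALSE]            # drop = FALSE keeps a data frame
s <- subset(df, height < 175 & isMale, name)   # subset() also keeps a data frame

is.character(v)    # TRUE
is.data.frame(d)   # TRUE
is.data.frame(s)   # TRUE
```

Which form is more convenient depends on the next step: functions such as mean() or sum() expect a vector, whereas further row and column indexing expects a data frame.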

🗿 QUIZ：
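Annotate what each of the following code chunks does, as a comment after the # (the %>% pipe used in some chunks comes from the magrittr package, so load it with library(magrittr) or library(dplyr) first) …

For example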

df$name[df$isMale] # names of all males  
[1] "Bob"    "Danny"  "Edward"
df[df$height > 180 , "name"] # 
[1] "Danny"
subset(df, height > 170 & !isMale)$name # 
[1] "Amy"   "Cindy"
mean(df$height[df$isMale]) # 
[1] 172.7
df$height[!df$isMale] %>% mean # 
[1] 177.5
sum( subset(df, !isMale)$noBuy ) # 
[1] 4
subset(df, skin_color == "white" & !isMale ) %>% nrow # 
[1] 1
sum(df$skin_color == "white" & !df$isMale ) # 
[1] 1