### 1. Vector Indexing

• We need an index to access specific elements within a collection
• In R, indexes are specified in square brackets. For example,
• x[5] takes the 5th element in x
• x[c(2,5)] takes the 2nd and the 5th elements in x
• x[2:5] takes the 2nd, 3rd, 4th and the 5th elements in x
• A vector is one-dimensional, so it needs only one index
• Matrices and data frames are two-dimensional, so they need two indexes
• There are three types of index, as elaborated below
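As a minimal sketch of the one-index versus two-index rule, the block below uses a hypothetical vector `x` and a hypothetical matrix `m`; both are made up for illustration and are not part of the examples that follow.

```r
# Hypothetical objects for illustration only
x = c(10, 20, 30, 40, 50)    # a one-dimensional vector
m = matrix(1:6, nrow = 2)    # a 2 x 3 matrix, filled column by column

x[5]          # one index: the 5th element
x[c(2, 5)]    # the 2nd and the 5th elements
m[2, 3]       # two indexes: row 2, column 3
m[1, ]        # an empty column index keeps every column of row 1
```

The matrix needs a row index and a column index separated by a comma; leaving one of them empty selects everything along that dimension.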

First, let’s define some vectors.

freq = c(3L, 5L, 1L, 1L, 3L)                # integer vector
amount = c(100, 168, 180, 280, 199)         # numeric
member = c(FALSE, TRUE, FALSE, TRUE, TRUE)  # logical
name = c("Amy", "Bob", "Cindy", "Danny", "Edward")  # character
gender = as.factor( c("F", "M", "F", "M", "M") )    # factor
class = as.factor(c('A','B','A','B','A'))           # factor

##### 1.1 Position Index

🌻 A position index is an integer vector

freq[c(1,5)]
 3 3
gender[2:4]
 M F M
Levels: F M
gender      
 M
Levels: F M

A single value is treated as a vector of length 1

i = c(1:3, 5)   # we can define a vector of positions
amount[i]       # and use it as a position index
 100 168 180 199

An index can be used to select

amount[ c(1,2) ]
 100 168

repeat

amount[ c(1,2,2,3,3,3,4,4,4,4) ] 
  100 168 168 180 180 180 280 280 280 280
# a position index may be longer than the target vector

or reorder elements in the targeted object

amount[ c(5,4,3,2,1)  ]
 199 280 180 168 100
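Reordering is most often used to sort one vector by another. The sketch below is an illustration rather than part of the original examples: it redefines `amount` and `freq` with the same values so that it runs on its own, and it relies on base R's `order()`, which returns the positions that would sort its argument.

```r
amount = c(100, 168, 180, 280, 199)
freq = c(3L, 5L, 1L, 1L, 3L)

i = order(amount, decreasing = TRUE)  # positions of amount, largest value first
i            # 4 5 3 2 1
amount[i]    # amount sorted in descending order
freq[i]      # freq rearranged into the same order
```

Because `i` is just a position index, the same `i` can be applied to any vector of the same length to keep the elements aligned.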

##### 1.2 Name Index

amount is an unnamed numeric vector

amount
 100 168 180 280 199
names(amount)
NULL

We can assign a name to each element of a collection

names(amount) =  c("Amy", "Bob", "Cindy", "Danny", "Edward")
amount
   Amy    Bob  Cindy  Danny Edward
   100    168    180    280    199

Now amount is a named numeric vector, and we can access its elements by name.

🌻 A name index is a character vector

i = c("Bob", "Cindy")
amount[ i ]
  Bob Cindy
  168   180
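Name indexes follow the same select, repeat, and reorder rules as position indexes. The sketch below rebuilds the named `amount` vector so that it is self-contained; `"Zoe"` is a made-up name used only to show what happens when a name does not exist.

```r
amount = c(Amy = 100, Bob = 168, Cindy = 180, Danny = 280, Edward = 199)

amount[c("Edward", "Amy")]   # reorder by name
amount[c("Bob", "Bob")]      # repeat by name
amount["Zoe"]                # an unknown name yields NA rather than an error
```

Since a missing name silently gives `NA`, it is worth checking the spelling of names when a result unexpectedly contains missing values.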

##### 1.3 Conditional (Logical) Index

🌻 Conditional indexes are logical vectors (normally of the same length as the vector they index)

freq[ c(T,T,F,F,T) ]
 3 5 3

🌻 Logical indexes let us select elements by conditions

amount[ freq <= 2 ]
Cindy Danny
  180   280
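It can help to look at the logical vector that a condition produces before using it as an index. The sketch below redefines `freq` and the named `amount` vector with the same values so that it runs on its own; `&`, `|`, `which()`, and `sum()` are all base R.

```r
freq = c(3L, 5L, 1L, 1L, 3L)
amount = c(Amy = 100, Bob = 168, Cindy = 180, Danny = 280, Edward = 199)

freq <= 2                          # FALSE FALSE TRUE TRUE FALSE, one value per element
amount[freq <= 2]                  # keeps the elements where the condition is TRUE
amount[freq >= 3 & amount > 150]   # both conditions must hold: Bob and Edward
amount[freq == 1 | amount < 150]   # either condition may hold: Amy, Cindy and Danny
which(freq <= 2)                   # positions of the TRUE values: 3 4
sum(freq <= 2)                     # TRUE counts as 1, so this counts the matches: 2
```

Counting with `sum()` and averaging with `mean()` on a logical vector are the usual building blocks for questions such as "how many" and "what share".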

🗿 QUIZ：
With the vectors defined below …

noBuy = c(3L, 5L, 1L, 1L, 3L)                       # Integer
height = c(175, 168, 180, 181, 169)                 # numeric
isMale = c(FALSE, TRUE, FALSE, TRUE, TRUE)          # logical
name = c("Amy", "Bob", "Cindy", "Danny", "Edward")  # character
gender = factor( c("F", "M", "F", "M", "M") )       # factor
skin_color = factor( c("black", "black", "white", "yellow", "white") )  # factor

Use indexing and math functions to answer the following questions …

🗿: list the names of the males

name[isMale]
 "Bob"    "Danny"  "Edward"

🗿: list the names of those who are taller than 180

#

🗿: list the names of those who are taller than 180 and whose skin color is “yellow”

#

🗿: calculate the average height of males

mean( height[gender == "M"] )
 172.7

🗿: calculate the total number of purchases (noBuy) made by females

#

🗿: count the number of white females

#

### 2. Indexing Data Frames

##### 2.1 The Benefit of Data Frame

The data frame is the most common and useful data structure. Usually

• each row of a data frame represents a subject (unit of analysis) and
• each column represents an attribute or measure of interest.

df = data.frame(
noBuy = c(3L, 5L, 1L, 1L, 3L),
height = c(175, 168, 180, 181, 169),
isMale = c(FALSE, TRUE, FALSE, TRUE, TRUE),
name = c("Amy", "Bob", "Cindy", "Danny", "Edward"),
gender = factor( c("F", "M", "F", "M", "M") ),
skin_color = factor( c("black", "black", "white", "yellow", "white")),
stringsAsFactors=FALSE
)

A data frame is easier to examine

df
  noBuy height isMale   name gender skin_color
1     3    175  FALSE    Amy      F      black
2     5    168   TRUE    Bob      M      black
3     1    180  FALSE  Cindy      F      white
4     1    181   TRUE  Danny      M     yellow
5     3    169   TRUE Edward      M      white

to select

subset(df, isMale & skin_color == "black")
  noBuy height isMale name gender skin_color
2     5    168   TRUE  Bob      M      black

to count

table(df$gender)
F M
2 3

to summarize

mean(df$height)
 174.6

to summarize or count by group

tapply(df$height, df$gender, mean)
    F     M
177.5 172.7 
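The same grouped pattern works with other summary functions and with more than one grouping variable. The sketch below uses a cut-down copy of the data named `d` (a hypothetical name, chosen so that `df` is left untouched); `tapply()`, `table()`, and `aggregate()` are all base R.

```r
# Hypothetical cut-down copy of df with only the columns used here
d = data.frame(
  noBuy = c(3L, 5L, 1L, 1L, 3L),
  height = c(175, 168, 180, 181, 169),
  gender = factor(c("F", "M", "F", "M", "M")),
  skin_color = factor(c("black", "black", "white", "yellow", "white"))
)

tapply(d$noBuy, d$gender, sum)                    # total purchases per gender: F 4, M 9
table(d$gender, d$skin_color)                     # counts for each gender/skin_color pair
aggregate(height ~ gender, data = d, FUN = mean)  # group means returned as a data frame
```

`tapply()` returns a named vector, while `aggregate()` returns a data frame, which is often more convenient when the result feeds into further indexing.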

##### 2.2 Indexing Data Frames

Data frames are two-dimensional objects, so they take two indexes:

• placed between square brackets and separated by a comma - df[ row_idx, col_idx ]
• rows are usually selected by condition (logical index)
• columns are usually selected by name (name index)

We can index a data frame with all three forms of index:

• Positional/Integer Index: df[c(1,2), c(2,3)]
• Name/Character Index: df[c(1,2), c("noBuy","height")]
• Condition/Logical Index: df[df$gender=="M", c("noBuy","height")]

and some extra indexing forms:

• Empty Index: df[c(1,2), ] selects all columns (an empty row index selects all rows)
• Column name ($): df$name selects a specific column
• subset() & filter()
  • subset(df, height<175 & isMale)
  • subset(df, height<175 & isMale, name)
  • subset(df, height<175 & isMale)$name
  • subset(df, height<175 & isMale, c(name, noBuy))

Below are some examples …

df[c(1,2), c(2,3)]
  height isMale
1    175  FALSE
2    168   TRUE
df[c(1,2), ]
  noBuy height isMale name gender skin_color
1     3    175  FALSE  Amy      F      black
2     5    168   TRUE  Bob      M      black
df[df$height < 175 & df$isMale, ]
  noBuy height isMale   name gender skin_color
2     5    168   TRUE    Bob      M      black
5     3    169   TRUE Edward      M      white
df[df$height < 175 & df$isMale, "name"]
 "Bob"    "Edward"
df$name[df$height < 175 & df$isMale]
 "Bob"    "Edward"
subset(df, height<175 & isMale)
  noBuy height isMale   name gender skin_color
2     5    168   TRUE    Bob      M      black
5     3    169   TRUE Edward      M      white
subset(df, height<175 & isMale, name)
    name
2    Bob
5 Edward
subset(df, height<175 & isMale)$name
 "Bob"    "Edward"
subset(df, height<175 & isMale, c(name, noBuy))
    name noBuy
2    Bob     5
5 Edward     3
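The examples above return two different kinds of object: bracket indexing with a single column name, like `$`, gives a plain vector, while `subset()` with a column argument keeps a data frame. The following is a minimal sketch of that difference, using a cut-down copy named `d` (a hypothetical name, so that `df` is left untouched); `drop = FALSE` is a standard argument of `[` for data frames.

```r
# Hypothetical cut-down copy of df with only the columns used here
d = data.frame(
  height = c(175, 168, 180, 181, 169),
  isMale = c(FALSE, TRUE, FALSE, TRUE, TRUE),
  name = c("Amy", "Bob", "Cindy", "Danny", "Edward"),
  stringsAsFactors = FALSE
)
keep = d$height < 175 & d$isMale            # logical row index

v = d[keep, "name"]                         # single column: result is a character vector
f = d[keep, "name", drop = FALSE]           # drop = FALSE keeps a one-column data frame
s = subset(d, height < 175 & isMale, name)  # subset() also returns a data frame

class(v)   # "character"
class(f)   # "data.frame"
class(s)   # "data.frame"
```

Which form is preferable depends on the next step: functions such as `mean()` expect a vector, whereas further row and column indexing expects a data frame.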

🗿 QUIZ：
Annotate what each of the following code chunks does, as a comment …

For example

df$name[df$isMale] # names of all males  
 "Bob"    "Danny"  "Edward"
df[df$height > 180 , "name"] # 
 "Danny"
subset(df, height > 170 & !isMale)$name # 
 "Amy"   "Cindy"
mean(df$height[df$isMale]) # 
 172.7
df$height[!df$isMale] %>% mean # 
 177.5
sum( subset(df, !isMale)$noBuy ) # 
 4
subset(df, skin_color == "white" & !isMale ) %>% nrow # 
 1
sum(df$skin_color == "white" & !df$isMale ) # 
 1