x[5] takes the 5th element of x
x[c(2,5)] takes the 2nd and the 5th elements of x
x[2:5] takes the 2nd, 3rd, 4th and 5th elements of x
First, let’s define some vectors.
freq = c(3L, 5L, 1L, 1L, 3L) # integer vector
amount = c(100, 168, 180, 280, 199) # numeric
member = c(FALSE, TRUE, FALSE, TRUE, TRUE) # logical
name = c("Amy", "Bob", "Cindy", "Danny", "Edward") # character
gender = as.factor( c("F", "M", "F", "M", "M") ) # factor
class = as.factor(c('A','B','A','B','A')) # factor
🌻 Position indexes take the format of a numeric vector of element positions
freq[c(1,5)]
[1] 3 3
gender[2:4]
[1] M F M
Levels: F M
gender[5]
[1] M
Levels: F M
An atomic object is equivalent to a vector of length 1
i = c(1:3, 5) # we can define an index vector object
amount[i] # and use it as a position index
[1] 100 168 180 199
An index can be used to select or repeat elements
amount[ c(1,2) ]
[1] 100 168
amount[ c(1,2,2,3,3,3,4,4,4,4) ]
[1] 100 168 168 180 180 180 280 280 280 280
# a position index may be longer than the targeted vector
or to reorder them
amount[ c(5,4,3,2,1) ]
[1] 199 280 180 168 100
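As a small sketch of the three uses above, the block below rebuilds amount with the values defined earlier and shows a position index selecting, repeating and reversing elements. The names picked, repeated and reversed are illustrative only and do not come from the original text.

```r
amount = c(100, 168, 180, 280, 199)   # same values as defined earlier

picked   = amount[c(1, 2)]            # select the first two elements
repeated = amount[c(2, 2, 3)]         # a position may appear more than once
reversed = amount[5:1]                # 5:1 selects the same positions as c(5,4,3,2,1)

picked                                # 100 168
repeated                              # 168 168 180
reversed                              # 199 280 180 168 100
identical(reversed, rev(amount))      # TRUE: rev() performs the same reordering
```

Because the index is just a vector, any expression that produces positions (such as 5:1) can stand in for a hand-written c(...).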
amount is an unnamed numeric vector
amount
[1] 100 168 180 280 199
names(amount)
NULL
We can assign a name to each element of a collective object
names(amount) = c("Amy", "Bob", "Cindy", "Danny", "Edward")
amount
Amy Bob Cindy Danny Edward
100 168 180 280 199
Now amount becomes a named numeric vector, and we can access its elements by name.
🌻 A name index is a character vector of element names
i = c("Bob", "Cindy")
amount[ i ]
Bob Cindy
168 180
🌻 Conditional indexes are logical vectors
freq[ c(T,T,F,F,T) ]
[1] 3 5 3
🌻 Logical indexes let us select elements that satisfy a condition
amount[ freq <= 2 ]
Cindy Danny
180 280
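To make the mechanism explicit, this sketch splits the last example into two steps: the comparison first produces a logical vector, which is then used as the index. The vectors are re-created so the block runs on its own, and keep is an illustrative name that does not appear in the original.

```r
freq   = c(3L, 5L, 1L, 1L, 3L)
amount = c(Amy = 100, Bob = 168, Cindy = 180, Danny = 280, Edward = 199)

keep = freq <= 2        # element-wise comparison gives a logical vector
keep                    # FALSE FALSE TRUE TRUE FALSE
amount[keep]            # same result as amount[ freq <= 2 ]
which(keep)             # 3 4: the positions where the condition is TRUE
amount[which(keep)]     # a position index selecting the same elements
```

The logical index and the position index returned by which() pick the same elements here; the logical form is usually shorter to write.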
🗿 QUIZ:
With the vectors defined below …
noBuy = c(3L, 5L, 1L, 1L, 3L) # integer
height = c(175, 168, 180, 181, 169) # numeric
isMale = c(FALSE, TRUE, FALSE, TRUE, TRUE) # logical
name = c("Amy", "Bob", "Cindy", "Danny", "Edward") # character
gender = factor( c("F", "M", "F", "M", "M") ) # factor
skin_color = factor( c("black", "black", "white", "yellow", "white") ) # factor
Use indexes and math functions to answer the following questions …
🗿: list the names of the males
name[isMale]
[1] "Bob" "Danny" "Edward"
🗿: list the names of those who are taller than 180
#
🗿: list the names of those who are taller than 180 and whose skin color is “yellow”
#
🗿: calculate the average height of males
mean( height[gender == "M"] )
[1] 172.7
🗿: calculate the total number of buys (noBuy) by females
#
🗿: count the number of white females
#
A data frame is the most common and useful data structure. Usually …
df = data.frame(
  noBuy = c(3L, 5L, 1L, 1L, 3L),
  height = c(175, 168, 180, 181, 169),
  isMale = c(FALSE, TRUE, FALSE, TRUE, TRUE),
  name = c("Amy", "Bob", "Cindy", "Danny", "Edward"),
  gender = factor( c("F", "M", "F", "M", "M") ),
  skin_color = factor( c("black", "black", "white", "yellow", "white")),
  stringsAsFactors=FALSE
)
A data frame is easier to inspect
df
noBuy height isMale name gender skin_color
1 3 175 FALSE Amy F black
2 5 168 TRUE Bob M black
3 1 180 FALSE Cindy F white
4 1 181 TRUE Danny M yellow
5 3 169 TRUE Edward M white
to filter
subset(df, isMale & skin_color == "black")
noBuy height isMale name gender skin_color
2 5 168 TRUE Bob M black
to summarize
mean(df$height)
[1] 174.6
to count
table(df$gender)
F M
2 3
to summarize by group
tapply(df$height, df$gender, mean)
F M
177.5 172.7
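The result of tapply() carries the group labels as names, so the name and logical indexes introduced earlier apply to it as well. The sketch below is a minimal illustration under that observation; it rebuilds only the two columns it needs, and avg is an illustrative name.

```r
df = data.frame(
  height = c(175, 168, 180, 181, 169),
  gender = factor( c("F", "M", "F", "M", "M") )
)

avg = tapply(df$height, df$gender, mean)  # one mean per level of gender
names(avg)                 # "F" "M": the factor levels become the names
avg["F"]                   # name index: mean height of the F group
avg[avg > 175]             # logical index: groups whose mean exceeds 175
```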
Data frames are two-dimensional objects, so they take two indexes:
df[ row_idx, col_idx ]
We can index a data frame with all three forms of index:
df[c(1,2), c(2,3)]
df[c(1,2), c("noBuy","height")]
df[df$gender=="M", c("noBuy","height")]
and some extra indexing forms:
An empty index: df[c(1,2), ] selects all columns (an empty row index selects all rows)
The $ operator: df$name selects a specific column
subset() & filter():
subset(df, height<175 & isMale)
subset(df, height<175 & isMale, name)
subset(df, height<175 & isMale)$name
subset(df, height<175 & isMale, c(name, noBuy))
Below are some examples …
df[c(1,2), c(2,3)]
height isMale
1 175 FALSE
2 168 TRUE
df[c(1,2), ]
noBuy height isMale name gender skin_color
1 3 175 FALSE Amy F black
2 5 168 TRUE Bob M black
df[df$height < 175 & df$isMale, ]
noBuy height isMale name gender skin_color
2 5 168 TRUE Bob M black
5 3 169 TRUE Edward M white
df[df$height < 175 & df$isMale, "name"]
[1] "Bob" "Edward"
df$name[df$height < 175 & df$isMale]
[1] "Bob" "Edward"
subset(df, height<175 & isMale)
noBuy height isMale name gender skin_color
2 5 168 TRUE Bob M black
5 3 169 TRUE Edward M white
subset(df, height<175 & isMale, name)
name
2 Bob
5 Edward
subset(df, height<175 & isMale)$name
[1] "Bob" "Edward"
subset(df, height<175 & isMale, c(name, noBuy))
name noBuy
2 Bob 5
5 Edward 3
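As a quick check that the bracket and subset() forms above agree, the sketch below compares three of them with identical(). It rebuilds only the columns it needs; cond, a, b and d are illustrative names that do not appear in the original.

```r
df = data.frame(
  noBuy  = c(3L, 5L, 1L, 1L, 3L),
  height = c(175, 168, 180, 181, 169),
  isMale = c(FALSE, TRUE, FALSE, TRUE, TRUE),
  name   = c("Amy", "Bob", "Cindy", "Danny", "Edward"),
  stringsAsFactors = FALSE
)

cond = df$height < 175 & df$isMale           # logical row index
a = df[cond, "name"]                         # bracket form, one column
b = df$name[cond]                            # index the column vector directly
d = subset(df, height < 175 & isMale)$name   # subset() form

identical(a, b)   # TRUE
identical(a, d)   # TRUE
a                 # "Bob" "Edward"
```

Inside subset() the column names are looked up in df, which is why the df$ prefix can be dropped there.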
🗿 QUIZ:
Annotate the function of each of the following code chunks as a remark …
For example,
df$name[df$isMale] # names of all males
[1] "Bob" "Danny" "Edward"
df[df$height > 180 , "name"] #
[1] "Danny"
subset(df, height > 170 & !isMale)$name #
[1] "Amy" "Cindy"
mean(df$height[df$isMale]) #
[1] 172.7
df$height[!df$isMale] %>% mean #
[1] 177.5
sum( subset(df, !isMale)$noBuy ) #
[1] 4
subset(df, skin_color == "white" & !isMale ) %>% nrow #
[1] 1
sum(df$skin_color == "white" & !df$isMale ) #
[1] 1