💡 Major Types and Structures
Major Data Types/Classes
■ Integer (int): c(10L, 55L, 99L), 21:29
■ Numeric: c(10.5, 22.3, 22)
■ Logical (logi): c(TRUE,FALSE,FALSE,TRUE), c(T,F,T,T)
■ Character (chr): c("Amy","Bob","Cindy")
■ Factor: as.factor( c("IBMBA","MIS","BA") )
■ Date: as.Date( c("2020-10-01","2020-11-01","2020-12-01") )
Major Data Structures
■ Atomic: a single value of a certain type
■ Vector: a one-dimensional array of values of the same type
■ Matrix: a two-dimensional array of values of the same type
■ Data Frame: the most common structure, composed of equal-length columns of various types
■ List: the most flexible structure, a sequence of objects of various types
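As a rough illustration of the five structures listed above, the sketch below builds one small object of each kind. All object names and values are made up for the example and are not part of the original material.

```r
# One object of each structure (names and values are illustrative only)
a = 22L                                  # a single value (in R, a vector of length 1)
v = c(10.5, 22.3, 22)                    # vector: one dimension, one type
m = matrix(1:6, nrow = 2, ncol = 3)      # matrix: two dimensions, one type
d = data.frame(name = c("Amy", "Bob"),   # data frame: equal-length columns,
               freq = c(3L, 5L))         #   each column may have its own type
l = list("Amy", 3L, c(TRUE, FALSE))      # list: objects of any type and length

length(v)   # 3 values
dim(m)      # 2 rows, 3 columns
dim(d)      # 2 rows, 2 columns
length(l)   # 3 elements
```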
🌻 class(x) tells the class/type of x as a character string
[1] "numeric"
atomic/singular objects of various types/classes
c( class(22), class(22L), class(FALSE),
class("Amy"),
class( as.factor("Amy") ),
class( as.Date("2021-09-23") ) )
[1] "numeric" "integer" "logical" "character" "factor" "Date"
💡 Iteratives and Iteration
■ c() constructs a vector
■ list() constructs a list
■ Iterative objects are convenient for iterative operations
define vectors
freq = c(3L, 5L, 1L, 1L, 3L) # integer vector
amount = c(100, 168, 180, 280, 199) # numeric vector
member = c(FALSE, TRUE, FALSE, TRUE, TRUE) # logical vector
name = c("Amy", "Bob", "Cindy", "Danny", "Edward") # character vector
define a factor/categorical vector
# factor/categorical vectors
skin = as.factor( c("black", "black", "white", "yellow", "white") ) # 3 levels
gender = as.factor( c("F", "M", "F", "M", "M") ) # 2 levels
define a Date vector
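The code for this step is not shown here. A minimal sketch, assuming the vector is called last.buy (the column name in the data frame printout) and holds the five dates shown in the structure printout of this section:

```r
# Date vector; the name and the values are taken from the printouts in this section
last.buy = as.Date( c("2021-08-02", "2021-03-02", "2021-05-20",
                      "2021-07-12", "2021-06-15") )
class(last.buy)   # "Date"
```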
examine the data structure with str() and the data type with class()
int [1:5] 3 5 1 1 3
[1] "integer"
num [1:5] 100 168 180 280 199
[1] "numeric"
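The calls that produced the four lines above are not shown. They were probably something like the following sketch, which repeats the definitions of freq and amount so that it runs on its own:

```r
freq   = c(3L, 5L, 1L, 1L, 3L)          # integer vector
amount = c(100, 168, 180, 280, 199)     # numeric vector

str(freq);   class(freq)      # structure, then class, of the integer vector
str(amount); class(amount)    # structure, then class, of the numeric vector
```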
put the vectors in a list
check the data classes/types
[1] "character" "integer" "numeric" "logical" "factor" "factor" "Date"
check the data structures
chr [1:5] "Amy" "Bob" "Cindy" "Danny" "Edward"
int [1:5] 3 5 1 1 3
num [1:5] 100 168 180 280 199
logi [1:5] FALSE TRUE FALSE TRUE TRUE
Factor w/ 2 levels "F","M": 1 2 1 2 2
Factor w/ 3 levels "black","white",..: 1 1 2 3 2
Date[1:5], format: "2021-08-02" "2021-03-02" "2021-05-20" "2021-07-12" "2021-06-15"
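The code behind the two checks above is missing from this section. One way to get printouts of this kind is sketched below. The list name lst and the use of sapply() are assumptions, and the element order is inferred from the order of the printouts.

```r
name     = c("Amy", "Bob", "Cindy", "Danny", "Edward")
freq     = c(3L, 5L, 1L, 1L, 3L)
amount   = c(100, 168, 180, 280, 199)
member   = c(FALSE, TRUE, FALSE, TRUE, TRUE)
gender   = as.factor( c("F", "M", "F", "M", "M") )
skin     = as.factor( c("black", "black", "white", "yellow", "white") )
last.buy = as.Date( c("2021-08-02", "2021-03-02", "2021-05-20",
                      "2021-07-12", "2021-06-15") )

# put the vectors in a list (lst is a hypothetical name)
lst = list(name, freq, amount, member, gender, skin, last.buy)

sapply(lst, class)     # one class name per list element
x = sapply(lst, str)   # prints each structure; x only collects the returned NULLs
```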
str(obj) prints the structure of obj and returns NULL. The return value is assigned to a variable (x = ), so it doesn’t mess up the printouts. Try removing the assignment (x = ). See what’d happen.
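The NULL return value can be checked directly. A small sketch with made-up input values and a hypothetical variable name r:

```r
r = str( c(3L, 5L, 1L) )   # prints the structure of the vector as a side effect
is.null(r)                 # TRUE: the value returned by str() is NULL
```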
🌻 list is the most flexible data structure. It will be elaborated later.
Since the vectors are all of the same length, we can put them in a data frame
Data frames are easier to observe …
name freq amount member gender skin last.buy
1 Amy 3 100 FALSE F black 2021-08-02
2 Bob 5 168 TRUE M black 2021-03-02
3 Cindy 1 180 FALSE F white 2021-05-20
4 Danny 1 280 TRUE M yellow 2021-07-12
5 Edward 3 199 TRUE M white 2021-06-15
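A table like the one above can be built with data.frame(). The sketch below assumes the data frame is called df (a hypothetical name) and takes the column names and values from the printout.

```r
name     = c("Amy", "Bob", "Cindy", "Danny", "Edward")
freq     = c(3L, 5L, 1L, 1L, 3L)
amount   = c(100, 168, 180, 280, 199)
member   = c(FALSE, TRUE, FALSE, TRUE, TRUE)
gender   = as.factor( c("F", "M", "F", "M", "M") )
skin     = as.factor( c("black", "black", "white", "yellow", "white") )
last.buy = as.Date( c("2021-08-02", "2021-03-02", "2021-05-20",
                      "2021-07-12", "2021-06-15") )

# each vector becomes one column, named after the variable
df = data.frame(name, freq, amount, member, gender, skin, last.buy)
df         # print the whole table
dim(df)    # 5 rows, 7 columns
```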
Easier to manipulate …
name freq amount member gender skin last.buy
2 Bob 5 168 TRUE M black 2021-03-02
5 Edward 3 199 TRUE M white 2021-06-15
[1] 185.4
F M
140.0 215.7
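The commands behind these three results are not shown. The sketch below is one possibility. The filter condition is a guess that happens to select rows 2 and 5, and the names df, sel and avg are hypothetical. The two averages do match the printed values: the mean of amount is 185.4, and the per-gender means are 140.0 for F and 215.7 for M.

```r
# only the columns needed here; skin and last.buy are left out for brevity
df = data.frame(name   = c("Amy", "Bob", "Cindy", "Danny", "Edward"),
                freq   = c(3L, 5L, 1L, 1L, 3L),
                amount = c(100, 168, 180, 280, 199),
                member = c(FALSE, TRUE, FALSE, TRUE, TRUE),
                gender = as.factor( c("F", "M", "F", "M", "M") ))

sel = subset(df, member & freq >= 3)   # assumed condition: members who bought 3+ times
sel

mean(df$amount)                                      # overall average amount
avg = round( tapply(df$amount, df$gender, mean), 1)  # average amount per gender
avg
```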
🌻 Operations on data frames will be further elaborated in the DataCamp assignment.