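Wholesales dataset
# Read the data and relabel the two categorical columns
W = read.csv('data/wholesales.csv')
W$Channel = factor( paste0("Ch",W$Channel) )
W$Region = factor( paste0("Reg",W$Region) )
# Log-transform (base 10) the six spending columns
W[3:8] = lapply(W[3:8], log, base=10)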
summary(W)
 Channel    Region        Fresh            Milk          Grocery
 Ch1:298   Reg1: 77   Min.   :0.477   Min.   :1.74   Min.   :0.477
 Ch2:142   Reg2: 47   1st Qu.:3.495   1st Qu.:3.19   1st Qu.:3.333
           Reg3:316   Median :3.930   Median :3.56   Median :3.677
                      Mean   :3.792   Mean   :3.53   Mean   :3.666
                      3rd Qu.:4.229   3rd Qu.:3.86   3rd Qu.:4.028
                      Max.   :5.050   Max.   :4.87   Max.   :4.968
     Frozen      Detergents_Paper    Delicassen
 Min.   :1.40    Min.   :0.477     Min.   :0.477
 1st Qu.:2.87    1st Qu.:2.409     1st Qu.:2.611
 Median :3.18    Median :2.912     Median :2.985
 Mean   :3.17    Mean   :2.947     Mean   :2.895
 3rd Qu.:3.55    3rd Qu.:3.594     3rd Qu.:3.260
 Max.   :4.78    Max.   :4.611     Max.   :4.681
Clustering: Group
The most commonly used methods of clustering:
💡 Steps of Hierarchical Cluster Analysis:
■ scale() : Standardize the variables
■ dist() : Calculate the distance matrix
■ hclust() : Run the hierarchical clustering
■ plot() : Draw the dendrogram
■ rect.hclust() : Cut the dendrogram (boxes are drawn around the groups)
■ cutree() : Obtain the clustering vector
For simplicity, let’s start with two clustering variables.
The result of the clustering analysis is returned and kept in the data
object hc.
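The first three steps can be sketched as follows. This is a minimal illustration, not the original chunk: the data frame is a simulated stand-in for the log-transformed W, the choice of Fresh and Milk as the two clustering variables is an assumption, and dist() and hclust() are left at their defaults (Euclidean distance, complete linkage).

```r
# Simulated stand-in for the log-transformed wholesales data (assumption:
# the real W comes from data/wholesales.csv, which is not bundled here).
set.seed(1)
W <- data.frame(
  Fresh = c(rnorm(20, 3.0, 0.2), rnorm(20, 4.0, 0.2), rnorm(20, 4.5, 0.2)),
  Milk  = c(rnorm(20, 3.0, 0.2), rnorm(20, 4.2, 0.2), rnorm(20, 3.2, 0.2))
)

# Two clustering variables, chosen here only for illustration
z  <- scale(W[, c("Fresh", "Milk")])  # standardize each column
d  <- dist(z)                         # pairwise Euclidean distances
hc <- hclust(d)                       # hierarchical clustering, complete linkage
hc                                    # prints a short description of the tree
```

Standardizing first keeps a variable with a wider spread from dominating the distances.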
Making and Interpreting the Dendrogram
Determining the Number of Groups and Cutting the Dendrogram
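A sketch of drawing and cutting the dendrogram, again on simulated data. The number of groups (k = 3) is an assumption made for this example; in practice it is read off the dendrogram, for instance by looking for a large jump in the merge heights.

```r
# Simulated stand-in for two log-transformed spending variables (assumption)
set.seed(1)
W <- data.frame(
  Fresh = c(rnorm(20, 3.0, 0.2), rnorm(20, 4.0, 0.2), rnorm(20, 4.5, 0.2)),
  Milk  = c(rnorm(20, 3.0, 0.2), rnorm(20, 4.2, 0.2), rnorm(20, 3.2, 0.2))
)
hc <- hclust(dist(scale(W)))

# Draw the tree; subject labels are suppressed to keep the plot readable
plot(hc, labels = FALSE, hang = -1, main = "Dendrogram (simulated data)")

# The last few merge heights: a large jump hints at a natural cut point
tail(hc$height, 5)

# Mark k = 3 groups on the existing plot (k is a hypothetical choice)
boxes <- rect.hclust(hc, k = 3, border = "red")
lengths(boxes)   # number of subjects inside each box
```

rect.hclust() only draws on a dendrogram that is already plotted, so it has to follow plot().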
Obtain and Save the Clustering Vector
Save it as a categorical variable (a factor), so it won’t be interpreted as numeric.
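One way this step might look, assuming the tree is stored in hc and three groups are wanted. The data and k = 3 are illustrative; the column name group matches the W$group used later in the PCA plot.

```r
# Simulated stand-in for two log-transformed spending variables (assumption)
set.seed(1)
W <- data.frame(
  Fresh = c(rnorm(20, 3.0, 0.2), rnorm(20, 4.0, 0.2), rnorm(20, 4.5, 0.2)),
  Milk  = c(rnorm(20, 3.0, 0.2), rnorm(20, 4.2, 0.2), rnorm(20, 3.2, 0.2))
)
hc <- hclust(dist(scale(W)))

# cutree() returns one integer group label per subject;
# wrapping it in factor() makes it categorical
W$group <- factor(cutree(hc, k = 3))

table(W$group)   # group sizes
str(W$group)     # confirms the column is a factor, not a number
```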
Plot the subjects in the Variable Space
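A base-graphics sketch of this plot, using simulated data and a hypothetical k = 3. Each subject is drawn at its position on the two clustering variables and colored by group, with the group means overlaid as crosses.

```r
# Simulated stand-in for two log-transformed spending variables (assumption)
set.seed(1)
W <- data.frame(
  Fresh = c(rnorm(20, 3.0, 0.2), rnorm(20, 4.0, 0.2), rnorm(20, 4.5, 0.2)),
  Milk  = c(rnorm(20, 3.0, 0.2), rnorm(20, 4.2, 0.2), rnorm(20, 3.2, 0.2))
)
W$group <- factor(cutree(hclust(dist(scale(W))), k = 3))

# One point per subject, colored by its cluster
plot(W$Fresh, W$Milk, col = as.integer(W$group), pch = 19,
     xlab = "Fresh (log10)", ylab = "Milk (log10)")

# Overlay the mean of each group as a cross
centers <- aggregate(cbind(Fresh, Milk) ~ group, data = W, FUN = mean)
points(centers$Fresh, centers$Milk, pch = 4, cex = 2, lwd = 2)

legend("topleft", legend = levels(W$group),
       col = seq_len(nlevels(W$group)), pch = 19, title = "group")
```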
For better looks …
Dimension Reduction: Compress the space of many variables into a low-dimensional space for easier observation
The most commonly used methods of dimension reduction:
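■ PCA()
■ cmdscale()
library(magrittr)    # provides %>%
library(FactoMineR)  # provides PCA()
library(factoextra)  # provides fviz_pca_biplot()
W[,3:8] %>% PCA(graph=FALSE) %>% fviz_pca_biplot(
  col.ind=W$group,   # color each subject by its cluster
  label="var", pointshape=19, mean.point=FALSE,
  addEllipses=TRUE, ellipse.level=0.7,
  ellipse.type = "convex", palette="ucscgb",
  repel=TRUE
)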
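The other method listed, cmdscale(), can be sketched with base R alone. The six-column data frame below is simulated (the column names follow the summary output, the values do not), and k = 3 groups is an assumption. The same distance matrix is used both for clustering and for the two-dimensional layout.

```r
# Simulated stand-in for the six log-transformed spending columns (assumption)
set.seed(1)
vars  <- c("Fresh", "Milk", "Grocery", "Frozen", "Detergents_Paper", "Delicassen")
shift <- rep(c(0, 0.8, 1.6), each = 20)   # three simulated blobs of 20 subjects
X <- matrix(rnorm(60 * 6, mean = 3, sd = 0.3), ncol = 6,
            dimnames = list(NULL, vars)) + shift
W <- as.data.frame(X)

d <- dist(scale(W))                        # distances in the 6-variable space
W$group <- factor(cutree(hclust(d), k = 3))

# Classical multidimensional scaling: a 2-D layout that approximates d
xy <- cmdscale(d, k = 2)

plot(xy, col = as.integer(W$group), pch = 19,
     xlab = "Dimension 1", ylab = "Dimension 2")
```

With Euclidean distances, the coordinates from classical scaling match the principal component scores up to sign, so this gives a view comparable to the PCA plot without the variable arrows.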
💡 Key Learnings:
■ The concept and purpose of clustering analysis
■ Clustering analysis in a pipeline
■ df %>% scale %>% dist %>% hclust
■ Making, interpreting and cutting the dendrogram
■ The concept and purpose of dimension reduction
■ Combining dimension reduction and clustering analysis
■ Visualizing the subjects/groups in a reduced variable space