Wholesales dataset
W = read.csv('data/wholesales.csv')
W$Channel = factor( paste0("Ch",W$Channel) )
W$Region = factor( paste0("Reg",W$Region) )
W[3:8] = lapply(W[3:8], log, base=10)
summary(W)
 Channel    Region        Fresh            Milk         Grocery
 Ch1:298   Reg1: 77   Min.   :0.477   Min.   :1.74   Min.   :0.477  
 Ch2:142   Reg2: 47   1st Qu.:3.495   1st Qu.:3.19   1st Qu.:3.333  
           Reg3:316   Median :3.930   Median :3.56   Median :3.677  
                      Mean   :3.792   Mean   :3.53   Mean   :3.666  
                      3rd Qu.:4.229   3rd Qu.:3.86   3rd Qu.:4.028  
                      Max.   :5.050   Max.   :4.87   Max.   :4.968  
     Frozen     Detergents_Paper   Delicassen   
 Min.   :1.40   Min.   :0.477    Min.   :0.477  
 1st Qu.:2.87   1st Qu.:2.409    1st Qu.:2.611  
 Median :3.18   Median :2.912    Median :2.985  
 Mean   :3.17   Mean   :2.947    Mean   :2.895  
 3rd Qu.:3.55   3rd Qu.:3.594    3rd Qu.:3.260  
 Max.   :4.78   Max.   :4.611    Max.   :4.681
Clustering: Group
The most commonly used methods of clustering:
💡 Steps of Hierarchical Cluster Analysis:
   ■ scale() : Standardize the Variables
   ■ dist() : Calculate the Distance Matrix
   ■ hclust() : Call the hclust Function
   ■ plot() : Make the Dendrogram
   ■ rect.hclust() : Cut the Dendrogram
   ■ cutree() : Obtain the Clustering Vector
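The six steps above can be sketched end to end as follows. This is a minimal illustration rather than the original notebook code: `demo` is a hypothetical random stand-in for the six log-transformed spending columns, and `k = 3` is an arbitrary choice made only for the example.

```r
# Hypothetical stand-in for the log-transformed W[,3:8]; NOT the real data
set.seed(1)
demo = as.data.frame(matrix(rnorm(60 * 6, mean = 3, sd = 0.6), ncol = 6))
names(demo) = c("Fresh", "Milk", "Grocery", "Frozen",
                "Detergents_Paper", "Delicassen")

z  = scale(demo)          # 1. standardize each variable (mean 0, sd 1)
d  = dist(z)              # 2. Euclidean distance between every pair of rows
hc = hclust(d)            # 3. hierarchical clustering (default: complete linkage)
plot(hc, labels = FALSE)  # 4. draw the dendrogram
rect.hclust(hc, k = 3)    # 5. box the branches of a 3-group cut (k is arbitrary here)
g  = cutree(hc, k = 3)    # 6. clustering vector: one group number per row
table(g)                  # group sizes
```

Standardizing first keeps a variable with a wide range from dominating the distance matrix.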
For simplicity, let’s start with two clustering variables.
The result of the clustering analysis is returned and kept in the data
object hc.
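A two-variable run might look like the sketch below. Which two columns the original uses is not shown here, so `Fresh` and `Milk` are an assumption, and `demo` is a hypothetical random stand-in for the real data. The base-R pipe `|>` (R 4.1 or later) is used so the example needs no extra packages; it plays the same role as `%>%`.

```r
# Hypothetical stand-in for two log-transformed columns of W (NOT the real data)
set.seed(1)
demo = data.frame(Fresh = rnorm(60, 3.8, 0.6), Milk = rnorm(60, 3.5, 0.5))

# scale -> dist -> hclust in one pipeline; the result is kept in hc
hc = demo |> scale() |> dist() |> hclust()

class(hc)        # "hclust"
hc$method        # linkage used; "complete" is the hclust() default
hc$dist.method   # distance used; "euclidean" is the dist() default
dim(hc$merge)    # one row per merge step: (n - 1) x 2
```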
Making and Interpreting the Dendrogram
Determining the Number of Groups and Cutting the Dendrogram
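One possible way to draw the dendrogram and decide where to cut it is sketched below. The data are a hypothetical random stand-in, `k = 3` is arbitrary, and looking at the last few merge heights is only one informal heuristic for choosing the number of groups, not necessarily the approach taken in the original.

```r
# Hypothetical stand-in data (NOT the real wholesales data)
set.seed(2)
demo = data.frame(Fresh = rnorm(60, 3.8, 0.6), Milk = rnorm(60, 3.5, 0.5))
hc = hclust(dist(scale(demo)))

# Draw the tree; hang = -1 lines all leaves up at the bottom
plot(hc, labels = FALSE, hang = -1, main = "Dendrogram (demo data)")

# Heights of the last five merges, largest first; a big drop between
# consecutive values hints at a natural place to cut
round(rev(tail(hc$height, 5)), 2)

# Cut into k groups and box each one; the boxed members are returned invisibly
boxes = rect.hclust(hc, k = 3, border = "red")
lengths(boxes)   # number of subjects in each boxed branch
```

`rect.hclust()` also accepts a cut height through `h =` instead of a group count through `k =`.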
Obtain and Save the Clustering Vector
Save it as a categorical variable (a factor), so it won’t be interpreted as a numeric value.
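A minimal sketch of this step, assuming a three-group cut and a hypothetical stand-in data frame `demo` (the column name `group` mirrors the `W$group` used later in this section):

```r
# Hypothetical stand-in data (NOT the real wholesales data)
set.seed(3)
demo = data.frame(Fresh = rnorm(60, 3.8, 0.6), Milk = rnorm(60, 3.5, 0.5))
hc = hclust(dist(scale(demo)))

g = cutree(hc, k = 3)     # integer vector: group 1, 2 or 3 for each row
demo$group = factor(g)    # store it as a categorical variable

is.numeric(g)             # TRUE: the raw cutree() output is numeric
is.factor(demo$group)     # TRUE: the saved column is categorical
table(demo$group)         # group sizes
```

Keeping the group as a factor means plotting and modelling functions treat it as a set of labels rather than as a quantity.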
Plot the subjects in the Variable Space
For better looks …
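With only two clustering variables, the subjects can be drawn directly in the variable space and coloured by group. The sketch below uses base R graphics and hypothetical stand-in data; the colours and axis labels are arbitrary choices, and the original may well use a different plotting package.

```r
# Hypothetical stand-in data (NOT the real wholesales data)
set.seed(4)
demo = data.frame(Fresh = rnorm(60, 3.8, 0.6), Milk = rnorm(60, 3.5, 0.5))
demo$group = factor(cutree(hclust(dist(scale(demo))), k = 3))

pal  = c("tomato", "steelblue", "darkgreen")   # one colour per group (arbitrary)
cols = pal[as.integer(demo$group)]             # colour for each subject

plot(demo$Fresh, demo$Milk, col = cols, pch = 19,
     xlab = "Fresh (log10)", ylab = "Milk (log10)",
     main = "Subjects coloured by cluster")
legend("topleft", legend = levels(demo$group), col = pal, pch = 19,
       title = "group")
```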
Dimension Reduction: Compress the space of many variables into a low-dimensional space for easier observation
The most commonly used methods of dimension reduction:
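   ■ PCA()
   ■ cmdscale()
library(magrittr)     # provides the %>% pipe
library(FactoMineR)   # provides PCA()
library(factoextra)   # provides fviz_pca_biplot()
W[,3:8] %>% PCA(graph=FALSE) %>% fviz_pca_biplot(
  col.ind=W$group,  # color each subject by its cluster group
  label="var", pointshape=19, mean.point=F,
  addEllipses=T, ellipse.level=0.7,
  ellipse.type = "convex", palette="ucscgb",
  repel=T
  )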
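For comparison, the second method listed, `cmdscale()` (classical multidimensional scaling), can be sketched with base R only. The data below are a hypothetical random stand-in for the six log-transformed columns, and `k = 3` groups is arbitrary. `prcomp()` is used here as a base-R substitute for `PCA()` so the example needs no extra packages.

```r
# Hypothetical stand-in for the log-transformed W[,3:8]; NOT the real data
set.seed(5)
demo = as.data.frame(matrix(rnorm(60 * 6, mean = 3, sd = 0.6), ncol = 6))

z = scale(demo)                  # standardize
d = dist(z)                      # Euclidean distance matrix
group = factor(cutree(hclust(d), k = 3))

xy = cmdscale(d, k = 2)          # 60 x 2 matrix: coordinates in two dimensions
plot(xy, col = as.integer(group), pch = 19, xlab = "Dim 1", ylab = "Dim 2")

# With Euclidean distances, classical MDS and PCA should give the same
# coordinates up to a sign flip of each axis
pc = prcomp(z)
all.equal(abs(unname(xy[, 1])), abs(unname(pc$x[, 1])))
```

The practical difference is the input: `cmdscale()` starts from a distance matrix, so it also works when only pairwise distances are available.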
💡 Key Learnings:
   ■ The Concept and Purpose of Clustering Analysis
   ■ Clustering Analysis in a pipeline
     ■ df %>% scale %>% dist %>% hclust
   ■ Making, Interpreting and Cutting the Dendrogram
   ■ The Concept and Purpose of Dimension Reduction
   ■ Combining Dimension Reduction and Clustering Analysis
   ■ Visualizing the subjects/groups in a Reduced Variable Space