`pacman::p_load(dplyr, ggplot2, readr, FactoMineR, factoextra, dendextend)`

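`wholesales` dataset

```
# Read the data and label the two categorical columns
W = read.csv('data/wholesales.csv')
W$Channel = factor( paste0("Ch",W$Channel) )
W$Region = factor( paste0("Reg",W$Region) )
# Log-transform (base 10) the six sales columns
W[3:8] = lapply(W[3:8], log, base=10)
summary(W)
```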

```
 Channel    Region        Fresh            Milk          Grocery
 Ch1:298   Reg1: 77   Min.   :0.477   Min.   :1.74   Min.   :0.477
 Ch2:142   Reg2: 47   1st Qu.:3.495   1st Qu.:3.19   1st Qu.:3.333
           Reg3:316   Median :3.930   Median :3.56   Median :3.677
                      Mean   :3.792   Mean   :3.53   Mean   :3.666
                      3rd Qu.:4.229   3rd Qu.:3.86   3rd Qu.:4.028
                      Max.   :5.050   Max.   :4.87   Max.   :4.968
     Frozen      Detergents_Paper    Delicassen
 Min.   :1.40    Min.   :0.477     Min.   :0.477
 1st Qu.:2.87    1st Qu.:2.409     1st Qu.:2.611
 Median :3.18    Median :2.912     Median :2.985
 Mean   :3.17    Mean   :2.947     Mean   :2.895
 3rd Qu.:3.55    3rd Qu.:3.594     3rd Qu.:3.260
 Max.   :4.78    Max.   :4.611     Max.   :4.681
```

**Clustering:** Group the subjects by the variables

- subjects: each row is a retail store
- variables: each column is the (log-transformed) sales amount of a product category

The most commonly used clustering methods:

- Hierarchical Clustering
- K-means Clustering
- …

💡 Steps of Hierarchical Cluster Analysis:

■ `scale()`: Standardize the variables

■ `dist()`: Calculate the distance matrix

■ `hclust()`: Call the `hclust` function

■ `plot()`: Make the dendrogram

■ `rect.hclust()`: Cut the dendrogram

■ `cutree()`: Obtain the clustering vector
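The six steps above can be sketched end to end on a small example. This is only an illustration: the data frame `X` and its columns `x1` and `x2` are simulated stand-ins (three well-separated groups of 20 points), not the `wholesales` data, and the intermediate names `Z`, `D` and `g` are chosen here for readability.

```r
# Simulated stand-in data (an assumption for illustration, not the wholesales data)
set.seed(1)
centers = c(0, 5, 10)
X = data.frame(
  x1 = rnorm(60, mean = rep(centers, each = 20), sd = 0.5),
  x2 = rnorm(60, mean = rep(centers, each = 20), sd = 0.5)
)

Z  = scale(X)                            # 1. standardize each variable
D  = dist(Z)                             # 2. Euclidean distances between rows
hc = hclust(D)                           # 3. hierarchical clustering (complete linkage by default)
plot(hc, labels = FALSE)                 # 4. draw the dendrogram
rect.hclust(hc, k = 3, border = "red")   # 5. mark a 3-group cut on the dendrogram
g  = cutree(hc, k = 3)                   # 6. one group label per row of X
table(g)                                 # group sizes
```

Keeping each intermediate result in its own object makes it easy to inspect what every step returns before chaining the steps with `%>%`.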

For simplicity, let’s start with two clustering variables.

`hc = W[,3:4] %>% scale %>% dist %>% hclust`

The result of the clustering analysis is returned and kept in the data object `hc`.
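To see what an object returned by `hclust()` contains, it can help to look at a tiny case. The sketch below uses five made-up values on a single variable (the names `a` to `e` are hypothetical labels), so the merge history is short enough to read by eye.

```r
# Five made-up observations on one variable (for illustration only)
x  = c(a = 1, b = 2, c = 6, d = 7, e = 15)
hc = hclust(dist(x))

hc$method   # linkage rule that was used; hclust() defaults to "complete"
hc$merge    # one row per merge step; negative entries refer to single observations
hc$height   # the distance at which each merge happened
hc$order    # left-to-right order of the leaves in the dendrogram
```

With `n` observations there are always `n - 1` merge steps, and the heights stored in `hc$height` are the values shown on the vertical axis of the dendrogram.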

**Make and Interpret the Dendrogram**

**Determine the Number of Groups and Cut the Dendrogram**

```
plot(hc)
k = 6; rect.hclust(hc, k=k, border="red")
```
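One way to read a dendrogram when choosing the number of groups is to look for a large gap between successive merge heights and cut inside that gap. The sketch below is a toy illustration on made-up values, not the `wholesales` data. It assumes a cut height of 4, picked by eye, and checks that cutting by height gives the same grouping as asking for three groups directly.

```r
# Made-up values: two tight pairs and one distant point
x  = c(1, 2, 6, 7, 15)
hc = hclust(dist(x))

plot(hc)
rect.hclust(hc, k = 3, border = "red")   # ask for 3 groups directly
abline(h = 4, lty = 2)                   # a horizontal cut placed in a gap between merge heights

g_k = cutree(hc, k = 3)                  # cut by number of groups
g_h = cutree(hc, h = 4)                  # cut by height
all(g_k == g_h)                          # do the two cuts agree?
```

Both `rect.hclust()` and `cutree()` accept either `k` or `h`, so the choice can be expressed whichever way is easier to justify from the plot.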

**Obtain and Save the Clustering Vector**

`W$group = cutree(hc, k=8) %>% factor`

Save it as a categorical variable (a factor) so that it won’t be interpreted as a numeric variable.
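The difference matters because `cutree()` returns plain numbers, and plotting functions such as `ggplot()` map a numeric column to a continuous color gradient, whereas a factor gets one discrete color per group. A minimal sketch on made-up values (the names `g_num` and `g_fac` are chosen here for illustration):

```r
# Made-up values: two obvious groups on one variable
x  = c(1, 2, 3, 11, 12, 13)
hc = hclust(dist(x))

g_num = cutree(hc, k = 2)   # group labels stored as numbers
g_fac = factor(g_num)       # the same labels stored as categories

is.numeric(g_num)           # TRUE: would be treated as a quantity
is.factor(g_fac)            # TRUE: treated as group membership
levels(g_fac)               # the distinct group labels
table(g_fac)                # number of observations in each group
```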

**Plot the subjects in the Variable Space**

```
ggplot(W, aes(x=Fresh, y=Milk, col=group)) +
geom_point(size=3, alpha=0.5)
```

```
hc = W[,3:7] %>% scale %>% dist %>% hclust
plot(hc)
k = 6; rect.hclust(hc, k, border="red")
```