🏠 Key Points:
■ 3 different types of bar plots in ggplot
■ Bar plots for group comparison
■ Choose the basis of comparison by setting position
■ Ill-designed bar plots could be misleading in an intuitive way! ⏰
Load libraries and set global options
pacman::p_load(dplyr,ggplot2,plotly,gridExtra)
theme_set(theme_get() + theme(
  text=element_text(size=8), legend.key.size=unit(10,"points")
))
Load the comics dataset
Rows: 7,250
Columns: 9
$ publisher <chr> "dc", "dc", "dc", "dc", "dc", "dc", "dc", "dc", "dc", "dc"…
$ name <chr> "Batman (Bruce Wayne)", "Superman (Clark Kent)", "Green La…
$ align <chr> "Good", "Good", "Good", "Good", "Good", "Good", "Good", "G…
$ eye <chr> "Blue", "Blue", "Brown", "Brown", "Blue", "Blue", "Blue", …
$ hair <chr> "Black", "Black", "Brown", "White", "Black", "Black", "Blo…
$ sex <chr> "Male", "Male", "Male", "Male", "Male", "Female", "Male", …
$ alive <chr> "Living", "Living", "Living", "Living", "Living", "Living"…
$ appearances <int> 3093, 2496, 1565, 1316, 1237, 1231, 1121, 1095, 1075, 1028…
$ year <int> 1939, 1986, 1959, 1987, 1940, 1941, 1941, 1989, 1969, 1956…
A tibble is an enhanced data frame (from the tibble package,
re-exported by dplyr). We convert D into a tibble for better
display.
🌻 There are 3 types of bar plots in ggplot
g1 = ggplot(D, aes(x=year)) + geom_histogram(binwidth=10)
g2 = ggplot(D, aes(x=align)) + geom_bar()
g3 = group_by(D, sex) %>%
  summarise( casualty.ratio = mean(alive=="Deceased") ) %>%
  ggplot(aes(x=sex, y=casualty.ratio)) + geom_col()
grid.arrange(g1,g2,g3,nrow=1) # align three plots in a row
🌻 These bar plots help us …
The complexity of bar plots kicks in when the comparisons involve
sub-groups. To distinguish sub-groups within a bar, we map the
fill aesthetic to the sub-grouping variable.
g1 = ggplot(D, aes(x=year, fill=publisher)) + geom_histogram(binwidth=10)
g2 = ggplot(D, aes(x=align, fill=sex)) + geom_bar()
grid.arrange(g1,g2,nrow=1) # align two plots in a row
🚴 EXERCISE :
Try to make the following chart by modifying the code snippet below.
# group_by(D, sex) %>%
# summarise( casualty.ratio = mean(alive=="Deceased") ) %>%
# ggplot(aes(x=sex, y=casualty.ratio)) + geom_col()
❓ DISCUSSION:
Based on the chart above …
We can cope with these problems by using the position
argument within geom_col().
See? Now we can compare the casualty ratio of all of the subgroups easily.
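The chart for this step is not reproduced here. As a minimal sketch of the idea, assuming the sub-groups are defined by align and using a small made-up data frame `toy` in place of D (the toy rows are hypothetical), the sub-group ratios can be computed and drawn side by side like this:

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy rows standing in for the comics data D
toy = data.frame(
  sex   = c("Female","Female","Female","Female","Male","Male","Male","Male"),
  align = c("Good","Good","Bad","Bad","Good","Good","Bad","Bad"),
  alive = c("Living","Deceased","Living","Living",
            "Deceased","Deceased","Living","Deceased")
)

# one casualty ratio per (sex, align) sub-group
ratios = group_by(toy, sex, align) %>%
  summarise(casualty.ratio = mean(alive=="Deceased"), .groups="drop")

# position="dodge" places the sub-group bars side by side instead of stacking
# them, so each bar height can be read directly from the y axis
p = ggplot(ratios, aes(x=sex, y=casualty.ratio, fill=align)) +
  geom_col(position="dodge")
p
```

Stacking ratios adds them up into a total that has no meaning, which is why dodge is the safer choice when the y value is already a ratio.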
Sub-Group comparison is based on a data structure called
          Female Male
  Bad        676 2239
  Good      1264 1914
  Neutral    432  725
However, ggplot cannot take the table format. To be
compatible with the aes() mapping mechanism,
we need to prepare the data in the long format.
count() is a handy way to make a long table, compared
with group_by() %>% summarise().
# A tibble: 6 × 3
  align   sex        n
  <chr>   <chr>  <int>
1 Bad     Female   676
2 Bad     Male    2239
3 Good    Female  1264
4 Good    Male    1914
5 Neutral Female   432
6 Neutral Male     725
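To make the contrast between the two layouts concrete, here is a small sketch on a made-up data frame (`toy` and its rows are hypothetical, not taken from D). It builds the wide cross table with base R's table(), then the long table twice: once with count() and once with group_by() %>% summarise().

```r
library(dplyr)

# Hypothetical toy rows standing in for D
toy = data.frame(
  align = c("Bad","Bad","Good","Good","Good","Neutral"),
  sex   = c("Male","Male","Female","Male","Male","Female")
)

# wide cross table: one row per align, one column per sex
table(toy$align, toy$sex)

# long table in one call: one row per (align, sex) combination present in the data
long1 = count(toy, align, sex)

# the same counts, spelled out step by step
long2 = group_by(toy, align, sex) %>%
  summarise(n = n(), .groups="drop")
```

Each row of the long table carries its own align and sex values, which is what lets aes() map them to x and fill.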
position
By setting the position argument in
geom_col(), we can align and compare the numbers in
different ways.
dx = count(D, align, sex)
gg = lapply(c("stack","dodge","fill"), function(pos) {
  ggplot(dx, aes(sex,n,fill=align)) +
    geom_col(position=pos, alpha=0.6) + labs(title=pos,y="")
})
grid.arrange(grobs=gg, nrow=1)
🌻 Different plots serve different purposes …
■ stack the numbers to emphasize the sums by sex
■ dodge to compare all of the numbers in the table
■ fill converts numbers into fractions for relative comparison
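To see what the fill position does to the numbers, the sketch below recomputes the fractions by hand from the align-by-sex counts in the long table above. The names `fx`, `total` and `fraction` are introduced here for illustration only.

```r
library(dplyr)

# the align-by-sex counts from the long table above
dx = data.frame(
  align = c("Bad","Bad","Good","Good","Neutral","Neutral"),
  sex   = c("Female","Male","Female","Male","Female","Male"),
  n     = c(676, 2239, 1264, 1914, 432, 725)
)

# with x=sex, position="fill" rescales each sex's bar to height 1,
# so every segment becomes n divided by that sex's total
fx = group_by(dx, sex) %>%
  mutate(total = sum(n), fraction = n / total) %>%
  ungroup()
fx
```

Because each bar is rescaled separately, the fill chart hides the fact that there are about twice as many male as female characters; that information is only visible in the stack and dodge versions.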
❓ QUIZ :
Which of the 3 above charts is easier to …
🚴 EXERCISE :
Actually we can
make three more plots out of exactly the same data. Try to make the
following chart by modifying the code snippet below.
# gg = lapply(c("stack","dodge","fill"), function(pos) {
# ggplot(dx, aes(sex,n,fill=align)) +
# geom_col(position=pos, alpha=0.6) + labs(title=pos,y="")
# })
# grid.arrange(grobs=gg, nrow=1)
Let’s mutate a decade column in D.
   3    4    5    6    7    8    9   10   11
  28  271  106  678  823 1304 1581 1803  656
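The mutate() call itself is not shown. One formula that is consistent with the decade values 3 to 11 in the table is to count decades from 1900; this is an assumption, and the original code may differ. A minimal sketch on a few sample years (`toy` is a hypothetical stand-in for D):

```r
library(dplyr)

# ASSUMPTION: decades are counted from 1900, so 1939 -> 3 and 1986 -> 8;
# the original mutate() call is not shown and may differ
toy = data.frame(year = c(1939, 1959, 1986, 2013))   # sample years; 2013 is made up
toy = mutate(toy, decade = (year - 1900) %/% 10)

# tabulate the number of rows per decade
table(toy$decade)
```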
❓ How does the number of each align vary over time, by
sex and by publisher?
🌷 See how easily we can answer this seemingly complicated query in two lines of simple code. If I’d answered this query with a table full of numbers, would it be helpful at all?
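The two lines are not shown at this point. Judging from the faceted chart used later in this section, they plausibly look like the sketch below; this is a guess, not the original code. `toy` is a randomly generated, hypothetical stand-in for D, and the publisher value "marvel" is assumed.

```r
library(ggplot2)

# Hypothetical stand-in for D with just the columns the plot needs
set.seed(1)
toy = data.frame(
  decade    = sample(6:11, 200, replace=TRUE),
  align     = sample(c("Bad","Good","Neutral"), 200, replace=TRUE),
  sex       = sample(c("Female","Male"), 200, replace=TRUE),
  publisher = sample(c("dc","marvel"), 200, replace=TRUE)   # "marvel" is assumed
)

# counts of each align per decade, one panel per sex x publisher combination
p = ggplot(toy, aes(x=decade, fill=align)) + geom_bar() +
  facet_grid(sex~publisher)
p
```

facet_grid(sex~publisher) puts sex on the panel rows and publisher on the panel columns, so the same bar plot is repeated for every combination.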
Compared with the raw counts, the variations in ratios might better
reflect the trend. To convert counts into ratios, we simply put
position='fill' in geom_bar().
# here we convert `align` into a factor so we can
# re-order the align levels in a desirable way
D2 = D %>% mutate(
    align=factor(align,levels=c("Bad","Neutral","Good"))
  ) %>%
  filter(decade >= 6)
ggplot(D2, aes(x=decade, fill=align)) +
  geom_bar(position="fill") +
  facet_grid(sex~publisher)
Below is a bar plot that shows the ratios of good and bad aligns for different hair and eye colors.
# rank hair and eye colors by frequency so we can keep the top 3 of each
hx = count(D2, hair, sort=TRUE)
ex = count(D2, eye, sort=TRUE)
D2 %>% filter(
    hair%in%hx$hair[1:3], eye%in%ex$eye[1:3],
    align!="Neutral") %>%
  ggplot(aes(decade,fill=align)) +
  geom_bar(position="fill",alpha=0.7) +
  labs(x="eye",y="hair") +
  facet_grid(hair~eye)
🚴 EXERCISE :
Can you modify the above code chunk …