🏠 Key Points :
■ Observe trends with line plots
■ Chain dplyr and ggplot in a pipeline
■ Compare trends in several aspects at once
Load the libraries and the comics dataset
pacman::p_load(dplyr,tidyr,ggplot2,plotly,gridExtra)  # install (if missing) and load the packages
theme_set(theme_get() + theme(                        # shrink default text and legend keys
text=element_text(size=8), legend.key.size=unit(10,"points")
))
D = read.csv("data/comics1.csv") %>% as_tibble        # read the data into a tibble
Let’s count and plot the number of new roles that debuted each year.
count(D, year) %>% # count the number of new roles per year
ggplot(aes(year, n)) + geom_line() # and make a line plot
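`count(D, year)` behaves like a shorthand for grouping by `year` and counting rows. A minimal sketch on a hypothetical toy tibble (`toy` and its values are made up, not taken from comics1.csv) compares the two spellings. It also uses `tidyr::complete()` to show one way to make years with no debuts appear as zeros, since `count()` simply leaves such years out:

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: one row per newly debuted role
toy <- tibble(year = c(1990, 1990, 1991, 1993, 1993, 1993))

# count() tallies rows per year into a column named n
a <- count(toy, year)

# The same tally spelled out with group_by() + summarise()
b <- toy %>% group_by(year) %>% summarise(n = n())

# Years with no rows (1992 here) are absent from the count; complete() adds them back as 0
filled <- a %>%
  complete(year = full_seq(year, 1), fill = list(n = 0L)) %>%
  arrange(year)
print(filled)
```

Without the `complete()` step, `geom_line()` would draw a straight segment from 1991 to 1993 and hide the empty year.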
count() and ggplot() + geom_line() is a very handy combo. We can count by an extra grouping variable, publisher, to compare the trend lines of DC and Marvel.
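The code for this step is not shown here. A minimal sketch of what it might look like, assuming `publisher` is mapped to line colour (the `toy` rows below are hypothetical):

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy rows: one row per newly debuted role
toy <- tibble(
  year      = c(1990, 1990, 1991, 1990, 1991, 1991, 1991),
  publisher = c("DC", "DC", "DC", "Marvel", "Marvel", "Marvel", "Marvel")
)

# Counting by year and publisher gives one row per (year, publisher) pair
cnt <- count(toy, year, publisher)
print(cnt)

# Mapping publisher to colour draws one trend line per publisher
p <- ggplot(cnt, aes(year, n, col = publisher)) + geom_line()
```

On the real data the same idea would presumably read `count(D, year, publisher) %>% ggplot(aes(year, n, col=publisher)) + geom_line()`.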
The trend lines look pretty choppy. A common technique for observing a long-term trend is to lengthen the measurement period. Aggregation tends to cancel out random errors, thus smoothing the trend line. Since DC was not active before 1980, …
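breaks=seq(1980,2010,5)                       # 5-year bin edges: 1980, 1985, ..., 2010
D2 = filter(D, year>=1980, year<=2010) %>%    # keep 1980-2010 only
mutate(period = cut(year, breaks, labels=breaks[-1], include.lowest=TRUE) %>%  # label each bin by its upper edge
as.character %>% as.integer)                  # factor -> label text -> integer year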
table(D2$period, useNA='ifany')
1985 1990 1995 2000 2005 2010
690 763 926 659 817 1073
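The `cut()` call above packs two details into one line: bins are right-closed and labelled by their upper edge, and the factor must pass through `as.character` before `as.integer`. A small sketch on a handful of years (the `years` vector is hypothetical) makes both visible:

```r
breaks <- seq(1980, 2010, 5)
years  <- c(1980, 1981, 1985, 1986, 2010)

# Right-closed bins labelled by their upper edge; include.lowest keeps 1980 in the first bin
f <- cut(years, breaks, labels = breaks[-1], include.lowest = TRUE)

# Converting via character recovers the label years
period <- as.integer(as.character(f))
print(period)  # 1985 1985 1985 1990 2010

# as.integer() directly on the factor returns internal level codes instead
codes <- as.integer(f)
print(codes)   # 1 1 1 2 6
```

Because of `include.lowest=TRUE`, the bin labelled 1985 covers 1980 through 1985 (six calendar years), while later bins such as 1990 cover five (1986 through 1990).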
It’s a lot easier to tell the long-term trend, isn’t it? We see a valley at 2000, and Marvel seems to bounce back earlier and more strongly than DC did.
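Why does a longer period smooth the line? A minimal simulation, assuming yearly debut counts behave like independent Poisson draws around a flat rate of 150 per year (both assumptions are for illustration only), compares the relative spread of yearly counts with that of 5-year totals:

```r
set.seed(1)

# Simulated yearly counts around a flat true rate (hypothetical numbers)
yearly <- rpois(300, lambda = 150)

# Sum consecutive 5-year blocks: 300 years -> 60 periods
period_id <- rep(seq_len(60), each = 5)
five_year <- tapply(yearly, period_id, sum)

# Coefficient of variation = sd / mean, a scale-free measure of choppiness
cv <- function(x) sd(x) / mean(x)
cv_yearly <- cv(yearly)
cv_5yr    <- cv(five_year)
print(c(yearly = cv_yearly, five_year = cv_5yr))
```

For Poisson counts the coefficient of variation is 1/sqrt(mean), so summing 5 years is expected to cut the relative noise by about a factor of sqrt(5).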
Combining the power of dplyr and ggplot, we can compare trends in more than one aspect at once. Let’s start with a simple plot that visualizes how the ratio of each align value varies over time.
count(D2, period, align) %>%
group_by(period) %>%
mutate(rate = n/sum(n)) %>%
ggplot(aes(period, rate)) +
geom_line(aes(col=align))
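The key step is `mutate(rate = n/sum(n))` after `group_by(period)`: inside a grouped `mutate()`, `sum(n)` is the total of the current period only, so each `rate` is a within-period share. A sketch with hypothetical counts (the `align` labels are placeholders):

```r
library(dplyr)

# Hypothetical counts of roles per (period, align)
cnt <- tibble(
  period = c(1985, 1985, 1985, 1990, 1990),
  align  = c("Bad", "Good", "Neutral", "Bad", "Good"),
  n      = c(2, 6, 2, 3, 1)
)

# sum(n) is evaluated per period, so the rates within each period add up to 1
rates <- cnt %>%
  group_by(period) %>%
  mutate(rate = n / sum(n)) %>%
  ungroup()
print(rates)
```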
Does the trend vary across publishers?
count(D2, period, publisher, align) %>% # counts by the group variables
group_by(period, publisher) %>% # then use group mutate to
mutate(rate = n/sum(n)) %>% # convert counts into ratios
ggplot(aes(period, rate)) +
facet_grid(~publisher) + # produce 1 panel per publisher
geom_line(aes(col=align), size=1) + # make the line thicker and
geom_point() # add points to make it look better
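Adding `publisher` to `group_by()` changes what each ratio is relative to. A sketch with hypothetical counts for a single period contrasts the per-publisher shares used above with the pooled shares you would get by grouping on `period` alone:

```r
library(dplyr)

# Hypothetical counts per (period, publisher, align)
cnt <- tibble(
  period    = 1990,
  publisher = c("DC", "DC", "Marvel", "Marvel"),
  align     = c("Bad", "Good", "Bad", "Good"),
  n         = c(1, 3, 6, 2)
)

# Shares within each publisher: each publisher's rates sum to 1 on their own
within_pub <- cnt %>%
  group_by(period, publisher) %>%
  mutate(rate = n / sum(n)) %>%
  ungroup()

# Grouping by period alone divides by both publishers' combined total
pooled <- cnt %>%
  group_by(period) %>%
  mutate(rate = n / sum(n)) %>%
  ungroup()
```

With pooled shares, each panel would mix differences in publisher size with differences in the align mix, which is why the chart groups by both variables.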
🚴 EXERCISE :
Let’s try to compare the trends of align by publisher and sex. As you can see in the chart below, there are 4 patterns in 4 separate panels. Can you make the following chart by modifying the code chunk?
# count(D2, period, publisher, align) %>%
# group_by(period, publisher) %>%
# mutate(rate = n/sum(n)) %>%
# ggplot(aes(period, rate)) +
# facet_grid(sex~publisher) + # 1 panel per publisher per sex
# geom_line(aes(col=align), size=1) +
# geom_point()