🏠 Key Points:
■ Observe trends with line plots
■ Chain dplyr and ggplot in a pipeline
■ Compare trends in several aspects at once
Load libraries and load the comics dataset
pacman::p_load(dplyr,tidyr,ggplot2,plotly,gridExtra)
theme_set(theme_get() + theme(
text=element_text(size=8), legend.key.size=unit(10,"points")
))
D = read.csv("data/comics1.csv") %>% as_tibble
Let’s count and plot the number of new roles that debuted each year.
count(D, year) %>%                     # count the number of new roles per year
  ggplot(aes(year, n)) + geom_line()   # and make a line plot
count() followed by ggplot() + geom_line() is a very handy combo. We can count by an extra group variable, publisher, to compare the trend lines of DC and Marvel.
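As a rough sketch of the idea (not necessarily the chunk used for this page), counting by a second variable and mapping it to the line colour gives one trend line per publisher. The `toy` data frame below is a made-up stand-in for D; only the column names `year` and `publisher` are taken from the text.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data standing in for D (values are made up)
toy = tibble(
  year      = c(2000, 2000, 2000, 2001, 2001, 2002, 2002, 2002),
  publisher = c("DC", "Marvel", "Marvel", "DC", "Marvel", "DC", "DC", "Marvel")
)

# Count rows per year AND publisher: one row per (year, publisher) pair
counts = count(toy, year, publisher)

# Mapping publisher to colour draws a separate line for each publisher
p = ggplot(counts, aes(year, n, col = publisher)) + geom_line()
p
```

With the real data, the same pattern would presumably start from count(D, year, publisher).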
The trend lines look pretty choppy. A common technique for observing long-term trends is to lengthen the measurement period. Aggregation tends to cancel out random errors and thus smooths the trend line. Since DC was not active before 1980, …
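breaks = seq(1980, 2010, 5)                      # period boundaries: 1980, 1985, ..., 2010
D2 = filter(D, year >= 1980, year <= 2010) %>%
  mutate(period = cut(year, breaks, labels = breaks[-1], include.lowest = TRUE) %>%
           as.character %>% as.integer)          # label each 5-year period by its end year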
table(D2$period, useNA='ifany')
1985 1990 1995 2000 2005 2010
690 763 926 659 817 1073
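To see how cut() assigns years to periods, here is a small sketch in base R. The `years` vector is made up for illustration; `breaks` is the same sequence as above.

```r
# Hypothetical example years (not taken from the comics data)
years  = c(1980, 1984, 1985, 1986, 1999, 2000, 2001, 2010)
breaks = seq(1980, 2010, 5)

# Intervals are right-closed: (1980,1985], (1985,1990], ...
# include.lowest = TRUE additionally puts 1980 itself into the first interval.
period = cut(years, breaks, labels = breaks[-1], include.lowest = TRUE)

# cut() returns a factor; convert via character to recover the numeric labels.
# as.integer() applied directly to the factor would return level codes 1..6.
period_int = as.integer(as.character(period))

data.frame(years, period_int)
```

Each period is labeled by its end year, so 1985 falls into period 1985 while 1986 falls into period 1990.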
It’s a lot easier to see the long-term trend now, isn’t it? We see a valley at 2000, and Marvel seems to bounce back earlier and more strongly than DC did.
Combining the power of dplyr and ggplot, we can compare trends in more than one aspect at once. Let’s start with a simple plot that visualizes how the share of each align value varies over time.
count(D2, period, align) %>%
group_by(period) %>%
mutate(rate = n/sum(n)) %>%
ggplot(aes(period, rate)) +
geom_line(aes(col=align))
Does the trend vary across publishers?
count(D2, period, publisher, align) %>% # counts by the group variables
group_by(period, publisher) %>% # then use group mutate to
mutate(rate = n/sum(n)) %>% # convert counts into ratios
ggplot(aes(period, rate)) +
facet_grid(~publisher) + # produce 1 panel per publisher
geom_line(aes(col=align), linewidth=1) + # make the line thicker and
geom_point() # add points to make it look better
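The count, group_by, mutate pattern above can be checked on a tiny table. This is only an illustrative sketch: the `toy` data and its align values ("Good", "Bad") are made up, and only the column names follow D2.

```r
library(dplyr)

# Hypothetical toy data using the same column names as D2 (values are made up)
toy = tibble(
  period    = c(1985, 1985, 1985, 1985, 1990, 1990, 1990),
  publisher = c("DC", "DC", "DC", "Marvel", "DC", "Marvel", "Marvel"),
  align     = c("Good", "Good", "Bad", "Bad", "Good", "Good", "Bad")
)

rates = count(toy, period, publisher, align) %>%  # counts per group combination
  group_by(period, publisher) %>%                 # sum(n) is now taken within each group
  mutate(rate = n / sum(n)) %>%                   # convert counts into within-group shares
  ungroup()

# Sanity check: the shares within each (period, publisher) group should add up to 1
check = rates %>%
  group_by(period, publisher) %>%
  summarise(total = sum(rate), .groups = "drop")

rates
check
```

The variables passed to group_by() decide what the rates are relative to: here each rate is the share of an align value within one publisher in one period.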
🚴 EXERCISE:
Let’s try to compare the trends of align by publisher and sex. As you can see in the chart below, there will be 4 patterns in 4 separate panels. Can you make the following chart by modifying the code chunk?
# count(D2, period, publisher, align) %>%
# group_by(period, publisher) %>%
# mutate(rate = n/sum(n)) %>%
# ggplot(aes(period, rate)) +
# facet_grid(sex~publisher) + # 1 panel per publisher per sex
# geom_line(aes(col=align), linewidth=1) +
# geom_point()