🏠 Key Points :
■ Observe trends by line plot
■ ggplot in a pipeline
■ Comparing trends in various aspects at once
Load libraries and load the comics dataset
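The loading code is not shown here, so below is only a minimal sketch of what this step might look like. The `toy` data frame is a hypothetical stand-in for the comics dataset, and the column name `year` is an assumption; `publisher`, `align` and `sex` are the variables mentioned in the text.

```r
library(dplyr)    # count(), group_by(), mutate() and the %>% pipe
library(ggplot2)  # ggplot(), geom_line(), facet_grid()

# Hypothetical stand-in for the comics data: one row per character.
# The real dataset and its exact column names are not shown in the text.
set.seed(1)
n_rows <- 500
toy <- data.frame(
  year      = sample(1980:2009, n_rows, replace = TRUE),
  publisher = sample(c("DC", "Marvel"), n_rows, replace = TRUE),
  align     = sample(c("Bad", "Good", "Neutral"), n_rows, replace = TRUE),
  sex       = sample(c("Female", "Male"), n_rows, replace = TRUE)
)

str(toy)  # quick look at the structure
```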
Let’s count and plot the number of new roles that debuted each year.
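A minimal sketch of this count-then-plot step, using a tiny made-up data frame rather than the real comics data (the column name `year` is an assumption):

```r
library(dplyr)
library(ggplot2)

# Toy data, one row per character; NOT the real comics dataset.
toy <- data.frame(year = c(1980, 1980, 1981, 1982, 1982, 1982))

# count() returns one row per year with the tally in column `n`
annual <- count(toy, year)

# pipe the counts straight into a line plot
p <- annual %>%
  ggplot(aes(year, n)) +
  geom_line()
p
```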
ggplot() + geom_line() is a very handy combo. We can count by an extra group variable, publisher, to compare the trend lines of DC and Marvel.
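Counting by the extra group variable could look roughly like this. Again the data are made up for illustration and `year` is an assumed column name; mapping `publisher` to the line colour is one way to get one trend line per publisher.

```r
library(dplyr)
library(ggplot2)

# Toy data, NOT the real comics dataset.
toy <- data.frame(
  year      = c(1980, 1980, 1980, 1981, 1981, 1982, 1982, 1982),
  publisher = c("DC", "Marvel", "Marvel", "DC", "Marvel", "DC", "DC", "Marvel")
)

# one row per (year, publisher) pair, tally in column `n`
by_pub <- count(toy, year, publisher)

# one coloured line per publisher
p <- by_pub %>%
  ggplot(aes(year, n, col = publisher)) +
  geom_line()
p
```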
The trend lines look pretty choppy. A common technique for observing a long-term trend is to lengthen the measurement period. Aggregation tends to cancel out random errors and thus smooths the trend line. Since DC was not active before 1980, …
    1985 1990 1995 2000 2005 2010
     690  763  926  659  817 1073
It’s a lot easier to tell the long-term trend, isn’t it? We see a valley at 2000, and Marvel seems to bounce back earlier and more strongly than DC did.
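The aggregation idea described above can be sketched as follows. The binning rule is an assumption: here each year is rounded up to the end of its 5-year period (so 1981 to 1985 all become 1985), which may not be the rule used for the chart. The data and the `year` column are made up.

```r
library(dplyr)
library(ggplot2)

# Toy data, NOT the real comics dataset.
toy <- data.frame(
  year      = c(1981, 1983, 1985, 1986, 1990, 1991),
  publisher = c("DC", "Marvel", "DC", "Marvel", "Marvel", "DC")
)

# Assumed rule: label each year with the last year of its 5-year period
toy2 <- mutate(toy, period = ceiling(year / 5) * 5)

# count per period and publisher instead of per year
per_period <- count(toy2, period, publisher)

p <- per_period %>%
  ggplot(aes(period, n, col = publisher)) +
  geom_line() +
  geom_point()
p
```

Wider periods mean fewer, steadier points per line; the price is that short-lived ups and downs are no longer visible.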
With the power of ggplot, we can compare trends in more than one aspect at once. Let’s start with a simple plot that visualizes how the ratio of each align value varies over time.
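A minimal sketch of such a ratio plot, assuming a data frame with a `period` and an `align` column (the values below are made up for illustration):

```r
library(dplyr)
library(ggplot2)

# Toy data, NOT the real comics dataset.
toy2 <- data.frame(
  period = c(1985, 1985, 1985, 1985, 1990, 1990, 1990, 1990),
  align  = c("Good", "Good", "Bad", "Neutral", "Good", "Bad", "Bad", "Neutral")
)

rates <- toy2 %>%
  count(period, align) %>%     # counts per period and align
  group_by(period) %>%         # ratios are computed within each period
  mutate(rate = n / sum(n)) %>%
  ungroup()

# one line per align value; rates within a period sum to 1
p <- rates %>%
  ggplot(aes(period, rate, col = align)) +
  geom_line() +
  geom_point()
p
```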
Does the trend vary across publishers?
    count(D2, period, publisher, align) %>%   # count by the group variables
      group_by(period, publisher) %>%         # then use a grouped mutate to
      mutate(rate = n/sum(n)) %>%             # convert counts into ratios
      ggplot(aes(period, rate)) +
      facet_grid(~publisher) +                # produce 1 panel per publisher
      geom_line(aes(col=align), size=1) +     # make the line thicker and
      geom_point()                            # add points to make it look better
🚴 EXERCISE :
Let’s try to compare the trends of align by publisher and sex. As you can see in the chart below, there will be 4 patterns in 4 separate panels. Can you produce the following chart by modifying the code chunk?