Comparing Trends in Line Plots

🏠 Key Points:
■ Observe trends with line plots
■ Chain dplyr and ggplot2 in a single pipeline
■ Compare trends across several aspects at once

Load the libraries and the comics dataset.

pacman::p_load(dplyr,tidyr,ggplot2,plotly,gridExtra)
theme_set(theme_get() + theme(
  text=element_text(size=8), legend.key.size=unit(10,"points")
  ))
D = read.csv("data/comics1.csv") %>% as_tibble

1. Simple Line Plots

Let’s count and plot the number of new roles that debuted each year.

count(D, year) %>%               # count the number of new roles per year
  ggplot(aes(year, n)) + geom_line()  # and make a line plot

count() followed by ggplot() + geom_line() is a very handy combo. Counting by an extra grouping variable, publisher, lets us compare the trend lines of DC and Marvel.

count(D, year, publisher) %>%                    
  ggplot(aes(year, n, col=publisher)) + 
  geom_line(size=1)
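
To see what count() actually hands over to ggplot(), here is a minimal sketch on a made-up tibble. The toy rows and the names toy, by_year and by_year_pub are assumptions for illustration only; they are not part of the comics dataset.

```r
library(dplyr)

# Hypothetical toy data: one row per debuting role (NOT the comics dataset)
toy <- tibble(
  year      = c(1990, 1990, 1990, 1991, 1991, 1992),
  publisher = c("DC", "Marvel", "Marvel", "DC", "DC", "Marvel")
)

by_year     <- count(toy, year)             # one row per year, counts in column n
by_year_pub <- count(toy, year, publisher)  # one row per year-publisher pair

print(by_year)
print(by_year_pub)
```

Combinations that never occur, such as Marvel in 1991 here, get no row at all rather than a zero, so geom_line() simply connects the neighboring years for that publisher.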

1.1 Debouncing the Choppy Trend Lines

The trend lines look pretty choppy. A common technique for observing a long-term trend is to lengthen the measurement period: aggregating over longer periods tends to cancel out random fluctuations and thus smooths the trend line. Since DC was not active before 1980, …

■ drop the data before 1980 and after 2010
■ divide the 30 years into 6 periods
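
Why does a longer measurement period smooth things out? The sketch below uses a made-up yearly series, a flat level of 100 plus alternating noise of +15 and -15, and compares the yearly values with 5-year averages. The series and the names toy and agg are assumptions for illustration; period means are used instead of period counts only to keep the two scales comparable.

```r
library(dplyr)

# Hypothetical toy series: flat level of 100 with alternating +15 / -15 noise
toy <- tibble(year = 1981:2000, n = 100 + rep(c(15, -15), 10))

breaks <- seq(1980, 2000, 5)
agg <- toy %>%
  mutate(period = cut(year, breaks, labels = breaks[-1])) %>%  # 5-year bins
  group_by(period) %>%
  summarise(mean_n = mean(n))                                  # average per bin

range(toy$n)       # yearly values swing between 85 and 115
range(agg$mean_n)  # 5-year means stay between 97 and 103
```

Within each 5-year bin most of the ups and downs cancel, so the aggregated series stays much closer to the underlying level.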

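breaks = seq(1980, 2010, 5)                  # period boundaries: 1980, 1985, ..., 2010
D2 = filter(D, year>=1980, year<=2010) %>%   # keep 1980 through 2010 only
  mutate(period = cut(year, breaks,
                      labels=breaks[-1],            # label each period by its end year
                      include.lowest=TRUE) %>%      # put 1980 into the first period
           as.character %>% as.integer)             # factor labels -> integer years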
table(D2$period, useNA='ifany')


1985 1990 1995 2000 2005 2010 
 690  763  926  659  817 1073
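
If the cut() call looks cryptic, this small sketch shows how a few sample years are assigned to periods. The vector years is made up for illustration; breaks matches the code above.

```r
breaks <- seq(1980, 2010, 5)
years  <- c(1980, 1985, 1986, 1990, 2010)   # hypothetical sample years

# Intervals are closed on the right: (1980,1985], (1985,1990], ...
# include.lowest = TRUE additionally puts 1980 itself into the first one.
period <- cut(years, breaks, labels = breaks[-1], include.lowest = TRUE)

as.integer(as.character(period))  # 1985 1985 1990 1990 2010 (the period labels)
as.integer(period)                # 1 1 2 2 6 (level codes, not years)
```

The last two lines show why the pipeline goes through as.character before as.integer: converting the factor directly returns the level codes instead of the end years.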

ggplot(D2, aes(period, fill=publisher)) + 
  geom_bar(position='dodge')

It’s a lot easier to tell the long-term trend now, isn’t it? We see a valley at 2000, and Marvel seems to bounce back earlier and more strongly than DC did.

2. Comparing Trends in Multiple Aspects

Combining the power of dplyr and ggplot, we can compare trends in more than one aspect at once. Let’s start with a simple plot that shows how the share of each align value varies from period to period.

count(D2, period, align) %>% 
  group_by(period) %>% 
  mutate(rate = n/sum(n)) %>% 
  ggplot(aes(period, rate)) + 
  geom_line(aes(col=align))
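
The group_by() + mutate() step is what turns counts into shares. Here is a minimal sketch on made-up data; the toy rows, the align values "Good" and "Bad", and the name rates are assumptions for illustration.

```r
library(dplyr)

# Hypothetical toy data (NOT the comics dataset)
toy <- tibble(
  period = c(1985, 1985, 1985, 1990, 1990),
  align  = c("Good", "Bad", "Bad", "Good", "Bad")
)

rates <- count(toy, period, align) %>%
  group_by(period) %>%           # sum(n) below is computed per period
  mutate(rate = n / sum(n)) %>%  # each count divided by its period total
  ungroup()

print(rates)
```

Because sum(n) is evaluated within each period, the rates of every period add up to 1.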

Does the trend vary across publishers?

count(D2, period, publisher, align) %>%  # counts by the group variables
  group_by(period, publisher) %>%        # then use group mutate to 
  mutate(rate = n/sum(n)) %>%            # convert counts into ratios
  ggplot(aes(period, rate)) + 
  facet_grid(~publisher) +               # produce 1 panel per publisher
  geom_line(aes(col=align), size=1) +    # make the line thicker and
  geom_point()                           # add points to make it look better
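
Which variables go into group_by() decides what the ratios are relative to. The sketch below contrasts two groupings on made-up data; the toy rows and the names counts, within_pub and within_period are assumptions for illustration.

```r
library(dplyr)

# Hypothetical toy data for a single period (NOT the comics dataset)
toy <- tibble(
  period    = 1985,
  publisher = c("DC", "DC", "Marvel", "Marvel", "Marvel", "Marvel"),
  align     = c("Good", "Bad", "Good", "Good", "Good", "Bad")
)
counts <- count(toy, period, publisher, align)

# Share of each align value within a publisher (as in the pipeline above)
within_pub <- counts %>%
  group_by(period, publisher) %>%
  mutate(rate = n / sum(n)) %>%
  ungroup()

# Grouping by period only: shares are relative to both publishers combined
within_period <- counts %>%
  group_by(period) %>%
  mutate(rate = n / sum(n)) %>%
  ungroup()

print(within_pub)
print(within_period)
```

With the first grouping, the rates in each panel add up to 1, so the panels can be compared even when one publisher has many more roles than the other. With the second, the larger publisher would dominate the picture.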

🚴 EXERCISE :
Let’s try to compare the trends of align by publisher and sex. As you can see in the chart below, there’d be 4 patterns in 4 separate panels. Can you make the following chart by modifying the code chunk?

# count(D2, period, publisher, align) %>%  
#   group_by(period, publisher) %>%        
#   mutate(rate = n/sum(n)) %>%            
#   ggplot(aes(period, rate)) + 
#   facet_grid(sex~publisher) +            # 1 panel per publisher per sex
#   geom_line(aes(col=align), size=1) +    
#   geom_point()

Tony Chuo, NSYSU

2022-10-12 11:09:04
