I am practicing with R and have hit a speed bump while trying to create a graph of airline passengers per month.
I want to show a separate monthly line graph for each year from 1949 to 1960 for which data has been recorded. To do this, I used ggplot to create a line graph of the values per month. This works fine; however, when I try to split it by year using facet_wrap() and formatting the existing month field, facet_wrap(format(air$month[seq(1, length(air$month), 12)], "%Y")), it returns this:
Graph returned
I have also tried to define the facets with my own sequence of years, rep(c(1949:1960), each = 12). This returns a different result, which is better but still wrong:
Second graph
Here is my code:
library(ggplot2)
library(scales)  # for date_format()

air = data.frame(
month = seq(as.Date("1949-01-01"), as.Date("1960-12-01"), by="months"),
air = as.vector(AirPassengers)
)
ggplot(air, aes(x = month, y = air)) +
geom_point() +
labs(x = "Month", y = "Passengers (in thousands)", title = "Total passengers per month, 1949 - 1960") +
geom_smooth(method = lm, se = F) +
geom_line() +
scale_x_date(labels = date_format("%b"), breaks = "12 month") +
facet_wrap(format(air$month[seq(1, length(air$month), 12)], "%Y"))
# OR, in place of the facet_wrap() call above:
# facet_wrap(rep(c(1949:1960), each = 12))
So how do I make an individual graph per year?
Thanks!
In the second try you were really close. The main problem is that the x-axis values differ between the panels you want: the dates include the year, so each year occupies a different stretch of the x axis. An easy fix is to transform the dates to a common x-axis scale (same month and day, one dummy year) and then facet on a year variable. Here is code that should output the desired plot:
library(tidyverse)
library(lubridate)
air %>%
# Get the year value to use for the faceted plot
mutate(year = year(month),
# Rebuild each date with a dummy year (2021 in this case) and the original month and day
# so that all dates share a common x-axis scale
month_day = as_date(paste(2021,month(month),day(month), sep = "-"))) %>%
# Do the same plot, just change the x variable to month_day
ggplot(aes(x = month_day,
y = air)) +
geom_point() +
labs(x = "Month",
y = "Passengers (in thousands)",
title = "Total passengers per month, 1949 - 1960") +
geom_smooth(method = lm,
se = FALSE) +
geom_line() +
# Set the breaks to 1 month
scale_x_date(labels = scales::date_format("%b"),
breaks = "1 month") +
# Use the year variable to build the faceted plot
facet_wrap(~year) +
# Optionally, rotate the x-axis labels by 90° for a cleaner plot
theme(axis.text.x = element_text(angle = 90,
vjust = 0.5,
hjust = 1))
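If you'd rather not depend on lubridate, the dummy-year column can also be built in base R. This is a minimal sketch of only the data-preparation step, assuming the `air` data frame from the question; the resulting `year` and `month_day` columns would then be used exactly as in the ggplot2 code above.

```r
# Rebuild the question's data frame (base R only, using the built-in AirPassengers series)
air <- data.frame(
  month = seq(as.Date("1949-01-01"), as.Date("1960-12-01"), by = "months"),
  air = as.vector(AirPassengers)
)

# Year of each observation, used as the facet variable
air$year <- as.integer(format(air$month, "%Y"))

# Same month and day, but in the dummy year 2021, so every facet shares one x scale
air$month_day <- as.Date(format(air$month, "2021-%m-%d"))

head(air)
```

The literal "2021-" in the format string replaces the year while `%m-%d` keeps the month and day, so every year lands on the same Jan-Dec range.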
Related
I have a dataframe with a variable counting the weeks since the start of 2017, so that column counts up from 1 to 313. I added another variable with mutate() to indicate the year. In my scatterplot, each week is a point, but the x-axis is horrid, counting up from 1 to 313. Is there a way to change the axis at the bottom to display the year instead, possibly with vertical lines showing where the year changes?
Currently, I have this:
ggplot(HS, aes(as.integer(Obs), Total)) +
geom_point(aes(color=YEAR)) +
geom_smooth() +
labs(title="Weekly Sales since 2017", x="Week", y="Written Sales") +
theme(axis.line = element_line(colour = "orange", size = 1, linetype = "solid"))
You can convert the number of weeks to a number of days with 7 * Obs and add this to the start date (as.Date('2017-01-01')). This gives you a date-based x axis that you can format as you please. (Strictly, week 1 then maps to 8 January 2017; use 7 * (Obs - 1) if week 1 should start on 1 January.)
Here, we set the breaks at the start of each year, so the vertical grid lines mark where the year changes:
ggplot(HS, aes(as.Date('2017-01-01') + 7 * Obs, Total)) +
geom_point(aes(color = YEAR)) +
geom_smooth() +
labs(title = "Weekly Sales since 2017", x = "Week", y = "Written Sales") +
theme(axis.line = element_line(colour = "orange", size = 1)) +
scale_x_date('Year', date_breaks = 'year', date_labels = '%Y')
Data used
We don't have your data, so I created a reproducible data set with the same names and similar values to yours for the example above:
set.seed(1)
HS <- data.frame(Obs = 1:312,
Total = rnorm(312, seq(1200, 1500, length = 312), 200)^2,
YEAR = rep(2017:2022, each = 52))
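The question also asks for vertical lines where the year changes. The year-start grid lines above already do this; for explicit lines, one option is to compute the first day of each year covered by the data and pass those dates to geom_vline(). A base-R sketch of the date arithmetic, using the simulated HS data above and assuming week 1 starts on 2017-01-01 (hence 7 * (Obs - 1)):

```r
# Simulated data with the question's column names (same as the example above)
set.seed(1)
HS <- data.frame(Obs = 1:312,
                 Total = rnorm(312, seq(1200, 1500, length = 312), 200)^2,
                 YEAR = rep(2017:2022, each = 52))

# Week number -> calendar date; assumes week 1 starts on 2017-01-01
HS$Date <- as.Date("2017-01-01") + 7 * (HS$Obs - 1)

# First day of every year covered by the data: positions for year separators
first_year <- as.integer(format(min(HS$Date), "%Y"))
last_year  <- as.integer(format(max(HS$Date), "%Y"))
year_starts <- as.Date(paste0(first_year:last_year, "-01-01"))

# With a date x axis, these could go into geom_vline(xintercept = year_starts);
# older ggplot2 versions may need as.numeric(year_starts) instead
year_starts
```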
MWE data:
qdat = data.frame(id = rep(c(rep("a",5), rep("b",5)),2),
month = as.factor(c(rep(1,10), rep(2,10))),
time = c(rep(1:5,4)),
score = c(12,23,34,45,56,
4,5,6,8,7,
11,22,33,55,44,
7,8,9,10,12))
I want to create a facet_wrap plot for each id, with one facet per month, and label the maximum score for each month within each id.
I can create the plots for each id as follows. However, this only labels the maximum score per id, not the maximum for each month within each id:
for (ID in unique(qdat$id)) {
p = subset(qdat, id ==ID)
ind = which(p$score == max(p$score))
plot = ggplot(p, aes(x= time, y = score, colour = month))+
geom_point()+
labs(title = ID)+
geom_text(data = p[unique(c(ind)),], aes(label = score))+
facet_wrap(~month)
print(plot)
}
The plot for id == a should label 55 in month 2, and the plot for id == b should label 8 in month 1, in addition to the labels already present.
PS:
I use ind because I'd also like to be able to label the points n rows above and below the maximum value, e.g. ind + 1. I appreciate this is an additional question, but I would settle for an answer to the main one.
When working with facets, I find it easier to extract the label data beforehand and add it as a separate layer:
library(dplyr)
library(ggplot2)

# Maximum score for every id/month combination (group by both, not just month)
maxData <- qdat %>%
  group_by(id, month) %>%
  slice_max(score) %>%
  ungroup()

for (ID in unique(qdat$id)) {
  p = subset(qdat, id == ID)
  plot = ggplot(p, aes(x = time, y = score, colour = month)) +
    geom_point() +
    labs(title = ID) +
    # Only add the maxima that belong to the current id
    geom_text(data = subset(maxData, id == ID), aes(label = score)) +
    facet_wrap(~month) +
    theme_minimal()
  print(plot)
}
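For the PS (labelling the points n rows above and below the maximum), the same idea extends to building the label data per id and month first. A base-R sketch, where `n` and `labelData` are hypothetical names; which.max() keeps only the first maximum if there are ties:

```r
qdat <- data.frame(id = rep(c(rep("a", 5), rep("b", 5)), 2),
                   month = as.factor(c(rep(1, 10), rep(2, 10))),
                   time = c(rep(1:5, 4)),
                   score = c(12, 23, 34, 45, 56,
                             4, 5, 6, 8, 7,
                             11, 22, 33, 55, 44,
                             7, 8, 9, 10, 12))

n <- 1  # hypothetical: how many rows on each side of the maximum to keep

# Split by id and month, then keep the max row plus up to n rows on each side
groups <- split(qdat, list(qdat$id, qdat$month), drop = TRUE)
labelData <- do.call(rbind, lapply(groups, function(g) {
  g <- g[order(g$time), ]                       # neighbours in time order
  i <- which.max(g$score)                       # first maximum in the group
  keep <- max(1, i - n):min(nrow(g), i + n)     # clip at the group edges
  g[keep, ]
}))
rownames(labelData) <- NULL
labelData
```

Each plot in the loop would then use geom_text(data = subset(labelData, id == ID), aes(label = score)). With n <- 0, only the maxima are kept.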
I am trying to plot average daily trip counts by month. However, I can't work out how to plot only the mean number of trips per day for each month instead of the total monthly trips.
The days of the week and months have already been converted from numeric type to abbreviations and have also been ordered (type: ).
Here's what I've tried for the plot.
by_day <- df_temp %>%
group_by(Start.Day)
ggplot(by_day, aes(x=Start.Month,
fill=Start.Month)) +
geom_bar() +
scale_fill_brewer(palette = "Paired") +
labs(title="Number of Daily Trips by Month",
x=" ",
y="Number of Daily Trips")
Here's the plot I am trying to replicate:
You are almost there. Since you did not share a reproducible example, I simulated your data. You may need to adapt the variable names and/or correct my assumptions.
{lubridate} is a powerful package for working with dates and times. It comes in handy when binning dates for summaries and similar tasks.
library(dplyr)    # for %>%, mutate(), group_by(), summarise()
library(ggplot2)

# simulating your data
## a series of dates from June through October
days <- seq(from = lubridate::ymd("2020-06-01")
,to = lubridate::ymd("2020-10-30")
,by = "1 day")
## random trips on each day
set.seed(666)
trips <- sample(2000:5000, length(days), replace = TRUE)
# putting things together in a data frame
df_temp <- data.frame(date = days, counts = trips) %>%
# I assume the variable Start.Month is the monthly bin
# let's use lubridate to "bin" the month from the date
mutate(Start.Month = lubridate::floor_date(date, unit = "month"))
# aggregate trips for each month, calculate average daily trips
by_month <- df_temp %>%
group_by(Start.Month) %>% # group by the binning variable
summarise(Avg.Trips = mean(counts)) # calculate the mean for each group
ggplot( data = by_month
, aes(x = Start.Month, y = Avg.Trips
, fill=as.factor(Start.Month)) # to work with a discrete palette, factorise
) +
# ------------ bar layer -----------------------------------------
## instead of geom_bar(... stat = "identity"), you can use geom_col()
## and define the fill colour
geom_col() +
scale_fill_brewer(palette = "Paired") +
# ------------ if you like provide context with annotation -------
geom_text(aes(label = Avg.Trips %>% round(2)), vjust = 1) +
# ------------ finalise plot with labels, theme, etc.
labs(title="Number of Daily Trips by Month",
x=NULL, # setting an unused lab to NULL is better than printing empty " "!
y="Number of Daily Trips"
) +
theme_minimal() +
theme(legend.position = "none") # to suppress colour legend
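If your data has one row per trip (which geom_bar() counting rows suggests), the average daily count per month takes two steps: count the trips on each day, then average those daily counts within each month. A base-R sketch on hypothetical trip-level data with a `start_date` column (your column names will differ); note that days with no trips at all are absent from the table, so they are not counted as zeros in the average:

```r
# Hypothetical trip-level data: one row per trip, June 1 to October 31, 2020
set.seed(42)
trips <- data.frame(start_date = as.Date("2020-06-01") + sample(0:152, 5000, replace = TRUE))

# Step 1: number of trips on each calendar day
daily <- as.data.frame(table(start_date = trips$start_date), responseName = "n_trips")
daily$start_date <- as.Date(as.character(daily$start_date))

# Step 2: average those daily counts within each month
daily$month <- format(daily$start_date, "%Y-%m")
monthly <- aggregate(n_trips ~ month, data = daily, FUN = mean)
monthly
```

The `monthly` data frame then plays the role of by_month above, with geom_col() drawing one bar per month.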
I have a time series of monthly data for 10 years:
myts <- ts(rnorm(12*10), frequency = 12, start = 2001)
Now, I'd like to plot the data with the x-axis restricted to a generic year, with ticks from Jan to Dec. The whole time series should thus be broken into ten lines, each starting in Jan and ending in Dec, overplotted on each other so that I can visually compare the years. Is there a straightforward command to do that in R?
So far I have come up with the following solution using matplot, which might not be the most sophisticated one:
mydf <- as.data.frame(matrix(myts, 12))
matplot(mydf,type="l")
Even better would be a way to calculate the average value and the corresponding CI/standard deviation for each month, then plot the Jan - Dec averages as a line with the CI/standard deviation as a band around it.
Consider using ggplot2.
library(ggplot2)
library(ggfortify)
d <- fortify(myts)
d$year <- format(d$Index, "%Y")
d$month <- format(d$Index, "%m")
It's useful to start by reshaping the ts object into a long dataframe. Given the dataframe, it's straightforward to create the plots you have in mind:
ggplot(d, aes(x = month, y = Data, group = year, colour = year)) +
geom_line()
ggplot(d, aes(x = month, y = Data, group = month)) +
stat_summary(fun.data = mean_se, fun.args = list(mult = 1.96))
Result:
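If ggfortify isn't available, a long data frame with the same Data, year and month columns can be built in base R from the ts object itself. A sketch, assuming a monthly series (frequency 12); month is zero-padded so it matches the "%m" strings used above:

```r
set.seed(1)
myts <- ts(rnorm(12 * 10), frequency = 12, start = 2001)

# Position within the year: 1 = Jan, ..., 12 = Dec
m <- as.integer(cycle(myts))

# Calendar year of each observation, computed with integer arithmetic
# (avoids floating-point surprises from floor(time(myts)))
s <- start(myts)
yr <- s[1] + (seq_along(myts) + s[2] - 2) %/% frequency(myts)

d <- data.frame(
  Data  = as.numeric(myts),
  year  = as.character(yr),
  month = sprintf("%02d", m)
)
head(d)
```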
You can also summarise the data yourself, then plot it:
d_sum <- do.call(rbind, lapply(split(d$Data, d$month), mean_se, mult = 1.96))
d_sum$month <- rownames(d_sum)
ggplot(d_sum, aes(x = month, y = y, ymin = ymin, ymax = ymax)) +
geom_errorbar() +
geom_point() +
geom_line(aes(x = as.numeric(month)))
Result:
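For the "even better" variant in the question, the per-month mean and band can also be computed and drawn with base graphics. A sketch using a normal-approximation 95% interval (mean ± 1.96 × standard error), the same quantity mean_se(mult = 1.96) gives above; cycle() supplies the month of each observation:

```r
set.seed(1)
myts <- ts(rnorm(12 * 10), frequency = 12, start = 2001)

x <- as.numeric(myts)
m <- as.integer(cycle(myts))   # 1 = Jan, ..., 12 = Dec

# Per-month mean and standard error, then the band limits
mu <- tapply(x, m, mean)
se <- tapply(x, m, function(v) sd(v) / sqrt(length(v)))
lower <- mu - 1.96 * se
upper <- mu + 1.96 * se

# Shaded band for the interval, line for the monthly mean
plot(1:12, mu, type = "n", ylim = range(lower, upper), xaxt = "n",
     xlab = "Month", ylab = "Mean")
axis(1, at = 1:12, labels = month.abb)
polygon(c(1:12, 12:1), c(lower, rev(upper)), col = "grey85", border = NA)
lines(1:12, mu)
```

Swapping `1.96 * se` for `sd(v)` inside the helper would give a standard-deviation band instead.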
I am trying to develop a weather plot like those that typically accompany weather data.
I want to plot the daily values (the average value could appear as a circle). I am using ggplot2 because the plot needs facets (for each month and year).
st <- as.Date ("2009-1-1")
en <- as.Date ("2011-12-28")
date1 <- seq(st, en, "1 day")
year <- format(date1, "%Y")
month <- format (date1, "%b")
day <- as.numeric (format(date1, "%d"))
avgtm <- round (rnorm (length(date1), 50,5), 1)
maxtm <- avgtm + abs(rnorm (length (avgtm), 0, 5))
mintm <- avgtm - abs(rnorm (length (avgtm), 0, 5))
myd <- data.frame ( year, month, day, avgtm, maxtm, mintm)
require(ggplot2)
qplot(day, avgtm, data = myd, geom = "line", col = "red") +
facet_grid(year ~ month) + theme_bw()
There is one major problem here: the line connects across months.
Each month panel is stretched to the maximum number of days, so a month that ends on the 28th leaves a blank space at the end.
Is there a smart way to achieve this? I tried ggplot2, but there might be other good options.
Edit:
I am trying to add a vertical line at the first day of each month to demarcate the months. Here is how I tried to find the first day of each month:
td = as.Date (seq(as.Date("2009/1/1"), as.Date("2011/12/28"), "months"))
I then tried to use this to plot the lines:
qplot(date, avgtm, data = myd, geom = "line", col = "red") +
facet_wrap(~year, scales='free_x', ncol=1, nrow=3) +
geom_vline(xintercept=td, linetype="dotted") + theme_bw()
But it throws an error:
Error : Invalid intercept type: should be a numeric vector, a function, or a name of a function
How can I plot the vertical lines at these dates?
There is a solution with panel.xblocks from latticeExtra:
library(lattice)
library(latticeExtra)  # provides panel.xblocks()

st <- as.Date("2009-1-1")
en <- as.Date("2011-12-28")
date1 <- seq(st, en, "1 day")
avgtm <- round (rnorm (length(date1), 50,5), 1)
myd <- data.frame(date1, avgtm)
I define two functions to extract month and year values instead of including them in the data.frame. This approach is useful with panel.xblocks in the panel function of xyplot:
month <- function(x)format(x, '%m')
year <- function(x)format(x, '%Y')
I use year(date1) as the conditioning variable to produce three panels. Each panel displays the time series for that year (panel.xyplot) and a sequence of contiguous blocks with alternating colors to highlight the months (panel.xblocks). Note that the y argument of panel.xblocks is the month function defined above:
xyplot(avgtm ~ date1 | year(date1), data=myd,
type='l', layout=c(1, 3),
scales=list(x=list(relation='free')),
xlab='', ylab='',
panel=function(x, y, ...){
panel.xblocks(x, month,
col = c("lightgray", "white"),
border = "darkgray")
panel.xyplot(x, y, lwd = 1, col='black', ...)
})
How about making a date column and then faceting on year only?
myd$date <- as.Date(paste(myd$year, myd$month, myd$day), format='%Y %b %d')
qplot(date, avgtm, data = myd, geom = "line", col = "red") +
facet_wrap(~year, scales='free_x', ncol=1, nrow=3)
You could add scales='free_x' to your original plot as well, but you will probably find that it makes the plot harder to interpret.
By faceting on month and year, you are telling the viewer (and the plotting tool) that the plotted variable is not continuous across panels. As you've pointed out in your question, that is incorrect here, so don't facet by month. Instead, you can add tick marks for each month or each day if you want:
library(scales)
qplot(date, avgtm, data = myd, geom = "line", col = "red") +
facet_wrap(~year, scales='free_x', ncol=1, nrow=3) +
scale_x_date(breaks=date_breaks("month"), labels=date_format("%b"))
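For the vertical month lines in the question's Edit: the error most likely comes from an older ggplot2 version that did not accept Date values for xintercept, and the commonly used workaround there is geom_vline(xintercept = as.numeric(td)). The same month-start lines can also be drawn with base graphics. A sketch, assuming simulated data like the question's, with an added date column (the data frame layout here is illustrative):

```r
set.seed(1)
date1 <- seq(as.Date("2009-01-01"), as.Date("2011-12-28"), by = "1 day")
myd <- data.frame(date = date1,
                  year = format(date1, "%Y"),
                  avgtm = round(rnorm(length(date1), 50, 5), 1))

# First day of every month in the range, as in the question's Edit
td <- seq(as.Date("2009-01-01"), as.Date("2011-12-28"), by = "months")

# One panel per year; dotted lines mark the start of each month
op <- par(mfrow = c(3, 1), mar = c(2, 4, 1, 1))
for (yr in unique(myd$year)) {
  sub <- myd[myd$year == yr, ]
  plot(sub$date, sub$avgtm, type = "l", col = "red", xlab = "", ylab = yr)
  abline(v = as.numeric(td[format(td, "%Y") == yr]), lty = "dotted")
}
par(op)
```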
Alternatively, you could extract the day of year and plot all years on one panel, coloring by year:
myd$doy <- as.numeric(format(myd$date, '%j'))  # numeric, so the x axis is continuous
p <- ggplot(myd, aes(x=doy, y=avgtm, color=year, group=year))
p + geom_line()
or
p + geom_smooth()