I have a time series of monthly data for 10 years:
myts <- ts(rnorm(12*10), frequency = 12, start = 2001)
Now I'd like to plot the data with the x-axis restricted to a generic year, with ticks from Jan to Dec. The whole time series should be split into ten lines, each starting at Jan and ending at Dec, and the lines should be overplotted so that I can visually compare different years. Is there a straightforward command to do that in R?
So far I have come up with the following solution using matplot, which might not be the most sophisticated one:
mydf <- as.data.frame(matrix(myts, 12))
matplot(mydf,type="l")
Even better would be a way to calculate the average value and the corresponding CI/standard deviation for each month, and then plot the average from Jan to Dec as a line with the CI/standard deviation as a band around it.
Consider using ggplot2.
library(ggplot2)
library(ggfortify)
d <- fortify(myts)
d$year <- format(d$Index, "%Y")
d$month <- format(d$Index, "%m")
It's useful to start by reshaping the ts object into a long dataframe. Given the dataframe, it's straightforward to create the plots you have in mind:
ggplot(d, aes(x = month, y = Data, group = year, colour = year)) +
geom_line()
ggplot(d, aes(x = month, y = Data, group = month)) +
stat_summary(fun.data = mean_se, fun.args = list(mult = 1.96))
You can also summarise the data yourself, then plot it:
d_sum <- do.call(rbind, (lapply(split(d$Data, d$month), mean_se, mult = 1.96)))
d_sum$month <- rownames(d_sum)
ggplot(d_sum, aes(x = month, y = y, ymin = ymin, ymax = ymax)) +
geom_errorbar() +
geom_point() +
geom_line(aes(x = as.numeric(month)))
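If you would rather avoid extra packages, the same monthly mean with a band can be sketched in base R. This is only a rough sketch: it assumes a normal-approximation 95% band (mean ± 1.96 standard errors, the same multiplier as above), and the names month_idx, m, se, lower and upper are made up for illustration.

```r
# Monthly mean with an approximate 95% band, base R only.
set.seed(1)
myts <- ts(rnorm(12 * 10), frequency = 12, start = 2001)

month_idx <- as.integer(cycle(myts))   # calendar month (1..12) of each observation
vals      <- as.numeric(myts)

m  <- tapply(vals, month_idx, mean)                                  # mean per month
se <- tapply(vals, month_idx, function(v) sd(v) / sqrt(length(v)))   # standard error per month
lower <- m - 1.96 * se
upper <- m + 1.96 * se

# Empty plot first, then the band as a polygon, then the mean line on top.
plot(1:12, m, type = "n", ylim = range(lower, upper),
     xaxt = "n", xlab = "Month", ylab = "Mean")
axis(1, at = 1:12, labels = month.abb)
polygon(c(1:12, 12:1), c(lower, rev(upper)), col = "grey85", border = NA)
lines(1:12, m, lwd = 2)
```

To show a standard deviation band instead, replace 1.96 * se with the per-month sd.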
The data I'm trying to visualize has two assessments performed on the same time scale, but at different intervals (i.e. Temperature taken 4 times over 12 hours, pain assessed every hour), as an example:
df <- data.frame(
Hour = 0:12,
Pain = sample(7:10, 13, TRUE),
Temp = c(36.8,rep(NA,3),37.2,rep(NA,3),37.4,rep(NA,3),37.0)
)
In ggplot, I'd visualize it like this:
library(ggplot2)
ggplot(df) +
geom_col(aes(x = Hour, y = Pain)) +
geom_point(aes(x = Hour, y = Temp/3)) +
geom_line(data = df[!is.na(df$Temp), ], aes(x = Hour, y = Temp/3)) +
scale_y_continuous(sec.axis = sec_axis(~.*3,name = "Temp"))
In echarts4r, however, I cannot get my line to be continuous (I believe because of the NA values):
library(echarts4r)
e_chart(df, Hour) |> e_bar(Pain) |> e_line(Temp)
Is there a way to subset the dataset before e_line to remove the missing values? I've searched online and can't find anything. Or should I be structuring my data differently?
I am practicing with R and have hit a speedbump while trying to create a graph of airline passengers per month.
I want to show a separate monthly line graph for each year from 1949 to 1960, the period for which data has been recorded. To do this I have used ggplot to create a line graph with the values per month. This works fine; however, when I try to separate this by year using facet_wrap() and formatting the current month field, facet_wrap(format(air$month[seq(1, length(air$month), 12)], "%Y")), it returns this:
[Image: graph returned]
I have also tried to format the facet by inputting my own sequence for the years: rep(c(1949:1960), each = 12). This returns a different result which is better but still wrong:
[Image: second graph]
Here is my code:
air = data.frame(
month = seq(as.Date("1949-01-01"), as.Date("1960-12-01"), by="months"),
air = as.vector(AirPassengers)
)
ggplot(air, aes(x = month, y = air)) +
geom_point() +
labs(x = "Month", y = "Passengers (in thousands)", title = "Total passengers per month, 1949 - 1960") +
geom_smooth(method = lm, se = F) +
geom_line() +
scale_x_date(labels = date_format("%b"), breaks = "12 month") +
facet_wrap(format(air$month[seq(1, length(air$month), 12)], "%Y"))
#OR
facet_wrap(rep(c(1949:1960), each = 12))
So how do I make an individual graph per year?
Thanks!
In the second try you were really close. The main problem is that you are trying to make a faceted plot in which each panel has different x-axis values (dates that include the year). An easy fix is to transform the dates to a "common" x-axis scale and then facet by year. Here is code that should output the desired plot.
library(tidyverse)
library(lubridate)
air %>%
# Get the year value to use it for the facetted plot
mutate(year = year(month),
# Get the month-day dates and set all dates with a dummy year (2021 in this case)
# This will get all your dates in a common x axis scale
month_day = as_date(paste(2021,month(month),day(month), sep = "-"))) %>%
# Do the same plot, just change the x variable to month_day
ggplot(aes(x = month_day,
y = air)) +
geom_point() +
labs(x = "Month",
y = "Passengers (in thousands)",
title = "Total passengers per month, 1949 - 1960") +
geom_smooth(method = lm,
se = FALSE) +
geom_line() +
# Set the breaks to 1 month
scale_x_date(labels = scales::date_format("%b"),
breaks = "1 month") +
# Use the year variable to do the facetted plot
facet_wrap(~year) +
# You could set the x axis in an 90° angle to get a cleaner plot
theme(axis.text.x = element_text(angle = 90,
vjust = 0.5,
hjust = 1))
I have a table with many variables. One of the variables contains year information: from 1999 till 2010.
I need to do for each year the same analysis, for instance, to plot a graph, a histogram, etc.
Currently, I subset the data so that each year goes into a data frame(table) and I do the analysis in turn for each year. This is very inefficient:
library(data.table)
library(ggplot2)
dates <- sample(seq(as.Date('1999/01/01'), as.Date('2010/01/01'), by="day"), 50, replace = TRUE)
dt<-data.table( YEAR = format.Date(dates,"%Y"),
Var1=sample(0:100, 50, rep=TRUE),
Var2 =sample(0:500, 50, rep=TRUE)
)
year_1999<-dt[YEAR=="1999"]
plot_1999<- ggplot(year_1999, aes (x=Var1))+
geom_line(aes(y=Var2), size=1, color="blue") +
labs(y="V2", x="V1", title="Year 1999")
plot_1999
How can I better write this in a compact way? I suppose I need a function but I have no idea how to.
Instead of repeating the code several times, we can specify 'YEAR' in facet_wrap:
library(ggplot2)
ggplot(dt, aes(x = Var1, y = Var2)) +
# constants go outside aes(); inside aes() they are treated as data and produce a legend
geom_line(size = 1, color = "blue") +
labs(y = "V2", x = "V1") +
facet_wrap(~ YEAR)
Try this if you want to create a separate plot object for each unique year in dt$YEAR:
for (i in unique(dt$YEAR)) {
year <- dt[YEAR==i]
plot <- ggplot(year, aes (x=Var1))+
geom_line(aes(y=Var2), size=1, color="blue") +
labs(y="V2", x="V1", title=paste("Year", i))
assign(paste("plot", i, sep=""), plot)
}
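As an alternative to assign(), the plots can be kept in a named list, which is usually easier to loop over or save later. The following is a minimal sketch, not the only way to do it: it uses a plain data.frame in place of the data.table from the question, and make_year_plot is a hypothetical helper name.

```r
library(ggplot2)

set.seed(1)
dates <- sample(seq(as.Date("1999-01-01"), as.Date("2010-01-01"), by = "day"),
                50, replace = TRUE)
dt <- data.frame(
  YEAR = format(dates, "%Y"),
  Var1 = sample(0:100, 50, replace = TRUE),
  Var2 = sample(0:500, 50, replace = TRUE)
)

# Hypothetical helper: builds the plot for the rows of a single year.
make_year_plot <- function(d) {
  ggplot(d, aes(x = Var1, y = Var2)) +
    geom_line(color = "blue") +
    labs(x = "V1", y = "V2", title = paste("Year", d$YEAR[1]))
}

# One plot per year, stored in a list named by year.
plots <- lapply(split(dt, dt$YEAR), make_year_plot)
```

A single plot can then be printed with, for example, plots[["1999"]], provided that year occurs in the sampled data.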
I'm trying to create a grouped bar chart of monthly data, aggregated from daily data, over multiple years. I have accomplished what I wanted from my x-axis from faceting, using faceting as a way to apply a secondary sort (on year and month). Now that I've faceted by year, ggplot is showing all months - even when there's no data. This is wasting space and my actual data set has years of data and I want to add labels, so space is an issue.
How can I accomplish this without the wasted space? Is there a way to add the secondary sort (year,month) on the x-axis without faceting?
# create data set
date = seq(as.Date("2014-05-01"),as.Date("2015-05-10"), "day")
revenue = runif(375, min = 0, max = 200)
cost = runif(375, min = 0, max = 100)
df = data.frame(date,revenue,cost)
head(df)
# adding month and year column, then aggregating to monthly revenue and cost
library(plyr)
library(lubridate) # provides month() and year()
df$month <- month(df$date, label=TRUE)
df$year <- year(df$date)
df <- as.data.frame(ddply(df, .(month,year), numcolwise(sum)))
# melting the data for a 'grouped chart' in ggplot
library(reshape)
df <-melt(df, id = c("month","year"))
#create chart
library(ggplot2)
g <-ggplot(df, aes(x=month, y=value, fill=variable))
g + geom_bar(stat="identity", position="dodge") + facet_wrap(~ year)
I feel certain that there's a more elegant way to do this from within ggplot. Am I right?
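The key is to use scales = "free" in facet_wrap(), so that each year's panel shows only the months that have data. Here is your code with that revision:
library(lubridate) # month(), year()
library(plyr)      # ddply(), numcolwise()
library(reshape)   # melt()
library(ggplot2)
set.seed(222)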
date = seq(as.Date("2014-05-01"),as.Date("2015-05-10"), "day")
revenue = runif(375, min = 0, max = 200)
cost = runif(375, min = 0, max = 100)
mydf = data.frame(date,revenue,cost)
mydf$month <- month(mydf$date, label=TRUE)
mydf$year <- year(mydf$date)
mydf2 <- as.data.frame(ddply(mydf, .(month,year), numcolwise(sum)))
mydf3 <- melt(mydf2, id = c("month","year"))
ggplot(mydf3, aes(x=month, y=value, fill=variable)) +
geom_bar(stat = "identity", position = "dodge") +
facet_wrap(~ year, scales = "free")
I am trying to develop a weather plot like the ones that appear in weather reports.
I want to plot daily values (although the average value could appear as a circle). I am using ggplot2 because the plot needs to be faceted (by month and year).
st <- as.Date ("2009-1-1")
en <- as.Date ("2011-12-28")
date1 <- seq(st, en, "1 day")
year <- format(date1, "%Y")
month <- format (date1, "%b")
day <- as.numeric (format(date1, "%d"))
avgtm <- round (rnorm (length(date1), 50,5), 1)
maxtm <- avgtm + abs(rnorm (length (avgtm), 0, 5))
mintm <- avgtm - abs(rnorm (length (avgtm), 0, 5))
myd <- data.frame ( year, month, day, avgtm, maxtm, mintm)
require(ggplot2)
qplot(day, avgtm, data = myd, geom = "line", col = "red") +
facet_grid(year ~ month) + theme_bw()
There is one major problem here: the line will connect between months.
Each month is plotted to the maximum number of days (although a month can end on the 28th, leaving a blank at the end of that month's panel).
Is there a smart way to achieve what I want? I tried ggplot2, but there might be other nice options.
Edit:
I am trying to add a vertical line at the first day of each month to demarcate the months. Here is how I tried to find the first day of each month:
td = as.Date (seq(as.Date("2009/1/1"), as.Date("2011/12/28"), "months"))
I tried to use this to plot the lines:
qplot(date, avgtm, data = myd, geom = "line", col = "red") +
facet_wrap(~year, scales='free_x', ncol=1, nrow=3) +
geom_vline(xintercept=td, linetype="dotted") + theme_bw()
But this raises an error:
Error : Invalid intercept type: should be a numeric vector, a function, or a name of a function
How can I plot the vertical lines with the dates?
There is a solution with panel.xblocks from latticeExtra:
library(lattice)
library(latticeExtra)
st <- as.Date("2009-1-1")
en <- as.Date("2011-12-28")
date1 <- seq(st, en, "1 day")
avgtm <- round (rnorm (length(date1), 50,5), 1)
myd <- data.frame(date1, avgtm)
I define two functions to extract month and year values instead of
including them in the data.frame. This approach is useful with
panel.xblocks in the panel function of xyplot:
month <- function(x)format(x, '%m')
year <- function(x)format(x, '%Y')
I use year(date1) as conditioning variable to produce three
panels. Each of these panels will display the time series for that
year (panel.xyplot) and a sequence of contiguous blocks with
alternating colors to highlight months (panel.xblocks). You
should note that the y argument in panel.xblocks is the
function month previously defined:
xyplot(avgtm ~ date1 | year(date1), data=myd,
type='l', layout=c(1, 3),
scales=list(x=list(relation='free')),
xlab='', ylab='',
panel=function(x, y, ...){
panel.xblocks(x, month,
col = c("lightgray", "white"),
border = "darkgray")
panel.xyplot(x, y, lwd = 1, col='black', ...)
})
How about making a date column, then faceting on year only?
myd$date <- as.Date(paste(myd$year, myd$month, myd$day), format='%Y %b %d')
qplot(date, avgtm, data = myd, geom = "line", col = "red") +
facet_wrap(~year, scales='free_x', ncol=1, nrow=3)
You could add scales='free_x' to your original plot as well, but you will find it makes interpretation difficult.
By faceting on month and year you are telling the viewer and the plotting tool that the variables plotted are not continuous. This is incorrect as you've pointed out in your question. Thus, no faceting... You can add tick marks for each month or each day if you want.
library(scales)
qplot(date, avgtm, data = myd, geom = "line", col = "red") +
facet_wrap(~year, scales='free_x', ncol=1, nrow=3) +
scale_x_date(breaks=date_breaks("month"), labels=date_format("%b"))
Alternatively you could extract day of year and plot everything on one plot, coloring by year:
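# convert to numeric so day of year is a continuous x-axis rather than 366 discrete labels
myd$doy <- as.numeric(format(myd$date, '%j'))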
p <- ggplot(myd, aes(x=doy, y=avgtm, color=year, group=year))
p + geom_line()
or
p + geom_smooth()
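For the vertical lines asked about in the question's edit, one thing that may get past the "Invalid intercept type" error is to pass the dates to geom_vline as plain numbers. This is only a sketch under that assumption: it rebuilds a small version of the data, uses ggplot() instead of qplot(), and recent ggplot2 versions may also accept the Date vector directly.

```r
library(ggplot2)

set.seed(1)
date1 <- seq(as.Date("2009-01-01"), as.Date("2011-12-28"), by = "1 day")
myd <- data.frame(
  date  = date1,
  year  = format(date1, "%Y"),
  avgtm = round(rnorm(length(date1), 50, 5), 1)
)

# First day of every month in the plotted range.
td <- seq(as.Date("2009-01-01"), as.Date("2011-12-28"), by = "month")

p <- ggplot(myd, aes(x = date, y = avgtm)) +
  geom_line(colour = "red") +
  # Dates are stored as days since 1970-01-01, so as.numeric() gives valid x positions.
  geom_vline(xintercept = as.numeric(td), linetype = "dotted") +
  facet_wrap(~year, scales = "free_x", ncol = 1) +
  theme_bw()
```

Printing p draws the three yearly panels. The vertical lines do not extend the x range of a panel, so each panel should only show the lines that fall inside its own year.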