plot data grouped by year - r

I'm trying to create a plot of temperatures but I can't aggregate the data by year.
My data comes in this form:
02-2012 7.2
02-2013 7.2
02-2010 7.9
02-2011 9.8
03-2013 10.7
03-2010 13.1
03-2012 18.5
03-2011 13.7
02-2003 9.2
...
containing all the months between Jan 2000 and Dec 2013. I've loaded the data with zoo:
f <- function(x) as.yearmon(format(x), "%m-%Y")
temp <- read.zoo(file="results.tsv", FUN = f)
Plotting the temp var I obtain a plot with X axis going from Jan 2000 to Dec 2013, but what I'd like to have is a plot where the X axis goes from Jan to Dec and the temperatures of every year are plotted as a separate line. Any hint?
Thanks,
Andrea

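First you'll want to separate the date into its year and month components (this assumes temp is a plain data frame, e.g. read with read.table, rather than a zoo object):
names(temp) <- c("date","temperature")
tmpSplit <- strsplit(as.character(temp$date), "-")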
temp$month <- sapply(tmpSplit, "[", 1)
temp$year <- sapply(tmpSplit, "[", 2)
Then, my preference would be to use the ggplot2 package to plot your data:
library(ggplot2)
ggplot(temp, aes(x=month, y=temperature, group=year)) + geom_line()
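The same split-then-group idea can be sketched in base R without ggplot2. This is only an illustration: the sample rows are copied from the question, and the names raw, parts and wide are placeholders rather than anything from the original answer.

```r
# Sample rows from the question; in practice these would come from results.tsv.
raw <- data.frame(
  date = c("02-2012", "02-2013", "02-2010", "02-2011",
           "03-2013", "03-2010", "03-2012", "03-2011"),
  temperature = c(7.2, 7.2, 7.9, 9.8, 10.7, 13.1, 18.5, 13.7),
  stringsAsFactors = FALSE
)

# Split "MM-YYYY" into numeric month and year.
parts <- strsplit(raw$date, "-", fixed = TRUE)
raw$month <- as.integer(sapply(parts, `[`, 1))
raw$year  <- as.integer(sapply(parts, `[`, 2))

# One row per month, one column per year (NA where a month is missing).
wide <- tapply(raw$temperature, list(raw$month, raw$year), mean)

# Draw one line per year against the month number.
matplot(as.integer(rownames(wide)), wide, type = "b", pch = 1, lty = 1,
        xlab = "Month", ylab = "Temperature")
legend("topleft", legend = colnames(wide), col = seq_len(ncol(wide)), lty = 1)
```

Converting the month to a number keeps the x axis in calendar order, which a character month would not guarantee.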

Here are several approaches. First set up the data. Note the simplification in read.zoo.
library(zoo)
temp <- read.zoo("results.tsv", FUN = as.yearmon, format = "%m-%Y")
In addition to the plots below there is monthplot in the base of R (stats package). It does not appear to work with as.ts(temp) if temp is the subset of data provided in the question but if the actual data looks more like those in the examples of ?monthplot then it would work.
1) ggplot2
Create a data.frame DF with columns for the month, year and value. cycle gives the month numbers, and format with a format of %Y gives the years.
Note that we want the years to be discrete rather than numeric so that each year gets its own line and colour. format returns character, which data.frame converts to a factor in R versions before 4.0 and which ggplot2 treats as discrete in any case. Finally, create the plot using ggplot2 or lattice:
library(ggplot2)
DF <- data.frame(Month = coredata(cycle(temp)),
Year = format(index(temp), "%Y"),
Value = coredata(temp))
ggplot(DF, aes(Month, Value, col = Year)) +
geom_line() +
scale_x_continuous(breaks = 1:12)
2) lattice With DF from 1) this would work:
library(lattice)
xyplot(Value ~ Month, DF, groups = Year, type = "l", auto.key = TRUE)
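For the monthplot route mentioned earlier in this answer, a minimal sketch follows. It assumes a complete, regularly spaced monthly series from Jan 2000 to Dec 2013; the values are simulated and the names seasonal and x are illustrative only.

```r
set.seed(1)
# Simulated monthly temperatures, Jan 2000 to Dec 2013 (made-up values).
n_years <- 14
seasonal <- 10 - 8 * cos(2 * pi * (0:11) / 12)
x <- ts(rep(seasonal, n_years) + rnorm(12 * n_years),
        start = c(2000, 1), frequency = 12)

# One sub-series per calendar month, each showing its values across the years.
monthplot(x, ylab = "Temperature")
```

Unlike the ggplot2 and lattice plots, monthplot groups by month rather than by year, so it answers a slightly different question: how each month changes from year to year.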

Related

How can I plot a dataframe in R given in quarterly years?

I have a dataset given as:
Country Time Value
1 USA 1999-Q1 292929
2 USA 1999-Q2 392023
3 USA 1999-Q3 9392992
4 ...
and so on. Now I would like to plot this data frame with Time on the x-axis and Value on the y-axis. The problem is that I don't know how to plot Time, because it is not given as month/day/year. If it were, I would just use as.Date(format = "%m%d%y"). I am not allowed to change the quarterly labels, so they should appear in the plot as they are. How can I do this?
Thank you in advance!
Assuming DF shown in the Note at the end, convert the Time column to yearqtr class which directly represents year and quarter (as opposed to using Date class) and use scale_x_yearqtr. See ?scale_x_yearqtr for more information.
library(ggplot2)
library(zoo)
fmt <- "%Y-Q%q"
DF$Time <- as.yearqtr(DF$Time, format = fmt)
ggplot(DF, aes(Time, Value, col = Country)) +
geom_point() +
geom_line() +
scale_x_yearqtr(format = fmt)
It would also be possible to convert it to a wide form zoo object with one column per country and then use autoplot. Using DF from the Note below:
fmt <- "%Y-Q%q"
z <- read.zoo(DF, split = "Country", index = "Time",
FUN = as.yearqtr, format = fmt)
autoplot(z) + scale_x_yearqtr(format = fmt)
Note
Lines <- "
Country Time Value
1 USA 1999-Q1 292929
2 USA 1999-Q2 392023
3 USA 1999-Q3 9392992"
DF <- read.table(text = Lines)
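If zoo is not available, a rough base R sketch can keep the quarter labels exactly as they are by plotting against integer positions. It assumes there are no missing quarters (otherwise the even spacing is misleading); qtr_labels and pos are illustrative names.

```r
DF <- data.frame(
  Country = "USA",
  Time = c("1999-Q1", "1999-Q2", "1999-Q3"),
  Value = c(292929, 392023, 9392992),
  stringsAsFactors = FALSE
)

# "YYYY-Qq" labels sort chronologically as plain strings, so the sorted
# unique labels give the axis order.
qtr_labels <- sort(unique(DF$Time))
pos <- match(DF$Time, qtr_labels)

# Plot against integer positions and put the original labels on the axis.
plot(pos, DF$Value, type = "b", xaxt = "n", xlab = "Time", ylab = "Value")
axis(1, at = seq_along(qtr_labels), labels = qtr_labels)
```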
Using ggplot2:
library(ggplot2)
ggplot(df, aes(Time, Value, fill = Country)) + geom_col()
I know other people have already answered, but I think this more general answer should also be here.
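as.Date() can parse just the leading part of a string. I tried it on your data frame (I called it df), and it ran without error, although the month and day are filled in from the current date, so the quarter is lost: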
> as.Date(df$Time, format = "%Y")
[1] "1999-11-28" "1999-11-28" "1999-11-28"
I don't know whether you want to use plot() or ggplot() from the ggplot2 library, but it doesn't matter: however you specify the axis, you can convert the column this way.
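Because a "%Y" format discards the quarter, one possible refinement is to map each label to the first day of its quarter. The helper quarter_start below is hypothetical, not part of any package, and assumes labels of the exact form "YYYY-Qq".

```r
# Hypothetical helper: turn "YYYY-Qq" into the first day of that quarter.
quarter_start <- function(x) {
  year <- as.integer(substr(x, 1, 4))
  qtr  <- as.integer(substr(x, 7, 7))
  # Quarter 1 starts in month 1, quarter 2 in month 4, and so on.
  as.Date(sprintf("%04d-%02d-01", year, 3L * (qtr - 1L) + 1L))
}

time <- c("1999-Q1", "1999-Q2", "1999-Q3")
d <- quarter_start(time)
format(d)
```

The resulting Date vector keeps the quarters in order and correctly spaced, and the original labels can still be used as axis text.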

Time series from three years in one plot

I am struggling (due to lack of knowledge and experience) to create a plot in R with time series from three different years (2009, 2013 and 2017). Failing to solve this problem by searching online has led me here.
I wish to create a plot that shows change in nitrate concentrations over the course of May to October for all years, but keep failing since the x-axis is defined by one specific year. I also receive errors because the x-axis lengths differ (due to different number of samples). To solve this I have tried making separate columns for month and year, with no success.
Data example:
date NO3.mg.l year month
2009-04-22 1.057495 2009 4
2013-05-08 1.936000 2013 5
2017-05-02 2.608000 2017 5
Code:
ggplot(nitrat.all, aes(x = date, y = NO3.mg.l, colour = year)) + geom_line()
This code produces a plot where the lines are positioned next to one another, whilst I want a plot where they overlay one another. Any help will be much appreciated.
This may be helpful for plotting:
library("lubridate")
library("ggplot2")
# example data with a few points for each year
nitrat.all <- data.frame(
  date = c(ymd("2009-03-21"), ymd("2009-04-22"), ymd("2009-05-27"),
           ymd("2010-03-15"), ymd("2010-04-17"), ymd("2010-05-10")),
  NO3.mg.l = c(1.057495, 1.936000, 2.608000, 3.157495, 2.336000, 3.908000))
nitrat.all$year <- format(nitrat.all$date, format = "%Y")
ggplot(data = nitrat.all) +
geom_point(mapping = aes(x = format(date, format = "%m-%d"), y = NO3.mg.l, group = year, colour = year)) +
geom_line(mapping = aes(x = format(date, format = "%m-%d"), y = NO3.mg.l, group = year, colour = year))
To select the dates that fall within certain months, you can subset your data frame by a condition using base R functions:
n_month1 <- 3 # index of the first month of the period to select
n_month2 <- 4 # index of the last month of the period to select
test_for_month <- (as.numeric(format(nitrat.all$date, format = "%m")) >= n_month1) &
(as.numeric(format(nitrat.all$date, format = "%m")) <= n_month2)
nitrat_to_plot <- nitrat.all[test_for_month, ]
Another, rather elegant, approach is to use filter() from the dplyr package:
nitrat.all$month <- as.numeric(format(nitrat.all$date, format = "%m"))
library("dplyr")
nitrat_to_plot <- filter(nitrat.all, ((month >= n_month1) & (month <= n_month2)))
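An alternative to the "%m-%d" string axis is a numeric day-of-year axis, which keeps the spacing between sampling dates proportional. The sketch below uses base R graphics and made-up sample values; nitrat and doy are illustrative names.

```r
# Made-up sample: a few dates in two different years.
nitrat <- data.frame(
  date = as.Date(c("2009-04-22", "2009-06-10", "2013-05-08", "2013-07-01")),
  NO3.mg.l = c(1.06, 1.50, 1.94, 2.10)
)
nitrat$year <- format(nitrat$date, "%Y")
# Day of the year (1-366) puts every year on the same x scale.
nitrat$doy <- as.integer(format(nitrat$date, "%j"))

# Empty frame first, then one line per year.
plot(nitrat$doy, nitrat$NO3.mg.l, type = "n",
     xlab = "Day of year", ylab = "NO3 (mg/l)")
years <- unique(nitrat$year)
for (i in seq_along(years)) {
  sel <- nitrat$year == years[i]
  lines(nitrat$doy[sel], nitrat$NO3.mg.l[sel], type = "b", col = i)
}
legend("topleft", legend = years, col = seq_along(years), lty = 1)
```

Years with different numbers of samples are not a problem here, because each year is drawn from its own subset of rows.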

R - How to create a seasonal plot - Different lines for years

I already asked the same question yesterday but haven't received any suggestions so far, so I decided to delete the old one and ask again with additional information.
So here again:
I have a dataframe like this:
Link to the original dataframe: https://megastore.uni-augsburg.de/get/JVu_V51GvQ/
Date DENI011
1 1993-01-01 9.946
2 1993-01-02 13.663
3 1993-01-03 6.502
4 1993-01-04 6.031
5 1993-01-05 15.241
6 1993-01-06 6.561
....
....
6569 2010-12-26 44.113
6570 2010-12-27 34.764
6571 2010-12-28 51.659
6572 2010-12-29 28.259
6573 2010-12-30 19.512
6574 2010-12-31 30.231
I want to create a plot that enables me to compare the monthly values in the DENI011 over the years. So I want to have something like this:
http://r-statistics.co/Top50-Ggplot2-Visualizations-MasterList-R-Code.html#Seasonal%20Plot
Jan-Dec on the x-scale, values on the y-scale and the years displayed by different colored lines.
I found several similar questions here, but nothing works for me. I tried to follow the instructions on the website with the example, but the problem is that I can't create a ts object.
Then I tried it this way:
Ref_Data$MonthN <- as.numeric(format(as.Date(Ref_Data$Date),"%m")) # Month's number
Ref_Data$YearN <- as.numeric(format(as.Date(Ref_Data$Date),"%Y"))
Ref_Data$Month <- months(as.Date(Ref_Data$Date), abbreviate=TRUE) # Month's abbr.
g <- ggplot(data = Ref_Data, aes(x = MonthN, y = DENI011, group = YearN, colour=YearN)) +
geom_line() +
scale_x_discrete(breaks = Ref_Data$MonthN, labels = Ref_Data$Month)
That also didn't work; the plot looks horrible. I don't need to put all the years from 1993-2010 in one plot. Just a few years would be fine, for example 1998-2006.
Any suggestions on how to solve this?
As others have noted, in order to create a plot such as the one you used as an example, you'll have to aggregate your data first. However, it's also possible to retain daily data in a similar plot.
reprex::reprex_info()
#> Created by the reprex package v0.1.1.9000 on 2018-02-11
library(tidyverse)
library(lubridate)
# Import the data
url <- "https://megastore.uni-augsburg.de/get/JVu_V51GvQ/"
raw <- read.table(url, stringsAsFactors = FALSE)
# Parse the dates, and use lower case names
df <- as_tibble(raw) %>%
rename_all(tolower) %>%
mutate(date = ymd(date))
One trick to achieve this would be to set the year component in your date variable to a constant, effectively collapsing the dates to a single year, and then controlling the axis labelling so that you don't include the constant year in the plot.
# Define the plot
p <- df %>%
mutate(
year = factor(year(date)), # use year to define separate curves
date = update(date, year = 1) # use a constant year for the x-axis
) %>%
ggplot(aes(date, deni011, color = year)) +
scale_x_date(date_breaks = "1 month", date_labels = "%b")
# Raw daily data
p + geom_line()
In this case though, your daily data are quite variable, so this is a bit of a mess. You could hone in on a single year to see the daily variation a bit better.
# Hone in on a single year
p + geom_line(aes(group = year), color = "black", alpha = 0.1) +
geom_line(data = function(x) filter(x, year == 2010), size = 1)
But ultimately, if you want to look at several years at a time, it's probably a good idea to present smoothed lines rather than raw daily values. Or, indeed, some monthly aggregate.
# Smoothed version
p + geom_smooth(se = FALSE)
#> `geom_smooth()` using method = 'loess'
#> Warning: Removed 117 rows containing non-finite values (stat_smooth).
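The constant-year trick can also be sketched in base R without lubridate. This is an illustration only: the dates are made up, and the year 2000 is chosen as the constant because it is a leap year, so 29 February still maps to a valid date.

```r
dates <- as.Date(c("1996-02-29", "1998-07-15", "2010-12-31"))

# Keep the real year as the grouping variable.
year <- factor(format(dates, "%Y"))

# Keep month and day, but replace the year with 2000 (a leap year).
common <- as.Date(format(dates, "2000-%m-%d"))
format(common)
```

The common vector can then serve as the x variable, with axis labels formatted as "%b" so the placeholder year never appears in the plot.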
There are multiple values from one month, so when plotting your original data, you got multiple points in one month. Therefore, the line looks strange.
If you want to create something similar to the example you provided, you have to summarize your data by year and month. Below I calculate the mean for each year and month in your data. In addition, you need to convert the year and month to factors if you want to plot them as discrete variables.
library(dplyr)
Ref_Data2 <- Ref_Data %>%
group_by(MonthN, YearN, Month) %>%
summarize(DENI011 = mean(DENI011)) %>%
ungroup() %>%
# Convert the Month column to factor variable with levels from Jan to Dec
# Convert the YearN column to factor
mutate(Month = factor(Month, levels = unique(Month)),
YearN = as.factor(YearN))
g <- ggplot(data = Ref_Data2,
aes(x = Month, y = DENI011, group = YearN, colour = YearN)) +
geom_line()
g
If you don't want to add in library(dplyr), this is the base R code. Exact same strategy and results as www's answer.
dat <- read.delim("~/Downloads/df1.dat", sep = " ")
dat$Date <- as.Date(dat$Date)
dat$month <- factor(months(dat$Date, TRUE), levels = month.abb)
dat$year <- gsub("-.*", "", dat$Date)
month_summary <- aggregate(DENI011 ~ month + year, data = dat, mean)
ggplot(month_summary, aes(month, DENI011, color = year, group = year)) +
geom_path()
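The question also mentions that only a range of years, such as 1998-2006, is needed. A sketch of restricting the years before aggregating follows, using a simulated daily series in place of the real file (the DENI011 values here are made up).

```r
set.seed(42)
# Simulated daily series standing in for the DENI011 column (made-up values).
dat <- data.frame(Date = seq(as.Date("1993-01-01"), as.Date("2010-12-31"),
                             by = "day"))
dat$DENI011 <- rexp(nrow(dat), rate = 1 / 20)

dat$year  <- as.integer(format(dat$Date, "%Y"))
dat$month <- as.integer(format(dat$Date, "%m"))

# Keep 1998-2006 only, then average within each year and month.
sub <- dat[dat$year >= 1998 & dat$year <= 2006, ]
month_summary <- aggregate(DENI011 ~ month + year, data = sub, FUN = mean)
head(month_summary)
```

The month_summary data frame has one row per year and month, so it can be passed to the same ggplot call after converting year to a factor or character.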

How to draw time series plot for data in date format in R

I have data where there are dates of visits of children.
date
16.08.13
16.08.13
16.08.13
17.08.13
27.08.13
03.09.13
04.09.13
05.09.13
07.09.13
07.09.13
I want to draw a time series plot in R that shows the dates and corresponding number of visits. For example, above there are 3 children on 16.08.2013.
In addition, my data cover 3 years. So, I would like to see the seasonal change over 3 years.
First let us create a longer data set called r. Use table to compute the frequencies, convert to a zoo time series and plot. Then compute the mean of each year/month and create a monthplot. Finally plot the means over all months vs month.
# test data
set.seed(123)
r <- as.Date("2000-01-01") + cumsum(rpois(1000, 1))
library(zoo)
opar <- par(mfrow = c(2,2)) # create a 2x2 grid of plots - optional
# plot freq vs. time
tab <- table(r)
z <- zoo(c(tab), as.Date(names(tab)))
plot(z) # this will be the upper left plot
# plot each month separately
zm <- aggregate(z, as.yearmon, mean)
monthplot(zm) # upper right plot
# plot month means
# zc <- aggregate(zm, cycle(zm), mean) # alternative but not equivalent
zc <- aggregate(z, cycle(as.yearmon(time(z))), mean)
plot(zc) # lower plot
par(opar) # reset grid
Note: The sum of z for each year/month is zym and the average of those for all the January months, all the February months, ...., all December months is:
zym <- aggregate(z, as.yearmon(time(z)), sum)
aggregate(zym, cycle(as.yearmon(time(zym))), mean)
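For the "dd.mm.yy" strings in the question, the daily and monthly counts can be sketched in base R as follows. The sample values are the ten dates listed in the question; per_day and per_month are illustrative names.

```r
visits <- c("16.08.13", "16.08.13", "16.08.13", "17.08.13", "27.08.13",
            "03.09.13", "04.09.13", "05.09.13", "07.09.13", "07.09.13")
# %y reads the two-digit year, so "13" becomes 2013.
d <- as.Date(visits, format = "%d.%m.%y")

# Visits per day, then visits per year-month.
per_day <- table(d)
per_month <- table(format(d, "%Y-%m"))
per_month
```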
With the ggplot2 and scales packages you can try something like this (a piece of my own code that actually works):
library(ggplot2)
library(lubridate)
library(scales)
g_sm_ddply <- ggplot(final_data, aes(x = as.Date(dates), y = scon_me, fill = tipo))
g_sm_ddply + geom_bar(position = "dodge", stat = "identity") +
labs(title = "SCONTRINO MEDIO ACQ_ISS_KPMG NUOVA CLUSTERIZZAZIONE", x = "data", y = "scontrino medio")+
scale_x_date(breaks = date_breaks("month"), labels = date_format("%Y/%m"))
I assume that you are already familiar with basic data manipulation in R.
One way to do what you want is to tabulate the date vector and create a proper time series object or a data.frame:
df <- as.data.frame(table(date)) ### tabulate
df$date <- as.Date(df$date, "%d.%m.%y") ### turn your date to Date class
df
## date Freq
## 1 2013-09-03 1
## 2 2013-09-04 1
## 3 2013-09-05 1
## 4 2013-09-07 2
## 5 2013-08-16 3
## 6 2013-08-17 1
## 7 2013-08-27 1
plot(Freq ~ date, data = df, pch = 19) ### plot
So far we are still missing the seasonal trend analysis the OP asked for. I think it is the more difficult part of the question.
If your data covers only 3 years, you may be able to observe the seasonal changes simply by looking at the monthly average of daily visits.
Depending on your needs, a simple monthly plot may be enough, or you may have to prepare your data further to compute the exact seasonal trend.
Below is a suggestion on how to compute and plot the monthly average number of visits per day (for days with at least one visit):
library(ggplot2)
df<-read.table(text="
16.08.13
16.08.13
16.08.13
17.08.13
27.08.13
03.09.13
04.10.13
05.09.13
07.09.13
07.01.14
03.02.14
04.03.14
04.03.14
04.03.14
15.05.14
15.05.14
15.09.14
20.10.14
20.09.14 ", col.names="date")
df <- as.data.frame(table(date = df$date)) # get the frequency count (daily)
df$date <- as.Date(df$date, "%d.%m.%y") # turn the date variable into Date class
df$year <- as.integer(format(df$date, "%Y")) # extract the year of the visit
df$month <- as.integer(format(df$date, "%m")) # extract the month of the visit
# plot daily frequency
ggplot(aes(x=date, y=Freq), data = df) +
geom_bar(stat = 'identity', position = 'dodge')+
ggtitle("Daily visits")
# compute monthly average visits per day (for days with at least one visit)
library(dplyr)
df2 <- df[,c("year","month","Freq")] %>%
group_by(year,month) %>%
summarise(Freq = mean(Freq, na.rm = TRUE))
# recreate a date (first day of each month) for the graph
df2$date <- as.Date(paste("01", df2$month, df2$year), "%d %m %Y")
ggplot(aes(x=date, y=Freq), data = df2) +
geom_bar(stat = 'identity', position = 'dodge')+
ggtitle("Average daily visits per month")
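The averages above only include days with at least one visit. If days without visits should count as zero, one possible approach is to tabulate against every calendar day in the observed range. The sketch below uses a made-up subset of dates; all_days and counts are illustrative names.

```r
visits <- as.Date(c("2013-08-16", "2013-08-16", "2013-08-16",
                    "2013-08-17", "2013-08-27"))

# Every calendar day in the observed range, including days without visits.
all_days <- seq(min(visits), max(visits), by = "day")
daily <- table(factor(format(visits), levels = format(all_days)))
counts <- as.vector(daily)

# Mean visits per calendar day, with empty days counted as zero.
mean(counts)
```

Whether empty days belong in the average depends on the question being asked, for instance whether the clinic was open on those days.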

Selecting and plotting months in ggplot2

I have a time series dataset with two columns: date (e.g. Jan 1980, Feb 1980, ..., Dec 2013) and its corresponding temperature. The dataset runs from 1980 to 2013. I am trying to subset it and plot the time series for each month separately in ggplot (e.g. I want only the February values so that I can plot them). I tried the following, but Feb1 is empty:
Feb1 <- subset(temp, date ==5)
The structure of my dataset is:
'data.frame': 408 obs. of 2 variables:
$ date :Class 'yearmon' num [1:359] 1980 1980 1980 1980 1980 ...
$ temp: int 16.9 12.7 13 6 6.0 5 6 10.9 0.9 16 ...
What about this?:
library(zoo)
library(ggplot2)
# Generating some data:
df <- data.frame(date = as.yearmon("1980-01") + 0:407/12, val = rnorm(408))
# Subsetting to get a specific month:
df.sub <- subset(df, format(df$date,"%b")=="Jan")
# The actual plot:
ggplot(df.sub) + geom_line(aes(x = as.Date(date), y = val))
I believe your column, being of class 'yearmon', prints in the format "Mon YYYY" (e.g. "Feb 1980"). I'm a little confused by how you are subsetting the data with 'date==5'. Below is one method.
temp$month<-substr(temp$date,1,3)
Feb1<-subset(temp,month=='Feb')
#more elegant
Feb1<-subset(temp,substr(temp$date,1,3)=='Feb')
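As for why date == 5 returns nothing: zoo stores a yearmon value as the number year + (month - 1)/12, so no date in 1980-2013 ever equals 5. The sketch below emulates that encoding with plain numbers, without loading zoo, to show how the month can be recovered; ym and month are illustrative names.

```r
# Emulate the numeric encoding behind "yearmon": year + (month - 1) / 12.
ym <- 1980 + (0:23) / 12          # Jan 1980 ... Dec 1981

# Recover the month number from the fractional part.
month <- round(12 * (ym %% 1)) + 1

any(ym == 5)           # comparing with 5 never matches
ym[month == 2]         # the two February values
```

With a real yearmon column, comparing format(date, "%m") with "02" avoids depending on the locale's month abbreviations.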
You can also directly plot the subset in ggplot2 without creating a new data frame.
Based on RStudent's solution:
library(zoo)
# Generating some data:
df <- data.frame(date = as.yearmon("1980-01") + 0:407/12, val = rnorm(408))
library(ggplot2)
ggplot(df[format(df$date,"%b")=="Jan", ], aes(x = as.Date(date), y = val))+
geom_line()
Convert the data to zoo, use cycle to split into months and autoplot.zoo to plot. Below we show four different ways to plot. First we plot just January. Then we plot all the months with each month in a separate panel and then we plot all months with each month as a separate series all in the same panel. Finally we use monthplot (not ggplot2) to plot them all in a single panel in a different manner.
library(zoo)
library(ggplot2)
# test data
set.seed(123)
temp <- data.frame(date = as.yearmon(1980 + 0:479/12), value = rnorm(480))
z <- read.zoo(temp, FUN = identity) # convert to zoo
# split into 12 series and cbind them together so zz480 is 480 x 12
# Then aggregate to zz which is 40 x 12
zz480 <- do.call(cbind, split(z, cycle(z)))
zz <- aggregate(zz480, as.numeric(trunc(time(zz480))), na.omit)
### now we plot this 4 different ways
#####################################
# 1. plot just January
autoplot(zz[, 1]) + ggtitle("Jan")
# 2. plot each in separate panel
autoplot(zz)
# 3. plot them all in a single panel
autoplot(zz, facet = NULL)
# 4. plot them all in a single panel in a different way (not using ggplot2)
monthplot(z)
Note that an alternative way to calculate zz would be:
zz <- zoo(matrix(coredata(z), 40, 12, byrow=TRUE), unique(as.numeric(trunc(time(z)))))
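The year-by-month layout used in this alternative can be illustrated with a plain matrix. This is a base R sketch under the same assumption as the test data (480 monthly values starting in January 1980); m and jan are illustrative names.

```r
set.seed(123)
x <- rnorm(480)                       # 40 years of monthly values from Jan 1980

# Fill row by row so that each row is a year and each column a month.
m <- matrix(x, nrow = 40, ncol = 12, byrow = TRUE,
            dimnames = list(1980:2019, month.abb))

# Column "Jan" holds every January value in year order.
jan <- m[, "Jan"]

# One line per month across the years.
matplot(1980:2019, m, type = "l", lty = 1, xlab = "Year", ylab = "Value")
```

Filling with byrow = TRUE only works when the series is complete and starts in January; otherwise values land in the wrong month column.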

Resources