Selecting and plotting months in ggplot2 - r

I have a time series dataset with two columns: date (e.g. Jan 1980, Feb 1980 ... Dec 2013) and its corresponding temperature. The dataset runs from 1980 to 2013. I am trying to subset and plot the time series in ggplot for each month separately (e.g. I want only the February values so that I can plot them using ggplot). I tried the following, but Feb1 is empty:
Feb1 <- subset(temp, date ==5)
The structure of my dataset is:
'data.frame': 408 obs. of 2 variables:
$ date :Class 'yearmon' num [1:359] 1980 1980 1980 1980 1980 ...
$ temp: int 16.9 12.7 13 6 6.0 5 6 10.9 0.9 16 ...

What about this?:
library(zoo)
library(ggplot2)
# Generating some data:
df <- data.frame(date = as.yearmon("1980-01") + 0:407/12, val = rnorm(408))
# Subsetting to get a specific month:
df.sub <- subset(df, format(df$date,"%b")=="Jan")
# The actual plot:
ggplot(df.sub) + geom_line(aes(x = as.Date(date), y = val))

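I believe a column of class 'yearmon' prints in the format "Mon YYYY" (e.g. "Feb 1980"). Subsetting with 'date==5' returns nothing because yearmon values are stored as the year plus a fraction of a year, so none of them equals 5. Below is a method that matches on the printed month abbreviation instead.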
temp$month<-substr(temp$date,1,3)
Feb1<-subset(temp,month=='Feb')
#more elegant
Feb1<-subset(temp,substr(temp$date,1,3)=='Feb')
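The empty result from date == 5 can be checked directly. The following is a minimal sketch, assuming the zoo package and a made-up data frame named demo standing in for the asker's data; it selects February by month number with cycle(), which avoids depending on locale-specific month abbreviations.

```r
library(zoo)

# Made-up stand-in for the asker's data: monthly values, Jan 1980 to Dec 2013
demo <- data.frame(date = as.yearmon("1980-01") + 0:407/12, temp = rnorm(408))

# yearmon is stored as year + (month - 1)/12, so Feb 1980 is 1980 + 1/12
as.numeric(demo$date[2])

# No stored value equals 5, which is why subset(temp, date == 5) is empty
any(demo$date == 5)

# cycle() gives the month number (1 = Jan, ..., 12 = Dec)
feb <- demo[cycle(demo$date) == 2, ]
nrow(feb)  # one row per year
```

The resulting feb data frame can be passed to ggplot in the same way as df.sub in the first answer.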

You can also directly plot the subset in ggplot2 without creating a new data frame.
Based on RStudent's solution:
library(zoo)
# Generating some data:
df <- data.frame(date = as.yearmon("1980-01") + 0:407/12, val = rnorm(408))
library(ggplot2)
ggplot(df[format(df$date,"%b")=="Jan", ], aes(x = as.Date(date), y = val))+
geom_line()

Convert the data to zoo, use cycle to split into months and autoplot.zoo to plot. Below we show four different ways to plot. First we plot just January. Then we plot all the months with each month in a separate panel and then we plot all months with each month as a separate series all in the same panel. Finally we use monthplot (not ggplot2) to plot them all in a single panel in a different manner.
library(zoo)
library(ggplot2)
# test data
set.seed(123)
temp <- data.frame(date = as.yearmon(1980 + 0:479/12), value = rnorm(480))
z <- read.zoo(temp, FUN = identity) # convert to zoo
# split into 12 series and cbind them together so zz480 is 480 x 12
# Then aggregate to zz which is 40 x 12
zz480 <- do.call(cbind, split(z, cycle(z)))
zz <- aggregate(zz480, as.numeric(trunc(time(zz480))), na.omit)
### now we plot this 4 different ways
#####################################
# 1. plot just January
autoplot(zz[, 1]) + ggtitle("Jan")
# 2. plot each in separate panel
autoplot(zz)
# 3. plot them all in a single panel
autoplot(zz, facets = NULL)
# 4. plot them all in a single panel in a different way (not using ggplot2)
monthplot(z)
Note that an alternative way to calculate zz would be:
zz <- zoo(matrix(coredata(z), 40, 12, byrow=TRUE), unique(as.numeric(trunc(time(z)))))
Update: Added plot types and improved the approach.

Related

Can we plot multiple time series in one plot using hydroTSM?

I have daily precipitation data in the following format:
> head(df)
I_2004 G_2004 T_2004 Date
1 3628.79853 2199.310 12741.413 2004-01-01
2 1556.66704 4322.884 5464.395 2004-01-02
3 20.43379 5592.103 72.998 2004-01-03
4 265.94247 8145.041 942.344 2004-01-04
5 914.93958 9668.531 3227.579 2004-01-05
6 2585.63558 6825.905 9043.866 2004-01-06
Usually I plot the time series of all 3 variables together using ggplot2:
library(reshape2)
library(ggplot2)
dfmelt<-melt(df,id.vars="Date")
ggplot(dfmelt,aes(x=Date,y=value,
col=variable,group=variable))+
labs(title='ANNUAL')+
geom_line()
I have used hydroTSM to plot time series, but never a multi-variable one. Is there any way to achieve this using packages like hydroTSM?
My current method requires subsetting, and doing so for multiple years is time consuming. I'm hoping to shorten this using hydroTSM or any other suitable package.
My aim is to plot monthly and seasonal time series plots.
We use a larger data frame below (see Note at end) so that it is possible to display month plots. Convert the data frame df to a zoo series -- hydroTSM makes zoo available -- and use autoplot.zoo. Use aggregate with tail or mean to create a monthly series and plot it, then convert that to ts to create the seasonal plot. Except for ggplot2, the following only uses packages already pulled in by hydroTSM.
library(ggplot2)
library(hydroTSM)
z <- read.zoo(df, index = "Date")
autoplot(z) # separate panels
autoplot(z, facets = NULL) # single panel
# monthly plot
zm <- aggregate(z, as.yearmon, tail, 1, frequency = 12)
autoplot(zm)
# for seasonal plot
tt <- as.ts(zm)
nc <- ncol(tt)
opar <- par(mfrow = c(nc, 1), mar = c(2, 4, 0, 4))
for(j in 1:nc) monthplot(tt[, j], ylab = colnames(tt)[j])
par(opar)
Note
df in reproducible form. Larger than in question so that monthly plots can be shown.
set.seed(123)
n <- 700
df <- data.frame(I_2004 = rnorm(n),
G_2004 = rnorm(n),
T_2004 = rnorm(n),
Date = as.Date("2004-01-01") + 1:n - 1)

ggplot: Plotting timeseries data with missing values

I have been trying to plot a graph between two columns from a data frame which I had created. The data values stored in the first column is daily time data named "Time"(format- YYYY-MM-DD) and the second column contains precipitation magnitude, which is a numeric value named "data1".
This data is taken from an excel file "St Lucia3" which has a total 11598 data points and stores daily precipitation data from 1981 to 2018 in two columns:
YearMonthDay (format- "YYYYMMDD", example "19810501")
Rainfall (mm)
The code for importing data into R:
StLucia <- read_excel("C:/Users/hp/Desktop/St Lucia3.xlsx")
The code for time data "Time" :
Time <- as.Date(as.character(StLucia$YearMonthDay), format= "%Y%m%d")
The code for precipitation data "data1" :
library("imputeTS")
data1 <- na_ma(StLucia$`Rainfall (mm)`, k = 4, weighting = "exponential")
The code for data frame "Precip1" :
Precip1 <- data.frame(Time, data1, check.rows=TRUE)
The code for ggplot is:
ggplot(data = Precip1, mapping= aes(x= Time, y= data1)) + geom_line()
Using ggplot for plotting the graph between "Time" and "data1" results as:
Can someone please explain to me why there is an "unusual kink" at the right end of the graph, even though there are no such values in the column "data1"?
The plot of "data1" data against its index is as shown:
The code for this plot is:
plot(data1, type = "l")
Any help would be highly appreciated. Thanks!
By using pad we can restore the missing dates and assign them an NA value, so as to avoid plotting a line through the region of missing data.
library(padr)
library(zoo)
YearMonthDay<-c(19810501,19810502,19810504,19810505)
Data<-c(1,2,3,4)
StLucia<-data.frame(YearMonthDay,Data)
StLucia$YearMonthDay <- as.Date(as.character(StLucia$YearMonthDay), format=
"%Y%m%d")
> StLucia
YearMonthDay Data
1 1981-05-01 1
2 1981-05-02 2
3 1981-05-04 3
4 1981-05-05 4
Note: you can see we are missing a date, but still there is no gap between position 2 and 3, thus plotting versus indexing you would not see a gap.
So let's add the missing date:
StLucia<-pad(StLucia,interval="day")
> StLucia
YearMonthDay Data
1 1981-05-01 1
2 1981-05-02 2
3 1981-05-03 NA
4 1981-05-04 3
5 1981-05-05 4
plot(StLucia, type = "l")
If you want to fill in those NA values, use na.locf() from the zoo package.
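As a small illustration of what na.locf() does, here is a sketch on a made-up vector shaped like the padded Data column above (the names x and filled are hypothetical):

```r
library(zoo)

# Values shaped like the padded Data column: one NA for the missing day
x <- c(1, 2, NA, 3, 4)

# "Last observation carried forward": the NA takes the previous value
filled <- na.locf(x)
filled  # 1 2 2 3 4
```

Whether carrying the previous day's value forward is appropriate for rainfall is a modelling decision; it hides the fact that the observation was missing.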
Here is a reproducible example - change the names to match your data.
# create sample data
set.seed(47)
dd = data.frame(t = Sys.Date() + c(0:5, 30:32), y = runif(9))
# demonstrate problem
ggplot(dd, aes(t, y)) +
geom_point() +
geom_line()
The easiest solution, as Tung points out, is to use a more appropriate geom, like geom_col:
ggplot(dd, aes(t, y)) +
geom_col()
If you really want to use lines, you should fill in the missing dates with NA for rainfall:
# calculate all days
all_days = data.frame(t = seq.Date(from = min(dd$t), to = max(dd$t), by = "day"))
# join to original data
library(dplyr)
dd_complete = left_join(all_days, dd, by = "t")
# ggplot won't connect lines across missing values
ggplot(dd_complete, aes(t, y)) +
geom_point() +
geom_line()
Alternately, you could replace the missing values with 0s to have the line just go along the axis, but I think it's nicer to not plot the line, which implies no data/missing data, rather than plot 0s which implies no rainfall.
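For completeness, the zero-filling alternative could look like the sketch below. It uses base R merge() rather than dplyr, and the name dd_zero is made up for illustration; as noted above, zeros imply "no rainfall" rather than "no data", so this is only suitable when that is really what the gaps mean.

```r
# Same sample data as above
set.seed(47)
dd <- data.frame(t = Sys.Date() + c(0:5, 30:32), y = runif(9))

# One row per calendar day between the first and last observation
all_days <- data.frame(t = seq(min(dd$t), max(dd$t), by = "day"))

# Left join in base R, then replace the missing values with 0
dd_zero <- merge(all_days, dd, by = "t", all.x = TRUE)
dd_zero$y[is.na(dd_zero$y)] <- 0
```

Plotting dd_zero with geom_line() then draws the line along the axis across the gap instead of breaking it.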

How to plot different months as different series in the same graph in R

I have the following dataset
head(Data)
Fecha PriceStats
1 01-2002 45.2071
2 02-2002 46.6268
3 03-2002 48.4712
4 04-2002 53.5067
5 05-2002 55.6527
6 06-2002 57.6684
There's a total of 176 observations.
Every row corresponds to a different month.
I would like to create a graph with the 12 months of the year on the x-axis, where every year of the dataset (containing 12 months each) corresponds to a series in the graph, so that I can plot all the different years overlapping (in this case there would be 15 series).
Do I have to set levels on the dataset or ggplot can do that directly?
This should do it:
library(ggplot2)
library(lubridate)
Data <- data.frame(date = seq(ymd('2014-01-01'), ymd('2016-12-01'), by = "month"),
n = sample(1:50, 36))
Data$month <- month(Data$date)
Data$year <- year(Data$date)
ggplot(Data, aes(x = month, y = n, group = year)) +
geom_line(aes(colour = as.factor(year)))
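The answer above generates its own dates. For the "MM-YYYY" strings in the question's Fecha column, the month and year can be pulled out with substr(). A minimal sketch, assuming Fecha always has the form shown in head(Data) and using only the six rows printed in the question:

```r
library(ggplot2)

# The six rows shown in the question
Data <- data.frame(
  Fecha = c("01-2002", "02-2002", "03-2002", "04-2002", "05-2002", "06-2002"),
  PriceStats = c(45.2071, 46.6268, 48.4712, 53.5067, 55.6527, 57.6684)
)

# "MM-YYYY": characters 1-2 are the month, 4-7 the year
fecha <- as.character(Data$Fecha)
Data$month <- as.integer(substr(fecha, 1, 2))
Data$year <- as.integer(substr(fecha, 4, 7))

# One line per year, months 1 to 12 on the x-axis
p <- ggplot(Data, aes(x = month, y = PriceStats,
                      group = year, colour = factor(year))) +
  geom_line() +
  scale_x_continuous(breaks = 1:12)
```

No factor levels need to be set by hand: grouping and colouring by year is enough for ggplot to draw each year as its own series.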

How to draw time series plot for data in date format in R

I have data where there are dates of visits of children.
date
16.08.13
16.08.13
16.08.13
17.08.13
27.08.13
03.09.13
04.09.13
05.09.13
07.09.13
07.09.13
I want to draw a time series plot in R that shows the dates and corresponding number of visits. For example, above there are 3 children on 16.08.2013.
In addition, my data cover 3 years. So, I would like to see the seasonal change over 3 years.
First let us create a longer data set called r. Use table to compute the frequencies, convert to a zoo time series and plot. Then compute the mean of each year/month and create a monthplot. Finally plot the means over all months vs month.
# test data
set.seed(123)
r <- as.Date("2000-01-01") + cumsum(rpois(1000, 1))
library(zoo)
opar <- par(mfrow = c(2,2)) # create a 2x2 grid of plots - optional
# plot freq vs. time
tab <- table(r)
z <- zoo(c(tab), as.Date(names(tab)))
plot(z) # this will be the upper left plot
# plot each month separately
zm <- aggregate(z, as.yearmon, mean)
monthplot(zm) # upper right plot
# plot month means
# zc <- aggregate(zm, cycle(zm), mean) # alternative but not equivalent
zc <- aggregate(z, cycle(as.yearmon(time(z))), mean)
plot(zc) # lower plot
par(opar) # reset grid
Note: The sum of z for each year/month is zym and the average of those for all the January months, all the February months, ...., all December months is:
zym <- aggregate(z, as.yearmon(time(z)), sum)
aggregate(zym, cycle(as.yearmon(time(zym))), mean)
With the ggplot2 and scales packages you can try something like this (which is a piece of my code that actually works):
library(ggplot2)
library(lubridate)
library(scales)
g_sm_ddply <- ggplot(final_data, aes(x = as.Date(dates), y = scon_me, fill = tipo))
g_sm_ddply + geom_bar(position = "dodge", stat = "identity") +
labs(title = "SCONTRINO MEDIO ACQ_ISS_KPMG NUOVA CLUSTERIZZAZIONE", x = "data", y = "scontrino medio")+
scale_x_date(breaks = date_breaks("month"), labels = date_format("%Y/%m"))
I assume that you are already familiar with basic data manipulation in R.
One way to do what you want, is to tabulate the date vector and create a proper times series object or a data.frame
df <- as.data.frame(table(date)) ### tabulate
df$date <- as.Date(df$date, "%d.%m.%y") ### turn your date to Date class
df
## date Freq
## 1 2013-09-03 1
## 2 2013-09-04 1
## 3 2013-09-05 1
## 4 2013-09-07 2
## 5 2013-08-16 3
## 6 2013-08-17 1
## 7 2013-08-27 1
plot(Freq ~ date, data = df, pch = 19) ### plot
So far we are still missing the seasonal trend analysis the OP asked for. I think it is the more difficult part of the question.
If your data covers only 3 years, you can perhaps observe the seasonal changes by simply looking at the monthly average of daily visits.
Depending on your needs, you can go with a simple monthly plot, or you might have to prepare your data further to compute the exact seasonal trend.
Below is a suggestion on how to compute and plot the monthly average number of visits per day (counting only days with at least one visit):
library(ggplot2)
df<-read.table(text="
16.08.13
16.08.13
16.08.13
17.08.13
27.08.13
03.09.13
04.10.13
05.09.13
07.09.13
07.01.14
03.02.14
04.03.14
04.03.14
04.03.14
15.05.14
15.05.14
15.09.14
20.10.14
20.09.14 ", col.names="date")
df <- as.data.frame(table(date = df$date)) # get the frequency count (daily)
df$date <- as.Date(as.character(df$date), "%d.%m.%y") # turn your date variable to Date class
df$year <- as.integer(format(df$date, "%Y")) # extract year of the visit
df$month <- as.integer(format(df$date, "%m")) # extract month of the visit
#plot daily frequency
ggplot(aes(x=date, y=Freq), data = df) +
geom_bar(stat = 'identity', position = 'dodge')+
ggtitle("Daily visits")
#compute monthly average visit per day (for days with at least one visit)
library(dplyr)
df2 <- df %>%
group_by(year, month) %>%
summarise(Freq = mean(Freq, na.rm = TRUE))
#recreate a date (first day of each month) for the graph
df2$date <- as.Date(paste("01", df2$month, df2$year), "%d %m %Y")
ggplot(aes(x=date, y=Freq), data = df2) +
geom_bar(stat = 'identity', position = 'dodge')+
ggtitle("Average daily visits per month")

plot data grouped by year

I'm trying to create a plot of temperatures but I can't aggregate the data by year.
My data comes in this form:
02-2012 7.2
02-2013 7.2
02-2010 7.9
02-2011 9.8
03-2013 10.7
03-2010 13.1
03-2012 18.5
03-2011 13.7
02-2003 9.2
...
containing all the months between Jan 2000 and Dec 2013. I've loaded the data with zoo:
f <- function(x) as.yearmon(format(x), "%m-%Y")
temp <- read.zoo(file="results.tsv", FUN = f)
Plotting the temp var I obtain a plot with X axis going from Jan 2000 to Dec 2013, but what I'd like to have is a plot where the X axis goes from Jan to Dec and the temperatures of every year are plotted as a separate line. Any hint?
Thanks,
Andrea
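First you'll want to separate the date into its year and month components (this assumes temp is a plain data frame with a character date column, not the zoo object from the question):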
names(temp) <- c("date","temperature")
tmpSplit <- strsplit(temp$date, "-")
temp$month <- sapply(tmpSplit, "[", 1)
temp$year <- sapply(tmpSplit, "[", 2)
Then, my preference would be to use the ggplot2 package to plot your data:
library(ggplot2)
ggplot(temp, aes(x=month, y=temperature, group=year)) + geom_line()
Here are several approaches. First set up the data. Note the simplification in read.zoo.
library(zoo)
temp <- read.zoo("results.tsv", FUN = as.yearmon, format = "%m-%Y")
In addition to the plots below there is monthplot in the base of R (stats package). It does not appear to work with as.ts(temp) if temp is the subset of data provided in the question but if the actual data looks more like those in the examples of ?monthplot then it would work.
1) ggplot2
Create a data.frame DF with columns for the month, year and value. cycle can get the month numbers and format with a format of %Y can give the years.
Note that we want the years to be discrete. format returns character, which ggplot2 treats as a discrete variable (older versions of R also convert it to a factor in data.frame). Finally create the plot using ggplot2 or lattice:
library(ggplot2)
DF <- data.frame(Month = coredata(cycle(temp)),
Year = format(index(temp), "%Y"),
Value = coredata(temp))
ggplot(DF, aes(Month, Value, col = Year)) +
geom_line() +
scale_x_continuous(breaks = 1:12)
2) lattice With DF from 1) this would work:
library(lattice)
xyplot(Value ~ Month, DF, groups = Year, type = "l", auto.key = TRUE)
REVISED Added solutions and additional commentary.
