R: Computing monthly averages from hourly data and then plotting

After converting a date/time character string into POSIXlt using strptime, I am left with the following (data truncated here for brevity):
DateTime North South West East Seast System
1 2008-09-12 01:00:00 1919.9 3721.4 2085.9 2565.5 2571.1 12863.8
2 2008-09-12 02:00:00 1827.0 3518.1 1965.3 2396.9 2410.7 12118.0
3 2008-09-12 03:00:00 1755.4 3388.4 1866.8 2338.7 2335.2 11684.5
4 2008-09-12 04:00:00 1733.5 3327.1 1810.0 2295.6 2290.2 11456.4
5 2008-09-12 05:00:00 1742.7 3327.3 1831.4 2314.2 2302.3 11517.9
6 2008-09-12 06:00:00 1912.2 3504.4 1986.7 2515.0 2502.6 12420.9
I then aggregated the data (seemingly correctly) into year-month averages with the following code:
North_Monthly_Avg <- aggregate(North, list(Date=format(DateTime, "%Y-%m")),mean)
which yields the following:
Date x
1 2008-09 2192.066
2 2008-10 1885.074
3 2008-11 1675.373
4 2008-12 1637.231
5 2009-01 1752.693
6 2009-02 1743.393
I can plot the 'x' values, but I cannot get the year-months to show up as labels on the x-axis; it only plots the index. I'm not sure what I am missing. I have played around with axis.POSIXct, but had no luck.

Try zoo and lattice:
library(zoo)
library(lattice)
dat <- 'Date Time North South West East Seast System
2008-09-12 01:00:00 1919.9 3721.4 2085.9 2565.5 2571.1 12863.8
2008-09-12 02:00:00 1827.0 3518.1 1965.3 2396.9 2410.7 12118.0
2008-09-12 03:00:00 1755.4 3388.4 1866.8 2338.7 2335.2 11684.5
2008-09-12 04:00:00 1733.5 3327.1 1810.0 2295.6 2290.2 11456.4
2008-09-12 05:00:00 1742.7 3327.3 1831.4 2314.2 2302.3 11517.9
2008-09-12 06:00:00 1912.2 3504.4 1986.7 2515.0 2502.6 12420.9'
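# Read the sample, combining the Date and Time columns into one POSIXct index
z <- read.zoo(text = dat, header = TRUE, index.column = 1:2, tz = "")
# Plot every column against time
xyplot(z)
# Monthly means of North, indexed by year-month ("yearmon")
zAgg <- aggregate(z$North, by = as.yearmon, FUN = mean)
# The sample above covers a single day, so re-create the monthly averages from the question
dat2 <- 'Date x
2008-09 2192.066
2008-10 1885.074
2008-11 1675.373
2008-12 1637.231
2009-01 1752.693
2009-02 1743.393'
zAgg <- read.zoo(text = dat2, header = TRUE, FUN = as.yearmon)
# Suppress the default x-axis, then draw one tick per month:
# January ticks are labelled with the year, other months with the month number
plot(zAgg, xaxt = "n")
tt <- time(zAgg)
m <- format(tt, "%m")
axis(side = 1, at = tt, labels = ifelse(m == "01", trunc(tt), m), cex.axis = .7)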
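A self-contained sketch of the same zoo approach on a longer, synthetic hourly series. The values are made up (each one is simply its month number, so the monthly means should come out as 9, 10 and 11), and tz = "UTC" is an assumption chosen to keep month boundaries unambiguous:

```r
library(zoo)

# Synthetic hourly series from 2008-09-12 01:00 to 2008-11-30 23:00 (made-up values)
tt <- seq(as.POSIXct("2008-09-12 01:00", tz = "UTC"),
          as.POSIXct("2008-11-30 23:00", tz = "UTC"), by = "hour")
z <- zoo(as.numeric(format(tt, "%m")), tt)

# aggregate.zoo applies as.yearmon to the index, so the result is indexed by "yearmon"
zAgg <- aggregate(z, by = as.yearmon, FUN = mean)

# Draw the plot without the default x-axis, then label each month explicitly
plot(zAgg, xaxt = "n", xlab = "", ylab = "Monthly mean")
axis(side = 1, at = time(zAgg), labels = format(time(zAgg), "%b %Y"))
```

Using "%b %Y" for the labels is just one choice; any format code understood by format() on a yearmon works in the labels argument.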

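Try converting the year-month to an integer. Drop the hyphen from the format, otherwise as.integer() returns NA for strings such as "2008-09":
North_Monthly_Avg <- aggregate(North, list(Date=as.integer(format(DateTime, "%Y%m"))),mean)
Note that the resulting numbers (200809, 200810, ...) are not evenly spaced across a year boundary, since 200812 is followed by 200901.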

@user1062431,
To change the tick labels to your preferred format, edit the m <- format(tt, "%m") line in Oscar's answer.
To get the format 12 - 2008, change
m <- format(tt, "%m") to m <- format(tt, "%m - %Y")
To get the format Dec 2008, change
m <- format(tt, "%m") to m <- format(tt, "%b %Y")
(%b gives the abbreviated month name in the current locale.)
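A minimal sketch of what these format strings produce on a yearmon vector, assuming the zoo package is loaded (the three months are just illustrative values):

```r
library(zoo)

# A few monthly time points of the kind produced by aggregate(..., by = as.yearmon)
tt <- as.yearmon(c("2008-11", "2008-12", "2009-01"))

format(tt, "%m")       # month number only
format(tt, "%m - %Y")  # month number and year
format(tt, "%b %Y")    # abbreviated month name and year (locale-dependent)
```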

I think the problem is that there is no date: the grouping column created by format() is just a "YYYY-MM" string. You will have to settle on the 1st or the 15th of the month and apply that to your aggregated table.
I came up with this:
North_Monthly_Avg=aggregate(North,by=list(format(DateTime,'%Y-%m')),mean)
names(North_Monthly_Avg)=c('Month','North')
North_Monthly_Avg$day=15
North_Monthly_Avg$Date=paste(North_Monthly_Avg$Month,North_Monthly_Avg$day,sep='-')
North_Monthly_Avg$Date=as.POSIXct(strptime(North_Monthly_Avg$Date,'%Y-%m-%d')) # store as POSIXct, which sits better in a data frame column than POSIXlt
plot(North_Monthly_Avg$Date,North_Monthly_Avg$North,xaxt='n') # xaxt='n' suppresses the default ticks on the x-axis
axis(1,as.numeric(North_Monthly_Avg$Date),labels=format(North_Monthly_Avg$Date,'%Y-%m')) # one tick per month, labelled as year-month
I am fairly new to R, so this may not be the most elegant solution, but it will work.
If you prefer the 1st of the month, replace 15 with 1 in the $day line; to keep the day zero-padded, also change the sep in paste to '-0'.

The problem you're having is that you use format to create the groupings for the subdivision. This turns the values into strings, so plotting functions don't know to treat them as dates.
The cut function has a method for date-times (cut.POSIXt) that groups by calendar month and labels each group with the first day of that month (e.g. "2008-09-01"). The groups come back as a factor, so convert them with as.Date; after that, plot draws a proper date axis.
Instead of
North_Monthly_Avg <- aggregate(North, list(Date=format(DateTime, "%Y-%m")),mean)
just use
North_Monthly_Avg <- aggregate(North, list(Date = cut(DateTime, "month")), mean)
North_Monthly_Avg$Date <- as.Date(North_Monthly_Avg$Date)
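A self-contained sketch contrasting the format() grouping from the question with cut(). DateTime and North here are synthetic stand-ins with made-up values, and tz = "UTC" is an assumption:

```r
# Synthetic hourly data standing in for the question's DateTime and North (made-up values)
DateTime <- seq(as.POSIXct("2008-09-12 01:00", tz = "UTC"),
                as.POSIXct("2009-02-28 23:00", tz = "UTC"), by = "hour")
North <- seq_along(DateTime) %% 24 + 1700

# format() produces plain character groups ...
byString <- aggregate(North, list(Date = format(DateTime, "%Y-%m")), mean)

# ... whereas cut() produces a factor labelled with the first day of each month,
# which converts cleanly to a Date
byMonth <- aggregate(North, list(Date = cut(DateTime, "month")), mean)
byMonth$Date <- as.Date(byMonth$Date)

# With a real Date on the x-axis, axis.Date() can place one labelled tick per month
plot(byMonth$Date, byMonth$x, type = "b", xaxt = "n", xlab = "", ylab = "North")
axis.Date(1, at = byMonth$Date, format = "%Y-%m")
```

Both groupings produce the same monthly means; the only difference is that byMonth$Date is a Date rather than a string.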

You could try the openair package and its timeAverage function.
Hourly to monthly:
library(openair)
# timeAverage expects a POSIXct column named date
mydata$date <- as.POSIXct(strptime(mydata$date, format = "%d/%m/%Y %H:%M", tz = "GMT"))
monthly <- timeAverage(mydata, avg.time = "month")

Related

How can I extract the month, day and year from a date column in R?

I have a column of date type, and the dates are in 4/1/2007 format. Now I want to extract the month value and the day value from that column into separate columns in R. My dates range from 01/01/2012 to 01/01/2015. Please help me.
If your variable is of Date type (as you say in the post), simply use the following to extract the month:
month_var = format(df$datecolumn, "%m") # this will give output like "09"
month_var = format(df$datecolumn, "%b") # this will give output like "Sep"
month_var = format(df$datecolumn, "%B") # this will give output like "September"
If your date variable is not in Date format, you will first have to convert it:
df$datecolumn <- as.Date(df$datecolumn, format = "%m/%d/%Y")
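Putting both steps together, a minimal sketch on a hypothetical data frame whose datecolumn holds month/day/year strings (the three dates are illustrative only):

```r
# Hypothetical data frame holding dates as month/day/year strings
df <- data.frame(datecolumn = c("4/1/2007", "01/01/2012", "01/01/2015"),
                 stringsAsFactors = FALSE)

# Convert the strings to Date objects
df$datecolumn <- as.Date(df$datecolumn, format = "%m/%d/%Y")

# Pull each component into its own column (as character strings)
df$year  <- format(df$datecolumn, "%Y")
df$month <- format(df$datecolumn, "%m")
df$day   <- format(df$datecolumn, "%d")

# Convert to integer if you need to do arithmetic on a component
df$month_num <- as.integer(df$month)
df
```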
Assuming your initial data is character and not POSIX.
df <- data.frame(d = c("4/1/2007", "01/01/2012", "02/01/2015"),
stringsAsFactors = FALSE)
df
# d
# 1 4/1/2007
# 2 01/01/2012
# 3 02/01/2015
These are not yet "dates", just strings.
df$d2 = as.POSIXct(df$d, format = "%m/%d/%Y")
df
# d d2
# 1 4/1/2007 2007-04-01
# 2 01/01/2012 2012-01-01
# 3 02/01/2015 2015-02-01
Now they are proper dates (in the R sense). The next two lines extract a single component from each date; see ?strptime for all available format codes.
df$dY = format(df$d2, "%Y")
df$dm = format(df$d2, "%m")
df
# d d2 dY dm
# 1 4/1/2007 2007-04-01 2007 04
# 2 01/01/2012 2012-01-01 2012 01
# 3 02/01/2015 2015-02-01 2015 02
An alternative would be to extract substrings from each string with regular expressions, but that quickly gets painful. I'd suggest relying on R's own date parsing instead and going through POSIXct (or POSIXlt, if you prefer).
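Going through POSIXlt gives the components as numbers directly. A minimal sketch, assuming the same month/day/year strings and tz = "UTC":

```r
d <- c("4/1/2007", "01/01/2012", "02/01/2015")

# POSIXlt stores each date-time as a list of named components
lt <- as.POSIXlt(d, format = "%m/%d/%Y", tz = "UTC")

year  <- lt$year + 1900  # years are counted from 1900
month <- lt$mon + 1      # months are 0-based (0 = January)
day   <- lt$mday         # day of the month
```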

Interpolation of constrained gaps

As a follow-up to the following question:
Efficient dynamic addition of rows in dataframe and dynamic calculation in R
I have the following table:
Lines <- "D1,Diff
1,20/11/2014 16:00,0.01
2,20/11/2014 17:00,0.02
3,20/11/2014 19:00,0.03
4,21/11/2014 16:00,0.04
5,21/11/2014 17:00,0.06
6,21/11/2014 20:00,0.10"
As can be seen, 18:00 is missing on 20/11/2014, and both 18:00 and 19:00 are missing on 21/11/2014.
There is an additional gap between 20/11/2014 19:00 and 21/11/2014 16:00.
I would like to interpolate (fill in) values only where the gap between rows is up to 3 hours.
The required result should be as follows (in data frame format):
Lines <- "D1,Diff
1,20/11/2014 16:00,0.01
2,20/11/2014 17:00,0.02
3,20/11/2014 18:00,0.025  <-- added
4,20/11/2014 19:00,0.03
5,21/11/2014 16:00,0.04
6,21/11/2014 17:00,0.06
7,21/11/2014 18:00,0.073  <-- added
8,21/11/2014 19:00,0.086  <-- added
9,21/11/2014 20:00,0.10"
Here is the code I currently use; it fills in every gap, including the one between the two days, which is longer than 3 hours:
library(zoo)
z <- read.zoo(text = Lines, tz = "", format = "%d/%m/%Y %H:%M", sep = ",")
interpolated1 <-na.approx(z, xout = seq(start(z), end(z), "hours"))
We can merge z with a zero-width zoo series z0 that is based on a grid of hours. This turns z into an hourly series with NAs. Then use the maxgap argument of na.approx, as shown below, to fill in only the desired gaps. This still leaves NAs in the longer gaps, so remove them using na.omit.
fortify.zoo(z3) would turn the result into a data frame. However, z3 (the resulting series with only gaps of up to length 3 filled) is a time series, so this is probably not a good idea; it is better to leave it as a zoo object so that you can use all the facilities of zoo.
No packages other than zoo are used.
z0 <- zoo(, seq(start(z), end(z), "hours"))
z3 <- na.omit(na.approx(merge(z, z0), maxgap = 3))
giving:
> z3
2014-11-20 16:00:00 2014-11-20 17:00:00 2014-11-20 18:00:00 2014-11-20 19:00:00
0.01000000 0.02000000 0.02500000 0.03000000
2014-11-21 16:00:00 2014-11-21 17:00:00 2014-11-21 18:00:00 2014-11-21 19:00:00
0.04000000 0.06000000 0.07333333 0.08666667
2014-11-21 20:00:00
0.10000000
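To see what maxgap does in isolation, here is a minimal sketch on a plain numeric vector (made-up values). maxgap counts consecutive NAs, so on the hourly grid above maxgap = 3 fills runs of up to three missing hours:

```r
library(zoo)

x <- c(1, NA, 3, NA, NA, 6, NA, NA, NA, NA, 11)

filled_all <- na.approx(x)              # fills every interior run of NAs
filled_2   <- na.approx(x, maxgap = 2)  # leaves the run of four NAs untouched
filled_2
```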
Source 1: Creating a specific sequence of date/times in R. Answer by mnel on Sep 13 2012 and edit by Matt Dowle on Sep 13 2012
Source 2: Creating regular 15-minute time-series from irregular time-series. Answer by mnel on Sep 13 2012 and edit by Dirk Eddelbuettel on May 3 2012
library(zoo)
library(xts)
library(data.table)
library(devtools)
devtools::install_github("iembry-USGS/ie2misc")
library(ie2misc)
# iembry released a version of ie2misc so you should be able to install
# the package now
# `na.interp1` is a function that combines zoo's `na.approx` and pracma's
# `interp1`
The rest of the code assumes that your z zoo object has already been created:
## Source 1 begins
startdate <- as.character((start(z)))
# set the start date/time as the 1st entry in the time series and make
# this a character vector.
start <- as.POSIXct(startdate)
# transform the character vector to a POSIXct object
enddate <- as.character((end(z)))
# set the end date/time as the last entry in the time series and make
# this a character vector.
end <- as.POSIXct(enddate)
# transform the character vector to a POSIXct object
gridtime <- seq(from = start, by = 3600, to = end)
# create a sequence beginning with the start date/time with a 60 minute
# interval ending at the end date/time
## Source 1 ends
## Source 2 begins
timeframe <- data.frame(rep(NA, length(gridtime)))
# create 1 NA column spaced out by the gridtime to complement the single
# column of z
timelength <- xts(timeframe, order.by = gridtime)
# create a xts time series object using timeframe and gridtime
zDate <- merge(timelength, z)
# merge the z zoo object and the timelength xts object
## Source 2 ends
The next steps interpolate your data as requested.
Lines <- as.data.frame(zDate)
# to data.frame from zoo
Lines[, "D1"] <- rownames(Lines)
# create column named D1
Lines <- setDT(Lines)
# create data.table out of data.frame
setcolorder(Lines, c(3, 2, 1))
# set the column order as the 3rd column followed by the 2nd and 1st
# columns
Lines <- Lines[, 3 := NULL]
# remove the 3rd column
setnames(Lines, 2, "diff")
# change the name of the 2nd column to diff
Lines <- setDF(Lines)
# return to data.frame
rowsinterps1 <- which(is.na(Lines$diff))
# index of rows of Lines that have NA (to be interpolated)
xi <- as.numeric(as.POSIXct(Lines$D1[rowsinterps1]))
# the Date-Times for diff to be interpolated in numeric format
# (D1 holds the date-times as character, so convert via POSIXct first)
interps1 <- na.interp1(as.numeric(as.POSIXct(Lines$D1)), Lines$diff, xi = xi,
na.rm = FALSE, maxgap = 3)
# the interpolated values, where only gaps of up to 3 are filled
Lines[rowsinterps1, 2] <- interps1
# replace the NAs in diff with the interpolated diff values
Lines <- na.omit(Lines) # remove rows with NAs
Lines
This is the Lines data.frame:
Lines
D1 diff
1 2014-11-20 16:00:00 0.01000000
2 2014-11-20 17:00:00 0.02000000
3 2014-11-20 18:00:00 0.02500000
4 2014-11-20 19:00:00 0.03000000
25 2014-11-21 16:00:00 0.04000000
26 2014-11-21 17:00:00 0.06000000
27 2014-11-21 18:00:00 0.07333333
28 2014-11-21 19:00:00 0.08666667
29 2014-11-21 20:00:00 0.10000000

R: Create a function to add a water year column

I want to be able to create a water year column for a time series. The US water year is from Oct-Sept and is considered the year it ends on. For example the 2014 water year is from October 1, 2013 - September 30, 2014.
This is the US water year, but not the only water year. Therefore I want to enter a start month and have the water year calculated for each date.
For example if my data looks like
date
2008-01-01 00:00:00
2008-02-01 00:00:00
2008-03-01 00:00:00
2008-04-01 00:00:00
.
.
.
2008-12-01 00:00:00
I want my function to work something like:
wtr_yr <- function(data, start_month) {
  # does stuff
}
Then my output would be
wtr_yr(data, 2)
date wtr_yr
2008-01-01 00:00:00 2008
2008-02-01 00:00:00 2009
2008-03-01 00:00:00 2009
2008-04-01 00:00:00 2009
.
.
.
2009-01-01 00:00:00 2009
2009-02-01 00:00:00 2010
2009-03-01 00:00:00 2010
2009-04-01 00:00:00 2010
I started by breaking the date up into separate columns, but I don't think that is the best way to go about it. Any advice?
Thanks in advance!
We can use POSIXlt to come up with an answer.
wtr_yr <- function(dates, start_month=10) {
# start_month is 1-based; 10 (October) matches the US water year
# Convert dates into POSIXlt
dates.posix = as.POSIXlt(dates)
# Year offset
offset = ifelse(dates.posix$mon >= start_month - 1, 1, 0)
# Water year
adj.year = dates.posix$year + 1900 + offset
# Return the water year
adj.year
}
Let's now use this function in an example.
# Sample input vector
dates = c("2008-01-01 00:00:00",
"2008-02-01 00:00:00",
"2008-03-01 00:00:00",
"2008-04-01 00:00:00",
"2009-01-01 00:00:00",
"2009-02-01 00:00:00",
"2009-03-01 00:00:00",
"2009-04-01 00:00:00")
# Display the function output
wtr_yr(dates, 2)
# Combine the input and output vectors in a dataframe
df = data.frame(dates, wtr_yr=wtr_yr(dates, 2))
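A sketch of the same POSIXlt approach with start_month = 10 as the default (the US convention described in the question), checked against the question's example that the 2014 water year runs from October 1, 2013 to September 30, 2014:

```r
wtr_yr <- function(dates, start_month = 10) {
  dates.posix <- as.POSIXlt(dates)
  # $mon is 0-based, so dates in or after start_month belong to the next water year
  offset <- ifelse(dates.posix$mon >= start_month - 1, 1, 0)
  dates.posix$year + 1900 + offset
}

us <- c("2013-09-30", "2013-10-01", "2014-09-30", "2014-10-01")
wtr_yr(us)
```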
I had a similar problem a while back, but with fiscal years that started in October. I found this function, which also computes the quarters within the year. Since I only wanted it to output the fiscal year, I edited a tiny part of the function to do that. There is surely a cleaner or more efficient way of doing it, but this should work for smaller data sets. Here is the edited function:
getYearQuarter <- function(x,
firstMonth=7,
fy.prefix='FY',
quarter.prefix='Q',
sep='-',
level.range=c(min(x), max(x)) ) {
if(level.range[1] > min(x) | level.range[2] < max(x)) {
warning(paste0('The range of x is greater than level.range. Values ',
'outside level.range will be returned as NA.'))
}
quarterString <- function(d) {
year <- as.integer(format(d, format='%Y'))
month <- as.integer(format(d, format='%m'))
y <- ifelse(firstMonth > 1 & month >= firstMonth, year+1, year)
q <- cut( (month - firstMonth) %% 12, breaks=c(-Inf,2,5,8,Inf),
labels=paste0(quarter.prefix, 1:4)) # quarter label; not used in this edited version
return(paste0(fy.prefix, substring(y,3,4))) # e.g. "FY14"
}
vals <- quarterString(x)
levels <- unique(quarterString(seq(
as.Date(format(level.range[1], '%Y-%m-01')),
as.Date(format(level.range[2], '%Y-%m-28')), by='month')))
return(factor(vals, levels=levels, ordered=TRUE))
}
Your input vector should be of type Date; then specify the start month. Assuming you have a data frame (df) with the 'date' column as in your question, this should do the trick (the result is an ordered factor with labels such as "FY14"):
df$wtr_yr <- getYearQuarter(df$date, firstMonth=10)
You can also add a water year column with the water_year function from the lfstat package:
https://www.rdocumentation.org/packages/lfstat/versions/0.9.4/topics/water_year

How to select a time range on weekdays, along with the associated data in the next column

Here is an example of a subset of the data in the .csv files. There are three columns with no header. The first column is the date/time, the second column is the load [kW], and the third column is 1 = weekday, 0 = weekend/holiday.
9/9/2010 3:00 153.94 1
9/9/2010 3:15 148.46 1
I would like to write R code that selects the first and second columns within the time range 10:00 to 20:00 on all weekdays (when the third column is 1) in September, but I do not know the best and most efficient way to code it.
dt <- read.csv("file", header = FALSE, sep=",")
#Select a column with weekday designation = 1, weekend or holiday = 0
y <- data.frame(dt[,3])
#Select a column with timestamps and loads
x <- data.frame(dt[,1:2])
t <- data.frame(dt[,1])
#convert timestamps into readable format
s <- strptime("9/1/2010 0:00", format="%m/%d/%Y %H:%M")
e <- strptime("9/30/2010 23:45", format="%m/%d/%Y %H:%M")
range <- seq(s,e, by = "min")
df <- data.frame(range)
The OP asks for the "best and most efficient way to code" this without showing any "inefficient code", so @Justin is right.
It seems that the OP is new to R (and it's officially the summer of love), so I gave it a try; here is a solution (not sure about efficiency):
index <- c("9/9/2010 19:00", "9/9/2010 21:15", "10/9/2010 11:00", "3/10/2010 10:30")
index <- as.POSIXct(index, format = "%d/%m/%Y %H:%M")
set.seed(1)
Data <- data.frame(Date = index, load = rnorm(4, mean = 120, sd = 10), weeks = c(0, 1, 1, 1))
## Data
## Date load weeks
## 1 2010-09-09 19:00:00 113.74 0
## 2 2010-09-09 21:15:00 121.84 1
## 3 2010-09-10 11:00:00 111.64 1
## 4 2010-10-03 10:30:00 135.95 1
cond <- expression(format(Date, "%H:%M") < "20:00" &
format(Date, "%H:%M") > "10:00" &
weeks == 1 &
format(Date, "%m") == "09")
subset(Data, eval(cond))
## Date load weeks
## 3 2010-09-10 11:00:00 111.64 1
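A sketch of the same idea applied to data shaped like the question's CSV. The rows are hypothetical (only the first load value comes from the question); read.csv with header = FALSE would name the columns V1, V2 and V3, and the timestamps are assumed to be month/day/year as in the question's sample. Unlike the subset above, this version treats 10:00 and 20:00 as inclusive bounds:

```r
# Hypothetical rows shaped like the question's CSV (no header, so columns V1..V3)
dt <- data.frame(V1 = c("9/9/2010 3:00", "9/9/2010 10:00", "9/9/2010 20:00",
                        "9/11/2010 12:00", "10/4/2010 12:00"),
                 V2 = c(153.94, 160.10, 171.25, 140.00, 150.00),
                 V3 = c(1, 1, 1, 0, 1),
                 stringsAsFactors = FALSE)

# Parse the timestamps as month/day/year hour:minute
dt$time <- as.POSIXct(dt$V1, format = "%m/%d/%Y %H:%M", tz = "UTC")
hm <- format(dt$time, "%H:%M")  # zero-padded, so string comparison orders correctly

# Weekdays in September between 10:00 and 20:00 inclusive
keep <- dt$V3 == 1 &
  format(dt$time, "%m") == "09" &
  hm >= "10:00" & hm <= "20:00"

result <- dt[keep, c("V1", "V2")]
result
```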

Loading a time series into R

I would like to load the following data structure into R as a time series:
Date 06:00 07:00 .... 22:00
01.11.2011 1 4 .... 42
02.11.2011 6 2 .... 21
...
Can this be loaded into R? Do I need to transform my data?
Can anybody help me with this?
First create some data:
Lines <- "Date 06:00 07:00 08:00
01.11.2011 1 4 42
02.11.2011 6 2 21"
DF <- read.table(text = Lines, header = TRUE, check.names = FALSE)
Now create zoo object z using chron date/times:
library(zoo)
library(chron)
tt <- as.chron(outer(DF[[1]], names(DF)[-1], paste), format = "%d.%m.%Y %H:%M")
z <- zoo(c(as.matrix(DF[-1])), tt)
(Replacing as.chron with as.POSIXct would give POSIXct date/times.)
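A sketch of the POSIXct variant mentioned above, on the same sample data; the explicit format string and tz = "UTC" are assumptions:

```r
library(zoo)

Lines <- "Date 06:00 07:00 08:00
01.11.2011 1 4 42
02.11.2011 6 2 21"
DF <- read.table(text = Lines, header = TRUE, check.names = FALSE,
                 stringsAsFactors = FALSE)

# Build a "date time" string for every (row, column) cell
stamps <- outer(DF$Date, names(DF)[-1], paste)
tt <- as.POSIXct(stamps, format = "%d.%m.%Y %H:%M", tz = "UTC")

# c() flattens both matrices column by column, so values and times stay aligned;
# zoo then sorts the observations by time
z <- zoo(c(as.matrix(DF[-1])), tt)
z
```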
