Can I subset specific years and months directly from POSIXct datetimes? - r

I have time series data and I am trying to subset the following:
1) periods between specific years (beginning 12AM January 1 and ending 11pm December 31)
2) periods without specific months
These are two independent subsets I am trying to do.
Given the following dataframe:
test <- data.frame(seq(from = as.POSIXct("1983-03-09 01:00"), to = as.POSIXct("1985-01-08 00:00"), by = "hour"))
colnames(test) <- "DateTime"
test$Value <- sample(0:100, nrow(test), replace = TRUE)
I can first create Year and Month columns and use these to subset:
# Add year column
test$Year <- as.numeric(format(test$DateTime, "%Y"))
# Add month column
test$Month <- as.numeric(format(test$DateTime, "%m"))
# Subset specific year (1984 in this case)
sub1 = subset(test, Year!="1983" & Year!="1985")
# Subset specific months (April and May in this case)
sub2 = subset(test, Month=="4" | Month=="5")
However, I am wondering if there is a better way to do this directly from the POSIXct datetimes (without having to first create the Year and Month columns). Any ideas?

# keep everything except 1983 and 1985 (i.e. 1984), as in the question
sub1 <- subset(test, !(format(DateTime, "%Y") %in% c("1983", "1985")))
# keep April and May only
sub2 <- subset(test, as.numeric(format(DateTime, "%m")) %in% 4:5)
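Another option that avoids string formatting is to compare against the interval boundaries directly and to read the month from POSIXlt. This is only a sketch: the tz = "UTC" argument is an assumption added here so the result does not depend on the local time zone, and the "without specific months" case is read as dropping April and May.

```r
# Example data mirroring the question; tz = "UTC" is an assumption made here
test <- data.frame(DateTime = seq(from = as.POSIXct("1983-03-09 01:00", tz = "UTC"),
                                  to   = as.POSIXct("1985-01-08 00:00", tz = "UTC"),
                                  by   = "hour"))
test$Value <- sample(0:100, nrow(test), replace = TRUE)

# 1) Whole years: half-open interval [start, end) covers all of 1984
start <- as.POSIXct("1984-01-01 00:00", tz = "UTC")
end   <- as.POSIXct("1985-01-01 00:00", tz = "UTC")
sub1  <- test[test$DateTime >= start & test$DateTime < end, ]

# 2) Drop months: POSIXlt stores the month as 0-11, so add 1
mon  <- as.POSIXlt(test$DateTime)$mon + 1
sub2 <- test[!(mon %in% 4:5), ]
```

Using a half-open interval means the last hour of December 31 is included without having to spell out 23:00.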

Related

Loop in R using changing variable to write and name files

I am trying to create a loop in R that reads daily values of a netcdf file I have imported and converts them into annual sums, then creates a raster for each year. I have converted the netcdf into an array - this is named Biased_corrected.array in my code below. I am not sure how to include the variable 'year' in my file names as it changes with each iteration of the loop. I have tried using paste but this seems to be where it fails. Any suggestions?
# read in file specifying which days correspond to years
YearsDays <- read.csv("Data\\Years.csv") # a df with 49 obs. of 3 variables (year, start day, and end day
YearsDays[1,2:3] #returns 1 and 366 (the days for year 1972)
YearsDays[2,2:3] #returns 367 and 731 (the days for year 1973)
YearsDays[1,1] #returns 1972
YearsDays[2,1] #returns 1973
counter <- 1
startyear <- YearsDays[1,1]
year <- startyear
while(year < 2021){
#set variables to loop through
startday <- YearsDays[counter,2]
endday <- YearsDays[counter,3]
BC_rain.slice <- Biased_corrected.array[,,startday:endday]
paste(year, "_Annual_rain") <- apply(BC_rain.slice, c(1,2), sum)
#save data in a raster
paste(year, "_rain_r") <- raster(t(paste(year, "_Annual_Rain"), xmn=min(x), xmx=max(x), ymn=min(y), ymx=max(y), crs=WGS84)
# move on to next year
counter <- counter + 1
year <- 1971 + counter
}
EDIT: The working code, for anyone interested:
YearsDays <- read.csv("Data\\Years.csv") # a df with 49 obs. of 3 variables (year, start day, and end day)
for (idx in seq_len(nrow(YearsDays))) {
  # set variables to loop through
  year <- YearsDays[idx, 1]
  startday <- YearsDays[idx, 2]
  endday <- YearsDays[idx, 3]
  BC_rain.slice <- Biased_corrected.array[, , startday:endday]
  # annual sum per grid cell, computed once and reused below
  annual_rain <- apply(BC_rain.slice, c(1, 2), sum)
  # paste0() avoids the space that paste() would insert into the name
  assign(paste0(year, "_Annual_rain"), annual_rain)
  # save data in a raster
  assign(paste0(year, "_rain_r"), raster(t(annual_rain), xmn=min(x), xmx=max(x), ymn=min(y), ymx=max(y), crs=WGS84))
}
You can't use paste on the left-hand side of an assignment to create a variable name. You can wrap it in assign (or build the expression as text and eval it), but it is usually easier to store the results in a single container. Below is an example of what I believe you're trying to achieve, collecting the rasters in a named list (a raster object cannot be stored in an ordinary data frame cell). I have also replaced your while loop and counter with a for loop over the rows of YearsDays:
YearsDays <- read.csv("Data\\Years.csv") # a df with 49 obs. of 3 variables (year, start day, and end day)
# one list element per year, named by year
output <- vector("list", nrow(YearsDays))
names(output) <- YearsDays[, 1]
for (idx in seq_len(nrow(YearsDays))) {
  # set variables to loop through
  startday <- YearsDays[idx, 2]
  endday <- YearsDays[idx, 3]
  BC_rain.slice <- Biased_corrected.array[, , startday:endday]
  annual_rain <- apply(BC_rain.slice, c(1, 2), sum)
  # save data in a raster; t() takes only the matrix, the extent goes to raster()
  output[[idx]] <- raster(t(annual_rain), xmn=min(x), xmx=max(x), ymn=min(y), ymx=max(y), crs=WGS84)
}
# output[["1972"]] is then the raster for 1972
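For a version that runs without the NetCDF data or the raster package, here is a minimal sketch of collecting one result per year under a name built from the loop variable. The toy array, the two-row lookup table, and the "rain_" prefix are all assumptions made up for illustration; they stand in for Biased_corrected.array and Years.csv.

```r
# Toy stand-ins (assumptions): a 2 x 3 grid with 10 "days", every cell = 1,
# and a two-row lookup table in place of Years.csv
Biased_corrected.array <- array(1, dim = c(2, 3, 10))
YearsDays <- data.frame(year = c(1972, 1973), start = c(1, 5), end = c(4, 10))

annual <- list()
for (idx in seq_len(nrow(YearsDays))) {
  days  <- YearsDays$start[idx]:YearsDays$end[idx]
  slice <- Biased_corrected.array[, , days, drop = FALSE]
  # Build the element name from the loop variable
  key <- paste0("rain_", YearsDays$year[idx])
  # Sum over the third (day) dimension for each grid cell
  annual[[key]] <- apply(slice, c(1, 2), sum)
}
names(annual)
```

Each element can then be retrieved by name, e.g. annual[["rain_1972"]], and passed to raster() or written to a file named after the same key.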
How about replacing this part
paste(year, "_Annual_rain") <- apply(BC_rain.slice, c(1,2), sum)
#save data in a raster
paste(year, "_rain_r") <- raster(t(paste(year, "_Annual_Rain"), xmn=min(x), xmx=max(x), ymn=min(y), ymx=max(y), crs=WGS84)
with
# names that start with a digit are not syntactic, so they must be wrapped in backticks
txt <- paste0("`", year, "_Annual_rain` <- apply(BC_rain.slice, c(1,2), sum)")
eval(parse(text = txt))
# save data in a raster
txt <- paste0("`", year, "_rain_r` <- raster(t(`", year, "_Annual_rain`), xmn=min(x), xmx=max(x), ymn=min(y), ymx=max(y), crs=WGS84)")
eval(parse(text = txt))

R How to calculate the seasonal average value for each year using stackApply?

I wanted to calculate the seasonal average value for each year and not for the entire period. I defined my seasons as follows: DJF (December-February), MAM (March-May), JJA (June-August) and SON (September-November).
Inspired by the solution to Fredrick's question, I created an index, "groups", that represents the seasons and then applied stackApply, but this computes the seasonal average over the whole period. That is, the result contains only 4 layers, whereas I want the seasonal average for each year: 4 layers per year, so the resulting stack should have 136 layers (34 years x 4 seasons).
My code is below.
Thanks for your help.
library(raster)
set.seed(123)
r <- raster(ncol=10, nrow=10)
r_brick <- brick(sapply(1:408, function(i) setValues(r, rnorm(ncell(r), i, 3))))
dim(r_brick)
dates <- seq(as.Date("1982-01-01"), as.Date("2015-12-31"), by="month")
months <- format(dates, "%Y-%m")
groups <- function(x) {
d <- as.POSIXlt(x)
ans <- character(length(x))
ans[d$mon %in% c(11,0:1)] <- "DJF"
ans[d$mon %in% 2:4] <- "MAM"
ans[d$mon %in% 5:7] <- "JJA"
ans[d$mon %in% 8:10] <- "SON"
ans
}
data.frame(dates, groups(dates))
r_brick.s <- stackApply(r_brick, indices=groups(dates), fun=mean,na.rm=TRUE)
nlayers(r_brick.s)
Your example data
library(raster)
r <- raster(ncol=10, nrow=10)
b <- brick(sapply(1:408, function(i) setValues(r, rnorm(ncell(r), i, 3))))
dates <- seq(as.Date("1982-01-01"), as.Date("2015-12-31"), by="month")
As Majid points out, if you want to group by years, you need to use these
years <- as.integer(format(dates, "%Y"))
months <- as.integer(format(dates, "%m"))
Now the months need to be grouped. Note that as you start with December you must make sure to not combine January and December of the same year. Rather you want to combine December of year i with January and February of year i+1. Here is one way to accomplish that (make the year start in December!)
n <- length(months)
# shift every month forward by one (Dec -> 1, Jan -> 2, ..., Nov -> 12)
mnt <- c(months[-1], ifelse(months[n] < 12, months[n]+1, 1))
# shift the years the same way, so that December takes the following year
yrs <- c(years[-1], ifelse(months[n] < 12, years[n], years[n]+1))
# group by trimesters using integer division (or do: floor((mnt-1) / 3))
trims <- (mnt-1) %/% 3
# get names instead of 0, 1, 2, 3
trimnms <- c("DJF", "MAM", "JJA", "SON")[trims + 1]
Combine years and names
yt <- paste(yrs, trimnms, sep="_")
Use the index
s <- stackApply(b, indices=yt, fun=mean, na.rm=TRUE)
If the above business of shifting the months is difficult to follow, try it out with just a few dates (the first 15 or so).
Everything is OK with your script, except for the groups you are generating (i.e. DJF, ...). The index you are planning to use for groups should be unique for each target group while, at this stage, there is no difference between JJA for 1982 and JJA for 1983 and so on!
#data.frame(dates, paste(substr(dates, 1,4), groups(dates), sep="_"))
idx <- paste(substr(dates, 1,4), groups(dates), sep="_")
r_brick.s <- stackApply(r_brick, indices=idx, fun=mean, na.rm=TRUE)
nlayers(r_brick.s)
#136 that is the number of seasons
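To see how the two indices differ without building a raster, the grouping can be checked on the dates alone. The helper below, season_index, is a hypothetical name introduced here; it assigns December to the DJF season of the following year, which is one way to keep a winter together across the year boundary. With that convention the first and last winters are incomplete, so there are 137 groups rather than 136.

```r
dates <- seq(as.Date("1982-01-01"), as.Date("2015-12-31"), by = "month")

# Hypothetical helper: returns labels such as "1983_DJF"
season_index <- function(d) {
  lt  <- as.POSIXlt(d)
  mon <- lt$mon + 1L          # POSIXlt months are 0-11
  yr  <- lt$year + 1900L      # POSIXlt years count from 1900
  # December belongs to the winter of the following year
  yr  <- ifelse(mon == 12L, yr + 1L, yr)
  # mon %% 12 maps Dec -> 0, Jan -> 1, ..., Nov -> 11; integer division by 3
  # then gives 0 = DJF, 1 = MAM, 2 = JJA, 3 = SON
  season <- c("DJF", "MAM", "JJA", "SON")[(mon %% 12L) %/% 3L + 1L]
  paste(yr, season, sep = "_")
}

idx <- season_index(dates)
head(data.frame(dates, idx), 15)
length(unique(idx))
```

The resulting idx can be passed as the indices argument of stackApply in the same way as the indices above.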

Add 1 in column according to specific dates and count

I have a table including a time series of daily values (value), Date and a column with "0s". Here are the variables:
value <- c(37,19.75,19.5,14.5,24.75,25,25.5,19.75,19.75,14.25,21.25,21.75,17.5,16.25,14.5,
14.5,14.75,9.5,11.75,15.25,14.25,16.5,13.5,18.25,13.5,11.25,10.75,12,8.5,
9.75,14.75)
Date <- c("1997-05-01","1997-05-02","1997-05-03","1997-05-04","1997-05-05",
"1997-05-06","1997-05-07","1997-05-08","1997-05-09","1997-05-10",
"1997-05-11","1997-05-12","1997-05-13","1997-05-14","1997-05-15",
"1997-05-16","1997-05-17","1997-05-18","1997-05-19","1997-05-20",
"1997-05-21","1997-05-22","1997-05-23","1997-05-24","1997-05-25",
"1997-05-26","1997-05-27","1997-05-28","1997-05-29","1997-05-30",
"1997-05-31")
ncol <- c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)`
data <- data.frame(value, Date, ncol)
Date is converted to class Date with as.Date. Now I want to add 1 to the column "newcol" (which holds the 0s) on five specific days, e.g. on "1997.05.05", "1997.05.11", "1997.05.14", "1997.05.18" and "1997.05.25".
I wrote this code, but it only works for a single date:
x <- 1
i <- which(format(data$Date, "%Y.%m.%d") == "1997.05.05")
data$newcol[i] <- data$newcol[i] + x
What is the best way to do this for several dates at once?
Then I would like to count how many times "value" exceeds 20 in the 5 days up to and including each flagged date (newcol = 1). For example, for 1997.05.25, how many values from 1997.05.21 to 1997.05.25 are greater than 20?
This answers the 1st part of your question:
library(data.table)
setDT(data)[ Date %in% c("1997-05-05","1997-05-11","1997-05-14","1997-05-18","1997-05-25"), newcol := ncol+1 ]
# or perhaps better:
setDT(data)[, newcol := ifelse(Date %in% c("1997-05-05","1997-05-11","1997-05-14","1997-05-18","1997-05-25"), ncol+1, 0) ]
With base R this can be done as follows:
transform(data, newcol = as.integer(as.character(Date) %in%
c("1997-05-05","1997-05-11","1997-05-14","1997-05-18","1997-05-25")))
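The second part of the question (counting values above 20 in the 5 days ending on each flagged date) is not covered above. Here is a minimal base R sketch of one way to do it. The helper name count_above and its threshold and window arguments are made up for illustration, and the window is assumed to include the flagged date itself, as in the 05.21 to 05.25 example.

```r
value <- c(37,19.75,19.5,14.5,24.75,25,25.5,19.75,19.75,14.25,21.25,21.75,17.5,
           16.25,14.5,14.5,14.75,9.5,11.75,15.25,14.25,16.5,13.5,18.25,13.5,
           11.25,10.75,12,8.5,9.75,14.75)
Date <- seq(as.Date("1997-05-01"), by = "day", length.out = 31)
data <- data.frame(value, Date)

flag_dates <- as.Date(c("1997-05-05", "1997-05-11", "1997-05-14",
                        "1997-05-18", "1997-05-25"))
data$newcol <- as.integer(data$Date %in% flag_dates)

# Hypothetical helper: number of values above `threshold` in the `window`
# days ending on (and including) date `d`
count_above <- function(d, threshold = 20, window = 5) {
  in_window <- data$Date > d - window & data$Date <= d
  sum(data$value[in_window] > threshold)
}

counts <- vapply(seq_along(flag_dates),
                 function(i) count_above(flag_dates[i]),
                 integer(1))
data.frame(Date = flag_dates, n_above_20 = counts)
```

For the sample data this gives 2, 2, 2, 0 and 0 for the five dates, so the count for 1997-05-25 is 0.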

How to filter or subset specific date and time intervals in R? Lubridate?

I have a dataset consisting of two columns: a datetime column and a column with numerical values. It's a simple dataset, so I did not attach it.
What I need to do is filter or subset the data according to a class schedule, so that I get a data frame with the datetime values and numerical values only for the times when the class has lectures.
The class schedule is different for each day of the week, e.g. Mondays 8:00-9:50, 10:30-11:30, 14:50-15:50; Tuesdays 10:30-11:30, 14:10-15:30; Wednesdays... and so on.
Any idea how I could do this?
I usually convert datetime-values to POSIXct format, but recently I read about lubridate.
I am just still not sure how to efficiently subset with all these criteria.
Perhaps I should subset the data according to the weekdays first.
And then subset the different weekdays according to the lecture time...
Hope someone can help me.
BTW: The data is for all of 2014, so I actually have to avoid the data when the class have holidays as well...
Convert the class times to interval objects in lubridate, then subset by testing whether each datetime falls within an interval. (new_interval() from older lubridate versions is now interval().)
> library(lubridate)
> a <- interval(Sys.time(), Sys.time() + 120)
> Sys.time() %within% a
[1] TRUE
I will try this, where D$Time is in POSIXct format:
# Create column with weekday (names depend on the locale; English assumed here)
D$Weekday <- weekdays(D$Time)
# Subset weekdays
MO <- subset(D, Weekday == "Monday")
head(MO)
TU <- subset(D, Weekday == "Tuesday")
WE <- subset(D, Weekday == "Wednesday")
TH <- subset(D, Weekday == "Thursday")
FR <- subset(D, Weekday == "Friday")
SA <- subset(D, Weekday == "Saturday")
# Subset lecture of weekday (zero-padded HH:MM:SS strings compare correctly as text)
MO_L1 <- subset(MO, format(Time, "%H:%M:%S") > "07:55:00" &
format(Time, "%H:%M:%S") < "09:30:00")
head(MO_L1)
tail(MO_L1)
MO_L2 <- subset(MO, format(Time, "%H:%M:%S") > "10:55:00" &
format(Time, "%H:%M:%S") < "11:30:00")
And in the end, combine all the subsets into a new dataset...
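Instead of one subset per weekday and lecture, the schedule can be kept in a small table and applied in a loop. This is only a sketch in base R under some assumptions: the sample data, the tz = "UTC" setting, the helper to_minutes, and the two-day schedule table are all made up here, and holidays are not handled.

```r
# Sample data (assumption): 10-minute readings for Mon 6 and Tue 7 Jan 2014
D <- data.frame(Time = seq(as.POSIXct("2014-01-06 00:00", tz = "UTC"),
                           as.POSIXct("2014-01-07 23:50", tz = "UTC"),
                           by = "10 min"))
D$Value <- seq_len(nrow(D))

# Schedule table: wday follows POSIXlt (0 = Sunday, 1 = Monday, ...)
schedule <- data.frame(
  wday  = c(1, 1, 1, 2, 2),
  start = c("08:00", "10:30", "14:50", "10:30", "14:10"),
  end   = c("09:50", "11:30", "15:50", "11:30", "15:30"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: "HH:MM" -> minutes since midnight
to_minutes <- function(hhmm) {
  parts <- strsplit(hhmm, ":", fixed = TRUE)
  vapply(parts, function(p) as.numeric(p[1]) * 60 + as.numeric(p[2]), numeric(1))
}

lt   <- as.POSIXlt(D$Time)
mins <- lt$hour * 60 + lt$min

# A row is kept if it falls inside any lecture slot on its weekday
in_class <- rep(FALSE, nrow(D))
for (k in seq_len(nrow(schedule))) {
  in_class <- in_class | (lt$wday == schedule$wday[k] &
                          mins >= to_minutes(schedule$start[k]) &
                          mins <= to_minutes(schedule$end[k]))
}
lectures <- D[in_class, ]
```

Using the numeric wday field avoids depending on locale-specific weekday names. Holidays could be excluded afterwards by dropping rows whose as.Date(Time) appears in a vector of holiday dates.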

Creating a sequence of columns in a data frame based on an index for loop or using plyr in r

I wish to create 24 hourly data frames in which each data.frame contains hourly demand for a product as 1 column, and the next 8 columns contain hourly temperatures. For example, for the data.frame for 8am, the data.frame will contain a column for demand at 8am, then eight columns for temperature ranging from the most current hour to the 7 past hours. The additional complication is that for hours before 8AM i.e. "4AM", I have to get yesterday's temperatures. I am hitting my head against the wall trying to figure out how to do this with apply or plyr, or a vectorized function.
demand8AM Temp8AM Temp7AM Temp6AM...Temp1AM
Demand4AM Temp4AM Temp3AM Temp2AM Temp1AM Temp12AM Temp11pm(Lag) Temp10pm(Lag)
In my code Hours are numbers; 1 is 12AM etc.
Here is some simple code I created to create the dataset I am dealing with.
#Creating some Fake Data
require(plyr)
# setting up some fake data
set.seed(31)
foo <- function(myHour, myDate){
rlnorm(1, meanlog=0,sdlog=1)*(myHour) + (150*myDate)
}
Hour <- 1:24
Day <-1:90
dates <-seq(as.Date("2012-01-01"), as.Date("2012-3-30"), by = "day")
myData <- expand.grid( Day, Hour)
names(myData) <- c("Date","Hour")
myData$Temperature <- apply(myData, 1, function(x) foo(x[2], x[1]))
myData$Date <-dates
myData$Demand <- rnorm(nrow(myData), mean = 0, sd = 1) + .75 * myData$Temperature
## ok, done with the fake data generation.
It looks as though you could benefit from using a time series class. Here is my interpretation of what you want, which may not be exactly what you asked for: build lagged copies of the temperature series. I recommend reading up on the xts and zoo packages.
library(xts)
# create dummy time vector
time_index <- seq(from = as.POSIXct("2012-05-15 07:00"),
to = as.POSIXct("2012-05-17 18:00"), by = "hour")
# create dummy demand and temp.C, one value per hour
info <- data.frame(demand = sample(seq_along(time_index), replace = TRUE),
temp.C = sample(1:10, length(time_index), replace = TRUE))
# turn demand + temp.C into a time series
eventdata <- xts(info, order.by = time_index)
# bind the temperature with its lags of 1 to 8 hours, one column per lag
x2 <- eventdata$temp.C
for (i in 1:8) {x2 <- cbind(x2, lag(eventdata$temp.C, i))}
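The same idea can be sketched in base R without xts. Because the hourly series is continuous, a lag automatically reaches back into the previous day for the early hours. The data, the helper lag_vec, and the column names TempLag1 to TempLag7 are assumptions made up for this illustration.

```r
set.seed(31)
# Three days of hourly dummy data (assumption); tz fixed for reproducibility
time_index <- seq(as.POSIXct("2012-01-01 00:00", tz = "UTC"),
                  by = "hour", length.out = 72)
hourly <- data.frame(Time = time_index,
                     Temperature = round(rnorm(72, mean = 10, sd = 3), 1))
hourly$Demand <- 0.75 * hourly$Temperature + rnorm(72)

# Hypothetical helper: shift x down by k positions (k >= 1), padding with NA
lag_vec <- function(x, k) c(rep(NA, k), head(x, -k))

# Current temperature plus the 7 previous hours gives 8 temperature columns
for (k in 1:7) {
  hourly[[paste0("TempLag", k)]] <- lag_vec(hourly$Temperature, k)
}

# One data frame per hour of day, named "00" to "23"
by_hour <- split(hourly, format(hourly$Time, "%H"))
by_hour[["04"]]
```

In by_hour[["04"]], the lags beyond 4 hours come from the evening of the previous day, and the first day's rows contain NA where no earlier data exist.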
