Removing specific times and days every week from time dataframe - r

I've been learning R for a couple of months and have run into an issue that I can't find covered on Stack Overflow yet. I have a minute-level timestamp dataset defined by:
ts <- seq.POSIXt(as.POSIXlt("2014-08-01 15:00"), as.POSIXlt("2017-08-04 19:33"), by="min")
ts <- format.POSIXct(ts,'%Y%m%d %H%M')
df <- data.frame(timestamp=ts)
I have seen how to remove specific times from every day, and how to remove complete days such as weekends or holidays, but I am looking to remove a recurring window from every week: specifically, 8:00 on every Saturday to 9:00 on every Monday, throughout the entire dataset. I have tried doing the reverse, subsetting the period I need using lubridate (thanks @Christian):
dfc = ymd_hm(df$timestamp)
df[day(dfc) == 2 & hour(dfc) >= 9 | day(dfc) == 7 & hour(dfc) >= 8,]
but it didn't seem to work.
Cheers.

You can't use square brackets with lubridate's accessors. They are called like regular functions. Try replacing e.g. hour[dfc] with hour(dfc) and you should be fine.
Edit: to subset a range, be aware that == is not the same as >=.
Edit 2: a bit more of a pointer in the right direction:
library(lubridate)
# by = 1 steps in seconds; 2014-08-02 is a Saturday, 2014-08-04 a Monday
ts_sat_until_monday = seq.POSIXt(as.POSIXlt("2014-08-02 09:00"),
                                 as.POSIXlt("2014-08-04 08:00"), by = 1)
unique(day(ts_sat_until_monday))
unique(hour(ts_sat_until_monday))
# What about Sunday? Up to you.
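One possible way to express the weekly window from the question (Saturday 08:00 up to Monday 09:00) is to convert each timestamp to "minutes since Sunday 00:00" and drop everything inside the window. This is only a sketch in base R, not the answerer's method: the names min_of_week, in_window and df_kept are made up for illustration, the series is shortened to two weeks, and UTC is assumed so that daylight-saving changes do not interfere.

```r
# Two weeks of minute-level timestamps (shortened from the question's range)
ts <- seq(as.POSIXct("2014-08-01 15:00", tz = "UTC"),
          as.POSIXct("2014-08-15 15:00", tz = "UTC"), by = "min")
df <- data.frame(timestamp = ts)

lt <- as.POSIXlt(df$timestamp)
# Minutes elapsed since Sunday 00:00 (POSIXlt wday: 0 = Sunday ... 6 = Saturday)
min_of_week <- lt$wday * 1440 + lt$hour * 60 + lt$min

sat_0800 <- 6 * 1440 + 8 * 60  # Saturday 08:00
mon_0900 <- 1 * 1440 + 9 * 60  # Monday 09:00

# The window wraps around the week boundary:
# Saturday 08:00 to end of week, plus start of week to Monday 09:00
in_window <- min_of_week >= sat_0800 | min_of_week < mon_0900
df_kept <- df[!in_window, , drop = FALSE]
```

With this half-open window, Saturday 07:59 is kept, Saturday 08:00 through Monday 08:59 is removed, and Monday 09:00 is kept again.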

Related

Next week day for a given vector of dates

I'm trying to get the next weekday for a vector of dates in R. My approach was to create a vector of weekdays and then locate each weekend date I have within it. The problem is that for Saturdays and some holidays (of which there are a lot in my country) I end up getting the previous weekday, which doesn't work.
This is an example of my problem:
vecDates = as.Date(c("2011-01-11","2011-01-12","2011-01-13","2011-01-14","2011-01-17","2011-01-18",
"2011-01-19","2011-01-20","2011-01-21","2011-01-24"))
testDates = as.Date(c("2011-01-22","2011-01-23"))
findInterval(testDates,vecDates)
For both dates the correct answer should be 10, which is "2011-01-24", but I get 9.
I thought of a solution where I remove all the dates before the one I'm analyzing and then use findInterval. It works, but it is not vectorized and is therefore too slow for my actual purpose.
Does this do what you want?
vecDates = as.Date(c("2011-01-11","2011-01-12",
"2011-01-13","2011-01-14",
"2011-01-17","2011-01-18",
"2011-01-19","2011-01-20",
"2011-01-21","2011-01-24"))
testDates = as.Date(c("2011-01-20","2011-01-22","2011-01-23"))
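get_next_biz_day <- function(testdays, bizdays){
  # findInterval gives the index of the last business day <= each test day;
  # adding 1 moves to the next business day
  o <- findInterval(testdays, bizdays) + 1
  bizdays[o]
}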
get_next_biz_day(testDates, vecDates)
#[1] "2011-01-21" "2011-01-24" "2011-01-24"
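Note that the function above returns the business day strictly after each test date, so a test date that is itself a business day (2011-01-20) maps to the following one. If the same day should be returned in that case, one option is findInterval's left.open argument (available since R 3.3.0). This is a sketch under that assumption; get_biz_day_on_or_after is a hypothetical name, not part of the original answer.

```r
vecDates <- as.Date(c("2011-01-11", "2011-01-12", "2011-01-13", "2011-01-14",
                      "2011-01-17", "2011-01-18", "2011-01-19", "2011-01-20",
                      "2011-01-21", "2011-01-24"))
testDates <- as.Date(c("2011-01-20", "2011-01-22", "2011-01-23"))

# Hypothetical helper: first business day on or after each test date
get_biz_day_on_or_after <- function(testdays, bizdays) {
  # left.open = TRUE uses intervals (b[i], b[i+1]], so an exact match on b[j]
  # yields j - 1, and adding 1 returns b[j] itself
  o <- findInterval(testdays, bizdays, left.open = TRUE) + 1
  bizdays[o]  # NA for dates after the last business day
}

res <- get_biz_day_on_or_after(testDates, vecDates)
```

Here res holds 2011-01-20, 2011-01-24 and 2011-01-24.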

R - For loop over large list of elements

I have split my large data set by date like so to create a large list of several elements:
days <- split(df, df$Date)
My data has columns including time of sunrise, sunset etc. for each day. I now want to use a for loop to do further work on each day separately like this:
for(i in 1:length(days)){
  sunrisetime <- as.character(df$Sunrise[1])
  # Further similar work (using time of sunrise & sunset for each date to split
  # into daytime hours and nighttime hours)
}
My question is about the df$Sunrise on the second line - I don't think this is the right code to use when trying to access the sunrise time of each day on the days list. I have tried all sorts of variations but am an R newbie so must just be hitting the wrong terms.
Thanks in advance.
sunrisetime <- rep(NA_character_, length(days))
for (i in seq_along(days)) {
  # days[[i]] is the data frame for one date; take its first Sunrise value
  sunrisetime[i] <- as.character(days[[i]]$Sunrise[1])
}
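The same per-day lookup can also be written without an explicit loop. The sketch below uses vapply over the split list; the toy data frame and its values are invented purely for illustration, and only the column names Date and Sunrise come from the question.

```r
# Toy data: two rows per day, one sunrise time per day (made-up values)
df <- data.frame(
  Date    = rep(c("2020-01-01", "2020-01-02"), each = 2),
  Sunrise = rep(c("08:05", "08:06"), each = 2),
  stringsAsFactors = FALSE
)

days <- split(df, df$Date)

# One sunrise string per element of the list, named by date
sunrisetime <- vapply(days, function(d) as.character(d$Sunrise[1]), character(1))
```

Inside the function, d plays the role of days[[i]] in the loop, so any further per-day work (splitting into daytime and nighttime hours) can go there.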

Creating a specific sequence of date/times in R

I want to create a single column with a sequence of date/time increasing every hour for one year or one month (for example). I was using a code like this to generate this sequence:
start.date<-"2012-01-15"
start.time<-"00:00:00"
interval<-60 # 60 minutes
increment.mins<-interval*60
x<-paste(start.date,start.time)
for(i in 1:365){
print(strptime(x, "%Y-%m-%d %H:%M:%S")+i*increment.mins)
}
However, I am not sure how to specify the range of the sequence of dates and hours. I have also been having problems dealing with the first hour, "00:00:00". What is the best way to specify the length of the date/time sequence for a month, a year, etc.? Any suggestions will be appreciated.
I would strongly recommend using the POSIXct data type. That way you can use seq without any problems and use the data however you want.
start <- as.POSIXct("2012-01-15")
interval <- 60
end <- start + as.difftime(1, units="days")
seq(from=start, by=interval*60, to=end)
Now you can do whatever you want with your vector of timestamps.
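To cover a whole month (or year) rather than one day, the end point can itself be computed with seq. This is a sketch assuming UTC to sidestep daylight-saving jumps; end_month and hourly are illustrative names.

```r
start <- as.POSIXct("2012-01-15 00:00:00", tz = "UTC")

# Same day and time one calendar month later; use by = "year" for one year
end_month <- seq(start, by = "month", length.out = 2)[2]

# Hourly timestamps from start to end_month, both ends included
hourly <- seq(from = start, to = end_month, by = "hour")

# A single midnight value prints without its time part by default;
# an explicit format shows it
first_stamp <- format(hourly[1], "%Y-%m-%d %H:%M:%S")
```

The last line may be relevant to the "00:00:00" issue in the question: the midnight value is stored correctly, it is only the default formatting that drops the time.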
Try this. mondate is very clever about advancing by a month. For example, it will advance the last day of January to the last day of February, whereas other date/time classes tend to overshoot into March. chron does not use time zones, so you can't get the time zone bugs that you can when using POSIXct. Here x is from the question.
library(chron)
library(mondate)
start.time.num <- as.numeric(as.chron(x))
# +1 means one month. Use +12 if you want one year.
end.time.num <- as.numeric(as.chron(paste(mondate(x)+1, start.time)))
# 1/24 means one hour. Change as needed.
hours <- as.chron(seq(start.time.num, end.time.num, 1/24))

Obtaining or subsetting the first 5 minutes of each day of data from an xts

I would like to subset the first 5 minutes of each day from minutely time series data. However, the first 5 minutes do not occur at the same time each day, so something like xtsobj["T09:00/T09:05"] would not work: sometimes the data starts at 9:20am or some other arbitrary time in the morning instead of 9am.
So far, I have been able to subset out the first minute for each day using a function like:
k <- diff(index(xtsobj))> 10000
xtsobj[c(1, which(k)+1)]
i.e. finding gaps in the data that are larger than 10000 seconds. Going from that to the first 5 minutes of each day is proving more difficult, as the data is not always evenly spaced: between the first minute and the fifth minute there could be anywhere from 2 to 5 rows, and thus using something like:
xtsobj[c(1, which(k)+6)]
and then binding the results together is not always accurate. I was hoping that a function like 'first' could be used, but wasn't sure how to apply it across multiple days. Is there a better way of obtaining this information?
Many thanks to the Stack Overflow community in advance.
split(xtsobj, "days") will create a list with an xts object for each day.
Then you can apply head to each day to get its first 5 rows:
lapply(split(xtsobj, "days"), head, 5)
or more generally
lapply(split(xtsobj, "days"), function(x) {
x[1:5, ]
})
Finally, you can rbind the days back together if you want.
do.call(rbind, lapply(split(xtsobj, "days"), function(x) x[1:5, ]))
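Since head and x[1:5, ] select by row count rather than by elapsed time, irregularly spaced data may need a time-based filter instead. The sketch below works on a plain POSIXct vector in base R; the timestamps are invented, and the assumption is that the resulting logical vector could be computed from index(xtsobj) and used to subset the xts object.

```r
# Made-up, irregularly spaced timestamps over two days
idx <- as.POSIXct(c("2020-01-02 09:20:00", "2020-01-02 09:22:00",
                    "2020-01-02 09:24:59", "2020-01-02 09:25:00",
                    "2020-01-03 09:00:00", "2020-01-03 09:04:00",
                    "2020-01-03 09:06:00"), tz = "UTC")

day <- format(idx, "%Y-%m-%d")

# Earliest timestamp of each day, repeated for every observation of that day
first_of_day <- ave(as.numeric(idx), day, FUN = min)

# TRUE for observations less than 5 minutes after that day's first observation
keep <- as.numeric(idx) - first_of_day < 5 * 60

idx[keep]
```

The cutoff is half-open, so an observation exactly 5 minutes after the day's first one (09:25:00 above) is excluded.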
How about using the lubridate package? First find the starting point of each day (which, as you say, varies more or less randomly), and then use the function minutes.
It would be something like:
five_minutes_after = starting_point_each_day + minutes(5)
Then you can use the usual subset of xts doing something like:
five_min_period = paste(starting_point_each_day,five_minutes_after,sep='/')
xtsobj[five_min_period]
Edit:
@Joshua
I think this works. Look at this example:
library(xts)
library(lubridate)
x <- xts(cumsum(rnorm(20, 0, 0.1)), Sys.time() - seq(60,1200,60))
starting_point_each_day= index(x[1])
five_minutes_after = index(x[1]) + minutes(5)
five_min_period = paste(starting_point_each_day,five_minutes_after,sep='/')
x[five_min_period]
In my previous example I made a mistake: I put five_min_period between quotes.
Was that what you were pointing out, Joshua? Also, maybe the starting point is not necessary, just:
until5min=paste('/',five_minutes_after,sep="")
x[until5min]

How do I remove data from a certain weekday period from an R time series?

I have an R xts time series. How can I create a new time series from it that contains all the data from the original, except the data points occurring on Mondays between 12:00 and 18:00?
Here's one way to do it.
library(xts)
x <- .xts(rnorm(100), as.POSIXct("2011-01-06 10:00:00")-100:1*3600)
# wday == 1 is Monday; hour >= 12 so that 12:00-12:59 is excluded as well
x[with(as.POSIXlt(index(x)), !(wday==1 & hour >= 12 & hour < 18)),]
And if you only need the times between 12:00-18:00 you can use xts-subsetting like this:
x["T12:00/T18:00"]
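To check how the Monday mask behaves at the boundaries, it can be tried on a handful of hand-picked timestamps. This is a base R sketch on a plain POSIXct vector; the timestamps are made up (2011-01-03 is a Monday), UTC is assumed, and the window is treated as half-open, i.e. 12:00 inclusive to 18:00 exclusive.

```r
idx <- as.POSIXct(c("2011-01-03 11:59:00", "2011-01-03 12:00:00",
                    "2011-01-03 17:59:00", "2011-01-03 18:00:00",
                    "2011-01-04 13:00:00"), tz = "UTC")

lt <- as.POSIXlt(idx)

# TRUE for Monday observations from 12:00 up to, but not including, 18:00
drop <- lt$wday == 1 & lt$hour >= 12 & lt$hour < 18

idx[!drop]
```

Only the 12:00 and 17:59 Monday observations are dropped. If 18:00 itself should be removed too, the condition would need to compare minutes as well as hours.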
