How do I remove data from a certain weekday period from an R time series?

I have an R xts time series. How can I create a new time series from it that contains all the data from the original except the data points occurring on Monday between 12:00 and 18:00?

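Here's one way to do it:
library(xts)
x <- .xts(rnorm(100), as.POSIXct("2011-01-06 10:00:00") - 100:1 * 3600)
# wday: 0 = Sunday, so 1 = Monday; hour >= 12 & hour < 18 drops 12:00 through 17:59
lt <- as.POSIXlt(index(x))
x[!(lt$wday == 1 & lt$hour >= 12 & lt$hour < 18), ]
If you instead want to keep only the times between 12:00 and 18:00 (on every day, not only Mondays), you can use xts time-of-day subsetting like this:
x["T12:00/T18:00"]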
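The same mask can be wrapped in a small helper for other weekday windows. This is a sketch, not part of the original answer: the function name keep_outside_window is hypothetical, it uses only base R's POSIXlt fields (wday runs from 0 = Sunday to 6 = Saturday), and it treats the window as [from_hour, to_hour), so 18:00 itself is kept.

```r
# Hypothetical helper: TRUE for timestamps OUTSIDE the weekly window
# [from_hour, to_hour) on weekday `wday` (0 = Sunday, 1 = Monday, ..., 6 = Saturday).
keep_outside_window <- function(times, wday = 1, from_hour = 12, to_hour = 18) {
  lt <- as.POSIXlt(times)
  !(lt$wday == wday & lt$hour >= from_hour & lt$hour < to_hour)
}

# One week of hourly timestamps starting Monday 2011-01-03 00:00 UTC
times <- seq(as.POSIXct("2011-01-03 00:00", tz = "UTC"), by = "hour", length.out = 168)
keep <- keep_outside_window(times)
sum(!keep)   # 6 hourly points dropped: Monday 12:00 to 17:00
```

With an xts object the returned logical vector subsets rows directly, e.g. x[keep_outside_window(index(x)), ].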

Related

Filtering an h2o dataset by date when the column is imported as time in R

I have a .csv that I am importing into h2o which has dates stored in "YYYY-mm-dd" format. When I import this into h2o through R, these columns are read in as time (milliseconds since 1970), as explained in this issue: https://0xdata.atlassian.net/browse/PUBDEV-3434.
> head(data.hex$date_used_dt)
date_used_dt
1 1489449600000
2 1520380800000
3 1469491200000
4 1465862400000
5 1464912000000
6 1516147200000
I need to turn this column into a date format. h2o.as_date() does not work because this column is not a factor or string. Is there a function that converts an h2o time column to a date within h2o, something like h2o.as_date() but for time columns? I need to keep this dataset in h2o.
All dates within h2o are represented like this. Even if you have a character column of dates ("2018-01-01") and use h2o.as_date(), the result is stored in milliseconds.
If you want to filter on dates, use the h2o.day, h2o.month and h2o.year functions. For example,
data.hex[h2o.day(data.hex$date_used_dt) == 5, ]
keeps only the 5th day of every month, and any combination of year and month works too, e.g.
data.hex[h2o.year(data.hex$date_used_dt) == 2017 & h2o.month(data.hex$date_used_dt) == 12, ]
keeps just December 2017.
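To check what those stored numbers mean, or to convert them once the frame has been pulled into plain R (for example with as.data.frame()), dividing by the number of milliseconds in a day gives whole days since 1970-01-01. This is a base-R sketch that runs outside h2o, so it does not satisfy the keep-it-in-h2o requirement; it assumes the values are midnight UTC, which matches the sample shown in the question.

```r
# Sample values from head(data.hex$date_used_dt) in the question (ms since 1970-01-01 UTC)
ms <- c(1489449600000, 1520380800000, 1469491200000,
        1465862400000, 1464912000000, 1516147200000)

# 86,400,000 ms per day -> whole days since the epoch -> Date
dates <- as.Date(ms / 86400000, origin = "1970-01-01")
format(dates)
# -> 2017-03-14, 2018-03-07, 2016-07-26, 2016-06-14, 2016-06-03, 2018-01-17

# Base-R analogues of the h2o.day()/h2o.year()/h2o.month() filters above
format(dates, "%d") == "05"                                   # 5th day of the month
format(dates, "%Y") == "2017" & format(dates, "%m") == "12"   # December 2017
```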

Reading Time series data in R

I am trying to import time series data in R with the code below. The data run from 1-7-2014 to 30-4-2017, i.e. 1035 data points, but the code below gives 1093 observations.
series <- ts(data1, start=c(2014,7,1), end=c(2017,4,30), frequency = 365)
Can someone help me understand where I am going wrong?
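ts() doesn't accept start and end in this form. Each must be either a single number (a time in years here) or a vector of two integers; in the second case they are the year and the period within the year, which with frequency = 365 is the day number counted from 1 January. Any further elements are ignored, so c(2014,7,1) is read as the 7th day of 2014.
With the help of lubridate you can use the following. decimal_date() converts a date to a decimal (fractional) year, which is a form ts() accepts; the + 1 on the end date makes up for 29 February 2016, which a frequency of 365 cannot represent.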
library(lubridate)
series <- ts(data1, start=decimal_date(as.Date("2014-07-01")), end=decimal_date(as.Date("2017-04-30") + 1), frequency = 365)
> length(series)
[1] 1035
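A sketch that reproduces the 1093 and shows a lubridate-free alternative. The data1 vector here is a hypothetical stand-in for the question's data; the alternative gives start as (year, day of year) and lets the data length fix the end.

```r
# Hypothetical stand-in for data1: one value per day from 2014-07-01 to 2017-04-30
data1 <- rnorm(1035)

# The question's call: only the first two elements of start/end are used, so this
# runs from day 7 of 2014 to day 4 of 2017, and data1 is recycled to fill it.
too_long <- ts(data1, start = c(2014, 7, 1), end = c(2017, 4, 30), frequency = 365)
length(too_long)   # 1093

# Alternative: 1 July 2014 is day 182 of 2014; the end follows from length(data1).
series <- ts(data1, start = c(2014, 182), frequency = 365)
length(series)     # 1035
```

As with any frequency = 365 series, 29 February 2016 has no slot of its own, so the time labels after it are shifted by one day; the number of observations is unaffected.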

Removing specific times and days every week from a time data frame

I have been learning R for a couple of months and ran into an issue I can't find on Stack Overflow yet. I have a time-frame dataset created by:
ts <- seq.POSIXt(as.POSIXlt("2014-08-01 15:00"), as.POSIXlt("2017-08-04 19:33"), by="min")
ts <- format.POSIXct(ts,'%Y%m%d %H%M')
df <- data.frame(timestamp=ts)
I have seen how to remove specific times from every day, and how to remove complete days such as weekends or holidays, but I want to remove a recurring block from every week: from 8:00 every Saturday to 9:00 every Monday, throughout the entire dataset. I have tried doing the reverse, subsetting that period with lubridate (thanks @Christian):
dfc = ymd_hm(df$timestamp)
df[day(dfc) == 2 & hour(dfc) >= 9 | day(dfc) == 7 & hour(dfc) >= 8,]
but it didn't seem to work.
Cheers.
You can't call lubridate functions with square brackets; they are regular functions. Replace e.g. hour[dfc] with hour(dfc) and you should be fine.
Edit: when subsetting a range, be aware that == matches a single value while >= matches that value and everything above it.
Edit 2: a bit more of a pointer in the right direction:
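library(lubridate)
# Minutes from Saturday 08:00 to Monday 09:00 (2 and 4 August 2014)
ts_sat_until_monday <- seq.POSIXt(as.POSIXlt("2014-08-02 08:00"),
  as.POSIXlt("2014-08-04 09:00"), by = "min")
# wday() gives the weekday (1 = Sunday ... 7 = Saturday); day() is the day of the month
unique(wday(ts_sat_until_monday))
unique(hour(ts_sat_until_monday))
# what about Sunday? Up to you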
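A complete version of the filter, sketched in base R so it does not depend on lubridate's week-start setting. Two assumptions: timestamps are parsed as UTC (to avoid daylight-saving gaps), and the window runs from Saturday 08:00 up to but not including Monday 09:00. POSIXlt's wday counts 0 = Sunday to 6 = Saturday. Note that lubridate's day(), used in the question, returns the day of the month; the weekday accessor is wday(), which by default counts 1 = Sunday to 7 = Saturday.

```r
# One week of minute stamps in the question's "%Y%m%d %H%M" format (UTC assumed)
ts <- seq(as.POSIXct("2014-08-01 15:00", tz = "UTC"),
          as.POSIXct("2014-08-08 15:00", tz = "UTC"), by = "min")
df <- data.frame(timestamp = format(ts, "%Y%m%d %H%M"))

# Parse back and pull out weekday (0 = Sunday ... 6 = Saturday) and hour
lt <- as.POSIXlt(df$timestamp, format = "%Y%m%d %H%M", tz = "UTC")
in_window <- (lt$wday == 6 & lt$hour >= 8) |  # Saturday from 08:00
             lt$wday == 0 |                   # all of Sunday
             (lt$wday == 1 & lt$hour < 9)     # Monday before 09:00

# drop = FALSE keeps a one-column data frame from turning into a vector
df_kept <- df[!in_window, , drop = FALSE]
nrow(df) - nrow(df_kept)   # 2940 minutes removed (16 h + 24 h + 9 h)
```

The drop = FALSE matters here: with a single column, df[rows, ] returns a plain character vector rather than a data frame.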

How do I make periods out of times in R?

I have 10 million+ data points which look like:
Identifier Times Data
6597104 2015-05-01 04:08:05 0.15512575543732
To study these, I want to add a Period (1, 2, ...) column so that the oldest row with identifier 6597104 is period 1, the second oldest is period 2, and so on. However, the times are irregular, so I can't just make it a time series object.
Does anyone know how to do this? Thanks in advance
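Let's call your data frame data.
First sort it from oldest to newest (order() returns row positions; sort() returns the sorted values themselves, which can't be used as row indices):
data <- data[order(data$Times), ]
Then add a new column called Period, numbering the rows 1, 2, ... (a vectorised assignment; a row-by-row loop is very slow on 10 million rows):
data$Period <- seq_len(nrow(data))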
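The answer above numbers all rows in one sequence. To restart the count for each Identifier, as the question asks, one option is to sort by Identifier and then Times, and number the rows within each group with ave() and seq_along(). This is a sketch on a small hypothetical data frame, not the asker's data.

```r
# Hypothetical sample with two identifiers and irregular times
data <- data.frame(
  Identifier = c(6597104, 6597104, 1234567, 6597104, 1234567),
  Times = as.POSIXct(c("2015-05-01 04:08:05", "2015-05-01 03:00:00",
                       "2015-05-02 10:00:00", "2015-05-03 00:00:00",
                       "2015-05-01 09:30:00"), tz = "UTC"),
  Data = c(0.155, 0.210, 0.330, 0.470, 0.580)
)

# Oldest first within each identifier
data <- data[order(data$Identifier, data$Times), ]

# Number rows 1, 2, ... separately within each Identifier
data$Period <- ave(seq_len(nrow(data)), data$Identifier, FUN = seq_along)
data
```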

R - For loop over large list of elements

I have split my large data set by date, like so, to create a large list with one element per date:
days <- split(df, df$Date)
My data has columns including time of sunrise, sunset etc. for each day. I now want to use a for loop to do further work on each day separately like this:
for (i in 1:length(days)) {
sunrisetime <- as.character(df$Sunrise[1])
# Further similar work (using time of sunrise & sunset for each date to split
# into daytime hours and nighttime hours)
}
My question is about df$Sunrise on the second line: I don't think this is the right way to access the sunrise time of each day in the days list. I have tried all sorts of variations, but I am an R newbie, so I must just be searching with the wrong terms.
Thanks in advance.
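sunrisetime <- rep(NA_character_, length(days))
for (i in seq_along(days)) {
  # days[[i]] is the data frame for the i-th date, so take Sunrise from it, not from df
  sunrisetime[i] <- as.character(days[[i]]$Sunrise[1])
}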
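A loop-free alternative, sketched on a small hypothetical data frame (the column names Date, Time, Sunrise and Sunset and the "HH:MM" string format are assumptions): vapply() pulls one sunrise value out of each day, and lapply() applies per-day work such as flagging daytime rows. Comparing times as strings only works because they are zero-padded "HH:MM"; with real date-times, compare POSIXct values instead.

```r
# Hypothetical data: one row per observation, with that day's sunrise/sunset
df <- data.frame(
  Date    = c("2020-06-01", "2020-06-01", "2020-06-02"),
  Time    = c("03:00", "13:00", "07:45"),
  Sunrise = c("04:52", "04:52", "04:51"),
  Sunset  = c("21:30", "21:30", "21:31"),
  stringsAsFactors = FALSE
)
days <- split(df, df$Date)

# One sunrise time per day, named by date
sunrisetime <- vapply(days, function(d) as.character(d$Sunrise[1]), character(1))

# Per-day work: flag rows between that day's sunrise and sunset
days <- lapply(days, function(d) {
  d$is_daytime <- d$Time >= d$Sunrise[1] & d$Time < d$Sunset[1]
  d
})
```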
