Dividing time series by datetime period? - r

I am having trouble splitting a datetime variable into two variables. My time series is a count by hour, day and month over a full year (about 360 days).
I would like to generate one variable that covers the 1st through the 19th of each month and a second variable that covers the 20th through the end of the month:
Format:
datetime hours var1 var2 var3
2011-01-1 00:00:00
2011-01-1 01:00:00
... ...
2011-01-1 23:00:00
2011-01-2 00:00:00
2011-01-2 01:00:00
... ...
2011-01-2 23:00:00
... ...
... ...
2011-01-20 01:00:00
...
2011-01-31 00:00:00
... ...
2011-12-30 00:00:00
2011-12-30 01:00:00
.. ..
Desired Format:
datetime1 datetime2 var1 var2 var2
2011-01-1 00:00:00 2011-01-20 00:00:00
2011-01-1 01:00:00 2011-01-21 01:00:00
.. .. .. ..
2011-01-19 00:00:00 2011-01-30 00:00:00
2011-01-19 01:00:00 2011-01-30 01:00:00
.. .. .. ...
.. .. .. ...
2011-12-19 00:00:00 2011-12-30 00:00:00
2011-12-19 01:00:00 2011-12-30 01:00:00
Originally, I produced the datetime variable by combining two data frames, one with datetime1 and one with datetime2, using rbind (or rbind.fill from plyr):
df3<-rbind(df1,df2)
That is, the original version (two data frames) had these two variables, but I'm not able to separate them now.
I just couldn't formulate the code...

Try this to extract the day of the date:
xx <- as.Date("2014-12-31")
as.POSIXlt(xx)$mday
You can then use the date as a condition to attribute NA to one column and value to the other.
EDIT: Here's the more in-depth version.
#Setting up a replicable example
mydata <- as.data.frame(matrix(rnorm(90), ncol=3))
names(mydata)[1:2] <- paste0("time",1:2)
mydata$time1 <- as.Date(NA)
mydata$time2 <- as.Date(NA)
str(mydata)
#Getting 30 consecutive days:
datestring <- rep(Sys.time(), 30)
for (i in 1:30) {
  datestring[i] <- Sys.time() + 60*60*24*i
}
mydata <- cbind(datestring, mydata)
#Doing what I think you're trying to do:
for (i in 1:dim(mydata)[1]) {
  if (as.POSIXlt(mydata$datestring[i])$mday <= 19) {
    mydata$time1[i] <- mydata$datestring[i]
  } else {
    mydata$time2[i] <- mydata$datestring[i]
  }
}
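If you want two separate data frames rather than two columns padded with NA, the same day-of-month test can be applied to the whole column at once. This is only a sketch on made-up data: the column names datetime and count and the two-month range are assumptions, not taken from the question.

```r
# Hypothetical hourly series covering January and February 2011.
datetime <- seq(as.POSIXct("2011-01-01 00:00:00", tz = "UTC"),
                as.POSIXct("2011-02-28 23:00:00", tz = "UTC"),
                by = "hour")
df <- data.frame(datetime = datetime, count = seq_along(datetime))

# Day of the month (1..31) for every row, computed in one vectorized call.
day_of_month <- as.POSIXlt(df$datetime)$mday

# Rows from the 1st to the 19th, and rows from the 20th to the end of the month.
first_part  <- df[day_of_month <= 19, ]
second_part <- df[day_of_month >= 20, ]
```

Both pieces keep every original column, so they can be recombined later with rbind if needed.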

Related

How to create a date (column) from a date-time (column) in R

I have imported a CSV containing dates in the column "Activity_Date_Minute". A sample value is "04/12/2016 01:12:00". When I read the .csv into a data frame and extract only the date, the new column shows the date as 4-12-20. How can I get the date in mm-dd-yyyy in a separate column?
I tried the code below and expected to see a column with dates such as 04/12/2016 (mm/dd/yyyy).
#Installing packages
install.packages("tidyverse")
library(tidyverse)
install.packages('ggplot2')
library(ggplot2)
install.packages("dplyr")
library(dplyr)
install.packages("lubridate")
library(lubridate)
##Reading minute-wise METs into "minutewiseMET_Records" and summarizing MET per day for all the IDs
minutewiseMET_Records <- read.csv("minuteMETsNarrow_merged.csv")
str(minutewiseMET_Records)
## converting column ID to character,Activity_Date_Minute to date
minutewiseMET_Records$Id <- as.character(minutewiseMET_Records$Id)
minutewiseMET_Records$Date <- as.Date(minutewiseMET_Records$Activity_Date_Minute)
str(minutewiseMET_Records)
The Console is as follows:
> minutewiseMET_Records <- read.csv("minuteMETsNarrow_merged.csv")
> str(minutewiseMET_Records)
'data.frame': 1048575 obs. of 3 variables:
$ Id : num 1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
$ Activity_Date_Minute: chr "04/12/2016 00:00" "04/12/2016 00:01" "04/12/2016 00:02" "04/12/2016 00:03" ...
$ METs : int 10 10 10 10 10 12 12 12 12 12 ...
> ## converting column ID to character,Activity_Date_Minute to date
> minutewiseMET_Records$Id <- as.character(minutewiseMET_Records$Id)
> minutewiseMET_Records$Date <- as.Date(minutewiseMET_Records$Activity_Date_Minute)
> str(minutewiseMET_Records)
'data.frame': 1048575 obs. of 4 variables:
$ Id : chr "1503960366" "1503960366" "1503960366" "1503960366" ...
$ Activity_Date_Minute: chr "04/12/2016 00:00" "04/12/2016 00:01" "04/12/2016 00:02" "04/12/2016 00:03" ...
$ METs : int 10 10 10 10 10 12 12 12 12 12 ...
$ Date : Date, format: "4-12-20" "4-12-20" ...
>
I think this will work for you:
minutewiseMET_Records$Date <- format(as.Date(minutewiseMET_Records$Activity_Date_Minute, format = "%m/%d/%Y"), "%m/%d/%Y")
First you have to tell R the format of your initial data (here month/day/year). Then you tell it which format you want for the output. Note that format() returns a character column, not a Date.
Activity_Date_Minute isn’t a datetime in your initial data, it’s a character. So you’ll have to first convert it to a datetime (e.g., using lubridate::mdy_hm()), then use as.Date().
library(dplyr)
library(lubridate)
minutewiseMET_Records %>%
mutate(
Activity_Date_Minute = mdy_hm(Activity_Date_Minute),
Activity_Date = as.Date(Activity_Date_Minute)
)
# A tibble: 4 × 2
Activity_Date_Minute Activity_Date
<dttm> <date>
1 2016-04-12 00:00:00 2016-04-12
2 2016-04-12 00:01:00 2016-04-12
3 2016-04-12 00:02:00 2016-04-12
4 2016-04-12 00:03:00 2016-04-12
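For comparison, the same conversion can be sketched in base R without lubridate. The sample strings are copied from the question's str() output; the tz = "UTC" setting is an assumption made so the example behaves the same on every machine. The odd "4-12-20" result in the question is most likely as.Date() falling back to a year/month/day guess when no format is given.

```r
# Sample values in the layout shown in the question (month/day/year hour:minute).
x <- c("04/12/2016 00:00", "04/12/2016 00:01", "04/12/2016 00:02")

# Step 1: parse the text with an explicit format.
parsed <- as.POSIXct(x, format = "%m/%d/%Y %H:%M", tz = "UTC")

# Step 2: drop the time of day; this is a real Date column.
date_only <- as.Date(parsed, tz = "UTC")

# Step 3 (optional): a character column in mm/dd/yyyy for display only.
date_text <- format(date_only, "%m/%d/%Y")
```

Keeping date_only as a Date is usually the better choice for grouping or plotting; date_text is plain text and sorts alphabetically.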

How to convert monthly time-series in R

I am working on a monthly-based time-series data set:
> head(data, n=10)
# A tibble: 10 x 2
Month Inflation
<dttm> <dbl>
1 1979-01-01 00:00:00 0.0258
2 1979-02-01 00:00:00 0.0234
3 1979-03-01 00:00:00 0.0055
4 1979-04-01 00:00:00 0.0302
5 1979-05-01 00:00:00 0.0305
6 1979-06-01 00:00:00 0.0232
7 1979-07-01 00:00:00 0.025
8 1979-08-01 00:00:00 0.0234
9 1979-09-01 00:00:00 0.0074
10 1979-10-01 00:00:00 0.0089
However, the data is not yet recognized as a time series, as its structure shows:
> str(data)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 479 obs. of 2 variables:
$ Month : POSIXct, format: "1979-01-01" "1979-02-01" "1979-03-01" "1979-04-01" ...
$ Inflation: num 0.0258 0.0234 0.0055 0.0302 0.0305 0.0232 0.025 0.0234 0.0074 0.0089 ...
When I tried to convert it using xts function, it gave me this error:
> inflation <- xts(data[,-1], order.by=as.Date(data[,1], "%m/%d/%Y"))
Error in as.Date.default(data[, 1], "%m/%d/%Y") :
do not know how to convert 'data[, 1]' to class “Date”
Please help me with the most appropriate way of data conversion.
Thanks
# You have something like this (monthly data, same number of obs as in the question):
data <- data.frame(
Month = seq(as.Date("1979-01-01"), by = "month", length.out = 479),
Inflation = rnorm(479))
# Create the ts object;
# choose the start date and frequency to match your data
tseries <- ts(data$Inflation, start = c(1979,1), frequency = 12)
plot(tseries)
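On the error itself: with a tibble, data[,1] is still a one-column table rather than a vector, which is why as.Date() does not know how to convert it. Pulling the column out with $ or [[ ]] avoids that, and since Month is already POSIXct no format string is needed. A rough sketch on simulated data follows; the random Inflation values and the tz = "UTC" setting are assumptions, and the xts call only runs if that package is installed.

```r
# Simulated stand-in for the question's data: 479 monthly observations.
set.seed(42)
data <- data.frame(
  Month = seq(as.POSIXct("1979-01-01", tz = "UTC"), by = "month", length.out = 479),
  Inflation = rnorm(479)
)

# Extract the column as a vector, then convert POSIXct to Date.
months <- as.Date(data$Month, tz = "UTC")

# Base R monthly time series starting in January 1979.
inflation_ts <- ts(data$Inflation, start = c(1979, 1), frequency = 12)

# Optional xts version, ordered by the Date vector built above.
if (requireNamespace("xts", quietly = TRUE)) {
  inflation_xts <- xts::xts(data$Inflation, order.by = months)
}
```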

Divide time-series data into weekday and weekend datasets using R

I have a dataset consisting of two columns (timestamp and power):
str(df2)
'data.frame': 720 obs. of 2 variables:
$ timestamp: POSIXct, format: "2015-08-01 00:00:00" "2015-08-01 01:00:00" " ...
$ power : num 124 149 118 167 130 ..
This dataset covers one full month. I want to create two subsets of it: one containing the weekend data (Saturday and Sunday) and the other containing the weekday data (Monday to Friday). Both subsets should retain both columns. How can I do this in R?
I tried to use aggregate and split, but it is not clear to me how to specify the division of the dataset in the function parameter (FUN) of aggregate.
You can do this with base R functions: first use strptime to parse the date part of the first column, then apply the weekdays function.
Example:
df1<-data.frame(timestamp=c("2015-08-01 00:00:00","2015-10-13 00:00:00"),power=1:2)
df1$day<-strptime(df1[,1], "%Y-%m-%d")
df1$weekday<-weekdays(df1$day)
df1
timestamp power day weekday
2015-08-01 00:00:00 1 2015-08-01 Saturday
2015-10-13 00:00:00 2 2015-10-13 Tuesday
Building on top of @ShruS's example:
df<-data.frame(timestamp=c("2015-08-01 00:00:00","2015-10-13 00:00:00", "2015-10-11 00:00:00", "2015-10-14 00:00:00"))
df$day<-strptime(df[,1], "%Y-%m-%d")
df$weekday<-weekdays(df$day)
df1 = subset(df,df$weekday == "Saturday" | df$weekday == "Sunday")
df2 = subset(df,df$weekday != "Saturday" & df$weekday != "Sunday")
> df
timestamp day weekday
1 2015-08-01 00:00:00 2015-08-01 Saturday
2 2015-10-13 00:00:00 2015-10-13 Tuesday
3 2015-10-11 00:00:00 2015-10-11 Sunday
4 2015-10-14 00:00:00 2015-10-14 Wednesday
> df1
timestamp day weekday
1 2015-08-01 00:00:00 2015-08-01 Saturday
3 2015-10-11 00:00:00 2015-10-11 Sunday
> df2
timestamp day weekday
2 2015-10-13 00:00:00 2015-10-13 Tuesday
4 2015-10-14 00:00:00 2015-10-14 Wednesday
Initially, I tried complex approaches using extra libraries, but in the end I settled on a basic approach using base R.
#adding day column to existing set
df2$day <- weekdays(as.POSIXct(df2$timestamp))
# creating two data_subsets, i.e., week_data and weekend_data
week_data<- data.frame(timestamp=factor(), power= numeric(),day= character())
weekend_data<- data.frame(timestamp=factor(),power=numeric(),day= character())
#Specifying weekend days in vector, weekend
weekend <- c("Saturday","Sunday")
for(i in 1:nrow(df2)){
if(is.element(df2[i,3], weekend)){
weekend_data <- rbind(weekend_data, df2[i,])
} else{
week_data <- rbind(week_data, df2[i,])
}
}
The datasets created, i.e., weekend_data and week_data are my required sub datasets.
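The row-by-row loop gets slow on larger data because rbind copies the data frame on every iteration. The same split can be sketched with a single logical index. The data below is a made-up stand-in for df2 (720 hourly rows starting 2015-08-01, with placeholder power values), and the test uses the numeric weekday from as.POSIXlt so it does not depend on the locale's day names.

```r
# Hypothetical stand-in for df2: one row per hour for 30 days.
timestamp <- seq(as.POSIXct("2015-08-01 00:00:00", tz = "UTC"),
                 by = "hour", length.out = 720)
df2 <- data.frame(timestamp = timestamp, power = seq_along(timestamp))

# wday is 0 for Sunday through 6 for Saturday, regardless of locale.
is_weekend <- as.POSIXlt(df2$timestamp)$wday %in% c(0, 6)

# Logical indexing keeps both columns in each subset.
weekend_data <- df2[is_weekend, ]
week_data    <- df2[!is_weekend, ]
```

If a list of both pieces is more convenient, split(df2, is_weekend) returns them under the names "FALSE" and "TRUE".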

Connecting 2 columns in R is not working

I am trying to read data from a file that holds a date and a time, and wrote the following code to concatenate the columns Date and Time into one column named DateTime:
df <-read.csv("file", header=TRUE)
df = data.frame(DateTime=as.POSIXct(paste(df$Date, df$Time)), df)
The problem is that the output holds only the Date and not the Time.
I also tried to change the format of the date with df$Date <- as.Date(df$Date , "%y/%m/%d"), but the output is NA.
Please advise.
The file sample is here:
Date,Time
2011/12/22,02:00:00
2011/12/22,02:01:00
2011/12/22,02:02:00
2011/12/22,02:03:00
2011/12/22,02:04:00
2011/12/22,02:05:00
2011/12/22,02:06:00
2011/12/22,02:07:00
2011/12/22,02:08:00
2011/12/22,02:09:00
2011/12/22,02:10:00
2011/12/22,02:11:00
2011/12/22,02:12:00
2011/12/22,02:13:00
2011/12/22,02:14:00
2011/12/22,02:15:00
2011/12/22,02:16:00
2011/12/22,02:17:00
2011/12/22,02:18:00
2011/12/22,02:19:00
2011/12/22,02:20:00
Try
df$datetime <- as.POSIXct(paste(df$Date, df$Time), format="%Y/%m/%d %H:%M:%S")
df$Date <- as.Date(df$Date, "%Y/%m/%d")
head(df,3)
#        Date     Time            datetime
#1 2011-12-22 02:00:00 2011-12-22 02:00:00
#2 2011-12-22 02:01:00 2011-12-22 02:01:00
#3 2011-12-22 02:02:00 2011-12-22 02:02:00
str(df)
#'data.frame': 21 obs. of 3 variables:
#$ Date : Date, format: "2011-12-22" "2011-12-22" ...
#$ Time : chr "02:00:00" "02:01:00" "02:02:00" "02:03:00" ...
#$ datetime: POSIXct, format: "2011-12-22 02:00:00" "2011-12-22 02:01:00" ...
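The NA from the question's as.Date() attempt comes down to the case of one letter: %y expects a two-digit year, while %Y expects the four-digit year that the file actually contains. A small sketch with two rows copied from the sample file shows the difference; tz = "UTC" is an assumption added so the result does not depend on the local time zone.

```r
# Two rows in the same layout as the sample file.
df <- data.frame(Date = c("2011/12/22", "2011/12/22"),
                 Time = c("02:00:00", "02:01:00"))

# "%y" (two-digit year) cannot match "2011", so parsing fails.
wrong <- as.Date(df$Date, "%y/%m/%d")

# "%Y" (four-digit year) matches the data.
right <- as.Date(df$Date, "%Y/%m/%d")

# Combine both columns and parse with an explicit date-time format.
df$DateTime <- as.POSIXct(paste(df$Date, df$Time),
                          format = "%Y/%m/%d %H:%M:%S", tz = "UTC")
```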

How to join 2 data.tables using a time interval and a group-by

I have a data.table of frequently collected data:
set.seed(1)
t1 <- seq(from=as.POSIXct('2014-1-1'), to=as.POSIXct('2014-6-1'), by='day')
T1 <- data.table(time1=t1, group=rep(c('A', 'B'), length(t1)/2), value1=rnorm(length(t1)))
and a data.table of infrequently collected data:
t2 <- seq(from=as.POSIXct('2014-1-1'), to=as.POSIXct('2014-6-1'), by='week')
T2 <- data.table(time2=t2, group=rep(c('A', 'B'), length(t2)/2), value2='ArbitraryText')
For each row of T2 I would like to find all of the rows in T1 that fall between T2$time2 minus 1 week and T2$time2, then take the average of T1$value1, by T2$group.
So the number of rows in the resulting table would be exactly equal to the number of rows in T2, and the "correct" value that should be returned for the second row of T2 (the average of those T1$value1 that are in group B and fall between Jan 1 and Jan 22) would look like this:
t2 group value1 value2
2014-01-22 00:00:00 B 0.1674069 "Arbitrary Text"
I imagine the first step would be setting the keys for each data.table:
setkey(T1, group, time1)
setkey(T2, group, time2)
I'm unsure of how to proceed. Curiously T1[T2[time1 %between% c(t2, t2-604800)]] yields only results between Jan 1 and Jan 8, despite the default mult='all'.
EDIT: I should point out that each of the intervals (T2$time2 minus 3 weeks to T2$time2) overlap each other on purpose. This means that each row of T1 "belongs" to more than one desired average because it falls into the interval specified by more than one row of T2.
Try creating a grouping vector within T1 that is constructed using T2 breakpoints passed to the cut.POSIXt function:
T1[ , grp := cut(time1, breaks=T2[,time2]) ]
> str(T1)
Classes ‘data.table’ and 'data.frame': 151 obs. of 4 variables:
$ time1: POSIXct, format: "2014-01-01 00:00:00" "2014-01-02 00:00:00" "2014-01-03 00:00:00" ...
$ group: chr "A" "B" "A" "B" ...
$ value: num -0.626 0.184 -0.836 1.595 0.33 ...
$ grp : Factor w/ 21 levels "2014-01-01 00:00:00",..: 1 1 1 1 1 1 1 2 2 2 ...
- attr(*, ".internal.selfref")=<externalptr>
#------------------
> T1[, mean(value), by="grp"]
#----------------
grp V1
1: 2014-01-01 00:00:00 0.04475859
2: 2014-01-08 00:00:00 0.01062880
3: 2014-01-15 00:00:00 0.62024902
4: 2014-01-22 00:00:00 -0.31364304
5: 2014-01-29 00:00:00 0.02178433
6: 2014-02-05 00:00:00 0.08238828
7: 2014-02-12 00:00:00 0.12544920
8: 2014-02-19 00:00:00 0.47033820
9: 2014-02-26 00:00:00 0.29648943
10: 2014-03-05 00:00:00 0.20856893
11: 2014-03-12 01:00:00 -0.28046960
12: 2014-03-19 01:00:00 -0.22334306
13: 2014-03-26 01:00:00 0.25434429
14: 2014-04-02 01:00:00 0.48056376
15: 2014-04-09 01:00:00 -0.52624880
16: 2014-04-16 01:00:00 0.62330703
17: 2014-04-23 01:00:00 0.01092562
18: 2014-04-30 01:00:00 0.12544150
19: 2014-05-07 01:00:00 -0.15919531
20: 2014-05-14 01:00:00 -0.61236195
21: 2014-05-21 01:00:00 -0.37797879
22: NA -0.61483084
grp V1
Note that you don't get the same number of groups as events in T2 but rather that number minus 1. I didn't use setkey since my by call was on the constructed column. If it's only a one-time use, then I'm not sure it's needed.
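Because cut() puts each T1 row into exactly one bin, it cannot produce the overlapping windows described in the question's EDIT. One way to get them is to compute the window for each T2 row separately. The sketch below does that in plain base R with data frames, not with data.table idioms, so treat it as an illustration of the logic only. The 21-day look-back is an assumption (the question mentions both 1 and 3 weeks), as is the choice to include the end point and exclude the start point.

```r
# Toy data shaped like the question's T1 (daily) and T2 (weekly).
set.seed(1)
t1 <- seq(as.POSIXct("2014-01-01", tz = "UTC"),
          as.POSIXct("2014-06-01", tz = "UTC"), by = "day")
T1 <- data.frame(time1 = t1,
                 group = rep(c("A", "B"), length.out = length(t1)),
                 value1 = rnorm(length(t1)))
t2 <- seq(as.POSIXct("2014-01-01", tz = "UTC"),
          as.POSIXct("2014-06-01", tz = "UTC"), by = "week")
T2 <- data.frame(time2 = t2,
                 group = rep(c("A", "B"), length.out = length(t2)),
                 value2 = "ArbitraryText")

# Assumed look-back window; change to 7 days for the 1-week variant.
window <- as.difftime(21, units = "days")

# For each T2 row, average the same-group T1 values in (time2 - window, time2].
T2$value1 <- vapply(seq_len(nrow(T2)), function(i) {
  in_window <- T1$group == T2$group[i] &
    T1$time1 > T2$time2[i] - window &
    T1$time1 <= T2$time2[i]
  if (any(in_window)) mean(T1$value1[in_window]) else NA_real_
}, numeric(1))
```

The result has one row per row of T2, and a T1 row can contribute to several averages because the windows overlap.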
