I have a data frame (df) like the following:
Date Arrivals
2014-07 100
2014-08 150
2014-09 200
I know that I can convert the yearmon dates to the first date of each month as follows:
df$Date <- as.POSIXct(paste0(as.character(df[,1]),"-01"), format = "%Y-%m-%d")
However, given that my data is not available until the end of the month I want to index it to the end rather than the beginning, and I cannot figure it out. Any help appreciated.
If the Date variable is an actual yearmon class vector, from the zoo package, the as.Date.yearmon method can do what you want via its argument frac.
Using your data, and assuming that the Date was originally a character vector:
library("zoo")
df <- data.frame(Date = c("2014-07", "2014-08", "2014-09"),
Arrivals = c(100, 150, 200))
I convert this to a yearmon vector:
df <- transform(df, Date2 = as.yearmon(Date))
Assuming this is what you have, you can get what you want by calling as.Date() with frac = 1:
df <- transform(df, Date3 = as.Date(Date2, frac = 1))
which gives:
> df
Date Arrivals Date2 Date3
1 2014-07 100 Jul 2014 2014-07-31
2 2014-08 150 Aug 2014 2014-08-31
3 2014-09 200 Sep 2014 2014-09-30
That shows the individual steps. If you only want the final Date, this is a one-liner:
## assuming `Date` is a `yearmon` object
df <- transform(df, Date = as.Date(Date, frac = 1))
## or if not a `yearmon`
df <- transform(df, Date = as.Date(as.yearmon(Date), frac = 1))
The argument frac is the fraction of the month to assign to the resulting dates when converting from yearmon objects to Date objects: frac = 0 (the default) gives the first day of the month and frac = 1 gives the last. Hence, to get the first day of the month, rather than converting to character and pasting on "-01" as your question showed, it's better to coerce to a Date object with frac = 0.
If the Date in your df is not a yearmon class object, then you can solve your problem by converting it to one and then using the as.Date() method as described above.
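A minimal sketch of frac = 0 versus frac = 1 on the question's months, assuming zoo is installed; first_day and last_day are just illustrative variable names:

```r
library(zoo)

ym <- as.yearmon(c("2014-07", "2014-08", "2014-09"))

# frac = 0 (the default) gives the first day of each month,
# frac = 1 gives the last day of each month
first_day <- as.Date(ym, frac = 0)
last_day  <- as.Date(ym, frac = 1)

first_day  # "2014-07-01" "2014-08-01" "2014-09-01"
last_day   # "2014-07-31" "2014-08-31" "2014-09-30"
```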
Here is a way to do it using the zoo package.
R code:
library(zoo)
df
# Date Arrivals
# 1 2014-07 100
# 2 2014-08 150
# 3 2014-09 200
df$Date <- as.Date(as.yearmon(df$Date), frac = 1)
# output
# Date Arrivals
# 1 2014-07-31 100
# 2 2014-08-31 150
# 3 2014-09-30 200
Using lubridate, you can add a month and subtract a day to get the last day of the month:
library(lubridate)
ymd(paste0(df$Date, '-01')) + months(1) - days(1)
# [1] "2014-07-31" "2014-08-31" "2014-09-30"
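The same "first day of the next month minus one day" idea can be sketched in base R without lubridate. month_end is a hypothetical helper name, and it assumes the input strings are exactly in "YYYY-MM" form:

```r
# Hypothetical helper: last day of each "YYYY-MM" month, base R only.
month_end <- function(ym) {
  y <- as.integer(substr(ym, 1, 4))
  m <- as.integer(substr(ym, 6, 7))
  # First day of the following month; December rolls over to January
  next_y <- ifelse(m == 12L, y + 1L, y)
  next_m <- ifelse(m == 12L, 1L, m + 1L)
  # Subtracting 1 from a Date moves back one day
  as.Date(sprintf("%04d-%02d-01", next_y, next_m)) - 1
}

month_end(c("2014-07", "2014-08", "2014-09", "2014-12", "2016-02"))
# "2014-07-31" "2014-08-31" "2014-09-30" "2014-12-31" "2016-02-29"
```

Working from the first of the following month avoids having to know how many days each month has, so leap years are handled automatically.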
I have the following data frame representing user subscriptions:
User StartDate EndDate
1 2015-09-03 2015-10-17
2 2015-10-27 2015-12-25
...
How can I transform it into a time series that gives the count of active subscriptions per month (counting a subscription as active in a month if it is active on at least one day of that month)? Something like this (based on the example above, assuming only these 2 records):
Month Count
2015-08 0
2015-09 1
2015-10 2
2015-11 1
2015-12 1
2016-01 0
Note: I chose arbitrary start and end dates for the time series to make the example clear.
Prepare the data and make sure that the date columns are actually stored as dates:
data <- read.table(text = "User StartDate EndDate
1 2015-09-03 2015-10-17
2 2015-10-27 2015-12-25", header = TRUE)
data$StartDate <- as.Date(data$StartDate)
data$EndDate <- as.Date(data$EndDate)
This function returns a vector with all months that fall within a subscription:
library(lubridate)
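subscr_month <- function(start, end) {
start <- floor_date(start, "month")        # round down to the 1st of the month
dates <- seq(start, end, by = "1 month")   # one date per month up to the end date
months <- format(dates, format = "%Y-%m")  # keep only year and month
return(months)
}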
It uses the function floor_date() from the lubridate package. It is necessary to round down the start date, because otherwise the last month might be missing. For example, for user 2, if you add two months to the start date, you end up on 2015-12-27, which is after the end date, so no date from December would be included in the sequence. The format() call converts the dates to character strings that include only year and month.
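A small sketch of the user 2 case, assuming lubridate is installed, shows the month that goes missing without floor_date():

```r
library(lubridate)

start <- as.Date("2015-10-27")  # user 2 from the question
end   <- as.Date("2015-12-25")

# Stepping from the raw start date: 2015-10-27, 2015-11-27 (2015-12-27 is past end)
without_floor <- format(seq(start, end, by = "1 month"), "%Y-%m")
without_floor  # "2015-10" "2015-11"

# Stepping from the first of the month reaches December as well
with_floor <- format(seq(floor_date(start, "month"), end, by = "1 month"), "%Y-%m")
with_floor     # "2015-10" "2015-11" "2015-12"
```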
Now, you can apply this function to each start and end date from your data using mapply(). Afterwards, table() creates a table of counts of all dates in the resulting list:
all_month <- mapply(subscr_month, data$StartDate, data$EndDate, SIMPLIFY = FALSE)
table(unlist(all_month))
## 2015-09 2015-10 2015-11 2015-12
## 1 2 1 1
You can also convert the table to a data frame:
as.data.frame(table(unlist(all_month)))
## Var1 Freq
## 1 2015-09 1
## 2 2015-10 2
## 3 2015-11 1
## 4 2015-12 1
Your example output also includes the counts for months that do not appear in the data set. If you want to have this, you can convert the vector of months to a factor and set the levels to all the months you want to include:
month_list <- format(seq(as.Date("2015-08-01"), as.Date("2016-01-01"), by = "1 month"), format = "%Y-%m")
all_month_factor <- factor(unlist(all_month), levels = month_list)
table(all_month_factor)
## all_month_factor
## 2015-08 2015-09 2015-10 2015-11 2015-12 2016-01
## 0 1 2 1 1 0
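As an alternative (not part of the answer above), the "active on at least one day" rule can be written directly as an interval overlap test, which produces the zero-count months without building a factor. This sketch assumes the two example records and the same reporting window; month_start, month_stop and count are illustrative names:

```r
data <- data.frame(User      = 1:2,
                   StartDate = as.Date(c("2015-09-03", "2015-10-27")),
                   EndDate   = as.Date(c("2015-10-17", "2015-12-25")))

# First and last day of every month in the reporting window
month_start <- seq(as.Date("2015-08-01"), as.Date("2016-01-01"), by = "1 month")
month_stop  <- seq(as.Date("2015-09-01"), by = "1 month",
                   length.out = length(month_start)) - 1

# A subscription overlaps a month if it starts on or before the month's last day
# and ends on or after the month's first day
count <- vapply(seq_along(month_start), function(i) {
  sum(data$StartDate <= month_stop[i] & data$EndDate >= month_start[i])
}, integer(1))

res <- data.frame(Month = format(month_start, "%Y-%m"), Count = count)
res
```

This loops over months rather than over subscriptions, so its cost grows with the number of months times the number of rows.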
First, read in the data frame from the question:
df = structure(list(StartDate = structure(c(16681, 16735), class = "Date"),
EndDate = structure(c(16735, 16794), class = "Date")), class = "data.frame", .Names = c("StartDate",
"EndDate"), row.names = c(NA, -2L))
You can make good use of do() from the dplyr package together with seq():
library(dplyr)
df %>%
rowwise() %>% do({
# a 15-day step cannot skip a whole month; append EndDate so the last month is never missed
w <- c(seq(.$StartDate, .$EndDate, by = "15 days"), .$EndDate)
m <- format(w,"%Y-%m") %>% unique
data.frame(Month = m)
}) %>%
group_by(Month) %>%
summarise(Count = length(Month))
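Why the end date matters with a 15-day step: no month is shorter than 15 days, so the step never jumps over a whole month, but the sequence can stop short of the final month. A sketch with a hypothetical subscription running from 2015-01-20 to 2015-02-03:

```r
start <- as.Date("2015-01-20")  # hypothetical subscription
end   <- as.Date("2015-02-03")

steps <- seq(start, end, by = "15 days")  # only 2015-01-20; 2015-02-04 is past end
unique(format(steps, "%Y-%m"))
# "2015-01"

# Appending the end date guarantees the final month is counted
unique(format(c(steps, end), "%Y-%m"))
# "2015-01" "2015-02"
```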
I've been scouring the net but haven't found a solution to this quite possibly simple problem.
Here is half-hourly sample data built with the xts package:
library(xts)
data.xts <- as.xts(1:nrow(data), as.POSIXct("2007-08-24 17:30:00") +
1800 * (1:nrow(data)))
data.xts <-as.data.frame(data.xts)
I converted it to a data.frame because the original data is a data frame. The original data frame has a time_stamp column, and I would prefer to use that column directly instead of the xts format.
How can I average the data for each hour of the day within each month, so that I can plot a 24-hour time series for each month?
For example,
2007-08-24 17:30:00 1
2007-08-25 17:00:00 47
2007-08-25 17:30:00 48
2007-08-26 17:00:00 95
would all be averaged together as hour 17 of August 2007, and so on.
The goal is to plot an averaged 24-hour time series for each month.
Thanks!
Try
library(dplyr)
res <- dat %>%
group_by(month=format(datetime, '%m'),
#year=format(datetime, '%Y'), #if you need year also
# as grouping variable
hour=format(as.POSIXct(cut(datetime, breaks='hour')), '%H')) %>%
summarise(Meanval=mean(val, na.rm=TRUE))
head(res,3)
# month hour Meanval
#1 01 00 -0.02780036
#2 01 01 -0.06589948
#3 01 02 -0.02166218
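Since the question prefers working with a time-stamp column in a plain data frame, the same month-by-hour mean can also be sketched with base R's aggregate(). The sample data here is an assumption: half-hourly values starting at the question's first timestamp, in UTC to avoid daylight-saving gaps:

```r
set.seed(24)
# Half-hourly sample data in a plain data frame (UTC assumed to avoid DST gaps)
dat <- data.frame(
  datetime = seq(as.POSIXct("2007-08-24 17:30:00", tz = "UTC"),
                 by = "30 min", length.out = 2000),
  val = rnorm(2000)
)

# Grouping keys: calendar month and hour of day
dat$month <- format(dat$datetime, "%Y-%m")
dat$hour  <- format(dat$datetime, "%H")

# Mean of val for each month/hour combination
res <- aggregate(val ~ month + hour, data = dat, FUN = mean)
res <- res[order(res$month, res$hour), ]
head(res, 3)
```

Each month then has up to 24 rows, one per hour, which is the shape needed for a 24-hour profile plot per month.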
Update
If your datetime column is POSIXlt, convert it to POSIXct first. For example, after
dat$datetime <- as.POSIXlt(dat$datetime)
running the code above gives the error
# Error: column 'datetime' has unsupported type
Use mutate() to convert datetime to POSIXct with as.POSIXct() before grouping:
res1 <- dat %>%
mutate(datetime= as.POSIXct(datetime)) %>%
group_by(month=format(datetime, '%m'),
#year=format(datetime, '%Y'), #if you need year also
# as grouping variable
hour=format(as.POSIXct(cut(datetime, breaks='hour')), '%H')) %>%
summarise(Meanval=mean(val, na.rm=TRUE))
Sample data used above:
set.seed(24)
dat <- data.frame(datetime=seq(Sys.time(), by='1 hour', length.out=2000),
val=rnorm(2000))
If I understand you correctly, you want to average all the values for a given hour over all the days in a given month, and do this for all months. That is, average all the values between 00:00:00 and 00:59:59 over all the days in a given month, then do the same for 01:00 to 01:59, and so on.
I see that you want to avoid xts, but aggregate.zoo() was designed for exactly this, and it avoids both dplyr and cut().
library(xts)
# creates sample dataset...
set.seed(1)
data <- rnorm(1000)
data.xts <- as.xts(data, as.POSIXct("2007-08-24 17:30:00") +
1800 * (1:length(data)))
# using aggregate.zoo(...)
as.hourly <- function(x) format(x,"%Y-%m %H")
result <- aggregate(data.xts,by=as.hourly,mean)
result <- data.frame(result)
head(result)
# result
# 2007-08 00 0.12236024
# 2007-08 01 0.41593567
# 2007-08 02 0.22670817
# 2007-08 03 0.23402842
# 2007-08 04 0.22175078
# 2007-08 05 0.05081899
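The grouping works because as.hourly maps every timestamp to a "month hour" key, so observations from different days in the same hour of the same month share a key and are averaged together. A quick check on two timestamps from the question (UTC assumed):

```r
# Two timestamps from the question's example
x <- as.POSIXct(c("2007-08-24 17:30:00", "2007-08-25 17:00:00"), tz = "UTC")

as.hourly <- function(x) format(x, "%Y-%m %H")
as.hourly(x)
# "2007-08 17" "2007-08 17"
```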
I have a data frame with a date column in the form YYYY-MM (class factor), and I am trying to convert it to class Date.
I tried:
Date <- c("2015-08","2015-09","2015-08")
Val <- c(1,2,3)
df <- data.frame(Date,Val)
df[,1] <- as.POSIXct(as.character(df[,1]), format = "%Y-%m")
df
But this does not work. I would be grateful for your help.
1) Convert the dates to zoo's "yearmon" class and then to "Date" class:
> library(zoo)
> transform(df, Date = as.Date(as.yearmon(Date)))
Date Val
1 2015-08-01 1
2 2015-09-01 2
3 2015-08-01 3
The question did not specify which date to convert to, so we used the first of the month. Had the last of the month been wanted, we could have used this instead:
transform(df, Date = as.Date(as.yearmon(Date), frac = 1))
2) Another possibility not using zoo is to just add the day of the month yourself and then convert to "Date" class.
> transform(df, Date = as.Date(paste(Date, 1, sep = "-")))
Date Val
1 2015-08-01 1
2 2015-09-01 2
3 2015-08-01 3
3) Alternatively, you might want to use "yearmon" directly, since it models year and month with no day.
> library(zoo)
> transform(df, Date = as.yearmon(Date))
Date Val
1 Aug 2015 1
2 Sep 2015 2
3 Aug 2015 3
Note: Do not use "POSIXct" class as this gives a time zone dependent result that can cause subtle errors if you are not careful. A date in one time zone is not necessarily the same as in another time zone.
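A short illustration of that time zone dependence, assuming the system has the standard Olson time zone database: the same POSIXct instant falls on different calendar days depending on where it is viewed.

```r
# Midnight on 1 August 2015 in Berlin (CEST, UTC+2)
x <- as.POSIXct("2015-08-01 00:00:00", tz = "Europe/Berlin")

# The same instant viewed from New York (EDT, UTC-4) is still 31 July
format(x, tz = "America/New_York", usetz = TRUE)
# "2015-07-31 18:00:00 EDT"

# A Date carries no time zone, so it is the same everywhere
as.Date("2015-08-01")
```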
R cannot parse a Date from the format "%Y-%m" alone: a day is needed.
You can do the following:
as.POSIXct(paste0(as.character(df[,1]),"-01"), format = "%Y-%m-%d")
Resulting in
"2015-08-01 CEST" "2015-09-01 CEST" "2015-08-01 CEST"
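A minimal check of why the day is needed: strptime(), which as.Date() uses for parsing, requires a day whenever a month is given, so a bare year-month returns NA.

```r
# A year-month string alone cannot be parsed into a Date: there is no day
as.Date("2015-08", format = "%Y-%m")
# NA

# Supplying a day makes it parseable
as.Date(paste0("2015-08", "-01"), format = "%Y-%m-%d")
# "2015-08-01"
```

Using as.Date() here rather than as.POSIXct() also avoids the time zone suffix (such as CEST) in the result.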