Categorizing data using date variable in R - r

I am having trouble using the date variable in my dataset to create categories for 6-month time periods. I want to create these categories for dates between 2017-1-1 and 2020-6-30. The categories would run from 2017-1-1 to 2017-6-30, then 2017-7-1 to 2017-12-31, and so on until 2020-6-30.
I have tried the following two pieces of code to create the date categories, but both give a similar error:
#CODE1
#checking for date class
myData <- str(myData)
myData #date in factor class
#convert to date class
date_class <- as.Date(myData$date, format = "%m/%d/%Y")
myData$date_class <- as.Date(myData$date, format = "%m/%d/%Y")
myData
#creating timeperiod category 1
date_cat <- NA
myData$date_cat[which(myData$date_class >= "2017-1-1" & myData$date_class < "2017-7-1")] <- 1
#CODE2
#converting to date format
myData$date <- strptime(myData$date,format="%m/%d/%Y")
myData$date <- as.POSIXct(myData$date)
myData
#creating timeperiod category 1
date_cat <- NA
myData$date_cat[which(myData$date >= "2017-1-1" & myData$date < "2017-7-1")] <- 1
With both pieces of code I get a similar error:
Error in $<-.data.frame(*tmp*, date_cat, value = numeric(0)) :
replacement has 0 rows, data has 1123
Please help me with understanding where I am going wrong.
Thanks,
Priya

Here's a function (to.interval) that returns a time interval index {0, 1, 2, 3, ...}, given an anchor date, an event date, and an interval width in days. Because it rounds to the nearest multiple of a fixed 180-day width, the result only approximates calendar half-years. It is probably a good idea to add error checking to the function so that, for example, it returns NA if the event date is before the anchor date.
df <- data.frame(event.date=as.Date(c("2017-01-01", "2017-08-01", "2018-04-30")))
to.interval <- function(anchor.date, future.date, interval.days){
round(as.integer(future.date - anchor.date) / interval.days, 0)}
df$interval <- to.interval(as.Date('2017-01-01'),
df$event.date, 180 )
df
Output
event.date interval
1 2017-01-01 0
2 2017-08-01 1
3 2018-04-30 3
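If the categories need to follow the calendar half-years from the question exactly, one option is to build the break dates explicitly and look up which pair of breaks each date falls between. The following is a minimal base-R sketch under stated assumptions: the toy values standing in for myData are made up, and dates outside the 2017-01-01 to 2020-06-30 range are set to NA. The error in the question is consistent with date_cat never having been created as a column of myData (date_cat <- NA creates a separate variable) combined with no rows matching the condition; assigning the whole column in one step, as here, avoids both issues.

```r
# Toy data standing in for myData (hypothetical values, month/day/year strings)
myData <- data.frame(
  date = c("01/15/2017", "08/01/2017", "04/30/2018", "06/30/2020", "07/04/2021"),
  stringsAsFactors = FALSE
)
myData$date_class <- as.Date(myData$date, format = "%m/%d/%Y")

# Break dates: the start of every half-year, plus the first day after the range
breaks <- seq(as.Date("2017-01-01"), as.Date("2020-07-01"), by = "6 months")

# findInterval() returns i when breaks[i] <= date < breaks[i + 1]
cat_id <- findInterval(as.numeric(myData$date_class), as.numeric(breaks))

# Dates before the first break (0) or on/after the last break are out of range
cat_id[cat_id < 1 | cat_id >= length(breaks)] <- NA
myData$date_cat <- cat_id
myData
```

Category 1 is 2017-01-01 to 2017-06-30, category 2 is 2017-07-01 to 2017-12-31, and so on up to category 7 for the first half of 2020.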

Related

How do I get the frequency value by month and the total cost in the same month in the dataset?

This is my dataset
install.packages("CASdatasets", repos = "http://dutangc.free.fr/pub/RRepos/", type="source")
library(CASdatasets)
data("itamtplcost") #load the dataset
head(itamtplcost)
names(itamtplcost)
View(itamtplcost)
I would like to count the items per month and get the total claim size for each month, based on the Date column.
Below is the output I would like to achieve:
I tried the below code to get what I want but I can't get the "total claim size".
itamtplcost$Date <- as.Date(itamtplcost$Date, format="%d/%m/%Y")
tab <- table(cut(itamtplcost$Date, 'month'))
monthly_aggre<-data.frame(Date=format(as.Date(names(tab)), '%m/%Y'),
Frequency=as.vector(tab))
Can someone help me? Thanks in advance.
You can use the apply.monthly() function from the xts package.
itamtplcost$Date <- as.Date(itamtplcost$Date, format="%d/%m/%Y")
library(xts)
df <- as.xts(itamtplcost$UltimateCost, order.by = itamtplcost$Date)
apply.monthly(df, sum)
Output
[,1]
1997-01-08 726986.95
1997-03-18 1651225.47
1997-04-11 895903.67
1997-06-23 3554487.85
1997-07-28 3545689.57
1997-08-29 673680.14
1997-09-27 738011.81
1997-10-02 1008364.26
...
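To convert this into the required format (note that df below is still the raw, unaggregated series, so months repeat in the output; pass the result of apply.monthly() instead to get one row per month):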
library(lubridate)
df2 <- data.frame(Month = paste0(month(index(df)), "/",year(index(df))),
itamtplcost = df)
rownames(df2) <- NULL
Output:
> df2
Month itamtplcost
1 1/1997 726986.95
2 3/1997 1222682.37
3 3/1997 428543.10
4 4/1997 258786.06
5 4/1997 637117.61
6 6/1997 1769925.16
7 6/1997 702034.03
8 6/1997 1082528.66
9 7/1997 1005364.03
10 7/1997 780910.04
11 7/1997 429970.06
12 7/1997 700445.53
...
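For getting both the monthly frequency and the monthly total in one table, a base-R approach along the lines of the asker's own table() attempt may be enough. This is only a sketch on made-up toy data: the claims data frame and its values are hypothetical stand-ins for itamtplcost, with Date already converted to the Date class and UltimateCost as the claim size.

```r
# Toy stand-in for itamtplcost (hypothetical values)
claims <- data.frame(
  Date = as.Date(c("1997-01-08", "1997-03-05", "1997-03-18", "1997-04-11")),
  UltimateCost = c(100, 200, 50, 75)
)

# Map every claim to the first day of its month so months group and sort correctly
month_start <- as.Date(format(claims$Date, "%Y-%m-01"))

monthly_aggre <- data.frame(
  Date      = format(sort(unique(month_start)), "%m/%Y"),
  Frequency = as.vector(table(month_start)),                          # claims per month
  TotalCost = as.vector(tapply(claims$UltimateCost, month_start, sum)) # summed claim size
)
monthly_aggre
```

Months with no claims do not appear in the result; the cut(Date, 'month') approach from the question keeps them with a frequency of 0 instead.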

Converting filenames to date in year + weeks returns Error in charToDate (x): character string is not in a standard unambiguous format

For a time series analysis of over 1000 rasters in a raster stack, I need the date of each raster. The data are roughly weekly, and the file names have the structure
"... 1981036 .... tif"
The zero separates the year from the week.
I need something like: "1981-36"
but I always get the error
Error in charToDate (x): character string is not in a standard unambiguous format
library(sp)
library(lubridate)
library(raster)
library(zoo)
raster_path <- ".../AVHRR_All"
all_raster <- list.files(raster_path,full.names = TRUE,pattern = ".tif$")
all_raster
brings me:
all_raster
".../VHP.G04.C07.NC.P1981036.SM.SMN.Andes.tif"
".../VHP.G04.C07.NC.P1981037.SM.SMN.Andes.tif"
".../VHP.G04.C07.NC.P1981038.SM.SMN.Andes.tif"
…
To get the year and the associated week, I have used the following code:
timeline <- data.frame(
year= as.numeric(substr(basename(all_raster), start = 17, stop = 17+3)),
week= as.numeric(substr(basename(all_raster), 21, 21+2))
)
timeline
brings me:
timeline
year week
1 1981 35
2 1981 36
3 1981 37
4 1981 38
…
But I need something like "1981-35" to be able to plot my time series later.
I tried this:
timeline$week <- as.Date(paste0(timeline$year, "%Y")) + week(timeline$week -1, "%U")
and got the error: Error in charToDate(x) : character string is not in a standard unambiguous format
I also tried this:
fileDates <- as.POSIXct(substr((all_raster),17,23), format="%y0%U")
and got the same error.
Until someone posts a better way to do this, you could try:
x <- c(".../VHP.G04.C07.NC.P1981036.SM.SMN.Andes.tif", ".../VHP.G04.C07.NC.P1981037.SM.SMN.Andes.tif",
".../VHP.G04.C07.NC.P1981038.SM.SMN.Andes.tif")
xx <- substr(x, 21, 27) #e.g. "1981036": 4-digit year, separator "0", 2-digit week
library(lubridate)
#split by position rather than on "0", so weeks such as 40 and years such as 1990 stay intact
dates <- sapply(xx, function(s) {
year <- substr(s, 1, 4)
week <- as.numeric(substr(s, 6, 7))
start_date <- as.Date(paste0(year,'-01-01'))
date <- start_date+weeks(week)
#note here: OP asked for beginning of week.
#There's some ambiguity here, the above is end-of-week;
#uncomment the line below for beginning of week (just subtracts 6 days).
#I think this might yield inconsistent results, especially at year boundaries,
#hence the suggestion to use end of week. See below for a possible solution
#date <- start_date+weeks(week)-days(6)
return (as.character(date))
}, USE.NAMES = FALSE)
newdates <- as.POSIXct(dates)
format(newdates, "%Y-%W")
Thanks to @Soren, who posted this answer here: Get the month from the week of the year
You can do it if you specify that Monday is weekday 1 with %u:
w <- c(35,36,37,38)
y <- c(1981,1981,1981,1981)
s <- c(1,1,1,1)
df <- data.frame(y,w,s)
df$d <- paste(as.character(df$y), as.character(df$w),as.character(df$s), sep=".")
df$date <- as.Date(df$d, "%Y.%U.%u")
# So here we have variable date as date if you need that for later.
class(df$date)
#[1] "Date"
# If you want it to look like Y-W, you can do the final formatting:
df$date <- format(df$date, "%Y-%U")
# y w s d date
# 1 1981 35 1 1981.35.1 1981-35
# 2 1981 36 1 1981.36.1 1981-36
# 3 1981 37 1 1981.37.1 1981-37
# 4 1981 38 1 1981.38.1 1981-38
# NB: though it looks correct, the resulting df$date is actually a character:
class(df$date)
#[1] "character"
Alternatively, you could do the same by setting the Sunday as 0 with %w.
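Putting the two steps together, the "year-week" label can also be taken straight from the file names with a regular expression, without going through a date at all. This is a minimal sketch that assumes every file name contains ".P" followed by a four-digit year, a literal 0 and a two-digit week, as in the three example names from the question; the variable names files, stamp and dates are hypothetical.

```r
# Example file names from the question (the leading ".../" path is left elided)
files <- c(".../VHP.G04.C07.NC.P1981036.SM.SMN.Andes.tif",
           ".../VHP.G04.C07.NC.P1981037.SM.SMN.Andes.tif",
           ".../VHP.G04.C07.NC.P1981038.SM.SMN.Andes.tif")

# Capture the year and the week around the separating "0" and join them with "-"
stamp <- sub("^.*\\.P([0-9]{4})0([0-9]{2})\\..*$", "\\1-\\2", basename(files))
stamp

# If a real Date is needed for plotting, append a weekday (1 = Monday, %u)
# and parse with the same %U week convention used in the answer above
dates <- as.Date(paste0(stamp, "-1"), format = "%Y-%U-%u")
class(dates)
```

Keeping stamp as the label and dates as the plotting axis avoids overwriting a Date column with a character one.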

Need to create sequence of date intervals

I need to create shift timings for the next 15 days.
So I am trying this:
library(lubridate)
c = matrix(nrow=360, ncol=45)
date1 <- ymd_hms("2000-01-01 00:00:00",tz = "US/Eastern")
date2 <- ymd_hms("2000-01-01 08:00:00",tz = "US/Eastern")
date3<- ymd_hms("2000-01-01 16:00:00",tz = "US/Eastern")
date4<- ymd_hms("2000-02-01 00:00:00",tz = "US/Eastern")
I created three shift intervals for day1 as follows:
shift1<-interval(date1,date2)
shift2<-interval(date2,date3)
shift3<-interval(date4,date3)
And I want to create similar intervals for next 14 days. I am trying like this:
end<-as.matrix(rep(c(shift1,shift2,shift3)+days(1),14)))
This is the error
Error in per@.Data + num :
Arithmetic operators undefined for 'numeric' and 'Interval' classes:
convert one to numeric or a matching time-span class.
How about this apply family solution?
library(lubridate)
date1 <- ymd_hms("2000-01-01 00:00:00",tz = "US/Eastern")
end<- lapply(0:14, function(x){
lapply(c(0,8,16), function(y){
interval((date1+days(x)+hours(y)), (date1+days(x)+hours(y+8)))
})
})
lapply(0:14, ...) creates one group of intervals per day, for day offsets 0 to 14 from date1.
lapply(c(0,8,16), ...) creates three 8-hour intervals within each day, starting at hours 0, 8 and 16 of date1 + x days (x is set by the outer lapply).
The result is a list of lists; for example, end[[8]][[3]] is the third interval of the eighth day (day offset 7) counted from date1.
Best!
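If a flat table of shifts is easier to work with than a nested list, the same 45 shifts can be laid out as start and end times in a data frame. This is a base-R sketch rather than a lubridate one, so it stores plain start/end columns instead of Interval objects; the column names day, shift, start and end are assumptions made for illustration.

```r
# First shift start, as in the question
date1 <- as.POSIXct("2000-01-01 00:00:00", tz = "US/Eastern")

# 15 days x 3 shifts per day = 45 consecutive 8-hour shifts
starts <- seq(date1, by = "8 hours", length.out = 45)

shifts <- data.frame(
  day   = rep(1:15, each = 3),   # day number, 1 to 15
  shift = rep(1:3, times = 15),  # shift number within the day
  start = starts,
  end   = starts + 8 * 3600      # 8 hours later, in seconds
)
head(shifts)
```

Each row can still be turned into a lubridate interval later with interval(shifts$start, shifts$end) if interval arithmetic is needed.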

Fill in missing date and fill with the data above

I have researched this quite a bit before asking here. Can you please help me with some ideas for this issue?
My data table (df) looks like this:
client id   value   repmonth
123         100     2012-01-31
123         200     2012-02-31
123         300     2012-05-31
So I have 2 missing months, and I want my data table to look like this:
client id   value   repmonth
123         100     2012-01-31
123         200     2012-02-31
123         200     2012-03-31
123         200     2012-04-31
123         300     2012-05-31
The code should add the missing repmonth rows and fill them with the last available value (in this case 200) and the same client id.
I have tried the following:
zoo library
tidyr library
dplyr library
posixct
As for codes: ...plenty of fails
library(tidyr)
df %>%
mutate (repmonth = as.Date(repmonth)) %>%
complete(repmonth = seq.Date(min(repmonth), max(repmonth),by ="month"))
or
library(dplyr)
df$reportingDate.end.month <- as.POSIXct(df$datetime, tz = "GMT")
df <- tbl_df(df)
list_df <- list(df, df) # fake list of data.frames
seq_df <- data_frame(datetime = seq.POSIXt(as.POSIXct("2012-01-31"),
as.POSIXct("2018-12-31"),
by="month"))
lapply(list_df, function(x){full_join(total_loan_portfolios_3$reportingDate.end.month, seq_df, by=reportingDate.end.month)})
total_loan_portfolios_3$reportingmonth_notmissing <- full_join(seq_df,total_loan_portfolios_3$reportingDate.end.month)
or
library(dplyr)
ts <- seq.POSIXt(as.POSIXct("2012-01-01",'%d/%m/%Y'), as.POSIXct("2018/12/01",'%d/%m/%Y'), by="month")
ts <- seq.POSIXt(as.POSIXlt("2012-01-01"), as.POSIXlt("2018-12-01"), by="month")
ts <- format.POSIXct(ts,'%d/%m/%Y')
df <- data.frame(timestamp=ts)
total_loan_portfolios_3 <- full_join(df,total_loan_portfolios_3$Reporting_date)
Finally, I have plenty of errors like
the format is not date
or
Error in seq.int(r1$mon, 12 * (to0$year - r1$year) + to0$mon, by) :
'from' must be a finite number
and others.
The following solution uses the lubridate and tidyr packages. Note that the dates in the OP's example are malformed (February has no 31st), but they imply last-day-of-month input, so that is what is replicated here. The solution builds a sequence of dates from the earliest to the latest input date to cover every month of interest; the start is first normalized to the first day of its month so that seq.Date() steps cleanly, and the sequence is then shifted to month ends. A left join (merge() with all.x = TRUE) against this sequence exposes the missing months as NA rows, and fill() then carries the last observed values forward into them.
library(lubridate)
library(tidyr)
#Note OP has month of Feb with 31 days, which fails to parse as a date... Corrected to 29 (2012 is a leap year)
df <- data.frame(client_id=c(123,123,123),value=c(100,200,300),repmonth=c("2012-01-31","2012-02-29","2012-05-31"),stringsAsFactors = F)
df$repmonth <- ymd(df$repmonth) #convert character dates to Dates
start_month <- min(df$repmonth)
start_month <- start_month - days(day(start_month)-1) #first day of month so that seq.Date sequences properly
all_dates <- seq.Date(from=start_month,to=max(df$repmonth),by="1 month")
all_dates <- (all_dates %m+% months(1)) - days(1) #all end-of-month-day since OP suggests having last-day-of-month input?
all_dates <- data.frame(repmonth=all_dates)
df<-merge(x=all_dates,y=df,by="repmonth",all.x=T)
df <- fill(df,c("client_id","value"))
Solution yields:
> df
repmonth client_id value
1 2012-01-31 123 100
2 2012-02-29 123 200
3 2012-03-31 123 200
4 2012-04-30 123 200
5 2012-05-31 123 300
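The same idea can be sketched without any packages, which may help when lubridate or tidyr is not available. This is a minimal base-R version under stated assumptions: it handles a single client only, the input dates are valid month ends, and locf is a hypothetical helper name for "last observation carried forward".

```r
df <- data.frame(
  client_id = c(123, 123, 123),
  value     = c(100, 200, 300),
  repmonth  = as.Date(c("2012-01-31", "2012-02-29", "2012-05-31"))
)

# First day of every month between the earliest and latest input month
first_days <- seq(as.Date(format(min(df$repmonth), "%Y-%m-01")),
                  as.Date(format(max(df$repmonth), "%Y-%m-01")),
                  by = "month")

# Month end = first day of the following month minus one day
month_ends <- seq(first_days[1], by = "month",
                  length.out = length(first_days) + 1)[-1] - 1

# Left join: months without data show up as NA rows
full <- merge(data.frame(repmonth = month_ends), df, by = "repmonth", all.x = TRUE)

# Carry the last non-missing value forward (leading NAs stay NA)
locf <- function(v) c(NA, v[!is.na(v)])[cumsum(!is.na(v)) + 1]
full$client_id <- locf(full$client_id)
full$value     <- locf(full$value)
full
```

With several clients in one table, the join and the fill would have to be done per client, for example by splitting on client_id first.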

weekend dates within an interval R

I'm trying to identify whether or not a weekend fell within an interval of dates. I've been able to identify if a specific date is a weekend, but not when trying to look at a range of dates. Is this possible? If so, please advise. TIA.
library(lubridate)
library(chron) # is.weekend() comes from chron; library() loads one package per call
start.date <- c("1/1/2017", "2/1/2017")
end.date <- c("1/21/2017", "2/11/2017")
df <- data.frame(start.date, end.date)
df$start.date <- mdy(df$start.date)
df$end.date <- mdy(df$end.date)
df$interval.date <- interval(df$start.date, df$end.date)
df$weekend.exist <- ifelse(is.weekend(df$interval.date), 1, 0)
# Error in dts - floor(dts) :
# Arithmetic operators undefined for 'Interval' and 'Interval' classes:
# convert one to numeric or a matching time-span class.
Why not use a sequence of dates rather than creating an interval? For example:
df$weekend.exist <- sapply(1:nrow(df), function(i)
as.numeric(any(is.weekend(seq(df$start.date[i], df$end.date[i],by = "day")))))
# [1] 1 1
library(dplyr)
df %>%
group_by(start.date,end.date) %>%
mutate(weekend.exist = as.numeric(any(is.weekend(seq(start.date, end.date,by = "day")))))
# start.date end.date weekend.exist
# <date> <date> <dbl>
# 1 2017-01-01 2017-01-21 1
# 2 2017-02-01 2017-02-11 1
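The same check can be sketched in base R without chron, by expanding each range to daily dates and looking at the weekday number. In this sketch has_weekend is a hypothetical helper name, and a third range (1 to 3 February 2017, Wednesday to Friday) is added as made-up test data to show a case with no weekend.

```r
# TRUE if any day from start to end (inclusive) is a Saturday or Sunday
has_weekend <- function(start, end) {
  days <- seq(start, end, by = "day")
  # %u gives the weekday as "1" (Monday) to "7" (Sunday)
  any(format(days, "%u") %in% c("6", "7"))
}

start.date <- as.Date(c("2017-01-01", "2017-02-01", "2017-02-01"))
end.date   <- as.Date(c("2017-01-21", "2017-02-11", "2017-02-03"))

# Index by position so the elements keep their Date class
weekend.exist <- sapply(seq_along(start.date), function(i)
  as.numeric(has_weekend(start.date[i], end.date[i])))
weekend.exist
```

Any range of seven or more days necessarily contains a weekend, so the daily expansion only matters for shorter ranges.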
