Adding a random number of days to dates via some function - r

My data contains a column of order dates. It also has a column of delivery dates. Some of the delivery dates are a date (12/31/1990) that occurred before the order date, which is causing problems in calculating average shipping time. I would like to take the order date for these rows and add a random number of days from a uniform distribution.
First, I tried to write a function that I could apply to the data, but the result was not what I wanted. What I want is for the simulated delivery date to end up in the delivery date column.
func1 = function(x){
if(x[2]=="1990-12-31" && !is.na(x[2]))
x[2] = as.Date(x[1]) + floor(runif(1,min=0,max=30))
return (x)
}
Example data:
x <- structure(list(orderDate = structure(c(15706, 15706, 15706, 15706,
15706), class = "Date"), deliveryDate = structure(c(15707, 15707,
7669, 15707, 7669), class = "Date")), .Names = c("orderDate",
"deliveryDate"), row.names = c(NA, 5L), class = "data.frame")
# orderDate deliveryDate
#1 2013-01-01 2013-01-02
#2 2013-01-01 2013-01-02
#3 2013-01-01 1990-12-31
#4 2013-01-01 2013-01-02
#5 2013-01-01 1990-12-31

If I understand correctly, x is a data frame with 2 columns. A vectorized version of if is available as ifelse:
x[[2]] <- structure(ifelse(x[[2]] == "1990-12-31" & !is.na(x[[2]]),
as.Date(x[[1]]) + sample(0:30, 1),
x[[2]]),
class = "Date")
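Or a faster replacement, which also draws a separate random offset for each affected row (the ifelse version above adds the same single draw to every matching row):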
ind <- x[[2]] == "1990-12-31" & !is.na(x[[2]])
x[ind, 2] <- as.Date(x[ind, 1]) + sample(0:30, sum(ind), replace = TRUE)
With your example dataset, the result looks like this (the exact dates depend on the random seed and R version, and with per-row draws rows 3 and 5 need not match):
# orderDate deliveryDate
#1 2013-01-01 2013-01-02
#2 2013-01-01 2013-01-02
#3 2013-01-01 2013-01-28
#4 2013-01-01 2013-01-02
#5 2013-01-01 2013-01-28
In the first case, ifelse on its own returns plain numbers (the internal representation of "Date"), so we need to set the class back to "Date" afterwards.
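As a self-contained sketch of the indexed replacement, the snippet below rebuilds the example data and fixes every delivery date that falls before its order date. The names orders and bad are made up for illustration, the seed is arbitrary, and testing "delivery before order" instead of matching the literal 1990-12-31 is an assumption about what the asker wants.

```r
set.seed(42)  # arbitrary seed, only so the example is reproducible

# Rebuild the example data from the question (hypothetical object name)
orders <- data.frame(
  orderDate    = as.Date(rep("2013-01-01", 5)),
  deliveryDate = as.Date(c("2013-01-02", "2013-01-02", "1990-12-31",
                           "2013-01-02", "1990-12-31"))
)

# Rows whose delivery date precedes the order date (NA-safe)
bad <- !is.na(orders$deliveryDate) & orders$deliveryDate < orders$orderDate

# One independent draw from 0..30 days for each affected row
orders$deliveryDate[bad] <- orders$orderDate[bad] +
  sample(0:30, sum(bad), replace = TRUE)

orders
```

Because the replacement is assigned into an existing Date column, the class is preserved and no structure() call is needed.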

Related

r: How to calculate duration in hh:mm:ss to a fixed time in hours but overlapping midnight

I have the following hh:mm:ss defined in df$times
times
1 22:55:00
2 01:05:00
3 21:00:00
I want to calculate the duration from 20:00:00 to each of these times. However, some times are after midnight, and in that case the duration should be measured from 20:00:00 to the time on the following day.
Expected output
times new
1 22:55:00 2.92
2 01:05:00 5.08
3 21:00:00 -1.00
Data
df <- structure(list(times = c("22:55:00", "01:05:00", "21:00:00")), class = "data.frame", row.names = c(NA,
-3L))
We convert to the Period class with hms and check whether each value in 'times' is earlier than the reference time. If it is, the time falls after midnight, so we add 24 hours before subtracting the reference; otherwise we just subtract.
library(dplyr)
library(lubridate)
df %>%
mutate(times1 = hms(times), times2 = hms("20:00:00"),
new = as.numeric(case_when(times1 < times2 ~
(times1 + hms("24:00:00") - times2), TRUE ~ times2- times1))/3600 ) %>%
select(times, new)
# times new
#1 22:55:00 -2.916667
#2 01:05:00 5.083333
#3 21:00:00 -1.000000
Another approach uses as.ITime from data.table:
library(dplyr)
library(data.table)
mydf <- structure(list(times = c("22:55:00", "01:05:00", "21:00:00")), class = "data.frame", row.names = c(NA,
-3L))
mydf %>%
mutate(
times = as.ITime(times),
difference = case_when (
times > as.ITime("20:00:00") ~ difftime(times, as.ITime("20:00:00")),
TRUE ~ difftime(as.ITime("23:59:59"), as.ITime("19:59:59"))
+ difftime( times, as.ITime("00:00:01"))
)
)
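For comparison, here is a minimal base R sketch that handles the midnight wrap with modulo arithmetic instead of a case distinction. The helper names to_seconds and hours_since are hypothetical, and the sketch assumes every time is reported as hours elapsed since the reference, so 21:00:00 comes out as +1 and not -1 as in the question's expected output.

```r
# Convert "hh:mm:ss" strings to seconds since midnight (hypothetical helper)
to_seconds <- function(x) {
  parts <- do.call(rbind, strsplit(x, ":", fixed = TRUE))
  as.numeric(parts[, 1]) * 3600 +
    as.numeric(parts[, 2]) * 60 +
    as.numeric(parts[, 3])
}

# Hours elapsed from `ref` to each time, wrapping past midnight
hours_since <- function(times, ref = "20:00:00") {
  ((to_seconds(times) - to_seconds(ref)) %% 86400) / 3600
}

times <- c("22:55:00", "01:05:00", "21:00:00")
elapsed <- hours_since(times)
round(elapsed, 2)
```

The %% operator in R always returns a non-negative result for a positive modulus, which is what moves times after midnight onto the following day.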

How can I add 1 to a column in R when a condition is met?

I am trying to fill a new column in a data frame (in R) based on the following conditional:
df$B<- ifelse(difftime(df$A,lag(df$A))>minutes(30), increment(1), increment(0))
Here, the A column is time. So in A, every time the time difference between row i and row i-1 is greater than 30 minutes, I increment the new column B by one.
Example:
A B
1:00 1
1:31 2
1:40 2
2:30 3
Any help is greatly appreciated, thank you.
In base R, you can use cumsum with difftime:
df$B <- cumsum(c(TRUE, difftime(df$A[-1], df$A[-nrow(df)], units = 'mins') > 30))
df
# A B
#1 2020-02-03 01:00:00 1
#2 2020-02-03 02:00:00 2
#3 2020-02-03 02:15:00 2
#4 2020-02-03 03:00:00 3
Data
Make sure class(df$A) returns "POSIXct":
df <- structure(list(A = structure(c(1580691600, 1580695200, 1580696100,
1580698800), class = c("POSIXct", "POSIXt"), tzone = "UTC")),
class = "data.frame", row.names = c(NA, -4L))
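The same idea can be checked against the times in the question. This is a sketch only: the date 2020-02-03 and the UTC time zone are assumptions added to turn the clock times into POSIXct values, and visits and gap_mins are illustrative names.

```r
# The question's times, placed on an arbitrary date (assumption)
visits <- data.frame(
  A = as.POSIXct(c("2020-02-03 01:00:00", "2020-02-03 01:31:00",
                   "2020-02-03 01:40:00", "2020-02-03 02:30:00"),
                 tz = "UTC")
)

# Gap to the previous row, in minutes
gap_mins <- as.numeric(diff(visits$A), units = "mins")

# Start at 1 and add 1 whenever the gap exceeds 30 minutes
visits$B <- cumsum(c(TRUE, gap_mins > 30))
visits
```

Converting the difftime to a plain number with explicit units avoids surprises when R picks hours or seconds as the default unit.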

Prophet Date Format R

year_month amount_usd
201501 -390217.24
201502 230944.09
201503 367259.69
201504 15000.00
201505 27000.21
201506 38249.65
df <- structure(list(year_month = 201501:201506, amount_usd = c(-390217.24,
230944.09, 367259.69, 15000, 27000.21, 38249.65)), class = "data.frame", row.names = c(NA,
-6L))
I want to convert it to DD/MM/YYYY format so I can use it in Prophet forecasting code.
This is what I have tried so far.
for (loopitem in loopvec){
df2 <- subset(df, account_id==loopitem)
df3 <- df2[,c("year_month","amount_usd")]
df3$year_month <- as.Date(df3$year_month, format="YYYY-MM", origin="1/1/1970")
try <- prophet(df3, seasonality.mode = 'multiplicative')
}
Error in fit.prophet(m, df, ...) :
Dataframe must have columns 'ds' and 'y' with the dates and values respectively.
You need to paste a day number (I'm just using the first of the month) onto the year_month values; then you can use the ymd() function from lubridate to convert the column to a Date object.
library(dplyr)
library(lubridate)
mutate_at(df, "year_month", ~ymd(paste(., "01")))
year_month amount_usd
1 2015-01-01 -390217.24
2 2015-02-01 230944.09
3 2015-03-01 367259.69
4 2015-04-01 15000.00
5 2015-05-01 27000.21
6 2015-06-01 38249.65
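The error message in the question also says that the data frame must have columns named 'ds' and 'y'. A minimal base R sketch that converts the dates and renames the columns in one step might look like this; sales and prophet_input are illustrative names, and the call to prophet itself is left out.

```r
# Example data from the question (hypothetical object name)
sales <- data.frame(
  year_month = 201501:201506,
  amount_usd = c(-390217.24, 230944.09, 367259.69, 15000, 27000.21, 38249.65)
)

# Append "01" as the day, parse as YYYYMMDD, and use the column
# names that the error message asks for
prophet_input <- data.frame(
  ds = as.Date(paste0(sales$year_month, "01"), format = "%Y%m%d"),
  y  = sales$amount_usd
)

prophet_input
```

A Date column prints as YYYY-MM-DD regardless of how it was parsed; the DD/MM/YYYY layout only matters if the dates are later formatted as text.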

convert quarter year to last date of quarter in R

I have an issue when I use as.Date(as.yearqtr(test[,1],format ="%qQ%Y"),frac =1): it returns an error, and the quarter-year values are not converted to dates. The error is:
error in as.yearqtr(as.numeric(x)) (list) object cannot be coerced to type 'double'
This is my dataframe in R.
TIME VALUE
1Q2019 1
2Q2019 2
3Q2019 3
4Q2019 4
The ideal output is
TIME VALUE
2019-03-31 1
2019-06-30 2
2019-09-30 3
2019-12-31 4
We can convert to Date with zoo and get the last date of the quarter with frac = 1. A regular expression first rearranges the strings into a format that as.yearqtr parses by default ("2019 Q1"):
library(zoo)
df$TIME=as.Date(as.yearqtr(gsub("(\\d)(Q)(\\d{1,})","\\3 Q\\1",df$TIME)),frac = 1)
df
TIME VALUE
1 2019-03-31 1
2 2019-06-30 2
3 2019-09-30 3
4 2019-12-31 4
Data:
df <-structure(list(TIME = structure(1:4, .Label = c("1Q2019", "2Q2019",
"3Q2019", "4Q2019"), class = "factor"), VALUE = 1:4), class = "data.frame", row.names = c(NA,
-4L))
Here is a function that returns a vector of dates, given an input vector of strings in the form 1Q2019:
dateStrings <- c("1Q2019","2Q2019","3Q2019","4Q2019","1Q2020")
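lastDayOfQuarter <- function(x){
library(lubridate)
# month and day on which each quarter ends
months <- c(3,6,9,12)
days <- c(31,30,30,31)
result <- numeric(length(x))
for(i in seq_along(x)) {
qtr <- as.numeric(substr(x[i],1,1))
# storing a Date in a numeric vector keeps only the day count
result[i] <- mdy(paste(months[qtr],days[qtr],substr(x[i],3,6),sep="-"))
}
# restore the Date class from days since 1970-01-01
as.Date(result, origin = "1970-01-01")
}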
lastDayOfQuarter(dateStrings)
and the output:
[1] "2019-03-31" "2019-06-30" "2019-09-30" "2019-12-31" "2020-03-31"
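A loop-free base R variant is sketched below. Instead of a lookup table of month lengths, it builds the first day of the following quarter and subtracts one day. The function name last_day_of_quarter is made up, and the sketch assumes every input has exactly the form "1Q2019" (one quarter digit, "Q", four year digits).

```r
# Last day of each quarter for strings like "1Q2019" (hypothetical helper)
last_day_of_quarter <- function(x) {
  x    <- as.character(x)            # also accepts factors
  qtr  <- as.integer(substr(x, 1, 1))
  year <- as.integer(substr(x, 3, 6))

  # First month of the following quarter; Q4 rolls over into next January
  next_month <- qtr * 3L + 1L
  roll <- next_month > 12L
  year[roll] <- year[roll] + 1L
  next_month[roll] <- 1L

  # First day of the next quarter minus one day
  as.Date(sprintf("%d-%02d-01", year, next_month)) - 1
}

quarter_ends <- last_day_of_quarter(c("1Q2019", "2Q2019", "3Q2019",
                                      "4Q2019", "1Q2020"))
quarter_ends
```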

Shaping Interval Dates and Range Dates in Same DF

I'm trying to calculate how long one person stays in a homeless shelter using R. The homeless shelter has two different types of check-ins, one for overnight stays and another for long-term stays. I would like to reshape the data to get one EntryDate and ExitDate for every stay that is not interrupted by a break of at least one day.
Here are what the data currently look like:
PersonalID EntryDate ExitDate
1 2016-12-01 2016-12-02
1 2016-12-03 2016-12-04
1 2016-12-16 2016-12-17
1 2016-12-17 2016-12-18
1 2016-12-18 2016-12-19
2 2016-10-01 2016-10-20
2 2016-10-21 2016-10-22
3 2016-09-01 2016-09-02
3 2016-09-20 2016-09-21
Ultimately, I'm trying to get the above data to represent continuous ranges so I can calculate total length of stay by participant.
For example, the above data would become:
PersonalID EntryDate ExitDate
1 2016-12-01 2016-12-04
1 2016-12-16 2016-12-19
2 2016-10-01 2016-10-22
3 2016-09-01 2016-09-02
3 2016-09-20 2016-09-21
Here is an ugly solution. There is probably a cleaner way to do it, but it works. It should also be checked against real data (I have added one line to your example to cover more situations).
d <- read.table(text = '
PersonalID EntryDate ExitDate
1 2016-12-01 2016-12-02
1 2016-12-03 2016-12-04
1 2016-12-16 2016-12-17
1 2016-12-17 2016-12-18
1 2016-12-18 2016-12-19
2 2016-10-01 2016-10-20
2 2016-10-21 2016-10-22
3 2016-09-01 2016-09-02
3 2016-09-20 2016-09-21
4 2016-09-20 2016-09-21
', header = TRUE)
#' Convert to Date format
d$EntryDate <- as.Date(as.character(d$EntryDate))
d$ExitDate <- as.Date(as.character(d$ExitDate))
summary(d)
#' Reorder to be sure that the ExitDate / Entry date are in chronological order
d <- d[order(d$PersonalID, d$EntryDate),]
#' Add a column that will store the number of days between one exit and the next entry
d$nbdays <- 9999
# Split to have a list with dataframe for each ID
d <- split(d, d$PersonalID)
d
for(i in 1:length(d)) {
# Compute number of days between one exit and the next entry (only if there are
# more than one entry)
if(nrow(d[[i]])>1) {
d[[i]][-1,"nbdays"] <- d[[i]][2:nrow(d[[i]]),"EntryDate"] -
d[[i]][1:(nrow(d[[i]])-1),"ExitDate"]
}
x <- d[[i]] # store a copy of the data to lighten the syntax
# Entry dates that come more than 1 day after the previous exit (including the first one)
entr <- x[x$nbdays>1,"EntryDate"]
# Exit dates just before cases where nbdays are > 1 and includes the last exit date.
# We use unique to avoid picking 2 times the last exit
whichexist <- unique(c(c(which(x$nbdays > 1)-1)[-1],nrow(x)))
exit <- x[whichexist,"ExitDate"]
d[[i]] <- data.frame(
PersonalID = x[1,1],
EntryDate = entr,
ExitDate = exit
)
}
# paste the elements of this list into one data.frame
do.call(rbind, d)
Here is a solution using dplyr.
library(dplyr)
d = structure(list(PersonalID = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 3L,
3L), EntryDate = structure(c(17136, 17138, 17151, 17152, 17153,
17075, 17095, 17045, 17064), class = "Date"), ExitDate = structure(c(17137,
17139, 17152, 17153, 17154, 17094, 17096, 17046, 17065), class = "Date")), class = "data.frame", .Names = c("PersonalID",
"EntryDate", "ExitDate"), row.names = c(NA, -9L))
First create a temporary dataframe to hold all the dates between entry and exit date:
d2 = d %>%
rowwise() %>%
do(data.frame(PersonalID = .$PersonalID, Present = seq(.$EntryDate, .$ExitDate, by = 'day'))) %>%
unique %>% ## remove double dates when exit and re-entry occur on the same day
ungroup()
Then look for all the consecutive dates, with some inspiration from https://stackoverflow.com/a/14868742/827766
d2 %>%
group_by(PersonalID) %>%
mutate(delta = c(1, diff(as.Date(Present)))) %>%
group_by(PersonalID, stay = cumsum(delta!=1)) %>%
summarize(EntryDate = min(Present), ExitDate = max(Present)) %>%
subset(select = -c(stay))
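The grouping step can also be sketched in base R without expanding every stay into individual days. The idea, under the assumption that a gap of at most one day between an exit and the next entry counts as the same stay, is to flag each row that starts a new stay and take a running sum of the flags. The names stays, new_stay and merged are illustrative.

```r
# Example data from the question (hypothetical object name)
stays <- data.frame(
  PersonalID = c(1, 1, 1, 1, 1, 2, 2, 3, 3),
  EntryDate  = as.Date(c("2016-12-01", "2016-12-03", "2016-12-16",
                         "2016-12-17", "2016-12-18", "2016-10-01",
                         "2016-10-21", "2016-09-01", "2016-09-20")),
  ExitDate   = as.Date(c("2016-12-02", "2016-12-04", "2016-12-17",
                         "2016-12-18", "2016-12-19", "2016-10-20",
                         "2016-10-22", "2016-09-02", "2016-09-21"))
)
stays <- stays[order(stays$PersonalID, stays$EntryDate), ]

# Exit date and person of the previous row
prev_exit   <- c(as.Date(NA), head(stays$ExitDate, -1))
same_person <- c(FALSE, head(stays$PersonalID, -1) == tail(stays$PersonalID, -1))

# A new stay starts with a new person or after a gap of more than 1 day
new_stay <- !same_person | as.numeric(stays$EntryDate - prev_exit) > 1
stays$stay <- cumsum(new_stay)

# Collapse each stay to its first entry and last exit
merged <- do.call(rbind, lapply(split(stays, stays$stay), function(s) {
  data.frame(PersonalID = s$PersonalID[1],
             EntryDate  = min(s$EntryDate),
             ExitDate   = max(s$ExitDate))
}))
rownames(merged) <- NULL
merged
```

Total length of stay per participant then follows from summing ExitDate - EntryDate within each PersonalID.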
