calculate age in years and months and melt data - r

I'm working with some time data and I'm having problems converting a time difference to years and months.
My data looks more or less like this,
dfn <- data.frame(
Today = Sys.time(),
DOB = seq(as.POSIXct('2007-03-27 00:00:01'), len= 26, by="3 day"),
Patient = factor(1:26, labels = LETTERS))
First I subtract the date of birth (DOB) from today's date (Today).
dfn$ageToday <- dfn$Today - dfn$DOB
This gives me the Time difference in days.
dfn$ageToday
Time differences in days
[1] 1875.866 1872.866 1869.866 1866.866 1863.866
[6] 1860.866 1857.866 1854.866 1851.866 1848.866
[11] 1845.866 1842.866 1839.866 1836.866 1833.866
[16] 1830.866 1827.866 1824.866 1821.866 1818.866
[21] 1815.866 1812.866 1809.866 1806.866 1803.866
[26] 1800.866
attr(,"tzone")
[1] ""
This is where the first part of my question comes in: how do I convert this difference to years and months (rounded to months)? (i.e. 4.7, 4.11, etc.)
I read the ?difftime man page and the ?format, but I did not figure it out.
Any help would be appreciated.
Furthermore, I would like to melt my final object and if I try using melt on the data frame above using this command,
require(plyr)
require(reshape)
mdfn <- melt(dfn, id=c('Patient'))
I get this strange error I haven't seen before
Error in as.POSIXct.default(value) :
do not know how to convert 'value' to class "POSIXct"
So, my second question is: how do I create a time difference I can melt alongside my POSIXct variables? If I melt without dfn$ageToday everything works like a charm.
Thanks, Eric

The lubridate package makes working with dates and times, including finding time differences, really easy.
library("lubridate")
library("reshape2")
dfn <- data.frame(
Today = Sys.time(),
DOB = seq(as.POSIXct('2007-03-27 00:00:01'), len= 26, by="3 day"),
Patient = factor(1:26, labels = LETTERS))
dfn$diff <- new_interval(dfn$DOB, dfn$Today) / duration(num = 1, units = "years")
mdfn <- melt(dfn, id=c('Patient'))
class(mdfn$value) # all values are coerced into numeric
The new_interval() function (renamed interval() in later lubridate releases) calculates the time span between two dates. Note that there is a function today() that could substitute for your use of Sys.time(). Finally, note the duration() function, which creates a standard, ehm, duration that you can divide the interval by, in this case a unit of one year.
In case you want to preserve the contents of Today and DOB, then you may want to convert everything to character first and reconvert later...
library("lubridate")
library("reshape2")
dfn <- data.frame(
Today = Sys.time(),
DOB = seq(as.POSIXct('2007-03-27 00:00:01'), len= 26, by="3 day"),
Patient = factor(1:26, labels = LETTERS))
# Create standard durations for a year and a month
one.year <- duration(num = 1, units = "years")
one.month <- duration(num = 1, units = "months")
# Calculate the difference in years as float and integer
dfn$diff.years <- new_interval(dfn$DOB, dfn$Today) / one.year
dfn$years <- floor( new_interval(dfn$DOB, dfn$Today) / one.year )
# Calculate the modulo for number of months
dfn$diff.months <- round( new_interval(dfn$DOB, dfn$Today) / one.month )
dfn$months <- dfn$diff.months %% 12
# Paste the years and months together
# I am not using the decimal point so as not to imply this is
# a numeric representation of the difference
dfn$y.m <- paste(dfn$years, dfn$months, sep = '|')
# convert Today and DOB to character so as to preserve them in melting
dfn$Today <- as.character(dfn$Today)
dfn$DOB <- as.character(dfn$DOB)
# melt using string representation of difference between the two dates
dfn2 <- dfn[,c("Today", "DOB", "Patient", "y.m")]
mdfn2 <- melt(dfn2, id=c('Patient'))
# alternative melt using numeric representation of difference in years
dfn3 <- dfn[,c("Today", "DOB", "Patient", "diff.years")]
mdfn3 <- melt(dfn3, id=c('Patient'))
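For comparison, here is a minimal base R sketch of the same years-and-months calculation without lubridate. It counts completed calendar months (it floors rather than rounds) and ignores the time of day. The function name age_in_months and the example dates are made up for illustration.

```r
# Completed whole months between two dates, using base R only
age_in_months <- function(dob, today) {
  d <- as.POSIXlt(dob)
  t <- as.POSIXlt(today)
  months <- (t$year - d$year) * 12 + (t$mon - d$mon)
  # Drop one month when the birthday's day-of-month has not been reached yet
  months - (t$mday < d$mday)
}

dob   <- as.Date(c("2007-03-27", "2007-06-10"))
today <- as.Date("2012-05-15")
m <- age_in_months(dob, today)

# Split into years and leftover months: 5 years 1 month, 4 years 11 months
age <- data.frame(dob, years = m %/% 12, months = m %% 12)
age
```

Because years and months both come from the same whole-month count, the two columns cannot disagree with each other at a boundary.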

Related

Create n different dates in consecutive months from a starting year-month

I have a starting time specified as a year-month character, e.g. "2020-12". From the start, for each of T consecutive months, I need to generate n different dates (year-month-day), where the day is random.
Any help will be useful!
The data I'm working on:
data <- data.frame(
data = sample(seq(as.Date('2000/01/01'), as.Date('2020/01/01'), by="day"), 500),
price = round(runif(500, min = 10, max = 20),2),
quantity = round(rnorm(500,30),0)
)
func <- function(start, months, n) {
startdate <- as.Date(paste0(start, "-01"))
enddate <- seq(startdate, by = "month", length.out = months)
months <- seq_len(months)
enddate_lt <- as.POSIXlt(enddate)
enddate_lt$mon <- enddate_lt$mon + 1
enddate_lt$mday <- enddate_lt$mday - 1
days_per_month <- as.integer(format(enddate_lt, format = "%d"))
days <- lapply(days_per_month, sample, size = n)
dates <- Map(`+`, enddate, days)
do.call(c, dates)
}
set.seed(2021)
func("2020-12", 4, 3)
# [1] "2020-12-08" "2020-12-07" "2020-12-15" "2021-01-27" "2021-01-08" "2021-01-13" "2021-02-21" "2021-02-07" "2021-02-28"
# [10] "2021-03-28" "2021-03-07" "2021-03-15"
func("2020-12", 5, 2)
# [1] "2020-12-06" "2020-12-16" "2021-01-08" "2021-01-10" "2021-02-24" "2021-02-13" "2021-03-20" "2021-03-29" "2021-04-19"
# [10] "2021-04-28"
func("2020-12", 2, 10)
# [1] "2020-12-29" "2020-12-30" "2020-12-04" "2020-12-15" "2020-12-09" "2020-12-27" "2020-12-05" "2020-12-06" "2020-12-23"
# [10] "2020-12-17" "2021-01-03" "2021-01-20" "2021-01-05" "2021-01-22" "2021-01-23" "2021-01-06" "2021-01-10" "2021-01-07"
# [19] "2021-01-19" "2021-01-12"
Most of the dancing with POSIXlt objects is because it gives us clean (base R) access to the number of days in a month, which makes sampling the days in a month rather simple. It can also be done (code-golf shorter) using the lubridate package, but I don't know that that is any more correct than this code is.
This just dumps out a sequence of random dates, with n days per month. It does not sort within each month, though it does output the months in order. (That's not a difficult extension; there just wasn't a requirement for it.) It doesn't return a data frame, but you can easily extend it to fit into one, or call data.frame(date = do.call(c, dates)) on the last line, depending on what you need to do with the output.
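One detail that may be worth checking in the function above: the sampled values run from 1 to the number of days in the month and are added to the first of the month, so the 1st itself can never be drawn and the first day of the following month can. The sketch below is a variant, assuming that is not intended. It samples offsets from 0 to days - 1 and sorts within each month. The name random_dates is hypothetical, and n must not exceed the number of days in the shortest month.

```r
random_dates <- function(start, months, n) {
  # First day of each requested month
  first <- seq(as.Date(paste0(start, "-01")), by = "month", length.out = months)
  # First day of the month after each one, used to count days per month
  nxt <- seq(first[1], by = "month", length.out = months + 1)[-1]
  ndays <- as.integer(nxt - first)
  out <- lapply(seq_along(first), function(i) {
    # Day-of-month in 1..ndays, converted to an offset 0..ndays-1
    first[i] + sort(sample.int(ndays[i], n)) - 1L
  })
  do.call(c, out)
}

set.seed(1)
x <- random_dates("2020-12", 4, 3)
x
```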
You could convert the start time to a class for monthly data, zoo::yearmon. Then use as.Date.yearmon and its frac argument ("a number between 0 and 1 inclusive that indicates the fraction of the way through the period that the result represents") with random values from runif (uniform between 0 and 1) to convert to a random date within each year-month.
start = "2020-12"
T = 3
n = 2
library(zoo)
set.seed(1)
as.Date(as.yearmon(start) + rep((1:T)/12, each = n), frac = runif(T * n))
# [1] "2021-01-08" "2021-01-12" "2021-02-16" "2021-02-25" "2021-03-07" "2021-03-27"

Construction of a random date generator with weighted weekends in R

I am performing creel surveys and am attempting to construct a random date generator that weights the weekends higher than the weekdays. So far I have a simplistic random date generator that does not take into account the day type. We expect more angling pressure on the weekends (as that is when more people have time to fish) but do not have a way to select random days without including bias. I would like to select 15 days within a given month.
I've already generated a simplistic random date generator:
dates <- data.frame(seq.Date(as.Date(day.start),as.Date(day.end),by="day"))
dates
sample(dates$seq.Date.as.Date.day.start...as.Date.day.end...by....day.., size = 15, replace = FALSE)
[1] "2019-11-10" "2019-11-06" "2019-11-04" "2019-11-27" "2019-11-30" "2019-11-15"
[7] "2019-11-18" "2019-11-21" "2019-11-13" "2019-11-01" "2019-11-19" "2019-11-25"
[13] "2019-11-07" "2019-11-02" "2019-11-23"
Ideally I would have an end product that allows me to input the month start and end and outputs 15 random days.
Explanation in comments in code below:
# Generate initial data; as in question
day_start <- as.Date("2010-10-01")
day_end <- as.Date("2010-10-31")
dates <- data.frame(date = seq.Date(day_start, day_end,by="day"))
# Determine inclusion probabilities for each date; give weekend a higher
# probability.
dates$day <- as.numeric(format(dates$date, "%u"))
dates$psamp <- ifelse(dates$day >= 6, 0.2, 0.1)
# Make sure probabilities add up to the required sample size
samplesize <- 15
dates$psamp <- dates$psamp * samplesize/sum(dates$psamp)
# Do not use sample for sampling without replacement with unequal probabilities!
# The sampling package has a large number of routines for sampling without
# replacement and unequal probabilities. The following gives a fixed size sample
# (sum dates$psamp)
library(sampling)
dates$selected <- UPrandomsystematic(dates$psamp)
As for the reason why I don't use sample see, for example, https://stat.ethz.ch/pipermail/r-help/2008-February/153601.html.
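To make the idea behind a fixed-size, unequal-probability sample more concrete, here is a base R sketch of random systematic sampling: shuffle the units, lay their inclusion probabilities end to end, and pick every unit whose segment contains one of the points u, u + 1, u + 2, and so on. This is an illustration of the general technique under the same weights as above, not the sampling package's implementation. The variable names are made up.

```r
dates <- seq(as.Date("2010-10-01"), as.Date("2010-10-31"), by = "day")
is_weekend <- format(dates, "%u") %in% c("6", "7")  # ISO weekday: 6 = Sat, 7 = Sun

n <- 15
pik <- ifelse(is_weekend, 0.2, 0.1)
pik <- pik * n / sum(pik)             # inclusion probabilities, summing to n
stopifnot(all(pik <= 1))              # each unit can be selected at most once

set.seed(123)
perm  <- sample(length(pik))          # random order of the units
upper <- cumsum(pik[perm])            # right end of each unit's segment
lower <- c(0, upper[-length(upper)])  # left end of each unit's segment
u <- runif(1)                         # random start in (0, 1)

# A unit is hit when one of the points u, u + 1, ... lies in its segment
hit <- floor(upper - u) > floor(lower - u)
selected <- sort(dates[perm[hit]])
selected
```

Because the probabilities sum to exactly n and none exceeds 1, this always returns n distinct dates, with each weekend day twice as likely to be included as a weekday.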
Here's a somewhat general function that does what you want. It takes the start day, end day, and the weight (relative to 1) that you want to put on weekends as its own arguments, and passes on other additional arguments (size, replace, etc.) to sample. No dependencies other than base R.
However, if sampling without replacement, you may want to use the sampling package as recommended in Jan van der Laan's answer.
rday = function(
start_day = as.Date("2019-01-01"),
end_day = as.Date("2019-01-31"),
weekend_weight = 2,
...
) {
if (! "Date" %in% class(start_day)) start_day = as.Date(start_day)
if (! "Date" %in% class(end_day)) end_day = as.Date(end_day)
dates = seq(start_day, end_day, by = "1 day")
weights = rep(1, length(dates))
weights[weekdays(dates) %in% c("Saturday", "Sunday")] = weekend_weight
sample(dates, ..., prob = weights)
}
rday(size = 15)
# [1] "2019-01-24" "2019-01-07" "2019-01-21" "2019-01-15" "2019-01-27" "2019-01-04" "2019-01-30" "2019-01-12"
# [9] "2019-01-11" "2019-01-08" "2019-01-20" "2019-01-01" "2019-01-03" "2019-01-19" "2019-01-29"
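A quick way to sanity-check the weighting is to draw many dates with replacement and compare the observed weekend share with the expected one. The sketch below does this for January 2019 with a weekend weight of 2. It uses format(dates, "%u") instead of weekdays(), since weekdays() returns names in the current locale. The variable names are illustrative only.

```r
dates <- seq(as.Date("2019-01-01"), as.Date("2019-01-31"), by = "day")
is_weekend <- format(dates, "%u") %in% c("6", "7")  # locale-independent weekend test
w <- ifelse(is_weekend, 2, 1)

# Expected share of weekend draws when sampling with replacement
expected <- sum(w[is_weekend]) / sum(w)

set.seed(42)
draws <- sample(dates, 1e5, replace = TRUE, prob = w)
observed <- mean(format(draws, "%u") %in% c("6", "7"))

c(expected = expected, observed = observed)
```

This check only applies to sampling with replacement. Without replacement, the realized inclusion probabilities from sample() are not proportional to the weights, which is the concern raised in the other answer.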

Difficulty aggregating R POSIXlt list :Invalid type list error message

I am trying to aggregate quarter-hourly data but I am getting the error message invalid type (list). The list is a POSIXlt list and I have aggregated minutely and hourly data before but I have never seen this error before. Do I need to convert the list to a different type and if so, would I still be able to extract the 15min data? Here is my code, I would really appreciate any help:
seq_start <- as.POSIXct("2015-09-10 01:00:00 BST")
Arrivals <- floor(runif(60, min = 1, max = 14))
Minute_Seq <- seq(trunc(seq_start, units='mins'), by='1 mins',length = 60)
Arrival_board = data.frame(Minute_Seq,Arrivals)
Arrival_board$QTR= as.POSIXlt(round(as.double(Arrival_board$Minute_Seq)/(5*60))*(5*60),origin=(as.POSIXlt('1970-01-01')))
arrive_stats <- aggregate(Arrival_board$Arrivals ~ Arrival_board$QTR, Arrival_board, FUN=mean)
POSIXlt is a list type, use POSIXct instead:
aggregate(Arrivals ~ QTR, transform(Arrival_board, QTR=as.POSIXct(QTR)), FUN=mean)
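The difference between the two classes can be seen with typeof(). The sketch below is self-contained and uses made-up names (board, qtr). It bins to 15 minutes by flooring the epoch seconds, which is an assumption on my part; the question's code rounds to 5 minutes instead.

```r
set.seed(1)
minute_seq <- seq(as.POSIXct("2015-09-10 01:00:00", tz = "UTC"),
                  by = "1 min", length.out = 60)
board <- data.frame(minute = minute_seq,
                    arrivals = sample(1:13, 60, replace = TRUE))

typeof(as.POSIXlt(minute_seq))  # "list": a list of components (sec, min, hour, ...)
typeof(minute_seq)              # "double": seconds since 1970-01-01

# Floor each timestamp to the start of its 15-minute bin, staying in POSIXct
board$qtr <- as.POSIXct(floor(as.numeric(board$minute) / 900) * 900,
                        origin = "1970-01-01", tz = "UTC")

stats <- aggregate(arrivals ~ qtr, data = board, FUN = mean)
stats
```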
Here is an alternative to binning your data via your QTR expression. It uses seq (on POSIXct values) and cut, which is more straightforward than dividing, rounding, and multiplying:
seq_start <- as.POSIXct("2015-09-10 01:00:00 BST")
Arrivals <- floor(runif(60, min = 1, max = 14))
Minute_Seq <- seq(trunc(seq_start, units='mins'), by='1 mins',length = 60)
Arrival_board = data.frame(Minute_Seq,Arrivals)
QTR= seq(trunc(seq_start, units='mins'), by='5 mins',length = 13)
Arrival_board$QTR = cut(Arrival_board$Minute_Seq,QTR)
arrive_stats <- aggregate(Arrival_board$Arrivals ~ QTR, Arrival_board, FUN=mean)
Due to the difference in how the bins are defined, there will be a slight shift in the results. To correct for this with a 5-minute window, move the seq_start time back by 2 minutes:
QTR= seq(trunc(seq_start-(2*60), units='mins'), by='5 mins',length = 14)

R Programming 30 day Months

I'm currently writing a script in the R Programming Language and I've hit a snag.
I have time series data organized in a way where there are 30 days in each month for 12 months in 1 year. However, I need the data organized in a proper 365 days in a year calendar, as in 30 days in a month, 31 days in a month, etc.
Is there a simple way for R to recognize there are 30 days in a month and to operate within that parameter? At the moment I have my script converting the number of days from the source in UNIX time and it counts up.
For example:
startingdate <- "20060101"
endingdate <- "20121230"
date <- seq(from = as.Date(startingdate, "%Y%m%d"), to = as.Date(endingdate, "%Y%m%d"), by = "days")
This would generate an array of dates with each month having 29 days/30 days/31 days etc. However, my data is currently organized as 30 days per month, regardless of 29 days or 31 days present.
Thanks.
The first 4 solutions are basically variations on the same theme using expand.grid. (3) uses magrittr and the others use no packages. The last two work by creating a long sequence of numbers and then picking out the ones that have month and day in range.
1) apply This gives a series of yyyymmdd numbers such that there are 30 days in each month. Note that the line defining yrs in this case is the same as yrs <- 2006:2012 so if the years are handy we could shorten that line. Omit as.numeric in the line defining s if you want character string output instead. Also, s and d are the same because we have whole years so we could omit the line defining d and use s as the answer in this case and also in general if we are always dealing with whole years.
startingdate <- "20060101"
endingdate <- "20121230"
yrs <- seq(as.numeric(substr(startingdate, 1, 4)), as.numeric(substr(endingdate, 1, 4)))
g <- expand.grid(yrs, sprintf("%02d", 1:12), sprintf("%02d", 1:30))
s <- sort(as.numeric(apply(g, 1, paste, collapse = "")))
d <- s[ s >= startingdate & s <= endingdate ] # optional if whole years
Run some checks.
head(d)
## [1] 20060101 20060102 20060103 20060104 20060105 20060106
tail(d)
## [1] 20121225 20121226 20121227 20121228 20121229 20121230
length(d) == length(2006:2012) * 12 * 30
## [1] TRUE
2) no apply An alternative variation would be this. In this and the following solutions we are using yrs as calculated in (1) so we omit it to avoid redundancy. Also, in this and the following solutions, the corresponding line to the one setting d is omitted, again, to avoid redundancy -- if you don't have whole years then add the line defining d in (1) replacing s in that line with s2.
g2 <- expand.grid(yr = yrs, mon = sprintf("%02d", 1:12), day = sprintf("%02d", 1:30))
s2 <- with(g2, sort(as.numeric(paste0(yr, mon, day))))
3) magrittr This could also be written using magrittr like this:
library(magrittr)
expand.grid(yr = yrs, mon = sprintf("%02d", 1:12), day = sprintf("%02d", 1:30)) %>%
with(paste0(yr, mon, day)) %>%
as.numeric %>%
sort -> s3
4) do.call Another variation.
g4 <- expand.grid(yrs, 1:12, 1:30)
s4 <- sort(as.numeric(do.call("sprintf", c("%d%02d%02d", g4))))
5) subset sequence Create a sequence of numbers from the starting date to the ending date and if each number is of the form yyyymmdd pick out those for which mm and dd are in range.
seq5 <- seq(as.numeric(startingdate), as.numeric(endingdate))
d5 <- seq5[ seq5 %/% 100 %% 100 %in% 1:12 & seq5 %% 100 %in% 1:30]
6) grep Using seq5 from (5)
d6 <- as.numeric(grep("(0[1-9]|1[0-2])(0[1-9]|[12][0-9]|30)$", seq5, value = TRUE))
Here's an alternative:
date <- unclass(startingdate):unclass(endingdate) %% 30L
month <- rep(1:12, each = 30, length.out = NN <- length(date))
year <- rep(1:(NN %/% 360 + 1), each = 360, length.out = NN)
(of course, we can easily adjust by adding constants to taste if you want a specific day to be 0, or a specific month, etc.)
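The same 360-day calendar can also be built with plain integer arithmetic on a running day index, which avoids expand.grid and sorting. This is a sketch assuming whole years starting at 2006-01-01, as in the question. The variable names are illustrative.

```r
n_years <- 7L
idx <- seq_len(n_years * 360L) - 1L    # 0-based day index on the 360-day calendar

yr  <- 2006L + idx %/% 360L            # 360 days per year
mon <- (idx %% 360L) %/% 30L + 1L      # 12 months of 30 days
day <- idx %% 30L + 1L                 # day of month, 1..30

d7 <- yr * 10000L + mon * 100L + day   # yyyymmdd as an integer
head(d7, 3)
tail(d7, 3)
```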

How to change time into time intervals in R? [duplicate]

This question already has an answer here:
Split time series data into time intervals (say an hour) and then plot the count
(1 answer)
Closed 7 years ago.
Date,Time,Lots,Status
"10-28-15","00:04:50","13-09","1111111110000000"
"10-28-15","00:04:50","13-10","1111100000000000"
"10-28-15","00:04:50","13-11","1111111011100000"
"10-28-15","00:04:50","13-12","1111011111000000"
"10-28-15","00:04:57","13-13","1111111111000000"
"10-28-15","00:04:57","13-14","1111111111110000"
"10-28-15","00:04:57","13-15","1111111100000000"
"10-28-15","00:04:57","13-16","1111111111000000"
"10-28-15","00:05:04","13-17","1111111110000000"
"10-28-15","00:05:04","13-18","1111101100000000"
"10-28-15","00:05:04","13-19","1111111111100000"
"10-28-15","00:05:04","13-20","1111111111100000"
"10-28-15","00:05:11","13-21","1111110100000000"
"10-28-15","00:05:11","13-22","1000011111100000"
"10-28-15","00:05:11","13-23","1101011111110000"
"10-28-15","00:05:11","13-24","1111111111000000"
"10-28-15","00:05:19","13-25","1011000000000000"
"10-28-15","00:05:19","13-26","0000000000000000"
"10-28-15","00:05:19","13-27","1111011110000000"
"10-28-15","00:05:19","13-28","1010000000000000"
Say the data frame above is called "sample". I need to convert the time into 15-minute intervals. How do I do that? E.g., I have 4 intervals for every hour. 00:04:50 will go into 00:04:45 - 00:04:50. Thanks!
I have tried using:
format(as.POSIXlt(as.POSIXct('2000-1-1', "UTC") + round(as.numeric(sample$V3)/300)*300), format = "%H:%M:%S")
I hope I understood your question correctly, but I think you could do it like this:
# example data frame:
myDat <- data.frame(date = rep("2010-08-15",30),
time = sprintf("%02i:%02i:%02i",rep(12:14,each=10),rep(c(0,15,16,29:31,44:45,59,1),3),sample(1:59)[30]))
#produce a datetime variable
myDat$datetime <- strptime(x = paste(myDat$date,myDat$time),format = "%Y-%m-%d %H:%M:%S")
# extract the minutes
myDat$min <- as.integer(format(myDat$datetime, "%M"))  # format() rather than as.character(x, format = ), which warns in recent R
# find the interval to put them in
myDat$ival <- findInterval(myDat$min,c(15,30,45,60)); myDat$ival <- factor(myDat$ival)
levels(myDat$ival) <- c("00","15","30","45")
# concatenate minute interval and hour
myDat$timeIval <- sprintf("%s:%s:00", format(myDat$datetime, "%H"), myDat$ival)
myDat[order(myDat$datetime),]
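If only the time strings are needed, the quarter-hour label can also be computed with integer division on the minutes, without building a factor. A minimal sketch with made-up input times:

```r
times <- c("00:04:50", "12:16:00", "13:44:59", "14:59:01")

parts <- strsplit(times, ":", fixed = TRUE)
hh <- vapply(parts, `[`, character(1), 1)              # hour as text
mm <- as.integer(vapply(parts, `[`, character(1), 2))  # minute as integer

# Floor the minute to 0, 15, 30 or 45 and rebuild the label
ival <- sprintf("%s:%02d:00", hh, (mm %/% 15L) * 15L)
ival
```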
I'd first convert the data to POSIXct
time <- as.POSIXct(
strptime(paste0(df$column1, " ", df$column2), format="%m-%d-%y %H:%M:%S")
)
and then use seq.POSIXct and cut to generate a factor variable for the 15 min intervals.
interval <- cut(
time,
breaks = seq(starttime, endtime, as.difftime(15, units = "mins"))
)
You may want to add -Inf or Inf to the breaks to avoid generating NA values.
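A self-contained sketch of this approach, with breaks aligned to the quarter hour. The input values are made up, and starttime and endtime are derived here by truncating to the hour, which is one possible choice rather than part of the answer above.

```r
time <- as.POSIXct(
  strptime(paste(c("10-28-15", "10-28-15", "10-28-15"),
                 c("00:04:50", "00:16:10", "00:44:59")),
           format = "%m-%d-%y %H:%M:%S", tz = "UTC")
)

# Breaks from the start of the first hour to the end of the last hour
starttime <- as.POSIXct(trunc(min(time), units = "hours"))
endtime   <- as.POSIXct(trunc(max(time), units = "hours")) + 3600
breaks    <- seq(starttime, endtime, by = "15 min")

interval <- cut(time, breaks = breaks)  # bins are closed on the left by default
table(interval)
```

Since the breaks cover the full range of the data, no NA values are produced here.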
