Storing lubridate intervals in a dataframe - r

I'm trying to store some intervals in a dataframe. A cut down version of the code that does this is here:
DateHired <- c("29/09/14", "07/04/08", "18/06/09", "09/03/15", "30/05/11", "05/11/07", "08/09/08", "30/09/13", "10/08/09", "13/08/14", "18/09/06", "21/01/08", "05/12/11", "28/06/10", "19/07/10", "05/05/14", "26/08/09", "21/04/08", "19/10/09")
TerminationDate <- c("11/06/10", "10/02/10", "06/10/09", "02/04/15", "30/06/11", "10/11/07", "17/04/14", "04/10/13", "08/02/12", "11/06/10", "03/07/09", "11/06/10", "08/08/13", "23/12/10", "20/12/13", "11/06/10", "11/06/10", "05/12/08", "01/03/10")
tenures = data.frame(DateHired, TerminationDate, stringsAsFactors=FALSE)
tenures$isoStart <- as.Date(tenures$DateHired, format="%d/%m/%Y")
tenures$isoFinish <- as.Date(tenures$TerminationDate, format="%d/%m/%Y")
tenures$periods = apply(tenures, 1, function(x) interval(x['isoStart'], x['isoFinish']) )
This ends up with this result:
> tenures$periods
[1] -135734400 58233600 9504000 2073600 2678400 432000 176860800 345600 78796800 -131673600 88041600 75340800
[13] 52876800 15379200 108000000 -123033600 24969600 19699200 11491200
When I do the same thing manually, i.e.
> interval(as.Date("29/09/14", format="%d/%m/%Y"),as.Date("29/09/15", format="%d/%m/%Y") )
[1] 14-09-29 10:04:52 LMT--15-09-29 10:04:52 LMT
it gives a lubridate interval.
I could probably solve this in other ways, but I was hoping to use the intervals in the next part of the puzzle!

tenures$isoStart <- as.Date(tenures$DateHired, format="%d/%m/%y")
tenures$isoFinish <- as.Date(tenures$TerminationDate, format="%d/%m/%y")
tenures$periods = interval(tenures$isoStart, tenures$isoFinish)
Your date format "%d/%m/%Y" did not reflect the two-digit years in your data. The capital %Y is for four-digit years.
Also, the interval function is vectorized, meaning it will take the first element of each vector and create an interval, then move on to the second of each, and continue to the end.
head(tenures$periods)
#[1] 2014-09-28 20:00:00 EDT--2010-06-10 20:00:00 EDT 2008-04-06 20:00:00 EDT--2010-02-09 19:00:00 EST
#[3] 2009-06-17 20:00:00 EDT--2009-10-05 20:00:00 EDT 2015-03-08 20:00:00 EDT--2015-04-01 20:00:00 EDT
#[5] 2011-05-29 20:00:00 EDT--2011-06-29 20:00:00 EDT 2007-11-04 19:00:00 EST--2007-11-09 19:00:00 EST
Why didn't your first function work? Well, it did work in a sense. The output is the span between the two dates, but the format/class was unexpected: instead of interval output, the number of seconds between the two dates was given.
For more on the coercion, see ?apply:
If X is not an array but an object of a class with a non-null dim
value (such as a data frame), apply attempts to coerce it to an array
via as.matrix if it is two-dimensional (e.g., a data frame) or via
as.array.
The function will work on data frames, but with the caveat that the results may not be what you expect after coercing to a matrix. lapply is friendlier towards data frames, and in this case the function is already vectorised.
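Once tenures$periods holds genuine intervals, they can feed the next part of the puzzle directly. A minimal sketch of pulling lengths out of the interval column with lubridate's time_length (the column names tenure_days and tenure_years are just illustrative):
tenures$tenure_days  <- time_length(tenures$periods, "days")
tenures$tenure_years <- time_length(tenures$periods, "years")
head(tenures[, c("isoStart", "isoFinish", "tenure_days", "tenure_years")])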

Related

How to convert length of time to numeric in R?

I have a data frame with the amount of time it takes to do a lap and I'm trying to separate that into individual data frames for each driver.
These time values look like this, in minutes:seconds.milliseconds, except for the first lap, which has a colon between the seconds and milliseconds.
13:14:50 1:28.322 1:24.561 1:23.973 1:23.733 1:24.752
I'd like to have these in a separate data frame in a seconds format like this.
794.500 88.322 84.561 83.973 83.733 84.752
When I convert this to a numeric it gives the following values.
214 201 174 150 133 183
And when I use strptime or POSIXlt it gives me huge values which are also wrong, even when I use the format codes. However, I subtracted two values to find that the time difference was correct, and through that I found they were all off by 1609164020. Also, these values ignore the decimal values, which I need.
You can use POSIXlt in conjunction with a conversion to seconds.
First, add a date to your first time element:
ds <- c("13:14:50", "1:28.322", "1:24.561", "1:23.973", "1:23.733", "1:24.752")
ds[1] <- paste( Sys.Date(), ds[1] )
#[1] "2020-12-29 13:14:50" "1:28.322" "1:24.561"
#[4] "1:23.973" "1:23.733" "1:24.752"
Create a function to convert the subsequent minutes:seconds.milliseconds to seconds.milliseconds:
to_sec <- function(x) {
  as.numeric(sub(":.*", "", x)) * 60 + as.numeric(sub(".*:", "", x))
}
Convert the vector to dates that enable calculation of time differences:
ds[2:6] <- cumsum(to_sec(ds[2:6]))   # cumulative lap seconds (cumsum needs numeric input)
dv <- c( as.POSIXlt(ds[1]), as.POSIXlt(ds[1]) + as.numeric(ds[2:6]) )
# [1] "2020-12-29 13:14:50 CET" "2020-12-29 13:16:18 CET"
# [3] "2020-12-29 13:17:42 CET" "2020-12-29 13:19:06 CET"
# [5] "2020-12-29 13:20:30 CET" "2020-12-29 13:21:55 CET"
dv[6] - dv[1]
# Time difference of 7.089017 mins
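If the goal is the plain seconds vector from the question (794.500, 88.322, ...), the same to_sec helper can also be applied directly to a fresh copy of the raw values. A minimal sketch, assuming the first lap's trailing ":50" is really a misplaced decimal separator:
laps <- c("13:14:50", "1:28.322", "1:24.561", "1:23.973", "1:23.733", "1:24.752")
laps[1] <- sub(":(\\d+)$", ".\\1", laps[1])   # treat "13:14:50" as "13:14.50"
to_sec(laps)
# [1] 794.500  88.322  84.561  83.973  83.733  84.752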

Convert A POSIXt Time To A Julian Day

I was looking for some way to transform a date in POSIXct format to a Julian day:
The Julian Day Number (JDN) is the integer assigned to a whole solar day in the Julian day count starting from noon Universal time, with Julian day number 0 assigned to the day starting at noon on Monday, January 1, 4713 BC, proleptic Julian calendar (November 24, 4714 BC, in the proleptic Gregorian calendar).
But so far I have only figured out how to get the day of the year (1 to 365, or 366 in a leap year).
Could someone help me find some function that turns POSIXct dates (like: 2010-10-02 21:00:00) into Julian dates?
I have a column on a dataframe with several dates to be transformed into Julian days.
head(all_jub2$timestamp_adjusted)
[1] 2010-10-02 21:00:00 2010-10-03 03:00:00 2010-10-03 09:00:00 2010-10-03 15:00:00
[5] 2010-10-03 21:00:00 2010-10-04 03:00:00
6120 Levels: 2003-10-17 21:00:00 2003-10-18 03:00:00 ... 2020-01-10 09:00:00
The lubridate package makes it easy to deal with dates. Does this solve your issue?
library(lubridate)
date <- as.POSIXct('2010-10-02 21:00:00')
julian <- yday(date)
You can do this with base R provided you start at the right point. The text in your question shows your data is still a factor-type variable. That is not good -- you need to parse this first, and for example the anytime() function of the anytime package can help you. See other questions here on that.
Back to the question. If you start with a POSIXct variable holding the current time and, for argument's sake, the time an hour ago so that we have a vector, we can a) convert it to Date and then b) go from Date to a Julian day:
R> input <- Sys.time() + c(-3600,0)
R> input
[1] "2020-09-29 13:07:50.225898 CDT" "2020-09-29 14:07:50.225898 CDT"
R> as.Date(input)
[1] "2020-09-29" "2020-09-29"
R> julian(as.Date(input))
[1] 18534 18534
attr(,"origin")
[1] "1970-01-01"
R>
So for today our Julian date is 18534. For your first two data points, we get 14884 and 14885.
R> input <- c("2010-10-02 21:00:00", "2010-10-03 03:00:00")
R> anytime::anydate(input)
[1] "2010-10-02" "2010-10-03"
R> julian(anytime::anydate(input))
[1] 14884 14885
attr(,"tzone")
[1] "America/Chicago"
attr(,"origin")
[1] "1970-01-01"
R>
If you only want the day of the year, you get that as the yday component of the POSIXlt representation, but you need to adjust by one as it is zero-based:
R> as.POSIXlt(anytime::anytime(input))$yday + 1
[1] 275 276
R>
Thanks for the answers, but I found what I've been looking for using the insol package:
library(insol)
julian_day <- insol::JD(as.POSIXct('2010-10-02 21:00:00'))
julian_day
[1] 2455473
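For reference, the epoch-based numbers from julian() above and the astronomical day number from insol differ by a fixed offset: the Unix epoch 1970-01-01 00:00 UTC is Julian Day 2440587.5. A quick base-R cross-check, assuming UTC:
x <- as.POSIXct("2010-10-02 21:00:00", tz = "UTC")
as.numeric(julian(x)) + 2440587.5
# 14884.875 days since the epoch + 2440587.5 = 2455472.375
# (the insol call above used the local time zone, hence the small difference)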

R - Formatting dates in dataframe - mix of decimal and character values

I have a date column in a dataframe. I have read this df into R using openxlsx. The column is 'seen' as a character vector when I use typeof(df$date).
The column contains date information in several formats and I am looking to get this into the one format.
#Example
date <- c("43469.494444444441", "12/31/2019 1:41 PM", "12/01/2019 16:00:00")
#What I want -updated
fixed <- c("2019-04-01", "2019-12-31", "2019-12-01")
I have tried many work arounds including openxlsx::ConvertToDate, lubridate::parse_date_time, lubridate::date_decimal
openxlsx::ConvertToDate so far works best, but it will only take one format and coerces NAs for the others.
update
I realized I actually had one of the above output dates wrong.
Value 43469.494444444441 should convert to 2019-04-01.
Here is one way to do this in two steps: handle the Excel serial dates separately from all the other dates. If you have more date formats, they can be added in parse_date_time.
temp <- lubridate::parse_date_time(date, c('mdY IMp', 'mdY HMS'))
temp[is.na(temp)] <- as.Date(as.numeric(date[is.na(temp)]), origin = "1899-12-30")
temp
#[1] "2019-01-04 11:51:59 UTC" "2019-12-31 13:41:00 UTC" "2019-12-01 16:00:00 UTC"
as.Date(temp)
#[1] "2019-01-04" "2019-12-31" "2019-12-01"
You could use a helper function to normalize the dates, which might be slightly faster than lubridate.
MS Excel uses different date origins depending on the platform, so if the data are imported from different platforms you may want to work with dummy variables.
normDate <- Vectorize(function(x) {
  if (!is.na(suppressWarnings(as.numeric(x))))   # Windows Excel serial number
    as.Date(as.numeric(x), origin = "1899-12-30")
  else if (grepl("A|P", x))                      # 12-hour clock with AM/PM
    as.Date(x, format = "%m/%d/%Y %I:%M %p")
  else                                           # 24-hour clock
    as.Date(x, format = "%m/%d/%Y %R")
})
For additional date formats just add another else if. Format specifications can be found with ?strptime.
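For example, a hypothetical extra branch for ISO-style "2019-12-01" inputs (a sketch only, slotted in before the final else) could look like:
  else if (grepl("^\\d{4}-\\d{2}-\\d{2}", x))    # ISO yyyy-mm-dd
    as.Date(x, format = "%Y-%m-%d")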
Then just use as.Date() with usual origin.
res <- as.Date(normDate(date), origin="1970-01-01")
# 43469.494444444441 12/31/2019 1:41 PM 12/01/2019 16:00:00
# "2019-01-04" "2019-12-31" "2019-12-01"
class(res)
# [1] "Date"
Edit: To achieve a specific output format, use format, e.g.
format(res, "%Y-%d-%m")
# 43469.494444444441 12/31/2019 1:41 PM 12/01/2019 16:00:00
# "2019-04-01" "2019-31-12" "2019-01-12"
format(res, "%Y/%d/%m")
# 43469.494444444441 12/31/2019 1:41 PM 12/01/2019 16:00:00
# "2019/04/01" "2019/31/12" "2019/01/12"
To lookup the codes type ?strptime.

extract data based on datetime

I have two dataframes:
dat is a 9752x8 dataframe that contains some POSIXlt dates
trips.df is a 35772x28 dataframe that contains hourly temperature data
I would like to save the corresponding temperature for each date in dat.
I have tried:
trips.df$temperature<-lapply(trips.df$fin, function(x){
dat_meteo[dat_meteo$Date.Heure==round(x,"hours"),7]})
But I got this error, which makes me think that x is not passed as a datetime variable
Error in round(x, "hours") :
non-numeric argument to mathematical function
I have also tried this:
merge(trips.df,dat_meteo[,c(1,7)])
But I also got an error:
Error: cannot allocate vector of size 653.8 Mb
Any advice on how to retrieve data from dat_meteo by date?
I am using R version 3.4.0 with RStudio Version 1.0.143 on Windows 10
And here is an excerpt of my data:
> head(trips.df$fin)
[1] "2013-06-25 16:34:16 EDT" "2013-06-25 16:34:16 EDT" "2013-06-26 13:00:05 EDT"
[4] "2013-06-29 12:52:21 EDT" "2013-06-29 15:34:13 EDT" "2013-06-29 17:39:29 EDT"
> dat_meteo[1870:1875,c(1,7)]
Date.Heure Temp...C.
1870 2013-03-19 18:00:00 -1,2
1871 2013-03-19 19:00:00 -1,7
1872 2013-03-19 20:00:00 -2,1
1873 2013-03-19 21:00:00 -2,8
1874 2013-03-19 22:00:00 -3,0
1875 2013-03-19 23:00:00 -3,7
You may want to take a slightly different approach and use data.table.
library(data.table)
trips.dt <- data.table(trips.df)
dat <- data.table(dat)
trips.dt <- trips.dt[, dates.a := strptime(as.POSIXct(fin, format = '%m/%d/%Y %H:%M:%S'), format = '%m/%d/%Y')][, dates.b := dates.a]
dat <- dat[, dates.dat.a := strptime(as.POSIXct(Date.Heure, format = '%m/%d/%Y %H:%M:%S'), format = '%m/%d/%Y')][, dates.dat.b := dates.dat.a]
setkey(trips.dt, id, dates.a, dates.b)
setkey(dat, id, dates.dat.a, dates.dat.b)
combo <- foverlaps(trips.dt, dat, type = "within")
This converts trips.df and dat to data.tables, creates date ranges for both, then merges trips.dt to dat with foverlaps and stores the result as combo.
Make sure that the two time columns you want to match have the same format (POSIXct). It is more straightforward to use the POSIXct format within a dataframe, as the POSIXlt format actually corresponds to a list of named elements whereas POSIXct is in vector form.
dat_meteo$Date.Heure=as.POSIXct(dat_meteo$Date.Heure,format="%Y-%m-%d %H:%M:%S")
Create a column in trips.df of times rounded to the closest hour, converting it to POSIXct too, as round converts POSIXct to POSIXlt:
trips.df$fin_r <- as.POSIXct(round(trips.df$fin, "hours"))
Then use merge:
res=merge(trips.df,dat_meteo[,c(1,7)],by.x="fin_r",by.y ="Date.Heure")

Subtract exactly one year from a POSIXct object

Let's say we have this date: "2014-05-11 14:45:00 UTC". I would like to get the exact POSIXct object for one year before, so "2013-05-11 14:45:00 UTC".
My first thought is to create a whole new POSIXct object by subtracting one from the year bit and pasting it together with the remainder of the string and then creating a new POSIXct object with that string like so:
time <- as.POSIXct("2014-05-11 14:45:00 UTC",tz="UTC",origin="1970-01-01")
newTime <- as.POSIXct(paste(as.character(as.numeric(substr(time,1,4)) - 1),substr(time,5,19),sep=""),tz="UTC",origin="1970-01-01")
This works fine (except in the case of leap years!), but I need to do this for each row of a large data.table and preferably put the results right back into the data.table.
Is there any other way of subtracting a year off an object like this?
Some extra info: I need to apply this to a data.table like this one:
Time
1: 1349206200
2: 1349207100
3: 1349208000
4: 1349208900
5: 1349209800
6: 1349210700
7: 1349211600
8: 1349212500
9: 1349213400
10: 1349214300
11: 1349215200
but this happens when I do:
SOdata[,Time:=as.numeric(as.POSIXct(paste(as.character(as.numeric(substr(Time,1,4)) - 1),substr(Time,5,19),sep=""),tz="UTC",origin="1970-01-01"))]
Error in as.POSIXlt.character(x, tz, ...) :
character string is not in a standard unambiguous format
I am guessing I need to use something like lapply, but I always mess up syntax when using that function. So does anyone know how?
lubridate is your friend.
library(lubridate)
time <- as.POSIXct("2014-05-11 14:45:00 UTC",tz="UTC",origin="1970-01-01")
time-dyears(1)
#[1] "2013-05-11 14:45:00 UTC"
time+dyears(1)
#[1] "2015-05-11 14:45:00 UTC"
For leap years
> x <- as.POSIXct(c("2012-02-28", "2012-02-29"), tz="UTC",origin="1970-01-01")
> x - dyears(1)
[1] "2011-02-28 UTC" "2011-03-01 UTC"
I haven't tested the other answers, but the following should work as required regardless of leap years:
time <- as.POSIXct("2014-05-11 14:45:00 UTC",tz="UTC",origin="1970-01-01")
time <- as.POSIXlt(time)
time$year <- time$year - 1
time <- as.POSIXct(time)
#[1] "2013-05-11 14:45:00 UTC"
With Gabor's leap year example:
time <- as.POSIXct("2012-02-29 14:45:00 UTC",tz="UTC",origin="1970-01-01")
time <- as.POSIXlt(time)
time$year <- time$year - 1
time <- as.POSIXct(time)
#[1] "2011-03-01 14:45:00 UTC"
seq in base can be used:
LastYr <- function(x) seq(x, length = 2, by = "-1 year")[2]
toPOSIXct <- function(x) as.POSIXct(x, origin = "1970-01-01")
# example 1
LastYr(as.POSIXct("2012-02-28"))
## [1] "2011-02-28 EST"
# example 2 - leap year
LastYr(as.POSIXct("2012-02-29"))
## [1] "2011-03-01 EST"
# example 3 - vector case
x <- as.POSIXct(c("2012-02-28", "2012-02-29")) # test data
toPOSIXct(sapply(x, LastYr))
## [1] "2011-02-28 EST" "2011-03-01 EST"
# example 4 - data.table shown in question
DT[, Time := sapply(toPOSIXct(Time), LastYr)]
Revised and simplified using the functions LastYr and toPOSIXct.
Or you can try, in base R:
> time + as.difftime(52*7+1,units="days")
[1] "2015-05-11 14:45:00 UTC"
> time - as.difftime(52*7+1,units="days")
[1] "2013-05-11 14:45:00 UTC"
of course, it would be easier if units could be years...
