Inconsistency with as.difftime - r

I can convert strings containing hour, minute or second specifications to a difftime:
> as.difftime("12 h", "%H")
Time difference of 12 hours
> as.difftime("12 m", "%M")
Time difference of 12 mins
> as.difftime("12 s", "%S")
Time difference of 12 secs
But I can't do so with a week specification, because there is no appropriate format …, although "weeks" is a legitimate unit of difftime:
> as.difftime("12 w", "%…")
Am I overlooking something?

Roland noted:
It doesn't work beyond hours. … obvious, if you study the code. The relevant part is:
difftime(strptime(tim, format = format), strptime("0:0:0", format = "%X"), units = units)
If you specify only a time for strptime, it adds the current date.
Indeed it is easy to study the code in the R Console:
> as.difftime
function (tim, format = "%X", units = "auto")
{
    if (inherits(tim, "difftime"))
        return(tim)
    if (is.character(tim)) {
        difftime(strptime(tim, format = format), strptime("0:0:0",
            format = "%X"), units = units)
    }
    else {
        if (!is.numeric(tim))
            stop("'tim' is not character or numeric")
        if (units == "auto")
            stop("need explicit units for numeric conversion")
        if (!(units %in% c("secs", "mins", "hours", "days", "weeks")))
            stop("invalid units specified")
        .difftime(tim, units = units)
    }
}
The crux of the matter is that the use of strptime causes a time-interval string given to as.difftime to be treated as a point in time, from which 0 h of the current day is subtracted. For various reasons this makes as.difftime with a string unusable for days and weeks; for example, a value of 0 days is not accepted by strptime, although it would be perfectly valid as an interval.
A request for comments on the R-devel mailing list didn't elicit a wide response. (I received some valuable thoughts from Emil Bode by private mail, so I won't reproduce them here.) So I'll refrain from proposing a change to as.difftime, also because changing it would introduce discrepancies between R versions.
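One possible workaround, sketched here under the assumption that intervals are always written as a number followed by a one-letter unit (s, m, h, d or w), is to bypass strptime and go through the numeric branch of as.difftime, which accepts all five units. parse_interval is a hypothetical helper name, not part of base R:

```r
# Hypothetical helper: turn strings like "12 w" into a difftime by
# using the numeric branch of as.difftime (no strptime involved).
parse_interval <- function(x) {
  # Assumed one-letter unit codes, mapped to difftime unit names.
  unit_names <- c(s = "secs", m = "mins", h = "hours", d = "days", w = "weeks")
  pattern <- "^[[:space:]]*([0-9.]+)[[:space:]]*([smhdw])[[:space:]]*$"
  m <- regmatches(x, regexec(pattern, x))[[1]]
  if (length(m) != 3)
    stop("cannot parse interval: ", x)
  as.difftime(as.numeric(m[2]), units = unit_names[[m[3]]])
}

twelve_weeks <- parse_interval("12 w")
zero_days <- parse_interval("0 d")
print(twelve_weeks)
print(zero_days)
```

Because the value never passes through strptime, zero and values larger than a clock field would allow (for example "90 m") are accepted as intervals.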

Related

converting dates in R is changing to future dates, not past

I use this:
want=as.Date(date, '%d-%b-%y')
to convert dates like this: 1-JAN-52
Instead of returning '1952-01-01' I am getting '2052-01-01'. Any advice?
Welcome to modern computers, all shaped after the early Unix systems of the 1970s. The start of time, so to speak, is the epoch, aka 1 Jan 1970.
Your problem here, in a nutshell, is the inferior input data. You only supply two-digit years, and by a widely followed convention values 00 to 68 are taken to be in the 2000s and 69 to 99 in the 1900s. It's all about the epoch.
So you have two choices. You could prepend '19' to the year part and parse via %Y, or you could just take the year value out of the date and reduce it by 100 if need be.
Some example code for the second (and IMHO better) option, making 1970 the cutoff date:
> datestr <- "1-Jan-52"
> d <- as.Date(datestr, '%d-%b-%y')
>
> d
[1] "2052-01-01"
>
> if (as.integer(strftime(d, "%Y")) >= 1970) {
+ dp <- as.POSIXlt(d)
+ dp$year <- dp$year - 100
+ d <- as.Date(dp)
+}
> d
[1] "1952-01-01"
>
You need to go via POSIXlt to get the components easily.
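For a whole vector of dates, the same POSIXlt route can be sketched as follows. This is a variant, not the code above: it assumes that no date in the data may lie in the future and therefore compares against a cutoff date (today by default) instead of a fixed year. A fixed test of year >= 1970 would also shift dates that %y had already placed in 1970 to 1999. fix_century is a hypothetical helper name:

```r
# Hypothetical helper: move dates that lie after `cutoff` back by a century.
# Assumption: valid dates in this data set can never be in the future.
fix_century <- function(d, cutoff = Sys.Date()) {
  lt <- as.POSIXlt(d)
  future <- !is.na(d) & d > cutoff
  lt$year[future] <- lt$year[future] - 100L   # $year counts years since 1900
  as.Date(lt)
}

# Dates as as.Date(..., "%d-%b-%y") would return them for "52" and "75".
parsed <- as.Date(c("2052-01-01", "1975-06-15"))
fixed <- fix_century(parsed, cutoff = as.Date("2024-01-01"))
print(fixed)
```

The cutoff is passed explicitly here only to make the example reproducible; in normal use the default of Sys.Date() would apply.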

Trying to extract a date from a 5 or 6-digit number

I am trying to extract a date from a number. The date is stored as the first 6 digits of an 11-digit personal ID number (date-month-year). Unfortunately the cloud-based database (REDCap) outputs this as a number, so those born on the first nine days of the month lose the leading zero and end up with a 10-digit ID number instead of an 11-digit one. I managed to extract the 6- or 5-digit number corresponding to the date, i.e. 311230 for 31st December 1930, or 11230 for 1st December 1930. I end up with two problems that I have not been able to solve.
Let's say we use the following numbers:
dato <- c(311230, 311245, 311267, 311268, 310169, 201104, 51230, 51269, 51204)
I convert these to strings, and then apply the as.Date() function:
datostr <- as.character(dato)
datofinal <- as.Date(datostr, "%d%m%y")
datofinal
The problems I have are:
Five-digit numbers (e.g. 11230) get reported as NA.
Six-digit numbers are recognized, but those born before 1.1.1969 get reported with 100 years added, i.e. 010160 gets converted to 2060.01.01
I am sure this must be easy for those who are more knowledgeable about R, but I struggle a bit with solving this. Any help is greatly appreciated.
Greetings
Bjorn
If your 5-digit numbers really just need to be zero-padded, then
dato_s <- sprintf("%06d", dato)
dato_s
# [1] "311230" "311245" "311267" "311268" "310169" "201104" "051230" "051269" "051204"
As for your question about "dates before 1969", take a look at ?strptime for the '%y' pattern:
'%y' Year without century (00-99). On input, values 00 to 68 are
prefixed by 20 and 69 to 99 by 19 - that is the behaviour
specified by the 2018 POSIX standard, but it does also say
'it is expected that in a future version the default century
inferred from a 2-digit year will change'.
So if you have specific alternate years for those, you need to add the century before sending to as.Date (which uses strptime-patterns).
dato_d <- as.Date(gsub("([0-4][0-9])$", "20\\1",
                       gsub("([5-9][0-9])$", "19\\1", dato_s)),
                  format = "%d%m%Y")
dato_d
# [1] "2030-12-31" "2045-12-31" "1967-12-31" "1968-12-31" "1969-01-31" "2004-11-20"
# [7] "2030-12-05" "1969-12-05" "2004-12-05"
In this case, I'm assuming 50-99 will be 1900, everything else 2000. If you need 40s or 30s, feel free to adjust the pattern: add digits to the second pattern (e.g., [3-9]) and remove from the first pattern (e.g., [0-2]), ensuring that all decades are included in exactly one pattern, not "neither" and not "both".
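The two regular expressions can also be replaced by a single numeric pivot, which guarantees that every two-digit year falls in exactly one century. The sketch below is an alternative formulation, not the code above; expand_year and its pivot argument are hypothetical names, and it assumes the input is already zero-padded to six characters:

```r
# Hypothetical helper: two-digit years >= pivot become 19xx, the rest 20xx.
# Assumes `ddmmyy` holds zero-padded six-character strings.
expand_year <- function(ddmmyy, pivot = 50L) {
  yy <- as.integer(substr(ddmmyy, 5, 6))
  century <- ifelse(yy >= pivot, "19", "20")
  as.Date(paste0(substr(ddmmyy, 1, 4), century, substr(ddmmyy, 5, 6)),
          format = "%d%m%Y")
}

dato <- c(311230, 311245, 311267, 311268, 310169, 201104, 51230, 51269, 51204)
dato_s <- sprintf("%06d", dato)
pivot50 <- expand_year(dato_s)              # 50-99 -> 19xx, 00-49 -> 20xx
pivot05 <- expand_year(dato_s, pivot = 5L)  # 05-99 -> 19xx, 00-04 -> 20xx
print(pivot50)
print(pivot05)
```

Moving the pivot is then a one-number change, for example pivot = 30L if the 1930s and 1940s should be read as 19xx.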
Borrowing from Allan's answer, I like that assumption of now() (since you did mention "born on"). Without lubridate, try this:
dato_s <- sprintf("%06d", dato)
dato_d <- as.Date(dato_s, format = "%d%m%y")
dato_d[ dato_d > Sys.Date() ] <-
  as.Date(sub("([0-9]{2})$", "19\\1", dato_s[ dato_d > Sys.Date() ]), format = "%d%m%Y")
dato_d
# [1] "1930-12-31" "1945-12-31" "1967-12-31" "1968-12-31" "1969-01-31" "2004-11-20"
# [7] "1930-12-05" "1969-12-05" "2004-12-05"
You can make this a bit easier using lubridate, and noting that no-one can have a date of birth that is in the future of the current time:
library(lubridate)
dato <- dmy(sprintf("%06d", dato))
dato[dato > now()] <- dato[dato > now()] - years(100)
dato
#> [1] "1930-12-31" "1945-12-31" "1967-12-31" "1968-12-31" "1969-01-31"
#> [6] "2004-11-20" "1930-12-05" "1969-12-05" "2004-12-05"
Of course, without further information, this method will not (nor will any other method) be able to pick out the edge cases of people who are aged over 100. This might be easy to determine from the context.
Created on 2020-06-29 by the reprex package (v0.3.0)
Converting five digit "numbers" to six digits is straightforward: x <- stringr::str_pad(x, 6, pad="0") or similar will do the trick.
Your problem with years is the Millennium bug revisited. You'll have to consult with whoever compiled your data to see what assumptions they used.
I suspect all dates up to 31Dec1968 are affected, not just those before 01Jan1960. That's because as.Date reads two-digit years 00 to 68 as 20xx when it parses '%y'. Note that the origin argument of as.Date applies only to numeric input, so it cannot move this cutoff; you have to add the century yourself before parsing, or subtract 100 years afterwards. And then start using four-digit years in the future! ;)

Descriptive statistics of time variables

I want to compute simple descriptive statistics (mean, etc.) of the times when people go to bed. I ran into two problems. The original data comes from an Excel file in which just the time that people went to bed was typed in, in 24-hour format. My problem is that R so far doesn't recognize when people went to bed at 1.00 am the next day, meaning that a person who went to bed at 10 pm is 3 hrs apart from one who went at 1.00 am (and not 21 hrs).
In my data frame the variable in_bed is in POSIXct format, so I thought I would apply an if-function saying that if the time is before 12:00 then I want to add 24 hrs.
My function is:
Patr$in_bed <- if (Patr$in_bed < "1899-12-30 12:00") {
  Patr$in_bed + 24*60*60
}
My data frame looks like this
in_bed
1 1899-12-30 22:13:00
2 1899-12-30 23:44:00
3 1899-12-30 00:08:00
If I run my function my variable gets deleted and the following error message gets printed:
Warning message:
In if (Patr$in_bed < "1899-12-30 12:00") { :
the condition has length > 1 and only the first element will be used
What am I doing wrong, or does anyone have a better idea? And can I run commands such as mean on variables in POSIXct format, and if not, how do I do it?
When you compare Patr$in_bed (a vector) with "1899-12-30 12:00" (a single value), you get a logical vector. But the if statement requires a single logical value, so it generates a warning and considers only the first element of the vector (since R 4.2.0 this is an error rather than a warning).
You can try :
Patr$in_bed <- Patr$in_bed + 24*60*60 * (Patr$in_bed < as.POSIXct("1899-12-30 12:00"))
Explanation: the comparison in the parentheses returns a logical vector, which is converted to integer (0 for FALSE and 1 for TRUE). The dates for which the statement is true then get +24*60*60, and the other dates get +0.
But since the POSIXct format includes the date, I don't see the purpose of adding 24 hrs. For instance,
as.POSIXct("1899-12-31 01:00:00") - as.POSIXct("1899-12-30 22:00:00")
returns a time difference of 3 hours, not 21.
To answer your last question, yes you can compute the mean of a POSIXct vector, simply with :
mean(Patr$in_bed)
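Putting the shift and the mean together on the three rows from the question gives a small worked example. It is only a sketch: the time zone "UTC" and the noon threshold are assumptions made to keep the example reproducible, and the values are placed in a plain vector rather than in Patr:

```r
# Bedtimes from the question, all stamped with Excel's dummy date.
# tz = "UTC" is an assumption that keeps the example free of DST effects.
in_bed <- as.POSIXct(c("1899-12-30 22:13:00",
                       "1899-12-30 23:44:00",
                       "1899-12-30 00:08:00"), tz = "UTC")
noon <- as.POSIXct("1899-12-30 12:00:00", tz = "UTC")

# Times before noon are taken to belong to the following night,
# so they are shifted forward by one day (in seconds).
after_midnight <- in_bed < noon
in_bed[after_midnight] <- in_bed[after_midnight] + 24 * 60 * 60

mean_bedtime <- mean(in_bed)
format(mean_bedtime, "%H:%M:%S")   # "23:21:40"
range(in_bed)
```

Without the shift, the 00:08 entry would pull the mean towards the afternoon, which is the effect described in the question.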
Hope it helps,
Jérémy

Milliseconds in POSIXct Class

How can I parse milliseconds correctly?
The as.POSIXct function works as follows in my environment.
> as.POSIXct("2014-02-24 11:30:00.001")
[1] "2014-02-24 11:30:00.000 JST"
> as.POSIXct("2014-02-24 11:30:00.0011")
[1] "2014-02-24 11:30:00.001 JST"
My R version is x86 v3.0.2 for Windows.
Specify the input format, using %OS to represent the seconds with their fractional parts.
x <- c("2014-02-24 11:30:00.123", "2014-02-24 11:30:00.456")
y <- as.POSIXct(x, format = "%Y-%m-%d %H:%M:%OS")
When you come to display the value, append a number between 0 and 6 to the format string to tell R how many decimal places of seconds to display.
format(y, "%Y-%m-%d %H:%M:%OS6")
## [1] "2014-02-24 11:30:00.122999" "2014-02-24 11:30:00.456000"
(Note that you get rounding errors, and R's datetime formatting always rounds downwards, so if you show fewer decimal places it sometimes looks like you've lost a millisecond.)
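If the apparent loss of a millisecond is a problem for display, one option is to round to the nearest millisecond numerically before formatting, so that no fractional seconds are left for the formatter to truncate. The following is a sketch only; format_ms is a hypothetical helper name, and tz = "UTC" is an assumption made for reproducibility:

```r
# Hypothetical helper: print a POSIXct value rounded to the nearest millisecond.
format_ms <- function(x, tz = "UTC") {
  ms_total <- round(as.numeric(x) * 1000)   # whole milliseconds since the epoch
  whole <- as.POSIXct(ms_total %/% 1000, origin = "1970-01-01", tz = tz)
  sprintf("%s.%03d",
          format(whole, "%Y-%m-%d %H:%M:%S"),
          as.integer(ms_total %% 1000))
}

x <- as.POSIXct(c("2014-02-24 11:30:00.001", "2014-02-24 11:30:00.123"),
                format = "%Y-%m-%d %H:%M:%OS", tz = "UTC")
shown <- format_ms(x)
print(shown)
```

The stored values are unchanged; only the printed representation is rounded instead of truncated.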
Datetime formats are documented on the ?strptime help page. The relevant paragraph is:
Specific to R is '%OSn', which for output gives the seconds
truncated to '0 <= n <= 6' decimal places (and if '%OS' is not
followed by a digit, it uses the setting of
'getOption("digits.secs")', or if that is unset, 'n = 3').
Further, for 'strptime' '%OS' will input seconds including
fractional seconds. Note that '%S' ignores (and not rounds)
fractional parts on output.

Forcing full weeks with apply.weekly()

I'm trying to figure out what xts (or zoo) uses as the time after doing an apply.period. Consider the following:
> myTs = xts(1:10, as.Date(1:10, origin = '2012-12-1'))
> apply.weekly(myTs, colSums)
[,1]
2012-12-02 1
2012-12-09 35
2012-12-11 19
I think the '2012-12-02' means "for the week ending 2012-12-02, the sum is 1". So basically the time is the end of the week.
But the problem is with that "2012-12-11" - I think what it's doing is saying that the 11th is the last day of the week that was given, so it's giving that as the time.
Is there any way to force it to give the Sunday on which the week ends, even if that day was not included in the data set?
Try this:
nextsun <- function(x) 7 * ceiling(as.numeric(x-0+4) / 7) + as.Date(0-4)
aggregate(myTs, nextsun, sum)
where nextsun was derived from nextfri code given in the zoo quick reference by replacing 5 (for Friday) with 0 (for Sunday).
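The same idea can be checked in base R without xts or zoo. The sketch below is not the nextsun code above: week_ending_sunday is a hypothetical helper that uses the weekday number from POSIXlt, and tapply stands in for the aggregation, on the dates and values from the question:

```r
# Hypothetical helper: the Sunday on or after each date.
week_ending_sunday <- function(d) {
  d <- as.Date(d)
  wday <- as.POSIXlt(d)$wday        # 0 = Sunday, ..., 6 = Saturday
  d + (7L - wday) %% 7L
}

dates <- as.Date("2012-12-01") + 1:10   # 2012-12-02 .. 2012-12-11
values <- 1:10

# Sum the values by the Sunday that closes each week.
weekly <- tapply(values, week_ending_sunday(dates), sum)
print(weekly)
```

The last group is labelled with the Sunday that closes the partial week rather than with the last observed date.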
Those are full weeks. It's only showing you the date of the very last observation. See ?endpoints (apply.weekly is essentially a thin wrapper around endpoints).
apply.weekly
function (x, FUN, ...)
{
    ep <- endpoints(x, "weeks")
    period.apply(x, ep, FUN, ...)
}
<environment: namespace:xts>
From ?endpoints
endpoints returns a numeric vector corresponding to the last
observation in each period specified by on, with a zero added to the
beginning of the vector, and the index of the last observation in x at
the end.
Valid values for the argument on include: “us” (microseconds),
“microseconds”, “ms” (milliseconds), “milliseconds”, “secs” (seconds),
“seconds”, “mins” (minutes), “minutes”, “hours”, “days”, “weeks”,
“months”, “quarters”, and “years”.
The answer to your second question is no, there is no option to do so. But you could always edit the last date manually. If you're going to present all the data wrapped up anyway, I don't see any harm in it.
No, you can't force it to give you the Sunday.
Because the index of the result of period.apply is given by
ep <- endpoints(myTs,'weeks')
myTs[ep]
[,1]
2012-12-02 2
2012-12-09 9
2012-12-10 10
So you need to shift the last date. Unfortunately xts doesn't offer this option; you can't shift a single value of the index. I don't know why (maybe a design choice to keep the index unique).
For example, you can do the following:
ts.weeks <- apply.weekly(myTs, colSums)
ts.weeks[length(ts.weeks)] <- last(index(myTs)) + 7-last(floor(diff(ep)))

Resources