Date format changes - r

I am just preparing a table header like cols<- c("Metrics",as.Date(Sys.Date()-8,origin="1899-12-30"),as.Date(Sys.Date()-1,origin="1899-12-30")), but I am not getting the expected output. Can anyone please help?
Output: "Metrics" "17927" "17934"
cols<- c("Metrics",as.Date(Sys.Date()-8,origin="1899-12-30"),as.Date(Sys.Date()-1,origin="1899-12-30"))
Expected Output:
"Metrics" "2019-01-31" "2019-02-07"

1) Character output: if you are looking for a character vector as the result, convert the Date components to character. (The numbers in the question appear because c() dispatches on its first argument, here a plain character string, so the Dates lose their class and their underlying day counts are turned into text.) Also note that the as.Date calls shown in the question are not needed, since Sys.Date() and offsets from it are already of class Date. Further note that if Sys.Date() were called twice right at midnight, the two calls could return different days. To avoid this, we create a today variable so that Sys.Date() is called only once.
today <- Sys.Date()
cols <- c("Metrics", as.character(today-8), as.character(today-1))
cols
## [1] "Metrics" "2019-01-31" "2019-02-07"
1a) This could be made even shorter like this.
cols <- c("Metrics", as.character(Sys.Date() - c(8, 1)))
cols
## [1] "Metrics" "2019-01-31" "2019-02-07"
2) List output: alternatively, if what you want is a list with one character component and two Date components, then:
today <- Sys.Date()
L <- list("Metrics", today - 8, today - 1)
L
giving:
[[1]]
[1] "Metrics"
[[2]]
[1] "2019-01-31"
[[3]]
[1] "2019-02-07"
If we already had L and wanted a character vector then we could further convert it like this:
sapply(L, as.character)
## [1] "Metrics" "2019-01-31" "2019-02-07"
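A small reproducible check of the coercion and the fix side by side. It assumes a fixed date, 2019-02-08, as a stand-in for Sys.Date() so the output does not depend on the day you run it, and it uses format(), which for Date objects gives the same ISO string as as.character():

```r
d <- as.Date("2019-02-08")  # assumed stand-in for Sys.Date()

# c() dispatches on its first argument ("Metrics", a plain character),
# so the Date is stripped to its day count before being turned into text
bad <- c("Metrics", d - 8)

# Converting the Dates to text first keeps the readable form
good <- c("Metrics", format(d - c(8, 1)))

print(bad)   # "Metrics" "17927"
print(good)  # "Metrics" "2019-01-31" "2019-02-07"
```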

Related

Loop over dates in R

I want to loop over a series of dates in R. Here's some sample code:
myDates <- seq.Date(as.Date("2020-01-01"), as.Date("2020-01-03"), by = "day")
myDates[1]
class(myDates[1])
This creates a vector of dates, and I confirm this by printing and checking the class of the first element.
However, when I run this loop:
for (myDate in myDates) print(myDate)
I get this output:
[1] 18262
[1] 18263
[1] 18264
Having checked out this question I've got some workarounds to solve my immediate issue, but can anyone explain to me why this happens, and if there's a simple way to iterate directly over a vector of dates?
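The reason has been explained by @r2evans in the comments on your post. A for loop iterates over the bare values of the vector and drops its Date class, so each myDate is just a count of days since 1970-01-01. There are a couple of ways to work around this, e.g.,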
> d <- Map(print,myDates)
[1] "2020-01-01"
[1] "2020-01-02"
[1] "2020-01-03"
or
> for (myDate in as.character(myDates)) print(myDate)
[1] "2020-01-01"
[1] "2020-01-02"
[1] "2020-01-03"
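A sketch comparing the loop styles side by side, using base R only; the variable names seen, labels and classes are illustrative. Indexing with [ and converting with as.list() (which vapply() does internally for classed objects) both go through Date methods, while for does not keep the class:

```r
myDates <- seq.Date(as.Date("2020-01-01"), as.Date("2020-01-03"), by = "day")

# Looping over the vector directly: only the underlying day counts arrive
seen <- numeric(0)
for (d in myDates) seen <- c(seen, d)

# Looping over indices: myDates[i] uses the Date method for [
labels <- character(0)
for (i in seq_along(myDates)) labels <- c(labels, format(myDates[i], "%Y/%m/%d"))

# vapply() converts the Date vector with as.list(), which keeps the class
classes <- vapply(myDates, class, character(1))

seen     # day counts since 1970-01-01
labels   # "2020/01/01" "2020/01/02" "2020/01/03"
classes  # "Date" "Date" "Date"
```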

R: What are dates in a dates vector: dates or numeric values? (difference between x[i] and i)

Could anyone please explain why, in the first loop, each element of my dates vector is a Date, while in the second each element is numeric?
Thank you!
x <- as.Date(c("2018-01-01", "2018-01-02", "2018-01-02", "2018-05-06"))
class(x)
# Loop 1 - each element is a Date:
for (i in seq_along(x)) print(class(x[i]))
# Loop 2 - each element is numeric:
for (i in x) print(class(i))
The elements are Dates; the first loop is correct.
Unfortunately, R does not consistently support the style of the second loop. I believe the issue is that the for (i in x) syntax bypasses the Date methods for accessors like [, which it can do because S3 classes in R are very thin and don't stop you from bypassing their intended interfaces. This can be confusing because something like for (i in 1:4) print(i) works directly, since numeric is a base vector type. A Date, by contrast, is a numeric vector with an S3 class attribute, so the loop sees only the underlying numbers. To see the numeric objects that are printed in the second loop, you can run this:
x <- as.Date(c("2018-01-01", "2018-01-02", "2018-01-02", "2018-05-06"))
for (i in x) print(i)
#> [1] 17532
#> [1] 17533
#> [1] 17533
#> [1] 17657
which is giving you the same thing as the unclassed version of the Date vector. These numbers are the days since the beginning of Unix time, which you can also see below if you convert them back to Date with that origin.
unclass(x)
#> [1] 17532 17533 17533 17657
as.Date(unclass(x), "1970-01-01")
#> [1] "2018-01-01" "2018-01-02" "2018-01-02" "2018-05-06"
So I would stick to using the proper accessors for any S3 vector types as you do in the first loop.
When you run:
for (i in seq_along(x)) print(class(x[i]))
Here i iterates over the indices of x, and x[i] goes through the Date method for [, so each extracted element keeps its class.
However, when you run:
for (i in x) print(class(i))
Here i takes the raw values stored in x, without the class. From ?Date:
Dates are represented as the number of days since 1970-01-01
which is why class(i) is "numeric".
Similarly, if you print() in each loop, you get dates and numbers respectively:
for (i in seq_along(x)) print(x[i])
[1] "2018-01-01"
[1] "2018-01-02"
[1] "2018-01-02"
[1] "2018-05-06"
and
for (i in x) print(i)
[1] 17532
[1] 17533
[1] 17533
[1] 17657
Lastly, if you want to test R's logic, you can do something like this:
x[1] - as.Date("1970-01-01")
This takes the first element of x ("2018-01-01") and subtracts the origin date, 1970-01-01. The output is:
Time difference of 17532 days
If you look at ?'for', you'll see that for (var in seq) is only defined when seq is "An expression evaluating to a vector", and is.vector(x) is FALSE for a Date because it carries a class attribute. So the documentation says (maybe not very clearly) that the behavior here is undefined, which is why it is unexpected.
As joran mentions, as.vector(x) returns a numeric vector, same as unclass(x) mentioned by Calum You.
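A quick way to see what the documentation is getting at is to inspect x directly. This sketch uses only base functions (is.vector(), attributes(), as.vector()):

```r
x <- as.Date(c("2018-01-01", "2018-01-02", "2018-01-02", "2018-05-06"))

is.vector(x)       # FALSE: x carries a "class" attribute
attributes(x)      # $class is "Date"; nothing else is attached

v <- as.vector(x)  # drops the class, leaving plain day counts
is.vector(v)       # TRUE
v                  # 17532 17533 17533 17657
```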

R use of '\' in a string

I am experimenting with R and came across an issue I don't fully understand.
dates = c("03-19-76", "04/19/76", as.character("04\19\76"), "05.19.76", "060766")
dates
[1] "03-19-76" "04/19/76" "04\0019>" "05.19.76" "060766"
Why is the third date being interpreted, and what sort of interpretation is taking place? I also got this output when I left out the as.character function.
Thanks
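Echoing the comments: make sure to escape backslashes in strings. Inside an R string literal a backslash starts an escape sequence, so in "04\19\76" the \1 is read as the octal escape for character code 1 and \76 as octal 76, which is ">". Writing \\ gives a single literal backslash: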
dates = c("03-19-76", "04/19/76", "04\\19\\76", "05.19.76", "060766")
> dates
[1] "03-19-76" "04/19/76" "04\\19\\76" "05.19.76" "060766"
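print() shows the backslashes escaped, but the string itself holds single ones. A quick check with base functions nchar(), cat() and utf8ToInt(); the raw-string line assumes R 4.0.0 or later, where r"(...)" literals were introduced:

```r
s <- "04\\19\\76"     # each "\\" is one literal backslash
nchar(s)              # 8
cat(s, "\n")          # prints 04\19\76

lit <- "04\19\76"     # the question's literal: \1 and \76 are octal escapes
utf8ToInt(lit)        # 48 52 1 57 62, i.e. "0" "4" \001 "9" ">"

rs <- r"(04\19\76)"   # raw string (R >= 4.0.0): backslashes kept as typed
identical(rs, s)      # TRUE
```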
Now that you've got the dates stored, there are a lot of built-in functions you can use with them. Dates even have their own class, Date; to create one, use as.Date. Since you're using non-standard date formats, you have to tell R how they are formatted.
> as.Date(dates[1], "%m-%d-%y")
[1] "1976-03-19"
> as.Date(dates[2], "%m/%d/%y")
[1] "1976-04-19"
> as.Date("20\\10\\1999", "%d\\%m\\%Y")
[1] "1999-10-20"
a <- as.Date(dates[1], "%m-%d-%y")
b <- as.Date(dates[2], "%m/%d/%y")
> b - a
Time difference of 31 days
d <- as.numeric(b-a)
> d
[1] 31
> a + d^2
[1] "1978-11-05"
Note that since you're using 2-digit years, you use %y. If you used 4-digit years, you'd use %Y. If you forget, you'll get oddities like this:
> as.Date("03/14/2001", "%m/%d/%y")
[1] "2020-03-14"
> as.Date("03/14/10", "%m/%d/%Y")
[1] "0010-03-14"
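Putting the pieces together, here is a sketch that parses every string from the question, assuming one hand-picked format per element (the fmts vector is illustrative; R does not infer it). Note that "060766" comes out in 2066: for %y, R maps 00 to 68 to 20xx and 69 to 99 to 19xx, so 2-digit years need care.

```r
dates <- c("03-19-76", "04/19/76", "04\\19\\76", "05.19.76", "060766")
# Illustrative formats, one per element, matching each string's layout
fmts <- c("%m-%d-%y", "%m/%d/%y", "%m\\%d\\%y", "%m.%d.%y", "%m%d%y")

# Parse each string with its own format, then rebuild a Date vector
parsed <- as.Date(vapply(seq_along(dates),
                         function(i) format(as.Date(dates[i], fmts[i])),
                         character(1)))
format(parsed)  # "060766" lands in 2066 because of the %y rule

# The %y century rule on its own
pivot <- format(as.Date(c("01/01/68", "01/01/69"), "%m/%d/%y"))
pivot           # "2068-01-01" "1969-01-01"
```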

Vectorizing a function that uses strsplit

I am trying to write a function that converts a time (in character form) to decimal hours, such that 1 corresponds to 1 am, 23 to 11 pm, and 24 to the end of the day.
Here are two functions that do this. One is vectorized, while the other is not:
time2dec <- function(time0)
{
time.dec <-as.numeric(substr(time0,1,2))+as.numeric(substr(time0,4,5))/60+(as.numeric(substr(time0,7,8)))/3600
return(time.dec)
}
time2dec1 <- function(time0)
{
time.dec <-as.numeric(strsplit(time0,':')[[1]][1])+as.numeric(strsplit(time0,':')[[1]][2])/60+as.numeric(strsplit(time0,':')[[1]][3])/3600
return(time.dec)
}
This is what I get...
times <- c('12:23:12','10:23:45','9:08:10')
#>time2dec(times)
[1] 12.38667 10.39583 NA
Warning messages:
1: In time2dec(times) : NAs introduced by coercion
2: In time2dec(times) : NAs introduced by coercion
#>time2dec1(times)
[1] 12.38667
I know that time2dec, which is vectorized, gives NA for the last element because it extracts "9:" instead of "9" as the hour. That is why I created time2dec1, but I do not know why it is not vectorized.
I would also be interested in a better function for doing this.
I saw this, which explains part of my question but gives no clue about how to do what I am trying.
Don't try to reinvent the wheel:
times1 <- difftime(as.POSIXct(times, "%H:%M:%S", tz="GMT"),
as.POSIXct("0:0:0", "%H:%M:%S", tz="GMT"),
units="hours")
#Time differences in hours
#[1] 12.386667 10.395833 9.136111
as.numeric(times1)
#[1] 12.386667 10.395833 9.136111
In the following we shall use this test vector:
ch <- c('12:23:12','10:23:45','9:08:10')
1) To fix up the solution in the question we prepend a 0 and then replace any string of 3 digits with the last two:
num.substr <- function(...) as.numeric(substr(...))
time2dec <- function(time0) {
t0 <- sub("\\d(\\d\\d)", "\\1", paste0(0, time0))
num.substr(t0, 1, 2) + num.substr(t0, 4, 5) / 60 + num.substr(t0, 7, 8) / 3600
}
time2dec(ch)
## [1] 12.386667 10.395833 9.136111
2) Parsing the string is slightly easier with strapply in the gsubfn package:
library(gsubfn)
strapply(ch, "^(.?.):(..):(..)",
~ as.numeric(h) + as.numeric(m)/60 + as.numeric(s)/3600,
simplify = c)
## [1] 12.386667 10.395833 9.136111
3) We can reduce the string manipulation to just removing the colons and then convert the resulting character string to numeric so we can manipulate it numerically:
num <- as.numeric(gsub(":", "", ch))
num %/% 10000 + num %% 10000 %/% 100 / 60 + num %% 100 / 3600
## [1] 12.386667 10.395833 9.136111
4) The chron package has a "times" class that internally represents times as fractions of a day. Converting that to hours gives an easy solution:
library(chron)
24 * as.numeric(times(ch))
## [1] 12.386667 10.395833 9.136111
as.numeric( strptime(times, "%H:%M:%S")-strptime(Sys.Date(), "%Y-%m-%d" ))
[1] 12.386667 10.395833 9.136111
This is basically the same as Roland's answer but bypasses some steps; I try to avoid using difftime if I can. I've had too many bugs arise because I don't really understand the function or the class... or something. And when I timed it against Roland's, his was faster. Oh, well.
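One caveat with subtracting date-times directly: the result is a difftime whose units are picked automatically (the largest unit in which all differences exceed one), so as.numeric() returns hours above only because every time in the example is past 1 am. A sketch with two made-up timestamps, compared with an explicit units = "hours":

```r
# Two made-up timestamps half an hour apart
a <- as.POSIXct("2020-01-01 00:00:00", tz = "UTC")
b <- as.POSIXct("2020-01-01 00:30:00", tz = "UTC")

gap <- b - a                                 # difftime, units chosen automatically
units(gap)                                   # "mins"
as.numeric(gap)                              # 30, not 0.5
as.numeric(difftime(b, a, units = "hours"))  # 0.5
```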
Emulating @G.Grothendieck's efforts (and working essentially like his elegant strapply solution):
num <- apply( matrix(scan(text=gsub(":", " ", ch), what=numeric(0)),nrow=3), 2,
function(x) x[1]+x[2]/60 +x[3]/3600 )
#Read 9 items
num
#[1] 12.386667 10.395833 9.136111
And this actually answers the original question:
num <- sapply( strsplit(ch, ":"), function(x){ x2 <- as.numeric(x);
x2[1]+x2[2]/60 +x2[3]/3600})
num
#[1] 12.386667 10.395833 9.136111
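To see why time2dec1 returns only one value, look at what strsplit() gives back: a list with one character vector per input, so strsplit(time0, ':')[[1]] only ever looks at the first time. A sketch of a fully vectorized alternative, assuming every string has exactly three colon-separated parts:

```r
times <- c("12:23:12", "10:23:45", "9:08:10")

parts <- strsplit(times, ":")   # list of 3 character vectors, one per time
parts[[1]]                      # "12" "23" "12": [[1]] is only the first time

# Assumes exactly three parts per time: stack them into an n x 3 numeric matrix
m <- matrix(as.numeric(unlist(parts)), ncol = 3, byrow = TRUE)

# Weight hours, minutes and seconds and add them row by row
dec <- drop(m %*% c(1, 1/60, 1/3600))
dec                             # 12.386667 10.395833 9.136111
```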
The following does what you want
sapply(strsplit(times, ":"), function(d) {
sum(as.numeric(d)*c(1,1/60,1/3600))
})
Step by step:
strsplit(times, ":")
returns a list of character vectors. Each character vector contains the three parts of the time (hours, minutes, seconds). We now want to convert each element of the list to a numeric value. For this we need to apply a function to each element and put the results back into a vector, which is what sapply does:
sapply(strsplit(times, ":"), function(d) {
# function body filled in below
})
As for the function: we first convert the character values to numeric values using as.numeric. Then we multiply the first element by 1, the second by 1/60 and the third by 1/3600, and add the results (for which we use sum). This gives:
sapply(strsplit(times, ":"), function(d) {
sum(as.numeric(d)*c(1,1/60,1/3600))
})

How to avoid implicit character conversion when using apply on dataframe

When using apply on a data.frame, the arguments are (implicitly) converted to character. An example:
df <- data.frame(v=1:10, t=1:10)
df <- transform(df, t2 = as.POSIXlt(t, origin = "2013-08-13"))
class(df$t2[1])
## [1] "POSIXct" "POSIXt" (correct)
but:
apply(df, 1, function(y) class(y["t2"]))
## [1] "character" "character" "character" "character" "character" "character"
## [7] "character" "character" "character" "character"
Is there any way to avoid this conversion? Or do I always have to convert back through as.POSIXlt(y["t2"])?
Edit:
My df has 2 timestamps (say, t2 and t3) and some other fields (say, v1, v2). For each row with a given t2, I want to find k (e.g. 3) rows with t3 closest to, but lower than, t2 (and the same v1), and return a statistic over v2 from these rows (e.g. an average). I wrote a function f(t2, v1, df) and just wanted to apply it to all rows using apply(df, 1, function(y) f(y["t2"], y["v1"], df)). Is there any better way to do such things in R?
Let's wrap up multiple comments into an explanation.
1) Using apply converts a data.frame to a matrix. This means that the least restrictive type is used for all columns, which in this case is character.
2) You're supplying 1 to apply's MARGIN argument. This applies the function by row, which makes you even worse off, as you're now really mixing classes together. In this scenario you're using apply, which is designed for matrices and data.frames, for a job that only involves a single vector (the t2 column). This is not the right tool for the job.
In this case I'd use lapply or sapply, as rmk points out, to get the classes of the elements of the single t2 column, as seen below:
Code:
df <- data.frame(v=1:10, t=1:10)
df <- transform(df, t2 = as.POSIXlt(t, origin = "2013-08-13"))
sapply(df[, "t2"], class)
lapply(df[, "t2"], class)
## [[1]]
## [1] "POSIXct" "POSIXt"
##
## [[2]]
## [1] "POSIXct" "POSIXt"
##
## [[3]]
## [1] "POSIXct" "POSIXt"
##
## .
## .
## .
##
## [[9]]
## [1] "POSIXct" "POSIXt"
##
## [[10]]
## [1] "POSIXct" "POSIXt"
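A minimal check of the first point above, using a small made-up data.frame. apply() coerces a two-dimensional object such as a data.frame with as.matrix() before doing anything else:

```r
# Small made-up data.frame: one integer column, one POSIXct column
df <- data.frame(v = 1:3, t2 = as.POSIXct("2013-08-13", tz = "UTC") + 0:2)

m <- as.matrix(df)   # the conversion apply() performs first
typeof(m)            # "character": the mixed columns force a character matrix
class(df$t2)         # the column inside df is still "POSIXct" "POSIXt"
```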
In general, choose the member of the apply family that fits the job. I often use lapply or a for loop to act on specific columns, or subset the columns I want using indexing ([, ]) and then proceed with apply. The answer to this problem really boils down to determining what you want to accomplish, asking whether apply is the most appropriate tool, and proceeding from there.
May I offer this blog post as an excellent tutorial on what the different apply family of functions do.
Try:
sapply(df, function(y) class(y["t2"]))
$v
[1] "integer"
$t
[1] "integer"
$t2
[1] "POSIXct" "POSIXt"
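For the edited question, one option is to loop over row indices rather than over rows, so that df$t2[i] stays POSIXct instead of being turned into text. The sketch below is illustrative only: the data, the name f and its logic are assumptions reconstructed from the description (mean of v2 over up to k earlier rows with the same v1), not the poster's actual function.

```r
# Hypothetical example data (assumed, for illustration only)
base <- as.POSIXct("2013-08-13 00:00:00", tz = "UTC")
df <- data.frame(
  t2 = base + c(10, 20, 30, 40),
  t3 = base + c(5, 15, 25, 35),
  v1 = c("a", "a", "a", "b"),
  v2 = c(1, 2, 3, 4),
  stringsAsFactors = FALSE
)

# Hypothetical f: among rows with the same v1 and t3 below t2,
# take the (up to) k with the largest t3 and average their v2
f <- function(t2, v1, df, k = 3) {
  cand <- df[df$t3 < t2 & df$v1 == v1, ]
  if (nrow(cand) == 0) return(NA_real_)
  cand <- cand[order(cand$t3, decreasing = TRUE), ]
  mean(head(cand$v2, k))
}

# Loop over row indices: df$t2[i] keeps its POSIXct class
res <- vapply(seq_len(nrow(df)),
              function(i) f(df$t2[i], df$v1[i], df),
              numeric(1))
res  # 1.0 1.5 2.0 4.0
```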
