I want to loop over a series of dates in R. Here's some sample code:
myDates <- seq.Date(as.Date("2020-01-01"), as.Date("2020-01-03"), by = "day")
myDates[1]
class(myDates[1])
This creates a vector of dates, and I confirm this by printing and checking the class of the first element.
However, when I run this loop:
for (myDate in myDates) print(myDate)
I get this output:
[1] 18262
[1] 18263
[1] 18264
Having checked out this question I've got some workarounds to solve my immediate issue, but can anyone explain to me why this happens, and if there's a simple way to iterate directly over a vector of dates?
The reason has been explained by @r2evans in the comments on your post. You have a couple of ways to work around the issue, e.g.,
> d <- Map(print,myDates)
[1] "2020-01-01"
[1] "2020-01-02"
[1] "2020-01-03"
or
> for (myDate in as.character(myDates)) print(myDate)
[1] "2020-01-01"
[1] "2020-01-02"
[1] "2020-01-03"
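If you do want to loop over the values directly, one option is to convert each number back to a Date inside the loop. This is only a sketch: it assumes the same myDates vector as in the question and relies on Dates being stored as days since 1970-01-01.

```r
myDates <- seq.Date(as.Date("2020-01-01"), as.Date("2020-01-03"), by = "day")

# for() hands over the bare numbers, so restore the Date class on each one
for (myDate in myDates) {
  d <- as.Date(myDate, origin = "1970-01-01")
  print(d)
}
```

Inside the loop d is a one-element Date again, so format(), weekdays() and date arithmetic behave as expected.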
Related
I'm passing arguments from a shell script to an R script:
library(rLandsat) #used later in the script
args<-commandArgs(trailingOnly=TRUE)
max_date = Sys.Date()-as.numeric(args[1])
min_date = Sys.Date()-(as.numeric(args[1])+as.numeric(args[2]))
path<-as.numeric(as.character(args[4]))
row<-as.numeric(as.character(args[5]))
cloud<-as.numeric(as.character(args[6]))
foldername<-as.character(args[7])
for(i in args){
print(typeof(i))
}
print(args)
print(c(max_date,min_date,path,row,cloud,foldername))
for(i in c(max_date,min_date,path,row,cloud,foldername)){
print(typeof(i))
}
R is for some reason converting the arguments to some kind of date that is still a character. The output from the script is below. args[3] is used later, but I should probably check that too. I know each arg is already a character, but I got the same values using only as.numeric() for path, row and cloud. The first two values are returned correctly ("2016-12-31" "2016-01-01"), but for the others I would like the same value as the original argument. I will check out list() instead of c().
[1] "character"
[1] "character"
[1] "character"
[1] "character"
[1] "character"
[1] "character"
[1] "1048" "365" "Yellowstone" "38" "29"
[6] "20" "2016"
[1] "2016-12-31" "2016-01-01" "1970-02-08" "1970-01-30" "1970-01-21"
[6] "1975-07-10"
[1] "character"
[1] "character"
[1] "character"
[1] "character"
[1] "character"
[1] "character"
c() was causing the error. See MichaelChirico's comment
You really should look into the lubridate package for operations involving dates: https://github.com/rstudio/cheatsheets/raw/master/lubridate.pdf
As a rule of thumb, when you combine values of different classes they are coerced to a single common type. This happens frequently with c(). In your case, mixing dates and numerics can give you unexpected results.
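A small sketch of that coercion, using made-up values rather than the real script arguments: c() flattens everything into one atomic vector, while list() leaves each element alone.

```r
d <- as.Date("2016-12-31")

# c() with a plain number first: the Date class is dropped, result is numeric
v <- c(38, d)
class(v)

# list() keeps every element exactly as it was
l <- list(d, 38, "Yellowstone")
sapply(l, class)
```

What c() does when a Date comes first depends on the c() method for Dates and has changed between R versions, so list() is the safer container for mixed types.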
Could anyone explain please why in the first loop each element of my dates vector is a date while in the second each element of my dates vector is numeric?
Thank you!
x <- as.Date(c("2018-01-01", "2018-01-02", "2018-01-02", "2018-05-06"))
class(x)
# Loop 1 - each element is a Date:
for (i in seq_along(x)) print(class(x[i]))
# Loop 2 - each element is numeric:
for (i in x) print(class(i))
The elements are Dates; the first loop is correct.
Unfortunately R does not behave consistently for loops written in the style of the second one. I believe the issue is that the for (i in x) syntax bypasses the Date methods for accessors like [, which it can do because S3 classes in R are very thin and don't prevent you from going around their intended interfaces. This can be confusing because something like for (i in 1:4) print(i) works directly, since numeric is a base vector type. Date is an S3 class stored on top of a numeric vector, so the loop sees only the underlying numbers. To see the numeric values printed in the second loop, you can run this:
x <- as.Date(c("2018-01-01", "2018-01-02", "2018-01-02", "2018-05-06"))
for (i in x) print(i)
#> [1] 17532
#> [1] 17533
#> [1] 17533
#> [1] 17657
This gives the same values as the unclassed Date vector. These numbers are the days since the beginning of Unix time (1970-01-01), which you can also see below if you convert them back to Date with that origin.
unclass(x)
#> [1] 17532 17533 17533 17657
as.Date(unclass(x), "1970-01-01")
#> [1] "2018-01-01" "2018-01-02" "2018-01-02" "2018-05-06"
So I would stick to using the proper accessors for any S3 vector types as you do in the first loop.
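If indexing feels clumsy, another way to stay within the Date interface is to iterate over as.list(x) or use lapply(). The following is a sketch of that alternative, not something from the question; it relies on as.list() having a Date method that keeps the class on every element.

```r
x <- as.Date(c("2018-01-01", "2018-01-02", "2018-01-02", "2018-05-06"))

# Each element of as.list(x) is still a one-element Date
for (d in as.list(x)) print(class(d))

# lapply() converts x with as.list() first, so the function also sees Dates
res <- lapply(x, function(d) format(d, "%Y/%m/%d"))
```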
When you run:
for (i in seq_along(x)) print(class(x[i]))
Here i iterates over the indices of x, and x[i] uses the Date method for [, so each time you get the class of a one-element Date vector.
However, when you run:
for (i in x) print(class(i))
Here i takes each underlying value of x directly, with the class dropped. From ?Date:
Dates are represented as the number of days since 1970-01-01
That is why you get numeric as the class.
Moreover, if you use print() in each loop, you'll get dates from the first and numbers from the second:
for (i in seq_along(x)) print(x[i])
[1] "2018-01-01"
[1] "2018-01-02"
[1] "2018-01-02"
[1] "2018-05-06"
and
for (i in x) print(i)
[1] 17532
[1] 17533
[1] 17533
[1] 17657
Lastly, if you want to test R's logic, you can do something like this:
x[1] - as.Date("1970-01-01")
This takes the first element of x ("2018-01-01") and subtracts "1970-01-01", the origin date. The output is:
Time difference of 17532 days
If you look at ?'for', you'll see that for(var in seq) is only defined when seq is "An expression evaluating to a vector", and is.vector(x) is FALSE. So the documentation says (maybe not so clearly) that the behavior here is undefined, which is why the behavior is unexpected.
As joran mentions, as.vector(x) returns a numeric vector, same as unclass(x) mentioned by Calum You.
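The two claims above can be checked quickly. This is just a sketch with a two-element example vector of my own:

```r
x <- as.Date(c("2018-01-01", "2018-01-02"))

is.vector(x)       # FALSE: the class attribute disqualifies it

v <- as.vector(x)  # attributes, including the class, are dropped
class(v)
is.vector(v)
```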
I am preparing a table header like cols<- c("Metrics",as.Date(Sys.Date()-8,origin="1899-12-30"),as.Date(Sys.Date()-1,origin="1899-12-30")), and I am not getting the expected output. Can anyone please help?
Output : "Metrics" "17927" "17934"
cols<- c("Metrics",as.Date(Sys.Date()-8,origin="1899-12-30"),as.Date(Sys.Date()-1,origin="1899-12-30"))
Expected Output:
"Metrics" "2019-01-31" "2019-02-07"
1) character output If you are looking for a character vector as the result then convert the Date class components to character. Also note that the as.Date shown in the question is not needed since Sys.Date() and offsets from it are already of Date class. Further note that if Sys.Date() were called twice right at midnight it is possible that the two calls might occur on different days. To avoid this possibility we create a today variable so that it only has to be called once.
today <- Sys.Date()
cols <- c("Metrics", as.character(today-8), as.character(today-1))
cols
## [1] "Metrics" "2019-01-31" "2019-02-07"
1a) This could be made even shorter like this.
cols <- c("Metrics", as.character(Sys.Date() - c(8, 1)))
cols
## [1] "Metrics" "2019-01-31" "2019-02-07"
2) list output Alternately if what you want is a list with one character component and two Date components then:
today <- Sys.Date()
L <- list("Metrics", today - 8, today - 1)
L
giving:
[[1]]
[1] "Metrics"
[[2]]
[1] "2019-01-31"
[[3]]
[1] "2019-02-07"
If we already had L and wanted a character vector then we could further convert it like this:
sapply(L, as.character)
## [1] "Metrics" "2019-01-31" "2019-02-07"
I have a data.table of numbers stored as character strings that I am trying to convert to numeric. The issue is that the numbers are very long and I want to keep every digit without R rounding them. For example, here are the first 5 elements of the data.table:
> TimeO[1]
[1] "20110630224701281482"
> TimeO[2]
[1] "20110630224701281523"
> TimeO[3]
[1] "20110630224701281533"
> TimeO[4]
[1] "20110630224701281548"
> TimeO[5]
[1] "20110630224701281762"
I wrote a function to convert from a character into numeric:
convert_time_fast <- function(tim) {
  b <- tim - tim %/% 10^12 * 10^12
  # hhmmssffffff
  ms <- b %% 10^6; b <- (b - ms) / 10^6
  ss <- b %% 10^2; b <- (b - ss) / 10^2
  mm <- b %% 10^2; hh <- (b - mm) / 10^2
  # if hours>=22, subtract 24 (previous day)
  hh <- hh - (hh >= 22) * 24
  return(hh + mm / 60 + ss / 3600 + ms / (3600 * 10^6))
}
However, R rounds the values, so distinct data points now show the same time. Here are the first 5 elements after converting:
TimeOC <- -convert_time_fast(as.numeric(TimeO))
> TimeOC[1]
[1] 1.216311
> TimeOC[2]
[1] 1.216311
> TimeOC[3]
[1] 1.216311
> TimeOC[4]
[1] 1.216311
> TimeOC[5]
[1] 1.216311
Any help figuring this out would be greatly appreciated!
You should test whether they are really equal (all.equal()).
By default R prints only 7 significant digits (see getOption("digits")), but the remaining digits are still stored.
See also this example:
> as.numeric("1.21631114")
[1] 1.216311
> as.numeric("1.21631118")
[1] 1.216311
> all.equal(as.numeric("1.21631114"), as.numeric("1.21631118"))
[1] "Mean relative difference: 3.288632e-08" # which indicates they're not the same
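Separately from print precision, a double holds only about 15 to 17 significant decimal digits, so a 20-digit timestamp may already lose its last digits in as.numeric(). A sketch that sidesteps this by slicing the string before converting is shown here. The function name convert_time_chr is hypothetical, and the layout yyyymmddhhmmssffffff is an assumption inferred from the hhmmssffffff comment in the question.

```r
# Hypothetical helper: parse the time-of-day fields straight from the string
convert_time_chr <- function(tim) {
  hh <- as.numeric(substr(tim, 9, 10))   # hours
  mm <- as.numeric(substr(tim, 11, 12))  # minutes
  ss <- as.numeric(substr(tim, 13, 14))  # seconds
  us <- as.numeric(substr(tim, 15, 20))  # microseconds
  # if hours >= 22, subtract 24 (previous day), as in the question
  hh <- hh - (hh >= 22) * 24
  hh + mm / 60 + ss / 3600 + us / (3600 * 10^6)
}

TimeO <- c("20110630224701281482", "20110630224701281523")
res <- convert_time_chr(TimeO)

# Ask for more digits so the difference is visible
print(res, digits = 15)
```

Because each field is small, none of the conversions comes near the precision limit, and the two timestamps stay distinct.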
I am experimenting with R and came across an issue I don't fully understand.
dates = c("03-19-76", "04/19/76", as.character("04\19\76"), "05.19.76", "060766")
dates
[1] "03-19-76" "04/19/76" "04\0019>" "05.19.76" "060766"
Why is the third date being interpreted, and what sort of interpretation is taking place? I also got this output when I left out the as.character function.
Thanks
Echoing the comments, make sure to escape backslashes in strings.
dates = c("03-19-76", "04/19/76", "04\\19\\76", "05.19.76", "060766")
> dates
[1] "03-19-76" "04/19/76" "04\\19\\76" "05.19.76" "060766"
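To see what the interpretation was: inside a string literal, a backslash followed by digits is read as an octal escape, so \1 becomes the character with code 1 and \76 becomes ">" (octal 76 is decimal 62). A short sketch that inspects both versions of the string:

```r
bad  <- "04\19\76"    # \1 and \76 are read as octal escapes
good <- "04\\19\\76"  # each \\ is one literal backslash

nchar(bad)       # 5 characters: "0", "4", \001, "9", ">"
utf8ToInt(bad)   # 48 52 1 57 62
nchar(good)      # 8 characters
cat(good, "\n")  # cat() shows the string as stored: 04\19\76
```

print() shows the escaped form (with doubled backslashes), while cat() writes the actual characters.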
Now that you've got the dates stored, there are actually a lot of built-in functions you can use with dates. Dates even have their own object type! To create one, use as.Date. Since you're using nonstandard date formats, you have to tell R how you've formatted them.
> as.Date(dates[1], "%m-%d-%y")
[1] "1976-03-19"
> as.Date(dates[2], "%m/%d/%y")
[1] "1976-04-19"
> as.Date("20\\10\\1999", "%d\\%m\\%Y")
[1] "1999-10-20"
a <- as.Date(dates[1], "%m-%d-%y")
b <- as.Date(dates[2], "%m/%d/%y")
> b - a
Time difference of 31 days
d <- as.numeric(b-a)
> d
[1] 31
> a + d^2
[1] "1978-11-05"
Note that since you're using 2-digit years, you use %y. If you used 4-digit years, you'd use %Y. If you forget, you'll get oddities like this:
> as.Date("03/14/2001", "%m/%d/%y")
[1] "2020-03-14"
> as.Date("03/14/10", "%m/%d/%Y")
[1] "0010-03-14"
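Since the format argument is recycled along the input, the differently formatted strings can also be parsed in one call. This is a sketch using three of the dates from the question, with one format string per element:

```r
dates <- c("03-19-76", "04/19/76", "05.19.76")
fmts  <- c("%m-%d-%y", "%m/%d/%y", "%m.%d.%y")

# Each date string is parsed with the format at the same position
parsed <- as.Date(dates, format = fmts)
parsed

# The result is one Date vector, so arithmetic works across elements
diff(parsed)
```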