R uses the date "1970-01-01" as an origin. Does it make an exception from its typical 1-indexing to index dates with 0-indexing?
> x <- as.Date("1970-01-01")
> y <- as.Date("1970-01-02")
> unclass(x)
[1] 0
> unclass(y)
[1] 1
No. This is not an indexing thing. "Dates are represented as the number of days since 1970-01-01" (From the ?Date help page). Also note
unclass(as.Date("1969-12-31")) == -1
So it's not an index, it's a signed offset (in days) from an origin date. There's no underlying vector being indexed here.
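A short sketch of the same idea: the stored number is a day offset from the origin, so ordinary arithmetic on it is meaningful. The variable name origin is only for illustration.

```r
origin <- as.Date("1970-01-01")

unclass(origin)                          # 0 days from the origin
unclass(as.Date("1969-12-31"))           # -1: one day before the origin
origin + 365                             # "1971-01-01"
as.Date(17532, origin = "1970-01-01")    # "2018-01-01"
```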
In my dataset, I have a date variable in this format: "2020-01-01"
This variable is stored as class "Date"
This code works:
dataset[which(dataset$date_variable > 2020-01-01),]
This code also works:
dataset[which(dataset$date_variable > 2020-01-19),]
But together I get no output:
dataset[which(dataset$date_variable > 2020-01-01 & dataset$date_variable < 2020-01-19),]
# produces an empty result
How can I correct this code? How do I subset a date range in R? Should I convert the variable to another type?
Unquoted, 2018-01-25 is arithmetic: 2018 minus 1 minus 25, i.e. the number 1992, not a date. Surround the dates with quotes, since Date objects can be compared to character representations. Using the reproducible input in the Note at the end, we have the following.
x[x > "2018-01-24" & x < "2018-01-26"]
## [1] "2018-01-25"
Note
x <- structure(c(17556, 17555, 17554), class = "Date")
x
## [1] "2018-01-25" "2018-01-24" "2018-01-23"
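The same arithmetic explains the empty result with the 2020 values from the question. A minimal sketch, assuming a made-up vector d in place of dataset$date_variable:

```r
# Hypothetical stand-in for dataset$date_variable
d <- as.Date(c("2020-01-05", "2020-01-10", "2020-02-01"))

2020-01-01   # evaluates to the number 2018
2020-01-19   # evaluates to the number 2000

# Unquoted: the Dates are compared as day counts (about 18000) against 2018 and 2000
d > 2020-01-01   # all TRUE
d < 2020-01-19   # all FALSE, so combining both with & selects nothing

# With real dates as bounds, the range filter behaves as intended
in_range <- d > as.Date("2020-01-01") & d < as.Date("2020-01-19")
d[in_range]
```

Each condition "works" on its own only in the sense that it runs without error; neither compares against the intended date.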
Could anyone explain please why in the first loop each element of my dates vector is a date while in the second each element of my dates vector is numeric?
Thank you!
x <- as.Date(c("2018-01-01", "2018-01-02", "2018-01-02", "2018-05-06"))
class(x)
# Loop 1 - each element is a Date:
for (i in seq_along(x)) print(class(x[i]))
# Loop 2 - each element is numeric:
for (i in x) print(class(i))
The elements are Dates; the first loop reports the correct class.
The second loop behaves differently. I believe the issue is that the for (i in x) syntax bypasses the Date methods for accessors like [, which it can do because S3 classes in R are very thin and don't prevent you from going around their intended interfaces. This can be confusing because something like for (i in 1:4) print(i) works directly, since numeric is a base vector type. Date is an S3 class layered on a numeric vector, so the loop hands you the underlying numbers. To see the numeric values that the second loop iterates over, you can run this:
x <- as.Date(c("2018-01-01", "2018-01-02", "2018-01-02", "2018-05-06"))
for (i in x) print(i)
#> [1] 17532
#> [1] 17533
#> [1] 17533
#> [1] 17657
which is giving you the same thing as the unclassed version of the Date vector. These numbers are the days since the beginning of Unix time, which you can also see below if you convert them back to Date with that origin.
unclass(x)
#> [1] 17532 17533 17533 17657
as.Date(unclass(x), "1970-01-01")
#> [1] "2018-01-01" "2018-01-02" "2018-01-02" "2018-05-06"
So I would stick to using the proper accessors for any S3 vector types as you do in the first loop.
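If you prefer to loop over the values rather than the indices, one option is to convert the vector to a list first, since as.list() keeps each element as a Date. A small sketch, not the only way to do it:

```r
x <- as.Date(c("2018-01-01", "2018-01-02", "2018-01-02", "2018-05-06"))

# as.list() has a Date method, so every list element is still a Date
classes <- character(0)
for (d in as.list(x)) {
  classes <- c(classes, class(d))
  print(d)
}
classes
```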
When you run:
for (i in seq_along(x)) print(class(x[i]))
You're using i as an index into x, so x[i] goes through the Date subsetting method and each time you get the class of a one-element Date vector.
However, when you run:
for (i in x) print(class(i))
You're looking at the class of each underlying value rather than of a Date. From ?Date:
Dates are represented as the number of days since 1970-01-01
That is why you get numeric as the class.
Moreover, if you use print() in each loop, you'll get dates from the first and numbers from the second:
for (i in seq_along(x)) print(x[i])
[1] "2018-01-01"
[1] "2018-01-02"
[1] "2018-01-02"
[1] "2018-05-06"
and
for (i in x) print(i)
[1] 17532
[1] 17533
[1] 17533
[1] 17657
Lastly, if you want to check R's logic, you can do something like this:
x[1] - as.Date("1970-01-01")
This takes the first element of x ("2018-01-01") and subtracts "1970-01-01", which is the origin date. The output will be:
Time difference of 17532 days
If you look at ?'for', you'll see that for(var in seq) is only defined when seq is "An expression evaluating to a vector", and is.vector(x) is FALSE because x carries a class attribute. So the documentation says (maybe not so clearly) that the behavior here is undefined, which is why the result is unexpected.
As joran mentions, as.vector(x) returns a numeric vector, same as unclass(x) mentioned by Calum You.
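A quick way to check these claims yourself (a sketch; the two-element vector is only an example):

```r
x <- as.Date(c("2018-01-01", "2018-05-06"))

is.vector(x)            # FALSE: the class attribute disqualifies it
attributes(x)           # shows the "Date" class attribute
is.vector(unclass(x))   # TRUE once the class is removed
as.vector(x)            # the underlying day counts, as with unclass(x)
```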
I am doing a simple operation, multiplying a decimal number and converting the result to integer, but the result is different than expected. Apologies if this is discussed elsewhere; I was not able to find any straightforward answers to this.
> as.integer(1190.60 * 100)
[1] 119059
EDIT:
So, I have to convert that to character and then do as.integer to get what is expected
> temp <- 1190.60
> temp2 <- 1190.60 * 100
> class(temp)
[1] "numeric"
> class(temp2)
[1] "numeric"
> as.character(temp2)
[1] "119060"
> as.integer(temp2)
[1] 119059
> as.integer(as.character(temp2))
[1] 119060
EDIT2: According to the comments, thanks #andrey-shabalin
> temp2
[1] 119060
> as.integer(temp2)
[1] 119059
> as.integer(round(temp2))
[1] 119060
EDIT3: As mentioned in the comments the question is related to behaviour of as.integer and not about floating calculations
The answer to this is "floating point error". You can see this easily by checking the following:
> temp <- 1190.60
> temp2 <- 1190.60 * 100
> temp2 - 119060
[1] -1.455192e-11
Due to floating point errors, temp2 isn't exactly 119060 but:
> sprintf("%.20f", temp2)
[1] "119059.99999999998544808477"
If you use as.integer on a float, it works the same way as trunc, i.e. it rounds the float toward 0 by dropping the fractional part. So in this case the result is 119059.
If you convert to character using as.character(), R uses at most 15 significant digits. In this example that would be "119059.999999999". The next digit is another 9, so R rounds this to 119060 before conversion. I avoid this in the code above by using sprintf() instead of as.character().
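Putting this together, a minimal sketch of the safer conversion: round to the nearest whole number first, then convert. This assumes the value is meant to be a whole number and only differs from it by floating point error.

```r
temp2 <- 1190.60 * 100

# as.integer() truncates toward zero, exactly like trunc()
truncated <- as.integer(temp2)
same_as_trunc <- truncated == as.integer(trunc(temp2))

# round() first picks the nearest whole number, then the conversion is exact
rounded <- as.integer(round(temp2))
rounded
```

Going through as.character() also happens to give the expected value here, but it relies on the digit limit of the conversion rather than stating the intent.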
I have a data.table of numbers in character format that I am trying to convert to numeric. However, the numbers are very long and I want to retain all of the digits without any rounding from R. For example, the first 5 elements of the data.table:
> TimeO[1]
[1] "20110630224701281482"
> TimeO[2]
[1] "20110630224701281523"
> TimeO[3]
[1] "20110630224701281533"
> TimeO[4]
[1] "20110630224701281548"
> TimeO[5]
[1] "20110630224701281762"
I wrote a function to convert from a character into numeric:
convert_time_fast <- function(tim){
b <- tim - tim%/%10^12*10^12
# hhmmssffffff
ms <- b%%10^6; b <-(b-ms)/10^6
ss <- b%%10^2; b <-(b-ss)/10^2
mm <- b%%10^2; hh <-(b-mm)/10^2
# if hours>=22, subtract 24 (previous day)
hh <- hh - (hh>=22)*24
return(hh+mm/60+ss/3600+ms/(3600*10^6))
}
However, rounding occurs in R, so the data points now have the same time. See the first 5 elements after converting:
TimeOC <- -convert_time_fast(as.numeric(TimeO))
> TimeOC[1]
[1] 1.216311
> TimeOC[2]
[1] 1.216311
> TimeOC[3]
[1] 1.216311
> TimeOC[4]
[1] 1.216311
> TimeOC[5]
[1] 1.216311
Any help figuring this out would be greatly appreciated!
You should test whether they are really equal, for example with all.equal() (which by default ignores relative differences below about 1.5e-8) or with ==.
By default R prints only 7 significant digits, but the remaining digits are still stored.
See also this example:
> as.numeric("1.21631114")
[1] 1.216311
> as.numeric("1.21631118")
[1] 1.216311
> all.equal(as.numeric("1.21631114"), as.numeric("1.21631118"))
[1] "Mean relative difference: 3.288632e-08" # which indicates they're not the same
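Printing is only part of the picture, though. A double holds whole numbers exactly only up to 2^53 (about 9e15), so a 20-digit value passed through as.numeric() cannot keep its last digits. One way around that is to split the string before converting. The sketch below assumes the layout YYYYMMDDhhmmssffffff implied by the question; convert_time_chr is a hypothetical name, not from the original post, and it returns the value without the sign flip used in the question's call.

```r
# Above 2^53, neighbouring whole numbers are no longer distinguishable
2^53 + 1 == 2^53   # TRUE

# Hypothetical helper: read each field from the string, so no 20-digit
# number is ever stored in a double
convert_time_chr <- function(tim) {
  hh <- as.numeric(substr(tim, 9, 10))
  mm <- as.numeric(substr(tim, 11, 12))
  ss <- as.numeric(substr(tim, 13, 14))
  us <- as.numeric(substr(tim, 15, 20))   # microseconds
  # if hours >= 22, subtract 24 (previous day), as in the question
  hh <- hh - (hh >= 22) * 24
  hh + mm / 60 + ss / 3600 + us / (3600 * 10^6)
}

TimeO <- c("20110630224701281482", "20110630224701281523")
TimeOC <- convert_time_chr(TimeO)
print(TimeOC, digits = 15)   # show more than the default 7 digits
diff(TimeOC) > 0             # the two timestamps stay distinct
```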
I am experimenting with R and came across an issue I don't fully understand.
dates = c("03-19-76", "04/19/76", as.character("04\19\76"), "05.19.76", "060766")
dates
[1] "03-19-76" "04/19/76" "04\0019>" "05.19.76" "060766"
Why is the third date being interpreted, and what sort of interpretation is taking place? I also got this output when I left out the as.character function.
Thanks
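Echoing the comments, make sure to escape backslashes in strings. In a string literal, a backslash followed by digits is an octal escape, so "\1" becomes the character with code 1 (printed as \001) and "\76" becomes ">".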
dates = c("03-19-76", "04/19/76", "04\\19\\76", "05.19.76", "060766")
> dates
[1] "03-19-76" "04/19/76" "04\\19\\76" "05.19.76" "060766"
Now that you've got the dates stored, there are a lot of built-in functions you can use with them. Dates even have their own class! To convert, use as.Date. Since you're using nonstandard date formats, you have to tell R how they are formatted.
> as.Date(dates[1], "%m-%d-%y")
[1] "1976-03-19"
> as.Date(dates[2], "%m/%d/%y")
[1] "1976-04-19"
> as.Date("20\\10\\1999", "%d\\%m\\%Y")
[1] "1999-10-20"
a <- as.Date(dates[1], "%m-%d-%y")
b <- as.Date(dates[2], "%m/%d/%y")
> b - a
Time difference of 31 days
d <- as.numeric(b-a)
> d
[1] 31
> a + d^2
[1] "1978-11-05"
Note that since you're using 2-digit years, you use %y. If you used 4-digit years, you'd use %Y. If you forget, you'll get oddities like this:
> as.Date("03/14/2001", "%m/%d/%y")
[1] "2020-03-14"
> as.Date("03/14/10", "%m/%d/%Y")
[1] "0010-03-14"
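To convert the whole vector in one call, you can pass one format per element, since the format argument is recycled along with the input. This is a sketch; the formats vector is my own guess at what each string means. Note that two-digit years have a cutoff: on input, %y maps 00 to 68 to 20xx and 69 to 99 to 19xx, so the "66" in the last element is read as 2066.

```r
dates <- c("03-19-76", "04/19/76", "04\\19\\76", "05.19.76", "060766")

# One format string per element (assumed month-day-year order throughout)
formats <- c("%m-%d-%y", "%m/%d/%y", "%m\\%d\\%y", "%m.%d.%y", "%m%d%y")

parsed <- as.Date(dates, format = formats)
parsed
```

If the last value is really meant to be 1966, the century has to be corrected after parsing, because the two-digit year alone cannot express it.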