Why doesn't the as.numeric function in R work properly? - r

I have these two character strings, and the "as.numeric" function does not treat them the same way. Can anyone explain why this is happening?
options(digits=22)
a="27"
as.numeric(a)
[1] 27.00000000000000000000
a="193381411288395777"
as.numeric(a)
[1] 193381411288395776.0000
In the second case the last digit comes back as "6" instead of "7". In other words, the "as.numeric" function returns a value one less than the number in the string.
Any help is appreciated.

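You need to learn about the limits of exact number representation. A double has 53 bits of precision ($double.digits below), so integers above 2^53 (about 9e15) cannot all be represented exactly. R can tell you what it has: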
R> .Machine
$double.eps
[1] 2.22045e-16
$double.neg.eps
[1] 1.11022e-16
$double.xmin
[1] 2.22507e-308
$double.xmax
[1] 1.79769e+308
$double.base
[1] 2
$double.digits
[1] 53
$double.rounding
[1] 5
$double.guard
[1] 0
$double.ulp.digits
[1] -52
$double.neg.ulp.digits
[1] -53
$double.exponent
[1] 11
$double.min.exp
[1] -1022
$double.max.exp
[1] 1024
$integer.max
[1] 2147483647
$sizeof.long
[1] 8
$sizeof.longlong
[1] 8
$sizeof.longdouble
[1] 16
$sizeof.pointer
[1] 8
R>
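To connect these limits to the number in the question, here is a minimal sketch in base R. The variable names (limit, x, spacing) are only illustrative, and the digits printed by sprintf may vary slightly by platform.

```r
# Every integer up to 2^53 is exactly representable as a double.
limit <- 2^.Machine$double.digits

x <- as.numeric("193381411288395777")
x > limit          # TRUE: x is beyond the exactly representable range

# Gap between neighbouring doubles around x:
# 2^(exponent of x - 52), because 52 bits follow the leading bit.
spacing <- 2^(floor(log2(x)) - (.Machine$double.digits - 1))
spacing            # 32

# The stored value is the nearest multiple of that gap.
sprintf("%.0f", x)

# Just above 2^53, adding 1 is already lost in rounding.
2^53 + 1 == 2^53   # TRUE
```

So as.numeric is not subtracting 1. It is rounding to the closest double, which here happens to be 1 below the requested value.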

Use the int64 package:
library(int64)
> as.int64("193381411288395777")
[1] 193381411288395777
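If the int64 package is not available for your R version, the bit64 package offers a similar 64-bit integer type. The sketch below assumes bit64 is installed. It is an alternative to the answer above, not the same package.

```r
big <- "193381411288395777"

# Assumption: the bit64 package is installed; skip quietly otherwise.
if (requireNamespace("bit64", quietly = TRUE)) {
  exact <- bit64::as.integer64(big)   # parsed as a 64-bit integer, no rounding
  print(exact)
  # Converting back to character returns the original digits.
  print(identical(as.character(exact), big))
} else {
  message("bit64 is not installed; install.packages('bit64') to try this.")
}
```

If the value is only an identifier and no arithmetic is needed, keeping it as a character string avoids the problem entirely.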

Related

Print a date range in a loop with correctly formatted dates [duplicate]

Typing this into the console gives:
seq(as.Date('2020-04-02'), as.Date('2020-04-30'), by = 'day')
[1] "2020-04-02" "2020-04-03" "2020-04-04" "2020-04-05" "2020-04-06" "2020-04-07" "2020-04-08" "2020-04-09" "2020-04-10" "2020-04-11" "2020-04-12"
[12] "2020-04-13" "2020-04-14" "2020-04-15" "2020-04-16" "2020-04-17" "2020-04-18" "2020-04-19" "2020-04-20" "2020-04-21" "2020-04-22" "2020-04-23"
[23] "2020-04-24" "2020-04-25" "2020-04-26" "2020-04-27" "2020-04-28" "2020-04-29" "2020-04-30"
My loop:
for(i in seq(as.Date('2020-04-02'), as.Date('2020-04-30'), by = 'day')) {print(i)}
Gives:
[1] 18354
[1] 18355
[1] 18356
[1] 18357
[1] 18358
[1] 18359
[1] 18360
[1] 18361
[1] 18362
[1] 18363
[1] 18364
[1] 18365
[1] 18366
[1] 18367
[1] 18368
[1] 18369
[1] 18370
[1] 18371
[1] 18372
[1] 18373
[1] 18374
[1] 18375
[1] 18376
[1] 18377
[1] 18378
[1] 18379
[1] 18380
[1] 18381
[1] 18382
I expected actual dates.
I tried:
print(as.Date(i))
But this gives:
Error in as.Date.numeric(i) : 'origin' must be supplied
How can I print my date range via a loop?
Try:
for (i in as.list(seq(as.Date('2020-04-02'), as.Date('2020-04-30'), by = 'day'))) {
print(i)
}
I don't know why this is necessary, but if you run
for (i in Sys.Date()) {browser();print(i);}
# Called from: top level
# Browse[1]>
debug at #1: print(i)
# Browse[1]>
i
# [1] 18709
you'll see that i is being converted to numeric in the for (.) portion. The as.list helps preserve that class.
Another way is to supply the origin argument to as.Date:
for(i in seq(as.Date('2020-04-02'), as.Date('2020-04-30'), by = 'day')){
print(as.Date(i, origin="1970-01-01"))}
When R transforms a date into a numeric, it returns the number of days after 1970-01-01. Other software uses different origins.
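The two behaviours described above can be checked side by side. This is a small sketch with a shortened date range. The helper vectors raw_classes and list_classes are made up for illustration.

```r
dates <- seq(as.Date("2020-04-02"), as.Date("2020-04-04"), by = "day")

# A plain for loop hands over the underlying numbers.
raw_classes <- character(0)
for (i in dates) raw_classes <- c(raw_classes, class(i))
raw_classes                 # "numeric" "numeric" "numeric"

# Looping over a list keeps each element's Date class.
list_classes <- character(0)
for (i in as.list(dates)) list_classes <- c(list_classes, class(i))
list_classes                # "Date" "Date" "Date"

# The number is the day count since 1970-01-01.
unclass(dates[1])                                # 18354
format(as.Date(18354, origin = "1970-01-01"))    # "2020-04-02"
```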

How to display real dates in a loop in R

When I iterate over dates in a loop, R prints out the numeric coding of the dates.
For example:
dates <- as.Date(c("1939-06-10", "1932-02-22", "1980-03-13", "1987-03-17",
"1988-04-14", "1979-08-28", "1992-07-16", "1989-12-11"), tryFormats = c("%Y-%m-%d"))
for(d in dates){
print(d)
}
The output is as follows:
[1] -11163
[1] -13828
[1] 3724
[1] 6284
[1] 6678
[1] 3526
[1] 8232
[1] 7284
How do I get R to print out the actual dates?
So the output reads:
[1] "1939-06-10"
[1] "1932-02-22"
[1] "1980-03-13"
[1] "1987-03-17"
[1] "1988-04-14"
[1] "1979-08-28"
[1] "1992-07-16"
[1] "1989-12-11"
Thank you!
When you use dates as the seq in a for loop (for (var in seq)) in R, the elements lose their attributes.
You can use as.vector to strip attributes and see for yourself (or dput to see under the hood on the full object):
as.vector(dates)
# [1] -11163 -13828 3724 6284 6678 3526 8232 7284
dput(dates)
# structure(c(-11163, -13828, 3724, 6284, 6678, 3526, 8232, 7284), class = "Date")
In R, Date objects are just numeric vectors with class Date (class is an attribute).
Hence you're seeing numbers (FWIW, these numbers count days since 1970-01-01).
To restore the Date attribute, you can use the .Date function:
for (d in dates) print(.Date(d))
# [1] "1939-06-10"
# [1] "1932-02-22"
# [1] "1980-03-13"
# [1] "1987-03-17"
# [1] "1988-04-14"
# [1] "1979-08-28"
# [1] "1992-07-16"
# [1] "1989-12-11"
This is equivalent to as.Date(d, origin = '1970-01-01'), the numeric method for as.Date.
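As a quick check of the points above, the sketch below inspects the class attribute and compares the two ways of restoring it, using the first value from the question.

```r
dates <- as.Date(c("1939-06-10", "1932-02-22"))

# The only attribute on a Date vector is its class.
attributes(dates)

# Stripping the class leaves the day counts.
unclass(dates)              # -11163 -13828

# Both routes turn the number back into the same date.
d <- -11163
via_dot <- .Date(d)
via_origin <- as.Date(d, origin = "1970-01-01")
format(via_dot)             # "1939-06-10"
format(via_dot) == format(via_origin)   # TRUE
```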
Funnily enough, *apply functions don't strip attributes:
invisible(lapply(dates, print))
# [1] "1939-06-10"
# [1] "1932-02-22"
# [1] "1980-03-13"
# [1] "1987-03-17"
# [1] "1988-04-14"
# [1] "1979-08-28"
# [1] "1992-07-16"
# [1] "1989-12-11"
There are multiple ways you can handle this:
Loop over the indices of dates:
for(d in seq_along(dates)){
print(dates[d])
}
#[1] "1939-06-10"
#[1] "1932-02-22"
#[1] "1980-03-13"
#[1] "1987-03-17"
#[1] "1988-04-14"
#[1] "1979-08-28"
#[1] "1992-07-16"
#[1] "1989-12-11"
Or convert the dates to a list and then print each element directly:
for(d in as.list(dates)) {
print(d)
}

Split c() inside of a string vector

I am working with a vector of strings in R. However, when I look at the first item in the list, I see this:
> uni_list[1]
[1] c("ENSMUSG00000000204", "ENSMUSG00000115878", "ENSMUSG00000116453", "ENSMUSG00000116134")
15940 Levels: c("ENSMUSG00000000204", "ENSMUSG00000115878", "ENSMUSG00000116453", "ENSMUSG00000116134")
How can I split this one in separate values?
Thanks in advance,
Juan
You can use split, e.g.:
split(l3[[1]], seq(length(l3[[1]])))
$`1`
[1] "ENSMUSG00000000204"
$`2`
[1] "ENSMUSG00000115878"
$`3`
[1] "ENSMUSG00000116453"
$`4`
[1] "ENSMUSG00000116134"
where
l3
[[1]]
[1] "ENSMUSG00000000204" "ENSMUSG00000115878" "ENSMUSG00000116453" "ENSMUSG00000116134"
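The printed output in the question suggests that uni_list may actually be a factor whose levels are the literal text c("...", "..."), not a list of character vectors. Under that assumption, a base R sketch is to convert to character and pull out the IDs with a regular expression. The object f and the pattern are hypothetical stand-ins for the real data.

```r
# Hypothetical stand-in for uni_list[1]: a factor holding the text of a c() call.
f <- factor('c("ENSMUSG00000000204", "ENSMUSG00000115878", "ENSMUSG00000116453", "ENSMUSG00000116134")')

s <- as.character(f)   # work with the text, not the factor codes

# Extract every ID matching the assumed pattern "ENSMUSG" followed by digits.
ids <- regmatches(s, gregexpr("ENSMUSG[0-9]+", s))[[1]]
ids
length(ids)            # 4
```

If the object really is a list as in l3 above, the split approach applies directly and no text parsing is needed.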

If minutes are doubles, why is the minimum of minutes 0?

I want to take the minimum of minute values.
> typeof(minutes(7))
[1] "double"
> min(c(minutes(7),minutes(8)))
[1] 0
> seconds(min(as.numeric(c(minutes(7),minutes(8)))))
[1] "420S"
Why is that?
After all it works with regular doubles:
> typeof(c(7.0,8.0))
[1] "double"
> min(c(7.0,8.0))
[1] 7

Rearranging list into data.frame

I scraped 99 user profiles from forums for my PhD research.
The output is a list with 99 elements. Since each user decides which information to put on their profile, each element has a different number of information snippets attached.
Here's a sample of the output (I also don't know why the numbering has all these $ and ` signs):
$`77.1`
$`77.1`[[1]]
[1] "Username:"
$`77.1`[[2]]
[1] "*Username*"
$`77.1`[[3]]
[1] "*Username*"
$`77.1`[[4]]
[1] "Rank:"
$`77.1`[[5]]
[1] "*Rank*"
$`77.1`[[6]]
[1] "Groups:"
$`77.1`[[7]]
[1] "*Groups*"
$`77.1`[[8]]
[1] "Location:"
$`77.1`[[9]]
[1] "*Location*"
$`77.1`[[10]]
[1] ""
$`78.1`
$`78.1`[[1]]
[1] "Username:"
$`78.1`[[2]]
[1] "*Username*"
$`78.1`[[3]]
[1] "*Username*"
$`78.1`[[4]]
[1] "Rank:"
$`78.1`[[5]]
[1] "*Rank*"
$`78.1`[[6]]
[1] "Age:"
$`78.1`[[7]]
[1] "*AGE*"
$`78.1`[[8]]
[1] "Groups:"
$`78.1`[[9]]
[1] "*Groups*"
$`78.1`[[10]]
[1] "Interests in history:"
$`78.1`[[11]]
[1] "*Interests*"
$`78.1`[[12]]
[1] "Location:"
$`78.1`[[13]]
[1] "*Location*"
$`78.1`[[14]]
[1] ""
Is there a way to arrange this list into a data frame where each row consists of information from one element?
I tried to arrange them into a matrix, but this doesn't work well because a matrix needs a consistent number of columns, which I don't have.
I would love it to look like this:
Id 1 2 3 4 5 6
1 Username: *Username* Rank *Rank* Groups: *Groups*
2 Username: *Username2* ...
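One possible approach, sketched in base R under the assumption that each element is a flat list of strings as shown above: pad every element with NA to the length of the longest one, then bind the rows. The profiles object below is a small made-up stand-in for the scraped list.

```r
# Made-up sample with unequal lengths, mimicking the structure in the question.
profiles <- list(
  `77.1` = list("Username:", "*Username*", "Rank:", "*Rank*"),
  `78.1` = list("Username:", "*Username*", "Rank:", "*Rank*", "Age:", "*AGE*")
)

n <- max(lengths(profiles))          # width of the widest profile

rows <- lapply(profiles, function(p) {
  v <- unlist(p)                     # flatten to a character vector
  length(v) <- n                     # extending the length pads with NA
  v
})

# One row per profile; columns are named V1, V2, ... by default.
df <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
df
```

Because the labels ("Rank:", "Age:") sit in different positions for different users, a further step that pairs each label with its value may suit the analysis better than positional columns.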
