R is changing my variable value by itself

I have a data frame that has an id field with values such as these two:
587739706883375310
587739706883375408
The problem is that, when I ask R to show these two numbers, the output that I get is the following:
587739706883375360
587739706883375360
These are not the real values of my ID field. How do I solve that?
For your information: I have executed options(scipen = 999) so that R does not convert my numbers to scientific notation.
This problem also happens in the R console: if I enter these example numbers, I get the same output as shown above.
EDIT: someone asked for the output of:
dput(yourdata$id)
I did that and the result was:
c(587739706883375360, 587739706883375360, 587739706883375488, 587739706883506560, 587739706883637632, 587739706883637632, 587739706883703040)
To compare, the original data in the csv file is:
587739706883375310,587739706883375408,587739706883375450,587739706883506509,587739706883637600,587739706883637629,587739706883703070
I also did the following test with one of these numbers:
> 587739706883375408
[1] 587739706883375360
> as.double(587739706883375408)
[1] 587739706883375360
> class(as.double(587739706883375408))
[1] "numeric"
> is.double(as.double(587739706883375408))
[1] TRUE

You can use the bit64 package to represent such large numbers:
library(bit64)
as.integer64("587739706883375408")
# integer64
# [1] 587739706883375408
as.integer64("587739706883375408") + 1
# integer64
# [1] 587739706883375409
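The rounding happens because R stores these numbers as doubles, which have a 53-bit significand, so not every integer above 2^53 (about 9 x 10^15) can be represented exactly. A minimal base-R sketch of the effect follows. Reading the column as character via colClasses is one possible workaround that is not part of the answer above, and the inline CSV text is made up for illustration.

```r
# Doubles represent every integer exactly only up to 2^53.
2^53 == 2^53 + 1
# [1] TRUE

# The two ids from the question collapse to the same double.
587739706883375310 == 587739706883375408
# [1] TRUE

# Assumed workaround: read the id column as character so no digits are lost.
# The CSV text here is a made-up stand-in for the real file.
csv_text <- "id\n587739706883375310\n587739706883375408"
ids <- read.csv(text = csv_text, colClasses = "character")
ids$id
# [1] "587739706883375310" "587739706883375408"
```

If arithmetic on the ids is needed, the character values can then be passed to as.integer64() as shown above.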

Related

R read excel file numeric precision problem

I have a number in an Excel file that is equal to -29998,1500000003
When I try to read it into R I get
> library(openxlsx)
> posotest <- as.character(read.xlsx("sofile.xlsx"))
> posotest
[1] "-29998.1500000004"
Any help? Desired result: -29998,1500000003
EDIT: with options(digits=13) I get -29998.150000000373 which could explain why the rounding is done, however even with options(digits=13) I get
> as.character(posotest)
[1] "-29998.1500000004"
Do you have any function that would allow me to get the full number in characters?
EDIT 2: format() does this, but it adds artificial noise at the end.
x <- -29998.150000000373
format(x,digits=22)
[1] "-29998.15000000037252903"
How can I know how many digits to use in format since nchar will give me a wrong value?
The file is here

You can get a string with up to 22 digits of precision via format():
x <- -29998.150000000373
format(x,digits=22)
[1] "-29998.15000000037252903"
Of course, this will show you all sorts of ugliness related to trying to represent a decimal number in a binary representation with finite precision ...
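On how many digits to ask for: a double carries roughly 15 to 17 significant decimal digits, so anything past the 17th digit in the format() output above is an artifact of the binary representation rather than data. The sketch below uses sprintf() with "%.15g" and "%.17g" as one way to request exactly that many digits. The choice of these two widths is an assumption based on IEEE 754 double precision, not something stated in the answer.

```r
x <- -29998.150000000373

# 15 significant digits: the same string as.character() showed in the question.
sprintf("%.15g", x)
# [1] "-29998.1500000004"

# 17 significant digits are generally enough to tell any two distinct doubles apart.
s17 <- sprintf("%.17g", x)
s17
```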

Strange unexpected tokens inside the string

I have two seemingly identical strings in two data frames. For example, both
df_cont$winner[20]
df_assist$winner[609]
return "ivarovskaya"
But the comparison
identical(df_cont$winner[20], df_assist$winner[609])
returns FALSE.
So, dplyr joins don't work on them and when I count characters in those strings, I get different numbers.
Then I found out that copying those strings from View() panel into Rscript results in this:
Output of problem variables looks like this:
> df_cont$winner[20]
[1] "ivarovskaya"
> df_assist$winner[609]
[1] "ivarovskaya"
> nchar(df_cont$winner[20])
[1] 14
> nchar(df_assist$winner[609])
[1] 11
dput() function also results in identical strings:
> dput(df_cont$winner[20])
"ivarovskaya"
> dput(df_cont$winner[20])
"ivarovskaya"
How can I get rid of those strange red dots?
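One way to track down such a mismatch is to look at the code points rather than the printed string. The sketch below is an illustration only: the question does not say which characters are hidden, so it assumes three zero-width spaces (U+200B) and builds the strings a and b by hand.

```r
# Hypothetical reconstruction: 'a' carries three invisible zero-width spaces.
a <- "ivarovskaya\u200b\u200b\u200b"
b <- "ivarovskaya"

nchar(a)  # 14
nchar(b)  # 11

# Code points present in 'a' but not in 'b' reveal the hidden character.
hidden <- setdiff(utf8ToInt(a), utf8ToInt(b))
hidden    # 8203, i.e. U+200B

# Remove the offending character, then compare again.
cleaned <- gsub(intToUtf8(hidden), "", a, fixed = TRUE)
identical(cleaned, b)
```

With real data the same setdiff() check would be run on the two data frame values, and the character it reports may differ from the one assumed here.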

Why is Date being returned as type 'double'?

I'm having some trouble working with the as.Date function in R. I have a vector of dates that I'm reading in from a .csv file that are coming in as a factor of integers or as character (depending on how I read in the file, but this doesn't seem to have anything to do with the issue), formatted as %m/%d/%Y.
I'm going through the file row by row, pulling out the date field and trying to convert it for use elsewhere using the following code:
tmpDtm <- as.Date(as.character(tempDF$myDate), "%m/%d/%Y")
This seems to give me what I want: for example, if I do this to a starting value of 12/30/2014, I get the value "2014-12-30" returned. However, if I examine this value using typeof(), R tells me that its data type is 'double'. Additionally, if I try to bind this to other values and store it in a data frame using c() or cbind(), it winds up being stored in the data frame as 16434, which looks to me like some sort of internal storage value of a date. I'm pretty sure that's what it is, because if I try to convert that value again using as.Date(), it throws an error asking for an origin.
So, two questions: Is this as expected? If so, is there a more appropriate way to convert a date so that I actually end up with a date-typed object?
Thank you

Dates are internally represented as double, as you can see in the following example:
> typeof(as.Date("09/12/16", "%m/%d/%y"))
[1] "double"
It is still marked as class Date, as in
> class(as.Date("09/12/16", "%m/%d/%y"))
[1] "Date"
and because it is a double, you can do computations with it. But because it is of class Date, these computations lead to Dates:
> as.Date("09/12/16", "%m/%d/%y") + 1
[1] "2016-09-13"
> as.Date("09/12/16", "%m/%d/%y") + 31
[1] "2016-10-13"
EDIT
I asked about c() and cbind() because they can be associated with some strange behaviour. See the following example, where switching the order within c() changes not the type but the class of the result:
> c(as.Date("09/12/16", "%m/%d/%y"), 1)
[1] "2016-09-12" "1970-01-02"
> c(1, as.Date("09/12/16", "%m/%d/%y"))
[1] 1 17056
> class(c(as.Date("09/12/16", "%m/%d/%y"), 1))
[1] "Date"
> class(c(1, as.Date("09/12/16", "%m/%d/%y")))
[1] "numeric"
EDIT 2: c() and cbind() force objects to be of one type. The first edit shows an anomaly of coercion, but generally, the vector must be of one shared type. cbind() shares this behavior because it coerces to a matrix, which in turn coerces to a single type.
For more help on typeof and class see this link
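To keep the Date class when combining a date with other values, building the data frame with data.frame() avoids the matrix coercion that cbind() performs. A short sketch, with made-up column names d and n; the origin "1970-01-01" is the epoch R counts days from:

```r
d <- as.Date("12/30/2014", "%m/%d/%Y")

# cbind() builds a numeric matrix, so the Date class is dropped.
m <- cbind(d, 1)
unname(m[1, "d"])
# [1] 16434

# data.frame() keeps each column's class.
df <- data.frame(d = d, n = 1)
class(df$d)
# [1] "Date"

# Recover a Date from the stored number by supplying the origin.
as.Date(16434, origin = "1970-01-01")
# [1] "2014-12-30"
```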

This is as expected. You used typeof(); you probably should have used class():
R> Sys.Date()
[1] "2016-09-12"
R> typeof(Sys.Date()) # this more or less gives you how it is stored
[1] "double"
R> class(Sys.Date()) # where as this gives you _behaviour_
[1] "Date"
R>
Minor advertisement: I have a new package anytime, currently in incoming at CRAN, which deals with this as it converts "anything" to POSIXct (via anytime()) or Date (via anydate()).
E.g.:
R> anydate("12/30/2014") # no format needed
[1] "2014-12-30"
R> anydate(as.factor("12/30/2014")) # converts from factor too
[1] "2014-12-30"
R>

Read a CSV in R as a data.frame

I am new to R and trying to read a csv. The documentation shows a function read.csv(). However, when I read the file and check the type of the variable it shows a list. Documentation shows it as a data.frame. Can someone explain why it happens that way?
My code so far:
mytable<-read.csv(InputFile,header=TRUE,stringsAsFactors=FALSE)
dim(mytable)
typeof(mytable)
Output:
dim(mytable)
[1] 500 20
typeof(mytable)
[1] "list"

As explained in the answer https://stackoverflow.com/a/6258536/8900683:
In R every "object" has a mode and a class. The former represents how an object is stored in memory (numeric, character, list and function), while the latter represents its abstract type.
For example:
d <- data.frame(V1=c(1,2))
class(d)
# [1] "data.frame"
mode(d)
# [1] "list"
typeof(d)
# [1] "list"
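To check whether an object is a data frame, class() or is.data.frame() is the relevant test, not typeof(). A small sketch with an inline CSV (the column names a and b are made up):

```r
tbl <- read.csv(text = "a,b\n1,x\n2,y", stringsAsFactors = FALSE)

typeof(tbl)          # storage: "list"
class(tbl)           # behaviour: "data.frame"
is.data.frame(tbl)   # TRUE
is.list(tbl)         # TRUE as well: a data frame is a list of columns
```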

Compute Column in R

What is the difference between the two statements below? They produce different output, and since I am coming to R from SPSS, I am a little confused.
ds$share.all <- ds[132]/ ds[3]
mean(ds$share.all, na.rm=T)
and
ds$share.all2 <- ds$col1/ ds$Ncol2
mean(ds$share.all2, na.rm=T)
They give the same mean, but for the first, the output is printed as
col1
0.02669424
and the second only prints the .02xxxxx.
Any help will be much appreciated.

Indexing a data frame with single brackets (your first example) produces a data frame containing just that column, whereas the $ operator (your second example) returns a plain vector. Printing an object also prints its names if it has any (the col1 in your first example). The data frame you get with ds[132] has a names attribute, but the vector you get with ds$col1 does not. The equivalent of ds$col1 is double rather than single brackets: ds[[132]]. For example:
> x<-data.frame(1:10)
> names(x)<-"var"
> class(x$var)
[1] "integer"
> class(x[1])
[1] "data.frame"
> identical(x[1],x$var)
[1] FALSE
> identical(x[[1]],x$var)
[1] TRUE
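Applied to the division in the question, the same distinction looks like this. The data frame ds below is a made-up stand-in with two small columns, not the asker's data:

```r
ds <- data.frame(col1 = c(1, 2, 3), col2 = c(2, 4, 8))

ratio_df  <- ds[1] / ds[2]       # single brackets: a one-column data frame
ratio_vec <- ds[[1]] / ds[[2]]   # double brackets: a plain numeric vector

class(ratio_df)    # "data.frame"
names(ratio_df)    # "col1" (taken from the first operand)
class(ratio_vec)   # "numeric"

# The values are the same; only the container differs.
all.equal(ratio_df[[1]], ratio_vec)   # TRUE
mean(ratio_vec)
```

In recent R versions, mean() applied to a data frame returns NA with a warning, so the vector form is the safer one for summaries.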
