I have a numeric variable with 17 digits imported from Oracle, for example: 20172334534654667.
I imported it using dbGetQuery() in R, but R uses scientific notation: 2.01723e+16
If I try to convert the number using:
mydata$var <- format(mydata$a, scientific=FALSE)
I obtain 20172334534654600 instead of 20172334534654667.
So the last two digits are always replaced with 00.
How can I solve it, possibly without using additional packages?
I was unable to replicate your issue, but I think it would probably be best to use formatC rather than format.
For your case, it could be:
numb <- 20172334534654667
numb
formatC(numb, format = "f", digits = 0)
Which gives:
[1] "20172334534654668"
Hopefully that works for you!
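Worth noting: a 17-digit value already exceeds what a double can store exactly (roughly 15-16 significant decimal digits), which is why the last digit above comes back as 668 instead of 667. If you need every digit exact, one option, sketched here with a placeholder table and column name, is to have Oracle return the column as text so R never stores it as a double:
# Hypothetical query: TO_CHAR() casts the ID to text on the Oracle side,
# so dbGetQuery() hands R a character column with all 17 digits intact
mydata <- dbGetQuery(con, "SELECT TO_CHAR(my_id) AS my_id FROM my_table")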
I am new to R, please have mercy. I imported a table from an Access database via odbc:
df <- select(dbReadTable(accdb_path, name ="accdb_table"),"Col_1","Col_2","Col_3")
For
> typeof(df$Col_3)
I get
[1] "list"
Using library(dplyr.teradata), I converted the blob to a string (maybe I'm already on the wrong path here):
df$Hex <- blob_to_string(df$Col_3)
and now end up with a column (typeof = character) full of Hex:
df[1,4]
[1] 49206765742061206c6f74206f662048657820616e642068617665207468652069737375652077697468207370656369616c2063687261637465727320696e204765726d616e206c616e6775616765206c696b65206e2b4150592d7
My question is: how do I convert each value in Col_3 into proper text (if possible, respecting German special characters like ü, ö, ä and ß)?
I am aware of this solution (How to convert a hex string to text in R?), but can't apply it properly:
df$Text <- rawToChar(as.raw(strtoi(df$Hex, 16L)))
Error in rawToChar(as.raw(strtoi(BinData$Hex, 16L))) :
string '\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\
Thx!
If I understand this correctly, what you want to do is apply a function to each element of a list so that it returns a character vector (which you can then add to a data frame, if you wish).
This can be accomplished easily with the purrr family of functions. The following takes each element of df$Col_3 and runs the function on it (each element being the x in the function below):
purrr::map_chr(.x = df$Col_3,
               .f = function(x) rawToChar(as.raw(strtoi(x, 16L))))
You could achieve the same with base R functions such as lapply() followed by unlist(), or sapply(), but purrr makes it easier to catch inconsistent results, since map_chr() fails loudly if any element doesn't convert to a single string.
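If the elements of Col_3 turn out to be single long hex strings rather than vectors of two-character byte codes, strtoi() can't split them for you. Here's a minimal sketch for that case (hex_to_text is a made-up helper, and "latin1" is only a guess at the Access encoding; try "UTF-8" if the umlauts come out wrong):
# Split one long hex string into two-character byte codes, decode them,
# and declare the encoding so ü, ö, ä and ß display correctly
hex_to_text <- function(hex, encoding = "latin1") {
  bytes <- substring(hex, seq(1, nchar(hex) - 1, by = 2),
                          seq(2, nchar(hex), by = 2))
  out <- rawToChar(as.raw(strtoi(bytes, 16L)))
  Encoding(out) <- encoding
  out
}
df$Text <- purrr::map_chr(df$Hex, hex_to_text)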
I don't want a function. I just want that to be the default way in which the R interpreter always displays numbers. Thanks in advance.
While I'm not aware of a way to have your numbers always display with commas, you can turn off scientific notation for your session and then format your numeric output as a string with commas.
Here's one possible solution:
# Load library
library(scales)
# Turn-off scientific notation for your R session
options(scipen = 999)
# An example vector of big numbers
x = c(1000000000000000, 2000000000000, 3000000000000)
# Use the scales::comma() function to add commas
# Output will be formatted as a string
comma(x)
#> [1] "1,000,000,000,000,000" "2,000,000,000,000" "3,000,000,000,000"
I'm trying to read an Excel file into R.
I used the read_excel() function from the readxl package with col_types = "text", since the columns of the Excel sheet contain mixed data types.
df <- read_excel("Test.xlsx",sheet="Sheet1",col_types = "text")
But a very slight difference is introduced into some of the numeric values. It's always the same few values, so I think it's some hidden attribute in Excel.
I tried formatting those values as numbers in Excel, and also tried adding 0s after the number, but it didn't work.
I changed the numeric value of a cell from 2.3 to 2.4, and it was read correctly by R.
This is a consequence of floating-point imprecision, but it's a little tricky. When you enter the number 1.2 (for example) into R or Excel, it's not represented exactly as 1.2:
print(1.2,digits=22)
## [1] 1.199999999999999955591
Excel and R usually try to shield you from these details, which are inevitable if you're using fixed-precision floating-point values (as most computer systems do), by limiting the printed precision to a level that ignores those floating-point imprecisions. When you explicitly convert to character, however, R figures you don't want to lose information, so it gives you all the digits. Numbers that can be represented exactly in binary, such as 2.375, don't gain all those extra digits.
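For comparison, a value that is exactly representable in binary keeps its short form even when you ask for full precision:
print(2.375, digits = 22)
## [1] 2.375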
However, there's a simple solution in this case:
readxl::read_excel("Test.xlsx", na="ND")
This tells R that the string "ND" should be treated as a special "not available" value, so all of your numeric values get handled properly. When you examine your data, the tiny imprecisions will still be there, but R will print the numbers the same way that Excel does.
I feel like there's probably a better way to approach this (mixed-type columns are really hard to deal with), but if you need to 'fix' the format of the numbers you can try something like this:
x <- c(format(1.2, digits = 22), "abc")
## [1] "1.199999999999999955591" "abc"
fix_nums <- function(x) {
  nn <- suppressWarnings(as.numeric(x))
  x[!is.na(nn)] <- format(nn[!is.na(nn)])
  return(x)
}
fix_nums(x)
## [1] "1.2" "abc"
Then if you're using the tidyverse you can use my_data %>% mutate_all(fix_nums).
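(In current dplyr, mutate_all() is superseded; my_data %>% mutate(across(everything(), fix_nums)) does the same thing.)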
I use the following multiplication in R (v. R-3.6.1): 115*1.044. I get 120.1. In Excel I get 120.06. By hand I get 120.062.
I set options(digits=4) in R, but I still get the same result: 120.1.
Why does R behave like this? I used to trust it more than Excel, but here Excel seems more accurate in what it returns. Is there a way to force R to return the exact digits I would get when multiplying by hand?
In format(), the digits option refers to the total number of significant digits in the number as a whole (integer and decimal parts combined):
> format(115*1.044, digits = 5)
[1] "120.06"
> format(115*1.044, digits = 4)
[1] "120.1"
As part of my dataset, one of the columns is a series of 24-digit numbers.
Example:
bigonumber <- 429382748394831049284934
When I import it using either data.table::fread or read.csv, it shows up as numeric in exponential format (e.g., 4.293827e+23).
options(digits=...) won't work, since the number has more than 22 digits.
When I do
as.character(bigonumber)
what I get is "4.29382748394831e+23"
Is there a way to get bigonumber converted to a character string and show all of the digits as characters? I don't need to do any math on it, but I do need to search against it and do dplyr joins on it.
I need to do this after import, since the column number varies from month to month.
(Yes, in the perfect world, my upstream data provider would use a hash instead of a long number and a static number of columns that stay the same every month, but I don't get to dictate that to them.)
You can specify colClasses in your fread or read.csv call. For example, suppose ~/Desktop/bignums.txt contains:
bignums
429382748394831049284934
429382748394831049284935
429382748394831049284936
429382748394831049284937
429382748394831049284938
429382748394831049284939
bignums <- read.csv("~/Desktop/bignums.txt", sep="", colClasses = 'character')
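The data.table equivalent, assuming the same file, would be:
# fread() also accepts colClasses; the column arrives as character
bignums <- data.table::fread("~/Desktop/bignums.txt", colClasses = "character")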
You can suppress the scientific notation with
options(scipen=999)
If you define the number then
bigonumber <- 429382748394831049284934
you can convert it into a string:
big.o.string <- as.character(bigonumber)
Unfortunately, this does not work because R converts the number to a double, thereby losing precision:
#[1] "429382748394831019507712"
The last digits are not preserved, as pointed out by @SabDeM. Even setting
options(digits = 22)
doesn't help; in any case, 22 is the largest value allowed, and your number has 24 digits. So it seems you will have to read the data in directly as character or factor. Great answers have been posted showing how this can be achieved.
As a side note, there is a package called gmp that allows working with arbitrarily large integers. However, there is a catch: they have to be read in as characters (again, to prevent R's internal conversion to double).
library(gmp)
bigonumber <- as.bigz("429382748394831049284934")
> bigonumber
Big Integer ('bigz') :
[1] 429382748394831049284934
> class(bigonumber)
[1] "bigz"
The advantage is that you can indeed treat these entries as numbers and perform calculations while preserving all the digits.
> bigonumber * 2
#Big Integer ('bigz') :
#[1] 858765496789662098569868
This package and my answer here may not solve your problem, since reading the numbers directly as characters is the easier way to achieve your goal, but I thought I'd post it anyway as information for users who need to work with integers of more than 22 digits.
Use digest::digest on bigonumber to generate an md5 hash of the number yourself?
bigonumber <- 429382748394831049284934
hash_big <- digest::digest(bigonumber)
hash_big
# "e47e7d8a9e1b7d74af6a492bf4f27193"
I saw this before I posted my answer, but don't see it here anymore.
Set options(scipen) to a big value so that there is no truncation:
options(scipen = 999)
bigonumber <- 429382748394831049284934
bigonumber
# [1] 429382748394831019507712
as.character(bigonumber)
# [1] "429382748394831019507712"
Use "scan" to read the file - the "what" parameter lets you define the input type of each column.
If you want to keep the numbers as numbers, you can't print all the digits. The digits option allows a maximum of 22 significant digits (the range is 1 to 22) and affects the print.default method. You can set it with:
options( digits = 22 )
Even with this option, the printed numbers will still change: a double only stores about 15-17 significant decimal digits, so any digits R prints beyond that are not real data.
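A quick way to see the limit: a double represents integers exactly only up to 2^53, which has 16 digits, so anything longer cannot survive the round trip.
print(2^53, digits = 22)      # exactly representable
## [1] 9007199254740992
print(2^53 + 1, digits = 22)  # rounds back down to 2^53
## [1] 9007199254740992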