I want to write a vector into a file and then read it back using RStudio. The vector includes some large integers (on the order of 10^40), and it seems they cannot be written properly, because when I try to read them back I keep getting these errors:
"ReadList::readn: Invalid real number found when reading from "/Users/Research/RF_improvment/testNTT.txt"."
and
"Part::partw: Part 1025 of {$Failed} does not exist.
Set::partw: Part 1025 of {Mod[$Failed + {$Failed}[[1025]], 115792089237316195423570985008687907853269984665640564039457584007913129461761]} does not exist."
Does anyone know how to write large numbers into a file using the write function in R? The calculations themselves are fine; the errors occur only when writing to and reading from the file.
The maximum integer R can work with can be found this way:
> .Machine$integer.max
# [1] 2147483647
So no write or read function within R can deal exactly with integers of this magnitude:
# So when you compute large numbers using R
# they are computed with double precision:
options("scipen"=400, "digits"=4)
anum <- 10^40
bnum <- 9^40
# The above numbers are no longer integers,
# but rather floating values calculated with double precision:
str(anum)
# num 10000000000000000304008240626848262428282
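The loss of integrality can be seen directly: at this magnitude the 53-bit double mantissa cannot distinguish consecutive integers. A minimal sketch:

```r
# At 10^40, consecutive integers collapse to the same double, so base R
# cannot do exact arithmetic at this scale.
a <- 10^40
b <- 10^40 + 1   # the +1 is far below the spacing of doubles here
a == b           # TRUE: indistinguishable
```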
Even if you use a package to compute large powers of integers exactly, the result is not a base R integer:
library(gmp)
bigN <- as.bigz(2)^40
bigN
# Big Integer ('bigz') :
# [1] 1099511627776
str(bigN)
# Class 'bigz' raw [1:20] 01 00 00 00 ...
If, however, the goal is to save these values to a text file and then read them back, the following approach can be taken:
# Create "big" numbers using gmp package
library(gmp)
bigA <- as.bigz(10)^40
bigB <- as.bigz(9)^40
bigA
# Big Integer ('bigz') :
# [1] 10000000000000000000000000000000000000000
# Save them as a character vector and write them to a file
write.csv(data.frame(a=as.character(bigA), b=as.character(bigB)), "myfile.csv", row.names=FALSE)
# Let's take a look at the file
system("cat myfile.csv")
#"a","b"
#"10000000000000000000000000000000000000000","147808829414345923316083210206383297601"
# Read them back as character strings first.
new.dt <- read.csv("myfile.csv", colClasses=c("character","character"))
str(new.dt)
# 'data.frame': 1 obs. of 2 variables:
# $ a: chr "10000000000000000000000000000000000000000"
# $ b: chr "147808829414345923316083210206383297601"
# Convert them back to "bigz" objects:
bigA.new <- as.bigz(new.dt$a)
bigB.new <- as.bigz(new.dt$b)
bigA.new
# Big Integer ('bigz') :
# [1] 10000000000000000000000000000000000000000
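As a sanity check (a sketch assuming the gmp package is installed), the character round trip can be verified to be lossless without touching the file system:

```r
library(gmp)

bigA <- as.bigz(10)^40
# Converting to character and back mimics the CSV round trip above
bigA.roundtrip <- as.bigz(as.character(bigA))
bigA.roundtrip == bigA   # TRUE: no precision lost
```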
Related
I'm trying to create a sequence of integers from big numbers and couldn't find a way to succeed. Is there a way to do this?
I tried:
(2^128):(2^128+3000) which returns: [1] 3.402824e+38
So I tried to use the gmp library:
library(gmp)
as.bigz(2^128):as.bigz(2^128+3000)
and got warning messages:
Warning messages:
1: In as.bigz(2^128):as.bigz(2^128 + 3000) :
  numerical expression has 32 elements: only the first used
2: In as.bigz(2^128):as.bigz(2^128 + 3000) :
  numerical expression has 32 elements: only the first used
Add your sequence to your "big number":
library(gmp)
as.bigz(2^128) + 0:3000
Big Integer ('bigz') object of length 3001:
[1] 340282366920938463463374607431768211456 340282366920938463463374607431768211457
[3] 340282366920938463463374607431768211458 340282366920938463463374607431768211459
[5] 340282366920938463463374607431768211460 340282366920938463463374607431768211461
# ...
We can use seq:
library(gmp)
seq(as.bigz(2^128), length.out = 3001)
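One caveat that applies to both answers: 2^128 happens to be exactly representable as a double (it is a power of two), but 2^128 + 3000 is not, so any offset must be added on the bigz side rather than inside as.bigz(). A small sketch:

```r
library(gmp)

# 2^128 + 3000 rounds back to 2^128 in 53-bit double arithmetic,
# so the offset is silently lost if added before conversion:
as.bigz(2^128 + 3000) == as.bigz(2^128)   # TRUE: the 3000 was lost
# Adding after conversion keeps the arithmetic exact:
as.bigz(2)^128 + 3000 == as.bigz(2)^128   # FALSE: values differ, as they should
```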
I'm in a bit of a pickle. I have a bunch (thousands) of .csv files in which a few lines contain a vector of numbers instead of a single value. I need to read these into a tibble or data frame, with the vector stored as a character string for further processing. For example:
"col1","col2","col3"
"a",1,integer(0)
"c",c(3,4),5
"e",6,7
should end up as
col1 col2 col3
<chr> <chr> <chr>
1 a 1 integer(0)
2 c c(3,4) 5
3 e 6 7
The vector is only ever in "col2" and contains integers. The vector usually contains 2 entries, but it could be more. In reality, there are two columns in the middle that could contain multiple entries, but I know the positions of both.
I can't work out how to read these into R successfully. read.csv and read_csv can't seem to handle them. Is there a way I could read the files in line by line (they're thankfully not long) and perhaps eval() each line before splitting by commas? I thought about replacing c( with "c( and ) with )" in bash before reading the files in (and will have to do the same for integer(.
Alternatively, I've thought of splitting the .csvs in bash into ones that contain "normal" lines and ones that contain the vectors (grep for c(), but I'm not sure how to then nest 2:length(-1) of the columns back into a vector.
However, I'd definitely prefer a method that was self-contained in R. Any ideas appreciated!
I typed your example into a csv file, then brought it in with read.csv, specifying the column classes. Using gsub I strip the letter c and the open and close parentheses. Then I loop through column 2 to find cases where a comma appears and convert those instances to a list of integers.
data <- read.csv("SO question.csv",
                 colClasses = c("character", "character", "character"))
# col3 must be read as character too, since it can contain integer(0)
data$col2 <- gsub("(c|\\(|\\))","",data$col2)
for (i in 1:nrow(data)) {
  if (grepl(",", data$col2[i])) {
    temp <- unlist(strsplit(data$col2[i], ","))
    data$col2[i] <- list(as.integer(temp))
  }
}
data
It looks like both col2 and col3 have complex contents. Assuming that the possible complex contents are c(...) and integer(0) we enclose both in double quotes and read them in as character converting from character to list in the final line. (We have used the r'{...}' literal string constant notation introduced in R 4.0 to avoid double backslashes. Modify as needed if you are using an earlier version of R.)
library(dplyr)
DF <- "myfile.csv" %>%
  readLines %>%
  gsub(r'{(c\(.*?\)|integer\(0\))}', r'{"\1"}', .) %>%
  read.csv(text = .) %>%
  mutate(across(2:3, ~ lapply(., function(x) eval(parse(text = x)))))
giving:
> str(DF)
'data.frame': 3 obs. of 3 variables:
$ col1: chr "a" "c" "e"
$ col2:List of 3
..$ : num 1
..$ : num 3 4
..$ : num 6
$ col3:List of 3
..$ : int
..$ : num 5
..$ : num 7
Note
We assume the file is as shown reproducibly below:
Lines <- "\"col1\",\"col2\",\"col3\"\n\"a\",1,integer(0)\n\"c\",c(3,4),5\n\"e\",6,7\n"
cat(Lines, file = "myfile.csv")
I'm working with zip codes, which of course have leading zeros. I am correctly loading my dataframe to preserve the leading zeros in R, but the upload step seems to fail. Here's what I mean:
Here's my minimal.csv file:
zip,val
07030,10
10001,100
90210,1000
60602,10000
Here's the R code
require("bigrquery")
filename <- "minimal.csv"
tablename <- "as_STRING"
ds <- bq_dataset(project='myproject', dataset="zips")
I am also correctly setting the type in my schema to expect them as strings.
# first pass
df <- read.csv(filename, stringsAsFactors=F)
# > df
# zip val
# 1 7030 10
# 2 10001 100
# 3 90210 1000
# 4 60602 10000
# uh oh! Let's fix it!
cols <- unlist(lapply(df, class))
cols[[1]] <- "character" # make zipcode a character
# then reload
df2 <- read.csv(filename, stringsAsFactors=F, colClasses=cols)
# > df2
# zip val
# 1 07030 10
# 2 10001 100
# 3 90210 1000
# 4 60602 10000
# much better! You can see my zips are now strings.
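As an aside, the two-pass read can be collapsed into one: in recent R versions, colClasses accepts a named vector, so the character class can be pinned to the zip column directly (a sketch that recreates the example file so it runs standalone):

```r
# Recreate the example file so the snippet is self-contained
writeLines(c("zip,val", "07030,10", "10001,100", "90210,1000", "60602,10000"),
           "minimal.csv")
# Name the class after the column; unnamed columns keep their guessed type
df2 <- read.csv("minimal.csv", stringsAsFactors = FALSE,
                colClasses = c(zip = "character"))
df2$zip   # leading zeros preserved: "07030" "10001" "90210" "60602"
```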
However, when I try to upload strings, the bigrquery interface complains that I am uploading integers, which they are not. Here's the schema, expecting strings:
# create schema
bq_table_create(bq_table(ds, tablename), fields=df2) # using df2, which has strings
# now prove it got the strings right:
> bq_table_meta(bq_table(ds, tablename))$schema$fields
[[1]]
[[1]]$name
[1] "zip"
[[1]]$type
[1] "STRING" # GOOD, ZIP IS A STRING!
[[1]]$mode
[1] "NULLABLE"
[[2]]
[[2]]$name
[1] "val"
[[2]]$type
[1] "INTEGER"
[[2]]$mode
[1] "NULLABLE"
Now it's time to upload....
bq_table_upload(bq_table(ds, tablename), df2) # using df2, with STRINGS
Error: Invalid schema update. Field zip has changed type from STRING to INTEGER [invalid]
Huh? What is this invalid schema update, and how can I stop it from converting my strings (which both my data and the schema contain) to integers (which neither my data nor the schema contain)?
Is there a javascript serialization that's happening and turning my strings back to integers?
That is because BigQuery will auto-detect the schema when it is not specified. This can be solved by specifying the fields argument, like this (see this similar question for more details):
bq_table_upload(bq_table(ds, tablename), df2,
                fields = list(bq_field("zip", "string"),
                              bq_field("val", "integer")))
UPDATE:
Looking into the code, bq_table_upload() calls bq_perform_upload(), which takes the fields argument as the schema. At the end, it serializes the data frame to a JSON file and uploads it to BigQuery.
Simply changing:
bq_table_upload(tab, df)
to
bq_table_upload(tab, df, fields=df)
works.
I found out that there is a function called .hex.to.dec in the fBasics package.
When I do .hex.to.dec(a), it works.
I have a data frame with a column samp_column consisting of such values:
a373, 115c6, a373, 115c6, 176b3
When I do .hex.to.dec(samp_column), I get this error:
"Error in nchar(b) : 'nchar()' requires a character vector"
When I do .hex.to.dec(as.character(samp_column)), I get this error:
"Error in rep(base.out, 1 + ceiling(log(max(number), base = base.out))) : invalid 'times' argument"
What would be the best way of doing this?
Use base::strtoi to convert hexadecimal character vectors to integer:
strtoi(c("0xff", "077", "123"))
#[1] 255 63 123
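Applied to the asker's samp_column values, a sketch (no 0x prefix is needed once the base is given explicitly):

```r
samp <- c("a373", "115c6", "a373", "115c6", "176b3")
strtoi(samp, base = 16L)
#[1] 41843 71110 41843 71110 95923
```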
There is a simple and generic way to convert between hex and other formats using the "C/C++ way":
V <- c(0xa373, 0x115c6, 0xa373, 0x115c6, 0x176b3)
sprintf("%d", V)
#[1] "41843" "71110" "41843" "71110" "95923"
sprintf("%.2f", V)
#[1] "41843.00" "71110.00" "41843.00" "71110.00" "95923.00"
sprintf("%x", V)
#[1] "a373" "115c6" "a373" "115c6" "176b3"
As mentioned in #user4221472's answer, strtoi() overflows with integers of 2^31 or larger.
The simplest way around that is to use as.numeric().
V <- c(0xa373, 0x115c6, 0x176b3, 0x25cf40000)
as.numeric(V)
#[1] 41843 71110 95923 10149429248
As #MS Berends noted in the comments, "[a]lso notice that just printing V in the console will already print in decimal."
strtoi() has a limitation of 31 bits. Hex numbers with the high-order bit set return NA:
> strtoi('0x7f8cff8b')
[1] 2139946891
> strtoi('0x8f8cff8b')
[1] NA
To get a signed 16-bit value:
temp <- strtoi(value, base=16L)
if (temp > 32767) { temp <- -(65536 - temp) }
In a general form (note the subtrahend must be 2^16 = 65536, one more than the maximum unsigned value, so that e.g. 0xFFFF maps to -1):
modulus <- 65536 #0x10000
max_signed <- 32767 #0x7FFF
temp <- strtoi(value, base=16L)
if (temp > max_signed) { temp <- -(modulus - temp) }
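The same idea generalises to any width up to strtoi()'s 31-bit limit; hex_to_signed below is our own helper name, not from any package:

```r
# Interpret a hex string as a two's-complement signed integer of the
# given bit width (widths above 31 bits would overflow strtoi itself).
hex_to_signed <- function(value, bits = 16L) {
  temp <- strtoi(value, base = 16L)
  ifelse(temp >= 2^(bits - 1), temp - 2^bits, temp)
}
hex_to_signed("ffff")           # -1
hex_to_signed("7fff")           # 32767
hex_to_signed("ff", bits = 8L)  # -1
```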
Using the function fractions in the MASS package, I can convert a decimal to a fraction:
> fractions(.375)
[1] 3/8
But then how do I extract the numerator and denominator? The help for fractions mentions an attribute "fracs", but I can't seem to access it.
A character representation of the fraction is stored in an attribute:
x <- fractions(0.175)
> strsplit(attr(x,"fracs"),"/")
[[1]]
[1] "7" "40"
You can get the fracs attribute from your fraction object the following way, but it is just the character representation of your fraction:
x <- fractions(.375)
attr(x, "fracs")
# [1] "3/8"
If you want to access the numerator and denominator values, you can split the string with the following function:
getfracs <- function(frac) {
  tmp <- strsplit(attr(frac, "fracs"), "/")[[1]]
  list(numerator = as.numeric(tmp[1]), denominator = as.numeric(tmp[2]))
}
which you can use this way:
fracs <- getfracs(x)
fracs$numerator
# [1] 3
fracs$denominator
# [1] 8
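If exact integer parts are wanted without string parsing, an alternative sketch uses the gmp package (assuming it is installed): as.bigq() converts the double to an exact rational, and numerator()/denominator() return the parts as bigz integers:

```r
library(gmp)

q <- as.bigq(0.375)   # exact, since 0.375 is 3/8 in binary
numerator(q)     # 3
denominator(q)   # 8
```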