Scientific notation: last digits replaced with zeros when using write.csv - R

I have a lot of long numbers and R reads them in scientific notation. But when I use write.csv, the scientific notation turns into an incorrect number with a run of zeros at the end. For example, 3.894e+13 becomes 38944400000000 after write.csv.
The original data has exact digits where those zeros now appear.
How do I keep the exact numbers when exporting the data to a file?
[Update]:
(1) The problem turned out to be that saving as CSV from within Excel loses digits of long numbers. It is an Excel issue; I use Excel 2016.
(2) When the above problem occurred, I tried setting options(scipen=999). After that, the summary statistics always come out truncated for this file when I call summary(); with other files, summary() works without losing precision. Printing the numbers directly is correct; only the summary statistics are affected after I set the option.
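A quick sketch to confirm that write.csv itself keeps all the digits, and that the loss happens later in Excel (the 14-digit value and the file name are made up):
options(scipen = 999)                     # discourage scientific notation when writing
df <- data.frame(x = 38944412345678)      # a made-up long number
write.csv(df, "check.csv", row.names = FALSE)
readLines("check.csv")                    # the full value should appear verbatim in the file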

Setting the scipen option to a large enough number before writing the CSV file is one way to make it work:
df <- data.frame(x = 1232939143546532)
options(scipen = 30)       # large enough to keep the number out of scientific notation
write.csv(df, "test.csv")
This gives the following:
"","x"
"1",1232939143546532

Related

R truncates numbers from Excel

I have a large dataset as a .csv file with one important column being a 14-digit number. When I open it in R or Excel, the number becomes truncated, i.e. 83990969388422 becomes 8.4e+13. I tried saving the file as an Excel workbook, where the numbers are displayed correctly. However, as soon as I import it into R, the numbers become truncated again.
How do I avoid this truncation in R?
To control the use of scientific notation in your entire R session, you can use the scipen option (see the documentation under ?options):
options(scipen=999)
As suggested, you can use options(scipen = 999). However, this prevents R from ever using scientific notation, so it is mostly useful if you never want scientific notation. If it is only specific variables you don't want in scientific notation, you could instead mutate those variables with format(., scientific = FALSE), which turns off scientific notation for just that variable.
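A minimal sketch of that per-variable approach, assuming a dplyr pipeline and a made-up data frame df with a column x; note that format() returns character, not numeric:
library(dplyr)
df <- data.frame(x = c(83990969388422, 12345678901234))
df %>% mutate(x = format(x, scientific = FALSE))   # x is now character and prints in full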

How to prevent R from dropping leading zeros in an integer vector

Is there any way of stopping R from dropping leading zeros in an integer? e.g.,
a <- c(00217, 00007, 00017)
I understand this is not the correct way of writing integers. Sadly I've been given a text file (the person and the non-R code that produced it are no longer around) containing thousands of vectors in a single list:
list(drugA=c(...), drugB=c(....),........)
I need to keep the leading zeros, but as it stands 00002 becomes 2. I could load these thousands of values and then write a function that parses the list and converts the values to characters, padding any number that isn't five characters long, but I was hoping for a speedier alternative.
UPDATE1
An example of the text file I've been provided:
list(CETUXIMAB=c(05142,05316),
DORNASEALFA=c(94074),
ETANERCEPT=c(05342,99075),
BIVALIRUDIN=c(04400,09177),
LEUPROLIDE=c(02074,03219,91035,91086),
PEGINTERFERONALFA2A=c(03162),
ALTEPLASE=c(00486,01032,03371,05314),
DARBEPOETINALFA=c(02217,03421),
GOSERELIN=c(99221),
RETEPLASE=c(00157),
ERYTHROPOIETIN=c(92078,92122))
I have truncated the list, as there are thousands of vectors. This text file was generated using a program written in C++ (code not available). Some of the values, e.g. RETEPLASE=c(00157), become truncated to 157.
library(stringr)
str_pad(a, 5, pad = "0")
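A sketch of applying that padding across the whole list at once, assuming the list has been loaded (for example via source()) into an object called drug_codes (the name is hypothetical):
library(stringr)
# drug_codes stands in for the full list read from the text file
drug_codes <- list(RETEPLASE = c(00157), ALTEPLASE = c(00486, 01032, 03371, 05314))
# pad every vector to width 5 with leading zeros; the values become character
drug_codes <- lapply(drug_codes, str_pad, width = 5, pad = "0")
drug_codes$RETEPLASE   # "00157"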

RStudio numeric/integer display format options

I don't want the display format to look like this: 2.150209e+06.
The format I want is 2150209, because when I export data, a format like 2.150209e+06 causes me a lot of trouble.
I did some searching and found that this function could help:
formatC(numeric_summary$mean, digits = 1, format = "f")
I am wondering whether I can set an option to change this permanently. I don't want to apply this function to every variable in my data, because I run into this problem very often.
One more question: can I change the class of all integer variables to numeric automatically? When a column is stored as integer, summing the whole column often causes trouble and warns "integer overflow - use sum(as.numeric(.))".
I don't need integer format; all I need is numeric. Can I set an option to change the integer class to numeric, please?
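A minimal sketch of the integer-to-numeric part, assuming a data frame called df (the name and data are made up):
df <- data.frame(a = 1:3, b = c(2150209, 3000000, 4000000))   # a is integer, b is double
df[] <- lapply(df, function(col) if (is.integer(col)) as.numeric(col) else col)
sapply(df, class)   # every column should now be "numeric"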
I don't know how you are exporting your data, but when I use write.csv with a data frame containing numeric data, I don't get scientific notation; I get the full number written out, including all decimal precision. Actually, I also get the full number written out even with factor data. Have a look here:
df <- data.frame(c1 = c(2150209.123, 10001111),
                 c2 = c('2150209.123', '10001111'))
write.csv(df, file="C:\\Users\\tbiegeleisen\\temp.txt")
Output file:
"","c1","c2"
"1",2150209.123,"2150209.123"
"2",10001111,"10001111"
Update:
It is possible that you are just dealing with a data rendering issue. What you see in the R console or in your spreadsheet does not necessarily reflect the precision of the underlying data. For instance, if you are using Excel, you can highlight a numeric cell, press CTRL + 1, and change the format; you should then see the full, true precision of the underlying data. Similarly, the number you see printed in the R console might use scientific notation only for ease of reading (scientific notation was invented partly for this very reason).
Thank you all.
For the example above, I tried this:
df <- data.frame(c1 = c(21503413542209.123, 10001111),
                 c2 = c('2150209.123', '100011413413111'))
c1 in df prints in scientific notation; c2 does not.
Then I ran write.csv(df, file = "C:\\Users\\tbiegeleisen\\temp.txt"), and it does output all the digits.
Can I disable scientific notation in R, please? It still causes me trouble, even though all the digits were exported to the text file.
Sometimes I want to visually compare two big numbers. For example, if I run
df <- data.frame(c1 = c(21503413542209.123, 21503413542210.123),
                 c2 = c('2150209.123', '100011413413111'))
df will be
            c1              c2
1 2.150341e+13     2150209.123
2 2.150341e+13 100011413413111
The two values of c1 are actually different, but I cannot tell them apart in R unless I export them to a text file. The numbers here are made up, but I run into the same problem every day.
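A sketch of that comparison with scientific notation disabled for the session (same made-up numbers as above):
options(scipen = 999)
df <- data.frame(c1 = c(21503413542209.123, 21503413542210.123),
                 c2 = c('2150209.123', '100011413413111'))
df   # c1 should now print with its full integer part, so the two values can be told apart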

How does R deal with tiny decimals? Converting .csv file list to numbers

I've seen a number of threads about similar problems and have tried all the suggestions with no luck so far - I think maybe the issue is the size of the numbers in my file.
I have a .csv file with 5399 rows and one column of numbers with six decimal places and only three or four significant figures (e.g. 0.000615).
I can import the file without any problems, de-selecting the strings-to-factors option, but the result has mode: list.
I have tried as.numeric(), as.character(as.numeric()), and I have tried copying the data into a data frame, into a matrix, into a vector... nothing works. I can do simple maths on the list but cannot run the functions and loops I need to.
Because as.character(as.numeric()) normally works, I figure it must be an issue with the size of the numbers. Does R have a problem with tiny decimals?
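A minimal sketch of one way out of the list mode: flatten the list first, then convert (the vals object below is a tiny stand-in for the imported column):
vals <- list("0.000615", "0.000233", "0.001042")   # stand-in for the imported list column
x <- as.numeric(unlist(vals))                      # flatten, then coerce to numeric
str(x)   # a plain numeric vector; decimals this small are no problem for R's doubles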

R claims that data is non-numeric, but after writing to file is numeric

I have read a table into R and am trying to take the log of the data. This gives me an error that the last column contains non-numeric values:
> log(TD_complete)
Error in Math.data.frame(list(X2011.01 = c(187072L, 140815L, 785077L, :
non-numeric variable in data frame: X2013.05
The data "looks" numeric, i.e. when I read it my brain interprets it as numbers. I can't be totally wrong since the following will work:
> write.table(TD_complete,"C:\\tmp\\rubbish.csv", sep = ",")
> newdata = read.csv("C:\\tmp\\rubbish.csv")
> log(newdata)
The last line will happily output numbers.
This doesn't make any sense to me - either the data is numeric when I read it in the first time round, or it is not. Any ideas what might be going on?
EDIT: Unfortunately I can't share the data, it's confidential.
Review the colClasses argument of read.csv(), where you can specify what type each column should be read and stored as. That might not be so helpful if you have a large number of columns, but using it makes sure R doesn't have to guess what type of data you're using.
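A sketch of that, assuming the original file is called TD_complete.csv (the name is hypothetical) and the problem column really is X2013.05, taken from the error message above:
TD_complete <- read.csv("TD_complete.csv",
                        colClasses = c(X2013.05 = "numeric"))
# this will error if the column contains a genuinely non-numeric entry, which is itself informative
log(TD_complete)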
Just because "the last line will happily output numbers" doesn't mean R is treating the values as numeric.
Also, it would help to see some of your data.
If you provide the actual data or a sample of it, help will be much easier.
In this case I assume R has the column in question saved as a string and writes it into the CSV file as plain digits, without any extra characters. Once there, it reads it back in and does not bother to interpret a value made up of nothing but digits as anything other than a number. In other words, by writing and reading a CSV file you converted a string containing only numbers into a proper integer (or float).
But without the actual data or the rest of the code this is mere conjecture.
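And if the column did come in as character or factor, a sketch of fixing it in place instead of round-tripping through a file:
# factors need as.character() first so the levels are not turned into level codes
TD_complete$X2013.05 <- as.numeric(as.character(TD_complete$X2013.05))
log(TD_complete)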
