I have a large dataset as a .csv file, and one important column is a 14-digit number. When I open it in R or Excel, the number appears truncated, i.e. 83990969388422 becomes 8.4E+13. I tried saving the file as an Excel worksheet, where the numbers are displayed correctly. However, as soon as I import it into R, the numbers appear truncated again.
How do I avoid this truncation in R?
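To control the use of scientific notation in your entire R session, you can use the scipen option. According to the documentation (?options), scipen is a penalty applied when deciding whether to print numeric values in fixed or exponential notation: positive values bias towards fixed notation, negative values towards scientific notation. A large value such as 999 effectively turns scientific notation off: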
options(scipen=999)
As suggested, you can use options(scipen = 999). However, this stops R from using scientific notation everywhere, so it is mostly useful if you never want scientific notation. If there are only specific variables you don't want in scientific notation, you could instead mutate those variables with format(., scientific = FALSE) to turn off scientific notation for each of them. Note that format() returns a character vector, so do this as the last step before display or export.
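A minimal base-R sketch of the difference between how the number is stored and how it is printed, using the 14-digit value from the question. The data frame and the column names id and id_text are hypothetical. A double stores integers exactly up to 2^53 (about 9.007e15), so a 14-digit value survives the import intact; only its printed form changes.

```r
# Hypothetical data frame standing in for the imported CSV
df <- data.frame(id = c(83990969388422, 83990969388423))

print(df$id)               # default printing (7 significant digits) uses scientific notation
print(df$id, digits = 15)  # more significant digits: full values are shown

# Per-variable fix: format() returns a *character* vector in fixed notation
df$id_text <- format(df$id, scientific = FALSE)
print(df$id_text)

# Session-wide fix: raise the scipen penalty, then restore the previous setting
old <- options(scipen = 999)
print(df$id)
options(old)
```

Because options() returns the previous values, options(old) undoes the change, which keeps the setting from leaking into unrelated code.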
Is there any way of stopping R from dropping leading zeros in an integer? e.g.,
a<-c(00217,00007,00017)
I understand this is not the correct way of writing integers. Sadly, I've been given a text file (the person who wrote it and the non-R code that generated it are no longer around) containing thousands of vectors in a single list:
list(drugA=c(...), drugB=c(....),........)
I need to keep the leading zeros, because otherwise 00002 becomes 2. I could load these thousands of values, then write a function that parses the list, converts the values to character and pads any number that isn't five characters long, but I was hoping for a faster alternative.
UPDATE1
An example of the text file I've been provided:
list(CETUXIMAB=c(05142,05316),
DORNASEALFA=c(94074),
ETANERCEPT=c(05342,99075),
BIVALIRUDIN=c(04400,09177),
LEUPROLIDE=c(02074,03219,91035,91086),
PEGINTERFERONALFA2A=c(03162),
ALTEPLASE=c(00486,01032,03371,05314),
DARBEPOETINALFA=c(02217,03421),
GOSERELIN=c(99221),
RETEPLASE=c(00157),
ERYTHROPOIETIN=c(92078,92122))
I have truncated the list, as there are thousands of vectors. This text file was generated by a program written in C++ (code not available). Some of the values lose their zeros when read, e.g. RETEPLASE=c(00157) becomes 157.
library(stringr)
# Left-pad each value with zeros to a width of 5 characters
str_pad(a, 5, pad = "0")
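str_pad() repairs the zeros after the values have been read. If the list is still in its text file, another option is to keep the codes as strings from the start: wrap every standalone run of digits in quotes before parsing, so 00157 is read as "00157" rather than 157. A minimal sketch, assuming the file looks like the UPDATE1 example; the file name drugs.txt and the variable names are hypothetical. The regex uses \b so that digits inside drug names such as PEGINTERFERONALFA2A are left alone. Base R's sprintf("%05d", ...) is shown as an alternative to str_pad() for an already-numeric vector.

```r
# In practice (hypothetical file name):
# txt <- paste(readLines("drugs.txt"), collapse = "\n")
txt <- "list(RETEPLASE=c(00157),
PEGINTERFERONALFA2A=c(03162),
ALTEPLASE=c(00486,01032,03371,05314))"

# Quote every standalone run of digits so parse() sees strings, not numbers;
# \\b does not match inside names like ...ALFA2A, where 2 sits between letters
quoted <- gsub("\\b([0-9]+)\\b", "\"\\1\"", txt, perl = TRUE)

# Only eval() text from a source you trust
codes <- eval(parse(text = quoted))
codes$RETEPLASE  # "00157"

# Base-R alternative to str_pad() for whole numbers already read as numeric
padded <- sprintf("%05d", c(00217, 00007, 00017))
padded           # "00217" "00007" "00017"
```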
I don't want the display format to look like this: 2.150209e+06
The format I want is 2150209,
because when I export data, values formatted like 2.150209e+06 cause me a lot of trouble.
After some searching, I found that this function could help me:
formatC(numeric_summary$mean, digits=1, format="f")
Can I set an option to change this permanently? I don't want to apply this function to every variable in my data, because I run into this problem very often.
One more question: can I change the class of all integer variables to numeric automatically? With integer columns, summing a whole column often causes trouble, with the warning "integer overflow - use sum(as.numeric(.))".
I don't need the integer class; all I need is numeric. Is there an option that changes the integer class to numeric?
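As far as I know there is no option that stops R from creating integer columns, but converting them after import takes one line. For the display part, putting options(scipen = 999) in a ~/.Rprofile file runs it at the start of every session. A minimal sketch with a hypothetical data frame df whose column a is large enough to overflow when summed:

```r
# Hypothetical data: two integers whose sum does not fit in R's 32-bit integer type
df <- data.frame(a = c(2000000000L, 2000000000L), b = c("x", "y"))

sum(df$a)  # NA, with the warning "integer overflow - use sum(as.numeric(.))"

# Convert every integer column to double ("numeric"), leaving other columns alone
df[] <- lapply(df, function(col) if (is.integer(col)) as.numeric(col) else col)

sum(df$a)  # 4e+09
```

Assigning into df[] rather than df keeps the result a data frame with the original column names.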
I don't know how you are exporting your data, but when I use write.csv with a data frame containing numeric data, I don't get scientific notation: I get the full number written out, including all decimal precision. I also get the full number written out with factor data. Have a look here:
df <- data.frame(c1=c(2150209.123, 10001111),
c2=c('2150209.123', '10001111'))
write.csv(df, file="C:\\Users\\tbiegeleisen\\temp.txt")
Output file:
"","c1","c2"
"1",2150209.123,"2150209.123"
"2",10001111,"10001111"
Update:
It is possible that you are just dealing with a data rendering issue. What you see in the R console or in your spreadsheet does not necessarily reflect the precision of the underlying data. For instance, in Excel you can highlight a numeric cell, press Ctrl + 1 and change the format; you should then see the full, true precision of the underlying data. Similarly, the number printed in the R console may use scientific notation only for ease of reading (scientific notation was invented partly for this very reason).
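A small sketch of the same point in R, using the c1 value from the example above: the console rounds to 7 significant digits by default, but the stored value keeps its decimals. The variable name x is only for illustration.

```r
x <- 2150209.123

print(x)               # 2150209 (7 significant digits by default)
print(x, digits = 10)  # 2150209.123
x == 2150209.123       # TRUE: the decimals were never lost
```

options(digits = ...) changes the default number of printed digits for the whole session.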
Thank you all.
For the example above, I tried this:
df <- data.frame(c1=c(21503413542209.123, 10001111),
c2=c('2150209.123', '100011413413111'))
c1 in df is printed in scientific notation; c2 is not.
Then I ran write.csv(df, file="C:\\Users\\tbiegeleisen\\temp.txt").
It does output all digits.
Can I disable scientific notation in R? It still causes me trouble, even though all digits were exported to the text file.
Sometimes I want to visually compare two big numbers. For example, if I run
df <- data.frame(c1=c(21503413542209.123, 21503413542210.123),
c2=c('2150209.123', '100011413413111'))
df prints as:
c1 c2
2.150341e+13 2150209.123
2.150341e+13 100011413413111
The two values of c1 are actually different, but I cannot tell them apart in R unless I export them to a text file. The numbers here are made up, but I run into the same problem every day.
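One way to tell the c1 values apart without exporting is to ask for more significant digits when printing: print() on a data frame accepts a digits argument, and options(digits = 15) does the same session-wide. Keep in mind that a double carries only about 15 to 17 significant decimal digits, so the trailing .123 of these 17-digit values cannot be stored exactly anyway. A sketch reusing the example data:

```r
df <- data.frame(c1 = c(21503413542209.123, 21503413542210.123),
                 c2 = c('2150209.123', '100011413413111'))

# c1 is printed in fixed notation, so the two values are distinguishable
print(df, digits = 15)

# Or compare the formatted values directly
format(df$c1, digits = 15)
```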
I have a lot of long numbers, and R reads them in scientific notation. But when I use write.csv, the scientific notation becomes an incorrect number followed by a bunch of zeros. For example, 3.894e+13 becomes 38944400000000 after write.csv,
even though the original data has real digits where those zeros are.
How do I keep the exact number when exporting a data file?
[update]:
(1) The problem occurs because Excel loses digits of long numbers when I save as CSV. It is an Excel bug; I use Excel 2016.
(2) When the above problem occurred, I tried setting options(scipen=999). When I summarize the data in this file, the summary statistics are always omitted. With other files, summary works without losing precision. When I print the numbers, they are correct; only the summary statistics are omitted after I set the option.
Setting the scipen option to a large enough number before writing the CSV file is one way to make it work:
df = data.frame(x = 1232939143546532)
options(scipen = 30)
write.csv(df, "test.csv")
This gives the following:
"","x"
"1",1232939143546532
I have data in Excel, and after reading it into R it looks like this:
lob2 lob3
1.86E+12 7.58E+12
I want it to look like this:
lob2 lob3
1857529190776.75 7587529190776.75
This difference gives me different results later in my analysis.
How is the data stored in Excel (does it think it is a number, a string, a date, etc.)?
How are you getting the data from Excel into R? If you save the data as a .csv file and then read it into R, look at the intermediate file: Excel is known to abbreviate numbers when saving, and R would then see character strings instead of numbers. You need to find a way to tell Excel to export the data in the correct format with the correct precision.
If you are using a package (there is more than one), look into that package's documentation for how to read the numbers correctly (you may need to make changes in Excel so that it knows they are numbers).
Lastly, what does the str function on your R object say? It could be that R is storing the proper numbers and only displaying the short version as mentioned in the comments. Or, it could be that R received strings that did not convert nicely to numbers and is storing them as characters or factors. The str function will let you see how your data is stored in R, and therefore how to convert or display it correctly.
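A minimal sketch of the two cases, using the values from the question (x and y are illustrative names). If str() reports num, the full values are stored and only the display is short; if the column arrived as text already abbreviated by Excel, converting it cannot bring the lost digits back.

```r
# Case 1: numbers stored in full, only displayed in short form
x <- c(1857529190776.75, 7587529190776.75)
str(x)               # reports the storage type (num = double)
sprintf("%.2f", x)   # full values as text

# Case 2: the column arrived as an abbreviated string
y <- "1.86E+12"
as.numeric(y)        # the digits Excel dropped are gone for good
```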
My question: I have a column with values like 20000000002185979. Every time I read the CSV file into R, it becomes "2e+16", so I can't distinguish between different values. Is there a good way to keep the original format when reading the file into R? Thanks!
Since it turned out to be the answer you wanted, I'll post it here to close out the question.
R's numeric values are double-precision floating point, which store integers exactly only up to 2^53 (about 9.007e15), so a 17-digit value such as 20000000002185979 cannot be kept exactly. You'll have to read it in as a character value instead. You can do that by setting the colClasses argument of read.table (read.csv passes it through).
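A minimal sketch of the colClasses approach, with inline CSV text standing in for the file; the column names id and value are hypothetical. The two IDs differ only in the last digit, and both round to the same double when read as numbers.

```r
# Two 17-digit IDs that differ only in the last digit
csv_text <- "id,value\n20000000002185979,1\n20000000002185980,2\n"

# Default: id is parsed as double, and the two IDs collapse to the same number
as_num <- read.csv(text = csv_text)
as_num$id[1] == as_num$id[2]  # TRUE

# colClasses keeps the column as text, so every digit survives
as_chr <- read.csv(text = csv_text, colClasses = c(id = "character"))
as_chr$id  # "20000000002185979" "20000000002185980"
```

Naming the colClasses entry targets just the id column; the other columns keep their default type detection.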