I have data in Excel, and after reading it into R it displays as follows:
lob2 lob3
1.86E+12 7.58E+12
I want it as
lob2 lob3
1857529190776.75 7587529190776.75
This difference causes me to get different results when I do my analysis later on.
How is the data stored in Excel (does it think it is a number, a string, a date, etc.)?
How are you getting the data from Excel to R? If you save the data as a .csv file and then read it into R, look at the intermediate file: Excel is known to abbreviate numbers when saving, and R would then see character strings instead of numbers. You need to find a way to tell Excel to export the data in the correct format with the correct precision.
If you are using a package (there is more than one), then look into the details of that package for how to grab the numbers correctly (you may need to make changes in Excel so that it knows they are numbers).
Lastly, what does the str function on your R object say? It could be that R is storing the proper numbers and only displaying the short version as mentioned in the comments. Or, it could be that R received strings that did not convert nicely to numbers and is storing them as characters or factors. The str function will let you see how your data is stored in R, and therefore how to convert or display it correctly.
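For instance, here is a minimal sketch (with made-up values mimicking the question's columns) showing how str reveals the difference, and how to recover numbers that were stored as text:

```r
# Hypothetical data frame: lob2 arrived as text, lob3 as a real number
df <- data.frame(lob2 = "1857529190776.75",
                 lob3 = 7587529190776.75,
                 stringsAsFactors = TRUE)

str(df)
# lob2 shows up as a Factor (text), lob3 as num (a real number)

# Convert a factor/character column to numeric.
# Go through as.character first: converting a factor directly
# with as.numeric would return the level indices, not the values.
df$lob2 <- as.numeric(as.character(df$lob2))
str(df)  # now both columns are num
```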
Related
I have a task where I am given an Excel file with multiple sheets and various formulas. The sheets are linked to each other by many of the formulas.
What I was thinking of doing was first converting the Excel file into multiple CSVs and then finding all the formulas by looking for cells that start with '='. The thing I am struggling with is converting an Excel formula such as: "=C2*xyz!D3/xyz!C4"
The final output should be a final data.frame or data.table containing all the codes/names of the formulas and the formulas converted in R syntax.
R formulas need the input data to be defined as a list, but in Excel it is defined per cell.
Since you are (I assume) only interested in some summary output and not the full per-cell data, it would be a waste to translate every formula to R.
Instead, step back and identify which cells are the real inputs (in R, the input list). Then trace how they are transformed and linked across sheets, all the way to the output table. Once you have that draft algorithm, rewriting the whole data flow in R will not be as hard.
Hope it helps.
P.S.: I'm proposing a method as a solution, not a full solution, since no sample data/file was provided. ( :
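If you do want to inventory the formulas without the CSV detour, one option (a sketch, assuming the tidyxl package is installed; the file name is made up) is to read the cells directly, since tidyxl exposes each cell's formula as a column:

```r
library(tidyxl)  # assumption: tidyxl is available

cells <- xlsx_cells("workbook.xlsx")  # hypothetical file name

# Keep only cells that contain a formula. The result is a data frame
# with the sheet name, cell address and formula text, which you could
# then translate to R syntax.
formulas <- cells[!is.na(cells$formula),
                  c("sheet", "address", "formula")]
head(formulas)
```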
I don't want the display format like this: 2.150209e+06
the format I want is 2150209
because when I export data, format like 2.150209e+06 caused me a lot of trouble.
I did some searching and found that this function could help me:
formatC(numeric_summary$mean, digits = 1, format = "f")
I am wondering, can I set options to change this permanently? I don't want to apply this function to every variable of my data because I have this problem very often.
One more question: can I change the class of all integer variables to numeric automatically? With the integer format, summing a whole column often fails with "integer overflow - use sum(as.numeric(.))".
I don't need integer format, all I need is numeric format. Can I set options to change integer class to numeric please?
I don't know how you are exporting your data, but when I use write.csv with a data frame containing numeric data, I don't get scientific notation, I get the full number written out, including all decimal precision. Actually, I also get the full number written out even with factor data. Have a look here:
df <- data.frame(c1=c(2150209.123, 10001111),
c2=c('2150209.123', '10001111'))
write.csv(df, file="C:\\Users\\tbiegeleisen\\temp.txt")
Output file:
"","c1","c2"
"1",2150209.123,"2150209.123"
"2",10001111,"10001111"
Update:
It is possible that you are just dealing with a data rendering issue. What you see in the R console or in your spreadsheet does not necessarily reflect the precision of the underlying data. For instance, if you are using Excel, highlight a numeric cell, press CTRL + 1 and then change the format. You should be able to see the full/true precision of the underlying data. Similarly, the number you see printed in the R console might use scientific notation only for ease of reading (SN was invented partially for this very reason).
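If the problem really is only how R prints the numbers, you can discourage scientific notation globally with the scipen option (a penalty R must overcome before it switches to scientific notation), or per value with format():

```r
options(scipen = 999)          # strongly prefer fixed notation globally
print(21503413542209.123)      # now printed in full, not as 2.150341e+13

# Per-value alternative, without changing global options:
format(2.150209e+06, scientific = FALSE)
```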
Thank you all.
For the example above, I tried this:
df <- data.frame(c1=c(21503413542209.123, 10001111),
c2=c('2150209.123', '100011413413111'))
c1 in df is displayed in scientific notation, c2 is not.
Then I run write.csv(df, file="C:\\Users\\tbiegeleisen\\temp.txt").
It does output all the digits.
Can I disable scientific notation in R, please? It still causes me trouble, even though all the digits are exported to the txt file.
Sometimes I want to visually compare two big numbers. For example, if I run
df <- data.frame(c1=c(21503413542209.123, 21503413542210.123),
c2=c('2150209.123', '100011413413111'))
df will be
c1 c2
2.150341e+13 2150209.123
2.150341e+13 100011413413111
The two values for c1 are actually different, but I cannot tell them apart in R unless I export them to txt. The numbers here are fake, but I encounter the same problem every day.
I'm using R to write a csv file, which is then ingested by PowerBi365 to fuel an existing report. One of the columns in the csv file ends up having some NA values, which means that PowerBi erroneously identifies the column as a string field. This prevents the application from being able to analyze the numerical data in the column. Is there a way to avoid this? I've tried using write.csv(x,file,na=""), but the problem persists. How can I write the csv in such a way that PowerBi365 recognizes the field as a number field? Thanks!
Turns out this problem was caused by the use of the round() function in R. That function, in combination with the existing NA values in the dataset, caused the column to be written to the csv as a character field. I solved the problem by wrapping the round() call in as.numeric(), which coerced the field back into the num data type.
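A minimal sketch of that fix (the column and file names are made up):

```r
df <- data.frame(amount = c(12.3456, NA, 78.9012))

# Coerce the rounded column back to numeric before writing, and write
# NA values as empty cells so the CSV column stays numeric for PowerBI
df$amount <- as.numeric(round(df$amount, 2))
write.csv(df, "report.csv", na = "", row.names = FALSE)
```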
I have read in a table in R, and am trying to take log of the data. This gives me an error that the last column contains non-numeric values:
> log(TD_complete)
Error in Math.data.frame(list(X2011.01 = c(187072L, 140815L, 785077L, :
non-numeric variable in data frame: X2013.05
The data "looks" numeric, i.e. when I read it my brain interprets it as numbers. I can't be totally wrong since the following will work:
> write.table(TD_complete,"C:\\tmp\\rubbish.csv", sep = ",")
> newdata = read.csv("C:\\tmp\\rubbish.csv")
> log(newdata)
The last line will happily output numbers.
This doesn't make any sense to me - either the data is numeric when I read it in the first time round, or it is not. Any ideas what might be going on?
EDIT: Unfortunately I can't share the data, it's confidential.
Review the colClasses argument of read.csv(), where you can specify what type each column should be read and stored as. That might not be so helpful if you have a large number of columns, but using it makes sure R doesn't have to guess what type of data you're using.
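A sketch of what that looks like (the file name and column types are assumptions):

```r
# Force every column to be read as numeric; R will stop with an error
# instead of silently falling back to character/factor
TD_complete <- read.csv("TD_complete.csv", colClasses = "numeric")

# Or specify each column individually when the types differ:
# TD_complete <- read.csv("TD_complete.csv",
#                         colClasses = c("character", "numeric", "numeric"))
```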
Just because "the last line will happily output numbers" doesn't mean R is treating the values as numeric.
Also, it would help to see some of your data.
If you provide the actual data or a sample of it, help will be much easier.
In this case I assume R has the column in question saved as a string and writes it into the CSV file as plain text. When it reads that file back, it interprets a value consisting only of digits as a number. In other words, by writing and reading a CSV file you converted a string containing only numbers into a proper integer (or float).
But without the actual data or the rest of the code this is mere conjecture.
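You can reproduce that round-trip conversion with a small self-contained example (made-up values in the spirit of the question):

```r
# A column of digits stored as text (factor), as the question suggests
df <- data.frame(x = c("187072", "140815"), stringsAsFactors = TRUE)
str(df)            # x is a Factor, so log(df) would fail

tmp <- tempfile(fileext = ".csv")
write.table(df, tmp, sep = ",")
newdata <- read.csv(tmp)   # read.csv re-guesses the column type
str(newdata)               # x is now integer
log(newdata)               # works
```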
I am fairly new to R. I have a data file which contains a matrix of complex numbers, each of the form 123+123i. When I try to read the data into R using read.table(), it returns strings, which is not what I want. Is there some way to read in a file of complex numbers?
One possible thing I could do, since the program that generates the matrix is available to me, is modify it to output two real numbers instead of a single complex number, and after reading them into R combine them into a single complex number. But would this be the canonical way of doing what I want?
See ?read.table, in particular you want to use the colClasses="complex" argument.
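For example, writing a small file of complex values and reading it back (file contents are made up):

```r
tmp <- tempfile()
writeLines(c("123+123i 4-2i",
             "0+1i     5+0i"), tmp)

# colClasses = "complex" tells read.table to parse every column
# with as.complex instead of leaving the values as strings
m <- read.table(tmp, colClasses = "complex")
str(m)       # both columns are cplx
m$V1 * 1i    # ordinary complex arithmetic now works
```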