How do I get EXCEL to interpret character variable without scientific notation in R using fwrite? - r

I have a relatively simple issue when writing out in R with fwrite from the data.table package I am getting a character vector interpreted as scientific notation by Excel. You can run the following code to create the data issue:
#create example
samp = data.table(id = c("7E39", "7G32","5D99999"))
fwrite(samp,"test.csv",row.names = F)
When you read this back into R you get values back no problem if you have scinote disable. My less code capable colleagues work with the csv directly in excel and they see this:
They can attempt to change the variable to text but excel then interprets all the zeros. I want them to see the original "7E39" from the data table created. Any ideas how to avoid this issue?
PS: I'm working with millions of rows so write.csv is not really an option
EDIT:
One workaround I've found is to just create a mock variable with quotes:
samp = data.table(id = c("7E39", "7G32","5D99999"))[,id2:=shQuote(id)]
I prefer a tidyr solution (pun intended), as I hate unnecessary columns
EDIT2:
Following R2Evan's solution I adapted it to data table with the following (factoring another numerical column, to see if any changes occured):
#create example
samp = data.table(id = c("7E39", "7G32","5D99999"))[,second_var:=c(1,2,3)]
fwrite(samp[,id:=sprintf("=%s", shQuote(id))],
"foo.csv", row.names=FALSE)

It's a kludge, and dang-it for Excel to force this (I've dealt with it before).
write.csv(data.frame(id=sprintf("=%s", shQuote(c("7E39", "7G32","5D99999")))),
"foo.csv", row.names=FALSE)
This is forcing Excel to consider that column a formula, and interpret it as such. You'll see that in Excel, it is a literal formula that assigns a static string.
This is obviously not portable and prone to all sorts of problems, but that is Excel's way in this regard.
(BTW: I used write.csv here, but frankly it doesn't matter which function you use, as long as it passes the string through.)

Another option, but one that your consumers will need to do, not you.
If you export the file "as is", meaning the cell content is just "7E39", then an auto-import within Excel will always try to be smart about that cell's content. However, you can manually import the data.
Using Excel 2016 (32bit, on win10_64bit, if it matters):
Open Excel (first), have an (optionally empty) worksheet already open
On the ribbon: Data > Get External Data > From Text
Navigate to the appropriate file (CSV)
Select "Delimited" (file type), click Next, select "Comma" (and optionally deselect any others that may default to selected), Next
Click on the specific column(s) and set the "Default data format" to "Text" (this will need to be done for any/all columns where this is a problem). Multiple columns can be Shift-selected (for a range of columns), but not Ctrl-selected. Finish.
Choose the top-left cell to import/paste the data (or a new worksheet)
Select Properties..., and deselect "Save query definition". Without this step, the data is considered a query into an external data source, which may not be a problem but makes some things a little annoying. (For example, try to highlight all data and delete it ... Excel really wants to make sure you know what you're doing there.)
This method provides a portable solution. It "punishes" the Excel users, but anybody/anything else will still be able to consume the files directly without change. The biggest disadvantage with this method is that you won't know if somebody loads it incorrectly unless/until they get odd results when the try to use the data and some fields are silently converted.

Related

When I upload my excel file to R, the column titles are in the rows and the data seems all jumbled. How do I fix this?

hi literally day one new coder
On the excel sheet, my data looks organized, but when I upload my file to R, it's not able to read the excel properly and the column headers are in the rows and the data seems randomized.
So far I have tried:
library(readxl)
dataset <-read_excel("pathname")
View(dataset)
Also tried:
dataset <-read_excel("pathname", sheet=1, colNames=TRUE)
Also tried to use the package openxlsx
but nothing is giving me the correct, organized data set.
I tried formatting my Excel to a CSV file, and the CSV file looks exactly like the data that shows up on R (both are messed up).
How should I approach this problem?
I deal with importing .xlsx into R frequently. It can be challenging due to the flexibility of the excel platform. I generally use readxl::read_xlsx() to fetch data from .xlsx files. My suggestions:
First, specify exactly the data you want to import with the range argument.
A cell range to read from, as described in cell-specification. Includes typical Excel
ranges like "B3:D87", possibly including the sheet name like "Budget!B2:G14"
Second, if there are there merged cells or other formatting challenges in column headers, I resort to setting col_names = FALSE. And supplying clean names after import with names(df) <- c("first_col", "second_col")
Third, if there are merged cells elsewhere in the spreadsheet I generally I resort to "fixing" them in excel (not ideal but easier for my use case), however, others may have suggestions on a programmatic fix.
It may be helpful to provide a screenshot of your spreadsheet.

The R package XLSX is converting entire column to string or boolean when one cell is not numeric

I am using a shiney interface under R to read in a CSV file and load it into one sheet of an excel xlsm file. The file then allows user input and preforms calculations based on VBA macros.
The R xlsx package is working well for preserving the VBA and formatting in the original excel sheet. However some of the data is being converted to a different data type than intended. For example a cell containing the string "F" is causing the column containing it to be converted to type boolean, or a miss-entered number in one cell is causing the entire column to be converted to string.
Can this behavior be controlled so that, for example, cells with valid numbers are not converted to string type? Is there a work-around? Or can someone just help me to understand what is happening in the guts of the package to cause this effect so I can try to find a way around it?
Here are the calls in question:
#excelType() points to an excel xlsm template
data = read.csv("results.csv")
excelForm = loadWorkbook(excelType())
sheets = getSheets(excelForm)
addDataFrame(data, sheets[[1]], col.names = FALSE, row.names = FALSE, startRow=2, colStyle = NULL)
saveWorkbook(excelForm, "results.xlsm")
Thanks!
I hope this is the correct protocol for explaining the outcome which worked for me. I hope it will be of help to others if they end up doing something similar, though the solution is not very elegant!
I tried r2evans's suggestion of forcing column types I could not get that to work in this case. Using readxls gave the same problem, and also broke my VBA. Given lebelionz's comment suggesting that this is an R thing and not a package thing I followed his advice to deal with it after the fact. (I do not see how to credit a comment rather than an answer, but for the record this was very helpful, as were the others).
I therefore altered the program producing the CSV that was being loaded through R. I appended "::" to each cell produced, so that R saw all cells as strings, regardless of the original content. Thus "F" was stored as "::F", and therefore was not altered by R.
I added an autorun macro to the excel sheet thus created, so that when opened it automatically performed a global search and replace to remove the prefix "::" from the whole of the data. This forces Excel to choose a data type for each cell after it was restored, resulting in the types being detected cell by cell and in the correct format for my purposes.
It feels kludgy, but it works and is relatively transparent to the user. One hazard is that if the user data intentionally contained the string "::" it would be lost (I am confident this cannot arise in my particular application, but if someone would like to suggest a better prefix I would be interested). I still hope for an eventual solution rather than a work-around.
And here I thought it was only the movie industry that had to "fix it in post"!

characters converted to dates when using write.csv

my data frame has a column A with strings in character form
> df$A
[1] "2-60", "2-61", "2-62", "2-63" etc
I saved the table using write.csv, but when I open it with Excel column A appears formatted as date:
Feb-60
Feb-61
Feb-62
Feb-63
etc
Anyone knows what can I do to avoid this?
I tweaked the arguments of write.csv but nothing worked, and I can't seem to find an example in Stack Overflow that helps solve this problem.
As said in the comments, this is an excel behaviour, not R's. And that can't be deactivated:
Microsoft Excel is preprogrammed to make it easier to enter dates. For
example, 12/2 changes to 2-Dec. This is very frustrating when you
enter something that you don't want changed to a date. Unfortunately
there is no way to turn this off. But there are ways to get around it.
Microsoft Office Article
The first suggested way around it according to the article is not helpful, because it relies on changing the cell formatting, but that's too late when you open the .csv file in excel (it's already converted to an integer representing the date).
There is, however, a useful tip:
If you only have a few numbers to enter, you can stop Excel from
changing them into dates by entering:
An apostrophe (‘) before you enter a number, such as ’11-53 or ‘1/47. The apostrophe isn’t displayed in the cell after you press
Enter.
So you can make the data display as original by using
vec <- c("2-60", "2-61", "2-62", "2-63")
vec <- paste0("'", vec)
Just remember the values will still have the apostrophe if you read them again in R, so you might have to use
vec <- sub("'", "", vec)
This might not be ideal but at least it works.
One alternative is enclosing the text in =" ", as an excel formula, but that has the same end result and uses more characters.
Another solution - a bit tedious, Use Import Text File in Excel, click thru the dialog boxes and in Step 3 of 3 of the Text Import Wizard, you will have an option of setting the column data format, use "Text" for the column that has "2-60", "2-61", "2-62", "2-63". If you use General (the default), Excel tries to be smart and converts the answer for you.
I solved the problem by saving the file using the .xlsx format by using the function
write.xlsx()
from the package xlsx (https://www.rdocumentation.org/packages/xlsx/versions/0.6.5)

Large csv file fails to fully read in to R data.frame

I am trying to load a fairly large csv file into R. It has about 50 columns and 2million row.
My code is pretty basic, and I have used it to open files before but none this large.
mydata <- read.csv('file.csv', header = FALSE, sep=",", stringsAsFactors = FALSE)
The result is that it reads in the data but stops after 1080000 rows or so. This is roughly where excel stops as well. Is their way to get R to read the whole file in? Why is it stopping around half way.
Update: (11/30/14)
After speaking with the provider of the data it was discovered that they may have been some corruption issue with the file. A new file was provided which also is smaller and loads into R easily.
As, "read.csv()" read up to 1080000 rows, "fread" from library(data.table) should read it with ease. If not, there exists two other options, either try with library(h20) or with "fread" you can use select option to read required columns (or read in two halves, do some cleaning and can merge them back).
You can try using read.table and include the parameter colClasses to specify the type of the individual columns.
With your current code, R will read all data first as strings and then check for each column if it is convertible e. g. to a numeric type, which needs more memory than reading right away as numeric. colClasses will also allow you to ignore columns you might not need.

How to write multiple tables, dataframes, regression results etc - to one excel file?

I am looking for an easy way to get objects into MS Excel.
(I am using the preinstalled "Puromycin"-dataset for the examples)
I would like to place the contents of these objects to a single excel file:
Puromycin
summary(Puromycin$rate)
summary(Purymycin$conc)
table(Puromycin$state)
lm( conc ~ rate , data=Puromycin)
By "contents" i mean what is shown in the console when i press enter. I dont know what to call it.
I tried to do this:
sink("datafilewhichexcelhopefullyunderstands.csv")
Puromycin
summary(Puromycin$rate)
summary(Purymycin$conc)
table(Puromycin$state)
lm( conc ~ rate , data=Puromycin)
sink()
This gives med a file with the CSV-extension, however when i open the file in notepad,
there is comma-separation. That means that i cant get Excel to open it properly. By properly
i mean that each number is in its own cell.
Others have suggested this for a similar problem
https://stackoverflow.com/a/13007555/1831980
But as a novice i feel that the solution is too complex, and I am hoping for a simpler method.
What I am doing now is this:
write.table(Puromycin, file="clipboard" , sep=";" , row.names=FALSE )
write.table(summary(Purymycin$conc), file="clipboard" , sep=";" , row.names=FALSE )
... etc...
But this requires i lot of copy-ing and pasting, which I hope to eliminate.
Any help would appreciated.
write.table and its friends are intended to write out columns of data separated by whatever separator is specified. Your clipboard contains several data types because you are using summary which always gives a unique output.
For writing the data values out, you can use write.csv on a data frame and then open with Excel. For example, Puromycin is already a data frame (which you can see with str(Puromycin)) so you can just write it out directly:
write.csv(file = "some file.csv", x = Puromycin)
Which will go into the current working directory (which can be determined with getwd()).
To write out/save the results of the regression model is a bit more of a challenge. You could definitely use sink as you did, but specify an extension of .txt on your file so a text editor can open it. There are fancier methods (sweave, knitr) which you might want to look into in the long run, as they can write really nice reports automatically.
In the meantime, get to know str(any R object) as it will be your friend. You can see all the objects in your workspace with ls().
This will only be helpful if you are prepared to use Excel's Data/Text to Columns functions:
capture.output( sapply( c(Puromycin,
summary(Puromycin$rate),
summary(Puromycin$conc),
table(Puromycin$state),
lm( conc ~ rate , data=Puromycin) ), FUN=print), file="datafilewhichexcelhopefullyunderstands.csv", append=TRUE)
The problem being that Excel will not read the whitespace as a cell separator unless you specifically tell it to. You can (and I have often done so) use the fixed filed input features offered by the Text-to-Columns dialog interface.
Your simplest option may be to use the RExcel tool, it transfers information between R and Excel. However it is not free software.
The XLConnect package is another option, it can be used to write information directly to an Excel file.
The tricky part is the lm call. lm does not return a simple vector, matrix, or data frame (all of which are easy to convert to csv or send directly) and there is not a clear way to convert the various parts of a list to cells in a spreadsheet. What would be better is to use extractor functions to pull the important parts from the return of lm or the summary of the lm object and send those to Excel using the other tools.
If you can tell us more about why you want the numbers in Excel and what you plan to do with them after, then we may be able to offer better help (you may be able to completely skip excel).
If the main goal is to share output with others then you should really look at the knitr package (or other related packages). This will not create Excel files, but can be used (along with the pandoc program and possibly other tools) to create a report file in a format easy to share with others not familiar with R. You could put everything into a .pdf file or a .docx file (the latter read by MS Word and would have tables wich can be edited using Word). There is not a simple way to get edits back into R, but with the track changes you can easily see what changes have been made and hand edit your R script/template accordingly.

Resources