I am using EPPlus to generate an excel report in ASP.Net. I have the following code in use:
aCell.Value = CDec(aValue)
aCell is of type OfficeOpenXml.ExcelRange.
This works fine most of the time. However, when aValue is 201600000000515561, the cell in Excel is set with the value of 201600000000516000.
Stepping through the code and watching the value of aCell.Value shows that the correct value being set. But when the excel is opened, the value has been rounded up to the next 1000.
Does anybody have a solution or is this a bug in EPPlus?
Its not an Epplus thing, more of an Excel thing. Because of the way excel stores numbers you have up to 15 significant figures after which excel basically inserts 0's. If you count 201600000000515561 starting from the left you will see this. Have a look at this:
https://en.wikipedia.org/wiki/Numeric_precision_in_Microsoft_Excel
I think the most popular work around would be to store the numbers as text which of course leads to its own set of issues. Here is a good thread on it:
http://answers.microsoft.com/en-us/office/forum/office_2007-excel/15-digit-number-limitation-non-text-workaround/a4974853-7c3c-4830-8562-2e88369d981b?auth=1
Related
I recently ran into an issue in R where it wasn't reading the date values from my csv file. I had reviewed and revised my code many times before realizing the source of the issue was the file itself. After experimentation, I realized that the date would only read if I expanded the date column in Excel and then resaved the file. This doesn't seem logical to me, the data is stored in the spreadsheet so while I expect Excel to make it unreadable to the human eye until the column is expanded, I did not expect it to be unreadable to another computer program, in this case, R. I feel that I got lucky discovering this and would like to understand why it works that way?
It helps to mention that I noticed a very similar issue when pasting un-expanded dates from Excel to Google Sheets; the result is cells filled with "######" instead of the actual date values.
Because Excel is the common party here I'm assuming this is an Excel issue.
What is happening with Excel to make the un-expanded date values unreadable to other software? Is this something that Excel is aware of?
Un-expanded dates
R-Misread
Google Sheets Paste
After expanding the column in the csv file and saving it, both the R issue and the Google Sheets issue went away.
I have a relatively simple issue when writing out in R with fwrite from the data.table package I am getting a character vector interpreted as scientific notation by Excel. You can run the following code to create the data issue:
#create example
samp = data.table(id = c("7E39", "7G32","5D99999"))
fwrite(samp,"test.csv",row.names = F)
When you read this back into R you get values back no problem if you have scinote disable. My less code capable colleagues work with the csv directly in excel and they see this:
They can attempt to change the variable to text but excel then interprets all the zeros. I want them to see the original "7E39" from the data table created. Any ideas how to avoid this issue?
PS: I'm working with millions of rows so write.csv is not really an option
EDIT:
One workaround I've found is to just create a mock variable with quotes:
samp = data.table(id = c("7E39", "7G32","5D99999"))[,id2:=shQuote(id)]
I prefer a tidyr solution (pun intended), as I hate unnecessary columns
EDIT2:
Following R2Evan's solution I adapted it to data table with the following (factoring another numerical column, to see if any changes occured):
#create example
samp = data.table(id = c("7E39", "7G32","5D99999"))[,second_var:=c(1,2,3)]
fwrite(samp[,id:=sprintf("=%s", shQuote(id))],
"foo.csv", row.names=FALSE)
It's a kludge, and dang-it for Excel to force this (I've dealt with it before).
write.csv(data.frame(id=sprintf("=%s", shQuote(c("7E39", "7G32","5D99999")))),
"foo.csv", row.names=FALSE)
This is forcing Excel to consider that column a formula, and interpret it as such. You'll see that in Excel, it is a literal formula that assigns a static string.
This is obviously not portable and prone to all sorts of problems, but that is Excel's way in this regard.
(BTW: I used write.csv here, but frankly it doesn't matter which function you use, as long as it passes the string through.)
Another option, but one that your consumers will need to do, not you.
If you export the file "as is", meaning the cell content is just "7E39", then an auto-import within Excel will always try to be smart about that cell's content. However, you can manually import the data.
Using Excel 2016 (32bit, on win10_64bit, if it matters):
Open Excel (first), have an (optionally empty) worksheet already open
On the ribbon: Data > Get External Data > From Text
Navigate to the appropriate file (CSV)
Select "Delimited" (file type), click Next, select "Comma" (and optionally deselect any others that may default to selected), Next
Click on the specific column(s) and set the "Default data format" to "Text" (this will need to be done for any/all columns where this is a problem). Multiple columns can be Shift-selected (for a range of columns), but not Ctrl-selected. Finish.
Choose the top-left cell to import/paste the data (or a new worksheet)
Select Properties..., and deselect "Save query definition". Without this step, the data is considered a query into an external data source, which may not be a problem but makes some things a little annoying. (For example, try to highlight all data and delete it ... Excel really wants to make sure you know what you're doing there.)
This method provides a portable solution. It "punishes" the Excel users, but anybody/anything else will still be able to consume the files directly without change. The biggest disadvantage with this method is that you won't know if somebody loads it incorrectly unless/until they get odd results when the try to use the data and some fields are silently converted.
I am using the following R script to write the data. table into the excel file in my set directory. However, the size of the file is in GB's as the total rows are 50 million+. Hence upon opening the file, I just see a blank grey screen and nothing else.
How can I see the contents in the file?
The first line is just for illustration purpose.
Final1 <- rep(iris, times = 1000000)
fwrite(Final1,"data2.csv")
You mention that this is part of the report. I would be willing to bet good money that whoever will be reading this report will not check all or most of the values by hand. In which case, you don't need a format that is easily browsable, e.g. xlsx or even csv. If this indeed is the case, you might want to try a (relational) database. If you do not have anything centralized, you might want to give SQLite a try. You save everything into one file which acts as a database. There are packages that handle this interaction in R. You can try with sqldf or RSQLite.
I am using a shiney interface under R to read in a CSV file and load it into one sheet of an excel xlsm file. The file then allows user input and preforms calculations based on VBA macros.
The R xlsx package is working well for preserving the VBA and formatting in the original excel sheet. However some of the data is being converted to a different data type than intended. For example a cell containing the string "F" is causing the column containing it to be converted to type boolean, or a miss-entered number in one cell is causing the entire column to be converted to string.
Can this behavior be controlled so that, for example, cells with valid numbers are not converted to string type? Is there a work-around? Or can someone just help me to understand what is happening in the guts of the package to cause this effect so I can try to find a way around it?
Here are the calls in question:
#excelType() points to an excel xlsm template
data = read.csv("results.csv")
excelForm = loadWorkbook(excelType())
sheets = getSheets(excelForm)
addDataFrame(data, sheets[[1]], col.names = FALSE, row.names = FALSE, startRow=2, colStyle = NULL)
saveWorkbook(excelForm, "results.xlsm")
Thanks!
I hope this is the correct protocol for explaining the outcome which worked for me. I hope it will be of help to others if they end up doing something similar, though the solution is not very elegant!
I tried r2evans's suggestion of forcing column types I could not get that to work in this case. Using readxls gave the same problem, and also broke my VBA. Given lebelionz's comment suggesting that this is an R thing and not a package thing I followed his advice to deal with it after the fact. (I do not see how to credit a comment rather than an answer, but for the record this was very helpful, as were the others).
I therefore altered the program producing the CSV that was being loaded through R. I appended "::" to each cell produced, so that R saw all cells as strings, regardless of the original content. Thus "F" was stored as "::F", and therefore was not altered by R.
I added an autorun macro to the excel sheet thus created, so that when opened it automatically performed a global search and replace to remove the prefix "::" from the whole of the data. This forces Excel to choose a data type for each cell after it was restored, resulting in the types being detected cell by cell and in the correct format for my purposes.
It feels kludgy, but it works and is relatively transparent to the user. One hazard is that if the user data intentionally contained the string "::" it would be lost (I am confident this cannot arise in my particular application, but if someone would like to suggest a better prefix I would be interested). I still hope for an eventual solution rather than a work-around.
And here I thought it was only the movie industry that had to "fix it in post"!
I have a script using PHPExcel framework and everything is working perfectly fine except for one big issue when calculating the sum of a column.
It works and the right number is in the cell, the only thing is that when I run the script (and I checked the Excel file I uploaded, no wrong formats there) it converts my TIME cell into a DD.MM.YYY HH:MM cell which corrupts my sum equation.
Whenever I change the field types (formats) manually in Excel, the sum is right and everything works.
What can I do to find and fix the problem?