Ignoring calculated values when reading an Excel file with PHPExcel

I am reading an XLS file with setReadDataOnly(true). The loaded object is then saved as a new Excel file. Unfortunately, some cell values are calculated incorrectly (this is due to a calculation bug affecting cells with a SUBTOTAL formula). If I understand correctly, each formula cell in an XLS file stores a pre-calculated value alongside the formula. If I can get PHPExcel to skip recalculating the formulas when reading the file in, and instead use the pre-calculated values as they are, I can work around this issue. I thought that setReadDataOnly(true) or setPreCalculateFormulas(false) might accomplish this, but neither does.
Additional info
Thanks to Mark's explanation, I investigated the difference between getCalculatedValue() and getOldCalculatedValue() in my case. I use the following code to read the file in and then write it out again:
$excel_reader = PHPExcel_IOFactory::createReaderForFile($file);
$excel_reader->setReadDataOnly(true);
$excel_obj_temp = $excel_reader->load($file);
// Test one of the values in question
$excel_obj_temp->setActiveSheetIndexByName("Form 11");
error_log("val:".$excel_obj_temp->getActiveSheet()->getCell("E36")->getCalculatedValue());
error_log("old_val:".$excel_obj_temp->getActiveSheet()->getCell("E36")->getOldCalculatedValue());
// Write the object back out as a new XLS file without recalculating formulas
$new_file = "new_generated_name";
$excel_writer = new PHPExcel_Writer_Excel5($excel_obj_temp);
$excel_writer->setPreCalculateFormulas(false);
$excel_writer->save($new_file);
When reading the file in, it shows the correct value with getOldCalculatedValue(). If I then save the file without setPreCalculateFormulas(false) and read it in again, both getCalculatedValue() and getOldCalculatedValue() return the same (incorrect) result. This is in line with Mark's explanation that the values are recalculated on save unless you set setPreCalculateFormulas(false).
However, if I instead save the file with setPreCalculateFormulas(false) (which seems to be the correct approach) and read it in again, getCalculatedValue() returns the incorrect result and getOldCalculatedValue() returns 0, which is wrong.
Why are the cached values cleared after saving? Is there some other setting I need to apply along with setPreCalculateFormulas(false)?

PHPExcel does not calculate any values when you load a spreadsheet file. It only calculates cell values if you explicitly call a cell's getCalculatedValue() or getFormattedValue() method, or, by default, when you save (unless you use the Writer's setPreCalculateFormulas(false)). Note, however, that using autofit columns forces a recalculation on save for any cells in those columns, regardless of any other settings.
MS Excel will normally save a calculated value for all formula cells in a spreadsheet (unless this is explicitly disabled), and this value can be read in PHPExcel using the cell's getOldCalculatedValue() method.

Related

In PhpSpreadsheet I want the calculated or final value of cells instead of the formula when reading an xlsx file. How can I do this?

I'm using PhpSpreadsheet to read the content of an xlsx file into an array. Unfortunately, the cells are giving the formula instead of the final value. Is there a way to just get the final value instead of the formula?
You can use the calculation engine (for example, a cell's getCalculatedValue() method); there is more information in the documentation here.

How do I get Excel to interpret a character variable without scientific notation in R using fwrite?

I have a relatively simple issue: when writing out in R with fwrite from the data.table package, I am getting a character vector interpreted as scientific notation by Excel. You can run the following code to reproduce the issue:
# create example
library(data.table)
samp = data.table(id = c("7E39", "7G32", "5D99999"))
fwrite(samp, "test.csv", row.names = FALSE)
When you read this back into R, you get the values back with no problem, as long as scientific notation is disabled. My less code-capable colleagues work with the CSV directly in Excel, however, and they see the values converted to scientific notation (for example, 7E39 becomes 7E+39).
They can attempt to change the variable to text, but Excel then shows the expanded number with all of its zeros. I want them to see the original "7E39" from the data table as created. Any ideas how to avoid this issue?
PS: I'm working with millions of rows, so write.csv is not really an option.
EDIT:
One workaround I've found is to just create a mock variable with quotes:
# create a second, quoted copy of the id column
samp = data.table(id = c("7E39", "7G32", "5D99999"))[, id2 := shQuote(id)]
I prefer a tidyr solution (pun intended), as I hate unnecessary columns
EDIT2:
Following r2evans's solution, I adapted it to data.table as follows (adding another numeric column, to see whether any changes occurred):
# create example with a second numeric column
samp = data.table(id = c("7E39", "7G32", "5D99999"))[, second_var := c(1, 2, 3)]
# type = "cmd" makes shQuote use double quotes, which Excel formula strings require
fwrite(samp[, id := sprintf("=%s", shQuote(id, type = "cmd"))],
       "foo.csv", row.names = FALSE)
It's a kludge, and dang-it for Excel to force this (I've dealt with it before).
# type = "cmd" makes shQuote use double quotes, which Excel formula strings require
write.csv(data.frame(id = sprintf("=%s", shQuote(c("7E39", "7G32", "5D99999"), type = "cmd"))),
          "foo.csv", row.names = FALSE)
This is forcing Excel to consider that column a formula, and interpret it as such. You'll see that in Excel, it is a literal formula that assigns a static string.
This is obviously not portable and prone to all sorts of problems, but that is Excel's way in this regard.
(BTW: I used write.csv here, but frankly it doesn't matter which function you use, as long as it passes the string through.)
Another option, but one that your consumers will need to do, not you.
If you export the file "as is", meaning the cell content is just "7E39", then an auto-import within Excel will always try to be smart about that cell's content. However, you can manually import the data.
Using Excel 2016 (32bit, on win10_64bit, if it matters):
Open Excel (first), have an (optionally empty) worksheet already open
On the ribbon: Data > Get External Data > From Text
Navigate to the appropriate file (CSV)
Select "Delimited" (file type), click Next, select "Comma" (and optionally deselect any others that may default to selected), Next
Click on the specific column(s) and set the "Default data format" to "Text" (this will need to be done for any/all columns where this is a problem). Multiple columns can be Shift-selected (for a range of columns), but not Ctrl-selected. Finish.
Choose the top-left cell to import/paste the data (or a new worksheet)
Select Properties..., and deselect "Save query definition". Without this step, the data is considered a query into an external data source, which may not be a problem but makes some things a little annoying. (For example, try to highlight all data and delete it ... Excel really wants to make sure you know what you're doing there.)
This method provides a portable solution. It "punishes" the Excel users, but anybody/anything else will still be able to consume the files directly without change. The biggest disadvantage of this method is that you won't know if somebody loads it incorrectly unless/until they get odd results when they try to use the data and find that some fields were silently converted.

The R package xlsx is converting an entire column to string or boolean when one cell is not numeric

I am using a Shiny interface under R to read in a CSV file and load it into one sheet of an Excel xlsm file. The file then allows user input and performs calculations based on VBA macros.
The R xlsx package is working well for preserving the VBA and formatting in the original Excel sheet. However, some of the data is being converted to a different data type than intended. For example, a cell containing the string "F" is causing the column containing it to be converted to type boolean, and a mis-entered number in one cell is causing the entire column to be converted to string.
Can this behavior be controlled so that, for example, cells with valid numbers are not converted to string type? Is there a work-around? Or can someone just help me to understand what is happening in the guts of the package to cause this effect so I can try to find a way around it?
Here are the calls in question:
library(xlsx)
# excelType() points to an Excel xlsm template
data = read.csv("results.csv")
excelForm = loadWorkbook(excelType())
sheets = getSheets(excelForm)
addDataFrame(data, sheets[[1]], col.names = FALSE, row.names = FALSE,
             startRow = 2, colStyle = NULL)
saveWorkbook(excelForm, "results.xlsm")
Thanks!
I hope this is the correct protocol for explaining the outcome that worked for me. I hope it will be of help to others who end up doing something similar, though the solution is not very elegant!
I tried r2evans's suggestion of forcing column types, but I could not get that to work in this case. Using readxl gave the same problem, and it also broke my VBA. Given lebelionz's comment suggesting that this is an R thing and not a package thing, I followed his advice to deal with it after the fact. (I do not see how to credit a comment rather than an answer, but for the record it was very helpful, as were the others.)
I therefore altered the program producing the CSV that is loaded through R. I prepended "::" to each cell produced, so that R sees all cells as strings, regardless of the original content. Thus "F" is stored as "::F" and is therefore not altered by R.
I added an autorun macro to the Excel sheet thus created, so that when opened it automatically performs a global search-and-replace to remove the prefix "::" from the whole of the data. This forces Excel to choose a data type for each cell after the value is restored, so the types are detected cell by cell and end up in the correct format for my purposes.
It feels kludgy, but it works and is relatively transparent to the user. One hazard is that if the user data intentionally contained the string "::" it would be lost (I am confident this cannot arise in my particular application, but if someone would like to suggest a better prefix I would be interested). I still hope for an eventual solution rather than a work-around.
And here I thought it was only the movie industry that had to "fix it in post"!
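For reference, a minimal sketch of the same "::" idea done entirely on the R side, in case the program producing the CSV cannot be altered. The filenames and the template name are hypothetical:
library(xlsx)
# colClasses = "character" forces every column to string on read, so a stray
# "F" or a mistyped number cannot change a column's detected type
data <- read.csv("results.csv", colClasses = "character")
# prepend "::" to every cell; the autorun macro strips it again inside Excel
data[] <- lapply(data, function(col) paste0("::", col))
excelForm <- loadWorkbook("template.xlsm")  # hypothetical template file
sheets <- getSheets(excelForm)
addDataFrame(data, sheets[[1]], col.names = FALSE, row.names = FALSE,
             startRow = 2, colStyle = NULL)
saveWorkbook(excelForm, "results.xlsm")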

Limited count of columns in an XLSX file

I need to generate a big Excel spreadsheet with XLConnect. I am filling each column in this spreadsheet with my calculations, and at the end I write them to the spreadsheet:
writeWorksheetToFile(file = FileName, mtr, startRow = 1, startCol = strcol,
                     sheet = "Sheet1", header = FALSE, rownames = FALSE)
but if I open the Excel file I can only see columns up to AMJ. Is there a way to see all of my columns, or is the number of columns in an XLSX file limited?
I am not sure about this, Kaja, because I use a convenience wrapper for XLConnect, but all I need to provide as arguments are the object I want to print to file (for you, mtr) and the filename, such as perhaps "mtr.print.xlsx".
Why do you need to specify startRow? Also, perhaps your startCol argument only leaves room up to the AMJ column? Have you tried omitting it?
In short, pass only the R object and the filename, and see what happens.
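A minimal sketch of that suggestion (the output filename is hypothetical):
library(XLConnect)
# let XLConnect place the data itself instead of fixing startRow and startCol
writeWorksheetToFile("mtr.print.xlsx", mtr, sheet = "Sheet1")
For what it's worth, the xlsx format itself allows 16,384 columns (A through XFD), while AMJ is column 1,024, the old column limit of LibreOffice Calc, so it may be the viewer rather than the file that is truncating.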

Excel data organized in multiple nested rows, can R read it?

Please see the picture. I've started using R and know that it can read files from Excel, but can it read something formatted like this?
http://www.flickr.com/photos/68814612#N05/8632809494/
(my apologies, upload was not working for me)
Elaborating on some of what's in the comments:
If you load the file into Excel, you can save it as a fixed-width or comma-delimited text file. Either should be easy to read into R.
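For instance, a minimal sketch, with hypothetical filenames and field widths:
# the fixed widths depend on the actual column layout of the export
dat_csv <- read.csv("export.csv", stringsAsFactors = FALSE)
dat_fwf <- read.fwf("export.txt", widths = c(10, 8, 8))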
The following may be obvious to you already.
(First, a question: Are you sure that you can't get the data in a format that has one set of data per line? Is it possible that the file you're getting was generated from a different file format that is more conducive to loading the data into R?)
Whether you should start rearranging the data in R or instead manipulate the raw text depends on what comes naturally to you (or to people you have around who can help). For me, personally, I would rearrange the text file outside of R before loading it into R. That's what's easiest for me. Perl is a great language for this purpose, but you could also do it with Unix shell scripts if that's accessible to you, or using a powerful editor such as Vim or Emacs. If you have no preference, I'd suggest Perl. If you have any significant programming experience, you'll be able to learn what you need. On the other hand, you're already loading it into R, so maybe it would be better to process the data there.
For example, you could execute a loop that goes through the text file line by line and does something like this:
while (still have lines to read) {
    read the first header line into a vector if this is the first time through the loop;
        otherwise, read it and throw it away
    read data line 1 into a vector
    read the second header line into a vector if this is the first time;
        otherwise, read it and throw it away
    read data line 2 into a vector
    read the third header line into a vector if this is the first time;
        otherwise, read it and throw it away
    read data line 3 into a vector
    if this is the first time through, concatenate the header vectors and store them
        as the first row in something (a file, a matrix, a dataframe, etc.)
    concatenate the data vectors you've been saving, and store them as the next row
        in the same structure
}
write out the whole 2D data structure
Or, if the headers will never change, you could just embed them literally in the script before the loop and throw away the header lines no matter what; that will make the code cleaner. Or read the first few lines of the file separately to get the headers, and then have a separate script read the data and append it to the file containing the headers. (The headers will probably be useful in R, so I suggest preserving them at the top of the text file.) A sketch of this in R follows.
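In R, the loop above might look something like the following sketch. It assumes a whitespace-delimited file, hypothetically named "nested.txt", in which each record occupies six lines (header 1, data 1, header 2, data 2, header 3, data 3) and the header lines repeat for every record:
# read the raw lines and cut them into six-line records
lines <- readLines("nested.txt")
records <- split(lines, ceiling(seq_along(lines) / 6))
# lines 1, 3, 5 of each record repeat the headers; keep one copy
header <- unlist(strsplit(trimws(records[[1]][c(1, 3, 5)]), "\\s+"))
# lines 2, 4, 6 hold the data; flatten each record into one row
rows <- lapply(records, function(rec) unlist(strsplit(trimws(rec[c(2, 4, 6)]), "\\s+")))
# write out the whole 2D data structure
result <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
names(result) <- header
write.csv(result, "rectangular.csv", row.names = FALSE)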
