phpexcel calculate formulas before saving sheet - phpexcel

If I save a phpexcel document in Excel5 format that contains values only, people that reference the document can open and close it without issue.
But if I put some formulas in cells, I have two undesired outcomes.
Just before saving the document I set the column widths based on the contents of the columns. Since the formulas have not been calculated, the columns appear to be only as large as the largest single value in the column, so the width is set too narrow. Once the =sum() formulas are calculated after the file is opened in Excel, the contents overflow the cell width and display as a string of ###.
The second effect is that when the total is calculated by Excel, the book is marked as modified by Excel. When the user attempts to exit the book, they are asked whether they want to save their changes. This is disconcerting because in their mind they have not changed anything, and annoying because it is an interruption that they really don't want to contend with.
I have been searching the documentation. I found a reference to $objWriter->setPreCalculateFormulas(true) but it does not help with either issue.

If a column is set to AutoSize, PHPExcel attempts to calculate the column width based on the calculated value of the column (so on the result of your SUM() formula), and any additional characters added by format masks such as thousand separators. By default, this is an estimated width: a more accurate calculation method is available, based on using GD, but this is a much bigger overhead, so it is turned off by default. You can enable the more accurate calculation using
PHPExcel_Shared_Font::setAutoSizeMethod(PHPExcel_Shared_Font::AUTOSIZE_METHOD_EXACT);
If a worksheet contains formulae, then some versions of MS Excel files hold additional information detailing the calculation tree: data that is not saved by PHPExcel (because calculating the tree structure is a big overhead). You don't indicate which format you are using to save your workbooks, or which version of MS Excel you're using to open them; but this is the normal explanation for any prompting to save changes when a PHPExcel-generated file is opened in MS Excel.

It works for me if you paste it just before saving the file, like this:
$objWriter->setPreCalculateFormulas(true);
$objWriter->save("file.xlsx");

Related

R misreading of time from xlsx dataTable

I have a very annoying issue.
I have some oxygen measurements saved in an .xlsx table (created directly by the device software). Opened with Excel, this is part of my file.
In the first picture, we can notice that sometimes, the software skips a second (11:13:00 then 13:02).
In the second picture, just notice the continuity of time from 11:19:01 to 11:19:09.
I call my excel table in R with the package readxl with the code
oxy <- read_excel("./Metabolism/20180502 DAPH 20.xlsx" , 1)
And before any manipulation, when I check my table in R (Rstudio), I have that:
In the first case, R kept the time continuous by adding 11:13:01 and shifting the following rows.
Then, later, the reverse happens: the time was continuous in Excel, but R skips a second and, again, shifts the following rows.
In the end, there is the same number of rows. I guess it is a problem with the way R and Excel round the time. But these small errors prevent me from using the date to merge two tables, and the calculations afterwards are wrong.
Can I do something to tell R to read the data exactly the way Excel saved them?
Thank you very much!
Index both with a sequential integer counter each starting at the same point and use that for merging like with like. If you want the Excel version to be 'definitive' convert the index back to time with a lookup based on your Excel version.
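The indexing idea above can be sketched as follows, in Python rather than R, purely as an illustration. The helper names and the field names (`idx`, `time`) are hypothetical, and the sketch assumes both tables have the same number of rows in the same order, as the question describes.

```python
def add_index(rows, start=0):
    """Attach a sequential integer counter to each row (a dict)."""
    return [{"idx": start + i, **row} for i, row in enumerate(rows)]


def merge_on_index(left, right):
    """Join two indexed tables on the counter instead of on the timestamp."""
    right_by_idx = {row["idx"]: row for row in right}
    return [
        {**row, **right_by_idx[row["idx"]]}
        for row in left
        if row["idx"] in right_by_idx
    ]


def restore_times(rows, definitive_times):
    """Replace each row's time with the definitive (Excel) time for its index."""
    return [{**row, "time": definitive_times[row["idx"]]} for row in rows]
```

Merging on the counter sidesteps the rounding differences entirely; the timestamp is only looked up again at the end, from whichever version is treated as definitive.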

Weka Apriori No Large Itemset and Rules Found

I am trying to do Apriori association mining with WEKA (I use 3.7) using a given database table.
So, I exported two columns (orderLineNumber and productCode) and loaded them into WEKA. So far I haven't had a single successful attempt; it always ends with "No large itemsets and rules found!"
I also tried converting the CSV into an ARFF file first using the ARFF converter, and still got the same message.
I also tried using the database loader in WEKA; the data loaded just fine but still gave the same result.
The only filter I've applied in preprocessing is the NumericToNominal filter.
What have I done wrong here? I suspect it is my ARFF format. Thank you.
Update
After further trials, I found out that I had exported the wrong column and was missing one filter step, "denormalized". I installed the plugin via the package manager and denormalized my data after first converting it to nominal.
I then compared the results with the "Supermarket" sample's results. The only differences are that my output comes with 'f' instead of 't' (as shown below) and the confidence value always seems to be 100%.
First of all, OrderLine is the wrong column.
Obviously, the position on the printed bill is not very important.
Secondly, the file format is not appropriate.
You want one line for every order and one column for every possible item in the @data section. To save memory, it may be helpful to use sparse formats (do not forget to set flags appropriately).
Other tools like ELKI can process input formats like this, that may be easier to use (it also was a lot faster than Weka):
apple banana
milk diapers beer
but last I checked, ELKI would "only" find frequent itemsets (the harder part) not compute association rules. I then used a tiny python script to produce actual association rules as desired.
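The one-row-per-order, one-column-per-item layout described in this answer can be sketched as below. This is a minimal illustration, not Weka code: the function names are hypothetical, the input is assumed to be (order id, product code) pairs, and marking present items with 't' and absent ones with '?' is an assumption modeled on the style of Weka's Supermarket sample.

```python
def group_orders(rows):
    """Collect (order_id, item) pairs into one set of items per order."""
    baskets = {}
    for order_id, item in rows:
        baskets.setdefault(order_id, set()).add(item)
    return baskets


def to_matrix(baskets, present="t", absent="?"):
    """Return (items, table): one row per order, one column per item."""
    items = sorted({item for basket in baskets.values() for item in basket})
    table = {
        order_id: [present if item in basket else absent for item in items]
        for order_id, basket in baskets.items()
    }
    return items, table
```

Each entry of `table` corresponds to one line of the data section, and `items` gives the attribute order.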

Either unformatted I/O is giving absurd values, or I'm reading them incorrectly in R

I have a problem with unformatted data and I don't know where, so I will post my entire workflow.
I'm integrating my own code into an existing climate model, written in Fortran, to generate a custom variable from the model output. I have been successful in getting sensible and readable formatted output (values up to the thousands), but when I try to write unformatted output, the values I get are absurd (on the scale of 1E10).
Would anyone be able to take a look at my process and see where I might be going wrong?
I'm unable to make a functional replication of the entire code used to output the data, however the relevant snippet is;
c write customvar to file [UNFORMATTED]
open (unit=10,file="~/output_test_u",form="unformatted")
write (10)customvar
close(10)
c write customvar to file [FORMATTED]
c open (unit=10,file="~/output_test_f")
c write (10,*)customvar
c close(10)
The model was run twice, once with the FORMATTED code commented out and once with the UNFORMATTED code commented out, although I now realise I could have run it once if I'd used different unit numbers. Either way, different runs should not produce different values.
The files produced are available here;
unformatted(9kb)
formatted (31kb)
In order to interpret these files, I am using R. The following code is what I used to read each file, and shape them into comparable matrices.
##Read in FORMATTED data
formatted <- scan(file="output_test_f",what="numeric")
formatted <- (matrix(formatted,ncol=64,byrow=T))
formatted <- apply(formatted,1:2,as.numeric)
##Read in UNFORMATTED data
to.read <- file("output_test_u","rb")
unformatted <- readBin(to.read,integer(),n=10000)
close(to.read)
unformatted <- unformatted[c(-1,-2050)] #to remove padding
unformatted <- matrix(unformatted,ncol=64,byrow=T)
unformatted <- apply(unformatted,1:2,as.numeric)
In order to check that the general structure of the data is the same in the two files, I checked that zero and non-zero values were in the same position in each matrix (each value represents a grid square, zeros represent where there was sea) using;
as.logical(unformatted)-as.logical(formatted)
and an array of zeros was returned, indicating that it is just the values which differ between the two, and not the way I've shaped them.
To see how the values relate to each other, I tried plotting formatted vs unformatted values (note all zero values are removed)
As you can see they have some sort of relationship, so the inflation of the values is not random.
I am completely stumped as to why the unformatted data values are so inflated. Is there an error in the way I'm reading and interpreting the file? Is there some underlying way that Fortran writes unformatted data that alters the values?
The usual method that Fortran uses to write an unformatted file is:
A leading record marker, usually four bytes, with the length of the following record
The actual data
A trailing record marker, the same number of bytes as the leading record marker, with the same information (used for BACKSPACE)
The usual number of bytes in the record marker is four bytes, but eight bytes have also been sighted (e.g. very old versions of gfortran for 64-bit systems).
If you don't want to deal with these complications, just use stream access. On the Fortran side, open the file with
OPEN(unit=10,file="foo.dat",form="unformatted",access="stream")
This will give you a stream-oriented I/O model like C's binary streams.
Otherwise, you would have to look at your compiler's documentation to see how exactly unformatted I/O is implemented, and take care of the record markers from the R side. A word of caution here: Different compilers have different methods of dealing with very long records of more than 2^31 bytes, even if they have four-byte record markers.
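The record layout described above (leading marker, data, trailing marker) can be read by hand. The following Python sketch is only an illustration under stated assumptions: four-byte record markers, little-endian byte order, and a payload of four-byte reals. The question does not state the type of customvar, so the float decoding is an assumption, and the function names are hypothetical.

```python
import struct


def read_record(buf, offset=0, marker_size=4, byteorder="<"):
    """Read one sequential unformatted record from a bytes object.

    Returns (payload, next_offset). Raises ValueError if the leading and
    trailing record markers disagree.
    """
    fmt = byteorder + ("i" if marker_size == 4 else "q")
    (length,) = struct.unpack_from(fmt, buf, offset)   # leading marker
    start = offset + marker_size
    end = start + length
    payload = buf[start:end]                           # the actual data
    (trailer,) = struct.unpack_from(fmt, buf, end)     # trailing marker
    if trailer != length:
        raise ValueError("record markers do not match")
    return payload, end + marker_size


def payload_as_reals(payload, byteorder="<"):
    """Decode the payload as 4-byte reals (assumed type, see above)."""
    count = len(payload) // 4
    return list(struct.unpack(f"{byteorder}{count}f", payload))


# Build a small record in memory to show the layout: marker, data, marker.
data = struct.pack("<3f", 1.0, 2.5, 0.0)
sample = struct.pack("<i", len(data)) + data + struct.pack("<i", len(data))
values = payload_as_reals(read_record(sample)[0])
```

Checking that the two markers agree is a cheap way to detect a wrong marker size or byte order before trusting the decoded values.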
Following on from the comments of @Stibu and @IanH, I experimented with the R code and found that the source of the error was incorrect handling of the byte size in R. Explicitly specifying a byte size of 4, i.e.
unformatted <- readBin(to.read,integer(),size=4,n=10000)
allows the data to be perfectly read in.

PHPExcel get display value of unknown type

It is possible to use getValue(), getCalculatedValue() and getOldCalculatedValue() to retrieve the value of a cell in phpexcel.
Is there a way to determine programmatically the content type of the cell and apply the corresponding correct method? I need to use this in a general way, i.e. to display the same value as when opening Excel.
I know there is something called getDataType(), but I'm not sure how to apply it in this case (it is not in the documentation). In my experience, sometimes only one of these three retrieves the correct value.
(For example, sometimes getOldCalculatedValue() works for a formula but getCalculatedValue() does not; other times only getValue() works, etc.)
getOldCalculatedValue() is used to retrieve the result of a previous calculation in MS Excel itself; and should not be relied on, because it is possible to disable autocalculate in MS Excel, which can leave this field empty, or even with an incorrect value. It is used within PHPExcel as a "fallback" for cell formulae that are reliant on external spreadsheet data, but it still shouldn't be trusted as an absolute.
getValue() returns the "raw" value of the cell. The returned value may require "interpretation". A cell containing a date and/or time is simply a float value in MS Excel, so it will return that float (e.g. 42017.7916666667 instead of a human-readable date/time like 13-Jan-2015 19:00);
and it will return the actual formula if a cell contains a formula (e.g. =TODAY()); or 0.8 for a value that might be formatted as a percentage and that appears as 80% in MS Excel itself.
getCalculatedValue() will attempt to execute a formula calculation if a cell contains a formula, and return the result of that calculation. If the cell doesn't contain a formula, then it will return the "raw" value, in the same way as getValue(). While PHPExcel has a fairly good calculation engine, it isn't perfect (it can't handle 3d cell ranges or array formulae for example), so it is possible for some formulae to fail. Likewise, formulae containing references to external resources may also fail, and while PHPExcel will attempt to use the getOldCalculatedValue() in that circumstance, it isn't (as mentioned above) guaranteed to maintain the correct result.
getFormattedValue() will execute getCalculatedValue(), and then apply any number formatting mask that applies to that cell against the result, so that (for example) a float with a date mask will be displayed as a date.
However, if you've loaded a spreadsheet file with readDataOnly(true), then that tells PHPExcel not to load any formatting, including number format masks, so it will not be able to format the result.
The closest result to the values displayed in MS Excel itself will be getFormattedValue().
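The "interpretation" that a raw getValue() result may need can be illustrated outside PHPExcel. The Python sketch below is not PHPExcel code and its function names are hypothetical; it assumes Excel's 1900 date system, where serial values count days from 30 December 1899, and it rounds to the nearest second to absorb floating-point noise.

```python
from datetime import datetime, timedelta

# Day zero of Excel's 1900 date system (valid for dates from March 1900 on).
EXCEL_EPOCH = datetime(1899, 12, 30)


def excel_serial_to_datetime(serial):
    """Convert a raw Excel date/time float into a datetime."""
    whole_days = int(serial)
    seconds = round((serial - whole_days) * 86400)
    return EXCEL_EPOCH + timedelta(days=whole_days, seconds=seconds)


def as_percentage(value):
    """Format a raw fraction such as 0.8 the way a percentage mask shows it."""
    return f"{value:.0%}"


when = excel_serial_to_datetime(42017.7916666667)  # 13 January 2015, 19:00
shown = as_percentage(0.8)                         # "80%"
```

This is roughly the work a number format mask does on top of the raw value, which is why loading with readDataOnly(true) leaves only the floats.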

Active Cells of Excel In RExcel

Cells in MS Excel are always active: formulas update automatically when any value is modified. In RExcel, I put data into an R array/data frame, use it in a formula, and get the output.
When I change any data, I have to repeat all the steps to get the updated result. I want this to happen automatically, as it does in Excel, without writing any macros. I could create an Excel macro to do it, but I don't want to.
Alternatively, how can I keep the data in active cells in RExcel, so that R takes the current value of every variable each time the R commands run?
Can anyone tell me the solution?
RApply should do what you want.
