I am using PHPExcel library to read spread sheet data. This library is giving me trouble reading the 'DATE' values.
Question:
How to instruct the PHPExcel to read the 'DATE' values properly even if 'setReadDataOnly' is set as 'false'?
Description:
PHPExcel version: 2.1, Environment: Ubuntu 12
Here is the code block:
$objReader = \PhpExcel\PHPExcel_IOFactory::createReaderForFile($spreadSheetFullFilePathString);
$objReader->setReadDataOnly(true);
$objPHPExcel = $objReader->load($spreadSheetFullFilePathString);
$objWorksheetObject = $objPHPExcel->getActiveSheet();
Am getting 'integer' values for the 'DATE' column values by default.
I found that, the line '$objReader->setReadDataOnly(true)' is causing the trouble.
So, Changed it as '$objReader->setReadDataOnly(false)'.
I am getting the Date column value properly after this change. But now the reader is reading ALL the ROWS + COLUMNS found in the excel.
Any help would be greatly appreciated.
If $objReader->setReadDataOnly(true);, then the only reader that can identify dates is the Gunumeric Reader. A date value can only be identified by the number format mask, and setReadDataOnly(true) tells the Reader not to read the formatting, so dates cannot subsequently be identified (Gnumeric is the exception, because it was written after the problem was identified, but none of the other Readers have yet been modified to read this information regardless of the setReadDataOnly value. If you need to identify dates, then the only option at this point is $objReader->setReadDataOnly(false);
However, for date cells, you shouldn't get an integer value; you should get a float: it's the decimal that identifies the time part of the Excel datetime serialized value.
If you know which cells contain dates, then you can convert them to unix timestamps or PHPExcel DateTime objects using the helper functions defined in the PHPExcel_Shared_DateTime class (and can then use all the standard PHP date handling functions).
You say that the reader is now reading ALL the ROWS + COLUMNS found in the excel: it should, irrespective of the setReadDataOnly value. If you don't want to read all the rows and columns, then you need to set a Read Filter.
Related
I have some very large CSV files (~183mio. rows by 8 columns) that I want to load into a database using R. I use duckdb for this and it its built-in function duckdb_read_csv, which is supposed to auto-detect datatypes for each column. If I enter the following code:
con = dbConnect(duckdb::duckdb(), dbdir="testdata.duckdb", read_only = FALSE)
duckdb_read_csv(con, "d15072021","mydata.csv",
header = TRUE)
It produces this error:
Error: rapi_execute: Failed to run query
Error: Invalid Input Error: Could not convert string '2' to BOOL between line 12492801 and 12493825 in column 9. Parser options: DELIMITER=',', QUOTE='"', ESCAPE='"' (default), HEADER=1, SAMPLE_SIZE=10240, IGNORE_ERRORS=0, ALL_VARCHAR=0
I've looked at the rows in question and I can't find any irregularities in column 9. Unfortunately, I cannot post the dataset because it's confidential. But the entire column is filled with either FALSE or TRUE.
If I set the parameter nrow.check to something larger than 12493825 it doesn't produce the same error but takes very long and simply converts the column to VARCHAR instead of a logical. Setting nrow.check to -1 (meaning it checks every row for a pattern) crashes R and my PC completely.
The weird thing: This isn't consistent. Earlier I imported the dataset whilst keeping the default value for nrow.check at 500 and it read the file with no issue (though still converting column 9 to VARCHAR). I have to read a lot of files that are the same pattern so I need to have a reliable way of reading them. Anyone know how duckdb_read_csv actually works and why I might get this error?
Note that reading the files into memory and then into a database isn't an option because I run out of memory instantly.
the way the sniffer works is by sampling nrow.check rows to figure out the data type, so the result can differ from runs if you get unlucky, increasing it will reduce the chances of failing it, mainly because the sniffer looks at more rows.
If increasing the number of rows is not possible due to performance issues, you can of course first define the schema of the CSV file. But then you must know the schema beforehand.
As an example of how you can define the schema and turn off the sniffer:
select * from
SELECT * FROM read_csv('test.csv', COLUMNS=STRUCT_PACK(a := 'INTEGER', b := 'INTEGER'), auto_detect='false')
I am working with an S3 bucket and one of the fields is a datetime group and the values appear to be 128-bit integers. An example: 45376204963002810833065984. Has anyone seen this before? If so, how should this be parsed? This should work out to a date in 2022.
This is giving me problems when I try to get the data into a pandas (or polars) dataframe. The data outputs as a JSON string, and then when I try to put it into a pandas dataframe, I get an error: Value too big!. So I'm hoping to use SQL to cast this integer as a datetime before outputting as a JSON file.
I should read data from more than 4 different excel file with different cell formating but same data within, so how i can change the cell format then read the data using phpexcel?
If you're storing a numeric value that's longer than a 32-bit signed integer can handle (such as 435546567567345) then treat it as a string using
$objPHPExcel->getActiveSheet()
->setCellValueExplicit(
'A1',
'435546567567345',
PHPExcel_Cell_DataType::TYPE_STRING
)
EDIT
If you're reading this value from an Excel worksheet, and it is actually a number value rather than a string containing a numeric value, then it is likely being treated as a float by MS Excel, so there may well be some loss of precision already (unless the file was created using a 64-bit version of MS Excel), even before PHPExcel reads it. If it is a number created using a 64-bit version of MS Excel, then you'll need a 64-bit version of PHP to read it without loss of precision.
Try reading the raw, unformatted value using getValue() and then doing a var_dump() to see what datatype it actually is; or try using getDataType() to see what the value was being stored as in the Excel file
I've am trying to format cells in an Excel document I create with PHPExcel using the setFormatCode method. I've been going through the list of 'FORMAT_DATE_xxxxxxx' options associated with this method, but I've been unable to find one that lists the month, date, and year in that order. Did I miss something here?
You missed FORMAT_DATE_XLSX14 and FORMAT_DATE_XLSX22
You also missed the fact that these are simply predefined string values, and that you can use any valid Excel numberformat string value that you set yourself.
$objPHPExcel->getActiveSheet()->getStyle('C9')
->getNumberFormat()
->setFormatCode(
'mm-dd-yyyy' // my own personal preferred format that isn't predefined
);
In PhpExcel library when i am assigning values to IW4 the assigned value not generatted there
Steps:
We are using The code to generate the Value to cell in PHPExcel
**$objPHPExcel->getActiveSheet()->setCellValue('A1', 'cell value here');**
When i am using it to generate value to IW4 cell the value not getting generatted
**$objPHPExcel->getActiveSheet()->setCellValue('IW4', 'cell value here');**
Please Help me to find the solution
BIFF format Excel files only allow 256 columns (up to IV), OfficeOpenXML allows more.
If you set a value in a column beyond the limit, PHPExcel only knows it's invalid at the point where you save (when it knows whether you're saving as an Excel5 or Excel2007 file), Rather than trigger an exception at that point (which would be much more frustrating if it was a long running script), it silently discards the invalid columns or rows.
This is similar behaviour to Excel itself, if you open an xlsx file in an earlier version of Excel that doesn't support as many rows and columns.