PHPExcel: Opening a file takes a long time

I'm using PHPExcel to read through Excel spreadsheets of various sizes and then import the cell data into a database. Reading through the spreadsheet itself works great and is very quick, but I've noticed that the time to actually load/open the file for PHPExcel to use can take up to 10-20 seconds (the larger the file, the longer it takes, especially if the spreadsheet is >1 MB in size).
This is the code I'm using to load the file before iterating through it:
$filetype = PHPExcel_IOFactory::identify($file);
$objReader = PHPExcel_IOFactory::createReader($filetype);
$objReader->setReadDataOnly(true);
$objPHPExcel = $objReader->load($file);
What can I do to get the file to load faster? It's frustrating that the greatest latency in importing the data is just in opening up the file initially.
Thank you!

I've seen this same behavior with Ruby and an Excel library: a non-trivial amount of time to open a large file, where large is > 500KB.
I think the cause is two things:
1) an xlsx file is zip compressed, so it must first be un-compressed
2) an xlsx file is a series of XML files, which all must be parsed.
#1 can be a small hit, but most likely it pales in comparison to #2. I believe it's the XML parsing that is the real culprit. In addition, the XML parser is a DOM-based parser, so the whole XML DOM must be parsed and loaded into memory.
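To see what the reader is dealing with, you can open an .xlsx file with PHP's built-in ZipArchive and list the XML parts it contains. This is only an illustrative sketch (the file name is a placeholder); it inspects the archive without parsing anything:
<?php
// Sketch: an .xlsx file is just a zip archive full of XML parts.
$file = 'large-spreadsheet.xlsx'; // placeholder path

$zip = new ZipArchive();
if ($zip->open($file) !== true) {
    die("Could not open $file as a zip archive\n");
}

for ($i = 0; $i < $zip->numFiles; $i++) {
    $stat = $zip->statIndex($i);
    // Each entry (worksheets, shared strings, styles, ...) must be
    // unzipped and then parsed as XML before the workbook is usable.
    printf("%-45s %10d bytes uncompressed\n", $stat['name'], $stat['size']);
}

$zip->close();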
I don't think there is really anything you can do to speed this up. A large xlsx file contains a lot of XML which must be parsed and loaded into memory.

Actually, there is something you can do. The problem with most XML parsers is that they first load the entire document into memory. For big documents, this takes a considerable amount of time.
A way to avoid this is to use a parser that supports streaming. Instead of loading the entire content of the XML files into memory, you only load the part you need. That way, you can have pretty much only one row at a time in memory. This is super fast AND memory efficient.
If you are curious, you can find an example of a library using this technique here: https://github.com/box/spout
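For example, here is a minimal sketch of reading a large XLSX row by row with Spout. It assumes the v2-style ReaderFactory API (later versions renamed these classes), so treat the exact class and method names as approximate:
<?php
require 'vendor/autoload.php';

use Box\Spout\Reader\ReaderFactory;
use Box\Spout\Common\Type;

$reader = ReaderFactory::create(Type::XLSX);
$reader->open($file); // $file is the spreadsheet path, as in the question

foreach ($reader->getSheetIterator() as $sheet) {
    foreach ($sheet->getRowIterator() as $row) {
        // $row is an array of cell values; only this row is in memory,
        // so this is the place to insert the data into your database.
    }
}

$reader->close();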

Related

ConvertAPI - PDF Limitation and Load

We are thinking of using the ConvertAPI component to handle PDF conversion in our application.
But we are still unclear about its limits on PDF generation and on load handling.
How much load will it support for PDF conversion? (e.g. if we send 100 requests at a time, will it work without crashing?)
What is the file size limit for PDF conversion? (e.g. if I send a document of around 800 MB to 1024 MB, will it be able to handle the conversion?)
100 simultaneous file uploads is inefficient. The best is to use ~4 concurrent uploads (although it also depends highly on the situation). If you are really planning to convert 100 x 1 GB files simultaneously, please consult support.
The hard limit is 1 GB for files that are processed. The rest depends on file complexity and the conversion.
The best would be to register and try it for free with your files.
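As a rough illustration of the '~4 at a time' advice above, the sketch below simply processes files in batches of four instead of firing 100 requests at once. convertToPdf() is a hypothetical stand-in for whatever conversion call you actually use; it is not part of the ConvertAPI SDK:
<?php
// Hypothetical stand-in for the real conversion call (SDK or HTTP request).
function convertToPdf(string $path): void
{
    // ... call the conversion service for $path here ...
    echo "converted $path\n";
}

$files = glob('/path/to/docs/*'); // placeholder folder of source documents
$batchSize = 4;                   // keep the load at roughly 4 jobs at a time

foreach (array_chunk($files, $batchSize) as $batch) {
    foreach ($batch as $file) {
        convertToPdf($file);
    }
    // In a real implementation each batch could run concurrently
    // (curl_multi_*, a job queue, the SDK's async support, ...);
    // the point is to cap it at ~4 rather than 100 simultaneous requests.
}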

What is the size limit for .r extension files?

What is the maximum file size limit for a .r extension file now?
I read that it has a 5 MB limit; is that still the case? How does that change? Will it differ from OS to OS, or from one R version to another?
Reference: RStudio maximum file size reached
I'm very new to R, can someone please help me?
Thanks
There is no documented limit on the maximum size of R code files. In fact, R will be able to deal with anything that’s even remotely reasonable. But for what it’s worth, a 5 MiB source code file is not reasonable. If you actually have such files, I strongly suggest removing the large data that’s declared inside them and moving it to a proper data file instead: separate your code and data. Actual code will never be this big.
As for editing such a file, different code editors have different limits on the size of file they handle well. Again, having such a big code file is plainly unreasonable, so not many code editors bother catering to this use case; even though few editors have a hard-coded limit, interactively editing such a large file may not work.

Update a single sheet in a workbook

I like using Excel as a poor man's database for storing data dictionaries and such, because Excel makes it super easy to edit the data in there without the pain of installing an RDBMS.
Now I hit an unexpected problem. I can't find a simple way to rewrite just one of the worksheets, at least not without reading and writing the whole file.
write.xlsx(df, file = "./codebook.xlsx", sheetName = "mysheet", overwrite = F)
This complains that the file exists. With overwrite = T, my existing sheets are lost.

Read CSV - Memory Issue

Easy question. I need to read a CSV file in .NET, and for that I'm using the Lumenworks CSV library.
The problem is that it seems this solution reads the entire CSV content into memory. I was wondering if there's another option that would let me run through the CSV content one element at a time, and therefore, consume less memory.
Something like XmlDocument vs. XmlReader.
Thanks
You can use the StreamReader class to load the file line by line and perform operations like searching and matching as you go, using the StreamReader.ReadLine method. The ReadLine documentation contains a sample showing how. This takes very little time.
Store the position or line number after each operation, then in the next operation use the Stream.Seek method to resume loading from the stored position.

Parse a large JSON file in ActionScript 3

I need to parse a large trace file (up to 200-300 MB) in a Flex application. I started using JSON instead of XML hoping to avoid these problems, but it did not help much. When the file is bigger than 50 MB, the JSON decoder can't handle it (I am using as3corelib).
I have done some research and found some options:
Try to split the file: I would really like to avoid this; I don't want to change the current format of the trace files and, in addition, it would be very uncomfortable to handle.
Use a database: I was thinking of writing the trace into a SQLite database and then reading from there, but that would force me to modify the program that creates the trace file.
From your experience, what do you think of these options? Are there better options?
The program that writes the trace file is in C++.
Using AMF will give you much smaller data sizes for transfer because it is a binary format, not a text format. That is the best option. But you'll need some middleware to translate the C++ program's output into AMF data.
Check out James Ward's census application for more information about benchmarks when sharing data:
http://www.jamesward.com/census/
http://www.jamesward.com/2009/06/17/blazing-fast-data-transfer-in-flex/
Maybe you could parse the file in chunks, without splitting the file itself. That would require some work on the as3corelib JSON parser, but it should be doable, I think.
I found this library which is a lot faster than the official one: https://github.com/mherkender/actionjson
I am using it now and it works perfectly. It also has an asynchronous decoder and encoder.
