I am trying to extract some sheets into a new workbook in PHPExcel as follows (see https://stackoverflow.com/a/10587576/813801 for reference).
$newObjPHPExcel = new PHPExcel();
$newObjPHPExcel->removeSheetByIndex(0); //remove first default sheet
foreach ($sheets as $sheetIndex) {
    echo "trying-$sheetIndex\n";
    $workSheet = $objPHPExcel->getSheet($sheetIndex);
    echo "done-$sheetIndex\n";
    $newObjPHPExcel->addExternalSheet($workSheet);
}
($sheets is an array of indexes that are within the bounds of the workbook; I checked with listWorksheetInfo.)
If I comment out the last line ($newObjPHPExcel->addExternalSheet($workSheet);), the getSheet method works fine. Otherwise, I get this error:
Fatal error: Uncaught exception 'PHPExcel_Exception' with message 'Your requested sheet index: 2 is out of bounds. The actual number of sheets is 1.' in /Xls/PHPExcel/PHPExcel.php:577
Why should newObjPHPExcel interfere with objPHPExcel?
UPDATE:
I found a workaround that seems to work, though I'm not sure why the other version did not.
$newObjPHPExcel = new PHPExcel();
$newObjPHPExcel->removeSheetByIndex(0); //remove first default sheet
$workSheet = array(); // first collect the sheet objects
foreach ($sheets as $sheetIndex) {
    echo "trying-$sheetIndex\n";
    $workSheet[] = $objPHPExcel->getSheet($sheetIndex);
    echo "done-$sheetIndex\n";
}
// then move them into the new workbook
foreach ($workSheet as $obj) {
    $newObjPHPExcel->addExternalSheet($obj);
}
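The addExternalSheet() method actually removes the worksheet from the old workbook and moves it to the new one. Your loop, however, still asks the old workbook for sheets by their original indexes. Once earlier sheets have been moved out, the remaining sheets shift down, so a later index can point at the wrong sheet or, as in your error, past the end of the old workbook's sheet collection.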
In your second "workaround" code, you don't remove any sheets from the old workbook until the first loop has finished: that loop simply builds an array of references to the sheets. The second loop then iterates over that array, which doesn't care that the sheets are being moved from one workbook to another; they all still exist, so there is no error.
An alternative might be to use the worksheet iterator against the old workbook, which should update cleanly as the sheets are removed.
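To see why the index-based loop fails while the workaround succeeds, here is a small Java analogy (not PHPExcel code). A hypothetical moveTo() helper stands in for addExternalSheet(): it removes an item from the source list, so the remaining items shift down, and appends it to the target, which is the behaviour the answer describes. Looking items up by index while moving them picks the wrong item at index 1 and then fails at index 2 when only one item is left, mirroring the "index: 2 is out of bounds. The actual number of sheets is 1." error; collecting the references first avoids both problems.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MoveSheetsDemo {
    // Hypothetical stand-in for addExternalSheet(): the item leaves the source list
    // (the remaining items shift down) and is appended to the target list.
    static void moveTo(List<String> source, List<String> target, String sheet) {
        source.remove(sheet);
        target.add(sheet);
    }

    // Mirrors the original loop: look up by index, then move immediately.
    static List<String> moveByIndexInLoop(List<String> source, int[] indexes) {
        List<String> target = new ArrayList<>();
        for (int index : indexes) {
            String sheet = source.get(index); // indexes no longer match after a move
            moveTo(source, target, sheet);
        }
        return target;
    }

    // Mirrors the workaround: collect references first, then move them.
    static List<String> collectThenMove(List<String> source, int[] indexes) {
        List<String> selected = new ArrayList<>();
        for (int index : indexes) {
            selected.add(source.get(index)); // source is not modified in this loop
        }
        List<String> target = new ArrayList<>();
        for (String sheet : selected) {
            moveTo(source, target, sheet);
        }
        return target;
    }

    public static void main(String[] args) {
        int[] wanted = {0, 1, 2};
        System.out.println(collectThenMove(new ArrayList<>(Arrays.asList("S0", "S1", "S2")), wanted)); // [S0, S1, S2]
        try {
            moveByIndexInLoop(new ArrayList<>(Arrays.asList("S0", "S1", "S2")), wanted);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("index-based loop failed: " + e.getMessage());
        }
    }
}
```

In the failing version, index 0 moves S0, index 1 then finds S2 (not S1), and index 2 is requested from a list that now holds a single item.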
I have this data below in a .csv file:
Needed_values,TEMP,Desc
,022.3,NewYork
3,022.30,India
,027.0,Australia
,027.00,Russia
1,027.1,Austria
,027.10,Norway
,036.2,Hungary
,036.20,Lithunia
2,785.52,Nigeria
I saw in a StackOverflow question that you can remove the header using FILTER.
However, when I load this file in my Pig script and use FILTER to remove the header of my CSV, all the NULL values under Needed_values also get removed!
LOAD_DATA = LOAD 'DATA.csv' Using PigStorage(',') as
(
NEEDED_VALUES:chararray,
TEMP:chararray,
DESC:chararray
);
FILTER_HEADER = FILTER LOAD_DATA BY NEEDED_VALUES != 'Needed_values';
ACTUAL OUTPUT:
(3,022.30,India)
(1,027.1,Austria)
(1,027.1,Austria)
I'm expecting the output to include everything except the header row (Needed_values,TEMP,Desc):
,022.3,NewYork
3,022.30,India
,027.0,Australia
,027.00,Russia
1,027.1,Austria
,027.10,Norway
,036.2,Hungary
,036.20,Lithunia
2,785.52,Nigeria
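In Pig, comparing a null with != gives null rather than true, and FILTER keeps only the rows for which the condition is true, so the rows with an empty Needed_values are dropped along with the header. Change the filter to: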
FILTER_HEADER = FILTER LOAD_DATA BY NEEDED_VALUES != 'Needed_values' OR NEEDED_VALUES IS NULL;
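The null behaviour can be modelled outside Pig. The Java sketch below is only a model, not Pig code: it assumes SQL-style three-valued logic (a comparison involving null yields null; OR is true if either side is true) and assumes the empty first fields load as null, as the answer implies. FILTER is modelled as keeping a row only when its condition is exactly true.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PigNullFilterDemo {
    // Model of Pig's != : any comparison involving null yields null (unknown).
    static Boolean notEquals(String value, String literal) {
        if (value == null) {
            return null;
        }
        return !value.equals(literal);
    }

    // Three-valued OR: true wins; otherwise an unknown operand makes the result unknown.
    static Boolean or(Boolean a, Boolean b) {
        if (Boolean.TRUE.equals(a) || Boolean.TRUE.equals(b)) {
            return true;
        }
        if (a == null || b == null) {
            return null;
        }
        return false;
    }

    // FILTER model: a row survives only when the condition is exactly TRUE.
    static List<String> keep(List<String> neededValues, boolean includeNullCheck) {
        List<String> kept = new ArrayList<>();
        for (String v : neededValues) {
            Boolean cond = notEquals(v, "Needed_values");
            if (includeNullCheck) {
                cond = or(cond, v == null); // IS NULL itself is never unknown
            }
            if (Boolean.TRUE.equals(cond)) {
                kept.add(v);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // First column of the sample file, with empty fields modelled as null (assumption).
        List<String> column = Arrays.asList("Needed_values", null, "3", null, null, "1", null, null, null, "2");
        System.out.println(keep(column, false)); // [3, 1, 2]
        System.out.println(keep(column, true));  // [null, 3, null, null, 1, null, null, null, 2]
    }
}
```

With the original condition only the three non-null values survive; adding the IS NULL branch keeps all nine data rows and still drops the header.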
I have an Excel file that contains the filenames of all the images. The paths of these images are stored in an ObservableList of File objects, listed from the folder that contains all of the images. My goal is to create a hyperlink for each of these filenames by matching it against this pool of image files.
How can I iterate faster through a large collection of File objects to get their paths?
For example:
Image name from Excel :
ABC_0001
The Full path from the collection must be:
C:\Users\admin\Desktop\Images\ABC_0001.jpg
To get their full paths, I iterate through the collection with a Stream.
My procedure:
1. Extract the data using Apache POI.
2. Stream through the image collection, converting each file name to its base name and comparing it with the extracted data.
3. Get the result and store the full path on the object via getAbsolutePath().
Code:
//storage during iteration
ObservableList<DetailedData> dataCollection = FXCollections.observableArrayList();
//Image collection containing over 13k Images listed via commons-io
ObservableList<File> IMAGE_COLLECTION = FXCollections.observableArrayList(FileUtils.listFiles(browsedFOLDER, new String[]{"JPG", "JPEG", "TIF", "TIFF", "jpg", "jpeg", "tif", "tiff"}, true));
//Sheet data
Sheet sheet1 = wb.getSheetAt(0);
for (Row row : sheet1)
{
    DetailedData data = new DetailedData();
    //extracted data from excel
    String FILENAME = row.getCell(0, Row.MissingCellPolicy.CREATE_NULL_AS_BLANK).getStringCellValue();
    //to be filled up based on stream result.
    String IMAGE_SOURCE = null;
    //stream code with the help of commons-io
    File IMAGE = IMAGE_COLLECTION.stream().filter(e -> FilenameUtils.getBaseName(e.getName()).toLowerCase().equals(FILENAME.toLowerCase())).findFirst().orElse(null);
    if (IMAGE != null)
        IMAGE_SOURCE = IMAGE.getAbsolutePath();
    data.setFileName(FILENAME);
    data.setFullPath(IMAGE_SOURCE);
    dataCollection.add(data);
}
Result:
Excel rows = 9,400
Image Files = 13,000
Iteration Time = 120,000ms
Are these results normal, or can this be made faster?
I tried using parallelStream() and it ran faster, but it used more CPU.
This code should speed things up a lot, but first I have a few questions about your code.
ObservableList<DetailedData> dataCollection = FXCollections.observableArrayList(): Why are you using ObservableList? And why is this a list of DetailedData and not File? DetailedData only stores a file name and a full path, and File already provides those through getName() and getAbsolutePath().
ObservableList<File> IMAGE_COLLECTION = FXCollections.observableArrayList(FileUtils.listFiles(browsedFOLDER, new String[]{"JPG", "JPEG", "TIF", "TIFF", "jpg", "jpeg", "tif", "tiff"}, true)); Why ObservableList?
These two are small things, but I am curious.
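So what I think you should do is use a Map. If you build a HashMap keyed by the lower-case base name once, each Excel row needs a single hash lookup (constant time on average) instead of a scan over up to 13,000 files, i.e. about 9,400 lookups instead of up to roughly 122 million filename comparisons. Your code should look something like the code below.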
//storage during iteration
List<DetailedData> dataCollection = new ArrayList<>();
//Image collection containing over 13k Images listed via commons-io
List<File> IMAGE_COLLECTION = new ArrayList<>(FileUtils.listFiles(browsedFOLDER, new String[]{"JPG", "JPEG", "TIF", "TIFF", "jpg", "jpeg", "tif", "tiff"}, true));
//Use this to map the lower-case base name (file name without extension) to the file
Map<String, File> map = new HashMap<>();
//Build the map once; putIfAbsent keeps the first match, like findFirst() did
IMAGE_COLLECTION.forEach(file -> map.putIfAbsent(FilenameUtils.getBaseName(file.getName()).toLowerCase(), file));
for (Row row : sheet1)
{
    //extracted data from excel
    String FILENAME = row.getCell(0, Row.MissingCellPolicy.CREATE_NULL_AS_BLANK).getStringCellValue();
    //One hash lookup instead of a scan over every image
    File IMAGE = map.get(FILENAME.toLowerCase());
    //If the map contains the file name, create a DetailedData object, set its data, and add it to the dataCollection list.
    if (IMAGE != null)
    {
        DetailedData data = new DetailedData();
        data.setFileName(FILENAME);
        data.setFullPath(IMAGE.getAbsolutePath());
        dataCollection.add(data);
    }
}
See the comments in the code.
I still believe this could be cleaned up a little more if you used List<File> dataCollection = new ArrayList<>();
If you really want to speed up your search, try not to repeat work that could be done just once. For example, you could use two loops: the first prepares your search and the second actually performs it. Inside your filter you call FilenameUtils.getBaseName and convert to lower case twice, for every image on every row. It would be better to do these things only once in the first loop and store the resulting Strings in a list. In the second loop you then do the search on this list.
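A minimal Java sketch of the two-loop idea, assuming plain path strings instead of File objects so it runs without commons-io; baseName(), prepare() and findPath() are hypothetical helpers, with baseName() standing in for FilenameUtils.getBaseName. The first loop does the base-name and lower-case work once per image; the second only compares ready-made keys.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class PrecomputedSearchDemo {
    // Hypothetical stand-in for FilenameUtils.getBaseName: drop directories and the last extension.
    static String baseName(String path) {
        String name = path.substring(Math.max(path.lastIndexOf('/'), path.lastIndexOf('\\')) + 1);
        int dot = name.lastIndexOf('.');
        return dot < 0 ? name : name.substring(0, dot);
    }

    // First loop: the expensive string work happens once per image file.
    static List<String> prepare(List<String> paths) {
        List<String> keys = new ArrayList<>(paths.size());
        for (String path : paths) {
            keys.add(baseName(path).toLowerCase(Locale.ROOT));
        }
        return keys;
    }

    // Second loop body: lower-case the Excel value once, then compare against the prepared keys.
    static String findPath(String excelName, List<String> keys, List<String> paths) {
        int i = keys.indexOf(excelName.toLowerCase(Locale.ROOT));
        return i < 0 ? null : paths.get(i); // keys and paths share the same positions
    }

    public static void main(String[] args) {
        List<String> paths = Arrays.asList(
                "C:\\Users\\admin\\Desktop\\Images\\ABC_0001.jpg",
                "C:\\Users\\admin\\Desktop\\Images\\ABC_0002.TIF");
        List<String> keys = prepare(paths);
        for (String excelName : Arrays.asList("ABC_0001", "abc_0002", "XYZ_9999")) {
            System.out.println(excelName + " -> " + findPath(excelName, keys, paths));
        }
    }
}
```

Note that keys.indexOf() still scans the list for each row; it just does far less work per comparison. Replacing the list with a HashMap, as in the other answer, removes the scan altogether.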
I am also wondering why you use ObservableLists here. A simple List would do as well.
I've tested another approach for this slow iteration.
It seemed that the cause was declaring the Stream repeatedly inside the for-each loop.
So I tried Baeldung's Supplier<Stream> solution and declared it outside the loop, together with parallelStream().
Sample Code:
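Supplier<Stream<File>> streamSupplier = () -> IMAGE_COLLECTION.parallelStream();
for (Row row : sheet)
{
    String FILENAME = row.getCell(0, Row.MissingCellPolicy.CREATE_NULL_AS_BLANK).getStringCellValue();
    String IMAGE_SOURCE = null;
    File IMAGE = streamSupplier.get().filter(e -> FilenameUtils.getBaseName(e.getName()).toLowerCase().equals(FILENAME.toLowerCase())).findFirst().orElse(null);
    if (IMAGE != null)
        IMAGE_SOURCE = IMAGE.getAbsolutePath();
    // store FILENAME and IMAGE_SOURCE on the DetailedData object as before
}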
The iteration time went down to 45,000 ms.
Please correct me if my approach was not right.
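On the Supplier approach: a Java Stream can be consumed by only one terminal operation, and Supplier<Stream<...>> is a way to get a fresh stream each time you need one. Each get() call therefore still builds a new stream per row, so the drop to 45,000 ms most likely comes from parallelStream() rather than from moving the declaration out of the loop. The sketch below, using a short hypothetical list of names, shows both points.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;
import java.util.stream.Stream;

public class StreamSupplierDemo {
    // Returns true if running a second terminal operation on the same Stream fails.
    static boolean reusingStreamThrows(List<String> names) {
        Stream<String> once = names.stream();
        once.anyMatch("abc_0002"::equals); // first terminal operation consumes the stream
        try {
            once.anyMatch("abc_0003"::equals); // second terminal operation on the same object
            return false;
        } catch (IllegalStateException e) {
            return true; // the stream was already consumed
        }
    }

    // Counts the wanted names that are present; supplier.get() builds a brand-new stream per lookup.
    static long countFound(Supplier<Stream<String>> supplier, List<String> wanted) {
        long found = 0;
        for (String name : wanted) {
            if (supplier.get().anyMatch(name::equals)) {
                found++;
            }
        }
        return found;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("abc_0001", "abc_0002", "abc_0003");
        System.out.println(reusingStreamThrows(names)); // true
        Supplier<Stream<String>> supplier = names::stream; // stores how to build a stream, not a stream
        System.out.println(countFound(supplier, Arrays.asList("abc_0002", "abc_0003", "xyz"))); // 2
    }
}
```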
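I create a sheet:
$xls = new PHPExcel();
$xls->addSheet(new PHPExcel_Worksheet($xls, 'my sheet')); // addSheet() expects a worksheet object, not a title
I create an object to read a .csv file:
$objReader = PHPExcel_IOFactory::createReader('CSV'); // createReader() takes a reader type, not a filename
$objPHPExcel = $objReader->load($myFileName);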
Is there a short way to place all the rows from the csv into the current worksheet?
The worksheet object has fromArray() and toArray() methods, so why not use those?
$xls->getActiveSheet()
->fromArray(
$objPHPExcel->getActiveSheet()->toArray()
);
Otherwise, you can iterate over $objPHPExcel->getActiveSheet(), reading a cell or a row at a time, and insert each one into $xls->getActiveSheet().
I am trying to make a small program to apply autocorrect changes to an existing document. I am using the DocX library. My question is: how do you iterate (or loop) through each word in the document, using the DocX library, to check whether it needs to be corrected? (I have already inserted all autocorrect entries in a List<T>.)
Try this:
DocX document = DocX.Load( <document path> );
foreach (Novacode.Paragraph item in document.Paragraphs) {
    // use this if you need the whole text of a paragraph
    string paraText = item.Text;
    // use this if you need the text piece by piece: each MagicText entry is a run
    // of identically formatted text, which may contain more than one word
    foreach (var data in item.MagicText) {
        string text = data.text;
    }
}
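Because a paragraph's text (or a MagicText run) can hold several words, you still need to split it into words before checking them. The Java sketch below shows only that step and is not DocX code: it assumes the autocorrect entries are held in a Map from the misspelled word to its correction (the question keeps them in a list), and it leaves punctuation and spacing untouched. Writing the corrected text back into the document is not shown.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AutocorrectDemo {
    // A "word" is a run of letters, digits or apostrophes; everything else is copied unchanged.
    private static final Pattern WORD = Pattern.compile("[\\p{L}\\p{N}']+");

    // Replaces each word that has an autocorrect entry and keeps all other text as-is.
    static String autocorrect(String paragraphText, Map<String, String> entries) {
        Matcher m = WORD.matcher(paragraphText);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String word = m.group();
            String fixed = entries.getOrDefault(word, word);
            m.appendReplacement(out, Matcher.quoteReplacement(fixed)); // quote so '$' or '\' are literal
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> entries = new HashMap<>();
        entries.put("teh", "the");
        entries.put("recieve", "receive");
        System.out.println(autocorrect("I recieve teh file, teh end.", entries)); // I receive the file, the end.
    }
}
```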
I am looking for help with two parts of my iMacros script...
Part1 - Variable
I click on the following link on a page in order to reach the page I need to extract from.
1st Link
TAG POS=8 TYPE=A FORM=NAME:xxyy ATTR=HREF:https://aaa.aaaa.com/en/administration/xxxx.jsp?reqID=h*
2nd Link
TAG POS=9 TYPE=A FORM=NAME:xxyy ATTR=HREF:https://aaa.aaaa.com/en/administration/xxxx.jsp?reqID=h*
The TAG POS value is the variable. How can I make it so that, when the macro runs in a loop, it selects the next link on the page each time (i.e. choose 8, 9, 10)? Some pages have 100-plus links to click.
Part 2 - Save CSV file
I have the SAVEAS line in my file, but how can I make it so that only one CSV file is created (even if the macro is run 50 times)? Also, is there a way to format the CSV file from iMacros so that each new run starts on a new row? (Currently, all data is extracted to row 1 across many columns.)
Thank you in advance,
Adam
This will do what you asked: it loops the macro and sets the new position number in the macro on each pass.
1)
var macro;
macro = "CODE:";
macro += "TAG POS={{number}} TYPE=A FORM=NAME:xxyy ATTR=HREF:https://aaa.aaaa.com/en/administration/xxxx.jsp?reqID=h*" + "\n";
// adjust the start (1) and end (99) positions to match the links on your page
for (var i = 1; i < 100; i++)
{
    iimSet("number", i);
    iimPlay(macro);
}
For part two you will need JavaScript scripting. The first part declares the macro, the second part plays the macro, and the third part is the function that saves the extracted text into a file. Each run appends the text on a new line.
2)
var macroExtractSomething;
macroExtractSomething = "CODE:";
macroExtractSomething += "TAG POS=1 TYPE=DIV ATTR=CLASS:some_class_of_some_div EXTRACT=TXT" + "\n";
iimPlay(macroExtractSomething);
var extracted_text = iimGetLastExtract();
WriteFile("C:\\some_folder\\some_file.csv", extracted_text);
//This function writes string into a file. It will also create file on that location
function WriteFile(path,string)
{
//import FileUtils.jsm
Components.utils.import("resource://gre/modules/FileUtils.jsm");
//declare file
var file = new FileUtils.File(path);
//declare file path
file.initWithPath(path);
//if it exists move on if not create it
if (!file.exists())
{
file.create(file.NORMAL_FILE_TYPE, 0666);
}
var charset = 'EUC-JP';
var fileStream = Components.classes['#mozilla.org/network/file-output-stream;1']
.createInstance(Components.interfaces.nsIFileOutputStream);
fileStream.init(file, 18, 0x200, false);
var converterStream = Components
.classes['#mozilla.org/intl/converter-output-stream;1']
.createInstance(Components.interfaces.nsIConverterOutputStream);
converterStream.init(fileStream, charset, string.length,
Components.interfaces.nsIConverterInputStream.DEFAULT_REPLACEMENT_CHARACTER);
//write file to location
converterStream.writeString("\r\n"+string);
converterStream.close();
fileStream.close();
}
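The key to getting a single CSV file with one row per run is to open the same file in append mode every time, creating it only if it does not exist yet; WriteFile() does this through the 0x10 (append) bit in its flags value of 18. For comparison, here is the same idea as a self-contained Java sketch; appendRow() and csvField() are hypothetical helper names, and a temporary file stands in for the real CSV path. One design difference: this sketch ends each row with \r\n, whereas WriteFile() writes \r\n before each string, which leaves an empty first line in a new file.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;
import java.util.List;

public class AppendCsvRow {
    // Quotes a field that contains a comma, quote or line break, doubling inner quotes (RFC 4180 style).
    static String csvField(String value) {
        if (value.contains(",") || value.contains("\"") || value.contains("\n") || value.contains("\r")) {
            return "\"" + value.replace("\"", "\"\"") + "\"";
        }
        return value;
    }

    // Writes one CSV row per call: CREATE makes the file on the first run, APPEND adds to it afterwards.
    static void appendRow(Path csv, List<String> fields) throws IOException {
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) {
                line.append(',');
            }
            line.append(csvField(fields.get(i)));
        }
        line.append("\r\n"); // end the row so the next run starts on a new line
        Files.write(csv, line.toString().getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    public static void main(String[] args) throws IOException {
        Path csv = Files.createTempFile("extract", ".csv"); // temporary stand-in for the real CSV path
        appendRow(csv, Arrays.asList("run 1", "value, with comma"));
        appendRow(csv, Arrays.asList("run 2", "plain value"));
        System.out.print(new String(Files.readAllBytes(csv), StandardCharsets.UTF_8));
    }
}
```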