How do I take a .csv file and add it as a worksheet to a PHPExcel object? - phpexcel

I create a workbook and add a sheet:
$xls = new PHPExcel();
$xls->addSheet(new PHPExcel_Worksheet($xls, 'my sheet'));
Then I create a reader object for the .csv file:
$objReader = PHPExcel_IOFactory::createReader('CSV');
$objPHPExcel = $objReader->load($myFileName);
Is there a short way to place all the rows from the CSV into the current worksheet?

The worksheet object has fromArray() and toArray() methods, so why not use those?
$xls->getActiveSheet()
->fromArray(
$objPHPExcel->getActiveSheet()->toArray()
);
Otherwise, you can iterate over $objPHPExcel->getActiveSheet(), reading a cell or a row at a time, and insert the values into $xls->getActiveSheet().

Related

Iterate faster over a large collection of files (objects) inside an Observable List (JavaFX 8)

I have an Excel file that contains the file names of a set of images. The images themselves come from a folder, and their paths are stored in an ObservableList<File>. My goal is to create a hyperlink for each file name by matching it against this collection of image files.
How can I iterate faster through a large collection of File objects to look up their paths?
For example:
Image name from Excel:
ABC_0001
The full path from the collection must be:
C:\Users\admin\Desktop\Images\ABC_0001.jpg
To get the full path, I iterate over the collection with a Stream.
My procedure:
1. Extract the data using Apache POI.
2. Stream through the image collection, comparing each file's base name with the extracted value.
3. Take the matching file and store its full path on the object via getAbsolutePath().
Code:
//storage during iteration
ObservableList<DetailedData> dataCollection = FXCollections.observableArrayList();
//Image collection containing over 13k Images listed via commons-io
ObservableList<File> IMAGE_COLLECTION = FXCollections.observableArrayList(FileUtils.listFiles(browsedFOLDER, new String[]{"JPG", "JPEG", "TIF", "TIFF", "jpg", "jpeg", "tif", "tiff"}, true));
//Sheet data
Sheet sheet1 = wb.getSheetAt(0);
for (Row row: sheet1)
{
DetailedData data = new DetailedData();
//extracted data from excel
String FILENAME = row.getCell(0,Row.MissingCellPolicy.CREATE_NULL_AS_BLANK).getStringCellValue();
//to be filled up based on stream result.
String IMAGE_SOURCE = null;
//stream code with the help of commons-io
File IMAGE = IMAGE_COLLECTION.stream().filter(e -> FilenameUtils.getBaseName(e.getName()).toLowerCase().equals(FILENAME.toLowerCase())).findFirst().orElse(null);
if (IMAGE != null)
IMAGE_SOURCE = IMAGE.getAbsolutePath();
data.setFileName(FILENAME);
data.setFullPath(IMAGE_SOURCE);
dataCollection.add(data);
}
Result:
Excel rows = 9,400
Image Files = 13,000
Iteration Time = 120,000ms
Is this timing normal, or can it be made faster?
I tried parallelStream(); it ran faster, but CPU usage was higher.
This code should speed things up a lot, but first I have a few questions about your code.
ObservableList<DetailedData> dataCollection = FXCollections.observableArrayList() Why are you using an ObservableList? And why is this a list of DetailedData rather than File? DetailedData has setFileName and setFullPath, but File already provides both the name and the path.
ObservableList<File> IMAGE_COLLECTION = FXCollections.observableArrayList(FileUtils.listFiles(browsedFOLDER, new String[]{"JPG", "JPEG", "TIF", "TIFF", "jpg", "jpeg", "tif", "tiff"}, true)); Why an ObservableList here?
These are two small things, but I am curious.
What I think you should do is use a Map. Your code should look something like the code below.
//storage during iteration
List<DetailedData> dataCollection = new ArrayList<>();
//Image collection containing over 13k Images listed via commons-io
List<File> IMAGE_COLLECTION = new ArrayList<>(FileUtils.listFiles(new File("C:\\Users\\blj0011\\Pictures"), new String[]{"JPG", "JPEG", "TIF", "TIFF", "jpg", "jpeg", "tif", "tiff"}, true));
//Use this to map the lower-cased base file name to its file
Map<String, File> map = new HashMap<>();
//Use this to add data to the map
IMAGE_COLLECTION.forEach((file) -> {map.put(file.getName().substring(0, file.getName().lastIndexOf(".")).toLowerCase(), file);});
for (Row row: sheet1)
{
//extracted data from excel
String FILENAME = row.getCell(0,Row.MissingCellPolicy.CREATE_NULL_AS_BLANK).getStringCellValue();
//If the map contains the file name, create `DetailedData` object. Then set data. Then add object to datacollection list.
if (map.containsKey(FILENAME.toLowerCase()))
{
DetailedData data = new DetailedData();
data.setFileName(FILENAME);
data.setFullPath(map.get(FILENAME.toLowerCase()).getAbsolutePath());
dataCollection.add(data);
}
}
See the comments in the code.
I still believe this could be cleaned up a little more if you used List<File> dataCollection = new ArrayList<>().
If you really want to speed up your search, avoid repeating work that could be done just once. For example, you could use two loops: the first to prepare the search and the second to perform it. Inside your filter you call FilenameUtils.getBaseName once and convert to lower case twice for every file, on every row. It would be better to do these things only once, in the first loop, and store the resulting Strings in a list. In the second loop you then search that list.
I am also wondering why you use ObservableLists here. A simple List would do as well.
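The "prepare once, then search" idea can be sketched as follows. This is a minimal sketch, not the asker's actual code: it uses only java.io.File and a HashMap (as the earlier answer suggested) instead of a list, and the names ImageIndex, baseName, buildIndex and fullPath are hypothetical helpers introduced for illustration.

```java
import java.io.File;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class ImageIndex {
    // Strips the extension: "ABC_0001.jpg" -> "ABC_0001".
    static String baseName(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot > 0 ? fileName.substring(0, dot) : fileName;
    }

    // First loop: normalise every file name exactly once and index the files by it.
    static Map<String, File> buildIndex(List<File> files) {
        Map<String, File> index = new HashMap<>();
        for (File file : files) {
            // putIfAbsent keeps the first file seen for a name, like findFirst() in the question.
            index.putIfAbsent(baseName(file.getName()).toLowerCase(Locale.ROOT), file);
        }
        return index;
    }

    // Second loop body: one hash lookup per Excel row; returns null when nothing matches.
    static String fullPath(Map<String, File> index, String excelName) {
        File match = index.get(excelName.toLowerCase(Locale.ROOT));
        return match == null ? null : match.getAbsolutePath();
    }

    public static void main(String[] args) {
        // Hypothetical sample files; they do not need to exist on disk.
        List<File> files = Arrays.asList(
            new File("Images", "ABC_0001.jpg"),
            new File("Images", "abc_0002.TIF"));
        Map<String, File> index = buildIndex(files);
        for (String name : new String[] {"ABC_0001", "ABC_0002", "ABC_9999"}) {
            System.out.println(name + " -> " + fullPath(index, name));
        }
    }
}
```

With this layout the base name and lower-case conversion are computed once per file rather than once per file per row, and each row costs a single map lookup.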
I've tested another approach to this slow iteration.
It seems that the cause is creating the Stream repeatedly inside the for-each loop.
I tried Baeldung's Supplier<Stream> solution, declaring the supplier outside the loop and using parallelStream().
Sample code:
Supplier<Stream<File>> streamSupplier = () -> imageCollection.parallelStream();
for (Row row : sheet)
{
File IMAGE = streamSupplier.get().filter(e -> FilenameUtils.getBaseName(e.getName()).toLowerCase().equals(FILENAME.toLowerCase())).findFirst().orElse(null);
if (IMAGE != null)
IMAGE_SOURCE = IMAGE.getAbsolutePath();
}
The run time dropped to 45,000 ms.
Please correct me if my approach is not right.

Save a data frame to a file addressing by name

I have a data frame and a text variable containing the name of this data frame:
adsl = data.frame(a=2, b=7, w=17)
ds_name = "adsl"
I want to save my data frame from the workspace to a file named "dest_file". The code should be wrapped in a function get_r() that takes the data frame name as an argument:
get_r(ds_name="adsl")
So I need to avoid using the explicit name "adsl" inside the code.
The following works almost correctly but the resulting data frame is called "temp_dataset", not "adsl":
get_r = function(ds_name){
temp_dataset = eval(parse(text=ds_name))
save(temp_dataset, file = "dest_file")
}
Here is another option, which does the wrong thing (the text string is saved, not the data frame):
get_r = function(ds_name){
save(ds_name, file = "dest_file")
}
What should I do to make R just execute
save(adsl, file="dest_file")
inside the function? Thank you for any help.
Try
save(list = ds_name, file = "dest_file")
The list argument in save() allows you to pass the name of the data as a character string. See help(save) for more.

PHPExcel listWorksheetNames returns Empty

The code below returns an empty sheets array for at least two different .xlsx files (from the same source), although for other .xls and .xlsx files it does return a filled array.
$inputFileType is valid ("EXCEL2007"), $path is valid, and there are sheets in the file. var_dump on $objReader displays an object.
Any ideas?
public function listSheets($path)
{
$inputFileType = PHPExcel_IOFactory::identify($path);
$objReader = PHPExcel_IOFactory::createReader($inputFileType);
$sheets = $objReader->listWorksheetNames($path);
return $sheets;
}
Update
The function listWorksheetInfo also fails for the same files for which listWorksheetNames fails.
The workbook XML differs between the files. These XML documents are the output of
$this->getFromZipArchive($zip, "{$rel['Target']}")
PHPExcel likes:
<workbook xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" mc:ignorable="x15" xmlns:x15="http://schemas.microsoft.com/office/spreadsheetml/2010/11/main"><fileversion appname="xl" lastedited="6" lowestedited="4" rupbuild="14420"><workbookpr defaultthemeversion="124226"><mc:alternatecontent xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"><mc:choice requires="x15"><x15ac:abspath url="W:\Rates\carrier\" xmlns:x15ac="http://schemas.microsoft.com/office/spreadsheetml/2010/11/ac"></x15ac:abspath></mc:choice></mc:alternatecontent><bookviews><workbookview xwindow="0" ywindow="0" windowwidth="21570" windowheight="8145"></workbookview></bookviews><sheets><sheet name="Rates" sheetid="1" r:id="rId1"><sheet name="Data" sheetid="4" state="hidden" r:id="rId2"></sheet></sheet></sheets><calcpr calcid="152511"></calcpr></workbookpr></fileversion></workbook>
and PHPExcel does not like:
<x:workbook xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:x="http://schemas.openxmlformats.org/spreadsheetml/2006/main"><x:fileversion appname="xl" lastedited="5" lowestedited="5" rupbuild="9303"><x:workbookpr defaultthemeversion="124226"><x:bookviews><x:workbookview xwindow="360" ywindow="45" windowwidth="10515" windowheight="7245"></x:workbookview></x:bookviews><x:sheets><x:sheet name="International Destinations" sheetid="5" r:id="rId1"><x:sheet name="National Destinations" sheetid="4" r:id="rId2"></x:sheet></x:sheet></x:sheets><x:calcpr calcid="145621"></x:calcpr></x:workbookpr></x:fileversion></x:workbook>
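One visible difference between the two documents is that the second qualifies every element with the x: prefix, while the first uses a default namespace. Both prefixes are bound to the same namespace URI, so a namespace-aware parser treats the elements as equivalent. Whether this prefix is what trips PHPExcel is an assumption that the thread does not confirm. The sketch below is plain Java (not PHPExcel code) and only illustrates namespace-aware lookup; the class SheetNames and its method are hypothetical, and the sample XML is a cut-down version of the documents above.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class SheetNames {
    // Namespace URI that both workbook documents bind their elements to.
    static final String MAIN_NS =
        "http://schemas.openxmlformats.org/spreadsheetml/2006/main";

    // Returns the "name" attribute of every <sheet> element in the main
    // namespace, regardless of which prefix (if any) the document uses.
    static List<String> sheetNames(String workbookXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // match on namespace URI, not on the literal prefix
            Document doc = factory.newDocumentBuilder().parse(
                new ByteArrayInputStream(workbookXml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagNameNS(MAIN_NS, "sheet");
            List<String> names = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                names.add(((Element) nodes.item(i)).getAttribute("name"));
            }
            return names;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse workbook XML", e);
        }
    }

    public static void main(String[] args) {
        String unprefixed = "<workbook xmlns=\"" + MAIN_NS + "\"><sheets>"
            + "<sheet name=\"Rates\"/><sheet name=\"Data\"/></sheets></workbook>";
        String prefixed = "<x:workbook xmlns:x=\"" + MAIN_NS + "\"><x:sheets>"
            + "<x:sheet name=\"International Destinations\"/>"
            + "<x:sheet name=\"National Destinations\"/></x:sheets></x:workbook>";
        System.out.println(sheetNames(unprefixed)); // [Rates, Data]
        System.out.println(sheetNames(prefixed));   // [International Destinations, National Destinations]
    }
}
```

A lookup that matches on the literal tag name "sheet" instead would find the sheets in the first document only.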

Error copying sheets in phpexcel

I am trying to extract some sheets in phpexcel as follows (see https://stackoverflow.com/a/10587576/813801 for reference).
$newObjPHPExcel = new PHPExcel();
$newObjPHPExcel->removeSheetByIndex(0); //remove first default sheet
foreach ($sheets as $sheetIndex) {
echo "trying-$sheetIndex\n";
$workSheet = $objPHPExcel->getSheet($sheetIndex);
echo "done-$sheetIndex\n";
$newObjPHPExcel->addExternalSheet($workSheet);
}
($sheets is an array of indexes that are within the bounds of the workbook; I checked with listWorksheetInfo.)
If I comment out the last line, $newObjPHPExcel->addExternalSheet($workSheet);
the getSheet method works fine. Otherwise, I get an error:
Fatal error: Uncaught exception 'PHPExcel_Exception' with message 'Your requested sheet index: 2 is out of bounds. The actual number of sheets is 1.' in /Xls/PHPExcel/PHPExcel.php:577
Why should newObjPHPExcel interfere with objPHPExcel?
UPDATE:
I found a workaround that seems to work, though I am not sure why the other version did not.
$newObjPHPExcel = new PHPExcel();
$newObjPHPExcel->removeSheetByIndex(0); //remove first default sheet
foreach ($sheets as $sheetIndex) {
echo "trying-$sheetIndex\n";
$workSheet[] = $objPHPExcel->getSheet($sheetIndex);
echo "done-$sheetIndex\n";
}
foreach ($workSheet as $obj)
$newObjPHPExcel->addExternalSheet($obj);
The addExternalSheet() method actually removes the worksheet from the old workbook and moves it to the new one, but your iterator over the old workbook collection of sheets still believes that it contains that worksheet when it no longer does.
In your second "workaround" code, you don't remove the sheets from the old workbook until the first loop has finished. That loop simply builds an array of references to the sheets, and the second loop then iterates over that array. The array doesn't care that the sheets are being moved from one workbook to another, because they all still exist, so there is no error.
An alternative might be to use the worksheet iterator against the old workbook, which should update cleanly as the sheets are removed.
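The effect described above can be reproduced with any index-addressed collection. The following is a language-neutral illustration in Java, not PHPExcel code: a plain list of strings stands in for the old workbook's sheet collection, and the class MoveSheets, its methods and the indexes 0, 1, 2 are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MoveSheets {
    // Mirrors the failing loop: look up by index, then move immediately.
    // Every move shrinks 'source', so later indexes drift or fall off the end.
    static List<String> moveOneByOne(List<String> source, int[] indexes) {
        List<String> target = new ArrayList<>();
        for (int i : indexes) {
            String sheet = source.get(i); // throws IndexOutOfBoundsException once source is too short
            source.remove(sheet);
            target.add(sheet);
        }
        return target;
    }

    // Mirrors the workaround: resolve all indexes first, move afterwards.
    static List<String> collectThenMove(List<String> source, int[] indexes) {
        List<String> picked = new ArrayList<>();
        for (int i : indexes) {
            picked.add(source.get(i)); // source is untouched while indexes are resolved
        }
        source.removeAll(picked);
        return picked;
    }

    public static void main(String[] args) {
        int[] indexes = {0, 1, 2};

        List<String> first = new ArrayList<>(Arrays.asList("Sheet0", "Sheet1", "Sheet2"));
        System.out.println(collectThenMove(first, indexes)); // [Sheet0, Sheet1, Sheet2]

        List<String> second = new ArrayList<>(Arrays.asList("Sheet0", "Sheet1", "Sheet2"));
        try {
            moveOneByOne(second, indexes);
        } catch (IndexOutOfBoundsException e) {
            // Index 2 is requested when only one element is left.
            System.out.println("failed: " + e.getMessage());
        }
    }
}
```

In the failing variant the second lookup already returns the wrong element (Sheet2 instead of Sheet1) before the third lookup runs past the end of the list.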

empty row and column while Exporting Several XtraGrid Controls to a Single Excel File

I have read this article and tested it. It works, but every sheet in the Excel file has one wide empty column (column A) and one wide empty row (row 1). I know that this comes from the settings of the PrintingBase class, but how can I remove that first empty column and row?
I have found the answer to my own question:
var compositeLink = new CompositeLinkBase();
var link1 = new PrintableComponentLinkBase();
// these are the margins in sheet1
link1.Margins.Left = 0;
link1.MinMargins.Left = 0;
link1.Component = DG1;
compositeLink.Links.Add(link1);
// then export to excel :)
