Iterate faster over a large collection of files (objects) inside an Observable List (JavaFX 8) - javafx

I have an excel file that contains all the filenames of the Images. The path of these images are stored in an Observable Collection via <File> class which came from the folder that contains all of the images. My goal is to create a hyperlink of these filenames by matching it through the pool of image file collection.
I would like to ask if how can I iterate faster through a large collection of file classes in order to get their paths easily.
For example:
Image name from Excel :
ABC_0001
The Full path from the collection must be:
C:\Users\admin\Desktop\Images\ABC_0001.jpg
In order to get their full path, I perform the iteration through Stream.
My procedures:
Extract data using Apache POI.
Stream through the Image Collection by converting each data into
their base filenames vs extracted data.
Get the result and store the fullpath on the object via
getAbsolutePath().
Code:
//storage during iteration
ObservableList<DetailedData> dataCollection = FXCollections.observableArrayList()
//Image collection containing over 13k Images listed via commons-io
ObservableList<File> IMAGE_COLLECTION = FXCollections.observableArrayList(FileUtils.listFiles(browsedFOLDER, new String[]{"JPG", "JPEG", "TIF", "TIFF", "jpg", "jpeg", "tif", "tiff"}, true));
//Sheet data
Sheet sheet1 = wb.getsheetAt(0);
for (Row row: sheet1)
{
DetailedData data = new DetailedData();
//extracted data from excel
String FILENAME = row.getCell(0,Row.MissingCellPolicy.CREATE_NULL_AS_BLANK).getStringCellValue();
//to be filled up based on stream result.
String IMAGE_SOURCE = null;
//stream code with the help of commons-io
File IMAGE = IMAGE_COLLECTION.stream().filter(e -> FilenameUtils.getBaseName(e.getName()).toLowerCase().equals(FILENAME.toLowerCase())).findFirst().orElse(null);
if (IMAGE != null)
IMAGE_SOURCE = IMAGE.getAbsolutePath();
data.setFileName(FILENAME);
data.setFullPath(IMAGE_SOURCE);
dataCollection.add(data);
}
Result:
Excel rows = 9,400
Image Files = 13,000
Iteration Time = 120,000ms
Are the results should appear normal or it can become faster?
I tried using parallelStream() and the results went faster but it consumes higher CPU usage.

This code should speed your code up a lot, but there are a few questions about your code.
ObservableList<DetailedData> dataCollection = FXCollections.observableArrayList() Why are you using ObservableList? Why is this a list of DetailedData and not File. Given that detailed data has setFileName and setFullPath. File already has these.
ObservableList<File> IMAGE_COLLECTION = FXCollections.observableArrayList(FileUtils.listFiles(browsedFOLDER, new String[]{"JPG", "JPEG", "TIF", "TIFF", "jpg", "jpeg", "tif", "tiff"}, true)); Why ObservableList?
These two are small things, but I am curious.
So what I think you should do is use a Map. Your code should look something like the code below.
//storage during iteration
List<DetailedData> dataCollection = new ArrayList();
//Image collection containing over 13k Images listed via commons-io
List<File> IMAGE_COLLECTION = new ArrayList(FileUtils.listFiles(new File("C:\\Users\\blj0011\\Pictures"), new String[]{"JPG", "JPEG", "TIF", "TIFF", "jpg", "jpeg", "tif", "tiff"}, true));
//Use this to map file name to file
Map<String, File> map = new HashMap();
//Use this to add data to the map
IMAGE_COLLECTION.forEach((file) -> {map.put(file.getName().substring(0, file.getName().lastIndexOf(".")).toLowerCase(), file);});
for (Row row: sheet1)
{
//extracted data from excel
String FILENAME = row.getCell(0,Row.MissingCellPolicy.CREATE_NULL_AS_BLANK).getStringCellValue();
//If the map contains the file name, create `DetailedData` object. Then set data. Then add object to datacollection list.
if (map.containsKey(FILENAME.toLowerCase()))
{
DetailedData data = new DetailedData();
data.setFileName(FILENAME);
data.setFullPath(map.get(FILENAME.toLowerCase()).getAbsolutePath());
dataCollection.add(data);
}
}
Comments in the code
I still believe this could be cleaned up a little more if you used List<File> dataCollection = new ArrayList()

If you really want to speed up your search, you should try not to do things repeatedly which could just be done once. For example you could use two loops. The first to prepare your search and the second to actually do the search. Inside your filter you call FilenameUtils.getBaseName and two time a conversion to lower case. It would be better to do these things only once in the first loop and store the resulting Strings in a list. In the second loop you then do the search on this list.
I am also wondering why you use ObservableLists here. A simple List would do as well.

I've tested another approach in this slow iteration.
It seems that the cause is declaring the Stream repeatedly inside the foreach.
I tried using Baeldung's solution <Supplier> and declared it outside the loop together with parallelStream()
Sample Code:
Supplier<Stream<File>> streamSupplier = () -> imageCollection.parallelStream();
for (Row row : sheet)
{
File IMAGE = streamSupplier.get().filter(e -> FilenameUtils.getBaseName(e.getName()).toLowerCase().equals(FILENAME.toLowerCase())).findFirst().orElse(null);
if (IMAGE != null)
IMAGE_SOURCE = IMAGE.getAbsolutePath();
}
Result went 45000ms
Please correct me if my approach was not right.

Related

Import text file to R and change it's format

I have a data in text file. The example of the text file looks like this:
"vLatitude ='23.8145833';
vLongitude ='90.4043056';
vcontents ='LRP: LRPS</br>Start of Road From the End of Banani Rail Crossing Over Pass</br>Division:Gazipur</br>Sub-Division:Tongi';
vLocations = new Array(vcontents, vLatitude, vLongitude);
locations.push(vLocations);"
Can I change it to like this in R?
eg.
latitute longtitude contents
23.8145833 90.4043056 LRP: LRPS Start...Tongi
Solution 1
That looks a lot like javascript code. Execute the javascript (using a web browser) and save the result to JSON, then open the file with R with jsonlite.
With your example, create this file and save it as my_page.html:
<html>
<header>
<script>
// Initialize locations to be able to push more values in it
// probably not required with your full code
var locations = [];
vLatitude ='23.8145833';
vLongitude ='90.4043056';
vcontents ='LRP: LRPS</br>Start of Road From the End of Banani Rail Crossing Over Pass</br>Division:Gazipur</br>Sub-Division:Tongi';
vLocations = new Array(vcontents, vLatitude, vLongitude);
locations.push(vLocations);
// convert locations to json
var jsonData = JSON.stringify(locations);
// actually write the json to file
function download(content, fileName, contentType) {
var a = document.createElement("a");
var file = new Blob([content], {type: contentType});
a.href = URL.createObjectURL(file);
a.download = fileName;
a.click();
}
download(jsonData, 'export_json.txt', 'text/plain');
</script>
</header>
<body>
Download should start automatically. You can look at the web console for errors.
</body>
</html>
When you open it with your web browser it should "download" a file, that you can open with R:
jsonlite::read_json("export_json.txt",simplifyVector = TRUE)
One problem is that the javascript code is created an array without names. So the names are not exported. I don't see how you could make javascript export it.
Solution 2
Instead of relying on a browser to execute the javascript code, you could do it directly in R with a javascript engine. It should give you the same result, but makes communication between the two easier.
Solution 3
If the file really looks like that all along, you might be able to remove the javascript lines that organize the arrays, and only keep the lines that define variables. In R, the symbols = and ; are technically valid, it's not too hard to rewrite the javascript into R code. Note this solution could be very fragile depending on what else is in your javascript code!
js_script <- "var locations = [];
vLatitude ='23.8145833';
vLongitude ='90.4043056';
vcontents ='LRP: LRPS</br>Start of Road From the End of Banani Rail Crossing Over Pass</br>Division:Gazipur</br>Sub-Division:Tongi';
vLocations = new Array(vcontents, vLatitude, vLongitude);
locations.push(vLocations);
// convert locations to json
var jsonData = JSON.stringify(locations);" %>%
str_split(pattern = "\n", simplify=TRUE) %>%
as.character() %>%
str_trim()
# Find the lines that look like defining variables
js_script <- js_script[str_detect(js_script, pattern = "^\\w+ ?= ?'.*' ?;$")]
# make it into an R expression
r_code <- str_remove(js_script, ";$") %>%
paste(collapse = ",")
r_code <- paste0("c(", r_code, ")")
# Execute
eval(str2expression(r_code))

How to write Key Value Pairs in Map in a Loop without overwriting them?

the following is my problem:
I do a select on my database and want to write the values of every record i get in a map. When i do this, i only have the values of one record in my map, because the put() function overwrites the entries everytime the loop starts again. Since I have to transfer the Key:Value pairs via JSON into my Javascript and write something into a field for every Key:Value pair, an ArrayList is not an option.
I've already tried to convert a ArrayList, which contains the Map, to a Map or to a String and then to Map, but i failed.
EDIT:
HereĀ“s my Code
def valueArray = new ArrayList();
def recordValues = [:]
while (rs.next())
{
fremdlKorr = rs.getFloatValue(1)
leistungKorr = rs.getFloatValue(2)
materialKorr = rs.getFloatValue(3)
strid = rs.getStringValue(4)
recordValues.put("strid", strid);
recordValues.put("material", materialKorr);
recordValues.put("fremdl", fremdlKorr);
recordValues.put("leistung", leistungKorr);
valueArray.add(korrekturWerte);
}
The ArrayList was just a test, i dont want to have an ArrayList, i need a Map.
The code as written, will give you a list of maps, but the maps will all contain the values of the last row. The reaons is the def recordValues = [:] that is outside of the while loop. So you basically add always the same map to the list and the map values get overwritten each loop.
While moving the code, would fix the problem, I'd use Sql instead. That all boils down to:
def valueArray = sql.rows("select strid, material, fremdl, leistung from ...")

DM Script to import a 2D image in text (CSV) format

Using the built-in "Import Data..." functionality we can import a properly formatted text file (like CSV and/or tab-delimited) as an image. It is rather straight forward to write a script to do so. However, my scripting approach is not efficient - which requires me to loop through each raw (use the "StreamReadTextLine" function) so it takes a while to get a 512x512 image imported.
Is there a better way or an "undocumented" script function that I can tap in?
DigitalMicrograph offers an import functionality via the File/Import Data... menu entry, which will give you this dialog:
The functionality evoked by this dialog can also be accessed by script commands, with the command
BasicImage ImageImportTextData( String img_name, ScriptObject stream, Number data_type_enum, ScriptObject img_size, Boolean lines_are_rows, Boolean size_by_counting )
As with the dialog, one has to pre-specify a few things.
The data type of the image.
This is a number. You can find out which number belongs to which image data type by, f.e., creating an image outputting its data type:
image img := Realimage( "", 4, 100 )
Result("\n" + img.ImageGetDataType() )
The file stream object
This object describes where the data is stored. The F1 help-documention explains how one creates a file-stream from an existing file, but essentially you need to specify a path to the file, then open the file for reading (which gives you a handle), and then using the fileHandle to create the stream object.
string path = "C:\\test.txt"
number fRef = OpenFileForReading( path )
object fStream = NewStreamFromFileReference( fRef, 1 )
The image size object
This is a specific script object you need to allocate. It wraps image size information. In case of auto-detecting the size from the text, you don't need to specify the actual size, but you still need the object.
object imgSizeObj = Alloc("ImageData_ImageDataSize")
imgSizeObj.SetNumDimensions(2) // Not needed for counting!
imgSizeObj.SetDimensionSize(0,10) // Not used for counting
imgSizeObj.SetDimensionSize(1,10) // Not used for counting
Boolean checks
Like with the checkboxes in the UI, you spefic two conditions:
Lines are Rows
Get Size By Counting
Note, that the "counting" flag is only used if "Lines are Rows" is also true. Same as with the dialog.
The following script improrts a text file with couting:
image ImportTextByCounting( string path, number DataType )
{
number fRef = OpenFileForReading( path )
object fStream = NewStreamFromFileReference( fRef, 1 )
number bLinesAreRows = 1
number bSizeByCount = 1
bSizeByCount *= bLinesAreRows // Only valid together!
object imgSizeObj = Alloc("ImageData_ImageDataSize")
image img := ImageImportTextData( "Imag Name ", fStream, DataType, imgSizeObj, bLinesAreRows, bSizeByCount )
return img
}
string path = "C:\\test.txt"
number kREAL4_DATA = 2
image img := ImportTextByCounting( path, kREAL4_DATA )
img.ShowImage()

How to automate moving data from one google sheet to another

I am trying to send values from Google sheets to Firebase so that the data table updates automatically. To do this I used Google Drive CMS which exported the data to Firebase perfectly. The problem is that I scrape data off of websites. For example, I get a list of data through the use of importXML:
=IMPORTXML("https://www.congress.gov/search?q=%7B%22source%22%3A%22legislation%22%7D", "//li[#class='expanded']/span[#class='result-heading']/a[1]")
CMS doesn't seem to get the values that this formula results in but the actual formula which causes errors. The way I adressed this is to make a new tab that has the formulas and keep the CMS tab with the value only. I've been copying and pasting this manually but want to make that process automatic. I cannot find any help to make a script that takes the values of the formulas from one tab and place those values in a different sheet.
Here are some pictures for reference:
*I put the blue highlight on the cell I am referring to for the google sheets and the data from the firebase is showing the first row of data that was exported.
How about this sample script? Please think of this as one of several answers. The flow of this script is as follows. When you use this script, please copy and paste it and run sample().
Flow :
Input a source range of active sheet.
Retrieve the source range.
Input a destination range of a destination spreadsheet.
Retrieve the destination range of the destination sheet.
Copy data from the source range to the destination range.
Data which included values, formulas and formats is copied.
Sample script :
function sample() {
// Source
var range = "a1:b5"; // Source range
var ss = SpreadsheetApp.getActiveSpreadsheet();
var srcrange = ss.getActiveSheet().getRange(range);
// Destination
var range = "c1:d5"; // Destination range,
var dstid = "### file id ###"; // Destination spreadsheet ID
var dst = "### sheet name ###"; // Destination sheet name
var dstrange = SpreadsheetApp.openById(dstid).getSheetByName(dst).getRange(range);
var dstSS = dstrange.getSheet().getParent();
var copiedsheet = srcrange.getSheet().copyTo(dstSS);
copiedsheet.getRange(srcrange.getA1Notation()).copyTo(dstrange);
dstSS.deleteSheet(copiedsheet);
}
If I misunderstand your question, I'm sorry.
Edit :
This sample script copies from values of source spreadsheet to destination spreadsheet.
function sample() {
// Source
var range = "a1:b5"; // Source range
var ss = SpreadsheetApp.getActiveSpreadsheet();
var srcrange = ss.getActiveSheet().getRange(range);
// Destination
var range = "c1:d5"; // Destination range,
var dstid = "### file id ###"; // Destination spreadsheet ID
var dst = "### sheet name ###"; // Destination sheet name
var dstrange = SpreadsheetApp.openById(dstid).getSheetByName(dst).getRange(range);
var dstSS = dstrange.getSheet().getParent();
var sourceValues = srcrange.getValues();
dstrange.setValues(sourceValues);
}

iMacro - Setting Variable + SaveAs CSV

I am looking for help with 2 parts of my iMacro Script...
Part1 - Variable
I am clicking on the follwoing line of a page in order to access the page I need to extract from.
1st Link
TAG POS=**8** TYPE=A FORM=NAME:xxyy ATTR=HREF:https://aaa.aaaa.com/en/administration/xxxx.jsp?reqID=h*
2nd Link
TAG POS=**9** TYPE=A FORM=NAME:xxyy ATTR=HREF:https://aaa.aaaa.com/en/administration/xxxx.jsp?reqID=h*
The tag pos is the variable, how can I get this so that when running on loop, the macro will select the next value on the screen (ie choose 8,9,10)? Some screens have 100 plus links to be clicked on.
Part 2 - Save CSV file
I have the saveas line in my file. But how can I make it so that there is only 1 csv file created (even if macro is runn 50 times)? Also, is there a way to format the CSV file from the iMacros so that each new run starts on another row (currently, all data extracts to row 1 across many columns.)
Thank you in advance,
Adam
This will do what you asked. It will loop the macro and each time set the new position number in the macro.
1)
var macro;
macro ="CODE:";
macro +="TAG POS={{number}} TYPE=A FORM=NAME:xxyy ATTR=HREF:https://aaa.aaaa.com/en/administration/xxxx.jsp?reqID=h*"+"\n";
for(var i=1;i<100;i++)
{
iimSet("number",i)
iimPlay(macro)
}
For the solution of part two you will need JavaScript scripting. First part is declaring macro and the second part is initiating the macro and the third part is the function which saves the extracted text into a file. Each time you run it will save in the new line.
2)
var macroExtractSomething;
macroExtractSomething ="CODE:";
macroExtractSomething +="TAG POS=1 TYPE=DIV ATTR=CLASS:some_class_of_some_div EXTRACT=TXT"+"\n";
iimPlay(macroExtractSomething)
var extracted_text=iimGetLastExtract();
WriteFile("C:\\some_folder\\some_file.csv",extracted)
//This function writes string into a file. It will also create file on that location
function WriteFile(path,string)
{
//import FileUtils.jsm
Components.utils.import("resource://gre/modules/FileUtils.jsm");
//declare file
var file = new FileUtils.File(path);
//declare file path
file.initWithPath(path);
//if it exists move on if not create it
if (!file.exists())
{
file.create(file.NORMAL_FILE_TYPE, 0666);
}
var charset = 'EUC-JP';
var fileStream = Components.classes['#mozilla.org/network/file-output-stream;1']
.createInstance(Components.interfaces.nsIFileOutputStream);
fileStream.init(file, 18, 0x200, false);
var converterStream = Components
.classes['#mozilla.org/intl/converter-output-stream;1']
.createInstance(Components.interfaces.nsIConverterOutputStream);
converterStream.init(fileStream, charset, string.length,
Components.interfaces.nsIConverterInputStream.DEFAULT_REPLACEMENT_CHARACTER);
//write file to location
converterStream.writeString("\r\n"+string);
converterStream.close();
fileStream.close();
}

Resources