iMacro - Setting Variable + SaveAs CSV - web-scraping

I am looking for help with 2 parts of my iMacro Script...
Part1 - Variable
I am clicking on the following line of a page in order to access the page I need to extract from.
1st Link
TAG POS=8 TYPE=A FORM=NAME:xxyy ATTR=HREF:https://aaa.aaaa.com/en/administration/xxxx.jsp?reqID=h*
2nd Link
TAG POS=9 TYPE=A FORM=NAME:xxyy ATTR=HREF:https://aaa.aaaa.com/en/administration/xxxx.jsp?reqID=h*
The TAG POS value is the variable. How can I set it so that, when running in a loop, the macro selects the next value on the screen (i.e. chooses 8, 9, 10)? Some screens have 100-plus links to click.
Part 2 - Save CSV file
I have the SAVEAS line in my file. But how can I make it so that only one CSV file is created (even if the macro is run 50 times)? Also, is there a way to format the CSV file from iMacros so that each new run starts on a new row? (Currently, all data is extracted to row 1 across many columns.)
Thank you in advance,
Adam

This will do what you asked. It will loop the macro and each time set the new position number in the macro.
1)
var macro;
macro = "CODE:";
macro += "TAG POS={{number}} TYPE=A FORM=NAME:xxyy ATTR=HREF:https://aaa.aaaa.com/en/administration/xxxx.jsp?reqID=h*" + "\n";
// Replay the macro once per link position; {{number}} is filled in by iimSet
for (var i = 1; i < 100; i++)
{
    iimSet("number", i);
    iimPlay(macro);
}
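If the number of links differs from screen to screen, the loop could stop as soon as a position no longer matches instead of always running to 99. A minimal sketch, assuming iimPlay returns a negative code when the TAG command finds nothing at that position. clickLinks and its parameters are hypothetical names, not part of iMacros; play and setVar are stand-ins for iimPlay and iimSet so the loop can be tried outside the browser.

```javascript
// Hypothetical helper: plays the macro for positions firstPos..maxPos and
// stops at the first position where the play function reports an error.
function clickLinks(play, setVar, macro, firstPos, maxPos) {
    var clicked = 0;
    for (var pos = firstPos; pos <= maxPos; pos++) {
        setVar("number", pos);          // fills {{number}} in the macro
        if (play(macro) < 0) {          // assumption: negative code means no link here
            break;
        }
        clicked++;
    }
    return clicked;                     // number of links that were clicked
}

// Inside iMacros this would be called as, for example:
// clickLinks(iimPlay, iimSet, macro, 8, 200);
```

Starting at 8 matches the first link in the question; the upper bound is only a safety limit.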
Part two needs JavaScript scripting. The script below has three parts: it declares the macro, plays it and reads the extracted text, and then calls a function that appends that text to a file. Each run writes its text on a new line, so a single CSV file accumulates one row per run.
2)
var macroExtractSomething;
macroExtractSomething = "CODE:";
macroExtractSomething += "TAG POS=1 TYPE=DIV ATTR=CLASS:some_class_of_some_div EXTRACT=TXT" + "\n";
iimPlay(macroExtractSomething);
var extracted_text = iimGetLastExtract();
WriteFile("C:\\some_folder\\some_file.csv", extracted_text);
//This function appends a string to a file. It will also create the file at that location if it does not exist
function WriteFile(path, string)
{
    //import FileUtils.jsm
    Components.utils.import("resource://gre/modules/FileUtils.jsm");
    //declare file
    var file = new FileUtils.File(path);
    //declare file path
    file.initWithPath(path);
    //if it exists move on, if not create it
    if (!file.exists())
    {
        file.create(file.NORMAL_FILE_TYPE, 0666);
    }
    var charset = 'EUC-JP';
    var fileStream = Components.classes['@mozilla.org/network/file-output-stream;1']
        .createInstance(Components.interfaces.nsIFileOutputStream);
    //18 = 0x02 | 0x10: open write-only in append mode
    fileStream.init(file, 18, 0x200, false);
    var converterStream = Components
        .classes['@mozilla.org/intl/converter-output-stream;1']
        .createInstance(Components.interfaces.nsIConverterOutputStream);
    converterStream.init(fileStream, charset, string.length,
        Components.interfaces.nsIConverterInputStream.DEFAULT_REPLACEMENT_CHARACTER);
    //write a line break followed by the string, so each call starts a new row
    converterStream.writeString("\r\n" + string);
    converterStream.close();
    fileStream.close();
}
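When one run extracts several values, they still need to be turned into one comma-separated row before being passed to WriteFile. A small sketch of that step, assuming the string returned by iimGetLastExtract() separates the values with the marker [EXTRACT]. csvField and extractToCsvRow are hypothetical helper names.

```javascript
// Quote a single value if it contains a comma, a quote or a line break,
// doubling any embedded quotes as CSV requires.
function csvField(value) {
    var text = String(value).trim();
    if (/[",\r\n]/.test(text)) {
        text = '"' + text.replace(/"/g, '""') + '"';
    }
    return text;
}

// Turn one extraction result into one CSV row.
// Assumption: values are separated by the literal marker "[EXTRACT]".
function extractToCsvRow(extract) {
    return extract.split("[EXTRACT]").map(csvField).join(",");
}

// Inside iMacros this could be used as:
// WriteFile("C:\\some_folder\\some_file.csv", extractToCsvRow(iimGetLastExtract()));
```

Because WriteFile prefixes every write with a line break, each run then ends up as its own row with one column per extracted value.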


Iterate faster over a large collection of files (objects) inside an Observable List (JavaFX 8)

I have an Excel file that contains the filenames of all the images. The paths of these images are stored in an observable collection of File objects, listed from the folder that contains the images. My goal is to create a hyperlink for each filename by matching it against this pool of image files.
How can I iterate faster through a large collection of File objects in order to get their paths?
For example:
Image name from Excel :
ABC_0001
The Full path from the collection must be:
C:\Users\admin\Desktop\Images\ABC_0001.jpg
To get the full path, I iterate with a Stream.
My procedure:
1. Extract data using Apache POI.
2. Stream through the image collection, comparing each file's base name against the extracted data.
3. Take the result and store the full path on the object via getAbsolutePath().
Code:
//storage during iteration
ObservableList<DetailedData> dataCollection = FXCollections.observableArrayList();
//Image collection containing over 13k Images listed via commons-io
ObservableList<File> IMAGE_COLLECTION = FXCollections.observableArrayList(FileUtils.listFiles(browsedFOLDER, new String[]{"JPG", "JPEG", "TIF", "TIFF", "jpg", "jpeg", "tif", "tiff"}, true));
//Sheet data
Sheet sheet1 = wb.getSheetAt(0);
for (Row row: sheet1)
{
    DetailedData data = new DetailedData();
    //extracted data from excel
    String FILENAME = row.getCell(0,Row.MissingCellPolicy.CREATE_NULL_AS_BLANK).getStringCellValue();
    //to be filled up based on stream result.
    String IMAGE_SOURCE = null;
    //stream code with the help of commons-io
    File IMAGE = IMAGE_COLLECTION.stream().filter(e -> FilenameUtils.getBaseName(e.getName()).toLowerCase().equals(FILENAME.toLowerCase())).findFirst().orElse(null);
    if (IMAGE != null)
        IMAGE_SOURCE = IMAGE.getAbsolutePath();
    data.setFileName(FILENAME);
    data.setFullPath(IMAGE_SOURCE);
    dataCollection.add(data);
}
Result:
Excel rows = 9,400
Image Files = 13,000
Iteration Time = 120,000ms
Is this result normal, or can it be made faster?
I tried using parallelStream() and it ran faster, but it used more CPU.
The code below should speed things up a lot, but first I have a few questions about your code.
ObservableList<DetailedData> dataCollection = FXCollections.observableArrayList() Why are you using ObservableList? And why is this a list of DetailedData rather than File, given that DetailedData has setFileName and setFullPath? File already provides both.
ObservableList<File> IMAGE_COLLECTION = FXCollections.observableArrayList(FileUtils.listFiles(browsedFOLDER, new String[]{"JPG", "JPEG", "TIF", "TIFF", "jpg", "jpeg", "tif", "tiff"}, true)); Why ObservableList?
These two are small things, but I am curious.
What I think you should do is use a Map, so that each Excel row becomes a single lookup instead of a scan over all 13,000 files. Your code should look something like the code below.
//storage during iteration
List<DetailedData> dataCollection = new ArrayList<>();
//Image collection containing over 13k Images listed via commons-io
List<File> IMAGE_COLLECTION = new ArrayList<>(FileUtils.listFiles(new File("C:\\Users\\blj0011\\Pictures"), new String[]{"JPG", "JPEG", "TIF", "TIFF", "jpg", "jpeg", "tif", "tiff"}, true));
//Use this to map file name to file
Map<String, File> map = new HashMap<>();
//Use this to add data to the map
IMAGE_COLLECTION.forEach((file) -> {map.put(file.getName().substring(0, file.getName().lastIndexOf(".")).toLowerCase(), file);});
for (Row row: sheet1)
{
    //extracted data from excel
    String FILENAME = row.getCell(0,Row.MissingCellPolicy.CREATE_NULL_AS_BLANK).getStringCellValue();
    //If the map contains the file name, create `DetailedData` object. Then set data. Then add object to datacollection list.
    if (map.containsKey(FILENAME.toLowerCase()))
    {
        DetailedData data = new DetailedData();
        data.setFileName(FILENAME);
        data.setFullPath(map.get(FILENAME.toLowerCase()).getAbsolutePath());
        dataCollection.add(data);
    }
}
The explanation is in the code comments.
I still believe this could be cleaned up a little more if you used List<File> dataCollection = new ArrayList()
If you really want to speed up your search, you should avoid repeating work that could be done just once. For example, you could use two loops: the first to prepare the search and the second to actually do the search. Inside your filter you call FilenameUtils.getBaseName and convert to lower case twice for every file on every row. It would be better to do these things only once in the first loop and store the resulting Strings in a list. In the second loop you then search this list.
I am also wondering why you use ObservableLists here. A simple List would do as well.
I've tested another approach to this slow iteration.
It seems that the cause is creating the Stream repeatedly inside the for-each loop.
I tried the Supplier approach described by Baeldung, declaring the Supplier outside the loop together with parallelStream().
Sample Code:
Supplier<Stream<File>> streamSupplier = () -> imageCollection.parallelStream();
for (Row row : sheet)
{
    File IMAGE = streamSupplier.get().filter(e -> FilenameUtils.getBaseName(e.getName()).toLowerCase().equals(FILENAME.toLowerCase())).findFirst().orElse(null);
    if (IMAGE != null)
        IMAGE_SOURCE = IMAGE.getAbsolutePath();
}
The result went down to 45,000 ms.
Please correct me if my approach is not right.

Function for Google Sheets' Script editor with a button for TODAY(), and NOW() in two different columns of which are the next not blank in the column

Currently, I'm looking through some simple documentation for ways to make a 'button' (image) over a Google Sheet that triggers a function in the script editor. I'm not familiar with this type of syntax; I typically use AutoHotkey and a bit of Python.
All I want is for this button to populate two columns: the current date in one and the current time in the other (it doesn't even need the year or the seconds, to be honest). I don't know whether the sheet's name matters to how the script works. The range is ( 'Log'!G4:H ).
Like if I were to make it for AutoHotkey I would put it as :
WinGet, winid ,, A ; <-- need to identify window A = active
MsgBox, winid=%winid%
;do some stuff
WinActivate ahk_id %winid%
So it affects any page it's active on.
Ideally, I would like to use the same function on the same columns across different sheets. I don't mind if I have to clone a unique function for each page, but I just can't even grasp this first step, lol.
I'm not too familiar with the new macro feature. If I use a macro, does it only work for my client because, say, it records relative aspect-ratio movements?
I.e., if I record a macro on my PC and play it on my Android device, will the change in platform change its execution?
If anyone can point me to any good documentation or resources for the Google Sheets script editor or its syntax, I would really appreciate it.
EDIT: Just to clarify, I'm really focused on it being a function that populates from a click/press (mobile) of an image. I currently use an onEdit on the sheet, and it wouldn't serve the purpose I want for this function. It's just a shortcut to quickly input a timestamp, and those fields can still be retouched without a newer current time/date being reapplied.
EDIT:EDIT: Ended up with an image button that runs a script that can only input to the current cell.
function timeStamp() {
SpreadsheetApp.getActiveSheet()
.getActiveCell()
.setValue(new Date());
}
It only works on the targeted cell.
I would like to force the input into the next available cell in the column, split the date from the time, and put them into adjacent cells.
Maybe this will help: if the 1st column is edited, it will automatically write the date in the 2nd column and the time in the 3rd column on Sheet1:
function onEdit(e) {
    var s = SpreadsheetApp.getActiveSheet();
    if (s.getName() == "Sheet1") {
        var r = s.getActiveCell();
        if (r.getColumn() == 1) {
            // Date goes one column to the right of the edited cell
            var newDate = Utilities.formatDate(new Date(), "GMT+8", "MM/dd/yyyy");
            r.offset(0, 1).setValue(newDate);
            // Time goes two columns to the right of the edited cell
            var newDate1 = Utilities.formatDate(new Date(), "GMT+8", "hh:mm:ss");
            r.offset(0, 2).setValue(newDate1);
        }
    }
}
https://webapps.stackexchange.com/a/130253/186471
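For the button-driven variant the question asks about (date and time into the next free row of G4:H on whichever sheet is active), a sketch along these lines might work. It assumes the date belongs in column G and the time in column H starting at row 4, as in the question's range. firstBlankOffset and timeStampNextRow are hypothetical names, and the date and time formats are only examples.

```javascript
// Pure helper: index of the first row whose two cells are both empty,
// or rows.length if every row is already filled.
function firstBlankOffset(rows) {
    for (var i = 0; i < rows.length; i++) {
        if (rows[i][0] === "" && rows[i][1] === "") {
            return i;
        }
    }
    return rows.length;
}

// Assign this function to the image button (Assign script: timeStampNextRow).
function timeStampNextRow() {
    var FIRST_ROW = 4;   // assumption: data starts at G4
    var DATE_COL = 7;    // column G; the time goes into H
    var sheet = SpreadsheetApp.getActiveSheet();
    var numRows = sheet.getMaxRows() - FIRST_ROW + 1;
    var rows = sheet.getRange(FIRST_ROW, DATE_COL, numRows, 2).getValues();
    var offset = firstBlankOffset(rows);
    if (offset >= rows.length) {
        return;          // no empty row left in the sheet
    }
    var now = new Date();
    var tz = Session.getScriptTimeZone();
    sheet.getRange(FIRST_ROW + offset, DATE_COL, 1, 2).setValues([[
        Utilities.formatDate(now, tz, "MM/dd/yyyy"),
        Utilities.formatDate(now, tz, "HH:mm")
    ]]);
}
```

Because it reads the active sheet, the same function can be assigned to a button on each sheet that uses the same two columns.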

DM Script to import a 2D image in text (CSV) format

Using the built-in "Import Data..." functionality we can import a properly formatted text file (like CSV and/or tab-delimited) as an image. It is rather straightforward to write a script to do so. However, my scripting approach is not efficient: it requires me to loop through each row (using the "StreamReadTextLine" function), so it takes a while to import a 512x512 image.
Is there a better way or an "undocumented" script function that I can tap into?
DigitalMicrograph offers an import functionality via the File/Import Data... menu entry, which will give you this dialog:
The functionality invoked by this dialog can also be accessed from a script, with the command
BasicImage ImageImportTextData( String img_name, ScriptObject stream, Number data_type_enum, ScriptObject img_size, Boolean lines_are_rows, Boolean size_by_counting )
As with the dialog, one has to pre-specify a few things.
The data type of the image.
This is a number. You can find out which number belongs to which image data type by, e.g., creating an image and outputting its data type:
image img := Realimage( "", 4, 100 )
Result("\n" + img.ImageGetDataType() )
The file stream object
This object describes where the data is stored. The F1 help documentation explains how to create a file stream from an existing file, but essentially you need to specify a path to the file, open the file for reading (which gives you a handle), and then use that file handle to create the stream object.
string path = "C:\\test.txt"
number fRef = OpenFileForReading( path )
object fStream = NewStreamFromFileReference( fRef, 1 )
The image size object
This is a specific script object you need to allocate. It wraps image size information. In case of auto-detecting the size from the text, you don't need to specify the actual size, but you still need the object.
object imgSizeObj = Alloc("ImageData_ImageDataSize")
imgSizeObj.SetNumDimensions(2) // Not needed for counting!
imgSizeObj.SetDimensionSize(0,10) // Not used for counting
imgSizeObj.SetDimensionSize(1,10) // Not used for counting
Boolean checks
As with the checkboxes in the UI, you specify two conditions:
Lines are Rows
Get Size By Counting
Note that the "counting" flag is only used if "Lines are Rows" is also true, the same as in the dialog.
The following script imports a text file with counting:
image ImportTextByCounting( string path, number DataType )
{
    number fRef = OpenFileForReading( path )
    object fStream = NewStreamFromFileReference( fRef, 1 )
    number bLinesAreRows = 1
    number bSizeByCount = 1
    bSizeByCount *= bLinesAreRows // Only valid together!
    object imgSizeObj = Alloc("ImageData_ImageDataSize")
    image img := ImageImportTextData( "Imag Name ", fStream, DataType, imgSizeObj, bLinesAreRows, bSizeByCount )
    return img
}
string path = "C:\\test.txt"
number kREAL4_DATA = 2
image img := ImportTextByCounting( path, kREAL4_DATA )
img.ShowImage()

How to automate moving data from one google sheet to another

I am trying to send values from Google sheets to Firebase so that the data table updates automatically. To do this I used Google Drive CMS which exported the data to Firebase perfectly. The problem is that I scrape data off of websites. For example, I get a list of data through the use of importXML:
=IMPORTXML("https://www.congress.gov/search?q=%7B%22source%22%3A%22legislation%22%7D", "//li[@class='expanded']/span[@class='result-heading']/a[1]")
CMS doesn't seem to pick up the values this formula produces but the formula itself, which causes errors. The way I addressed this was to make a new tab that holds the formulas and keep the CMS tab with values only. I've been copying and pasting manually but want to make that process automatic. I cannot find any help on writing a script that takes the values of the formulas from one tab and places those values in a different sheet.
Here are some pictures for reference:
*I put the blue highlight on the cell I am referring to for the google sheets and the data from the firebase is showing the first row of data that was exported.
How about this sample script? Please think of this as one of several answers. The flow of this script is as follows. When you use this script, please copy and paste it and run sample().
Flow :
Input a source range of active sheet.
Retrieve the source range.
Input a destination range of a destination spreadsheet.
Retrieve the destination range of the destination sheet.
Copy data from the source range to the destination range.
The copied data includes values, formulas and formats.
Sample script :
function sample() {
    // Source
    var range = "a1:b5"; // Source range
    var ss = SpreadsheetApp.getActiveSpreadsheet();
    var srcrange = ss.getActiveSheet().getRange(range);
    // Destination
    var range = "c1:d5"; // Destination range,
    var dstid = "### file id ###"; // Destination spreadsheet ID
    var dst = "### sheet name ###"; // Destination sheet name
    var dstrange = SpreadsheetApp.openById(dstid).getSheetByName(dst).getRange(range);
    var dstSS = dstrange.getSheet().getParent();
    var copiedsheet = srcrange.getSheet().copyTo(dstSS);
    copiedsheet.getRange(srcrange.getA1Notation()).copyTo(dstrange);
    dstSS.deleteSheet(copiedsheet);
}
If I misunderstand your question, I'm sorry.
Edit :
This sample script copies only the values from the source spreadsheet to the destination spreadsheet.
function sample() {
    // Source
    var range = "a1:b5"; // Source range
    var ss = SpreadsheetApp.getActiveSpreadsheet();
    var srcrange = ss.getActiveSheet().getRange(range);
    // Destination
    var range = "c1:d5"; // Destination range,
    var dstid = "### file id ###"; // Destination spreadsheet ID
    var dst = "### sheet name ###"; // Destination sheet name
    var dstrange = SpreadsheetApp.openById(dstid).getSheetByName(dst).getRange(range);
    var dstSS = dstrange.getSheet().getParent();
    var sourceValues = srcrange.getValues();
    dstrange.setValues(sourceValues);
}
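To remove the manual step entirely, a value-only copy could be sized from the data itself and attached to a time-driven trigger. This is only a sketch under assumptions: the tab names "Formulas" and "CMS" are placeholders for the two tabs described in the question, and copyValues, syncTabs and installHourlySync are hypothetical names.

```javascript
// Copies the values (not the formulas) of everything in srcSheet to the
// top-left corner of dstSheet, sizing the destination from the data.
function copyValues(srcSheet, dstSheet) {
    var values = srcSheet.getDataRange().getValues();
    dstSheet.getRange(1, 1, values.length, values[0].length).setValues(values);
    return values.length; // number of rows copied
}

// Assumption: both tabs live in the active spreadsheet under these names.
function syncTabs() {
    var ss = SpreadsheetApp.getActiveSpreadsheet();
    copyValues(ss.getSheetByName("Formulas"), ss.getSheetByName("CMS"));
}

// Run once by hand to schedule syncTabs every hour.
function installHourlySync() {
    ScriptApp.newTrigger("syncTabs").timeBased().everyHours(1).create();
}
```

Sizing the destination range from values.length and values[0].length avoids the mismatch error that setValues raises when a hard-coded range does not fit the data.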

Creating a log file with the name as "Logging_20120402.log" where 20120402 is current date

I am working on a logging project and I want the log file that gets created to be named like this:
Logging_20120402.log
where 20120402 is the current date. Since this is a rotating log, the file name changes every day when the date changes. For example, a new file created tomorrow would be named
Logging_20120403.log
I tried the following code, but it didn't work.
My project has a filelogger.cpp which contains this function:
string Lastdate()
{
    return date1;//also declared as global in this filelogger.cpp
}
In my main program I did this:
void test()//function
{
    string* ptr = &date1;//i passed a pointer ptr to the address of the date1(filecreation
}
Then I wrote the statement:
Logging::FileLogger filelog(logger, "logging_" + ptr + ".log");//creating a Daily.log text File
I expect this statement to create a file named Logging_20120402.log at the specified path. However, there is a compile-time error. I need help.
