Data extraction using imacros - web-scraping

I need an iMacros script to extract all the data from this website:
http://www.gibsondunn.com/Search/Pages/LawyersSearch.aspx?k=('Last Name'~A*).
At the moment I click each alphabet link manually, count the results, and enter that count as the max loop value before playing the loop, which is very time-consuming. I searched Google and this site but had no luck. I hope someone can help me with this.
Here is the script I created with the iMacros recorder:
VERSION BUILD=8871104 RECORDER=FX
SET !TIMEOUT_PAGE 20
SET !EXTRACT_TEST_POPUP NO
SET !ERRORIGNORE YES
TAB T=1
TAG POS={{!LOOP}} TYPE=A ATTR=HREF:/lawyers/* EXTRACT=HREF
TAB OPEN NEW
TAB T=2
URL GOTO={{!EXTRACT}}
WAIT SECONDS=2
'data text
SET !EXTRACT {{!URLCURRENT}}
TAG POS=1 TYPE=H1 ATTR=CLASS:gd_title EXTRACT=TXT
TAG POS=4 TYPE=SPAN ATTR=* EXTRACT=TXT
TAG POS=13 TYPE=DIV ATTR=* EXTRACT=TXT
TAG POS=15 TYPE=DIV ATTR=* EXTRACT=TXT
TAG POS=19 TYPE=DIV ATTR=* EXTRACT=TXT
TAG POS=20 TYPE=DIV ATTR=* EXTRACT=TXT
TAG POS=21 TYPE=DIV ATTR=* EXTRACT=TXT
TAG POS=22 TYPE=DIV ATTR=* EXTRACT=TXT
TAG POS=23 TYPE=DIV ATTR=* EXTRACT=TXT
SAVEAS TYPE=EXTRACT FOLDER=* FILE=gibsondunn.csv
TAB CLOSE
TAB T=1
I would really appreciate your help.

Without using the JavaScript Scripting Interface, the simplest way to make your task easier is to stop counting by hand the number of lines you have to extract. The macro below counts them and shows the number in a prompt. Save it in a separate .iim file, run it on the results page, and then enter the value it reports as the max loop value before running your own macro.
SET !EXTRACT_TEST_POPUP NO
URL GOTO=javascript:{var<SP>els=window.document.getElementsByTagName("td");var<SP>n=0;for(i=0;i<els.length;i++){if(els[i].className=="gd_nameColumnData")n++}n;}
URL GOTO=javascript:{window.history.back()}
TAG POS=1 TYPE=HTML ATTR=* EXTRACT=TXT
BACK
PROMPT {{!EXTRACT}}
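The javascript: URL in the macro above is hard to read on one line. Spread out, its counting logic looks roughly like the sketch below. The function name countByClass and the plain objects standing in for the td elements are illustrative assumptions, not part of iMacros; on a real page you would pass document.getElementsByTagName("td") instead.

```javascript
// Count elements whose className equals `className` exactly,
// mirroring the loop inside the javascript: URL above.
// `elements` is any array-like of objects with a className property
// (a hypothetical stand-in for document.getElementsByTagName("td")).
function countByClass(elements, className) {
  let n = 0;
  for (let i = 0; i < elements.length; i++) {
    if (elements[i].className === className) n++;
  }
  return n;
}

// Stand-in data for illustration only.
const cells = [
  { className: "gd_nameColumnData" },
  { className: "other" },
  { className: "gd_nameColumnData" },
];
console.log(countByClass(cells, "gd_nameColumnData"));
```

Like the original, this compares the whole className string, so a cell that carries additional classes would not be counted.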

Related

Google Tag Manager link click image

I'm using Google Tag Manager to send an event to Google Analytics when you click an <a> or a <button> tag, which works fine.
However I also have an <a> tag wrapping an <img> and the event is not firing because the element getting clicked is not the <a> tag, it's the <img> tag.
I've tried to move my .sau-btn to the <img> tag, but this does not work either. I've also tried to position my <a> tag as a hotspot on top of it.
<v-img :src="require('#/assets/img/sales-banner-enterprise.jpg')" class="gtm-btn" data-gtm-category="Sales Offer" data-gtm-action="Test Link" data-gtm-label="Image Real Link One" alt="alt tag"></v-img>
How can I get my event to fire when you click anything inside the element that carries the class?
If you do not want the parent anchor tag to receive the click made on the image tag, you can call event.stopPropagation() on the event.
By default, an event travels down to the most deeply nested element during the capturing phase and then bubbles back up through its ancestors. Here is a link to the API:
https://developer.mozilla.org/en-US/docs/Web/API/Event/stopPropagation
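For the question as asked (a click anywhere inside the tagged element should count), one common approach is to walk up from the clicked node until an ancestor carrying the tracking class is found. The sketch below assumes the class is gtm-btn, as in the markup above; findTrackedAncestor is a hypothetical helper, and the plain objects stand in for DOM nodes. In a browser, event.target.closest(".gtm-btn") performs the same walk.

```javascript
// Walk up from the clicked node to the nearest node (itself included)
// whose class list contains `className`. Returns null if none is found.
// Here classList is a plain array; a real DOM classList is a
// DOMTokenList and would use contains() instead of includes().
function findTrackedAncestor(node, className) {
  for (let cur = node; cur; cur = cur.parentNode) {
    if (cur.classList && cur.classList.includes(className)) return cur;
  }
  return null;
}

// Stand-ins for an <img> nested inside an <a class="gtm-btn">.
const anchor = { tag: "a", classList: ["gtm-btn"], parentNode: null };
const img = { tag: "img", classList: [], parentNode: anchor };

const hit = findTrackedAncestor(img, "gtm-btn");
console.log(hit ? hit.tag : "no tracked ancestor");
```

With this kind of lookup the data-gtm-* attributes can stay on the outer element, and the handler reads them from whatever node the walk returns.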

Skip A TAG Having Same Class in Imacros

I have created a script that works fine, but I am stuck on one issue that I cannot figure out how to tackle, so I would like your help.
I want to click a button, but there are several buttons with exactly the same class on the page, so the script just clicks the first one it finds. I want to skip that first button and click the second one.
So in short
button 1 <-- skip
button 2 <-- click
(both have same class="buttons")
Below is the code:
TAG POS=1 TYPE=SPAN ATTR=TXT:NEXT
WAIT SECONDS=6
TAG POS=1 TYPE=INPUT:TEXT FORM=NAME:NoFormName ATTR=ID:knowledge-preregistered-email-response CONTENT={{!COL4}}
TAG POS=1 TYPE=SPAN ATTR=TXT:NEXT
WAIT SECONDS=7
TAG POS=1 TYPE=BUTTON ATTR=CLASS:yt-uix-button*yt-uix-button-size-default*yt-uix-button-default*yt-uix-button-empty*yt-uix-button-has-icon*no-icon-markup*comment-action-buttons-renderer-thumb*yt-uix-sessionlink*sprite-comment-actions*sprite-like*i-a-v-sprite-like*
WAIT SECONDS=7
This is the button class
> ATTR=CLASS:yt-uix-button*yt-uix-button-size-default*yt-uix-button-default*yt-uix-button-empty*yt-uix-button-has-icon*no-icon-markup*comment-action-buttons-renderer-thumb*yt-uix-sessionlink*sprite-comment-actions*sprite-like*i-a-v-sprite-like*
So I want to skip the first button with this class, move on to the next one, and click that.
Damn, I'm so silly. I solved it myself :) I just changed TAG POS=1 to TAG POS=2.

How to extract data with iMacros from a website whose TAG POS=x of the same element is variable between different webpages?

I wish to extract data from a website that contains multiple webpages by searching in the website according to a list of keywords defined in a datasource .csv.
iMacros should enter sequentially in each individual page, grab certain elements on each webpage and save data in a csv. The elements to be extracted are the same in between all webpages.
My problem is that the TAG POS=x does not remain the same for an element when moving from webpage to webpage.
E.g., on one page an HTML element is matched by TAG POS=95 TYPE=SPAN ATTR=* EXTRACT=TXT,
while on another page the same element is matched by TAG POS=96 TYPE=SPAN ATTR=* EXTRACT=TXT.
The only possibility I can think of is to pick the elements by their text.
Questions:
Does the TXT attribute, like TXT:Manufacturer (or possibly TXT:Manufacturer*), permit the selection without knowing the exact TAG POS=?
Is there another way to do this kind of extraction with iMacros (variable position of the tag for the same HTML element across pages)?
Thank you.
You can use a TAG command like the one below. It extracts the text of the element whose text starts with "Manufacturer", irrespective of its position.
TAG POS=* TYPE=SPAN ATTR=TXT:Manufacturer* EXTRACT=TXT
(1) Generally speaking, that depends on the website being scraped. Nevertheless, you can try a command such as this:
TAG POS=1 TYPE=SPAN ATTR=TXT:Manufacturer* EXTRACT=TXT
(2) If you know these tag positions exactly, the following code may help as well:
SET !ERRORIGNORE YES
SET !TIMEOUT_STEP 0
TAG POS=95 TYPE=SPAN ATTR=* EXTRACT=TXT
TAG POS=96 TYPE=SPAN ATTR=* EXTRACT=TXT
' other commands with extraction
SET !TIMEOUT_STEP 6
SET !ERRORIGNORE NO
SET !EXTRACT EVAL("'{{!EXTRACT}}'.replace(/\\[EXTRACT\\]|#EANF#/g, '').trim();")
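The EVAL line at the end strips the [EXTRACT] separators and the #EANF# placeholders that the failed TAG commands leave behind. The same expression can be tried outside a macro in plain JavaScript. This is only a sketch: cleanExtract is a hypothetical name, and the sample string merely imitates what !EXTRACT might hold when POS=95 fails and POS=96 succeeds.

```javascript
// Remove iMacros extraction separators and "not found" markers,
// then trim surrounding whitespace. Same regex as in the EVAL line,
// written with single backslashes because this is a regex literal,
// not a string inside an iMacros command.
function cleanExtract(raw) {
  return raw.replace(/\[EXTRACT\]|#EANF#/g, "").trim();
}

// Illustrative value only: first TAG missed, second one matched.
const sample = "#EANF#[EXTRACT]Manufacturer: Acme";
console.log(cleanExtract(sample)); // Manufacturer: Acme
```

Note that the separators are removed rather than replaced, so if both TAG commands happened to match, their texts would be joined with nothing in between.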

Accessing dynamically created iframe within imacros

I need help with implementing an imacros script.
My basic script looks like this:
VERSION BUILD=8940826 RECORDER=FX
TAB T=1
URL GOTO=URL
TAG POS=1 TYPE=INPUT:TEXT ATTR=NAME:login-custnum CONTENT=12345
TAG POS=1 TYPE=INPUT:TEXT ATTR=NAME:login-username CONTENT=myuser
SET !ENCRYPTION NO
TAG POS=1 TYPE=INPUT:PASSWORD ATTR=NAME:login-password CONTENT=password
TAG POS=1 TYPE=BUTTON ATTR=NAME:login-login
This script works, the login is performed.
After this I need to use one of 3 iframes.
I cannot use
FRAME NAME="menu_iframe"
because the frames are created dynamically and NOT statically.
I tried the following:
var myframe = window.frames["menu_iframe"];
But this does not work.
After that I want to click a certain button in that iframe.
Thanks in advance
First of all, I suggest checking the frame names. Try the one-line macro below. It should show the names of all frames on the page in an alert dialog.
URL GOTO=javascript:{window.location.href='javascript:{var<SP>f=[];var<SP>l=window.frames.length;for(i=0;i<l;i++){try{f.push("\\""+window.frames[i].name+"\\"");}catch(e){f.push("no_frame_name")}}alert("FOUND<SP>"+f.length+"<SP>FRAMES:"+"\\n\\n"+f.join("\\n"));}';undefined;}
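Unpacked, the JavaScript squeezed into that URL GOTO line does roughly the following. This is a sketch under assumptions: listFrameNames is a hypothetical name, and the array of plain objects stands in for window.frames; the getter that throws imitates a frame whose name cannot be read.

```javascript
// Collect the quoted name of every frame, falling back to a marker
// string when reading the name throws, as the one-line macro does.
function listFrameNames(frames) {
  const names = [];
  for (let i = 0; i < frames.length; i++) {
    try {
      names.push('"' + frames[i].name + '"');
    } catch (e) {
      names.push("no_frame_name");
    }
  }
  return names;
}

// Stand-ins for window.frames, for illustration only.
const fakeFrames = [
  { name: "menu_iframe" },
  { get name() { throw new Error("blocked"); } },
];
const found = listFrameNames(fakeFrames);
console.log("FOUND " + found.length + " FRAMES:\n\n" + found.join("\n"));
```

If the dialog shows empty names (""), the frames have no name attribute and would have to be addressed by index rather than by name.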

(iMacros) How to scrape usernames this way?

I managed to make it work, but it also scrapes the name
inside the box.
All I want it to scrape is the username "nekoakatsuki".
The code I use to scrape the username box is this:
TAG POS=1 TYPE=DIV ATTR=CLASS:infolist&&TXT:* EXTRACT=TXT
It scrapes everything in "infolist"; the markup below shows
what it grabs, which also includes the name:
<div class="infolist">
<strong>nekoakatsuki</strong>
<br>
<span class="fullname">Jennifer Sandoval</span>
</div>
So how would I scrape only the username and not the name as well?
Website I'm using for this is http://web.stagram.com/tag/anime/?vm=grid
TAG POS=1 TYPE=SPAN ATTR=CLASS:fullname EXTRACT=TXT
Try this.
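Going by the markup in the question, the TAG command above returns the full name ("Jennifer Sandoval") rather than the username. It can still help: if the full name is extracted alongside the whole infolist text, removing one from the other leaves the username. A sketch in JavaScript; usernameFrom is a hypothetical helper, and the exact whitespace iMacros places between the two parts is an assumption.

```javascript
// Given the text of the whole "infolist" div and the text of the
// "fullname" span, return what is left after cutting the full name
// out of the div text (the last occurrence, since the span comes
// after the username in the markup).
function usernameFrom(infolistText, fullName) {
  const idx = infolistText.lastIndexOf(fullName);
  if (idx === -1) return infolistText.trim();
  const rest = infolistText.slice(0, idx) + infolistText.slice(idx + fullName.length);
  return rest.trim();
}

// Illustrative values taken from the markup in the question.
console.log(usernameFrom("nekoakatsuki\nJennifer Sandoval", "Jennifer Sandoval"));
```

Inside a macro, the same replacement could be done with an EVAL expression on the two extracted values.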

Resources