Chrome crashes when using Robot Framework

Hi there. I am scraping ad data from OLX using Robot Framework, with Chrome as my browser. The process works like this: it iterates over the ads, clicks each ad, scrapes the data, returns to the previous page, and repeats. The problem is that after a certain point, say after scraping the data for the 320th ad, the browser crashes. Is there any way I can stop this, or any way I can continue my iteration after the crash? I'm new at this.
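A common workaround is to restart the browser every so often and keep track of how far you've got, so a crash (or a deliberate restart) only costs you the current ad. Here is a minimal sketch of that pattern in plain Python + Selenium (the same idea works with Robot Framework's SeleniumLibrary keywords); LISTING_URL, AD_LINK_SELECTOR and scrape_ad are hypothetical placeholders:

from selenium import webdriver
from selenium.webdriver.common.by import By

LISTING_URL = "https://www.olx.com/ads"   # hypothetical listing page
AD_LINK_SELECTOR = "a.ad-link"            # hypothetical selector for ad links
RESTART_EVERY = 100                       # restart Chrome before memory builds up

def scrape_ad(driver):
    # placeholder: pull whatever fields you need from the open ad page
    return {"title": driver.title}

def scrape_all(start_index=0):
    driver = webdriver.Chrome()
    results = []
    i = start_index
    try:
        while True:
            if i > start_index and i % RESTART_EVERY == 0:
                driver.quit()             # controlled restart instead of a crash
                driver = webdriver.Chrome()
            driver.get(LISTING_URL)
            links = driver.find_elements(By.CSS_SELECTOR, AD_LINK_SELECTOR)
            if i >= len(links):
                break
            links[i].click()
            results.append(scrape_ad(driver))
            i += 1                        # i is the resume point after a crash
    finally:
        driver.quit()
    return results

Persisting i (e.g. to a file) after each ad lets you rerun scrape_all(start_index=i) and pick up where the crash left off.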

Related

How to open 100000 different websites manually

I need to click into 100,000 different URLs to scrape different data about each website. The data has to be extracted manually because it is different on each website and doesn't follow a pattern.
My question is: is there any program or script where I can paste the 100,000 URLs and it just opens/preloads some of them in tabs or windows, so that when I close one tab the next URL opens in a new tab? That way I work in the main website tab, which takes me about 10 seconds to review, then press Ctrl+W and move on to the next URL.
This would save a lot of time compared with clicking each link manually and waiting for it to load.
You can use Python web scraping, or RPA if you don't know basic Python; by breaking the work into logical steps you can automate any number of tasks.
You can also make use of Python's pyautogui library to click on visual elements.
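Neither of those gives the exact close-one-tab-open-the-next behaviour asked about, but a rough batch-opening sketch with Python's standard webbrowser module gets close; urls.txt and BATCH_SIZE are assumptions:

import webbrowser

BATCH_SIZE = 5  # how many tabs to preload at a time

with open("urls.txt") as f:                     # hypothetical file, one URL per line
    urls = [line.strip() for line in f if line.strip()]

for start in range(0, len(urls), BATCH_SIZE):
    for url in urls[start:start + BATCH_SIZE]:
        webbrowser.open_new_tab(url)
    done = min(start + BATCH_SIZE, len(urls))
    input(f"Opened {done} of {len(urls)} URLs; press Enter for the next batch...")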

Classic ASP code using more resources

I am using classic ASP for a web application, running in Internet Explorer.
I have developed a few reports related to sales data. All the sales reports are linked from a Sales Dashboard. Every report has selection criteria such as customer, date period, product group, and a few others.
Now, the problem I am facing:
I open a total sales report for the entire year, which takes almost 15 minutes to load on screen. While that report is executing, if I try to open any other report from the Sales Dashboard, the page with the selection criteria only appears after the first report has completely finished. If I copy the link for the second report and open it in a new Internet Explorer window, it opens normally.
I am not able to trace the problem. Has anyone faced the same problem?
First, I agree with this comment posted under the question:
IIS/ASP only allows one concurrent request per session. This is why the second request does not happen until after the first one completes. If you open a new browser instance or a different browser then this is treated as a different session.
Second, if all that is being asked here is whether other people have similar issues or not, then the answer is yes, due to what johna said in the comment.
If you're looking for a way to get around that for yourself, the way described in the comment (open a new browser instance or a different browser) will work.
However, if you're after a way to bypass the 15-minute wait entirely, give some thought to preparing the data before the report is called. What I mean by that is either schedule the report to run after close of business each day and store the resulting HTML or data separately, and/or provide a button that prepares the report from current data, which the user can run whenever they want.
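As a language-agnostic illustration of that pre-generation idea (sketched here in Python purely for brevity; in practice this would be a scheduled task whose output the ASP dashboard links to), with run_sales_report as a hypothetical stand-in for the slow query:

import datetime

def run_sales_report():
    # hypothetical stand-in for the 15-minute query, rendered as HTML
    return "<html><body>...total sales report...</body></html>"

def prepare_report():
    html = run_sales_report()
    stamp = datetime.date.today().isoformat()
    with open(f"reports/total_sales_{stamp}.html", "w") as out:
        out.write(html)   # the dashboard links straight to this pre-built file

if __name__ == "__main__":
    prepare_report()      # schedule this after close of business each day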

Is it possible to link data from R forms with Google Drive SDK

I wanted to get some feedback on the plausibility of this project before I grind too many gears.
Try running:
install.packages("webutils")  # install the webutils package
library(webutils)
demo_rhttpd()                 # serves a demo HTML form on R's internal web server
Enter info into the form and, after submitting it, you get a text file back in the browser window.
Now, is it possible to get R code to host this form in Google Drive:
https://support.google.com/drive/answer/2881970?hl=en
If not, why?
and is it possible to get R code to store the submitted form data in a Google Spreadsheet:
https://developers.google.com/google-apps/spreadsheets/
If not, why?
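I can't speak to hosting the form from R inside Google Drive, but the spreadsheet half is certainly possible through the Sheets API. As a rough sketch of that route (shown in Python with the gspread client rather than R; the credentials file and spreadsheet name are hypothetical placeholders):

import gspread

def append_submission(fields):
    gc = gspread.service_account(filename="service-account.json")  # hypothetical credentials file
    ws = gc.open("Form responses").sheet1                          # hypothetical spreadsheet
    ws.append_row(list(fields.values()))                           # one row per submission

append_submission({"name": "Ada", "email": "ada@example.com"})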

Scrapy: How to recrawl a page after some time?

Being lazy, I'm trying to use Scrapy instead of implementing my own scraping service with celery + requests (been there, done that). Let's say I have a list of N pages that I'd like to monitor. After retrieving page X and reading its content, I want to tell the system to rescan it some time later (depending on its content), say once two hours have passed.
Is such a thing possible with Scrapy?
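At the whole-crawl level, yes: the Scrapy docs show how to drive crawls from a script with CrawlerRunner, and you can combine that with Twisted's reactor.callLater to re-run the spider on a timer. A minimal sketch, assuming a Scrapy project containing a spider named my_spider (per-page delays that depend on content would need more custom scheduling):

from twisted.internet import reactor, defer
from scrapy.crawler import CrawlerRunner
from scrapy.utils.log import configure_logging
from scrapy.utils.project import get_project_settings

RESCAN_SECONDS = 2 * 60 * 60   # revisit everything every two hours

configure_logging()
runner = CrawlerRunner(get_project_settings())

@defer.inlineCallbacks
def crawl_then_reschedule():
    yield runner.crawl("my_spider")   # hypothetical spider name
    reactor.callLater(RESCAN_SECONDS, crawl_then_reschedule)

crawl_then_reschedule()
reactor.run()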

Google Analytics: using the API, how can I tell the actual URL the user has come from?

I'm using the following to look at results from my Google analytics:
http://ga-dev-tools.appspot.com/explorer/?csw=1
I've got it up and running OK, but when I use ga:referralPath or ga:userDefinedValue it says (not set).
I expected to be able to see where the user has come from. For example:
if a user goes onto the Dove site and wants to buy a product, they click a Buy Now button, which gives them the retailers; when they click on a retailer they are taken to the add-to-basket page on the retailer's site, where my UA code is.
I want to be able to see that they have come from the Dove site. Is this possible?
By the way, here's a good link showing how to see the full referring URL (though at the moment I'm getting (not set)); if anyone knows how to get this working it would be perfect:
http://www.sebastienpage.com/2009/05/06/google-analytics-trick-see-the-full-referring-url/
After leaving the GA code on the site for a while we started getting revenue results back, and in ga:source and ga:referralPath within the e-commerce section I could see the results I expected. One thing that changed: the retailer who implemented our GA script told us they were getting an SSL certificate error because we did not have a certificate on our servers; once one was added, we seemed to get all the data through fine.
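For reference, once the data is flowing, the source and referral path can be pulled from the Core Reporting API (v3) along these lines, using google-api-python-client; the credentials setup is omitted and ga:12345678 is a hypothetical view (profile) ID:

from googleapiclient.discovery import build

def referral_report(credentials):
    service = build("analytics", "v3", credentials=credentials)
    return service.data().ga().get(
        ids="ga:12345678",                       # hypothetical view (profile) ID
        start_date="30daysAgo",
        end_date="today",
        metrics="ga:sessions",
        dimensions="ga:source,ga:referralPath",  # source + path = full referrer
    ).execute()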
