Web Scraping with Python using Selenium and ChromeDriver - web-scraping

I am using Selenium to scrape data from the web. However, when I run ChromeDriver it seems to run fine, yet no data has been extracted when I look into the CSV file I export at the end. Additionally, when I time.sleep the script for 5 seconds and right-click + Inspect in the browser, it seems to work fine. Does anybody know what the issue could be, and how I can fully automate the process without requiring the user to right-click + Inspect?
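A common cause of this symptom is that the page builds its data with JavaScript after the initial load, so the scraper reads the DOM before the data exists; the 5-second sleep masks that race. A minimal sketch of the usual fix, an explicit wait in place of the fixed sleep (the URL, locator, and CSV path below are placeholders, not the original script):
import csv
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("https://example.com/data")  # placeholder URL

# Wait up to 15 seconds for the rows to appear, instead of sleeping a fixed 5 seconds.
rows = WebDriverWait(driver, 15).until(
    EC.presence_of_all_elements_located((By.CSS_SELECTOR, "table tr"))  # placeholder locator
)

with open("output.csv", "w", newline="") as f:  # placeholder path
    writer = csv.writer(f)
    for row in rows:
        writer.writerow([cell.text for cell in row.find_elements(By.TAG_NAME, "td")])

driver.quit()
An explicit wait needs no manual right-click + Inspect; it simply polls until the element is present or the timeout expires.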

Related

Edge and Chrome flag my PyInstaller executable as a virus

I created a simple Python script that fetches cryptocurrency prices from a website and saves them to a CSV file. The script works fine and flawlessly, but when I compile it and try to send it to someone, or just create a release on GitHub, it gets flagged as a virus when downloaded in Edge and Chrome, but not Brave (I didn't test Firefox). I tried compiling with different PyInstaller versions, 3.6 and the latest release. I even tried compiling PyInstaller myself from the source code on GitHub, but the executable just keeps getting flagged as a virus. Then I tried pyarmor-webui and changed some settings; I know it uses PyInstaller, but I thought changing the settings would help. I was wrong.
I tried Nuitka with different settings, but I keep getting the same result.
I followed this writeup, but still no success.
I also tried changing the executable metadata with Resource Hacker, and that didn't seem to help.
I even tested with a simple hello-world script and still got the same result.
How do you fix this?
import random

print("Hello world!")
# print three random digits between 0 and 8
for i in range(3):
    print(random.randrange(0, 9))

Azure / R-server - head causes process to hang

I'm trying to use the head command on an HDFS data set in Azure, using R Server via RStudio. This has worked in the past, but in the last 2 days it seems to have stopped working. When I execute it, it tells me the process has started running, but it never prints anything.
rxGetInfo seems to work fine on the same data. Any ideas on how to check why this is happening?

Alternative to Selenium scripts to download web data

I have to extract web data every day. To automate this, I am using a Selenium recording script: if I run the recording script, it automatically downloads the data based on my recording. Now I want an alternative way to do this task, one that changes the dates and a few other settings on the website and then clicks the download button.
Thanks in advance.
#raju your question is very generic. Please make it more precise next time.
As I understand it, searching for parameterization in Selenium IDE should help you; a rough sketch of the same idea is below. These are very basic cases, so try this at least. If you still face an issue, post the query with the complete error log. We will be more than happy to help you.
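Here is that idea illustrated in plain Python Selenium rather than Selenium IDE (the URL, locators, and date format are assumptions, since the original site isn't given):
from datetime import date
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://example.com/report")  # placeholder URL

# Parameterize the date instead of replaying a fixed recording.
today = date.today().strftime("%d-%m-%Y")  # assumed date format
date_field = driver.find_element(By.ID, "report-date")  # placeholder locator
date_field.clear()
date_field.send_keys(today)

# Click the download button once the fields are filled in.
driver.find_element(By.ID, "download-btn").click()  # placeholder locator
Run from a daily cron job or Task Scheduler task, this removes the manual replay step entirely.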

RSelenium WebDriver TimeoutException after ClickElement

I'm using the Selenium WebDriver in R to parse some online data. I originally wrote the script a few months ago, and it worked great. However, when I ran it again today, I received the following error after running ClickElement():
Error: Summary: ScriptTimeout
Detail: A script did not complete before its timeout expired.
class: org.openqa.selenium.TimeoutException
I'm using Chrome as my browser, and have updated to the newest version (2.20) of ChromeDriver (I was using 2.19 when I wrote the script). This error is peculiar because it occurs pretty late in my script, after I have already used ClickElement() multiple other times. The element being clicked is a download button. Selenium completes the click and starts the download, but then throws the above error after a few minutes. At this point, the script continues.
I can only think of a few possible issues:
The ChromeDriver update has broken something. I've tried it with both 2.19 and 2.20, and I'm unsure how to test this further.
Some issue outside of my understanding of Selenium. From some experimenting and trying to Google similar problems, I've decided that it might have something to do with the download process itself, i.e. the driver freezes up because the download is currently running on the page.
I'm not sure what is going on, and I don't know enough about Selenium to troubleshoot it effectively. What can I do? I imagine that I'll need an alternative way to perform the download, or at least a way to click the element while ignoring the ScriptTimeout error. I receive the same error when I try to send the enter key to the element as well.
I have a similar issue, but I can't figure it out.
Try this after your clickElement() call:
Sys.sleep(2)
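The "click the element while ignoring the ScriptTimeout error" idea from the question is also workable. The thread uses RSelenium, but as an illustration of the pattern, this is roughly how it looks in Python's Selenium bindings (the locator is a placeholder):
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException

driver = webdriver.Chrome()
driver.set_script_timeout(10)  # fail fast instead of hanging for minutes

try:
    # The click starts the download; the page's scripts may never settle.
    driver.find_element(By.ID, "download-btn").click()  # placeholder locator
except TimeoutException:
    pass  # the download has already started, so it is safe to continue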

Programmatically creating PDF from webpage

I have my resume online as an HTML page (www.mysite.com/resume.html). Every time I need a PDF version of it, I use Google Chrome's print dialog to save it as a PDF, which uses my print CSS stylesheet.
I want to be able to navigate to www.mysite.com/resume.pdf and always get an up-to-date PDF version, without having to go through Google Chrome manually. Is there a way to programmatically and automatically create resume.pdf from the resume HTML? A script that runs once every 24 hours or something like that would be fine.
PhantomJS is perfect for this. It invokes an instance of WebKit from the command line, which can then be used to output to a file format such as PDF.
PhantomJS:
http://phantomjs.org/
Specific instructions for exporting screen output to a file:
http://phantomjs.org/screen-capture.html
e.g.:
phantomjs rasterize.js 'http://www.example.com/resume.html' resume.pdf
Chrome now has a headless mode.
With it, we can create a PDF. For example, on Windows, navigate your command line to
C:\Users\{{your_username}}\AppData\Local\Google\Chrome SxS\Application>
Then run the command:
chrome --headless --print-to-pdf="d:\\{{path and file name}}.pdf" https://google.com
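For the once-every-24-hours part of the question, one option is a small wrapper script that shells out to headless Chrome and is run by cron or Windows Task Scheduler. A minimal Python sketch (the Chrome binary path, URL, and output path are assumptions):
import subprocess

CHROME = r"C:\Program Files\Google\Chrome\Application\chrome.exe"  # assumed install path
URL = "https://www.mysite.com/resume.html"
OUTPUT = r"d:\resume.pdf"  # assumed output path

# Print the page to PDF with headless Chrome, raising an error if Chrome fails.
subprocess.run([CHROME, "--headless", f"--print-to-pdf={OUTPUT}", URL], check=True)
Scheduling this script daily keeps resume.pdf up to date without any manual step.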
If you are looking to do this server-side via PHP, might I recommend the Browsershot library, which leverages Puppeteer (a NodeJS package) and the headless Chrome/Chromium browser?
It works really well and is way easier to install and get going than wkhtmltopdf (another option, one that doesn't rely on a NodeJS package or a headless Chrome/Chromium browser). Personally, I recommend the Browsershot solution, since wkhtmltopdf has issues depending on the type of server (Linux distro and version) you're running; the only reliable way to install wkhtmltopdf that I've found is to download and compile it from source on the server you're running it on, not through a package manager.
Also, if you happen to need this specifically in a Laravel project, there's a wrapper library for Browsershot available.
Check out this tutorial to get started.
