Selenium 2 - Setting user agent for IE and Chrome - webdriver

I need to change the user agent value in IE and Chrome for some of our tests. The only selenium 2 examples I've come across only work with FirefoxDriver.
Has anyone managed to change the user agent for IE and Chrome?
Mark

I know this is veery old by now, but I stumbled across it and a few seconds ago and I also found the real solution (at least for the latest version of Selenium).
So here we go (Python, example faking the iPad UA):
from selenium import webdriver
options = webdriver.ChromeOptions()
options.add_argument('--user-agent=Mozilla/5.0 (iPad; CPU OS 5_0 like Mac OS X) AppleWebKit/534.46 (KHTML, like Gecko) Version/5.1 Mobile/9A334 Safari/7534.48.3')
driver = webdriver.Chrome(chrome_options=options)
# ...loads of fun...
I hope this is helpful for anyone else having the same problem. Oh, and it also works with all the other Chrome command line options. Njoy ;)

This is how I got it running in python for Chrome.
from selenium import webdriver
...
def setUp(self):
capabilities = webdriver.DesiredCapabilities.CHROME
capabilities["chrome.switches"] = ["--user-agent="+USER_AGENT_STRING]
cls.driver = webdriver.Chrome(executable_path="servers/chromedriver",desired_capabilities=capabilities)
self.driver.implicitly_wait(5)
self.verificationErrors = []

Here's an answer for PHP:
$options = new ChromeOptions();
$options->addArguments(['--user-agent=my fake user-agent string']);
$capabilities = DesiredCapabilities::chrome();
$capabilities->setCapability(ChromeOptions::CAPABILITY, $options);
$driver = RemoteWebDriver::create($host,$capabilities);

I finally found out how to do that at least in chrome:
capabilities = webdriver.common.desired_capabilities.DesiredCapabilities.CHROME.copy()
capabilities['javascriptEnabled'] = True
options = webdriver.ChromeOptions()
options.add_argument('--user-agent=<YOUR USER AGENT HERE>')
driver = webdriver.Remote(command_executor='http://<YOUR SELENIUM HUB HERE>:4444/wd/hub',desired_capabilities=capabilities, options=options)
Sources:
https://gist.github.com/thureos/2db0bc44589669a00c22a86503c80bbb
https://seleniumhq.github.io/selenium/docs/api/py/webdriver_remote/selenium.webdriver.remote.webdriver.html?highlight=remote

Related

Newspaper3k, User Agents and Scraping

I'm making text files consisting of the author, date of publication and main text of news articles. I have code to do this, but I need for Newspaper3k to identify the relevant information from these articles first. Since user agent specification has been an issue before, I also specify the user agent. Here's my code so you can follow along. This is version 3.9.0 of Python.
import time, os, random, nltk, newspaper
from newspaper import Article, Config
user_agent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
config = Config()
config.browser_user_agent = user_agent
url = 'https://www.eluniversal.com.mx/estados/matan-3-policias-durante-ataque-en-nochistlan-zacatecas'
article = Article(url, config=config)
article.download()
#article.html #
article.parse()
article.nlp()
article.authors
article.publish_date
article.text
To better understand why this case is particularly puzzling, please substitute the link I've provided above with this one, and re-run the code. With this link, the code now runs correctly, returning the author, date and text. With the link in the code above, it doesn't. What am I overlooking here?
Apparently, Newspaper demands that we specify the language we're interested in. The code here still doesn't extract the author for some strange reason, but this is enough for me. Here's the code, if anyone else would benefit from it.
#
# Imports our modules
#
import time, os, random, nltk, newspaper
from newspaper import Article
from googletrans import Translator
translator = Translator()
# The link we're interested in
url = 'https://www.eluniversal.com.mx/estados/matan-3-policias-durante-ataque-en-nochistlan-zacatecas'
#
# Extracts the meta-data
#
article = Article(url, language='es')
article.download()
article.parse()
article.nlp()
#
# Makes these into strings so they'll get into the list
#
authors = str(article.authors)
date = str(article.publish_date)
maintext = translator.translate(article.summary).text
# Makes the list we'll append
elements = [authors+ "\n", date+ "\n", maintext+ "\n", url]
for x in elements:
print(x)

pyserial code working on Windows (COM1) but not on Linux (/dev/ttyS0)

I am using python 3.6.
The following code works just fine on Windows 10 Pro:
import serial
import binascii
ser = serial.Serial("COM1") # "COM1" will be "/dev/ttyS0" on Linux
if ser.is_open == True:
print("COM open")
ser.baudrate = 2400
print('Port configuration:')
print('baudrate:',ser.baudrate)
print('parity:',ser.parity)
print('stopbits:',ser.stopbits)
print('bytesize:',ser.bytesize)
print('xonxoff:',ser.xonxoff)
print('timeout:',ser.timeout)
print()
print('sending...')
frame = bytearray()
frame.append(0x7e)
frame.append(0x03)
frame.append(0x02)
frame.append(0x21)
frame.append(0x00)
frame.append(0xa4)
ser.write(frame)
print(binascii.hexlify(frame))
print()
print('receiving...')
recv = ser.readline()
recv_len = len(recv)
print(binascii.hexlify(recv))
print()
ser.close()
if ser.is_open == False:
print("COM closed")
But it gets stuck at 'ser.readline()' when I run it under CentOS 6.8, as there was no cable attached to the port.
It looks like a trivial issue, but I cannot figure out what's wrong or missing.
If you cannot either, I hope the sample code can result useful to someone at least.
False problem. The code worked using ttyS1 instead of ttyS0 (I knew it was something trivial).
Anyway, very useful to check
cat /proc/tty/driver/serial
which shows tx/rx statistics and parameters as DTS, RTS, RI, etc. next to each port.
For example, next to the ttyS1 I noticed an 'RI' which was the same parameter that Hercules terminal on Windows showed me (graphically) when I tried to open COM1. Very intuitive to identify a serial port this way!

Silencing ChromeDriver.exe logging

I am running ruby unit tests against Chrome using watir-webdriver. Whenever a test is run and chromedriver.exe is launched output similar to below appears:
Started ChromeDriver
port=9515
version=26.0.1383.0
log=C:\Home\Server\Test\Watir\web\chromedriver.log
[5468:8796:0404/150755:ERROR:accelerated_surface_win.cc(208)] Reseting D3D device
[5468:8996:0404/150758:ERROR:textfield.h(156)] NOT IMPLEMENTED
[WARNING:..\..\..\..\flash\platform\pepper\pep_module.cpp(63)] SANDBOXED
None of this impacts the correct functioning of the tests, but as one might imagine the appearance of "ERROR" and "WARNING" might be rather confusing to, for example, parsing rules in Jenkins looking for failures. Sure I can get really fancy with regular expression in the parsing rules, but it would be really nice to turn off this verbose and unnecessary logging on the part of chromedriver.exe. I have seen many mentions of this searching for an answer. No one has come up with a solution. Yes, chromedriver possibly has a "--silent" option, but there seems to be no way to pass that to the executable. Code similar to below is supposed to work, but has zero effect as far as I can see. Any ideas?
profile = Selenium::WebDriver::Chrome::Profile.new
profile['--cant-make-any-switches-work-here-how-about-you'] = true
browser = Watir::Browser.new :chrome, :profile => profile, :switches => %w[--ignore-certificate-errors --disable-extensions --disable-popup-blocking --disable-translate--allow-file-access]
Here's help for anyone else searching
Find ...selenium\webdriver\chrome\service.rb
Path start may differ on your system
And I added "-silent" to the passed parameters .... However, this silenced everything but the error/warning messages.
def initialize(executable_path, port)
#uri = URI.parse "http://#{Platform.localhost}:#{port}"
server_command = [executable_path, " -silent", "--port=#{port}"]
#process = ChildProcess.build(*server_command)
#socket_poller = SocketPoller.new Platform.localhost, port, START_TIMEOUT
#process.io.inherit! if $DEBUG == true
end
set chromeOptions with key --log-level=3 this should shut it up
I was able to divert the hundreds, yes hundreds, of chrome driver log messages that were showing up in cucumber stdout by using the :service_log_path argument.
#browser = Watir::Browser.new :chrome, :service_log_path => 'chromedriver.out'
the '-silent', or '--silent', or ' -silent', or ' --silent' parameter suggested above did nothing when I added it to ...selenium\webdriver\chrome\service.rb. And having to tweak the gem itself is not a particularly viable solution.
I couldn't find a place to capture the chromedriver stderr and divert it to null (not to mention having to handle doing that in windows and in *nix/osx)
The driver should default to something way less verbose. In this case INFO is way too verbose as hundreds of log entries pop out as INFO, 90%+ of them identical.
At least the :service_log_path argument works most of them.
You can try -Dwebdriver.chrome.logfile="/dev/null" and/or -Dwebdriver.chrome.args="--disable-logging" to the options of java that runs selenium-server-standalone-what.ever.jar

PyQt QWebkit Javascript Function.bind does not exist (ECMAScript 5 missing functions)

Javascript in a web application runs the following loop:
for (var name in this) {
if(typeof(this[name]) == "function") {
if((/^on_|^do_/).test(name)) {
console.debug("Adding ", name, " to ", this, "(", this[name], ")");
f = this[name].bind;
console.debug(f);
this[name] = this[name].bind(this);
}
}
}
Under Chrome 24.0.1312.56, the line f = this[name].bind correctly sets f to the native code function.bind(), while in my QWebKit Qt application it sets f to 'undefined'.
https://developer.mozilla.org/en-US/docs/JavaScript/Reference/Global_Objects/Function/bind
Any idea how I'd be able to convince QtWebkit to behave correctly here?
Apparently, Function.prototype.bind is part of ECMAScript 5. It's implementation in webkit should be covered by (fixed bug): https://bugs.webkit.org/show_bug.cgi?id=26382
Perhaps there is a mode to enable ECMAScript 5 that i'm missing?
Apparently i'm using version 534.34 for QtWebkit:
(Pdb) str(QtWebKit.qWebKitVersion())
'534.34'
Which according to this:
https://trac.webkit.org/changeset/85696/trunk/Source/WebKit/mac/Configurations/Version.xcconfig
Corresponds to revision 85696. Combined with the comment in the above bug ("Fixed in r95751"), seems like I need a newer version, specifically anything better than 535.5. Now to find what version of PyQt uses that...
Thanks.
It seems that the latest version of PyQt (4.9.6-1) is compiled against wekbit version 534.34.
The first release of webkit that supports Function.prototype.bind is 535.5.
In addition, it seems that both PySite 1.2.2 and PyQt 4.9.6-1 report webkit version 535.34, and do not have Function.prototype.bind.
Try using the following code which forces you to use Function.prototype.bind
this[name] = Function.prototype.bind.call(this[name], this)
In IE, some of the host objects don't have a bind method on their methods (functions)... could be something related.

Retrieving URL using RCurl gives different date format than in browser

I am attempting to scrape a mobile-formatted webpage using RCurl, at the following URL:
http://m.fire.tas.gov.au/?pageId=incidentDetails&closed_incident_no=161685
Using this code:
library(RCurl)
options( RCurlOptions = list(verbose = TRUE, useragent="Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.13 (KHTML, like Gecko) Chrome/0.A.B.C Safari/525.13"))
inurl <- getURL(http://m.fire.tas.gov.au/?pageId=incidentDetails&closed_incident_no=161685)
Note that I have attempted to set the user-agent to look like a Chrome browser - the results I get are the same with or without doing this. When I view the URL in Chrome, the dates come out formatted like this, with a time stamp as well:
And the HTML source matches that:
Last Updated: 24-Aug-2009 11:36<br>
First Reported: 24-Aug-2009 11:24<br>
But within R, after I've retrieved the data from the URL, the dates are formatted like this:
Last Updated: 2009-08-24<br>
First Reported: 2009-08-24<br>
Any ideas what's going on here? I figure the server is responding to the browser/Curl's user-agent or region or language or something similar, and returning different data, but can't figure out what I need to set in RCurl's options to change this.
Looks like the server is expecting 'Accept-Language' header:
library(RCurl)
getURL("http://m.fire.tas.gov.au/?pageId=incidentDetails&closed_incident_no=161685",
httpheader = c("Accept-Language" = "en-US,en;q=0.5"))
works for me (returns First Reported: 24-Aug-2009 11:24<br> etc.). I discovered this by using HttpFox Firefox plugin.

Resources