Is there any difference in performance when using Googlebot vs Mozilla as the CURLOPT_USERAGENT? I'm hypothesizing that some pages might output simpler HTML when Googlebot is the user agent, but I don't really know.
Setting a user agent won't make your cURL call faster. In fact, some servers block requests with suspicious user agents, so just using the default user agent is fine.
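For illustration, here is a minimal Python sketch (standard library rather than PHP's cURL bindings; the URL is a placeholder) showing that the user agent is just a request header you set. Any speed difference comes from what the server chooses to send back, not from the header itself.

```python
import urllib.request

# Hypothetical example: build one request with urllib's default user
# agent and one spoofing Googlebot. Nothing is sent over the network.
GOOGLEBOT_UA = "Googlebot/2.1 (+http://www.google.com/bot.html)"

default_req = urllib.request.Request("https://example.com/")
spoofed_req = urllib.request.Request(
    "https://example.com/",
    headers={"User-Agent": GOOGLEBOT_UA},
)

# urllib only fills in its default UA at send time, so the plain
# request carries no User-Agent header yet; the spoofed one does.
print(spoofed_req.get_header("User-agent"))
```

In PHP the equivalent is `curl_setopt($ch, CURLOPT_USERAGENT, $ua);` — either way, the header is the only thing that changes on the wire.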
I am exploring the Google PageSpeed Insights API, and in the response I see a field called:
{
...
lighthouse.userAgent:'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/84.0.4147.140 Safari/537.36'
...
}
Docs mention userAgent="The user agent that was used to run this LHR."
https://developers.google.com/speed/docs/insights/rest/v5/pagespeedapi/runpagespeed#LighthouseResultV5
What does that mean? How is this performance aggregated by running on all browsers?
PS: This is for Desktop version.
What does that mean?
This lets you know what browser was used to run the test.
It is useful if you believe there is an issue with Lighthouse (a bug in your report), as you can then test directly in the same browser that Lighthouse uses.
There is also the "environment" object which contains how Lighthouse presented itself (it sent a header saying "treat me like this browser") to the website that was being tested. (lighthouseResult.environment.networkUserAgent)
This is useful so you can check your server isn't blocking requests for that user agent etc.
It is also useful for checking your server logs to see what requests Lighthouse made etc. etc.
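To make the distinction between the two fields concrete, here is a sketch that pulls both out of a shortened, partly made-up API response (the `networkUserAgent` value below is hypothetical):

```python
import json

# Shortened, partly hypothetical PageSpeed Insights payload; the real
# response contains many more fields.
raw = json.dumps({
    "lighthouseResult": {
        "userAgent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/84.0.4147.140 Safari/537.36",
        "environment": {
            # Hypothetical value for illustration:
            "networkUserAgent": "Mozilla/5.0 (Linux; Android) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0 Mobile Safari/537.36"
        },
    }
})

result = json.loads(raw)["lighthouseResult"]
print(result["userAgent"])                        # the browser that ran the audit
print(result["environment"]["networkUserAgent"])  # the header sent to your server
```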
See the Wikipedia page for user agent for more info on user agents
How is this performance aggregated by running on all browsers?
As for your second question, it doesn't quite make sense as phrased: the test is run in the single browser named in userAgent, not aggregated across browsers. The user agent string has no impact on performance unless your server does something different for that particular string, if that is what you mean.
While it is important that people use approved browsers downloaded from legitimate sites, is there any way for the server to detect if someone is spoofing the browser (user agent)?
My question is in particular reference to security. What if someone creates a browser (user agent) and does not respect some contracts (for example, Same origin policy of cookies) to exploit vulnerabilities there? This illegitimate browser can claim that it is a genuine user agent by populating the User-Agent header with standard values used in Firefox or Chrome.
Is there any way on the server side to detect whether the user is using a spoofed user agent, so that the server can take countermeasures if needed? Or is it solely the responsibility of the individual to use approved browsers (i.e. servers have no way to detect it)?
Browsers are just high-level user interfaces to HTTP. Prior to the introduction of various security methods, there was not much in place to prevent such attacks. Nowadays browsers (Chrome, Firefox, Edge) have restrictions and abide by certain rules/contracts (in order to work properly).
One can spoof (send) anything with an HTTP client (a very lean "browser") such as cURL.
curl -A "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:59.0) Gecko/20100101 Firefox/59.0" http://blah.nonexistent.tld
(The default curl header is something like User-Agent: curl/7.16.3)
A server could restrict or detect uncommon User-Agents to prevent scraping or scanning, but the user agent is nothing but a "name" and could just be changed to a common one as done above.
The security methods (contracts) that have been added, such as the Same-Origin Policy, Cross-Origin Resource Sharing (CORS), and HttpOnly cookies, are there to protect the client (browser) and server. They must be implemented by both in order to function properly (securely) and protect against an attack like the one mentioned. If the client and the server don't properly honour the contracts agreed upon, then cookies could be exfiltrated (a modern browser is designed to fail fast and would still prevent this).
If you mean that you would create your own browser, set its User-Agent to Chrome's, and ignore the contracts, then properly configured servers might simply refuse you. And what user cookies would you steal from a "custom" browser that few people use?
Recently I put some hidden links on a web site in order to trap web crawlers (I used the CSS visibility: hidden style so human users would not see them).
Anyway, I found plenty of HTTP requests to the hidden links that carried browser-like user agent strings.
E.g.: "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.31 (KHTML, like Gecko) Chrome/26.0.1410.64 Safari/537.31"
So now my problems are:
(1) Are these web crawlers? If not, what else could they be?
(2) Are they malicious?
(3) Is there a way to profile their behaviour?
I searched on the web but couldn't find any valuable information. Can you please point me to some resources? Any help would be appreciated.
That is an HTTP user agent string, and it is not malicious in itself. It follows the standard pattern, for example Mozilla/<version> and so on; a browser is one kind of user agent. However, user agent strings can be abused by attackers, and this can be identified by looking at anomalies. You can read this paper.
The Hypertext Transfer Protocol (HTTP) identifies the client software originating the request, using a "User-Agent" header, even when the client is not operated by a user.
The answer to your questions are, in order:
The strings you saw are user agent strings (a common term in web development), not proof of what the client actually is; a crawler can send a browser's user agent string.
Generally they aren't malicious, but they can be; as I suggested, look at the paper.
I don't quite understand what you mean by profiling their behaviour; the strings themselves aren't malware. To profile the clients, you would need to analyse their requests in your server logs.
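If you want to triage the user agent strings in your logs, a simple first pass is matching them against well-known crawler tokens. Note the limitation: a crawler spoofing a browser string, as in your example, will pass straight through such a check. The pattern list is illustrative only.

```python
import re

# A first-pass triage sketch for server logs: flag user agents that
# match well-known crawler tokens. Illustrative, not exhaustive.
BOT_PATTERNS = re.compile(r"googlebot|bingbot|crawler|spider|curl|wget", re.I)

def classify(user_agent: str) -> str:
    return "likely bot" if BOT_PATTERNS.search(user_agent) else "browser-like"

ua = "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.31 (KHTML, like Gecko) Chrome/26.0.1410.64 Safari/537.31"
print(classify(ua))  # browser-like
```

A client with a "browser-like" string that still follows links invisible to humans is behaving like a crawler regardless of what it calls itself, which is the real signal your hidden-link trap gives you.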
I have a really buggy web application at work. In order to avoid using its interface, I want something that will save the HTTP requests I send with it, and enable me to resend them whenever I want. Do you know of anything that does that? Maybe there is an add-on for Firefox (I searched, but didn't find one)?
I need to be able to do this on Linux.
You can use Fiddler to intercept HTTP requests and responses between the browser and the server.
Fiddler also supports handcrafting and sending HTTP requests using its Request Builder feature.
Try iMacros for Firefox.
According to the plugin description:
"Automate Firefox. Record and replay repetitious work. If you love the Firefox web browser, but are tired of repetitive tasks like visiting the same sites every days, filling out forms, and remembering passwords, then iMacros for Firefox is the solution you’ve been dreaming of!"
https://addons.mozilla.org/en-US/firefox/addon/3863/
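If no add-on fits, a saved request can also be replayed from a short script. Below is a minimal sketch (the sample request is made up) that parses a raw request dump into pieces you could then feed to Python's http.client to resend it:

```python
# A minimal sketch: parse a saved raw HTTP request into pieces that can
# be resent programmatically (e.g. via http.client). The sample request
# below is made up.
def parse_raw_request(raw: str):
    head, _, body = raw.partition("\r\n\r\n")
    request_line, *header_lines = head.split("\r\n")
    method, path, version = request_line.split(" ")
    headers = dict(line.split(": ", 1) for line in header_lines)
    return method, path, headers, body

raw = (
    "POST /api/save HTTP/1.1\r\n"
    "Host: buggy.example\r\n"
    "Content-Type: application/json\r\n"
    "\r\n"
    '{"id": 1}'
)
method, path, headers, body = parse_raw_request(raw)
print(method, path, headers["Host"])  # POST /api/save buggy.example
```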
I want to change first line of the HTTP header of my request, modifying the method and/or URL.
The (excellent) TamperData Firefox plugin allows a developer to modify the headers of a request, but not the URL itself. That latter part is what I want to be able to do.
So something like...
GET http://foo.com/?foo=foo HTTP/1.1
... could become ...
GET http://bar.com/?bar=bar HTTP/1.1
For context, I need to tamper with (make correct) an erroneous request from Flash, to see if an error can be corrected by fixing the url.
Any ideas? Sounds like something that may need to be done on a proxy level. In which case, suggestions?
Check out Charles Proxy (multiplatform) and/or Fiddler2 (Windows only) for more client-side solutions - both of these run as a proxy and can modify requests before they get sent out to the server.
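Whichever proxy you pick, the rewrite itself is simple. Here is a Python sketch of the transformation the proxy would apply before forwarding; the hostnames and parameters come from the example above, and the matching condition is an assumption about what the broken Flash request looks like.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Sketch of the rewrite a proxy would apply before forwarding.
# The matching condition is a hypothetical stand-in for the real
# erroneous Flash request.
def rewrite(url: str) -> str:
    parts = urlsplit(url)
    if parts.netloc == "foo.com" and dict(parse_qsl(parts.query)).get("foo") == "foo":
        return urlunsplit((parts.scheme, "bar.com", parts.path,
                           urlencode({"bar": "bar"}), parts.fragment))
    return url

print(rewrite("http://foo.com/?foo=foo"))  # http://bar.com/?bar=bar
```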
If you have access to the webserver and it's running Apache, you can set up some rewrite rules that will modify the URL before it gets processed by the main HTTP engine.
For those coming to this page from a search engine, I would also recommend the Burp Proxy suite: http://www.portswigger.net/burp/proxy.html
Although more specifically targeted towards security testing, it's still an invaluable tool.
If you're trying to intercept the HTTP packets and modify them on the way out, then TamperData may be the route you want to take.
However, if you want minute control over these things, you'd be much better off simulating the entire browser session using a utility such as curl: http://curl.haxx.se/