Do web crawlers read the HTTP headers?

I own a URL shortening service and I want to detect whether a request I receive came from a web crawler or not. In response to a request, I send an HTTP 302 that redirects the requester to the original link. I was thinking that I could include an invisible link in the response body, so that a bot would send me a request for that page too but a normal user won't. This is based on the hypothesis that even if bots read the header and redirect, they still scan the page and send requests to the links found in it. Is the hypothesis correct? If it is not, I could also redirect them via JavaScript, but that would not be the standard way of redirecting (I suppose).

Yes, crawlers definitely follow redirects. Their purpose is to find as many pages (or as much content) as possible, and following redirects is a basic requirement for that goal. However, I do not know whether commercial crawlers read the body of a redirect response. I suspect they don't, because information in a redirect body is never shown to a user: the user is always redirected away from that page.
There are other crawlers, like Crawljax, that are built for testing web applications. They will read all the data, but those crawlers aren't (or shouldn't be) used to crawl the public web.
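If you want to test the hypothesis on your own service, one approach is to put a trap link in the 302 response body and log whoever fetches it. A minimal sketch in TypeScript with Express (Express and the lookupTarget helper are assumptions, not part of the original question):

```typescript
import express from "express";

const app = express();

// Redirect endpoint: browsers follow the Location header immediately and
// never render the body, so only a client that parses redirect bodies
// (i.e. a crawler) should ever discover the trap URL below.
app.get("/:code", (req, res) => {
  const target = lookupTarget(req.params.code); // hypothetical short-link lookup
  res
    .status(302)
    .set("Location", target)
    .send(`<html><body><a href="/trap/${req.params.code}">.</a></body></html>`);
});

// Anything requesting the trap URL is very likely a bot.
app.get("/trap/:code", (req, res) => {
  console.log(`Probable crawler: ${req.ip} hit the trap for ${req.params.code}`);
  res.status(204).end();
});

function lookupTarget(code: string): string {
  return "https://example.com/"; // placeholder for the real database lookup
}

app.listen(3000);
```

Note that, per the answer above, most commercial crawlers will never request the trap, so the absence of a trap hit proves nothing; a hit is only a positive signal.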

Related

Why would WooCommerce's REST API respond with a 302 on authenticated calls?

So I am integrating with a WooCommerce store via the WooCommerce API.
Sometimes the API endpoint responds as expected, with a JSON payload containing the relevant resource data. Sometimes it responds with a 302 and a redirect location.
When it redirects, the location looks like this: https://example.com?aiowpsec_do_log_out=1&al_additional_data=1&_wpnonce=861898d6ac
I assumed this might be an issue with the permalinks, but they are already set to a custom structure rather than the default setting that many articles warn about.
The interesting behaviour, though, is that if I make an API request without credentials, the API still responds as expected. And if I go to the permalinks page and save the settings, the API then works as expected for a short duration. If I clear cookies, the behaviour stays the same.
Any idea what may be causing this behaviour?
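Not an answer, but a way to narrow it down: make the same call with redirects disabled and inspect the status and Location header directly. The aiowpsec_do_log_out parameter in the redirect matches the prefix used by the All In One WP Security plugin, so it is worth checking whether that plugin is invalidating the authenticated session. A sketch using Node 18+'s built-in fetch (the URL and keys are placeholders):

```typescript
// Run as an ES module so top-level await is available.
const url = "https://example.com/wp-json/wc/v3/products"; // placeholder store
const auth = Buffer.from("ck_xxx:cs_xxx").toString("base64"); // consumer key:secret

const res = await fetch(url, {
  redirect: "manual", // do not follow the 302; inspect it instead
  headers: { Authorization: `Basic ${auth}` },
});

console.log(res.status); // 200 with JSON when healthy, 302 when it misbehaves
console.log(res.headers.get("location")); // e.g. ...?aiowpsec_do_log_out=1...
```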

HTTP Referrer: Prevent spoofing

I want to prevent users from spoofing the Referer header to access my API services.
In other words, I want to allow the API to be called only from a particular domain (e.g. www.abcd.ef).
Documentation states:
The Referer request header contains the address of the previous web page from which a link to the currently requested page was followed. The Referer header allows servers to identify where people are visiting them from and may use that data for analytics, logging, or optimized caching, for example.
Is there any way I can make sure that an AJAX (JavaScript) call comes from a certain domain, even though it is possible to edit the HTTP Referer header field to pretend you come from www.abcd.ef?
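Strictly speaking, no: every request header, Referer included, is under the caller's control, so no header can prove where a call came from. A common workaround (the same key-as-parameter model suggested in the HTTPS-referrer answer further down) is a short-lived token that your own pages embed and the AJAX call echoes back. A sketch with Node's built-in crypto module; the SECRET and the five-minute lifetime are illustrative assumptions:

```typescript
import { createHmac, timingSafeEqual } from "crypto";

const SECRET = process.env.API_TOKEN_SECRET ?? "change-me";

// Issued when serving a page from www.abcd.ef; embedded in the page for the
// AJAX code to send back, e.g. in an X-Api-Token request header.
function issueToken(): string {
  const expires = Date.now() + 5 * 60 * 1000; // valid for five minutes
  const sig = createHmac("sha256", SECRET).update(String(expires)).digest("hex");
  return `${expires}.${sig}`;
}

// Checked by the API before serving the call.
function verifyToken(token: string): boolean {
  const [expires, sig] = token.split(".");
  if (!expires || !sig || Date.now() > Number(expires)) return false;
  const expected = createHmac("sha256", SECRET).update(expires).digest("hex");
  return (
    sig.length === expected.length &&
    timingSafeEqual(Buffer.from(sig), Buffer.from(expected))
  );
}
```

This proves the caller recently received a page from you, not which domain the script runs on; for browser callers, CORS restrictions are the usual complement.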

Regarding the workings of cookies in sign in systems on the web

I was using Fiddler to see, in the field, how web sites use cookies in their login systems. Although I have some HTTP knowledge, I'm just learning about cookies and how they are used within sites.
Initially I assumed that when submitting the form I'd see no cookies sent, and that the response would contain some cookie info that would then be saved by the browser.
In fact, just the opposite seems to be the case. It is the request that's sending in info, and the server returns nothing.
While fiddling with the issue, I noticed that even with a browser cleaned of cookies, the client always seems to be sending a RequestVerificationToken to the server, even when just looking around without being signed in.
Why is this so?
Thanks
Cookies are set by the server with the Set-Cookie HTTP response header, and they can also be set through JavaScript.
A cookie has a path. If the path of a cookie matches the path of the document that is being requested, then the browser will include all such cookies in the Cookie HTTP request header.
You must be careful when setting or modifying cookies in order to avoid CSRF (cross-site request forgery) attacks against your users. For this reason, it can be useful to include a hidden, unique secret within your login forms and verify that secret before setting any cookies. Alternatively, you can simply check that the HTTP Referer header matches your site. Otherwise, a malicious site can copy your form fields, recreate your login form on their site, and call form.submit(), effectively logging your user out or performing a brute-force attack on your site through unsuspecting users who happen to be visiting the malicious web site.
The RequestVerificationToken you mention has little to do with HTTP cookies as such; it is an implementation detail of ASP.NET's anti-forgery system, which sites use to protect their cookie-setting pages against exactly this kind of CSRF attack.
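For completeness, here is roughly what the "hidden and unique secret" idea looks like as code. This is a sketch of a double-submit CSRF token in TypeScript with Express (both assumptions; ASP.NET generates its RequestVerificationToken for you automatically):

```typescript
import express from "express";
import { randomBytes } from "crypto";

const app = express();
app.use(express.urlencoded({ extended: false }));

// GET: embed a fresh random secret in both a cookie and a hidden form field.
app.get("/login", (req, res) => {
  const token = randomBytes(16).toString("hex");
  res.cookie("csrf", token, { httpOnly: true });
  res.send(`<form method="post" action="/login">
    <input type="hidden" name="csrf" value="${token}">
    <input name="user"> <input name="pass" type="password">
    <button>Sign in</button></form>`);
});

// POST: a cross-site form cannot read the cookie, so it cannot echo the field.
app.post("/login", (req, res) => {
  const cookieToken = req.headers.cookie?.match(/(?:^|;\s*)csrf=([^;]+)/)?.[1];
  if (!cookieToken || req.body.csrf !== cookieToken) {
    return res.status(403).send("Cross-site form submission rejected");
  }
  res.send("Credentials would be checked here"); // placeholder
});

app.listen(3000);
```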
When you hit a page on a website, the response (the page you landed on) usually contains instructions from the server, in the HTTP response headers, to set some cookies.
Websites may use these to track information about your behaviour, or to save your preferences for the short or long term.
A website may do so on your first visit to any page, or on a visit to a particular page.
The browser will then send all cookies that have been set along with subsequent requests to that domain.
Think about it: HTTP is stateless. You landed on the home page and set your background colour to blue. Then you went to a gallery page. The next request goes to the server, but the server has no idea about your background-colour preference.
If, however, the request carries a cookie telling the server about that preference, the website can serve you the right page.
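Here is that background-colour example as a runnable sketch (TypeScript with Express, both assumed; the route names are made up):

```typescript
import express from "express";

const app = express();

// First visit: the response carries "Set-Cookie: bg=blue; Path=/".
app.get("/set-background", (req, res) => {
  res.cookie("bg", String(req.query.color ?? "blue"), { path: "/" });
  res.send("Preference saved");
});

// Later visit: the browser sends "Cookie: bg=blue" with the request, so the
// otherwise stateless server can recover the preference.
app.get("/gallery", (req, res) => {
  const bg = req.headers.cookie?.match(/(?:^|;\s*)bg=([^;]+)/)?.[1] ?? "white";
  res.send(`<body style="background:${bg}">Gallery</body>`);
});

app.listen(3000);
```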
Now, this is one way. Another way is a session. Think of cookies as information stored on the client side. But what if the server needs to store some temporary information about you on the server side, information that is perhaps too sensitive to be exposed in cookies, which live on the client and are easily intercepted?
You might say: but HTTP is stateless. Correct. The server, however, can keep information about you in a map whose key is the session id. That session id is set on the client side as a cookie, or resent with every request as a parameter. Now the server only receives the key, but it can look up information about you, such as whether you are logged in and what your role in the system is.
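As a sketch, the session version of the same idea: the cookie carries only a random key, and everything sensitive stays on the server (an in-memory Map here for illustration; real applications use a store that survives restarts):

```typescript
import { randomBytes } from "crypto";

interface SessionData {
  userId: string;
  role: string;
}

const sessions = new Map<string, SessionData>();

// After a successful login: mint an opaque key and send only that to the
// client, e.g. "Set-Cookie: sid=<id>; HttpOnly; Secure".
function createSession(userId: string, role: string): string {
  const id = randomBytes(16).toString("hex");
  sessions.set(id, { userId, role });
  return id;
}

// On each request: the key from the cookie unlocks the server-side record.
function getSession(sid: string | undefined): SessionData | undefined {
  return sid ? sessions.get(sid) : undefined;
}
```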
Wow, that's a lot of text, but I hope it helped. If not, feel free to ask more.

URLReferrer is null when page is HTTPS

We use the URLReferrer and a code passed in on the query string to protect our online videos, so that only our paid clients can link to our video playback page. This system has worked well for some time. I know the URL referrer can be spoofed, but who would tell their clients to do such a thing to access a video? It has worked well for us.
However, today I was asked about someone for whom it did not work. The URLReferrer is null, and their site is HTTPS. I have done some reading online and I get the impression there is no way to access the URL referrer when the source page is HTTPS. Is this correct? If I made an HTTPS version of our site, would that resolve it? Or is there any other way for me to get around this?
Thanks
Your online research is correct. The main reason for not sending a Referer header in this case is security: the referrer reveals "where you come from", which is private information that should not be exposed to others. What use would a secure site be if everyone could track where you have been?
So: you cannot get the referrer if the referring page was served over an encrypted (SSL/TLS) connection.
Update: here's what the HTTP specification says about coming from a secure site:
Clients SHOULD NOT include a Referer header field in a (non-secure) HTTP request if the referring page was transferred with a secure protocol.
As you might have guessed, there is no way around this restriction. Your only option is to use a different verification model. One such method is giving your users a key and requiring them to send it as a parameter with the request. Several other methods are possible.
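One shape the key model could take (a sketch, not a description of any particular implementation): issue each paying client a secret, and have them embed video URLs that carry an HMAC signature, so playback authorization no longer depends on the Referer at all:

```typescript
import { createHmac } from "crypto";

// Hypothetical per-client secrets, issued out of band.
const CLIENT_KEYS = new Map([["client42", "s3cret-issued-to-client42"]]);

// The client embeds this URL in their page; it works over HTTP or HTTPS.
function signedVideoUrl(clientId: string, videoId: string): string {
  const key = CLIENT_KEYS.get(clientId);
  if (!key) throw new Error("unknown client");
  const sig = createHmac("sha256", key).update(`${clientId}:${videoId}`).digest("hex");
  return `/play?video=${videoId}&client=${clientId}&sig=${sig}`;
}

// The playback page recomputes the signature instead of reading URLReferrer.
function verifyRequest(clientId: string, videoId: string, sig: string): boolean {
  const key = CLIENT_KEYS.get(clientId);
  if (!key) return false;
  const expected = createHmac("sha256", key).update(`${clientId}:${videoId}`).digest("hex");
  return sig === expected;
}
```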

Validate Cross website request - Asp.Net

I have a scenario where my web page is requested by another website, which has a hyperlink to my page.
I need to check whether the request is coming from a valid website. I have done this by checking the URL referer of that website, and it is working fine.
Another way to validate the request would be to validate a client certificate (X.509).
I want to know which is the most secure way to validate the referring website. Is there any other way to validate the referring site besides the URL referer and certificate validation?
Thanks
Fenil
The client certificate would identify the person clicking on the link, but not the referring page, so it should be ruled out.
As for the referrer, it does work (see the sketch after this list), but with a couple of caveats:
1 - It's not secure (for large values of "secure"). The Referer is an optional field that the browser inserts in the request to your site, so it is controlled on the client side and can easily be forged. If the level of security you want is "make sure that somebody has not posted my link on another page, where unaware users might click on it", then checking the referrer is quite fine. If you are relying on it for anything more (like making sure the incoming person is authorized to do something on your site), then you probably want a form of user authentication.
2 - Some software that may be installed on your users' computers (like "Norton Internet Security") masks the Referer out of privacy concerns, so some of your users may not send one at all.
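For reference, the referrer check being discussed usually looks something like this in practice (TypeScript with Express, both assumed), treating the header as a soft signal in line with the two caveats:

```typescript
import express from "express";

const ALLOWED = ["https://partner-site.example/"]; // hypothetical partner list

const app = express();

app.use((req, res, next) => {
  const referer = req.get("Referer"); // optional and entirely client-controlled
  if (referer && !ALLOWED.some((prefix) => referer.startsWith(prefix))) {
    return res.status(403).send("Unexpected referring site");
  }
  // A missing Referer passes through: caveat 2 (privacy software strips it).
  next();
});
```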
