I am trying to build a web scraper for the following site:
https://webdiplomacy.net/gamelistings.php
When I visit it in my browser, my initial request includes a Cookie in the headers.
Cookie: __utma=56936876.27553852.1525640664.1525640664.1525640664.1; __utmc=56936876; __utmz=56936876.1525640664.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); __utmt=1; __utmb=56936876.4.10.1525640664
How does my browser generate this cookie, and how does it know to include a cookie - given that I am visiting this page in an incognito window without having visited another page?
There is no set-cookie header in the response.
gamelistings.php is the first file accessed in the request. How does my browser know to include specific data when it is first accessing the site?
Surely the procedure for generating a cookie must be contained in the website, but this cannot be the case since my browser is generating a cookie before it ever actually receives any data (since this is the first request).
How could I possibly generate such a cookie with a web scraper?
These are Google Analytics cookies. Are you sure you have not visited another site, not even Google search? I can't reproduce your problem on Ubuntu with Firefox.
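To check this against the header in the question, the cookie names can be listed and compared with the `__utm` prefix used by classic Google Analytics. The following is only a minimal sketch: `parse_cookie_header` is a hypothetical helper written for illustration, not part of any library. Since these cookies are written by the analytics JavaScript rather than by a Set-Cookie header, a scraper most likely does not need to send them at all.

```python
def parse_cookie_header(header_value):
    """Split a Cookie request header value into a {name: value} dict.

    Hypothetical helper for illustration; it does no unquoting or validation.
    """
    cookies = {}
    for pair in header_value.split(";"):
        # Split on the first '=' only, because values may contain '=' too.
        name, sep, value = pair.strip().partition("=")
        if sep:  # skip fragments without '='
            cookies[name] = value
    return cookies


# Header value copied from the question above.
header = (
    "__utma=56936876.27553852.1525640664.1525640664.1525640664.1; "
    "__utmc=56936876; "
    "__utmz=56936876.1525640664.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); "
    "__utmt=1; "
    "__utmb=56936876.4.10.1525640664"
)

cookies = parse_cookie_header(header)
analytics_names = sorted(name for name in cookies if name.startswith("__utm"))
print(analytics_names)
```

Every name in the header carries the `__utm` prefix, which is consistent with the answer above.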
Related
I am having trouble understanding how a session is restored via cookies. How does the client know which sessionID cookie to send to the server via the HTTP request the first time? Does the client send all cookies and the server accepts the one that it also knows?
For example, let's say there are websites A and B (both using PHP), and I logged into both websites and then closed my browser. Now I open the browser again and go to site A and see that I am kept logged in. In this scenario, when my browser sends the HTTP request, the cookie whose file name contains the session ID for site A must have been included in the header. However, I have two sessionID cookies, one for each of sites A and B. As far as I understand, the host name is usually not stored in the cookie, at least in many PHP tutorials of the setcookie() function. How does my browser know which sessionID cookie is for site A? Does the browser just send all cookies to A and let A figure out which is the right one by comparing all the received session IDs with stored session IDs? This does not sound right to me.
I installed a Chrome extension that pops up a modal box when I'm on a certain domain.
If I click the button in that box, I see in the Network tab of chrome developer tools, that the extension makes an HTTP Post request to the website.
The request contains some request cookies from the domain: A,B,C,D.
And response cookies from the domain: A,B,C - without D.
When the request is done (and the extension finished doing its "magic"), I discovered that the value of cookie D has changed, even though D was not in the response cookies. I tested it several times.
How is this possible? Can the extension make something in the background that is hidden from the network tab, that will cause the cookie D from the domain to change?
I want to be able to capture and document this Cookie D generation behavior, and don't know how to do that.
Using the chrome.cookies API, a Chrome extension can manipulate the cookies that are stored in the browser without the need to perform an HTTP request. The extension will need the cookies permission to access this API.
You will not be able to capture, or intercept, the extension's calls to the chrome.cookies API.
In addition, through the chrome.webRequest API, a chrome extension can modify the request headers, including cookies, which are sent or received without directly changing the cookies which are stored in the browser. The extension will need the webRequest and webRequestBlocking permissions to make such changes.
In my web application, there's a link sending a redirect (302 to another GET request) together with some cookies. It works fine, except when used from MS Access by a guy I remotely work with. I know close to nothing about what and how he does, I only know that he uses Application.FollowHyperlink.
The link from Access should be opened in a browser, but after the redirect, there seem to be no cookies there. When used normally, there's no problem. Can it be like that Access handles the link itself and sends the redirected URL to the browser?
Maybe a stupid question, but I have no idea about Access (never ever seen it) and I'm sitting only on the server side. There's nothing interesting in the server logs...
The problem was MS doing some complicated things like here instead of simply opening a URL in a browser. Access accesses the page, sees the new URL, and gets and eats all cookies. While digesting the cookies, it points the browser to the new URL. The browser has no cookies and no access to anything.
This summarizes it nicely:
This problem occurs because of missing session cookies for the Web server. This problem is specific to certain Web-server designs that depend on cookie information instead of authentication information or that depend on cookie information plus authentication information.
To me it sounds like "works with MS only", though I'm not exactly sure what "authentication" they mean.
I was using Fiddler to see in practice how websites use cookies in their login systems. Although I have some HTTP knowledge, I'm just learning about cookies and how they are used within sites.
Initially I assumed that when submitting the form I'd see no cookies sent, and that the response would contain some cookie info that would then be saved by the browser.
In fact, just the opposite seems to be the case. It is the request that's sending in info, and the server returns nothing.
When fiddling with the issue, I noticed that even with a browser cleaned of cookies, the client seems to always be sending a RequestVerificationToken to the server, even when just looking around without being signed in.
Why is this so?
Thanks
Cookies are set by the server with the Set-Cookie HTTP response header, and they can also be set through JavaScript.
A cookie has a domain and a path. If they match the host and the path of the document that is being requested, then the browser will include all such cookies in the Cookie HTTP request header.
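The matching step can be sketched roughly as follows. This is a simplified illustration of the domain-match and path-match rules described in RFC 6265, not a full cookie jar: the function names, the list-of-dicts jar, and the `site-a.example` / `site-b.example` hosts are all made up for the example, and details such as host-only cookies, IP addresses, expiry, and the Secure flag are ignored.

```python
def domain_matches(request_host, cookie_domain):
    """Simplified domain-match: same host, or the host is a subdomain."""
    request_host = request_host.lower()
    cookie_domain = cookie_domain.lower().lstrip(".")
    return request_host == cookie_domain or request_host.endswith("." + cookie_domain)


def path_matches(request_path, cookie_path):
    """Path-match: identical, or cookie_path is a prefix ending at a '/' boundary."""
    if request_path == cookie_path:
        return True
    if request_path.startswith(cookie_path):
        # Here request_path is strictly longer, so the index below is valid.
        return cookie_path.endswith("/") or request_path[len(cookie_path)] == "/"
    return False


def cookies_for_request(jar, host, path):
    """Return the cookies a browser would attach to a request for host + path."""
    return {
        c["name"]: c["value"]
        for c in jar
        if domain_matches(host, c["domain"]) and path_matches(path, c["path"])
    }


# Hypothetical stored cookies: the browser records the domain with each one.
jar = [
    {"name": "PHPSESSID", "value": "aaa", "domain": "site-a.example", "path": "/"},
    {"name": "PHPSESSID", "value": "bbb", "domain": "site-b.example", "path": "/"},
    {"name": "prefs", "value": "blue", "domain": "site-a.example", "path": "/gallery"},
]

print(cookies_for_request(jar, "site-a.example", "/index.php"))
print(cookies_for_request(jar, "site-a.example", "/gallery/1"))
```

Only the cookies stored for the requested site are selected, so site A never sees the session cookie that belongs to site B.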
You must be careful when setting or modifying cookies in order to avoid cross-site request forgery (CSRF) attacks against your users. As such, it can be useful to include a hidden, unique secret within your login forms and to verify that secret before setting any cookies. Alternatively, you can check that the HTTP Referer header matches your site. Otherwise, a malicious site can copy your form fields, create a login form pointing to your site on their own site, and call form.submit(), effectively logging out your user or performing a brute-force attack on your site through unsuspecting users who happen to be visiting the malicious website.
The RequestVerificationToken that you mention sounds like exactly such a secret: an anti-forgery token that some web frameworks issue to protect their cookie-setting pages against CSRF attacks. It is an implementation detail of the site rather than part of the cookie mechanism itself, although some frameworks also store a copy of it in a cookie.
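The hidden-secret idea can be sketched like this. It is only an illustration under assumptions: the `session` dict stands in for whatever server-side session storage the site uses, and `issue_form_token` and `is_valid_submission` are hypothetical names, not the API of any particular framework.

```python
import hmac
import secrets


def issue_form_token(session):
    """Create a random token, remember it server-side, and return it
    so it can be embedded in the form as a hidden field."""
    token = secrets.token_urlsafe(32)
    session["form_token"] = token
    return token


def is_valid_submission(session, submitted_token):
    """Accept the form only if it carries the token issued for this session."""
    expected = session.get("form_token")
    if not expected or not submitted_token:
        return False
    # Constant-time comparison; bytes are used so any input string is accepted.
    return hmac.compare_digest(expected.encode(), submitted_token.encode())


session = {}  # stand-in for the server-side session of one visitor
token = issue_form_token(session)

print(is_valid_submission(session, token))    # form rendered by the real site
print(is_valid_submission(session, "guess"))  # form forged by another site
```

A form copied onto another site cannot know the token issued for the visitor's session, so its submission is rejected before any cookie is set.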
When you hit a page on a website, usually the response(the page that you landed on) contains instructions from the server in the http response to set some cookies.
Websites may use these to track information about your behavior or to save your preferences, either long term or short term.
A website may do so on your first visit to any page or on your visit to a particular page.
The browser would then send all cookies that have been set with each subsequent request to that domain.
Think about it: HTTP is stateless. You landed on the home page and clicked "set my background to blue". Then you went to a gallery page. The next request goes to the server, but the server does not have any idea about your background color preference.
Now if the request contained a cookie telling the server about your preference, the website would serve you your right preference.
Now this is one way. Another way is a session. Think of cookies as information stored on the client side. But what if the server needs to store some temporary info about you on the server side? Info that is maybe too sensitive to be exposed in cookies, which are local and easily intercepted.
Now you would ask: but HTTP is stateless? Correct. But the server could keep info about you in a map whose key is the session ID. This session ID is set on the client side as a cookie or resent with every request in parameters. Now the server only gets the key but can look up information about you, like whether you are logged in successfully, what your role in the system is, etc.
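A rough sketch of such a map, assuming an in-memory dict and made-up user data (a real server would typically use its framework's session store and add expiry):

```python
import secrets

# Server-side map: session ID -> information the server keeps about the visitor.
SESSIONS = {}


def create_session(data):
    """Store data under a fresh random session ID and return the ID.
    Only this ID is sent to the client, for example in a cookie."""
    session_id = secrets.token_urlsafe(32)
    SESSIONS[session_id] = dict(data)
    return session_id


def lookup_session(session_id):
    """Return the stored data for a session ID, or None if it is unknown."""
    return SESSIONS.get(session_id)


# Hypothetical example data.
sid = create_session({"user": "alice", "role": "admin", "logged_in": True})

print(lookup_session(sid))           # the server finds the stored info by key
print(lookup_session("unknown-id"))  # an ID the server never issued
```

The sensitive details never leave the server; the client only holds the key.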
Wow, that's a lot of text, but I hope it helped. If not, feel free to ask more.
My previous understanding was that only a logged-in user of a website (no matter which login/authentication approach is used) could have a cookie as a persistent identifier, so that if the user closes the browser and opens it again to visit the same website, the website can remember the user.
But I learned recently that even for a user who is not logged in, there can still be a cookie associated with the user (after the user closes the browser and opens it again to visit the same website, the website can remember the user), and that it is called a browser cookie. Is that true?
If it is true, who is responsible for setting the browser cookie? That is, does it need some coding/configuration on the web server side, client browser configuration (without coding on the server side), or both? How can the web server access such a cookie? I would appreciate any code samples.
thanks in advance,
George
Whether you actually "log in" or not is irrelevant to what cookies are stored.
If the browser requests a page, and the server includes a Set-Cookie response header, then the browser will store the value of that cookie in a local cache and every time it requests a page from the same server, it sends the value of the cookie back as well (in the Cookie request header).
It just so happens that when you "log in" to a website, the website will usually use the Set-Cookie header to tell the browser to store a value that indicates that you're already logged in (and your user-id and some other security-related stuff). But there's nothing stopping the web server from using Set-Cookie at any other time.
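Since code samples were requested, here is a minimal sketch of both directions using Python's standard `http.cookies` module. The cookie name `visitor_id`, the one-year lifetime, and the two helper functions are assumptions chosen for illustration; no login is involved anywhere.

```python
from http.cookies import SimpleCookie


def build_set_cookie(visitor_id, max_age_seconds):
    """Build the value of a Set-Cookie response header for any visitor.
    Max-Age makes the cookie persist after the browser is closed."""
    cookie = SimpleCookie()
    cookie["visitor_id"] = visitor_id
    cookie["visitor_id"]["path"] = "/"
    cookie["visitor_id"]["max-age"] = max_age_seconds
    return cookie["visitor_id"].OutputString()


def read_visitor_id(cookie_header):
    """Read the visitor ID back from the Cookie request header, if present."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("visitor_id")
    return morsel.value if morsel is not None else None


# First visit: the server sends this header value along with the page.
set_cookie_value = build_set_cookie("abc123", 365 * 24 * 3600)
print("Set-Cookie:", set_cookie_value)

# Later visit: the browser sends the stored cookie back, and the server reads it.
print(read_visitor_id("visitor_id=abc123; theme=blue"))
print(read_visitor_id("theme=blue"))
```

So the coding is on the server side; the browser only needs to have cookies enabled.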