how to handle cookies while scraping - http

I'm in the middle of making a small scraping utility which is designed
to run as quickly as possible using multiple http connections to the server.
How does one handle cookies in this situation..
For example if the first connection scrapes the page for links, and the server
sets the cookie to something,, wouldn't firing off additional connections
cause the cookies to be out of sync ?

The answer is it really depends on the server.
If the server changes the cookie with each and every request
yes it will throw off the cookie. What exactly this does again
depend on the server.
If say the cookie is just set once at login, then it wouldn't really matter.

Related

How are user logins from a desktop application usually handled?

What is the common(best practice) way of allowing a user to sign in from a Windows desktop application. Some examples of what I mean are Dropbox or Google Picasa. You sign in with your credentials and then the software is permanently signed in.
I assume the communication takes place over HTTPS. Does the client store the credentials to be sent with requests or is there some sort of token generated? Can anyone point me to some resources on how this should be handled?
Logging into a website normally creates a session at the server. The server then has to identify subsequent requests by the session. Typically, there is one of the following two solutions applied:
Session cookie, storing a session identifier
URL rewriting, where the session identifier is appended to every link in the html source
Which approach is taken is site dependent, so if you are writing a general 'for all sites' client, you might have to implement both.
In the former case, your application will have to handle the session cookies, in the second case, your application has either nothing to do - if it caches the html response - or will have to emulate the url rewriting itself.
In both cases be aware that the session will expire at server side after a certain period without any activity, so you might be required to generate such.

Regarding the workings of cookies in sign in systems on the web

I was using Fiddler see on-the-field how web sites use cookies in their login systems. Although I have some HTTP knowledge, I'm just just learning about cookies and how they are used within sites.
Initially I assumed that when submitting the form I'd see no cookies sent, and that the response would contain some cookie info that would then be saved by the browser.
In fact, just the opposite seems to be the case. It is the request that's sending in info, and the server returns nothing.
When fiddling about the issue, I noticed that even with a browser cleaned of cookies, the client seems to always be sending a RequestVerificationToken to the server, even when just looking around withot being signed in.
Why is this so?
Thanks
Cookies are set by the server with the Set-Cookie HTTP response header, and they can also be set through JavaScript.
A cookie has a path. If the path of a cookie matches the path of the document that is being requested, then the browser will include all such cookies in the Cookie HTTP request header.
You must make sure to be careful when setting or modifying cookies in order to avoid XSS attacks against your users. As such, it might be useful to include a hidden and unique secret within your login forms, and use such secret prior to setting any cookies. Alternatively, you can simply check that HTTP Referer header matches your site. Otherwise, a malicious site can copy your form fields, and create a login form to your site on their site, and do form.submit(), effectively logging out your user, or performing a brute-force attack on your site through unsuspecting users that happen to be visiting the malicious web-site.
The RequestVerificationToken that you mention has nothing to do with HTTP Cookies, it sounds like an implementation detail that some sites written in some specific site-scripting language use to protect their cookie-setting-pages against XSS attacks.
When you hit a page on a website, usually the response(the page that you landed on) contains instructions from the server in the http response to set some cookies.
Websites may use these to track information about your behavior or save your preferences for future or short term.
Website may do so on your first visit to any page or on you visit to a particular page.
The browser would then send all cookies that have been set with subsequent request to that domain.
Think about it, HTTP is stateless. You landed on Home Page and clicked set by background to blue. Then you went to a gallery page. The next request goes to your server but the server does not have any idea about your background color preference.
Now if the request contained a cookie telling the server about your preference, the website would serve you your right preference.
Now this is one way. Another way is a session. Think of cookies as information stored on client side. But what if server needs to store some temporary info about you on server side. Info that is maybe too sensitive to be exposed in cookies, which are local and easily intercepted.
Now you would ask, but HTTP is stateless. Correct. But Server could keep info about you in a map, whose is the session id. this session id is set on the client side as a cookie or resent with every request in parameters. Now server is only getting the key but can lookup information about you, like whether you are logged in successfully, what is your role in the system etc.
Wow, that a lot of text, but I hope it helped. If not feel free to ask more.

Pass value from one ASP.NET app to another via HTTP Header

We are implementing a single sign on mechanism in an enterprise environment, where the token is shared between applications using HTTP header. Now, in order to do the integration test, I need to write an application to simulate this.
Is there any way in ASP.NET where I can redirect to another web-page and pass a custom HTTP header in the process?
Thanks
You need to create a page on Site B that Site A redirects the user too that sets a cookie with the desired value.
for instance.
http://siteb.com/authenticate.aspx?authtoken=15128901428901428904jasklads&returnUrl=http://siteb.com/index.aspx
authenticate.aspx would set a cookie and then every request would receive authtoken.
The server could send the HTTP header to the client on a redirect, but the client would not send it back to the other remote server.
The ideal solution in this case would be to use a Cookie, or a QueryString variable. Cookies may suffer from cross-domain issues and become complicated if host names are different enough.
In any of these approaches, one must be careful not to create a security hole by trusting this information as it is user input coming back from the client (or some black hat).

Push cookie notification

The question that I have is very basic: Is there a way to inform the web browser that the content of the cookie has changed?
I don't want to keep looking at the file and check if it has been updated because it'll cause performance degree on my app.
Thanks in advance!
It's always the job of the server to "push" the new cookie to the browser (in ASP.NET, by setting Response.Cookies("cookie_name")).
I'm not sure to understand your concern, only your application can know that the user's cookie needs to be changed, but it's usually not stored as a file on the server.
You would probably have to check for the Cookie or Set-Cookie http headers that come with the ASP page whenever it reloads in your application, though this would not account for changes made by javascript.
If the ASP page is refreshing each time a cookie changes, that is likely to be a much larger overhead than the WPF reading the cookie from the disk.
Well, since the ASP page needs to be active in order to SET the cookies, why couldn't you just use a HttpClient to request the cookies in your WPF app and on the server, set the cookies. Then send HTTP Response depending on whether or not you set the cookies. If you receive a 200 OK response, you can know your cookies are set. If there's an error, send a 500 Server Error back.

Is there a good way of securing an ASP.Net web service call made via Javascript on the click event handler of an HTML button?

The purpose of using a Javascript proxy for the Web Service using a service reference with Script Manager is to avoid a page load. If the information being retrieved is potentially sensitive, is there a way to secure this web service call other than using SSL?
If your worried about other people access your web service directly, you could check the calling IP address and host header and make sure it matches expected IP's addresses.
If your worried about people stealing information during it's journey from the server to the client, SSL is the only way to go.
I would use ssl it would also depend I suppose on how sensitive your information is.
I would:
Use SSL for the connection
Put a time and session based token in the request
Validate the inputs against expected ranges on the server
SSL prevents man-in-the-middle
Tokenized requests verify that the request is coming from an active and authenticated session, within a reasonable amount of time from the last activity within the session. This prevents stale requests being re-submitted and verifies that it came from the source of the session (store the IP address, user-agent, etc on the server for session management).
Validating that the inputs are within expected ranges verifies that the request has not been doctored by the party that you are talking to.
Though SSL would be best, there are a number of client-side cryptography libraries that could alleviate some of the security concerns - see https://github.com/jbt/js-crypto for a nice collection

Resources