How to separate background HTTP requests - http

This is more of an issue of trying to understand how HTTP really works and then implementing it.
I need an HTTP analyzer that can separate main page requests from "background" requests in some HTTP log data. The idea is to separate HTTP requests made by the user from those that happen automatically (using the term loosely) in the background. From my first impressions of the HTTP data, when I go to any normal website a text/html object is fetched, followed by many other objects such as CSS, XML, JavaScript, images, etc.
The problem is how to separate these "background" requests, where the user is not actively generating the requests. From what I know, these will mostly be ad fetches, redirections and some Ajax-based things.
Does anyone have any ideas on this? Any experience, or resources you could point me to, to get started with this analysis?

There's no way to tell from the bare HTTP requests which ones the browser generated because of specific user actions and which came from other automated processes. The browser/client is the only one that has that knowledge, so you have to make it part of the picture, e.g. by implementing the analyzer as a browser plugin or by embedding an HTTP client in the analyzer itself.
If you're trying to create a generic tool to analyze traffic load, it's usually not meaningful to distinguish between traffic generated by a user's direct "clicks" and automated requests.

There's no direct and clean way to do this. However, you can get pretty close by filtering out requests for files that clearly are not "user" requests, like *.jpg. Furthermore, you can filter out anything that is not an HTTP 200 response (e.g., 301 and 302 redirects).
Try something along the lines of:
grep -E -v "\.(gif|ico|png|jpg|jpeg|js|css) HTTP" access.log |
  grep "HTTP/1.1\" 200"
(line breaks added for readability)
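The same heuristic can be sketched in Python. This is only an illustration under stated assumptions: the log is assumed to be in Common/Combined Log Format, the names `LOG_LINE`, `ASSET_EXTENSIONS`, `is_probable_page_request` and `filter_page_requests` are made up for this example, and unlike the grep it accepts any HTTP version and ignores query strings when checking the extension.

```python
import re

# Assumed request field of the Common/Combined Log Format:
#   "METHOD /path HTTP/x.y" STATUS
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3})'
)

# Extensions treated as "background" assets; the list mirrors the grep above.
ASSET_EXTENSIONS = (".gif", ".ico", ".png", ".jpg", ".jpeg", ".js", ".css")


def is_probable_page_request(line: str) -> bool:
    """Heuristic: a 200 response whose path does not end in an asset extension."""
    match = LOG_LINE.search(line)
    if match is None:
        return False
    # Drop the query string so "/app.js?v=3" is still recognised as an asset.
    path = match.group("path").split("?", 1)[0].lower()
    return match.group("status") == "200" and not path.endswith(ASSET_EXTENSIONS)


def filter_page_requests(lines):
    """Keep only the log lines that look like user-initiated page requests."""
    return [line for line in lines if is_probable_page_request(line)]
```

As with the grep, this is a guess based on file type and status code only. Ajax calls that return 200 with no telltale extension will still pass the filter.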

Related

What is the difference between Requests and Requests-html?

I have to give a seminar on Requests and Requests-HTML. I have been searching but can't find any website that compares them. Both Requests and Requests-HTML have the same methods, so what is the difference?
Requests-HTML helps you parse the contents of a webpage (i.e. web scraping). You can connect to a webpage and parse its contents: extract links and raw data, or search for specific terms. It is built on top of Requests, which is why the two share the same methods. It is generally used for data analysis and requires less technical expertise than Requests.
Requests helps you make HTTP calls programmatically. You can send GET, POST and other requests, much like curl commands, and receive a response to be processed by your own logic. It is generally used by backend API developers and requires technical knowledge of how HTTP works.

Tool to run http requests

I'm looking for a tool to which I can feed a file of saved http requests with their respective headers and the tool executes it. I mean, is there something that does that without the need of creating a wrapper? I know I could easily achieve this in any language, but that's not the question in this case. I know Postman, Insomnia, etc, but not quite sure whether I can open a file with HTTP requests and if so what should be the delimiter per request.

Why are so many HTTP requests sent to www.google.com?

I'm using Burp suite to see the requests my computer sends out when I go to www.google.com, and noticed that there were a lot of different requests sent. Why is this the case? Shouldn't it just be one GET request to Google's server, and then done? Instead it's sending maybe 10 GET requests and a handful of POST requests.
There's one GET request for the page (and more for every image, CSS, and JavaScript file), and then there can be many other AJAX GET/POST requests that get done afterward for things like updating the suggestions as you type things in, sending location information, or doing stuff with the cookies on your computer. Pretty much any time new information is displayed without reloading the page, there's an AJAX request going on. AJAX is also used to make expensive requests so the page can load faster. There are many uses.
Here's a tutorial for how AJAX works if you would like to do it yourself: AJAX Tutorial
Note: AJAX is a method of sending requests, it's not its own programming language. It stands for "Asynchronous JavaScript and XML."
While it is hard to give a definitive answer to your question (I cannot tell which requests your computer sends to Google), one possibility is that after the first GET request Google sends back a bunch of HTML/CSS/JavaScript. The JavaScript is then executed on your computer (client side) and might trigger further requests to Google's servers. However, this is just one possibility.
Cheers,
Christian
Normally every element of a page (CSS, images, scripts) is requested with a separate GET.
So you'll hardly ever find a site that is loaded by one single GET request.
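To make the "one GET per element" point concrete, here is a minimal Python sketch that lists the URLs a browser would typically fetch after downloading the HTML itself. It is an illustration only: `SubresourceCollector`, `list_subresources` and the tag-to-attribute mapping are assumptions of this example, the mapping is deliberately incomplete, and requests triggered later by JavaScript (the AJAX calls described above) are not visible in the markup at all.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class SubresourceCollector(HTMLParser):
    """Collects URLs a browser would typically fetch after the HTML itself."""

    # tag -> attribute holding the URL (a simplified, incomplete mapping)
    URL_ATTRS = {"img": "src", "script": "src", "link": "href", "iframe": "src"}

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attr_name = self.URL_ATTRS.get(tag)
        if attr_name is None:
            return  # e.g. <a href=...> needs a user click, so it is skipped
        for name, value in attrs:
            if name == attr_name and value:
                # Resolve relative references against the page URL.
                self.urls.append(urljoin(self.base_url, value))


def list_subresources(html: str, base_url: str):
    """Return the sub-resource URLs referenced by the given HTML, in order."""
    collector = SubresourceCollector(base_url)
    collector.feed(html)
    return collector.urls
```

Each URL in the returned list corresponds to at least one additional request on top of the initial GET for the page.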

How to interact with a server-side Java program via HTTP?

This probably could not possibly be a more basic HTTP question, but I am very new to web development and I do not even know the right question to ask (evidenced by the fact that googling has not helped).
What I have: an AWS server with an Elastic Beanstalk environment set up. I have successfully compiled, uploaded, and run a simple "Hello World" program to the environment using Eclipse.
What I want to do: pass the server a number via HTTP request and have the server give me back an HTTP response containing the square of that number. On the back end, I want a simple Java class to do the squaring. (Of course, the goal is to be able to pass more complicated data to the server and have more sophisticated Java code on the back end for processing.)
What I think I need to do: create a Java Servlet to listen for and process the request. I think (hope) the documentation is good enough that I can figure out the HTTPServlet API, but I can't answer a more basic question: how do you pass an HTTP request containing some elementary data, like a number?
Thanks in advance!
You need to either GET, or POST (or PUT) your data. GET provides the data in the URL of the request, and will be displayed in the browser's address bar. POST data is provided as a separate request body.
http://www.w3schools.com/tags/ref_httpmethods.asp
A simple GET would look like this:
http://example.com/server?number=4
You can make a POST using a browser extension such as Postman:
https://chrome.google.com/webstore/detail/postman-rest-client/fdmmgilgnpjigdojojpjoooidkmcomcm?hl=en
Or you can do it from the command line using curl:
curl -X POST http://example.com/server -d'data'
Once the data is more complicated than a few variables, you probably want to use POST rather than GET. Also, you can start to think about what your requests are doing. GETs should only retrieve data from the server. If you modify or create data, then POST (or PUT) requests are the methods to use.
As your server becomes more complex, you probably want to start reading about REST.
http://en.wikipedia.org/wiki/Representational_state_transfer
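To show where the number actually ends up on the server, here is a minimal sketch of the squaring service. It is written with Python's standard library rather than as a Java servlet, purely to illustrate the HTTP mechanics; the names `square_from_params`, `SquareHandler` and `run`, the `number` parameter and port 8000 are all assumptions of this example. In a servlet, the analogous step would be reading the value with `request.getParameter("number")`.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse


def square_from_params(params: dict) -> str:
    """Return the square of params['number'][0] as text, or raise ValueError."""
    values = params.get("number")
    if not values:
        raise ValueError("missing 'number' parameter")
    number = int(values[0])  # int() raises ValueError on non-numeric input
    return str(number * number)


class SquareHandler(BaseHTTPRequestHandler):
    def _reply(self, params):
        try:
            status, body = 200, square_from_params(params)
        except ValueError as exc:
            status, body = 400, str(exc)
        payload = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def do_GET(self):
        # GET: the data travels in the URL, e.g. /server?number=4
        self._reply(parse_qs(urlparse(self.path).query))

    def do_POST(self):
        # POST: the data travels in the request body (form-encoded here)
        length = int(self.headers.get("Content-Length", 0))
        self._reply(parse_qs(self.rfile.read(length).decode("utf-8")))


def run(port: int = 8000):
    """Serve until interrupted; call this manually to try the sketch."""
    HTTPServer(("localhost", port), SquareHandler).serve_forever()
```

With `run()` started, visiting `http://localhost:8000/server?number=4` or running `curl -d 'number=4' http://localhost:8000/server` should both be answered by the same squaring function.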

Using GET for a non-idempotent request

Simply put, I have a website where you can sign up as a user and add data. Currently it only makes sense to add specific data once, so an addition should be idempotent, but theoretically you could add the same data multiple times. I won't get into that here.
According to RFC 2616, GET requests should be idempotent (really nullipotent). I want users to be able to do something like visit
http://example.com/<username>/add/?data=1
And this would add that data. It would make sense to have a PUT request do this with REST, but I have no idea how to make a PUT request with a browser and I highly doubt most people do or would want to bother to. Even using POST would be appropriate, but this has a similar problem.
Is there some technically correct way to allow users to add data using only GET (e.g. by visiting the link manually, or allowing external websites to use the link)? When they visit this page I could make my own POST/PUT request either with JavaScript or cURL, but this still seems to violate the spirit of idempotent GET requests.
"Is there some technically correct way to allow users to add data using only GET ... ?"
No matter how you go about letting clients access it, you'll end up violating RFC2616. It's ultimately up to you how you handle requests (nothing's going to stop you from doing this), but keep in mind that if you go against the HTTP specification, you might cause unexpected side-effects to clients and proxies who do abide by it.
Also see: Why shouldn't data be modified on an HTTP GET request?
As far as not being able to PUT from the browser, there are workarounds for that [1], [2], most of which use POST but also pass some sort of _method request parameter that's intercepted by the server and routes to the appropriate server-side action.
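The `_method` workaround can be sketched as a small server-side helper. This is a minimal illustration, not any particular framework's implementation: the function name `effective_method`, the `OVERRIDABLE` set and the assumption of a form-encoded body are all choices made for this example.

```python
from urllib.parse import parse_qs

# Methods a form is allowed to tunnel through POST (an assumption of this sketch).
OVERRIDABLE = {"PUT", "PATCH", "DELETE"}


def effective_method(http_method: str, form_body: str) -> str:
    """Return the method the server should route on."""
    method = http_method.upper()
    if method != "POST":
        # Never let a GET be upgraded: that would reintroduce unsafe GETs.
        return method
    # parse_qs maps each field to a list of values; take the first one.
    override = parse_qs(form_body).get("_method", [""])[0].upper()
    return override if override in OVERRIDABLE else method
```

Only POST requests are allowed to carry an override here, so a plain link (a GET) still cannot modify data.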

Resources