Is there any sort of data dump or data set with information from web server logs?
The information I am mainly looking for is:
a) What type of request it is (GET, POST, or some other HTTP method)
b) What type of data is being transferred (image, audio, video, or text)
c) What the size of the transferred data is
Information such as the IP address and URL can be anonymized.
Are you using Firefox? If so, you can use the included Web Console tool to view the HTTP requests your browser sends to the server and the responses it gets back, including their bodies, along with details like the method (GET, POST, etc.). This is much the same information a web server would log (except that the client IP address is always yours, obviously). You can copy all the data and paste it into a file if you want a data dump.
To open the Web Console, click the orange Firefox button and then Web Developer > Web Console. If you're using an older version or have the Firefox button disabled, it's under the Tools menu.
Edit: To get the most out of it, right-click in the console and select Log Request and Response Bodies. This gives you more information than just the headers.
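If you instead have access to a server's own access log, the three fields from the question can be pulled out of each line. The sketch below assumes the Apache/Nginx "combined" log format. That format does not record the Content-Type, so the media category is only guessed from the file extension with Python's mimetypes module, which is an approximation.

```python
import mimetypes
import re
from urllib.parse import urlsplit

# Assumed: Apache/Nginx "combined" log format, e.g.
# 127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /a.gif HTTP/1.0" 200 2326 "-" "UA"
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_line(line):
    """Return method, guessed media category and response size for one line."""
    m = LOG_RE.match(line)
    if m is None:
        return None  # not in combined/common log format
    # The log has no Content-Type, so guess it from the path's extension.
    mime, _ = mimetypes.guess_type(urlsplit(m["path"]).path)
    category = mime.split("/")[0] if mime else "unknown"
    # "-" means no body bytes were logged for this response.
    size = 0 if m["size"] == "-" else int(m["size"])
    return {"method": m["method"], "category": category, "bytes": size}
```

The IP address is captured but never returned, which keeps the output anonymous, as the question allows.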
I am developing a web scraper and I need to download a .pdf file from a page. I can get the file name from the HTML tag, but I can't find the complete URL (or request body) that downloads the file.
I have tried to sniff the traffic with the Chrome and Firefox network tools and with Wireshark, with no success. I can see the page make a POST request to the exact same URL as the page itself, and I can't understand why this happens. My guess is that the filename is sent inside the POST request body, but I can't find that information in those tools either. If I could see the variable name in the body, I could create a copy of the request and get the file.
How can I get that information?
Here is the website I am talking about: http://www2.trt8.jus.br/consultaprocesso/formulario/ProcessoConjulgado.aspx?sDsTelaOrigem=ListarProcessos.aspx&iNrInstancia=1&sFlTipo=T&iNrProcessoVaraUnica=126&iNrProcessoUnica=1267&iNrProcessoAnoUnica=2010&iNrRegiaoUnica=8&iNrJusticaUnica=5&iNrDigitoUnica=24&iNrProcesso=1267&iNrProcessoAno=2010&iNrProcesso2a=0&iNrProcessoAno2a=0
EDIT: for those seeking to do something similar, take a look at this website: http://curl.trillworks.com/
It converts a cURL command into Python requests code. Very useful.
The POST data used for the request is encoded content generated by ASP.NET (hidden form fields such as __VIEWSTATE and __EVENTVALIDATION). It contains state/session information for the page the link is on, which makes it difficult to scrape the URL directly.
You can examine the request by exporting a HAR file from the Network tab in Chrome DevTools.
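Once exported, the HAR file is plain JSON, so the POST bodies can be listed programmatically. This sketch assumes the HAR 1.2 layout (log.entries[].request with method, url and an optional postData holding either a params list or a raw text field); some exporters fill only one of the two, so both are handled.

```python
import json
from urllib.parse import parse_qsl


def post_bodies(har_text):
    """Yield (url, {field: value}) for every POST request in a HAR export."""
    har = json.loads(har_text)
    for entry in har["log"]["entries"]:
        req = entry["request"]
        if req["method"] != "POST":
            continue
        post = req.get("postData", {})
        params = {p["name"]: p.get("value", "") for p in post.get("params", [])}
        if not params and post.get("text"):
            # Only the raw body was exported: decode it as a urlencoded form.
            params = dict(parse_qsl(post["text"], keep_blank_values=True))
        yield req["url"], params
```

Printing the field names for the POST to the .aspx URL should reveal which hidden fields (and which __EVENTTARGET) the download depends on.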
The __EVENTVALIDATION data is used to ensure events raised on the client originate from the controls rendered on the page from the server.
You might be able to achieve what you want by first requesting the page the link is on, then extracting the required POST data from the response (the page state plus the embedded request for the file), and then making a new request with this information. This assumes the server doesn't expire the session in the meantime.
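A minimal sketch of that GET-then-POST approach, using only the standard library. It assumes a standard ASP.NET WebForms postback: the hidden inputs are copied verbatim, and __EVENTTARGET is set to the control ID that the link's __doPostBack(...) call passes. The event_target value (e.g. "lnkPdf") is hypothetical; read the real one from the link's href. A real page may also require other visible form fields, which this sketch does not fill in.

```python
from html.parser import HTMLParser
from http.cookiejar import CookieJar
from urllib.parse import urlencode
from urllib.request import HTTPCookieProcessor, Request, build_opener


class HiddenFieldParser(HTMLParser):
    """Collect name/value pairs from <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "input" and (a.get("type") or "").lower() == "hidden" and a.get("name"):
            self.fields[a["name"]] = a.get("value") or ""


def build_postback_body(html, event_target, event_argument=""):
    """Copy the page's hidden state and name the control that was 'clicked'."""
    parser = HiddenFieldParser()
    parser.feed(html)
    fields = dict(parser.fields)  # __VIEWSTATE, __EVENTVALIDATION, ...
    fields["__EVENTTARGET"] = event_target
    fields["__EVENTARGUMENT"] = event_argument
    return urlencode(fields).encode("ascii")


def download_via_postback(page_url, event_target):
    # One opener with a cookie jar, so any session cookie set by the GET
    # is sent back with the POST.
    opener = build_opener(HTTPCookieProcessor(CookieJar()))
    with opener.open(page_url) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    req = Request(
        page_url,
        data=build_postback_body(html, event_target),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )
    with opener.open(req) as resp:
        return resp.read()  # the PDF bytes, if the postback succeeds
```

Fetching fresh state on every download, rather than replaying a captured body, is what keeps __EVENTVALIDATION consistent with the page the server just rendered.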
If I enter the URL below in a browser such as Firefox (FF) on my workstation, I get JSON data back containing my home address based on my ID, and the content is displayed in the browser.
http://host:port/get/my/address/id={xyz}
If I want to imitate the same operation inside a web application written in Java, I have to write specific HTTP GET request code in a servlet.
My question is:
Do modern web browsers like IE and FF have their own networking code, perhaps written in a language like C/C++, that runs inside the browser and knows how to handle HTTP GET/POST requests?
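Under the hood, a browser, like any HTTP client, opens a TCP connection and writes a plain-text request. The sketch below does the same thing by hand for a plain http:// URL (no TLS, no redirects, no status-code check) to show there is nothing browser-specific about it. It uses HTTP/1.0 so the server replies without chunked transfer encoding and closes the connection when done. In a Java servlet, java.net.HttpURLConnection (or java.net.http.HttpClient on Java 11+) plays the same role.

```python
import json
import socket
from urllib.parse import urlsplit


def build_get_request(url):
    """Return the raw bytes of an HTTP GET request, as a browser would send it."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    lines = [
        # HTTP/1.0 keeps the reply simple: no chunked transfer encoding.
        f"GET {target} HTTP/1.0",
        f"Host: {parts.netloc}",
        "Accept: application/json",
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def fetch_json(url, timeout=10):
    """Send the request over a plain TCP socket and decode the JSON body."""
    parts = urlsplit(url)
    with socket.create_connection((parts.hostname, parts.port or 80), timeout=timeout) as sock:
        sock.sendall(build_get_request(url))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closed the connection: response complete
                break
            chunks.append(data)
    raw = b"".join(chunks)
    _headers, _, body = raw.partition(b"\r\n\r\n")
    return json.loads(body)
```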
I need to know which requests a webpage sends. Basically, the site I call calls another service/API/URL, receives the data (probably in JavaScript), and shows it to me. Can I see all the calls it makes?
Edit: a concrete example:
On this site (http://www.flickriver.com/lenses/nikon/) you can choose a lens; at that moment the page sends a request to Flickr and gets all the data. But I could not see this request in Chrome developer tools.
Here is a screenshot of the GET requests. I have looked through them but could not see any request to Flickr.
The first is the request for the page, and the sixth is already the picture request, which fetches the picture by its ID. So the four requests in between should include a request to the external source that returns the picture ID, or am I missing something?
And what if the backend makes this request? Should I still expect to see it in developer tools?
No, of course you cannot see the calls made by one server to another server. Why would you expect to be able to do that? Those calls have nothing to do with the browser. The browser knows nothing about those requests; it knows only about requests it initiated itself. DevTools can only report on requests made by the browser. If there were some way to spy on the requests made by a server to another server, it would be a gaping security hole.
Part of a site I am working on at the moment requires audio/video previews.
These are served from a different server than the main site.
The Streaming URL is of the form:
www.myserver.com/Preview.aspx?e=I_AM_AN_ENCRYPTED_KEY
The Key is generated by the server that hosts the file, not the site on which the previews are actually displayed. It's kind of an API.
Part of the security meant to stop these previews from being played anywhere except this website is a check of the domain making the request, but it seems that HttpContext.Current.Request.UrlReferrer is null when the request comes from an HTML5 video/audio element.
Without posting the domain along with the Key to the API, is there any way I can get the referring URL server side, on the receiving server?
EDIT:
To clarify:
There is a website with HTML5 elements that point to a URL on a different server; the URL and key are provided by that server (not the website).
When the API server receives a request to stream the preview, it checks the Key (which basically tells it what to play) and also checks the referring domain against a list of allowed domains.
Figured it out - in case anyone cares...
Simply replace:
ReferringDomain = HttpContext.Current.Request.UrlReferrer
with:
ReferringDomain = HttpContext.Current.Request.Headers("Referer")
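The allowlist check described in the question can be sketched in Python by reading the raw Referer header, as the VB.NET fix does, and comparing its hostname to a set of allowed domains. The ALLOWED_DOMAINS values are hypothetical. The headers argument is assumed to be a mapping with a case-insensitive get(), such as self.headers in http.server; a plain dict only works if the key is spelled "Referer".

```python
from urllib.parse import urlsplit

ALLOWED_DOMAINS = {"www.example.com", "example.com"}  # hypothetical allowlist


def referring_domain(headers):
    """Return the host from the Referer header, or None if it is missing."""
    referer = headers.get("Referer")
    if not referer:
        return None
    return urlsplit(referer).hostname  # lowercased, port stripped


def is_allowed(headers, allowed=ALLOWED_DOMAINS):
    """True only when a Referer is present and its host is on the allowlist."""
    domain = referring_domain(headers)
    return domain is not None and domain in allowed
```

The Referer header is supplied by the client: browsers may omit it (for example under a Referrer-Policy), and non-browser clients can set it to anything, so this check works best alongside the encrypted key rather than instead of it.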
Sorted! :)
Just trying to understand why Google Analytics sends its data by requesting __utm.gif instead of using a REST API.
In REST, clients initiate requests to servers for resources; servers process those requests and return appropriate responses.
The __utm.gif is not involved in server-to-client data transfer; instead, it's involved in moving data in the other direction.
Of course, REST uses HTTP methods (such as GET and POST) for the client to communicate with servers, and indeed Google Analytics directs the client's browser to send all analytics data to the GA servers via a GET request. More precisely, a GET request consists of a request URL and request headers (e.g., the Referer and User-Agent headers).
All GA data, every single item, is assembled and packed into the request URL's query string (everything after the '?'). For that data to travel from the client, where it is created, to the GA server, where it is logged and aggregated, there must be an HTTP request. So ga.js (the Google Analytics script, which the client downloads unless it's cached, and which runs via a function called when the page loads) directs the client to assemble all of the analytics data (cookies, location bar, request headers, etc.), concatenate it into a single string, and append it as a query string to a URL (http://www.google-analytics.com/__utm.gif?). That becomes the request URL.
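The query-string packing described above can be sketched with urlencode. The parameter names here (page, title, rnd) are hypothetical placeholders, not the real ga.js parameter set. The extra random number mirrors the common trick of making every beacon URL unique so that no cache answers the request before it reaches the server.

```python
from urllib.parse import urlencode

BEACON = "http://www.google-analytics.com/__utm.gif"


def build_beacon_url(data, cache_buster):
    """Pack every analytics item into the query string of a GET for the GIF."""
    params = dict(data)
    params["rnd"] = str(cache_buster)  # hypothetical name: keeps each URL unique
    return BEACON + "?" + urlencode(params)
```

For example, build_beacon_url({"page": "/home", "title": "Home Page"}, 12345) produces http://www.google-analytics.com/__utm.gif?page=%2Fhome&title=Home+Page&rnd=12345. In the browser, the script then only has to point an image at that URL for the GET to be sent.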
Of course, there can't be an HTTP request without a resource, so what resource is the client requesting from the server? It doesn't need anything from the server; instead, it wants to send information to the server. So the actual server resource requested by the client is purely pretextual: the client doesn't even need it, and it is requested solely to comply with the transmission protocol. Therefore, it makes sense to make that resource as small and unobtrusive as possible, which is why it's a 1 x 1 transparent pixel in GIF format. It is the smallest possible image, and for an image this small GIF is the most compact common format; I think it's a little over 30 bytes. A 1 x 1 image in the other common formats (e.g., JPEG, PNG, TIFF) is larger.
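To put a number on "a little over 30 bytes", the sketch below assembles a valid 1 x 1 GIF89a byte by byte, following the GIF89a block layout. It is an illustration, not the actual __utm.gif file. With the transparency extension the result is 43 bytes; without it, 35 bytes.

```python
import struct


def tiny_gif(transparent=True):
    """Build a valid 1x1 GIF89a whose single pixel uses palette index 0."""
    header = b"GIF89a"
    # Logical screen: 1x1, global color table present with 2 entries (0x80).
    screen = struct.pack("<HHBBB", 1, 1, 0x80, 0, 0)
    palette = b"\x00\x00\x00" + b"\xff\xff\xff"  # index 0 = black, 1 = white
    gce = b""
    if transparent:
        # Graphic Control Extension: flag 0x01 marks index 0 as transparent.
        gce = b"\x21\xf9\x04" + struct.pack("<BHB", 0x01, 0, 0) + b"\x00"
    # Image descriptor: separator, left, top, width, height, no local table.
    descriptor = b"\x2c" + struct.pack("<HHHHB", 0, 0, 1, 1, 0)
    # LZW data: min code size 2, then one 2-byte sub-block holding the
    # 3-bit codes CLEAR(4), pixel 0, END(5) packed LSB-first, then terminator.
    image_data = b"\x02" + b"\x02\x44\x01" + b"\x00"
    trailer = b"\x3b"
    return header + screen + palette + gce + descriptor + image_data + trailer
```

Most of those bytes are fixed headers, so the image data itself is just two bytes.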
This general scheme for transferring data between a client and a server has been around forever; there could very well be a better way of doing this, but it's the only way I know of (that satisfies the constraints imposed by a hosted analytics service).
(Google Analytics does indeed have two APIs, "Data Export" and "Management", which are both RESTful web services.)
You can use __utm.gif in browsers that don't support javascript using the <noscript> tag (with some work on the server), as well as in email messages (with some work before sending the email).
How are you gonna make a REST request in an email message?
Because it's an image, you can stick it anywhere you can use an image tag, even if you can't execute JS. Many years back, Google pushed this for tracking email campaigns. You could put this formatted string in an HTML email message, and any client that displays the message will send that request to the GA servers. You will get at a minimum IP info (which gets you geolocation too); depending on the client, you may also get the OS, language, and other browser settings. You don't get all the fancy analytics of the modern JS tracking scripts, but it still has its uses.
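Embedding such a beacon URL in an email or a noscript block only takes an ordinary img tag. The one detail that is easy to get wrong is escaping: the '&' separators in the query string must become '&amp;' inside an HTML attribute. This sketch assumes the beacon URL has already been built; the helper names are hypothetical.

```python
from html import escape


def pixel_img_tag(beacon_url):
    """Wrap a tracking-beacon URL in a 1x1 <img> tag for HTML email."""
    # escape() turns '&' into '&amp;' and quotes into entities for the attribute.
    return f'<img src="{escape(beacon_url, quote=True)}" width="1" height="1" alt="">'


def noscript_pixel(beacon_url):
    """Same pixel, shown only to browsers that have JavaScript disabled."""
    return f"<noscript>{pixel_img_tag(beacon_url)}</noscript>"
```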
Here is a site that will help you format the request string and also has some more details.
Google pixel generator