What's image request with get parameters - http

While trying to understand how does the google keyword tool is requesting data I have found that it has a request for a .gif file with GET arguments.. for example:
https://ssl.google-analytics.com/__utm.gif?utmwv=&utms=&utmn=&utmhn=&utmt=&utme=&utmcs=utmsr=&utmvp=&utmsc=&utmul=&utmje=&utmfl=&utmdt=&utmhid=&utmr=&utmp=&utmac=&utmcc=&utmu=
(I have omitted all argument's data)
can someone please explain?

Although it's a request, it's purpose is to send analytics data in the query string parameters. For a good explanation, see Why does Google Analytics use __utm.gif?.
For more detail on what the actual parameters on the GIF request are, see: https://developers.google.com/analytics/resources/articles/gaTrackingTroubleshooting#gifParameters

This GET request gets handled by Google's analytics servers. It probably doesn't just directly serve __utm.gif from somewhere on the filesystem; it probably executes a script that takes all the parameters, does some processing, and logs that request in their analytics database, and then serves a 1x1 transparent GIF.

Related

head request returns different content-type [duplicate]

I would like to try send requests.get to this website:
requests.get('https://rent.591.com.tw')
and I always get
<Response [404]>
I knew this is a common problem and tried different way but still failed.
but all of other website is ok.
any suggestion?
Webservers are black boxes. They are permitted to return any valid HTTP response, based on your request, the time of day, the phase of the moon, or any other criteria they pick. If another HTTP client gets a different response, consistently, try to figure out what the differences are in the request that Python sends and the request the other client sends.
That means you need to:
Record all aspects of the working request
Record all aspects of the failing request
Try out what changes you can make to make the failing request more like the working request, and minimise those changes.
I usually point my requests to a http://httpbin.org endpoint, have it record the request, and then experiment.
For requests, there are several headers that are set automatically, and many of these you would not normally expect to have to change:
Host; this must be set to the hostname you are contacting, so that it can properly multi-host different sites. requests sets this one.
Content-Length and Content-Type, for POST requests, are usually set from the arguments you pass to requests. If these don't match, alter the arguments you pass in to requests (but watch out with multipart/* requests, which use a generated boundary recorded in the Content-Type header; leave generating that to requests).
Connection: leave this to the client to manage
Cookies: these are often set on an initial GET request, or after first logging into the site. Make sure you capture cookies with a requests.Session() object and that you are logged in (supplied credentials the same way the browser did).
Everything else is fair game but if requests has set a default value, then more often than not those defaults are not the issue. That said, I usually start with the User-Agent header and work my way up from there.
In this case, the site is filtering on the user agent, it looks like they are blacklisting Python, setting it to almost any other value already works:
>>> requests.get('https://rent.591.com.tw', headers={'User-Agent': 'Custom'})
<Response [200]>
Next, you need to take into account that requests is not a browser. requests is only a HTTP client, a browser does much, much more. A browser parses HTML for additional resources such as images, fonts, styling and scripts, loads those additional resources too, and executes scripts. Scripts can then alter what the browser displays and load additional resources. If your requests results don't match what you see in the browser, but the initial request the browser makes matches, then you'll need to figure out what other resources the browser has loaded and make additional requests with requests as needed. If all else fails, use a project like requests-html, which lets you run a URL through an actual, headless Chromium browser.
The site you are trying to contact makes an additional AJAX request to https://rent.591.com.tw/home/search/rsList?is_new_list=1&type=1&kind=0&searchtype=1&region=1, take that into account if you are trying to scrape data from this site.
Next, well-built sites will use security best-practices such as CSRF tokens, which require you to make requests in the right order (e.g. a GET request to retrieve a form before a POST to the handler) and handle cookies or otherwise extract the extra information a server expects to be passed from one request to another.
Last but not least, if a site is blocking scripts from making requests, they probably are either trying to enforce terms of service that prohibit scraping, or because they have an API they rather have you use. Check for either, and take into consideration that you might be blocked more effectively if you continue to scrape the site anyway.
One thing to note: I was using requests.get() to do some webscraping off of links I was reading from a file. What I didn't realise was that the links had a newline character (\n) when I read each line from the file.
If you're getting multiple links from a file instead of a Python data type like a string, make sure to strip any \r or \n characters before you call requests.get("your link"). In my case, I used
with open("filepath", 'w') as file:
links = file.read().splitlines()
for link in links:
response = requests.get(link)
In my case this was due to fact that the website address was recently changed, and I was provided the old website address. At least this changed the status code from 404 to 500, which, I think, is progress :)

What will the RightSignature API send to my callback URL when a signer signs a document

When I send a one-off document to RightSignature via their API, I'm specifying a callback location in the XML document as specified in RightSignature's schema definition. I then get a signer-link value back from their API for the document. I display the HTML response from the signer-link URL in an iFrame on our website. When our user signs the document in this iFrame, which is rendering the responses from their website, I want their website to post to our callback location.
Can I do this with the RightSignature API and does it make sense?
So far, I'm only getting content in the iFrame that indicates that the signing was successful. The callback location does not seem to be getting called.
I got it solved just now. Basically, i was doing two things wrong first you have to go in RightSignature Account and set it there the CallBack url
Account > Settings > Advanced Settings
But the thing which RS is unable to mention to us that this url can not be of localhost, but it should be of https i mean like Live URL of your site like
https://stagingmysite.azurewebsites.net/User/CallBackFunction
And then in your CallBack just write these two lines and you will receive complete XML which would have the GUID and document status as well.
byte[] data = Request.BinaryRead(Request.TotalBytes);
string callBackXML = System.Text.Encoding.UTF8.GetString(data);
I found the answer with some help from the API team at RightSignature. I was using callback_location but what I really wanted is redirect_location. Their online documentation was difficult to follow and did not clearly point out the difference.
I got this working after a lot of trial and error.

Logging into a webpage via HTTP Request

So I have a webpage, ("http://data.terapeak.com/verify/") and I don't see any & tags in the URL so I am unaware how to post data to this. I need to do this via HTTPRequest rather than browser control. I am creating a double threaded batch searching program. I have already successfully made this using a single browser control but that wont allow for multi-threading, atleast with my current knowledge due to the fact that even when creating a new frmBrw that already exists it needs for me to set the threat apartment to single. If i set it to single, I am unable to have it send the data the the excel sheet I need both threads to access. I hope this is clear... The basic question is how can I log into this form via HTTP request.
This isn't going to be easy to answer without further details however I suspect you'll need to provide the variables via a HTTP POST request.
Can you successfully login to this page in your browser? If so, run a proxy tool such as fiddler and check the HTTP headers it makes to the server. You should see the form variables being passed over. You then need to mimic this in code.
How to: Send Data Using the WebRequest Class
Hope this gets you started

How to find HTTP POST Data sent to a CGI Page?

I searched google for a good number of hours. Maybe I searched for the wrong keywords.
Here is what I want to do.
I'm posting data to a website which then makes a HTTP POST request and returns a .CGI webpage. I want to know the parameters the web page uses to send that HTTP POST request so that I can directly link a page from my Webpage to the final .CGI webpage by making the user enter the data on my own webpage.
How do I achieve it?
Usually the POST body is piped into STDIN, just read it as a normal file

Why does Google Analytics use __utm.gif?

Just trying to understand why they didn't use a REST API.
In REST, clients initiate requests to servers for resources; servers process those requests and return appropriate responses.
The utm.gif is not involved in server-to-client data transfer, but instead it's involved in moving data in the other direction.
Of course REST has HTTP methods for the client to communicate with servers (GET and POST) and indeed, Google Analytics directs the client's browser to send all analytics data to the GA servers via a GET Request. More precisely, a GET Request is comprised of a Request URL and Request Headers (e.g., Referer and User-Agent Headers).
All GA data--every single item--is assembled and packed into the Request URL's query string (everything after the '?'). But in order for that data to go from the client (where it is created) to the GA server (where it is logged and aggregated) there must be an HTTP Request, so the ga.js (google analytics script that's downloaded, unless it's cached, by the client, as a result of a function called when the page loads) directs the client to assemble all of the analytics data--e.g., cookies, location bar, request headers, etc.--concatenate it into a single string and append it as a query string to a URL (http://www.google-analytics.com/__utm.gif?) and that becomes the Request URL.
Of course there can't be an HTTP Request without a resource; so resource is the client requesting from the server? It doesn't need anything from the server, instead it wants to send information to the server. So the actual server resource requested by the client is purely pretextual--the resource isn't even needed by the client, it's solely requested to comply with the transmission protocol operator. Therefore, it makes sense to make that resource as small and as unobtrusive as possible, which is why it's a 1 x 1 transparent pixel in gif format. It is the smallest possible size and the least dense image format (bytes/pixel); I think it's a little over 30 bytes. A 1 x 1 image in the other common formats (e.g., jpeg, png, tiff) are larger.
This general scheme for transferring data between a client and a server has been around forever; there could very well be a better way of doing this, but it's the only way I know of (that satisfies the constraints imposed by a hosted analytics service).
(Google Analytics does indeed have two APIs--"Data Export" and "Management"--which are both RESTful Web Services.)
You can use __utm.gif in browsers that don't support javascript using the <noscript> tag (with some work on the server), as well as in email messages (with some work before sending the email).
How are you gonna make a REST request in an email message?
Because it's an image you can stick it anywhere you can use and image tag even if you can't execute JS. Many years back this Google pushed this for tracking of email campaigns. You could stick this formatted string in an html email message and then any client that displays the message will send that request to the GA servers and you will get at a minimum IP info (which get's you geo location also) depending on client you may also get OS, language and all the other browser settings. You don't get all the fancy analytics you get from the modern JS tracking scripts but if still has it's uses.
Here is a site that will help you format the request string and also has some more details.
Google pixel generator

Resources