Tamper with first line of URL request, in Firefox

I want to change the first line of my HTTP request (the request line), modifying the method and/or URL.
The (excellent) TamperData Firefox plugin allows a developer to modify the headers of a request, but not the URL itself. The latter is what I want to be able to do.
So something like...
GET http://foo.com/?foo=foo HTTP/1.1
... could become ...
GET http://bar.com/?bar=bar HTTP/1.1
For context, I need to tamper with (i.e. correct) an erroneous request from Flash, to see if the error can be fixed by correcting the URL.
Any ideas? It sounds like something that may need to be done at the proxy level, in which case, any suggestions?

Check out Charles Proxy (multiplatform) and/or Fiddler2 (Windows only) for client-side solutions; both run as a proxy and can modify requests before they are sent out to the server.
If you have access to the webserver and it's running Apache, you can set up some rewrite rules that will modify the URL before it gets processed by the main HTTP engine.
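To sketch the proxy-level approach concretely, here is a minimal example using mitmproxy, a Python-scriptable proxy (not mentioned above; named purely for illustration, and the hostnames are placeholders). A small addon can rewrite the request line before it leaves:
# rewrite_addon.py -- run with: mitmproxy -s rewrite_addon.py
# A minimal sketch: rewrites requests aimed at foo.com so they go to bar.com.
from mitmproxy import http

def request(flow: http.HTTPFlow) -> None:
    if flow.request.pretty_host == "foo.com":
        flow.request.host = "bar.com"     # retarget the host
        flow.request.path = "/?bar=bar"   # rewrite the path and query string
        flow.request.method = "GET"       # the method can be changed too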

For those coming to this page from a search engine, I would also recommend the Burp Proxy suite: http://www.portswigger.net/burp/proxy.html
Although more specifically targeted towards security testing, it's still an invaluable tool.

If you're trying to intercept the HTTP packets and modify them on the way out, then TamperData may be the route you want to take.
However, if you want minute control over these things, you'd be much better off simulating the entire browser session using a utility such as curl.
Curl: http://curl.haxx.se/

head request returns different content-type [duplicate]

I would like to send a requests.get to this website:
requests.get('https://rent.591.com.tw')
and I always get
<Response [404]>
I know this is a common problem and have tried different approaches, but still failed.
All other websites work fine, though.
Any suggestions?
Webservers are black boxes. They are permitted to return any valid HTTP response, based on your request, the time of day, the phase of the moon, or any other criteria they pick. If another HTTP client gets a different response, consistently, try to figure out what the differences are in the request that Python sends and the request the other client sends.
That means you need to:
Record all aspects of the working request
Record all aspects of the failing request
Try out what changes you can make to make the failing request more like the working request, and minimise those changes.
I usually point my requests to an http://httpbin.org endpoint, have it record the request, and then experiment.
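For example, a quick way to see exactly what requests sends (a minimal sketch; httpbin's /get endpoint echoes the request back as JSON):
import requests

# httpbin.org/get echoes the request's headers and query string back as JSON,
# so you can compare what requests sends with what a working client sends.
response = requests.get("https://httpbin.org/get", params={"demo": "1"})
print(response.json()["headers"])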
For requests, there are several headers that are set automatically, and many of these you would not normally expect to have to change:
Host: this must be set to the hostname you are contacting, so that the server can properly multi-host different sites. requests sets this one.
Content-Length and Content-Type, for POST requests, are usually set from the arguments you pass to requests. If these don't match, alter the arguments you pass in to requests (but watch out with multipart/* requests, which use a generated boundary recorded in the Content-Type header; leave generating that to requests).
Connection: leave this to the client to manage
Cookies: these are often set on an initial GET request, or after first logging in to the site. Make sure you capture cookies with a requests.Session() object and that you are logged in (supplying credentials the same way the browser did).
Everything else is fair game but if requests has set a default value, then more often than not those defaults are not the issue. That said, I usually start with the User-Agent header and work my way up from there.
In this case, the site is filtering on the user agent. It looks like they are blacklisting Python; setting the header to almost any other value already works:
>>> requests.get('https://rent.591.com.tw', headers={'User-Agent': 'Custom'})
<Response [200]>
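To carry cookies across requests as described above, a minimal sketch (the login URL and form field names are placeholders, not taken from this site):
import requests

session = requests.Session()              # persists cookies across requests
session.headers["User-Agent"] = "Custom"  # reuse the working User-Agent

# Hypothetical login step; the URL and field names are illustrative only.
session.post("https://example.com/login",
             data={"username": "me", "password": "secret"})

# Later requests automatically send back any cookies the server set.
response = session.get("https://example.com/protected")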
Next, you need to take into account that requests is not a browser. requests is only a HTTP client, a browser does much, much more. A browser parses HTML for additional resources such as images, fonts, styling and scripts, loads those additional resources too, and executes scripts. Scripts can then alter what the browser displays and load additional resources. If your requests results don't match what you see in the browser, but the initial request the browser makes matches, then you'll need to figure out what other resources the browser has loaded and make additional requests with requests as needed. If all else fails, use a project like requests-html, which lets you run a URL through an actual, headless Chromium browser.
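A minimal sketch of the requests-html route (note that the first render() call downloads a Chromium build):
from requests_html import HTMLSession

session = HTMLSession()
response = session.get("https://rent.591.com.tw")
response.html.render()     # executes the page's JavaScript in headless Chromium
print(response.html.html)  # the HTML after scripts have run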
The site you are trying to contact makes an additional AJAX request to https://rent.591.com.tw/home/search/rsList?is_new_list=1&type=1&kind=0&searchtype=1&region=1, take that into account if you are trying to scrape data from this site.
Next, well-built sites will use security best-practices such as CSRF tokens, which require you to make requests in the right order (e.g. a GET request to retrieve a form before a POST to the handler) and handle cookies or otherwise extract the extra information a server expects to be passed from one request to another.
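A hedged sketch of that GET-before-POST pattern (the URLs and the csrf_token field name are hypothetical; real sites vary):
import re
import requests

session = requests.Session()

# GET the form first, so the server sets its cookies and issues a token.
form_page = session.get("https://example.com/form")

# Pull a hypothetical hidden-input token out of the form's HTML.
match = re.search(r'name="csrf_token" value="([^"]+)"', form_page.text)
token = match.group(1) if match else ""

# Send the token back with the POST, alongside the session's cookies.
session.post("https://example.com/submit",
             data={"csrf_token": token, "field": "value"})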
Last but not least, if a site is blocking scripts from making requests, it is probably either trying to enforce terms of service that prohibit scraping, or it has an API it would rather have you use. Check for either, and take into consideration that you may be blocked more effectively if you continue to scrape the site anyway.
One thing to note: I was using requests.get() to do some web scraping off links I was reading from a file. What I didn't realise was that the links had a newline character (\n) when I read each line from the file.
If you're getting multiple links from a file instead of from a Python data type like a string, make sure to strip any \r or \n characters before you call requests.get("your link"). In my case, I used:
import requests

with open("filepath") as file:             # open for reading ('w' would truncate the file)
    links = file.read().splitlines()       # splitlines() drops each trailing newline
for link in links:
    response = requests.get(link.strip())  # strip() removes any leftover \r or whitespace
In my case this was due to the fact that the website address had recently changed, and I had been given the old address. At least this changed the status code from 404 to 500, which, I think, is progress :)

What happens when a HTTP request uses different browser headers?

I'm trying to understand how an IIS server handles different browsers in the header of an HTTP request.
The situation is that I have some load tests set up that fire off HTTP requests to an IIS server, constructing them and sending them over the wire. My code allows me to specify the browser in the header, but I'm not sure what that would actually change.
So what does IIS do with that particular information in the header?
As far as I am aware, IIS doesn't actually do anything with the header.
You can create rules to explicitly handle a type of browser; this is pretty useful if, for example, you block traffic from certain countries but still want to allow bots through.
It's also useful to have this information in the log files.

HTTP Response before Request

My question might sound stupid, but I just wanted to be sure:
Is it possible to send an HTTP response before having the request for that resource?
Say for example you have an HTML page index.html that only shows a picture called img.jpg.
Now, if your server knows that a visitor will request the HTML file and then the jpg image every time:
Would it be possible for the server to send the image just after the HTML file to save time?
I know that HTTP is a synchronous protocol, so in theory it should not work, but I just wanted someone to confirm it (or not).
A recent post by Jacques Mattheij, referencing your very question, claims that although HTTP was designed as a synchronous protocol, the implementation was not. In practice, the browser (he doesn't specify which exactly) accepts answers to requests that have not been sent yet.
On the other hand, if you are looking for something less hacky, you could have a look at:
push techniques that allow the server to send content to the browser. The modern implementation that replaces long-polling/Comet "hacks" is WebSockets. You may want to have a look at socket.io as well.
Alternatively, you may want to have a look at client-side routing. Some implementations combine this with caching techniques (as in derby.js, I believe).
If someone requests /index.html and you send two responses (one for /index.html and the other for /img.jpg), how do you know the recipient will get the two responses and know what to do with them before the second request goes in?
The problem is not really with the sending. The problem is with the receiver possibly getting unexpected data.
One other issue is that you're denying the client the ability to use HTTP caching tools like If-Modified-Since and If-None-Match (i.e. the client might not want /img.jpg to be sent because it already has a cached copy).
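To illustrate what the client would lose, a minimal sketch of a conditional request (the URL is a placeholder):
import requests

# First fetch: the server may attach a validator (ETag) to the response.
first = requests.get("https://example.com/img.jpg")
etag = first.headers.get("ETag")

# Later fetch: send the validator back. A 304 response means "use your
# cached copy" and saves transferring the image body again.
if etag:
    second = requests.get("https://example.com/img.jpg",
                          headers={"If-None-Match": etag})
    print(second.status_code)  # 304 if the resource is unchanged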
That said, you can approximate the server-push benefits by using Comet techniques. But that is much more involved than simply anticipating incoming HTTP requests.
You'll get a better result by caching resources effectively, i.e. setting proper cache headers and configuring your web server for caching. You can also inline images using base 64 encoding, if that's a specific concern.
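A minimal sketch of inlining an image as a base64 data URI (the file name is a placeholder):
import base64

# Embed the image bytes directly in the page, so no second request is needed.
with open("img.jpg", "rb") as f:
    encoded = base64.b64encode(f.read()).decode("ascii")

html = '<img src="data:image/jpeg;base64,{}" alt="inlined">'.format(encoded)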
You can also look at long-polling JavaScript solutions.
You're looking for server push: it isn't available in HTTP. Protocols like SPDY have it, but you're out of luck if you're restricted to HTTP.
I don't think it is possible to mix .html and image data in the same HTTP response. As for sending image data 'immediately', right after the first request: there is a concept of 'static resources' which could be of help (but it will require the client to create a new request for a specific resource).
There are a couple of interesting things mentioned in the article.
No it is not possible.
The first line of the request holds the resource being requested so you wouldn't know what to respond with unless you examined the bytes (at least one line's worth) of the request first.
No. HTTP is defined as a request/response protocol. One request: one response. Anything else is not HTTP, it is something else, and you would have to specify it properly and implement it completely at both ends.

How to spoof http referer

As of now, are there still any methods to spoof the HTTP referer?
Yes.
The HTTP_REFERER is data passed by the client. Any data passed by the client can be spoofed/forged. This includes HTTP_USER_AGENT.
If you wrote the web browser, you're setting and sending the HTTP Referer and User-Agent headers on the GET, POST, etc.
You can also use middleware such as a web proxy to alter these. Fiddler lets you control these values.
If you want to redirect a visitor to another website and set their browser's referrer to any value you desire, you'll need to develop a web browser-plugin or some other type of application that runs on their computer. Otherwise, you cannot set the referrer on the visitor's browser. It will show the page from your site that linked to it.
What might be a valid solution in your case would be for you to load the third party page on the visitor's behalf, using whatever referrer is necessary, then display the page to the user from your server.
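A minimal sketch of that server-side approach (the URLs and the Referer value are placeholders):
import requests

# Fetch the third-party page on the visitor's behalf, sending whatever
# Referer value is needed, then serve resp.text back to the visitor.
resp = requests.get("https://thirdparty.example.com/page",
                    headers={"Referer": "https://example.com/some-page"})
page_html = resp.text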
Yes, the HTTP referer header can be spoofed.
A common way to play with HTTP headers is to use a tool like cURL:
Sending headers using cURL:
How to send a header using a HTTP request through a curl call?
or
The cURL docs:
http://curl.haxx.se/docs/
Yes, of course. The browser can avoid sending it, and it can also be "spoofed". There's an add-on for Firefox (I haven't tried it myself), and you can likely also use something like Privoxy (though it is harder to make the value change dynamically). With other tools like wget, it is as easy as setting the proper option.

Is there any way to check if a POST url exists?

Is there any way to determine if a POST endpoint exists without actually sending a POST request?
For GET endpoints, it's no problem to check for 404s, but I'd like to check POST endpoints without triggering whatever action resides at the remote URL.
Sending an OPTIONS request may work
It may not be implemented widely but the standard way to do this is via the OPTIONS verb.
WARNING: This should be idempotent but a non-compliant server may do very bad things
OPTIONS
Returns the HTTP methods that the server supports for the specified URL. This can be used to check the functionality of a web server by requesting '*' instead of a specific resource.
More information here
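A minimal sketch of that check (the URL is a placeholder; not every server implements OPTIONS or returns an Allow header):
import requests

# Ask the server which methods it supports at this URL.
response = requests.options("https://example.com/endpoint")

# Compliant servers answer with an Allow header, e.g. "GET, POST, OPTIONS".
allow = response.headers.get("Allow", "")
print("POST" in [m.strip() for m in allow.split(",")])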
This is not possible by definition.
The URL that you're posting to could be run by anything, and there is no requirement that the server behave consistently.
The best you could do is to send a GET and see what happens; however, this will result in both false positives and false negatives.
You could send a HEAD request, if the server you are calling support it - the response will typically be way smaller than a GET.
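A minimal sketch (placeholder URL); note that a HEAD response tells you the URL exists, not that POST specifically is accepted:
import requests

# HEAD returns the same status line and headers as GET, but no body.
response = requests.head("https://example.com/endpoint")
print(response.status_code)  # a 404 suggests the endpoint does not exist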
Does endpoint = script? It was a little confusing.
I would first point out: why would you be POSTing somewhere if it doesn't exist? It seems a little silly.
Anyway, if there is really some element of uncertainty about your POST URL, you can use cURL and inspect the response headers. I would suggest that if you do this, you save all validated POST URLs if it's likely they will be used again.
You can send your entire POST in the same cURL call and then check whether it errored out.
I think you probably answered this question yourself with the cURL tag on your question.
