Tool to check if web server is following HTTP standard? - http

I'm writing a simple HTTP server which should play nicely with most clients, but is only meant to implement a subset of HTTP 1.1.
During development it would be nice to be able to validate that the generated HTTP responses are HTTP 1.1 compliant. Is there a tool that can do something along those lines?
Thanks /Erik

It's not a complete conformance suite, but RED does check for a number of different HTTP requirements, and finds common problems.
http://redbot.org/

You could just write unit test cases using any http client library. Make GET and POST requests to your webserver, parse the response and make assertions. As you add additional features, add more test cases.
For example, let's say you only support url-encoded POST requests. So you write a test case which verifies your server understands url-encoded requests and responds appropriately. Tomorrow, when you add multipart support, that would be another test case altogether.
Every programming language under the sun has good support for HTTP, so writing the test case is a no-brainer.
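As a rough illustration, here is a minimal sketch of such a test in Java (using the java.net.http client from Java 11); the endpoint URLs are hypothetical, and the hand-rolled assertions stand in for whatever test framework you actually use:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ServerSmokeTest {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();

        // Plain GET: assert the status code and a header we expect.
        HttpRequest get = HttpRequest.newBuilder(
                URI.create("http://localhost:8080/hello"))   // hypothetical endpoint
                .GET()
                .build();
        HttpResponse<String> response =
                client.send(get, HttpResponse.BodyHandlers.ofString());
        assertEquals(200, response.statusCode());
        assertTrue(response.headers().firstValue("Content-Type").isPresent());

        // URL-encoded POST, as in the example above.
        HttpRequest post = HttpRequest.newBuilder(
                URI.create("http://localhost:8080/form"))    // hypothetical endpoint
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString("name=Erik&page=1"))
                .build();
        assertEquals(200, client.send(post, HttpResponse.BodyHandlers.ofString()).statusCode());
    }

    static void assertEquals(int expected, int actual) {
        if (expected != actual) throw new AssertionError(expected + " != " + actual);
    }
    static void assertTrue(boolean condition) {
        if (!condition) throw new AssertionError();
    }
}

Each new feature then becomes one more request/assertion pair in the suite.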

Related

What is the difference between Requests and Requests-html?

I have to give a seminar on Requests and Requests-html. I have searched for this but can't find a clear comparison anywhere. Both Requests and Requests-html have the same methods, so what is the difference?
Requests-HTML helps you to parse the contents of a webpage (aka web scraping). You can connect to a webpage and parse its contents: links, raw data, searches for specific terms. Generally, it is used for data analysis purposes and requires less technical expertise than Requests.
Requests helps you to make HTTP calls programmatically. You can send GET/POST et al. requests just like curl commands and receive a response to be processed by some logic. Generally, backend API developers use it, and it requires technical knowledge of how HTTP works.

HTTP GET and POST semantics and limitations

Earlier this week, I had to do something which feels like a semantics violation. Let me explain.
I was making a simple AJAX client application, which was to make a request to a service with a given number of parameters. Since the whole app is basically read-only, I thought that using HTTP GET was the way to go. Some of the parameters that I had to pass were simple (such as the sort order, or page number).
However, one of the required parameters could be of variable length, and this made me worry. Since I was encoding all of the parameters in the querystring of the GET request, it seemed to me that this placed an unnecessary upper limit of (roughly) 2000 characters for the request URL. And regardless, I didn't like seeing 500-character-long request URLs.
So, since a POST request doesn't have a limitation like that, I decided to switch. But this doesn't feel right. I am under the impression that a POST denotes modification of data - but I'm using it for a simple read-only request.
Is there a better way to do this? To perform a GET, with many parameters? I've heard of one method - where you perform a preliminary POST of the parameters themselves, and then perform a GET. But, this technique leaves much to be desired.
But looking past this specific case, what are the real semantics and limitations of HTTP request methods? And why does GET not support any kind of parameter payload? Using the querystring in the URL almost feels like a hack to me.
A few points on this issue:
The HTTP spec (RFC 2616) doesn't forbid GET requests from carrying a parameter payload in the body, so it's not a matter of the semantics of HTTP GET itself. However, many HTTP stacks (clients, servers, proxies) forbid bodies in GET requests; the fact that you can't use them is more an implementation detail (albeit a quite prevalent one) than a semantic issue with HTTP GET requests.
Similarly, the limit on URI (or query string) length isn't specified in the RFC either. It's mostly a security mitigation implemented by several HTTP server stacks to prevent a bad client from consuming server resources (for example, in IIS/ASP.NET the default limit is 2k, but you can increase it via some elements in web.config). Again, it's a practical issue, not a semantic one.
POST requests do indicate data modification if you're following the REST philosophy, but there are many examples of HTTP POST requests used for read-only operations. SOAP uses POST in all of its requests, regardless of whether the operation it is calling is a "safe" or a "modifying" one. So you can use POST for those operations as well. However, by deviating from the REST (and the "canonical" HTTP) usage, you'll lose some of the features of the protocol, such as caching which can be applied for GET requests, but not for POST.
Your example of using two requests (POST with parameters + GET to "get" the results) seems overkill. As I mentioned, POST requests don't necessarily mean modifying resources, so you don't have to create a new "protocol" (POST+GET) to access your operation when one request is enough.
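For completeness, here is a minimal sketch in Java (the service URL and parameters are hypothetical) of the single-request approach the answer recommends: sending the same read-only query as a url-encoded POST body instead of an oversized query string:

import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class LongQueryPost {
    public static void main(String[] args) throws Exception {
        // A parameter too long to put comfortably in a query string.
        String ids = URLEncoder.encode("1,2,3,4,5,6,7,8,9,10", StandardCharsets.UTF_8);
        String body = "sort=asc&page=1&ids=" + ids;

        HttpRequest request = HttpRequest.newBuilder(
                URI.create("http://example.com/search"))   // hypothetical service
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}

The trade-off noted above still applies: by switching to POST you give up HTTP caching of the results.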

What is the difference between REST and HTTP protocols?

What is the REST protocol, and how does it differ from the HTTP protocol?
REST is a design style for protocols. It was developed by Roy Fielding in his PhD dissertation and formalised the approach behind HTTP/1.0: finding what worked well with it, and then using this more structured understanding of it to influence the design of HTTP/1.1. So, while it was after-the-fact in a lot of ways, REST is the design style behind HTTP.
Fielding's dissertation can be found at http://www.ics.uci.edu/~fielding/pubs/dissertation/top.htm and is very much worth reading, and also very readable. PhD dissertations can be pretty hard going, but this one is wonderfully well described and readable even to those of us without a comparable level of computer science. It helps that REST itself is pretty simple; it's one of those things that are obvious once someone else has come up with it. (It also, for that matter, encapsulates in one simple style a lot of things that older web developers learnt the hard way themselves, which made reading it a major "a ha!" moment for many.)
Application-level protocols other than HTTP can also use REST, but HTTP is the classic example.
Because HTTP uses REST, all uses of HTTP are using a REST system. The description of a web application or service as RESTful or non-RESTful relates to whether it takes advantage of REST or works against it.
The classic example of a RESTful system is a "plain" website without cookies (cookies aren't always counter to REST, but they can be): client state is changed by the user clicking a link which loads another page, or by doing GET form queries which bring results. POST form queries can change both server and client state (the server does something on the basis of the POST, and then sends a hypertext document describing the new state). URIs describe resources, but the entity (document) describing a resource may differ according to the content-type or language preferred by the user. Finally, it has always been possible for browsers to update a page itself through PUT and DELETE, though this has never been very common, and is if anything less so now.
The classic example of a non-RESTful system using HTTP is something which treats HTTP as if it was a transport protocol, and with every request sends a POST of data to the same URI which is then acted upon in an RPC-like manner, possibly with the connection itself having shared state.
A RESTful computer-readable system (i.e. not a website in a browser, but something used programmatically) would obtain information about the resources concerned by GETting a URI, which would return a document (e.g. in XML, but not necessarily) describing the state of the resource, including URIs to related resources (hypermedia, therefore); it would change their state by PUTting entities describing the new state or by DELETEing them, and would have other actions performed by POSTing.
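As a rough sketch of what that looks like programmatically (Java 11's HTTP client; the resource URI and the XML entities are assumed purely for illustration):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RestfulClientSketch {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        URI resource = URI.create("http://example.com/orders/42"); // hypothetical resource

        // GET the current state of the resource as a document.
        HttpResponse<String> state = client.send(
                HttpRequest.newBuilder(resource).GET().build(),
                HttpResponse.BodyHandlers.ofString());
        System.out.println(state.body()); // e.g. XML containing URIs to related resources

        // PUT an entity describing the new state.
        HttpRequest put = HttpRequest.newBuilder(resource)
                .header("Content-Type", "application/xml")
                .PUT(HttpRequest.BodyPublishers.ofString("<order><status>shipped</status></order>"))
                .build();
        client.send(put, HttpResponse.BodyHandlers.ofString());

        // DELETE the resource when it is no longer needed.
        client.send(HttpRequest.newBuilder(resource).DELETE().build(),
                HttpResponse.BodyHandlers.discarding());
    }
}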
Key advantages are:
Scalability: The lack of shared state makes for a much more scalable system. This was demonstrated to me massively when I removed all use of session state from a heavily hit website. While I was expecting it to give a bit of extra performance, even a long-time anti-session advocate like myself was blown away by the massive gain from removing what had been pretty slim use of sessions; it wasn't even why I had been removing them!
Simplicity: There are a few different ways in which REST is simpler than more RPC-like models; in particular, there are only a few "verbs" that are ever possible, and each type of resource can be reasoned about in reasonable isolation from the others.
Lightweight Entities: More RPC-like models tend to end up with a lot of data in the entities sent both ways, just to reflect the RPC-like model. This isn't needed. Indeed, sometimes a simple plain-text document is all that is really needed in a given case, in which case, with REST, that's all we would need to send (though this would be an "end-result" case only, since plain text doesn't link to related resources). Another classic example is a request to obtain an image file: RPC-like models generally have to wrap it in another format, and perhaps encode it in some way to let it sit within the parent format (e.g. if the RPC-like model uses XML, the image will need to be base64-encoded or similar to fit into valid XML). A RESTful model would just transmit the file the same as it does to a browser.
Human Readable Results: Not necessarily so, but it is often easy to build a RESTful webservice where the results are relatively easy to read, which aids debugging and development no end. I've even built one where an XSLT meant that the entire thing could be used by humans as a (relatively crude) website, though it wasn't primarily for human use. (Essentially, the XSLT served as a client to present it to users; it wasn't even in the spec, just done to make my own development easier!)
Looser binding between server and client: Leads to easier later development or moves in how the system is hosted. Indeed, if you keep to the hypertext model, you can change the entire structure, including moving from single-host to multiple hosts for different services, without changing client code at all.
Caching: For the GET operations where the client obtains information about the state of a resource, standard HTTP caching mechanisms allow either a statement that the resource won't meaningfully change until a certain date at the earliest (no need to query at all until then), or that it hasn't changed since the last query (send a couple of hundred bytes of headers saying so rather than several kilobytes of data). The improvement in performance can be immense, in some cases big enough to move something from the point where it is impractical to use to the point where performance is no longer a concern. A minimal conditional-GET sketch follows this list.
Availability of toolkits: Because it works at a relatively simple level, if you have a webserver you can build a RESTful system's server, and if you have any sort of HTTP client API (XHR in browser JavaScript, HttpWebRequest in .NET, etc.) you can build a RESTful system's client.
Resilience: In particular, the lack of shared state means that a client can die and come back into use without the server knowing, and even the server can die and come back into use without the client knowing. Obviously communications during that period will fail, but once the server is back online things can just continue as they were. This also really simplifies the use of web farms for redundancy and performance: each server acts like it's the only server there is, and it doesn't matter that it's actually only dealing with a fraction of the requests from a given client.
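Here is the minimal conditional-GET sketch referred to under Caching above (Java 11 HTTP client; the resource URI is hypothetical, and the server is assumed to emit an ETag validator):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ConditionalGet {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        URI resource = URI.create("http://example.com/orders/42"); // hypothetical resource

        // First GET: remember the validator the server sent.
        HttpResponse<String> first = client.send(
                HttpRequest.newBuilder(resource).GET().build(),
                HttpResponse.BodyHandlers.ofString());
        String etag = first.headers().firstValue("ETag").orElse(null);

        // Revalidate: if nothing changed, the server answers 304 with no body,
        // a couple of hundred bytes of headers instead of kilobytes of data.
        if (etag != null) {
            HttpResponse<String> second = client.send(
                    HttpRequest.newBuilder(resource)
                            .header("If-None-Match", etag)
                            .GET().build(),
                    HttpResponse.BodyHandlers.ofString());
            System.out.println(second.statusCode()); // 304 if unchanged
        }
    }
}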
REST is an approach that leverages the HTTP protocol, and is not an alternative to it.
http://en.wikipedia.org/wiki/Representational_State_Transfer
Data is uniquely referenced by URL and can be acted upon using HTTP operations (GET, PUT, POST, DELETE, etc.). A wide variety of MIME types are supported for the message/response, but XML and JSON are the most common.
For example, to read data about a customer you could use an HTTP GET operation with the URL http://www.example.com/customers/1. If you want to delete that customer, simply use the HTTP DELETE operation with the same URL.
The Java code below demonstrates how to make a REST call over the HTTP protocol (HttpURLConnection plus JAXB for unmarshalling; imports added for completeness):

import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import javax.xml.bind.JAXBContext;

// GET the customer resource as XML...
String uri = "http://www.example.com/customers/1";
URL url = new URL(uri);
HttpURLConnection connection = (HttpURLConnection) url.openConnection();
connection.setRequestMethod("GET");
connection.setRequestProperty("Accept", "application/xml");

// ...and unmarshal the response into a Customer object.
JAXBContext jc = JAXBContext.newInstance(Customer.class);
InputStream xml = connection.getInputStream();
Customer customer = (Customer) jc.createUnmarshaller().unmarshal(xml);
connection.disconnect();
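To round the example out, deleting that same customer, as described above, could look like the sketch below, in the same HttpURLConnection style (the status codes shown are typical success responses, not something every server guarantees):

// Hypothetical sketch: DELETE the same customer resource.
HttpURLConnection delete =
        (HttpURLConnection) new URL("http://www.example.com/customers/1").openConnection();
delete.setRequestMethod("DELETE");
int status = delete.getResponseCode(); // often 200 or 204 on success
delete.disconnect();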
For a Java (JAX-RS) example see:
http://bdoughan.blogspot.com/2010/08/creating-restful-web-service-part-45.html
REST is not a protocol, it is a generalized architecture for describing a stateless, caching client-server distributed-media platform. A REST architecture can be implemented using a number of different communication protocols, though HTTP is by far the most common.
REST is not a protocol, it is a way of exposing your application, mostly done over HTTP.
For example, say you want to expose an API of your application that does getClientById.
Instead of creating a URL like
yourapi.com/getClientById?id=4
you can do
yourapi.com/clients/id/4
Since you are using the GET method, it means that you want to GET data.
You take advantage of the HTTP methods: GET/DELETE/PUT.
yourapi.com/clients/id/4 can also deal with deletion: if you send a DELETE method instead of GET, it means you want to delete the record.
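To make that concrete, a hypothetical JAX-RS resource class along these lines would map both verbs onto the same URI (the class, paths, and JSON body here are assumptions purely for illustration):

import javax.ws.rs.DELETE;
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.Produces;

// Hypothetical resource: the URI names the client, the HTTP method says what to do.
@Path("/clients")
public class ClientResource {

    @GET
    @Path("id/{id}")
    @Produces("application/json")
    public String getClientById(@PathParam("id") int id) {
        // A real implementation would look the client up in a data store.
        return "{\"id\": " + id + "}";
    }

    @DELETE
    @Path("id/{id}")
    public void deleteClientById(@PathParam("id") int id) {
        // Same URI, different verb: remove the record.
    }
}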
All the answers are good.
I hereby add a detailed description of REST and how it uses HTTP.
REST = Representational State Transfer
REST is a set of rules, that when followed, enable you to build a distributed application that has a specific set of desirable constraints.
It is stateless, which means that ideally no connection state should be maintained between the client and server between requests.
It is the responsibility of the client to pass its context to the server, and the server can then store this context to process the client's further requests. For example, a session maintained by the server is identified by a session identifier passed by the client.
Advantages of Statelessness:
Web Services can treat each method call separately.
Web Services need not maintain the client's previous interactions.
This in turn simplifies application design.
HTTP is itself a stateless protocol, unlike TCP, and thus RESTful Web Services work seamlessly with it.
Disadvantages of Statelessness:
An extra layer in the form of headers needs to be added to every request to preserve the client's state.
For security, we may need to add header information to every request.
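As a small illustration of the client carrying its own context, here is a sketch (Java 11 HTTP client; the endpoint and token are hypothetical) of attaching an identifying header to every request instead of relying on server-side session state:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class StatelessCall {
    public static void main(String[] args) throws Exception {
        // The client carries its own context (here a hypothetical token)
        // on every request; the server keeps nothing between calls.
        HttpRequest request = HttpRequest.newBuilder(
                URI.create("http://example.com/orders"))   // hypothetical endpoint
                .header("Authorization", "Bearer abc123")  // hypothetical token
                .GET()
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
    }
}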
HTTP Methods supported by REST:
GET: /string/someotherstring
It is idempotent (multiple calls should return the same results every time) and safe: it should not change any state on the server.
PUT:
Like GET, it is idempotent; it is used to update or replace resources.
POST: should contain a URL and body
Used for creating resources. It is not idempotent: multiple calls should ideally return different results and will create multiple resources.
DELETE:
Used to delete resources on the server.
HEAD:
The HEAD method is identical to GET except that the server MUST NOT return a message-body in the response. The meta information contained in the HTTP headers in response to a HEAD request SHOULD be identical to the information sent in response to a GET request.
OPTIONS:
This method allows the client to determine the options and/or requirements associated with a resource, or the capabilities of a server, without implying a resource action or initiating a resource retrieval.
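A quick sketch of OPTIONS in practice (Java 11 HTTP client; the resource URI is hypothetical): ask the server what a resource supports and read the Allow header it typically returns:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class OptionsProbe {
    public static void main(String[] args) throws Exception {
        // Ask the server which methods a resource supports, without touching it.
        HttpRequest options = HttpRequest.newBuilder(
                URI.create("http://example.com/orders/42"))   // hypothetical resource
                .method("OPTIONS", HttpRequest.BodyPublishers.noBody())
                .build();
        HttpResponse<Void> response = HttpClient.newHttpClient()
                .send(options, HttpResponse.BodyHandlers.discarding());
        // Servers typically answer with an Allow header, e.g. "GET, PUT, DELETE".
        System.out.println(response.headers().firstValue("Allow").orElse("(none)"));
    }
}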
HTTP Responses
See the full list of HTTP status codes for all the responses.
Here are a few important ones:
200 - OK
3XX - Redirection; additional action needed from the client
400 - Bad Request
401 - Unauthorized
403 - Forbidden
The request was valid, but the server is refusing action. The user might not have the necessary permissions for the resource, or may need an account of some sort.
404 - Not Found
The requested resource could not be found but may be available in the future. Subsequent requests by the client are permissible.
405 - Method Not Allowed
A request method is not supported for the requested resource; for example, a GET request on a form that requires data to be presented via POST, or a PUT request on a read-only resource.
500 - Internal Server Error
502 - Bad Gateway

AMF and Cross Site scripting vulnerabilty confusion

I just got hammered on a Security Audit by Deloitte on behalf of SFDC. Basically we use flex and communicate via AMF. We use FluorineFX for this (as opposed to LCDS and Blaze). We are being told that because the AMF response is not encoded and that someone can manipulate the AMF parameters and insert Javascript that this is a XSS vulnerability. I'm struggling to understand how the AMF response back, which could echo the passed in JS in an error message, can be executed by the browser or anything else for that matter. I'm quite experienced with XSS with HTML and JS but seeing it get tagged with AMF was a bit of a surprise. I'm in touch with FluorineFx team and they are perplexed as well.
I'd be surprised to see an AMF library encode the response data, Fluorine surely does not. It would seem though that security applications like PortSwigger and IBM AppScan are including this type of test in their tool chest. Have you run into this vulnerability with AMF and can you explain how the XSS issue can manifest itself? Just curious. I need to either argue my way out of this if an argument exists or patch the hole. Given the AMF usage with Flex I thought you might have some insight.
Additional information ...
So, a little more on this from the actual vendor, PortSwigger. I posed the question to them and, net-net, they concede this type of attack is extremely complicated. Initially they classified this as a High Severity security issue, but I think their tune is changing now. I thought I'd post the content of their response for you all, as I think the perspective is interesting nonetheless.
--- From PortSwigger on the issue ---
Thanks for your message. I think the answer is that this is potentially a
vulnerability, but is not trivial to exploit.
You're right, the issue wouldn't arise when the response is consumed by an
AMF client (unless it does something dumb), but rather if an attacker could
engineer a situation where the response is consumed by a browser. Most
browsers will overlook the HTTP Content-Type header, and will look at the
actual response content, and if it looks at all like HTML will happily
process it as such. Historically, numerous attacks have existed where people
embed HTML/JS content within other response formats (XML, images, other
application content) and this is executed as such by the browser.
So the issue is not so much the format of the response, but rather the
format of the request required to produce it. It's not trivial for an
attacker to engineer a cross-domain request containing a valid AMF message.
A similar thing arises with XML requests/responses which contain XSS-like
behaviour. It's certainly possible to create a valid XML response which gets
treated by the browser as HTML, but the challenge is how to send raw XML in
the HTTP body cross-domain. This can't be done using a standard HTML form,
so an attacker needs to find another client technology, or browser quirk, to
do this. Historically, things like this have been possible at various times,
until they were fixed by browser/plugin vendors. I'm not aware of anything
that would allow it at the moment.
So in short, it's a theoretical attack, which depending on your risk profile
you could ignore altogether or block using server-side input validation, or
by encoding the output on the server and decoding again on the client.
I do think that Burp should flag up the AMF request format as mitigation for
this issue, and downgrade the impact to low - I'll get this fixed.
Hope that helps.
Cheers
PortSwigger
--- more info on audit ---
What PortSwigger does is not necessarily mess with the binary payload; rather, they mess with the actual AMF parameters that are posted to the handler to direct the request. For example, here is a snippet from the audit showing part of the AMF response to a request ...
HTTP/1.1 200 OK
Server: Microsoft-IIS/6.0
X-Powered-By: ASP.NET
X-AspNet-Version: 2.0.50727
P3P: CP="CAO PSA OUR"
Content-Type: application/x-amf
Vary: Accept-Encoding
Expires: Tue, 06 Apr 2010 18:02:10 GMT
Date: Tue, 06 Apr 2010 18:02:10 GMT
Connection: keep-alive
Content-Length: 2595
......../7/onStatus.......
.SIflex.messaging.messages.ErrorMessage.faultCode.faultString
.faultDetail.rootCause.extendedData.correlationId.clientId.destination
.messageId.timestamp.timeToLive body.headers.#Server.Processing..kFailed
to locate the requested type
com.Analytics.ca.Services.XXX5c2ce<script>alert(1)</script>9ccff0bda62..
....I506E8A27-8CD0-598D-FF6E-D4490E3DA69F.Id95ab281-d83b-4beb-abff-c668b9fd42d5
..fluorine.I04165c8e-f878-447f-a19a-a08cbb7def2a.A.q..#............
. DSId.Aeb5eeabcbc1d4d3284cbcc7924451711.../8/onRes
...[SNIP]...
note the "alert" script in there ... what they did was appended some script enclosed JS to one of the parameters that are passed containing the method to call namely 'com.Analytics.ca.Services.XXX'. By doing so the JS came back in an error message but there are a lot of things that would have to happen for that JS to get anywhere close to executing. Seems an indirect threat at best.
-- Security Auditor's latest perspective --
I've discussed this with the larger team and we all believe it's a valid attack. As PortSwigger mentions in the first paragraph, while in theory setting the content-type to x-amf means you would hope it won't render in the browser, most browsers will ignore the header and render the content anyway. I think the vendors are relying heavily on the fact that the content-type is set; however, popular browsers like IE and some versions of Safari will ignore it.
The attack can easily be triggered by exploiting CSRF or any other form of initiating an XSS attack.
It could not be a JavaScript injection: what in the Flash Player would interpret JS? The Flash community would be ecstatic if we had native JS or even JSON support in the player. There is no eval function for ActionScript, let alone JavaScript.
Let's assume they meant you could inject ActionScript. The AMF protocol does not send code; it sends data models in the form of primitive types or generic or typed objects. The worst thing that could happen is that someone analyzes your model and adds additional data. This would be amazingly difficult to do, as you could not simply inject the data: you would have to parse all the data, add the new data, serialize it back, and keep the AMF headers intact. Because AMF uses references in its data serialization, a duplicated object type is encoded as an offset back to its first occurrence, which means there is little chance of adding code, only of changing the values of existing parameters.
The remote object has a response handler that checks the data types and expects to bind those data types to UI components or whatever your code does. If those data types are wrong you will get an error. If the AMF response sequence number is wrong you will get an error. If anything is not perfectly formed in the AMF datagram you will get an error.
Remote objects automatically retry: if the "injected" processing takes too long, Flex will resend the message and invalidate the one that took too long.
Just my two cents. As an AMF developer I have frequently wished it were easy to mess with the AMF datagram for debugging and testing. Unfortunately, you will just get an error.
Wade Arnold
You seem to have answered your own queries here.
So you have a server side implementation that takes the arguments to an amf function call and includes the input data somewhere in the returned output.
I appreciate that this is largely a theoretical attack as it involves getting the payload to be rendered by the browser and not into an amf client. Other vulnerabilities in browsers/plugins may be required to even enable this scenario. Maybe a CSRF post via the likes of a gateway.php or similar would make this pretty easy to abuse, as long as the browser processed the output as html/js.
However, unless you need the caller to be able to pass angle brackets through into the response, just HTML-encode or strip them and this attack scenario disappears.
This is interesting, though. Normally one would perform output encoding solely for the expected consumer of the data, but it is worth considering that the browser can often be a special case. This really is one hell of an edge case, but I'm all for people getting into the habit of sanitising and encoding their untrusted inputs.
This reminds me, in many ways, of the way cross-protocol injection can be used to abuse the reflection capabilities of protocols such as SMTP to achieve XSS in the browser. See http://i8jesus.com/?p=75
I can't explain how someone would take advantage of this "vulnerability".
But, can you solve the issue to their satisfaction by passing data over an HTTPS connection instead of straight HTTP? Assuming you have an SSL certificate installed on your server and HTTPS enabled, this should be a minor change in the services-config.xml file that you compile into your Flex Application.
I pinged an Adobe colleague of mine in hopes that he can offer more insight.
I think it is a valid attack scenario. A related attack is GIFAR, where the JVM is fooled to treat a gif file as a jar. Also, I don't think output encoding is the right way to solve the problem.
The premise of the attack is to fool the browser into thinking the AMF response is HTML or Javascript. This is possible because of a feature called MIME Type Detection, which is essentially the browser saying "Developers may not know about content-types, I will play god and (possibly incorrectly) figure out the MIME type".
In order for this to work, the following need to hold true -
The attacker should be able to make a GET or POST request to your AMF server using HTML techniques like <script> or <frame> or an <a> tag. Techniques like XmlHttpRequest or Flash or Silverlight don't count.
The attacker should be able to insert malicious content into the first 256 or so bytes of the response. Additionally, this malicious content should be able to trick the browser in thinking that the rest of the response is really javascript or html.
So, how do you prevent it?
It is best to ensure the attacker cannot make such a request in the first place. A very simple and effective way is to add an HTTP request header when making the AMF request, check for its existence on the server, and deny the request if it is absent. The value can be hard-coded and need not be secret. This works because there is no known method of adding a custom request header via standard HTML techniques. You can do so via XmlHttpRequest or Flash or Silverlight, but then the browser will not interpret the content-type for you, so it's okay.
Now, I don't know much about AMF, but if it is already adding a request header, then this attack scenario is not possible. If it isn't, it's trivial to add one.
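In Java servlet terms (the same idea would apply to an ASP.NET HttpModule, which is what a FluorineFX host would actually use), a sketch of that header check might look like this; the header name and the rejection status are assumptions, not anything mandated by AMF:

import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Hypothetical sketch: reject requests that lack the agreed-upon custom header.
// Plain HTML forms and <script>/<img> tags cannot add custom headers, so a
// browser-launched cross-domain request fails this check, while the Flex
// client simply adds the header to every AMF call.
public class AmfHeaderFilter implements Filter {

    public void init(FilterConfig config) { }

    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        HttpServletRequest request = (HttpServletRequest) req;
        // The value need not be secret; its presence is what matters.
        if (request.getHeader("X-Requested-With") == null) {
            ((HttpServletResponse) res).sendError(HttpServletResponse.SC_FORBIDDEN);
            return;
        }
        chain.doFilter(req, res);
    }

    public void destroy() { }
}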
HTML-escaping the content is not a good solution. Allegedly, there are various ways to trick the browser into thinking the response is actually HTML; in other words, the malicious input need not be well-formed HTML. Try a Google search on MIME sniffing; you should be able to find various ways to trick the browser.
I don't know how possible it is to alter data within an AMF response stream, but you might want to ensure that your endpoints cannot be manipulated through communication with the browser and/or JavaScript. Check out this article under the Malicious data injection section.

How can I reliably detect if Flash was the originator of a request to a service?

I need to be able to detect if Flash was the originator of a request to an ASP.NET service, the reason being that Flash is unable to process SOAP messages when the response status code is something other than 200. However, I allow exceptions to bubble up through our SOAP web services, and as a result the status code for a SOAP server fault is 500. Before Flash 10, I was able to check the referrer property, and if it ended in .SWF I changed the status code to 200 so that our Flex application could process the SOAP messages appropriately. But since the introduction of Flash 10 the referrer is no longer sent. I would like to use the x-flash-version header, but it seems to only be sent when using IE, not FF.
Which brings me to my question: How can I reliably detect if Flash was the originator of a request to a service?
You cannot reliably do this - after all, it could be a proxy, or someone may have snooped your Flash component's traffic to work out how to reuse your API without whatever restrictions the Flash version wouldn't have.
For a basic sanity check to differentiate the output, you could just as simply add a flag to say "Flash API version please"; but as with all HTTP communications, it is relatively trivial to fake whatever is required.
How about http://domain.com/path/to/target?flash=true? If all you are doing is changing the API or returning different errors, you don't need a secure detection method.
Edit: Note, this is definitely not "reliable", but do you truly need a reliable detection method or one that merely works? This works; it's just not secure, and if you need it to be secure you are doing something wrong, because it's impossible to know what client is actually in use.
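A sketch of that flag approach in Java servlet terms (the actual service in the question is ASP.NET, and the SOAP handler and fault serializer here are hypothetical stubs):

import java.io.IOException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Hypothetical sketch: the Flash client appends ?flash=true, and the service
// downgrades SOAP fault status codes to 200 so Flash can still parse the body.
public class SoapFaultServlet extends HttpServlet {

    @Override
    protected void doPost(HttpServletRequest req, HttpServletResponse resp)
            throws IOException {
        boolean flashClient = "true".equals(req.getParameter("flash"));
        try {
            resp.getWriter().write(handleSoapCall(req)); // hypothetical handler
        } catch (Exception fault) {
            // Flash can't read SOAP faults behind a 500, so give it a 200.
            resp.setStatus(flashClient ? 200 : 500);
            resp.getWriter().write(toSoapFault(fault));  // hypothetical serializer
        }
    }

    private String handleSoapCall(HttpServletRequest req) throws Exception { return "<ok/>"; }
    private String toSoapFault(Exception e) { return "<fault/>"; }
}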
You can check the user agent (but it could be faked); Flash sends something like "Adobe Flash".
The most secure way (of the easy options presented) is to regex-match the referrer URL, which will have .swf in it.
That would be a heck of a lot harder to spoof than a query string/form param of &flash=true. It's certainly hackable using tools that can send false HTTP headers (referrer), but of the options presented it takes the most effort.
