Does sending POST data to a server that doesn't accept post data recieve the data? - http

I am setting up a back end API in a script of mine that contacts one of my sites by sending XML to my web server in the form of POST data. This script will be used by many and I want to limit the bandwidth waste for people that accidentally turn the feature on without a proper access key.
I will be denying requests that do not have the correct access key by maybe generating a 403 access code.
Lets say the POST data is ~500kb of data. Does the server receive all 500kb of data when this attempt is made regardless of the status code?
How about if I made the url contain the key mydomain/api/123456789 and generate 403 status on all bad access keys.
Does the POST data still get sent/received regardless or is it negotiated before the data is finally sent.
Thanks in advance!

Generally speaking, the entire request will be sent, including post data. There is often no way for the application layer to return a response like a 403 until it has received the entire request.
In reality, it will depend on the language/framework used and how closely it is linked to the HTTP server. Section 8.2.2 of RFC2616 HTTP/1.1 specification has this to say
An HTTP/1.1 (or later) client sending
a message-body SHOULD monitor the
network connection for an error status
while it is transmitting the request.
If the client sees an error status, it
SHOULD immediately cease transmitting
the body. If the body is being sent
using a "chunked" encoding (section
3.6), a zero length chunk and empty trailer MAY be used to prematurely
mark the end of the message. If the
body was preceded by a Content-Length
header, the client MUST close the
connection.
So, if you can find a language environemnt closely linked with the HTTP server (for example, mod_perl), you could do this in a way which does comply with standards.
An alternative approach you could take is to make an initial, smaller request to obtain a URL to use for the larger POST. The application can then deny providing the URL to clients without an appropriate key.

Here is great book about RESTful Web Services, where it's explained how HTTP works: http://oreilly.com/catalog/9780596529260
You can consider any request as envelope, where on top of it it's written address (URL), some properties (HTTP Headers) and inside it there's some data (if request is initiated by post method). So as you might guess you can't receive envelope partially.
Oh I forgot, it's when you are using HTTP Post with standard HTTP header "application/x-www-form-urlencoded" but if you are uploading files (correspondingly using ""multipart/form-data") Django gives you control over streamed chunks of files using Middleware classes: http://docs.djangoproject.com/en/dev/topics/http/middleware/

Related

what is grpc trailers metadata used for?

I was looking through grpc documents and found out that on the server-side you are able to set metadata both in the form of headers and trailers.
Headers seem like the usual replacement for normal HTTP headers with key-value mapping. I don't see any needs for trailers anymore seems the header is serving somewhat a similar purpose or am I missing something here?
Trailers can be used for anything the server wishes to send to the client after processing the request. Typically this should be used for information common to all methods a service provides, for example, data about the load created by the RPC for metrics purposes.
gRPC uses HTTP trailers for two purposes.
it sends its final status (grpc-status) as a trailer header after the content has been sent.
When an application or runtime error occurs during an RPC a Status and Status-Message are delivered in Trailers.
For responses end-of-stream is indicated by the presence of the END_STREAM flag on the last received HEADERS frame that carries Trailers
The second reason is to support streaming use cases. These use cases last much longer than normal HTTP requests. The HTTP trailer is used to give the post-processing result of the request or the response. For example, if there is an error during streaming data processing, you can send an error code using the trailer, which is not possible with the header before the message body.
Source 1 2

What is the actual difference between the different HTTP request methods besides semantics?

I have read many discussions on this, such as the fact the PUT is idempotent and POST is not, etc. However, doesn't this ultimately depend on how the server is implemented? A developer can always build the backend server such that the PUT request is not idempotent and creates multiple records for multiple requests. A developer can also build an endpoint for a PUT request such that it acts like a DELETE request and deletes a record in the database.
So my question is, considering that we don't take into account any server side code, is there any real difference between the HTTP methods? For example, GET and POST have real differences in that you can't send a body using a GET request, but you can send a body using a POST request. Also, from my understanding, GET requests are usually cached by default in most browsers.
Are HTTP request methods anything more than just a logical structure (semantics) so that as developers we can "expect" a certain behavior based on the type of HTTP request we send?
You are right that most of the differences are on the semantic level, and if your components decide to assign other semantics, this will work as well. Unless there are components involved that you do not control (libraries, proxies, load balancers, etc).
For instance, some component might take advantage of the fact that PUT it idempotent and thus can re retried, while POST is not.
The Hypertext Transfer Protocol (HTTP) is designed to enable communications between clients and servers.
HTTP works as a request-response protocol between a client and server.
A web browser may be the client, and an application on a computer that hosts a web site may be the server.
Example: A client (browser) submits an HTTP request to the server; then the server returns a response to the client. The response contains status information about the request and may also contain the requested content.
HTTP Methods
GET
POST
PUT
HEAD
DELETE
PATCH
OPTIONS
The GET Method
GET is used to request data from a specified resource.
GET is one of the most common HTTP methods.
Note that the query string (name/value pairs) is sent in the URL of a GET request.
The POST Method
POST is used to send data to a server to create/update a resource.
The data sent to the server with POST is stored in the request body of the HTTP request.
POST is one of the most common HTTP methods.
The PUT Method
PUT is used to send data to a server to create/update a resource.
The difference between POST and PUT is that PUT requests are idempotent. That is, calling the same PUT request multiple times will always produce the same result. In contrast, calling a POST request repeatedly have side effects of creating the same resource multiple times.
The HEAD Method
HEAD is almost identical to GET, but without the response body.
In other words, if GET /users returns a list of users, then HEAD /users will make the same request but will not return the list of users.
HEAD requests are useful for checking what a GET request will return before actually making a GET request - like before downloading a large file or response body.
The DELETE Method
The DELETE method deletes the specified resource.
The OPTIONS Method
The OPTIONS method describes the communication options for the target resource.
src. w3schools

What does client send after receiving a "100 Continue" status code?

Client sends a POST or PUT request that includes the header:
Expect: 100-continue
The server responds with the status code:
100 Continue
What does the client send now? Does it send an entire request (the previously send request line and headers along with the previously NOT sent content)? Or does it only send the content?
I think it's the later, but I'm struggling to find concrete examples online. Thanks.
This should be all the information you need regarding the usage of a 100 Continue response.
In my experienced this is really used when you have a large request body. It could be considered to be roughly complementary to the HEAD method in relation to GET requests - fetch just the header information and not the body (usually) to reduce network load. 100 responses are used to determine whether the server will accept the request based purely on the headers - so that, for example, if you try and send a large POST/PUT request to a non-existent server resource it will result in a 404 before the entire request body has been sent.
So the short answer to your question is - yes, it's the latter. Although, you should always read the RFC for a complete picture. RFC2616 contains 99% of the information you would ever need to know about HTTP - there are some more recent RFCs that settle some ambiguities and offer a few small extensions to the protocol but off the top of my head I can't remember what they are.

Payloads of HTTP Request Methods

The Wikipedia entry on HTTP lists the following HTTP request methods:
HEAD: Asks for the response identical to the one that would correspond to a GET request, but without the response body.
GET: Requests a representation of the specified resource.
POST: Submits data to be processed (e.g., from an HTML form) to the identified resource. The data is included in the body of the request.
PUT: Uploads a representation of the specified resource.
DELETE: Deletes the specified resource.
TRACE: Echoes back the received request, so that a client can see what (if any) changes or additions have been made by intermediate servers.
OPTIONS: Returns the HTTP methods that the server supports for specified URL. This can be used to check the functionality of a web server by requesting '*' instead of a specific resource.
CONNECT: Converts the request connection to a transparent TCP/IP tunnel, usually to facilitate SSL-encrypted communication (HTTPS) through an unencrypted HTTP proxy.
PATCH: Is used to apply partial modifications to a resource.
I'm interested in knowing (specifically regarding the first five methods):
which of these methods are able (supposed to?) receive payloads
of the methods that can receive payloads, how do they receive it?
via query string in URL?
via URL-encoded body?
via raw / chunked body?
via a combination of ([all / some] of) the above?
I appreciate all input, if you could share some (preferably light) reading that would be great too!
Here is the summary from RFC 7231, an updated version of the link #Darrel posted:
HEAD - No defined body semantics.
GET - No defined body semantics.
PUT - Body supported.
POST - Body supported.
DELETE - No defined body semantics.
TRACE - Body not supported.
OPTIONS - Body supported but no semantics on usage (maybe in the future).
CONNECT - No defined body semantics
As #John also mentioned, all request methods support query strings in the URL (one notable exception might be OPTIONS which only seems to be useful [in my tests] if the URL is HOST/*).
I haven't tested the CONNECT and PATCH methods since I have no interest in them ATM.
RFC 7231, HTTP 1.1 Semantics and Content, is the most up-to-date and authoritative source on the semantics of the HTTP methods. This spec says that there are no defined meaning for a payload that may be included in a GET, HEAD, OPTIONS, or CONNECT message. Section 4.3.8 says that the client must not send a body for a TRACE request. So, only TRACE cannot have a payload, but GET, HEAD, OPTIONS, and CONNECT probably won't and the server isn't expected to know how to handle it if the client sends one (meaning it can ignore it).
If you believe anything is ambiguous, then there is a mailing list where you can voice your concerns.
I'm pretty sure it's not clear whether or not GET requests can have payloads. GET requests generally post form data through the query string, same for HEAD requests. HEAD is essentially GET - except it doesn't want a response body.
(Side note: I say it's not clear because a GET request could technically upgrade to another protocol; in fact, a version of websockets did just this, and while some proxy software worked fine with it, others chocked upon the handshake.)
POST generally has a body. Nothing is stopping you from using a query string, but the POST body will generally contain form data in a POST.
For more (and more detailed) information, I'd hit the actual HTTP/1.1 specs.

POST/Redirect/GET (PRG) vs. meaningful 2xx response codes

Since the POST request in a POST/Redirect/GET (PRG) pattern returns a redirect (303 See Other) status code on success, is it at all possible to inform the client of the specific flavour of success they are to enjoy (eg. OK, Created, Accepted, etc.) as well as any appropriate headers (eg. Location for a 201 Created, which might conflict with that of the redirect)?
Might it be appropriate, for example, to make the redirected GET respond with the proper response code & headers that would be expected from the POST response?
The HTTP 1.1 spec says:
This method [303] exists primarily to allow the output of a POST-activated script to redirect the user agent to a selected resource.
But doesn't offer any insight into the loss of the more usual status code and headers.
Edit - An example:
A client sends POST request to /orders which creates a new resource at /orders/1.
If the server sends a 201 Created status with location: /orders/1, an automated client will be happy because it knows the resource was created, and it know where it is, but a human using a web browser will be unhappy, because they get the page /orders again, and if they refresh it they're going to send another order, which is unlikely to be what they want.
If the server sends a 303 See Other status with location: /orders/1 the human will be taken to their order, informed of its existence and state and will not be in danger of repeating it by accident. The automated client, though, won't be told explicitly of the resource's creation, it'll have to infer creation based on the location header. Furthermore, if the 303 redirects somewhere else (eg. /users/someusername/orders) the human may be well accomodated, but the automated client is left drastically uninformed.
My suggestion was to send 201 Created as the response to the redirected get request on the new resource, but the more I think about it, the less I like it (could be tricky to ensure only the creator receives the 201 and it shouldn't appear that the GET request created the resource).
What's the optimal response in this situation?
Send human-targetted information in the response body as HTML. Don't differentiate on the User-Agent header; if you also need to send bodies to machines, differentiate based upon the Accept request header.
If you have control over the web server, how about differentiating between the Agent header ?
Fill it in something only you know of (a GUID or other pseudo-random thing) and present that one to the webserver from the automated client. Then have the webserver response with 201 / 303 accordingly.

Resources