While looking at the code in "petclinic", part of Spring 3.0 samples I noticed the following lines
<c:choose>
<c:when test="${owner.new}"><c:set var="method" value="post"/></c:when>
<c:otherwise><c:set var="method" value="put"/></c:otherwise>
</c:choose>
In this discussion at SO it seems that PUT should be used for "create/update" and POST for "updates".
Which is right?
What is the impact of using post for "create" and put for "update"?
Note : According to the HTTP/1.1 spec. quoted in the referenced SO discussion, the code given above seems to have the correct behavior.
Both POST and PUT are have well defined behavior as per HTTP spec.
The result of a POST request should be a new resource that is subordinate to the request URL; the response should contain Location header with the URL of the newly created resource.
The result of a PUT should be an update of the resource at the request URL. if there is no existing resource at the request URL, a new one can be created.
The confusion arises from the fact that POST is also used with forms as a mechanism to pass the form data. Most common implementation of forms is to post back to the same URL at which the form page is located, thus giving the false idea that the POST operation is used for an update. However, in this particular usage, the form page is not the resource.
With all this in mind, here's the correct (in my opinion of course :-)) usage:
POST should be used to create new resources when:
- the new resource is subordinate to an existing resource
- the resource identity/URL is not known at creation time
PUT should be used to update existing resources with well-known URL. It can be used to create a resource at well-known URL as well; however, it does help to think about this scenario in a different way - if the resource URL is known before the PUT request is made, this could be treated the same as the resource at this location already existing but being empty.
It's quite simple:
POST allows anything to happen, and it isn't restricted to creating "subordinate" resources, but allows the client to "provide a block of data ... to a data-handling process" (RFC 2616 sec 9.5). POST means "Here's that data you asked for just now"
PUT is used as an opposite of GET. The usual flow is that you GET a resource, modify it somehow, and then you PUT it back at the same URI that you got it from. PUT means "Please store this file at this URI".
The uniformity of PUT (which is to store a file) allows intermediaries (e.g. caches) to invalidate any cached responses they might have at that exact URI (since they know that it's about to change). The uniformity of PUT also allows clients (that understand this) to modify a resource by first retrieving it (GET) and then send a modified copy back (PUT). It also allows clients to retry on a network failure, due to PUT's idempotency.
Side note: Using PUT to create resources is dubious. While it's possible within the spec, I don't see it as a good idea, just as using POST to perform searches isn't a good idea, just as tunneling SOAP over HTTP isn't a good idea. AtomPub explicitly states that PUT isn't used to create atom entries.
POSTs ubiquitousness comes from the fact that HTML defines <form> elements that result in POSTing a application/x-www-form-urlencoded entity, with which the recipient can do anything it pleases, including
creating subordinate resources (The repsonse is usually accompanied by a 201 response and Location header)
creating a completely different resource (again usually a 201 response and Location header)
creating many subordinate and/or unrelated resources (perhaps with a simple response indicating the URIs of the created resources)
doing nothing except return a response (e.g. 200 or 302) (a case where perhaps GET should have been used)
modifying the resource that received the POST itself (returning or redirecting back to the updated resource).
delete one or more resource.
any combination of the above.
The only one who knows what will happen in a POST request is the user who initiated the request (by clicking the huge "yes I confirm deleting my Facebook profile" button) and the server that's handling the request. To the rest of the world, the request is opaque and doesn't mean anything other than "this URI is being passed some data".
So the answer to your question is that both POST and PUT can be used for both create and update.
POST is often use to create resources (like AtomPub 9.2)
PUT semantics fits well for modifying resources (like AtomPub 9.3)
POST may be used to modify resources (like a www form edit your profile)
PUT can technically be used to create resources (although I advise against it)
Related
I would like to try send requests.get to this website:
requests.get('https://rent.591.com.tw')
and I always get
<Response [404]>
I knew this is a common problem and tried different way but still failed.
but all of other website is ok.
any suggestion?
Webservers are black boxes. They are permitted to return any valid HTTP response, based on your request, the time of day, the phase of the moon, or any other criteria they pick. If another HTTP client gets a different response, consistently, try to figure out what the differences are in the request that Python sends and the request the other client sends.
That means you need to:
Record all aspects of the working request
Record all aspects of the failing request
Try out what changes you can make to make the failing request more like the working request, and minimise those changes.
I usually point my requests to a http://httpbin.org endpoint, have it record the request, and then experiment.
For requests, there are several headers that are set automatically, and many of these you would not normally expect to have to change:
Host; this must be set to the hostname you are contacting, so that it can properly multi-host different sites. requests sets this one.
Content-Length and Content-Type, for POST requests, are usually set from the arguments you pass to requests. If these don't match, alter the arguments you pass in to requests (but watch out with multipart/* requests, which use a generated boundary recorded in the Content-Type header; leave generating that to requests).
Connection: leave this to the client to manage
Cookies: these are often set on an initial GET request, or after first logging into the site. Make sure you capture cookies with a requests.Session() object and that you are logged in (supplied credentials the same way the browser did).
Everything else is fair game but if requests has set a default value, then more often than not those defaults are not the issue. That said, I usually start with the User-Agent header and work my way up from there.
In this case, the site is filtering on the user agent, it looks like they are blacklisting Python, setting it to almost any other value already works:
>>> requests.get('https://rent.591.com.tw', headers={'User-Agent': 'Custom'})
<Response [200]>
Next, you need to take into account that requests is not a browser. requests is only a HTTP client, a browser does much, much more. A browser parses HTML for additional resources such as images, fonts, styling and scripts, loads those additional resources too, and executes scripts. Scripts can then alter what the browser displays and load additional resources. If your requests results don't match what you see in the browser, but the initial request the browser makes matches, then you'll need to figure out what other resources the browser has loaded and make additional requests with requests as needed. If all else fails, use a project like requests-html, which lets you run a URL through an actual, headless Chromium browser.
The site you are trying to contact makes an additional AJAX request to https://rent.591.com.tw/home/search/rsList?is_new_list=1&type=1&kind=0&searchtype=1®ion=1, take that into account if you are trying to scrape data from this site.
Next, well-built sites will use security best-practices such as CSRF tokens, which require you to make requests in the right order (e.g. a GET request to retrieve a form before a POST to the handler) and handle cookies or otherwise extract the extra information a server expects to be passed from one request to another.
Last but not least, if a site is blocking scripts from making requests, they probably are either trying to enforce terms of service that prohibit scraping, or because they have an API they rather have you use. Check for either, and take into consideration that you might be blocked more effectively if you continue to scrape the site anyway.
One thing to note: I was using requests.get() to do some webscraping off of links I was reading from a file. What I didn't realise was that the links had a newline character (\n) when I read each line from the file.
If you're getting multiple links from a file instead of a Python data type like a string, make sure to strip any \r or \n characters before you call requests.get("your link"). In my case, I used
with open("filepath", 'w') as file:
links = file.read().splitlines()
for link in links:
response = requests.get(link)
In my case this was due to fact that the website address was recently changed, and I was provided the old website address. At least this changed the status code from 404 to 500, which, I think, is progress :)
I'm working on a direct-to-S3 upload service that operates in two parts described below. This service would not be used by browsers, but would be a RESTful API used by other software clients.
Make a request to an endpoint which certifies and validates the upload, returning an upload URL if all's well.
Make a PUT request to the URL returned from #1 to actually do the upload to S3.
How should the server structure the response for the first endpoint?
The first option I am considering would be to use GET and return a status code 302 with a Content-Location header containing the URL to upload to. However, the intent behind the redirect descriptions in the spec seems to be focussed on redirecting after a form submission.
The other option I'm considering is to use POST for the first endpoint and returning a Location header with the URL, as described here:
If a resource has been created on the origin server, the response
SHOULD be 201 (Created) and contain an entity which describes the
status of the request and refers to the new resource, and a Location
header. RFC 2616 #9.5
Please advise on what other people have used in such circumstances?
I think it mainly depends on whether your API itself will have a resource referencing the uploaded file or not. The only one with knowledge of the uploaded file is the S3 itself or your API has something referencing it?
If the first case where only S3 knows about it, then it's OK to use the GET if it acts merely as a generator for the upload parameters, including the URI.
If the second case, then it shouldn't be a GET, since you're changing something on your side. Yes, you should make a POST, but the Location header should be used to return the URI for the created resource that references the uploaded file. That resource may have the upload URI and it could act like a state-machine, tracking if the file is uploaded or not. To avoid the need for clients to GET that resource before being able to upload, you may return the upload URI in the Link header, with a rel reflecting that purpose.
Reading this page, it says:
...the URI in a PUT request identifies the entity enclosed with the request -- the user agent knows what URI is intended and the server MUST NOT attempt to apply the request to some other resource.
From this I gather that I should not have a URL exposed which accepts PUT requests where the URL does not uniquely identify a resource. e.g.:
http://www.example.com/cars
Rather, I should allow PUT requests on the following URL:
http://www.example.com/cars/123
However, in a PUT request, the content is supposed to contain the entire entity, which could, therefore, contain some sort of primary key (like the 123 in the above URL). So is it really considered bad practice to expose non-unique URLs for PUT requests when the content will contain the unique identifier? In my service, all I want to do is collect data from clients, so a RESTful service accepting PUT requrests is great, but I don't really want to have the URL being unique (as this means more work to construct the URL on the client side).
PUT should only be used when the client knows for sure the resulting resource that should contain the entity after the request completes.
If you are simply collecting data from clients, then likely you should be using POST instead.
That is why in REST semantics, the PUT is used for updates and to create any resource you are supposed to be using POST.
In the current HTTP spec, the URL fragment (the part of the URL including and following the #) is not sent to the server in any way. However with the increased spread of AJAX, which uses the fragment to maintain some form of state, there are a lot of situations where it would be useful for the server to have knowledge of the URL fragment at request time.
For example, if you go to http://facebook.com, then click a user name in your stream, the URL will become http://faceboook.com/#!/username - to allow FB to update your page without reloading all of its bootstrap JS and HTML. However, if you were to reload this with your browser, the server would have no way of seeing the "#/!username" part of the URL, and therefore could not pre-render the content for you. This forces your browser to make an extra request once the client Javascript has loaded and parsed the fragment.
I am wondering if there have been any efforts or proposals towards creating a standard mechanism to achieve this.
For example, there could be a standard HTTP header, which would be sent with the value of the URL fragment - any server which cared about such things could then have access to it.
It seems like this would be a very useful thing for the web-application community as a whole, so I am surprised to not have heard anything proposed. Perhaps I missed it though.
Imho, the fragment identifier really is not a good place to store the state, it has been designed for something else.
That being said, http://www.jenitennison.com/blog/node/154 has a good discussion of the whole subject.
I found this proposal by Google to make Ajax pages crawlable, but it addresses a more constrained set of use cases. Specifically, it creates a way to replace the URL fragment with a URL parameter to obtain the same HTML output from the server as would be generated by a client visiting the equivalent URL with the fragment. However, such URLs are useless for actually running the Ajax apps, since they would necessitate a page reload every time.
Webkit Bug 24175 - URL Redirect Loses Fragment refers to Handling of fragment identifiers in redirected URLs which may be of interest.
A suggestion for a future version of HTTP may be to add an (optional)
Fragment header to the request, which holds the fragment identifier.
Even simpler may be to allow an HTTP request to contain a fragment
identifier.
I'm working on a REST API. The key objects ("nouns") are "items", and each item has a unique ID. E.g. to get info on the item with ID foo:
GET http://api.example.com/v1/item/foo
New items can be created, but the client doesn't get to pick the ID. Instead, the client sends some info that represents that item. So to create a new item:
POST http://api.example.com/v1/item/
hello=world&hokey=pokey
With that command, the server checks if we already have an item for the info hello=world&hokey=pokey. So there are two cases here.
Case 1: the item doesn't exist; it's created. This case is easy.
201 Created
Location: http://api.example.com/v1/item/bar
Case 2: the item already exists. Here's where I'm struggling... not sure what's the best redirect code to use.
301 Moved Permanently? 302 Found? 303 See Other? 307 Temporary Redirect?
Location: http://api.example.com/v1/item/foo
I've studied the Wikipedia descriptions and RFC 2616, and none of these seem to be perfect. Here are the specific characteristics I'm looking for in this case:
The redirect is permanent, as the ID will never change. So for efficiency, the client can and should make all future requests to the ID endpoint directly. This suggests 301, as the other three are meant to be temporary.
The redirect should use GET, even though this request is POST. This suggests 303, as all others are technically supposed to re-use the POST method. In practice, browsers will use GET for 301 and 302, but this is a REST API, not a website meant to be used by regular users in browsers.
It should be broadly usable and easy to play with. Specifically, 303 is HTTP/1.1 whereas 301 and 302 are HTTP/1.0. I'm not sure how much of an issue this is.
At this point, I'm leaning towards 303 just to be semantically correct (use GET, don't re-POST) and just suck it up on the "temporary" part. But I'm not sure if 302 would be better since in practice it's been the same behavior as 303, but without requiring HTTP/1.1. But if I go down that line, I wonder if 301 is even better for the same reason plus the "permanent" part.
Thoughts appreciated!
Edit: Let me try to better explain the semantics of this "get or create" operation with a more concrete example: URL shortening. This is actually much closer to my app anyway.
For URL shorteners, the most common operation by far is retrieving by ID. E.g. for http://bit.ly/4Agih5, bit.ly receives an ID of 4Agih5 and must redirect the user to its corresponding URL.
bit.ly already has an API, but it's not truly RESTful. For the sake of example, let me make up a more RESTful API. For example, querying the ID might return all sorts of info about it (e.g. analytics):
GET http://api.bit.ly/item/4Agih5
Now if I want to submit a new URL to bit.ly to shorten, I don't know the ID of my URL in advance, so I can't use PUT. I'd use POST instead.
POST http://api.bit.ly/item/
url=http://stackoverflow.com/ (but encoded)
If bit.ly hasn't seen this URL before, it'll create a new ID for it and redirect me via 201 Created to the new ID. But if it has seen that URL, it'll still redirect me without making a change. This way, I can hit that redirect location either way to get the info/metadata on the shortened URL.
Like this example of URL shortening, in my app, collisions don't matter. One URL maps to one ID, and that's it. So it doesn't really matter if the URL has been shortened before or not; either way, it makes sense to point the client to the ID for it, whether that ID needs to be created first or not.
So I probably won't be changing this approach; I'm just asking about the best redirect method for it. Thanks!
I'd argue for 303. Supposing right now hello=world&hokey=pokey uniquely identifies item foo, but later item foo's hokey value changes to "smokey"? Now those original values are no longer a unique identifier for that resource. I'd argue that a temporary redirect is appropriate.
I think one of the reasons that you are struggling with this scenario is because (unless we are missing some key information) the interaction is not very logical.
Let me explain why I think this. The initial premise is that the user is requesting to create something and has provided some key information for the resource they wish to create.
You then state that if that key information refers to an existing object then you wish to return that object. The problem is that the user did not wish to retrieve an existing object they wished to create a new one. If they cannot create the resource because either it already exists or there is a key collision then the user should be informed of that fact.
Choosing to retrieve an existing object when the user has attempted to create a new one seems to be a misleading approach.
Maybe one alternative would be to return a 404 Bad request if the resource already exists and include a link to the existing object in the entity body. The client application could choose to swallow the bad request error and simply follow the link to the existing entity and by doing so hide the issue from the user. That would be the choice of the client application, but at least the server is behaving in a clear manner.
Based on the new example, let me suggest a completely different approach. It may not work in your case, as always the devil is in the details, but maybe it will be helpful.
From the client's perspective it really has no interest in whether the server is creating a new shortened URL or pulling back an existing one. In fact, whether the server needs to generate a new ID or not is an implementation detail that is completely hidden.
Hiding the creation process could be very valuable. Maybe the server can predict in advance that lots of short urls will soon be requested related to a event such as a conference. It could pre-generate these urls in quite periods to balance the load on its servers.
So, based on that assumption, why not just use
GET /ShortUrl?longUrl=http://www.example.org/en/article/something-that-is-crazy-long.html&suggestion=crazyUrl
If the url already existed then you might get back
303 See Other
Location: http://example.org/ShortUrl/3e4tyz
If it previously didn't, you might get
303 See Other
Location: http://example.org/ShortUrl/crazyurl
I realize that this looks like we are breaking the rules of GET by creating something in response to a GET, but I believe in this case there is nothing wrong with it because client did not ask for the shortened URL to be created and really does not care either way. It is idempotent because does not matter how many times you call it.
One interesting question that I don't know the answer to is whether proxies will cache the initial GET and redirect. That might be an interesting property as future requests by other users for the same url may never need to actually get to the origin server, the proxy could handle the request completely.
POST does not support a 'lookup or create' approach. The server cannot tell the client "I would create that, but it already existed. Look here for the existing entry". None of the 2xx codes work because the request is not successful. None of the 3xx codes work, because the intention is not to redirect the POST to a new resource. And 303 is also not appropriate since nothing changed (see 303 spec).
What you could do is provide a form or template to the client to be used with PUT that tells the client how to construct the PUT URI. If the PUT results in a 200 the client knows the resource existed and if 201 is returned that a new resource has been created.
For example:
Template for URI: http://service/items/{key}
PUT http://service/items/456
[data]
201 Created
or
PUT http://service/items/456
[data]
200 Ok
You can also do a 'create but do not replace if exists' using If-None-Match:
PUT http://service/items/456
If-None-Match: *
[data]
412 Precondition failed
Jan
From the client's point of view, I would think that you could just send a 201 for case 2 the same as for case 1 as to the client the record is now "created".
HTTP 1.1. Spec (RFC 2616) suggests 303:
303 See Other
The response to the request can be found under a different URI and
SHOULD be retrieved using a GET method on that resource. This method
exists primarily to allow the output of a POST-activated script to
redirect the user agent to a selected resource. The new URI is not a
substitute reference for the originally requested resource.