(Rails 7) status: What is it/does it do? - ruby-on-rails-7

Upgraded to rails 7, working on an invitation system where I have a nested hierarchy
Routes:
resources :invites do
  resources :guests
end
Invite Model has : has_many :guests, dependent: :destroy
Guest Model has : belongs_to :invite
Issue specific code from guests_controller.rb:
def destroy
  @invite = Invite.find(params[:invite_id])
  @guest = Guest.find(params[:id])
  @guest.destroy
  redirect_to invite_path(@invite) #, status: :see_other
end
When I don't include status: :see_other, Rails deletes the Guests associated with that Invitation, and the invitation itself.
But if I include it, it works as expected and deletes only the selected Guest entry.
What exactly does 'status:' do? I can't find a satisfying answer yet.

see_other is the 303 HTTP status code. It redirects the client to another URI.
When to use the see_other status?
When the response to the request can be found at another URI and should be retrieved with a GET request.
Referral link for in-depth understanding!!

The destroy action fetches the guest from the database and calls destroy on it.
Then the browser is redirected to the invite path with status code 303 See Other. A 303 tells the client to follow the redirect with the GET method, whatever HTTP verb the original request used (DELETE in your case).
If you don't specify this status, the browser may follow the redirect to the new location with the DELETE method. You probably have a route for destroying an invitation, which is why the invitation is destroyed automagically.
Rails 7 works this way, and the official Rails guide recommends specifying this status.
Please also see official docs
The status code can either be a standard HTTP Status code as an integer, or a symbol representing the downcased, underscored and symbolized description. Note that the status code must be a 3xx HTTP code, or redirection will not occur.
If you are using XHR requests other than GET or POST and redirecting after the request then some browsers will follow the redirect using the original request method. This may lead to undesirable behavior such as a double DELETE. To work around this you can return a 303 See Other status code which will be followed using a GET request.
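The method-rewriting behaviour described above can be sketched in a few lines. This is a simplified model that assumes the client follows the redirect rules of the Fetch standard (301/302 rewrite only POST to GET, 303 rewrites everything except GET and HEAD). The function name method_after_redirect is hypothetical and not part of Rails or any browser API.

```python
def method_after_redirect(status: int, method: str) -> str:
    """Return the HTTP method a Fetch-style client uses for the follow-up request."""
    method = method.upper()
    # 301/302: only POST is rewritten to GET; other methods are kept.
    if status in (301, 302) and method == "POST":
        return "GET"
    # 303: everything except GET and HEAD becomes GET.
    if status == 303 and method not in ("GET", "HEAD"):
        return "GET"
    # 307/308 and all remaining cases keep the original method.
    return method


if __name__ == "__main__":
    for status in (302, 303):
        print(status, "DELETE ->", method_after_redirect(status, "DELETE"))
```

Under this model a DELETE answered with 302 is repeated as a DELETE against the invite URL, while a DELETE answered with 303 becomes a plain GET, which matches the behaviour seen in the question.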

Related

Unable to crawl a website using Scrapy, but the same website can be requested using the Scrapy shell with the same settings

I am trying to crawl the website https://www.rightmove.co.uk/properties/105717104#/?channel=RES_NEW
but I get (410) error
INFO: Ignoring response <410 https://www.rightmove.co.uk/properties/105717104>: HTTP status code is not handled or not allowed
I am just trying to find the properties that have been sold using the notification on the page "This property has been removed by the agent."
I know the website has not blocked me because I can fetch the data with the Scrapy shell, and view(response) works fine too. I can also open the same URL directly in a web browser, so the 410 doesn't make sense. I can crawl other pages from the same domain as well, i.e. the pages without the notification "This property has been removed by the agent."
Any help would be much appreciated.
It seems that when a listing has been marked as removed by an agent on Rightmove, the website returns status code 410 Gone (which is quite weird). To solve this, do something like this in your request:
def start_requests(self):
    yield scrapy.Request(
        url='https://www.rightmove.co.uk/properties/105717104#/?channel=RES_NEW',
        meta={
            # let 410 responses reach the spider callback
            'handle_httpstatus_list': [410],
        }
    )
EDIT
Explanation: By default, Scrapy only handles responses whose status code is in the range 200-299, since 2XX means the response was successful. In your case you got a 4XX status code, which signals an error. By passing handle_httpstatus_list = [410] we tell Scrapy that we want it to handle 410 responses as well as 200-299.
Here are the docs: https://docs.scrapy.org/en/latest/topics/spider-middleware.html#std-reqmeta-handle_httpstatus_list
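The filtering rule explained above can be modelled without Scrapy itself. The sketch below is a simplified stand-in, not Scrapy's actual middleware code; is_response_handled is a hypothetical name used only for illustration.

```python
def is_response_handled(status, handle_httpstatus_list=()):
    """Simplified model of the rule: 2xx always passes, other codes only if listed."""
    return 200 <= status < 300 or status in handle_httpstatus_list


if __name__ == "__main__":
    print(is_response_handled(410))          # dropped by default
    print(is_response_handled(410, [410]))   # passed on to the spider callback
```

In the real spider the callback would then check response.status itself to tell a removed listing (410) from a live one (200).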

API Status Page Response Codes

(This is sort of an abstract philosophical question. But I believe it has objective concrete answers.)
I'm writing an API, my API has a "status" page (like, https://status.github.com/).
If whatever logic I have in place to determine the status says everything is good my plan would be to return 200 OK, and a JSON response with more information about each service tested by my status page.
But what if my logic says the API is down? Say the database isn't responding or something.
I think I want to return 500 INTERNAL SERVER ERROR (or 503 SERVICE UNAVAILABLE) along with a JSON response with more details.
However, is that breaking the HTTP Status Code spec? Would that confuse end users? My status page itself is working just fine in that case. So maybe it should return 200? But that would mean anyone using it would have to dig into the body looking for a specific parameter to determine the API's status vs. just checking the HTTP Status Code. (Also if my status page itself was broken, I'm fine with the end user taking that to mean the API is down since that's a pretty bad sign...)
Thoughts? Is there official protocol on how a status page should work?
https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html
For me, the page should return 200 unless it has problems itself. It is true that checking the status code of a response is easier than parsing the body, but using HTTP status codes to encode application information breaks what people (and spiders) expect. If a spider visits your page and sees a 500 or 503, it will think your site has a page with problems, not that the page is fine and is signaling that the site is down.
Also, as you noticed, it won't be possible to distinguish between "the service is down" and "the status page is down", and only the latter should send a 500. And what if you show more than one service, like the Twitter status page? Use 200.
Related: https://stackoverflow.com/a/943021/1536382 https://stackoverflow.com/a/34324179/1536382
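A minimal sketch of the "always 200, details in the body" approach argued for above. The function name build_status_response, the field names and the "ok"/"degraded" labels are assumptions made for illustration, not an established convention.

```python
import json


def build_status_response(checks):
    """checks maps service name -> bool (True = healthy). Returns (http_status, body)."""
    services = {name: ("up" if ok else "down") for name, ok in checks.items()}
    overall = "ok" if all(checks.values()) else "degraded"
    body = json.dumps({"status": overall, "services": services}, sort_keys=True)
    # The status page itself worked, so it answers 200 in both cases.
    return 200, body


if __name__ == "__main__":
    print(build_status_response({"api": True, "database": False}))
```

A client that only wants a yes/no answer reads the top-level "status" field; a 5xx from this endpoint would then only ever mean that the status page itself is broken.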

What happens if a 302 URI can't be found?

If I make an HTTP request to get index.html on http://www.example.com but that URL has a 302 re-direct in place that points to http://www.foo.com/index.html, what happens if the redirect target (http://www.foo.com/index.html) isn't available? Will the user agent try the original URL (http://www.example.com/index.html) or just return an error?
Background to the question: I manage a legacy site that supports a few existing customers but doesn't allow new sign-ups. Pretty much all the pages are redirected (using 302s rather than 301s for some unknown reason...) to a newer site. This includes the sign-up page. On one of the pages that isn't redirected there is still a link to the sign-up page, which itself links through to a third-party payment page (i.e. on another domain). Last week our current site went down for a couple of hours, and in that period someone successfully signed up through the old site. The only way I can imagine this happened is that if a 302 doesn't find its intended URL, some (all?) user agents bypass the redirect and then go to the originally requested URL.
By the way, I'm aware there are many better ways to handle the particular situation we're in with the two sites. We're on it! This is just one of those weird situations I want to get to the bottom of.
You should receive a 404 Not Found status code.
Since HTTP is a stateless protocol, there is no real connection between two requests of a user agent. The redirection status codes are just a way for servers to politely tell their clients that the resource they were looking for is somewhere else now. The clients, however, are in no way obliged to actually request the resource from that other URL.
Oh, the signup page is at that URL now? Well then I don't want it anymore... I'll go and look at some kittens instead.
Moreover, even if the client decides to request the new URL (which it usually does ^^), this can be considered a completely new communication between server and client. Neither server nor client should remember that there was a previous request which resulted in a redirection status code. Instead, the current request should be treated as if it were the first (and only) request. And what happens when you request a URL that cannot be found? You get a 404 Not Found status code.
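The point that each request stands alone can be illustrated with a toy redirect follower. This is only a simulation over an in-memory table, assuming the redirect target answers "not found"; the follow function and the routes dictionary are hypothetical.

```python
def follow(url, routes, max_hops=5):
    """routes maps URL -> (status, location_or_None); unknown URLs answer 404."""
    for _ in range(max_hops):
        status, location = routes.get(url, (404, None))
        if status in (301, 302, 303, 307, 308) and location:
            # Treat the target as a brand-new request; the old URL is forgotten.
            url = location
            continue
        return url, status
    raise RuntimeError("too many redirects")


if __name__ == "__main__":
    routes = {"http://www.example.com/index.html": (302, "http://www.foo.com/index.html")}
    print(follow("http://www.example.com/index.html", routes))
```

The client ends up with the 404 from the redirect target and never falls back to the original URL.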

Get the final destination after WP_Http redirects (WordPress)

I'm doing some requests to an API via WordPress, and the API uses SSL connections if they're turned on in the API settings. I'd like to determine whether SSL is turned on or off without having to ask the user if SSL is turned on on their account, and the API does a good job at redirecting, meaning
If I access http://api/endpoint and SSL is turned on, I'm redirected to https://api/endpoint
If I access https://api/endpoint and SSL is turned off, I'm redirected to http://api/endpoint
Now what I'd like to do is see whether a redirect happened or not and record that to my options so that the other requests are fired to the correct URL without any redirections.
So my question is: is there a way to determine the final destination after firing a WP_Http->request() when the request is being redirected?
I can't see any info about that in the response arrays, I only get to see the final response but I have no idea what URL that came from. What I can do is set the redirection parameter to 0 and catch the max redirects allowed error, but that's not bullet-proof, since I still don't know whether the redirect happened from http to https or simply another page under http.
I hope this all makes sense, let me know if you have any ideas.
Thanks!
~ K
Check $response['headers']; it may contain a 'location' key.
It all depends on the HTTP library you are using.
See class-http.php(wp 3.0.1) file:
line 1393, http_api_curl action - curl handle available directly to catch anything.
fopen:
check lines 887-888, and $http_response_header variable.
Also, try overriding the processHeaders function, as it has access to the raw HTTP headers.
The WP_Http class processes the headers and removes all but the last one. So you could do what jetdog described above. Check the original URL and compare it to the returned $response['headers']['location']. If it is different, then you know it redirected.
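The comparison suggested above, written as a language-neutral sketch in Python rather than PHP. It assumes a Location header value is actually available (for example from a request made with redirects disabled); detect_scheme_redirect is a hypothetical helper, not part of WP_Http.

```python
from urllib.parse import urljoin, urlsplit


def detect_scheme_redirect(requested_url, location_header):
    """Return the scheme to use for later requests, or None if only the page changed."""
    if not location_header:
        return None
    # Location may be relative, so resolve it against the requested URL first.
    target = urljoin(requested_url, location_header)
    old, new = urlsplit(requested_url), urlsplit(target)
    same_resource = (old.netloc, old.path) == (new.netloc, new.path)
    if same_resource and old.scheme != new.scheme:
        return new.scheme
    return None


if __name__ == "__main__":
    print(detect_scheme_redirect("http://api/endpoint", "https://api/endpoint"))
```

This distinguishes an http-to-https (or https-to-http) switch on the same endpoint from a redirect to another page under the same scheme, which was the ambiguity raised in the question.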

Which HTTP redirect status code is best for this REST API scenario?

I'm working on a REST API. The key objects ("nouns") are "items", and each item has a unique ID. E.g. to get info on the item with ID foo:
GET http://api.example.com/v1/item/foo
New items can be created, but the client doesn't get to pick the ID. Instead, the client sends some info that represents that item. So to create a new item:
POST http://api.example.com/v1/item/
hello=world&hokey=pokey
With that command, the server checks if we already have an item for the info hello=world&hokey=pokey. So there are two cases here.
Case 1: the item doesn't exist; it's created. This case is easy.
201 Created
Location: http://api.example.com/v1/item/bar
Case 2: the item already exists. Here's where I'm struggling... not sure what's the best redirect code to use.
301 Moved Permanently? 302 Found? 303 See Other? 307 Temporary Redirect?
Location: http://api.example.com/v1/item/foo
I've studied the Wikipedia descriptions and RFC 2616, and none of these seem to be perfect. Here are the specific characteristics I'm looking for in this case:
The redirect is permanent, as the ID will never change. So for efficiency, the client can and should make all future requests to the ID endpoint directly. This suggests 301, as the other three are meant to be temporary.
The redirect should use GET, even though this request is POST. This suggests 303, as all others are technically supposed to re-use the POST method. In practice, browsers will use GET for 301 and 302, but this is a REST API, not a website meant to be used by regular users in browsers.
It should be broadly usable and easy to play with. Specifically, 303 is HTTP/1.1 whereas 301 and 302 are HTTP/1.0. I'm not sure how much of an issue this is.
At this point, I'm leaning towards 303 just to be semantically correct (use GET, don't re-POST) and just suck it up on the "temporary" part. But I'm not sure if 302 would be better since in practice it's been the same behavior as 303, but without requiring HTTP/1.1. But if I go down that line, I wonder if 301 is even better for the same reason plus the "permanent" part.
Thoughts appreciated!
Edit: Let me try to better explain the semantics of this "get or create" operation with a more concrete example: URL shortening. This is actually much closer to my app anyway.
For URL shorteners, the most common operation by far is retrieving by ID. E.g. for http://bit.ly/4Agih5, bit.ly receives an ID of 4Agih5 and must redirect the user to its corresponding URL.
bit.ly already has an API, but it's not truly RESTful. For the sake of example, let me make up a more RESTful API. For example, querying the ID might return all sorts of info about it (e.g. analytics):
GET http://api.bit.ly/item/4Agih5
Now if I want to submit a new URL to bit.ly to shorten, I don't know the ID of my URL in advance, so I can't use PUT. I'd use POST instead.
POST http://api.bit.ly/item/
url=http://stackoverflow.com/ (but encoded)
If bit.ly hasn't seen this URL before, it'll create a new ID for it and redirect me via 201 Created to the new ID. But if it has seen that URL, it'll still redirect me without making a change. This way, I can hit that redirect location either way to get the info/metadata on the shortened URL.
Like this example of URL shortening, in my app, collisions don't matter. One URL maps to one ID, and that's it. So it doesn't really matter if the URL has been shortened before or not; either way, it makes sense to point the client to the ID for it, whether that ID needs to be created first or not.
So I probably won't be changing this approach; I'm just asking about the best redirect method for it. Thanks!
I'd argue for 303. Supposing right now hello=world&hokey=pokey uniquely identifies item foo, but later item foo's hokey value changes to "smokey"? Now those original values are no longer a unique identifier for that resource. I'd argue that a temporary redirect is appropriate.
I think one of the reasons that you are struggling with this scenario is because (unless we are missing some key information) the interaction is not very logical.
Let me explain why I think this. The initial premise is that the user is requesting to create something and has provided some key information for the resource they wish to create.
You then state that if that key information refers to an existing object then you wish to return that object. The problem is that the user did not wish to retrieve an existing object; they wished to create a new one. If they cannot create the resource because either it already exists or there is a key collision, then the user should be informed of that fact.
Choosing to retrieve an existing object when the user has attempted to create a new one seems to be a misleading approach.
Maybe one alternative would be to return a 400 Bad Request if the resource already exists and include a link to the existing object in the entity body. The client application could choose to swallow the bad request error and simply follow the link to the existing entity and by doing so hide the issue from the user. That would be the choice of the client application, but at least the server is behaving in a clear manner.
Based on the new example, let me suggest a completely different approach. It may not work in your case, as always the devil is in the details, but maybe it will be helpful.
From the client's perspective it really has no interest in whether the server is creating a new shortened URL or pulling back an existing one. In fact, whether the server needs to generate a new ID or not is an implementation detail that is completely hidden.
Hiding the creation process could be very valuable. Maybe the server can predict in advance that lots of short URLs will soon be requested for an event such as a conference. It could pre-generate these URLs in quiet periods to balance the load on its servers.
So, based on that assumption, why not just use
GET /ShortUrl?longUrl=http://www.example.org/en/article/something-that-is-crazy-long.html&suggestion=crazyUrl
If the url already existed then you might get back
303 See Other
Location: http://example.org/ShortUrl/3e4tyz
If it previously didn't, you might get
303 See Other
Location: http://example.org/ShortUrl/crazyurl
I realize that this looks like we are breaking the rules of GET by creating something in response to a GET, but I believe in this case there is nothing wrong with it because the client did not ask for the shortened URL to be created and really does not care either way. It is idempotent because it does not matter how many times you call it.
One interesting question that I don't know the answer to is whether proxies will cache the initial GET and redirect. That might be an interesting property as future requests by other users for the same url may never need to actually get to the origin server, the proxy could handle the request completely.
POST does not support a 'lookup or create' approach. The server cannot tell the client "I would create that, but it already existed. Look here for the existing entry". None of the 2xx codes work because the request is not successful. None of the 3xx codes work, because the intention is not to redirect the POST to a new resource. And 303 is also not appropriate since nothing changed (see 303 spec).
What you could do is provide a form or template to the client to be used with PUT that tells the client how to construct the PUT URI. If the PUT results in a 200 the client knows the resource existed and if 201 is returned that a new resource has been created.
For example:
Template for URI: http://service/items/{key}
PUT http://service/items/456
[data]
201 Created
or
PUT http://service/items/456
[data]
200 Ok
You can also do a 'create but do not replace if exists' using If-None-Match:
PUT http://service/items/456
If-None-Match: *
[data]
412 Precondition failed
Jan
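The PUT exchanges above can be sketched as a small in-memory handler. It is an illustration of the 201 / 200 / 412 outcomes only; put_item and the dictionary store are hypothetical and stand in for a real server.

```python
def put_item(store, key, data, if_none_match=None):
    """Return the status code for PUT /items/{key} against an in-memory store."""
    exists = key in store
    if if_none_match == "*" and exists:
        return 412  # Precondition Failed: refuse to replace the existing item
    store[key] = data
    # 201 Created for a new resource, 200 OK when an existing one was replaced.
    return 200 if exists else 201


if __name__ == "__main__":
    store = {}
    print(put_item(store, "456", {"hello": "world"}))                     # first PUT
    print(put_item(store, "456", {"hello": "again"}))                     # replace
    print(put_item(store, "456", {"hello": "nope"}, if_none_match="*"))   # guarded
```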
From the client's point of view, I would think that you could just send a 201 for case 2 the same as for case 1 as to the client the record is now "created".
The HTTP/1.1 spec (RFC 2616) suggests 303:
303 See Other
The response to the request can be found under a different URI and
SHOULD be retrieved using a GET method on that resource. This method
exists primarily to allow the output of a POST-activated script to
redirect the user agent to a selected resource. The new URI is not a
substitute reference for the originally requested resource.
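A minimal sketch of the "get or create" POST from the question using 303 for the existing-item case, as the quoted spec text suggests. The post_item function, the in-memory store and the generated IDs are all hypothetical; only the 201 and 303 status codes and the Location header come from the discussion.

```python
import itertools


def post_item(store, counter, payload):
    """Return (status, headers) for POST /v1/item/ with a dict payload."""
    key = tuple(sorted(payload.items()))
    if key in store:
        # Item already exists: point the client at it, to be fetched with GET.
        return 303, {"Location": f"/v1/item/{store[key]}"}
    item_id = f"id{next(counter)}"
    store[key] = item_id
    return 201, {"Location": f"/v1/item/{item_id}"}


if __name__ == "__main__":
    store, counter = {}, itertools.count(1)
    payload = {"hello": "world", "hokey": "pokey"}
    print(post_item(store, counter, payload))  # created
    print(post_item(store, counter, payload))  # already there
```

Either way the client receives a Location it can GET, so it does not need to care whether the item was newly created.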

Resources