Drop a connection without sending any response in axum - http

Is it possible to not send any response to an incoming request using axum?
I am trying to implement a blacklist, and while it's certainly possible to just return a 404 (you might make an argument for a different error code), if an address has earned a spot on the blacklist, I'd rather not devote the server resources or courtesy (minimal as they may be) to actually spitting out a response; I'd rather just drop the request on the floor, as it were.
The documentation mentions that errors that bubble up to the hyper layer will cause hyper to close the connection without sending a response; it goes on to explain that this is why every single service/handler/etc. in axum must use Infallible as its error type. While I agree that closing the connection is (in the docs' terminology) "generally not desirable", I believe that in this particular circumstance it is specifically desirable to drop the incoming request. I hope this is possible, but I can't figure out how to do it.
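For concreteness, the closest thing I can imagine is a hand-rolled accept loop that checks the peer address before the stream ever reaches hyper, along these lines (a rough sketch, assuming axum 0.7 with hyper-util and loosely following axum's serve-with-hyper example; the blacklist contents are just a placeholder), though I don't know whether this is the right approach:

```rust
use std::collections::HashSet;
use std::net::IpAddr;

use axum::{extract::Request, routing::get, Router};
use hyper::body::Incoming;
use hyper_util::rt::{TokioExecutor, TokioIo};
use hyper_util::server::conn::auto::Builder;
use tokio::net::TcpListener;
use tower::Service;

#[tokio::main]
async fn main() {
    // Placeholder blacklist; in reality this would come from config or a database.
    let blacklist: HashSet<IpAddr> = ["203.0.113.7".parse().unwrap()].into_iter().collect();

    let app = Router::new().route("/", get(|| async { "hello" }));

    let listener = TcpListener::bind("0.0.0.0:3000").await.unwrap();
    loop {
        let (stream, peer) = listener.accept().await.unwrap();

        if blacklist.contains(&peer.ip()) {
            // Never speak HTTP to this peer: drop the socket without any response.
            drop(stream);
            continue;
        }

        // Only connections that passed the check are handed to hyper at all.
        let tower_service = app.clone();
        tokio::spawn(async move {
            let hyper_service = hyper::service::service_fn(move |req: Request<Incoming>| {
                tower_service.clone().call(req)
            });
            let _ = Builder::new(TokioExecutor::new())
                .serve_connection(TokioIo::new(stream), hyper_service)
                .await;
        });
    }
}
```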

Related

What is the difference between PUT and POST, and why is PUT considered idempotent?

It is said everywhere (after reading many posts) that PUT is idempotent, meaning multiple requests with the same inputs will produce the same result as the very first request.
But if we send the same request with the same inputs using the POST method, then it, too, will appear to behave like PUT.
So what is the difference between PUT and POST in terms of idempotency?
The idea is that there should be a difference between POST and PUT, not that the protocol enforces one for you. To clarify, a POST request should ideally create a new resource, whereas a PUT request should be used to update an existing one. So a client sending two POST requests would create two resources, whereas two PUT requests wouldn't (or rather shouldn't) cause any undesirable change.
To go into more detail, idempotency means that in an isolated environment multiple requests from the same client do not have any further effect on the state of the resource. If a request from another client changes the state of the resource, that does not break the idempotency principle. That said, if you really want to ensure that a PUT request does not end up overriding the changes made by another simultaneous request from a different client, you should always use ETags. To elaborate, the PUT request should always supply the ETag (obtained from a previous GET) of the last known resource state, and the resource should only be updated if that ETag is still current; otherwise a 412 (Precondition Failed) status code should be returned. On a 412, the client is supposed to GET the resource again and then retry the update. According to REST, this is vital to prevent race conditions.
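A minimal sketch of that conditional-update flow, using the reqwest crate (the endpoint, field name and JSON shape are made up purely for illustration):

```rust
use reqwest::{header, Client, StatusCode};

async fn update_with_etag(client: &Client) -> Result<(), reqwest::Error> {
    // GET the resource and remember the ETag the server returned for it.
    let get = client.get("https://api.example.com/items/5").send().await?;
    let etag = get
        .headers()
        .get(header::ETAG)
        .cloned()
        .expect("server should return an ETag");

    // PUT the full new representation, conditional on that ETag still being current.
    let put = client
        .put("https://api.example.com/items/5")
        .header(header::IF_MATCH, etag)
        .json(&serde_json::json!({ "total": 10 }))
        .send()
        .await?;

    if put.status() == StatusCode::PRECONDITION_FAILED {
        // 412: someone else changed the resource in the meantime; GET again and retry.
        eprintln!("resource changed under us; re-fetch and retry the update");
    }
    Ok(())
}
```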
According to the W3C (http://www.w3.org/Protocols/rfc2616/rfc2616-sec9.html): 'Methods can also have the property of "idempotence" in that (aside from error or expiration issues) the side-effects of N > 0 identical requests is the same as for a single request.'

Can a web server begin responding before the client has sent the full request?

I am writing a web application for an academic research group. The researchers need to be able to upload large data sets (100MB - 1GB) in CSV format. I've written the server to process the data as it comes in. This means that if there is an error in the first row of the CSV, we can return an error straight away.
However, when this happens, the browser reports that "The connection was reset" or similar. Clearly, my web server is responding in a way that doesn't make sense.
If I explicitly close the HTTP request stream (this is Kotlin on the JVM by the way) before returning the error to the browser, then the problem goes away. However, it turns out that the close implementation of the request stream first goes and reads the whole stream to its end. So at that point the user still has to wait 30mins+ to find out that there is an error in the first row of their CSV.
Is what I am trying to do possible? Does the HTTP protocol permit a web server, in any circumstances, to begin responding before the full request body has been sent? If not, can you suggest a workaround that would allow me to deliver a user experience where the user doesn't have to wait for the whole file to be uploaded before finding out if there are any problems?
The answer is yes: according to the HTTP spec, servers are allowed to start sending a response early, and the client should then stop sending the request body. Most browsers, however, don't implement this correctly.
In theory, your HTTP server needs to return a 4xx error code with a response body, then reset the connection to prevent the upload from continuing in the background. See the answers below for a more detailed description of the issue. There are a couple of browser versions that do support this, so if you're doing this in lab conditions where you can control the client being used, the links below will help.
https://stackoverflow.com/a/14483857/2274303
https://stackoverflow.com/a/18370751/2274303
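For example, the server side of "respond as soon as the first row is bad" might look roughly like this, sketched with Rust and axum rather than the asker's Kotlin stack (the 'id' column check stands in for whatever the real validation is):

```rust
use axum::{body::Body, http::StatusCode, response::{IntoResponse, Response}, routing::post, Router};
use futures_util::StreamExt;

async fn upload_csv(body: Body) -> Response {
    let mut stream = body.into_data_stream();
    let mut buf: Vec<u8> = Vec::new();

    // Read only until the first newline, i.e. the CSV header row.
    while let Some(chunk) = stream.next().await {
        let Ok(chunk) = chunk else {
            return (StatusCode::BAD_REQUEST, "upload interrupted").into_response();
        };
        buf.extend_from_slice(&chunk);
        if let Some(pos) = buf.iter().position(|&b| b == b'\n') {
            let header = String::from_utf8_lossy(&buf[..pos]);
            if !header.split(',').any(|col| col.trim() == "id") {
                // Bail out right away; the rest of the (possibly huge) body is never read.
                return (StatusCode::BAD_REQUEST, "first row is missing an 'id' column")
                    .into_response();
            }
            break;
        }
    }

    // ...a real handler would keep streaming and validating the remaining rows here...
    (StatusCode::OK, "accepted").into_response()
}

fn app() -> Router {
    Router::new().route("/upload", post(upload_csv))
}
```

Whether the client actually sees that early response before it finishes uploading is exactly the browser-support problem described above.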
[edit]
To answer your question about using a workaround, chunking the uploads using JavaScript is a good way to mitigate internet connectivity issues, but if you want to parse it in real time it's not as simple as arbitrarily breaking the file into pieces. You need to make sure you're not splitting the file in the middle of a line, otherwise it will fail even if the data is valid. That brings up the issue of parsing a 1GB file in JavaScript, which isn't a good idea in my opinion.
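To make the line-boundary point concrete, the chunker has to cut each piece at the last complete line and carry the remainder over into the next piece; a minimal sketch of that split (in Rust rather than browser JavaScript, purely illustrative):

```rust
/// Split an upload buffer so that the first returned slice ends on a complete line
/// and the second slice is carried over into the next chunk. No CSV row is ever
/// split across two requests this way.
fn split_at_last_newline(buf: &[u8]) -> (&[u8], &[u8]) {
    match buf.iter().rposition(|&b| b == b'\n') {
        // Send everything up to and including the last newline.
        Some(pos) => buf.split_at(pos + 1),
        // No complete line in this chunk yet; send nothing and keep buffering.
        None => (&buf[..0], buf),
    }
}
```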
If you want to use JavaScript, continue uploading the entire file at once via an AJAX request, so you can handle the response outside of the main DOM and force a redirect or cancel the upload. Depending on which JS libraries you're using, there are different ways of doing this.
None of this solves the reverse scenario. What if the file is 95% uploaded before there's an error? The researcher will need to either upload the whole thing again or edit the file to only include the rows from the error going forward. That means your application needs to support partial uploads and know to pick up where it left off. All these things are possible, but you're probably not going to find a simple workaround to get this working well.
Without understanding the dataset and what kind of validation you are doing it's hard to come up with a full solution. If parsing each row doesn't depend on the previous rows being valid, you could always upload the whole file, then display the rows with errors at the end and ask them to upload a second file with just the corrections.
The normal process for an HTTP web server goes like this:
Server listens for request
Client creates request
Client sends request to server
Server processes request
Server creates response
Server sends response to client
Client processes response
The client starts the connection, and the server is able to respond on that same connection; however, if you close the connection, the server would need to send its response on another connection, and the browser may not allow the server to start a new connection that the client didn't request.
You may be able to respond by reading the first line and creating an error quickly, but the client will not read the response until it is done sending the request.
By sending the file in chunks or asynchronously sending lines of the file, you will be able to give feedback more immediately. You will be sending many smaller requests with the ability to respond in between.
The question was about the HTTP protocol. I feel this would be allowed by the protocol if you wrote a custom client and server app; however, if you are using browsers, then you must use HTTP as the vendors have implemented it. In a custom app you could check for interruptions, but most browsers will probably send the full request before listening for a response, which is also one reason AJAX took off 20 years ago.

Correct handling of long-running HTTP processes

I am building a REST-style Back End web service which serves up pre-generated blobs of data to a Front End site. The blobs themselves are not large and can easily be satisfied in a single HTTP response/request. The Back End is written in PHP.
It all works just fine. The difficulty is the blob regeneration, which takes considerably longer when there are many blobs. The regeneration can take longer than the response timeout on the (separately hosted) server.
I would like to proceed as follows:
– the initial request sent is "regenerate all blobs"
– processing starts, with no response until either all are complete (HTTP 200, all hunky-dory) or I reach an internal timelimit
– if the timelimit is reached, I want to send a response which indicates that the processing was incomplete (which HTTP status is appropriate, since the processing was successful, but incomplete? – 206 does not apply without Range headers...), so that the client can request continuation. I can imagine returning data that indicates what the "please continue" request should be (is this best done in a Link header?)
– the client then requests continuation and the loop continues until full success is signalled.
What is the best way to signal such things in an HTTP client-server exchange? I am prepared to write a short piece of Javascript to handle the client loop.
Thanks for any good ideas!
I did quite a bit of research about Range headers, and came to the conclusion that there was not enough consensus about the effect of using a range unit that was not "bytes", despite the fact that the HTTP statuses 200, 206 and 416 carry useful meanings. See http://otac0n.com/blog/2012/11/21/range-header-i-choose-you.html for a good summary.
So I ended up writing a special case solution with the result in each response signalling whether to rerun the query with a "resume" value that enables already-processed blobs to be skipped.
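The resulting client loop looks something like this (sketched in Rust with reqwest for concreteness; the `resume` parameter name, the URL and the JSON shape are stand-ins for whatever the real API uses):

```rust
use reqwest::Client;
use serde::Deserialize;

#[derive(Deserialize)]
struct RegenStatus {
    complete: bool,         // did the server finish all blobs within its time limit?
    resume: Option<String>, // opaque value used to skip already-processed blobs
}

async fn regenerate_all(client: &Client) -> Result<(), reqwest::Error> {
    let mut resume: Option<String> = None;
    loop {
        let mut req = client.post("https://backend.example.com/blobs/regenerate");
        if let Some(token) = &resume {
            req = req.query(&[("resume", token.as_str())]);
        }
        let status: RegenStatus = req.send().await?.json().await?;
        if status.complete {
            return Ok(()); // all blobs regenerated
        }
        // Incomplete: loop again, continuing where the server left off.
        resume = status.resume;
    }
}
```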
It would be just great if the Range machinery also allowed "items" as a unit: this would enable simple collections to be handled via header parameters without additional overloading of the URI.

HTTP Redirects caused by public hotspots

When a user connects for the first time to a public hotspot, they often get back a welcome page with a login instead of the requested page. This can also happen when you request a page from code, leading to corruption. We would expect this type of page to always come back as a "302 redirect", but hard evidence that this always happens is difficult to come by, and we suspect some users are getting back corrupted data served with a 200 status by hotspots out in the world.
Does anyone know what the correct behaviour of a hotspot is, and how to avoid its pages getting into the flow?
This also goes for proxies that downsample images over 3G connections, etc. We have resorted to hashing our files with an integrity check to throw out altered data, but this seems a very heavyweight solution.
Note: we're happy with the data being proxied, we just want it left unaltered.
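A minimal sketch of the kind of hash-based integrity check mentioned above (Rust with the reqwest, sha2 and hex crates; the URL and the expected digest are placeholders):

```rust
use reqwest::Client;
use sha2::{Digest, Sha256};

/// Download a file and discard it unless its SHA-256 digest matches one
/// published out-of-band (e.g. alongside the download link over HTTPS).
async fn fetch_verified(
    client: &Client,
    url: &str,
    expected_sha256_hex: &str,
) -> Result<Option<Vec<u8>>, reqwest::Error> {
    let bytes = client.get(url).send().await?.bytes().await?;
    let digest = Sha256::digest(&bytes);
    if hex::encode(digest) == expected_sha256_hex {
        Ok(Some(bytes.to_vec()))
    } else {
        // A hotspot login page or a transcoding proxy altered the payload; throw it away.
        Ok(None)
    }
}
```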
The correct status code for this case is 511 Network Authentication Required (http://greenbytes.de/tech/webdav/rfc6585.html#status-511).

Why is using an HTTP GET to update state on the server in a RESTful call incorrect?

OK, I already know all the on-paper reasons why I should not use an HTTP GET when making a RESTful call to update the state of something on the server, thereby possibly returning different data each time. I know this is wrong for the following 'on paper' reasons:
HTTP GET calls should be idempotent
N > 0 calls should always GET the same data back
Violates HTTP spec
HTTP GET call is typically read-only
And I am sure there are more reasons. But I need a concrete, simple example for justification other than "Well, that violates the HTTP spec!"... or at least I am hoping for one. I have also already read the following, which are more along the lines of the list above: Does it violate the RESTful when I write stuff to the server on a GET call? &
HTTP POST with URL query parameters -- good idea or not?
For example, can someone justify the above and explain why it is wrong/bad practice/incorrect to use an HTTP GET with, say, the following RESTful call:
"MyRESTService/GetCurrentRecords?UpdateRecordID=5&AddToTotalAmount=10"
I know it's wrong, but hopefully it will help provide an example to answer my original question. So the above would update recordID = 5 with AddToTotalAmount = 10 and then return the updated records. I know a POST should be used, but let's say I did use a GET.
To answer my original question: how exactly does, or can, this cause an actual problem? Other than all the violations from the bullet list above, how can using an HTTP GET to do the above cause some real issue? Too many times I come across a scenario where I can justify things with "Because the doc said so", but I need justification and a better understanding on this one.
Thanks!
The practical case where you will have a problem is that an HTTP GET is often retried by the HTTP implementation in the event of a failure. So in real life you can get situations where the same GET is received multiple times by the server. If your update is idempotent, then there will be no problem, but if it's not idempotent (like adding some value to an amount, as in your example), then you could get multiple (undesired) updates.
HTTP POST is not retried automatically, so you would not have this problem.
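To make that concrete, here is a toy illustration (plain Rust, no HTTP; the names mirror the question's example URL) of why a retried delivery of the same request ends in a different state when the operation adds to a total instead of setting it:

```rust
use std::collections::HashMap;

/// Non-idempotent: running this twice adds the amount twice.
fn add_to_total(records: &mut HashMap<u32, i64>, record_id: u32, amount: i64) {
    *records.entry(record_id).or_insert(0) += amount;
}

/// Idempotent: running this twice leaves the same state as running it once.
fn set_total(records: &mut HashMap<u32, i64>, record_id: u32, total: i64) {
    records.insert(record_id, total);
}

fn main() {
    let mut records = HashMap::from([(5u32, 100i64)]);

    // The client sends GET ...?UpdateRecordID=5&AddToTotalAmount=10 once,
    // but the HTTP stack silently retries it after a dropped connection:
    add_to_total(&mut records, 5, 10);
    add_to_total(&mut records, 5, 10);
    assert_eq!(records[&5], 120); // the user expected 110

    // The idempotent version is safe to deliver twice:
    set_total(&mut records, 5, 110);
    set_total(&mut records, 5, 110);
    assert_eq!(records[&5], 110);
}
```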
If some form of search engine spiders your site, it could change your data unintentionally.
This happened in the past with Google's Desktop Search, which caused people to lose data because they had implemented delete operations as GETs.
Here is an important reason that GETs should be idempotent and should not be used for updating state on the server, in regard to Cross-Site Request Forgery attacks. From the book Professional ASP.NET MVC 3:
Idempotent GETs
Big word, for sure — but it’s a simple concept. If an
operation is idempotent, it can be executed multiple times without
changing the result. In general, a good rule of thumb is that you can
prevent a whole class of CSRF attacks by only changing things in your
DB or on your site by using POST. This means Registration, Logout,
Login, and so forth. At the very least, this limits the confused
deputy attacks somewhat.
One more problem: if the GET method is used, the data is sent in the URL itself. In the web server's logs, this data gets saved somewhere on the server along with the request path. Now suppose someone has access to, or reads, those log files: your data (which can be user IDs, passwords, keywords, tokens, etc.) gets revealed. This is dangerous and has to be taken care of.
In the server's log file, headers and body are not logged but the request path is. So with the POST method, where data is sent in the body rather than in the request path, your data remains safe.
I think that reading this resource:
http://www.servicedesignpatterns.com/WebServiceAPIStyles could be helpful to you to understand the difference between a message API and a resource API.
