Tomcat not handling encoded URL - nginx

I'm hitting a problem with Tomcat 8.5 when using a URL that needs to be encoded. The URL contains [ and ] characters, and those are correctly encoded as %5B and %5D. I send the request with curl, e.g. curl http://somewhere.com/foo%5Bbar%5D, but Tomcat responds with a 400 error stating that "The server cannot or will not process the request due to something that is perceived to be a client error (e.g., malformed request syntax, invalid request message framing, or deceptive request routing).".
In the Tomcat access log, the URL is reported in its decoded form along with the 400 response.
I do have an nginx reverse proxy in between, but I don't think that would be decoding the URL.
Any ideas what's wrong?

So it turned out to be a known problem with the nginx ingress: when it has to rewrite the URL, it decodes it and sends the decoded URL downstream to Tomcat, which then blows up.
There are various discussions on this, but as I needed to fix it as part of the K8S ingress definition, I used the approach described here. I'm not sure where the upstream_balancer thing comes from (it must be some black magic), but it does seem to fix the problem. I'm sure other approaches would also work.
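To see why decoding in the proxy matters, here is a small illustration. It assumes a proxy that forwards the decoded path, and it uses java.net.URI only as a stand-in for a strict RFC 3986 parser; it is not Tomcat's request parser, and the class and method names are hypothetical. The encoded path is accepted, while the decoded one contains raw [ and ], which a strict parser rejects.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Illustration only: java.net.URI stands in for a strict RFC 3986 parser.
// It is NOT Tomcat's request parser.
public class DecodedPathDemo {

    // What a proxy that decodes the path before forwarding would send on.
    static String decodePath(String encodedPath) {
        return URI.create(encodedPath).getPath(); // getPath() returns the decoded path
    }

    // True if the strict parser accepts the path as part of a full URI.
    static boolean isStrictlyValid(String path) {
        try {
            new URI("http://somewhere.com" + path);
            return true;
        } catch (URISyntaxException e) {
            return false; // e.g. raw '[' or ']' in the path
        }
    }

    public static void main(String[] args) {
        String encoded = "/foo%5Bbar%5D";
        String decoded = decodePath(encoded);
        System.out.println(encoded + " valid: " + isStrictlyValid(encoded)); // /foo%5Bbar%5D valid: true
        System.out.println(decoded + " valid: " + isStrictlyValid(decoded)); // /foo[bar] valid: false
    }
}
```

This is why the fix has to stop the ingress from forwarding the decoded form: the original, encoded request target is the one the backend can parse.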

Related

nginx 1.21.1 - spaces in request line - 400 Bad Request

Since version 1.21.1, nginx returns an HTTP/1.1 400 Bad Request error when there is a space in the request URL.
From the nginx official changelog:
*) Change: now nginx always returns an error if spaces or control
characters are used in the request line.
My server is still receiving some requests with spaces in the GET query string from some older Android devices. I know that spaces in the url need to be encoded, but I have a few special cases where this is not possible right now.
Is there any option to turn off this new behaviour without having to revert to version 1.21.0?
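For context on why a raw space is treated as an error: an HTTP/1.1 request line is method SP request-target SP HTTP-version, so an unencoded space in the target splits it into too many tokens. The sketch below is a simplified illustration of that framing rule, not nginx's parser; the class and method names are hypothetical, and encodeSpaces only applies where you control the client.

```java
// Simplified illustration, NOT nginx's parser: a request line is
// "method SP request-target SP HTTP-version", so a raw space in the
// target produces more than three tokens.
public class RequestLineDemo {

    // Returns {method, target, version}, or null if the line is malformed.
    static String[] parseRequestLine(String line) {
        for (char c : line.toCharArray()) {
            if (c < 0x20 || c == 0x7F) {
                return null; // control characters are rejected outright
            }
        }
        String[] parts = line.split(" ", -1);
        return parts.length == 3 ? parts : null;
    }

    // Client-side fix: percent-encode spaces in the target before sending.
    static String encodeSpaces(String target) {
        return target.replace(" ", "%20");
    }

    public static void main(String[] args) {
        String target = "/search?q=hello world";
        System.out.println(parseRequestLine("GET " + target + " HTTP/1.1") == null);               // true
        System.out.println(parseRequestLine("GET " + encodeSpaces(target) + " HTTP/1.1") != null); // true
    }
}
```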

HTTP 403 Forbidden Message Format

What is the correct format for sending an HTTP 403 forbidden message?
I'm writing a proxy in C for a homework project that has a content filtering system built in. When my proxy detects that a server's response contains keywords from the content blacklist, I would like to send an HTTP 403 Forbidden message.
Currently, I am sending the message as: "HTTP/1.1 403 Forbidden\r\n\r\n" (without the quotes) as per this standard: https://www.rfc-editor.org/rfc/rfc7231#section-6.5.3
When I send this message, the browser doesn't display an error and looks like it's still trying to load the page.
Are there any required header fields for this http message that I missed? Also, is this the correct usage for the 403 error? I couldn't find anything else that would be more fitting, so I chose 403 because the client won't automatically re-request the data.
Thanks in advance for any help!
For those struggling with this issue as I did: you need to make sure to close the socket or set Connection: close, as Sami noted in the comments. I assumed you could keep the connection open so the client could send another request over an HTTP persistent connection, but the client will need to open a new connection.
As for the HTML displayed, you can send a body with the response (make sure you set Content-Length) containing the HTML you want displayed.
Finally, here are two references, one to the HTTP response specification and the other to the Amazon RESTful API response codes:
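Putting the answer together, a complete 403 response can be built as below. This is a minimal sketch in Java for consistency with the other examples; the byte layout is identical in C. It assumes a UTF-8 HTML body, and ForbiddenResponse.build is a hypothetical name. Content-Length counts bytes, not characters. Without Content-Length, a 403 body is delimited only by the connection closing, which is why the browser keeps waiting when the socket stays open.

```java
import java.nio.charset.StandardCharsets;

// Minimal sketch: a complete 403 response with an HTML body.
public class ForbiddenResponse {

    static byte[] build(String html) {
        byte[] body = html.getBytes(StandardCharsets.UTF_8);
        String head = "HTTP/1.1 403 Forbidden\r\n"
                + "Content-Type: text/html; charset=utf-8\r\n"
                + "Content-Length: " + body.length + "\r\n" // length in bytes
                + "Connection: close\r\n"                    // tell the client we will close
                + "\r\n";                                    // blank line ends the headers
        byte[] headBytes = head.getBytes(StandardCharsets.US_ASCII);
        byte[] out = new byte[headBytes.length + body.length];
        System.arraycopy(headBytes, 0, out, 0, headBytes.length);
        System.arraycopy(body, 0, out, headBytes.length, body.length);
        return out;
    }

    public static void main(String[] args) {
        byte[] resp = build("<html><body><h1>403 Forbidden</h1><p>Blocked by content filter.</p></body></html>");
        System.out.print(new String(resp, StandardCharsets.UTF_8));
        // In the proxy: write these bytes to the client socket, then close it.
    }
}
```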
https://www.rfc-editor.org/rfc/rfc7231#section-6.5.3
https://developer.amazon.com/docs/amazon-drive/ad-restful-api-response-codes.html

What is the correct status code if request URL does not match regex

I'm trying to develop error handling in my REST API and I'm currently working on the response codes. I haven't found a proper answer to what would be the appropriate response code for my problem.
I have routes which are managed like this in the back-end code:
$site->route([
'route' => '/{id:^[a-z]+$}'
]);
Now let's say the user requests www.example.com/page1, which does not match the regex pattern. What would be the correct response? I am thinking of 404 (Not Found), but I also think a 400 response could be correct, because it indicates that there was an error with the request.
I already use 404 if static URLs are invalid, so this is a question of matching dynamic parts of the URLs.
What are the appropriate uses of each of these response codes in REST APIs?
I think 404 is more appropriate.
According to the specification, 400 means:
The 400 (Bad Request) status code indicates that the server cannot or
will not process the request due to something that is perceived to be
a client error.
While 404 means:
The 404 (Not Found) status code indicates that the origin server did
not find a current representation for the target resource or is not
willing to disclose that one exists.
From the client's point of view, a request to /page1 results in an error because the resource page1 does not exist (no route is matched on the server side).
Normally, 400 means the server has already identified the target resource but cannot return it due to a client error. In this scenario, the server cannot identify the resource because no route matches. Besides, if the request is sent to /123456 and the status code is 400, what should the error message be? "Your requested URL is incorrect" sounds like a 404, and "Your requested URL should follow the regular expression: ...^[a-z]+$..." sounds very weird.
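The reasoning above can be sketched as a tiny router: if no route's pattern matches the path, no resource was identified, so the answer is 404. This is a hypothetical Java router, not the $site->route API from the question; it assumes each route is a regex for a single path segment, mirroring '/{id:^[a-z]+$}'.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical router: each route is a regex for a single path segment,
// mirroring '/{id:^[a-z]+$}' from the question.
public class RegexRouter {
    private final List<Pattern> routes = new ArrayList<>();

    void route(String segmentRegex) {
        routes.add(Pattern.compile(segmentRegex));
    }

    // 200 if some route matches the path, otherwise 404: an unmatched
    // dynamic segment means no resource was identified.
    int statusFor(String path) {
        String segment = path.startsWith("/") ? path.substring(1) : path;
        for (Pattern p : routes) {
            if (p.matcher(segment).matches()) {
                return 200;
            }
        }
        return 404;
    }

    public static void main(String[] args) {
        RegexRouter router = new RegexRouter();
        router.route("^[a-z]+$");
        System.out.println(router.statusFor("/page"));   // 200
        System.out.println(router.statusFor("/page1"));  // 404
        System.out.println(router.statusFor("/123456")); // 404
    }
}
```

In this design, 400 would be reserved for requests that do reach a route but carry invalid input, such as a malformed body.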

jetty BadMessage: 400 No Host for HttpChannelOverHttp

I have seen previous posts about Jetty BadMessage: 400 No Host for HttpChannelOverHttp and I can confirm that I am able to repeat the problem.
I have a Jetty route in Camel Blueprint, which creates another request and forwards on to a Dropwizard service via Camel HTTP.
.process(new Processor() {
    @Override
    public void process(Exchange exchange) throws Exception {
        // Creates the object for the request
    }
})
.marshal(jsonFormat)
.convertBodyTo(String.class)
.setHeader(Exchange.HTTP_URI, simple(serviceEndpoint))
.setHeader(Exchange.HTTP_METHOD, constant(HttpMethod.POST))
.to(userviceEndpoint);
When this request executes, I see the following error on Dropwizard
WARN [2014-11-12 23:15:35,333] org.eclipse.jetty.http.HttpParser: BadMessage: 400 No Host for HttpChannelOverHttp#3aa99dd2{r=0,a=IDLE,uri=-}
This happens constantly, and this problem does not occur when I send a request to the DW service using SOAP-UI (using the serviceEndpoint URL).
If anyone has solved this problem, I would like to know how. Thank you.
Capture your network traffic, and post the HTTP request headers you are sending to Jetty.
Odds are that your HTTP client is not sending the Host: header (which is required in HTTP/1.1).
In my case, I was setting a header with a null value. Removing that header from the request solved the issue.
Jetty Version: 9.3.8
I got this error when I was making a request with incorrectly formatted headers: instead of having the header as "X_S_ID: ABC", I had "X_S_ID: ["X_S_ID":BLAH]". So the error does not always literally mean you need to pass a Host header.
Fixing the headers fixed this. Print the exact request you are making and make sure all headers are correctly formatted.
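A generic sketch of the checks suggested in these answers, in plain Java rather than Camel's API: drop headers whose value is null, make sure a Host header is present, and render the header block so it can be printed and inspected. Rejecting CR/LF inside values is an extra assumption about what "incorrectly formatted" can mean; sanitize, render and the host value are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Generic sketch (not Camel's API) of pre-send header checks.
public class HeaderCheck {

    static Map<String, String> sanitize(Map<String, String> headers, String host) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : headers.entrySet()) {
            String value = e.getValue();
            if (value == null) {
                continue; // drop null-valued headers
            }
            if (value.indexOf('\r') >= 0 || value.indexOf('\n') >= 0) {
                // Assumption: CR/LF in a value would break the header framing.
                throw new IllegalArgumentException("CR/LF in header " + e.getKey());
            }
            out.put(e.getKey(), value);
        }
        boolean hasHost = out.keySet().stream().anyMatch(k -> k.equalsIgnoreCase("Host"));
        if (!hasHost) {
            out.put("Host", host); // Host is required in HTTP/1.1
        }
        return out;
    }

    // Renders "Name: value\r\n" lines so the exact headers can be printed.
    static String render(Map<String, String> headers) {
        StringBuilder sb = new StringBuilder();
        headers.forEach((k, v) -> sb.append(k).append(": ").append(v).append("\r\n"));
        return sb.toString();
    }
}
```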

How does HTTP caching work in a proxy server?

It's my understanding that caching is one of the main utilities of a proxy server. I'm currently trying to develop a simple one and I would like to know exactly how caching works.
Intuitively I think that it's basically an association between a request and a response. For example: for the following request: "GET google.com" you have the following response: "HTTP/1.0 200 OK..."
That way, whenever the proxy server receives a request for that URL, it can reply with the cached response (I'm not really worried right now about when to serve the cached response and when to actually send the request to the real destination).
What I don't understand is how to establish the association between a request and a response since the HTTP response doesn't have any field saying "hey this is the response you get when you request the X URL" (or does it?).
Should I get this information by analyzing the underlying protocols? If so, how?
Your caching proxy is already in play when a request arrives, so you already have the requested resource URL. Look in your cache for a resource stored under that URL; if you cannot find it (or the cached copy is outdated), fetch the data from the source. Keep in mind that you have to invalidate the cached resource if you receive a PUT, POST or DELETE request.
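The lookup described in this answer can be sketched as follows: the proxy keys cached responses by the URL of the request it is forwarding, so no field in the response is needed to make the association. This is a minimal sketch under stated assumptions: the fixed TTL is a hypothetical simplification (real HTTP caches use Cache-Control and Expires), Origin is a hypothetical interface standing in for the upstream fetch, and time is passed in explicitly to keep the example deterministic.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch: responses cached by request URL, invalidated on writes.
// The fixed TTL is a simplification; real caches honour Cache-Control/Expires.
public class ProxyCache {

    record Entry(byte[] response, long expiresAtMillis) {}

    // Hypothetical stand-in for forwarding the request to the real server.
    interface Origin {
        byte[] fetch(String method, String url);
    }

    private final Map<String, Entry> cache = new HashMap<>();

    byte[] handle(String method, String url, Origin origin, long ttlMillis, long nowMillis) {
        if (method.equals("GET")) {
            Entry e = cache.get(url);
            if (e != null && e.expiresAtMillis() > nowMillis) {
                return e.response(); // fresh cache hit: no upstream request
            }
            byte[] fresh = origin.fetch(method, url); // miss or stale: fetch and store
            cache.put(url, new Entry(fresh, nowMillis + ttlMillis));
            return fresh;
        }
        if (method.equals("PUT") || method.equals("POST") || method.equals("DELETE")) {
            cache.remove(url); // the resource may change, so drop the cached copy
        }
        return origin.fetch(method, url);
    }
}
```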
