How to have curl destroy/reset connection? - http

I am trying to recreate an intermittent issue I see with our test automation and am using curl (not libcurl) in a loop. But I see in the headers Connection #0 to host storage.googleapis.com left intact in successive requests in my loop. I want the connection to be destroyed/reset every time. The issue I am trying to test is on the TLS handshake and re-using the connection won't help.
I searched man curl for 'destroy' and 'reset' with no results and all the results for my web searches are around others getting connection resets, so it is a bit noisy.
I feel like this might be at the OS level.
How do I have curl reset the connection immediately?

curl doesn't have such an option (while libcurl does) but you can often achieve the same effect by insisting on doing the request using HTTP/1.0 with the --http1.0 option.
This works because persistent connections were not the default in HTTP/1.0.
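If the goal is simply a fresh TLS handshake on every iteration, one alternative to curl is a small script that opens and closes its own socket each time. Below is a minimal Python sketch of that idea, not something curl itself provides. The names `timed_handshake` and `run_loop` are made up for illustration, and the host to probe is whatever you pass in.

```python
import socket
import ssl
import time


def timed_handshake(host, port=443, timeout=10.0):
    """Open a new TCP connection, run one TLS handshake, then close.

    Returns (tls_version, seconds_taken). Nothing is kept between calls,
    so each call performs a full TCP connect plus TLS handshake.
    """
    context = ssl.create_default_context()
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with context.wrap_socket(raw, server_hostname=host) as tls:
            version = tls.version()
    return version, time.monotonic() - start


def run_loop(host, count, handshake=timed_handshake):
    """Call `handshake` `count` times, keeping each result or the error seen."""
    results = []
    for _ in range(count):
        try:
            results.append(handshake(host))
        except OSError as exc:  # ssl.SSLError is a subclass of OSError
            results.append(exc)
    return results
```

For example, `run_loop("storage.googleapis.com", 20)` would attempt twenty independent handshakes and return, for each attempt, either the negotiated TLS version with the elapsed time or the exception that was raised.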

Related

How to send an incomplete http request using netcat?

I'd like to send an incomplete HTTP request, or some kind of request that will temporarily block my server for a while. I wrote the server myself in C, and it is currently designed to accept only one client at a time. I want to test that this is indeed the case.
Would it be possible to send something simple, similar to GET /HTTP/1.0? I'm just doing all my testing in my terminal, not using anything else so far.
Yes, you can do this with netcat.
nc -c servername 80 <<<"GET / HTTP/1.0"
This will send the GET line and then wait for the server to respond. But the server should be waiting for headers and the blank line that ends them, so it will never respond. So nc will wait forever, keeping the connection open.
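The same trick can be sketched in Python if netcat is not available. This is only an illustration under a few assumptions: the function names are hypothetical, the request line is terminated with CRLF, and the server is expected to keep waiting for the blank line that never arrives.

```python
import socket


def send_incomplete_request(sock, path="/"):
    """Send only the request line, without the blank line that ends the headers."""
    request_line = f"GET {path} HTTP/1.0\r\n".encode("ascii")
    sock.sendall(request_line)
    return request_line


def hold_connection(host, port=80, seconds=30.0):
    """Connect, send a partial request, and wait up to `seconds` for any reply."""
    with socket.create_connection((host, port)) as sock:
        send_incomplete_request(sock)
        sock.settimeout(seconds)
        try:
            return sock.recv(4096)  # b"" means the server closed the connection
        except socket.timeout:
            return None  # the server stayed silent for the whole wait
```

While `hold_connection("servername", 80)` is waiting, a second client connecting to a one-client-at-a-time server should be left hanging, which is the behaviour the question wants to confirm.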

Scraping: SSL_ERROR_SYSCALL with cURL. Works in Chrome/Firefox

Motivation
I'm currently an exchange student at Taiwan Tech in Taipei, but the course overview/search engine is not very comfortable to use - so I'm trying to scrape it, which unexpectedly leads to a lot of difficulties.
Problem
Opening https://qcourse.ntust.edu.tw works just fine when using Chrome/Firefox, however, I run into trouble when trying to use command line interfaces:
# Trying to use curl:
$ curl https://qcourse.ntust.edu.tw
curl: (35) OpenSSL SSL_connect: SSL_ERROR_SYSCALL in connection to qcourse.ntust.edu.tw:443
# Trying to use wget:
$ wget https://qcourse.ntust.edu.tw
--2019-02-25 12:13:55-- https://qcourse.ntust.edu.tw/
Loaded CA certificate '/etc/ssl/certs/ca-certificates.crt'
Resolving qcourse.ntust.edu.tw (qcourse.ntust.edu.tw)... 140.118.242.168
Connecting to qcourse.ntust.edu.tw (qcourse.ntust.edu.tw)|140.118.242.168|:443... connected.
GnuTLS: The TLS connection was non-properly terminated.
Unable to establish SSL connection.
I also run into trouble when trying to use the browser Pale Moon
What I've considered
Maybe there is a problem with the certificate itself?
Seemingly not:
# This uses the same wildcard certificate (*.ntust.edu.tw) as qcourse.ntust.edu.tw
# (I double checked, and the SHA256 fingerprint is identical)
$ curl https://www.ntust.edu.tw
<html><head><meta http-equiv='refresh' content='0; url=bin/home.php'><title>title</title></head></html>%
Maybe I need specific headers that only Chrome/Firefox sends by default?
It seems like this doesn't solve anything either. By opening the request (Network tab) in Chrome, right clicking, and choosing "Copy" > "Copy as cURL", I get the same error message as earlier.
Additional information
The course overview site is written in ASP.NET, and seems to be running on Microsoft IIS httpd 6.0.
I find this quite mysterious and intriguing. I hope someone might be able to offer an explanation of this behaviour, and if possible: a workaround.
As you can see from the SSLLabs report, this server has a terrible setup. It gets a rating of F because it supports the totally broken SSLv2, the mostly broken SSLv3, and many totally broken ciphers. The only reasonably secure way to access this server is TLS 1.0 with TLS_RSA_WITH_3DES_EDE_CBC_SHA (3DES), a cipher that is considered weak rather than insecure like the others.
However, because 3DES is considered weak (albeit not insecure), it is disabled by default in most modern TLS stacks, and support for it has to be enabled explicitly. For curl with the OpenSSL backend this looks as follows, provided that the OpenSSL library you use still supports 3DES in the first place (which is not the case with a default build of OpenSSL 1.1.1):
$ curl -v --cipher '3DES' https://qcourse.ntust.edu.tw

Tomcat occasionally returns a response without HTTP headers

I’m investigating a problem where Tomcat (7.0.90 7.0.92) returns a response with no HTTP headers very occasionally.
According to the captured packets by Wireshark, after Tomcat receives a request it just returns only a response body. It returns neither a status line nor HTTP response headers.
It makes a downstream Nginx instance produce the error “upstream sent no valid HTTP/1.0 header while reading response header from upstream”, return 502 error to the client and close the corresponding http connection between Nginx and Tomcat.
What can be a cause of this behavior? Is there any possibility which makes Tomcat behave this way? Or there can be something which strips HTTP headers under some condition? Or Wireshark failed to capture the frames which contain the HTTP headers? Any advice to narrow down where the problem is is also greatly appreciated.
This is a screenshot of Wireshark's "Follow HTTP Stream" which is showing the problematic response:
EDIT:
This is a screenshot of the "TCP Stream" of the relevant part (only the response). It seems that the chunks in the second-to-last response look fine:
EDIT2:
I forwarded this question to the Tomcat users mailing list and got some suggestions for further investigation from the developers:
http://tomcat.10.x6.nabble.com/Tomcat-occasionally-returns-a-response-without-HTTP-headers-td5080623.html
But I haven’t found any proper solution yet. I’m still looking for insights to tackle this problem.
The issues you experience stem from pipelining multiple requests over a single connection with the upstream, as explained by yesterday's answer here by Eugène Adell.
Whether this is a bug in nginx, tomcat, your application, or the interaction of any combination of the above, would probably be a discussion for another forum, but for now, let's consider what would be the best solution:
Can you post your nginx configuration? Specifically, if you're using keepalive and a non-default value of proxy_http_version within nginx? – cnst 1 hour ago
@cnst I'm using proxy_http_version 1.1 and keepalive 100 – Kohei Nozaki 1 hour ago
As per an earlier answer to an unrelated question here on SO, yet sharing the configuration parameters as above, you might want to reconsider the reasons behind your use of the keepalive functionality between the front-end load-balancer (e.g., nginx) and the backend application server (e.g., tomcat).
As per a keepalive explanation on ServerFault in the context of nginx, the keepalive functionality in the upstream context of nginx wasn't even supported until very-very recently in the nginx development years. Why? It's because there are very few valid scenarios for using keepalive when it's basically faster to establish a new connection than to wait for an existing one to become available:
When the latency between the client and the server is on the order of 50ms+, keepalive makes it possible to reuse the TCP and SSL credentials, resulting in a very significant speedup, because no extra roundtrips are required to get the connection ready for servicing the HTTP requests.
This is why you should never disable keepalive between the client and nginx (controlled through http://nginx.org/r/keepalive_timeout in http, server and location contexts).
But when the latency between the front-end proxy server and the backend application server is on the order of 1ms (0.001s), using keepalive is a recipe for chasing Heisenbugs without reaping any benefits, as the extra 1ms latency to establish a connection might as well be less than the 100ms latency of waiting for an existing connection to become available. (This is a gross oversimplification of connection handling, but it just shows you how extremely insignificant any possible benefits of the keepalive between the front-end load-balancer and the application server would be, provided both of them live in the same region.)
This is why using http://nginx.org/r/keepalive in the upstream context is rarely a good idea, unless you really do need it, and have specifically verified that it produces the results you desire, given the points as above.
(And, just to make it clear, these points are irrespective of what actual software you're using, so, even if you weren't experiencing the problems you experience with your combination of nginx and tomcat, I'd still recommend you not use keepalive between the load-balancer and the application server even if you decide to switch away from either or both of nginx and tomcat.)
My suggestion?
The problem wouldn't be reproducible with the default values of http://nginx.org/r/proxy_http_version and http://nginx.org/r/keepalive.
If your backend is within 5ms of front-end, you most certainly aren't even getting any benefits from modifying these directives in the first place, so, unless chasing Heisenbugs is your path, you might as well keep these specific settings at their most sensible defaults.
We see that you are reusing an established connection to send the POST request and that, as you said, the response comes without the status-line and the headers.
after Tomcat receives a request it just returns only a response body.
Not exactly. It starts with 5d, which is probably a chunk-size. This means that the latest "full" response (with status-line and headers) received on this connection contained a "Transfer-Encoding: chunked" header. For some reason, your server still believes the previous response isn't finished by the time it starts sending this new response to your last request.
A missing last-chunk seems to be confirmed, as the screenshot doesn't show a last-chunk (value = 0) ending the previous response. Note that the last response does end with a last-chunk (the last byte shown is 0).
What causes this? The previous response isn't technically considered fully answered. It can be a bug in Tomcat, in your web service library, or in your own code. Maybe you're even sending your request too early, before the previous one was completely answered.
Are some bytes missing if you compare the chunk-sizes with what is actually sent to the client? Are all buffers flushed? Beware of the line endings (CRLF vs. LF only) too.
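To compare chunk-sizes against the bytes actually captured, a small decoder can help. The following is a rough Python sketch of chunked decoding as described in RFC 2616 3.6.1, written for this answer and not taken from Tomcat or nginx. The function name `decode_chunked` and its return shape are assumptions for illustration.

```python
def decode_chunked(data):
    """Decode a chunked HTTP body held in `data` (bytes).

    Returns (body, complete, rest). `complete` is True only if the terminating
    zero-size chunk was seen; `rest` is whatever follows the decoded part.
    Raises ValueError if a chunk-size line is not hexadecimal.
    """
    body = b""
    pos = 0
    while True:
        line_end = data.find(b"\r\n", pos)
        if line_end == -1:
            return body, False, data[pos:]  # chunk-size line missing or cut off
        # The size is hexadecimal (0x5d == 93); drop any ";extension" after it.
        size = int(data[pos:line_end].split(b";")[0].strip(), 16)
        start = line_end + 2
        if size == 0:
            # last-chunk: skip optional trailer lines up to the empty line
            end = data.find(b"\r\n", start)
            while end != -1 and end != start:
                start = end + 2
                end = data.find(b"\r\n", start)
            if end == -1:
                return body, False, data[start:]
            return body, True, data[end + 2:]
        chunk = data[start:start + size]
        if len(chunk) < size or data[start + size:start + size + 2] != b"\r\n":
            return body + chunk, False, b""  # truncated chunk or bad line ending
        body += chunk
        pos = start + size + 2
```

Feeding it the raw bytes of one response from the capture shows whether the last-chunk is really there: `complete` stays False when a response stops after a data chunk without the final `0` chunk.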
One last cause that I'm thinking about, if your response contains some kind of user input taken from the request, you can be facing HTTP Splitting.
Possible solutions.
It is worth trying to disable the chunked encoding at your library level, for example with Axis2 check the HTTP Transport.
When reusing a connection, check your client code to make sure that you aren't sending a request before you read all of the previous response (to avoid overlapping).
Further reading
RFC 2616 3.6.1 Chunked Transfer Coding
It turned out that the "sjsxp" library which JAX-WS RI v2.1.3 uses makes Tomcat behave this way. I tried a different version of JAX-WS RI (v2.1.7) which doesn't use the "sjsxp" library anymore and it solved the issue.
A very similar issue posted on Metro mailing list: http://metro.1045641.n5.nabble.com/JAX-WS-RI-2-1-5-returning-malformed-response-tp1063518.html

gawk to read last bit of binary data over a pipe without timeout?

I have a program already written in gawk that downloads a lot of small bits of info from the internet. (A media scanner and indexer)
At present it launches wget to get the information. This is fine, but I'd like to simply reuse the connection between invocations. It's possible a run of the program might make between 200 and 2000 calls to the same API service.
I've just discovered that gawk can do networking and found geturl
However the advice at the bottom of that page is well heeded, I can't find an easy way to read the last line and keep the connection open.
As I'm mostly reading JSON data, I can set RS="}" and exit when body length reaches the expected content-length. This might break with any trailing white space though. I'd like a more robust approach. Does anyone have a nicer way to implement sporadic http requests in awk that keep the connection open. Currently I have the following structure...
con="/inet/tcp/0/host/80";
send_http_request(con);
RS="\r\n";
read_headers();
# now read the body - but do not close the connection...
RS="}"; # for JSON
while ( con |& getline bytes ) {
    body = body bytes RS;
    if (length(body) >= content_length) break;
    print length(body);
}
# Do not close con here - keep open
It's a shame this one little thing seems to be spoiling all the potential here. Also in case anyone asks :) ..
awk was originally chosen for historical reasons - there were not many other language options on this embedded platform at the time.
Gathering up all of the URLs in advance and passing to wget will not be easy.
re-implementing in perl/python etc is not a quick solution.
I've looked at trying to pipe urls to a named pipe and into wget -i - , that doesn't work. Data gets buffered, and unbuffer not available - also I think wget gathers up all the URLS until EOF before processing.
The data is small so lack of compression is not an issue.
The problem with the connection reuse comes from the HTTP 1.0 standard, not gawk. To reuse the connection you must either use HTTP 1.1 or try some other non-standard solutions for HTTP 1.0. Don't forget to add the Host: header in your HTTP/1.1 request, as it is mandatory.
You're right about the lack of robustness when reading the response body. For line oriented protocols this is not an issue. Moreover, even when using HTTP 1.1, if your script blocks waiting for more data when it shouldn't, the server will, again, close the connection due to inactivity.
As a last resort, you could write your own HTTP retriever in whatever language you like, one which reuses connections (all to the same remote host, I presume) and also inserts a special record separator for you. Then you could control it from the awk script.
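As a rough idea of what the reading side of such a retriever could look like, here is a Python sketch that reads exactly Content-Length bytes per response, so trailing whitespace after the JSON does not matter and the connection can stay open. It assumes the server always sends Content-Length (no chunked encoding), and `read_response` is a hypothetical helper name. With a real socket the stream would come from `sock.makefile("rb")`.

```python
import io


def read_response(stream):
    """Read one HTTP response from a binary file-like object.

    Returns (status_line, headers, body). The body is read by Content-Length,
    so the stream is left positioned at the start of the next response.
    """
    status_line = stream.readline().rstrip(b"\r\n").decode("latin-1")
    headers = {}
    while True:
        line = stream.readline()
        if line in (b"\r\n", b"\n", b""):
            break  # blank line ends the headers (b"" means the peer closed)
        name, _, value = line.decode("latin-1").partition(":")
        headers[name.strip().lower()] = value.strip()
    length = int(headers.get("content-length", "0"))
    body = stream.read(length)
    return status_line, headers, body


# Demo: two back-to-back responses, as they would arrive on one kept-open
# connection. The first body has a trailing space after the closing brace.
demo = io.BytesIO(
    b'HTTP/1.1 200 OK\r\nContent-Length: 9\r\n\r\n{"a": 1} '
    b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\n{}"
)
first = read_response(demo)
second = read_response(demo)
```

Because the length comes from the header rather than from a record separator, the second call picks up cleanly where the first one stopped.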

NGINX Reverse Proxy : Many html status code 400 responses, why?

We have recently implemented a nginx based reverse proxy.
While debugging our access logs, we are seeing quite a few status code 400 results.
They look something like this:
[07/Sep/2011:05:49:04 -0700] - "400" 0 "-" "-" "-"
We have enabled debug error logging, and they usually correspond to something like this:
2011/09/07 05:09:28 [info] 5937#0: *30904 client closed prematurely connection while reading client request line
We have tried raising a number of the buffers, as mentioned by a few pages we were able to google up.
http://www.ruby-forum.com/topic/173362
or
http://blog.craz8.com/articles/2009/06/17/nginx-400-bad-request-errors-due-to-cookies-and-what-to-do-about-them
To no avail.
Why is this happening?
This is a standard nginx reverse proxy -> apache backend server.
Worth mentioning, the unique type of content on our site is fairly minimal. We have tested this using many browsers and are not personally receiving any of these 400 results.
Thanks!
Further urls detailing similar entries in their logs:
http://blog.rayfoo.info/2009/10/weird-web-server-access-log-entries
I found this was caused by using Chrome, which apparently opens extra connections occasionally without sending any data.
Here's some more info: http://www.ruby-forum.com/topic/2953545
Now the question is what to do about them - the answer provided there wasn't very satisfying.
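If you want to check this explanation against your own proxy, a connection that is opened and closed without sending anything is easy to produce. The Python sketch below does only that; `connect_and_close` is a hypothetical name, and whether your nginx logs the result as a 400 with empty fields is something to verify rather than a guarantee.

```python
import socket


def connect_and_close(host, port, timeout=5.0):
    """Open a TCP connection, send nothing, and close it again."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        return sock.getsockname()  # local (address, port) that was used
```

Calling it against the proxy and then looking for a matching access log entry should show whether idle connections account for the entries you are seeing.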
Are you handling SSL connections? Can you add $ssl_cipher $ssl_protocol to your access log format?
First, it's quite possible that your clients send requests with really big HTTP headers or URLs. Maybe an older version of your application set some (probably big) cookies which are unused now, and some clients are still trying to send them back.
I'd set the header buffers to a really big value and on the application side log the size of the headers/requests and the complete request if they are bigger than usual. Or completely take out the nginx from the chain and log the header/request with the same conditions. If you can, take out the nginx for only those IPs/subnets where the 400 errors came from. I suppose nginx can log the source IP for these 400 errors.
