Scraping: SSL_ERROR_SYSCALL with cURL. Works in Chrome/Firefox

Motivation
I'm currently an exchange student at Taiwan Tech in Taipei, but the course overview/search engine is not very comfortable to use - so I'm trying to scrape it, which unexpectedly leads to a lot of difficulties.
Problem
Opening https://qcourse.ntust.edu.tw works just fine in Chrome and Firefox, but I run into trouble when I try command-line tools:
# Trying to use curl:
$ curl https://qcourse.ntust.edu.tw
curl: (35) OpenSSL SSL_connect: SSL_ERROR_SYSCALL in connection to qcourse.ntust.edu.tw:443
# Trying to use wget:
$ wget https://qcourse.ntust.edu.tw
--2019-02-25 12:13:55-- https://qcourse.ntust.edu.tw/
Loaded CA certificate '/etc/ssl/certs/ca-certificates.crt'
Resolving qcourse.ntust.edu.tw (qcourse.ntust.edu.tw)... 140.118.242.168
Connecting to qcourse.ntust.edu.tw (qcourse.ntust.edu.tw)|140.118.242.168|:443... connected.
GnuTLS: The TLS connection was non-properly terminated.
Unable to establish SSL connection.
I also run into trouble when using the Pale Moon browser.
What I've considered
Maybe there is a problem with the certificate itself?
Seemingly not:
# This uses the same wildcard certificate (*.ntust.edu.tw) as qcourse.ntust.edu.tw
# (I double checked, and the SHA256 fingerprint is identical)
$ curl https://www.ntust.edu.tw
<html><head><meta http-equiv='refresh' content='0; url=bin/home.php'><title>title</title></head></html>%
Maybe I need specific headers that only Chrome/Firefox sends by default?
It seems like this doesn't solve anything either. By opening the request (Network tab) in Chrome, right clicking, and choosing "Copy" > "Copy as cURL", I get the same error message as earlier.
Additional information
The course overview site is written in ASP.NET, and seems to be running on Microsoft IIS httpd 6.0.
I find this quite mysterious and intriguing. I hope someone might be able to offer an explanation of this behaviour, and if possible: a workaround.

As you can see from the SSLLabs report, this server has a terrible setup. It gets a rating of F because it supports the totally broken SSLv2, the mostly broken SSLv3, and many totally broken ciphers. The only reasonably secure way to access this server is TLS 1.0 with TLS_RSA_WITH_3DES_EDE_CBC_SHA (3DES), a cipher that is considered weak rather than outright insecure like the others.
However, because 3DES is considered weak (albeit not insecure), it is disabled by default in most modern TLS stacks, so you need to enable it explicitly. For curl with the OpenSSL backend this looks as follows, provided that the OpenSSL library you use still supports 3DES in the first place (not the case with the default build of OpenSSL 1.1.1):
$ curl -v --cipher '3DES' https://qcourse.ntust.edu.tw
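For scripted scraping, roughly the same idea can be sketched with Python's standard ssl module. This is a minimal sketch under assumptions, not a verified fix for this server: the helper names make_3des_context and fetch_homepage are made up for illustration, DES-CBC3-SHA is OpenSSL's name for TLS_RSA_WITH_3DES_EDE_CBC_SHA, and the @SECLEVEL=0 suffix assumes a local OpenSSL whose default security level would otherwise reject TLS 1.0. As with the curl command, it can only work if the linked OpenSSL still ships 3DES.

```python
import http.client
import ssl
import warnings

CIPHER = "DES-CBC3-SHA"  # OpenSSL name for TLS_RSA_WITH_3DES_EDE_CBC_SHA


def make_3des_context(cipher=CIPHER):
    """Return a TLS 1.0 client context limited to `cipher`, or None if unavailable."""
    ctx = ssl.create_default_context()  # keeps certificate and hostname checks on
    try:
        with warnings.catch_warnings():
            # Newer Python versions warn that TLS 1.0 is deprecated.
            warnings.simplefilter("ignore", DeprecationWarning)
            ctx.minimum_version = ssl.TLSVersion.TLSv1
            ctx.maximum_version = ssl.TLSVersion.TLSv1
        # Assumption: security level 0 is needed so OpenSSL accepts TLS 1.0.
        ctx.set_ciphers(cipher + ":@SECLEVEL=0")
    except (ssl.SSLError, ValueError):
        return None
    # Some OpenSSL builds accept the string but silently drop 3DES.
    if not any(c["name"] == cipher for c in ctx.get_ciphers()):
        return None
    return ctx


def fetch_homepage(host, ctx, timeout=10):
    """Fetch "/" from `host` over HTTPS using the given context."""
    conn = http.client.HTTPSConnection(host, context=ctx, timeout=timeout)
    try:
        conn.request("GET", "/")
        return conn.getresponse().read()
    finally:
        conn.close()
```

If make_3des_context() returns None, the local OpenSSL build has no usable 3DES support and a different TLS library or an older OpenSSL would be needed; otherwise fetch_homepage("qcourse.ntust.edu.tw", ctx) attempts the request.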

Related

How to have curl destroy/reset connection?

I am trying to recreate an intermittent issue I see with our test automation and am using curl (not libcurl) in a loop. But I see in the headers Connection #0 to host storage.googleapis.com left intact in successive requests in my loop. I want the connection to be destroyed/reset every time. The issue I am trying to test is on the TLS handshake and re-using the connection won't help.
I searched man curl for 'destroy' and 'reset' with no results and all the results for my web searches are around others getting connection resets, so it is a bit noisy.
I feel like this might be at the OS level.
How do I have curl reset the connection immediately?

curl doesn't have such an option (while libcurl does), but you can often achieve the same effect by forcing the request to use HTTP/1.0 with the --http1.0 option.
This works because persistent connections were not the default in HTTP/1.0.
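A loop along these lines could drive the reproduction from Python. It is only a sketch: build_curl_command and run_handshake_loop are hypothetical helper names, the extra -sS and -o flags are there just to keep the output quiet, and it assumes a curl binary on the PATH.

```python
import os
import subprocess


def build_curl_command(url):
    """Build a curl invocation that forces HTTP/1.0 and discards the body."""
    return ["curl", "--http1.0", "-sS", "-o", os.devnull, url]


def run_handshake_loop(url, attempts=5):
    """Run curl `attempts` times and return the list of exit codes."""
    codes = []
    for _ in range(attempts):
        # Each iteration starts a new curl process and therefore a new connection.
        codes.append(subprocess.run(build_curl_command(url)).returncode)
    return codes
```

Calling run_handshake_loop with the URL under test returns one exit code per attempt, so an intermittent handshake failure shows up as a non-zero entry (35 is curl's SSL connect error).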

curl error 18, attempting to solve problem using SO answer 1759956

I am trying to follow curl error 18 - transfer closed with outstanding read data remaining.
The top answer is to
...let curl set the length by itself.
I don't know how to do this. I have tried the following:
curl --ignore-content-length http://corpus-db.org/api/author/Dickens,%20Charles/fulltext
However, I still get this error:
curl: (18) transfer closed with outstanding read data remaining

The server simply closes the connection after 30 seconds.
You can try to speed up the client, but if the server does not deliver enough data within that time limit, you get this message even on a fast connection.
In the case of the example http://corpus-db.org/api/author/Dickens,%20Charles/fulltext I got a larger amount of content with direct output:
curl http://corpus-db.org/api/author/Dickens,%20Charles/fulltext
while less arrived when writing to a file (already ~47MB in 30 seconds):
curl -o Dickens,%20Charles http://corpus-db.org/api/author/Dickens,%20Charles/fulltext
Resuming file transfers can be tried, but on the example server it's not supported:
curl -C - -o Dickens,%20Charles http://corpus-db.org/api/author/Dickens,%20Charles/fulltext
curl: (33) HTTP server doesn't seem to support byte ranges. Cannot resume.
So there might be options to optimize the request, or to increase the connection speed or the cache size, but once you have reached the limit and cannot get more data within the time limit, there is nothing more you can do.
The curl manual can be found here: https://curl.haxx.se/docs/manual.html
The following links won't help you but perhaps are interesting:
The repository for the data-server can be found here: https://github.com/JonathanReeve/corpus-db
The documentation for the used web-server can be found here: https://hackage.haskell.org/package/warp-3.2.13

It's a speed issue. The server at corpus-db.org will DISCONNECT YOU if you take longer than 35 seconds to download something, regardless of how much you've already downloaded.
To make matters worse, the server does not support byte-range requests (the Range request header, answered with Content-Range), so you can't download it in chunks and simply resume the download where you left off.
To make matters even worse, not only are range requests not supported, they are SILENTLY IGNORED, which means they seem to work until you actually inspect what you've downloaded.
If you need to download that page over a slower connection, I recommend renting a cheap VPS, setting it up as a mirror of whatever you need to download, and downloading from your mirror instead. Your mirror does not need to have the 35-second limit.
For example, this VPS [1] costs $1.25/month, has a 1 Gbps connection, and would be able to download that page. Rent one of those, install nginx on it, wget the file into nginx's www folder, and download it from your mirror; you'll then have 300 seconds to download it (nginx default timeout) instead of 35 seconds. If 300 seconds is not enough, you can change the timeout to whatever you want.
Or you could even get fancy and set up a caching proxy compatible with curl's --proxy parameter, so your command could become
curl --proxy http://yourserver http://corpus-db.org/api/author/Dickens,%20Charles/fulltext
If someone is interested in an example implementation of this, let me know.
You can't download that page over a 4 Mbit/s connection because the server will kick you out before the download is complete (after 35 seconds), but with a 1000 Mbit/s connection you can download the entire file before the timeout kicks in.
(My home internet connection is 4 Mbit/s and I can't download it from home, but I tried downloading it from a server with a 1000 Mbit/s connection, and that works fine.)
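The arithmetic behind this can be checked in a few lines of Python. This is only a back-of-the-envelope sketch: it assumes the link rate is the sole bottleneck, counts 1 Mbit as 10^6 bits, and uses the 35-second cutoff reported in this answer; max_download_mb is a made-up helper name.

```python
def max_download_mb(link_mbit_per_s, cutoff_s=35):
    """Upper bound, in megabytes, on what fits through the link before the cutoff."""
    return link_mbit_per_s * cutoff_s / 8  # 8 bits per byte


print(max_download_mb(4))     # 17.5
print(max_download_mb(1000))  # 4375.0
```

The other answer saw roughly 47 MB arrive within 30 seconds without the transfer completing, so a ceiling of 17.5 MB at 4 Mbit/s cannot be enough for this file.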
[1] PS: I'm not associated with RamNode in any way, except that I'm a (prior) happy customer of theirs, and I recommend them to anyone looking for cheap, reliable VPSs.

Qt: How to make sure the protocol between server-client is always TLS and never fall to SSL?

I am using Qt5 on Windows 7.
I currently have a server app that uses a QSslSocket to communicate with clients.
It works OK so far, but the customer wants me to use only the TLS protocol and make sure we never fall back to SSL (which he considers not secure enough).
In my code nothing is set explicitly; I only use the defaults offered by the QSslSocket class.
I saw in Qt doc that by default QSslSocket uses TLSv1_0, yet I am not quite sure, because...
On the other hand, the Qt doc says, see QSsl::SecureProtocols: "The default option, using protocols known to be secure; currently behaves similar to TlsV1Ssl3 except denying SSLv3 connections that does not upgrade to TLS."
So, I am a little bit puzzled about this...
Finally, the question is: Would the default (what I have right now) of QSslSocket class guarantee that the connection is TLS encrypted? If not, what should I do in order to be sure the connection always uses TLS protocol?

How to monitor video and https traffic using bro network security monitor

I have configured Bro successfully on my system, which runs CentOS 7. I need to monitor multimedia traffic, e.g. YouTube, and some social sites such as Facebook. I ran Bro for a few minutes while using Facebook and YouTube, but there is no information about either of them in the http log file. I think this is a protocol problem, since Facebook uses HTTPS rather than HTTP, but I do not know why YouTube is missing as well.
I followed these steps after setting the correct interface.
[BroControl] > install
Then
[BroControl] > start
But I have not found any YouTube or Facebook info in http.log. How can I get traffic info for such websites?

The problem is that you are expecting SSL encrypted traffic to be magically decrypted and appear in your http.log. If you look again, you will find that YouTube also runs over HTTPS.
Unless you are doing something to intercept and act as a man-in-the-middle for the SSL/TLS connections, you cannot expect to be able to see the content. If you can't see it, Bro can't see it either. :)
If you want to verify that you are properly configured, your best option is to look at conn.log and check that the connections are occurring. Once you have done that, search for their UID values in the other logs; I strongly suspect you will find SSL certificate data for them.
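The UID lookup can be sketched in Python as follows. This is a minimal sketch, assuming Bro's default tab-separated log format with a #fields header line; the sample rows are shortened to a few columns, the UIDs and addresses in them are invented, and parse_bro_log and server_names_by_uid are hypothetical helper names.

```python
def parse_bro_log(text):
    """Parse a Bro TSV log into a list of dicts keyed by the #fields header."""
    fields, rows = [], []
    for line in text.splitlines():
        if line.startswith("#fields"):
            fields = line.split("\t")[1:]
        elif line and not line.startswith("#"):
            rows.append(dict(zip(fields, line.split("\t"))))
    return rows


def server_names_by_uid(conn_rows, ssl_rows):
    """Map each conn.log UID to the server_name ssl.log recorded for it, if any."""
    names = {row["uid"]: row.get("server_name", "-") for row in ssl_rows}
    return {row["uid"]: names.get(row["uid"]) for row in conn_rows}


# Invented sample data; real logs carry many more columns.
CONN = (
    "#fields\tts\tuid\tid.resp_h\tid.resp_p\n"
    "1551068035.0\tCabc1\t203.0.113.10\t443\n"
    "1551068036.0\tCabc2\t203.0.113.20\t80\n"
)
SSL = (
    "#fields\tts\tuid\tserver_name\n"
    "1551068035.1\tCabc1\twww.youtube.com\n"
)

print(server_names_by_uid(parse_bro_log(CONN), parse_bro_log(SSL)))
# {'Cabc1': 'www.youtube.com', 'Cabc2': None}
```

With real logs, the text would come from reading conn.log and ssl.log in Bro's log directory; a connection whose value is None simply has no matching ssl.log entry.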

Several things come to mind
1) What are the contents of /usr/local/bro/etc/node.cfg? Make sure it is the interface you expect traffic to cross via a span or tap.
2) Run tcpdump -i <interface> where interface comes from question 1.
3) Run /usr/local/bro/bin/broctl diag to see if there are any issues.
4) Run /usr/local/bro/bin/broctl status to verify everything is running.
If the interface is wrong, the solution may be that easy.

Explanation for CONNECT observations using Fiddler for url https://www.fiddler2.com/fiddler2/version.asp

I'm using IE9 beta and Fiddler to understand the HTTPS session negotiation taking place for the above URL (chosen for no particular reason other than that it's secured).
Some observations made me curious.. does anyone understand what's happening here?
1. When I connect with Fiddler setting: HTTPS decrypt OFF, I see this sequence
5 CONNECTs to fiddler2.com with nothing but headers showing
a) Curious, why more than one?
1 CONNECT to beta.urs.microsoft.com
b) Does this have something to do with asking MS which cert it recognises? I thought this data is supposed to be kept locally? Maybe that only happened because I'm using a beta of IE9?
4 CONNECTs to fiddler2.com with the same SessionID but different Random and the list of ciphers available on the client.
1 CONNECT to beta.urs.microsoft.com with similar content to above 4
c) Why the multiple CONNECTs here with different Random?
2. When I connect with Fiddler setting: HTTPS decrypt ON, I see this sequence
5 CONNECTs to fiddler2.com with nothing but headers in the request only and the response shows a certificate and the chosen cipher. Same in all 5.
a) same question
1 GET with the page contents
d) what happened to the extra CONNECTs this time?
I'm trying to relate what I see here to the negotiation between client and server as it's documented here.
Transport Layer Security
Tyia,
Mick.

You didn't mention what browser you're using and what ciphers you have enabled in that browser.
Sometimes, you'll see multiple CONNECT handshakes because the server immediately closes the connection (ungracefully stating that they don't support the requested protocol version) and the client will retry (fallback) to an older protocol version. You definitely see this happen a lot if you enable TLSv1.1 and TLSv1.2 in IE, for instance.
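The retry pattern can be illustrated with a toy model in Python. This is not IE's actual logic, only a sketch of the fallback idea: connect_with_fallback, the version list, and the old_server stand-in are all hypothetical, and every attempt is assumed to open a new connection (and thus a new CONNECT through the proxy).

```python
def connect_with_fallback(try_handshake,
                          versions=("TLSv1.2", "TLSv1.1", "TLSv1.0", "SSLv3")):
    """Try each protocol version in order; return (version, attempts) on first success."""
    attempts = 0
    for version in versions:
        attempts += 1  # each attempt is a fresh connection, i.e. another CONNECT
        if try_handshake(version):
            return version, attempts
    return None, attempts


def old_server(version):
    """Hypothetical server that drops any handshake newer than TLS 1.0."""
    return version in ("TLSv1.0", "SSLv3")


print(connect_with_fallback(old_server))  # ('TLSv1.0', 3)
```

In this model a single page load against the old server produces three connection attempts, which a proxy like Fiddler would list as three separate CONNECTs.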
You also may see multiple CONNECTs if the client aborts a connection and then attempts to open a new one.
urs.microsoft.com and beta.urs.microsoft.com are used for the SmartScreen site-reputation feature.
