Why does curl repeat headers in the output? - unix

Options I used:
-I, --head
(HTTP/FTP/FILE) Fetch the HTTP-header only! HTTP-servers feature
the command HEAD which this uses to get nothing but the header
of a document. When used on an FTP or FILE file, curl displays
the file size and last modification time only.
-L, --location
(HTTP/HTTPS) If the server reports that the requested page has moved to a different location (indicated
with a Location: header and a 3XX response code), this option will make curl redo the request
on the new place. If used together with -i, --include or -I, --head, headers from all requested
pages will be shown. When authentication is used, curl only sends its credentials to the initial
host. If a redirect takes curl to a different host, it won't be able to intercept the user+password.
See also --location-trusted on how to change this. You can limit the amount of redirects to follow
by using the --max-redirs option.
When curl follows a redirect and the request is not a plain GET (for example POST or PUT), it will
do the following request with a GET if the HTTP response was 301, 302, or 303. If the response code
was any other 3xx code, curl will re-send the following request using the same unmodified method.
You can tell curl to not change the non-GET request method to GET after a 30x response by using the
dedicated options for that: --post301, --post302 and --post303.
-v, --verbose
Be more verbose/talkative during the operation. Useful for debugging and seeing what's going on
"under the hood". A line starting with '>' means "header data" sent by curl, '<' means "header data"
received by curl that is hidden in normal cases, and a line starting with '*' means additional info
provided by curl.
Note that if you only want HTTP headers in the output, -i, --include might be the option you're
looking for.
If you think this option still doesn't give you enough details, consider using --trace or
--trace-ascii instead.
This option overrides previous uses of --trace-ascii or --trace.
Use -s, --silent to make curl quiet.
Below is the output that I'm wondering about. In the response containing the redirect (301), all the headers are displayed twice, but only one of each pair has the < in front of it. How am I supposed to interpret that?
$ curl -ILv http://www.mail.com
* Rebuilt URL to: http://www.mail.com/
* Trying 74.208.122.4...
* Connected to www.mail.com (74.208.122.4) port 80 (#0)
> HEAD / HTTP/1.1
> Host: www.mail.com
> User-Agent: curl/7.43.0
> Accept: */*
>
< HTTP/1.1 301 Moved Permanently
HTTP/1.1 301 Moved Permanently
< Date: Sun, 28 May 2017 22:02:16 GMT
Date: Sun, 28 May 2017 22:02:16 GMT
< Server: Apache
Server: Apache
< Location: https://www.mail.com/
Location: https://www.mail.com/
< Vary: Accept-Encoding
Vary: Accept-Encoding
< Connection: close
Connection: close
< Content-Type: text/html; charset=iso-8859-1
Content-Type: text/html; charset=iso-8859-1
<
* Closing connection 0
* Issue another request to this URL: 'https://www.mail.com/'
* Trying 74.208.122.4...
* Connected to www.mail.com (74.208.122.4) port 443 (#1)
* TLS 1.2 connection using TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384
* Server certificate: *.mail.com
* Server certificate: thawte SSL CA - G2
* Server certificate: thawte Primary Root CA
> HEAD / HTTP/1.1
> Host: www.mail.com
> User-Agent: curl/7.43.0
> Accept: */*
>
< HTTP/1.1 200 OK
HTTP/1.1 200 OK
< Date: Sun, 28 May 2017 22:02:16 GMT
Date: Sun, 28 May 2017 22:02:16 GMT
< Server: Apache
Server: Apache
< Vary: X-Forwarded-Proto,Host,Accept-Encoding
Vary: X-Forwarded-Proto,Host,Accept-Encoding
< Set-Cookie: cookieKID=kid%40autoref%40mail.com; Domain=.mail.com; Expires=Tue, 27-Jun-2017 22:02:16 GMT; Path=/
Set-Cookie: cookieKID=kid%40autoref%40mail.com; Domain=.mail.com; Expires=Tue, 27-Jun-2017 22:02:16 GMT; Path=/
< Set-Cookie: cookiePartner=kid%40autoref%40mail.com; Domain=.mail.com; Expires=Tue, 27-Jun-2017 22:02:16 GMT; Path=/
Set-Cookie: cookiePartner=kid%40autoref%40mail.com; Domain=.mail.com; Expires=Tue, 27-Jun-2017 22:02:16 GMT; Path=/
< Cache-Control: no-cache, no-store, must-revalidate
Cache-Control: no-cache, no-store, must-revalidate
< Pragma: no-cache
Pragma: no-cache
< Expires: Thu, 01 Jan 1970 00:00:00 GMT
Expires: Thu, 01 Jan 1970 00:00:00 GMT
< Set-Cookie: JSESSIONID=F0BEF03C92839D69057FFB57C7FAA789; Path=/mailcom-webapp/; HttpOnly
Set-Cookie: JSESSIONID=F0BEF03C92839D69057FFB57C7FAA789; Path=/mailcom-webapp/; HttpOnly
< Content-Language: en-US
Content-Language: en-US
< Content-Length: 85237
Content-Length: 85237
< Connection: close
Connection: close
< Content-Type: text/html;charset=UTF-8
Content-Type: text/html;charset=UTF-8
<
* Closing connection 1

Best guess: with -v you tell curl to be verbose and send debug info to stderr; with -I you tell curl to dump the headers to stdout. Both streams go to your terminal by default, so the output is interleaved and each header shows up twice. Separate stdout and stderr and you'll avoid the confusion.
curl -ILv http://www.mail.com >stdout.log 2>stderr.log ; cat stdout.log
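To see the effect without touching the network, here is a minimal sketch. The function fake_curl is hypothetical and not part of curl; it only imitates how curl -Iv writes each header once to stdout and once, prefixed with "< ", to stderr.

```shell
# fake_curl is a hypothetical stand-in for `curl -Iv`: each header goes
# to stderr with a "< " prefix (the -v output) and to stdout as-is
# (the -I output).
fake_curl() {
    for header in "HTTP/1.1 200 OK" "Server: Apache"; do
        printf '< %s\n' "$header" >&2
        printf '%s\n' "$header"
    done
}

# Both streams merged, as on a terminal: every header appears twice.
both=$(fake_curl 2>&1)

# Discarding stderr leaves only the plain -I headers.
stdout_only=$(fake_curl 2>/dev/null)

# Sending stderr to the capture and stdout to /dev/null leaves only the
# verbose "< " lines.
stderr_only=$(fake_curl 2>&1 >/dev/null)

printf '%s\n' "$stdout_only"
```

With the real curl, the same redirections should apply: 2>/dev/null keeps the -I headers and 2>&1 >/dev/null keeps the -v lines.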

Use:
curl -ILv http://www.mail.com 2>&1 | grep '^[<>*]'
When curl is called with the verbose flag, it sends the verbose output to stderr instead of stdout. The command above redirects stderr to stdout (2>&1), then pipes the combined output to grep, which keeps only the lines that begin with *, < or >. All other lines (including the duplicates you were concerned about) are dropped.
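The filter can be tried offline on a few lines copied from the question's output. This is only a sketch: the sample text is abbreviated, and the pattern is a shortened form of the one above (inside a bracket expression "*" is literal, so it needs no backslash).

```shell
# A few lines of combined `curl -ILv` output (stdout and stderr
# interleaved), abbreviated from the question.
sample='> HEAD / HTTP/1.1
< HTTP/1.1 301 Moved Permanently
HTTP/1.1 301 Moved Permanently
< Server: Apache
Server: Apache
* Closing connection 0'

# Keep only the lines that start with "<", ">" or "*", i.e. the
# verbose lines; the unprefixed duplicates are dropped.
filtered=$(printf '%s\n' "$sample" | grep '^[<>*]')

printf '%s\n' "$filtered"
```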

Related

Magnolia: Range request doesn't serve content when the cache filter is enabled, so Facebook sharing doesn't work

When sending an HTTP request with a Range header to Magnolia I get a Response with
Content-Length: 0:
curl -I -X GET \
http://localhost:8080/ \
-H 'Accept-Encoding: gzip, deflate' \
-H 'Cache-Control: no-cache' \
-H 'Range: bytes=0-2000'
HTTP/1.1 206
Set-Cookie: SID=C36D961EC92D152724BBCD0C34EC6536; Path=/; HttpOnly
X-Magnolia-Registration: Registered
Accept-Ranges: bytes
Cache-Control: no-cache, no-store, must-revalidate, max-age=0
ETag: 8B4901E7DD862E5E74287A0F538DCDDFEB78DE77
Content-Range: bytes 0-2000/23529
Content-Encoding: gzip
Vary: Accept-Encoding
Pragma: no-cache
Expires: Thu, 01 Jan 1970 00:00:00 GMT
Last-Modified: Thu, 19 Dec 2019 08:52:49 GMT
Content-Type: text/html;charset=UTF-8
Content-Length: 0
Date: Thu, 19 Dec 2019 08:52:49 GMT
However, when I disable the Magnolia Cache Module I get the expected response:
/server/filters/cache -> enabled: false
curl -I -X GET \
http://localhost:8080/ \
-H 'Accept-Encoding: gzip, deflate' \
-H 'Cache-Control: no-cache' \
-H 'Range: bytes=0-2000'
HTTP/1.1 206
Set-Cookie: SID=FF557EC1F0653E5CBD81A57D599091AE; Path=/; HttpOnly
X-Magnolia-Registration: Registered
Accept-Ranges: bytes
ETag: 2A9DE4F4B2ACDDE22BAC3C07784CD65693574B67
Content-Range: bytes 0-2000/2147483647
Content-Type: text/html;charset=UTF-8
Content-Length: 2001
Date: Thu, 19 Dec 2019 08:51:49 GMT
My problem is that the Facebook crawler can't detect any Open Graph meta tags when it crawls my website. I think the cause is the problem described above, since the Facebook crawler sends Range requests to Magnolia.
My Open Graph tags are set properly (they work with opengraphcheck and the Twitter Card Validator).
I'm using Magnolia 5.7.1.
The simplest workaround is to configure a request header voter to bypass the cache when a Range header is present.
See RequestHeaderPatternSimpleVoter and/or RequestHeaderPatternRegexVoter for more details on how to set it up, but I would still consider this a workaround and not a final solution.
It seems weird that this should be happening at all. Could you reproduce it against e.g. https://demo.magnolia-cms.com?
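As a side check on the numbers in the question, the body size that a Content-Range header implies can be computed and compared with the reported Content-Length. This is a rough sketch; expected_length is a hypothetical helper name, and it assumes the simple "bytes FIRST-LAST/TOTAL" form seen in both responses.

```shell
# expected_length prints the body size implied by a Content-Range value
# of the form "bytes FIRST-LAST/TOTAL", i.e. LAST - FIRST + 1.
# Fields are split on space, "/" and "-": bytes, FIRST, LAST, TOTAL.
expected_length() {
    printf '%s\n' "$1" | awk -F'[ /-]' '{ print $3 - $2 + 1 }'
}

# Values copied from the two responses in the question.
expected_length 'bytes 0-2000/23529'        # cache filter enabled
expected_length 'bytes 0-2000/2147483647'   # cache filter disabled
```

Both calls print 2001. That matches the Content-Length of the response with the cache disabled, while the cached response announces the same range but reports Content-Length: 0.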

Icecast header response is both 400 and 200

I have Icecast 2.4.4 running on a Windows box at sub.domain.org. My website is on a different server at domain.org.
When I SSH into my Linux host shell and run curl to the mount point I get a response of 400, but if I do wget I get a response of 200. How can this be?
# wget https://sub.domain.org/live.mp3
--2018-12-19 17:52:58-- https://sub.domain.org/live.mp3
Resolving sub.domain.org... 111.111.111.111
Connecting to sub.domain.org|111.111.111.111|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [audio/mpeg]
Saving to: `live.mp3'
[ <=> ] 96,600 3.93K/s ^C
# curl --head https://sub.domain.org/live.mp3
HTTP/1.0 400 Bad Request
Server: Icecast 2.4.4
Connection: Close
Date: Thu, 20 Dec 2018 00:53:32 GMT
Content-Type: text/html; charset=utf-8
Cache-Control: no-cache, no-store
Expires: Mon, 26 Jul 1997 05:00:00 GMT
Pragma: no-cache
Access-Control-Allow-Origin: *
That's because you are passing the --head parameter to curl. This tells curl to make an HTTP HEAD request instead of the HTTP GET request that wget performs.
Icecast does not support HTTP HEAD requests, so the HTTP 400 response is fully justified.
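To compare like with like, curl can be asked for the status code of a GET request instead. The following is a sketch under assumptions: get_status is a hypothetical helper name, and the 5-second limit is an arbitrary choice so that a never-ending stream does not block the shell.

```shell
# get_status prints only the HTTP status code of a GET request to $1.
#   -s            no progress meter
#   -o /dev/null  discard the body (a live stream never ends)
#   --max-time 5  stop after 5 seconds (arbitrary limit for this sketch)
#   -w ...        print the status code when the transfer stops
get_status() {
    curl -s -o /dev/null --max-time 5 -w '%{http_code}\n' "$1"
}

# Usage, with the placeholder host from the question:
#   get_status https://sub.domain.org/live.mp3
```

For a live stream, curl will most likely exit with a timeout error once the limit is hit, but the status code of the response it received should still be printed.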

Is it a bug to send a gzip-compressed response to clients that don't specify Accept-Encoding: gzip?

Is it a bug in the server if it sends gzip-compressed content to clients that did not specify Accept-Encoding: gzip? Does it break the HTTP spec, or is it legal?
I'm curious because https://www.amazon.com always sends gzip-compressed content, regardless of the Accept-Encoding header. A simple test to confirm:
$ curl https://www.amazon.com
Warning: Binary output can mess up your terminal. Use "--output -" to tell
Warning: curl to output it to your terminal anyway, or consider "--output
Warning: <FILE>" to save to a file.
$ curl https://www.amazon.com -I
HTTP/2 405
content-type: text/html; charset=UTF-8
server: Server
date: Sat, 03 Nov 2018 11:27:35 GMT
set-cookie: skin=noskin; path=/; domain=.amazon.com
strict-transport-security: max-age=47474747; includeSubDomains; preload
x-amz-id-1: 2M3HZHHA9J21D3MTHH4K
allow: POST, GET
vary: Accept-Encoding,User-Agent,X-Amazon-CDN-Cache
content-encoding: gzip
x-amz-rid: 2M3HZHHA9J21D3MTHH4K
x-frame-options: SAMEORIGIN
x-cache: Error from cloudfront
via: 1.1 1cc4305a3ce000ca199328864ca1c98e.cloudfront.net (CloudFront)
x-amz-cf-id: OKz61IdKmCBfC97pPg-zmDhQnJzK3THXL2iYwegU5EtDaRf6yjBGzw==
curl complains that it's receiving binary data here because the server is not responding with plain HTML but with gzip-compressed HTML, which is binary data. To actually see the HTML, add the --compressed argument, which tells curl to send an Accept-Encoding header listing the encodings it supports (such as gzip and deflate) and to decompress the response automatically.
A request without an Accept-Encoding header field implies that the user agent has no preferences regarding content-codings. Although this allows the server to use any content-coding in a response, it does not imply that the user agent will be able to correctly process all encodings.
-- https://greenbytes.de/tech/webdav/rfc7231.html#rfc.section.5.3.4.p.4
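Whether a saved response body really is gzip data can be checked offline by looking at its first two bytes, which are 1f 8b for gzip. A minimal sketch follows; is_gzip is a hypothetical helper name, and the "hello" payload only stands in for a real response body.

```shell
# is_gzip succeeds when the data on stdin starts with the gzip magic
# bytes 1f 8b. od prints the first two bytes in hex, tr strips the
# spaces and the newline.
is_gzip() {
    [ "$(od -An -tx1 -N2 | tr -d ' \n')" = "1f8b" ]
}

# A compressed payload is detected ...
if printf 'hello' | gzip -c | is_gzip; then
    echo "body is gzip-compressed"
fi

# ... and decompressing it gives back the original text.
roundtrip=$(printf 'hello' | gzip -c | gzip -dc)
printf '%s\n' "$roundtrip"
```

The same check could be applied to a body saved with curl's --output option.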

How to successfully resend a POST request and get the correct response without knowing the actual payload

I've been working on capturing multiple POST requests from an Android app for testing purposes.
Unfortunately, I'm stuck trying to find the actual payload of the request when I use a request sender to resend it. I get a 200 status code, but the response is wrong and not what I expected. I'd appreciate any advice on whether this is possible.
The request is sent via the POST method.
The request address looks like this (from my perspective it doesn't have a body, does it?):
http://proxy.ABC.ABC.com/ABC/qryunreadmsgcount.do?d=2&m=1&t=803514
Please correct me if the description or the title needs further editing.
Cheers
=========================================================================
Edit:
This is the response that I got (the returndesc text translates to "required parameter [clientrequest]"):
Preview: {
"respbase": {
"status": "false",
"returncode": "000002",
"returndesc": "必填参数[clientrequest]"
}
}
Server: nginx/1.14.0
Date: Thu, 19 Jul 2018 01:44:38 GMT
Content-Type: application/json;charset=UTF-8
Content-Length: 4
Connection: keep-alive
Access-Control-Allow-Origin: *
Access-Control-Allow-Headers: DNT,X-Mx-ReqToken,Keep-Alive,User-Agent,X-Requested-With,If-Modified-Since,Cache-Control,Content-Type, Accept-Language, Origin, Accept-Encoding
X-Frame-Options: SAMEORIGIN
* Preparing request to http://proxy.ABC.ABC.com/vboxserver/qryunreadmsgcount.do?d=2&m=1&t=803514
* Using libcurl/7.54.0 LibreSSL/2.0.20 zlib/1.2.11 nghttp2/1.24.0
* Enable automatic URL encoding
* Enable SSL validation
* Enable cookie sending with jar of 7 cookies
* Trying 101.XXX.XXX.XXX...
* TCP_NODELAY set
* Connected to proxy.ABC.ABC.com (101.xxx.xxx.xxx) port 80 (#75)
> POST /ABC/qryunreadmsgcount.do?d=2&m=1&t=803514 HTTP/1.1
> Host: proxy.ABC.ABC.com
> User-Agent: insomnia/5.16.6
> Accept: */*
> Content-Length: 0
< HTTP/1.1 200 OK
< Server: nginx/1.14.0
< Date: Thu, 19 Jul 2018 02:13:24 GMT
< Content-Type: text/plain
< Content-Length: 96
< Connection: keep-alive
< X-Frame-Options: SAMEORIGIN
* Received 88 B chunk
* Connection #75 to host proxy.ABC.ABC.com left intact
And this is the request sender I've been using:
For those who might want to know the answer:
Burp Suite is very handy for dealing with this :)

Getting strange http response codes, but the site is actually working

When I open either of the URLs below in a browser, the page is displayed fine. I don't see anything unusual in the network tab when I press F12 in the browser, but with the code below I get response code 403 or 400. When I use the response code checker at http://httpstatus.io/, both URLs come back fine with a 200 response.
I get a 403 for http://psychsignal.com/ using my code below.
URL u = new URL("http://www.nasdaqomxnordic.com/"); //returns 400 response code
//u.toURI(); //to check the syntax
HttpURLConnection huc = (HttpURLConnection)u.openConnection();
huc.setRequestMethod("GET");
//huc.setRequestMethod("HEAD");
huc.connect();
System.out.println(huc.getResponseCode());
Thanks if anyone has any ideas! This is actually my first post!
My guess is that there are some restrictions placed on the User-Agent of the client. Some testing seems to support my theory:
If I use the curl default user agent:
# curl -I -H "User-Agent: curl/7.35.0" "http://www.nasdaqomxnordic.com/"
HTTP/1.1 400 Bad Request
Content-Type: text/html; charset=UTF-8
Cache-Control: no-cache
Pragma: no-cache
Expires: 0
Connection: close
If I use a hacked up standard browser agent string:
# curl -I -H "User-Agent: Mozilla/5.0" -0 "http://www.nasdaqomxnordic.com/"
HTTP/1.1 200 OK
Cache-Control: no-cache
Pragma: no-cache
Content-Length: 0
Content-Type: text/html;charset=UTF-8
Expires: Thu, 01 Jan 1970 00:00:00 GMT
Server: Microsoft-IIS/7.5
X-Powered-By: ASP.NET
Date: Wed, 22 Jul 2015 15:06:22 GMT
Connection: close
And then if I use a Java agent string (which is my guess as to what you're using):
# curl -I -H "User-Agent: Java/1.6.0_26" "http://www.nasdaqomxnordic.com/"
HTTP/1.1 400 Bad Request
Content-Type: text/html; charset=UTF-8
Cache-Control: no-cache
Pragma: no-cache
Expires: 0
Connection: close
Only the "browser" user agent gets through. I'd try tweaking your code to set the user agent string to something commonly found in a web browser.
