add curl variable url to output file along with data - unix

Is there a way to add the URL used by curl to the output file? I have a URL string with variable parts, and for each record set that is found I want to include the URL in the output file. The URL pattern I am using is http://history/[1980-2012]/[1-12]/[1-31].hist.htm

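If you add -i to your command, the HTTP response headers are written to the output file as well. When the server answers with a redirect, the "Location:" header on the 2nd line shows the target URL; a non-redirect response has no such header.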
For example, this command:
curl -i -o test.html http://google.com
Generates this output in a file:
HTTP/1.1 301 Moved Permanently
Location: http://www.google.com/
Content-Type: text/html; charset=UTF-8
Date: Wed, 07 Mar 2012 21:00:39 GMT
Expires: Fri, 06 Apr 2012 21:00:39 GMT
Cache-Control: public, max-age=2592000
Server: gws
Content-Length: 219
X-XSS-Protection: 1; mode=block
X-Frame-Options: SAMEORIGIN
<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>301 Moved</TITLE></HEAD><BODY>
<H1>301 Moved</H1>
The document has moved
here.
</BODY></HTML>
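Because "Location:" only appears on redirects, another way to get the requested URL into the file is to write it yourself before each transfer. The following is a minimal sketch, not the answerer's method: the function name `save_with_url`, the "URL:" marker line, and the file name `all.txt` are assumptions made for illustration.

```shell
# Append one record to an output file: a marker line holding the URL,
# followed by the response body that curl returns for that URL.
save_with_url() {
    url=$1
    outfile=$2
    {
        printf 'URL: %s\n' "$url"
        curl -s "$url"
    } >> "$outfile"
}

# Possible loop over the ranges from the question (left commented out):
# for y in $(seq 1980 2012); do
#   for m in $(seq 1 12); do
#     for d in $(seq 1 31); do
#       save_with_url "http://history/$y/$m/$d.hist.htm" all.txt
#     done
#   done
# done
```

The shell loop stands in for curl's own [1980-2012] range syntax so that each URL is available as a variable at the moment its record is written.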

Related

Telnet on Linux - can I set Transfer-Encoding: chunked as an option of GET?

I want to get a chunked response from the server, but I can't work out what I did wrong in this terminal session:
telnet www.google.com 80
Trying 172.217.20.36...
Connected to www.google.com.
Escape character is '^]'.
GET / HTTP/1.1
Transfer-Encoding: chunked
HTTP/1.1 302 Found
Location: http://www.google.com/sorry/index?continue=http://www.google.com/&q=EgRVjAHRGPGutfEFIhkA8aeDS1HQdWrbrx7jkGSfPgX8M5Ou6VMLMgFy
Date: Sun, 26 Jan 2020 09:10:10 GMT
Pragma: no-cache
Expires: Fri, 01 Jan 1990 00:00:00 GMT
Cache-Control: no-store, no-cache, must-revalidate
Content-Type: text/html; charset=UTF-8
Server: HTTP server (unknown)
Content-Length: 325
X-XSS-Protection: 0
Connection: close
<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>302 Moved</TITLE></HEAD><BODY>
<H1>302 Moved</H1>
The document has moved
here.
</BODY></HTML>
I set this header as an option:
Transfer-Encoding: chunked
But I still get Content-Length: 325, and the content is not chunked.
How can I make a GET request with chunked encoding using telnet?
1) You can't force a server to use chunked encoding in the response.
2) Setting "Transfer-Encoding" on the request declares that the request body you are sending is chunked; it says nothing about how the response should be encoded.
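To make point 2 concrete, here is a rough sketch of what a chunked request body looks like on the wire: each chunk is its size in hexadecimal, CRLF, the data, CRLF, and a zero-length chunk ends the body. The function name `chunked_body` is hypothetical, and the sketch assumes ASCII arguments so that character count equals byte count.

```shell
# Emit each non-empty argument as one HTTP/1.1 chunk, then the
# terminating zero-length chunk. Assumes ASCII input (1 char = 1 byte).
chunked_body() {
    for chunk in "$@"; do
        # An empty chunk would be read as the end-of-body marker, so skip it.
        [ -n "$chunk" ] || continue
        printf '%x\r\n%s\r\n' "${#chunk}" "$chunk"
    done
    printf '0\r\n\r\n'
}
```

A request that declares Transfer-Encoding: chunked would carry a body in this form after the blank line that ends its headers.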

Odd cookie set by WordPress installed in a sub directory

I want to install and configure my WordPress site under /journal, like:
https://example.com/journal/
After installation, when I try to access /wp-admin, I am told that cookies have not been configured in my browser, and I fail to log in. When I run curl I get:
$ curl -I localhost/journal/wp-login.php
HTTP/1.1 200 OK
Date: Tue, 13 Feb 2018 12:02:28 GMT
Server: Apache/2.4.6 (Amazon Linux 2) PHP/7.2.0
X-Powered-By: PHP/7.2.0
Expires: Wed, 11 Jan 1984 05:00:00 GMT
Cache-Control: no-cache, must-revalidate, max-age=0
Set-Cookie: wordpress_test_cookie=WP+Cookie+check; path=/journal/journal/; secure
X-Frame-Options: SAMEORIGIN
Content-Type: text/html; charset=UTF-8
I suppose the cookie path being /journal/journal/ is the reason I can't log in properly. What kind of additional configuration is needed to set my cookies properly?
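One way to check the suspected path after each configuration change is to pull the path attribute out of the Set-Cookie headers. This is only a diagnostic sketch; the function name `cookie_paths` is hypothetical and it expects raw response headers on standard input.

```shell
# Read raw HTTP response headers on stdin and print the value of every
# "path=" attribute found in Set-Cookie headers, one per line.
cookie_paths() {
    grep -i '^set-cookie:' \
        | tr ';' '\n' \
        | sed -n 's/^[[:space:]]*[Pp]ath=//p' \
        | tr -d '\r'
}

# Possible usage against the login page from the question:
# curl -sI localhost/journal/wp-login.php | cookie_paths
```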

cURL response missing Location field

I am curling to a server that I own and getting a response. But this response is missing a Location field. Can anyone help me figure out why this is happening?
This is the curl:
curl -I http://example.server.net/
And this is the response I get:
HTTP/1.1 200 OK
Date: Fri, 18 Jul 2014 15:08:53 GMT
Server: Apache/2.2.22 (Ubuntu)
Content-Length: 15
Vary: Accept-Encoding
Content-Type:
As you can see there is no Location field that responds with something like:
Location: http://example.server.net
I know it is a valid field because I have seen it in other curl responses. I am looking for reasons why some basic fields might not be shown, and for possible solutions.
Thanks!
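You will get a Location header only when the response is a redirect (status code 3xx) or, less commonly, a 201 Created. A 200 OK response normally has none.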
For example:
$ curl -I http://g.cn
HTTP/1.1 301 Moved Permanently
Location: http://www.google.cn/
Date: Fri, 18 Jul 2014 15:29:10 GMT
Expires: Fri, 18 Jul 2014 15:29:10 GMT
Cache-Control: private, max-age=2592000
Content-Type: text/html; charset=UTF-8
Server: gws
Content-Length: 218
X-XSS-Protection: 1; mode=block
X-Frame-Options: SAMEORIGIN
Alternate-Protocol: 80:quic,80:quic
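The relationship between the status code and the Location header can be checked with a small filter over the header dump. This is a sketch under assumptions: the function name `status_and_location` is made up for illustration, and it expects the headers of a single response on standard input (so not the output of curl -L, which prints several).

```shell
# Read the raw headers of one HTTP response on stdin and print the status
# code followed by the Location value, or a note when no Location is present.
status_and_location() {
    tr -d '\r' | awk '
        NR == 1 { code = $2 }
        tolower($1) == "location:" { loc = $2 }
        END { print code, (loc == "" ? "(no Location header)" : loc) }
    '
}

# Possible usage:
# curl -sI http://example.server.net/ | status_and_location
```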

IIS "Object moved to" when user-agent not included

I run this curl command:
curl -i "http://www.takeoffvideo.com/recurly"
And I get
HTTP/1.1 302 Found
Location: /
Server: Microsoft-IIS/7.5
Set-Cookie: ASP.NET_SessionId=m0fmpkgsgxhblhmzgdyj1ttv; path=/; HttpOnly
Set-Cookie: 51D=634601722298776937; expires=Fri, 31-Dec-9999 23:59:59 GMT; path=/
Date: Thu, 22 Dec 2011 17:17:09 GMT
Content-Length: 118
<html><head><title>Object moved</title></head><body>
<h2>Object moved to here.</h2>
</body></html>
Then if I add the user agent header, it works:
curl -i -A "" "http://www.takeoffvideo.com/recurly"
Does anyone know the reason for this? It's an ASP.NET MVC app on .NET 4 with everything up to date.

How to crawl a wordpress blog?

I wrote a C program to crawl blogs. It worked well until it met this blog: www.ipujia.com. I send the HTTP request:
GET http://www.ipujia.com/ HTTP/1.0
to the website and get the response as below:
HTTP/1.1 301 Moved Permanently
Date: Sun, 27 Feb 2011 13:15:26 GMT
Server: Apache/2.2.16 (Unix) mod_ssl/2.2.16 OpenSSL/0.9.8e-fips-rhel5 mod_auth_passthrough/2.1 mod_bwlimited/1.4 FrontPage/5.0.2.2635 mod_perl/2.0.4 Perl/v5.8.8
X-Powered-By: PHP/5.2.14
Expires: Wed, 11 Jan 1984 05:00:00 GMT
Cache-Control: no-cache, must-revalidate, max-age=0
Pragma: no-cache
Last-Modified: Sun, 27 Feb 2011 13:15:27 GMT
Location: http://http/www.ipujia.com/
Content-Length: 0
Connection: close
Content-Type: text/html; charset=UTF-8
This is strange because I cannot get the index page by following the Location. Does anyone have any ideas?
The Location field in the response contains a malformed URI:
Location: http://http/www.ipujia.com/ (notice the duplicated scheme)
It should be:
Location: http://www.ipujia.com/
Unless you are in control of the server, there is little you can do about the cause.
As a workaround, could you parse the "Location" value and attempt to extract a valid URI from it?
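The parsing workaround could look roughly like the sketch below. It is written in shell for brevity rather than in the asker's C, the function name `fix_location` is hypothetical, and it handles only the specific "http://http/" malformation seen here, leaving every other value untouched.

```shell
# Repair a Location value whose scheme was duplicated as "http://http/".
# Any other value is printed unchanged.
fix_location() {
    printf '%s\n' "$1" | sed 's#^http://http/#http://#'
}

# Possible usage:
# fix_location 'http://http/www.ipujia.com/'
```

A crawler would apply such a repair only after the original Location fails to resolve, since the heuristic is a guess about what the server meant.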
