Curl to grab remote filename after following location - unix

When downloading a file using curl, how would I follow a link location and use that for the output filename (without knowing the remote filename in advance)?
For example, if one clicks the link below, you would download a file named "pythoncomplete.vim". However, using curl's -O and -L options, the filename is simply the original remote name, a clumsy "download_script.php?src_id=10872".
curl -O -L http://www.vim.org/scripts/download_script.php?src_id=10872
In order to download the file with the correct filename you would have to know the name of the file in advance:
curl -o pythoncomplete.vim -L http://www.vim.org/scripts/download_script.php?src_id=10872
It would be excellent if you could download the file without knowing the name in advance, and if not, is there another way to quickly pull down a redirected file via command line?

The remote side sends the filename using the Content-Disposition header.
curl 7.21.2 or newer does this automatically if you specify --remote-header-name / -J.
curl -O -J -L $url
The expanded version of the arguments would be:
curl --remote-name --remote-header-name --location $url

If you have a recent version of curl (7.21.2 or later), see jmanning2k's answer.
If you have an older version of curl (like 7.19.7, which came with Snow Leopard), do two requests: a HEAD to get the file name from the response headers, then a GET:
url="http://www.vim.org/scripts/download_script.php?src_id=10872"
filename=$(curl -sI $url | grep -o -E 'filename=.*$' | sed -e 's/filename=//')
curl -o $filename -L $url

If you can use wget instead of curl:
wget --content-disposition $url

(This is meant as a comment on jmanning2k's answer, but I don't yet have the reputation to comment.)
This seems to only work if the header looks like filename=pythoncomplete.vim, as in the example. Some sites send a header of the form filename*=UTF-8''filename.zip, and that form isn't recognized by curl 7.28.0.
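If you are stuck on an older curl and hit that RFC 5987-style header, one possible workaround (a rough sketch, not from the original answer; the sed patterns and the "download" fallback name are my own assumptions, and the extracted value may still be percent-encoded) is to parse both forms yourself:
url="http://www.vim.org/scripts/download_script.php?src_id=10872"
header="$(curl -sIL "$url" | tr -d '\r')"
# Plain form: filename=name or filename="name"
filename="$(echo "$header" | sed -n 's/.*filename="\{0,1\}\([^";]*\).*/\1/p')"
# RFC 5987 form: filename*=UTF-8''name
[ -n "$filename" ] || filename="$(echo "$header" | sed -n "s/.*filename\*=[^']*''//p")"
curl -o "${filename:-download}" -L "$url"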

I wanted a solution that worked on both older and newer Macs, and the legacy code David provided for Snow Leopard did not behave well under Mavericks. Here's a function I created based on David's code:
function getUriFilename() {
    # Fetch only the response headers and strip carriage returns.
    header="$(curl -sI "$1" | tr -d '\r')"
    # Prefer the name from a Content-Disposition header, if present.
    filename="$(echo "$header" | grep -o -E 'filename=.*$')"
    if [[ -n "$filename" ]]; then
        echo "${filename#filename=}"
        return
    fi
    # Otherwise fall back to the last path component of a Location header.
    filename="$(echo "$header" | grep -o -E 'Location:.*$')"
    if [[ -n "$filename" ]]; then
        basename "${filename#Location:}"
        return
    fi
    return 1
}
With this defined, you can run:
url="http://www.vim.org/scripts/download_script.php?src_id=10872"
filename="$(getUriFilename $url)"
curl -L $url -o "$filename"

Please note that certain misconfigured web servers will send the name using "Filename" as the key, where RFC 2183 specifies it should be "filename". curl only handles the latter case.
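If you have to cope with such a server in the header-parsing approach above, a case-insensitive match is one possible workaround (just a sketch, not part of the original answers):
# Accept both "filename=" and "Filename=" in the Content-Disposition header
filename="$(curl -sI "$url" | tr -d '\r' | grep -o -i -E 'filename=.*$' | head -n1 | sed 's/^[Ff]ilename=//')"
curl -o "$filename" -L "$url"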

I had the same problem as John Cooper: I got back no filename, only a Location header. His answer works too, but it takes two commands.
This one-liner worked for me:
url="https://download.mozilla.org/?product=firefox-latest-ssl&os=linux64&lang=de";url=$(curl -L --head -w '%{url_effective}' $url 2>/dev/null | tail -n1) ; curl -O $url
Stolen and added some stuff from
https://unix.stackexchange.com/questions/126252/resolve-filename-from-a-remote-url-without-downloading-a-file

An example using the answer above against an Apache Archiva artifact repository to pull the latest version. The curl call returns the Location line, and the filename is at the end of that line. The trailing CR must be removed from the file name.
url="http://archiva:8080/restServices/archivaServices/searchService/artifact?g=com.imgur.backup&a=snapshot-s3-util&v=LATEST"
filename=$(curl -sI -u user:password $url | grep Location | awk -F/ '{print $NF}' | sed 's/\r$//')
curl --silent -o $filename -L -u user:password $url

Instead of applying grep and other Unix-fu operations, curl ships with a built-in "write out" variable[1] for exactly this case, e.g.
$ curl -OJsL "http://www.vim.org/scripts/download_script.php?src_id=10872" -w "%{filename_effective}"
pythoncomplete.vim
[1] https://everything.curl.dev/usingcurl/verbose/writeout#available-write-out-variables

Using the solution proposed above, I wrote this helper function, curl2file:
function curl2file() {
    local url="$1"
    # Resolve any redirects via a HEAD request, then download from the final URL.
    url=$(curl -o /dev/null -L --head -w '%{url_effective}' "$url" 2>/dev/null | tail -n1)
    curl -O "$url"
}
Usage:
curl2file https://cloud.tsinghua.edu.cn/f/4666d28af98a4e63afb5/?dl=1

Related

Download all files of a particular type from a website using wget stops in the starting url

The following did not work.
wget -r -A .pdf home_page_url
It stops with the following message:
....
Removing site.com/index.html.tmp since it should be rejected.
FINISHED
I don't know why it only stops at the starting URL and doesn't follow its links to search for the given file type.
Is there any other way to recursively download all PDF files from a website?
It may be blocked by robots.txt. Try adding -e robots=off.
Other possible problems are cookie-based authentication or agent rejection for wget.
See these examples.
EDIT: The dot in ".pdf" is wrong according to sunsite.univie.ac.at
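Putting those two tips together (robots off and no dot in the accept pattern), the adjusted command would look something like this, with home_page_url being the same placeholder as in the question:
wget -r -A pdf -e robots=off home_page_url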
The following command works for me; it will download the pictures of a site:
wget -A pdf,jpg,png -m -p -E -k -K -np http://site/path/
This is certainly because the links in the HTML don't end with /.
Wget will not follow this, as it thinks it's a file (and it doesn't match your filter):
<a href="page">
But it will follow this:
<a href="page/">
You can use the --debug option to see if it's the actual problem.
I don't know any good solution for this. In my opinion this is a bug.
In my version of wget (GNU Wget 1.21.3), the -A/--accept and -r/--recursive flags don't play nicely with each other.
Here's my script for scraping a domain for PDFs (or any other filetype):
wget --no-verbose --mirror --spider https://example.com -o - | while read line
do
    [[ $line == *'200 OK' ]] || continue
    [[ $line == *'.pdf'* ]] || continue
    echo $line | cut -c25- | rev | cut -c7- | rev | xargs wget --no-verbose -P scraped-files
done
Explanation: Recursively crawl https://example.com and pipe the log output (containing all scraped URLs) to a while read block. When a line from the log output contains a PDF URL, strip the leading timestamp (25 characters) and trailing request info (7 characters) and use wget to download the PDF.

mutt command with multiple attachments in single mail unix

My requirement is to attach all the .csv files in a folder and send them in a single mail.
Here is what I have tried:
mutt -s "subject" -a *.csv -- abc@gmail.com < subject.txt
The above command does not work (it does not recognize multiple files) and throws the error:
Error sending message, child exited 67 (User unknown.).
Could not send the message.
Then I tried using multiple -a options, as follows:
mutt -s "subject" -a aaa.csv -a bbb.csv -- abc@gmail.com < subject.txt
This works as expected.
But this is not feasible for, say, 100 files. I should be able to use a file mask (like *.csv to take all CSV files). Is there any way to use something like *.csv in a single command?
Thanks
Mutt doesn't support such syntax, but it doesn't mean it's impossible. You just have to build the mutt command.
mutt -s "subject" $( printf -- '-a %q ' *.csv ) ...
The command in $( ... ) produces something like this:
-a aaa.csv -a bbb.csv -a ...
Here is an example of sending multiple files with a single command:
mutt -s "Subject" -i "Mail_body text" email_id@abc.com -c email_cc_id@abc.com -a attachment1.pdf -a attachment2.pdf
Put the -a options for the attachments at the end of the command line.
Some Linux systems limit the attachment size, and the limit is often fairly small.
Additionally, I'm getting backslashes ( \ ):
Daily_Batch_Status{20131003}.PDF
Daily_System_Monitoring{20131003}.PDF
printf -- '-a %q ' *.PDF
-a Daily_Batch_Status\{20131003\}.PDF -a Daily_System_Monitoring\{20131003\}.PDF
#!/bin/bash
from="me#address.com"
to="target#address.com"
subject="pdfs $(date +%B) $(date +%Y)"
body="You can find the pdfs from $(date +%B) $(date +%Y)"
# here come the attachments
mutt -s "$subject" $( printf -- ' -a %q' $PWD/*.pdf ) -- $to <<EOF
Dear Mr and Ms,
$(echo $body)
$(cat ~/.signature)
EOF
But it does not work with escaped characters in file names, such as "\[5\]", which can occur on macOS.
I set this up as a script: I collect the needed PDFs in a folder and just run the script from that location. That way the monthly reports get sent; it does not matter how many PDFs there are (the number can vary), but the file names must not contain whitespace.
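If the file names may contain whitespace, one alternative (a sketch along the same lines, not from the original answer; body.txt is a placeholder for the mail body) is to build the attachment arguments in a bash array, which keeps each name as a single word:
attachments=()
for f in *.pdf; do
    attachments+=(-a "$f")
done
mutt -s "$subject" "${attachments[@]}" -- "$to" < body.txt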

Pipe output of cat to cURL to download a list of files

I have a list of URLs in a file called urls.txt. Each line contains one URL. I want to download all of the files at once using cURL. I can't seem to get the right one-liner down.
I tried:
$ cat urls.txt | xargs -0 curl -O
But that only gives me the last file in the list.
This works for me:
$ xargs -n 1 curl -O < urls.txt
I'm in FreeBSD. Your xargs may work differently.
Note that this runs sequential curls, which you may view as unnecessarily heavy. If you'd like to save some of that overhead, the following may work in bash:
$ mapfile -t urls < urls.txt
$ curl ${urls[@]/#/-O }
This saves your URL list to an array, then expands the array with options to curl to cause the targets to be downloaded. The curl command can take multiple URLs and fetch all of them, recycling the existing connection (HTTP/1.1), but it needs the -O option before each one in order to download and save each target. Note that some characters within URLs (such as [ and ]) may need to be escaped to avoid interacting with your shell.
Or if you are using a POSIX shell rather than bash:
$ curl $(printf ' -O %s' $(cat urls.txt))
This relies on printf's behaviour of repeating the format pattern to exhaust the list of data arguments; not all stand-alone printfs will do this.
Note that this non-xargs method also may bump up against system limits for very large lists of URLs. Research ARG_MAX and MAX_ARG_STRLEN if this is a concern.
A very simple solution would be the following:
If you have a file 'file.txt' like
url="http://www.google.de"
url="http://www.yahoo.de"
url="http://www.bing.de"
Then you can use curl and simply do
curl -K file.txt
And curl will fetch all the URLs contained in your file.txt!
So if you have control over your input-file-format, maybe this is the simplest solution for you!
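If your input is a plain list of URLs rather than that url="..." format, one possible trick (a sketch, not from the original answer; the awk conversion is an assumption about your input file) is to generate the config on the fly and feed it to curl on standard input. -K - tells curl to read the config from stdin, and --remote-name-all saves each URL under its remote name:
awk '{ print "url=\"" $0 "\"" }' urls.txt | curl --remote-name-all -K -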
Or you could just do this:
cat urls.txt | xargs curl -O
You only need to use the -I parameter when you want to insert the cat output in the middle of a command.
xargs -P 10 | curl
GNU xargs -P can run multiple curl processes in parallel. E.g. to run 10 processes:
xargs -P 10 -n 1 curl -O < urls.txt
This will speed up the download 10x if your maximum download speed is not reached and if the server does not throttle IPs, which is the most common scenario.
Just don't set -P too high or your RAM may be overwhelmed.
GNU parallel can achieve similar results.
The downside of those methods is that they don't use a single connection for all files, which is what curl does if you pass multiple URLs to it at once, as in:
curl -o out1.txt http://example.com/1 -o out2.txt http://example.com/2
as mentioned at https://serverfault.com/questions/199434/how-do-i-make-curl-use-keepalive-from-the-command-line
Maybe combining both methods would give the best results? But I imagine that parallelization is more important than keeping the connection alive.
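One way to combine them (a hedged sketch, not from the answers above): hand each curl process a batch of URLs with -n, so connections are reused within each batch while the batches still run in parallel:
xargs -P 10 -n 20 curl --remote-name-all < urls.txt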
See also: Parallel download using Curl command line utility
Here is how I do it on a Mac (OSX), but it should work equally well on other systems:
What you need is a text file that contains your links for curl
like so:
http://www.site1.com/subdirectory/file1-[01-15].jpg
http://www.site1.com/subdirectory/file2-[01-15].jpg
.
.
http://www.site1.com/subdirectory/file3287-[01-15].jpg
In this hypothetical case, the text file has 3287 lines and each line is coding for 15 pictures.
Let's say we save these links in a text file called testcurl.txt on the top level (/) of our hard drive.
Now we have to go into the terminal and enter the following command in the bash shell:
for i in `cat /testcurl.txt` ; do curl -O "$i" ; done
Make sure you are using back ticks (`)
Also make sure the flag (-O) is a capital O and NOT a zero
with the -O flag, the original filename will be taken
Happy downloading!
As others have rightly mentioned:
-cat urls.txt | xargs -0 curl -O
+cat urls.txt | xargs -n1 curl -O
However, this paradigm is a very bad idea, especially if all of your URLs come from the same server -- you're not only going to be spawning another curl instance, but will also be establishing a new TCP connection for each request, which is highly inefficient, and even more so with the now ubiquitous https.
Please use this instead:
-cat urls.txt | xargs -n1 curl -O
+cat urls.txt | wget -i/dev/fd/0
Or, even simpler:
-cat urls.txt | wget -i/dev/fd/0
+wget -i/dev/fd/0 < urls.txt
Simplest yet:
-wget -i/dev/fd/0 < urls.txt
+wget -iurls.txt

unix command to extract part of a hostname

I would like to extract the first part of this hostname testsrv1
from testsrv1.main.corp.loc.domain.com in UNIX, within a shell script.
What command can I use? It would be anything before the first period (.).
Do you have the server name in a shell variable? Are you using a sh-like shell? If so,
${SERVERNAME%%.*}
will do what you want.
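For example (assuming the name is in a shell variable called SERVERNAME):
SERVERNAME=testsrv1.main.corp.loc.domain.com
echo "${SERVERNAME%%.*}"    # prints testsrv1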
You can use cut:
echo "testsrv1.main.corp.loc.domain.com" | cut -d"." -f1
To build upon pilcrow's answer, there's no need for a new variable; just use the built-in $HOSTNAME.
echo $HOSTNAME          --> my.server.domain
echo ${HOSTNAME%%.*}    --> my
Tested on two fairly different Linux's.
2.6.18-371.4.1.el5, GNU bash, version 3.2.25(1)-release (i386-redhat-linux-gnu)
3.4.76-65.111.amzn1.x86_64, GNU bash, version 4.1.2(1)-release (x86_64-redhat-linux-gnu)
try the -s switch:
hostname -s
I use cut, awk, sed, or Bash variables.
Operation
Via cut
[flying@lempstacker ~]$ echo "testsrv1.main.corp.loc.domain.com" | cut -d. -f1
testsrv1
[flying@lempstacker ~]$
Via awk
[flying@lempstacker ~]$ echo "testsrv1.main.corp.loc.domain.com" | awk -v FS='.' '{print $1}'
testsrv1
[flying@lempstacker ~]$
Via sed
[flying@lempstacker ~]$ echo "testsrv1.main.corp.loc.domain.com" | sed -r 's#([^.]*).(.*)#\1#g'
testsrv1
[flying@lempstacker ~]$
Via Bash Variables
[flying@lempstacker ~]$ hostName='testsrv1.main.corp.loc.domain.com'
[flying@lempstacker ~]$ echo ${hostName%%.*}
testsrv1
[flying@lempstacker ~]$
You could have used "uname -n" to get just the hostname.
You can use IFS to split text by whichever token you want. For domain names, we can use the dot/period character.
#!/usr/bin/env bash
shorthost() {
# Set IFS to dot, so that we can split $@ on dots instead of spaces.
local IFS='.'
# Break up arguments passed to shorthost so that each domain zone is
# a new index in an array.
zones=($@)
# Echo out our first zone
echo ${zones[0]}
}
If this is in your script then, for instance, you'll get test when you run shorthost test.example.com. You can adjust this to fit your use case, but knowing how to break the zones into the array is the big thing here, I think.
I wanted to provide this solution because I feel like spawning another process is overkill when you can do it easily and completely within your shell with IFS. One thing to watch out for is that some users will recommend doing things like hostname -s, but that doesn't work in the BSD userland. For instance, I don't think macOS users have the -s flag.
Assuming the variable $HOSTNAME exists, try echo ${HOSTNAME%%.*} to get the topmost part of the fully-qualified hostname. Hope it helps.
If you're interested, the hint comes from the partial /etc/bashrc quoted below, from a RHEL 7 host:
if [ -e /etc/sysconfig/bash-prompt-screen ]; then
PROMPT_COMMAND=/etc/sysconfig/bash-prompt-screen
else
PROMPT_COMMAND='printf "\033k%s#%s:%s\033\\" "${USER}" "${HOSTNAME%%.*}" "${PWD/#$HOME/~}"'
fi

How to use pastebin from shell script?

Is it possible to use pastebin (may be via their "API" functionality) inside bash shell scripts? How do I send http-post? How do I get back the URL?
As pastebin.com closed their public API, I was looking for alternatives.
Sprunge is great. Usage:
<command> | curl -F 'sprunge=<-' http://sprunge.us
or, as I use it:
alias paste="curl -F 'sprunge=<-' http://sprunge.us"
<command> | paste
The documentation says that you need to submit a POST request to
http://pastebin.com/api_public.php
and the only mandatory parameter is paste_code, of type string, which is the paste that you want to make.
On success a new pastebin URL will be returned.
You can easily do this from your bash shell using the command curl.
curl uses the -d option to send the POST data to the specified URL.
Demo:
This demo will create a new paste with the code:
printf("Hello..I am Codaddict");
From your shell:
$ curl -d 'paste_code=printf("Hello..I am Codaddict");' 'http://pastebin.com/api_public.php'
http://pastebin.com/598VLDZp
$
Now if you see the URL http://pastebin.com/598VLDZp, you'll see my paste :)
Alternatively, you can do it using the wget command, which uses the option --post-data to send POST values.
I've tried this command and it works fine:
wget --post-data 'paste_code=printf("Hello..I am Codaddict");' 'http://pastebin.com/api_public.php'
Put the following in your .bashrc:
sprunge() {
if [[ $1 ]]; then
curl -F 'sprunge=<-' "http://sprunge.us" <"$1"
else
curl -F 'sprunge=<-' "http://sprunge.us"
fi
}
...and then you can run:
sprunge filename # post file to sprunge
...or...
some_command | sprunge # pipe output to sprunge
The API for posting to pastebin has changed since codaddict's answer was posted.
Details can be found at this link: https://pastebin.com/api
Example:
curl -d 'api_paste_code=printf("Hello..\n I am Codaddict");' \
-d 'api_dev_key=<get_your_own>' \
-d 'api_option=paste' 'http://pastebin.com/api/api_post.php'
There are three essential fields as of now:
api_dev_key -> You need to create a login on pastebin.com in order to get that
api_option -> Format in which to post
api_paste_code -> Text you want to post
Two other answers (from circa 2014) point to http://sprunge.us, which is designed to be used like this...
curl --form 'sprunge=@yourfile.txt' sprunge.us
However, as of 2018, sprunge.us has a tendency to be overloaded and return 500 Internal Server Error to every request. For files up to at least 300 KB but not as high as 2.8 MB, I have had good luck with the very similar service at http://ix.io:
curl --form 'f:1=@yourfile.txt' ix.io
For files up to at least 2.8 MB (and maybe higher, I don't know), I've found the more highly polished https://transfer.sh. It recommends a slightly different and simpler command line, and requires https (it won't work without it):
curl --upload-file yourfile.txt https://transfer.sh
I have found that Sprunge is currently down, but dpaste.com has a simple API.
To post from STDIN
curl -s -F "content=<-" http://dpaste.com/api/v2/
from a file foo.txt
cat foo.txt | curl -s -F "content=<-" http://dpaste.com/api/v2/
to post a string
curl -s -F "content=string" http://dpaste.com/api/v2/
The response will be a plain text URL to the paste.
Nb: the trailing / in the URL http://dpaste.com/api/v2/ seems necessary
https://paste.c-net.org/ has a simpler API than all of them. Simply "POST" to it.
From the website:
Upload text using curl:
$ curl -s --data 'Hello World!' 'https://paste.c-net.org/'
Upload text using wget:
$ wget --quiet -O- --post-data='Hello World!' 'https://paste.c-net.org/'
Upload a file using curl:
$ curl --upload-file '/tmp/file' 'https://paste.c-net.org/'
Upload a file using wget:
$ wget --quiet -O- --post-file='/tmp/file' 'https://paste.c-net.org/'
Upload the output of a command or script using curl:
$ ls / | curl --upload-file - 'https://paste.c-net.org/'
$ ./bin/hello_world | curl -s --data-binary @- 'https://paste.c-net.org/'
You can also simply use netcat. Unlike termbin, paste.c-net.org won't time out if your script takes more than 5 seconds to produce its output.
$ { sleep 10; ls /; } | nc termbin.com 9999
$ { sleep 10; ls /; } | nc paste.c-net.org 9999
https://paste.c-net.org/ExampleOne
Easiest way to post to pastebin
echo 'your message' | sed '1s/^/api_paste_code=/g' | sed 's/$/\%0A/g' | curl -d @- -d 'api_dev_key=<your_api_key>' -d 'api_option=paste' 'http://pastebin.com/api/api_post.php'
Just change the <your_api_key> part and pipe whatever you want into it.
The sed invocations add the api_paste_code parameter to the beginning of the message and append %0A (an encoded newline) to the end of each line so that multiline input is handled. The @- tells curl to read from stdin.
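To see what that pipeline actually hands to curl, here is the transformation applied to a made-up two-line message (just an illustration of the sed steps):
$ printf 'line one\nline two\n' | sed '1s/^/api_paste_code=/' | sed 's/$/%0A/'
api_paste_code=line one%0A
line two%0A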
A Bash Function You Can Paste
For easy reuse, make it a bash function (copy and paste this into your terminal and set the API_KEY field appropriately):
pastebin () {
    API_KEY='<your_api_key>'
    if [ -z "$1" ]
    then
        cat - | sed '1s/^/api_paste_code=/g' | sed 's/$/\%0A/g' | curl -d @- -d 'api_dev_key='"$API_KEY" -d 'api_option=paste' 'http://pastebin.com/api/api_post.php'
    else
        echo "$1" | sed '1s/^/api_paste_code=/g' | sed 's/$/\%0A/g' | curl -d @- -d 'api_dev_key='"$API_KEY" -d 'api_option=paste' 'http://pastebin.com/api/api_post.php'
    fi
    printf '\n'
}
You can run it with either:
pastebin 'your message'
or if you need to pipe a file into it:
cat your_file.txt | pastebin
To build upon Vishal's answer, Pastebin has upgraded to use HTTPS only now:
curl -d 'api_paste_code=printf("Hello World");' \
-d 'api_dev_key=<your_key>' \
-d 'api_option=paste' 'https://pastebin.com/api/api_post.php'
You don't have to specify the -X POST parameter
Additional details can be found here:
https://pastebin.com/doc_api#1
Based on another answer on this page, I wrote the following script, which reads from STDIN (or assumes output is piped into it).
This version allows for arbitrary data which is URI escaped (by jq).
#!/bin/bash
api_key=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
curl -d "api_paste_code=$(jq -sRr #uri)" \
-d "api_dev_key=$api_key" \
-d 'api_option=paste' 'https://pastebin.com/api/api_post.php'
echo # By default, there's no newline
I am a bit late to this post, but I created a little tool to help with this.
https://pasteshell.com/
Feel free to check it out and let me know what you think.
Thanks,
