How to download XML web page with Wget - unix

I want to download an XML file from the web using wget on Unix.
In principle, I simply want to fetch it and save it to a file.
This is the command I use:
wget http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=Alum+AND+Adjuvant&retmax=100 --output-document=test.xml
But it failed to download it. What's the right way to go?

You must quote the URL, like this:
wget "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=Alum+AND+Adjuvant&retmax=100" --output-document=test.xml
because the URL contains shell metacharacters (here the &) that change how the command line is parsed.
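To see what the shell does with an unquoted URL, the following sketch compares the two forms without touching the network. The shortened example.com URL and the `sh -c` wrapper are assumptions made for illustration; they are not part of the original command.

```shell
# Quoted: the whole URL reaches the command as a single argument.
quoted=$(sh -c 'printf "%s\n" "http://example.com/esearch.fcgi?db=pubmed&term=Alum&retmax=100"')

# Unquoted: the first & ends the printf command and runs it in the background,
# so printf only sees the text before it. The remaining pieces are parsed as
# separate commands (here, the variable assignments term=Alum and retmax=100).
unquoted=$(sh -c 'printf "%s\n" http://example.com/esearch.fcgi?db=pubmed&term=Alum&retmax=100; wait')

printf 'quoted:   %s\n' "$quoted"
printf 'unquoted: %s\n' "$unquoted"
```

This is why the unquoted wget call requests a truncated URL and never sees the --output-document option in the intended form.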

If --output-document does not work, you can use -O:
wget -O test.xml "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=Alum+AND+Adjuvant&retmax=100"

wget --output-document=test.xml "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=Alum+AND+Adjuvant&retmax=100"

Related

Wget download pdf

I am trying to download a pdf file using wget.
When I do:
wget <url>, it downloads a corrupted file. However, if I run wget -i test.txt with the PDF URL inside test.txt, it works and the file is not corrupted.
Does anyone know why?
From the logs I can see the following.
In the first case, it is downloading a "not found" page.
Length: 11322 (11K) [text/html] Saving to: ‘media.nl?id=39194.1’
In the second it is a proper pdf.
Length: 58272 (57K) [application/pdf] Saving to: ‘media.nl?id=39194&c=4667446&h=34c63dbaaa7adc7c8a33&_xt=.pdf’
Thanks,
Put your URL in quotes. Not quoting the URL can lead to strange effects; in your case the & is interpreted by the shell, so wget only receives the part of the URL before the first &.
E.g.
wget "https://www.roofingsuppliesuk.co.uk/core/media/media.nl?id=39194&c=4667446&h=34c63dbaaa7adc7c8a33&_xt=.pdf"
or
wget 'https://www.roofingsuppliesuk.co.uk/core/media/media.nl?id=39194&c=4667446&h=34c63dbaaa7adc7c8a33&_xt=.pdf'
or with escaping of &
wget https://www.roofingsuppliesuk.co.uk/core/media/media.nl?id=39194\&c=4667446\&h=34c63dbaaa7adc7c8a33\&_xt=.pdf
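The `-i` variant from the question works for the same reason: wget reads the URL from the file, so the shell never gets a chance to interpret the &. A minimal sketch, assuming a temporary file created with mktemp stands in for test.txt; the wget call itself is left commented out because it needs network access.

```shell
url_file=$(mktemp)

# Single quotes keep the shell from interpreting & while writing the file.
printf '%s\n' 'https://www.roofingsuppliesuk.co.uk/core/media/media.nl?id=39194&c=4667446&h=34c63dbaaa7adc7c8a33&_xt=.pdf' > "$url_file"

# Count the & characters that made it into the file.
amp_count=$(tr -cd '&' < "$url_file" | wc -c | tr -d ' ')
printf 'ampersands in URL file: %s\n' "$amp_count"

# wget -i "$url_file"    # would read the URL verbatim from the file
```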
I had the same issue, but I changed the command to this and it worked fine when I tested it:
wget --no-check-certificate https://www.roofingsuppliesuk.co.uk/core/media/'media.nl?id=39194&c=4667446&h=34c63dbaaa7adc7c8a33&_xt=.pdf'
I just added single quotes around the part from media.nl to .pdf.
Make sure a file with the same name doesn't already exist. You don't need --no-check-certificate unless you get a self-signed certificate error.

How to make a groovy script which uploads a file to JFrog's artifactory

I'm trying to write a simple Groovy script which deploys a text file into my artifactory. I read the REST API in order to understand how to write the script but I've seen so many vastly different versions online I'm confused.
I want it to be a simple groovy script using the REST API and curl.
This is what JFrog are suggesting in their website:
curl -u myUser:myP455w0rd! -X PUT "http://localhost:8081/artifactory/my-repository/my/new/artifact/directory/file.txt" -T Desktop/myNewFile.txt
And it might work perfectly but I don't understand each part here, and I don't know if I can simply integrate this into a groovy script as is or some adjustments are needed.
I'm a beginner in this field and I would love any help!
Thanks in advance
As you are using the '-T' flag, you do not also need '-X PUT'.
Also, '-T' lets you omit the file name on the destination. For example, if your path is "http://localhost:8081/artifactory/my-repository/my/new/artifact/directory/" (note the trailing slash), the file name will be the same as it is on the origin.
The full command will look like this:
curl -u user:password -T Desktop/myNewFile.txt "http://localhost:8081/artifactory/my-repository/my/new/artifact/directory/"
Now, just to be on the safe side, you are going to have the file name and the destination path as variables, right?
The -T flag should only be used for uploading files, so don't assume that you can replace every '-X PUT' with '-T'. For this specific case of uploading a file, though, it is possible.
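Keeping the file and destination as variables could look like the shell sketch below. The function name `artifactory_upload`, the `CURL` override, and the `ARTIFACTORY_USER` / `ARTIFACTORY_PASSWORD` variables are hypothetical names introduced for illustration; the host and repository path are the ones from the question.

```shell
# Hypothetical helper: upload one local file into a repository directory.
# CURL can be overridden with a stub to inspect the arguments without uploading.
artifactory_upload() {
    local_file=$1
    dest_dir=$2      # should end with "/" so curl appends the local file name
    ${CURL:-curl} -u "$ARTIFACTORY_USER:$ARTIFACTORY_PASSWORD" -T "$local_file" "$dest_dir"
}

# Example call (commented out because it needs a running Artifactory):
# ARTIFACTORY_USER=myUser
# ARTIFACTORY_PASSWORD='myP455w0rd!'
# artifactory_upload Desktop/myNewFile.txt \
#     "http://localhost:8081/artifactory/my-repository/my/new/artifact/directory/"
```

The single quotes around the password matter for the same reason URLs are quoted: an unquoted ! can trigger history expansion in an interactive Bash session.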

How to download an HTTP file in Unix that requires login before downloading?

I would like to download a file from a website that requires login prior to downloading. I tried w3m and I can open the page, but I don't know how to download the file. What I tried is:
w3m https://services.appliedgenomics.org/sequences-export/536-RNA-seq_Disco_TuDO/
Then I enter my user name and password to get into the directory where my desired file is, and open it.
Now that the file is open, how can I download it?
You can use curl to log in and download a file.
Syntax:
curl -u username:password http://example.com
You can also specify the download location using the -o option:
curl -o ~/Desktop/myfile.pdf http://url-to-file/abcedfeghijklmnop.pdf
This page has a good tutorial on using curl:
http://www.cyberciti.biz/faq/curl-download-file-example-under-linux-unix/
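The two options can be combined into a single call. This is a minimal sketch that assumes the site uses HTTP basic authentication (a site with a form-based login would need a different approach); `download_with_login` and the `CURL` override are hypothetical names. Passing only the user name to -u makes curl prompt for the password, which keeps it out of the shell history.

```shell
# Hypothetical helper: download one file from a password-protected URL.
download_with_login() {
    user=$1
    url=$2
    out=$3
    # -u with only a user name: curl asks for the password interactively.
    # -f: fail on HTTP errors instead of saving the error page.
    # -L: follow redirects.  -o: write the response body to "$out".
    ${CURL:-curl} -f -L -u "$user" -o "$out" "$url"
}

# Example call (needs network access and valid credentials):
# download_with_login username "http://url-to-file/abcedfeghijklmnop.pdf" ~/Desktop/myfile.pdf
```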

wget - file name extension for incomplete downloads (e.g. file.zip.incomplete or file.zip.part)

I want to mirror http and ftp directories with wget and I want to identify incomplete downloads of wget. Is there a way that incomplete downloads get an additional file extension like ".part" or ".incomplete"?
Sometimes I don't have a download log and I don't know the exact file size of a file. (The downloads are often not complete because of bad ftp/http server or bad internet connection)
For only one download of a file I could write a kind of wrapper:
wget -c -O file.zip.part http://domain.tld/file.zip && mv file.zip.part file.zip
I am not sure how to do it for directories.
Kind regards
matt
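One way to extend the single-file wrapper is to loop over a list of URLs and rename each file only after its download succeeds. This is a sketch, not a built-in wget feature: `fetch_all` is a hypothetical helper, it assumes the URLs are already known (one per line on standard input) rather than discovered by recursive mirroring, it saves everything into the current directory, and `FETCH` is an override hook for the download command.

```shell
# Hypothetical helper: download each URL to NAME.part, then rename on success.
fetch_all() {
    while IFS= read -r url; do
        [ -n "$url" ] || continue        # skip empty lines
        name=${url##*/}                  # file name = last path component
        # </dev/null keeps the download command from consuming the URL list.
        if ${FETCH:-wget -c -O} "$name.part" "$url" </dev/null; then
            mv -- "$name.part" "$name"   # rename only after a successful download
        else
            echo "incomplete: $name.part" >&2
        fi
    done
}

# Example call (needs network access):
# printf '%s\n' http://domain.tld/file.zip | fetch_all
```

Any file that still ends in .part after a run is an incomplete download, and running the same list again lets wget -c continue it.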

Download Folder including Subfolder via wget from Dropbox link to Unix Server

I have a Dropbox link like https://www.dropbox.com/sh/w4366ttcz6/AAB4kSz3adZ which opens the usual Dropbox site with folders and files.
Is there any chance to download the complete content (tar or directly as sync) to a unix machine using wget?
I have seen some posts here where single files were downloaded, but could not find any answer to this. There is an API from Dropbox, but it does not work on my server due to a 64-bit issue, and http://www.dropboxwiki.com/dropbox-addons/dropbox-gallery-download#BASH_Version also does not work for me. Any other suggestions?
This help article documents some parameters you can use to get different behaviors from Dropbox shared links:
https://www.dropbox.com/help/201
For example, using this link:
https://www.dropbox.com/sh/igoku2mqsjqsmx1/AAAeF57DR2ou_nZGC4JPoQKfa
We can use the dl parameter to get a direct download. Using curl, we can download it as such:
curl -L "https://www.dropbox.com/sh/igoku2mqsjqsmx1/AAAeF57DR2ou_nZGC4JPoQKfa?dl=1" > download.zip
(The -L is necessary in order to follow redirects.)
Or, with wget, something like:
wget --max-redirect=20 -O download.zip "https://www.dropbox.com/sh/igoku2mqsjqsmx1/AAAeF57DR2ou_nZGC4JPoQKfa?dl=1"
You can use --content-disposition with wget too.
wget --content-disposition "https://www.dropbox.com/sh/igoku2mqsjqsmx1/AAAeF57DR2ou_nZGC4JPoQKfa?dl=1"
It will auto-detect the folder name as the zip filename.
Currently, you're probably better off creating an app that you don't publish, which can either access all your files, or just a dedicated app folder (safer). Click the generate API token button about halfway down the app's settings page, and store it securely! You can then use the dedicated download or zip download API calls to get your files from anywhere like so:
curl -X POST https://content.dropboxapi.com/2/files/download_zip \
--header "Authorization: Bearer $MY_DROPBOX_API_TOKEN" \
--header 'Dropbox-API-Arg: {"path": "/path/to/directory"}' \
> useful-name.zip
Adding your token as an environment variable makes it easier & safer to type/script these operations. If you're using BASH, and you have ignorespace in your $HISTCONTROL you can just type + paste your key with a leading space so it's not saved in your history. For frequent use, save it in a file with 0600 permissions that you can source, as you would an SSH key.
export MY_DROPBOX_API_TOKEN='...'
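The "file with 0600 permissions that you can source" could be set up roughly as follows. This is a sketch under assumptions: the mktemp file stands in for a file in your home directory, and 'placeholder-token' is a dummy value, not a real token.

```shell
token_file=$(mktemp)            # stand-in for a permanent file in your home directory
chmod 600 "$token_file"         # owner read/write only, like an SSH private key

# 'placeholder-token' is a dummy value; the real token would go here.
printf "export MY_DROPBOX_API_TOKEN='%s'\n" 'placeholder-token' > "$token_file"

. "$token_file"                 # load the variable into the current shell
printf 'token loaded: %s characters\n' "${#MY_DROPBOX_API_TOKEN}"
```

Setting the permissions before writing the token means the secret never sits in a file that other users can read.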
Yes, you can, and it is pretty easy. Follow the steps below.
First, get the Dropbox share link. It will look like this: https://www.dropbox.com/s/ad2arn440pu77si/test.txt
Then add "?dl=1" to the end of that URL and "-O filename" to the command, so that you end up with something like this: wget "https://www.dropbox.com/s/ad2arn440pu77si/test.txt?dl=1" -O test.txt
Now you can easily get files onto your Linux machine.
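The step of adding ?dl=1 can be scripted. The helper below is a hypothetical sketch (`dropbox_direct_url` is not a Dropbox or wget feature) that assumes the link either has no dl parameter or carries dl=0.

```shell
# Hypothetical helper: turn a Dropbox share link into a direct-download link.
dropbox_direct_url() {
    url=$1
    case $url in
        *dl=0*) printf '%s\n' "$url" | sed 's/dl=0/dl=1/' ;;  # switch preview to download
        *dl=1*) printf '%s\n' "$url" ;;                        # already a direct link
        *\?*)   printf '%s&dl=1\n' "$url" ;;                   # other query parameters present
        *)      printf '%s?dl=1\n' "$url" ;;                   # no query string yet
    esac
}

dropbox_direct_url 'https://www.dropbox.com/s/ad2arn440pu77si/test.txt'
# prints https://www.dropbox.com/s/ad2arn440pu77si/test.txt?dl=1
```

The result can then be passed to wget in double quotes, for example wget -O test.txt "$(dropbox_direct_url "$link")", where `link` is a variable holding the share link.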
