Wget download pdf - unix

I am trying to download a pdf file using wget.
When I run wget <url>, it downloads a corrupted file. However, if I run wget -i test.txt with the PDF URL inside test.txt, it works and the file is not corrupted.
Does anyone know why?
From the logs I can see the following.
In the first case, it is downloading a "not found" page.
Length: 11322 (11K) [text/html] Saving to: ‘media.nl?id=39194.1’
In the second it is a proper pdf.
Length: 58272 (57K) [application/pdf] Saving to: ‘media.nl?id=39194&c=4667446&h=34c63dbaaa7adc7c8a33&_xt=.pdf’
Thanks,

Put your URL in quotes. An unquoted URL can have strange effects: in your case the shell interprets each & as a command separator, so wget only receives the URL up to the first &.
E.g.
wget "https://www.roofingsuppliesuk.co.uk/core/media/media.nl?id=39194&c=4667446&h=34c63dbaaa7adc7c8a33&_xt=.pdf"
or
wget 'https://www.roofingsuppliesuk.co.uk/core/media/media.nl?id=39194&c=4667446&h=34c63dbaaa7adc7c8a33&_xt=.pdf'
or with escaping of &
wget https://www.roofingsuppliesuk.co.uk/core/media/media.nl?id=39194\&c=4667446\&h=34c63dbaaa7adc7c8a33\&_xt=.pdf
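To see what the shell does with the unquoted form, here is a small sketch that needs no network access. show_args is a hypothetical helper that only prints the arguments it receives, and the example.com URL is a shortened placeholder, not the real one.

```shell
# show_args is a hypothetical helper: it prints each argument it receives.
show_args() { printf 'arg: %s\n' "$@"; }

# Quoted: the whole URL reaches the command as a single argument.
quoted=$(show_args 'https://example.com/media.nl?id=39194&c=4667446&_xt=.pdf')

# Unquoted (simulated with eval so the effect can be captured): the shell
# splits the line at each &, runs "show_args ...id=39194" in the background,
# and treats c=4667446 and _xt=.pdf as separate variable assignments.
unquoted=$(eval "show_args https://example.com/media.nl?id=39194&c=4667446&_xt=.pdf")

printf 'quoted:   %s\n' "$quoted"
printf 'unquoted: %s\n' "$unquoted"
```

The truncated argument in the unquoted case matches the log in the question, where the file was saved as media.nl?id=39194.1.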

I had the same issue, but it worked fine once I changed the command to this:
wget --no-check-certificate https://www.roofingsuppliesuk.co.uk/core/media/'media.nl?id=39194&c=4667446&h=34c63dbaaa7adc7c8a33&_xt=.pdf'
I just added single quotes around the part from media.nl to .pdf.
Make sure a file with the same name doesn't already exist. You don't need --no-check-certificate unless you get a self-signed certificate error.

Related

How to make a groovy script which uploads a file to JFrog's artifactory

I'm trying to write a simple Groovy script which deploys a text file into my artifactory. I read the REST API in order to understand how to write the script but I've seen so many vastly different versions online I'm confused.
I want it to be a simple groovy script using the REST API and curl.
This is what JFrog suggests on their website:
curl -u myUser:myP455w0rd! -X PUT "http://localhost:8081/artifactory/my-repository/my/new/artifact/directory/file.txt" -T Desktop/myNewFile.txt
And it might work perfectly but I don't understand each part here, and I don't know if I can simply integrate this into a groovy script as is or some adjustments are needed.
I'm a beginner in this field and I would love any help!
Thanks in advance
Since you are using the -T flag, you don't also need -X PUT.
Also, -T lets you omit the file name on the destination. For example, if your path is "http://localhost:8081/artifactory/my-repository/my/new/artifact/directory/", the file keeps the name it has at the origin.
The full command will look like this:
curl -u user:password -T Desktop/myNewFile.txt "http://localhost:8081/artifactory/my-repository/my/new/artifact/directory/"
Now, just to be on the safe side, you are going to keep the file name and the destination path in variables, right?
The -T flag should only be used for uploading files, so don't assume you can replace every -X PUT with -T. For this specific case of uploading a file, though, you can.
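As a rough sketch of the "variables" idea, here is the same upload written as a plain shell script rather than Groovy. The variable names are made up for illustration, and the values are the placeholders from the example above. It only prints the command (a dry run), so nothing is uploaded until the echo is removed.

```shell
# Placeholder values taken from the example above; adjust to your setup.
BASE_URL="http://localhost:8081/artifactory"
REPO="my-repository"
TARGET_DIR="my/new/artifact/directory"
LOCAL_FILE="Desktop/myNewFile.txt"
USER_NAME="myUser"

# The trailing slash matters: with -T, curl then appends the local file name.
DEST="$BASE_URL/$REPO/$TARGET_DIR/"

# Dry run: prints the command. Remove "echo" to perform the upload; with
# only a user name given to -u, curl prompts for the password.
echo curl -u "$USER_NAME" -T "$LOCAL_FILE" "$DEST"
```

Leaving the password out of the script keeps it out of the file and out of the process list.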

How to download an HTTP file in Unix that requires login before downloading?

I would like to download a file from a website that requires login prior to downloading. I tried w3m and I can open the page, but I don't know how to download the file. What I tried is:
w3m https://services.appliedgenomics.org/sequences-export/536-RNA-seq_Disco_TuDO/
Then I enter my user name and password to get into the directory where my desired file is, and open it.
Now that the file is open, how can I download it?
You can use curl to log in and download a file.
Syntax:
curl -u username:password http://example.com
You can also specify the download location using the -o option:
curl -o ~/Desktop/myfile.pdf http://url-to-file/abcedfeghijklmnop.pdf
This page has a good tutorial on using curl:
http://www.cyberciti.biz/faq/curl-download-file-example-under-linux-unix/
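The two options can be combined in one call. The sketch below assumes the site uses HTTP authentication (a form-based login page would need a different approach). The URL and user name are placeholders. It is a dry run that only prints the command.

```shell
# Placeholder values: replace with your own URL and user name.
URL="http://example.com/files/report.pdf"
USER_NAME="username"

# Save under the remote file's own name in the current directory.
OUT="./$(basename "$URL")"

# Dry run: prints the command. Remove "echo" to run it; with only a user
# name given to -u, curl prompts for the password instead of taking it
# from the command line.
echo curl -u "$USER_NAME" -o "$OUT" "$URL"
```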

Download Folder including Subfolder via wget from Dropbox link to Unix Server

I have a Dropbox link like https://www.dropbox.com/sh/w4366ttcz6/AAB4kSz3adZ which opens the usual Dropbox site with folders and files.
Is there any way to download the complete content (as a tar archive or directly as a sync) to a Unix machine using wget?
I have seen some posts here where single files were downloaded, but could not find an answer to this. There is an API from Dropbox, but it does not work on my server due to the 64-bit issue, and http://www.dropboxwiki.com/dropbox-addons/dropbox-gallery-download#BASH_Version does not work for me either. Any other suggestions?
This help article documents some parameters you can use to get different behaviors from Dropbox shared links:
https://www.dropbox.com/help/201
For example, using this link:
https://www.dropbox.com/sh/igoku2mqsjqsmx1/AAAeF57DR2ou_nZGC4JPoQKfa
We can use the dl parameter to get a direct download. Using curl, we can download it as such:
curl -L https://www.dropbox.com/sh/igoku2mqsjqsmx1/AAAeF57DR2ou_nZGC4JPoQKfa?dl=1 > download.zip
(The -L is necessary in order to follow redirects.)
Or, with wget, something like:
wget --max-redirect=20 -O download.zip "https://www.dropbox.com/sh/igoku2mqsjqsmx1/AAAeF57DR2ou_nZGC4JPoQKfa?dl=1"
You can use --content-disposition with wget too:
wget "https://www.dropbox.com/sh/igoku2mqsjqsmx1/AAAeF57DR2ou_nZGC4JPoQKfa?dl=1" --content-disposition
It will auto-detect the folder name and use it as the zip file name.
Currently, you're probably better off creating an app that you don't publish, which can either access all your files, or just a dedicated app folder (safer). Click the generate API token button about halfway down the app's settings page, and store it securely! You can then use the dedicated download or zip download API calls to get your files from anywhere like so:
curl -X POST https://content.dropboxapi.com/2/files/download_zip \
--header "Authorization: Bearer $MY_DROPBOX_API_TOKEN" \
--header 'Dropbox-API-Arg: {"path": "/path/to/directory"}' \
> useful-name.zip
Adding your token as an environment variable makes it easier & safer to type/script these operations. If you're using BASH, and you have ignorespace in your $HISTCONTROL you can just type + paste your key with a leading space so it's not saved in your history. For frequent use, save it in a file with 0600 permissions that you can source, as you would an SSH key.
export MY_DROPBOX_API_TOKEN='...'
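A minimal sketch of the "file with 0600 permissions that you can source" idea follows. The temporary file stands in for a permanent location of your choosing, and placeholder-token is obviously not a real token.

```shell
# Stand-in for a permanent file of your choosing, e.g. one in your home
# directory; mktemp is used here only so the sketch leaves no traces.
TOKEN_FILE=$(mktemp)

# Restrict the file to its owner before anything secret is written to it.
chmod 600 "$TOKEN_FILE"
printf "export MY_DROPBOX_API_TOKEN='%s'\n" 'placeholder-token' > "$TOKEN_FILE"

# Load the token into the current shell session.
. "$TOKEN_FILE"

# Show the permissions; the mode column should read -rw-------.
ls -l "$TOKEN_FILE"
```

After sourcing the file, $MY_DROPBOX_API_TOKEN is available to the curl call shown earlier.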
Yes, you can, and it is pretty easy. Follow the steps below.
First, get the Dropbox share link. It will look like this: https://www.dropbox.com/s/ad2arn440pu77si/test.txt
Then add ?dl=1 to the end of that URL and a -O filename option, so that you end up with something like this: wget "https://www.dropbox.com/s/ad2arn440pu77si/test.txt?dl=1" -O test.txt
Now you can easily get files onto your Linux machine.
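If you do this often, the URL rewrite can be wrapped in a small function. direct_link is a hypothetical helper name, and the sketch assumes the share link carries no other query parameters that need to be kept, since it drops everything after the first ?.

```shell
# direct_link is a hypothetical helper: it strips any existing query string
# from a Dropbox share link and appends dl=1.
direct_link() {
    printf '%s?dl=1\n' "${1%%\?*}"
}

# Works whether or not the link already ends in ?dl=0.
direct_link 'https://www.dropbox.com/s/ad2arn440pu77si/test.txt'
direct_link 'https://www.dropbox.com/s/ad2arn440pu77si/test.txt?dl=0'
```

It could then be used as wget -O test.txt "$(direct_link "$link")", with the quotes keeping the ? away from the shell.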

How to download XML web page with Wget

I want to download an XML from the web using Unix wget.
Basically, I simply want to fetch it and save it to a file.
This is the command I use:
wget http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=Alum+AND+Adjuvant&retmax=100 --output-document=test.xml
But it failed to download. What's the right way to go?
You must quote the URL, like this:
wget "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=Alum+AND+Adjuvant&retmax=100" --output-document=test.xml
because the URL contains shell metacharacters that affect how the command line is processed.
If --output-document does not work, you can use -O:
wget -O test.xml "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=Alum+AND+Adjuvant&retmax=100"
wget --output-document=test.xml "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=Alum+AND+Adjuvant&retmax=100"

Error "Invalid Parameter" from ImageMagick convert on Windows

I am trying to convert a PDF document into a PNG file using the ImageMagick command line tools from an ASP.NET website. I create a new shell process and have it execute the following command:
convert -density 96x96 "[FileNameAndPath].pdf" "[FileNameAndPath].png"
This runs well when testing the website on my local machine with the ASP.NET Development Server of VS, and the command also works well when entered manually into the shell. When it runs from the programmatically created shell in ASP.NET, I get the following error message:
Invalid Parameter - 96x96
Does anybody know why that happens and what to do?
I have tested the command while logged in on the server via RDP with a different user account than the ASP.NET process. I have used exactly the same ImageMagick and Ghostscript installation files as on my local machine, and I enabled the option to add the ImageMagick installation path to the environment variables during installation. The server has not been rebooted since then.
convert is also the name of a Windows executable that converts a FAT file system to NTFS. When you do not specify the full path of an executable, quote:
...the system first searches the current working directory and then
searches the path environment variable, examining each
directory from left to right, looking for an executable filename that
matches the command name given.
"C:\Windows\System32" is generally present at the beginning of the %PATH% variable, so the Windows convert utility is launched instead, and it fails with the "Invalid Parameter" error as expected.
Try specifying the full path of ImageMagick's convert.exe like so:
"C:\Program Files\ImageMagick\convert.exe" -density 96x96 "path_and_filename.pdf" "path_and_filename.png"
As others have stated, convert resolves to a different program in your PATH. Instead, prefix your command with magick. Your command would then be:
magick convert -density 96x96 "[FileNameAndPath].pdf" "[FileNameAndPath].png"
Windows actually has its own "convert.exe" in system32. Make sure your script doesn't start that one (maybe the environment paths on your development machine are set differently).
I am only answering this late because ImageMagick was updated. Now, if you wish to use the "convert" command, you do it like this:
magick convert "image.png" "document.pdf"
or
magick convert "image_00*.png" "document.pdf"
for multiple images.
The syntax of the command is the same; just add magick before it.
A couple more options for fixing this:
Edit your Path system variable so that the path to ImageMagick comes first, followed by the rest. This makes Windows always find the ImageMagick convert before it finds the other convert program. So something like this: C:\Program Files\ImageMagick-6.9.2-Q16;C:\Program Files\Haskell Platform\2014.2.0.0\lib\extralibs\bin;...
Another option is to create a dedicated folder somewhere on your machine for shortcuts to programs whose names clash. Rename those shortcuts to meaningful names, for example convert_image_magick, then add the path to this folder to your system path. Then, as you press Tab to complete the name, you will find the right program to run.
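The effect of PATH order can be mimicked in a POSIX shell. This is only an analogy, not how Windows implements the lookup: find_first is a hypothetical helper, the two directories are throwaway stand-ins, and the list uses : as the separator where Windows uses ;.

```shell
# find_first is a hypothetical helper that mimics a left-to-right PATH
# search: it prints the first directory in a colon-separated list that
# contains an executable file with the given name.
find_first() {
    name=$1
    old_ifs=$IFS
    IFS=:
    for dir in $2; do
        if [ -x "$dir/$name" ]; then
            IFS=$old_ifs
            printf '%s\n' "$dir"
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}

# Two stand-in directories, each holding its own "convert".
demo=$(mktemp -d)
mkdir "$demo/system32" "$demo/imagemagick"
for d in system32 imagemagick; do
    printf '#!/bin/sh\n' > "$demo/$d/convert"
    chmod +x "$demo/$d/convert"
done

# Whichever directory is listed first wins the lookup.
system_first=$(find_first convert "$demo/system32:$demo/imagemagick")
magick_first=$(find_first convert "$demo/imagemagick:$demo/system32")
printf '%s\n%s\n' "$system_first" "$magick_first"

rm -r "$demo"
```

The same two directories give different winners depending only on their order, which is why moving ImageMagick to the front of Path changes which convert runs.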
Yes! If you launch an Administrator command window, it defaults to C:\windows\system32\ ... as long as you're not in that directory, the command will pick up the ImageMagick convert.exe
My issue was that I was using the FORFILES command, which is tricky because it requires using "cmd /c" and passing the convert command with the @path and @file parameters, and it does some escaping of slashes. Needless to say, it caused me hours and hours of headache. It even parses hex characters: if your file path contains a combination like 0x00, it will treat that as a hex value and mangle your path. I had a file path named C:\ImageRes3000x3000, and FORFILES read the 0x30 in it as a hex value, which caused a strange path issue. Sorry if this is a long, useless post, but it's meant as an FYI; if someone runs across this, maybe it will help them. That being said, FORFILES and "convert.exe" make a powerful and simple combo for one-line image scripts.
Here's my full three-line image resizing script:
robocopy D:\SRC_DIR\ D:\DEST_DIR\_staging *.jpg /e /MAXAGE:2
FORFILES /P D:\DEST_DIR\_staging\ /S /M *.jpg /C "cmd /c convert.exe @path -quality 65 -resize 1500 D:\RESIZED_DIR\\@file"
DEL D:\DEST_DIR\_staging\*.* /S /Q
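To check ahead of time whether a path contains the kind of 0xHH sequence that caused the trouble described above, something like the following shell sketch could be used. has_hex_escape is a hypothetical helper, and the check is a plain pattern match (0x followed by two hex digits), not a reimplementation of FORFILES parsing.

```shell
# has_hex_escape is a hypothetical helper: it succeeds if the given string
# contains "0x" followed by two hexadecimal digits.
has_hex_escape() {
    case $1 in
        *0x[0-9A-Fa-f][0-9A-Fa-f]*) return 0 ;;
        *) return 1 ;;
    esac
}

for p in 'C:\ImageRes3000x3000' 'D:\SRC_DIR'; do
    if has_hex_escape "$p"; then
        printf '%s: contains a 0xHH sequence\n' "$p"
    else
        printf '%s: looks safe\n' "$p"
    fi
done
```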

Resources