I can't seem to find out whether you can do this with wget or not.
What I am trying to do is download a whole bunch of images from one folder on a web server where all the images are stored. I am wondering if I can have multiple instances of wget running to download them more quickly.
e.g.
instance 1 of wget is downloading file 1
instance 2 of wget is downloading file 2
instance 3 of wget is downloading file 3 ...
and so on.
Once an instance has finished, I want it to move on to the next file that hasn't started downloading yet.
Is this even possible with wget?
This is really a question for your operating system, not for wget itself.
For example, if you are running wget from the command line and want to launch it and get the prompt back:
On Linux you can add an ampersand (&) at the end of your command:
"your wget command here" &
On Windows you can type:
start "your wget command here"
There are plenty of other ways as well.
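For the parallel-download part of the question, one common approach on Linux is to feed wget a list of URLs through xargs, which launches a new wget as soon as one finishes. A minimal sketch, assuming a hypothetical urls.txt that lists one image URL per line:
# run up to 3 wget processes at once; each picks up the next URL
# from urls.txt as soon as it finishes its current download
xargs -n 1 -P 3 wget -q < urls.txt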
I have two copies of a 400MB dataset file in my personal computer and in my Google drive. I want to play with the dataset with the programming language Julia on the Google Colab Jupyter notebook. I found a working code piece that changes the default Colab runtime type from Python 3 to Julia 1.3.1. If you run the following code in a code cell, and then reload the Colab page, the runtime type becomes Julia:
%%shell
if ! command -v julia 2>&1 > /dev/null
then
wget 'https://julialang-s3.julialang.org/bin/linux/x64/1.3/julia-1.3.1-linux-x86_64.tar.gz' \
-O /tmp/julia.tar.gz
tar -x -f /tmp/julia.tar.gz -C /usr/local --strip-components 1
rm /tmp/julia.tar.gz
fi
julia -e 'using Pkg; pkg"add Plots; add PyPlot; add IJulia; add Knet;"'
julia -e 'using Pkg; pkg"build Knet;"'
When the runtime type becomes Julia, clicking on the Mount Drive button returns the following error message:
Mounting your Google Drive is only available on hosted Python runtimes.
When I mount the drive while the runtime type is still Python and then convert the runtime type to Julia, Colab clears everything, including the mounted drive. So this method does not work either.
When I try to upload the dataset to Colab from my computer, everything starts smoothly. However, each time I try to upload the dataset from my computer instead of mounting the drive, I face one of two problems: either the upload process fails, or Colab stops the Julia runtime due to inactivity (how can I be active without my dataset?). When the upload process stops without uploading the file completely, the yellow-green circle in the bottom-left part of the page, which indicates the percentage of the task completed, turns completely red. It gives no error message other than this red circle. When I download the uploaded (incomplete) file back to my computer, I see that it is only around 20MB (the original file was 400MB), so I can tell that the upload process has failed.
The same question has been asked here before. However, the answer suggests mounting the drive in Python runtime and changing the runtime type after that. This does not work for me because when the runtime changes, everything goes away as I stated above.
By the way, my dataset cannot be found anywhere else, so the sample datasets folder does not work for me.
So, how can I use my dataset on Google Colab with Julia?
If the dataset is not top secret, you can share it publicly and use the gdown command to download it:
run(`gdown --id 1-7dVdjCIZIxh8hHJnGTK-RA1-jL1tor4`)
Here 1-7dV...or4 is the file_id taken from the shared URL.
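If gdown is not already available in the runtime, it can usually be installed with pip first (assuming pip is on the PATH); from Julia the same shell commands can be wrapped in run(`...`):
# install gdown, then fetch the shared file by its ID into the current directory
pip install gdown
gdown --id 1-7dVdjCIZIxh8hHJnGTK-RA1-jL1tor4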
I am trying to run a script to download some weather data from a NOAA ftp site.
When I attempt to run the following command:
system("wget ftp://ftp.ncdc.noaa.gov/pub/data/noaa/2016/999999-54856-2016.gz")
it returns status 127, which as I understand it simply means the command could not be run.
The link itself, on the other hand, seems to work fine and downloads the file when I open it in the browser.
I read online about adding the path 'C:\Rtools\bin', based on this link: Create zip file: error running command " " had status 127, but that doesn't seem to work either.
I'm wondering if this might be a permissions issue or other security setting preventing me from invoking system commands.
Any ideas?
Thanks!
You're using Windows, and wget is a Unix/Linux program that isn't installed there by default (which is why the command can't be run). You can just call download.file to download from within R:
download.file("ftp://ftp.ncdc.noaa.gov/pub/data/noaa/2016/999999-54856-2016.gz",
"999999-54856-2016.gz", mode="wb")
The mode="wb" is important for downloading binary files on Windows.
On my Unix server, I execute this command in the shell to copy all the content from folderc.
wget -r -nH --accept=ismv,ismc,ism,jpg --cut-dirs=5 --level=0 --directory-prefix="/root/sstest" -o /root/sstest2.log http://site.com/foldera/folderb/folderc/
All the content from folderc is actually copied to /root/sstest.
However, wget does not exit after copying and does not return me to the command prompt.
What could be causing this behaviour?
I had the same problem, and I just added single quotes to the front and end of the URL.
This resolved the issue for me.
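Applied to the command from the question, that means quoting the URL like this:
wget -r -nH --accept=ismv,ismc,ism,jpg --cut-dirs=5 --level=0 --directory-prefix="/root/sstest" -o /root/sstest2.log 'http://site.com/foldera/folderb/folderc/'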
It's possible that the HTTP server miscommunicates the length of a response, so that Wget keeps waiting for more data. It could be due to a bug in Wget or in the server (or a software component running on the server) which you don't notice in an interactive web browser.
To debug this, make sure you are running the latest version of Wget. If the problem persists, use the -d flag to collect the debug output, and send a report about the misbehavior to the Wget developers at bug-wget@gnu.org. Be sure to strip sensitive data, such as passwords or internal host names, from the report before sending it.
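For example, the debug output can be captured to a log file for the report (using the URL from the question above as a placeholder):
# -d enables debug output, -o writes all of wget's output to a log file
wget -d -o wget-debug.log 'http://site.com/foldera/folderb/folderc/'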
I observe a similar problem when downloading files from Dropbox with wget:
the download finishes (the file is complete)
wget (or curl, depending on what I use for the download) no longer shows up in the list of running processes once the file is complete
wget (or curl) does not return to the command prompt
returning to the command prompt can be "forced" by simply hitting Enter; I do not have to kill any process, it is just sort of stuck until I press Enter one more time
The problem is not wget-specific; it also occurs when I try to download the same file from the same location with curl. The problem does not occur at all when I download the same file from several Unix web servers, neither with wget nor with curl.
I have tried using timeout (with a sufficiently long time limit) to force wget/curl to return to the command prompt, but they do not return to the command prompt even after timeout kills them.
I have installed XULRunner 11.0 (xr) from here:
Downloads - sqlite-manager - Extension for Firefox and other apps to manage any sqlite database - Google Project Hosting
I have followed the steps listed here:
kiveo - Mac SQLite Manager Standalone App
I have read and tried the suggestions here (though they're for version 6.0):
stackoverflow: How to Install and run a XulRunner Application on Mac OS X?
I am able to get the help listing with this command:
/Library/Frameworks/XUL.framework/xulrunner-bin -h
I am able to run the app from Firefox using this command (after changing the max version in sqlitemanager-xr-0/application.ini to 11.0 from 11.0a1):
/Applications/Firefox.app/Contents/MacOS/firefox --app ~/Downloads/sqlitemanager-xr-0/application.ini
Here are the contents of the application.ini file:
[App]
Name=sqlite-manager
ID=SQLiteManager@mrinalkant.blogspot.com
Version=0.7.7
BuildID=201111132204
Vendor=lazierthanthou
Copyright=Copyright (c) 2008 - 2011 lazierthanthou
[Gecko]
MinVersion=2.0
MaxVersion=11.0
[XRE]
EnableExtensionManager=1
When I run the following command in Terminal, with or without sudo, it just immediately returns to the command prompt. There are no error messages. No application appears under Applications. Nothing seems to happen at all. (And, despite the stackoverflow page above noting that --install-app may not really be supported, it is in the XULRunner help listing - which I guess doesn't necessarily mean it'll work ;)
/Library/Frameworks/XUL.framework/xulrunner-bin --install-app Downloads/sqlitemanager-xr-0/ /Applications
Following a suggestion below, I checked for an exit code. The line above is returning 2.
Help?
Just like you did with Firefox, this command should run your app:
/Library/Frameworks/XUL.framework/xulrunner-bin --app ~/Downloads/sqlitemanager-xr-0/application.ini
Also, the --app switch is optional within XULRunner.
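So passing the application.ini directly should presumably launch it as well:
/Library/Frameworks/XUL.framework/xulrunner-bin ~/Downloads/sqlitemanager-xr-0/application.ini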
Here's how you can make a self-contained application you can run from the Dock.
Use the xulrunner --install-app command to create the application and then copy all contents of XUL.framework/Versions/Current into the generated application at /Applications/sqlite-manager.app/Contents/MacOS.
You can then create a wrapper script that runs the xulrunner within the generated app with the application.ini file as described here.
For example, put the following into sqlite-manager.app/Contents/MacOS/sqlite-manager and make it executable.
#!/usr/bin/env bash
APP_PATH="/Applications/sqlite-manager.app"
"$APP_PATH/Contents/MacOS/xulrunner" --app "$APP_PATH/Contents/Resources/application.ini"
Now you have to tell OS X to run sqlite-manager instead of xulrunner. You can do that by editing sqlite-manager.app/Contents/info.plist and setting CFBundleExecutable to sqlite-manager like this:
<key>CFBundleExecutable</key>
<string>sqlite-manager</string>
The only limitation of this approach is that it breaks when you move the application or rename it. I'd love suggestions on how to get rid of the absolute path within the sqlite-manager script.
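One possible way to drop the absolute path (an untested sketch) is to derive the bundle location from the wrapper script's own path:
#!/usr/bin/env bash
# locate the directory this script lives in (Contents/MacOS inside the bundle)
MACOS_DIR="$(cd "$(dirname "$0")" && pwd)"
# two levels up is the .app bundle itself
APP_PATH="$(dirname "$(dirname "$MACOS_DIR")")"
"$MACOS_DIR/xulrunner" --app "$APP_PATH/Contents/Resources/application.ini"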
try this:
firefox -chrome chrome://sqlitemanager/content/sqlitemanager.xul
or on OS X
/Applications/Firefox.app/Contents/MacOS/firefox -chrome chrome://sqlitemanager/content/sqlitemanager.xul
(found on http://www.egeek.me/2013/09/07/how-to-run-sqlite-manager-with-a-single-command/)
This works fine for me on Ubuntu 12.04 to start SQLite Manager without starting Firefox first.
If the install was successful, I think the app should be available in some usual place for your system (which wasn't mentioned, but I'm guessing OSX :). Have you looked under /Applications?
To see whether the command failed quietly, you could check its return value. Is there a verbose switch?
$ cd narnia
bash: cd: narnia: No such file or directory
$ echo $?
1
$ cd .
$ echo $?
0
$ cd narnia && echo "success"
bash: cd: narnia: No such file or directory
$ cd . && echo "success"
success
I am a novice as far as cloud computing goes, but I get the concept and am pretty good at following instructions. I'd like to do some simulations on my data and each step takes several minutes. Given the hierarchy in my data, it takes several hours for each set. I'd like to speed this up by running it on Amazon's EC2 cloud.
After reading this, I know how to launch an AMI, connect to it via the shell, and launch R at the command prompt.
What I'd like help on is being able to copy data (.rdata files) and a script and just source it at the R command prompt. Then, once all the results are written to new .rdata files, I'd like to copy them back to my local machine.
How do I do this?
I don't know much about R, but I do similar things with other languages. What I suggest would probably give you some ideas.
Set up an FTP server on your local machine.
Create a "startup script" that you launch with your instance.
Let the startup script download the R files from your local machine, initialize R and do the calculations, then upload the new files back to your machine.
Startup script:
#!/bin/bash
set -e -x
# install curl plus any other packages you need (e.g. r-base for R itself)
apt-get update && apt-get install -y curl r-base
# fetch the R script/data from the FTP server on your local machine
wget -O /mnt/data_old.R ftp://yourlocalmachine:21/r_files
# run the script; it should write its results to /mnt/data_new.r
R CMD BATCH /mnt/data_old.R /mnt/data_old.Rout
# upload the results back to your local machine
/usr/bin/curl -T /mnt/data_new.r -u user:pass ftp://yourlocalmachine:21/new_r_files
Start the instance with the startup script:
ec2-run-instances --key KEYPAIR --user-data-file my_start_up_script ami-xxxxxx
First, I'd use Amazon S3 for storing the files, both from your local machine and back from the instance.
As stated before, you can create startup scripts, or even bundle your own customized AMI with all the needed settings and run your instances from it.
So: download the files from a bucket in S3, execute and process them, and finally upload the results back to the same (or a different) bucket in S3.
Assuming the data is small (startup scripts can only be so big), S3's cost and usability would be very effective.
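A rough sketch of that flow using the AWS command-line tools (the bucket and file names are placeholders, and the instance needs S3 credentials configured):
# pull the script and data down from S3
aws s3 cp s3://my-bucket/analysis.R /mnt/analysis.R
aws s3 cp s3://my-bucket/input.rdata /mnt/input.rdata
# run the analysis; the script is expected to write /mnt/results.rdata
R CMD BATCH /mnt/analysis.R
# push the results back to the bucket
aws s3 cp /mnt/results.rdata s3://my-bucket/results.rdata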