download.file in R including pre-requisites

I'm trying to use download.file to get some webpages including embedded images, etc. I think with wget it would be the equivalent of the -p -k options, but I can't see how to do this...
If I do:
download.file("http://guardian.co.uk","test.html")
that obviously works. But when I run:
download.file("http://guardian.co.uk","test.html", method = "wget", extra = "-p -k") #no recursion (-r), but get pre-requisites (-p) and convert for local viewing (-k)
I get this error:
Warning messages:
1: running command 'wget -p -k "http://guardian.co.uk" -O "test.html"' had status 1
2: In download.file("http://guardian.co.uk", "test.html", method = "wget", :
download had nonzero exit status
I've checked Sys.which("wget") and the path is set (and I'm not trying to access https, which I think can cause issues).
Once I've done this I actually want to put it into a loop where I download a set of urls (& their embedded content) to create a single html output...

Easy solution, just use system to call wget directly:
system("wget http://guardian.co.uk -p -k")
I think the issue is that passing an output file ('test.html') means the -O option gets specified, which forces everything into a single file and conflicts with -p -k; calling wget directly lets it save the files separately.
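If the end goal is a loop over a set of URLs, you can drive wget the same way; here is a minimal shell sketch (the URL list and the saved_pages output directory are illustrative, and from R each wget line could be wrapped in a system() call):
for url in "http://guardian.co.uk" "http://www.bbc.co.uk"; do
  # -p fetches page requisites, -k converts links for local viewing,
  # -P sets the directory the files are saved under
  wget -p -k -P saved_pages "$url"
done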


Unable to export env variable from script

I'm currently struggling with a .sh script I'm trying to trigger from Jenkins.
Within the Jenkins "execute shell" section, I'm connecting to a remote server (the Jenkins agent does not have the right OS to build what I need), using:
cp -r . /to/shared/drive/to/have/access/on/remote
ssh -t -t username@servername << EOF
cd /to/shared/drive/to/have/access/on/remote
source build.sh dev
exit
EOF
Inside build.sh, I'm exporting R_LIBS to build a package for different R versions.
...
for path in "${!rVersionPaths[@]}"; do
export R_LIBS="${path}"
Rscript -e 'install.packages(c("someDependency", "someOtherDependency"), repos="http://cran.r-project.org");'
...
Setting R_LIBS should function here like setting lib within install.packages(...). For some reason the R_LIBS export doesn't get picked up. Other env variables like http_proxy are ignored as well, which causes any requests outside the network to fail.
Is there any particular way of achieving this?
Maybe pass those variables with env, like
env R_LIBS="${path}" Rscript -e 'install.packages(c("someDependency", .....
I'm not able to comment on the question, so I'm posting this as an answer.
I had a similar problem when calling a remote shell script from Jenkins: somehow the .bash_profile variables were not loaded when the script was called from Jenkins, even though it worked locally. Loading the bash profile in the SSH connection solved it for me.
Source the bash profile in build.sh:
. ~/.bash_profile OR source ~/.bash_profile
Or
Reload the bash profile in the SSH connection:
ssh -t -t username@servername << EOF
. ~/.bash_profile
your commands here
exit
EOF
You can set that variable in the same command line like this:
R_LIBS="${path}" Rscript -e \
'install.packages(c("someDependency", "someOtherDependency"), repos="http://cran.r-project.org");'
It's possible to set multiple variables this way. Note that this sets those environment variables only for the command being called after them (and its child processes).
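For example, since the question also sets http_proxy, both overrides can be combined on one line (a sketch; the proxy URL is illustrative):
R_LIBS="${path}" http_proxy="http://proxy.example.com:8080" Rscript -e 'install.packages(c("someDependency"), repos="http://cran.r-project.org")'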
You said that the "R_LIBS export doesn't get picked up". Question: is the value unset, or is it set to some other value that you are trying to override?
It is possible that SSH is invoking /bin/sh -c. Based on the second answer to "Why does 'cd' command not work via SSH?", you can simplify the SSH command and explicitly invoke the build.sh script in Bash:
cp -r . /to/shared/drive/to/have/access/on/remote
ssh -t -t username@servername "cd /to/shared/drive/to/have/access/on/remote && bash -f build.sh dev"
This makes the SSH invocation more similar to invoking the command within a remote interactive shell. (You can avoid sourcing scripts and exporting variables.)
You don't need export R_LIBS or env R_LIBS when it is possible to prefix any command with local environment variable overrides (this agrees with Luis' answer):
...
for path in "${!rVersionPaths[@]}"; do
R_LIBS="${path}" Rscript -e 'install.packages(c("someDependency", "someOtherDependency"), repos="http://cran.r-project.org");'
...
Rscript may be doing a lot with env vars. You can verify that you are setting the R_LIBS env var by replacing Rscript with the env command and observing the output:
...
for path in "${!rVersionPaths[@]}"; do
R_LIBS="${path}" env
...
According to this manual "Initialization at Start of an R Session", Rscript looks in several places to load "site and user files":
$R_PROFILE
$R_HOME/etc/Renviron
$R_HOME/etc/Renviron.site
$R_ENVIRON_USER
$R_PROFILE_USER
./.Rprofile
$HOME/.Rprofile
./.RData
The "Examples" section of that manual shows this:
## Not run:
## Example ~/.Renviron on Unix
R_LIBS=~/R/library
PAGER=/usr/local/bin/less
If you add the --vanilla command-line option to ignore all of these files and you then get different results, you'll know that something in those site/init/environ files is affecting your R_LIBS! I cannot run this system myself, but hopefully this gives you some areas to investigate.
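As a quick check, you can print the library search path R actually resolves, with and without the startup files, and compare the two (a sketch reusing the ${path} variable from the question's loop; .libPaths() is R's built-in for this):
R_LIBS="${path}" Rscript -e '.libPaths()'
R_LIBS="${path}" Rscript --vanilla -e '.libPaths()'
If the two outputs differ, one of the files listed above is overriding your setting.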
You probably don't want to source build.sh; just invoke it directly (i.e. remove the source command).
By sourcing the file, your script is executed by the SSH shell (likely sh) rather than by bash, which is what it sounds like you intended.

Getting `Cannot mix POST with other methods` error when using `ab -p`

I'm using ApacheBench (ab) to stress test my site. When I specify a method -m POST and some postdata -p {datafile}, I get the message
Cannot mix POST with other methods.
The trouble is that I'm not actually mixing POST with other methods. Here's my full command:
ab -m POST -p postdata.txt -n 1000 -c 100 http://example.com/
This is due to an idiosyncrasy in the way ab handles command-line arguments. When you use -p it automatically sets the method to POST for you, and this happens before -m is parsed. So when it parses -m, it sees that the already set method is not null and throws an error. What it should do (IMO) is silently ignore the parameter if its value is the same as what's implicitly been set.
Note that everything above also applies when you try to do a PUT request; e.g., ab -m PUT -u putdata.txt.
So what you should do to avoid this error is never specify -m when you're using -p or -u.
(Source: the ab.c source file)
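Applied to the asker's command, the fix is simply to drop -m POST and let -p imply the method:
ab -p postdata.txt -n 1000 -c 100 http://example.com/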

How do I use the nohup command without getting nohup.out?

I have a problem with the nohup command.
When I run my job, I have a lot of data. The output nohup.out becomes too large and my process slows down. How can I run this command without getting nohup.out?
The nohup command only writes to nohup.out if the output would otherwise go to the terminal. If you have redirected the output of the command somewhere else - including /dev/null - that's where it goes instead.
nohup command >/dev/null 2>&1 # doesn't create nohup.out
Note that the >/dev/null 2>&1 sequence can be abbreviated to just >&/dev/null in most (but not all) shells.
If you're using nohup, that probably means you want to run the command in the background by putting another & on the end of the whole thing:
nohup command >/dev/null 2>&1 & # runs in background, still doesn't create nohup.out
On Linux, running a job with nohup automatically closes its input as well. On other systems, notably BSD and macOS, that is not the case, so when running in the background, you might want to close input manually. While closing input has no effect on the creation or not of nohup.out, it avoids another problem: if a background process tries to read anything from standard input, it will pause, waiting for you to bring it back to the foreground and type something. So the extra-safe version looks like this:
nohup command </dev/null >/dev/null 2>&1 & # completely detached from terminal
Note, however, that this does not prevent the command from accessing the terminal directly, nor does it remove it from your shell's process group. If you want to do the latter, and you are running bash, ksh, or zsh, you can do so by running disown with no argument as the next command. That will mean the background process is no longer associated with a shell "job" and will not have any signals forwarded to it from the shell. (A disowned process gets no signals forwarded to it automatically by its parent shell - but without nohup, it will still receive a HUP signal sent via other means, such as a manual kill command. A nohup'ed process ignores any and all HUP signals, no matter how they are sent.)
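Putting those steps together (a sketch; command stands for whatever you are running):
nohup command </dev/null >/dev/null 2>&1 &  # fully detached from the terminal
disown                                      # drop it from the shell's job table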
Explanation:
In Unixy systems, every source of input or target of output has a number associated with it called a "file descriptor", or "fd" for short. Every running program ("process") has its own set of these, and when a new process starts up it has three of them already open: "standard input", which is fd 0, is open for the process to read from, while "standard output" (fd 1) and "standard error" (fd 2) are open for it to write to. If you just run a command in a terminal window, then by default, anything you type goes to its standard input, while both its standard output and standard error get sent to that window.
But you can ask the shell to change where any or all of those file descriptors point before launching the command; that's what the redirection (<, <<, >, >>) and pipe (|) operators do.
The pipe is the simplest of these... command1 | command2 arranges for the standard output of command1 to feed directly into the standard input of command2. This is a very handy arrangement that has led to a particular design pattern in UNIX tools (and explains the existence of standard error, which allows a program to send messages to the user even though its output is going into the next program in the pipeline). But you can only pipe standard output to standard input; you can't send any other file descriptors to a pipe without some juggling.
The redirection operators are friendlier in that they let you specify which file descriptor to redirect. So 0<infile reads standard input from the file named infile, while 2>>logfile appends standard error to the end of the file named logfile. If you don't specify a number, then input redirection defaults to fd 0 (< is the same as 0<), while output redirection defaults to fd 1 (> is the same as 1>).
Also, you can combine file descriptors together: 2>&1 means "send standard error wherever standard output is going". That means that you get a single stream of output that includes both standard out and standard error intermixed with no way to separate them anymore, but it also means that you can include standard error in a pipe.
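For example, this sketch merges standard error into standard output so both streams reach the pipe (command and output.log are illustrative):
command 2>&1 | tee output.log  # tee sees stdout and stderr intermixed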
So the sequence >/dev/null 2>&1 means "send standard output to /dev/null" (which is a special device that just throws away whatever you write to it) "and then send standard error to wherever standard output is going" (which we just made sure was /dev/null). Basically, "throw away whatever this command writes to either file descriptor".
When nohup detects that neither its standard error nor output is attached to a terminal, it doesn't bother to create nohup.out, but assumes that the output is already redirected where the user wants it to go.
The /dev/null device works for input, too; if you run a command with </dev/null, then any attempt by that command to read from standard input will instantly encounter end-of-file. Note that the merge syntax won't have the same effect here; it only works to point a file descriptor to another one that's open in the same direction (input or output). The shell will let you do >/dev/null <&1, but that winds up creating a process with an input file descriptor open on an output stream, so instead of just hitting end-of-file, any read attempt will trigger a fatal "invalid file descriptor" error.
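You can see that end-of-file behavior with something as simple as:
cat </dev/null  # the read hits EOF immediately, so cat prints nothing and exits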
nohup some_command > /dev/null 2>&1 &
That's all you need to do!
Have you tried redirecting all three I/O streams:
nohup ./yourprogram > foo.out 2> foo.err < /dev/null &
You might want to use the detach program. You use it like nohup but it doesn't produce an output log unless you tell it to. Here is the man page:
NAME
detach - run a command after detaching from the terminal
SYNOPSIS
detach [options] [--] command [args]
Forks a new process, detaches it from the terminal, and executes command with the specified arguments.
OPTIONS
detach recognizes a couple of options, which are discussed below. The
special option -- is used to signal that the rest of the arguments are
the command and args to be passed to it.
-e file
Connect file to the standard error of the command.
-f Run in the foreground (do not fork).
-i file
Connect file to the standard input of the command.
-o file
Connect file to the standard output of the command.
-p file
Write the pid of the detached process to file.
EXAMPLE
detach xterm
Start an xterm that will not be closed when the current shell exits.
AUTHOR
detach was written by Robbert Haarman. See http://inglorion.net/ for
contact information.
Note I have no affiliation with the author of the program. I'm only a satisfied user of the program.
The following command will let you run something in the background without getting nohup.out:
nohup command | tee &
In this way, you will be able to get console output while running a script on the remote server:
sudo bash -c "nohup /opt/viptel/viptel_bin/log.sh $* &> /dev/null" &
Redirecting the output of sudo causes sudo to ask for the password again, hence this awkward mechanism is needed for this variant.
If you have a bash shell on your Mac/Linux machine in front of you, try out the steps below to understand redirection in practice:
Create a two-line script called zz.sh:
#!/bin/bash
echo "Hello. This is a proper command"
junk_errorcommand
The echo command's output goes to the STDOUT file stream (file descriptor 1).
The error command's output goes to the STDERR file stream (file descriptor 2).
Currently, simply executing the script sends both STDOUT and STDERR to the screen.
./zz.sh
Now start with standard redirection:
./zz.sh > zfile.txt
In the above, the "echo" output (STDOUT) goes into zfile.txt, whereas the "error" output (STDERR) is displayed on the screen.
The above is the same as :
./zz.sh 1> zfile.txt
Now you can try the opposite and redirect the "error" output (STDERR) into the file; the STDOUT from the "echo" command goes to the screen:
./zz.sh 2> zfile.txt
Combining the above two, you get:
./zz.sh 1> zfile.txt 2>&1
Explanation:
FIRST, send STDOUT 1 to zfile.txt
THEN, send STDERR 2 to STDOUT 1 itself (by using the &1 pointer).
Therefore, both 1 and 2 go into the same file (zfile.txt).
Finally, you can pack the whole thing inside nohup ... & to run it in the background:
nohup ./zz.sh 1> zfile.txt 2>&1 &
You can run the command below.
nohup <your command> > <outputfile> 2>&1 &
e.g. inside a script I have a nohup command like this:
nohup ./Runjob.sh > sparkConcurrent.out 2>&1 &

problem while doing gzip over ssh

I am getting the error below when running the gzip command over ssh:
ssh 123@HPUX "gzip"
ksh: gzip: not found
whereas if I run tar in the same way, it works properly:
ssh 123@HPUX "tar"
tar: usage tar [-]{txruc}[eONvVwAfblhm{op}][0-7[lmh]] [tapefile] [blocksize] [[-C directory] file] ...
Can you please suggest why I am getting this error and how I can overcome it?
When I try the following, gzip works properly:
ssh 123@HPUX
gzip
gzip: compressed data not written to a terminal. Use -f to force compression.
For help, type: gzip -h
which means that gzip is working.
Your $PATH may be set differently for an interactive login session versus executing a single command via ssh. Does it work if you specify an absolute path to gzip?
Try logging in interactively and use the command which gzip to show where the binary is. Perhaps it's something like /usr/local/gnu/gzip. (You might want to do echo $PATH too, and make a note of it for comparison purposes.) Then try using that path in your batch SSH command, i.e. ssh 123@HPUX "/usr/local/gnu/gzip", to see what happens. The command ssh 123@HPUX 'echo $PATH' (note single quotes!) should tell you how your $PATH is set in that context; if you compare that to your interactive $PATH, you'll probably see a difference that explains why gzip isn't found in the first version of your batch command.
Wild guess: it's ksh raising the error the first time. When you do a full ssh login, are you using ksh? Are you running any scripts that modify its path?

Post to Twitter using Terminal with CURL

I got this far:
curl -u username:password -d status="new_status" http://twitter.com/statuses/update.xml
Now, how can I alias this with variables so I can easily tweet from Terminal? And how can I make the alias persist across sessions (aliases reset when I close Terminal)?
Thanks!
Basic Authentication is no longer supported by Twitter. Please use OAuth.
You clearly have the alias command: stick it in your ~/.bashrc and it will be set up when your bash shell starts. (.shrc should also work for sh-like shells.)
If you stick it in a script file as the previous answer suggests:
(a) add the line
#!/bin/sh
at the top;
(b) make sure it's on your path or you'll have to type the whole path to the script when you want to run it.
(c) to make it executable,
chmod +x tweet.sh
What about putting it in a file and using the first argument as $1:
# tweet.sh "post my status, moron!":
curl -u username:password -d status="$1" http://twitter.com/statuses/update.xml
will that work?
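Assuming the file is saved as tweet.sh and made executable (chmod +x tweet.sh, as noted above), usage would then look like:
./tweet.sh "post my status, moron!"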
You need to create a file in your home directory that will get referenced each time a new terminal opens.
Do a bit of research as to what to name the file, according to what type of shell you are using (tcsh looks for a file called .tcshrc while bash looks for .bashrc).
Once you have that file, make it executable by running:
chmod +x name_of_file
Then, in that file, create your alias (again, you'll need to research how to do this depending on what type of shell you are using). For tcsh, my alias looks like this:
alias tw 'curl -u username:password -d status=\!^ http://twitter.com/statuses/update.xml'
Bash aliases use an equals sign. A bash alias would look something more like this:
alias tw='curl -u username:password -d status=\!^ http://twitter.com/statuses/update.xml'
Note the change in the command after "status=". The \!^ tells the line of code to insert the first argument passed after the alias itself.
Save your file.
You could then run an update to twitter by typing the following in a new terminal:
tw 'my first post to twitter via the terminal, using aliases'
Don't forget to escape 'special' characters (like exclamations) with the escape character, \ (i.e. \!)
Since Basic Authentication is no longer supported by Twitter, you have to use OAuth to achieve your goal.
But if you just want to post to Twitter from the terminal, there are many applications that can do it.
Take a look at Rainbowstream or t.
With rainbowstream, the following lines will let you tweet from the console:
$ sudo pip install rainbowstream
$ rainbowstream
[@yourscreenname]: t whatever you want
