Can I force Rscript to pass wildcard in command line arguments? - r

As best I can tell (I haven't had much luck finding documentation), when Rscript is run with a command-line argument that includes a wildcard *, the argument is expanded to a character vector of the matching file paths, or passed through unchanged if there are no matches. Is there a way to always pass the wildcard itself, so I can handle it within the script (using Sys.glob, for example)?
Here's a minimal example, run from the terminal:
ls
## foo.csv bar.csv baz.txt
Rscript -e "print(commandArgs(T))" *.csv
## [1] "foo.csv" "bar.csv"
Rscript -e "print(commandArgs(T))" *.txt
## [1] "baz.txt"
Rscript -e "print(commandArgs(T))" *.rds
## [1] "*.rds"
EDIT: I have learned that this behavior is from bash, not Rscript. Is there some way to work around this behavior from within R, or to suppress wildcard expansion for a particular R script but not the Rscript command? In my particular case, I want to run a function with two arguments, Rscript collapse.R *.rds out.rds that concatenates the contents of many individual RDS files into a list and saves the result in out.rds. But since the wildcard gets expanded before being passed to R, I have no way of checking whether the second argument has been supplied.

If I understand correctly, you don't want bash to expand the wildcard for you; you want to pass the expression itself, e.g. *.csv, to the script. Some options:
Pass the expression as quoted text and process it within R, for example as a regular expression for list.files:
Rscript -e "list.files(pattern = commandArgs(T))" "\.csv$"
Pass just the extension and build the pattern within R:
Rscript -e "list.files(pattern = paste0('\\\\.', commandArgs(T)))" "csv$"
Disable globbing for that command, which is more complicated and usually unnecessary: see Stop shell wildcard character expansion?
Note: I've changed the argument from a glob to a regex, anchored with $, so that it doesn't match too greedily.
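To make the shell side of this concrete, here is a minimal sketch of how quoting and set -f change what a command receives. show_args is a hypothetical helper standing in for Rscript collapse.R, and the scratch directory and file names are made up for the demonstration.

```shell
#!/bin/sh
# Hypothetical helper standing in for "Rscript collapse.R ...":
# prints how many arguments it received, then each one on its own line.
show_args() {
    echo "$# argument(s)"
    for arg in "$@"; do
        echo "  $arg"
    done
}

# Work in a scratch directory so the glob has known matches.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch foo.rds bar.rds

show_args *.rds out.rds      # shell expands *.rds first: 3 arguments
show_args '*.rds' out.rds    # quoted pattern is passed literally: 2 arguments

set -f                       # turn pathname expansion off in this shell
show_args *.rds out.rds      # unquoted pattern is now literal too: 2 arguments
set +f                       # turn pathname expansion back on
```

With the quoted form the script always sees exactly two arguments, so it can check that the output file was supplied and expand the pattern itself, for instance with Sys.glob.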

Related

Pass output of R script to bash script

I would like to pass the output of my R file to a bash script.
The output of the R script being the title of a video: "Title of Video"
And my bash script being simply:
youtube-dl title_of_video.avi https://www.youtube.com/watch?v=w82a1FTere5o88
Ideally I would like the output of the video being "Title of Video".avi
I know I can use Rscript to launch an R script with a bash command but I don't think Rscript can help me here.
In bash you can call a command and use its output further via the $(my_command) syntax.
Make sure your script only outputs the title.
E.g.
# in getTitle.R
cat('This is my title') # note, print would have resulted in an extra "[1]"
# in your bash script:
youtube-dl "$(Rscript getTitle.R)" http://blablabla
(the double quotes keep a title that contains spaces together as a single argument)
If you want to pass arguments to your R script as well, do so inside the $() syntax. If you want to pass arguments to your bash script and delegate them to the R script, use the special bash variables $1 (the first argument), $2 and so on, or "$@" for all arguments passed to the bash script, e.g.
#!/bin/bash
youtube-dl "$(Rscript getTitle.R "$1")" http://blablablabla
(this assumes that getTitle.R processes those arguments internally, via commandArgs etc., to produce the wanted title output)
You can call Rscript in a bash script and assign the output of the R script to a variable in that script. After that you can execute
youtube-dl "$outputFromRScript" https://www.youtube.com/watch?v=w82a1FTere5o88
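Because the title contains spaces, whether the command substitution is quoted matters. The following is a small sketch of the difference; get_title and count_args are hypothetical stand-ins for Rscript getTitle.R and youtube-dl, so that it runs without either installed.

```shell
#!/bin/sh
# Hypothetical stand-in for "Rscript getTitle.R":
# writes only the title, with no trailing newline.
get_title() {
    printf '%s' 'Title of Video'
}

# Hypothetical stand-in for youtube-dl: reports how many arguments it was given.
count_args() {
    echo "$#"
}

count_args $(get_title) http://blablabla     # unquoted: title splits into 3 words, prints 4
count_args "$(get_title)" http://blablabla   # quoted: title stays one argument, prints 2

title=$(get_title)                           # assignment itself does not word-split
count_args "$title.avi" http://blablabla     # prints 2
```

The same applies when the output is stored in a variable first: the assignment is safe, but the later expansion still needs double quotes.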

Is it feasible to narrow down the result returned by ls() with grep in R, much like the `ls -l | grep` command in UNIX?

In Terminal/shell script, you can list all files in the current directory with ls -l, and then pipe it to execute an additional command. For example, ls -l | grep -i "calc" returns all files whose filename includes calc. In R, you can list all objects currently stored in the workspace, with ls() command.
However, I want to narrow down the list returned by ls() with something like grep in R, where the input is the list returned by ls() and the output is the narrowed-down list, much like the UNIX pipe mentioned above. Is that feasible in R?
Also, is it feasible to narrow down the list with xargs-like functionality in R? I would like to get only the objects whose source contains the literal if: if a function in the list returned by ls() has an if-else condition inside it, I want to display that function in the console. In Terminal you can do this with find . | xargs grep "if" (those are files in the current directory, not R objects in the workspace, but I show it for the purpose of illustration).
Note that this is not a post on how to call shell commands from within R. It's not what I want to do.
I use OS X 10.9.3 and R 3.1.0.
ls() has a pattern parameter that might be what you need:
pattern an optional regular expression. Only names matching pattern
are returned. glob2rx can be used to convert wildcard patterns
to regular expressions.
For the second part of your question, you could use capture.output(getAnywhere()) and grep to look inside function source. You'll need to pass in the functions to that and I'd make that whole operation a function to keep the implementation clean.
You can do
grep("calc", list.files(), value = TRUE, ignore.case = TRUE)
which should "emulate" ls -l | grep -i "calc" (ignore.case = TRUE plays the role of -i). See ?list.files and ?grep.

How to pass parameters in system call in Perl?

I'm calling an R script from a Perl program, passing Perl variables to it with the system command.
#!/usr/bin/perl
$file1= "Test1.txt";
$file2= "Test2.txt";
$val="Rscript Test.R ".$file1." ".$file2;
print($val,"\n");
system('Rscript Test.R', $file1, $file2);
But it does not call the R script and pass the file1 and file2 values. How can I fix this?
When using the system LIST syntax, make every argument a separate element of the list; otherwise, "Rscript Test.R" is taken as the name of a single command.
system('Rscript', 'Test.R', $file1, $file2);

Executing expressions in Rscript.exe

I'd like to put some expressions that write stuff to a file directly into a call to Rscript.exe (without specifying file in Rscript [options] [-e expression] file [args] and thus without an explicit R script that is run).
Everything seems to work except the fact that the desired file is not created. What am I doing wrong?
# Works:
shell("Rscript -e print(Sys.time())")
# Works:
write(Sys.time(), file='c:/temp/systime.txt')
# No error, but no file created:
shell("Rscript -e write(Sys.time(), file='c:/temp/systime.txt')")
Rscript parses its command line using spaces as separators. If your R string contains embedded spaces, you need to wrap it within quotes to make sure it gets sent as a complete unit to the R parser.
You also don't need to use shell unless you specifically need features of cmd.exe.
system("Rscript.exe -e \"write(Sys.time(), file='c:/temp/systime.txt')\"")

How can I implement the command 'ls' with wildcard, '*'?

EDIT #1: I'm under the constraint that all arguments are enclosed in quotes, so that the shell does not expand any argument containing * into the matching paths.
EDIT #2: To handle arguments such as */*, ../*, and dirA/*/file.out, should I use an iterative loop or a recursive call?
I have just learned about the function fnmatch(), but I don't know where to start.
There are many possible cases, and I'm confused about how to deal with all of them.
For example, assume the executable is a.out:
$ ./a.out -l */*
$ ./a.out -l ../*
$ ./a.out -l [file_name] [directory_name]
(The last form is there because I also have to implement the ls command with no wildcard.)
What should I do? Any advice would be awesome.
Thank you in advance.
Your problem is that the shell replaces the wildcard character * with all of the filenames matching the pattern.
Solution:
If you do not want to use this feature of bash, just put quotation marks around your command-line arguments.
Called that way, your program receives the original arguments, wildcards included.
After this, you can list all the filenames with their paths, for example using a recursive algorithm, and apply pattern matching to each path string as you visit it.
If you want to be a good unix citizen, the rule is Don't do filename globbing unless you are writing a shell.
You want to write an ls-like program? Don't do any wildcard expansion. Don't treat "*" specially. Just treat your argv as a list of filenames. If your program handles these cases:
./a.out file1
./a.out file1 file2 file3
Then it will also handle
./a.out file*
correctly because the shell will do the expansion and your program won't need to know about it. And besides that, it will handle this:
zsh% ./a.out **/file<40-185>~file<90-100>(.mm-30OL[1,2])
which in zsh expanded glob syntax means: expand file40 through file185, except for file90 through file100, include only the ones that have been modified in the last 30 minutes, and use only the largest 2 files in the resulting set.
fnmatch is never going to do anything like that. But these fancy globs can be used with any command that just takes a filename list and doesn't care where it came from.
When you're in a situation where you can't take a list of filenames from the command line, then consider using fnmatch. ls isn't one of those situations.
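The "just take a list of filenames" approach can be sketched in a few lines of shell. list_entries is a hypothetical, much reduced stand-in for the a.out program, and the scratch directory and file names are made up for the demonstration; the point is only that the function never inspects its arguments for wildcards.

```shell
#!/bin/sh
# Hypothetical, much reduced stand-in for the ls-like a.out program:
# it never looks for "*" and simply treats every argument as a path.
list_entries() {
    for entry in "$@"; do
        if [ -d "$entry" ]; then
            echo "dir  $entry"
        elif [ -e "$entry" ]; then
            echo "file $entry"
        else
            echo "list_entries: cannot access '$entry'" >&2
        fi
    done
}

# Scratch directory with a known layout.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
mkdir dirA
touch file1 file2 dirA/file.out

list_entries file1      # one explicit name
list_entries file*      # the shell expands this to: file1 file2
list_entries */*        # the shell expands this to: dirA/file.out
list_entries dirA       # a directory name, no wildcard involved
```

All four calls go through the same loop; the wildcard cases work only because the shell has already turned the pattern into a plain list of names.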
