Using rtools grep/pipe combination through a system call - r

I have a file called goodfile. Let's say the contents are
badline
goodline
badline
goodline
badline
badline
On a Windows machine I want to filter this file down to just the "goodline"s before reading it, to save on memory. Thankfully, the rtools installation comes with a grep that should allow me to do that. I should be able to do
if(!pkgbuild::has_rtools()){
stop('install rtools')
}
rtoolsPath = pkgbuild::rtools_path()
grep = file.path(rtoolsPath,'grep.exe')
command = paste(grep, "goodline goodfile")
system(command)
and get
goodline
goodline
However when I try to pipe the output to a file by doing
command = paste(grep, "goodline goodfile > betterfile")
system(command)
I get
goodfile:goodline
goodfile:goodline
/usr/bin/grep: >: No such file or directory
/usr/bin/grep: betterfile: No such file or directory
The error message above is printed and "betterfile" is not generated.
If I run the same command on my command line, it just works, and the same system call with regular grep in R on a Linux machine also just works, so I can't see what the problem is.
I was able to find an alternative way to get the file by doing
system2(grep,
args = c('goodline','goodfile'),stderr = 'betterfile',stdout = 'betterfile')
but I am still curious why the redirection doesn't work.
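A likely explanation, going by the documented behaviour of system() on Windows: it runs the command directly rather than through a shell, so > and betterfile are handed to grep as two more file names. That would match the goodfile: prefixes and the two "No such file or directory" messages. Redirection is a shell feature, so the options are to let system2 do the redirection or to go through shell(). A minimal sketch, assuming a grep is on the PATH (on the asker's machine the rtools grep.exe path would be used instead); the temporary files are stand-ins for goodfile and betterfile:

```r
# Stand-ins for the asker's 'goodfile' and 'betterfile'
goodfile   <- tempfile("goodfile")
betterfile <- tempfile("betterfile")
writeLines(c("badline", "goodline", "badline", "goodline", "badline", "badline"),
           goodfile)

# Assumption: some grep is on the PATH; on Windows with rtools this could be
# file.path(pkgbuild::rtools_path(), "grep.exe") as in the question
grep_exe <- unname(Sys.which("grep"))

if (nzchar(grep_exe)) {
  # system2() performs the redirection itself, so nothing has to interpret '>'
  system2(grep_exe, args = c("goodline", shQuote(goodfile)), stdout = betterfile)
  print(readLines(betterfile))
}

# Windows-only alternative (not run here): shell() hands the whole line to
# cmd.exe, which does understand '>'
# shell(paste(grep, "goodline goodfile > betterfile"))
```

Unlike the workaround in the question, this sends only stdout to the file, so any error messages from grep stay on the console.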

Related

in R, invoke external program in path with spaces with command line parameters

A combination of frustrating problems here.
Essentially I want R to open an external program with command line parameters. I am currently trying to achieve this on a Windows machine, but ideally it would work cross-platform.
The program (chimera.exe) is in a directory containing spaces: C:\Program Files\Chimera1.15\bin\
The command line options could be for instance a --nogui flag and a script name, so from the shell I would write (space-specifics aside):
C:\Program Files\Chimera1.15\bin\chimera.exe --nogui scriptfile
This works if I go in windows cmd.exe to the directory itself and just type chimera.exe --nogui scriptfile
Now in R:
I've been playing with shell(), shell.exec(), and system(), but essentially I fail because of the spaces and/or the path separators.
Most of the time system() just returns 127 for whatever reason:
> system("C:/Program Files/Chimera1.15/bin/chimera.exe")
[1] 127
Switching between backslashes and forward slashes complicates the matter further but doesn't make it work:
> system("C:\Program Files\Chimera1.15\bin\chimera.exe")
Error: '\P' is an unrecognized escape in character string starting "C\P"
> system("C:\\Program Files\\Chimera1.15\\bin\\chimera.exe")
[1] 127
> system("C:\\Program\ Files\\Chimera1.15\\bin\\chimera.exe")
[1] 127
> system("C:\\Program\\ Files\\Chimera1.15\\bin\\chimera.exe")
[1] 127
When I install the program in a directory without spaces, it works. How can I escape or pass on the space in system() or related commands or how do I invoke the program otherwise?
Try system2 as it does not use the cmd line processor and use r"{...}" to avoid having to double backslashes. This assumes R 4.0 or later. See ?Quotes for the full definition of the quotes syntax.
chimera <- r"{C:\Program Files\Chimera1.15\bin\chimera.exe}"
system2(chimera, c("--nogui", "myscript"))
For example, this works for me (you might need to change the path):
R <- r"{C:\Program Files\R\R-4.1\bin\x64\Rgui.exe}" # modify as needed
system2(R, c("abc", "def"))
and when Rgui is launched we can verify that the arguments were passed by running this in the new instance of R:
commandArgs()
## [1] "C:\\PROGRA~1\\R\\R-4.1\\bin\\x64\\Rgui.exe"
## [2] "abc"
## [3] "def"
system
Alternately use system but put quotes around the path so that cmd interprets it correctly -- if it were typed into the Windows cmd line the quotes would be needed too.
system(r"{"C:\Program Files\Chimera1.15\bin\chimera.exe" --nogui myscript}")
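For context, 127 is the value system() returns when the command could not be run at all, which is consistent with the unquoted path being split at the space. Below is a minimal sketch of a wrapper that fails early with a clearer message and otherwise defers to system2, which quotes the command itself. run_external is a hypothetical name, not part of R, and the demo uses R's own Rscript binary as a stand-in so that it does not depend on Chimera being installed:

```r
# Hypothetical helper: check the path first, then let system2() do the quoting
run_external <- function(exe, args = character(), ...) {
  # Fail early instead of getting an opaque status 127 back
  if (!file.exists(exe)) stop("executable not found: ", exe)
  # system2() applies shQuote() to 'exe', so spaces in the path are fine
  system2(exe, args, ...)
}

# Stand-in program: the Rscript that ships with this R installation
rscript <- file.path(R.home("bin"),
                     if (.Platform$OS.type == "windows") "Rscript.exe" else "Rscript")
out <- run_external(rscript, "--version", stdout = TRUE, stderr = TRUE)
print(out)

# On the asker's machine this would look like (untested here):
# run_external(r"{C:\Program Files\Chimera1.15\bin\chimera.exe}",
#              c("--nogui", "scriptfile"))
```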

Error in system(command, intern = TRUE) : '“C:\Program' not found selectWeka function

I'm trying to run the code below from BioSeqClass package, however I get an error message:
Error in system(command, intern = TRUE) : '“C:\Program' not found
selectWeka(data, evaluator="CfsSubsetEval", search="BestFirst", n)
This is a problem with how BioSeqClass is calling java: it is leaving the file names unprotected/unquoted, and R's system and system2 commands are horrible by not forcing quoting. (If you ever think of using these commands directly yourself, I strongly recommend something like processx.)
One should create an issue or bug-report, but I don't know how to do that with Bioconductor, and their mirror on github (https://github.com/Bioconductor-mirror) is defunct, so I'm at a loss there. Hopefully somebody with more info can weigh in on this.
Workarounds
It is not obvious if the problem is due to where weka.jar is located or perhaps one of the other arguments. You can find out where the problem is by debugging the selectWeka function and inspecting the value of command before the system call. Look for the Program Files component of a path.
If the problem is with weka.jar, then this suggests that you are installing packages somewhere under C:\Program Files\, which is in my experience bad practice on two counts:
For many problems I cannot recall (but this one re-ignites the discussion), I never install R in the default location under C:\Program Files\...; instead, I install it under a new directory, C:\R\R-3.5.3 (version-based) and go from there. You may not have control over this if on a university/company system.
Since this is not in a base-R package, this suggests that either you are not using a personal library location (collection of packages), or have placed your personal library for some reason under C:\Program Files. If the former, I strongly suggest you never install new packages inside the base-R installation directory, instead using your own. See ?.libPaths and many other tutorials/discussions on the topic online. Using packrat or checkpoint might also mitigate this problem.
If the problem is with trainFile (if I'm reading the source correctly), then your "permanent" fix is to change where Windows puts temporary files, as this trainFile is a temporary file created specifically for this function-run. If this is your problem, I'll leave it up to you to fix.
Regardless, you may not have the time or need for a more permanent solution; you may just want to run this once or twice and then move on. For that fix:
Again, debug(selectWeka), and once command is defined (the next command to be executed is tmp <- system(command,intern = TRUE)), run this code to fix the value of command:
if(search=="Ranker"){
  command = paste("java -cp ", shQuote(file.path(.path.package("BioSeqClass"), "scripts", "weka.jar")),
                  " weka.attributeSelection.", evaluator, " -i ", shQuote(trainFile),
                  " -s \"weka.attributeSelection.", search, " -N ", n, "\"", sep="" )
}else{
  command = paste("java -cp ", shQuote(file.path(.path.package("BioSeqClass"), "scripts", "weka.jar")),
                  " weka.attributeSelection.", evaluator, " -i ", shQuote(trainFile),
                  " -s weka.attributeSelection.", search, sep="" )
}
(For the record, all I changed was adding shQuote twice to each paste.) Confirm that command now has quotes around things, something like
Browse[2]> command
[1] "java -cp \"C:\\Program Files\\...\\weka.jar\" weka.attributeSel... -i \"c:\\path\\to\\some\\tempfile\" ...
Then you can continue-out of the debugger and let it run its course.
(I hope you don't have hundreds of calls to selectWeka.)
Caveat: I am not a user of BioSeqClass, so I'm saying all of this from speculation and inference. I might have mis-located the source of the error. And since I don't know what I'm doing with it, I have not tested the modified command assignment within selectWeka. I believe shQuote(...) is the right way to go, but you might need to use sQuote or dQuote instead; I'm not sure how your system is set up.

R calls mGENOVA-an external Program

Recently I have been trying to use R to call a .exe program named mGENOVA. It's a command-line program (I use Windows 10).
Supposedly, it works this way:
double-click on mGENOVA,
type card.txt,
hit Enter, and the cmd window will close.
I have tried a lot of things; basically they can invoke the program, but they fail to pass card.txt to it at the prompt:
shell(cmd="D:\\mgenova\\mGENOVA\\card.txt", shell="D:\\mgenova\\mGENOVA\\mGENOVA.exe",intern=F)
OR
system("\"D:\\mgenova\\mGENOVA\\mGENOVA.exe\" \"D:\\mgenova\\mGENOVA\\card.txt\""
,show.output.on.console=TRUE,invisible=T,intern=T)
And I always got this
[1] "Input the filename containing the control cards." "" "" "*** Control cards file is empty"
attr(,"status")
[1] 1
Warning message:
running command '"D:\mgenova\mGENOVA\mGENOVA.exe" "D:\mgenova\mGENOVA\card.txt"' had status 1
How can I get it to run? Thanks for helping!
You could create a batchfile (let's name it batch.bat) on Windows with the content
cd /D D:\mgenova\mGENOVA\
mGENOVA.exe < card.txt
All necessary input for mGENOVA must be provided by the file card.txt.
Then in R run the command
system("batch.bat")
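The same redirection can probably be done without a batch file, since system2 has a stdin argument that names a file to feed to the program's standard input. A minimal sketch under stated assumptions: an Rscript one-liner stands in for mGENOVA.exe (it just echoes its input in upper case), and a temporary file stands in for card.txt, so nothing here has been tested against mGENOVA itself:

```r
# Stand-in for card.txt: the lines the program would read from standard input
cards <- tempfile("card", fileext = ".txt")
writeLines(c("first card", "second card"), cards)

# Stand-in for mGENOVA.exe: Rscript echoing its standard input in upper case
rscript <- file.path(R.home("bin"),
                     if (.Platform$OS.type == "windows") "Rscript.exe" else "Rscript")
expr <- 'writeLines(toupper(readLines(file("stdin"))))'

# stdin = <file> plays the role of '< card.txt' in the batch file
out <- system2(rscript, c("-e", shQuote(expr)), stdin = cards, stdout = TRUE)
print(out)

# On the asker's machine this might look like (untested):
# setwd("D:/mgenova/mGENOVA")
# system2("mGENOVA.exe", stdin = "card.txt")
```

If the program instead expects the file name to be typed at its prompt, as the "Input the filename" message suggests, the input argument may be the better fit: input = "card.txt" sends that text, rather than the file's contents, to standard input.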

Unix SQLLDR scipt gives 'Unexpected End of File' error

All, I am running the following script to load data onto an Oracle server from a Unix box using sqlldr. Earlier it gave me the error sqlldr: command not found. I added "sqlplus << EOF", and now it gives me an "unexpected end of file" syntax error on line 12, even though the script is only 11 lines long. What seems to be the problem?
#!/bin/bash
FILES='ls *.txt'
CTL='/blah/blah1/blah2/name/filename.ctl'
for f in $FILES
do
cat $CTL | sed "s/:FILE/$f/g" >$f.ctl
sqlplus ID/'PASSWORD'#SERVERNAME << EOF sqlldr SCHEMA_NAME/SCHEMA_PASSWORD control=$f.ctl data=$f EOF
done
sqlplus will never know what to do with the command sqlldr. They are two complementary cmd-line utilities for interfacing with Oracle DB.
Note NO sqlplus or EOF etc required to load data into a schema:
#!/bin/bash
# you don't want this: FILES='ls *.txt'
CTL_PATH='/blah/blah1/blah2/name'
CTL_FILE="$CTL_PATH/filename.ctl"
# placeholders: substitute your own credentials and server name
SCHEMA_USER=SCHEMA_NAME
SCHEMA_PASSWORD=SCHEMA_PASSWORD
SERVER_NAME=SERVERNAME
for f in *.txt
do
  # don't need cat! cat $CTL | sed "s/:FILE/$f/g" >"$f".ctl
  sed "s/:FILE/$f/g" "$CTL_FILE" > "$CTL_PATH/$f.ctl"
  # myBad: sqlldr "$SCHEMA_NAME/$SCHEMA_PASSWORD" control="$CTL_PATH/$f.ctl" data="$f"
  sqlldr "$SCHEMA_USER/$SCHEMA_PASSWORD#$SERVER_NAME" control="$CTL_PATH/$f.ctl" data="$f" rows=10000 direct=true errors=999
done
Without getting too philosophical, using assignments like FILES=$(ls *.txt) is a bad habit to get into. By contrast, for f in *.txt will deal correctly with files that have odd characters in their names (like spaces or other syntax-breaking values). BUT the other habit you do want to get into is to quote all variable references (like $f) with double quotes: "$f", OK? ;-) This is the other side of protection for files with spaces etc. embedded in their names.
In the edit update, I've variablized your CTL_PATH and CTL_FILE. I think I understand your intent, that you have one standard CTL_FILE that you pass through sed to create a table-specific .ctl file (a good approach in my experience). Note that you don't need to use cat to send a file to sed, but your use of redirection (> $f.ctl) to create an altered file is very shell-like too.
In 2nd edit update, I looked here on S.O. and found an example sqlldr cmdline that has the correct syntax and have modified to work with your variable names.
To finish up,
A. Are you sure the Oracle Client package is installed on the machine that you are running your script on?
B. Is the /path/to/oracle/client/tools/bin included in your working $PATH?
C. Try which sqlldr. If you don't get anything, either it's not installed or it's not in the path.
D. If not installed, you'll have to get it installed.
E. Once installed, note the directory that contains the sqlldr cmd. find / -name 'sqlldr*' will take a long time to run, but it will print out the path you want to use.
F. Take the "path" part of what is returned (like /opt/oracle/11.2/client/bin/, but not the sqlldr at the end), and edit the script at the 2nd line with
export ORCL_PATH="/path/you/found/to/oracle/client"
export PATH="$ORCL_PATH:$PATH"
These steps should solve any remaining issues. If this doesn't work, see if there is someone where you work that understands your local computing environment that can help explain any missing or different steps.
IHTH

Running a Windows executable file from within R with command line options

I am trying to call a Windows program called AMDIS from within R using the call
system("C:/NIST08/AMDIS32/AMDIS_32.exe /S C:/Users/Ento/Documents/GCMS/test_cataglyphis_iberica/queens/CI23_Q_120828_01.CDF")
in order to carry out an analysis (specified using the /S switch) on a file called CI23_Q_120828_01.CDF, but it seems that no matter what I try the file is not loaded in correctly, presumably because the options are not passed along. Does anyone have a clue what I might be doing wrong?
Right now this command either
doesn't do anything,
makes AMDIS pop up without loading the file I specify, or
gives me the error
Warning message:
running command 'C:/NIST08/AMDIS32/AMDIS_32.exe /S
C:/Users/Ento/Documents/GCMS/test_cataglyphis_iberica/queens/CI23_Q_120828_01.CDF'
had status 65535
(I have no idea what causes these different outcomes from the same command.)
(The AMDIS command-line options are described on page 8 of the documentation.)
Cheers,
Tom
EDIT:
Found it had to do with forward vs backslashes - running
system("C:\\NIST08\\AMDIS32\\AMDIS_32.EXE C:\\Users\\Ento\\Documents\\GCMS\\test_cataglyphis_iberica\\queens\\CI23_Q_120828_01.CDF /S /E")
seems to work - thank you all for the suggestions!
You've heard of bquote, noquote, sQuote, dQuote, quote, enquote and Quotes; well, now meet shQuote!!! :-)
This little function call formats a string to be passed to an operating system shell. Personally I find that I can get embroiled in backslash-escaping hell, and shQuote saves me. Simply type the character string as you would on the command line of your choice ('sh' for Unix-alikes like bash, 'csh' for the C shell and 'cmd' for the Windows shell) within shQuote and it will format it for a call from R using system:
shQuote("C:/NIST08/AMDIS32/AMDIS_32.exe /S C:/Users/Ento/Documents/GCMS/test_cataglyphis_iberica/queens/CI23_Q_120828_01.CDF" , type = "cmd" )
#[1] "\"C:/NIST08/AMDIS32/AMDIS_32.exe /S C:/Users/Ento/Documents/GCMS/test_cataglyphis_iberica/queens/CI23_Q_120828_01.CDF\""
More generally, you can use shQuote like this:
system( shQuote( "mystring" , type = "cmd" ) , ... )  # pick one type: "cmd" on Windows, "sh" on Unix-alikes
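One thing to watch: quoting the whole command line, as in the output shown earlier, turns it into a single quoted token. A hedged sketch of the per-argument alternative, combined with the asker's finding about backslashes: each path is converted and quoted on its own, while switches such as /S are left untouched. to_backslashes is a hypothetical helper, and the final system() call is commented out because it only makes sense on a Windows machine with AMDIS installed:

```r
# Hypothetical helper: swap forward slashes for backslashes in a Windows path
to_backslashes <- function(path) chartr("/", "\\", path)

exe <- to_backslashes("C:/NIST08/AMDIS32/AMDIS_32.exe")
cdf <- to_backslashes(
  "C:/Users/Ento/Documents/GCMS/test_cataglyphis_iberica/queens/CI23_Q_120828_01.CDF"
)

# Quote each path separately; the switches go in as-is so '/S' is not altered
cmd <- paste(shQuote(exe, type = "cmd"), shQuote(cdf, type = "cmd"), "/S", "/E")
cat(cmd, "\n")

# system(cmd)  # Windows only, assumes AMDIS is installed at this location
```

Keeping the switches out of the slash conversion matters here, because /S would otherwise become \S.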
