Passing a predefined filename argument to awk via system() in R - r

I am struggling with a small but important syntax problem: passing a predefined path and filename to awk inside a system() call in R (OS X, R 3.0.1; readLines() and scan() cannot do what I need).
Calling system() with the file name written directly in the command works fine:
system("awk 'NR==2' ~/path/filename", intern=TRUE)
However
filename<-"~/path/filename"
system("awk 'NR==2' filename", intern=TRUE)
returns the frustrating error
character(0)
attr(,"status")
[1] 2
Warning message:
running command 'awk 'NR==2' filename' had status 2
awk: can't open file filename
source line number 1
I expect I need to escape something somewhere in the filename, but I don't know where, or how.

This would be my first line of R code. :)
The problem is that filename sits inside the string literal, so awk receives the literal word filename rather than the value of the R variable. Build the awk command with string concatenation first, then pass it to system(), like:
system(paste("awk 'NR==2'", filename), intern=TRUE)
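Seen from the shell that system() starts, the two calls differ only in the last word of the command. The sketch below reproduces that at a shell prompt. The temporary file and the names data_file and empty_dir are made up for illustration and stand in for ~/path/filename.

```shell
# Hypothetical three-line file standing in for ~/path/filename.
data_file=$(mktemp)
printf 'first\nsecond\nthird\n' > "$data_file"

# What the paste() version hands to the shell: the real path is in the command.
second_line=$(awk 'NR==2' "$data_file")
echo "$second_line"

# What the original call hands to the shell: the bare word "filename".
# Run it in an empty directory so no file of that name can exist.
empty_dir=$(mktemp -d)
literal_status=0
(cd "$empty_dir" && awk 'NR==2' filename) 2>/dev/null || literal_status=$?
echo "exit status with the literal word: $literal_status"

rm -f "$data_file"
rmdir "$empty_dir"
```

The non-zero exit status in the second case is what R reports as attr(,"status") in the question.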

Try replacing ~/path/filename with its absolute form instead, e.g. /home/user/path/filename.
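One reason the absolute form is safer: the shell expands a leading ~ only when it is unquoted, so a tilde path that ends up inside quotes reaches the program unchanged. A small sketch using the question's placeholder path (nothing is opened, the path is only echoed):

```shell
# Unquoted: the shell replaces ~ with the home directory before echo runs.
unquoted=$(echo ~/path/filename)

# Quoted: tilde expansion is suppressed and the ~ is passed through as-is.
quoted=$(echo "~/path/filename")

echo "$unquoted"
echo "$quoted"
```

On the R side, path.expand(filename) should give the expanded form before the command string is built.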

Related

What does the `input` argument do in the system() function in R?

What does the input argument do in the system() function in R? For example in the code below
authentication_test <- "authentication_test aws s3 ls s3://test-bucket/ > /dev/null"
system(authentication_test, input = "q")
I don't understand what purpose the letter q serves.
Looking at the help file, input is described as
input: if a character vector is supplied, this is copied
one string per line to a temporary file, and the standard
input of command is redirected to the file.
but I still have trouble understanding what exactly it is doing.
input writes the supplied strings to a temporary file, which is then used as standard input (STDIN) for the shell command.
Take for example the cat command:
system("cat", input = "Line1\nLine2")
#Line1
#Line2
In your bash shell this would be the same as
echo -e "Line1\nLine2" > file
cat < file
#Line1
#Line2
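Going by the help text quoted in the question, the steps can be sketched in the shell roughly as follows. This is only an approximation of what system() arranges, not its actual implementation, and input_file is a made-up name.

```shell
# Roughly what system("cat", input = c("Line1", "Line2")) sets up.
input_file=$(mktemp)

# Each element of the character vector becomes one line of the temporary file.
printf '%s\n' "Line1" "Line2" > "$input_file"

# The command's standard input is redirected to that file.
output=$(cat < "$input_file")
echo "$output"

rm -f "$input_file"
```

With input = "q", the command would therefore find a single line containing q on its standard input, as if someone had typed q and pressed Enter at a prompt.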

R or bash command line length limit

I'm developing a bash program that executes an R one-liner to convert an RMarkdown template into an HTML document.
The R one-liner looks like this:
R -e 'library(rmarkdown) ; rmarkdown::render( "template.Rmd", "html_document", output_file = "report.html", output_dir = "'${OUTDIR}'", params = list( param1 = "'${PARAM1}'", param2 = "'${PARAM2}'", ... ) )'
I have a long list of parameters, say 10 for the sake of the example, and it seems that R or bash has a command line length limit.
When I execute the R one-liner with 10 parameters I get an error message like this:
WARNING: '-e library(rmarkdown)~+~;~+~rmarkdown::render(~+~"template.Rmd",~+~"html_document",~+~output_file~+~=~+~"report.html",~+~output_dir~+~=~+~"output/",~+~params~+~=~+~list(~+~param1~+~=~+~"param2", ...
Fatal error: you must specify '--save', '--no-save' or '--vanilla'
When I execute the R one-liner with 9 parameters it works (I tried different combinations to verify that the problem was not the last parameter).
When I execute the R one-liner with 10 parameters but with all spaces removed, it works too, so I guess that R or bash imposes a command line length limit.
R -e 'library(rmarkdown);rmarkdown::render("template.Rmd","html_document",output_file="report.html",output_dir="'${OUTDIR}'",params=list(param1="'${PARAM1}'",param2="'${PARAM2}'",...))'
Is it possible to increase this limit?
This will break in a number of ways, including if your arguments have spaces or quotes in them.
Instead, try passing the values as arguments. Something like this should give you an idea how it works:
# create a script file
tee arguments.r << 'EOF'
argv <- commandArgs(trailingOnly=TRUE)
arg1 <- argv[1]
print(paste("Argument 1 was", arg1))
EOF
# set some values
param1="foo bar"
param2="baz"
# run the script with arguments
Rscript arguments.r "$param1" "$param2"
Expected output:
[1] "Argument 1 was foo bar"
Always quote your variables and always use lowercase variable names to avoid conflicts with system or application variables.
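To see why the quoting matters, here is a small sketch with a made-up helper, count_args, that only reports how many arguments it received. It stands in for Rscript, so it runs without R installed.

```shell
# Hypothetical helper: prints the number of arguments it was called with.
count_args() { echo "$#"; }

param1="foo bar"
param2="baz"

# Quoted: each variable arrives as exactly one argument.
quoted_count=$(count_args "$param1" "$param2")

# Unquoted: the shell splits "foo bar" on the space into two arguments.
unquoted_count=$(count_args $param1 $param2)

echo "quoted: $quoted_count, unquoted: $unquoted_count"
```

Without the quotes, commandArgs(trailingOnly=TRUE) in the script would see three values instead of two, and arg1 would be just foo.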

'embedded nul in string' with fread (tried all other method still couldn't solve)

I'm on a Mac with RStudio 0.99.489 and R 3.2.2. I have a 1 GB csv file. That is not especially big, but it still takes around 5 minutes to import with read.csv, and I have many files of this size, so I tried fread(). From reading previous questions, I learned that this error might be caused by missing values in the date column (a normal entry looks like '03May1995:15:31:50', but where the error occurs it looks like '05May').
I tried sed 's/\\0//g' mycsv1.csv > mycsv2.csv, as mentioned in "'Embedded nul in string' error when importing csv with fread", but the same error message still appears.
sed -i 's/\\0//g' /src/path/mycsv.csv simply doesn't work for me: the terminal reports an error for this command (I'm not very familiar with the command line, so I don't understand the logic behind it).
I tried
file <- "file.csv"
tt <- tempfile() # or tempfile(tmpdir="/dev/shm")
system(paste0("tr < ", file, " -d '\\000' >", tt))
fread(tt)
from "'Embedded nul in string' when importing large CSV (8 GB) with fread()". I guess it removed the entries where there is a missing value, because when I run fread(tt) R says
Error in fread(tt) :
Expecting 5 cols, but line 5060627 contains text after processing all cols. It is very likely that this is due to one or more fields having embedded sep=',' and/or (unescaped) '\n' characters within unbalanced unescaped quotes.
After that, I tried iconv -f utf-16 -t utf-8 myfile1.csv > myfile2.csv, because the problem seemed to be that fread cannot handle UTF-16. There might be something wrong with this command, but it simply gives me a spreadsheet full of random symbols.
And I saw this
vim filename.csv
:%s/CTRL+2//g
ESC #TO SWITCH FROM INSERT MODE
:wq # TO SAVE THE FILE
from "Error with fread in R--embedded nul in string: '\0'", but after I typed vim filename.csv the terminal just loaded the whole spreadsheet and I couldn't type the second command (:%s/CTRL+2//g). Again, I don't really understand these commands, so maybe I need to adapt them to my situation.
Thanks for the help!
Try
sed -i 's/\x0//g' my_file
or
cat my_file | tr -d '\000' > new_file
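To check whether the tr variant actually removed anything, you can count the NUL bytes before and after. A minimal sketch on a tiny made-up file (work_dir, with_nul.csv, clean.csv and count_nul are illustrative names, not from the question):

```shell
work_dir=$(mktemp -d)

# Hypothetical two-line csv with one embedded NUL byte in the second row.
printf 'a,b\n1,x\000y\n' > "$work_dir/with_nul.csv"

# Keep only the NUL bytes (-c complements the set, -d deletes) and count them.
count_nul() { tr -cd '\000' < "$1" | wc -c | tr -d ' '; }

before=$(count_nul "$work_dir/with_nul.csv")
tr -d '\000' < "$work_dir/with_nul.csv" > "$work_dir/clean.csv"
after=$(count_nul "$work_dir/clean.csv")

echo "NUL bytes before: $before, after: $after"
rm -rf "$work_dir"
```

If the count is already 0 before cleaning, the NUL bytes are not the cause and the file may be in another encoding such as UTF-16.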

Have an Rscript read or take input from stdin

I see how to have an Rscript perform the operations I want when given a filename as an argument, e.g. if my Rscript is called script and contains:
#!/usr/bin/Rscript
# trailingOnly = TRUE skips the R executable and its own options
path <- commandArgs(trailingOnly = TRUE)[1]
writeLines(readLines(path))
Then I can run from the bash command line:
Rscript script filename.Rmd --args dev='svg'
and successfully get the contents of filename.Rmd echoed back to me. If, instead of passing a filename like filename.Rmd as an argument, I want to pass the text itself via stdin, I try modifying my script to read from stdin:
#!/usr/bin/Rscript
writeLines(file("stdin"))
but I do not know how to construct the commandline call for this case. I tried piping in the contents:
cat filename.Rmd | Rscript script --args dev='svg'
and also tried redirect:
Rscript script --args dev='svg' < filename.Rmd
and either way I get the error:
Error in writeLines(file("stdin")) : invalid 'text' argument
(I've also tried open(file("stdin"))). I'm not sure if I'm constructing the Rscript incorrectly, or the commandline argument incorrectly, or both.
You need to read text from the connection created by file("stdin") in order to pass anything useful to the text argument of writeLines(). This should work:
#!/usr/bin/Rscript
writeLines(readLines(file("stdin")))
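Both command lines tried in the question, the pipe and the redirect, are valid ways to supply standard input. Only the script body needed changing. A shell analogue, with a made-up echo_stdin function standing in for the Rscript so that it runs without R:

```shell
# Hypothetical stand-in for the script: read stdin line by line and echo it.
echo_stdin() {
  while IFS= read -r line; do
    printf '%s\n' "$line"
  done
}

sample=$(mktemp)
printf 'alpha\nbeta\n' > "$sample"

# Same input delivered two ways, as in the question.
from_pipe=$(cat "$sample" | echo_stdin)
from_redirect=$(echo_stdin < "$sample")

echo "$from_pipe"
rm -f "$sample"
```

Either way the reader sees the same lines, which is why the fix belongs inside the script and not on the command line.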

Executing expressions in Rscript.exe

I'd like to put some expressions that write stuff to a file directly into a call to Rscript.exe (without specifying file in Rscript [options] [-e expression] file [args] and thus without an explicit R script that is run).
Everything seems to work except the fact that the desired file is not created. What am I doing wrong?
# Works:
shell("Rscript -e print(Sys.time())")
# Works:
write(Sys.time(), file='c:/temp/systime.txt')
# No error, but no file created:
shell("Rscript -e write(Sys.time(), file='c:/temp/systime.txt')")
Rscript parses its command line using spaces as separators. If your R string contains embedded spaces, you need to wrap it within quotes to make sure it gets sent as a complete unit to the R parser.
You also don't need to use shell unless you specifically need features of cmd.exe.
system("Rscript.exe -e \"write(Sys.time(), file='c:/temp/systime.txt')\"")
