Loop to import arguments into R

I am new to R and I am trying to have a script get arguments from a file. I am using the following code within my R script:
args <- commandArgs(TRUE)
covr <- args[1]
rpts <- args[2]
The arguments will come from a parameters.tsv which will have two fields, one for each argument.
I want to run the R script with the parameters given in a line from parameters.tsv until all lines have been used to run the R script.
The end result will be qsub'ing a bash script to run each line into the R script.
This is what I came up with:
#!/bin/bash
cat parameters.tsv | while read v1 v2; do RScript --slave ‘--args $v1 $v2’ myscript.R; done
It currently terminates almost immediately after I submit it, and I don't understand why.
Any help is greatly appreciated; I am very new to this, and nothing I read beforehand explained it in enough detail.

How about something like:
var_df <- read.csv([your_file_here])  # or read.table() with the correct specs
for (i in seq_len(nrow(var_df))) {    # a loop is used here for clarity;
                                      # this could be vectorised for speed
  this_var_a <- var_df[i, 1]
  this_var_b <- var_df[i, 2]
  source([Rscript file here], local = TRUE)  # local=TRUE, otherwise the vars
                                             # will not be visible to the script
}
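For completeness, the original bash one-liner can also be made to work. The likely reasons it terminated immediately are the curly quotes around '--args ...' (bash treats them as literal characters, not quoting) and the interpreter being spelled Rscript, not RScript. A sketch of the corrected loop, using the question's own parameters.tsv and myscript.R names (the printf line just fabricates stand-in input for illustration):

```shell
# Stand-in input file: two tab-separated fields per line (assumption for demo)
printf 'cov1\trpt1\ncov2\trpt2\n' > parameters.tsv
while read -r v1 v2; do
    # drop the leading 'echo' to actually run the script
    echo Rscript --vanilla myscript.R "$v1" "$v2"
done < parameters.tsv
rm parameters.tsv
```

Inside myscript.R, commandArgs(TRUE) then picks up $v1 and $v2 as args[1] and args[2], as in the question.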


How to execute R inside snakemake

I find the snakemake documentation quite terse. Since I've been asked this a few times, I thought I'd post the question along with my own answer.
How do you execute or integrate this R script in a snakemake rule?
my-script.R
CUTOFF <- 25
dat <- read.table('input-table.txt')
out <- dat[dat$V1 > CUTOFF, ]
write.table(out, 'output-table.txt')
Snakefile:
rule one:
    input:
        txt= 'input-table.txt',
    output:
        out= 'output-table.txt',
    params:
        cutoff= 25,
    # now what?
I'm going to propose three solutions, roughly in order from most to least canonical.
OPTION 1: Use the script directive in combination with the snakemake R object
Replace the actual filenames inside the R script with the S4 object snakemake
which holds input, output, params etc:
my-script.R
CUTOFF <- snakemake@params[['cutoff']]
dat <- read.table(snakemake@input[['txt']])
out <- dat[dat$V1 > CUTOFF, ]
write.table(out, snakemake@output[['out']])
Similarly, wildcard values would be accessible via snakemake@wildcards[['some-wildcard']].
The rule would look like:
rule one:
    input:
        txt= 'input-table.txt',
    output:
        out= 'output-table.txt',
    params:
        cutoff= 25,
    script:
        '/path/to/my-script.R'
Note that the script filename must end in .R or .r to
instruct snakemake that this is an R script.
Pros
This is the simplest option: just replace the actual filenames and parameters with snakemake@... in the R script
Cons
You may have many short scripts that make the pipeline difficult to follow.
What does my-script.R actually do?
--printshellcmds option is not helpful here
Debugging the R script may be clunky because of the embedded snakemake object
OPTION 2: Write a standalone R script executable via shell directive
For example using the argparse library for R:
my-script.R
#!/usr/bin/env Rscript
library(argparse)
parser <- ArgumentParser(description= 'This program does stuff')
parser$add_argument('--input', '-i', help= 'I am the input file')
parser$add_argument('--output', '-o', help= 'I am the output file')
parser$add_argument('--cutoff', '-c', help= 'Some filtering cutoff', type= 'double')
xargs <- parser$parse_args()
CUTOFF <- xargs$cutoff
dat <- read.table(xargs$input)
out <- dat[dat$V1 > CUTOFF, ]
write.table(out, xargs$output)
The snakemake rule is like any other one executing shell commands:
rule one:
    input:
        txt= 'input-table.txt',
    output:
        out= 'output-table.txt',
    params:
        cutoff= 25,
    shell:
        r"""
        /path/to/my-script.R --input {input.txt} --output {output.out} --cutoff {params.cutoff}
        """
Pros
You get a standalone, nicely documented script that you can use elsewhere
--printshellcmds tells you what is being executed and you can re-run it outside snakemake
Cons
Some setting up to do via argparse
Not so easy to debug by opening the R interpreter and running the individual R commands
OPTION 3: Create a temporary R script as a heredoc that you run via Rscript
This is all self-contained in the rule:
rule one:
    input:
        txt= 'input-table.txt',
    output:
        out= 'output-table.txt',
    params:
        cutoff= 25,
    shell:
        r"""
cat <<'EOF' > {rule}.$$.tmp.R
CUTOFF <- {params.cutoff}
dat <- read.table('{input.txt}')
out <- dat[dat$V1 > CUTOFF, ]
write.table(out, '{output.out}')
EOF
Rscript {rule}.$$.tmp.R
rm {rule}.$$.tmp.R
"""
Explanation: The syntax cat <<'EOF' tells Bash that everything until the EOF
marker is a string to be written to file {rule}.$$.tmp.R. Before running this
shell script, snakemake will replace the placeholders {...} as usual. So file
{rule}.$$.tmp.R will be one.$$.tmp.R where bash will replace $$ with the
current process ID. The combination of {rule} and $$ reasonably ensures
that each temporary R script has a distinct filename. NB: EOF is not a special
keyword, you can use any marker string to delimit the heredoc.
The content of {rule}.$$.tmp.R will be:
CUTOFF <- 25
dat <- read.table('input-table.txt')
out <- dat[dat$V1 > CUTOFF, ]
write.table(out, 'output-table.txt')
After execution via Rscript, {rule}.$$.tmp.R is deleted by rm, provided Rscript exited cleanly.
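As an aside, the quoting of the heredoc marker matters here: <<'EOF' (quoted) stops the shell from expanding $ inside the body, which is what keeps R's dat$V1 intact. A minimal check, with a hypothetical demo.tmp.R filename:

```shell
# Quoted 'EOF': the shell writes the body verbatim, so dat$V1 survives.
# With an unquoted EOF, the shell would try to expand $V1 instead.
cat <<'EOF' > demo.tmp.R
out <- dat[dat$V1 > 25, ]
EOF
grep -c 'dat\$V1' demo.tmp.R   # prints 1: the $ was not expanded
rm demo.tmp.R
```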
Pros
Especially for short scripts, you see what the rule actually does. No
need to look into other script files.
--printshellcmds option shows the exact R code. You can copy & paste it
to the R interpreter as it is which is very useful for debugging and developing
Cons
Clunky, with quite a bit of boilerplate code
If input/output/params are lists, you need to split them inside the R script
with e.g. strsplit('{params}', split= ' '). This is not great, especially if
the list items contain spaces.
If Rscript fails, the temporary R script is not deleted and litters your
working directory.
I've created a simple template for running snakemake with R - here's the Github link:
https://github.com/fritzbayer/snakemake-with-R
It shows two simple options for passing variables between snakemake and R.
@dariober has a thorough answer. I wanted to share a variation on Option 1 that I use to make interactive development and debugging of R scripts a bit easier.
Mock snakemake Object
One can include a preamble to create a mock snakemake object conditioned on whether the script is run interactively or not. This mimics the actual object that gets instantiated, enabling one to step through the code with a realistic input, but gets skipped over when Snakemake is executing the script.
scripts/my-script.R
#!/usr/bin/env Rscript
################################################################################
## Mock `snakemake` preamble
################################################################################
if (interactive()) {
  library(methods)
  Snakemake <- setClass(
    "Snakemake",
    slots=c(
      input='list',
      output='list',
      params='list',
      threads='numeric'
    )
  )
  snakemake <- Snakemake(
    input=list(txt="input-table.txt"),
    output=list(out="output-table.txt"),
    params=list(cutoff=25),
    threads=1
  )
}
################################################################################
CUTOFF <- snakemake@params$cutoff
dat <- read.table(snakemake@input$txt)
out <- dat[dat$V1 > CUTOFF, ]
write.table(out, snakemake@output$out)
Object Details
The full set of slots on the Snakemake class can be seen in the generate_preamble code of the snakemake.script.RScript class. Since this is subject to change, one can inspect the code in the installed version using (from Python, where Snakemake is installed):
from inspect import getsource
from snakemake.script import RScript
print(getsource(RScript.generate_preamble))

Passing an ellipsis (dot-dot-dot) argument to a function from the command line in R

I have a function which takes multiple csv files and processes them into an excel file.
#review.R
review <- function(..., savename) {
  # some code
}
and I have these files in my folder:
fileA.csv
fileB.csv
fileC.csv
fileD.csv
...
and this is how I run it:
review("fileA","fileB","fileC","fileD", savename="analysis")
And then it processes and outputs "analysis.xlsx"
I have no problem running it in RStudio, but I really would like to run my script in cmd line like this:
rscript.exe f_wrapper.r "fileA" "fileB" "fileC" "fileD" savename="analysis"
This is my f_wrapper.R
#f_wrapper.R
#this script doesn't work at all
args <- commandArgs(TRUE)
obj <- list(...)
source("my_R_folder/review.R")
review(obj)
I googled all over, but all I could find was passing a fixed set of arguments like a, b, c; I am trying to pass a, b, c, d, e, ... and more arguments to my function.
Please help.
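One possible approach (a sketch, not tested against the original review(), which is not shown in full): collect all arguments with commandArgs(), split out the savename=... argument by name, and spread the remaining file names into ... with do.call():

```r
# f_wrapper.R -- hypothetical sketch
args <- commandArgs(trailingOnly = TRUE)
source("my_R_folder/review.R")

# Separate the savename=... argument from the file name arguments
is_save  <- grepl("^savename=", args)
savename <- sub("^savename=", "", args[is_save])

# do.call() spreads the file names into review()'s ... argument
do.call(review, c(as.list(args[!is_save]), list(savename = savename)))
```

This would be invoked as Rscript f_wrapper.R fileA fileB fileC fileD savename=analysis; the shell strips the quotes, so the script matches on the savename= prefix rather than on quoting.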

User input and output path of files in R

I am aiming to write an R script that takes from the user the path of an input file and the name of an output file. The input is then processed and the result stored in that file.
Normally, if I were to write this code in RStudio, it would look like this:
d<- read.table("data.txt" , header =T)
r<-summary(d)
print(r)
The output that is displayed should also be written to the output file.
where data.txt is
1
2
3
4
45
58
10
What I would like to do is to put the code in a file called script.R and then run it as follows
R script.R input_file_path_name output_file_name
Could anyone spare a minute or two and help me out.
Many thanks in advance.
The most natural way to pass arguments from the command line is to use the function commandArgs. This function scans the arguments which have been supplied when the current R session was invoked. So creating a script named sillyScript.R which starts with
#!/usr/bin/env Rscript
args = commandArgs(trailingOnly=TRUE)
and running the following command line
Rscript --vanilla sillyScript.R iris.txt out.txt
will create a character vector args which contains the entries iris.txt and out.txt.
Use args[1] and args[2] as the input and output file paths.
https://www.r-bloggers.com/passing-arguments-to-an-r-script-from-command-lines/
You can consider this method:
script <- function(input_file){
  d <- read.table(input_file, header=TRUE)
  r <- summary(d)
  return(r)
}
If you want to manually choose the input file location you can use:
read.table(file.choose(),header=T)
Executing the script with the input file name is sufficient to return the desired output, e.g.
output_file <- script("data.txt")
If you also want to export the data from R, you can modify the script as follows:
script <- function(input_file, output_file_name){
  d <- read.table(input_file, header=TRUE)
  r <- summary(d)
  write.table(r, paste0(output_file_name, ".txt"))
  return(r)
}
The file name has to be passed in quotes:
> vector <- script("data.txt", "output_file_name")
By default, the output will be exported to the current R working directory. You can also specify the output location by putting it before the output file name within the script:
write.table(r, paste0("output_location/", output_file_name, ".txt"))

Run shell script in shiny

I think there is a problem in my Shiny script with executing a shell command, and I was wondering if there is a way to run this command within Shiny.
Outside of Shiny my code works.
#Function to calculate maxentscore
calculate_maxent <- function(seq_vector, maxent_type){
  #Save the working directory from before
  Current_wkdir <- getwd()
  #First, change the working directory to the Perl script location
  setwd("S:/Arbeitsgruppen VIRO/AG_Sch/Johannes Ptok_JoPt/Rpackages/rnaSEQ/data/maxent")
  #Create a new text file with the sequences saved
  cat(seq_vector, file="Input_sequences", sep="\n", append=TRUE)
  #Execute the respective Perl script with the respective sequence file
  if(maxent_type == 3) cmd <- paste("score3.pl", "Input_sequences")
  if(maxent_type == 5) cmd <- paste("score5.pl", "Input_sequences")
  #Save the calculated maxent score file
  x <- shell(cmd, intern=TRUE)
  #Reset the working directory
  setwd(Current_wkdir)
  #Extract the scores from the maxent score table
  x <- substr(x, (regexpr("\t", x)[[1]] + 1), nchar(x))
  #Return the maxent table
  return(x)
}
So basically I am just trying to execute the following code:
shell("score5.pl Input_sequences")
This does not seem to be possible that way within Shiny.
I'm not familiar with the shell() command, but executing shell commands is also possible via system(). It even uses the current working directory set by R.
So you might try:
x <- system(cmd, intern=TRUE)

Access, Update and Run an R script from another R script

I would like to access, update and run an R script from another R script. Specifically, I would like to create a loop that reads in the lines of another R script in order to update some variable names by adding increments of 1 to each of them. Once the variables have been updated, I would like to run the R script.
Currently my code for this is as follows:
for (i in 1:n) {
  x <- readLines("Mscript0.R")
  y <- gsub(paste0("Mtrain", i), paste0("Mtrain", i + 1), x)
  cat(y, file = "Mscript0.R", sep = "\n")
  source("Mscript0.R")
}
Please note that the string "Mtrain" in the source script takes on various different forms such as:
Mtrain4 <- read.csv("Mtrain4.csv",header=T,sep=",")
s <- Mtrain4$Weight
Any Ideas?
Thanks
