I am preparing some Rscript automation, and when I run the three R files below with the Rscript command (in a terminal), I get the following error after running [result.R]:
Error: object 'a' not found
Execution halted.
I have checked in an R terminal that Rscript has created the variables [a] and [b], so it seems that when the final Rscript runs, it does not recognize that the variables have been created in R.
I am running Linux Ubuntu 16.04 with R version 3.4.4.
All files are stored in the same folder.
Below is the content of my R scripts:
#File: a.R
# Content:
a <- 1
save.image ('.RData')
#File: b.R
# Content:
b <- 2
save.image ('.RData')
#File: result.R
# Content:
load('.RData')
c = a + b
save.image('.RData')
Scenario that works:
If I create the variables [a] and [b] in an R terminal and then run result.R with Rscript, it does not throw an error message.
You need to add a load('.RData') to b.R to prevent it from overwriting the objects saved by a.R: each Rscript call starts a fresh R session, so b.R's save.image() would otherwise write a workspace containing only b.
#File: a.R
# Content:
a <- 1
save.image('.RData')
#File: b.R
# Content:
load('.RData')
b <- 2
save.image('.RData')
#File: result.R
# Content:
load('.RData')
c = a + b
save.image('.RData')
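With that change, a minimal sketch of the run order in a terminal (assuming all three files and .RData live in the current directory) would be:
Rscript a.R                             # creates a and saves it to .RData
Rscript b.R                             # loads .RData, adds b, saves again
Rscript result.R                        # loads a and b, computes c
Rscript -e 'load(".RData"); print(c)'   # should print 3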
I don't understand how to redefine my Snakemake rule to fix the Wildcards issue below.
Ignore the logic of the batches; it makes sense internally in the Python script. In short, I want the rule to run once for each batch 1-20. I use the BATCHES list for {batch} in the output, and in the shell command I use {wildcards.batch}:
OUTDIR="my_dir/"
nBATCHES = 20
BATCHES = list(range(1,21)) # [1,2,3 ..20] list
[...]
rule step5:
    input:
        ids = expand('{IDLIST}', IDLIST=IDLIST)
    output:
        type1 = expand('{OUTDIR}/resources/{batch}_output_type1.csv.gz', OUTDIR=OUTDIR, batch=BATCHES),
        type2 = expand('{OUTDIR}/resources/{batch}_output_type2.csv.gz', OUTDIR=OUTDIR, batch=BATCHES),
        type3 = expand('{OUTDIR}/resources/{batch}_output_type3.csv.gz', OUTDIR=OUTDIR, batch=BATCHES)
    shell:
        "./some_script.py --outdir {OUTDIR} --idlist {input.ids} --total_batches {nBATCHES} --current_batch {wildcards.batch}"
Error:
RuleException in rule step5 in line 241 of Snakefile:
AttributeError: 'Wildcards' object has no attribute 'batch', when formatting the following:
./somescript.py --outdir {OUTDIR} --idlist {input.idlist} --totalbatches {nBATCHES} --current_batch {wildcards.batch}
Executing the script manually for a single batch looks like this (and works); total_batches is a constant, while current_batch is supposed to iterate:
./somescript.py --outdir my_dir/ --idlist ids.csv --total_batches 20 --current_batch 1
You seem to want to run the rule step5 once for each batch in BATCHES. So you need to structure your Snakefile to do exactly that.
In the following Snakefile, running the rule all runs your rule step5 for all combinations of OUTDIR and BATCHES:
OUTDIR = "my_dir"
nBATCHES = 20
BATCHES = list(range(1, 21)) # [1,2,3 ..20] list
IDLIST = ["a", "b"] # dummy data, I don't have the original
rule all:
    input:
        type1=expand(
            "{OUTDIR}/resources/{batch}_output_type1.csv.gz",
            OUTDIR=OUTDIR,
            batch=BATCHES,
        ),

rule step5:
    input:
        ids=expand("{IDLIST}", IDLIST=IDLIST),
    output:
        type1="{OUTDIR}/resources/{batch}_output_type1.csv.gz",
        type2="{OUTDIR}/resources/{batch}_output_type2.csv.gz",
        type3="{OUTDIR}/resources/{batch}_output_type3.csv.gz",
    shell:
        "./some_script.py --outdir {OUTDIR} --idlist {input.ids} --total_batches {nBATCHES} --current_batch {wildcards.batch}"
In your earlier version, {batch} was just an expand placeholder, not a wildcard, so the rule was only called once.
Instead of the rule all, this could be a subsequent rule that uses one or more of the outputs generated by step5.
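As a quick illustration of how this is driven (the file name below simply follows the output pattern declared above): running snakemake --cores 4 builds everything requested by rule all, i.e. step5 runs once per batch, while requesting a single output file runs step5 for just that batch:
snakemake --cores 1 my_dir/resources/7_output_type1.csv.gz
Here the {batch} wildcard is inferred from the file name, so {wildcards.batch} expands to 7 in the shell command.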
Currently I have an R Script that takes 8 parameters that are hard-coded as the first 8 lines of my script.
I've made a Batch file to try and manually change the parameters on the fly, but it doesn't seem to be working the way I want it to.
Batch file that currently runs the script (but doesn't actually change the parameters):
echo off
ECHO PRESS ENTER AT ANY INPUT TO ACCEPT the DEFAULT VALUE.
:: Setting of Variables
@Set /P RScript=Set path to R:_
@Set /P RProgram=Set path to RScript:_
@Set /P RStartDir=Set Start Directory:_
@Set /P Begin=Begin on which Loan?:_
@Set /P End=End on which Loan?:_
@Set /P OutputDir=Set Output Directory:_
@Set /P Deal=Set Deal input file (.txt):_
@Set /P OutputFile=Name Deal Output File:_
@Set /P AsOfDate=As of Date?:_
@Set /P ThirtyYrSpread=Thirty Year Mortgage Spread?:_
:: If Blank (enter), set variables/paths to Defaults (Listed Below)
if "%RScript%"=="" Set RScript=c:\program files\r\r-
3.4.3\bin\x64\rscript.exe
if "%RProgram%"=="" Set RProgram=C:\MortgageMatt\Cirt2014-
1\0.Mortgage Model.R
if "%RStartDir%"=="" Set RStartDir=C:\MortgageMatt\Cirt2014-1
if "%Begin%"=="" Set Begin=1
if "%End%"=="" Set End=2
if "%OutputDir%"=="" Set OutputDir=C:\MortgageMatt\Cirt2014-1
if "%Deal%"=="" Set Deal=Cirt 2014-1 Loan Level.txt
if "%OutputFile%"=="" Set OutputFile=Cirt 2014-1d
if "%AsOfDate%"=="" Set AsOfDate=62017
if "%ThirtyYrSpread%" == "" Set ThirtyYrSpread=135
echo "%RScript% %RProgram% %RStartDir% %Begin% %End% %OutputDir% %Deal%
%OutputFile% %AsOfDate% %ThirtyYrSpread%"
ECHO PLEASE CHECK IF THESE VALUES ARE CORRECT
pause
:: Command Prompt, /c Carries out command specified by string and then terminates
cmd /c ""%RScript%" "%RProgram%" "%RStartDir%" "%Begin%" "%End%" "%OutputDir%" "%Deal%" "%OutputFile%" "%AsOfDate%" "%ThirtyYrSpread%""
So, because the parameters were actually hard-coded into the R script, this is what I've added to try to accommodate them. Does this look okay? I think this is where I'm running into errors.
Added to R Script
args <- commandArgs(trailingOnly = TRUE)
if (length(args) == 0) {
    if (!exists("dataDir")) { stop("variables dataDir not found") }
    # Set dataDir variable when Running inside a R Session
    args <- c(getwd(), 1, 2, ".", "Cirt 2014-1 Loan Level.txt", "Cirt 2014-1", "62017", 175)
}
print(args)
# Input Values
Input.Directory <- paste(args[1]) ## getwd() , "/", "inputs", sep = "")
Begin.Sim <- args[2]
End.Sim <- args[3]
Output.Directory <- paste(args[1],"\\",args[4],sep = "") ##, "/", "outputs", sep = "")
Pool.ID.File <- args[5] #"Cirt 2014-1 Loan Level.txt"
Pool.ID <- args[6] #"Cirt 2014-1"
asofdate <- args[7] #"62017"
Thirty.Yr.Mort.Spread <- args[8] # 175
When I try to run it in cmd using the .bat, I get an error saying it cannot change the working directory. Does anyone have any suggestions?
I sort of understand where the error is, but I'm struggling to fix it.
The path to my file with everything in it is
C:\MortgageMatt\Cirt2014-1
Edit:
I've also heard of something called R CMD BATCH... should I look into that? From what I can find, it's an older technique.
What my code looked like before the Args/IF
# Input Values
Input.Directory <- "C:/Mortgage/Cirt 2014 - 1"
Output.Directory <- "C:/Mortgage/Cirt 2014 - 1"
Pool.ID.File <- "Cirt 2014-1 Loan Level.txt"
Pool.ID <- "Cirt 2014-1 NEW"
start<- 1
sims <- 2 # Number of Simulations
asofdate <- "62017"
Thirty.Yr.Mort.Spread <- 175
You can do all of this in R using one of these packages to parse command-line options:
docopt (my favourite)
optparse
argparse
getopt
or doing it manually -- not recommended.
You also do not want the older R CMD BATCH -- use Rscript (or littler, but littler does not work on Windows).
Code Example
#!/usr/bin/Rscript
suppressMessages(library(docopt))
doc <- "Usage: foo.R [-h] [-x] [--src REPODIR] [--out OUTDIR] [FILES...]
-s --src REPODIR source root directory [default: ~/git]
-o --out OUTDIR output directory [default: /tmp]
-h --help show this help text"
opt <- docopt(doc) # docopt parsing
print(opt)
Use with -h
You get a nice message automatically, with no formatting needed:
edd@rob:/tmp$ Rscript so50256138.R -h
Usage: foo.R [-h] [-x] [--src REPODIR] [--out OUTDIR] [FILES...]
-s --src REPODIR source root directory [default: ~/git]
-o --out OUTDIR output directory [default: /tmp]
-h --help show this help text
edd@rob:/tmp$
Use with argument
Note how one default argument is used, and the other from the command-line:
edd@rob:/tmp$ Rscript so50256138.R -s A
List of 9
$ --src : chr "A"
$ --out : chr "/tmp"
$ --help: logi FALSE
$ -x : logi FALSE
$ FILES : list()
$ src : chr "A"
$ out : chr "/tmp"
$ help : logi FALSE
$ x : logi FALSE
NULL
You can access them in opt by name or by option flag.
The docopt site has more; this is actually a portable specification and the CRAN package implements it for R.
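As a small follow-up sketch (using the names shown in the printed list above), the parsed values are then used like ordinary list elements inside the script:
srcdir <- opt$src    # "A" in the example run above
outdir <- opt$out    # "/tmp" unless --out is given
files  <- opt$FILES  # positional arguments, possibly an empty list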
For some reason, the optparse usage in this script breaks:
test.R:
#!/usr/bin/env Rscript
library("optparse")
option_list <- list(
    make_option(c("-n", "--name"), type="character", default=FALSE,
                dest="report_name", help="A different name to use for the file"),
    make_option(c("-h", "--height"), type="numeric", default=12,
                dest = "plot_height", help="Height for plot [default %default]",
                metavar="plot_height"),
    make_option(c("-w", "--width"), type="numeric", default=10,
                dest = "plot_width", help="Width for plot [default %default]",
                metavar="plot_width")
)
opt <- parse_args(OptionParser(option_list=option_list), positional_arguments = TRUE)
print(opt)
report_name <- opt$options$report_name
plot_height <- opt$options$plot_height
plot_width <- opt$options$plot_width
input_dir <- opt$args[1] # input directory
I get this error:
$ ./test.R --name "report1" --height 42 --width 12 foo
Error in getopt(spec = spec, opt = args) :
redundant short names for flags (column 2).
Calls: parse_args -> getopt
Execution halted
However, if I remove the "-h" from this line:
make_option(c("--height"), type="numeric", default=12,
dest = "plot_height", help="Height for plot [default %default]"
It seems to work fine:
$ ./test.R --name "report1" --height 42 --width 12 foo
$options
$options$report_name
[1] "report1"
$options$plot_height
[1] 42
$options$plot_width
[1] 12
$options$help
[1] FALSE
$args
[1] "foo"
Any ideas what might be going on here?
I am using R 3.3.0 and optparse_1.3.2 (getopt_1.20.0)
The -h flag is reserved by optparse; the getopt.R source file on GitHub describes this as a feature of optparse that is not in getopt:
Some features implemented in optparse package unavailable in getopt:
2. Automatic generation of an help option and printing of help text when encounters an "-h"
Therefore, when the user specifies -h, the sanity check for uniqueness of flags fails. The issue tracker does not seem to mention any plan for a clearer error message in this case, however.
Finally, note that optparse seems to invoke getopt under the hood; the two packages have the same author.
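If you want to keep -h for --height, one possible workaround (a sketch based on optparse's add_help_option argument) is to disable the automatically generated help option when constructing the parser, at the cost of losing the built-in --help text:
# sketch: free up -h by turning off optparse's automatic help option
opt <- parse_args(OptionParser(option_list=option_list, add_help_option=FALSE),
                  positional_arguments = TRUE)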
I'm trying a Snakemake pipeline and I'm stuck on an error I really don't understand.
I've got a directory (raw_data) containing the input files:
ll /home/nico/labo/etudes/Optimal/data/raw_data
total 41M
drwxrwxr-x 2 nico nico 4,0K mars 6 16:09 ./
drwxrwxr-x 11 nico nico 4,0K mars 6 16:14 ../
-rw-rw-r-- 1 nico nico 15M févr. 27 12:21 sampleA_R1.fastq.gz
-rw-rw-r-- 1 nico nico 19M févr. 27 12:22 sampleA_R2.fastq.gz
-rw-rw-r-- 1 nico nico 3,4M févr. 27 12:21 sampleB_R1.fastq.gz
-rw-rw-r-- 1 nico nico 4,3M févr. 27 12:22 sampleB_R2.fastq.gz
This directory contains 4 files for 2 samples.
I created a JSON config file for the Snakemake pipeline, named config_snakemake_Optimal_mapping_BaL.json:
{
    "fastqExtension": "fastq.gz",
    "fastqDir": "/home/nico/labo/etudes/Optimal/data/raw_data",
    "outputDir": "/home/nico/labo/etudes/Optimal/data/mapping_BaL",
    "logDir": "logs",
    "reference": {
        "fasta": "/home/nico/labo/references/genomes/HIV1/BaL_AY713409/BaL_AY713409.fasta",
        "index": "/home/nico/labo/references/genomes/HIV1/BaL_AY713409/BaL_AY713409.fasta.bwt"
    }
}
And finally the SnakeMake file snakefile_bwa_samtools.py:
import subprocess
from os.path import join
### Globals ---------------------------------------------------------------------
# A Snakemake regular expression matching fastq files.
SAMPLES, = glob_wildcards(join(config["fastqDir"], "{sample}_R1."+config["fastqExtension"]))
print(SAMPLES)
### Rules -----------------------------------------------------------------------
# Pipeline output files
rule all:
    input: expand(join(config["outputDir"], "{sample}.bam.bai"), sample=SAMPLES)
# Reads alignment on reference genome and BAM file creation
rule bwa_mem_to_bam:
    input:
        index = config["reference"]["index"],
        fasta = config["reference"]["fasta"],
        fq1_ID = "{sample}_R1."+config["fastqExtension"],
        fq2_ID = "{sample}_R2."+config["fastqExtension"],
        fq1 = join(config["fastqDir"], "{sample}_R1."+config["fastqExtension"]),
        fq2 = join(config["fastqDir"], "{sample}_R2."+config["fastqExtension"])
    output:
        temp(join(config["outputDir"], "{sample}.bamUnsorted"))
    version:
        subprocess.getoutput(
            "man bwa | tail -n 1 | cut -d ' ' -f 1 | cut -d '-' -f 2"
        )
    log:
        join(config["outputDir"], config["logDir"], "{sample}.bwa_mem.log")
    message:
        "Alignment of {input.fq1_ID} and {input.fq2_ID} on {input.fasta} with BWA version {version}."
    shell:
        "bwa mem {input.fasta} {input.fq1} {input.fq2} 2> {log} | samtools view -Sbh - > {output}"
# Sorting the BAM files on genomic positions
rule bam_sort:
    input:
        join(config["outputDir"], "{sample}.bamUnsorted")
    output:
        join(config["outputDir"], "{sample}.bam")
    log:
        join(config["outputDir"], config["logDir"], "{sample}.samtools_sort.log")
    version:
        subprocess.getoutput(
            "samtools --version | "
            "head -1 | "
            "cut -d' ' -f2"
        )
    message:
        "Genomic sorting of {input} with samtools version {version}."
    shell:
        "samtools sort -f {input} {output} 2> {log}"
# Indexing the BAM files
rule bam_index:
    input:
        join(config["outputDir"], "{sample}.bam")
    output:
        join(config["outputDir"], "{sample}.bam.bai")
    message:
        "Indexing {input}."
    shell:
        "samtools index {input}"
I run this pipeline:
snakemake --cores 3 --snakefile /home/nico/labo/scripts/pipeline_illumina/snakefile_bwa_samtools.py --configfile /home/nico/labo/etudes/Optimal/data/snakemake_config_files/config_snakemake_Optimal_mapping_BaL.json
and I get the following error output:
['sampleB', 'sampleA']
MissingInputException in line 18 of /home/nico/labo/scripts/pipeline_illumina/snakefile_bwa_samtools.py:
Missing input files for rule bwa_mem_to_bam:
sampleB_R1.fastq.gz
sampleB_R2.fastq.gz
or, depending on the run:
['sampleB', 'sampleA']
PeriodicWildcardError in line 40 of /home/nico/labo/scripts/pipeline_illumina/snakefile_bwa_samtools.py:
The value _unsorted in wildcard sample is periodically repeated (sampleB_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted). This would lead to an infinite recursion. To avoid this, e.g. restrict the wildcards in this rule to certain values.
The samples are correctly detected, as they appear in the list (first line of both outputs), and I'm surely messing something up with the wildcards in the rule bwa_mem_to_bam, but I really don't get why.
Any clue?
I quickly looked at your code.
Why didn't the first version work?
Look at how you declare fq1_ID and fq1 (and the same for the second read): you did not assign the same string. For fq1 you prepend a directory to the file name, which is not present for fq1_ID. Because these variables are in the input section, Snakemake searches the working directory (the current directory if the -d option is not set) for a file with that bare name.
So removing the two fq1/2_ID entries gets rid of the missing-file problem.
Hugo
Finally, I succeeded with the pipeline by removing the fq1_ID and fq2_ID variables from the rule bwa_mem_to_bam and replacing input.fq1_ID and input.fq2_ID with input.fq1 and input.fq2 in the rule's message.
The message is less elegant, but the pipeline runs correctly. I still don't understand exactly where the mistake was; if someone can explain, I'm still listening!
The correct code for rule bwa_mem_to_bam:
rule bwa_mem_to_bam:
    input:
        index = config["reference"]["index"],
        fasta = config["reference"]["fasta"],
        fq1 = join(config["fastqDir"], "{sample}_R1."+config["fastqExtension"]),
        fq2 = join(config["fastqDir"], "{sample}_R2."+config["fastqExtension"])
    output:
        temp(join(config["outputDir"], "{sample}.bamUnsorted"))
    version:
        subprocess.getoutput(
            "man bwa | tail -n 1 | cut -d ' ' -f 1 | cut -d '-' -f 2"
        )
    log:
        join(config["outputDir"], config["logDir"], "{sample}.bwa_mem.log")
    message:
        "Alignment of {input.fq1} and {input.fq2} on {input.fasta} with BWA version {version}."
    shell:
        "bwa mem {input.fasta} {input.fq1} {input.fq2} 2> {log} | samtools view -Sbh - > {output}"
Thanks Hugo for checking my code and for your explanation, it makes sense!
I finally had a flash of insight waking up this morning (the best ideas come then) and realized that I had neglected the params section of the rule: fq1_ID and fq2_ID are not inputs but params.
I changed the code to this:
rule bwa_mem_to_bam:
    input:
        index = config["reference"]["index"],
        fasta = config["reference"]["fasta"],
        fq1 = join(config["fastqDir"], "{sample}_R1.fastq.gz"),
        fq2 = join(config["fastqDir"], "{sample}_R2.fastq.gz")
    output:
        temp(join(config["outputDir"],"{sample}_unsorted.bam"))
    params:
        fq1_ID = "{sample}_R1.fastq.gz",
        fq2_ID = "{sample}_R2.fastq.gz",
        ref_ID = os.path.basename(config["reference"]["fasta"])
    version:
        subprocess.getoutput(
            "man bwa | tail -n 1 | cut -d ' ' -f 1 | cut -d '-' -f 2"
        )
    log:
        join(config["outputDir"], config["logDir"], "{sample}.bwa_mem.log")
    message:
        "Alignment of {params.fq1_ID} and {params.fq2_ID} on {params.ref_ID} with BWA version {version}."
    shell:
        "bwa mem {input.fasta} {input.fq1} {input.fq2} 2> {log} | samtools view -Sbh - > {output}"
And it works just fine!
snakemake --cores 3 --snakefile /home/nico/labo/scripts/pipeline_illumina/snakefile_bwa_samtools.py --configfile /home/nico/labo/etudes/Optimal/data/snakemake_config_files/config_snakemake_Optimal_mapping_BaL.json
Provided cores: 3
Rules claiming more threads will be scaled down.
Job counts:
count jobs
1 all
2 bam_index
2 bam_sort
2 bwa_mem_to_bam
7
Alignment of sampleB_R1.fastq.gz and sampleB_R2.fastq.gz on BaL_AY713409.fasta with BWA version 0.7.12.
Alignment of sampleA_R1.fastq.gz and sampleA_R2.fastq.gz on BaL_AY713409.fasta with BWA version 0.7.12.
1 of 7 steps (14%) done
Genomic sorting of sampleB_unsorted.bam with samtools version 1.2.
Removing temporary output file /home/nico/labo/etudes/Optimal/data/mapping_BaL/sampleB_unsorted.bam.
2 of 7 steps (29%) done
Indexing sampleB.bam.
3 of 7 steps (43%) done
4 of 7 steps (57%) done
Genomic sorting of sampleA_unsorted.bam with samtools version 1.2.
Removing temporary output file /home/nico/labo/etudes/Optimal/data/mapping_BaL/sampleA_unsorted.bam.
5 of 7 steps (71%) done
Indexing sampleA.bam.
6 of 7 steps (86%) done
localrule all:
input: /home/nico/labo/etudes/Optimal/data/mapping_BaL/sampleB.bam.bai, /home/nico/labo/etudes/Optimal/data/mapping_BaL/sampleA.bam.bai
7 of 7 steps (100%) done
And I finally get my correct messages:
Alignment of sampleB_R1.fastq.gz and sampleB_R2.fastq.gz on BaL_AY713409.fasta with BWA version 0.7.12.
Alignment of sampleA_R1.fastq.gz and sampleA_R2.fastq.gz on BaL_AY713409.fasta with BWA version 0.7.12.
I am running an R script via a bash script and want to return the output of the R script to the bash script to keep working with it there.
The bash script is something like this:
#!/bin/bash
Rscript MYRScript.R
a=OUTPUT_FROM_MYRScript.R
do sth with a
and the R script is something like this:
for(i in 1:5){
    i
    sink(type="message")
}
I want bash to work with one variable from R at a time, meaning: bash receives i=1 and works with that; when that task is done, it receives i=2, and so on.
Any ideas how to do that?
One option is to make your R script executable with #!/usr/bin/env Rscript (and set the executable bit, e.g. chmod 0755 myrscript.r or chmod +x myrscript.r), and just treat it like any other command, e.g. by assigning the results to an array variable as below:
myrscript.r
#!/usr/bin/env Rscript
cat(1:5, sep = "\n")
mybashscript.sh
#!/bin/bash
RES=($(./myrscript.r))
for elem in "${RES[@]}"
do
    echo elem is "${elem}"
done
nrussell$ ./mybashscript.sh
elem is 1
elem is 2
elem is 3
elem is 4
elem is 5
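If you need bash to handle one value at a time as it arrives, rather than collecting everything into an array first, a minimal variation on the same idea is to pipe the script's output into a read loop (note the loop body runs in a subshell):
./myrscript.r | while read -r value
do
    echo working with "${value}"
done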
Here is MYRScript.R:
for(iter in 1:5) {
    cat(iter, ' ')
}
and here is your bash script:
#!/bin/bash
r_output=`Rscript ~/MYRscript.R`
for iter in `echo $r_output`
do
    echo Here is some output from R: $iter
done
Here is some output from R: 1
Here is some output from R: 2
Here is some output from R: 3
Here is some output from R: 4
Here is some output from R: 5