How to force wildcards into --report caption - report

I am using snakemake --report (v5.9.1) to create .html reports for pipeline/results. However I cannot use wildcards in the caption parameter of report().
Here is a short example that works, without using wildcards in caption:
rule all:
    input: expand("doit.{role}", role=["founder","offspring"])

rule doit:
    output: report(touch("doit.{role}"),caption="doit.rst")
    run: print(output[0])
Now, what I need is a separate caption for founder and offspring.
I have tried simply adding the {role} wildcard to the caption:
output: report(touch("doit.{role}"),caption="doit.{role}.rst")
but that gives an error
FileNotFoundError: [Errno 2] No such file or directory: 'sandBox/doit.{role}.rst'
but only when generating the HTML file by running snakemake --report. (Running the pipeline itself is OK.)
It seems that output wildcards are not evaluated/substituted when caption is parsed.
I am using caption-functionality to display short results, as well as ordering the results in the .html report. (related to Snakemake report: How to show results in pipeline order ).
Can anyone suggest a work-around or a better pattern for what I am trying to do?

Related

Use wildcard to capture different datasets in Snakemake

I would like to use wildcards in Snakemake in a very simple way to start a script for two datasets. Unfortunately, I cannot find the proper way of doing it.
My data folder contains three files: gene_list.txt, expression_JGI.txt, expression_UBC.txt.
Here is what my snakefile looks like:
rule extract:
    input:
        genes="data/gene_list.txt",
        expression="data/expression_{dataset}.txt"
    output:
        "data/expression_{dataset}_subset.txt"
    shell:
        "bash scripts/extract.sh {input.genes} {input.expression} {output}"
When I use snakemake -c1 extract I get the following error message:
Building DAG of jobs...
WorkflowError:
Target rules may not contain wildcards. Please specify concrete files or a rule without wildcards at the command line, or have a rule without wildcards at the very top of your workflow (e.g. the typical "rule all" which just collects all results you want to generate in the end).
I tried adding a rule all at the beginning of the snakefile with the desired result files as input without success:
rule all:
    input:
        "data/expression_JGI_subset.txt",
        "data/expression_UBC_subset.txt"
I also tried with expand:
DATASETS = ["JGI", "UBC"]

rule all:
    input:
        expand("data/expression_{dataset}_subset.txt", dataset=DATASETS)
But I get the same error message.
The script works fine when I use it outside Snakemake.
How can I achieve what I want?
When you do snakemake -c1 extract you ask snakemake to execute only rule extract and its dependencies, if any. However, because extract contains wildcards snakemake doesn't know what to replace them with. (Note that rule all is not a dependency of extract).
So either execute snakemake -c1 to run the whole pipeline or specify the concrete files you want to generate, e.g.:
snakemake -c1 -- data/expression_JGI_subset.txt data/expression_UBC_subset.txt
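If there are many datasets, the concrete target names can be generated rather than typed. The following is a minimal plain-Python sketch of that idea. The helper expand_pattern is a hypothetical, simplified stand-in for Snakemake's expand (it is not Snakemake's implementation), and it only prints the command line from the answer above.

```python
from itertools import product


def expand_pattern(pattern, **wildcards):
    """Fill every combination of wildcard values into `pattern`.

    Hypothetical helper: a simplified stand-in for Snakemake's expand().
    """
    names = list(wildcards)
    return [
        pattern.format(**dict(zip(names, combo)))
        for combo in product(*(wildcards[name] for name in names))
    ]


DATASETS = ["JGI", "UBC"]

# Concrete output files of rule `extract`, one per dataset.
targets = expand_pattern("data/expression_{dataset}_subset.txt", dataset=DATASETS)

# Same invocation as above: request files, not the wildcard rule itself.
command = ["snakemake", "-c1", "--", *targets]
print(" ".join(command))
```

The point is the same as in the answer: Snakemake is asked for files whose names determine the wildcard values, never for the rule that contains the wildcards.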

define SAMPLE for different dir name and sample name in snakemake code

I have written Snakemake code to run bwa_map. The FASTQ files are paired-end and sit in folders whose names differ from the sample names. Snakemake reports that 'SAMPLES' is not defined. Please help.
Error:
$ snakemake --snakefile rnaseq.smk mapped_reads/EZ-123-B_IGO_08138_J_2_S101_R2_001.bam -np
NameError in line 2 of /Users/singhh5/Desktop/tutorial/rnaseq.smk:
name 'SAMPLES' is not defined
File "/Users/singhh5/Desktop/tutorial/rnaseq.smk", line 2, in <module>
#SAMPLE DIRECTORY
fastq
    Sample_EZ-123-B_IGO_08138_J_2
        EZ-123-B_IGO_08138_J_2_S101_R1_001.fastq.gz
        EZ-123-B_IGO_08138_J_2_S101_R2_001.fastq.gz
    Sample_EZ-123-B_IGO_08138_J_4
        EZ-124-B_IGO_08138_J_4_S29_R1_001.fastq.gz
        EZ-124-B_IGO_08138_J_4_S29_R2_001.fastq.gz
#My Code
expand("~/Desktop/{sample}/{rep}.fastq.gz", sample=SAMPLES)

rule bwa_map:
    input:
        "data/genome.fa",
        "fastq/{sample}/{rep}.fastq"
    conda:
        "env.yaml"
    output:
        "mapped_reads/{rep}.bam"
    threads: 8
    shell:
        "bwa mem {input} | samtools view -Sb -> {output}"
The specific error you are seeing is because the variable SAMPLES isn't set to anything before you use it in expand.
Some other issues you may run into:
The output file is missing the {sample} wildcard.
The value of threads isn't passed to bwa or samtools.
You should place your expand in the input directive of the first rule in your Snakefile, typically called all, to properly request the files from bwa_map.
You aren't pairing your reads (R1 and R2) in bwa.
You should look around stackoverflow or some github projects for similar rules to give you inspiration on how to do this mapping.
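As a starting point for defining SAMPLES and pairing R1 with R2, here is a minimal plain-Python sketch. It assumes the file names follow the pattern shown in the question (a prefix, then _R1 or _R2, then _001.fastq.gz). The names FASTQ_PATHS, READ_RE and pair_reads are hypothetical, and the paths are hard-coded from the listing rather than read from disk.

```python
import re
from collections import defaultdict
from pathlib import PurePosixPath

# Paths copied from the directory listing in the question (not read from disk).
FASTQ_PATHS = [
    "fastq/Sample_EZ-123-B_IGO_08138_J_2/EZ-123-B_IGO_08138_J_2_S101_R1_001.fastq.gz",
    "fastq/Sample_EZ-123-B_IGO_08138_J_2/EZ-123-B_IGO_08138_J_2_S101_R2_001.fastq.gz",
    "fastq/Sample_EZ-123-B_IGO_08138_J_4/EZ-124-B_IGO_08138_J_4_S29_R1_001.fastq.gz",
    "fastq/Sample_EZ-123-B_IGO_08138_J_4/EZ-124-B_IGO_08138_J_4_S29_R2_001.fastq.gz",
]

# Assumed naming scheme: <prefix>_R1_001.fastq.gz / <prefix>_R2_001.fastq.gz
READ_RE = re.compile(r"^(?P<prefix>.+)_(?P<read>R[12])_001\.fastq\.gz$")


def pair_reads(paths):
    """Group FASTQ paths as {(sample_dir, prefix): {"R1": path, "R2": path}}."""
    pairs = defaultdict(dict)
    for path in paths:
        posix = PurePosixPath(path)
        match = READ_RE.match(posix.name)
        if match is None:
            continue  # skip files that do not follow the assumed scheme
        key = (posix.parent.name, match["prefix"])
        pairs[key][match["read"]] = path
    return dict(pairs)


PAIRS = pair_reads(FASTQ_PATHS)

# SAMPLES is now defined before anything tries to use it in expand().
SAMPLES = sorted({sample_dir for sample_dir, _ in PAIRS})
print(SAMPLES)
```

In a Snakefile, a mapping like PAIRS could back an input function that returns both reads for a given sample, so that bwa mem receives R1 and R2 together.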

How to set custom filename for pabot result (html)

I implemented test cases for my application and decided to run them every day. The problem is that each run overwrites the results of the previous one. I need to keep them all, so I came up with the idea of including the test date and time in the report name, for example report-202111181704.html (time in 24-hour format).
I searched the internet and have not found a solution yet. Does anybody here know one? An alternative solution would be fine too.
It depends on where you execute your tests. From the command line you can save the date to a variable and then use this variable to change the names of the generated outputs. For example:
date=$(date '+%Y-%m-%d_%H:%M:%S')
robot --output ${date}output.xml --log ${date}log.html --report ${date}report.html test.robot
I found the solution. Instead of setting the .html file name, I create a folder and put the results there.
To do this, add --outputdir to the pabot command so that it looks like this:
pabot --pabotlibport $PABOT_PORT --pabotlib --resourcefile ./DeviceSet.dat --processes $thread --verbose --outputdir ./result/$OUTPUT_DIR $ENV
where
OUTPUT_DIR=$(date +"%Y%m%d-%H%M")
The output folder will look like ./result/20220301-2052
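If the tests are launched from a Python wrapper rather than a shell, the same timestamp idea can be sketched as below. This is only an illustration: the helper names timestamped_outputs and robot_command are hypothetical, the file-name layout (name-timestamp.ext) is an assumption modelled on the report-202111181704.html example from the question, and only the --output, --log and --report options already shown above are used.

```python
from datetime import datetime


def timestamped_outputs(now, stamp_format="%Y%m%d%H%M"):
    """Build output file names that carry the run's date and 24-hour time."""
    stamp = now.strftime(stamp_format)
    return {
        "output": f"output-{stamp}.xml",
        "log": f"log-{stamp}.html",
        "report": f"report-{stamp}.html",
    }


def robot_command(test_file, now):
    """Assemble a robot command line with timestamped output names."""
    names = timestamped_outputs(now)
    return [
        "robot",
        "--output", names["output"],
        "--log", names["log"],
        "--report", names["report"],
        test_file,
    ]


# Fixed timestamp for illustration; a real run would pass datetime.now().
example = robot_command("test.robot", datetime(2021, 11, 18, 17, 4))
print(" ".join(example))
```

The format avoids colons and spaces, so the resulting names are valid on common file systems. The list could be handed to subprocess.run to start the run.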

Snakemake specify a new wildcard in a new rule

I have input files:
Bob_1.fastq.gz
Bob_2.fastq.gz
Bob_3.fastq.gz
Bob_4.fastq.gz
Ron_1.fastq.gz
Ron_2.fastq.gz
Ron_3.fastq.gz
Ron_4.fastq.gz
I am running demultiplexing and trimming steps in one snakefile, like this:
workdir: "/path/to/dir/"
(SAMPLES,) = glob_wildcards('/path/to/dir/raw/{sample}.fastq.gz')

rule all:
    input:
        expand("demulptiplex/{sample}.fastq.gz", sample=SAMPLES),
        expand("trimmed/{sample}.trimmed.fastq.gz", sample=SAMPLES)

rule sabre:
    input:
        infile="/path/to/dir/raw/{sample}.fastq.gz",
        barcodefile="files/{sample}.txt"
    output:
        unknownfile=temp("demulptiplex/unknown_barcode_{sample}.fastq.gz"),
    shell:
        """
        /Tools/sabre-master2/sabre se -f {input.infile} -b {input.barcodefile} -u {output.unknownfile}
        """

rule trimmomatic_se:
    input:
        r="{sample}.fastq.gz"
    output:
        r="trimmed/{sample}.trimmed.fastq.gz"
    threads: 10
    shell:
        """java -jar /Tools/Trimmomatic-0.36/trimmomatic-0.36.jar SE -threads {threads} {input.r} {output.r} ILLUMINACLIP:/Tools/Trimmomatic-0.36/adapters/TruSeq3-SE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36"""
The demultiplex output files are like this:
Bob_1_CL1.fastq.gz.... Bob_1_CL345.fastq.gz
Bob_2_CL1.fastq.gz.... Bob_1_CL248.fastq.gz
Ron_1_dad1.fastq.gz... Ron_1_dad67.fastq.gz
and so on
So, if I do not specify the demultiplex output files, the program creates them by itself. My problem is how to introduce a new wildcard from the output of the previous rule in the next trimming step, as the wildcards now differ from the initial sample.
Wildcards just need to be consistent in a rule, not across the workflow. The issue here is that you have a rule generating 'unknown' outputs that you need to process further. For that you need to use checkpoints.
Read through the second block of code about aggregating in the Snakemake documentation on checkpoints. Your checkpoint will be the demultiplexing step and, if you don't have any other steps, all will be your aggregate step that calls checkpoints.demultiplex.get. If you search for checkpoint on Stack Overflow you will find lots of examples; it's a hard feature to use at first!
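The part of the aggregate step that is plain Python, namely finding out which files the demultiplexing actually produced and turning them into trimming targets, might look roughly like the sketch below. It is not a complete checkpoint: in a real Snakefile this logic would sit inside an input function that first calls checkpoints.demultiplex.get(...). The helper names and the trimmed/ naming scheme for per-barcode files are assumptions for illustration, and the demo creates empty placeholder files in a temporary directory.

```python
import re
import tempfile
from pathlib import Path


def discover_barcodes(demux_dir, sample):
    """Return the barcode labels found for one sample, e.g. ['CL1', 'CL2']."""
    pattern = re.compile(rf"^{re.escape(sample)}_(?P<barcode>.+)\.fastq\.gz$")
    labels = []
    for path in sorted(Path(demux_dir).iterdir()):
        match = pattern.match(path.name)
        if match:
            labels.append(match["barcode"])
    return labels


def trimmed_targets(demux_dir, sample):
    """Map each discovered demultiplexed file to an assumed trimmed output name."""
    return [
        f"trimmed/{sample}_{barcode}.trimmed.fastq.gz"
        for barcode in discover_barcodes(demux_dir, sample)
    ]


# Demo with empty placeholder files named like the demultiplexer's output.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["Bob_1_CL1.fastq.gz", "Bob_1_CL2.fastq.gz", "Ron_1_dad1.fastq.gz"]:
        (Path(tmp) / name).touch()
    demo_targets = trimmed_targets(tmp, "Bob_1")

print(demo_targets)
```

The barcode label plays the role of the new wildcard: it is only known after the demultiplexing has run, which is why the lookup has to happen after the checkpoint rather than when the workflow is first parsed.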

remove log information from report and save report in desire location

I am new to Robot Framework and am looking for a simple example of a custom report; an answer to my specific problem would also be fine. I went through the existing questions about reports but could not find a specific answer. Currently my report contains log information. I would like to remove it so that the report shows only PASS/FAIL information, and I also need to save the report in a specific location. Can anyone give me an example of how to do this? Thank you in advance.
There is a tool called Rebot which is part of Robot Framework.
By default, Robot Framework creates XML reports. The XML reports are automatically converted into HTML reports by Rebot.
You can set the location of the output files in the execution by specifying the parameter --outputdir (and thus set a different base directory for outputs).
From the documentation:
All output files can be set using an absolute path, in which case they are created to the specified place, but in other cases, the path is considered relative to the output directory. The default output directory is the directory where the execution is started from, but it can be altered with the --outputdir (-d) option. The path set with this option is, again, relative to the execution directory, but can naturally be given also as an absolute path. Regardless of how a path to an individual output file is obtained, its parent directory is created automatically, if it does not exist already.
You can call Rebot yourself to control this conversion.
You can also run Rebot after the test was run in order to create new output on a different location.
See documentation in:
http://robotframework.org/robotframework/latest/RobotFrameworkUserGuide.html#post-processing-outputs
The following example shows how to store the HTML reports in a different location and include only partial data:
rebot --include smoke --name Smoke_Tests c:\results\output.xml --outputdir c:\version1.0\reports
In the example above, we process the file c:\results\output.xml, create a new report called Smoke_Tests that includes only tests with the tag smoke and save it to the output folder c:\version1.0\reports
In addition you can also set the location of the log file (HTML) from the execution.
The command line option --log (-l) determines where log files are created.
The command line option --report (-r) determines where report files are created
Removing log lines can be done a bit differently. If you run rebot --help you'll get the following options:
--removekeywords all|passed|for|wuks|name:<pattern> *  Remove keyword data
                      from all generated outputs. Keywords containing
                      warnings are not removed except in `all` mode.
                      all:     remove data from all keywords
                      passed:  remove data only from keywords in passed
                               test cases and suites
                      for:     remove passed iterations from for loops
                      wuks:    remove all but the last failing keyword
                               inside `BuiltIn.Wait Until Keyword Succeeds`
                      name:<pattern>:  remove data from keywords that match
                               the given pattern. The pattern is matched
                               against the full name of the keyword (e.g.
                               'MyLib.Keyword', 'resource.Second Keyword'),
                               is case, space, and underscore insensitive,
                               and may contain `*` and `?` as wildcards.
                               Examples: --removekeywords name:Lib.HugeKw
                                         --removekeywords name:myresource.*
--flattenkeywords for|foritem|name:<pattern> *  Flattens matching keywords
                      in all generated outputs. Matching keywords get all
                      log messages from their child keywords and children
                      are discarded otherwise.
                      for:     flatten for loops fully
                      foritem: flatten individual for loop iterations
                      name:<pattern>:  flatten matched keywords using same
                               matching rules as with
                               `--removekeywords name:<pattern>`
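Putting these options together, a small wrapper could assemble the rebot call. The sketch below is an illustration under assumptions: the function name rebot_command and its parameters are hypothetical, the paths are the ones from the example above, and it relies on the special value NONE, which the Robot Framework User Guide describes as the way to disable an output file such as the log. It only builds the argument list and does not run rebot.

```python
def rebot_command(output_xml, outputdir, report_name="report.html", keep_log=False):
    """Assemble a rebot command that writes the report to `outputdir`.

    keep_log=False: no log file is written at all (--log NONE).
    keep_log=True:  a log is written, but keyword data is stripped from it.
    """
    args = ["rebot", "--outputdir", outputdir, "--report", report_name]
    if keep_log:
        args += ["--removekeywords", "all"]
    else:
        args += ["--log", "NONE"]
    args.append(output_xml)
    return args


# Paths taken from the rebot example above.
command = rebot_command(r"c:\results\output.xml", r"c:\version1.0\reports")
print(" ".join(command))
```

With keep_log=False only the report is produced, which gives the PASS/FAIL overview without a log file. With keep_log=True a log is still written but with the keyword data removed.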
