In snakemake, how do you use wildcards with scatter-gather processes?


I am trying to use Snakemake's scatter-gather functionality to parallelize a slow step in my workflow. However, I cannot figure out how to apply it when I am using wildcards. For example, I have defined the wildcard library in rule all, but this does not seem to apply to the scatter function in ScatterIntervals:
import re

SCATTER_COUNT = 100

scattergather:
    split=SCATTER_COUNT

rule all:
    input:
        expand("{library}_output.txt", library=["FC19271512", "FC19271513"])

rule ScatterIntervals:
    input:
        "{library}_baits.interval_list"
    output:
        temp(scatter.split("tmp/{library}_baits.scatter_{scatteritem}.interval_list"))
    params:
        # drop the ".scatter_<item>.interval_list" suffix to get the output prefix
        # (scatter items look like "1-of-100", so match more than digits)
        output_prefix=(
            lambda wildcards, output:
                re.sub(r"\.scatter_[^.]+\.interval_list$", "", output[0])
        ),
        scatter_count=SCATTER_COUNT
    shell:
        """
        python ScatterIntervals.py \
            -i {input} \
            -o {params.output_prefix} \
            -s {params.scatter_count}
        """

rule ProcessIntervals:
    input:
        bam="{library}.bam",
        baits="tmp/{library}_baits.scatter_{scatteritem}.interval_list"
    output:
        temp("tmp/{library}_output.scatter_{scatteritem}.txt")
    shell:
        """
        python ProcessIntervals.py \
            -b {input.bam} \
            -l {input.baits} \
            -o {output}
        """

rule GatherIntervals:
    input:
        gather.split("tmp/{library}_output.scatter_{scatteritem}.txt")
    output:
        "{library}_output.txt"
    run:
        # give every gathered file its own "-i" flag
        inputs = " ".join(f"-i {path}" for path in input)
        command = f"python GatherOutputs.py {inputs} -o {output[0]}"
        shell(command)

Running this fails with:
WildcardError in line 16 of Snakefile:
No values given for wildcard 'library'.

Evidently this works like expand: you can escape the wildcards other than scatteritem by doubling their braces, so that they are left for DAG resolution to fill in:
temp(scatter.split("tmp/{{library}}_baits.scatter_{scatteritem}.interval_list"))
The same logic applies to gather.split:
gather.split("tmp/{{library}}_output.scatter_{scatteritem}.txt")
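To see why doubling the braces helps, here is a plain-Python sketch. It assumes scatter.split fills placeholders the way str.format (and hence expand) does, using "<i>-of-<n>" names for scatteritem (assumed to match Snakemake's default). The helpers scatter_split, missing_wildcard and unresolved_wildcards are hypothetical illustrations, not Snakemake API.

```python
import string
from typing import Optional


def scatter_split(pattern: str, n: int) -> list[str]:
    """Hypothetical stand-in for scatter.split: fill only {scatteritem},
    using "<i>-of-<n>" item names (assumed to match Snakemake's default)."""
    return [pattern.format(scatteritem=f"{i}-of-{n}") for i in range(1, n + 1)]


def missing_wildcard(pattern: str) -> Optional[str]:
    """Return the name of a placeholder that has no value, or None.
    The KeyError here plays the role of Snakemake's WildcardError."""
    try:
        scatter_split(pattern, 1)
    except KeyError as err:
        return err.args[0]
    return None


def unresolved_wildcards(path: str) -> set[str]:
    """Placeholders still left in a path after formatting."""
    return {field for _, field, _, _ in string.Formatter().parse(path) if field}


# Unescaped {library}: formatting fails, just as the WildcardError did.
print(missing_wildcard("tmp/{library}_baits.scatter_{scatteritem}.interval_list"))
# Escaped {{library}}: scatteritem is filled and {library} survives for the DAG.
paths = scatter_split("tmp/{{library}}_baits.scatter_{scatteritem}.interval_list", 2)
print(paths)
print(unresolved_wildcards(paths[0]))
```

The escaped pattern yields one path per scatter item that still contains {library}, which rule all's expand later pins to a concrete library name.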

Related

snakemake Wildcards in input files cannot be determined from output files:

I used Snakemake to create a pipeline that splits a BAM file by chromosome, but I ran into a problem:
Wildcards in input files cannot be determined from output files:
'OutputDir'
Can someone help me figure it out?
if config['ref'] == 'hg38':
    ref_chr = []
    for i in range(1, 23):
        ref_chr.append('chr' + str(i))
    ref_chr.extend(['chrX', 'chrY'])
elif config['ref'] == 'b37':
    ref_chr = []
    for i in range(1, 23):
        ref_chr.append(str(i))
    ref_chr.extend(['X', 'Y'])

rule all:
    input:
        expand(f"{OutputDir}/split/{name}.{{chr}}.bam", chr=ref_chr)

rule minimap2:
    input:
        TargetFastq
    output:
        Sortbam="{OutputDir}/{name}.sorted.bam",
        Sortbai="{OutputDir}/{name}.sorted.bam.bai"
    resources:
        mem_mb=40000
    threads: nt
    singularity:
        OntSoftware
    shell:
        """
        minimap2 -ax map-ont -d {ref_mmi} --MD -t {nt} {ref_fasta} {input} | samtools sort -O BAM -o {output.Sortbam}
        samtools index {output.Sortbam}
        """

rule split_bam:
    input:
        rules.minimap2.output.Sortbam
    output:
        splitBam=expand(f"{OutputDir}/split/{name}.{{chr}}.bam", chr=ref_chr),
        splitBamBai=expand(f"{OutputDir}/split/{name}.{{chr}}.bam.bai", chr=ref_chr)
    resources:
        mem_mb=30000
    threads: nt
    singularity:
        OntSoftware
    shell:
        """
        samtools view -@ {nt} -b {input} {chr} > {output.splitBam}
        samtools index -@ {nt} {output.splitBam}
        """
I changed the wildcard {OutputDir}, but it does not help.

splitBam = expand(f"{OutputDir}/split/{name}.{{chr}}.bam", chr=ref_chr),
splitBamBai = expand(f"{OutputDir}/split/{name}.{{chr}}.bam.bai", chr=ref_chr),
A couple of comments on these lines:
You escape chr by using double braces, {{chr}}. This means you don't want chr to be expanded, which I doubt is correct. I suspect you want something like:
expand("{{OutputDir}}/split/{{name}}.{chr}.bam",chr=ref_chr),
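The rule minimap2 does not contain a {chr} wildcard, hence the error you get. Note also that minimap2's output, "{OutputDir}/{name}.sorted.bam", uses the wildcards OutputDir and name, while the f-string in split_bam has already replaced them with fixed values, so Snakemake has nothing to fill them with when it resolves split_bam's input.
As an aside, when you create a BAM file and its index in the same rule, the index file can end up with an older timestamp than the BAM file itself. This can later generate spurious warnings from samtools/bcftools. See https://github.com/snakemake/snakemake/issues/1378 (not sure if it's been fixed).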
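The interplay between the f-string and the double braces can be traced in plain Python. This sketch assumes expand() fills placeholders like str.format; the values of OutputDir, name and ref_chr, and the helpers wildcards_in and expand_like, are hypothetical stand-ins rather than Snakemake API.

```python
import string

# Hypothetical values standing in for the question's Python variables.
OutputDir = "results"
name = "sample1"
ref_chr = ["chr1", "chr2"]


def wildcards_in(path: str) -> set[str]:
    """Names of {placeholders} (i.e. wildcards) left in a path pattern."""
    return {field for _, field, _, _ in string.Formatter().parse(path) if field}


def expand_like(pattern: str, wildcard: str, values: list[str]) -> list[str]:
    """Simplified stand-in for expand() with a single list-valued wildcard."""
    return [pattern.format(**{wildcard: value}) for value in values]


# Question: the f-string fills OutputDir and name immediately and turns
# {{chr}} into {chr}, which expand_like then fills: fully concrete paths.
question_outputs = expand_like(f"{OutputDir}/split/{name}.{{chr}}.bam", "chr", ref_chr)

# Answer: no f-string, so {{OutputDir}} and {{name}} come out as wildcards.
answer_outputs = expand_like("{{OutputDir}}/split/{{name}}.{chr}.bam", "chr", ref_chr)

# split_bam's input comes from minimap2's output pattern.
minimap2_output = "{OutputDir}/{name}.sorted.bam"

# Wildcards the input needs but the question's outputs cannot supply:
print(wildcards_in(minimap2_output) - wildcards_in(question_outputs[0]))
```

The printed set contains OutputDir and name, which is why the error names OutputDir: the output paths of split_bam carry no wildcard values for the input pattern to borrow.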

Confusion about the composition of the `cmd` parameter used in Deno.run()

I tried to use Deno as a replacement for shell scripts, but got stuck.
I attempted to use Deno/TypeScript to do the equivalent of this:
docker run \
-d \
-v pgdata:/var/lib/postgresql/data \
--name pg \
-e POSTGRES_PASSWORD=123456 \
--rm \
-p 5432:5432 \
postgres
The TypeScript code looks like this:
function runCmd(s: string[]): Deno.Process {
  return Deno.run({ cmd: s, stdout: "piped", stderr: "piped" });
}

function runPg() {
  const cmd = [
    "docker",
    `run -d -v ${VOLUME}:/var/lib/postgresql/data --name pg -e POSTGRES_PASSWORD=${PASSWORD} --rm -p 5432:5432 postgres`
  ];
  return runCmd(cmd);
}
I added the execute bit to this .ts file and ran it in a terminal:
After this, I tried:
function runPg() {
  const cmd = [
    "docker",
    "run",
    `-d -v ${VOLUME}:/var/lib/postgresql/data --name pg -e POSTGRES_PASSWORD=${PASSWORD} --rm -p 5432:5432 postgres`
  ];
  return runCmd(cmd);
}
This moves the run subcommand out of the option string.
I got this:
I guess that Deno.run doesn't simply concatenate the passed-in command strings, but I cannot find enough information on this subject to fix the issue.
I haven't gone through the Rust source code for this API, but I thought it better to ask for help before trying the hard way.

You need to specify each part of the command as a separate string in the cmd array:
function runPg() {
  const cmd = [
    "docker",
    "run",
    "-d",
    "-v",
    `${VOLUME}:/var/lib/postgresql/data`,
    "--name",
    "pg",
    "-e",
    `POSTGRES_PASSWORD=${PASSWORD}`,
    "--rm",
    "-p",
    "5432:5432",
    "postgres"
  ];
  return runCmd(cmd);
}
This will send run as the first argument to docker instead of sending run -d … as the first argument.
You can also build your command as a single string and then use split(" ") as long as no arguments contain spaces.
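Deno.run itself is not reproduced here. As a rough analogue (an assumption: Python's subprocess.run with a list and no shell, which, like Deno.run, hands each element to the program as one argv entry), the sketch below lets a child process report the arguments it received. The helper argv_seen_by_child and the password value "a b" are hypothetical.

```python
import json
import subprocess
import sys

# Child program that reports exactly which argv entries it received.
ECHO_ARGV = "import json, sys; print(json.dumps(sys.argv[1:]))"


def argv_seen_by_child(args: list[str]) -> list[str]:
    """Start a child process with `args` appended to its command and return
    the argument list the child actually saw. No shell is involved, so each
    list element arrives as exactly one argv entry, spaces included."""
    result = subprocess.run(
        [sys.executable, "-c", ECHO_ARGV, *args],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)


# One element containing spaces stays one argument (what happened to "run -d ...").
print(argv_seen_by_child(["run", "-d -v pgdata:/var/lib/postgresql/data --rm"]))
# Splitting on " " works only while no single argument contains a space.
print(argv_seen_by_child("run -d --rm -p 5432:5432 postgres".split(" ")))
print(argv_seen_by_child("-e POSTGRES_PASSWORD=a b".split(" ")))
```

The last call shows the limitation mentioned above: a value with a space is torn into two arguments by split(" ").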

A follow-up on my trial-and-error journey on this topic.
While reading a book about Unix shell programming, I found that it explains how to help the shell tell a space inside an identifier apart from a space used as a delimiter. To cat a file named a b (with a space in the name), the command should be cat a\ b or, using quotes, cat 'a b'.
This gave me an idea of why my command does not work in Deno: each item in the cmd array is one identifier, and when I mixed delimiter spaces into an identifier, the command was interpreted the wrong way.
For example:
If I'd like to cat a file named a b, in Deno.run I need to use { cmd: ["cat", "a b"] }.
If I'd like to cat two files named a and b, in Deno.run I need to use { cmd: ["cat", "a", "b"] }.
Just remember that a space inside a command element counts as part of that element.
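Going the other way, Python's standard shlex module (used here only as an illustration; it is not part of Deno) converts between a shell-quoted command line and an argv list using the same quoting rules the book describes. to_argv and to_command_line are hypothetical wrapper names.

```python
import shlex


def to_argv(command_line: str) -> list[str]:
    """Split a shell-style command line into argv elements, honouring
    quotes and backslash escapes the way a POSIX shell would (no expansion)."""
    return shlex.split(command_line)


def to_command_line(argv: list[str]) -> str:
    """Quote each argv element so a shell would split it back the same way."""
    return shlex.join(argv)


print(to_argv("cat 'a b'"))            # one file named "a b"
print(to_argv(r"cat a\ b"))            # same thing, escaped with a backslash
print(to_argv("cat a b"))              # two files, a and b
print(to_command_line(["cat", "a b"]))
```

An argv list like the one Deno.run expects can be produced from a copied shell command this way, instead of splitting on spaces by hand.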

Shell script arguments

I just started writing shell scripts in Unix, so I am a total newbie.
I want to read the arguments given when the user runs the script,
for example:
sh script -a abc
I want to read that the user gave abc for the argument -a.
My code so far:
if ( $1 = "-a" )
then var=$2
fi
echo $var
I get an error.

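Boolean tests are done with the test command (a builtin in Bash, also available as an external program), which is mostly used through its alias [. Parentheses, by contrast, run their contents as a command in a subshell, so ( $1 = "-a" ) tries to execute a command named -a.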
if ( $1 = "-a" )
should become
if [ $1 = "-a" ]
if you use [ or
if test $1 = "-a"

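#!/bin/sh
# quote "$1" so the test still works when no argument is given
if [ "$1" = "-a" ]; then
    var=$2
fi
echo "$var"
Be careful with the spaces: [ is a command, so it needs a space after if, and spaces after [ and before ].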
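A minimal check of the difference, written in Python to match the other sketches here and assuming a POSIX sh is on PATH: run_sh (a hypothetical helper) passes the script to sh -c, with the words after it becoming $0, $1, $2. FIXED is the answer's test with $1 and $var quoted; BROKEN is the question's parenthesized version.

```python
import subprocess

# The answer's test, with "$1" and "$var" quoted.
FIXED = 'if [ "$1" = "-a" ]; then var=$2; fi; echo "$var"'
# The question's version: ( ... ) runs its contents as a command in a subshell.
BROKEN = 'if ( $1 = "-a" ); then var=$2; fi; echo "$var"'


def run_sh(script: str, *args: str) -> subprocess.CompletedProcess:
    """Run `script` with sh -c; $0 is set to "script" and args become $1, $2, ..."""
    return subprocess.run(
        ["sh", "-c", script, "script", *args],
        capture_output=True,
        text=True,
    )


print(repr(run_sh(FIXED, "-a", "abc").stdout))
broken = run_sh(BROKEN, "-a", "abc")
print(repr(broken.stdout), repr(broken.stderr))
```

BROKEN prints an empty line plus a "not found" error on stderr, because the subshell tries to run -a as a command.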

Problem in using shell for loop inside gnu make?

Consider the Makefile below:
all:
	@for x in y z; \
	do \
		for a in b c; \
		do \
			echo $$x$$a >> log_$$x; \
		done; \
	done
When I run this Makefile, two files are created, log_y and log_z. log_y contains "yb" and "yc"; similarly, log_z contains "zb" and "zc".
Actually, I want to create four files (log_y_b, log_y_c, log_z_b, log_z_c). For this, I modified the Makefile above as follows:
all:
	@for x in y z; \
	do \
		for a in b c; \
		do \
			echo $$x$$a >> log_$$x_$$a; \
		done; \
	done
But it creates only one file, log_. What do I have to do to create the four files?

Perhaps put braces around the variable names: it works on my system.
all:
	@for x in y z; \
	do \
		for a in b c; \
		do \
			echo $$x$$a >> log_$${x}_$${a}; \
		done; \
	done
You can also use foreach:
all:
	@$(foreach x,y z,$(foreach a,b c,echo $(x)$(a) >> log_$(x)_$(a);))

log_$$x_$$a in the Makefile turns into log_$x_$a for the shell, which is equivalent to log_${x_}${a}. The variable $x_ is undefined, however, so the shell substitutes the empty string for it.
Solution: Properly write the $x variable with curly braces around the name (${variablename}), i.e. for consistency's sake write log_${x}_${a} (or in Makefile style: log_$${x}_$${a}).
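The variable-name rule behind this can be modelled in a few lines of Python. This is a simplified sketch (only $name and ${name}, with unset variables expanding to the empty string), not a full shell parser; expand_vars and log_files are hypothetical helpers.

```python
import re

# Shell variable names: a letter or underscore, then letters, digits, underscores.
# An unbraced $name takes the LONGEST such run, so "$x_" names the variable "x_".
_PARAM = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}|\$([A-Za-z_][A-Za-z0-9_]*)")


def expand_vars(word: str, env: dict[str, str]) -> str:
    """Simplified $name / ${name} expansion; unset variables become ''."""
    return _PARAM.sub(lambda m: env.get(m.group(1) or m.group(2), ""), word)


def log_files(pattern: str) -> list[str]:
    """Redirect targets the nested for-loops would produce for a given pattern."""
    return [expand_vars(pattern, {"x": x, "a": a}) for x in ("y", "z") for a in ("b", "c")]


print(log_files("log_$x_$a"))      # unbraced: the x part disappears
print(log_files("log_${x}_${a}"))  # braced: four distinct names
```

Under this model the unbraced target loses the x part entirely, because the shell reads $x_ as one unset variable, while the braced one produces the four intended names.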

How to custom display prompt in KornShell to show hostname and current directory?

I am using KornShell (ksh) on Solaris and currently my PS1 env var is:
PS1="${HOSTNAME}:\${PWD} \$ "
And the prompt displays: hostname:/full/path/to/current/directory $
However, I would like it to display: hostname:directory $
In other words, how can I display just the hostname and the name of the current directory, e.g. tmp or ~ or public_html?

From the ksh man page, you want:
PS1="${HOSTNAME}:\${PWD##*/} \$ "
Tested on default ksh on SunOS 5.8
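${PWD##*/} removes the longest prefix of $PWD that matches the glob */, i.e. everything up to the last slash. Below is a plain-Python model, assuming fnmatch's glob rules are close enough to ksh's for this pattern (both let * match /); strip_longest_prefix and prompt are hypothetical helper names.

```python
from fnmatch import fnmatchcase


def strip_longest_prefix(value: str, pattern: str) -> str:
    """Model of ksh ${value##pattern}: drop the longest prefix matching the glob."""
    for end in range(len(value), -1, -1):  # try the longest candidate prefix first
        if fnmatchcase(value[:end], pattern):
            return value[end:]
    return value  # nothing matched: value is unchanged, as in the shell


def prompt(hostname: str, pwd: str) -> str:
    """Text the PS1 above would display: host:dir $ ."""
    return f"{hostname}:{strip_longest_prefix(pwd, '*/')} $ "


print(prompt("myhost", "/export/home/me/public_html"))
print(prompt("myhost", "/tmp"))
```

At the root directory the result is empty, so this prompt shows "host: $ "; one of the later answers checks for that case and prints / instead.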

Okay, a little old and a little late, but this is what I use in Kornshell:
PS1='$(print -n "`logname`@`hostname`:";if [[ "${PWD#$HOME}" != "$PWD" ]]; then print -n "~${PWD#$HOME}"; else print -n "$PWD"; fi; print "\n$ ")'
This makes a prompt that's equivalent to PS1="\u@\h:\w\n$ " in Bash.
For example:
qazwart@mybook:~
$ cd bin
qazwart@mybook:~/bin
$ cd /usr/local/bin
qazwart@mybook:/usr/local/bin
$
I like a two line prompt because I sometimes have very long directory names, and they can take up a lot of the command line. If you want a one line prompt, just leave off the "\n" on the last print statement:
PS1='$(print -n "`logname`@`hostname`:";if [[ "${PWD#$HOME}" != "$PWD" ]]; then print -n "~${PWD#$HOME}"; else print -n "$PWD"; fi; print "$ ")'
That's equivalent to PS1="\u@\h:\w$ " in Bash:
qazwart@mybook:~$ cd bin
qazwart@mybook:~/bin$ cd /usr/local/bin
qazwart@mybook:/usr/local/bin$
It's not quite as easy as setting up a Bash prompt, but you get the idea: put a command substitution in single quotes in PS1, and KornShell runs it each time it displays the prompt.
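The ~ logic in this PS1 can be modelled in plain Python. In this sketch, tilde_path mirrors the test "${PWD#$HOME}" != "$PWD" (for a literal $HOME, removing it as a prefix works like startswith), and ksh_prompt is a hypothetical helper that assembles the user@host:dir form.

```python
def tilde_path(pwd: str, home: str) -> str:
    """If removing $HOME as a prefix changes $PWD, show "~" plus the
    remainder; otherwise show $PWD unchanged."""
    rest = pwd[len(home):] if pwd.startswith(home) else pwd
    return "~" + rest if rest != pwd else pwd


def ksh_prompt(user: str, host: str, pwd: str, home: str, two_line: bool = True) -> str:
    """Hypothetical rendering of the user@host:dir prompt, one or two lines."""
    sep = "\n" if two_line else ""
    return f"{user}@{host}:{tilde_path(pwd, home)}{sep}$ "


print(ksh_prompt("qazwart", "mybook", "/home/qazwart/bin", "/home/qazwart"))
print(ksh_prompt("qazwart", "mybook", "/usr/local/bin", "/home/qazwart", two_line=False))
```

One consequence of the prefix test, in this model and in the ksh version: with HOME=/home/qazwart, the directory /home/qazwart2 also counts as being under HOME and is shown as ~2.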
For Solaris and other older versions of KornShell
I found that the above does not work on Solaris. Instead, you'll have to do it the really hackish way...
In your .profile, make sure that ENV="$HOME/.kshrc"; export ENV
is set. This is probably already set up correctly for you.
In your .kshrc file, you'll be doing two things:
You'll be defining a function called _cd. This function will change to the directory specified, and then set your PS1 variable based upon your pwd.
You'll be setting up an alias cd to run the _cd function.
This is the relevant part of the .kshrc file:
function _cd {
    logname=$(logname)    # Or however you can set the login name
    machine=$(hostname)   # Or however you set your host name
    directory=$1
    pattern=$2            # For "cd foo bar"
    #
    # First cd to the directory
    # We can use "\cd" to invoke the non-alias original version of the cd command
    #
    if [ "$pattern" ]
    then
        \cd "$directory" "$pattern"
    elif [ "$directory" ]
    then
        \cd "$directory"
    else
        \cd
    fi
    #
    # Now that we're in the directory, let's set our prompt
    #
    directory=$PWD
    shortName=${directory#$HOME}           # Possible subdirectory of $HOME
    if [ "$shortName" = "" ]               # This is the HOME directory
    then
        prompt="~$logname"                 # Or maybe just "~". Your choice
    elif [ "$shortName" = "$directory" ]   # Not a subdirectory of $HOME
    then
        prompt="$directory"
    else
        prompt="~$shortName"
    fi
    PS1="$logname@$machine:$prompt$ "      # You put it together the way you like
}
alias cd="_cd"
This sets your prompt to the equivalent of Bash's PS1="\u@\h:\w$ ". It isn't pretty, but it works.

Set ENV=~/.kshrc, and then in your .kshrc:
function _cd {
    \cd "$@"
    PS1=$(
        print -n "$LOGNAME@$HOSTNAME:"
        if [[ "${PWD#$HOME}" != "$PWD" ]]; then
            print -n "~${PWD#$HOME}"
        else
            print -n "$PWD"
        fi
        print "$ "
    )
}
alias cd=_cd
cd "$PWD"
Brad

HOST=`hostname`
PS1='$(print -n "[${USER}@${HOST%%.*} ";[[ "$HOME" == "$PWD" ]] && print -n "~" ||([[ "${PWD##*/}" == "" ]] && print -n "/" || print -n "${PWD##*/}");print "]$")'
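${HOST%%.*} is the suffix counterpart of ##: it removes the longest suffix matching .*, trimming a fully qualified host name at its first dot. Below is a plain-Python model of this prompt, with fnmatch standing in for ksh pattern matching; strip_longest_suffix, dir_label and prompt are hypothetical helper names.

```python
from fnmatch import fnmatchcase


def strip_longest_suffix(value: str, pattern: str) -> str:
    """Model of ksh ${value%%pattern}: drop the longest suffix matching the glob."""
    for start in range(len(value) + 1):  # smallest start = longest suffix
        if fnmatchcase(value[start:], pattern):
            return value[:start]
    return value


def dir_label(pwd: str, home: str) -> str:
    """Directory part of the prompt: ~ for $HOME, / for the root, else the last component."""
    if pwd == home:
        return "~"
    last = pwd.rsplit("/", 1)[-1]  # same result as ${PWD##*/}
    return last if last else "/"


def prompt(user: str, host: str, pwd: str, home: str) -> str:
    return f"[{user}@{strip_longest_suffix(host, '.*')} {dir_label(pwd, home)}]$"


print(prompt("qazwart", "mybook.example.com", "/usr/local/bin", "/home/qazwart"))
print(prompt("qazwart", "mybook", "/", "/home/qazwart"))
```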

PS1=`id -un`@`hostname -s`:'$PWD'$

And if you split most of your work between two shells (ksh and the Bourne sh) and want a directory-tracking display on your command line, PWD can be substituted easily in ksh; if you use /usr/xpg4/bin/sh as your sh SHELL, it will work there as well.

Try this:
PS1="\H:\W"
More information: "How to: Change / Setup bash custom prompt". I know you said ksh; note that \H and \W are Bash prompt escapes, which ksh does not expand, so this works as-is only in Bash.
