if statement to select a channel in input block with nextflow - pipeline

I am currently writing my first Nextflow pipeline and I need to run different processes depending on the parameters.
In fact, in one process, I would like to select the channel the input comes from.
I tried this:
process foo{
input:
if(params.bar && params.bar2)
{
file reads from channel1.flatten()
}
else
{
file reads from channel_2.flatten()
}
output:
publishDir "$params.output_dir"
file "output_file" into channel_3
"""
my command line
"""
I get this error and I don't understand why:
No such variable: reads
Is there a way to do something like this?
Thanks!

It's a bit of a weird error, but basically you just need to make sure your input declaration matches the required syntax:
input:
<input qualifier> <input name> [from <source channel>] [attributes]
One solution might be to use the ternary operator to replace your if/else branch, for example:
ch1 = Channel.of( 'hello', 'world' )
ch2 = Channel.of( 1, 3, 5, 7, 9 )
params.foo = false
params.bar = false
process test {
echo true
input:
val myval from ( params.foo && params.bar ? ch1 : ch2 )
"""
echo -n "${myval}"
"""
}
Results:
$ nextflow run script.nf
N E X T F L O W ~ version 21.04.3
Launching `script.nf` [shrivelled_stone] - revision: 7b3f3a51df
executor > local (5)
[3b/fafa5e] process > test (2) [100%] 5 of 5 ✔
1
5
9
7
3
$ nextflow run script.nf --foo --bar
N E X T F L O W ~ version 21.04.3
Launching `script.nf` [irreverent_mahavira] - revision: 7b3f3a51df
executor > local (2)
[d2/09d418] process > test (1) [100%] 2 of 2 ✔
world
hello
Note that the new DSL 2 decouples the channel inputs from the process declaration, which might help to keep things readable, especially if the condition or action statements are more complex. For example:
nextflow.enable.dsl=2
params.foo = false
params.bar = false
process test {
echo true
input:
val myval
"""
echo -n "${myval}"
"""
}
workflow {
ch1 = Channel.of( 'hello', 'world' )
ch2 = Channel.of( 1, 3, 5, 7, 9 )
if( params.foo && params.bar ) {
test( ch1 )
} else {
test( ch2 )
}
}
Results:
$ nextflow run script.nf
N E X T F L O W ~ version 21.04.3
Launching `script.nf` [nauseous_pare] - revision: e1c4770ff1
executor > local (5)
[36/49d8da] process > test (4) [100%] 5 of 5 ✔
9
1
3
5
7
$ nextflow run script.nf --foo --bar
N E X T F L O W ~ version 21.04.3
Launching `script.nf` [goofy_euler] - revision: e1c4770ff1
executor > local (2)
[56/e635e8] process > test (2) [100%] 2 of 2 ✔
world
hello

Related

How to run an ipynb notebook function from inside the notebook with arguments

I have a Jupyter notebook function:
def add_args(a: int, b: int) -> int:
return a + b
a = sys.argv[1] # 0 is file name
b = sys.argv[2]
print("sum of {} and {} is {}".format(a, b, add_args(a, b)))
I want to execute this function from within the notebook. I want to run only this function, and it should print the answer like this: sum of 1 and 2 is 3.
Thank you.
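One detail worth noting (an observation, not from the original question): sys.argv entries are strings, so passing them straight to add_args would concatenate "1" and "2" into "12" rather than add them. A minimal sketch of calling the function directly inside the notebook, with explicit int conversion, might look like:

```python
def add_args(a: int, b: int) -> int:
    return a + b

# Inside the notebook, call the function directly instead of going
# through sys.argv; convert to int first, since argv values are strings.
a, b = int("1"), int("2")
print("sum of {} and {} is {}".format(a, b, add_args(a, b)))  # sum of 1 and 2 is 3
```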

check if a tree (nested list) is a binary search tree

I am trying to implement the solution from here in R, but I cannot figure out how to do it properly in R:
check if a tree is a binary search tree
I converted this tree to a list:
1
/ \
2 2
/ \ / \
3 4 4 3
tree <- list("node"=1, "left"=list("node"=2, "left"=list("node"=3), "right"=list("node"=4)), "right"=list("node"=2, "left"=list("node"=3), "right"=list("node"=4)) )
using the data.tree package I can plot it:
> data.tree::FromListSimple(tree, nodeName = "1")
levelName
1 1
2 ¦--left
3 ¦ ¦--left
4 ¦ °--right
5 °--right
6 ¦--left
7 °--right
I tried to translate the Java version from the link above to R but I cannot get it to work:
isBST <- function(node, mini, maxi) {
if(is.null(node)) return(TRUE)
if(node < mini | node > maxi) return(FALSE)
return(isBST(left, mini, node-1) & isBST(right, node+1, maxi))
}
isBST(tree, -10, 10)
The simplest way to check if a tree is binary is simply:
tree$isBinary
The function takes advantage of the data.tree facility:
isBinary = function() {
all(2 == Get(Traverse(self, filterFun = function(x) !x$isLeaf), "count"))
}
Obviously, you can also implement it yourself, e.g. with recursion as in the example linked. Though I would not recommend it, because recursion is typically slow in R.
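For reference, the min/max-bounds recursion from the linked answer can be sketched in Python (a sketch of the algorithm only, not the data.tree approach; the dict layout mirrors the nested list from the question):

```python
def is_bst(node, lo=float("-inf"), hi=float("inf")):
    """Check the BST property by narrowing the allowed value range
    as we descend: left children must stay below the parent's value,
    right children above it."""
    if node is None:
        return True
    v = node["node"]
    if not (lo < v < hi):
        return False
    return (is_bst(node.get("left"), lo, v)
            and is_bst(node.get("right"), v, hi))

# The tree from the question: symmetric, so not a BST.
tree = {"node": 1,
        "left":  {"node": 2, "left": {"node": 3}, "right": {"node": 4}},
        "right": {"node": 2, "left": {"node": 3}, "right": {"node": 4}}}
print(is_bst(tree))  # False
```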

TCL recursively call procedure

I'm a beginner at Tcl, and while trying to build the GCD algorithm I ran into some problems I'd like some help with:
how can I call a proc inside a proc recursively, like so:
proc Stein_GCD { { u 0 } { v 0 } } {
if { $v == 0 } {
puts "$u\t\t$v\t\t$v"
}
if { [expr { $v % 2 } && { $u % 2 } ] == 0 } {
return [expr 2 * ${Stein_GCD 1 0} ]
}
}
set a [Stein_GCD 2 2 ]
puts $a
as you can see, I made the proc to evaluate GCD (the code does not make any sense because I'm illustrating the issue with an example), and I'm trying to recursively call the proc again to continue evaluating. Notice that I made an if statement that should reach the Stein_GCD 1 0 call, yet the Tcl 8.6.6 online EDA emulator says:
can't read "Stein_GCD 1 0": no such variable
while executing
"expr 2 * ${Stein_GCD 1 0} "
(procedure "Stein_GCD" line 5)
invoked from within
"Stein_GCD 2 2 "
invoked from within
"set a [Stein_GCD 2 2 ]"
(file "main.tcl" line 7)
Can you tell me how to efficiently recursively call a proc, and where was my mistake?
will gladly provide more info in the case I did a bad job at explaining.
The error can't read "Stein_GCD 1 0": indicates that you are treating the call as a variable name instead of a command. The problem line:
return [expr 2 * ${Stein_GCD 1 0} ]
is not written correctly: ${Stein_GCD 1 0} is not a variable.
You should have:
return [expr 2 * [Stein_GCD 1 0] ]
You want the result of Stein_GCD 1 0, so square brackets (command substitution) should be used.

In this akka stream example's running result, why priority job comes after normal job?

The following example is from the Akka Streams reference documentation.
import akka.actor.ActorSystem
import akka.stream._
import akka.stream.scaladsl._
/**
* Created by lc on 2016/1/2.
*/
// A shape represents the input and output ports of a reusable
// processing module
case class PriorityWorkerPoolShape[In, Out](
jobsIn: Inlet[In],
priorityJobsIn: Inlet[In],
resultsOut: Outlet[Out]) extends Shape {
// It is important to provide the list of all input and output
// ports with a stable order. Duplicates are not allowed.
override val inlets: scala.collection.immutable.Seq[Inlet[_]] =
jobsIn :: priorityJobsIn :: Nil
override val outlets: scala.collection.immutable.Seq[Outlet[_]] =
resultsOut :: Nil
// A Shape must be able to create a copy of itself. Basically
// it means a new instance with copies of the ports
override def deepCopy() = PriorityWorkerPoolShape(
jobsIn.carbonCopy(),
priorityJobsIn.carbonCopy(),
resultsOut.carbonCopy())
// A Shape must also be able to create itself from existing ports
override def copyFromPorts(
inlets: scala.collection.immutable.Seq[Inlet[_]],
outlets: scala.collection.immutable.Seq[Outlet[_]]) = {
assert(inlets.size == this.inlets.size)
assert(outlets.size == this.outlets.size)
// This is why order matters when overriding inlets and outlets.
PriorityWorkerPoolShape[In, Out](inlets(0).as[In], inlets(1).as[In], outlets(0).as[Out])
}
}
import akka.stream.FanInShape.{Init, Name}
class PriorityWorkerPoolShape2[In, Out](_init: Init[Out] = Name("PriorityWorkerPool"))
extends FanInShape[Out](_init) {
protected override def construct(i: Init[Out]) = new PriorityWorkerPoolShape2(i)
val jobsIn = newInlet[In]("jobsIn")
val priorityJobsIn = newInlet[In]("priorityJobsIn")
// Outlet[Out] with name "out" is automatically created
}
object PriorityWorkerPool {
def apply[In, Out](
worker: Flow[In, Out, Any],
workerCount: Int): Graph[PriorityWorkerPoolShape[In, Out], Unit] = {
FlowGraph.create() { implicit b ⇒
import FlowGraph.Implicits._
val priorityMerge = b.add(MergePreferred[In](1))
val balance = b.add(Balance[In](workerCount))
val resultsMerge = b.add(Merge[Out](workerCount))
// After merging priority and ordinary jobs, we feed them to the balancer
priorityMerge ~> balance
// Wire up each of the outputs of the balancer to a worker flow
// then merge them back
for (i <- 0 until workerCount)
balance.out(i) ~> worker ~> resultsMerge.in(i)
// We now expose the input ports of the priorityMerge and the output
// of the resultsMerge as our PriorityWorkerPool ports
// -- all neatly wrapped in our domain specific Shape
PriorityWorkerPoolShape(
jobsIn = priorityMerge.in(0),
priorityJobsIn = priorityMerge.preferred,
resultsOut = resultsMerge.out)
}
}
}
object ReusableGraph extends App {
implicit val system = ActorSystem("UsingGraph")
implicit val materializer = ActorMaterializer()
val worker1 = Flow[String].map("step 1 " + _)
val worker2 = Flow[String].map("step 2 " + _)
RunnableGraph.fromGraph(FlowGraph.create() { implicit b =>
import FlowGraph.Implicits._
val priorityPool1 = b.add(PriorityWorkerPool(worker1, 4))
val priorityPool2 = b.add(PriorityWorkerPool(worker2, 2))
Source(1 to 10).map("job: " + _) ~> priorityPool1.jobsIn
Source(1 to 10).map("priority job: " + _) ~> priorityPool1.priorityJobsIn
priorityPool1.resultsOut ~> priorityPool2.jobsIn
Source(1 to 10).map("one-step, priority " + _) ~> priorityPool2.priorityJobsIn
priorityPool2.resultsOut ~> Sink.foreach(println)
ClosedShape
}).run()
}
build.sbt
name := "AkkaStream"
version := "1.0"
scalaVersion := "2.11.7"
libraryDependencies ++=Seq(
"com.typesafe.akka" % "akka-actor_2.11" % "2.4.1",
"com.typesafe.akka" % "akka-testkit_2.11" % "2.4.1",
"com.typesafe.akka" % "akka-stream-experimental_2.11" % "2.0-M2"
)
I run the code, and get the results as follows.
step 2 one-step, priority 1
step 2 one-step, priority 3
step 2 one-step, priority 2
step 2 one-step, priority 5
step 2 one-step, priority 4
step 2 one-step, priority 6
step 2 one-step, priority 7
step 2 one-step, priority 8
step 2 one-step, priority 10
step 2 one-step, priority 9
step 2 step 1 job: 2
step 2 step 1 job: 1
step 2 step 1 job: 4
step 2 step 1 job: 6
step 2 step 1 job: 8
step 2 step 1 job: 10
step 2 step 1 priority job: 2
step 2 step 1 priority job: 4
step 2 step 1 priority job: 6
step 2 step 1 priority job: 8
step 2 step 1 priority job: 10
step 2 step 1 job: 3
step 2 step 1 job: 5
step 2 step 1 job: 7
step 2 step 1 job: 9
step 2 step 1 priority job: 1
step 2 step 1 priority job: 3
step 2 step 1 priority job: 5
step 2 step 1 priority job: 7
step 2 step 1 priority job: 9
I have two questions:
1. "step 2 one-step" comes first, yes. But "step 2 step 1 job" should come after "step 2 step 1 priority job", so why does it come out before it?
2. There is only one instance of the worker; does the worker part run concurrently or not?
Question is a little old but answering anyway since I stumbled on the same thing.
I think it's just because your computer is fast enough that once it hits this code:
Source(1 to 10).map("job: " + _) ~> priorityPool1.jobsIn
Source(1 to 10).map("priority job: " + _) ~> priorityPool1.priorityJobsIn
by the time it sends the second 10 numbers, the first 10 have already been processed. I think that because of this issue the example was changed to 100, but I still see results similar to yours on my machine. If you slow it down using throttling, you'll see the results you'd expect:
Source(1 to 10)
.throttle(1, 0.1.second, 1, ThrottleMode.shaping)
.map("job: " + _) ~> priorityPool1.jobsIn
Source(1 to 10)
.throttle(1, 0.1.second, 1, ThrottleMode.shaping)
.map("priority job: " + _) ~> priorityPool1.priorityJobsIn
So it's not that the results are incorrect; in parallel processing, your computer may simply be too fast.
Of course, throttling is used here only to slow the computation down so we can see the learning example working; it should not be used in production unless slowing down the computation is actually what you want.

How to calculate the tree the results by combining individual leaf paths?

Let's say I have an input file where each line contains the path from the root (A) to a leaf
echo "A\tB\tC\nA\tB\tD\nA\tE" > lines.txt
A B C
A B D
A E
How can I easily generate the resulting tree?: (A(B(C,D),E))
I'd like to use GNU tools (awk, sed, etc.) because they tend to work better with large files, but an R script would also work. The R input would be:
# lines <- lapply(readLines("lines.txt"), strsplit, " +")
lines <- list(list(c("A", "B", "C")), list(c("A", "B", "D")), list(c("A","E")))
In Perl:
#!/usr/bin/env perl
use strict;
my $t = {};
while (<>) {
my @a = split;
my $t1 = $t;
while (my $a = shift @a) {
$t1->{$a} = {} if not exists $t1->{$a};
$t1 = $t1->{$a};
}
}
print &p($t)."\n";
sub p {
my ($t) = @_;
return
unless keys %$t;
return '('
. join(',', map { $_ . p($t->{$_}) } sort keys %$t)
. ')';
}
This script returns:
% cat <<EOF | perl l.pl
A B C
A B D
A E
EOF
(A(B(C,D),E))
Note that this script, due to recursion in p is not at all suited for large datasets. But that can be easily resolved by turning that into a double for loop, like in the first while above.
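The same idea (walk each path, creating nested entries on demand, then serialize the nesting) can be sketched in Python, assuming whitespace-separated paths as in the sample input:

```python
def build(paths):
    """Insert each whitespace-separated path into a tree of nested dicts."""
    root = {}
    for path in paths:
        node = root
        for part in path.split():
            node = node.setdefault(part, {})
    return root

def serialize(node):
    """Render the nested dicts as the (A(B(C,D),E)) notation, children sorted."""
    if not node:
        return ""
    return "(" + ",".join(k + serialize(v) for k, v in sorted(node.items())) + ")"

lines = ["A B C", "A B D", "A E"]
print(serialize(build(lines)))  # prints (A(B(C,D),E))
```

Note that serialize is still recursive; for very deep trees it could be rewritten with an explicit stack, as the answer above suggests for the Perl version.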
Why do it the easy way, if you can use a Bourne shell script instead? Note, this is not even Bash, this is plain old Bourne shell, without arrays...
#!/bin/sh
#
# A B C
# A B D
# A E
#
# "" vs "A B C" -> 0->3, ident 0 -> -0+3 -> "(A(B(C"
# "A B C" vs "A B D" -> 3->3, ident 2 -> -1+1 -> ",D"
# "A B D" vs "A E" -> 3->2, ident 1 -> -2+1 -> "),E"
# "A E" vs. endc -> 2->0, ident 0 -> -2+0 -> "))"
#
# Result: (A(B(C,D),E))
#
# Input stream is a path per line, path segments separated with spaces.
process_line () {
local line2="$@"
n2=$#
set -- $line1
n1=$#
s=
if [ $n2 = 0 ]; then # last line (empty)
for s1 in $line1; do
s="$s)"
done
else
sep=
remainder=false
for s2 in $line2; do
if ! $remainder; then
if [ "$1" != $s2 ]; then
remainder=true
if [ $# = 0 ]; then # only children
sep='('
else # sibling to an existing element
sep=,
shift
for s1 in "$@"; do
s="$s)"
done
fi
fi
fi
if $remainder; then # Process remainder as mismatch
s="$s$sep$s2"
sep='('
fi
shift # remove the first element of line1
done
fi
result="$result$s"
}
result=
line1=
(
cat - \
| sed -e 's/[[:space:]]\+/ /' \
| sed -e '/^$/d' \
| sort -u
echo '' # last line marker
) | while read line2; do
process_line $line2
line1="$line2"
test -n "$line2" \
|| echo $result
done
This produces the correct answer for two different files (l.sh is the shell version, l.pl the version in Perl):
% for i in l l1; do cat $i; ./l.sh < $i; ./l.pl < $i; echo; done
A
A B
A B C D
A B E F
A G H
A G H I
(A(B(C(D),E(F)),G(H(I))))
(A(B(C(D),E(F)),G(H(I))))
A B C
A B D
A E
(A(B(C,D),E))
(A(B(C,D),E))
Hoohah!
Okay, so I think I got it:
# input
lines <- c(list(c("A", "B", "C")), list(c("A", "B", "D")), list(c("A","E")))
# generate children
generate_children <- function(lines){
children <- list()
for (line in lines) {
for (index in 1:(length(line)-1)){
parent <- line[index]
next_child <- line[index + 1]
if (is.null(children[[parent]])){
children[[parent]] <- next_child
} else {
if (!(next_child %in% children[[parent]])){
children[[parent]] <- c(children[[parent]], next_child)
}
}
}
}
children
}
expand_children <- function(current_parent, children){
if (current_parent %in% names(children)){
expanded_children <- sapply(children[[current_parent]], function(current_child){
expand_children(current_child, children)
}, USE.NAMES = FALSE)
output <- setNames(list(expanded_children), current_parent)
} else {
output <- current_parent
}
output
}
children <- generate_children(lines)
root <- names(children)[1]
tree <- expand_children(root, children)
dput(tree)
# structure(list(A = structure(list(B = c("C", "D"), "E"), .Names = c("B",""))), .Names = "A")
Is there a simpler answer?
