I'd like to be able to dynamically execute some tasks in sbt.
So, I'm using the command line:
sbt taskA taskB taskC
It works, but all of the tasks are executed sequentially.
On the other hand, if I write this inside build.sbt:
val allTasks = taskKey[Unit]("All")

allTasks := {
  taskA.value
  taskB.value
  taskC.value
}
all of them are executed in parallel.
How can I get this behavior on the command line?
You can use the all command:
build.sbt
TaskKey[String]("taskA") := { println("A start"); Thread.sleep(3000); println("A end"); "a" }
TaskKey[String]("taskB") := { println("B start"); Thread.sleep(2000); println("B end"); "b" }
TaskKey[String]("taskC") := { println("C start"); Thread.sleep(1000); println("C end"); "c" }
And running it:
> all taskA taskB taskC
C start
A start
B start
C end
B end
A end
This isn't currently possible from the command line.
One thing that you can do from the sbt shell is define a task on the fly then run it:
$ set TaskKey[Unit]("allTasks") := { val a = taskA.value ; val b = taskB.value ; val c = taskC.value ; () }
[info] Defining *:allTasks
$ allTasks
[info] Running Task A in parallel
[info] Running Task B in parallel
[info] Running Task C in parallel
My Rust code is as follows:
#[tokio::main]
pub async fn main() {
    for i in 1..10 {
        tokio::spawn(async move {
            println!("{}", i);
        });
    }
}
When I run the code, I expect it to print 1 to 10 in a random order.
But it only prints a few random numbers:
1
3
2
Terminal will be reused by tasks, press any key to close it.
Why is this happening?
https://docs.rs/tokio/latest/tokio/fn.spawn.html warns that:
There is no guarantee that a spawned task will execute to completion. When a runtime is shutdown, all outstanding tasks are dropped, regardless of the lifecycle of that task.
One solution that should work is to store all of the JoinHandles and then await all of them:
let mut join_handles = Vec::with_capacity(10);
for i in 1..10 {
    join_handles.push(tokio::spawn(async move {
        println!("{}", i);
    }));
}

for join_handle in join_handles {
    join_handle.await.unwrap();
}
P.S. In 1..10, the end is exclusive, so the last number is 9. You might want 1..=10 instead. (see https://doc.rust-lang.org/reference/expressions/range-expr.html)
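A minimal self-contained illustration of the range semantics (plain std, no tokio needed):

```rust
fn main() {
    // `1..10` excludes the upper bound: 1, 2, ..., 9 (nine numbers).
    let exclusive: Vec<i32> = (1..10).collect();
    assert_eq!(exclusive.len(), 9);
    assert_eq!(exclusive.last(), Some(&9));

    // `1..=10` includes it: 1, 2, ..., 10 (ten numbers).
    let inclusive: Vec<i32> = (1..=10).collect();
    assert_eq!(inclusive.len(), 10);
    assert_eq!(inclusive.last(), Some(&10));

    println!("range checks passed");
}
```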
I have a loop:
let grace = 2usize;
for i in 0..100 {
    if i % 10 == 0 {
        expensive_function()
    } else {
        cheap_function()
    }
}
The goal is that when the loop hits expensive_function(), it runs asynchronously and allows grace further iterations before waiting on it.
If expensive_function() triggers at iteration 10, the loop could then run iterations 11 and 12 before needing to wait for the iteration-10 call to expensive_function() to finish.
How could I do this?
In my case expensive_function() is effectively:
fn expensive_function(&b) -> Vec<_> {
    return b.iter().map(|a| a.inner_expensive_function()).collect();
}
As such I plan to use multi-threading within this function.
When you start the expensive computation, store the resulting future in a variable, along with the deadline time to wait for the result. Here, I use an Option of a tuple:
use std::{thread, time::Duration};
use tokio::task; // 0.2.21, features = ["full"]

#[tokio::main]
async fn main() {
    let grace_period = 2usize;

    let mut pending = None;
    for i in 0..50 {
        if i % 10 == 0 {
            assert!(pending.is_none(), "Already had pending work");
            let future = expensive_function(i);
            let deadline = i + grace_period;
            pending = Some((deadline, future));
        } else {
            cheap_function(i);
        }

        if let Some((deadline, future)) = pending.take() {
            if i == deadline {
                future.await.unwrap();
            } else {
                pending = Some((deadline, future));
            }
        }
    }
}

fn expensive_function(n: usize) -> task::JoinHandle<()> {
    task::spawn_blocking(move || {
        println!("expensive_function {} start", n);
        thread::sleep(Duration::from_millis(500));
        println!("expensive_function {} done", n);
    })
}

fn cheap_function(n: usize) {
    println!("cheap_function {}", n);
    thread::sleep(Duration::from_millis(1));
}
This generates the output of
cheap_function 1
expensive_function 0 start
cheap_function 2
expensive_function 0 done
cheap_function 3
cheap_function 4
cheap_function 5
Since you did not provide definitions of expensive_function and cheap_function, I have provided appropriate ones.
One tricky thing here is that I needed to add the sleep call in the cheap_function. Without it, my OS never schedules the expensive thread until it's time to poll it, effectively removing any parallel work. In a larger program, the OS is likely to schedule the thread simply because more work will be done by cheap_function. You might also be able to use thread::yield_now to the same effect.
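The same deadline-tracking bookkeeping can also be sketched with plain std::thread and no tokio dependency, which may make the pattern easier to see in isolation; the workloads here are hypothetical stand-ins, not your real functions:

```rust
use std::thread;

// Sketch of the grace-period pattern with plain threads (workloads are stand-ins).
// Returns the expensive results in the order they were awaited.
fn run(grace_period: usize) -> Vec<usize> {
    let mut pending: Option<(usize, thread::JoinHandle<usize>)> = None;
    let mut results = Vec::new();

    for i in 0..30 {
        if i % 10 == 0 {
            assert!(pending.is_none(), "already had pending work");
            // Start the expensive work on its own thread and record its deadline.
            let handle = thread::spawn(move || i * i); // stand-in for the real work
            pending = Some((i + grace_period, handle));
        }
        // Cheap per-iteration work would go here.

        // Once the grace period is used up, block on the pending result.
        if let Some((deadline, handle)) = pending.take() {
            if i >= deadline {
                results.push(handle.join().unwrap());
            } else {
                pending = Some((deadline, handle));
            }
        }
    }
    results
}

fn main() {
    // Expensive work runs at i = 0, 10, 20 and is joined two iterations later.
    assert_eq!(run(2), vec![0, 100, 400]);
    println!("ok");
}
```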
See also:
How to create a dedicated threadpool for CPU-intensive work in Tokio?
How do I synchronously return a value calculated in an asynchronous Future in stable Rust?
What is the best approach to encapsulate blocking I/O in future-rs?
How can I programmatically (in build.sbt) find all the subprojects of the current root project in sbt 0.13?
(I have not tried Project.componentProjects yet, because it's new in sbt 1.0).
lazy val root = (project in file(".") ... )

val myTask = taskKey[Unit]("some description")

myTask := {
  val masterRoot = baseDirectory.value
  // This does not work:
  // val subProjects: Seq[ProjectReference] = root.aggregate
  // So I tried to specify the subproject list explicitly; still does not work:
  val subProjects = Seq[Project](subPrj1)
  subProjects.foreach { subproject =>
    // All of this works if "subproject" is hard-coded to "subPrj1"
    val subprojectTarget = target.in(subproject).value / "classes"
    val cp = (dependencyClasspath in (subproject, Compile, compile)).value
  }
}
Got these errors:
build.sbt: error: Illegal dynamic reference: subproject
val subprojectTarget = target.in(subproject).value / "classes"
^
build.sbt: error: Illegal dynamic reference: subproject
val cp = (dependencyClasspath in(subproject, Compile, compile)).value
You can access a list of all subprojects via buildStructure.value.allProjectRefs.
The other part of your problem is an awful issue that I've also faced quite often. I was able to work around such problems by first creating a List[Task[A]] and then using a recursive function to lift it into a Task[List[A]].
def flattenTasks[A](tasks: Seq[Def.Initialize[Task[A]]]): Def.Initialize[Task[List[A]]] =
  tasks.toList match {
    case Nil => Def.task { Nil }
    case x :: xs => Def.taskDyn {
      flattenTasks(xs) map (x.value :: _)
    }
  }

myTask := {
  val classDirectories: List[File] = Def.taskDyn {
    flattenTasks {
      for (project ← buildStructure.value.allProjectRefs)
        yield Def.task { (target in project).value / "classes" }
    }
  }.value
}
I've used this approach e.g. here: utility methods actual usage
In SBT, I fork a Java process with:
class FilteredOutput extends FilterOutputStream(System.out) {
  var buf = ArrayBuffer[Byte]()

  override def write(b: Int) {
    buf.append(b.toByte)
    if (b == '\n'.toInt)
      flush()
  }

  override def flush() {
    if (buf.nonEmpty) {
      val arr = buf.toArray
      val txt = try new String(arr, "UTF-8") catch { case NonFatal(ex) ⇒ "" }
      if (!txt.startsWith("pydev debugger: Unable to find real location for"))
        out.write(arr)
      buf.clear()
    }
    super.flush()
  }
}

var process = Option.empty[Process]

process = Some(Fork.java.fork(ForkOptions(outputStrategy = new FilteredOutput()), Seq("my.company.MyClass")))
as a result of a custom task.
Later on, I terminate it with:
process.map { p =>
  log info "Killing process"
  p.destroy()
}
by means of another custom task.
The result is that SBT no longer accepts input and gets blocked. Ctrl+C is the only way of restoring control, but SBT dies as a consequence.
The problem has to do with the custom output strategy, that filters some annoying messages.
With jstack I haven't seen any deadlock.
SBT version 0.13.9.
The solution is to avoid closing System.out:
class FilteredOutput extends FilterOutputStream(System.out) {
  var buf = ArrayBuffer[Byte]()

  override def write(b: Int) {
    ...
  }

  override def flush() {
    ...
  }

  override def close() {}
}
I would like to have a method that dispatches an async task and returns immediately; I don't need to wait for the result.
I'd like something like this to work:
/**
 * Runs a job and returns the job id for later monitoring.
 */
def int runJob() {
    int jobId = createJob() // returns immediately
    task {
        doSomethingThatTakesSomeTime()
    }.then { stepResult -> doSomethingElse(stepResult) }
    return jobId
}
In the situation above, the task won't run, as there's no call to .get(). However, if I do call .get(), the method will not return jobId until the task is finished.
How can I dispatch the task and still return immediately?
You can run this example as a Groovy script:
@Grapes(
    @Grab(group='org.codehaus.gpars', module='gpars', version='1.2.1')
)
import java.util.concurrent.*
import groovyx.gpars.*

def doSomethingThatTakesSomeTime() {
    println "calculating..."
    for (long i : 0..100) {
        Thread.sleep(i)
    }
    println "*done*"
    "Done with doSomethingThatTakesSomeTime"
}

def doSomethingElse() {
    for (int x : 0..1000) print "."
    println "doSomethingElse done."
}
/**
 * Runs a job and returns a Future for later monitoring.
 */
def runJob() {
    GParsPool.withPool {
        Future future = createJob() // returns immediately
        doSomethingElse() // do something else while the async process is running
        // OK, that's done, but the longer-running process is still going; return the future
        future
    }
}
Future createJob() {
    // Create a new closure, which starts the original closure on a thread pool
    Closure asyncFunction = { doSomethingThatTakesSomeTime() }.async()
    // Invoke the function, return a Future
    asyncFunction()
}
def job = runJob()
//println "\n\nResult is: " + job.get()
If you run the script as-is, the long-running job does print *done*, indicating it ran to completion, even though the line at the bottom that calls Future.get() is commented out and never executed.
If you uncomment the last line, you will see the result printed once the job completes, as a result of calling Future.get().
After reading @pczeus' answer and Jérémie B's comment, I came up with this:
import static groovyx.gpars.dataflow.Dataflow.task

def int longTask() {
    def counter = 0
    10.times {
        println "longTask_${counter}"
        counter++
        sleep 10
    }
    counter
}

def int getSomeString() {
    def jobId = 55
    task {
        longTask()
    }.then { num -> println "completed running ${num} times" }
    return jobId
}

println getSomeString()
sleep 2000
This prints:
longTask_0
55
longTask_1
longTask_2
longTask_3
longTask_4
longTask_5
longTask_6
longTask_7
longTask_8
longTask_9
completed running 10 times
Which is what I intended: longTask() runs in the background, getSomeString() returns without waiting for the long task, and as long as the program is still running (hence the sleep 2000), even the closure in the then part is executed.