Julia Distributed, redundant iterations appearing - julia

I ran
mpiexec -n $nprocs julia --project myfile.jl
on a cluster, where myfile.jl has the following form
using Distributed; using Dates; using JLD2; using LaTeXStrings
@everywhere begin
    using SharedArrays; using QuantumOptics; using LinearAlgebra; using Plots; using Statistics; using DifferentialEquations; using StaticArrays
    # Defining some other functions and SharedArrays to be used later, e.g.
    MySharedArray=SharedArray{SVector{Nt,Float64}}(Np,Np)
end
@sync @distributed for pp in 1:Np^2
    for jj in 1:Nj
        # do some stuff with local variables
        for tt in 1:Nt
            # do some stuff with local variables
        end
    end
    MySharedArray[pp]=... # using linear indexing
    println("$pp finished")
end
timestr=Dates.format(Dates.now(), "yyyy-mm-dd-HH:MM:SS")
filename="MyName"*timestr
@save filename*".jld2"
#later on, some other small stuff like making and saving a figure. (This does give an error "no method matching heatmap_edges(::Surface{Array{Float64,2}}, ::Symbol)" but I think that this is a technical thing about Plots so not very related to the bigger issue here)
However, when looking at the output, a few issues make me conclude that something is wrong:
The "$pp finished" output is repeated many times for each value of pp. The number of repetitions appears to equal $nprocs = 32.
Despite the code not being finished, "MyName" files are generated. There should be one, but I get a dozen of them, each with a different timestr component.
EDIT: two more things that I can add:
The contents of the different "MyName" files are not identical, but this is expected since random numbers are used in the inner loops. There are 28 of them, a number that I don't easily recognize except that it's again close to the 32 of $nprocs.
Earlier, I wrote that the walltime was exceeded, but this turns out not to be true. The .o file ends with "BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES ... EXIT CODE :9", shortly after the last output file.
$nprocs is obtained in the pbs script through
#PBS -l select=1:ncpus=32:mpiprocs=32
nprocs=`cat $PBS_NODEFILE|wc -l`

As pointed out by adamslc on the Julia discourse, the proper way to use Julia on a cluster is to either
Start a session with one core from the job script, add more with addprocs() in the Julia script itself
Use more specialized Julia packages
https://discourse.julialang.org/t/julia-distributed-redundant-iterations-appearing/57682/3

Related

seqff script got trouble after samtools treatment

I'm trying to study fetal fraction by using a seqff script from here.
The instruction says that I need a headless sam file to work so after alignment with bwa, I process my sam file by using samtools:
samtools view -S -q 20 target.sam > treated.sam
This step also removes the @SQ header, since there is no reference file. For comparison, I also used grep to strip the header from the target.sam produced by the alignment. When I compared the two, I noticed that the ENet calculation on treated.sam returned NA, so half of the script's method for calculating fetal fraction was lost.
I have tested this in multiple cases: without samtools the script works well, but with samtools the ENet calculation fails. Can someone tell me what is wrong?
I'm using R 4.0, and the only modification I made to the script concerns the args, since it no longer recognized the args I passed in.

why "netstat -a" do not exit immediately but "netstat -n" does?

I have checked about the function of "-n" --
"Displays active TCP connections, however, addresses and port numbers are expressed numerically and no attempt is made to determine names."
But I can't see why "-n" can make netstat exit immediately?
From a quick check, I don't see the same description for the "-n" option as you do, and it doesn't make netstat run continuously.
As you didn't specify the version and exact command you are using, I tried both the version that ships with RH7.6 (net-tools 2.10-alpha) and the latest built from source (net-tools 3.14-alpha). The net-tools source code can be found on GitHub [1].
As I couldn't find the exact option you describe, I tried all flags (without combinations) that don't require an argument. As far as I can tell the only options that cause netstat to not exit immediately are '-g' and '-c'. '-c' makes sense as it is the flag for running netstat continuously. For '-g' it isn't as obvious as the continuous behavior is coming from reading the /proc/net/igmp and /proc/net/igmp6 files line-by-line. The first file is read quickly but the igmp6 file takes much longer (1 line per ~1 sec). The '-g' option isn't really continuous, but just takes a lot of time to finish.
From the code, the only reason for continuous execution is (appears 4 times in the code):
if (i || !flag_cnt)
    break;
wait_continous();
'i' is a return code from a function and the 'break' command is to break from an infinite for loop, so basically the code will run continuously only if flag_cnt is set (only happens when '-c' is provided) and there were no errors with previous commands.
For the specific issue above there could be a few reasons:
The option involves reading from a file and it takes very long time to finish, but it is not really continuous.
There's a correlation between the given option and flag_cnt, which cause flag_cnt to be set.
There's a call to wait_continous() which doesn't follow the condition above.
As I said, I couldn't reproduce the issue from the original question, nor could I find any flag with the description above. Also, none of the flags besides '-c' caused netstat to run continuously.
If you still want to figure this out, I suggest you take a look at the code of your version, or at least specify the net-tools version you use. The kernel version is also important, as some code is compiled out when kernel support is missing.
[1] https://github.com/ecki/net-tools

R system() error when using brace expansion to match folders in linux

I have a series of sequential directories to gather files from on a linux server that I am logging into remotely and processing from an R terminal.
/r18_060, /r18_061, ... /r18_118, /r18_119
Each directory is for the day of year the data was logged on, and it contains a series of files with standard prefix such as "fl.060.gz"
I have to supply a function that contains multiple system() commands with a linux glob for the day. I want to divide the year into 60-day intervals to make the QA/QC more manageable. Since I'm crossing from 099 - 100 in the glob, I have to use brace expansion to match the correct sequence of days.
ls -d /root_directory/r18_{0[6-9]?,1[0-1]?}
ls -d /root_directory/r18_{060..119}
All of these work fine when I type them into my bash shell, but I get an error when the system() function issues a similar command from R.
day_glob <- "{060..119}"
system(paste("zcat /root_directory/r18_", day_glob, "/fl.???.gz > tmpfile", sep = ""))
>gzip: cannot access '/root_directory/r18_{060..119}': No such file or directory
I know that this could be an error in the shell that the system() function operates in, but when I query that it gives the correct environment and user name
system("env | grep ^SHELL=")
>SHELL=/bin/bash
system("echo $USER")
>tgw
Does anyone know why this fails when it is passed through R's system() command? What can I do to get around this problem without removing the system call altogether? There are many scripts that rely on these functions, and re-writing the entire family of R scripts would be time prohibitive.
Previously I had been using 50-day intervals which avoids this problem, but I thought this should be something easy to change, and make one less iteration of my QA/QC scripts per year. I'm new to the linux OS so I figured I might just be missing something obvious.
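A likely explanation, offered as an assumption rather than something confirmed in this thread: on Unix-like systems R's system() runs the command through /bin/sh, not through the user's login shell, and on many Linux systems /bin/sh is a shell such as dash that does not perform brace expansion. In that case {060..119} reaches zcat literally, which matches the error message. SHELL=/bin/bash only reports the login shell. The sketch below builds the directory names in plain POSIX sh so that no brace expansion is needed; the helper name day_dirs is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print r18_NNN for every day from $1 to $2 (inclusive),
# zero-padded to three digits, without relying on brace expansion.
# Pass plain integers (60, not 060): a leading zero would be read as octal.
day_dirs() {
    d=$1
    while [ "$d" -le "$2" ]; do
        printf 'r18_%03d\n' "$d"
        d=$((d + 1))
    done
}

# Crossing the 099 -> 100 boundary needs no special handling.
set -- $(day_dirs 98 101)
names="$*"
echo "$names"   # prints: r18_098 r18_099 r18_100 r18_101
```

Another option that keeps the existing globs is to call bash explicitly from R, for example system("bash -c 'ls -d /root_directory/r18_{060..119}'"), so that bash rather than sh expands the braces.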

Opening a new instance of R and sourcing a script within that instance

Background/Motivation:
I am running a bioinformatics pipeline that, if executed from beginning to end linearly takes several days to finish. Fortunately, some of the tasks don't depend upon each other so they can be performed individually. For example, Task 2, 3, and 4 all depend upon the output from Task 1, but do not need information from each other. Task 5 uses the output of 2, 3, and 4 as input.
I'm trying to write a script that will open new instances of R for each of the three tasks and run them simultaneously. Once all three are complete I can continue with the remaining pipeline.
What I've done in the past, for more linear workflows, is have one "master" script that sources (source()) each task's subscript in turn.
I've scoured SO and google and haven't been able to find a solution for this particular problem. Hopefully you guys can help.
From within R, you can run system() to invoke commands in a terminal, and use open to open a file. For example, the following opens a new terminal instance:
system("open -a Terminal .",wait=FALSE)
Similarly, I can start a new r session by using
system("open -a r .")
What I can't figure out for the life of me is how to set the "input" argument so that it sources one of my scripts. For example, I would expect the following to open a new terminal instance, call r within the new instance, and then source the script.
system("open -a Terminal .",wait=FALSE,input=paste0("r; source(\"/path/to/script/M_01-A.R\",verbose=TRUE,max.deparse.length=Inf)"))
Answering my own question in the event someone else is interested down the road.
After a couple of days of working on this, I think the best way to carry out this workflow is to not limit myself to working just in R. Writing a bash script offers more flexibility and is probably a more direct solution. The following example was suggested to me on another website.
#!/bin/bash
# Run task 1
Rscript Task1.R
# now run the three jobs that use Task1's output
# we can fork these using '&' to run in the background in parallel
Rscript Task2.R &
Rscript Task3.R &
Rscript Task4.R &
# wait until background processes have finished
wait %1 %2 %3
Rscript Task5.R
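A possible variation on the script above, as a sketch only: when wait is given several jobs it reports the exit status of the last one, so a failure in an earlier task can go unnoticed. Recording each background PID with $! and waiting on them one at a time lets the script stop before Task5 if any parallel task failed. The function name run_parallel is hypothetical, and true and sleep stand in for the Rscript calls.

```shell
#!/bin/sh
# Hypothetical helper: start every argument as a background command,
# then wait for each PID separately and report failure if any one failed.
run_parallel() {
    pids=""
    for cmd in "$@"; do
        sh -c "$cmd" &
        pids="$pids $!"
    done
    failed=0
    for pid in $pids; do
        wait "$pid" || failed=1
    done
    return "$failed"
}

# Stand-ins for "Rscript Task2.R", "Rscript Task3.R", "Rscript Task4.R".
if run_parallel "sleep 1" "true" "true"; then
    result="all tasks succeeded"
else
    result="a task failed"
fi
echo "$result"
```

In the pipeline, the Task5 call would go inside the branch that runs only when run_parallel returns success.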
You might be interested in the future package (I'm the author). It allows you to write your code as:
library("future")
v1 %<-% task1(args_1)
v2 %<-% task2(v1, args_2)
v3 %<-% task3(v1, args_3)
v4 %<-% task4(v1, args_4)
v5 %<-% task5(v2, v3, v4, args_5)
Each of those v %<-% expr statements creates a future based on the R expression expr (and all of its dependencies) and assigns it to a promise v. Only when v is used will the code block and wait for the value of v to become available.
How and where these futures are resolved is decided by the user of the above code. For instance, by specifying:
library("future")
plan(multiprocess)
at the top, then the futures (= the different tasks) are resolved in parallel on your local machine. If you use,
plan(cluster, workers = c("n1", "n3", "n3", "n5"))
they're resolved on those machines (where n3 accepts two concurrent jobs).
This works on all operating systems (including Windows).
If you have access to a HPC compute with schedulers such as Slurm, SGE, and TORQUE / PBS, you can use the future.BatchJobs package, e.g.
plan(future.BatchJobs::batchjobs_torque)
PS. One reason for creating future was to do large-scale Bioinformatics in parallel / distributed.

Kill a calculation programme after user defined time in R

Say my executable is c:\my directory\myfile.exe and my R script calls this executable with system("myfile.exe").
The R script passes parameters to the executable, which uses them to do numerical calculations. From the output of the executable, the R script then tests whether the parameters are good or not. If they are not good, the parameters are changed and the executable is rerun with the updated parameters.
Now, as this executable carries out mathematical calculations and solutions may converge only slowly, I wish to be able to kill the executable once it has taken too long to carry out the calculations (say 5 seconds).
How do I do this time-dependent kill?
PS:
My question is somewhat related to this one (a kill that does not depend on time):
how to run an executable file and then later kill or terminate the same process with R in Windows
You can add code to your R function which issued the executable call:
setTimeLimit(elapsed = 5, transient = TRUE)
This will kill the calling function, returning control to the parent environment (which could well be a function as well). Then use the examples in the question you linked to for further work.
Alternatively, set up a loop which examines Sys.time and if the expected update to the parameter set has not taken place after 5 seconds, break the loop and issue the system kill command to terminate myfile.exe .
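The polling idea in the previous paragraph can be sketched in POSIX shell rather than R. This only illustrates the logic and assumes a Unix-like system, with sleep standing in for the executable; on Windows the kill step would be the taskkill command discussed in this thread. The name run_with_timeout and the return code 124 are choices made for this sketch.

```shell
#!/bin/sh
# Hypothetical watchdog: run a command in the background, poll once per
# second, and kill it if it is still running after LIMIT seconds.
# Usage: run_with_timeout LIMIT command [args...]
run_with_timeout() {
    limit=$1
    shift
    "$@" &
    pid=$!
    elapsed=0
    # kill -0 sends no signal; it only tests whether the process still exists.
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$elapsed" -ge "$limit" ]; then
            kill -9 "$pid" 2>/dev/null
            wait "$pid" 2>/dev/null || true
            return 124   # sketch convention: 124 means "timed out"
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    wait "$pid"          # finished in time: pass on its exit status
}

rc=0
run_with_timeout 1 sleep 5 || rc=$?
echo "exit status: $rc"
```

A caller can then treat status 124 as "parameters did not converge in time" and move on to the next parameter set.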
There might possibly be nicer ways but it is a solution.
The assumption here is that myfile.exe normally completes its calculation within 5 seconds.
library("R.utils")  # provides withTimeout(), the current name of evalWithTimeout()
try.wtl <- function(timeout = 5)
{
  y <- withTimeout(system("myfile.exe"), timeout = timeout, onTimeout = "warning")
  if (inherits(y, "try-error")) NA else y
}
Case 1 (myfile.exe closes after a successful calculation):
g <- try.wtl(5)
Case 2 (myfile.exe does not close after a successful calculation):
g <- try.wtl(0.1)
The Windows taskkill command is required in case 2 before starting again from the beginning:
if (is.null(g)) {system('taskkill /im "myfile.exe" /f', show.output.on.console = FALSE)}
PS: inspiration came from "Time out an R command via something like try()".
