I use the following code to both output something to stdout, and pipe it to a program:
function example() {
local fd1
{
exec {fd1}>&1
{ echo hi >&$fd1 } | true
} always { exec {fd1}>&- }
}
I am wondering if I can safely drop always { exec {fd1}>&- }. fd1 goes out of scope after the function finishes anyways.
You need to keep always { exec {fd1}>&- }. If you get rid of that, the variable containing the file descriptor will go out of scope, but the file descriptor won't be closed, resulting in leaking it. You can see this by doing ls -l /proc/$$/fd before and after running your function without that line. Each run of the function will permanently add another FD to that list. Eventually, you'll run out of file descriptors and won't be able to open any new ones, which will break things.
Related
i'm new to nf-core/nextflow and needless to say the documentation does not reflect what might be actually implemented. But i'm defining the basic pipeline below:
nextflow.enable.dsl=2
process RUNBLAST{
input:
val thr
path query
path db
path output
output:
path output
script:
"""
blastn -query ${query} -db ${db} -out ${output} -num_threads ${thr}
"""
}
workflow{
//println "I want to BLAST $params.query to $params.dbDir/$params.dbName using $params.threads CPUs and output it to $params.outdir"
RUNBLAST(params.threads,params.query,params.dbDir, params.output)
}
Then i'm executing the pipeline with
nextflow run main.nf --query test2.fa --dbDir blast/blastDB
Then i get the following error:
N E X T F L O W ~ version 22.10.6
Launching `main.nf` [dreamy_hugle] DSL2 - revision: c388cf8f31
Error executing process > 'RUNBLAST'
Error executing process > 'RUNBLAST'
Caused by:
Not a valid path value: 'test2.fa'
Tip: you can replicate the issue by changing to the process work dir and entering the command bash .command.run
I know test2.fa exists in the current directory:
(nfcore) MN:nf-core-basicblast jraygozagaray$ ls
CHANGELOG.md conf other.nf
CITATIONS.md docs pyproject.toml
CODE_OF_CONDUCT.md lib subworkflows
LICENSE main.nf test.fa
README.md modules test2.fa
assets modules.json work
bin nextflow.config workflows
blast nextflow_schema.json
I also tried with "file" instead of path but that is deprecated and raises other kind of errors.
It'll be helpful to know how to fix this to get myself started with the pipeline building process.
Shouldn't nextflow copy the file to the execution path?
Thanks
You get the above error because params.query is not actually a path value. It's probably just a simple String or GString. The solution is to instead supply a file object, for example:
workflow {
query = file(params.query)
BLAST( query, ... )
}
Note that a value channel is implicitly created by a process when it is invoked with a simple value, like the above file object. If you need to be able to BLAST multiple query files, you'll instead need a queue channel, which can be created using the fromPath factory method, for example:
params.query = "${baseDir}/data/*.fa"
params.db = "${baseDir}/blastdb/nt"
params.outdir = './results'
db_name = file(params.db).name
db_path = file(params.db).parent
process BLAST {
publishDir(
path: "{params.outdir}/blast",
mode: 'copy',
)
input:
tuple val(query_id), path(query)
path db
output:
tuple val(query_id), path("${query_id}.out")
"""
blastn \\
-num_threads ${task.cpus} \\
-query "${query}" \\
-db "${db}/${db_name}" \\
-out "${query_id}.out"
"""
}
workflow{
Channel
.fromPath( params.query )
.map { file -> tuple(file.baseName, file) }
.set { query_ch }
BLAST( query_ch, db_path )
}
Note that the usual way to specify the number of threads/cpus is using cpus directive, which can be configured using a process selector in your nextflow.config. For example:
process {
withName: BLAST {
cpus = 4
}
}
when I am running a Tcl script that contains the following lines:
set V [exec bjobs ]
puts "bjobs= ${V}"
When jobs are present it's working properly but, no jobs are running it is showing an error like this:
No unfinished job found
while executing
"exec bjobs "
invoked from within
"set V [exec bjobs ]"
How to avoid this error? Please let me know how to avoid this kind of errors.
It sounds to me like the bjobs program has a non-zero exit code in this case. The exec manual page includes this example in a subsection WORKING WITH NON-ZERO RESULTS:
To execute a program that can return a non-zero result, you should wrap
the call to exec in catch and check the contents of the -errorcode
return option if you have an error:
set status 0
if {[catch {exec grep foo bar.txt} results options]} {
set details [dict get $options -errorcode]
if {[lindex $details 0] eq "CHILDSTATUS"} {
set status [lindex $details 2]
} else {
# Some other error; regenerate it to let caller handle
return -options $options -level 0 $results
}
}
This is more easily written using the try command, as that makes it
simpler to trap specific types of errors. This is done using code like
this:
try {
set results [exec grep foo bar.txt]
set status 0
} trap CHILDSTATUS {results options} {
set status [lindex [dict get $options -errorcode] 2]
}
I think you could write this as:
try {
set V [exec bjobs ]
} trap CHILDSTATUS {message} {
# Not sure how you want to handle the case where there's nothing...
set V $message
}
puts "bjobs= ${V}"
if {[catch {exec bjobs} result]} {
puts "bjobs have some issues. Reason : $result"
} else {
puts "bjobs executed successfully. Result : $result"
}
Reference : catch
Note carefully in the exec man
page:
If any of the commands in the pipeline exit abnormally or are killed or
suspended, then exec will return an error [...]
If any of the commands
writes to its standard error file and that standard error is not
redirected and
-ignorestderr is not specified, then exec will return an
error.
So if bjobs returns non-zero or prints to stderr when there are no jobs, exec needs catch or try as Donal writes.
What is the flow of execution of the time command in detail?
I have a user created function in PowerShell, which will compute the time for execution of the command in the following way.
It will open the new PowerShell window.
It will execute the command.
It will close the PowerShell window.
It will get the the different execution times using the GetProcessTimes function function.
Is the "time command" in Unix also calculated in the same way?
The Measure-Command cmdlet is your friend.
PS> Measure-Command -Expression {dir}
You could also get execution time from the command history (last executed command in this example):
$h = Get-History -Count 1
$h.EndExecutionTime - $h.StartExecutionTime
I've been doing this:
Time {npm --version ; node --version}
With this function, which you can put in your $profile file:
function Time([scriptblock]$scriptblock, $name)
{
<#
.SYNOPSIS
Run the given scriptblock, and say how long it took at the end.
.DESCRIPTION
.PARAMETER scriptBlock
A single computer name or an array of computer names. You mayalso provide IP addresses.
.PARAMETER name
Use this for long scriptBlocks to avoid quoting the entire script block in the final output line
.EXAMPLE
time { ls -recurse}
.EXAMPLE
time { ls -recurse} "All the things"
#>
if (!$stopWatch)
{
$script:stopWatch = new-object System.Diagnostics.StopWatch
}
$stopWatch.Reset()
$stopWatch.Start()
. $scriptblock
$stopWatch.Stop()
if ($name -eq $null) {
$name = "$scriptblock"
}
"Execution time: $($stopWatch.ElapsedMilliseconds) ms for $name"
}
Measure-Command works, but it swallows the stdout of the command being run. (Also see Timing a command's execution in PowerShell)
If you need to measure the time taken by something, you can follow this blog entry.
Basically, it suggest to use the .NET StopWatch class:
$sw = [System.Diagnostics.StopWatch]::startNew()
# The code you measure
$sw.Stop()
Write-Host $sw.Elapsed
I need to edit a .jam file used by boost-build for a specific kind of projects. The official manual on BJAM language says:
One of the toolsets that cares about DEF files is msvc. The following line should be added to it. flags msvc.link DEF_FILE
;
Since the DEF_FILE variable is not used by the msvc.link action, we need to modify it to be: actions link bind DEF_FILE { $(.LD) ....
/DEF:$(DEF_FILE) .... } Note the bind DEF_FILE part. It tells bjam to
translate the internal target name in DEF_FILE to a corresponding
filename in the link
So apparently just printing DEF_FILE with ECHO wouldn't work. How can it be expanded to a string variable or something that can actually be checked?
What I need to do is to print an error message and abort the build in case the flag is not set. I tried:
if ! $(DEF_FILE)
{
errors.user-error "file not found" ;
EXIT ;
}
but this "if" is always true
I also tried putting "if ! $_DEF_FILE {...}" inside the "actions" contained but apparently it is ignored.
I am not sure I understand the global task you have. However, if you wanted to add checking for non-empty DEF_FILE -- expanding on the documentation bit you quote, you need to add the check in msvc.link function.
If you have a command line pattern (specified with 'actions') its content is what is passed to OS for execution. But, you can also have a function with the same name, that will be called before generating the actions. For example, here's what current codebase have:
rule link.dll ( targets + : sources * : properties * )
{
DEPENDS $(<) : [ on $(<) return $(DEF_FILE) ] ;
if <embed-manifest>on in $(properties)
{
msvc.manifest.dll $(targets) : $(sources) : $(properties) ;
}
}
You can modify this code to additionally:
if ! [ on $(<) return $(DEF_FILE) ] {
ECHO "error" ;
}
I am working on a UNIX task where i want check if a particular log file is present in the directory or not. If it is present, i would like to rename it by appending a timestamp at the end. The format of the file name is as such: ServiceFileName_0.log
This is what i have so far but it wouldn't rename when i run the script, even though there is a file with the name ServiceFileName_0.log present.
renameLogs()
{
#If a ServiceFileName log exists, rename it
if [ -f $MY_DIR/logs/ServiceFileName_0.log ];
then
mv ServiceFileName_0.log ServiceFileName_0.log.%M%H%S
fi
}
Pls Help!
Thanks
renameLogs()
{
if [ -f $MY_DIR/logs/ServiceFileName_0.log ]
then mv $MY_DIR/ServiceFileName_0.log $MY_DIR/ServiceFileName_0.log.$(date +%M%H%S)
fi
}
Use the directory prefix consistently. Also you need to specify the time properly, as shown.
Better, though (less repetition):
renameLogs()
{
logfile="$MY_DIR/logs/ServiceFileName_0.log"
if [ -f "$logfile" ]
then mv "$logfile" "$logfile.$(date +%H%M%S)"
fi
}
NB: I've reordered the format from MMHHSS to the more conventional HHMMSS order. If you work with date components too, you should seriously consider using the ordering recommended by ISO 8601, which is [YYYY]mmdd. It groups all the log files for a month together in an ls listing, which is usually helpful. Using ddmm order means that the files for the first of each month are grouped together, then the files for the second of each month, etc. This is usually less desirable.
You might need to prefix the file name with the $MY_DIR path, just like you did in the test.
You could replace this:
mv ServiceFileName_0.log ServiceFileName_0.log.%M%H%S
with this
mv $MY_DIR/logs/ServiceFileName_0.log $MY_DIR/logs/ServiceFileName_0.log.%M%H%S
This isn't your apparent immediate problem, but the if construct is wrong: it introduces a time-of-check to time-of-use race condition. In between the if [ -f check and the mv, some other process could come along and change things so you can't move the file anymore even though the check succeeded.
To avoid this class of bugs, always write code that starts by attempting the operation you want to do, then if it failed, figure out why. In this case, what you want is to do nothing if the source file didn't exist, but report an error if the operation failed for any other reason. There is no good way to do that in portable shell, you need something that lets you inspect errno. I'd probably write this C helper:
#include <stdio.h>
#include <errno.h>
#include <string.h>
int main(int argc, char **argv)
{
if (argc != 3) {
fprintf(stderr, "usage: %s source destination\n", argv[0]);
return 2;
}
if (rename(argv[1], argv[2]) && errno != ENOENT) {
fprintf(stderr, "rename '%s' to '%s': %s\n",
argv[1], argv[2], strerror(errno));
return 1;
}
return 0;
}
and then use it like so:
renameLogs()
{
( cd "$MY_DIR/logs"
rename_if_exists ServiceFileName_0.log ServiceFileName_0.log.$(date +%M%H%S)
)
}
The ( cd construct fixes your immediate problem, and unlike the other suggestions, avoids another race in which some other process comes along and messes with the logs directory or its parent directories.
Obligatory shell scripting addendum: Always enclose variable expansions in double quotes, except in the rare cases where you want the expansion to be subject to word splitting.