Cross platform way of testing whether a file is a directory - unix

Currently I have some code like (condensed and removed a bunch of error checking):
dp = readdir(dir);
if (dp->d_type == DT_DIR) {
}
This works swimmingly on my Linux machine. However on another machine (looks like SunOS, sparc):
SunOS HOST 5.10 Generic_127127-11 sun4u sparc SUNW,Ultra-5_10
I get the following error at compile time:
error: structure has no member named `d_type'
error: `DT_DIR' undeclared (first use in this function)
I thought the dirent.h header was crossplatform (for POSIX machines). Any suggestions.

Ref http://www.nexenta.org/os/Porting_Codefixes:
The struct dirent definition in solaris does not contain the d_type field. You would need to make the changes as follows
if (de->d_type == DT_DIR)
{
return 0;
}
changes to
struct stat s; /*include sys/stat.h if necessary */
..
..
stat(de->d_name, &s);
if (s.st_mode & S_IFDIR)
{
return 0;
}
Since stat is also POSIX standard it should be more cross-platform. But you may want to use if ((s.st_mode & S_IFMT) == S_IFDIR) to follow the standard.

Related

path not being detected by Nextflow

i'm new to nf-core/nextflow and needless to say the documentation does not reflect what might be actually implemented. But i'm defining the basic pipeline below:
nextflow.enable.dsl=2
process RUNBLAST{
input:
val thr
path query
path db
path output
output:
path output
script:
"""
blastn -query ${query} -db ${db} -out ${output} -num_threads ${thr}
"""
}
workflow{
//println "I want to BLAST $params.query to $params.dbDir/$params.dbName using $params.threads CPUs and output it to $params.outdir"
RUNBLAST(params.threads,params.query,params.dbDir, params.output)
}
Then i'm executing the pipeline with
nextflow run main.nf --query test2.fa --dbDir blast/blastDB
Then i get the following error:
N E X T F L O W ~ version 22.10.6
Launching `main.nf` [dreamy_hugle] DSL2 - revision: c388cf8f31
Error executing process > 'RUNBLAST'
Error executing process > 'RUNBLAST'
Caused by:
Not a valid path value: 'test2.fa'
Tip: you can replicate the issue by changing to the process work dir and entering the command bash .command.run
I know test2.fa exists in the current directory:
(nfcore) MN:nf-core-basicblast jraygozagaray$ ls
CHANGELOG.md conf other.nf
CITATIONS.md docs pyproject.toml
CODE_OF_CONDUCT.md lib subworkflows
LICENSE main.nf test.fa
README.md modules test2.fa
assets modules.json work
bin nextflow.config workflows
blast nextflow_schema.json
I also tried with "file" instead of path but that is deprecated and raises other kind of errors.
It'll be helpful to know how to fix this to get myself started with the pipeline building process.
Shouldn't nextflow copy the file to the execution path?
Thanks
You get the above error because params.query is not actually a path value. It's probably just a simple String or GString. The solution is to instead supply a file object, for example:
workflow {
query = file(params.query)
BLAST( query, ... )
}
Note that a value channel is implicitly created by a process when it is invoked with a simple value, like the above file object. If you need to be able to BLAST multiple query files, you'll instead need a queue channel, which can be created using the fromPath factory method, for example:
params.query = "${baseDir}/data/*.fa"
params.db = "${baseDir}/blastdb/nt"
params.outdir = './results'
db_name = file(params.db).name
db_path = file(params.db).parent
process BLAST {
publishDir(
path: "{params.outdir}/blast",
mode: 'copy',
)
input:
tuple val(query_id), path(query)
path db
output:
tuple val(query_id), path("${query_id}.out")
"""
blastn \\
-num_threads ${task.cpus} \\
-query "${query}" \\
-db "${db}/${db_name}" \\
-out "${query_id}.out"
"""
}
workflow{
Channel
.fromPath( params.query )
.map { file -> tuple(file.baseName, file) }
.set { query_ch }
BLAST( query_ch, db_path )
}
Note that the usual way to specify the number of threads/cpus is using cpus directive, which can be configured using a process selector in your nextflow.config. For example:
process {
withName: BLAST {
cpus = 4
}
}

posix_fallocate() failed: Operation not permitted while opening .realm file

I get the below error when i try to open and download .realm file in /tmp directory of serverless framework.
{"errorType":"Runtime.UnhandledPromiseRejection","errorMessage":"Error: posix_fallocate() failed: Operation not permitted" }
Below is the code:
let realm = new Realm({path: '/tmp/custom.realm', schema: [schema1, schema2]});
realm.write(() => {
console.log('completed==');
});
EDIT: this might soon be finally fixed in Realm-Core: see issue 4957.
In case you'll run into this problem elsewhere, here's a workaround.
This caused by AWS Lambda not supporting the fallocate and fallocate64 system calls. Instead of returning the correct error code in this case, which would be EINVAL for not supported on this file system, Amazon has blocked the system call so that it returns EPERM. Realm-Core has code that handles EINVAL return value correctly but will be bewildered by the unexpected EPERM returned from the system call.
The solution is to add a small shared library as a layer to the lambda: compile the following C file on Linux machine or inside lambda-ci Docker image:
#include <errno.h>
#include <fcntl.h>
int posix_fallocate(int __fd, off_t __offset, off_t __len) {
return EINVAL;
}
int posix_fallocate64(int __fd, off_t __offset, off_t __len) {
return EINVAL;
}
Now, compile this to a shared object with something like
gcc -shared fix.c -o fix.so
Then add it to a root of a ZIP file:
zip layer.zip fix.so
Create a new lambda layer from this zip
Add the lambda layer to your lambda function
Finally make the shared object be loaded by configuring the environment value LD_PRELOAD with value /opt/fix.so to your Lambda.
Enjoy.

Python 3.8.0a3 failed to get the Python codec of the filesystem encoding

When I call the embedded python from my c++ app, the application crashes. Something in _PyCodec_Lookup is not working. Debugging seems to indicate the error comes from this line in file codecs.c:
PyInterpreterState *interp = _PyInterpreterState_GET_UNSAFE();
if (interp->codec_search_path == NULL && _PyCodecRegistry_Init())
goto onError;
interp->codec_search_path is NULL and _PyCodecRegistry_Init() returns -1.
This problems is already known; Solutions like removing python27, or setting environment variables PYTHONPATH or PYTHONHOME to either "" or the src path do not work.
I link to cpython version 3.8.0a3 which I've compiled with the included scripts inside the PCBUILD folder, using the same compiler as my applications (MSVC 15 2017).
The OS is windows 8.1 64 bit.
My application links to the produced binary python38.dll / python38.lib successfully but then crashes at runtime, and I have no idea what might cause the trouble - people suggest the environment is polluted, but it must not depend on the system's environment. How would I confidently ship the app to other people's computers? Did I maybe miss some compile flags or defines?
I'm sure you are, but you are doing this after Py_Initialize ?
I include the python3.8 dll but when starting up I take the executable full
path and then add that to the python path (the code below is not tested but
it's roughly what I do)
void py_add_to_path (const char *path)
{
PyObject *py_cur_path, *py_item;
char *new_path;
int wc_len, i;
wchar_t *wc_new_path;
char *item;
new_path = strdup(path);
py_cur_path = PySys_GetObject("path");
for (i = 0; i < PyList_Size(py_cur_path); i++) {
char *tmp = strappend(new_path, PATHSEP);
myfree(new_path);
new_path = tmp;
py_item = PyList_GetItem(py_cur_path, i);
if (!PyUnicode_Check(py_item)) {
continue;
}
item = py_obj_to_str(py_item);
if (!item) {
continue;
}
tmp = strappend(new_path, item);
free(new_path);
new_path = tmp;
free(item);
}
/* Convert to wide chars. */
wc_len = sizeof(wchar_t) * (strlen(new_path) + 1);
wc_new_path = (wchar_t *) malloc(wc_len, "wchar str");
mbstowcs(wc_new_path, new_path, wc_len);
PySys_SetPath(wc_new_path);
}
Seems a bit old now but looks like there is need to add the Lib folder including all the python system files to the search paths of the application.
Take a look at this Python port for UWP
It has an embedded python34app.zip that includes the codec files.

Compilation issue with sitmo prng, c++11, and armadillo

I'm trying to compile the sitmo prng under C++11 within an R package. The problematic code has been packaged and is available here. The objective of this R package is to make available the sitmo header file so that other packages are able to use the LinkTo field within description. As an added bonus, the package is scheduled to ship with an Armadillo + OpenMP example. There is one other package, mvnfast, that uses sitmo, but only under c++98 and boost headers.
I believe that the error which I am receiving is specific to OS X and clang. I haven't been able to replicate it on Windows via win-build. With that being said, the error is:
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../include/c++/v1/random:3641:44: error: non-type template argument is not a constant expression
const size_t __logR = __log2<uint64_t, _URNG::max() - _URNG::min() + uint64_t(1)>::value;
The error has only popped up on the Rcpp dev list. The resolution in this case was to compile under C++98 and use boost.
The above error is followed by the following notes:
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../include/c++/v1/random:3773:18: note: in instantiation of function template specialization 'std::__1::generate_canonical<double, 53, sitmo::prng_engine>' requested here
* _VSTD::generate_canonical<_RealType, numeric_limits<_RealType>::digits>(__g)
^
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../include/c++/v1/random:3737:17: note: in instantiation of function template specialization 'std::__1::uniform_real_distribution<double>::operator()<sitmo::prng_engine>' requested here
{return (*this)(__g, __p_);}
^
sitmo_test.cpp:77:26: note: in instantiation of function template specialization 'std::__1::uniform_real_distribution<double>::operator()<sitmo::prng_engine>' requested here
double u = distunif(engine);
^
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../include/c++/v1/random:3641:44: note: non-constexpr function 'max' cannot be used in a constant expression
const size_t __logR = __log2<uint64_t, _URNG::max() - _URNG::min() + uint64_t(1)>::value;
^
../inst/include/prng_engine.hpp:100:23: note: declared here
static result_type (max)() { return 0xFFFFFFFF; }
The version of clang being used is:
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.3.0
Thread model: posix
Looking into the code, there is a bug in the sitmo prng_engine.h. min() and max() were declared as
static result_type (min)() { return 0; }
static result_type (max)() { return 0xFFFFFFFF; }
If you take a look at, say, standard LCG max from here, you could see that it is declared constexpr, ditto for min.
As soon as you make those methods constexpr in the sitmo header file, I believe you could use them in template expression.
UPDATE
I've looked into GCC 5 headers, methods indeed are declared constexpr

Unix If file exists, rename

I am working on a UNIX task where i want check if a particular log file is present in the directory or not. If it is present, i would like to rename it by appending a timestamp at the end. The format of the file name is as such: ServiceFileName_0.log
This is what i have so far but it wouldn't rename when i run the script, even though there is a file with the name ServiceFileName_0.log present.
renameLogs()
{
#If a ServiceFileName log exists, rename it
if [ -f $MY_DIR/logs/ServiceFileName_0.log ];
then
mv ServiceFileName_0.log ServiceFileName_0.log.%M%H%S
fi
}
Pls Help!
Thanks
renameLogs()
{
if [ -f $MY_DIR/logs/ServiceFileName_0.log ]
then mv $MY_DIR/ServiceFileName_0.log $MY_DIR/ServiceFileName_0.log.$(date +%M%H%S)
fi
}
Use the directory prefix consistently. Also you need to specify the time properly, as shown.
Better, though (less repetition):
renameLogs()
{
logfile="$MY_DIR/logs/ServiceFileName_0.log"
if [ -f "$logfile" ]
then mv "$logfile" "$logfile.$(date +%H%M%S)"
fi
}
NB: I've reordered the format from MMHHSS to the more conventional HHMMSS order. If you work with date components too, you should seriously consider using the ordering recommended by ISO 8601, which is [YYYY]mmdd. It groups all the log files for a month together in an ls listing, which is usually helpful. Using ddmm order means that the files for the first of each month are grouped together, then the files for the second of each month, etc. This is usually less desirable.
You might need to prefix the file name with the $MY_DIR path, just like you did in the test.
You could replace this:
mv ServiceFileName_0.log ServiceFileName_0.log.%M%H%S
with this
mv $MY_DIR/logs/ServiceFileName_0.log $MY_DIR/logs/ServiceFileName_0.log.%M%H%S
This isn't your apparent immediate problem, but the if construct is wrong: it introduces a time-of-check to time-of-use race condition. In between the if [ -f check and the mv, some other process could come along and change things so you can't move the file anymore even though the check succeeded.
To avoid this class of bugs, always write code that starts by attempting the operation you want to do, then if it failed, figure out why. In this case, what you want is to do nothing if the source file didn't exist, but report an error if the operation failed for any other reason. There is no good way to do that in portable shell, you need something that lets you inspect errno. I'd probably write this C helper:
#include <stdio.h>
#include <errno.h>
#include <string.h>
int main(int argc, char **argv)
{
if (argc != 3) {
fprintf(stderr, "usage: %s source destination\n", argv[0]);
return 2;
}
if (rename(argv[1], argv[2]) && errno != ENOENT) {
fprintf(stderr, "rename '%s' to '%s': %s\n",
argv[1], argv[2], strerror(errno));
return 1;
}
return 0;
}
and then use it like so:
renameLogs()
{
( cd "$MY_DIR/logs"
rename_if_exists ServiceFileName_0.log ServiceFileName_0.log.$(date +%M%H%S)
)
}
The ( cd construct fixes your immediate problem, and unlike the other suggestions, avoids another race in which some other process comes along and messes with the logs directory or its parent directories.
Obligatory shell scripting addendum: Always enclose variable expansions in double quotes, except in the rare cases where you want the expansion to be subject to word splitting.

Resources