Error when attempting to define a variable with two (or more) unlimited dimensions in netcdf4

I am trying to define and write a variable that has two unlimited dimensions, using netcdf-c (version 4.8.1), like the following:
...
...
...
if ((rval = nc_create(out_file_nm, NC_CLOBBER, &out_ncid))) err(rval);
//// create netcdf dimensions
if ((rval = nc_def_dim(out_ncid, t_nm, NC_UNLIMITED, &out_t_dimid))) err(rval);
if ((rval = nc_def_dim(out_ncid, y_nm, nres_y , &out_y_dimid))) err(rval);
if ((rval = nc_def_dim(out_ncid, x_nm, nres_x , &out_x_dimid))) err(rval);
if ((rval = nc_def_dim(out_ncid, b_nm, NC_UNLIMITED, &out_b_dimid))) err(rval);
...
...
...
And I get the following error message:
Error: NetCDF: NC_UNLIMITED size already in use
As far as I know, netCDF-4 allows multiple unlimited dimensions (link: https://www.unidata.ucar.edu/software/netcdf/workshops/2010/netcdf4/UnlimDims.html).
I don't know what to do. The size of the values that I am trying to write is not known in advance; it changes along the time dimension. So I would really like to use two unlimited dimensions for this.
Does anyone have experience using multiple unlimited dimensions?

I'd try adding a mode option to your nc_create call, e.g. NC_CLOBBER | NC_NETCDF4 instead of NC_CLOBBER alone. NetCDF defaults to the classic data model, which allows only one unlimited dimension. See: https://www.unidata.ucar.edu/software/netcdf/docs/faq.html#formatsdatamodelssoftwarereleases

How to append / add layers to geopackages in PyQGIS

For a project I am creating different layers which should all be written into one geopackage.
I am using QGIS 3.16.1 and the Python console inside QGIS which runs on Python 3.7
I tried many things but cannot figure out how to do this. This is what I used so far.
vl = QgsVectorLayer("Point", "points1", "memory")
vl2 = QgsVectorLayer("Point", "points2", "memory")
pr = vl.dataProvider()
pr.addAttributes([QgsField("DayID", QVariant.Int), QgsField("distance", QVariant.Double)])
vl.updateFields()
f = QgsFeature()
for x in range(len(tag_temp)):
    f.setGeometry(QgsGeometry.fromPointXY(QgsPointXY(lon[x],lat[x])))
    f.setAttributes([dayID[x], distance[x]])
    pr.addFeature(f)
vl.updateExtents()
# I'll do the same for vl2 but with other data
uri ="D:/Documents/QGIS/test.gpkg"
options = QgsVectorFileWriter.SaveVectorOptions()
context = QgsProject.instance().transformContext()
QgsVectorFileWriter.writeAsVectorFormatV2(vl,uri,context,options)
QgsVectorFileWriter.writeAsVectorFormatV2(vl2,uri,context,options)
The problem is that in 'test.gpkg' a layer is created called 'test' and not 'points1' or 'points2'.
The second QgsVectorFileWriter.writeAsVectorFormatV2() call also overwrites the output of the first one instead of appending the layer to the existing geopackage.
I also tried to create single geopackages and then use the 'Package Layers' processing tool (processing.run("native:package")) to merge all layers into one geopackage, but then the attribute types are all converted to strings, unfortunately.
Any help is much appreciated. Many thanks in advance.
You need to change the SaveVectorOptions, in particular the actionOnExistingFile mode, after creating the gpkg file:
options = QgsVectorFileWriter.SaveVectorOptions()
#options.driverName = "GPKG"
options.layerName = vl.name()
# the first write creates the gpkg file
QgsVectorFileWriter.writeAsVectorFormatV2(vl,uri,context,options)
# switch mode to add a layer to the existing file instead of overwriting the file
options.actionOnExistingFile = QgsVectorFileWriter.CreateOrOverwriteLayer
options.layerName = vl2.name()
QgsVectorFileWriter.writeAsVectorFormatV2(vl2,uri,context,options)
The documentation is here: SaveVectorOptions
Regarding "I also tried to create single .geopackages and then use 'Package Layers' processing tool (processing.run("native:package") to merge all layers into one geopackage, but then the attributes types are all converted into strings unfortunately.":
This is definitely the recommended way; please consider reporting the bug.

How do I create a loop function to apply acoustic indices from "soundecology" to specific sections of .wav files using R

I have a large quantity of .wav files that I need to analyze using the acoustic indices from the "soundecology" package in R. However, the recordings do not have uniform start times and I need to analyze specific periods of time within the files. I want to create a function and loop for automating the process.
I have created a spreadsheet for each folder of recordings (each folder is a different location) that lays out the recordings and the times within each recording that I need to analyze. Basically, a row contains: the sound file name, the time when the sample should start (e.g. 09:00:00), the number of seconds from the start of the file at which that time occurs, and the number of seconds from the start of the file at which the sample should end.
That data looks like this:
Spread sheet of data
I am using the packages "tuneR" and "warbleR" to select the specific portion of a sound file that I want to analyze. Here is the code and the output that I would like to loop across all the sound files:
wavrow1 <-read_wave(mvb$sound.files[1], from = mvb$start[1], to = mvb$end[1])
wavrow1.aci <- acoustic_complexity(wavrow1, j=10)
which yields
max_freq not set, using value of: 22050
min_freq not set, using value of: 0
This is a mono file.
Calculating index. Please wait...
Acoustic Complexity Index (total): 934.568
However, when I put this into a function in order to then put it into a loop I get a different output.
acianalyzeFUN <- function(mvb, i){
  r <- read_wave(mvb$sound.files[i], mvb$start[i], mvb$end[i])
  soundfile.aci <- acoustic_complexity(r, j=10)
}
row1.test <- acianalyzeFUN(mvb, 1)
This gives the output:
max_freq not set, using value of: 22050
min_freq not set, using value of: 0
This is a mono file.
Calculating index. Please wait...
Acoustic Complexity Index (total): 19183.03
Acoustic Complexity Index (by minute): 931.98
Which is different.
So I need to fix this function and put it into a loop so that I can apply it across all the files and save the results into a data frame or ultimately another spread sheet.
I was thinking a loop like the following might work but I am also getting errors with it:
output <- vector("logical", length(97))
for (i in seq_along(mvb$sound.files)) {
  output[[i]] <- acianalyzeFUN(mvb, i)
}
Which returns this error:
max_freq not set, using value of: 22050
min_freq not set, using value of: 0
This is a mono file.
Calculating index. Please wait...
Acoustic Complexity Index (total): 19183.03
Acoustic Complexity Index (by minute): 931.98
Error in output[[i]] <- acianalyzeFUN(mvb, i) :
more elements supplied than there are to replace
Thanks for any help and advice on this. Please let me know if there are any other pieces of information that would be helpful.
The read_wave function takes the following arguments:
read_wave(X, index, from = X$start[index], to = X$end[index], channel = NULL,
          header = FALSE, path = NULL)
In the manual test, you specify from = mvb$start[1], to = mvb$end[1].
In the function you created, you don't name the arguments:
r <- read_wave(mvb$sound.files[i], mvb$start[i], mvb$end[i])
so mvb$start[i] is matched to index and mvb$end[i] to from.
You should write:
acianalyzeFUN <- function(mvb, i){
  r <- read_wave(mvb$sound.files[i], from = mvb$start[i], to = mvb$end[i])
  soundfile.aci <- acoustic_complexity(r, j=10)
}
This should explain the difference you observe.
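The binding problem described here is not specific to R. A minimal Python sketch with a stand-in function shows how unnamed arguments fill the parameters in declaration order. The function below is not warbleR's read_wave: it only reports how its arguments were bound, the parameter names start and end stand in for from and to (from is reserved in Python), and the file name and times are made up for illustration.

```python
def read_wave(x, index=None, start=None, end=None):
    """Stand-in that only reports how its arguments were bound."""
    return {"x": x, "index": index, "start": start, "end": end}


# Unnamed arguments fill parameters left to right, so 5.0 lands in `index`
# and 65.0 lands in `start`; `end` is never set.
positional = read_wave("rec1.wav", 5.0, 65.0)

# Naming the arguments puts the values where they were intended.
named = read_wave("rec1.wav", start=5.0, end=65.0)

print(positional)
print(named)
```

In the positional call the intended start time ends up as the index and the intended end time as the start, which mirrors the shift described for the R call.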
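Regarding the error: acianalyzeFUN does return the result of acoustic_complexity (invisibly, because its last expression is the assignment to soundfile.aci), and that result is a list with several elements. output, however, is a logical vector, and because length(97) is 1 it has only one element, so output[[i]] cannot hold the whole list. Collect the results in a list instead, e.g. output <- vector("list", nrow(mvb)), or store only the element of the result that you need. Ending the function with soundfile.aci on its own line also makes the return value explicit.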

U-SQL Ignore Empty Files

I receive a daily dump of files from a data provider. On occasion we receive empty files (20bytes). Is there any way to automatically avoid processing or skip these files?
I have tried:
USING Extractors.Csv(skipFirstNRows:1, silent:true);
But I seem to get a vertex failure that I believe is caused by the empty files.
We recently added a FILE.LENGTH property as a computed virtual column that you can use to filter out files of a certain size.
For example the following should only operate on the files that are larger than 20 bytes:
@data =
    EXTRACT
        // ... columns to extract
        , file_sz = FILE.LENGTH()
    FROM "/mydata/{*}"
    USING Extractors.Csv();
@res =
    SELECT *
    FROM @data
    WHERE file_sz > 20;
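If the files can be screened before they reach the U-SQL job, the same size threshold can be applied in a few lines of Python. This is only a sketch under assumptions: it works on a local directory, the helper name non_empty_files is hypothetical, and the 20-byte threshold is the one mentioned in the question.

```python
from pathlib import Path


def non_empty_files(directory: str, min_bytes: int = 20) -> list[Path]:
    """Return regular files in `directory` strictly larger than `min_bytes`, sorted by name."""
    return sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.stat().st_size > min_bytes
    )
```

As with the WHERE clause, a file of exactly 20 bytes is treated as empty and left out.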

Unable to build inline segments in RSiteCatalyst package in R

I am trying to build an inline segment to filter the pages (e.g. to separate the pages for blogs and games) using the function BuildClassificationValueSegment() to get the data from the Adobe Analytics API.
I have tried something like:
report.data.visits <- QueueTrended(reportsuite.id,date.from,date.to,metrics,elements,
                                   segment.inline = BuildClassificationValueSegment("evar2","blog","OR"))
I got this error:
Error in ApiRequest(body = report.description, func.name = "Report.Validate") :
ERROR: segment_invalid - Segment "evar2" not valid for this company
In addition: Warning message:
In if (segment.inline != "") { :
the condition has length > 1 and only the first element will be used
Please help with this. Thanks in advance.
I recommend declaring the InlineSegment in advance and storing it in a variable, then passing it to the QueueTrended function.
I've been using the following syntax to generate an inline segment:
library(jsonlite)  # provides unbox()
InlineSegment <- list(container=list(type=unbox("hits"),
                                     rules=data.frame(
                                       name=c("Page Name(eVar48)"),
                                       element=c("evar48"),
                                       operator=c("equals"),
                                       value=c(as.character("value1","value2"))
                                     )))
You can change the name and element arguments in order to personalize the query.
The next step is to pass the InlineSegment to the QueueRanked function:
Report <- as.data.frame(QueueRanked("reportsuite",
                                    date.from = dateStart,
                                    date.to = dateEnd,
                                    metrics = c("pageviews"),
                                    elements = c("element"),
                                    segment.inline = InlineSegment,
                                    max.attempts=500))
I borrowed that syntax from this thread some time ago: https://github.com/randyzwitch/RSiteCatalyst/issues/129
Please note that there might be easier ways to obtain this kind of report without using InlineSegmentation. Maybe you can use the selected argument from the QueueRanked function in order to narrow down the scope of the report.
Also, I'm purposefully avoiding the BuildClassificationValueSegment function as I found it a bit difficult to understand.
Hope this workaround helps...

Looking for algorithm to do long pair wise nucleotide alignments

I am trying to scan for possible SNPs and indels by aligning scaffolds to subsequences from a reference genome (the raw reads are not available). I am using R/Bioconductor and the pairwiseAlignment function from the Biostrings package.
This was working fine for smaller scaffolds, but failed when I tried to align a 56 kbp scaffold, with the error message:
Error in QualityScaledXStringSet.pairwiseAlignment(pattern = pattern,
: cannot allocate memory block of size 17179869183.7 Gb
I am not sure whether this is a bug. I was under the impression that the Needleman-Wunsch algorithm used by pairwiseAlignment is O(n*m), which I thought would imply on the order of 3.1E9 operations (56k * 56k ~= 3.1E9). It seems the Needleman-Wunsch similarity matrix should likewise take up on the order of 3.1 gigabytes of memory. I am not sure whether I am misremembering big-O notation or whether that is actually the memory needed to build the alignment, given the overhead of the R scripting environment.
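A rough check of that estimate, as a minimal sketch: the cell count follows from the sequence lengths given in the snippet further down (56170 and 56168), while the bytes-per-cell values are assumptions, since the internal layout used by pairwiseAlignment is not stated here.

```python
def dp_matrix_cells(n: int, m: int) -> int:
    """Cells in a full Needleman-Wunsch matrix, including the initial gap row and column."""
    return (n + 1) * (m + 1)


def dp_matrix_gib(n: int, m: int, bytes_per_cell: int) -> float:
    """Memory for one full matrix in GiB (2**30 bytes), for an assumed cell size."""
    return dp_matrix_cells(n, m) * bytes_per_cell / 2**30


if __name__ == "__main__":
    n, m = 56170, 56168  # sequence lengths from the question
    print(f"cells: {dp_matrix_cells(n, m):,}")
    for size in (1, 4, 8):  # assumed cell sizes in bytes
        print(f"{size} byte(s)/cell: {dp_matrix_gib(n, m, size):.1f} GiB")
```

One full matrix comes to roughly 3 GiB at one byte per cell and four times that with 4-byte cells. The size reported in the error, about 17179869184 Gb, equals 2^64 bytes expressed in units of 2^30 bytes, far beyond any of these estimates. One possible reading (a guess, not confirmed here) is that the size computation overflowed rather than that the alignment really needs that much memory.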
Does anybody have suggestions for a better alignment algorithm to use for aligning longer sequences? An initial alignment was already done using BLAST to find the region of the reference genome to align. I am not entirely confident in BLAST's reliability for correctly placing indels, and I have not yet been able to find an API as good as the one provided by Biostrings for parsing the raw BLAST alignments.
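One memory-saving variant, as a minimal sketch rather than a replacement for pairwiseAlignment: Needleman-Wunsch can compute the optimal global score while keeping only two rows of the matrix, so memory grows with the shorter sequence instead of with the product of the lengths. The scoring values (match +1, mismatch -1, linear gap -2) are assumptions for illustration. The function returns the score only; recovering the alignment itself, and with it the indel positions, in linear space needs Hirschberg's divide-and-conquer extension, which is not shown.

```python
def nw_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Optimal global alignment score using two rows of the DP matrix."""
    if len(b) > len(a):
        a, b = b, a  # keep the row length equal to the shorter sequence
    prev = [j * gap for j in range(len(b) + 1)]  # row 0: all-gap prefix scores
    for i, ca in enumerate(a, start=1):
        curr = [i * gap]  # column 0: all-gap prefix score
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]
```

The running time is still proportional to n*m, so in pure Python this is only practical for short sequences; it is meant to show the memory trade-off, not to be used on 56 kbp inputs.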
By the way, here is a code snippet that replicates the problem:
library("Biostrings")
scaffold_set = read.DNAStringSet(scaffold_file_name) #scaffold_set is a DNAStringSet instance
scafseq = scaffold_set[[scaffold_name]] #scafseq is a "DNAString" instance
genome = read.DNAStringSet(genome_file_name)[[1]] #genome is a "DNAString" instance
#qstart, qend, substart, subend are all from initial BLAST alignment step
scaf_sub = subseq(scafseq, start=qstart, end=qend) #56170-letter "DNAString" instance
genomic_sub = subseq(genome, start=substart, end=subend) #56168-letter "DNAString" instance
curalign = pairwiseAlignment(pattern = scaf_sub, subject = genomic_sub)
#that last line gives the error:
#Error in .Call2("XStringSet_align_pairwiseAlignment", pattern, subject, :
#cannot allocate memory block of size 17179869182.9 Gb
The error does not happen with shorter alignments (hundreds of bases).
I have not yet found the length cutoff where the error starts happening
So I use Clustal as an alignment tool. Not sure about the specific performance, but it has never given me issues when doing multiple sequence alignments of large quantity. Here is a script that runs a whole directory of .fasta files and aligns them. You can modify the flags on the system call to suit your input/output needs. Just look at the clustal documentation. This is in Perl, I don't use R too much for alignments. You need to edit the executable path in the script to match where clustal is on your computer.
#!/usr/bin/perl
use warnings;
print "Please type the directory of protein fasta files to align (end the directory path with a / or this will fail!): ";
$directory = <STDIN>;
chomp $directory;
opendir (DIR,$directory) or die $!;
my @file = readdir DIR;
closedir DIR;
my $add="_align.fasta";
foreach $file (@file) {
    next if $file =~ /^\./;    # skip ".", ".." and hidden files
    my $infile = "$directory$file";
    (my $fileprefix = $infile) =~ s/\.[^.]+$//;
    my $outfile="$fileprefix$add";
    # edit this path to point at your own clustalw2 executable
    system "/Users/Wes/Desktop/eggNOG_files/clustalw-2.1-macosx/clustalw2 -INFILE=$infile -OUTFILE=$outfile -OUTPUT=FASTA -tree";
}
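For readers more comfortable in Python, the per-file command construction can be sketched as below. The function name and any example paths are hypothetical; the clustalw2 flags are copied from the Perl script and not independently checked, and the sketch only builds the argument list without running anything.

```python
from pathlib import Path


def clustal_command(infile: Path, executable: str = "clustalw2") -> list[str]:
    """Build the argument list for one input file, mirroring the Perl script's naming."""
    # Strip the last extension and append "_align.fasta", as the Perl regex does.
    outfile = infile.with_suffix("").as_posix() + "_align.fasta"
    return [
        executable,
        f"-INFILE={infile.as_posix()}",
        f"-OUTFILE={outfile}",
        "-OUTPUT=FASTA",
        "-tree",
    ]
```

The returned list can be handed to subprocess.run(cmd, check=True) once the executable path points at a local clustalw2 installation.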
