Trouble concatenating netcdf files with ncrcat - netcdf

I have a list of netcdf files that I am trying to concatenate along the time dimension.
I am attempting to use the steps outlined here, which seem simple enough. However, I am running into some errors (likely some small/stupid oversight on my part...)
When I try to first make time a record dimension, I am using the following command:
ncks -O --mk_rec_dmn time TiMREX_20080526_000001.nc test_out.nc
This, however, gives me the following error:
ncks: invalid option -- '-'
It seems like this is just some simple syntax/typo error on my part, but try as I might I can't find anything wrong.
Just to be sure, when I run ncdump -h on the file, it confirms that there is indeed a time dimension:
ncdump -h TiMREX_20080526_000001.nc
netcdf TiMREX_20080526_000001 {
dimensions:
time = 1 ;
bounds = 2 ;
x0 = 300 ;
y0 = 300 ;
z0 = 40 ;
Additionally, if I try to skip this step and just go right to the ncrcat part...
ncrcat -O TiMREX_20080526_000001.nc TiMREX_20080526_000733.nc test_out.nc
I get the following error:
ncopen: filename "TiMREX_20080526_000001.nc": Not a netCDF file
Which is especially odd... I'm pretty confident it is indeed a netCDF file (I just ran ncdump on it after all, and have no problem viewing it with ncview...)
Any thoughts? What simple step am I embarrassingly missing?

This is a weird error, as your command looks syntactically correct. To be sure, I copied it to my machine, where it ran as expected with no 'invalid option' error, so I am unable to reproduce the problem. Based on the error message you report, it seems as though you might (somehow) be using a character that the system does not understand as a dash. In other words, the error you report is what I would expect if ncks received a funky character that looks like a dash but is not really a dash. Maybe when you copied it to stackoverflow it got converted to a real dash, which is why it works for me (try copying your own command above back into your console). Make sure the dash character you type is the same as the minus sign on a normal keyboard, and not something else. Some keyboards/character sets produce characters that look similar to dashes but are not ASCII dashes. Good luck.
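If you want to confirm whether a stray non-ASCII character is to blame, one approach is to dump the bytes of the command and look at the dashes. A minimal sketch (the filenames are the ones from the question; myscript.sh is just a hypothetical file holding the command): a real ASCII hyphen prints as "-", while an en/em dash shows up as a multi-byte escape sequence.
printf '%s' 'ncks -O --mk_rec_dmn time TiMREX_20080526_000001.nc test_out.nc' | od -c | head
# or, if the command lives in a script, show non-ASCII bytes as M-... sequences:
cat -v myscript.sh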

Related

View NetCDF metadata without tripping on large file size / format

Summary
I need help getting NCO tools to be helpful. I'm running into the error
"One or more variable sizes violate format constraints"
... when trying to just view the list of variables in the file with:
ncdump -h isrm_v1.2.1.ncf
It seems odd to trip on this when I'm not asking for any large variables to be read ... just metadata. Are there any flags I should or could be passing to avoid this error?
Reprex
isrm_v1.2.1.ncf (165 GB) is available on Zenodo.
Details
I've just installed the NCO suite via brew install nco --build-from-source on a Mac (I know, I know) running OS X 11.6.5. ncks --version says 5.0.6.
Tips appreciated. I've been trawling through the ncks docs for a couple of hours without much insight. A friend was able to slice the file on a different system running actual Linux, so I'm pretty sure my NCO install is to blame.
How can I dig in deeper to find the root cause? NCO tools don't seem very verbose. I understand there are different sub-formats of NetCDF (3, 4, ...) but I'm not even sure how to verify the version/format of the .nc file that I'm trying to access.
My larger goal is to be able to slice it, like ncks -v pNH4 -d layer,0 isrm_v1.2.1.ncf pNH4L0.nc, but if I can't even view metadata, I'm thinking I need to solve that first.
The more-verbose version of the error message, for the record, is:
HINT: NC_EVARSIZE errors occur when attempting to copy or aggregate input files together into an output file that exceeds the per-file capacity of the output file format, and when trying to copy, aggregate, or define individual variables that exceed the per-variable constraints of the output file format. The per-file limit of all netCDF formats is not less than 8 EiB on modern computers, so any NC_EVARSIZE error is almost certainly due to violating a per-variable limit. Relevant limits: netCDF3 NETCDF_CLASSIC format limits fixed variables to sizes smaller than 2^31 B = 2 GiB ~ 2.1 GB, and record variables to that size per record. A single variable may exceed this limit if and only if it is the last defined variable. netCDF3 NETCDF_64BIT_OFFSET format limits fixed variables to sizes smaller than 2^32 B = 4 GiB ~ 4.2 GB, and record variables to that size per record. Any number of variables may reach, though not exceed, this size for fixed variables, or this size per record for record variables. The netCDF3 NETCDF_64BIT_DATA and netCDF4 NETCDF4 formats have no variable size limitations of real-world import. If any variable in your dataset exceeds these limits, alter the output file to a format capacious enough, either netCDF3 classic with 64-bit offsets (with -6 or --64), to PnetCDF/CDF5 with 64-bit data (with -5), or to netCDF4 (with -4 or -7). For more details, see http://nco.sf.net/nco.html#fl_fmt
Tips appreciated!
ncdump is not an NCO program, so I can't help you there, except to say that printing metadata should not cause an error in this case, so try ncks -m in.nc instead of ncdump -h in.nc.
Nevertheless, the hyperslab problem you have experienced is most likely due to trying to shove too much data into a netCDF format that can't hold it. The generic solution to that is to write the data to a more capacious netCDF format:
Try either one of these commands:
ncks -5 -v pNH4 -d layer,0 isrm_v1.2.1.ncf pNH4L0.nc
ncks -7 -v pNH4 -d layer,0 isrm_v1.2.1.ncf pNH4L0.nc
Formats are documented here
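As for verifying which on-disk format the file uses (one of the things asked above), the netCDF utilities can report it directly: ncdump -k prints the file's kind, assuming the header can be read at all. A minimal sketch with the filename from the question:
# prints one of: classic, 64-bit offset, cdf5, netCDF-4, netCDF-4 classic model
ncdump -k isrm_v1.2.1.ncf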

Looping through the content of a file in Zsh

I'm trying to loop through the contents of a file in zsh. In my loop I want to get user input. Going off of this answer for Bash, I'm attempting to do:
while read -u 10 line; do
echo $line;
# TODO read from stdin here, etc.
done 10<myfile.txt
However I get an error:
zsh: parse error near `10'
Referring to the 10 after the done. Obviously I'm not getting the file descriptor syntax right, but I'm having trouble figuring out the docs.
Use a file descriptor number less than 10. If you want to hard-code file descriptor numbers, stick to the range 3-9 (plus 0-2 for stdin, stdout, and stderr). When zsh needs file descriptors itself, it uses numbers in the 10+ range.
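Applied to the loop in the question, a minimal sketch using fd 3 (myfile.txt as in the question):
while read -u 3 line; do
  echo $line
  # stdin (fd 0) is untouched here, so you can still prompt the user, e.g.:
  # read -r answer
done 3<myfile.txt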
If you're even getting close to needing more than the 7 available hard-coded file descriptors, you should really think about using variables to name them. Syntax like exec {myfd}<myfile.txt will open a file with zsh allocating a file descriptor of 10 or greater and assigning it to $myfd.
Bourne shell syntax is not entirely unambiguous where file descriptors numbered 10 and over are concerned, and even in bash I'd advise against using them. I'm not entirely sure how bash avoids conflicts if it needs to open a descriptor for internal use - I guess it never needs to leave any open. This may look like a zsh limitation at first sight, but it is actually a sensible feature.
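For completeness, a sketch of the named-descriptor variant mentioned above (the variable name myfd is arbitrary):
exec {myfd}<myfile.txt         # zsh allocates a free descriptor (10 or above) and stores it in $myfd
while read -u $myfd line; do
  echo $line
done
exec {myfd}<&-                 # close the descriptor when finished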

Is there a way to check where R is 'stuck' within a for loop? (R)

I am using system() to run several files iteratively through a program via CMD. It deposits each output into a sub-directory designated specifically and only for that input file, so the number of inputs is exactly equal to the number of output directories/outputs.
My code works for the first iteration, but I can see in the console that it won't move on to the second file after completing the first. The stop sign remains active so I know R is still 'running', but since the for loop environment is unique I can't really tell what it's stuck on. It just stays like this for hours. Therefore I'm not sure how to begin to diagnose the issue I'm having. Is there a way of tracing what happened after cancelling the code, for example?
If you're curious, the code looks like this, btw. I don't know how to make it reproducible, so I just commented each line:
for (i in 1:length(flist)) {
  ## flist is a vector of character strings. Each element is both the name of
  ## the input file and the name of the output directory
  setwd(paste0(solutions_dir, "\\", flist[i]))
  # sets the appropriate dir
  system(paste0(program_dir, "\\program.exe I=",
                file_dir, "\\", flist[i], " O=", solutions_dir, "\\", flist[i],
                "\\solv"))
  ## line that runs the program's exe with the appropriate input/output locations
}

Unix SQLLDR script gives 'Unexpected End of File' error

All, I am running the following script to load data onto the Oracle server using a Unix box and sqlldr. Earlier it gave me an error saying sqlldr: command not found. I added "SQLPLUS < EOF", but it still gives me an 'unexpected end of file' syntax error on line 12, even though it is only 11 lines of code. What do you think the problem is?
#!/bin/bash
FILES='ls *.txt'
CTL='/blah/blah1/blah2/name/filename.ctl'
for f in $FILES
do
cat $CTL | sed "s/:FILE/$f/g" >$f.ctl
sqlplus ID/'PASSWORD'#SERVERNAME << EOF
sqlldr SCHEMA_NAME/SCHEMA_PASSWORD control=$f.ctl data=$f
EOF
done
sqlplus will never know what to do with the command sqlldr. They are two complementary cmd-line utilities for interfacing with Oracle DB.
Note that NO sqlplus or EOF etc. is required to load data into a schema:
#!/bin/bash
# you don't want this: FILES='ls *.txt'
CTL_PATH='/blah/blah1/blah2/name'
CTL_FILE="$CTL_PATH/filename.ctl"
SCHEMA_NM=SCHEMA_NAME
SCHEMA_PSWD=SCHEMA_PASSWORD
SERVER_NM=SERVERNAME
for f in *.txt
do
  # don't need cat!  cat $CTL | sed "s/:FILE/$f/g" >"$f".ctl
  sed "s/:FILE/$f/g" "$CTL_FILE" > "$CTL_PATH/$f.ctl"
  # myBad: sqlldr "$SCHEMA_NAME/$SCHEMA_PASSWORD" control="$CTL_PATH/$f.ctl" data="$f"
  sqlldr "$SCHEMA_NM/$SCHEMA_PSWD"#"$SERVER_NM" control="$CTL_PATH/$f.ctl" data="$f" rows=10000 direct=true errors=999
done
Without getting too philosophical, using assignments like FILES=$(ls *.txt) is a bad habit to get into. By contrast, for f in *.txt will deal correctly with files that have odd characters in their names (like spaces or other syntax-breaking values). BUT the other habit you do want to get into is to quote all variable references (like $f) with dbl-quotes: "$f", OK? ;-) This is the other side of the protection for files with spaces etc. embedded in them, as the short example below shows.
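A quick illustration, assuming the directory contains a file literally named "my data.txt":
touch 'my data.txt'
for f in $(ls *.txt); do echo "[$f]"; done   # prints [my] and [data.txt] - the name gets split
for f in *.txt; do echo "[$f]"; done         # prints [my data.txt] - the name survives intact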
In the edit update, I've turned your CTL_PATH and CTL_FILE into variables. I think I understand your intent: you have one standard CTL_FILE that you pass through sed to create a table-specific .ctl file (a good approach, in my experience). Note that you don't need cat to send a file to sed, but your use of redirection (> $f.ctl) to create the altered file is very shell-like too.
In the 2nd edit update, I looked here on S.O. and found an example sqlldr cmd-line that has the correct syntax, and modified it to work with your variable names.
To finish up:
A. Are you sure the Oracle Client package is installed on the machine that you are running your script on?
B. Is the /path/to/oracle/client/tools/bin included in your working $PATH?
C. Try which sqlldr. If you don't get anything, either it's not installed or it's not in the path.
D. If not installed, you'll have to get it installed.
E. Once installed, note the directory that contains the sqlldr cmd. find / -name 'sqlldr*' will take a long time to run, but it will print out the path you want to use.
F. Take the "path" part of what is returned (like /opt/oracle/11.2/client/bin/, but not the sqlldr at the end), and edit the script at the 2nd line with
(Txt added to appease the S.O. Formatter ;-) )
export ORCL_PATH="/path/you/found/to/oracle/client"
export PATH="$ORCL_PATH:$PATH"
These steps should solve any remaining issues. If this doesn't work, see if there is someone where you work who understands your local computing environment and can help explain any missing or different steps.
IHTH

zsh make **/*.cpp **/*.cxx **/*.hpp not result in error

I have "v" aliased to "vim **/*.cpp **/*.hpp **/*.cxx"
Problem is, if I'm in a directory without any *.cxx files, zsh treats this as an error. Is there any way to tell zsh to treat the absence of **/*.cxx files as "" instead of an error?
It sounds like you want:
set -o NULL_GLOB
Another variation that may be of interest is:
set -o CSH_NULL_GLOB
They work slightly differently when all the patterns fail to expand. When at least one pattern successfully expands, the two are the same. But if none of the patterns expand, NULL_GLOB will still run the command while CSH_NULL_GLOB will return an error.
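A minimal sketch of the difference, assuming the current directory contains .cpp and .hpp files but no .cxx files:
setopt NULL_GLOB                       # equivalent to set -o NULL_GLOB
print -l **/*.cpp **/*.hpp **/*.cxx    # the .cxx pattern silently expands to nothing
setopt NO_NULL_GLOB CSH_NULL_GLOB
print -l **/*.cpp **/*.hpp **/*.cxx    # still fine: at least one pattern matched
print -l **/*.cxx                      # error: no pattern matched anything
If you'd rather not change a global option, the (N) glob qualifier gives the same null-glob behaviour to a single pattern, e.g. vim **/*.cpp(N) **/*.hpp(N) **/*.cxx(N).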

Resources