Extract a given variable from multiple NetCDF files and concatenate to a single file

I am trying to extract a single variable (DUEXTTAU) from multiple NC files, and then combine all the individual files into a single NC file. I am using nco, but have an issue with ncks.
The NC filenames follow:
MERRA2_100.tavgM_2d_aer_Nx.YYYYMM.nc4
Each file has one (monthly) time step, and the time value itself carries no real information; the date is encoded in the units and begin_date attributes instead. For example, the file MERRA2_100.tavgM_2d_aer_Nx.198001.nc4 has:
int time(time=1);
:long_name = "time";
:units = "minutes since 1980-01-01 00:30:00";
:time_increment = 60000; // int
:begin_date = 19800101; // int
:begin_time = 3000; // int
:vmax = 9.9999999E14f; // float
:vmin = -9.9999999E14f; // float
:valid_range = -9.9999999E14f, 9.9999999E14f; // float
:_ChunkSizes = 1U; // uint
I repeat this step for each file:
ncks -v DUEXTTAU MERRA2_100.tavgM_2d_aer_Nx.YYYYMM.nc4 YYYYMM.nc4
and then
ncrcat YYYYMM.nc4 final.nc4
In final.nc4, the time coordinate has the same value in every record (that of the first YYYYMM.nc4 file). For example, after combining the three files for 198001, 198002 and 198003, the time coordinate equals 198001 for all the time steps. How should I deal with this?

Firstly, this command should work:
ncrcat -v DUEXTTAU MERRA2_100.tavgM_2d_aer_Nx.??????.nc4 final.nc4
However, recent versions of NCO fail to correctly reconstruct or re-base the time coordinate when time is an integer, as it is in your case. The fix is in the latest NCO snapshot on GitHub and will be in 4.9.3, to be released hopefully this week. If installing from source is not an option, then manual intervention is required (e.g., change time to floating point in each input file with ncap2 -s 'time=float(time)' in.nc out.nc). In any case, the time_increment, begin_date, and begin_time attributes are non-standard and will simply be copied from the first file. But time itself should be correctly reconstructed if you use a non-broken version of ncrcat.
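If upgrading is not possible, a minimal sketch of that manual workaround, using the ncap2 command above (the *_flt.nc4 intermediate filenames are just illustrative):
for f in MERRA2_100.tavgM_2d_aer_Nx.??????.nc4; do
ncap2 -O -s 'time=float(time)' "$f" "${f%.nc4}_flt.nc4"   # re-type the integer time coordinate as float
done
ncrcat -v DUEXTTAU MERRA2_100.tavgM_2d_aer_Nx.??????_flt.nc4 final.nc4
rm -f MERRA2_100.tavgM_2d_aer_Nx.??????_flt.nc4   # clean up intermediates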

You can do this with CDO as well, but you need two steps:
cdo mergetime MERRA2_100.tavgM_2d_aer_Nx.??????.nc4 merged_file.nc
cdo selvar,DUEXTTAU merged_file.nc DUEXTTAU.nc
This should work if the begin dates are all set correctly. The problem is that merged_file.nc could be very large, so it may be better to loop through the files to extract the variable first and then combine:
for file in MERRA2_100.tavgM_2d_aer_Nx.??????.nc4; do
cdo selvar,DUEXTTAU $file ${file%.nc4}_duexttau.nc4
done
cdo mergetime MERRA2_100.tavgM_2d_aer_Nx.??????_duexttau.nc4 DUEXTTAU.nc
rm -f MERRA2_100.tavgM_2d_aer_Nx.??????_duexttau.nc4 # clean up
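If your CDO version has the select operator (which, unlike selvar, accepts multiple input files), the two steps can also be collapsed into one call; a sketch, assuming the same filenames:
cdo select,name=DUEXTTAU MERRA2_100.tavgM_2d_aer_Nx.??????.nc4 DUEXTTAU.nc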

Related

Using CDO to convert 2D .nc file to a 4D .nc file

I have a 2D .nc file with dimensions time and depth that I want to convert to a 4D .nc file. Latitude and longitude are saved as variable names in the 2D file. They are not in a particular order and there are large missing areas as well. The .nc file also contains temperature recordings for each time and depth.
The file header is as follows:
dimensions:
time = UNLIMITED ; // (309 currently)
level = 2000 ;
variables:
float latitude(time) ;
latitude:units = "degree_north" ;
float longitude(time) ;
longitude:units = "degree_east" ;
float temperature(time, level) ;
temperature:standard_name = "sea_water_temperature" ;
temperature:long_name = "Water Temperature" ;
temperature:units = "Celsius" ;
temperature:_FillValue = -9999.f ;
temperature:missing_value = -9999.f ;
Is there an easy way using cdo or nco to bin the temperature recordings into a pre-defined latitude x longitude grid so that the resulting .nc file has four dimensions? (time,depth,latitude,longitude)
Adrian's answer looks correct to me, except that you do not need/want the dollar signs in front of the dimension names, so try
ncap2 -s 'Temp_new[time,depth,latitude,longitude]=temperature' in.nc out.nc
Documentation is here.
I think this posting is perhaps related to your question.
Maybe try
ncap2 -s 'Temp_new[time,depth,latitude,longitude]=temperature' in.nc out.nc
I'm not great at ncap2, perhaps Charlie will correct this post if this is not exactly correct.
(now edited to correct the error pointed out by Charlie)

View NetCDF metadata without tripping on large file size / format

Summary
I need help getting NCO tools to be helpful. I'm running into the error
"One or more variable sizes violate format constraints"
... when trying to just view the list of variables in the file with:
ncdump -h isrm_v1.2.1.ncf
It seems odd to trip on this when I'm not asking for any large variables to be read ... just metadata. Are there any flags I should or could be passing to avoid this error?
Reprex
isrm_v1.2.1.ncf (165 GB) is available on Zenodo.
Details
I've just installed the NCO suite via brew install nco --build-from-source on a Mac (I know, I know) running OS X 11.6.5. ncks --version says 5.0.6.
Tips appreciated. I've been trawling through the ncks docs for a couple of hours without much insight. A friend was able to slice the file on a different system running actual Linux, so I'm pretty sure my NCO install is to blame.
How can I dig in deeper to find the root cause? NCO tools don't seem very verbose. I understand there are different sub-formats of NetCDF (3, 4, ...) but I'm not even sure how to verify the version/format of the .nc file that I'm trying to access.
My larger goal is to be able to slice it, like ncks -v pNH4 -d layer,0 isrm_v1.2.1.ncf pNH4L0.nc, but if I can't even view metadata, I'm thinking I need to solve that first.
The more-verbose version of the error message, for the record, is:
HINT: NC_EVARSIZE errors occur when attempting to copy or aggregate input files together into an output file that exceeds the per-file capacity of the output file format, and when trying to copy, aggregate, or define individual variables that exceed the per-variable constraints of the output file format. The per-file limit of all netCDF formats is not less than 8 EiB on modern computers, so any NC_EVARSIZE error is almost certainly due to violating a per-variable limit. Relevant limits: netCDF3 NETCDF_CLASSIC format limits fixed variables to sizes smaller than 2^31 B = 2 GiB ~ 2.1 GB, and record variables to that size per record. A single variable may exceed this limit if and only if it is the last defined variable. netCDF3 NETCDF_64BIT_OFFSET format limits fixed variables to sizes smaller than 2^32 B = 4 GiB ~ 4.2 GB, and record variables to that size per record. Any number of variables may reach, though not exceed, this size for fixed variables, or this size per record for record variables. The netCDF3 NETCDF_64BIT_DATA and netCDF4 NETCDF4 formats have no variable size limitations of real-world import. If any variable in your dataset exceeds these limits, alter the output file to a format capacious enough, either netCDF3 classic with 64-bit offsets (with -6 or --64), to PnetCDF/CDF5 with 64-bit data (with -5), or to netCDF4 (with -4 or -7). For more details, see http://nco.sf.net/nco.html#fl_fmt
Tips appreciated!
ncdump is not an NCO program, so I can't help you there, except to say that printing metadata should not cause an error in this case, so try ncks -m in.nc instead of ncdump -h in.nc.
Nevertheless, the hyperslab problem you have experienced is most likely due to trying to shove too much data into a netCDF format that can't hold it. The generic solution to that is to write the data to a more capacious netCDF format:
Try either one of these commands:
ncks -5 -v pNH4 -d layer,0 isrm_v1.2.1.ncf pNH4L0.nc
ncks -7 -v pNH4 -d layer,0 isrm_v1.2.1.ncf pNH4L0.nc
Formats are documented here
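As an aside, since the question also asks how to verify which format the file uses, the stock netCDF utilities can report this directly (a quick check, using the filename from the question):
ncdump -k isrm_v1.2.1.ncf   # prints the on-disk format, e.g. "classic", "64-bit offset", "cdf5", or "netCDF-4"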

Concatenate ncdf files and chunk on record dimension (nco)

I am trying to concatenate a large (~4000) number of ncdf files into a single file. Each input file is a spatial raster, with an x and a y dimension.
I am trying to work with ncecat:
ncecat -4 -L 5 -D 2 --open_ram --cnk_csh=1000000000 \
--cnk_dmn record,2000 --cnk_dmn x,10 --cnk_dmn y,10 \
$input_files output.nc
This gives me something like this:
netcdf test {
dimensions:
record = UNLIMITED ; // (6 currently)
y = 11250 ;
x = 15000 ;
variables:
float Band1(record, y, x) ;
Band1:long_name = "GDAL Band Number 1" ;
Band1:_FillValue = -3.4e+38f ;
Band1:grid_mapping = "transverse_mercator" ;
Band1:_Storage = "chunked" ;
Band1:_ChunkSizes = 1, 10, 10 ;
Band1:_DeflateLevel = 5 ;
Band1:_Filter = "|1,5" ;
Band1:_Shuffle = "true" ;
Band1:_Endianness = "little" ;
As the _ChunkSizes attribute shows, the record dimension was not actually chunked.
I think I could run this command first and then use ncks on the output file to fix the record dimension and rechunk it. However, ncks would need to read everything into RAM and is another time-costly operation, so I am looking for a way to tell ncecat that it should also consider the record dimension as a chunking dimension. I haven't found a way to do this yet.
Your command looks well-formed, though there are a few comments I would make. First, the behavior you are seeing may be a bug, since the command should produce record-dimension chunks of size 2000 as requested. Second, please read the chunking documentation here. This leads to the possibility that adding the --cnk_plc=cnk_xpl option may help. Third, I suggest you concatenate and chunk the files with ncrcat, not ncecat. The former is less memory-intensive than the latter, as described here.
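For reference, a sketch of the second suggestion, i.e. the original command with --cnk_plc=cnk_xpl added (all other options as in the question):
ncecat -4 -L 5 -D 2 --open_ram --cnk_plc=cnk_xpl --cnk_csh=1000000000 \
--cnk_dmn record,2000 --cnk_dmn x,10 --cnk_dmn y,10 \
$input_files output.nc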

How to add time dimension when concatenating daily TRMM netCDF files using NCO?

I downloaded daily TRMM 3B42 data for a few days from https://disc.gsfc.nasa.gov/datasets. The filenames are of the form 3B42_Daily.yyyymmdd.7.nc4.nc4 but the files do not contain any time dimension. Hence when I use ncecat to concatenate various files, the date information is missing in the resulting file. I want to know how to add the time information in the combined dataset.
It seems that the timestamp is a part of the global attributes. Here is the relevant part from ncdump:
$ ncdump -h ~/Downloads/3B42_Daily.19980730.7.nc4.nc4
netcdf \3B42_Daily.19980730.7.nc4 {
dimensions:
lon = 201 ;
lat = 201 ;
variables:
float lon(lon) ;
...trimmed...
// global attributes:
:BeginDate = "1998-07-30" ;
:BeginTime = "01:30:00.000Z" ;
:EndDate = "1998-07-31" ;
:EndTime = "01:29:59.999Z" ;
...trimmed...
When I tried using ncecat 3B42_Daily.199808??.7.nc4.nc4 /tmp/daily.nc4 it gives
$ ncdump -h /tmp/daily.nc
netcdf daily {
dimensions:
record = UNLIMITED ; // (5 currently)
lon = 201 ;
lat = 201 ;
variables:
float lon(lon) ;
...trimmed...
// global attributes:
:BeginDate = "1998-08-01" ;
:BeginTime = "01:30:00.000Z" ;
:EndDate = "1998-08-02" ;
:EndTime = "01:29:59.999Z" ;
...trimmed...
The time information in the global attribute of the first file is retained, which is not very useful.
When trying to use xarray in Python, I face the same issue; again, I could not find how the time information contained in the global attribute can be used when concatenating the data.
I can think of two possible solutions, but I do not know how to achieve them.
Add a timestamp "by hand" to each of the files first using some command, and then use ncecat
Somehow have ncecat read the global attribute and convert it into a dimension and a variable while concatenating.
From the documentation on http://nco.sourceforge.net/nco.html I could not figure out how to achieve either of these two ways. Or is there a third better way to achieve this (concatenation with relevant time information added)?
Since the data files do not follow CF conventions, you will likely have to manually create a time coordinate after using ncecat to concatenate the files. It only takes a few commands if the times are regular, e.g.,
ncecat -u time in*.nc out.nc
ncap2 -O -s 'time[time]={0.0,1.0,2.0,3.0,4.0}' out.nc out.nc
ncatted -a units,time,o,c,"days since 1998-08-01" out.nc
or use ncap2's array facilities for generic arithmetic arrays.
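For longer file lists, where typing the values out by hand gets tedious, a sketch of that array approach, assuming daily spacing and a reasonably recent NCO with the array() function:
ncap2 -O -s 'time[time]=array(0.0,1.0,$time)' out.nc out.nc   # fills time with 0,1,2,... along the record dimension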

How do facetiles in Apple Photos correspond to RKFace.modelId?

I have been digging through the Apple Photos macOS app for a couple weekends now and I am stuck. I am hoping the smart people at StackOverflow can figure this out.
What I don't know:
How are new hex directories determined, and how do they correspond to RKFace.modelId? Perhaps RKFace.modelId mod 16, or mod 256?
After a while, the facetile hex value no longer corresponds to the RKFace.modelId. For example, RKFace.modelId 61047 should be facetile_ee77.jpeg. The correct facetile, however, is face/20/01/facetile_1209b.jpeg. Hex value 1209b is decimal 73883, for which I have no RKFace.modelId.
Things I know:
Apple Photos leverages deep learning networks to detect and crop faces out of your imported photos. It saves a cropped JPEG of each detected face into your photo library, e.g. in resources/media/face/00/00/facetile_1.jpeg.
A record corresponding to this facetile is inserted into RKFace, where the RKFace.modelId integer is the decimal value of the hex number at the tail of the filename. You can use a standard decimal-to-hex converter to derive the correct values.
Each subdirectory, for example /00/00, will only hold a maximum of 256 facetiles before a new directory is started. The directory names are also in hex format, for example 3e, 3f.
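As a quick illustration of that decimal-to-hex naming (using the modelId 61047 mentioned above):
printf 'facetile_%x.jpeg\n' 61047   # -> facetile_ee77.jpeg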
While trying to render photo mosaics, I stumbled upon that issue, too...
Then I was lucky to find both a master image and the corresponding facetile,
allowing me to grep around, searching for the decimal and hex equivalents of the numbers embedded in the filenames.
This is what I came up with (assuming you are searching for someone named NAME):
SELECT
printf('%04x', mr.modelId) AS tileId
FROM
RKModelResource mr, RKFace f, RKPerson p
WHERE
f.modelId = mr.attachedModelId
AND f.personId = p.modelId
AND p.displayName = 'NAME'
This select prints out all RKModelResource.modelIds in hex, used to name the corresponding facetiles you were searching for. All that is needed now is the complete path to the facetile.
So, a complete bash script to copy all those facetiles of a person (to a local folder out in the current directory) could be:
#!/bin/bash
set -eEu
PHOTOS_PATH=$HOME/Pictures/Photos\ Library.photoslibrary
DB_PATH=$PHOTOS_PATH/database/photos.db
# NAME must be set before running, e.g. NAME="Jane Doe"
echo "$NAME"
mkdir -p out/$NAME
TILES=( $(sqlite3 "$DB_PATH" "SELECT printf('%04x', mr.modelId) AS tileId FROM RKModelResource mr, RKFace f, RKPerson p WHERE f.modelId = mr.attachedModelId AND f.personId = p.modelId AND p.displayName='$NAME'") )
for TILE in "${TILES[@]}"; do
FOLDER=${TILE:0:2}
SOURCE="$PHOTOS_PATH/resources/media/face/$FOLDER/00/facetile_$TILE.jpeg"
[[ -e "$SOURCE" ]] || continue
TARGET=out/$NAME/$TILE.jpeg
[[ -e "$TARGET" ]] && continue
cp "$SOURCE" "$TARGET" || :
done
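Assuming the script is saved as copy_facetiles.sh (the filename is just illustrative), it can then be run like this:
NAME="Jane Doe" bash copy_facetiles.sh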
