How to add a time dimension when concatenating daily TRMM netCDF files using NCO?

I downloaded daily TRMM 3B42 data for a few days from https://disc.gsfc.nasa.gov/datasets. The filenames are of the form 3B42_Daily.yyyymmdd.7.nc4.nc4, but the files do not contain any time dimension. Hence, when I use ncecat to concatenate several files, the date information is missing from the resulting file. I want to know how to add the time information to the combined dataset.
It seems that the timestamp is a part of the global attributes. Here is the relevant part from ncdump:
$ ncdump -h ~/Downloads/3B42_Daily.19980730.7.nc4.nc4
netcdf \3B42_Daily.19980730.7.nc4 {
dimensions:
lon = 201 ;
lat = 201 ;
variables:
float lon(lon) ;
...trimmed...
// global attributes:
:BeginDate = "1998-07-30" ;
:BeginTime = "01:30:00.000Z" ;
:EndDate = "1998-07-31" ;
:EndTime = "01:29:59.999Z" ;
...trimmed...
When I tried using ncecat 3B42_Daily.199808??.7.nc4.nc4 /tmp/daily.nc4, it gives:
$ ncdump -h /tmp/daily.nc
netcdf daily {
dimensions:
record = UNLIMITED ; // (5 currently)
lon = 201 ;
lat = 201 ;
variables:
float lon(lon) ;
...trimmed...
// global attributes:
:BeginDate = "1998-08-01" ;
:BeginTime = "01:30:00.000Z" ;
:EndDate = "1998-08-02" ;
:EndTime = "01:29:59.999Z" ;
...trimmed...
The time information in the global attribute of the first file is retained, which is not very useful.
When trying to use xarray in Python, I face the same issue: again, I could not find how the time information contained in the global attributes can be used when concatenating the data.
I can think of two possible solutions, but I do not know how to achieve either:
1. Add a timestamp "by hand" to each file first, using some command, and then use ncecat.
2. Have ncecat somehow read the global attributes and convert them into a time dimension and variable while concatenating.
From the documentation at http://nco.sourceforge.net/nco.html I could not figure out how to achieve either of these. Or is there a third, better way to achieve this (concatenation with the relevant time information added)?

Since the data files do not follow CF conventions, you will likely have to manually create a time coordinate after using ncecat to concatenate the files. It only takes a few commands if the times are regular, e.g.,
ncecat -u time in*.nc out.nc
ncap2 -O -s 'time[time]={0.0,1.0,2.0,3.0,4.0}' out.nc out.nc
ncatted -a units,time,o,c,"days since 1998-08-01" out.nc
or use ncap2's array facilities to generate more general arithmetic sequences.
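For arbitrary file counts, a sketch of that array-based approach (assuming an NCO recent enough to provide ncap2's array() function) is to generate the index sequence instead of typing it out:
ncap2 -O -s 'time=array(0.0,1.0,$time)' out.nc out.nc
followed by the same ncatted command as above to set the units.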

Related

Using CDO to convert 2D .nc file to a 4D .nc file

I have a 2D .nc file with dimensions time and depth that I want to convert to a 4D .nc file. Latitude and longitude are saved as variables in the 2D file. They are not in a particular order, and there are large missing areas as well. The .nc file also contains temperature recordings for each time and depth.
The file header is as follows:
dimensions:
time = UNLIMITED ; // (309 currently)
level = 2000 ;
variables:
float latitude(time) ;
latitude:units = "degree_north" ;
float longitude(time) ;
longitude:units = "degree_east" ;
float temperature(time, level) ;
temperature:standard_name = "sea_water_temperature" ;
temperature:long_name = "Water Temperature" ;
temperature:units = "Celsius" ;
temperature:_FillValue = -9999.f ;
temperature:missing_value = -9999.f ;
Is there an easy way using cdo or nco to bin the temperature recordings into a pre-defined latitude x longitude grid so that the resulting .nc file has four dimensions? (time,depth,latitude,longitude)
Adrian's answer looks correct to me, except that you do not need/want the dollar signs in front of the dimension names, so try
ncap2 -s 'Temp_new[time,depth,latitude,longitude]=temperature' in.nc out.nc
Documentation is here.
I think this posting is perhaps related to your question.
Maybe try
ncap2 -s 'Temp_new[time,depth,latitude,longitude]=temperature' in.nc out.nc
I'm not great at ncap2, perhaps Charlie will correct this post if this is not exactly correct.
(now edited to correct the error pointed out by Charlie)
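One caveat: ncap2's left-hand-side broadcast only works if the target dimensions already exist in in.nc. If latitude and longitude are not yet dimensions in your file, you could create them first with ncap2's defdim, e.g. for a hypothetical 1-degree global grid (the sizes here are purely illustrative):
ncap2 -O -s 'defdim("latitude",180); defdim("longitude",360)' in.nc in_grid.nc
and then run the Temp_new command above on in_grid.nc.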

Concatenate ncdf files and chunk on record dimension (nco)

I am trying to concatenate a large number (~4000) of netCDF files into a single file. Each input file is a spatial raster with x and y dimensions.
I am trying to work with ncecat:
ncecat -4 -L 5 -D 2 --open_ram --cnk_csh=1000000000 \
--cnk_dmn record,2000 --cnk_dmn x,10 --cnk_dmn y,10 \
$input_files output.nc
This gives me something like this:
netcdf test {
dimensions:
record = UNLIMITED ; // (6 currently)
y = 11250 ;
x = 15000 ;
variables:
float Band1(record, y, x) ;
Band1:long_name = "GDAL Band Number 1" ;
Band1:_FillValue = -3.4e+38f ;
Band1:grid_mapping = "transverse_mercator" ;
Band1:_Storage = "chunked" ;
Band1:_ChunkSizes = 1, 10, 10 ;
Band1:_DeflateLevel = 5 ;
Band1:_Filter = "|1,5" ;
Band1:_Shuffle = "true" ;
Band1:_Endianness = "little" ;
As the _ChunkSizes attribute shows, the record dimension was not actually chunked.
I think I could run this command first and then use ncks on the output file to fix the record dimension and rechunk again. However, since ncks needs to read everything into RAM and is another time-costly operation, I am searching for a way to tell ncecat that it should also treat the record dimension as a chunking dimension. I haven't found a way to do this yet.
Your command looks well-formed, though there are a few comments I would make. First, the behavior you are seeing may be a bug, since the command should produce record-dimension chunks of size 2000 as requested. Second, please read the chunking documentation here. This leads to the possibility that adding the --cnk_plc=cnk_xpl option may help. Third, I suggest you concatenate and chunk the files with ncrcat, not ncecat. The former is less memory-intensive than the latter, as described here.
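For reference, a sketch of that explicit-chunking variant (untested here; it simply adds --cnk_plc=cnk_xpl to your original command):
ncecat -4 -L 5 -D 2 --open_ram --cnk_plc=cnk_xpl --cnk_csh=1000000000 \
--cnk_dmn record,2000 --cnk_dmn x,10 --cnk_dmn y,10 \
$input_files output.nc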

Extract a given variable from multiple Netcdf files and concatenate to a single file

I am trying to extract a single variable (DUEXTTAU) from multiple NC files, and then combine all the individual files into a single NC file. I am using nco, but have an issue with ncks.
The NC filenames follow:
MERRA2_100.tavgM_2d_aer_Nx.YYYYMM.nc4
Each file has one (monthly) time step, and the time coordinate itself carries no useful value; what changes from file to file is the units and begin_date attributes. For example, the file MERRA2_100.tavgM_2d_aer_Nx.198001.nc4 has:
int time(time=1);
:long_name = "time";
:units = "minutes since 1980-01-01 00:30:00";
:time_increment = 60000; // int
:begin_date = 19800101; // int
:begin_time = 3000; // int
:vmax = 9.9999999E14f; // float
:vmin = -9.9999999E14f; // float
:valid_range = -9.9999999E14f, 9.9999999E14f; // float
:_ChunkSizes = 1U; // uint
I repeat this step for each file
ncks -v DUEXTTAU MERRA2_100.tavgM_2d_aer_Nx.YYYYMM.nc4 YYYYMM.nc4
and then
ncrcat YYYYMM.nc4 final.nc4
In final.nc4, the time coordinate has the same value (of the first YYYYMM.nc4). For example, after combining the 3 files of 198001, 198002 and 198003, the time coordinate equals 198001 for all the time steps. How should I deal with this?
Firstly, this command should work:
ncrcat -v DUEXTTAU MERRA2_100.tavgM_2d_aer_Nx.??????.nc4 final.nc4
However, recent versions of NCO fail to correctly reconstruct or re-base the time coordinate when time is an integer, which it is in your case. The fix is in the latest NCO snapshot on GitHub and will be in 4.9.3 to be released hopefully this week. If installing from source is not an option, then manual intervention would be required (e.g., change time to floating point in each input file with ncap2 -s 'time=float(time)' in.nc out.nc). In any case, the time_increment, begin_date, and begin_time attributes are non-standard and will simply be copied from the first file. But time itself should be correctly reconstructed if you use a non-broken version of ncrcat.
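If you do have to go the manual route, a sketch of the full workaround (the float_ prefix for the temporary files is just an illustration) could look like:
for f in MERRA2_100.tavgM_2d_aer_Nx.??????.nc4; do
    ncap2 -O -s 'time=float(time)' "$f" "float_$f"  # convert integer time to floating point
done
ncrcat -v DUEXTTAU float_MERRA2_100.tavgM_2d_aer_Nx.??????.nc4 final.nc4
rm -f float_MERRA2_100.tavgM_2d_aer_Nx.??????.nc4  # clean up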
You can do this using cdo as well, but you need two steps:
cdo mergetime MERRA2_100.tavgM_2d_aer_Nx.??????.nc4 merged_file.nc
cdo selvar,DUEXTTAU merged_file.nc DUEXTTAU.nc
This should work if the begin dates are all set correctly. The problem is that merged_file.nc could be massive, so it may be better to loop through the files, extract the variable first, and then combine:
for file in MERRA2_100.tavgM_2d_aer_Nx.??????.nc4; do
cdo selvar,DUEXTTAU $file ${file%.nc4}_duexttau.nc4
done
cdo mergetime MERRA2_100.tavgM_2d_aer_Nx.??????_duexttau.nc4 DUEXTTAU.nc
rm -f MERRA2_100.tavgM_2d_aer_Nx.??????_duexttau.nc4 # clean up

How do facetiles in Apple Photos correspond to RKFace.modelId?

I have been digging through the Apple Photos macOS app for a couple weekends now and I am stuck. I am hoping the smart people at StackOverflow can figure this out.
What I don't know:
How are new hex directories determined, and how do they correspond to RKFace.modelId? Perhaps RKFace.modelId mod 16, or mod 256?
After a while, the facetile hex value no longer corresponds to the RKFace.modelId. For example, RKFace.modelId 61047 should be facetile_ee77.jpeg. The correct facetile, however, is face/20/01/facetile_1209b.jpeg. Hex value 1209b is decimal 73883, for which I have no RKFace.modelId.
Things I know:
Apple Photos leverages deep learning networks to detect and crop faces out of your imported photos. It saves a cropped jpeg of each detected face into your photo library, e.g. resources/media/face/00/00/facetile_1.jpeg.
A record corresponding to this facetile is inserted into RKFace, where the RKFace.modelId integer is the decimal equivalent of the hex number at the tail of the filename. You can use a standard dec-to-hex converter to derive the correct values.
Each subdirectory, for example "/00/00", will only hold a maximum of 256 facetiles before a new directory is started. The directory names are also in hex format, for example 3e, 3f.
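For what it's worth, the simple decimal-to-hex mapping described above is easy to check from the shell; a quick sketch using the modelId 61047 mentioned earlier:
printf 'facetile_%x.jpeg\n' 61047   # prints facetile_ee77.jpeg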
While trying to render photo mosaics, I stumbled upon that issue, too...
Then I was lucky to find both a master image and the corresponding facetile,
allowing me to grep around, searching for the decimal and hex equivalent of the numbers embedded in the filenames.
This is what I came up with (assuming you are searching for someone named NAME):
SELECT
printf('%04x', mr.modelId) AS tileId
FROM
RKModelResource mr, RKFace f, RKPerson p
WHERE
f.modelId = mr.attachedModelId
AND f.personId = p.modelId
AND p.displayName = 'NAME'
This select prints out all RKModelResource.modelIds in hex, used to name the corresponding facetiles you were searching for. All that is needed now is the complete path to the facetile.
So, a complete bash script to copy all those facetiles of a person (to a local folder out in the current directory) could be:
#!/bin/bash
set -eEu

# Person's display name, passed as the first argument
NAME="$1"

PHOTOS_PATH=$HOME/Pictures/Photos\ Library.photoslibrary
DB_PATH="$PHOTOS_PATH/database/photos.db"

echo "$NAME"
mkdir -p "out/$NAME"

# Collect the hex tile ids for this person from the Photos database
TILES=( $(sqlite3 "$DB_PATH" "SELECT printf('%04x', mr.modelId) AS tileId FROM RKModelResource mr, RKFace f, RKPerson p WHERE f.modelId = mr.attachedModelId AND f.personId = p.modelId AND p.displayName='$NAME'") )

for TILE in "${TILES[@]}"; do
FOLDER=${TILE:0:2}   # first two hex digits select the subdirectory
SOURCE="$PHOTOS_PATH/resources/media/face/$FOLDER/00/facetile_$TILE.jpeg"
[[ -e "$SOURCE" ]] || continue
TARGET="out/$NAME/$TILE.jpeg"
[[ -e "$TARGET" ]] && continue
cp "$SOURCE" "$TARGET" || :
done

How to write null values to a netCDF file?

Does _FillValue or missing_value still occupy storage space?
If there is a 2-dimensional array with some null values, how can I write it to a netCDF file so as to save storage space?
In netCDF3 every value requires the same amount of disk space. In netCDF4 it is possible to reduce the required disk space using gzip compression. The actual compression ratio depends on the data. If there are lots of identical values (for example missing data), you can achieve good results. Here is an example in python:
import os

import netCDF4
import numpy as np

# Define sample data with all elements masked out
N = 1000
data = np.ma.masked_all((N, N))

# Write data to a netCDF file using different data formats
for fmt in ('NETCDF3_CLASSIC', 'NETCDF4'):
    fname = 'test.nc'
    ds = netCDF4.Dataset(fname, format=fmt, mode='w')
    xdim = ds.createDimension(dimname='x', size=N)
    ydim = ds.createDimension(dimname='y', size=N)
    var = ds.createVariable(
        varname='data',
        dimensions=(ydim.name, xdim.name),
        fill_value=-999,
        datatype='f4',
        complevel=9,  # set gzip compression level
        zlib=True     # enable compression
    )
    var[:] = data
    ds.close()

    # Determine file size
    print(fmt, os.stat(fname).st_size)
See the netCDF4-python documentation, section 9) "Efficient compression of netCDF variables" for details.
Just to add to the excellent answer from Funkensieper, you can copy and compress files from the command line using cdo:
cdo -f nc4c -z zip_9 copy in.nc out.nc
One could compress files simply using gzip or zip etc, but the disadvantage is that you need to decompress before reading. Using the netcdf4 compression capabilities avoids this.
You can select your level X of compression by using -z zip_X. If your files are very large, you may want to sacrifice a little bit of file size in return for faster access times (e.g. using zip_5 or 6 instead of 9). In many cases with heterogeneous data, the compression gain relative to the uncompressed file is small anyway.
or similarly with NCO
ncks -7 -L 9 in.nc out.nc
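Either way, you can verify that compression was actually applied by asking ncdump for the special storage attributes:
ncdump -h -s out.nc | grep _DeflateLevel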
