Spatially merging NetCDF tiles into global data

I currently use global precipitation (ppt) and potential evapotranspiration (pet) data to calculate SPEI. As I have limited hardware resources, I divided the global ppt and pet data into 32 tiles, each covering 45x45 degrees and containing 756 monthly time steps from 1958 to 2020 (tile01.nc, tile02.nc, ... tile32.nc).
To cut out one tile I use, for example, cdo sellonlatbox,-180,-135,45,90 in.nc out.nc or ncks -d lat,45.,90. -d lon,-180.,-135. in.nc -O out.nc
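The 32 bounding boxes can be generated rather than typed by hand. The following is a minimal sketch, assuming the tiles are numbered from north to south and west to east (so that tile01 matches the example box above); the function name and the output file naming are illustrative only.

```python
def tile_commands(size=45, infile="in.nc"):
    """Return one `cdo sellonlatbox` command string per tile.

    Assumption: tiles are numbered north to south, west to east.
    """
    commands = []
    index = 1
    for lat_max in range(90, -90, -size):        # 90, 45, 0, -45
        for lon_min in range(-180, 180, size):   # -180, -135, ..., 135
            box = f"{lon_min},{lon_min + size},{lat_max - size},{lat_max}"
            commands.append(
                f"cdo sellonlatbox,{box} {infile} tile{index:02d}.nc"
            )
            index += 1
    return commands


if __name__ == "__main__":
    # Print the commands so they can be reviewed before running them.
    for command in tile_commands():
        print(command)
```

Neighbouring boxes share an edge, so it is worth checking that no grid cell centre lies exactly on a multiple of 45 degrees; otherwise a row or column could end up in two tiles.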
As required by the SPEI script, I reordered and fixed the dimensions from time,lat,lon to lat,lon,time using ncpdq and ncks.
The SPEI output also comes in lat,lon,time order, so I reordered it back to time,lat,lon using ncpdq.
Each SPEI output tile covers 45x45 degrees and contains 756 monthly time steps from 1958 to 2020.
Finally, I need to merge all 32 output files into one to get the global SPEI output. I have tried cdo mergegrid, but the result is not what I expected. Is there a cdo or nco command that works like gdal_merge does for GeoTIFF files?
Below is an example of the SPEI output
UPDATE
I managed to merge all the data using cdo collgrid as suggested by Robert below. And here's the result:

I believe you want to use CDO's distgrid and collgrid methods for this.
First, run this:
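cdo distgrid,8,4 in.nc obase
The first number is the count of pieces in the x (longitude) direction and the second the count in the y (latitude) direction, so on a global grid this gives 8 x 4 = 32 tiles of 45x45 degrees, which is the split you describe.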
Then do the post-processing necessary on the files.
Then use collgrid to merge the files:
cdo collgrid obase* out.nc
Depending on how you have split the files up, you may be able to simply use collgrid in place of mergegrid in your present workflow.
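To make clear what collecting tiles means, here is a small conceptual sketch in plain Python. It is not how CDO implements collgrid; it only illustrates the idea of placing each tile's cells into one larger grid according to their coordinates. The dictionary layout of a tile is an assumption made for this example.

```python
def collect_tiles(tiles):
    """Merge rectangular tiles into one grid using their coordinates.

    Each tile is a dict (hypothetical layout) with:
      'lat'  : list of latitudes, one per row
      'lon'  : list of longitudes, one per column
      'data' : list of rows, data[i][j] belongs to (lat[i], lon[j])
    Returns (lats, lons, grid) with latitudes sorted north to south.
    """
    lats = sorted({lat for t in tiles for lat in t["lat"]}, reverse=True)
    lons = sorted({lon for t in tiles for lon in t["lon"]})
    row = {lat: i for i, lat in enumerate(lats)}
    col = {lon: j for j, lon in enumerate(lons)}

    # Cells not covered by any tile stay None.
    grid = [[None] * len(lons) for _ in lats]
    for t in tiles:
        for i, lat in enumerate(t["lat"]):
            for j, lon in enumerate(t["lon"]):
                grid[row[lat]][col[lon]] = t["data"][i][j]
    return lats, lons, grid


# Two side-by-side tiles; the order in which they are passed does not matter.
west = {"lat": [1.0, 0.0], "lon": [0.0, 1.0], "data": [[1, 2], [5, 6]]}
east = {"lat": [1.0, 0.0], "lon": [2.0, 3.0], "data": [[3, 4], [7, 8]]}
lats, lons, grid = collect_tiles([east, west])
```

Because placement depends only on coordinates, the tiles do not need to come from distgrid, as long as they fit together into one regular grid without gaps or overlaps.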

Related

CDO : Masking 3D and 4D variables within the same netcdf file

I have a netcdf file called data.nc and a masking file called mask.nc. The data.nc file has a 3D variable A with dimensions over (time,long,lat), and 4D variable B with dimensions over (time,depth,long,lat).
The mask.nc file has two masking variables mask_3d (time,long,lat) and mask_4d (time,depth,long,lat) with 0 and 1 values.
So far, I am masking each variable separately using:
cdo -div -selname,A data.nc -selname,mask_3d mask.nc out.nc
and
cdo -div -selname,B data.nc -selname,mask_4d mask.nc out2.nc
My question is:
How can I mask both variables A and B in data.nc using only one command ?
I'm not sure this really qualifies as an answer (perhaps it should be a comment), but I think this is not possible with cdo. If I understand correctly, you essentially want both results in the same file (cdo only allows one output file, so by definition your question implies this).
The reason is that you would need to cat the two output files together to get what you want. Because cdo cat accepts a variable number of input files, and even wildcards (e.g. cdo cat *.nc out.nc), it does not know how many input files to expect, so cdo does not let you chain it with other operators, as it cannot interpret such a command safely.
Thus you would need to keep this as three lines (at least for a cdo-based solution, I think, but I stand to be corrected):
cdo -div -selname,A data.nc -selname,mask_3d mask.nc out1.nc
cdo -div -selname,B data.nc -selname,mask_4d mask.nc out2.nc
cdo cat out?.nc out.nc
Sorry about that. That said, once commands start to get long, I think keeping them separate as here aids the legibility of the code.
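If there are more than two variables to mask, the three-line pattern above can be generated from a list of variable/mask pairs. This is only a sketch that builds the command lines as text; the helper name and the intermediate file names out1.nc, out2.nc, ... are assumptions following the answer's naming.

```python
import shlex


def mask_commands(pairs, data="data.nc", mask="mask.nc", out="out.nc"):
    """Build one `cdo -div` command per (variable, mask) pair plus a final cat.

    Returns a list of argument lists, suitable for subprocess.run(args).
    """
    commands = []
    parts = []
    for n, (var, mask_var) in enumerate(pairs, start=1):
        part = f"out{n}.nc"  # intermediate file for this variable
        commands.append(
            ["cdo", "-div", f"-selname,{var}", data,
             f"-selname,{mask_var}", mask, part]
        )
        parts.append(part)
    # Final step: combine the intermediate files, listed explicitly.
    commands.append(["cdo", "cat", *parts, out])
    return commands


if __name__ == "__main__":
    for args in mask_commands([("A", "mask_3d"), ("B", "mask_4d")]):
        print(shlex.join(args))
```

Listing the intermediate files explicitly instead of using out?.nc avoids picking up unrelated files that happen to match the wildcard.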

Failed to concatenate global layer netCDF data using NCO

I am using monthly global potential evapotranspiration data from TerraClimate for 1958-2020 (available as one nc file per year) and plan to concatenate them all into a single nc file.
The data has one variable, pet, with three dimensions: pet(time,lat,lon).
I managed to combine all of the data using cdo mergetime TerraClimate_*.nc, which generated an output file of around 100GB.
For analysis on a Windows machine, I need a single netCDF file with the dimension order lat,lon,time. What I have done is as follows:
1. Reorder the dimensions from time,lat,lon to lat,lon,time using the ncpdq command
for fl in *.nc; do ncpdq -a lat,lon,time "$fl" "../pet2/$fl"; done
2. Loop over all files in the folder to make time the record dimension used for concatenating files, using the ncks command
for fl in *.nc; do ncks -O --mk_rec_dmn time "$fl" "$fl"; done
3. Concatenate all nc files in the folder into one nc file using the ncrcat command
ncrcat -h TerraClimate_*.nc -O TerraClimate_pet_1958_2020.nc
It ran, but the result is not what I expected: it generated a file of only 458KB, and when I check the result in Panoply it shows wrong information, with every value equal to -3276.7. See the picture below.
I have checked the files from steps 1 and 2, and everything is correct.
I also tried to concatenate only 2 files, the 1958 and 1959 data (each file is 103MB), but the result is still not what I expected.
ncrcat -h TerraClimate_pet_1958.nc TerraClimate_pet_1959.nc -O ../TerraClimate_pet_1958_1959.nc
Did I miss something in the code, or write it wrong? Any suggestions on how to solve the problem?
UPDATE 1 (22 Oct 2021):
Here's the metadata of original data downloaded from above link.
UPDATE 2 (23 Oct 2021):
Following Charlie's suggestion, I unpacked all the data from step 2 above using the command below.
for fl in *.nc4; do ncpdq --unpack $fl ../unpack/$fl; done
Here's example metadata from the unpacking step.
And the data visualised in Panoply.
Then I tested concatenation again using 2 files from the unpacking step (1958 and 1959)
ncrcat -h TerraClimate_pet_1958.nc TerraClimate_pet_1959.nc -O ../TerraClimate_pet_1958_1959.nc
Unfortunately the result remains the same: I got an output file of 1MB. Below is the metadata.
And the ncrcat result visualised in Panoply
Your commands appear to be correct; however, I suspect that the data in the input files is packed. As explained in the ncrcat documentation, the input data should be unpacked (e.g., with ncpdq --unpack) prior to concatenating all the input files (unless they all share the same values of scale_factor and add_offset). If that does not solve the problem, then (1) there is likely an issue with _FillValue and (2) please post the pet metadata from a sample input file.
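A small worked example may help show why packing matters here. Packed data stores integers that are turned back into physical values with unpacked = stored * scale_factor + add_offset. If stored integers from one file are read with the attributes of another file, the values come out wrong. The attribute values below are made up for illustration and are not taken from the TerraClimate metadata.

```python
import math


def pack(value, scale_factor, add_offset):
    """Convert a physical value to the stored integer."""
    return round((value - add_offset) / scale_factor)


def unpack(stored, scale_factor, add_offset):
    """Convert a stored integer back to a physical value."""
    return stored * scale_factor + add_offset


# Hypothetical packing attributes of two yearly files.
file_a = {"scale_factor": 0.1, "add_offset": 0.0}
file_b = {"scale_factor": 0.5, "add_offset": 10.0}

true_value = 120.0
stored_b = pack(true_value, **file_b)   # integer as written in file B

right = unpack(stored_b, **file_b)      # read with file B's own attributes
wrong = unpack(stored_b, **file_a)      # read with file A's attributes

print(stored_b, right, wrong)
assert math.isclose(right, true_value)
assert not math.isclose(wrong, true_value)
```

As a further illustration only: with a scale_factor of 0.1 and no offset, a stored short of -32767 unpacks to about -3276.7, so a constant like that may be a fill value seen through the scaling. The source does not include the metadata needed to confirm this.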

How to merge 2 separate netcdf files into 1 and add a time dimension

I have two NetCDF files of Greenland ice sheet velocities, one from 2015 and one from 2016. These files contain gridded data where the velocity is given on x,y coordinates. However, no time dimension is included. How can I merge these two files into one, so that the final file has a time dimension? So instead of two separate x,y,z grids, I would like to have one x,y,z,t data structure, where the time dimension has length 2.
Thanks!
If the files contain the same variables and are the same size, try ncecat
ncecat -u time file1.nc file2.nc out.nc
You can add a time dimension to a file with ncap2:
ncap2 -s 'defdim("time",1);time[time]=74875.0;time@long_name="Time"; etc.etc.etc.' -O ~/nco/data/in.nc ~/foo.nc
I suggest reading this thread for more details: https://sourceforge.net/p/nco/discussion/9830/thread/cee4e1ad/
After you have done that you can merge them together either using the ncrcat command (see https://linux.die.net/man/1/ncrcat) or also with cdo
cdo mergetime file1.nc file2.nc combined_file.nc
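The value assigned to time in the ncap2 script has to be computed for each file. Here is a minimal sketch of that calculation, assuming the time axis is expressed as days since 1900-01-01 and that each file is stamped with 1 January of its year; both are assumptions chosen for illustration, so adapt them to your data.

```python
from datetime import date

EPOCH = date(1900, 1, 1)  # assumed reference date for the time axis


def days_since_epoch(day, epoch=EPOCH):
    """Number of days between the epoch and the given date, as a float."""
    return float((day - epoch).days)


def ncap2_time_script(day, epoch=EPOCH):
    """Build the -s argument for ncap2 that defines a one-element time axis."""
    value = days_since_epoch(day, epoch)
    return (
        f'defdim("time",1);time[time]={value};'
        f'time@units="days since {epoch.isoformat()}";'
        'time@long_name="Time";'
    )


if __name__ == "__main__":
    for year in (2015, 2016):
        print(year, ncap2_time_script(date(year, 1, 1)))
```

Giving the two files different time values matters because ncrcat and cdo mergetime order the records by time.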

How should I use CDO selyear? I get an output file four times larger

CDO has worked fine for me until I ran into this. I have a netCDF file of daily data from 2101 to 2228, and I want a file with only the years 2101 to 2227, so I run:
cdo selyear,2101/2227 in.nc out.nc
But the output file is more than four times the size of the input! It seems to have the right number of time steps, and the start and end dates are correct. Latitude and longitude also seem to be the same as in the input, so I wonder why the file is so much larger.
Perhaps try to retain the compression with the cdo operator and output netcdf 4
cdo -f nc4c -z zip_9 selyear,2101/2227 in.nc out.nc
This is the maximum compression. I usually use zip_5 or so, as the files are not much larger than with zip_9 and the command is much faster.
An alternative is to (re?)pack the data to shorts with add_offset and scale_factor like this:
cdo pack -selyear,2101/2227 in.nc out.nc
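Both suggestions reduce the bytes stored per value. The sketch below uses Python's zlib module, which implements the same deflate algorithm that netCDF-4 compression relies on, to show how much a repetitive field can shrink, and how storing 16-bit shorts halves the size compared with 32-bit floats. The data is synthetic, so the ratio says nothing about your file.

```python
import struct
import zlib

# Synthetic field: a short pattern repeated many times, as 32-bit floats.
values = [float(i % 50) for i in range(100_000)]
raw = struct.pack(f"<{len(values)}f", *values)

# Deflate at two levels, comparable in spirit to zip_5 and zip_9.
deflated_5 = zlib.compress(raw, 5)
deflated_9 = zlib.compress(raw, 9)

# The same values stored as 16-bit integers (what packing to shorts does,
# here without any scale_factor because the values are already whole numbers).
packed_shorts = struct.pack(f"<{len(values)}h", *(int(v) for v in values))

print("raw floats   :", len(raw), "bytes")
print("deflate lvl 5:", len(deflated_5), "bytes")
print("deflate lvl 9:", len(deflated_9), "bytes")
print("shorts       :", len(packed_shorts), "bytes")
```

If the input file was written compressed and the output is not, a size increase of several times is plausible even though the data is the same.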

Using nco to convert NetCDF file with monthly data into daily files

I have a year of data as monthly NetCDF files, and I'd like to turn them into daily files. I know I can use ncks to extract 24 hours, like this
ncks -d time,0,23 -o output.nc input.nc
but I'd like to automatically chunk the whole month into days, without having to worry about the number of days in each month and whatnot. Is there an easy way to do this? (I could of course write a python script or similar that calls ncks, but it would be more elegant to avoid that.)
The NCO -d hyperslab switch understands dates in UDUnits format. If your input file contains a well-formatted units attribute for time, there should be no problem in writing twelve commands, each with a date hyperslab like
ncks -d time,1918-01-01,1918-01-31 in.nc jan.nc
Other than that, there is no more elegant method currently supported by NCO.
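The twelve commands can be generated so that month lengths and leap years are handled for you. This is a sketch under two assumptions: the data uses the standard Gregorian calendar (model output with 360-day or no-leap calendars would need different end dates), and the output files are named after the month as in the example above.

```python
import calendar

MONTH_NAMES = ["jan", "feb", "mar", "apr", "may", "jun",
               "jul", "aug", "sep", "oct", "nov", "dec"]


def month_commands(year, infile="in.nc"):
    """Return one `ncks` date-hyperslab command string per month of `year`."""
    commands = []
    for month in range(1, 13):
        # monthrange returns (weekday of first day, number of days in month).
        last_day = calendar.monthrange(year, month)[1]
        start = f"{year}-{month:02d}-01"
        end = f"{year}-{month:02d}-{last_day:02d}"
        name = MONTH_NAMES[month - 1]
        commands.append(f"ncks -d time,{start},{end} {infile} {name}.nc")
    return commands


if __name__ == "__main__":
    for command in month_commands(1918):
        print(command)
```

The same loop could iterate over days instead of months to produce one command per day.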
From the question it is not clear to me whether your monthly files contain daily data, i.e. you have 12 files and each file contains information at daily (or finer) time resolution. If this is the case, then I think what you want is very easy with cdo:
cdo splitday input.nc output.nc
You will end up with a number of files, each with a day of data, and the number of days in each month is handled automatically for you. Note that cdo treats the last argument as a base name for the output files and appends the day number and a file suffix to it.
