How to extract a single variable using NCO. Associated variable problem - netcdf

I am trying to extract a single variable, kd_490, from a NetCDF file (over THREDDS) using NCO.
My code is below:
ncks -v kd_490 -d lat,40.0,70.0 -d lon,-20.0,15.0 https://rsg.pml.ac.uk/thredds/dodsC/cci/v4.2-release/geographic/daily/kd/1998/ESACCI-OC-L3S-K_490-MERGED-1D_DAILY_4km_GEO_PML_KD490_Lee-19980102-fv4.2.nc out.nc
However, along with kd_490 it also extracts kd_490_bias and kd_490_rmsd. I know that ncks extracts "associated" variables (see here). However, I am not clear on why NCO identifies these as "associated" variables.
I cannot figure out a way to select only the variable kd_490. Adding "-C" to the command results in the grid being wrong. Does anyone know how?
Note: CDO can solve this problem. However, CDO is less efficient at extracting the spatial subset in the code, so NCO is more appropriate.

NCO identifies kd_490_bias and kd_490_rmsd as associated variables because of the ancillary_variables attribute in kd_490:
float kd_490(time,lat,lon) ;
kd_490:_FillValue = 9.96921e+36f ;
kd_490:long_name = "Downwelling attenuation coefficient at 490nm, derived using Lee 2005 equation and bbw from Zhang 2009 (following the SeaDAS Kd_lee algorithm)" ;
kd_490:units = "m-1" ;
kd_490:ancillary_variables = "kd_490_rmsd kd_490_bias" ;
kd_490:grid_mapping = "crs" ;
kd_490:parameter_vocab_uri = "http://vocab.ndg.nerc.ac.uk/term/P071/19/CFSN0064" ;
kd_490:standard_name = "volume_attenuation_coefficient_of_downwelling_radiative_flux_in_sea_water" ;
kd_490:units_nonstandard = "m^-1" ;
kd_490:_ChunkSizes = 1, 270, 270 ;
as documented here. To extract kd_490 without the ancillary variables, but with the grid variables (which are other "associated" variables), this works for me:
ncks -O -C -v kd_490,crs,lat,lon -d lat,40.0,70.0 -d lon,-20.0,15.0 https://rsg.pml.ac.uk/thredds/dodsC/cci/v4.2-release/geographic/daily/kd/1998/ESACCI-OC-L3S-K_490-MERGED-1D_DAILY_4km_GEO_PML_KD490_Lee-19980102-fv4.2.nc ~/foo2.nc
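The variable list in the command above can also be derived from the attributes rather than typed by hand. The following is a minimal sketch in plain Python, assuming the attributes of kd_490 have already been read into a dictionary; the helper name vars_to_extract and the file names in.nc and out.nc are hypothetical and only serve the illustration.

```python
def vars_to_extract(target, attrs, coord_names):
    """Return the target plus its grid variables, leaving out ancillary variables."""
    # Names listed in ancillary_variables are what ncks pulls in by default.
    ancillary = set(attrs.get("ancillary_variables", "").split())
    names = [target]
    grid_mapping = attrs.get("grid_mapping")
    if grid_mapping:
        names.append(grid_mapping)
    names.extend(coord_names)
    return [name for name in names if name not in ancillary]


# Attribute values copied from the kd_490 header shown above.
attrs = {
    "ancillary_variables": "kd_490_rmsd kd_490_bias",
    "grid_mapping": "crs",
}
keep = vars_to_extract("kd_490", attrs, ["lat", "lon"])

# -C switches off automatic extraction of associated variables, so every
# variable that should end up in the output has to be named after -v.
cmd = ["ncks", "-O", "-C", "-v", ",".join(keep),
       "-d", "lat,40.0,70.0", "-d", "lon,-20.0,15.0",
       "in.nc", "out.nc"]  # in.nc and out.nc are placeholder file names
print(" ".join(cmd))
```

The sketch only builds and prints the command; it does not call ncks itself.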

Related

How to delete a group variable while keeping the group structure of a netcdf in python or bash?

I have a .nc file with a group structure, one of the groups containing a variable I need to delete.
Using xarray, if I want to delete the variable I can only extract its group as a new .nc file.
import xarray as xr
ds = xr.load_dataset(path_test,group='/data_01/ku')
ds = ds.drop_vars(["ssh"])
ds.to_netcdf(path_test, mode="a", group='/data_01/ku')
Using the ncks command (from NCO) in bash like this:
ncks -x -g data_01/ku -v ssh in.nc out.nc
I get a memory error.
Does anyone know how to delete one specific variable while keeping the complete group structure of the file?
Thanks guys
The ncks command you tried looks correct, and such commands work for me.
Try adding the -C switch just in case:
ncks -O -x -C -g g1/g1g1 -v ppc_dbl ~/nco/data/in_grp.nc ~/foo.nc
It seems like you got unlucky, or possibly you are using an old NCO version?

Failed to concatenate global layer netCDF data using NCO

I am using monthly global potential evapotranspiration data from TerraClimate for 1958-2020 (available as one nc file per year) and plan to concatenate them all into a single nc file.
The data has one variable, pet, with three dimensions: pet(time,lat,lon).
I managed to combine all of the data using cdo mergetime TerraClimate_*.nc, which generated an output file of around 100GB.
For analysis on a Windows machine, I need a single netCDF file with dimension order lat,lon,time. What I have done is as follows:
1. Reorder the dimensions from time,lat,lon into lat,lon,time using the ncpdq command
for fl in *.nc; do ncpdq -a lat,lon,time $fl ../pet2/$fl; done
2. Loop over all files in the folder to make time the record dimension/variable used for concatenating files, using the ncks command
for fl in *.nc; do ncks -O --mk_rec_dmn time $fl $fl; done
3. Concatenate all nc files in the folder into one nc file using the ncrcat command
ncrcat -h TerraClimate_*.nc -O TerraClimate_pet_1958_2020.nc
It ran, but the result is not what I expected: it generates a file of only 458KB, and when I check the result in Panoply every value is -3276.7.
I have checked the files from steps 1 and 2, and everything is correct.
I also tried to concatenate only 2 files, using the 1958 and 1959 data (each file is 103MB), but the result is still not what I expected.
ncrcat -h TerraClimate_pet_1958.nc TerraClimate_pet_1959.nc -O ../TerraClimate_pet_1958_1959.nc
Did I miss something in the code, or write the wrong code? Any suggestions on how to solve the problem?
UPDATE 1 (22 Oct 2021):
Here's the metadata of the original data downloaded from the above link.
UPDATE 2 (23 Oct 2021):
Following the suggestion from Charlie, I unpacked all the data from step 2 above using the command below.
for fl in *.nc4; do ncpdq --unpack $fl ../unpack/$fl; done
Here's example metadata from the unpacking step.
And the data visualised using Panoply.
Then I tested the concatenation again using 2 of the unpacked files (1958 and 1959):
ncrcat -h TerraClimate_pet_1958.nc TerraClimate_pet_1959.nc -O ../TerraClimate_pet_1958_1959.nc
Unfortunately the result remains the same: I got an output file of 1MB. Below is the metadata.
And the ncrcat result visualised using Panoply.
Your commands appear to be correct, however I suspect that the data in the input files is packed. As explained in the ncrcat documentation here, the input data should be unpacked (e.g., with ncpdq --unpack) prior to concatenating all the input files (unless they all share the same values of scale_factor and add_offset). If that does not solve the problem, then (1) there is likely an issue with _FillValue and (2) please post the pet metadata from a sample input file.
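To illustrate why packing matters here, the following is a minimal sketch of the usual unpacking arithmetic (unpacked = packed * scale_factor + add_offset) in plain Python. All numbers are assumptions made for the illustration: the question does not show the actual scale_factor, add_offset or _FillValue of the TerraClimate files. A scale_factor of 0.1 and a packed fill value of -32767 were picked only because together they reproduce the -3276.7 seen in Panoply.

```python
def unpack(packed, scale_factor, add_offset):
    """Convert a stored (packed) integer into its physical value."""
    return packed * scale_factor + add_offset


def pack(value, scale_factor, add_offset):
    """Convert a physical value into the integer that would be stored."""
    return round((value - add_offset) / scale_factor)


# Hypothetical packing parameters for two input files.
file_a = {"scale_factor": 0.1, "add_offset": 0.0}
file_b = {"scale_factor": 0.5, "add_offset": 100.0}

true_value = 150.0
stored_b = pack(true_value, **file_b)  # integer as stored in file B

# If the concatenated file keeps only file A's attributes, values that
# came from file B are unpacked with the wrong parameters.
wrong = unpack(stored_b, **file_a)
right = unpack(stored_b, **file_b)

# A packed fill value (assumed here to be -32767) unpacked with file A's
# parameters gives the value reported in the question.
fill = unpack(-32767, **file_a)

print(round(wrong, 1), round(right, 1), round(fill, 1))
```

Under these assumptions, a field that shows -3276.7 everywhere would be consistent with every cell holding the packed fill value, which is why the answer also points at _FillValue.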

Extract specific particle id variable from a netCDF file

I have a netCDF file output from a particle dispersion model (GNOME).
As it is a particle dispersion model, I have every particle identified by a particle id variable:
int id(data) ;
id:description = "particle ID" ;
id:units = "1" ;
I need to extract only some specific particle ids and their locations. I have tried with the cdo and nco operators and I get these errors:
ncks -v longitude,latitude -d id,62001. infile.nc outputfile.nc
ncks: ERROR dimension id is not in input file
cdo -select,name=latitude,longitude,id=62968 infile.nc outputfile.nc
cdo select (Abort): Unsupported selection keyword: 'id'!
I hope someone could help me. Thanks
The dimension is actually named "data". I suggest you rename the dimension to "id". Then your command should work:
ncrename -d data,id in.nc
ncks -v longitude,latitude -d id,62001. in.nc out.nc
or you could leave the names alone, and if the id is really the data index, then this should work:
ncks -v longitude,latitude -d data,62001 in.nc out.nc
NB: no decimal point this time since data is not a coordinate, as explained here.
EDIT: 20210921 in response to comment below, unless I am missing something, the dataset would need to have a variable traj dimensioned traj(time,data) in order for the suggested commands to have the result you desire. The header of your file shows no such variable.
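The difference between selecting by index and selecting by value can be sketched in plain Python. An integer hyperslab argument such as -d data,62001 picks the element at that position along data, whereas the question asks for the entries whose id value matches. The arrays below are made up for the illustration and the helper select_by_id is hypothetical; in practice the values would be read from the file with a NetCDF library.

```python
def select_by_id(target, ids, *columns):
    """Return the positions where ids == target and the matching column values."""
    positions = [i for i, value in enumerate(ids) if value == target]
    selected = [[column[i] for i in positions] for column in columns]
    return positions, selected


# Made-up data along the "data" dimension: one entry per particle record.
ids = [62000, 62001, 62968, 62001]
longitude = [-3.10, -3.20, -3.35, -3.25]
latitude = [43.50, 43.55, 43.70, 43.60]

positions, (lons, lats) = select_by_id(62001, ids, longitude, latitude)
print(positions, lons, lats)
```

Here the particle with id 62001 sits at positions 1 and 3, so an index-based selection at position 62001 would address something else entirely.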

multiply variables in two NetCDF files in single command

I have two netcdf files:
file_1.nc with variables qty_1 and qty_2 and
file_2.nc with variables qty_3, qty_4 and qty_5.
I want a file with 3 variables qty_3=qty_3*qty_2; qty_4=qty_4+qty_2 and qty_5.
Now I am first copying the variables to file_2 using
ncks -A -v qty_1,qty_2 file_1.nc file_2.nc
then I am doing math operation as,
ncap2 -A -s 'qty_3=qty_3*qty_2' -s 'qty_4=qty_4+qty_2' file_2.nc
This works; however, it takes some time.
Is there a way I can do this calculation in a single command?
If you aren't totally dependent on NCO, you could do this with CDO:
cdo -selname,qty_3,qty_4,qty_5 -aexpr,'qty_3=qty_3*qty_2;qty_4=qty_4+qty_2' -merge file_1.nc file_2.nc out.nc

Select data along non-conventional dimension with CDO or NCO

I have a large number of NetCDF files from which I would like to extract a small number of variables for one location, and merge them into a new NetCDF file. The dimensions of the files are:
dimensions:
time = 18 ;
level = 65 ;
levelh = 66 ;
domain = 36 ;
I can extract/merge the files for all domains with something like:
cdo select,name=u,v file1.nc file2.nc out.nc
However, all other operators seem to be related to selections in space (e.g. sellonlatbox) or time (e.g. seltimestep), and I can't find a way to select only 1 domain from the NetCDF files. Is this possible with CDO or NCO?
Not sure I fully understand the question/intent. NCO treats all dimensions equally. If you want domain #17 then try
ncrcat -v u,v -d domain,17 file1.nc file2.nc out.nc
If file1.nc and file2.nc are not sequential in a record coordinate then try
ncecat -v u,v -d domain,17 file1.nc file2.nc out.nc
ADDED 20180929:
or if you don't like that, and the files do not have a record dimension yet are time-sequential then before using ncrcat turn the temporal dimension into a record coordinate for each file with
ncks -O --mk_rec_dmn time file1.nc file1.nc
ncks -O --mk_rec_dmn time file2.nc file2.nc
...
etc. and proceed as above. That may be the best way forward with NCO.
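For a large number of files, the two steps above can be scripted. The following is a minimal sketch in Python that only assembles the command lines already shown in this answer; the helper name build_commands is hypothetical, and the file names are the placeholders used above.

```python
import shlex


def build_commands(files, variables, dim, index, out):
    """Build the ncks and ncrcat command lines for a list of input files."""
    # Step 1: make time the record dimension in each file (in place).
    commands = [["ncks", "-O", "--mk_rec_dmn", "time", name, name]
                for name in files]
    # Step 2: concatenate along the record dimension, keeping one index
    # of the chosen dimension and only the requested variables.
    commands.append(["ncrcat", "-v", ",".join(variables),
                     "-d", f"{dim},{index}", *files, out])
    return commands


cmds = build_commands(["file1.nc", "file2.nc"], ["u", "v"],
                      "domain", 17, "out.nc")
for cmd in cmds:
    print(shlex.join(cmd))
```

The sketch prints the commands instead of running them; passing each list to subprocess.run(cmd, check=True) would execute them, assuming NCO is installed.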
