Computing the median of a netCDF (.nc) file with NCO

As the title indicates, I want to compute the median of a .nc file named sfcWind_1999.nc. At each grid point (lat/lon combination) I want the median along the time dimension. Is there an NCO operator that does this?

The median is the 50th percentile, so you can use the CDO percentile operator timpctl to do this:
cdo timpctl,50 in.nc -timmin in.nc -timmax in.nc median.nc
The operator needs the minimum and maximum of the data at each grid point to set up the histogram it uses, hence the piped timmin and timmax commands.
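To make "median along time at each grid point" concrete, here is a minimal pure-Python sketch. It assumes the data is held as nested lists indexed data[t][j][i] (time, lat, lon) with no missing values; the values are hypothetical. It computes an exact median, whereas CDO works from a histogram bounded by the min/max fields, so CDO's numbers may differ slightly.

```python
from statistics import median


def time_median(data):
    """Median along the time axis at every grid point.

    data is assumed to be nested lists indexed as data[t][j][i]
    (time, lat, lon) with no missing values.
    """
    nt = len(data)
    nlat = len(data[0])
    nlon = len(data[0][0])
    return [
        [median(data[t][j][i] for t in range(nt)) for i in range(nlon)]
        for j in range(nlat)
    ]


# three time steps on a 2x2 grid (hypothetical wind speeds)
data = [
    [[1.0, 5.0], [2.0, 8.0]],
    [[3.0, 4.0], [6.0, 7.0]],
    [[2.0, 9.0], [4.0, 1.0]],
]
print(time_median(data))  # [[2.0, 5.0], [4.0, 7.0]]
```

The result has one value per lat/lon cell and no time dimension, which is the shape median.nc has (apart from a single time step).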

As described in the manual, NCO can report the median of a single variable, but it cannot do so for all variables in the file in one step. You would have to write the loop over variables yourself, either in Bash or directly in ncap2:
ncap2 -O -v -s "foo=gsl_stats_median_from_sorted_data(var_nm.sort());print(foo)" in.nc out.nc
or
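ncmdn var_nm in.nc
Note that ncmdn is, as far as I know, not a standalone NCO executable but one of the shell "filter" functions listed in the NCO manual, so it has to be defined in your shell first. Also, both commands return a single median over all elements of var_nm, not a separate median along time at each grid point.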
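The "sort, then take the median of the sorted data, once per variable" approach can be sketched in plain Python. This is an illustration, not NCO code. The variables dict is a hypothetical stand-in for a file's contents, with each variable already flattened to 1-D. The median rule (middle element for odd n, mean of the two middle elements for even n) is the one GSL documents for gsl_stats_median_from_sorted_data.

```python
def median_from_sorted(values):
    """Median of an already-sorted sequence: the middle element for odd n,
    the mean of the two middle elements for even n."""
    n = len(values)
    mid = n // 2
    if n % 2:
        return values[mid]
    return (values[mid - 1] + values[mid]) / 2


def median_all_variables(variables):
    """Loop over every variable and return one median per variable,
    computed over all of its elements (all dimensions flattened)."""
    result = {}
    for name, values in variables.items():
        result[name] = median_from_sorted(sorted(values))
    return result


# hypothetical file contents, each variable already flattened to 1-D
variables = {"sfcWind": [4.0, 1.0, 3.0, 2.0], "tas": [280.0, 275.0, 290.0]}
print(median_all_variables(variables))  # {'sfcWind': 2.5, 'tas': 280.0}
```

Because every element goes into one sorted list, the output is one scalar per variable, unlike the per-grid-point result from timpctl.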

Related

Merge Two Datasets with Different Number of Variables per Timestep Using CDO

I am analyzing a time series of data that is split by time into two NetCDF files (infiles). These files have a different number of variables/fields, by design. Traditionally I have been using Climate Data Operators (CDO) to easily merge two datasets sorted by time using the following command in a terminal:
cdo mergetime <infiles> <outfile>
This command merges any number of files (<infiles>), sorted by time, and writes a new <outfile> containing a time series of all the data in each. However, it does not work by default when the files have different numbers of variables, as CDO returns the following error:
cdo select (Abort): Input streams have different number of variables per timestep!
The statement is true: each file does have a different number of variables per timestep. But it prevents me from looking at the dataset as a whole. I have also tried the following modifications to my merge command, without success:
cdo mergetime -select,name=<variable> <infiles> <outfile>
cdo -select,name=<variable> <infiles> <outfile>
I have read through the CDO User Guide and have not found an alternative solution yet. I would be very grateful if anyone could offer a workaround for joining the two files into a single time series (preferably in CDO, but not necessarily), as I am running out of ideas.
You could delete the extra variables from the file that has them using NCO, like this:
ncks -x -v var1,var2 in.nc out.nc
and then merge as usual. The CDO delete operator should do the same thing, e.g. cdo delete,name=var1,var2 in.nc out.nc
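The "drop the variables that are not in every file, then merge by time" workaround can be sketched in plain Python. Each dataset here is a hypothetical dict {time: {var_name: value}}, not a real netCDF reader. Duplicate time steps are handled by a simplification (the last dataset passed wins), which is not a claim about how cdo mergetime treats them.

```python
def merge_by_time(*datasets):
    """Concatenate time steps from several datasets, sorted by time.

    Each dataset is a dict {time: {var_name: value}}. Variables that are
    not present in every time step of every dataset are dropped first
    (the equivalent of deleting them with ncks -x or cdo delete), so all
    output time steps carry the same set of variables.
    If a time appears in more than one dataset, the last one passed wins.
    """
    # variables shared by every time step of every dataset
    common = None
    for ds in datasets:
        for fields in ds.values():
            names = set(fields)
            common = names if common is None else common & names
    common = common or set()

    merged = {}
    for ds in datasets:
        for t, fields in ds.items():
            merged[t] = {v: fields[v] for v in sorted(common)}
    # sort the combined time steps, as mergetime does
    return dict(sorted(merged.items()))


file1 = {1: {"u": 1.0, "v": 2.0}, 2: {"u": 1.5, "v": 2.5}}
file2 = {3: {"u": 2.0, "v": 3.0, "extra": 9.0}}
print(merge_by_time(file2, file1))
```

The input order does not matter for the result, because the time steps are sorted after the extra variable is dropped.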

How to remove seasonality from time series data?

How can I remove the seasonal cycle from a time series stored in a netCDF file? I would like a solution that runs on Linux; I use GrADS and Ferret for visualization.
Thanks a lot!
You can use CDO to calculate the average for each day/month of the year and subtract it from the original file:
If the file contains daily data:
cdo sub in.nc -ydaymean in.nc deseasonalized.nc
Likewise if the data is monthly:
cdo sub in.nc -ymonmean in.nc deseasonalized.nc
The ydaymean and ymonmean commands calculate the annual cycle over the dataset in.nc. For example, ymonmean returns 12 time slices: the average of all Januaries, of all Februaries, and so on. This is then subtracted from the original file using sub. I've used piping, but it may be easier to understand written as two separate lines:
cdo ymonmean in.nc annual_cycle.nc
cdo sub in.nc annual_cycle.nc deseasonalized.nc
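The two-line version above (build a monthly climatology, then subtract it) can be sketched in plain Python. The series is a hypothetical list of (year, month, value) tuples. The subtraction here matches each time step to its own calendar month, which gives the same anomalies as the CDO pipeline when the climatology comes from the same data.

```python
from collections import defaultdict


def monthly_climatology(series):
    """Mean of all Januaries, all Februaries, ... (what ymonmean computes).

    series is a list of (year, month, value) tuples with month in 1..12.
    Returns {month: mean}.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for _year, month, value in series:
        sums[month] += value
        counts[month] += 1
    return {m: sums[m] / counts[m] for m in sums}


def deseasonalize(series):
    """Subtract each month's climatological mean from the matching month."""
    clim = monthly_climatology(series)
    return [(y, m, v - clim[m]) for y, m, v in series]


# hypothetical two years of data for two calendar months only
series = [(2000, 1, 10.0), (2000, 7, 30.0), (2001, 1, 14.0), (2001, 7, 26.0)]
print(monthly_climatology(series))  # {1: 12.0, 7: 28.0}
print(deseasonalize(series))
```

The anomalies for each calendar month sum to zero, which is a quick sanity check on a deseasonalized file.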
This does exactly the same thing: deseasonalized.nc will be identical (almost: there will be a few bytes of difference due to the different "history" log in the netCDF global metadata header), but you will also get a new file, annual_cycle.nc, containing the annual cycle, which might also be useful.
When doing the subtraction, CDO detects that the second file has fewer time slices and loops (cycles) through it. Note that, because the seasonal cycle is calculated from the same file as the original data, it is fine to simply use sub: if the data start in, e.g., April, the results of ymonmean will also start in April. However, if you want to remove a seasonal cycle calculated from a different source, the start day/month may differ and you could end up subtracting, e.g., the April mean from January! To avoid this, use the ymonsub command instead:
cdo ymonsub full_timeseries.nc seasonal_file.nc deseasonalised.nc
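The pitfall with cycling can be shown in a small plain-Python sketch. The climatology and data are hypothetical: the climatology is ordered January to December (as if it came from a different source), and the data start in April. The first function subtracts by position and cycles through the shorter input. The second looks up each step's own month, which is the behaviour ymonsub is meant to give.

```python
def sub_cycling(series, clim_values):
    """Positional subtraction: the i-th time step minus clim_values[i % n].

    Mimics a shorter second input being cycled through by position,
    ignoring which month each slice actually holds.
    """
    return [v - clim_values[i % len(clim_values)]
            for i, (_m, v) in enumerate(series)]


def sub_by_month(series, clim_by_month):
    """Month-aware subtraction: each step minus the mean for its own month."""
    return [v - clim_by_month[m] for m, v in series]


# hypothetical climatology, January..December, from a different source
clim_by_month = {m: float(m) for m in range(1, 13)}
clim_values = [clim_by_month[m] for m in range(1, 13)]  # starts in January

# data starting in April: (month, value)
series = [(4, 4.0), (5, 5.0), (6, 6.0)]
print(sub_cycling(series, clim_values))     # [3.0, 3.0, 3.0]: Jan..Mar means subtracted
print(sub_by_month(series, clim_by_month))  # [0.0, 0.0, 0.0]
```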
In addition, there are now packages in both R and Python that give you access to the full functionality of CDO from within those languages, without having to resort to shell access tools.
Edit 2021: I now have a video on this topic, which you can view here: https://youtu.be/jKlA1ouoQIs

How to permanently round values in a raster layer, matrix, or array, to write to a new file using r studio?

I am trying to round either a 3-dimensional matrix, or several raster layers that will be made into a 3-dimensional matrix, to 2 decimal places to make a new netCDF file that will take up less memory.
Using the round function as such:
newmatrix <- round(oldmatrix, 2)
only seems to round the values superficially, for display. If I add newmatrix to a new file and then extract values from it, I get the unrounded values from oldmatrix, even though values extracted from newmatrix before adding it to the file are rounded to 2 decimal places as they should be. The same thing happens if I round the raster layers before creating the matrix from them.
What function can I use to permanently round my matrix's or raster's values so that they are written to the new file rounded?
NetCDF doesn't have a fixed-precision format that would save space in the way you are expecting (see the netCDF documentation for the available data types). The usual way to save space is to store the data as short integers and set the variable attributes scale_factor and add_offset.
In your case, you would multiply by 100, convert to short, and set scale_factor=0.01. Doing this in R is probably a lot of work, but the NCO utilities can handle it in a few lines.
Let's say you have a variable called rh.
ncap2 -v -s 'rh=short(100*rh);' in.nc out.nc
ncatted -O -h -a add_offset,rh,o,f,0 out.nc
ncatted -O -h -a scale_factor,rh,o,f,0.01 out.nc
Equivalently, this can be done in one line using
ncap2 -v -O -s 'rh=pack_short(rh, 0.01, 0.0);' in.nc out.nc
If you are looking to save memory when reading a variable into R, you might be disappointed as it will just be converted back into a float upon reading.
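The packing arithmetic can be sketched in plain Python. The rh values are hypothetical. The sketch rounds to the nearest integer before storing, and it checks the 16-bit range: with scale_factor=0.01 and add_offset=0, only values between -327.68 and 327.67 fit. Unpacking uses packed * scale_factor + add_offset, which is why a reader hands you floats again.

```python
SHORT_MIN, SHORT_MAX = -32768, 32767


def pack_short(values, scale_factor=0.01, add_offset=0.0):
    """Encode floats as 16-bit integers: packed = round((x - offset) / scale)."""
    packed = []
    for x in values:
        p = round((x - add_offset) / scale_factor)
        if not SHORT_MIN <= p <= SHORT_MAX:
            raise ValueError(f"{x} does not fit in a short with these parameters")
        packed.append(p)
    return packed


def unpack(packed, scale_factor=0.01, add_offset=0.0):
    """Decode as readers do: x = packed * scale_factor + add_offset."""
    return [p * scale_factor + add_offset for p in packed]


rh = [45.678, 99.999, 0.004]
packed = pack_short(rh)
print(packed)  # [4568, 10000, 0]
print(unpack(packed))
```

Each value comes back within half a step (0.005) of the original. That is the "rounding to 2 decimals" you were after, and each value takes 2 bytes on disk instead of 4.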

Extracting data with NCO bounded by variable values

Is it possible to pull out all data within a region (or specific cells) that meet criteria set by a variable rather than a dimension?
For example, I'm looking to pull data within a height ('HGT') bounded region.
Hyperslabbing seems to require a dimension rather than a variable.
The ncap2 where() function serves this purpose, e.g.,
ncap2 -s 'where(th < 0.0 || th > 50.0) th=th.get_miss();' in.nc out.nc
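What the question asks for, masking one field by the values of another (such as a height field), can be sketched in plain Python. The names temp and hgt and their values are hypothetical, and None stands in for the missing value. This illustrates the idea behind the where() answer and is not ncap2 code.

```python
def where_outside(values, condition_values, lo, hi, missing=None):
    """Set values to `missing` wherever the condition variable lies outside
    [lo, hi]; the test is on a second variable (here a height field),
    not on the values being masked."""
    return [missing if (c < lo or c > hi) else v
            for v, c in zip(values, condition_values)]


temp = [15.0, 12.0, 8.0, 3.0]
hgt = [100.0, 800.0, 1500.0, 3000.0]
print(where_outside(temp, hgt, 0.0, 1000.0))  # [15.0, 12.0, None, None]
```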
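You can also use the masking facility in CDO to exclude values above or below a threshold. For example, for values above a threshold:
cdo lec,50 in.nc mask.nc
cdo mul in.nc mask.nc masked_output.nc
Note that lec,50 gives 1 where the value is less than or equal to 50 and 0 elsewhere, so multiplying sets the excluded points to zero rather than missing. To set them to missing instead, use cdo ifthen mask.nc in.nc masked_output.nc
You can pipe to do this on one line and also include a lower bound. Each mask needs its own copy of in.nc, and the two masks are multiplied together:
cdo mul in.nc -mul -lec,50 in.nc -gec,0 in.nc masked_output.nc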
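The difference between multiplying by a 0/1 mask and marking points as missing can be seen in a plain-Python sketch. The data values are hypothetical, and None stands in for a missing value. The mask is 1 inside [lo, hi] and 0 outside, which is what combining a gec,lo mask with a lec,hi mask amounts to.

```python
def mask_01(values, lo, hi):
    """1 where lo <= x <= hi, else 0 (a lower and an upper bound combined)."""
    return [1 if lo <= x <= hi else 0 for x in values]


def apply_by_multiplication(values, mask):
    """Multiply by the 0/1 mask: excluded points become 0, not missing."""
    return [x * m for x, m in zip(values, mask)]


def apply_as_missing(values, mask, missing=None):
    """Keep values where the mask is non-zero, otherwise mark them missing."""
    return [x if m else missing for x, m in zip(values, mask)]


data = [-5.0, 10.0, 49.0, 60.0]
mask = mask_01(data, 0.0, 50.0)
print(mask)  # [0, 1, 1, 0]
print(apply_by_multiplication(data, mask))
print(apply_as_missing(data, mask))  # [None, 10.0, 49.0, None]
```

The zeros produced by multiplication are indistinguishable from genuine zeros in later statistics, which is the main reason to prefer marking the points as missing.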

Count the number of missing values in a netCDF file (R)

Is there a quick way to find out how many missing values are in a netCDF file, possibly using R?
Currently I have to
hum<-nc_open("rhum.sig995.2008.nc")
rhum<-ncvar_get(hum, "rhum")
then manually look up the missing value by typing 'hum' and copy it into this operation
sum(abs(rhum - 9.96920996838687e+36) < -9.96920996838687e+36)
Is there a more direct way, especially if I have to work with hundreds of files? I would like to avoid copying and pasting the missing value, and also I am not sure with what kind of precision the number should be handled.
My suggestion is to use the excellent raster package:
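install.packages("raster")
library(raster)
# read the "rhum" variable; raster() loads a single layer (the first band by default)
r <- raster("rhum.sig995.2008.nc", varname="rhum")
# the 6th element of summary() is the count of NA (missing) cells
NAnum <- summary(r)[6]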
The total number of missing data points for a variable named var can be stored in a new variable using
ncap2 -s "nmiss=number_miss(var)" in.nc out.nc
or
ncap2 -s "nmiss=var.number_miss()" in.nc out.nc
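To address the precision worry in the question, here is a plain-Python sketch of counting missing points against a fill value instead of pasting it by hand. It assumes the fill value is the netCDF default for float (9.9692099683868690e+36). In practice it should be read from the variable's _FillValue or missing_value attribute. A relative tolerance absorbs float32-to-float64 round-off, and NaNs are counted too. The rhum values are hypothetical.

```python
import math

NC_FILL_FLOAT = 9.9692099683868690e+36  # netCDF default fill value for float


def number_miss(values, fill_value=NC_FILL_FLOAT):
    """Count elements equal (within a relative tolerance) to the fill value,
    plus any NaNs."""
    count = 0
    for x in values:
        if math.isnan(x) or math.isclose(x, fill_value, rel_tol=1e-6):
            count += 1
    return count


rhum = [80.0, NC_FILL_FLOAT, 75.5, float("nan"), NC_FILL_FLOAT]
print(number_miss(rhum))  # 3
```

The value typed in the question, 9.96920996838687e+36, also matches under this tolerance, so the exact number of digits copied stops mattering.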
If your data has a time dimension and you want to see the number of missing points at each time step, summed over the spatial dimensions, you can use
cdo info in.nc
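The per-time-step count that cdo info reports can be sketched in plain Python. The data are assumed to be nested lists data[t][j][i] with None marking a missing point, and the values are hypothetical. The is_missing test can be swapped for a fill-value comparison like the one used for number_miss.

```python
def missing_per_timestep(data, is_missing=lambda x: x is None):
    """Number of missing points at each time step, summed over space.

    data is assumed to be nested lists indexed as data[t][j][i].
    """
    return [sum(1 for row in field for x in row if is_missing(x))
            for field in data]


data = [
    [[1.0, None], [2.0, 3.0]],
    [[None, None], [4.0, None]],
]
print(missing_per_timestep(data))  # [1, 3]
```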

Resources