Count number of missing values in a netcdf file - R

Is there a quick way to know how many missing values are in a netcdf file? Possibly using R.
Currently I have to
hum<-nc_open("rhum.sig995.2008.nc")
rhum<-ncvar_get(hum, "rhum")
then manually look up the missing value by typing 'hum' and copy it into this operation
sum(abs(rhum - 9.96920996838687e+36) < 1e+30)
Is there a more direct way, especially if I have to work with hundreds of files? I would like to avoid copying and pasting the missing value, and also I am not sure with what kind of precision the number should be handled.

My suggestion is to use the excellent raster package:
install.packages("raster")
library(raster)
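# raster() reads a single layer (by default the first time step)
r <- raster("rhum.sig995.2008.nc", varname="rhum")
# Count NA cells exactly; summary(r)[6] also reports NA's but may be based on a sample for large rasters
NAnum <- cellStats(is.na(r), sum)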

With NCO's ncap2, the total number of missing data points for a variable named "var" can be stored in a new variable using
ncap2 -s "nmiss=number_miss(var)" in.nc out.nc
or
ncap2 -s "nmiss=var.number_miss()" in.nc out.nc
If your data has a time dimension and you want the number of missing points summed over the space dimensions for each time step, CDO reports it in the Miss column of
cdo info in.nc
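On the precision question: a fill value stored as a 32-bit float may not compare exactly equal to a hand-copied decimal, so a relative tolerance is safer than strict equality. The sketch below is plain Python rather than R and runs on a made-up sample list; the helper names `to_float32` and `count_missing` are hypothetical. It only illustrates the comparison, not how any NetCDF library reads data.

```python
import math
import struct

FILL = 9.96920996838687e+36  # fill value quoted in the question


def to_float32(x):
    """Round a Python float to the nearest 32-bit float, as it would be stored on disk."""
    return struct.unpack("f", struct.pack("f", x))[0]


def count_missing(values, fill=FILL, rel_tol=1e-6):
    """Count entries that are NaN or match the fill value within a relative tolerance."""
    return sum(
        1
        for v in values
        if math.isnan(v) or math.isclose(v, fill, rel_tol=rel_tol)
    )


# Hypothetical sample: two real humidity values, two fill values, one NaN
sample = [45.2, to_float32(FILL), 87.9, FILL, float("nan")]
n_missing = count_missing(sample)
print(n_missing)
```

A relative tolerance of 1e-6 is an assumption chosen to sit just above single-precision rounding error; real data values many orders of magnitude smaller than the fill value are never matched by it.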

Related

Making shapefiles with zero values eliminated

I am trying to make a shapefile where the zero values are taken out in order to reduce the amount of space used. I attempted to do this through the following procedure but only got more zero values displayed:
cdo gtc,0.000008 precip_2022110815_0p50.nc mask1.nc
cdo setctomiss,0 mask1.nc mask2.nc
cdo mul precip_2022110815_0p50.nc mask2.nc precip_2022110815_0p50_adjust.nc
cdo setctomiss,0 precip_2022110815_0p50_adjust.nc precip_2022110815_0p50_final.nc
gdal_contour -a precip -i 0.00001 precip_2022110815_0p50_final.nc precip_2022110815_0p50.shp
I got the netcdf from a grib file that was obtained from ftp.ncep.noaa.gov.
Is there any way I could tweak this code, or another method I could use, to get a shapefile where all zero values are filtered out, or even where values below a certain threshold are filtered out? Is there a way to filter out values below a certain amount directly from a grib2 file?

Merge Two Datasets with Different Number of Variables per Timestep Using CDO

I am analyzing a time series of data that is split by time into two NetCDF files (infiles). These files have a different number of variables/fields, by design. Traditionally I have been using Climate Data Operators (CDO) to easily merge two datasets sorted by time using the following command in a terminal:
cdo mergetime <infiles> <outfile>
This command merges any number of files ("infiles") sorted by time and writes a new "outfile" containing a time series of all the data in each. However, this doesn't appear to work by default here, as cdo kicks back the following:
cdo select (Abort): Input streams have different number of variables per timestep!
The statement is true: each file does have a different number of variables per timestep. But it prevents me from looking at the dataset as a whole. I have also tried the following modifications to the cdo command I use to merge the time series, without success:
cdo mergetime -select,name=<variable> <infiles> <outfile>
cdo -select,name=<variable> <infiles> <outfile>
I have read through the CDO Userguide and have not found any alternative solutions yet. I would be very grateful if anyone could offer a workaround for joining the two files into a single time-series of data (preferably in cdo but not necessarily) as I am running out of ideas.
I'm on my phone, but you could delete the extra variables from the files with NCO like this:
ncks -x -v var1,var2 in.nc out.nc
Then merge as usual. I think the cdo delete operator can do the same thing, e.g. cdo delete,name=var1,var2 in.nc out.nc.
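With many files it may help to work out which variables are the extras before calling ncks. The following is a minimal Python sketch under stated assumptions: the variable lists are invented, the file names are placeholders, and `extra_variables` and `ncks_exclude_command` are hypothetical helpers. It only assembles the `ncks -x -v` call shown above and does not run it.

```python
def extra_variables(vars_a, vars_b):
    """Return the variables present in only one of the two files, per file."""
    common = set(vars_a) & set(vars_b)
    only_a = [v for v in vars_a if v not in common]
    only_b = [v for v in vars_b if v not in common]
    return only_a, only_b


def ncks_exclude_command(extras, infile, outfile):
    """Build the ncks argument list that drops the listed variables (ncks -x -v ...)."""
    return ["ncks", "-x", "-v", ",".join(extras), infile, outfile]


# Hypothetical variable lists for the two input files
file1_vars = ["t2m", "u10", "v10"]
file2_vars = ["t2m", "u10", "v10", "sst", "precip"]

only_1, only_2 = extra_variables(file1_vars, file2_vars)
cmd = ncks_exclude_command(only_2, "file2.nc", "file2_common.nc")
print(" ".join(cmd))
```

The resulting list could be handed to `subprocess.run` once the variable names have been read from the real files, after which both files carry the same variables and can be merged as usual.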

How to permanently round values in a raster layer, matrix, or array, to write to a new file using r studio?

I am trying to round either a 3-dimensional matrix, or several raster layers that will be made into a 3-dimensional matrix, to 2 decimal places in order to make a new netcdf file that will take up less memory.
Using the round function as such:
newmatrix <- round(oldmatrix, 2)
only seems to round values superficially for display. Opening newmatrix and extracting values from it after adding it to a new file returns the unrounded values from oldmatrix. This is despite the fact that values extracted from newmatrix before adding it to a new file will be rounded to 2 decimal places as is supposed to be. The same thing happens if I round off the raster layers before creating a new matrix with them.
What function can I use to permanently round off my matrix's or raster's values to write to a new file rounded?
NetCDF doesn't have a fixed precision format that would save space in the way you are expecting. (See here for data types). The usual way of saving space is to encode as a short integer and set the variable attributes scale_factor and add_offset.
In your case, you would multiply by 100, convert to short, and have scale_factor=0.01. Doing this in R is probably a lot of work, but the NCO utilities would handle it in a few lines.
Let's say you have a variable called rh.
ncap2 -v -s 'rh=short(100*rh);' in.nc out.nc
ncatted -O -h -a add_offset,rh,o,f,0 out.nc
ncatted -O -h -a scale_factor,rh,o,f,0.01 out.nc
Equivalently, this can be done in one line using
ncap2 -v -O -s 'rh=pack_short(rh, 0.01, 0.0);' in.nc out.nc
If you are looking to save memory when reading a variable into R, you might be disappointed as it will just be converted back into a float upon reading.
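To make the packing arithmetic concrete, here is a small Python sketch of the usual convention, unpacked = packed * scale_factor + add_offset. It is illustrative only: `pack_value` and `unpack_value` are hypothetical names, not NCO or NetCDF functions, and the sample value is made up.

```python
SHORT_MIN, SHORT_MAX = -32768, 32767  # range of a 16-bit signed integer


def pack_value(value, scale_factor=0.01, add_offset=0.0):
    """Convert a float to the integer that would be stored as a short."""
    packed = round((value - add_offset) / scale_factor)
    if not SHORT_MIN <= packed <= SHORT_MAX:
        raise ValueError(f"{value} does not fit in a short with these settings")
    return packed


def unpack_value(packed, scale_factor=0.01, add_offset=0.0):
    """Recover the (rounded) float from the stored short."""
    return packed * scale_factor + add_offset


original = 45.678  # hypothetical relative humidity value
stored = pack_value(original)
restored = unpack_value(stored)
print(stored, round(restored, 2))
```

With scale_factor=0.01 and a zero offset, only values between roughly -327.68 and 327.67 fit in a short, which is fine for relative humidity but would need a different scale or offset for variables with a wider range.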

Read large netcdf data by ncl

I'm reading large wrfout data (about 100x100 in space, 30 vertical levels, 400 time steps) with NCL.
fid=addfile("wrfout_d03.nc","r")
u=fid->U
The variable U is about 500 MB, so reading it takes a long time, and I also need to read other variables. Is there any way for NCL to read large netcdf data quickly? Or should I use another language?
It may be more helpful to extract the variables and timeslices you need before reading them into NCL.
To select by variable:
cdo selname,var in.nc out.nc
To select by level:
cdo sellevel,level1,level2 in.nc out.nc
or levels selected by their index:
cdo sellevidx,idx1,idx2 in.nc out.nc
You can also extract subsets in terms of dates or times.
More info here:
https://code.mpimet.mpg.de/projects/cdo/wiki/Cdo#Documentation

Extract certain values out of netCDF

I have a netCDF file with 3 dimensions. The first dimension is the longitude and runs from 1 to 464. The second dimension is the latitude and runs from 1 to 201. The third dimension is time and runs from 1 to 5479.
Now I want to extract certain values out of the file. I think one can handle it with the start argument. I tried this command.
test = open.ncdf("rr_0.25deg_reg_1980-1994_v8.0.nc")
data = get.var.ncdf(test,start=c(1:464,1:201,1:365))
But somehow it doesn't work. Does anybody have a solution?
Thanks in advance...
It looks like you are using the ncdf package in R. If you can, I recommend using the updated ncdf4 package, which is based on Unidata's netcdf version 4 library.
Back to your problem. I use the ncdf4 package, but I think the ncdf package works the same way. When you call the function get.var.ncdf, you also need to explicitly supply the name of the variable that you want to extract. I think you can get the names of the variables using names(test$var).
So you need to do something like this:
# Open the nc file
test = open.ncdf("rr_0.25deg_reg_1980-1994_v8.0.nc")
# Now get the names of the variables in the nc file
names(test$var)
# Get the data from the first variable listed above
# (May not fit in memory)
data = get.var.ncdf(test,varid=names(test$var)[1])
# If you only want a certain range of data, read just that block:
# start is the first index along each dimension, count is how many values to read
# data = get.var.ncdf(test,varid=names(test$var)[1],start=c(1,1,1),count=c(464,201,365))
For your problem, you would need to replace varid=names(test$var)[1] above with varid='VARIABLE_NAME', where VARIABLE_NAME is the variable you want to extract.
Hope that helps.
EDIT:
I installed the ncdf package on my system, and the above code works for me!
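The start argument in the question was given whole index ranges, whereas a hyperslab read takes one starting index per dimension plus a separate count per dimension. The conversion is simple arithmetic, sketched here in Python rather than R; `start_count` is a hypothetical helper and the ranges are the ones from the question.

```python
def start_count(ranges):
    """Convert inclusive 1-based (first, last) index ranges into start and count vectors."""
    start = [first for first, last in ranges]
    count = [last - first + 1 for first, last in ranges]
    return start, count


# Ranges from the question: all longitudes, all latitudes, first 365 time steps
start, count = start_count([(1, 464), (1, 201), (1, 365)])
print(start, count)
```

In R these would be passed as start=c(1,1,1) and count=c(464,201,365), so only the requested block is read from disk instead of the whole variable.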
You could also extract the timesteps/dates and locations outside of R, using CDO, before reading the data into R for plotting etc. This has the advantage that you can work directly in the coordinate space and specify timesteps or dates directly:
e.g.
cdo seldate,2010-01-01,2012-10-31 in.nc out.nc
cdo sellonlatbox,lon1,lon2,lat1,lat2 in.nc out.nc
