Is there a quick way to know how many missing values are in a netcdf file? Possibly using R.
Currently I have to
hum<-nc_open("rhum.sig995.2008.nc")
rhum<-ncvar_get(hum, "rhum")
then manually look up the missing value by typing 'hum' and copy it into this operation:
sum(abs(rhum - 9.96920996838687e+36) < 1e+30)
Is there a more direct way, especially if I have to work with hundreds of files? I would like to avoid copying and pasting the missing value, and also I am not sure with what kind of precision the number should be handled.
My suggestion is to use the excellent raster package:
install.packages("raster")
library(raster)
r <- raster("rhum.sig995.2008.nc", var="rhum")
NAnum <- summary(r)[6]  # the "NA's" entry of the raster summary
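If you have hundreds of files, a small loop gives the same count for each of them. Here is a minimal sketch using the ncdf4 package, assuming every file contains the variable "rhum"; note that ncvar_get replaces the file's _FillValue with NA on read, so the missing value never has to be copied by hand:
library(ncdf4)
files <- list.files(pattern = "\\.nc$")
na_counts <- sapply(files, function(f) {
  nc <- nc_open(f)
  x <- ncvar_get(nc, "rhum")  # _FillValue comes back as NA
  nc_close(nc)
  sum(is.na(x))               # count the missing values
})
na_counts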
The total number of missing data points for a variable named "var" can be stored in a new additional variable using
ncap2 -s "nmiss=number_miss(var)" in.nc out.nc
or
ncap2 -s "nmiss=var.number_miss()" in.nc out.nc
If your data has a time dimension and you want the total number of missing points summed over the spatial dimensions, you can see this with
cdo info in.nc
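If the array has already been read into R as above, the analogous per-timestep count (assuming the dimensions are ordered lon x lat x time) would be:
apply(is.na(rhum), 3, sum)  # number of missing values at each timestep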
Related
I am trying to make a shapefile where the zero values are taken out in order to reduce the amount of space used. I attempted to do this with the following procedure, but zero values still show up in the output:
cdo gtc,0.000008 precip_2022110815_0p50.nc mask1.nc
cdo setctomiss,0 mask1.nc mask2.nc
cdo mul precip_2022110815_0p50.nc mask2.nc precip_2022110815_0p50_adjust.nc
cdo setctomiss,0 precip_2022110815_0p50_adjust.nc precip_2022110815_0p50_final.nc
gdal_contour -a precip -i 0.00001 precip_2022110815_0p50_final.nc precip_2022110815_0p50.shp
I got the NetCDF file from a GRIB file that was obtained from ftp.ncep.noaa.gov.
Is there any way I could tweak this code, or another method I could use, to get a shapefile where all zero values are filtered out, or even where values below a certain threshold are filtered out? Is there a way to filter out values below a certain amount from a GRIB2 file directly?
I am analyzing a time series of data that is split by time into two NetCDF files (infiles). These files have a different number of variables/fields, by design. Traditionally I have been using Climate Data Operators (CDO) to easily merge two datasets sorted by time using the following command in a terminal:
cdo mergetime <infiles> <outfile>
This command merges any number of files "infiles" sorted by time and writes a new "outfile" containing a time series of all the data in each; however, this doesn't appear to work by default with cdo, as it kicks back the following:
cdo select (Abort): Input streams have different number of variables per timestep!
The statement is true: each file does have a different number of variables per timestep. But it prevents me from looking at the dataset as a whole. I have also tried the following modifications to the cdo command I use to merge the time series, without success:
cdo mergetime -select,name=<variable> <infiles> <outfile>
cdo -select,name=<variable> <infiles> <outfile>
I have read through the CDO User Guide and have not found an alternative solution yet. I would be very grateful if anyone could offer a workaround for joining the two files into a single time series of data (preferably in cdo, but not necessarily), as I am running out of ideas.
On my phone, but you could delete the extra, annoying new variables from the files with NCO like this:
ncks -x -v var1,var2 in.nc out.nc
And then merge as usual. I think you can use the cdo delete operator to do the same thing, along the lines of cdo delete,name=var1,var2 in.nc out.nc.
I am trying to round either a 3-dimensional matrix, or several raster layers that will be made into a 3-dimensional matrix, to 2 decimal places, in order to make a new NetCDF file that takes up less memory.
Using the round function like so:
newmatrix <- round(oldmatrix, 2)
only seems to round the values superficially, for display. Opening newmatrix and extracting values from it after writing it to a new file returns the unrounded values from oldmatrix, even though values extracted from newmatrix before writing are rounded to 2 decimal places as expected. The same thing happens if I round the raster layers before creating a new matrix from them.
What function can I use to permanently round my matrix's or raster's values so that they are written to the new file rounded?
NetCDF doesn't have a fixed-precision format that would save space in the way you are expecting (see here for the available data types). The usual way of saving space is to encode the data as short integers and set the variable attributes scale_factor and add_offset.
In your case, you would multiply by 100, convert to short, and set scale_factor=0.01. Doing this in R is probably a lot of work, but the nco utilities handle it in a few lines.
Let's say you have a variable called rh.
ncap2 -v -s 'rh=short(100*rh);' in.nc out.nc
ncatted -O -h -a add_offset,rh,o,f,0 out.nc
ncatted -O -h -a scale_factor,rh,o,f,0.01 out.nc
Equivalently, this can be done in one line using
ncap2 -v -O -s 'rh=pack_short(rh, 0.01, 0.0);' in.nc out.nc
If you are looking to save memory when reading a variable into R, you might be disappointed as it will just be converted back into a float upon reading.
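For intuition, the round-trip arithmetic implied by the scale_factor/add_offset convention looks like this (a sketch in R with illustrative values; readers reconstruct the data as packed * scale_factor + add_offset):
x <- c(73.456, 12.301)  # original float values
scale_factor <- 0.01
add_offset <- 0.0
packed <- as.integer(round((x - add_offset) / scale_factor))  # stored as short: 7346, 1230
unpacked <- packed * scale_factor + add_offset                # read back as: 73.46, 12.30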
I'm reading large wrfout data (about 100x100 in space, 30 vertical levels, 400 timesteps) with NCL.
fid=addfile("wrfout_d03.nc","r")
u=fid->U
The variable U is about 500 MB, so reading it takes a long time, and I also need to read other variables. Is there any way for NCL to read large NetCDF data quickly? Or can I use another language?
It may help to extract only the variables and timeslices you need before reading the data into NCL.
To select by variable:
cdo selvar,var in.nc out.nc
To select by level:
cdo sellevel,level in.nc out.nc
or levels selected by their index:
cdo sellevidx,idx in.nc out.nc
You can also extract subsets in terms of dates or timesteps (e.g. with cdo seldate or cdo seltimestep).
More info here:
https://code.mpimet.mpg.de/projects/cdo/wiki/Cdo#Documentation
I have a netCDF file with 3 dimensions. The first dimension is the longitude and runs from 1-464. The second dimension is the latitude and runs from 1-201. The third dimension is time and runs from 1-5479.
Now I want to extract certain values out of the file. I think one can handle this with the start argument. I tried this command:
test = open.ncdf("rr_0.25deg_reg_1980-1994_v8.0.nc")
data = get.var.ncdf(test,start=c(1:464,1:201,1:365))
But somehow it doesn't work. Does anybody have a solution?
Thanks in advance...
It looks like you are using the ncdf package in R. If you can, I recommend using the updated ncdf4 package instead, which is based on Unidata's netCDF-4 library (link).
Back to your problem. I use the ncdf4 package, but I think the ncdf package works the same way. When you call the function get.var.ncdf, you also need to explicitly supply the name of the variable that you want to extract. I think you can get the names of the variables using names(test$var).
So you need to do something like this:
# Open the nc file
test = open.ncdf("rr_0.25deg_reg_1980-1994_v8.0.nc")
# Now get the names of the variables in the nc file
names(test$var)
# Get the data from the first variable listed above
# (May not fit in memory)
data = get.var.ncdf(test,varid=names(test$var)[1])
# If you only want a certain range of data.
# The following will probably not fit in memory either
# data = get.var.ncdf(test,varid=names(test$var)[1])[1:464,1:201,1:365]
For your problem, you would need to replace varid=names(test$var)[1] above with varid='VARIABLE_NAME', where VARIABLE_NAME is the variable you want to extract.
Hope that helps.
EDIT:
I installed the ncdf package on my system, and the above code works for me!
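Since ncdf4 is the recommended package, here is roughly the equivalent there, using the start and count arguments of ncvar_get to read only a subset (a sketch; take the variable name from names(nc$var)):
library(ncdf4)
nc <- nc_open("rr_0.25deg_reg_1980-1994_v8.0.nc")
vname <- names(nc$var)[1]
# 'start' gives the first index along each dimension (lon, lat, time),
# 'count' the number of values to read along each dimension
data <- ncvar_get(nc, vname, start = c(1, 1, 1), count = c(464, 201, 365))
nc_close(nc)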
You could also do the extraction of timesteps/dates and locations outside of R, before reading the data into R for plotting etc., by using CDO. This has the advantage that you can work directly in coordinate space and specify timesteps or dates directly, e.g.:
cdo seldate,20100101,20121031 in.nc out.nc
cdo sellonlatbox,lon1,lon2,lat1,lat2 in.nc out.nc