How to remove the scaling and offset attributes of a variable in a netcdf file to get the actual data values - netcdf

I have a variable which has the dimensions time x lat x lon x levels and I am trying to read this in my global climate model. The problem I am facing is that the data is scaled and offset in the original file and it is a hassle to incorporate this inside the model. I want to modify the file so that the original data without any scaling or offset is available to read in the climate model.

You can use NCO's ncpdq to unpack the data, i.e. to apply the scale factor and offset:
ncpdq --unpack input_file.nc out_file.nc
The file out_file.nc will then contain the actual values, without any scaling or offset.
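For context, unpacking follows the usual netCDF packing convention: unpacked = packed * scale_factor + add_offset. Below is a minimal Python sketch of that arithmetic only. The scale_factor, add_offset and packed integers are made-up values for illustration, not taken from any real file.

```python
# Illustrative values only; in a real file these are attributes of the variable.
scale_factor = 0.01
add_offset = 273.15
packed = [-1500, 0, 2500]  # e.g. short integers as stored on disk


def unpack(values, scale, offset):
    """Apply the netCDF packing convention: value * scale_factor + add_offset."""
    return [v * scale + offset for v in values]


unpacked = unpack(packed, scale_factor, add_offset)
print(unpacked)
```

Once a file has been unpacked this way, the model can read the values directly and does not need to know about the two attributes.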

I think this should do the trick:
cdo -b 32 copy input_file.nc out_file.nc
or this
cdo -b f32 copy input_file.nc out_file.nc
Essentially, this unpacks the data by converting it to 32-bit floats. If you want 64-bit double-precision floats, use 64 instead of 32.

Related

Making shapefiles with zero values eliminated

I am trying to make a shapefile where the zero values are taken out in order to reduce the amount of space used. I attempted to do this through the following procedure but only got more zero values displayed:
cdo gtc,0.000008 precip_2022110815_0p50.nc mask1.nc
cdo setctomiss,0 mask1.nc mask2.nc
cdo mul precip_2022110815_0p50.nc mask2.nc precip_2022110815_0p50_adjust.nc
cdo setctomiss,0 precip_2022110815_0p50_adjust.nc precip_2022110815_0p50_final.nc
gdal_contour -a precip -i 0.00001 precip_2022110815_0p50_final.nc precip_2022110815_0p50.shp
I got the netcdf from a grib file that was obtained from ftp.ncep.noaa.gov.
Is there any way I could tweak this code, or are there other methods I could use, to get a shapefile where all zero values, or even all values below a certain threshold, are filtered out? Is there a way to filter out the values below a certain amount from a grib2 file?

How to concatenate multiple netCDF files with varying dimension sizes?

I have 20 netCDF files containing oceanographic CTD data. Each file contains the same dimension and variable names, but they differ in the size of the vertical coordinate (i.e. CTD profiles inshore have a smaller depth range than profiles offshore). I need to concatenate these separate files into one netCDF file with a record variable "station".
I have tried:
ncecat -u station *.nc outfile.nc
This concatenates the files in the correct way, but it takes the dimension size of the first netCDF file (which is the smallest) and so I lose the data below the depth of the shallowest CTD profile for the rest of the netCDF files.
I'm assuming I need to pad the shallower profiles with FillValues (or similar) down to the maximum depth of the deepest CTD profile.
Is there a way to do this using ncecat?
The closest you can get with ncecat alone is to use group aggregation to store each station profile as its own group in a netCDF4 file. Then you do not need to search for and fill-in any missing data:
ncecat --gag *.nc outfile.nc
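If a single rectangular station x depth array is needed instead of groups, the padding idea from the question can be sketched as below. This is only an illustration of the logic in plain Python, not an ncecat feature. The profile values and the FILL constant are hypothetical, and it assumes every profile starts at the surface and uses the same depth levels.

```python
FILL = -9999.0  # hypothetical fill value; in practice use the file's _FillValue

# Hypothetical temperature profiles of different lengths (one list per station).
profiles = [
    [12.1, 11.8],             # shallow inshore cast
    [12.3, 11.9, 10.2, 8.7],  # deeper offshore cast
    [12.0, 11.5, 9.9],
]


def pad_profiles(profiles, fill):
    """Pad every profile with `fill` up to the length of the deepest one."""
    max_len = max(len(p) for p in profiles)
    return [p + [fill] * (max_len - len(p)) for p in profiles]


stacked = pad_profiles(profiles, FILL)
for row in stacked:
    print(row)
```

After padding, every station has the same vertical size, so the profiles can be concatenated along "station" without truncating the deeper casts.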

R save as NetCDF file after simple calculation

I want to do something (apparently) simple, but didn't yet find the right way to do it:
I read a netcdf file (wind speed from the ERA5 reanalysis) on a grid.
From this, I use the wind speed to calculate a wind capacity factor (using a given power curve).
I then want to write a new netcdf file, with exactly the same structure as the input file, but just replacing the input wind speed by the new variable (wind capacity factor).
Is there a simple/fast way to do this without having to redefine all the dims, vars, etc. with ncvar_def and ncdim_def?
Thanks in advance for your replies!
Writing a netCDF file in R is not overly complicated; there is a nice example online here:
http://geog.uoregon.edu/GeogR/topics/netCDF-write-ncdf4.html
You could copy the dimensions from the input file.
However if your wind power curve is a simple analytical expression then you could perform this task in one line from the command line in bash/linux using climate data operators (cdo).
For example, if the file contains the two 10 m wind components (typically named u10 and v10 in ERA5 netCDF output, but I don't recall the names exactly, so check them with cdo showname), then you could make a new variable wcf = sqrt(u10^2 + v10^2) in the following way:
cdo expr,'wcf=sqrt(u10^2+v10^2)' input.nc output.nc
See an example here:
https://code.mpimet.mpg.de/boards/53/topics/1622
So if your wind power function is an analytical expression, you can define it this way without using R at all or worrying about dimensions etc. The output file will contain the new variable wcf (expr writes only the newly defined variables; use aexpr instead if you also want to keep the input variables). You should then probably use NCO to alter the metadata (units etc.) to ensure they are appropriate.
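To make the calculation concrete, here is a rough Python sketch of the two steps for a single grid cell: wind speed from the two components, then a capacity factor from a power curve. The power curve here is entirely hypothetical (a simplified cubic ramp between assumed cut-in, rated and cut-out speeds); the real curve has to come from the turbine specification.

```python
import math

# Hypothetical power-curve parameters in m/s; replace with the real turbine's values.
CUT_IN, RATED, CUT_OUT = 3.0, 12.0, 25.0


def wind_speed(u, v):
    """Wind speed from the two horizontal wind components."""
    return math.sqrt(u * u + v * v)


def capacity_factor(speed):
    """Fraction of rated power (0 to 1) for a simplified cubic power curve."""
    if speed < CUT_IN or speed >= CUT_OUT:
        return 0.0
    if speed >= RATED:
        return 1.0
    return (speed**3 - CUT_IN**3) / (RATED**3 - CUT_IN**3)


speed = wind_speed(3.0, 4.0)
print(speed, capacity_factor(speed))
```

A piecewise curve like this one would need conditional logic rather than a single arithmetic expression, which is worth keeping in mind when deciding between the command line and R.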

Expand a netCDF Variable into an additional Dimension or multiple Variables

I am working with a very large netCDF file in three dimensions (lat/lon/time). The resolution is 300 meters and the time variable has 25 steps, which leads to 64800x129600x25 cells.
The one variable contained in the file is an integer (ranging from -36 to 120) but represents an underlying factor, which is the problem.
It is a land cover data set, so for example: -20 means the cell is of the land type Forest or 10 means the cell is covered by water.
I want to reshape the netCDF file such that there is an additional dimension which represents every factor level of the original variable. And the variable would then be just a 1 or 0 per cell indicating the presence of every factor level at a certain lat/lon/time.
The dimensions would then be lat/lon/time/land type.
Here is an example data set, that does not concern land type but is small enough that it can be used for testing. And here is some code to read it in:
library(ncdf4)
# Download the data
download.file("http://schubert.atmos.colostate.edu/~cslocum/code/air.sig995.2012.nc",
mode="wb", destfile = "test.nc")
test.ncdf <- nc_open("test.nc", write=TRUE)
# See the lon,lat,time dimensions
print(test.ncdf)
tmp.array <- ncvar_get(test.ncdf, varid="air")
I'm not sure if the raster package is better suited for this task. For very small netCDF files I have managed to get the intended result to some extent, by extracting the data and then stacking it as a data.frame.
Any help or pointing in the right direction would be greatly appreciated.
Thanks in advance.
If I understand correctly, you want a set of fields, one for each type, that are 1 or 0 as a function of lat/lon/time, e.g. if you are looking at forest you want an array which is 1 where the factor equals -20 and 0 otherwise.
I know you want to do this in a 4-dimensional array; for that I expect you will need to use R, as you tagged the question. But if you don't mind having a series of 3D arrays, one per type, a quick and easy way to do this is to use CDO to process the integer array:
cdo eqc,-20 air.sig995.2012.nc test.nc
The issue with this is that the output variable still has the same name (you don't say what it is called, so I refer to it as sfctype), and so you would need to change the metadata with nco.
Therefore a better way would be to use expr in cdo.
cdo expr,"forest=sfctype==-20" air.sig995.2012.nc forest.nc
This makes a new variable called forest which is 1 or 0.
You could now process all the types you want, and then merge them into one file:
cdo expr,"forest=(sfctype==-20)" air.sig995.2012.nc type_forest.nc
cdo expr,"water=(sfctype==10)" air.sig995.2012.nc type_water.nc
...etc...
cdo merge type_*.nc combined_file.nc
(I don't think you need the parentheses, but it is a clearer syntax)
...almost what you wanted in a few lines, but not quite... I am not sure how to "stack" these new variables into a 4D array if you really need that, but perhaps nco can do it.
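For what the "stacking" would look like, here is a small Python sketch of the idea: one 1/0 indicator grid per land type, collected along a new leading dimension. It works on a tiny made-up grid of codes held in memory, so it only shows the logic and is not a way to process the full-size file. The LEVELS mapping reuses the two example codes from the question.

```python
# Land-cover codes to expand (the two examples mentioned in the question).
LEVELS = {"forest": -20, "water": 10}

# Tiny made-up grid of land-cover codes, indexed as [lat][lon].
landcover = [
    [-20, 10],
    [10, 0],
]


def indicator(grid, code):
    """Return a grid of 1/0 marking the cells where `grid` equals `code`."""
    return [[1 if cell == code else 0 for cell in row] for row in grid]


# One indicator grid per level; the result is indexed as [level][lat][lon].
stacked = [indicator(landcover, code) for code in LEVELS.values()]
print(stacked)
```

With a time axis the same comparison is simply applied at every time step, giving the lat/lon/time/land type layout described in the question.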

Extract certain values out of netCDF

I have a netCDF file with 3 dimensions. The first dimension is longitude and runs from 1 to 464. The second dimension is latitude and runs from 1 to 201. The third dimension is time and runs from 1 to 5479.
Now I want to extract certain values out of the file. I think one can handle it with the start argument. I tried this command.
test = open.ncdf("rr_0.25deg_reg_1980-1994_v8.0.nc")
data = get.var.ncdf(test,start=c(1:464,1:201,1:365))
But somehow it doesn't work. Does anybody have a solution?
Thanks in advance...
It looks like you are using the ncdf package in R. If you can, I recommend using the updated ncdf4 package, which is based on Unidata's netcdf version 4 library.
Back to your problem. I use the ncdf4 package, but I think the ncdf package works the same way. When you call the function get.var.ncdf, you also need to explicitly supply the name of the variable that you want to extract. I think you can get the names of the variables using names(test$var).
So you need to do something like this:
# Open the nc file
test = open.ncdf("rr_0.25deg_reg_1980-1994_v8.0.nc")
# Now get the names of the variables in the nc file
names(test$var)
# Get the data from the first variable listed above
# (May not fit in memory)
data = get.var.ncdf(test,varid=names(test$var)[1])
# If you only want a certain range of data.
# The following will probably not fit in memory either
# data = get.var.ncdf(test,varid=names(test$var)[1])[1:464,1:201,1:365]
For your problem, you would need to replace varid=names(test$var)[1] above with varid='VARIABLE_NAME', where VARIABLE_NAME is the variable you want to extract.
Hope that helps.
EDIT:
I installed the ncdf package on my system, and the above code works for me!
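On the start argument from the question: in the ncdf package, start and count each take one number per dimension (the first index to read and how many values to read), not the full index ranges, so the first year would be requested with start=c(1,1,1) and count=c(464,201,365). The Python sketch below only illustrates what such a start/count pair selects, using a made-up nested list and a hypothetical helper named hyperslab; it is not part of any netCDF library.

```python
def hyperslab(data, start, count):
    """Select a block from a nested list, given 1-based `start` and `count`
    with one entry per dimension (mirroring the start/count convention)."""
    if not start:
        return data
    first = start[0] - 1  # convert the 1-based start to a 0-based index
    block = data[first:first + count[0]]
    return [hyperslab(item, start[1:], count[1:]) for item in block]


# Made-up 3 x 4 array standing in for a two-dimensional variable.
grid = [[10 * i + j for j in range(4)] for i in range(3)]

# One start and one count per dimension: rows 2-3, columns 1-2.
subset = hyperslab(grid, start=[2, 1], count=[2, 2])
print(subset)
```

Reading with start and count means only the requested block is loaded, which matters when the full variable does not fit in memory.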
You could also extract the time steps/dates and locations outside of R, before reading the data into R for plotting etc., by using CDO. This has the advantage that you can work directly in coordinate space and specify time steps or dates directly, e.g.:
cdo seldate,2010-01-01,2012-10-31 in.nc out.nc
cdo sellonlatbox,lon1,lon2,lat1,lat2 in.nc out.nc

Resources