I have a 2D .nc file with dimensions time and depth that I want to convert to a 4D .nc file. Latitude and longitude are saved as variables in the 2D file. They are not in any particular order and there are large missing areas as well. The .nc file also contains temperature recordings for each time and depth.
The file header is as follows:
dimensions:
time = UNLIMITED ; // (309 currently)
level = 2000 ;
variables:
float latitude(time) ;
latitude:units = "degree_north" ;
float longitude(time) ;
longitude:units = "degree_east" ;
float temperature(time, level) ;
temperature:standard_name = "sea_water_temperature" ;
temperature:long_name = "Water Temperature" ;
temperature:units = "Celsius" ;
temperature:_FillValue = -9999.f ;
temperature:missing_value = -9999.f ;
Is there an easy way using cdo or nco to bin the temperature recordings into a pre-defined latitude x longitude grid so that the resulting .nc file has four dimensions? (time,depth,latitude,longitude)
Adrian's answer looks correct to me, except that you do not need or want the dollar signs in front of the dimension names, so try
ncap2 -s 'Temp_new[time,depth,latitude,longitude]=temperature' in.nc out.nc
Documentation is here.
I think this posting is perhaps related to your question.
Maybe try
ncap2 -s 'Temp_new[time,depth,latitude,longitude]=temperature' in.nc out.nc
I'm not great at ncap2, perhaps Charlie will correct this post if this is not exactly correct.
(now edited to correct the error pointed out by Charlie)
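Independently of the ncap2 command, the binning step the question asks about (assigning each profile to a cell of a pre-defined latitude x longitude grid) can be sketched in plain Python. This is only an illustration of the idea, not an NCO or CDO feature: the function names, grid origin, and cell size are hypothetical, and it assumes a regular grid with one profile per time step, as in the header above.

```python
import math

def cell_index(value, origin, step, n_cells):
    """Return the index of the grid cell containing value, or None if outside the grid."""
    idx = math.floor((value - origin) / step)
    return idx if 0 <= idx < n_cells else None

def bin_profiles(lats, lons, profiles, lat0, lon0, step, n_lat, n_lon, fill=-9999.0):
    """Build grid[time][depth][lat][lon], filled with `fill` except where a profile lands."""
    n_depth = len(profiles[0])
    grid = [
        [[[fill] * n_lon for _ in range(n_lat)] for _ in range(n_depth)]
        for _ in range(len(profiles))
    ]
    for t, (lat, lon, profile) in enumerate(zip(lats, lons, profiles)):
        i = cell_index(lat, lat0, step, n_lat)
        j = cell_index(lon, lon0, step, n_lon)
        if i is None or j is None:
            continue  # profile lies outside the pre-defined grid
        for k, temp in enumerate(profile):
            grid[t][k][i][j] = temp
    return grid

# Two made-up profiles with two depth levels each, on a 3 x 3 grid of 10-degree cells.
lats = [10.2, -5.0]
lons = [20.7, 3.3]
profiles = [[15.0, 14.0], [20.0, 19.5]]
grid = bin_profiles(lats, lons, profiles, lat0=-10.0, lon0=0.0, step=10.0, n_lat=3, n_lon=3)
print(grid[0][0][2][2], grid[1][1][0][0])
```

With 309 profiles and 2000 levels a dense 4D array becomes large quickly, so in practice the grid resolution has to be chosen with the output size in mind.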
I am trying to concatenate a large number (~4000) of netCDF files into a single file. Each input file is a spatial raster with an x and a y dimension.
I am trying to work with ncecat:
ncecat -4 -L 5 -D 2 --open_ram --cnk_csh=1000000000 \
--cnk_dmn record,2000 --cnk_dmn x,10 --cnk_dmn y,10 \
$input_files output.nc
This gives me something like this:
netcdf test {
dimensions:
record = UNLIMITED ; // (6 currently)
y = 11250 ;
x = 15000 ;
variables:
float Band1(record, y, x) ;
Band1:long_name = "GDAL Band Number 1" ;
Band1:_FillValue = -3.4e+38f ;
Band1:grid_mapping = "transverse_mercator" ;
Band1:_Storage = "chunked" ;
Band1:_ChunkSizes = 1, 10, 10 ;
Band1:_DeflateLevel = 5 ;
Band1:_Filter = "1,5" ;
Band1:_Shuffle = "true" ;
Band1:_Endianness = "little" ;
So the record dimension was not actually chunked: its chunk size is 1 instead of the requested 2000.
I think I can run this command first and then use ncks on the output file to fix the record dimension and rechunk. However, ncks needs to read everything into RAM and is another time-costly operation, so I am looking for a way to tell ncecat to treat the record dimension as a chunking dimension as well. I haven't found a way to do this yet.
Your command looks well-formed, though there are a few comments I would make. First, the behavior you are seeing may be a bug, since the command should produce record dimension chunks of size 2000 as requested. Second, please read the chunking documentation here. This leads to the possibility that adding the --cnk_plc=cnk_xpl option may help. Third, I suggest you concatenate and chunk the files with ncrcat, not ncecat. The former is less memory-intensive than the latter, as described here.
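To see why the chunk shape matters here, a little arithmetic on the header above helps. The sketch below compares the chunk shape that was actually written (1, 10, 10) with the requested one (2000, 10, 10). It assumes exactly 4000 input files (the question only says "~4000") and 4-byte floats; the helper name is hypothetical.

```python
import math

def chunk_stats(dim_sizes, chunk_sizes, bytes_per_value=4):
    """Return (bytes per uncompressed chunk, number of chunks) for one variable."""
    chunk_bytes = bytes_per_value * math.prod(chunk_sizes)
    n_chunks = math.prod(math.ceil(d / c) for d, c in zip(dim_sizes, chunk_sizes))
    return chunk_bytes, n_chunks

# (record, y, x); the record count of 4000 is an assumption.
dims = (4000, 11250, 15000)

actual = chunk_stats(dims, (1, 10, 10))
requested = chunk_stats(dims, (2000, 10, 10))

print("written:   %d bytes per chunk, %d chunks" % actual)
print("requested: %d bytes per chunk, %d chunks" % requested)
```

Under these assumptions the written layout uses 400-byte chunks and needs 2000 times as many of them as the requested layout, which is one reason reading a time series at a single pixel would be slow.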
I downloaded daily TRMM 3B42 data for a few days from https://disc.gsfc.nasa.gov/datasets. The filenames are of the form 3B42_Daily.yyyymmdd.7.nc4.nc4 but the files do not contain any time dimension. Hence when I use ncecat to concatenate various files, the date information is missing in the resulting file. I want to know how to add the time information in the combined dataset.
It seems that the timestamp is a part of the global attributes. Here is the relevant part from ncdump:
$ ncdump -h ~/Downloads/3B42_Daily.19980730.7.nc4.nc4
netcdf \3B42_Daily.19980730.7.nc4 {
dimensions:
lon = 201 ;
lat = 201 ;
variables:
float lon(lon) ;
...trimmed...
// global attributes:
:BeginDate = "1998-07-30" ;
:BeginTime = "01:30:00.000Z" ;
:EndDate = "1998-07-31" ;
:EndTime = "01:29:59.999Z" ;
...trimmed...
When I tried using ncecat 3B42_Daily.199808??.7.nc4.nc4 /tmp/daily.nc4 it gives
$ ncdump -h /tmp/daily.nc4
netcdf daily {
dimensions:
record = UNLIMITED ; // (5 currently)
lon = 201 ;
lat = 201 ;
variables:
float lon(lon) ;
...trimmed...
// global attributes:
:BeginDate = "1998-08-01" ;
:BeginTime = "01:30:00.000Z" ;
:EndDate = "1998-08-02" ;
:EndTime = "01:29:59.999Z" ;
...trimmed...
The time information in the global attribute of the first file is retained, which is not very useful.
When trying to use xarray in Python, I face the same issue: I could not find how the time information contained in the global attributes can be used when concatenating the data.
I can think of two possible solutions, but I do not know how to achieve them.
1. Add a timestamp "by hand" to each of the files first, using some command, and then use ncecat.
2. Have ncecat somehow read the global attribute and convert it into a dimension and a variable while concatenating.
From the documentation on http://nco.sourceforge.net/nco.html I could not figure out how to achieve either of these two ways. Or is there a third better way to achieve this (concatenation with relevant time information added)?
Since the data files do not follow CF conventions, you will likely have to manually create a time coordinate after using ncecat to concatenate the files. It only takes a few commands if the times are regular, e.g.,
ncecat -u time in*.nc out.nc
ncap2 -O -s 'time[time]={0.0,1.0,2.0,3.0,4.0}' out.nc out.nc
ncatted -a units,time,o,c,"days since 1998-08-01" out.nc
or use ncap2's array facilities for generic arithmetic arrays.
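If the dates are not regular, the offsets for the ncap2 expression can be derived from the filenames instead of being typed by hand. The following is a minimal sketch using only the Python standard library. It assumes the yyyymmdd stamp sits between two dots as in 3B42_Daily.yyyymmdd.7.nc4.nc4, and the helper name is hypothetical.

```python
import re
from datetime import date

STAMP = re.compile(r"\.(\d{4})(\d{2})(\d{2})\.")

def days_since(filenames, reference):
    """Return one float offset (in days from `reference`) per filename."""
    offsets = []
    for name in filenames:
        match = STAMP.search(name)
        if match is None:
            raise ValueError("no yyyymmdd stamp found in %r" % name)
        year, month, day = (int(part) for part in match.groups())
        offsets.append(float((date(year, month, day) - reference).days))
    return offsets

# Filenames in the same order in which they are passed to ncecat.
names = ["3B42_Daily.199808%02d.7.nc4.nc4" % d for d in range(1, 6)]
offsets = days_since(names, date(1998, 8, 1))

# Text for the -s argument of ncap2, matching "days since 1998-08-01".
expr = "time[time]={" + ",".join(str(v) for v in offsets) + "}"
print(expr)
```

The printed string can then be passed to ncap2 -s in place of the hand-written list.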
I am trying to extract a single variable (DUEXTTAU) from multiple NC files, and then combine all the individual files into a single NC file. I am using nco, but have an issue with ncks.
The NC filenames follow:
MERRA2_100.tavgM_2d_aer_Nx.YYYYMM.nc4
Each file has 1 (monthly) time step. The time coordinate itself does not carry a meaningful value; what changes from file to file is its units and begin_date attributes. For example, in the file MERRA2_100.tavgM_2d_aer_Nx.198001.nc4, it has:
int time(time=1);
:long_name = "time";
:units = "minutes since 1980-01-01 00:30:00";
:time_increment = 60000; // int
:begin_date = 19800101; // int
:begin_time = 3000; // int
:vmax = 9.9999999E14f; // float
:vmin = -9.9999999E14f; // float
:valid_range = -9.9999999E14f, 9.9999999E14f; // float
:_ChunkSizes = 1U; // uint
I repeat this step for each file
ncks -v DUEXTTAU MERRA2_100.tavgM_2d_aer_Nx.YYYYMM.nc4 YYYYMM.nc4
and then
ncrcat YYYYMM.nc4 final.nc4
In final.nc4, the time coordinate has the same value (of the first YYYYMM.nc4). For example, after combining the 3 files of 198001, 198002 and 198003, the time coordinate equals 198001 for all the time steps. How should I deal with this?
Firstly, this command should work:
ncrcat -v DUEXTTAU MERRA2_100.tavgM_2d_aer_Nx.??????.nc4 final.nc4
However, recent versions of NCO fail to correctly reconstruct or re-base the time coordinate when time is an integer, which it is in your case. The fix is in the latest NCO snapshot on GitHub and will be in 4.9.3 to be released hopefully this week. If installing from source is not an option, then manual intervention would be required (e.g., change time to floating point in each input file with ncap2 -s 'time=float(time)' in.nc out.nc). In any case, the time_increment, begin_date, and begin_time attributes are non-standard and will simply be copied from the first file. But time itself should be correctly reconstructed if you use a non-broken version of ncrcat.
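For readers unfamiliar with what re-basing the time coordinate means, here is a small illustration of the arithmetic, not NCO's actual implementation. It assumes each monthly file stores a time value of 0 with units of the form "minutes since YYYY-MM-01 00:30:00", as in the header shown in the question; the function names are hypothetical.

```python
from datetime import datetime

PREFIX = "minutes since "

def parse_origin(units):
    """Extract the reference datetime from a 'minutes since ...' units string."""
    if not units.startswith(PREFIX):
        raise ValueError("unsupported units: %r" % units)
    return datetime.strptime(units[len(PREFIX):], "%Y-%m-%d %H:%M:%S")

def rebase_minutes(value, units, target_units):
    """Express `value` (given in `units`) relative to the origin of `target_units`."""
    shift = parse_origin(units) - parse_origin(target_units)
    return value + shift.total_seconds() / 60.0

target = "minutes since 1980-01-01 00:30:00"
per_file_units = [
    "minutes since 1980-01-01 00:30:00",
    "minutes since 1980-02-01 00:30:00",
    "minutes since 1980-03-01 00:30:00",
]
# Assumed: every file stores time = 0 in its own units.
rebased = [rebase_minutes(0, units, target) for units in per_file_units]
print(rebased)
```

After re-basing, the three time steps are distinct values in the units of the first file, rather than three copies of the same number.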
You can do this using cdo as well, but you need two steps:
cdo mergetime MERRA2_100.tavgM_2d_aer_Nx.??????.nc4 merged_file.nc
cdo selvar,DUEXTTAU merged_file.nc DUEXTTAU.nc
This should actually work if the begin dates are all set correctly. The problem is that merged_file.nc could actually be massive, and so it may be better to loop through to extract the variable first and then combine:
for file in MERRA2_100.tavgM_2d_aer_Nx.??????.nc4; do
  # strip the trailing .nc4 so the output name matches the mergetime pattern below
  cdo selvar,DUEXTTAU "$file" "${file%.nc4}_duexttau.nc4"
done
cdo mergetime MERRA2_100.tavgM_2d_aer_Nx.??????_duexttau.nc4 DUEXTTAU.nc
rm -f MERRA2_100.tavgM_2d_aer_Nx.??????_duexttau.nc4 # clean up
Does _FillValue or missing_value still occupy storage space?
If there is a 2-dimensional array with some null values, how can I write it to a netCDF file in a way that saves storage space?
In netCDF3 every value requires the same amount of disk space. In netCDF4 it is possible to reduce the required disk space using gzip compression. The actual compression ratio depends on the data. If there are lots of identical values (for example missing data), you can achieve good results. Here is an example in Python:
import os

import netCDF4
import numpy as np

# Define sample data with all elements masked out
N = 1000
data = np.ma.masked_all((N, N))

# Write data to netCDF file using different data formats
for fmt in ('NETCDF3_CLASSIC', 'NETCDF4'):
    fname = 'test.nc'
    ds = netCDF4.Dataset(fname, format=fmt, mode='w')
    xdim = ds.createDimension(dimname='x', size=N)
    ydim = ds.createDimension(dimname='y', size=N)
    var = ds.createVariable(
        varname='data',
        dimensions=(ydim.name, xdim.name),
        fill_value=-999,
        datatype='f4',
        complevel=9,  # set gzip compression level
        zlib=True  # enable compression
    )
    var[:] = data
    ds.close()

    # Determine file size
    print(fmt, os.stat(fname).st_size)
See the netCDF4-python documentation, section 9) "Efficient compression of netCDF variables" for details.
Just to add to the excellent answer from Funkensieper, you can copy and compress files from the command line using cdo:
cdo -f nc4c -z zip_9 copy in.nc out.nc
One could compress files simply using gzip or zip etc, but the disadvantage is that you need to decompress before reading. Using the netcdf4 compression capabilities avoids this.
You can select the compression level X with -z zip_X. If your files are very large, you may want to accept a slightly larger file in return for faster access times (e.g. using zip_5 or zip_6 instead of zip_9). In many cases with heterogeneous data, the compression gain is small relative to the uncompressed file.
Or similarly with NCO:
ncks -7 -L 9 in.nc out.nc
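The claim that identical values compress well while heterogeneous data does not can be checked without any netCDF library. The sketch below uses Python's zlib module directly on raw 4-byte floats; this is the same deflate family of compression but not the netCDF4 code path itself, so the exact ratios are only indicative. The array size and variable names are arbitrary choices.

```python
import array
import random
import zlib

N = 100_000
random.seed(0)  # fixed seed so repeated runs give the same numbers

# One buffer holding only a fill value, one holding varied values.
fill_only = array.array("f", [-9999.0] * N).tobytes()
varied = array.array("f", [random.random() for _ in range(N)]).tobytes()

def ratio(raw, level=9):
    """Compressed size divided by raw size (smaller means better compression)."""
    return len(zlib.compress(raw, level)) / len(raw)

fill_ratio = ratio(fill_only)
varied_ratio = ratio(varied)
print("fill values only: %.4f" % fill_ratio)
print("varied values:    %.4f" % varied_ratio)
```

The fill-only buffer shrinks to a tiny fraction of its raw size, whereas the varied buffer stays close to its original size.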
I am trying to open a file in R, which is binary and written in Fortran. The file is called GlobalLakeDepth.dat and is available at: http://www.flake.igb-berlin.de/gldbv2.tar.gz
The instructions specify that to open GlobalLakeDepth.dat (in Fortran), one would need to do the following:
An example of opening the binary file in FORTRAN90:
-- open(1, file = 'GlobalLakeDepth.dat', form='unformatted', access='direct', recl=2)
An example of reading the binary file in FORTRAN90:
-- read(1,rec=n) LakeDepth
-- where: n - record number, INTEGER(8);
LakeDepth - mean lake depth in decimeters, INTEGER(2).
My question is: Given these instructions in Fortran, how can I open this file in R? That is, is there an 'R way' of doing this?
I've been following the instructions at http://www.ats.ucla.edu/stat/r/faq/read_binary.htm, but am still no closer to getting anything from the data file. All I need is the information provided on the measured lake bathymetry for 36 large lakes.
You can use readBin to read a binary file. For this file, I think the correct command is
lk <- readBin("GlobalLakeDepth.dat", n = 43200 * 21600, what = "integer", endian = "little", size = 2)
This makes a very long vector that could be made into a 43200 * 21600 matrix.
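For comparison, the Fortran direct-access read maps onto plain byte offsets: record n starts at byte (n - 1) * recl. The sketch below shows this in Python with the standard struct module, on a tiny stand-in file rather than the real GlobalLakeDepth.dat. It assumes recl=2 means two bytes (some compilers count record length in 4-byte words), little-endian 2-byte integers as in the readBin call, and row-by-row storage; the function names are hypothetical.

```python
import os
import struct
import tempfile

RECORD_BYTES = 2  # assumption: recl=2 counts bytes

def read_record(path, n):
    """Read record n (1-based, like Fortran's rec=n) as a little-endian 2-byte integer."""
    with open(path, "rb") as fh:
        fh.seek((n - 1) * RECORD_BYTES)
        raw = fh.read(RECORD_BYTES)
    if len(raw) != RECORD_BYTES:
        raise IndexError("record %d is past the end of the file" % n)
    return struct.unpack("<h", raw)[0]

def read_grid(path, n_cols):
    """Read the whole file and split it into rows of n_cols values (row order assumed)."""
    with open(path, "rb") as fh:
        raw = fh.read()
    count = len(raw) // RECORD_BYTES
    values = struct.unpack("<%dh" % count, raw[: count * RECORD_BYTES])
    return [list(values[i:i + n_cols]) for i in range(0, count, n_cols)]

# Tiny stand-in for the data file: six made-up depth values.
sample = [0, 15, -1, 300, 1200, 7]
with tempfile.NamedTemporaryFile(suffix=".dat", delete=False) as tmp:
    tmp.write(struct.pack("<6h", *sample))
    sample_path = tmp.name

fourth = read_record(sample_path, 4)
rows = read_grid(sample_path, n_cols=3)
os.remove(sample_path)
print(fourth, rows)
```

For the real file, n_cols would be 43200 if the grid is stored row by row, which should be checked against the dataset's documentation before reshaping.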