How to reorganize netcdf levels in increasing order - netcdf

I have netcdf data, but with a strange level sequence.
I would like the levels to run from 1000, 900, 800, ... down to 100,
however my data shows 300, 400, 1000, 900, 800, ... 100.
What is the cleanest way to reorder this data into increasing or decreasing order?
P.S. I am considering CDO or NCO, but have not found a good method.
Can anyone help me?

You could try using NCO's ncpdq to reverse the values in the level dimension, something like:
ncpdq -a -level in.nc out.nc
See the ncpdq section of the NCO documentation for more details and examples.
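If you prefer CDO, the invertlev operator reverses the order of the vertical levels; a minimal sketch (note that it reverses the existing order rather than sorting an arbitrary one, and the file names follow the example above):
# Invert the levels of all 3D variables with CDO
cdo invertlev in.nc out.nc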

Related

How to read a stars object with over 32768 bands?

I have a very large dataset consisting of one attribute, simulated daily from 1970 to 2100, defined on a rather fine geographic grid. It has been given to me as a netCDF file, which I would like to read and analyze in an R script. The data is too big to fully fit in memory, so I wrote a script that does the analysis with stars proxy objects and the purrr package. It has worked for similar smaller datasets.
However, this dataset seems too big - there are 45956 bands, one for each time step, and it seems like the read_stars() command has an upper limit on how many bands an object can have. This is what my code looks like after loading the proper libraries, where data_path points to a single .nc file:
data_full <- read_stars(data_path, proxy = TRUE)
It returns the following:
Warning message:
In CPL_read_gdal(as.character(x), as.character(options), as.character(driver), :
GDAL Message 1 : Limiting number of bands to 32768 instead of 45956
The data is then truncated and stops around 2050. I would like to have the full data in the data_full variable. Is it possible to increase the band limit? Or are there alternative ways of doing this?
Try setting GDAL_MAX_BAND_COUNT to 65536
Python:
gdal.SetConfigOption('GDAL_MAX_BAND_COUNT',65536)
bash:
export GDAL_MAX_BAND_COUNT=65536
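Because GDAL also picks up configuration options from environment variables, the bash form can be applied directly to an R session; a sketch (the script name my_script.R is just a placeholder):
# Raise the GDAL band limit for this process only, then run the R analysis;
# GDAL reads GDAL_MAX_BAND_COUNT from the environment
GDAL_MAX_BAND_COUNT=65536 Rscript my_script.R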

Change variable chunk of a Netcdf with R

Regularly I face the same problem when using R to work with big netcdf files (bigger than the computer's memory). There is no obvious way to change the chunking of the data. This is probably the only common netcdf task that I cannot figure out how to do efficiently in R. I usually work around this problem with NCO or nccopy, depending on the situation. Even CDO has options to copy a .nc file while changing the chunking, but it is much less flexible than the previous tools. I am wondering if there is any efficient way to do it in R.
The following example generates a toy .nc file chunked as Chunking: [100,100,1]
library(ncdf4)
foo_nc_path=paste0(tempdir(),"/thing.nc")
xvals <- 1:100
yvals <- 1:100
lon <- ncdim_def("longitude", "Km_east", xvals)
lat <- ncdim_def("latitude", "Km_north", yvals)
time <- ncdim_def("Time","hours", 1:1000, unlim=TRUE)
var<- ncvar_def("foo_var", "nothing", list(lon, lat, time), chunksizes=c(100,100,1),
longname="xy chunked numbers", missval=-9)
foo_nc <- nc_create(foo_nc_path, list(var))
data <- array(runif(100*100*1000),dim = c(100,100,1000))
ncvar_put(foo_nc, var, data)
nc_close(foo_nc)
####Check speed
foo_nc <- nc_open(foo_nc_path)
system.time({timestep <- ncvar_get(foo_nc,"foo_var",start = c(1,1,1),count=c(-1,-1,1))})
system.time({timeserie <- ncvar_get(foo_nc,"foo_var",start = c(1,1,1),count=c(1,1,-1))})
As you can see, the read time is much longer for the time series than for the timestep.
The time difference grows rapidly with the size of the .nc file.
Does anybody know a way to change the chunking of a .nc file in R when the file is bigger than the computer's memory?
It depends on your purpose. If you need to extract/analyze "map-wise" slices (i.e. on the lat-lon matrix), then keep the chunking strategy on the spatial coordinates. However, if you wish to run a time-wise analysis (such as extracting time series of each grid cell to calculate trends), then my advice is to switch your chunking strategy to the time dimension.
Try re-running your code after replacing chunksizes=c(100,100,1) with something like, say, chunksizes=c(10,10,1000). The time series read becomes much faster that way.
If your code is really slow in R, you can try a faster alternative, such as nccopy or NCO.
You can re-chunk your netcdf file using a simple nccopy command like this: nccopy -c time/1000,lat/10,lon/10 input.nc output.chunked.nc
With NCO (which I recommend over nccopy for this operation), you could do something along the lines of:
ncks -O -4 -D 4 --cnk_plc g2d --cnk_dmn lat,10 --cnk_dmn lon,10 --cnk_dmn time,1000 in.nc out.nc
specifying --cnk_dmn for each dimension of interest together with its chunk size. More examples at http://nco.sourceforge.net/nco.html#Chunking.
Either way, you have to play around a little with different chunk sizes in order to determine what works best for your specific case.
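To confirm the new layout, the per-variable chunk sizes of the rewritten file can be inspected with ncdump (a quick check; out.nc follows the examples above):
# -s prints special attributes such as _ChunkSizes for netCDF-4 files, -h prints the header only
ncdump -s -h out.nc | grep _ChunkSizes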

FFmpeg how to get an image for a particular 'coded_picture_number' together with motion vectors

I am looking for a way to get a particular output from ffmpeg:
Basically, I would like to pass a command to the shell that outputs a particular frame, let's say coded_picture_number=200, with the motion vectors drawn into it.
Any clue? Thanks in advance.
This command:
ffmpeg -flags2 +export_mvs -i video.avi -vf 'select=gte(n\,200),codecview=mv=pf+bf+bb' -vframes 1 frame.png
will open video.avi, skip the first 200 frames (n starts from 0), visualize motion vectors (all types), and write exactly 1 frame to frame.png.
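Note that n in the select filter counts frames as they pass through the filter, which is not necessarily the same as coded_picture_number. If the two differ for your file, one way to inspect the mapping (a sketch; the exact fields printed depend on your ffmpeg version) is:
# List each video frame's coded_picture_number and picture type,
# so the right index for the select filter can be identified
ffprobe -select_streams v:0 -show_frames -show_entries frame=coded_picture_number,pict_type -of csv video.avi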

Read large netcdf data by ncl

I'm reading large wrfout data (about 100x100 in space, 30 in the vertical, 400 in time) with NCL.
fid=addfile("wrfout_d03.nc","r")
u=fid->U
The variable U is about 500 MB, so reading it takes a long time, and I also need to read other variables. Is there any way for NCL to read large netcdf data quickly? Or can I use another language?
It may be more helpful to extract the variables and time slices you need before reading them into NCL.
To select by variable:
cdo selvar,var in.nc out.nc
To select by level:
cdo sellevel,levels in.nc out.nc
or by level index:
cdo sellevidx,indices in.nc out.nc
You can also extract subsets in terms of dates or times...
More info here:
https://code.mpimet.mpg.de/projects/cdo/wiki/Cdo#Documentation
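Operators can also be chained, so the subsetting happens in a single pass before NCL ever sees the file; a sketch (the variable name U, the level indices, and the output file name are just examples):
# Keep only variable U at the first five model levels in a smaller file,
# then read the reduced file in NCL
cdo sellevidx,1,2,3,4,5 -selvar,U wrfout_d03.nc wrfout_d03_U_lowlev.nc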

count number of missing values in netcdf file - R

Is there a quick way to know how many missing values are in a netcdf file? Possibly using R.
Currently I have to
hum<-nc_open("rhum.sig995.2008.nc")
rhum<-ncvar_get(hum, "rhum")
then manually look up the missing value by typing 'hum' and copy it into an operation such as
sum(abs(rhum - 9.96920996838687e+36) < 1e+30)
Is there a more direct way, especially if I have to work with hundreds of files? I would like to avoid copying and pasting the missing value, and I am also not sure with what precision the number should be handled.
My suggestion is to use the excellent raster package:
install.packages("raster")
library(raster)
r <- raster("rhum.sig995.2008.nc", varname="rhum")
NAnum <- summary(r)[6]
The total number of missing data points for a variable named "var" can be stored in a new additional variable using
ncap2 -s "nmiss=number_miss(var)" in.nc out.nc
or
ncap2 -s "nmiss=var.number_miss()" in.nc out.nc
If your data has a time dimension and you want to see the number of missing points at each timestep (summed over the spatial dimensions), then you can see this with
cdo info in.nc
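For hundreds of files, the check can be wrapped in a simple shell loop; a minimal sketch (assuming the files sit in the current directory and the variable is called rhum):
# Print per-field statistics, including the missing-value count, for the
# rhum variable of every netCDF file in the directory
for f in *.nc; do
    echo "== $f =="
    cdo info -selname,rhum "$f"
done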
