Problems plotting NetCDF files in Panoply

I can't create a plot from .nc files using Panoply. The error message reads: "Axis contains NaN". Is there a workaround?

Change the NaNs to normal "missing values", then use Panoply.
The netCDF Operators (NCO) command ncatted works with NaNs.
First set the missing value (i.e., the value of the _FillValue
attribute) for the variable(s) in question to the IEEE NaN value:
ncatted -a _FillValue,,o,f,NaN in.nc
Then change the missing value from the IEEE NaN value to a
normal IEEE number, like 1.0e36 (or to whatever the original
missing value was):
ncatted -a _FillValue,,m,f,1.0e36 in.nc
Now use Panoply. More at
http://nco.sf.net/nco.html#NaN
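If you would rather do the replacement yourself before opening the file in Panoply, the substitution the two ncatted steps perform is easy to sketch in plain Python. This is only an illustration of the idea (the fill value 1.0e36 follows the answer above; with a real file you would write the modified array back with a netCDF library):

```python
import math

FILL_VALUE = 1.0e36  # conventional missing value; use the file's original _FillValue if it had one

def replace_nans(values, fill=FILL_VALUE):
    """Replace IEEE NaNs with a normal fill value so plotting tools treat them as missing."""
    return [fill if math.isnan(x) else x for x in values]

data = [1.0, float('nan'), 3.5, float('nan')]
cleaned = replace_nans(data)  # [1.0, 1e+36, 3.5, 1e+36]
```

The point of the comparison trick is that NaN compares unequal to everything, including itself, so it has to be detected with math.isnan rather than with ==.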

Related

ArgumentError: quantiles are undefined in presence of NaNs or missing values

I would like to create a boxplot that contains some missing values in Julia. Here is some reproducible code:
using DataFrames
using StatsPlots
df = DataFrame(y = [1,2,3,2,1,2,4,NaN,NaN,2,1])
boxplot(df[!, "y"])
Output:
ArgumentError: quantiles are undefined in presence of NaNs or missing values
I know that the error happens because of the NaN values, but is there not an option in boxplot to plot the values anyway, instead of removing the missing values beforehand? I would assume it might be designed to work in the presence of missing values. In R it will still plot the boxplot, so I was wondering why in Julia you must remove these missing values, and what is an appropriate way to do this?
so I was wondering why in Julia you must remove these missing values
The general reason is a difference in design philosophy between R and Julia.
R was designed to be maximally convenient, at the risk of sometimes doing the wrong thing. It tries to guess what you most likely want and does that. In this case, you most likely want NaN values to be ignored.
Julia is designed for safety and production use. If you have NaN in your data, it means the data preparation process had a serious issue (like dividing 0 by 0). In production scenarios you want your code to error in such cases, because otherwise it is hard to identify the root cause of the problem.
Now, seconding what Dan Getz commented: most likely your NaN is actually missing (you even refer to it as missing). The two should not be mixed up, as they have significantly different interpretations. NaN is a value that is undefined or unrepresentable, especially in floating-point arithmetic (e.g. 0 divided by 0), while missing is a value that is absent (e.g. a measurement we have not collected).
Still, even if your data contained missing you would get an error, for the same safety reason.
what is an appropriate way to do this?
NaNs are very rare in practice, so what Dan Getz recommended is a typical way to filter them. Another is [x for x in df.y if !isnan(x)].
If you had missing values in your data (as this is most likely what you want) you should write boxplot(skipmissing(df.y)).

Making shapefiles with zero values eliminated

I am trying to make a shapefile where the zero values are taken out in order to reduce the amount of space used. I attempted this with the following procedure, but only got more zero values displayed:
cdo gtc,0.000008 precip_2022110815_0p50.nc mask1.nc
cdo setctomiss,0 mask1.nc mask2.nc
cdo mul precip_2022110815_0p50.nc mask2.nc precip_2022110815_0p50_adjust.nc
cdo setctomiss,0 precip_2022110815_0p50_adjust.nc precip_2022110815_0p50_final.nc
gdal_contour -a precip -i 0.00001 precip_2022110815_0p50_final.nc precip_2022110815_0p50.shp
I got the netCDF file from a GRIB file that was obtained from ftp.ncep.noaa.gov.
Is there any way I could tweak this code, or another method I could use, to get a shapefile where all zero values are filtered out, or even where values below a certain threshold are filtered out? Is there a way to filter values below a certain amount out of a GRIB2 file?

How to remove the scaling and offset attributes of a variable in a netCDF file to get the actual data values

I have a variable which has the dimensions time x lat x lon x levels and I am trying to read this in my global climate model. The problem I am facing is that the data is scaled and offset in the original file and it is a hassle to incorporate this inside the model. I want to modify the file so that the original data without any scaling or offset is available to read in the climate model.
You can use the following command to unpack the scale and offset in the data:
ncpdq --unpack input_file.nc out_file.nc
The file out_file.nc will contain the actual values without any scaling or offset.
I think this should do the trick:
cdo -b 32 copy input_file.nc out_file.nc
or this
cdo -b f32 copy input_file.nc out_file.nc
Essentially this unpacks the data by converting it to a 32-bit float. If you want a 64-bit double-precision float, use 64 instead of 32.
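Both ncpdq and the cdo copy undo the standard netCDF packing convention, in which unpacked = packed * scale_factor + add_offset (scale_factor and add_offset being the attribute names the convention defines). If you ever need to apply it by hand, for instance when reading the packed integers directly, the arithmetic is just:

```python
def unpack(packed, scale_factor, add_offset):
    """Undo netCDF-style packing: unpacked = packed * scale_factor + add_offset."""
    return packed * scale_factor + add_offset

# Made-up example: a value stored as the integer 20 with scale 0.5 and offset 100.0
print(unpack(20, 0.5, 100.0))  # 110.0
```

The example numbers are invented; a real file stores its own scale_factor and add_offset as attributes on each packed variable.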

netcdf dimension variable interpretation

I'm trying to understand whether this is allowed by the NetCDF standard. It does not make sense to me, but maybe there is a reason why it is not forbidden at the library level. Ncdump output:
netcdf tt {
dimensions:
one = 2 ;
two = 1 ;
variables:
int64 one(two) ;
data:
one = 1 ;
}
And code to produce this file in python:
from netCDF4 import Dataset

rr = Dataset('tt.nc', 'w')
rr.createDimension('one', 2)
rr.createDimension('two', 1)
var1 = rr.createVariable('one', 'i8', ('two',))  # variable 'one' defined on dimension 'two'
var1[:] = 1
rr.close()
Note the variable with the same name as a dimension, but defined on a different dimension than its namesake?!
So two questions:
is this allowed by standard?
if not, should it be restricted by libraries?
It's valid because the names of attributes, names of dimensions, and names of variables all exist in different namespaces.
It's valid, but it obviously makes for confusing code and output, and would not be acceptable in a professional sense. Note, though, that one-dimensional variables that have the same name (and size) as the dimension they are defined on are called "coordinate variables."
For example, you'll often see a variable named latitude that is 1D and has a dimension named latitude. ncks or ncdump should show a (CRD) next to that variable, indicating that it is indeed the coordinate array of latitudes.
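The namespace point can be illustrated without netCDF at all: dimension names and variable names are looked up in separate tables, so the same name can live in both without collision. Here is a rough Python analogy of the tt.nc file above (plain dicts, purely illustrative, not the netCDF4 internals):

```python
# Dimension names and variable names live in separate namespaces.
dimensions = {'one': 2, 'two': 1}                     # dimension name -> length
variables = {'one': {'dims': ('two',), 'data': [1]}}  # variable name -> definition

# The variable 'one' is defined on dimension 'two'; dimension 'one' is untouched.
assert variables['one']['dims'] == ('two',)
assert dimensions['one'] == 2

# A proper coordinate variable shares its dimension's name AND is defined on it:
latitudes = {'latitude': {'dims': ('latitude',), 'data': [-90.0, 0.0, 90.0]}}
assert latitudes['latitude']['dims'] == ('latitude',)
```

Because the lookups never mix, nothing in the data model forces a variable named 'one' to have anything to do with the dimension named 'one'; the coordinate-variable pattern is a convention layered on top, not a rule.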

How to conditionally plot in gnuplot with missing or invalid data?

In gnuplot (I'm using 5.1 CVS) one can specify missing data (set datafile missing '?', for example), and gnuplot also knows about invalid data (like NaN or 1/0).
How can I conditionally react to them? If my data has one of them I sometimes (i.e. on some columns, but not on all) want to do something else instead of just skipping them. So, basically I want to say (pseudocode)
plot 'datafile' using 1:($2 = MISSING ? $3+$4 : $2 )
I can use strcol(2) to check the column content, but this does not work for the string specified by set datafile missing '?': the missing string seems to take precedence, so gnuplot stops handling the data point before it even gets to evaluating strcol(2).
My data can have missing entries in several columns. If the value is missing in column 2, for example, I just want a gap in the data (as with invalid data), but if it is missing in column 3 I want to plot something else instead and not leave a gap.
For invalid data (like the pre-defined NaN) this works perfectly fine. It is skipped when appearing in the data, but I can also react to it by saying strcol(2) == 'NaN' ? $3+$4 : $2. So for invalid data, gnuplot first evaluates strcol() if (and only if) it is there.
I can simulate this behaviour by using two "missing" characters: one for set datafile missing and another for the strcol() checks. But this is an ugly workaround; I would have to edit my data files and replace half of the missing characters by hand. Is there a way to also handle missing data conditionally, the way one can handle invalid data?
