NCO combine variables into 3rd dimension - netcdf

I have a NetCDF file with the following variables:
Albers
Band1
Band2
Band3
x
y
time
Each band has dimensions (x, y). I want to combine the bands into a single variable with dimensions (x, y, time).
Is that possible with NCO or another library, either in Python or on the command line?
Thanks

Try NCO's ncap2. If time is size 3, and each Band is interpreted as the band at a different time, then something like this will work:
ncap2 -s 'band=Band1*time;band[:,:,1]=Band2;band[:,:,2]=Band3' in.nc out.nc
If instead the Bands are independent of time, then you must add a new dimension of size 3 to accommodate the bands, e.g.,
ncap2 -s 'defdim("band",3);Band[$x,$y,$band,$time]=0;Band[:,:,0,:]=Band1;....' in.nc out.nc
Good luck!
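Since the question also mentions Python, here is a minimal sketch of the same idea with xarray. It assumes, as above, that time has length 3 and that each band corresponds to one time step; the variable and dimension names are taken from the question, so check them against ncdump -h first.
import xarray as xr

ds = xr.open_dataset("in.nc")

# Stack the three 2-D bands along a new "time" dimension (length 3).
band = xr.concat([ds["Band1"], ds["Band2"], ds["Band3"]], dim="time")
band = band.transpose(..., "time")   # put time last, i.e. (y, x, time)

ds["band"] = band                    # picks up the dataset's existing time coordinate
ds.to_netcdf("out.nc")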

Related

Reducing number of datapoints when plotting in loglog scale in Gnuplot

I have a large dataset which I need to plot in loglog scale in Gnuplot, like this:
set log xy
plot 'A_1D_l0.25_L1024_r0.dat' u 1:($2-512)
(The original post links to a log-log plot of the data points and to a text file with the data points.)
The data points on the x-axis are equally spaced, but because of the log scale they become very dense on the right-hand part of the graph, and as a result the output file (I eventually export it to .tex) gets very large.
In linear scale I would simply use the every option to reduce the number of points that get plotted. Is there a similar option for log-log scale, such that the plotted points appear equally spaced?
I am aware of a similar question which was raised a few years ago, but in my opinion the solution is unsatisfactory: the plotted points are not equally spaced along the x-axis. I think this is a fairly basic problem which deserves a cleaner solution.
As I understand it, you don't want to plot the actual data points; you just want to plot a line through them. But you want to keep the appearance of points rather than a line. Is that right?
set log xy
plot 'A_1D_l0.25_L1024_r0.dat' u 1:($2-512) with lines dashtype '.' lw 2
Amended answer
If it is important to present outliers/errors in the data set, then you must not use every or any other technique that simply discards or skips most of the data points. In that case I would prefer the plot with points that you show in the original question, perhaps modified to represent each point as a dot rather than a cross. I will simulate this by modifying a single point in your 500,000-point data set (first figure below). But I would also suggest that the presence of outliers is even more apparent if you plot with lines (second figure below).
Showing error bounds is another alternative for noisy data, but the options depend on what you have to work with in your data set. If you want to pursue that, please ask a separate question.
If you really want to reduce the number of data to be plotted, you might consider the following script.
s = 0.1         ### sampling interval in log scale
                ### (try 0.05 for more detail)
c = log10(0.01) ### a parameter used in sampler(x), which should be
                ### initialized to a value smaller than log10 of any x in the data
sampler(x) = (x>0 && log10(x)>=c) ? (c=ceil(log10(x)/s+0.5)*s, x) : NaN
set log xy
set grid xtics
plot 'A_1D_l0.25_L1024_r0.dat' using (sampler($1)):($2-512) with points pt 7 lt 1 notitle , \
'A_1D_l0.25_L1024_r0.dat' using 1:($2-512) with lines lt 1 notitle
This script samples the data in increments of roughly 0.1 on the x-axis in log scale. It makes use of the fact that points whose x value evaluates to NaN in using are not drawn.
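If you would rather thin the data before plotting, here is a rough Python/numpy sketch of the same idea, keeping about one point per 0.1 decade in x (the file name is the one from the question):
import numpy as np

# Keep roughly one point per 0.1 decade in x, then plot the thinned
# file in gnuplot exactly as before.
data = np.loadtxt("A_1D_l0.25_L1024_r0.dat")
x = data[:, 0]

mask = x > 0
bins = np.floor(np.log10(x[mask]) / 0.1)        # 0.1-decade bins in log10(x)
_, keep = np.unique(bins, return_index=True)    # first point in each bin

np.savetxt("thinned.dat", data[mask][keep])
Then plot 'thinned.dat' u 1:($2-512) gives roughly equally spaced points on the logarithmic x-axis.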

Calculating weighted spatial global annual averages across grid cells using netcdf dataset in R

I am currently working on a project that involves climate model data stored in a NetCDF file. I am trying to calculate "weighted" spatial annual "global" averages for precipitation. I need to do this for each of the 95 years of global precipitation data that I have. The idea would be to apply weights to each grid cell using the cosine of its latitude, so grid cells at the equator get a weight of 1 (the cosine of 0 degrees is 1) and grid cells at the poles get a weight of 0 (the cosine of 90 degrees is 0). Then I would be in a position to calculate annual weighted averages by averaging across all grid cells.
I have an idea of how to do this conceptually, but I am not sure where to begin writing a script in R to apply the weights across all grid cells and then average these for each of the 95 years. I would greatly appreciate any help with this, or any resources that may be helpful!
At the very least, I have opened the .nc file and read in the NetCDF variables, as shown below:
library(raster)
library(ncdf4)

ncfname <- "MaxPrecCCCMACanESM2rcp45.nc"
Prec <- raster(ncfname)
print(Prec)
Model <- nc_open(ncfname)
get <- ncvar_get(Model, "onedaymax")
longitude <- ncvar_get(Model, "lon")
latitude <- ncvar_get(Model, "lat")
Year <- ncvar_get(Model, "Year")
Additionally, let's say that I wanted to create a time series of these newly derived weighted averages for a specific location or region. The following code, which I previously used to show trends over the 95 years for one-day maximum precipitation, works, but would I just need to change it slightly to use the annual weighted means?
library(ggplot2)

r_brick <- brick(get, xmn=min(latitude), xmx=max(latitude), ymn=min(longitude),
                 ymx=max(longitude),
                 crs=CRS("+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs +towgs84=0,0,0"))
r_brick <- flip(t(r_brick), direction='y')
randompointlon <- 13.178
randompointlat <- -59.548
Random <- extract(r_brick,
                  SpatialPoints(cbind(randompointlon, randompointlat)), method='simple')
df <- data.frame(year=seq(from=1, to=95, by=1), Precipitation=t(Random))
ggplot(data=df, aes(x=year, y=Precipitation, group=1)) +
  geom_line() +
  ggtitle("One-day maximum precipitation (mm/day) trend for Barbados for CanESM2 RCP4.5")
Also, if it helps, here is what the .nc file contains:
3 variables (excluding dimension variables):
    double onedaymax[lon,lat,time]   (Contiguous storage)
        units: mm/day
    double fivedaymax[lon,lat,time]  (Contiguous storage)
        units: mm/day
    short Year[time]                 (Contiguous storage)
3 dimensions:
    time  Size:95
    lat   Size:64
        units: degree North
    lon   Size:128
        units: degree East
Again, any assistance would be extremely valuable with this! I look forward to your response!
Please ask one clear question at a time, and provide example data (through code).
I do not think you are going about reading the NetCDF data the right way. I think you should do:
library(raster)
ncfname <- "MaxPrecCCCMACanESM2rcp45.nc"
Prec <- brick(ncfname, var="onedaymax")
(do not use nc_open etc.)
To get a global weighted average:
Example data
library(raster)
r <- abs(init(raster(), 'y'))
s <- stack(r, r, r)
s is a RasterStack with value 90 at the poles and 0 at the equator
The unweighted global mean. First average the layers, then the cells (reverse order would also work in this case)
sm <- mean(s, na.rm=TRUE)
cellStats(sm, mean, na.rm=TRUE)
[1] 45
Now use weighting (you get a lower number, as the high latitudes get less weight)
# raster with latitude cell values
w <- init(s, 'y')
# cosine after transforming to radians
w <- cos(w * (pi/180))
# multiply weights with values
x <- sm * w
# compute weighted average
cellStats(x, sum) / cellStats(w, sum)
#[1] 32.70567
An alternative, and perhaps simpler, solution uses the area of each cell (which is proportional to cos(lat)). The result is perhaps a little more precise, as area() does not only consider the cell center.
a <- area(s) / 10000
y <- sm * a
cellStats(y, sum) / cellStats(a, sum)
#[1] 32.72697
Later:
For a time series, just use s.
unweighted
cellStats(s, mean)
#layer.1 layer.2 layer.3
# 45 45 45
weighted
a <- area(s) / 10000
y <- s * a
cellStats(y, sum) / cellStats(a, sum)
# layer.1 layer.2 layer.3
#32.72697 32.72697 32.72697
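For completeness, outside R the same cos(latitude) weighting can be sketched in Python with xarray; the variable and coordinate names (onedaymax, lat, lon) are taken from the file header shown in the question, so verify them before use, and this assumes there are no missing values:
import numpy as np
import xarray as xr

ds = xr.open_dataset("MaxPrecCCCMACanESM2rcp45.nc")
pr = ds["onedaymax"]                      # dims (lon, lat, time) per the header

w = np.cos(np.deg2rad(ds["lat"]))         # cos(latitude) weights
num = (pr * w).sum(dim=("lat", "lon"))    # weighted sum over all cells
den = w.sum() * pr.sizes["lon"]           # sum of weights over all cells
global_mean = num / den                   # one weighted global mean per time step (95 values)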
Not that I want to pull you away from R, but this sort of calculation is the absolute bread and butter of CDO (Climate Data Operators), straight from the command line!
Calculate the spatially weighted mean (this weights by grid-cell area, essentially the cos(latitude) factor discussed above, and can also handle reduced Gaussian grids etc.):
cdo fldmean input.nc fldmean.nc
Calculate the annual mean
cdo yearmean input.nc yearmean.nc
Calculate a time series of annual global means by combining the two (i.e. use fldmean.nc as the input for the second command), or you can do it on one line by piping:
cdo yearmean -fldmean input.nc yearglobal.nc
What's that? You want to calculate it for a lat-lon box region you say, not global averages? No problem, use sellonlatbox first to cut out an area
cdo sellonlatbox,lon1,lon2,lat1,lat2 in.nc out.nc
so piping this:
cdo yearmean -fldmean -sellonlatbox,lon1,lon2,lat1,lat2 in.nc yearregion.nc
But wait! You now want a specific location, not a region average? Well, you can pick out the nearest grid box to a location with
cdo remapnn,lon=mylon/lat=mylat in.nc out.nc
so you can get your series of annual averaged values there with:
cdo yearmean -remapnn,lon=mylon/lat=mylat in.nc yearmylocation.nc
The possibilities are many... install it with
sudo apt install cdo
and take a look at the documentation here: https://code.mpimet.mpg.de/projects/cdo/wiki/Cdo#Documentation
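If you want to drive CDO from a script rather than the shell, a minimal sketch is to call the same commands via Python's subprocess module (the file names are placeholders, and cdo must be on your PATH):
import subprocess

# Time series of annual, area-weighted global means, as in the piped
# command above: cdo yearmean -fldmean input.nc yearglobal.nc
subprocess.run(["cdo", "yearmean", "-fldmean", "input.nc", "yearglobal.nc"],
               check=True)
There are also Python bindings for CDO (the python-cdo package) if you prefer to avoid shelling out; the subprocess call above only uses the operators documented here.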

Creating a 3D variable from a 1D variable with NCO

I have a 1D variable describing height in my NetCDF file. I'd like to create a 3D variable from this column of data that is uniform over the X and Y axes. Is there an easy way to do this in NCO?
If the 1-D axes are named x, y, and z, then construct a 3D field containing z that is uniform over x and y with ncap2:
ncap2 -s 'z_3D[z,y,x]=z' in.nc out.nc
Pretty cool, huh?
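If you would rather do it in Python, a minimal xarray sketch of the same broadcast (assuming the file really has 1-D coordinate variables named x, y, and z, as above) is:
import xarray as xr

ds = xr.open_dataset("in.nc")

# Broadcast the 1-D z variable against the y and x coordinates;
# the result has dimensions (z, y, x) and is constant over x and y.
z_3d, _, _ = xr.broadcast(ds["z"], ds["y"], ds["x"])
ds["z_3D"] = z_3d
ds.to_netcdf("out.nc")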

DBSCAN for clustering data by location and density

I'm using the method dbscan::dbscan in order to cluster my data by location and density.
My data looks like this:
str(data)
'data.frame': 4872 obs. of 3 variables:
$ price : num ...
$ lat : num ...
$ lng : num ...
Now I'm using following code:
library(dbscan)

EPS <- 7
cluster.dbscan <- dbscan(data, eps = EPS, minPts = 30, borderPoints = TRUE,
                         search = "kdtree")
plot(lat ~ lng, data = data, col = cluster.dbscan$cluster + 1L, pch = 20)
but the result isn't satisfying at all; the points aren't really clustered.
I would like to have the clusters nicely defined, something like this:
I also tried to use a decision tree classifier (tree::tree), which works better, but I can't tell if it is really a good classification.
File:
http://www.file-upload.net/download-11246655/file.csv.html
Questions:
Is it possible to achieve what I want?
Am I using the right method?
Should I play more with the parameters? If yes, with which ones?
This is the output of a careful density-based clustering using the quite new HDBSCAN* algorithm.
Using Haversine distance, instead of Euclidean!
It identified some 50-something regions that are substantially more dense than their surroundings. In this figure, some clusters look as if they had only 3 elements, but they do have many more.
The outermost area contains the noise points, which do not belong to any cluster at all!
(Parameters used: -verbose -dbc.in file.csv -parser.labelIndices 0,1 -algorithm clustering.hierarchical.extraction.HDBSCANHierarchyExtraction -algorithm SLINKHDBSCANLinearMemory -algorithm.distancefunction geo.LatLngDistanceFunction -hdbscan.minPts 20 -hdbscan.minclsize 20)
OPTICS is another density-based algorithm, here is a result:
Again, we have a "noise" area whose red dots are not dense at all.
Parameters used: -verbose -dbc.in file.csv -parser.labelIndices 0,1 -algorithm clustering.optics.OPTICSXi -opticsxi.xi 0.1 -algorithm.distancefunction geo.LatLngDistanceFunction -optics.minpts 25
The OPTICS plot for this data set looks like this:
You can see there are many small valleys that correspond to clusters. But there is no "large" structure here.
You probably were looking for a result like this:
But in fact, this is a meaningless and rather random way of breaking the data into large chunks. Sure, it minimizes variance; but it does not at all care about the structure of the data. Points within one cluster will frequently have less in common than points in different clusters. Just look at the points at the border between the red, orange, and violet clusters.
Last but not least, the old-timers: hierarchical clustering with complete linkage:
and the dendrogram:
(Parameters used: -verbose -dbc.in file.csv -parser.labelIndices 0,1 -algorithm clustering.hierarchical.extraction.SimplifiedHierarchyExtraction -algorithm AnderbergHierarchicalClustering -algorithm.distancefunction geo.LatLngDistanceFunction -hierarchical.linkage CompleteLinkageMethod -hdbscan.minclsize 50)
Not too bad. Complete linkage works on such data rather well, too. But you could merge or split any of these clusters.
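If you prefer to stay in a scripting language rather than ELKI, the same idea of clustering on geographic (haversine) distance can be sketched with scikit-learn's DBSCAN. This is a different implementation than the runs shown above; the lat/lng column names are taken from your str(data) output, and the eps value is just a placeholder to tune:
import numpy as np
import pandas as pd
from sklearn.cluster import DBSCAN

data = pd.read_csv("file.csv")                     # expects lat and lng columns
coords = np.radians(data[["lat", "lng"]].to_numpy())

earth_radius_m = 6371000.0
eps_m = 500.0                                      # neighbourhood radius in metres (tune this)
db = DBSCAN(eps=eps_m / earth_radius_m, min_samples=30,
            metric="haversine", algorithm="ball_tree").fit(coords)
data["cluster"] = db.labels_                       # -1 marks noise points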
You can also use hullplot (from the dbscan package) to draw the clusters.
In your case:
hullplot(select(data, lng, lat), cluster.dbscan$cluster)

Gnuplot: plot with circles of a defined radius

I know that in gnuplot you can plot data with circles as the plot points:
plot 'data.txt' using 1:2 ls 1 with circles
How do I then set the size of the circles? I want to plot several sets of data, each with a different circle size.
If your data has a third column, it specifies the radius of the circles. In your case, you could give the third column the same value for all points in each data set. For example:
plot '-' with circles
1 1 0.2
e
will plot a circle at (1,1) with radius 0.2. Note that the radius is in the same units as the data. (The special file name '-' lets you input data directly; typing 'e' ends the input. Type help special at the gnuplot console for more info.)
You can look here for more ideas of how to use circles.
I used:
plot "file" using 1:2:($2*0+10) with circles
This fakes the third column that specifies the sizes. It is probably possible to write it more simply, but this worked for me.

Resources