Is there a way to read a raw netcdf file and tell what layer a value belongs to? - netcdf

I'm evaluating how successful a script I wrote is. The quick and dirty method I've employed is to look at the first few and last few values of a single variable and do a few calculations with them based on the same values in another netcdf file.
I know that there are better ways to approach this but again, this is a really quick and dirty method that has worked for me so far. My question is: when looking at the raw data through ncdump, is there a way to tell which vertical layer that data belongs to? In my example, the file has 14 layers. I'm assuming that the first few values are part of the surface layer and the last few values are part of the top layer, but I suspect that this assumption is wrong, at least in part.
As a follow-up question, what would then be the easiest 'proper' way to tell what layer data belongs to? Thank you in advance!

ncview (a graphical browser) and NCO (a set of command line operators) are both very powerful and quick tools for viewing data inside a netcdf file.
ncview: http://meteora.ucsd.edu/~pierce/ncview_home_page.html
NCO: http://nco.sourceforge.net/
You can easily show variables over all layers for example with
ncks -d layer,0,13 some_infile.nc

ncdump dumps the data with the last dimension varying fastest (http://www.unidata.ucar.edu/software/netcdf/docs/netcdf/CDL-Syntax.html) so if 'layer' is the slowest/first dimension, the earlier values are all in the first layer, while the last few values are in the last layer.
As to whether the first layer is the top or bottom layer, you'd have to look to the 'layer' dimension and its data.
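To make the ordering concrete, here is a minimal Python sketch that maps a position in the ncdump listing back to a layer index. It assumes a hypothetical variable of shape (layer, y, x) = (14, 3, 4); check the real dimension order in the ncdump header before relying on it.

```python
import numpy as np

# Hypothetical shape, ordered as in the ncdump header: (layer, y, x)
shape = (14, 3, 4)

def layer_of(flat_position, shape, layer_axis=0):
    """Return the layer index of the value at a 0-based position in the
    ncdump data listing (last dimension varies fastest, i.e. C order)."""
    return int(np.unravel_index(flat_position, shape)[layer_axis])

n_values = int(np.prod(shape))
print(layer_of(0, shape))             # very first value in the listing
print(layer_of(n_values - 1, shape))  # very last value in the listing
```

If 'layer' is not the first dimension, pass the matching layer_axis; in that case consecutive values can belong to different layers.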

Related

How to identify each file of origin when concatenating many netcdf files with ncrcat?

I am concatenating 1000s of nc-files (outputs from simulations) to allow me to handle them more easily in Matlab. To do this I use ncrcat. The files have different sizes, and the time variable is not unique between files. The concatenate works well and allows me to read the data into Matlab much quicker than individually reading the files. However, I want to be able to identify the original nc-file from which each data point originates. Is it possible to, say, add the source filename as an extra variable so I can trace back the data?
Easiest way: Online indexing
Before we start, I would use an integer index rather than the filename to identify each run, as it is a lot easier to handle, both for writing and then for handling in the Matlab programme. Rather than a simple monotonically increasing index, the identifier can have relevance for your run, or you can even write several separate indices if necessary (e.g. you might have a number for the resolution, the date, the model version etc).
So, the obvious way to do this that I can think of would be that each simulation writes an index to the file to identify itself. i.e. the first model run would write a variable
myrun=1
the second
myrun=2
and so on... then when you cat the files the data can be uniquely identified very easily using this index.
Note that if, as you describe, your spatial dimensions are not unique and the number of time steps also changes from run to run, your index will need to be a function of all the non-unique dimensions, e.g. myrun(x,y,t). If any of your dimensions are unique across all files then that dimension is redundant in the index and can be omitted.
Of course, the only issue with this solution is it means running the simulations again :-D and you might be talking about an expensive model to run or someone else's runs you can't repeat. If rerunning is out of the question you will need to try to add an index offline...
Offline indexing (easy if grids are same, more complex otherwise)
IF your space dimensions were the same across all files, then this is still an easy task as you can add an index offline very easily across all the time steps in each file using nco:
ncap2 -s 'myrun[$time]=array(X,0,$time)' infile.nc outfile.nc
or if you are happy to overwrite the original file (be careful!)
ncap2 -O -s 'myrun[$time]=array(X,0,$time)' infile.nc infile.nc
where X is the run number. This adds a new variable myrun, which is a function of time, and writes X at each time step. When you merge, you can see which data slice came from which run.
By the way, the second argument (the zero) is the increment. Because it is set to zero, the number X is written for all timesteps in a given file. If it were 1, the index would increase by one each timestep, which could be useful in some cases: for example, you might use two indices, one with an increment of zero to identify the run, and a second with an increment of unity to tell you which step of the Xth run the data slice belongs to.
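To illustrate what the two kinds of index look like after concatenation, here is a small numpy sketch. The run_index helper only mimics the start/increment behaviour of array() described above, and the run numbers and step counts are made up.

```python
import numpy as np

def run_index(start, increment, n_steps):
    """Mimic array(start, increment, $time): start, start+inc, start+2*inc, ..."""
    return start + increment * np.arange(n_steps)

# Hypothetical runs: run number -> number of time steps in that file
steps_per_run = {1: 3, 2: 2, 3: 4}

# Increment 0: the run number repeated for every time step of that run
myrun = np.concatenate([run_index(x, 0, n) for x, n in steps_per_run.items()])
# Increment 1: a step counter that restarts in every run (starting at 0 here)
mystep = np.concatenate([run_index(0, 1, n) for n in steps_per_run.values()])

print(myrun.tolist())   # [1, 1, 1, 2, 2, 3, 3, 3, 3]
print(mystep.tolist())  # [0, 1, 2, 0, 1, 0, 1, 2, 3]

# Positions along the concatenated time axis that came from run 2
print(np.flatnonzero(myrun == 2).tolist())  # [3, 4]
```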
If your files are for different domains too, then you might want to put them on a common grid before you do that... I think for that
cdo enlarge
might be of help, see this post : https://code.mpimet.mpg.de/boards/2/topics/1459
I agree that an index will be simpler than a filename. I would just add to the above answer that the command to add a unique index X with a time dimension to each input file can be simplified to
ncap2 -s 'myrun[$time]=X' in.nc out.nc
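With thousands of files you would want to generate these commands rather than type them. The following is only a sketch: it builds one ncap2 command line per file, using the simplified expression above, and keeps a lookup table from index back to filename. The file names and the indexed_ output prefix are hypothetical, and the commands are printed rather than executed.

```python
import shlex

def build_index_commands(filenames):
    """Return (lookup, commands): run index -> source file, plus one
    ncap2 command line per file that writes that index as myrun(time)."""
    lookup = {}
    commands = []
    for run_id, name in enumerate(sorted(filenames), start=1):
        lookup[run_id] = name
        out_name = f"indexed_{name}"  # hypothetical naming; bare filenames only
        cmd = ["ncap2", "-s", f"myrun[$time]={run_id}", name, out_name]
        commands.append(shlex.join(cmd))  # quotes the expression for the shell
    return lookup, commands

lookup, commands = build_index_commands(["run_b.nc", "run_a.nc"])
for line in commands:
    print(line)
```

Saving lookup alongside the concatenated file lets you translate a myrun value back to the original filename later.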

R save as NetCDF file after simple calculation

I want to do something (apparently) simple, but didn't yet find the right way to do it:
I read a netcdf file (wind speed from the ERA5 reanalysis) on a grid.
From this, I use the wind speed to calculate a wind capacity factor (using a given power curve).
I then want to write a new netcdf file, with exactly the same structure as the input file, but just replacing the input wind speed by the new variable (wind capacity factor).
Is there a simple/fast way to do this, avoiding to redefine all the dims, vars ... with ncvar_def and ncdim_def ?
Thanks in advance for your replies!
Writing a netcdf file in R is not overly complicated, there is a nice example online here:
http://geog.uoregon.edu/GeogR/topics/netCDF-write-ncdf4.html
You could copy the dimensions from the input file.
However if your wind power curve is a simple analytical expression then you could perform this task in one line from the command line in bash/linux using climate data operators (cdo).
For example, if you have two variables 10u and 10v in the file (I don't recall the reanalysis names exactly) then you could make a new variable WCF=SQRT(U^2+V^2) in the following way
cdo expr,'wcf=sqrt(10u^2+10v^2)' input.nc output.nc
See an example here:
https://code.mpimet.mpg.de/boards/53/topics/1622
So if your wind power function is an analytical expression you can define it this way without using R at all or worrying about dimensions etc; the new file will have a variable wcf added. You should then probably use NCO to alter the metadata (units etc) to ensure they are appropriate.
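If the power curve is tabulated rather than analytical, the calculation itself is still short. Here is a minimal numpy sketch of the arithmetic only (no file I/O): the power curve points, the cut-out speed and the u/v sample values are all invented for illustration.

```python
import numpy as np

# Hypothetical power curve: wind speed (m/s) -> fraction of rated power
curve_speed = np.array([0.0, 3.0, 12.0, 25.0])
curve_cf = np.array([0.0, 0.0, 1.0, 1.0])
cut_out = 25.0  # assumed cut-out speed; output is zero above it

def capacity_factor(u, v):
    """Wind speed from the u/v components, then a linearly interpolated
    capacity factor from the tabulated curve."""
    speed = np.hypot(u, v)  # sqrt(u^2 + v^2)
    cf = np.interp(speed, curve_speed, curve_cf)
    return np.where(speed > cut_out, 0.0, cf)

u = np.array([0.0, 3.0, 4.5, 30.0])
v = np.array([0.0, 4.0, 6.0, 0.0])
wcf = capacity_factor(u, v)
print(wcf)
```

The same function works unchanged on full gridded arrays, since every step is element-wise.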

Masking a variable in GrADS

I am trying to plot a variable that is in a NetCDF file using GrADS and I would like to plot only the values that are smaller than -20 (could be any other number as an example). I can't find a way to do it though. I saw several examples of variable substitution using both maskout and const (for example define ones = const(const(maskout(p,p-10),1),0,-u), which is here) but I couldn't make that work for my purposes.
I want the variable at a given point not plotted in case its value is below -20, and not this value change to another one in such point.
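Solved it using the command maskout(p,p+20). maskout(expr,mask) sets expr to missing wherever mask is negative, so p+20 hides every point where p is below -20 and leaves the other values unchanged. The badly written documentation is what delayed me on this one. Answered here so that the community may benefit.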
Thanks.
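For anyone who finds the maskout logic easier to follow outside GrADS, here is a rough numpy analogue. It is a sketch of the masking rule only (values are hidden where the mask expression is negative), not GrADS code, and the sample values of p are made up.

```python
import numpy as np

def maskout(expr, mask):
    """Rough analogue of GrADS maskout(expr, mask): hide expr wherever
    mask is negative, leave the remaining values untouched."""
    expr = np.asarray(expr, dtype=float)
    mask = np.asarray(mask, dtype=float)
    return np.ma.masked_where(mask < 0, expr)

p = np.array([-35.0, -20.0, -5.0, 10.0])

hide_below = maskout(p, p + 20)    # hides values below -20
hide_above = maskout(p, -20 - p)   # the reverse: hides values above -20

print(hide_below.mask.tolist())  # [True, False, False, False]
print(hide_above.mask.tolist())  # [False, False, True, True]
```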

Missing Time Series data in Hadoop

I have a big text file (in TBs), every line has a timestamp and some other data, like this:
timestamp1,data
timestamp2,data
timestamp5,data
timestamp7,data
...
timestampN,data
This file is ordered by timestamp but there might be gaps between consecutive timestamps. I need to fill those gaps and write the new file.
Can this be done in Hadoop MapReduce? The reason for asking is that to interpolate the missing lines I need the previous and next lines too. For example, to interpolate timestamp6 I need the values at timestamp5 and timestamp7. If the data from timestamp7 onwards sits in another data block, I will not be able to calculate timestamp6 at all.
Any other algorithm/solution? Maybe this can not be done with mapreduce? Can we do this in RHADOOP?
(Pig/Hive solutions are also valid)
My suggestion is a bit tedious and may also cost a little performance: you can implement your own RecordReader and, at the end of all lines in the current split, get the first line of the next split using its block location. I am suggesting this because Hadoop itself does this when the last line of a mapper's split is incomplete. Hope this helps!
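The per-split logic only ever needs the previous record plus the current one, which is why fetching a single extra line from the next split is enough. Here is a plain Python sketch of that gap filling (not Hadoop code), assuming integer timestamps with a step of 1, a single numeric value per line, and linear interpolation.

```python
def fill_gaps(records):
    """Yield (timestamp, value) pairs with missing integer timestamps
    filled by linear interpolation between their two neighbours.

    records: iterable of (int timestamp, float value), sorted by timestamp.
    """
    prev = None
    for ts, value in records:
        if prev is not None:
            prev_ts, prev_value = prev
            span = ts - prev_ts
            # Emit every timestamp strictly between the two known records
            for missing in range(prev_ts + 1, ts):
                weight = (missing - prev_ts) / span
                yield missing, prev_value + weight * (value - prev_value)
        yield ts, value
        prev = (ts, value)

rows = [(1, 10.0), (2, 12.0), (5, 18.0), (7, 20.0)]
filled = list(fill_gaps(rows))
for ts, value in filled:
    print(ts, value)
```

Inside a split, records would be everything in that split followed by the first line of the next one; that extra line is used for interpolation and then emitted again by the next split's reader, so it would need to be dropped from one of the two outputs.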

arcmap network analyst iteration over multiple files using model builder

I have 10+ files that I want to add to ArcMap then do some spatial analysis in an automated fashion. The files are in csv format which are located in one folder and named in order as "TTS11_path_points_1" to "TTS11_path_points_13". The steps are as follows:
1. Make XY event layer
2. Export the XY table to a point shapefile using the feature class to feature class tool
3. Project the shapefiles
4. Snap the points to another line shapefile
5. Make a Route layer - network analyst
6. Add locations to stops using the output of step 4
7. Solve to get routes between points based on a RouteName field
I tried to attach a snapshot of the model builder to show the steps visually but I don't have enough points to do so.
I have two problems:
1. How do I iterate this procedure over the number of files that I have?
2. How do I make sure that every time the output has a different name so it doesn't overwrite the one from the previous iteration?
Your help is much appreciated.
Once you're satisfied with the way the model works on a single input CSV, you can batch the operation 10+ times, manually adjusting the input/output files. This easily addresses your second problem, since you're controlling the output name.
You can use an iterator in your ModelBuilder model -- specifically, Iterate Files. The iterator would be the first input to the model, and has two outputs: File (which you link to other tools), and Name. The latter is a variable which you can use in other tools to control their output -- for example, you can set the final output to C:\temp\out%Name% instead of just C:\temp\output. This can be a little trickier, but once it's in place it tends to work well.
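The naming idea behind %Name% can be sketched outside ModelBuilder too. The following Python snippet only builds output paths and does not call any ArcGIS tools; the C:\data input folder is hypothetical, and it assumes the name used for substitution is the file name without its extension.

```python
from pathlib import PureWindowsPath

def output_path(input_file, out_dir=r"C:\temp", prefix="out"):
    """Build an output path from the input file's base name, similar in
    spirit to C:\\temp\\out%Name% in ModelBuilder."""
    name = PureWindowsPath(input_file).stem  # file name without extension
    return str(PureWindowsPath(out_dir) / f"{prefix}{name}")

# Hypothetical input folder; file names follow the pattern in the question
inputs = [rf"C:\data\TTS11_path_points_{i}.csv" for i in range(1, 14)]
outputs = [output_path(f) for f in inputs]

print(outputs[0])
print(len(set(outputs)) == len(inputs))  # every output name is distinct
```

Because each output name is derived from a distinct input name, no iteration overwrites the result of a previous one.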
For future reference, gis.stackexchange.com is likely to get you a faster response.
