R save as NetCDF file after simple calculation - r

I want to do something (apparently) simple, but haven't yet found the right way to do it:
I read a netcdf file (wind speed from the ERA5 reanalysis) on a grid.
From this, I use the wind speed to calculate a wind capacity factor (using a given power curve).
I then want to write a new netcdf file, with exactly the same structure as the input file, but just replacing the input wind speed by the new variable (wind capacity factor).
Is there a simple/fast way to do this without redefining all the dims, vars, etc. with ncvar_def and ncdim_def?
Thanks in advance for your replies!

Writing a netcdf file in R is not overly complicated; there is a nice example online here:
http://geog.uoregon.edu/GeogR/topics/netCDF-write-ncdf4.html
You could copy the dimensions from the input file.
However, if your wind power curve is a simple analytical expression, then you could perform this task in one line from the command line in bash/linux using the Climate Data Operators (cdo).
For example, if you have two variables u10 and v10 in the file (I don't recall the reanalysis names exactly, so check them with cdo showname), then you could make a new variable wcf=sqrt(u10^2+v10^2) in the following way
cdo expr,'wcf=sqrt(u10*u10+v10*v10)' input.nc output.nc
See an example here:
https://code.mpimet.mpg.de/boards/53/topics/1622
So if your wind power function is an analytical expression, you can define it this way without using R at all or worrying about dimensions etc.; the new file will contain the new variable wcf on the same grid and time axis as the input. You should then probably use NCO to alter the metadata (units etc.) to ensure they are appropriate.
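If the power curve is only available as a table rather than an analytical expression, the per-grid-point calculation amounts to interpolating that table at each wind speed. Here is a minimal Python sketch of that step only. The curve values, the cut-out speed, and the function names are made up for illustration and are not taken from any real turbine; reading and writing the netcdf file is left out.

```python
from bisect import bisect_right
from math import hypot

# Hypothetical power curve: (wind speed in m/s, output as a fraction of rated power).
POWER_CURVE = [(0.0, 0.0), (3.0, 0.0), (12.0, 1.0), (25.0, 1.0)]
CUT_OUT = 25.0  # assumed cut-out speed; above it the output is taken as zero


def wind_speed(u, v):
    """Wind speed from the two horizontal components."""
    return hypot(u, v)


def capacity_factor(speed):
    """Linearly interpolate the tabulated power curve at one wind speed."""
    if speed < 0 or speed > CUT_OUT:
        return 0.0
    speeds = [s for s, _ in POWER_CURVE]
    i = bisect_right(speeds, speed)
    if i == len(speeds):  # exactly at the last tabulated speed
        return POWER_CURVE[-1][1]
    (s0, p0), (s1, p1) = POWER_CURVE[i - 1], POWER_CURVE[i]
    return p0 + (p1 - p0) * (speed - s0) / (s1 - s0)


def capacity_factor_grid(u_grid, v_grid):
    """Apply the curve to every point of two equally shaped 2-D lists."""
    return [
        [capacity_factor(wind_speed(u, v)) for u, v in zip(u_row, v_row)]
        for u_row, v_row in zip(u_grid, v_grid)
    ]
```

The result has the same shape as the input, which is what makes it possible to write it back into a copy of the original file without redefining any dimensions.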

Related

How can I run inputs from R through a matlab function?

I have a matlab code that I would like to run 3 inputs through that I have in R (delta_x, cutoff_1, cutoff_2).
First I would like to know how to save these in order to run them through the matlab code. delta_x is an R data frame with around 1000 lines of data in a single column. cutoff_1 and cutoff_2 are both just integers.
The code is much longer than what is shown below, so it would take far too long to convert into R; I am on a tight schedule and not very familiar with matlab.
Essentially, what I would like to do is run the 3 inputs that I have in R through the matlab code that I have been provided with. Is anyone aware of how I can do this? I have read something about the matlabr package, but have not been able to figure it out.
Matlab code looks like the following for reference:
function [start_end]=plume_finder(delta_x,cutoff_1,cutoff_2)
[pks,pksloc]=findpeaks(delta_x); % find the peaks
valley=find(islocalmin(delta_x)==1); % find the valleys
list_va=delta_x(valley); % find the values at the valleys
% There are two possibilities: the valley comes first or the peak comes
% first. So we have to use 'shift' to adjust for this.
if valley(1)>pksloc(1)
    shift=0;
else
    shift=1;
end
upperlim=min(length(pks),length(valley)); % find where to stop
% pksloc(:,2:end) indicates whether each peak meets the requirements
% (column 2: whether the peak value reaches the cut-off, column 3: whether
% the peak value differs significantly from the valley point on its left,
% column 4: whether the peak value differs significantly from the valley
% point on its right --> 1 stands for yes, and 0 stands for no)
for i=1:upperlim
    if pks(i)<cutoff_1
        pksloc(i,2)=0;
    else
        pksloc(i,2)=1;
    end
end
for i=2:upperlim
    if (pks(i)-list_va(i-1+shift))<=cutoff_2
        pksloc(i,3)=0;
    else
        pksloc(i,3)=1;
    end
end
In MATLAB, write a script that loads ASCII .csv files, processes the data, and saves the results to new .csv files.
In R, save the variables to .csv files.
In R, invoke the MATLAB script in batch mode through system() and wait for the new processed files to be generated.
In R, load the processed data file and carry on with the rest of your analysis.
P.S.1. csv files are pretty straightforward but can lose precision; HDF5 or SQLite formats can be used instead.
P.S.2. The whole thing can be turned around, so that an R script is invoked from within MATLAB through the file system and system calls. Or call both R and MATLAB from bash or python or something else.
P.S.3. It's also possible to pass data through a local network socket connection; I will leave the details out for now.
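To make the file-based round trip concrete, here is a rough sketch of the "call MATLAB from python" variant mentioned in P.S.2. It assumes a MATLAB release that supports the -batch startup option and the readmatrix/writematrix functions (R2019a or later), and that plume_finder.m is on the MATLAB path. The file names and helper names are placeholders, not anything from the question.

```python
import csv
import subprocess
from pathlib import Path


def write_column(path, values):
    """Write one value per row; repr() keeps full float precision."""
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh)
        for v in values:
            writer.writerow([repr(float(v))])


def matlab_command(in_csv, out_csv, cutoff_1, cutoff_2):
    """Build the batch-mode MATLAB call as an argument list (no shell quoting needed)."""
    code = (
        f"delta_x = readmatrix('{in_csv}'); "
        f"start_end = plume_finder(delta_x, {cutoff_1}, {cutoff_2}); "
        f"writematrix(start_end, '{out_csv}');"
    )
    return ["matlab", "-batch", code]


def run_round_trip(values, cutoff_1, cutoff_2, workdir="."):
    """Save inputs, run MATLAB, and load the processed rows back in."""
    in_csv = Path(workdir) / "delta_x.csv"        # placeholder file name
    out_csv = Path(workdir) / "start_end.csv"     # placeholder file name
    write_column(in_csv, values)
    cmd = matlab_command(in_csv.as_posix(), out_csv.as_posix(), cutoff_1, cutoff_2)
    subprocess.run(cmd, check=True)  # blocks until MATLAB exits
    with open(out_csv, newline="") as fh:
        return [[float(x) for x in row] for row in csv.reader(fh)]
```

The same three pieces (write the inputs, build one command line, read the outputs) carry over directly to R with write.csv(), system() and read.csv().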

How to identify each file of origin when concatenating many netcdf files with ncrcat?

I am concatenating 1000s of nc-files (outputs from simulations) to allow me to handle them more easily in Matlab. To do this I use ncrcat. The files have different sizes, and the time variable is not unique between files. The concatenate works well and allows me to read the data into Matlab much quicker than individually reading the files. However, I want to be able to identify the original nc-file from which each data point originates. Is it possible to, say, add the source filename as an extra variable so I can trace back the data?
Easiest way: Online indexing
Before we start, I would use an integer index rather than the filename to identify each run, as it is a lot easier to handle, both when writing the files and when reading them in the Matlab programme. Rather than a simple monotonically increasing index, the identifier can have relevance for your run, or you can even write several separate indices if necessary (e.g. one each for the resolution, the date, the model version, etc.).
So, the obvious way to do this that I can think of would be that each simulation writes an index to the file to identify itself. i.e. the first model run would write a variable
myrun=1
the second
myrun=2
and so on... then when you cat the files the data can be uniquely identified very easily using this index.
Note that if, as your question suggests, the spatial dimensions are not unique and the number of time steps also changes from run to run, your index will need to be a function of all the non-unique dimensions, e.g. myrun(x,y,t). If any dimension is identical across all files, then that dimension is redundant in the index and can be omitted.
Of course, the only issue with this solution is it means running the simulations again :-D and you might be talking about an expensive model to run or someone else's runs you can't repeat. If rerunning is out of the question you will need to try to add an index offline...
Offline indexing (easy if grids are same, more complex otherwise)
If your space dimensions are the same across all files, then this is still an easy task, as you can add an index offline across all the time steps in each file using nco:
ncap2 -s 'myrun[$time]=array(X,0,$time)' infile.nc outfile.nc
or if you are happy to overwrite the original file (be careful!)
ncap2 -O -s 'myrun[$time]=array(X,0,$time)' infile.nc infile.nc
where X is the run number. This adds a new variable myrun, which is a function of time, and writes X at each step. When you merge, you can see which data slice came from which run.
By the way, the second argument (the zero) is the increment. As it is set to zero, the number X is written for all timesteps in a given file; if it were 1, the index would increase by one each timestep, which could be useful in some cases. For example, you might use two indices: one with an increment of zero to identify the run, and a second with an increment of unity to tell you which step of the Xth run the data slice belongs to.
If your files are for different domains too, then you might want to put them on a common grid before you do that... I think for that
cdo enlarge
might be of help, see this post : https://code.mpimet.mpg.de/boards/2/topics/1459
I agree that an index will be simpler than a filename. I would just add to the above answer that the command to add a unique index X with a time dimension to each input file can be simplified to
ncap2 -s 'myrun[$time]=X' in.nc out.nc
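With thousands of files, the tagging step has to be scripted. Below is a small Python sketch that builds one ncap2 call per file using the simplified command above, followed by a single ncrcat call, and keeps a lookup from index to original filename. The tagged_NNNN.nc naming scheme, the merged.nc default, and the function names are arbitrary choices for illustration.

```python
import subprocess


def build_commands(files, merged="merged.nc"):
    """Return (commands, lookup): ncap2 per input file, then one ncrcat call.

    lookup maps each integer run index back to its source filename.
    """
    commands, lookup, tagged = [], {}, []
    for run, infile in enumerate(files, start=1):
        outfile = f"tagged_{run:04d}.nc"  # hypothetical naming scheme
        # Passed as an argument list, so $time reaches ncap2 unexpanded.
        commands.append(["ncap2", "-s", f"myrun[$time]={run}", infile, outfile])
        lookup[run] = infile
        tagged.append(outfile)
    commands.append(["ncrcat", *tagged, merged])
    return commands, lookup


def run_all(commands):
    """Execute the commands in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Saving the lookup next to the merged file (for example as a two-column csv) lets you translate each myrun value back to a filename later in Matlab.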

arcmap network analyst iteration over multiple files using model builder

I have 10+ files that I want to add to ArcMap and then run some spatial analysis on in an automated fashion. The files are in csv format, are located in one folder, and are named in order from "TTS11_path_points_1" to "TTS11_path_points_13". The steps are as follows:
Make XY event layer
Export the XY table to a point shapefile using the feature class to feature class tool
Project the shapefiles
Snap the points to another line shapefile
Make a Route layer - network analyst
Add locations to stops using the output of step 4
Solve to get routes between points based on a RouteName field
I tried to attach a snapshot of the model builder to show the steps visually but I don't have enough points to do so.
I have two problems:
How do I iterate this procedure over the number of files that I have?
How do I make sure that the output has a different name every time, so it doesn't overwrite the one from the previous iteration?
Your help is much appreciated.
Once you're satisfied with the way the model works on a single input CSV, you can batch the operation 10+ times, manually adjusting the input/output files. This easily addresses your second problem, since you're controlling the output name.
You can use an iterator in your ModelBuilder model -- specifically, Iterate Files. The iterator would be the first input to the model, and has two outputs: File (which you link to other tools), and Name. The latter is a variable which you can use in other tools to control their output -- for example, you can set the final output to C:\temp\out%Name% instead of just C:\temp\output. This can be a little trickier, but once it's in place it tends to work well.
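The effect of the %Name% substitution can be pictured with a few lines of plain Python: each input file contributes its base name, and that name is spliced into the output path so no two iterations collide. This is only an illustration of the naming logic, not ArcGIS code; the folder paths and the _routes suffix are made up.

```python
from pathlib import PureWindowsPath


def output_name(csv_path, out_dir, suffix="_routes"):
    """Derive a unique output path from the input file's base name."""
    stem = PureWindowsPath(csv_path).stem  # e.g. TTS11_path_points_3
    return str(PureWindowsPath(out_dir) / f"{stem}{suffix}")


def plan_outputs(csv_paths, out_dir):
    """Pair every input with its own output path, failing on duplicates."""
    plan = {p: output_name(p, out_dir) for p in csv_paths}
    if len(set(plan.values())) != len(plan):
        raise ValueError("two inputs would write to the same output")
    return plan


# Hypothetical input folder holding the 13 files from the question.
inputs = [rf"C:\data\TTS11_path_points_{i}.csv" for i in range(1, 14)]
plan = plan_outputs(inputs, r"C:\temp")
```

Because the output name is derived from the input name rather than from a counter, rerunning the model on a subset of the files still writes to the same, predictable locations.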
For future reference, gis.stackexchange.com is likely to get you a faster response.

Running mahout k means clustering command without converting input file to vectors

I have a dataset (300 MB) on which I wish to run k-means clustering using Mahout. The data is in the form of a csv file that contains only numerical values. Is it still necessary to input the file in vectorized format for the Mahout k-means command? If not, how can I run the k-means command directly on my csv file without converting it to a vector format?
If your data is 300 MB, the answer is don't use Mahout at all.
Really ONLY EVER use Mahout when your data no longer fits into memory. Map Reduce is expensive; you only want to use it when you can't solve the problem without it.

Extract certain values out of netCDF

I have a netCDF file with 3 dimensions. The first dimension is the longitude and runs from 1 to 464. The second dimension is the latitude and runs from 1 to 201. The third dimension is time and runs from 1 to 5479.
Now I want to extract certain values out of the file. I think one can handle it with the start argument. I tried this command.
test = open.ncdf("rr_0.25deg_reg_1980-1994_v8.0.nc")
data = get.var.ncdf(test,start=c(1:464,1:201,1:365))
But somehow it doesn't work. Does anybody have a solution?
Thanks in advance...
It looks like you are using the ncdf package in R. If you can, I recommend using the updated ncdf4 package, which is based on Unidata's netcdf version 4 library.
Back to your problem. I use the ncdf4 package, but I think the ncdf package works the same way. When you call the function get.var.ncdf, you also need to explicitly supply the name of the variable that you want to extract. I think you can get the names of the variables using names(test$var).
So you need to do something like this:
# Open the nc file
test = open.ncdf("rr_0.25deg_reg_1980-1994_v8.0.nc")
# Now get the names of the variables in the nc file
names(test$var)
# Get the data from the first variable listed above
# (May not fit in memory)
data = get.var.ncdf(test,varid=names(test$var)[1])
# If you only want a certain range of data, pass start and count
# (one value per dimension) so that only that block is read from disk:
# data = get.var.ncdf(test,varid=names(test$var)[1],start=c(1,1,1),count=c(464,201,365))
For your problem, you would need to replace varid=names(test$var)[1] above with varid='VARIABLE_NAME', where VARIABLE_NAME is the variable you want to extract.
Hope that helps.
EDIT:
I installed the ncdf package on my system, and the above code works for me!
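One reason the original call fails is that start expects a single index per dimension, not a range; the extent of the block goes in a separate count argument. The conversion from "first to last" ranges to start/count vectors is simple arithmetic, sketched here in Python with a hypothetical helper name.

```python
def to_start_count(ranges):
    """Convert 1-based inclusive (first, last) pairs, one per dimension,
    into the start and count vectors that netCDF readers expect."""
    start = [first for first, _ in ranges]
    count = [last - first + 1 for first, last in ranges]
    return start, count


# All longitudes and latitudes, first 365 time steps.
start, count = to_start_count([(1, 464), (1, 201), (1, 365)])
```

For the second year of data the time range would be (366, 730), which gives a start of 366 and again a count of 365 along the time dimension.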
You could also extract the timesteps/dates and locations outside of R, before reading the data into R for plotting etc., by using CDO. This has the advantage that you can work directly in the coordinate space and specify timesteps or dates directly:
e.g.
cdo seldate,20100101,20121031 in.nc out.nc
cdo sellonlatbox,lon1,lon2,lat1,lat2 in.nc out.nc
