How to format netCDF datasets to be compatible with THREDDS OGC services

I have a netCDF dataset produced from the NASA LIS model that I want to be able to show through WMS using a THREDDS server. The specific dataset and THREDDS server can be found at https://tethys.byu.edu/thredds/catalog/testAll/anomaly/catalog.html, where you can also download the dataset.
The dataset's variables all have time, ensemble, lat, and lon dimensions, and a few variables have additional dimensions. There are corresponding variables for those dimensions. When I open the WMS endpoint to view the XML, I see under the layers that there is
<Layer>
  <Title>LIS land surface model output</Title>
</Layer>
But there is no list of the variables beneath it. I can't find any documentation about the netCDF structure THREDDS requires, and I've tried comparing this dataset to others that work to look for differences, but I'm stuck. The catalog files are configured so that .nc files can be read, WMS services are exposed, etc.
What do I need to change to make this file readable by THREDDS?

The THREDDS Data Server (TDS) ships with a WMS server called ncWMS as a plugin. The short answer is that I do not think ncWMS works for data with an ensemble dimension, as there does not appear to be a way of requesting an ensemble member through the GetMap request.
If my understanding is incorrect, and ncWMS will support data with an ensemble dimension, then you will need to make sure netCDF-Java recognizes the ensemble dimension/variable in your example dataset (which it currently does not). The first issue is that netCDF-Java does not see the ensemble variable as a coordinate variable. To fix that, you can add a _CoordinateAxisType attribute to the ensemble variable to tell netCDF-Java that it is a coordinate variable. You can do this using NcML, so that you won't need to rewrite the file:
<?xml version="1.0" encoding="UTF-8"?>
<ncml:netcdf xmlns:ncml="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2" location="/path_to_file/processed_LIS_HIST_201908010000.d01.nc">
  <ncml:variable name="ensemble">
    <ncml:attribute name="_CoordinateAxisType" value="Ensemble" />
  </ncml:variable>
</ncml:netcdf>
However, the ensemble variable in your example dataset has two dimensions, [ensemble, time], which netCDF-Java does not currently handle. Surprisingly (probably because the time dimension has a size of 1), netCDF-Java and NcML can do the trick here once again with the addition of a logicalReduce element to the NcML:
<?xml version="1.0" encoding="UTF-8"?>
<ncml:netcdf xmlns:ncml="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2" location="/path_to_file/processed_LIS_HIST_201908010000.d01.nc">
  <ncml:variable name="ensemble">
    <ncml:attribute name="_CoordinateAxisType" value="Ensemble" />
    <ncml:logicalReduce dimNames="time" />
  </ncml:variable>
</ncml:netcdf>
At this point, netCDF-Java will be able to fully recognize the grids within your example dataset.

Related

Regridding operations on smaller subsets

I have a NetCDF file covering, let's say, only a portion of South America.
I want to use the cdo operator remapnn to remap the NetCDF file from 0.25x0.25 to 0.05x0.05 degree resolution. For that I use the code below, but the results are not what I expect them to be. Can someone provide some clarity on this?
#cdo version 1.9.3
export REMAP_EXTRAPOLATE='off'
cdo remapnn,r7200x3600 test.nc test2.nc
I always seem to have this problem when remapping a subset of a global map, but it works fine for the complete global map. It seems like the cdo operator assumes that I am always working with a complete global map.
Link to the data is added here.

How do I convert my image data to a format similar to the Fashion-MNIST data

I'm new to machine learning, so please bear with my novice question. I'm trying to train a model to recognize benthic foraminifera based on their detailed taxonomy; here is a sample of what foraminifera look like.
I've been successful in loading my data using flow_images_from_directory(). However, I don't know how to explore the structure of the object that flow_images_from_directory() generates. I would like to format my dataset similarly to the structure of the Fashion-MNIST data, so that it is easy to use a modification of the code below. I have some experience with the magick package.
library(keras)

fashion_mnist <- dataset_fashion_mnist()
c(train_images, train_labels) %<-% fashion_mnist$train
c(test_images, test_labels) %<-% fashion_mnist$test
This would give me something like that structure, which would make it easier for me to understand, especially the labeling part. Also, if possible, I want to be able to append other information from a CSV file to the dataset. My data is already arranged in folders and subfolders as follows:
data/
  train/
    ammonia/    ### 102 pictures
      ammonia001.tif
      ammonia002.tif
      ...
    elphidium/  ### 1024 pictures
      elphidium001.jpg
      elphidium002.jpg
      ...
  test/
    ammonia/    ### 16 pictures
      ammonia001.jpg
      ammonia002.jpg
      ...
    elphidium/  ### 6 pictures
      elphidium001.jpg
      elphidium002.jpg
      ...
Any help or guide to materials will be highly appreciated.
I'll describe the steps you would go through at a high level, assuming you now have a training set and a testing set, both with all your classes reasonably balanced:
Load your images and extract the pixel values, normalizing the values so they are between 0 and 1.
If the images are of different sizes, pad them so they are all the same size.
If you are not using a method that requires 2D structure, such as a CNN, also flatten the pixel values.
Associate your images (in pixel form) with your class labels.
Now you have a set of fixed-size images in pixel form with their associated class labels, which you can feed into whatever model you are using; a rough sketch of these steps in R follows below.
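Since you mentioned the magick package, here is a rough sketch of those steps in R, not a drop-in solution: it reads each class folder, converts to greyscale, forces a 28x28 size, normalizes to [0, 1], and stacks everything into Fashion-MNIST-shaped arrays. The folder paths, the 28x28 size, the load_class_images helper, and the 0/1 label coding are all my assumptions, so adjust them to your own layout:

library(magick)

load_class_images <- function(dir, label, size = 28) {
  files <- list.files(dir, full.names = TRUE)
  mats <- lapply(files, function(f) {
    img <- image_read(f)
    img <- image_convert(img, colorspace = "gray")
    img <- image_resize(img, paste0(size, "x", size, "!"))  # "!" forces exactly 28x28, ignoring aspect ratio
    # pixel values as a size x size matrix, scaled to [0, 1]
    matrix(as.integer(image_data(img, channels = "gray")), size, size) / 255
  })
  list(x = mats, y = rep(label, length(mats)))
}

ammonia   <- load_class_images("data/train/ammonia",   0L)  # label 0 = ammonia
elphidium <- load_class_images("data/train/elphidium", 1L)  # label 1 = elphidium

# stack into an [n, 28, 28] array (images first), mirroring dataset_fashion_mnist()
train_images <- aperm(simplify2array(c(ammonia$x, elphidium$x)), c(3, 1, 2))
train_labels <- c(ammonia$y, elphidium$y)

Repeating the same thing over data/test/ gives you test_images and test_labels, and since list.files() also gives you the file names, you could match extra information from your CSV file by file name in the same step.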
Hope this helps; let me know if you're confused by any part.
Side note: from your sample, it looks like your dataset is heavily skewed - lots of elphidium examples but not a lot of ammonia examples. This will probably lead to problems later on. In general, you want a balanced number of examples between your classes.

Extracting point data from a large shape file in R

I'm having trouble extracting point data from a large shapefile (916.2 MB, 4,618,197 elements, from here: https://earthdata.nasa.gov/data/near-real-time-data/firms/active-fire-data) in R. I'm using readShapeSpatial from maptools to read in the shapefile, which takes a while but eventually works:
worldmap <- readShapeSpatial("shp_file_name")
I then have a data.frame of coordinates that I want to extract data for. However, R is really struggling with this and either loses the connection or freezes, even with just one set of coordinates!
pt <- data.frame(lat = -64, long = -13.5)
pt <- SpatialPoints(pt)
e <- over(pt, worldmap)
Could anyone advise me on a more efficient way of doing this?
Or is it the case that I need to run this script on something more powerful (I'm currently using a Mac mini with a 2.3 GHz processor)?
Many thanks!
By 'point data' do you mean the longitude and latitude coordinates? If that's the case, you can obtain the data underlying the shapefile with:
worldmap@data
You can view this in the same way you would any other data frame, for example:
View(worldmap@data)
You can also access columns in this data frame in the same way you normally would, except you don't need the @data, e.g.:
worldmap$LATITUDE
Finally, it is recommended to use readOGR from the rgdal package rather than maptools::readShapeSpatial, as the former reads in the CRS/projection information; a short sketch follows below.
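As a rough sketch of that route (the dsn folder and the layer name "Global_24h" are placeholders; the layer is your shapefile's name without the .shp extension):

library(rgdal)

# dsn is the folder containing the shapefile, layer is the file name without ".shp"
worldmap <- readOGR(dsn = "path/to/shapefile_folder", layer = "Global_24h")

head(worldmap@data)     # attribute table as a data frame
worldmap$LATITUDE[1:5]  # a single attribute column, no @data needed

Keeping the projection information also matters if you later use over(), since sp expects the points and the shapefile to share the same CRS.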

How do I divide a very large OpenStreetMap file into smaller files in R without running out of memory?

I am currently looking to have map files that are no larger than the sizes of municipalities in Mexico (at largest, about 3 degrees longitude/latitude across). However, I have been running into memory issues (at the very least) when trying to do so. The file size of the OSM XML object is 1.9 GB, for reference.
library(osmar)

get.map.for.municipality <- function(province, municipality) {
  base.map.filename = 'OpenStreetMap/mexico-latest.osm'
  # bounds.list is a list that contains the boundaries
  bounds = bounds.list[[paste0(province, '*', municipality)]]
  my.bbox = corner_bbox(bounds[1], bounds[2], bounds[3], bounds[4])
  my.map.source = osmsource_file(base.map.filename)
  my.map = get_osm(my.bbox, my.map.source)
  return(my.map)
}
I am running this inside a loop, but it can't even get past the first iteration. When I tried running it, my computer froze and I was only able to take a screenshot with my phone. Memory usage climbed steadily over the course of a few minutes, then shot up very quickly, and I was unable to react before my computer froze.
What is a better way of doing this? I expect to have to run this loop about 100-150 times, so any way that is more efficient in terms of memory would help. I would prefer not to download smaller files from an API service.
If necessary, I would be willing to use another programming language (preferably Python or C++), but I prefer to keep this in R.
I'd suggest not using R for that.
There are better tools for this job: there are many ways to split and filter the data from the command line or using a DBMS.
Here are some alternatives extracted from the OSM Wiki http://wiki.openstreetmap.org:
Filter your osm files using osmfilter: "osmfilter is used to filter OpenStreetMap data files for specific tags. You can define different kinds of filters to get OSM objects (i.e. nodes, ways, relations), including their dependent objects, e.g. nodes of ways, ways of relations, relations of other relations."
Clipping based on Polygons or borders using osmconvert: http://wiki.openstreetmap.org/wiki/Osmconvert#Applying_Geographical_Borders
You can write bash scripts for both osmfilter and osmconvert, but I'd recommend using a DBMS. Just import the data into PostGIS using osm2pgsql, and connect your R code with any PostgreSQL driver; this will optimize your read/write operations. A rough sketch of that route follows below.
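Here is a rough sketch of what the PostGIS route could look like from the R side, assuming the extract has already been loaded with osm2pgsql into a local database called "osm"; the table (planet_osm_line) and geometry column (way, stored in EPSG:3857) are the osm2pgsql defaults, and the bounding box is just a made-up example, so adjust all of these to your setup:

library(DBI)
library(RPostgres)  # any PostgreSQL driver will do, e.g. RPostgreSQL

con <- dbConnect(Postgres(), dbname = "osm", host = "localhost")

# bounding box of one municipality: xmin, ymin, xmax, ymax in lon/lat
b <- c(-99.4, 19.0, -98.9, 19.6)

# build the envelope in lon/lat (EPSG:4326) and transform it to the table's CRS
sql <- sprintf(
  "SELECT osm_id, highway, ST_AsText(way) AS wkt
     FROM planet_osm_line
    WHERE way && ST_Transform(ST_MakeEnvelope(%f, %f, %f, %f, 4326), 3857)",
  b[1], b[2], b[3], b[4])

roads <- dbGetQuery(con, sql)
dbDisconnect(con)

Querying one municipality at a time like this keeps only a small subset of the data in R's memory, which is the point of pushing the splitting work to the database.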

ArcMap Network Analyst iteration over multiple files using ModelBuilder

I have 10+ files that I want to add to ArcMap and then run some spatial analysis on in an automated fashion. The files are in CSV format, located in one folder, and named in order from "TTS11_path_points_1" to "TTS11_path_points_13". The steps are as follows:
Make XY event layer
Export the XY table to a point shapefile using the Feature Class to Feature Class tool
Project the shapefiles
Snap the points to another line shapefile
Make a Route layer (Network Analyst)
Add locations to stops using the output of step 4
Solve to get routes between points based on a RouteName field
I tried to attach a snapshot of the model builder to show the steps visually but I don't have enough points to do so.
I have two problems:
How do I iterate this procedure over the number of files that I have?
How do I make sure that the output has a different name every time, so it doesn't overwrite the one from the previous iteration?
Your help is much appreciated.
Once you're satisfied with the way the model works on a single input CSV, you can batch the operation 10+ times, manually adjusting the input/output files. This easily addresses your second problem, since you're controlling the output name.
You can use an iterator in your ModelBuilder model -- specifically, Iterate Files. The iterator would be the first input to the model, and has two outputs: File (which you link to other tools), and Name. The latter is a variable which you can use in other tools to control their output -- for example, you can set the final output to C:\temp\out%Name% instead of just C:\temp\output. This can be a little trickier, but once it's in place it tends to work well.
For future reference, gis.stackexchange.com is likely to get you a faster response.
