Dimension problem while reading netcdf4 files in 'stars' R package

In essence I'm trying to do a relatively simple set of operations on a collection of netcdf4 files I've downloaded. They're sourced from ESA's Lakes Climate Change Initiative database of satellite-derived limnological data, and each netcdf4 file represents one day in a time series going back to the 2000s or earlier. Each netcdf4 file contains a number of variables of interest (surface temperature, chlorophyll-a concentration, etc.).
Using the stars package I was hoping to geographically subset the dataset to only the lake I'm interested in and create a monthly aggregated time series for the lake for a number of those variables.
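Roughly, the workflow I had in mind looks like this (the file pattern, the date parsing, and the bounding box values below are placeholders for my actual lake, so treat it as a sketch rather than working code):
files <- list.files(pattern = "^ESACCI-LAKES.*\\.nc$")
dates <- as.Date(sub(".*MERGED-(\\d{8}).*", "\\1", files), format = "%Y%m%d")
lakes <- read_stars(files, along = list(time = dates))  # stack the daily files along a time dimension
lake <- st_crop(lakes, st_bbox(c(xmin = 30.0, ymin = -3.0, xmax = 31.0, ymax = -2.0), crs = st_crs(lakes)))
monthly <- aggregate(lake, by = "months", FUN = mean, na.rm = TRUE)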
Unfortunately it seems as though there might be something wrong with the netcdf4 files ESA provided, as I'm running into odd errors while working with them in the 'stars' package in R.
For reproducibility's sake, here's a link to the file in question - it should be the first file in the directory. If you download it to a directory of your choice and setwd() to it, you should be able to repeat what I've managed here:
library(stars)
setwd()  # set this to the directory containing the downloaded file
lake_2015_01_01 <- read_stars('ESACCI-LAKES-L3S-LK_PRODUCTS-MERGED-20150101-fv2.0.2.nc')
## Compare to using the read_ncdf() function, also from stars:
lake_2015_01_01_nc <- read_ncdf('ESACCI-LAKES-L3S-LK_PRODUCTS-MERGED-20150101-fv2.0.2.nc')
Running the file through read_stars() produces the error:
Error in attr(x, "dimensions")[[along]] : subscript out of bounds
In addition: There were 50 or more warnings (use warnings() to see the first 50)
While running it through read_ncdf() produces the following:
Warning messages:
1: In CPL_crs_from_input(x) :
GDAL Error 1: PROJ: proj_create: Error 1027 (Invalid value for an argument): longlat: Invalid value for units
2: In value[3L] : failed to create crs based on grid mapping
and coordinate variable units. Will return NULL crs.
Original error:
Error in st_crs.character(base_gm): invalid crs: +proj=longlat +a=6378137 +f=0.00335281066474748 +pm=0 +no_defs +units=degrees
But it does successfully complete, just with a broken coordinate system, which can be fixed by approximating the CRS originally set by the creators:
lake_2015_01_01_nc <- st_set_crs(lake_2015_01_01_nc, 4979)
However, the functions stars uses to manipulate data don't work on it, because read_ncdf() returned a proxy object that merely points at the original netcdf4 file. The architecture of the stars package suggests that if I want to manipulate the file I need to use read_stars().
I've tried to reverse-engineer the problem a little by opening the .nc file in QGIS, where it performs as expected: I'm able to display the data and it appears to be georeferenced correctly. That makes me suspect the data is not being read into 'stars' correctly by read_ncdf(), and likely not by read_stars() either.
I'm not sure how to go about fixing this, however, and I'd like to use both this dataset and stars in the analysis. Does anyone have any insight into what might be going on here, and whether it's something I can fix?

Related

How to read a stars object with over 32768 bands?

I have a very large dataset consisting of one attribute, simulated daily from 1970 to 2100, defined on a rather fine geographic grid. It has been given to me as a netCDF file, which I would like to read and analyze in an R script. The data is too big to fully fit in memory, so I wrote a script that does the analysis with stars proxy objects and the purrr package. It has worked for similar smaller datasets.
However, this dataset seems too big - there are 45956 bands, one for each time step, and it seems like the read_stars() command has an upper limit on how many bands an object can have. This is what my code looks like after loading the proper libraries, where data_path points to a single .nc file:
data_full <- read_stars(data_path, proxy = TRUE)
It returns the following:
Warning message:
In CPL_read_gdal(as.character(x), as.character(options), as.character(driver), :
GDAL Message 1 : Limiting number of bands to 32768 instead of 45956
The data is therefore truncated and stops around 2050. I would like to have the full data in the data_full variable. Is it possible to increase the band limit? Or are there alternative ways of doing this?
Try setting GDAL_MAX_BAND_COUNT to 65536
Python:
from osgeo import gdal
gdal.SetConfigOption('GDAL_MAX_BAND_COUNT', '65536')  # config values are passed as strings
bash:
export GDAL_MAX_BAND_COUNT=65536
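R (a hedged equivalent: GDAL also picks config options up from environment variables, so this should work provided it runs before the first file is opened - an untested sketch):
Sys.setenv(GDAL_MAX_BAND_COUNT = "65536")  # must run before the first read_stars() call
data_full <- read_stars(data_path, proxy = TRUE)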

How to save a simple feature with several geometries?

I would like to organise my data as a simple feature object, using the sf package in R, that has several geometry columns - for example a polygon, its centroid and its buffer - all sharing the same attributes.
I have difficulties saving such an object to disk. For example this is what I get if I attempt to save it as a shapefile with st_write():
Error in clean_columns(as.data.frame(obj), factorsAsCharacter) :
list columns are only allowed with raw vector contents
Warning:
1: In abbreviate_shapefile_names(obj) :
Field names abbreviated for ESRI Shapefile driver
2: In clean_columns(as.data.frame(obj), factorsAsCharacter) :
Dropping column(s) nghbrhd,landscp of class(es) sfc_POLYGON;sfc,sfc_POLYGON;sfc
Of course I could split it into several objects that each have one geometry and all attributes, but I find this messier and more error-prone than having all data in one file.
How can I (best) save such an object?
When saving objects via sf::st_write() you have to respect the constraints of your output format; two popular file formats - ESRI Shapefile / *.shp and OGC GeoPackage / *.gpkg - specifically allow only a single geometry column per layer.
This is a feature, not a bug, of the output file format.
If your workflow requires saving objects with multiple geometries, consider either the R-specific serialization route via saveRDS() or, if exchange with other spatial tools is required, one of the formats that allow multiple geometries per object - a PostgreSQL / PostGIS database is what I would use in such a situation.
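For illustration, a minimal sketch of both routes, assuming a hypothetical sf object dat with attribute columns plus two geometry columns, polygon (active) and centroid:
library(sf)
## Route 1: R-native serialization round-trips every geometry column
saveRDS(dat, "dat.rds")
dat2 <- readRDS("dat.rds")
## Route 2: one GeoPackage layer per geometry column
polys <- dat
polys$centroid <- NULL  # keep only the active geometry
st_write(polys, "dat.gpkg", layer = "polygons")
cents <- st_set_geometry(dat, "centroid")  # make 'centroid' the active geometry
cents$polygon <- NULL
st_write(cents, "dat.gpkg", layer = "centroids")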

Extracting point data from a large shape file in R

I'm having trouble extracting point data from a large shape file (916.2 Mb, 4618197 elements - from here: https://earthdata.nasa.gov/data/near-real-time-data/firms/active-fire-data) in R. I'm using readShapeSpatial from maptools to read in the shape file, which takes a while but eventually works:
worldmap <- readShapeSpatial("shp_file_name")
I then have a data.frame of coordinates that I want to extract data for. However, R really struggles with this and either loses the connection or freezes, even with just one set of coordinates!
pt <- data.frame(lat = -64, long = -13.5)
pt <- SpatialPoints(pt)
e <- over(pt, worldmap)
Could anyone advise me on a more efficient way of doing this?
Or is it the case that I need to run this script on something more powerful (currently using a mac mini with 2.3 GHz processor)?
Many thanks!
By 'point data' do you mean the longitude and latitude coordinates? If that's the case, you can obtain the data underlying the shapefile with:
worldmap@data
You can view this in the same way you would any other data frame, for example:
View(worldmap@data)
You can also access columns in this data frame in the same way you normally would, except you don't need the @data, e.g.:
worldmap$LATITUDE
Finally, it is recommended to use readOGR from the rgdal package rather than maptools::readShapeSpatial as the former reads in the CRS/projection information.
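For example, a hedged sketch (assuming the shapefile and its sidecar files sit in the working directory, and the layer name matches the file name without its extension):
library(rgdal)
worldmap <- readOGR(dsn = ".", layer = "shp_file_name")  # reads the .prj, unlike readShapeSpatial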

An error on Ripley's K function on spatstat

I would like some help with the Kest function in spatstat. I'm trying to calculate Ripley's K for my simulated data. When I try to run the function, R shows me an error that says:
Error: 7 segments do not lie entirely inside the window.
But when I plot the map, all the points fall perfectly inside the observation window.
I am using the following codes:
library(spatstat)
library(sp)
library(maptools)
pemt.area <- readShapePoly("Fragment/Fragment_SIRGAS.shp")
pemt.area <- as(pemt.area, "owin")
plot(pemt.area)
tree.simu <- read.table("simulation.txt", header = TRUE)
tree.simu
tree.simu.ppp <- ppp(x = tree.simu$X, y = tree.simu$Y, window = pemt.area)
plot(tree.simu.ppp)
ripley <- Kest(tree.simu.ppp, correction="isotropic", domain=pemt.area, var.approx=TRUE, ratio=TRUE)
plot(ripley, main = "K function")
I tried looking this up in the vignette but had no success fixing the problem.
The following link contains the data files. If you need something else, please let me know: https://drive.google.com/folderview?id=0B_cWshGkkAzjbHBnemNuMzh1Q1k&usp=sharing
Thank you, I really appreciate your help!
This is not related to Kest or ppp.
Notice that the error message is talking about segments, not points.
This is an error message from the spatstat function psp, which creates a spatial pattern of line segments. The error indicates that the polygon data do not conform to the requirements. It is most likely happening during the call to readShapePoly or as.owin. To track it down, type traceback() immediately after the error report; this will show where the error is occurring.
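For instance, a sketch of the diagnostic step, reusing the code from the question:
pemt.area <- readShapePoly("Fragment/Fragment_SIRGAS.shp")
pemt.area <- as(pemt.area, "owin")  # if the error is raised here...
traceback()                         # ...run this immediately to see the full call stack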
It would also be wise to update all your packages.
I cannot reproduce your problem; Kest ran fine and produced a normal K graph for me.
In my case: OS X, R 3.1.0. All I can suggest is to make sure your packages are up to date and to double-check that the files you're using are identical to the ones in your link.

How can I get around memory problems in R?

I'm applying the function image(cp) to GPS data, but when I do so it throws the following error:
Error in image(as(x, "SpatialGridDataFrame"), ...) :
error in evaluating the argument 'x' in selecting a method for function 'image': Error: cannot allocate vector of size 12.3 Mb
The SpatialPointsDataFrame of my relocation GPS data has two columns: one with the coordinates, the other with the ID of the animal.
I'm running it on a 32-bit system with 4 GB of RAM.
How do I get around this?
One way that might work with no thinking required:
library(raster)
r <- raster(cp)
image(r)
But you say cp is "gps data", so it's not at all clear why this would be imageable.
One thing you can do is plot it:
plot(cp)
That will work for a SpatialPointsDataFrame. If you want to create an image from this somehow you'll need to specify some details.
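For instance, if the goal is a density-style image of the points, one hedged sketch (the 100 x 100 grid size is arbitrary) is to rasterize them first:
library(raster)
r <- raster(extent(cp), ncols = 100, nrows = 100)  # empty grid covering the points
r <- rasterize(cp, r, fun = "count")               # count points per cell
image(r)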
