I would like to organise my data as a simple feature object using the sf package in R that has several geometry columns, for example a polygon, its centroid and its buffer, all sharing the same attributes.
I have difficulties saving such an object to disk. For example this is what I get if I attempt to save it as a shapefile with st_write():
Error in clean_columns(as.data.frame(obj), factorsAsCharacter) :
list columns are only allowed with raw vector contents
Warning:
1: In abbreviate_shapefile_names(obj) :
Field names abbreviated for ESRI Shapefile driver
2: In clean_columns(as.data.frame(obj), factorsAsCharacter) :
Dropping column(s) nghbrhd,landscp of class(es) sfc_POLYGON;sfc,sfc_POLYGON;sfc
Of course I could split it into several objects that each have one geometry column and all the attributes, but I find that messier and more error-prone than having all the data in one file.
How can I (best) save such an object?
When saving objects via sf::st_write() you have to respect the constraints of your output format; two popular file formats - ESRI Shapefile (*.shp) and OGC GeoPackage (*.gpkg) - allow only a single geometry column per saved layer.
This is a feature, not a bug, of the output file format.
If your workflow requires saving objects with multiple geometry columns, consider either the R-specific format written by saveRDS() or, if exchange with other spatial tools is required, one of the formats that allow multiple geometries per object - a PostgreSQL / PostGIS database is what I would use in such a situation.
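For example, a minimal sketch, assuming the object is called my_sf and its geometry columns are named geom_poly, geom_centroid and geom_buffer (all placeholder names):
library(sf)
# Option 1: an R-native file keeps all geometry columns intact
saveRDS(my_sf, "my_sf.rds")
# Option 2: one GeoPackage, one layer per geometry column
# (my_sf and the geom_* column names are placeholders)
write_geom_layer <- function(x, geom_col, dsn, layer) {
  st_geometry(x) <- geom_col                                   # make this column the active geometry
  other <- setdiff(names(x)[sapply(x, inherits, "sfc")], geom_col)
  st_write(x[, setdiff(names(x), other)], dsn, layer = layer)  # drop the other geometry columns before writing
}
write_geom_layer(my_sf, "geom_poly",     "my_sf.gpkg", "polygons")
write_geom_layer(my_sf, "geom_centroid", "my_sf.gpkg", "centroids")
write_geom_layer(my_sf, "geom_buffer",   "my_sf.gpkg", "buffers")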
In essence I'm trying to do a relatively simple set of operations on a collection of netcdf4 files I've downloaded. They're sourced from ESA's Lakes Climate Change Initiative database of satellite-derived limnological data, and each netcdf4 file represents one day in a time series going back to the 2000s or earlier. Each netcdf4 file has a number of dimensions representing different variables of interest (surface temperature, chlorophyll-a concentration, etc).
Using the stars package I was hoping to geographically subset the dataset to only the lake I'm interested in and create a monthly aggregated time series for the lake for a number of those variables.
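Roughly, the workflow I had in mind looks something like this (a sketch only; the file pattern, the date parsing and the bounding box values are placeholders):
library(stars)
# daily files, one per date (pattern is a placeholder)
files <- list.files(pattern = "^ESACCI-LAKES.*\\.nc$", full.names = TRUE)
dates <- as.Date(sub(".*MERGED-(\\d{8})-.*", "\\1", basename(files)), format = "%Y%m%d")
# stack the daily snapshots along a dated time dimension
lakes <- read_stars(files, along = list(time = dates))
# crop to the lake of interest (bounding box values are placeholders)
bb <- st_bbox(c(xmin = 30, xmax = 31, ymin = -2, ymax = -1), crs = st_crs(lakes))
lake <- st_crop(lakes, bb)
# monthly means of each variable
lake_monthly <- aggregate(lake, by = "months", FUN = mean, na.rm = TRUE)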
Unfortunately it seems as though there might be something wrong with the netcdf4 files ESA provided, as I'm running into odd errors while working with them in the 'stars' package in R.
For reproducibility's sake here's a link to the file in question - it should be the first file in the directory. If you download it to a directory of your choice and setwd() to it you should be able to repeat what I've managed here:
library(stars)
setwd()
lake_2015_01_01 <- read_stars('ESACCI-LAKES-L3S-LK_PRODUCTS-MERGED-20150101-fv2.0.2.nc')
## Compare to using the read_ncdf() function, also from stars:
lake_2015_01_01_nc <- read_ncdf('ESACCI-LAKES-L3S-LK_PRODUCTS-MERGED-20150101-fv2.0.2.nc')
Running the file through read_stars() produces the error:
Error in attr(x, "dimensions")[[along]] : subscript out of bounds
In addition: There were 50 or more warnings (use warnings() to see the first 50)
While running it through read_ncdf() produces the following:
Warning messages:
1: In CPL_crs_from_input(x) :
GDAL Error 1: PROJ: proj_create: Error 1027 (Invalid value for an argument): longlat: Invalid value for units
2: In value[3L] : failed to create crs based on grid mapping
and coordinate variable units. Will return NULL crs.
Original error:
Error in st_crs.character(base_gm): invalid crs: +proj=longlat +a=6378137 +f=0.00335281066474748 +pm=0 +no_defs +units=degrees
But it does successfully complete, just with a broken coordinate system that can be fixed by approximating the coordinate system originally set by the creators:
lake_2015_01_01_nc <- st_set_crs(lake_2015_01_01_nc, 4979)
However, the functions stars uses to manipulate data don't work on it, as it's a proxy object that points back to the original netcdf4 file. The architecture of the stars package would suggest that if I want to manipulate the file I need to use read_stars().
I've tried to reverse-engineer the problem a little bit by opening the .nc file in QGIS, where it seems to perform as expected. I'm able to display the data and it seems to be georeferenced correctly, which makes me suspect the data is not being read into 'stars' correctly in the read_ncdf() function, and likely in the read_stars() function as well.
I'm not sure how to go about fixing this however, and I'd like to use both this dataset and stars in the analysis. Does anyone have any insight as to what might be going on here and if it's something I can fix?
I am currently researching publicly available payer transparency files across multiple insurers and I am trying to parse and extract JSON files using R and output them into .CSV files to later use with SQL. The file I am currently working with contains nested tables within the highest table.
I have attached the specific file I am working with right now in a link below, along with the code to load it into R's data viewer. I have used R extensively in healthcare analytics classes for statistical analysis and machine learning, though I have never used R for building out data tables.
My goal is to assign a primary key to the highest level of the table, apply foreign and primary keys to the lower tables, and extract the lower tables and join them onto each other later to build out a large CSV or TXT file to load into SQL.
So far, I have used the jsonlite and rjson packages to extract the JSON itself into R, but trying to delist and unnest the tables within the tables is an enigma to me even after extensive research. I also find myself running into "subscript out of bounds" errors, "unimplemented list" errors and other issues.
It could also very well be the case that the JSON is too large for R's packages or that the JSON is structurally flawed (I wouldn't know if it were; I am not accustomed to JSON). It seems that this could be a problem better solved with Python, though I don't know Python very well and I am optimistic about R given how powerful it is.
Any feedback or answers would be greatly appreciated.
JSON file link: https://individual.carefirst.com/carefirst-resources/machine-readable/Provider_Med_5.json
Code to load JSON:
json2 <- fromJSON('https://individual.carefirst.com/carefirst-resources/machine-readable/Provider_Med_5.json')
The JSON loads correctly, but there are tables embedded within tables. I would hope that these tables could be easily exported and given keys for joining, but I cannot figure out how to de-nest these tables from within the data.
Some nested tables are out of subscript bounds for the data array. I have never encountered this problem and am bewildered as to how to go about resolving the issue.
I cannot figure out how to 'extract' the lower-level tables, let alone open them, due to the subscript bounds error.
I can assign a row ID to the main/highest table in the file, but I cannot figure out how to add sub-row IDs to the lower tables for future joins.
Maybe the jsonStrings package can help. It allows you to manipulate JSON without converting it to an R object. This is the first time I've tried it on such a big JSON string, and it works fine.
Here is how to get the table in the first element of the JSON array:
options(timeout = 300)
download.file(
"https://individual.carefirst.com/carefirst-resources/machine-readable/Provider_Med_5.json",
"jsonFile.json"
)
library(jsonStrings)
# load the JSON file
jstring <- jsonString$new("jsonFile.json")
# extract table "plans" of first element (indexed by 0)
jsonTable <- jstring$at(0, "plans")
# get a dataframe
library(jsonlite)
dat <- fromJSON(jsonTable$asString())
But the dataframe dat has a list column. I don't know how you want to make a CSV with this dataframe.
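One possible way forward, sketched below, is to give the parent table a row ID, unnest the list column into a child table that carries that ID as a foreign key, and write the two tables to separate CSVs. The column name nested_col is a placeholder; check names(dat) for the real one.
library(dplyr)
library(tidyr)
# nested_col is a placeholder for whichever column of dat is a list column
dat_id <- dat %>% mutate(row_id = row_number())                         # primary key for the parent table
child  <- dat_id %>% select(row_id, nested_col) %>% unnest(nested_col)  # child rows keep row_id as a foreign key
write.csv(select(dat_id, -nested_col), "plans.csv", row.names = FALSE)
write.csv(child, "plans_nested.csv", row.names = FALSE)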
I am an absolute beginner in PostgreSQL and PostGIS (databases in general) but have fairly good working experience in R. I have two multi-polygon data sets of vulnerable areas of India from two different sources - one is around 12 GB in .gdb format (let's call it mygdb) and the other is a shapefile of around 2 GB (let's call it myshp). I want to compare the two sets of vulnerability maps and generate some state-wise measures of fit using intersection (I), difference (D), and union (U) between the maps.
I would like to make use of PostGIS functionality (via R) as neither R (crashes!) nor QGIS (too slow) is efficient for this. To start with, I have uploaded both data sets to my PostGIS database; I used ogr2ogr in R to upload mygdb. But I am kind of stuck at this point. My idea is to split both polygon files by state and then apply other functions to get I, U and D. From my search, I think I can use sf functions like st_split, st_intersection, st_difference, and st_union. However, even after splitting, I would imagine that the file sizes will still be too large for R to process, so my questions are:
Is my approach the best way forward?
How can I use sf::st_ functions (e.g. st_split, st_intersection) without importing the data from the database into R?
There are some useful answers to previous relevant questions, like this one for example. But I find it hard to put the steps together from different links and any help with a dummy example would be great. Many thanks in advance.
Maybe you could try loading it as a stars proxy. It doesn't load the file into memory; it works with the data directly from the hard drive.
https://r-spatial.github.io/stars/articles/stars2.html
Not an answer to the question sensu stricto; however, in response to a request in the comments, here is an example of a PostgreSQL/PostGIS query using ST_Intersection, based on OSM data imported into a PostgreSQL database with osm2pgsql:
WITH
highway AS (
  SELECT osm_id, way FROM planet_osm_line WHERE osm_id = 332054927),
dln AS (
  SELECT osm_id, way FROM planet_osm_polygon WHERE "boundary" = 'administrative'
  AND "admin_level" = '4' AND "ref" = 'DS')
SELECT ST_Intersection(dln.way, highway.way) FROM highway, dln
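If you want to drive this from R without pulling the full tables into memory, a sketch along these lines should work; the connection details below are placeholders, and the geometry is computed by PostGIS with only the result returned to R:
library(DBI)
library(RPostgres)
library(sf)
# connection details are placeholders
con <- dbConnect(RPostgres::Postgres(), dbname = "osm", host = "localhost",
                 user = "user", password = "pass")
qry <- "
WITH
highway AS (
  SELECT osm_id, way FROM planet_osm_line WHERE osm_id = 332054927),
dln AS (
  SELECT osm_id, way FROM planet_osm_polygon WHERE \"boundary\" = 'administrative'
  AND \"admin_level\" = '4' AND \"ref\" = 'DS')
SELECT ST_Intersection(dln.way, highway.way) AS geom FROM highway, dln"
res <- st_read(con, query = qry)  # runs in PostGIS; only the intersection geometry comes back to R
dbDisconnect(con)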
I am processing hundreds of 4-band images in R and need help with what is probably a very simple task. As part of the processing, I need to export a single RGB composite that maintains the spatial information of the original GeoTIFF. In other software I've exported a .jgw file, but I need to be able to do this in R. These images will be used as basemaps and fed into another mapping interface. I have searched and searched and can only find how to plotRGB() and how to writeRaster(). plotRGB() loses the spatial information and writeRaster() produces a multi-band image.
Any ideas? There is a built-in example raster in R that can be used:
library(raster)
library(rgdal)
r <- raster(system.file("external/test.grd", package="raster"))
x <- RGB(r)
plotRGB(x) #Is there a way to output this where it will maintain spatial information?
writeRaster(x, filename="file.tif") #This produces a 3-band tiff, not a composite
The writeRaster function can take an options argument to pass options to the underlying GDAL library (e.g., GeoTIFF options are documented here). The option TFW=YES writes out a .tfw world file which appears to be the same thing as a .jgw file.
Now, "composite RGB" isn't standard terminology in the TIFF world; it seems to be specific to "ArcMap" and friends, so it's hard to tell what's really meant by this, but you can generate what one would normally think of as a "standard" RGB TIFF format by specifying that the datatype for the color components be 1-byte unsigned integers (datatype="INT1U"), so the following may do what you want:
writeRaster(RGB(r), filename="file2.tif", datatype="INT1U",
            options="TFW=YES", format="GTiff")
As far as I can tell, unrecognized or misspelled options values don't generate any error messages, so you need to be careful they're all spelled correctly.
Just noting an update to the process using the terra package. The process is very similar, but there are some different parameters.
library(terra)
r <- rast(system.file("ex/logo.tif", package="terra"))
# a little forced as RGB is already assigned in this image...
RGB(r) <- 1:3
# export as GeoTIFF -- again forced due to the input file example...
writeRaster(x = r, filename = "rgb2.tif", datatype = "INT1U", filetype = "GTiff")
I've been using this with NAIP imagery successfully.
I'm having trouble extracting point data from a large shapefile (916.2 MB, 4,618,197 elements - from here: https://earthdata.nasa.gov/data/near-real-time-data/firms/active-fire-data) in R. I'm using readShapeSpatial in maptools to read in the shapefile, which takes a while but eventually works:
worldmap <- readShapeSpatial("shp_file_name")
I then have a data.frame of coordinates that I want to extract data for. However, R is really struggling with this and either loses connection or freezes, even with just one set of coordinates!
pt <-data.frame(lat=-64,long=-13.5)
pt<-SpatialPoints(pt)
e<-over(pt,worldmap)
Could anyone advise me on a more efficient way of doing this?
Or is it the case that I need to run this script on something more powerful (currently using a mac mini with 2.3 GHz processor)?
Many thanks!
By 'point data' do you mean the longitude and latitude coordinates? If that's the case, you can obtain the data underlying the shapefile with:
worldmap@data
You can view this in the same way you would any other data frame, for example:
View(worldmap@data)
You can also access columns in this data frame in the same way you normally would, except you don't need the @data, e.g.:
worldmap$LATITUDE
Finally, it is recommended to use readOGR from the rgdal package rather than maptools::readShapeSpatial as the former reads in the CRS/projection information.
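For example, a rough sketch with readOGR (the dsn and layer values are placeholders for wherever the downloaded shapefile lives):
library(rgdal)
library(sp)
# readOGR reads the CRS along with the geometries (dsn/layer are placeholders)
worldmap <- readOGR(dsn = ".", layer = "shp_file_name")
# points are given as x = longitude, y = latitude, in the same CRS as the polygons
pt <- SpatialPoints(data.frame(long = -13.5, lat = -64),
                    proj4string = CRS(proj4string(worldmap)))
e <- over(pt, worldmap)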