I have downloaded the Corine 2012 land cover data (available here) in order to create an online map via Shiny and Leaflet. I have uploaded the data into my PostgreSQL database and wish to query parts of it for my Shiny application. I have started retrieving the data, but it is quite slow and my main query returns an 80 MB data frame. How can I approach this differently in order to speed up retrieval and reduce the size of the result? My code snippet (getting data for areas with coniferous plants) is:
library(RPostgreSQL)
library(postGIStools)
drv <- dbDriver("PostgreSQL") # loads the PostgreSQL driver
con <- dbConnect(drv, dbname = mydbname, host = myhost, port = myport,
user = myuser, password = mypassword)
# Getting data
coniferous <- get_postgis_query(con, "SELECT id, geom FROM mycorine WHERE code = '312'", geom_name = "geom")
Thank you in advance!
I've had quite a lot of joy using rpostgis and sf when extracting large amounts of vector data from PostGIS into R. Also incorporate ST_Simplify to speed up geometry display in Leaflet:
# set up connection
conn <- dbConnect("PostgreSQL", user = "user", password = "mypass", port = 5432, dbname = "postgis_name")
# dummy query (obviously), including a spatial subset and ST_Simplify to simplify geometry
qry <- "SELECT ST_Simplify(geom,60) AS geom FROM mytable WHERE geom && ST_MakeEnvelope(xmin, ymin, xmax, ymax, EPSG)"
the.data <- st_read_db(conn, query = qry, geom = "geom")
This will return simplified sf objects, which are read in as a data frame and are very quick to load into R.
The above query was against 600,000 polygons and subset by a bounding box that read in about 8,000 of them. It took 0.4 seconds. Obviously this could be done by attribute instead of spatial bounding box (query times may differ though).
https://cran.r-project.org/web/packages/sf/sf.pdf
You should always take into account how much data is reasonable to display and what level of geometrical detail is acceptable at your zoom level etc.
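For the original Corine query, the same pattern can be applied by attribute rather than by bounding box; a minimal sketch, reusing the connection from the question and assuming a simplification tolerance of 100 map units that you would tune to your zoom levels:
library(RPostgreSQL)
library(sf)
# same connection details as in the question
drv <- dbDriver("PostgreSQL")
con <- dbConnect(drv, dbname = mydbname, host = myhost, port = myport,
                 user = myuser, password = mypassword)
# attribute filter plus ST_Simplify; the tolerance of 100 is an assumption to tune
qry <- "SELECT id, ST_Simplify(geom, 100) AS geom FROM mycorine WHERE code = '312'"
coniferous <- st_read_db(con, query = qry, geom = "geom")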
Completely new to R, forgive me -
I'm trying to use R to create some historic OSM data that's stored as an sf (simple feature) object within the R script. I'd like to export this data, once retrieved, as a shapefile (or GeoJSON) readable by QGIS. The reason I am doing this is that extracting specific historic datasets via R's osmdata package seems to me the most efficient way of building data for specific segments of OSM's editing history for a given period. (Let me know if anyone has a faster way of doing this for country-sized batches of data for different years.)
My current code, using a sample dataset, looks like this:
library(osmdata)
library(sf)
q1 <- opq('Sevilla') %>%
  add_osm_feature(key = 'highway', value = 'cycleway')
cway_sev <- osmdata_sp(q1)
sp::plot(cway_sev$osm_lines)
I'm getting two different types of errors:
When I add a specific datetime (like this: q1 <- opq('Sevilla',datetime = "2015-01-01T12:00:00Z") %>%), I'm getting this error:
Error in check_for_error(doc) : General overpass server error; returned:
The data included in this document is from www.openstreetmap.org. The data is made available under ODbL. runtime error: Query timed out in "query" at line 4 after 45 seconds.
Additionally, and I'm guessing more importantly, when I add the function to convert from an sf object to a shapefile
(st_write(cway_sev, "sev_t_1.shp"))
I get this error:
Error in UseMethod("st_write") :
no applicable method for 'st_write' applied to an object of class "c('list', 'osmdata', 'osmdata_sp')"
Any advice? Again, complete R newbie here.
I cannot help you with the timeout on a historic datetime; there may be a server-side issue for all I know (I get the same error, and your construction of the argument seems to follow the documentation).
Regarding the other issues:
When working in the world of {sf} I suggest downloading your data via an osmdata_sf() call; it will be good for your sanity if you avoid mixing the sf and sp worlds unless absolutely necessary.
The returned object will contain not only lines, but also points, polygons and (in your case empty) multi-type objects.
When working with cycle paths, just select the osm_lines object into a new variable; it will contain the Sevilla bike paths with geometry of type linestring.
After checking it visually you can save it as an ESRI Shapefile; note that this is an ancient file format, based on Ashton-Tate dBase IV (a DOS program, for Pete's sake :), and as such it allows data column names of only a limited number of characters, hence the warning.
library(osmdata)
library(dplyr)
library(sf)
sevilla <- opq('Sevilla') %>%
  add_osm_feature(key = 'highway', value = 'cycleway') %>%
  osmdata_sf()
names(sevilla) # note the points, polygons, multilines and multipolygons
# [1] "bbox" "overpass_call" "meta" "osm_points" "osm_lines"
# [6] "osm_polygons" "osm_multilines" "osm_multipolygons"
# just a single geometry type
sevilla_lines <- sevilla$osm_lines
# visual check of lines geometry only / does this look right?
plot(st_geometry(sevilla_lines))
# this will work, with the lines only
st_write(sevilla_lines, "sevilla.shp")
# Writing layer `sevilla' to data source `sevilla.shp' using driver `ESRI Shapefile'
# Writing 555 features with 34 fields and geometry type Line String.
# Warning message:
# In abbreviate_shapefile_names(obj) :
# Field names abbreviated for ESRI Shapefile driver
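If the abbreviated field names bother you, writing GeoJSON instead (also mentioned in the question) avoids the dBase column-name limit; a minimal sketch:
# GeoJSON keeps full column names, so no abbreviation warning
st_write(sevilla_lines, "sevilla.geojson")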
I have a geopackage containing millions of traffic points as well as a highway zone layer. I would like to extract the points inside the highway zones within an R script.
It is possible to query the attribute data using the RSQLite library and dbGetQuery, but it seems the spatial extensions are not available. I'm probably attempting something that is better done another way. Below is a query retrieving data that works, followed by a simple query to replicate the error I receive with any geometry function.
library(RSQLite)
library(dbplyr)
library(sqldf)
#Connect to geopackage
con <- dbConnect(RSQLite::SQLite(), dbname = "Traffic_data.gpkg",
loadable.extensions = TRUE)
#Select friday traffic
Traffic_Friday <- dbGetQuery(con, "SELECT *
                                    FROM Traffic_data_points
                                    WHERE Day_ = 'Friday';")
#But if I include ST_Within:
Traffic_Friday <- dbGetQuery(con, "SELECT *
                                    FROM Traffic_data_points AS tp, highway_buf AS hb
                                    WHERE tp.Day_ = 'Friday' AND ST_Within(tp.geom, hb.geom);")
I get this error: Error in result_create(conn@ptr, statement) : no such function: ST_Within. It is the same with all of the geometry functions.
Is it possible to do something like this?
The answer was straightforward in the end. Instead of using SQLite to connect to the geopackage, one can use the st_read function from the sf library (link to st_read). It also allows an SQL query in which the spatial functions are included.
library(sf)
Traffic_Friday <- st_read("Traffic_data.gpkg",
                          query = "SELECT *
                                   FROM Traffic_data_points AS tp, highway_buf AS hb
                                   WHERE tp.Day_ = 'Friday' AND ST_Within(tp.geom, hb.geom);")
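If the spatial functions turn out not to be available in your GDAL/SQLite build, a hedged alternative is to read both layers with st_read and do the point-in-polygon step in R (a sketch assuming the same layer names as above):
library(sf)
# read the attribute-filtered points and the highway zones as sf objects
traffic_friday <- st_read("Traffic_data.gpkg",
                          query = "SELECT * FROM Traffic_data_points WHERE Day_ = 'Friday'")
zones <- st_read("Traffic_data.gpkg", layer = "highway_buf")
# keep only the points that fall inside at least one zone
traffic_in_zones <- traffic_friday[lengths(st_within(traffic_friday, zones)) > 0, ]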
I'm using R to extract latitude and longitude from a data frame and then getting an address using reverse geocoding.
I have some toy data here:
latitude <- c(40.84935,40.76306,40.81423,40.63464,40.71054)
longitude <- c(-73.87119,-73.90235,-73.93443,-73.88090,-73.83765)
x = data.frame(latitude,longitude)
I write a function to do the actual geocoding:
require(ggmap)
get_address <- function(df){
  long <- as.numeric(df$longitude)
  lat <- as.numeric(df$latitude)
  revgeocode(c(long, lat))
}
Then apply:
apply(x,1,get_address)
Using system.time(), this takes about a second. However, I plan to do this for data with over a million observations. If it's going to take a while to run, I don't mind, but since I'm fairly new to this, I never know whether long running times are just an inevitable part of getting the data or are due to poor function design. Is there an obvious way to significantly speed up this operation?
EDIT:
I learned from commenters that I'm going to be limited in the number of free requests (2,500/day) I can make. All of my data comes from New York, and the purpose is to match latitude/longitude coordinates with a borough name. Before I found out about the daily restrictions for free users, I had planned to get the address from Google using lat/long coordinates, extract the zip code from this address, then match the zip to a borough. Does anyone have suggestions on how to do this without using the Google Maps Geocoding API?
You could find a 'spatial' data source of the boroughs, then perform geometric operations to find the points in polygons using the sf library.
Setting up the data
Find a spatial data source. Here is one of the neighbourhoods in geojson format
library(sf)
sf <- sf::st_read("https://raw.githubusercontent.com/blackmad/neighborhoods/master/new-york-city-boroughs.geojson")
Convert your coordinates into an sf object. I've swapped your lat & lon column order.
latitude <- c(40.84935,40.76306,40.81423,40.63464,40.71054)
longitude <- c(-73.87119,-73.90235,-73.93443,-73.88090,-73.83765)
x = data.frame(longitude, latitude)
sf_x <- sf::st_as_sf(x, coords = c("longitude", "latitude"))
To perform spatial operations, the coordinate reference system needs to match between the two geometries
## set the coordinate reference systems to be the same
st_crs(sf_x) <- st_crs(sf)
Point-in-polygon calculation
Use st_within to find the polygons (neighbourhoods) each point is in:
res <- st_within(sf_x, sf) ## return the indexes of sf that sf_x are within
This gives you a sparse matrix of the indexes of the polygons that each point is in
## view the results
sapply(res, function(x) as.character(sf$name[x]))
# [1] "Bronx" "Queens" "Manhattan" "Brooklyn" "Queens"
Visual
Confirm with a plot
library(googleway)
x$neighbourhood <- sapply(res, function(x) as.character(sf$name[x]))
mapKey <- "map_api_key"
google_map(key = mapKey) %>%
add_markers(data = x, info_window = "neighbourhood")
Further Reading
Simple features vignette
As far as I know, Google's free API is limited to 2,500 requests per day. Nominatim, provided by OSM, is an alternative, but it has no R API. For this amount of data I wouldn't consider a web service anyway. Do you have a licence for ArcGIS?
Maybe you can also streamline your function by avoiding the intermediate assignments, like this:
require(ggmap)
get_address <- function(df){
  revgeocode(c(as.numeric(df$longitude), as.numeric(df$latitude)))
}
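As a usage sketch (untested against the API quota), you could also call revgeocode over the two columns with mapply, which sidesteps apply() coercing the data frame to a character matrix:
# one address string per row; longitude and latitude are passed as numerics
addresses <- mapply(function(lon, lat) revgeocode(c(lon, lat)),
                    x$longitude, x$latitude)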
I am new to the world of spatial analysis using R. Using this link I have downloaded OSM data in .osm.pbf format. Then I used the osm2pgsql tool to load the data into PostgreSQL (with the PostGIS extension). Now I have several tables in my database, and I want to access the polygons table in R and then perform spatial analysis on the polygon vector data. I have been searching around a lot but have not been able to import the required data into R. I found this tutorial, which is quite similar to what I am looking for, but it's in Python. I want to access polygon data from PostGIS using R.
Therefore, essentially I would like to know how R interacts with PostGIS. Can anybody recommend a book on this topic? I couldn't find a blog or tutorial that works for me on my Windows 10 64-bit machine.
Thanks for your time, and looking forward to your suggestions.
I have still not found a way to get the required data from PostGIS using the rgdal package available in R. It is probably because of OS issues (I am not exactly sure, as I am not an expert). But I have found an alternative to rgdal, and it has done exactly what I wanted it to do. The code is as follows:
library(RPostgreSQL)
library(rgeos)
library(sp)
# Load data from the PostGIS server
conn = dbConnect(
dbDriver("PostgreSQL"), dbname="dbNAME", host="localhost", port=5432,
user="username", password="pw"
)
strSQL = "SELECT osm_id, name, area, highway, railway, place, ST_AsText(way) AS wkt_geometry FROM table"
df = dbGetQuery(conn, strSQL)
#Geometry column as an R list
geo_col = df$wkt_geometry
polygon_list = suppressWarnings(lapply(geo_col, function(x){
x <- gsub("POLYGON\\(\\(", "", x)
x <- gsub("\\)", "", x)
x <- strsplit(x, ",")[[1]]
#Now each polygon has been parsed by removing POLYGON(( from the start and )) from the end
#Now for each POLYGON its x values and y values are to be extracted to form a Polygon object
xy <- strsplit(x, " ")
v_xy = suppressWarnings(sapply(xy, function(p){
xValue = p[1]
yValue = p[2]
vec = c(xValue, yValue)
}))
#Now we have all x values in first column of v_xy and all y values in second column of v_xy
#Let us make the Polygon object now
p_xvalues = as.numeric(v_xy[1, ])
p_yvalues = as.numeric(v_xy[2, ])
p_object <- Polygon(cbind(p_xvalues, p_yvalues))
}))
#Now we have all of the polygons in polygon object format
#Let us join it with main data frame, i.e. df
df$object_polygon <- polygon_list
#View(df)
#Now Let us form SpatialPolygons() object out of it
Ps_list = list()
for (i in seq(nrow(df))) {
Ps_list[[i]] <- Polygons(polygon_list[i], ID=df[i,][1])
}
SPs = SpatialPolygons(Ps_list)
#Now FINALLY it is time to form the SpatialPolygonsDataFrame
row.names(df) = df$osm_id
SPDF = SpatialPolygonsDataFrame(Sr = SPs, data = df[, 1:6], match.ID = TRUE)
Therefore, essentially I had to write a parser to get the required data, which readOGR() does in one line.
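For what it is worth, newer sf versions can read the same query straight into an sf object in one call, much like readOGR(); a sketch assuming a DBI connection that sf supports (here opened with RPostgres) and the same table and columns as above:
library(RPostgres)
library(sf)
# a DBI connection that sf can read from directly
conn2 <- dbConnect(RPostgres::Postgres(), dbname = "dbNAME", host = "localhost",
                   port = 5432, user = "username", password = "pw")
# the geometry comes back as an sfc column, so no WKT parsing is needed
polygons_sf <- st_read(conn2, query = "SELECT osm_id, name, area, highway, railway, place, way AS geom FROM table")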
I don't know R, but I am very familiar with PostGIS. PostGIS is just SQL functions; if you can use SELECT, you can access PostGIS.
So if you want to access polygons, just create a new function in Postgres.
But if you want to display the polygons, you need to check which functions are available in R; I usually use the OpenLayers JavaScript API to show my results in a web page.
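To make that concrete on the R side, any PostGIS function is just SQL that dbGetQuery() can run over the same connection (a sketch; ST_AsGeoJSON is only one example of such a function, handy for web display):
# PostGIS functions are ordinary SQL, so they can be called directly from R
geojson <- dbGetQuery(conn, "SELECT osm_id, ST_AsGeoJSON(way) AS geojson FROM table")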
I have read so many threads and articles and I keep getting errors. I am trying to make a choropleth(?) map of the world using data I have from the Global Terrorism Database. I want to color countries by nkill or just by the number of attacks in that country; I don't care which at this point. Because there are so many countries with data, it is unreasonable to show this data with any other kind of plot.
Help is strongly appreciated and if I did not ask this correctly I sincerely apologize, I am learning the rules of this website as I go.
my code (so far..)
library(maps)
library(ggplot2)
map("world")
world <- map_data("world")
gtd <- data.frame(gtd)
names(gtd) <- tolower(names(gtd))
gtd$country_txt <- tolower(rownames(gtd))
demo <- merge(world, gtd, sort = FALSE, by = "country_txt")
In the gtd data frame, the name of the countries column is "country_txt", so I thought I would use that, but I get error in fix.by(by.x, x) : 'by' must specify a uniquely valid column.
If that were to work, I would plot as I have seen on a few websites.
I have honestly been working on this for so long, and I have read so much code, so many similar questions, websites, R handbooks, etc. I will gladly accept that I am incompetent when it comes to R in exchange for some help.
Something like this? This is a solution using rgdal and ggplot. I long ago gave up on using base R for this type of thing.
library(rgdal) # for readOGR(...)
library(RColorBrewer) # for brewer.pal(...)
library(ggplot2)
setwd(" < directory with all files >")
gtd <- read.csv("globalterrorismdb_1213dist.csv")
gtd.recent <- gtd[gtd$iyear>2009,]
gtd.recent <- aggregate(nkill~country_txt,gtd.recent,sum)
world <- readOGR(dsn=".",
layer="world_country_admin_boundary_shapefile_with_fips_codes")
countries <- world@data
countries <- cbind(id=rownames(countries),countries)
countries <- merge(countries,gtd.recent,
by.x="CNTRY_NAME", by.y="country_txt", all.x=T)
map.df <- fortify(world)
map.df <- merge(map.df,countries, by="id")
ggplot(map.df, aes(x=long,y=lat,group=group)) +
geom_polygon(aes(fill=nkill))+
geom_path(colour="grey50")+
scale_fill_gradientn(name="Deaths",
colours=rev(brewer.pal(9,"Spectral")),
na.value="white")+
coord_fixed()+labs(x="",y="")
There are several versions of the Global Terrorism Database. I used the full dataset available here, and then subsetted for year > 2009. So this map shows total deaths due to terrorism, by country, from 2010-01-01 to 2013-01-01 (the last data available from this source). The files are available as MS Excel download, which I converted to csv for import into R.
The world map is available as a shapefile from the GeoCommons website.
The tricky part of making choropleth maps is associating your data with the correct polygons (countries). This is generally a four step process:
Find a field in the shapefile attribute table that maps (no pun intended) to a corresponding field in your data. In this case, it appears that the field "CNTRY_NAME" in the shapefile maps to the field "country_txt" in the gtd database.
Create an association between polygon IDs (stored in the row names of the attribute table) and the CNTRY_NAME field.
Merge the result with your data using CNTRY_NAME and country_txt.
Merge the result of that with the data frame created using fortify(map) - this associates polygons with deaths (nkill).
Building on the nice work by @jlhoward, you could instead use rworldmap, which already has a world map in R and functions to aid joining data to the map. The default map is deliberately low resolution to create a 'cleaner' look. The map can be customised (see the rworldmap documentation), but here is a start:
library(rworldmap)
#3 lines from @jlhoward
gtd <- read.csv("globalterrorismdb_1213dist.csv")
gtd.recent <- gtd[gtd$iyear>2009,]
gtd.recent <- aggregate(nkill~country_txt,gtd.recent,sum)
#join data to a map
gtdMap <- joinCountryData2Map( gtd.recent,
nameJoinColumn="country_txt",
joinCode="NAME" )
mapDevice('x11') #create a world shaped window
#plot the map
mapCountryData( gtdMap,
nameColumnToPlot='nkill',
catMethod='fixedWidth',
numCats=100 )
Following a comment from @hk47, you can also add the points to the map sized by the number of casualties.
deaths <- subset(x=gtd, nkill >0)
mapBubbles(deaths,
nameX='longitude',
nameY='latitude',
nameZSize='nkill',
nameZColour='black',
fill=FALSE,
addLegend=FALSE,
add=TRUE)