I'm wondering if there is an easy way to write a CSV of a point sf object (sf R package) that includes the coordinates.
You can use st_write(input, "output.csv") and it will write a CSV without coordinates. My hack for writing a file with coordinates is:
coords <- st_coordinates(input)
input_dat <- input %>% st_set_geometry(., NULL)
input_dat <- cbind(input_dat, coords)
But it seems there must be a simpler way.
As requested, here is the setup for the code above:
input <- data.frame(ID = 1:10, longitude = rnorm(10), latitude = rnorm(10))
input <- st_as_sf(input, coords = c("longitude", "latitude"))
Jakub Nowosad pointed me to the solution: this GitHub issue, which was solved by Etienne B. Racine.
Apparently GDAL has a flag that allows you to include the coordinates. So:
st_write(input, "output.csv", layer_options = "GEOMETRY=AS_XY")
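A quick way to confirm what that flag does is to write a small object and read the file back. This is only a sketch: the three made-up points and the temporary file path are illustrative assumptions, and `quiet = TRUE` is there just to silence the driver messages.

```r
library(sf)

# Hypothetical toy points standing in for `input`.
pts <- st_as_sf(
  data.frame(ID = 1:3, longitude = c(0.1, 0.2, 0.3), latitude = c(1, 2, 3)),
  coords = c("longitude", "latitude")
)

# Write with the GDAL CSV layer option, then read the plain CSV back.
out <- tempfile(fileext = ".csv")
st_write(pts, out, layer_options = "GEOMETRY=AS_XY", quiet = TRUE)
back <- read.csv(out)

# The coordinates come back as ordinary X and Y columns next to ID.
names(back)
```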
You want a different treatment for POINT simple feature geometries from all other geometry types for something as basic as as.data.frame; I consider that feature creep. I think
cbind(as.data.frame(input), st_coordinates(input))
is easy enough, for this particular case.
st_write is meant to be your portal to all GDAL drivers and nothing more; it will not do anything with the data first, or manipulate GDAL settings by itself.
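To see why POINT geometries are a special case here, compare what st_coordinates() returns for points and for a line. The geometries below are made up for illustration only.

```r
library(sf)

# Two points and one three-vertex line (hypothetical toy geometries).
pt <- st_sfc(st_point(c(1, 2)), st_point(c(3, 4)))
ln <- st_sfc(st_linestring(rbind(c(0, 0), c(1, 1), c(2, 0))))

pt_xy <- st_coordinates(pt)  # one row per feature: columns X, Y
ln_xy <- st_coordinates(ln)  # one row per vertex: columns X, Y, L1

dim(pt_xy)
dim(ln_xy)
```

For points there is exactly one coordinate row per feature, so cbind() with the attribute table lines up. For any other geometry type the row counts differ, which is why the one-liner only works for this particular case.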
There should be an easier way, I agree. The as.data.frame() method for sp objects appends the coordinates and I feel like there should be one for sf objects too. But for now, how about:
input %>%
cbind(., st_coordinates(.)) %>%
st_set_geometry(NULL)
I am new to programming in R and to working with .shp files.
I am trying to take a subset of a very large .shp file. You can download the file from here: https://www.ine.es/ss/Satellite?L=es_ES&c=Page&cid=1259952026632&p=1259952026632&pagename=ProductosYServicios%2FPYSLayout (select the year 2021 and then go ahead).
I have tried several things, but none of them work. Converting it to sf does not seem worthwhile either, because that would simply add one more column called geometry with the coordinates listed, and that is not enough for me to use it later with the leaflet package.
I have tried this here but it doesn't work for me:
myspdf = readOGR(getwd(), layer = "SECC_CE_20210101") #It works
PV2 = myspdf[myspdf@data$NCA == 'País Vasco', ] #Doesn't work
PV2 = myspdf[,myspdf@data$NCA == 'País Vasco'] #Doesn't work
What I intend is to create a subset of myspdf (with data, polygons, plotorder, bbox and proj4string) that does not include all the NCA values (myspdf@data$NCA), only the rows in which NCA is 'País Vasco'.
In short, I would like to have a sample for each value of the different NCA column.
Is that possible? Can someone help me with this? Thank you very much.
I have tried this too, but the same thing happens as before: all 18 variables appear and all are empty:
Pais_V = subset(myspdf, NCA == 'País Vasco')
dim(Pais_V)
Here's one approach:
library(rgdal)
dlshape=function(shploc, shpfile) {
temp=tempfile()
download.file(shploc, temp)
unzip(temp)
shp.data <- sapply(".", function(f) {
fp <- file.path(temp, f)
return(readOGR(dsn=".",shpfile))
})
}
setwd("C:/temp")
x = dlshape(shploc="https://www2.census.gov/geo/tiger/GENZ2020/shp/cb_2020_us_aitsn_500k.zip", "cb_2020_us_aitsn_500k")
x<-x$. # extract the shapefile
mycats<-c("00","T2","T3","28")
x2<-subset(x, x$LSAD %in% mycats) # subset using the list `mycats`
library(leaflet) # load leaflet first: colorFactor() comes from this package
mypal=colorFactor("Dark2",domain=x2$LSAD)
leaflet(x2) %>% addPolygons(weight=.2, color=mypal(x2$LSAD))
dlshape function courtesy of @yokota
Here's another option. This uses the package sf.
library(sf)
myspdf <- st_read("./_data/España_Seccionado2021/SECC_CE_20210101.shp",
as_tibble = TRUE)
Now you can filter this data any way that you filter a data frame. It will still work as spatial data, as well.
Using tidyverse (well, technically dplyr):
library(dplyr)
myspdf %>% filter(NCA == "País Vasco")
This takes it from 36,334 observations to 1714 observations.
The base R method you tried to use with readOGR will work, as well.
myspdf[myspdf$NCA == "País Vasco",]
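The question also asks for one subset per NCA value. A minimal sketch of that, using base R split() on a made-up sf object (the three toy points and their coordinates are hypothetical stand-ins for the real census sections):

```r
library(sf)

# Hypothetical stand-in for myspdf: three features in two regions.
toy <- st_as_sf(
  data.frame(
    NCA = c("País Vasco", "Galicia", "País Vasco"),
    x = c(-2.9, -8.5, -2.0),
    y = c(43.3, 42.9, 43.3)
  ),
  coords = c("x", "y"), crs = 4326
)

# split() returns a named list with one sf object per NCA value.
by_nca <- split(toy, toy$NCA)

names(by_nca)
nrow(by_nca[["País Vasco"]])
```

Each list element keeps its geometry column, so it can be passed on to leaflet in the same way as the full object.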
I need to add some points to a map using the simple points function. The issue is that the points are not added to the map. It's a simple command, and I am following a tutorial where adding points to the map works this way, but not in my case. The plot function draws the Texas choropleth properly, but the next line (points) doesn't add any points to the map:
library(rgdal)
library(rgeos)
library(sp)
companies <- read.csv('geoloc_data_comp.csv', header = T, dec = ',', sep = ';')
states <- readOGR('.', 'states')
plot(states[states@data$stat_name == 'texas',])
points(companies$coords.x1, companies$coords.x2, pch = 21)
First, you should start avoiding rgeos/rgdal because they will stop being maintained. See: https://github.com/r-spatial/evolution
sf is replacing them:
library(sp)
library(sf)
library(spData) #used because I wanted US states
# list of data in spData you have one with US states
data(package = "spData")
If you want to read a shapefile or another GIS format, check sf::st_read() (instead of readOGR()).
# one way with sf
plot(us_states$geometry[us_states$NAME == "Texas"])
# if you want to use the sp way
us_sp <- as(us_states, "Spatial") # convert to sp
plot(us_sp[us_sp@data$NAME == "Texas",])
With sf you have the geometry in one column (see "geometry") instead of having an R S4 object with nested lists (see @data and @polygons).
Before adding points we need to check which CRS our data are in. If you are not familiar with CRSs, I like this website: https://ihatecoordinatesystems.com/
You also have information in the us_states documentation: https://www.rdocumentation.org/packages/spData/versions/2.0.1/topics/us_states
Then you can use:
sp::proj4string(us_sp)
sf::st_crs(us_states)
# This is EPSG 4269 or NAD83
If you want to use points(), the points need to be in this coordinate system (I suspect this explains your trouble, i.e. a different CRS).
You didn't provide data points, so I produced some:
library(osmdata)
#this will just download node matching the key/value place=city
some_city_in_texas <- osmdata::opq(osmdata::getbb("Texas US"),
nodes_only = TRUE) %>%
osmdata::add_osm_feature(key = "place", value = "city") %>%
osmdata::osmdata_sf() #keep them in sf format
# osmdata_sp() also exists
The osmdata class is a bit complicated, but here you just need to know that some_city_in_texas$osm_points provides us with points (to test points()). Now we can check their CRS:
sf::st_crs(some_city_in_texas$osm_points)
As you can see, they are in another CRS, so we need to transform them (you will probably need to do the same with your data).
city_in_texas <- sf::st_transform(some_city_in_texas$osm_points,
4269)
sf uses the simple features standard to store locations, whereas points() wants two vectors, x and y. You should also check the axis order (a common cause of error): R uses x/y (long/lat) and not lat/long.
Here we convert city_in_texas to plain coordinates (if you need to do the reverse, i.e. convert a data frame with X/Y columns into an sf object, look at sf::st_as_sf()).
coords_city <- sf::st_coordinates(city_in_texas)
Finally this works fine now:
plot(us_states$geometry[us_states$NAME == "Texas"])
points(coords_city, pch = 21)
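For a CSV of company locations like the one in the question, the same steps could look roughly like this. The data frame, its column names and the assumption that the coordinates are WGS84 longitude/latitude (EPSG:4326) are all hypothetical, since the original file was not provided.

```r
library(sf)

# Hypothetical company locations, assumed to be lon/lat in EPSG:4326.
companies <- data.frame(
  name = c("a", "b"),
  lon = c(-97.74, -95.37),
  lat = c(30.27, 29.76)
)

# Build an sf object (x = longitude first, then y = latitude).
pts <- st_as_sf(companies, coords = c("lon", "lat"), crs = 4326)

# Reproject to the CRS of the base map (NAD83, EPSG:4269).
pts_nad83 <- st_transform(pts, 4269)

# Plain x/y matrix, which is the form points() accepts.
xy <- st_coordinates(pts_nad83)
```

After plotting the Texas polygon, points(xy, pch = 21) should then draw on top of it, provided the assumed input CRS is right.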
Good resources are https://r-spatial.org/ and https://geocompr.robinlovelace.net/
I have a large polyline shapefile that needs to be dissolved. However, the examples online only relate to polygons not polylines for example gUnaryUnion. I am reading in my shapefile using st_read from the sf package. Any help would be appreciated.
If I understand your question, one option is to use dplyr.
library(sf)
library(dplyr)
# get polyline road data
temp_shapefile <- tempfile()
download.file("https://www2.census.gov/geo/tiger/TIGER2017//ROADS/tl_2017_06075_roads.zip", temp_shapefile)
temp_dir <- tempdir()
unzip(temp_shapefile, exdir = temp_dir)
sf_roads <- read_sf(file.path(temp_dir,'tl_2017_06075_roads.shp'))
Use the RTTYP field to reduce the polyline from ~4000 unique segments to 6 segments.
sf_roads_summarized <- sf_roads %>%
group_by(RTTYP) %>%
summarize()
I managed to achieve this by using st_combine.
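The st_combine route was not shown in code, so here is a minimal sketch of what it might look like. The three toy segments and their RTTYP values are made up for illustration.

```r
library(sf)

# Hypothetical toy data: three road segments in two classes.
roads <- st_sf(
  RTTYP = c("M", "M", "S"),
  geometry = st_sfc(
    st_linestring(rbind(c(0, 0), c(1, 0))),
    st_linestring(rbind(c(1, 0), c(2, 0))),
    st_linestring(rbind(c(0, 1), c(1, 1)))
  )
)

# st_combine() gathers every segment into a single MULTILINESTRING.
all_in_one <- st_combine(roads)

# One combined geometry per class, using base R split().
per_class <- lapply(split(roads, roads$RTTYP), st_combine)

st_geometry_type(all_in_one)
```

Note that st_combine() only collects the segments into one feature and drops the attributes; it does not merge touching lines into longer ones.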
I'm using R to extract latitude and longitude from a data frame and then getting an address using reverse geocoding.
I have some toy data here:
latitude <- c(40.84935,40.76306,40.81423,40.63464,40.71054)
longitude <- c(-73.87119,-73.90235,-73.93443,-73.88090,-73.83765)
x = data.frame(latitude,longitude)
I write a function to do the actual geocoding:
require(ggmap)
get_address <- function(df){
long <- as.numeric(df$longitude)
lat <- as.numeric(df$latitude)
revgeocode(c(long,lat))
}
Then apply:
apply(x,1,get_address)
Using system.time(), this takes about a second. However, I plan to do this for data with over a million observations. If it's going to take a while to run, I don't mind, but since I'm fairly new to this, I never know whether long running times are just an inevitable part of getting the data or are due to poor function design. Is there an obvious way to significantly speed up this operation?
EDIT:
I learned from commenters that I'm going to be limited in the number of free requests (2,500/day) I can make. All of my data comes from New York, and the purpose is to match latitude/longitude coordinates with a borough name. Before I found out about the daily restrictions for free users, I had planned to get the address from Google using lat/long coordinates, extract the zip code from this address, then match the zip to a borough. Does anyone have suggestions on how to do this without using the Google Maps Geocoding API?
You could find a 'spatial' data source of the boroughs, then perform geometric operations to find points in polygons using the sf library
Setting up the data
Find a spatial data source. Here is one of the neighbourhoods in geojson format
library(sf)
sf <- sf::st_read("https://raw.githubusercontent.com/blackmad/neighborhoods/master/new-york-city-boroughs.geojson")
Convert your coordinates into a sf object. I've swapped your lat & lon column order.
latitude <- c(40.84935,40.76306,40.81423,40.63464,40.71054)
longitude <- c(-73.87119,-73.90235,-73.93443,-73.88090,-73.83765)
x = data.frame(longitude, latitude)
sf_x <- sf::st_as_sf(x, coords = c("longitude", "latitude"))
To perform spatial operations, the coordinate reference system needs to match between the two geometries
## set the coordinate reference systems to be the same
st_crs(sf_x) <- st_crs(sf)
Use st_within to find the polygons (neighbourhoods) each point is in
Point-in-polygon calculation
res <- st_within(sf_x, sf) ## return the indexes of sf that sf_x are within
This gives you a sparse list (an sgbp object) of the indexes of the polygons that each point is in
## view the results
sapply(res, function(x) as.character(sf$name[x]))
# [1] "Bronx" "Queens" "Manhattan" "Brooklyn" "Queens"
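The lookup step can also be done in one call with st_join(), which attaches the polygon attributes directly to the points. The sketch below uses two made-up unit squares instead of the borough file so that it runs offline; the names "west" and "east" and the helper sq() are hypothetical.

```r
library(sf)

# Helper that builds a unit square whose lower-left corner is (x0, 0).
sq <- function(x0) {
  st_polygon(list(rbind(
    c(x0, 0), c(x0 + 1, 0), c(x0 + 1, 1), c(x0, 1), c(x0, 0)
  )))
}

# Two adjacent toy zones and three points (no CRS set on either).
zones <- st_sf(name = c("west", "east"), geometry = st_sfc(sq(0), sq(1)))
pts <- st_as_sf(
  data.frame(id = 1:3, x = c(0.5, 1.5, 1.2), y = c(0.5, 0.5, 0.9)),
  coords = c("x", "y")
)

# Left join: each point gets the attributes of the zone it lies within.
joined <- st_join(pts, zones, join = st_within)
joined$name
# [1] "west" "east" "east"
```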
Visual
Confirm with a plot
library(googleway)
x$neighbourhood <- sapply(res, function(x) as.character(sf$name[x]))
mapKey <- "map_api_key"
google_map(key = mapKey) %>%
add_markers(data = x, info_window = "neighbourhood")
Further Reading
Simple features vignette
As far as I know, Google's free API is limited to 2,500 requests per day. Nominatim alternatively is provided by OSM, but without any API in R. But for this amount of data I wouldn't consider a web service. Do you have a licence for ArcGIS?
Maybe you can also condense your function by avoiding the intermediate assignments, like this:
require(ggmap)
get_address <- function(df){
revgeocode(c(as.numeric(df$longitude),as.numeric(df$latitude)))
}
I want to convert two .shp files into one database that would allow me to draw the maps together.
Also, is there a way to convert .shp files into .csv files? I want to be able to personalize and add some data, which is easier for me in .csv format. What I have in mind is to overlay yield data and precipitation data on the maps.
Here are the shapefiles for Morocco, and Western Sahara.
Code to plot the two files:
# This is code for mapping of CGE_Morocco results
# Loading administrative coordinates for Morocco maps
library(sp)
library(maptools)
library(mapdata)
# Loading shape files
Mor <- readShapeSpatial("F:/Purdue University/RA_Position/PhD_ResearchandDissert/PhD_Draft/Country-CGE/MAR_adm1.shp")
Sah <- readShapeSpatial("F:/Purdue University/RA_Position/PhD_ResearchandDissert/PhD_Draft/Country-CGE/ESH_adm1.shp")
# Plotting the maps (raw)
png("Morocco.png")
Morocco <- readShapePoly("F:/Purdue University/RA_Position/PhD_ResearchandDissert/PhD_Draft/Country-CGE/MAR_adm1.shp")
plot(Morocco)
dev.off()
png("WesternSahara.png")
WesternSahara <- readShapePoly("F:/Purdue University/RA_Position/PhD_ResearchandDissert/PhD_Draft/Country-CGE/ESH_adm1.shp")
plot(WesternSahara)
dev.off()
After looking into suggestions from @AriBFriedman and @PaulHiemstra and subsequently figuring out how to merge .shp files, I have managed to produce the following map using the following code and data (for the .shp data, cf. links above).
code:
# Merging Mor and Sah .shp files into one .shp file
MoroccoData <- rbind(Mor@data,Sah@data) # First, 'stack' the attribute list rows using rbind()
MoroccoPolys <- c(Mor@polygons,Sah@polygons) # Next, combine the two polygon lists into a single list using c()
summary(MoroccoData)
summary(MoroccoPolys)
offset <- length(MoroccoPolys) # Next, generate a new polygon ID for the new SpatialPolygonDataFrame object
browser()
for (i in 1: offset)
{
sNew = as.character(i)
MoroccoPolys[[i]]@ID = sNew
}
ID <- c(as.character(1:length(MoroccoPolys))) # Create an identical ID field and append it to the merged Data component
MoroccoDataWithID <- cbind(ID,MoroccoData)
MoroccoPolysSP <- SpatialPolygons(MoroccoPolys,proj4string=CRS(proj4string(Sah))) # Promote the merged list to a SpatialPolygons data object
Morocco <- SpatialPolygonsDataFrame(MoroccoPolysSP,data = MoroccoDataWithID,match.ID = FALSE) # Combine the merged Data and Polygon components into a new SpatialPolygonsDataFrame.
Morocco@data$id <- rownames(Morocco@data)
Morocco.fort <- fortify(Morocco, region='id')
Morocco.fort <- Morocco.fort[order(Morocco.fort$order), ]
MoroccoMap <- ggplot(data=Morocco.fort, aes(long, lat, group=group)) +
geom_polygon(colour='black',fill='white') +
theme_bw()
Results:
New Question:
1- How to eliminate the boundaries data that cuts though the map in half?
2- How to combine different regions within a .shp file?
Thank you all.
P.S.: the community on stackoverflow.com is wonderful and very helpful, especially toward beginners like me :) Just thought of emphasizing it.
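For comparison, with the sf package the merge and the removal of the internal border could be sketched as follows. This is not the code used above: the two unit squares are made-up stand-ins for the Morocco and Western Sahara layers, no CRS is set, and the NAME_0 column and the helper sq() are hypothetical.

```r
library(sf)

# Helper that builds a unit square whose lower-left corner is (x0, 0).
sq <- function(x0) {
  st_polygon(list(rbind(
    c(x0, 0), c(x0 + 1, 0), c(x0 + 1, 1), c(x0, 1), c(x0, 0)
  )))
}

# Two toy layers with the same columns, sharing one edge.
a <- st_sf(NAME_0 = "A", geometry = st_sfc(sq(0)))
b <- st_sf(NAME_0 = "B", geometry = st_sfc(sq(1)))

# rbind() stacks sf objects that have matching columns.
both <- rbind(a, b)

# st_union() dissolves the features, removing the shared border.
outline <- st_union(both)
```

With real data, both layers would be read with st_read() and would need the same columns and the same CRS before rbind() accepts them.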
Once you have loaded your shapefiles into Spatial{Lines/Polygons}DataFrames (classes from the sp-package), you can use the fortify generic function to transform them to flat data.frame format. The specific functions for the fortify generic are included in the ggplot2 package, so you'll need to load that first. A code example:
library(ggplot2)
polygon_dataframe = fortify(polygon_spdf)
where polygon_spdf is a SpatialPolygonsDataFrame. A similar approach works for SpatialLinesDataFrame's.
The difference between my solution and that of @AriBFriedman is that mine includes the x and y coordinates of the polygons/lines, in addition to the data associated with those polygons/lines. I really like visualising my spatial data with the ggplot2 package.
Once you have your data in a normal data.frame you can simply use write.csv to generate a csv file on disk.
I think you mean you want the associated data.frame from each?
If so, it can be accessed with the @ slot access operator. The slot is called data:
write.csv( WesternSahara@data, file="/home/wherever/myWesternSahara.csv")
Then when you read it back in with read.csv, you can try assigning:
myEdits <- read.csv("/home/wherever/myWesternSahara_modified.csv")
WesternSahara@data <- myEdits
You may need to do some massaging of row names and so forth to get it to accept the new data.frame as valid. I'd probably try to merge the existing data.frame with a csv you read in in R, rather than making edits destructively....
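A minimal sketch of the non-destructive merge, using plain data frames in place of the @data slot. The column names ID_1, NAME_1 and precip are hypothetical; the point is only that the row order must be restored after merge(), because the rows of the attribute table have to stay aligned with the polygons.

```r
# Hypothetical attribute table (stand-in for WesternSahara@data).
attrs <- data.frame(ID_1 = c(3, 1, 2), NAME_1 = c("C", "A", "B"))

# Hypothetical extra data read from an edited CSV.
extra <- data.frame(ID_1 = c(1, 2, 3), precip = c(10, 20, 30))

# Remember the original row order before merging.
attrs$.row <- seq_len(nrow(attrs))

# Left join on the key; all.x = TRUE keeps every original row.
merged <- merge(attrs, extra, by = "ID_1", all.x = TRUE, sort = FALSE)

# Restore the original order and drop the helper column.
merged <- merged[order(merged$.row), ]
merged$.row <- NULL

merged
```

The result could then be assigned back to the data slot, possibly after resetting the row names to match the original ones.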