I am using the osmdata package to extract data from Open Street Map (OSM) and turn it into a sf object. Unfortunately, I have not found a way to get the encoding right using the functions of the osmdata and sf package. Currently, I am changing the encoding afterwards via the Encoding function, which is quite cumbersome because it involves a nested loop (over all data frames contained in the object returned from Open Street Map, and over all character columns within these data frames).
Is there a more generic, nicer way to get the encoding right?
The following code shows the problem. It extracts OSM data on pharmacy's in the German city Neumünster:
library(osmdata)
library(sf)
library(purrr)
results <- opq(bbox = "Neumünster, Germany") %>%
add_osm_feature(key = "amenity", value = "pharmacy") %>%
osmdata_sf()
pharmacy_points <- results$osm_points
head(pharmacy_points$addr.city)
My locale and encoding is set as follows:
My current, but unsatisfactory, solution is the following:
encode_osm <- function(list){
# For all data frames in query result
for (df in (names(list)[map_lgl(list, is.data.frame)])) {
last <- length(list[[df]])
# For all columns except the last column ("geometry")
for (col in names(list[[df]])[-last]){
# Change the encoding to UTF8
Encoding(list[[df]][[col]]) <- "UTF-8"
}
}
return(list)
}
results_encoded <- encode_osm(results)
Related
I am new to programming in R and with .shp files.
I am trying to take a subsample / subset of a .shp file that is so big, you can download this file from here: https://www.ine.es/ss/Satellite?L=es_ES&c=Page&cid=1259952026632&p=1259952026632&pagename=ProductosYServicios%2FPYSLayout (select the year 2021 and then go ahead).
I have tried several things but none of them work, neither is it worth passing it to sf because it would simply add one more column called geometry with the coordinates listed and that is not enough for me to put it later in the leaflet package.
I have tried this here but it doesn't work for me:
myspdf = readOGR(getwd(), layer = "SECC_CE_20210101") #It works
PV2 = myspdf[myspdf#data$NCA == 'País Vasco', ] #Dont work
PV2 = myspdf[,myspdf#data$NCA == 'País Vasco'] #Dont work
What I intend is to create a sample of myspdf (with data, polygons, plotorder, bbox and proj4string) but I don't want it from all the NCA values (myspdf#data$NCA), I only want those in which data$NCA are 'País Vasco'
In short, I would like to have a sample for each value of the different NCA column.
Is that possible? someone can help me on this? thank you very much.
I have tried this too but the same thing as before appears to me, all 18 variables appear and all are empty:
Pais_V = subset(myspdf, NCA == 'País Vasco')
dim(Pais_V)
Here's one approach:
library(rgdal)
dlshape=function(shploc, shpfile) {
temp=tempfile()
download.file(shploc, temp)
unzip(temp)
shp.data <- sapply(".", function(f) {
fp <- file.path(temp, f)
return(readOGR(dsn=".",shpfile))
})
}
setwd("C:/temp")
x = dlshape(shploc="https://www2.census.gov/geo/tiger/GENZ2020/shp/cb_2020_us_aitsn_500k.zip", "cb_2020_us_aitsn_500k")
x<-x$. # extract the shapefile
mycats<-c("00","T2","T3","28")
x2<-subset(x, x$LSAD %in% mycats) # subset using the list `mycats`
mypal=colorFactor("Dark2",domain=x2$LSAD)
library(leaflet)
leaflet(x2) %>% addPolygons(weight=.2, color=mypal(x2$LSAD))
dlshape function courtesy of #yokota
Here's another option. This uses the package sf.
myspdf <- st_read("./_data/España_Seccionado2021/SECC_CE_20210101.shp",
as_tibble = T)
Now you can filter this data any way that you filter a data frame. It will still work as spatial data, as well.
Using tidyverse (well, technically dplyr):
myspdf %>% filter(NCA == "País Vasco")
This takes it from 36,334 observations to 1714 observations.
The base R method you tried to use with readOGR will work, as well.
myspdf[myspdf$NCA == "País Vasco",]
I'm working with a feature class dataset extracted from a geodatabase, which I've filtered to my area of interest and intersected with a SpatialPointsDataFrame. In order to export it to a shapefile with WriteOGR I need to format the attribute names and I also want to only select specific columns to export in my final shapefile. I have been running into a lot of errors using standard select or base R subletting techniques. For some reason R doesn't seem to recognize the column names when I try to select. I've tried lots of different methods and can't figure out where I'm going wrong.
```bfcln%>%
+ select(STATEFP,DP2_HC03_V, DP2_HC03V.1)
Error in tolower(use) : object 'STATEFP' not found```
# create a spatial join between bf_pop and or_acs
#check CRS
```crsbf <- bf_pop#proj4string```
# change acs CRS to match bf_pop
```oracs_reprj <- spTransform(or_acs, crsbf)```
# join by spatial attributes
```bf_int <- raster::intersect(bf_pop, oracs_reprj)```
#truncate field names to 10 characters for ESRI formatting
```names(bf_int) <- strtrim(names(bf_int),10)```
#remove duplicates from attribute table
```bfcln <- bf_int[which(!duplicated(bf_int$id)), ]```
After failing with the select() method multiple times, I tried renaming columns.
# rename variables of interest
```bfcln1 <-bfcln%>%
select(DP2_HC03_V)%>%
rename(DP2_HC03_V=pcntunmar)%>%
select(DP2_HC03_V.1)%>%
rename(DP2_HC03_V.1=pcntirsh)
Error in tolower(use) : object 'DP2_HC03_V' not found```
To rename spatial files you'll need to install the package spdplyr.
Similarly to dplyr, you'd do:
df <- df %>%
rename(newName = oldName)
I've a GeoJson file for Peru and it's states (Departamentos in Spanish).
I can plot Peru's states using leaflet, but as the GeoJson file has not all the data I need, I'm thinking of converting it to a data.frame adding the columns of data I need then return it to GeoJson format for plotting.
Data: You can donwload the GeoJson data of Perú from here:
This is the data I'm using and I need to add it a Sales Column with a row for every state ("NOMBDEP" - 24 in total)
library(leaflet)
library(jsonlite)
library(dplyr)
states <- geojsonio::geojson_read("https://raw.githubusercontent.com/juaneladio/peru-geojson/master/peru_departamental_simple.geojson", what = "sp")
I thought of using "jsonlite" package to transform "GeoJson" to a data frame, but getting this error:
library(jsonlite)
states <- fromJSON(states)
Error: Argument 'txt' must be a JSON string, URL or file.
I was expecting that after having a data frame I could be able to do something like:
states$sales #sales is a vector with the sales for every department
states <- toJson(states)
You can use library(geojsonsf) to go to and from GeoJSON and sf
library(geojsonsf)
library(sf) ## for sf print methods
states <- geojsonsf::geojson_sf("https://raw.githubusercontent.com/juaneladio/peru-geojson/master/peru_departamental_simple.geojson")
states$myNewValue <- 1:nrow(states)
geo <- geojsonsf::sf_geojson(states)
substr(geo, 1, 200)
# [1] "{\"type\":\"FeatureCollection\",\"features\":[{\"type\":\"Feature\",\"properties\":{\"COUNT\":84,\"FIRST_IDDP\":\"01\",\"HECTARES\":3930646.567,\"NOMBDEP\":\"AMAZONAS\",\"myNewValue\":1},\"geometry\":{\"type\":\"Polygon\",\"coordinat"
.
You can see myNewValue is in the GeoJSON
You don't need to convert back and forth, you can just add another column to the states SPDF:
states <- geojsonio::geojson_read("https://raw.githubusercontent.com/juaneladio/peru-geojson/master/peru_departamental_simple.geojson", what = "sp")
states$sales <- abs(rnorm(nrow(states), sd=1000))
plot(states, col=states$sales)
Yields this image:
I have a large, un-organized XML file that I need to search to determine if a certain ID numbers are in the file. I would like to use R to do so and because of the format, I am having trouble converting it to a data frame or even a list to extract to a csv. I figured I can search easily if it is in a csv format. So , I need help understanding how to do convert it and extract it properly, or how to search the document for values using R. Below is the code I have used to try and covert the doc,but several errors occur with my various attempts.
## Method 1. I tried to convert to a data frame, but the each column is not the same length.
require(XML)
require(plyr)
file<-"EJ.XML"
doc <- xmlParse(file,useInternalNodes = TRUE)
xL <- xmlToList(doc)
data <- ldply(xL, data.frame)
datanew <- read.table(data, header = FALSE, fill = TRUE)
## Method 2. I tried to convert it to a list and the file extracts but only lists 2 words on the file.
data<- xmlParse("EJ.XML")
print(data)
head(data)
xml_data<- xmlToList(data)
class(data)
topxml <- xmlRoot(data)
topxml <- xmlSApply(topxml,function(x) xmlSApply(x, xmlValue))
xml_df <- data.frame(t(topxml),
row.names=NULL)
write.csv(xml_df, file = "MyData.csv",row.names=FALSE)
I am going to do some research on how to search within R as well, but I assume the file needs to be in a data frame or list to so either way. Any help is appreciated! Attached is a screen shot of the data. I am interested in finding matching entity id numbers to a list I have in a excel doc.
I want to to convert two .shp files into one database that would allow me to draw the maps together.
Also, is there a way to convert .shp files into .csv files? I want to be able to personalize and add some data which is easier for me under a .csv format. What I have in mind if to add overlay yield data and precipitation data on the maps.
Here are the shapefiles for Morocco, and Western Sahara.
Code to plot the two files:
# This is code for mapping of CGE_Morocco results
# Loading administrative coordinates for Morocco maps
library(sp)
library(maptools)
library(mapdata)
# Loading shape files
Mor <- readShapeSpatial("F:/Purdue University/RA_Position/PhD_ResearchandDissert/PhD_Draft/Country-CGE/MAR_adm1.shp")
Sah <- readShapeSpatial("F:/Purdue University/RA_Position/PhD_ResearchandDissert/PhD_Draft/Country-CGE/ESH_adm1.shp")
# Ploting the maps (raw)
png("Morocco.png")
Morocco <- readShapePoly("F:/Purdue University/RA_Position/PhD_ResearchandDissert/PhD_Draft/Country-CGE/MAR_adm1.shp")
plot(Morocco)
dev.off()
png("WesternSahara.png")
WesternSahara <- readShapePoly("F:/Purdue University/RA_Position/PhD_ResearchandDissert/PhD_Draft/Country-CGE/ESH_adm1.shp")
plot(WesternSahara)
dev.off()
After looking into suggestions from #AriBFriedman and #PaulHiemstra and subsequently figuring out how to merge .shp files, I have managed to produce the following map using the following code and data (For .shp data, cf. links above)
code:
# Merging Mor and Sah .shp files into one .shp file
MoroccoData <- rbind(Mor#data,Sah#data) # First, 'stack' the attribute list rows using rbind()
MoroccoPolys <- c(Mor#polygons,Sah#polygons) # Next, combine the two polygon lists into a single list using c()
summary(MoroccoData)
summary(MoroccoPolys)
offset <- length(MoroccoPolys) # Next, generate a new polygon ID for the new SpatialPolygonDataFrame object
browser()
for (i in 1: offset)
{
sNew = as.character(i)
MoroccoPolys[[i]]#ID = sNew
}
ID <- c(as.character(1:length(MoroccoPolys))) # Create an identical ID field and append it to the merged Data component
MoroccoDataWithID <- cbind(ID,MoroccoData)
MoroccoPolysSP <- SpatialPolygons(MoroccoPolys,proj4string=CRS(proj4string(Sah))) # Promote the merged list to a SpatialPolygons data object
Morocco <- SpatialPolygonsDataFrame(MoroccoPolysSP,data = MoroccoDataWithID,match.ID = FALSE) # Combine the merged Data and Polygon components into a new SpatialPolygonsDataFrame.
Morocco#data$id <- rownames(Morocco#data)
Morocco.fort <- fortify(Morocco, region='id')
Morocco.fort <- Morocco.fort[order(Morocco.fort$order), ]
MoroccoMap <- ggplot(data=Morocco.fort, aes(long, lat, group=group)) +
geom_polygon(colour='black',fill='white') +
theme_bw()
Results:
New Question:
1- How to eliminate the boundaries data that cuts though the map in half?
2- How to combine different regions within a .shp file?
Thanks you all.
P.S: the community in stackoverflow.com is wonderful and very helpful, and especially toward beginners like :) Just thought of emphasizing it.
Once you have loaded your shapefiles into Spatial{Lines/Polygons}DataFrames (classes from the sp-package), you can use the fortify generic function to transform them to flat data.frame format. The specific functions for the fortify generic are included in the ggplot2 package, so you'll need to load that first. A code example:
library(ggplot2)
polygon_dataframe = fortify(polygon_spdf)
where polygon_spdf is a SpatialPolygonsDataFrame. A similar approach works for SpatialLinesDataFrame's.
The difference between my solution and that of #AriBFriedman is that mine includes the x and y coordinates of the polygons/lines, in addition to the data associated to those polgons/lines. I really like visualising my spatial data with the ggplot2 package.
Once you have your data in a normal data.frame you can simply use write.csv to generate a csv file on disk.
I think you mean you want the associated data.frame from each?
If so, it can be accessed with the # slot access function. The slot is called data:
write.csv( WesternSahara#data, file="/home/wherever/myWesternSahara.csv")
Then when you read it back in with read.csv, you can try assigning:
myEdits <- read.csv("/home/wherever/myWesternSahara_modified.csv")
WesternSahara#data <- myEdits
You may need to do some massaging of row names and so forth to get it to accept the new data.frame as valid. I'd probably try to merge the existing data.frame with a csv you read in in R, rather than making edits destructively....