I'm trying to perform a dissolve in R. I've previously done this in QGIS but I want to achieve this in R to integrate with the rest of my workflow if possible.
I have an ESRI shapefile with small geographical polygons (output areas, if you're familiar with UK census geography). I also have a lookup table provided to me with a list of all OA codes with their associated aggregated geography code.
I can't provide the actual files I'm working on, but comparable files and a minimal reproducible example are below:
https://www.dropbox.com/s/4puoof8u5btigxq/oa-soa.csv?dl=1 (130kb csv)
https://www.dropbox.com/s/xqbi7ub2122q14r/soa.zip?dl=1 (~4MB shp)
And code:
require("rgdal") # for readOGR
require("rgeos") # for gUnion
require("maptools")
unzip("soa.zip")
soa <- readOGR(dsn = "soa", "england_oac_2011")
proj4string(soa) <- CRS("+init=epsg:27700") # British National Grid
lookup <- read.csv("oa-soa.csv")
slsoa <- gUnaryUnion(soa, id = lookup$LSOA11CD)
I've also tried:
slsoa <- unionSpatialPolygons(soa, lookup$LSOA11CD)
but my understanding is that since I have (R)GEOS installed this uses the gUnion methods from the rgeos package anyway.
So, my problem is that the dissolve appears to work; I don't get an error message and the length() function suggests I now have fewer polygons:
length(soa@polygons)   # 1,817
length(slsoa@polygons) # should be 338
but the plots appear to be the same (i.e. the internal dissolves haven't worked), as demonstrated by the following two plots:
plot(soa)
plot(slsoa)
I've looked around on the internet and stackoverflow to see if I can solve my issue and found several articles but without success.
problems when unioning and dissolving polygons in R (I don't think the quality of the shapefile is the problem because I'm using a lookup table to match geographies).
https://www.nceas.ucsb.edu/scicomp/usecases/PolygonDissolveOperationsR (uses two sp objects, not lookup table).
https://gis.stackexchange.com/questions/93441/problem-with-merging-and-dissolving-a-shapefile-in-r (as far as I can tell I've followed the relevant steps)
Does anyone have any idea what I'm doing wrong and why the plots aren't working correctly?
Thanks muchly.
First, your soa shapefile has 1817 elements, each with a unique code (corresponding to lookup$OA11CD). But your lookup file has only 1667 rows. Obviously, lookup does not have "a list of all OA codes".
Second, unless lookup has the same codes as your shapefile in the same order, using gUnaryUnion(...) this way will yield garbage. You need to merge soa@data with lookup on the corresponding fields first.
Third, gUnaryUnion(...) cannot remove internal boundaries if the polygons are not contiguous (obviously).
This seems to work:
# merge the lookup onto the shapefile's attribute table by the matching code fields
soa <- merge(soa, lookup, by.x = "code", by.y = "OA11CD", all.x = TRUE)
# dissolve on the (now aligned) LSOA codes
slsoa <- gUnaryUnion(soa, id = soa$LSOA11CD)
length(slsoa)
# [1] 338
par(mfrow=c(1,2),mar=c(0,0,0,0))
plot(soa);plot(slsoa)
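As a quick visual check (a minimal sketch; the colours are just illustrative), you can also overlay the dissolved boundaries on the original polygons to confirm the internal OA boundaries really are gone:
# original OAs in grey, dissolved LSOA boundaries on top in red
plot(soa, border = "grey")
plot(slsoa, border = "red", lwd = 2, add = TRUE)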
I have a shiny app that I am building with interactive maps in it. I am primarily using leaflet to do this.
I added new data (statistics and whatnot) to a SpatialPolygonsDataFrame, in the map@data slot of the df (map being the SpatialPolygonsDataFrame).
map <- readOGR("FilePath", "map")
# merge the processed dataframe with the map's data
map@data = merge(map@data, stats, by = 'name', all = TRUE)
name     OBJECTID   ID_NUM   SHAPE_Leng   SHAPE_Area   Stat1   Stat2
First    1          3        x1           y1           .90     32
Second   2          4        x2           y2           .85     33
the R package Leaflet will not apply the correct data to the correct area on the map unless I write:
map@data <- map@data[order(map@data$OBJECTID),]
It needs to be sorted in descending order on this column OBJECTID to show the labels correctly.
Why does sorting map differently, mostly for labeling purposes, cause leaflet to label the shapes incorrectly? (This question is for understanding this type of data and package better).
All of the examples I have seen don't have this issue, and they either hide or don't have code that sorts the data ascending or descending for the legend.
This sorting affects the legend too.
How can I sort the legend (which shows the data I added to the df) the way I need it to, be it through leaflet or by manipulating map with dplyr?
It seems you are using the older {sp} package workflow. It does not enforce as strict a pairing between the data (in the @data slot) and the geometry as the newer {sf} package does. {sp} is older, and rather risky (for exactly this reason); while it is maintained (mainly to give users of existing workflows time to migrate), it is not actively developed - the action happens in {sf}. If your packages are at the current version, you should be seeing a big fat deprecation warning when you load {rgdal} (where rgdal::readOGR() lives).
I suggest you update your workflow to {sf}, which has much tighter pairing between geometry (in a special list column) and regular tabular data (other columns).
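A minimal sketch of what that could look like, using the "FilePath", stats and 'name' names from your code (adjust to your actual file and columns): read the polygons with sf::st_read(), join the statistics with dplyr::left_join() so each row's attributes stay attached to its geometry, and pass the sf object straight to leaflet - no manual re-ordering needed.
library(sf)
library(dplyr)
library(leaflet)

# read the polygons; sf keeps geometry and attributes in one data frame
map_sf <- st_read("FilePath")

# join the statistics onto the attribute columns; the row/geometry pairing is preserved
map_sf <- left_join(map_sf, stats, by = "name")

# leaflet accepts sf objects directly
leaflet(map_sf) %>%
  addPolygons(label = ~name)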
I have two spatial datasets. One dataset contains lots of polygons (more than 150k in total) specifying different features, like rivers and vegetation. The other dataset contains far fewer polygons (500) specifying different areas.
I need to intersect those two datasets to get the features in the different areas.
I can subset the first dataset by the different features. If I use a subset of a small feature (2,500 polygons), the intersection with the areas is quite fast (5 min). But if I want to intersect a bigger feature subset (20,000 polygons), the computation runs really long (I terminated it after two hours). And this is not even the biggest feature (50,000 polygons) I need to intersect.
This is the code snippet I run:
clean_intersect_save = function(geo_features, areas) {
  # make geometries valid
  data_valid_geoms = st_parallel(sf_df = st_geometry(geo_features),
                                 sf_func = st_make_valid,
                                 n_cores = 4)
  # keep only the needed attribute column and re-attach the cleaned geometries
  data_valid = st_drop_geometry(geo_features) %>% select("feature")
  data_valid = st_sf(data_valid, geometry = data_valid_geoms)
  # intersect the areas with the cleaned geo-features
  data_valid_split = st_parallel(sf_df = areas,
                                 sf_func = st_intersection,
                                 n_cores = 4,
                                 data_valid)
  # save shp file
  st_write(data_valid_split, "data_valid_split.shp")
  return(data_valid_split)
}
Where both inputs are sf data frames.
st_parallel is a function I found here.
My question is: How would experienced spatial data people solve such a task usually? Do I just need more cores and/or more patience? Am I using sf wrong? Is R/sf the wrong tool?
Thanks for any help.
This is my very first spatial data analysis project, so sorry if I overlook some obvious things.
As a real answer to this vague question probably won't appear, I will answer it myself.
Thanks @Chris and @TimSalabim for the help. I ended up with a combination of both ideas.
I ended up using PostGIS, which is, in my experience, a pretty intuitive way to work with spatial data.
The three things which sped up the intersection calculations for me are:
In my case the spatial data was stored as MULTIPOLYGONs when loaded from the shapefile. I expanded those into POLYGONs using ST_Dump:
https://postgis.net/docs/ST_Dump.html
I created a Spatial Index on the POLYGONS: https://postgis.net/workshops/postgis-intro/indexing.html
I used a combination of ST_Intersection and ST_Intersects to only call the costly ST_Intersection when really needed (as @TimSalabim suggested, this approach could also speed up things in R, but I currently have no time to test it; see the R sketch after this list): https://postgis.net/2014/03/14/tip_intersection_faster/
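For the R side, a minimal, untested sketch of that ST_Intersects-before-ST_Intersection idea with {sf}, using the geo_features/areas names from the function above:
library(sf)

# cheap predicate test first: which feature polygons touch any of the areas at all?
hits <- st_intersects(areas, geo_features)

# only run the expensive geometric intersection on the candidate features
candidates <- unique(unlist(hits))
result <- st_intersection(areas, geo_features[candidates, ])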
I'm having trouble extracting point data from a large shape file (916.2 Mb, 4618197 elements - from here: https://earthdata.nasa.gov/data/near-real-time-data/firms/active-fire-data) in R. I'm using readShapeSpatial in maptools to read in the shape file which takes a while but eventually works:
worldmap <- readShapeSpatial("shp_file_name")
I then have a data.frame of coordinates that I want extract data for. However R is really struggling with this and either loses connection or freezes, even with just one set of coordinates!
pt <- data.frame(lat = -64, long = -13.5)
pt <- SpatialPoints(pt)
e <- over(pt, worldmap)
Could anyone advise me on a more efficient way of doing this?
Or is it the case that I need to run this script on something more powerful (currently using a mac mini with 2.3 GHz processor)?
Many thanks!
By 'point data' do you mean the longitude and latitude coordinates? If that's the case, you can obtain the data underlying the shapefile with:
worldmap@data
You can view this in the same way you would any other data frame, for example:
View(worldmap@data)
You can also access columns in this data frame in the same way you normally would, except you don't need the @data, e.g.:
worldmap$LATITUDE
Finally, it is recommended to use readOGR from the rgdal package rather than maptools::readShapeSpatial as the former reads in the CRS/projection information.
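For example, a minimal sketch (the dsn = "." is an assumption; point it at whatever folder holds your shp_file_name files):
library(rgdal)

# readOGR picks up the .prj file, so the CRS comes along with the geometry
worldmap <- readOGR(dsn = ".", layer = "shp_file_name")
proj4string(worldmap)  # check that the projection was read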
I've dug around a lot for an answer to this and wasn't able to find anything, so here I am.
I have a whole bunch of ascii raster files corresponding to air temperature and dew point temperature of a certain area over 744 hourly time steps. (So I have 744 air temp and 744 dew point files corresponding to a 31-day month). The files are only about 45 kB each.
I want to stack them together so I can perform some analyses on them, and I also want to convert their units from K to deg F.
The file names are Tair1.txt, Tair2.txt, ..., Tair744.txt and Eair1.txt, Eair2.txt, ..., Eair744.txt.
Using the raster package, I can easily load all the files as rasters:
for (i in 1:744) {
  assign(paste0("Tair", i), raster(paste0("Tair", i, ".txt")))
  assign(paste0("Eair", i), raster(paste0("Eair", i, ".txt")))
}
I've tried to use ls() with pattern or glob2rx to define just the raster file names and then do conversions on them, or to do something similar to join them in a stack, but to no avail. I also tried mget, values(mget(filename)) and things like that to get at the values in a loop.
I know R doesn't handle large datasets very well, but I'm thinking these aren't really that large so there should be something pretty simple?
I would appreciate any help and advice! Thank you.
The raster package's RasterStack is for this:
library(raster)
files <- paste0("Tair",1:744,".txt")
rs <- stack(files)
Why do you have these files in text format though? Who imposed this disaster on you? I suspect your individual layers have insufficient metadata, so try one and see if it's sensible. You can use extent(rs) <- and projection(rs) <- to fix:
r <- raster(files[1])
print(r)
Don't use assign(); that's just creating a mess.
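For the dew point files and the Kelvin-to-Fahrenheit part of your question, a minimal sketch (the eair_files/es names are mine; rs is the Tair stack from above). Raster arithmetic is applied cell-by-cell across every layer of a stack:
eair_files <- paste0("Eair", 1:744, ".txt")
es <- stack(eair_files)

# convert both stacks from Kelvin to degrees Fahrenheit
rs_f <- (rs - 273.15) * 9/5 + 32
es_f <- (es - 273.15) * 9/5 + 32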
My goal is very simple, namely to add 1 column of statistical data to a shapefile so that I can use it for example to colour a geographical area. The data are a country file from gadm. To this end I usually use the foreign package in R thus:
library(foreign)
newdbf <- read.dbf("CHN_adm1.dbf") #original shape file
incrdata <- read.csv("CHN_test.csv") #.csv file with same region names column + new data column
mergedbf <- merge(newdbf,incrdata)
write.dbf(mergedbf,"CHN_New")
This achieves what I want in almost all circumstances, but one of the pieces of software I am dealing with, external to R, will only recognize .shp files and will not read .dbf files (although clearly, in a sense, that statement is a slight contradiction). Not sure why it won't. Anyhow, it essentially leaves me needing to do the same thing as above, but with a shapefile. According to the notes for the shapefiles package, I think the process should run something like this:
library(shapefiles)
shaper <- read.shp("CHN_adm1.shp")
simplified <- convert.to.simple(shaper)
simplified <- change.id(simplified,incrdata$DataNew) #DataNew being new column of data from the .csv
simpleAsList <- by(simplified,simplified[,1],function(x)x)
####This is where I hit problems####
backToShape <- convert.to.shapefile(simplified,
data.frame(index=c("20","30","40","50","60","70","80")),"index",5)
write.shapefile(backToShape,"CHN_TestShape")
I'm afraid that I can't get my head around shapefiles, since I can't unpick them or visualize them in a way I can with dataframes, and so the resultant shape has been screwed up when it goes back to the external charting package.
To be clear: in 'backToShape' I just want to add the column of data and reconstruct the shapefile. It so happens that the data I have appears as a factor, i.e. 20, 30, 40, etc., but the data could just as easily be continuous, and I'm sure I don't need to type in all the possibilities, but it was the only way I could seem to get it accepted. Can somebody please put me on the right track, and if I'm missing a simpler way, I'd also be extremely grateful to hear a suggestion. Many thanks in advance.
Stop using the shapefiles package.
Install the sp and rgdal packages.
Read shapefile with:
library(rgdal)
chn = readOGR(".", "CHN_adm1") # first arg is path, second is shapefile name w/o .shp
Now chn is like a data frame. In fact chn@data is a data frame. Do what you like to that data frame but keep it in the same order, and then you can save the updated shapefile with the new data by:
writeOGR(chn, ".", "CHN_new", driver="ESRI Shapefile")
Note you shouldn't really manipulate the chn@data data frame directly; you can work with chn like it is a data frame in many respects. For example, chn$foo gets the column named foo, and chn$popden = chn$pop/chn$area would create a new column of population density if you have population and area columns.
spplot(chn, "popden")
will map by the popden column you just created, and:
head(as.data.frame(chn))
should show you the first few lines of the shapefile data.
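Putting that together for your original use case, a minimal sketch (assuming the CSV shares a region-name column; NAME_1 is a guess based on typical GADM admin-1 files, so adjust to your actual column names, and DataNew is the new column from your CSV):
library(rgdal)

chn <- readOGR(".", "CHN_adm1")
incrdata <- read.csv("CHN_test.csv")

# add the new column by matching on the shared name column, so the row order is untouched
chn$DataNew <- incrdata$DataNew[match(chn$NAME_1, incrdata$NAME_1)]

spplot(chn, "DataNew")  # quick check that the values landed on the right regions
writeOGR(chn, ".", "CHN_new", driver = "ESRI Shapefile")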