Display a subset of regions using a shapefile in R

I have a shapefile of the UK: https://geoportal.statistics.gov.uk/Docs/Boundaries/Local_authority_district_(GB)_2014_Boundaries_(Generalised_Clipped).zip
I've read the shapefile into a variable, UK
>UK <- readOGR(dsn = "....."
>England <- UK
I'd like to only display English Local Authority regions. They are specified in the LAD_DEC_2014_GB_BGC.dbf where LAD14CD starts with "E"
>UK@data
LAD14CD LAD14NM LAD14NMW
0 E06000001 Hartlepool <NA>
1 E06000002 Middlesbrough <NA>
2 E06000003 Redcar and Cleveland <NA>
371 W06000015 Cardiff Caerdydd
>#filter UK@data and replace England@data with only English regions
>England@data <- UK@data$LAD14CD[c(grep("^E", UK$LAD14CD))]
>plot(England)
But the grep command appears to change the shapefile into a factor, meaning the plot looks like this:

With this command:
England <- UK@data$LAD14CD[c(grep("^E", UK$LAD14CD))]
...you are subsetting just one column from the data slot, not the whole shapefile, and assigning that to England.
This ought to do the job:
England <- UK[grep("^E", UK@data$LAD14CD),]
Note that you need the trailing comma in there! Also, you don't need to wrap the grep statement in c(); it doesn't hurt, it's just unnecessary.
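The difference between indexing a column and indexing rows can be seen on an ordinary data frame. The sketch below uses a small hypothetical table (lad) standing in for the shapefile's attribute table; only the four rows shown in the question are used, and the object names are made up for illustration. A SpatialPolygonsDataFrame accepts the same row-index form, as in the answer above.

```r
# Hypothetical stand-in for the shapefile's attribute table (UK@data).
lad <- data.frame(
  LAD14CD = c("E06000001", "E06000002", "E06000003", "W06000015"),
  LAD14NM = c("Hartlepool", "Middlesbrough", "Redcar and Cleveland", "Cardiff"),
  stringsAsFactors = TRUE
)

# Integer positions of the rows whose code starts with "E".
idx <- grep("^E", lad$LAD14CD)

# Indexing the column returns a factor, not a table.
codes_only <- lad$LAD14CD[idx]
class(codes_only)

# Indexing rows (note the trailing comma) keeps every column.
english <- lad[idx, ]
class(english)

# grepl() returns a logical vector, which works as a row index too.
english2 <- lad[grepl("^E", lad$LAD14CD), ]
```

Either grep() or grepl() can be used for the row index; grepl() is the form that subset() expects.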

I ended up using dplyr and grepl instead to make things simpler:
library('rgdal')
library('dplyr')
UK <- readOGR(dsn="LAD_DEC_2014_GB_BGC.shp", layer="LAD_DEC_2014_GB_BGC") %>%
subset(grepl("^E", LAD14CD))
plot(UK)

Related

Conditionally Fill a Column based on Another Column

I have a dataframe (df) where in one column I have US states by their two letter acronym; 'AK','AL','AR','AZ','CA', ..., 'WV','WY'.
I want to create a new column that reads the 'df$state' column and apply a region: West, Midwest, Northeast, Southeast, Southwest.
I have the regions broken down into lists (for example:
list_southwest <- c('TX','AZ','NM','OK')
I duplicated the 'df$state' column and renamed it 'df$region'. What I want to do is replace the two-letter state elements with regions and not do it state-by-state.
I have been successful with the code: df$region[df$region == 'TX'] <- "Southwest"
But I'd like to go faster, so I tried: df$region[df$region == 'list_west'] <- "Southwest"
in an attempt to check the column for all the two-letter strings in "list_west", but nothing gets replaced and I don't receive an error of any kind.
I've also tried the tedious:
df$region[df$region == 'TX', 'AZ', ... but R doesn't seem to like that. I've tried replacing the commas with |, &&, and ||, with no luck.
I was thinking there might be a way to add a for loop and case_when(), and a lot of other things, but I'm stuck. Any help would be greatly appreciated!
Here's what I'm hoping for without having to run a line of code per each individual state:
state | region
AK | West
AL | South
AR | South
AZ | West
CA | West
CO | West
CT | NorthEast
SOLVED!!
Here's how the code looks after a comment to use %in% versus ==:
df$region [df$region %in% list_west] <- "West"
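For several regions at once, the %in% fix can be repeated in a loop, or the region vectors can be flattened into a single lookup. This is a minimal sketch in base R; the data frame, the regions list, and the state groupings are illustrative assumptions, not the asker's actual lists.

```r
# Hypothetical data: a handful of state codes standing in for df$state.
df <- data.frame(state = c("AK", "AL", "AR", "AZ", "CA", "CO", "CT", "TX"),
                 stringsAsFactors = FALSE)

# Region membership as a named list (groupings are illustrative only).
regions <- list(
  West      = c("AK", "AZ", "CA", "CO"),
  South     = c("AL", "AR"),
  NorthEast = c("CT"),
  Southwest = c("TX")
)

# Approach 1: one %in% assignment per region, as in the fix above.
df$region <- NA_character_
for (r in names(regions)) {
  df$region[df$state %in% regions[[r]]] <- r
}

# Approach 2: flatten the list into a state -> region lookup vector
# and index it by state code.
lookup <- setNames(rep(names(regions), lengths(regions)),
                   unlist(regions, use.names = FALSE))
df$region2 <- unname(lookup[df$state])
```

Both columns come out the same; the lookup vector avoids the loop and leaves NA for any state that is missing from every list.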

Merging (two and a half) countries from maps-package to one map object in R

I am looking for a map that combines Germany, Austria and parts of Switzerland together to one spatial object. This area should represent the German speaking areas in those three countries. I have some parts in place, but can not find a way to combine them. If there is a completely different solution to solve this problem, I am still interested.
I get the German and the Austrian map by:
require(maps)
germany <- map("world",regions="Germany",fill=TRUE,col="white") #get the map
austria <- map("world",regions="Austria",fill=TRUE,col="white") #get the map
Switzerland is more complicated, as I only need the 60-70% that mainly speaks German. The cantons that do so (taken from the census report) are
cantonesGerman = c("Uri", "Appenzell Innerrhoden", "Nidwalden", "Obwalden", "Appenzell Ausserrhoden", "Schwyz", "Lucerne", "Thurgau", "Solothurn", "Sankt Gallen", "Schaffhausen", "Basel-Landschaft", "Aargau", "Glarus", "Zug", "Zürich", "Basel-Stadt")
The canton names can be used together with data from gadm.org/country (selecting Switzerland & SpatialPolygonsDataFrame -> Level 1 or via the direct link) to get the German-speaking areas from the gadm-object:
gadmCH = readRDS("~/tmp/CHE_adm1.rds")
dataGermanSwiss <- gadmCH[gadmCH$NAME_1 %in% cantonesGerman,]
I am now missing the merging step to get this information together. The result should look like this:
It represents a combined map consisting of the contours of the merged area (Germany + Austria + ~70% of Switzerland), without borders between the countries. If adding and leaving out the inter-country borders would be parametrizable, that would be great but not a must have.
You can do that like this:
Get the polygons you need
library(raster)
deu <- getData('GADM', country='DEU', level=0)
aut <- getData('GADM', country='AUT', level=0)
swi <- getData('GADM', country='CHE', level=1)
Subset the Swiss cantons (here an example list, not the correct one); there is no need for a loop for such things in R.
cantone <- c('Aargau', 'Appenzell Ausserrhoden', 'Appenzell Innerrhoden', 'Basel-Landschaft', 'Basel-Stadt', 'Sankt Gallen', 'Schaffhausen', 'Solothurn', 'Thurgau', 'Zürich')
GermanSwiss <- swi[swi$NAME_1 %in% cantone,]
Aggregate (dissolve) Swiss internal boundaries
GermanSwiss <- aggregate(GermanSwiss)
Combine the three countries and aggregate
german <- bind(deu, aut, GermanSwiss)
german <- aggregate(german)

Shading counties using FIPS code in R map

I am looking for a way to shade counties on a US map in R. I have a list of numeric/character county FIPS codes that I can pass in as a parameter. I just need to highlight these counties, so I only need to shade them; there are no values or variations corresponding to the counties. I tried looking at
library(choroplethr)
library(maps)
and
county_choropleth(df_pop_county)
head(df_pop_county)
region value
1 1001 54590
2 1003 183226
3 1005 27469
4 1007 22769
5 1009 57466
6 1011 10779
But these need a (region, value) pair, e.g. the FIPS code and population above. Is there a way to call the county_choropleth function without values, using just a data frame of FIPS codes, so that I can shade my FIPS codes with one color? What would be an efficient way to accomplish this in R using choroplethr?
Here's an example using the maps library:
library(maps)
library(dplyr)
data(county.fips)
## Set up fake df_pop_county data frame
df_pop_county <- data.frame(region=county.fips$fips)
df_pop_county$value <- county.fips$fips
y <- df_pop_county$value
df_pop_county$color <- gray(y / max(y))
## merge population data with county.fips to make sure color column is
## ordered correctly.
counties <- county.fips %>% left_join(df_pop_county, by=c('fips'='region'))
map("county", fill=TRUE, col=counties$color)
Here's the resulting map:
Notice that counties with lower FIPS are darker, while counties with higher FIPS are lighter.
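If the goal is a single highlight color rather than a gradient, the color column can be built with %in% instead of gray(). A minimal sketch, assuming a small hypothetical table (county_fips) shaped like county.fips; the highlight vector and the colors are made up for illustration.

```r
# Hypothetical stand-in for maps::county.fips (columns: fips, polyname).
county_fips <- data.frame(
  fips     = c(1001, 1003, 1005, 1007),
  polyname = c("alabama,autauga", "alabama,baldwin",
               "alabama,barbour", "alabama,bibb"),
  stringsAsFactors = FALSE
)

# Codes to highlight; mixing text and numbers yields a character vector,
# so convert before comparing.
highlight <- c("1003", 1007)

# One color for the selected counties, another for everything else.
shade <- ifelse(county_fips$fips %in% as.numeric(highlight), "red", "white")

# With the real county.fips table, a vector built this way could be passed
# as col= to map("county", fill = TRUE, ...) in place of counties$color.
```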

R "maps" package and choropleths

I would like to make a choropleth with the maps package in R. I have data which I have constructed to create bins and associate color names with those bins. Now, I need to use the col= argument to point the colors to the counties, in this example. How do I construct that argument? I would have thought that constructing a data frame would associate the county and color on the same line? Is that not true? So far I have the following
Example Data:
County | Value | Bin | Color
alamance | 100 | 1 | white
brunswick | 1000 | 2 | red
... through 100 counties
R code (which does not work):
library("maps")
DATA <- read.csv("~/Example_Data.csv")
DATA$County <- as.character(DATA$County)
DATA$Color <- as.character(DATA$Color)
NC <- map('county', 'north carolina', col= DATA$Color, Fill=TRUE)
So, after many iterations here is the essence of the solution. Instead of giving the R code which made it work (pretty bland), here are the rules that helped solve the problem.
The county.fips data included in the package has a column with all states and county names. This revealed the formatting of county name matches which are all lowercase, "state,county" with no spaces.
For the NC subset there are 102 entries, not 100, because Currituck County is subdivided into three entities. This was the source of most/all of the issues and was difficult to diagnose but easy to solve.
Solution 1 - Match a vector of colors to the vector of counties. 102 color entries IN THE PROPER ALPHA ORDER will produce a correctly resulting choropleth. Fastest, but also the least convenient if you were trying to do this for, say, all counties in the U.S.
Solution 2 - Add fips codes to original data and then match on fips. Since the county.fips file has Currituck entities listed as "north carolina,currituck:main", etc., this is still going to take some manipulation or finding an external fips reference. This is the method used in the maps() documentation, but which would have taken too long so I preferred the former. However, taking the time would allow you to approach a national dataset, for instance.
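The name-matching idea behind both solutions can be sketched in base R without an external FIPS reference: strip the state prefix and any ":part" suffix from the polygon names, then use match() to pull one color per polygon. The polygon names and data below are assumptions for illustration; only "north carolina,currituck:main" comes from the text above.

```r
# Polygon names in the "state,county[:part]" format described above.
# Only "currituck:main" is taken from the text; the rest are illustrative.
poly_names <- c("north carolina,alamance",
                "north carolina,brunswick",
                "north carolina,currituck:main",
                "north carolina,currituck:spit")

# Hypothetical per-county data, one row per county, in arbitrary order.
dat <- data.frame(County = c("brunswick", "alamance", "currituck"),
                  Color  = c("red", "white", "blue"),
                  stringsAsFactors = FALSE)

# Drop the "state," prefix, then the ":part" suffix, leaving the county name.
county_key <- sub(":.*$", "", sub("^[^,]*,", "", poly_names))

# One color per polygon, in polygon order; subdivided counties repeat.
cols <- dat$Color[match(county_key, dat$County)]
```

Because match() follows the polygon order, the input data no longer has to be sorted, and a county split into several polygons gets the same color for each part.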

Simple way to subset SpatialPolygonsDataFrame (i.e. delete polygons) by attribute in R

I would like to simply delete some polygons from a SpatialPolygonsDataFrame object based on corresponding attribute values in the @data data frame so that I can plot a simplified/subsetted shapefile. So far I haven't found a way to do this.
For example, let's say I want to delete all polygons from this world shapefile that have an area of less than 30000. How would I go about doing this?
Or, similarly, how can I delete Antarctica?
require(maptools)
getinfo.shape("TM_WORLD_BORDERS_SIMPL-0.3.shp")
# Shapefile type: Polygon, (5), # of Shapes: 246
world.map <- readShapeSpatial("TM_WORLD_BORDERS_SIMPL-0.3.shp")
class(world.map)
# [1] "SpatialPolygonsDataFrame"
# attr(,"package")
# [1] "sp"
head(world.map@data)
# FIPS ISO2 ISO3 UN NAME AREA POP2005 REGION SUBREGION LON LAT
# 0 AC AG ATG 28 Antigua and Barbuda 44 83039 19 29 -61.783 17.078
# 1 AG DZ DZA 12 Algeria 238174 32854159 2 15 2.632 28.163
# 2 AJ AZ AZE 31 Azerbaijan 8260 8352021 142 145 47.395 40.430
# 3 AL AL ALB 8 Albania 2740 3153731 150 39 20.068 41.143
# 4 AM AM ARM 51 Armenia 2820 3017661 142 145 44.563 40.534
# 5 AO AO AGO 24 Angola 124670 16095214 2 17 17.544 -12.296
If I do something like this, the plot does not reflect any changes.
world.map@data = world.map@data[world.map@data$AREA > 30000,]
plot(world.map)
I get the same result if I do this:
world.map@data = world.map@data[world.map@data$NAME != "Antarctica",]
plot(world.map)
Any help is appreciated!
It looks like you're overwriting the data, but not removing the polygons. If you want to cut down the dataset including both data and polygons, try e.g.
world.map <- world.map[world.map$AREA > 30000,]
plot(world.map)
[[Edit 19 April, 2016]]
That solution used to work, but @Bonnie reports otherwise for a newer R version (though perhaps the data has changed too?):
world.map <- world.map[world.map@data$AREA > 30000, ]
Upvote @Bonnie's answer if that helped.
When I tried to do this in R 3.2.1, tim riffe's technique above did not work for me, although modifying it slightly fixed the problem. I found that I had to specifically reference the data slot as well before specifying the attribute to subset on, as below:
world.map <- world.map[world.map@data$AREA > 30000, ]
plot(world.map)
Adding this as an alternative answer in case others come across the same issue.
Just to mention that subset also does the job, and it avoids having to write the object's name in the condition.
world.map <- subset(world.map, AREA > 30000)
plot(world.map)
I used the above technique to make a map of just Australia:
australia.map <- world.map[world.map$NAME == "Australia",]
plot(australia.map)
The comma after "Australia" is important, as it turns out.
One flaw with this method is that it appears to retain all of the attribute columns and rows for all of the other countries, and just populates them with zeros. I found that if I wrote out a .shp file, then read it back in using readOGR (rgdal package), it automatically removes the null geographic data. Then I could write another shape file with only the data I want.
writeOGR(australia.map,".","australia",driver="ESRI Shapefile")
australia.map <- readOGR(".","australia")
writeOGR(australia.map,".","australia_small",driver="ESRI Shapefile")
On my system, at least, it's the "read" function that removes the null data, so I have to write the file after reading it back once (and if I try to re-use the filename, I get an error). I'm sure there's a simpler way, but this seems to work well enough for my purposes anyway.
As a second pointer: this does not work for shapefiles with "holes" in the shapes, because it is subsetting by index.
