R: Gradient plot on a shapefile - r

I currently have a shapefile of the UK and have plotted the population of species in different regions of the UK. So far I have just plotted 3 levels of species population and coloured them red=high, orange=med, green=low. But what I would like to do is have a gradient plot instead of being bounded by just 3 colours.
So far I have a table called Count that has the regions as the column names and then the count of species for each region below. My lowest count being 0 and my highest being around 2500 and the regions in Count match with the regions in my shapefile. I have a function that determines what is high, med, low based on levels you input yourself
High<-colnames(Count)[which(Count>'input value here')]
and then these are plotted onto the shapefile like this:
plot(ukmap[(ukmap$Region %in% High),],col='red',add=T)
Unfortunately I can't really install any packages, I was thinking of using colorRamp, but I'm not really sure what to do?
EDIT: my data looks something like this
Wales Midlands North Scotland South East South West
1 551 32 124 1 49 28
3 23 99 291 152 164 107
4 1 7 17 11 21 14
7 192 32 12 0 1 9
9 98 97 5 1 21 0
and the first column is just a number that represents the species and currently I have a function that plots the count onto a UK shapefile but based on boundaries of high, med and low. The data above is not attached to my shapefile. I then loop through for each line (species) of my data set and plot a new map for each line (species).

All right, I'll bite. I'm not going to use base R because plot is too hard for me to understand, so instead we will be using ggplot2.
# UK shapefile found via http://www.gadm.org/download
uk.url <- "http://www.filefactory.com/file/s3dz3jt3vr/n/GBR_adm_zip"
# replace following with your working directory - no trailing slash
work.dir <- "C:/Temp/r.temp/gb_map"
# the full file path for storing file
file.loc <- paste0(work.dir, "/uk.zip")
download.file (uk.url, destfile = file.loc, mode = "wb")
unzip(file.loc, exdir = work.dir)
# open the shapefile
require(rgdal)
require(ggplot2)
uk <- readOGR(work.dir, layer = "GBR_adm2")
# use the NAME_2 field (representing counties) to create data frame
uk.map <- fortify(uk, region = "NAME_2")
# create fake count data...
uk.map$count <- round(runif(nrow(uk.map), 0, 2500), 0)
# quick visual check
ggplot(uk.map, aes(x = long, y = lat, group = group, fill = count)) +
geom_polygon(colour = "black", size = 0.5, aes(group = group)) +
theme()
This generates the output below, which may be similar to what you need.
Note that we don't explicitly specify the gradient in this case - we just leave it up to ggplot. If you wish to specify those details it is possible but more involved. If you go down that route you should create another column in uk.map to allocate each count into one of (say) 10 bins using the cut function. The uk.map data frame looks like this:
> str(uk.map)
'data.frame': 427339 obs. of 8 variables:
$ long : num -2.05 -2.05 -2.05 -2.05 -2.05 ...
$ lat : num 57.2 57.2 57.2 57.2 57.2 ...
$ order: int 1 2 3 4 5 6 7 8 9 10 ...
$ hole : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
$ piece: Factor w/ 234 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
$ group: Factor w/ 1136 levels "Aberdeen.1","Aberdeenshire.1",..: 1 1 1 1 1 1 1 1 1 1 ...
$ id : chr "Aberdeen" "Aberdeen" "Aberdeen" "Aberdeen" ...
$ count: num 1549 1375 433 427 1282 ...
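The binning route mentioned above can be sketched as follows. This is a minimal illustration on made-up counts rather than on the real uk.map data frame; the name count_bin and the green-to-red palette are assumptions chosen for the example.

```r
# Made-up counts standing in for uk.map$count (assumed range: 0 to 2500)
set.seed(1)
counts <- round(runif(20, 0, 2500))

# Allocate each count to one of 10 equal-width bins;
# include.lowest = TRUE keeps a count of exactly 0 in the first bin
breaks <- seq(0, 2500, by = 250)
count_bin <- cut(counts, breaks = breaks, include.lowest = TRUE, dig.lab = 4)

# One colour per bin, running from green (low) to red (high)
bin_colours <- colorRampPalette(c("green", "orange", "red"))(nlevels(count_bin))
names(bin_colours) <- levels(count_bin)

table(count_bin)
```

With ggplot2, a factor column like this could be mapped to fill and paired with scale_fill_manual(values = bin_colours).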
>

OK, here is an alternative solution that doesn't use ggplot (I will leave the ggplot solution for reference). This code is simple but it should be enough to give you some ideas as to how you can adapt it to your own data.
# UK shapefile found via http://www.gadm.org/download
uk.url <- "http://www.filefactory.com/file/s3dz3jt3vr/n/GBR_adm_zip"
# replace following with your working directory - no trailing slash
work.dir <- "C:/Temp/r.temp/gb_map"
# the full file path for storing file
file.loc <- paste0(work.dir, "/uk.zip")
download.file (uk.url, destfile = file.loc, mode = "wb")
unzip(file.loc, exdir = work.dir)
# open the shapefile
require(rgdal)
uk <- readOGR(work.dir, layer = "GBR_adm2")
# make some fake data to plot
uk@data$count <- round(runif(nrow(uk@data), 0, 2500), 0)
uk@data$count <- as.numeric(uk@data$count)
# and plot it
plot(uk, col = gray(uk@data$count/2500))
The result of the code is the following plot.
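Since the question mentions colorRamp, here is a sketch of the same idea with a green-orange-red gradient instead of grey. The counts are copied from the first row of the question's sample table. The upper bound max_count = 2500 comes from the question's description, and ukmap$Region is the asker's field name, so both are assumptions about the real data.

```r
# Counts for one species, taken from the first row of the sample table
count <- c(Wales = 551, Midlands = 32, North = 124,
           Scotland = 1, `South East` = 49, `South West` = 28)
max_count <- 2500  # assumed upper bound on the counts

# colorRamp() returns a function that maps values in [0, 1] to RGB rows
ramp <- colorRamp(c("green", "orange", "red"))
region_col <- rgb(ramp(count / max_count), maxColorValue = 255)
names(region_col) <- names(count)
region_col

# With the shapefile loaded, the colours could be matched by region name:
# plot(ukmap, col = region_col[as.character(ukmap$Region)])
```

Because the scale is anchored at 2500, small counts all come out close to green; dividing by max(count) instead would stretch the gradient for each species separately.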
EDIT following a request to include a legend, I have tweaked the code a little but in all honesty I don't understand base R's legend function well enough to get something of production quality and I have no wish to research it further. (Incidentally hat tip to this question for ideas.) A look at the plot beneath the code suggests that we need to reorder the legend colours etc but I will leave that to the original poster as an exercise or to post as another question.
# UK shapefile found via http://www.gadm.org/download
uk.url <- "http://www.filefactory.com/file/s3dz3jt3vr/n/GBR_adm_zip"
# replace following with your working directory - no trailing slash
work.dir <- "C:/Temp/r.temp/gb_map"
# the full file path for storing file
file.loc <- paste0(work.dir, "/uk.zip")
download.file (uk.url, destfile = file.loc, mode = "wb")
unzip(file.loc, exdir = work.dir)
# open the shapefile
require(rgdal)
uk <- readOGR(work.dir, layer = "GBR_adm2")
# make some fake data to plot
uk@data$count <- as.numeric(round(runif(nrow(uk@data), 0, 2500), 0))
uk@data$bin <- cut(uk@data$count, seq(0, 2500, by = 250),
include.lowest = TRUE, dig.lab = 4)
# labels for the legend
lev <- levels(uk@data$bin)
lev2 <- gsub("\\,", " to ", lev)
lev3 <- gsub("\\]$", "", lev2)
lev4 <- gsub("\\(|\\)", " ", lev3)
lev5 <- gsub("^\\[", " ", lev4)
my.levels <- lev5
# Create a function to generate a continuous color palette
rbPal <- colorRampPalette(c('red','blue'))
# include.lowest = TRUE so that a count of exactly 0 still gets a colour
uk@data$Col <- rbPal(10)[as.numeric(cut(uk@data$count, seq(0, 2500, by = 250), include.lowest = TRUE))]
# Plot
plot(uk, col = uk@data$Col)
legend("topleft", fill = uk@data$Col, legend = my.levels, col = uk@data$Col)
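One way to tidy the legend is to pass it the ten palette colours in bin order, rather than the per-polygon colour vector. The sketch below uses a row of made-up rectangles in place of the shapefile so that it runs without rgdal; only the colour and legend logic is meant to carry over, and the names pal, poly_col and labels are hypothetical.

```r
set.seed(42)
n <- 30
count <- round(runif(n, 0, 2500))
breaks <- seq(0, 2500, by = 250)
bin <- cut(count, breaks, include.lowest = TRUE, dig.lab = 4)

rbPal <- colorRampPalette(c("red", "blue"))
pal <- rbPal(nlevels(bin))          # one colour per bin, in bin order
poly_col <- pal[as.integer(bin)]    # one colour per polygon

# Legend labels built directly from the breaks instead of by regex
labels <- paste(head(breaks, -1), "to", tail(breaks, -1))

# Stand-in "map": n unit-width rectangles coloured like the polygons
plot(NULL, xlim = c(0, n), ylim = c(0, 1), xlab = "", ylab = "", axes = FALSE)
rect(0:(n - 1), 0, 1:n, 1, col = poly_col)
legend("topleft", fill = pal, legend = labels, cex = 0.6, bg = "white")
```

For the real map, the equivalent would presumably be plot(uk, col = poly_col) followed by the same legend() call.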

Have you tried colorRampPalette?
Here is how you could try to build a gradient palette
gradient_color <- colorRampPalette(c("blue", "red"))
gradient_color(10)
[1] "#0000FF" "#1C00E2" "#3800C6" "#5500AA" "#71008D" "#8D0071" "#AA0055"
[8] "#C60038" "#E2001C" "#FF0000"
An example plot
plot(rep(1,10),col=gradient_color(10))
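To connect the palette to the table in the question, each count can be turned into an index into the palette. This is only a sketch: the matrix is typed in from the question's sample rows, the 100-step palette and the 0 to 2500 scale are assumptions, and count_to_col is a hypothetical helper name.

```r
# Sample data from the question: rows are species, columns are regions
Count <- rbind(
  "1" = c(551, 32, 124, 1, 49, 28),
  "3" = c(23, 99, 291, 152, 164, 107),
  "4" = c(1, 7, 17, 11, 21, 14),
  "7" = c(192, 32, 12, 0, 1, 9),
  "9" = c(98, 97, 5, 1, 21, 0)
)
colnames(Count) <- c("Wales", "Midlands", "North",
                     "Scotland", "South East", "South West")

n_col <- 100
gradient_color <- colorRampPalette(c("blue", "red"))
pal <- gradient_color(n_col)

# Map a count in [0, max_count] to one of the n_col palette entries
count_to_col <- function(x, max_count = 2500) {
  idx <- 1 + floor((n_col - 1) * pmin(x, max_count) / max_count)
  pal[idx]
}

# One named colour vector per species (one map per species in the loop)
cols_by_species <- lapply(rownames(Count), function(sp) {
  setNames(count_to_col(Count[sp, ]), colnames(Count))
})
names(cols_by_species) <- rownames(Count)
cols_by_species[["1"]]
```

Inside the per-species loop, each element could then be supplied as the col argument of plot(), indexed by the region name of each polygon.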

Related

Calculating measure of spatial segregation?

There are five polygons for five different cities (see attached file in the link, it's called bound.shp). I also have a point file "points.csv" with longitude and latitude where for each point I know the proportion of people belonging to group m and group h.
I am trying to calculate the spatial segregation proposed by Reardon and O’Sullivan, “Measures of Spatial Segregation”
There is a package called "seg" which should allow us to do it. I am trying to do it but so far no success.
Here is the link to the example file: LINK. After downloading the "example" folder, this is what I do:
setwd("~/example")
library(seg)
library(sf)
bound <- st_read("bound.shp")
points <- st_read("points.csv", options=c("X_POSSIBLE_NAMES=x","Y_POSSIBLE_NAMES=y"))
#I apply the following formula
seg::spseg(bound, points[ ,c(group_m, group_h)] , smoothing = "kernel", sigma = bandwidth)
Error: 'x' must be a numeric matrix with two columns
Can someone help me solve this issue? Or is there an alternate method which I can use?
Thanks a lot.
I don't know exactly what the spseg function does, but according to its documentation in the seg package:
The first argument, x, should be a data frame or an object of class Spatial.
The second argument, data, should be a matrix or a data frame.
The examples for spseg also show that data must have the same number of rows as there are ids in the Spatial object. In your sample, the ids are the cities, each of which has its own polygon.
First, let's examine the bound data;
setwd("~/example")
library(seg)
library(sf)
#For the fortify function
library(ggplot2)
bound <- st_read("bound.shp")
bound <- as_Spatial(bound)
class(bound)
"SpatialPolygonsDataFrame"
attr(,"package")
"sp"
tail(fortify(bound))
Regions defined for each Polygons
long lat order hole piece id group
5379 83.99410 27.17326 972 FALSE 1 5 5.1
5380 83.99583 27.17339 973 FALSE 1 5 5.1
5381 83.99705 27.17430 974 FALSE 1 5 5.1
5382 83.99792 27.17552 975 FALSE 1 5 5.1
5383 83.99810 27.17690 976 FALSE 1 5 5.1
5384 83.99812 27.17700 977 FALSE 1 5 5.1
So you have 5 ids in your SpatialPolygonsDataFrame. Now, let's read points.csv with the read.csv function, since the data is required to be in matrix format for the spseg function.
points <- read.csv("c://Users/cemozen/Downloads/example/points.csv")
tail(points)
group_m group_h x y
950 4.95 78.49000 84.32887 26.81203
951 5.30 86.22167 84.27448 26.76932
952 8.68 77.85333 84.33353 26.80942
953 7.75 82.34000 84.35270 26.82850
954 7.75 82.34000 84.35270 26.82850
955 7.75 82.34000 84.35270 26.82850
The documentation and its example make clear that the number of rows in the point data, which has two attributes (group_m and group_h in our data), must equal the number of ids (the cities). You may therefore need to summarise your data, for example by taking the mean or some other statistic per city, so that you end up with a single row of values for each polygon.
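A per-city summary of that kind might look like the sketch below. It runs on made-up points and assumes each point already carries a city id (1 to 5, in the same order as the polygons); whether the mean is the right statistic for proportions is a judgment call that depends on the data.

```r
# Made-up stand-in for the points after each one has been assigned a city id
set.seed(3)
pts <- data.frame(
  city    = rep(1:5, times = c(4, 3, 5, 2, 6)),
  group_m = round(runif(20, 4, 9), 2),
  group_h = round(runif(20, 75, 90), 2)
)

# One row per city: mean of each group over the points in that polygon
per_city <- aggregate(cbind(group_m, group_h) ~ city, data = pts, FUN = mean)
per_city <- per_city[order(per_city$city), ]

# A numeric matrix with one row per polygon and one column per group
city_mat <- as.matrix(per_city[, c("group_m", "group_h")])
dim(city_mat)
```

A matrix shaped like city_mat could then be passed as the second argument, as in spseg(bound, city_mat).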
For now, I would just like to show that the function works properly when it is fed a matrix that has 5 rows and 2 groups.
sample_spseg <- spseg(bound, as.matrix(points[1:5,c("group_m", "group_h")]))
print(sample_spseg)
Reardon and O'Sullivan's spatial segregation measures
Dissimilarity (D) : 0.0209283
Relative diversity (R): -0.008781
Information theory (H): -0.0066197
Exposure/Isolation (P):
group_m group_h
group_m 0.07577679 0.9242232
group_h 0.07516285 0.9248372
--
The exposure/isolation matrix should be read horizontally.
Read 'help(spseg)' for more details.
First: I do not have experience with the seg package and its functions.
What I read from your question is that you want to run the spseg function on the points within each area?
If so, here is a possible approach:
library(sf)
library(tidyverse)
library(seg)
library(mapview) # for quick viewing only
# read polygons, make valid to avoid problems later on
areas <- st_read("./temp/example/bound.shp") %>%
sf::st_make_valid()
# read points and convert to sf object
points <- read.csv("./temp/example/points.csv") %>%
sf::st_as_sf(coords = c("x", "y"), crs = 4326) %>%
# spatial join with the city polygons (st_join() uses st_intersects by default)
sf::st_join(areas)
# what do we have so far??
mapview::mapview(points, zcol = "city")
# get the coordinates back into a data.frame
mydata <- cbind(points, st_coordinates(points))
# drop the geometry, we do not need it anymore
st_geometry(mydata) <- NULL
# looks like...
head(mydata)
# group_m group_h city X Y
# 1 8.02 84.51 2 84.02780 27.31180
# 2 8.02 84.51 2 84.02780 27.31180
# 3 8.02 84.51 2 84.02780 27.31180
# 4 5.01 84.96 2 84.04308 27.27651
# 5 5.01 84.96 2 84.04622 27.27152
# 6 5.01 84.96 2 84.04622 27.27152
# Split to a list by city
L <- split(mydata, mydata$city)
# loop over list and perform spseg function
final <- lapply(L, function(i) spseg(x = i[, 4:5], data = i[, 1:2]))
# test for the first city
final[[1]]
# Reardon and O'Sullivan's spatial segregation measures
#
# Dissimilarity (D) : 0.0063
# Relative diversity (R): -0.0088
# Information theory (H): -0.0067
# Exposure/Isolation (P):
# group_m group_h
# group_m 0.1160976 0.8839024
# group_h 0.1157357 0.8842643
# --
# The exposure/isolation matrix should be read horizontally.
# Read 'help(spseg)' for more details.
spplot(final[[1]], main = "Equal")

Plotting shape files loaded using read.shp with ggplot2

I would like to plot a shape file loaded using read.shp from the fastshp package. However, the read.shp function returns a list of lists and not a data.frame. I'm unsure which part of the list I need to extract to obtain the correctly formatted data.frame object. This exact question has been asked on Stack Overflow already; however, the solution no longer seems to work (it is from more than 7 years ago). Any help is much appreciated.
remotes::install_github("s-u/fastshp") #fastshp not on CRAN
library(ggplot2)
library(fastshp)
library(magrittr) # provides the %>% pipe used below
temp <- tempfile()
temp2 <- tempfile()
download.file("https://www2.census.gov/geo/tiger/TIGER2017/COUNTY/tl_2017_us_county.zip",temp)
unzip(zipfile = temp, exdir = temp2)
shp <- list.files(temp2, pattern = ".shp$",full.names=TRUE) %>% read.shp(.)
shp is a list of lists containing a plethora of information. I tried the following solution from the SO posted earlier, but to no avail:
shp.list <- sapply(shp, FUN = function(x) Polygon(cbind(lon = x$x, lat = x$y))) #throws an error here cbind(lon = x$x, lat = x$y) returns NULL
shp.poly <- Polygons(shp.list, "area")
shp.df <- fortify(shp.poly, region = "area")
I also tried the following:
shp.list <- sapply(shp, FUN = function(x) do.call(cbind, x[c("id","x","y")])) #returns NULL value here...
shp.df <- as.data.frame(do.call(rbind, shp.list))
Updated: Still no luck but closer:
file_shp<-list.files(temp2, pattern = ".shp$",full.names=TRUE) %>%
read.shp(., format = c("table"))
ggplot() +
geom_polygon(data = file_shp, aes(x = x, y = y, group = part),
colour = "black", fill = NA)
Looks like the projection is off. I'm not sure how to order the data to map correctly, also not sure how to read in the CRS data. Tried the following to no avail:
file_prj<-list.files(temp2, pattern = ".prj$",full.names=TRUE) %>%
proj4string(.)
I tried to use the census data you have in your script. However, RStudio somehow kept crashing when I applied read.shp() to the polygon data. Hence, I decided to use the example from the help page of read.shp(), which is also census data. I hope you do not mind. It took some time to figure out how to draw a map with class shp. Let me explain what I went through step by step.
This part is from the help page. I am basically getting a shapefile and importing it as a shp object.
# Census 2010 TIGER/Line(TM) state shapefile
library(fastshp)
fn <- system.file("shp", "tl_2010_us_state10.shp.xz", package="fastshp")
s <- read.shp(xzfile(fn, "rb"))
Let's check what this object, s, looks like. It contains 52 lists. In each list, there are six vectors. id is a unique integer representing a state. x is longitude and y is latitude. The nasty part was parts. In the example below, there is only one number, which means there is only one polygon in this state. But some other lists (states) have multiple numbers. These numbers are basically indices that indicate where new polygons begin in the data.
#> str(s)
#List of 52
# $ :List of 6
# ..$ id : int 1
# ..$ type : int 5
# ..$ box : num [1:4] -111 41 -104 45
# ..$ parts: int 0
# ..$ x : num [1:9145] -109 -109 -109 -109 -109 ...
# ..$ y : num [1:9145] 45 45 45 45 45 ...
Here is the one for Alaska. As you can see, there are several numbers in parts. These numbers indicate where new polygon data begin. Alaska has many small islands, so the data needs this information to mark the different polygons. We will come back to this later when we create data frames.
#List of 6
# $ id : int 18
# $ type : int 5
# $ box : num [1:4] -179.2 51.2 179.9 71.4
# $ parts: int [1:50] 0 52 88 127 175 207 244 306 341 375 ...
# $ x : num [1:14033] 177 177 177 177 177 ...
# $ y : num [1:14033] 52.1 52.1 52.1 52.1 52.1 ...
What we need is the following. For each list, we need to extract longitude (i.e., x), latitude (i.e., y), and id in order to create a data frame for one state. In addition, we need to use parts so that we can mark every polygon with a unique ID. We need to create a new group variable, which contains a unique ID value for each polygon. I used findInterval(), which takes the indices and creates a group variable. One tricky part was that we need to use left.open = TRUE in findInterval() in order to create the group variable correctly. (It took me a while to figure out what was going on.) This map_dfr() part handles the job I just described.
library(tidyverse)
map_dfr(.x = s,
.f = function(mylist){
temp <- data.frame(id = mylist$id,
lon = mylist$x,
lat = mylist$y)
ind <- mylist$parts
out <- mutate(temp,
subgroup = findInterval(x = 1:n(), vec = ind, left.open = TRUE),
group = paste(id, subgroup, sep = "_"))
return(out)
}) -> test
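The effect of left.open = TRUE can be checked on a toy example. The parts vector below is hypothetical (three polygons starting at 0-based offsets 0, 3 and 5), and the row numbers are 1-based, as they are in the data frame built above.

```r
# parts holds the 0-based start offset of each polygon within x/y
parts <- c(0L, 3L, 5L)   # hypothetical: polygons start at rows 1, 4 and 6
rows  <- 1:8             # 1-based row numbers of the coordinates

# left.open = TRUE uses the intervals (0, 3], (3, 5], (5, Inf)
grp_open <- findInterval(rows, parts, left.open = TRUE)
grp_open
# 1 1 1 2 2 3 3 3

# The default uses [0, 3), [3, 5), [5, Inf), which pushes rows 3 and 5
# into the next polygon one row too early
grp_default <- findInterval(rows, parts)
grp_default
# 1 1 2 2 3 3 3 3
```

The mismatch comes from mixing 0-based offsets with 1-based row numbers; left.open = TRUE compensates for it without having to shift parts by one.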
Once we have test, we have another job. Some longitude points of Alaska are positive numbers (e.g., 179.85), because they lie on the other side of the 180th meridian. As long as we have numbers like this, ggplot2 draws funny long lines, which you can see even in your example. What we need is to shift these positive longitudes by 360 degrees, so that they become negative and stay next to the rest of Alaska, and ggplot2 can draw a proper map.
mutate(test,
lon = if_else(lon > 0, lon - 360, lon)) -> out
By this time, out looks like this.
id lon lat subgroup group
1 1 -108.6213 45.00028 1 1_1
2 1 -108.6197 45.00028 1 1_1
3 1 -108.6150 45.00031 1 1_1
4 1 -108.6134 45.00032 1 1_1
5 1 -108.6133 45.00032 1 1_1
6 1 -108.6130 45.00032 1 1_1
Now we are ready to draw a map.
ggplot() +
geom_polygon(data = out, aes(x = lon, y = lat, group = group))

Overlapped data with messed up axes using facet_grid in R

I am using facet grid to generate neat presentations of my data.
Basically, my data frame has four columns:
idx, density, marker, case.
There are 5 cases, each case corresponds to 5 markers, and each marker corresponds to multiple idx, each idx corresponds to one density.
The data is uploaded here:
data frame link
I tried to use facet_grid to achieve my goal, however, I obtained a really messed up graph:
The x-axis and y-axis are messed up. The code is:
library(ggplot2)
library(cowplot)
plot.density <-
ggplot(df_densityWindow, aes(x = idx, y = density)) +
geom_col() +
facet_grid(marker ~ case, scales = 'free') +
background_grid(major = 'y', minor = "none") + # add thin horizontal lines
panel_border() # and a border around each panel
plot(plot.density)
EDIT:
I re-uploaded the file; it should work now:
download file here
All 4 columns have been read as factors. This is an issue from however you loaded the data into R. Take a look at:
df <- readRDS('df.rds')
str(df)
'data.frame': 52565 obs. of 4 variables:
$ idx : Factor w/ 4712 levels "1","10","100",..: 1 1112 2223 3334 3546 3657 3768 3879 3990 2 ...
$ density: Factor w/ 250 levels "1022.22222222222",..: 205 205 204 203 202 201 199 198 197 197 ...
$ marker : Factor w/ 5 levels "CD3","CD4","CD8",..: 1 1 1 1 1 1 1 1 1 1 ...
$ case : Factor w/ 5 levels "Case_1","Case_2",..: 1 1 1 1 1 1 1 1 1 1 ...
Good news is that you can fix it with:
df$idx <- as.integer(as.character(df$idx))
df$density <- as.numeric(as.character(df$density))
You should still look into how you are loading the data, to avoid this in future.
As another trick, try the above code without using the as.character calls, and compare the differences.
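A small made-up factor shows what that comparison reveals: without as.character, the conversion returns the internal level codes rather than the numbers printed on screen.

```r
f <- factor(c("10", "2", "100", "2"))

levels(f)                      # "10" "100" "2" (levels are sorted as text)
codes  <- as.integer(f)        # internal level codes: 1 3 2 3
values <- as.integer(as.character(f))  # the actual numbers: 10 2 100 2

codes
values
```

This is why idx and density were drawn in a scrambled order: ggplot2 was given the level order of the text labels, not their numeric values.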
As already explained by MrGumble, the idx and density variables are of type factor but should be plotted as numeric.
The type.convert() function does the data conversion in one go:
library(ggplot2)
library(cowplot)
ggplot(type.convert(df_densityWindow), aes(x = idx, y = density)) +
geom_col() +
facet_grid(marker ~ case, scales = 'free') +
background_grid(major = 'y', minor = "none") + # add thin horizontal lines
panel_border() # and a border around each panel

Plotting when time series is in rows not columns- using R

Excel allows you to switch rows and columns in its Chart functionality.
I am trying to replicate this in R. My data (shown below) has the production for each company in rows. I am unable to figure out how to show Month-1, Month-2, etc. on the x-axis, with the series for each company in the same graph. Any help appreciated.
Data:
tibble::tribble(
  ~Company.Name, ~`Month-1`, ~`Month-2`, ~`Month-3`, ~`Month-4`,
  "Comp-1", 945.5438986, 1081.417009, 976.7388701, 864.309703,
  "Comp-2", 16448.87, 13913.19, 12005.28, 10605.32,
  "Comp-3", 346.9689321, 398.2297592, 549.1282647, 550.4207169,
  "Comp-4", 748.8806367, 949.463941, 1018.877481, 932.3773791
)
I'm going to skip the part where you want to transpose, and infer that your purpose for that was solely to help with plotting. The part I'm focusing on here is "show the Month-1, Month-2 etc in x-axis, and the series for each company in the same graph".
This is doable in base graphics, but I highly recommend using ggplot2 (or plotly or similar), due to its ease of dealing with dimensional plots like this. The "grammar of graphics" (which both tend to implement) really prefers data like this be in a "long" format, so part of what I'll do is convert to this format.
First, some data:
set.seed(2)
months <- paste0("Month", 1:30)
companies <- paste0("Comp", 1:5)
m <- matrix(abs(rnorm(length(months)*length(companies), sd=1e3)),
nrow = length(companies))
d <- cbind.data.frame(
Company = companies,
m,
stringsAsFactors = FALSE
)
colnames(d)[-1] <- months
str(d)
# 'data.frame': 5 obs. of 31 variables:
# $ Company: chr "Comp1" "Comp2" "Comp3" "Comp4" ...
# $ Month1 : num 896.9 184.8 1587.8 1130.4 80.3
# $ Month2 : num 132 708 240 1984 139
# $ Month3 : num 418 982 393 1040 1782
# $ Month4 : num 2311.1 878.6 35.8 1012.8 432.3
# (truncated)
Reshaping can be done with multiple libraries, including base R. Here are two techniques:
library(data.table)
d2 <- melt(as.data.table(d), id = 1, variable.name = "Month", value.name = "Cost")
d2[,Month := as.integer(gsub("[^0-9]", "", Month)),]
d2
# Company Month Cost
# 1: Comp1 1 896.91455
# 2: Comp2 1 184.84918
# 3: Comp3 1 1587.84533
# 4: Comp4 1 1130.37567
# 5: Comp5 1 80.25176
# ---
# 146: Comp1 30 653.67306
# 147: Comp2 30 657.10598
# 148: Comp3 30 549.90924
# 149: Comp4 30 806.72936
# 150: Comp5 30 997.37972
library(dplyr)
# library(tidyr)
d2 <- as_tibble(d) %>% # as_tibble() replaces the deprecated tbl_df()
tidyr::gather(Month, Cost, -Company) %>%
mutate(Month = as.integer(gsub("[^0-9]", "", Month)))
I also integerized the Month, since it made sense with an ordinal variable. This isn't strictly necessary, the plot would just treat them as discretes.
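For completeness, the base R route mentioned above can be sketched with reshape(). The data frame below is typed in from the question's sample rather than taken from the simulated d, and the names d_long, Month and Cost are chosen to mirror the two techniques shown.

```r
# The question's data, wide format: one row per company, one column per month
d <- data.frame(
  Company   = c("Comp-1", "Comp-2", "Comp-3", "Comp-4"),
  `Month-1` = c(945.5438986, 16448.87, 346.9689321, 748.8806367),
  `Month-2` = c(1081.417009, 13913.19, 398.2297592, 949.463941),
  `Month-3` = c(976.7388701, 12005.28, 549.1282647, 1018.877481),
  `Month-4` = c(864.309703, 10605.32, 550.4207169, 932.3773791),
  check.names = FALSE
)
month_cols <- names(d)[-1]

# Stack the month columns into a single Cost column, numbering the months
d_long <- reshape(
  d,
  direction = "long",
  varying   = list(month_cols),
  v.names   = "Cost",
  timevar   = "Month",
  times     = seq_along(month_cols),
  idvar     = "Company"
)
rownames(d_long) <- NULL
head(d_long)
```

The result has one row per company and month, which is the shape the ggplot call expects.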
The plot is anti-climactically simple:
library(ggplot2)
ggplot(d2, aes(Month, Cost, group=Company)) +
geom_line(aes(color = Company))
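For comparison, here is a rough base-graphics version using matplot(), which is the closest thing to Excel's "switch rows and columns". It uses the question's numbers typed in as a matrix; the log-scaled y-axis is my own choice, made because Comp-2 is far larger than the other series, and can be dropped.

```r
companies <- c("Comp-1", "Comp-2", "Comp-3", "Comp-4")
prod <- rbind(
  c(945.5438986, 1081.417009, 976.7388701, 864.309703),
  c(16448.87, 13913.19, 12005.28, 10605.32),
  c(346.9689321, 398.2297592, 549.1282647, 550.4207169),
  c(748.8806367, 949.463941, 1018.877481, 932.3773791)
)
dimnames(prod) <- list(companies, paste0("Month-", 1:4))

# matplot() draws one series per column, so transpose to months x companies
by_month <- t(prod)
matplot(by_month, type = "l", lty = 1, col = seq_along(companies), log = "y",
        xaxt = "n", xlab = "Month", ylab = "Production")
axis(1, at = seq_len(nrow(by_month)), labels = rownames(by_month))
legend("right", legend = companies, col = seq_along(companies), lty = 1)
```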
Bottom line: I don't think you need to worry about transposing your data: doing so has many complications that can just confuse things. Reshaping is a good thing (in my opinion), and with this kind of data it is fast enough that if your data is stored in the wide format, you can re-transform it without too much difficulty. (If you are thinking about putting this in a database, however, I'd strongly recommend you re-think "wide", your db schema will be challenging if you keep it.)

R: Adding a condition to an existing plot

I have a dataframe consisting of four variables: asn (this is an id), ip_used, domain_used and correct (this is binary, 0 or 1). Data example:
asn, ip_used,domain_used,correct
1,234,34,1
30,45,765,1
498,4,765,0
3874,876,8765,1
I have plotted ip_used and domain_used against each other for each asn in a bubble plot and now I want to specify bubbles of the entries that are equal to 1 for "correct" with a different bubble color.
Here is my current plot and my current code:
symbols(log_domused,log_ipused, circles = radius,inches=0.40, fg="black", bg="white",xlab = "# used domain",ylab="# used ips",main="dnsdb distribution of domains per ips for each ASN")
Does anybody have any idea how to do that?
Your data:
myData <- rbind(c(1,234,34,1), c(30,45,765,1), c(498,4,765,0), c(3874,876,8765,1))
colnames(myData) <- c("asn", "ip_used", "domain_used", "correct")
myData
asn ip_used domain_used correct
[1,] 1 234 34 1
[2,] 30 45 765 1
[3,] 498 4 765 0
[4,] 3874 876 8765 1
You can specify the color of each circle with "fg" (or "bg"):
symbols(myData[,"domain_used"], myData[,"ip_used"], circles=c(1,1,1,1), inches=0.40, fg=myData[,"correct"]+1, bg="white",
xlab = "# used domain",ylab="# used ips",
main="dnsdb distribution of domains per ips for each ASN"
)
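If filled bubbles with explicit colour names are preferred, the colour vector can be built with ifelse(). This is a sketch on the four sample rows; the colour names and the log10 axes (a guess at what log_domused and log_ipused contain in the question) are assumptions.

```r
myData <- data.frame(
  asn         = c(1, 30, 498, 3874),
  ip_used     = c(234, 45, 4, 876),
  domain_used = c(34, 765, 765, 8765),
  correct     = c(1, 1, 0, 1)
)

# One fill colour per row: highlighted where correct == 1, grey otherwise
bubble_col <- ifelse(myData$correct == 1, "tomato", "grey80")

symbols(log10(myData$domain_used), log10(myData$ip_used),
        circles = rep(1, nrow(myData)), inches = 0.40,
        fg = "black", bg = bubble_col,
        xlab = "# used domain (log10)", ylab = "# used ips (log10)")
legend("topleft", legend = c("correct = 1", "correct = 0"),
       pt.bg = c("tomato", "grey80"), pch = 21)
```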
