I am trying to map UK government petition data in R. I used the boundary data from the ONS Open Geography Portal. The code runs, and the first map I created works.
#Install packages
install.packages("tidyverse")
install.packages("jsonlite")
install.packages("geojsonio")
install.packages("sp")
install.packages("parlitools")
install.packages("rvest")
install.packages("xml2")
install.packages("magrittr")
#Load packages
library(tidyverse)
library(jsonlite)
library(geojsonio)
library(sp)
library(parlitools)
library(rvest)
library(xml2)
library(magrittr)
#GETTING PETITION DATA
#Importing petition for UK-wide lockdown from JSON format
petition <- fromJSON("https://petition.parliament.uk/petitions/301397.json", flatten = TRUE)
signatures <- petition$data$attributes$signatures_by_constituency %>%
rename(constituency = name)
#MAPPING BOUNDARIES
#Save url for boundary data UK
url <- "https://opendata.arcgis.com/datasets/b64677a2afc3466f80d3d683b71c3468_0.geojson"
#Load and save the boundary data as uk_map
uk_map <- geojson_read (url, what = "sp")
#pcon18cd is code name for constituency (as we can see when we view uk_map). Use fortify to get this data.
fort_uk_map <- fortify(uk_map, region = "pcon18cd")
#MAPPING PETITION DATA
#Join map data to signatures data from constituency using left_join
full_uk_map <- left_join(fort_uk_map, signatures, by = c("id" = "ons_code"))
#Plot-1a: Map of signatures in the whole of UK
ggplot() +
geom_polygon(data = full_uk_map, aes(x = long, y= lat, group = group, fill = signature_count)) +
geom_path(color = "black", size = 0.1) +theme(legend.position = "bottom") +
theme_void() +
labs(x = NULL,
y = NULL,
title = "Signatories of the UK Coronavirus Lockdown Petition",
subtitle = "Let's investigate where the signatures come from",
caption = "Geometries: ONS Open Geography Portal; Data: UK Parliament and Government",
fill = "Signature Count")
But, as you can see from the image, constituencies with more signatures get a lighter color. I would like to reverse this so that a higher number of signatures gets a darker color.
So, I added the code below after the code above, and that's where I am facing issues.
#Change color of legend so that higher signature count equals darker color. Use quantile () [Doesn't work]
no_of_classes <- 9
quantiles <- quantile(full_uk_map$signature_count, probs = seq(0, 1, length.out = no_of_classes + 1))
labels <- c()
for(band in 1:length(quantiles)){
labels <- c(labels, paste0(round(quantiles[band])," - ", round(quantiles[band + 1])))
}
full_uk_map$quantiles <- cut(full_uk_map$signature_count, breaks = quantiles, labels = labels,
include.lowest = T)
labels <- labels[1:length(labels)-1]
#Plot-1b: Map of signatures in the whole of UK [Doesn't work]
sig_map_by_quantile <- ggplot() +
geom_polygon(data = full_uk_map, aes(x = long, y = lat, group = group, fill = quantiles)) +
geom_path(color = "black", size = 0.1) +
scale_fill_brewer(type = 'qual', palette = "Blues", guide = "legend", name = "Signature Count", labels = labels) +
theme_void +
theme(legend.position = "bottom") +
labs(x = NULL,
y = NULL,
title = "Signatories of the UK Coronavirus Lockdown Petition",
caption = "Geometries: ONS Open Geography Portal; Data: UK Parliament and Government")
When I run the line that assigns full_uk_map$quantiles, this is the error message I see:
> full_uk_map$quantiles <- cut(full_uk_map$signature_count, breaks = quantiles, labels = labels,
+ include.lowest = T)
Error in cut.default(full_uk_map$signature_count, breaks = quantiles, :
lengths of 'breaks' and 'labels' differ
Would anyone be able to help? Much appreciated!
Why you made us go through all that package installation, downloading files from the Internet, fortification, merging, and then waiting for the plot to appear is beyond me.
All you had to ask was why the cut function was returning an error. Your title is totally irrelevant to the problem.
Anyway, when breaks is given as a vector of cut points, the cut function requires labels to have exactly one element fewer than breaks. I could not find this stated explicitly in the documentation after a good long look (which is a shame if true); apologies to all if it is in fact there, perhaps hidden between the lines of the descriptions for the breaks and labels arguments. Note that the breaks argument can be provided as a single number (the number of intervals) or, as in your case, as a vector of cut points.
For example, if breaks = c(1,2,3), then that implies you have two intervals, so you need 2 labels.
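In your code, you supply the quantiles vector as breaks and the labels vector as labels. Both have the same length (your loop builds one label per quantile, including a final "x - NA" label), which triggers the error: you have one label too many. Solution: make labels one element shorter than breaks, for example by trimming labels before the call to cut rather than after it.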
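A minimal sketch of that fix, assuming a small made-up vector in place of full_uk_map$signature_count (the values and the names lower, upper and quantile_class are hypothetical). Each break is paired with the next one, so the labels come out one shorter than the breaks without any trimming afterwards:

```r
# Toy stand-in for full_uk_map$signature_count (hypothetical values)
signature_count <- c(12, 35, 47, 58, 90, 120, 260, 410, 733, 1500)

no_of_classes <- 9
quantiles <- quantile(signature_count,
                      probs = seq(0, 1, length.out = no_of_classes + 1))

# One label per interval: pair each break with the next one,
# so 10 breaks give 9 labels
lower <- round(head(quantiles, -1))
upper <- round(tail(quantiles, -1))
labels <- paste0(lower, " - ", upper)

# breaks has 10 elements, labels has 9, so cut() no longer complains
quantile_class <- cut(signature_count, breaks = quantiles,
                      labels = labels, include.lowest = TRUE)

length(quantiles)        # number of breaks
length(labels)           # one fewer than the number of breaks
nlevels(quantile_class)  # one factor level per interval
```

Because cut returns the levels in increasing order, a sequential palette such as "Blues" should then assign darker shades to the higher classes.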
I am attempting to complete a principal component analysis on a set of data containing columns of numeric data.
Assuming a dataset like this (in reality I have a preconfigured data frame; this one is for reproducibility):
v1 <- c(1,2,3,4,5,6,7)
v2 <- c(3,6,2,5,2,4,9)
v3 <- c(6,1,4,2,3,7,5)
dataset <-data.frame(v1,v2,v3)
row.names(dataset) <-c('New York', 'Seattle', 'Washington DC', 'Dallas', 'Chicago','Los Angeles','Minneapolis')
I have run my principal component analysis and successfully plotted it:
pca=prcomp(dataset,scale=TRUE)
plot(pca$x[,1], pca$x[,2],
xlab="First PC",ylab="Second PC")
text(pca$x[,1], pca$x[,2],cex=0.7,pos=3,col="darkgrey")
What I want to do however is colour code my data points based on the city, which is the row names of my dataset. I also want to use these cities (i.e. rownames) as labels.
I've tried the following, but neither attempt has worked:
## attempt 1 - I get row labels, but no chart
plot(pca$x[,1], pca$x[,2],col=rownames(dataset),pch=rownames(dataset),
xlab="First PC",ylab="Second PC")
text(pca$x[,1], pca$x[,2],labels=rownames(dataset),cex=0.7,pos=3,col="darkgrey")
## attempt 2
datasetwithcity = rownames_to_column(dataset, var = "city")
head(datasetwithcity)
OnlyCities=datasetwithcity[,1]
OnlyCities
# this didn't work:
City_Labels=as.numeric(OnlyCities)
head(City_Labels)
# gets city labels, but loses points and no colour
plot(pca$x[,1], pca$x[,2],col=City_Labels,pch=City_Labels,
xlab="First PC",ylab="Second PC")
text(pca$x[,1], pca$x[,2],labels=rownames(dataset),
cex=0.7,pos=3,col="darkgrey")
There are many different ways to do this.
In base R, you could do:
plot(pca$x[,1], pca$x[,2],
xlab="First PC",ylab="Second PC", col = seq(nrow(pca$x)),
xlim = c(-2.5, 2.5), ylim = c(-2, 2))
text(pca$x[,1], pca$x[,2],cex=0.7,pos=3,col="darkgrey")
text(x = pca$x[,1], y = pca$x[,2], labels = rownames(pca$x), pos = 1)
Personally, I think the resulting aesthetics are nicer (and easier to change to suit your needs) with ggplot. The code is also a bit easier to read once you get used to the syntax.
library(ggplot2)
df <- as.data.frame(pca$x)
df$city <- rownames(df)
ggplot(df, aes(PC1, PC2, color = city)) +
geom_point(size = 3) +
geom_text(aes(label = city) , vjust = 2) +
lims(x = c(-2.5, 2.5), y = c(-2, 2)) +
theme_bw() +
theme(legend.position = "none")
Created on 2021-10-28 by the reprex package (v2.0.0)
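For the base R route, a sketch of why the second attempt in the question lost its points, and one way around it. It assumes the same toy data as the question; the names city, city_codes and cols are hypothetical. Calling as.numeric on city names gives NA, and points with col = NA are not drawn, whereas a factor provides integer codes that can index a colour vector:

```r
# Same toy data as in the question
dataset <- data.frame(v1 = c(1, 2, 3, 4, 5, 6, 7),
                      v2 = c(3, 6, 2, 5, 2, 4, 9),
                      v3 = c(6, 1, 4, 2, 3, 7, 5))
row.names(dataset) <- c("New York", "Seattle", "Washington DC", "Dallas",
                        "Chicago", "Los Angeles", "Minneapolis")
pca <- prcomp(dataset, scale. = TRUE)

# City names are not numbers, so this is all NA (hence no points, no colour)
suppressWarnings(as.numeric(rownames(dataset)))

# A factor gives one integer code per distinct city
city <- factor(rownames(dataset))
city_codes <- as.integer(city)
cols <- rainbow(nlevels(city))  # one colour per distinct city

plot(pca$x[, 1], pca$x[, 2], col = cols[city_codes], pch = 19,
     xlab = "First PC", ylab = "Second PC")
text(pca$x[, 1], pca$x[, 2], labels = rownames(dataset),
     cex = 0.7, pos = 3, col = cols[city_codes])
```

If several rows shared a city, they would share a factor level and therefore a colour, which is the usual reason to go through a factor rather than the row index.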
I have a dataframe object, created by reading in a shape file with sf::read_sf and merged with some pre-existing data with a common geography column:
boundaries <- sf::read_sf('./shapefile')
map <- merge(boundaries, data, by.x = "InterZone",
by.y = "IntermediateZone2011Code", all.x = FALSE, duplicateGeoms = TRUE)
This is then overlaid using ggmap on top of a provider tile obtained with the ggmap get_map function:
myMap <- get_map(location = c(lon = -2.27, lat = 57.1), zoom = 6,
maptype="toner", crop=FALSE, source = 'stamen')
ggmap(myMap) +
geom_sf(data = map, aes(fill=as.factor(column1)), inherit.aes = FALSE) +
scale_fill_brewer(palette = "OrRd") +
coord_sf(crs = st_crs(4326)) +
labs(x = 'Longitude', y = 'Latitude', fill = 'column1') +
ggtitle('column1')
The issue is that this automatically creates hundreds of bins.
I have been looking through the documentation but cannot find an argument to specify the number of bins. How can I break the column down into a fixed number of bins and then map it?
Without a reproducible example it is hard to say exactly what is going on, but it looks like you might be converting a continuous variable into a factor with fill=as.factor(column1).
One option is to remove as.factor and use scale_fill_continuous or some other continuous color scale of your choice.
Another option is to look into cut, which bins continuous data either into a given number of bins or at specific start and end points.
# Make n bins (n is a number you choose, e.g. 5)
map$data_bin <- cut(map$column1, breaks = n)
# Or make specific start and end points for bins
map$data_bin <- cut(map$column1, breaks = c(-Inf, 50, 100, Inf))
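To see what the two forms of breaks produce, here is a small sketch on a made-up vector (column1, bin_equal and bin_fixed are hypothetical stand-ins, not the asker's data):

```r
# Hypothetical continuous values standing in for map$column1
column1 <- c(3, 18, 42, 50, 77, 100, 130, 250)

# A single number asks for that many equal-width bins over the observed range
bin_equal <- cut(column1, breaks = 5)
nlevels(bin_equal)

# A vector gives explicit cut points: (-Inf,50], (50,100], (100,Inf]
bin_fixed <- cut(column1, breaks = c(-Inf, 50, 100, Inf),
                 labels = c("<= 50", "50-100", "> 100"))
table(bin_fixed)
```

Either result is a factor with a handful of levels, so mapping it to fill together with scale_fill_brewer should give one colour per bin instead of one per distinct value.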
I would like to use different colors for specific ranges of the values I have in the grid cells of a map (Southern Hemisphere) when working with this dataset. I'm not sure I have defined the blocks correctly, and I don't know how to ask for a different color for each block where I have at least one occurrence.
I'm using ggplot2 in R to create a map with grid cells that contain different numbers of individuals. I got a nice plot, but it shows only a few colors/shades, because only a few cells have high values. So I divided the range of individuals sighted per cell (n, which varies from 1 to 15035) into blocks of 100, so that I can ask R to use a different color/shade depending on the block each cell belongs to (e.g. one color for cells with 1-100 individuals, another for cells with 101-200 individuals, and so on). I know that I have many (151) blocks, but only 30 of them contain at least one cell; most of the ranges are empty. There seems to be a mistake in my blocks (breaks in the code provided), as there is some overlap, and I don't know how to pass this information to the plot to get 30 different colors, one for each block whose frequency is not zero. I tried some options using ggplot arguments, which I kept in the code provided here (the first three lines creating an object 'sc'). How should I specify my blocks (breaks) to avoid the overlap I'm getting? Is the limits argument in the code right? How do I ask for different colors for the blocks where I have at least one individual (n > 0)?
Any tip will be very appreciated :)
##Setting workspace
#setwd...
rm(list=ls()) #removing previous objects
#Installing (or loading) necessary packages
packages = c('ggplot2','sp','rgdal','sf','readxl','maps','dplyr')
package.check = lapply(packages, FUN = function(x) {
if (!require(x, character.only = TRUE)) {
install.packages(x, dependencies = TRUE)
library(x, character.only = TRUE)
}
})
#### Data ###
hump = read_excel("data_r.xlsx")
range(hump$Lat_Dec)
range(hump$Lon_Dec)
#Convert df to sf -----------------------------
hbk_sf = st_as_sf(x = hump,
coords = c("Lon_Dec", "Lat_Dec"),
crs = "+init=epsg:4326")
class(hbk_sf)
#Plot
plot(st_geometry(hbk_sf))
#Plot all variables (I don't recommend, it will take some time)
#plot(hbk_sf)
#Create grid--------------
grd = st_make_grid(hbk_sf, cellsize = 10, square = T)
grd = as_Spatial(grd)
grd = st_as_sf(grd)
grd$id = rownames(grd)
class(grd)
#plot(st_geometry(grd),add=T)
#Count individuals at grid------------------
#Spatial join: Add the grid id at hbk dataframe
hbk = st_join(hbk_sf, grd, join = st_within)
range(hbk$id,na.rm=T)
hbk_count = count(as_tibble(hbk), id)
#alternatively: hbk_count = aggregate(hbk$Hbk, by=list(hbk$id), FUN=sum)
hbk_count
#Adding the count in grids
grd_hbk = left_join(grd,hbk_count)
plot(grd_hbk)
range(grd_hbk$n,na.rm=T)
ggplot(grd_hbk,aes(x=n))+geom_density()
#Plotting with ggplot
world = sf::st_as_sf(map('world', plot = FALSE, fill = TRUE))
mymap = ggplot(grd_hbk[!is.na(grd_hbk$n),],aes(fill=n))+
geom_sf(data=world,aes(),fill='grey',lwd=.2)+
geom_sf(alpha=.7,lwd=0)+
scale_fill_distiller(palette = "Spectral")+
coord_sf(crs = st_crs(4326),xlim = c(-165,165), ylim = c(-70,6))+
theme_bw() # + theme(legend.position = 'none')
mymap
mymap = mymap + ggtitle("...")+
theme(plot.title = element_text(color = "black", size = 10, hjust = 0.5))
mymap
ggsave("mymap.png", dpi=300)
#-------
#To give colours to the grid cells according to the range of catches in each one
#that has at least one catch (grd_hbk$n > 0).
#I tried to create breaks to apply to grd_hbk$n. I want R to
#use a different colour/shade depending on the range grd_hbk$n falls in
#(one colour for cells with 1-100 catches, another colour for
#cells with 101-200 and so on)---
# Generate breaks to cut the data
breaks = seq(0, 15100, by = 100)
# Cut the data and save the result in an object
r = cut(grd_hbk$n, breaks) #this an overlap (Levels: (0,100] (100,200]
#(200,300]...)
# Check the number of different categories
length(levels(r))
# Name for the levels
levels(r) = as.character(seq_along(levels(r))) #152 break points give 151 levels
#table(levels(r))
table(r)
#create a combination of colours to use in the following plot
#sc = scale_colour_gradientn(colors = 'red', 'blue', 'green')
#sc = scale_fill_grey(start=1, end=15035, aes(fill=y))
#sc = scale_fill_grey(start=min(levels(r)), end=max(levels(r)),
#aes(fill=y))
sc = scale_fill_gradient(low="blue", high="red")
mymapii = ggplot(grd_hbk[!is.na(grd_hbk$n),],aes(fill=n))+
geom_sf(data=world,aes(),fill='grey',lwd=.2)+
geom_sf(alpha=.7,lwd=0)+
scale_colour_manual(limits = min(levels(r)),max(levels(r)),
values = sc, #colors to be used
breaks = breaks,
aes(fill=sc))+ #maybe need another specification here?
coord_sf(crs = st_crs(4326),xlim = c(-165,165), ylim = c(-70,6))+
theme_bw()
mymapii
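On the overlap question, a minimal base R sketch, assuming a few made-up counts in place of grd_hbk$n (the names n, n_bin and bin_cols are hypothetical). The intervals produced by cut are half-open, so (0,100] and (100,200] share only the printed boundary and not any value, and droplevels discards the blocks that contain no cell:

```r
# Hypothetical per-cell counts standing in for grd_hbk$n
n <- c(1, 100, 101, 250, 4300, 15035, NA)

breaks <- seq(0, 15100, by = 100)

# Half-open intervals: 100 falls in (0,100], 101 falls in (100,200]
# dig.lab = 5 keeps large break values out of scientific notation
n_bin <- cut(n, breaks = breaks, dig.lab = 5)

# Keep only the blocks that contain at least one cell
n_bin <- droplevels(n_bin)
levels(n_bin)

# One colour per remaining block, named by block
bin_cols <- setNames(hcl.colors(nlevels(n_bin)), levels(n_bin))
bin_cols
```

In the map itself, the binned column could then be mapped with aes(fill = n_bin), and a named vector like bin_cols passed to scale_fill_manual(values = ...). Note that fill needs a fill scale; scale_colour_manual would not affect it.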
I'm trying to build a column chart with highcharter in RStudio. I've converted the values to percentages because I want the chart to show percentages, but I want the data labels to show the raw values. Is there a way of doing this?
My data set has one column with the values for London and one with the percentages for London. I want the Y axis of the chart to show the percentage while the data labels show the value.
This is my current code:
hc <- highchart() %>%
hc_title(text= "Gender - London")%>%
hc_colors('#71599b') %>%
hc_yAxis(max = 0.7) %>%
hc_xAxis(categories = Sex$Gender) %>%
hc_add_series(name = "London", type = "column",
data = Sex$LON_PERC, dataLabels = list(enabled=TRUE, format={Sex$London}) )
So, I've put Sex$LON_PERC (% in London) as the data to plot while Sex$London is the data labels.
But this code puts all the values of London in each data label.
Edit:
This is the data I'm trying to plot, LON_PERC on the Y Axis, Gender on the X axis and London as the Data Labels
Gender London LON_PERC
Declined 5 0.000351247
Female 8230 0.578152441
Male 4640 0.325957148
No Data 1360 0.095539164
I am rather uncomfortable working with the highcharter package, as it requires a commercial license, which I do not have.
The result you want can be achieved with the following, rather straightforward, code using base R or ggplot2, both of which are free software. I will show this with two code fragments below.
### your data
Sex <- read.table(header = TRUE, text =
"Gender London LON_PERC
Declined 5 0.000351247
Female 8230 0.578152441
Male 4640 0.325957148
'No Data' 1360 0.095539164
")
A Solution using base r
The barplot function returns the coordinates of the midpoints of all the bars drawn: a vector when beside is FALSE, a matrix when beside is TRUE. This gives us the x-coordinates for placing text above the bars; the bar heights we already have in the data we plot.
# Draw the barplot and store result in `mp`
mp <- barplot(Sex$LON_PERC, # height of the bar
names.arg = Sex$Gender, # x-axis labels
ylim = c(0, 0.7), # limits of y-axis
col = '#71599b', # your color
main = "Gender - London") # main title
# add text to the barplot using the stored values
text(x = mp, # middle of the bars
y = Sex$LON_PERC, # height of the bars
labels = Sex$London, # text to display
adj = c(.5, -1.5)) # adjust horizontally and vertically
This yields the following plot:
A solution based on ggplot
library(ggplot2)
ggplot(aes(x = Gender, y = LON_PERC), data = Sex) +
geom_bar(stat = "identity", width = .60, fill = "#71599b" ) +
geom_text(aes(label = London),
position = position_dodge(width = .9),
vjust = -.3, size = 3, hjust = "center") +
theme_minimal() +
scale_y_continuous(limits = c(0, 0.7),
breaks = seq(0.0, 0.7, by = 0.1),
minor_breaks = NULL) +
labs(title = "Gender - London") +
theme(axis.title.y = element_blank(), axis.title.x = element_blank())
yielding the following plot:
In both cases, a lot of characteristics may be adapted to your needs/wishes.
I hope you benefit from these examples, even though it is not made with highcharter.
I've found a work around.
So, I can add in a "tooltip" that appears when I hover over the column/bar.
Firstly, a function is needed:
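myhc_add_series_labels_values <- function(hc, labels, values, text, colors = NULL, ...)
{
  # Validate inputs: a highchart object, numeric values, one label per value
  assertthat::assert_that(highcharter::is.highchart(hc), is.numeric(values),
                          length(labels) == length(values))
  # One row per point; `text` is carried along so the tooltip can show it
  df <- dplyr::tibble(name = labels, y = values, text = text)
  if (!is.null(colors)) {
    assertthat::assert_that(length(labels) == length(colors))
    df <- dplyr::mutate(df, color = colors)
  }
  # Turn the rows into a list of points and add them as a series
  ds <- highcharter::list_parse(df)
  hc <- highcharter::hc_add_series(hc, data = ds, ...)
  hc
}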
and then when creating the highchart this function needs to be called.
The data looks as follows:
Sex <- read.table(header = TRUE, text =
"Gender London LON_PERC
Declined 5 0.000351247
Female 8230 0.578152441
Male 4640 0.325957148
'No Data' 1360 0.095539164
")
Then the code to generate the highchart is:
Gender<- highchart() %>%
hc_xAxis(categories = Sex$Gender, labels=list(rotation=0))%>%
myhc_add_series_labels_values(labels = Sex$Gender,values=Sex$LON_PERC, text=Sex$London, type="column")%>%
hc_tooltip(crosshairs=TRUE, borderWidth=5, sort=TRUE, shared=TRUE, table=TRUE,pointFormat=paste('<br>%: {point.y}%<br>#: {point.text}'))%>%
hc_legend()
This gives the below output:
Then, when I hover over each column/bar, it gives me the % information and the number information, as can be seen here:
I am trying to change the fill colors in a US map that displays presidential election results by state. I have read many posts about changing these colors, but none of the approaches worked for me. Below are my code, a link to the dataset, and a snapshot of what I am getting:
#install.packages("ggplot2")
#install.packages("ggmap")
#install.packages("plyr")
#install.packages("raster")
#install.packages("stringr")
library(ggplot2) # for plotting and miscellaneuous things
library(ggmap) # for plotting
library(plyr) # for merging datasets
library(raster) # to get map shape file
library(stringr) # for string operation
# Get geographic data for USA
usa.shape<-getData("GADM", country = "usa", level = 1)
# Creating a data frame of map data
usa.df <- map_data("state")
#rename 'region' as 'state' and make it a factor variable
colnames(usa.df) [5] <- "State"
usa.df$State <- as.factor(usa.df$State)
#set working directory
setwd("C:/Users/Ashish/Documents/Stats projects/2/")
#input data from file separated by commas
usa.dat <- read.csv("data1.csv", header = T)
# printing data structure
str(usa.df)
# removing % sign from the data, and converting percentage win to numeric
usa.dat$Clinton <- as.numeric(sub("%","",usa.dat$Clinton))/1
usa.dat$Trump <- as.numeric(sub("%","",usa.dat$Trump))/1
usa.dat$Others <- as.numeric(sub("%","",usa.dat$Others))/1
# Creating a winner column based on the percentage
usa.dat$Winner = "Trump"
usa.dat[usa.dat$Clinton > usa.dat$Trump,]$Winner = "Clinton"
usa.dat$State <- tolower(usa.dat$State)
# Creating a chance column which corresponds to winning percentage of the candidate
usa.dat$chance <- usa.dat$Trump
a <- usa.dat[usa.dat$Clinton > usa.dat$Trump,]
usa.dat[usa.dat$Clinton > usa.dat$Trump,]$chance <- a$Clinton
# display the internal structure of the object
usa.dat
#join the usa.df and usa.dat objects on state variable
usa.df <- join(usa.df, usa.dat, by = "State", type = "inner")
str(usa.df)
states <- data.frame(state.center, state.abb) # centers of states and abbreviations
#function for plotting different regions of USA map based on the input data showing different coloring scheme
#for each state.
p <- function(data, title) {
ggp <- ggplot() +
# Draw borders of states
geom_polygon(data = data, aes(x = long, y = lat, group = group,
fill = Winner, alpha=chance), color = "black", size = 0.15) +
#scale_alpha_continuous(range=c(0,1))+
scale_color_gradientn(colours = c("#F08080","white","#5DADE2"),breaks = c(0,50,100),
labels=c("Clinton","Equal","Trump"),
limits=c(0,100),name="Election Forecast") +
# Add state abbreviations
geom_text(data = states, aes(x = x, y = y, label = state.abb), size = 2)+
guides(fill = guide_legend(direction='vertical', title='Candidate', label=TRUE, colours=c("red", "blue")))
return(ggp)
}
figure.title <- "2016 presidential election result"
# Save the map to a file for viewing (you can plot on the screen also, but it takes
# much longer that way. The ratio of US height to width is 1:1.9.)
#print(p(usa.df, brks.to.use, figure.title))
ggsave(p(usa.df, figure.title), height = 4, width = 4*1.9,
file = "election_result.jpg")
Image link:
Dataset: Dataset link
I would like to get the same coloring scheme as the one displayed in the "Election Forecast" gradient.
Thanks to Alistaire for his valuable feedback and solution to the above problem. Adding scale_fill_brewer(type = 'qual', palette = 6) to the ggplot() call resolves the issue.
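A minimal sketch of where that scale goes, assuming a made-up two-polygon data frame (toy, hypothetical) in place of the merged usa.df. Winner is mapped to fill, so it is a fill scale that controls the colors; the scale_color_gradientn call in the question acts on the colour aesthetic and so leaves the fills untouched:

```r
library(ggplot2)

# Hypothetical toy data: two unit squares standing in for two states
toy <- data.frame(
  long   = c(0, 1, 1, 0,  1, 2, 2, 1),
  lat    = c(0, 0, 1, 1,  0, 0, 1, 1),
  group  = rep(c("a", "b"), each = 4),
  Winner = rep(c("Clinton", "Trump"), each = 4),
  chance = rep(c(55, 62), each = 4)
)

p <- ggplot(toy) +
  geom_polygon(aes(x = long, y = lat, group = group,
                   fill = Winner, alpha = chance),
               color = "black") +
  # Discrete fill scale: one qualitative brewer colour per winner
  scale_fill_brewer(type = "qual", palette = 6, name = "Candidate") +
  # Transparency still encodes the winning percentage
  scale_alpha_continuous(range = c(0.4, 1), name = "Winning %")

p
```

If specific party colors are wanted rather than whatever the palette assigns, scale_fill_manual with a named values vector would be the usual alternative.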