I am trying to plot some information that shows a full population and then a subset of that population by location on a map. I've seen data visualizations that use concentric circles or 3-D inverted cones to convey this. I just can't figure out how to do it in ggplot / ggmap.
Here's a free hand version in Paint that shows a rough idea of what I'm looking to do:
Here's a rough piece of data for an example:
> dput(df1)
structure(list(zip = c("00210", "00653", "00952", "02571", "04211",
"05286", "06478", "07839", "10090", "11559"), city = c("Portsmouth",
"Guanica", "Sabana Seca", "Wareham", "Auburn", "Craftsbury",
"Oxford", "Greendell", "New York", "Lawrence"), state = c("NH",
"PR", "PR", "MA", "ME", "VT", "CT", "NJ", "NY", "NY"), latitude = c(43.005895,
17.992112, 18.429218, 41.751554, 44.197009, 44.627698, 41.428163,
41.12831, 40.780751, 40.61579), longitude = c(-71.013202, -66.90097,
-66.18014, -70.71059, -70.239485, -72.434398, -73.12729, -74.678956,
-73.977182, -73.73126), timezone = c(-5L, -4L, -4L, -5L, -5L,
-5L, -5L, -5L, -5L, -5L), dst = c(TRUE, FALSE, FALSE, TRUE, TRUE,
TRUE, TRUE, TRUE, TRUE, TRUE), totalPop = c(43177, 37224, 37168,
15492, 1614, 88802, 2587, 80043, 78580, 87461), subPop = c(42705,
36926, 27556, 10827, 774, 39060, 1542, 21304, 53438, 2896)), .Names = c("zip",
"city", "state", "latitude", "longitude", "timezone", "dst",
"totalPop", "subPop"), row.names = c(1L, 50L, 200L, 900L, 1500L,
2000L, 2500L, 3000L, 3500L, 4000L), class = "data.frame")
Any suggestions?
The basic idea is to use separate geoms for the two populations, making sure the smaller one is plotted after the larger one, so its layer is on top:
library(ggplot2) # using version 0.9.2.1
library(maps)
# load us map data
all_states <- map_data("state")
# start a ggplot. it won't plot til we type p
p <- ggplot()
# add U.S. states outlines to ggplot
p <- p + geom_polygon(data=all_states, aes(x=long, y=lat, group = group),
colour="grey", fill="white" )
# add total Population
p <- p + geom_point(data=df1, aes(x=longitude, y=latitude, size = totalPop),
colour="#b5e521")
# add sub Population as separate layer with smaller points at same long,lat
p <- p + geom_point(data=df1, aes(x=longitude, y=latitude, size = subPop),
colour="#00a3e8")
# change name of legend to generic word "Population"
p <- p + guides(size=guide_legend(title="Population"))
# display plot
p
From the map, it is clear your data include non-contiguous-US locations, in which case you may want different underlying map data. get_map() from ggmap package provides a couple options:
require(ggmap)
require(mapproj)
map <- get_map(location = 'united states', zoom = 3, maptype = "terrain",
source = "google")
p <- ggmap(map)
After which you add the total and sub Population geom_point() layers and display it as before.
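As a minimal sketch of the concentric-circle idea on its own (no base map), the two layers can share a single size scale. The example below uses three rows from df1 and assumes a recent ggplot2, where scale_size_area() is available; it maps a value of 0 to a point of area 0, so circle areas stay proportional to the populations. The max_size value is an arbitrary choice for illustration.

```r
library(ggplot2)

# Three rows taken from df1 in the question.
pop <- data.frame(
  city      = c("Portsmouth", "Wareham", "New York"),
  longitude = c(-71.013202, -70.71059, -73.977182),
  latitude  = c(43.005895, 41.751554, 40.780751),
  totalPop  = c(43177, 15492, 78580),
  subPop    = c(42705, 10827, 53438)
)

p <- ggplot(pop, aes(x = longitude, y = latitude)) +
  # larger circle first, so the subset is drawn on top of it
  geom_point(aes(size = totalPop), colour = "#b5e521") +
  geom_point(aes(size = subPop), colour = "#00a3e8") +
  # one shared scale for both layers; area proportional to the value
  scale_size_area(max_size = 20, name = "Population") +
  coord_quickmap()
p
```

Because both layers use the same scale, the inner circle can never be larger than the outer one as long as subPop <= totalPop.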
I have tried unsuccessfully to reverse the direction of the legend color ramp in a ggplot2 raster graph. I want the temperature ranges to be ordered from highest to lowest without changing the color assignment of the variable.
The dataframe was built from a raster: EneroT5cmSC
datene <- as.data.frame(EneroT5cmSC,xy=TRUE)%>%drop_na()
datene$cuts <- cut(datene$layer, breaks=seq(21, 29, length.out=12))
ggplot2 code:
p1 <-ggplot()+
geom_raster(aes(x=x,y=y,fill=cuts),data=datene_stuc)+
geom_sf(fill='transparent',data=conturu)+
geom_sf(fill='transparent',data=locuru)+
scale_fill_viridis_d( option = "B", 'Temp (Cº)')+
theme_minimal()+
theme(axis.title.x=element_blank(),
axis.title.y=element_blank())+
labs(title="Soil Temperature at 5cm depth",
subtitle='January',
caption='Fagro, 2022')
Graph:
Legend in reverse direction with correct color assignment
dput:
datene_stuc <-
structure(
list(
x = c(-57.063098328,-57.021448328,-56.996458328,-56.988128328),
y = c(-30.087481664,-30.087481664,-30.087481664,-30.087481664),
layer = c(
25.6227328470624,
26.6386584334308,
26.0636709134397,
26.0580615984563
),
cuts = structure(
c(7L, 9L,
8L, 8L),
.Label = c(
"(20,20.8]",
"(20.8,21.6]",
"(21.6,22.5]",
"(22.5,23.3]",
"(23.3,24.1]",
"(24.1,24.9]",
"(24.9,25.7]",
"(25.7,26.5]",
"(26.5,27.4]",
"(27.4,28.2]",
"(28.2,29]"
),
class = "factor"
)
),
row.names = c(NA,
4L),
class = "data.frame")
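One possible approach, sketched here with made-up values rather than the original raster, is to leave the scale alone and reverse only the legend with guide_legend(reverse = TRUE). The colour assigned to each interval stays the same; only the order of the legend keys changes. The data frame dat and its values are assumptions for illustration.

```r
library(ggplot2)

# Made-up stand-in for the raster data frame.
dat <- data.frame(x = 1:4, y = 1, layer = c(25.62, 26.64, 26.06, 22.10))
dat$cuts <- cut(dat$layer, breaks = seq(21, 29, length.out = 12))

p <- ggplot(dat) +
  geom_raster(aes(x = x, y = y, fill = cuts)) +
  # drop = FALSE keeps unused intervals in the legend as well
  scale_fill_viridis_d(option = "B", name = "Temp (\u00b0C)", drop = FALSE) +
  # reverse the legend order only; the colour mapping is untouched
  guides(fill = guide_legend(reverse = TRUE)) +
  theme_minimal()
p
```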
I am trying to put dots on a geographical map of Italy showing the province ('provincia') our patients come from. Ideally, the dot size should be proportional to the number of patients coming from that 'provincia'. An example of the list I would like to plot is the following.
MI 8319
CO 537
MB 436
VA 338
BG 310
PV 254
CR 244
NO 210
RM 189
CS 179
In the first column there is the 'provincia' code: MI (Milano), CO (Como), MB (Monza-Brianza), etc. In the second column there is the number of patients from that 'provincia'. So the output should be an Italian political map where the biggest dot is around the city of Milano (MI), the second biggest dot is near the city of Como (CO), the third one is around the city of Monza-Brianza (MB), etc.
Is there any package that could do the plot I am looking for? I found a tool that could do the job here, but apparently they expect that I load the geographical coordinates in order to do the plot.
https://www.littlemissdata.com/blog/maps
Thanks in advance.
Here is one way to handle your task. You have the abbreviations of the Italian provinces, and you can use them to merge your data with polygon data. If you download Italy's polygons from GADM, the data contain these abbreviations in the column HASC_2. After merging your data with the polygon data, create a second data set that contains the centroid of each province. You can then draw the map from the two data sets.
library(tidyverse)
library(sf)
library(ggthemes)
# Get the sf file from https://gadm.org/download_country_v3.html
# and import it in R.
mysf <- readRDS("gadm36_ITA_2_sf.rds")
# This is your data, which is called mydata.
mydata <- structure(list(abbs = c("MI", "CO", "MB", "VA", "BG", "PV", "CR",
"NO", "RM", "CS"), value = c(8319L, 537L, 436L, 338L, 310L, 254L,
244L, 210L, 189L, 179L)), class = "data.frame", row.names = c(NA,
-10L))
abbs value
1 MI 8319
2 CO 537
3 MB 436
4 VA 338
5 BG 310
6 PV 254
7 CR 244
8 NO 210
9 RM 189
10 CS 179
# Abbreviations are in HASC_2 in mysf. Manipulate strings so that
# I can join mydata with mysf with the abbreviations. I also get
# longitude and latitude with st_centroid(). This data set is for
# geom_point().
mysf2 <- mutate(mysf, HASC_2 = sub(x = HASC_2, pattern = "^IT.", replacement = "")) %>%
left_join(mydata, by = c("HASC_2" = "abbs")) %>%
mutate(lon = map_dbl(geometry, ~st_centroid(.x)[[1]]),
lat = map_dbl(geometry, ~st_centroid(.x)[[2]]))
# Draw a map
ggplot() +
geom_sf(data = mysf) +
geom_point(data = mysf2, aes(x = lon, y = lat, size = value)) +
theme_map()
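The join step can be tried without downloading anything. The sketch below uses a small, hypothetical stand-in for the attribute table of mysf (no geometry, illustrative names) and base R only. It escapes the dot in the pattern so that only a literal "IT." prefix is removed.

```r
# Hypothetical stand-in for the attribute table of `mysf` (no geometry column).
provinces <- data.frame(
  NAME_2 = c("Milano", "Como", "Roma", "Torino"),
  HASC_2 = c("IT.MI", "IT.CO", "IT.RM", "IT.TO"),
  stringsAsFactors = FALSE
)
mydata <- data.frame(
  abbs  = c("MI", "CO", "RM"),
  value = c(8319L, 537L, 189L),
  stringsAsFactors = FALSE
)

# Strip the literal "IT." prefix; "\\." matches a period only.
provinces$abbs <- sub("^IT\\.", "", provinces$HASC_2)

# Left join: keep every province, with NA where there are no patients.
joined <- merge(provinces, mydata, by = "abbs", all.x = TRUE)
joined
```

Provinces without patients keep an NA value, which geom_point() drops with a warning when the size is mapped to it.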
UPDATE ON INSET MAP
This update follows a different suggestion, using inset maps, which I think is the best solution given your question and comments:
library(sf)
library(cartography)
EU = st_read("~/R/mapslib/EUROSTAT/NUTS_RG_03M_2016_3035_LEVL_3.geojson")
IT = subset(EU, CNTR_CODE == "IT")
mydata <-
structure(list(
abbs = c("MI", "CO", "MB", "VA", "BG", "PV", "CR",
"NO", "RM", "CS"),
value = c(8319L, 537L, 436L, 338L, 310L, 254L,
244L, 210L, 189L, 179L),
nuts = c("ITC4C","ITC42","ITC4D","ITC41",
"ITC46", "ITC48","ITC4A","ITC15",
"ITI43","ITF61")
),
class = "data.frame",
row.names = c(NA, -10L))
patients = merge(IT, mydata, by.x = "id", by.y = "nuts")
#Get breaks for map
br=getBreaks(patients$value)
#Delimit zone
#Based on NUTS1, Northwest Italy
par(mar=c(0,0,0,0))
ghostLayer(IT[grep("ITC",IT$NUTS_ID),], bg="lightblue")
plot(st_geometry(EU), col="grey90", add=TRUE)
plot(st_geometry(IT), col = "#FEFEE9", border = "#646464", add=TRUE)
choroLayer(
patients,
var = "value",
breaks = br,
col = carto.pal(pal1 = "red.pal", n1 = length(br)-1),
legend.pos = "topleft",
legend.title.txt = "Total patients",
add = TRUE,
legend.frame = TRUE
)
labelLayer(patients,txt="abbs", halo=TRUE, overlap = FALSE)
#Inset
par(
fig = c(0, 0.4, 0.01, 0.4),
new = TRUE
)
inset=patients[patients$abbs %in% c("RM","CS"),]
ghostLayer(inset, bg="lightblue")
plot(st_geometry(EU), col="grey90", add=TRUE)
plot(st_geometry(IT), col = "#FEFEE9", border = "#646464", add=TRUE)
choroLayer(
patients,
var = "value",
breaks = br,
col = carto.pal(pal1 = "red.pal", n1 = length(br)-1),
legend.pos = "n",
add = TRUE
)
labelLayer(patients,txt="abbs", halo=TRUE, overlap = FALSE)
box(which = "figure", lwd = 1)
#RESTORE PLOT
par(fig=c(0,1,0,1))
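The inset mechanism itself is plain base graphics and does not depend on cartography. As a minimal sketch with placeholder rectangles instead of maps: par(fig = ..., new = TRUE) restricts the next plot to a sub-region of the device without clearing what is already drawn. The variable inset_fig is only there to show which region was active.

```r
op <- par(mar = c(0, 0, 0, 0))                 # no margins; keep old settings

# Main panel (placeholder for the main map).
plot(0:10, 0:10, type = "n", axes = FALSE, xlab = "", ylab = "")
rect(0, 0, 10, 10, col = "grey90")

# Inset: lower-left part of the device, drawn on top of the main panel.
par(fig = c(0, 0.4, 0.01, 0.4), new = TRUE)
inset_fig <- par("fig")                        # region used by the inset
plot(0:1, 0:1, type = "n", axes = FALSE, xlab = "", ylab = "")
rect(0, 0, 1, 1, col = "lightblue")
box(which = "figure", lwd = 1)

# Restore the full figure region and the original margins.
par(fig = c(0, 1, 0, 1))
par(op)
```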
OLD ANSWER
Following my comment on plotting labels: circles may not be the best option for your map, given how concentrated the values are. I suggest using another kind of map instead, such as a choropleth (choroLayer). I borrowed the data frame from jazzurro (https://stackoverflow.com/users/3304471/jazzurro).
library(sf)
library(cartography)
EU = st_read("~/R/mapslib/EUROSTAT/NUTS_RG_03M_2016_3035_LEVL_3.geojson")
IT = subset(EU, CNTR_CODE == "IT")
mydata <-
structure(list(
abbs = c("MI", "CO", "MB", "VA", "BG", "PV", "CR",
"NO", "RM", "CS"),
value = c(8319L, 537L, 436L, 338L, 310L, 254L,
244L, 210L, 189L, 179L),
nuts = c("ITC4C","ITC42","ITC4D","ITC41",
"ITC46", "ITC48","ITC4A","ITC15",
"ITI43","ITF61")
),
class = "data.frame",
row.names = c(NA, -10L))
patients = merge(IT, mydata, by.x = "id", by.y = "nuts")
#Option 1 - With circles
par(mar = c(0, 0, 0, 0))
plot(st_geometry(IT), col = "#FEFEE9", border = "#646464")
propSymbolsLayer(
x = patients,
var = "value",
col = carto.pal(pal1 = "red.pal", n1 = 6),
legend.title.txt = "Total patients",
add = TRUE
)
#Option 2 - Chorolayer with labels
par(mar = c(0, 0, 0, 0))
plot(st_geometry(IT), col = "#FEFEE9", border = "#646464")
choroLayer(
patients,
var = "value",
col = carto.pal(pal1 = "red.pal", n1 = 6),
legend.title.txt = "Total patients",
add = TRUE
)
#Create labels
patients$label = paste(patients$abbs, patients$value, sep = " - ")
labelLayer(
patients,
txt = "label",
overlap = FALSE,
halo = TRUE,
show.lines = TRUE
)
Data from
https://ec.europa.eu/eurostat/cache/GISCO/distribution/v2/nuts/nuts-2016-files.html
I am trying to use ggplot2 to make a US map based on some state level data, and color each state based on the value of one variable.
State loan
AL 25310770
AK 45310770
AZ 35310770
AR 25682770
...
Edit: Thanks to @Hector Haffenden, the dput(head(your_data)) gives:
structure(list(state = c("AL", "AK", "AZ", "AR", "IL", "MA"),
loan = c(25310770, 21230922, 15055436, 15212963, 12796921, 20311736)),
row.names = c(NA, 6L), class = "data.frame")
Since I have a variable of state name, is it possible to automatically match each row to the map based on the state name abbreviations? Here is an example of my expected output:
https://i.imgur.com/0CD4fOx.png
Try this. First, define the data:
dat <- data.frame(state = c("AL", "AK", "AZ", "AR"), Loan = c(25310770, 45310770, 35310770, 25682770))
Load the packages usmap and ggplot2. With more complete data the whole map would be filled, but with the sample provided we get:
library(usmap)
library(ggplot2)
plot_usmap(
data = dat, values = "Loan", lines = "red"
) +
scale_fill_continuous(
low = "white", high = "red", name = "Loan", label = scales::comma
) +
labs(title = "US States", subtitle = "States and loan data") +
theme(legend.position = "right")
Note that some states are grey because the sample data covers only a few states.
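If you would rather stay with plain ggplot2 and map_data("state"), the abbreviations first need to be turned into lowercase full names, because that is how the regions are labelled there. A minimal sketch of that lookup, using the state.abb and state.name vectors that ship with base R (the column name region is chosen to match map_data()):

```r
dat <- data.frame(
  state = c("AL", "AK", "AZ", "AR"),
  loan  = c(25310770, 45310770, 35310770, 25682770)
)

# state.abb and state.name are parallel built-in vectors of the 50 states.
dat$region <- tolower(state.name[match(dat$state, state.abb)])
dat

# dat can then be joined to ggplot2::map_data("state") by "region".
# Note that the "state" map only covers the contiguous states,
# so Alaska and Hawaii would not be drawn.
```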
When plotting markers on an interactive world map with the R package leaflet, data points with exactly the same coordinates overlap each other.
See the example below:
library(leaflet)
Data <- structure(list(Name = structure(1:3, .Label = c("M1", "M2", "M3"), class = "factor"), Latitude = c(52L, 52L, 51L), Longitude = c(50L, 50L, 50L), Altitude = c(97L, 97L, 108L)), .Names = c("Name", "Latitude", "Longitude", "Altitude"), class = "data.frame", row.names = c(NA, -3L))
leaflet(data = Data) %>%
addProviderTiles("Esri.WorldImagery", options = providerTileOptions(noWrap = TRUE)) %>%
addMarkers(~Longitude, ~Latitude, popup = ~as.character(paste(sep = "",
"<b>",Name,"</b>","<br/>", "Altitude: ",Altitude)))
It is possible to show all coordinates with the cluster option, but this is far from my goal. I don't want clusters, and the overlapping markers are only shown when fully zoomed in. When fully zoomed in, the background map turns grey ("Map data not yet available"). The spider view of the overlapping markers is what I want, but without having to zoom in fully.
See example below:
leaflet(data = Data) %>%
addProviderTiles("Esri.WorldImagery", options = providerTileOptions(noWrap = TRUE)) %>%
addMarkers(~Longitude, ~Latitude, popup = ~as.character(paste(sep = "",
"<b>",Name,"</b>","<br/>", "Altitude: ",Altitude)), clusterOptions = markerClusterOptions())
I found some material on the solution I want, but I don't know how to implement it with the R leaflet package.
https://github.com/jawj/OverlappingMarkerSpiderfier-Leaflet
Other approaches to handling overlapping markers are also welcome (for example, combining the info of several markers in one popup).
You could jitter() your coordinates slightly:
library(mapview)
library(sp)
Data <- structure(list(Name = structure(1:3, .Label = c("M1", "M2", "M3"),
class = "factor"),
Latitude = c(52L, 52L, 51L),
Longitude = c(50L, 50L, 50L),
Altitude = c(97L, 97L, 108L)),
.Names = c("Name", "Latitude", "Longitude", "Altitude"),
class = "data.frame", row.names = c(NA, -3L))
Data$lat <- jitter(Data$Latitude, factor = 0.0001)
Data$lon <- jitter(Data$Longitude, factor = 0.0001)
coordinates(Data) <- ~ lon + lat
proj4string(Data) <- "+init=epsg:4326"
mapview(Data)
With this approach you still need to zoom in for the markers to separate; how far you need to zoom in depends on the factor argument of jitter().
Note that I am using library(mapview) in the example for simplicity.
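A possible refinement, sketched here in base R, is to jitter only the points that actually share coordinates, so that unique locations stay exactly where they are. The amount of 0.0005 degrees is an arbitrary assumption; jitter(x, amount = a) adds uniform noise between -a and a.

```r
set.seed(1)  # reproducible noise

Data <- data.frame(
  Name      = c("M1", "M2", "M3"),
  Latitude  = c(52, 52, 51),
  Longitude = c(50, 50, 50)
)

# Flag every row whose coordinate pair occurs more than once.
key <- paste(Data$Latitude, Data$Longitude)
dup <- duplicated(key) | duplicated(key, fromLast = TRUE)

# Copy the coordinates, then move only the flagged rows.
Data$lat <- Data$Latitude
Data$lon <- Data$Longitude
Data$lat[dup] <- jitter(Data$Latitude[dup], amount = 0.0005)
Data$lon[dup] <- jitter(Data$Longitude[dup], amount = 0.0005)
Data
```

The lat and lon columns can then be used in place of the original coordinates, as in the example above.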
Following up on my comment, here's a somewhat more modern solution (circa 2020) that takes advantage of some newer packages designed to make our lives easier (tidyverse and sf). I use sf::st_jitter() as well as mapview, as @TimSalabim does. Finally, I chose a slightly larger jitter factor so you don't have to zoom in quite so far to see the effect:
library(tidyverse) # for tibble() and %>%
library(mapview)
library(sf)
Data <- tibble(Name = c("M1", "M2", "M3"),
Latitude = c(52L, 52L, 51L),
Longitude = c(50L, 50L, 50L),
Altitude = c(97L, 97L, 108L))
Data %>%
st_as_sf(coords = c("Longitude", "Latitude"), crs = 4326) %>%
st_jitter(factor = 0.001) %>%
mapview
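The question also mentions putting the info of several markers into one popup. A minimal base R sketch of that idea, not taken from either answer: build one HTML label per marker, then collapse the labels of markers that share a location. The "<hr/>" separator and the column name label are arbitrary choices.

```r
Data <- data.frame(
  Name      = c("M1", "M2", "M3"),
  Latitude  = c(52, 52, 51),
  Longitude = c(50, 50, 50),
  Altitude  = c(97, 97, 108)
)

# One HTML snippet per marker, in the same style as the question's popup.
Data$label <- paste0("<b>", Data$Name, "</b><br/>Altitude: ", Data$Altitude)

# One row per distinct location; labels at the same place are concatenated.
merged <- aggregate(label ~ Latitude + Longitude, data = Data,
                    FUN = paste, collapse = "<hr/>")
merged

# merged can then replace Data in the leaflet call, e.g.
# addMarkers(~Longitude, ~Latitude, popup = ~label)
```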
I want to plot parts of a raster (the highest values) on top of a previously plotted map. Here is my code:
ggplot() + geom_map(data = GERblue_iran, aes(map_id = state, fill = veggies), map = iran_states) +
expand_limits(x = iran_states$long, y = iran_states$lat) +
geom_polygon(data=iran_states, aes(x= long, y= lat, group=group), color='grey40', lwd=0.3, fill=NA) +
scale_fill_distiller(palette="PuBu")
resulting in this plot:
And the code for the raster overlay:
ggplot() + geom_tile(data=nut_iran.df, aes(x = x, y = y, fill = layer)) + scale_fill_distiller(palette = "Reds")
My expected result should look (more or less) like this (result made in photoshop!!):
showing only values (of raster) above +/-1000.
Does anyone know how to plot like this in ggplot2?
For reproducibility, here a bit of my data:
library(ggplot2)
library(raster) # provides getData()
# polygons
iran <- getData("GADM", country = "IRN", level = 1)
iran_states <- fortify(iran, region = "NAME_1")
# data for choropleth map:
GERblue_iran <- structure(list(veggies = c(NA, 1135142.7169744, 1064475.14405642,
579007.139090945, 2291173.06203667, 1609487.86612194, 5514745.42173307,
210033.193615536, NA, 1082275.82518455, 395053.664034339, 833546.886449334,
1350410.79594876, 2030498.45168616, 5018327.9046678, 413119.296060151,
853322.135586823, 2136776.14200603, 581494.047168068, 535593.624579909,
414310.523642145, NA, NA, 2156369.86690811, 274390.590608389,
546804.909031463, 144406.95766963, 285002.432443622, 1605244.30546598,
307546.827903725, 589330.238261654), fruits = c(NA, 19645300.9396573,
39318516.6693754, 15154130.3351692, 38374281.8287458, 29164989.9985857,
125240289.719822, 6419392.00424945, NA, 23342736.5294504, 9223806.43987587,
19008972.7788205, 62709223.291618, 41703691.9392781, 164306013.73773,
13518682.0729514, 17420595.3934969, 44462391.2814304, 11715807.4374495,
13475005.0070146, 12228946.6624824, NA, NA, 63708363.0757236,
9221772.9477743, 13545791.4738047, 4268610.12809181, 6496251.74039526,
31651316.7352119, 8570276.47057257, 10288282.5059752), nuts = c(NA,
108771666.285736, 188713516.14938, 84626256.2539182, 227028948.551643,
165167762.523232, 669060935.751113, 17905599.826691, NA, 124536243.958677,
62369588.7036755, 123253859.776379, 3137384087.58166, 279956412.016931,
506078060.03775, 70275261.7698334, 115596090.695869, 284469721.056207,
73232219.7923014, 47691287.4633623, 73453936.1223698, NA, NA,
382631908.316345, 78226462.4088062, 60449633.6571361, 25656409.607032,
36523271.0224757, 233944364.555385, 158201233.377931, 74033085.0714528
), state = c("Alborz", "Ardebil", "Bushehr", "Chahar Mahall and Bakhtiari",
"East Azarbaijan", "Esfahan", "Fars", "Gilan", "Golestan", "Hamadan",
"Hormozgan", "Ilam", "Kerman", "Kermanshah", "Khuzestan", "Kohgiluyeh and Buyer Ahmad",
"Kordestan", "Lorestan", "Markazi", "Mazandaran", "North Khorasan",
"Qazvin", "Qom", "Razavi Khorasan", "Semnan", "Sistan and Baluchestan",
"South Khorasan", "Tehran", "West Azarbaijan", "Yazd", "Zanjan"
)), .Names = c("veggies", "fruits", "nuts", "state"), row.names = c(NA,
31L), class = "data.frame")
However, the raster data set is too large to share, and I don't know how to create a random raster with this country's extent. In case someone does, here is the transformation I used in order to plot it with geom_tile():
nut_iran.spdf <- as(nut_iran, "SpatialPixelsDataFrame")
nut_iran.df <- as.data.frame(nut_iran.spdf)
Thanks for your ideas.
EDIT: Following @spacedman's suggestion, I managed to plot the raster the way I want by setting all values of nut_iran.df$layer below 1000 to NA and adding na.value="transparent" to scale_fill_distiller. The result is fine:
However, plotting both together delivers:
Because there are two different fill mappings (as @hrbrmstr pointed out), only the last palette (red) is used, and the NA values of the choropleth map also become transparent.
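One possible workaround, sketched with toy data and not taken from the thread, is to avoid a second fill scale altogether: give the raster tiles a constant fill and map their values to alpha instead. The choropleth then keeps the only fill scale, and its NA colour is controlled independently. The states and r data frames below are made up for illustration.

```r
library(ggplot2)

# Toy "states": two unit squares, one of them with a missing value.
states <- data.frame(
  id      = rep(c("A", "B"), each = 4),
  x       = c(0, 1, 1, 0,  1, 2, 2, 1),
  y       = c(0, 0, 1, 1,  0, 0, 1, 1),
  veggies = rep(c(10, NA), each = 4)
)

# Toy raster as a data frame, shaped like nut_iran.df.
set.seed(42)
r <- expand.grid(x = seq(0.125, 1.875, by = 0.25),
                 y = seq(0.125, 0.875, by = 0.25))
r$layer <- runif(nrow(r), 0, 2000)
r$layer[r$layer < 1000] <- NA          # keep only the high values

p <- ggplot() +
  geom_polygon(data = states,
               aes(x = x, y = y, group = id, fill = veggies),
               colour = "grey40") +
  # constant red fill; the value drives transparency, not fill
  geom_tile(data = subset(r, !is.na(layer)),
            aes(x = x, y = y, alpha = layer), fill = "red") +
  scale_fill_distiller(palette = "PuBu", na.value = "grey80") +
  scale_alpha_continuous(range = c(0.3, 1))
p
```

If two genuine colour ramps are needed, a package that adds a second fill scale would be the alternative; the alpha mapping is only the simplest option that stays within ggplot2.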