I frequently need to create geographical heat maps in R. So far I have been doing this with a licensed copy of Tableau on my office computer, which does a superb job, but I need to learn how to do it when I'm out of the office. The data is sometimes confidential, so I cannot use Tableau Public over the internet. I looked but could not find any solution that produces the result I need.
The data consists of names of districts in the state of Jharkhand, India along with child population in age group 6 to 14 in thousands. In Tableau, I merely have to set the DISTNAME column to "Geographical Role" at "County" level and it pulls the map of the state along with district boundaries from the internet (OpenStreetMap) and produces a heat map like this which is the result I expect from R, if possible:
The data is:
geo_data <- structure(list(DISTNAME = c("BOKARO", "CHATRA", "DEOGHAR", "DHANBAD",
"DUMKA", "GARHWA", "GIRIDIH", "GODDA", "GUMLA", "HAZARIBAGH",
"JAMTARA", "KHUNTI", "KODARMA", "LATEHAR", "LOHARDAGA", "PAKUR",
"PALAMU", "PASHCHIMI SINGHBHUM", "PURBI SINGHBHUM", "RAMGARH",
"RANCHI", "SAHIBGANJ", "SARAIKELA-KHARSAWAN", "SIMDEGA"), POP = c(521.5,
196.5, 323.8, 445.5, 123, 373.9, 357.6, 248.2, 212.4, 686.7,
626.7, 383.6, 391.9, 141, 436.1, 454.6, 301.3, 325.5, 193.7,
238.3, 208.7, 587.4, 130.1, 268)), .Names = c("DISTNAME", "POP"
), row.names = c(NA, 24L), class = "data.frame")
And looks like:
DISTNAME POP
1 BOKARO 521.5
2 CHATRA 196.5
3 DEOGHAR 323.8
4 DHANBAD 445.5
5 DUMKA 123.0
6 GARHWA 373.9
7 GIRIDIH 357.6
8 GODDA 248.2
9 GUMLA 212.4
10 HAZARIBAGH 686.7
11 JAMTARA 626.7
12 KHUNTI 383.6
13 KODARMA 391.9
14 LATEHAR 141.0
15 LOHARDAGA 436.1
16 PAKUR 454.6
17 PALAMU 301.3
18 PASHCHIMI SINGHBHUM 325.5
19 PURBI SINGHBHUM 193.7
20 RAMGARH 238.3
21 RANCHI 208.7
22 SAHIBGANJ 587.4
23 SARAIKELA-KHARSAWAN 130.1
24 SIMDEGA 268.0
You'll need a shapefile of the district boundaries, which you can download with getData() from the raster package. Full working code:
library(tidyverse)
library(broom)
library(rgdal)
# Your geo data
geo_data <- structure(list(DISTNAME = c("BOKARO", "CHATRA", "DEOGHAR", "DHANBAD", "DUMKA", "GARHWA", "GIRIDIH", "GODDA", "GUMLA", "HAZARIBAGH", "JAMTARA", "KHUNTI", "KODARMA", "LATEHAR", "LOHARDAGA", "PAKUR", "PALAMU", "PASHCHIMI SINGHBHUM", "PURBI SINGHBHUM", "RAMGARH", "RANCHI", "SAHIBGANJ", "SARAIKELA-KHARSAWAN", "SIMDEGA"),
POP = c(521.5, 196.5, 323.8, 445.5, 123, 373.9, 357.6, 248.2, 212.4, 686.7, 626.7, 383.6, 391.9, 141, 436.1, 454.6, 301.3, 325.5, 193.7, 238.3, 208.7, 587.4, 130.1, 268)),
.Names = c("DISTNAME", "POP"),
row.names = c(NA, 24L),
class = "data.frame")
# get the map
library(raster)
IN2 <- getData('GADM', country='IND', level=2)
IN2 <- spTransform(IN2, CRS("+init=epsg:4326"))
IN2_map <- tidy(IN2, region = "NAME_2")
# id in geo_data to lower
geo_data$DISTNAME <- tolower(geo_data$DISTNAME)
IN2_map %>%
mutate(id = tolower(id)) %>%
left_join(geo_data, by = c("id" = "DISTNAME")) %>%
ggplot() +
geom_polygon(aes(long, lat, group=group, fill = POP), color = "black")
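Because the join above relies on district names matching exactly after lower-casing, it can help to list the names that fail to match before plotting, since unmatched districts are drawn with an NA fill. A minimal base-R sketch of that check follows; `map_ids` is a made-up stand-in for the ids in IN2_map, and the real GADM spellings may well differ.

```r
# Hypothetical ids standing in for unique(tolower(IN2_map$id));
# the real GADM spellings are not reproduced here.
map_ids <- c("bokaro", "chatra", "east singhbhum")

# District names from the data, lower-cased the same way as in the answer
data_ids <- tolower(c("BOKARO", "CHATRA", "PURBI SINGHBHUM"))

# Names present in the data but absent from the map (their values are dropped)
missing_in_map <- setdiff(data_ids, map_ids)

# Names present in the map but absent from the data (drawn with NA fill)
missing_in_data <- setdiff(map_ids, data_ids)

print(missing_in_map)
print(missing_in_data)
```

Any names reported here would need to be recoded by hand on one side before the left_join.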
In the solution below I've used map shapefiles downloaded from: http://projects.datameet.org/maps/districts/
Edit: Later I also tried Jharkhand map extracted from http://gadm.org/country which shows slight differences in district boundaries. It matches better with other political maps of the state available on the internet.
Here's my solution:
library(tmap)
library(tmaptools)
geo_data <- data.frame(
DISTNAME = c("BOKARO", "CHATRA", "DEOGHAR", "DHANBAD", "DUMKA", "GARHWA", "GIRIDIH", "GODDA", "GUMLA", "HAZARIBAGH", "JAMTARA", "KHUNTI", "KODARMA", "LATEHAR", "LOHARDAGA", "PAKUR", "PALAMU", "PASHCHIMI SINGHBHUM", "PURBI SINGHBHUM", "RAMGARH", "RANCHI", "SAHIBGANJ", "SARAIKELA-KHARSAWAN", "SIMDEGA"),
POP = c(521.5, 196.5, 323.8, 445.5, 123, 373.9, 357.6, 248.2, 212.4, 686.7, 626.7, 383.6, 391.9, 141, 436.1, 454.6, 301.3, 325.5, 193.7, 238.3, 208.7, 587.4, 130.1, 268))
# the path to shape file
shp_file <- "H:/Mapping/maps-master/Districts/Census_2011/2011_Dist.shp"
india <- read_shape(shp_file, as.sf = TRUE, stringsAsFactors = FALSE)
india$DISTRICT <- toupper(india$DISTRICT)
jharkhand <- india[india$ST_NM =="Jharkhand", ]
jharkhand_pop <- merge(x = jharkhand,
y = geo_data,
by.x = "DISTRICT",
by.y = "DISTNAME")
#tmap_mode(mode = "plot") # static
tmap_mode(mode = "view") # interactive
qtm(jharkhand_pop, fill = "POP",
text = "DISTRICT",
text.size=.9)
The static map (plot mode) is very good but the interactive map (view mode) is super awesome. It gives the option to pull additional map information from three different sources from the internet.
A big thanks to the creators of tmap and tmaptools packages. This method is far superior to many comparatively longer and awkward solutions that can be found on the internet.
If we want more customization:
tm_shape(jharkhand_pop) +
tm_polygons() +
tm_shape(jharkhand_pop) +
tm_borders() +
tm_fill("POP",
palette = get_brewer_pal("YlOrRd", n = 20),
n = 20,
legend.show = F,
style = "order") + # "cont" or "order" for continuous variable
tm_text("DISTRICT", size = .7, ymod = .1) +
tm_shape(jharkhand_pop) +
tm_text("POP", size = .7, ymod = -.2)
we get the following in plot mode:
I'm working on a bubble map for which I generated two columns, one for a color id (column Color) and one for a text label referring to that id (column Class). This is a classification of my individuals (each Color always corresponds to one Class).
Class is a factor following a certain order that I made with :
COME1039$Class <- factor(COME1039$Class, levels = c('moins de 100 000 F.CFP',
'entre 100 000 et 5 millions F.CFP',
'entre 5 millions et 1 milliard F.CFP',
'entre 1 milliard et 20 milliards F.CFP',
'plus de 20 milliards F.CFP'))
This is my code
g <- list(
scope = 'world',
visible = F,
showland = TRUE,
landcolor = toRGB("#EAECEE"),
showcountries = T,
countrycolor = toRGB("#D6DBDF"),
showocean = T,
oceancolor = toRGB("#808B96")
)
COM.g1 <- plot_geo(data = COME1039,
sizes = c(1, 700))
COM.g1 <- COM.g1 %>% add_markers(
x = ~LONGITUDE,
y = ~LATITUDE,
name = ~Class,
size = ~`Poids Imports`,
color = ~Color,
colors=c(ispfPalette[c(1,2,3,7,6)]),
text=sprintf("<b>%s</b> <br>Poids imports: %s tonnes<br>Valeur imports: %s millions de F.CFP",
COME1039$NomISO,
formatC(COME1039$`Poids Imports`/1000,
small.interval = ",",
digits = 1,
big.mark = " ",
decimal.mark = ",",
format = "f"),
formatC(COME1039$`Valeur Imports`/1000000,
small.interval = ",",
digits = 1,
big.mark = " ",
decimal.mark = ",",
format = "f")),
hovertemplate = "%{text}<extra></extra>"
)
COM.g1 <- COM.g1%>% layout(geo=g)
COM.g1 <- COM.g1%>% layout(dragmode=F)
COM.g1 <- COM.g1 %>% layout(showlegend=T)
COM.g1 <- COM.g1 %>% layout(legend = list(title=list(text='Valeurs des importations<br>'),
orientation = "h",
itemsizing='constant',
x=0,
y=0)) %>% hide_colorbar()
COM.g1
Unfortunately my data are too big to be added here, but this is the output I get :
As you can see, the order of the legend is not the order of the factor levels. How can I make the legend follow the factor levels? If data are needed to help you give me a hint, I will try to limit their size.
Many thanks !
Plotly is going to alphabetize your legend and you have to 'make' it listen. The order of the traces in your plot is the order in which the items appear in your legend. So if you rearrange the traces in the object, you'll rearrange the legend.
I don't have your data, so I used some data from rnaturalearth.
First I created a plot, using plot_geo. Then I used plotly_build() to make sure I had the trace order in the Plotly object. I used lapply to investigate the current order of the traces. Then I created a new order, rearranged the traces, and plotted it again.
The initial plot and build.
library(tidyverse)
library(plotly)
library(rnaturalearth)
canada <- ne_states(country = "Canada", returnclass = "sf")
x = plot_geo(canada, sizes = c(1, 700)) %>%
add_markers(x = ~longitude, y = ~latitude,
name = ~name, color = ~name)
x <- plotly_build(x) # capture all elements of the object
Now for the investigation; this is more so you can see how this all comes together.
# what order are they in?
y = vector()
invisible(
lapply(1:length(x$x$data),
function(i) {
z <- x$x$data[[i]]$name
message(i, " ", z)
})
)
# 1 Alberta
# 2 British Columbia
# 3 Manitoba
# 4 New Brunswick
# 5 Newfoundland and Labrador
# 6 Northwest Territories
# 7 Nova Scotia
# 8 Nunavut
# 9 Ontario
# 10 Prince Edward Island
# 11 Québec
# 12 Saskatchewan
# 13 Yukon
In your question, you show that you made the legend element a factor. That's what I've done as well with this data.
can2 = canada %>%
mutate(name = ordered(name,
levels = c("Manitoba", "New Brunswick",
"Newfoundland and Labrador",
"Northwest Territories",
"Alberta", "British Columbia",
"Nova Scotia", "Nunavut",
"Ontario", "Prince Edward Island",
"Québec", "Saskatchewan", "Yukon")))
I used the data to reorder the traces in my Plotly object. This creates a vector. It starts with the levels and their row number or order (1:13). Then I alphabetized the data by the levels (so it matches the current order in the Plotly object).
The output of this set of function calls is a vector of numbers (i.e., 5, 6, 1, etc.). Since I have 13 names, I have 1:13. You could always make it dynamic as well, with 1:length(levels(can2$name)).
# capture order
df1 = data.frame(who = levels(can2$name), ord = 1:13) %>%
arrange(who) %>% select(ord) %>% unlist()
Now all that's left is to rearrange the object traces and visualize it.
x$x$data = x$x$data[order(c(df1))] # reorder the traces
x # visualize
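The permutation used for the reordering can be checked on a toy example in base R. This is only a sketch with four made-up trace names rather than the real Plotly object; it shows that the order()-based route from the answer and a direct match() give the same result.

```r
# Trace names in the order plotly creates them (alphabetical)
trace_names <- c("Alberta", "Manitoba", "Ontario", "Yukon")

# Desired legend order (this would be levels(can2$name) in the answer)
wanted <- c("Manitoba", "Yukon", "Alberta", "Ontario")

# Route 1: position of each wanted name among the existing traces
perm_match <- match(wanted, trace_names)

# Route 2 (the answer's approach): rank of each alphabetical trace in the
# wanted order, then order() to turn those ranks into a permutation
ranks <- match(trace_names, wanted)
perm_order <- order(ranks)

# Either permutation puts the traces into the wanted order
reordered <- trace_names[perm_match]
print(reordered)
```

With the real object, the equivalent step would be x$x$data <- x$x$data[perm_match], assuming every trace name matches a factor level exactly.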
Originally:
With reordered traces:
I have the dataframe below, for which I want to create a choropleth map. I downloaded the Germany shapefile from here and then used this code to create the map. As you can see, the map is created, but because several regions are missing from my data they are set to NA and get a black color. How can I deal with this issue? Maybe eliminate them or change them to 0? I'm open to other packages like leaflet if they can solve the issue.
region<-c("09366",
"94130",
"02627",
"95336",
"08525",
"92637",
"95138",
"74177",
"08606",
"94152" )
value<-c( 39.5,
519.,
5.67,
5.10,
5.08,
1165,
342,
775,
3532,
61.1 )
df<-data.frame(region,value)
#shapefile from http://www.suche-postleitzahl.org/downloads?download=zuordnung_plz_ort.csv
library(choroplethr)
library(dplyr)
library(ggplot2)
library(rgdal)
library(maptools)
library(gpclib)
library(readr)
library(R6)
ger_plz <- readOGR(dsn = ".", layer = "plz-gebiete")
gpclibPermit()
#convert the raw data to a data.frame as ggplot works on data.frames
ger_plz@data$id <- rownames(ger_plz@data)
ger_plz.point <- fortify(ger_plz, region="id")
ger_plz.df <- inner_join(ger_plz.point,ger_plz@data, by="id")
head(ger_plz.df)
ggplot(ger_plz.df, aes(long, lat, group=group )) + geom_polygon()
#data file
#df <- produce_sunburst_sequences
# variable name 'region' is needed for choroplethr
ger_plz.df$region <- ger_plz.df$plz
head(ger_plz.df)
#subclass choroplethr's Choropleth to make a class for my needs
GERPLZChoropleth <- R6Class("GERPLZChoropleth",
inherit = choroplethr:::Choropleth,
public = list(
initialize = function(user.df) {
super$initialize(ger_plz.df, user.df)
}
)
)
#df<-df[,c(6,13)]
#choropleth needs these two columnames - 'region' and 'value'
colnames(df) = c("region", "value")
#df<-df[!(df$region=="Missing_company_zip"),]
#df<-df[!duplicated(df$region), ]
#instantiate new class with data
c <- GERPLZChoropleth$new(df)
#plot the data
c$ggplot_polygon = geom_polygon(aes(fill = value), color = NA)
c$title = "Comparison of number of Inhabitants per Zipcode in Germany"
c$legend= "Number of Inhabitants per Zipcode"
c$set_num_colors(9)
c$render()
Package sf will make your process easier.
library(tidyverse)
library(sf)
df <- data.frame(region = c("09366", "94130", "02627", "95336", "08525", "92637", "95138", "74177", "08606", "94152"),
value = c(39.5, 519, 5.67, 5.1, 5.08, 1165, 342, 775, 3532, 61.1))
germany_sf <- sf::st_read(dsn = "plz-gebiete.shp") %>%
left_join(df, by = c("plz" = "region"))
germany_sf %>%
ggplot() +
geom_sf(alpha = 0.1, size = 0.1, colour = "gray") +
geom_sf(data = . %>% filter(!is.na(value)), aes(fill = value)) +
scale_fill_viridis_c() +
theme_bw()
For a zoomable/interactive option, use {tmap}, a package that wraps {leaflet} for quick, simple maps.
library(tmap)
tmap_mode("view")
tm_shape(shp = germany_sf) +
tm_polygons(col = "value", border.alpha = 0)
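On the "change them to 0" part of the question: a left join keeps every polygon and leaves NA where there is no data, and the replacement can then be made explicit. The base-R sketch below uses a made-up four-row attribute table in place of the real shapefile; note that filling with 0 asserts a real value of zero, which may or may not be what the map should say.

```r
# Hypothetical stand-in for the attribute table of the shapefile;
# postcodes are kept as character so leading zeros survive
shape_attr <- data.frame(plz = c("09366", "94130", "10115", "20095"),
                         stringsAsFactors = FALSE)

# A subset of the values from the question
df <- data.frame(region = c("09366", "94130"),
                 value = c(39.5, 519),
                 stringsAsFactors = FALSE)

# Left join: all.x = TRUE keeps polygons that have no matching value
joined <- merge(shape_attr, df, by.x = "plz", by.y = "region", all.x = TRUE)

# Option 1: keep NA and let the plotting layer decide how to draw it
n_missing <- sum(is.na(joined$value))

# Option 2: replace NA with 0 in a separate column
joined$value_filled <- ifelse(is.na(joined$value), 0, joined$value)

print(joined)
```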
I've been messing around with the choroplethr package a bit and had this same question. The "aha" moment was learning that the output from the various x_choropleth functions is actually just a ggplot object. This means you can modify them as you would any ggplot graphic. So if you add something like this in your graphic output pipeline I think it might achieve what you're after:
+ scale_fill_distiller(na.value = "white")
Not sure if some of the other things you're doing here would preclude this from working.
Shout out to this write-up: https://statisticaloddsandends.wordpress.com/2019/07/15/looking-at-flood-insurance-claims-with-choroplethr/
I have some authors with their city or country of affiliation. I would like to know if it is possible to plot the coauthors' networks (figure 1), on the map, having the coordinates of the countries. Please consider multiple authors from the same country. [EDIT: Several networks could be generated as in the example and should not show avoidable overlaps]. This is intended for dozens of authors. A zooming option is desirable. Bounty promise +100 for future better answer.
refs5 <- read.table(text="
row bibtype year volume number pages title journal author
Bennett_1995 article 1995 76 <NA> 113--176 angiosperms. \"Annals of Botany\" \"Bennett Md, Leitch Ij\"
Bennett_1997 article 1997 80 2 169--196 estimates. \"Annals of Botany\" \"Bennett MD, Leitch IJ\"
Bennett_1998 article 1998 82 SUPPL.A 121--134 weeds. \"Annals of Botany\" \"Bennett MD, Leitch IJ, Hanson L\"
Bennett_2000 article 2000 82 SUPPL.A 121--134 weeds. \"Annals of Botany\" \"Bennett MD, Someone IJ\"
Leitch_2001 article 2001 83 SUPPL.A 121--134 weeds. \"Annals of Botany\" \"Leitch IJ, Someone IJ\"
New_2002 article 2002 84 SUPPL.A 121--134 weeds. \"Annals of Botany\" \"New IJ, Else IJ\"" , header=TRUE,stringsAsFactors=FALSE)
rownames(refs5) <- refs5[,1]
refs5<-refs5[,2:9]
citations <- as.BibEntry(refs5)
authorsl <- lapply(citations, function(x) as.character(toupper(x$author)))
unique.authorsl<-unique(unlist(authorsl))
coauth.table <- matrix(nrow=length(unique.authorsl),
ncol = length(unique.authorsl),
dimnames = list(unique.authorsl, unique.authorsl), 0)
for(i in 1:length(citations)){
paper.auth <- unlist(authorsl[[i]])
coauth.table[paper.auth,paper.auth] <- coauth.table[paper.auth,paper.auth] + 1
}
coauth.table <- coauth.table[rowSums(coauth.table)>0, colSums(coauth.table)>0]
diag(coauth.table) <- 0
coauthors<-coauth.table
bip = network(coauthors,
matrix.type = "adjacency",
ignore.eval = FALSE,
names.eval = "weights")
authorcountry <- read.table(text="
author country
1 \"LEITCH IJ\" Argentina
2 \"HANSON L\" USA
3 \"BENNETT MD\" Brazil
4 \"SOMEONE IJ\" Brazil
5 \"NEW IJ\" Brazil
6 \"ELSE IJ\" Brazil",header=TRUE,fill=TRUE,stringsAsFactors=FALSE)
matched<- authorcountry$country[match(unique.authorsl, authorcountry$author)]
bip %v% "Country" = matched
colorsmanual<-c("red","darkgray","gainsboro")
names(colorsmanual) <- unique(matched)
gdata<- ggnet2(bip, color = "Country", palette = colorsmanual, legend.position = "right",label = TRUE,
alpha = 0.9, label.size = 3, edge.size="weights",
size="degree", size.legend="Degree Centrality") + theme(legend.box = "horizontal")
gdata
In other words, I want to add the names of authors, lines and bubbles to the map. Note that several authors may be from the same city or country, and they should not overlap.
Figure 1 Network
EDIT: The current JanLauGe answer overlaps two non-related networks. authors "ELSE" and "NEW" need to be apart from others as in figure 1.
Are you looking for a solution using exactly the packages you used, or would you be happy to use a suite of other packages? Below is my approach, in which I extract the graph properties from the network object and plot them on a map using the ggplot2 and maps packages.
First I recreate the example data you gave.
library(tidyverse)
library(sna)
library(maps)
library(ggrepel)
library(GGally) # provides ggnet2()
set.seed(1)
coauthors <- matrix(
c(0,3,1,1,3,0,1,0,1,1,0,0,1,0,0,0),
nrow = 4, ncol = 4,
dimnames = list(c('BENNETT MD', 'LEITCH IJ', 'HANSON L', 'SOMEONE ELSE'),
c('BENNETT MD', 'LEITCH IJ', 'HANSON L', 'SOMEONE ELSE')))
coords <- data_frame(
country = c('Argentina', 'Brazil', 'USA'),
coord_lon = c(-63.61667, -51.92528, -95.71289),
coord_lat = c(-38.41610, -14.23500, 37.09024))
authorcountry <- data_frame(
author = c('LEITCH IJ', 'HANSON L', 'BENNETT MD', 'SOMEONE ELSE'),
country = c('Argentina', 'USA', 'Brazil', 'Brazil'))
Now I generate the graph object using the network() function:
# Generate network
bip <- network(coauthors,
matrix.type = "adjacency",
ignore.eval = FALSE,
names.eval = "weights")
# Graph with ggnet2 for centrality
gdata <- ggnet2(bip, color = "Country", legend.position = "right",label = TRUE,
alpha = 0.9, label.size = 3, edge.size="weights",
size="degree", size.legend="Degree Centrality") + theme(legend.box = "horizontal")
From the network object we can extract the values of each edge, and from the ggnet2 object we can get degree of centrality for nodes as below:
# Combine data
authors <-
# Get author numbers
data_frame(
id = seq(1, nrow(coauthors)),
author = sapply(bip$val, function(x) x$vertex.names)) %>%
left_join(
authorcountry,
by = 'author') %>%
left_join(
coords,
by = 'country') %>%
# Jittering points to avoid overlap between two authors
mutate(
coord_lon = jitter(coord_lon, factor = 1),
coord_lat = jitter(coord_lat, factor = 1))
# Get edges from network
networkdata <- sapply(bip$mel, function(x)
c('id_inl' = x$inl, 'id_outl' = x$outl, 'weight' = x$atl$weights)) %>%
t %>% as_data_frame
dt <- networkdata %>%
left_join(authors, by = c('id_inl' = 'id')) %>%
left_join(authors, by = c('id_outl' = 'id'), suffix = c('.from', '.to')) %>%
left_join(gdata$data %>% select(label, size), by = c('author.from' = 'label')) %>%
mutate(edge_id = seq(1, nrow(.)),
from_author = author.from,
from_coord_lon = coord_lon.from,
from_coord_lat = coord_lat.from,
from_country = country.from,
from_size = size,
to_author = author.to,
to_coord_lon = coord_lon.to,
to_coord_lat = coord_lat.to,
to_country = country.to) %>%
select(edge_id, starts_with('from'), starts_with('to'), weight)
Should look like this now:
dt
# A tibble: 8 × 11
edge_id from_author from_coord_lon from_coord_lat from_country from_size to_author to_coord_lon
<int> <chr> <dbl> <dbl> <chr> <dbl> <chr> <dbl>
1 1 BENNETT MD -51.12756 -16.992729 Brazil 6 LEITCH IJ -65.02949
2 2 BENNETT MD -51.12756 -16.992729 Brazil 6 HANSON L -96.37907
3 3 BENNETT MD -51.12756 -16.992729 Brazil 6 SOMEONE ELSE -52.54160
4 4 LEITCH IJ -65.02949 -35.214117 Argentina 4 BENNETT MD -51.12756
5 5 LEITCH IJ -65.02949 -35.214117 Argentina 4 HANSON L -96.37907
6 6 HANSON L -96.37907 36.252312 USA 4 BENNETT MD -51.12756
7 7 HANSON L -96.37907 36.252312 USA 4 LEITCH IJ -65.02949
8 8 SOMEONE ELSE -52.54160 -9.551913 Brazil 2 BENNETT MD -51.12756
# ... with 3 more variables: to_coord_lat <dbl>, to_country <chr>, weight <dbl>
Now moving on to plotting this data on a map:
world_map <- map_data('world')
myMap <- ggplot() +
# Plot map
geom_map(data = world_map, map = world_map, aes(map_id = region),
color = 'gray85',
fill = 'gray93') +
xlim(c(-120, -20)) + ylim(c(-50, 50)) +
# Plot edges
geom_segment(data = dt,
alpha = 0.5,
color = "dodgerblue1",
aes(x = from_coord_lon, y = from_coord_lat,
xend = to_coord_lon, yend = to_coord_lat,
size = weight)) +
scale_size(range = c(1,3)) +
# Plot nodes
geom_point(data = dt,
aes(x = from_coord_lon,
y = from_coord_lat,
size = from_size,
colour = from_country)) +
# Plot names
geom_text_repel(data = dt %>%
select(from_author,
from_coord_lon,
from_coord_lat) %>%
unique,
colour = 'dodgerblue1',
aes(x = from_coord_lon, y = from_coord_lat, label = from_author)) +
coord_equal() +
theme_bw()
Obviously you can change the colour and design in the usual way with ggplot2 grammar. Notice that you could also use geom_curve and the arrow aesthetic to get a plot similar to the one in the uber post linked in the comments above.
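Since jitter() is random, the spread between co-located authors changes with the seed. As a deterministic alternative (my own substitution, not part of the answer above), authors sharing a country can be placed on a small circle around the country coordinate. The `radius` of 2 degrees is an arbitrary assumption.

```r
# Author table with the country coordinates from the answer already attached
authors <- data.frame(
  author  = c("LEITCH IJ", "HANSON L", "BENNETT MD", "SOMEONE ELSE"),
  country = c("Argentina", "USA", "Brazil", "Brazil"),
  lon     = c(-63.61667, -95.71289, -51.92528, -51.92528),
  lat     = c(-38.41610, 37.09024, -14.23500, -14.23500),
  stringsAsFactors = FALSE
)

radius <- 2  # offset in degrees; an arbitrary choice for illustration

# Index of each author within its country, and the number of authors per country
idx <- seq_along(authors$country)
k <- ave(idx, authors$country, FUN = seq_along)
n <- ave(idx, authors$country, FUN = length)

# Spread authors evenly around a circle; single authors stay on the centroid
angle  <- 2 * pi * (k - 1) / n
offset <- ifelse(n > 1, radius, 0)
authors$plot_lon <- authors$lon + offset * cos(angle)
authors$plot_lat <- authors$lat + offset * sin(angle)

print(authors[, c("author", "plot_lon", "plot_lat")])
```

The plot_lon and plot_lat columns would then take the place of the jittered coord_lon and coord_lat.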
In an effort to avoid overlapping the two networks, I arrived at this modification, which reuses the x and y coordinates that ggnet2 assigns to the nodes. That default layout does not overlap the networks (see figure 1 in the question).
# get centroid positions for countries
# add coordinates to authorcountry table
# download and unzip
# https://worldmap.harvard.edu/data/geonode:country_centroids_az8
setwd("~/country_centroids_az8")
library(rgdal)
cent <- readOGR('.', "country_centroids_az8", stringsAsFactors = F)
countrycentdf<-cent@data[,c("name","Longitude","Latitude")]
countrycentdf$name[which(countrycentdf$name=="United States")]<-"USA"
colnames(countrycentdf)[names(countrycentdf)=="name"]<-"country"
authorcountry$Longitude<-countrycentdf$Longitude[match(authorcountry$country,countrycentdf$country)]
authorcountry$Latitude <-countrycentdf$Latitude [match(authorcountry$country,countrycentdf$country)]
# original coordinates of plot and its transformation
ggnetbuild<-ggplot_build(gdata)
allcoord<-ggnetbuild$data[[3]][,c("x","y","label")]
allcoord$Latitude<-authorcountry$Latitude [match(allcoord$label,authorcountry$author)]
allcoord$Longitude<-authorcountry$Longitude [match(allcoord$label,authorcountry$author)]
allcoord$country<-authorcountry$country [match(allcoord$label,authorcountry$author)]
# increase with factor the distance among dots
factor<-7
allcoord$coord_lat<-allcoord$y*factor+allcoord$Latitude
allcoord$coord_lon<-allcoord$x*factor+allcoord$Longitude
allcoord$author<-allcoord$label
# plot as in answer of JanLauGe, without jitter
library(tidyverse)
library(ggrepel)
authors <-
# Get author numbers
data_frame(
id = seq(1, nrow(coauthors)),
author = sapply(bip$val, function(x) x$vertex.names)) %>%
left_join(
allcoord,
by = 'author')
# Continue as in answer of JanLauGe
networkdata <- ##
dt <- ##
world_map <- map_data('world')
myMap <- ##
myMap
I am struggling to get my first map to work. I have read every document I could find but I am not able to pull it all together to view my data on a map.
This is what I have done so far.
1. I created a very basic data table with 3 observations and 5 variables as a very simple starting point.
str(Datawithlatlongnotvector)
'data.frame': 3 obs. of 5 variables:
$ Client: Factor w/ 3 levels "Jan","Piet","Susan": 2 1 3
$ Sales : int 100 1000 15000
$ Lat : num 26.2 33.9 23.9
$ Lon : num 28 18.4 29.4
$ Area : Factor w/ 3 levels "Gauteng","Limpopo",..: 1 3 2
(the Area is the provinces of South Africa and also is as per the SHP file that I downloaded, see below)
I downloaded a map of South Africa from http://www.mapmakerdata.co.uk.s3-website-eu-west-1.amazonaws.com/library/stacks/Africa/South%20Africa/index.htm (selecting "Simple base map") and placed all 3 files (.dbf, .shp and .shx) in the same directory. This had caused an earlier error, which I resolved thanks to another user's question.
I created a map as follows :
SAMap <- readOGR(dsn = ".", layer = "SOU-level_1")
and I can plot the map of the country showing the provinces with plot(SAMap)
I can also plot the data
plot(datawithlatlong)
I saw the instructions how to make a SpatialPointsData frame and I did that :
coordinates(Datawithlatlong) = ~Lat + Lon
I do not know how to pull it all together and do the following :
Show the data (100,1000 and 15000) on the map with different colours i.e. between 1 and 500 is one colour, between 501 and 10 000 is one colour and above 10 000 is one colour.
Maybe trying ggplot2 with some function like:
map = ggplot(df, aes(long, lat, fill = Sales_cat)) + scale_fill_brewer(type = "seq", palette = "Oranges", name = "Sales") + geom_polygon()
With scale_fill_brewer you can represent scales in terms of colours on the map. You should create a factor variable that represents categories according to the range of sales ("Sales_cat"). In any case, the shape file must be transformed into a data.frame.
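One way to build such a category variable is cut(). The sketch below uses the three sales figures and the three ranges from the question; the label strings are my own wording.

```r
# Sales figures from the question
sales <- c(100, 1000, 15000)

# Breaks for: 1 to 500, 501 to 10 000, above 10 000.
# Intervals are right-closed by default, so 500 falls in the first class.
Sales_cat <- cut(sales,
                 breaks = c(0, 500, 10000, Inf),
                 labels = c("1-500", "501-10000", "above 10000"))

print(table(Sales_cat))
```

The resulting factor can be mapped to fill or colour and passed through scale_fill_brewer or scale_colour_brewer.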
Try this, with 'SAMap' as the country shapefile and 'datawithlatlong' as your data converted to a SpatialPointsDataFrame:
library(maptools)
library(classInt)
library(RColorBrewer)
# Prepare colour palette
plotclr <- brewer.pal(3,"PuRd")
class<-classIntervals(datawithlatlong@data$Sales, n=3, style="fixed", fixedBreaks=c(0, 500,1000,10000)) # you can adjust the intervals here
colcode <- findColours(class, plotclr)
# Plot country map
plot(SAMap,xlim=c(16, 38.0), ylim=c(-46,-23))# plot your polygon shapefile with appropriate xlim and ylim (extent)
# Plot dataframe converted to SPDF (in your step 5)
plot(datawithlatlong, col=colcode, add=T,pch=19)
# Creating the legend
legend(16.2, -42, legend=names(attr(colcode, "table")), fill=attr(colcode, "palette"), cex=0.6, bty="n") # adjust the x and y for fixing appropriate location for the legend
I generated a bigger dataset because I think that with only 3 points it is hard to see how things are working.
library(rgdal)
library(tmap)
library(ggmap)
library(randomNames)
#I downloaded the shapefile with the administrative area polygons
map <- readOGR(dsn = ".", layer = "SOU")
#the coordinate system is not part of the loaded object hence I added this information
proj4string(map) <- CRS("+init=epsg:4326")
# Some sample data with random client names and random region
ADM2 <- sample(map@data$ADM2, replace = TRUE, 50)
name <- randomNames(50)
sales <- sample(0:5000, 50)
clientData <- data.frame(id = 1:50, name, region = as.character(ADM2), sales,
stringsAsFactors = FALSE)
#In order to add the geoinformation for each client I used the awesome
#function `ggmap::geocode` which takes a character string as input and
#provides the lon and lat for the region, city ...
geoinfo <- geocode(clientData$region, messaging = FALSE)
# Use this information to build a Point layer
clientData_point <- SpatialPointsDataFrame(geoinfo, data = clientData)
proj4string(clientData_point) <- CRS("+init=epsg:4326")
Now the part I hope that answers the question:
# Adding all sales which occurred in one region
# If there are 3 clients in one region, the sales of the three are
# summed up and returned in a new layer
sales_map <- aggregate(x = clientData_point[ ,4], by = map, FUN = sum)
# Building a map using the `tmap` package
tm_shape(sales_map) + tm_polygons(col = "sales")
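The aggregation idea can be tried without any spatial objects. The sketch below is the plain data-frame analogue, grouping by a region name instead of by polygon, with three made-up client rows; it is not the spatial aggregate() method used above.

```r
# Made-up client data: two clients in one region, one in another
clientData <- data.frame(
  region = c("Gauteng", "Gauteng", "Limpopo"),
  sales  = c(100, 250, 1000),
  stringsAsFactors = FALSE
)

# Sum the sales of all clients that share a region
sales_by_region <- aggregate(sales ~ region, data = clientData, FUN = sum)

print(sales_by_region)
```

A table like this could be merged onto the polygon attributes by region name when the clients already carry a region label and no point-in-polygon step is needed.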
Edit:
Here is a ggplot2 solution because it seems you want to stick with it.
First, for ggplot you have to transform your SpatialPolygonsDataFrame to an ordinary data.frame. Fortunately, broom::tidy() will do the job automatically.
Second, your Lat values are missing a -. I added it.
Third, I renamed your objects for less typing.
point_layer<- structure(list(Client = structure(c(2L, 1L, 3L),
.Label = c("Jan", "Piet", "Susan"),
class = "factor"),
Sales = c(100, 1000, 15000 ),
Lat = c(-26.2041, -33.9249, -23.8962),
Lon = c(28.0473, 18.4241, 29.4486),
Area = structure(c(1L, 3L, 2L),
.Label = c("Gauteng", "Limpopo", "Western Cape"),
class = "factor"),
Sale_range = structure(c(1L, 2L, 4L),
.Label = c("(1,500]", "(500,2e+03]", "(2e+03,5e+03]", "(5e+03,5e+04]"),
class = "factor")),
.Names = c("Client", "Sales", "Lat", "Lon", "Area", "Sale_range"),
row.names = c(NA, -3L), class = "data.frame")
point_layer$Sale_range <- cut(point_layer$Sales, c(1,500.0,2000.0,5000.0,50000.0 ))
library(broom)
library(ggplot2)
ggplot_map <- tidy(map)
ggplot() + geom_polygon(ggplot_map, mapping = aes(x = long, y = lat, group = group),
fill = "grey65", color = "black") +
geom_point(point_layer, mapping = aes(x = Lon, y = Lat, col = Sale_range)) +
scale_colour_brewer(type = "seq", palette = "Oranges", direction = 1)
I want to generate a map of India in R. I have five indicators with different values of every state. I want to plot bubbles with five different colors, and their size should represent their intensity in every state. For example:
State        A    B    C    D    E
Kerala      39    5   34   29   11
Bihar        6   54   13   63   81
Assam       55  498   89   15   48
Chandigarh  66   11   44   33   71
I have gone through some links related to my problem:
[1] http://www.r-bloggers.com/nrega-and-indian-maps-in-r/
[2] An R package for India?
But these links could not serve my purpose. Any help in this direction would be greatly appreciated.
I have also tried
library(mapproj)
map(database= "world", regions = "India", exact=T, col="grey80", fill=TRUE, projection="gilbert", orientation= c(90,0,90))
lat <- c(23.30, 28.38)
lon <- c(80, 77.12) # Lon and Lat for two cities Bhopal and Delhi
coord <- mapproject(lon, lat, proj="gilbert", orientation=c(90, 0, 90))
points(coord, pch=20, cex=1.2, col="red")
In a nutshell, the problems are:
(1) It does not give me a plot at district level, or even the boundaries of the states.
(2) How can I draw bubbles or dots for my data on this plot if I only have the names of the locations and the corresponding values?
(3) Can this be done easily with library(RgoogleMaps) or library(ggplot2)? (Just a guess; I do not know much about these packages.)
As @lawyeR states, a choropleth (or thematic) map is more commonly used to represent variables on a map. This would require you to produce one map per variable. Let me take you through an example:
require("rgdal") # needed to load shapefiles
# obtain India administrative shapefiles and unzip
download.file("http://biogeo.ucdavis.edu/data/diva/adm/IND_adm.zip",
destfile = "IND_adm.zip")
unzip("IND_adm.zip", overwrite = TRUE)
# load shapefiles
india <- readOGR(dsn = "shapes/", "IND_adm1")
# check they've loaded correctly with a plot
plot(india)
# all fine. Let's plot an example variable using ggplot2
require("ggplot2")
require("rgeos") # for fortify() with SpatialPolygonsDataFrame types
india@data$test <- sample(65000:200000000, size = nrow(india@data),
replace = TRUE)
# breaks the shapefile down to points for compatibility with ggplot2
indiaF <- fortify(india, region = "ID_1")
indiaF <- merge(indiaF, india, by.x = "id", by.y = "ID_1")
# plots the polygon and fills them with the value of 'test'
ggplot() +
geom_polygon(data = indiaF, aes(x = long, y = lat, group = group,
fill = test)) +
coord_equal()
Finally, I notice you asked the same question on GIS SE. This is considered bad practice and is generally frowned upon, so I've flagged that question to be closed as a duplicate of this. As a general rule of thumb try not to create duplicates.
Good luck!
Once you have the shapefile for India, you need to create a choropleth. That will take the shapefile map and color each state in India on a gradient that reflects your data. You may want to create a panel of five plots, each one showing India and its states colored according to one of your five variables.
For others who can push this answer farther, here is the dput of the data frame, after a bit of cleaning.
dput(df)
structure(list(State = c("Kerala", "Bihar", "Assam", "Chandigarh"
), A = c("39", "6", "55", "66"), B = c("5", "54", "498", "11"
), C = c("34", "13", "89", "44"), D = c("29", "63", "15", "33"
), E = c("11", "81", "48", "71")), .Names = c("State", "A", "B",
"C", "D", "E"), row.names = c("Kerala", "Bihar", "Assam", "Chandigarh"
), class = "data.frame")
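For a panel of five plots, the data usually needs to be in long form, with one row per state and indicator. Note that the dput above stores the indicators as character strings, so they have to be converted to numbers first. A base-R sketch, in which the column names `indicator` and `value` are my own choices:

```r
# The data frame from the dput above (indicators stored as character)
df <- data.frame(
  State = c("Kerala", "Bihar", "Assam", "Chandigarh"),
  A = c("39", "6", "55", "66"),
  B = c("5", "54", "498", "11"),
  C = c("34", "13", "89", "44"),
  D = c("29", "63", "15", "33"),
  E = c("11", "81", "48", "71"),
  stringsAsFactors = FALSE
)

# Convert the five indicator columns to numeric
indicators <- c("A", "B", "C", "D", "E")
df[indicators] <- lapply(df[indicators], as.numeric)

# Wide to long: one row per State and indicator
long <- reshape(df, direction = "long",
                varying = indicators, v.names = "value",
                timevar = "indicator", times = indicators,
                idvar = "State")
rownames(long) <- NULL

print(head(long))
```

After merging `long` onto the fortified state polygons by state name, a ggplot2 call could use facet_wrap(~ indicator) to draw one map per variable.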