This may be a wish-list item, I'm not sure (i.e., maybe a geom_pie would need to be created for this to work). I saw a map today (LINK) with pie charts on it, as seen here.
I don't want to debate the merits of pie charts; this is more of an exercise in whether I can do this in ggplot.
I have provided a data set below (loaded from my Dropbox) with the mapping data for a New York State map and some purely fabricated data on racial percentages by county. The racial makeup is given both merged into the main data set and as a separate data set called key. I also think Bryan Goodrich's response to me in another post (HERE) on centering county names will be helpful here.
How can we make the map above with ggplot2?
A data set and the map without the pie graphs:
load(url("http://dl.dropbox.com/u/61803503/nycounty.RData"))
head(ny); head(key) #view the data set from my drop box
library(ggplot2)
ggplot(ny, aes(long, lat, group=group)) + geom_polygon(colour='black', fill=NA)
# Now how can we plot a pie chart of race on each county
# (sizing of the pie would also be controllable via a size
# parameter like other `geom_` functions).
Thanks in advance for your ideas.
EDIT: I just saw another case at junkcharts that screams for this type of capability:
Three years later, this is solved. I've combined several techniques, and thanks to @Guangchuang Yu's excellent ggtree package this can be done fairly easily. Note that as of 9/3/2015 you need version 1.0.18 of ggtree installed, but these changes will eventually trickle down to the respective repositories.
I've used the following resources to make this (the links will give greater detail):
ggtree blog
move ggplot legend
correct ggtree version
centering things in polygons
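The "centering things in polygons" step relies on sp's label point for each county. For readers who want to see what that point is, here is a minimal base R sketch of the same idea using the area-weighted (shoelace) centroid of a single ring. `ring_centroid()` is a hypothetical helper, not part of sp or ggtree, and its result should be close to, but is not guaranteed to match, `Polygon()@labpt`.

```r
# Hypothetical helper: area-weighted centroid of one closed ring (shoelace formula).
# long/lat are the ring's vertices in order; repeating the first vertex at the end is harmless.
ring_centroid <- function(long, lat) {
  n <- length(long)
  nxt <- c(2:n, 1)                       # index of the next vertex, wrapping around
  cross <- long * lat[nxt] - long[nxt] * lat
  area <- sum(cross) / 2                 # signed area (negative for clockwise rings)
  c(long = sum((long + long[nxt]) * cross) / (6 * area),
    lat  = sum((lat + lat[nxt]) * cross) / (6 * area))
}

ring_centroid(c(0, 1, 1, 0), c(0, 0, 1, 1))  # unit square: long 0.5, lat 0.5

# Assumed usage with the county outlines from map_data('county', 'new york');
# counties made of several pieces should be reduced to their largest piece first.
# centroids <- do.call(rbind, by(df, df$subregion,
#                                function(d) ring_centroid(d$long, d$lat)))
```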
Here's the code:
load(url("http://dl.dropbox.com/u/61803503/nycounty.RData"))
head(ny); head(key) #view the data set from my drop box
if (!require("pacman")) install.packages("pacman")
p_load(ggplot2, ggtree, dplyr, tidyr, sp, maps, pipeR, grid, XML, gtable)
getLabelPoint <- function(county) {Polygon(county[c('long', 'lat')])@labpt} # label point (centroid) of a county outline
df <- map_data('county', 'new york') # NY region county data
centroids <- by(df, df$subregion, getLabelPoint) # Returns list
centroids <- do.call("rbind.data.frame", centroids) # Convert to Data Frame
names(centroids) <- c('long', 'lat') # Appropriate Header
pops <- "http://data.newsday.com/long-island/data/census/county-population-estimates-2012/" %>%
readHTMLTable(which=1) %>%
tbl_df() %>%
select(1:2) %>%
setNames(c("region", "population")) %>%
mutate(
population = {as.numeric(gsub("\\D", "", population))},
region = tolower(gsub("\\s+[Cc]ounty|\\.", "", region)),
#weight = ((1 - (1/(1 + exp(population/sum(population)))))/11)
weight = exp(population/sum(population)),
weight = sqrt(weight/sum(weight))/3
)
race_data_long <- add_rownames(centroids, "region") %>>%
left_join({distinct(select(ny, region:other))}) %>>%
left_join(pops) %>>%
(~ race_data) %>>%
gather(race, prop, white:other) %>%
split(., .$region)
pies <- setNames(lapply(1:length(race_data_long), function(i){
ggplot(race_data_long[[i]], aes(x=1, prop, fill=race)) +
geom_bar(stat="identity", width=1) +
coord_polar(theta="y") +
theme_tree() +
xlab(NULL) +
ylab(NULL) +
theme_transparent() +
theme(plot.margin=unit(c(0,0,0,0),"mm"))
}), names(race_data_long))
e1 <- ggplot(race_data_long[[1]], aes(x=1, prop, fill=race)) +
geom_bar(stat="identity", width=1) +
coord_polar(theta="y")
leg1 <- gtable_filter(ggplot_gtable(ggplot_build(e1)), "guide-box")
p <- ggplot(ny, aes(long, lat, group=group)) +
geom_polygon(colour='black', fill=NA) +
theme_bw() +
annotation_custom(grob = leg1, xmin = -77.5, xmax = -78.5, ymin = 44, ymax = 45)
n <- length(pies)
for (i in 1:n) {
nms <- names(pies)[i]
dat <- race_data[which(race_data$region == nms)[1], ]
p <- subview(p, pies[[i]], x=unlist(dat[["long"]])[1], y=unlist(dat[["lat"]])[1], width=dat[["weight"]], height=dat[["weight"]])
}
print(p)
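The pie sizes above come from the `weight` column built in the `mutate()` call. A quick base R check of that formula on made-up populations (not the Newsday figures) shows how strongly it compresses differences: because each population share lies between 0 and 1, the largest weight can be at most sqrt(e), about 1.65 times, the smallest.

```r
# Made-up county populations, for illustration only
population <- c(50000, 200000, 1500000, 2500000)

# Same two steps as in the mutate() call above
weight <- exp(population / sum(population))
weight <- sqrt(weight / sum(weight)) / 3

round(weight, 3)
max(weight) / min(weight)  # bounded above by sqrt(exp(1))
```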
This functionality should be in ggplot, and I think it is coming to ggplot soonish, but it is currently available in base graphics. I'm posting this just for comparison's sake.
load(url("http://dl.dropbox.com/u/61803503/nycounty.RData"))
library(plotrix)
e=10^-5
myglyff=function(gi) {
floating.pie(mean(gi$long),
mean(gi$lat),
x=c(gi[1,"white"]+e,
gi[1,"black"]+e,
gi[1,"hispanic"]+e,
gi[1,"asian"]+e,
gi[1,"other"]+e),
radius=.1) #insert size variable here
}
g1=ny[which(ny$group==1),]
plot(g1$long,
g1$lat,
type='l',
xlim=c(-80,-71.5),
ylim=c(40.5,45.1))
myglyff(g1)
for(i in 2:62)
{gi=ny[which(ny$group==i),]
lines(gi$long,gi$lat)
myglyff(gi)
}
There are probably more elegant ways of doing this in base graphics.
As you can see, quite a few problems remain to be solved: the counties have no fill color, the pie charts tend to be too small or to overlap, and latitude and longitude are plotted without a projection, so the county shapes are distorted.
In any event, I am interested in what others can come up with.
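Two of these problems can be eased in base graphics. The sketch below assumes a hypothetical size variable (for example county population) and scales each radius so that pie area rather than radius is proportional to it, with a cap `max_radius` (in degrees) chosen by hand to limit overlap. It also computes an approximate aspect ratio, 1/cos(mean latitude), which can be passed to `plot(..., asp = ...)` so that a degree of longitude is drawn shorter than a degree of latitude. This is a simple equirectangular approximation, not a true map projection.

```r
# Hypothetical helper: radius proportional to sqrt(size), so area is proportional to size
pie_radius <- function(size_var, max_radius = 0.2) {
  max_radius * sqrt(size_var / max(size_var))
}

r <- pie_radius(c(100, 400, 1600), max_radius = 0.2)
r  # 0.05 0.10 0.20

# Approximate aspect correction for the latitude range used in plot() above
ny_asp <- 1 / cos(mean(c(40.5, 45.1)) * pi / 180)

# Assumed usage with the code above:
# plot(g1$long, g1$lat, type = 'l', xlim = c(-80, -71.5), ylim = c(40.5, 45.1), asp = ny_asp)
# floating.pie(..., radius = r[i])  # instead of the fixed radius = .1
```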
I've written some code to do this using grid graphics. There is an example here: https://qdrsite.wordpress.com/2016/06/26/pies-on-a-map/
The goal here was to associate the pie charts with specific points on the map, and not necessarily regions. For this particular solution, it is necessary to convert the map coordinates (latitude and longitude) to a (0,1) scale so they can be plotted in the proper locations on the map. The grid package is used to print to the viewport that contains the plot panel.
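The conversion described here is a linear rescale against the panel limits; for the limits used below it amounts to (decimalLongitude + 125)/60 and (decimalLatitude - 25)/25. A small sketch with a hypothetical `to_unit()` helper follows. One untested caveat: `coord_map()` defaults to a Mercator projection, which stretches the y axis nonlinearly, so rescaling the Mercator-transformed latitude may line the pies up more accurately than the linear version, especially away from the vertical centre of the panel.

```r
# Hypothetical helper: position of v within the panel limits, on a 0-1 scale
to_unit <- function(v, lims) (v - lims[1]) / diff(lims)

x <- to_unit(-95, c(-125, -65))  # 0.5, same as (-95 + 125) / 60
y <- to_unit(40, c(25, 50))      # 0.6, same as (40 - 25) / 25

# Assumption: if the panel uses a Mercator y axis (coord_map()'s default projection),
# rescale the projected latitude instead of the raw latitude
merc_y <- function(lat) log(tan(pi / 4 + lat * pi / 360))
y_merc <- to_unit(merc_y(40), merc_y(c(25, 50)))  # about 0.557, lower than the linear 0.6
```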
Code:
# Pies On A Map
# Demonstration script
# By QDR
# Uses NLCD land cover data for different sites in the National Ecological Observatory Network.
# Each site consists of a number of different plots, and each plot has its own land cover classification.
# On a US map, plot a pie chart at the location of each site with the proportion of plots at that site within each land cover class.
# For this demo script, I've hard-coded the color scale and included the data as a CSV linked from Dropbox.
# Custom color scale (taken from the official NLCD legend)
nlcdcolors <- structure(c("#7F7F7F", "#FFB3CC", "#00B200", "#00FFFF", "#006600", "#E5CC99", "#00B2B2", "#FFFF00", "#B2B200", "#80FFCC"), .Names = c("unknown", "cultivatedCrops", "deciduousForest", "emergentHerbaceousWetlands", "evergreenForest", "grasslandHerbaceous", "mixedForest", "pastureHay", "shrubScrub", "woodyWetlands"))
# NLCD data for the NEON plots
nlcdtable_long <- read.csv(file='https://www.dropbox.com/s/x95p4dvoegfspax/demo_nlcdneon.csv?raw=1', row.names=NULL, stringsAsFactors=FALSE)
library(ggplot2)
library(plyr)
library(grid)
# Create a blank state map. The geom_tile() is included because it allows a legend for all the pie charts to be printed, although it does not
statemap <- ggplot(nlcdtable_long, aes(decimalLongitude,decimalLatitude,fill=nlcdClass)) +
geom_tile() +
borders('state', fill='beige') + coord_map() +
scale_x_continuous(limits=c(-125,-65), expand=c(0,0), name = 'Longitude') +
scale_y_continuous(limits=c(25, 50), expand=c(0,0), name = 'Latitude') +
scale_fill_manual(values = nlcdcolors, name = 'NLCD Classification')
# Create a list of ggplot objects. Each one is the pie chart for each site with all labels removed.
pies <- dlply(nlcdtable_long, .(siteID), function(z)
ggplot(z, aes(x=factor(1), y=prop_plots, fill=nlcdClass)) +
geom_bar(stat='identity', width=1) +
coord_polar(theta='y') +
scale_fill_manual(values = nlcdcolors) +
theme(axis.line=element_blank(),
axis.text.x=element_blank(),
axis.text.y=element_blank(),
axis.ticks=element_blank(),
axis.title.x=element_blank(),
axis.title.y=element_blank(),
legend.position="none",
panel.background=element_blank(),
panel.border=element_blank(),
panel.grid.major=element_blank(),
panel.grid.minor=element_blank(),
plot.background=element_blank()))
# Use the latitude and longitude maxima and minima from the map to calculate the coordinates of each site location on a scale of 0 to 1, within the map panel.
piecoords <- ddply(nlcdtable_long, .(siteID), function(x) with(x, data.frame(
siteID = siteID[1],
x = (decimalLongitude[1]+125)/60,
y = (decimalLatitude[1]-25)/25
)))
# Print the state map.
statemap
# Use a function from the grid package to move into the viewport that contains the plot panel, so that we can plot the individual pies in their correct locations on the map.
# The panel viewport name depends on the ggplot2 version and plot layout; if 'panel.3-4-3-4' is not found, list the available names with current.vpTree().
downViewport('panel.3-4-3-4')
# Here is the fun part: loop through the pies list. At each iteration, print the ggplot object at the correct location on the viewport. The y coordinate is shifted by half the height of the pie (set at 12% of the height of the map) so that the pie will be centered at the correct coordinate.
for (i in 1:length(pies))
print(pies[[i]], vp=dataViewport(xData=c(-125,-65), yData=c(25,50), clip='off',xscale = c(-125,-65), yscale=c(25,50), x=piecoords$x[i], y=piecoords$y[i]-.06, height=.12, width=.12))
The result looks like this:
I stumbled upon what looks like a function to do this: "add.pie" in the "mapplots" package.
The example from the package is below.
library(mapplots)
plot(NA, NA, xlim=c(-1,1), ylim=c(-1,1))
add.pie(z=rpois(6,10), x=-0.5, y=0.5, radius=0.5)
add.pie(z=rpois(4,10), x=0.5, y=-0.5, radius=0.3)
A slight variation on the OP's original requirements, but this seems like an appropriate answer/update.
If you want an interactive Google Map, as of googleway v2.6.0 you can add charts inside the info windows of map layers.
See ?googleway::google_charts for documentation and examples.
library(googleway)
set_key("GOOGLE_MAP_KEY")
## create some dummy chart data (tram_stops is a data set included with googleway)
markerCharts <- data.frame(stop_id = rep(tram_stops$stop_id, each = 3))
markerCharts$variable <- c("yes", "no", "maybe")
markerCharts$value <- sample(1:10, size = nrow(markerCharts), replace = TRUE)
chartList <- list(
data = markerCharts
, type = 'pie'
, options = list(
title = "my pie"
, is3D = TRUE
, height = 240
, width = 240
, colors = c('#440154', '#21908C', '#FDE725')
)
)
google_map() %>%
add_markers(
data = tram_stops
, id = "stop_id"
, info_window = chartList
)
I want to set a custom shape, size, and color for points added to a map, based on a variable called 'Dataset'. I can set the color of the points if I use the same shape for all of them, but I'm hoping for a map that carries a little more information. When I run this code, all the points are black circles. What am I missing?
Thanks everyone for your help & time!!
Here's a reproducible example:
# Read in libraries
library(ggplot2)
library(maps)
library(maptools)
library(ggmap)
# Create mapping objects
world <- map_data("world2")
world$long <- world$long
state_dat <- map_data("state")
canada <- world[world$region==c("Canada"),]
map_dat <- rbind(state_dat, canada)
# Create custom shapes, sizes, colors
pt_colors=c("red", "blue", "grey", "green")
shapes = c(120, 22, 24, 21)
shape_size = c(1.1, 0.8, 1, 1)
# Create lat/long dataframe
xy <- data.frame(Dataset=c("GBIF","Flower","GBIF","Leaf","DNA","GBIF","GBIF","Leaf","GBIF","GBIF","DNA","GBIF","DNA","GBIF","GBIF","Leaf","GBIF","GBIF","GBIF","DNA"),
lat=c(38.89450,34.45300,39.86556,30.38818,28.74590,33.78527,41.23439,30.37935,41.38250,40.60648,30.87580,40.56425,28.75000,41.52666,35.46451,30.73621,38.50221,33.70335,38.98000,29.61100),
long=c(-77.06292,-84.22643,-79.50248,-84.64519,-81.47860,-84.37109,-81.46374,-86.17667,-72.10861,-74.53538,-84.41520,-74.86654,-81.47750,-73.15833,-78.89952,-86.73095,-78.40308,-86.70289,-77.03917,-81.78740)
)
# Create base map
p0 <- ggplot() +
geom_polygon(data=map_dat,aes(x=long,y=lat,group=group, fill=region),fill="white",color="black", show.legend=FALSE)+
coord_map("gilbert",xlim=c(-60,-97),ylim=c(15,47.5)) +#mollweide is pretty good
labs(x=expression("Longitude"*~degree*W), y=expression("Latitude"*~degree*N)) +
theme(panel.border = element_rect(colour = "black", fill=NA, size=1),
plot.margin=unit(c(0.25,0.25,0.25,0.25),'inches'),
legend.position='none') +
theme(rect = element_blank())
# Add points to the map
p1 <- p0 +
geom_point(data=xy,aes(x=long,y=lat,fill=Dataset)) +
scale_color_manual(values=pt_colors) +
scale_shape_manual(values=shapes) +
scale_size_manual(values=shape_size)
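You need to map colour, shape, and size inside the aes() of geom_point(). The question maps only fill, which geom_point() uses only for the filled shapes 21 to 25; most shapes are drawn with colour. Because nothing is mapped to colour, shape, or size, the three scale_*_manual() calls have nothing to act on.
Fixing the mapping generates what you want: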
p1 <- p0 +
geom_point(data=xy,aes(x=long,y=lat,colour = Dataset, shape = Dataset, size = Dataset)) +
scale_color_manual(values=pt_colors) +
scale_shape_manual(values=shapes) +
scale_size_manual(values=shape_size)
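One more detail worth checking: when `values` in a `scale_*_manual()` call is an unnamed vector, ggplot2 pairs the values with the levels of Dataset in level order, which for a character column is alphabetical (DNA, Flower, GBIF, Leaf), not the order in which the datasets first appear. Naming the vectors makes the pairing explicit. The assignment of datasets to colours, shapes, and sizes below is hypothetical; adjust it to what you intended.

```r
datasets <- c("GBIF","Flower","GBIF","Leaf","DNA","GBIF","GBIF","Leaf","GBIF","GBIF",
              "DNA","GBIF","DNA","GBIF","GBIF","Leaf","GBIF","GBIF","GBIF","DNA")

# Order in which unnamed manual values are used
levels(factor(datasets))  # "DNA" "Flower" "GBIF" "Leaf"

# Named vectors tie each value to a Dataset level (this pairing is hypothetical).
# Shape 120 is the ASCII code for 'x', so GBIF points are drawn as the letter x.
pt_colors  <- c(GBIF = "red", Flower = "blue", Leaf = "grey", DNA = "green")
shapes     <- c(GBIF = 120,   Flower = 22,     Leaf = 24,     DNA = 21)
shape_size <- c(GBIF = 1.1,   Flower = 0.8,    Leaf = 1,      DNA = 1)

# The value each point will receive
head(data.frame(Dataset = datasets,
                colour = unname(pt_colors[datasets]),
                shape = unname(shapes[datasets])))

# These vectors can be passed unchanged to the answer's code, e.g.
# scale_color_manual(values = pt_colors) + scale_shape_manual(values = shapes) +
# scale_size_manual(values = shape_size)
```

Shapes 21, 22, and 24 take their interior colour from fill, so with only colour mapped they should appear as coloured outlines; additionally mapping `fill = Dataset` with `scale_fill_manual(values = pt_colors)` would fill them as well.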
I am having trouble separating the legends in a ggplot2 graph with multiple layers. My plot fills municipalities according to the number of textile companies present there, and I also plot the plant locations with geom_point(). My guess is that I need override.aes somehow, but I haven't managed to make it work. The solutions I have read do not deal with a different variable being mapped in the aes() of geom_point().
If you want to test the code below, you can download the shapefile for the Brazilian municipalities here, read it with readOGR() and fortify(), fill the municipalities with whatever variable you prefer, and create arbitrary random points within Brazil for geom_point() as separate variables, such as lat_plant and long_plant below. The region column below identifies Brazilian regions; in this case, "1" is the northern region of Brazil.
The Code
#setting the ggplot
library(ggplot2)
gg2 <- ggplot(data = out[out$region == "1", ],
aes(x = long, y = lat, group = group, fill = as.factor(companies))) +
geom_polygon() +
ggtitle("title") +
scale_fill_discrete(name = "Number of Textile Companies") +
theme(plot.title = element_text(size = 30, face = "bold")) +
theme(legend.text = element_text(size = 12),
legend.title = element_text(colour = "blue", size = 16, face = "bold"))
#graph output
gg2 +
geom_point(data = out[out$region =="1",], aes(x = long_plant, y = lat_plant), color = "red")
What I get as the legend is this:
I would like to separate it, so that the legend shows the dots as plant locations and the colors as the fill for the number of textile companies in the region.
Here is another option. hmgeiger treated the number of textile companies as a factor; I instead treated it as a continuous variable. Since there is no reproducible data, I created sample data myself: random points drawn from Brazil's longitude and latitude range, keeping only those that fall inside Brazil. whatever2 contains the points inside Brazil. I also added a column called `Factory location`, a dummy variable that gives the points a colour mapping in the final graphic. hmgeiger created Dummy.var containing text; I left "" in this column instead, since you may not want to see any text in the legend.
For your legend issue, as Antonio mentioned and hmgeiger did, you need to map color in the aes() of geom_point(). That solves it. I also went a bit further: if you do not know how many factories exist in each polygon, you need to count them. I did this with poly.counts() from the GISTools package and created another data frame that contains the number of factories in each polygon.
The map has three layers. The first geom_cartogram() draws the polygons, the second fills them with colours (both geom_cartogram() layers come from the ggalt package), and geom_point() adds the factory locations. The key thing is that you need a common key column for map_id: id in the first geom_cartogram() and ind in the second geom_cartogram() contain the same information. In geom_point() you need color in aes(). The legend then has a continuous bar for the number of factories and a single dot for factory location, with no text next to it, which I think keeps the legend tidy.
library(raster)
library(tidyverse)
library(GISTools)
library(RColorBrewer)
library(ggalt)
library(ggthemes)
# Get polygon data for Brazil
brazil <- getData("GADM", country = "brazil", level = 1)
mymap <- fortify(brazil)
# Create dummy data staying in the polygons
# For more information: https://stackoverflow.com/questions/47696382/removing-data-outside-country-map-boundary-in-r/47699405#47699405
set.seed(123)
mydata <- data.frame(long = runif(200, min = quantile(mymap$long)[1], max = quantile(mymap$long)[4]),
lat = runif(200, min = quantile(mymap$lat)[1], max = quantile(mymap$lat)[4]),
factory = paste("factory ", 1:200, sep = ""),
stringsAsFactors = FALSE)
spdf <- SpatialPointsDataFrame(coords = mydata[, c("long", "lat")], data = mydata,
proj4string = CRS("+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"))
whatever <- spdf[!is.na(over(spdf, as(brazil, "SpatialPolygons"))), ]
whatever2 <- as.data.frame(whatever) %>%
mutate(`Factory location` = "")
# Now I check how many data points (factories) exist in each polygon
# and create a data frame
factory.num <- poly.counts(pts = whatever, polys = brazil)
factory.num <- stack(factory.num)
ggplot() +
geom_cartogram(data = mymap, aes(x = long, y = lat, map_id = id),
map = mymap) +
geom_cartogram(data = factory.num, aes(fill = values, map_id = ind),
map = mymap) +
geom_point(data = whatever2, aes(x = long, y = lat, color = `Factory location`)) +
scale_fill_gradientn(name = "Number of factories", colours = brewer.pal(5, "Greens")) +
coord_map() +
theme_map()
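`poly.counts()` does the counting here. For readers curious about what such a count involves, below is a base R sketch of the even-odd (ray casting) point-in-polygon test applied to two made-up square polygons. It is not GISTools' implementation: it handles single rings only (no holes or multi-part polygons), and points lying exactly on an edge may be counted for either neighbour.

```r
# Even-odd rule: a point is inside if a ray to its right crosses the ring an odd number of times
point_in_ring <- function(px, py, rx, ry) {
  n <- length(rx)
  j <- c(n, seq_len(n - 1))  # index of the previous vertex for each vertex
  vapply(seq_along(px), function(k) {
    # Edges whose endpoints lie on opposite sides of the horizontal line y = py[k]
    straddles <- (ry > py[k]) != (ry[j] > py[k])
    # x coordinate where each edge meets that line (NaN/Inf only for non-straddling edges)
    x_cross <- (rx[j] - rx) * (py[k] - ry) / (ry[j] - ry) + rx
    sum(straddles & px[k] < x_cross) %% 2 == 1
  }, logical(1))
}

# Two made-up adjacent squares and four made-up points
polys <- list(A = list(x = c(0, 2, 2, 0), y = c(0, 0, 2, 2)),
              B = list(x = c(2, 4, 4, 2), y = c(0, 0, 2, 2)))
pts <- data.frame(x = c(0.5, 1.5, 3, 5), y = c(0.5, 1, 1, 1))

# Number of points falling in each polygon
counts <- sapply(polys, function(p) sum(point_in_ring(pts$x, pts$y, p$x, p$y)))
counts  # A = 2, B = 1
```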
FYI, the link you posted for downloading the shapefile is quite slow, at least from a US computer.
This link has downloads that work a lot better, and it also shows how to read in the shape data: https://dioferrari.wordpress.com/2014/11/27/plotting-maps-using-r-example-with-brazilian-municipal-level-data/
I made an example using the regions rather than the municipalities data to keep it simple.
Data I used available for download here: https://drive.google.com/file/d/0B64xLcn8DZfwakNMbHFLQWo4YzA/view?usp=sharing
#Load libraries.
library(rgeos)
library(rgdal)
library(ggplot2)
#Read in and format map data (change dsn to the folder where regioes_2010 was unzipped).
regions_OGR <- readOGR(dsn="/Users/hmgeiger/Downloads/regioes_2010",
layer = "regioes_2010")
map_regions <- spTransform(regions_OGR,CRS("+proj=longlat +datum=WGS84"))
map_regions_fortified <- fortify(map_regions)
#Assign 0, 1, or 3 textile companies to each region.
#map_regions_fortified is sorted by id (region), so we add a column with
#each region's number of textile companies, repeated once for every row
#that region has.
num_rows_per_region <- data.frame(table(map_regions_fortified$id))
map_regions_fortified <- data.frame(map_regions_fortified,
Num.factories = factor(rep(c(1,0,1,3,1),times=num_rows_per_region$Freq)))
#First, plot without any location dots.
ggplot()+geom_polygon(data=map_regions_fortified,
aes(x = long,y = lat, group=group, fill=Num.factories),colour="black")
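The `rep(..., times = num_rows_per_region$Freq)` step works only if the fortified rows are grouped by id in the same order that `table()` lists them. A slightly more defensive sketch looks up each row's value by region id with `match()`. The small data frame below is a hypothetical stand-in for `map_regions_fortified`, and the region ids are assumed to be the character strings "0" to "4".

```r
# Hypothetical stand-in for map_regions_fortified: several rows per region id
fortified <- data.frame(id = c("0", "0", "1", "1", "1", "2", "3", "4", "4"),
                        long = 1:9, lat = 1:9, stringsAsFactors = FALSE)

# One value per region, stored next to its id
region_ids    <- c("0", "1", "2", "3", "4")
num_factories <- c(1, 0, 1, 3, 1)

# Look up each row's region instead of relying on row order;
# ids missing from region_ids become NA rather than silently shifting values
fortified$Num.factories <- factor(num_factories[match(fortified$id, region_ids)])
table(fortified$Num.factories)
```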
Now, let's add the factory locations.
#Set latitude and longitude based on the number of factories per region.
factory_locations <- data.frame(long = c(-65,-55,-51,-44,-42,-38),
lat = c(-5,-15,-27,-7,-12,-8))
#Add a dummy variable, which then allows the colour of the dots
#to be a part of the legend.
factory_locations <- data.frame(factory_locations,
Dummy.var = rep("One dot = one factory location",times=nrow(factory_locations)))
#Replot adding factory location dots.
#We will use black dots here since they are easier to see.
#Each + must end a line: a line that starts with + is parsed as a new statement.
ggplot()+geom_polygon(data=map_regions_fortified,
aes(x = long,y = lat, group=group, fill=Num.factories),colour="black") +
geom_point(data = factory_locations,aes(x = long,y = lat,colour = Dummy.var)) +
scale_colour_manual(values="black") + labs(colour="")
#Bonus: Let's change the color vector to something more color-blind friendly.
mycol <- c("#E69F00", "#56B4E9", "#009E73", "#F0E442",
"#0072B2", "#D55E00", "#CC79A7","#490092")
ggplot()+geom_polygon(data=map_regions_fortified,
aes(x = long,y = lat, group=group, fill=Num.factories),colour="black") +
geom_point(data = factory_locations,aes(x = long,y = lat,colour = Dummy.var)) +
scale_colour_manual(values="black") + labs(colour="") +
scale_fill_manual(values=mycol)
I have sampled 10,000 coordinates from my data into this file; in total I have around 130,000 points.
https://www.dropbox.com/s/40hfyx6a5hsjuv7/data.csv
I am trying to plot these points on the Americas map using ggplot2. Here is my code.
library(ggplot2)
library(maps)
map_world <- map_data("world")
map_world <- subset(map_world, (lat >= -60 & lat <= 75))
map_world <- subset(map_world, (long >= -170 & long <= -30))
p <- ggplot(data = data_coords, legend = FALSE) +
geom_polygon(data = map_world, aes(x = long, y = lat, group = group)) +
geom_point(aes(x = lon, y = lat), shape = 19, size = 0.00001,
alpha = 0.3, colour = "red") +
theme(panel.grid.major = element_blank()) +
theme(panel.grid.minor = element_blank()) +
theme(axis.text.x = element_blank(),axis.text.y = element_blank()) +
theme(axis.ticks = element_blank()) +
xlab("") + ylab("")
png("my_plot.png", width = 8000, height = 7000, res = 1000)
print(p)
dev.off()
The points seem to cover the whole area in which they are plotted. I would like them to be smaller, to better represent individual locations. You can see that I've set the size to 0.00001; I was just trying to see whether it has any effect, but beyond a certain limit it doesn't seem to help. Is this the best that is possible at this resolution, or can the points be made smaller?
I had previously plotted around 400,000 points on a US-only map, and they looked much better, as below. I'm hoping to get something like that. Thanks.
https://www.dropbox.com/s/8d0niu9g6ygz0wo/Clusters_reduced.png
Try playing with very small values of alpha, instead of the point size:
http://docs.ggplot2.org/0.9.3.1/geom_point.html
library(ggplot2)
# Varying alpha is useful for large datasets
d <- ggplot(diamonds, aes(carat, price))
d + geom_point(alpha = 1/1000)
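A likely reason the question's `size = 0.00001` stopped having a visible effect: in recent ggplot2 versions (worth checking against your installed version), `geom_point()` draws each point with a font size of `size * .pt + stroke * .stroke / 2`, and the default `stroke` is 0.5, so the stroke term puts a floor under the point size. The base R arithmetic below reproduces that formula; the constants are copied, as an assumption, from ggplot2's `.pt` and `.stroke`. The ggplot2 lines only run if ggplot2 is installed and show two ways around the floor on made-up points: `stroke = 0`, or `shape = "."`, which draws a very small dot.

```r
# Assumed values of ggplot2's internal constants .pt and .stroke
pt_const     <- 72.27 / 25.4
stroke_const <- 96 / 25.4

# Font size used to draw a point, following the formula described above
point_fontsize <- function(size, stroke = 0.5) size * pt_const + stroke * stroke_const / 2

point_fontsize(0.00001)              # about 0.945: the default stroke dominates
point_fontsize(0.00001, stroke = 0)  # about 0.00003

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  set.seed(1)
  pts <- data.frame(lon = runif(1e4, -120, -70), lat = runif(1e4, 25, 50))  # made-up points
  p_stroke0 <- ggplot(pts, aes(lon, lat)) +
    geom_point(size = 0.1, stroke = 0, colour = "red", alpha = 0.3)
  p_dot <- ggplot(pts, aes(lon, lat)) +
    geom_point(shape = ".", colour = "red", alpha = 0.3)
}
```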
Edit:
Additional ideas are given in the documentation. Here's a summary:
Details
The scatterplot is useful for displaying the relationship between two continuous variables, although it can also be used with one continuous and one categorical variable, or two categorical variables. See geom_jitter for possibilities.
The bubblechart is a scatterplot with a third variable mapped to the size of points. There are no special names for scatterplots where another variable is mapped to point shape or colour, however.
The biggest potential problem with a scatterplot is overplotting: whenever you have more than a few points, points may be plotted on top of one another. This can severely distort the visual appearance of the plot. There is no one solution to this problem, but there are some techniques that can help. You can add additional information with stat_smooth, stat_quantile or stat_density2d. If you have few unique x values, geom_boxplot may also be useful. Alternatively, you can summarise the number of points at each location and display that in some way, using stat_sum.
Another technique is to use transparent points, geom_point(alpha = 0.05).
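The "summarise the number of points at each location" idea can also be applied before plotting, which keeps 130,000 points manageable. A minimal base R sketch follows, with a hypothetical `bin_counts()` helper that snaps points to the centres of a grid of `cell` degrees and counts them. The result could then be drawn with, for example, `geom_tile(aes(fill = n))` or `geom_point(aes(size = n))`; ggplot2's own `geom_bin2d()` does similar binning inside the plot.

```r
# Hypothetical helper: snap points to grid-cell centres and count points per cell
bin_counts <- function(lon, lat, cell = 0.5) {
  cells <- data.frame(lon = floor(lon / cell) * cell + cell / 2,
                      lat = floor(lat / cell) * cell + cell / 2)
  aggregate(list(n = rep(1L, nrow(cells))), by = cells, FUN = sum)
}

# Four made-up points: two share a cell, the others each occupy their own
b <- bin_counts(c(0.1, 0.2, 0.7, 1.2), c(0.1, 0.3, 0.2, 0.4), cell = 0.5)
b
```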
Edit 2:
Combining the details from the manual with the hints in "Transparency and Alpha levels for ggplot2 stat_density2d with maps and layers in R", this might be the solution:
library(ggplot2)
library(maps)
data_coords <- read.csv("C:/Downloads/data.csv")
map_world <- map_data("world")
map_world <- subset(map_world, (lat >= -60 & lat <= 75))
map_world <- subset(map_world, (long >= -170 & long <= -30))
p <- ggplot( data = data_coords, legend = FALSE) +
geom_polygon( data = map_world, aes(x = long, y = lat, group = group)) +
stat_density2d( data = data_coords, aes(x=lon, y=lat, fill = as.factor(..level..)), size=1, bins=10, geom='polygon') +
scale_fill_manual(values = c("yellow","red","green","royalblue", "black","white","orange","brown","grey"))
png("my_plot2k.png", width = 2000, height = 2000, res = 500)
print(p)
dev.off()
Resulting image (not the best colour palette used):