This may be a wish-list item (perhaps a geom_pie would need to be created for this to work). I saw a map today (LINK) with pie graphs on it, as seen here.
I don't want to debate the merits of a pie graph; this was more an exercise in whether I can do this in ggplot.
I have provided a data set below (loaded from my drop box) that has the mapping data to make a New York State map and some purely fabricated data on racial percentages by county. I have given this racial make up as a merge with the main data set and as a separate data set called key. I also think Bryan Goodrich's response to me in another post (HERE) on centering county names will be helpful to this concept.
How can we make the map above with ggplot2?
A data set and the map without the pie graphs:
load(url("http://dl.dropbox.com/u/61803503/nycounty.RData"))
head(ny); head(key) #view the data set from my drop box
library(ggplot2)
ggplot(ny, aes(long, lat, group=group)) + geom_polygon(colour='black', fill=NA)
# Now how can we plot a pie chart of race on each county
# (sizing of the pie would also be controllable via a size
# parameter like other `geom_` functions).
Thanks in advance for your ideas.
EDIT: I just saw another case at junkcharts that screams for this type of capability:
Three years later this is solved. I've put together a number of processes, and thanks to @Guangchuang Yu's excellent ggtree package this can be done fairly easily. Note that as of 9/3/2015 you need to have version 1.0.18 of ggtree installed, but these changes will eventually trickle down to their respective repositories.
I've used the following resources to make this (the links will give greater detail):
ggtree blog
move ggplot legend
correct ggtree version
centering things in polygons
Here's the code:
load(url("http://dl.dropbox.com/u/61803503/nycounty.RData"))
head(ny); head(key) #view the data set from my drop box
if (!require("pacman")) install.packages("pacman")
p_load(ggplot2, ggtree, dplyr, tidyr, sp, maps, pipeR, grid, XML, gtable)
getLabelPoint <- function(county) {Polygon(county[c('long', 'lat')])@labpt}
df <- map_data('county', 'new york') # NY region county data
centroids <- by(df, df$subregion, getLabelPoint) # Returns list
centroids <- do.call("rbind.data.frame", centroids) # Convert to Data Frame
names(centroids) <- c('long', 'lat') # Appropriate Header
pops <- "http://data.newsday.com/long-island/data/census/county-population-estimates-2012/" %>%
readHTMLTable(which=1) %>%
tbl_df() %>%
select(1:2) %>%
setNames(c("region", "population")) %>%
mutate(
population = {as.numeric(gsub("\\D", "", population))},
region = tolower(gsub("\\s+[Cc]ounty|\\.", "", region)),
#weight = ((1 - (1/(1 + exp(population/sum(population)))))/11)
weight = exp(population/sum(population)),
weight = sqrt(weight/sum(weight))/3
)
race_data_long <- add_rownames(centroids, "region") %>>%
left_join({distinct(select(ny, region:other))}) %>>%
left_join(pops) %>>%
(~ race_data) %>>%
gather(race, prop, white:other) %>%
split(., .$region)
pies <- setNames(lapply(1:length(race_data_long), function(i){
ggplot(race_data_long[[i]], aes(x=1, prop, fill=race)) +
geom_bar(stat="identity", width=1) +
coord_polar(theta="y") +
theme_tree() +
xlab(NULL) +
ylab(NULL) +
theme_transparent() +
theme(plot.margin=unit(c(0,0,0,0),"mm"))
}), names(race_data_long))
e1 <- ggplot(race_data_long[[1]], aes(x=1, prop, fill=race)) +
geom_bar(stat="identity", width=1) +
coord_polar(theta="y")
leg1 <- gtable_filter(ggplot_gtable(ggplot_build(e1)), "guide-box")
p <- ggplot(ny, aes(long, lat, group=group)) +
geom_polygon(colour='black', fill=NA) +
theme_bw() +
annotation_custom(grob = leg1, xmin = -77.5, xmax = -78.5, ymin = 44, ymax = 45)
n <- length(pies)
for (i in 1:n) {
nms <- names(pies)[i]
dat <- race_data[which(race_data$region == nms)[1], ]
p <- subview(p, pies[[i]], x=unlist(dat[["long"]])[1], y=unlist(dat[["lat"]])[1], dat[["weight"]], dat[["weight"]])
}
print(p)
This functionality should be in ggplot, and I think it is coming to ggplot soonish, but it is currently available in base plots. I thought I would post this just for comparison's sake.
load(url("http://dl.dropbox.com/u/61803503/nycounty.RData"))
library(plotrix)
e=10^-5
myglyff=function(gi) {
floating.pie(mean(gi$long),
mean(gi$lat),
x=c(gi[1,"white"]+e,
gi[1,"black"]+e,
gi[1,"hispanic"]+e,
gi[1,"asian"]+e,
gi[1,"other"]+e),
radius=.1) #insert size variable here
}
g1=ny[which(ny$group==1),]
plot(g1$long,
g1$lat,
type='l',
xlim=c(-80,-71.5),
ylim=c(40.5,45.1))
myglyff(g1)
for(i in 2:62)
{gi=ny[which(ny$group==i),]
lines(gi$long,gi$lat)
myglyff(gi)
}
Also, there may be (probably are) more elegant ways of doing this in the base graphics.
As you can see, there are quite a few problems with this that still need to be solved: the counties have no fill color, the pie charts tend to be too small or to overlap, and the lat and long are not projected, so the sizes of the counties are distorted.
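On the projection point, one cheap fix in base graphics is to set the plot's aspect ratio from the mid-latitude of the map rather than projecting the coordinates. The sketch below is only an approximation under that assumption. It reuses the xlim and ylim values from the plot call above, and the empty plot stands in for the county outlines.

```r
# Latitude limits reused from the plot() call above
lat_range <- c(40.5, 45.1)
lon_range <- c(-80, -71.5)

# One degree of longitude shrinks by cos(latitude), so stretch the
# y axis by 1 / cos(mid-latitude) to keep shapes roughly undistorted.
mid_lat <- mean(lat_range)
asp <- 1 / cos(mid_lat * pi / 180)

# Empty frame with the corrected aspect ratio; the county lines and
# pies would be drawn into it exactly as before.
plot(lon_range, lat_range, type = "n", asp = asp,
     xlab = "long", ylab = "lat")
```

Passing the same asp value to the first plot() call in the loop above should be enough, since lines() and floating.pie() draw into the existing coordinate system.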
In any event, I am interested in what others can come up with.
I've written some code to do this using grid graphics. There is an example here: https://qdrsite.wordpress.com/2016/06/26/pies-on-a-map/
The goal here was to associate the pie charts with specific points on the map, and not necessarily regions. For this particular solution, it is necessary to convert the map coordinates (latitude and longitude) to a (0,1) scale so they can be plotted in the proper locations on the map. The grid package is used to print to the viewport that contains the plot panel.
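Before the full script, here is the rescaling step on its own, as a minimal sketch. The map limits are the ones used in the script (longitude -125 to -65, latitude 25 to 50). The helper name to_unit and the three site coordinates are made up for illustration.

```r
# Map limits, matching the scale limits used in the script
lon_range <- c(-125, -65)
lat_range <- c(25, 50)

# Hypothetical helper: linearly rescale a coordinate to the (0, 1) interval
to_unit <- function(v, range) (v - range[1]) / diff(range)

# Made-up site locations: two opposite corners and the centre of the map
sites <- data.frame(lon = c(-125, -95, -65),
                    lat = c(25, 37.5, 50))

sites$x <- to_unit(sites$lon, lon_range)  # same as (lon + 125) / 60
sites$y <- to_unit(sites$lat, lat_range)  # same as (lat - 25) / 25
sites
```

A linear rescale like this is exact only when the panel axes are themselves linear in longitude and latitude. With a projected coordinate system the positions are approximate.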
Code:
# Pies On A Map
# Demonstration script
# By QDR
# Uses NLCD land cover data for different sites in the National Ecological Observatory Network.
# Each site consists of a number of different plots, and each plot has its own land cover classification.
# On a US map, plot a pie chart at the location of each site with the proportion of plots at that site within each land cover class.
# For this demo script, I've hard coded in the color scale, and included the data as a CSV linked from dropbox.
# Custom color scale (taken from the official NLCD legend)
nlcdcolors <- structure(c("#7F7F7F", "#FFB3CC", "#00B200", "#00FFFF", "#006600", "#E5CC99", "#00B2B2", "#FFFF00", "#B2B200", "#80FFCC"), .Names = c("unknown", "cultivatedCrops", "deciduousForest", "emergentHerbaceousWetlands", "evergreenForest", "grasslandHerbaceous", "mixedForest", "pastureHay", "shrubScrub", "woodyWetlands"))
# NLCD data for the NEON plots
nlcdtable_long <- read.csv(file='https://www.dropbox.com/s/x95p4dvoegfspax/demo_nlcdneon.csv?raw=1', row.names=NULL, stringsAsFactors=FALSE)
library(ggplot2)
library(plyr)
library(grid)
# Create a blank state map. The geom_tile() is included because it allows a legend for all the pie charts to be printed, although it does not
statemap <- ggplot(nlcdtable_long, aes(decimalLongitude,decimalLatitude,fill=nlcdClass)) +
geom_tile() +
borders('state', fill='beige') + coord_map() +
scale_x_continuous(limits=c(-125,-65), expand=c(0,0), name = 'Longitude') +
scale_y_continuous(limits=c(25, 50), expand=c(0,0), name = 'Latitude') +
scale_fill_manual(values = nlcdcolors, name = 'NLCD Classification')
# Create a list of ggplot objects. Each one is the pie chart for each site with all labels removed.
pies <- dlply(nlcdtable_long, .(siteID), function(z)
ggplot(z, aes(x=factor(1), y=prop_plots, fill=nlcdClass)) +
geom_bar(stat='identity', width=1) +
coord_polar(theta='y') +
scale_fill_manual(values = nlcdcolors) +
theme(axis.line=element_blank(),
axis.text.x=element_blank(),
axis.text.y=element_blank(),
axis.ticks=element_blank(),
axis.title.x=element_blank(),
axis.title.y=element_blank(),
legend.position="none",
panel.background=element_blank(),
panel.border=element_blank(),
panel.grid.major=element_blank(),
panel.grid.minor=element_blank(),
plot.background=element_blank()))
# Use the latitude and longitude maxima and minima from the map to calculate the coordinates of each site location on a scale of 0 to 1, within the map panel.
piecoords <- ddply(nlcdtable_long, .(siteID), function(x) with(x, data.frame(
siteID = siteID[1],
x = (decimalLongitude[1]+125)/60,
y = (decimalLatitude[1]-25)/25
)))
# Print the state map.
statemap
# Use a function from the grid package to move into the viewport that contains the plot panel, so that we can plot the individual pies in their correct locations on the map.
downViewport('panel.3-4-3-4')
# Here is the fun part: loop through the pies list. At each iteration, print the ggplot object at the correct location on the viewport. The y coordinate is shifted by half the height of the pie (set at 10% of the height of the map) so that the pie will be centered at the correct coordinate.
for (i in 1:length(pies))
print(pies[[i]], vp=dataViewport(xData=c(-125,-65), yData=c(25,50), clip='off',xscale = c(-125,-65), yscale=c(25,50), x=piecoords$x[i], y=piecoords$y[i]-.06, height=.12, width=.12))
The result looks like this:
I stumbled upon what looks like a function to do this: "add.pie" in the "mapplots" package.
The example from the package is below.
plot(NA,NA, xlim=c(-1,1), ylim=c(-1,1) )
add.pie(z=rpois(6,10), x=-0.5, y=0.5, radius=0.5)
add.pie(z=rpois(4,10), x=0.5, y=-0.5, radius=0.3)
A slight variation on the OP's original requirements, but this seems like an appropriate answer/update.
If you want an interactive Google Map, as of googleway v2.6.0 you can add charts inside info_windows of map layers.
See ?googleway::google_charts for documentation and examples.
library(googleway)
set_key("GOOGLE_MAP_KEY")
## create some dummy chart data
markerCharts <- data.frame(stop_id = rep(tram_stops$stop_id, each = 3))
markerCharts$variable <- c("yes", "no", "maybe")
markerCharts$value <- sample(1:10, size = nrow(markerCharts), replace = TRUE)
chartList <- list(
data = markerCharts
, type = 'pie'
, options = list(
title = "my pie"
, is3D = TRUE
, height = 240
, width = 240
, colors = c('#440154', '#21908C', '#FDE725')
)
)
google_map() %>%
add_markers(
data = tram_stops
, id = "stop_id"
, info_window = chartList
)
I am trying to generate a heatmap where I can show more than one level of information on each cell. For each cell I would like to show a different color depending on its value in one variable and then overlay this with a transparency (alpha) that shades the cell according to its value for another variable.
Similar questions have been addressed here (Place 1 heatmap on another with transparency in R) and here (Making a heatmap in R varying both color and transparency). In both cases the suggestion is to use ggplot and overlay two geom_tiles, one with the colors and one with the transparency.
I have managed to overlay two geom_tiles (see code below). However, in my case, the problem is that the shading defined by the transparency (or "alpha") geom_tile also shades some cells that should remain as white or blank according to the colors (or "fill") geom_tile. I would like these cells to remain white even after overlaying the transparency.
#Create sample dataframe
df <- data.frame("x_pos" = c("A","A","A","B","B","B","C","C","C"),
"y_pos" = c("X","Y","Z","X","Y","Z","X","Y","Z"),
"col_var"= c(1,2,NA,4,5,6,NA,8,9),
"alpha_var" = c(7,12,0,3,2,15,0,6,15))
#Convert factor columns to numeric
df$col_var<- as.numeric(df$col_var)
df$alpha_var<- as.numeric(df$alpha_var)
#Cut display variable into breaks
df$col_var_cut <- cut(df$col_var,
breaks = c(0,3,6,10),
labels = c("cat1","cat2", "cat3"))
#Plot
library(ggplot2)
ggplot(df, aes (x = x_pos, y = y_pos, fill = col_var_cut, label = col_var)) +
geom_tile () +
geom_text() +
scale_fill_manual(values=(brewer.pal(3, "RdYlBu")),na.value="white") +
geom_tile(aes(alpha = alpha_var), fill ="gray29")+
scale_alpha_continuous("alpha_var", range=c(0,0.7), trans = 'reverse')+
theme_bw() +
theme(axis.text.x = element_text(angle = 90, hjust = 1))
I would like cells "AZ" and "CX" in the heatmap resulting from the code above to be colored white instead of grey such that the alpha transparency doesn't apply to them. In my data, these cells have NA in the color variable (col_var) and can have a value of NA or 0 (as in the example code) in the transparency/alpha variable (alpha_var).
If this is not possible, then I would like to know whether there are other options to display both variables in a heatmap and keep the NA cells in col_var white. I am happy to use other packages or alternative heatmap layouts, such as those where the size of each cell or the thickness of its border varies according to the values of alpha_var. However, I am not sure how I could achieve this either.
Thanks in advance and my apologies for the cumbersome bits in the example code (I am still learning R and this is my first time asking questions here).
You were not far off. See below for a possible solution. The first plot shows an implementation of adding transparency within the geom_tile call itself. Note that I removed the trans = 'reverse' specification from your plot.
Plot 2 just adds back the white tiles on top of the other plot - simple hack which you will often find necessary when wanting to plot certain data points differently.
Note I have added a few minor comments to your code below.
# creating your data frame with better name - df is a base R function and not recommended as example name.
# Also note that I removed the quotation marks in the data frame call - they were not necessary. I also called as.numeric directly.
mydf <- data.frame(x_pos = c("A","A","A","B","B","B","C","C","C"), y_pos = c("X","Y","Z","X","Y","Z","X","Y","Z"), col_var= as.numeric(c(1,2,NA,4,5,6,NA,8,9)), alpha_var = as.numeric(c(7,12,0,3,2,15,0,6,15)))
mydf$col_var_cut <- cut(mydf$col_var, breaks = c(0,3,6,10), labels = c("cat1","cat2", "cat3"))
#Plot
library(tidyverse)
library(RColorBrewer) # you forgot to add this to your reprex
ggplot(mydf, aes (x = x_pos, y = y_pos, fill = col_var_cut, label = col_var)) +
geom_tile(aes(alpha = alpha_var)) +
geom_text() +
scale_fill_manual(values=(brewer.pal(3, "RdYlBu")), na.value="white")
#> Warning: Removed 2 rows containing missing values (geom_text).
# a bit hacky for quick and dirty solution. Note I am using dplyr::filter from the tidyverse
ggplot(mapping = aes(x = x_pos, y = y_pos, fill = col_var_cut, label = col_var)) +
geom_tile(data = filter(mydf, !is.na(col_var))) +
geom_tile(data = filter(mydf, !is.na(col_var)), aes(alpha = alpha_var), fill ="gray29")+
geom_tile(data = filter(mydf, is.na(col_var)), fill = 'white') +
geom_text(data = mydf) +
scale_fill_manual(values = (brewer.pal(3, "RdYlBu"))) +
scale_alpha_continuous("alpha_var", range=c(0,0.7), trans = 'reverse')
#> Warning: Removed 2 rows containing missing values (geom_text).
Created on 2019-07-04 by the reprex package (v0.2.1)
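The question also asked about layouts where the size of each cell varies instead of its transparency. Here is a rough sketch of that idea, not a tested recommendation. It rebuilds the same example data, rescales alpha_var to a tile side length between 0.3 and 1 (an arbitrary range chosen for illustration), and drops the NA rows so those cells simply show the white panel background. The column name side is made up.

```r
library(ggplot2)

# Same example data as above
mydf <- data.frame(
  x_pos = rep(c("A", "B", "C"), each = 3),
  y_pos = rep(c("X", "Y", "Z"), times = 3),
  col_var = c(1, 2, NA, 4, 5, 6, NA, 8, 9),
  alpha_var = c(7, 12, 0, 3, 2, 15, 0, 6, 15)
)
mydf$col_var_cut <- cut(mydf$col_var, breaks = c(0, 3, 6, 10),
                        labels = c("cat1", "cat2", "cat3"))

# Rescale alpha_var to a side length in [0.3, 1] (assumed range)
rng <- range(mydf$alpha_var)
mydf$side <- 0.3 + 0.7 * (mydf$alpha_var - rng[1]) / diff(rng)

# Keep only the cells that have a colour value; the NA cells stay empty
shown <- mydf[!is.na(mydf$col_var), ]

p <- ggplot(shown, aes(x = x_pos, y = y_pos)) +
  geom_tile(aes(fill = col_var_cut, width = side, height = side)) +
  geom_text(aes(label = col_var)) +
  scale_fill_brewer(palette = "RdYlBu") +
  theme_bw()
p
```

Larger tiles then mean larger alpha_var. Whether that reads better than shading is a matter of taste.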
I have a basic map of India with states and borders, some labels, and a number of other specifications stored as a gg object. I'd like to generate a number of maps with a district layer, which will bear data from different variables.
To prevent the district layer from drawing over the state and country borders, it must be added before all the existing layers, whose code I'd like to avoid repeating.
I thought I could do this by calling on $layers for the gg object as per this answer. However, it throws an error. Reprex is below:
library(ggplot2)
library(sf)
library(raster)
# Download district and state data (should be less than 10 Mb in total)
distSF <- st_as_sf(getData("GADM",country="IND",level=2))
stateSF <- st_as_sf(getData("GADM",country="IND",level=1))
# Add border
countryborder <- st_union(stateSF)
# Basic plot
basicIndia <- ggplot() +
geom_sf(data = stateSF, color = "white", fill = NA) +
geom_sf(data = countryborder, color = "blue", fill = NA) +
theme_dark()
basicIndia
# Data-bearing plot
districts <- ggplot() +
geom_sf(data = distSF, fill = "gold")
basicIndia$layers <- c(geom_sf(data = distSF, fill = "gold"), basicIndia$layers)
basicIndia
#> Error in y$layer_data(plot$data): attempt to apply non-function
Intended outcome
Any help would be much appreciated!
I'm still not sure if I'm missing a detail of what you're looking for, but ggplot2 draws layers in the order you provide them. So something like
ggplot(data) +
geom_col() +
geom_point(...) +
geom_line(...)
will draw columns, then points on top of those, then lines on top of the previous layers.
Same goes for sf plots, which makes it easy to make a plot like this of multiple geographic levels.
(I'm using rmapshaper::ms_simplify on the sf objects just to simplify them and speed things up for plotting.)
library(dplyr)
library(ggplot2)
library(sf)
library(raster)
distSF <- st_as_sf(getData("GADM",country="IND",level=2)) %>% rmapshaper::ms_simplify()
...
Then you can plot by adding up the layers in the order you need them displayed. Keep in mind that if you needed to do other calculations with any of these sfs, you could do that in advance or inside your geom_sf.
ggplot() +
geom_sf(data = distSF, fill = "gold", size = 0.1) +
geom_sf(data = stateSF, color = "white", fill = NA) +
geom_sf(data = countryborder, color = "blue", fill = NA)
Regarding trying to add one plot to another: ggplot2 works in layers, so you create a single base ggplot object, then add geometries on top of it. So you could make, for example, two valid plots:
state_plot <- ggplot(stateSF) +
geom_sf(color = "white", fill = NA)
country_plot <- ggplot(countryborder) +
geom_sf(color = "blue", fill = NA)
But you can't add them, because you would have 2 base ggplot objects. This should be the error you mentioned:
state_plot +
country_plot
#> Error: Don't know how to add country_plot to a plot
Instead, if you need to make a plot, then add something else on top of it, make the base ggplot, then add geometry layers, such as a geom_sf with a different set of data.
state_plot +
geom_sf(data = countryborder, fill = NA, color = "blue")
Created on 2018-10-29 by the reprex package (v0.2.1)
If you look at geom_sf(data=distSF) you'll see that it is a list made up of two elements - you want the first one which contains the layer information, so geom_sf(data = distSF, fill = "gold")[[1]] should work.
districts <- ggplot() +
geom_sf(data = distSF, fill = "gold")
basicIndia$layers <- c(geom_sf(data = distSF, fill = "gold")[[1]], basicIndia$layers)
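The same prepending trick can be tried without downloading any map data. The sketch below uses a small made-up data frame and ordinary geoms as stand-ins for the stored base map and the district layer. Unlike geom_sf(), geom_col() returns a single layer rather than a list, so it is wrapped in list() here instead of being indexed with [[1]].

```r
library(ggplot2)

# Made-up data, for illustration only
d <- data.frame(x = c("a", "b", "c"), y = c(3, 1, 2))

# Stand-in for the stored base plot (borders, labels, theme)
base_plot <- ggplot(d, aes(x, y)) +
  geom_point(size = 4) +
  theme_dark()

# Stand-in for the data-bearing layer that must be drawn underneath
under_layer <- geom_col(fill = "gold")

# Prepend the new layer so that it is drawn first
with_under <- base_plot
with_under$layers <- c(list(under_layer), base_plot$layers)

# Layers are drawn in list order, so check which geom comes first
sapply(with_under$layers, function(l) class(l$geom)[1])
with_under
```

Assigning to a copy keeps the original base plot untouched, so it can be reused for each variable.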
I'm making a treemap of some data using a pretty cool library called treemapify, the details of which can be found here, with the GitHub repository here.
Based on my reading of the documentation, it seems to be based on ggplot2, so it should be possible to modify the graph using the grammar of graphics.
My code is below with some made-up data. The end result is pretty nice, but I want to change the color scheme to a more subtle one using the line scale_colour_brewer. The graph runs fine but the colour scheme seems to be ignored. Has anyone had any experience with this?
# Create Random Data
country <- c("Ireland","England","France","Germany","USA","Spain")
job <- c("IT","SOCIAL","Project Manager","Director","Vice-President")
mydf <- data.frame(countries = sample(country,100,replace = TRUE),
career = sample(job,100,replace=TRUE),
participent = sample(1:100, replace = TRUE)
)
# Set Up the coords
treemap_coords <- treemapify(mydf,
area="participent",
fill="countries",
label="career",
group="countries")
# Plot the results using the Greens palette
ggplotify(treemap_coords,
group.label.size.factor = 2,
group.label.colour = "white",
label.colour = "black",
label.size.factor = 1) +
labs(title="Work Breakdown") +
scale_colour_brewer(palette = "Greens")
If you want to change the fill color of the rectangles, try the scale for fill instead of the one for colour:
scale_fill_brewer(palette = "Greens")
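To see why the colour scale is silently ignored, here is a small sketch in plain ggplot2 with made-up tiles instead of a treemap. It compares the fill values each version of the plot actually computes, using layer_data().

```r
library(ggplot2)

# Made-up tiles whose fill is mapped to a grouping variable
d <- data.frame(x = 1:3, y = 1, g = c("a", "b", "c"))
base <- ggplot(d, aes(x, y, fill = g)) + geom_tile()

# Fill values computed for the tile layer under each scale
fill_default <- layer_data(base, 1)$fill
fill_colour  <- layer_data(base + scale_colour_brewer(palette = "Greens"), 1)$fill
fill_fill    <- layer_data(base + scale_fill_brewer(palette = "Greens"), 1)$fill

identical(fill_default, fill_colour)  # colour scale leaves the fills alone
identical(fill_default, fill_fill)    # fill scale changes them
```

The colour scale only applies to a mapped colour aesthetic (outlines, points, text), and nothing is mapped to colour here, so it has nothing to act on.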
I am trying to figure out how to display a map including the legend with ggmap/ggplot.
I have gotten so far:
library(ggmap)
library(RColorBrewer)
bbox <- c(8.437526,47.328268,8.605915,47.462160)
map.base <- get_map(maptype='toner',source = 'stamen',location = bbox)
ggmap(map.base) +
geom_blank() +
ggtitle("2015-09-21 06:00:00 CEST") +
scale_colour_manual(values = rev(brewer.pal(7,"Spectral")), drop = FALSE)+
scale_size_manual(values=c(1:7), drop = FALSE) +
guides(color=guide_legend(title='Mean Delay [s]'), size = guide_legend(title='Mean Delay [s]'))+
ggsave(file=paste("map_","2015-09-21 060000",".png",sep=""),dpi = 100)
dev.off()
This generates the correct map in the correct bounding box. But even though I have specified "scale_colour_manual" and "scale_size_manual" with "drop = FALSE", no legend appears. How can I have the legend shown when there is no data to display?
The overall intention is to create a single map of a given interval in a time series. Now the problem is that some intervals have no data and so the map is displayed without a scale. If the map does not have a scale the dimensions of the map are different making it impossible to create a movie out of the different maps. That is why I need to be able to create a map WITHOUT data but WITH the legend showing.
Thank you.
Taking Jaap's comment into account (that I have to map the legend aesthetics inside aes), I have been able to achieve what I want with the following code:
library(ggmap)
library(RColorBrewer)
bbox <- c(8.437526,47.328268,8.605915,47.462160)
map.base <- get_map(maptype='toner',source = 'stamen',location = bbox)
ggmap(map.base) +
geom_point(aes(x=0,y=0, color=cut(0,breaks = c(-Inf,0,60,120,240,300,360,Inf),right = FALSE), size=cut(0,breaks = c(-Inf,0,60,120,240,300,360,Inf),right = FALSE))) +
ggtitle("2015-09-21 06:00:00 CEST") +
scale_colour_manual(values = rev(brewer.pal(7,"Spectral")), drop = FALSE)+
scale_size_manual(values=c(1:7), drop = FALSE) +
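guides(color=guide_legend(title='Mean Delay [s]'), size = guide_legend(title='Mean Delay [s]'))
# ggsave() saves the most recent plot by default and manages its own graphics device,
# so it is called on its own line rather than added with `+`, and no dev.off() is needed
ggsave(file=paste("map_","2015-09-21 060000",".png",sep=""),dpi = 100)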
I know this is not the most elegant way, but it works for now.
I basically make a dummy point outside of the bounding box to be displayed. I then give the point a value, which is cut according to the breaks I want and then colored and sized accordingly. Just remember to put the values of x and y in aes outside of the bounding box.
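The reason a single dummy value is enough can be checked in base R. This is a minimal sketch using the same breaks as above. The variable name dummy is made up.

```r
# Same breaks as in the geom_point() call above
breaks <- c(-Inf, 0, 60, 120, 240, 300, 360, Inf)

# One dummy value, cut into the full set of delay classes
dummy <- cut(0, breaks = breaks, right = FALSE)

as.character(dummy)  # the class the dummy point falls into
levels(dummy)        # all classes, including the unused ones
nlevels(dummy)       # one per colour and size value in the manual scales
```

cut() returns a factor that carries every interval as a level, and drop = FALSE tells the manual scales to keep the unused levels, so the legend shows all seven classes even though only one of them has a point.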
Better solutions are welcome.