Related
This may be a wish list thing, not sure (i.e. maybe there would need to be the creation of geom_pie for this to occur). I saw a map today (LINK) with pie graphs on it as seen here.
I don't want to debate the merits of a pie graph, this was more of an exercise of can I do this in ggplot?
I have provided a data set below (loaded from my drop box) that has the mapping data to make a New York State map and some purely fabricated data on racial percentages by county. I have given this racial make up as a merge with the main data set and as a separate data set called key. I also think Bryan Goodrich's response to me in another post (HERE) on centering county names will be helpful to this concept.
How can we make the map above with ggplot2?
A data set and the map without the pie graphs:
load(url("http://dl.dropbox.com/u/61803503/nycounty.RData"))
head(ny); head(key) #view the data set from my drop box
library(ggplot2)
ggplot(ny, aes(long, lat, group=group)) + geom_polygon(colour='black', fill=NA)
# Now how can we plot a pie chart of race on each county
# (sizing of the pie would also be controllable via a size
# parameter like other `geom_` functions).
Thanks in advance for your ideas.
EDIT: I just saw another case at junkcharts that screams for this type of capability:
Three years later this is solved. I've put together a number of processes together and thanks to #Guangchuang Yu's excellent ggtree package this can be done fairly easily. Note that as of (9/3/2015) you need to have version 1.0.18 of ggtree installed but these will eventually trickle down to their respective repositories.
I've used the following resources to make this (the links will give greater detail):
ggtree blog
move ggplot legend
correct ggtree version
centering things in polygons
Here's the code:
load(url("http://dl.dropbox.com/u/61803503/nycounty.RData"))
head(ny); head(key) #view the data set from my drop box
if (!require("pacman")) install.packages("pacman")
p_load(ggplot2, ggtree, dplyr, tidyr, sp, maps, pipeR, grid, XML, gtable)
getLabelPoint <- function(county) {Polygon(county[c('long', 'lat')])#labpt}
df <- map_data('county', 'new york') # NY region county data
centroids <- by(df, df$subregion, getLabelPoint) # Returns list
centroids <- do.call("rbind.data.frame", centroids) # Convert to Data Frame
names(centroids) <- c('long', 'lat') # Appropriate Header
pops <- "http://data.newsday.com/long-island/data/census/county-population-estimates-2012/" %>%
readHTMLTable(which=1) %>%
tbl_df() %>%
select(1:2) %>%
setNames(c("region", "population")) %>%
mutate(
population = {as.numeric(gsub("\\D", "", population))},
region = tolower(gsub("\\s+[Cc]ounty|\\.", "", region)),
#weight = ((1 - (1/(1 + exp(population/sum(population)))))/11)
weight = exp(population/sum(population)),
weight = sqrt(weight/sum(weight))/3
)
race_data_long <- add_rownames(centroids, "region") %>>%
left_join({distinct(select(ny, region:other))}) %>>%
left_join(pops) %>>%
(~ race_data) %>>%
gather(race, prop, white:other) %>%
split(., .$region)
pies <- setNames(lapply(1:length(race_data_long), function(i){
ggplot(race_data_long[[i]], aes(x=1, prop, fill=race)) +
geom_bar(stat="identity", width=1) +
coord_polar(theta="y") +
theme_tree() +
xlab(NULL) +
ylab(NULL) +
theme_transparent() +
theme(plot.margin=unit(c(0,0,0,0),"mm"))
}), names(race_data_long))
e1 <- ggplot(race_data_long[[1]], aes(x=1, prop, fill=race)) +
geom_bar(stat="identity", width=1) +
coord_polar(theta="y")
leg1 <- gtable_filter(ggplot_gtable(ggplot_build(e1)), "guide-box")
p <- ggplot(ny, aes(long, lat, group=group)) +
geom_polygon(colour='black', fill=NA) +
theme_bw() +
annotation_custom(grob = leg1, xmin = -77.5, xmax = -78.5, ymin = 44, ymax = 45)
n <- length(pies)
for (i in 1:n) {
nms <- names(pies)[i]
dat <- race_data[which(race_data$region == nms)[1], ]
p <- subview(p, pies[[i]], x=unlist(dat[["long"]])[1], y=unlist(dat[["lat"]])[1], dat[["weight"]], dat[["weight"]])
}
print(p)
This functionality should be in ggplot, I think it is coming to ggplot soonish, but it is currently available in base plots. I thought I would post this just for comparison's sake.
load(url("http://dl.dropbox.com/u/61803503/nycounty.RData"))
library(plotrix)
e=10^-5
myglyff=function(gi) {
floating.pie(mean(gi$long),
mean(gi$lat),
x=c(gi[1,"white"]+e,
gi[1,"black"]+e,
gi[1,"hispanic"]+e,
gi[1,"asian"]+e,
gi[1,"other"]+e),
radius=.1) #insert size variable here
}
g1=ny[which(ny$group==1),]
plot(g1$long,
g1$lat,
type='l',
xlim=c(-80,-71.5),
ylim=c(40.5,45.1))
myglyff(g1)
for(i in 2:62)
{gi=ny[which(ny$group==i),]
lines(gi$long,gi$lat)
myglyff(gi)
}
Also, there may be (probably are) more elegant ways of doing this in the base graphics.
As, you can see, there are quite a few problems with this that need to be solved. A fill color for the counties. The pie charts tend to be too small or overlap. The lat and long do not take a projection so sizes of counties are distorted.
In any event, I am interested in what others can come up with.
I've written some code to do this using grid graphics. There is an example here: https://qdrsite.wordpress.com/2016/06/26/pies-on-a-map/
The goal here was to associate the pie charts with specific points on the map, and not necessarily regions. For this particular solution, it is necessary to convert the map coordinates (latitude and longitude) to a (0,1) scale so they can be plotted in the proper locations on the map. The grid package is used to print to the viewport that contains the plot panel.
Code:
# Pies On A Map
# Demonstration script
# By QDR
# Uses NLCD land cover data for different sites in the National Ecological Observatory Network.
# Each site consists of a number of different plots, and each plot has its own land cover classification.
# On a US map, plot a pie chart at the location of each site with the proportion of plots at that site within each land cover class.
# For this demo script, I've hard coded in the color scale, and included the data as a CSV linked from dropbox.
# Custom color scale (taken from the official NLCD legend)
nlcdcolors <- structure(c("#7F7F7F", "#FFB3CC", "#00B200", "#00FFFF", "#006600", "#E5CC99", "#00B2B2", "#FFFF00", "#B2B200", "#80FFCC"), .Names = c("unknown", "cultivatedCrops", "deciduousForest", "emergentHerbaceousWetlands", "evergreenForest", "grasslandHerbaceous", "mixedForest", "pastureHay", "shrubScrub", "woodyWetlands"))
# NLCD data for the NEON plots
nlcdtable_long <- read.csv(file='https://www.dropbox.com/s/x95p4dvoegfspax/demo_nlcdneon.csv?raw=1', row.names=NULL, stringsAsFactors=FALSE)
library(ggplot2)
library(plyr)
library(grid)
# Create a blank state map. The geom_tile() is included because it allows a legend for all the pie charts to be printed, although it does not
statemap <- ggplot(nlcdtable_long, aes(decimalLongitude,decimalLatitude,fill=nlcdClass)) +
geom_tile() +
borders('state', fill='beige') + coord_map() +
scale_x_continuous(limits=c(-125,-65), expand=c(0,0), name = 'Longitude') +
scale_y_continuous(limits=c(25, 50), expand=c(0,0), name = 'Latitude') +
scale_fill_manual(values = nlcdcolors, name = 'NLCD Classification')
# Create a list of ggplot objects. Each one is the pie chart for each site with all labels removed.
pies <- dlply(nlcdtable_long, .(siteID), function(z)
ggplot(z, aes(x=factor(1), y=prop_plots, fill=nlcdClass)) +
geom_bar(stat='identity', width=1) +
coord_polar(theta='y') +
scale_fill_manual(values = nlcdcolors) +
theme(axis.line=element_blank(),
axis.text.x=element_blank(),
axis.text.y=element_blank(),
axis.ticks=element_blank(),
axis.title.x=element_blank(),
axis.title.y=element_blank(),
legend.position="none",
panel.background=element_blank(),
panel.border=element_blank(),
panel.grid.major=element_blank(),
panel.grid.minor=element_blank(),
plot.background=element_blank()))
# Use the latitude and longitude maxima and minima from the map to calculate the coordinates of each site location on a scale of 0 to 1, within the map panel.
piecoords <- ddply(nlcdtable_long, .(siteID), function(x) with(x, data.frame(
siteID = siteID[1],
x = (decimalLongitude[1]+125)/60,
y = (decimalLatitude[1]-25)/25
)))
# Print the state map.
statemap
# Use a function from the grid package to move into the viewport that contains the plot panel, so that we can plot the individual pies in their correct locations on the map.
downViewport('panel.3-4-3-4')
# Here is the fun part: loop through the pies list. At each iteration, print the ggplot object at the correct location on the viewport. The y coordinate is shifted by half the height of the pie (set at 10% of the height of the map) so that the pie will be centered at the correct coordinate.
for (i in 1:length(pies))
print(pies[[i]], vp=dataViewport(xData=c(-125,-65), yData=c(25,50), clip='off',xscale = c(-125,-65), yscale=c(25,50), x=piecoords$x[i], y=piecoords$y[i]-.06, height=.12, width=.12))
The result looks like this:
I stumbled upon what looks like a function to do this: "add.pie" in the "mapplots" package.
The example from the package is below.
plot(NA,NA, xlim=c(-1,1), ylim=c(-1,1) )
add.pie(z=rpois(6,10), x=-0.5, y=0.5, radius=0.5)
add.pie(z=rpois(4,10), x=0.5, y=-0.5, radius=0.3)
A slight variation on the OP's original requirements, but this seems like an appropriate answer/update.
If you want an interactive Google Map, as of googleway v2.6.0 you can add charts inside info_windows of map layers.
see ?googleway::google_charts for documentation and examples
library(googleway)
set_key("GOOGLE_MAP_KEY")
## create some dummy chart data
markerCharts <- data.frame(stop_id = rep(tram_stops$stop_id, each = 3))
markerCharts$variable <- c("yes", "no", "maybe")
markerCharts$value <- sample(1:10, size = nrow(markerCharts), replace = T)
chartList <- list(
data = markerCharts
, type = 'pie'
, options = list(
title = "my pie"
, is3D = TRUE
, height = 240
, width = 240
, colors = c('#440154', '#21908C', '#FDE725')
)
)
google_map() %>%
add_markers(
data = tram_stops
, id = "stop_id"
, info_window = chartList
)
I have a task using GGplot that I'm stuck with.
Im asked to horizontally plot the percentage of income provenance per individual with the following data.frame :
PersonalID <- c(1,2,3)
Stock.Return <- c(20,10,70)
Salary <- c(90,25,40)
Bond.Return <- c(16,10,7)
MyDat <- data.frame(PersonalID, Stock.Return, Salary, Bond.Return)
And it should look like this
As i understand i'm supposed to use facet_wrap , I have X and Y but no fill so i'm a bit lost.
Thank you
Try this approach. The key in ggplot2 to get the maximum power of its functions is first reshaping to long and after that you can design the plot. Here the code for the plot, using tidyverse functions for data process and ggplot2:
library(tidyverse)
#Code
MyDat %>% pivot_longer(-PersonalID) %>%
ggplot(aes(x=factor(PersonalID),y=value,fill=name,group=name))+
geom_bar(stat='identity',color='black',position='fill')+
scale_y_continuous(labels = scales::percent)+
theme_bw()+
theme(legend.position = 'bottom',
plot.title = element_text(hjust=0.5))+
coord_flip()+
xlab('PersonalID')+ylab('Percentage')+
ggtitle('Income distribution per person')
Output:
Trying to use ggplotly to graph time series data with a vertical line to indicate dates of interest.
Call fails with Error in Ops.Date(z[[xy]], 86400000) : * not defined for "Date" objects. I have tried unsuccessfully using both the latest CRAN and development versions of ggplot2 (as per plotly recommendation). Other SO questions (e.g., ggplotly and geom_bar when using dates - latest version of plotly (4.7.0)) do not address my concerns.
As illustrated below with plot object p - both ggplot and ggplotly work as expected. However, when a geom_vline() is added to the plot in p2, it only works correctly in ggplot, failing when calling ggplotly(p2).
library(plotly)
library(ggplot2)
library(magrittr)
set.seed(1)
df <- data.frame(date = seq(from = lubridate::ymd("2019-01-01"), by = 1, length.out = 10),
y = rnorm(10))
p <- df %>%
ggplot(aes(x = date, y = y)) +
geom_line()
p ## plots as expected
ggplotly(p) ## plots as expected
p2 <- p + geom_vline(xintercept = lubridate::ymd("2019-01-08"), linetype = "dashed")
p2 ## plots as expected
ggplotly(p2) ##fails
I just solved this using #Axeman's suggestion. In your case, you can just replace the date:
lubridate::ymd("2019-01-01")
becomes
as.numeric(lubridate::ymd("2019-01-01"))
Not pretty, but it works.
For future reference:
The pop-up window for vertical lines created via date (or POSIX*) to numeric conversions is rather blank. This is particularly valid for POSIX* applications where the exact time can often not be read off directly.
In case you need more significant pop-up content, the definition of a text aesthetic could be helpful (just ignore the 'unknown aesthetics' warning as it doesn't seem to apply). Then, simply specify what you want to see during mouse hover via the tooltip argument, ie. rule out xintercept, and you're all set.
p2 = p +
geom_vline(
aes(
xintercept = as.numeric(lubridate::ymd("2019-01-08"))
, text = "date: 2019-01-08"
)
, linetype = "dashed"
)
ggplotly(p2, tooltip = c("x", "y", "text"))
I'm trying to create a forest plot with R plotly where I want to color code the effect sizes (points) and their error bars by their corresponding p-values.
Here are toy data:
set.seed(1)
factors <- paste0(1:25,":age")
effect.sizes <- rnorm(25,0,1)
effect.errors <- abs(rnorm(25,0,1))
p.values <- runif(25,0,1)
Here's what I'm trying:
library(dplyr)
plotly::plot_ly(type='scatter',mode="markers",y=~factors,x=~effect.sizes,color=~p.values,colors=grDevices::colorRamp(c("darkred","gray"))) %>%
plotly::add_trace(error_x=list(array=effect.errors),marker=list(color=~p.values,colors=grDevices::colorRamp(c("darkred","gray")))) %>%
plotly::colorbar(limits=c(0,1),len=0.4,title="P-Value") %>%
plotly::layout(xaxis=list(title="Effect Size",zeroline=T,showticklabels=T),yaxis=list(title="Factor",zeroline=F,showticklabels=T))
which gives me:
Which is pretty close to what I want except for:
I'd like the error bars to be colored similar to the effect sizes (by the corresponding p-values).
Remove the two trace legends below the colorbar
Have the order of the labels on the y-axis be that of factors
Any idea?
Okay it took me a while to warm up my plotly skills. Since your first point was the most difficult, I will go reversely through your points.
That can be achied by manipulating the layout using categoryorder
and categoryarray in the yaxis-list (cf. motos answer here)
Set showlegend=FALSE
That was tricky. I had to move your second line (the error bars) in the first. Added a color vector to it. Put it in the plot_ly-function. Used split to allow the correct coloring by group. Added the color for the points in a marker-list. In additon I converted the p.values via the colorRamp to hex-because every simpler solution didn't work for me.
Looks like this:
The code (the colorbar created some issues):
### Set category order
yform <- list(categoryorder = "array",
categoryarray = rev(factors),
title="Factor",zeroline=F,showticklabels=T)
### set the color scale and convert it to hex
library(grDevices)
mycramp<-colorRamp(c("darkred","gray"))
mycolors<-rgb(mycramp(p.values),maxColorValue = 255)
### plot without the adjusted colorbar
library(plotly)
### Without colorbar adjustment
plot_ly(type='scatter',mode="markers",y=~factors,x=~effect.sizes,
color=~p.values,colors=grDevices::colorRamp(c("darkred","gray")),
error_x=list(array=effect.errors,color=mycolors),split=factors,showlegend=FALSE,marker=list(color=mycolors)) %>%
layout(xaxis=list(title="Effect Size",zeroline=T,showticklabels=T),yaxis=yform)
### The colorbar-adjustment kicks out the original colors of the scatter points. Either you plot them over
plot_ly(type='scatter',mode="markers",y=~factors,x=~effect.sizes,
color=~p.values,colors=grDevices::colorRamp(c("darkred","gray")),
error_x=list(array=effect.errors,color=mycolors),split=factors,showlegend=FALSE,marker=list(color=mycolors)) %>%
layout(xaxis=list(title="Effect Size",zeroline=T,showticklabels=T),yaxis=yform) %>%
colorbar(limits=c(0,1),len=0.4,title="P-Value",inherit=FALSE) %>%
add_trace(type='scatter',mode="markers",y=~factors,x=~effect.sizes,
showlegend=FALSE,marker=list(color=mycolors),inherit=FALSE) %>%
layout(xaxis=list(title="Effect Size",zeroline=T,showticklabels=T),yaxis=yform)
### or you try to set the colorbar before the plot. This results in some warnings
plot_ly() %>%
colorbar(limits=c(0,1),len=0.4,title="P-Value",inherit=FALSE) %>%
add_trace(type='scatter',mode="markers",y=~factors,x=~effect.sizes,
color=~p.values,colors=grDevices::colorRamp(c("darkred","gray")),
error_x=list(array=effect.errors,color=mycolors),split=factors,showlegend=FALSE,marker=list(color=mycolors)) %>%
layout(xaxis=list(title="Effect Size",zeroline=T,showticklabels=T),yaxis=yform)
Just odd that this first point was so difficult to solve and results in such a big code bracket, because normally plotly supports that pipe logic quite well and you get a very readable code with all the add-functions.
I expected e.g., some add_errorbar-function, but apparently you have to add the errorbars in the plot_ly-function and the color-vector for the errors only works if you use the split-function. If someone would like to comment or post an alternative answer with more readable code on this, that would be interesting.
Here is an idea by constructing first a ggplot2 graph and using ggplotly:
create a data frame :
df <- data.frame(factors = factor(factors, levels = factors), #note the order of the levels which determines the order of the y axes
effect.sizes = effect.sizes,
effect.errors = effect.errors,
p.values = p.values)
create the ggplot graph:
library(ggplot2)
library(plotly)
ggplot(df)+
geom_vline(xintercept = 0, color = "grey50") +
geom_point(aes(y = factors,
x = effect.sizes,
color = p.values)) +
geom_errorbarh(aes(y = factors,
xmin = effect.sizes - effect.errors,
xmax = effect.sizes + effect.errors,
x = effect.sizes,
color = p.values)) +
scale_color_continuous(low = "darkred", high = "gray")+
theme_bw() +
xlab("Effect Sizes")+
ylab("Factors") +
theme(panel.border = element_blank(),
plot.margin = margin(1, 1, 1, 1, "cm")) -> p1
ggplotly(p1)
data:
set.seed(1)
factors <- paste0(1:25,":age")
effect.sizes <- rnorm(25,0,1)
effect.errors <- abs(rnorm(25,0,1))
p.values <- runif(25,0,1)
I'm trying to make a Gantt chart in ggplot based on the generous code offered by user Didzis Elferts. I'm trying to add a vertical line showing today's date, but the geom_vline layer in the ggplot2 package simply returns Error: Discrete value supplied to continuous scale. Here is my code:
today <- as.Date(Sys.Date(), "%m/%d/%Y")
library(scales)
ggplot(mdfr, aes(time,name, colour = is.critical)) +
geom_line(size = 6) +
xlab("") + ylab("")+
labs(title="Sample Project Progress")+
theme_bw()+
scale_x_datetime(breaks=date_breaks("1 year"))+
geom_vline(aes(xintercept=today))
The plot without the geom_vline command looks like this :
Any reason why geom_vline wouldn't work for the "Date" character?
EDIT: Reproducible code used to generate plot:
### GANTT CHART 1 ###############3
tasks <- c("Meetings", "Client Calls", "Design", "Bidding", "Construction")
dfr <- data.frame(
name = factor(tasks, levels = tasks),
start.date = c("07/08/2013", "07/08/2013", "07/23/2013", "08/30/2013", "9/30/2013"),
end.date = c("07/12/2013", "07/13/2013", "08/15/2013", "09/12/2013", "12/01/2013"),
is.critical = c(TRUE, FALSE, TRUE, TRUE, TRUE))
mdfr <- melt(dfr, measure.vars = c("start.date", "end.date"))
mdfr$time <- as.POSIXct(strptime(mdfr$value,"%m/%d/%Y"))
There are two thinks you need to change in your code.
First, as for making the time column in mdfr you use as.POSIXct() the same should be done with today - both variables should have the same format.
today <- as.POSIXct(Sys.Date(), "%m/%d/%Y")
Second, use as.numeric() inside the geom_vline() around today.
+ geom_vline(aes(xintercept=as.numeric(today)))