Bar/Pie Chart Label from Data Frame Column - r

I am making a pie chart and want to label it with the value for each slice. I have the information in a data frame but the column in which to look should be defined in the function call.
The code is (decently) long, but I think only one line needs to be changed. I have tried mainsym, as.symbol, as.name, quote, and anything else I could think of to throw at it, but to no avail.
Thanks
library(dplyr)
library(ggplot2)
library(gridExtra)
pie_chart <- function(df, main, labels, labels_title=NULL) {
mainsym <- as.symbol(main)
labelssym <- as.symbol(labels)
# convert the data into percentages. add label position and inner label text
df <- df %>%
mutate(perc = (!!mainsym) / sum(!!mainsym)) %>% # unquote the symbol so dplyr evaluates the column
mutate(label_pos = 1 - cumsum(perc) + perc / 2,
inner_label_text = paste0(round(perc * 100), "%\n",main)) #NEED HELP HERE! Replace 'main' with something
#debug print statement
print(df)
# reorder the category factor levels to order the legend
df[[labels]] <- factor(df[[labels]], levels = unique(df[[labels]]))
p <- ggplot(data = df, aes_(x = factor(1), y = ~perc, fill = labelssym)) +
# make stacked bar chart with black border
geom_bar(stat = "identity", color = "black", width = 1) +
# add the percents and values to the interior of the chart
geom_text(aes(x = 1.25, y = label_pos, label = inner_label_text), size = 4) +
# convert to polar coordinates
coord_polar(theta = "y",direction=-1)
return(p)
}
set.seed(42)
donations <- data.frame(donation_total=sample(1:1E5,50,replace=TRUE))
donation_size_levels_same <- seq(0,2E6,10E3)
donations$bracket <- cut(donations$donation_total,breaks=donation_size_levels_same,right=FALSE,dig.lab = 50)
donations.by_bracket <- donations %>%
group_by(bracket) %>%
summarize(n=n(),total=sum(donation_total)) %>%
ungroup() %>%
arrange(bracket)
grid.arrange(
pie_chart(df=donations.by_bracket,main="n",labels="bracket",labels_title="Total Amount Donated"),
pie_chart(df=donations.by_bracket,main="total",labels="bracket",labels_title="Total Amount Donated"))

The label placement still needs some adjustment, but this seems to address the labelling issue. Replace the line marked "NEED HELP HERE" with the following, which looks the column up by name with df[[main]]:
mutate(label_pos = 1 - cumsum(perc) + perc / 2,
inner_label_text = paste0(round(perc * 100), "%\n",as.character(df[[main]])))
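A tidy-eval alternative, offered as a sketch rather than as the answerer's method: inside mutate(), dplyr's .data pronoun looks a column up by its string name, so the label text can be built without referring back to df. The helper name add_slice_labels and the toy data frame are made up for illustration.

```r
library(dplyr)

# Hypothetical helper: build slice labels from a column named by a string.
add_slice_labels <- function(df, main) {
  df %>%
    mutate(perc = .data[[main]] / sum(.data[[main]]),
           label_pos = 1 - cumsum(perc) + perc / 2,
           # percentage on the first line, raw value on the second
           inner_label_text = paste0(round(perc * 100), "%\n", .data[[main]]))
}

# Toy data (made up for illustration)
toy <- data.frame(bracket = c("a", "b", "c"), n = c(1, 1, 2))
labelled <- add_slice_labels(toy, "n")
print(labelled$inner_label_text)
```

Because .data refers to the data frame currently flowing through the pipe, this keeps working if rows are filtered or reordered earlier in the chain, whereas df[[main]] always reads from the original argument.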

Related

Show other data points when using ggiraph in R?

I am using ggiraph to make an interactive plot in R. My data is grouped and what I'm hoping to do is plot just the mean value of the group but when I hover over that point in the plot, the other points appear. Hopefully, my example below will explain what I mean.
To begin I create some data and make a basic plot:
library(ggplot2)
library(ggiraph)
# create some data
dat1 <- data.frame(X=rnorm(21),
Y=rnorm(21),
groupID=rep(1,21))
dat2 <- data.frame(X=rnorm(21,5),
Y=rnorm(21,5),
groupID=rep(2,21))
dat3 <- data.frame(X=rnorm(21,10),
Y=rnorm(21,10),
groupID=rep(3,21))
ggdat <- rbind(dat1,dat2,dat3)
ggdat$groupID <- as.factor(ggdat$groupID)
# create a plot
ggplot(ggdat, aes(X,Y)) +
geom_point(aes(color = groupID)) +
theme(legend.position = 'none')
We can see the 3 different groups in the above plot.
Then I find the mean value of each group and plot that. In the example plot below, I'm also plotting all the points with a low alpha value and the mean point in black.
library(dplyr)
# create mean data frame
dfMean <- ggdat %>%
group_by(groupID) %>%
dplyr::summarize(mX = mean(X), mY = mean(Y))
gg_scatter <- ggplot(dfMean, aes(mX, mY, tooltip = groupID, data_id = groupID)) +
geom_point(data = ggdat, aes(X,Y), alpha = 0.1, color = ggdat$groupID) +
theme(legend.position = 'none') +
geom_point_interactive()
gg_scatter
What I'm hoping for is that when I hover over one of the black points, the alpha value for that group changes to, say, alpha = 1, so that all the points for that group are shown.
Naively I just tried:
girafe(ggobj = gg_scatter,
options = list(
opts_hover_inv(css = "opacity:0.5;"),
opts_hover(css = "fill:red;")
) )
but this just highlights the mean point that I'm hovering over and changes the alpha of the other mean points.
Is there a way to hover over the mean value point, which changes the alpha for that particular group?
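I am not sure this fully answers the question, but I hope it helps.
In your code, you did not use geom_point_interactive() when plotting the raw points, so they cannot be interactive. In the version below they are drawn with geom_point_interactive() and inherit data_id = groupID from ggplot(), so hovering over any element highlights everything that shares its group.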
library(ggplot2)
library(ggiraph)
# create some data
dat1 <- data.frame(X=rnorm(21),
Y=rnorm(21),
groupID=rep(1,21))
dat2 <- data.frame(X=rnorm(21,5),
Y=rnorm(21,5),
groupID=rep(2,21))
dat3 <- data.frame(X=rnorm(21,10),
Y=rnorm(21,10),
groupID=rep(3,21))
ggdat <- rbind(dat1,dat2,dat3)
ggdat$groupID <- as.factor(ggdat$groupID)
library(dplyr)
# create mean data frame
dfMean <- ggdat %>%
group_by(groupID) %>%
dplyr::summarize(mX = mean(X), mY = mean(Y))
gg_scatter <- ggplot(dfMean, aes(mX, mY, tooltip = groupID, data_id = groupID)) +
geom_point_interactive(data = ggdat, aes(X,Y, color = groupID), alpha = 0.9) +
theme(legend.position = 'none') +
geom_point_interactive()
gg_scatter
girafe(ggobj = gg_scatter,
options = list(
opts_hover_inv(css = "opacity:0.1;"),
opts_hover(css = "fill:red;")
) )

Visualizing stacked bar chart in the format of Jonathan A. Schwabish (JEP 2014)

I am trying to plot the following data (df_input) in the format of a stacked bar graph where we can also see the change over time by line. Any idea how to do it?
df_input <- data.frame( Year= c(2010,2010,2010,2010,2020,2020,2020,2020), village= c("A","B","C","D","A","B","C","D"), share = c(40,30,20,10,30,30,25,15))
df_input_2 <- data.frame( Year= c(2010,2010,2010,2010,2015,2015,2015,2015,2020,2020,2020,2020), village= c("A","B","C","D","A","B","C","D","A","B","C","D"), share = c(40,30,20,10,30,30,25,15,20,10,30,40))
One option to achieve that would be via a geom_col and a geom_line. For the geom_line you have to group by the variable mapped on fill, set position to "stack" and adjust the start/end positions to account for the widths of the bars. Additionally you have to manually set the orientation for the geom_line to y:
library(ggplot2)
width <- .6 # Bar width
ggplot(df_input, aes(share, factor(Year), fill = village)) +
geom_col(width = width) +
geom_line(aes(x = share,
y = as.numeric(factor(Year)) + ifelse(Year == 2020, -width / 2, width / 2),
group = village), position = "stack", orientation = "y")
EDIT: With more than two years, things get a bit trickier. In that case I would switch to geom_segment. Additionally, we have to do some data wrangling to prepare the data for use with geom_segment:
library(ggplot2)
library(dplyr)
# Example data with three years
df_input_2 <- data.frame( Year= c(2010,2010,2010,2010,2015,2015,2015,2015,2020,2020,2020,2020), village= c("A","B","C","D","A","B","C","D","A","B","C","D"), share = c(40,30,20,10,30,30,25,15,20,10,30,40))
width = .6
# Data wrangling
df_input_2 <- df_input_2 %>%
group_by(Year) %>%
arrange(desc(village)) %>%
mutate(share_cum = cumsum(share)) %>%
group_by(village) %>%
arrange(Year) %>%
mutate(Year = factor(Year),
Year_lead = lead(Year), share_cum_lead = lead(share_cum))
ggplot(df_input_2, aes(share, factor(Year), fill = village)) +
geom_col(width = width) +
geom_segment(aes(x = share_cum, xend = share_cum_lead, y = as.numeric(Year) + width / 2, yend = as.numeric(Year_lead) - width / 2, group = village))
#> Warning: Removed 4 rows containing missing values (geom_segment).
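To see what the wrangling step produces, here is a minimal sketch on a made-up two-village data set (the toy values and the name segments are assumptions for illustration). It mirrors the pipeline above: cumulative shares within each year give the boundary of each stacked block, and lead() fetches the same boundary in the following year so a segment can connect the two.

```r
library(dplyr)

# Toy data (made up): two villages, two years.
toy <- data.frame(Year = c(2010, 2010, 2020, 2020),
                  village = c("A", "B", "A", "B"),
                  share = c(60, 40, 30, 70))

segments <- toy %>%
  # By default geom_col() stacks in reverse group order, so the last fill
  # level sits closest to zero; accumulate in descending village order.
  group_by(Year) %>%
  arrange(desc(village)) %>%
  mutate(share_cum = cumsum(share)) %>%
  # For each village, look ahead to its boundary in the next year.
  group_by(village) %>%
  arrange(Year) %>%
  mutate(share_cum_lead = lead(share_cum)) %>%
  ungroup()

print(segments)
```

The rows for the last year have no following year, so their lead values are NA. With four villages that gives four such rows, which is where the "Removed 4 rows" warning above comes from.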

Move chart labels of variables in opposite directions

I couldn't find out how to do this anywhere so I thought I would post the solution now that I've figured it out.
I created a simple chart with labels based on a data set in long format (see below for dat). There are two lines and the labels overlap. I would like to move the labels for the upper line up and those for the lower line down.
library(dplyr)
library(ggplot2)
library(tidyr)
# sample data
dat <- data.frame(
x = seq(1, 10, length.out = 10),
y1 = seq(1, 5, length.out = 10),
y2 = seq(1, 6, length.out = 10))
# convert to long format
dat <- dat %>%
gather(var, value, -x)
# plot it
ggplot(data = dat, aes(x = x, y = value, color = var)) +
geom_line() +
geom_label(aes(label = value))
To move the labels in opposite directions, pass nudge_y a vector built with ifelse(): +1 times a nudge factor for the upper line's labels and -1 times the nudge factor for the lower line's labels:
# move labels in opposite directions
ggplot(data = dat, aes(x = x, y = value, color = var)) +
geom_line() +
geom_label(aes(label = value),
nudge_y = ifelse(dat$var == "y2", 1, -1) * 1)
This produces the following chart with adjusted labels.
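As a small illustration of what nudge_y receives, the sketch below builds the nudge vector on its own. The names var and nudge_factor and the toy values are assumptions for illustration; the vector must have one entry per data row, in the same row order as the data passed to ggplot().

```r
# Toy grouping column (made up): two rows per line
var <- c("y1", "y1", "y2", "y2")

# Size of the shift, in data units (hypothetical value)
nudge_factor <- 0.5

# +nudge_factor for the upper line (y2), -nudge_factor for the lower line (y1)
nudge <- ifelse(var == "y2", 1, -1) * nudge_factor
print(nudge)
```

An alternative that avoids referring to dat$var outside aes() is to shift the label position inside the mapping, for example aes(y = value + ifelse(var == "y2", 1, -1) * nudge_factor) in geom_label().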

R ggplot facet_wrap y ticks on different sides

For some reason, I have to make a plot that looks more or less like this:
For this I use the following code:
library(ggplot2)
library(tidyverse)
set.seed(10)
df<-data.frame(Meas = runif(1000,0,10),
Prop1 = sample(x = LETTERS[1:3],1000,replace=TRUE),
Prop2 = sample(x = letters[1:5],1000,replace=TRUE),
Prop3 = sample(x=c("monkey","donkey","flipper"),1000,replace=TRUE))%>%
gather(Prop,Propvalue,-Meas)
ggplot(df,aes(x = Propvalue,y=Meas))+
geom_boxplot()+
facet_wrap(~Prop,ncol=2,scales="free_y")+
coord_flip()
I believe this would look better if the y-ticks on the right graph would appear on the right (for the graphs on the left, the y-ticks should remain where they are, but flipper and donkey should appear on the right side to avoid the gap between the left and right panels), but I can't find a way to do this.
Here's a hack that utilises ggplot's sec.axis argument, which creates a secondary axis opposite the primary axis and has to be a one-to-one mapping of it. I call this a hack because it works only for continuous axes, so we need to map the categorical Propvalue to numeric values.
Note: I assumed in this example that you want all odd numbered PropX facets' labels on the left, & even numbered PropX facets' labels on the right. You can also tweak the options for other variations.
library(ggplot2)
library(tidyverse)
# generate data
set.seed(10)
df<-data.frame(Meas = runif(1000,0,10),
Prop1 = sample(x = LETTERS[1:3],1000,replace=TRUE),
Prop2 = sample(x=c("monkey","donkey","flipper"),1000,replace=TRUE),
Prop3 = sample(x = letters[1:5],1000,replace=TRUE))%>%
gather(Prop,Propvalue,-Meas)
# map Propvalue to integers, primary axis contents, & secondary axis contents.
df2 <- df %>%
mutate(Propvalue.int = as.integer(factor(Propvalue,
levels = df %>% select(Prop, Propvalue) %>%
arrange(Prop, Propvalue) %>% unique() %>%
select(Propvalue) %>% unlist())),
facet.column = ifelse(as.integer(str_extract(Prop, "[0-9]")) %% 2 == 0, 2, 1),
Propvalue.left = ifelse(facet.column == 1, Propvalue, ""),
Propvalue.right = ifelse(facet.column == 2, Propvalue, ""))
# create mapping table
integer2factor <- df2 %>%
select(Propvalue.int, Propvalue.left, Propvalue.right) %>%
unique() %>% arrange(Propvalue.int)
ggplot(df2,aes(x = Propvalue.int, y=Meas,
group = Propvalue.int))+
geom_boxplot() +
scale_x_continuous(breaks = integer2factor$Propvalue.int,
labels = integer2factor$Propvalue.left,
name = "Propvalue",
sec.axis = dup_axis(breaks = integer2factor$Propvalue.int,
labels = integer2factor$Propvalue.right,
name = "")) +
facet_wrap(~Prop,ncol=2,scales="free")+
coord_flip() +
theme(axis.ticks.y = element_blank())
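The core of the sec.axis hack above is the mapping from category labels to integer positions, plus a lookup table to turn the positions back into labels. Here is that step in isolation, as a base R sketch on made-up values (prop_value, codes, and lookup are illustrative names, not part of the answer):

```r
# Toy categorical values (made up)
prop_value <- c("donkey", "flipper", "monkey", "donkey")

# Map each category to an integer position via its factor level
codes <- as.integer(factor(prop_value, levels = sort(unique(prop_value))))

# Lookup table: one row per integer position with its label
lookup <- unique(data.frame(code = codes, label = prop_value,
                            stringsAsFactors = FALSE))
lookup <- lookup[order(lookup$code), ]
print(lookup)
```

The code column would play the role of the breaks and the label column the role of the labels on a continuous scale.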
I believe this will do the trick.
library(ggplot2)
library(tidyverse)
library(tidyr)
set.seed(10)
df <-data.frame(Meas = runif(1000,0,10),
Prop1 = sample(x = LETTERS[1:3],1000,replace=TRUE),
Prop2 = sample(x = letters[1:5],1000,replace=TRUE),
Prop3 = sample(x=c("monkey","donkey","flipper"),1000,replace=TRUE))%>%
gather(Prop,Propvalue,-Meas)
ggplot(df,aes(x = Propvalue,y=Meas))+
geom_boxplot()+
facet_wrap(~Prop,ncol=2,scales="free_y")+
coord_flip()
p.list = lapply(sort(unique(df$Prop)), function(i) { # i <- "Prop1"
ggplot(df[df$Prop==i,],aes(x = Propvalue, y=Meas))+
geom_boxplot()+
facet_wrap(~Prop,scales="free_y")+
coord_flip()
})
p.list[[2]] <- p.list[[2]] + scale_x_discrete(position = "top")
library(gridExtra)
do.call(grid.arrange, c(p.list, nrow=2))

How to prevent two labels to overlap in a barchart?

The image below shows a chart that I created with the code below. I highlighted the missing or overlapping labels. Is there a way to tell ggplot2 to not overlap labels?
week = c(0, 1, 1, 1, 1, 2, 2, 3, 4, 5)
statuses = c('Shipped', 'Shipped', 'Shipped', 'Shipped', 'Not-Shipped', 'Shipped', 'Shipped', 'Shipped', 'Not-Shipped', 'Shipped')
dat <- data.frame(Week = week, Status = statuses)
p <- qplot(factor(Week), data = dat, geom = "bar", fill = factor(Status))
p <- p + geom_bar()
# Below is the most important line, that's the one which displays the value
p <- p + stat_bin(aes(label = ..count..), geom = "text", vjust = -1, size = 3)
p
You can use a variant of the well-known population pyramid.
Some sample data (code inspired by Didzis Elferts' answer):
set.seed(654)
week <- sample(0:9, 3000, rep=TRUE, prob = rchisq(10, df = 3))
status <- factor(rbinom(3000, 1, 0.15), labels = c("Shipped", "Not-Shipped"))
data.df <- data.frame(Week = week, Status = status)
Compute count scores for each week, then convert one category to negative values:
library("plyr")
plot.df <- ddply(data.df, .(Week, Status), nrow)
plot.df$V1 <- ifelse(plot.df$Status == "Shipped",
plot.df$V1, -plot.df$V1)
Draw the plot. Note that the y-axis labels are adapted to show positive values on either side of the baseline.
library("ggplot2")
ggplot(plot.df) +
aes(x = as.factor(Week), y = V1, fill = Status) +
geom_bar(stat = "identity", position = "identity") +
scale_y_continuous(breaks = 100 * -1:5,
labels = 100 * c(1, 0:5)) +
geom_text(aes(y = sign(V1) * max(V1) / 30, label = abs(V1)))
The plot:
For production purposes you'd need to determine the appropriate y-axis tick labels dynamically.
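One way to derive the tick labels dynamically, sketched here under the assumption that the signed counts are available as a numeric vector (v1 below is made up): base R's pretty() chooses round break values covering the data range, and abs() turns them into positive labels for both sides of the baseline.

```r
# Made-up signed counts: negative for one category, positive for the other
v1 <- c(-120, 480, -35, 260)

# pretty() picks round break values that cover the data range
breaks <- pretty(range(v1))

# Show magnitudes so both sides of the baseline read as positive counts
labels <- abs(breaks)

print(data.frame(breaks, labels))
```

These could then be passed as scale_y_continuous(breaks = breaks, labels = labels). Passing labels = abs on its own is another option, since the labels argument also accepts a function.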
I made new sample data (inspired by @agstudy's code).
week <- sample(0:5,1000,rep=TRUE,prob=c(0.2,0.05,0.15,0.5,0.03,0.1))
statuses <- gl(2,1000,labels=c('Not-Shipped', 'Shipped'))
dat <- data.frame(Week = week, Status = statuses)
Using ddply() from the plyr package, I made a new data frame, text.df, for the labels. The column count contains the number of observations in each combination of Week and Status. I then added a column ypos that contains the cumulative sum of count within each Week, plus 15; this is used as the y position. For Not-Shipped, ypos is replaced with -10.
library(plyr)
text.df<-ddply(dat,.(Week,Status),function(x) data.frame(count=nrow(x)))
text.df<-ddply(text.df,.(Week),transform,ypos=cumsum(count)+15)
text.df$ypos[text.df$Status=="Not-Shipped"]<- -10
Now the labels are plotted with geom_text() using the new data frame.
ggplot(dat,aes(as.factor(Week),fill=Status))+geom_bar()+
geom_text(data=text.df,aes(x=as.factor(Week),y=ypos,label=count))
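For readers without plyr, the same label positions can be computed in base R. This is a swapped-in technique, not the answerer's code, shown on a tiny made-up data set; text_df is an illustrative name. Note that table() also keeps combinations with a count of zero, which ddply() would drop.

```r
# Toy data (made up for illustration)
dat <- data.frame(
  Week = c(0, 0, 0, 1, 1),
  Status = factor(c("Shipped", "Shipped", "Not-Shipped", "Shipped", "Not-Shipped"),
                  levels = c("Not-Shipped", "Shipped"))
)

# Count observations per Week/Status combination
text_df <- as.data.frame(table(Week = dat$Week, Status = dat$Status),
                         responseName = "count")

# Cumulative count within each Week, plus an offset of 15 above the bar
text_df$ypos <- ave(text_df$count, text_df$Week, FUN = cumsum) + 15

# Put the Not-Shipped labels below the baseline
text_df$ypos[text_df$Status == "Not-Shipped"] <- -10

print(text_df)
```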
One way to avoid overlaps is to dodge the positions of both the bars and the text. To avoid missing labels, you can set ylim. Here is an example.
## I create some more realistic data similar to your picture
week <- sample(0:5,1000,rep=TRUE)
statuses <- gl(2,1000,labels=c('Not-Shipped', 'Shipped'))
dat <- data.frame(Week = week, Status = statuses)
## for dodging
dodgewidth <- position_dodge(width=0.9)
## get max y to set ylim
ymax <- max(table(dat$Week,dat$Status))+20
ggplot(dat,aes(x = factor(Week),fill = factor(Status))) +
geom_bar( position = dodgewidth ) +
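# stat_count() handles a discrete x; stat_bin() requires a continuous x in current ggplot2
stat_count(geom="text", position= dodgewidth, aes( label=after_stat(count)),
vjust=-1,size=5)+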
ylim(0,ymax)
Based on Didzis's plot, you could also increase readability by keeping the position on the y axis constant and by colouring the text in the same colour as the legend.
library(ggplot2)
week <- sample(0:5,1000,rep=TRUE,prob=c(0.2,0.05,0.15,0.5,0.03,0.1))
statuses <- gl(2,1000,labels=c('Not-Shipped', 'Shipped'))
dat <- data.frame(Week = week, Status = statuses)
library(plyr)
text.df<-ddply(dat,.(Week,Status),function(x) data.frame(count=nrow(x)))
text.df$ypos[text.df$Status=="Not-Shipped"]<- -15
text.df$ypos[text.df$Status=="Shipped"]<- -55
p <- ggplot(dat,aes(as.factor(Week),fill=Status))+geom_bar()+
geom_text(data=text.df,aes(x=as.factor(Week),y=ypos,label=count),colour=ifelse(text.df$Status=="Not-Shipped","#F8766D","#00BFC4"))
