I have an aggregated dataset which looks like this one:
place<-c('PHF','Mobile clinic','pharmacy','PHF','pharmacy','PHF','normal shop','pharmacy')
District<-c('District1','District1','District1','District2','District2','District3','District3','District3')
cat<-c('public','public','private','public','private','public','private','private')
Freq<-c(7,2,5,4,7,5,1,8)
Q14_HH<-data.frame(place,District,cat,Freq)
I create a barplot which looks great:
plot<- ggplot(data = Q14_HH, aes(x=place,y=Freq,fill=cat)) +
geom_bar(stat='identity') +
labs(title="Where do you get your medicines from normally? (human)",
subtitle='Percentage of households, n=30',x="", y="Percentage",fill='Outlet type') +
theme(axis.text.x = element_text(angle = 90, hjust=1,vjust=0.5))
Now I want to put the sum of the frequencies on top of each bar i.e. for each place variable:
plot+ stat_summary(geom='text',aes(label = Freq,group=place),fun=sum)
But for some reason it won't calculate the sum. I get a warning message:
Removed 2 rows containing missing values (geom_text)
Can someone help me understand what is happening here?
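As you are computing the sum, you have to map the computed y value to the label aesthetic. With label = Freq, the labels come from the raw column rather than from the sum computed by stat_summary. Use label = after_stat(y); the older label = ..y.. notation also works but is deprecated since ggplot2 3.4.0: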
library(ggplot2)
plot <- ggplot(data = Q14_HH, aes(x = place, y = Freq, fill = cat)) +
geom_bar(stat = "identity") +
labs(
title = "Where do you get your medicines from normally? (human)",
subtitle = "Percentage of households, n=30", x = "", y = "Percentage", fill = "Outlet type"
) +
theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))
plot + stat_summary(geom = "text", aes(label = after_stat(y), group = place), fun = sum)
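If you would rather not rely on a computed variable, an alternative is to compute the totals yourself and hand them to geom_text as a separate data frame. The following is only a sketch of that idea; the totals data frame and the vjust value are my own choices, not part of the original code.

```r
library(ggplot2)

# Data from the question
Q14_HH <- data.frame(
  place = c("PHF", "Mobile clinic", "pharmacy", "PHF", "pharmacy", "PHF", "normal shop", "pharmacy"),
  District = c("District1", "District1", "District1", "District2", "District2", "District3", "District3", "District3"),
  cat = c("public", "public", "private", "public", "private", "public", "private", "private"),
  Freq = c(7, 2, 5, 4, 7, 5, 1, 8)
)

# One row per place holding the summed frequency (hypothetical helper data frame)
totals <- aggregate(Freq ~ place, data = Q14_HH, FUN = sum)

p <- ggplot(Q14_HH, aes(x = place, y = Freq)) +
  geom_col(aes(fill = cat)) +
  # The text layer reuses x = place and y = Freq, but reads them from totals
  geom_text(data = totals, aes(label = Freq), vjust = -0.5) +
  labs(x = "", y = "Frequency", fill = "Outlet type")
p
```

Because fill is mapped only in geom_col, the text layer does not need a cat column.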
I created a histogram with a number (count) of subjects on the y-axis and I'm trying to find a way to add the ID label of each ID on each bar of the histogram. I tried geom_text and geom_text_repel but I still can't get the number to be organized exactly on each bar for each subject.
For my dataset I'm reading a CSV file with 72 subjects, the columns are ID and Drugratio
ggplot code:
plot_5 <- ggplot(All_data, aes(x=as.numeric(logMRP), fill = as.factor(PATIENTID)))+geom_histogram(aes( bins = 30, label=as.factor(PATIENTID)))+ geom_text( stat='count', aes(label=ID), color="Black", size=3, check_overlap = TRUE, hjust=1, position=position_stack(vjust=0.5 ))+theme(legend.position = "none")
show(plot_5)
Any suggestions?
Thank you
Without your data or code, we can only guess, but it seems your data is something like this:
set.seed(4)
df <- data.frame(ID = factor(1:72), value = 2 - rgamma(72, 3, 2))
And your plotting code is like this:
library(ggplot2)
ggplot(df, aes(value, fill = ID)) +
geom_histogram(bins = 30) +
geom_text(stat = "count", aes(label = ID, y = after_stat(count)),
check_overlap = TRUE) +
guides(fill = guide_none()) +
labs(x = NULL)
This looks very similar to your own plot. To fix it, let's use stat_bin with position = position_stack() for the text layer.
ggplot(df, aes(x = value, fill = ID)) +
geom_histogram(bins = 30) +
stat_bin(geom = "text", bins = 30, na.rm = TRUE,
aes(label = ifelse(after_stat(count) == 0, NA, after_stat(group)),
group = ID, y = after_stat(count)),
position = position_stack(vjust = 0.5)) +
guides(fill = guide_none()) +
labs(x = NULL)
Created on 2022-09-01 with reprex v2.0.2
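One caveat: after_stat(group) is the integer group index, which only coincides with the ID when the IDs are 1, 2, 3, and so on. If your IDs are arbitrary codes, a possible workaround is to look the label up in the factor levels. This is a sketch on mock data; the ID codes and the id_levels helper are assumptions, and it relies on group indices following the order of the factor levels that are actually present.

```r
library(ggplot2)

# Mock data (an assumption): 72 subjects whose IDs are not simply 1..72
set.seed(4)
df <- data.frame(
  ID = factor(sprintf("P%03d", sample(100:999, 72))),
  value = 2 - rgamma(72, 3, 2)
)

# Group indices follow the order of the factor levels that are in use
id_levels <- levels(droplevels(df$ID))

p <- ggplot(df, aes(x = value, fill = ID)) +
  geom_histogram(bins = 30) +
  stat_bin(
    geom = "text", bins = 30, na.rm = TRUE,
    aes(
      # Empty bins get no label; other bins show the real ID, not the index
      label = ifelse(after_stat(count) == 0, NA, id_levels[after_stat(group)]),
      group = ID, y = after_stat(count)
    ),
    position = position_stack(vjust = 0.5)
  ) +
  guides(fill = guide_none()) +
  labs(x = NULL)
p
```

layer_data(p, 2) shows the labels that the text layer ends up with, which is a quick way to check the lookup.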
Following the great suggestions I got from Allan Cameron, I was able to add the ID values for each subject similar to the graphs above
plot_5_new <- ggplot(All_data, aes(x = as.numeric(logMRP), fill = as.factor(ID))) +
geom_histogram(bins = 30) +
stat_bin(geom = "text", bins = 30, na.rm = TRUE,
aes(label = ifelse(after_stat(count) == 0, NA, after_stat(group)),
group = as.factor(ID), y = after_stat(count)),
position = position_stack(vjust = 0.5)) +
guides(fill = guide_none()) +
theme(legend.position = "none")
show(plot_5_new)
I am trying to plot one column by Date (different color points for each animal category) and on the same graph, plot a second column by Date as well. The second column has entries for the days but only for certain categories, Large Dog. There is no adoption_with_discount for small or medium dogs (please see the reproducible example data set, example_data). When I plot them separately they visualize fine but not when plotted together. I thought I would just overlay a separate geom but that is not working.
I want to combine the two plots into one. My goal is for the points plot to have the line graph on top of it. I am trying to visualize the adoption as points colored by animal and put a line on the same graph of adoption_with_discount.
Thank you for your help!
# Make example -----------------------------------------------------------
# Here is an example data set
# You can see in the `adoption_with_discount` the values I want to add as a line.
library(lubridate)
library(tidyverse)
example_days <- data.frame(Date = c(seq.Date(from = as.Date('2022-03-01'), to = as.Date('2022-04-30'), by = 'days')))
example_small <-
example_days %>%
mutate(animal = "Small Dog")
a <-sample(100:150, nrow(example_small), rep = TRUE)
example_small <-
example_small %>%
mutate(adoption = a,
adoption_with_discount = NA)
example_med <-
example_days %>%
mutate(animal = "Medium Dog")
b <-sample(150:180, nrow(example_med), rep = TRUE)
example_med <-
example_med %>%
mutate(adoption = b,
adoption_with_discount = NA)
example_large <-
example_days %>%
mutate(animal = "Large Dog")
c <-sample(150:200, nrow(example_large), rep = TRUE)
example_large <-
example_large %>%
mutate(adoption = c)
example_large <-
example_large %>%
mutate(adoption_with_discount = adoption - 15)
example_data <- rbind(example_small, example_med, example_large)
# Plot --------------------------------------------------------------------
ggplot(data = example_data) +
geom_point(mapping = aes(x = Date,
y = adoption,
color = animal)) +
ggtitle("Dog Adoption by Size") +
labs(x = "Date", y = "Adoption Fee") +
scale_y_continuous(labels = scales::dollar) +
theme(axis.text.x = element_text(angle = 45))
# Plot with Fee -----------------------------------------------------------
# This is where the problem is occurring
# When I want to add a line that plots the adoption with discount by day
# on top of the points, it does not populate.
ggplot(data = example_data) +
geom_point(mapping = aes(x = Date,
y = adoption,
color = animal)) +
geom_line(mapping = aes(x = Date,
y = adoption_with_discount),
color = "black") +
ggtitle("Dog Adoption by Size with Discount Included") +
labs(x = "Date", y = "Adoption Fee") +
scale_y_continuous(labels = scales::dollar) +
theme(axis.text.x = element_text(angle = 45))
# See if just Discount will Plot -----------------------------------------
#This plots separately
ggplot(data = example_large) +
geom_line(mapping = aes(x = Date,
y = adoption_with_discount),
color = "black") +
ggtitle("Discount") +
labs(x = "Date", y = "Adoption Fee") +
scale_y_continuous(labels = scales::dollar) +
theme(axis.text.x = element_text(angle = 45))
While subsetting is one way to fix the issue, the reason no line is plotted is the missing grouping: geom_line treats the observations for all three dog types as a single group, i.e. one line. Within that single group every non-missing value is surrounded by NAs from the other two dog types, so there are never two consecutive points to connect and nothing is drawn. An easy fix is to explicitly map animal to the group aesthetic. Additionally, I added na.rm = TRUE to silence the warning about removed NAs. Finally, I right-aligned your axis labels by adding hjust = 1:
library(ggplot2)
ggplot(data = example_data) +
geom_point(mapping = aes(
x = Date,
y = adoption,
color = animal
)) +
geom_line(
mapping = aes(
x = Date,
y = adoption_with_discount,
group = animal
),
color = "black",
na.rm = TRUE
) +
ggtitle("Dog Adoption by Size with Discount Included") +
labs(x = "Date", y = "Adoption Fee") +
scale_y_continuous(labels = scales::dollar) +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
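To see why the single-group line vanishes, here is a minimal sketch on toy data. The toy data frame and its values are made up for illustration and are much smaller than example_data.

```r
library(ggplot2)

# Toy data (an assumption): three dates, three dog sizes, discount only for "Large"
toy <- data.frame(
  Date = rep(as.Date("2022-03-01") + 0:2, times = 3),
  animal = rep(c("Small", "Medium", "Large"), each = 3),
  discount = c(rep(NA, 6), 140, 150, 160)
)

# As one group the rows are ordered by Date, so each value sits between NAs
toy_sorted <- toy[order(toy$Date), ]
toy_sorted$discount

# Without a group: isolated points only, so no line segments can be drawn
p_single <- ggplot(toy, aes(x = Date, y = discount)) +
  geom_line(na.rm = TRUE)

# With group = animal: the "Large" rows form their own uninterrupted series
p_grouped <- ggplot(toy, aes(x = Date, y = discount, group = animal)) +
  geom_line(na.rm = TRUE)
p_grouped
```

The printed vector alternates two NAs with one value, which is exactly the pattern that leaves geom_line with nothing to connect.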
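Based on the discussion here, I found that you can add a subset entry to the aes of geom_line to select the rows where the adoption_with_discount column is not NA. Note that subset is not a documented ggplot2 aesthetic, so current versions may warn that it is being ignored; filtering the data passed to the layer is the more robust route.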
ggplot(data = example_data) +
geom_point(mapping = aes(x = Date,
y = adoption,
color = animal)) +
geom_line(mapping = aes(x = Date,
y = adoption_with_discount),
color = "black") +
ggtitle("Dog Adoption by Size with Discount Included") +
labs(x = "Date", y = "Adoption Fee") +
scale_y_continuous(labels = scales::dollar) +
theme(axis.text.x = element_text(angle = 45)) +
geom_line(mapping = aes(x = Date,
y = adoption_with_discount,
subset = !is.na(adoption_with_discount)),
color = "black") +
ggtitle("Discount") +
labs(x = "Date", y = "Adoption Fee") +
scale_y_continuous(labels = scales::dollar) +
theme(axis.text.x = element_text(angle = 45))
It looks like the NAs included in the geom_line layer are causing the issue, so you can filter them out before plotting the line:
ggplot(data = example_data) +
geom_point(mapping = aes(x = Date,
y = adoption,
color = animal)) +
geom_line(data=example_data %>% filter(!is.na(adoption_with_discount)),
mapping = aes(x = Date,
y = adoption_with_discount),
color = "black") +
ggtitle("Dog Adoption by Size with Discount Included") +
labs(x = "Date", y = "Adoption Fee") +
scale_y_continuous(labels = scales::dollar) +
theme(axis.text.x = element_text(angle = 45))
I am using the windrose function posted here: Wind rose with ggplot (R)?
I need to have the percents on the figure showing on the individual lines (rather than on the left side), but so far I have not been able to figure out how. (see figure below for depiction of goal)
Here is the code that makes the figure:
p.windrose <- ggplot(data = data,
aes(x = dir.binned,y = (..count..)/sum(..count..),
fill = spd.binned)) +
geom_bar()+
scale_y_continuous(breaks = ybreaks.prct,labels=percent)+
ylab("")+
scale_x_discrete(drop = FALSE,
labels = waiver()) +
xlab("")+
coord_polar(start = -((dirres/2)/360) * 2*pi) +
scale_fill_manual(name = "Wind Speed (m/s)",
values = spd.colors,
drop = FALSE)+
theme_bw(base_size = 12, base_family = "Helvetica")
I marked up the figure I have so far with what I am trying to do! It'd be neat if the labels either auto-picked the location with the least wind in that direction, or if it had a tag for the placement so that it could be changed.
I tried using geom_text, but I get an error saying that "aesthetics must be valid data columns".
Thanks for your help!
One of the things you could do is to make an extra data.frame that you use for the labels. Since the data isn't available from your question, I'll illustrate with mock data below:
library(ggplot2)
# Mock data
df <- data.frame(
x = 1:360,
y = runif(360, 0, 0.20)
)
labels <- data.frame(
x = 90,
y = scales::extended_breaks()(range(df$y))
)
ggplot(data = df,
aes(x = as.factor(x), y = y)) +
geom_point() +
geom_text(data = labels,
aes(label = scales::percent(y, 1))) +
scale_x_discrete(breaks = seq(0, 1, length.out = 9) * 360) +
coord_polar() +
theme(axis.ticks.y = element_blank(), # Disables default y-axis
axis.text.y = element_blank())
@teunbrand's answer got me very close! I wanted to add the code I used to get everything just right, in case anyone has a similar problem in the future.
# Create the labels:
x_location <- pi # x location of the labels
# Get the percentage
T_data <- data %>%
dplyr::group_by(dir.binned) %>%
dplyr::summarise(count= n()) %>%
dplyr::mutate(y = count/sum(count))
labels <- data.frame(x = x_location,
y = scales::extended_breaks()(range(T_data$y)))
# Create figure
p.windrose <- ggplot() +
geom_bar(data = data,
aes(x = dir.binned, y = (..count..)/sum(..count..),
fill = spd.binned))+
geom_text(data = labels,
aes(x=x, y=y, label = scales::percent(y, 1))) +
scale_y_continuous(breaks = waiver(),labels=NULL)+
scale_x_discrete(drop = FALSE,
labels = waiver()) +
ylab("")+xlab("")+
coord_polar(start = -((dirres/2)/360) * 2*pi) +
scale_fill_manual(name = "Wind Speed (m/s)",
values = spd.colors,
drop = FALSE)+
theme_bw(base_size = 12, base_family = "Helvetica") +
theme(axis.ticks.y = element_blank(), # Disables default y-axis
axis.text.y = element_blank())
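The question also asked whether the labels could automatically land in the direction with the least wind. One way this might be done is to pick the direction bin with the smallest share and use it as the x position of the labels. This is only a sketch on mock data; the wind data frame, the eight direction names, and the sampling weights are all assumptions.

```r
library(ggplot2)

# Mock data (an assumption): 500 observations over eight direction bins
set.seed(1)
dirs <- c("N", "NE", "E", "SE", "S", "SW", "W", "NW")
wind <- data.frame(
  dir.binned = factor(
    sample(dirs, 500, replace = TRUE, prob = c(5, 1, 3, 4, 6, 4, 3, 4)),
    levels = dirs
  )
)

# Share of observations per direction, and the direction with the least wind
share <- prop.table(table(wind$dir.binned))
quietest <- names(share)[which.min(share)]

# Label positions: the y breaks, all placed on the quietest direction
labels <- data.frame(x = quietest, y = scales::extended_breaks()(c(0, max(share))))
labels <- labels[labels$y > 0 & labels$y <= max(share), ]

p <- ggplot() +
  geom_bar(data = wind, aes(x = dir.binned, y = after_stat(count / sum(count)))) +
  geom_text(data = labels, aes(x = x, y = y, label = scales::percent(y, 1))) +
  scale_x_discrete(drop = FALSE) +
  scale_y_continuous(labels = NULL) +
  coord_polar() +
  labs(x = "", y = "") +
  theme(axis.ticks.y = element_blank())
p
```

To place the labels manually instead, set quietest to any of the direction names.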
Here is a dataframe
DF <- data.frame(SchoolYear = c("2015-2016", "2016-2017"),
Value = sample(c('Agree', 'Disagree', 'Strongly agree', 'Strongly disagree'), 50, replace = TRUE))
I have created this graph.
ggplot(DF, aes(x = Value, fill = SchoolYear)) +
geom_bar(position = 'dodge', aes(y = (..count..)/sum(..count..))) +
geom_text(aes(y = ((..count..)/sum(..count..)), label = scales::percent((..count..)/sum(..count..))),
stat = "count", vjust = -0.25, size = 2, position = position_dodge(width = 0.9)) +
scale_y_continuous(labels = percent) +
ylab("Percent") + xlab("Response") +
theme(axis.text.x = element_text(angle = 75, hjust = 1))
Is there a way to make the data for each school year add up to 100%, but not have the data stacked, in the graph?
I know this question is similar to this question Create stacked barplot where each stack is scaled to sum to 100%, but I don't want the graph to be stacked. I can't figure out how to apply the solution in my question to this situation. Also I would prefer not to summarize the data before graphing, as I have to make this graph many times using different data each time and would prefer not to have to summarize the data each time.
I'm not sure how to create the plot that you want without transforming the data. But if you want to re-use the same code for multiple datasets, you can write a function to transform your data and generate the plot at the same time:
plot.fun <- function (original.data) {
newDF <- reshape2::melt(apply(table(original.data), 1, prop.table))
Plot <- ggplot(newDF, aes(x=Value, y=value)) +
geom_bar(aes(fill=SchoolYear), stat="identity", position="dodge") +
geom_text(aes(group=SchoolYear, label=scales::percent(value)), stat="identity", vjust=-0.25, size=2, position=position_dodge(width=0.85)) +
scale_y_continuous(labels=scales::percent) +
ylab("Percent") + xlab("Response") +
theme(axis.text.x = element_text(angle = 75, hjust = 1))
return (Plot)
}
plot.fun(DF)
Big disclaimer: I would highly recommend that you summarize your data beforehand rather than trying to do these calculations within ggplot. That is not what ggplot is meant to do. It not only complicates your code unnecessarily but can also easily introduce bugs or unintended results.
That said, it appears that what you want is doable without summarizing first. A very hacky way to get what you want by doing the calculations within ggplot would be:
#Store factor values
fac <- unique(DF$SchoolYear)
ggplot(DF, aes(x = Value, fill = SchoolYear)) +
geom_bar(position = 'dodge', aes(y = (..count..)/stats::ave(..count.., get("fac", globalenv()), FUN = sum))) +
geom_text(aes(y = (..count..)/stats::ave(..count.., get("fac", globalenv()), FUN = sum), label = scales::percent((..count..)/stats::ave(..count.., get("fac", globalenv()), FUN = sum))),
stat = "count", vjust = -0.25, size = 2, position = position_dodge(width = 0.9)) +
scale_y_continuous(labels = scales::percent) +
ylab("Percent") + xlab("Response") +
theme(axis.text.x = element_text(angle = 75, hjust = 1))
This takes the ..count.. variable and divides it by the sum within its respective group using stats::ave. Note that this is fragile and can go wrong extremely easily.
Finally, we check to see the plot is in fact giving us what we want.
#Check to see we have the correct values
library(data.table)
d2 <- as.data.table(DF) # a copy, so DF itself stays a plain data.frame
d2 <- d2[, .(count = .N), by = .(SchoolYear, Value)][, percent := count/sum(count), by = SchoolYear]
d2
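For comparison, the same per-year shares can be computed in base R and plotted directly, which avoids both the in-plot hack and the extra packages. This is a sketch; the shares data frame and the seed are my own additions.

```r
library(ggplot2)

# Data as in the question, with a seed added so the result is reproducible
set.seed(42)
DF <- data.frame(
  SchoolYear = c("2015-2016", "2016-2017"),
  Value = sample(c("Agree", "Disagree", "Strongly agree", "Strongly disagree"), 50, replace = TRUE)
)

# Row-wise proportions: within each school year the shares sum to 1
shares <- as.data.frame(
  prop.table(table(SchoolYear = DF$SchoolYear, Value = DF$Value), margin = 1),
  responseName = "share"
)

p <- ggplot(shares, aes(x = Value, y = share, fill = SchoolYear)) +
  geom_col(position = "dodge") +
  geom_text(aes(label = scales::percent(share, accuracy = 1)),
            position = position_dodge(width = 0.9), vjust = -0.25, size = 2) +
  scale_y_continuous(labels = scales::percent) +
  labs(x = "Response", y = "Percent") +
  theme(axis.text.x = element_text(angle = 75, hjust = 1))
p
```

Summing shares$share by SchoolYear should give 1 for each year, which is the check the data.table code performs.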
I have a test dataset like this:
df_test <- data.frame(
proj_manager = c('Emma','Emma','Emma','Emma','Emma','Alice','Alice'),
proj_ID = c(1, 2, 3, 4, 5, 6, 7),
stage = c('B','B','B','A','C','A','C'),
value = c(15,15,20,20,20,70,5)
)
Preparation for viz:
input <- select(df_test, proj_manager, proj_ID, stage, value) %>%
filter(proj_manager=='Emma') %>%
do({
proj_value_by_manager = sum(distinct(., proj_ID, value)$value);
mutate(., proj_value_by_manager = proj_value_by_manager)
}) %>%
group_by(stage) %>%
do({
sum_value_byStage = sum(distinct(.,proj_ID,value)$value);
mutate(.,sum_value_byStage= sum_value_byStage)
}) %>%
mutate(count_proj = length(unique(proj_ID)))
commapos <- function(x, ...) {
format(abs(x), big.mark = ",", trim = TRUE,
scientific = FALSE, ...) }
Visualization:
ggplot (input, aes(x=stage, y = count_proj)) +
geom_bar(stat = 'identity')+
geom_bar(aes(y=-proj_value_by_manager),
stat = "identity", fill = "Blue") +
scale_y_continuous(labels = commapos)+
coord_flip() +
ylab('') +
geom_text(aes(label= sum_value_byStage), hjust = 5) +
geom_text(aes(label= count_proj), hjust = -1) +
labs(title = "Emma: 4 projects| $90M Values \n \n Commitment|Projects") +
theme(plot.title = element_text(hjust = 0.5)) +
geom_hline(yintercept = 0, linetype =1)
My questions are:
Why are the y-values not showing up right? E.g. C is labeled 20, but nearly hits 100 on the scale.
How can I adjust the position of the labels so that each one sits on top of its bar?
How can I re-scale the y axis so that both the very short 'count of project' bars and the long 'Project value' bars are displayed well?
Thank you all for the help!
I think your issues are coming from the fact that:
(1) Your dataset has duplicated values. This causes geom_bar to add all of them together. For example there are 3 obs for B where proj_value_by_manager = 90 which is why the blue bar extends to 270 for that group (they all get added).
(2) in your second geom_bar you use y = -proj_value_by_manager but in the geom_text to label this you use sum_value_byStage. That's why the blue bar for A is extending to 90 (since proj_value_by_manager is 90) but the label reads 20.
To get what I believe is the chart you want, you could do:
#Q1: De-duplicated dataset so the bars don't erroneously add up repeated rows
input2 <- input[!duplicated(input[,-c(2,4)]),]
ggplot (input2, aes(x=stage, y = count_proj)) +
geom_bar(stat = 'identity')+
geom_bar(aes(y=-sum_value_byStage), #Q1: changed so this y-value matches your label
stat = "identity", fill = "Blue") +
scale_y_continuous(labels = commapos)+
coord_flip() +
ylab('') +
geom_text(aes(label= sum_value_byStage, y = -sum_value_byStage), hjust = 1) + #Q2: Added in y-value for label and hjust so it will be on top
geom_text(aes(label= count_proj), hjust = -1) +
labs(title = "Emma: 4 projects| $90M Values \n \n Commitment|Projects") +
theme(plot.title = element_text(hjust = 0.5)) +
geom_hline(yintercept = 0, linetype =1)
For your last question, there is no good way to display both of these. One option would be to rescale the small data and still label it with a 1 or 3. However, I didn't do this because once you scale down the blue bars the other bars look OK to me.
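If you do want to try the rescaling option, a rough sketch could look like the following. The emma data frame is the per-stage summary typed in by hand, and scale_factor is an arbitrary choice of mine (the longest count bar becomes half as long as the longest value bar); the labels still show the original counts.

```r
library(ggplot2)

# Per-stage summary for Emma, entered by hand from df_test
emma <- data.frame(
  stage = c("A", "B", "C"),
  sum_value_byStage = c(20, 50, 20),
  count_proj = c(1, 3, 1)
)

# Hypothetical scaling: stretch the counts so the bars are visible
scale_factor <- max(emma$sum_value_byStage) / max(emma$count_proj) / 2
emma$count_scaled <- emma$count_proj * scale_factor

p <- ggplot(emma, aes(x = stage)) +
  geom_col(aes(y = count_scaled)) +
  geom_col(aes(y = -sum_value_byStage), fill = "blue") +
  # Labels show the unscaled numbers even though the bars are rescaled
  geom_text(aes(y = count_scaled, label = count_proj), hjust = -0.5) +
  geom_text(aes(y = -sum_value_byStage, label = sum_value_byStage), hjust = 1.2) +
  geom_hline(yintercept = 0, linetype = 1) +
  scale_y_continuous(labels = NULL) +
  coord_flip() +
  labs(x = NULL, y = NULL)
p
```

The axis labels are dropped because, after rescaling, the two halves of the axis no longer share a unit.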