I have a time-series of data, where I'm plotting diagnosis rates for a disease on the y-axis DIAG_RATE_65_PLUS, and geographical groups for comparison on the x-axis NAME as a simple bar graph. My time variable is ACH_DATEyearmon, which the animation is cycling through as seen in the title.
df %>% ggplot(aes(reorder(NAME, DIAG_RATE_65_PLUS), DIAG_RATE_65_PLUS)) +
geom_bar(stat = "identity", alpha = 0.66) +
labs(title='{closest_state}') +
theme(plot.title = element_text(hjust = 1, size = 22),
axis.text.x=element_blank()) +
transition_states(ACH_DATEyearmon, transition_length = 1, state_length = 1) +
ease_aes('linear')
I've reordered NAME so it gets ranked by DIAG_RATE_65_PLUS.
What gganimate produces:
I now have two questions:
1) How exactly does gganimate reorder the data? There is some overall general reordering, but no month has a frame where the groups are perfectly ordered by DIAG_RATE_65_PLUS from smallest to biggest. Ideally, I would like the final month "Aug 2018" to be ordered perfectly. All of the previous months can have their x-axis based on the ordered NAME for "Aug 2018".
2) Is there an option in gganimate where the groups "shift" to their correct rank for each month in the bar chart?
Plots for my comment queries:
https://i.stack.imgur.com/s2UPw.gif
https://i.stack.imgur.com/Z1wfd.gif
#JonSpring
df %>%
ggplot(aes(ordering, group = NAME)) +
geom_tile(aes(y = DIAG_RATE_65_PLUS/2,
height = DIAG_RATE_65_PLUS,
width = 0.9), alpha = 0.9, fill = "gray60") +
geom_hline(yintercept = (2/3)*25, linetype="dotdash") +
# text in x-axis (requires clip = "off" in coord_cartesian)
geom_text(aes(y = 0, label = NAME), hjust = 2) + ## trying different hjust values
theme(plot.title = element_text(hjust = 1, size = 22),
axis.ticks.y = element_blank(), ## axis.ticks.y shows the ticks on the flipped x-axis (the now metric), and hides the ticks from the geog layer
axis.text.y = element_blank()) + ## axis.text.y shows the scale on the flipped x-axis (the now metric), and hides the placeholder "ordered" numbers from the geog layer
coord_cartesian(clip = "off", expand = FALSE) +
coord_flip() +
labs(title='{closest_state}', x = "") +
transition_states(ACH_DATEyearmon,
transition_length = 2, state_length = 1) +
ease_aes('cubic-in-out')
With hjust=2, labels are not aligned and move around.
Changing the above code with hjust=1
#eipi10
df %>%
ggplot(aes(y=NAME, x=DIAG_RATE_65_PLUS)) +
geom_barh(stat = "identity", alpha = 0.66) +
geom_hline(yintercept=(2/3)*25, linetype = "dotdash") + #geom_vline(xintercept=(2/3)*25) is incompatible, but geom_hline works, but it's not useful for the plot
labs(title='{closest_state}') +
theme(plot.title = element_text(hjust = 1, size = 22)) +
transition_states(ACH_DATEyearmon, transition_length = 1, state_length = 50) +
view_follow(fixed_x=TRUE) +
ease_aes('linear')
To add on to @eipi10's great answer, I think this is a case where it's worth replacing geom_bar for more flexibility. geom_bar is normally quite convenient for discrete categories, but it doesn't let us take full advantage of gganimate's silky-smooth animation glory.
For instance, with geom_tile, we can recreate the same appearance as geom_bar, but with fluid movement on the x-axis. This helps to keep visual track of each bar and to see which bars are shifting order the most. I think this addresses the 2nd part of your question nicely.
To make this work, we can add to the data a new column showing the ordering that should be used at each month. We save this order as a double, not an integer (by multiplying by 1.0). This will allow gganimate to place a bar at position 1.25 when it's animating between positions 1 and 2.
df2 <- df %>%
group_by(ACH_DATEyearmon) %>%
mutate(ordering = min_rank(DIAG_RATE_65_PLUS) * 1.0) %>%
ungroup()
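To make the effect of this step concrete, here is a minimal sketch on made-up data (the toy data frame and the ordering_uniq column are assumptions for illustration, not part of the original data). It also shows one thing to watch for: min_rank() gives tied values the same rank, so two bars with identical rates in a month would land on the same x position. row_number() is one possible substitute that breaks ties by row order.

```r
library(dplyr)

# Hypothetical toy data: three groups, two months, one tie in "Feb 2017"
toy <- data.frame(
  NAME = rep(c("A", "B", "C"), times = 2),
  ACH_DATEyearmon = rep(c("Jan 2017", "Feb 2017"), each = 3),
  DIAG_RATE_65_PLUS = c(30, 10, 20, 15, 15, 40)
)

toy_ranked <- toy %>%
  group_by(ACH_DATEyearmon) %>%
  mutate(
    ordering      = min_rank(DIAG_RATE_65_PLUS) * 1.0,   # ties share a rank
    ordering_uniq = row_number(DIAG_RATE_65_PLUS) * 1.0  # ties broken by row order
  ) %>%
  ungroup()

print(toy_ranked)
```

In "Feb 2017" both A and B get ordering 1 from min_rank(), while row_number() assigns them 1 and 2.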
Now we can plot in similar fashion, but using geom_tile instead of geom_bar. I wanted to show the NAME both on top and at the axis, so I used two geom_text calls with different y values, one at zero and one at the height of the bar. vjust lets us align each vertically using text line units.
The other trick here is to turn off clipping in coord_cartesian, which lets the bottom text go below the plot area, into where the x-axis text would usually go.
p <- df2 %>%
ggplot(aes(ordering, group = NAME)) +
geom_tile(aes(y = DIAG_RATE_65_PLUS/2,
height = DIAG_RATE_65_PLUS,
width = 0.9), alpha = 0.9, fill = "gray60") +
# text on top of bars
geom_text(aes(y = DIAG_RATE_65_PLUS, label = NAME), vjust = -0.5) +
# text in x-axis (requires clip = "off" in coord_cartesian)
geom_text(aes(y = 0, label = NAME), vjust = 2) +
coord_cartesian(clip = "off", expand = FALSE) +
labs(title='{closest_state}', x = "") +
theme(plot.title = element_text(hjust = 1, size = 22),
axis.ticks.x = element_blank(),
axis.text.x = element_blank()) +
transition_states(ACH_DATEyearmon,
transition_length = 2, state_length = 1) +
ease_aes('cubic-in-out')
animate(p, nframes = 300, fps = 20, width = 400, height = 300)
Back to your first question, here's a color version that I made by removing fill = "gray60" from the geom_tile call. I sorted the NAME categories in order of Aug 2017, so they will look sequential for that one, as you described.
There's probably a better way to do that sorting, but I did it by joining df2 to a table with just the Aug 2017 ordering.
Aug_order <- df %>%
filter(ACH_DATEyearmon == "Aug 2017") %>%
mutate(Aug_order = min_rank(DIAG_RATE_65_PLUS) * 1.0) %>%
select(NAME, Aug_order)
df2 <- df %>%
group_by(ACH_DATEyearmon) %>%
mutate(ordering = min_rank(DIAG_RATE_65_PLUS) * 1.0) %>%
ungroup() %>%
left_join(Aug_order, by = "NAME") %>%
mutate(NAME = fct_reorder(NAME, -Aug_order))
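As a rough sketch of the colour version described above: dropping fill = "gray60" on its own leaves the tiles in ggplot2's default fill, so this assumes fill is additionally mapped to NAME inside aes(). The small data frame is made up for illustration, and only the static plot is built here; the transition_states and ease_aes layers would be added as before.

```r
library(dplyr)
library(forcats)
library(ggplot2)

# Hypothetical stand-in for df: 3 groups x 2 months
df <- data.frame(
  NAME = rep(c("A", "B", "C"), times = 2),
  ACH_DATEyearmon = rep(c("Jul 2017", "Aug 2017"), each = 3),
  DIAG_RATE_65_PLUS = c(30, 10, 20, 12, 25, 18)
)

# Rank of each NAME in the reference month
Aug_order <- df %>%
  filter(ACH_DATEyearmon == "Aug 2017") %>%
  mutate(Aug_order = min_rank(DIAG_RATE_65_PLUS) * 1.0) %>%
  select(NAME, Aug_order)

df2 <- df %>%
  group_by(ACH_DATEyearmon) %>%
  mutate(ordering = min_rank(DIAG_RATE_65_PLUS) * 1.0) %>%
  ungroup() %>%
  left_join(Aug_order, by = "NAME") %>%
  mutate(NAME = fct_reorder(NAME, -Aug_order))

# Static plot; fill = NAME (an assumption) gives each bar its own colour
p_colour <- ggplot(df2, aes(ordering, group = NAME)) +
  geom_tile(aes(y = DIAG_RATE_65_PLUS / 2,
                height = DIAG_RATE_65_PLUS,
                width = 0.9, fill = NAME), alpha = 0.9)
```

Because NAME is reordered by the negated August rank, the legend and colour sequence run from the highest August value to the lowest.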
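The bar ordering is done by ggplot and is not affected by gganimate. The bars are ordered by reorder(NAME, DIAG_RATE_65_PLUS), which is evaluated once on the whole data set: it ranks each NAME by its mean DIAG_RATE_65_PLUS across all ACH_DATEyearmon values (equivalent to ranking by the sum when every NAME appears in every month), not separately within each month. Below I'll show how the bars are ordered and then provide code for creating the animated plot with the desired sorting from low to high in each frame.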
To see how the bars are ordered, first let's create some fake data:
library(tidyverse)
library(gganimate)
theme_set(theme_classic())
# Fake data
dates = paste(rep(month.abb, each=10), 2017)
set.seed(2)
df = data.frame(NAME=c(replicate(12, sample(LETTERS[1:10]))),
ACH_DATEyearmon=factor(dates, levels=unique(dates)),
DIAG_RATE_65_PLUS=c(replicate(12, rnorm(10, 30, 5))))
Now let's make a single bar plot. The bars are the sum of DIAG_RATE_65_PLUS for each NAME. Note the order of the x-axis NAME values:
df %>%
ggplot(aes(reorder(NAME, DIAG_RATE_65_PLUS), DIAG_RATE_65_PLUS)) +
geom_bar(stat = "identity", alpha = 0.66) +
labs(title='{closest_state}') +
theme(plot.title = element_text(hjust = 1, size = 22))
You can see below that the ordering is the same when we explicitly sum DIAG_RATE_65_PLUS by NAME and sort by the sum:
df %>% group_by(NAME) %>%
summarise(DIAG_RATE_65_PLUS = sum(DIAG_RATE_65_PLUS)) %>%
arrange(DIAG_RATE_65_PLUS)
NAME DIAG_RATE_65_PLUS
1 A 336.1271
2 H 345.2369
3 B 346.7151
4 I 350.1480
5 E 356.4333
6 C 367.4768
7 D 368.2225
8 F 368.3765
9 J 368.9655
10 G 387.1523
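The agreement between the two orderings can also be checked on a tiny example in base R. This is a minimal sketch with made-up values: reorder() uses FUN = mean by default, which gives the same ranking as the sum when each NAME has the same number of rows, and that overall ranking need not match the ranking inside any single month.

```r
# Hypothetical toy data: two groups observed in three months
NAME <- rep(c("A", "B"), times = 3)
rate <- c(10, 12, 30, 5, 20, 40)

# Overall ordering with the default FUN = mean ...
lv_mean <- levels(reorder(NAME, rate))
# ... and with an explicit sum
lv_sum <- levels(reorder(NAME, rate, FUN = sum))

# Ordering inside the first month only (rows 1 and 2)
lv_first <- levels(reorder(NAME[1:2], rate[1:2]))

lv_mean   # "B" "A"
lv_sum    # "B" "A"
lv_first  # "A" "B"
```

Here A has the larger overall mean (20 versus 19), so it is placed last overall even though it is the smaller bar in the first month. This is why no single frame of the original animation looks perfectly sorted.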
Now we want to create an animation that sorts NAME by DIAG_RATE_65_PLUS separately for each ACH_DATEyearmon. To do this, let's first generate a new column called order that sets the ordering we want:
df = df %>%
arrange(ACH_DATEyearmon, DIAG_RATE_65_PLUS) %>%
mutate(order = 1:n())
Now we create the animation. transition_states generates the frames for each ACH_DATEyearmon. view_follow(fixed_y=TRUE) shows x-values only for the current ACH_DATEyearmon and maintains the same y-axis range for all frames.
Note that we use order as the x variable, but then we run scale_x_continuous to change the x-labels to be the NAME values. I've included these labels in the plot so you can see that they change with each ACH_DATEyearmon, but you can of course remove them in your actual plot as you did in your example.
p = df %>%
ggplot(aes(order, DIAG_RATE_65_PLUS)) +
geom_bar(stat = "identity", alpha = 0.66) +
labs(title='{closest_state}') +
theme(plot.title = element_text(hjust = 1, size = 22)) +
scale_x_continuous(breaks=df$order, labels=df$NAME) +
transition_states(ACH_DATEyearmon, transition_length = 1, state_length = 50) +
view_follow(fixed_y=TRUE) +
ease_aes('linear')
animate(p, nframes=60)
anim_save("test.gif")
If you turn off view_follow(), you can see what the "whole" plot looks like (and you can, of course, see the full, non-animated plot by stopping the code before the transition_states line).
p = df %>%
ggplot(aes(order, DIAG_RATE_65_PLUS)) +
geom_bar(stat = "identity", alpha = 0.66) +
labs(title='{closest_state}') +
theme(plot.title = element_text(hjust = 1, size = 22)) +
scale_x_continuous(breaks=df$order, labels=df$NAME) +
transition_states(ACH_DATEyearmon, transition_length = 1, state_length = 50) +
#view_follow(fixed_y=TRUE) +
ease_aes('linear')
UPDATE: To answer your questions...
To order by a given month's values, turn NAME into a factor with the levels ordered by that month's values. To plot a rotated graph, instead of coord_flip, we'll use geom_barh (horizontal bar plot) from the ggstance package. Note that we have to switch the y's and x's in aes and view_follow() and that the order of the y-axis NAME values is now constant:
library(ggstance)
# Set NAME order based on August 2017 values
df = df %>%
arrange(DIAG_RATE_65_PLUS) %>%
mutate(NAME = factor(NAME, levels=unique(NAME[ACH_DATEyearmon=="Aug 2017"])))
p = df %>%
ggplot(aes(y=NAME, x=DIAG_RATE_65_PLUS)) +
geom_barh(stat = "identity", alpha = 0.66) +
labs(title='{closest_state}') +
theme(plot.title = element_text(hjust = 1, size = 22)) +
transition_states(ACH_DATEyearmon, transition_length = 1, state_length = 50) +
view_follow(fixed_x=TRUE) +
ease_aes('linear')
animate(p, nframes=60)
anim_save("test3.gif")
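The factor step is easiest to check on a small example. The sketch below uses made-up data and base R only; picking the reference month as the last level of ACH_DATEyearmon (the ref_month variable) is an assumption that matches the request in the question to sort by the final month, rather than hard-coding "Aug 2017".

```r
# Hypothetical toy data (not the asker's)
dates <- c("Jul 2017", "Aug 2017")
toy <- data.frame(
  NAME = rep(c("A", "B", "C"), times = 2),
  ACH_DATEyearmon = factor(rep(dates, each = 3), levels = dates),
  DIAG_RATE_65_PLUS = c(30, 10, 20, 12, 25, 18)
)

# Month whose ranking fixes the axis order: the last level of the factor
ref_month <- tail(levels(toy$ACH_DATEyearmon), 1)

# NAME values of that month, sorted from lowest to highest rate
ref <- toy[toy$ACH_DATEyearmon == ref_month, ]
name_levels <- as.character(ref$NAME[order(ref$DIAG_RATE_65_PLUS)])

toy$NAME <- factor(toy$NAME, levels = name_levels)
levels(toy$NAME)  # "A" "C" "B"
```

Every month then shares this axis order, so only the reference month appears perfectly sorted.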
For smooth transitions, it seems like @JonSpring's answer handles that well.
Related
I have a gganimate sketch in R and I would like to have the percentages of my bar chart appear as labels.
But for some bizarre reason, I am getting seemingly random colours in place of the labels that I'm requesting.
If I run the ggplot part without animating then it's a mess (as it should be), but it's obvious that the percentages are appearing correctly.
Any ideas? The colour codes don't correspond to the colours of the bars which I have chosen separately. The codes displayed also cycle through about half a dozen different codes, at a rate different to the frame rate that I selected. And while the bars are the same height (they grow until they reach the chosen height displayed in the animation) then they display the same code until they stop and it gets frozen.
Code snippet:
df_new <- data.frame(index, rate, year, colour)
df_new$rate_label <- ifelse(round(df_new$rate, 1) %% 1 == 0,
paste0(round(df_new$rate, 1), ".0%"), paste0(round(df_new$rate, 1), "%"))
p <- ggplot(df_new, aes(x = year, y = rate, fill = year)) +
geom_bar(stat = "identity", position = "dodge") +
scale_fill_manual(values = colour) +
#geom_text(aes(y = rate, label = paste0(rate, "%")), vjust = -0.7) +
geom_shadowtext(aes(y = rate, label = rate_label),
bg.colour='white',
colour = 'black',
size = 9,
fontface = "bold",
vjust = -0.7,
alpha = 1
) +
coord_cartesian(clip = 'off') +
ggtitle("% population belonging to 'No religion', England and Wales census") +
theme_minimal() +
xlab("") + ylab("") +
theme(legend.position = "none") +
theme(plot.title = element_text(size = 18, face = "bold")) +
theme(axis.text = element_text(size = 14)) +
scale_y_continuous(limits = c(0, 45), breaks = 10*(0:4))
p
p <- p + transition_reveal(index) + view_follow(fixed_y = TRUE)
animate(p, renderer = gifski_renderer(), nframes = 300, fps = frame_rate, height = 500, width = 800,
end_pause = 0)
anim_save("atheism.gif")
I think you have missed some delicate points about ggplot2. I will try my best to describe them to you. First of all, you need to enter the discrete values as factor or integer. So you can use as.factor() before plotting or just factor() in the aesthetic. Also, you should consider rounding the percentages as you wish. Here is an example:
set.seed(2023)
df_new <- data.frame(index=1:10, rate=runif(10), year=2001:2010, colour=1:10)
df_new$rate_label <- ifelse(round(df_new$rate, 1) %% 1 == 0,
paste0(round(df_new$rate, 1), ".0%"),
paste0(round(df_new$rate, 1), "%"))
The ggplot for this data is:
library(ggplot2)
p <- ggplot(df_new, aes(x = factor(year), y = rate, fill = factor(colour))) +
geom_bar(stat = "identity", position = "dodge") +
geom_text(aes(y = rate, label = paste0(round(rate,2), "%")), vjust = -0.7) +
coord_cartesian(clip = 'off') +
ggtitle("% population belonging to 'No religion', England and Wales census") +
theme_minimal() +
xlab("") + ylab("") +
theme(legend.position = "none",
plot.title = element_text(size = 18, face = "bold"),
axis.text = element_text(size = 14))
p
You can also combine all theme elements in a single theme() call (as I did). The output is:
And you can easily animate the plot using the following code:
library(gganimate)
p + transition_reveal(index)
And the output is as below:
Hope it helps.
So it was answered here although I don't know why the fix works.
For some reason, labels need to go into gganimate as factors, via as.factor().
I just had to add the line:
df_new$rate_label <- as.factor(df_new$rate_label)
and it works fine.
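As a side note on the label column used in the snippets above, a possible shorter way to build it is sprintf(), which always prints one decimal place and so avoids the ifelse() branch. This is a minimal sketch with made-up rates; the conversion to a factor at the end is the fix reported above.

```r
# Hypothetical percentages
rate <- c(25, 37.2, 14.81)

# "%.1f" prints exactly one decimal place; "%%" is a literal percent sign
rate_label <- sprintf("%.1f%%", rate)
rate_label  # "25.0%" "37.2%" "14.8%"

# Hand the labels to gganimate as a factor rather than as character strings
rate_label_f <- as.factor(rate_label)
```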
I am trying to make a histogram using ggplot, where over 95% of the data is 0 and the rest of it is between 1 - 55. I do not want to show the 0s on the histogram - but I do want them accounted for in the total percentage, that way the other %s remain low. I've taken two approaches for this -- but what happens is the percentages for the rest of the data get messed up and the 0s aren't included in the calculation.
My first approach was this:
set1 %>% filter(total>0)%>%
ggplot(aes(x=total, fill=lowcost))+
geom_histogram(binwidth=1,aes(y = (..count..)/sum(..count..)),col=I("black"))+
scale_color_grey()+scale_fill_grey(start = .85,
end = .85,) +
theme_linedraw()+
guides(fill = "none", cols='none')+
geom_vline(aes(xintercept=10, size='Low target'),
color="black", linetype=5)+
geom_vline(aes(xintercept=50, size='High target'),
color="black", linetype="dotted")+
scale_size_manual(values = c(.5, 0.5), guide=guide_legend(title = "Target", override.aes = list(linetype=c(3,5), color=c('black', 'black'))))+
scale_y_continuous(labels=scales::percent)+
scale_x_continuous(breaks = c(seq(0,50,10), 55), labels = c(seq(0, 50, 10), '>55'), limits = c(0, 60))+
facet_grid(cols = vars(lowcost))+
ggtitle("Ask Set 1 ")+
theme(plot.title = element_text(hjust = 0.5))+
xlab("Total donation ($)")+
ylab("Percent")
My second approach was not filtering out the 0s, but instead limiting the X axis to not include them, but this didn't work either:
set1 %>%
ggplot(aes(x=total, fill=lowcost))+
geom_histogram(binwidth=1,aes(y = (..count..)/sum(..count..)),col=I("black"))+
scale_color_grey()+scale_fill_grey(start = .85,
end = .85,) +
theme_linedraw()+
guides(fill = "none", cols='none')+
geom_vline(aes(xintercept=10, size='Low target'),
color="black", linetype=5)+
geom_vline(aes(xintercept=50, size='High target'),
color="black", linetype="dotted")+
scale_size_manual(values = c(.5, 0.5), guide=guide_legend(title = "Target", override.aes = list(linetype=c(3,5), color=c('black', 'black'))))+
scale_y_continuous(labels=scales::percent)+
scale_x_continuous(breaks = c(seq(0,50,10), 55), labels = c(seq(0, 50, 10), '>55'), limits = c(0.01, 60))+
facet_grid(cols = vars(lowcost))+
ggtitle("Ask Set 1 ")+
theme(plot.title = element_text(hjust = 0.5))+
xlab("Total donation ($)")+
ylab("Percent")
Both result in histograms that look like this:
The tallest bar on the left histogram should actually be 1.19%
The percents should be the following in the histogram on the left:
The percents should be the following in the histogram on the right:
I think you can do what you want using "clipping" with coord_cartesian. Try this (untested):
set1 %>%
# filter(total>0) %>% # comment this out, do not filter
ggplot(aes(x=total, fill=lowcost)) +
coord_cartesian(xlim = c(1, NA)) + # start at 1, extend to the normal limit
geom_histogram(binwidth=1, aes(y = (..count..)/sum(..count..)), col=I("black")) +
... # rest unchanged
Perhaps try something like this:
library(tidyverse)
# Test data + expected outcome
set1 <- tibble(total=c(rep(0,10), rep(1,5), rep(2,5)))
set1 %>% count(total) %>% mutate(percent = n/sum(n))
# First, count the percentage and store it in a temporary variable
# Then, plot the precomputed percentage directly with geom_col()
# (equivalent to a bar geom with stat = "identity")
# You can then either filter out the zero totals first, or change the x limits
set1 %>%
count(total) %>%
mutate(percent = n/sum(n)) %>%
filter(total>0) %>%
ggplot(aes(x=total,y=percent)) +
geom_col() +
scale_x_continuous(limits = c(0, 3)) +
scale_y_continuous(labels=scales::percent) +
ylab("Percent")
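Since the plots in the question are faceted by lowcost, the same idea can be extended with a group_by() so that each panel gets its own denominator. This is a sketch under the assumption that percentages should be computed within each lowcost panel (zeros included); the data below are made up.

```r
library(dplyr)

# Hypothetical data: two panels, mostly zeros
set1 <- data.frame(
  lowcost = rep(c("low", "high"), each = 20),
  total   = c(rep(0, 16), rep(1, 3), 5,   # "low" panel
              rep(0, 18), 2, 10)          # "high" panel
)

pct <- set1 %>%
  count(lowcost, total) %>%
  group_by(lowcost) %>%
  mutate(percent = n / sum(n)) %>%  # denominator still includes the zeros
  ungroup() %>%
  filter(total > 0)                 # drop the zero bar only after computing

print(pct)
```

pct can then be plotted with ggplot(pct, aes(total, percent)) + geom_col() + facet_grid(cols = vars(lowcost)), keeping the remaining scales and labels from the question.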
I want to show the x-axis labels and the form of the line clearly on this plot. It is a point plot with a lot of categories along the x-axis which makes the plot very wide and very hard to read the x-axis.
Would it be possible to fold the plot in half and display it on two panels, one above the other? How would I do that? I thought about hacking around with facet_wrap but this got ugly with the ordered points (as I wish to maintain the order of the x-axis based on the value).
Or are there better ways of showing this data? The position of the categories along the x-axis is of interest, as is the shape of the line formed by the points.
I generated the example plot using this code:
library(stringi)
example <- data.frame(
cat = do.call(paste0, Map(stri_rand_strings, n=150, length=c(25, 14, 13), pattern = c('[A-Z]', '[0-9]', '[A-Z]'))),
val = rnorm(150, mean = 20)
)
cat_ordered_by_val <- example$cat[order(example$val)]
example$cat = factor(example$cat, levels=cat_ordered_by_val)
ggplot(example, aes(y = val, x = cat)) +
geom_point() +
ylab("Value") + xlab("Category") +
theme_bw() +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1, size=5))
ggsave("~/Desktop/what_a_plot.jpg")
The code below puts points into one of two facets in alternating order. To put the first half of the points in one facet and the second half in the other, you can instead use mutate(facet = row_number() > n() / 2), which places the first half in the top panel:
library(tidyverse)
library(stringi)
example <- data.frame(
cat = do.call(paste0, Map(stri_rand_strings, n = 150, length = c(25, 14, 13), pattern = c("[A-Z]", "[0-9]", "[A-Z]"))),
val = rnorm(150, mean = 20)
)
cat_ordered_by_val <- example$cat[order(example$val)]
example$cat <- factor(example$cat, levels = cat_ordered_by_val)
example %>%
arrange(cat) %>%
mutate(facet = row_number() %% 2) %>%
ggplot(aes(y = val, x = cat)) +
geom_point() +
ylab("Value") +
xlab("Category") +
theme_bw() +
facet_wrap(~facet, ncol = 1, scales = "free") +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1, size = 5))
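For the "fold in half" layout asked about in the question, a minimal sketch of the split-by-halves variant follows. The short category names, the ceiling() expression for the panel index, and the choice of scales = "free_x" (shared y-axis, so the shape of the curve stays comparable between panels) are assumptions for illustration.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for `example`, with short category names
set.seed(1)
example <- data.frame(
  cat = sprintf("cat_%03d", 1:150),
  val = rnorm(150, mean = 20)
)
example$cat <- factor(example$cat, levels = example$cat[order(example$val)])

folded <- example %>%
  arrange(cat) %>%
  # 1 for the first half of the ordered categories, 2 for the second half
  mutate(facet = ceiling(2 * row_number() / n()))

p_folded <- ggplot(folded, aes(x = cat, y = val)) +
  geom_point() +
  facet_wrap(~facet, ncol = 1, scales = "free_x") +
  ylab("Value") + xlab("Category") +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1, size = 5),
        strip.text = element_blank())  # hide the 1/2 panel labels
```

With this split the top panel holds the 75 lowest values and the bottom panel the 75 highest, so reading left to right and then top to bottom preserves the original order.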
In ggplot2/geom_tile, how can I change the fill color of values which exceed the limits?
As shown in the image, region_4 and region_5 are outside the limits (1, 11), so their fill color is the default grey. How can I change region_4 to 'darkblue' and region_5 to 'black'? Thanks!
library(tidyverse)
library(RColorBrewer)
tile_data <- data.frame(category=letters[1:5],
region=paste0('region_',1:5),
sales=c(1,2,5,0.1,300))
tile_data %>% ggplot(aes(x=category,
y=region,
fill=sales))+
geom_tile()+
scale_fill_gradientn(limits=c(1,11),
colors=brewer.pal(11,'Spectral'))+ # 'Spectral' has at most 11 colors
theme_minimal()
If you want to keep the gradient scale and have two additional discrete values for off limits above and below, I think the easiest way would be to have separate fill scales for "in-limit" and "off-limit" values. This can be done with separate calls to geom_tile on subsets of your data and with packages such as {ggnewscale}.
I think it then would make sense to place the discrete "off-limits" at the respective extremes of your gradient color bar. You need then three geom_tile calls and three scale_fill calls, and you will need to specify the guide order within each scale_fill call. You will then need to play around with the legend margins, but it's not a big problem to make it look OK.
library(tidyverse)
library(RColorBrewer)
tile_data <- data.frame(
category = letters[1:5],
region = paste0("region_", 1:5),
sales = c(1, 2, 5, 0.1, 300)
)
ggplot(tile_data, aes(
x = category,
y = region,
fill = sales
)) +
geom_tile(data = filter(tile_data, sales <= 11 & sales >=1)) +
scale_fill_gradientn(NULL,
limits = c(1, 11),
colors = brewer.pal(11, "Spectral"),
guide = guide_colorbar(order = 2)
) +
ggnewscale::new_scale_fill() +
geom_tile(data = filter(tile_data, sales > 11), mapping = aes(fill = sales > 11)) +
scale_fill_manual("Sales", values = "black", labels = "> 11", guide = guide_legend(order = 1)) +
ggnewscale::new_scale_fill() +
geom_tile(data = filter(tile_data, sales < 1), mapping = aes(fill = sales < 1)) +
scale_fill_manual(NULL, values = "darkblue", labels = "< 1", guide = guide_legend(order = 3)) +
theme_minimal() +
theme(legend.spacing.y = unit(-6, "pt"),
legend.title = element_text(margin = margin(b = 10)))
Created on 2021-11-22 by the reprex package (v2.0.1)
You can try scales::squish: define the limits, and squish the out-of-bounds (oob) values into the scale's range so they take the color of the nearest limit:
p = tile_data %>% ggplot(aes(x=category,y=region,fill=sales))+ geom_tile()
p + scale_fill_gradientn(colors = brewer.pal(11,"Spectral"),
limits = c(1,11),oob=scales::squish)
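To see what squishing does to the values themselves, here is a small sketch using the sales column from the question. The base R pmin()/pmax() line is only an equivalent shown for comparison.

```r
library(scales)

sales <- c(1, 2, 5, 0.1, 300)

# squish() clamps out-of-range values to the nearest limit
squished <- squish(sales, range = c(1, 11))
squished  # 1 2 5 1 11

# The same clamping in base R
clamped <- pmin(pmax(sales, 1), 11)
```

So region_4 is drawn in the same color as a value of 1 and region_5 in the same color as a value of 11. This removes the grey tiles but does not give them distinct colors such as 'darkblue' or 'black'.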
I have a test dataset like this:
df_test <- data.frame(
proj_manager = c('Emma','Emma','Emma','Emma','Emma','Alice','Alice'),
proj_ID = c(1, 2, 3, 4, 5, 6, 7),
stage = c('B','B','B','A','C','A','C'),
value = c(15,15,20,20,20,70,5)
)
Preparation for viz:
input <- select(df_test, proj_manager, proj_ID, stage, value) %>%
filter(proj_manager=='Emma') %>%
do({
proj_value_by_manager = sum(distinct(., proj_ID, value)$value);
mutate(., proj_value_by_manager = proj_value_by_manager)
}) %>%
group_by(stage) %>%
do({
sum_value_byStage = sum(distinct(.,proj_ID,value)$value);
mutate(.,sum_value_byStage= sum_value_byStage)
}) %>%
mutate(count_proj = length(unique(proj_ID)))
commapos <- function(x, ...) {
format(abs(x), big.mark = ",", trim = TRUE,
scientific = FALSE, ...) }
Visualization:
ggplot (input, aes(x=stage, y = count_proj)) +
geom_bar(stat = 'identity')+
geom_bar(aes(y=-proj_value_by_manager),
stat = "identity", fill = "Blue") +
scale_y_continuous(labels = commapos)+
coord_flip() +
ylab('') +
geom_text(aes(label= sum_value_byStage), hjust = 5) +
geom_text(aes(label= count_proj), hjust = -1) +
labs(title = "Emma: 4 projects| $90M Values \n \n Commitment|Projects") +
theme(plot.title = element_text(hjust = 0.5)) +
geom_hline(yintercept = 0, linetype =1)
My questions are:
Why are the y-values not showing up right? E.g. C is labeled 20, but its bar nearly hits 100 on the scale.
How can I adjust the position of the labels so that each sits on top of its bar?
How can I re-scale the y-axis so that both the very short 'count of projects' bars and the long 'project value' bars are displayed well?
Thank you all for the help!
I think your issues are coming from the fact that:
(1) Your dataset has duplicated values. This causes geom_bar to add all of them together. For example there are 3 obs for B where proj_value_by_manager = 90 which is why the blue bar extends to 270 for that group (they all get added).
(2) in your second geom_bar you use y = -proj_value_by_manager but in the geom_text to label this you use sum_value_byStage. That's why the blue bar for A is extending to 90 (since proj_value_by_manager is 90) but the label reads 20.
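The stacking in point (1) can be reproduced without plotting. This minimal sketch rebuilds only the two relevant columns of the Emma rows (the emma data frame is a reduced, hand-written copy of input) and sums them per stage, which is what the stacked identity bars amount to.

```r
# The Emma rows of `input`, reduced to the two relevant columns
emma <- data.frame(
  stage = c("B", "B", "B", "A", "C"),
  proj_value_by_manager = 90  # 15 + 15 + 20 + 20 + 20, repeated on every row
)

# Total length of the stacked blue bar for each stage
bar_length <- tapply(emma$proj_value_by_manager, emma$stage, sum)
bar_length  # A = 90, B = 270, C = 90
```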
To get what I believe is the chart you want, you could do:
# Q1: de-duplicate the dataset so geom_bar doesn't erroneously add rows together
input2 <- input[!duplicated(input[,-c(2,4)]),]
ggplot (input2, aes(x=stage, y = count_proj)) +
geom_bar(stat = 'identity')+
geom_bar(aes(y=-sum_value_byStage), #Q1: changed so this y-value matches your label
stat = "identity", fill = "Blue") +
scale_y_continuous(labels = commapos)+
coord_flip() +
ylab('') +
geom_text(aes(label= sum_value_byStage, y = -sum_value_byStage), hjust = 1) + #Q2: Added in y-value for label and hjust so it will be on top
geom_text(aes(label= count_proj), hjust = -1) +
labs(title = "Emma: 4 projects| $90M Values \n \n Commitment|Projects") +
theme(plot.title = element_text(hjust = 0.5)) +
geom_hline(yintercept = 0, linetype =1)
For your last question, there is no good way to display both of these. One option would be to rescale the small data and still label it with a 1 or 3. However, I didn't do this because once you scale down the blue bars the other bars look OK to me.