Show only one text value in ggplot2 - r

I'm attempting to limit the text printing to one variable in a bar plot. How can I label just the pink bars (601, 215, 399, 456)?
ggplot(df, aes(Var1, value, label=value, fill=Var2)) +
geom_bar(stat="identity", position=position_dodge(width=0.9)) +
geom_text(position=position_dodge(width=0.9))
structure(list(Var1 = structure(c(1L, 2L, 3L, 4L, 1L, 2L, 3L,
4L, 1L, 2L, 3L, 4L), .Label = c("Zero", "1-30", "31-100", "101+"
), class = "factor"), Var2 = structure(c(1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 3L, 3L, 3L, 3L), .Label = c("Searches", "Contact",
"Accepts"), class = "factor"), value = c(21567, 215, 399, 456,
13638, 99, 205, 171, 5806, 41, 88, 78)), .Names = c("Var1", "Var2",
"value"), row.names = c(NA, -12L), class = "data.frame")

You can do this with an ifelse statement in geom_text. First, remove label=value from the main ggplot call. Then, in geom_text, add an ifelse condition on the label as shown below. Also, if you're dodging more than one layer, you can save some typing by creating the position_dodge object once and reusing it.
pd = position_dodge(0.9)
ggplot(df, aes(Var1, value, fill=Var2)) +
geom_bar(stat="identity", position=pd) +
geom_text(position=pd, aes(label=ifelse(Var2=="Searches", value,"")))
If you want the text in the middle of the bar, rather than at the top, you can do:
geom_text(position=pd, aes(label=ifelse(Var2=="Searches", value, ""), y=0.5*value))
You can actually keep the label statement (with the ifelse condition added) in the main ggplot call, but since label only applies to geom_text (or geom_label), I usually keep it with the geom rather than the main call.
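As a variation on the same idea, the label can be computed as a column before plotting, which makes it easy to inspect what will be printed. This is only a sketch: the column name lab, the use of geom_col (shorthand for geom_bar with stat="identity") and the vjust value are my own choices, and the data frame is re-typed from the dput output in the question.

```r
library(ggplot2)

# Data re-typed from the question's dput output
df <- data.frame(
  Var1  = factor(rep(c("Zero", "1-30", "31-100", "101+"), times = 3),
                 levels = c("Zero", "1-30", "31-100", "101+")),
  Var2  = factor(rep(c("Searches", "Contact", "Accepts"), each = 4),
                 levels = c("Searches", "Contact", "Accepts")),
  value = c(21567, 215, 399, 456, 13638, 99, 205, 171, 5806, 41, 88, 78)
)

# Hypothetical helper column: the value for "Searches", an empty string otherwise.
# ifelse() coerces the result to character because of the "" branch.
df$lab <- ifelse(df$Var2 == "Searches", df$value, "")

pd <- position_dodge(0.9)
p <- ggplot(df, aes(Var1, value, fill = Var2)) +
  geom_col(position = pd) +
  # vjust nudges the labels just above the bar tops (an arbitrary choice)
  geom_text(aes(label = lab), position = pd, vjust = -0.3)
p
```

Keeping all rows in the data (with empty labels) rather than subsetting means every label layer still sees all three groups, so the dodging stays aligned with the bars.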

Related

gganimate transition_reveal() with geom_line() breaking on the final frame?

I am trying to animate a line graph with multiple lines. It seems that there is an error with the gganimate package involving transition_reveal() that is causing the final frame to revert for all of the lines but one. This error is not present when not using gganimate. Here is the code:
df <- read.csv("test.csv", stringsAsFactors = TRUE)
anim <- ggplot(df, aes(Day, Accidents, group = State, color = State)) +
geom_line() +
transition_reveal(Day) +
ease_aes('cubic-in-out')
jiff <- animate(anim, fps = 24, duration = 5, start_pause = 0, end_pause = 72, height = 4, width = 7, units = "in", res = 150)
jiff
Here is the dput of the dataframe:
structure(list(State = structure(c(1L, 2L, 3L, 4L, 1L, 2L, 3L,
4L, 1L, 2L, 3L, 4L), levels = c("A", "B", "C", "D"), class = "factor"),
Day = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L),
Accidents = c(5L, 2L, 5L, 6L, 1L, 2L, 6L, 8L, 4L, 10L, 2L,
4L)), class = "data.frame", row.names = c(NA, -12L))
Here is the output:
Regardless of the ending pause or how many values I have along the x-axis, the final frame will always look like this with only one line appearing as updated. Does anyone know why this might be happening?
UPDATE: Reverting the gganimate package from 1.0.8 to 1.0.7 did seem to do the trick after all.
The issue is in the arguments start_pause = 0, end_pause = 72. Remove or adapt them:
anim <- ggplot(df, aes(Day, Accidents, group= State, color = State)) +
geom_line() +
transition_reveal(Day) +
ease_aes('cubic-in-out')
animate(anim, fps = 24, duration = 5,
height = 4, width = 7, units = "in", res = 150)

Plot factors in order with grouping variable

I'm working with R. I have a dataframe that looks like this:
df <- (structure(list(year = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L,
5L, 5L), .Label = c("2013", "2014", "2015", "2016", "2017"),
class = "factor"), user = structure(c(2L, 4L, 1L, 3L, 5L, 2L, 4L, 1L,
3L, 5L, 2L, 4L, 1L, 3L, 5L, 2L, 4L, 1L, 3L, 5L, 2L, 4L, 1L, 3L, 5L),
.Label = c("John", "Laura", "Liz", "Mark", "Martha"), class = "factor"),
spent = c(56, 64, 69, 38, 93, 70, 29, 94, 56, 76, 48, 17,
74, 67, 100, 29, 16, 23, 10, 51, 72, 35, 77, 83, 17)),
class = "data.frame", row.names = c(NA, -25L)))
I'm trying to generate a histogram with the "spent" variable on the y-axis, the "user" on the x-axis, and a facet for each year. For each year, the users should be ordered based on the "spent" variable.
I tried something like df$user2=factor(df$user, levels = df$user[order(df$year,df$spent)])
But I get an error saying that the 6th factor is duplicated.
Any help is greatly appreciated!
Gerry
What you are describing is a bar plot. A histogram shows the distribution of a single continuous variable (for example, hist(rnorm(100))).
Your ordering statement gave an error because each level in a factor variable (each unique value of user in this case) can appear only once in the levels argument. factor allows you to set a new ordering of the unique levels of user. For example, instead of alphabetic ordering, we can do levels=c("Liz","Laura","Mark","John","Martha"). Then df[order(df$user),] will sort the data frame by the new order of user and df[order(df$year, df$user),] will sort by year, then user. However, we can't use factor to get a different order of user for each year.
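The point about unique levels can be checked with a small base R sketch. The vector below is made up for illustration (it is not the asker's data), and the tryCatch wrapper is only there so the example keeps running after the error.

```r
# Made-up vector in which two users appear twice
user <- c("Laura", "Mark", "John", "Liz", "Martha", "Laura", "Mark")

# Passing the raw values as levels fails, because "Laura" and "Mark" repeat
res <- tryCatch(
  factor(user, levels = user),
  error = function(e) conditionMessage(e)
)
res  # the error message, as a character string

# Each level listed once, in a custom order, works
f <- factor(user, levels = c("Liz", "Laura", "Mark", "John", "Martha"))

# Sorting now follows the custom level order rather than the alphabet
sorted_users <- as.character(sort(f))
sorted_users
```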
Based on your description, it looks like you want a faceted plot, but with a different x-axis order in each facet. You can do this in ggplot if you create a new variable that sets the x-axis order (I've called this variable r below) and then use the labels argument in scale_x_continuous to get the desired axis labels.
library(tidyverse)
df = df %>%
# Convert year back to numeric
mutate(year = as.numeric(as.character(year))) %>%
# Sort data into the order we want
arrange(year, spent) %>%
# Create a new variable with the desired row order
mutate(r = row_number())
ggplot(df, aes(r, spent)) +
geom_col() +
facet_grid(. ~ year, scales="free_x") +
scale_x_continuous(breaks=df$r, labels=df$user)
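To see what the r variable looks like, here is a base R sketch of the same sort-then-number step. The toy data frame is made up (two years, three users) and is not the data from the question.

```r
# Made-up toy data: the spending order differs between the two years
toy <- data.frame(
  year  = rep(c(2013, 2014), each = 3),
  user  = rep(c("John", "Laura", "Liz"), times = 2),
  spent = c(69, 56, 38, 29, 70, 56),
  stringsAsFactors = FALSE
)

# Sort by year, then by spent within year (the arrange() step)
toy <- toy[order(toy$year, toy$spent), ]

# Number the rows in their sorted order (the row_number() step)
toy$r <- seq_len(nrow(toy))

# r is unique per row, so it can serve as the x position,
# while user supplies the axis labels at those positions
toy[, c("year", "user", "spent", "r")]
```

Because r runs across all facets, each year occupies its own block of x positions, which is why free x scales are needed in the faceted plot.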
The above plot seems confusing due to the user order changing in each facet. Maybe something like this would work better:
ggplot(df, aes(year, spent, colour=user, group=user)) +
geom_line() +
geom_point() +
geom_text(data=df %>% filter(year==min(year)), aes(label=user),
hjust=1, position=position_nudge(x=-0.1), size=3) +
expand_limits(y=0, x=2012.5) +
theme_classic() +
guides(colour="none")

Center Labels in Filled Bar Chart using geom_text

I am new to ggplot2 (and R) and am trying to make a filled bar chart with labels in each box indicating the percentage composing that block.
Here is an example of my current figure to which I would like to add labels:
##ggplot figure
library(ggplot2)
library(scales)
#specify order I want in plots
ZIU$Affinity=factor(ZIU$Affinity, levels=c("High", "Het", "Low"))
ZIU$Group=factor(ZIU$Group, levels=c("ZUM", "ZUF", "ZIM", "ZIF"))
ggplot(ZIU, aes(x=Group))+
geom_bar(aes(fill=Affinity), position="fill", width=1, color="black")+
scale_y_continuous(labels=percent_format())+
scale_fill_manual("Affinity", values=c("High"="blue", "Het"="lightblue", "Low"="gray"))+
labs(x="Group", y="Percent Genotype within Group")+
ggtitle("Genotype Distribution", "by Group")
I would like to add labels centered in each box showing the percentage that box represents.
I have tried to add labels using the code below, but it keeps producing the error message "Error: geom_text requires the following missing aesthetics: y". My plot has no y aesthetic; does this mean I cannot use geom_text? (I am also not sure whether, once the y aesthetic issue is resolved, the rest of the geom_text statement will give me what I want: centered white labels in each box.)
ggplot(ZIU, aes(x=Group)) +
geom_bar(aes(fill=Affinity), position="fill", width=1, color="black")+
geom_text(aes(label=paste0(sprintf("%.0f", ZIU$Affinity),"%")),
position=position_fill(vjust=0.5), color="white")+
scale_y_continuous(labels=percent_format())+
scale_fill_manual("Affinity", values=c("High"="blue", "Het"="lightblue", "Low"="gray"))+
labs(x="Group", y="Percent Genotype within Group")+
ggtitle("Genotype Distribution", "by Group")
Also if anyone has suggestions for eliminating the NA values that would be appreciated! I tried
geom_bar(aes(fill=na.omit(Affinity)), position="fill", width=1, color="black")
but was getting the error "Error: Aesthetics must be either length 1 or the same as the data (403): fill, x"
dput(sample)
structure(list(Group = structure(c(3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L), .Label = c("ZUM", "ZUF", "ZIM", "ZIF"), class = "factor"),
StudyCode = c(1, 2, 3, 4, 5, 6, 20, 21, 22, 23, 143, 144,
145, 191, 192, 193, 194, 195, 196, 197, 10, 24, 25, 26, 27,
28, 71, 72, 73, 74, 274, 275, 276, 277, 278, 279, 280, 290,
291, 292), Affinity = structure(c(3L, 2L, 1L, 2L, 3L, 1L,
1L, 2L, 2L, 2L, 2L, 2L, 3L, 2L, 3L, 2L, 3L, 1L, 1L, 1L, 3L,
2L, 1L, 2L, 2L, 1L, 2L, 2L, 3L, 3L, 2L, 1L, 3L, 2L, 1L, 3L,
3L, 2L, 2L, 2L), .Label = c("High", "Het", "Low"), class = "factor")), .Names = c("Group",
"StudyCode", "Affinity"), row.names = c(NA, 40L), class = c("tbl_df",
"tbl", "data.frame"))
Thank you so much!
The linked examples have a y aesthetic, because the data are pre-summarized, rather than having ggplot do the counting internally. With your data, the analogous approach would be:
library(scales)
library(tidyverse)
# Summarize data to get counts and percentages
ZIU %>% group_by(Group, Affinity) %>%
tally %>%
mutate(percent=n/sum(n)) %>% # Pipe summarized data into ggplot
ggplot(aes(x=Group, y=percent, fill=Affinity)) +
geom_bar(stat="identity", width=1, color="black") +
geom_text(aes(label=paste0(sprintf("%1.1f", percent*100),"%")),
position=position_stack(vjust=0.5), colour="white") +
scale_y_continuous(labels=percent_format()) +
scale_fill_manual("Affinity", values=c("High"="blue", "Het"="lightblue", "Low"="gray")) +
labs(x="Group", y="Percent Genotype within Group") +
ggtitle("Genotype Distribution", "by Group")
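The summarizing step can also be sketched in base R, which is a convenient way to check the percentages before plotting. The toy data below is made up, and the column names percent and label are my own choices.

```r
# Made-up toy data: one row per observation, as in the question
toy <- data.frame(
  Group    = c("ZUM", "ZUM", "ZUM", "ZUM", "ZUF", "ZUF", "ZUF", "ZUF", "ZUF"),
  Affinity = c("High", "High", "Het", "Low", "Het", "Het", "Het", "Low", "Low"),
  stringsAsFactors = FALSE
)

# Count each Group/Affinity combination
counts <- table(toy$Group, toy$Affinity)

# Convert counts to proportions within each Group (rows sum to 1)
pct <- prop.table(counts, margin = 1)

# Reshape to long format: one row per Group/Affinity pair
summ <- as.data.frame(as.table(pct), responseName = "percent")
names(summ)[1:2] <- c("Group", "Affinity")

# Build the same label text used in geom_text above
summ$label <- paste0(sprintf("%1.1f", summ$percent * 100), "%")
summ
```

Unlike tally, table also keeps combinations with zero observations, so such rows would need to be filtered out before labelling.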
Another option would be to use a line plot, which might make the relative values more clear. Assuming the Group values don't form a natural sequence, the lines are just there as a guide for differentiating the Affinity values across different values of Group.
ZIU %>% group_by(Group, Affinity) %>%
tally %>%
mutate(percent=n/sum(n)) %>% # Pipe summarized data into ggplot
ggplot(aes(x=Group, y=percent, colour=Affinity, group=Affinity)) +
geom_line(alpha=0.4) +
geom_text(aes(label=paste0(sprintf("%1.1f", percent*100),"%")), show.legend=FALSE) +
scale_y_continuous(labels=percent_format(), limits=c(0,1)) +
labs(x="Group", y="Percent Genotype within Group") +
ggtitle("Genotype Distribution", "by Group") +
guides(colour=guide_legend(override.aes=list(alpha=1, size=1))) +
theme_classic()

How to plot errorbars on this plot and change the overlay?

Hi, I have this dataset:
tdat=structure(list(Condition = structure(c(1L, 3L, 2L, 1L, 3L, 2L,
1L, 3L, 2L, 1L, 3L, 2L, 1L, 3L, 2L, 1L, 3L, 2L, 1L, 3L, 2L, 1L,
3L, 2L, 1L, 3L, 2L), .Label = c("AS", "Dup", "MCH"), class = "factor"),
variable = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L), .Label = c("Bot", "Top", "All"), class = "factor"),
value = c(1.782726022, 1, 2.267946449, 1.095240234, 1, 1.103630141,
1.392545278, 1, 0.854984833, 4.5163067, 1, 4.649271897, 0.769428018,
1, 0.483117123, 0.363854608, 1, 0.195799358, 0.673186975,
1, 1.661568993, 1.174998373, 1, 1.095026419, 1.278455823,
1, 0.634152231)), .Names = c("Condition", "variable", "value"
), row.names = c(NA, -27L), class = "data.frame")
> head(tdat)
  Condition variable    value
1        AS      Bot 1.782726
2       MCH      Bot 1.000000
3       Dup      Bot 2.267946
4        AS      Bot 1.095240
5       MCH      Bot 1.000000
6       Dup      Bot 1.103630
I can plot it like that using this code :
ggplot(tdat, aes(x=interaction(Condition,variable,drop=TRUE,sep='-'), y=value,
fill=Condition)) +
geom_point() +
scale_color_discrete(name='interaction levels')+
stat_summary(fun.y='mean', geom='bar',
aes(label=signif(..y..,4),x=as.integer(interaction(Condition,variable))))
I have 2 questions:
1. How do I change the overlay so the black points are not hidden by the bar chart (3 points should be visible per column)?
2. How do I add vertical error bars on top of the bars using the standard deviation of the black points?
I'm not much in favor of mixing error bars with a bar plot.
In ggplot2 geoms are drawn in the order you add them to the plot. So, in order to have the points not hidden, add them after the bars.
ggplot(tdat, aes(x=interaction(Condition,variable,drop=TRUE,sep='-'), y=value,
fill=Condition)) +
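# mean_sdl needs the Hmisc package; its arguments are passed through fun.args
stat_summary(fun.data="mean_sdl", fun.args=list(mult=1), geom="errorbar") +
# fun replaces the deprecated fun.y (ggplot2 3.3.0 and later)
stat_summary(fun="mean", geom="bar") +
geom_point(show.legend=FALSE) +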
scale_fill_discrete(name='interaction levels')
Like this:
library(plyr) # provides ddply() and summarise()
tdat$x <- with(tdat,interaction(Condition,variable,drop=TRUE,sep='-'))
tdat_err <- ddply(tdat,.(x),
summarise,ymin = mean(value) - sd(value),
ymax = mean(value) + sd(value))
ggplot(tdat, aes(x=x, y=value)) +
stat_summary(fun='mean', geom='bar',
aes(label=signif(..y..,4),fill=Condition)) +
geom_point() +
geom_errorbar(data = tdat_err,aes(x = x,ymin = ymin,ymax = ymax,y = NULL),width = 0.5) +
labs(fill = 'Interaction Levels')
I've cleaned up your code somewhat. You will run into fewer problems if you move any extraneous computations outside of your ggplot() call. Better to create the new x variable first. Everything is more readable that way too.
The overlaying issue just requires re-ordering the layers.
Note that you were using scale_colour_* when you had mapped fill not colour (this is a very common error).
The only other "trick" was the un-mapping of y. Normally, when things get tricky I omit aes from the top level ggplot call entirely to make sure that each layer gets only the aesthetics that it needs.
For the error bars, again, I tend to create the data frame outside of ggplot first. I find that cleaner and easier to read.
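If plyr is not available, the error-bar data frame can be built in base R as well. This is a minimal sketch on made-up toy values; the names toy and toy_err are hypothetical, and the columns mirror the ymin/ymax layout used above.

```r
# Made-up toy data: three observations for each of two x categories
toy <- data.frame(
  x     = rep(c("AS-Bot", "MCH-Bot"), each = 3),
  value = c(1.8, 1.1, 1.4, 1, 1, 1),
  stringsAsFactors = FALSE
)

# Per-category mean and standard deviation
m <- tapply(toy$value, toy$x, mean)
s <- tapply(toy$value, toy$x, sd)

# One row per category with the error-bar limits (mean minus/plus one sd)
toy_err <- data.frame(
  x    = names(m),
  ymin = as.numeric(m - s),
  ymax = as.numeric(m + s),
  stringsAsFactors = FALSE
)
toy_err
```

A data frame shaped like toy_err can then be passed to geom_errorbar through its data argument, as in the answer above.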

placing linear line based on the aggregate data in ggplot2

dput(x)
structure(list(Date = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L, 3L, 3L, 3L, 3L), .Label = c("1/1/2012", "2/1/2012", "3/1/2012"
), class = "factor"), Server = structure(c(1L, 2L, 3L, 4L, 1L,
2L, 3L, 4L, 1L, 2L, 3L, 4L), .Label = c("A", "B", "C", "D"), class = "factor"),
Storage = c(10000L, 20000L, 30000L, 15000L, 15000L, 25000L,
35000L, 15700L, 16000L, 27000L, 37000L, 16700L)), .Names = c("Date",
"Server", "Storage"), class = "data.frame", row.names = c(NA,
-12L))
I would like to create a stacked bar chart with x=Date and y=Storage, and also add a linear trend line based on the total storage.
I have come up with this ggplot line:
ggplot(x, aes(x=Date, y=Storage)) + geom_bar(aes(x=Date,y=Storage,fill=Server), stat="identity", position="stack") + geom_smooth(aes(group=1),method="lm", size=2, color="red")
It kind of works, but the linear line is not based on the total storage for a given Date in the data frame x. Is there an easy way to do this?
Often the easiest way is just to calculate the values outside of ggplot2. So calculate the totals:
dd = as.data.frame(tapply(x$Storage, x$Date, sum))
dd$Date = rownames(dd)
colnames(dd)[1] = "Storage"
then add a geom_smooth call but specify the data:
ggplot(x, aes(x=Date, y=Storage)) +
geom_bar(aes(x=Date,y=Storage, fill=Server), stat="identity", position="stack") +
geom_smooth(data = dd, aes(x=Date, y=Storage, group=1),method="lm")
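The totals step can also be written with aggregate, which returns the Date and Storage columns directly. A short sketch, with the data re-typed from the dput output in the question (Date is kept as plain text here rather than as a factor):

```r
# Data re-typed from the question's dput output
x <- data.frame(
  Date    = rep(c("1/1/2012", "2/1/2012", "3/1/2012"), each = 4),
  Server  = rep(c("A", "B", "C", "D"), times = 3),
  Storage = c(10000, 20000, 30000, 15000,
              15000, 25000, 35000, 15700,
              16000, 27000, 37000, 16700),
  stringsAsFactors = FALSE
)

# Sum Storage over all servers for each Date
dd <- aggregate(Storage ~ Date, data = x, FUN = sum)
dd
```

The result has one row per Date, so it can be handed to geom_smooth through the data argument in the same way as the tapply version.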
