Changing width of just one column in geom_col - r

I have a dataset with most values very close to 0, and one value closer to 6. I am asked for a bar plot of each value. I have roughly 4000 observations. With so many observations, geom_col can't seem to fit them all in the plot:
library(ggplot2)

test <- data.frame(obs = 1:5000, value = abs(rnorm(5000, 0, .001)))
test[2500, 'value'] <- test[2500, 'value'] + 6
ggplot(test, aes(obs, value)) +
  geom_col() +
  theme_bw()
If I narrow the number of observations graphed, my single large value is plotted:
ggplot(test[2400:2600,], aes(obs, value)) +
  geom_col() +
  theme_bw()
Is it possible to change the thickness of just the single large observation so I can still display the full range of data?

Instead of relying on geom_col or resorting to hacks with geom_rect to change the width of a single bar, you can simply use geom_line and optionally add geom_point to make a dot or lollipop-style plot, which conveys much the same information as a bar plot; see the code below.
BTW: A log scale may also be a good option.
set.seed(42)
library(ggplot2)
test <- data.frame(obs = 1:5000, value = abs(rnorm(5000, 0, .001)))
test[2500, 'value'] <- test[2500, 'value'] + 6
ggplot(test, aes(obs, value, color = value)) +
  geom_line() +
  geom_point() +
  theme_bw()
Created on 2020-06-09 by the reprex package (v0.3.0)
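The two alternatives mentioned above, thin per-observation lines and a log scale, can be sketched roughly as follows. This is only an illustration under assumptions: geom_segment() is swapped in for geom_line() so that every observation gets its own vertical line of equal width (closer in spirit to a bar), and the object names p_segment and p_log are made up for this example.

```r
library(ggplot2)

set.seed(42)
test <- data.frame(obs = 1:5000, value = abs(rnorm(5000, 0, 0.001)))
test[2500, "value"] <- test[2500, "value"] + 6

# One vertical segment per observation, from 0 up to its value.
# Line width does not depend on the number of observations,
# so the single large value stays visible.
p_segment <- ggplot(test, aes(obs, value)) +
  geom_segment(aes(xend = obs, yend = 0)) +
  theme_bw()

# Log-scale variant: points instead of bars, because bars anchored
# at 0 are not meaningful on a log axis.
p_log <- ggplot(test, aes(obs, value)) +
  geom_point(size = 0.5) +
  scale_y_log10() +
  theme_bw()

p_segment
p_log
```

On the log scale the small values are spread out instead of being squashed against zero, so the outlier and the bulk of the data can be read from the same panel.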

Related

Add data label to bar chart in R [duplicate]

I'd like to have some labels stacked on top of a geom_bar graph. Here's an example:
df <- data.frame(x=factor(c(TRUE,TRUE,TRUE,TRUE,TRUE,FALSE,FALSE,FALSE)))
ggplot(df) + geom_bar(aes(x,fill=x)) + opts(axis.text.x=theme_blank(),axis.ticks=theme_blank(),axis.title.x=theme_blank(),legend.title=theme_blank(),axis.title.y=theme_blank())
Now
table(df$x)
FALSE TRUE
3 5
I'd like to have the 3 and 5 on top of the two bars. Even better if I could have the percent values as well, e.g. 3 (37.5%) and 5 (62.5%), like so:
(image; source: skitch.com)
Is this possible? If so, how?
To plot text on a ggplot, use geom_text. I find it helpful to summarise the data first, here with ddply from the plyr package:
library(plyr)
dfl <- ddply(df, .(x), summarize, y=length(x))
str(dfl)
Since the data is pre-summarized, you need to remember to add the stat="identity" parameter to geom_bar:
# theme() and element_blank() replace the opts() and theme_blank() calls used in the question
ggplot(dfl, aes(x, y=y, fill=x)) + geom_bar(stat="identity") +
  geom_text(aes(label=y), vjust=0) +
  theme(axis.text.x=element_blank(),
        axis.ticks=element_blank(),
        axis.title.x=element_blank(),
        legend.title=element_blank(),
        axis.title.y=element_blank()
  )
As with many tasks in ggplot, the general strategy is to put what you'd like to add to the plot into a data frame whose variables match the variables and aesthetics in your plot. For example, you'd create a new data frame like this:
dfTab <- as.data.frame(table(df))
colnames(dfTab)[1] <- "x"
dfTab$lab <- as.character(100 * dfTab$Freq / sum(dfTab$Freq))
This way the x variable matches the corresponding variable in df, and so on. Then you simply include it using geom_text:
# theme() and element_blank() replace the opts() and theme_blank() calls used in the question
ggplot(df) + geom_bar(aes(x,fill=x)) +
  geom_text(data=dfTab,aes(x=x,y=Freq,label=lab),vjust=0) +
  theme(axis.text.x=element_blank(),axis.ticks=element_blank(),
        axis.title.x=element_blank(),legend.title=element_blank(),
        axis.title.y=element_blank())
This example will plot just the percentages, but you can paste together the counts as well via something like this:
dfTab$lab <- paste(dfTab$Freq,paste("(",dfTab$lab,"%)",sep=""),sep=" ")
Note that opts() and theme_blank() are no longer available in current versions of ggplot2; theme() and element_blank() are used instead.
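For reference, here is a compact sketch of the same "summarise into a data frame, then label" strategy written against current ggplot2. It is an illustration rather than the answerer's code: sprintf() is used to build the combined label in one step, and the expanded y axis (via expansion()) is an added assumption so the labels are not clipped at the top of the panel.

```r
library(ggplot2)

df <- data.frame(x = factor(c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE)))

# One row per bar: the level, its count, and a "count (percent%)" label.
dfTab <- as.data.frame(table(x = df$x))
dfTab$lab <- sprintf("%d (%.1f%%)", dfTab$Freq, 100 * dfTab$Freq / sum(dfTab$Freq))

p_tab <- ggplot(df, aes(x, fill = x)) +
  geom_bar() +
  # Labels come from the summary table, positioned at the bar heights.
  geom_text(data = dfTab, aes(x = x, y = Freq, label = lab),
            vjust = -0.5, inherit.aes = FALSE) +
  # Leave some headroom above the tallest bar for the text.
  scale_y_continuous(expand = expansion(mult = c(0, 0.15))) +
  theme(axis.text.x = element_blank(),
        axis.ticks = element_blank(),
        axis.title.x = element_blank(),
        axis.title.y = element_blank(),
        legend.title = element_blank())

p_tab
```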
Another solution is to use stat_count() when dealing with discrete variables (and stat_bin() with continuous ones).
ggplot(data = df, aes(x = x)) +
  geom_bar(stat = "count") +
  stat_count(geom = "text", colour = "white", size = 3.5,
             aes(label = ..count..), position = position_stack(vjust = 0.5))
So, this is our initial plot:
library(ggplot2)
df <- data.frame(x=factor(c(TRUE,TRUE,TRUE,TRUE,TRUE,FALSE,FALSE,FALSE)))
p <- ggplot(df, aes(x = x, fill = x)) +
  geom_bar()
p
As suggested by yuan-ning, we can use stat_count().
geom_bar() uses stat_count() by default. As mentioned in the ggplot2 reference, stat_count() returns two values: count, the number of points in each bin, and prop, the groupwise proportion. Since our groups match the x values, both props are 1 and aren't useful. But we can use count (referred to as "..count.."; ggplot2 3.4.0 and later deprecate this notation in favour of after_stat(count)), which is the bar height, in our geom_text(). Note that we must pass stat = 'count' to our geom_text() call as well.
Since we want both counts and percentages in our labels, we'll need some calculations and string pasting in our "label" aesthetic instead of just "..count..". I prefer to add a line of code that creates a percent formatting function with the "scales" package (installed as a dependency of "ggplot2").
pct_format <- scales::percent_format(accuracy = .1)
p <- p + geom_text(
  aes(
    label = sprintf(
      '%d (%s)',
      ..count..,
      pct_format(..count.. / sum(..count..))
    )
  ),
  stat = 'count',
  nudge_y = .2,
  colour = 'royalblue',
  size = 5
)
p
Of course, you can further edit the labels with colour, size, nudges, adjustments etc.
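The same labelling can be written with after_stat(), which newer ggplot2 releases (3.3.0 and later) offer in place of the "..count.." notation. The following is a sketch, not the answerer's code: the whole label expression is wrapped in after_stat(), scales::percent() is called directly instead of through a stored formatter, and the name p_labels and the vjust/expansion settings are choices made for this example.

```r
library(ggplot2)

df <- data.frame(x = factor(c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE)))

p_labels <- ggplot(df, aes(x = x, fill = x)) +
  geom_bar() +
  geom_text(
    # count is computed by stat_count(); the label is built after the stat has run.
    aes(label = after_stat(
      sprintf("%d (%s)", count, scales::percent(count / sum(count), accuracy = 0.1))
    )),
    stat = "count",
    vjust = -0.5
  ) +
  # Headroom so the labels above the bars are not clipped.
  scale_y_continuous(expand = expansion(mult = c(0, 0.15)))

p_labels
```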

Issue: Plot alpha values scaling with Y-Axis/number of observations in ggplot facets

I haven't found anyone else with this issue. Here is my plot:
(image: facet plot)
Why are there different alpha values for each facet?
As you can see, the alpha value of the geom_rect() elements seems to scale with the y-axis or number of observations, maybe because I have set these to "free_y" in the facet_wrap() argument. How can I prevent this from happening?
Here is my code:
plot_data %>%
  ggplot(aes(Date, n)) +
  geom_rect(data = plot_data, inherit.aes = FALSE,
            aes(xmin = current_date - lubridate::weeks(1), xmax = current_date, ymin = -Inf, ymax = +Inf),
            fill = 'pink', alpha = 0.2) +
  geom_col() +
  facet_wrap(~Type, scales = "free_y") +
  xlab("Date") +
  ylab("Count") +
  theme_bw() +
  scale_y_continuous(breaks = integer_breaks()) +
  scale_alpha_manual(values = 0.2) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
Cheers!
TL;DR - This is probably due to overplotting. You have 5 rect geoms drawn in the facet, but probably more than 5 observations in your dataset. The fix is to summarize your data and have geom_rect() plot from the summarized dataset.
Since OP did not provide an example dataset, we can only guess at the reason, but what's likely happening here is overplotting. geom_rect() behaves like all other geoms, which is to say that ggplot2 draws one geom for every observation (row) in the dataset supplied to the layer. If those geoms overlap in position within a facet, you get overplotting. Two things suggest that this is happening:
1. A different alpha appears on each facet, even though it should be constant based on the code.
2. To get the rectangles to look "light red", OP had to use a pink color and an alpha value of 0.2, which shouldn't look like that if only one rect were drawn.
Representative Example of the Issue
Here's an example that showcases the problem and how you can fix it using mtcars:
library(ggplot2)
df <- mtcars
p <- ggplot(df, aes(disp, mpg)) + geom_point() +
  facet_wrap(~cyl) +
  theme_bw()
p + geom_rect(
  aes(xmin=200, xmax=300, ymin=-Inf, ymax=Inf),
  alpha=0.01, fill='red')
As in OP's case, we expect all rectangles to have the same alpha value, but they do not. Also, note that the alpha value is ridiculously low (0.01) for the color you see there. What's going on becomes more obvious if we check the number of observations in mtcars that fall within each facet:
> library(dplyr)
> mtcars %>% group_by(cyl) %>% tally()
# A tibble: 3 x 2
    cyl     n
  <dbl> <int>
1     4    11
2     6     7
3     8    14
cyl==6 has the fewest observations (7), and cyl==4 (11) has fewer than cyl==8 (14). This corresponds precisely to the alpha values we see for the geoms in the plot. For each observation, a rectangle is drawn at the same position, so there are 7 rectangles drawn in the middle facet, 14 in the right facet, and 11 in the left facet.
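To see why the stacked rectangles end up looking different, one can estimate the combined opacity of n identical layers. The sketch below assumes the usual "over" compositing rule, under which n overlapping shapes of opacity a behave like a single shape of opacity 1 - (1 - a)^n; the actual graphics device may round slightly differently, and the variable names are made up for this example.

```r
alpha <- 0.01

# Number of rectangles drawn per facet (one per row of mtcars in that facet).
n_rect <- c("4" = 11, "6" = 7, "8" = 14)

# Combined opacity of n stacked layers, each with opacity alpha.
effective_alpha <- 1 - (1 - alpha)^n_rect

print(round(effective_alpha, 3))
```

The facet with the most rows gets the most opaque rectangle, which matches the ordering seen in the plot.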
Fixing the Issue: Summarize the Data
To fix the issue, you should summarize your data and use the summarized dataset for plotting the rectangles.
summary_df <- df %>%
  group_by(cyl) %>%
  summarize(mean_d = mean(disp))
p + geom_rect(
  data = summary_df,
  aes(x=1, y=1, xmin=mean_d-50, xmax=mean_d+50, ymin=-Inf, ymax=Inf),
  alpha=0.2, fill='red')
Since summary_df has only 3 observations (one for each group of cyl), the rectangles are drawn correctly, and alpha=0.2 with fill="red" now gives the expected result. One thing to note here is that we still have to define x and y in the aes(). I set them both to 1: although geom_rect() doesn't use them, ggplot2 still expects to find them in summary_df, because they are mapped globally in ggplot(df, aes(x=..., y=...)) and inherited by every layer. The fix is either to move the aes() declaration into geom_point() or to assign both to constant values in geom_rect().
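Two other ways to end up with exactly one rectangle per facet are sketched below; neither is from the answer above. The first uses annotate(), which draws a fixed shape once per panel regardless of how many rows the data has. The second keeps geom_rect() on a summarized data frame but sets inherit.aes = FALSE, so the global x and y mappings do not have to be satisfied. aggregate() stands in for the dplyr summary to keep the example dependency-free, and the object names are illustrative.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(disp, mpg)) +
  geom_point() +
  facet_wrap(~cyl) +
  theme_bw()

# Option 1: a constant rectangle, drawn once in every panel.
p_annot <- p + annotate("rect", xmin = 200, xmax = 300, ymin = -Inf, ymax = Inf,
                        alpha = 0.2, fill = "red")

# Option 2: one row per facet, without inheriting aes(disp, mpg).
summary_df <- aggregate(disp ~ cyl, data = mtcars, FUN = mean)
p_rect <- p + geom_rect(
  data = summary_df,
  aes(xmin = disp - 50, xmax = disp + 50, ymin = -Inf, ymax = Inf),
  inherit.aes = FALSE,
  alpha = 0.2, fill = "red"
)

p_annot
p_rect
```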

R/ggplot2 - Overlapping labels on facet_grid

Folks,
I am plotting histograms using geom_histogram and I would like to label each histogram with the mean value (I am using mean for the sake of this example). The issue is that I am drawing multiple histograms in one facet and I get labels overlapping. This is an example:
library(ggplot2)
df <- data.frame (type=rep(1:2, each=1000), subtype=rep(c("a","b"), each=500), value=rnorm(4000, 0,1))
plt <- ggplot(df, aes(x=value, fill=subtype)) + geom_histogram(position="identity", alpha=0.4)
plt <- plt + facet_grid(. ~ type)
plt + geom_text(aes(label = paste("mean=", mean(value)), colour=subtype, x=-Inf, y=Inf), data = df, size = 4, hjust=-0.1, vjust=2)
Result is:
The problem is that the labels for Subtypes a and b are overlapping. I would like to solve this.
I have tried the position, both dodge and stack, for example:
plt + geom_text(aes(label = paste("mean=", mean(value)), colour=subtype, x=-Inf, y=Inf), position="stack", data = df, size = 4, hjust=-0.1, vjust=2)
This did not help. In fact, it issued a warning about the width.
Would you please help?
Thanks,
Riad.
I think you could precalculate the mean values in a new data frame before plotting.
library(plyr)
df.text<-ddply(df,.(type,subtype),summarise,mean.value=mean(value))
df.text
  type subtype   mean.value
1    1       a -0.003138127
2    1       b  0.023252169
3    2       a  0.030831337
4    2       b -0.059001888
Then use this new data frame in geom_text(). To ensure that the values do not overlap, you can provide two values in vjust= (as there are two values in each facet).
ggplot(df, aes(x=value, fill=subtype)) +
  geom_histogram(position="identity", alpha=0.4)+
  facet_grid(. ~ type)+
  geom_text(data=df.text,aes(label=paste("mean=",mean.value),
            colour=subtype,x=-Inf,y=Inf), size = 4, hjust=-0.1, vjust=c(2,4))
Just to expand on @Didzis:
You actually have two problems here. First, the text overlaps, but more importantly, when you use aggregating functions in aes(...), as in:
geom_text(aes(label = paste("mean=", mean(value)), ...
ggplot does not respect the subsetting implied by the facets (or by the groups, for that matter). So mean(value) is based on the full dataset regardless of faceting or grouping. As a result, you have to use an auxiliary table, as @Didzis shows.
BTW:
df.text <- aggregate(df$value,by=list(type=df$type,subtype=df$subtype),mean)
gets you the means and does not require plyr (note that the column of means is then named x rather than mean.value).
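Putting the two answers together without plyr might look like the sketch below. Some choices here are assumptions made for the example: the formula interface of aggregate() is used so the grouping columns keep their names, the data frame is built with 2000 rows, and vjust is stored as a column and mapped inside aes() rather than passed as c(2,4). The last point matters because aggregate() orders its rows differently from ddply(), so a position-based vjust vector could give both labels in a facet the same offset.

```r
library(ggplot2)

set.seed(1)
df <- data.frame(type = rep(1:2, each = 1000),
                 subtype = rep(c("a", "b"), each = 500),
                 value = rnorm(2000, 0, 1))

# One row per type/subtype combination, computed outside of aes().
df.text <- aggregate(value ~ type + subtype, data = df, FUN = mean)
names(df.text)[3] <- "mean.value"
df.text$label <- sprintf("mean = %.3f", df.text$mean.value)

# Tie the vertical offset to the subtype instead of to the row order.
df.text$vjust <- ifelse(df.text$subtype == "a", 2, 4)

plt <- ggplot(df, aes(x = value, fill = subtype)) +
  geom_histogram(position = "identity", alpha = 0.4, bins = 30) +
  facet_grid(. ~ type) +
  geom_text(data = df.text,
            aes(x = -Inf, y = Inf, label = label, colour = subtype, vjust = vjust),
            hjust = -0.1, size = 4, inherit.aes = FALSE)

plt
```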

Resources