Add facet_grid panel means as text and hline - r

I have a dataframe that looks like this.
> head(df)
DGene JGene cdr3_len Sum
1 IGHD1 IGHJ1 0 22
2 IGHD1 IGHJ1 1 11
3 IGHD1 IGHJ1 2 16
4 IGHD1 IGHJ1 3 40
5 IGHD1 IGHJ1 4 18
6 IGHD1 IGHJ1 5 30
...
It is pretty simple to plot this with facet_grid:
ggplot(df,aes(x=cdr3_len,y=Sum)) + geom_line() + xlim(c(1,42)) + facet_grid(JGene~DGene,scales="free_y")
This gives a grid of panels, one per JGene/DGene combination.
I was wondering if anyone could help me add an hline at the mean of each panel, or possibly print the mean of each panel in its top right corner.
Thanks,
Edit -
Full link to dataframe

Here's a way to add both text and a vertical line for the mean of cdr3_len by pre-computing the desired values (per @jwillis0720's comment):
First, calculate the mean of cdr3_len for each panel and then left_join that data frame to a second data frame that calculates the appropriate y-value for placing the text on each panel (because the appropriate y-value varies only by level of JGene).
library(dplyr)
# Weighted mean of cdr3_len per panel, joined to a label y-position per JGene row
meanData = df %>% group_by(JGene, DGene) %>%
  summarise(meanCDR = sum(Sum*cdr3_len)/sum(Sum)) %>%
  left_join(df %>% group_by(JGene) %>%
              summarise(ypos = 0.9*max(Sum)),
            by = "JGene")
Now for the plot:
ggplot(df,aes(x=cdr3_len, y=Sum)) +
geom_vline(data=meanData, aes(xintercept=meanCDR), colour="red", lty=3) +
geom_line() +
geom_text(data=meanData,
aes(label=round(meanCDR,1), x=40, y=ypos), colour="red",
hjust=1) +
xlim(c(1,42)) +
facet_grid(JGene~DGene,scales="free_y")
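The question also asked for a horizontal line at each panel's mean. Here is a minimal sketch of that variant, assuming the plain mean of Sum is what is wanted. The names toy and panelMeans are hypothetical, and the toy values are made up only so the example runs on its own:

```r
library(dplyr)
library(ggplot2)

# Toy stand-in for df (made-up values, same column names as in the question)
set.seed(1)
toy <- expand.grid(DGene = c("IGHD1", "IGHD2"),
                   JGene = c("IGHJ1", "IGHJ2"),
                   cdr3_len = 1:42,
                   stringsAsFactors = FALSE)
toy$Sum <- rpois(nrow(toy), lambda = 20)

# One row per panel holding the plain mean of Sum
panelMeans <- toy %>%
  group_by(JGene, DGene) %>%
  summarise(meanSum = mean(Sum), .groups = "drop")

p <- ggplot(toy, aes(x = cdr3_len, y = Sum)) +
  geom_line() +
  # Horizontal line at each panel's mean
  geom_hline(data = panelMeans, aes(yintercept = meanSum),
             colour = "red", linetype = 3) +
  # x = Inf, y = Inf pins the label to the top right corner of every panel
  geom_text(data = panelMeans,
            aes(label = round(meanSum, 1), x = Inf, y = Inf),
            colour = "red", hjust = 1.1, vjust = 1.5) +
  facet_grid(JGene ~ DGene, scales = "free_y")
```

Using Inf for the label position avoids computing a separate y-value per row of panels, at the cost of less control over the exact placement.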

Related

r ggplot stack identity

(image: the chart I want to make)
I want to make a chart like this image.
My data is difficult to disclose, so I made up some arbitrary data.
data
no total outcome
1 800 40
2 700 30
3 650 27
4 600 25
5 500 20
I tried:
ggplot(data, aes(x=no, y=total))+
  geom_bar(stat="identity")+
  labs(x="No", y="Total")+
  scale_y_continuous(breaks=seq(900,100)) +
  theme_minimal()
This is what I want to make:
(image: the target chart)
The black bar is total and the grey bar is outcome.
Please help.
I wrote this post using Papago, so some sentences may be awkward. Thank you for your understanding.
To draw the chart you want, you need to pivot your data.
You can pivot (make wide data long) with reshape2::melt():
df <- data.frame(
no = 1:5,
total = c(800,700,650,600,500),
outcome = c(40,30,27,25,20)
)
require(ggplot2)
require(reshape2)
df_long <- melt(df,
id.vars='no',
measure.vars=c('total','outcome'))
ggplot(df_long, aes(x=no, y=value, fill=variable))+
geom_col()+
scale_fill_manual(values = c('black','grey')) +
scale_y_continuous(breaks=seq(0,900,100))+
theme_minimal() +
labs(x='No', y='Total')
Note that geom_bar(stat='identity') is equivalent to geom_col().

How to add legend on a line plot?

I have data like this:
year catch group
2011 22 1
2012 45 1
2013 34 1
2011 11 2
2012 22 2
2013 32 2
I would like to have the number of the group (1 and 2) to appear above the line in the plot.
Any suggestion?
My real data has 8 groups in total with 8 lines which makes it hard to see because the lines cross one another and the colors of the legend are similar.
I tried this:
library(ggplot2)
ggplot(aes(x=as.factor(year), y=catch, group=as.factor(group),
col=as.factor(group)), data=df) +
geom_line() +
geom_point() +
xlab("year") +
labs(color="group")
Firstly, distinguishing 8 different colours is very difficult. That's why your 8 groups seem to have similar colors.
What you want in this case is not a legend (which usually is an off-chart summary), but rather "annotation".
You can directly add the groups with
ggplot(...) +
geom_text(aes(x=as.factor(year), y=catch, label=group)) +
...
and then try to tweak the position of the text with nudge_x and nudge_y. But if you wanted only 1 label per group, you would have to prepare a data frame with it:
library(dplyr)
# Keep one row per group (its earliest year) to carry that group's label
labels <- df %>% group_by(group) %>% top_n(1, -year)
ggplot(...) +
geom_text(data=labels, aes(x=as.factor(year), y=catch, label=group)) +
...
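Putting the pieces together with the data from the question, a self-contained sketch might look like this. It is one possible arrangement, not the only one: it places the label at each group's last year (slice_max) rather than the first, and the nudge_x value is an arbitrary choice:

```r
library(dplyr)
library(ggplot2)

# Data as shown in the question
df <- data.frame(
  year  = rep(2011:2013, times = 2),
  catch = c(22, 45, 34, 11, 22, 32),
  group = rep(1:2, each = 3)
)

# One row per group: the last year, so the label sits at the right end of the line
labels <- df %>%
  group_by(group) %>%
  slice_max(year, n = 1) %>%
  ungroup()

p <- ggplot(df, aes(x = as.factor(year), y = catch,
                    group = as.factor(group), colour = as.factor(group))) +
  geom_line() +
  geom_point() +
  # Labels inherit x, y and colour from the plot; only the data and label change
  geom_text(data = labels, aes(label = group),
            nudge_x = 0.1, show.legend = FALSE) +
  labs(x = "year", colour = "group")
```

With direct labels in place, the colour legend becomes optional and could be dropped with theme(legend.position = "none").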

Color points by their occurrence count in ggplot2 geom_count

I want to color the points drawn by ggplot2's geom_count based on their count.
This is what I have so far:
ggplot(test3, aes(eleStart, eleLength)) + geom_count(aes(alpha=0.25, color= ..prop..)) +
scale_y_continuous(breaks=seq(0,130,5)) +
scale_x_continuous(breaks=seq(0,114)) +
theme(panel.grid.minor = element_blank())
Now I basically just want to exchange the color=..prop.. with the actual count calculated by geom_count, not their proportion.
test3 dataframe looks like:
# A tibble: 294 x 2
# Groups: X1 [56]
eleStart eleLength
<int> <int>
1 0 3
2 0 6
3 0 7
4 0 9
5 0 11
6 0 23
7 0 25
8 0 26
9 0 26
10 0 26
# ... with 284 more rows
You can color points by their occurrence with color = ..n.. in aes. See the following example:
ggplot(mtcars, aes(cyl, carb)) + geom_count(aes(color = ..n..))
To know all the computed variables that can be accessed with ..x.. syntax, you can check the manual of a geom_* function for "Computed variables". For geom_count, it looks like:
Computed variables
n number of observations at position
prop percent of points in that panel at that position
If you want to "combine the 2 legends into one legend with colorized points", try the following:
ggplot(mtcars, aes(cyl, carb)) +
geom_count(aes(color = ..n.., size = ..n..)) +
guides(color = 'legend')
Color is displayed as a colorbar by default. Here, guides(color = 'legend') tells ggplot to display it as a legend instead of a separate colorbar.
If you examine the help file for the geom_count function: help(geom_count), you will see a list of its Computed variables.
Computed variables
n
number of observations at position
prop
percent of points in that panel at that position
So you can use geom_count(aes(alpha=0.25, color= ..n..)) to color by the number of observations at a position and geom_count(aes(alpha=0.25, color= ..prop..)) to color by the percent of points at that position.
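In ggplot2 3.3.0 and later, the same computed variables can also be reached with after_stat(), which is the notation current versions recommend over ..n... A short sketch under that assumption, with alpha moved outside aes() because it is a constant rather than a mapping:

```r
library(ggplot2)

p <- ggplot(mtcars, aes(cyl, carb)) +
  # after_stat(n) refers to the count computed by the stat behind geom_count
  geom_count(aes(colour = after_stat(n), size = after_stat(n)),
             alpha = 0.25) +
  # Show colour as a discrete legend so it can merge with the size legend
  guides(colour = "legend")

# Inspect the computed values the plot is built from
built <- layer_data(p)
```

Setting alpha outside aes() keeps it from producing its own legend entry.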

Stacked bar plot based in 4 variables with ggplot2

I have a data frame like this:
nthreads ab_1 ab_2 ab_3 ab_4 ...
1 0 0 0 0 ...
2 1 0 12 1 ...
4 2 1 22 1 ...
8 10 2 103 8 ...
Each ab_X represents different causes that trigger an abort in my code. I want to summarize all abort causes in a barplot displaying nthreads vs aborts with different ab_X stacked in each bar.
I can do
ggplot(data, aes(x=factor(nthreads), y=ab_1+ab_2+ab_3+ab_4)) +
geom_bar(stat="identity")
But it only gives the total number of aborts. I know there is a fill aes, but I can not make it work with continuous variables.
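You have to melt the data frame first:
library(data.table)
library(ggplot2)
# Convert to a data.table so that data.table's own melt method is used
dt_melt <- melt(as.data.table(data), id.vars = 'nthreads')
ggplot(dt_melt, aes(x = factor(nthreads), y = value, fill = variable)) +
  geom_bar(stat = 'identity')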
It gives the total number of aborts because you are adding them together :)
You need to get your data from wide to long format first, i.e. create one column for the abort causes and a second for their values. You can use tidyr::gather for that. I also find geom_col more convenient than geom_bar:
library(tidyr)
library(ggplot2)
data %>%
gather(abort, value, -nthreads) %>%
ggplot(aes(factor(nthreads), value)) +
geom_col(aes(fill = abort)) +
labs(x = "nthreads", y = "count")
Note that the range of values makes some of the bars rather hard to see, so you might want to think about scales and maybe even facets.
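In current tidyr, pivot_longer() is the recommended replacement for gather(). A sketch of the same reshaping with it, using only the four ab_X columns shown in the question (the elided columns are left out) and the hypothetical name long for the result:

```r
library(tidyr)
library(ggplot2)

# Only the columns visible in the question
data <- data.frame(
  nthreads = c(1, 2, 4, 8),
  ab_1 = c(0, 1, 2, 10),
  ab_2 = c(0, 0, 1, 2),
  ab_3 = c(0, 12, 22, 103),
  ab_4 = c(0, 1, 1, 8)
)

# Wide to long: one row per (nthreads, abort cause) pair
long <- pivot_longer(data, cols = -nthreads,
                     names_to = "abort", values_to = "value")

p <- ggplot(long, aes(factor(nthreads), value, fill = abort)) +
  geom_col() +
  labs(x = "nthreads", y = "count")
```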

Creating a Bar Plot with Proportions on ggplot

I'm trying to create a bar graph on ggplot that has proportions rather than counts, and I have c+geom_bar(aes(y=(..count..)/sum(..count..)*100)) but I'm not sure what either of the counts refer to. I tried putting in the data but it didn't seem to work. What should I input here?
This is the data I'm using
> describe(topprob1)
topprob1
n missing unique Info Mean
500 0 9 0.93 3.908
1 2 3 4 5 6 7 8 9
Frequency 128 105 9 15 13 172 39 12 7
% 26 21 2 3 3 34 8 2 1
You haven't provided a reproducible example, so here's an illustration with the built-in mtcars data frame. Compare the following two plots. The first gives counts. The second gives proportions, which are displayed in this case as percentages. ..count.. is an internal variable that ggplot creates to store the count values.
library(ggplot2)
library(scales)
ggplot(mtcars, aes(am)) +
geom_bar()
ggplot(mtcars, aes(am)) +
geom_bar(aes(y=..count../sum(..count..))) +
scale_y_continuous(labels=percent_format())
You can also use ..prop.. computed variable with group aesthetics:
library(ggplot2)
library(scales)
ggplot(mtcars, aes(am)) +
geom_bar(aes(y=..prop.., group = 1)) +
scale_y_continuous(labels=percent_format())
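To see what ..count../sum(..count..) refers to, it may help to compute the same proportions by hand and plot them directly. A small sketch; props is a hypothetical name, and Freq is the column name that as.data.frame() gives a table:

```r
library(ggplot2)
library(scales)

# Count the rows per value of am, then divide by the total
props <- as.data.frame(prop.table(table(am = mtcars$am)))

# props has one row per value of am; Freq holds the proportion
p <- ggplot(props, aes(am, Freq)) +
  geom_col() +
  scale_y_continuous(labels = percent_format())
```

Here geom_col() plots the precomputed proportions as they are, whereas geom_bar() does the counting itself and exposes the result as the computed variable.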

Resources