I have data like this:
year catch group
2011 22 1
2012 45 1
2013 34 1
2011 11 2
2012 22 2
2013 32 2
I would like to have the number of the group (1 and 2) to appear above the line in the plot.
Any suggestion?
My real data has 8 groups in total with 8 lines which makes it hard to see because the lines cross one another and the colors of the legend are similar.
I tried this:
library(ggplot2)
ggplot(aes(x=as.factor(year), y=catch, group=as.factor(group),
col=as.factor(group)), data=df) +
geom_line() +
geom_point() +
xlab("year") +
labs(color="group")
Firstly, distinguishing 8 different colours is very difficult, which is why your 8 groups seem to have similar colors.
What you want in this case is not a legend (which is usually an off-chart summary) but rather an annotation.
You can directly add the groups with
ggplot(...) +
geom_text(aes(x=as.factor(year), y=catch, label=group)) +
...
and then tweak the position of the text with nudge_x and nudge_y. If you want only one label per group, you have to prepare a data frame that contains just those rows:
library(dplyr)
# top_n(1, -year) keeps the row with the smallest year in each group
labels <- df %>% group_by(group) %>% top_n(1, -year)
ggplot(...) +
geom_text(data=labels, aes(x=as.factor(year), y=catch, label=group)) +
...
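As a rough end-to-end sketch of the one-label-per-group idea, here is the sample data from the question with a label at the start of each line. The label rows are picked with base R's ave() rather than dplyr (a substitution made here only to keep the example free of extra packages), and the nudge_y value is an arbitrary offset:

```r
library(ggplot2)

# sample data from the question
df <- data.frame(
  year  = rep(2011:2013, times = 2),
  catch = c(22, 45, 34, 11, 22, 32),
  group = rep(1:2, each = 3)
)

# one row per group: the earliest year, mirroring top_n(1, -year)
labels <- df[df$year == ave(df$year, df$group, FUN = min), ]

p <- ggplot(df, aes(x = factor(year), y = catch,
                    group = factor(group), colour = factor(group))) +
  geom_line() +
  geom_point() +
  # the label layer inherits x, y and colour; nudge_y = 2 is an arbitrary offset
  geom_text(data = labels, aes(label = group), nudge_y = 2, show.legend = FALSE) +
  labs(x = "year", colour = "group")
```

print(p) draws the plot. To label the right-hand end of each line instead, swapping min for max in the ave() call should be enough.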
I have a dataset in the following format
value1 value2 group
10 20 A
20 30 A
67 45 B
98 76 C
102 11 A
11 22 B
10 10 B
19 20 C
I am trying to make box plots for the three groups (A, B and C), with the box plots for the first and second columns (value1 and value2) side by side. I can make two separate plots as follows, but I cannot figure out how to combine them side by side.
p1 <- ggplot(x, aes(x=group, y=value1)) + geom_boxplot()
p2 <- ggplot(x, aes(x=group, y=value2)) + geom_boxplot()
I would appreciate any help. I am a newbie in R and ggplot.
Here's an option using pivot_longer from tidyr
x_new <- tidyr::pivot_longer(x, c(value1, value2))
ggplot(x_new, aes(x = group, y = value, col = name, fill = name)) + geom_boxplot(alpha = .5)
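To make the long format concrete, here is a minimal sketch that builds the same kind of table by hand in base R from the question's sample data. This is a stand-in for pivot_longer, not what tidyr does internally; the column names name and value are chosen to match pivot_longer's defaults, and the row order will differ from tidyr's output:

```r
library(ggplot2)

# sample data from the question
x <- data.frame(
  value1 = c(10, 20, 67, 98, 102, 11, 10, 19),
  value2 = c(20, 30, 45, 76, 11, 22, 10, 20),
  group  = c("A", "A", "B", "C", "A", "B", "B", "C")
)

# stack value1 and value2 into one column, remembering where each row came from
x_long <- data.frame(
  group = rep(x$group, times = 2),
  name  = rep(c("value1", "value2"), each = nrow(x)),
  value = c(x$value1, x$value2)
)

# one box per group and source column, dodged side by side within each group
p <- ggplot(x_long, aes(x = group, y = value, colour = name, fill = name)) +
  geom_boxplot(alpha = 0.5)
```

Because name is mapped to colour and fill, ggplot2 dodges the value1 and value2 boxes next to each other inside every group.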
The gridExtra package can do this too. Assign your plots to variables, then call grid.arrange(p1, p2, ncol = 2) to place them side by side. See ?grid.arrange for further options.
When ggplot makes a line plot with polar coordinates, it leaves a gap between the highest and lowest x-values (Dec and Jan below) instead of wrapping around into a spiral. How can I continue the line and close that gap?
In particular, I want to use months as my x-axis, but plot multiple years of data in one looping line.
Reprex:
library(ggplot2)
# three years of monthly data
df <- expand.grid(month = month.abb, year = 2014:2016)
df$value <- seq_along(df$year)
head(df)
## month year value
## 1 Jan 2014 1
## 2 Feb 2014 2
## 3 Mar 2014 3
## 4 Apr 2014 4
## 5 May 2014 5
## 6 Jun 2014 6
ggplot(df, aes(month, value, group = year)) +
geom_line() +
coord_polar()
Here's a somewhat-hacky option:
# make a data.frame of the start values that each year's end values should continue to
bridges <- df[df$month == 'Jan',]
bridges$year <- bridges$year - 1 # adjust index to align with previous group
bridges$month <- NA # set x value to any new value
# combine extra points with original
ggplot(rbind(df, bridges), aes(month, value, group = year)) +
geom_line() +
# close gap by removing expansion; redefine breaks to get rid of "NA/Jan" label
scale_x_discrete(expand = c(0,0), breaks = month.abb) +
coord_polar()
Obviously adding extra data points is not ideal, though, so maybe a more elegant answer exists.
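For reference, here is a sketch of what the bridging rows contain, with one optional tidy-up that is not part of the answer above: the first bridge row gets year 2013, which has no other data, so it can be dropped without changing the lines that are drawn.

```r
library(ggplot2)

# three years of monthly data, as in the question
df <- expand.grid(month = month.abb, year = 2014:2016)
df$value <- seq_along(df$year)

# each January value, reassigned to the previous year's group
bridges <- df[df$month == "Jan", ]
bridges$year <- bridges$year - 1
bridges$month <- NA

# optional: the 2013 row has nothing to connect to, so drop it
bridges <- bridges[bridges$year >= min(df$year), ]

p <- ggplot(rbind(df, bridges), aes(month, value, group = year)) +
  geom_line() +
  scale_x_discrete(expand = c(0, 0), breaks = month.abb) +
  coord_polar()
```

After the filter, bridges holds two rows: the January 2015 value attached to the 2014 group and the January 2016 value attached to the 2015 group.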
I'm trying to create a bar graph on ggplot that has proportions rather than counts, and I have c+geom_bar(aes(y=(..count..)/sum(..count..)*100)) but I'm not sure what either of the counts refer to. I tried putting in the data but it didn't seem to work. What should I input here?
This is the data I'm using
> describe(topprob1)
topprob1
n missing unique Info Mean
500 0 9 0.93 3.908
1 2 3 4 5 6 7 8 9
Frequency 128 105 9 15 13 172 39 12 7
% 26 21 2 3 3 34 8 2 1
You haven't provided a reproducible example, so here's an illustration with the built-in mtcars data frame. Compare the following two plots. The first gives counts. The second gives proportions, which are displayed in this case as percentages. ..count.. is an internal variable that ggplot creates to store the count values.
library(ggplot2)
library(scales)
ggplot(mtcars, aes(am)) +
geom_bar()
ggplot(mtcars, aes(am)) +
geom_bar(aes(y=..count../sum(..count..))) +
scale_y_continuous(labels=percent_format())
You can also use the ..prop.. computed variable together with the group aesthetic:
library(ggplot2)
library(scales)
ggplot(mtcars, aes(am)) +
geom_bar(aes(y=..prop.., group = 1)) +
scale_y_continuous(labels=percent_format())
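On the question of what the two counts refer to: count is the per-bar count that the bar stat computes, and sum(count) is the total over all bars, so their ratio is each bar's share. In ggplot2 3.3.0 and later the same thing can be written with after_stat(), and the double-dot form is deprecated from 3.4.0. A minimal sketch, with a hand-computed cross-check that is only there for illustration:

```r
library(ggplot2)

# after_stat() is the newer spelling of the ..count.. notation
p <- ggplot(mtcars, aes(factor(am))) +
  geom_bar(aes(y = after_stat(count / sum(count)))) +
  scale_y_continuous(labels = scales::percent)

# the same proportions computed by hand, as a cross-check
props <- prop.table(table(mtcars$am))
```

Multiplying by 100 inside the aesthetic, as in the question, gives the same bars on a 0 to 100 scale; in that case the percent labeller should be left out.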
I have a dataframe that looks like this.
> head(df)
DGene JGene cdr3_len Sum
1 IGHD1 IGHJ1 0 22
2 IGHD1 IGHJ1 1 11
3 IGHD1 IGHJ1 2 16
4 IGHD1 IGHJ1 3 40
5 IGHD1 IGHJ1 4 18
6 IGHD1 IGHJ1 5 30
...
It is pretty simple to facet this with facet_grid:
ggplot(df,aes(x=cdr3_len,y=Sum)) + geom_line() + xlim(c(1,42)) + facet_grid(JGene~DGene,scales="free_y")
This gives a grid of panels, one per JGene/DGene combination.
I was wondering if anyone could help me add an hline at the mean of each panel, or possibly print the mean of each panel in its top right corner.
Thanks,
Here's a way to add both text and a vertical line for the mean of cdr3_len by pre-computing the desired values (per @jwillis0720's comment):
First, calculate the mean of cdr3_len for each panel and then left_join that data frame to a second data frame that calculates the appropriate y-value for placing the text on each panel (because the appropriate y-value varies only by level of JGene).
library(dplyr)
meanData = df %>% group_by(JGene, DGene) %>%
summarise(meanCDR = sum(Sum*cdr3_len)/sum(Sum)) %>%
left_join(df %>% group_by(JGene) %>%
summarise(ypos = 0.9*max(Sum)))
Now for the plot:
ggplot(df,aes(x=cdr3_len, y=Sum)) +
geom_vline(data=meanData, aes(xintercept=meanCDR), colour="red", lty=3) +
geom_line() +
geom_text(data=meanData,
aes(label=round(meanCDR,1), x=40, y=ypos), colour="red",
hjust=1) +
xlim(c(1,42)) +
facet_grid(JGene~DGene,scales="free_y")
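If a horizontal line at the mean of Sum per panel is wanted instead, as the question originally asked, the same pre-compute-then-plot pattern applies. The sketch below uses made-up toy data (the real data frame is not available here), and the names panel_means and meanSum are hypothetical:

```r
library(ggplot2)

# toy stand-in for the real data: two D genes, two J genes, ten lengths each
df <- expand.grid(cdr3_len = 1:10,
                  DGene = c("IGHD1", "IGHD2"),
                  JGene = c("IGHJ1", "IGHJ2"))
df$Sum <- seq_len(nrow(df))

# one mean of Sum per facet panel
panel_means <- aggregate(Sum ~ JGene + DGene, data = df, FUN = mean)
names(panel_means)[names(panel_means) == "Sum"] <- "meanSum"

p <- ggplot(df, aes(x = cdr3_len, y = Sum)) +
  geom_line() +
  # the JGene and DGene columns route each line to its own panel
  geom_hline(data = panel_means, aes(yintercept = meanSum),
             colour = "red", linetype = "dashed") +
  facet_grid(JGene ~ DGene, scales = "free_y")
```

The key point is that the summary data frame carries the faceting columns, so each line lands only in its own panel.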
I am having a hard time with a coloring scheme in ggplot. If someone could help me out or send me to another question that would be fantastic.
I have data that look along the lines of
day=rep(1:10, 5)
year=rep(1992:1996, each=10)
state=rep(c("A","B"), each=25)
set.seed(4)
y=runif(50, 5.0, 7.5)
df=data.frame(year,day,state,y)
> head(df)
year day state y
1 1992 1 A 6.464501
2 1992 2 A 5.022364
3 1992 3 A 5.734349
4 1992 4 A 5.693437
5 1992 5 A 7.033936
6 1992 6 A 5.651069
I want to create a plot similar to the one produced by this code:
library(ggplot2)
p = ggplot(df, aes(day, y))
p = p + geom_line(aes(colour = factor(year)))
print(p)
I want the coloring to be based off of the state variable. I would like the years that are in state 'A' to be one color and the years in state 'B' to be another.
Thank you
If you want the lines separated by year but colored by state, the key is to use the group aesthetic:
ggplot(data=df, aes(x=day, y=y, group=year, colour=state)) +
geom_line() +
geom_point()
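One thing worth checking before plotting: in the sample data as posted, state switches after row 25, which falls in the middle of 1994. A quick cross-tabulation (the name tab is just for illustration) makes this visible:

```r
library(ggplot2)

# sample data from the question
day   <- rep(1:10, 5)
year  <- rep(1992:1996, each = 10)
state <- rep(c("A", "B"), each = 25)
set.seed(4)
y  <- runif(50, 5.0, 7.5)
df <- data.frame(year, day, state, y)

# rows per year and state: 1994 is split 5/5 between A and B
tab <- table(df$year, df$state)

p <- ggplot(df, aes(x = day, y = y, group = year, colour = state)) +
  geom_line() +
  geom_point()
```

With this data the 1994 line should change colour partway along, since it is one group containing both states. If every year belongs to a single state in the real data, each line is drawn in one colour.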