Creating a ggplot boxplot with jitter duplicates my data (R) [duplicate]

This question already has answers here:
R ggplot geom_jitter duplicates outlier
(1 answer)
How to exclude outliers when using geom_boxplot() + geom_jitter() in R
(2 answers)
Closed last year.
I want to create a boxplot that shows individual data points as well.
This is the code that I am using:
ggplot(data, aes(x=treatment, y=aggregate_count, color = treatment)) +
geom_boxplot() +
geom_point(position = "jitter") +
ylab("Aggregate Count") +
xlab("") +
theme_classic()
However, using geom_boxplot() and geom_point() together like this appears to duplicate part of my data. I noticed this because my dataset contains only one value above 30, but the plot shows two. If I remove either geom_boxplot() or geom_point(), the data is displayed correctly.
Does someone have an idea on how to fix this?
Thank you in advance!!
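A sketch of the usual fix, in line with the linked duplicates: geom_boxplot() already draws each outlier as a point, and geom_point() then draws every observation again with jitter, so an outlier appears twice. The data frame `toy` below is made up for illustration, since the question's data is not shown; `outlier.shape = NA` hides the boxplot's own outlier points.

```r
library(ggplot2)

# Hypothetical stand-in for the question's data: exactly one value above 30.
toy <- data.frame(
  treatment = rep(c("control", "treated"), each = 6),
  aggregate_count = c(4, 6, 7, 9, 11, 35, 10, 12, 14, 15, 17, 19)
)

p <- ggplot(toy, aes(x = treatment, y = aggregate_count, color = treatment)) +
  geom_boxplot(outlier.shape = NA) +  # do not draw outliers as separate points
  geom_point(position = position_jitter(width = 0.2, height = 0)) +  # jitter sideways only
  ylab("Aggregate Count") +
  xlab("") +
  theme_classic()
# print(p) draws the plot
```

Setting height = 0 keeps each point at its true y value; the default jitter also shifts points vertically, which can be misleading for counts.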

Related

ggplot2 - why does changing axis scale affect summary statistics of variables? [duplicate]

This question already has an answer here:
R ggplot boxplot: change y-axis limit
(1 answer)
Closed last month.
I have the following data:
x <- data.frame('myvar'=c(10,10,9,9,8,8, runif(100)), 'mygroup' = c(rep('a', 26), rep('b', 80)))
I want to describe the data using a box-and-whiskers plot in ggplot2. I have also included the mean using a stat_summary.
library(ggplot2)
ggplot(x, aes(x=myvar, y=mygroup)) +
geom_boxplot() +
stat_summary(fun=mean, geom='point', shape=20, color='red', fill='red')
This is fine, but for some of my graphs the outliers are so large that it is hard to make sense of the overall distribution. In these cases, I have cut the x axis:
ggplot(x, aes(x=myvar, y=mygroup)) +
geom_boxplot() +
stat_summary(fun=mean, geom='point', shape=20, color='red', fill='red') +
scale_x_continuous(limits=c(0,5))
Note that the means (and medians?) are now calculated using only the subset of the data that is visible on the graph. Is there a ggplot way to include the outlier observations in the calculation but drop them from the visualisation?
My desired output would be a graph with x limits at c(0,5) and a red dot at 2.48 for group mygroup='a'.
scale_x_continuous will remove those points not lying within the limits. You want to use coord_cartesian to "zoom in" without removing your data:
ggplot(x, aes(x=myvar, y=mygroup)) +
geom_boxplot() +
stat_summary(fun=mean, geom='point', shape=20, color='red', fill='red') +
coord_cartesian(xlim = c(0, 5))
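To see why the two approaches give different red dots, here is a small base-R illustration. It only mimics the effect on group 'a' by hand: scale limits drop the values outside 0 to 5 before the summary is computed, while coord_cartesian leaves the data untouched. The seed is an assumption added for reproducibility.

```r
set.seed(42)  # only to make the illustration reproducible
x <- data.frame(
  myvar = c(10, 10, 9, 9, 8, 8, runif(100)),
  mygroup = c(rep("a", 26), rep("b", 80))
)
a <- x$myvar[x$mygroup == "a"]

mean_all <- mean(a)                       # all of group 'a', as seen under coord_cartesian()
mean_visible <- mean(a[a >= 0 & a <= 5])  # what is left after scale limits drop values outside 0..5
c(all = mean_all, visible = mean_visible)
```

The first value is the mean the question asks for; the second is the one shown after cutting the axis with scale_x_continuous.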

How to add a legend to multiple plots in R [duplicate]

This question already has an answer here:
regrading adding a legend using ggplot2 for different lines
(1 answer)
Closed 2 years ago.
I have this code:
testPlot= ggplot(residFrame) +
geom_point(aes(x=STATEFP, y=total_diff, colour='total'), colour='red', shape=1) +
geom_point(aes(x=STATEFP, y=desalination_diff, colour='desalination'), colour='blue', shape=1) +
geom_point(aes(x=STATEFP, y=surfacewater_diff), colour='green', shape=1) +
geom_point(aes(x=STATEFP, y=groundwater_diff), colour='yellow', shape=1) +
xlab('STATEFP') + ylab('Difference') + ggtitle('Difference for all states', subtitle='For each source')
testPlot
Now I want to add a legend to testPlot that describes what the colours in the plot represent. I have searched the web but cannot find the answer to this particular problem. Can someone help me out here?
Thanks!
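You should get the data into long format and then plot it, instead of calling geom_point multiple times. Note that a constant colour set outside aes() (for example colour='red') overrides the colour mapped inside aes(), so no legend is drawn for it. You have not provided an example of your data, but you can try: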
library(ggplot2)
library(tidyr)  # provides pivot_longer() and re-exports %>%
residFrame %>%
  # one row per (STATEFP, source); the column names end up in `name`
  pivot_longer(cols = ends_with('diff')) %>%
  ggplot() + aes(STATEFP, value, color = name) +
  geom_point(shape = 1) +
  xlab('STATEFP') + ylab('Difference') +
  ggtitle('Difference for all states', subtitle='For each source')
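Since no data was posted, here is a self-contained sketch of the same idea on a made-up `residFrame` (the values are hypothetical). It also uses scale_colour_manual to keep the colours from the question, which is an addition and not part of the answer above.

```r
library(ggplot2)
library(tidyr)

# Hypothetical stand-in for residFrame; the real data was not posted.
residFrame <- data.frame(
  STATEFP = c(1, 2, 3),
  total_diff = c(0.5, -0.2, 0.1),
  desalination_diff = c(0.1, 0.0, -0.1),
  surfacewater_diff = c(0.3, -0.1, 0.2),
  groundwater_diff = c(0.1, -0.1, 0.0)
)

# Long format: one row per state and source, source name in `name`.
long <- pivot_longer(residFrame, cols = ends_with("diff"))

p <- ggplot(long, aes(STATEFP, value, colour = name)) +
  geom_point(shape = 1) +
  scale_colour_manual(
    name = "Source",
    values = c(total_diff = "red", desalination_diff = "blue",
               surfacewater_diff = "green", groundwater_diff = "yellow")
  ) +
  xlab("STATEFP") + ylab("Difference")
# print(p) draws the plot with a legend for the four sources
```

Because colour is mapped inside aes(), ggplot builds the legend automatically; the manual scale only controls which colour each source gets.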

How to Add Lines With A Facet R [duplicate]

This question already has answers here:
facet_wrap add geom_hline
(2 answers)
Closed 5 months ago.
So I have a faceted graph, and I want to be able to add lines to it that change by each facet.
Here's the code:
p <- ggplot(mtcars, aes(x=wt))+
geom_histogram(bins = 20,aes(fill = factor(cyl)))+
facet_grid(.~cyl)+
scale_color_manual(values = c('red','green','blue'))+
geom_vline(xintercept = mean(mtcars$wt))
p
So my question is: how do I get the graph to show the mean of each faceted sub-graph?
I hope that makes sense and appreciate your time regardless of your answering capability.
You can do this within the ggplot call by using stat_summaryh from the ggstance package. In the code below, I've also changed scale_colour_manual to scale_fill_manual on the assumption that you were trying to set the fill colors of the histogram bars:
library(tidyverse)
library(ggstance)
ggplot(mtcars, aes(x=wt))+
geom_histogram(bins = 20,aes(fill = factor(cyl)))+
stat_summaryh(fun.x=mean, geom="vline", aes(xintercept=..x.., y=0),
colour="grey40") +
facet_grid(.~cyl)+
scale_fill_manual(values = c('red','green','blue')) +
theme_bw()
Another option is to calculate the desired means within geom_vline (this is an implementation of the summary approach that @Ben suggested). In the code below, the . is a "pronoun" that refers to the data frame (mtcars in this case) that was fed into ggplot:
ggplot(mtcars, aes(x=wt))+
geom_histogram(bins = 20,aes(fill = factor(cyl)))+
geom_vline(data = . %>% group_by(cyl) %>% summarise(wt=mean(wt)),
aes(xintercept=wt), colour="grey40") +
facet_grid(.~cyl)+
scale_fill_manual(values = c('red','green','blue')) +
theme_bw()
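A third variant, sketched here with base R's aggregate() so that it needs neither ggstance nor the pipe: compute the per-facet means in a separate data frame first and pass it to geom_vline. The name `cyl_means` is chosen for illustration.

```r
library(ggplot2)

# One row per cyl value with the mean weight of that group.
cyl_means <- aggregate(wt ~ cyl, data = mtcars, FUN = mean)

p <- ggplot(mtcars, aes(x = wt)) +
  geom_histogram(bins = 20, aes(fill = factor(cyl))) +
  # cyl_means contains the faceting variable, so each line lands in its own panel
  geom_vline(data = cyl_means, aes(xintercept = wt), colour = "grey40") +
  facet_grid(. ~ cyl) +
  scale_fill_manual(values = c("red", "green", "blue")) +
  theme_bw()
# print(p) draws the plot
```

The key point is that the summary data frame must contain the faceting column (cyl here); otherwise every line would be drawn in every panel.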

Different `geom_hline()` for each facet of ggplot [duplicate]

This question already has an answer here:
Display a summary line per facet rather than overall
(1 answer)
Closed 4 years ago.
library(tidyverse)
ggplot(mpg, aes(cty, hwy)) +
geom_point() +
facet_grid(year ~ fl) +
geom_hline(yintercept = mean(mpg$hwy))
I want each geom_hline() in the facet shown above to be the mean of the points that are only contained within that facet. I would think that I could do it with something like (below). But that doesn't work. I'm close, right?
library(tidyverse)
ggplot(mpg, aes(cty, hwy)) +
geom_point() +
facet_grid(year ~ fl) +
geom_hline(yintercept = mean(mpg %>% group_by(year, fl)$hwy))
If the value you want to use for each facet is stored as a column in the data frame, and that value is unique within each facet, then you can use geom_hline(aes(yintercept=column)), which plots one horizontal line per facet.
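A minimal sketch of that suggestion, assuming the per-facet mean is added as a new column first. The copy `mpg2` and the column name `hwy_mean` are made up for illustration; ave() from base R computes the group mean and repeats it for every row of the group.

```r
library(ggplot2)

mpg2 <- mpg
# Mean of hwy within each year/fl combination, repeated on every row of that group.
mpg2$hwy_mean <- ave(mpg2$hwy, mpg2$year, mpg2$fl)

p <- ggplot(mpg2, aes(cty, hwy)) +
  geom_point() +
  facet_grid(year ~ fl) +
  geom_hline(aes(yintercept = hwy_mean))  # one value per facet, so one line per facet
# print(p) draws the plot
```

The line is technically drawn once per row, but since the value is constant within a facet the copies overlap exactly.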

Multiple graphs with different x-axis ticks [duplicate]

This question already has answers here:
Order discrete x scale by frequency/value
(7 answers)
Closed 6 years ago.
I have the following data.frame:
ef2 <- data.frame(X1=c(50,100,'bb','aa'), X2=c('A','A','B','B'), value=c(1,4,3,6))
I want to create two plots, one for each group in X2.
Here is the code I have and the plot obtained:
ggplot(data=ef2, aes(x=X1, y=value, group=X2)) +
facet_grid(.~X2, scales="free_x") +
geom_line(size=1) +
geom_point(size=3) +
xlab('') +
ylab('Y')
The problem is that the x-axis is ordered alphabetically and I don't know how to fix it. I have tried adding scale_x_discrete, but I don't know how to separate the groups. This is the code with that parameter added:
ggplot(data=ef2, aes(x=X1, y=value, group=X2)) +
facet_grid(.~X2, scales="free_x") +
geom_line(size=1) +
geom_point(size=3) +
xlab('') +
ylab('Y') +
scale_x_discrete(limits=ef2$X1)
Edited: I can't change ef2 data.frame. I've tried ordering factors in another data.frame:
ef2 <- data.frame(X1=c(50,100,'bb','aa'), X2=c('A','A','B','B'), value=c(1,4,3,6))
ef2$X1 <- as.character(ef2$X1)
nou <- data.frame(X1=factor(ef2$X1), levels=ef2$X1, X2=ef2$X2, value=ef2$value)
But it doesn't work.
This worked for me but I am not sure if it is exactly what you need:
ef2 <- data.frame(X1=factor(c('50','100','bb','aa'), levels = c('50','100','bb','aa')), X2=c('A','A','B','B'), value=c(1,4,3,6))
ggplot(data=ef2, aes(x=X1, y=value, group=X2)) +
facet_grid(.~X2, scales="free_x") +
geom_line(size=1) +
geom_point(size=3) +
xlab('') +
ylab('Y')
According to this post (Avoid ggplot sorting the x-axis while plotting geom_bar()), ggplot orders the axis automatically unless you provide an already ordered factor.
Update:
The code you use has an error: levels is an argument of the factor function, not of data.frame, so it has to go inside the factor() call.
Try this:
ef2 <- data.frame(X1=c(50,100,'bb','aa'), X2=c('A','A','B','B'), value=c(1,4,3,6))
ef2$X1 <- factor(ef2$X1, levels = unique(ef2$X1))
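As a quick check of what this does, the sketch below compares the default levels with the ones set through unique(). It uses only base R and the data frame from the question.

```r
ef2 <- data.frame(X1 = c(50, 100, "bb", "aa"),
                  X2 = c("A", "A", "B", "B"),
                  value = c(1, 4, 3, 6))

# Default: factor() sorts the levels, which is the order ggplot then uses on the axis.
default_levels <- levels(factor(ef2$X1))

# Explicit levels in order of first appearance keep the original row order.
ef2$X1 <- factor(ef2$X1, levels = unique(ef2$X1))
levels(ef2$X1)
```

With the levels fixed this way, the plotting code from the question can be reused unchanged and the x-axis follows the order of the rows in ef2.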
