Restrain scattered jitter points within a violin plot by ggplot2 - r

The following code is used to generate the violin plot in ggplot2:
ggplot(violin,aes(x=variable,y=log(value+0.5),color=Group)) +
geom_violin(scale="width") +
geom_jitter(aes(group=Group), position=position_jitterdodge()) +
stat_summary(fun.y="mean",geom="crossbar", mapping=aes(ymin=..y.., ymax=..y..),
width=1, position=position_dodge(),show.legend = FALSE) +
theme(axis.text.x = element_text(angle = 45, margin=margin(0.5, unit="cm")))
The resulting plot looks like this:
As you can see, some points are jittered outside the boundary of the violin shape, and I need those points to stay inside the violin. I've tried different levels of jittering but haven't had any success. I'd appreciate any pointers on how to achieve this.

The ggbeeswarm package provides geom_quasirandom and geom_beeswarm, which do exactly what you are looking for: https://github.com/eclarke/ggbeeswarm
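As a minimal sketch of how this might look for a grouped violin plot like the one in the question: the data frame below is simulated purely for illustration, and setting dodge.width to 0.9 so the points line up with the dodged violins is an assumption about the layout you want.

```r
library(ggplot2)
library(ggbeeswarm)

# Simulated stand-in for the question's data (hypothetical values)
set.seed(1)
violin <- data.frame(
  variable = rep(c("v1", "v2", "v3"), each = 100),
  Group    = rep(c("g1", "g2"), times = 150),
  value    = rexp(300)
)

p <- ggplot(violin, aes(x = variable, y = log(value + 0.5), color = Group)) +
  # Dodge the violins explicitly so the width is visible in the code
  geom_violin(scale = "width", position = position_dodge(width = 0.9)) +
  # Offsets follow the local density; dodge.width matches the violins,
  # and width caps how far points move sideways within each group
  geom_quasirandom(dodge.width = 0.9, width = 0.2)
p
```

If points still cross the outline, reducing width in geom_quasirandom is the first thing to try.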

This is a rather old question, but I think there is a better solution.
As @Richard Telford pointed out in a comment, geom_sina (from the ggforce package) is the best solution IMO.
simulate data
df <- data.frame(data=rnorm(1200),
group=rep(c("A","A","A", "B","B","C"),
200)
)
make plot
library(ggplot2)
library(ggforce) # provides geom_sina
ggplot(df, aes(y=data,x=group,color=group)) +
geom_violin()+
geom_sina()
result
Hope this is helpful.

Option 1
Using the function geom_quasirandom from the ggbeeswarm package:
The quasirandom geom is a convenient means to offset points within categories to reduce overplotting. Uses the vipor package.
library(ggbeeswarm)
p <- ggplot(mpg, aes(class, hwy))
p + geom_violin(width = 1.3) + geom_quasirandom(alpha = 0.2, width = 0.2)
Option 2
Not a satisfactory answer, because by restricting the horizontal jitter we defeat the purpose of handling overplotting. But you can enlarge the width of the violin plots (width = 1.3), and play with alpha for transparency and limit the horizontal jitter (width = .02).
p <- ggplot(mpg, aes(class, hwy))
p + geom_violin(width = 1.3) + geom_jitter(alpha = 0.2, width = .02)

Related

How to draw circles inside each other with ggplot2?

I want to draw two circles inside each other with ggplot2.
So far my effort is:
So far, my approach is to make some fake data and plot it with geom_line(). If I then convert this with coord_polar(), I cannot see two distinct circles, one inside the other.
library(ggplot2)
library(tidyverse)
x1=seq(0,6000000,1000)
y1=rep(1,length(x1))
y2=rep(2,length(x1))
data=as.data.frame(cbind(x1,y1,y2))
Created on 2021-12-25 by the reprex package (v2.0.1)
# plot the data
ggplot(data) +
geom_line(aes(x1,y1)) +
geom_line(aes(x1,y2))
#coord_polar()
I would avoid the geom_circle option and use the coord_polar option if possible.
The reason is that these two circles have some differences in the x-axis, which I would indicate after drawing the circles.
I would like my plot to look like this
The code you have with coord_polar() is correct, just the plot limits need adjusting to see both the circles, e.g.
ggplot(data) +
geom_line(aes(x1,y1)) +
geom_line(aes(x1,y2)) +
coord_polar() + ylim(c(0,NA))
The reason for using ylim is that y is the direction that coord_polar() transforms into the radius.
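To make the effect of the lower limit easier to see, here is a small sketch that builds the same kind of plot twice, once with default limits and once with the lower y limit forced to 0. The data frame and object names are made up for illustration.

```r
library(ggplot2)

# Two constant-y lines; after coord_polar() each one becomes a ring
rings <- data.frame(
  x = rep(seq(0, 1, length.out = 200), times = 2),
  y = rep(c(1, 2), each = 200)
)

base <- ggplot(rings, aes(x, y, group = y)) +
  geom_line()

# Default limits: the y range starts near 1, so the inner ring
# shrinks towards the centre of the plot
p_default <- base + coord_polar()

# Lower limit set to 0: the radius is measured from 0, so both
# rings are drawn at clearly different radii
p_zero <- base + coord_polar() + ylim(0, NA)

p_zero
```

Printing p_default and p_zero side by side shows why only the second one displays two clearly separated circles.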
Why not use two geom_point() with different sizes and pch = 21?
library(ggplot2)
library(tibble) # tibble() is not exported by ggplot2
df <- tibble(x = 0, y = 0)
ggplot(df, aes(x, y)) +
geom_point(pch = 21, size = 50) +
geom_point(pch = 21, size = 40) +
theme_void()

Reduce distance in plot X labels (R: ggplot2)

This is my dataframe:
df = data.frame(info=1:30, type=c(replicate(5,'A'), replicate(5,'B')), group= c(replicate(10,'D1'), replicate(10,'D2'), replicate(10,'D3')))
I want to make a jitter plot of my data distinguished by group (X-label) and type (colour):
ggplot()+
theme(panel.background=element_rect(colour="grey", size=0.2, fill='grey100'))+
geom_jitter(data=df, aes(x=group, y=info, color=type, shape=type), position=position_dodge(0.2), cex=2)+
scale_shape_manual(values=c(17,15,19))+
scale_color_manual(values=c(A="mediumvioletred", B="blue"))
How can I reduce the distance between the X-labels (D1, D2, D3) in the representation?
P.S. I want to do this even if it leaves blank space in the plot.
Here are a few options.
# Setting up the plot
library(ggplot2)
df <- data.frame(
info=1:30,
type=c(replicate(5,'A'), replicate(5,'B')),
group= c(replicate(10,'D1'), replicate(10,'D2'), replicate(10,'D3'))
)
p <- ggplot(df, aes(group, info, colour = type, shape = type))
Option 1: increase the dodge distance. This won't put the labels closer, but it makes better use of the space available so that the labels appear less isolated.
p +
geom_point(position = position_dodge(width = 0.9))
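Option 2: Expand the x-axis. The default for a discrete scale is an additive expansion of 0.6 units on each side; a larger expansion (here a multiplicative factor of 2 with no additive part) adds space at the ends of the axis, which pushes the labels closer together.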
p +
geom_point(position = position_dodge(0.2)) +
scale_x_discrete(expand = c(2, 0))
Option 3: change the aspect ratio. Depending on the plotting window size, this also visually puts the x-axis labels closer together.
p +
geom_point(position = position_dodge(0.2)) +
theme(aspect.ratio = 2)
Created on 2021-06-25 by the reprex package (v1.0.0)
Try adding coord_fixed(ratio = 0.2) and play around with the ratio.
ggplot()+
theme(panel.background=element_rect(colour="grey", size=0.2, fill='grey100'))+
geom_jitter(data=df, aes(x=group, y=info, color=type, shape=type), position=position_dodge(0.2))+
scale_shape_manual(values=c(17,15,19))+
scale_color_manual(values=c(A="mediumvioletred", B="blue")) + coord_fixed(ratio = 0.2)
The simplest solution is to resize the plot. For example if you follow your command with ggsave("my_plot.pdf", width = 3, height = 4.5) it looks like this:
Or in an Rmd file you can control the dimensions by various means: see this link.
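A minimal sketch of the resizing approach, rebuilding the question's data and saving to a temporary directory; the output path is only an example.

```r
library(ggplot2)

# Same structure as the data frame in the question
df <- data.frame(
  info  = 1:30,
  type  = rep(rep(c("A", "B"), each = 5), times = 3),
  group = rep(c("D1", "D2", "D3"), each = 10)
)

p <- ggplot(df, aes(group, info, colour = type, shape = type)) +
  geom_point(position = position_dodge(0.2))

# A narrow, tall device squeezes the x-axis labels together
out_file <- file.path(tempdir(), "my_plot.pdf")  # hypothetical output path
ggsave(out_file, plot = p, width = 3, height = 4.5)
```

In an R Markdown document, the knitr chunk options fig.width and fig.height (both in inches) play the same role as the width and height arguments here.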

Whisker plots to compare mean and variance between clusters [duplicate]

I am trying to recreate a figure from a ggplot2 seminar http://dl.dropbox.com/u/42707925/ggplot2/ggplot2slides.pdf.
In this case, I am trying to generate Example 5, with jittered data points subject to a dodge. When I run the code, the points are centered around the correct line, but have no jitter.
Here is the code directly from the presentation.
set.seed(12345)
hillest<-c(rep(1.1,100*4*3)+rnorm(100*4*3,sd=0.2),
rep(1.9,100*4*3)+rnorm(100*4*3,sd=0.2))
rep<-rep(1:100,4*3*2)
process<-rep(rep(c("Process 1","Process 2","Process 3","Process 4"),each=100),3*2)
memorypar<-rep(rep(c("0.1","0.2","0.3"),each=4*100),2)
tailindex<-rep(c("1.1","1.9"),each=3*4*100)
ex5<-data.frame(hillest=hillest,rep=rep,process=process,memorypar=memorypar, tailindex=tailindex)
stat_sum_df <- function(fun, geom="crossbar", ...) {stat_summary(fun.data=fun, geom=geom, ...) }
dodge <- position_dodge(width=0.9)
p<- ggplot(ex5,aes(x=tailindex ,y=hillest,color=memorypar))
p<- p + facet_wrap(~process,nrow=2) + geom_jitter(position=dodge) +geom_boxplot(position=dodge)
p
In ggplot2 version 1.0.0 there is a new position named position_jitterdodge() that is made for this situation. This position should be used inside geom_point(), and fill= should be set inside aes() to show which variable to dodge your data by. To control the width of the dodging, use the argument dodge.width=.
ggplot(ex5, aes(x=tailindex, y=hillest, color=memorypar, fill=memorypar)) +
facet_wrap(~process, nrow=2) +
geom_point(position=position_jitterdodge(dodge.width=0.9)) +
geom_boxplot(fill="white", outlier.colour=NA, position=position_dodge(width=0.9))
EDIT: There is a better solution with ggplot2 version 1.0.0 using position_jitterdodge. See @Didzis Elferts' answer. Note that dodge.width controls the width of the dodging and jitter.width controls the width of the jittering.
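As a small sketch of how the two arguments interact, here is a reduced version of the example on simulated data. The toy data frame and the value jitter.width = 0.2 are assumptions chosen for illustration, not values from the presentation.

```r
library(ggplot2)

# Smaller simulated data with the same column names as ex5
set.seed(12345)
toy <- data.frame(
  tailindex = rep(c("1.1", "1.9"), each = 60),
  memorypar = rep(c("0.1", "0.2", "0.3"), times = 40),
  hillest   = rnorm(120, mean = rep(c(1.1, 1.9), each = 60), sd = 0.2)
)

p <- ggplot(toy, aes(tailindex, hillest, colour = memorypar, fill = memorypar)) +
  geom_boxplot(fill = "white", outlier.colour = NA,
               position = position_dodge(width = 0.9)) +
  # dodge.width matches the boxplots so the point clouds sit on their boxes;
  # jitter.width sets the horizontal spread of the points
  geom_point(position = position_jitterdodge(jitter.width = 0.2,
                                             dodge.width = 0.9),
             alpha = 0.5)
p
```

Keeping dodge.width equal to the width used in position_dodge() for the boxplots is what aligns the two layers.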
I'm not sure how the code produced the graph in the pdf.
But does something like this get you close to what you're after?
I convert tailindex and memorypar to numeric; add them together; and the result is the x coordinate for the geom_jitter layer. There's probably a more effective way to do it. Also, I'd like to see how dodging geom_boxplot and geom_jitter, and with no jittering, will produce the graph in the pdf.
library(ggplot2)
dodge <- position_dodge(width = 0.9)
ex5$memorypar2 <- as.numeric(factor(ex5$tailindex)) +
3 * (as.numeric(as.character(ex5$memorypar)) - 0.2)
p <- ggplot(ex5,aes(x=tailindex , y=hillest)) +
scale_x_discrete() +
geom_jitter(aes(colour = memorypar, x = memorypar2),
position = position_jitter(width = .05), alpha = 0.5) +
geom_boxplot(aes(colour = memorypar), outlier.colour = NA, position = dodge) +
facet_wrap(~ process, nrow = 2)
p

How to make grouping tighter on ggplot2 geom_jitter?

I am attempting to repurpose code for a political survey, which I found on reddit, for a much smaller sample size.
I am creating a scatterplot using geom_jitter. Here is my code:
ggplot(sae, aes(Alignment, Abortion))+
geom_jitter(aes(color = "green"), size = 4, alpha = 0.6)+
labs("Alignment", "Stance on Abortion")
Here is the graph it gives:
How do I make the grouping around the "Pro-choice" or the "Pro-life" lines tighter? I believe this current graph would confuse many people as to which observations are pro-choice or pro-life.
Extra credit for helping with the color problem.
You have a bigger problem. The x-axis is ordered alphabetically, which is very confusing and probably not what you intended. Also, you probably need to specify both the width (jitter in x-direction) and height (jitter in y direction).
You can fix the ordering using, e.g.,
sae$Alignment <- factor(sae$Alignment, levels=unique(sae$Alignment))
as demonstrated below.
# make up some data - you have this already
set.seed(1) # for reproducible example
sae <- data.frame(Alignment=rep(c("Left","Left Leaning","Center","Right Leaning","Right"),each=5),
Abortion =sample(c("Pro Choice","Pro Life","Other"),25, replace=TRUE))
# you start here...
library(ggplot2)
sae$Alignment <- factor(sae$Alignment, levels=unique(sae$Alignment))
ggplot(sae, aes(Alignment, Abortion))+
geom_point(color = "green", size = 4, alpha = 0.6, position=position_jitter(width=0.1, height=0.1))+
labs(x = "Alignment", y = "Stance on Abortion")
Also, IMO, you could make better use of color:
sae$Orientation <- with(sae,ifelse(grepl("Left",Alignment),"Progressive",
ifelse(grepl("Right",Alignment),"Conservative","Neutral")))
ggplot(sae, aes(x=Alignment, y=Abortion, color=Orientation))+
geom_point(size = 4, alpha = 0.6, position=position_jitter(width=0.1, height=0.1))+
labs(x = "Alignment", y = "Stance on Abortion")
You can set the width parameter in position = position_jitter() to control how tight the points are.
ggplot(sae, aes(Alignment, Abortion)) +
geom_jitter(color = "green", size = 4, alpha = 0.6, position = position_jitter(width = .2)) +
labs(x = "Alignment", y = "Stance on Abortion")
If you're using the newest development version of ggplot2 (1.0.1.9003), you can just do geom_jitter(width = .2, ...) instead.
If it's still too wide, decrease width to a smaller value (and vice versa). Also note that to change the color of the points, I removed the aes() around color = "green".
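To see how width and height bound the scatter, the sketch below jitters a small made-up data set and then reads the jittered positions back with layer_data(). The data frame sae_demo and the chosen values (0.2 and 0.1) are illustrative assumptions.

```r
library(ggplot2)

# Hypothetical survey-like data
set.seed(1)
sae_demo <- data.frame(
  Alignment = factor(rep(c("Left", "Center", "Right"), each = 10),
                     levels = c("Left", "Center", "Right")),
  Abortion  = sample(c("Pro Choice", "Pro Life"), 30, replace = TRUE)
)

p <- ggplot(sae_demo, aes(Alignment, Abortion)) +
  # A constant colour goes outside aes(); width/height cap the jitter
  geom_jitter(color = "green", size = 4, alpha = 0.6,
              width = 0.2, height = 0.1) +
  labs(x = "Alignment", y = "Stance on Abortion")

# Discrete positions are integers, so the distance to the nearest
# integer is the offset that the jitter applied
jittered <- layer_data(p)
max_dx <- max(abs(as.numeric(jittered$x) - round(as.numeric(jittered$x))))
max_dy <- max(abs(as.numeric(jittered$y) - round(as.numeric(jittered$y))))
```

Each point moves at most width to either side and at most height up or down, so the total spread per category is twice the value given.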
I hit this problem recently, and none of the answers above gives an optimal solution.
I found an elegant answer using geom_beeswarm from the ggbeeswarm package (library(ggbeeswarm)) and thought I'd post it here.
Reproducing with geom_jitter using mpg is fairly messy:
data(mpg)
ggplot(mpg, aes(x=cyl, y=hwy, group=factor(cyl))) +
geom_boxplot() +
geom_jitter(position = position_jitter(height = .2, width = .2))
Whereas geom_beeswarm makes the jitter points centralised and much clearer:
library(ggbeeswarm)
data(mpg)
ggplot(mpg, aes(x=cyl, y=hwy, group=factor(cyl))) +
geom_boxplot() +
geom_beeswarm()

ggplot2, applying two scales to the same plot? Top down barplot

See plot here:
(from here)
How do I reproduce both the upper and lower portion of the barplot using ggplot2?
For example, I can produce the upper portion with
ggplot(data.frame(x=rnorm(1000, 5)), aes(x=x)) + geom_bar() + scale_y_reverse()
However now if I add any other geom_, such as another geom_bar() the scale for y is reversed. Is it possible to apply the scale_y_reverse() to only a specific geom_?
Another option is to make two separate plots and combine them with arrangeGrob from the gridExtra package. After playing with the plot margins, you can arrive at something that looks decent.
library(gridExtra)
library(ggplot2)
set.seed(100)
p2 <- ggplot(data.frame(x=rnorm(1000, 5)), aes(x=x)) + geom_bar() + theme(plot.margin=unit(c(0,0,0,0), 'lines'))
p1 <- p2 + scale_y_reverse() +
theme(plot.margin=unit(c(0, 0, -.8, 0), 'lines'), axis.title.x=element_blank(),
axis.text.x=element_blank(), axis.ticks.x=element_blank())
# grid.arrange() builds the same layout as arrangeGrob() and draws it
grid.arrange(p1, p2)
ggplot2 only likes to have one y-axis scale. The easiest thing would be to reshape your data yourself. Here we can use geom_rect to draw the data wherever we like, and we can condition the position on the group. Here's an example
#sample data
dd<-data.frame(
year=rep(2000:2014, 2),
group=rep(letters[1:2], each=15),
count=rpois(30, 20)
)
And now we can plot it. But first, let's define the offset for the top bars by finding the maximum combined height in any year and adding a bit of space
height <- ceiling(max(tapply(dd$count, dd$year, sum))*1.10)
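To make the offset concrete, here is the same arithmetic on a tiny hand-made data frame; the counts are hypothetical and chosen so the numbers are easy to follow.

```r
# Two years, two groups, fixed counts (hypothetical values)
dd_small <- data.frame(
  year  = rep(2000:2001, times = 2),
  group = rep(c("a", "b"), each = 2),
  count = c(10, 12, 8, 5)
)

# Yearly totals are 18 and 17; 18 * 1.10 = 19.8, rounded up to 20
height_small <- ceiling(max(tapply(dd_small$count, dd_small$year, sum)) * 1.10)

# Group "a" grows upward from 0, group "b" hangs down from the top
dd_small$ymin <- ifelse(dd_small$group == "a", 0, height_small - dd_small$count)
dd_small$ymax <- ifelse(dd_small$group == "a", dd_small$count, height_small)

dd_small
```

Because the offset is at least the largest yearly total, the bars hanging from the top never overlap the bars rising from the bottom.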
And here's how we plot
ggplot(dd) +
geom_rect(aes(xmin=year-.4, xmax=year+.4,
ymin=ifelse(group=="a", 0, height-count),
ymax=ifelse(group=="a", count, height), fill=group)) +
scale_y_continuous(expand=c(0,0))
And that will give us

Resources