How to make grouping tighter on ggplot2 geom_jitter? - r

I am attempting to repurpose code I found on reddit for a political survey, for use with a much smaller sample size.
I am creating a scatterplot using geom_jitter. Here is my code:
ggplot(sae, aes(Alignment, Abortion))+
geom_jitter(aes(color = "green"), size = 4, alpha = 0.6)+
labs("Alignment", "Stance on Abortion")
Here is the graph it gives:
How do I make the grouping around the "Pro-choice" or the "Pro-life" lines tighter? I believe this current graph would confuse many people as to which observations are pro-choice or pro-life.
Extra credit for helping with the color problem.

You have a bigger problem. The x-axis is ordered alphabetically, which is very confusing and probably not what you intended. Also, you probably need to specify both the width (jitter in x-direction) and height (jitter in y direction).
You can fix the ordering using, e.g.,
sae$Alignment <- factor(sae$Alignment, levels=unique(sae$Alignment))
as demonstrated below.
# make up some data - you have this already
set.seed(1) # for reproducible example
sae <- data.frame(Alignment=rep(c("Left","Left Leaning","Center","Right Leaning","Right"),each=5),
Abortion =sample(c("Pro Choice","Pro Life","Other"),25, replace=TRUE))
# you start here...
library(ggplot2)
sae$Alignment <- factor(sae$Alignment, levels=unique(sae$Alignment))
ggplot(sae, aes(Alignment, Abortion))+
geom_point(color = "green", size = 4, alpha = 0.6, position=position_jitter(width=0.1, height=0.1))+
labs(x = "Alignment", y = "Stance on Abortion")
Also, IMO, you could do better viz. colors:
sae$Orientation <- with(sae,ifelse(grepl("Left",Alignment),"Progressive",
ifelse(grepl("Right",Alignment),"Conservative","Neutral")))
ggplot(sae, aes(x=Alignment, y=Abortion, color=Orientation))+
geom_point(size = 4, alpha = 0.6, position=position_jitter(width=0.1, height=0.1))+
labs(x = "Alignment", y = "Stance on Abortion")
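To see why the x-axis ordering changes, it may help to inspect the factor levels directly. This is a minimal base-R sketch; the names alignment, f_default and f_ordered are illustrative and do not come from the question.

```r
# Illustrative vector, already in the intended left-to-right order
alignment <- c("Left", "Left Leaning", "Center", "Right Leaning", "Right")

# By default, factor() sorts the levels, which is the order a discrete axis uses
f_default <- factor(alignment)
levels(f_default)

# Passing levels = unique(...) keeps the order of first appearance instead
f_ordered <- factor(alignment, levels = unique(alignment))
levels(f_ordered)
```

Because unique() preserves the order of first appearance, this trick only gives the intended order if the data happen to be sorted that way; otherwise the levels can be spelled out by hand.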

You can set the width parameter in position = position_jitter() to control how tight the points are.
ggplot(sae, aes(Alignment, Abortion)) +
geom_jitter(color = "green", size = 4, alpha = 0.6, position = position_jitter(width = .2)) +
labs(x = "Alignment", y = "Stance on Abortion")
If you're using the newest development version of ggplot2 (1.0.1.9003), you can just do geom_jitter(width = .2, ...) instead.
If it's still too wide, decrease width to a smaller value (and vice versa). Also note that to change the color of the points, I removed the aes() around color = "green".
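The difference between mapping and setting a colour can be checked with layer_data(), which returns the values ggplot2 actually computed for a layer. This is a small sketch with made-up data; only the column names follow the question.

```r
library(ggplot2)

# Small made-up data set; column names follow the question
sae <- data.frame(
  Alignment = c("Left", "Center", "Right", "Left", "Right"),
  Abortion  = c("Pro Choice", "Other", "Pro Life", "Pro Choice", "Pro Life")
)

# Mapped: "green" is treated as a data value, so the colour scale picks the colour
p_mapped <- ggplot(sae, aes(Alignment, Abortion)) +
  geom_jitter(aes(color = "green"),
              position = position_jitter(width = 0.2, height = 0.2))

# Set: "green" is passed as a constant and used literally
p_set <- ggplot(sae, aes(Alignment, Abortion)) +
  geom_jitter(color = "green",
              position = position_jitter(width = 0.2, height = 0.2))

unique(layer_data(p_mapped)$colour)  # a colour chosen by the default scale
unique(layer_data(p_set)$colour)     # "green"
```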

I hit this problem recently and found that none of the answers above gives an optimal solution.
I found an elegant answer using geom_beeswarm from the ggbeeswarm package and thought I'd post it here.
Reproducing the problem with geom_jitter on the mpg data set is fairly messy:
library(ggplot2)
data(mpg)
ggplot(mpg, aes(x=cyl, y=hwy, group=factor(cyl))) +
geom_boxplot() +
geom_jitter(position = position_jitter(height = .2, width = .2))
Whereas geom_beeswarm makes the jitter points centralised and much clearer:
library(ggbeeswarm)
data(mpg)
ggplot(mpg, aes(x=cyl, y=hwy, group=factor(cyl))) +
geom_boxplot() +
geom_beeswarm()

Related

Add black outline geom_point and fill independent variable different from others [duplicate]

I'd like to place a black border around points on a scatterplot that are filled based on data, using ggplot2. Also, I would like to avoid having a legend entry for the black border since it will be on each point. Basically I'm looking for this plot, but with a black border around each point.
df <- data.frame(id=runif(12), x=1:12, y=runif(12))
ggplot(df, aes(x=x, y=y))+geom_point(aes(colour=id), size=12)
As a bonus, I'd like to not have a legend entry for the black border. My best try is:
df <- data.frame(id=runif(12), x=1:12, y=runif(12))
ggplot(df, aes(x=x, y=y))+geom_point(aes(fill=id, colour="black"), size=12)
Which gives:
I don't understand why that doesn't give me what I want, and worse (for my education in ggplot2) I don't understand why it doesn't seem to map fill color to anything! Any help?
Perhaps if I can get the outline and fill mapping right I can use a hack like the one in the last set of figures here to turn off the legend.
It's a bit obscure, but you have to use pch>20 (I think 21:25 are the relevant shapes): fill controls the interior colo(u)ring and colour controls the line around the edge.
(g0 <- ggplot(df, aes(x=x, y=y))+geom_point(aes(fill=id),
colour="black",pch=21, size=5))
update: with recent ggplot2 versions (e.g. 2.0.0, don't know how far back it goes) the default guide is a colourbar. Need g0 + guides(fill="legend") to get a legend with points as in the plot shown here. The default breaks have changed, too: to exactly replicate this plot you need g0 + scale_fill_continuous(guide="legend",breaks=seq(0.2,0.8,by=0.1)) ...
Related but not identical: how to create a plot with customized points in R? The accepted answer to that question uses the layering technique shown in @joran's answer, but (IMO) the answer by @jbaums, which uses the pch=21 technique, is superior. (I think shape=21 is an alternative, and perhaps even preferred, to pch=21.)
PS you should put colour outside the mapping (aes bit) if you want to set it absolutely and not according to the value of some variable ...
The first question's a gimme:
ggplot(df, aes(x=x, y=y)) +
geom_point(aes(colour=id), size=12) +
geom_point(shape = 1,size = 12,colour = "black")
And, oh, you don't want an extra legend. I think that does it then:
I had the same issue, but I needed a solution that allows for jitter, too. For this you need a pch that is a filled shape with a border, plus the grid.edit function, which lives in the grid package (older versions of gridExtra attached grid for you). Using your example:
df <- data.frame(id=runif(12), x=1:12, y=runif(12))
ggplot(df, aes(x=x, y=y, fill=id))+geom_point(pch=21, colour="Black", size=12)
library(grid) # provides grid.edit() and gpar()
grid.edit("geom_point.points", grep = TRUE, gp = gpar(lwd = 3))
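In ggplot2 2.0.0 and later, the border thickness can also be set without editing grobs, because geom_point has a stroke aesthetic for shapes 21 to 25. This is a hedged sketch that combines it with jitter; the jitter amounts, size and stroke value are arbitrary choices.

```r
library(ggplot2)

set.seed(42)  # reproducible data and jitter
df <- data.frame(id = runif(12), x = 1:12, y = runif(12))

p <- ggplot(df, aes(x = x, y = y, fill = id)) +
  geom_point(
    shape = 21,        # filled circle with a border
    colour = "black",  # border colour, set as a constant so it adds no legend
    stroke = 1.5,      # border thickness
    size = 8,
    position = position_jitter(width = 0.2, height = 0.05)
  )
p
```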
I had the same question, but perhaps since I was using geom_map with latitudes and longitudes, the other answers as of January 2020 didn't work for me.
Restating the question, where the following does not have a black outline around the points:
df <- data.frame(id=runif(12), x=1:12, y=runif(12))
ggplot(df, aes(x=x, y=y))+geom_point(aes(colour=id), size=12)
If I declared both the color and fill in the aesthetic and then used shape 21, problem solved.
ggplot(df, aes(x=x, y=y)) +
geom_point(aes(colour=id, fill=id),
shape = 21,size = 12,colour = "black")
If you want more control (for example, borders on points with various shapes and transparencies), use the fill aesthetic with shapes 21:25
ggplot(aes(x = Sepal.Length, y = Petal.Width, fill = Species, shape = Species), data = iris) + # notice: fill
geom_point(size = 4, alpha = 0.5) + # transparent point
geom_point(size = 4, fill = NA, colour = "black") + # black border
scale_shape_manual(values = c(21:23)) + # enable fill aesthetic
theme_classic()

Restrain scattered jitter points within a violin plot by ggplot2

The following code is used to generate the violin plot in ggplot2:
ggplot(violin,aes(x=variable,y=log(value+0.5),color=Group)) +
geom_violin(scale="width") +
geom_jitter(aes(group=Group), position=position_jitterdodge()) +
stat_summary(fun.y="mean",geom="crossbar", mapping=aes(ymin=..y.., ymax=..y..),
width=1, position=position_dodge(),show.legend = FALSE) +
theme(axis.text.x = element_text(angle = 45, margin=margin(0.5, unit="cm")))
The resulting plot looks like the following:
As you can see, some points are jittered outside the boundary of the violin shape, and I need those points to be inside the violin. I've tried different levels of jittering but haven't had any success. I'd appreciate any pointers to achieve this.
The package ggbeeswarm has the geoms quasirandom and beeswarm, which do exactly what you are searching for: https://github.com/eclarke/ggbeeswarm
This is a fairly old question, but I think there is a better solution.
As @Richard Telford pointed out in a comment, geom_sina (from the ggforce package) is the best solution IMO.
# simulate data
library(ggplot2)
library(ggforce) # provides geom_sina
df <- data.frame(data=rnorm(1200),
group=rep(c("A","A","A", "B","B","C"),
200)
)
# make plot
ggplot(df, aes(y=data,x=group,color=group)) +
geom_violin()+
geom_sina()
Hope this is helpful.
Option 1
Using the function geom_quasirandom from the package ggbeeswarm:
The quasirandom geom is a convenient means to offset points within categories to reduce overplotting. Uses the vipor package.
library(ggbeeswarm)
p <- ggplot(mpg, aes(class, hwy))
p + geom_violin(width = 1.3) + geom_quasirandom(alpha = 0.2, width = 0.2)
Option 2
Not a satisfactory answer, because by restricting the horizontal jitter we defeat the purpose of handling overplotting. But you can enlarge the width of the violin plots (width = 1.3), and play with alpha for transparency and limit the horizontal jitter (width = .02).
p <- ggplot(mpg, aes(class, hwy))
p + geom_violin(width = 1.3) + geom_jitter(alpha = 0.2, width = .02)

Whisker plots to compare mean and variance between clusters [duplicate]

I am trying to recreate a figure from a ggplot2 seminar http://dl.dropbox.com/u/42707925/ggplot2/ggplot2slides.pdf.
In this case, I am trying to generate Example 5, with jittered data points subject to a dodge. When I run the code, the points are centered around the correct line, but have no jitter.
Here is the code directly from the presentation.
set.seed(12345)
hillest<-c(rep(1.1,100*4*3)+rnorm(100*4*3,sd=0.2),
rep(1.9,100*4*3)+rnorm(100*4*3,sd=0.2))
rep<-rep(1:100,4*3*2)
process<-rep(rep(c("Process 1","Process 2","Process 3","Process 4"),each=100),3*2)
memorypar<-rep(rep(c("0.1","0.2","0.3"),each=4*100),2)
tailindex<-rep(c("1.1","1.9"),each=3*4*100)
ex5<-data.frame(hillest=hillest,rep=rep,process=process,memorypar=memorypar, tailindex=tailindex)
stat_sum_df <- function(fun, geom="crossbar", ...) {stat_summary(fun.data=fun, geom=geom, ...) }
dodge <- position_dodge(width=0.9)
p<- ggplot(ex5,aes(x=tailindex ,y=hillest,color=memorypar))
p<- p + facet_wrap(~process,nrow=2) + geom_jitter(position=dodge) +geom_boxplot(position=dodge)
p
In ggplot2 version 1.0.0 there is a new position named position_jitterdodge() that is made for this situation. This position should be used inside geom_point(), and fill= should be used inside aes() to show which variable to dodge your data by. To control the width of the dodging, use the argument dodge.width=.
ggplot(ex5, aes(x=tailindex, y=hillest, color=memorypar, fill=memorypar)) +
facet_wrap(~process, nrow=2) +
geom_point(position=position_jitterdodge(dodge.width=0.9)) +
geom_boxplot(fill="white", outlier.colour=NA, position=position_dodge(width=0.9))
EDIT: There is a better solution with ggplot2 version 1.0.0 using position_jitterdodge. See @Didzis Elferts' answer. Note that dodge.width controls the width of the dodging and jitter.width controls the width of the jittering.
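The two arguments can be seen side by side in a small sketch. The data frame d and its columns are made up for illustration, and both colour and fill are mapped so that the points are dodged by the same variable as the boxes.

```r
library(ggplot2)

set.seed(1)
# One x category with two groups, so the dodging is easy to see
d <- data.frame(
  x = "1.1",
  g = rep(c("a", "b"), each = 50),
  y = rnorm(100)
)

p <- ggplot(d, aes(x = x, y = y, colour = g, fill = g)) +
  geom_boxplot(fill = "white", outlier.colour = NA,
               position = position_dodge(width = 0.9)) +
  # dodge.width matches the boxplot dodge; jitter.width spreads points within each box
  geom_point(position = position_jitterdodge(jitter.width = 0.2,
                                             dodge.width = 0.9))
p
```

Keeping dodge.width equal to the width passed to position_dodge() is what lines the points up with their boxes; jitter.width can then be tuned independently.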
I'm not sure how the code produced the graph in the pdf.
But does something like this get you close to what you're after?
I convert tailindex and memorypar to numeric; add them together; and the result is the x coordinate for the geom_jitter layer. There's probably a more effective way to do it. Also, I'd like to see how dodging geom_boxplot and geom_jitter, and with no jittering, will produce the graph in the pdf.
library(ggplot2)
dodge <- position_dodge(width = 0.9)
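# factor() first: as.numeric() on a character column would give 1.1 and 1.9 instead of positions 1 and 2
ex5$memorypar2 <- as.numeric(factor(ex5$tailindex)) +
3 * (as.numeric(as.character(ex5$memorypar)) - 0.2)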
p <- ggplot(ex5,aes(x=tailindex , y=hillest)) +
scale_x_discrete() +
geom_jitter(aes(colour = memorypar, x = memorypar2),
position = position_jitter(width = .05), alpha = 0.5) +
geom_boxplot(aes(colour = memorypar), outlier.colour = NA, position = dodge) +
facet_wrap(~ process, nrow = 2)
p
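The arithmetic behind memorypar2 can be worked through in base R. With three groups sharing a dodge width of 0.9, the box centres sit 0.3 apart, which is what the factor of 3 reproduces. The names offset and tail_pos below are illustrative only.

```r
# Three dodge groups inside a total dodge width of 0.9 sit 0.3 apart
memorypar <- c("0.1", "0.2", "0.3")
offset <- 3 * (as.numeric(memorypar) - 0.2)
offset  # approximately -0.3, 0, 0.3

# tailindex "1.1" is the first discrete position (1), "1.9" the second (2)
tail_pos <- as.numeric(factor(c("1.1", "1.9")))

# x coordinate of every tailindex/memorypar combination
outer(tail_pos, offset, "+")
```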

Secondary / Dual axis - ggplot

I am opening this question for three reasons : First, to re-open the dual-axis discussion with ggplot. Second, to ask if there is a non-torturing generic approach to do that. And finally to ask for your help with respect to a work-around.
I realize that there are multiple discussions and questions on how to add a secondary axis to a ggplot. Those usually end up in one of two conclusions:
It's bad, don't do it: Hadley Wickham answered the same question here, concluding that it is not possible. He had a very good argument that "using separate y scales (not y-scales that are transformations of each other) are fundamentally flawed".
If you insist, over-complicate your life and use grids : for example here and here
However, here are some situations that I often face, in which the visualization would greatly benefit from dual-axis. I abstracted the concepts below.
1. The plot is wide, so duplicating the y-axis on the right side (or the x-axis on the top) would ease interpretation. (We've all stumbled across one of those plots where we need to use a ruler on the screen, because the axis is too far away.)
2. I need to add a new axis that is a transformation of the original axis (e.g. percentages, quantiles, ...). (I am currently facing a problem with that. Reproducible example below.)
3. And finally, adding grouping/meta information: I stumble across this when using categorical data with multiple levels (e.g. Categories = {1,2,x,y,z}, which are "meta-divided" into letters and numerics). Even though color-coding the meta-levels and adding a legend, or even facetting, solves the issue, things get a little simpler with a secondary axis, where the user won't need to match the color of the bars to that of the legend.
General question: Given the new extensibility features of ggplot2 2.0.0, is there a more robust, no-torture way to have a dual axis without using grids?
And one final comment: I absolutely agree that the wrong use of dual-axis can be dangerously misleading... But, isn't that the case for information visualization and data science in general?
Work-around question:
Currently, I need to have a percentage axis (2nd case). I used annotate and geom_hline as a workaround. However, I can't move the text outside the main plot. hjust also didn't seem to work for me.
Reproducible example:
library(ggplot2)
# Random values generation - with some manipulation :
maxVal = 500
value = sample(1:maxVal, size = 100, replace = T)
value[value < 400] = value[value < 400] * 0.2
value[value > 400] = value[value > 400] * 0.9
# Data frame preparation :
labels = paste0(sample(letters[1:3], replace = T, size = length(value)), as.character(1:length(value)))
df = data.frame(sample = factor(labels, levels = labels), value = sort(value, decreasing = T))
# Plotting : Adding Percentages/Quantiles as lines
ggplot(data = df, aes(x = sample, y = value)) +
geom_bar(stat = "identity", fill = "grey90", aes(y = maxVal )) +
geom_bar(stat = "identity", fill = "#00bbd4") +
geom_hline(yintercept = c(0, maxVal)) + # Min and max values
geom_hline(yintercept = c(maxVal*0.25, maxVal*0.5, maxVal*0.75), alpha = 0.2) + # Marking the 25%, 50% and 75% values
annotate(geom = "text", x = rep(100,3), y = c(maxVal*0.25, maxVal*0.5, maxVal*0.75),
label = c("25%", "50%", "75%"), vjust = 0, hjust = 0.2) +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
theme(panel.background = element_blank()) +
theme(plot.background = element_blank()) +
theme(plot.margin = unit(rep(2,4), units = "lines"))
In response to #1:
"We've all stumbled across one of those plots where we need to use a ruler on the screen, because the axis is too far"
The cowplot package can duplicate the axis on the opposite side:
# Assign your original plot to some variable, `gpv` <- ggplot( ... )
ggdraw(switch_axis_position(gpv, axis="y", keep="y"))
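Since ggplot2 2.2.0, continuous scales accept a sec.axis argument, which covers the cases where the second axis is a duplicate (dup_axis()) or a one-to-one transformation (sec_axis()) of the first. This is a hedged sketch of the percentage axis from the question, using made-up data; max_val and the axis title are illustrative names.

```r
library(ggplot2)

max_val <- 500
set.seed(1)
# Made-up data: 20 samples with values between 1 and max_val
df <- data.frame(sample = factor(1:20), value = sample(1:max_val, 20))

p <- ggplot(df, aes(x = sample, y = value)) +
  geom_col(fill = "#00bbd4") +
  scale_y_continuous(
    limits = c(0, max_val),
    # the right-hand axis shows each value as a percentage of max_val
    sec.axis = sec_axis(~ . / max_val * 100, name = "Percent of maximum")
  )
p
```

For case 1 (a plain copy of the axis on the other side), sec.axis = dup_axis() should be enough. This approach deliberately supports only transformations of the primary axis, not two independent scales.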

How to adjust figure settings in plotmatrix?

Can I adjust the point size, alpha, font, and axis ticks in a plotmatrix?
Here is an example:
library(ggplot2)
plotmatrix(iris)
How can I:
make the points twice as big
set alpha = 0.5
have no more than 5 ticks on each axis
set font to 1/2 size?
I have fiddled with the mapping = aes() argument to plotmatrix as well as opts() and adding layers such as + geom_point(alpha = 0.5, size = 14), but none of these seem to do anything. I have hacked a bit of a fix to the size by writing to a large pdf (pdf(file = "foo.pdf", height = 10, width = 10)), but this provides only a limited amount of control.
Pretty much all of the ggplot2 scatterplot matrix options are still fairly new and can be a bit experimental.
The facilities in GGally do allow you to construct this kind of plot manually, though:
library(ggplot2)
library(GGally)
custom_iris <- ggpairs(iris,upper = "blank",lower = "blank",
title = "Custom Example")
p1 <- ggplot(iris,aes(x = Sepal.Length,y = Sepal.Width)) +
geom_point(size = 1,alpha = 0.3)
p2 <- ggplot(iris,aes(x = Sepal.Width,y = Sepal.Length)) +
geom_point()
custom_iris <- putPlot(custom_iris,p1,2,1)
custom_iris <- putPlot(custom_iris,p2,3,2)
custom_iris
I did that simply by directly following the last example in ?ggpairs.
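plotmatrix() is no longer part of current ggplot2 releases. With a reasonably recent GGally, point size and alpha can be passed to the panels through wrap(), and a theme can be added to the whole matrix. This is only a sketch under those assumptions: the size, alpha and base_size values are arbitrary, and it does not address the number of axis ticks.

```r
library(ggplot2)
library(GGally)

pm <- ggpairs(
  iris,
  columns = 1:4,  # numeric columns only
  # wrap() forwards size and alpha to the point layer of each lower panel
  lower = list(continuous = wrap("points", alpha = 0.5, size = 3)),
  upper = list(continuous = "blank")
) +
  theme_grey(base_size = 6)  # smaller text throughout the matrix
pm
```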
