Whisker plots to compare mean and variance between clusters [duplicate]

I am trying to recreate a figure from a ggplot2 seminar: http://dl.dropbox.com/u/42707925/ggplot2/ggplot2slides.pdf.
In this case, I am trying to generate Example 5, which shows jittered data points that are also dodged. When I run the code, the points are centered on the correct line but are not jittered.
Here is the code, taken directly from the presentation.
library(ggplot2)
set.seed(12345)
hillest<-c(rep(1.1,100*4*3)+rnorm(100*4*3,sd=0.2),
rep(1.9,100*4*3)+rnorm(100*4*3,sd=0.2))
rep<-rep(1:100,4*3*2)
process<-rep(rep(c("Process 1","Process 2","Process 3","Process 4"),each=100),3*2)
memorypar<-rep(rep(c("0.1","0.2","0.3"),each=4*100),2)
tailindex<-rep(c("1.1","1.9"),each=3*4*100)
ex5<-data.frame(hillest=hillest,rep=rep,process=process,memorypar=memorypar, tailindex=tailindex)
stat_sum_df <- function(fun, geom="crossbar", ...) {stat_summary(fun.data=fun, geom=geom, ...) }
dodge <- position_dodge(width=0.9)
p<- ggplot(ex5,aes(x=tailindex ,y=hillest,color=memorypar))
p<- p + facet_wrap(~process,nrow=2) + geom_jitter(position=dodge) +geom_boxplot(position=dodge)
p

ggplot2 version 1.0.0 added a new position, position_jitterdodge(), that is made for this situation. It should be used inside geom_point(), and fill= should be set inside aes() to indicate which variable the data are dodged by. Use the dodge.width= argument to control the width of the dodging.
ggplot(ex5, aes(x=tailindex, y=hillest, color=memorypar, fill=memorypar)) +
facet_wrap(~process, nrow=2) +
geom_point(position=position_jitterdodge(dodge.width=0.9)) +
geom_boxplot(fill="white", outlier.colour=NA, position=position_dodge(width=0.9))
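As a minimal sketch of how the two widths interact, the following uses a small made-up data frame (toy, with columns x, g and y, none of which come from the question) and draws the points after the boxes so they stay visible. The seed argument is only there to make the jitter reproducible and assumes a reasonably recent ggplot2.

```r
library(ggplot2)

set.seed(1)
# Made-up data: two x categories, three dodged groups, ten points each
toy <- data.frame(
  x = rep(c("a", "b"), each = 30),
  g = rep(rep(c("g1", "g2", "g3"), each = 10), 2),
  y = rnorm(60)
)

p_jd <- ggplot(toy, aes(x, y, colour = g, fill = g)) +
  # Boxes first, with outliers hidden because the raw points are drawn anyway
  geom_boxplot(fill = "white", outlier.colour = NA,
               position = position_dodge(width = 0.9)) +
  # dodge.width matches the boxplot dodge; jitter.width sets the spread of each cloud
  geom_point(position = position_jitterdodge(jitter.width = 0.2,
                                             dodge.width = 0.9,
                                             seed = 42))
p_jd
```

Lowering jitter.width tightens each cloud, while dodge.width should match the boxplot's dodge width so the clouds stay centered on their boxes.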

EDIT: There is a better solution with ggplot2 version 1.0.0 using position_jitterdodge. See @Didzis Elferts' answer. Note that dodge.width controls the width of the dodging and jitter.width controls the width of the jittering.
I'm not sure how the code produced the graph in the pdf.
But does something like this get you close to what you're after?
I convert tailindex and memorypar to numeric, combine them, and use the result as the x coordinate for the geom_jitter layer. There is probably a more efficient way to do it. I would also like to see how dodging both geom_boxplot and geom_jitter, with no jittering, could produce the graph in the PDF.
library(ggplot2)
dodge <- position_dodge(width = 0.9)
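# Use the factor codes (1, 2) for tailindex; as.numeric() on the raw strings
# would give 1.1 and 1.9 when the column is character (the default since R 4.0)
ex5$memorypar2 <- as.numeric(factor(ex5$tailindex)) +
3 * (as.numeric(as.character(ex5$memorypar)) - 0.2)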
p <- ggplot(ex5,aes(x=tailindex , y=hillest)) +
scale_x_discrete() +
geom_jitter(aes(colour = memorypar, x = memorypar2),
position = position_jitter(width = .05), alpha = 0.5) +
geom_boxplot(aes(colour = memorypar), outlier.colour = NA, position = dodge) +
facet_wrap(~ process, nrow = 2)
p
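The factor of 3 and the shift by 0.2 in memorypar2 reproduce where a dodge of width 0.9 puts three groups. A small sketch of that arithmetic, using a hypothetical helper dodge_offsets that is not part of ggplot2 and assumes equally wide groups:

```r
# Offsets applied by a dodge of total width `width` to `n` equally wide groups
# (hypothetical helper, not part of ggplot2)
dodge_offsets <- function(n, width = 0.9) {
  width * (seq_len(n) - (n + 1) / 2) / n
}

# Three groups, dodge width 0.9
dodge_offsets(3)

# The same offsets, computed from the memorypar values as in the answer
3 * (c(0.1, 0.2, 0.3) - 0.2)
```

Both expressions give offsets of roughly -0.3, 0 and 0.3 around each x position, which is why the manually placed points line up with the dodged boxes.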

Related

Reduce distance in plot X labels (R: ggplot2)

This is my dataframe:
df = data.frame(info=1:30, type=c(replicate(5,'A'), replicate(5,'B')), group= c(replicate(10,'D1'), replicate(10,'D2'), replicate(10,'D3')))
I want to make a jitter plot of my data distinguished by group (X-label) and type (colour):
ggplot()+
theme(panel.background=element_rect(colour="grey", size=0.2, fill='grey100'))+
geom_jitter(data=df, aes(x=group, y=info, color=type, shape=type), position=position_dodge(0.2), cex=2)+
scale_shape_manual(values=c(17,15,19))+
scale_color_manual(values=c(A="mediumvioletred", B="blue"))
How can I reduce the distance between the X-labels (D1, D2, D3) in the representation?
P.S. I want to do this even if it leaves blank space in the plot.

Here are a few options.
# Setting up the plot
library(ggplot2)
df <- data.frame(
info=1:30,
type=c(replicate(5,'A'), replicate(5,'B')),
group= c(replicate(10,'D1'), replicate(10,'D2'), replicate(10,'D3'))
)
p <- ggplot(df, aes(group, info, colour = type, shape = type))
Option 1: increase the dodge distance. This won't put the labels closer, but it makes better use of the space available so that the labels appear less isolated.
p +
geom_point(position = position_dodge(width = 0.9))
Option 2: Expand the x-axis. By default a discrete axis is padded by 0.6 units at each end; a larger expansion (here a multiplicative factor of 2) increases the space at the ends of the axis, putting the labels closer together.
p +
geom_point(position = position_dodge(0.2)) +
scale_x_discrete(expand = c(2, 0))
Option 3: change the aspect ratio. Depending on the plotting window size, this also visually puts the x-axis labels closer together.
p +
geom_point(position = position_dodge(0.2)) +
theme(aspect.ratio = 2)
Created on 2021-06-25 by the reprex package (v1.0.0)
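Option 2 can also be written with the expansion() helper, which I believe is available from ggplot2 3.3.0 onward and makes it explicit whether the padding is additive or multiplicative. This is a sketch only; the value add = 2 is an arbitrary choice, and the data frame is rebuilt here so the example runs on its own.

```r
library(ggplot2)

df <- data.frame(
  info  = 1:30,
  type  = rep(rep(c("A", "B"), each = 5), 3),
  group = rep(c("D1", "D2", "D3"), each = 10)
)

p_expand <- ggplot(df, aes(group, info, colour = type, shape = type)) +
  geom_point(position = position_dodge(width = 0.2)) +
  # add = 2 pads each end of the axis by two category widths
  scale_x_discrete(expand = expansion(add = 2))
p_expand
```

Larger values of add squeeze D1, D2 and D3 further toward the middle of the panel at the cost of more empty space on either side.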

Try adding coord_fixed(ratio = 0.2) and play around with the ratio.
ggplot()+
theme(panel.background=element_rect(colour="grey", size=0.2, fill='grey100'))+
geom_jitter(data=df, aes(x=group, y=info, color=type, shape=type), position=position_dodge(0.2))+
scale_shape_manual(values=c(17,15,19))+
scale_color_manual(values=c(A="mediumvioletred", B="blue")) + coord_fixed(ratio = 0.2)

The simplest solution is to resize the plot. For example if you follow your command with ggsave("my_plot.pdf", width = 3, height = 4.5) it looks like this:
Or in an Rmd file you can control the dimensions by various means: see this link.

Restrain scattered jitter points within a violin plot by ggplot2

The following is used to generate the violin plot in ggplot2:
ggplot(violin,aes(x=variable,y=log(value+0.5),color=Group)) +
geom_violin(scale="width") +
geom_jitter(aes(group=Group), position=position_jitterdodge()) +
stat_summary(fun.y="mean",geom="crossbar", mapping=aes(ymin=..y.., ymax=..y..),
width=1, position=position_dodge(),show.legend = FALSE) +
theme(axis.text.x = element_text(angle = 45, margin=margin(0.5, unit="cm")))
The resulting plot looks like the following:
As you can see, some points are jittered outside the boundary of the violin shape, and I need those points to be inside the violin. I've tried different levels of jittering but haven't had any success. I'd appreciate any pointers on how to achieve this.

The package ggbeeswarm has the geoms quasirandom and beeswarm, which do exactly what you are searching for: https://github.com/eclarke/ggbeeswarm
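A minimal sketch of the grouped case, assuming ggbeeswarm is installed. The data frame violin_toy is simulated as a stand-in for the question's violin data, and the width and dodge.width values are guesses that will likely need tuning for real data.

```r
library(ggplot2)
library(ggbeeswarm)

set.seed(1)
# Simulated stand-in for the question's `violin` data frame
violin_toy <- data.frame(
  variable = rep(c("v1", "v2"), each = 100),
  Group    = rep(c("ctrl", "case"), times = 100),
  value    = rexp(200)
)

p_qr <- ggplot(violin_toy, aes(variable, log(value + 0.5), colour = Group)) +
  geom_violin(scale = "width", position = position_dodge(width = 0.9)) +
  # dodge.width matches the violins so each point cloud sits on its own violin
  geom_quasirandom(dodge.width = 0.9, width = 0.2, alpha = 0.5)
p_qr
```

Because geom_quasirandom spreads points according to their local density, the clouds tend to follow the violin outline rather than filling a rectangle the way uniform jitter does.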

This is a somewhat old question, but I think there is a better solution.
As @Richard Telford pointed out in a comment, geom_sina (from the ggforce package) is the best solution IMO.
simulate data
library(ggplot2)
library(ggforce)
df <- data.frame(data=rnorm(1200),
group=rep(c("A","A","A", "B","B","C"),
200)
)
make plot
ggplot(df, aes(y=data,x=group,color=group)) +
geom_violin()+
geom_sina()
result
Hope this is helpful.

Option 1
Using the function geom_quasirandom from the package ggbeeswarm:
The quasirandom geom is a convenient means to offset points within categories to reduce overplotting. Uses the vipor package.
library(ggbeeswarm)
p <- ggplot(mpg, aes(class, hwy))
p + geom_violin(width = 1.3) + geom_quasirandom(alpha = 0.2, width = 0.2)
Option 2
Not a satisfactory answer, because by restricting the horizontal jitter we defeat the purpose of handling overplotting. But you can enlarge the width of the violin plots (width = 1.3), and play with alpha for transparency and limit the horizontal jitter (width = .02).
p <- ggplot(mpg, aes(class, hwy))
p + geom_violin(width = 1.3) + geom_jitter(alpha = 0.2, width = .02)

changing ggplot legend unit scale

This question is motivated by a previous post illustrating various ways to change how axis scales are plotted in a ggplot figure, from the default exponential notation to the full integer value (when one's axis values are very large). While I am able to convert the axis scales from exponential notation to full values, I am unclear how one would achieve the same goal for the values appearing in the legend.
While I understand that one can manually change the length of the legend scale with "scale_color..." or "scale_fill..." followed by the "limits" argument, this does not appear to be a solution to getting my legend values to show "6000000000" rather than "6e+09" (or "0" rather than "0e+00" for that matter).
The following example should suffice. My hope is someone can point out how to implement the 'scales' package to apply for legend scales rather than axes scales.
Thanks very much.
library(ggplot2)
library(scales)
Data <- data.frame(
pi = c(2,71,828,1828,45904,523536,2874713,52662497,757247093,6999595749),
e = c(3,14,159,2653,58979,311599,7963468,54418516,1590576171, 99),
face = 1:10)
p <- ggplot(data = Data, aes(x=face, y=e, colour = pi))
myplot <- p + geom_point() +
scale_y_continuous(labels = comma) +
scale_color_gradientn(colours = rainbow(2), limits=c(0,7000000000))
myplot

Use the Comma formatter in scale_color_gradientn by setting labels = comma e.g.:
p <- ggplot(data = Data, aes(x=face, y=e, colour = pi))
myplot <- p + geom_point() +
scale_y_continuous(labels = comma) +
scale_color_gradientn(colours = rainbow(2), limits=c(0,7000000000), labels = comma)
myplot
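To see what the legend receives, and to get plain digits without thousands separators (which is what the question literally asks for), here is a small sketch. The function plain is a hypothetical helper built on base R's format(), not something provided by scales.

```r
library(scales)

# Labels produced by the comma formatter
comma(c(0, 6e9))

# Hypothetical labeller for plain digits with no separators
plain <- function(x) format(x, scientific = FALSE, trim = TRUE)
plain(c(0, 6e9))
```

Any function that takes the breaks and returns a character vector can be passed as labels, so labels = plain should work in scale_color_gradientn in the same way as labels = comma.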

Merging legends when plotting larger points lighter and smaller points darker in ggplot2

I have trawled the ggplot2 documentation, Stack Overflow and the ggplot2 Google Groups mailing list, but to no avail.
Please can someone tell me how to merge the legends for alpha (opacity) and size? They are titled "(1-val2)" and "val2", respectively.
Normally, mapping alpha and size to val2 would automatically merge the legends. However, because I'm using "val2" and "1-val2", this does not happen. I have played around with scale_size_continuous and scale_alpha_continuous, but didn't manage to get it right.
Here is a MWE:
require(ggplot2)
dummy <- data.frame(x=c(runif(12,5,10)),
y=c(runif(12,5,10)),
val1=c("a","b","c","a","b","c","a","b","c","a","b","c"),
val2=c(0.4,0.6,0.7,0.2,0.8,0.6,0.7,0.2,0.5,0.8,0.4,0.7))
p <- ggplot() +
geom_point(data=dummy, aes(x=x, y=y,color=val1, size=val2, alpha=(1-val2)))

Use the range argument of scale_alpha_continuous to invert the scale:
ggplot() +
geom_point(data=dummy, aes(x=x, y=y,color=val1, size=val2, alpha=val2)) +
scale_alpha_continuous(range = c(1, 0.1))

The trans argument may also be useful here:
ggplot() +
geom_point(data=dummy, aes(x = x, y = y, color = val1, size = val2, alpha = val2)) +
scale_alpha_continuous(trans = "reverse")
The description of the trans argument in ?scale_alpha_continuous and ?continuous_scale is pretty thin. However, you can find some examples here.
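As a rough illustration of what an inverted range does to the data, the sketch below rescales val2 by hand with scales::rescale. This assumes the alpha scale maps values linearly onto its range, which is my understanding of the default behaviour rather than something stated in the answers.

```r
library(scales)

val2 <- c(0.4, 0.6, 0.7, 0.2, 0.8, 0.6, 0.7, 0.2, 0.5, 0.8, 0.4, 0.7)

# Map the smallest val2 to alpha 1 and the largest to alpha 0.1
alpha_values <- rescale(val2, to = c(1, 0.1))
range(alpha_values)
```

Since size and alpha are then both mapped to the same variable, they share a title and breaks, which is what lets ggplot2 draw them as a single legend.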

How do I create a categorical scatterplot in R like boxplots?

Does anyone know how to create a scatterplot in R that looks like the ones produced by GraphPad Prism:
I tried using boxplots, but they don't display the data the way I want. The column scatterplots that GraphPad can generate show the data better for me.
Any suggestions would be appreciated.

As @smillig mentioned, you can achieve this using ggplot2. The code below reproduces the plot that you are after pretty well, but be warned that it is quite tricky. First load the ggplot2 package and generate some data:
library(ggplot2)
dd = data.frame(values=runif(21), type = c("Control", "Treated", "Treated + A"))
Next change the default theme:
theme_set(theme_bw())
Now we build the plot.
Construct a base object - nothing is plotted:
g = ggplot(dd, aes(type, values))
Add on the points: adjust the default jitter and change glyph according to type:
g = g + geom_jitter(aes(pch=type), position=position_jitter(width=0.1))
Add on the "box": calculate where the box ends. In this case, I've chosen the average value. If you don't want the box, just omit this step.
# fun, fun.max and fun.min replace fun.y, fun.ymax and fun.ymin as of ggplot2 3.3.0
g = g + stat_summary(fun = mean,
geom="bar", fill="white", colour="black")
Add on some error bars: calculate the upper/lower bounds of a 95% confidence interval for the mean (mean ± t × sd/sqrt(n), with n - 1 degrees of freedom) and adjust the bar width:
g = g + stat_summary(
fun.max=function(i) mean(i) + qt(0.975, length(i) - 1)*sd(i)/sqrt(length(i)),
fun.min=function(i) mean(i) - qt(0.975, length(i) - 1)*sd(i)/sqrt(length(i)),
geom="errorbar", width=0.2)
Display the plot
g
In my R code above I used stat_summary to calculate the values needed on the fly. You could also create separate data frames and use geom_errorbar and geom_bar.
To use base R, have a look at my answer to this question.
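The error-bar step can also be written as a single summary function passed through fun.data. The sketch below uses a hypothetical helper, mean_ci, that returns the mean with a t-based confidence interval; it assumes roughly normal data and is not the only reasonable choice of interval.

```r
library(ggplot2)

# Hypothetical helper: mean with a t-based confidence interval,
# returned in the y / ymin / ymax layout that fun.data expects
mean_ci <- function(x, level = 0.95) {
  n <- length(x)
  half_width <- qt(1 - (1 - level) / 2, df = n - 1) * sd(x) / sqrt(n)
  data.frame(y = mean(x), ymin = mean(x) - half_width, ymax = mean(x) + half_width)
}

set.seed(1)
dd <- data.frame(values = runif(21),
                 type = rep(c("Control", "Treated", "Treated + A"), 7))

p_ci <- ggplot(dd, aes(type, values)) +
  # Jitter horizontally only, so the plotted values stay exact
  geom_jitter(aes(shape = type), width = 0.1, height = 0) +
  stat_summary(fun.data = mean_ci, geom = "errorbar", width = 0.2)
p_ci
```

Keeping the interval in one function makes it easy to check against t.test() or to swap in a different summary, such as mean ± one standard deviation.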

If you don't mind using the ggplot2 package, there's an easy way to make similar graphics with geom_boxplot and geom_jitter. Using the mtcars example data:
library(ggplot2)
p <- ggplot(mtcars, aes(factor(cyl), mpg))
p + geom_boxplot() + geom_jitter() + theme_bw()
which produces the following graphic:
The documentation can be seen here: http://had.co.nz/ggplot2/geom_boxplot.html

I recently faced the same problem and found my own solution, using ggplot2.
As an example, I created a subset of the chickwts dataset.
library(ggplot2)
library(dplyr)
data(chickwts)
Dataset <- chickwts %>%
filter(feed == "sunflower" | feed == "soybean")
Since it is not possible to change the dots to symbols in geom_dotplot(), I used geom_jitter() as follows:
Dataset %>%
ggplot(aes(feed, weight, fill = feed)) +
geom_jitter(aes(shape = feed, col = feed), size = 2.5, width = 0.1)+
stat_summary(fun = mean, geom = "crossbar", width = 0.7,
col = c("#9E0142","#3288BD")) +
scale_fill_manual(values = c("#9E0142","#3288BD")) +
scale_colour_manual(values = c("#9E0142","#3288BD")) +
theme_bw()
This is the final plot:
For more details, you can have a look at this post:
http://withheadintheclouds1.blogspot.com/2021/04/building-dot-plot-in-r-similar-to-those.html?m=1
