How to add vertical lines to ggplot boxplots in R

I am plotting boxplots from this data:
MY_LABEL MY_REAL MY_CATEGORY
1 [POS] .56 POS
1 [POS] .57 POS
1 [POS] .37 POS
2 [POS] .51 POS
1 [sim v] .65 sim v
...
I'm using ggplot2:
ggplot( data=myDF, aes( x=MY_LABEL, y=MY_REAL, fill=MY_CATEGORY ) ) +
scale_colour_manual( values=palette ) +
coord_flip() +
geom_boxplot( outlier.size = 0 )
This works fine, and groups the boxplots by the field MY_CATEGORY:
I'd like to do 2 things:
1) To improve the clarity of this plot, I'd like to add separators between the various blocks, i.e. between POS and sim v, between sim v and C, etc (see the ugly red lines in the plot).
I've been struggling with geom_vline with no luck.
Alternatively, I'd like to add blank space between the blocks.
2) If I print this plot in grayscale, I can't distinguish the different blocks. I'm trying to force a different palette with:
scale_colour_manual( values=c("black","darkgray","gray","white") )
Again, no luck, the plot doesn't change at all.
What would you suggest?

Would this work for you?
require(ggplot2)
mtcars$cyl2<- ifelse(mtcars$cyl > 4, c('A'), c('B'))
p <- ggplot(mtcars, aes(factor(cyl), mpg))
p + geom_boxplot() + facet_grid(. ~ cyl2, scales = "free", space = "free")
would give something like this,

No one covered the horizontal line route, so I thought I'd add it. Not sure why geom_vline() wasn't working for you. Here's what I did (chose to play off of Eric Fail's approach):
require(ggplot2)
p <- ggplot(mtcars, aes(factor(cyl), mpg))
p <- p + geom_boxplot(aes(fill=factor(cyl))) + coord_flip()
p <- p + geom_vline(xintercept=c(1.5,2.5))
p
There are only three boxplots here, but from playing around, ggplot appears to place them at integer locations. Just figure out which box you want a line after (the nth) and set the xintercept argument to n+0.5. You can change the thickness and color to your liking: just add size=width (linewidth=width in ggplot2 3.4.0 and later) and colour="name" after the xintercept argument.
By the way, geom_vline() seems to work for me regardless of whether it's before or after coord_flip(). I find that counter-intuitive.
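If you don't want to work out the n+0.5 positions by hand, here is a minimal sketch that derives them from the data. The data frame is a made-up stand-in for the question's data (only the column names come from the question), and it assumes the labels are ordered so that each category forms one contiguous block on the axis.

```r
library(ggplot2)

# Hypothetical stand-in for the question's data: labels nested in categories
myDF <- data.frame(
  MY_LABEL    = rep(c("1 [POS]", "2 [POS]", "1 [sim v]", "2 [sim v]", "1 [C]"), each = 4),
  MY_CATEGORY = rep(c("POS", "POS", "sim v", "sim v", "C"), each = 4),
  MY_REAL     = seq(0.1, 0.9, length.out = 20)
)

# Fix the axis order so that labels of the same category sit next to each other
myDF$MY_LABEL <- factor(myDF$MY_LABEL, levels = unique(myDF$MY_LABEL))

# Category of each axis position, in axis order
cats <- myDF$MY_CATEGORY[match(levels(myDF$MY_LABEL), myDF$MY_LABEL)]

# A separator goes at n + 0.5 wherever the category changes after position n
seps <- which(head(cats, -1) != tail(cats, -1)) + 0.5

p <- ggplot(myDF, aes(x = MY_LABEL, y = MY_REAL, fill = MY_CATEGORY)) +
  geom_boxplot() +
  geom_vline(xintercept = seps, colour = "red") +
  coord_flip()
```

With this toy data the separators land at 2.5 and 4.5, i.e. after the last POS label and after the last sim v label.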
I'm not sure bdemarest is correct that you need the names to match the category names. I think the issue is that you used scale_colour_manual(), which applies if you used aes(..., colour=var) whereas you used fill=var. Thus, you need scale_fill_manual. Building on the above, we can add:
p <- p + scale_fill_manual(values=c("black","gray","white"))
p
Note that I've not defined any factor names for the colors to match. I think the colors are simply applied to your factor levels according to their order, but I could be wrong.
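One way to check the ordering guess is to look at the fill values ggplot actually computes. This is only a small sketch on mtcars; layer_data() is the ggplot2 helper that returns the data a layer is drawn from.

```r
library(ggplot2)

p_fill <- ggplot(mtcars, aes(factor(cyl), mpg, fill = factor(cyl))) +
  geom_boxplot() +
  scale_fill_manual(values = c("black", "gray", "white"))

# One row per box, in factor-level order (4, 6, 8 cylinders)
fills <- as.character(layer_data(p_fill)$fill)
fills
```

Here the unnamed values are handed out in the order of the factor levels, so 4 cylinders gets black, 6 gets gray and 8 gets white.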
The end result of all of the above:

To change the fill colors, you need a named vector of values. The names need to exactly match the category names (the levels of MY_CATEGORY, the variable mapped to fill).
scale_fill_manual(values=c("POS"="black", "sim v"="gray50",
"C"="gray80", "sim t"="white"))
To separate the y-axis categories, try facet_grid().
facet_grid(factor(MY_CATEGORY) ~ ., drop=TRUE)
I'm not sure that this will work because I don't have your data to test it.

Related

What is the purpose of using facet_grid(variable ~ .) instead of just using facet_wrap?

So I'm teaching myself R right now using this online resource: "https://r4ds.had.co.nz/data-visualisation.html#facets"
This particular section is going over the use of facet_wrap and facet_grid. It's clear to me that facet_grid is primarily used when wanting to visualize a plot along two additional dimensions, rather than just one. What I don't understand is why you can use facet_grid(.~variable) or facet_grid(variable~.) to basically achieve the same result as facet_wrap. Putting a "." in place of a variable results in just not faceting along the row or column dimension, or in other words showing 1 additional variable just as facet_wrap would do.
If anyone can shed some light on this, thank you!
If you use facet_grid, the facets will always be in one row/column. They will never wrap to make a rectangle. But really if you just have one variable with few levels, it doesn't much matter.
You can also see that facet_grid(.~variable) and facet_grid(variable~.) will put the facet labels in different places (row headings vs column headings)
mg <- ggplot(mtcars, aes(x = mpg, y = wt)) + geom_point()
mg + facet_grid(vs~ .) + labs(title="facet_grid(vs~ .)")
mg + facet_grid(.~ vs) + labs(title="facet_grid(.~ vs)")
So in the most simple of cases, there's nothing that different between them. The main reason to use facet_grid is to have a single, common axis for all facets so you can easily scan across all panels to make a direct comparison of data.
Actually, the same result is not produced all the time...
The number of facets which appear across the graphics pane is fixed with facet_grid (always the number of unique values in the variable), whereas facet_wrap, like its name suggests, wraps the facets around the graphics pane. In this way the functions only result in the same graph when the number of facets produced is small.
Both facet_grid and facet_wrap take their arguments in the form row~columns, and nowadays we don't need to use the dot with facet_grid.
In order to compare their differences let's add a new variable with 8 unique values to the mtcars data set:
library(tidyverse)
mtcars$example <- rep(1:8, length.out = 32)
ggplot()+
geom_point(data = mtcars, aes(x = mpg, y = wt))+
facet_grid(~example, labeller = label_both)
Which results in a cluttered plot:
Compared to:
ggplot()+
geom_point(data = mtcars, aes(x = mpg, y = wt))+
facet_wrap(~example, labeller = label_both)
Which results in:

Removing the NA element from the legend for extra data element in facets

In general I know that I can use breaks in my scale_color_manual command to remove a specific label. But for some reason this doesn't work in my case and I don't see the error. If I try to set the breaks (uncommenting the breaks line) it removes the whole legend.
Did I overlook something? Is it maybe because the data is not the overall data frame? The thing is that I want to have thresholds in only one of the facets.
library(ggplot2)
threshold <- log2(c(1.5,2))
ggplot() +
geom_hline(data=data.frame(category=c('contains virus IDs',
rep('enriched',4)),
threshold=c(NA, threshold, -threshold),
color=c(NA, rep(2^threshold, 2))),
aes(yintercept=threshold, color=as.factor(color)),
na.rm=TRUE) +
scale_color_manual('threshold',
values=c(`2`='red', `1.5`='orange')
# , breaks=c(`2`='2 fold', `1.5`='1.5 fold')
) +
facet_grid(~ category, scales='free_x', space='free_x')
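A possible explanation, offered as a sketch rather than a confirmed answer: in the manual scales, breaks are matched against the data values (here "2" and "1.5"), while the display text belongs in labels. The commented-out line passes '2 fold' and '1.5 fold' as breaks, which match no value, so no legend key survives. The version below simplifies the data frame slightly (the fold column is written out directly instead of computed as 2^threshold) and the names lines_df and fold are mine.

```r
library(ggplot2)

fold <- c(1.5, 2)

# Simplified version of the question's data; 'fold' replaces the 'color' column
lines_df <- data.frame(
  category  = c("contains virus IDs", rep("enriched", 4)),
  threshold = c(NA, log2(fold), -log2(fold)),
  fold      = factor(c(NA, fold, fold))
)

p_thr <- ggplot() +
  geom_hline(data = lines_df,
             aes(yintercept = threshold, color = fold),
             na.rm = TRUE) +
  scale_color_manual("threshold",
                     values = c(`2` = "red", `1.5` = "orange"),
                     breaks = c("2", "1.5"),            # data values to show
                     labels = c("2 fold", "1.5 fold")   # text shown for them
  ) +
  facet_grid(~ category, scales = "free_x", space = "free_x")
```

Since NA is not listed in breaks, it should no longer get a legend entry.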

Add an average line to an existing plot

I want to add an average line to the existing plot.
library(ggplot2)
A <- c(1:10)
B <- c(1,1,2,2,3,3,4,4,5,5)
donnees <- data.frame(A,B)
datetime<-donnees[,2]
Indcatotvalue<-donnees[,1]
df<-donnees
mn<-tapply(donnees[,1],donnees[,2],mean)
moyenne <- data.frame(template=names(mn),mean=mn)
ggplot(data=df,
aes_q(x=datetime,
y=Indcatotvalue)) + geom_line()
I have tried to add :
geom_line(aes(y = moyenne[,2], colour = "blue"))
or :
lines(moyenne[,1],moyenne[,2],col="blue")
but nothing happens, and I especially don't understand why the function "lines" has no effect.
When you say average line I'm assuming you want to plot a line that represents the average value of Y (Indcatotvalue). For that you want to use geom_hline() which plots horizontal lines on your graph:
ggplot(data=df,aes_q(x=datetime,y=Indcatotvalue)) +
geom_line() +
geom_hline(yintercept = mean(Indcatotvalue), color="blue")
Which, with the example numbers you gave, will give you a plot that looks like this:
The function stat_summary is perfect here.
I found the answer in a Google Groups post by Brian Diggs (note that ggplot2 3.3.0 and later call the argument fun instead of fun.y):
p + stat_summary(aes(group=bucket), fun.y=mean, geom="line", colour="green")
You need to set the group to the faceting variable explicitly since
otherwise it will be type and bucket (which looks like type since type
is nested in bucket).
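Applied to the data from the question, a minimal sketch could look like this. It assumes the goal is the mean of A for each value of B, and it maps the columns by name instead of using the external vectors and aes_q().

```r
library(ggplot2)

donnees <- data.frame(A = 1:10,
                      B = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5))

p_mean <- ggplot(donnees, aes(x = B, y = A)) +
  geom_line() +
  # Summarise y at every distinct x with mean() and connect the results
  stat_summary(fun = mean, geom = "line", colour = "blue")

# The values the blue line passes through, one per distinct B
layer_data(p_mean, 2)$y
```

The base function lines() has no effect here because it draws on base graphics devices, not on ggplot objects.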

R - chart Y,X over categories Z, some categories as points others as lines

In a plot of Y and X over categories Z, I would like the categories to be represented by points of different colors, except for one category, which I would like to be displayed as a line connecting the points.
Here is the data and what I have so far:
library(ggplot2);library(reshape);library(scales);library(directlabels)
dat <- read.csv("https://dl.dropboxusercontent.com/u/4329509/Fdat_graf.csv")
dat_long <- melt(dat, id="ano")
p <- qplot(ano,value, data=dat_long, colour=variable)+
scale_y_log10(breaks=c(.1,1,10,100,500,1000),labels = comma) +
scale_x_continuous(breaks=seq(from=1960, to=2010, by=10)) +
theme_bw()
direct.label(p)
I would like for the "Lei_de_Moore" category to be represented by a line, as in this example (done in Stata):
Also, I would like to change a few things (maybe I should ask them in a different topic?):
Change the style of the graph colors to something more "vivid", as in the Stata example.
Change the Y axis. I just want plain numbers in non-scientific notation. I used labels = comma, but I don't want the comma itself. Ideally I would like the comma to be the decimal separator.
EDIT: I had asked another question on how to embed the legend for this graph (this post: Legend as text alongside points for each category and with same collor)
You can mix geoms if you use ggplot and pass only a subset of the data to different geoms. Here you can pass everything in dat_long to geom_point except rows where variable is Lei_de_Moore, and then pass only those dat_long rows to geom_line in a different call.
p <- ggplot(dat_long, aes(ano, value, color=variable)) +
geom_point(data=dat_long[dat_long$variable != 'Lei_de_Moore',]) +
geom_line(data=dat_long[dat_long$variable == 'Lei_de_Moore',]) +
scale_y_log10(breaks=c(.1,1,10,100,500,1000),labels = comma) +
scale_x_continuous(breaks=seq(from=1960, to=2010, by=10)) +
theme_bw()
For colors, have a look at RColorBrewer package palettes. Install the package and use ?brewer.pal to see some more options. For example, this one might work:
p <- p + scale_color_brewer(palette="Set1")
For the y-axis labels, you'll probably have to hack something together. Have a look at this question. So you could do something like this:
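# Returns a labelling function for the y scale
fmt <- function(){
# Round to one decimal and swap the decimal point for a comma
f <- function(x) sub(".", ",", as.character(round(x,1)), fixed=TRUE)
f
}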
p <- ggplot(dat_long, aes(ano, value, color=variable)) +
geom_point(data=dat_long[dat_long$variable != 'Lei_de_Moore',]) +
geom_line(data=dat_long[dat_long$variable == 'Lei_de_Moore',]) +
scale_y_log10(breaks=c(.1,1,10,100,500,1000), labels=fmt()) +
scale_x_continuous(breaks=seq(from=1960, to=2010, by=10)) +
theme_bw() +
scale_color_brewer(palette="Set1")
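To see what the labeller produces without drawing anything, you can call it directly on the break values. This is just a quick check of the same helper, repeated here so the snippet runs on its own.

```r
# Same helper as above: round to one decimal, comma as decimal separator
fmt <- function() {
  function(x) sub(".", ",", as.character(round(x, 1)), fixed = TRUE)
}

labs_out <- fmt()(c(.1, 1, 10, 100, 500, 1000))
labs_out
# "0,1" "1" "10" "100" "500" "1000"
```

Much larger values could still switch to scientific notation through as.character(), so this is only safe for breaks in roughly this range.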

How to control ylim for a faceted plot with different scales in ggplot2?

In the following example, how do I set separate ylims for each of my facets?
qplot(x, value, data=df, geom=c("smooth")) + facet_grid(variable ~ ., scale="free_y")
In each of the facets, the y-axis takes a different range of values and I would like to set different ylims for each of the facets.
The default ylims are too wide for the trend that I want to see.
This was brought up on the ggplot2 mailing list a short while ago. What you are asking for is currently not possible but I think it is in progress.
As far as I know this has not been implemented in ggplot2 yet. However, a workaround that will give you ylims exceeding what ggplot provides automatically is to add "artificial data". To reduce the ylims, simply remove the data you don't want to plot (see the end for an example).
Here is an example:
Let's just set up some dummy data that you want to plot
df <- data.frame(x=rep(seq(1,2,.1),4),f1=factor(rep(c("a","b"),each=22)),f2=factor(rep(c("x","y"),22)))
df <- within(df,y <- x^2)
Which we could plot using line graphs
p <- ggplot(df,aes(x,y))+geom_line()+facet_grid(f1~f2,scales="free_y")
print(p)
Assume we want to let y start at -10 in the first row and 0 in the second row, so we add a point at (0,-10) to the upper left plot and at (0,0) to the lower right plot (with facet_grid the y scale is shared along each row, so one point per row is enough):
ylim <- data.frame(x=rep(0,2),y=c(-10,0),f1=factor(c("a","b")),f2=factor(c("x","y")))
dfy <- rbind(df,ylim)
Now by limiting the x-scale between 1 and 2 those added points are not plotted (a warning is given):
p <- ggplot(dfy,aes(x,y))+geom_line()+facet_grid(f1~f2,scales="free_y")+xlim(c(1,2))
print(p)
Same would work for extending the margin above by adding points with higher y values at x values that lie outside the range of xlim.
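A variation on the same trick, as a sketch: instead of binding extra rows to the data and hiding them with xlim, the limit points can go into their own geom_blank() layer, which trains the scales but draws nothing. The data frame lims and its values are my own illustration of the -10 and 0 lower limits used above.

```r
library(ggplot2)

df <- data.frame(x  = rep(seq(1, 2, .1), 4),
                 f1 = factor(rep(c("a", "b"), each = 22)),
                 f2 = factor(rep(c("x", "y"), 22)))
df$y <- df$x^2

# One invisible point per facet row, carrying the desired lower limit
lims <- data.frame(f1 = factor(c("a", "b")), y = c(-10, 0))

p_blank <- ggplot(df, aes(x, y)) +
  geom_line() +
  # x = 1 lies inside the existing x range, so the x axis is unchanged
  geom_blank(data = lims, aes(x = 1, y = y)) +
  facet_grid(f1 ~ f2, scales = "free_y")
```

Because lims has no f2 column, the blank points are used in every column of their row, and no warning about removed rows is raised.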
This will not work if you want to reduce the ylim, in which case subsetting your data would be a solution. For example, to limit the upper row between -10 and 1.5 you could pass a filtered data frame to the layer (the subset=.() layer argument used in older ggplot2 versions was removed in 2.0.0):
p <- ggplot(dfy,aes(x,y))+geom_line(data=subset(dfy, y < 1.5 | f1 != "a"))+facet_grid(f1~f2,scales="free_y")+xlim(c(1,2))
print(p)
There are actually two packages that solve that problem now:
https://github.com/zeehio/facetscales, and https://cran.r-project.org/package=ggh4x.
I would recommend using ggh4x because it has very useful tools, such as nested facets (two variables defining the rows or columns), separate x and y scales for each facet, and multiple fill and colour scales.
For your problem, the solution would look like this:
library(ggh4x)
scales <- list(
# Here you have to specify all the scales, one for each facet row in your case
scale_y_continuous(limits = c(2, 10)),
scale_y_continuous(breaks = c(3, 4))
)
qplot(x, value, data=df, geom=c("smooth")) +
facet_grid(variable ~ ., scales="free_y") +
facetted_pos_scales(y = scales)
Here is an example using facet_wrap, where scales = "free" gives every panel its own x and y axes:
ggplot(mpg, aes(displ, hwy)) +
geom_point() +
facet_wrap(vars(class), scales = "free",
nrow=2,ncol=4)

Resources