Add rectangles around common values in ggplot - r

When I make an experimental design, I use ggplot to show the layout. Here's a simple example:
df <- data.frame(Block=rep(1:2, each=18),
Row=rep(1:9, 4),
Col=rep(1:4, each=9),
Treat=sample(c(1:6),replace=F))
Which I'll plot like:
df.p <- ggplot(df, aes(Row, Col)) + geom_tile(aes(fill=as.factor(Treat)))
to give:
Sometimes I have a structure within the design I would like to highlight by putting a box around it, for example a mainplot. In this case:
df$Mainplot <- ceiling(df$Row/3) + 3*(ceiling(df$Col/2) - 1)
I then use geom_rect and some messy code that needs adjusting for each design to generate something like:
Question: How do I add the rectangles around the mainplots in a simple way? It seems like a simple enough problem, but I haven't found an obvious way. I can map colour or some other aesthetic to mainplot, but I can't seem to surround them with a box. Any pointers greatly appreciated.
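To make the mainplot numbering concrete, here is a small base R sketch that lays the Mainplot formula from the question out as a grid. The names layout and grid are only illustrative, and expand.grid() is assumed to reproduce the same Row/Col ordering as the df in the question.

```r
# Same Row/Col layout as the question's df (Row varies fastest).
layout <- expand.grid(Row = 1:9, Col = 1:4)

# Mainplot formula copied from the question: blocks of 3 rows x 2 columns.
layout$Mainplot <- ceiling(layout$Row / 3) + 3 * (ceiling(layout$Col / 2) - 1)

# Arrange the mainplot ids as a Row x Col matrix for inspection.
grid <- matrix(layout$Mainplot, nrow = 9, ncol = 4,
               dimnames = list(Row = 1:9, Col = 1:4))
print(t(grid))
```

Each of the six mainplots should cover exactly six cells, which is what the boxes need to enclose.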

Here is a possible solution where I create an auxiliary data.frame for plotting borders with geom_rect(). I'm not sure if this is as simple as you would like! I hope the code that computes the rectangle coordinates will be reusable/generalizable with just a bit of additional effort.
library(ggplot2)
# Load example data.
df = data.frame(Block=rep(1:2, each=18),
Row=rep(1:9, 4),
Col=rep(1:4, each=9),
Treat=sample(c(1:6),replace=F))
df$Mainplot = ceiling(df$Row/3) + 3*(ceiling(df$Col/2) - 1)
# Create an auxiliary data.frame for plotting borders.
group_dat = data.frame(Mainplot=sort(unique(df$Mainplot)),
xmin=0, xmax=0, ymin=0, ymax=0)
# Fill data.frame with appropriate values.
for(i in 1:nrow(group_dat)) {
item = group_dat$Mainplot[i]
tmp = df[df$Mainplot == item, ]
group_dat[i, "xmin"] = min(tmp$Row) - 0.5
group_dat[i, "xmax"] = max(tmp$Row) + 0.5
group_dat[i, "ymin"] = min(tmp$Col) - 0.5
group_dat[i, "ymax"] = max(tmp$Col) + 0.5
}
p2 = ggplot() +
geom_tile(data=df, aes(x=Row, y=Col, fill=factor(Treat)),
colour="grey30", size=0.35) +
geom_rect(data=group_dat, aes(xmin=xmin, xmax=xmax, ymin=ymin, ymax=ymax),
size=1.4, colour="grey30", fill=NA)
ggsave(filename="plot_2.png", plot=p2, height=3, width=6.5)
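If the loop feels heavy, the same bounding boxes can probably be computed by a small helper. This is only a sketch: group_bounds() is a hypothetical name of my own, not a ggplot2 function, and it assumes every group forms a solid rectangle of unit-sized tiles.

```r
# Hypothetical helper: one bounding box per level of `group`.
# Assumes unit-sized tiles centred on integer coordinates.
group_bounds <- function(data, group, x = "Row", y = "Col") {
  pieces <- lapply(split(data, data[[group]]), function(d) {
    data.frame(group = d[[group]][1],
               xmin = min(d[[x]]) - 0.5, xmax = max(d[[x]]) + 0.5,
               ymin = min(d[[y]]) - 0.5, ymax = max(d[[y]]) + 0.5)
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

# Example layout matching the question (treatments omitted).
df <- expand.grid(Row = 1:9, Col = 1:4)
df$Mainplot <- ceiling(df$Row / 3) + 3 * (ceiling(df$Col / 2) - 1)
bounds <- group_bounds(df, "Mainplot")
print(bounds)
```

The result can be passed to geom_rect() in place of group_dat, with the same xmin/xmax/ymin/ymax mapping as in the code above.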

Here's a solution that might be easier. Just use geom_tile with alpha set to 0. I didn't take the time to work out an exact solution, but here's an example. To achieve what you want, I'm guessing you'll need to create a new data frame, which should be easy enough.
df <- data.frame(Block=rep(1:2, each=18),Row=rep(1:9, 4),Col=rep(1:4, each=9),Treat=sample(c(1:6),replace=F))
df$blocking <- rep(sort(rep(1:3,3)),4)
df.p <- ggplot(df, aes(Row, Col)) + geom_tile(aes(fill=as.factor(Treat)))
df.p+ geom_tile(data=df,aes(x=Row,y=blocking),colour="black",fill="white",alpha=0,lwd=1.4)
The alpha=0 creates a transparent tile, and you can then set the line width using lwd. That's probably easier than specifying all the rectangles. Hope it helps.

I thought it would be worth posting my own (non-ideal) solution, since it seems there's nothing obvious I'm missing. I'm going to leave the question unanswered in the hope someone will come up with something.
At present, I use geom_rect in a fashion that would probably be able to be made general (perhaps into a geom_border addition to ggplot??). For the example in my question, the essential information is that each mainplot is 3 x 2.
Adding onto df.p from the original question, this is what I do currently:
df.p1 <- df.p + geom_rect(aes(xmin=((Mainplot- 3*(ceiling(Col/2)-1) )-1)*3 + 0.5,
xmax=((Mainplot - 3*(ceiling(Col/2)-1))-1)*3 + 3.5,
ymin=ceiling(ceiling(Col/2)/2 + 2*(ceiling(Col/2)-1))-0.5,
ymax=2*ceiling(Col/2)+0.5),
colour="black", fill="transparent",size=1)
Ugly, I know - hence the question. That code generates the second plot from the question. Maybe the best option is building this all into a function.

Related

Identifying values in R Plot

I have been trying to identify extreme values in an R ggplot2 plot.
Is there any way to have a plot where besides the point (or instead of it) representing the values, it also shows the index? Or any other thing that allows you to quickly identify it?
The closest thing I found was with the identify() function, but it didn't work very well for me.
Any recommendations?
I'll give a basic ggplot plot:
df = data.frame(x = runif(10,0,1), y = runif(10,0,1))
ggplot(df, aes(x,y)) +
geom_point(col="red") + theme_bw()
Update:
I've been trying new things. I finally got exactly what I wanted.
df = data.frame(x = runif(10,0,1), y = runif(10,0,1))
ggplot(df, aes(x,y, label = rownames(df))) +
geom_point() + geom_text() + theme_bw()
Now I can easily identify the values that I want. Hope it helps other people that are new to ggplot.
If anyone knows ways to improve it, feel free to do so.
I'd suggest installing the plotly package and then running:
plotly::ggplotly(.Last.value)
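If labelling every point gets crowded, one option is to label only the extremes. The sketch below is base R and uses an arbitrary rule of my own (the two smallest and two largest y values); the column name label and the cutoff are assumptions, not something from the question.

```r
set.seed(1)
df <- data.frame(x = runif(10, 0, 1), y = runif(10, 0, 1))

# Assumed rule for "extreme": the two smallest and two largest y values.
is_extreme <- rank(df$y) <= 2 | rank(-df$y) <= 2

# Keep the row index as a label for extreme points, empty text otherwise.
df$label <- ifelse(is_extreme, rownames(df), "")
print(df)
```

With ggplot2 loaded, something like ggplot(df, aes(x, y, label = label)) + geom_point() + geom_text(vjust = -0.8) should then draw the index only next to the flagged points.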

Customize linetype in ggplot2 OR add automatic arrows/symbols below a line

I would like to use customized linetypes in ggplot. If that is impossible (which I believe to be true), then I am looking for a smart hack to plot arrowlike symbols above, or below, my line.
Some background:
I want to plot some water quality data and compare it to the standard (set by the European Water Framework Directive) in a red line. Here's some reproducible data and my plot:
datum <- seq.Date(as.Date("2014-01-01"), as.Date("2014-12-31"), by = "week")
df <- data.frame(datum = datum, y = rnorm(length(datum), mean = 100, sd = 40))
(plot1 <-
ggplot(df, aes(x=datum,y=y)) +
geom_line() +
geom_point() +
theme_classic()+
geom_hline(aes(yintercept=70),colour="red"))
However, in this plot it is completely unclear if the Standard is a maximum value (as it would be for example Chloride) or a minimum value (as it would be for Oxygen). So I would like to make this clear by adding small pointers/arrows Up or Down. The best way would be to customize the linetype so that it consists of these arrows, but I couldn't find a way.
Q1: Is this at all possible, defining custom linetypes?
All I could think of was adding extra points below the line:
datum2 <- seq.Date(as.Date("2014-01-01"), as.Date("2014-12-31"), by = "week")
extrapoints <- data.frame(datum2 = datum2, y2 = 68)
plot1 + geom_point(data=extrapoints, aes(x=datum2,y=y2),
shape=">",size=5,colour="red",rotate=90)
However, I can't seem to rotate these symbols pointing downward. Furthermore, this requires calculating the right spacing of X and distance to the line (Y) every time, which is rather inconvenient.
Q2: Is there any way to achieve this, preferably as automated as possible?
I'm not sure what is requested, but it sounds as though you want arrows that point up or down depending on whether the y-value is greater or less than some expected value. If that's the case, then this does it using geom_segment:
library(grid) # for unit(), as noted by ?geom_segment
(plot1 <-
ggplot(df, aes(x=datum,y=y)) + geom_line()+
geom_segment(data = data.frame(datum = df$datum, y = 70, up = df$y > 70),
aes(xend = datum, yend = 70 + c(-1,1)[1+up]*5), # select up/down based on 'up'
arrow = arrow(length = unit(0.1,"cm"))
) + # adjust units to change the size of the arrowheads
geom_point() +
theme_classic()+
geom_hline(aes(yintercept=70),colour="red"))
If I'm wrong about what was desired and you only wanted a bunch of down arrows, then just take out the stuff about creating and using "up" and use a minus-sign.
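In case the indexing in yend looks cryptic: a logical vector is coerced to 0/1, so 1 + up selects the first or second element of c(-1, 1). A tiny base R illustration with made-up values (threshold and the y values here are only for demonstration):

```r
threshold <- 70
y <- c(50, 70, 90)

# TRUE where the observation lies above the standard.
up <- y > threshold

# 1 + up is 1 (down, -1) or 2 (up, +1); arrows are 5 units long.
yend <- threshold + c(-1, 1)[1 + up] * 5
print(yend)
```

For a plot with only down arrows, the indexing can be replaced by a constant, for example yend = 70 - 5.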

Get rid of second legend in ggplot2

I've got some problems with ggplot2 again.
I want to plot at least two datasets with two different colours and two different shapes.
This works, but when I try to set the names for the legend, the legend is duplicated automatically.
The number of datasets can change, and so can the legend names.
I need code that works for more than just this example:
library(ggplot2)
library(reshape2) # provides melt()
xdata=1:5
ydata=c(3.45,4.67,7.8,8.98,10)
ydata2=c(12.4,13.5,14.6,15.8,16)
p <-data.frame(matrix(NA,nrow=5,ncol=3))
p$X1 <- xdata
p$X2 <- ydata
p$X3 <- ydata2
shps <-c(1,2)
colp <-c("navy","red3")
p <- melt(p,id="X1")
px <-ggplot(p,aes(X1,value))
legendnames <- c("name1","name2")
px <- px +aes(shape = factor(variable))+
geom_point(aes(colour =factor(variable)))+
theme_bw()+
scale_shape_manual(labels=legendnames,values =shps )+
scale_color_manual(values = colp)
px
This gives me a plot with two legends:
But I want a single legend that uses my legendnames.
I only get a single legend when I delete labels=legendnames from scale_shape_manual.
So how can I solve that problem?
Please help.
I think this is just a matter of providing the same labels parameter to the scale_color_manual, otherwise it doesn't know how to consolidate the legends together.
So
px <- px + aes(shape = factor(variable)) +
geom_point(aes(colour = factor(variable))) +
theme_bw()+
scale_shape_manual(labels=legendnames, values = shps)+
scale_color_manual(labels=legendnames, values = colp)
px
It's not really a problem; you programmed it in yourself by using legendnames (which it then adds, even though those names are not in your data). If you remove them, the plot behaves as you want:
shps <-c(X2=1,X3=2)
colp <-c(X2="navy",X3="red3")
# to make the code easy to rerun, don't overwrite variables
p2 <- melt(p,id="X1")
px <- ggplot(data=p2) + geom_point(aes(x=X1, y=value,shape=variable,colour=variable)) +
scale_shape_manual(values=shps)+
scale_color_manual(values=colp)
px
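Since the number of datasets can change, the named vectors can be built programmatically rather than typed out. A minimal base R sketch, assuming series holds the levels of the melted variable column and that the shape and colour pools (my own choices) are long enough:

```r
# Assumed inputs: the series present in the data and their display names.
series <- c("X2", "X3")
legendnames <- c("name1", "name2")

# Pools to draw from; extend them if more series are expected.
shape_pool <- c(1, 2, 5, 6)
colour_pool <- c("navy", "red3", "darkgreen", "orange")

# Named vectors tie each value to a series, independent of order.
shps <- setNames(shape_pool[seq_along(series)], series)
colp <- setNames(colour_pool[seq_along(series)], series)
print(shps)
print(colp)
```

These can be passed as values to scale_shape_manual() and scale_color_manual(); giving both scales the same labels = legendnames keeps the two legends merged.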

Using ggplot2: Create faceted scatterplot with scaled and moved density

I would like to plot some data as a scatter plot using facet_wrap, while superimposing some information such as a linear regression and the density.
I managed to do all that, but the density values are out of proportion with respect to my points, which is to be expected since these points are far away. Nevertheless, I'd like to scale and move my density curve so that it is clearly visible; I don't care about its real values, only about its shape.
Here is an exaggerated minimum working example of what I have:
library(ggplot2)
library(reshape2) # provides melt()
set.seed(48151623)
mydf <- data.frame(x1=rnorm(mean=5,n=100),x2=rnorm(n=100,mean=10),x3=rnorm(n=100,mean=20,sd=3))
mydf$var <- mydf$x1 + mydf$x2 * mydf$x3
mydf.wide <- melt(mydf,id.vars='var',measure.vars=c(1:3))
ggplot(data=mydf.wide,aes(x=value,y=var)) +
geom_point(colour='red') +
geom_smooth(method='lm') +
stat_density(aes(x=value,y=..scaled..),position='identity',geom='line') +
facet_wrap(~variable,scale='free_x')
Which results in:
What I would like resembles this ugly hack:
stat_density(aes(x=value,y=..scaled..*100+200),position='identity',geom='line')
Ideally, I would use y=..scaled..* diff(range(value)) + min(value) but when I do this I get an error saying that 'value' was not found. I suspect the problem is related to the faceting, but I would prefer to keep my facets.
How can I scale and move the density curve in this case?
I suggest making two plots and combining them with grid.arrange:
p1 <- ggplot(data=mydf.wide,aes(x=value,y=var)) +
geom_point(colour='red') +
geom_smooth(method='lm') +
facet_wrap(~variable,scale='free_x') +
theme(axis.title.x=element_blank(),
axis.text.x=element_blank(),
axis.ticks.x=element_blank(),
plot.margin = unit(c(1, 1, 0, 0.5), "lines"))
p2 <- ggplot(data=mydf.wide,aes(x=value,y=var)) +
stat_density(aes(x=value,y=..scaled..),position='identity',geom='line') +
facet_wrap(~variable,scale='free_x') +
theme(strip.background=element_blank(),
strip.text=element_blank(),
plot.margin = unit(c(-1, 1, 0.5, 0.35), "lines"))
library(gridExtra)
grid.arrange(p1, p2, heights = c(2,1))
I'm not sure if this completely answers your question, but it was too long to put in a comment, so... In response to your second chunk of code in your question, since you've already defined x=value, you can use x instead of value in your definition of y.
stat_density(aes(x=value,y=..scaled..*diff(range(x)) +
min(x)),position='identity',geom='line')
This seems to fix your error and produces the following plot:
The only problem is, of course, if you have data with low y-values, then you're still going to overlap your density curves with your scatterplot. But, if this isn't the case, I personally think this is a fairly informative figure, as long as you can communicate effectively that the y axis values aren't important in interpreting the density curves--only the shapes of the curves are important.
I appreciate everyone's answers, which helped me better understand ggplot's underlying mechanisms. I also realize how awkward my requirement is; ggplot is not going to solve my problem.
I managed to do what I wanted not by using ggplot's stat_density but by calculating my densities directly in another data frame:
set.seed(48151623)
mydf <- data.frame(x1=rnorm(mean=5,n=100),x2=rnorm(n=100,mean=10),x3=rnorm(n=100,mean=20,sd=3))
mydf$var <- mydf$x1 + mydf$x2 * mydf$x3
mydf.wide <- melt(mydf,id.vars='var',measure.vars=c(1:3))
mydf.densities <- do.call('rbind',lapply(unique(mydf.wide$variable), function(var) {
tmp <- mydf.wide[which(mydf.wide$variable==var),c('var','value')]
dfit <- density(tmp$value,cut=0)
scaledy <-dfit$y/max(dfit$y) * diff(range(tmp$var)) + min(tmp$var)
data.frame(x=dfit$x,y=scaledy,variable=rep(var,length(dfit$x)))
}))
ggplot(data=mydf.wide,aes(x=value,y=var)) +
geom_point(colour='red') +
geom_smooth(method='lm') +
geom_line(aes(x=x,y=y),data=mydf.densities) +
facet_wrap(~variable,scale='free_x')
(I know that the construction of mydf.densities is a bit obfuscated, but I will work on that later).
I'm giving out the bounty to the most voted solution at the end of the day, for your troubles.
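For anyone reading along, the rescaling line (scaledy <- dfit$y/max(dfit$y) * diff(range(tmp$var)) + min(tmp$var)) stretches the density so that its peak reaches the largest y value in the facet and its baseline sits at the smallest. A stand-alone base R check of that idea, using made-up data in place of mydf.wide:

```r
set.seed(48151623)
value <- rnorm(100, mean = 5)          # stand-in for one facet's x values
var <- value * 20 + rnorm(100)         # stand-in for the matching y values

# Kernel density of the x values, restricted to the observed range.
dfit <- density(value, cut = 0)

# Rescale: peak maps to max(var), zero density maps to min(var).
scaled_y <- dfit$y / max(dfit$y) * diff(range(var)) + min(var)
print(range(scaled_y))
print(range(var))
```

The upper ends of the two ranges should agree, while the lower end of scaled_y can sit slightly above min(var) because the density does not drop all the way to zero within the data range.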

Can I change where the x-axis intersects the y-axis in ggplot2?

I'm plotting some index data as a bar chart. I'd like to emphasise the "above index" and "below index"-ness of the numbers by forcing the x-axis to cross at 100 (such that a value of 80 would appear as a -20 bar.)
This is part of a much longer process, so it's hard to share data usefully. Here, though, is some bodge-y code that illustrates the problem (and the beginnings of my solution):
df <- data.frame(label = c("a","b","c"), index = c(118,80,65))
my.plot <- ggplot(df,aes(label,index))
my.plot + geom_bar(stat="identity") # bar heights come from the data, not from counts
df$adjusted <- df$index - 100
my.plot2 <- ggplot(df,aes(label,adjusted))
my.plot2 + geom_bar(stat="identity")
I can, of course, change my index calculation to read: (value.new/value.old)*100-100 then title the chart appropriately (something like "xxx relative to index") but this seems clumsy.
So, too, does the approach I've been testing (to run the simple calculation above, then re-label the y-axis.) Is that really the best solution?
No doubt someone's going to tell me that this sort of axis manipulation is frowned upon. If this is the case, please could they point me in the direction of an explanation? At least then I'll have learned something.
This doesn't directly answer your question, but instead of messing about with the x-axis, why not make a single grid line a bit thicker? For example,
dd = data.frame(x = 1:10, y = runif(10))
g = ggplot(dd, aes(x, y)) + geom_point()
g + geom_hline(yintercept=0.2, colour="white", lwd=3)
Or as Paul suggested, with a black line and some text:
g + geom_hline(yintercept=0.2, colour="black", lwd=3) +
annotate("text", x = 2, y = 0.22, label = "Reference")
The coordinate system of your plot has the x-axis and the y-axis crossing at (0,0). This is just the way you define your coordinate system. You can of course draw a horizontal line at y = 100, but calling this the x-axis would be wrong.
What you already proposed is to redefine your coordinate system by transforming the data. Whether or not this transformation is appropriate is easier to answer with a reproducible example from your side.
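For what it's worth, the "transform, then re-label" approach mentioned in the question could look roughly like this. It is only a sketch: relabel is a helper name of my own, and geom_col() is used as a shorthand for bars whose heights come straight from the data.

```r
library(ggplot2)

df <- data.frame(label = c("a", "b", "c"), index = c(118, 80, 65))

# Shift the data so that an index of 100 sits at zero.
df$adjusted <- df$index - 100

# Hypothetical helper: turn shifted axis values back into index values.
relabel <- function(v) v + 100

p <- ggplot(df, aes(label, adjusted)) +
  geom_col() +
  scale_y_continuous(name = "index", labels = relabel)
print(p)
```

The bars are drawn from the shifted values, so 80 appears as a bar of height -20, while the axis still reads in the original index units.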
