Get rid of second legend in ggplot2 - r

I'm having some problems with ggplot2 again.
I want to plot at least two datasets with two different colors and two different shapes.
This works, but when I try to set the names for the legend, the legend is automatically doubled.
The number of datasets can change, and so can the legend names.
I need code that works for more than just this example:
library(ggplot2)
library(reshape2) # provides melt()
xdata=1:5
ydata=c(3.45,4.67,7.8,8.98,10)
ydata2=c(12.4,13.5,14.6,15.8,16)
p <-data.frame(matrix(NA,nrow=5,ncol=3))
p$X1 <- xdata
p$X2 <- ydata
p$X3 <- ydata2
shps <-c(1,2)
colp <-c("navy","red3")
p <- melt(p,id="X1")
px <-ggplot(p,aes(X1,value))
legendnames <- c("name1","name2")
px <- px +aes(shape = factor(variable))+
geom_point(aes(colour =factor(variable)))+
theme_bw()+
scale_shape_manual(labels=legendnames,values =shps )+
scale_color_manual(values = colp)
px
This gives me two legends.
But I want a single legend that uses my legendnames.
I only get a single legend if I delete labels=legendnames from scale_shape_manual.
So what is causing this, and how can I solve it?
Please help.

I think this is just a matter of providing the same labels parameter to scale_color_manual as well; otherwise ggplot2 doesn't know how to consolidate the two legends.
So:
px <- px + aes(shape = factor(variable)) +
geom_point(aes(colour = factor(variable))) +
theme_bw()+
scale_shape_manual(labels=legendnames, values = shps)+
scale_color_manual(labels=legendnames, values = colp)
px
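Since the number of datasets can change, one option is to derive the shapes, colours and labels from the factor levels, so that both scales always receive the same labels. This is a minimal sketch, assuming the data are already in long format; p_long, the generated label names, the legend title "Dataset" and the palette are illustrative and not from the original post:

```r
library(ggplot2)

# Hypothetical long-format data, equivalent to the melted data frame above
p_long <- data.frame(
  X1       = rep(1:5, times = 2),
  variable = factor(rep(c("X2", "X3"), each = 5)),
  value    = c(3.45, 4.67, 7.8, 8.98, 10, 12.4, 13.5, 14.6, 15.8, 16)
)

vars        <- levels(p_long$variable)           # one entry per dataset
legendnames <- paste0("name", seq_along(vars))   # illustrative labels
shps        <- setNames(seq_along(vars), vars)   # shapes 1, 2, ...
palette     <- c("navy", "red3", "darkgreen", "orange")  # assumed palette
colp        <- setNames(palette[seq_along(vars)], vars)

# Same name and same labels on both scales, so only one legend is drawn
px <- ggplot(p_long, aes(X1, value, shape = variable, colour = variable)) +
  geom_point() +
  theme_bw() +
  scale_shape_manual(name = "Dataset", labels = legendnames, values = shps) +
  scale_colour_manual(name = "Dataset", labels = legendnames, values = colp)
px
```

The assumed palette only covers four datasets; it would need extending for more.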

It's not really a problem: you programmed it in yourself by passing legendnames to only one of the scales (ggplot2 then adds those labels, even though they do not appear in your data). If you remove them, the plot behaves as you want:
shps <-c(X2=1,X3=2)
colp <-c(X2="navy",X3="red3")
#easy if you want to rerun code, don't overwrite variables
p2 <- melt(p,id="X1")
px <- ggplot(data=p2) + geom_point(aes(x=X1, y=value,shape=variable,colour=variable)) +
scale_shape_manual(values=shps)+
scale_color_manual(values=colp)
px

Related

increase distance between stack of geom_line()

I have some diffraction data from XRD. I'd like to plot it all in one chart, but stacked. Because the range of y is quite large, stacking is not so straightforward. There's a link to the data if you wish to play with it, and the simple script is below.
https://www.dropbox.com/s/b9kyubzncwxge9j/xrd.csv?dl=0
library(dplyr)
library(ggplot2)
library(reshape2) # provides melt()
#load it up
xrd <- read.csv("xrd.csv")
#melt it
xrd.m = melt(xrd, id.var="Degrees_2_Theta")
# Reorder so factor levels are grouped together
xrd.m$variable = factor(xrd.m$variable,
levels=sort(unique(as.character(xrd.m$variable))))
names(xrd.m)[names(xrd.m) == "variable"] <- "Sample"
names(xrd.m)[names(xrd.m) == "Degrees_2_Theta"] <- "angle"
#colours use for nearly everything
cbPalette <- c("#000000", "#E69F00", "#56B4E9", "#009E73", "#F0E442", "#0072B2", "#D55E00", "#CC79A7")
#plot
ggplot(xrd.m, aes(angle, value, colour=Sample, group=Sample)) +
geom_line(position = "stack") +
scale_colour_manual(values=cbPalette) +
theme_linedraw() +
theme(legend.position = "none",
axis.text.y=element_blank(),
axis.ticks.y=element_blank()) +
labs(x="Degrees 2-theta", y="Intensity - stacked for clarity")
Here is the plot. As you can see, it's not quite stacked.
Here is something I made in Excel a while back. Ugly, but slightly better.
I'm not sure that I will actually use the stacked plot function from R, because from past experience I find it always looks off; instead I might use the same data manipulation I used in Excel.
It seems that you have a different understanding of the result of applying position="stack" on your geom_line() than what actually is happening. What you're looking to do is probably best served by either using faceting or creating a ridgeline plot. I will give you solutions for both of those approaches here with some example data (sorry, I don't click dropbox links and they will eventually break anyway).
What does position="stack" actually do?
The result of position="stack" will be that your y values of each line will be added, or "stacked", together in the resulting plot. That means that the lines as drawn will only actually accurately reflect the actual value in the data for one of the lines, and the other will be "added on top" of that (stacked). The behavior is best illustrated via an example:
ex <- data.frame(x=c(1,1,2,2,3,3), y=c(1,5,1,2,1,1), grp=rep(c('A','B'),3))
ggplot(ex, aes(x,y, color=grp)) + geom_line()
The y values for "A" are equal to 1 at all values of x. This is the same as indicating position="identity". Now, let's see what happens if we use position="stack":
ggplot(ex, aes(x,y, color=grp)) + geom_line(position="stack")
You should see that the value of y plotted for "B" is equal to the value of "B" in the data, whereas the y value plotted for "A" is actually the value for "A" added to the value for "B". Hope that makes sense.
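The arithmetic behind that stacked plot can be checked by hand in base R. This is only a sketch of the sums, assuming the stacking order described above ("A" drawn on top of "B"); drawn_a and drawn_b are illustrative names:

```r
ex <- data.frame(x = c(1, 1, 2, 2, 3, 3),
                 y = c(1, 5, 1, 2, 1, 1),
                 grp = rep(c("A", "B"), 3))

a <- ex$y[ex$grp == "A"]   # raw values for A: 1 1 1
b <- ex$y[ex$grp == "B"]   # raw values for B: 5 2 1

drawn_b <- b               # B is drawn at its own values
drawn_a <- a + b           # A is drawn on top of B: 6 3 2

data.frame(x = 1:3, drawn_a, drawn_b)
```

So the line for "A" no longer sits at 1; it follows the sum of both groups, which is why stacking distorts the individual traces.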
Faceting
What you're trying to do is take the overlapping lines you have and "separate" them vertically, right? That's not quite stacking, as you likely want to maintain their y values as position="identity" (the default). One way to do that quite easily is to use faceting, which creates what you could call "stacked plots" according to one or two variables in your dataset. In this case, I'm using example data (for reasons outlined above), but you can use this to understand how you want to arrange your own data.
set.seed(1919191)
df <- data.frame(
x=rep(1:100, 5),
y=c(rnorm(100,0,0.1), rnorm(100,0,0.2), rnorm(100,0,0.3), rnorm(100,0,0.4), rnorm(100,0,0.5)),
sample_name=c(rep('A',100), rep('B',100), rep('C',100), rep('D',100), rep('E',100)))
# plot code
p <- ggplot(df, aes(x,y, color=sample_name))
p + geom_line() + facet_grid(sample_name ~ .)
Create a Ridgeline Plot
The other way that kind of does the same thing is to create what is known as a ridgeline plot. You can do this via the package ggridges and here's an example using geom_ridgeline():
library(ggridges) # provides geom_ridgeline()
p + geom_ridgeline(
aes(y=sample_name, height=y),
fill=NA, scale=1, min_height=-Inf)
The idea here is to understand that geom_ridgeline() changes your y axis to be the grouping variable (so we actually have to redefine that in aes()), and the actual y value for each of those groups should be assigned to the height= aesthetic. If you have data that has negative y values (now height= values), you'll also want to set the min_height=, or it will cut them off at 0 by default. You can also change how much each of the groups are separated by playing with scale= (does not always change in the way you think it would, btw).
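A third option, closer to the Excel-style manipulation mentioned in the question, is to shift each trace by a fixed amount before plotting. This is not part of the answer above, just a minimal sketch on the same kind of example data; the spacing value and the y_shifted column are assumptions to be tuned for real data:

```r
library(ggplot2)

set.seed(1919191)
df <- data.frame(
  x = rep(1:100, 5),
  y = rnorm(500, 0, rep(c(0.1, 0.2, 0.3, 0.4, 0.5), each = 100)),
  sample_name = rep(c("A", "B", "C", "D", "E"), each = 100)
)

spacing <- 3  # assumed vertical gap between traces, in data units

# Sample A stays in place, B is moved up by 1 * spacing, C by 2 * spacing, ...
df$y_shifted <- df$y + (as.integer(factor(df$sample_name)) - 1) * spacing

p_offset <- ggplot(df, aes(x, y_shifted, colour = sample_name)) +
  geom_line() +
  labs(y = "Intensity (offset for clarity)")
p_offset
```

Because the offset is constant within each sample, the shape of every trace is preserved, but the y axis no longer shows true intensities.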

How to alter distances between plots in a 4 X 4 graph panel?

I am trying to create a graph panel with 8 graphs in total (4 x 4). Each graph corresponds to a different gene and contains three lines (one for control, one for UC disease and one for Crohn's), representing the average change in expression between a first measurement and a second.
The code I am using to run each of the plots is;
s <- ggplot(X876, aes(x=Timepoint, y=value, group=Group)) +
geom_line(aes(color=Group), size=1)+
geom_point(aes(color=Group), size=2.5) +
labs(y="X876") + ylim(0.35, 0.55) +
theme_classic() +
scale_color_manual(values=c("darkmagenta", "deepskyblue4", "dimgrey"))
Using grid.arrange(l, m, n, o, p, q, r, s, nrow=4, ncol=4) creates a graph panel where the y-axis names overlap.
I have seen on here about changing the plot margins via,
pl = replicate(3, ggplot(), FALSE)
grid.arrange(grobs = pl)
margin = theme(plot.margin = unit(c(2,2,2,2), "cm"))
grid.arrange(grobs = lapply(pl, "+", margin))
However, I am unsure how this can be applied to increase the vertical height between the plots on the top and bottom rows. For each of the graphs l, m, n, o, p, q, r, s do I need to include
+ theme(plot.margin=unit(c(t,r,b,l),"cm"))
and then run the grid.arrange(l, m, n, o, p, q, r, s, nrow=4, ncol=4)
Please could somebody suggest which values do I need to include for top (t), right(r), bottom (b), left(l) to only increase the distance (by about 3cms) between the top and bottom row? I am trying different values and I'm not getting a decent graph panel yet.
Thank-you
Probably the easiest way is to create your own theme based on the theme_classic theme and then modify the plotting margins (and anything else) the way that you prefer.
theme_new <- theme_classic() +
theme(plot.margin=unit(c(1,0,1,0), "cm")) # t,r,b,l
Then set the theme (will revert back to the default on starting a new R session).
theme_set(theme_new)
The alternative is to use grid.arrange and modify the margins using the grobs as you've already mentioned.
Once the panels have been arranged, you can then modify the top and bottom margins (or left and right) by specifying the vp argument of grid.arrange, which allows you to modify the viewport of multiple grobs on a single page. You can specify the height and width using the viewport function from the grid package.
For example, if you have a list of ggplot objects called g.list containing your individual plots (l,m,n,o,p,q,r,s), then the following would reduce the height of the viewport to 90% of the page, which effectively increases the top and bottom margins by 5% each.
library(grid)
library(gridExtra)
grid.arrange(grobs = g.list, vp=viewport(height=0.9))
Without your data, I can't test it, especially to see if the y-axes labels overlap. And I don't know why you think increasing the top and bottom margins can solve that problem since the y-axes are, by default, on the left-hand side of the graph.
Anyway, I'll use the txhousing dataset from the ggplot2 package to see if I can reproduce your problem.
library(ggplot2)
data(txhousing)
theme_new <- theme_classic() +
theme(plot.margin=unit(c(0.1,0.1,0.1,0.1), "cm"), text=element_text(size=8))
theme_set(theme_new)
tx.list <- split(txhousing, txhousing$year)
g.list <- lapply(tx.list, function(data)
{
ggplot(data, aes(x=listings, y=sales)) +
geom_point(size=0.5)
} )
grid.arrange(grobs = g.list, vp=viewport(height=0.9))
I don't see any overlapping. And I don't see why increasing the top and bottom margins would make much difference.
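If the goal really is extra space only between the top and bottom rows, one way to sketch it is to enlarge the bottom margin of the top-row plots alone. The example below uses empty placeholder plots and assumes a 2 x 4 layout for the 8 panels; the list name plots, the 3 cm gap and the other margin values are illustrative:

```r
library(ggplot2)
library(gridExtra)

# Eight empty placeholder plots standing in for l, m, n, o, p, q, r, s
plots <- replicate(8, ggplot(), simplify = FALSE)

# Assumed layout: plots 1 to 4 form the top row, 5 to 8 the bottom row
top_row <- 1:4

# margin() takes top, right, bottom, left; only the bottom value is enlarged
extra_gap <- theme(plot.margin = margin(t = 0.2, r = 0.2, b = 3, l = 0.2,
                                        unit = "cm"))

plots[top_row] <- lapply(plots[top_row], function(g) g + extra_gap)

grid.arrange(grobs = plots, nrow = 2, ncol = 4)
```

Note that the larger margin takes space away from the top-row panels themselves, so the device height may need to be increased to compensate.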
The question was asked a couple of years ago, but I bumped into it only now and thought that I might share a quick and dirty tip for this, which works well enough in many cases.
In some situations the theme is already so complex that this trick might be the easiest way: adding a few \n's (newlines) to the x and y axis names, as this will affect the distances between the plots in the panel. I've learned this trick for a slightly different purpose from here (originally from here).
I'll use the same logic for the example dataset (in this case: Orange from R built-in data sets) as in the excellent code by the previous answerer.
library(ggplot2)
library(gridExtra)
or.list <- split(Orange, Orange$Tree)
g.list <- lapply(or.list, function(data)
{
ggplot(data, aes(x=age, y=circumference)) +
theme_classic() +
geom_point(size=0.5) +
scale_x_continuous(name = "Age\n\n") +
scale_y_continuous(name = "\n\n\nCircumference")
} )
grid.arrange(grobs = g.list)

compare boxplots with a single value

I want to compare the distribution of several variables (here X1 and X2) with a single value (here bm). The issue is that these variables are too many (about a dozen) to use a single boxplot.
Additionally, the levels are too different to use one plot. I need to use facets to make things more organised:
However with this plot my benchmark category (bm), which is a single value in X1 and X2, does not appear in X1 and seems to have several values in X2. I want it to be only this green line, which it is in the first plot. Any ideas why it changes? Is there any good workaround? I tried the options of facet_wrap/facet_grid, but nothing there delivered the right result.
I also tried combining a bar plot with bm and three empty categories with the boxplot. But firstly it looked terrible and secondly it got similarly screwed up in the facetting. Basically any work around would help.
Below the code to create the minimal example displayed here:
# Creating some sample data & loading libraries
library(ggplot2)
library(RColorBrewer)
set.seed(10111)
x=matrix(rnorm(40),20,2)
y=rep(c(-1,1),c(10,10))
x[y==1,]=x[y==1,]+1
x[,2]=x[,2]+20
df=data.frame(x,y)
# creating a benchmark point
benchmark=data.frame(y=rep("bm",2),key=c("X1","X2"),value=c(-0.216936,20.526312))
# melting the data frame, rbinding it with the benchmark
test_dat=rbind(tidyr::gather(df,key,value,-y),benchmark)
# Creating a plot
p_box <- ggplot(data = test_dat, aes(x=key, y=value,color=as.factor(test_dat$y))) +
geom_boxplot() + scale_color_manual(name="Cluster",values=brewer.pal(8,"Set1"))
# The first line delivers the first plot, the second line the second plot
p_box
p_box + facet_wrap(~key,scales = "free",drop = FALSE) + theme(legend.position = "bottom")
The problem lies only in the use of test_dat$y inside the color aes. Never use $ in aes; ggplot will mess up.
Anyway, I think your plot would improve if you used a geom_hline for the benchmark instead of hacking in a single-value boxplot:
library(ggplot2)
library(RColorBrewer)
ggplot(tidyr::gather(df,key,value,-y)) +
geom_boxplot(aes(x=key, y=value, color=as.factor(y))) +
geom_hline(data = benchmark, aes(yintercept = value), color = '#4DAF4A', size = 1) +
scale_color_manual(name="Cluster",values=brewer.pal(8,"Set1")) +
facet_wrap(~key,scales = "free",drop = FALSE) +
theme(legend.position = "bottom")

R geom_line not plotting as expected

I am using the following code to plot a stacked area graph and I get the expected plot.
P <- ggplot(DATA2, aes(x=bucket,y=volume, group=model, fill=model,label=volume)) + #ggplot initial parameters
geom_ribbon(position='fill', aes(ymin=0, ymax=1))
But then, when I add lines that read the same data source, I get misaligned results towards the right side of the graph:
P + geom_line(position='fill', aes(group=model, ymax=1))
Does anyone know why this may be? Both plots read the same data source, so I can't figure out what the problem is.
Actually, if all you wanted to do was draw an outline around the areas, then you could do the same using the colour aesthetic.
ggplot(DATA2, aes(x=bucket,y=volume, group=model, fill=model,label=volume)) +
geom_ribbon(position='fill', aes(ymin=0, ymax=1), colour = "black")
I have an answer that I hope works for you. It looks good, but very different from your original graph:
library(ggplot2)
DATA2 <- read.csv("C:/Users/corcoranbarriosd/Downloads/porsche model volumes.csv", header = TRUE, stringsAsFactors = FALSE)
In my experience you want X to be a numeric variable, and you have it as a string. If that is not the case I can change this, but the following will transform your bucket into a numeric vector:
bucket.list <- strsplit(unlist(DATA2$bucket), "[^0-9]+")
x=numeric()
for (i in 1:length(bucket.list)) {
x[i] <- bucket.list[[i]][2]
}
DATA2$bucket <- as.numeric(x)
P <- ggplot(DATA2, aes(x=bucket,y=volume, group=model, fill=model,label=volume)) +
geom_ribbon(aes(ymin=0, ymax=volume))+ geom_line(aes(group=model, ymax=volume))
It gives me the area and the line tracking each other, hope that's what you needed
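The bucket-parsing loop in that answer can also be written without an explicit loop. The real bucket labels are not shown in the thread, so the strings below are hypothetical; the sketch only illustrates how splitting on non-digits and taking the second piece yields the first number of each label:

```r
# Hypothetical bucket labels; the real ones come from the CSV in the answer
buckets <- c("(0,10]", "(10,20]", "(20,30]")

# Splitting on runs of non-digits leaves "" first, then the numbers
parts <- strsplit(buckets, "[^0-9]+")

# Take the second piece of every split (the first number) without a loop
lower <- as.numeric(vapply(parts, function(p) p[2], character(1)))
lower
```

If a label starts directly with a digit, the first number ends up in position 1 instead of 2, so the index would need adjusting.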
If you switch to using geom_path in place of geom_line, it all seems to work as expected. I don't think the ordering of geom_line is behaving the same as geom_ribbon (and suspect that geom_line -- like geom_area -- assumes a zero base y value)
ggplot(DATA2, aes(x=bucket, y=volume, ymin=0, ymax=1,
group=model, fill=model, label=volume)) +
geom_ribbon(position='fill') +
geom_path(position='fill')
Should give you

Customize linetype in ggplot2 OR add automatic arrows/symbols below a line

I would like to use customized linetypes in ggplot. If that is impossible (which I believe to be true), then I am looking for a smart hack to plot arrowlike symbols above, or below, my line.
Some background:
I want to plot some water quality data and compare it to the standard (set by the European Water Framework Directive) in a red line. Here's some reproducible data and my plot:
df <- data.frame(datum <- seq.Date(as.Date("2014-01-01"),
as.Date("2014-12-31"),by = "week"),y=rnorm(53,mean=100,sd=40))
(plot1 <-
ggplot(df, aes(x=datum,y=y)) +
geom_line() +
geom_point() +
theme_classic()+
geom_hline(aes(yintercept=70),colour="red"))
However, in this plot it is completely unclear if the Standard is a maximum value (as it would be for example Chloride) or a minimum value (as it would be for Oxygen). So I would like to make this clear by adding small pointers/arrows Up or Down. The best way would be to customize the linetype so that it consists of these arrows, but I couldn't find a way.
Q1: Is this at all possible, defining custom linetypes?
All I could think of was adding extra points below the line:
extrapoints <- data.frame(datum2 <- seq.Date(as.Date("2014-01-01"),
as.Date("2014-12-31"),by = "week"),y2=68)
plot1 + geom_point(data=extrapoints, aes(x=datum2,y=y2),
shape=">",size=5,colour="red",rotate=90)
However, I can't seem to rotate these symbols pointing downward. Furthermore, this requires calculating the right spacing of X and distance to the line (Y) every time, which is rather inconvenient.
Q2: Is there any way to achieve this, preferably as automated as possible?
I'm not sure what is requested, but it sounds as though you want arrows that point up or down depending on whether the y-value is greater or less than some expected value. If that's the case, then this does it using geom_segment:
require(grid) # as noted by ?geom_segment
(plot1 <-
ggplot(df, aes(x=datum,y=y)) + geom_line()+
geom_segment(data = data.frame(datum = df$datum, y = 70, up = df$y > 70),
aes(xend = datum , yend =70 + c(-1,1)[1+up]*5), #select up/down based on 'up'
arrow = arrow(length = unit(0.1,"cm"))
) + # adjust units to modify size or arrow-heads
geom_point() +
theme_classic()+
geom_hline(aes(yintercept=70),colour="red"))
If I'm wrong about what was desired and you only wanted a bunch of down arrows, then just take out the stuff about creating and using "up" and use a minus-sign.
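For the "only down arrows" variant, a self-contained sketch might look like the following. It rebuilds the example data with an explicitly named datum column; the names standard, arrow_len and arrows_df, and the arrow length of 5 data units, are assumptions for illustration:

```r
library(ggplot2)
library(grid)  # arrow() and unit()

set.seed(1)
df <- data.frame(
  datum = seq.Date(as.Date("2014-01-01"), as.Date("2014-12-31"), by = "week")
)
df$y <- rnorm(nrow(df), mean = 100, sd = 40)

standard  <- 70  # the red reference line
arrow_len <- 5   # assumed arrow length in y units

# One short segment per date, starting on the line and ending below it
arrows_df <- data.frame(datum = df$datum,
                        y     = standard,
                        yend  = standard - arrow_len)

plot_down <- ggplot(df, aes(x = datum, y = y)) +
  geom_line() +
  geom_point() +
  geom_segment(data = arrows_df,
               aes(xend = datum, yend = yend),
               colour = "red",
               arrow = arrow(length = unit(0.1, "cm"))) +
  geom_hline(yintercept = standard, colour = "red") +
  theme_classic()
plot_down
```

Using standard + arrow_len for yend would flip the arrows upward, and thinning arrows_df (for example, every fourth row) reduces clutter.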
