I am trying to add another set of points on top of a geom_point. The problem is that the initial dataset has a factor with 3 levels while the second doesn't. I want the first set of points to have different colors and shapes according to the factor levels and the second to be uniform. The plot is like this:
plot = ggplot() +
geom_point(data1,
aes(x = x1, y = y1,
color = factor, shape = factor)) +
scale_color_manual(values = factor_color) +
scale_shape_manual(values = factor_shape)
When I add the other set of points,
plot +
geom_point(data2,
aes(x = x2, y = y2))
I get this error
Error: Insufficient values in manual scale. 4 needed but only 3 provided.
I understand why this happens.
But when I set the values inside the second geom_point (color = "red" and shape = 1), I get this error:
Error: Continuous value supplied to discrete scale
Is there a solution to this problem?
EDIT
Example data have this structure
data1 = data.frame(factor = factor(rep(letters[1:3], 3)),
x1 = rnorm(9),
y1 = rnorm(9))
data2 = data.frame(x2 = rnorm(6),
y2 = rnorm(6))
factor_color = scales::hue_pal()(3)
factor_shape = c(19, 15, 17)
In my example, I have no error.
I just guessed your data:
data1 <- data.frame(factor = factor(letters[1:4]),
x1 = rnorm(4),
y1 = rnorm(4))
data2 <- data.frame(factor = factor(letters[1:3]),
x2 = rnorm(3),
y2 = rnorm(3))
factor_color <- scales::hue_pal()(4)
factor_shape <- 1:4
Your code with two small changes.
I specified data = in both geom_point
library(ggplot2)
plot <- ggplot() +
geom_point(data = data1,
aes(x = x1, y = y1,
color = factor, shape = factor)) +
scale_color_manual(values = factor_color) +
scale_shape_manual(values = factor_shape)
plot +
geom_point(data = data2,
aes(x = x2, y = y2))
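For the uniform second layer the question asks about, a minimal sketch (using the example data from the question's EDIT, with set.seed() added only for reproducibility) is to pass color and shape as constants outside aes(), so they are never sent through the manual scales:

```r
library(ggplot2)

set.seed(1)  # assumption: added only so the example is reproducible
data1 <- data.frame(factor = factor(rep(letters[1:3], 3)),
                    x1 = rnorm(9),
                    y1 = rnorm(9))
data2 <- data.frame(x2 = rnorm(6),
                    y2 = rnorm(6))
factor_color <- scales::hue_pal()(3)
factor_shape <- c(19, 15, 17)

p <- ggplot() +
  # first layer: colour and shape are mapped to the factor inside aes()
  geom_point(data = data1,
             aes(x = x1, y = y1, color = factor, shape = factor)) +
  scale_color_manual(values = factor_color) +
  scale_shape_manual(values = factor_shape) +
  # second layer: constants go outside aes(), so no scale is involved
  geom_point(data = data2,
             aes(x = x2, y = y2),
             color = "red", shape = 1)
p
```

Putting color = "red" or shape = 1 inside aes() would treat them as data to be scaled, which is a likely source of the errors above.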
I would like to plot multiple lines in a single ggplot, where each line would represent relationship between x and y given two or more parameters.
I know how to do that for one parameter:
Take following example data:
library(ggplot2)
library(reshape2)
rs = data.frame(seq(200, 1000, by=200),
runif(5),
runif(5),
rbinom(n = 5, size = 1, prob = 0.5))
names(rs) = c("x_", "var1", "var2", "par")
melted = melt(rs, id.vars="x_")
ggplot(data = melted,
aes(x = x_, y = value, group = variable, col = variable)) +
geom_point() +
geom_line(linetype = "dashed")
This plots three lines one for var1, one for var2 and one for par.
However, I would like four lines: one for var1 given par=0 and another one for var1 given par=1, and the same then again for var2.
How would this scale up, for example if I want that the condition is a combination of multiple parameters (e.g. par2 + par)?
If you melt the data in a different way, you can use par to change the shape and linetype of your lines, so it's nice and clear which line is which:
rs_melt = melt(rs, id.vars = c("x_", "par"))
ggplot(rs_melt, aes(x = x_, y = value, colour = variable,
shape = factor(par), linetype = factor(par))) +
geom_line(size = 1.1) +
geom_point(size = 3) +
labs(shape = "par", linetype = "par")
You need to adjust your melt function and add a group column which has both par and var details. I think below is what you want?
library(reshape)
library(ggplot2)
rs = data.frame(seq(200, 1000, by=200), runif(5), runif(5), rbinom(n = 5, size = 1, prob = 0.5))
names(rs)=c("x_", "var1", "var2", "par")
melted = melt(rs, id.vars=c("x_", "par"))
melted$group <- paste(melted$par, melted$variable)
ggplot(data=melted, aes(x=x_, y=value, group =group, col=group))+ geom_point() + geom_line(linetype = "dashed")
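To scale this up to a combination of several parameters, one possible sketch is to build the grouping column with interaction() instead of paste(). The column par2 below is hypothetical (the question mentions it but gives no data for it), and set.seed() is only there for reproducibility:

```r
library(ggplot2)
library(reshape2)

set.seed(1)  # assumption: added only so the example is reproducible
rs <- data.frame(x_   = seq(200, 1000, by = 200),
                 var1 = runif(5),
                 var2 = runif(5),
                 par  = rbinom(n = 5, size = 1, prob = 0.5),
                 par2 = rbinom(n = 5, size = 1, prob = 0.5))  # hypothetical second parameter

# keep every parameter as an id variable so it survives the melt
melted <- melt(rs, id.vars = c("x_", "par", "par2"))

# one group per observed combination of variable, par and par2
melted$group <- interaction(melted$variable, melted$par, melted$par2,
                            sep = " / ", drop = TRUE)

p <- ggplot(melted, aes(x = x_, y = value, group = group, colour = group)) +
  geom_point() +
  geom_line(linetype = "dashed")
p
```

With only five rows, some combinations may contain a single point, in which case no line can be drawn for that group.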
I apologize if this is more for SO instead of CV.
I am attempting to include a second boxplot into an existing boxplot that is ordered by the mean of the values plotted. When I include the boxplot from the second data.frame (representing a control sample to the other plots), the original plot loses its ordering.
Below is an example:
x1 <- data.frame("V1" = runif(100, 0, 100), "siteno" = "X1") #mean = 50.3
x2 <- data.frame("V1" = runif(100, 200, 450), "siteno" = "X2") #mean = 322.4
x3 <- data.frame("V1" = runif(100, 50, 150), "siteno" = "X3") #mean = 97.8
xData <- rbind(x1,x2,x3)
xData$siteno <- with(xData, reorder(siteno, V1, mean))
zData <- data.frame("V1" = runif(300, 0, 450), "siteno" = "Z1") #mean = 224.2
#orders xData correctly
ggplot(xData, aes(x = siteno , y = V1)) +
stat_summary(fun.y=mean, colour="red", geom="point") +
geom_boxplot (aes(fill=siteno), alpha=.5, width=1, position = position_dodge(width = 1), outlier.colour = "dark gray", outlier.size = 1)
this produces the below plot with x variables correctly ordered by mean:
If I try the code below to add the control data, the order of the x variables is lost:
x1 <- data.frame("V1" = runif(100, 0, 100), "siteno" = "X1") #mean = 50.3
x2 <- data.frame("V1" = runif(100, 200, 450), "siteno" = "X2") #mean = 322.4
x3 <- data.frame("V1" = runif(100, 50, 150), "siteno" = "X3") #mean = 97.8
xData <- rbind(x1,x2,x3)
xData$siteno <- with(xData, reorder(siteno, V1, mean))
zData <- data.frame("V1" = runif(300, 0, 450), "siteno" = "Z1") #mean = 224.2
#orders xData correctly
ggplot(xData, aes(x = siteno , y = V1)) +
stat_summary(fun.y=mean, colour="red", geom="point") +
geom_boxplot (aes(fill=siteno), alpha=.5, width=1, position = position_dodge(width = 1), outlier.colour = "dark gray", outlier.size = 1) +
geom_boxplot(data=zData, aes(x = siteno , y = V1))
this produces the following plot with no ordering of the x variables:
The point of my graph is to show the test values ordered by their mean and then have the control values boxplot off to the right for visual reference. I imagine there could be a solution that combines the xData and zData dataframes; I am willing to try that if there are some suggestions.
Thank you for your time.
When you combine data from two data frames in one plot, the original levels (and their order) are lost, and new levels that combine the data from both data frames are used. You don't see this behavior for the fill values because you don't provide a fill aesthetic for the second data frame. But for the discrete x scale both data frames are combined, and the new levels are X1, X2, X3 and Z1.
Without building one data frame from all the values, you can use scale_x_discrete() and, in its limits= argument, use levels() to get the original order of the siteno levels and combine it with Z1 as the reference level.
ggplot(xData, aes(x = siteno , y = V1)) +
stat_summary(fun.y=mean, colour="red", geom="point") +
geom_boxplot (aes(fill=siteno), alpha=.5, outlier.colour = "dark gray",
outlier.size = 1) +
geom_boxplot(data=zData, aes(x = siteno , y = V1))+
scale_x_discrete(limits=c(levels(xData$siteno),"Z1"))
Why not put them all in one data.frame and order all 4 levels in that?
data2 <- rbind(xData, zData)
ggplot(data2, aes(x = siteno , y = V1)) +
stat_summary(fun.y=mean, colour="red", geom="point") +
geom_boxplot (aes(fill=siteno), alpha=.5, width=1,
position = position_dodge(width = 1),
outlier.colour = "dark gray", outlier.size = 1)
Capture the desired order first, e.g. something like:
ord <- names(sort(tapply(xData$V1, xData$siteno, mean)))
Then pass it to scale_x_discrete(), for example as limits = c(ord, "Z1").
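A minimal sketch of that idea, reusing the xData/zData layout from the question. The names site_means and ord are just illustrative, and set.seed() is added only for reproducibility:

```r
library(ggplot2)

set.seed(1)  # assumption: added only so the example is reproducible
xData <- rbind(data.frame(V1 = runif(100, 0, 100),   siteno = "X1"),
               data.frame(V1 = runif(100, 200, 450), siteno = "X2"),
               data.frame(V1 = runif(100, 50, 150),  siteno = "X3"))
zData <- data.frame(V1 = runif(300, 0, 450), siteno = "Z1")

# mean of V1 per site, then the site names sorted by that mean
site_means <- tapply(xData$V1, xData$siteno, mean)
ord <- names(site_means)[order(site_means)]

p <- ggplot(xData, aes(x = siteno, y = V1)) +
  geom_boxplot(aes(fill = siteno), alpha = 0.5) +
  geom_boxplot(data = zData) +                  # control sample, inherits x and y
  scale_x_discrete(limits = c(ord, "Z1"))       # test sites by mean, control last
p
```

Because the order lives in the scale rather than in the factor levels, it does not matter that the second layer brings its own level.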
goal: a legend which contains two levels of a factor, even if both levels are not represented on the figure
minimum reproducible example:
library(ggplot2)
library(plyr)
mre <- data.frame(plotfactor = factor(rep(c("response1", "response2"), c(2,
2))), linefactor = factor(rep(c("line1", "line2"), 2)), x1 = runif(n = 4),
x2 = runif(n = 4), y1 = runif(n = 4), y2 = runif(n = 4), ltype = c("foo",
"foo", "foo", "bar"))
## this looks great!
ggplot(mre, aes(x = x1, xend = x2, y = y1, yend = y2, colour = linefactor,
linetype = ltype)) + geom_segment() + facet_wrap(~plotfactor)
the problem
However, if I use a plyr function to print each plot to a separate page of a pdf, the legend for ltype only contains the single factor level that appears in the subset of the dataframe that d_ply passes to ggplot -- i.e., it only says "foo". I want it to say both "foo" and "bar", even if the level "bar" of the factor ltype does not occur in the graph:
pdf("test.pdf")
d_ply(mre, .(plotfactor), function(DF) {
g <- ggplot(DF, aes(x = x1, xend = x2, y = y1, yend = y2, colour = linefactor,
linetype = ltype)) + geom_segment()
print(g)
})
dev.off()
(forgive me: I have no idea how to produce this effect in a png)
You'll need to use scale_linetype_discrete to make sure both are on the legend.
mre$ltype <- factor(mre$ltype)
plot1 <- ggplot(subset(mre, plotfactor == "response1"),
aes(x = x1, xend = x2, y = y1, yend = y2, colour = linefactor,
linetype = ltype)) +
geom_segment() +
scale_linetype_discrete(drop = FALSE)
ggsave("plot1.pdf", plot = plot1)
plot2 <- ggplot(subset(mre, plotfactor == "response2"),
aes(x = x1, xend = x2, y = y1, yend = y2, colour = linefactor,
linetype = ltype)) +
geom_segment() +
scale_linetype_discrete(drop = FALSE)
ggsave("plot2.pdf", plot = plot2)
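If you still want one multi-page pdf as in the question, the same drop = FALSE idea should also work inside a loop over the subsets. This is only a sketch: it uses base split()/lapply() instead of d_ply, writes to a temporary file (out_file is an illustrative name), and relies on ltype being a factor so the unused level is kept in each subset:

```r
library(ggplot2)

set.seed(1)  # assumption: added only so the example is reproducible
mre <- data.frame(plotfactor = factor(rep(c("response1", "response2"), c(2, 2))),
                  linefactor = factor(rep(c("line1", "line2"), 2)),
                  x1 = runif(4), x2 = runif(4),
                  y1 = runif(4), y2 = runif(4),
                  ltype = c("foo", "foo", "foo", "bar"))
mre$ltype <- factor(mre$ltype)  # must be a factor so unused levels survive subsetting

# one plot per level of plotfactor, each keeping both linetype levels in the legend
plots <- lapply(split(mre, mre$plotfactor), function(DF) {
  ggplot(DF, aes(x = x1, xend = x2, y = y1, yend = y2,
                 colour = linefactor, linetype = ltype)) +
    geom_segment() +
    scale_linetype_discrete(drop = FALSE)
})

out_file <- tempfile(fileext = ".pdf")  # illustrative output path
pdf(out_file)
for (g in plots) print(g)               # one page per plot
dev.off()
```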
I have a plot from the following script.
require(ggplot2)
df.shape <- data.frame(
AX = runif(10),
AY = runif(10),
BX = runif(10, 2, 3),
BY = runif(10, 2, 3)
)
p <- ggplot(df.shape)
p <- p + geom_point(aes(x = AX, y = AY, shape = 15)) +
geom_point(aes(x = BX, y = BY, shape = 19)) +
scale_shape_identity() +
guides(shape = guide_legend(override.aes = list(shape = 15, shape = 19)) )
print(p)
This doesn't produce a legend, describing which shape is "A" and which shape is "B". Note that the squares and circles may be close to one another, so I can't generally define the variable based on location. How do I display a "shape" legend?
I would reshape my data in the long format using reshape:
dt <- reshape(df.shape, direction = 'long', varying = list(c(1, 3), c(2, 4)),
v.names = c('X', 'Y'), times = c('A', 'B'))
Then I plot it simply like this
ggplot(dt) +
geom_point(aes(x = X, y = Y, shape = time),size=5) +
scale_shape_manual(values=c(15,19))
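If reshaping is not convenient, an alternative sketch (not the approach above) is to keep the wide data and map shape to a constant label inside aes(), then assign the actual shapes with scale_shape_manual(). The labels "A" and "B" and the legend title "Series" are illustrative choices:

```r
library(ggplot2)

set.seed(1)  # assumption: added only so the example is reproducible
df.shape <- data.frame(AX = runif(10),
                       AY = runif(10),
                       BX = runif(10, 2, 3),
                       BY = runif(10, 2, 3))

p <- ggplot(df.shape) +
  # a constant string inside aes() becomes a legend entry
  geom_point(aes(x = AX, y = AY, shape = "A")) +
  geom_point(aes(x = BX, y = BY, shape = "B")) +
  # map the labels to the square (15) and circle (19) from the question
  scale_shape_manual(name = "Series", values = c(A = 15, B = 19))
p
```

This replaces scale_shape_identity(), which by default draws no legend because the values are used as-is.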
Is it possible to plot two sets of data on the same plot, but use different color palettes for each set?
testdf <- data.frame( x = rnorm(100),
y1 = rnorm(100, mean = 0, sd = 1),
y2 = rnorm(100, mean = 10, sd = 1),
yc = rnorm(100, mean = 0, sd = 3))
ggplot(testdf, aes(x, y1, colour = yc)) + geom_point() +
geom_point(aes(y = y2))
What I would like to see is one set of data, say y1, in blues (color set by yc), and the other set in reds (again color set by yc).
The legend should then show 2 color scales, one in blue, the other red.
Thanks for your suggestions.
If you translate the "blues" and "reds" to varying transparency, then it is not against ggplot's philosophy. So, using Thierry's Molten version of the data set:
ggplot(Molten, aes(x, value, colour = variable, alpha = yc)) + geom_point()
Should do the trick.
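A self-contained sketch of that suggestion, assuming reshape2's melt() for the long format and an explicit blue/red mapping through scale_colour_manual() (the colour choice is mine, taken from the question's wording):

```r
library(ggplot2)
library(reshape2)

set.seed(1)  # assumption: added only so the example is reproducible
testdf <- data.frame(x  = rnorm(100),
                     y1 = rnorm(100, mean = 0,  sd = 1),
                     y2 = rnorm(100, mean = 10, sd = 1),
                     yc = rnorm(100, mean = 0,  sd = 3))

# long format: one row per (x, yc, series) with the series name in `variable`
Molten <- melt(testdf, id.vars = c("x", "yc"))

p <- ggplot(Molten, aes(x, value, colour = variable, alpha = yc)) +
  geom_point() +
  # hue identifies the series, transparency carries yc
  scale_colour_manual(values = c(y1 = "blue", y2 = "red"))
p
```

The legend then shows one entry per series plus an alpha guide for yc, rather than two separate colour bars.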
That's not possible with ggplot2. I think it is against the philosophy of ggplot2 because it complicates the interpretation of the plot.
Another option is to use different shapes to separate the points.
testdf <- data.frame( x = rnorm(100),
y1 = rnorm(100, mean = 0, sd = 1),
y2 = rnorm(100, mean = 10, sd = 1),
yc = rnorm(100, mean = 0, sd = 3))
library(ggplot2)
library(reshape2)  # provides melt()
Molten <- melt(testdf, id.vars = c("x", "yc"))
ggplot(Molten, aes(x, value, colour = yc, shape = variable)) + geom_point()