I have created the following code for a graph in which four fitted lines and corresponding points are plotted. I have problems with the legend. For some reason I cannot find a way to assign the different shapes of the points to a variable name. Also, the colours do not line up with the actual colours in the graph.
y1 <- c(1400,1200,1100,1000,900,800)
y2 <- c(1300,1130,1020,970,830,820)
y3 <- c(1340,1230,1120,1070,940,850)
y4 <- c(1290,1150,1040,920,810,800)
df <- data.frame(x,y1,y2,y3,y4)
g <- ggplot(df, aes(x=x), shape="shape") +
geom_smooth(aes(y=y1), colour="red", method="auto", se=FALSE) + geom_point(aes(y=y1),shape=14) +
geom_smooth(aes(y=y2), colour="blue", method="auto", se=FALSE) + geom_point(aes(y=y2),shape=8) +
geom_smooth(aes(y=y3), colour="green", method="auto", se=FALSE) + geom_point(aes(y=y3),shape=6) +
geom_smooth(aes(y=y4), colour="yellow", method="auto", se=FALSE) + geom_point(aes(y=y4),shape=2) +
ylab("x") + xlab("y") + labs(title="overview") +
geom_line(aes(y=1000), linetype = "dashed") +
theme_light() +
theme(plot.title = element_text(color="black", size=12, face="italic", hjust = 0.5)) +
scale_shape_binned(name="Value g", values=c(y1="14",y2="8",y3="6",y4="2"))
print(g)
I am wondering why the colours don't match up, and how I can construct a legend that makes it clear which shape corresponds to which variable name.
While you can add the legend manually via scale_shape_manual(), the more adequate solution is probably to reshape your data (try tidyr::pivot_longer() on the y1:y4 variables) and then map the resulting variable to the shape aesthetic (you can then set the colours manually to your liking). You would then need a single geom_point() and a single geom_smooth() instead of four of each.
Also, you're missing a reproducible example (what are the values of x?), and your code emits some warnings while trying to perform loess smoothing, because there are fewer data points than it needs.
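For completeness, here is a minimal sketch of the manual-legend route without reshaping. The colours in the question never reach a legend because `colour = "red"` and friends are set as constants outside `aes()`; mapping a label string inside `aes()` creates a legend entry per series, and giving the colour and shape scales the same name merges them into one legend. It assumes `x <- 1:6` (the question does not give x) and swaps `method = "auto"` for `method = "lm"` to sidestep the loess warnings.

```r
library(ggplot2)

# Hypothetical x values: the question does not say what x is
x <- 1:6
df <- data.frame(x,
                 y1 = c(1400, 1200, 1100, 1000, 900, 800),
                 y2 = c(1300, 1130, 1020, 970, 830, 820),
                 y3 = c(1340, 1230, 1120, 1070, 940, 850),
                 y4 = c(1290, 1150, 1040, 920, 810, 800))

# Mapping a constant label *inside* aes() (instead of setting colour/shape
# outside it) is what makes ggplot2 build a legend entry for each series
g <- ggplot(df, aes(x = x)) +
  geom_smooth(aes(y = y1, colour = "y1"), method = "lm", formula = y ~ x, se = FALSE) +
  geom_point(aes(y = y1, colour = "y1", shape = "y1")) +
  geom_smooth(aes(y = y2, colour = "y2"), method = "lm", formula = y ~ x, se = FALSE) +
  geom_point(aes(y = y2, colour = "y2", shape = "y2")) +
  geom_smooth(aes(y = y3, colour = "y3"), method = "lm", formula = y ~ x, se = FALSE) +
  geom_point(aes(y = y3, colour = "y3", shape = "y3")) +
  geom_smooth(aes(y = y4, colour = "y4"), method = "lm", formula = y ~ x, se = FALSE) +
  geom_point(aes(y = y4, colour = "y4", shape = "y4")) +
  # Same name on both scales, so the colour and shape legends merge into one
  scale_colour_manual(name = "Value g",
                      values = c(y1 = "red", y2 = "blue", y3 = "green", y4 = "yellow")) +
  scale_shape_manual(name = "Value g",
                     values = c(y1 = 14, y2 = 8, y3 = 6, y4 = 2))
print(g)
```

The reshaped version is still shorter once there are more than a couple of series; this form only helps when the data has to stay wide.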
Update (2021-12-12)
Here's a reproducible example in which we reshape the original data and pass it to ggplot, so that aes() automatically draws separate points and a separate smoother for each "y group". I made up the values for the x variable.
library(ggplot2)
library(tidyr)
x <- 1:6
y1 <- c(1400,1200,1100,1000,900,800)
y2 <- c(1300,1130,1020,970,830,820)
y3 <- c(1340,1230,1120,1070,940,850)
y4 <- c(1290,1150,1040,920,810,800)
df <- data.frame(x,y1,y2,y3,y4)
data2 <- df %>%
pivot_longer(y1:y4, names_to = "group", values_to = "y")
ggplot(data2, aes(x, y, color = group, shape = group)) +
geom_point(size = 3) + # increased size for increased visibility
geom_smooth(method = "auto", se = FALSE)
Run the code line by line in RStudio and inspect data2; I think it'll make more sense. Here's the resulting output:
Another update
Freek19, in your second example you'll need to specify both the shape and color scales manually, with the same name, so that ggplot2 merges them into a single legend, like so:
library(ggplot2)
data <- ... # from your previous example
ggplot(data, aes(x, y, shape = group, color = group)) +
geom_smooth() +
geom_point(size = 3) +
scale_shape_manual("Program type", values=c(1, 2, 3,4,5)) +
scale_color_manual("Program type", values=c(1, 2, 3,4,5))
Hope this helps.
I managed to get close to what I want, using:
library(ggplot2)
data <- data.frame(x = c(0,0.02,0.04,0.06,0.08,0.1),
y = c(1400,1200,1100,1000,910,850, #y1
1300,1130,1010,970,890,840, #y2
1200,1080,980,950,880,820, #y3
1100,1050,960,930,830,810, #y4
1050,1000,950,920,810,800), #y5
group = rep(c("5%","6%","7%","8%","9%"), each = 6))
data
Values <- ggplot(data, aes(x, y, shape = group, color = group)) + # Create line plot with default colors
geom_smooth(aes(color=group)) + geom_point(aes(shape=group),size=3) +
scale_shape_manual(values=c(1, 2, 3,4,5))+
geom_line(aes(y=1000), linetype = "dashed") +
ylab("V(c)") + xlab("c") + labs(title="Valuation")+
theme_light() +
theme(plot.title = element_text(color="black", size=12, face="italic", hjust = 0.5))+
labs(group="Program Type")
Values
I am now stuck with two legends. I want to change both names, because otherwise they overlap. However, I am not sure how to do this.
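In the code above, `labs(group = "Program Type")` has no visible effect: legend titles are set per aesthetic, and neither legend here belongs to `group`. A minimal sketch that instead gives the colour and shape legends the same title, which makes ggplot2 merge them, reusing the data above. The loess smoother is swapped for `method = "lm"` to avoid the small-sample warnings, and the dashed line and theme are dropped for brevity.

```r
library(ggplot2)

# Data copied from the example above (x is recycled across the five groups)
data <- data.frame(x = c(0, 0.02, 0.04, 0.06, 0.08, 0.1),
                   y = c(1400, 1200, 1100, 1000, 910, 850,
                         1300, 1130, 1010, 970, 890, 840,
                         1200, 1080, 980, 950, 880, 820,
                         1100, 1050, 960, 930, 830, 810,
                         1050, 1000, 950, 920, 810, 800),
                   group = rep(c("5%", "6%", "7%", "8%", "9%"), each = 6))

Values <- ggplot(data, aes(x, y, shape = group, colour = group)) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  geom_point(size = 3) +
  scale_shape_manual(values = c(1, 2, 3, 4, 5)) +
  # Title the colour and shape legends identically so they merge into one
  labs(x = "c", y = "V(c)", title = "Valuation",
       colour = "Program type", shape = "Program type")
print(Values)
```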
I would like to change the y axis label (or main title would also be fine) of a ggplot to reflect the column name being iterated over within an apply function.
Here is some sample data and my working apply function:
trial_df <- data.frame("Patient" = c(1,1,2,2,3,3,4,4),
"Outcome" = c("NED", "NED", "NED", "NED", "Relapse","Relapse","Relapse","Relapse"),
"Time_Point" = c("Baseline", "Week3", "Baseline", "Week3","Baseline", "Week3","Baseline", "Week3"),
"CD4_Param" = c(50.8,53.1,20.3,18.1,30.8,24.5,35.2,31.0),
"CD8_Param" = c(5.3,9.7,4.4,4.3,3.1,3.2,5.6,5.3),
"CD3_Param" = c(11.6,16.6,5.0,5.1,14.3,7.1,5.9,8.1))
apply(trial_df[,4:length(trial_df)], 2, function(i) ggplot(data = trial_df, aes_string(x = "Time_Point", y = i )) +
facet_wrap(~Outcome) +
geom_boxplot(alpha = 0.1) +
geom_point(aes(color = `Outcome`, fill = `Outcome`)) +
geom_path(aes(group = `Patient`, color = `Outcome`)) +
theme_minimal() +
ggpubr::stat_compare_means( method = "wilcox.test") +
scale_fill_manual(values=c("blue", "red")) +
scale_color_manual(values=c("blue", "red")))
Example plot output
This creates three graphs as expected; however, the y-axis just says "y". I would like it to display the name of the column used in that iteration. It would also be fine to add a main title with this information, as I just need to know which graph corresponds to which column.
Here are things I have already tried adding to the ggplot code above based on some similar questions I found, but all of them give me the error "non-numeric argument to binary operator":
ggtitle(paste(i))
labs(y = i)
labs(y = as.character(i))
Any help or resources I may have missed would be greatly appreciated, thanks!
So... for reasons I cannot figure out, this gives what you want, but only for one graph!
apply(trial_df[,4:length(trial_df)], 2, function(i) ggplot(data = trial_df, aes_string(x = "Time_Point", y = i )) +
facet_wrap(~Outcome) +
geom_boxplot(alpha = 0.1) +
geom_point(aes(color = `Outcome`, fill = `Outcome`)) +
geom_path(aes(group = `Patient`, color = `Outcome`)) +
theme_minimal() +
ggpubr::stat_compare_means(method = "wilcox.test") +
scale_fill_manual(values=c("blue", "red")) +
scale_color_manual(values=c("blue", "red"))+
labs(y=colnames(trial_df)[i]))
Gives these:
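The reason this only half works: `apply()` hands each column's *values* to the function, not its name, so `colnames(trial_df)[i]` indexes the column names by the data values themselves (truncated to integers), which happens to land on the right name for only one column. A minimal sketch that loops over the column names instead, using `lapply()` and the `.data` pronoun (the replacement for `aes_string()`, which is deprecated since ggplot2 3.4.0). The `ggpubr::stat_compare_means()` layer is left out only to keep the sketch free of extra packages; it can be added back unchanged.

```r
library(ggplot2)

trial_df <- data.frame("Patient" = c(1,1,2,2,3,3,4,4),
                       "Outcome" = c("NED", "NED", "NED", "NED", "Relapse","Relapse","Relapse","Relapse"),
                       "Time_Point" = c("Baseline", "Week3", "Baseline", "Week3","Baseline", "Week3","Baseline", "Week3"),
                       "CD4_Param" = c(50.8,53.1,20.3,18.1,30.8,24.5,35.2,31.0),
                       "CD8_Param" = c(5.3,9.7,4.4,4.3,3.1,3.2,5.6,5.3),
                       "CD3_Param" = c(11.6,16.6,5.0,5.1,14.3,7.1,5.9,8.1))

# Iterate over the column *names*, so each name is available for the labels
param_cols <- names(trial_df)[4:ncol(trial_df)]

plots <- lapply(param_cols, function(col) {
  ggplot(trial_df, aes(x = Time_Point, y = .data[[col]])) +
    facet_wrap(~Outcome) +
    geom_boxplot(alpha = 0.1) +
    geom_point(aes(color = Outcome, fill = Outcome)) +
    geom_path(aes(group = Patient, color = Outcome)) +
    theme_minimal() +
    scale_fill_manual(values = c("blue", "red")) +
    scale_color_manual(values = c("blue", "red")) +
    labs(y = col, title = col)  # column name on the y-axis and as the title
})
names(plots) <- param_cols

print(plots[["CD8_Param"]])
```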
Let's say I have the following data frame:
library(ggplot2)
set.seed(101)
n=10
df<- data.frame(delta=rep(rep(c(0.1,0.2,0.3),each=3),n), metric=rep(rep(c('P','R','C'),3),n),value=rnorm(9*n, 0.0, 1.0))
My goal is to do a boxplot by multiple factors:
p<- ggplot(data = df, aes(x = factor(delta), y = value)) +
geom_boxplot(aes(fill=factor(metric)))
The output is:
So far so good, but if I do:
p+ geom_point(aes(color = factor(metric)))
I get:
I do not know what it is doing. My goal is to color the outliers as it is done here. Note that this solution changes the inside color of the boxes to white and sets the borders to different colors. I want to keep the same box colors while having the outliers inherit them, i.e. make the outliers take the colors of their respective boxplots.
Do you just want to change the outliers' colour? If so, you can do it easily by drawing the boxplot twice.
p <- ggplot(data = df, aes(x = factor(delta), y = value)) +
geom_boxplot(aes(colour=factor(metric))) +
geom_boxplot(aes(fill=factor(metric)), outlier.colour = NA)
# outlier.shape = 21 # if you want a border
[EDITED]
colss <- c(P="firebrick3",R="skyblue", C="mediumseagreen")
p + scale_colour_manual(values = colss) + # outliers colours
scale_fill_manual(values = colss) # boxes colours
# The development version (2.1.0.9001) of geom_boxplot() has an outlier.fill argument,
# so I expect the code below to return similar output in the near future.
p2 <- ggplot(data = df, aes(x = factor(delta), y = value)) +
geom_boxplot(aes(fill=factor(metric)), outlier.shape = 21, outlier.colour = NA)
Maybe this:
ggplot(data = df, aes(x = as.factor(delta), y = value,fill=as.factor(metric))) +
geom_boxplot(outlier.size = 1)+ geom_point(pch = 21,position=position_jitterdodge(jitter.width=0))
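Another option, sketched here rather than taken from either answer: hide the boxplot's own outliers, flag them yourself with the same 1.5 * IQR rule geom_boxplot() uses by default, and draw them with a dodged geom_point() so they line up with their boxes and pick up the matching colour. The stray points in the question appear because geom_point() is not dodged, so every point sits at the centre of its x position. `is_outlier()` is a hypothetical helper, and `position_dodge(width = 0.75)` assumes the default boxplot width.

```r
library(ggplot2)

set.seed(101)
n <- 10
df <- data.frame(delta = rep(rep(c(0.1, 0.2, 0.3), each = 3), n),
                 metric = rep(rep(c('P', 'R', 'C'), 3), n),
                 value = rnorm(9 * n, 0.0, 1.0))

# TRUE for values more than 1.5 * IQR beyond the quartiles
# (the default rule of geom_boxplot(), coef = 1.5)
is_outlier <- function(v) {
  q <- quantile(v, c(0.25, 0.75))
  iqr <- q[[2]] - q[[1]]
  v < q[[1]] - 1.5 * iqr | v > q[[2]] + 1.5 * iqr
}
# ave() writes the results back into a numeric vector (0/1), so convert back
df$outlier <- as.logical(ave(df$value, df$delta, df$metric, FUN = is_outlier))

p <- ggplot(df, aes(x = factor(delta), y = value, fill = factor(metric))) +
  geom_boxplot(outlier.shape = NA) +  # hide the boxplot's own outlier points
  # Draw every point, dodged like the boxes, but make non-outliers invisible;
  # keeping all rows means each x position always has three groups to dodge
  geom_point(aes(colour = factor(metric), alpha = as.numeric(outlier)),
             position = position_dodge(width = 0.75), size = 2) +
  scale_alpha_identity()  # use the 0/1 values directly as alpha
print(p)
```

The fill and colour scales both use the default hue palette over the same levels, so each outlier gets the colour of its box.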
I am trying to overlay multiple trend lines using the geom_smooth() in R. I currently have this code.
ggplot(mtcars2, aes(x=Displacement, y = Variable, color = Variable)) +
geom_point(aes(x=mpg, y = hp, col = "Power")) +
geom_point(aes(x=mpg, y = drat, col = "Drag Coef."))
(mtcars2 is the normalized form of mtcars)
Which gives me this graph.
I am trying to use geom_smooth(method='lm') to draw two trend lines for the two variables. Any ideas?
(Bonus: I would also like to use the 'shape=1' parameter to differentiate the variables, if possible. The following method does not work.)
geom_point(aes(x=mpg, y = hp, col = "Power", shape=2))
Update
I managed to do this.
ggplot(mtcars2, aes(x=Displacement, y = Variable, color = Variable)) +
geom_point(aes(x=disp, y = hp, col = "Power")) +
geom_point(aes(x=disp, y = mpg, col = "MPG")) +
geom_smooth(method= 'lm',aes(x=disp, y = hp, col = "Power")) +
geom_smooth(method= 'lm',aes(x=disp, y = mpg, col = "MPG"))
It looks like this.
But this is an ugly piece of code. If anybody can make this code look prettier, it'd be great. Also, I have not yet been able to implement the 'shape=2' parameter.
It seems like you're making your life harder than it needs to be: you can map additional aesthetics in aes(), such as group and shape.
I don't know if I got your normalization right, but this should give you enough to get going in the right direction:
library(ggplot2)
library(reshape2)
#Do some normalization
mtcars$disp_norm <- with(mtcars, (disp - min(disp)) / (max(disp) - min(disp)))
mtcars$hp_norm <- with(mtcars, (hp - min(hp)) / (max(hp) - min(hp)))
mtcars$drat_norm <- with(mtcars, (drat - min(drat)) / (max(drat) - min(drat)))
#Melt into long form
mtcars.m <- melt(mtcars, id.vars = "disp_norm", measure.vars = c("hp_norm", "drat_norm"))
#plot
ggplot(mtcars.m, aes(disp_norm, value, group = variable, colour = variable, shape = variable)) +
geom_point() +
geom_smooth(method = "lm")
Yielding:
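reshape2 is retired, so here is a sketch of the same idea with tidyr::pivot_longer() (the function suggested in the first answer) in place of melt(). `rescale01()` is a helper written for this sketch, and `formula = y ~ x` only silences geom_smooth()'s message; `group = variable` is dropped because the discrete colour mapping already groups the data.

```r
library(ggplot2)
library(tidyr)

# Same min-max normalisation as above, as a small (hypothetical) helper
rescale01 <- function(v) (v - min(v)) / (max(v) - min(v))
mt_norm <- data.frame(disp_norm = rescale01(mtcars$disp),
                      hp_norm = rescale01(mtcars$hp),
                      drat_norm = rescale01(mtcars$drat))

# pivot_longer() plays the role of melt(): one row per (car, variable) pair
mt_long <- pivot_longer(mt_norm, c(hp_norm, drat_norm),
                        names_to = "variable", values_to = "value")

p <- ggplot(mt_long, aes(disp_norm, value, colour = variable, shape = variable)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)
print(p)
```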
I am new to R and am trying to plot 3 histograms onto the same graph.
Everything worked fine, but my problem is that you can't see where two histograms overlap; they look rather cut off.
When I make density plots, it looks perfect: each curve is surrounded by a black frame line, and colours look different where curves overlap.
Can someone tell me if something similar can be achieved with the histograms in the 1st picture? This is the code I'm using:
lowf0 <-read.csv (....)
mediumf0 <-read.csv (....)
highf0 <-read.csv(....)
lowf0$utt<-'low f0'
mediumf0$utt<-'medium f0'
highf0$utt<-'high f0'
histogram<-rbind(lowf0,mediumf0,highf0)
ggplot(histogram, aes(f0, fill = utt)) + geom_histogram(alpha = 0.2)
Using @joran's sample data,
ggplot(dat, aes(x=xx, fill=yy)) + geom_histogram(alpha=0.2, position="identity")
Note that the default position of geom_histogram() is "stack"; see the "position adjustment" section of the geom_histogram documentation.
Your current code:
ggplot(histogram, aes(f0, fill = utt)) + geom_histogram(alpha = 0.2)
is telling ggplot to construct one histogram using all the values in f0 and then color the bars of this single histogram according to the variable utt.
What you want instead is to create three separate histograms, with alpha blending so that they are visible through each other. So you probably want to use three separate calls to geom_histogram, where each one gets its own data frame and fill:
ggplot(histogram, aes(f0)) +
geom_histogram(data = lowf0, fill = "red", alpha = 0.2) +
geom_histogram(data = mediumf0, fill = "blue", alpha = 0.2) +
geom_histogram(data = highf0, fill = "green", alpha = 0.2)
Here's a concrete example with some output:
dat <- data.frame(xx = c(runif(100,20,50),runif(100,40,80),runif(100,0,30)),yy = rep(letters[1:3],each = 100))
ggplot(dat,aes(x=xx)) +
geom_histogram(data=subset(dat,yy == 'a'),fill = "red", alpha = 0.2) +
geom_histogram(data=subset(dat,yy == 'b'),fill = "blue", alpha = 0.2) +
geom_histogram(data=subset(dat,yy == 'c'),fill = "green", alpha = 0.2)
which produces something like this:
Edited to fix typos; you wanted fill, not colour.
While only a few lines are required to plot multiple/overlapping histograms in ggplot2, the results aren't always satisfactory. Borders and colours need to be used properly so that the eye can tell the histograms apart.
The following functions balance border colors, opacities, and superimposed density plots to enable the viewer to differentiate among distributions.
Single histogram:
plot_histogram <- function(df, feature) {
plt <- ggplot(df, aes(x=eval(parse(text=feature)))) +
geom_histogram(aes(y = ..density..), alpha=0.7, fill="#33AADE", color="black") +
geom_density(alpha=0.3, fill="red") +
geom_vline(aes(xintercept=mean(eval(parse(text=feature)))), color="black", linetype="dashed", size=1) +
labs(x=feature, y = "Density")
print(plt)
}
Multiple histogram:
plot_multi_histogram <- function(df, feature, label_column) {
plt <- ggplot(df, aes(x=eval(parse(text=feature)), fill=eval(parse(text=label_column)))) +
geom_histogram(alpha=0.7, position="identity", aes(y = ..density..), color="black") +
geom_density(alpha=0.7) +
geom_vline(aes(xintercept=mean(eval(parse(text=feature)))), color="black", linetype="dashed", size=1) +
labs(x=feature, y = "Density")
plt + guides(fill=guide_legend(title=label_column))
}
Usage:
Simply pass your data frame into the above functions along with desired arguments:
plot_histogram(iris, 'Sepal.Width')
plot_multi_histogram(iris, 'Sepal.Width', 'Species')
The extra parameter in plot_multi_histogram is the name of the column containing the category labels.
We can see this more dramatically by creating a dataframe with many different distribution means:
a <-data.frame(n=rnorm(1000, mean = 1), category=rep('A', 1000))
b <-data.frame(n=rnorm(1000, mean = 2), category=rep('B', 1000))
c <-data.frame(n=rnorm(1000, mean = 3), category=rep('C', 1000))
d <-data.frame(n=rnorm(1000, mean = 4), category=rep('D', 1000))
e <-data.frame(n=rnorm(1000, mean = 5), category=rep('E', 1000))
f <-data.frame(n=rnorm(1000, mean = 6), category=rep('F', 1000))
many_distros <- do.call('rbind', list(a,b,c,d,e,f))
Passing data frame in as before (and widening chart using options):
options(repr.plot.width = 20, repr.plot.height = 8)
plot_multi_histogram(many_distros, 'n', 'category')
To add a separate vertical line for each distribution:
plot_multi_histogram <- function(df, feature, label_column, means) {
plt <- ggplot(df, aes(x=eval(parse(text=feature)), fill=eval(parse(text=label_column)))) +
geom_histogram(alpha=0.7, position="identity", aes(y = ..density..), color="black") +
geom_density(alpha=0.7) +
geom_vline(xintercept=means, color="black", linetype="dashed", size=1) +
labs(x=feature, y = "Density")
plt + guides(fill=guide_legend(title=label_column))
}
The only changes from the previous plot_multi_histogram function are the new means parameter and the geom_vline line, which now accepts multiple values.
Usage:
options(repr.plot.width = 20, repr.plot.height = 8)
plot_multi_histogram(many_distros, "n", 'category', c(1, 2, 3, 4, 5, 6))
Result:
Since I set the means explicitly in many_distros, I can simply pass them in. Alternatively, you can calculate them inside the function.
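A minimal sketch of that alternative, as a hypothetical `plot_multi_histogram_auto()`: it computes the per-group means with tapply() and draws them from a small data frame. It also swaps eval(parse()) for the `.data` pronoun and `..density..` (deprecated since ggplot2 3.4.0) for after_stat(density).

```r
library(ggplot2)

# Hypothetical variant of plot_multi_histogram() that computes the group
# means itself instead of taking them as an argument
plot_multi_histogram_auto <- function(df, feature, label_column) {
  group_means <- tapply(df[[feature]], df[[label_column]], mean)
  means <- data.frame(group = names(group_means),
                      group_mean = as.numeric(group_means))

  ggplot(df, aes(x = .data[[feature]], fill = .data[[label_column]])) +
    geom_histogram(aes(y = after_stat(density)), alpha = 0.7,
                   position = "identity", colour = "black", bins = 30) +
    geom_density(alpha = 0.7) +
    # inherit.aes = FALSE: the means table has no feature/label columns
    geom_vline(data = means, aes(xintercept = group_mean), inherit.aes = FALSE,
               colour = "black", linetype = "dashed") +
    labs(x = feature, y = "Density", fill = label_column)
}

print(plot_multi_histogram_auto(iris, "Sepal.Width", "Species"))
```

Usage is the same as before minus the means argument, e.g. `plot_multi_histogram_auto(many_distros, "n", "category")`.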