If I have a dataframe like this:
obs<-rnorm(20)
d<-data.frame(year=2000:2019,obs=obs,pred=obs+rnorm(20,.1))
d$pup<-d$pred+.5
d$plow<-d$pred-.5
d$obs[20]<-NA
d
And I want the observation and model prediction error bars to look something like:
(p1<-ggplot(data=d)+aes(x=year)
+geom_point(aes(y=obs),color='red',shape=19)
+geom_point(aes(y=pred),color='blue',shape=3)
+geom_errorbar(aes(ymin=plow,ymax=pup))
)
How do I add a legend/scale/key identifying the red points as observations and the blue plusses with error bars as point predictions with ranges?
Here is one solution, melting pred/obs into one column.
library(ggplot2)
obs <- rnorm(20)
d <- data.frame(dat=c(obs,obs+rnorm(20,.1)))
d$pup <- d$dat+.5
d$plow <- d$dat-.5
d$year <- rep(2000:2019,2)
d$lab <- c(rep("Obs", 20), rep("Pred", 20))
p1<-ggplot(data=d, aes(x=year)) +
geom_point(aes(y = dat, colour = factor(lab), shape = factor(lab))) +
geom_errorbar(data = d[21:40,], aes(ymin=plow,ymax=pup), colour = "blue") +
scale_shape_manual(name = "Legend Title", values=c(6,1)) +
scale_colour_manual(name = "Legend Title", values=c("red", "blue"))
p1
Here is a ggplot solution that does not require melting and grouping.
set.seed(1) # for reproducible example
obs <- rnorm(20)
d <- data.frame(year=2000:2019,obs,pred=obs+rnorm(20,.1))
d$obs[20]<-NA
library(ggplot2)
ggplot(d,aes(x=year))+
geom_point(aes(y=obs,color="obs",shape="obs"))+
geom_point(aes(y=pred,color="pred",shape="pred"))+
geom_errorbar(aes(ymin=pred-0.5,ymax=pred+0.5))+
scale_color_manual("Legend",values=c(obs="red",pred="blue"))+
scale_shape_manual("Legend",values=c(obs=19,pred=3))
This creates a color scale and a shape scale with two components each ("obs" and "pred"). It then uses scale_*_manual(...) to set the values for those scales: ("red","blue") for color, and (19,3) for shape.
Generally, if you have only two categories, like "obs" and "pred", then this is a reasonable way to use ggplot, and it avoids merging everything into one data frame. If you have more than two categories, or if they are integral to the dataset (e.g., actual categorical variables), then you are much better off doing this as in the other answer.
Note that your example left out the column year so your code does not run.
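To check that mapping a constant inside aes() behaves as described above, here is a minimal sketch (assuming ggplot2 is installed; the object names p, obs_layer and pred_layer are only for illustration) that pulls the computed colours and shapes back out with layer_data():

```r
library(ggplot2)

set.seed(1)  # for a reproducible example
obs <- rnorm(20)
d <- data.frame(year = 2000:2019, obs = obs, pred = obs + rnorm(20, .1))

p <- ggplot(d, aes(x = year)) +
  geom_point(aes(y = obs, color = "obs", shape = "obs")) +
  geom_point(aes(y = pred, color = "pred", shape = "pred")) +
  scale_color_manual("Legend", values = c(obs = "red", pred = "blue")) +
  scale_shape_manual("Legend", values = c(obs = 19, pred = 3))

# layer_data() returns a layer's data after the scales have been applied:
# layer 1 holds the "obs" points, layer 2 the "pred" points
obs_layer  <- layer_data(p, 1)
pred_layer <- layer_data(p, 2)
unique(obs_layer$colour)
unique(pred_layer$shape)
```

Because both scales share the same title and the same keys, ggplot2 merges them into a single legend.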
I have two data frames containing results from an epigenetic analysis.
The column from df1 that matters for the plot is labelled beta_ADHD.
The column from df2 that matters for the plot is also labelled beta_ADHD.
I would like to make the column from df1 the x-axis and the column from df2 the y-axis.
I would also like to label the points on the graph according to the data set they are from.
This is what I've tried so far, but nothing has worked yet:
ggp <- ggplot(NULL, aes(Beta_ADHD, Beta_ADHD)) + # Draw ggplot2 plot based on two data frames
geom_point(data = df1, col = "red") +
geom_point(data = df2, col = "blue")
ggp # Draw plot
I also tried this:
ggplot(data=data.frame(x=df1$Beta_ADHD, y=df2$Beta_ADHD), aes(x=x, y=y)) + geom_point()
I'm at a complete loss here and any help would be greatly appreciated.
I think you need to combine the inputs into a single data frame in order to use them as co-ordinates for a scatter plot. (Also, the 2 data sets must have the same number of values.)
I don't believe it makes sense to label or colour the points according to which data set they are from. As we are taking the x-coordinate from df1 and the y-coordinate from df2, that means that every point comes from both data sets. It is the labels on the x-axis beta_ADHD1 and y-axis beta_ADHD2 that show which data set the value came from. You can change the text and color of the axis titles using xlab(), ylab() and theme().
# create some sample data
df1 <- data.frame(beta_ADHD=runif(100,0,10))
df2 <- data.frame(beta_ADHD=rnorm(100,0,10))
# create a new data frame containing the required co-ordinates
# the values from df1 are named beta_ADHD1 and the values from df2 are named beta_ADHD2
new_df <- data.frame(beta_ADHD1 = df1$beta_ADHD, beta_ADHD2 = df2$beta_ADHD)
# plot this data using ggplot
ggplot(new_df, aes(x = beta_ADHD1, y = beta_ADHD2)) + geom_point() +
xlab('beta_ADHD from df1') + ylab('beta_ADHD from df2') +
theme(axis.title.x = element_text(color ='red'), axis.title.y = element_text(color = 'blue'))
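Pairing the two columns by position only makes sense if row i of df1 and row i of df2 describe the same thing. If both data frames carry a shared identifier, matching on it is safer. The sketch below assumes a hypothetical key column probe_id, which is not in the original question:

```r
library(ggplot2)

set.seed(42)
# sample data; probe_id is a hypothetical key column used only for illustration
df1 <- data.frame(probe_id = paste0("cg", 1:100),
                  beta_ADHD = runif(100, 0, 10))
df2 <- data.frame(probe_id = paste0("cg", sample(1:120, 100)),
                  beta_ADHD = rnorm(100, 0, 10))

# inner join on the key; the suffixes distinguish the two beta_ADHD columns
new_df <- merge(df1, df2, by = "probe_id", suffixes = c("1", "2"))

ggplot(new_df, aes(x = beta_ADHD1, y = beta_ADHD2)) +
  geom_point() +
  xlab("beta_ADHD from df1") + ylab("beta_ADHD from df2")
```

Rows whose probe_id appears in only one data frame are dropped by merge(), so the two inputs no longer need to have the same length.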
I would like to have a separate scale bar for each variable.
I have measurements taken throughout the water column for which the means have been calculated into 50cm bins. I would like to use geom_tile to show the variation of each variable in each bin throughout the water column, so the plot has the variable (categorical) on the x-axis, the depth on the y-axis and a different colour scale for each variable representing the value. I am able to do this for one variable using
ggplot(data, aes(x=var, y=depth, fill=value, color=value)) +
geom_tile(size=0.6)+ theme_classic()+scale_y_continuous(limits = c(0,11), expand = c(0, 0))
But if I put all variables onto one plot, the legend is scaled to the min and max of all values so the variation between bins is lost.
To provide a reproducible example, I have used mtcars and mapped value to alpha, which, of course, doesn't help much because the scales of the variables are so different.
data("mtcars")
# STACKS DATA
library(reshape2)
dat2b <- melt(mtcars, id.vars=1:2)
dat2b
ggplot(dat2b) +
geom_tile(aes(x=variable , y=cyl, fill=variable, alpha = value))
Which produces
Is there a way I can add a scale bar for each variable on the plot?
This question is similar to others (e.g. here and here), but they do not use a categorical variable on the x-axis, so I have not been able to modify them to produce the desired plot.
Here is a mock-up of the plot I have in mind using just four of the variables, except I would have all legends horizontal at the bottom of the plot using theme(legend.position="bottom")
Hope this helps:
The function myfun was originally posted by Duck here: R ggplot heatmap with multiple rows having separate legends on the same graph
library(purrr)
library(ggplot2)
library(patchwork)
data("mtcars")
# STACKS DATA
library(reshape2)
dat2b <- melt(mtcars, id.vars=1:2)
dat2b
#Split into list
List <- split(dat2b,dat2b$variable)
#Function for plots
myfun <- function(x)
{
G <- ggplot(x, aes(x=variable, y=cyl, fill = value)) +
geom_tile() +
theme(legend.direction = "vertical", legend.position="bottom")
return(G)
}
#Apply
List2 <- lapply(List,myfun)
#Plot
reduce(List2, `+`)+plot_annotation(title = 'My plot')
patchwork::wrap_plots(List2)
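A different technique from the one above, if a single shared legend is acceptable: rescale each variable to the 0-1 range before plotting, so that one fill scale shows the variation within every column. This is only a sketch; the column name scaled and the per-cyl averaging are assumptions added for illustration, and the legend then shows relative rather than absolute values.

```r
library(ggplot2)
library(reshape2)

data("mtcars")
dat2b <- melt(mtcars, id.vars = 1:2)

# one mean per variable and cyl, so that each tile holds a single value
means <- aggregate(value ~ variable + cyl, data = dat2b, FUN = mean)

# min-max rescale within each variable (0 = lowest mean, 1 = highest mean)
means$scaled <- ave(means$value, means$variable,
                    FUN = function(v) (v - min(v)) / (max(v) - min(v)))

ggplot(means, aes(x = variable, y = factor(cyl), fill = scaled)) +
  geom_tile() +
  theme(legend.position = "bottom")
```

When the absolute values matter, the one-plot-per-variable approach above keeps a separate, correctly labelled legend for each variable.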
I'm trying to plot insect counts of 2 species in 18 experimental plots on a single graph. Since the second species' population peaks later, it is visually doable (see picture below). I would like the 18 population lines from species 1 to be green (using "Greens" from RColorBrewer) and the 18 of species 2 to be red (using "Reds"). I do realize this may be problematic for a colourblind audience, but that is irrelevant here.
I've read here that it is not possible with standard ggplot2 options: R ggplot two color palette on the same plot but this post is more than two years old.
There is a sort of "cheat" for points: Using two scale colour gradients ggplot2 but since I prefer lines to show the population through time, I can't use it.
Are there any new "cheats" available for this?
Or does anyone have another idea to visualize my data in a way that shows population trends through time in all plots and shows the difference in timing of the peak? I've included a picture at the bottom that shows my real data, all in the same colour scale though.
Sample code
library(ggplot2)
library(RColorBrewer)
# example data frame
plot <- as.factor(rep(c("A","B","C"),each=5))
time <- as.numeric(rep(c(1:5),times=3))
S1 <- c(1,4,7,5,2, 2,8,9,3,1, 1,6,6,3,1)
S2 <- c(0,0,2,3,2, 1,2,1,5,3, 0,1,1,6,7)
df <- data.frame(time, plot, S1, S2)
# example colour scales
S1Colours <- colorRampPalette(brewer.pal(9,"Greens"))(3)
S2Colours <- colorRampPalette(brewer.pal(9,"Reds"))(3)
names(S1Colours) <- levels(df$plot)
names(S2Colours) <- levels(df$plot)
# example plot
ggplot(data=df) +
geom_line(aes(x=time, y=S1, colour=plot)) +
geom_line(aes(x=time, y=S2, colour=plot)) +
scale_colour_manual(name = "plot", values = S1Colours) +
scale_colour_manual(name = "plot", values = S2Colours)
# this gives the note "Scale for 'colour' is already present. Adding another scale for 'colour', which will replace the existing scale."
Plot real data
I would also go with creating a manual color scale for all the combinations.
library(tidyverse)
library(RColorBrewer)
df_long=pivot_longer(df,cols=c(S1,S2),names_to = "Species",values_to = "counts") %>% # create long format and
mutate(plot_Species=paste(plot,Species,sep="_")) # make identifiers for combined plot and Species
#make color palette
mycolors=c(colorRampPalette(brewer.pal(9,"Greens"))(sum(grepl("S1",unique(df_long$plot_Species)))),
colorRampPalette(brewer.pal(9,"Reds"))(sum(grepl("S2",unique(df_long$plot_Species)))))
names(mycolors)=c(grep("S1",unique(df_long$plot_Species),value = T),
grep("S2",unique(df_long$plot_Species),value = T))
# example plot
ggplot(data=df_long) +
geom_line(aes(x=time, y=counts, colour=plot_Species)) +
scale_colour_manual(name = "Species by plot", values = mycolors)
You can do this easily with the ggnewscale package (disclaimer: I'm the author).
This is how you would do it:
library(RColorBrewer)
library(ggplot2)
library(ggnewscale)
plot <- as.factor(rep(c("A","B","C"),each=5))
time <- as.numeric(rep(c(1:5),times=3))
S1 <- c(1,4,7,5,2, 2,8,9,3,1, 1,6,6,3,1)
S2 <- c(0,0,2,3,2, 1,2,1,5,3, 0,1,1,6,7)
df <- data.frame(time, plot, S1, S2)
# example colour scales
S1Colours <- colorRampPalette(brewer.pal(9,"Greens"))(3)
S2Colours <- colorRampPalette(brewer.pal(9,"Reds"))(3)
names(S1Colours) <- levels(df$plot)
names(S2Colours) <- levels(df$plot)
ggplot(data=df) +
geom_line(aes(x=time, y=S1, colour=plot)) +
scale_colour_manual(name = "plot 1", values = S1Colours) +
new_scale_color() +
geom_line(aes(x=time, y=S2, colour=plot)) +
scale_colour_manual(name = "plot 2", values = S2Colours)
Created on 2019-12-19 by the reprex package (v0.3.0)
I am trying to add a legend to my Nyquist plot, where I am plotting 2 sets of data: 1 is an experimental set (~600 points), and 2 is a data frame calculated using a transfer function (~1000 points).
I need to plot both and label them. Currently both plot fine, but when I try to add the label using scale_colour_manual no label appears. A way to move this label around would also be appreciated! Code below.
pdf("nyq_2elc.pdf")
nq2 <- ggplot() + geom_point(data = treat, aes(treat$V1,treat$V2), color = "red") +
geom_point(data = circuit, aes(circuit$realTF,circuit$V2), color = "blue") +
xlab("Real Z") + ylab("-Imaginary Z") +
scale_colour_manual(name = 'hell0',
values =c('red'='red','blue'='blue'), labels = c('Treatment','EQ')) +
ggtitle("Nyquist Plot and Equivalent Circuit for 2 Electrode Treatment Setup at 0 Minutes") +
xlim(0,700) + ylim(0,700)
print(nq2)
dev.off()
ggplot2 works best with long data frames, so I would combine the datasets like this:
treat$Cat <- "treat"
circuit$Cat <- "circuit"
CombData <- data.frame(rbind(treat, circuit))
ggplot(CombData, aes(x=V1, y=V2, col=Cat))+geom_point()
This should give you the legend you want.
You probably have to change the names/order of the columns of dataframes treat and circuit so they can be combined, but it's hard to tell because you're not giving us a reproducible example.
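As a sketch of the renaming step mentioned above, using made-up stand-ins for treat and circuit (the real column layout is not known, so the values and the rename of realTF to V1 are assumptions). It also sets the legend labels and position, which the question asked about:

```r
library(ggplot2)

set.seed(1)
# hypothetical stand-ins for the real data
treat   <- data.frame(V1 = runif(60, 0, 700), V2 = runif(60, 0, 700))
circuit <- data.frame(realTF = runif(100, 0, 700), V2 = runif(100, 0, 700))

# give both data frames the same column names before stacking them
names(circuit)[names(circuit) == "realTF"] <- "V1"
treat$Cat   <- "treat"
circuit$Cat <- "circuit"
CombData <- rbind(treat, circuit)

nq2 <- ggplot(CombData, aes(x = V1, y = V2, colour = Cat)) +
  geom_point() +
  scale_colour_manual(name = "Data set",
                      breaks = c("treat", "circuit"),
                      values = c(treat = "red", circuit = "blue"),
                      labels = c("Treatment", "EQ")) +
  xlab("Real Z") + ylab("-Imaginary Z") +
  theme(legend.position = "bottom")  # or "top", "left", "right", "none"
nq2
```

The legend was missing in the question because color was set as a constant outside aes(); a scale only produces a legend for aesthetics that are mapped inside aes().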
I have data from 2 populations.
I'd like to get the histogram and density plot of both on the same graphic.
With one color for one population and another color for the other one.
I've tried this (example):
library(ggplot2)
AA <- rnorm(100000, 70,20)
BB <- rnorm(100000,120,20)
valores <- c(AA,BB)
grupo <- c(rep("AA", 100000),c(rep("BB", 100000)))
todo <- data.frame(valores, grupo)
ggplot(todo, aes(x=valores, fill=grupo, color=grupo)) +
geom_histogram(aes(y=..density..), binwidth=3)+ geom_density(aes(color=grupo))
But I'm just getting a graphic with a single line and a single color.
I would like to have different colors for the two density lines. And if possible the histograms as well.
I've done it with ggplot2 but base R would also be OK.
or I don't know what I've changed and now I get this:
ggplot(todo, aes(x=valores, fill=grupo, color=grupo)) +
geom_histogram( position="identity", binwidth=3, alpha=0.5)+
geom_density(aes(color=grupo))
but the density lines were not plotted.
or even strange things like
I suggest this ggplot2 solution:
ggplot(todo, aes(valores, color=grupo)) +
geom_histogram(position="identity", binwidth=3, aes(y=..density.., fill=grupo), alpha=0.5) +
geom_density()
@skan: Your attempt was close, but you plotted the frequencies instead of the density values in the histogram.
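In ggplot2 3.4.0 and later the ..density.. notation is deprecated (it still works but warns); after_stat(density) is the current spelling. Here is a sketch of the same plot with a smaller sample, assuming a ggplot2 version of at least 3.3.0, where after_stat() was introduced:

```r
library(ggplot2)

set.seed(1)
todo <- data.frame(valores = c(rnorm(1000, 70, 20), rnorm(1000, 120, 20)),
                   grupo   = rep(c("AA", "BB"), each = 1000))

p <- ggplot(todo, aes(valores, color = grupo)) +
  # after_stat(density) puts the histogram on the same y scale as geom_density()
  geom_histogram(aes(y = after_stat(density), fill = grupo),
                 position = "identity", binwidth = 3, alpha = 0.5) +
  geom_density()
p
```

With density on the y-axis, the bars of each group have a total area of 1, which is why the density curves line up with them.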
A base R solution could be:
hist(AA, probability = T, col = rgb(1,0,0,0.5), border = rgb(1,0,0,1),
xlim=range(AA,BB), breaks= 50, ylim=c(0,0.025), main="AA and BB", xlab = "")
hist(BB, probability = T, col = rgb(0,0,1,0.5), border = rgb(0,0,1,1), add=T)
lines(density(AA))
lines(density(BB), lty=2)
For alpha I used rgb. But there are other ways to set it; see alpha() in the scales package, for instance. I also added the breaks parameter to the plot of AA to increase the number of bins (that is, to get narrower bins) compared to the BB group.
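For reference, here is a small base R sketch of the same semi-transparent colors built with adjustcolor() from grDevices instead of rgb(). This is a swapped-in alternative, not what the answer above used, and the names red_half and blue_half are only for illustration:

```r
# adjustcolor() scales the alpha channel of a named color
red_half  <- adjustcolor("red",  alpha.f = 0.5)
blue_half <- adjustcolor("blue", alpha.f = 0.5)

set.seed(1)
AA <- rnorm(1000, 70, 20)
BB <- rnorm(1000, 120, 20)

hist(AA, probability = TRUE, col = red_half, border = "red",
     xlim = range(AA, BB), breaks = 50, ylim = c(0, 0.025),
     main = "AA and BB", xlab = "")
hist(BB, probability = TRUE, col = blue_half, border = "blue", add = TRUE)
lines(density(AA))
lines(density(BB), lty = 2)
```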