** Edited with reproducible data **
I have a data.frame with growth measurements over time for 50 experimental treatments. I have plotted them as a faceted 5x10 grid, ordered in a way that makes sense for my experimental treatments.
I ran a regression function to find the growth rate in each treatment and saved the slope values in another data frame. I have plotted the data, the regression line, and the growth-rate value. Now I want to color the background of each facet according to its regression slope, but I can't figure out how to map color to a continuous variable, especially one from a different data frame with a different number of rows (the original df has 300 rows; the one I want to use has 50, one for each treatment).
My code is as follows:
Df:
df <- data.frame(matrix(ncol = 3,nrow=300))
colnames(df) <- c("Trt", "Day", "Size")
df$Trt <- rep(1:50, each=6)
df$Day <- rep_len(1:6, length.out=300)
df$Size <- rep_len(c(3,5,8,9,12,12,3,7,10,16,17,20),length.out = 300)
Regression function and output data frame:
library(plyr)
regression <- function(df) {
reg_fun <- lm(Size ~ Day, data = df)
slope <- round(coef(reg_fun)[2], 3)
intercept <- round(coef(reg_fun)[1], 3)
R2 <- round(summary(reg_fun)$r.squared, 3)
R2.Adj <- round(summary(reg_fun)$adj.r.squared, 3)
c(slope, intercept, R2, R2.Adj)
}
slopevalues <- ddply(df, "Trt", regression)
colnames(slopevalues) <- c("Trt", "slope", "intercept", "R2", "R2.Adj")
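If you would rather not depend on plyr, the same per-treatment table can probably be built in base R with split() and lapply(). This is a minimal sketch that recreates the example data; the helper name fit_one is hypothetical.

```r
# Rebuild the example data from the question
df <- data.frame(
  Trt  = rep(1:50, each = 6),
  Day  = rep_len(1:6, length.out = 300),
  Size = rep_len(c(3, 5, 8, 9, 12, 12, 3, 7, 10, 16, 17, 20), length.out = 300)
)

# Hypothetical helper: fit one lm() for a single treatment and return one row
fit_one <- function(d) {
  fit <- lm(Size ~ Day, data = d)
  s <- summary(fit)
  data.frame(
    Trt       = d$Trt[1],
    slope     = round(coef(fit)[["Day"]], 3),
    intercept = round(coef(fit)[["(Intercept)"]], 3),
    R2        = round(s$r.squared, 3),
    R2.Adj    = round(s$adj.r.squared, 3)
  )
}

# One data frame per treatment -> one summary row per treatment
slopevalues <- do.call(rbind, lapply(split(df, df$Trt), fit_one))
head(slopevalues, 2)
```

The column names match the plyr version, so the plotting code below should work unchanged with either table.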
Plot:
library(ggplot2)
ggplot(data=df, aes(x=Day, y=Size))+
geom_line() +
geom_point() +
xlab("Day") + ylab("Size (μm)")+
geom_smooth(method="lm",size=.5,se=FALSE)+
geom_text(data=slopevalues,
inherit.aes=FALSE,
aes(x =1, y = 16,hjust=0,
label=paste(slope)))+
facet_wrap(~ Trt, nrow=5)
What I want is to color the background of each facet according to its slope value (slopevalues$slope) on a continuous color gradient. My real data are not just two values repeated, so the colors need to follow that value along the gradient.
Any advice welcome.
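You can use geom_rect with infinite coordinates to do this. Because slopevalues also contains the Trt column, facet_wrap draws one rectangle in each panel, and adding that layer first keeps it behind the points and lines: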
ggplot(data=df, aes(x=Day, y=Size))+
## This is the only new bit
geom_rect(
aes(fill = slope, xmin = -Inf, xmax = Inf, ymin = -Inf, ymax = Inf),
data = slopevalues,
inherit.aes = FALSE
) +
## New bit ends here
geom_line() +
geom_point() +
xlab("Day") + ylab("Size (μm)")+
geom_smooth(method="lm",size=.5,se=FALSE)+
geom_text(data=slopevalues,
inherit.aes=FALSE,
aes(x =1, y = 16,hjust=0,
label=paste(slope)))+
facet_wrap(~ Trt, nrow=5)
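To choose the colours the gradient runs between, you can add a continuous fill scale such as scale_fill_gradient(), and lower the rectangle's alpha so the points stay readable. A sketch, assuming the example data from the question; the colours (white to darkgreen) and alpha = 0.4 are arbitrary choices.

```r
library(ggplot2)

# Example data from the question and one rounded slope per treatment
df <- data.frame(
  Trt  = rep(1:50, each = 6),
  Day  = rep_len(1:6, length.out = 300),
  Size = rep_len(c(3, 5, 8, 9, 12, 12, 3, 7, 10, 16, 17, 20), length.out = 300)
)
slopevalues <- data.frame(
  Trt   = 1:50,
  slope = sapply(split(df, df$Trt),
                 function(d) round(coef(lm(Size ~ Day, data = d))[["Day"]], 3))
)

p <- ggplot(df, aes(x = Day, y = Size)) +
  # Background layer first so it sits behind the data; one rectangle per facet
  geom_rect(
    data = slopevalues,
    aes(fill = slope, xmin = -Inf, xmax = Inf, ymin = -Inf, ymax = Inf),
    inherit.aes = FALSE, alpha = 0.4
  ) +
  geom_line() +
  geom_point() +
  # Continuous fill scale: low slopes pale, high slopes dark
  scale_fill_gradient(low = "white", high = "darkgreen", name = "Slope") +
  facet_wrap(~ Trt, nrow = 5)
p
```

Any other continuous fill scale (for example scale_fill_gradient2() with a midpoint) should slot in the same way.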
Related
I have a "long" format data frame with two columns: the first contains values, the second contains sex (Male = 1, Female = 2). I wrote some code to make a histogram of the entire dataset (code below).
ggplot(kz6, aes(x = values)) +
geom_histogram()
However, I also want to add densities over the histogram to emphasize the difference between the sexes, i.e. I want to combine 3 plots: a histogram for the entire dataset and 2 density plots, one for each sex. I tried to follow several examples, but it still does not work. The code for the densities alone works, while the combination of histogram + density does not.
density <- ggplot(kz6, aes(x = values, fill = factor(sex))) +
geom_density()
both <- ggplot(kz6, aes(x = values)) +
geom_histogram() +
geom_density()
both_2 <- ggplot(kz6, aes(x = values)) +
geom_histogram() +
geom_density(aes(x = kz6[kz6$sex == 1,]))
P.S. Some examples contain y = ..density.. What does it mean, and how should I interpret it?
To plot a histogram and superimpose two densities defined by a categorical variable, use the appropriate aesthetics in the call to geom_density, such as group or colour.
ggplot(kz6, aes(x = values)) +
geom_histogram(aes(y = ..density..), bins = 20) +
geom_density(aes(group = sex, colour = sex), adjust = 2)
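About the P.S.: ..density.. (spelled after_stat(density) in ggplot2 3.3.0 and later) refers to a variable computed by the histogram's stat, namely each bin's count divided by the total count and by the bin width. The bars' total area is therefore 1, the same as the area under a density curve, which is why both can share one y axis. The sketch below, assuming the iris-based test data from this answer, builds the plot and checks that area via ggplot_build().

```r
library(ggplot2)

# Test data built from iris, as in the data-creation code of this answer
kz6 <- iris[iris$Species != "virginica", 4:5]
kz6$sex <- ifelse(kz6$Species == "versicolor", "F", "M")
kz6$Species <- NULL
names(kz6)[1] <- "values"

p <- ggplot(kz6, aes(x = values)) +
  # after_stat(density) is the newer spelling of ..density..
  geom_histogram(aes(y = after_stat(density)), bins = 20) +
  geom_density(aes(group = sex, colour = sex), adjust = 2)

# Computed histogram layer: density = count / (total count * bin width)
h <- ggplot_build(p)$data[[1]]
total_area <- sum(h$density * (h$xmax - h$xmin))
total_area  # 1, up to floating-point rounding
```

Without the y = density mapping the bars are raw counts, which dwarf the density curves; that is why the plain histogram + density combination looked wrong.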
Data creation code: I create a test data set from the built-in iris data set.
kz6 <- iris[iris$Species != "virginica", 4:5]
kz6$sex <- "M"
kz6$sex[kz6$Species == "versicolor"] <- "F"
kz6$Species <- NULL
names(kz6)[1] <- "values"
head(kz6)
I have a data frame in this format:
row.names 100 50 25 0
metabolite1 113417.2998 62594.7067 39460.7705 1.223243e+02
metabolite2 3494058.7972 2046871.7446 1261278.2476 6.422864e+03
The columns refer to the concentrations of quality controls (%): 100, 50, 25, 0.
Currently, to plot a single graph, I extract the data into a new data frame and plot it like this:
metabolite1 <- data.frame(Numbers = c(100,50,25,0), Signal = c(113417.2998,62594.7067,39460.7705,122.3243))
# Extract the intercept and slope of the line of best fit
Coef <- coef(lm(Signal ~ Numbers, data = metabolite1))
# plot data
library(ggplot2)
ggplot(metabolite1, aes(x = Numbers, y = Signal)) +
geom_point() +
xlim(0,100) +
geom_abline(intercept = Coef[1], slope = Coef[2])
This is extremely inefficient, and I am trying to find a better way to plot separate scatter plots for each row rather than creating separate data frames. What would be a better way to do this? I have 160 metabolites to produce graphs for. I have attempted to melt the data frame into this format:
Name variable value
metabolite1 100 113417.2998
metabolite2 100 3494058.7972
metabolite1 50 62594.7067
metabolite2 50 2046871.7446
metabolite1 25 39460.7705
metabolite2 25 1261278.2476
metabolite1 0 1.223243e+02
metabolite2 0 6.422864e+03
and then use ggplot and faceting to plot the data
ggplot(data = df, aes(x = variable, y = value)) +
geom_point() + facet_grid(~ Name)
but the plots produced all have the same y-axis scale, which is not appropriate for my data. I'm assuming that, because of this, I cannot use faceting to produce the plots.
EDIT: I do not know how to add separate lines of best fit to each plot without using geom_smooth, which I do not wish to do.
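You're on the right track with melting and faceting. Use facet_wrap with scales = "free_y" so each panel gets its own y axis, and convert variable back to a number, because after melt() it is a factor and geom_smooth would otherwise treat each concentration as a separate group:
ggplot(data = df, aes(x = as.numeric(as.character(variable)), y = value)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE, lwd = .5, col = "black") +
facet_wrap(~ Name, scales = "free_y")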
This yields plots similar to those you get from running ggplot on each subset separately:
library(ggplot2)
metabolite2 <- data.frame(Numbers = c(100,50,25,0), Signal = c(3494058.7972,2046871.7446,1261278.2476,6422.864))
out <- lapply(list(metabolite1, metabolite2), function(d) {
Coef <- coef(lm(Signal ~ Numbers, data = d))
# plot data
ggplot(d, aes(x = Numbers, y = Signal)) +
geom_point() +
xlim(0,100) +
geom_abline(intercept = Coef[1], slope = Coef[2])
})
gridExtra::grid.arrange(out[[1]], out[[2]], nrow = 1)
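Since the EDIT asks for per-panel fit lines without geom_smooth, one option is to fit the models yourself and pass the coefficients to geom_abline as a small data frame that contains the facet variable; each row is then drawn only in its matching panel. A minimal sketch, assuming the long data has columns Name, variable and value as shown in the question, with variable already numeric (after melt() it would need as.numeric(as.character(variable))); only the two metabolites from the question are typed in here.

```r
library(ggplot2)

# Long-format data as shown in the question; variable is the QC concentration (%)
long <- data.frame(
  Name = rep(c("metabolite1", "metabolite2"), times = 4),
  variable = rep(c(100, 50, 25, 0), each = 2),
  value = c(113417.2998, 3494058.7972, 62594.7067, 2046871.7446,
            39460.7705, 1261278.2476, 122.3243, 6422.864)
)

# Fit one linear model per metabolite and keep its intercept and slope
fits <- lapply(split(long, long$Name), function(d) {
  cf <- coef(lm(value ~ variable, data = d))
  data.frame(Name = d$Name[1], intercept = cf[[1]], slope = cf[[2]])
})
coefs <- do.call(rbind, fits)

p <- ggplot(long, aes(x = variable, y = value)) +
  geom_point() +
  # coefs carries the Name column, so each facet gets only its own line
  geom_abline(data = coefs, aes(intercept = intercept, slope = slope)) +
  facet_wrap(~ Name, scales = "free_y")
p
```

With 160 metabolites the same code should scale unchanged; only the number of panels grows.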
In the following example, selecting free scales makes the maximum of each y scale adjust as expected, but how can I get the minimum values to adjust as well? At the moment both facets start at 0, when I really want the upper facet to run from about 99 to 100 and the lower facet from about 900 to 1000.
library(ggplot2)
n = 100
df = rbind(data.frame(x = 1:n,y = runif(n,min=99,max=100),variable="First"),
data.frame(x = 1:n,y = runif(n,min=900,max=1000),variable="Second"))
ggplot(data=df,aes(x,y,fill=variable)) +
geom_bar(stat='identity') +
facet_grid(variable~.,scales='free')
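Bars are always drawn from y = 0, so each free scale has to include 0. You could use geom_linerange rather than geom_bar, with each line starting at its panel's minimum. A general way to do this is to first find the min of y for each value of variable and then merge the minimums back into the original data. The code would look like: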
library(ggplot2)
min_y <- aggregate(y ~ variable, data=df, min)
sp <- ggplot(data=merge(df, min_y, by="variable", suffixes = c("","min")),
aes(x, colour=variable)) +
geom_linerange(aes(ymin=ymin, ymax=y), size=1.3) +
facet_grid(variable ~ .,scales='free')
plot(sp)
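The aggregate() + merge() step can also be written as a single ave() call that adds each group's minimum as a new column. A minimal sketch reusing the question's simulated data; set.seed(42) is an assumption added only to make it repeatable.

```r
library(ggplot2)

set.seed(42)
n <- 100
df <- rbind(
  data.frame(x = 1:n, y = runif(n, min = 99,  max = 100),  variable = "First"),
  data.frame(x = 1:n, y = runif(n, min = 900, max = 1000), variable = "Second")
)

# Each row gets the minimum y of its own group (same values as aggregate + merge)
df$ymin <- ave(df$y, df$variable, FUN = min)

sp <- ggplot(df, aes(x, colour = variable)) +
  geom_linerange(aes(ymin = ymin, ymax = y)) +
  facet_grid(variable ~ ., scales = "free")
sp
```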
I am plotting the results of 50-100 experiments, each of which produces a time series. I can draw a spaghetti plot of all the time series, but what I'd like is a sort of density map for the plume of time series (something similar to the gray shading in the lower panel of this figure: http://www.ipcc.ch/graphics/ar4-wg1/jpg/fig-6-14.jpg). I can 'sort of' do this with 2D binning or binhex, but the result could be prettier (see example below).
Here is code that reproduces a plume plot with mock data (it uses ggplot2 and reshape2, plus the hexbin package for stat_binhex).
library(ggplot2)
library(reshape2)
# mock data: random walk plus a sine curve.
# two envelopes for added contrast.
tt=10*sin(c(1:100)/(3*pi))
rr=apply(matrix(rnorm(5000),100,50),2,cumsum) +tt
rr2=apply(matrix(rnorm(5000),100,50),2,cumsum)/1.5 +tt
# stuff data into a dataframe and melt it.
df=data.frame(c(1:100),cbind(rr,rr2) )
names(df)=c("step",paste("ser",c(1:100),sep=""))
dfm=melt(df,id.vars = 1)
# ensemble average
ensemble_av=data.frame(step=df[,1],ensav=apply(df[,-1],1,mean))
ensemble_av$variable=as.factor("Mean")
ggplot(dfm,aes(step,value,group=variable))+
stat_binhex(alpha=0.2) + geom_line(alpha=0.2) +
geom_line(data=ensemble_av,aes(step,ensav),size=2)+
theme(legend.position="none")
Does anyone know of a nice way to get a shaded envelope with gradients? I have also tried geom_ribbon, but that did not give any indication of density changes along the plume. binhex does, but not with aesthetically pleasing results.
Compute quantiles:
qs = data.frame(
do.call(
rbind,
tapply(
dfm$value, dfm$step, function(i){quantile(i)})),
t=1:100)
head(qs)
X0. X25. X50. X75. X100. t
1 -0.8514179 0.4197579 0.7681517 1.396382 2.883903 1
2 -0.6506662 1.2019163 1.6889073 2.480807 5.614209 2
3 -0.3182652 2.0480082 2.6206045 4.205954 6.485394 3
4 -0.1357976 2.8956990 4.2082762 5.138747 8.860838 4
5 0.8988975 3.5289219 5.0621513 6.075937 10.253379 5
6 2.0027973 4.5398120 5.9713921 7.015491 11.494183 6
Plot ribbons:
ggplot() +
geom_ribbon(data=qs, aes(x=t, ymin=X0., ymax=X100.),fill="gray30", alpha=0.2) +
geom_ribbon(data=qs, aes(x=t, ymin=X25., ymax=X75.),fill="gray30", alpha=0.2)
This is for two quantile intervals, (0-100) and (25-75). For more intervals you'll need more probabilities in the call to quantile, more ribbon layers, and adjusted colours.
Building on Spacedman's idea, I found a way to add more intervals automatically: I first compute the quantiles for each step, group them into pairs of symmetric probabilities, and then draw them with geom_ribbon in order, from the widest interval to the narrowest.
library(tidyr)
library(dplyr)
condquant <- dfm %>% group_by(step) %>%
do(quant = quantile(.$value, probs = seq(0,1,.05)), probs = seq(0,1,.05)) %>%
unnest(cols = c(quant, probs)) %>%
mutate(delta = 2*round(abs(.5-probs)*100)) %>%
group_by(step, delta) %>%
summarize(quantmin = min(quant), quantmax= max(quant))
ggplot() +
geom_ribbon(data = condquant, aes(x = step, ymin = quantmin, ymax = quantmax,
group = reorder(delta, -delta), fill = as.numeric(delta)),
alpha = .5) +
scale_fill_gradient(low = "grey10", high = "grey95") +
geom_line(data = dfm, aes(x = step, y = value, group=variable), alpha=0.2) +
geom_line(data=ensemble_av,aes(step,ensav),size=2)+
theme(legend.position="none")
Thanks, Erwan and Spacedman.
Avoiding tidyr (and with it dplyr and magrittr), my version of Erwan's answer becomes:
library(plyr)
library(reshape2)
probs=c(0:10)/10 # use fewer quantiles than Erwan
arr=t(apply(df[,-1],1,quantile,prob=probs))
dfq=data.frame(step=df[,1],arr)
names(dfq)=c("step",colnames(arr))
dfqm=melt(dfq,id.vars=c(1))
# add inter-quantile (per) range as delta
dfqm$delta=dfqm$variable
levels(dfqm$delta)=abs(probs-rev(probs))*100
dfplot=ddply(dfqm,.(step,delta),summarize,
quantmin=min(value),
quantmax=max(value) )
ggplot() +
geom_ribbon(data = dfplot, aes(x = step, ymin = quantmin,
ymax =quantmax,group=rev(delta),
fill = as.numeric(delta)),
alpha = .5) +
scale_fill_gradient(low = "grey25", high = "grey75") +
geom_line(data=ensemble_av,aes(step,ensav),size=2) +
theme(legend.position="none")
So, I have a fairly large dataset (Dropbox: csv file) that I'm trying to plot using geom_boxplot. The following produces what appears to be a reasonable plot:
require(reshape2)
require(ggplot2)
require(scales)
require(grid)
require(gridExtra)
df <- read.csv("\\Downloads\\boxplot.csv", na.strings = "*")
df$year <- factor(df$year, levels = c(2010,2011,2012,2013,2014), labels = c(2010,2011,2012,2013,2014))
d <- ggplot(data = df, aes(x = year, y = value)) +
geom_boxplot(aes(fill = station)) +
facet_grid(station~.) +
scale_y_continuous(limits = c(0, 15)) +
theme(legend.position = "none")
d
However, when you dig a little deeper, problems creep in that freak me out. When I labeled the boxplot medians with their values, the following plot results.
df.m <- aggregate(value~year+station, data = df, FUN = median)
d <- d + geom_text(data = df.m, aes(x = year, y = value, label = value))
d
The medians plotted by geom_boxplot aren't at the medians at all. The labels are plotted at the correct y-axis value, but the middle hinge of the boxplots are definitely not at the medians. I've been stumped by this for a few days now.
What is the reason for this? How can this type of display be produced with correct medians? How can this plot be debugged or diagnosed?
The solution to this question lies in how scale_y_continuous(limits = ...) is applied. ggplot2 performs operations in the following order:
Scale Transformations
Statistical Computations
Coordinate Transformations
In this case, because scale limits are set, ggplot2 replaces values outside the limits with NA and drops them (with a "Removed ... rows" warning) before computing the boxplot hinges. The medians calculated by aggregate() and used in geom_text, however, use the entire dataset, so the median hinges and the text labels can differ.
The solution is to omit the scale_y_continuous instruction and instead use:
d <- ggplot(data = df, aes(x = year, y = value)) +
geom_boxplot(aes(fill = station)) +
facet_grid(station~.) +
theme(legend.position = "none") +
coord_cartesian(ylim = c(0,15))
This allows ggplot2 to calculate the boxplot hinge stats using the entire dataset, while limiting the plot size of the figure.
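The difference between the two approaches can be checked with ggplot_build(), which exposes the computed hinge values. A minimal sketch with a small hypothetical data set whose median is 7.5, but only 5.5 once the values above 15 are dropped by the scale limits:

```r
library(ggplot2)

# Hypothetical data: ten small values plus a long upper tail
d <- data.frame(g = "a", value = c(1:10, 20, 30, 40, 50))
median(d$value)  # 7.5

# Scale limits: out-of-range values become NA before the stat runs
p_scale <- ggplot(d, aes(g, value)) + geom_boxplot() +
  scale_y_continuous(limits = c(0, 15))
# Coordinate limits: the stat sees all data, the view is only zoomed
p_coord <- ggplot(d, aes(g, value)) + geom_boxplot() +
  coord_cartesian(ylim = c(0, 15))

# suppressWarnings() hides the expected "Removed 4 rows" warning
mid_scale <- suppressWarnings(ggplot_build(p_scale))$data[[1]]$middle
mid_coord <- ggplot_build(p_coord)$data[[1]]$middle
c(scale = mid_scale, coord = mid_coord)  # 5.5 and 7.5
```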