Plot each row of a data frame as a separate graph in R

I have a data frame in this format:
row.names 100 50 25 0
metabolite1 113417.2998 62594.7067 39460.7705 1.223243e+02
metabolite2 3494058.7972 2046871.7446 1261278.2476 6.422864e+03
The columns refer to the concentrations of quality controls (%): 100, 50, 25, 0.
Currently to plot a single graph I am extracting the data into a new data frame and plotting it like this:
metabolite1 <- data.frame(Numbers = c(100,50,25,0), Signal = c(113417.2998,62594.7067,39460.7705,122.3243))
# Extract the intercept and slope of the line of best fit
Coef <- coef(lm(Signal ~ Numbers, data = metabolite1))
# plot data
ggplot(metabolite1, aes(x = Numbers, y = Signal)) +
geom_point() +
xlim(0,100) +
geom_abline(intercept = Coef[1], slope = Coef[2])
This is extremely inefficient, and I am trying to find a better way to plot a separate scatter plot for each row rather than creating separate data frames. What would be a better way to do this? I have 160 metabolites I need to produce graphs for. I have attempted to melt the data frame into this format:
Name variable value
metabolite1 100 113417.2998
metabolite2 100 3494058.7972
metabolite1 50 62594.7067
metabolite2 50 2046871.7446
metabolite1 25 39460.7705
metabolite2 25 1261278.2476
metabolite1 0 1.223243e+02
metabolite2 0 6.422864e+03
and then use ggplot and faceting to plot the data
ggplot(data = df, aes(x = variable, y = value)) +
geom_point() + facet_grid(~ Name)
but the plots produced all have the same y axis scale which is not appropriate for the data I am working with. I'm assuming because of this I cannot use faceting to produce the plots.
EDIT: I do not know how to add separate lines of best fit to each plot without using geom_smooth, which I do not wish to do.

You're on the right track with your method of melting and faceting:
ggplot(data = df, aes(x = variable, y = value)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE, lwd = .5, col = "black") +
facet_wrap(~ Name, scales = "free_y")
This yields plots similar to those you get from running ggplot on subsets:
out <- lapply(list(metabolite1, metabolite2), function(d) {
Coef <- coef(lm(Signal ~ Numbers, data = d))
# plot data
p <- ggplot(d, aes(x = Numbers, y = Signal)) +
geom_point() +
xlim(0,100) +
geom_abline(intercept = Coef[1], slope = Coef[2])
})
gridExtra::grid.arrange(out[[1]], out[[2]], nrow = 1)
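If geom_smooth is to be avoided entirely, as the EDIT in the question asks, one option is to fit each row with lm() up front and hand the coefficients to geom_abline, which draws one line per facet when its data carries the faceting column. The following is only a sketch under assumptions: the wide table is rebuilt from the two rows shown in the question, and the names wide, long, fits, and Conc are made up for illustration. It also converts the concentrations to numeric, since reshape2::melt typically leaves `variable` as a factor.

```r
library(ggplot2)

# Wide data as in the question; column names are assumed to be the QC concentrations
wide <- data.frame(
  Name  = c("metabolite1", "metabolite2"),
  `100` = c(113417.2998, 3494058.7972),
  `50`  = c(62594.7067, 2046871.7446),
  `25`  = c(39460.7705, 1261278.2476),
  `0`   = c(122.3243, 6422.864),
  check.names = FALSE
)

# Concentrations as numbers, not factor levels
conc <- as.numeric(names(wide)[-1])

# Long format: one row per metabolite/concentration pair
long <- data.frame(
  Name   = rep(wide$Name, times = length(conc)),
  Conc   = rep(conc, each = nrow(wide)),
  Signal = unlist(wide[-1], use.names = FALSE)
)

# One lm() per metabolite; keep intercept and slope
fits <- do.call(rbind, lapply(split(long, long$Name), function(d) {
  cf <- coef(lm(Signal ~ Conc, data = d))
  data.frame(Name = d$Name[1], intercept = cf[[1]], slope = cf[[2]])
}))

# geom_abline picks the matching row of fits for each facet via Name
p <- ggplot(long, aes(x = Conc, y = Signal)) +
  geom_point() +
  geom_abline(data = fits, aes(intercept = intercept, slope = slope)) +
  xlim(0, 100) +
  facet_wrap(~ Name, scales = "free_y")
```

Printing p draws the panels. With 160 metabolites a single faceted figure may be too dense, so splitting the names into batches before plotting could be worth considering.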

Related

how to input data from multiple columns in x and y arguments in ggplot

I am trying to create a density plot for particle size data. My data has multiple density and size readings for each genotype set. Is there a way to specify multiple columns into x and y using ggplot? I tried coding for this but am only getting a blank plot as of now. This is the link to the csv file I used: https://drive.google.com/file/d/11djXTmZliPCGLCZavukjb0TT28HsKMRQ/view?usp=sharing
Thanks!
crop.data6 <- read.csv("barleygt25.csv", header = TRUE)
crop.data6
library(ggplot2)
plot1 = ggplot(data=crop.data6, aes(x=, xend=bq, y=a, yend=bq, color=genotype))
plot1
Your data is in a strange format that doesn't lend itself well to plotting. Effectively, it needs to be transposed then pivoted into long format to make it suitable for plotting:
df <- data.frame(xvals = c(t(crop.data6[1:9, -c(1:2)])),
yvals = c(t(crop.data6[10:18, -c(1:2)])),
genotype = rep(crop.data6$genotype[1:9], each = 68))
ggplot(df, aes(xvals, yvals, color = genotype)) +
geom_line(size = 1) +
scale_color_brewer(palette = "Set1") +
theme_bw(base_size = 16) +
labs(x = "value", y = "density")
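The c(t(...)) idiom in the answer relies on R storing matrices column by column: transposing first makes c() read the values row by row, so every reading stays next to the others from its own row. A toy illustration, with a made-up matrix m standing in for a block of rows from the CSV:

```r
# Toy matrix standing in for a block of rows from the CSV (values are made up)
m <- matrix(1:6, nrow = 2, byrow = TRUE,
            dimnames = list(c("g1", "g2"), NULL))
# m:
#    [,1] [,2] [,3]
# g1    1    2    3
# g2    4    5    6

by_col <- c(m)      # column-major order: 1 4 2 5 3 6
by_row <- c(t(m))   # row-major order:    1 2 3 4 5 6

# Labels repeated once per reading keep each value attached to its row,
# which is what rep(..., each = 68) does in the answer
labels <- rep(rownames(m), each = ncol(m))
```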

Density over histogram using ggplot2

I have a "long" format data frame with two columns: the first holds the values, the second holds sex (Male = 1, Female = 2). I wrote some code to make a histogram of the entire dataset (code below).
ggplot(kz6, aes(x = values)) +
geom_histogram()
However, I also want to add a density curve over the histogram to emphasize the difference between the sexes, i.e. I want to combine 3 plots: a histogram for the entire dataset and 2 density plots, one for each sex. I tried to follow some examples (one, two, three, four), but it still does not work. The density-only code works, while the combinations of histogram + density do not.
density <- ggplot(kz6, aes(x = x, fill = factor(sex))) +
geom_density()
both <- ggplot(kz6, aes(x = values)) +
geom_histogram() +
geom_density()
both_2 <- ggplot(kz6, aes(x = values)) +
geom_histogram() +
geom_density(aes(x = kz6[kz6$sex == 1,]))
P.S. Some examples contain y=..density.. What does it mean, and how should I interpret it?
To plot a histogram and superimpose two densities, defined by a categorical variable, use appropriate aesthetics in the call to geom_density, like group or colour.
ggplot(kz6, aes(x = values)) +
geom_histogram(aes(y = ..density..), bins = 20) +
geom_density(aes(group = sex, colour = sex), adjust = 2)
Data creation code.
I will create a test data set from the built-in iris data set.
kz6 <- iris[iris$Species != "virginica", 4:5]
kz6$sex <- "M"
kz6$sex[kz6$Species == "versicolor"] <- "F"
kz6$Species <- NULL
names(kz6)[1] <- "values"
head(kz6)
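On the P.S. about y = ..density..: it asks the histogram to report density instead of counts on the y axis, so that the bars and the density curves share one scale. On that scale each bar's height is count / (n * bin width), and the bar areas sum to 1. Recent ggplot2 releases (3.4.0 and later) prefer the spelling after_stat(density) for the same thing. The scaling can be checked in base R with hist(); the sample x below is made up for illustration:

```r
set.seed(1)
x <- rnorm(200)  # made-up sample

h <- hist(x, breaks = 20, plot = FALSE)
widths <- diff(h$breaks)

# Count scale: bar heights add up to the number of observations
n <- sum(h$counts)

# Density scale: height = count / (n * bin width) ...
by_hand <- h$counts / (n * widths)

# ... so the bar areas (height * width) add up to 1
area <- sum(h$density * widths)
```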

annotate r squared to ggplot by using facet_wrap

I just joined the community and am looking forward to getting some help with the data analysis for my master's thesis.
At the moment I have the following problem:
I plotted 42 varieties with ggplot by using facet_wrap:
ggplot(sumfvvar,aes(x=TemperaturCmean,y=Fv.Fm,col=treatment))+
geom_point(shape=1,size=1)+
geom_smooth(method=lm)+
scale_color_brewer(palette = "Set1")+
facet_wrap(.~Variety)
That works very well, but I would like to annotate the r squared values for the regression lines. I have two treatments and 42 varieties, therefore 84 regression lines.
Is there any way to calculate all the r squared values and integrate them into the ggplot? I already found the function
ggplotRegression <- function (fit) {
require(ggplot2)
ggplot(fit$model, aes_string(x = names(fit$model)[2], y = names(fit$model)[1])) +
geom_point() +
stat_smooth(method = "lm") +
labs(title = paste("Adj R2 = ",signif(summary(fit)$adj.r.squared, 5),
"Intercept =",signif(fit$coef[[1]],5 ),
" Slope =",signif(fit$coef[[2]], 5),
" P =",signif(summary(fit)$coef[2,4], 5)))
}
but that works for just one variety and one treatment. Could a loop over the lm() function be an option?
Here is an example with the ggpmisc package:
library(ggpmisc)
set.seed(4321)
x <- 1:100
y <- (x + x^2 + x^3) + rnorm(length(x), mean = 0, sd = mean(x^3) / 4)
my.data <- data.frame(x = x,
y = y,
group = c("A", "B"))
formula <- y ~ poly(x, 1, raw = TRUE)
ggplot(my.data, aes(x, y)) +
facet_wrap(~ group) +
geom_point() +
geom_smooth(method = "lm", formula = formula) +
stat_poly_eq(formula = formula, parse = TRUE,
mapping = aes(label = stat(rr.label)))
You can't apply different labels to different facets unless you supply the r^2 values as a column in a data frame. One way is to use geom_text, but you need to calculate the stats you need first. Below I show an example with iris; for your case, just substitute Variety for Species, and so on.
library(tidyverse)
# simulate data for 2 treatments
# d2 is just shifted up from d1
d1 <- data.frame(iris,Treatment="A")
d2 <- data.frame(iris,Treatment="B") %>%
mutate(Sepal.Length=Sepal.Length+rnorm(nrow(iris),1,0.5))
# combine datasets
DF <- rbind(d1,d2) %>% rename(Variety = Species)
# plot like you did
# note I use "free" scales, if scales very different between Species
# your facet plots will be squished
g <- ggplot(DF,aes(x=Sepal.Width,y=Sepal.Length,col=Treatment))+
geom_point(shape=1,size=1)+
geom_smooth(method=lm)+
scale_color_brewer(palette = "Set1")+
facet_wrap(.~Variety,scales="free")
# rsq function
RSQ = function(y,x){signif(summary(lm(y ~ x))$adj.r.squared, 3)}
#calculate rsq for variety + treatment
STATS <- DF %>%
group_by(Variety,Treatment) %>%
summarise(Rsq=RSQ(Sepal.Length,Sepal.Width)) %>%
# make a label
# one other option is to use stringr::str_wrap in geom_text
mutate(Label=paste("Treat",Treatment,", Rsq=",Rsq))
# set vertical position of rsq
VJUST = ifelse(STATS$Treatment=="A",1.5,3)
# finally the plot function
g + geom_text(data=STATS,aes(x=-Inf,y=+Inf,label=Label),
hjust = -0.1, vjust = VJUST,size=3)
For the last geom_text() call, I offset the labels vertically by giving each Treatment a different vjust value (VJUST above). You might need to adjust those values depending on your plot.
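On the question of looping over lm(): the per-group statistics can also be computed without dplyr by splitting the data on both grouping variables and fitting one model per piece. This is a sketch in base R, reusing the iris stand-in from the answer. It reports the plain r.squared rather than the adjusted value used above, and the noise added to treatment B is made up.

```r
set.seed(42)

# Same stand-in as the answer: iris twice, second copy shifted and labelled "B"
d1 <- data.frame(iris, Treatment = "A")
d2 <- data.frame(iris, Treatment = "B")
d2$Sepal.Length <- d2$Sepal.Length + rnorm(nrow(d2), 1, 0.5)
DF <- rbind(d1, d2)
names(DF)[names(DF) == "Species"] <- "Variety"

# One piece per Variety x Treatment combination
groups <- split(DF, list(DF$Variety, DF$Treatment), drop = TRUE)

# Fit one regression per piece and keep its R squared
STATS <- do.call(rbind, lapply(groups, function(d) {
  fit <- lm(Sepal.Length ~ Sepal.Width, data = d)
  data.frame(Variety   = d$Variety[1],
             Treatment = d$Treatment[1],
             Rsq       = signif(summary(fit)$r.squared, 3))
}))
rownames(STATS) <- NULL
```

STATS has one row per regression line (6 here, 84 for 42 varieties and 2 treatments) and can be passed to geom_text in the same way as in the answer.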

Color facets in ggplot by continuous variable

** Edited with Repeatable Data **
I have a data.frame with plots of growth over time for 50 experimental treatments. I have plotted them as a faceted 5x10 plot grid. I also ordered them in a way that makes sense considering my experimental treatments.
I ran a regression function to find the growth rate in each treatment and saved the slope values in another data frame. I have plotted the data, the regression line, and the growth rate value. Now I want to color the background of each facet according to its regression slope, but I can't figure out how to map color to a continuous variable, especially one from a different df with a different number of rows (the original df has 300 rows; the df I want to use has 50, one for each treatment).
My code is as follows:
Df:
df <- data.frame(matrix(ncol = 3,nrow=300))
colnames(df) <- c("Trt", "Day", "Size")
df$Trt <- rep(1:50, each=6)
df$Day <- rep_len(1:6, length.out=300)
df$Size <- rep_len(c(3,5,8,9,12,12,3,7,10,16,17,20),length.out = 300)
Regression function and output dataframe:
regression=function(df){
reg_fun<-lm(formula=df$Size~df$Day)
slope<-round(coef(reg_fun)[2],3)
intercept<-round(coef(reg_fun)[1],3)
R2<-round(as.numeric(summary(reg_fun)[8]),3)
R2.Adj<-round(as.numeric(summary(reg_fun)[9]),3)
c(slope,intercept,R2,R2.Adj)
}
library(plyr)
slopevalues<-ddply(df,"Trt",regression)
colnames(slopevalues)<-c ("Trt","slope","intercept","R2","R2.Adj")
Plot:
ggplot(data=df, aes(x=Day, y=Size))+
geom_line() +
geom_point() +
xlab("Day") + ylab("Size (μm)")+
geom_smooth(method="lm",size=.5,se=FALSE)+
geom_text(data=slopevalues,
inherit.aes=FALSE,
aes(x =1, y = 16,hjust=0,
label=paste(slope)))+
facet_wrap(~ Trt, nrow=5)
What I want to do is color the backgrounds of the individual graphs according to the slope value (slopevalues$slope) on a gradient. My real data are not just 2 values repeated, so I want to do this on a gradient of colors according to that value.
Any advice welcome.
You can use geom_rect with infinite coordinates to do this:
ggplot(data=df, aes(x=Day, y=Size))+
## This is the only new bit
geom_rect(
aes(fill = slope, xmin = -Inf, xmax = Inf, ymin = -Inf, ymax = Inf),
slopevalues,
inherit.aes = FALSE
) +
## New bit ends here
geom_line() +
geom_point() +
xlab("Day") + ylab("Size (μm)")+
geom_smooth(method="lm",size=.5,se=FALSE)+
geom_text(data=slopevalues,
inherit.aes=FALSE,
aes(x =1, y = 16,hjust=0,
label=paste(slope)))+
facet_wrap(~ Trt, nrow=5)
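A trimmed-down variant of the same idea, as a sketch only: the slopes are computed with base R instead of plyr, the rectangle is made semi-transparent so the points stay readable, and scale_fill_gradient sets the colour ramp. The alpha value and the two colours are arbitrary choices, not something the question specifies.

```r
library(ggplot2)

# Mock data from the question
df <- data.frame(
  Trt  = rep(1:50, each = 6),
  Day  = rep_len(1:6, length.out = 300),
  Size = rep_len(c(3, 5, 8, 9, 12, 12, 3, 7, 10, 16, 17, 20), length.out = 300)
)

# One slope per treatment (base R stand-in for the ddply() call)
slopevalues <- data.frame(
  Trt   = sort(unique(df$Trt)),
  slope = sapply(split(df, df$Trt),
                 function(d) coef(lm(Size ~ Day, data = d))[["Day"]])
)

p <- ggplot(df, aes(x = Day, y = Size)) +
  # Background layer first, so lines and points are drawn on top of it
  geom_rect(data = slopevalues, inherit.aes = FALSE, alpha = 0.4,
            aes(fill = slope, xmin = -Inf, xmax = Inf, ymin = -Inf, ymax = Inf)) +
  geom_line() +
  geom_point() +
  scale_fill_gradient(low = "white", high = "steelblue", name = "Slope") +
  facet_wrap(~ Trt, nrow = 5)
```

Because slopevalues contains the Trt column, each of its 50 rows lands in the matching facet, so the differing row counts of the two data frames are not a problem.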

ggplot2 shading envelope of time series

I am plotting the results of 50 - 100 experiments.
Each experiment results in a time series.
I can plot a spaghetti plot of all time series, but
what I'd like to have is sort of a density map for the time series plume.
(something similar to the gray shading in the lower panel
in this figure: http://www.ipcc.ch/graphics/ar4-wg1/jpg/fig-6-14.jpg)
I can 'sort of' do this with 2d binning or binhex but the result could be prettier (see example below).
Here is code that reproduces a plume plot for mock data (uses ggplot2 and reshape2).
library(ggplot2)
library(reshape2)
# mock data: random walk plus a sine curve.
# two envelopes for added contrast.
tt=10*sin(c(1:100)/(3*pi))
rr=apply(matrix(rnorm(5000),100,50),2,cumsum) +tt
rr2=apply(matrix(rnorm(5000),100,50),2,cumsum)/1.5 +tt
# stuff data into a dataframe and melt it.
df=data.frame(c(1:100),cbind(rr,rr2) )
names(df)=c("step",paste("ser",c(1:100),sep=""))
dfm=melt(df,id.vars = 1)
# ensemble average
ensemble_av=data.frame(step=df[,1],ensav=apply(df[,-1],1,mean))
ensemble_av$variable=as.factor("Mean")
ggplot(dfm,aes(step,value,group=variable))+
stat_binhex(alpha=0.2) + geom_line(alpha=0.2) +
geom_line(data=ensemble_av,aes(step,ensav,size=2))+
theme(legend.position="none")
Does anyone know of a nice way to get a shaded envelope with gradients? I have also tried geom_ribbon, but that did not give any indication of density changes along the plume. binhex does that, but not with aesthetically pleasing results.
Compute quantiles:
qs = data.frame(
do.call(
rbind,
tapply(
dfm$value, dfm$step, function(i){quantile(i)})),
t=1:100)
head(qs)
X0. X25. X50. X75. X100. t
1 -0.8514179 0.4197579 0.7681517 1.396382 2.883903 1
2 -0.6506662 1.2019163 1.6889073 2.480807 5.614209 2
3 -0.3182652 2.0480082 2.6206045 4.205954 6.485394 3
4 -0.1357976 2.8956990 4.2082762 5.138747 8.860838 4
5 0.8988975 3.5289219 5.0621513 6.075937 10.253379 5
6 2.0027973 4.5398120 5.9713921 7.015491 11.494183 6
Plot ribbons:
ggplot() +
geom_ribbon(data=qs, aes(x=t, ymin=X0., ymax=X100.),fill="gray30", alpha=0.2) +
geom_ribbon(data=qs, aes(x=t, ymin=X25., ymax=X75.),fill="gray30", alpha=0.2)
This is for two quantile intervals, (0-100) and (25-75). You'll need more args to quantile and more ribbon layers for more quantiles, and need to adjust the colours too.
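The "more args to quantile" part might look like the sketch below, which passes an explicit probs vector and collects one column per requested quantile. The walks matrix is a made-up stand-in for the series in the question, and the chosen probabilities are arbitrary.

```r
set.seed(1)

# Stand-in for the question's data: 100 steps x 50 random-walk series
walks <- apply(matrix(rnorm(5000), 100, 50), 2, cumsum)

# Any set of probabilities can be requested via probs
probs <- c(0.05, 0.25, 0.75, 0.95)

# quantile() per time step; transpose so rows are steps, columns are quantiles
q <- t(apply(walks, 1, quantile, probs = probs))

# check.names = FALSE keeps the "5%", "25%", ... column names intact
qs <- data.frame(step = 1:100, q, check.names = FALSE)
```

Each ribbon layer then takes a symmetric pair of columns, for example ymin = `5%` and ymax = `95%` for the outer band and ymin = `25%` and ymax = `75%` for the inner one.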
Based on the idea of Spacedman, I found a way to add more intervals in an automatic way: I first compute the quantiles for each step, group them by pairs of symmetric values and then use geom_ribbon in the right order...
library(tidyr)
library(dplyr)
condquant <- dfm %>% group_by(step) %>%
do(quant = quantile(.$value, probs = seq(0,1,.05)), probs = seq(0,1,.05)) %>%
unnest() %>%
mutate(delta = 2*round(abs(.5-probs)*100)) %>%
group_by(step, delta) %>%
summarize(quantmin = min(quant), quantmax= max(quant))
ggplot() +
geom_ribbon(data = condquant, aes(x = step, ymin = quantmin, ymax = quantmax,
group = reorder(delta, -delta), fill = as.numeric(delta)),
alpha = .5) +
scale_fill_gradient(low = "grey10", high = "grey95") +
geom_line(data = dfm, aes(x = step, y = value, group=variable), alpha=0.2) +
geom_line(data=ensemble_av,aes(step,ensav),size=2)+
theme(legend.position="none")
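The delta column is what pairs up symmetric quantiles: it is the width, in percentage points, of the central interval a probability belongs to, so 0.25 and 0.75 both get delta = 50. A small base R check of that mapping (the name pairs is made up for illustration):

```r
probs <- seq(0, 1, 0.05)

# Width of the central interval each probability bounds, in percentage points
delta <- 2 * round(abs(0.5 - probs) * 100)

# Probabilities that share a delta form the lower and upper edge of one ribbon
pairs <- split(probs, delta)

# The widest band is bounded by the minimum and maximum,
# the 50-point band by the quartiles, and the median stands alone
outer  <- pairs[["100"]]
middle <- pairs[["50"]]
centre <- pairs[["0"]]
```

Taking min and max of the quantile values within each delta group, as the summarize step does, therefore yields the two edges of each ribbon.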
Thanks Erwan and Spacedman.
Avoiding 'tidyr', 'dplyr' and 'magrittr' (and using melt from 'reshape2' and ddply from 'plyr' instead), my version of Erwan's answer becomes
probs=c(0:10)/10 # use fewer quantiles than Erwan
arr=t(apply(df[,-1],1,quantile,prob=probs))
dfq=data.frame(step=df[,1],arr)
names(dfq)=c("step",colnames(arr))
dfqm=melt(dfq,id.vars=c(1))
# add inter-quantile (per) range as delta
dfqm$delta=dfqm$variable
levels(dfqm$delta)=abs(probs-rev(probs))*100
dfplot=ddply(dfqm,.(step,delta),summarize,
quantmin=min(value),
quantmax=max(value) )
ggplot() +
geom_ribbon(data = dfplot, aes(x = step, ymin = quantmin,
ymax =quantmax,group=rev(delta),
fill = as.numeric(delta)),
alpha = .5) +
scale_fill_gradient(low = "grey25", high = "grey75") +
geom_line(data=ensemble_av,aes(step,ensav),size=2) +
theme(legend.position="none")
