Convert rarefaction plots from Vegan to ggplot2 in R?

Hi I am running species estimator calculations in the package 'vegan'.
The code I'm running is very simple:
library(vegan)
data(BCI)
p<-poolaccum(BCI, permutations = 50)
p.plot<-plot(p, display = c("chao", "jack1", "jack2"))
The object p.plot is a trellis-type object, so I was not able to convert it to a data frame for ggplot. I want to use ggplot because I want all the estimator curves on the same graph, with labels. I'm also making these plots for other datasets and want to save as much space as possible.
Any help would be great! Thank you

summary(p) can help you get input data for ggplot2. I demonstrate Chao plot here:
library(ggplot2)
library(reshape2)
chao <- data.frame(summary(p)$chao,check.names = FALSE)
colnames(chao) <- c("N", "Chao", "lower2.5", "higher97.5", "std")
chao_melt <- melt(chao, id.vars = c("N","std"))
ggplot(data = chao_melt, aes(x = N, y = value, group = variable)) +
geom_line(aes(color = variable))
Here p is the object returned by p <- poolaccum(BCI, permutations = 50). You can adapt this for multiple plots and adjust the theme.
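To get all estimator curves on one graph, the same idea can be extended by stacking the summary matrices into one long data frame. This is only a sketch: stack_estimators is a hypothetical helper, and it assumes the jack1 and jack2 matrices have the same column layout as the chao matrix (sample size, estimate, lower bound, upper bound in the first four columns).

```r
# Hypothetical helper: stack per-estimator summary matrices into one long
# data frame. ASSUMPTION: every matrix in `summ` has sample size, estimate,
# lower bound and upper bound in its first four columns (as for chao above).
stack_estimators <- function(summ, which = names(summ)) {
  do.call(rbind, lapply(which, function(est) {
    m <- as.data.frame(summ[[est]][, 1:4, drop = FALSE])
    colnames(m) <- c("N", "estimate", "lower", "upper")
    m$estimator <- est
    m
  }))
}

# Only runs when both packages are installed
if (requireNamespace("vegan", quietly = TRUE) &&
    requireNamespace("ggplot2", quietly = TRUE)) {
  library(vegan)
  library(ggplot2)
  data(BCI)
  p <- poolaccum(BCI, permutations = 50)
  est <- stack_estimators(summary(p), c("chao", "jack1", "jack2"))
  # One line per estimator, with a shaded band between the bounds
  est_plot <- ggplot(est, aes(x = N, y = estimate, colour = estimator)) +
    geom_ribbon(aes(ymin = lower, ymax = upper, fill = estimator),
                alpha = 0.2, colour = NA) +
    geom_line()
  # print(est_plot) draws the figure
}
```

Because the estimator name ends up in a column, the colour aesthetic gives you the labelled legend for free, and adding facet_wrap(~ dataset) would work the same way if you stack several datasets.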

Related

How to incorporate data into plot which was constructed in ggplot2 using data from another file (R)?

Using a dataset, I have created the following plot:
I'm trying to create the following plot:
Specifically, I am trying to incorporate Twitter names over the first image. To do this, I have a dataset with each name in and a value that corresponds to a point on the axes. A snippet looks something like:
Name Score
#tedcruz 0.108
#RealBenCarson 0.119
Does anyone know how I can plot this data (from one CSV file) over my original graph (which is constructed from data in a different CSV file)? The reason that I am confused is because in ggplot2, you specify the data you want to use at the start, so I am not sure how to incorporate other data.
Thank you.
Your question about combining several data sources in ggplot2 to plot different elements has already been answered in another post.
I don't know for sure how this will apply to your specific data, so here is an example that might help you move forward.
Imagine we have two data.frames (see below) and we want to obtain a plot similar to the one you presented.
data1 <- data.frame(list(
x=seq(-4, 4, 0.1),
y=dnorm(x = seq(-4, 4, 0.1))))
data2 <- data.frame(list(
"name"=c("name1", "name2"),
"Score" = c(-1, 1)))
The first step is to find the "y" coordinates of the names in the second data.frame (data2). To do this I added a y column to data2. y is defined here as a sequence of points from the max value of y to the min value of y, with some space at each end for aesthetics.
range_y = max(data1$y) - min(data1$y)
space_y = range_y * 0.05
data2$y <- seq(from = max(data1$y)-space_y, to = min(data1$y)+space_y, length.out = nrow(data2))
Then we can use ggplot() to plot data1 and data2 following some plot designs. For the current example I did this:
library(ggplot2)
p <- ggplot(data=data1, aes(x=x, y=y)) +
geom_point() + # for the data1 just plot the points
geom_pointrange(data=data2, aes(x=Score, y=y, xmin=Score-0.5, xmax=Score+0.5)) +
geom_text(data = data2, aes(x = Score, y = y+(range_y*0.05), label=name))
p
which gave the following plot:

Plot Integral trace in R

I would like to automate an analysis I have been doing with Graphpad Prism with R, but apparently it is harder than I thought.
I have Voltage~Time data that I would like to integrate and plot. In Graphpad Prism, this is performed by Analysis -> Integrate -> Create the Integral.
Below I plot the data in Prism, together with the trace that I got from the Plot Integral command.
How can I do that with R?
The data I used are similar to these:
Time <- seq(1,100,1)
Voltage <- sample(1:1000,100, replace = F)
I tried integrate(), but that requires a function to integrate, which I do not have, and gives me just a number.
I tried approxfun() and I could create a function of my data but again, as soon as I apply 'integrate()' I only got a single value.
Do you have any ideas on what the Graphpad Prism function does and how I can translate that to R?
Thank you for the help!
With discrete values you can use cumsum:
set.seed(1)
Time <- seq(1,100,1)
Voltage <- sample(1:1000,100, replace = F)
df = data.frame(Time, Voltage)
library(ggplot2)
p1 <- ggplot(data = df)+
geom_line(aes(x = Time, y = Voltage))
p2 <- ggplot(data = df)+
geom_line(aes(x = Time, y = cumsum(Voltage)))
library(gridExtra)
grid.arrange(p1, p2)
For unevenly spaced time values, weight each voltage by the length of its time step before accumulating:
cumsum(df$Voltage[1:(nrow(df)-1)] * diff(df$Time))
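If you want something a bit closer to a numerical integral than a plain running sum, a cumulative trapezoidal rule is a common choice. The sketch below is an assumption on my part about what you want, not a statement of what Prism does internally; cumtrapz is a hypothetical helper name.

```r
# Cumulative trapezoidal integral of y over x (x need not be evenly spaced).
# cumtrapz is a hypothetical helper, not part of base R.
cumtrapz <- function(x, y) {
  stopifnot(length(x) == length(y), length(x) >= 2)
  # Area of each trapezoid between consecutive samples
  areas <- diff(x) * (head(y, -1) + tail(y, -1)) / 2
  # Running total, starting at 0 for the first time point
  c(0, cumsum(areas))
}

set.seed(1)
Time <- sort(runif(100, min = 1, max = 100))   # unevenly spaced times
Voltage <- sample(1:1000, 100, replace = FALSE)
df <- data.frame(Time, Voltage, Integral = cumtrapz(Time, Voltage))

# Plot the integral trace if ggplot2 is available
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p_int <- ggplot(df, aes(x = Time, y = Integral)) + geom_line()
  # print(p_int) draws the figure
}
```

The result has the same length as the input, so it can sit in the same data frame as Time and Voltage and be plotted against Time directly.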

Correlation matrix plot with ggplot2

I want to create a correlation matrix plot, i.e. a plot where each variable is plotted in a scatterplot against each other variable like with pairs() or splom(). I want to do this with ggplot2. See here for examples. The link mentions some code someone wrote for doing this in ggplot2, however, it is outdated and no longer works (even after you swap out the deprecated parts).
One could do this with a loop in a loop and then multiplot(), but there must be a better way. I tried melting the dataset to long, and copying the value and variable variables and then using facets. This almost gives you something correct.
d = data.frame(x1=rnorm(100),
x2=rnorm(100),
x3=rnorm(100),
x4=rnorm(100),
x5=rnorm(100))
library(reshape2)
d = melt(d)
d$value2 = d$value
d$variable2 = d$variable
library(ggplot2)
ggplot(data=d, aes(x=value, y=value2)) +
geom_point() +
facet_grid(variable ~ variable2)
This gets the general structure right, but only works for plotting each variable against itself. Is there some more clever way of doing this without resorting to 2 loops?
library(GGally)
set.seed(42)
d = data.frame(x1=rnorm(100),
x2=rnorm(100),
x3=rnorm(100),
x4=rnorm(100),
x5=rnorm(100))
# estimated density in diagonal
ggpairs(d)
# blank diagonal
ggpairs(d, diag = list("continuous"="blank"))
Using the PerformanceAnalytics library:
library("PerformanceAnalytics")
chart.Correlation(d, histogram = TRUE, pch = 19)
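If you would rather stay with plain ggplot2 and facet_grid(), the facetting attempt from the question can be made to work by building one block of rows for every pair of variables instead of copying the value column. A minimal sketch, using only base R for the reshaping (the names pairs_idx and long are made up for this example):

```r
set.seed(42)
d <- data.frame(x1 = rnorm(100), x2 = rnorm(100), x3 = rnorm(100))

# All ordered pairs of column names (including each variable with itself)
pairs_idx <- expand.grid(xvar = names(d), yvar = names(d),
                         stringsAsFactors = FALSE)

# One block of rows per (xvar, yvar) pair
long <- do.call(rbind, lapply(seq_len(nrow(pairs_idx)), function(i) {
  data.frame(xvar = pairs_idx$xvar[i],
             yvar = pairs_idx$yvar[i],
             x = d[[pairs_idx$xvar[i]]],
             y = d[[pairs_idx$yvar[i]]])
}))

# Facet rows by the y variable and columns by the x variable
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p_pairs <- ggplot(long, aes(x = x, y = y)) +
    geom_point() +
    facet_grid(yvar ~ xvar)
  # print(p_pairs) draws the figure
}
```

This duplicates the data once per pair, so it is fine for a handful of variables but gets heavy for wide data sets, where ggpairs() is probably the better tool.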

Combining two ecdf plots with different

At the moment I'm writing my bachelor thesis, and all of my plots are created with ggplot2. Now I need a plot of two ecdfs, but my problem is that the two data frames have different lengths. Padding the shorter one with extra values would change its distribution, so my first idea is not an option. And ggplot2 does not seem to allow an ecdf plot built from two data frames of different lengths.
daten <- peptidPSMotherExplained[peptidPSMotherExplained$V3!=-1,]
daten <- cbind ( daten , "scoreDistance"= daten$V2-daten$V3 )
daten2 <- peptidPSMotherExplained2[peptidPSMotherExplained2$V3!=-1,]
daten2 <- cbind ( daten2 , "scoreDistance"= daten2$V2-daten2$V3 )
p <- ggplot(daten, aes(x = scoreDistance)) + stat_ecdf()
p <- p + geom_point(aes(x = daten2$lengthDistance))
p
with the normal plot function of R it is possible
plot(ecdf(daten$scoreDistance))
plot(ecdf(daten2$scoreDistance),add=TRUE)
but it looks different to all of my other plots and I dislike this.
Has anybody a solution for me?
Thank you,
Tobias
Example:
df <-data.frame(scoreDifference = rnorm(10,0,12))
df2 <- data.frame(scoreDifference = rnorm(5,-3,9))
plot(ecdf(df$scoreDifference))
plot(ecdf(df2$scoreDifference),add=TRUE)
So how can I achieve this kind of plot in ggplot?
I don't know what geom one should use for such plots, but for combining two datasets you can simply specify the data in a new layer,
ggplot(df, aes(x = scoreDifference)) +
stat_ecdf(geom = "point") +
stat_ecdf(data=df2, geom = "point")
I think, reshaping your data in the right way will probably make ggplot2 work for you:
df <-data.frame(scoreDiff1 = rnorm(10,0,12))
df2 <- data.frame(scoreDiff2 = rnorm(5,-3,9))
library('reshape2')
data <- merge(melt(df),melt(df2),all=TRUE)
Then, with data in the right shape, you can simply go on to plot the stuff with colour (or shape, or whatever you wish) to distinguish the two datasets:
p <- ggplot(data, aes(x = value, colour = variable)) + stat_ecdf()
Hope this is what you were looking for!?
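A variant of the same idea that avoids reshape2: since both data frames share the column name from the question's example, you can stack them with rbind() and add a column saying where each row came from. The column name dataset and the labels "first" and "second" are made up for this sketch.

```r
set.seed(1)
df  <- data.frame(scoreDifference = rnorm(10, 0, 12))
df2 <- data.frame(scoreDifference = rnorm(5, -3, 9))

# Stack the two data frames; different lengths are not a problem because
# each row simply carries a label for its source
combined <- rbind(
  cbind(df,  dataset = "first"),
  cbind(df2, dataset = "second")
)

# One ECDF per source, distinguished by colour
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p_ecdf <- ggplot(combined, aes(x = scoreDifference, colour = dataset)) +
    stat_ecdf(geom = "step")
  # print(p_ecdf) draws the figure
}
```

stat_ecdf() computes the ECDF separately within each colour group, so nothing has to be padded or trimmed.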

Density plots with multiple groups

I am trying to produce something similar to densityplot() from the lattice package, using ggplot2 after using multiple imputation with the mice package. Here is a reproducible example:
require(mice)
dt <- nhanes
impute <- mice(dt, seed = 23109)
x11()
densityplot(impute)
Which produces:
I would like to have some more control over the output (and I am also using this as a learning exercise for ggplot). So, for the bmi variable, I tried this:
bar <- NULL
for (i in 1:impute$m) {
foo <- complete(impute,i)
foo$imp <- rep(i,nrow(foo))
foo$col <- rep("#000000",nrow(foo))
bar <- rbind(bar,foo)
}
imp <-rep(0,nrow(impute$data))
col <- rep("#D55E00", nrow(impute$data))
bar <- rbind(bar,cbind(impute$data,imp,col))
bar$imp <- as.factor(bar$imp)
x11()
ggplot(bar, aes(x=bmi, group=imp, colour=col)) + geom_density()
+ scale_fill_manual(labels=c("Observed", "Imputed"))
which produces this:
So there are several problems with it:
1. The colours are wrong. It seems my attempt to control the colours is completely wrong/ignored
2. There are unwanted horizontal and vertical lines
3. I would like the legend to show Imputed and Observed but my code gives the error invalid argument to unary operator
Moreover, it seems like quite a lot of work to do what is accomplished in one line with densityplot(impute) - so I wondered if I might be going about this in the wrong way entirely ?
Edit: I should add the fourth problem, as noted by @ROLO:
4. The range of the plots seems to be incorrect.
The reason it is more complicated using ggplot2 is that you are using densityplot from the mice package (mice::densityplot.mids to be precise - check out its code), not from lattice itself. This function has all the functionality for plotting mids result classes from mice built in. If you would try the same using lattice::densityplot, you would find it to be at least as much work as using ggplot2.
But without further ado, here is how to do it with ggplot2:
require(reshape2)
# Obtain the imputed data, together with the original data
imp <- complete(impute,"long", include=TRUE)
# Melt into long format
imp <- melt(imp, c(".imp",".id","age"))
# Add a variable for the plot legend
imp$Imputed<-ifelse(imp$".imp"==0,"Observed","Imputed")
# Plot. Be sure to use stat_density instead of geom_density in order
# to prevent what you call "unwanted horizontal and vertical lines"
ggplot(imp, aes(x=value, group=.imp, colour=Imputed)) +
stat_density(geom = "path",position = "identity") +
facet_wrap(~variable, ncol=2, scales="free")
But as you can see the ranges of these plots are smaller than those from densityplot. This behaviour should be controlled by parameter trim of stat_density, but this seems not to work. After fixing the code of stat_density I got the following plot:
Still not exactly the same as the densityplot original, but much closer.
Edit: for a true fix we'll need to wait for the next major version of ggplot2, see github.
You can ask Hadley to add a fortify method for this mids class. E.g.
fortify.mids <- function(x){
imps <- do.call(rbind, lapply(seq_len(x$m), function(i){
data.frame(complete(x, i), Imputation = i, Imputed = "Imputed")
}))
orig <- cbind(x$data, Imputation = NA, Imputed = "Observed")
rbind(imps, orig)
}
ggplot 'fortifies' non-data.frame objects prior to plotting
ggplot(fortify.mids(impute), aes(x = bmi, colour = Imputed,
group = Imputation)) +
geom_density() +
scale_colour_manual(values = c(Imputed = "#000000", Observed = "#D55E00"))
Note that each line ends with a '+'. Otherwise R treats the command as complete at the end of the line. This is why the legend did not change in your code, and why the line starting with a '+' resulted in the error.
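The same parsing rule can be seen with plain numbers, without any ggplot2 involved. A minimal illustration (the variable names are made up):

```r
# A line that ends with an operator tells R the expression continues
total <- 1 +
  2

# Here the first line is already a complete expression, so the second line
# is evaluated on its own as a unary plus applied to 2
other <- 1
+ 2
```

With numbers the stray unary plus is harmless, so other silently stays at 1. With a ggplot2 scale object on that second line, unary plus is not defined, which is where the invalid argument to unary operator message comes from.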
You can melt the result of fortify.mids to plot all variables in one graph
library(reshape)
Molten <- melt(fortify.mids(impute), id.vars = c("Imputation", "Imputed"))
ggplot(Molten, aes(x = value, colour = Imputed, group = Imputation)) +
geom_density() +
scale_colour_manual(values = c(Imputed = "#000000", Observed = "#D55E00")) +
facet_wrap(~variable, scales = "free")

Resources