ggplot2 - plot multiple models on the same plot - r

I have a list of linear and non-linear models derived from different data sets measuring the same two variables x and y that I would like to plot on the same plot using stat_smooth. This is to be able to easily compare the shape of the relationship between x and y across datasets.
I'm trying to figure out the most effective way to do this. Right now I am considering creating an empty ggplot object and then using some kind of loop or lapply to add sequentially to that object, but this is proving more difficult than I thought. Of course it would be easiest to simply supply the models to ggplot but as far as I know, this is not possible. Any thoughts?
Here is a simple example data set to play with using just two models, one linear and one exponential:
df1=data.frame(x=rnorm(10),y=rnorm(10))
df2=data.frame(x=rnorm(15),y=rnorm(15))
df.list=list(lm(y~x,df1),nls(y~exp(a+b*x),start=list(a=1,b=1),df2))
And two separate example plots:
ggplot(df1,aes(x,y))+stat_smooth(method=lm,se=F)
ggplot(df2,aes(x,y))+stat_smooth(method=nls,formula=y~exp(a+b*x),start=list(a=1,b=1),se=F)

EDIT: Note that the OP changed the question after this answer was posted
Combine the data into a single data frame, with a new column indicating the model, then use ggplot to distinguish between the models:
df1=data.frame(x=rnorm(10),y=rnorm(10))
df2=data.frame(x=rnorm(10),y=rnorm(10))
df1$model <- "A"
df2$model <- "B"
dfc <- rbind(df1, df2)
library(ggplot2)
ggplot(dfc, aes(x, y, group=model)) + geom_point() + stat_smooth(aes(col=model))

I think the answer here is to choose a common range of x values to evaluate every model over, and go from there. You can pull out a curve from each model using predict, and add layers to a ggplot using l_ply from the plyr package.
library(ggplot2)
library(plyr)
df1=data.frame(x=rnorm(10),y=rnorm(10))
df2=data.frame(x=rnorm(15),y=rnorm(15))
df.list=list(lm(y~x,df1),nls(y~exp(a+b*x),start=list(a=1,b=1),df2))
a<-ggplot()
#get the range of x you want to look at
x<-seq(min(c(df1$x, df2$x)), max(c(df1$x, df2$x)), .01)
#use l_ply to keep adding layers
l_ply(df.list, function(amod){
#a data frame for predictors and response
ndf <- data.frame(x=x)
#get the response using predict - you can even get a CI here
ndf$y <- predict(amod, ndf)
#now add this new layer to the plot
a <<- a + geom_line(data=ndf, mapping=aes(x=x, y=y))
} )
a
OR, if you want to have a nice color key with model number or something:
names(df.list) <- 1:length(df.list)
modFits <- ldply(df.list, function(amod){
ndf <- data.frame(x=x)
#get the response using predict - you can even get a CI here
ndf$y <- predict(amod, ndf)
ndf
})
qplot(x, y, geom="line", colour=.id, data=modFits)
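The same predict-based idea can be sketched without plyr, using base R's lapply and do.call(rbind, ...). This is only a sketch under assumptions: the simulated data (a roughly linear set and a roughly exponential set), the names models, grid and fits, and the nls start values are all made up for illustration and are not from the question.

```r
library(ggplot2)

set.seed(42)
# Hypothetical data: one roughly linear set, one roughly exponential set
df1 <- data.frame(x = rnorm(10))
df1$y <- 1 + 2 * df1$x + rnorm(10, sd = 0.2)
df2 <- data.frame(x = rnorm(15))
df2$y <- exp(0.5 + 0.3 * df2$x) + rnorm(15, sd = 0.05)

# A named list of fitted models; the names become the colour key
models <- list(
  linear      = lm(y ~ x, data = df1),
  exponential = nls(y ~ exp(a + b * x), data = df2, start = list(a = 0, b = 0))
)

# Common grid of x values covering both data sets
grid <- data.frame(x = seq(min(df1$x, df2$x), max(df1$x, df2$x), length.out = 200))

# One data frame of predictions per model, stacked into long format
fits <- do.call(rbind, lapply(names(models), function(nm) {
  data.frame(model = nm,
             x = grid$x,
             y = as.numeric(predict(models[[nm]], newdata = grid)))
}))

# A single layer draws every model, coloured by model name
p <- ggplot(fits, aes(x, y, colour = model)) + geom_line()
print(p)
```

Note that each curve is drawn over the full common grid, so a model may be extrapolated beyond the x range of the data it was fitted to.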

Related

Why scatter plots in ggpairs function don't have the loess layer on them?

I have a quick question, and can't figure out what the problem is. I wanted to plot a dataset I have, and found one solution here:
How to use loess method in GGally::ggpairs using wrap function
However, I can't seem to figure out what was wrong with my approach. Here is the code chunk below with simple mtcars dataset:
library(ggplot2)
library(GGally)
View(mtcars)
GGally::ggpairs(mtcars,
lower= list(
ggplot(mapping = aes(rownames(mtcars))) +
geom_point()+
geom_smooth(method = "loess"))
)
Here, as you can see, is my output that doesn't put the smooth layer on the scatter plot. I wanted to have it for the regression analysis for my actual dataset. Any direction or explanation would be good. Thank you!
The solution in the post from @Edward's comment works here with mtcars. The snippet below replicates your plot above, with a loess line added:
library(ggplot2)
library(GGally)
View(mtcars)
# make a function to plot generic data with points and a loess line
my_fn <- function(data, mapping, method="loess", ...){
p <- ggplot(data = data, mapping = mapping) +
geom_point() +
geom_smooth(method=method, ...)
p
}
# call ggpairs, using mtcars as data, and plotting continuous variables using my_fn
ggpairs(mtcars, lower = list(continuous = my_fn))
In your snippet, the second argument lower has a ggplot object passed to it, but what it requires is a list with specifically named elements, that specify what to do with specific variable types. The elements in the list can be functions or character vectors (but not ggplot objects). From the ggpairs documentation:
upper and lower are lists that may contain the variables 'continuous', 'combo', 'discrete', and 'na'. Each element of the list may be a function or a string. If a string is supplied, it must implement one of the following options:
continuous: exactly one of ('points', 'smooth', 'smooth_loess', 'density', 'cor', 'blank'). This option is used for continuous X and Y data.
combo: exactly one of ('box', 'box_no_facet', 'dot', 'dot_no_facet', 'facethist', 'facetdensity', 'denstrip', 'blank'). This option is used for either continuous X and categorical Y data or categorical X and continuous Y data.
discrete: exactly one of ('facetbar', 'ratio', 'blank'). This option is used for categorical X and Y data.
na: exactly one of ('na', 'blank'). This option is used when all X data is NA, all Y data is NA, or either all X or Y data is NA.
My snippet works because I've passed a list to lower, with an element named 'continuous' that is my_fn (a function that generates a ggplot).
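Two other ways of filling the 'continuous' element can be sketched as follows: the built-in string 'smooth_loess' listed in the documentation quoted above, and GGally's wrap() to preset arguments of a custom function. The column subset and the names p_string and p_wrapped are arbitrary choices for illustration, and the my_fn here is a compact restatement of the function defined above.

```r
library(ggplot2)
library(GGally)

# A few continuous columns keep the matrix small (arbitrary choice)
cars <- mtcars[, c("mpg", "disp", "hp", "wt")]

# Option 1: a string naming one of the documented built-in panel types
p_string <- ggpairs(cars, lower = list(continuous = "smooth_loess"))

# Option 2: a custom panel function, with arguments preset via wrap()
my_fn <- function(data, mapping, method = "loess", ...) {
  ggplot(data = data, mapping = mapping) +
    geom_point() +
    geom_smooth(method = method, ...)
}
# method and se are forwarded to my_fn, and se on to geom_smooth()
p_wrapped <- ggpairs(cars, lower = list(continuous = wrap(my_fn, method = "lm", se = FALSE)))

print(p_string)
print(p_wrapped)
```

In both cases lower receives a named list, never a ggplot object.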

Indexing separate survival curves

I would like to plot Kaplan-Meier survival estimates for each of two groups in ggplot.
Doing so requires getting a separate survival curve for each group. The survfit function in the survival package splits the data nicely, but I don't know how to index the separate curves to work on them.
Here is sample data:
rearrest<-read.table("http://stats.idre.ucla.edu/stat/examples/alda/rearrest.csv", sep=",", header=T)
This is the curve ungrouped
(sCurve <- summary(arr1 <- survfit(Surv(months, abs(censor-1))~1, data = rearrest)))
It is easy to index elements within this, for example
sCurve$n.event
When I fit the same thing except this time grouped according to the value of the personal variable I get two nice survival curve objects ready to go.
(sCurveA <- summary(arr1 <- survfit(Surv(months, abs(censor-1))~personal, data = rearrest)))
One object is labelled personal=0 and the other personal=1. I have tried indexing with $, [], [[]] both with number-type indexes and named-, all to no avail.
Can anyone help?
sCurveA$strata provides the grouping variable as a vector. You can pull out the key pieces and throw them into a data.frame for ggplot.
df = data.frame(Time = sCurveA$time,
Survival = sCurveA$surv,
Strata = sCurveA$strata)
ggplot(df, aes(Time, Survival, col = Strata)) +
geom_line()
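A self-contained sketch of the same approach is shown below. Because the rearrest file may not be reachable, it assumes the aml data set that ships with the survival package as a stand-in, and it swaps geom_line() for geom_step(), since Kaplan-Meier estimates are step functions. The names fit, s and km are illustrative.

```r
library(survival)
library(ggplot2)

# Stand-in data: aml has columns time, status and a two-level group x
fit <- survfit(Surv(time, status) ~ x, data = aml)
s <- summary(fit)

# summary() returns parallel vectors; strata labels each row with its group
km <- data.frame(Time = s$time,
                 Survival = s$surv,
                 Strata = s$strata)

# To work on one curve only, subset rows by the strata label
one_group <- km[km$Strata == levels(km$Strata)[1], ]

p <- ggplot(km, aes(Time, Survival, colour = Strata)) + geom_step()
print(p)
```

The summary only lists event times, so the plotted curves begin at the first event rather than at time zero.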

Plot multiple traces in R

I started learning R for data analysis and, most importantly, for data visualisation.
Since I am still in the switching process, I am trying to reproduce the activities I was doing with Graphpad Prism or Origin Pro in R. In most of the cases everything was smooth, but I could not find a smart solution for plotting multiple y columns in a single graph.
What I usually get from the software I use for data visualisation looks like this:
Each single black trace is a measurement, and I would like to obtain the same plot in R. In Prism or Origin, this will take a single copy-paste in a XY graph.
I exported the matrix of data (one X, which indicates the time, and multiple Y values, which are the traces you see in the image).
I imported my data in R with the following commands:
library(ggplot2) #loaded ggplot2
Data <- read.csv("Directory/File.txt", header=F, sep="") #imported data
DF <- data.frame(Data) #transformed data into data frame
If I plot my data now, I obtain a series of columns, where the first one (called V1) is the X axis and all the others (V2 to V140) are the traces I want to put on the same graph.
To plot the data, I tried different solutions:
ggplot(data=DF, aes(x=DF$V1, y=DF[V2:V140]))+geom_line()+theme_bw() #did not work
plot(DF, xy.coords(x=DF$V1, y=DF$V2:V140)) #gives me an error
plot(DF, xy.coords(x=V1, y=c(V2:V10))) #gives me an error
I tried the matplot, without success, following the EZH guide:
The code I used is the following: matplot(x=DF$V1, type="l", lty = 2:100)
The only solution I found would be to write a separate plotting command for each column, but that is a crazy solution. The number of columns varies among my data sets, and manually entering commands for 140 columns is insane.
What would you suggest?
Thank you in advance.
Some data are also attached here: Data: single X, multiple Y
I tried using matplot(). I used simple sample data with no trend at all, so the output from my code will look terrible, but my main focus is on the code. Since you have already tried matplot(), check whether you called it the same way as in the solution below.
set.seed(100)
df = matrix(sample(1:685765,50000,replace = T),ncol = 100)
colnames(df)=c("x",paste0("y", 1:99))
dt=as.data.frame(df)
matplot(dt[["x"]], y = dt[,c(paste0("y",1:99))], type = "l")
If you want to plot in base R, you have to make a plot and add lines one at a time; however, that isn't hard to do.
We start by making some sample data. Since the data in the link seemed to all be on the same scale, I will assume your data frame only has y values and the x value is stored separately.
plotData <- as.data.frame(matrix(sort(rnorm(500)),ncol = 5))
xval <- sort(sample(200, 100))
Now we can initialize a plot with the first column.
plot(xval, plotData[[1]], type = "l",
ylim = c(min(plotData), max(plotData)))
type = "l" makes a line plot instead of a scatter plot
ylim = c(min(plotData), max(plotData)) makes sure the y-axis will fit all the data.
Now we can add the rest of the values.
apply(plotData[-1], 2, lines, x = xval)
plotData[-1] removes the column we already plotted,
apply function with 2 as the second parameter means we want to execute a function on every column,
lines defines the function we are applying to the columns. lines adds a new line to the current plot.
x = xval passes an extra parameter (x) to the lines function.
If you want to plot the data using ggplot2, the data should be transformed to long format:
library(ggplot2)
library(reshape2)
dat <- read.delim('AP.txt', header = F)
# plotting only first 9 traces
# my RStudio will crash if I plot the full data
df <- melt(dat[1:10], id.vars = 'V1')
ggplot(df, aes(x = V1, y = value, color = variable)) + geom_line()
# if you want all traces to be in same colour, you can use
ggplot(df, aes(x = V1, y = value, group = variable)) + geom_line()
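For a data frame whose number of trace columns varies, the reshaping step can also be written so that it never names the columns individually. The sketch below assumes tidyr's pivot_longer() in place of reshape2::melt(), and simulated data standing in for the real file; the column names V1 to V6 mimic what read.csv(header = F) produces, while trace, value and long are illustrative names.

```r
library(ggplot2)
library(tidyr)

set.seed(1)
# Simulated stand-in: V1 is time, V2..V6 are noisy traces
DF <- data.frame(V1 = seq(0, 1, length.out = 50))
for (j in 2:6) {
  DF[[paste0("V", j)]] <- sin(2 * pi * DF$V1) + rnorm(50, sd = 0.1)
}

# Everything except V1 becomes a (trace, value) pair, however many columns exist
long <- pivot_longer(DF, cols = -V1, names_to = "trace", values_to = "value")

# group = trace draws one line per original column, all in black
p <- ggplot(long, aes(V1, value, group = trace)) +
  geom_line(colour = "black") +
  theme_bw()
print(p)
```

Because cols = -V1 selects by exclusion, the same code works unchanged for 5 or 140 traces.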

Combining output from smatr with ggplot2

I have a dataset of leaf trait measurements made at multiple sites at two contrasting seasons. I am interested to explore the association/line fit between a pair of traits and to differentiate the seasons at each site.
Rather than a linear regression, I would prefer to use the Standardised Major Axis approach within the smatr package:
e.g. sma.site1 <- sma(TraitA ~ TraitB * Visit, data=subset(myfile, Site=="Site1")) # testing the null hypothesis of common slopes for the two Visits (Seasons) at a given Site.
I can produce a handy lattice plot in ggplot2 with a separate panel for each Site and the points differentiated by Visit:
e.g. qplot(TraitB, TraitA, data=myfile, colour=Visit) + facet_wrap(~Site, ncol=2)
However, if I add trend lines fitted with the additional argument in ggplot2:
+ geom_smooth(aes(group=Visit), method="lm", se=F)
..., those lines are not a good match for the sma coefficients.
What I would like to do is fit the lines suggested by the sma test onto the ggplot lattice. Is there an easy, or efficient, way to do that?
I know that I can subset the data, produce a plot for each site, add the relevant lines with + geom_abline() and then stitch the separate plots up together with grid.arrange(). But that feels very long-winded.
I would be grateful for any pointers.
I don't know anything about the smatr package but you should be able to tweak this to get the right values. Since you provided no data I used the leaf data from the example in the pkg. The basic idea is to pull out the slope & intercept from the returned sma object and then facet the geom_abline. I may be misinterpreting the object, though.
library(smatr)
library(ggplot2)
data(leaflife)
do.call(rbind, lapply(unique(leaflife$site), function(x) {
obj <- sma(longev~lma*rain, data=subset(leaflife, site=x))
data.frame(site=x,
intercept=obj$coef[[1]][1, 1],
slope=obj$coef[[1]][2, 1])
})) -> fits
gg <- ggplot(leaflife)
gg <- gg + geom_point(aes(x=lma, y=longev, color=soilp))
gg <- gg + geom_abline(data=fits, aes(slope=slope, intercept=intercept))
gg <- gg + facet_wrap(~site, ncol=2)
gg
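The general pattern (one row of slope and intercept per panel, then a faceted geom_abline) can be illustrated without relying on the internals of the sma object. The sketch below is an assumption-laden stand-in: it computes the standardised major axis line by hand from the textbook formula, slope = sign(r) * sd(y) / sd(x) with the line passing through the means, and uses the built-in iris data with Species as the facet. The helper sma_line is hypothetical and is not part of smatr.

```r
library(ggplot2)

# Hypothetical helper: SMA slope and intercept for one group of data
sma_line <- function(d, xvar, yvar) {
  x <- d[[xvar]]
  y <- d[[yvar]]
  slope <- sign(cor(x, y)) * sd(y) / sd(x)
  data.frame(slope = slope, intercept = mean(y) - slope * mean(x))
}

# One row of coefficients per facet; the Species column matches the facet variable
groups <- split(iris, iris$Species)
fits <- do.call(rbind, lapply(names(groups), function(g) {
  data.frame(Species = g, sma_line(groups[[g]], "Sepal.Length", "Sepal.Width"))
}))

# geom_abline gets its own data, so each panel draws only its own line
p <- ggplot(iris, aes(Sepal.Length, Sepal.Width)) +
  geom_point() +
  geom_abline(data = fits, aes(slope = slope, intercept = intercept)) +
  facet_wrap(~Species, ncol = 2)
print(p)
```

The key point is that the coefficient data frame carries the faceting column, which is what lets ggplot2 place each line in the right panel.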
I just saw this question and am not sure if you are still interested in this. I ran the code by hrbrmstr and found that the only thing you need to change is:
obj <- sma(longev~lma*rain, data=subset(leaflife, site == x))
then you can get the plot with four lines for each group.
and also

Animation, adding geom

I want to create some kind of animation with ggplot2 but it doesn't work as I want to. Here is a minimal example.
print(p <- qplot(c(1, 2),c(1, 1))+geom_point())
print(p <- p + geom_point(aes(c(1, 2),c(2, 2))))
print(p <- p + geom_point(aes(c(1, 2),c(3, 3))))
Adding extra points by hand is no problem. But now I want to do it in some loop to get an animation.
for(i in 4:10){
Sys.sleep(.3)
print(p <- p + geom_point(aes(c(1, 2),c(i, i))))
}
But now only the new points added are shown, and points of the previous iterations are deleted. I want the old ones still to be visible. How can I do this?
Either of these will do what you want, I think.
# create df dynamically
for (i in 1:10) {
df <- data.frame(x=rep(1:2,i),y=rep(1:i,each=2))
Sys.sleep(0.3)
print(ggplot(df, aes(x,y))+geom_point() + ylim(0,10))
}
# create df at the beginning, then subset in the loop
df <- data.frame(x=rep(1:2,10), y=rep(1:10,each=2))
for (i in 1:10) {
Sys.sleep(0.3)
print(ggplot(df[1:(2*i),], aes(x,y))+geom_point() +ylim(0,10))
}
Also, your code will cause the y-axis limits to change for each plot. Using ylim(...) keeps all the plots on the same scale.
EDIT Response to OP's comment.
One way to create animations is using the animation package. Here's an example.
library(ggplot2)
library(animation)
ani.record(reset = TRUE) # clear history before recording
df <- data.frame(x=rep(1:2,10), y=rep(1:10,each=2))
for (i in 1:10) {
plot(ggplot(df[1:(2*i),], aes(x,y))+geom_point() +ylim(0,10))
ani.record() # record the current frame
}
## now we can replay it, with an appropriate pause between frames
oopts = ani.options(interval = 0.5)
ani.replay()
This will "record" each frame (using ani.record(...)) and then play it back at the end using ani.replay(...). Read the documentation for more details.
Regarding the question about why your code fails, the simple answer is: "this is not the way ggplot is designed to be used." The more complicated answer is this: ggplot is based on a framework which expects you to identify a default dataset as a data frame, and then associate (map) various aspects of the graph (aesthetics) with columns in the data frame. So if you have a data frame df with columns A and B, and you want to plot B vs. A, you would write:
ggplot(data=df, aes(x=A, y=B)) + geom_point()
This code identifies df as the dataset, and maps the aesthetic x (the horizontal axis) with column A and y with column B. Taking advantage of the default order of the arguments, you could also write:
ggplot(df, aes(A,B)) + geom_point()
It is possible to specify things other than column names in aes(...), but this can and often does lead to unexpected (even bizarre) results. Don't do it!
The reason, basically, is that ggplot does not evaluate the arguments to aes(...) immediately, but rather stores them as expressions in a ggplot object, and evaluates them when you plot or print that object. This is why, for example, you can add layers to a plot and ggplot is able to dynamically rescale the x- and y-limits, something that does not work with plot(...) in base R.
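The delayed evaluation described above can be made visible with a small experiment. This is a sketch, assuming ggplot2's layer_data() helper to inspect what each layer actually draws; the names p_lazy and p_eager are illustrative.

```r
library(ggplot2)

# Lazy version: aes() stores the expression c(i, i), not its value
p_lazy <- ggplot()
for (i in 1:3) {
  p_lazy <- p_lazy + geom_point(aes(x = c(1, 2), y = c(i, i)))
}
# Every layer is evaluated only now, when i is already 3
lazy_y <- sapply(1:3, function(k) unique(layer_data(p_lazy, k)$y))

# Eager version: the data frame is built immediately, so each layer keeps its own i
p_eager <- ggplot()
for (i in 1:3) {
  p_eager <- p_eager +
    geom_point(data = data.frame(x = c(1, 2), y = i), aes(x, y))
}
eager_y <- sapply(1:3, function(k) unique(layer_data(p_eager, k)$y))

print(lazy_y)   # all three layers share the final value of i
print(eager_y)  # each layer has the value of i from its own iteration
```

In the lazy version all layers overlap at the latest y value, which is why only the newest points appear in the question's loop; passing a data frame per layer avoids this.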
