I would like to add the median spline and corresponding confidence interval bands to a ggplot2 scatter plot. I am using the 'quantreg'-package, more specifically the rqss function (Additive Quantile Regression Smoothing).
In ggplot2 I am able to add the median spline, but not the confidence interval bands:
fig = ggplot(dd, aes(y = MeanEst, x = N, colour = factor(polarization)))
fig + stat_quantile(quantiles=0.5, formula = y ~ qss(x), method = "rqss") +
geom_point()
The quantreg-package comes with its own plot function; plot.rqss. Where I am able to add the confidence bands (bands=TRUE):
plot(1, type="n", xlab="", ylab="", xlim=c(2, 12), ylim=c(-3, 0)) # empty plot
plotfigs = function(df) {
rqss_model = rqss(df$MeanEst ~ qss(df$N))
plot(rqss_model, bands=TRUE, add=TRUE, rug=FALSE, jit=FALSE)
return(NULL)
}
figures = lapply(split(dd, as.factor(dd$polarization)), plotfigs)
However plot function that comes with the quantreg-package is not very flexible/well suited for my needs. Is it possible to get the confidence bands in a ggplot2 plot? Perhaps by mimicking the method used in the quantreg-package, or simply copying them from the plot?
Data: pastebin.
You almost have it. When you call
plot(rqss_model, bands=TRUE, add=TRUE, rug=FALSE, jit=FALSE)
The function very helpfully returns the plotted data. All we do is grab the data frame. First a minor tweak to your function, return the data in a sensible way
plotfigs = function(df) {
rqss_model = rqss(df$MeanEst ~ qss(df$N))
band = plot(rqss_model, bands=TRUE, add=TRUE, rug=FALSE, jit=FALSE)
data.frame(x=band[[1]]$x, low=band[[1]]$blo, high=band[[1]]$bhi,
pol=unique(df$polarization))
}
Next call the function and condense
figures = lapply(split(dd, as.factor(dd$polarization)), plotfigs)
bands = Reduce("rbind", figures)
Then use geom_ribbon to plot
## We inherit y and color, so have to set them to NULL
fig + geom_ribbon(data=bands,
aes(x=x, ymin=low, ymax=high,
y=NULL, color=NULL, group=factor(pol)),
alpha=0.3)
Related
I am trying to plot an NMDS plot of species community composition data with ellipses which represent 95% confidence intervals. I generated the data for my NMDS plot using metaMDS and successfully have ordinations generated using the basic plot functions in R (see code below). However, I am struggling to get my data to plot successfully using ggplot2 and this is the only way I have seen 95% CIs plotted on NMDS plots. I am hoping someone is able to help me correct my code so the ellipses show 95% CIs, or could point me in the right direction for achieving this using other methods?
My basic code for plotting my NMDS plot:
orditorp(dung.families.mds, display = "sites", labels = F, pch = c(16, 8, 17, 18) [as.numeric(group.variables$Heating)], col = c("green", "blue", "orange", "black") [as.numeric(group.variables$Dungfauna)], cex = 1.3)
ordiellipse(dung.families.mds, groups = group.variables$Dungfauna, draw = "polygon", lty = 1, col = "grey90")
legend("topleft", "stress = 0.1329627", bty = "n", cex = 1)
My ordination:
I realize this question is old, but I found this post useful for plotting confidence ellipses during my work, and maybe it will help you. Plotting ordiellipse function from vegan package onto NMDS plot created in ggplot2
Edit: Below I have copied the code from the second part of Didzis Elferts's answer on the link above.
Where "sol" is the metaMDS object:
First, make NMDS data frame with group column.
NMDS = data.frame(MDS1 = sol$points[,1], MDS2 = >sol$points[,2],group=MyMeta$amt)
Next, save result of function ordiellipse() as some object.
ord<-ordiellipse(sol, MyMeta$amt, display = "sites", >kind = "se", conf = 0.95, label = T)
Data frame df_ell contains values to show ellipses. It is calculated again with function veganCovEllipse which is hidden in vegan package. This function is applied to each level of NMDS (group) and now it uses arguments stored in ord object - cov, center and scale of each level.
df_ell <- data.frame()
for(g in levels(NMDS$group)){
df_ell <- rbind(df_ell, cbind(as.data.frame(with(NMDS[NMDS$group==g,],
veganCovEllipse(ord[[g]]$cov,ord[[g]]$center,ord[[g]]$scale)))
,group=g))
}
Plotting is done the same way as in previous example. As for the calculating of coordinates for elipses object of ordiellipse() is used, this solution will work with different parameters you provide for this function.
ggplot(data = NMDS, aes(MDS1, MDS2)) + geom_point(aes(color = group)) +
geom_path(data=df_ell, aes(x=NMDS1, y=NMDS2,colour=group), size=1, linetype=2)
I am using the following code in R to a plot a linear regression with confidence interval bands (95%) around the regression line.
Average <- c(0.298,0.783429,0.2295,0.3725,0.598,0.892,2.4816,2.79975,
1.716368,0.4845,0.974133,0.824,0.936846,1.54905,0.8166,1.83535,
1.6902,1.292667,0.2325,0.801,0.516,2.06645,2.64965,2.04785,0.55075,
0.698615,1.285,2.224118,2.8576,2.42905,1.138143,1.94225,2.467357,0.6615,
0.75,0.547,0.4518,0.8002,0.5936,0.804,0.7,0.6415,0.702182,0.7662,0.847)
Area <-c(8.605,16.079,4.17,5.985,12.419,10.062,50.271,61.69,30.262,11.832,25.099,
8.594,17.786,36.995,7.473,33.531,30.97,30.894,4.894,8.572,5.716,45.5,69.431,
40.736,8.613,14.829,4.963,33.159,66.32,37.513,27.302,47.828,39.286,9.244,19.484,
11.877,9.73,11.542,12.603,9.988,7.737,9.298,14.918,17.632,15)
lm.out <- lm (Area ~ Average)
newx = seq(min(Average), by = 0.05)
conf_interval <- predict(lm.out, newdata = data.frame(Average = newx), interval ="confidence",
level = 0.95)
plot(Average, Area, xlab ="Average", ylab = "Area", main = "Regression")
abline(lm.out, col = "lightblue")
lines(newx, conf_interval[,2], col = "blue", lty ="dashed")
lines(newx, conf_interval[,3], col = "blue", lty ="dashed")
I am stuck because the graph I got reports the bands just for the first part pf the line, leaving out all the remaining line (you find the link to the image at the bottom of the message). What is going wrong? I would also like to shade the area of the confidence interval (not just the lines corresponding to the limits) but I can't understand how to do it.
Any help would be really appreciated, I am completely new in R.
This is very easy with the ggplot2 -library. Here is the code:
library(ggplot2)
data = data.frame(Average, Area)
ggplot(data=data, aes(x=Area, y=Average))+
geom_smooth(method="lm", level=0.95)+
geom_point()
Code to install the library:
install.packages("ggplot2")
I am new in plotting time series. I downloaded a time series data and calculated a linear equation and I would like to add it in the time series plot. I want to show the year in the plot so I used index(stk) as x-axis input.
code:
library(quantmod)
stk <- suppressWarnings(getSymbols("AAPL", auto.assign = FALSE,
src = "yahoo", periodicity = "daily"))
stk <- na.omit(stk)
stk.lm1 <- lm(log(Cl(stk)) ~ c(1:nrow(stk)), data = stk)
plot(index(stk), log(Cl(stk)), type = "l", lwd = 3, las = 1)
abline(coefficients(stk.lm1)[1], coefficients(stk.lm1)[2], col="blue")
I know it is the plot using index(stk), how can I do to keep the x axis of plot in date and can I use plot.xts or other like ggplot2 to do the same things? Please advise, thank you very much.
It isn't dificult to do the plot that you want in base r plot or ggplot2 here is what you what:
plot(index(stk), log(Cl(stk)), type="l", lwd=3, las=1)
lines(x = index(stk.lm1$fitted.values), y = stk.lm1$fitted.values,col = "blue")
for the base r plot I added a line with the fitted values of the linear regression that I extracted with the $ signed and the dates of theme. Take into account that lm respect the structure of the data so the results are xts
library(ggplot2)
ggplot(stk, aes(x = index(stk), y = as.numeric(log(Cl(stk)))))+geom_line(lwd=1)+
geom_line(aes(x = index(stk.lm1$fitted.values), y = stk.lm1$fitted.values),col = "blue")+
labs(x = "Date", y = "Log Price")
For ggplot2 is quite similar. First you have to initiate the plot with ggplot where you defined the data and aesthetics (aes), then you add a line with geom_line and for the extra line I used the this command and define the new line in a new aes the same way I did it with the base r function.
Here's a ggplot solution. You shouldn't have to calculate the linear regression coefficients yourself:
# convert stk to data frame & specify your x-axis variable explicitly
stk.df <- as.data.frame(stk)
stk.df$Date <- as.Date(rownames(stk.df))
# plot
ggplot(stk.df,
aes(x = Date, y = log(AAPL.Close))) +
geom_line() +
geom_smooth(method = "lm", se = FALSE) +
labs(x = "Year", y = "log(AAPL's closing value)") +
theme_bw()
The geom_smooth line takes care of the regression. Set se = TRUE if you want to include a confidence interval around the line.
I am trying to plot 4 ecdf functions on one plot but can't seem to figure out the proper syntax.
If I have 4 functions "A, B, C, D" what would be the proper syntax in R to get them to be plotted on the same chart with different colors. Thanks!
Here is one way (for three of them, works for four the same way):
set.seed(42)
ecdf1 <- ecdf(rnorm(100)*0.5)
ecdf2 <- ecdf(rnorm(100)*1.0)
ecdf3 <- ecdf(rnorm(100)*2.0)
plot(ecdf3, verticals=TRUE, do.points=FALSE)
plot(ecdf2, verticals=TRUE, do.points=FALSE, add=TRUE, col='brown')
plot(ecdf1, verticals=TRUE, do.points=FALSE, add=TRUE, col='orange')
Note that I am using the fact that the third has the widest range, and use that to initialize the canvas. Else you need ylim=c(...).
The package latticeExtra provides the function ecdfplot.
library(lattice)
library(latticeExtra)
set.seed(42)
vals <- data.frame(r1=rnorm(100)*0.5,
r2=rnorm(100),
r3=rnorm(100)*2)
ecdfplot(~ r1 + r2 + r3, data=vals, auto.key=list(space='right')
Here is an approach using ggplot2 (using the ecdf objects from [Dirk's answer])(https://stackoverflow.com/a/20601807/1385941)
library(ggplot2)
# create a data set containing the range you wish to use
d <- data.frame(x = c(-6,6))
# create a list of calls to `stat_function` with the colours you wish to use
ll <- Map(f = stat_function, colour = c('red', 'green', 'blue'),
fun = list(ecdf1, ecdf2, ecdf3), geom = 'step')
ggplot(data = d, aes(x = x)) + ll
A simpler way is to use ggplot and have the variable that you want to plot as a factor. In the example below, I have Portfolio as a factor and plotting the distribution of Interest Rates by Portfolio.
# select a palette
myPal <- c( 'royalblue4', 'lightsteelblue1', 'sienna1')
# plot the Interest Rate distribution of each portfolio
# make an ecdf of each category in Portfolio which is a factor
g2 <- ggplot(mortgage, aes(x = Interest_Rate, color = Portfolio)) +
scale_color_manual(values = myPal) +
stat_ecdf(lwd = 1.25, geom = "line")
g2
You can also set geom = "step", geom = "point" and adjust the line width lwd in the stat_ecdf() function. This gives you a nice plot with the legend.
I wish to add regression lines to a plot that has multiple data series that are colour coded by a factor. Using a brewer.pal palette, I created a plot with the data points coloured by factor (plant$ID). Below is an example of the code:
palette(brewer.pal(12,"Paired"))
plot(x=plant$TL, y=plant$d15N, xlab="Total length (mm)", ylab="d15N", col=plant$ID, pch=16)
legend(locator(1), legend=levels(factor(plant$ID)), text.col="black", pch=16, col=c(brewer.pal(12,"Paired")), cex=0.6)
Is there an easy way to add linear regression lines to the graph for each of the different data series (factors)? I also wish to colour the lines according to the factor plant$ID?
I can achieve this by adding each of the data series to the plot separately and then using the abline function (as below), but in cases with multiple data series it can be very time consuming matching up colours.
plot(y=plant$d15N[plant$ID=="Sm"], x=plant$TL[plant$ID=="Sm"], xlab="Total length (mm)", ylab="d15N", col="green", pch=16, xlim=c(50,300), ylim=c(8,15))
points(y=plant$d15N[plant$ID=="Md"], x=plant$TL[plant$ID=="Md"], type="p", pch=16, col="blue")
points(y=plant$d15N[plant$ID=="Lg"], x=plant$TL[plant$ID=="Lg"], type="p", pch=16, col="orange")
abline(lm(plant$d15N[plant$ID=="Sm"]~plant$TL[plant$ID=="Sm"]), col="green")
abline(lm(plant$d15N[plant$ID=="Md"]~plant$TL[plant$ID=="Md"]), col="blue")
abline(lm(plant$d15N[plant$ID=="Lg"]~plant$TL[plant$ID=="Lg"]), col="orange")
legend.text<-c("Sm","Md","Lg")
legend(locator(1), legend=legend.text, col=c("green", "blue", "orange"), pch=16, bty="n", cex=0.7)
There must be a quicker way! Any help would be greatly appreciated.
Or you use ggplot2 and let it do all the hard work. Unfortunately, you example is not reproducible, so I have to create some myself:
plant = data.frame(d15N = runif(1000),
TL = runif(1000),
ID = sample(c("Sm","Md","Lg"), size = 1000, replace = TRUE))
plant = within(plant, {
d15N[ID == "Sm"] = d15N[ID == "Sm"] + 0.5
d15N[ID == "Lg"] = d15N[ID == "Lg"] - 0.5
})
> head(plant)
d15N TL ID
1 0.6445164 0.14393597 Sm
2 0.2098778 0.62502205 Lg
3 -0.1599300 0.85331376 Lg
4 -0.3173119 0.60537491 Lg
5 0.8197111 0.01176013 Sm
6 1.0374742 0.68668317 Sm
The trick is to use the geom_smooth geometry which calculates the lm and draws it. Because we use color = ID, ggplot2 knows it needs to do the whole plot for each unique ID in ID.
library(ggplot2)
ggplot(plant, aes(x = TL, y = d15N, color = ID)) +
geom_point() + geom_smooth(method = "lm")