Reduced major axis line and CI for ggplot in R

Is there any way to add a reduced major axis (RMA) line (and ideally a CI) to a ggplot? I know I can use method="lm" to get an OLS fit, but there doesn't seem to be a default method for RMA. I can get the RMA coefficients and the confidence intervals from package lmodel2, but adding them with geom_abline() doesn't seem to work. Here's dummy data and code. I just want to replace the OLS line and CI with an RMA line and CI:
dat <- data.frame(a=log10(rnorm(50, 30, 10)), b=log10(rnorm(50, 20, 2)))
ggplot(dat, aes(x=a, y=b) ) +
geom_point(shape=1) +
geom_smooth(method="lm")
Edit1: the code below gets the RMA (here called SMA - standardized major axis) coefs and CIs. Package lmodel2 provides more detailed output, while package smatr returns just the coefs and CIs, if that's any help:
library(lmodel2)
fit1 <- lmodel2(b ~ a, data=dat)
library(smatr)
fit2 <- line.cis(b, a, data=dat)

Building off Joran's answer, I think it's a little easier to pass the whole data frame to geom_abline:
library(ggplot2)
library(lmodel2)
dat <- data.frame(a=log10(rnorm(50, 30, 10)), b=log10(rnorm(50, 20, 2)))
mod <- lmodel2(a ~ b, data=dat,"interval", "interval", 99)
reg <- mod$regression.results
names(reg) <- c("method", "intercept", "slope", "angle", "p-value")
ggplot(dat) +
geom_point(aes(b, a)) +
geom_abline(data = reg, aes(intercept = intercept, slope = slope, colour = method))

As Chase commented, the actual lmodel2() code and the ggplot code you are using would be helpful. But here's an example that may point you in the right direction:
dat <- data.frame(a=log10(rnorm(50, 30, 10)), b=log10(rnorm(50, 20, 2)))
mod <- lmodel2(a ~ b, data=dat,"interval", "interval", 99)
#EDIT: mod is a list, with components (data.frames) regression.results and
# confidence.intervals containing the intercepts+slopes for different
# estimation methods; just put the right values into geom_abline
ggplot(dat,aes(x=b,y=a)) + geom_point() +
geom_abline(intercept=mod$regression.results[4,2],
slope=mod$regression.results[4,3],colour="blue") +
geom_abline(intercept=mod$confidence.intervals[4,2],
slope=mod$confidence.intervals[4,4],colour="red") +
geom_abline(intercept=mod$confidence.intervals[4,3],
slope=mod$confidence.intervals[4,5],colour="red") +
xlim(c(-10,10)) + ylim(c(-10,10))
Full disclosure: I know nothing about RMA regression, so I just plucked out the relevant slopes and intercepts and plopped them into geom_abline(), using some example code from lmodel2 as a guide. The CIs produced in this toy example don't seem to make much sense, since I had to force ggplot to zoom out using xlim() and ylim() in order to see the CI lines (red).
But maybe this will help you construct a working example in ggplot().
EDIT2: With the OP's added code to extract the coefficients, the ggplot() call would be something like this:
ggplot(dat,aes(x=b,y=a)) + geom_point() +
geom_abline(intercept=fit2[1,1],slope=fit2[2,1],colour="blue") +
geom_abline(intercept=fit2[1,2],slope=fit2[2,2],colour="red") +
geom_abline(intercept=fit2[1,3],slope=fit2[2,3],colour="red")
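As a cross-check on the values pulled out of lmodel2 or smatr, the RMA (SMA) point estimates can also be computed directly in base R: the slope is sign(r) * sd(y) / sd(x), and the line passes through the point of means. The sketch below uses made-up data, and the names x, y, rma_slope and rma_intercept are illustrative rather than taken from the question. It does not compute confidence limits, which still need one of the packages above.

```r
# Minimal sketch: RMA (standardized major axis) line from first principles.
set.seed(42)
x <- rnorm(50, mean = 10, sd = 2)   # illustrative predictor
y <- 0.5 * x + rnorm(50)            # illustrative response

# RMA slope: ratio of standard deviations, signed by the correlation
rma_slope     <- sign(cor(x, y)) * sd(y) / sd(x)
# The RMA line passes through (mean(x), mean(y))
rma_intercept <- mean(y) - rma_slope * mean(x)

# OLS for comparison; in absolute value the RMA slope is at least as steep
ols <- unname(coef(lm(y ~ x)))

# Plotting is optional and only runs if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(data.frame(x, y), aes(x, y)) +
    geom_point(shape = 1) +
    geom_abline(intercept = rma_intercept, slope = rma_slope, colour = "blue") +
    geom_abline(intercept = ols[1], slope = ols[2],
                colour = "grey40", linetype = "dashed")
  # print(p) draws the plot
}
```

If the line drawn from the package output does not match this one, the usual culprit is picking the wrong row or column from the results table, or swapping x and y in the formula.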

I found myself in the same situation.
Obtain fitted values and their confidence intervals using the ggpmisc package:
library(ggpmisc)
ci <- predict.lmodel2(fit1, method= 'RMA', interval= "confidence")
Add the model predictions to your data:
datci <- cbind(dat, ci)
Plot the fitted line with geom_line; arguments such as transparency and line width can be customized. Since fit1 models b as a function of a, a goes on the x-axis. This assumes the predictions come back in columns named fit, lwr and upr, as they do for predict.lm:
p <- ggplot(datci, aes(x= a, y= b)) + geom_point() + geom_line(aes(y= fit), lwd= 1.1, alpha= 0.6)
Use geom_ribbon if you want to add confidence intervals:
p + geom_ribbon(aes(ymin= lwr, ymax= upr), alpha= 0.3, color= NA)

Related

How to make density plot correctly show area near the limits?

When I plot densities with ggplot, it seems to be very wrong around the limits. I see that geom_density and other functions allow specifying various density kernels, but none of them seem to fix the issue.
How do you correctly plot densities around the limits with ggplot?
As an example, let's plot the Chi-square distribution with 2 degrees of freedom. Using the builtin probability densities:
library(ggplot2)
u = seq(0, 2, by=0.01)
v = dchisq(u, df=2)
df = data.frame(x=u, p=v)
p = ggplot(df) +
geom_line(aes(x=x, y=p), size=1) +
theme_classic() +
coord_cartesian(xlim=c(0, 2), ylim=c(0, 0.5))
show(p)
We get the expected plot:
Now let's try simulating it and plotting the empirical distribution:
library(ggplot2)
u = rchisq(10000, df=2)
df = data.frame(x=u)
p = ggplot(df) +
geom_density(aes(x=x)) +
theme_classic() +
coord_cartesian(xlim=c(0, 2))
show(p)
We get an incorrect plot:
We can try to visualize the actual distribution:
library(ggplot2)
u = rchisq(10000, df=2)
df = data.frame(x=u)
p = ggplot(df) +
geom_point(aes(x=x, y=0.5), position=position_jitter(height=0.2), shape='.', alpha=1) +
theme_classic() +
coord_cartesian(xlim=c(0, 2), ylim=c(0, 1))
show(p)
And it seems to look correct, contrary to the density plot:
It seems like the problem has to do with kernels, and geom_density does allow using different kernels. But they don't really correct the limit problem. For example, the code above with kernel="triangular" looks about the same:
Here's an idea of what I'm expecting to see (of course, I want a density, not a histogram):
library(ggplot2)
u = rchisq(10000, df=2)
df = data.frame(x=u)
p = ggplot(df) +
geom_histogram(aes(x=x), center=0.1, binwidth=0.2, fill='white', color='black') +
theme_classic() +
coord_cartesian(xlim=c(0, 2))
show(p)
The usual kernel density methods have trouble when the support is constrained, as in this case, where the density is supported only on values above zero. The usual recommendation for handling this has been to use the logspline package:
install.packages("logspline")
library(logspline)
png(); fit <- logspline(rchisq(10000, 3))
plot(fit) ; dev.off()
If this needed to be done in the ggplot2 environment there is a dlogspline function:
densdf <- data.frame( y=dlogspline(seq(0,12,length=1000), fit),
x=seq(0,12,length=1000))
ggplot(densdf, aes(y=y,x=x))+geom_line()
Perhaps you were insisting on one with 2 degrees of freedom?
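If adding a package is not an option, a different and simpler technique than logspline is boundary correction by reflection: mirror the sample around the boundary, run the ordinary kernel estimate, keep the part at or above zero, and double it. The sketch below is a base R illustration of that idea, not what logspline does internally. The names naive, refl and dens are made up for the example.

```r
# Minimal sketch: boundary correction by reflection for data on [0, Inf).
set.seed(1)
u <- rchisq(10000, df = 2)

# Naive estimate: the kernel smooths probability mass across the boundary at 0,
# so the curve is pulled down near the limit
naive <- density(u, from = 0, to = 2, n = 201)

# Reflected estimate: mirror the sample around 0, estimate, keep x >= 0 and
# double the height so the curve integrates to about 1 over [0, Inf)
refl <- density(c(u, -u), from = 0, to = max(u), n = 1024)
dens <- data.frame(x = refl$x, y = 2 * refl$y)

# Compare the two estimates at the boundary with the true density there
c(naive = naive$y[1], reflected = dens$y[1], true = dchisq(0, df = 2))

# Plotting is optional and only runs if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(dens, aes(x, y)) +
    geom_line() +
    theme_classic() +
    coord_cartesian(xlim = c(0, 2), ylim = c(0, 0.5))
  # print(p) draws the plot
}
```

Reflection forces a zero slope at the boundary, so the estimate at 0 still falls somewhat short of the true value, though much less than the naive one does. Recent ggplot2 releases also appear to offer a bounds argument in geom_density for the same purpose; check the documentation of your installed version before relying on it.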

Adding predict line from glm to ggplot2, larger than original data set

I have included a sample data set just to demonstrate what I am trying to do.
Speed <- c(400,220,490,210,500,270,200,470,480,310,240,490,420,330,280,210,300,470,230,430,460,220,250,200,390)
Hit <- c(0,1,0,1,0,0,1,0,0,1,1,0,0,1,1,1,1,1,0,0,0,1,1,1,0)
obs <- c(1:25)
msl2.data <- as.data.frame(cbind(obs,Hit,Speed))
msl2.glm <- glm(Hit ~ Speed, data = msl2.data, family = binomial)
Doing what I want in base graphics:
plot(Hit~ Speed, data = msl2.data, xlim = c(0,700), xlab = "Speed", ylab = "Hit", main = "Plot of hit vs Speed")
pi.hat<-(predict( msl2.glm, data.frame(Speed=c(0:700)), type="response" ))
lines( 0:700, pi.hat, col="blue" )
I am trying to recreate the above plot, but in ggplot. The error I have been unable to work around is the aes(x,y) have different lengths, which is true, but I want them to have different lengths.
Any ideas for this in gg?
You have a couple of approaches: the first does all the modelling inside ggplot, the second does it outside and passes the relevant data to be plotted.
First
ggplot(data=msl2.data, aes(Speed, Hit)) +
geom_point() +
geom_smooth(method="glm", method.args=list(family="binomial"),
fullrange=TRUE, se=FALSE) +
xlim(0, 700)
fullrange is specified so that the prediction line covers the whole x-range. xlim extends the x-axis.
Second
# Create prediction dataframe (pi.hat comes from the predict() call in the question)
pred <- data.frame(Speed=0:700, pi.hat)
ggplot() +
# prediction line
geom_line(data=pred, aes(Speed, pi.hat)) +
# points - note different dataframe is used
geom_point(data=msl2.data, aes(Speed, Hit))
I generally prefer to do the modelling outside (second approach), and use ggplot purely as a plotting mechanism.
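One benefit of modelling outside ggplot is that a confidence band can be built by hand. The sketch below extends the second approach, assuming a pointwise 95% Wald band is acceptable: it predicts on the link scale with standard errors and back-transforms with plogis so the band stays inside [0, 1]. The column names fit, lwr and upr are chosen for the example.

```r
# Minimal sketch: logistic fit with a pointwise 95% band, computed outside ggplot.
Speed <- c(400,220,490,210,500,270,200,470,480,310,240,490,420,330,280,210,
           300,470,230,430,460,220,250,200,390)
Hit   <- c(0,1,0,1,0,0,1,0,0,1,1,0,0,1,1,1,1,1,0,0,0,1,1,1,0)
msl2.data <- data.frame(Speed, Hit)
msl2.glm  <- glm(Hit ~ Speed, data = msl2.data, family = binomial)

# Predict on the link (log-odds) scale over a wider range than the data
newdat <- data.frame(Speed = 0:700)
lp <- predict(msl2.glm, newdata = newdat, type = "link", se.fit = TRUE)

# Back-transform the fit and the +/- 1.96 SE limits to the probability scale
pred <- data.frame(
  Speed = newdat$Speed,
  fit   = plogis(lp$fit),
  lwr   = plogis(lp$fit - 1.96 * lp$se.fit),
  upr   = plogis(lp$fit + 1.96 * lp$se.fit)
)

# Plotting is optional and only runs if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot() +
    geom_ribbon(data = pred, aes(Speed, ymin = lwr, ymax = upr), alpha = 0.2) +
    geom_line(data = pred, aes(Speed, fit), colour = "blue") +
    geom_point(data = msl2.data, aes(Speed, Hit))
  # print(p) draws the plot
}
```

Because each layer gets its own data frame, the 701-row prediction grid and the 25 observations can have different lengths without triggering the aesthetics-length error from the question.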

Quick way to plot an anova

To perform an ANOVA in R I normally follow two steps:
1) I compute the anova summary with the function aov
2) I reorganise the data aggregating subject and condition to visualise the plot
I wonder whether this reorganisation of the data is always necessary to see the results, or whether there is a function that plots the results quickly.
Thanks for your suggestions
G.
I think what you mean is illustrating the result of your test with a figure? ANOVA results are usually illustrated with a boxplot.
set.seed(1234)
data <- data.frame(group = c(rep("group_1",25),rep("group_2",25)), scores = c(runif(25,0,1),runif(25,1.5,2.5)))
mod1<-aov(scores~group,data=data)
summary(mod1)
You can make a boxplot with the built-in functions boxplot or plot:
boxplot(scores~group,data=data)
plot(scores~group,data=data)
Or with ggplot
require(ggplot2)
require(ggsignif)
ggplot(data, aes(x = group, y = scores)) +
geom_boxplot(fill = "grey80", colour = "blue") +
scale_x_discrete() + xlab("Group") +
ylab("Scores") +
geom_signif(comparisons = list(c("group_1", "group_2")),
map_signif_level=TRUE)
Hope this helps
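If a means-and-error-bars figure is preferred over a boxplot, the aggregation step from the question can be kept to a few lines. The sketch below assumes the same simulated data as above and summarises each group by its mean and standard error. The names summ, avg and se are made up for the example.

```r
# Minimal sketch: per-group mean and standard error, ready for plotting.
set.seed(1234)
data <- data.frame(group  = c(rep("group_1", 25), rep("group_2", 25)),
                   scores = c(runif(25, 0, 1), runif(25, 1.5, 2.5)))

# Aggregate once: one row per group
m  <- tapply(data$scores, data$group, mean)
se <- tapply(data$scores, data$group, function(v) sd(v) / sqrt(length(v)))
summ <- data.frame(group = names(m), avg = as.numeric(m), se = as.numeric(se))

# Plotting is optional and only runs if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(summ, aes(group, avg)) +
    geom_point(size = 3) +
    geom_errorbar(aes(ymin = avg - se, ymax = avg + se), width = 0.1) +
    xlab("Group") + ylab("Mean score (+/- 1 SE)")
  # print(p) draws the plot
}
```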

ggplot2: How to curve small gaussian densities on a regression line?

I want to graphically show the assumptions of linear (and later other type) regression. How can I add to my plot small Gaussian densities (or any type of densities) on a regression line just like in this figure:
You can compute the empirical densities of the residuals for sections along a fitted line. Then, it is just a matter of drawing the lines at the positions of your choosing in each interval using geom_path. To add theoretical distribution, generate some densities along the range of the residuals for each section (here using normal density). For the Normal densities below, the standard deviation for each one is determined for each section from the residuals, but you could just choose a standard deviation for all of them and use that instead.
## Sample data
set.seed(0)
dat <- data.frame(x=(x=runif(100, 0, 50)),
y=rnorm(100, 10*x, 100))
## breaks: where you want to compute densities
breaks <- seq(0, max(dat$x), len=5)
dat$section <- cut(dat$x, breaks)
## Get the residuals
dat$res <- residuals(lm(y ~ x, data=dat))
## Compute densities for each section, and flip the axes, and add means of sections
## Note: the densities need to be scaled in relation to the section size (2000 here)
dens <- do.call(rbind, lapply(split(dat, dat$section), function(x) {
d <- density(x$res, n=50)
res <- data.frame(x=max(x$x)- d$y*2000, y=d$x+mean(x$y))
res <- res[order(res$y), ]
## Get some data for normal lines as well
xs <- seq(min(x$res), max(x$res), len=50)
res <- rbind(res, data.frame(y=xs + mean(x$y),
x=max(x$x) - 2000*dnorm(xs, 0, sd(x$res))))
res$type <- rep(c("empirical", "normal"), each=50)
res
}))
dens$section <- rep(levels(dat$section), each=100)
## Plot both empirical and theoretical
ggplot(dat, aes(x, y)) +
geom_point() +
geom_smooth(method="lm", fill=NA, lwd=2) +
geom_path(data=dens, aes(x, y, group=interaction(section,type), color=type), lwd=1.1) +
theme_bw() +
geom_vline(xintercept=breaks, lty=2)
Or, just gaussian curves
## Just normal
ggplot(dat, aes(x, y)) +
geom_point() +
geom_smooth(method="lm", fill=NA, lwd=2) +
geom_path(data=dens[dens$type=="normal",], aes(x, y, group=section), color="salmon", lwd=1.1) +
theme_bw() +
geom_vline(xintercept=breaks, lty=2)
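The variant mentioned above, a single standard deviation for all curves, can be sketched more compactly. The version below assumes the pooled residual standard deviation from the fit and centres each curve on the fitted value at a chosen x position, rather than on the section mean as in the code above. The names anchors, height and curves are hypothetical, and height is the same arbitrary scaling factor (2000) used earlier.

```r
# Minimal sketch: identical normal curves along the fitted line, one pooled sd.
set.seed(0)
x   <- runif(100, 0, 50)
dat <- data.frame(x = x, y = rnorm(100, 10 * x, 100))
fit <- lm(y ~ x, data = dat)

s       <- sigma(fit)              # pooled residual standard deviation
anchors <- c(12.5, 25, 37.5, 50)   # hypothetical x positions for the curves
height  <- 2000                    # converts density height into x-axis units

# For each anchor, trace a normal density sideways: y runs along the response,
# and the density value is subtracted from the anchor's x position
curves <- do.call(rbind, lapply(anchors, function(x0) {
  mu <- unname(predict(fit, newdata = data.frame(x = x0)))
  ys <- seq(mu - 3 * s, mu + 3 * s, length.out = 50)
  data.frame(anchor = x0, x = x0 - height * dnorm(ys, mu, s), y = ys)
}))

# Plotting is optional and only runs if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(dat, aes(x, y)) +
    geom_point() +
    geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
    geom_path(data = curves, aes(x, y, group = anchor), colour = "salmon") +
    geom_vline(xintercept = anchors, linetype = 2) +
    theme_bw()
  # print(p) draws the plot
}
```

Using one sd for every curve matches the constant-variance assumption of linear regression, which is what the figure is usually meant to illustrate.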

How to shade part of a density curve in ggplot (with no y axis data)

I'm trying to create a density curve in R using a set of random numbers between 0 and 1000, and shade the part that is less than or equal to a certain value. There are a lot of solutions out there involving geom_area or geom_ribbon, but they all require a yval, which I don't have (it's just a vector of 1000 numbers). Any ideas on how I could do this?
Two other related questions:
Is it possible to do the same thing for a cumulative density function (I'm currently using stat_ecdf to generate one), or shade it at all?
Is there any way to edit geom_vline so it will only go up to the height of the density curve, rather than the whole y axis?
Code: (the geom_area is a failed attempt to edit some code I found. If I set ymax manually, I just get a column taking up the whole plot, instead of just the area under the curve)
set.seed(100)
amount_spent <- rnorm(1000,500,150)
amount_spent1<- data.frame(amount_spent)
rand1 <- runif(1,0,1000)
amount_spent1$pdf <- dnorm(amount_spent1$amount_spent)
mean1 <- mean(amount_spent1$amount_spent)
#density/bell curve
ggplot(amount_spent1,aes(amount_spent)) +
geom_density( size=1.05, color="gray64", alpha=.5, fill="gray77") +
geom_vline(xintercept=mean1, alpha=.7, linetype="dashed", size=1.1, color="cadetblue4")+
geom_vline(xintercept=rand1, alpha=.7, linetype="dashed",size=1.1, color="red3")+
geom_area(mapping=aes(ifelse(amount_spent1$amount_spent > rand1,amount_spent1$amount_spent,0)), ymin=0, ymax=.03,fill="red",alpha=.3)+
ylab("")+
xlab("Amount spent on lobbying (in Millions USD)")+
scale_x_continuous(breaks=seq(0,1000,100))
There are a couple of questions that show this ... here and here, but they calculate the density prior to plotting.
This is another way, more complicated than required I'm sure, that allows ggplot to do some of the calculations for you.
# Your data
set.seed(100)
amount_spent1 <- data.frame(amount_spent=rnorm(1000, 500, 150))
mean1 <- mean(amount_spent1$amount_spent)
rand1 <- runif(1,0,1000)
Basic density plot
p <- ggplot(amount_spent1, aes(amount_spent)) +
geom_density(fill="grey") +
geom_vline(xintercept=mean1)
You can extract the x and y positions for the area to shade from the plot object using ggplot_build. Linear interpolation was used to get the y value at x=rand1
# subset region and plot
d <- ggplot_build(p)$data[[1]]
p <- p + geom_area(data = subset(d, x > rand1), aes(x=x, y=y), fill="red") +
geom_segment(x=rand1, xend=rand1,
y=0, yend=approx(x = d$x, y = d$y, xout = rand1)$y,
colour="blue", size=3)
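For comparison, the alternative mentioned at the start of this answer, calculating the density before plotting, can be sketched as follows. It assumes a fixed, hypothetical threshold called cutoff instead of the random rand1, and shades the part at or below it, as the question asked. The segment at the threshold stops at the curve height, obtained by linear interpolation.

```r
# Minimal sketch: compute the density first, then shade the part <= cutoff.
set.seed(100)
amount_spent <- rnorm(1000, 500, 150)
cutoff <- 400                      # hypothetical threshold

# Kernel density on a regular grid, turned into a data frame with x and y
d    <- density(amount_spent)
dens <- data.frame(x = d$x, y = d$y)

# Rows to shade, and the curve height exactly at the cutoff
shade <- dens[dens$x <= cutoff, ]
h     <- approx(dens$x, dens$y, xout = cutoff)$y

# Rough area of the shaded region; comparable to mean(amount_spent <= cutoff)
shaded_area <- sum(shade$y) * diff(dens$x[1:2])

# Plotting is optional and only runs if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(dens, aes(x, y)) +
    geom_line() +
    geom_area(data = shade, fill = "red", alpha = 0.3) +
    annotate("segment", x = cutoff, xend = cutoff, y = 0, yend = h,
             colour = "blue", linetype = "dashed")
  # print(p) draws the plot
}
```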
