Adding R squared value to orthogonal regression line in R

I have produced a scatter plot in R of expected/observed values. I calculated orthogonal regression and added the line using the following:
library(ggplot2)
library(MethComp)
r<-read_csv("Uni/MSci/Project/DATA/new data sheets/comparisons/for comarison
graphs/R Regression/GCdNi.csv")
x<-r[1]
y<-r[2]
P<-ggplot()+geom_point(aes(x=x,y=y))+
scale_size_area()+xlab("Expected")+ylab("Observed")+ggtitle("G - Cd x Ni")+
xlim(0, 40)+ylim(0, 40)
# Orthogonal, total least squares or Deming regression
deming <- Deming(y=r$Observed, x=r$Expected)[1:2]
deming
R <- prcomp( ~ r$Expected + r$Observed )
slope <- R$rotation[2,1] / R$rotation[1,1]
slope
intercept <- R$center[2] - slope*R$center[1]
intercept
#Plot orthogonal regression
P+geom_abline(intercept = deming[1], slope = deming[2])
This gives me the following plot:
Is there a way I can calculate and add an R squared value to the graph?
Here's some of the data frame to allow for reproduction:
Expected Observed
2.709093153 1.37799781
2.611562579 1.410720257
2.22411805 1.287685907
3.431914392 1.906787706
3.242018129 1.823698676
3.46139841 1.767857729
2.255673738 1.111307235
2.400606765 1.294583377
1.818447253 0.995226256
2.528992184 1.173159775
2.46829393 1.101852756
1.826044939 0.883336715
1.78702201 1.050122993
2.37226253 1.025298403
2.140921846 1.094761918

I could not reproduce your data, but here's how you could do something like that with linear regression.
library(ggplot2)
set.seed(1)
x <- rnorm(20, 1, 100)
y <- x + rnorm(20, 50, 10)
regression <- lm(y ~ x)
r2 <- summary(regression)$r.squared
ggplot() + geom_point(aes(x, y)) +
  geom_line(aes(x, regression$fitted.values)) +
  annotate("text", x = -100, y = 200, label = paste0("r squared = ", r2))
In the future, you should provide a reproducible example.

Related

Error in plotting dose-response curve in ggplot with drc package

I'm getting a half-plotted graph when trying to plot a dose-response curve using ggplot with the drc package. I'm trying to follow the procedure given in the supplementary information of a recent paper on dose-response curves.
Here is my raw data
Please help me find a solution, thanks!
# Fit a log-logistic model with lower and upper asymptotic limits fixed at respectively 0 and 100
mod.Pyr <- drm(gi ~ conc, data = my_data, fct = LL.4(fixed = c(NA, 0, 100, NA)))
# new dose levels as support for the line
newdata <- expand.grid(conc=exp(seq(log(0.5), log(100), length=1000)))
# predictions and confidence intervals
pm <- predict(mod.Pyr, newdata=newdata, interval="confidence")
# new data with predictions
newdata$p <- pm[,1]
newdata$pmin <- pm[,2]
newdata$pmax <- pm[,3]
# need to shift conc == 0 a bit up, otherwise there are problems with coord_trans
my_data$conc0 <- my_data$conc
my_data$conc0[my_data$conc0 == 0] <- 0.5
# plotting the curve
ggplot(my_data, aes(x = conc0, y = gi)) +
  geom_point() +
  geom_ribbon(data = newdata, aes(x = conc, y = p, ymin = pmin, ymax = pmax), alpha = 0.2) +
  geom_line(data = newdata, aes(x = conc, y = p)) +
  coord_trans(x = "log") +
  xlab("Concentration (mg/l)") + ylab("Growth inhibition")
You have conc values above 100, but you only created newdata for values up to log(100), so the curve can only be fitted and drawn over part of the range. You just need to increase log(100) in expand.grid to a higher value, for example log(3000):
newdata <- expand.grid(conc=exp(seq(log(0.5), log(3000), length=100)))

Fitting Laplace distribution to data

I want to fit a Laplace distribution to data, whose density is given by the formula f(x | mu, b) = 1/(2b) * exp(-|x - mu|/b).
As I read on Wikipedia, a good estimator for the mu parameter is the median, and for tau, the mean absolute deviation from the median.
So what I did:
set.seed(42)
# Create a vector for which Laplace distribution will be fitted
vec <- rexp(1000)
# Defining laplace distribution
dlaplace <- function(x, mu, b) {
  1/(2*b) * exp(-abs(x - mu)/b)
}
#Estimating two parameters
mu <- median(vec)
tau <- mean(abs(vec-mu))
However, if we now take a look at a histogram of this density fitted to our data, we end up with the following image:
library(ggplot2)
vals <- dlaplace(vec, mu, tau)
ggplot() + geom_histogram(aes(vals), binwidth = 3) +
  geom_line(aes(x = 1:length(vec), y = vec))
This suggests that it doesn't fit the distribution at all. My question is:
Is the fit this bad because I drew my vector from an exponential distribution, which is not Laplace, or am I doing something incorrectly?
Are you looking for this?
vals <- dlaplace(vec, mu, tau)
df1 <- data.frame(vec, vals)
ggplot(df1, aes(vec)) +
  geom_histogram(aes(y = ..density..), fill = "grey", binwidth = 0.5) +
  geom_line(aes(y = vals), color = "steelblue")
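As to the poster's actual question: the poor fit is mostly because the data are exponential, not Laplace. The exponential density is one-sided, while the Laplace is symmetric about mu, so no choice of mu and tau will track it well. A quick way to see this (my addition, reusing the objects defined above) is to overlay the fitted Laplace density and the true exponential density on the histogram:
# assumes vec, mu, tau and dlaplace() from the question above
ggplot(data.frame(vec), aes(vec)) +
  geom_histogram(aes(y = ..density..), fill = "grey", binwidth = 0.5) +
  stat_function(fun = dlaplace, args = list(mu = mu, b = tau), color = "steelblue") + # fitted Laplace
  stat_function(fun = dexp, color = "tomato", linetype = "dashed") # true exponential density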

How to convert log function in RStudio?

fit1 = lm(price ~ . , data = car)
fit2 = lm(log(price) ~ . , data = car)
I'm not sure how to convert log(price) back to price in fit2. Won't it just become the same thing as fit1 if I do convert it? Please help.
Let's take a very simple example. Suppose I have some data points like this:
library(ggplot2)
df <- data.frame(x = 1:10, y = (1:10)^2)
(p <- ggplot(df, aes(x, y)) + geom_point())
I want to try to fit a model to them, but don't know what form this should take. I try a linear regression first and plot the resultant prediction:
mod1 <- lm(y ~ x, data = df)
(p <- p + geom_line(aes(y = predict(mod1)), color = "blue"))
Next I try a linear regression on log(y). Whatever results I get from predictions from this model will be predicted values of log(y). But I don't want log(y) predictions, I want y predictions, so I need to take the 'anti-log' of the prediction. We do this in R with exp():
mod2 <- lm(log(y) ~ x, data = df)
(p <- p + geom_line(aes(y = exp(predict(mod2))), color = "red"))
But we can see that we have different regression lines. That's because when we took the log of y, we were effectively fitting a straight line on the plot of log(y) against x. When we transform the axis back to a non-log axis, our straight line becomes an exponential curve. We can see this more clearly by drawing our plot again with a log-transformed y axis:
p + scale_y_log10(limits = c(1, 500))
Created on 2020-08-04 by the reprex package (v0.3.0)
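Tying that back to the original models, a sketch (my addition, assuming the asker's car data frame is loaded): predictions from fit2 come out on the log scale, so back-transform them with exp() to get price units:
# fit2 predicts log(price); exp() returns the predictions to price units
price_hat <- exp(predict(fit2))
# compare with the untransformed model's predictions
head(cbind(fit1 = predict(fit1), fit2_backtransformed = price_hat))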

geom_abline for logistic regression (ggplot2)

I am sorry if this question is very simple; however, I could not find any solution to my problem. I want to plot logistic regression lines with ggplot2. The problem is that I cannot use geom_abline because I don't have the original model, just the slope and intercept for each regression line. I have used this approach for linear regressions, and it works fine with geom_abline, because you can just give multiple slopes and intercepts to the function.
geom_abline(data = estimates, aes(intercept = inter, slope = slo))
where inter and slo are vectors with more than one value.
If I try the same approach with coefficients from a logistic regression, I get the wrong (linear) regression lines. I am trying to use geom_line instead; however, I cannot use predict to generate the predicted values because I don't have the original model object.
Any suggestion?
Thanks in advance,
Gustavo
If the model had a logit link then you could plot the prediction using only the intercept (coefs[1]) and slope (coefs[2]) as:
library(ggplot2)
n <- 100L
x <- rnorm(n, 2.0, 0.5)
y <- factor(rbinom(n, 1L, plogis(-0.6 + 1.0 * x)))
mod <- glm(y ~ x, binomial("logit"))
coefs <- coef(mod)
x_plot <- seq(-5.0, 5.0, by = 0.1)
y_plot <- plogis(coefs[1] + coefs[2] * x_plot)
plot_data <- data.frame(x_plot, y_plot)
ggplot(plot_data) + geom_line(aes(x_plot, y_plot), col = "red") +
  xlab("x") + ylab("p(y | x)") +
  scale_y_continuous(limits = c(0, 1)) + theme_bw()
Edit
Here is one way of plotting k predicted probability lines on the same graph, following on from the previous code:
library(reshape2)
k <- 5L
intercepts <- rnorm(k, coefs[1], 0.5)
slopes <- rnorm(k, coefs[2], 0.5)
x_plot <- seq(-5.0, 5.0, by = 0.1)
model_predictions <- sapply(1:k, function(idx) {
  plogis(intercepts[idx] + slopes[idx] * x_plot)
})
colnames(model_predictions) <- 1:k
plot_data <- as.data.frame(cbind(x_plot, model_predictions))
plot_data_melted <- melt(plot_data, id.vars = "x_plot", variable.name = "model",
                         value.name = "y_plot")
ggplot(plot_data_melted) + geom_line(aes(x_plot, y_plot, col = model)) +
  xlab("x") + ylab("p(y | x)") +
  scale_y_continuous(limits = c(0, 1)) + theme_bw()
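Since reshape2 is retired, the melt() step can equally be done with tidyr (my substitution; the rest of the code is unchanged):
library(tidyr)
# same long format as reshape2::melt() above: one row per (x_plot, model) pair
plot_data_melted <- pivot_longer(plot_data, cols = -x_plot,
                                 names_to = "model", values_to = "y_plot")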

ggplot2: How to plot an orthogonal regression line?

I have tested a large sample of participants on two different tests of visual perception – now, I'd like to see to what extent performance on both tests correlates.
To visualise the correlation, I plot a scatterplot in R using ggplot() and I fit a regression line (using stat_smooth()). However, since both my x and y variable are performance measures, I need to take both of them into account when fitting my regression line – thus, I cannot use a simple linear regression (using stat_smooth(method="lm")), but rather need to fit an orthogonal regression (or Total least squares). How would I go about doing this?
I know I can specify formula in stat_smooth(), but I wouldn't know what formula to use. From what I understand, none of the preset methods (lm, glm, gam, loess, rlm) are applicable.
It turns out that you can extract the slope and intercept from principal components analysis on (x,y), as shown here. This is just a little simpler, runs in base R, and gives the identical result to using Deming(...) in MethComp.
# same `x` and `y` as in @user20650's answer
df <- data.frame(y, x)
pca <- prcomp(~x+y, df)
slp <- with(pca, rotation[2,1] / rotation[1,1])
int <- with(pca, center[2] - slp*center[1])
ggplot(df, aes(x, y)) +
  geom_point() +
  stat_smooth(method = lm, color = "green", se = FALSE) +
  geom_abline(slope = slp, intercept = int, color = "blue")
Caveat: not familiar with this method
I think you should be able to just pass the slope and intercept to geom_abline to produce the fitted line. Alternatively, you could define your own method to pass to stat_smooth (as shown in smooth.Pspline wrapper for stat_smooth (in ggplot2)). I used the Deming function from the MethComp package, as suggested in How to calculate Total least squares in R? (Orthogonal regression).
library(MethComp)
library(ggplot2)
# Sample data and model (from ?Deming example)
set.seed(1)
M <- runif(100,0,5)
# Measurements:
x <- M + rnorm(100)
y <- 2 + 3 * M + rnorm(100,sd=2)
# Deming regression
mod <- Deming(x,y)
# Define functions to pass to stat_smooth - see mnel's answer at link for details
# Defined the Deming model output as class Deming to define the predict method
# I only used the intercept and slope for predictions - is this correct?
f <- function(formula, data, SDR = 2, ...){
  M <- model.frame(formula, data)
  d <- Deming(x = M[,2], y = M[,1], sdr = SDR)[1:2]
  class(d) <- "Deming"
  d
}

# an s3 method for predictdf (called within stat_smooth)
predictdf.Deming <- function(model, xseq, se, level) {
  pred <- model %*% t(cbind(1, xseq))
  data.frame(x = xseq, y = c(pred))
}

ggplot(data.frame(x, y), aes(x, y)) + geom_point() +
  stat_smooth(method = f, se = FALSE, colour = 'red', formula = y~x, SDR = 1) +
  geom_abline(intercept = mod[1], slope = mod[2], colour = 'blue') +
  stat_smooth(method = "lm", se = FALSE, colour = 'green', formula = y~x)
So passing the intercept and slope to geom_abline produces the same fitted line (as expected). If this is the correct approach, then in my opinion it's easier to go with that.
The MethComp package seems to be no longer maintained (was removed from CRAN).
Russel88/COEF allows using stat_smooth/geom_smooth with method = "tls" to add an orthogonal regression line.
Based on this and wikipedia:Deming_regression, I created the following functions, which allow using noise ratios other than 1:
deming.fit <- function(x, y, noise_ratio = sd(y)/sd(x)) {
  if(missing(noise_ratio) || is.null(noise_ratio)) noise_ratio <- eval(formals(sys.function(0))$noise_ratio) # this is just a complicated way to write `sd(y)/sd(x)`
  delta <- noise_ratio^2
  x_name <- deparse(substitute(x))

  s_yy <- var(y)
  s_xx <- var(x)
  s_xy <- cov(x, y)
  beta1 <- (s_yy - delta*s_xx + sqrt((s_yy - delta*s_xx)^2 + 4*delta*s_xy^2)) / (2*s_xy)
  beta0 <- mean(y) - beta1 * mean(x)

  res <- c(beta0 = beta0, beta1 = beta1)
  names(res) <- c("(Intercept)", x_name)
  class(res) <- "Deming"
  res
}
deming <- function(formula, data, R = 100, noise_ratio = NULL, ...){
  ret <- boot::boot(
    data = model.frame(formula, data),
    statistic = function(data, ind) {
      data <- data[ind, ]
      args <- rlang::parse_exprs(colnames(data))
      names(args) <- c("y", "x")
      rlang::eval_tidy(rlang::expr(deming.fit(!!!args, noise_ratio = noise_ratio)), data, env = rlang::current_env())
    },
    R = R
  )
  class(ret) <- c("Deming", class(ret))
  ret
}
predictdf.Deming <- function(model, xseq, se, level) {
  pred <- as.vector(tcrossprod(model$t0, cbind(1, xseq)))
  if(se) {
    preds <- tcrossprod(model$t, cbind(1, xseq))
    data.frame(
      x = xseq,
      y = pred,
      ymin = apply(preds, 2, function(x) quantile(x, probs = (1-level)/2)),
      ymax = apply(preds, 2, function(x) quantile(x, probs = 1-((1-level)/2)))
    )
  } else {
    data.frame(x = xseq, y = pred)
  }
}
# unrelated helper function to create a nicer plot:
fix_plot_limits <- function(p) {
  p + coord_cartesian(xlim = ggplot_build(p)$layout$panel_params[[1]]$x.range,
                      ylim = ggplot_build(p)$layout$panel_params[[1]]$y.range)
}
Demonstration:
library(ggplot2)
#devtools::install_github("Russel88/COEF")
library(COEF)
fix_plot_limits(
  ggplot(data.frame(x = (1:5) + rnorm(100), y = (1:5) + rnorm(100)*2),
         mapping = aes(x = x, y = y)) +
    geom_point()
) +
  geom_smooth(method = deming, aes(color = "deming"), method.args = list(noise_ratio = 2)) +
  geom_smooth(method = lm, aes(color = "lm")) +
  geom_smooth(method = COEF::tls, aes(color = "tls"))
Created on 2019-12-04 by the reprex package (v0.3.0)
For anyone who is interested: I validated jhoward's solution against the deming::deming() function, as I was not familiar with jhoward's method of extracting the slope and intercept using PCA. They do indeed produce identical results. The reprex is:
# Sample data and model (from ?Deming example)
set.seed(1)
M <- runif(100,0,5)
# Measurements:
x <- M + rnorm(100)
y <- 2 + 3 * M + rnorm(100,sd=2)
# Make data.frame()
df <- data.frame(x,y)
# Get intercept and slope using deming::deming()
library(deming)
mod_Dem <- deming::deming(y~x,df)
slp_Dem <- mod_Dem$coefficients[2]
int_Dem <- mod_Dem$coefficients[1]
# Get intercept and slope using jhoward's method
pca <- prcomp(~x+y, df)
slp_jhoward <- with(pca, rotation[2,1] / rotation[1,1])
int_jhoward <- with(pca, center[2] - slp_jhoward*center[1])
# Plot both orthogonal regression lines and simple linear regression line
library(ggplot2)
ggplot(df, aes(x, y)) +
  geom_point() +
  stat_smooth(method = lm, color = "green", se = FALSE) +
  geom_abline(slope = slp_jhoward, intercept = int_jhoward, color = "blue", lwd = 3) +
  geom_abline(slope = slp_Dem, intercept = int_Dem, color = "white", lwd = 2, linetype = 3)
Interestingly, if you switch the order of x and y in the models (i.e., to mod_Dem <- deming::deming(x~y,df) and pca <- prcomp(~y+x, df)), you get completely different slopes:
My (very superficial) understanding of orthogonal regression was that it does not treat either variable as independent or dependent, and thus that the regression line should be unaffected by how the model is specified, e.g., as y~x vs x~y. Clearly I was very much mistaken, and I would be interested to hear anyone's thoughts about exactly why I was so wrong.
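For what it's worth: with a variance (noise) ratio of 1, the two fits do describe the same geometric line; the swapped model simply parameterises it as x as a function of y, so its coefficients have to be inverted before they can be drawn on an x-y plot. A minimal check of this, reusing df and jhoward's estimates from above (my addition):
# the swapped PCA fit's slope/intercept describe x as a function of y, not y of x
pca_swap <- prcomp(~y+x, df)
slp_swap <- with(pca_swap, rotation[2,1] / rotation[1,1])
int_swap <- with(pca_swap, center[2] - slp_swap*center[1])
# rearrange x = slp_swap*y + int_swap into y = x/slp_swap - int_swap/slp_swap
c(slope = 1/slp_swap, intercept = -int_swap/slp_swap)
c(slope = slp_jhoward, intercept = int_jhoward) # identical to the line above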
