R, R², p-value and regression equation - r

This code gives me a plot with the regression equation and R2, but I have to specify manually the x and y position where the equation is placed:
CORRELATIONP3 <-CORRELATIONP2[product=='a',]
x<-CORRELATIONP3$b
y<-CORRELATIONP3$p
df <- data.frame(x = x)
m <- lm(y ~ x, data = df)
p <- ggplot(data = df, aes(x = x, y = y)) +
scale_x_continuous("b (%)") +
scale_y_continuous("p (%)")+
geom_smooth(method = "lm", formula = y ~ x) +
geom_point()
p
eq <- substitute(italic(y) == a + b %.% italic(x)*","~~italic(r)^2~"="~r2,
list( a = format(coef(m)[1], digits = 4),
b = format(coef(m)[2], digits = 4),
r2 = format(summary(m)$r.squared, digits = 3)))
dftext <- data.frame(x = 3, y = 0.2, eq = as.character(as.expression(eq)))
p + geom_text(aes(label = eq), data = dftext, parse = TRUE)
With this code, however, I get R and the p-value. Here the R and p-value text fits into the plot automatically. Why? I want the same in the first plot.
CORRELATIONP3 <-CORRELATIONP2[product=='a',]
x<-CORRELATIONP3$b
y<-CORRELATIONP3$p
df <- data.frame(x = x)
m <- lm(y ~ x, data = df)
p <- ggplot(data = df, aes(x = x, y = y)) +
scale_x_continuous("b (%)") +
scale_y_continuous("p (%)")+
geom_smooth(method = "lm", formula = y ~ x) +
geom_point()
p
eq <- substitute(italic(r)~"="~rvalue*","~italic(p)~"="~pvalue, list(rvalue = sprintf("%.2f",sign(coef(m)[2])*sqrt(summary(m)$r.squared)), pvalue = format(summary(m)$coefficients[2,4], digits = 3)))
dftext <- data.frame(x = 30, y = 0.4, eq = as.character(as.expression(eq)))
p + geom_text(aes(label = eq), data = dftext, parse = TRUE)
Can you tell me how I can combine all four pieces of information (R, R2, equation and p-value) in a single plot?
Besides that, I would like this information to be placed in the plot automatically, not manually.

OK, I am not sure whether this works, as you have not given a reproducible example of your data, but I guess you just have to rename one of your variables, e.g.:
eq2 <- substitute(italic(r)~"="~rvalue*","~italic(p)~"="~pvalue,
list(rvalue = sprintf("%.2f",sign(coef(m)[2])*sqrt(summary(m)$r.squared)),
pvalue = format(summary(m)$coefficients[2,4], digits = 3)))
Then place it slightly below the other text block from the first plot. x and y here refer to the position of the text label, so play around with them until the text looks OK.
dftext2 <- data.frame(x = 30, y = 0.12, eq2 = as.character(as.expression(eq2)))
# add both labels: the equation/R2 block (eq) and the r/p-value block (eq2)
p + geom_text(aes(label = eq), data = dftext, parse = TRUE) +
geom_text(aes(label = eq2), data = dftext2, parse = TRUE)
Please let me know whether this works and whether this is what you meant.
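A minimal sketch of one way to put all four statistics into a single label and let ggplot2 position it, assuming simulated stand-in data (the original CORRELATIONP2 table is not shown). It relies on annotate() accepting infinite coordinates, so the text is pinned to the top-left corner of the panel whatever the axis ranges are. The hjust and vjust values are only a starting point.

```r
library(ggplot2)

# Hypothetical stand-in data; the original CORRELATIONP2 table is not shown.
set.seed(1)
df <- data.frame(x = runif(30, 0, 40))
df$y <- 0.1 + 0.01 * df$x + rnorm(30, sd = 0.05)

m <- lm(y ~ x, data = df)
s <- summary(m)
r <- unname(sign(coef(m)[2]) * sqrt(s$r.squared))  # signed correlation coefficient

# One plotmath label holding the equation, r^2, r and p.
# unname() keeps the coefficient names out of the deparsed label.
eq <- substitute(
  italic(y) == a + b %.% italic(x) * "," ~ ~italic(r)^2 ~ "=" ~ r2 * "," ~
    ~italic(r) ~ "=" ~ rv * "," ~ ~italic(p) ~ "=" ~ pv,
  list(a  = format(unname(coef(m)[1]), digits = 4),
       b  = format(unname(coef(m)[2]), digits = 4),
       r2 = format(s$r.squared, digits = 3),
       rv = sprintf("%.2f", r),
       pv = format(s$coefficients[2, 4], digits = 3)))
label <- as.character(as.expression(eq))

# x = -Inf, y = Inf pins the text to the top-left panel corner;
# hjust/vjust nudge it inside the panel.
p <- ggplot(df, aes(x = x, y = y)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x) +
  annotate("text", x = -Inf, y = Inf, hjust = -0.05, vjust = 1.5,
           label = label, parse = TRUE)
p
```

Because the position is given relative to the panel edges, the same code should work unchanged for other subsets of the data.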

Related

How to put plotmath labels in ggplot facets

We often want individual regression equations in ggplot facets. The best way to do this is build the labels in a dataframe and then add them manually. But what if the labels contain plotmath, e.g., superscripts?
Here is a way to do it. The plotmath is converted to a string and then parsed by ggplot. The test_eqn function is taken from another Stack Overflow post; I'll link it when I find it again. Sorry about that.
library(ggplot2)
library(dplyr)
test_eqn <- function(y, x){
m <- lm(log(y) ~ log(x)) # fit y = a * x ^ b in log space
p <- exp(predict(m)) # model prediction of y
eq <- substitute(expression(Y==a~X^~b),
list(
a = format(unname(exp(coef(m)[1])), digits = 3),
b = format(unname(coef(m)[2]), digits = 3)
))
list(eq = as.character(eq)[2], pred = p)
}
set.seed(123)
x <- runif(20)
y <- runif(20)
test_eqn(x,y)$eq
#> [1] "Y == \"0.57\" ~ X^~\"0.413\""
data <- data.frame(x = x,
y = y,
f = sample(c("A","B"), 20, replace = TRUE)) %>%
group_by(f) %>%
mutate(
label = test_eqn(y,x)$eq, # add label
labelx = mean(x),
labely = mean(y),
pred = test_eqn(y,x)$pred # add prediction
)
# plot fits (use slice(1) to avoid multiple copies of labels)
ggplot(data) +
geom_point(aes(x = x, y = y)) +
geom_line(aes(x = x, y = pred), colour = "red") +
geom_text(data = slice(data, 1), aes(x = labelx, y = labely, label = label), parse = TRUE) +
facet_wrap("f")
Created on 2021-10-20 by the reprex package (v2.0.1)

Why the lm_eqn way of adding Regression line function did not work in my case?

I ran into a problem when I was writing a function in R.
I want to compare two variables in this function and draw the regression line of the comparison. I would also like to add information about the regression line, including the equation and the R^2. I have already tried the lm_eqn approach and it did not work in my case. Here is my code; I do not know why it fails, please help!
lm_eqn <- function(df){
m <- lm(y ~ x, df);
eq <- substitute(italic(y) == a + b %.% italic(x)*","~~italic(r)^2~"="~r2,
list(a = format(coef(m)[1], digits = 2),
b = format(coef(m)[2], digits = 2),
r2 = format(summary(m)$r.squared, digits = 3)))
as.character(as.expression(eq));
}
compareFunction <- function(my_dataset, var1, var2) {
ggplot(data = my_dataset,
aes(x = my_dataset[[var1]],
y = my_dataset[[var2]])) +
geom_point() +
geom_smooth(method = 'lm', formula = 'y ~ x') +
geom_text(x = 100, y = 100, label = lm_eqn(my_dataset), parse = TRUE)
}
OK, this is getting a bit tedious to explain in the comments.
So first I recommend adding some useful sample data:
x = runif(100)
y = runif(100)+x
df = data.frame(x,y)
Then update your lm_eqn function as follows - I removed the as.character from your return value.
lm_eqn <- function(df){
m <- lm(y ~ x, df);
eq <- substitute(italic(y) == a + b %.% italic(x)*","~~italic(r)^2~"="~r2,
list(a = format(coef(m)[1], digits = 2),
b = format(coef(m)[2], digits = 2),
r2 = format(summary(m)$r.squared, digits = 3)))
as.expression(eq);
}
The compareFunction I would change to use ggtitle:
compareFunction <- function(my_dataset, var1, var2) {
ggplot(data = my_dataset,
aes(x = my_dataset[[var1]],
y = my_dataset[[var2]])) +
geom_point() +
geom_smooth(method = 'lm', formula = 'y ~ x') +
ggtitle(lm_eqn(my_dataset))
}
Then compareFunction(df,"x","y") draws the plot with the regression equation as its title.
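Note that lm_eqn above always fits y ~ x, so it only matches the plot when the columns are literally named x and y. A sketch of a variant that fits the columns passed as var1 and var2 follows. The names lm_eqn_cols and compare_cols are hypothetical, and reformulate() is assumed to receive syntactic column names.

```r
library(ggplot2)

# Hypothetical helper: builds the label from the columns named in xvar and yvar.
lm_eqn_cols <- function(df, xvar, yvar) {
  m <- lm(reformulate(xvar, response = yvar), data = df)
  eq <- substitute(
    italic(y) == a + b %.% italic(x) * "," ~ ~italic(r)^2 ~ "=" ~ r2,
    list(a  = format(unname(coef(m)[1]), digits = 2),
         b  = format(unname(coef(m)[2]), digits = 2),
         r2 = format(summary(m)$r.squared, digits = 3)))
  as.character(as.expression(eq))
}

# Hypothetical wrapper: .data[[...]] looks the columns up by name inside aes().
compare_cols <- function(my_dataset, var1, var2) {
  ggplot(my_dataset, aes(x = .data[[var1]], y = .data[[var2]])) +
    geom_point() +
    geom_smooth(method = "lm", formula = y ~ x) +
    annotate("text", x = -Inf, y = Inf, hjust = -0.05, vjust = 1.5,
             label = lm_eqn_cols(my_dataset, var1, var2), parse = TRUE)
}

p <- compare_cols(mtcars, "wt", "mpg")
p
```

Using annotate() with infinite coordinates rather than fixed values such as x = 100, y = 100 keeps the label inside the panel regardless of the data range.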

How do I get the equation for a regression line in log-log plot in ggplot2?

I've a log-log plot, I got the regression line by using:
geom_smooth(formula = y ~ x, method='lm')
But now I'd like to obtain the equation of this line (e.g. y=a*x^(-b)) and print it. I managed to get it in a lin-lin plot but not in this case.
Here's the code:
mydataS<-data.frame(DurPeak_h[],IntPeak[],IntPeakxDurPeak[],ID[]) #df peak
names(mydataS)<-c("x","y","ID","IDEVENT")
plotID<-ggplot(mydataS, aes(x=x, y=y, label=IDEVENT)) +
geom_text(check_overlap = TRUE, hjust = 0, nudge_x = 0.02)+
geom_point(colour="black", size = 2) + geom_point(aes(colour = ID)) +
geom_quantile(quantiles = qs, colour="green")+
scale_colour_gradient(low = "white", high="red") +
scale_x_log10(limits = c(min(DurEnd_h),max(DurEnd_h))) +
scale_y_log10(limits = c(min(IntEnd),max(IntEnd))) +
geom_smooth(formula = y ~ x, method='lm')
ggsave(height=7,"plot.pdf")
mydataS<-data.frame(DurPeak_h[],IntPeak[],IntPeakxDurPeak[],ID[])
names(mydataS)<-c("x","y","ID","IDEVENT")
model <- lm(log10(y) ~ log10(x), data = mydataS) # fit in log10 space, as on the log-log plot
summary(model)
For y = a*x^b, the slope of this fit is the exponent b, and a is 10 raised to the intercept.
Did it with a workaround: using nls to calculate the two parameters a and b, precisely:
nlsPeak <- coef(nls(y ~ a*(x)^b, data = mydataS, start = list(a=30, b=-0.1)))
then plotting the line with annotate (see some examples here) and finally printing the equation using the function:
power_eqn = function(ds){
m = nls(y ~ a*x^b, start = list(a=30, b=-0.1), data = ds);
eq <- substitute(italic(y) == a ~italic(x)^b,
list(a = format(coef(m)[1], digits = 4),
b = format(coef(m)[2], digits = 2)))
as.character(as.expression(eq));
}
called as follows:
annotate("text",x = 3, y = 180,label = power_eqn(mydataS), parse=TRUE, col="black") +
Hope it helps!
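As an alternative to nls, here is a sketch that fits the line in log10 space, which is the space in which geom_smooth fits when both axes use log10 scales. The data are simulated stand-ins for mydataS, and the object names are illustrative only.

```r
library(ggplot2)

# Hypothetical stand-in data following y = 30 * x^(-0.5) with multiplicative noise.
set.seed(42)
d <- data.frame(x = 10^runif(50, 0, 2))
d$y <- 30 * d$x^(-0.5) * 10^rnorm(50, sd = 0.05)

# Linear fit on log10-transformed data: log10(y) = log10(a) + b * log10(x)
m <- lm(log10(y) ~ log10(x), data = d)
a <- unname(10^coef(m)[1])  # back-transform the intercept
b <- unname(coef(m)[2])     # the slope is the exponent

eq <- substitute(italic(y) == a ~ italic(x)^b,
                 list(a = format(a, digits = 3),
                      b = format(b, digits = 2)))
label <- as.character(as.expression(eq))

p <- ggplot(d, aes(x = x, y = y)) +
  geom_point() +
  scale_x_log10() +
  scale_y_log10() +
  geom_smooth(method = "lm", formula = y ~ x) +
  annotate("text", x = Inf, y = Inf, hjust = 1.1, vjust = 1.5,
           label = label, parse = TRUE)
p
```

The nls fit minimises residuals on the original scale, so its coefficients will generally differ somewhat from these.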

How to display different levels in a multilevel analysis with different colors

I am a beginner at multilevel analysis and am trying to understand how to draw graphs with the base-R plot functions. I understand the output of fit below, but I am struggling with the visualization. df is just some simple test data:
t <- seq(0, 10, 1)
df <- data.frame(t = t,
y = 1.5+0.5*(-1)^t + (1.5+0.5*(-1)^t) * t,
p1 = as.factor(rep(c("p1", "p2"), 10)[1:11]))
fit <- lm(y ~ t * p1, data = df)
# I am looking for an automated version of that:
plot(df$t, df$y)
lines(df$t[df$p1 == "p1"],
fit$coefficients[1] + fit$coefficients[2] * df$t[df$p1 == "p1"], col = "blue")
lines(df$t[df$p1 == "p2"],
fit$coefficients[1] + fit$coefficients[2] * df$t[df$p1 == "p2"] +
+ fit$coefficients[3] + fit$coefficients[4] * df$t[df$p1 == "p2"], col = "red")
It should know that it has to include p1 and that there are two lines.
The result should look like this:
Edit: Predict est <- predict(fit, newx = t) gives the same result as fit but still I don't know "how to cluster".
Edit 2 @Keith: The formula y ~ t * p1 reads y = (a + c * p1) + (b + d * p1) * t. For the "first blue line" c, d are both zero.
This is how I would do it. I'm including a ggplot2 version of plot as well because I find it better fitted for the way I think about plots.
This version will account for the number of levels in p1. If you want to account for additional model parameters, you will just have to adjust the way you construct xy to include all the relevant variables. I should point out that if you omit the newdata argument, predict returns the fitted values for the dataset provided to lm.
t <- seq(0, 10, 1)
df <- data.frame(t = t,
y = 1.5+0.5*(-1)^t + (1.5+0.5*(-1)^t) * t,
p1 = as.factor(rep(c("p1", "p2"), 10)[1:11]))
fit <- lm(y ~ t * p1, data = df)
xy <- data.frame(t = t, p1 = factor(rep(levels(df$p1), each = length(t)), levels = levels(df$p1))) # keep p1 a factor so levels(xy$p1) works
xy$fitted <- predict(fit, newdata = xy)
library(RColorBrewer) # for colors, you can define your own
cols <- brewer.pal(n = length(levels(df$p1)), name = "Set1") # feel free to ignore the warning
plot(x = df$t, y = df$y)
for (i in 1:length(levels(xy$p1))) {
tmp <- xy[xy$p1 == levels(xy$p1)[i], ]
lines(x = tmp$t, y = tmp$fitted, col = cols[i])
}
library(ggplot2)
ggplot(xy, aes(x = t, y = fitted, color = p1)) +
theme_bw() +
geom_point(data = df, aes(x = t, y = y)) +
geom_line()

2 polynomial regressions in a ggplot() graph

This is my Dataset:
As you can see, there are two quantitative variables (X, Y) and one categorical variable (molar, with two levels: M1, M2).
I would like to represent in one single graph two polynomial regressions and their respective prediction intervals: one for the M1 level and one for the M2 level. Each polynomial regression has its own degree (M1 is a degree-4 polynomial regression, and M2 is degree 6).
I want to use ggplot() function (which is in package ggplot2 in R). I have actually performed this figure but with all data merged (I mean, with no distinction between factors). This is the code I used:
# Fit a linear model
m <- lm(Y ~ X+I(X^2)+I(X^3)+I(X^4), data = Dataset)
# cbind the predictions to Dataset
mpi <- cbind(Dataset, predict(m, interval = "prediction"))
ggplot(mpi, aes(x = X)) +
geom_ribbon(aes(ymin = lwr, ymax = upr),
fill = "blue", alpha = 0.2) +
geom_point(aes(y = Y)) +
geom_line(aes(y = fit), colour = "blue", size = 1)
With this result:
So, I would like to have two polynomial regressions of different degree (one for M1 and one for M2), each with its own prediction interval. What would the exact code be?
UPDATE - New code! I run this code with no success:
M1=subset(Dataset,Dataset$molar=="M1",select=X:Y)
M2=subset(Dataset,Dataset$molar=="M2",select=X:Y)
M1.R <- lm(Y ~ X +I(X^2)+I(X^3)+I(X^4),
data=subset(Dataset,Dataset$molar=="M1",select=X:Y))
M2.R <- lm(Y ~ X +I(X^2)+I(X^3)+I(X^4),
data=subset(Dataset,Dataset$molar=="M2",select=X:Y))
newdf <- data.frame(x = seq(0, 1, c(408,663)))
M1.P <- cbind(data=subset(Dataset,Dataset$molar=="M1",select=X:Y), predict(M1.R, interval = "prediction"))
M2.P <- cbind(data=subset(Dataset,Dataset$molar=="M2",select=X:Y), predict(M2.R, interval = "prediction"))
p = cbind(as.data.frame(rbind(M1.P, M2.P)), f = factor(rep(1:2, c(408,663)), x = rep(newdf$x, 2))
mdf = with(Dataset, data.frame(x = rep(x, 2), y = c(subset(Dataset,Dataset$molar=="M1",select=Y), subset(Dataset,Dataset$molar=="M2",select=Y),
f = factor(rep(1:2, c(408,663))))
ggplot(mdf, aes(x = x, y = y, colour = f)) + geom_point() +
geom_ribbon(data = p, aes(x = x, ymin = lwr, ymax = upr,
fill = f, y = NULL, colour = NULL),
alpha = 0.2) +
geom_line(data = p, aes(x = x, y = fit))
These are the messages I get now:
[98] WARNING: Warning in if (n < 0L) stop("wrong sign in 'by' argument") :
the condition has length > 1 and only the first element will be used
Warning in if (n > .Machine$integer.max) stop("'by' argument is much too small") :
the condition has length > 1 and only the first element will be used
Warning in 0L:n :
numerical expression has 2 elements: only the first used
Warning in if (by > 0) pmin(x, to) else pmax(x, to) :
the condition has length > 1 and only the first element will be used
[99] WARNING: Warning in predict.lm(M1.R, interval = "prediction") :
predictions on current data refer to _future_ responses
[100] WARNING: Warning in predict.lm(M2.R, interval = "prediction") :
predictions on current data refer to _future_ responses
[101] ERROR: <text>
I think I am closer but still can't see it. Help!
Here is one way. If you have more than two models/levels in the factor you should look into code that will work over the levels of the factor and fit the models that way.
Anyway, first some dummy data:
set.seed(100)
x <- runif(100)
y1 <- 2 + (0.3 * x) + (2.4 * x^2) + (-2.5 * x^3) + (3.4 * x^4) + rnorm(100)
y2 <- -1 + (0.3 * x) + (2.4 * x^2) + (-2.5 * x^3) + (3.4 * x^4) +
(-0.3 * x^5) + (2.4 * x^6) + rnorm(100)
df <- data.frame(x, y1, y2)
Fit our two models:
m1 <- lm(y1 ~ poly(x, 4), data = df)
m2 <- lm(y2 ~ poly(x, 6), data = df)
Now predict at some new locations x and combine the predictions with x and f, a factor indexing the model, into a tidy format:
newdf <- data.frame(x = seq(0, 1, length = 100))
p1 <- predict(m1, newdata = newdf, interval = "prediction")
p2 <- predict(m2, newdata = newdf, interval = "prediction")
p <- cbind(as.data.frame(rbind(p1, p2)), f = factor(rep(1:2, each = 100)),
x = rep(newdf$x, 2))
Melt the original data into tidy form:
mdf <- with(df, data.frame(x = rep(x, 2), y = c(y1, y2),
f = factor(rep(1:2, each = 100))))
Draw the plot, using colour to distinguish the models/data:
ggplot(mdf, aes(x = x, y = y, colour = f)) +
geom_point() +
geom_ribbon(data = p, aes(x = x, ymin = lwr, ymax = upr,
fill = f, y = NULL, colour = NULL),
alpha = 0.2) +
geom_line(data = p, aes(x = x, y = fit))
This gets us a single plot with both fitted curves and their prediction bands.
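For more than two levels, the same idea can be written as a loop over the factor levels. The following is only a sketch under assumptions: the data are simulated, the column is called molar to mirror the question, and the degrees vector (one polynomial degree per level) is a hypothetical lookup table.

```r
library(ggplot2)

# Hypothetical stand-in for Dataset: one x/y pair per row plus a grouping column.
set.seed(100)
d <- data.frame(x = runif(200), molar = rep(c("M1", "M2"), each = 100))
d$y <- ifelse(d$molar == "M1", 2, -1) + 0.3 * d$x + 2.4 * d$x^2 +
  rnorm(200, sd = 0.3)

degrees <- c(M1 = 4, M2 = 6)  # assumed polynomial degree for each level
grid <- data.frame(x = seq(0, 1, length.out = 100))

# Fit one model per level and stack the predictions with their intervals.
preds <- do.call(rbind, lapply(names(degrees), function(lev) {
  deg <- degrees[[lev]]
  m <- lm(y ~ poly(x, deg), data = d[d$molar == lev, ])
  cbind(grid, molar = lev,
        predict(m, newdata = grid, interval = "prediction"))
}))

p <- ggplot(d, aes(x = x, y = y, colour = molar)) +
  geom_point() +
  geom_ribbon(data = preds, aes(x = x, ymin = lwr, ymax = upr, fill = molar),
              inherit.aes = FALSE, alpha = 0.2) +
  geom_line(data = preds, aes(y = fit))
p
```

Adding a level then only requires a new entry in degrees, with no copy-pasted model code.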

Resources