Adding the R-squared for a linear regression plot (ggplot2)? - r

I've been trying different suggestions, such as the ggpmisc package, but nothing seems to work for me.
I'm using the iris dataframe and just trying to plot random variables:
modellm <- lm(`Sepal.Length` ~ `Sepal.Width` + `Petal.Length` + `Petal.Width`, data = iris)
model <- coef(modellm)["(Intercept)"] +
  coef(modellm)["Sepal.Width"] * iris$`Sepal.Width` +
  coef(modellm)["Petal.Length"] * iris$`Petal.Length` +
  coef(modellm)["Petal.Width"] * iris$`Petal.Width` +
  residuals(modellm)
library(ggplot2)
ggplot(iris, aes(`Sepal.Length`, model))+
geom_point(size=2, alpha=0.2)+
geom_smooth(method='lm')
How is it possible for me to get the R-squared value plotted in the ggplot?

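If you want to display the R-squared value, just add this to the end of your plot:
+ annotate("text", x = 1, y = 1, label = paste0("R Squared = ", summary(modellm)$r.squared))
Adjust the placement with the x and y coordinates. Note that x = 1, y = 1 lies outside the range of this data (sepal lengths run from 4.3 to 7.9), so pick values inside the plotted range.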
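A slightly fuller sketch of the annotation approach, assuming the model from the question. The `plot_df` data frame and the `r2_label` name are illustrative and not part of the original post, and the sketch plots fitted values against observed ones, which is an assumption about what the plot is meant to show. The label is rounded and anchored to the top-left corner of the data range so that it does not stretch the axes:

```r
library(ggplot2)

# Model from the question.
modellm <- lm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width, data = iris)

# Illustrative data frame (not in the original post): observed vs. fitted values.
plot_df <- data.frame(observed = iris$Sepal.Length, fitted = fitted(modellm))

# Extract R-squared from the model summary and round it for display.
r2 <- summary(modellm)$r.squared
r2_label <- paste0("R Squared = ", round(r2, 3))

p <- ggplot(plot_df, aes(observed, fitted)) +
  geom_point(size = 2, alpha = 0.2) +
  geom_smooth(method = "lm", formula = y ~ x) +
  # Place the label at the top-left corner of the data range.
  annotate("text",
           x = min(plot_df$observed), y = max(plot_df$fitted),
           hjust = 0, vjust = 1, label = r2_label)
```

Printing `p` draws the plot; change `hjust`, `vjust`, `x`, and `y` to move the label.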

If you really want to plot the R^2, you could do something like this.
library(ggplot2)
p <- ggplot(iris, aes(`Sepal.Length`, model))+
geom_point(size=2, alpha=0.2)+
geom_smooth(method='lm')
r2 <- summary(modellm)$r.squared
p + scale_y_continuous(
sec.axis=sec_axis(~ . * 4 / 30 , name = expression(paste(R^{2})))) +
geom_rect(xmin=7.9, xmax=8, ymin=0, ymax=1*30/4,
fill="white", color="#78B17E") +
geom_rect(xmin=7.9, xmax=8, ymin=0, ymax=r2*30/4, fill="#78B17E") +
annotate("text", x = 7.95, y = 7.62, size=3, color="#78B17E",
label = paste0(round(r2, 2)))
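The factors 4/30 and 30/4 in the code above are inverses of each other: the secondary axis shows the primary y value multiplied by 4/30, so a value on the R^2 scale has to be multiplied by 30/4 to find where to draw it on the primary axis. A small check of that arithmetic, using hypothetical helper names that are not part of the answer:

```r
# Hypothetical helpers mirroring the scaling used in the sec_axis() call.
to_secondary <- function(y) y * 4 / 30   # primary y -> R^2 scale
to_primary   <- function(r) r * 30 / 4   # R^2 scale -> primary y

# R^2 = 1 is drawn at this height on the primary axis (top of the outline bar).
top_of_bar <- to_primary(1)

# Converting there and back should return the starting value.
round_trip <- to_secondary(to_primary(0.5))
```

If the primary axis range changes, both factors need to change together, otherwise the bar and the secondary axis labels will no longer agree.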

Related

ggplot2, introduce breaks on a x log scale

I have a plot like this:
p<-ggplot() +
geom_line(data= myData, aes(x = myData$x , y = myData$y)) +
scale_x_log10()+
scale_y_log10()
My x value is seq(9880000, 12220000, 10000)
There is only one break on the x-axis of the plot. What should I do to get at least 3 breaks on the x-axis?
Here is a fully reproducible example of the original poster's problem, where a log-scaled plot displays only one break value on the x-axis. I demonstrate three possible solutions below.
library(ggplot2)
# Create a reproducible example data.frame using R functions.
x = seq(9880000, 12220000, 10000)
# Use set.seed() so that anyone who runs this code
# will get the same sequence of 'random' values.
set.seed(31415)
y = cumsum(runif(n=length(x), min=-1e5, max=1e5)) + 1e6
dat = data.frame(x=x, y=y)
# Original poster's plot.
p1 = ggplot(data=dat, aes(x=x, y=y)) +
geom_line() +
scale_x_log10() +
scale_y_log10() +
labs(title="1. Plot has only one x-axis break.")
# Add extra x-axis breaks manually.
x_breaks = c(10^7.0, 10^7.04, 10^7.08)
p2 = ggplot(data=dat, aes(x=x, y=y)) +
geom_line() +
scale_x_log10(breaks=x_breaks) +
scale_y_log10() +
labs(title="2. Add some x-axis breaks manually.")
# Add extra x-axis breaks in semi-automated manner.
x_breaks = 10^pretty(log10(x))
x_labels = formatC(x_breaks, format = "e", digits = 2)
p3 = ggplot(data=dat, aes(x=x, y=y)) +
geom_line() +
scale_x_log10(breaks=x_breaks, labels=x_labels) +
scale_y_log10() +
labs(title="3. Create x-axis breaks with R functions.")
# Skip the log10 scale because the x-values don't span multiple orders of magnitude.
p4 = ggplot(data=dat, aes(x=x, y=y)) +
geom_line() +
scale_y_log10() +
labs(title="4. Check appearance without log10 scale for x-axis.")
library(gridExtra)
ggsave("example.png", plot=arrangeGrob(p1, p2, p3, p4, nrow=2),
width=10, height=5, dpi=150)
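The reason for the single break can be checked without plotting. The sketch below (variable names such as `log_span` and `visible` are illustrative) measures how wide the data is in log10 units and then counts how many of the semi-automatic breaks fall inside the data range:

```r
x <- seq(9880000, 12220000, 10000)

# Width of the data in decades; a value well under 1 means the default
# power-of-ten breaks can land inside the range at most once.
log_span <- diff(log10(range(x)))

# Semi-automatic breaks, as in plot 3: pretty values on the log10 scale,
# transformed back to the original scale.
x_breaks <- 10^pretty(log10(x))

# Only breaks inside the data range are drawn on the axis.
visible <- x_breaks[x_breaks >= min(x) & x_breaks <= max(x)]
```

Here `log_span` is far below one decade, which is also why plot 4 (no log scale on x) is worth considering.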
I would add: scale_x_log10(breaks=seq(9880000, 12220000, 1000000)).
This is my reproducible example:
library(random)
library(ggplot2)
z <- randomStrings(n=235, len=5, digits=TRUE, upperalpha=TRUE, loweralpha=TRUE, unique=TRUE, check=TRUE)
x <- seq(9880000, 12220000, 10000)
y <- randomNumbers(n=235, min=9880000, max=12220000, col=1)
df <- data.frame(z, x, y)
head(df)
V1 x V1.1
1 378VO 9880000 11501626
2 AStRK 9890000 10929705
3 sotp4 9900000 11305700
4 AS4DR 9910000 11302110
5 7iFdk 9920000 11611918
6 HIS7z 9930000 11175074
p<-ggplot() + geom_line(data= df, aes(x = df$x , y = df$V1.1)) + scale_y_log10()
p + scale_x_log10(breaks=seq(9880000, 12220000, 1000000))
Hope it is useful...
Add this between your parentheses: breaks=c(your, breaks, here)
For example, if you wanted breaks at 1, 10, and 100 (a log scale cannot show 0):
scale_x_log10(breaks=c(1, 10, 100))
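When choosing breaks, it helps to check what the break vector actually contains. `seq(from, to, by)` takes a step size as its third argument, so it is not a way to list individual values; `c()` is. A short sketch with illustrative variable names:

```r
# seq(0, 10, 100) means "from 0 to 10 in steps of 100", which is just 0.
single <- seq(0, 10, 100)

# To list specific break positions, use c(). All values must be positive
# on a log scale.
explicit <- c(1, 10, 100)

# seq() is still handy for evenly spaced breaks across a known range.
stepped <- seq(9880000, 12220000, 1000000)
```

Either `explicit` or `stepped` can be passed as `breaks=` to `scale_x_log10()`.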

Adding two Y axes to an xy plot

Working with this data in Rstudio. I need to run a simple regression of ed76 on lwage76 and a saturated regression that turns ed76 into a dummy variable for every level within the column. Then I need to plot both regressions in an XY plot with lwage76 as the Y axis and ed76 as the X axis. This is what I have so far:
regression <- lm(nlsdata$lwage76~nlsdata$ed76)
predicted <- data.frame(Edu =nlsdata$ed76, Wage = predict(regression))
aggplot <- aggregate(Wage ~ Edu, data=predicted, mean)
xyplot( Wage ~ Edu, data = aggplot, grid = TRUE, type = c("p","l"))
This gives me a very nice XY plot, but now I need to add the predicted values from my saturated model:
satreg <- lm(lwage76 ~ ed76*edu_1 + ed76*edu_2 + ed76*edu_3 +
ed76*edu_4 + ed76*edu_5 + ed76*edu_6 + ed76*edu_7 +
ed76*edu_8 + ed76*edu_9 + ed76*edu_10 + ed76*edu_11 +
ed76*edu_12 + ed76*edu_13 + ed76*edu_14 + ed76*edu_15 +
ed76*edu_16 + ed76*edu_17, data = nlsdata)
satmodel <- data.frame(Edu =nlsdata$ed76, Wage = predict(satreg))
So how do I add the second data set to the graph that I have?
Solution in ggplot:
ggplot(data=predicted, aes(Edu, Wage)) +
geom_line() +
geom_point() +
geom_line(data=satmodel, colour="blue") +
geom_point(data=satmodel, colour="blue")
Alternatively, you can label each of your tables and combine them into a single data.frame (mutate() and %>% come from dplyr):
library(dplyr)
satmodel <- satmodel %>% mutate(type="sat_model")
predicted <- predicted %>% mutate(type="predicted")
df <- rbind(satmodel, predicted)
ggplot(df, aes(Edu, Wage, colour=type)) +
geom_line() +
geom_point()
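For the saturated model itself, one common alternative to building the dummy columns by hand is to wrap the variable in `factor()`, which makes `lm()` create one indicator per level. The sketch below uses simulated data as a stand-in for `nlsdata` (the values are made up; only the column names come from the question). It also checks a useful property: the fitted values of a saturated model are the group means of the outcome.

```r
set.seed(1)

# Simulated stand-in for nlsdata; the numbers are invented for illustration.
toy <- data.frame(ed76 = sample(8:18, 200, replace = TRUE))
toy$lwage76 <- 5 + 0.05 * toy$ed76 + rnorm(200, sd = 0.3)

# Simple linear regression and saturated regression (one dummy per level).
linear_fit <- lm(lwage76 ~ ed76, data = toy)
sat_fit    <- lm(lwage76 ~ factor(ed76), data = toy)

# Predictions in the same shape as the question's data frames.
predicted <- data.frame(Edu = toy$ed76, Wage = unname(predict(linear_fit)))
satmodel  <- data.frame(Edu = toy$ed76, Wage = unname(predict(sat_fit)))

# Mean log wage within each education level, for comparison.
group_means <- ave(toy$lwage76, toy$ed76)
```

`predicted` and `satmodel` can then be passed to either of the plotting approaches above.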

Varying factor order in each facet of ggplot2

I am trying to create a Cleveland dot plot for two categories, in this case J and K. The problem is that the elements A, B, and C appear in both categories, which trips R up. I have made a simple example:
x <- c(LETTERS[1:10],LETTERS[1:3],LETTERS[11:17])
type <- c(rep("J",10),rep("K",10))
y <- rnorm(n=20,10,2)
data <- data.frame(x,y,type)
data
data$type <- as.factor(data$type)
nameorder <- data$x[order(data$type,data$y)]
data$x <- factor(data$x,levels=nameorder)
ggplot(data, aes(x=y, y=x)) +
geom_segment(aes(yend=x), xend=0, colour="grey50") +
geom_point(size=3, aes(colour=type)) +
scale_colour_brewer(palette="Set1", limits=c("J","K"), guide="none") +
theme_bw() +
theme(panel.grid.major.y = element_blank()) +
facet_grid(type ~ ., scales="free_y", space="free_y")
Ideally, I would want a dot plot for each category (J, K), with the factor x sorted in decreasing order of y within each one. What happens instead is that neither category runs from largest to smallest, and the order becomes erratic toward the end. Please help!
Unfortunately, a factor can only have one set of levels. The only way I've found to do this is to create two separate data.frames from your data and re-level the factor in each. For example:
data <- data.frame(
x = c(LETTERS[1:10],LETTERS[1:3],LETTERS[11:17]),
y = rnorm(n=20,10,2),
type= c(rep("J",10),rep("K",10))
)
data$type <- as.factor(data$type)
J<-subset(data, type=="J")
J$x <- reorder(J$x, J$y, max)
K<-subset(data, type=="K")
K$x <- reorder(K$x, K$y, max)
Now we can plot them with
ggplot(mapping = aes(x=y, y=x, xend=0, yend=x)) +
geom_segment(data=J, colour="grey50") +
geom_point(data=J, size=3, aes(colour=type)) +
geom_segment(data=K, colour="grey50") +
geom_point(data=K, size=3, aes(colour=type)) +
theme_bw() +
theme(panel.grid.major.y = element_blank()) +
facet_grid(type ~ ., scales="free_y", space="free_y")
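Another way to work around the one-set-of-levels limitation, sketched here as an alternative rather than as the answer's method, is to keep a single data.frame and give every (x, type) pair its own factor level, then strip the suffix when the axis labels are drawn. The `key` column and the `"__"` separator are hypothetical choices made for this example:

```r
library(ggplot2)

set.seed(42)
data <- data.frame(
  x    = c(LETTERS[1:10], LETTERS[1:3], LETTERS[11:17]),
  y    = rnorm(n = 20, 10, 2),
  type = rep(c("J", "K"), each = 10)
)

# One unique key per (x, type) pair, so "A" in J and "A" in K are
# different levels and can be ordered independently.
data$key <- paste(data$x, data$type, sep = "__")
data$key <- factor(data$key, levels = data$key[order(data$type, data$y)])

p <- ggplot(data, aes(x = y, y = key)) +
  geom_segment(aes(yend = key), xend = 0, colour = "grey50") +
  geom_point(size = 3, aes(colour = type)) +
  # Show only the original name on the axis, dropping the "__type" suffix.
  scale_y_discrete(labels = function(k) sub("__.*$", "", k)) +
  theme_bw() +
  theme(panel.grid.major.y = element_blank()) +
  facet_grid(type ~ ., scales = "free_y", space = "free_y")
```

With `scales = "free_y"`, each facet shows only its own keys, ordered by y within that facet.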

Grouping labels when x is a factor variable in ggplot2

I'm trying to replace the x-axis labels "A0" and "A1" by one "A" which can be placed in the middle of "A0" and "A1". It would be better if there is a method which works like the following question:
grouping of axis labels ggplot2
By that, I mean to redraw the x-axis only for each group, and leave a blank between groups.
Here is the code I'm working on:
y = 1*round(runif(20)*10,1)
x1 = c("A","B")
x2 = c(0,1)
x = expand.grid(x1,x2)
xy = cbind(x,y)
xy$z = paste(xy$Var1,xy$Var2,sep="")
p <- ggplot(xy, aes(x=factor(z), y=y,fill=factor(Var2)))
p + geom_boxplot() + geom_jitter(position=position_jitter(width=.2)) + theme_bw() + xlab("X") + ylab("Y") + scale_fill_discrete(name="Var2",breaks=c(0, 1),labels=c("T", "C"))
Try this. No need for the variable z, just use position="dodge":
p <- ggplot(xy, aes(x=factor(Var1), y=y,fill=factor(Var2)))
p + geom_boxplot(position="dodge") + geom_jitter(position=position_jitter(width=.2)) + theme_bw() + xlab("X") + ylab("Y") + scale_fill_discrete(name="Var2",breaks=c(0, 1),labels=c("T", "C"))

Error with ggplot2

I don't know what I am missing in this code:
set.seed(12345)
require(ggplot2)
AData <- data.frame(Glabel=LETTERS[1:7], A=rnorm(7, mean = 0, sd = 1), B=rnorm(7, mean = 0, sd = 1))
TData <- data.frame(Tlabel=LETTERS[11:20], A=rnorm(10, mean = 0, sd = 1), B=rnorm(10, mean = 0, sd = 1))
i <- 2
j <- 3
p <- ggplot(data=AData, aes(AData[, i], AData[, j])) + geom_point() + theme_bw()
p <- p + geom_text(aes(data=AData, label=Glabel), size=3, vjust=1.25, colour="black")
p <- p + geom_segment(data = TData, aes(xend = TData[ ,i], yend=TData[ ,j]),
x=0, y=0, colour="black",
arrow=arrow(angle=25, length=unit(0.25, "cm")))
p <- p + geom_text(data=TData, aes(label=Tlabel), size=3, vjust=1.35, colour="black")
The last line of the code produces the error. Please point out how to fix this problem. Thanks in advance.
I have no idea what you are trying to do, but the line that fails is the last line, because you haven't mapped new x and y variables in the mapping. geom_text() needs x and y coords but you only provide the label argument, so ggplot takes x and y from p, which has only 7 rows of data whilst Tlabel is of length 10. That explains the error. I presume you mean to plot at x = A and y = B of TData? If so, this works:
p + geom_text(data=TData, mapping = aes(A, B, label=Tlabel),
size=3, vjust=1.35, colour="black")
(This might get a better answer on the ggplot mailing list.)
It looks like you're trying to display some kind of biplot ... the root of your problem is that you're violating the idiom of ggplot, which wants you to specify variables in a way that's consistent with the scope of the data.
Maybe this does what you want, via some aes_string trickery that substitutes the names of the desired columns ...
varnames <- colnames(AData)[-1]
v1 <- varnames[1]
v2 <- varnames[2]
p <- ggplot(data=AData,
aes_string(x=v1, y=v2)) + geom_point() + theme_bw()
## took out redundant 'data', made size bigger so I could see the labels
p <- p + geom_text(aes(label=Glabel), size=7, vjust=1.25, colour="black")
p <- p + geom_segment(data = TData, aes_string(xend = v1, yend=v2),
x=0, y=0, colour="black",
arrow=arrow(angle=25, length=unit(0.25, "cm")))
## added colour so I could distinguish this second set of labels
p <- p + geom_text(data=TData,
aes(label=Tlabel), size=10, vjust=1.35, colour="blue")
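`aes_string()` is deprecated in recent ggplot2 releases (from 3.0.0 onward, as far as I know). A sketch of the same idea using the `.data` pronoun inside `aes()`, which looks a column up by its name in whichever data frame the layer uses. It assumes, as in the answer above, that both data frames share the column names held in `v1` and `v2`:

```r
library(ggplot2)

set.seed(12345)
AData <- data.frame(Glabel = LETTERS[1:7],   A = rnorm(7),  B = rnorm(7))
TData <- data.frame(Tlabel = LETTERS[11:20], A = rnorm(10), B = rnorm(10))

# Column names chosen programmatically, as in the answer above.
varnames <- colnames(AData)[-1]
v1 <- varnames[1]
v2 <- varnames[2]

p <- ggplot(AData, aes(x = .data[[v1]], y = .data[[v2]])) +
  geom_point() +
  theme_bw() +
  geom_text(aes(label = Glabel), size = 3, vjust = 1.25, colour = "black") +
  # Arrows from the origin to each row of TData.
  geom_segment(data = TData,
               aes(xend = .data[[v1]], yend = .data[[v2]]),
               x = 0, y = 0, colour = "black",
               arrow = grid::arrow(angle = 25, length = grid::unit(0.25, "cm"))) +
  # x and y are inherited from the plot and evaluated against TData here.
  geom_text(data = TData, aes(label = Tlabel),
            size = 3, vjust = 1.35, colour = "blue")
```

Because the inherited x and y mappings are re-evaluated on each layer's own data, the second `geom_text()` gets ten coordinates for its ten labels.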
