So I have a simple multi-plot/plotmatrix of the following form:
DATA_SE <- read.table("DEWRATES_SE_15-17.txt", sep = "\t", dec = ".", header = T)
multiplot_SE <- pairs(~SE_21+SE_25+SE_26, data = DATA_SE, main = "Tauraten_Selhausen")
multiplot_SE
Is there any way to add R-squared values (from a simple lm model) to each of my plots?
Thanks!
Update:
Is there a way to set fixed limits for the x- and y-axes of my plot panels?
I just need to set them all to the same range (the same for both x and y)!
You can do something like this (since you don't provide sample data I'm using the iris dataset to demonstrate):
panel.rsquared <- function(x, y, ...) {
  fit <- summary(lm(y ~ x))
  usr <- par("usr")
  on.exit(par(usr = usr))  # restore the panel's coordinate system on exit
  par(usr = c(0, 1, 0, 1)) # unit coordinates so the text can be centred
  text(0.5, 0.5, sprintf(
    "R squared = %4.3f\n Adj. R squared = %4.3f",
    fit$r.squared,
    fit$adj.r.squared))
}
pairs(iris[, -ncol(iris)], upper.panel = panel.rsquared)
Update
In response to your comment, you can define any upper/lower panel function to meet your needs.
For example, you could do something like what I show below. Mind you, this is not very useful, as it will be difficult (if not impossible) to keep the text from overlapping the points. The whole idea (and strength) of pairs is that you can configure the upper panel to show annotation/text and the lower panel to show the plots. That way you avoid redundancy (in your original post every plot appears twice).
Anyway, for what it's worth:
panel.plot_withrsquared <- function(x, y, ...) {
  points(x, y, ...)
  fit <- lm(y ~ x)
  usr <- par("usr")
  on.exit(par(usr = usr))  # restore the data coordinates on exit
  par(usr = c(0, 1, 0, 1)) # unit coordinates for placing the label
  text(0.1, 0.8,
       sprintf("R squared = %4.3f", summary(fit)$r.squared),
       adj = 0, cex = 0.8)
}
pairs(
iris[, -ncol(iris)],
upper.panel = panel.rsquared,
lower.panel = panel.plot_withrsquared)
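On the second update (one common axis range for every panel): pairs() forwards extra arguments to the plot() call that sets up each panel, so passing xlim and ylim should give all panels the same range. A minimal sketch using iris again; the limits c(0, 8) are an assumption chosen to cover all four measurements, so replace them with a range that fits your own columns:

```r
# Assumed common range; pick one that covers every column of your own data
lims <- c(0, 8)

pairs(iris[, -ncol(iris)],
      xlim = lims, ylim = lims,   # same x- and y-range in every panel
      main = "iris with fixed axis limits")
```

Custom upper/lower panel functions such as the ones above can be combined with these arguments, since they only draw into the coordinate system that pairs() has already set up.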
Related
I would like to make a corrplot, but instead of the correlation coefficient it would display the slope of a linear regression between each pair of variables.
If possible, it would behave like the corrplot function and show which slopes are significant. To make the variables comparable, I guess it would be preferable to normalise all the slopes.
I want to do that because I sometimes have a poor correlation/R2 but still a significant slope. So having both the correlation matrix and the "slope" matrix would be great.
Do you know if there is an existing function like this? Or how to do it?
Thank you.
EDIT :
Here is a link explaining why I have a difference between the slope and R2/correlation : https://statisticsbyjim.com/regression/low-r-squared-regression/
Here is an example of what I get using corrplot. And what I would like to do is a similar function but with the slope instead of the correlation.
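library(corrplot)
M <- cor(mtcars)
# Significance tests are computed from the data, not from the correlation matrix
test <- cor.mtest(mtcars, conf.level = 0.95)
corrplot(M, order = "hclust", tl.col = "black",
         p.mat = test$p, sig.level = 0.10)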
Here you have points with the best fit line (lower panel) and the regression parameters (upper panel):
# Panel showing the regression equation
panel.corr <- function(x, y, ...) {
  usr <- par("usr"); on.exit(par(usr = usr))
  par(usr = c(0, 1, 0, 1))
  # Regress y on x so the coefficients match the printed "y = a + b*x" label
  cf <- round(coef(lm(y ~ x)), 3)
  txt <- paste0("y=", cf[1], " + (", cf[2], ")*x")
  text(0.5, 0.5, txt, cex = 1)
}
#Panel of histograms
panel.hist <- function(x, ...){
usr <- par("usr"); on.exit(par(usr))
par(usr = c(usr[1:2], 0, 1.5) )
h <- hist(x, plot = FALSE)
breaks <- h$breaks
len <- length(breaks)
y <- h$counts/max(h$counts)
rect(breaks[-len], 0, breaks[-1], y, col = "lightblue")
}
# Panel of scatterplots with the least-squares line
panel.scat <- function(x, y, ...) {
  points(x, y, ...)   # draw into the panel that pairs() has already set up
  abline(lm(y ~ x))
}
#Plot
pairs(mtcars[, c(1,3:7)],
lower.panel = panel.scat,
upper.panel = panel.corr,
diag.panel = panel.hist,
gap = 0.3,
main = "Scatterplot matrix of `mtcars`")
Following the tutorial on this page and to answer your question :
library(tidyverse)
library(ggpubr)
theme_set(theme_pubr())
# Load the data
data("marketing", package = "datarium")
head(marketing, 4)
ggplot(marketing, aes(x = youtube, y = sales)) +
geom_point() +
stat_smooth()
cor(marketing$sales, marketing$youtube)
model <- lm(sales ~ youtube, data = marketing)
model
The output of calling model is :
##
## Call:
## lm(formula = sales ~ youtube, data = marketing)
##
## Coefficients:
## (Intercept)      youtube
##      8.4391       0.0475
And there is the information you're looking for:
The intercept is quite self-explanatory.
The slope is the value of the youtube coefficient here.
If you are working on a multiple regression, you need to take into account all the coefficients from your model formula.
If you just want to compare two features for which you previously computed the correlation, swap them into the formula and you'll get a simple regression model for them. I'd advise you to check the assumptions of linear regression first, just in case...
Hope it helps.
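As a rough sketch of the "slope matrix" idea from the question: fit a simple regression for each ordered pair of variables and collect the slopes and their p-values. The variable subset and the names vars, z, slopes and pvals are my own choices for illustration. The variables are standardized with scale() first so the slopes are comparable; note that with standardized variables the simple-regression slope equals the Pearson correlation, so drop scale() if you want slopes in the original units.

```r
vars <- c("mpg", "disp", "hp", "wt")        # assumed subset of mtcars
z <- as.data.frame(scale(mtcars[, vars]))   # standardize: mean 0, sd 1

slopes <- matrix(NA_real_, length(vars), length(vars),
                 dimnames = list(response = vars, predictor = vars))
pvals <- slopes

for (resp in vars) {
  for (pred in vars) {
    if (resp == pred) next                  # skip regressing a variable on itself
    fit <- summary(lm(z[[resp]] ~ z[[pred]]))
    slopes[resp, pred] <- fit$coefficients[2, "Estimate"]
    pvals[resp, pred]  <- fit$coefficients[2, "Pr(>|t|)"]
  }
}

round(slopes, 3)
round(pvals, 4)
```

Once the diagonal is filled in, the two matrices could be handed to a plotting function in the same way the question's example passes M and p.mat to corrplot().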
There's a nice answer around to plot a miniature plot within a plot. I wrapped it in a function which works fine for a single plot.
myPlot <- function(x, y) {
# main plot
plot(x)
# calculate position of inset
pp <- par("plt")
x0 <- pp[2] - (pp[2] - pp[1]) * 0.225
x1 <- pp[2] - .01
y0 <- pp[4] - (pp[4] - pp[3]) * 0.225
y1 <- pp[4] - .01
# set position for inset
op <- par(fig=c(x0, x1, y0, y1), mar=c(0, 0, 0, 0), new=TRUE)
# add inset grey background
plot.new()
u <- par("usr")
rect(u[1], u[3], u[2], u[4], col="grey80")  # usr is c(x1, x2, y1, y2)
# add inset
par(new=TRUE)
plot(y, col=2)
par(op)
}
myPlot(x, y)
However, when I use Map to loop over several data lists in order to make multiple plots of this type side by side, the par settings seem to get mixed up. The miniature appears as a new plot and not within the main plot. Also, a new device is opened after one iteration (i.e. the old plot gets overwritten).
op1 <- par(mfrow=c(1, 2))
Map(function(x, y) myPlot(x, y), list(d0, d0), list(d0_inset, d0_inset))
par(op1)
When I use Map(function(x, y) myPlot(x, y), list(d0, d0), list(d0_inset, d0_inset)) alone, though, there are two perfect plots in the plot queue (of RStudio). Thus the plot.new() and par(new=TRUE) might not be the issue here.
What I actually want is this:
When used with Map, myPlot() should produce as many main plots (each with its miniature inside) as there are elements in the data lists, and they should fit into the layout set by par(mfrow=...).
Does anyone have a clue how to solve this using base R functionalities?
Data:
x <- data.frame(x = rnorm(150, sd=5), y = rnorm(150, sd=5))
y <- data.frame(x = rnorm(1500, sd=5), y = rnorm(1500, sd=5))
There's a couple of points here Jay. The first is that if you want to continue to use mfrow then it's best to stay away from using par(fig = x) to control your plot locations, since fig changes depending on mfrow and also forces a new plot (though you can override that, as per your question). You can use plt instead, which makes all co-ordinates relative to the space within the fig co-ordinates.
The second point is that you can plot the rectangle without calling plot.new()
The third, and maybe most important, is that you only need to write to par twice: once to change plt to the new plotting co-ordinates (including a new = TRUE to plot it in the same window) and once to reset plt (since new will reset itself). This means the function is well behaved and leaves the par as they were.
Note I have added a parameter, at, that allows you to specify the position and size of the little plot within the larger plot. It uses normalized co-ordinates, so for example c(0, 0.5, 0, 0.5) would be the bottom left quarter of the plotting area. I have set it to default at somewhere near your version's location.
myPlot <- function(x, y, at = c(0.7, 0.95, 0.7, 0.95))
{
# Helper function to simplify co-ordinate conversions
space_convert <- function(vec1, vec2)
{
vec1[1:2] <- vec1[1:2] * diff(vec2)[1] + vec2[1]
vec1[3:4] <- vec1[3:4] * diff(vec2)[3] + vec2[3]
vec1
}
# Main plot
plot(x)
# Gray rectangle
u <- space_convert(at, par("usr"))
rect(u[1], u[3], u[2], u[4], col="grey80")
# Only write to par once for drawing insert plot: change back afterwards
plt <- par("plt")
plt_space <- space_convert(at, plt)
par(plt = plt_space, new = TRUE)
plot(y, col = 2)
par(plt = plt)
}
So we can test it with:
x <- data.frame(x = rnorm(150, sd = 5), y = rnorm(150, sd = 5))
y <- data.frame(x = rnorm(1500, sd = 5), y = rnorm(1500, sd = 5))
myPlot(x, y)
par(mfrow = c(1, 2))
myPlot(x, y)
myPlot(x, y)
par(mfrow = c(2, 2))
for(i in 1:4) myPlot(x, y)
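To illustrate the first point (fig depends on mfrow, while plt is relative to the current figure), here is a small check that records both parameters for two side-by-side plots. It is only a sketch: the variable names are made up, and it draws to a null pdf device so it runs without opening a window.

```r
pdf(NULL)                       # off-screen device; no file is written
par(mfrow = c(1, 2))

plot(1:10)
fig_left <- par("fig"); plt_left <- par("plt")

plot(1:10)
fig_right <- par("fig"); plt_right <- par("plt")

dev.off()

fig_left                        # device-relative: left half of the device
fig_right                       # device-relative: right half of the device
all.equal(plt_left, plt_right)  # figure-relative: the same in both panels
```

Because plt stays the same from panel to panel, an inset position computed from it can be reused unchanged in every cell of the layout.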
I'm new to R and working on some code that outputs a scatter plot matrix. The data frame is in the following format:
A B C D
2 3 0 5
8 9 5 4
0 0 5 3
7 0 0 0
My data sets can run into the 100-1000s of rows and 10-100s of columns, with a wide scale of values (hence log transforming my data).
This bit of code gives me some partial success in enhancing the basic plot (see embedded image):
panel.cor <- function(x, y, digits = 2, prefix = "", cex.cor, ...)
{
usr <- par("usr"); on.exit(par(usr))
par(usr = c(0, 1, 0, 1), xlog = FALSE, ylog = FALSE)
r <- abs(cor(x, y))
txt <- format(c(r, 0.123456789), digits = digits)[1]
txt <- paste(prefix, txt)
if(missing(cex.cor)) cex.cor <- 0.8/strwidth(txt)
text(0.5, 0.5, txt, cex = cex.cor * r)
}
# Add regression line to plots.
my_line <- function(x,y,...){
points(x,y,...)
LR <- lm(log(x) ~ log(y), data = SP)
abline(LR, col = "red", untf = TRUE)
}
# Plot scatter plot matrices.
pairs(mydataframe, pch = 20, main = "test",
cex = 0.125, cex.labels = 1,
xlim = c(100, 1e9),
ylim = c(100, 1e9),
upper.panel = panel.cor,
lower.panel = my_line,
log = "xy")
[Image: example]
Problem 1 - instead of getting R^2 values in the upper panel, I get NAs instead. How can I correct this?
Problem 2 - I'd like to remove the function for adjusting text size of R^2 value in proportion to correlation. I know it's in panel.cor but not sure which part will need removal or adjustment.
Many thanks in advance
EDIT: 08/06/2016
I have found a workaround which also simplifies the code:
panel.cor <- function(x, y, digits = 2, cex.cor, ...)
{
usr <- par("usr"); on.exit(par(usr))
par(usr = c(0, 1, 0, 1))
# correlation coefficient
r <- cor(x, y)
txt <- format(c(r, 0.123456789), digits = digits)[1]
txt <- paste("r= ", txt, sep = "")
text(0.5, 0.6, txt)
}
# add regression line to plots.
my_line <- function(x, y, ...)
{
  points(x, y, ...)
  LR <- lm(y ~ x)  # regress the vertical variable on the horizontal one
  abline(LR, col = "red")
}
# Plot scatterplot matrices.
pairs(SP, pch = 20, main = "test",
cex = 0.125, cex.labels = 1,
upper.panel = panel.cor,
lower.panel = my_line)
[Image: example 2]
The issue appears to be missing values, i.e. 0's. I initially change these to NA's so I can use a log scale. This, in combination with the log transformation, leads to missing R^2 values in the upper panel.
Ideally I'd like to have a log scale. Is there a way I can do this without introducing the aforementioned issue?
Clarification - I'd like a log (xy) scale in the scatter plots (lower panel) and for the x-axis in the histograms (diagonal panel). I've been playing about with it today but can't quite get it as I want. Perhaps I'm asking too much from pairs. Any help would be appreciated.
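The NAs in the upper panel are consistent with how cor() treats missing values: by default a single NA in either vector makes the result NA. A minimal sketch with made-up numbers (not the real data), showing the default behaviour and the use argument that restricts the calculation to complete pairs; the same argument could presumably be added to the cor() call inside panel.cor:

```r
# Toy vectors (made up for illustration); zeros stand in for missing measurements
x <- c(2, 8, 0, 7, 5)
y <- c(3, 9, 4, 0, 6)

# Zeros cannot be shown on a log scale, so recode them as NA
x[x == 0] <- NA
y[y == 0] <- NA

r_default  <- cor(log10(x), log10(y))                                 # NA: missing values propagate
r_complete <- cor(log10(x), log10(y), use = "pairwise.complete.obs")  # complete pairs only

r_default
r_complete
```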
Edit: 10/06/2016
Success!....well approximately 99% happy.
I have made changes - added histograms to diagonal panel and p-value to upper panel (the base code in "pairs()" for adding the histogram needed adjustment due to the log scale used on the x-axis). Please feel free to correct my descriptions if they're not accurate or correct:
library(lattice)
DF <- read.csv("File location", header = TRUE)
DF.1 <- DF + 1 # Add 1 to every value so zeros survive the log scale; otherwise plot errors arise due to missing values.
# Function to calculate the correlation coefficient r & its p-value for upper panels in pairs() - scatterplot matrices.
panel.cor <- function(x, y, digits = 3, cex.cor, ...)
{
  usr <- par("usr"); on.exit(par(usr))
  par(usr = c(0, 1, 0, 1), xlog = FALSE, ylog = FALSE) # xlog/ylog: ensures that r and p-values display in upper panel.
  # Calculate correlation coefficient and add to upper panel.
  r <- cor(x, y)
  txt <- format(c(r, 0.123456789), digits = digits)[1]
  txt <- paste("r= ", txt, sep = "")
  text(0.5, 0.7, txt, cex = 1.25) # First 2 arguments determine position of r in upper panel cells.
  # Calculate p-value and add to upper panel.
  p <- cor.test(x, y)$p.value
  txt2 <- format(c(p, 0.123456789), digits = digits)[1]
  txt2 <- paste("p= ", txt2, sep = "")
  if (p < 0.01) txt2 <- "p= <0.01"
  text(0.5, 0.3, txt2, cex = 1.25) # First 2 arguments determine position of p-value in upper panel cells.
}
# Function to calculate frequency distribution and plot histogram in diagonal plot.
panel.hist <- function(x, ...)
{
usr <- par("usr"); on.exit(par(usr))
par(usr = c(0.5, 1.5, 0, 1.75), xlog = TRUE, ylog = FALSE) # xlog argument allows log x-axis when called in pairs.
h <- hist(log(x), plot = FALSE, breaks = 20)
breaks <- h$breaks; nB <- length(breaks)
y <- h$counts; y <- y/max(y)
rect(breaks[-nB], 0, breaks[-1], y, col = "cyan")
}
# add regression line to plots.
my_line <- function(x, y, ...)
{
  points(x, y, ...)
  # Fit in log10 space; on log-log axes abline() then draws a straight line.
  LR <- lm(log10(y) ~ log10(x))
  abline(LR, col = "red")
}
# Plot scatterplot matrices.
pairs(DF.1, pch = 20, main = "Chart Title",
cex = 0.75, cex.labels = 1.5, label.pos = 0.0001,
upper.panel = panel.cor,
lower.panel = my_line,
diag.panel = panel.hist,
log = ("xy"),
xlim = c(5, 1e9),
ylim = c(5, 1e9))
The fly in the ointment:
1 - the text labels in the diagonal panel only partially appear. I used a decreasing value for the "label.pos" argument in "pairs()" which moved the label down until they appeared. However, they won't move anymore no matter how much I decrease that value. I've tried to coerce the position from the histogram function, but that doesn't work. I hope someone can see what I'm missing. Thanks in advance...I've not had any responses yet:(
PS: I tried to link 3rd image with my successful plot but I was foiled by my lack of reputation...groan.
EDIT: 13/06/2016
Solved! I feel a bit foolish. The fix for the positioning of the variable labels in the diagonal panel was super simple and I spent a long time trying much more complex ways to do this. The "label.pos" argument in pairs should be negative! I used a small value of -0.0675 which placed the label near the top of the cell containing the histogram.
I hope someone else finds this useful. I'll mark as solved but I'd appreciate any comments regarding my code commenting or if someone sees a way of making the code more efficient. Thanks Alex
Sometimes I feel totally dense. Answering my own question... who would have thought... slaps head. Please see the edits in my post for the fixes I found.
Similar to a previous post, I'd like to modify the following code (from the example in the R documentation for the pairs() command):
## put (absolute) correlations on the upper panels,
## with size proportional to the correlations.
panel.cor <- function(x, y, digits = 2, prefix = "", cex.cor, ...)
{
usr <- par("usr"); on.exit(par(usr))
par(usr = c(0, 1, 0, 1))
r <- abs(cor(x, y))
txt <- format(c(r, 0.123456789), digits = digits)[1]
txt <- paste0(prefix, txt)
if(missing(cex.cor)) cex.cor <- 0.8/strwidth(txt)
text(0.5, 0.5, txt, cex = cex.cor * r)
}
pairs(USJudgeRatings, lower.panel = panel.smooth, upper.panel = panel.cor)
Instead of a loess line, I want a line of identity for each plot. The secret lies in the "panel.smooth" function, but I don't know how to modify it.
I think you just mean something like this:
my_line <- function(x,y,...){
points(x,y,...)
abline(a = 0,b = 1,...)
}
pairs(USJudgeRatings, lower.panel = my_line, upper.panel = panel.cor)
Or, if you want to plot a fitted, linear line, then you could modify joran's answer:
my_line <- function(x,y,...){
points(x,y,...)
abline(lm(y ~ x), ...)
}
If you are indeed using pairs, however, loess seems more appropriate: you are likely still exploring the dataset, and fitting a straight line is extraneous at that point.
I'm trying to plot a huge matrix of correlation coefficients, and currently, my plot looks like this:
Notice that some cells are missing correlation coefficients (ignore for now the lack of symmetry in the plot, unless you happen to know why that's the case, too). I believe that the values are not, in fact, missing, but simply too small to appear, because they are scaled by the value of their correlation coefficient.
Looking at the documentation for chart.Correlation(), I was able to find a function from which much of the content of chart.Correlation() was modeled:
panel.cor <- function(x, y, digits=2, prefix="", cex.cor)
{
usr <- par("usr"); on.exit(par(usr))
par(usr = c(0, 1, 0, 1))
r <- abs(cor(x, y))
txt <- format(c(r, 0.123456789), digits=digits)[1]
txt <- paste(prefix, txt, sep="")
if(missing(cex.cor)) cex <- 0.8/strwidth(txt)
test <- cor.test(x,y)
# borrowed from printCoefmat
Signif <- symnum(test$p.value, corr = FALSE, na = FALSE,
cutpoints = c(0, 0.001, 0.01, 0.05, 0.1, 1),
symbols = c("***", "**", "*", ".", " "))
text(0.5, 0.5, txt, cex = cex * r)
text(.8, .8, Signif, cex=cex, col=2)
}
pairs(USJudgeRatings[,c(2:3,6,1,7)], lower.panel=panel.smooth, upper.panel=panel.cor)
If I change:
text(0.5, 0.5, txt, cex = cex * r)
To:
text(0.5, 0.5, txt, cex = 0.8)
I get roughly the effect I'm looking for. The problem is that I have no idea how to change this parameter using chart.Correlation() itself. Does this issue make sense?
Here's a hack that modifies the function to allow what you want. This adds a user-settable exponent argument to the function that allows you to change the cex value to cex*[correlation]^cex.cor.scale.
It spits out warnings because of the way ... is handled; they're annoying but harmless.
It would probably be best to contact the maintainer and ask them if they'd be willing to enhance the function, or start creating your own modified version of the package.
edit: slightly more robust changes to the relevant lines
library("PerformanceAnalytics")
## turn the function into a character string
tmpstr <- deparse(chart.Correlation)
## modify the relevant lines
panelcorline <- grep("^ *panel.cor",tmpstr)
tmpstr[panelcorline] <- paste(tmpstr[panelcorline],"cex.cor.scale=1,")
rscaleline <- grep("^ *text\\(0.5",tmpstr)
tmpstr[rscaleline] <- gsub("cex \\* r","cex*r^cex.cor.scale",tmpstr[rscaleline])
## convert back to a function (don't mask the original function)
my.chart.Correlation <- eval(parse(text=tmpstr))
Test it out:
data(managers)
chart.Correlation(managers[,1:8], histogram=TRUE, pch="+")
## no scaling
my.chart.Correlation(managers[,1:8], histogram=TRUE, pch="+",cex.cor.scale=0)
## enhanced scaling
my.chart.Correlation(managers[,1:8], histogram=TRUE, pch="+",cex.cor.scale=2)
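If patching the package function feels too fragile, another option is to skip chart.Correlation() and call pairs() with your own panel function that prints the coefficient at a constant size, which is essentially the one-line change shown in the question. A sketch under that assumption; cor_label and panel.cor.fixed are hypothetical names, and USJudgeRatings is used only because it ships with R:

```r
# Format the absolute correlation as a fixed-width label
cor_label <- function(x, y, digits = 2) {
  r <- abs(cor(x, y, use = "pairwise.complete.obs"))
  formatC(r, format = "f", digits = digits)
}

# Upper panel: correlation text at a constant size (cex does not depend on r)
panel.cor.fixed <- function(x, y, cex.cor = 0.8, ...) {
  usr <- par("usr")
  on.exit(par(usr = usr))   # restore the panel's coordinates on exit
  par(usr = c(0, 1, 0, 1))
  text(0.5, 0.5, cor_label(x, y), cex = cex.cor)
}

pairs(USJudgeRatings[, c(2:3, 6, 1, 7)],
      lower.panel = panel.smooth,
      upper.panel = panel.cor.fixed)
```

This loses the histograms and significance stars that chart.Correlation() adds, so it is a trade-off rather than a drop-in replacement.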