Adding two regression graphs having problem in dots and lines - r

I cannot find my answer anywhere.
I have two regression lines from two different datasets. I am trying to put these two regression lines in one graph. The following worked well.
regression1<-lm(Y ~ X, data = mydata1)
regression2<-lm(Y ~ X, data = mydata2)
abline(regression1)
abline(regression2)
However, this plot shows only the lines and no dots. So I ran:
regression1<-lm(Y ~ X, data = mydata1)
regression2<-lm(Y ~ X, data = mydata2)
plot(c(0,2),c(0,2),type="n") +
points(rnorm(200), rnorm(200), col = "red")
abline(regression1)
abline(regression2)
With this command I get only the dots and no lines, so it still does not work for me. What I want is one graph with two different lines (each regression's fitted line) together with the data points of both regressions, each in a different colour. Any help would be appreciated. Thanks

When you plot something with base R graphics and it doesn't show up in the graph, the odds are that you have plotted it outside the plot area. In the example below I make sure that everything is inside the plot area by first computing appropriate x and y axis limits.
The first two code lines do the trick.
rangeX <- range(c(mydata1$X, mydata2$X))
rangeY <- range(c(mydata1$Y, mydata2$Y))
regression1 <- lm(Y ~ X, data = mydata1)
regression2 <- lm(Y ~ X, data = mydata2)
plot(rangeX, rangeY, type = "n", xlab = "X", ylab = "Y")
with(mydata1, points(X, Y, col = "red"))
with(mydata2, points(X, Y, col = "blue"))
abline(regression1, col = "red")
abline(regression2, col = "blue")
Data creation code.
set.seed(1234)
n <- 20
x <- seq_len(n) + rnorm(n)
mydata1 <- data.frame(X = x, Y = x + rnorm(n))
x <- seq_len(n) + rnorm(n)
mu <- 3
mydata2 <- data.frame(X = x + rnorm(n), Y = mu + x + rnorm(n))
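The "plotted outside the plot area" diagnosis can also be checked numerically before drawing anything. The following is a minimal sketch and not part of the original answer: `line_visible` is a hypothetical helper name, and the first data set is regenerated in the same way as in the data creation code so that the block runs on its own.

```r
# Regenerate the first example data set so this block is self-contained.
set.seed(1234)
n <- 20
x <- seq_len(n) + rnorm(n)
mydata1 <- data.frame(X = x, Y = x + rnorm(n))
regression1 <- lm(Y ~ X, data = mydata1)

# Hypothetical helper (not from the answer): TRUE if the fitted straight line
# passes through the rectangle xlim x ylim, i.e. abline() would draw something.
line_visible <- function(fit, xlim, ylim) {
  # Fitted values at the left and right edges of the plot region
  ends <- predict(fit, newdata = data.frame(X = xlim))
  # A straight line takes every value between its two end points, so it
  # crosses the rectangle exactly when the two y intervals overlap.
  min(ends) <= max(ylim) && max(ends) >= min(ylim)
}

# Limits taken from the data contain at least part of the line ...
in_data_box <- line_visible(regression1, range(mydata1$X), range(mydata1$Y))
# ... while an arbitrary window far away from the data does not.
in_far_box <- line_visible(regression1, c(0, 2), c(100, 102))
print(c(in_data_box, in_far_box))
```

For a plot that is already on screen, `par("usr")` returns the current limits as `c(x1, x2, y1, y2)`, so the same check can be run with `xlim = par("usr")[1:2]` and `ylim = par("usr")[3:4]`.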

Related

Use lattice xyplot for grouped data with points and lines

I wish to make a single-panel graph in lattice that shows data (y) from several groups (g) with superimposed lines showing predicted values (y_pred). I generate example data below:
d <- data.frame(x = rep(1:100,2), g = factor(rep(c('a','b'), each = 100)))
d$y_pred <- with(d, -0.1*x + 0.001*x^2)
d$y_pred <- with(d, ifelse(g=='a', y_pred+2,y_pred))
d$y <- d$y_pred + rnorm(nrow(d),0,1)
Using type=c('p','l') together with distribute.type=TRUE does not work, nor does my attempt at writing a panel function:
xyplot(y + y_pred ~ x, data = d,
groups = g,
panel = panel.superpose,
panel.groups=function(...){
panel.xyplot(x, y, type='p')
panel.xyplot(x, y_pred, type='l')
}
)
What should I do here?
OK, you can do this with latticeExtra:
library(latticeExtra)
dat <- xyplot(y ~ x, data=d,
groups = g,
type="p"
)
dat
dat + layer(panel.xyplot(x=x, y=y_pred,
groups = g,
type="l",
subscripts=TRUE),
data=d)
But this is really finicky and non-robust. The 'dat + layer()' code renders properly if it is the first thing I run after starting R, but after that it frequently renders with the lines for several groups missing.
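One possible way to stay within lattice alone is sketched below. It is an assumption-laden alternative, not the approach from the thread: a custom panel function calls `panel.superpose` twice, once to draw `y` as points and once to draw `y_pred` as lines, and it indexes `y_pred` with `subscripts` so that the values line up with the rows lattice hands to the panel. The sketch reads `y_pred` from the data frame `d` in the enclosing environment, and the example data are regenerated with a seed so the block runs on its own.

```r
library(lattice)

# Example data as in the question, with a seed for reproducibility
set.seed(1)
d <- data.frame(x = rep(1:100, 2), g = factor(rep(c("a", "b"), each = 100)))
d$y_pred <- with(d, -0.1 * x + 0.001 * x^2 + ifelse(g == "a", 2, 0))
d$y <- d$y_pred + rnorm(nrow(d), 0, 1)

p <- xyplot(y ~ x, data = d, groups = g,
  panel = function(x, y, subscripts, groups, ...) {
    # Observed values: one point colour per group
    panel.superpose(x, y, subscripts = subscripts, groups = groups,
                    type = "p", ...)
    # Predicted values: subscripts maps panel rows back to rows of d
    panel.superpose(x, d$y_pred[subscripts], subscripts = subscripts,
                    groups = groups, type = "l", ...)
  })
print(p)
```

Because both calls use the same `groups` and `subscripts`, the line for each group is drawn with the line settings of that group from the current lattice theme.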

Plot one data frame column against all other columns using ggplots and showing densities in R

I have a data frame with 20 columns, and I want to plot one specific column (called BB) against each single column in the data frame. The plots I need are probability density plots, and I’m using the following code to generate one plot (plotting columns BB vs. AA as an example):
mydata = as.data.frame(fread("filename.txt")) #read my data as data frame
#function to calculate density
get_density <- function(x, y, n = 100) {
dens <- MASS::kde2d(x = x, y = y, n = n)
ix <- findInterval(x, dens$x)
iy <- findInterval(y, dens$y)
ii <- cbind(ix, iy)
return(dens$z[ii])
}
set.seed(1)
#define the x and y of the plot; x = column called AA; y = column called BB
xy1 <- data.frame(
x = mydata$AA,
y = mydata$BB
)
#call function get_density to calculate density for the defined x and y
xy1$density <- get_density(xy1$x, xy1$y)
#Plot
ggplot(xy1) + geom_point(aes(x, y, color = density), size = 3, pch = 20) + scale_color_viridis() +
labs(title = "BB vs. AA") +
scale_x_continuous(name="AA") +
scale_y_continuous(name="BB")
I would appreciate it if someone could suggest a method to produce multiple plots of BB against every other column, using the above density function and ggplot command. I tried adding a loop, but found it too complicated, especially when defining the x and y to be plotted or calling the density function.
Since you don't provide sample data, I'll demo on mtcars. We convert the data to long format, calculate the densities, and make a faceted plot. We plot the mpg column against all others.
library(dplyr)
library(tidyr)
library(ggplot2)
mtlong = gather(mtcars, key = "var", value = "value", -mpg) %>%
group_by(var) %>%
mutate(density = get_density(value, mpg))
ggplot(mtlong, aes(x = value, y = mpg, color = density)) +
geom_point(pch = 20, size = 3) +
labs(x = "") +
facet_wrap(~ var, scales = "free")
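For readers who would rather keep the loop the question asks about, here is a minimal base R sketch of the same idea, again on mtcars. It is an illustration under assumptions rather than a drop-in replacement: it swaps ggplot2 for base graphics, uses `hcl.colors()` (available from R 3.6.0) for a viridis-like palette, and the names `target`, `others`, and `dens_list` are made up for this example.

```r
library(MASS)  # provides kde2d()

# Same density helper as in the question
get_density <- function(x, y, n = 100) {
  dens <- MASS::kde2d(x = x, y = y, n = n)
  ix <- findInterval(x, dens$x)
  iy <- findInterval(y, dens$y)
  dens$z[cbind(ix, iy)]
}

target <- "mpg"                            # column shown on every y axis
others <- setdiff(names(mtcars), target)   # all remaining columns

# One density vector per column, each as long as the data
dens_list <- lapply(others, function(v) {
  get_density(mtcars[[v]], mtcars[[target]])
})
names(dens_list) <- others

# Draw the ten panels in a 2 x 5 grid, colouring points by density
pal <- hcl.colors(100, "viridis")
old_par <- par(mfrow = c(2, 5), mar = c(4, 4, 2, 1))
for (v in others) {
  bin <- cut(dens_list[[v]], breaks = 100, labels = FALSE)
  plot(mtcars[[v]], mtcars[[target]], col = pal[bin], pch = 20,
       xlab = v, ylab = target, main = paste(target, "vs.", v))
}
par(old_par)  # restore the previous graphics settings
```

The loop body only needs the column name, so swapping in another data frame means changing `mtcars` and `target`.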

plotting two simple linear regression scatterplots and lines on one graph from two data sets in R

I have tried:
plot(CORTMaglog~CORTlogB, data = data0, xlab="logCORTB", ylab="log CORT30- CORTB")
abline(lm(CORTMaglog ~ CORTlogB))
and
plot(CORTMaglog~CORTlogB, data = data1, xlab="logCORTB", ylab="log CORT30- CORTB")
abline(lm(CORTMaglog ~ CORTlogB))
and now have two graphs.
How do I have both plots from two different data sets on one graph with lines and scatterplots?
Thank you!
You should use points. Here is a reproducible example:
x = 1:100
y1 = x^1.2 + x*rnorm(100,0,1)
y2 = 2*x^1.2 + x*rnorm(100,1,0.5)
plot(x,y1)
plot(x,y2)
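# plot() and lm() need data frames here; cbind() would return a matrix
data1 = data.frame(x, y1)
data2 = data.frame(x, y2)
plot(y2 ~ x, data=data2,col='blue')
abline(lm(y2 ~ x, data=data2),col='blue')
points(y1 ~ x, data=data1,col='red')
abline(lm(y1 ~ x, data=data1),col='red')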
Edit: To answer the question in the comments. To draw the fitted line with the plotting functions the way you want to, you need to extract the predictions with the predict function. For example:
x = 1:100
y1 = x^1.2 + x*rnorm(100,0,1)
data1 = data.frame(cbind(x,y1))
fit = lm(y1~x, data=data1)
y = predict(fit, newdata = data.frame(x))
plot(x,y1)
lines(x,y)

`scatterplot3d`: can not add a regression plane to 3D scatter plot

I have created a 3d Scatterplot in R and want to add a regression plane. I have looked at code from the statmethods.net website, which can be very useful, and it worked. I then tried it with my own data and the plane did not show up.
library(scatterplot3d)
s3d <- scatterplot3d(Try$Visits, Try$Net.Spend, Try$Radio, pch=16, highlight.3d = TRUE, type = "h", main = "3D Scatterplot")
fit <- lm(Try$Visits ~ Try$Net.Spend +Try$Radio)
s3d$plane3d(fit)
I can not reproduce the issue with the following reproducible example:
set.seed(0)
x <- runif(20)
y <- runif(20)
z <- 0.1 + 0.3 * x + 0.5 * y + rnorm(20, sd = 0.1)
dat <- data.frame(x, y, z)
rm(x,y,z)
fit <- lm(z ~ x + y, data = dat)
library(scatterplot3d)
s3d <- scatterplot3d(dat$x, dat$y, dat$z, pch=16, highlight.3d = TRUE, type = "h", main = "3D Scatterplot")
s3d$plane3d(fit)
You should avoid $ in a model formula. Use the data argument instead:
fit <- lm(Visits ~ Net.Spend + Radio, data = Try)
Your z variable (the dependent variable) in the scatter plot is Try$Radio, whereas the dependent variable in the regression model is Try$Visits, and this mismatch is causing the confusion. The third variable passed to scatterplot3d is treated as the dependent variable.
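A minimal sketch of what a matching model could look like is given below. The data frame `Try` is a hypothetical stand-in with simulated values, since the original data are not shown. The response of the model is the variable on the z axis, and the predictors appear in the same order as the x and y arguments of the plot. The plotting part is guarded so the block also runs where scatterplot3d is not installed.

```r
set.seed(0)
# Hypothetical stand-in for the asker's data frame `Try`
Try <- data.frame(Visits = runif(30, 0, 10), Net.Spend = runif(30, 0, 100))
Try$Radio <- 1 + 0.5 * Try$Visits + 0.02 * Try$Net.Spend + rnorm(30, sd = 0.3)

# Response = z axis variable; predictors in the order of the x and y axes
fit <- lm(Radio ~ Visits + Net.Spend, data = Try)

if (requireNamespace("scatterplot3d", quietly = TRUE)) {
  s3d <- scatterplot3d::scatterplot3d(Try$Visits, Try$Net.Spend, Try$Radio,
                                      pch = 16, type = "h",
                                      main = "3D Scatterplot")
  s3d$plane3d(fit)  # plane now lives in the same coordinates as the points
}
```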

using lines() with 'multiple x entries'

I'm looking for a way to plot a nonlinear regression line on a data set where every value in my vector y is being stored multiple times, so I tried to use something like:
x <- c(1,2,3,4,5,6,7,8,9,10)
y <- c(1,4,9,15,25,9,36,25,36,25)
reg4 <- lm( x ~ y + I(y^2) )
plot(x ~ y)
lines(y, predict(reg4), type="l", col="red", lwd=1)
this gives http://i.imgur.com/qSEVNdT.png
So my question is: is there a way to, let's say, use some sort of mean value for each y entry? Or simply to make it a 'continuous' line instead of something that branches off into multiple lines and returns to a lower y value at the points where there are multiple 'entries'?
In these cases, it is best to predict from the model over the range of the covariate (here y, since the model is x ~ y + I(y^2)). You do this at, say, 50 or 100 locations equally spaced over that range, increasing or decreasing the number of locations as needed: more complex responses will need more locations. Doing this also solves the spaghetti plot issue, as the newdata supplied will be in increasing order of the covariate.
x <- c(1,2,3,4,5,6,7,8,9,10)
y <- c(1,4,9,15,25,9,36,25,36,25)
reg4 <- lm( x ~ y + I(y^2) )
## predictions
pred <- data.frame(y = seq(min(y), max(y), length = 100))
pred <- transform(pred, x = predict(reg4, newdata = pred))
## plot
plot(x ~ y)
lines(x ~ y, data = pred, type = "l", col = "red", lwd = 1)
The problem does not come from the ties in the data: for a given value of y, there is only one forecast. The problem is that the points are not sorted, so that when you join them, you end up with a tangle of lines. You can use order to reorder the points.
plot(
x ~ y,
xlab = "y", ylab = "x" # Confusing...
)
i <- order(y)
lines( y[i], predict(reg4)[i] )
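A third variation, sketched here as an assumption rather than taken from either answer, combines the two ideas: predict once at each distinct covariate value, in sorted order. Since the model gives a single prediction per value of y, this is close to the "one value per y entry" the question asks for. The name `grid` is made up for this example.

```r
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
y <- c(1, 4, 9, 15, 25, 9, 36, 25, 36, 25)
reg4 <- lm(x ~ y + I(y^2))

# One row per distinct covariate value, already in increasing order
grid <- data.frame(y = sort(unique(y)))
grid$x <- predict(reg4, newdata = grid)

plot(x ~ y)
lines(x ~ y, data = grid, col = "red", lwd = 1)
```

With only six distinct values the curve is drawn as six straight segments, so the equally spaced grid from the first answer gives a smoother line when the fit is strongly curved.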

Resources