Drawing a regression surface with an interaction in a 3D figure in R

Using car::scatter3d(), I am trying to create a 3D figure with a regression surface indicating an interaction between a categorical and a continuous variable. Partly following the code here, I obtained the figure below.
The figure is obviously wrong in that the regression surface does not reach one of the values of the categorical variable. The problem probably lies in my use of rgl::persp3d() (the last block of the code below), but I have not been able to identify what exactly I am doing wrong. Could someone let me know what I'm missing and how to fix the problem?
library(rgl)
library(car)
n <- 100
set.seed(1)
x <- runif(n, 0, 10)
set.seed(1)
z <- sample(c(0, 1), n, replace = TRUE)
set.seed(1)
y <- 0.5 * x + 0.1 * z + 0.3 * x * z + rnorm(n, sd = 1.5)
d <- data.frame(x, z, y)
scatter3d(y ~ x + z, data = d,
          xlab = "continuous", zlab = "categorical", ylab = "outcome",
          residuals = FALSE, surface = FALSE)
d2 <- d
d2$x <- d$x / (max(d$x) - min(d$x))
d2$y <- d$y / (max(d$y) - min(d$y))
mod <- lm(y ~ x * z, data = d2)
grd <- expand.grid(x = unique(d2$x), z = unique(d2$z))
grd$pred <- predict(mod, newdata = grd)
grd <- grd[order(grd$z, grd$x), ]
# The problem is likely to lie somewhere below.
persp3d(x = unique(grd$x), y = unique(grd$z),
        z = matrix(grd$pred, length(unique(grd$z)), length(unique(grd$x))),
        alpha = 0.5,
        col = "blue",
        add = TRUE,
        xlab = "", ylab = "", zlab = "")
I would prefer to stick with car::scatter3d() for the original graph because I have already made several figures with car::scatter3d() and want this one to be consistent with them.
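One thing worth double-checking (a sketch of a possible issue, not a confirmed fix): rgl's documentation has persp3d() take z as a matrix with nrow = length(x) and ncol = length(y), whereas the code above builds a 2-by-100 matrix while passing 100 x values and 2 z values. With grd already ordered by z and then x, the orientation persp3d() documents would look roughly like this; whether the manual rescaling in d2 matches scatter3d()'s internal scaling is a separate question:
# sketch only: one categorical level per column of the prediction matrix
xs <- sort(unique(grd$x))
zs <- sort(unique(grd$z))
pred <- matrix(grd$pred, nrow = length(xs), ncol = length(zs))
persp3d(x = xs, y = zs, z = pred,
        alpha = 0.5, col = "blue", add = TRUE,
        xlab = "", ylab = "", zlab = "")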

Related

Problem with plotting 3D elliptic paraboloid

I'm looking to plot 3D functions using R. For example, take the elliptic paraboloid given by f(x, y) = (x - 2y - 1)^2 + (3x + y - 2)^2. Here's what I've tried:
require(lattice)
x <- seq(-10, 10, by=0.5)
y <- seq(-10, 10, by=0.5)
g <- expand.grid(x = x, y = y)
g$z <- (x-2*y-1)^2 + (3*x-y-2)^2
wireframe(z ~ x * y, g, drape = TRUE,
          aspect = c(1,1), colorkey = TRUE)
And here's the output
However, here's the "true" graph of f:
I've tried changing the definitions of x and y, to no avail. I've also tried the curve3d() function from the emdbook package. It looks even worse.
You multiplied the wrong x and y: the free-standing vectors have only 41 values each and get recycled along the 1681-row grid. You need to use the columns inside g:
g$z <- with(g, (x-2*y-1)^2 + (3*x-y-2)^2)
wireframe(z ~ x * y, g, drape = TRUE,
          aspect = c(1,1), colorkey = TRUE)
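As a side note (not part of the original answer), this class of recycling bug can be avoided by evaluating the function on the grid with outer(), which fills its result column-wise in the same order expand.grid() enumerates its rows:
library(lattice)
x <- seq(-10, 10, by = 0.5)
y <- seq(-10, 10, by = 0.5)
# z[i, j] = f(x[i], y[j]); no reliance on vector recycling
z <- outer(x, y, function(x, y) (x - 2*y - 1)^2 + (3*x - y - 2)^2)
g <- expand.grid(x = x, y = y)
g$z <- as.vector(z)  # column-wise flattening matches expand.grid's row order
wireframe(z ~ x * y, g, drape = TRUE, aspect = c(1, 1), colorkey = TRUE)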

How to fit exponential regression in R? (a.k.a. changing the power of the base)

I am fitting exponential regressions in R.
Actually, I want to compare y = exp(ax + b) with y = 5^(ax + b).
# data
set.seed(1)
y <- c(3.5, 2.9, 2.97, 4.58, 6.18, 7.11, 9.50, 9.81, 10.17, 10.53,
       12.33, 14.14, 18, 22, 25, 39, 40, 55, 69, 72) + rnorm(20, 10, 1)
x <- 1:length(y)
df <- data.frame(x = x, y = y)
predata <- data.frame(x = 1:20)

# plot
plot(df, ylim = c(0, 100), xlim = c(0, 40))

# simple linear regression
fit_sr <- lm(y ~ x, data = df)
pre_sr <- predict(fit_sr, newdata = predata,
                  interval = 'confidence',
                  level = 0.90)
lines(pre_sr[, 1], col = "red")

# exponential regression 1 (base e)
fit_er1 <- lm(log(y, base = exp(1)) ~ x, data = df)
pre_er1 <- predict(fit_er1, newdata = predata,
                   interval = 'confidence',
                   level = 0.90)
pre_er1 <- exp(1)^pre_er1  # back-transform to the original scale
lines(pre_er1[, 1], col = "dark green")

# exponential regression 2 (base 5)
fit_er2 <- lm(log(y, base = 5) ~ x, data = df)
pre_er2 <- predict(fit_er2, newdata = predata,
                   interval = 'confidence',
                   level = 0.90)
pre_er2 <- 5^pre_er2  # back-transform to the original scale
lines(pre_er2[, 1], col = "blue")
I expect something like this (plot1), but exponential regressions 1 and 2 come out exactly the same (plot2).
plot1
plot2
The two regressions should be different because the transformed Y values are different.
Also, I am looking for how to fit y = exp(ax + b) + c in R.
Your code is correct; the problem is in your theory. The models should be the same.
The easiest way to see this is to think on the log scale, as you've done in your code. Starting with y = exp(ax + b), we get log(y) = ax + b, a linear model with log(y) as the response. With y = 5^(cx + d), we get log(y) = (cx + d) * log(5) = (c*log(5)) * x + (d*log(5)), also a linear model with log(y) as the response. The model fit and predictions will not differ with a different base; you can recover the base-e coefficients from the base-5 coefficients by multiplying them by log(5): a = c*log(5) and b = d*log(5).
It's a bit like comparing the linear model y = ax + b where x is measured in meters with the same model where x is measured in centimeters. The coefficients change to accommodate the scale, but the fit is no different.
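To see the equivalence numerically (a small check, not part of the original answer), using fit_er1 and fit_er2 from the question's code:
# base-5 coefficients times log(5) reproduce the base-e coefficients
coef(fit_er1)
coef(fit_er2) * log(5)
# and the back-transformed fitted curves are identical (should be TRUE)
all.equal(exp(1)^fitted(fit_er1), 5^fitted(fit_er2))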
The first part is already answered by @Gregor; the second part, "...I am looking for how to fit y = exp(ax + b) + c in R", can be done with nls:
fit_er3 <- nls(y ~ exp(a*x+b) + c, data = df, start=list(a=1,b=0,c=0))
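If the fit converges with those starting values, its curve can be added to the existing plot the same way as the others (a usage sketch, not from the original answer; predict.nls does not return confidence intervals):
pre_er3 <- predict(fit_er3, newdata = predata)
lines(predata$x, pre_er3, col = "purple")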

Calculate 5th quantile of curve generated from vectors of X, Y points

I have these curves below:
These curves were generated using a library called discreteRV.
library(discreteRV)
placebo.rate <- 0.5
mmm.rate <- 0.3
mmm.power <- power.prop.test(p1 = placebo.rate, p2 = mmm.rate, power = 0.8, alternative = "one.sided")
n <- as.integer(ceiling(mmm.power$n))
patients <- seq(from = 0, to = n, by = 1)
placebo_distribution <- dbinom(patients, size = n, prob = placebo.rate)
mmm_distribution <- dbinom(patients, size = n, prob = mmm.rate)
get_pmf <- function(p1, p2) {
  X1 <- RV(patients, p1, fractions = FALSE)
  X2 <- RV(patients, p2, fractions = FALSE)
  pmf <- joint(X1, X2, fractions = FALSE)
  return(pmf)
}

extract <- function(string) {
  ints <- unlist(strsplit(string, ","))
  x1 <- as.integer(ints[1])
  x2 <- as.integer(ints[2])
  return(x1 - x2)
}

diff_prob <- function(pmf) {
  diff <- unname(sapply(outcomes(pmf), FUN = extract) / n)
  probabilities <- unname(probs(pmf))
  df <- data.frame(diff, probabilities)
  df <- aggregate(. ~ diff, data = df, FUN = sum)
  return(df)
}

most_likely_rate <- function(x) {
  x[which(x$probabilities == max(x$probabilities)), ]$diff
}

mmm_rate_diffs <- diff_prob(get_pmf(mmm_distribution, placebo_distribution))
placebo_rate_diffs <- diff_prob(get_pmf(placebo_distribution, placebo_distribution))

plot(mmm_rate_diffs$diff, mmm_rate_diffs$probabilities * 100,
     type = "l", lty = 2, xlab = "Rate difference", ylab = "# of trials per 100",
     main = paste("Trials with", n, "patients per treatment arm", sep = " "))
lines(placebo_rate_diffs$diff, placebo_rate_diffs$probabilities * 100, lty = 1, xaxs = "i")
abline(v = c(most_likely_rate(placebo_rate_diffs), most_likely_rate(mmm_rate_diffs)), lty = c(1, 2))
legend("topleft", legend = c("Alternative hypothesis", "Null hypothesis"), lty = c(2, 1))
Basically, I took two binomial discrete random variables, created a joint probability mass function, determined the probability of any given rate difference, and then plotted the results to show the distribution of those rate differences over 100 identical trials under the null hypothesis and under the alternative hypothesis.
Now I want to illustrate the 5th percentile on the null hypothesis curve. Unfortunately, I don't know how to do this. If I simply use quantile(x = placebo_rate_diffs$diff, probs = 0.05), I get -0.377027. This can't be correct looking at the graph. I want to calculate the 5th percentile like I would using pbinom(), but I don't know how to do that with a graph created from what are essentially just x and y vectors.
Maybe I can approximate these two curves as binomial since they appear to be, but I am still not sure how to do this.
Any help would be appreciated.
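One way to read that percentile straight off the x/probability vectors (a sketch of a possible approach, not part of the original post) is to treat them as a discrete distribution and take the smallest rate difference whose cumulative probability reaches 5%, which is what qbinom() does for a binomial:
# order by rate difference, accumulate probabilities, and take the smallest
# diff whose cumulative probability is at least 0.05
null_df <- placebo_rate_diffs[order(placebo_rate_diffs$diff), ]
cum_p <- cumsum(null_df$probabilities)
q05 <- null_df$diff[which(cum_p >= 0.05)[1]]
q05
abline(v = q05, lty = 3)  # mark it on the existing plot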

Is it possible to specify lower bound in response variable during smooth with gam?

I am trying to fit a smoothed surface of z against x and y using the formula z ~ s(x, y) with the gam() function
in the mgcv package. My goal is to predict the response z based on new values of x and y.
In my real situation, z should be a positive number; a negative z would be meaningless. However, the predicted zs
are sometimes negative. It seems that for some regions there are not enough points in the training data to estimate z
accurately.
My question is: is there a way to specify a lower bound for z during smoothing in gam so that I won't get negative zs from predict later?
Below is a minimal example that reproduces this issue.
library(mgcv)
x <- seq(0.1, 1, by = 0.01)
y <- seq(0.1, 1, by = 0.01)
dtt <- expand.grid(x = x, y = y)
set.seed(123)
dtt$xp <- dtt$x + rnorm(nrow(dtt)) / 100
dtt$yp <- dtt$y + rnorm(nrow(dtt)) / 100
dtt$z <- 1 / (dtt$xp^2 + dtt$yp^2)
m <- sample.int(nrow(dtt), 3000)
dtt.train <- dtt[m, ]
dtt.test <- dtt[!(1:nrow(dtt) %in% m), ]
fit <- gam(z ~ s(x, y), data = dtt.train)
p <- predict(fit, newdata = dtt.test)
plot(dtt.test$z, p, xlab = 'Real', ylab = 'Predicted', pch = 19, col = 1 + (p < 0))
abline(h = 0, v = 0)
As you can see, for the red points the real values are positive but the predicted values are negative.
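One common workaround (a sketch of my own, not part of the original post) is to smooth on a scale where positivity is automatic, for example by modelling log(z) and exponentiating the predictions; a log-link Gaussian family or a Gamma family in gam() are related options:
# fit the smooth to log(z); exponentiating the predictions guarantees z > 0
fit_log <- gam(log(z) ~ s(x, y), data = dtt.train)
p_pos <- exp(predict(fit_log, newdata = dtt.test))
range(p_pos)  # strictly positive by construction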

Plot 3D data in R

I have a 3D dataset:
data <- data.frame(
  x = rep(c(0.1, 0.2, 0.3, 0.4, 0.5), each = 5),
  y = rep(c(1, 2, 3, 4, 5), 5)
)
data$z <- runif(
  25,
  min = data$x * data$y - 0.1 * (data$x * data$y),
  max = data$x * data$y + 0.1 * (data$x * data$y)
)
data
str(data)
And I want to plot it, but the built-in functions of R always give the error
increasing 'x' and 'y' values expected
# ### 3D Plots ######################################################
# built-in function always give the error
# "increasing 'x' and 'y' values expected"
demo(image)
image(x = data$x, y = data$y, z = data$z)
demo(persp)
persp(data$x,data$y,data$z)
contour(data$x,data$y,data$z)
When I searched on the internet, I found that this message happens when combinations of X and Y values are not unique. But here they are unique.
I tried some other packages and they work without problems, but I don't like the default style of their plots (the built-in functions would fulfill my expectations).
# ### 3D Scatterplot ######################################################
# Nice plots without surface maps?
install.packages("scatterplot3d", dependencies = TRUE)
library(scatterplot3d)
scatterplot3d(x = data$x, y = data$y, z = data$z)
# ### 3D Scatterplot ######################################################
# Only to play around?
install.packages("rgl", dependencies = TRUE)
library(rgl)
plot3d(x = data$x, y = data$y, z = data$z)
lines3d(x = data$x, y = data$y, z = data$z)
surface3d(x = data$x, y = data$y, z = data$z)
Why are my datasets not accepted by the built-in functions?
I use the lattice package for almost everything I plot in R, and it has a counterpart to persp called wireframe. Let data be the way Sven defined it.
wireframe(z ~ x * y, data=data)
Or how about this (a modification of fig. 6.3 in Deepayan Sarkar's book):
p <- wireframe(z ~ x * y, data=data)
npanel <- c(4, 2)
rotx <- c(-50, -80)
rotz <- seq(30, 300, length = npanel[1]+1)
update(p[rep(1, prod(npanel))], layout = npanel,
       panel = function(..., screen) {
         panel.wireframe(..., screen = list(z = rotz[current.column()],
                                            x = rotx[current.row()]))
       })
Update: Plotting surfaces with OpenGL
Since this post continues to draw attention, I want to add the OpenGL way to make 3D plots too (as suggested by @tucson below). First we need to reformat the dataset from xyz-triplets to axis vectors x and y and a matrix z.
x <- 1:5/10
y <- 1:5
z <- x %o% y
z <- z + .2*z*runif(25) - .1*z
library(rgl)
persp3d(x, y, z, col="skyblue")
This image can be freely rotated and scaled using the mouse, or modified with additional commands; when you are happy with it, you can save it using rgl.snapshot().
rgl.snapshot("myplot.png")
Adding to the solutions of others, I'd like to suggest the plotly package for R, as it has worked well for me.
Below, I'm using the reformatted dataset suggested above: xyz-triplets converted to axis vectors x and y and a matrix z:
x <- 1:5/10
y <- 1:5
z <- x %o% y
z <- z + .2*z*runif(25) - .1*z
library(plotly)
plot_ly(x=x,y=y,z=z, type="surface")
The rendered surface can be rotated and scaled using the mouse. This works fairly well in RStudio.
You can also try it with the built-in volcano dataset from R:
plot_ly(z=volcano, type="surface")
If you're working with "real" data for which the grid intervals and sequence cannot be guaranteed to be increasing or unique (hopefully the (x, y, z) combinations are at least unique, even if they are irregularly spaced), I would recommend the akima package for interpolating from an irregular grid to a regular one.
Using your definition of data:
library(akima)
im <- with(data,interp(x,y,z))
with(im,image(x,y,z))
And this should work not only with image but similar functions as well.
Note that the default grid to which your data is mapped by akima::interp is defined by 40 equal intervals spanning the range of the x and y values:
> formals(akima::interp)[c("xo","yo")]
$xo
seq(min(x), max(x), length = 40)
$yo
seq(min(y), max(y), length = 40)
But of course, this can be overridden by passing arguments xo and yo to akima::interp.
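For example (a small sketch, not in the original answer), a finer 100-by-100 output grid can be requested explicitly:
# interpolate onto an explicitly specified, denser regular grid
im <- with(data, interp(x, y, z,
                        xo = seq(min(x), max(x), length = 100),
                        yo = seq(min(y), max(y), length = 100)))
with(im, image(x, y, z))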
I think the following code is close to what you want
x <- c(0.1, 0.2, 0.3, 0.4, 0.5)
y <- c(1, 2, 3, 4, 5)
zfun <- function(a,b) {a*b * ( 0.9 + 0.2*runif(a*b) )}
z <- outer(x, y, FUN="zfun")
It gives data like this (note that x and y are both increasing)
> x
[1] 0.1 0.2 0.3 0.4 0.5
> y
[1] 1 2 3 4 5
> z
[,1] [,2] [,3] [,4] [,5]
[1,] 0.1037159 0.2123455 0.3244514 0.4106079 0.4777380
[2,] 0.2144338 0.4109414 0.5586709 0.7623481 0.9683732
[3,] 0.3138063 0.6015035 0.8308649 1.2713930 1.5498939
[4,] 0.4023375 0.8500672 1.3052275 1.4541517 1.9398106
[5,] 0.5146506 1.0295172 1.5257186 2.1753611 2.5046223
and a graph like
persp(x, y, z)
I'm not sure why the code above did not work with the rgl library, but the following link has a great example with the same library.
Run the code in R and you will obtain a beautiful 3D plot that you can rotate to any angle.
http://statisticsr.blogspot.de/2008/10/some-r-functions.html
########################################################################
## another example of a 3d plot from my personal research, using the rgl library
########################################################################
# 3D visualization device system
library(rgl);
data(volcano)
dim(volcano)
peak.height <- volcano;
ppm.index <- (1:nrow(volcano));
sample.index <- (1:ncol(volcano));
zlim <- range(peak.height)
zlen <- zlim[2] - zlim[1] + 1
colorlut <- terrain.colors(zlen) # height color lookup table
col <- colorlut[(peak.height-zlim[1]+1)] # assign colors to heights for each point
open3d()
ppm.index1 <- ppm.index*zlim[2]/max(ppm.index);
sample.index1 <- sample.index*zlim[2]/max(sample.index)
title.name <- paste("plot3d ", "volcano", sep = "");
surface3d(ppm.index1, sample.index1, peak.height, color=col, back="lines", main = title.name);
grid3d(c("x", "y+", "z"), n =20)
sample.name <- paste("col.", 1:ncol(volcano), sep="");
sample.label <- as.integer(seq(1, length(sample.name), length = 5));
axis3d('y+',at = sample.index1[sample.label], sample.name[sample.label], cex = 0.3);
axis3d('y',at = sample.index1[sample.label], sample.name[sample.label], cex = 0.3)
axis3d('z',pos=c(0, 0, NA))
ppm.label <- as.integer(seq(1, length(ppm.index), length = 10));
axes3d('x', at=c(ppm.index1[ppm.label], 0, 0), abs(round(ppm.index[ppm.label], 2)), cex = 0.3);
title3d(main = title.name, sub = "test", xlab = "ppm", ylab = "samples", zlab = "peak")
rgl.bringtotop();
