Add 3D abline to cloud plot in R's lattice package

I want to add a 3D abline to a cloud scatterplot in R's lattice package. Here's a subset of my data (3 variables all between 0,1):
dat <- structure(c(0.413, 0.879, 0.016, 0.631, 0.669, 0.048, 1, 0.004, 0.523, 0.001,
0.271, 0.306, 0.014, 0.008, 0.001, 0.023, 0.670, 0.027, 0.291, 0.709,
0.002, 0.003, 0.611, 0.024, 0.580, 0.755, 1, 0.003, 0.038, 0.143, 0.214,
0.161, 0.008, 0.027, 0.109, 0.026, 0.229, 0.006, 0.377, 0.191, 0.724,
0.119, 0.203, 0.002, 0.309, 0.011, 0.141, 0.009, 0.340, 0.152, 0.545,
0.001, 0.217, 0.132, 0.839, 0.052, 0.745, 0.001, 1, 0.273), .Dim = c(20L, 3L))
Here's the cloud plot:
# cloud plot
library(lattice)
trellis.par.set("axis.line", list(col="transparent"))
cloud(dat[, 1] ~ dat[, 2] + dat[, 3], pch=16, col="darkorange", groups=NULL, cex=0.8,
screen=list(z = 30, x = -70, y = 0),
scales=list(arrows=FALSE, cex=0.6, col="black", font=3, tck=0.6, distance=1) )
I want to add a dashed grey line between 0,0,0 and 1,1,1 (i.e., diagonally through the plot). I know I can change the points to lines using type="l" and panel.3d.cloud=panel.3dscatter, but I can't see a way to add extra points/lines to the plot using this.
Here's an example of what I want to achieve using scatterplot3d:
# scatterplot3d
library(scatterplot3d)
s3d <- scatterplot3d(dat, type="p", color="darkorange", angle=55, scale.y=0.7,
pch=16, col.axis="blue", col.grid="lightblue")
# add line
s3d$points3d(c(0,1), c(0,1), c(0,1), col="grey", type="l", lty=2)
I want to do this with a cloud plot to control the angle at which I view the plot (scatterplot3d doesn't allow me to have the 0,0,0 corner of the plot facing). Thanks for any suggestions.

Inelegant and probably fragile, but this seems to work ...
cloud(dat[, 1] ~ dat[, 2] + dat[, 3], pch=16, col="darkorange",
groups=NULL, cex=0.8,
screen=list(z = 30, x = -70, y = 0),
scales=list(arrows=FALSE, cex=0.6, col="black", font=3,
tck=0.6, distance=1) ,
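panel=function(...) {
## copy the panel arguments, then overwrite the data with the line's end points
L <- list(...)
L$x <- L$y <- L$z <- c(0,1)
L$type <- "l"
L$col <- "gray"
L$lty <- 2
## draw the line first so that the points are plotted on top of it
do.call(panel.cloud,L)
panel.cloud(...)
})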
One thing to keep in mind is that this will not do hidden point/line removal, so the line will be either in front of all of the points or behind them all; in this (edited) version, do.call(panel.cloud,L) is first so the points will obscure the line rather than vice versa. If you want hidden line removal then I believe rgl is your only option ... very powerful but not as pretty and with a much more primitive interface.

Related

Standardizing a vector in R so that values shift towards boundaries

I have vector as follows -
a <- c(0.211, 0.028, 0.321, 0.072, -0.606, -0.364, -0.066, 0.172,
-0.917, 0.062, 0.117, -0.136, -0.296, 0.022, 0.046, -0.19, 0.057,
-0.625, -0.01, 0.158, 0.407, -0.328, -0.347, -0.512, -0.101,
0.008, -0.406, -0.014, 0.517, 0.085, -0.525, -0.635, -0.603,
-0.105, 0.643, -0.094, -0.26, 0.348, -0.106, 0.608, 0.146, -0.343,
-0.537, -0.661, 0.166, -0.037, -0.224, -0.269, -0.221, -0.623,
-0.025, 0.382, 0.201, -0.281, -0.699, -0.373, -0.146, -0.273,
-0.354, -0.138, -0.098, 0.312, 0.467, 0.156, 0.264, -0.108, -0.707,
-1, -0.423, -0.708, -0.235, -0.219, -0.645, 0.081, 0.704, -0.639,
0.368, -0.578, 0.158, -0.04, -0.071, -0.125, 0.006, 0.423, 0.112,
1, 0.373, -0.554, -0.092, 0.509, -0.535, -0.619, -0.31, -0.082,
-0.367, -0.574, 0.029, 0.391, 0.062, -0.476)
The range of this vector is from -1 to 1 and it looks like -
> plot(a)
Is there a way to standardize vector a so that all the values move away from zero and shift towards 1 or -1? (near the red lines).
It would be great if I could control the extent to which these values move towards 1 or -1.
You can use min-max standardization. It is usually used to scale values to between 0 and 1, but you can scale values to any range [a, b] with the following equation:
X_Scaled = a + (x - min(x)) * (b-a) / (max(x) - min(x))
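As a minimal sketch of the equation above, the helper below maps a numeric vector onto an arbitrary interval [a, b]. The name `rescale_minmax` is hypothetical (it is not from any package), and the function assumes the input has at least two distinct values, since otherwise the denominator is zero.

```r
# Hypothetical helper: min-max rescaling of x onto the interval [a, b].
rescale_minmax <- function(x, a = 0, b = 1) {
  rng <- range(x, na.rm = TRUE)
  # Guard against division by zero when all values are equal
  stopifnot(rng[2] > rng[1])
  a + (x - rng[1]) * (b - a) / (rng[2] - rng[1])
}

x <- c(-1, -0.5, 0, 0.5, 1)
scaled <- rescale_minmax(x, a = 0.7, b = 0.8)
print(scaled)
```

The smallest input lands on `a`, the largest on `b`, and everything in between keeps its relative position.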
So in your case, let's break it down to two steps.
First: you want positive values to be centered around 0.75 and negative values centered around -0.75. So we can just filter for the values in your data.
data <- runif(100, -1, 1)
positive_vals <- data[data > 0]
negative_vals <- data[data < 0]
Second step: You want to control how much they move towards this value of 0.75. So you could define a range and a center. Say, a range of 0.05 and a center of 0.75 gives us a = 0.7 and b=0.8, right? We can do the same for the negative center.
range <- 0.05
upper_center <- 0.75
lower_center <- -0.75
b1 <- upper_center + range
a1 <- upper_center - range
b2 <- lower_center + range
a2 <- lower_center - range
Finally, we apply the min-max equation for both cases, taking care to preserve the original positions of the positive and negative values in the original array.
# normalize them using, say, min-max
positive_vals <- a1 + ((positive_vals - min(positive_vals)) * (b1 - a1)) / (max(positive_vals) - min(positive_vals))
negative_vals <- a2 + ((negative_vals - min(negative_vals)) * (b2 - a2)) / (max(negative_vals) - min(negative_vals))
new_data <- data
new_data[data > 0] <- positive_vals
new_data[data < 0] <- negative_vals
# Plot the results!
plot(data)
points(new_data, col = "red")
If you're not satisfied with moving values so close to 0.75, just increase the range. You can also move the centers by defining different values.
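The steps above can be wrapped into one function so that the center and the width are easy to vary. This is only a sketch: `shift_by_sign` and its arguments `center` and `half_width` are hypothetical names, and zeros are left untouched because they belong to neither sign group.

```r
# Hypothetical wrapper: squeeze positive values into
# [center - half_width, center + half_width] and negative values into the
# mirrored interval around -center. Zeros are left unchanged.
shift_by_sign <- function(x, center = 0.75, half_width = 0.05) {
  minmax <- function(v, a, b) {
    # With fewer than two distinct values, fall back to the interval midpoint
    if (length(v) < 2 || max(v) == min(v)) return(rep((a + b) / 2, length(v)))
    a + (v - min(v)) * (b - a) / (max(v) - min(v))
  }
  out <- x
  pos <- x > 0
  neg <- x < 0
  out[pos] <- minmax(x[pos],  center - half_width,  center + half_width)
  out[neg] <- minmax(x[neg], -center - half_width, -center + half_width)
  out
}

x <- c(-1, -0.2, 0, 0.1, 0.5, 1)
narrow <- shift_by_sign(x, half_width = 0.05)  # tight band around +/- 0.75
wide   <- shift_by_sign(x, half_width = 0.20)  # looser band around +/- 0.75
print(rbind(x, narrow, wide))
```

A larger `half_width` keeps more of the original spread; a smaller one pulls the values closer to the chosen centers.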
Using your data provided:

xlim geom_histogram Error: Aesthetics must be either length 1 or the same as the data

I am trying to plot a histogram with a custom colour palette. The problem arises when I set the xlim of the histogram.
Please see below the reproducible example:
# sample dataframe
test_dt <- structure(list(col_1 = c(0.057, -0.063, -0.319, 0.02, 0.079,
0.007, -0.105, -0.084, 0.019, 0.28, -0.064, -0.243, -0.116, 0.079,
0.07, -0.187, -0.725, 0.134, 0.062, -0.056, -0.074, 0.392, -0.014,
-0.062, 0.214, 0.371, 0.069, -0.03, 0.036, -0.175, 0.097, 0.358,
0.153, -0.092, -0.038, -0.051, 0.017, -0.108, 0.133, 0.105, 0.187,
-0.056, -0.316, 0.15, -0.142, 0.076, 0.242, -0.069, 0.155, 0.214,
0.162, -0.037, -0.109, 0.111, -0.077, -0.435, 0.003, 0.187, 0.134,
0.027, 0.107, 0.175, -0.355, -0.572, 0.038, -0.209, -0.263, -0.147,
-0.23, -0.174, 0.203, -0.118, 0.008, -0.268, -0.001, 0.227, -0.019,
0.08, 0.044, -0.065, -0.131, 0.093, 0.127, -0.131, 0.039, 0.045,
0.032, 0.343, 0.053, -0.033, 0.453, 0.07, -0.225, 0.094, 0.002,
-0.119, 0.014, -0.125, 0.003, -0.48)), row.names = c(NA, -100L
), class = "data.frame")
# colour palette
RBW <- colorRampPalette(c("darkred","white","darkblue"))
# plot histogram without xlim
ggplot(test_dt) +
geom_histogram(aes(x=col_1),
position = "identity",
bins = 60,
color = "grey10",
fill = RBW(60))
When I run the following lines is when I get the error:
Aesthetics must be either length 1 or the same as the data
# plot histogram with xlim
ggplot(test_dt) +
geom_histogram(aes(x=col_1),
position = "identity",
bins = 60,
color = "grey10",
fill = RBW(60)) +
xlim(-2,2)
Instead of xlim, add + coord_cartesian(xlim = c(-2,2)):
library(ggplot2)
ggplot(test_dt) +
geom_histogram(aes(x=col_1),
position = "identity",
bins = 60,
color = "grey10",
fill = RBW(60)) +
coord_cartesian(xlim = c(-2,2))
Created on 2020-02-11 by the reprex package (v0.3.0)
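A different way to avoid the length mismatch, offered only as a sketch and not as part of the answer above, is to map the fill to the bin position with after_stat(x) and let scale_fill_gradientn() interpolate the palette. The fill is then computed per bar, so it no longer has to match a fixed number of bars. This assumes ggplot2 3.3.0 or later (for after_stat()); `toy` is made-up stand-in data for test_dt.

```r
library(ggplot2)

set.seed(1)
# Stand-in data (an assumption); replace with test_dt from the question
toy <- data.frame(col_1 = rnorm(100, sd = 0.2))
RBW <- colorRampPalette(c("darkred", "white", "darkblue"))

p <- ggplot(toy) +
  # fill is derived from each bin's centre, one value per bar
  geom_histogram(aes(x = col_1, fill = after_stat(x)),
                 bins = 60, color = "grey10") +
  scale_fill_gradientn(colours = RBW(60), guide = "none") +
  xlim(-2, 2)

# Inspect the computed bars; building the plot should not raise the error
bars <- layer_data(p)
```

Note that the gradient is spread over the x range covered by the bins, so with wide limits most of the data sits in the middle colors of the palette.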

Not able to merge two legends

I have plotted a stacked bar plot and a line graph in one figure, each from a different data set. I am getting two separate legends, one for each, and nothing I have tried merges them.
Please find the attached code.
alldata = data.frame(x, aircargo, autototal, govtreceipts,
iipconsumer,nongimports, railfreight)
linedata = data.frame(x,ceii)
melteddata = melt(alldata,id.vars="x")
plotS1 <- ggplot(melteddata)
plotS1 + geom_bar(aes(x=ordered_x,y=value,factor=variable,fill=variable,
order=-as.numeric(variable)), stat="identity") +
geom_line(data=linedata, aes(x=as.numeric(ordered_x),y=ceii, color = "CEII"), lwd=1.5) +
scale_color_manual( values = c("#000000")) +
scale_fill_manual(name = "Components", values = c("#0000FF", "#FFC0CB", "#00FFFF", "#00FF00", "#FF00FF", "#20B2AA", "#000000")) +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) + theme(plot.background = element_rect(fill = "#BFD5E3")) +
ggtitle("Monthly Contribution by Components (3 month MA)") +
theme( panel.border = element_blank(), panel.grid.major = element_blank(), panel.grid.minor = element_blank()) + labs( y = "", x = "") +
scale_y_continuous(labels = c("-0.30","-0.25","-0.2","-0.15","-0.10","-0.05", "0.00", "0.05", "0.10", "0.15", "0.20", "0.25", "0.30"), breaks = c(-0.30, -0.25, -0.20, -0.15, -0.10, -0.05, 0.00, 0.05, 0.10, 0.15, 0.20, 0.25, 0.30))
Dataset -
aircargo <- c(-0.027, 0.028, 0.044, 0.011, 0.041, 0.030, -0.028, 0.017, 0.001, 0.060, -0.040, 0.016, 0.006, -0.040, -0.003, 0)
autototal <- c(0.061, -0.004, 0.009, 0.024, -0.026, 0.025, -0.029, 0.000, -0.015, -0.016, 0.026, -0.062, 0.034, 0.002, -0.081, -0.005)
govtreceipts <- c(-0.001, 0.001, -0.005, 0.031, -0.023, 0.000, -0.009, 0.005, 0.002, -0.005, 0.004, 0.000, 0.004, -0.003, 0, 0)
iipconsumer <- c(0.043, -0.014, 0.041, -0.035, 0.001, 0.001, 0.040, 0.010, -0.006, 0.013, 0.001, -0.006, -0.002, -0.011, -0.033, 0)
nongimports <- c(0.018, -0.008, 0.015, -0.004, 0.019, -0.010, 0.008, 0.007, -0.021, 0.006, -0.002, -0.007, 0.009, -0.017, 0.005, 0)
railfreight <- c(0.014, -0.015, 0.031, 0.103, -0.041, 0.025, -0.044, 0.061, -0.050, 0.092, -0.045, 0.011, -0.007, 0.050, 0.100, -0.015)
x <- c("Jan-18", "Feb-18", "Mar-18", "Apr-18", "May-18", "Jun-18", "Jul-18", "Aug-18", "Sep-18", "Oct-18", "Nov-18", "Dec-18", "Jan-19", "Feb-19", "Mar-19", "Apr-19")
ceii <- c(0.108, -0.012, 0.134, 0.131, -0.030, 0.072, -0.062, 0.100, -0.089, 0.149, -0.070, -0.047, 0.043, -0.019, -0.012, -0.020)
Please help in combining the legend. Thanks in advance.
One option is to get the same levels for the two factors. This takes some up front work with the data.frames.
For example, here's one way to do this, adding a variable named variable to linedata and then matching the factor levels.
melteddata = reshape2::melt(alldata, id.vars = "x")
# Add CEII to levels of "variable"
melteddata$variable = factor(melteddata$variable,
levels = c(levels(melteddata$variable), "CEII") )
linedata = data.frame(x, ceii, variable = "CEII")
# Same levels in linedata as melteddata
linedata$variable = factor(linedata$variable,
levels = levels(melteddata$variable) )
Then I made a vector for the colors outside of the plot so it can be used for both colors and fills. I made this a named vector since I find this best practice in case the order ever changes.
# Vector of colors
fillcol = c("#0000FF", "#FFC0CB", "#00FFFF", "#00FF00", "#FF00FF", "#20B2AA", "#000000")
names(fillcol) = levels(melteddata$variable)
Then you get a combined legend if you use drop = FALSE in the scale layers.
To get filled boxes plus a line box for the line you need override.aes within guide_legend(). I removed the fill from the last box so the line shows.
Note I didn't have your ordered_x variable so this is likely not exactly the plot you were looking for.
ggplot(melteddata) +
geom_col(aes(x = x, y = value, fill = variable) ) +
geom_line(data = linedata, aes(x = x, y = ceii,
color = variable,
group = 1), lwd = 1.5) +
scale_color_manual(name = "Components", drop = FALSE,
values = fillcol ) +
scale_fill_manual(name = "Components", drop = FALSE,
values = fillcol ) +
guides(fill = guide_legend(override.aes = list(fill = c(fillcol[1:6], NA) ) ) )

Repeating x axis R

I have data with the amount of radiation at a specific time (hours, minutes) for three consecutive days. I want to plot this so that the x-axis runs from 0 to 24 three times, i.e. the x-axis repeats itself, with the amount of radiation on the y-axis. I have tried the following script without any success.
plot(gegevens[,1],gegevens[,2],type='l',col='red',xaxt='n',yaxt='n',xlab='',ylab='')
axis(1, at=(0:74),labels = rep.int(0:24,3), las=2)
mtext('Zonnetijd (u)', side=1,line=3)
The dataset was too big, so I've selected the first two hours from 2 days. The first column is the time and the second is the radiation. The data then looks as follows:
structure(c(0, 0.083333333333333, 0.166666666666667, 0.25, 0.333333333333333,
0.416666666666667, 0.5, 0.583333333333333, 0.666666666666667,
0.75, 0.833333333333333, 0.916666666666667, 1, 1.08333333333333,
1.16666666666667, 1.25, 1.33333333333333, 1.41666666666667, 1.5,
1.58333333333333, 1.66666666666667, 1.75, 1.83333333333333, 1.91666666666667,
0.0158590638878904, 0.0991923972212234, 0.182525730554557, 0.26585906388789,
0.349192397221223, 0.432525730554557, 0.51585906388789, 0.599192397221223,
0.682525730554557, 0.76585906388789, 0.849192397221223, 0.932525730554557,
1.01585906388789, 1.09919239722122, 1.18252573055456, 1.26585906388789,
1.34919239722122, 1.43252573055456, 1.51585906388789, 1.59919239722122,
1.68252573055456, 1.76585906388789, 1.84919239722122, 1.93252573055456,
0.066, 0.066, 0.068, 0.068, 0.068, 0.066, 0.066, 0.066, 0.066,
0.066, 0.066, 0.066, 0.057, 0, 0, 0, -0.002, 0, 0, -0.002, 0,
-0.002, -0.009, -0.011, 0, -0.002, 0, -0.002, 0, -0.002, 0, 0.002,
0, 0, 0, 0, -0.002, -0.002, -0.007, 0, -0.002, 0, 0, 0, -0.002,
-0.002, -0.002, 0), .Dim = c(48L, 2L), .Dimnames = list(NULL,
c("t", "z")))
I think you would be better off to move towards a date/time class for your axis. Then you can have more control on what to plot etc. Below is an example:
# create example data
df <- data.frame(
T = seq.POSIXt(as.POSIXct("2000-01-01 00:00:00"),
by = "hours", length.out = 24*3)
)
df
df$St <- cumsum(rnorm(24*3))
# plot
png("test.png", width = 8, height = 4, units = "in", res = 200)
op <- par(mar = c(4,4,1,1), ps = 8)
plot(St ~ T, df, type="l",col='red',xaxt='n',yaxt='n',xlab='',ylab='')
axis(1, at=df$T, labels = format(df$T, "%H"), las=2)
mtext('Zonnetijd (u)', side=1,line=3)
par(op)
dev.off()
You can see that you may have some space issues with the labels when you plot every one.
Here is another example with 3-hour increment labels:
# alt plot
AT <- seq(min(df$T), max(df$T), by = "3 hour") # 3 hour increments
LAB <- format(AT, "%H")
png("test2.png", width = 8, height = 4, units = "in", res = 200)
op <- par(mar = c(4,4,1,1), ps = 8)
plot(St ~ T, df, type="l",col='red', xlab='', ylab='', xaxt='n')
axis(1, at = AT, labels = LAB, las=2)
mtext('Zonnetijd (u)', side=2, line=3)
mtext('hour', side=1, line=3)
par(op)
dev.off()
Marc has good advice about using a datetime class. Overall, that is a good way to go. See this question for examples of converting decimal times in hours to POSIX datetime class.
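As a rough illustration of such a conversion, decimal hours can be added, in seconds, to a reference timestamp. The reference day and the UTC time zone below are arbitrary assumptions, chosen only so that the example is reproducible.

```r
# Decimal hours since midnight, in the style of column t of the sample data
hours <- c(0, 0.25, 1.5, 23.75)

# Arbitrary reference day (an assumption); UTC avoids daylight-saving jumps
origin <- as.POSIXct("2000-01-01 00:00:00", tz = "UTC")

# POSIXct arithmetic is in seconds, so convert hours to seconds first
stamps <- origin + hours * 3600
clock  <- format(stamps, "%H:%M")
print(clock)
```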
If you want to continue with your numeric data, we need the data itself to indicate what day each observation occurs on. Here we create a new column identical to the first, but adding 24 every time the first column has a negative difference between successive rows:
gegevens = cbind(gegevens, gegevens[, 1] + 24 * c(0, cumsum(diff(gegevens[, 1]) < 0)))
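To see what that line does, here is the same idea on a small made-up vector of hours. It relies on the assumption that time only ever decreases when a new day starts, so a completely missing day would go undetected.

```r
# Made-up hours spanning three days; the clock resets after 23
hours <- c(22, 23, 0, 1, 23, 0, 1)

# TRUE wherever the time drops compared with the previous row (a new day)
wrapped <- diff(hours) < 0

# Running count of day changes; the first row belongs to day 0
day <- c(0, cumsum(wrapped))

# Continuous time axis: add 24 hours for every day that has passed
continuous <- hours + 24 * day
print(continuous)
```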
Now when we plot using our new column, the hours are correctly spaced by day:
plot(gegevens[, 3], gegevens[, 2], type = 'l', col = 'red', xaxt = 'n', yaxt = 'n', xlab = '', ylab = '')
You have some axis issues as well. There is no hour 24; we usually call this hour 0 of the next day. And 24 * 3 = 72, not 74, so our maximum hour (starting at 0) is 71:
axis(1, at= 0:71, labels = rep.int(0:23,3), las = 2)
Here is the resulting plot on your sample data. It should "work" on your full data, but I agree with Marc that it is probably too many labels. Using a POSIXct date-time format is the best way to flexibly make adjustments.
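If you stay with the numeric axis, one way to thin out the labels is to place a tick every few hours and derive the label with the modulo operator. This is a sketch on made-up data; the 3-hour spacing and the sine curve are arbitrary choices.

```r
# Tick positions every 3 hours across three days (hours 0 to 71)
at  <- seq(0, 71, by = 3)
# Hour of day for each tick: 24 wraps back to 0, 27 to 3, and so on
lab <- at %% 24

# Made-up curve standing in for the radiation data
plot(0:71, sin(0:71 / 4), type = "l", col = "red",
     xaxt = "n", xlab = "", ylab = "")
axis(1, at = at, labels = lab, las = 2)
```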

Plot conditional density curve `P(Y|X)` along a linear regression line

This is my data frame, with two columns Y (response) and X (covariate):
## Editor edit: use `dat` not `data`
dat <- structure(list(Y = c(NA, -1.793, -0.642, 1.189, -0.823, -1.715,
1.623, 0.964, 0.395, -3.736, -0.47, 2.366, 0.634, -0.701, -1.692,
0.155, 2.502, -2.292, 1.967, -2.326, -1.476, 1.464, 1.45, -0.797,
1.27, 2.515, -0.765, 0.261, 0.423, 1.698, -2.734, 0.743, -2.39,
0.365, 2.981, -1.185, -0.57, 2.638, -1.046, 1.931, 4.583, -1.276,
1.075, 2.893, -1.602, 1.801, 2.405, -5.236, 2.214, 1.295, 1.438,
-0.638, 0.716, 1.004, -1.328, -1.759, -1.315, 1.053, 1.958, -2.034,
2.936, -0.078, -0.676, -2.312, -0.404, -4.091, -2.456, 0.984,
-1.648, 0.517, 0.545, -3.406, -2.077, 4.263, -0.352, -1.107,
-2.478, -0.718, 2.622, 1.611, -4.913, -2.117, -1.34, -4.006,
-1.668, -1.934, 0.972, 3.572, -3.332, 1.094, -0.273, 1.078, -0.587,
-1.25, -4.231, -0.439, 1.776, -2.077, 1.892, -1.069, 4.682, 1.665,
1.793, -2.133, 1.651, -0.065, 2.277, 0.792, -3.469, 1.48, 0.958,
-4.68, -2.909, 1.169, -0.941, -1.863, 1.814, -2.082, -3.087,
0.505, -0.013, -0.12, -0.082, -1.944, 1.094, -1.418, -1.273,
0.741, -1.001, -1.945, 1.026, 3.24, 0.131, -0.061, 0.086, 0.35,
0.22, -0.704, 0.466, 8.255, 2.302, 9.819, 5.162, 6.51, -0.275,
1.141, -0.56, -3.324, -8.456, -2.105, -0.666, 1.707, 1.886, -3.018,
0.441, 1.612, 0.774, 5.122, 0.362, -0.903, 5.21, -2.927, -4.572,
1.882, -2.5, -1.449, 2.627, -0.532, -2.279, -1.534, 1.459, -3.975,
1.328, 2.491, -2.221, 0.811, 4.423, -3.55, 2.592, 1.196, -1.529,
-1.222, -0.019, -1.62, 5.356, -1.885, 0.105, -1.366, -1.652,
0.233, 0.523, -1.416, 2.495, 4.35, -0.033, -2.468, 2.623, -0.039,
0.043, -2.015, -4.58, 0.793, -1.938, -1.105, 0.776, -1.953, 0.521,
-1.276, 0.666, -1.919, 1.268, 1.646, 2.413, 1.323, 2.135, 0.435,
3.747, -2.855, 4.021, -3.459, 0.705, -3.018, 0.779, 1.452, 1.523,
-1.938, 2.564, 2.108, 3.832, 1.77, -3.087, -1.902, 0.644, 8.507
), X = c(0.056, 0.053, 0.033, 0.053, 0.062, 0.09, 0.11, 0.124,
0.129, 0.129, 0.133, 0.155, 0.143, 0.155, 0.166, 0.151, 0.144,
0.168, 0.171, 0.162, 0.168, 0.169, 0.117, 0.105, 0.075, 0.057,
0.031, 0.038, 0.034, -0.016, -0.001, -0.031, -0.001, -0.004,
-0.056, -0.016, 0.007, 0.015, -0.016, -0.016, -0.053, -0.059,
-0.054, -0.048, -0.051, -0.052, -0.072, -0.063, 0.02, 0.034,
0.043, 0.084, 0.092, 0.111, 0.131, 0.102, 0.167, 0.162, 0.167,
0.187, 0.165, 0.179, 0.177, 0.192, 0.191, 0.183, 0.179, 0.176,
0.19, 0.188, 0.215, 0.221, 0.203, 0.2, 0.191, 0.188, 0.19, 0.228,
0.195, 0.204, 0.221, 0.218, 0.224, 0.233, 0.23, 0.258, 0.268,
0.291, 0.275, 0.27, 0.276, 0.276, 0.248, 0.228, 0.223, 0.218,
0.169, 0.188, 0.159, 0.156, 0.15, 0.117, 0.088, 0.068, 0.057,
0.035, 0.021, 0.014, -0.005, -0.014, -0.029, -0.043, -0.046,
-0.068, -0.073, -0.042, -0.04, -0.027, -0.018, -0.021, 0.002,
0.002, 0.006, 0.015, 0.022, 0.039, 0.044, 0.055, 0.064, 0.096,
0.093, 0.089, 0.173, 0.203, 0.216, 0.208, 0.225, 0.245, 0.23,
0.218, -0.267, 0.193, -0.013, 0.087, 0.04, 0.012, -0.008, 0.004,
0.01, 0.002, 0.008, 0.006, 0.013, 0.018, 0.019, 0.018, 0.021,
0.024, 0.017, 0.015, -0.005, 0.002, 0.014, 0.021, 0.022, 0.022,
0.02, 0.025, 0.021, 0.027, 0.034, 0.041, 0.04, 0.038, 0.033,
0.034, 0.031, 0.029, 0.029, 0.029, 0.022, 0.021, 0.019, 0.021,
0.016, 0.007, 0.002, 0.011, 0.01, 0.01, 0.003, 0.009, 0.015,
0.018, 0.017, 0.021, 0.021, 0.021, 0.022, 0.023, 0.025, 0.022,
0.022, 0.019, 0.02, 0.023, 0.022, 0.024, 0.022, 0.025, 0.025,
0.022, 0.027, 0.024, 0.016, 0.024, 0.018, 0.024, 0.021, 0.021,
0.021, 0.021, 0.022, 0.016, 0.015, 0.017, -0.017, -0.009, -0.003,
-0.012, -0.009, -0.008, -0.024, -0.023)), .Names = c("Y", "X"
), row.names = c(NA, -234L), class = "data.frame")
With this I run an OLS regression: lm(dat[,1] ~ dat[,2]).
At a set of values: X = quantile(dat[,2], c(0.1, 0.5, 0.7)), I would like to plot a graph similar to the following, with conditional density P(Y|X) displaying along the regression line.
How can I do this in R? Is it even possible?
I call your dataset dat. Don't use data as a name, as it masks the R function data.
dat <- na.omit(dat) ## retain only complete cases
## use proper formula rather than `$` or `[,]`;
## otherwise you get trouble in prediction with `predict.lm`
fit <- lm(Y ~ X, dat)
## prediction point, as given in your question
xp <- quantile(dat$X, probs = c(0.1, 0.5, 0.7), names = FALSE)
## make prediction and only keep `$fit` and `$se.fit`
pred <- predict.lm(fit, newdata = data.frame(X = xp), se.fit = TRUE)[1:2]
#$fit
# 1 2 3
#0.20456154 0.14319857 0.00678734
#
#$se.fit
# 1 2 3
#0.2205000 0.1789353 0.1819308
To understand the theory behind the following, read Plotting conditional density of prediction after linear regression. Now I use the mapply function to apply the same computation to multiple points:
## a function to make 101 sample points from conditional density
f <- function (mu, sig) {
x <- seq(mu - 3.2 * sig, mu + 3.2 * sig, length = 101)
dx <- dnorm(x, mu, sig)
cbind(x, dx)
}
## apply `f` to all `xp`
lst <- mapply(f, pred[[1]], pred[[2]], SIMPLIFY = FALSE)
## To plot rotated density curve, we basically want to plot `(dx, x)`
## but scaling `(alpha * dx, x)` is needed for good scaling with regression line
## Also to plot rotated density along the regression line,
## a shift is needed: `(alpha * dx + xp, x)`
## The following function adds rotated, scaled density to a regression line
## a "for-loop" is used for readability, with no loss of efficiency.
## (make sure there is an existing plot; otherwise you get `plot.new` error!!)
addrsd <- function (xp, lst, alpha = 1) {
for (i in 1:length(xp)) {
x0 <- xp[i]; mat <- lst[[i]]
dx. <- alpha * mat[, 2] + x0 ## rescale and shift
x. <- mat[, 1]
lines(dx., x., col = "gray") ## rotate and plot
segments(x0, x.[1], x0, x.[101], col = "gray") ## a local axis
}
}
Now let's see the picture:
## This is one simple way to draw the regression line
## A better way is to generate a grid and predict on the grid
## In later example I will show this
plot(dat$X, fit$fitted, type = "l", ylim = c(-0.6, 1))
## we try `alpha = 0.01`;
## you can also try `alpha = 1` in raw scale to see what it looks like
addrsd(xp, lst, 0.01)
Note, we have only scaled the height of the density, not its span. The span sort of implies confidence band, and should not be scaled. Consider further overlaying confidence band on the plot. If the use of matplot is not clear, read How do I change colours of confidence interval lines when using matlines for prediction plot?.
## A grid is necessary for nice regression plot
X.grid <- seq(min(dat$X), max(dat$X), length = 101)
## 95%-CI based on t-statistic
CI <- predict.lm(fit, newdata = data.frame(X = X.grid), interval = "confidence")
## use `matplot`
matplot(X.grid, CI, type = "l", col = c(1, 2, 2), lty = c(1, 2, 2))
## add rotated, scaled conditional density
addrsd(xp, lst, 0.01)
You see that the span of the density curve agrees with the confidence ribbon.
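The link between the two is that both are built from the same standard errors. As a small check, using the built-in cars data rather than dat (an assumption made only to keep the example self-contained), the confidence band returned by predict() can be reproduced by hand from $fit, $se.fit and a t quantile:

```r
fit_cars <- lm(dist ~ speed, data = cars)
new_pts  <- data.frame(speed = c(10, 15, 20))

# Fitted values, their standard errors and the residual degrees of freedom
pred_cars <- predict(fit_cars, newdata = new_pts, se.fit = TRUE)

# 95% band by hand: fit +/- t quantile * standard error
tcrit  <- qt(0.975, df = pred_cars$df)
manual <- cbind(fit = pred_cars$fit,
                lwr = pred_cars$fit - tcrit * pred_cars$se.fit,
                upr = pred_cars$fit + tcrit * pred_cars$se.fit)

# The same band as computed by predict()
builtin <- predict(fit_cars, newdata = new_pts, interval = "confidence")
print(all.equal(manual, builtin, check.attributes = FALSE))
```

The density curves drawn by f() extend to 3.2 standard errors on each side, so they reach somewhat beyond the 95% band, which sits at roughly 2 standard errors.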
