Standardizing a vector in R so that values shift towards boundaries

I have a vector as follows:
a <- c(0.211, 0.028, 0.321, 0.072, -0.606, -0.364, -0.066, 0.172,
-0.917, 0.062, 0.117, -0.136, -0.296, 0.022, 0.046, -0.19, 0.057,
-0.625, -0.01, 0.158, 0.407, -0.328, -0.347, -0.512, -0.101,
0.008, -0.406, -0.014, 0.517, 0.085, -0.525, -0.635, -0.603,
-0.105, 0.643, -0.094, -0.26, 0.348, -0.106, 0.608, 0.146, -0.343,
-0.537, -0.661, 0.166, -0.037, -0.224, -0.269, -0.221, -0.623,
-0.025, 0.382, 0.201, -0.281, -0.699, -0.373, -0.146, -0.273,
-0.354, -0.138, -0.098, 0.312, 0.467, 0.156, 0.264, -0.108, -0.707,
-1, -0.423, -0.708, -0.235, -0.219, -0.645, 0.081, 0.704, -0.639,
0.368, -0.578, 0.158, -0.04, -0.071, -0.125, 0.006, 0.423, 0.112,
1, 0.373, -0.554, -0.092, 0.509, -0.535, -0.619, -0.31, -0.082,
-0.367, -0.574, 0.029, 0.391, 0.062, -0.476)
The range of this vector is from -1 to 1 and it looks like this:
> plot(a)
Is there a way to standardize vector a so that all the values move away from zero and shift towards 1 or -1 (near the red lines in the plot)?
It would be great if I could control the extent to which these values move towards 1 or -1.

You can use min-max standardization. Usually min-max standardization is used to scale values between 0 and 1, but you can scale values to any range [a, b] using the following equation:
X_Scaled = a + (x - min(x)) * (b-a) / (max(x) - min(x))
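As a small illustration (a hypothetical helper, not part of the original answer), this formula can be wrapped in a reusable R function:
## minimal sketch: linearly rescale x into the interval [a, b]
rescale <- function(x, a, b) {
  a + (x - min(x)) * (b - a) / (max(x) - min(x))
}
rescale(c(1, 2, 3, 4), a = 0, b = 1)
# [1] 0.0000000 0.3333333 0.6666667 1.0000000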
So in your case, let's break it down to two steps.
First: you want positive values to be centered around 0.75 and negative values centered around -0.75. So we can just filter for the values in your data.
data <- runif(100, -1, 1)
positive_vals <- data[data > 0]
negative_vals <- data[data < 0]
Second step: You want to control how much they move towards this value of 0.75. So you could define a range and a center. Say, a range of 0.05 and a center of 0.75 gives us a = 0.7 and b=0.8, right? We can do the same for the negative center.
range <- 0.05
upper_center <- 0.75
lower_center <- -0.75
b1 <- upper_center + range
a1 <- upper_center - range
b2 <- lower_center + range
a2 <- lower_center - range
Finally, we apply the min-max equation for both cases, taking care to preserve the original positions of the positive and negative values in the original array.
# normalize them using, say, min-max
positive_vals <- a1 + ((positive_vals - min(positive_vals)) * (b1 - a1)) / (max(positive_vals) - min(positive_vals))
negative_vals <- a2 + ((negative_vals - min(negative_vals)) * (b2 - a2)) / (max(negative_vals) - min(negative_vals))
new_data <- data
new_data[data > 0] <- positive_vals
new_data[data < 0] <- negative_vals
# Plot the results!
plot(data)
points(new_data, col = "red")
If you're not satisfied with moving values so close to 0.75, just increase the range. You can also move the centers by defining different values.
Using the data you provided:
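Here is a minimal sketch of those same steps applied to the vector a from the question (assuming, as above, a range of 0.05 and centers of 0.75 and -0.75); the rescaled values cluster near 0.75 and -0.75 while keeping their original positions:
pos <- a[a > 0]
neg <- a[a < 0]
new_a <- a
new_a[a > 0] <- 0.70 + (pos - min(pos)) * (0.80 - 0.70) / (max(pos) - min(pos))
new_a[a < 0] <- -0.80 + (neg - min(neg)) * (-0.70 - (-0.80)) / (max(neg) - min(neg))
plot(a, ylim = c(-1, 1))
points(new_a, col = "red")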

Related

Plot conditional density curve `P(Y|X)` along a linear regression line

This is my data frame, with two columns Y (response) and X (covariate):
## Editor edit: use `dat` not `data`
dat <- structure(list(Y = c(NA, -1.793, -0.642, 1.189, -0.823, -1.715,
1.623, 0.964, 0.395, -3.736, -0.47, 2.366, 0.634, -0.701, -1.692,
0.155, 2.502, -2.292, 1.967, -2.326, -1.476, 1.464, 1.45, -0.797,
1.27, 2.515, -0.765, 0.261, 0.423, 1.698, -2.734, 0.743, -2.39,
0.365, 2.981, -1.185, -0.57, 2.638, -1.046, 1.931, 4.583, -1.276,
1.075, 2.893, -1.602, 1.801, 2.405, -5.236, 2.214, 1.295, 1.438,
-0.638, 0.716, 1.004, -1.328, -1.759, -1.315, 1.053, 1.958, -2.034,
2.936, -0.078, -0.676, -2.312, -0.404, -4.091, -2.456, 0.984,
-1.648, 0.517, 0.545, -3.406, -2.077, 4.263, -0.352, -1.107,
-2.478, -0.718, 2.622, 1.611, -4.913, -2.117, -1.34, -4.006,
-1.668, -1.934, 0.972, 3.572, -3.332, 1.094, -0.273, 1.078, -0.587,
-1.25, -4.231, -0.439, 1.776, -2.077, 1.892, -1.069, 4.682, 1.665,
1.793, -2.133, 1.651, -0.065, 2.277, 0.792, -3.469, 1.48, 0.958,
-4.68, -2.909, 1.169, -0.941, -1.863, 1.814, -2.082, -3.087,
0.505, -0.013, -0.12, -0.082, -1.944, 1.094, -1.418, -1.273,
0.741, -1.001, -1.945, 1.026, 3.24, 0.131, -0.061, 0.086, 0.35,
0.22, -0.704, 0.466, 8.255, 2.302, 9.819, 5.162, 6.51, -0.275,
1.141, -0.56, -3.324, -8.456, -2.105, -0.666, 1.707, 1.886, -3.018,
0.441, 1.612, 0.774, 5.122, 0.362, -0.903, 5.21, -2.927, -4.572,
1.882, -2.5, -1.449, 2.627, -0.532, -2.279, -1.534, 1.459, -3.975,
1.328, 2.491, -2.221, 0.811, 4.423, -3.55, 2.592, 1.196, -1.529,
-1.222, -0.019, -1.62, 5.356, -1.885, 0.105, -1.366, -1.652,
0.233, 0.523, -1.416, 2.495, 4.35, -0.033, -2.468, 2.623, -0.039,
0.043, -2.015, -4.58, 0.793, -1.938, -1.105, 0.776, -1.953, 0.521,
-1.276, 0.666, -1.919, 1.268, 1.646, 2.413, 1.323, 2.135, 0.435,
3.747, -2.855, 4.021, -3.459, 0.705, -3.018, 0.779, 1.452, 1.523,
-1.938, 2.564, 2.108, 3.832, 1.77, -3.087, -1.902, 0.644, 8.507
), X = c(0.056, 0.053, 0.033, 0.053, 0.062, 0.09, 0.11, 0.124,
0.129, 0.129, 0.133, 0.155, 0.143, 0.155, 0.166, 0.151, 0.144,
0.168, 0.171, 0.162, 0.168, 0.169, 0.117, 0.105, 0.075, 0.057,
0.031, 0.038, 0.034, -0.016, -0.001, -0.031, -0.001, -0.004,
-0.056, -0.016, 0.007, 0.015, -0.016, -0.016, -0.053, -0.059,
-0.054, -0.048, -0.051, -0.052, -0.072, -0.063, 0.02, 0.034,
0.043, 0.084, 0.092, 0.111, 0.131, 0.102, 0.167, 0.162, 0.167,
0.187, 0.165, 0.179, 0.177, 0.192, 0.191, 0.183, 0.179, 0.176,
0.19, 0.188, 0.215, 0.221, 0.203, 0.2, 0.191, 0.188, 0.19, 0.228,
0.195, 0.204, 0.221, 0.218, 0.224, 0.233, 0.23, 0.258, 0.268,
0.291, 0.275, 0.27, 0.276, 0.276, 0.248, 0.228, 0.223, 0.218,
0.169, 0.188, 0.159, 0.156, 0.15, 0.117, 0.088, 0.068, 0.057,
0.035, 0.021, 0.014, -0.005, -0.014, -0.029, -0.043, -0.046,
-0.068, -0.073, -0.042, -0.04, -0.027, -0.018, -0.021, 0.002,
0.002, 0.006, 0.015, 0.022, 0.039, 0.044, 0.055, 0.064, 0.096,
0.093, 0.089, 0.173, 0.203, 0.216, 0.208, 0.225, 0.245, 0.23,
0.218, -0.267, 0.193, -0.013, 0.087, 0.04, 0.012, -0.008, 0.004,
0.01, 0.002, 0.008, 0.006, 0.013, 0.018, 0.019, 0.018, 0.021,
0.024, 0.017, 0.015, -0.005, 0.002, 0.014, 0.021, 0.022, 0.022,
0.02, 0.025, 0.021, 0.027, 0.034, 0.041, 0.04, 0.038, 0.033,
0.034, 0.031, 0.029, 0.029, 0.029, 0.022, 0.021, 0.019, 0.021,
0.016, 0.007, 0.002, 0.011, 0.01, 0.01, 0.003, 0.009, 0.015,
0.018, 0.017, 0.021, 0.021, 0.021, 0.022, 0.023, 0.025, 0.022,
0.022, 0.019, 0.02, 0.023, 0.022, 0.024, 0.022, 0.025, 0.025,
0.022, 0.027, 0.024, 0.016, 0.024, 0.018, 0.024, 0.021, 0.021,
0.021, 0.021, 0.022, 0.016, 0.015, 0.017, -0.017, -0.009, -0.003,
-0.012, -0.009, -0.008, -0.024, -0.023)), .Names = c("Y", "X"
), row.names = c(NA, -234L), class = "data.frame")
With this I run an OLS regression: lm(dat[,1] ~ dat[,2]).
At a set of values: X = quantile(dat[,2], c(0.1, 0.5, 0.7)), I would like to plot a graph similar to the following, with the conditional density P(Y|X) displayed along the regression line.
How can I do this in R? Is it even possible?
I call your dataset dat. Don't use data, as it masks the R function data.
dat <- na.omit(dat) ## retain only complete cases
## use proper formula rather than `$` or `[,]`;
## otherwise you get trouble in prediction with `predict.lm`
fit <- lm(Y ~ X, dat)
## prediction point, as given in your question
xp <- quantile(dat$X, probs = c(0.1, 0.5, 0.7), names = FALSE)
## make prediction and only keep `$fit` and `$se.fit`
pred <- predict.lm(fit, newdata = data.frame(X = xp), se.fit = TRUE)[1:2]
#$fit
# 1 2 3
#0.20456154 0.14319857 0.00678734
#
#$se.fit
# 1 2 3
#0.2205000 0.1789353 0.1819308
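In other words, each prediction point will be treated as contributing a normal conditional density with mean pred$fit[i] and standard deviation pred$se.fit[i] (this is exactly what the function f below does). A quick illustrative sketch of that curve for the first point alone:
curve(dnorm(x, mean = pred$fit[1], sd = pred$se.fit[1]), from = -0.6, to = 1,
      xlab = "Y", ylab = "conditional density")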
To understand the theory behind the following, read Plotting conditional density of prediction after linear regression. Now I use the mapply function to apply the same computation to multiple points:
## a function to make 101 sample points from conditional density
f <- function (mu, sig) {
x <- seq(mu - 3.2 * sig, mu + 3.2 * sig, length = 101)
dx <- dnorm(x, mu, sig)
cbind(x, dx)
}
## apply `f` to all `xp`
lst <- mapply(f, pred[[1]], pred[[2]], SIMPLIFY = FALSE)
## To plot rotated density curve, we basically want to plot `(dx, x)`
## but scaling `(alpha * dx, x)` is needed for good scaling with regression line
## Also to plot rotated density along the regression line,
## a shift is needed: `(alpha * dx + xp, x)`
## The following function adds rotated, scaled density to a regression line
## a "for-loop" is used for readability, with no loss of efficiency.
## (make sure there is an existing plot; otherwise you get `plot.new` error!!)
addrsd <- function (xp, lst, alpha = 1) {
for (i in 1:length(xp)) {
x0 <- xp[i]; mat <- lst[[i]]
dx. <- alpha * mat[, 2] + x0 ## rescale and shift
x. <- mat[, 1]
lines(dx., x., col = "gray") ## rotate and plot
segments(x0, x.[1], x0, x.[101], col = "gray") ## a local axis
}
}
Now let's see the picture:
## This is one simple way to draw the regression line
## A better way is to generate a grid and predict on the grid
## In later example I will show this
plot(dat$X, fit$fitted, type = "l", ylim = c(-0.6, 1))
## we try `alpha = 0.01`;
## you can also try `alpha = 1` in raw scale to see what it looks like
addrsd(xp, lst, 0.01)
Note that we have only scaled the height of the density, not its span. The span sort of implies a confidence band, and should not be scaled. Consider further overlaying a confidence band on the plot. If the use of matplot is not clear, read How do I change colours of confidence interval lines when using matlines for prediction plot?.
## A grid is necessary for nice regression plot
X.grid <- seq(min(dat$X), max(dat$X), length = 101)
## 95%-CI based on t-statistic
CI <- predict.lm(fit, newdata = data.frame(X = X.grid), interval = "confidence")
## use `matplot`
matplot(X.grid, CI, type = "l", col = c(1, 2, 2), lty = c(1, 2, 2))
## add rotated, scaled conditional density
addrsd(xp, lst, 0.01)
You see that the span of the density curve agrees with the confidence ribbon.

Using apply using multiple sources of data?

I'm still in the beginning stages of R but I've gotten a few functions down and now I'm looking for my final "project."
I've created a function that takes each of my four sources of data (different populations) and creates histograms, performs Kolmogorov-Smirnov tests, and then graphs any significant results for a given row. What I want to do is turn it into an apply function. However, the issue is that my function takes four variables, and I don't know a way to make apply take four sources of data.
hist_fx <- function(w,x,y,z) {
hist(w,prob=TRUE,col="green",xlim=c(-1,1),ylim=c(0,3))
lines(density(w),col="red")
abline(v=c(mean(w)),col="red")
hist(x,prob=TRUE,col="blue",xlim=c(-1,1),ylim=c(0,3))
lines(density(x),col="red")
abline(v=c(mean(x)),col="red")
hist(y,prob=TRUE,col="yellow",xlim=c(-1,1),ylim=c(0,3))
lines(density(y),col="red")
abline(v=c(mean(y)),col="red")
hist(z,prob=TRUE,col="purple",xlim=c(-1,1),ylim=c(0,3))
lines(density(z),col="red")
abline(v=c(mean(z)),col="red")
all <- c(w,x,y,z)
hist(all,prob=TRUE,xlim=c(-1,0.5),ylim=c(0,3))
lines(density(w),col="purple")
lines(density(x),col="red")
lines(density(y),col="blue")
lines(density(z),col="green")
plot(ecdf(w),col="green")
plot(ecdf(x),col="blue",add=TRUE)
plot(ecdf(y),col="red",add=TRUE)
plot(ecdf(z),col="purple",add=TRUE)
t1 <- ks.test(w,x)
print(t1)
t2 <- ks.test(w,y)
print(t2)
t3 <- ks.test(w,z)
print(t3)
if(t1$p.value < 0.05) {
plot(ecdf(w),col="green")
plot(ecdf(x),col="blue",add=TRUE)
}
if(t2$p.value < 0.05) {
plot(ecdf(w),col="green")
plot(ecdf(y),col="red",add=TRUE)
}
if(t3$p.value < 0.05) {
plot(ecdf(w),col="green")
plot(ecdf(z),col="purple",add=TRUE)
}
}
I'm able to use this function with apply for one population at a time (i.e. turn hist_fx into a function of one variable). However, I can't find a way to make this work for all four populations at the same time. I've messed around with some for loops, though they haven't been successful as of yet.
One last thing that might be of use: my data is arranged such that independent variables are the rows and the dependent variables are columns. Consequently, I need to run these per row (hence my idea of a for loop).
EDIT:
Here's the dput for one of the populations:
dput(k2)
structure(c(-0.15, 0.13, 0.23, -0.23, 0.06, -0.11, 0.107, 0.06,
-0.17, 0.12, 0.06, -0.25, -0.32, 0.13, 0.06, -0.2, -0.08, 0.06,
0.12, 0.02, 0.11, -0.11, -0.15, 0.097, 0.347, -0.307, 0.097,
-0.047, 0.09, 0.01, -0.217, 0.117, 0.03, -0.3, -0.33, 0.13, 0.19,
-0.24, -0.08, -0.01, 0.15, 0.61, 0.18, -0.15, -0.103, 0.135,
0.31, -0.25, 0.157, -0.105, -0.08, 0.01, -0.165, 0.17, 0.1, -0.23,
-0.28, 0.15, 0.13, -0.14, -0.06, 0.01, 0.07, -0.02, 0.11, -0.06,
-0.123, 0.13, 0.35, -0.27, 0.165, -0.065, 0.135, 0.13, -0.17,
0.135, 0.08, -0.21, -0.25, 0.2, 0.16, -0.18, NA, -0.04, 0.05,
-0.02, 0.13, -0.14, -0.13, 0.098, 0.27, -0.193, 0.062, -0.08,
0.057, 0.028, -0.199, 0.1, 0.04, -0.24, -0.32, 0.13, 0.13, -0.15,
-0.05, 0.01, 0.08, -0.04, 0.1, -0.1, -0.14, 0.154, 0.261, -0.194,
0.1, -0.129, 0.063, 0.142, -0.136, 0.136, 0.08, -0.23, -0.24,
0.12, 0.1, -0.16, -0.06, 0.04, 0.09, -0.01, 0.04, -0.08, -0.127,
0.133, 0.337, -0.06, 0.11, -0.107, 0.16, 0.167, -0.183, 0.103,
0.05, -0.2, -0.3, 0.22, -0.01, -0.17, -0.14, 0.02, 0.07, 0.01,
0.11, -0.11, -0.155, 0.221, 0.22, -0.172, 0.09, -0.15, 0.12,
0.03, -0.153, 0.146, 0.11, -0.2, -0.24, 0.16, 0.07, -0.19, -0.1,
0.03, 0.17, 0.02, 0.09, -0.16, -0.062, 0.19, 0.269, -0.265, 0.118,
-0.11, 0.126, 0.094, -0.186, 0.151, 0.08, -0.26, -0.31, 0.13,
0.09, -0.23, -0.12, 0.05, 0.13, 0.01, 0.11, -0.14, -0.095, 0.14,
0.24, -0.46, 0.09, -0.17, 0.08, 0.01, -0.24, 0.16, 0.04, -0.38,
-0.39, 0.11, 0.06, -0.31, -0.25, 0.03, 0.21, -0.14, 0, -0.22,
-0.07, 0.148, 0.311, -0.27, 0.11, -0.055, 0.16, 0.04, -0.197,
0.064, 0.09, -0.24, -0.34, 0.17, 0.07, -0.15, -0.18, 0.03, 0.13,
0.07, 0.13, -0.08, -0.136, 0.142, 0.27, -0.257, 0.1, -0.13, 0.103,
0.064, -0.197, 0.118, 0.06, -0.29, -0.35, 0.13, 0.1, -0.19, -0.13,
0.01, 0.1, -0.01, 0.13, -0.15), .Dim = c(22L, 12L))
To further clarify, here's the format of the actual data frame:
c1 c2 c3 c4
r2 x x x
r3 x x x
r4 x x x
Each column represents a star's values for the variable on the row. As such, I want to create a histogram for each row, for each dataset.
For the values of the function, I just used those variables for simplicity's sake. w = population 1, x = population 2, y = population 3, z = population 4.
As for an example:
> hist_fx(k2[1,],n2[1,],j2[1,],g2[1,])
Two-sample Kolmogorov-Smirnov test
data: w and x
D = 1, p-value = 1.229e-05
alternative hypothesis: two-sided
Two-sample Kolmogorov-Smirnov test
data: w and y
D = 1, p-value = 1.229e-05
alternative hypothesis: two-sided
Two-sample Kolmogorov-Smirnov test
data: w and z
D = 1, p-value = 1.229e-05
alternative hypothesis: two-sided
My problem is that currently, I can only run the function one row at a time. I'd like to be able to do it for all rows. I was thinking of using apply because I've used it in a very similar context except only for one source of data.
Not quite sure of your needs, but consider transposing with t() so that plots run column-wise over your row data, and consider using mapply(), the multivariate member of the apply family, which runs an operation element-wise across equal-length objects. Also consider breaking the operations apart, as running them together may only print/plot the last iteration to the screen.
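To see the element-wise behaviour on a toy example (not the poster's data), mapply() pairs up the first elements of its arguments, then the second elements, and so on:
mapply(function(a, b) a + b, 1:3, c(10, 20, 30))
# [1] 11 22 33
Because a data frame is a list of its columns, passing the transposed populations below makes the function run once per column, i.e. once per original row.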
Transpose (data used were slight variations of posted dput matrix)
pop1 <- data.frame(t(data))
pop2 <- data.frame(t(data))
pop3 <- data.frame(t(data))
pop4 <- data.frame(t(data))
Histograms
hist_fx <- function(w,x,y,z) {
whist <- hist(w,prob=TRUE,col="green",xlim=c(-1,1),ylim=c(0,3))
lines(density(w),col="red")
abline(v=c(mean(w)),col="red")
xhist <- hist(x,prob=TRUE,col="blue",xlim=c(-1,1),ylim=c(0,3))
lines(density(x),col="red")
abline(v=c(mean(x)),col="red")
yhist <- hist(y,prob=TRUE,col="yellow",xlim=c(-1,1),ylim=c(0,3))
lines(density(y),col="red")
abline(v=c(mean(y)),col="red")
zhist <- hist(z,prob=TRUE,col="purple",xlim=c(-1,1),ylim=c(0,3))
lines(density(z),col="red")
abline(v=c(mean(z)),col="red")
}
# HISTOGRAM PLOTS FOR EACH DF COLUMN
output <- mapply(hist_fx, w=pop1, x=pop2, y=pop3, z=pop4)
Kolmogorov-Smirnov tests (using slight variations of dput data)
hist_fx <- function(w,x,y,z) {
t1 <- ks.test(w,x)
t2 <- ks.test(w,y)
t3 <- ks.test(w,z)
if(t1$p.value < 0.05) {
plot(ecdf(w),col="green")
plot(ecdf(x),col="blue",add=TRUE)
}
if(t2$p.value < 0.05) {
plot(ecdf(w),col="green")
plot(ecdf(y),col="red",add=TRUE)
}
if(t3$p.value < 0.05) {
plot(ecdf(w),col="green")
plot(ecdf(z),col="purple",add=TRUE)
}
return(c(t1, t2, t3))
}
output <- mapply(hist_fx, w=pop1, x=pop2, y=pop3, z=pop4)
output
# X1
# statistic 0.1666667
# p.value 0.9962552
# alternative "two-sided"
# method "Two-sample Kolmogorov-Smirnov test"
# data.name "w and x"
# statistic 0.25
# p.value 0.8474885
# alternative "two-sided"
# method "Two-sample Kolmogorov-Smirnov test"
# data.name "w and y"
# statistic 0.08333333
# p.value 1
# alternative "two-sided"
# method "Two-sample Kolmogorov-Smirnov test"
# data.name "w and z"
# X2
# statistic 0.25
# p.value 0.8474885
# alternative "two-sided"
# method "Two-sample Kolmogorov-Smirnov test"
# data.name "w and x"
# statistic 0.08333333
# p.value 1
# alternative "two-sided"
# method "Two-sample Kolmogorov-Smirnov test"
# data.name "w and y"
# statistic 0.1666667
# p.value 0.9962552
# alternative "two-sided"
# method "Two-sample Kolmogorov-Smirnov test"
# data.name "w and z"
# ...
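If only the p-values are needed afterwards, one hypothetical variation (not part of the answer above) is to return the test objects as a list and pull the p-values out of the mapply result:
ks_fx <- function(w, x, y, z) {
  list(wx = ks.test(w, x), wy = ks.test(w, y), wz = ks.test(w, z))
}
tests <- mapply(ks_fx, w = pop1, x = pop2, y = pop3, z = pop4, SIMPLIFY = FALSE)
## one row per comparison (wx, wy, wz), one column per original row of data
pvals <- sapply(tests, function(tt) sapply(tt, `[[`, "p.value"))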

Interpolate within points in a vector

Vector V1 contains 56 observations for X, and vector BS contains a bootstrapped sample of V1 of length 100000. I would like to interpolate linearly within points in BS to fill in any missing values. For example, V1 contains no 0.27 values, and hence neither does BS. But BS would contain a few 0.28 and 0.26. I would like the interpolation to create a few 0.27 values and add those to BS. And so on for any missing values within the two extremes in the vector.
V1 <- c(0.18, 0.2, 0.24, 0.35, -0.22, -0.17, 0.28, -0.28, -0.14, 0.03, 0.87, -0.2, 0.06, -0.1, -0.72, 0.18, 0.01, 0.31, -0.36, 0.61, -0.16, -0.07, -0.13, 0.01, -0.09, 0.26, -0.14, 0.08, -0.62, -0.2, 0.3, -0.21, -0.11, 0.05, 0.06, -0.28, -0.27, 0.17, 0.42, -0.05, -0.15, 0.05, -0.07, -0.22, -0.34, 0.16, 0.34, 0.1, -0.12, 0.24, 0.45, 0.37, 0.61, 0.9, -0.25, 0.02)
BS <- sample(V1, 100000, replace=TRUE)
The approxfun functions do not seem to help, as they are for interpolating within data sets. I have found a few questions/answers covering interpolating across different data sets, but not within one data set. Thank you for your help.
EDIT: please note I do not want to fit a normal distribution (or any other) to create those points.
You can use approx() (or approxfun()) to do this by treating BS as the y-coordinate and using sequential x-coordinates:
set.seed(1L); BS <- sample(V1,1e5L,T);
res <- approx(seq_along(BS),BS,n=length(BS)*2L-1L)$y;
The specification of n here is important. It ensures that exactly one interpolated value will be produced halfway between each adjacent pair of input values.
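As a quick check of the effect (using the objects above), 0.27 now appears in res wherever a 0.26 and a 0.28 happened to be adjacent in BS, and all interpolated values sit at the even indices of res:
any(abs(res - 0.27) < 1e-8)               # should be TRUE: midpoints of adjacent 0.26/0.28 pairs
head(res[seq(2L, length(res), by = 2L)])  # the newly created midpoints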
Here's a plot of an excerpt of the result, centered around the first occurrence of an adjacent pair of 0.26 and 0.28:
i <- which(BS[-length(BS)]==0.26 & BS[-1L]==0.28)[1L];
j <- i*2L-1L;
xlim <- c(j-6L,j+8L);
ylim <- c(-1,1);
xticks <- seq(xlim[1L],xlim[2L]);
yticks <- seq(ylim[1L],ylim[2L],0.05);
plot(NA,xlim=xlim,ylim=ylim,xlab='res index',ylab='y',axes=F,xaxs='i',yaxs='i');
abline(v=xticks,col='lightgrey');
abline(h=yticks,col='lightgrey');
axis(1L,xticks,cex.axis=0.7);
axis(2L,yticks,sprintf('%.02f',round(yticks,2L)),las=1L,cex.axis=0.7);
x <- seq(xlim[1L],xlim[2L],2L); y <- BS[seq(i-3L,len=8L)];
points(x,y,pch=16L,col='red',xpd=NA);
x <- seq(xlim[1L],xlim[2L]); y <- res[x];
points(x,y,pch=4L,cex=1.2,col='blue',xpd=NA);
text(x+0.24,y+0.03,y,cex=0.7,xpd=NA);
legend(xlim[1L]+1.5,0.87,c('input value','interpolated'),col=c('red','blue'),pch=c(16L,4L));

text to expression in function of variance estimation of derived parameters via Delta Method

I have written a function to perform matrix multiplication on each row of the data set pd.matrix. The function my.var.function performs as intended. However, now I want to generalize the function to handle matrices of variable sizes instead of just the example matrix with five columns.
To generalize the function I imagine that I will need to replace x[1], x[2], x[3], x[4], x[5] in the apply statement with something like x[1]:x[ncol(pd.matrix)]. I imagine I similarly will need to replace the two instances of (x1, x2, x3, x4, x5) within the function.
I have tried making these changes with eval(parse(text= followed by paste0 to create the desired series of x1, x2, x3, x4, x5 or x[1], x[2], x[3], x[4], x[5] for this example. However, I have been unable to get eval(parse(text= to work after trying numerous permutations.
How can I generalize the function and apply statement to handle a pd.matrix of n columns rather than five columns?
pd.matrix <- matrix(c(0.10, 0.20, 0.30, 0.40, 0.50,
0.11, 0.21, 0.31, 0.41, 0.51,
0.12, 0.22, 0.32, 0.42, 0.52,
0.13, 0.23, 0.33, 0.43, 0.53,
0.14, 0.24, 0.34, 0.44, 0.54), nrow = 5, byrow = TRUE)
vcv.mat = matrix(c(0.01, 0.0020, 0.0030, 0.0040, 0.0050,
0.0020, 0.02, 0.0031, 0.0041, 0.0051,
0.0030, 0.0031, 0.03, 0.0042, 0.0052,
0.0040, 0.0041, 0.0042, 0.04, 0.0053,
0.0050, 0.0051, 0.0052, 0.0053, 0.05), nrow = 5, byrow = TRUE)
my.var.function <- function(x1, x2, x3, x4, x5) {
my.pd <- matrix(c(x1, x2, x3, x4, x5), nrow = 1)
my.mat = my.pd %*% vcv.mat
my.var = my.mat %*% t(my.pd)
return(my.var = my.var)
}
apply(pd.matrix, 1, function(x) my.var.function(x[1], x[2], x[3], x[4], x[5]))
# [1] 0.0303160 0.0319642 0.0336588 0.0353998 0.0371872
The solution turned out to be very simple. Not sure why I did not see this solution before.
pd.matrix <- matrix(c(0.10, 0.20, 0.30, 0.40, 0.50,
0.11, 0.21, 0.31, 0.41, 0.51,
0.12, 0.22, 0.32, 0.42, 0.52,
0.13, 0.23, 0.33, 0.43, 0.53,
0.14, 0.24, 0.34, 0.44, 0.54), nrow = 5, byrow = TRUE)
vcv.mat = matrix(c(0.01, 0.0020, 0.0030, 0.0040, 0.0050,
0.0020, 0.02, 0.0031, 0.0041, 0.0051,
0.0030, 0.0031, 0.03, 0.0042, 0.0052,
0.0040, 0.0041, 0.0042, 0.04, 0.0053,
0.0050, 0.0051, 0.0052, 0.0053, 0.05), nrow = 5, byrow = TRUE)
my.var.function <- function(x) {
my.pd <- matrix(c(x), nrow = 1)
my.mat = my.pd %*% vcv.mat
my.var = my.mat %*% t(my.pd)
return(my.var = my.var)
}
apply(pd.matrix, 1, function(x) my.var.function(x))
# [1] 0.0303160 0.0319642 0.0336588 0.0353998 0.0371872
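As an aside (an alternative sketch, not the poster's accepted approach), each of these values is just the quadratic form x %*% vcv.mat %*% t(x) for a row x, so the whole vector can also be computed in one vectorised step:
## diag(P %*% V %*% t(P)) without forming the full cross-product matrix
rowSums((pd.matrix %*% vcv.mat) * pd.matrix)
# [1] 0.0303160 0.0319642 0.0336588 0.0353998 0.0371872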

Add 3D abline to cloud plot in R's lattice package

I want to add a 3D abline to a cloud scatterplot in R's lattice package. Here's a subset of my data (3 variables all between 0,1):
dat <- structure(c(0.413, 0.879, 0.016, 0.631, 0.669, 0.048, 1, 0.004, 0.523, 0.001,
0.271, 0.306, 0.014, 0.008, 0.001, 0.023, 0.670, 0.027, 0.291, 0.709,
0.002, 0.003, 0.611, 0.024, 0.580, 0.755, 1, 0.003, 0.038, 0.143, 0.214,
0.161, 0.008, 0.027, 0.109, 0.026, 0.229, 0.006, 0.377, 0.191, 0.724,
0.119, 0.203, 0.002, 0.309, 0.011, 0.141, 0.009, 0.340, 0.152, 0.545,
0.001, 0.217, 0.132, 0.839, 0.052, 0.745, 0.001, 1, 0.273), .Dim = c(20L, 3L))
Here's the cloud plot:
# cloud plot
trellis.par.set("axis.line", list(col="transparent"))
cloud(dat[, 1] ~ dat[, 2] + dat[, 3], pch=16, col="darkorange", groups=NULL, cex=0.8,
screen=list(z = 30, x = -70, y = 0),
scales=list(arrows=FALSE, cex=0.6, col="black", font=3, tck=0.6, distance=1) )
I want to add a dashed grey line between 0,0,0 and 1,1,1 (i.e., diagonally through the plot). I know I can change the points to lines using type="l", panel.3d.cloud=panel.3dscatter, but I can't see a way to add extra points/lines to the plot using this.
Here's an example of what I want to achieve using scatterplot3d:
# scatterplot3d
s3d <- scatterplot3d(dat, type="p", color="darkorange", angle=55, scale.y=0.7,
pch=16, col.axis="blue", col.grid="lightblue")
# add line
s3d$points3d(c(0,1), c(0,1), c(0,1), col="grey", type="l", lty=2)
I want to do this with a cloud plot so that I can control the angle at which I view the plot (scatterplot3d doesn't allow me to have the 0,0,0 corner of the plot facing the viewer). Thanks for any suggestions.
Inelegant and probably fragile, but this seems to work ...
cloud(dat[, 1] ~ dat[, 2] + dat[, 3], pch=16, col="darkorange",
groups=NULL, cex=0.8,
screen=list(z = 30, x = -70, y = 0),
scales=list(arrows=FALSE, cex=0.6, col="black", font=3,
tck=0.6, distance=1) ,
panel=function(...) {
L <- list(...)
L$x <- L$y <- L$z <- c(0,1)
L$type <- "l"
L$col <- "gray"
L$lty <- 2
do.call(panel.cloud,L)
p <- panel.cloud(...)
})
One thing to keep in mind is that this will not do hidden point/line removal, so the line will be either in front of all of the points or behind them all; in this (edited) version, do.call(panel.cloud,L) is first so the points will obscure the line rather than vice versa. If you want hidden line removal then I believe rgl is your only option ... very powerful but not as pretty and with a much more primitive interface.
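For completeness, a minimal rgl sketch along those lines (assuming the rgl package is installed; colours, point size and axis labels here are arbitrary choices):
library(rgl)
plot3d(dat[, 2], dat[, 3], dat[, 1], col = "darkorange", size = 5,
       xlab = "x", ylab = "y", zlab = "z")
lines3d(c(0, 1), c(0, 1), c(0, 1), col = "grey")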
