Creating a loop to run a statistical test - r

I need to run a statistical test and compute p_hat_6, using N = 10 samples each of size n = 10 from X ∼ uniform(0, 14). Here is my original loop to calculate p_hat at mu=6:
pvalue <- rep(0,10)
reject <- 0
alpha <- 0.05
N <- 10
for (n in seq_along(pvalue)){
pvalue[n] <- wilcox.test(runif(10,0,14), mu=6)$p.value
reject[n] <- ifelse(pvalue[n] > alpha,0,1)
}
p_hat_6 <- (sum(reject))/N
p_hat_6
Using the same N and n, I need to repeat it 5 more times with the following changes:
(1) Data from X ∼ uniform(0, 16) and compute p_hat_7 (mu=7)
(2) Data from X ∼ uniform(0, 18) and compute p_hat_8 (mu=8)
(3) Data from X ∼ uniform(0, 20) and compute p_hat_9 (mu=9)
(4) Data from X ∼ uniform(0, 22) and compute p_hat_10 (mu=10)
(5) Data from X ∼ uniform(0, 24) and compute p_hat_11 (mu=11)
How do I loop through as the intervals increase by 2 each time and mu increase by 1 each time?

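We can wrap the loop in a function. Note that the sample size n and alpha are now arguments with defaults, rather than reusing N as the sample size and reading alpha from the global environment:
fun1 <- function(N, pvalue, reject, max_val, mu_val, n = 10, alpha = 0.05) {
# N: number of simulated samples; n: size of each sample
for (i in seq_along(pvalue)){
pvalue[i] <- wilcox.test(runif(n, 0, max_val), mu=mu_val)$p.value
reject[i] <- ifelse(pvalue[i] > alpha,0,1)
}
p_hat <- (sum(reject))/N
return(p_hat)
}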
and use that in Map
Map(fun1, max_val = seq(14, 24, by = 2), mu_val = 6:11,
MoreArgs = list(N = 10, pvalue = pvalue, reject = reject))
-output
[[1]]
[1] 0
[[2]]
[1] 0.2
[[3]]
[1] 0.2
[[4]]
[1] 0
[[5]]
[1] 0.1
[[6]]
[1] 0.2
Or create an outer loop around the function
mu_val <- 6:11
max_val <- seq(14, 24, by = 2)
out <- numeric(length(mu_val))
for(i in seq_along(mu_val)) {
out[i] <- fun1(N = 10, pvalue = pvalue,
reject = reject, max_val = max_val[i], mu_val = mu_val[i])
}
out
[1] 0.1 0.1 0.1 0.1 0.3 0.0
data
N <- 10
pvalue <- rep(0, N)
reject <- rep(0, N)
alpha <- 0.05
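As a possible variation on the Map call above, the whole simulation can be collapsed into a single mapply call that returns a named numeric vector instead of a list. This is only a sketch: the helper name p_hat and its default arguments are made up for illustration, and set.seed is added so that a run can be reproduced.

```r
# Hypothetical helper (not from the original answer): proportion of N
# Wilcoxon tests, each on n draws from uniform(0, max_val), that reject
# the null location mu_val at level alpha.
p_hat <- function(max_val, mu_val, N = 10, n = 10, alpha = 0.05) {
  pvals <- replicate(N, wilcox.test(runif(n, 0, max_val), mu = mu_val)$p.value)
  mean(pvals <= alpha)   # same rule as ifelse(pvalue > alpha, 0, 1)
}

set.seed(42)  # arbitrary seed, only for reproducibility
mu_val  <- 6:11
max_val <- seq(14, 24, by = 2)

# Walk over the two vectors in parallel
out <- mapply(p_hat, max_val = max_val, mu_val = mu_val)
names(out) <- paste0("p_hat_", mu_val)
out
```

With only N = 10 tests per setting, each estimate can only take the values 0, 0.1, ..., 1, so results will vary noticeably between seeds.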


floating-point error fligner.test r function?

I have noticed that using the statistical test fligner.test from the r stats package provides different results with a simple transformation, even though this shouldn't be the case.
Here is an example (the difference for the original dataset is much more dramatic):
g <- factor(rep(1:2, each=6))
x1 <- c(2,2,6,6,1,4,5,3,5,6,5,5)
x2 <- (x1-1)/5 #> cor(x1,x2) [1] 1
fligner.test(x1,g) # chi-squared = 4.2794, df = 1, p-value = 0.03858
fligner.test(x2,g) # chi-squared = 4.8148, df = 1, p-value = 0.02822
Looking at the function code, I have noticed that the median centering might be causing the issue:
x1 <- x1 - tapply(x1,g,median)[g]
x2 <- x2 - tapply(x2,g,median)[g]
unique(abs(x1)) # 1 3 2 0
unique(abs(x2)) # 0.2 0.6 0.4 0.2 0.0 <- repeated 0.2
Is this a known issue, and how should this inconsistency be resolved?
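I think your analysis is correct here. In your example the problem ultimately comes from floating-point arithmetic: for instance, (0.8 - 0.6) == 0.2 is FALSE unless the difference is rounded (to 15 decimal places, say). As a result, absolute deviations that should be tied receive different ranks, which changes the test statistic. You should file a bug report, since this is avoidable.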
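To see the floating-point effect in isolation, here is a small base-R sketch using three differences of the kind that arise when x2 is centred on its group medians. The names d and d15 are just for illustration.

```r
# Three differences that are all 0.2 in exact arithmetic
d <- c(0.4 - 0.2, 0.6 - 0.4, 1 - 0.8)

d == 0.2              # exact comparison: not all TRUE
length(unique(d))     # more than one distinct value
rank(d)               # ranks differ although the values "should" tie

d15 <- round(d, 15)   # the rounding used in the workaround
length(unique(d15))   # a single value
rank(d15)             # all tied

# A tolerance-based comparison treats the values as equal
isTRUE(all.equal(0.8 - 0.6, 0.2))
```

Because fligner.test ranks the absolute deviations, spurious differences like these are enough to change the statistic.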
If you are desperate in the meantime, you can adapt stats:::fligner.test.default by applying a tiny bit of rounding at the median centering stage to remove floating point inequalities:
fligner <- function (x, g, ...)
{
if (is.list(x)) {
if (length(x) < 2L)
stop("'x' must be a list with at least 2 elements")
DNAME <- deparse1(substitute(x))
x <- lapply(x, function(u) u <- u[complete.cases(u)])
k <- length(x)
l <- lengths(x)
if (any(l == 0))
stop("all groups must contain data")
g <- factor(rep(1:k, l))
x <- unlist(x)
}
else {
if (length(x) != length(g))
stop("'x' and 'g' must have the same length")
DNAME <- paste(deparse1(substitute(x)), "and",
deparse1(substitute(g)))
OK <- complete.cases(x, g)
x <- x[OK]
g <- g[OK]
g <- factor(g)
k <- nlevels(g)
if (k < 2)
stop("all observations are in the same group")
}
n <- length(x)
if (n < 2)
stop("not enough observations")
x <- round(x - tapply(x, g, median)[g], 15)
a <- qnorm((1 + rank(abs(x))/(n + 1))/2)
a <- a - mean(a)
v <- sum(a^2)/(n - 1)
a <- split(a, g)
STATISTIC <- sum(lengths(a) * vapply(a, mean, 0)^2)/v
PARAMETER <- k - 1
PVAL <- pchisq(STATISTIC, PARAMETER, lower.tail = FALSE)
names(STATISTIC) <- "Fligner-Killeen:med chi-squared"
names(PARAMETER) <- "df"
METHOD <- "Fligner-Killeen test of homogeneity of variances"
RVAL <- list(statistic = STATISTIC, parameter = PARAMETER,
p.value = PVAL, method = METHOD, data.name = DNAME)
class(RVAL) <- "htest"
return(RVAL)
}
This now returns the correct result for both your vectors:
fligner(x1,g)
#>
#> Fligner-Killeen test of homogeneity of variances
#>
#> data: x1 and g
#> Fligner-Killeen:med chi-squared = 4.2794, df = 1, p-value = 0.03858
fligner(x2,g)
#>
#> Fligner-Killeen test of homogeneity of variances
#>
#> data: x2 and g
#> Fligner-Killeen:med chi-squared = 4.2794, df = 1, p-value = 0.03858

Error in while (e_i$X1 < 12 | e_i$X2 < 12) { : argument is of length zero

In an earlier question (R: Logical Conditions Not Being Respected), I learned how to write the following simulation:
Step 1: Keep generating two random numbers "a" and "b" until both "a" and "b" are greater than 12
Step 2: Track how many pairs of random numbers had to be generated before Step 1 was completed
Step 3: Repeat Step 1 and Step 2 100 times
res <- matrix(0, nrow = 0, ncol = 3)
for (j in 1:100){
a <- rnorm(1, 10, 1)
b <- rnorm(1, 10, 1)
i <- 1
while(a < 12 | b < 12) {
a <- rnorm(1, 10, 1)
b <- rnorm(1, 10, 1)
i <- i + 1
}
x <- c(a,b,i)
res <- rbind(res, x)
}
head(res)
[,1] [,2] [,3]
x 12.14232 12.08977 399
x 12.27158 12.01319 1695
x 12.57345 12.42135 302
x 12.07494 12.64841 600
x 12.03210 12.07949 82
x 12.34006 12.00365 782
Question: Now, I am trying to make a slight modification to the above code - Instead of "a" and "b" being produced separately, I want them to be produced "together" (in math terms: "a" and "b" were being produced from two independent univariate normal distributions, now I want them to come from a bivariate normal distribution).
I tried to modify this code myself:
library(MASS)
Sigma = matrix(
c(1,0.5, 0.5, 1), # the data elements
nrow=2, # number of rows
ncol=2, # number of columns
byrow = TRUE) # fill matrix by rows
res <- matrix(0, nrow = 0, ncol = 3)
for (j in 1:100){
e_i = data.frame(mvrnorm(n = 1, c(10,10), Sigma))
e_i$i <- 1
while(e_i$X1 < 12 | e_i$X2 < 12) {
e_i = data.frame(mvrnorm(n = 1, c(10,10), Sigma))
e_i$i <- i + 1
}
x <- c(e_i$X1, e_i$X2 ,i)
res <- rbind(res, x)
}
res = data.frame(res)
But this is producing the following error:
Error in while (e_i$X1 < 12 | e_i$X2 < 12) { : argument is of length zero
If I understand your code correctly you are trying to see how many samples occur before both values are >=12 and doing that for 100 trials? This is the approach I would take:
library(MASS)
for(i in 1:100){
n <- 1
while(any((x <- mvrnorm(1, mu=c(10,10), Sigma=diag(0.5, nrow=2)+0.5))<12)) n <- n+1
if(i==1) res <- data.frame("a"=x[1], "b"=x[2], n)
else res <- rbind(res, data.frame("a"=x[1], "b"=x[2], n))
}
Here I am assigning the result of mvrnorm() to x within the while() call. In that same call, any() checks whether either value is less than 12. If that evaluates to TRUE, n (the counter) is increased and the process is repeated. Once it evaluates to FALSE (both values are at least 12), the loop ends, the values are appended to your data.frame, and the for-loop moves on to the next trial.
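Regarding your code, the mvrnorm() function returns a vector, not a matrix, when n=1, so both values go into a single variable in the data.frame. The columns X1 and X2 therefore do not exist, e_i$X1 and e_i$X2 are NULL, and the while() condition has length zero: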
data.frame(mvrnorm(n = 1, c(10,10), Sigma))
Returns:
mvrnorm.n...1..c.10..10...Sigma.
1 9.148089
2 10.605546
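The chain from "single column" to "argument is of length zero" can be reproduced in base R without MASS. In this sketch the vector draw is a stand-in for one mvrnorm(n = 1, ...) result (the two values are taken from the output above), and the names draw, msg and fixed are made up for illustration.

```r
# Stand-in for a single draw: a plain numeric vector of length 2
draw <- c(9.148089, 10.605546)

e_i <- data.frame(draw)
dim(e_i)              # 2 rows, 1 column
is.null(e_i$X1)       # there is no column called X1
length(e_i$X1 < 12)   # comparing NULL gives a zero-length logical

# Capture the error that while() raises on a zero-length condition
msg <- tryCatch(
  {
    while (e_i$X1 < 12 | e_i$X2 < 12) {}
    "no error"
  },
  error = function(e) conditionMessage(e)
)
msg

# Reshaping to a 1 x 2 matrix first yields columns X1 and X2
fixed <- data.frame(matrix(draw, ncol = 2))
names(fixed)
```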
The matrix() function within your data.frame() calls, along with some tweaks to your use of i, will fix your code:
library(MASS)
Sigma = matrix(
c(1,0.5, 0.5, 1), # the data elements
nrow=2, # number of rows
ncol=2, # number of columns
byrow = TRUE) # fill matrix by rows
res <- matrix(0, nrow = 0, ncol = 3)
for (j in 1:10){
e_i = data.frame(matrix(mvrnorm(n = 1, c(10,10), Sigma), ncol=2))
i <- 1
while(e_i$X1[1] < 12 | e_i$X2[1] < 12) {
e_i = data.frame(matrix(mvrnorm(n = 1, c(10,10), Sigma), ncol=2))
i <- i + 1
}
x <- c(e_i$X1, e_i$X2 ,i)
res <- rbind(res, x)
}
res = data.frame(res)

How can I modify my code to include loop?

I am trying to create a function that examines how variables with different distributions influence OLS results. I have created two DVs (y1 and y2) but would like to expand this to include five or so. I am trying to change my code to include a loop so I do not have to copy and paste this multiple times, but I am not having much luck. Any suggestions would be greatly appreciated.
library(psych)
library(arm)
library(plyr)
library(fBasics)
regsim <- function(iter, n) {
ek1 <- rnorm(n, 0, 1)
ek2 <- rnorm(n, 0, 5)
x <- rnorm(n, 0, .5)
y1 <- .3*x + ek1
y2 <- .3*x + ek2
#y1
lm1 <- lm(y1 ~ x)
bhat1 <- coef (lm1)[2]
sehat1 <- se.coef (lm1) [2]
skewy1 <- skew(y1)
stdevy1 <- stdev(y1)
#y2
lm2 <- lm(y2 ~ x)
bhat2 <- coef (lm2)[2]
sehat2 <- se.coef (lm2) [2]
skewy2 <- skew(y2)
stdevy2 <- stdev(y2)
results <- c(bhat1, sehat1, stdevy1, skewy1,
bhat2, sehat2, stdevy2, skewy2)
names(results) <- c('b1', 'se1', 'sdy1', 'skewy1',
'b2', 'se2', 'sdy2', 'skewy2')
return(results)
}
iter <-1000
n <-500
results <- NULL
sims <-ldply(1:iter, regsim, n)
sims$n <- n
results <- rbind(results, sims)
Another option...
regsim <- function(n=100,num.y=5,sd=c(1:5)){
if(length(sd) != num.y){stop('length of sd must match number of dependent vars')
} else {
# one regression per dependent variable; i indexes the error SDs
ldply(1:num.y,function(i){
e <- rnorm(n,0,sd=sd[i])
x <- rnorm(n,0,5)
y <- 0.3*x + e
out <- lm(y~x)
b1 <- coef(out)[2]
int <- coef(out)[1]
data.frame(b1=b1,int=int)
})
}
}
regsim(num.y=10,sd=c(1:10))
b1 int
1 0.30817303 0.0781049
2 0.38681600 -0.3359067
3 0.24560773 -0.0277561
4 0.08032659 0.1877233
5 0.39873955 -0.6027522
6 0.21729930 0.7384340
7 0.33761456 -0.1053028
8 0.26502006 -0.1851552
9 0.15452261 -1.6334873
10 -0.10496863 -0.3225169
This will allow you to specify the number of dependent variables and the SD for each error term. You can then use replicate to repeat the function for the desired number of replications.
replicate(10,regsim(),simplify = F)
[[1]]
b1 int
1 0.3047779 -0.01984306
2 0.3133198 -0.20458410
3 0.2833979 -0.25307502
4 0.3066878 -0.03235019
5 0.1374949 0.10958616
[[2]]
b1 int
1 0.2902103 -0.12683502
2 0.3499006 0.06691437
3 0.1949797 -0.14371830
4 0.2358269 0.53117467
5 0.2869511 0.16281380
[[3]]
b1 int
1 0.2952211 0.05905549
2 0.2367774 0.02862166
3 0.0896778 -0.08467935
4 0.2352622 -0.20835837
5 0.3149963 0.07042032
[[4]]
b1 int
1 0.2946468 -0.08266406
2 0.3322577 0.17558135
3 0.2200087 -0.25778150
4 0.1822915 0.34962679
5 0.2442479 0.34433656
[[5]]
b1 int
1 0.2882853 0.12677506
2 0.3455534 -0.27885958
3 0.2981193 0.04598347
4 0.3380173 0.05243198
5 0.2148643 -0.09631672
[[6]]
b1 int
1 0.2962269 0.03743759
2 0.2979327 -0.12830803
3 0.3352781 -0.03935422
4 0.2584965 -0.05924351
5 0.2856802 0.03430055
[[7]]
b1 int
1 0.2968077 -0.10300109
2 0.2954560 0.25979902
3 0.3276077 -0.07001758
4 0.1825841 0.13508932
5 0.4302788 -0.13951914
[[8]]
b1 int
1 0.2992147 0.02084806
2 0.2765976 0.07277813
3 0.2469616 0.44580403
4 0.2601966 -0.09849855
5 0.2679183 0.50501652
[[9]]
b1 int
1 0.2963905 0.03308366
2 0.3356783 -0.06080088
3 0.3199835 0.22533444
4 0.3546083 -0.26909478
5 0.3536241 -0.19795094
[[10]]
b1 int
1 0.3100336 -0.05228032
2 0.4076447 -0.18715063
3 0.3436858 -0.37518649
4 0.4569368 -0.09114672
5 0.3255668 -0.18738138
How about this:
n <- 1000
x <- rnorm(n, 0, .5)
fun_reg <- function(n, ek_mu, ek_sd, x){
s <- list() # list to collect results for output
ek <- rnorm(n, ek_mu, ek_sd)
y <- .3*x + ek
m <- lm(y ~ x)
s$bhat <- coef(m)[2]
s$sehat <- arm::se.coef(m)[2]
s$skewy <- psych::skew(y)
s$stdevy <- fBasics::stdev(y)
return(s)
}
purrr::map_dfr(c(1, 5, 10, 20, 50), ~fun_reg(n, 0, ., x))
Edit:
This version uses 500 observations per regression and repeats the regression 1000 times for each value of the standard deviation. A variable ek_sd has been added to the output to record which standard deviation produced each row. Note that x is not redrawn at each iteration; I am not sure whether that is what you want. If you want x to be redrawn at each iteration, move it inside the function.
n <- 500
x <- rnorm(n, 0, .5)
fun_reg <- function(n, ek_mu, ek_sd, x){
s <- list()
ek <- rnorm(n, ek_mu, ek_sd)
y <- .3*x + ek
m <- lm(y ~ x)
s$ek_sd <- ek_sd
s$bhat <- coef(m)[2]
s$sehat <- arm::se.coef(m)[2]
s$skewy <- psych::skew(y)
s$stdevy <- fBasics::stdev(y)
return(s)
}
intr <- unlist(lapply(c(1, 5, 10, 20, 50), rep, 1000))
purrr::map_dfr(intr, ~fun_reg(n, 0, ., x))
This reduces the package reliance to just psych::skew and an optional ggplot2 call:
library(psych)
regsim <- function(n, eks) {
x <- rnorm(n, 0, .5)
ek <- sapply(eks, function(x) rnorm(n, 0, x))
y <- 0.3 * x + ek
lms <- lm(y ~ x)
data.frame(b_hat = lms[['coefficients']][2,],
int = lms[['coefficients']][1, ],
skew_y = psych::skew(y),
se_hat = unlist(lapply(summary(lms), function(lst) lst[[4]][2,2]), use.names = FALSE),
sd_y = apply(y, 2, sd),
sd_eks = eks
)
}
iter <-1000
n <-500
eks_sd = c(1,5)
# do the simulations and make them into a nice data.frame
sims <- replicate(iter, regsim(n, eks_sd), simplify = FALSE)
results <- do.call(rbind, sims)
#next parts are optional
results$iter_id <- rep(seq_len(iter), each = length(eks_sd))
tibble::as_tibble(results)
# Random graph because everyone loves graphs
library(ggplot2)
ggplot(results, aes(x = iter_id, y = int)) + geom_point() + facet_grid(vars(sd_eks))
The main thing is that lm accepts a matrix response and then fits one regression per column of y. That's why we create a matrix of ek using sapply.
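A stripped-down, base-R sketch of the matrix-response idea, with a small n and an arbitrary seed chosen only for illustration:

```r
set.seed(1)   # arbitrary seed
n   <- 100
x   <- rnorm(n, 0, 0.5)
eks <- c(1, 5)

# One column of errors per standard deviation: an n x 2 matrix
ek <- sapply(eks, function(s) rnorm(n, 0, s))

# x is recycled down each column, so y is also an n x 2 matrix
y <- 0.3 * x + ek

fit <- lm(y ~ x)
class(fit)    # a multi-response fit
coef(fit)     # one column of coefficients per response

# Each column matches a separate single-response fit
single <- lm(y[, 1] ~ x)
all.equal(unname(coef(fit)[, 1]), unname(coef(single)))
```

The columns share the same design matrix, so this is equivalent to looping over the responses, just more compact.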

Write a loop to select all combination of variable values generating positive equation values in R

I have the following four equations (a,b,c,d), with several different variables (x,t,v,w,n,f). My goal would be to try and find all sets of variable values that would generate all positive (and non-zero) numbers for equations(a,b,c,d). A regular loop would just go through each number of the sequence generated and systematically check if it generates a positive value or not. I want it to pick up random numbers from each sequence and test it against the others in R.
For example (x=8, t = 2.1,v=13,w=1,n=10,f=1) is a possible set of combinations.
Please do not suggest analytically solving for these and then finding out values. These are simply representations of equations I'm dealing with. The equations I have are quite complex, and more than 15 variables.
#Equations
a <- x * t - 2*x
b <- v - x^2
c <- x - w*t - t*t
d <- (n - f)/t
x <- seq(from = 0.0001, to = 1000, by = 0.1)
t <- seq(from = 0.0001, to = 1000, by = 0.1)
v <- seq(from = 0.0001, to = 1000, by = 0.1)
w <- seq(from = 0.0001, to = 1000, by = 0.1)
n <- seq(from = 0.0001, to = 1000, by = 0.1)
f <- seq(from = 0.0001, to = 1000, by = 0.1)
For a start, it might be better to organize your equations and your probe values into lists:
set.seed(1222)
values <- list(x = x, t = t, v = v, w = w, n = n, f = f)
eqs <- list(
a = expression(x * t - 2 * x),
b = expression(v - x^2),
c = expression(x - w*t - t*t),
d = expression((n - f)/t)
)
Then we can define a number of samples to take randomly from each probe vector:
samples <- 3
values.sampled <- lapply(values, sample, samples)
$x
[1] 642.3001 563.1001 221.3001
$t
[1] 583.9001 279.0001 749.1001
$v
[1] 446.6001 106.7001 0.7001
$w
[1] 636.0001 208.8001 525.5001
$n
[1] 559.8001 28.4001 239.0001
$f
[1] 640.4001 612.5001 790.1001
We can then iterate over each stored equation, evaluating the equation within the "sampled" environment:
results <- sapply(eqs, eval, envir = values.sampled)
a b c d
[1,] 373754.5 -412102.82 -711657.5 -0.1380373
[2,] 155978.8 -316975.02 -135533.2 -2.0935476
[3,] 165333.3 -48973.03 -954581.8 -0.7356827
From there you can remove any value that is 0 or less:
results[results <= 0] <- NA
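Setting individual values to NA flags the failing equations, but the question asks for variable sets where every equation is positive. One way to get those, sketched here with a tiny hand-made table of candidates (the data frame candidates and its values are hypothetical, chosen so the outcome is easy to check by hand), is a row-wise test:

```r
# Equations stored as expressions, as in the answer above
eqs <- list(
  a = expression(x * t - 2 * x),
  b = expression(v - x^2),
  c = expression(x - w * t - t * t),
  d = expression((n - f) / t)
)

# Hypothetical candidate sets, one per row
candidates <- data.frame(
  x = c(8, 8, 1),   t = c(2.1, 2.1, 1), v = c(65, 13, 2),
  w = c(1, 1, 1),   n = c(10, 10, 1),   f = c(1, 1, 2)
)

# One column per equation, one row per candidate set
results <- sapply(eqs, eval, envir = candidates)

# Keep only the rows where all equations are strictly positive
keep <- rowSums(results > 0) == ncol(results)
candidates[keep, ]
```

The same filter applies unchanged if candidates is built from randomly sampled values instead of a fixed table.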
If every variable draws from the same set of candidate values (e.g. seq(from = 0.0001, to = 1000, by = 0.1)), we can approach this more rigorously and avoid generating duplicate combinations. First we create a masterFun that is essentially a wrapper for all of the functions you want to define:
masterFun <- function(y) {
## y is a vector with 6 values
## y[1] -->> x
## y[2] -->> t
## y[3] -->> v
## y[4] -->> w
## y[5] -->> n
## y[6] -->> f
fA <- function(x, t) {x * t - 2*x}
fB <- function(v, x) {v - x^2}
fC <- function(x, w, t) {x - w*t - t*t}
fD <- function(n, f, t) {(n - f)/t}
## one can easily filter out negative
## results as @jdobres has done.
c(a = fA(y[1], y[2]), b = fB(y[3], y[1]),
c = fC(y[1], y[4], y[2]), d = fD(y[5], y[6], y[2]))
}
Now, using permuteSample, which is capable of generating random permutations of a vector and subsequently applying any given user defined function to each permutation, from RcppAlgos (I am the author), we have:
## Not technically the domain, but this variable name
## is concise and very descriptive
domain <- seq(from = 0.0001, to = 1000, by = 0.1)
library(RcppAlgos)
## number of variables ... x, t, v, w, n, f
## ||
## \/
permuteSample(domain, m = 6, repetition = TRUE,
n = 3, seed = 123, FUN = masterFun)
[[1]]
a b c d
218830.316100 -608541.146040 -310624.596670 -1.415869
[[2]]
a b c d
371023.322880 -482662.278860 -731052.643620 1.132836
[[3]]
a b c d
18512.60761001 -12521.71284001 -39722.27696002 -0.09118721
In short, the underlying algorithm is capable of generating the nth lexicographical result, which allows us to apply a mapping from 1 to "# of total permutations" to the permutations themselves. For example, given the permutations of the vector 1:3:
permuteGeneral(3, 3)
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 1 3 2
[3,] 2 1 3
[4,] 2 3 1
[5,] 3 1 2
[6,] 3 2 1
We can easily generate the 2nd and the 5th permutation above without generating the first permutation or the first four permutations:
permuteSample(3, 3, sampleVec = c(2, 5))
[,1] [,2] [,3]
[1,] 1 3 2
[2,] 3 1 2
This allows us to have a more controlled and tangible grasp of our random samples as we can now think of them in a more familiar way (i.e. a random sample of numbers).
If you actually want to see which variables were used in the above calculation, we simply drop the FUN argument:
permuteSample(domain, m = 6, repetition = TRUE, n = 3, seed = 123)
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 780.7001 282.3001 951.5001 820.8001 289.1001 688.8001
[2,] 694.8001 536.0001 84.9001 829.2001 757.3001 150.1001
[3,] 114.7001 163.4001 634.4001 80.4001 327.2001 342.1001
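The idea of jumping straight to the nth lexicographical permutation, mentioned above, can be illustrated in a few lines of base R. This is only a sketch of the general technique for permutations without repetition; it is not how RcppAlgos implements it, and the function name nth_permutation is made up.

```r
# Illustration only: return the k-th permutation of v in lexicographical
# order without generating the permutations that come before it.
nth_permutation <- function(v, k) {
  rest <- sort(v)
  n <- length(rest)
  stopifnot(k >= 1, k <= factorial(n))
  k <- k - 1                     # switch to a zero-based rank
  out <- rest                    # same type and length as the input
  for (i in seq_len(n)) {
    block <- factorial(n - i)    # permutations sharing one leading element
    idx <- k %/% block + 1       # which remaining element comes next
    out[i] <- rest[idx]
    rest <- rest[-idx]
    k <- k %% block
  }
  out
}

nth_permutation(1:3, 2)   # compare with row 2 of permuteGeneral(3, 3)
nth_permutation(1:3, 5)   # compare with row 5
```

Sampling random ranks k and mapping each one to its permutation is what makes it possible to sample without duplicates.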

Function for polynomials of arbitrary order (symbolic method preferred)

I've found polynomial coefficients from my data:
R <- c(0.256,0.512,0.768,1.024,1.28,1.437,1.594,1.72,1.846,1.972,2.098,2.4029)
Ic <- c(1.78,1.71,1.57,1.44,1.25,1.02,0.87,0.68,0.54,0.38,0.26,0.17)
NN <- 3
ft <- lm(Ic ~ poly(R, NN, raw = TRUE))
pc <- coef(ft)
So I can create a polynomial function:
f1 <- function(x) pc[1] + pc[2] * x + pc[3] * x ^ 2 + pc[4] * x ^ 3
And for example, take a derivative:
g1 <- Deriv(f1)
How to create a universal function so that it doesn't have to be rewritten for every new polynomial degree NN?
My original answer may not be what you really want, as it was numerical rather than symbolic. Here is the symbolic solution.
## use `"x"` as variable name
## taking polynomial coefficient vector `pc`
## can return a string, or an expression by further parsing (mandatory for `D`)
f <- function (pc, expr = TRUE) {
stringexpr <- paste("x", seq_along(pc) - 1, sep = " ^ ")
stringexpr <- paste(stringexpr, pc, sep = " * ")
stringexpr <- paste(stringexpr, collapse = " + ")
if (expr) return(parse(text = stringexpr))
else return(stringexpr)
}
## an example cubic polynomial with coefficients 0.1, 0.2, 0.3, 0.4
cubic <- f(pc = 1:4 / 10, TRUE)
## using R base's `D` (requiring expression)
dcubic <- D(cubic, name = "x")
# 0.2 + 2 * x * 0.3 + 3 * x^2 * 0.4
## using `Deriv::Deriv`
library(Deriv)
dcubic <- Deriv(cubic, x = "x", nderiv = 1L)
# expression(0.2 + x * (0.6 + 1.2 * x))
Deriv(f(1:4 / 10, FALSE), x = "x", nderiv = 1L) ## use string, get string
# [1] "0.2 + x * (0.6 + 1.2 * x)"
Of course, Deriv makes higher order derivatives easier to get. We can simply set nderiv. For D however, we have to use recursion (see examples of ?D).
Deriv(cubic, x = "x", nderiv = 2L)
# expression(0.6 + 2.4 * x)
Deriv(cubic, x = "x", nderiv = 3L)
# expression(2.4)
Deriv(cubic, x = "x", nderiv = 4L)
# expression(0)
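For base R's D, the recursion mentioned above can be sketched as follows. The wrapper name DD follows the idea shown in the examples of ?D; the quoted cubic is the same example polynomial, written out by hand for illustration.

```r
# Recursive wrapper around D() for the order-th derivative
DD <- function(expr, name, order = 1L) {
  if (order < 1L) stop("'order' must be >= 1")
  if (order == 1L) D(expr, name)
  else DD(D(expr, name), name, order - 1L)
}

# The example cubic as an unevaluated call
cubic_call <- quote(0.1 + 0.2 * x + 0.3 * x^2 + 0.4 * x^3)

d2 <- DD(cubic_call, "x", 2L)    # second derivative, still symbolic
eval(d2, list(x = 1:4))          # evaluate it at x = 1, 2, 3, 4
```

D does little simplification, so the printed form of d2 is more verbose than what Deriv returns, but it evaluates to the same values.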
If we use expression, we will be able to evaluate the result later. For example,
eval(cubic, envir = list(x = 1:4)) ## cubic polynomial
# [1] 1.0 4.9 14.2 31.3
eval(dcubic, envir = list(x = 1:4)) ## its first derivative
# [1] 2.0 6.2 12.8 21.8
The above implies that we can wrap up an expression for a function. Using a function has several advantages, one being that we are able to plot it using curve or plot.function.
fun <- function(x, expr) eval.parent(expr, n = 0L)
Note, the success of fun requires expr to be an expression in terms of symbol x. If expr was defined in terms of y for example, we need to define fun with function (y, expr). Now let's use curve to plot cubic and dcubic, on a range 0 < x < 5:
curve(fun(x, cubic), from = 0, to = 5) ## colour "black"
curve(fun(x, dcubic), add = TRUE, col = 2) ## colour "red"
The most convenient way is, of course, to define a single function FUN rather than combining f and fun. This way we also do not need to worry about keeping the variable name consistent between f and fun.
FUN <- function (x, pc, nderiv = 0L) {
## check missing arguments
if (missing(x) || missing(pc)) stop ("arguments missing with no default!")
## expression of polynomial
stringexpr <- paste("x", seq_along(pc) - 1, sep = " ^ ")
stringexpr <- paste(stringexpr, pc, sep = " * ")
stringexpr <- paste(stringexpr, collapse = " + ")
expr <- parse(text = stringexpr)
## taking derivatives
dexpr <- Deriv::Deriv(expr, x = "x", nderiv = nderiv)
## evaluation
val <- eval.parent(dexpr, n = 0L)
## note, if we take too many derivatives so that `dexpr` becomes constant
## `val` is free of `x` so it will only be of length 1
## we need to repeat this constant to match `length(x)`
if (length(val) == 1L) val <- rep.int(val, length(x))
## now we return
val
}
Suppose we want to evaluate a cubic polynomial with coefficients pc <- c(0.1, 0.2, 0.3, 0.4) and its derivatives on x <- seq(0, 1, 0.2), we can simply do:
FUN(x, pc)
# [1] 0.1000 0.1552 0.2536 0.4144 0.6568 1.0000
FUN(x, pc, nderiv = 1L)
# [1] 0.200 0.368 0.632 0.992 1.448 2.000
FUN(x, pc, nderiv = 2L)
# [1] 0.60 1.08 1.56 2.04 2.52 3.00
FUN(x, pc, nderiv = 3L)
# [1] 2.4 2.4 2.4 2.4 2.4 2.4
FUN(x, pc, nderiv = 4L)
# [1] 0 0 0 0 0 0
Now plotting is also easy:
curve(FUN(x, pc), from = 0, to = 5)
curve(FUN(x, pc, 1), from = 0, to = 5, add = TRUE, col = 2)
curve(FUN(x, pc, 2), from = 0, to = 5, add = TRUE, col = 3)
curve(FUN(x, pc, 3), from = 0, to = 5, add = TRUE, col = 4)
Since my symbolic solution eventually grew too long, I am giving the numerical calculation separately. We can do this because the derivatives of a polynomial are known explicitly, so we can code them directly. Note that no R expressions are used here; everything is done directly with functions.
We first generate the polynomial basis from degree 0 to degree p - n, where p is the polynomial degree and n is the number of derivatives, then multiply by the coefficients and the factorial multipliers. It is more convenient to use outer than poly here.
## use `outer`
g <- function (x, pc, nderiv = 0L) {
## check missing arguments
if (missing(x) || missing(pc)) stop ("arguments missing with no default!")
## polynomial order p
p <- length(pc) - 1L
## number of derivatives
n <- nderiv
## early return?
if (n > p) return(rep.int(0, length(x)))
## polynomial basis from degree 0 to degree `(p - n)`
X <- outer(x, 0:(p - n), FUN = "^")
## initial coefficients
## the additional `+ 1L` is because R vector starts from index 1 not 0
beta <- pc[n:p + 1L]
## factorial multiplier
beta <- beta * factorial(n:p) / factorial(0:(p - n))
## matrix vector multiplication
drop(X %*% beta)
}
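The coefficient update inside g follows from differentiating each monomial n times. Written out, with c_k standing for pc[k + 1] (notation introduced here only for illustration) and p for the polynomial degree:

```latex
\frac{d^{n}}{dx^{n}} \sum_{k=0}^{p} c_k x^{k}
  = \sum_{k=n}^{p} c_k \, \frac{k!}{(k-n)!} \, x^{k-n},
  \qquad 0 \le n \le p,
```

and the derivative is identically zero for n > p. The columns of X hold the powers x^0, ..., x^(p - n), and beta holds the factors c_k k!/(k - n)! for k = n, ..., p, so X %*% beta evaluates the sum.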
We still use the example x and pc defined in the symbolic solution:
x <- seq(0, 1, by = 0.2)
pc <- 1:4 / 10
g(x, pc, 0)
# [1] 0.1000 0.1552 0.2536 0.4144 0.6568 1.0000
g(x, pc, 1)
# [1] 0.200 0.368 0.632 0.992 1.448 2.000
g(x, pc, 2)
# [1] 0.60 1.08 1.56 2.04 2.52 3.00
g(x, pc, 3)
# [1] 2.4 2.4 2.4 2.4 2.4 2.4
g(x, pc, 4)
# [1] 0 0 0 0 0 0
The result is consistent with what we have with FUN in the symbolic solution.
Similarly, we can plot g using curve:
curve(g(x, pc), from = 0, to = 5)
curve(g(x, pc, 1), from = 0, to = 5, col = 2, add = TRUE)
curve(g(x, pc, 2), from = 0, to = 5, col = 3, add = TRUE)
curve(g(x, pc, 3), from = 0, to = 5, col = 4, add = TRUE)
Now, after quite some effort demonstrating how we can work this out ourselves, consider using the R package polynom. It is a small package that implements construction, derivatives, integration, arithmetic and root finding for univariate polynomials. The package is written entirely in R, without any compiled code.
## install.packages("polynom")
library(polynom)
We still consider the cubic polynomial example used before.
pc <- 1:4 / 10
## step 1: making a "polynomial" object as preparation
pcpoly <- polynomial(pc)
#0.1 + 0.2*x + 0.3*x^2 + 0.4*x^3
## step 2: compute derivative
expr <- deriv(pcpoly)
## step 3: convert to function
g1 <- as.function(expr)
#function (x)
#{
# w <- 0
# w <- 1.2 + x * w
# w <- 0.6 + x * w
# w <- 0.2 + x * w
# w
#}
#<environment: 0x9f4867c>
Note, by step-by-step construction, the resulting function has all parameters inside. It only requires a single argument for x value. In contrast, functions in the other two answers will take coefficients and derivative order as mandatory arguments, too. We can call this function
g1(seq(0, 1, 0.2))
# [1] 0.200 0.368 0.632 0.992 1.448 2.000
To produce the same graph we see in the other two answers, we get the other derivatives as well:
g0 <- as.function(pcpoly) ## original polynomial
## second derivative
expr <- deriv(expr)
g2 <- as.function(expr)
#function (x)
#{
# w <- 0
# w <- 2.4 + x * w
# w <- 0.6 + x * w
# w
#}
#<environment: 0x9f07c68>
## third derivative
expr <- deriv(expr)
g3 <- as.function(expr)
#function (x)
#{
# w <- 0
# w <- 2.4 + x * w
# w
#}
#<environment: 0x9efd740>
Perhaps you have already noticed that I did not specify nderiv, but recursively take 1 derivative at a time. This may be a disadvantage of this package. It does not facilitate higher order derivatives.
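If repeated calls to deriv become tedious, a small loop can hide them. The helper below is a hypothetical convenience wrapper, not part of polynom; the usage part assumes the polynom package is installed and is skipped otherwise.

```r
# Hypothetical helper: differentiate a "polynomial" object n times
# by calling deriv() once per order.
nth_deriv <- function(p, n = 1L) {
  for (i in seq_len(n)) p <- deriv(p)
  p
}

# Usage, guarded so the sketch still runs where polynom is missing
if (requireNamespace("polynom", quietly = TRUE)) {
  pcpoly <- polynom::polynomial(1:4 / 10)
  g2 <- as.function(nth_deriv(pcpoly, 2L))   # second derivative as a function
  g2(seq(0, 1, 0.2))
}
```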
Now we can make a plot
## As mentioned, `g0` to `g3` are parameter-free
curve(g0(x), from = 0, to = 5)
curve(g1(x), add = TRUE, col = 2)
curve(g2(x), add = TRUE, col = 3)
curve(g3(x), add = TRUE, col = 4)
