R-exams: extol command in Meta-information

I just started working with the R-exams package to prepare a set of dynamic questions. I am puzzled by the "extol" entry in the Meta-information of an exercise:
Meta-information
================
extol: 0.01
For instance, in the example above, what exactly does "extol: 0.01" mean? Is 0.01 a percentage of the amount that is tolerated? And where can I find detailed information on the "extol" tolerance specification in the r-exams package?
Is it actually possible to set the tolerance as a percentage of the amount?

The extol meta-information sets the absolute tolerance for the exsolution. Thus, exsolution plus/minus extol is the interval that is accepted as correct.
To set a relative tolerance, you can use the num_to_tol() function, which converts a numeric solution into an absolute tolerance. It also ensures a certain minimum tolerance. See the source code for the precise computation:
num_to_tol
## function (x, reltol = 2e-04, min = 0.01, digits = 2)
## pmax(min, round(reltol * abs(x), digits = digits))
This function is useful if the order of magnitude of the correct answer can change substantially across different random replications:
exams::num_to_tol(10)
## [1] 0.01
exams::num_to_tol(100)
## [1] 0.02
exams::num_to_tol(1000)
## [1] 0.2
exams::num_to_tol(10000)
## [1] 2
The default of reltol = 2e-04 and min = 0.01 was chosen based on our needs and experiences in our own exams. You might prefer different specifications, of course.
To apply it in a given exercise for a solution in a variable sol, say, you typically do:
exsolution: `r sol`
extol: `r num_to_tol(sol)`
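To make the resulting acceptance interval concrete, here is a small sketch using the exported num_to_tol() function; the value of sol is just a hypothetical numeric solution:
library("exams")
sol <- 123.456                            # hypothetical numeric solution
tol <- num_to_tol(sol)                    # absolute tolerance derived from the relative rule
c(lower = sol - tol, upper = sol + tol)   # answers inside this interval are accepted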


Handling box constraints in Nelder-Mead optimisation by distorting the parameter space

I have a question about a specific implementation of the Nelder-Mead algorithm (1) that handles box constraints in an unusual way. I cannot find anything about it in any paper (25 papers), textbook (I searched 4 of them), or on the internet.
I have a typical optimisation problem: min f(x) with a box constraint -0.25 <= x_i <= 250
The expected approach would be to use a penalty function and make sure that f(x) is "unattractive" whenever x is out of bounds.
The algorithm in question works differently: it does not touch f(x). Instead it distorts the parameter space using the inverse hyperbolic tangent atanh(). The simplex algorithm can then operate freely in a space without bounds and pick any point. Before it evaluates f(x) to assess the solution at x, the algorithm switches back into the original space.
At first glance I found the idea ingenious: this way we avoid the disadvantages of penalty functions. But now I am having doubts. The distorted space affects termination behaviour. One termination criterion is the size of the simplex, and by inflating the parameter space with atanh(x) we also inflate the simplex size.
Experiments with the algorithm also show that it does not work as intended. I do not yet understand how this happens, but I do get results that are out of bounds. I can say that almost half of the returned local minima are out of bounds.
As an example, take a look at nmkb() optimising the Rosenbrock function when we gradually change the width of the box constraint:
library(dfoptim)

rosbkext <- function(x) {
  # Extended Rosenbrock function
  n <- length(x)
  sum(100 * (x[1:(n - 1)]^2 - x[2:n])^2 + (x[1:(n - 1)] - 1)^2)
}

np <- 6 # 12
for (box in c(2, 4, 12, 24, 32, 64, 128)) {
  set.seed(123)
  p0 <- rnorm(np)
  p0[p0 > +2] <- +2 - 1E-8
  p0[p0 < -2] <- -2 + 1E-8
  ctrl <- list(maxfeval = 5E4, tol = 1E-8)
  o <- nmkb(fn = rosbkext, par = p0, lower = -box, upper = +box, control = ctrl)
  print(o$message)
  cat("f(", format(o$par, digits = 2), ") =", format(o$value, digits = 3), "\n")
}
The output shows that nmkb() claims successful convergence in every case, but in three of them it does not actually reach the global minimum: for bounds of (-2, 2) and (-12, 12), which I might accept, but also for (-128, 128). I also tried the same with the unconstrained dfoptim::nmk(). No trouble there; it converges perfectly.
[1] "Successful convergence"
f( -0.99 0.98 0.97 0.95 0.90 0.81 ) = 3.97
[1] "Successful convergence"
f( 1 1 1 1 1 1 ) = 4.42e-09
[1] "Successful convergence"
f( -0.99 0.98 0.97 0.95 0.90 0.81 ) = 3.97
[1] "Successful convergence"
f( 1 1 1 1 1 1 ) = 1.3e-08
[1] "Successful convergence"
f( 1 1 1 1 1 1 ) = 4.22e-09
[1] "Successful convergence"
f( 1 1 1 1 1 1 ) = 8.22e-09
[1] "Successful convergence"
f( -0.99 0.98 0.97 0.95 0.90 0.81 ) = 3.97
Why does the constrained algorithm have more trouble converging than the unconstrained one?
Footnote (1): I am referring to the Nelder-Mead implementation used in the optimx package in R. That package calls the nmkb() function from another package, dfoptim.
(This question has nothing to do with optimx, which is just a wrapper for R packages providing unconstrained optimization.)
The function in question is nmkb() in the dfoptim package for gradient-free optimization routines. The approach to transform bounded regions into unbounded spaces is a common one and can be applied with many different transformation functions, sometimes depending on the kind of the boundary and/or the type of the objective function. It may also be applied, e.g., to transform unbounded integration domains into bounded ones.
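To illustrate the general idea (this is only a schematic sketch, not the exact parameterization used inside nmkb()), a box (lower, upper) can be mapped to the whole real line with atanh() and mapped back with tanh():
# schematic bounded <-> unbounded transformation for a single coordinate
to_unbounded <- function(x, lower, upper) {
  atanh(2 * (x - lower) / (upper - lower) - 1)
}
to_bounded <- function(u, lower, upper) {
  lower + (upper - lower) * (tanh(u) + 1) / 2
}

u <- to_unbounded(0.5, lower = -2, upper = 2)
to_bounded(u, lower = -2, upper = 2)   # recovers 0.5 (up to floating point error)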
The approach is problematic if the optimum lies at the boundary, because the optimal point will be sent to (nearly) infinity and can never actually be reached. The routine will not converge, or the solution will be quite inaccurate.
If you think the algorithm is not working correctly, you should write to the authors of that package and -- that is important -- add one or two examples for what you think are bugs or incorrect solutions. Without explicit code examples no one here is able to help you.
(1) Those transformations define bijective maps between bounded and unbounded regions and the theory behind this approach is obvious. You may read about possible transformations in books on multivariate calculus.
(2) The approach with penalties outside the bounds has its own drawbacks, for instance the target function will not be smooth at the boundaries, and the BFGS method may not be appropriate anymore.
(3) You could try the Hooke-Jeeves algorithm through function hjkb() in the same dfoptim package. It will be slower, but uses a different approach for treating the boundaries, no transformations involved.
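A minimal call might look like this (a sketch, assuming rosbkext() and a feasible starting point p0 are defined as in the question):
library(dfoptim)

o <- hjkb(par = p0, fn = rosbkext, lower = -2, upper = 2)
o$par
o$value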
EDIT (after discussion with Erwin Kalvelagen above)
There appear to be local minima (with some coordinates negative).
If you set the lower bounds to 0, nmkb() will find the global minimum (1,1,1,1,1,1) in any case.
Watch out: the starting values have to be feasible, that is, all their coordinates must be greater than 0.
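For example (a sketch, assuming dfoptim is loaded and rosbkext() is defined as in the question):
set.seed(123)
p0 <- abs(rnorm(6)) + 0.1   # strictly positive, hence feasible when lower = 0
o <- nmkb(fn = rosbkext, par = p0, lower = 0, upper = 128,
          control = list(maxfeval = 5E4, tol = 1E-8))
o$par    # should end up close to the global minimum (1, 1, 1, 1, 1, 1)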

How does pnorm work with z-scores & x-values?

My professor assigned us some homework questions about normal distributions. We are using RStudio to calculate our values instead of the z-tables.
One question asks something about meteors, where the mean (μ) = 4.35, the standard deviation (σ) = 0.59, and we are looking for the probability that x > 5.
I already figured out the answer with 1-pnorm((5-4.35)/0.59) ~ 0.135.
However, I am currently having some difficulty trying to understand what pnorm calculates.
Originally, I just assumed that z-scores were the only argument needed, so I proceeded to use pnorm(z-score) for most of the normal curve problems.
The help page for pnorm accessed through ?pnorm() indicates that the usage is:
pnorm(q, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE).
My professor also says that I am ignoring the mean and sd by just using pnorm(z-score). I feel like it is just easier to type in one value instead of the whole set of arguments. So I experimented and found that
1-pnorm((5-4.35)/0.59) = 1-pnorm(5,4.35,0.59)
So it looks like pnorm(z-score) = pnorm(x, μ, σ).
Is there a reason why using the z-score allows one to skip the mean and standard deviation in the pnorm function?
I have also noticed that adding the μ and σ arguments to the z-score gives the wrong answer (e.g. pnorm(z-score, μ, σ)).
> 1-pnorm((5-4.35)/0.59)
[1] 0.1352972
> pnorm(5,4.35,0.59)
[1] 0.8647028
> 1-pnorm(5,4.35,0.59)
[1] 0.1352972
> 1-pnorm((5-4.35)/0.59,4.35,0.59)
[1] 1
That is because a z-score is standard normally distributed, meaning it has μ = 0 and σ = 1, which, as you found out, are the default parameters for pnorm().
The z-score is just the transformation of any normally distributed value to a standard normally distributed one.
So when you compute the upper-tail probability of the z-score for x = 5, you indeed get the same value as asking for the probability of x > 5 in a normal distribution with μ = 4.35 and σ = 0.59.
But when you add μ = 4.35 and σ = 0.59 to your z-score inside pnorm(), you get the wrong result, because you are looking up a standard normally distributed value in a different distribution.
pnorm() (to answer your first question) calculates the cumulative distribution function, which gives you P(X ≤ x), the probability that the random variable takes a value less than or equal to x. That's why you do 1 - pnorm(..) to find P(X > x).
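A quick way to see the equivalence in R, using the numbers from the question:
x  <- 5
mu <- 4.35
s  <- 0.59
z  <- (x - mu) / s                               # standardize to a z-score

1 - pnorm(z)                                     # standard normal, default mean = 0, sd = 1
pnorm(x, mean = mu, sd = s, lower.tail = FALSE)  # same upper-tail probability, ~0.135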

Same function with same inputs returns different values

Let's say I have a function as follows:
testFunction <- function(testInputs){
  print( sum(testInputs) + 1 == 2 )
  return( sum(testInputs) == 1 )
}
When I test this on command line with following input: c(0.65, 0.3, 0.05), it prints and returns TRUE as expected.
However when I use c(1-0.3-0.05, 0.3, 0.05) I get TRUE printed and FALSE returned. Which makes no sense because it means sum(testInputs)+1 is 2 but sum(testInputs) is not 1.
Here is what I think: somehow the value is not exactly 1 but probably 0.9999999..., and it is rounded on display. But this is only a guess. How does this work exactly?
This is exactly a floating point problem, but the interesting thing about it for me is how it demonstrates that the return value of sum() produces this error, but with + you don't get it.
See the links about floating point math in the comments. Here is how to deal with it:
sum(1-0.3-0.05, 0.3, 0.05) == 1
# [1] FALSE
dplyr::near(sum(1-0.3-0.05, 0.3, 0.05), 1)
# [1] TRUE
For me, the fascinating thing is:
(1 - 0.3 - 0.05 + 0.3 + 0.05) == 1
# [1] TRUE
Because you can't predict how the various implementations of floating point arithmetic will behave, you need to correct for it. Here, instead of using ==, use dplyr::near(). This problem (floating point math is inexact, and also unpredictable) is found across languages, and different implementations within a language can produce different floating point errors.
As I discussed in this answer to another floating point question, dplyr::near(), like all.equal(), has a tolerance argument, here tol. By default it is set to .Machine$double.eps^0.5. .Machine$double.eps is roughly the smallest number that your machine can add to 1 and still distinguish the result from 1; it's not exact, but it's on that order of magnitude. Taking the square root makes the tolerance a little bigger than that, so values that differ only by an amount consistent with floating point error are treated as equal.
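If you want to check these magnitudes on your own machine (the exact values can differ slightly between platforms):
.Machine$double.eps               # about 2.2e-16 on a typical machine
.Machine$double.eps^0.5           # near()'s default tolerance, about 1.5e-8
1 + .Machine$double.eps / 2 == 1  # TRUE: too small a perturbation to be distinguished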
NOTE: yes, near() is in dplyr, which I almost always have loaded, so I forgot it wasn't in base. You could use all.equal(), but look at the source code of near(). It's exactly what you need, and nothing you don't:
near
# function (x, y, tol = .Machine$double.eps^0.5)
# {
# abs(x - y) < tol
# }
# <environment: namespace:dplyr>
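Applied to the input from the question, the same check can also be done in base R with all.equal() wrapped in isTRUE():
v <- c(1 - 0.3 - 0.05, 0.3, 0.05)

sum(v) == 1                    # FALSE: exact comparison trips over the rounding error
sum(v) - 1                     # a tiny nonzero remainder, on the order of 1e-16
isTRUE(all.equal(sum(v), 1))   # TRUE: compares within a numerical tolerance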

Multiply Probability Distribution Functions

I'm having a hard time building an efficient procedure that adds and multiplies probability density functions to predict the distribution of time that it will take to complete two process steps.
Let "a" represent the probability distribution function of how long it takes to complete process "A". Zero days = 10%, one day = 40%, two days = 50%. Let "b" represent the probability distribution function of how long it takes to complete process "B". Zero days = 10%, one day = 20%, etc.
Process "B" can't be started until process "A" is complete, so "B" is dependent upon "A".
a <- c(.1, .4, .5)
b <- c(.1,.2,.3,.3,.1)
How can I calculate the probability density function of the time to complete "A" and "B"?
This is what I'd expect as the output for the following example:
totallength <- numeric(length(a) + length(b) - 1) # initialize (7 slots, for totals of 0 to 6 days)
totallength[1] <- a[1]*b[1]
totallength[2] <- a[1]*b[2] + a[2]*b[1]
totallength[3] <- a[1]*b[3] + a[2]*b[2] + a[3]*b[1]
totallength[4] <- a[1]*b[4] + a[2]*b[3] + a[3]*b[2]
totallength[5] <- a[1]*b[5] + a[2]*b[4] + a[3]*b[3]
totallength[6] <- a[2]*b[5] + a[3]*b[4]
totallength[7] <- a[3]*b[5]
print(totallength)
[1] 0.01 0.06 0.16 0.25 0.28 0.19 0.05
sum(totallength)
[1] 1
I have an approach in Visual Basic that uses three for loops (one for each of the steps, and one for the output), but I hope I don't have to loop in R.
Since this seems to be a pretty standard process flow question, part two of my question is whether any libraries exist to model operations flow so I'm not creating this from scratch.
The efficient way to do this sort of operation is to use a convolution:
convolve(a, rev(b), type="open")
# [1] 0.01 0.06 0.16 0.25 0.28 0.19 0.05
This is efficient both because it's less typing than computing each value individually and also because it's implemented in an efficient way (using the Fast Fourier Transform, or FFT).
You can confirm that each of these values is correct using the formulas you posted:
(expected <- c(a[1]*b[1],
               a[1]*b[2] + a[2]*b[1],
               a[1]*b[3] + a[2]*b[2] + a[3]*b[1],
               a[1]*b[4] + a[2]*b[3] + a[3]*b[2],
               a[1]*b[5] + a[2]*b[4] + a[3]*b[3],
               a[2]*b[5] + a[3]*b[4],
               a[3]*b[5]))
# [1] 0.01 0.06 0.16 0.25 0.28 0.19 0.05
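If it helps to see which total duration each entry corresponds to, you can name the result (the i-th entry is the probability of finishing in i - 1 days, i.e. 0 through 6 days here):
ab <- convolve(a, rev(b), type = "open")
setNames(round(ab, 10), 0:(length(ab) - 1))   # rounding just cleans up tiny FFT noise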
See the distr package. Choosing the term "multiply" is unfortunate, since the situation described is not one where independent probabilities are combined (where multiplication of probabilities would be the natural term to use). It is rather some sort of sequential addition, and that is exactly what the distr package provides as its interpretation of what "+" should mean when used as a symbolic manipulation of two discrete distributions.
library(distr)
A <- DiscreteDistribution(setNames(0:2, c("Zero", "one", "two")), a)
B <- DiscreteDistribution(setNames(0:4, c("Zero2", "one2", "two2",
                                          "three2", "four2")), b)
?'operators-methods' # where operations on 2 DiscreteDistribution are convolution
plot(A+B)
After a bit of nosing around I see that the actual numeric values can be found here:
A.then.B <- A + B
environment(A.then.B@d)$dx
[1] 0.01 0.06 0.16 0.25 0.28 0.19 0.05
It seems like there should be a method for displaying the probabilities, and since I'm not a regular user of this fascinating package there may well be one. Do read the vignette and the code demos ... which I have not yet done. Further noodling around convinces me that the right place to look is the companion package distrDoc, where the vignette is 100+ pages long. And it shouldn't have required any effort to find it, either, since that advice is in the messages that print when the package is loaded ... except, in my defense, there were a couple of pages of messages, so it was more tempting to jump straight into coding and the help pages.
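A smaller point: if I remember the distr API correctly, the exported accessor d() returns the probability function of a distribution object, so the same numbers should be obtainable without poking into environments (treat this as a sketch):
d(A.then.B)(0:6)   # probabilities for total lengths of 0, 1, ..., 6 days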
I'm not familiar with a dedicated package that does exactly what your example describes, but let me suggest a more robust solution to this problem.
You are looking for a way to estimate the distribution of a process that is made up of n steps (two in your case), which might not always be as easy to compute analytically as your example.
The approach I would use is a simulation of 10k observations drawn from the underlying distributions, and then calculating the density of the simulated results.
Using your example we can do the following:
library(data.table)

x <- runif(10000)
y <- runif(10000)
z <- as.data.table(cbind(x, y))
# process A: cut points follow cumsum(a) = 0.1, 0.5, 1
z[x >= 0   & x < 0.1, a_days := 0]
z[x >= 0.1 & x < 0.5, a_days := 1]
z[x >= 0.5 & x <= 1,  a_days := 2]
# process B: based on y, cut points follow cumsum(b) = 0.1, 0.3, 0.6, 0.9, 1
z[y >= 0   & y < 0.1, b_days := 0]
z[y >= 0.1 & y < 0.3, b_days := 1]
z[y >= 0.3 & y < 0.6, b_days := 2]
z[y >= 0.6 & y < 0.9, b_days := 3]
z[y >= 0.9 & y <= 1,  b_days := 4]
z[, total_days := a_days + b_days]
hist(z[, total_days])
This will give a very good proxy of the density, and the approach would also work if your second process were drawn from an exponential distribution, in which case you'd use the rexp() function to generate b_days directly.
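A more direct way to run the same simulation (a sketch, drawing the day counts with sample() instead of cutting uniform draws by hand):
set.seed(123)
n <- 10000
a_days <- sample(0:2, n, replace = TRUE, prob = a)   # draws for process A
b_days <- sample(0:4, n, replace = TRUE, prob = b)   # draws for process B
prop.table(table(a_days + b_days))                   # empirical distribution of the total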

more significant digits

How can I get more significant digits in R? Specifically, I have the following example:
> dpois(50, lambda= 5)
[1] 1.967673e-32
However when I get the p-value:
> 1-ppois(50, lambda= 5)
[1] 0
Obviously, the p-value is not 0. In fact it should be greater than 1.967673e-32, since I'm summing a bunch of probabilities. How do I get the extra precision?
Use lower.tail=FALSE:
ppois(50, lambda= 5, lower.tail=FALSE)
## [1] 2.133862e-33
Asking R to compute the upper tail is much more accurate than computing the lower tail and subtracting it from 1: given the inherent limitations of floating point precision, R can't distinguish (1 - eps) from 1 for values of eps less than .Machine$double.neg.eps, typically around 1e-16 (see ?.Machine).
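You can see the cutoff directly:
1 - 1e-17 == 1            # TRUE: the tiny tail is completely absorbed by the subtraction
.Machine$double.neg.eps   # about 1.1e-16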
This issue is discussed in ?ppois:
Setting ‘lower.tail = FALSE’ allows to get much more precise
results when the default, ‘lower.tail = TRUE’ would return 1, see
the example below.
Note also that your comment about the value needing to be greater than dpois(50, lambda=5) is not quite right; ppois(x, ..., lower.tail=FALSE) gives the probability that the random variable is strictly greater than x, as you can check, for example, by noting that ppois(0, ..., lower.tail=FALSE) is not exactly 1, or:
dpois(50,lambda=5) + ppois(50,lambda=5,lower.tail=FALSE)
## [1] 2.181059e-32
ppois(49,lambda=5,lower.tail=FALSE)
## [1] 2.181059e-32
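If you need tail probabilities that are too small even for this route, the distribution functions can also return the result on the log scale via log.p = TRUE:
lp <- ppois(50, lambda = 5, lower.tail = FALSE, log.p = TRUE)
lp        # about -75.2, i.e. log(2.13e-33)
exp(lp)   # back on the probability scale (only safe while it is representable)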
