Transformation with Box-Cox in R

I have a vector like x = c(7.41, 7.32, 7.14, 6.46, 7.36, 7.23, 7.16, 7.28). I ran a Shapiro test (shapiro.test) and got a p-value of 0.003391826, which means the data are not normally distributed, so I want to transform them with Box-Cox (or something better, other than log and square root) into a normal form.
This is the command I tried: boxcox_x = boxcox(x ~ 1, lambda = seq(2, 3, 1/10), plotit = TRUE, eps = 1/50, xlab = expression(lambda), ylab = "log-Likelihood"). After this I saw in the diagram, for example, lambda = -2.
Then I wrote lambda.max = boxcox_x$x[which.max(boxcox_ph$y)], and the lambda value from this code was completely different from what I could see in the diagram.
Then I wrote x_new = bcPower(x, lambda.max, jacobian.adjusted = FALSE), because I thought this would give me my new, normally distributed vector, but the result was completely different.
Can anybody help me with an easy explanation (I am a newcomer)?
Thank you

Getting a good approximation of the distribution is a bit of an art that depends on the context.
A bigger problem may be your small sample size, which can lead to unreliable estimates of the p-value and a poor representation of the data by any distribution.
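That said, here is a minimal sketch of the usual workflow, assuming MASS for boxcox() and car for bcPower(), using the values from the question:
library(MASS)   # boxcox()
library(car)    # bcPower()
x <- c(7.41, 7.32, 7.14, 6.46, 7.36, 7.23, 7.16, 7.28)
# Profile the log-likelihood over a wide lambda range and pick the maximum.
bc <- boxcox(x ~ 1, lambda = seq(-5, 5, 1/10), plotit = TRUE)
lambda.max <- bc$x[which.max(bc$y)]   # index $x and $y of the same boxcox object
# Apply the transformation with the selected lambda and re-test.
x_new <- bcPower(x, lambda.max, jacobian.adjusted = FALSE)
shapiro.test(x_new)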

Related

How to use Simulated Annealing in R (GenSA) for a function with discrete variables that have only a few options and no pattern?

I want to use Simulated Annealing. My objective function has multiple variables, and for some of them only a few options are possible. I saw the same question on Stack here:
How to use simulated annealing for a function with discrete parameters?, but there was no answer, only a reference to: How to put mathematical constraints with GenSA function in R.
I don't understand how to apply the advice from the second link to my situation (but I think the answer can be found there).
For example:
v <- c(50, 50, 25, 25)
lower <- c(0,0,0,20)
upper <- c(100,100,50,40)
out <- GenSA(v, lower = lower, upper = upper, fn = efficientFunction)
Assume that the fourth parameter, v[4], can only be in {20,25,30,35,40}. They suggested the use of Lagrange multipliers, hence I was thinking of something like lambda * ceil(v[4] / 5). Is this a good idea?
But what can I do if the sample space of a variable does not have a nice pattern, for example if the third parameter, v[3], can only be in {0,21,33,89,100}? I don't understand why a Lagrange multiplier helps in this situation. Do I need to change the form of my parameters so that they follow a pattern, or is there another option?
In case Lagrange multipliers are the only option, I'll end up with 8 of these formulations in my objective. It seems to me that there is another option, but I don't know how!
With kind regards and thanks in advance,
Roos
With SA, you could start with a very simple neighbourhood scheme: pick one of the parameters and change it by selecting a new valid setting, one above or one below the current one (assuming the settings have an order, as seems to be the case here).
There are no Lagrange multipliers involved in SA as far as I know, but there are many variations, and maybe some that handle constraints do make use of them.
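For what it's worth, here is a minimal sketch of such a neighbourhood move, not with GenSA but with base R's optim(method = "SANN"), which accepts a custom candidate generator through its gr argument; the discrete sets and the toy objective below are made up for illustration:
# Valid settings per parameter (made-up sets following the question).
choices <- list(seq(0, 100, by = 5),
                seq(0, 100, by = 5),
                c(0, 21, 33, 89, 100),     # v[3]: no nice pattern
                c(20, 25, 30, 35, 40))     # v[4]
# Neighbourhood move: pick one parameter, step to the next valid setting up or down.
neighbour <- function(v, ...) {
  i <- sample(length(v), 1)
  j <- which.min(abs(choices[[i]] - v[i]))   # current position within its set
  step <- if (j == 1) 1 else if (j == length(choices[[i]])) -1 else sample(c(-1, 1), 1)
  v[i] <- choices[[i]][j + step]
  v
}
efficientFunction <- function(v) sum((v - c(60, 40, 33, 30))^2)   # toy objective
out <- optim(c(50, 50, 25, 25), efficientFunction, gr = neighbour, method = "SANN")
out$par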

Why has the author used the following matrices for the following standardisation?

Can somebody tell me why this author has used the following code in their normalisation?
The first line appears fine to me: they have standardised the training set with the formula
(x - mean(x)) / std(x)
However, in the second and third lines (validation and test) they have used the training mean (trainme) and training standard deviation (trainstd). Should they not have used the validation mean (validationme) and validation standard deviation (validationstd), along with the test mean and test standard deviation?
You can also view the page from the book at the following link (page 173)
What the authors are doing is reasonable and it's what is conventionally done. The idea is that the same normalization is applied to all inputs. This is essentially allocating some new parameters (offset and scale) and estimating them from the training data. In that scheme, if the value 100 is input, then the normalized value is (100 - offset)/scale, no matter where (training, testing, whatever) that 100 came from.
I guess one can also make an argument that the offset and scale should be context dependent in the sense that if you are given a set of data and for some reason the offset and scale are very different from the original training data, maybe what's important is how big each value is relative to the others in the same data set. E.g. maybe you should treat 200 the same as 100, if the scale is twice as big in the data set containing 200.
Whether that data-dependent scaling is reasonable would have to be decided case by case. I don't remember ever having seen it, but it's plausible that it could be the right thing to do in some cases.
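A minimal sketch of the conventional scheme, with made-up data and the variable names from the question (trainme, trainstd):
set.seed(1)
train <- matrix(rnorm(200, mean = 50, sd = 10), ncol = 4)
valid <- matrix(rnorm(80,  mean = 50, sd = 10), ncol = 4)
test  <- matrix(rnorm(80,  mean = 50, sd = 10), ncol = 4)
trainme  <- colMeans(train)        # offset estimated from the training data only
trainstd <- apply(train, 2, sd)    # scale estimated from the training data only
# The same offset and scale are applied to every split.
train_s <- sweep(sweep(train, 2, trainme), 2, trainstd, "/")
valid_s <- sweep(sweep(valid, 2, trainme), 2, trainstd, "/")
test_s  <- sweep(sweep(test,  2, trainme), 2, trainstd, "/")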
By the way, you'll get more interest in general statistical questions at stats.stackexchange.com and/or datascience.stackexchange.com.

Equation of rbfKernel in kernlab is different from the standard?

I have observed that kernlab uses rbfkernel as,
rbf(x,y) = exp(-sigma * euclideanNorm(x-y)^2)
but according to this wiki link, the rbf kernel should be of the form
rbf(x,y) = exp(-euclideanNorm(x-y)^2/(2*sigma^2))
which is also more intuitive since two close samples with a large kernel sigma value will lead to a higher similarity matching.
I am not sure what e1071 svm uses (native code libsvm?).
I hope someone can enlighten me on why there is a difference? I caught this because I was initially using e1071 but switched to ksvm and saw inconsistent results between the two.
A small example for comparison (kernlab is needed for rbfdot):
library(kernlab)
set.seed(123)
x <- rnorm(3)
y <- rnorm(3)
sigma <- 100
rbf <- rbfdot(sigma = sigma)
rbf(x, y)
exp(-sum((x - y)^2) / (2 * sigma^2))
I would expect the kernel value to be close to 1 (since x and y come from a standard normal, while the kernel sigma is 100). This is observed only in the second case.
I came across that discrepancy too, and I wound up digging into the source to figure out whether there was a typo in the documentation or what exactly was going on, since sigma in the context of Gaussians traditionally appears as the standard deviation in the denominator, right?
Here's the relevant source
**kernlab\R\kernels.R**
## Define the kernel objects,
## functions with an additional slot for the kernel parameter list.
## kernel functions take two vector arguments and return a scalar (dot product)
rbfdot <- function(sigma = 1)
{
  rval <- function(x, y = NULL)
  {
    if (!is(x, "vector")) stop("x must be a vector")
    if (!is(y, "vector") && !is.null(y)) stop("y must a vector")
    if (is(x, "vector") && is.null(y)) {
      return(1)
    }
    if (is(x, "vector") && is(y, "vector")) {
      if (!length(x) == length(y))
        stop("number of dimension must be the same on both data points")
      return(exp(sigma * (2 * crossprod(x, y) - crossprod(x) - crossprod(y))))
      # sigma/2 or sigma ??
    }
  }
  return(new("rbfkernel", .Data = rval, kpar = list(sigma = sigma)))
}
You can see from their comment sigma/2 or sigma ?? that they may have been a bit unsure about which convention to adopt; the presence of /2 would be consistent with the standard-deviation form /(2*sigma^2), but this is speculation on my part.
Now another corroborating piece of evidence is in the help page for ? rbfdot which reads...
sigma: The inverse kernel width used by the Gaussian, the Laplacian, the Bessel and the ANOVA kernel
And that is consistent with the form they use, with sigma in the numerator, since a width parameter in the denominator would scale proportionally with the width of the Gaussian. So it does look like they settled on the convention described in the Wikipedia article as the gamma form, where it says
An equivalent, but simpler, definition involves a parameter gamma =
-1/(2*sigma^2)
So the difference just seems to be a matter of adopting different but equivalent conventions. One motivation for this particular convention (which someone may confirm in a comment) may be code reuse and consistency: as you can see, the parameter is shared with three other kernel forms whose parameters may more traditionally sit in the numerator. I'm not sure on that point, however, since I've never used those alternate kernels and am unfamiliar with them.
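So, if you want the textbook standard-deviation form with kernlab, a minimal sketch (assuming nothing beyond rbfdot itself) is to translate the width s into kernlab's sigma = 1/(2*s^2):
library(kernlab)
set.seed(123)
x <- rnorm(3)
y <- rnorm(3)
s <- 100                               # intended "width" in the textbook form
rbf <- rbfdot(sigma = 1 / (2 * s^2))   # translate into kernlab's parameterisation
rbf(x, y)                              # now agrees with the textbook expression:
exp(-sum((x - y)^2) / (2 * s^2))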

What's the lowest number R will present before rounding to 0?

I'm doing some statistical analysis with R (bootstrapped Kolmogorov-Smirnov tests) on very large data sets, meaning that my p-values are all incredibly small. I've Bonferroni-corrected for the large number of tests I've performed, meaning that my alpha value for rejecting the null hypothesis is also very small.
The problem is that R presents me with p-values of 0 in some cases where the p-value is presumably so small that it cannot be represented (usually for the very large sample sizes). While I can happily reject the null hypothesis for these tests, the data are for publication, so I'll need to write p < ..... but I don't know what the lowest reportable value in R is.
I'm using the ks.boot function, in case that matters.
Any help would be much appreciated!
.Machine$double.xmin gives you the smallest non-zero normalized floating-point number. On most systems that's 2.225074e-308. However, I don't believe this is a sensible limit.
Instead I suggest that in Matching::ks.boot you change the line
ks.boot.pval <- bbcount/nboots
to
ks.boot.pval <- log(bbcount) - log(nboots)
and work on the log scale.
Edit:
You can use trace to modify the function.
Step 1: Look at the function body, to find out where to add additional code.
as.list(body(ks.boot))
You'll see that element 17 is ks.boot.pval <- bbcount/nboots, so we need to add the modified code directly after that.
Step 2: trace the function.
trace(ks.boot, quote(ks.boot.pval <- log(bbcount) - log(nboots)), at = 18)
Step 3: Now you can use ks.boot and it will return the logarithm of the bootstrap p-value as ks.boot.pvalue. Note that you cannot use summary.ks.boot since it calls format.pval, which will not show you negative values.
Step 4: Use untrace(ks.boot) to remove the modifications.
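Putting the steps together, a minimal sketch might look like this (the Matching package is assumed, the samples are made up, and the at = 18 position may differ between package versions, so check step 1 first):
library(Matching)
set.seed(1)
x <- rnorm(1000)
y <- rnorm(1000, mean = 0.5)
as.list(body(ks.boot))   # step 1: find the element computing ks.boot.pval
trace(ks.boot, quote(ks.boot.pval <- log(bbcount) - log(nboots)), at = 18)   # step 2
res <- ks.boot(x, y, nboots = 1000)   # step 3: log of the bootstrap p-value
res$ks.boot.pvalue
untrace(ks.boot)                      # step 4: remove the modification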
I don't know whether ks.boot has methods in the packages Rmpfr or gmp, but if it does, or if you feel like rolling your own code, you can work with arbitrary-precision and arbitrary-size numbers.

How to handle boundary constraints when using `nls.lm` in R

I asked this question a while ago. I am not sure whether I should post this as an answer or a new question. I do not have an answer, but I "solved" the problem by applying the Levenberg-Marquardt algorithm using nls.lm in R and, when the solution is at the boundary, running the trust-region-reflective algorithm (TRR, implemented in R) to step away from it. Now I have new questions.
From my experience, done this way the program reaches the optimum and is not so sensitive to the starting values. But this is only a practical way to step around the issues I encountered using nls.lm and other optimization functions in R. I would like to know why nls.lm behaves this way for optimization problems with boundary constraints, and how to handle boundary constraints with nls.lm in practice.
Below I give an example illustrating two issues with nls.lm:
It is sensitive to starting values.
It stops when some parameter reaches the boundary.
A Reproducible Example: Focus Dataset D
library(devtools)
install_github("KineticEval","zhenglei-gao")
library(KineticEval)
data(FOCUS2006D)
km <- mkinmod.full(parent = list(type = "SFO",
                                 M0 = list(ini = 0.1, fixed = 0, lower = 0.0, upper = Inf),
                                 to = "m1"),
                   m1 = list(type = "SFO"),
                   data = FOCUS2006D)
system.time(Fit.TRR <- KinEval(km,evalMethod = 'NLLS',optimMethod = 'TRR'))
system.time(Fit.LM <- KinEval(km,evalMethod = 'NLLS',optimMethod = 'LM',ctr=kingui.control(runTRR=FALSE)))
compare_multi_kinmod(km,rbind(Fit.TRR$par,Fit.LM$par))
dev.print(jpeg,"LMvsTRR.jpeg",width=480)
The differential equations that describes the model/system is:
"d_parent = - k_parent * parent"
"d_m1 = - k_m1 * m1 + k_parent * f_parent_to_m1 * parent"
In the graph, on the left is the model with the initial values, in the middle is the fitted model using "TRR" (similar to the algorithm in Matlab's lsqnonlin function), and on the right is the fitted model using "LM" with nls.lm. Looking at the fitted parameters (Fit.LM$par) you will find that one fitted parameter (f_parent_to_m1) is at the boundary 1. If I change the starting value of the parameter M0_parent from 0.1 to 100, then I get the same results using nls.lm and lsqnonlin. I have many cases like this one.
newpars <- rbind(Fit.TRR$par,Fit.LM$par)
rownames(newpars)<- c("TRR(lsqnonlin)","LM(nls.lm)")
newpars
               M0_parent   k_parent        k_m1 f_parent_to_m1
TRR(lsqnonlin)  99.59848 0.09869773 0.005260654       0.514476
LM(nls.lm)      84.79150 0.06352110 0.014783294       1.000000
Apart from the above problems, it often happens that the Hessian returned by nls.lm is not invertible (especially when some parameters are on the boundary), so I cannot get an estimate of the covariance matrix. On the other hand, the "TRR" algorithm (in Matlab) almost always gives an estimate by calculating the Jacobian at the solution point. I think this is useful, but I am also sure that the R optimization algorithms (the ones I have tried) do not do this for a reason. I would like to know whether I am wrong to use the Matlab way of calculating the covariance matrix to get standard errors for the parameter estimates.
One last note: I claimed in my previous post that Matlab's lsqnonlin outperforms R's optimization functions in almost all cases. I was wrong. The "Trust-Region-Reflective" algorithm used in Matlab is in fact slower (sometimes much slower) when implemented in R, as you can see from the above example. However, it is still more stable and reaches a better solution than R's basic optimization algorithms.
First off, I am not an expert on Matlab or optimisation, and I have never used R.
I am not sure I see what your actual question is, but maybe I can shed some light on your puzzlement:
LM is a slightly enhanced Gauß-Newton approach; for problems with several local minima it is very sensitive to the initial state, and including boundaries typically generates more of those minima.
TRR is akin to LM, but more robust: it is better at "jumping out of" bad local minima. It is quite plausible that it behaves better, but runs slower, than LM. Actually explaining why is very hard; you would need to study the algorithms in detail and look at how they behave in this situation.
I cannot explain the difference between Matlab's and R's implementations, but there are several extensions to TRR that Matlab may use and R may not.
Does your approach of using LM and TRR alternatingly converge better than TRR alone?
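As a side note on the practical question, minpack.lm's nls.lm does accept box constraints directly through its lower and upper arguments. Here is a minimal sketch on a made-up one-compartment decay problem (not the kinetic model above):
library(minpack.lm)
set.seed(42)
times <- seq(0, 20, by = 0.5)
obs <- 100 * exp(-0.1 * times) + rnorm(length(times), sd = 2)   # simulated decay data
# Residuals: p[1] is the initial amount, p[2] the rate constant.
resid_fun <- function(p, times, obs) obs - p[1] * exp(-p[2] * times)
fit <- nls.lm(par = c(50, 0.5),
              lower = c(0, 0),
              upper = c(1e6, 1),      # finite box constraints on both parameters
              fn = resid_fun, times = times, obs = obs)
summary(fit)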
Using the mkin package, you can find the parameters using the "Port" algorithm (which, as far as I can tell from its documentation, is also a kind of TRR algorithm), or the "Marq" algorithm, which uses nls.lm in the background. Then you can use "normal" starting values or "bad" starting values.
library(mkin)
packageVersion("mkin")
Recent mkin versions can speed up the process considerably, as they compile the models from automatically generated C code if a compiler is available on your system (e.g. you have r-base-dev installed on Debian/Ubuntu, or Rtools on Windows).
This defines the model:
m <- mkinmod(parent = mkinsub("SFO", "m1"),
             m1 = mkinsub("SFO"),
             use_of_ff = "max")
You can check that the differential equations are correct:
cat(m$diffs, sep = "\n")
Then we fit in four variants, Port and LM, with or without M0 fixed to 0.1:
f.Port = mkinfit(m, FOCUS_2006_D)
f.Port.M0 = mkinfit(m, FOCUS_2006_D, state.ini = c(parent = 0.1, m1 = 0))
f.LM = mkinfit(m, FOCUS_2006_D, method.modFit = "Marq")
f.LM.M0 = mkinfit(m, FOCUS_2006_D, state.ini = c(parent = 0.1, m1 = 0),
                  method.modFit = "Marq")
Then we look at the results:
results <- sapply(list(Port = f.Port, Port.M0 = f.Port.M0, LM = f.LM, LM.M0 = f.LM.M0),
                  function(x) round(summary(x)$bpar[, "Estimate"], 5))
which are
                   Port  Port.M0       LM    LM.M0
parent_0       99.59848 99.59848 99.59848 39.52278
k_parent        0.09870  0.09870  0.09870  0.00000
k_m1            0.00526  0.00526  0.00526  0.00000
f_parent_to_m1  0.51448  0.51448  0.51448  1.00000
So we can see that the Port algorithm finds the best solution (to the best of my knowledge) even with bad starting values. The speed issue that one may have with more complicated models is alleviated using the automatic generation of C code.

Resources