Gamma function returns unstable value? - r

The gamma function should not take a negative value as an argument, yet something strange happens in the code below. Is this a problem with R?
I was using the function optim to optimize a function containing:
gamma(sum(alpha))
with respect to alpha, and R returned a negative alpha.
> gamma(sum(alpha))
[1] 3.753e+14
> sum(alpha)
[1] -3
> gamma(-3)
[1] NaN
Warning message:
In gamma(-3) : NaNs produced
Can somebody explain? Or any suggestion for the optimization?
Thanks!

The gamma function is not defined at negative integer arguments, so R returns Not a Number (NaN) there. The reason for the "strange" behaviour is the floating-point representation of numbers in R: when a number differs from the nearest integer by very little, R rounds it during printing (in fact, when you type alpha, R calls print(alpha)). See the examples of this behaviour below.
gamma(-3)
# [1] NaN
# Warning message:
# In gamma(-3) : NaNs produced
x <- -c(1, 2, 3) / 2 - 1e-15
x
# [1] -0.5 -1.0 -1.5
sum(x)
# [1] -3
gamma(sum(x))
# [1] 5.361428e+13
curve(gamma, xlim = c(-3.5, -2.5))
The graph drawn by curve() above shows the behaviour of the gamma function near negative integers: it diverges to ±Inf on either side of each pole, so an argument even slightly off a negative integer can produce an enormous value.
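If the goal is just to keep optim away from negative alpha, one workaround (my own sketch, not from the question) is to optimize over log(alpha) instead, so that alpha can never go negative, and to work with lgamma() to avoid overflow:

```r
# Sketch (hypothetical reparameterization, not the asker's model):
# optimize over theta = log(alpha) so alpha = exp(theta) stays positive,
# and use lgamma() instead of log(gamma()) for numerical stability.
obj <- function(theta) {
  alpha <- exp(theta)      # alpha > 0 for every real theta
  lgamma(sum(alpha))       # finite for any positive sum(alpha)
}
fit <- optim(c(0, 0, 0), obj)
exp(fit$par)               # recovered alpha values, all strictly positive
```

The recovered alpha values can then be used directly in the original objective without ever hitting a pole of the gamma function.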

How can I find which observations produce NaN in log()? - r

I have a time series in RStudio and I want to calculate the log() of this series. I tried the following:
i <- (x-y)
ii <- log(i)
But then I get the following: Warning message: In log(i) : NaNs produced
To inspect this I used table(is.nan(ii)), which gives me the following output:
FALSE TRUE
2480 1
So I assume that there is now 1 NaN in my time series. My question is: what code can I use to make R show me which observation produced the NaN?
Here is a small data sample: i <- c(9,8,4,5,7,1,6,-1,8,4)
Btw, how do I type mathematical formulas on Stack Overflow, for example for log(x)? Many thanks
As I said in my comment, to find out which observation generated the NaN, you can use the function which:
i <- c(9,8,4,5,7,1,6,-1,8,4)
which(is.nan(log(i))) # 8
Use the test to subset your original vector with the values that produce NaN:
> i <- c(9,8,4,5,-7,1,6,-1,8,4,Inf,-Inf,NA)
> i[which(is.nan(log(i)))]
[1] -7 -1 -Inf
Warning message:
In log(i) : NaNs produced
Here you see that -7, -1, and -Inf produced NaN.
Note that log(NA) is not NaN, it's NA, which is a different sort of not-a-number.
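A quick illustration of that difference, using a toy vector (my own example):

```r
# NA and NaN propagate differently through log() and the is.* tests.
vals <- c(-1, NA, 2)
suppressWarnings(lv <- log(vals))  # NaN for -1; NA stays NA
is.nan(lv)          # TRUE FALSE FALSE -- is.nan() is FALSE for NA
is.na(lv)           # TRUE TRUE  FALSE -- is.na() is TRUE for both NaN and NA
which(is.nan(lv))   # 1: only the negative input is flagged
```

So which(is.nan(...)) finds only the observations that actually produced a NaN, while pre-existing NAs are left alone.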

Questions about boundary constraints with L-BFGS-B method in optim() in R

I am trying to use L-BFGS-B method in optim() to find out the minimum value of the following function:
ip <- function(x) log(mean(exp(return*x))), where "return" is a vector of constants.
First, I gave no boundary constraints: rst1 <- optim(-1, ip, method="L-BFGS-B"), and it provided a reasonable answer (x = -118.44, ip.min = -0.00017), which could be justified both by theory and by an Excel calculation. The message given in the result was
CONVERGENCE: NORM OF PROJECTED GRADIENT <= PGTOL.
As x must be less than zero in theory, I then added boundary constraints to the optimizer: rst2 <- optim(-1, ip, method="L-BFGS-B", lower=-Inf, upper=0). However, this time it only returned the value at the initial parameter (-1), which is obviously not the minimum. The message given in the result was
CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCH.
I then tried other boundary constraints, and no matter what they were, as long as any boundary constraints were added it would always return the value at the initial parameter and fail to find the minimum.
Does anyone know why this happens? Many thanks.
example
rtntxt<-"
return
9.15051E-05
9.67217E-07
1.34187E-05
-0.000105801
0.000111004
0.000228786
3.84068E-06
0.000388639
-0.000122291
-7.73028E-05
4.97595E-05
-3.97503E-05
1.86449E-05
-0.000137739
-0.000180709
-1.07254E-05
3.89723E-05
"
rtn<-read.table(text=rtntxt,header=TRUE)
ip<-function(x) log(mean(exp(rtn$return*x)))
rst1<-optim(-1,ip,method="L-BFGS-B") #no boundaries
rst2<-optim(-1,ip,method="L-BFGS-B",lower=-Inf,upper=0) #with boundaries
plot
x<- -10000:10000
n<-length(x)
s<-numeric(n)
for(i in 1:n) s[i]<-ip(x[i])
plot(x,s)
x[which(s==min(s))] #rst1(no boundaries) is correct
min(s)
I am not sure how you got that result: if I correct your code for the misspelling, I still get similar answers from both calls, not the answer in your post:
ip<-function(x) log(mean(exp(return(x))))
rst1<-optim(-1,ip,method="L-BFGS-B")
# > rst1
# $`par`
# [1] -1.820444e+13
#
# $value
# [1] -1.820444e+13
#
# $counts
# function gradient
# 20 20
#
# $convergence
# [1] 0
#
# $message
# [1] "CONVERGENCE: NORM OF PROJECTED GRADIENT <= PGTOL"
#
rst2<-optim (-1,ip,method="L-BFGS-B",lower=-Inf,upper=0)
# $`par`
# [1] -1.80144e+13
#
# $value
# [1] -1.80144e+13
#
# $counts
# function gradient
# 3 3
#
# $convergence
# [1] 0
#
# $message
# [1] "CONVERGENCE: NORM OF PROJECTED GRADIENT <= PGTOL"
Moreover, to check whether there could be a mistake in my code, I tried to plot the values of your function for -1:-100000, but it does not look like there is an optimum where you say there is. Check your code/post, and if you know approximately where the optimum is, try to plot the function graphically (that would be my advice). Cheers!
plot(x = -1:-100000, y = ip(-1:-100000))
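Since the objective has only one parameter, another route worth trying (my suggestion, not from the thread) is a derivative-free 1-D search with optimize() over an explicit interval, which sidesteps L-BFGS-B's convergence tests entirely:

```r
# Sketch (assumption: the asker's data, with the vector renamed to avoid
# the reserved word `return`). log(mean(exp(rtn * x))) is a log-sum-exp of
# linear functions, hence convex in x, so a 1-D search finds the global
# minimum on the interval.
rtn <- c( 9.15051e-05,  9.67217e-07,  1.34187e-05, -1.05801e-04,  1.11004e-04,
          2.28786e-04,  3.84068e-06,  3.88639e-04, -1.22291e-04, -7.73028e-05,
          4.97595e-05, -3.97503e-05,  1.86449e-05, -1.37739e-04, -1.80709e-04,
         -1.07254e-05,  3.89723e-05)
ip  <- function(x) log(mean(exp(rtn * x)))
opt <- optimize(ip, interval = c(-10000, 0))
opt$minimum    # location of the minimum, well below the start value of -1
opt$objective  # improves on ip(-1)
```

The interval plays the role of the box constraint here, so the x < 0 restriction is respected by construction.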

Optim: non-finite finite-difference value in L-BFGS-B

I'm trying to maximize a likelihood with R's 'optim.' I get the error "non-finite finite-difference value."
I'm using L-BFGS-B because I have to constrain the 11th parameter (Bernoulli "p") to be 0<=p<=1. Since I need this constraint, I can't use a nongradient method like "Nelder-Mead." Any thoughts on how I can fix this? It worked fine with simulated data!
Note that I'm using a floor function in here because discrete values are needed for the "Trials" parameters (params 1 through 10).
library(rmutil)
Nhat<-c(14335,15891,2700,1218,2213,10985,4985,8738,13878)
sdNhat <- sqrt(c(26915344,6574096,175561,51529,71824,12166144,145924,2808976,3319684))
C<-c(313,410,38,30,69,175,132,193,240)
LL1<-vector()
LL2<-vector()
NLL <- function(data, par){
  for (i in 1:length(Nhat)){
    LL1[i] <- dnorm(Nhat[i], par[i], sdNhat[i], log=TRUE)
    LL2[i] <- dbetabinom(C[i], floor(par[i]), par[length(Nhat)+1], par[length(Nhat)+2], log=TRUE)
  }
  -1*(sum(LL1)+sum(LL2))
}
out<-optim(par=c(floor(Nhat*runif(length(Nhat),0.9,1.1)),0.02,3),
fn=NLL,data=list(Nhat=Nhat,sdNhat=sdNhat,C=C),
method='L-BFGS-B',
lower=c(rep(min(Nhat),length(Nhat)),0.0001,1),
upper=c(rep(min(Nhat),length(Nhat)),0.9999,2))
You are getting an error because the boundaries you are setting for parameters 1 to 9 are identical: the lower and upper bounds are the same value. You have to adjust upper=c(rep(min(Nhat),length(Nhat)),0.9999,2)) (or lower) so that each parameter gets a genuine interval.
You said that only the 10th parameter (you actually wrote 11th, but I guess that's a typo) has to be bounded between 0 and 1, so this would work:
set.seed(1)
out<-optim(par=c(floor(Nhat*runif(length(Nhat),0.9,1.1)),0.02,3),
fn=NLL,data=list(Nhat=Nhat,sdNhat=sdNhat,C=C),
method='L-BFGS-B',
lower=c(rep(-Inf,length(Nhat)),0,-Inf),
upper=c(rep(Inf,length(Nhat)),1,Inf))
out
# $par
# [1] 13660.61522882 15482.96819195 2730.66273051 1310.04511624 2077.45269032 11857.94955470
# [7] 5417.09464008 9016.57472573 14234.22972586 0.02165253 826.21691430
#
# $value
# [1] 116.2657
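As a minimal illustration of the bounds point (a toy objective of my own, not the likelihood above): when every parameter has a genuine interval, L-BFGS-B has room to take its finite-difference steps and converges normally.

```r
# Toy quadratic with a known minimum at c(2, 3); the bounds are proper
# intervals, so L-BFGS-B can evaluate finite differences inside them.
f <- function(p) (p[1] - 2)^2 + (p[2] - 3)^2
fit <- optim(c(0, 0), f, method = "L-BFGS-B",
             lower = c(-10, -10), upper = c(10, 10))
fit$par   # converges near c(2, 3)
```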

R small pvalues

I am calculating z-scores to see if a value is far from the mean/median of the distribution.
I had originally done this using the mean, then turned the z-scores into two-sided p-values. But now, using the median, I noticed that there are some NAs in the p-values.
I determined this is occurring for values that are very far from the median, and it looks to be related to the pnorm calculation:
"'qnorm' is based on Wichura's algorithm AS 241 which provides precise results up to about 16 digits."
Does anyone know a way around this, as I would like the very small p-values?
Thanks,
> z<- -12.5
> 2-2*pnorm(abs(z))
[1] 0
> z<- -10
> 2-2*pnorm(abs(z))
[1] 0
> z<- -8
> 2-2*pnorm(abs(z))
[1] 1.332268e-15
As an intermediate step, you are actually calculating p-values extremely close to one:
options(digits=22)
z <- c(-12.5,-10,-8)
pnorm(abs(z))
# [1] 1.0000000000000000000000 1.0000000000000000000000 0.9999999999999993338662
2-2*pnorm(abs(z))
# [1] 0.000000000000000000000e+00 0.000000000000000000000e+00 1.332267629550187848508e-15
I think you will be better off using the low p-values (close to zero), but I am not good enough at math to know whether the error for close-to-one p-values lies in the AS 241 algorithm or in the floating-point storage. Look how nicely the low values show up:
pnorm(z)
# [1] 3.732564298877713761239e-36 7.619853024160526919908e-24 6.220960574271784860433e-16
Keep in mind that 1 - pnorm(x) is equivalent to pnorm(-x). So 2-2*pnorm(abs(x)) is equivalent to 2*(1 - pnorm(abs(x))), which is equivalent to 2*pnorm(-abs(x)), so just go with:
2 * pnorm(-abs(z))
# [1] 7.465128597755427522478e-36 1.523970604832105383982e-23 1.244192114854356972087e-15
which should get more precisely what you are looking for.
One thought: you would need an exp() with greater precision to convert back, but you might be able to use log(p) to get slightly more precision in the tails; otherwise you are effectively at 0 for the non-log p-values in terms of the range that can be calculated:
> z <- -10
> pnorm(abs(z), log.p=TRUE)
[1] -7.619853e-24
Converting back to the p-value doesn't work well, but you could compare on the log(p) scale...
> exp(pnorm(abs(z),log.p=T))
[1] 1
pnorm is the function that gives the value of the normal CDF at a given x. If you do not specify more arguments, the default distribution is the standard normal, with mean 0 and standard deviation 1.
By symmetry, pnorm(a) = 1 - pnorm(-a).
For large positive arguments, pnorm(a) is rounded to 1 because the result cannot be distinguished from 1 in floating point, while pnorm(-a) is a tiny number that can still be represented. So, using this symmetry with negative arguments, you can calculate the values you need.
> pnorm(0.25)
[1] 0.5987063
> 1-pnorm(-0.25)
[1] 0.5987063
> pnorm(20)
[1] 1
> pnorm(-20)
[1] 2.753624e-89
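Combining the ideas from both answers, a sketch (my own example) of keeping two-sided p-values on the log scale so the tail precision survives even where exp() would underflow to zero:

```r
# Two-sided p-values in base-10 logs: log(2) + pnorm(-|z|, log.p = TRUE)
# stays accurate far beyond the point where 2 * pnorm(-|z|) underflows.
z <- c(-8, -12.5, -40)
log10_p <- (log(2) + pnorm(-abs(z), log.p = TRUE)) / log(10)
log10_p   # approx -14.9, -35.1, -349.1
```

The last entry corresponds to a p-value around 1e-349, far below the smallest positive double, yet it is perfectly usable for ranking or thresholding on the log scale.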

how to solve multi dimension integral equations with variable on upper bounds

I would like to solve an equation like the one below, where X is the only unknown variable and the function f() is a multivariate Student t density.
More precisely, I have a k-dimensional integral of a Student t density, which gives a probability as a result, and I know that this probability equals q. The lower bound of every integral is -Inf, the upper bounds of the last k-1 dimensions are given, and the only unknown is the upper bound of the first integral. So there should be a solution: one equation in one variable.
I tried to solve it in R. I fitted a Dynamic Conditional Correlation model to obtain a correlation matrix with which to specify my t distribution, plugged that correlation matrix into the multivariate t density "dmvt", and used the "adaptIntegrate" function from the "cubature" package to construct a function to pass to "uniroot", which should solve for the upper bound of the first integral. But I have had some difficulty achieving what I want. (I hope my question is clear.) I have provided my code below; somebody told me there is a problem with it, but I cannot find what the issue is. Many thanks in advance for your help.
I know how to deal with a one-dimensional integral equation, but I don't know how a multi-dimensional integral equation can be solved in R (e.g. for the 2-dimensional case).
\int_{-\infty}^{X} \int_{-\infty}^{Y_{1}} \cdots \int_{-\infty}^{Y_{k}} f(x, y_{1}, \cdots, y_{k}) \, dx \, dy_{1} \cdots dy_{k} = q
This code fails:
require(cubature)
require(mvtnorm)
corr <- matrix(c(1,0.8,0.8,1),2,2)
f <- function(x){ dmvt(x,sigma=corr,df=3) }
g <- function(y) adaptIntegrate(f,
lowerLimit = c( -Inf, -Inf),
upperLimit = c(y, -0.1023071))$integral-0.0001
uniroot( g, c(-2, 2))
Since mvtnorm includes a pmvt function that computes the CDF of the multivariate t distribution, you don't need to do the integral by brute force. (mvtnorm also includes a quantile function qmvt, but only for "equicoordinate" values.)
So:
library(mvtnorm)
g <- function(y1_upr,y2_upr=-0.123071,target=1e-4,df=3) {
pmvt(upper=c(y1_upr,y2_upr),df=df)-target
}
uniroot(g,c(-10000,0))
## $root
## [1] -17.55139
##
## $f.root
## [1] -1.699876e-11
## attr(,"error")
## [1] 1e-15
## attr(,"msg")
## [1] "Normal Completion"
##
## $iter
## [1] 18
##
## $estim.prec
## [1] 6.103516e-05
##
Double-check:
pmvt(upper=c(-17.55139,-0.123071),df=3)
## [1] 1e-04
## attr(,"error")
## [1] 1e-15
## attr(,"msg")
## [1] "Normal Completion"
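The same uniroot-over-pmvt idea extends beyond two dimensions by fixing all but the first upper bound. A sketch (the df value, the fixed upper bounds, and the target probability here are my own assumptions, and the default identity correlation is used):

```r
# Solve for the first upper limit of a 3-dimensional multivariate t
# orthant probability, holding the other two upper limits fixed.
library(mvtnorm)
g <- function(y1, other_upr, target, df = 3)
  pmvt(upper = c(y1, other_upr), df = df) - target
root <- uniroot(g, c(-100, 0), other_upr = c(-0.5, -1), target = 1e-3)
root$root   # first upper bound giving an orthant probability of 1e-3
```

Note that pmvt uses a randomized quadrature internally, so g is very slightly noisy; for targets much smaller than the algorithm's error tolerance you may need to tighten pmvt's algorithm settings.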
