Generating data from a non-standard function in R

I want to draw samples from this function:
f(x) = 1/(gamma(1 + 1/x))^N
for x > 0, where N is the number of samples.
I tried the quantile (inverse transform) method, the rejection method, and Metropolis-Hastings, but could not figure it out. I'm also new to R programming.
EDIT.
The OP posted the following function in a comment.
Falp <- function(a, b, abi, N) {
  x <- seq(a, b, abi)             # grid of x values from a to b in steps of abi
  fx <- 1 / (gamma(1 + 1/x))^N    # unnormalized f(x) evaluated on the grid
  fx
}
The formatting was mine, with help from formatR::tidy_source().
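A minimal sketch using rejection sampling with a uniform proposal. One caveat first: Γ(1 + 1/x) → Γ(1) = 1 as x → ∞, so f(x) → 1 and f cannot be normalized on all of (0, ∞). The sketch therefore assumes a bounded support [a, b], matching the a and b arguments of Falp, and treats N as a fixed parameter. The names f_unnorm and rf_reject are hypothetical, and the envelope constant M is estimated on a grid with a 1% safety margin rather than derived analytically.

```r
# Unnormalized target f(x) = 1 / Gamma(1 + 1/x)^N (hypothetical helper name)
f_unnorm <- function(x, N) 1 / gamma(1 + 1 / x)^N

# Rejection sampling from f restricted to [a, b] with a U(a, b) proposal.
# Assumption: M = 1.01 * (max of f on a fine grid) bounds f on [a, b].
# f is smooth and unimodal in x, so the 1% margin is a heuristic safety factor.
rf_reject <- function(n, N, a, b) {
  grid <- seq(a, b, length.out = 10001)
  M <- 1.01 * max(f_unnorm(grid, N))
  out <- numeric(0)
  while (length(out) < n) {
    x <- runif(n, a, b)                          # candidate draws
    u <- runif(n)
    out <- c(out, x[u <= f_unnorm(x, N) / M])    # accept with probability f(x) / M
  }
  out[seq_len(n)]                                # keep exactly n accepted draws
}

set.seed(1)
s <- rf_reject(5000, N = 10, a = 0.1, b = 20)
summary(s)
```

Each proposal is accepted with probability f(x)/M, so the accepted draws follow f restricted to [a, b]; the acceptance rate, and therefore the speed, depends on how peaked f is for the chosen N and interval.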

Related

Improved inverse transform method for Poisson random variable generation in R

I am reading Section 4.2 of Simulation (4th ed., Elsevier, 2006) by Sheldon M. Ross, which introduces generating a Poisson random variable by the inverse transform method.
Denote p_i = P(X = i) = e^{-λ} λ^i / i!, i = 0, 1, ..., and F(i) = P(X ≤ i) = Σ_{k=0}^{i} p_k, the PMF and CDF of the Poisson distribution, respectively; they can be computed with dpois(i, lambda) and ppois(i, lambda) in R.
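The PMF satisfies the recursion p_{i+1} = λ p_i / (i + 1), so p_i and F(i) can be built up incrementally without calling dpois() at every step. A quick check that the recursion agrees with dpois() and ppois() (λ = 3 and the range 0:20 are arbitrary choices):

```r
lambda <- 3
i <- 0:20
# p_0 = exp(-lambda), p_{i+1} = p_i * lambda / (i + 1)
p_rec <- exp(-lambda) * cumprod(c(1, lambda / (1:20)))
F_rec <- cumsum(p_rec)                      # F(i) = sum_{k=0}^{i} p_k
all.equal(p_rec, dpois(i, lambda))          # TRUE
all.equal(F_rec, ppois(i, lambda))          # TRUE
```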
There are two inverse transform algorithms for Poisson: the regular version and the improved one.
The steps for the regular version are as follows:
1. Simulate an observation U from U(0,1).
2. Set i = 0 and F = F(0) = p_0 = e^{-λ}.
3. If U < F, set X = i and terminate.
4. If U ≥ F, set i = i + 1 and F = F + p_i, and return to step 3.
I wrote and tested the above steps as follows:
### write the regular R code
pois_inv_trans_regular = function(n, lambda){
  X = rep(0, n) # generate n samples
  for(m in 1:n){
    U = runif(1)
    i = 0; F = exp(-lambda) # initialize
    while(U >= F){
      i = i+1; F = F + dpois(i,lambda) # F = F + p_i
    }
    X[m] = i
  }
  X
}
### test the code (for small λ, e.g. λ=3)
set.seed(0); X = pois_inv_trans_regular(n=10000,lambda=3); c(mean(X),var(X))
# [1] 3.005000 3.044079
Note that the mean and variance of Poisson(λ) are both λ, so these results suggest the regular code is correct.
Next I tried the improved version, which is designed for large λ and which the book describes as follows:
The regular algorithm needs on average 1 + λ searches, i.e., O(λ) computational complexity, which is fine when λ is small but can be greatly improved upon when λ is large.
Indeed, since a Poisson random variable with mean λ is most likely to take on one of the two integral values closest to λ, a more efficient algorithm would first check one of these values, rather than starting at 0 and working upward. For instance, let I = Int(λ) and recursively determine F(I).
Now generate a Poisson random variable X with mean λ by generating a random number U and noting whether or not X ≤ I by seeing whether or not U ≤ F(I). Then search downward starting from I in the case where X ≤ I, and upward starting from I + 1 otherwise.
According to the book, the improved algorithm needs only about 1 + 0.798√λ searches on average, i.e., it has O(√λ) complexity.
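The constant 0.798 is √(2/π) ≈ 0.7979. For large λ the number of extra steps is roughly |X − λ|, and the normal approximation gives E|X − λ| ≈ √(2λ/π). A quick numerical check of that approximation using the exact Poisson probabilities (λ = 100 is an arbitrary choice):

```r
lambda <- 100
x <- 0:1000                                              # covers essentially all Poisson(100) mass
mean_abs_dev <- sum(abs(x - lambda) * dpois(x, lambda))  # exact E|X - lambda|
mean_abs_dev / sqrt(lambda)                              # compare with sqrt(2 / pi)
sqrt(2 / pi)
```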
I tried to write the R code for the improved version as follows:
### write the improved R code
pois_inv_trans_improved = function(n, lambda){
  X = rep(0, n) # generate n samples
  p = function(x) {dpois(x,lambda)} # PMF: p(x) = P(X=x) = λ^x exp(-λ)/x!
  F = function(x) {ppois(x,lambda)} # CDF: F(x) = P(X ≤ x)
  I = floor(lambda) # I=Int(λ)
  F1 = F(I); F2 = F(I+1) # two close values
  for(k in 1:n){
    U = runif(1)
    i = I
    if ( F1 < U & U <= F2 ) {
      i = I+1
    }
    while (U <= F1){ # search downward
      i = i-1; F1 = F1 - p(i)
    }
    while (U > F2){ # search upward
      i = i+1; F2 = F2 + p(i)
    }
    X[k] = i
  }
  X
}
### test the code (for large λ, e.g. λ=100)
set.seed(0); X = pois_inv_trans_improved(n=10000,lambda=100); c(mean(X),var(X))
# [1] 100.99900000 0.02180118
The simulation returns [1] 100.99900000 0.02180118 for c(mean(X),var(X)); the variance is clearly nonsense. How should I fix this?
The main problem was that F1 and F2 were modified within the loop and never reset, so after a while a very wide range of U values is treated as falling in the middle.
The second problem was in the downward search: the p(i) subtracted should use the original i, because F(x) = P(X <= x). Without this, the code hangs for small U.
The easiest fix for this is to start with i = I + 1; then the "in the middle" if statement isn't needed.
pois_inv_trans_improved = function(n, lambda){
  X = rep(0, n) # generate n samples
  p = function(x) {dpois(x,lambda)} # PMF: p(x) = P(X=x) = λ^x exp(-λ)/x!
  `F` = function(x) {ppois(x,lambda)} # CDF: F(x) = P(X ≤ x)
  I = floor(lambda) # I=Int(λ)
  F1 = F(I); F2 = F(I+1) # two close values
  for(k in 1:n){
    U = runif(1)
    i = I + 1
    # if ( F1 < U & U <= F2 ) {
    #   i = I + 1
    # }
    F1tmp = F1
    while (U <= F1tmp){ # search downward
      i = i-1; F1tmp = F1tmp - p(i)
    }
    F2tmp = F2
    while (U > F2tmp){ # search upward
      i = i+1; F2tmp = F2tmp + p(i)
    }
    X[k] = i
  }
  X
}
This gives:
[1] 100.0056 102.2380
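A variant in the spirit of the book avoids a dpois() call per step by moving between neighbouring probabilities with p_{i+1} = λ p_i / (i + 1) and p_{i-1} = i p_i / λ. This is a sketch under the same I = floor(λ) starting point; the function name pois_inv_trans_recursive is hypothetical. Since it is an inverse transform, for the same U it should return the same value as qpois(U, lambda).

```r
pois_inv_trans_recursive <- function(n, lambda) {
  I  <- floor(lambda)
  pI <- dpois(I, lambda)              # p(I), computed once
  FI <- ppois(I, lambda)              # F(I), computed once
  X  <- numeric(n)
  for (k in seq_len(n)) {
    U <- runif(1)
    i <- I; p <- pI; cdf <- FI        # start every draw at I with fresh copies
    if (U <= FI) {
      # downward search: step to i - 1 while U <= F(i - 1) = F(i) - p(i)
      while (i > 0 && U <= cdf - p) {
        cdf <- cdf - p                # F(i - 1)
        p   <- p * i / lambda         # p(i - 1) = p(i) * i / lambda
        i   <- i - 1
      }
    } else {
      # upward search: step to i + 1 until U <= F(i)
      while (U > cdf) {
        i   <- i + 1
        p   <- p * lambda / i         # p(i) = p(i - 1) * lambda / i
        cdf <- cdf + p                # F(i)
      }
    }
    X[k] <- i
  }
  X
}

set.seed(0); X = pois_inv_trans_recursive(n = 10000, lambda = 100); c(mean(X), var(X))
```

Because cdf and p are reset at the start of each draw, the state problem described above cannot occur.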

How to convert MATLAB function into R function?

I am new both to maximum likelihood and to writing loop functions in R. I am playing around with a function written in MATLAB and I'm wondering whether it's correctly transcribed in R. There is no easy way for me to check, since I'm not familiar with MATLAB. The code uses equations (1) and (3) from the paper.
MATLAB code. The MATLAB code imposes two conditions under which lnqp takes different values depending on q:
p = N/K;
if q == 0
    lnqp = log(p);
else
    lnqp = ((p^q) - 1)/q;
end
Y = ((aa*((p*K/Ka) - 1)) - 1)*lnqp;
R code. In the R function, I'm trying to impose similar conditions on lnqp. Here is my attempt. However, I don't know how to check whether the values obtained from MATLAB match those from R, i.e., how to verify the results across platforms.
p <- c(1:00)
skewfun <- function(aa, K, Ka, q){
  Y <- ifelse(q = 0, ((aa *((p*K/Ka)-1))-1)*log(p),((aa *((p*K/Ka)-1))-1)*((p^q)-1)/q)
}
The inputs can be either numeric scalars or vectors. If you don't have MATLAB, you could install Octave, which is free and largely compatible with MATLAB, run the original MATLAB code there, and compare its output on a test case with the result of running the following R function on the same test case.
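f <- function(aa, N, K, Ka, q) {
  p <- N / K
  # recycle p and q to a common length so ifelse() does not truncate the result to length(q)
  n <- max(length(p), length(q))
  p <- rep_len(p, n)
  q <- rep_len(q, n)
  lnqp <- ifelse(q == 0, log(p), (p^q - 1) / q)  # Box-Cox-type transform; log(p) when q == 0
  Y <- (aa * (p * K / Ka - 1) - 1) * lnqp
  Y
}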
aa <- 1; N <- 1; K <- 1; Ka <- 1; q <- 1 # test data: change to use your data
f(aa, N, K, Ka, q)
## [1] 0
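The q == 0 branch can also be sanity-checked without MATLAB: (p^q − 1)/q → log(p) as q → 0, so the general formula evaluated at a tiny q should agree with log(p). For example, with arbitrary p values:

```r
p <- c(0.5, 2, 10)
q <- 1e-8
general_branch <- (p^q - 1) / q                        # q != 0 formula at a tiny q
all.equal(general_branch, log(p), tolerance = 1e-6)    # TRUE
```

The same idea helps when comparing against Octave: evaluating both programs at q = 0 and at a small nonzero q exercises both branches.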
You can use the matconv package to facilitate automatic code conversion:
matconv::mat2r(inMat = "Y = ((aa *((p*K/Ka)-1))-1)*lnqp;")
will return:
$matCode
[1] "Y = ((aa *((p*K/Ka)-1))-1)*lnqp;"
$rCode
[1] "Y <- ((aa *((p*K/Ka)-1))-1)*lnqp"
The R mailing list also has a useful bash script that you may use for that purpose.

How to solve this quadratic optimization problem in R?

I am currently trying to implement a bigger simulation exercise, but I'm stuck on this part.
The aim is to find the vector p* (2x1) that maximizes this function (p* = argmax of h):
[Equation: image not available]
Also Y and q are given and all other quantities in the function are defined using them.
P_priority_i <- function(unknown, arg1, arg2, i){
  mu = 2
  delta = 0.00001
  c <- c(pbar[i,] + rep(delta,m))
  e <- rep(0,2)
  s <- rep(0,2)
  for (j in 1:m){
    e[j] <- x[i,j] + sum(A[[j]][i,]*min(pbar[i,j],arg1[i,j]))
  }
  if(y[i,'countries'] != 'IT'){
    s[1] <- min(pbar[i,1],(max(0,sum(arg2*e)))/arg2[1])
    s[2] <- min(pbar[i,2],(max(0,sum(arg2*e)-arg2[1]*s[1]))/arg2[2])
    value <- -0.5*t(c-unknown)%*%diag(arg2/(c-s))%*%(c-unknown)
    return(value)
  } else {
    s[2] <- min(pbar[i,2],(max(0,sum(arg2*e)))/arg2[2])
    s[1] <- min(pbar[i,1],(max(0,sum(arg2*e)-arg2[2]*s[2]))/arg2[1])
    value <- -0.5*t(c-unknown)%*%diag(arg2/(c-s))%*%(c-unknown)
    return(value)
  }
}
I've checked the formulation of the function, whose output is a scalar, and it is correct.
I also have 3 constraints on p*:
[Constraints: image not available]
where \bar{p} and x are given quantities.
I've found the quadprog package, but I don't know how to solve this particular problem with its solve.QP() function, which minimizes an objective of the form −d^T b + 0.5 b^T D b. The problem is that the argument of my maximization is p and not (c − p), and the constraints are also formulated with respect to p.
How can I set this up in R?
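Expanding the quadratic shows that solve.QP() can work with p directly. With W = diag(w) and w = arg2/(c − s), maximizing −0.5 (c − p)^T W (c − p) is the same as minimizing 0.5 p^T W p − (W c)^T p, because the term 0.5 c^T W c does not depend on p. So Dmat = W and dvec = W c. W must be positive definite; in the function above c − s ≥ delta > 0, so this holds as long as arg2 > 0. The sketch uses hypothetical numbers and hypothetical box constraints 0 ≤ p ≤ pbar in place of the real constraints, which are only available as an image; solve.QP() expects constraints in the form t(Amat) %*% p >= bvec.

```r
library(quadprog)

# Hypothetical inputs: the real c, w and constraints come from the question's data.
cvec <- c(1.0, 2.0)        # the vector c in the objective
w    <- c(0.5, 1.5)        # diagonal weights, e.g. arg2 / (c - s) in the question
pbar <- c(0.8, 1.5)        # hypothetical upper bounds on p

# max -0.5 (c - p)' W (c - p)  <=>  min 0.5 p' W p - (W c)' p   (constant dropped)
Dmat <- diag(w)
dvec <- w * cvec           # W c for a diagonal W

# Hypothetical box constraints p >= 0 and p <= pbar, the latter written as -p >= -pbar.
Amat <- cbind(diag(2), -diag(2))   # one column per constraint
bvec <- c(0, 0, -pbar)

sol <- solve.QP(Dmat, dvec, Amat, bvec, meq = 0)
sol$solution                                     # optimal p
h_value <- -(sol$value + 0.5 * sum(w * cvec^2))  # objective h at the optimum
```

sol$value is the minimum of the rewritten problem, so adding back 0.5 c^T W c and flipping the sign recovers the value of the original maximization. Equality constraints, if any, go in the first meq columns of Amat.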

How to obtain the probability distribution of a sum of dependent discrete random variables more efficiently

I hope you are well. I was wondering if you could help me with the question in the link below, please. Below the link I have attached R code that solves the problem recursively for particular values of the parameters of the distributions involved. However, I realized that this method is inefficient. Thanks a lot for your help.
How to obtain the probability distribution of a sum of dependent discrete random variables more efficiently
library(boot) # The library boot is necessary to use the command inv.logit.
TMax <- 500 # In this R-code, I am using TMax instead of using T.
M <- 2000
beta0 <- 1
beta1 <- 0.5
Prob_S <- function(k, r){ # In this R-code, I am using r instead of using t.
  if(r == 1){
    Aux <- dbinom(x = k, size = M, prob = inv.logit(beta0))
  }
  if(r %in% 2:TMax){
    Aux <- 0
    for(u in 0:k){
      Aux <- Aux + dbinom(x = k - u, size = M - u,
                          prob = inv.logit(beta0 + beta1 * u)) * Prob_S(u, r - 1)
    }
  }
  Aux
}
m <- 300
P <- Prob_S(k = m, r = TMax) # Computing P takes a loooong time. :(
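The recursion recomputes Prob_S(u, r − 1) for the same arguments many times, which is why it is so slow. A sketch of an alternative, assuming the model is exactly as coded above: since k ≥ u in every term, S_r can only increase, so P(S_TMax = m) depends only on the states 0, ..., m. Building the (m + 1) × (m + 1) transition matrix once and propagating the distribution vector forward TMax − 1 times costs O(TMax · m²). Base R's plogis() is the same function as boot::inv.logit(); the name prob_S_dp is hypothetical.

```r
# Distribution of S_r over the states 0..m, computed by forward propagation.
prob_S_dp <- function(m, TMax, M, beta0, beta1) {
  states <- 0:m
  # Trans[u + 1, k + 1] = P(S_r = k | S_{r-1} = u) = dbinom(k - u, M - u, plogis(beta0 + beta1 * u))
  Trans <- matrix(0, m + 1, m + 1)
  for (u in states) {
    k <- u:m
    Trans[u + 1, k + 1] <- dbinom(k - u, size = M - u, prob = plogis(beta0 + beta1 * u))
  }
  dist <- dbinom(states, size = M, prob = plogis(beta0))          # r = 1
  for (r in seq_len(TMax - 1)) dist <- as.vector(dist %*% Trans)  # r = 2, ..., TMax
  dist        # dist[k + 1] = P(S_TMax = k) for k = 0..m
}

P <- prob_S_dp(m = 300, TMax = 500, M = 2000, beta0 = 1, beta1 = 0.5)[301]  # P(S_TMax = 300)
```

The returned vector does not sum to 1, because probability mass above m is deliberately dropped; the entries for 0..m are nevertheless exact, since no state above m ever feeds back into them.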

Trying to use the collin function in the R package FME to identify parameters and then fit them using modFit

So I have a system of ODEs and some data, and I am using the R packages deSolve and FME to fit the parameters of the ODE system to the data. I get a singular-matrix result when I fit the full parameter set to the data. So I went back and checked the collinearity of the parameters using a collinearity-index cutoff of 20, as suggested throughout the FME package documentation, and then picked a few models with subsets of parameters to fit. When I run modFit on them, I get this error:
Error in approx(xMod, yMod, xout = xDat) :
need at least two non-NA values to interpolate
Can anyone suggest a fix for this? Everything else is working fine, so this is not a coding problem.
Here is a minimal working example (removing r = 2 in modFit creates the error, which I can fix in the minimal working example but not in my actual problem, so I doubt a minimal working example helps here):
## =======================================================================
## Now suppose we do not know K and r and they are to be fitted...
## The "observations" are the analytical solution
## =======================================================================
# You need these packages
library('deSolve')
library('FME')
## logistic growth model
TT <- seq(1, 100, 2.5)
N0 <- 0.1
r <- 0.5
K <- 100
## analytical solution
Ana <- cbind(time = TT, N = K/(1 + (K/N0 - 1) * exp(-r*TT)))
time <- 0:100
parms <- c(r = r, K = K)
x <- c(N = N0)
logist <- function(t, x, parms) {
  with(as.list(parms), {
    dx <- r * x[1] * (1 - x[1]/K)
    list(dx)
  })
}
## Run the model with initial guess: K = 10, r = 2
parms["K"] <- 10
parms["r"] <- 2
init <- ode(x, time, logist, parms)
## FITTING algorithm uses modFit
## First define the objective function (model cost) to be minimised
## more general: using modFit
Cost <- function(P) {
  parms["K"] <- P[1]
  parms["r"] <- P[2]
  out <- ode(x, time, logist, parms)
  return(modCost(out, Ana))
}
(Fit<-modFit(p = c(K = 10,r=2), f = Cost))
summary(Fit)
I think the problem is in your Cost function. If you don't provide both K and r, the cost function overwrites the starting value of r with NA, because P[2] does not exist when P has length 1. You can test this:
Cost <- function(P) {
  parms["K"] <- P[1]
  parms["r"] <- P[2]
  print(parms)
  #out <- ode(x, time, logist, parms)
  #return(modCost(out, Ana))
}
Cost(c(K=10, r = 2))
Cost(c(K=10))
This function works:
Cost <- function(P) {
  parms[names(P)] <- P
  out <- ode(x, time, logist, parms)
  return(modCost(out, Ana))
}
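With the name-based Cost(), fitting a subset of parameters only requires passing that subset to modFit(); any parameter not in p keeps the value stored in parms. A self-contained sketch that reuses the logistic example but, as an assumption made only for this illustration, holds r fixed at its true value 0.5 and fits K alone:

```r
library(deSolve)
library(FME)

# Same logistic model and analytical "observations" as in the question
TT   <- seq(1, 100, 2.5)
N0   <- 0.1
Ana  <- cbind(time = TT, N = 100 / (1 + (100 / N0 - 1) * exp(-0.5 * TT)))
time <- 0:100
x    <- c(N = N0)
logist <- function(t, x, parms) {
  with(as.list(parms), list(r * x[1] * (1 - x[1] / K)))
}

# Assumption for this illustration: r is held fixed at 0.5, only K is fitted
parms <- c(r = 0.5, K = 10)
Cost <- function(P) {
  parms[names(P)] <- P          # overwrite only the parameters present in P
  out <- ode(x, time, logist, parms)
  modCost(out, Ana)
}

FitK <- modFit(p = c(K = 10), f = Cost)
FitK$par
```

The same pattern applies to the parameter subsets chosen with collin(): pass only the identifiable parameters in p and leave the others at their values in parms.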
The FMEdyna vignette is very helpful; see page 14 for how to specify the objective (Cost) function: https://cran.r-project.org/web/packages/FME/vignettes/FMEdyna.pdf
