Passing a function (distribution) to other arguments which are functions themselves - r

I am trying to draw repeated samples from a distribution inside a function. The trouble is that when I pass the distribution into the function, all the means come out the same; it appears the distribution is not re-run on each pass through the for loop.
Test line:
test(100,100,dist = rbinom(x, 1, 0.50))
Code
test = function(N, n, dist){
means = matrix(rep(0,times=N,nrow=N,ncol=1))
x = n
for(i in 1:N){
means[i,1]<- mean(dist)
print(means[i,1])
}
}
This question is similar to Passing a function argument to other arguments which are functions themselves but I seem to be having a different type of problem.

One option would be to pass the function and its parameters separately, so that a fresh sample is drawn on every iteration. E.g.
test = function(N, n, dist, ...){
  means = matrix(rep(0, times = N), nrow = N, ncol = 1)
  x = n
  for(i in 1:N){
    # do.call() calls dist afresh each time, with x as its first argument
    means[i, 1] <- mean(do.call(dist, c(x, ...)))
    print(means[i, 1])
  }
}
test(100, 100, dist = rbinom, list(1, 0.5))
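The reason the original call gives identical means is that R evaluates an argument at most once: dist = rbinom(x, 1, 0.50) is computed a single time (in the caller's environment, where x must exist) and mean(dist) then reuses that one vector. A minimal sketch of an alternative, assuming you are happy to pass a small sampler function instead of a sample; sample_means is a hypothetical name, not from the post:

```r
# Hypothetical helper: dist must be a function of the sample size n.
sample_means <- function(N, n, dist) {
  # dist(n) is called once per iteration, so every mean uses a fresh sample
  vapply(seq_len(N), function(i) mean(dist(n)), numeric(1))
}

set.seed(1)
means <- sample_means(100, 100, dist = function(n) rbinom(n, 1, 0.5))
head(means)
```

Wrapping the distribution in function(n) ... keeps its parameters next to the call while delaying the draw until it is needed inside the loop.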

Related

Vectorising density of mixture Gaussian distribution and integrating/plotting in R

I'm trying to write the density of a mixture Gaussian distribution raised to an arbitrary power, b, in R. Currently, I have two methods that work, but I would prefer to avoid a for loop.
dnorm_mix_tempered_unnorm <- function(x, w, m, s, b) {
value <- 0
for (i in 1:length(w)) {value <- value + w[i]*dnorm(x, mean = m[i], sd = s[i])}
value <- value^(b)
return(value)
}
Alternatively, I can vectorise this to avoid the for loop:
dnorm_mix_tempered_unnorm <- function(x, w, m, s, b) {
return(sum(w*dnorm(x, mean = m, sd = s))^b)
}
Both of these give the same result, but the second is more efficient since it is vectorised. Next I need to normalise this so that the density integrates to 1. I do this using:
dnorm_mix_tempered <- function(x, weights, means, sds, beta) {
  norm_constant <- integrate(function(x) dnorm_mix_tempered_unnorm(x, w = weights,
                             m = means, s = sds, b = 1/beta), lower = -Inf,
                             upper = Inf)$value
  value <- dnorm_mix_tempered_unnorm(x, w = weights, m = means, s = sds, b = 1/beta) /
    norm_constant
  return(value)
}
If I define dnorm_mix_tempered_unnorm with for loops, this works with no problem, and I can use curve() to plot the density. But if I define dnorm_mix_tempered_unnorm by using vectorisation, then I get the following error:
Error in integrate(function(x) dnorm_mix_tempered_unnorm(x, w = weights, :
evaluation of function gave a result of wrong length
Does anyone know what is going on when I am vectorising instead and trying to integrate?
Thanks in advance,
R.
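integrate() calls its integrand with a whole vector of x values and expects a vector of the same length back; sum() collapses everything to a single number, hence the error. A possible option is
dnorm_mix_tempered_unnorm <- function(x, w, m, s, b) {
  # one column of densities per component, then a weighted sum across the columns
  return(drop(mapply(dnorm, mean = m, sd = s, MoreArgs = list(x = x)) %*% w)^b)
}
But I think it is quite similar to your first proposal.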
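Another way to keep the compact sum() form is to apply it to one x value at a time, so the function returns one value per element of x, which is what integrate() requires. A rough sketch; mix_unnorm and the example weights, means and sds are made up for illustration:

```r
# Hypothetical name; the body is the question's one-liner applied per point.
mix_unnorm <- function(x, w, m, s, b) {
  vapply(x, function(xi) sum(w * dnorm(xi, mean = m, sd = s))^b, numeric(1))
}

# Illustrative parameters (assumed, not from the question)
w <- c(0.3, 0.7)
m <- c(-1, 2)
s <- c(1, 0.5)

# With b = 1 this is an ordinary mixture density, so the area should be close to 1
area <- integrate(mix_unnorm, lower = -Inf, upper = Inf,
                  w = w, m = m, s = s, b = 1)$value
area
```

This still loops over x internally, so it is a readability choice rather than a speed-up over the explicit for loop.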

Passing arguments of an R function which is itself an argument

Environments and the like have always confused me incredibly in R. I guess therefore this is more of a reference request, since I've been surfing the site for the last hour in search of an answer to no avail.
I have a simple R function called target defined as follows
target <- function(x,scale,shape){
s <- scale
b <- shape
value <- 0.5*(sin(s*x)^b + x + 1)
return(value)
}
I then define the function AR
AR <- function(n,f,...){
variates <- NULL
for(i in 1:n){
z <- runif(1)
u <- runif(1)
if(u < f(z, scale, shape)/c){
variates[i] <- z
}else{next}
}
variates <- variates[!is.na(variates)]
return(variates)
}
in which the function target is being evaluated. Unfortunately, the call returns the following error
sample <- AR(n = 10000, f = target, shape = 8, scale = 5)
Error in fun(z, scale, shape) : object 'shape' not found
I know this has to do with the function AR not knowing where to look for the objects shape and scale, but I thought that was exactly the job of the ellipsis: allowing me to sort of put argument definition "on hold" until one actually calls the function. Where am I wrong and could anyone give me a lead as to where to look for insight on this specific problem?
You are very close; you just need to make use of your ellipsis.
NB: c was not defined in AR, so I added it as an argument and gave it a value.
NB2: I would refrain from using c and sample as names, since these are themselves functions and could cause some confusion down the road.
AR <- function(n, f, c, ...){
variates <- NULL
for(i in 1:n){
z <- runif(1)
u <- runif(1)
## pass ... on to f instead of naming shape and scale; R forwards any arguments that AR's own parameters did not match
if(u < f(z, ...)/c){
variates[i] <- z
}else{next}
}
variates <- variates[!is.na(variates)]
return(variates)
}
sample <- AR(n = 10000, f = target, shape = 8, scale = 5, c = 100)
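The same forwarding works if you draw all candidates at once instead of looping, which also avoids growing variates and filtering out NAs afterwards. This is only a sketch: AR_vec and bound are hypothetical names (bound plays the role of c), and 1.5 is chosen on the assumption that target never exceeds 1.5 on [0, 1].

```r
# target as defined in the question, written compactly
target <- function(x, scale, shape) {
  0.5 * (sin(scale * x)^shape + x + 1)
}

# Hypothetical vectorised accept-reject sampler
AR_vec <- function(n, f, bound, ...) {
  z <- runif(n)              # n candidate draws at once
  u <- runif(n)
  z[u < f(z, ...) / bound]   # ... carries scale and shape through to f
}

set.seed(123)
draws <- AR_vec(n = 10000, f = target, bound = 1.5, shape = 8, scale = 5)
length(draws)
```

Because shape and scale are passed by name, their order in the call does not matter; they are matched to target's parameters when f(z, ...) is evaluated.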

How to loop all values in a dataframe as the start value in maxLik

I'm doing Maximum Likelihood Estimation using maxLik, which requires specifying starting values. Instead of specifying a single value, is there any way that allows me to use all the values from a matrix as the start value?
My current code of maxLik is:
f12 <- function(param){
alpha <- param[1]
rho <- param[2]
lambda <- param[3]
u <- 0.5*(p12$v_50_1)^alpha + 0.5*lambda*(p12$v_50_2)^alpha
p <- 1/(1 + exp(-rho*u))
f <- sum(p12$gamble*log(p) + (1-p12$gamble)*log(1-p))}
ml <- maxLik(f12, start = c(alpha = 1, rho=2, lambda = 1), method = "NM")
I create a dataframe with the upper and lower bounds of potential start values:
st <- expand.grid(alpha = seq(0, 2, len = 100), rho = seq(0, 1, len = 100), lambda = seq(0, 2, len = 100))
There are 3 parameters in my function, and my goal is to loop over all the rows of the above dataframe st and select the best vector of start values after running the model from a variety of starting parameters.
Thanks!
Consider Map (a wrapper for mapply) to pass the st columns elementwise through your method. Here, Map returns a list of maxLik objects (which inherit from the maxim class), each of which is itself a list of components. The number of items in this list equals the number of rows of st.
Notice that the input parameters a, r, and l are passed into the start argument of maxLik() in place of hard-coded numbers. The body of f12 is left untouched.
maxLik_run <- function(a, r, l) {
tryCatch({
f12 <- function(param){
alpha <- param[1]
rho <- param[2]
lambda <- param[3]
u <- 0.5*(p12$v_50_1)^alpha + 0.5*lambda*(p12$v_50_2)^alpha
p <- 1/(1 + exp(-rho*u))
f <- sum(p12$gamble*log(p) + (1-p12$gamble)*log(1-p))
}
return(maxLik(f12, start = c(alpha = a, rho = r, lambda = l), method = "NM"))
}, error = function(e) return(NA))
}
st <- expand.grid(alpha = seq(0, 2, len = 100),
                  rho = seq(0, 1, len = 100),
                  lambda = seq(0, 2, len = 100))
maxLik_list <- Map(maxLik_run, st$alpha, st$rho, st$lambda)
To answer the question of which is the "best vector of start values after running the model from a variety of starting parameters", you first need a particular definition of "best". Once you have one, you can use Filter() on the returned list of objects to select the one or more elements that meet it.
Below is a demonstration that finds the fit with the highest maximised log-likelihood (the maximum component); use estimate instead if needed. Runs that failed were returned as NA by tryCatch, so they are dropped first. Note that the result can contain more than one item if the highest value is shared by several list items:
maxLik_list <- Filter(is.list, maxLik_list)
highest_value <- max(sapply(maxLik_list, function(item) item$maximum))
maxLik_item_list <- Filter(function(i) i$maximum == highest_value, maxLik_list)
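The multi-start pattern itself can be tried without maxLik or the p12 data. The sketch below swaps in base R's optim (with fnscale = -1 so that it maximises) and a toy normal log-likelihood; y, loglik, starts and fit_one are all made up for illustration and are not part of the original problem.

```r
# Toy data and log-likelihood (assumed): normal sample, unknown mean and log-sd
set.seed(42)
y <- rnorm(200, mean = 3, sd = 2)
loglik <- function(param) {
  sum(dnorm(y, mean = param[1], sd = exp(param[2]), log = TRUE))
}

# Small grid of start values, one row per run
starts <- expand.grid(mu = seq(0, 6, length.out = 4),
                      log_sd = seq(-1, 2, length.out = 4))

# One fit per start vector; failed runs come back as NULL
fit_one <- function(mu, log_sd) {
  tryCatch(
    optim(c(mu, log_sd), loglik, method = "Nelder-Mead",
          control = list(fnscale = -1)),
    error = function(e) NULL
  )
}

fits <- Map(fit_one, starts$mu, starts$log_sd)
fits <- Filter(Negate(is.null), fits)          # drop failed runs
best <- fits[[which.max(vapply(fits, function(f) f$value, numeric(1)))]]
best$par
```

Here "best" is taken to mean the highest log-likelihood reached; which.max returns only the first such fit, whereas the Filter() approach above keeps ties.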
Your data frame st already holds the candidate values of alpha, rho and lambda, so there is no need to hard-code them in start: pass each row of st to maxLik instead. (Also avoid assigning to f12 inside f12, since that is the function's own name.) You can run the estimation one row at a time using apply, like this
#create a function to find the mle estimate for one row of start values
maxlike <- function(a) {
  f12 <- function(param){
    alpha <- param[1]
    rho <- param[2]
    lambda <- param[3]
    u <- 0.5*(p12$v_50_1)^alpha + 0.5*lambda*(p12$v_50_2)^alpha
    p <- 1/(1 + exp(-rho*u))
    sum(p12$gamble*log(p) + (1-p12$gamble)*log(1-p))
  }
  maxLik(f12, start = a, method = "NM")
}
#then use apply with data = st; MARGIN = 1 means rows (2 would mean columns)
mle <- apply(st, 1, maxlike)
mle

How to make this R code (for loop) more efficient?

I am doing a simulation study and I wrote the following R code. Is there any way to write this code without two for loops, or to make it more efficient (run faster)?
S = 10000
n = 100
v = c(5,10,50,100)
beta0.mle = matrix(NA,S,length(v)) #creating four S by length(v) NA matrices
beta1.mle = matrix(NA,S,length(v))
beta0.lse = matrix(NA,S,length(v))
beta1.lse = matrix(NA,S,length(v))
for (j in 1:length(v)){
for (i in 1:S){
set.seed(i)
beta0 = 50
beta1 = 10
x = rnorm(n)
e.t = rt(n,v[j])
y.t = e.t + beta0 + beta1*x
func1 = function(betas){
beta0 = betas[1]
beta1 = betas[2]
sum = sum(log(1+1/v[j]*(y.t-beta0-beta1*x)^2))
return((v[j]+1)/2*sum)
}
beta0.mle[i,j] = nlm(func1,c(1,1),iterlim = 1000)$estimate[1]
beta1.mle[i,j] = nlm(func1,c(1,1),iterlim = 1000)$estimate[2]
beta0.lse[i,j] = lm(y.t~x)$coef[1]
beta1.lse[i,j] = lm(y.t~x)$coef[2]
}
}
The function func1 inside the second for loop is used for nlm function (to find mle when errors are t distributed).
I wanted to use parallel package in R but I didn't find any useful functions.
A common first step towards faster R code is to replace explicit for loops with vectorized functions or the apply family, although the apply functions are still loops internally, so the gain from that alone is often modest. More importantly, as in any programming language, you should look for places where you call expensive functions (such as nlm) more than once with the same arguments and store the result rather than recomputing it each time.
Here I start as you did by defining the parameters. Since beta0 and beta1 are always 50 and 10, I define those here as well.
S <- 10000
n <- 100
v <- c(5,10,50,100)
beta0 <- 50
beta1 <- 10
Next we define func1 outside the loop to avoid redefining it each time. func1 now has three extra parameters, v, y.t and x, so that it can be called with the new values.
func1 <- function(betas, v, y.t, x){
beta0 <- betas[1]
beta1 <- betas[2]
sum <- sum(log(1+1/v*(y.t-beta0-beta1*x)^2))
return((v+1)/2*sum)
}
Now we actually do the real work. Rather than having nested loops, we use nested apply statements. The outer lapply returns one list element per value of v, and the inner vapply builds a matrix holding the four values you want (beta0.mle, beta1.mle, beta0.lse, beta1.lse) for each of the S simulations.
values <- lapply(v, function(j) vapply(1:S, function(s) {
  # This should look familiar, it is taken from your code
  set.seed(s)
  x <- rnorm(n)
  e.t <- rt(n,j)
  y.t <- e.t + beta0 + beta1*x
  # Rather than running `nlm` and `lm` twice, we run each once and store the results
  nlmmod <- nlm(func1,c(1,1), j, y.t, x, iterlim = 1000)
  lmmod <- lm(y.t~x)
  # now we return the four values of interest
  c(beta0.mle = nlmmod$estimate[1],
    beta1.mle = nlmmod$estimate[2],
    beta0.lse = lmmod$coef[1],
    beta1.lse = lmmod$coef[2])
}, numeric(4)) # this tells `vapply` what to expect out of the function
)
Finally we can reorganize everything into the four matrices.
beta0.mle <- vapply(values, function(x) x["beta0.mle", ], numeric(S))
beta1.mle <- vapply(values, function(x) x["beta1.mle", ], numeric(S))
beta0.lse <- vapply(values, function(x) x["beta0.lse.(Intercept)", ], numeric(S))
beta1.lse <- vapply(values, function(x) x["beta1.lse.x", ], numeric(S))
As a final note, it may be possible to reorganize this to run even faster, depending on why you are using the S index to set the seed. If it is important to know which seed was used to generate your x with rnorm, then this may be the best I can do. However, if you only do it to ensure that all of your values of v are tested on the same values of x, then further reorganizing, for example with replicate, may produce more speed-up.
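For what it is worth, here is roughly what a replicate-based version could look like if the per-iteration seed is not needed. It is a sketch under that assumption: one_run is a hypothetical helper, the number of replications is kept small so it runs quickly, and a single set.seed call replaces set.seed(i).

```r
# Objective passed to nlm, as in func1 above (y renamed from y.t for brevity)
func1 <- function(betas, v, y, x) {
  (v + 1) / 2 * sum(log(1 + (y - betas[1] - betas[2] * x)^2 / v))
}

# Hypothetical helper: simulate one data set and fit both models once each
one_run <- function(n, v, beta0 = 50, beta1 = 10) {
  x <- rnorm(n)
  y <- beta0 + beta1 * x + rt(n, v)
  mle <- nlm(func1, c(1, 1), v = v, y = y, x = x, iterlim = 1000)$estimate
  lse <- unname(coef(lm(y ~ x)))
  c(beta0.mle = mle[1], beta1.mle = mle[2],
    beta0.lse = lse[1], beta1.lse = lse[2])
}

set.seed(1)
res <- replicate(20, one_run(n = 100, v = 5))  # 4 x 20 matrix, one column per run
rowMeans(res)
```

Each column of res is one simulated data set, so rowMeans(res) gives the average of each estimator across runs for that value of v.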

What is the rationale behind this type of data input?

The example is from the rootSolve package:
We have this function:
gradient(f, x, centered = FALSE, pert = 1e-8, ...)
in which f is a function and x is the data input in the form of a vector.
Now the following is an instance of the code being run:
logistic <- function (x, times) {
with (as.list(x),
{
N <- K / (1+(K-N0)/N0*exp(-r*times))
return(c(N = N))
})
}
# parameters for the US population from 1900
x <- c(N0 = 76.1, r = 0.02, K = 500)
# Sensitivity function: SF: dfi/dxj at
# output intervals from 1900 to 1950
SF <- gradient(f = logistic, x, times = 0:50)
My question is: how does the code know to use times in its routine? It's not defined globally and it is not part of gradient's argument list either. Is it possible to pass inputs to a function when they are not defined in its signature? Does ... play a role here?
... is just a way of collecting extra arguments and passing them on to another function.
Simple example:
power.function <- function(x,power) { x^power }
apply.function <- function(f, data, ...) { f(data, ...) }
sample <- c(1,2,3)
apply.function (power.function, sample, power = 3)
# which is the same as
apply.function (power.function, sample, 3)
produces
> apply.function (power.function, sample, 3)
[1] 1 8 27
EDIT
To make it crystal clear, if you look at the source of rootSolve::gradient you'll see the definition as
function (f, x, centered = FALSE, pert = 1e-08, ...)
and further down the call to
reff <- f(x, ...)
which is the same as described above in the example.
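To see the mechanism with the logistic example itself, here is a small stand-in that forwards ... in the same way. fd_gradient is a hypothetical forward-difference helper written for this illustration; it is not rootSolve's implementation and only shares the f(x, ...) call pattern quoted above.

```r
# logistic from the question, written compactly
logistic <- function(x, times) {
  with(as.list(x), c(N = K / (1 + (K - N0) / N0 * exp(-r * times))))
}

# Hypothetical stand-in for gradient(): forward-difference Jacobian
fd_gradient <- function(f, x, pert = 1e-8, ...) {
  ref <- f(x, ...)                        # times arrives here through ...
  jac <- vapply(seq_along(x), function(j) {
    xp <- x
    delta <- max(abs(x[j]) * pert, pert)  # step size relative to x[j]
    xp[j] <- x[j] + delta
    (f(xp, ...) - ref) / delta            # and is forwarded here as well
  }, numeric(length(ref)))
  matrix(jac, nrow = length(ref), dimnames = list(NULL, names(x)))
}

x <- c(N0 = 76.1, r = 0.02, K = 500)
SF <- fd_gradient(f = logistic, x, times = 0:50)
dim(SF)
```

times = 0:50 matches none of fd_gradient's named parameters, so it is collected by ... and handed to logistic unchanged each time f(x, ...) is called.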

Resources