MLE in R (multiple-parameter function)

I'm trying to do MLE in R to simulate mark-recapture methods. I'm trying to maximise this function:
likelihood <- function(N, p){
  likelihood <- 1
  X <- 0
  for (s in 1:S){ # X totals the animals caught over all S surveys
    X <- X + M[s]
  }
  likelihood <- likelihood * p^(X) * (1-p)^(N*S-X)
  likelihood <- likelihood * (factorial(N)/factorial((N-length(U))))
  return(likelihood)
}
I'm trying to maximise over N; all the other parameters are known, or estimated and therefore assumed known. I have looked into nlm and optim but can't get either to do what I want.
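For reference, a minimal sketch of what this could look like (it assumes the M, S and U objects produced by the data-generation code below; the start value and bounds are illustrative). Working on the log scale avoids overflow in factorial(N), with lgamma(N + 1) standing in for log(N!), and optim's method = "Brent" handles the one-dimensional search over N:

negloglik <- function(N, p){
  X <- sum(M[1:S])
  -(X*log(p) + (N*S - X)*log(1 - p) + lgamma(N + 1) - lgamma(N - length(U) + 1))
}
fit <- optim(par = 250, fn = function(N) negloglik(N, p = 0.2),
             method = "Brent", lower = length(U), upper = 10000) # illustrative bounds
fit$par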
In response to the answers:
Thanks for all the answers. I recoded the likelihood and managed to get something working; see the code below. To answer specific points:
1. Thanks for `X <- sum(M[1:S])`, a nice trick. X is counting all the animals ever caught (not unique captures).
2. M is generated in the code below from n (the capture matrix); it is a count of the number of animals marked on each survey, and S is the number of surveys.
captures <- function(N, S, P){
  P <- replicate(S, P) # capture probability (same across animals and surveys)
  captures <- t(replicate(N, rbinom(S, 1, P))) # generate capture data for N animals over S surveys with capture probability P
  return(captures)
}
marked <- function(N, S, captures){ # count the numbers that were marked on each survey
  M <- replicate(S, 0) # initialise the 'marked' counts to zero
  for (s in 1:S){
    for (i in 1:N){
      if (captures[i, s] == 1){
        M[s] <- M[s] + 1
      }
    }
  }
  return(M)
}
unseen <- function(N, S, captures){ # collect the indices of animals that were unseen on every survey
  U <- array()
  seen <- replicate(N, FALSE) # all animals begin as unseen
  for (i in 1:N){
    for (s in 1:S){
      if (captures[i, s] == 1){
        seen[i] <- TRUE
      }
    }
    if (seen[i] == FALSE){
      U <- c(U, i)
    }
  }
  U <- U[-1] # drop the NA that array() leaves in the first index
  return(U)
}
firstcaught <- function(N, S, captures){ # how many animals were caught for the first time on each survey
  fc <- 0
  seen <- replicate(N, FALSE)
  for (s in 1:S){
    fc[s] <- 0
    for (i in 1:N){
      if ((seen[i] == FALSE) && (captures[i, s] == 1)){
        fc[s] <- fc[s] + 1
        seen[i] <- TRUE
      }
    }
  }
  return(fc)
}
##Generate data using functions
N <- 200
S <- 10
P <- 0.2
n <- captures(N,S,P)
M <- marked(N,S,n)
U <- unseen(N,S,n)
fc <- firstcaught(N,S,n)
Mc <- cumsum(fc) #A running total of animals that have been captured at least once
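As an aside, the three helper functions above have short vectorised equivalents (a sketch; `first` is just an intermediate variable introduced here):

M <- colSums(n)                               # animals marked on each survey
U <- which(rowSums(n) == 0)                   # animals never captured
first <- apply(n, 1, function(r) match(1, r)) # survey of each animal's first capture (NA if never caught)
fc <- tabulate(first[!is.na(first)], nbins = S)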
##Define a likelihood for model M_0
likelihood <- function(N, P){
  likelihood <- 1
  X <- 0
  for (s in 1:S){
    X <- X + M[s]
    likelihood <- likelihood * (1-P)^(N-M[s])
    likelihood <- likelihood * choose(N-M[s], fc[s])
  }
  likelihood <- likelihood * P^(X) * (1-P)^(-X)
  return(-log(likelihood)) # nlm minimises, so return the negative log-likelihood
}
##Find the MLE
out <- nlm(f = likelihood, p = 200, P = P, check.analyticals = TRUE)
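As a quick sanity check (a sketch; the grid is arbitrary), the negative log-likelihood can be profiled over integer values of N and compared against the nlm estimate:

Ngrid <- Mc[S]:400
plot(Ngrid, sapply(Ngrid, likelihood, P = P), type = "l",
     xlab = "N", ylab = "negative log-likelihood")
abline(v = out$estimate, lty = 2) # nlm's estimate should sit at the minimum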

Related

How to fix `Error in hist.default(res) : 'x' must be numeric`?

Following this question: How to get the value of `t` so that my function `h(t)=epsilon` for a fixed `epsilon`?
I first compute the eigenvectors `v` of a 500×500 random matrix `G`, then generate 1000 different random initial vectors of dimension 500 and normalise them to unit length in `xmats`.
#make this example reproducible
set.seed(100001)
n <- 500
#Sample GOE random matrix
A <- matrix(rnorm(n*n, mean=0, sd=1), n, n)
G <- (A + t(A))/sqrt(2*n)
ev <- eigen(G)
l <- ev$values
v <- ev$vectors
# dimension of the multivariate distribution
mean <- rep(0, n)
var <- diag(n)
# simulate a multivariate normal distribution
initial <- MASS::mvrnorm(n=1000, mu=mean, Sigma=var) # 1000 random vectors
# normalise each initial vector, so the initial data are uniformly distributed on the sphere
xmats <- lapply(1:1000, function(i) initial[i, ]/norm(initial[i, ], type="2"))
Then I define the functions used to compute `res`:
h1t <- function(t, x_0) {
  h10 <- c(x_0 %*% v[, n])
  denom <- vapply(t, function(.t) {
    sum((x_0 %*% v)^2 * exp(-4*(l - l[n]) * .t))
  }, numeric(1L))
  abs(h10) / sqrt(denom)
}
find_t <- function(x, epsilon = 0.01, range = c(-50, 50)) {
  uniroot(function(t) h1t(t, x) - epsilon, range,
          tol = .Machine$double.eps)$root
}
I want to get `res`:
res <- lapply(xmats, find_t)
However, it throws an error: `Error in uniroot(function(t) h1t(t, x) - epsilon, range, tol = .Machine$double.eps) : f() values at end points not of opposite sign`.
When the calls succeed, `res` is a list; `hist(res)` then fails with `'x' must be numeric`, but `hist(unlist(res))` works well.
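One way to sidestep the uniroot error (a sketch; `find_t_safe` is a name introduced here) is to test the endpoint signs first and return NA whenever no root is bracketed by the search range, then drop the NAs before plotting:

find_t_safe <- function(x, epsilon = 0.01, range = c(-50, 50)) {
  f <- function(t) h1t(t, x) - epsilon
  if (f(range[1]) * f(range[2]) > 0) return(NA_real_) # no sign change: no root bracketed
  uniroot(f, range, tol = .Machine$double.eps)$root
}
res <- lapply(xmats, find_t_safe)
hist(na.omit(unlist(res)))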

Minimum sample size n such that the difference is no more than 1/100

What is the minimum sample size n (or the length n = length(x) of the data vector x) such that the difference D = 1 - statx4(x)/statx5(x) of the functions statx4 and statx5 is no more than 1/100, i.e. D ≤ 1/100?
And here are the functions:
statx4 <- function(x) {
  numerator <- sum((x - mean(x))^2)
  denominator <- length(x)
  result <- numerator/denominator
  return(result)
}
statx5 <- function(x) {
  numerator <- sum((x - mean(x))^2)
  denominator <- length(x) - 1
  result <- numerator/denominator
  return(result)
}
I've been doing this exercise set for a while, but haven't managed to get anything valid on this question. Could you point me in the right direction?
For the normal distribution, it is the following (with statx4 and statx5 as defined above):
D <- function(x){
  1 - statx4(x)/statx5(x)
}
DD <- function(N=1111, seed=1){
  set.seed(seed)
  Logi <- vector()
  for (n in 1:N) {
    x <- rnorm(n)
    y <- D(x)
    Logi[n] <- (y > 1/100)
  }
  return(Logi)
}
min <- vector()
for (seed in 1:100) {
  message(seed)
  min[seed] <- length(which(DD(1000, seed) == TRUE)) # pass the seed; the original call discarded its result and reused seed = 1
}
Answer <- mean(min) + 1
Answer
Note that the function D evaluates the relative difference between the ordinary (divide-by-n) variance and the unbiased (divide-by-(n-1)) variance.
I think this problem becomes clearer when treated mathematically.
I got the solutions today, and all you had to do was try random values:
a <- rnorm(99); 1 - statx4(a)/statx5(a)
a <- rnorm(100); 1 - statx4(a)/statx5(a)
a <- rnorm(101); 1 - statx4(a)/statx5(a)
And the correct answer is 100.
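In fact, no random sampling is needed: statx4(x) = SS/n and statx5(x) = SS/(n-1) share the same sum of squares SS, so their ratio is (n-1)/n for any data vector, and D = 1 - (n-1)/n = 1/n exactly. D ≤ 1/100 therefore holds precisely when n ≥ 100, matching the answer above.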
Thank you all for the help.

R function writing - getting errors: `NaNs produced` and `Error in tsort[U + 1] : only 0's may be mixed with negative subscripts`

I am creating an R function that calculates a bootstrapped bias-corrected and accelerated (BCa) interval, without using any pre-installed packages. My code mostly seems to work, but I am struggling with the code for the lower and upper limits of the interval. Any suggestions would be helpful.
BCa <- function(stat, X, k, level=0.95, ...){
  if(!is.numeric(k) || k <= 0){
    stop("The number of bootstrap resamples 'k' must be a numeric value greater than 0")
  }
  t.star <- stat(X, ...)
  t.k <- rep(NA, k)
  for(i in 1:k){
    Xi <- sample(X, replace=TRUE)
    t.k[i] <- stat(Xi, ...)
  }
  z0 <- qnorm(mean(t.k < t.star)) # bias-correction term
  n <- length(X)
  t.minus.j <- rep(NA, n) # jackknife estimates
  for(j in 1:n){
    Xj <- X[-j]
    t.minus.j[j] <- stat(Xj, ...)
  }
  t.bar.minus <- mean(t.minus.j)
  t.diff <- t.bar.minus - t.minus.j
  # acceleration: the original line was missing the sums and parenthesised the exponent as ^3/2
  a <- sum(t.diff^3) / (6 * sum(t.diff^2)^(3/2))
  alpha <- 1 - level
  tsort <- sort(t.k, decreasing = FALSE)
  # adjusted quantile levels: both need pnorm (qnorm here caused the NaNs and negative
  # subscripts), and the denominator is 1 - a*(z0 + z_alpha)
  L <- pnorm(z0 + (z0 + qnorm(alpha/2)) / (1 - a*(z0 + qnorm(alpha/2))))
  U <- pnorm(z0 + (z0 + qnorm(1 - alpha/2)) / (1 - a*(z0 + qnorm(1 - alpha/2))))
  # clamp the order-statistic indices to 1..k so the subscripts stay valid
  lower.limit <- tsort[min(k, max(1, floor(L*(k+1))))]
  upper.limit <- tsort[min(k, max(1, ceiling(U*(k+1))))]
  return(list(t.star=t.star, ci=c(lower.limit, upper.limit)))
}
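For reference, a quick usage sketch (the data and statistic here are invented for illustration):

set.seed(1)
x <- rexp(50)
BCa(stat = mean, X = x, k = 2000) # point estimate plus 95% BCa interval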

Use the markovchain package to compare two empirically estimated Markov chains

I need to compare two transition-probability matrices to gauge the degree of proximity of the two chains, so I would use the resulting p-value of the test.
I tried to use the markovchain R package, more specifically the divergenceTest function. The problem is that the function is not implemented correctly. It is based on the test from the book "Statistical Inference Based on Divergence Measures" (page 139). I contacted the package developers, but they have not corrected it yet, so I tried to implement it myself, but I'm having trouble. Could anyone help me find the error?
Parameters: `freq_matrix` is a frequency matrix used to estimate the probability matrix; `hypothetic` is the matrix to compare against the estimated one.
divergenceTest3 <- function(freq_matrix, hypothetic){
  n <- sum(freq_matrix)
  empirical <- freq_matrix
  for (i in 1:nrow(hypothetic)){ # row-normalise; length(hypothetic) would count all cells, not rows
    empirical[i, ] <- freq_matrix[i, ]/rowSums(freq_matrix)[i]
  }
  M <- nrow(empirical)
  v <- numeric()
  out <- 2 * n / .phi2(1)
  sum <- 0
  c <- 0
  for(i in 1:M){
    sum2 <- 0
    for(j in 1:M){
      if(hypothetic[i, j] > 0){
        c <- c + 1
      }
      sum2 <- sum2 + hypothetic[i, j] * .phi(empirical[i, j] / hypothetic[i, j])
    }
    v[i] <- rowSums(freq_matrix)[i]
    sum <- sum + ((v[i] / n) * sum2)
  }
  TStat <- out * sum
  pvalue <- 1 - pchisq(TStat, c - M)
  cat("The divergence test statistic is:", TStat, " the chi-square d.f. are:", c - M, " the p-value is:", pvalue, "\n")
  out <- list(statistic = TStat, p.value = pvalue)
  return(out)
}
# phi function for the divergence test: phi(x) = x*log(x) - x + 1
.phi <- function(x) {
  out <- x*log(x) - x + 1
  return(out)
}
# second derivative of phi, phi''(x) = 1/x, used in the 2*n/phi''(1) scaling
.phi2 <- function(x) {
  out <- 1/x
  return(out)
}
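For illustration, a small call of the function above (a sketch; the count matrix and the hypothesised transition matrix are invented):

freq <- matrix(c(40, 10, 15, 35), byrow = TRUE, nrow = 2) # observed transition counts
hyp <- matrix(c(0.8, 0.2, 0.3, 0.7), byrow = TRUE, nrow = 2) # hypothesised transition probabilities
divergenceTest3(freq, hyp)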
The divergence test has been replaced by the verifyHomogeneity function. It requires an input list of elements that can be coerced to a raw transition matrix (as in createSequenceMatrix). It then tests whether they belong to the same unknown DTMC.
See the example below:
library(markovchain)
myMatr1 <- matrix(c(0.2, .8, .5, .5), byrow=TRUE, nrow=2)
myMatr2 <- matrix(c(0.5, .5, .4, .6), byrow=TRUE, nrow=2)
mc1 <- as(myMatr1, "markovchain")
mc2 <- as(myMatr2, "markovchain")
mc1
mc2
sample1 <- rmarkovchain(n=100, object=mc1)
sample2 <- rmarkovchain(n=200, object=mc2)
# should reject
verifyHomogeneity(inputList = list(sample1, sample2))
# should accept
sample2 <- rmarkovchain(n=200, object=mc1)
verifyHomogeneity(inputList = list(sample1, sample2))

Custom Kalman Filter not producing desired Maximum Likelihood Estimates in R

I am attempting to implement the local level model in R; for a reference, see http://personal.vu.nl/s.j.koopman/documents/LLIntro2011.pdf. I have implemented the code below, but for any input data `mydata` the maximum likelihood estimates give the same number for `sigma_n` and `sigma_e`, and I do not know what the source of the problem is.
locallevel <- function(x, data) {
  sigma_e <- exp(x[1]) # optimise over log-variances so the search is unconstrained
  sigma_n <- exp(x[2])
  len <- length(data)
  y <- matrix(data, nrow=len, ncol=1)
  a <- matrix(0, nrow=len, ncol=1) # filtered level
  v <- matrix(0, nrow=len, ncol=1) # prediction errors
  F <- matrix(0, nrow=len, ncol=1) # prediction-error variances
  P <- matrix(0, nrow=len, ncol=1) # state variances
  K <- matrix(0, nrow=len, ncol=1)
  for (i in 1:len) {
    if (i == 1) {
      a[i] <- y[i]
      P[i] <- sigma_e + sigma_n
      F[i] <- P[i] + sigma_e
      v[i] <- y[i] - a[i]
    } else {
      a[i] <- a[i-1] + ((P[i-1])/(P[i-1] + sigma_e))*(y[i-1] - a[i-1])
      P[i] <- ((P[i-1]*sigma_e)/(P[i-1] + sigma_e)) + sigma_n
      F[i] <- P[i] + sigma_e
      v[i] <- y[i] - a[i]
    }
  }
  loglik <- -(len/2)*log(2*pi) - 0.5*sum(log(abs(F)) + ((v^2)/F))
  return(-loglik) # return the negative log-likelihood for minimisation
}
mlest <- optim(c(log(var(mydata)), log(var(mydata))), function(x) locallevel(x, mydata)) # note the argument order: locallevel takes the parameter vector first, then the data (the original call had them swapped)
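As a quick check (a sketch; the simulated series and its variances are invented), the corrected call recovers two distinct variance estimates on data generated from a local level model:

set.seed(42)
len <- 200
mu <- cumsum(rnorm(len, sd = 0.5)) # level innovations, variance sigma_n = 0.25
mydata <- mu + rnorm(len, sd = 1)  # observation noise, variance sigma_e = 1
mlest <- optim(c(log(var(mydata)), log(var(mydata))), function(x) locallevel(x, mydata))
exp(mlest$par) # estimates of (sigma_e, sigma_n)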
