doing likelihood plot in R for binomial model [closed]

Can anybody tell me how I can plot the maximum likelihood values L(θ̂_M, M) versus M for a suitable range of M values for the count data provided in frogs, and then estimate the total number of frogs living in the pond and the probability of appearance, in R?
These were the questions asked (provided as an image); I have already answered parts (a) and (b).
I have the pmf for my model and have found the likelihood and log-likelihood of my binomial model. Below is the code I have written so far for parts (a), (b), and (c); please help!
# load the necessary packages
library(tidyverse)
library(ggplot2)   # already attached by the tidyverse, but harmless
# load the data
load("~/Statistical Modelling and Inference/aut2020.RData")
# assign a variable to the data
data <- frogs
# n is the number of count observations
n <- length(frogs$counts)
n
# MLE of theta for a given M: theta_hat = sum(y) / (n * M)
theta_hat <- function(M) sum(frogs$counts) / (n * M)
# log-likelihood of the binomial model for given theta and M, with y the vector of counts
loglik <- function(theta, M, y) {
  # binomial-coefficient term + "success" term + "failure" term
  sum(lchoose(M, y)) + sum(y) * log(theta) + (n * M - sum(y)) * log(1 - theta)
}
Data looks like this: [screenshots of the frogs data in the R script and as read in]

Since you have already found the likelihood function in your answer to (a), you can see that it is a function of both M and theta, which are both unknown.
After estimating theta you have the ML estimator; let's call it theta_hat.
In the data frame frogs you have all the count observations y_i (known). Using the known data and the ML estimate theta_hat, the likelihood can be plotted over some (reasonable) range of values of M (you might need to try different ranges). So plot L(theta_hat, M) as a function of M. Bear in mind, though, that the estimate theta_hat changes as you change M, so take that into account. The point where L(theta_hat, M) is maximised gives your ML estimates of theta and M; a sketch is given below.
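A minimal sketch of such a profile-likelihood plot, assuming frogs$counts holds the observed counts and using the log-likelihood written above (the grid of M values is arbitrary and may need adjusting for your data):
y <- frogs$counts
n <- length(y)
# candidate values of M; M cannot be smaller than the largest observed count
M_grid <- max(y):(max(y) + 200)
# profile log-likelihood: plug in theta_hat(M) = sum(y) / (n * M) for each M
profile_ll <- sapply(M_grid, function(M) {
  th <- sum(y) / (n * M)
  sum(lchoose(M, y)) + sum(y) * log(th) + (n * M - sum(y)) * log(1 - th)
})
plot(M_grid, profile_ll, type = "l", xlab = "M", ylab = "log L(theta_hat, M)")
# ML estimates: the M (and its theta_hat) that maximise the profile log-likelihood
M_hat <- M_grid[which.max(profile_ll)]
theta_hat_M <- sum(y) / (n * M_hat)
M_hat
theta_hat_M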

Related

Lognormal Stock Price Distribution in R [closed]

My goal is to obtain a lognormal distribution of stock prices that I can then use to calculate the expected utility an agent would receive from holding such stocks, but I am a bit stuck on how to achieve this. The distribution of stock prices is lognormal with volatility σ and expected return r obtained from the capital asset pricing model.
For example, σ = 0.01825838 and r = 0.13053162. I tried to generate the distribution of stock prices with:
dist <- rlnorm(1000, 0.13053162, 0.01825838)
However, this distribution looks normally distributed rather than lognormally distributed. How can I generate this distribution so that I can subsequently use it to calculate the expected utility, assuming the agent has constant relative risk aversion?
I infer from this post on Mathematics Stack Exchange that the lognormal approximates the normal for small sigma. So you are sampling from the lognormal with the code you provide, but since you have a small sigma, it can be approximated by the normal.
You can see the approximation of the normal visually by varying sigma:
hist(rlnorm(1000, meanlog = 0.1305, sdlog = 0.500))
hist(rlnorm(1000, meanlog = 0.1305, sdlog = 0.018))
I don't really understand the second part of your question, but to create a lognormal distribution, you can use the property that if X follows a log-normal distribution and Y = ln(X), then Y follows a normal distribution.
https://en.wikipedia.org/wiki/Log-normal_distribution
So something like:
set.seed(1234)
dist <- rnorm(1000, 1, .5)
ldist <- exp(dist)
hist(ldist)
This looks log-normal to me.
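For the second part (the expected utility), a rough Monte Carlo sketch, assuming CRRA utility u(x) = x^(1-gamma) / (1-gamma) and a purely hypothetical risk-aversion coefficient gamma = 2 (neither is given in the question):
set.seed(1234)
gamma <- 2                                     # hypothetical CRRA coefficient
prices <- rlnorm(100000, meanlog = 0.13053162, sdlog = 0.01825838)
crra <- function(x, g) if (g == 1) log(x) else x^(1 - g) / (1 - g)
expected_utility <- mean(crra(prices, gamma))  # Monte Carlo estimate of E[u(price)]
expected_utility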

maximizing with two functions in R [closed]

I would like to find a maximum economic stress scenario restricted by a limit on the Mahalanobis distance of the scenario. For this I have to consider two functions in the optimisation.
To make it easier, we can work with a simplified problem: we have a simple linear model y = a + b*x, for which I want to minimise sum((a + b*x - y)^2). But I also have, for example, the restriction that (a*b*5)/2 < 30.
Calculating this problem with the Excel solver is not a problem, but how do I do this in R?
You could try to incorporate the constraint into the objective function, like this:
# example data whose exact solution lies outside the constraint
x <- runif(100, 1, 10)
y <- 3 + 5*x + rnorm(100, mean = 0, sd = .5)
# a penalty that is big, but not too big, relative to the objective
bigConst <- sum(y^2) * 100
# if the parameters lie outside the feasible region (a*b > 12), add bigConst
f <- function(par, x, y) {
  sum((par["a"] + par["b"]*x - y)^2) +
    if (par["a"]*par["b"] > 12) bigConst else 0
}
# simulated annealing can deal with non-continuous objective functions
sol <- optim(par = c(a = 1, b = 1), fn = f, method = "SANN", x = x, y = y)
# this is what the fitted line looks like
plot(x, y)
abline(a = sol$par["a"], b = sol$par["b"])
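As a quick check of the returned solution (just an illustration using the objects above; the exact numbers depend on the random data), you can verify that the fitted parameters respect the constraint:
sol$par                       # fitted a and b
sol$par["a"] * sol$par["b"]   # should be <= 12, i.e. (a*b*5)/2 < 30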

distribution from percentage with R [closed]

I have the distribution of a parameter (natural gas mixture composition) expressed in percent. How can I test such data for distribution parameters (it should be a gamma, normal, or lognormal distribution) and generate random compositions based on those parameters in R?
This might be a better question for CrossValidated, but: it is not generally a good idea to choose from among a range of possible distributions according to goodness of fit. Instead, you should choose according to the qualitative characteristics of your data, something like this chart: [decision chart for choosing a distribution, not shown]
Frustratingly, this chart doesn't actually have the best choice for your data (compositional, continuous, bounded between 0 and 1 [or 0 and 100]), which is a Beta distribution (although there are technical issues if you have values of exactly 0 or 100 in your sample).
In R:
## some arbitrary data
z <- c(2, 8, 40, 45, 56, 58, 70, 89)
## fit (beta values must be in (0,1), not (0,100), so divide by 100)
(m <- MASS::fitdistr(z/100, "beta", start = list(shape1 = 1, shape2 = 1)))
## sample 1000 new values, scaled back to percentages
z_new <- 100 * rbeta(n = 1000, shape1 = m$estimate["shape1"],
                     shape2 = m$estimate["shape2"])
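As a quick visual check of the fit (a sketch using the objects above), you can compare a histogram of the scaled data with the fitted Beta density:
hist(z/100, freq = FALSE, breaks = 5, xlim = c(0, 1), main = "Beta fit")
curve(dbeta(x, shape1 = m$estimate["shape1"], shape2 = m$estimate["shape2"]),
      from = 0, to = 1, add = TRUE, col = "red")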

vegan accumulation curve predictions [closed]

I have a matrix of plants (rows) and pollinators (columns) with interaction frequencies as entries (converted to 0 = no interaction and 1 = interaction(s) present for this analysis).
I'm using the vegan package and have produced a species accumulation curve.
accum <- specaccum(mydata[1:47,], method = "random", permutations = 1000)
plot(accum)
I now would like to predict how many new pollinator species I would be likely to find with additional plant sampling, but I can't figure out in what format I have to supply "newdata" to the predict command. I have tried empty rows and rows of zeros within the matrix but was not able to get results. This is the code I've used for the prediction:
predictaccum1 <- predict(accum, newdata=mydata[48:94,])
The error message:
Error in UseMethod("predict") :
no applicable method for 'predict' applied to an object of class "specaccum"
The error message does not change if I specify: interpolation = c("linear") or "spline".
Could anyone help please?
Not perhaps the clearest way of putting this, but the documentation says:
newdata: Optional data used in prediction interpreted as number of
sampling units (sites).
So newdata should be a number of sampling units. A single number or a vector of numbers will do. However, the predict function cannot extrapolate; it only interpolates. The non-linear regression models of fitspecaccum may be able to extrapolate, but should you trust them?
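For example (a sketch, assuming a vegan version whose predict method for specaccum objects accepts numeric newdata as described in the documentation quoted above):
# interpolate the accumulation curve at 10, 20 and 30 sampling units
predict(accum, newdata = c(10, 20, 30))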
Here is a bit about the dangers of extrapolation: the non-linear regression models are conventionally used for analysing species accumulation data, but none of these is really firmly based on theory -- they are just some nice non-linear regression models. I know of some models that may have a firmer basis, but we haven't implemented them in vegan, nor do we plan to do so (but contributions are welcome). However, it is possible to get some idea of the problems by subsampling your data and seeing whether you can estimate the overall number of species by extrapolating from your subsample. The following shows how to do this with the BCI data in vegan. These data have 50 sample plots with 225 species. We take subsamples of 25 plots and extrapolate to 50:
library(vegan)
data(BCI)
mod <- c("arrhenius", "gleason", "gitay", "lomolino", "asymp", "gompertz",
         "michaelis-menten", "logis", "weibull")
extraps <- matrix(NA, 100, length(mod))
colnames(extraps) <- mod
for(i in 1:nrow(extraps)) {
  ## use the same accumulation for all nls models
  m <- specaccum(BCI[sample(50, 25), ], "exact")
  for(p in mod) {
    ## need try because some nls models can fail
    tmp <- try(predict(fitspecaccum(m, p), newdata = 50))
    if(!inherits(tmp, "try-error")) extraps[i, p] <- tmp
  }
}
When I tried this, most extrapolation models did not include the correct number of species among their predictions: all values were either higher than the correct richness (from worst: Arrhenius, Gitay, Gleason) or lower than the correct richness (from worst: logistic, Gompertz, asymptotic, Michaelis-Menten, Lomolino, Weibull; only the last two of these included the correct richness in their range).
In summary: in the absence of theory and an adequate model, beware of extrapolation.

Adjusting regression weight based on feedback [closed]

Let's say I want to predict a dependent variable D, where:
D<-rnorm(100)
I cannot observe D, but I know the values of three predictor variables:
I1<-D+rnorm(100,0,10)
I2<-D+rnorm(100,0,30)
I3<-D+rnorm(100,0,50)
I want to predict D by using the following regression equation:
I1 * w1 + I2 * w2 + I3 * w3 ≈ D
However, I do not know the correct values of the weights (w), so I would like to fine-tune them by repeatedly updating my estimate:
in the first step I use equal weights:
w1= .33, w2=.33, w3=.33
and I estimate D using these weights:
EST <- I1 * .33 + I2 * .33 + I3 * .33
I receive feedback, which is a difference score between D and my estimate (diff=D-EST)
I use this feedback to modify my original weights and fine-tune them to eventually minimize the difference between D and EST.
My question is:
Is the difference score sufficient for being able to fine-tune the weights?
What are some ways of manually fine-tuning the weights? (e.g. can I look at the correlation between diff and I1, I2, I3 and use that as a weight?)
The following command,
coefficients(lm(D ~ I1 + I2 + I3))
will give you the ideal weights to minimize diff.
Your defined diff will not tell you enough to manually manipulate the weights correctly as there is no way to isolate the error component of each I.
The correlation between D and the I's is not sufficient either as it only tells you the strength of the predictor, not the weight. If your I's are truly independent (both from each other, all together and w.r.t. D - a strong assumption, but true when using rnorm for each), you could try manipulating one at a time and notice how it affects diff, but using a linear regression model is the simplest way to do it.
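If you do want to fine-tune the weights manually from the diff feedback, here is an illustrative sketch (not part of the original answer) of a simple gradient-descent update on the mean squared diff, reusing the data-generating code from the question; the learning rate and number of iterations are arbitrary choices, and the final weights should end up close to the regression solution:
set.seed(42)
D  <- rnorm(100)
I1 <- D + rnorm(100, 0, 10)
I2 <- D + rnorm(100, 0, 30)
I3 <- D + rnorm(100, 0, 50)
X  <- cbind(I1, I2, I3)
w  <- c(.33, .33, .33)   # starting weights
lr <- 1e-5               # step size, chosen by hand for this data scale
for (step in 1:5000) {
  EST  <- as.vector(X %*% w)
  diff <- D - EST                                     # the feedback signal
  grad <- -2 * as.vector(t(X) %*% diff) / length(D)   # gradient of mean squared diff
  w    <- w - lr * grad                               # move weights against the gradient
}
w                                          # fine-tuned weights
coefficients(lm(D ~ 0 + I1 + I2 + I3))     # regression weights (no intercept) for comparison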
