How to estimate parameters of an exponential function with R?

I am working on an investigation of ichthyofauna (the study of fishes). I need to find the condition factor of the fish.
The steps to find the condition factor are as follows:
1. W = aL^b ... (1)
Where:
W: The weight of fish in grams.
L: Total length of fish in centimeters.
a: Coefficient of the relationship; its logarithm is the intercept of the regression line on the Y axis.
b: The slope of the regression line (also referred to as the Allometric coefficient).
2. log W = log a + b log L ... (2)
Where:
a: constant
b: the regression co-efficient
3. K = 100 W / L^b ... (3)
Where:
W: Weight of the fish in grams
L: The total length of the fish in centimeters
b: The value obtained from the length-weight equation (2)
I understand that to calculate K, I must first obtain the regression slope b by fitting equation (2) (this b is the same as the exponent in equation (1)), and then plug it into equation (3). I need help doing this in R.
I would be very grateful for your support.
Thanks and regards!

For a very simple regression, you may want to start with a linear model on the log-transformed data, like this:
reg1 <- lm(log(W) ~ log(L), data=yourdataframename)
then check the summary for coefficients:
summary(reg1)
Note that the fitted intercept is an estimate of log(a), not of a itself, so back-transform it with exp(). The intercept term is included implicitly by lm unless you add '-1' to the formula.
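Putting the whole calculation together outside R, here is a minimal sketch in Python with only the standard library, using made-up length/weight data purely for illustration:

```python
import math

# Hypothetical example data (made up for illustration):
# total length in centimeters and weight in grams for 8 fish.
lengths = [10.2, 12.5, 14.1, 16.8, 18.0, 20.3, 22.7, 25.0]
weights = [12.1, 22.8, 33.5, 57.9, 70.2, 102.5, 144.0, 190.1]

# Ordinary least squares on the log-transformed data: log W = log a + b log L
x = [math.log(L) for L in lengths]
y = [math.log(W) for W in weights]
n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n
b = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
     / sum((xi - mean_x) ** 2 for xi in x))  # slope = allometric coefficient b
a = math.exp(mean_y - b * mean_x)            # intercept is log(a), so exponentiate

# Condition factor for each fish: K = 100 * W / L^b
K = [100 * W / L ** b for L, W in zip(lengths, weights)]
```

In R, b is the slope of the lm fit and a = exp(intercept); b then goes into K = 100 * W / L^b.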


Binomial Experiment

How do I use the Binomial function to solve this experiment:
number of trials -> n=18,
p=10%
success x=2
The answer is 28%. I am using Binomial(18, 0.1), but how do I pass x=2?
julia> d=Binomial(18,0.1)
Binomial{Float64}(n=18, p=0.1)
pdf(d,2)
How can I solve this in Julia?
What you want is the Probability Mass Function (PMF): the probability that, in a binomial experiment of n independent Bernoulli trials each with success probability p, we obtain exactly x successes.
The way to answer this in Julia, using the Distributions package, is to first create the distribution object with parameters n and p, and then call the function pdf on this object with the value x:
using Distributions
n = 18 # number of trials in our experiments
p = 0.1 # probability of success of a single trial
x = 2 # number of successes for which we want to compute the probability/PMF
binomialDistribution = Binomial(n,p)
probOfTwoSuccesses = pdf(binomialDistribution,x)
Note that all the other probability-related functions (cdf, quantile, but also rand) work in the same way: you first build the distribution object, which embeds the specific distribution parameters, and then you call the function on the distribution object with the value you are looking for, e.g. quantile(binomialDistribution, 0.9) for the 90% quantile.
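For readers who want to see the arithmetic behind pdf, the binomial PMF can also be computed directly from its definition; a sketch in Python with the standard library (binom_pmf is a made-up helper name):

```python
from math import comb

def binom_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p): C(n, x) * p^x * (1-p)^(n-x)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

prob = binom_pmf(2, 18, 0.1)  # the question's example: about 0.2835, i.e. 28%
```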

How to simulate from poisson distribution using simulations from exponential distribution

I am asked to implement an algorithm to simulate from a Poisson(lambda) distribution using simulations from an exponential distribution.
I was given the following density:
P(X = k) = P(X_1 + · · · + X_k ≤ 1 < X_1 + · · · + X_{k+1}), for k = 0, 1, 2, . . . (with the empty sum taken as 0 for k = 0).
Here X has a Poisson(lambda) distribution, and the X_i are i.i.d. Exponential(lambda).
I wrote code to simulate the exponential distribution, but have no clue how to simulate the Poisson. Could anybody help me with this? Thanks a million.
My code:
u <- runif(k)
x <- -log(1 - u) / lambda # k draws from Exponential(lambda) by inversion
I'm working on the assumption you (or your instructor) want to do this from first principles rather than just calling the builtin Poisson generator. The algorithm is pretty straightforward. You count how many exponentials you can generate with the specified rate until their sum exceeds 1.
My R is rusty and this sounds like homework anyway, so I'll express it as pseudo-code:
count <- 0
sum <- 0
repeat {
generate x ~ exp(lambda)
sum <- sum + x
if sum > 1
break
else
count <- count + 1
}
The value of count after you break from the loop is your Poisson outcome for this trial. If you wrap this as a function, return count rather than breaking from the loop.
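The same pseudo-code, written out as a runnable sketch in Python (poisson_from_exponentials is a made-up name; the inversion formula mirrors the question's own code):

```python
import math
import random

def poisson_from_exponentials(lam):
    """One Poisson(lam) draw: count Exp(lam) inter-arrival times,
    generated by inversion, until their running sum exceeds 1."""
    count = 0
    total = 0.0
    while True:
        u = random.random()
        x = -math.log(1 - u) / lam  # Exp(lam) via inversion
        total += x
        if total > 1:
            return count
        count += 1
```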
You can improve this computationally in a couple of ways. The first is to notice that the 1-U term for generating the exponentials has a uniform distribution, and can be replaced by just U. The more significant improvement is obtained by writing the evaluation as maximize i s.t. SUM(-log(Ui) / rate) <= 1, so SUM(log(Ui)) >= -rate.
Now exponentiate both sides and simplify to get
PRODUCT(Ui) >= Exp(-rate).
The right-hand side of this is constant, and can be pre-calculated, reducing the amount of work from k+1 log evaluations and additions to one exponentiation and k+1 multiplications:
count <- 0
product <- 1
threshold <- exp(-lambda)
repeat {
generate u ~ Uniform(0,1)
product <- product * u
if product < threshold
break
else
count <- count + 1
}
Assuming you do the U for 1-U substitution for both implementations, they are algebraically equal and will yield identical answers to within the precision of floating point arithmetic for a given set of U's.
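The product-threshold variant can be sketched in Python as follows (poisson_product is a made-up name; an illustration, not production code):

```python
import math
import random

def poisson_product(lam):
    """One Poisson(lam) draw: multiply uniforms until the running
    product falls below the precomputed threshold exp(-lam)."""
    threshold = math.exp(-lam)
    count = 0
    product = 1.0
    while True:
        product *= random.random()
        if product < threshold:
            return count
        count += 1
```

The per-draw cost is one precomputed exponential plus one multiplication per trial, instead of one logarithm per trial.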
You can use rpois to generate Poisson variates as per above suggestion. However, my understanding of the question is that you wish to do so from first principles rather than using built-in functions. To do this, you need to use the property of the Poisson arrivals stating that the inter-arrival times are exponentially distributed. Therefore we proceed as follows:
Step 1: Generate a (large) sample from the exponential distribution and create vector of cumulative sums. The k-th entry of this vector is the waiting time to the k-th Poisson arrival
Step 2: Measure how many arrivals we see in a unit time interval
Step 3: Repeat steps 1 and 2 many times and gather the results into a vector
This will be your sample from the Poisson distribution with the correct rate parameter.
The code:
lambda <- 20 # for example
out <- sapply(1:100000, function(i) {
  u <- runif(100)
  x <- -log(1 - u) / lambda # exponential inter-arrival times
  y <- cumsum(x)            # arrival times
  sum(y <= 1)               # number of arrivals in the unit interval
})
Then you can test the validity vs the built-in function via the Kolmogorov-Smirnov test:
ks.test(out, rpois(100000, lambda))

Formula of computing the Gini Coefficient in fastgini

I use the fastgini package for Stata (https://ideas.repec.org/c/boc/bocode/s456814.html).
I am familiar with the classical formula for the Gini coefficient reported for example in Karagiannis & Kovacevic (2000) (http://onlinelibrary.wiley.com/doi/10.1111/1468-0084.00163/abstract)
Formula I:
G = 1 / (2 * N^2 * µ) * SUM_i SUM_j |y_i - y_j|
Here G is the Gini coefficient, µ the mean value of the distribution, N the sample size and y_i the income of the ith sample unit. Hence, the Gini coefficient takes all available income pairs in the data and totals their absolute differences.
This total is then normalized by dividing it by population squared times mean income (and multiplied by two?).
The Gini coefficient ranges between 0 and 1, where 0 means perfect equality (all individuals earn the same) and 1 refers to maximum inequality (1 person earns all the income in the country).
However the fastgini package refers to a different formula (http://fmwww.bc.edu/repec/bocode/f/fastgini.html):
Formula II:
fastgini uses formula:
i=N j=i
SUM W_i*(SUM W_j*X_j - W_i*X_i/2)
i=1 j=1
G = 1 - 2* ----------------------------------
i=N i=N
SUM W_i*X_i * SUM W_i
i=1 i=1
where observations are sorted in ascending order of X.
Here W seems to be the weight, which I don't use, therefore it should be 1 (?). I’m not sure whether formula I and formula II are the same. There are no absolute differences and the result is subtracted from 1 in formula II. I have tried to transform the equations but I don’t get any further.
Could someone give me a hint whether both ways of computing (formula I + formula II) are equivalent?
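Not an algebraic proof, but one quick way to gather evidence is to evaluate both formulas on the same data with unit weights. A sketch in Python (the function names are made up, and incomes is arbitrary made-up data):

```python
def gini_formula_I(y):
    """Classical form: G = SUM_i SUM_j |y_i - y_j| / (2 * N^2 * mu)."""
    n = len(y)
    mu = sum(y) / n
    total = sum(abs(yi - yj) for yi in y for yj in y)
    return total / (2 * n * n * mu)

def gini_formula_II(x):
    """fastgini's formula with all weights W_i = 1, x sorted ascending."""
    x = sorted(x)
    n = len(x)
    cum = 0.0
    numerator = 0.0
    for xi in x:
        cum += xi
        numerator += cum - xi / 2  # inner sum up to i, minus x_i/2
    return 1 - 2 * numerator / (sum(x) * n)

incomes = [12, 4, 7, 1, 9, 30]  # made-up income data
```

Both functions return the same value on such unweighted data (e.g. 0.25 for incomes 1, 2, 3, 4), which supports the two formulas being equivalent when all W_i = 1.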

Defining exponential distribution in R to estimate probabilities

I have a bunch of random variables (X1,...,Xn) which are i.i.d. Exp(1/2) and represent the duration of a certain event. This distribution obviously has an expected value of 2, but I am having trouble defining it in R. I did some research and found something about so-called Monte Carlo simulation, but I don't seem to find what I am looking for in it.
An example of what I want to estimate: say we have 10 such random variables (X1,...,X10), and we want to determine the probability P(X1 + ... + X10 <= 25).
Thanks.
You don't actually need Monte Carlo simulation in this case, because:
If Xi ~ Exp(λ) then the sum (X1 + ... + Xk) ~ Erlang(k, λ) which is just a Gamma(k, 1/λ) (in (k, θ) parametrization) or Gamma(k, λ) (in (α,β) parametrization) with an integer shape parameter k.
From wikipedia (https://en.wikipedia.org/wiki/Exponential_distribution#Related_distributions)
So, P([X1+...+X10<=25]) can be computed by
pgamma(25, shape=10, rate=0.5)
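If you want to see where pgamma's answer comes from, for an integer shape k the gamma CDF reduces to a Poisson tail probability: the sum of k Exp(rate) variables is at most t exactly when a Poisson process with that rate has at least k arrivals by time t. A sketch in Python from first principles (erlang_cdf is a made-up name):

```python
import math

def erlang_cdf(t, k, rate):
    """P(X1 + ... + Xk <= t) for i.i.d. Exp(rate): the k-th arrival occurs
    by time t iff at least k Poisson(rate * t) events occur,
    so this equals 1 - P(Poisson(rate * t) <= k - 1)."""
    lam = rate * t
    below_k = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k))
    return 1.0 - below_k

p = erlang_cdf(25, 10, 0.5)  # the same quantity as pgamma(25, shape=10, rate=0.5)
```

This evaluates to about 0.799, which pgamma(25, shape=10, rate=0.5) should return as well.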
Are you aware of the rexp() function in R? Have a look at its documentation page by typing ?rexp in the R console.
A quick answer to your Monte Carlo estimation of desired probability:
mean(rowSums(matrix(rexp(1000 * 10, rate = 0.5), 1000, 10)) <= 25)
I have generated 1000 sets of 10 exponential samples, putting them into a 1000 * 10 matrix. We take row sums and get a vector of 1000 entries. The proportion of entries that are at most 25 is an empirical estimate of the desired probability.
Thanks, this was helpful! Can I use replicate with this code, to make it look like this: F <- function(n, B=1000) mean(replicate(B,(rexp(10, rate = 0.5))))? I am unable to get the right result.
replicate here generates a matrix, too, but it is a 10 * 1000 matrix (as opposed to the 1000 * 10 one in my answer), so you now need to take colSums. Also, where did you put n?
The correct function would be
F <- function(n, B=1000) mean(colSums(replicate(B, rexp(10, rate = 0.5))) <= n)
For a non-Monte Carlo method for your example, see the other answer: the exponential distribution is a special case of the gamma distribution, and the latter has the additivity property.
I am giving you the Monte Carlo method because you name it in your question, and it is applicable beyond your example.
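For completeness, the same Monte Carlo estimate sketched outside R, in Python with only the standard library (mc_prob_sum_le is a made-up name; random.expovariate takes the rate as its argument):

```python
import random

def mc_prob_sum_le(n_vars=10, rate=0.5, bound=25, trials=100_000):
    """Monte Carlo estimate of P(X1 + ... + X_{n_vars} <= bound)
    for i.i.d. Exp(rate) variables."""
    hits = 0
    for _ in range(trials):
        total = sum(random.expovariate(rate) for _ in range(n_vars))
        if total <= bound:
            hits += 1
    return hits / trials
```

With 100,000 trials the estimate should agree with the gamma-based answer to roughly two decimal places.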

Is there a way to calculate the number of successes X whose cumulative binomial probability falls on a given number CumulativePR? (example inside)

Let's assume we have an event with:
number of tries => N = 20
probability of success => p = 0.3
number of successes => X = 0
Probability that the event succeeds X number of times => PR = 0.0007979 (check Binomial Distribution for how the calculation works)
So for X=0-20 we will have different probabilities where PR(X=0) + PR(X=1) + ... + PR(X=20) = 1
Now what I would like to do is generate a random number according to this binomial distribution, but without first calculating all the individual probabilities PR(X=0), PR(X=1), etc.
My idea to solve this is to generate a random number between 0 and 1 and check where it falls in the cumulative binomial distribution.
I.e. Suppose we have (as defined above):
N = 20
p = 0.3
X = ?
Random number = CumulativePR = 0.6
My question is: is there a way to calculate m such that
PR(X=0) + PR(X=1) + ... + PR(X=m-1) < CumulativePR and
PR(X=0) + PR(X=1) + ... + PR(X=m) >= CumulativePR?
You can use the so-called rejection method to sample from a binomial distribution. In that method, it's not necessary to calculate the cumulative probabilities. See for example Section 7.3 of "Numerical Recipes in C" and doubtless many other references. I assume that you can translate the algorithm from C to whatever you want.
Chances are good that there is a library which already contains a sampling algorithm -- what programming language are you working with? Is it important for you to implement the algorithm yourself?
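What the question describes is inverse-transform sampling, and the partial sums never need to be stored: each PMF term follows from the previous one by the recurrence PR(X=x+1) = PR(X=x) * ((N - x) / (x + 1)) * (p / (1 - p)), so the CDF can be accumulated on the fly until it reaches the random number. A sketch in Python (binomial_inverse is a made-up name):

```python
import random

def binomial_inverse(n, p, u=None):
    """Inverse-transform sample from Binomial(n, p): walk the CDF until
    it reaches u, generating PMF terms with the recurrence
    P(X = x+1) = P(X = x) * (n - x) / (x + 1) * p / (1 - p)."""
    if u is None:
        u = random.random()
    pr = (1 - p) ** n  # P(X = 0)
    cdf = pr
    x = 0
    while cdf < u and x < n:
        pr *= (n - x) / (x + 1) * p / (1 - p)
        cdf += pr
        x += 1
    return x
```

With the question's numbers (N = 20, p = 0.3) and CumulativePR = 0.6, this returns m = 6: the CDF is about 0.416 at 5 and about 0.608 at 6, so 6 is where the cumulative probability first reaches 0.6.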
