Julia function for weighted variance returning "wrong" value

I'm trying to calculate the weighted variance using Julia, but when I compare the results
with my own formula, I get a different value.
using Statistics, StatsBase
x = rand(10)
w = Weights(rand(10))
Statistics.var(x, w, corrected=false)    # Julia's default function
sum(w .* (x .- mean(x)).^2) / sum(w)     # my own formula
When I read the docs for the "var" function, it says that the formula for "corrected=false" is
the one I wrote.

var(x, w) centers the data on the weighted mean, so your formula has to subtract mean(x, w) rather than mean(x) to get the same result:
sum(w .* (x .- mean(x, w)).^2) / sum(w)
or, expanding the weighted mean:
sum(w .* (x .- sum(w .* x) / sum(w)).^2) / sum(w)
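
As a language-neutral check of the point above, here is a minimal sketch in plain Python (standard library only, not the Julia API). The helper names weighted_mean and weighted_var are made up for illustration, and the seed is arbitrary. It shows that centering on the plain mean inflates the result by exactly the squared distance between the two means.

```python
import random

def weighted_mean(x, w):
    """Weighted mean: sum(w_i * x_i) / sum(w_i)."""
    return sum(wi * xi for xi, wi in zip(x, w)) / sum(w)

def weighted_var(x, w, center):
    """Uncorrected weighted variance of x around a given center."""
    return sum(wi * (xi - center) ** 2 for xi, wi in zip(x, w)) / sum(w)

random.seed(0)  # arbitrary seed, only so the run is reproducible
x = [random.random() for _ in range(10)]
w = [random.random() for _ in range(10)]

plain_mean = sum(x) / len(x)
mu_w = weighted_mean(x, w)

v_weighted_center = weighted_var(x, w, mu_w)      # what the answer computes
v_plain_center = weighted_var(x, w, plain_mean)   # what the question computed

# Centering anywhere other than the weighted mean adds (mu_w - center)^2.
gap = (mu_w - plain_mean) ** 2
print(v_weighted_center, v_plain_center, v_weighted_center + gap)
```

The last two printed numbers should agree up to rounding, which is why the two formulas in the question only match when the weights happen to give the same mean.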

Related

Bootstrapping weighted functions in Julia

I am trying to use Bootstrap.jl functions to obtain the Standard Error (SE) of a weighted function (e.g. a weighted median).
See below the Bootstrap.bootstrap code to obtain the SE of an unweighted median.
using StatsBase, DataFrames, Bootstrap
v = collect(1:1:20)
bootstrap(median, v, BasicSampling(100))
I would now need to pass a second argument to median above to obtain the SE of the weighted median. Outside of the bootstrap function, this looks like:
w = collect(0.1:0.1:2)
median(v, Weights(w))
How can I pass a second argument to the median function inside bootstrap to include the weights? Notice that the bootstrap resampling should be applied to both vectors, drawing the same indices for both of them.

You can pass a DataFrame containing both vectors as the second argument of bootstrap. Then write an anonymous function that uses each of the columns within median. E.g.
df = DataFrame(v = collect(1:1:20),
               w = collect(0.1:0.1:2))
bootstrap(d -> median(d[!, :v], Weights(d[!, :w])), df, BasicSampling(100))
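
To make the "same indices for both vectors" idea concrete, here is a rough sketch in plain Python rather than Bootstrap.jl. The functions weighted_median and paired_bootstrap_se are hypothetical helpers written for this illustration, and the weighted median uses one simple convention (smallest value whose cumulative weight reaches half the total), which may differ from what StatsBase returns in tie or interpolation cases.

```python
import random
import statistics

def weighted_median(values, weights):
    """Smallest value whose cumulative weight reaches half the total weight.
    This is one common convention, assumed here for illustration."""
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= half:
            return value
    return pairs[-1][0]  # guard against floating-point shortfall

def paired_bootstrap_se(values, weights, stat, n_boot, rng):
    """Resample (value, weight) pairs together and return the standard
    deviation of the statistic across bootstrap replicates."""
    n = len(values)
    replicates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # one index draw for both vectors
        replicates.append(stat([values[i] for i in idx],
                               [weights[i] for i in idx]))
    return statistics.stdev(replicates)

v = list(range(1, 21))
w = [0.1 * k for k in range(1, 21)]
rng = random.Random(1)  # arbitrary seed for reproducibility
se = paired_bootstrap_se(v, w, weighted_median, 100, rng)
print(weighted_median(v, w), se)
```

Passing a DataFrame to bootstrap achieves the same pairing, because each resampled row carries its value and its weight together.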

Computing the ECDF of data for parameter estimation using weighted nonlinear least squares in R

I am writing code to estimate the parameters of a GPD using the weighted nonlinear least squares (WNLS) method.
The WNLS method consists of 2 steps.
Step 1: $(\hat{\xi}_1, \hat{b}_1) = \arg\min_{(\xi,b)} \sum_{i=1}^{n} [\log(1-F_n(x_i)) - \log(1-G_{\xi,b}(x_i))]^2$,
where $F_n$ is the ECDF and $G_{\xi,b}$ is the CDF of the generalized Pareto distribution.
Can anyone tell me how to calculate the ECDF $F_n$ for data "X" in R?
Will ecdf(X)(X) calculate the ECDF? If so, what is ecdf(X) needed for other than plotting? It would also be really helpful if someone could share example code that calculates the ECDF for some data.

The ecdf call creates a function. Calling ecdf(X)(X) evaluates that function at the sample points themselves, so it does give the ECDF values for your data. However, you might want to apply ecdf(X) to something other than X itself. If you want to know the empirical cumulative probability (the quantile level) to which three numbers a, b, and c_ correspond, an easy way to do that is to call ecdf(X)(c(a, b, c_)).
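
The "ecdf returns a function" idea can be sketched outside R as well. Below is a small Python illustration, not R's implementation: the name ecdf is reused only for readability, the sample X is made up, and the definition assumed is F_n(x) = (number of observations <= x) / n.

```python
from bisect import bisect_right

def ecdf(sample):
    """Return a function F with F(x) = (number of observations <= x) / n."""
    ordered = sorted(sample)
    n = len(ordered)

    def F(x):
        # bisect_right counts how many sorted observations are <= x
        return bisect_right(ordered, x) / n

    return F

X = [3.1, 0.5, 2.2, 4.8, 1.7]   # made-up sample
F_n = ecdf(X)

at_sample = [F_n(x) for x in X]             # analogue of ecdf(X)(X)
at_new = [F_n(q) for q in (0.0, 2.0, 10.0)] # analogue of ecdf(X)(c(a, b, c_))
print(at_sample, at_new)
```

One thing to watch in the WNLS objective: under this definition F_n equals 1 at the largest observation, so log(1 - F_n(x)) is not finite there, and some adjustment of the ECDF is presumably needed before taking logs.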

Calculating the MSE by definition vs. Var - Bias

So I am trying to calculate the MSE in two ways.
Say T is an estimator for the value t.
First I am trying to calculate it in R by using the theorem:
MSE(T) = Var(T) + (Bias(T))^2
Secondly, I am trying to calculate it in R by definition, i.e. MSE(T) = E((T-t)^2).
And say that T is an unbiased estimator, i.e. Bias(T) = 0
So MSE(T) = Var(T), which we can compute in R with var(T).
But when I try calculating the MSE by definition, I get a different number from var(T).
I think the formula I wrote in R is wrong. This is what I wrote for the MSE definition:
It was suggested that "weighted.mean" is equivalent to the "expected value" function.
So I wrote: weighted.mean((T - 2)^2), where my t = 2.
I hope I provided enough information to get help, thanks in advance.
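
A possible explanation, sketched in Python under the assumption that T is a vector of simulated estimates (the numbers below are made up): the sample variance divides by n - 1 and centers on the sample mean, while the definition averages over n and centers on t. The two are linked by the identity mean((T - t)^2) = (n - 1)/n * var(T) + (mean(T) - t)^2, so they only coincide approximately, for large n and a sample mean close to t.

```python
import statistics

t = 2.0                               # true value, as in the question
T = [1.6, 2.3, 1.9, 2.4, 2.1, 1.8]    # hypothetical simulated estimates

n = len(T)

# MSE by definition: average squared distance from the true value t.
mse_by_definition = sum((ti - t) ** 2 for ti in T) / n

# Sample variance with the n - 1 denominator, like R's var().
sample_var = statistics.variance(T)

# Estimated bias: sample mean of the estimates minus the true value.
bias_hat = statistics.mean(T) - t

# Variance rescaled to an n denominator, plus squared estimated bias.
mse_by_decomposition = (n - 1) / n * sample_var + bias_hat ** 2

print(mse_by_definition, sample_var, mse_by_decomposition)
```

Even for an unbiased estimator, the estimated bias in a finite simulation is rarely exactly zero, so a small gap between the definition and var(T) is expected.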

Coding weighted mean (R)

I am having trouble with a piece of my code. I want to compute a weighted mean, but the value I get is not the value I obtain if I calculate the weighted mean myself.
Here's how I'm coding the weighted mean:
weighted.mean(x = dataset$A[rows], weights = weights)
The variable is "dataset$A" and the rows I'm using for the weighted mean are listed in "rows" (there are 2 rows). The weights are listed in "weights."
Here's how I'm calculating it myself:
dataset$A_MEAN[rows[1]]*weights[1] + dataset$A_MEAN[rows[2]]*weights[2]
Why is there a difference between these two lines of code?
I tried with the following values:
dataset$A = [45792.76, 64984.67]
weights = [0.3253927, 0.6746073]
The first line of code returns: 55388.71
The second line of code returns: 58739.76
Thank you so much! I am sure that this is something minor, but it's driving me nuts!
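
Check your use of weighted.mean.
The weights argument is named w, not weights. A weights = argument matches no formal parameter, so it is swallowed by the function's dots (...) argument and silently ignored, and you get the unweighted mean (55388.71 here).
weighted.mean(x = dataset$A[rows], w = weights) should give you what you want.
When calling a function, you can make sure that you're using the correct argument names by reading the function's documentation with ?weighted.mean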
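
The two numbers from the question can be reproduced by hand. This is a plain Python check of the arithmetic, not R code; the variable names are chosen for this illustration.

```python
values = [45792.76, 64984.67]
weights = [0.3253927, 0.6746073]

# What the original call effectively computed: the weights were ignored.
unweighted = sum(values) / len(values)

# What weighted.mean(x, w = weights) computes: sum(w * x) / sum(w).
weighted = sum(v * w for v, w in zip(values, weights)) / sum(weights)

print(round(unweighted, 2), round(weighted, 2))
```

The first value matches the 55388.71 reported for the weighted.mean call with the misnamed argument, and the second matches the hand calculation of 58739.76.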

How to implement variance function in R

I am trying to calculate the variance of a column from a data frame. I know that there is a built-in function var() for calculating the variance, but I am not sure how to write my own variance function that takes a data frame column as its argument.
var(banknote$Length)*((n-1)/n)

If the vector you're going to take the variance of is 1-dimensional, as in your case, you can simply do:
myvar <- function(v) {
  m <- mean(v)       # sample mean
  mean((m - v)^2)    # average squared deviation (divides by n, not n - 1)
}
This assumes (based on your example) that you don't want to use the n/(n-1) correction.
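
For comparison, here is the same function ported to Python and checked against the standard library. The data are made-up stand-ins for banknote$Length; statistics.pvariance divides by n and statistics.variance divides by n - 1.

```python
import statistics

def myvar(v):
    """Variance with an n denominator (no n/(n-1) correction)."""
    m = sum(v) / len(v)
    return sum((m - x) ** 2 for x in v) / len(v)

lengths = [214.8, 214.6, 215.0, 214.9, 215.3]  # hypothetical column values
n = len(lengths)

by_hand = myvar(lengths)
population = statistics.pvariance(lengths)            # divides by n
rescaled = statistics.variance(lengths) * (n - 1) / n  # like var(x) * ((n-1)/n)

print(by_hand, population, rescaled)
```

All three numbers should agree up to rounding, which is the relationship the line var(banknote$Length)*((n-1)/n) in the question relies on, with n being the number of rows.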
