How to implement a variance function in R

I am trying to calculate the variance of a column from a data frame. I know there is a built-in function, var(), for calculating the variance, but I am not sure how to write my own variance function that takes a data frame column as its argument.
var(banknote$Length)*((n-1)/n)

If the vector you're going to take the variance of is 1-dimensional, as in your case, you can simply do:
myvar <- function(v) {
  m <- mean(v)
  mean((m - v)^2)
}
This assumes (based on your example) that you don't want to use the n/(n-1) correction.
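As a quick check, the function can be compared against the built-in var() rescaled by (n-1)/n, as in the question. This is a minimal sketch; the banknote data frame below is a made-up stand-in with invented values, not the asker's data.

```r
# Population variance (divides by n, no n/(n-1) correction)
myvar <- function(v) {
  m <- mean(v)
  mean((v - m)^2)
}

# Hypothetical stand-in for the asker's data frame (values are invented)
banknote <- data.frame(Length = c(2, 4, 4, 4, 5, 5, 7, 9))
n <- nrow(banknote)

pop_var    <- myvar(banknote$Length)              # divides by n
scaled_var <- var(banknote$Length) * (n - 1) / n  # var() divides by n - 1

isTRUE(all.equal(pop_var, scaled_var))
```

Passing the column as banknote$Length (or banknote[["Length"]]) hands the function an ordinary numeric vector, so no special data frame handling is needed.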

Related

Julia function for weighted variance returning "wrong" value

I'm trying to calculate the weighted variance using Julia, but when I compare the results
with my own formula, I get a different value.
using Statistics, StatsBase
x = rand(10)
w = Weights(rand(10))
Statistics.var(x, w, corrected=false) # Julia's default function
sum(w .* (x .- mean(x)).^2) / sum(w) # my own formula
When I read the docs for the "var" function, it says that the formula for "corrected=false" is
the one I wrote.
You have to subtract a weighted mean in your formula to get the same result:
sum(w.*(x.-mean(x,w)).^2)/sum(w)
or (to expand it)
sum(w.*(x.- sum(w.*x)/sum(w)).^2)/sum(w)
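The same point can be sketched in R (used here instead of Julia, purely for illustration): centring on the weighted mean and centring on the plain mean give different values. The numbers are made up for the example.

```r
x <- c(1, 2, 3, 4)
w <- c(4, 3, 2, 1)

# Weighted mean: sum(w * x) / sum(w)
wm <- weighted.mean(x, w)

# Uncorrected weighted variance, centred on the weighted mean
wvar <- sum(w * (x - wm)^2) / sum(w)

# The formula from the question, centred on the unweighted mean
wvar_plain_mean <- sum(w * (x - mean(x))^2) / sum(w)

c(weighted_centre = wvar, plain_centre = wvar_plain_mean)
```

The two results coincide only when the weighted and unweighted means happen to be equal, for instance when all weights are the same.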

Does the boot package in R use the first return(result) as the observed data to calculate confidence intervals?

I am using the boot function in R to do a bootstrap, but instead of passing my dataset directly as the data argument, I pass an index vector that the statistic uses to merge two data tables and produce my result. It seems that boot uses the result of the first bootstrap as the real sampled data (that is, the empirical value). Is this correct? When I do the bootstrap manually I get similar results, although I would expect boot to use 'data' as the original data. I am confused: the CIs make sense, but I would expect this not to work unless the reason is the one I mentioned.
In short, I have an index vector
x=1:100
and my function
library(boot)
library(data.table)
myboot <- function(data, indeces) {
  toselect <- data[indeces] # allows boot to select sample
  toselect <- as.data.table(toselect)
  # this is where I use the index for the merge (mydataset is defined elsewhere)
  t <- merge(toselect, mydataset, allow.cartesian = TRUE)
  return(nrow(t))
}
b <- boot(data = x, statistic = myboot, R = 1000)
The results I get
ORDINARY NONPARAMETRIC BOOTSTRAP
Call:
boot(data = x, statistic = myboot, R = 1000)
Bootstrap Statistics :
    original        bias    std. error
t1* 397.2477  -0.03669725     11.70803
> boot.ci(b, type="bca")
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 1000 bootstrap replicates
CALL :
boot.ci(boot.out = b, type = "bca")
Intervals :
Level BCa
95% (375.2, 421.1 )
Yes, you are correct.
The function used to compute the statistic has the following requirement (according to the help page):
... In all other cases statistic must take at least two arguments. The first argument passed will always be the original data. The second will be a vector of indices, frequencies or weights which define the bootstrap sample. Further, if predictions are required, then a third argument is required which would be a vector of the random indices used to generate the bootstrap predictions.
Since your dataset consists of the numbers 1:100, the second argument will be a sample drawn from 1:100, and indexing the data with it returns those same numbers. In other words, the result of your data[indeces] line will be identical to indeces.
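The identity is easy to confirm in base R without running boot at all. This is a minimal sketch in which the resampled indices are drawn by hand with sample(), as an assumption standing in for what boot passes to the statistic.

```r
set.seed(123)

x <- 1:100

# Hand-drawn stand-in for the index vector boot would pass to the statistic
indeces <- sample(seq_along(x), replace = TRUE)

# Because x[i] == i for every i, subsetting returns the indices themselves
same_as_indices <- identical(x[indeces], indeces)

# With the untouched indices 1:n, the subset is just the original data
same_as_data <- identical(x[seq_along(x)], x)

c(same_as_indices, same_as_data)
```

As far as I understand the package, the "original" value in the printed output comes from calling the statistic with the untouched indices 1:n, which is why the setup behaves sensibly even though data holds row numbers rather than observations.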

How to extract saved envelope values in Spatstat?

I am new to both R & spatstat and am working with the inhomogeneous pair correlation function. My dataset consists of point values spread across several time intervals.
sp77.ppp = ppp(sp77.dat$Plot_X, sp77.dat$Plot_Y, window = window77, marks = sp77.dat$STATUS)
Dvall77 = envelope((Y = dv77.ppp[dv77.ppp$marks == '2']), fun = pcfinhom,
                   r = seq(0, 20, 0.25), nsim = 999, divisor = 'd',
                   simulate = expression((rlabel(dv77.ppp)[rlabel(dv77.ppp)$marks == '1']),
                                         (rlabel(dv77.ppp)[rlabel(dv77.ppp)$marks == '2'])),
                   savepatterns = T, savefuns = T)
I am trying to make multiple pairwise comparisons (between different time periods) and need to write a function that goes through every calculated envelope value, at each ‘r’ value, and finds the min and max differences between the envelopes.
My question is: how do I find the saved envelope values? I know that savefuns = T saves all the simulated envelope values, but I can’t find how to extract them. The summary (below) says that the values are stored. How do I call the values and extract them?
> print(Dvall77)
Pointwise critical envelopes for g[inhom](r)
and observed value for ‘(Y = dv77.ppp[dv77.ppp$marks == "2"])’
Edge correction: “iso”
Obtained from 999 evaluations of user-supplied expression
(All simulated function values are stored)
(All simulated point patterns are stored)
Alternative: two.sided
Significance level of pointwise Monte Carlo test: 2/1000 = 0.002
.......................................................................................
      Math.label                Description
r     r                         distance argument r
obs   {hat(g)[inhom]^{obs}}(r)  observed value of g[inhom](r) for data pattern
mmean {bar(g)[inhom]}(r)        sample mean of g[inhom](r) from simulations
lo    {hat(g)[inhom]^{lo}}(r)   lower pointwise envelope of g[inhom](r) from simulations
hi    {hat(g)[inhom]^{hi}}(r)   upper pointwise envelope of g[inhom](r) from simulations
.......................................................................................
Default plot formula: .~r
where “.” stands for ‘obs’, ‘mmean’, ‘hi’, ‘lo’
Columns ‘lo’ and ‘hi’ will be plotted as shading (by default)
Recommended range of argument r: [0, 20]
Available range of argument r: [0, 20]
Thanks in advance for any suggestions!
If you are looking to access the values of the summary statistic (ginhom) for each of the randomly labelled patterns, this is in principle documented in help(envelope.ppp). Admittedly the help page is long, and if you are new to both R and spatstat it is easy to get lost. The clue is in the Value section of the help file. The result is a data.frame with some additional classes (envelope and fv), and as the help file says:
Additionally, if ‘savepatterns=TRUE’, the return value has an
attribute ‘"simpatterns"’ which is a list containing the ‘nsim’
simulated patterns. If ‘savefuns=TRUE’, the return value has an
attribute ‘"simfuns"’ which is an object of class ‘"fv"’
containing the summary functions computed for each of the ‘nsim’
simulated patterns.
Then of course you need to know how to access an attribute in R, which is done using attr:
funs <- attr(Dvall77, "simfuns")
Then funs is a data.frame (and fv-object) with all the function values for each randomly labelled pattern.
I can't really tell from your question whether you just need the values of the upper and lower curves defining the envelope. In that case you can access them as in an ordinary data.frame (and there is no need to save all the individual function values in the envelope):
lo <- Dvall77$lo
hi <- Dvall77$hi
d <- hi - lo
More elegantly you can do:
d <- with(Dvall77, hi - lo)
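To go one step further and get the minimum and maximum over the simulations at each r, something like the following may work. It is only a sketch in base R on a mock object: env is a hand-built stand-in for an envelope result, and the column names sim1, sim2, ... for the stored functions are an assumption, so check names(funs) on a real object first.

```r
set.seed(42)

# Mock stand-in for an envelope object; NOT real spatstat output
r    <- seq(0, 20, by = 0.25)
nsim <- 5
sims <- as.data.frame(matrix(runif(length(r) * nsim), nrow = length(r)))
names(sims) <- paste0("sim", seq_len(nsim))   # assumed column naming

env <- data.frame(r = r,
                  lo = apply(sims, 1, min),
                  hi = apply(sims, 1, max))
attr(env, "simfuns") <- data.frame(r = r, sims)

# Extract the stored simulated functions, as in the answer above
funs <- attr(env, "simfuns")

# Every column except r holds one simulated function
sim_cols <- setdiff(names(funs), "r")
sim_mat  <- as.matrix(funs[, sim_cols])

# Row-wise (per r value) extremes across the simulations
per_r <- data.frame(r   = funs$r,
                    min = apply(sim_mat, 1, min),
                    max = apply(sim_mat, 1, max))
per_r$range <- per_r$max - per_r$min

head(per_r)
```

With two such tables from different time periods, the differences at each r can then be taken column by column, provided both were computed on the same r grid.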

Accessing defined parameters in a formula in R

I am writing a function that tries many combinations of starting parameters when fitting a nonlinear regression, because nlsList() only allows one set of starting parameters.
I have managed this fine, but I want to add a predictions data frame to the function for easy plotting, returning the best-fit curve at smaller increments of x than the data supplies: for example, 100 points instead of 10, to get a nice smooth predicted curve.
In my function I take the formula as an argument and treat it as a formula inside the function. Some of the formulas include a function that I wrote to encompass the nonlinear relationship. I then use do.call(function, new.starting params) to pass the fitted parameters on to a predictions data frame.
I have not found a way of isolating and passing any defined and fixed variables from the function to the do.call() function.
Is there a way to get the values that are defined in the formula? So Tc = 25 in this example...
model = y ~ schoolfield.high(ln.c, Ea, Eh, Th, temp = x, Tc = 25)
formula <- as.formula(model)
vars <- all.vars(formula[[3]])
This returns :
"ln.c" "Ea" "Eh" "Th" "x"
I am wondering if there is a way to isolate defined variables from a formula object, or if there is any other way I could do this?
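One possible approach, sketched under the assumption that the right-hand side of the formula is a single function call: treat that call as a list and keep the arguments that are literal constants rather than variable names. The formula is never evaluated, so schoolfield.high does not need to exist for this to run.

```r
model <- y ~ schoolfield.high(ln.c, Ea, Eh, Th, temp = x, Tc = 25)

# The right-hand side is an unevaluated call object
rhs <- model[[3]]

# Drop the first element (the function name) to keep only the arguments
args <- as.list(rhs)[-1]

# Literal constants are stored as plain values, variable names as symbols
is_constant <- function(a) is.numeric(a) || is.character(a) || is.logical(a)
fixed <- Filter(is_constant, args)

fixed
```

The resulting named list can be appended to the parameter list handed to do.call(). One caveat: a value such as Tc = -5 is parsed as a call to the unary minus rather than as a number, so it would need to be evaluated before this filter would pick it up.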

In R, how can you calculate P-values using cor() when you have multiple variables?

I've got a large data set (120 data points and 10+ variables) that I want to explore using a correlation matrix:
H5<- log[ which(log$Harvest=="e"), ]
H5.cor <- cor(H5[sapply(H5, is.numeric)])
I presented the data using the package corrplot:
corrplot(H5.cor, method = "number")
In addition, I would like to know the p-value of each of the correlations. I know that I can use cor.test(), but as I understand it, it needs an X and a Y value.
Thanks
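Since cor.test() works on one pair at a time, one way to get a matrix of p-values is to loop over every pair of numeric columns. The following is a rough base R sketch; cor_pvalues is a hypothetical helper name, and the built-in mtcars data stands in for the asker's H5 data frame.

```r
# Hypothetical helper: matrix of cor.test() p-values for all column pairs
cor_pvalues <- function(df) {
  num  <- df[sapply(df, is.numeric)]   # keep numeric columns only
  p    <- ncol(num)
  pmat <- matrix(NA_real_, p, p,
                 dimnames = list(names(num), names(num)))
  if (p < 2) return(pmat)

  for (i in seq_len(p - 1)) {
    for (j in (i + 1):p) {
      pv <- cor.test(num[[i]], num[[j]])$p.value
      pmat[i, j] <- pv   # fill both triangles so the matrix is symmetric
      pmat[j, i] <- pv
    }
  }
  pmat   # the diagonal is left as NA
}

# Example on a built-in dataset (stand-in for the asker's data)
pvals <- cor_pvalues(mtcars[, c("mpg", "hp", "wt")])
pvals
```

The matrix has the same row and column order as the output of cor() on the same columns, so the two can be read side by side. With 10+ variables there are many pairs, so the p-values are not adjusted for multiple testing unless that is done separately, for example with p.adjust().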
