How can I use pre-bootstrapped data to obtain a BCa CI? - r

I have bootstrapped two variables (one of which is already in the "Impala.csv" file) using a loop that resamples and records the mean of a sample of size nrow(data2), repeated 5000 times. The code is as follows:
data<-read.csv("Impala.csv")
allo<-data$distance
data2<-read.csv("2010 - IM.csv")
pro<-data2$pro
n1<-nrow(data2)
boot4000 <- numeric(5000)
for(i in 1:5000){
s <- sample(data2$xs, n1, replace=TRUE, prob=data2$pro)
boot4000[i] <- mean(s)
}
I then combine the two outputs in a formula, giving me 5000 new values:
d<-(pi/2)*(boot4000*(1/allo))
Now I wish to find the BCa confidence interval for d. As I understand it, the boot function would require me to draw a new set of resamples, which I do not want because the bootstrapping is already complete. All I want is a function that takes my bootstrapped data as is and determines the BCa confidence interval.
Here are the data files I have used:
http://www.filedropper.com/impala
http://www.filedropper.com/2010-im
I have also tried to create an object imitating a 'boot' object using the following:
den<-as.matrix(d, ncol=1)
outs<-list(t0=mean(d), t=den, R=5000, L=3)
boot.ci(outs, type="bca")
This produces the error:
Error in if (as.character(boot.out$call[1L]) == "tsboot") warning("BCa intervals not defined for time series bootstraps") else output <- c(output, : argument is of length zero

outs <- list(t0=mean(d), t=den, R=5000, sim="ordinary",
stype="i", weights=rep(0.0002,5000), statistic=meanfun,
data=d, call=boot(data=d, statistic = meanfun,R=5000),
strata = rep(1,5000), attr="boot", seed=.Random.seed)
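This is how one can build a list that mimics an object of class boot. (meanfun is not defined in the post; it is presumably a statistic of the form function(x, i) mean(x[i]).)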
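If only the interval itself is needed, the BCa adjustment can also be computed by hand from the stored replicates. The following is a minimal sketch under stated assumptions, not the boot package's implementation: bca_from_reps is a hypothetical helper, the toy data stand in for the real files, and the acceleration is estimated from jackknife (leave-one-out) values that must be supplied separately. The acceleration cannot be recovered from the replicates alone, because it needs the statistic recomputed on the original sample.

```r
# Hypothetical helper: BCa interval from existing bootstrap replicates.
# t    : vector of bootstrap replicates of the statistic
# t0   : statistic computed on the original sample
# jack : leave-one-out (jackknife) values of the statistic
bca_from_reps <- function(t, t0, jack, conf = 0.95) {
  # Bias correction: how far the replicates sit below the observed value
  z0 <- qnorm(mean(t < t0))
  # Acceleration from the jackknife values
  dev <- mean(jack) - jack
  a <- sum(dev^3) / (6 * sum(dev^2)^1.5)
  # Adjusted percentile levels
  z <- qnorm(c((1 - conf) / 2, 1 - (1 - conf) / 2))
  adj <- pnorm(z0 + (z0 + z) / (1 - a * (z0 + z)))
  quantile(t, adj, names = FALSE)
}

# Toy illustration (made-up data, not the Impala files)
set.seed(123)
x <- rexp(60)
reps <- replicate(5000, mean(sample(x, replace = TRUE)))
jack <- vapply(seq_along(x), function(i) mean(x[-i]), numeric(1))
ci <- bca_from_reps(reps, mean(x), jack)
print(ci)
```

boot.ci interpolates between order statistics slightly differently from quantile(), so the end points may not match it exactly.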

Related

Is it possible to adapt standard prediction interval code for dlm in R with other distribution?

Using the dlm package in R I fit a dynamic linear model to a time series data set, consisting of 20 observations. I then use the dlmForecast function to predict future values (which I can validate against the genuine data for said period).
I use the following code to create a prediction interval:
ciTheory <- (outer(sapply(fut1$Q, FUN=function(x) sqrt(diag(x))), qnorm(c(0.05,0.95))) +
as.vector(t(fut1$f)))
However, my data do not follow a normal distribution, and I wondered whether it would be possible to adapt the qnorm call for other distributions. I have tried qt, but I am unable to apply qgamma.
Does anyone know how to go about this?
Below is a reproduced version of my code:
library(dlm)
data <- c(20.68502, 17.28549, 12.18363, 13.53479, 15.38779, 16.14770, 20.17536, 43.39321, 42.91027, 49.41402, 59.22262, 55.42043)
mod.build <- function(par) {
dlmModPoly(1, dV = exp(par[1]), dW = exp(par[2]))
}
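# Returns the maximum likelihood estimates of the variance parameters
# (a2 in the original post is undefined; data is assumed here)
mle <- dlmMLE(data, rep(0,2), mod.build)
if(mle$convergence==0) print("converged") else print("did not converge")
# v and w were undefined in the original post; assumed to be the fitted variances
v <- exp(mle$par[1]); w <- exp(mle$par[2])
mod1 <- dlmModPoly(dV = v, dW = c(0, w))
mod1Filt <- dlmFilter(data, mod1) # a1 in the original post; data is assumed here
fut1 <- dlmForecast(mod1Filt, n = 7)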
Cheers
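The structure of the interval code can be seen without dlm: the outer() call multiplies each forecast standard deviation by a pair of standardised quantiles, and the forecast mean is then added. Any location-scale family can be swapped in at the quantile step. Below is a rough sketch with made-up numbers; f_hat and q_hat are hypothetical stand-ins for fut1$f and the variances in fut1$Q, and df = 5 is an arbitrary choice. A gamma distribution is not a location-scale family, so qgamma cannot simply replace qnorm here; it would need shape and rate parameters matched to each forecast mean and variance.

```r
# Hypothetical forecast means and variances (stand-ins for fut1$f and fut1$Q)
f_hat <- c(50, 51, 52)
q_hat <- c(4, 6.25, 9)
probs <- c(0.05, 0.95)

# Normal-based limits, same shape as the ciTheory code in the question:
# one row per forecast step, one column per quantile
ci_norm <- outer(sqrt(q_hat), qnorm(probs)) + f_hat

# Same construction with Student-t quantiles (df chosen for illustration only)
ci_t <- outer(sqrt(q_hat), qt(probs, df = 5)) + f_hat

print(ci_norm)
print(ci_t)
```

The t-based limits are wider than the normal ones for the same standard deviation, which is the expected effect of heavier tails.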

How to predict in kknn function? library(kknn)

I am trying to use kknn in a loop to build a leave-one-out cross-validation for a model, and to compare that with train.kknn.
I have split the data into two parts: training (80% data), and test (20% data). In the training data, I exclude one point in the loop to manually create LOOCV.
I think something goes wrong in predict(knn.fit, data.test). I have looked through the kknn package documentation and online for how to predict with kknn, but all the examples use "summary(model)" and "table(validation...)" rather than prediction on separate test data. The call predict(model, dataset) works with train.kknn, so I thought I could use similar arguments with kknn.
I am not sure if there is such a prediction function in kknn. If yes, what arguments should I give?
I look forward to your suggestions. Thank you.
library(kknn)
for (i in 1:nrow(data.train)) {
train.data <- data.train[-i,]
validation.data <- data.train[i,]
knn.fit <- kknn(as.factor(R1)~., train.data, validation.data, k = 40,
kernel = "rectangular", scale = TRUE)
# train.data + validation.data is the 80% data I split.
}
pred.knn <- predict(knn.fit, data.test) # data.test is 20% data.
Here is the error message:
Error in switch(type, raw = object$fit, prob = object$prob, stop("invalid type for prediction")) : EXPR must be a length 1 vector
What I am actually trying to do is compare train.kknn and kknn + loop on their leave-one-out CV results. I have two more questions:
1) In kknn: is it possible to use another set of data as test data to see the knn.fit prediction?
2) In train.kknn: I split the data, use 80% of it for training, and intend to use the remaining 20% for prediction. Is this a correct, common practice? Or should I just use the original data (the whole data set) for train.kknn, and create a loop with data[-i,] for training and data[i,] for validation in kknn, so that the two are counterparts?
I find that if I use the training data in the train.kknn function and predict on the test data set, the best k and kernel are selected and directly used to generate the predicted values for the test data set.
In contrast, if I use the kknn function and build a loop over different k values, the model generates the corresponding predictions for the test data set each time the k value is changed. Finally, in kknn + loop, the best k is selected based on the best actual prediction accuracy on the test data. In short, the best k selected by train.kknn may not work best on the test data.
Thank you.
For objects returned by kknn, predict gives the predicted value or the predicted probabilities of R1 for the single row contained in validation.data:
predict(knn.fit)
predict(knn.fit, type="prob")
The predict command also works on objects returned by train.kknn.
For example:
train.kknn.fit <- train.kknn(as.factor(R1)~., data.train, ks = 10,
kernel = "rectangular", scale = TRUE)
class(train.kknn.fit)
# [1] "train.kknn" "kknn"
pred.train.kknn <- predict(train.kknn.fit, data.test)
table(pred.train.kknn, as.factor(data.test$R1))
The train.kknn command implements a leave-one-out method very close to the loop developed by @vcai01. See the following example:
set.seed(43210)
n <- 500
data.train <- data.frame(R1=rbinom(n,1,0.5), matrix(rnorm(n*10), ncol=10))
library(kknn)
pred.kknn <- array(0, nrow(data.train))
for (i in 1:nrow(data.train)) {
train.data <- data.train[-i,]
validation.data <- data.train[i,]
knn.fit <- kknn(as.factor(R1)~., train.data, validation.data, k = 40,
kernel = "rectangular", scale = TRUE)
pred.kknn[i] <- predict(knn.fit)
}
knn.fit <- train.kknn(as.factor(R1)~., data.train, ks = 40,
kernel = "rectangular", scale = TRUE)
pred.train.kknn <- predict(knn.fit, data.train)
table(pred.train.kknn, pred.kknn)
#                pred.kknn
# pred.train.kknn   1   2
#               0 374  14
#               1   9 103
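On question 1: kknn takes the test set as its third argument, so predictions for separate test data come from fitting with that data as test and then calling predict on the result without a newdata argument. The sketch below uses the built-in iris data in place of the asker's data, so iris.train, iris.test and the choice k = 7 are illustrative assumptions. It is guarded so that it is skipped when kknn is not installed.

```r
if (requireNamespace("kknn", quietly = TRUE)) {
  library(kknn)
  set.seed(1)
  idx <- sample(nrow(iris), 120)
  iris.train <- iris[idx, ]   # 80% of the rows
  iris.test <- iris[-idx, ]   # remaining 20%

  # The test set goes into the kknn call itself
  fit <- kknn(Species ~ ., iris.train, iris.test, k = 7,
              kernel = "rectangular", scale = TRUE)

  pred.test <- predict(fit)                 # predicted classes for iris.test
  prob.test <- predict(fit, type = "prob")  # class probabilities for iris.test
  print(table(pred.test, iris.test$Species))
}
```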

How to calculate probability for a CNN model in R?

I have built and trained a CNN model for image classification using the MXNet package, and I predicted the test results from the model using the snippet below.
pred_test <- predict(model,test_array)
pred_test_label <- max.col(t(pred_test))-1
print(pred_test_label)
Along with this, I would like to know the probability that each test result matches the model's prediction. Is there any way to check this?
You can do something like this:
# Prediction of test set
preds <- predict(model, test.array)
pred.label = max.col(t(preds))-1
accuracy <- function(label, pred) {
ypred = max.col(t(as.array(pred)))
return(sum((as.array(label) + 1) == ypred) / length(label))
}
print(paste0("Finish prediction...accuracy=", accuracy(test.y, preds)))
Sum the elements of each column of pred_test to get, say, out_sum, and then divide every element of that column by out_sum. Each column then sums to one, and its entries can be taken as the probabilities of the CNN's output nodes.
Alternatively, you can get probabilities directly by configuring the model as below at initialization (note the use of out_activation="softmax"):
model <- mx.mlp(train.x, train.y, hidden_node=10, out_node=5, out_activation="softmax")
With this configuration the outputs are constrained to sum to 1, so each output node can be read as the probability of its corresponding class.
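The column normalisation described above can be sketched in base R. The matrix pred_raw is a made-up stand-in for the output of predict(model, test_array), assumed to have one row per class and one column per test image, which is the layout implied by max.col(t(pred_test)).

```r
# Made-up raw outputs: 3 classes (rows) x 2 test images (columns)
pred_raw <- matrix(c(2,   1,   1,
                     0.5, 0.5, 3), nrow = 3)

# Divide each column by its own sum so that it sums to 1
probs <- sweep(pred_raw, 2, colSums(pred_raw), "/")

# Predicted label (0-based, as in the question) and its probability
pred_label <- max.col(t(probs)) - 1
pred_prob <- apply(probs, 2, max)

print(probs)
print(data.frame(label = pred_label, probability = pred_prob))
```

If the model already ends in a softmax layer, the columns should sum to 1 already and the sweep() step leaves them essentially unchanged.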

bootstrap proportion confidence interval

I would like to produce confidence intervals for proportions using the boot package if possible.
I have a vector and I would like to set a threshold and then calculate the proportions below the specified level.
After that I would like to use the bootstrap function in the boot package to calculate the confidence intervals for the proportions.
Simple example of what I have so far:
library(boot)
vec <- abs(rnorm(1000)*10) #generate example vector
data_to_tb <- vec
tb <- function(data) {
sum(data < 10, na.rm = FALSE)/length(data) #function for generating the proportion
}
tb(data_to_tb)
boot.out <- boot(data = data_to_tb, statistic = tb, R = 999)
quantile(boot.out$t, c(.025,.975))
However, I get this error message:
> boot(data = data_to_tb, statistic = tb, R = 999)
Error in statistic(data, original, ...) : unused argument (original)
I cannot get it to work, though. Any help is appreciated.
Your problem is your function tb: it needs two arguments. From the help file ?boot:
statistic: A function which when applied to data returns a vector containing the statistic(s) of interest. When sim = "parametric", the first argument to statistic must be the data. For each replicate a simulated dataset returned by ran.gen will be passed. In all other cases statistic must take at least two arguments.
For the ordinary bootstrap, the first argument is the original data and the second is the vector of indices that defines each bootstrap sample.
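A version of the example with a two-argument statistic might look like the sketch below. The seed is added only to make the run repeatable, and the result of boot() is stored so that the quantile() call has something to work on.

```r
library(boot)

set.seed(1)
vec <- abs(rnorm(1000) * 10)  # example vector, as in the question

# Statistic with the two arguments boot() expects: data and resample indices
tb <- function(data, indices) {
  mean(data[indices] < 10)    # proportion of the resample below the threshold
}

boot.out <- boot(data = vec, statistic = tb, R = 999)

# Percentile interval taken directly from the replicates
ci_perc <- quantile(boot.out$t, c(0.025, 0.975))
print(ci_perc)

# The same kind of interval via boot.ci
print(boot.ci(boot.out, type = "perc"))
```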

R return p-value from glm within cbind

Statistics and R noob here, wondering if there is a way to add the p-values from a glm to the end of the output of the following command:
exp(cbind(OR = coef(mod1), confint(mod1)))
Perhaps something like:
summary(mod1)$coefficients[,4]
I realise that this is somewhat of a 'cosmetic' issue but it would be handy nonetheless.
Thanks
You can save the results of summary(mod1), and then access the coefficients table using coefficients.
You can write a function that will do the whole process for you...
OR.summary <- function(x){
# get the summary
xs <- summary(x)
# and the confidence intervals for the coefficients
ci = confint(x)
# the table from the summary object
coefTable <- coefficients(xs)
# replace the Standard error / test statistic columns with the CI
coefTable[,2:3] <- ci
# rename appropriately
colnames(coefTable)[2:3] <- colnames(ci)
# exponentiate the appropriate columns
coefTable[,1:3] <- exp(coefTable[,1:3])
# return the whole table....
coefTable
}
A more robust approach would be to use a package like rms.
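For the one-line form in the question, the p-values can also be bound on as an extra column after exponentiating, so that they are not exponentiated themselves. A small sketch on simulated data; dat and the logistic model are made up for illustration, and suppressMessages() only hides the profiling message from confint().

```r
# Simulated data and a logistic regression standing in for mod1
set.seed(42)
dat <- data.frame(x = rnorm(200))
dat$y <- rbinom(200, 1, plogis(0.5 * dat$x))
mod1 <- glm(y ~ x, data = dat, family = binomial)

# Odds ratios with confidence limits, then the p-values as a fourth column
or_table <- cbind(
  exp(cbind(OR = coef(mod1), suppressMessages(confint(mod1)))),
  p = summary(mod1)$coefficients[, 4]
)
print(or_table)
```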
