Pass different values for a function parameter in R

I am trying to run the forecasting function hw() in R with the parameters below:
hw(TS, initial = 'simple', alpha = a,beta= b ,gamma=g,h=future_num)
where alpha, beta, and gamma vary from 0 to 1 in increments of 0.1.
I want to run this function for every combination of alpha, beta, and gamma in one go, so I have written the code below:
volume = 'UpdatedVol1'
TS <- ts(Data_summary[volume],start = c(sy,sm),frequency = freq)
model_out = data.frame()
for(i in 1:length(alpha)){
for(j in 1:length(beta)){
for(k in 1:length(gama)){
### Model Output ###
#alpha = 0.2
fore_hw <- hw(TS, initial = 'simple', alpha =
alpha[i],beta=beta[j],gamma=gama[k],h=future_num)
hw <- data.frame(ForecastVol =as.numeric(round(fore_hw$mean)))
hw$ModelName <- "Ex HW"
hw$Alpha <- alpha[i]
hw$Beta <- beta[j]
hw$Gama <- gama[k]
model_out <- rbind(model_out,hw)
}
}
}
where Data_summary is a data frame with 150 rows, freq = 7 and future_num = 300.
Is there any way to run the hw() function with all alpha, beta, and gamma values at once, or any way to optimise this code? It currently takes more than 8 minutes to run.

rbind() inside a loop is notoriously inefficient. See The Second Circle of R Hell in the R Inferno for a detailed explanation.
A quick fix is to put all the model results in a list and then use rbind once at the end:
# create a list and counter
results <- list()
counter <- 1
volume = 'UpdatedVol1'
TS <- ts(Data_summary[volume],start = c(sy,sm),frequency = freq)
model_out = data.frame() ## this seems to be unused??
for(i in 1:length(alpha)){
for(j in 1:length(beta)){
for(k in 1:length(gama)){
### Model Output ###
#alpha = 0.2
fore_hw <- hw(TS, initial = 'simple', alpha =
alpha[i],beta=beta[j],gamma=gama[k],h=future_num)
hw <- data.frame(ForecastVol =as.numeric(round(fore_hw$mean)))
hw$ModelName <- "Ex HW"
hw$Alpha <- alpha[i]
hw$Beta <- beta[j]
hw$Gama <- gama[k]
# add to the list and increment the counter
results[[counter]] <- hw
counter <- counter + 1
}
}
}
results <- do.call(rbind, results) # one rbind at the end
# (you can use `dplyr::bind_rows` or `data.table::rbindlist` to do this faster)
You could certainly do better than this too, but this will be a vast improvement with minimal changes.
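One way to go further, sketched here under some assumptions: build the parameter grid with expand.grid() and fit one model per row with lapply(), binding once at the end. To keep the sketch runnable with base R only, it swaps in stats::HoltWinters() and the built-in AirPassengers series for hw() and TS, and uses a coarse three-value grid; fit_one is a hypothetical helper name.

```r
# Assumption: base R's HoltWinters() stands in for forecast::hw(), and
# AirPassengers stands in for the question's TS.
TS <- AirPassengers
future_num <- 12

# Every combination of the three smoothing parameters, one row each.
grid <- expand.grid(Alpha = c(0.2, 0.5, 0.8),
                    Beta  = c(0.2, 0.5, 0.8),
                    Gama  = c(0.2, 0.5, 0.8))

# Hypothetical helper: fit one model and return its forecasts as a data frame.
fit_one <- function(a, b, g) {
  fit <- HoltWinters(TS, alpha = a, beta = b, gamma = g)
  data.frame(ForecastVol = as.numeric(round(predict(fit, n.ahead = future_num))),
             ModelName = "Ex HW", Alpha = a, Beta = b, Gama = g)
}

# One list element per parameter combination.
results <- lapply(seq_len(nrow(grid)), function(r) {
  fit_one(grid$Alpha[r], grid$Beta[r], grid$Gama[r])
})
model_out <- do.call(rbind, results)  # a single rbind at the end
```

This removes the nested loops and the manual counter, but it does not make an individual fit any faster; most of the run time is probably still spent inside the model fitting itself.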

Related

List gives empty values up to the penultimate cell

The following is the code I am trying to run. The main objective is to fit the model for different k values and then calculate the accuracies in order to choose the best k.
So I thought of using a for loop in which each model result and its accuracy are stored in lists and then returned together with the respective k values.
The problem is that, with the following code, the list has no values for elements 1 to 29 and only holds predicted values for element 30.
k = 1:30
for(l in k){
pre[[l]] = knn(train_dataset,test_dataset,cl = labels_train, k = l)
}
Can someone help me out with this? Why does the list come out like that, and what should be done to get the correct result?
Here is a solution, with the models fit using the code in tacoman's comment.
library(class)
set.seed(1) # Make the results reproducible
knn_list <- lapply(1:30, function(l){
knn(train_dataset, test_dataset, cl = labels_train, k = l)
})
ok <- sapply(knn_list, '==', labels_test)
acc <- colMeans(ok)
which(acc == max(acc))
plot(acc, type = "b")
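To make the accuracy step concrete, here is a small toy sketch with made-up predictions instead of real knn() output (the vectors below are hypothetical). sapply() with '==' compares each prediction vector with the true labels and returns a logical matrix with one column per k; colMeans() then gives the share of correct predictions in each column.

```r
# Hypothetical true labels and three made-up prediction vectors
# (stand-ins for knn() results with k = 1, 2, 3).
labels_test <- factor(c("a", "b", "a", "b"), levels = c("a", "b"))
preds <- list(
  factor(c("a", "b", "a", "b"), levels = c("a", "b")),  # all correct
  factor(c("a", "a", "a", "a"), levels = c("a", "b")),  # half correct
  factor(c("b", "a", "b", "a"), levels = c("a", "b"))   # none correct
)

ok <- sapply(preds, `==`, labels_test)  # 4 x 3 logical matrix
acc <- colMeans(ok)                     # accuracy per column: 1, 0.5, 0
best_k <- which(acc == max(acc))        # index of the best column(s)
```

Note that which(acc == max(acc)) returns every index that ties for the maximum, so it can have length greater than one.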
The for loop in the question can also be run, as long as the results list is created beforehand. The results are identical.
set.seed(1) # Make the results reproducible
k <- 1:30
pre <- vector("list", length = 30)
for(l in k){
pre[[l]] <- knn(train_dataset, test_dataset, cl = labels_train, k = l)
}
identical(pre, knn_list)
#[1] TRUE
Example data
set.seed(2021)
n <- nrow(iris)
i <- sample(n, 0.7*n)
train_dataset <- iris[i, -5]
test_dataset <- iris[-i, -5]
labels_train <- iris[i, 5]
labels_test <- iris[-i, 5]

Loop function in R

I am trying to use the auto.arima function for 70 variables. For this I need to create a time series 70 times.
dealer1 <- ts(tier2[,2], start=c(2015,8), end = c(2018,12), frequency=12)
dealer2 <- ts(tier2[,3], start=c(2015,8), end = c(2018,12), frequency=12)
dealer3 <- ts(tier2[,4], start=c(2015,8), end = c(2018,12), frequency=12)..and so on.
And then I need to use the Auto Arima function again for 70 variables.
automatic.dealer1 = auto.arima(dealer1, ic = "aicc")
automatic.dealer2 = auto.arima(dealer2, ic = "aicc")
automatic.dealer3 = auto.arima(dealer3, ic = "aicc")... and so on
And then forecast the output:
forecast.dealer1 = forecast(automatic.dealer1, h = 3)$mean
forecast.dealer2 = forecast(automatic.dealer2, h = 3)$mean
forecast.dealer3 = forecast(automatic.dealer3, h = 3)$mean
I am trying to use the for loop in R, but I am getting an error.
What am I doing wrong??
k <- 1
l <- 2
for(i in seq(1,70)){
dealer[k] <- ts(dealer1[,l], start=c(2015,8), end = c(2018,12), frequency=12)
dealer[k]
automatic <- auto.arima(dealer.[k], ic = "aicc")
foreArima <- forecast(automatic, h=3)
automatic
foreArima
k <- k+1
l <- l+1
}
I need the ARIMA model selected for all the 70 variables I have in my data plus the forecast for each one of them to be displayed
Data sample looks like:
First of all, you don't need a loop to create a ts object.
You can create a multivariate ts object like this (columns 2 to 71 hold the 70 series, assuming they are contiguous as in the question):
myts <- ts(tier2[, 2:71], start = c(2015, 8), frequency = 12)
Here is how you use a for loop in R:
library(forecast)
# Each forecast has h = 3 values, so store one column per series.
result <- matrix(NA_real_, nrow = 3, ncol = ncol(myts))
for (j in seq_len(ncol(myts))) {
  automatic.dealer <- auto.arima(myts[, j], ic = "aicc")
  result[, j] <- forecast(automatic.dealer, h = 3)$mean
}
As additional information: most of the time you can replace explicit loops in R with apply, sapply or lapply. Doing so usually improves the readability of your code and sometimes its performance.
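As a rough illustration of the lapply/sapply style, here is a sketch that runs with base R only. It is not the same model: simulated data replaces tier2, and arima() with a fixed AR(1) order stands in for auto.arima(); both are assumptions made so the example is self-contained.

```r
# Assumption: simulated monthly data (Aug 2015 to Dec 2018, 4 series)
# replaces tier2, and a fixed AR(1) via arima() replaces auto.arima().
set.seed(42)
myts <- ts(matrix(rnorm(41 * 4, mean = 100, sd = 5), ncol = 4),
           start = c(2015, 8), frequency = 12)

# One fitted model per column, kept in a list.
fits <- lapply(seq_len(ncol(myts)), function(j) {
  arima(myts[, j], order = c(1, 0, 0))
})

# One column of 3-step-ahead point forecasts per series.
fc <- sapply(fits, function(fit) as.numeric(predict(fit, n.ahead = 3)$pred))
```

Keeping the fitted models in a list means each one can still be inspected afterwards, for example with fits[[2]].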
EDIT: if you want to save all the results as well you can store the results from auto.arima into a list:
# Point forecasts: one column per series, one row per forecast step.
result <- matrix(NA_real_, nrow = 3, ncol = ncol(myts))
arima.list <- list()
forecast.list <- list()
for (j in seq_len(ncol(myts))) {
  automatic.dealer <- auto.arima(myts[, j], ic = "aicc")
  arima.list[[j]] <- automatic.dealer
  forecast.list[[j]] <- forecast(automatic.dealer, h = 3)
  result[, j] <- forecast.list[[j]]$mean
}

Predicting chunks with M models in R

I have a dataset (HEART) that I split into chunks. I would like to predict each chunk with its M = 3 previous models. In this case, I would like to predict chunk 10 with models 7, 8, 9; chunk 9 with models 6, 7, 8; and so on down to chunk 4 with models 1, 2, 3.
Here is my code:
library(caret)
dat1 <- read.csv(url("http://archive.ics.uci.edu/ml/machine-learning-databases/heart-disease/processed.cleveland.data"), header = FALSE,sep = ",")
colnames(dat1) <- c(LETTERS[1:(ncol (dat1)-1)],"CLA")
dat1$CLA<-as.factor (dat1$CLA)
chunk <- 30
n <- nrow(dat1)
r <- rep(1:floor(n/chunk),each=chunk)[1:n]
d <- split(dat1,r)
N<-floor(n/chunk)
cart.models <- list()
for(i in 1:N){cart.models[[i]]<-rpart(CLA~ ., data = d[[i]]) }
for (i in (1+M):N) { k=0
for (j in (i-M):(i-1)) {
k=k+1
d[[i]][,(ncol(d[[i]])+k)]<-(predict(cart.models[[j]], d[[i]][,c(-14)], type = "class") )
}
}
I get the following Error:
Error in `[<-.data.frame`(`*tmp*`, , (ncol(d[[i]]) + k), value = c(1L, :
new columns would leave holes after existing columns
Your question is a bit puzzling: you load caret without using any functions from it. The objective looks like a time series analysis, but instead of building a model on one chunk and predicting on the one that comes after it, you want something more complex, so createTimeSlices from caret won't do the trick.
You could create custom folds in caret with index and indexOut arguments in trainControl but that would ultimately lead to the creation of more models (21 to be exact) than is required for the presented objective (9). So I do believe loops are an appropriate way:
create the models:
library(rpart)
N <- 9
cart.models <- list()
for(i in 1:N){
cart.models[[i]] <- rpart(CLA~ ., data = d[[i]])
}
N can be 9 since model 10 is not used later on.
create a matrix to store the values:
cart.predictions <- matrix(nrow = chunk, ncol = length(4:10)*3)
It should have the same number of rows as there are predictions in each chunk (so 30), and as many columns as there are sets of predictions (three models for each of the chunks 4:10).
k <- 0 # column counter
for (j in 4:10) { # predict on chunks 4:10
  p <- j - 3
  for (i in p:(p + 2)) { # using models (chunk - 3):(chunk - 1)
    k <- k + 1
    predi <- predict(cart.models[[i]], d[[j]], type = "class")
    cart.predictions[, k] <- predi # stored as integer level codes
  }
}
This creates a numeric matrix of predictions. When R converts a factor to numeric it uses the integer codes of the levels: 1 for the first level, 2 for the second, and so on. So to get back the original levels (0:4) you can just subtract 1:
cart.predictions <- as.data.frame(cart.predictions - 1)
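A small sketch of why subtracting 1 works here, using a made-up factor f with the same levels 0:4. The shortcut relies on the levels being consecutive integers starting at 0; converting through as.character() is the general route.

```r
# Hypothetical factor with levels 0:4, like the CLA column.
f <- factor(c(0, 2, 4, 0), levels = 0:4)

codes <- as.integer(f)                     # internal level codes: 1 3 5 1
shifted <- codes - 1                       # back to the labels:   0 2 4 0
via_labels <- as.numeric(as.character(f))  # general conversion:   0 2 4 0
```

If the levels were, say, 10, 20 and 30, the codes would still be 1, 2 and 3, so only the as.character() route would recover the original values.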
to create the column names:
names <- expand.grid(3:1, 4:10)
names$Var1 <- with(names, Var2 - Var1)
colnames(cart.predictions) <- make.names(paste0(names$Var1,"_", names$Var2))
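To see what these three lines produce, the naming step can be run on its own. In the sketch below, name_grid and col_names are hypothetical variable names chosen so that nothing from the answer is overwritten.

```r
# Var1 varies fastest: 3, 2, 1 for chunk 4, then 3, 2, 1 for chunk 5, ...
name_grid <- expand.grid(3:1, 4:10)

# Chunk number minus the offset gives the model number: 1, 2, 3, 2, 3, 4, ...
name_grid$Var1 <- with(name_grid, Var2 - Var1)

# make.names() prepends "X" because the names start with a digit.
col_names <- make.names(paste0(name_grid$Var1, "_", name_grid$Var2))
head(col_names, 3)  # "X1_4" "X2_4" "X3_4"
```

Each name therefore reads as X<model>_<chunk>, in the same order in which the loop fills the columns.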
Let's check that it is correct:
The prediction from model 5 on chunk 6, converted to numeric,
as.numeric(as.character(predict(cart.models[[5]], d[[6]], type = "class")))
should be equal to
cart.predictions[["X5_6"]] #that's how the names were designed
all.equal(as.numeric(as.character(predict(cart.models[[5]], d[[6]], type = "class"))),
cart.predictions[["X5_6"]])
#output
TRUE
or you can create a character matrix in the first place:
cart.predictions <- matrix(data = NA_character_, nrow = chunk, ncol = length(4:10)*3)
k <- 0 #as a counter
for (j in 4:10) {
p <- j-3
pred <- list()
for(i in p : (p+2)) {
k = k + 1
predi <- predict(cart.models[[i]], d[[j]], type = "class")
cart.predictions[,k] <- predi
}
}
cart.predictions <- as.data.frame(cart.predictions)
This is the preferable method if the classes are names rather than numbers.

Multi-data likelihood function and mle2 function from bbmle package in R

I have written a custom likelihood function that fits a multi-data model that integrates mark-recapture and telemetry data (sensu Royle et al. 2013 Methods in Ecology and Evolution). The likelihood function is designed to be flexible in terms of whether and how many covariates are specified for different linear models in different likelihood components, which is determined by values supplied as function arguments (i.e., data matrices "detcovs" and "dencovs" in my code). The likelihood function works when I directly supply it to optimization functions (e.g., optim or nlm), but does not play nicely with the mle2 function in the bbmle package. My problem is that I continually run into the following error: "some named arguments in 'start' are not arguments to the specified log-likelihood function". This is my first attempt at writing custom likelihood functions, so I'm sure there are general coding conventions of which I'm unaware that would make such tasks much more efficient and amenable to the mle2 function. Below is my likelihood function, the code creating the starting value objects, and the code calling the mle2 function. Any advice on how to solve the error and general comments on writing cleaner functions are welcome. Many thanks in advance.
Edit: As requested, I have simplified the likelihood function and provided code to simulate reproducible data to which the model can be fit. Included in the simulation code are 2 custom functions and use of the raster function from the raster package. Hopefully, I have sufficiently simplified everything to enable others to troubleshoot. Again, many thanks for your help!
Jared
Likelihood function:
CSCR.RSF.intlik2.EXAMPLE <- function(alpha0,sigma,alphas=NULL,betas=NULL,n0,yscr=NULL,K=NULL,X=X,trapcovs=NULL,Gden=NULL,Gdet=NULL,ytel=NULL,stel=NULL,
dencovs=NULL,detcovs=NULL){
#
# this version of the code handles a covariate on log(Density). This is starting value 5
#
# start = vector of starting values
# yscr = nind x ntraps encounter matrix
# K = number of occasions
# X = trap locations
# Gden = matrix with grid cell coordinates for density raster
# Gdet = matrix with grid cell coordinates for RSF raster
# dencovs = all covariate values for all nGden pixels in density raster
# trapcovs = covariate value at trap locations
# detcovs = all covariate values for all nGrsf pixels in RSF raster
# ytel = nguys x nGdet matrix of telemetry fixes in each nGdet pixels
# stel = home range center of telemetered individuals, IF you wish to estimate it. Not necessary
# alphas = starting values for RSF/detfn coefficients excluding sigma and intercept
# alpha0 = starting values for RSF/detfn intercept
# sigma = starting value for RSF/detfn sigma
# betas = starting values for density function coefficients
# n0 = starting value for number of undetected individuals on log scale
#
n0 = exp(n0)
nGden = nrow(Gden)
D = e2dist(X,Gden)
nGdet <- nrow(Gdet)
alphas = alphas
loglam = alpha0 -(1/(2*sigma*sigma))*D*D + as.vector(trapcovs%*%alphas) # ztrap recycled over nG
psi = exp(as.vector(dencovs%*%betas))
psi = psi/sum(psi)
probcap = 1-exp(-exp(loglam))
#probcap = (exp(theta0)/(1+exp(theta0)))*exp(-theta1*D*D)
Pm = matrix(NA,nrow=nrow(probcap),ncol=ncol(probcap))
ymat = yscr
ymat = rbind(yscr,rep(0,ncol(yscr)))
lik.marg = rep(NA,nrow(ymat))
for(i in 1:nrow(ymat)){
Pm[1:length(Pm)] = (dbinom(rep(ymat[i,],nGden),rep(K,nGden),probcap[1:length(Pm)],log=TRUE))
lik.cond = exp(colSums(Pm))
lik.marg[i] = sum( lik.cond*psi )
}
nv = c(rep(1,length(lik.marg)-1),n0)
part1 = lgamma(nrow(yscr)+n0+1) - lgamma(n0+1)
part2 = sum(nv*log(lik.marg))
out = -1*(part1+ part2)
lam = t(exp(a0 - (1/(2*sigma*sigma))*t(D2)+ as.vector(detcovs%*%alphas)))# recycle zall over all ytel guys
# lam is now nGdet x nG!
denom = rowSums(lam)
probs = lam/denom # each column is the probs for a guy at column [j]
tel.loglik = -1*sum( ytel*log(probs) )
out = out + tel.loglik
out
}
Data simulation code:
library(raster)
library(bbmle)
e2dist <- function (x, y){
i <- sort(rep(1:nrow(y), nrow(x)))
dvec <- sqrt((x[, 1] - y[i, 1])^2 + (x[, 2] - y[i, 2])^2)
matrix(dvec, nrow = nrow(x), ncol = nrow(y), byrow = F)
}
spcov <- function(R) {
v <- sqrt(nrow(R))
D <- as.matrix(dist(R))
V <- exp(-D/2)
cov1 <- t(chol(V)) %*% rnorm(nrow(R))
Rd <- as.data.frame(R)
colnames(Rd) <- c("x", "y")
Rd$C <- as.numeric((cov1 - mean(cov1)) / sd(cov1))
return(Rd)
}
set.seed(1234)
co <- seq(0.3, 0.7, length=5)
X <- cbind(rep(co, each=5),
rep(co, times=5))
B <- 10
co <- seq(0, 1, length=B)
Z <- cbind(rep(co, each=B), rep(co, times=B))
dencovs <- cbind(spcov(Z),spcov(Z)[,3]) # ordered as reading raster image from left to right, bottom to top
dimnames(dencovs)[[2]][3:4] <- c("dencov1","dencov2")
denr.list <- vector("list",2)
for(i in 1:2){
denr.list[[i]] <- raster(
list(x=seq(0,1,length=10),
y=seq(0,1,length=10),
z=t(matrix(dencovs[,i+2],10,10,byrow=TRUE)))
)
}
B <- 20
co <- seq(0, 1, length=B)
Z <- cbind(rep(co, each=B), rep(co, times=B))
detcovs <- cbind(spcov(Z),spcov(Z)[,3]) # ordered as reading raster image from left to right, bottom to top
dimnames(detcovs)[[2]][3:4] <- c("detcov1","detcov2")
detr.list <- vector("list",2) # same name as used inside the loop below
trapcovs <- matrix(0,nrow(X),2) # one row per trap; J is only defined further down
for(i in 1:2){
detr.list[[i]] <- raster(
list(x=seq(0,1,length=20),
y=seq(0,1,length=20),
z=t(matrix(detcovs[,i+2],20,20,byrow=TRUE)))
)
trapcovs[,i] <- extract(detr.list[[i]],X)
}
alpha0 <- -3
sigma <- 0.15
alphas <- c(1,-1)
beta0 <- 3
betas <- c(-1,1)
pixelArea <- (dencovs$y[2] - dencovs$y[1])^2
mu <- exp(beta0 + as.matrix(dencovs[,3:4])%*%betas)*pixelArea
EN <- sum(mu)
N <- rpois(1, EN)
pi <- mu/sum(mu)
s <- dencovs[sample(1:nrow(dencovs), size=N, replace=TRUE, prob=pi),1:2]
J <- nrow(X)
K <- 10
yc <- d <- p <- matrix(NA, N, J)
D <- e2dist(s,X)
loglam <- t(alpha0 - t((1/(2*sigma*sigma))*D*D) + as.vector(trapcovs%*%alphas))
p <- 1-exp(-exp(loglam))
for(i in 1:N) {
for(j in 1:J) {
yc[i,j] <- rbinom(1, K, p[i,j])
}
}
detected <- apply(yc>0, 1, any)
yscr <- yc[detected,]
ntel <- 5
nfixes <- 100
poss.tel <- which(s[,1]>0.2 & s[,1]<0.8 & s[,2]>0.2 & s[,2]<0.8)
stel.id <- sample(poss.tel,ntel)
stel <- s[stel.id,]
ytel <- matrix(NA,ntel,nrow(detcovs))
d <- e2dist(stel,detcovs[,1:2])
lam <- t(exp(1 - t((1/(2*sigma*sigma))*d*d) + as.vector(as.matrix(detcovs[,3:4])%*%alphas)))
for(i in 1:ntel){
ytel[i,] <- rmultinom(1,nfixes,lam[i,]/sum(lam[i,]))
}
Specify starting values and call mle2 function:
start1 <- list(alpha0=alpha0,sigma=sigma,alphas=alphas,betas=betas,n0=log(N-nrow(yscr)))
parnames(CSCR.RSF.intlik2.EXAMPLE) <- names(start)
out1 <- mle2(CSCR.RSF.intlik2.EXAMPLE,start=start1,method="SANN",optimizer="optim",
data=list(yscr=yscr,K=K,X=X,trapcovs=trapcovs,Gden=dencovs[,1:2],Gdet=detcovs[,1:2],
ytel=ytel,stel=stel,dencovs=as.matrix(dencovs[,3:4]),detcovs=as.matrix(detcovs[,3:4]))
)
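For readers less familiar with the setup, the sketch below shows one common way to hand a likelihood with a vector-valued parameter to a general-purpose optimizer: pack everything into one flat named vector and unpack it inside a thin wrapper. It uses base R's optim() and a toy regression likelihood (nll and nll_vec are hypothetical names, and the data are simulated); it is not a diagnosis of the mle2 error above.

```r
# Toy negative log-likelihood with a scalar and a vector-valued parameter
# (a hypothetical stand-in for the much larger function above).
nll <- function(sigma, betas, y, X) {
  mu <- as.vector(X %*% betas)
  -sum(dnorm(y, mean = mu, sd = sigma, log = TRUE))
}

# Wrapper taking one flat vector: element 1 is log(sigma), the rest are betas.
nll_vec <- function(par, y, X) {
  nll(sigma = exp(par[1]), betas = par[-1], y = y, X = X)
}

# Simulated data with known coefficients 2 and -1 and sigma = 0.5.
set.seed(1)
X <- cbind(1, rnorm(200))
y <- as.vector(X %*% c(2, -1)) + rnorm(200, sd = 0.5)

fit <- optim(par = c(logsigma = 0, beta1 = 0, beta2 = 0),
             fn = nll_vec, y = y, X = X, method = "BFGS")
est <- c(sigma = exp(fit$par[[1]]), fit$par[-1])
```

Working with log(sigma) keeps the standard deviation positive without needing box constraints, in the same spirit as n0 being passed on the log scale in the question.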

Computing correlation coefficients for an AR(2) process in R

I have a time series problem that I could easily work out manually, only it would take kind of a long time since I have 4 different AR(2) processes and want to calculate at least 20 lags for each.
What I want to do is use the Yule Walker equation for rho as follows:
I have an auto regressive process of second order, AR(2). Phi(1) is 0.6 and Phi(2) is 0.4.
I want to calculate the correlation coefficients rho(k) for all lags up to k = 20.
So rho(0) would naturally be 1 and rho(-1) = rho(1). Therefore
rho(1) = phi(1) + phi(2)*rho(1)
rho(k) = phi(1)*rho(k-1) + phi(2)*rho(k-2)
Now I want to solve this in R, but I have no idea how to start, can anyone help me out here?
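You can try this program in R.
In the R script:
AR2 <- function(Zt,tetha0,phi1,phi2,nlag)
{
n <- length(Zt)
Zbar <- mean(Zt)
# lag-1 series (first value padded with the mean)
Zt1 <- rep(Zbar,n)
for(i in 2:n){Zt1[i] <- Zt[i-1]}
# lag-2 series (first two values padded with the mean)
Zt2 <- rep(Zbar,n)
for(i in 3:n){Zt2[i] <- Zt[i-2]}
# fitted values and residuals of the AR(2) model
Zhat <- tetha0+phi1*Zt1+phi2*Zt2
error <- Zt-Zhat
ACF(error,nlag)
}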
ACF <- function(error,nlag)
{
n <- length(error)
rho <- rep(0,nlag)
for(k in 1:nlag)
{
a <- 0
b <- 0
for(t in 1:(n-k)){a <- a+(error[t]*error[t+k])}
for(t in 1:n){b <- b+(error[t]^2)}
rho[k] <- a/b
}
return(rho)
}
In R console:
Suppose you have a series Zt, with tetha(0) = 0, phi(1) = 0.6, phi(2) = 0.4, and number of lags = 20:
AR2(Zt,0,0.6,0.4,20)
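The Yule-Walker recursion written out in the question can also be evaluated directly, without any data. A minimal sketch, where ar2_rho is a hypothetical helper name: solving rho(1) = phi(1) + phi(2)*rho(1) gives rho(1) = phi(1)/(1 - phi(2)), and every later lag follows from the two before it.

```r
# Theoretical autocorrelations rho(0), ..., rho(max_lag) of an AR(2) process.
ar2_rho <- function(phi1, phi2, max_lag = 20) {
  stopifnot(max_lag >= 1)
  rho <- numeric(max_lag + 1)   # rho[k + 1] stores rho(k)
  rho[1] <- 1                   # rho(0) = 1
  rho[2] <- phi1 / (1 - phi2)   # rho(1) from the first Yule-Walker equation
  for (k in seq_len(max_lag - 1) + 1) {
    # rho(k) = phi1 * rho(k - 1) + phi2 * rho(k - 2), for k = 2, ..., max_lag
    rho[k + 1] <- phi1 * rho[k] + phi2 * rho[k - 1]
  }
  setNames(rho, 0:max_lag)
}

rho_q <- ar2_rho(0.6, 0.4)  # the coefficients from the question
rho_s <- ar2_rho(0.5, 0.3)  # a stationary example for comparison
```

For a stationary process the result should agree with stats::ARMAacf(ar = c(phi1, phi2), lag.max = 20). With the question's values, phi(1) + phi(2) = 1, so the process sits on the boundary of stationarity and the recursion gives a value of (approximately) 1 at every lag.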
