Neural Network Prediction Intervals in R

I am trying to compute prediction intervals for a neural network created with the neuralnet package.
I use R in Tableau Software, by creating .RData files containing my functions and loading them in Tableau.
It's a simple NN with one hidden layer containing 5 nodes. I searched and found this package: nnetpredint.
So I tried to use it, following their examples.
I also tried changing the way I use it (train/test in the same data frame, separate data frames with the same column names, etc.).
The best result I got was the prediction, but without the lowerBound and upperBound columns.
In fact, I got exactly the same result as when I use compute(myNN, ...), but without the second and third columns.
Thanks for your help,
EDIT:
My data comes from Tableau; my function takes five parameters:
ValuesToExplain, train1, train2, test1, test2.
I create and train my NN with the first three, then try to predict from the last two.
(test1 = k*train1 and test2 = k2*train2 for now, but that will probably change in the future.)
Here is my whole code:
NNetwork <- function(objectiveValues, knownValues1, knownValues2, newData, newData2) {
  numberOfColumn <- 3
  ## Create the training data frame
  training <- data.frame(objectiveValues, knownValues1, knownValues2)
  training[which(is.na(training[, "objectiveValues"])), "objectiveValues"] <- mean(training[, "objectiveValues"], na.rm = TRUE)
  training[which(is.na(training[, "knownValues1"])), "knownValues1"] <- mean(training[, "knownValues1"], na.rm = TRUE)
  training[which(is.na(training[, "knownValues2"])), "knownValues2"] <- mean(training[, "knownValues2"], na.rm = TRUE)
  ## Create the testing data frame
  testing <- data.frame(objectiveValues, newData, newData2)
  names(testing) <- c("objectiveValues", "knownValues1", "knownValues2")
  testing[which(is.na(testing[, "objectiveValues"])), "objectiveValues"] <- mean(testing[, "objectiveValues"], na.rm = TRUE)
  testing[which(is.na(testing[, "knownValues1"])), "knownValues1"] <- mean(testing[, "knownValues1"], na.rm = TRUE)
  testing[which(is.na(testing[, "knownValues2"])), "knownValues2"] <- mean(testing[, "knownValues2"], na.rm = TRUE)
  ## Scaling
  maxs <- apply(training, 2, max)
  mins <- apply(training, 2, min)
  trainingScaled <- as.data.frame(scale(training, center = mins, scale = maxs - mins))
  testingScaled <- as.data.frame(scale(testing, center = mins, scale = maxs - mins))
  ### Neural network part
  library(neuralnet)
  n <- names(trainingScaled)
  f <- as.formula(paste("objectiveValues ~", paste(n[!n %in% "objectiveValues"], collapse = " + ")))
  # Training NN
  nn <- neuralnet(f, data = trainingScaled, hidden = 5, linear.output = TRUE)
  # Using NN
  computedTrainingScaled <- compute(nn, trainingScaled[, 2:numberOfColumn])
  computedFromNNScaled <- compute(nn, testingScaled[, 2:numberOfColumn])
  # Unscaling
  computedTraining <- computedTrainingScaled$net.result * (max(training$objectiveValues) - min(training$objectiveValues)) + min(training$objectiveValues)
  computedFromNN <- computedFromNNScaled$net.result * (max(training$objectiveValues) - min(training$objectiveValues)) + min(training$objectiveValues)
  RSquare <- (1 - ((sum((training$objectiveValues - computedTraining)^2)) / (sum((training$objectiveValues - mean(training$objectiveValues))^2)))) * 100
  RSE <- sum((training$objectiveValues - computedTraining)^2) / nrow(training)
  res <- (1:nrow(training))
  library(nnetpredint) # Getting prediction confidence interval
  x <- trainingScaled[, 2:numberOfColumn]
  y <- trainingScaled[1]
  newData <- testingScaled[, 2:numberOfColumn]
  # S3 generic method: object of class nn
  yPredInt <- nnetPredInt(nn, x, y, newData)
  for (i in 1:nrow(training)) {
    res[i] <- paste(computedFromNN[i], RSquare, RSE, sep = "#")
  }
  return(res)
}
save(NNetwork, file = "NNetwork.RData")
save(NNetwork, file = "NNetwork.RData")
Here, I removed the part using the nnetpredint package because it was not working, but it was like this:
library(nnetpredint)
y <- trainingScaled
x <- trainingScaled[, 2:3]
newData <- testingScaled[, 2:3]
yPredInt <- nnetPredInt(nn, x, y, newData)
My problem is that when I try to access yPredInt$lowerBound or yPredInt$upperBound, they don't exist.
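For reference, here is the next variant I would try, a sketch assuming nnetPredInt wants the scaled target as a single numeric vector rather than a whole data frame (I have not confirmed this is the cause):
library(nnetpredint)
x <- trainingScaled[, 2:numberOfColumn]
# assumption: pass only the target column, as a numeric vector
y <- trainingScaled$objectiveValues
newData <- testingScaled[, 2:numberOfColumn]
yPredInt <- nnetPredInt(nn, x, y, newData)
str(yPredInt) # inspect the returned object before indexing its columns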

Related

Expand for-loop to accommodate list in R?

I've recently been interested in developing a for-loop that can run multiple generalized additive models and then produce results in a table that ranks them based on AIC, the p-value of each smooth in the model, the deviance explained by the overall model, etc.
I found a related question on Stack Overflow which is basically what I want, and I was able to run it well for gam() instead of gamm(); however, I want to expand this to include multiple independent variables in the model, not just one.
Ideally, the models would run all possible combinations of independent variables against the dependent variable, testing combinations of anywhere from one independent variable up to all of the possible covariates in "d_pred".
I have attempted to do this so far by starting out small and finding all possible combinations of 2 independent variables (df_combinations2), which results in a list of data frames. Then I adjusted the rest of the code so that each iteration of the for-loop runs a different combination of the two variables:
library(mgcv)
## Example data
set.seed(0)
dat <- gamSim(1, n = 200, scale = 2)
set.seed(1)
dat2 <- gamSim(1, n = 200, scale = 2)
names(dat2)[1:5] <- c("y1", paste0("x", 4:7))
d <- cbind(dat[, 1:5], dat2[, 1:5])
d_resp <- d[c("y", "y1")]
d_pred <- d[, !(colnames(d) %in% c("y", "y1"))]
df_combinations2 <- lapply(1:(ncol(combn(1:ncol(d_pred), m = 2))),
                           function(y) d_pred[, combn(1:ncol(d_pred), m = 2)[, y]])
## create a "matrix" list of dimensions i x j
results_m2 <- lapply(1:length(df_combinations2), matrix, data = NA, nrow = ncol(d_resp), ncol = 2)
## for-loop
for (k in 1:length(df_combinations2)) {
  for (i in 1:ncol(d_resp)) {
    for (j in 1:ncol(df_combinations2[[k]])) {
      results_m2[i, j][[1]] <- gam(d_resp[, i] ~ s(df_combinations2[[k]][, 1]) + s(df_combinations2[[k]][, 2]))
    }
  }
}
However, after running the for-loop I get the error "Error in all.vars1(gp$fake.formula[-2]) : can't handle [[ in formula".
Does anyone know why I am getting this error and how to fix it?
Any insight is much appreciated. Thanks!
Personally, I would create a data.table() containing all combinations of target variables and predictor combinations and loop through all rows. See below.
library(mgcv) # for gamSim() and gam()
library(data.table)
library(dplyr)
# Example data
set.seed(0)
dat <- gamSim(1, n = 200, scale = 2)
set.seed(1)
dat2 <- gamSim(1, n = 200, scale = 2)
names(dat2)[1:5] <- c("y1", paste0("x", 4:7))
d <- cbind(dat[, 1:5], dat2[, 1:5])
# select names of targets and predictors
targets <- c("y", "y1")
predictors <- colnames(d)[!colnames(d) %in% targets]
# create all combinations of predictors
predictor_combinations <- lapply(1:length(predictors), FUN = function(x) {
  # create combination
  combination <- combn(predictors, m = x) |> as.data.table()
  # add s() to all for gam
  combination <- sapply(combination, FUN = function(y) paste0("s(", y, ")")) |> as.data.table()
  # collapse
  combination <- summarize_all(combination, .funs = paste0, collapse = "+")
  # unlist
  combination <- unlist(combination)
  # remove names
  names(combination) <- NULL
  # return
  return(combination)
})
# merge combinations of predictors as vector
predictor_combinations <- do.call(c, predictor_combinations)
# create folder to save results to
if (!dir.exists("dev")) {
  dir.create("dev")
}
if (!dir.exists("dev/models")) {
  dir.create("dev/models")
}
# create and save hypergrid (all combinations of targets and predictor combinations)
if (!file.exists("dev/hypergrid.csv")) {
  # create hypergrid and save to dev
  hypergrid <- expand.grid(target = targets, predictors = predictor_combinations) |> as.data.table()
  # add identifier
  hypergrid[, model := paste0("model", 1:nrow(hypergrid))]
  # save to dev
  fwrite(hypergrid, file = "dev/hypergrid.csv")
} else {
  # if the file exists, read it
  hypergrid <- fread("dev/hypergrid.csv")
}
# loop through hypergrid, create GAM models
# progress bar
pb <- txtProgressBar(min = 1, max = nrow(hypergrid), style = 3)
for (i in 1:nrow(hypergrid)) {
  # update progress bar
  setTxtProgressBar(pb, i)
  # select target
  target <- hypergrid[i, ]$target
  # select predictors
  predictors <- hypergrid[i, ]$predictors
  # create formula
  gam.formula <- as.formula(paste0(target, "~", predictors))
  # run gam
  gam.model <- gam(gam.formula, data = d)
  # save gam model to dev/models
  saveRDS(gam.model, file = paste0("dev/models/", hypergrid[i, ]$model, ".RDS"))
}
# example where you extract model performances
for (i in 1:nrow(hypergrid)) {
  # read the right model
  rel.model <- readRDS(paste0("dev/models/", hypergrid[i, ]$model, ".RDS"))
  # extract model performance, add to hypergrid
  hypergrid[i, R2 := summary(rel.model)[["r.sq"]]]
}
# arrange hypergrid on target and R2
hypergrid <- dplyr::arrange(hypergrid, hypergrid$target, desc(hypergrid$R2))
Which would give
head(hypergrid)
   target                                  predictors    model        R2
1:      y               s(x0)+s(x1)+s(x2)+s(x4)+s(x5) model319 0.6957242
2:      y         s(x0)+s(x1)+s(x2)+s(x3)+s(x4)+s(x5) model423 0.6953753
3:      y         s(x0)+s(x1)+s(x2)+s(x4)+s(x5)+s(x7) model437 0.6942054
4:      y                     s(x0)+s(x1)+s(x2)+s(x5) model175 0.6941025
5:      y         s(x0)+s(x1)+s(x2)+s(x4)+s(x5)+s(x6) model435 0.6940569
6:      y   s(x0)+s(x1)+s(x2)+s(x3)+s(x4)+s(x5)+s(x7) model481 0.6939756
All models are saved to a folder with an identifier, in case you want to reuse a model or extract more information from it.
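For example, to reload the top-ranked model later (a sketch based on the file layout above):
# read the best model (first row after arranging by R2) back from dev/models
best_model <- readRDS(paste0("dev/models/", hypergrid$model[1], ".RDS"))
summary(best_model)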
Notably, p-hacking comes to mind with this approach, so I would be careful when conducting your analysis like this.

Mixture copula in R

I want to use a mixture copula for reliability analysis. With the help of a friend, I have already fitted it ('RVMs_fitted'). Now I want to perform the probability integral transform (PIT), but the RVinePIT function fails when I call RVinePIT(data, RVM), because this RVM is not an RVineMatrix. Here is my code:
library(vineclust)
data1 <- read.csv("D:/ASTUDY/Rlanguage/Mix copula/data.csv", header = FALSE)
fit <- vcmm(data = data1, total_comp = 3, is_cvine = 0)
print(fit)
summary(fit)
RVMs_fitted <- list()
RVMs_fitted[[1]] <- VineCopula::RVineMatrix(Matrix = fit$output$vine_structure[, , 1],
                                            family = fit$output$bicop_familyset[, , 1],
                                            par = fit$output$bicop_param[, , 1],
                                            par2 = fit$output$bicop_param2[, , 1])
RVMs_fitted[[2]] <- VineCopula::RVineMatrix(Matrix = fit$output$vine_structure[, , 2],
                                            family = fit$output$bicop_familyset[, , 2],
                                            par = fit$output$bicop_param[, , 2],
                                            par2 = fit$output$bicop_param2[, , 2])
RVMs_fitted[[3]] <- VineCopula::RVineMatrix(Matrix = fit$output$vine_structure[, , 3],
                                            family = fit$output$bicop_familyset[, , 3],
                                            par = fit$output$bicop_param[, , 3],
                                            par2 = fit$output$bicop_param2[, , 3])
RVM <- RVMs_fitted
meanx <- c(0.47, 0.508, 0.45, 0.52, 0.48)
sigmax <- c(0.318, 0.322, 0.296, 0.29, 0.279)
ux1 <- pnorm(x[1], meanx[1], sigmax[1])
ux2 <- pnorm(x[2], meanx[2], sigmax[2])
ux3 <- pnorm(x[3], meanx[3], sigmax[3])
ux4 <- pnorm(x[4], meanx[4], sigmax[4])
ux5 <- pnorm(x[5], meanx[5], sigmax[5])
data <- c(ux1, ux2, ux3, ux4, ux5)
du <- RVinePIT(data, RVM)
y <- t(qnorm(t(du)))
Error:
In RVinePIT: RVM has to be an RVineMatrix object.
You have multiple problems here:
RVM is a list, but you passed the whole list to RVinePIT, which works on one RVineMatrix at a time.
The same holds for y.
I do not have your data, but try it with other data.
Here is the code (it should work):
library(vineclust)
library(VineCopula)
data1 <- read.csv("D:/ASTUDY/Rlanguage/Mix copula/data.csv", header = FALSE)
fit <- vcmm(data = data1, total_comp = 3, is_cvine = 0)
print(fit)
summary(fit)
RVMs_fitted <- list()
RVMs_fitted[[1]] <- RVineMatrix(Matrix = fit$output$vine_structure[, , 1],
                                family = fit$output$bicop_familyset[, , 1],
                                par = fit$output$bicop_param[, , 1],
                                par2 = fit$output$bicop_param2[, , 1])
RVMs_fitted[[2]] <- RVineMatrix(Matrix = fit$output$vine_structure[, , 2],
                                family = fit$output$bicop_familyset[, , 2],
                                par = fit$output$bicop_param[, , 2],
                                par2 = fit$output$bicop_param2[, , 2])
RVMs_fitted[[3]] <- RVineMatrix(Matrix = fit$output$vine_structure[, , 3],
                                family = fit$output$bicop_familyset[, , 3],
                                par = fit$output$bicop_param[, , 3],
                                par2 = fit$output$bicop_param2[, , 3])
RVM <- RVMs_fitted
meanx <- c(0.47, 0.508, 0.45, 0.52, 0.48)
sigmax <- c(0.318, 0.322, 0.296, 0.29, 0.279)
# x must hold the five observed values here
ux1 <- pnorm(x[1], meanx[1], sigmax[1])
ux2 <- pnorm(x[2], meanx[2], sigmax[2])
ux3 <- pnorm(x[3], meanx[3], sigmax[3])
ux4 <- pnorm(x[4], meanx[4], sigmax[4])
ux5 <- pnorm(x[5], meanx[5], sigmax[5])
data <- c(ux1, ux2, ux3, ux4, ux5) # this must be a matrix to work with RVinePIT
du <- lapply(1:3, function(i) RVinePIT(data, RVM[[i]]))
y <- lapply(1:3, function(i) t(qnorm(t(du[[i]]))))
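For the PIT step, a minimal sketch of shaping the input, assuming x holds the five observed values (x is never defined above; the values here are purely illustrative placeholders):
x <- c(0.5, 0.5, 0.5, 0.5, 0.5) # hypothetical observations
u <- pnorm(x, meanx, sigmax)    # vectorized version of ux1..ux5
data <- matrix(u, nrow = 1)     # one row per observation, as RVinePIT expects
du <- lapply(1:3, function(i) RVinePIT(data, RVM[[i]]))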

Accessing a variable in a data frame by column number in R?

I have a data frame "df" with 41 variables, var1 to var41. If I write this command
pcdtest(plm(var1 ~ 1, data = df, model = "pooling"))[[1]]
I can see the test value. But I need to apply this test 41 times, so I want to access each variable by column number, which is "df[1]" for "var1" and "df[41]" for "var41":
pcdtest(plm(df[1] ~ 1, data = dfp, model = "pooling"))[[1]]
But it fails. Could you please help me do this? I will collect the results in a for loop and then calculate descriptive statistics for all of them, but it is very tedious to run the test for each variable by hand.
I think you can easily adapt the following code to your data. Since you didn't provide any of your data, I used data that comes with the plm package.
library(plm) # for pcdtest
# example data from plm package
data("Cigar", package = "plm")
Cigar[, "fact1"] <- c(0, 1)
Cigar[, "fact2"] <- c(1, 0)
Cigar.p <- pdata.frame(Cigar)
# example for one column
p_model <- plm(formula = pop ~ 1, data = Cigar.p, model = "pooling")
pcdtest(p_model)[[1]]
# run through multiple models
l_plm_models <- list() # store plm models in this list
l_tests <- list()      # store test results in this list
for (i in 3:ncol(Cigar.p)) { # start in the third column, since the first two are state and year
  fmla <- as.formula(paste(names(Cigar.p)[i], "~ 1", sep = ""))
  l_plm_models[[i]] <- plm(formula = fmla,
                           data = Cigar.p,
                           model = "pooling")
  l_tests[[i]] <- pcdtest(l_plm_models[[i]])[[1]]
}
testresult <- data.frame("z" = unlist(l_tests), row.names = colnames(Cigar.p[3:11]))
> testresult
               z
price  175.36476
pop    130.45774
pop16  155.29092
cpi    176.21010
ndi    175.51938
sales   99.02973
pimin  175.74600
fact1  176.21010
fact2  176.21010
# example for cipstest
matrix_results <- matrix(NA, nrow = 11, ncol = 2) # use 41 here for your df
l_ctest <- list()
for (i in 3:ncol(Cigar.p)) {
  l_ctest[[i]] <- cipstest(Cigar.p[, i], lags = 4, type = "none", model = "cmg", truncated = FALSE)
  matrix_results[i, 1] <- as.numeric(l_ctest[[i]][1])
  matrix_results[i, 2] <- as.numeric(l_ctest[[i]][7])
}
res <- data.frame(matrix_results)
names(res) <- c("cips-statistic", "p-value")
print(res)
Try using as.formula(), for example:
results <- list()
for (i in 1:41) {
  varName <- paste0("var", i)
  frml <- paste0(varName, " ~ 1")
  results[[i]] <- pcdtest(plm(as.formula(frml), data = dfp, model = "pooling"))[[1]]
}
You can use reformulate to create the formula and apply the code 41 times using lapply:
var <- paste0("var", 1:41)
result <- lapply(var, function(x) pcdtest(plm(reformulate("1", x),
                                              data = df, model = "pooling"))[[1]])
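To collect the 41 test statistics into one table afterwards, something like this should work:
# one row per variable, with the corresponding test statistic
testresult <- data.frame(variable = var, z = unlist(result))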

How to use weighted least squares when reconciling custom forecasts using hts::combinef()?

I am trying to reconcile custom forecasts using the combinef function from the hts package.
I want to compare recombined automated approaches and recombined custom forecasts
(https://otexts.com/fpp2/reconciliation.html).
The automated counterpart of combinef (forecast.gts) is very user friendly.
If a bottom-up forecast is required, you can set method = "bu".
If reconciliation is required (method = "comb"), you can for example choose between weighted least squares, ordinary least squares, and structural scaling by setting the weights parameter to weights = c("wls", "ols", "nseries").
For combinef() the default of the weights parameter is ordinary least squares, so that approach is easy to implement.
In another thread it was already explained how to apply the bottom-up approach (How to get top down forecasts using `hts::combinef()`?).
I am now interested in also using "wls" as well as "nseries".
Is there an easy way to implement this?
Edit
I had a deeper look into the source code of the forecast.gts() and combinef() functions and ended up with this custom code:
library(hts)
library(rlist)
library(magrittr) # for the %>% pipe
# forecast the grouped time series with a custom function
ally_df <- aggts(htseg1) %>% as.data.frame
forecast_list <- apply(ally_df, 2, function(x){x %>% auto.arima %>% forecast(h = 12)})
ally_fitted <- lapply(forecast_list, function(x){x$fitted %>% as.data.frame}) %>% list.cbind
colnames(ally_fitted) <- colnames(ally_df)
ally_forecast <- lapply(forecast_list, function(x){x$mean %>% as.data.frame}) %>% list.cbind
colnames(ally_forecast) <- colnames(ally_df)
# create weights for reconciliation
recomb_approaches <- c("wls", "ols", "nseries", "bu")
recomb_approach <- recomb_approaches[1]
if (recomb_approach == "bu") {
  weights <- c(rep(0, ncol(ally_df) - ncol(htseg1$bts)), rep(1, ncol(htseg1$bts)))
} else if (recomb_approach == "ols") {
  weights <- NULL
} else if (recomb_approach == "wls") {
  tmp.resid <- ally_df - ally_fitted
  weights <- 1/colMeans(tmp.resid^2, na.rm = TRUE)
} else if (recomb_approach == "nseries") {
  # a function to calculate the number of groups at each level
  Mlevel <- function(xgroup) {
    m <- apply(xgroup, 1, function(x) length(unique(x)))
    return(m)
  }
  # a function to get the inverse of the row sums of the S matrix
  InvS4g <- function(xgroup) {
    mlevel <- Mlevel(xgroup)
    len <- length(mlevel)
    repcount <- mlevel[len]/mlevel
    inv.s <- 1/unlist(mapply(rep, repcount, mlevel, SIMPLIFY = FALSE))
    return(inv.s)
  }
  weights <- InvS4g(htseg1$groups)
}
cores <- 2 # number of cores for parallel reconciliation
ally_forecast_recombined_df <- combinef(ally_forecast,
                                        nodes = get_nodes(htseg1),
                                        weights = weights,
                                        algorithms = "lu",
                                        keep = "bottom",
                                        parallel = TRUE,
                                        num.cores = cores)
Will this do the trick?
WLS forecast reconciliation uses the one-step forecast variances, which are equal to the residual variances. Here is some code to do it in a small example.
library(hts)
h <- 12
ally <- aggts(htseg1)
nseries <- NCOL(ally)
allf <- ts(matrix(NA, nrow = h, ncol = nseries))
resid_var <- numeric(nseries)
for (i in seq(nseries)) {
  fc <- forecast(auto.arima(ally[, i]), h = h, level = 95)
  resid_var[i] <- fc$model$sigma2
  allf[, i] <- fc$mean
}
tsp(allf) <- tsp(fc$mean)
wls <- combinef(allf, get_nodes(htseg1), weights = resid_var,
                keep = "gts", algorithms = "lu")
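Since keep = "gts" returns a grouped time series object, you can then work with the reconciled series directly, e.g. (a short usage sketch):
reconciled <- aggts(wls) # all reconciled series, aggregates and bottom level
head(reconciled)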

Regsubsets results differ with coef() for model with linear dependencies

While using regsubsets from the leaps package on data with linear dependencies, I found that the results given by coef() and by summary()$which differ. It seems that, when linear dependencies are found, reordering changes the positions of the coefficients and coef() returns wrong values.
I use mtcars just to "simulate" the problem I had with other data. In the first example there is no issue of linear dependencies, the best model by BIC is mpg ~ wt + cyl, and both coef() and summary()$which give the same result. In the second example I add a dummy variable, so perfect multicollinearity is possible, but with the variables in this order (dummy in the last column) the problem does not appear. In the last example, after changing the order of the variables in the dataset, the problem finally appears and coef() and summary()$which give different models. Is there anything incorrect in this approach? Is there any other way to get coefficients from regsubsets?
require("leaps") #install.packages("leaps")
###Example1
dta <- mtcars[,c("mpg","cyl","am","wt","hp") ]
bestSubset.cars <- regsubsets(mpg~., data=dta)
(best.sum <- summary(bestSubset.cars))
#
w <- which.min(best.sum$bic)
best.sum$which[w,]
#
best.sum$outmat
coef(bestSubset.cars, w)
#
###Example2
dta2 <- cbind(dta, manual=as.numeric(!dta$am))
bestSubset.cars2 <- regsubsets(mpg~., data=dta)
(best.sum2 <- summary(bestSubset.cars2))
#
w <- which.min(best.sum2$bic)
best.sum2$which[w,]
#
coef(bestSubset.cars2, w)
#
###Example3
bestSubset.cars3 <- regsubsets(mpg~., data=dta2[,c("mpg","manual","am","cyl","wt","hp")])
(best.sum3 <- summary(bestSubset.cars3))
#
w <- which.min(best.sum3$bic)
best.sum3$which[w,]
#
coef(bestSubset.cars3, w)
#
best.sum2$which
coef(bestSubset.cars2,1:4)
best.sum3$which
coef(bestSubset.cars3,1:4)
The orders of variables used by summary.regsubsets and regsubsets are different. The generic coef() function for regsubsets combines the two, and the results get mixed up if you use force.in or a formula with a fixed order. Changing some lines in the coef() function might help. Try the code below and see if it works!
coef.regsubsets <- function(object, id, vcov = FALSE, ...)
{
  s <- summary(object)
  invars <- s$which[id, , drop = FALSE]
  betas <- vector("list", length(id))
  for (i in 1:length(id)) {
    # added
    var.name <- names(which(invars[i, ]))
    thismodel <- which(object$xnames %in% var.name)
    names(thismodel) <- var.name
    # deleted
    # thismodel <- which(invars[i, ])
    qr <- .Fortran("REORDR", np = as.integer(object$np),
                   nrbar = as.integer(object$nrbar), vorder = as.integer(object$vorder),
                   d = as.double(object$d), rbar = as.double(object$rbar),
                   thetab = as.double(object$thetab), rss = as.double(object$rss),
                   tol = as.double(object$tol), list = as.integer(thismodel),
                   n = as.integer(length(thismodel)), pos1 = 1L, ier = integer(1))
    beta <- .Fortran("REGCF", np = as.integer(qr$np), nrbar = as.integer(qr$nrbar),
                     d = as.double(qr$d), rbar = as.double(qr$rbar), thetab = as.double(qr$thetab),
                     tol = as.double(qr$tol), beta = numeric(length(thismodel)),
                     nreq = as.integer(length(thismodel)), ier = numeric(1))$beta
    names(beta) <- object$xnames[qr$vorder[1:qr$n]]
    reorder <- order(qr$vorder[1:qr$n])
    beta <- beta[reorder]
    if (vcov) {
      p <- length(thismodel)
      R <- diag(qr$np)
      R[row(R) > col(R)] <- qr$rbar
      R <- t(R)
      R <- sqrt(qr$d) * R
      R <- R[1:p, 1:p, drop = FALSE]
      R <- chol2inv(R)
      dimnames(R) <- list(object$xnames[qr$vorder[1:p]],
                          object$xnames[qr$vorder[1:p]])
      V <- R * s$rss[id[i]] / (object$nn - p)
      V <- V[reorder, reorder]
      attr(beta, "vcov") <- V
    }
    betas[[i]] <- beta
  }
  if (length(id) == 1)
    beta
  else betas
}
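Defining this in the global environment masks the leaps method, so the earlier example should now be consistent (a usage sketch based on the question's objects):
w <- which.min(best.sum3$bic)
best.sum3$which[w, ]
coef(bestSubset.cars3, w) # should now match the variables flagged above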
Another solution that works for me is to randomize the order of the columns (independent variables) in your dataset before running regsubsets, as sketched below. The idea is that, after reordering, the highly correlated columns will hopefully be far apart from each other and will not trigger the reordering behavior in the regsubsets algorithm.
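For example (a sketch using the question's data; sample() just permutes the predictor columns):
set.seed(42) # make the random column order reproducible
dta_shuffled <- dta2[, c("mpg", sample(setdiff(names(dta2), "mpg")))]
bestSubset.shuffled <- regsubsets(mpg ~ ., data = dta_shuffled)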
