I have the following code, which defines two constraints I want to use in my multi-objective optimization problem. model1, model2 and model3 have already been fitted and verified to work.
restrictions <- function (var) {
x <- var[1]; y <- var[2]
restrictions <- logical(2)
restrictions[1] <- (predict(get(model1), data.frame(x, y), type = "response") < 500)
restrictions[2] <- (predict(get(model1), data.frame(x, y), type = "response") > 0)
return (restrictions);
}
I then build the multi-objective function for the genetic algorithm:
fn <- function (var) {
x <- var[1]; y <- var[2]
f <- numeric(3)
f[1] <- predict(get(model1), data.frame(x, y), type = "response")
f[2] <- predict(get(model2), data.frame(x, y), type = "response")
f[3] <- predict(get(model3), data.frame(x, y), type = "response")
return (f);
}
Finally, I run the optimization using the mco library:
library (mco)
optimum <- mco::nsga2(fn = fn, idim = 2, odim = 3,
                      constraints = restrictions, cdim = 2,
                      generations = 100,
                      popsize = 40,
                      cprob = 0.5,
                      cdist = 20,
                      mprob = 0.5,
                      mdist = 20,
                      lower.bounds = c(-80, 50),
                      upper.bounds = c(-70, 60))
The main problem is that the solutions do not respect the specified constraints. Any thoughts on that?
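One possible explanation, offered as a sketch rather than a confirmed diagnosis: as far as I recall from the mco documentation, nsga2 treats a constraint as satisfied when the constraint function returns a value >= 0. A logical vector is coerced to 0/1, and both values are >= 0, so TRUE/FALSE constraints would never be seen as violated. The version below returns signed numeric margins instead. predict_model1 is a hypothetical stand-in for predict(get(model1), ...), used only so the example runs on its own.

```r
# Hypothetical stand-in for predict(get(model1), data.frame(x, y), type = "response");
# replace it with the real model prediction.
predict_model1 <- function(x, y) 100 * (x + 80) + (y - 50)

# Constraint function returning numeric margins (assumed convention: >= 0 means satisfied)
restrictions <- function(var) {
  x <- var[1]; y <- var[2]
  p <- predict_model1(x, y)
  c(500 - p,  # non-negative when the prediction is at most 500
    p)        # non-negative when the prediction is at least 0
}

restrictions(c(-78, 55))  # both margins non-negative: feasible point
restrictions(c(-70, 60))  # first margin negative: upper limit violated

# Why logicals would not work under this convention: FALSE is coerced to 0
all(c(TRUE, FALSE) >= 0)
```

Note that this turns the strict inequalities (< 500, > 0) into non-strict ones; if the boundary itself must be excluded, a small offset can be subtracted from each margin.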
I'm using a Cox regression model in which my variable enters through a spline transformation. Everything works nicely until the subsequent nomogram: as expected, the scale of my variable is also transformed, but I'd like to add some custom ticks in the region between the values 0 and 2 (which I guess is the transformed one). Any ideas?
Here's my code...
library(rms)  # provides datadist, cph, rcs, Survival and nomogram
data <- source("https://pastebin.com/raw/rGtUSTLz")$value
ddist <- datadist(data)
options(datadist = "ddist")
fit <- cph(Surv(time, event) ~ rcs(var, 3), data = data, surv = TRUE, x = TRUE, y = TRUE)
surv <- Survival(fit)
plot(nomogram(fit,
              fun = list(function(x) surv(times = 10, lp = x),
                         function(x) surv(times = 30, lp = x),
                         function(x) surv(times = 60, lp = x)),
              funlabel = paste("c", 1:3), lp = TRUE))
... and these are the real and the desired outputs.
Thanks in advance for your help!
I have had this issue too. My answer is a workaround that uses another package, regplot. Alternatively, if you already know the point values at the tick marks you want plotted, you can supply those instead of taking them from the regplot output. Basically, you need to modify the tick marks and points that the nomogram function returns before they are passed to plot.
This method also provides a way to remove points / tick marks by editing the nomogram output.
library(rms)  # provides datadist, cph, rcs, Survival and nomogram
data <- source("https://pastebin.com/raw/rGtUSTLz")$value
ddist <- datadist(data)
options(datadist = "ddist")
fit <- cph(Surv(time, event) ~ rcs(var, 3), data = data, surv = TRUE, x = TRUE, y = TRUE)
surv <- Survival(fit)
nom1 <- nomogram(fit, fun = list(function(x) surv(times = 10, lp = x),
                                 function(x) surv(times = 30, lp = x),
                                 function(x) surv(times = 60, lp = x)),
                 funlabel = paste("c", 1:3), lp = TRUE)
library(regplot)
# call regplot with points = TRUE to get output
regplot(fit, fun = list(function(x) surv(times = 10, lp = x),
                        function(x) surv(times = 30, lp = x),
                        function(x) surv(times = 60, lp = x)),
        funlabel = paste("c", 1:3), points = TRUE)
# look at the points supplied through regplot and take those.
nom1_edit <- nom1
# now we edit the ticks supplied for var and their corresponding point value
nom1_edit[[1]][1] <- list(c(0, 0.06, 0.15, 0.3, 2, 4, 6, 8, 10, 12, 14, 16))
nom1_edit[[1]][2] <- list(c(0, 10, 21, 32, 42.41191, 50.63878, 58.86565,
                            67.09252, 75.31939, 83.54626, 91.77313, 100.00000))
nom1_edit$var$points <- c(0, 10, 21, 32, 42.41191, 50.63878, 58.86565,
                          67.09252, 75.31939, 83.54626, 91.77313, 100.00000)
# plot the edited nomogram with new points
plot(nom1_edit)
I am trying to use cross validation with a decision tree using AUC. These are the functions that I am using:
.cvFolds <- function(Y, V) {
Y0 <- split(sample(which(Y == 0)), rep(1:V, length = length(which(Y == 0))))
Y1 <- split(sample(which(Y == 1)), rep(1:V, length = length(which(Y == 1))))
folds <- vector("list", length = V)
for (v in seq(V)) folds[[v]] <- c(Y0[[v]], Y1[[v]])
return(folds)
}
.doFit <- function(V, folds, train) {
set.seed(v)
ycol <- which(names(train) == y)
params <- list(x = train[-folds[[V]], -ycol],
y = as.factor(train[-folds[[V]], ycol]),
xtest = train[folds[[V]], -ycol])
fit <- do.call(randomForest, params)
pred <- fit$test$votes[, 2]
return(pred)
}
This is the function to calculate probabilities:
iid_example <- function(train, y = "V1", V = 10, seed = 1) {
set.seed(seed)
folds <- .cvFolds(Y = train[, c(y)], V = V)
# Generate CV predicted values
cl <- makeCluster(detectCores())
registerDoParallel(cl)
predictions <- foreach(v = 1:V, .combine = "c",
.packages = c("randomForest")) %dopar% .doFit(v, folds, train)
stopCluster(cl)
predictions[unlist(folds)] <- predictions
# Get CV AUC
runtime <- system.time(res <- ci.cvAUC(predictions = predictions,
labels = train[, c(y)],
folds = folds,
confidence = 0.95))
print(runtime)
return(res)
}
The actual function call:
res <- iid_example(train = datos, y = "V1", V = 10, seed = 1)
When I try to run it, I get the following error:
Y0[[v]] out of bounds
I have tried adjusting the function's parameters, but I do not understand why the subscript is out of bounds. Thanks for your help.
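One plausible cause, stated as an assumption because the data are not shown: if the labels in datos$V1 are not coded exactly as 0 and 1 (for example 1/2, or a factor with other levels), which(Y == 0) is empty, split() returns an empty list, and Y0[[v]] then fails with a subscript-out-of-bounds error. The sketch below reproduces that situation with made-up labels and shows a hypothetical checked variant of the fold builder (cv_folds_checked is not part of any package).

```r
set.seed(1)
V <- 3
Y_bad <- c(1, 2, 1, 2, 1, 2)  # made-up labels coded 1/2, so no element equals 0

# Same construction as in .cvFolds: the result is an empty list
Y0 <- split(sample(which(Y_bad == 0)), rep(1:V, length = length(which(Y_bad == 0))))
length(Y0)

# Indexing the empty list raises the error; capture its message instead of stopping
msg <- tryCatch(Y0[[1]], error = function(e) conditionMessage(e))

# Hypothetical variant that fails early with a clearer message
cv_folds_checked <- function(Y, V) {
  idx0 <- which(Y == 0)
  idx1 <- which(Y == 1)
  if (length(idx0) < V || length(idx1) < V) {
    stop("each class needs at least V observations coded as 0 and 1")
  }
  # shuffle indices explicitly to avoid sample()'s special case for a single number
  Y0 <- split(idx0[sample.int(length(idx0))], rep(1:V, length.out = length(idx0)))
  Y1 <- split(idx1[sample.int(length(idx1))], rep(1:V, length.out = length(idx1)))
  lapply(seq_len(V), function(v) c(Y0[[v]], Y1[[v]]))
}

folds <- cv_folds_checked(rep(c(0, 1), each = 9), V = 3)
lengths(folds)
```

If this is the cause, recoding the outcome to 0/1 before calling iid_example should remove the error. Running table(datos$V1) is a quick way to check.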
I get an error when I try to run this line of code:
nnetPred.model <- nnetPred(X, Y, step_size = 0.4,reg = 0.0002, h=50, niteration = 6000)
The error message is:
Error in nnetPred(X, Y, step_size = 0.4, reg = 2e-04, h = 50, niteration = 6000) :
unused arguments (step_size = 0.4, reg = 2e-04, h = 50, niteration = 6000)
My code is as below:
nnetPred <- function(X, Y, para = list()){
W <- para[[1]]
b <- para[[2]]
W2 <- para[[3]]
b2 <- para[[4]]
N <- nrow(X)
hidden_layer <- pmax(0, X%*% W + matrix(rep(b,N), nrow = N, byrow = T))
hidden_layer <- matrix(hidden_layer, nrow = N)
scores <- hidden_layer%*%W2 + matrix(rep(b2,N), nrow = N, byrow = T)
predicted_class <- apply(scores, 1, which.max)
return(predicted_class)
}
nnetPred.model <- nnetPred(X, Y, step_size = 0.4,reg = 0.0002, h=50, niteration = 6000)
It looks like you are trying to use variable arguments. In R, this means the ellipsis (...). This is how you would define the top of nnetPred to use variable arguments:
nnetPred <- function(X, Y, ...) {
  para <- list(...)
  # ... rest of the function body unchanged
This will work in your case, but is not really the best way to define that function, because it looks like you have a finite number of parameters. Only when you have an unknown number of parameters should you use variable argument lists. I would recommend simply putting your parameters in the parameter list. You can rename them if you want to:
nnetPred <- function(X, Y, step_size, reg, h, niteration) {
  W <- step_size
  b <- reg
  W2 <- h
  b2 <- niteration
  # ... rest of the function body unchanged
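To make the difference concrete, here is a minimal sketch using toy functions (f_fixed and f_dots are hypothetical names, not the asker's code). It shows why extra named arguments are rejected when the signature has neither matching names nor an ellipsis, and how list(...) collects them.

```r
# Signature with a single list argument: extra named arguments are rejected
f_fixed <- function(X, Y, para = list()) length(para)
err <- tryCatch(f_fixed(1, 2, step_size = 0.4, reg = 2e-04),
                error = function(e) conditionMessage(e))
err  # the "unused argument" message

# Passing the values as one list matches the original signature
n_para <- f_fixed(1, 2, para = list(step_size = 0.4, reg = 2e-04))

# Ellipsis version: list(...) gathers whatever named arguments are supplied
f_dots <- function(X, Y, ...) {
  para <- list(...)
  names(para)
}
arg_names <- f_dots(1, 2, step_size = 0.4, reg = 2e-04, h = 50, niteration = 6000)
arg_names
```

With the original definition, calling nnetPred(X, Y, para = list(...)) with the four values wrapped in a list would also get past the argument check. Whether those values are what the function body expects is a separate question.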
try <- emd(xt2, tt2, boundary="wave")
Error in emd(xt2, tt2, boundary = "wave") :
unused argument (boundary = "wave")
In mice.impute.polyreg.r from the R package mice (imputation for categorical response variables by the Bayesian polytomous regression model), it is mentioned that the method consists of the following steps:
1. Fit categorical response as a multinomial model
2. Compute predicted categories
3. Add appropriate noise to predictions.
In the implementation:
mice.impute.polyreg <- function(y, ry, x, nnet.maxit = 100,
nnet.trace = FALSE, nnet.maxNWts = 1500, ...) {
x <- as.matrix(x)
aug <- augment(y, ry, x, ...)
x <- aug$x
y <- aug$y
ry <- aug$ry
w <- aug$w
fy <- as.factor(y)
nc <- length(levels(fy))
un <- rep(runif(sum(!ry)), each = nc)
xy <- cbind.data.frame(y = y, x = x) # fixed SvB 6/12/2010
if (ncol(x) == 0L)
xy <- data.frame(xy, int = 1)
fit <- multinom(formula(xy), data = xy[ry,,drop = FALSE ],
weights = w[ry], maxit = nnet.maxit, trace = nnet.trace,
maxNWts = nnet.maxNWts, ...)
post <- predict(fit, xy[!ry, ], type = "probs")
if (sum(!ry) == 1)
post <- matrix(post, nrow = 1, ncol = length(post))
if (is.vector(post))
post <- matrix(c(1 - post, post), ncol = 2)
draws <- un > apply(post, 1, cumsum)
idx <- 1 + apply(draws, 2, sum)
return(levels(fy)[idx])
}
link to github code
I am able to make out the first two steps, however I can't seem to find where in the implementation "noise" has been added to the predictions. It seems that the predicted categories are returned directly as they are.
Am I missing something?
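Reading the code above, the randomness appears to enter through un <- rep(runif(sum(!ry)), each = nc). Each missing case gets one uniform draw, which is compared with the cumulative predicted probabilities, so the returned category is sampled from the predicted distribution rather than taken as the most likely class. The sketch below isolates those three lines with made-up probabilities (post and lev are illustrative values, not mice output).

```r
set.seed(1)
lev <- c("a", "b", "c")
# Made-up predicted class probabilities for two missing cases (each row sums to 1)
post <- matrix(c(0.7, 0.2, 0.1,
                 0.1, 0.1, 0.8), nrow = 2, byrow = TRUE)
nc <- ncol(post)

# One uniform draw per case, repeated nc times to line up with the columns of cum
un <- rep(runif(nrow(post)), each = nc)
cum <- apply(post, 1, cumsum)     # nc x n matrix of cumulative probabilities
draws <- un > cum                 # TRUE where the draw exceeds the cumulative probability
idx <- 1 + apply(draws, 2, sum)   # first category whose cumulative probability covers the draw
imputed <- lev[idx]
imputed

# Repeating the draw many times for the first case approximates its probabilities
freq <- table(factor(replicate(2000, {
  u <- runif(1)
  lev[1 + sum(u > cumsum(post[1, ]))]
}), levels = lev)) / 2000
freq
```

So the function does not return the predicted categories as they are: a case with probabilities (0.7, 0.2, 0.1) is imputed as the first category only about 70% of the time.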
The code pasted below from ResourceSelection::hoslem.test performs a Hosmer and Lemeshow goodness-of-fit test. While investigating why its output does not agree exactly with that produced by other software (Stata), I found that the difference comes from the default argument R uses for the quantile function (type=7). I would like to use this function with a different quantile type (type=6).
FWIW, the reference to the 9 possible methods used by R can be found at:
https://www.amherst.edu/media/view/129116/original/Sample+Quantiles.pdf
The Stata manual for pctile refers to a default method and an 'altdef' method. I found it difficult to map these two methods to corresponding R types.
However,
hoslem.test(y, yhat, type=6)
Produces:
> hl <- hoslem.test(y, yhat, type=6)
Error in hoslem.test(y, yhat, type = 6) : unused argument (type = 6)
Is there a way to run the function below with a non-default argument for the quantile function?
That is, a way to add ', type=6' to the following line:
qq <- unique(quantile(yhat, probs = seq(0, 1, 1/g), type=6))
The function in question is:
> ResourceSelection::hoslem.test
function (x, y, g = 10)
{
DNAME <- paste(deparse(substitute(x)), deparse(substitute(y)),
sep = ", ")
METHOD <- "Hosmer and Lemeshow goodness of fit (GOF) test"
yhat <- y
y <- x
qq <- unique(quantile(yhat, probs = seq(0, 1, 1/g)))
cutyhat <- cut(yhat, breaks = qq, include.lowest = TRUE)
observed <- xtabs(cbind(y0 = 1 - y, y1 = y) ~ cutyhat)
expected <- xtabs(cbind(yhat0 = 1 - yhat, yhat1 = yhat) ~
cutyhat)
chisq <- sum((observed - expected)^2/expected)
PVAL = 1 - pchisq(chisq, g - 2)
PARAMETER <- g - 2
names(chisq) <- "X-squared"
names(PARAMETER) <- "df"
structure(list(statistic = chisq, parameter = PARAMETER,
p.value = PVAL, method = METHOD, data.name = DNAME, observed = observed,
expected = expected), class = "htest")
}
We can modify pieces of functions. Look at the body of the function
as.list(body(hoslem.test))
See that the element we want to modify is the 6th element in the body
[[1]]
`{`
[[2]]
DNAME <- paste(deparse(substitute(x)), deparse(substitute(y)),
sep = ", ")
[[3]]
METHOD <- "Hosmer and Lemeshow goodness of fit (GOF) test"
[[4]]
yhat <- y
[[5]]
y <- x
[[6]]
qq <- unique(quantile(yhat, probs = seq(0, 1, 1/g)))
Modify the 6th element to what you want
body(hoslem.test)[[6]] <- substitute(qq <- unique(quantile(yhat,
    probs = seq(0, 1, 1/g), type = 6)))
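The same technique can be tried on a small toy function before touching hoslem.test. This is only a sketch: toy and toy6 are made-up names, and the element index is 2 here because the toy body is shorter.

```r
# Toy function that, like hoslem.test, calls quantile() with the default type
toy <- function(v, g = 4) {
  qq <- quantile(v, probs = seq(0, 1, 1/g))
  unname(qq)
}

as.list(body(toy))  # element 1 is `{`, element 2 is the quantile() assignment

# Patch a copy so the original stays available for comparison
toy6 <- toy
body(toy6)[[2]] <- quote(qq <- quantile(v, probs = seq(0, 1, 1/g), type = 6))

v <- c(1, 2, 4, 8, 16)
res7 <- toy(v)   # default quantile type (7)
res6 <- toy6(v)  # patched to type 6
rbind(type7 = res7, type6 = res6)
```

Working on a copy keeps the unmodified function around, which makes it easy to confirm that only the quantile step changed.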
The easiest way would be to redefine the function as your own:
myhoslem.test <- function(x, y, g = 10, mytype = 6) {
  DNAME <- paste(deparse(substitute(x)), deparse(substitute(y)),
                 sep = ", ")
  METHOD <- "Hosmer and Lemeshow goodness of fit (GOF) test"
  yhat <- y
  y <- x
  qq <- unique(quantile(yhat, probs = seq(0, 1, 1/g), type = mytype))
  cutyhat <- cut(yhat, breaks = qq, include.lowest = TRUE)
  observed <- xtabs(cbind(y0 = 1 - y, y1 = y) ~ cutyhat)
  expected <- xtabs(cbind(yhat0 = 1 - yhat, yhat1 = yhat) ~
                      cutyhat)
  chisq <- sum((observed - expected)^2/expected)
  PVAL <- 1 - pchisq(chisq, g - 2)
  PARAMETER <- g - 2
  names(chisq) <- "X-squared"
  names(PARAMETER) <- "df"
  structure(list(statistic = chisq, parameter = PARAMETER,
                 p.value = PVAL, method = METHOD, data.name = DNAME, observed = observed,
                 expected = expected), class = "htest")
}
The key change here is:
qq <- unique(quantile(yhat, probs = seq(0, 1, 1/g), type = mytype))
together with adding mytype as an argument to the function, with a default of 6.
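Combining the two answers gives a wrapper function that flexibly modifies hoslem.test:
myhoslem.test <- function(x, y, g = 10, mytype = 6) {
  # patch a local copy of hoslem.test so that quantile() uses the requested type
  body(hoslem.test)[[6]] <- substitute(qq <- unique(quantile(yhat,
    probs = seq(0, 1, 1/g), type = mytype)))
  hoslem.test(x, y, g = g)  # pass g through instead of hard-coding 10
}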