R: passing a formula into an function as the first input - r

library(RSSL)
set.seed(1)
df <- generateSlicedCookie(1000,expected=FALSE) %>%
add_missinglabels_mar(Class~.,0.98)
class_erlr <- EntropyRegularizedLogisticRegression(Class ~., df, lambda=0.01,lambda_entropy = 100)
In the EntropyRegularizedLogisticRegression function from the RSSL package, the example in the documentation passed in the formula Class ~. as the input. I was looking at the source code, and these are the parameters for the function
function (X, y, X_u = NULL, lambda = 0, lambda_entropy = 1, intercept = TRUE,
init = NA, scale = FALSE, x_center = FALSE)
I tried manually defining what X, y, X_u are based on the df I generated. But running the following gives me an error with the optimization:
y <- df$Class
X <- df[, -1]
ids <- which(is.na(y))
X_u <- X[ids, ]
class_erlr_manual <- EntropyRegularizedLogisticRegression(X = X, y = y, X_u = X_u, lambda=0.01,lambda_entropy = 100)
The error reads:
Error in optim(w, fn = loss_erlr, gr = grad_erlr, X, y, X_u, lambda = lambda, :
initial value in 'vmmin' is not finite
Why does changing the formula input Class ~. into X=X, y =y, X_u = X_u result in an error? Can anyone point me to where in the source code the formula input is being used?

Related

How to prepare variables for nnet classification/predict in R?

In the classification I use the variable x as the value and y as the labels. As here in the example for randomForest:
iris_train_values <- iris[,c(1:4)]
iris_train_labels <- iris[,5]
model_RF <- randomForest(x = iris_train_values, y = iris_train_labels, importance = TRUE,
replace = TRUE, mtry = 4, ntree = 500, na.action=na.omit,
do.trace = 100, type = "classification")
This solution works for many classifiers, however when I try to do it in nnet and get error:
model_nnet <- nnet(x = iris_train_values, y = iris_train_labels, size = 1, decay = 0.1)
Error in nnet.default(x = iris_train_values, y = iris_train_labels, size = 1, :
NA/NaN/Inf in foreign function call (arg 2)
In addition: Warning message:
In nnet.default(x = iris_train_values, y = iris_train_labels, size = 1, :
NAs introduced by coercion
Or on another data set gets an error:
Error in y - tmp : non-numeric argument to binary operator
How should I change the variables to classify?
The formula syntax works:
library(nnet)
model_nnet <- nnet(Species ~ ., data = iris, size = 1)
But the matrix syntax does not:
nnet::nnet(x = iris_train_values, y = as.matrix(iris_train_labels), size = 1)
I don't understand why this doesn't work, but at least there is a work around.
predict works fine with the formula syntax:
?predict.nnet
predict(model_nnet,
iris[c(1,51,101), 1:4],
type = "class") # true classese are ['setosa', 'versicolor', 'virginica']

Optim inside function: promise already under evaluation

The are a few topics about the error message "promise already under evaluation" and scoping. However it doesn't seem to be the case here. I trying to use optim inside other function. To reproduce the same error I put a minimal example bellow. Is there a way to avoid this?
set.seed(123)
df_ss = data.frame(var1 = rnorm(100),
var2 = rnorm(100),
var3 = rnorm(100),
var4 = rnorm(100))
test <- function(df_ss = df_ss, degree = 3, raw = TRUE, ...){
# objective function
objective <- function(beta, df_ss = df_ss) {
op2 <- lm(formula = I(var1 - beta*var2) ~ poly(I(var3 - beta*var2), degree = degree), data = df_ss)
return (sum(residuals(op2)^2))
}
ss_reg <- optim(1, fn = objective , method ="Brent",lower =-1, upper =1)
}
test()
Error in model.frame.default(formula = I(var1 - beta * var2) ~ poly(I(var3 - : promise already under evaluation: recursive default argument reference or earlier problems?
The following works. Avoid doing df_ss = df_ss.
test <- function(dat = df_ss, degree = 3, raw = TRUE, ...){
# objective function
objective <- function(beta) {
op2 <- lm(formula = I(var1 - beta*var2) ~ poly(I(var3 - beta*var2), degree = degree, raw = raw), data = dat)
return (sum(residuals(op2)^2))
}
ss_reg <- optim(1, fn = objective , method ="Brent",lower =-1, upper =1)
}
result <- test()
result
# par
# [1] -0.03866607
#
# value
# [1] 80.22191

Error in VAR: Different row size of y and exogen

I am attempting a VAR model in R with an exogenous variable on:
vndata <- read.csv("vndata.txt", sep="")
names(vndata)
da <- data.frame(vndata[2:dim(vndata),])
# STOCK PRICE MODEL
y <- da[, c("irate", "stockp", "mrate", "frate")]
x <- data.frame(da[, c("cdi")])
library("vars")
VARselect(y, lag.max = 8,exogen = x)
var1 <- restrict(VAR(y, p = 2,exogen = x),method = c("ser"),thresh = 1.56)
Then, I want to plot the impulse response function:
plot(irf(var1, impulse = c("irate"), response = c("frate"), boot = T,
cumulative = FALSE,n.ahead = 20))
however, it produces the warning:
Error in VAR(y = ysampled, p = 2, exogen = x) :
Different row size of y and exogen.
I can not figure what happen. I have use dim() to make sure that y and x have the same row size.
Try this, it worked for me:
.GlobalEnv$exogen <- x
VARselect(y, lag.max = 8,exogen = .GlobalEnv$exogen)

Running existing function with non-default option

The code pasted below from ResourceSelection::hoslem.test performs a Hosmer and Lemeshow goodness of fit test. While investigating why the output that does not agree exactly with that performed by another software (Stata), I found that the difference relates to use of default R argument for the quantile function (type=7). I would like to use this function with a different default for calculation of quantiles (type=6).
FWIW, the reference to the 9 possible methods used by R can be found at:
https://www.amherst.edu/media/view/129116/original/Sample+Quantiles.pdf
The Stata manual for pctile refers to a default method and an 'altdef' method. I found it difficult to map these two methods to corresponding R types.
However,
hoslem.test(yhat, y, type=6)
Produces:
> hl <- hoslem.test(y, yhat, type=6)
Error in hoslem.test(y, yhat, type = 6) : unused argument (type = 6)
Is there a way to run the function below with a non-default argument for the quantile function?
Ie. allows the following line adding ', type=6':
qq <- unique(quantile(yhat, probs = seq(0, 1, 1/g), type=6))
The function in question is:
> ResourceSelection::hoslem.test
function (x, y, g = 10)
{
DNAME <- paste(deparse(substitute(x)), deparse(substitute(y)),
sep = ", ")
METHOD <- "Hosmer and Lemeshow goodness of fit (GOF) test"
yhat <- y
y <- x
qq <- unique(quantile(yhat, probs = seq(0, 1, 1/g)))
cutyhat <- cut(yhat, breaks = qq, include.lowest = TRUE)
observed <- xtabs(cbind(y0 = 1 - y, y1 = y) ~ cutyhat)
expected <- xtabs(cbind(yhat0 = 1 - yhat, yhat1 = yhat) ~
cutyhat)
chisq <- sum((observed - expected)^2/expected)
PVAL = 1 - pchisq(chisq, g - 2)
PARAMETER <- g - 2
names(chisq) <- "X-squared"
names(PARAMETER) <- "df"
structure(list(statistic = chisq, parameter = PARAMETER,
p.value = PVAL, method = METHOD, data.name = DNAME, observed = observed,
expected = expected), class = "htest")
}
We can modify pieces of functions. Look at the body of the function
as.list(body(hoslem.test))
See that the element we want to modify is the 6th element in the body
[[1]]
`{`
[[2]]
DNAME <- paste(deparse(substitute(x)), deparse(substitute(y)),
sep = ", ")
[[3]]
METHOD <- "Hosmer and Lemeshow goodness of fit (GOF) test"
[[4]]
yhat <- y
[[5]]
y <- x
[[6]]
qq <- unique(quantile(yhat, probs = seq(0, 1, 1/g)))
Modify the 6th element to what you want
body(hoslem.test)[[6]] = substitute(qq <- unique(quantile(yhat,
probs = seq(0, 1, 1/g), type = 6)))
The easiest way would be to reenter the function as your own:
myhoslem.test<-function(x, y, g = 10, mytype = 6){
DNAME <- paste(deparse(substitute(x)), deparse(substitute(y)),
sep = ", ")
METHOD <- "Hosmer and Lemeshow goodness of fit (GOF) test"
yhat <- y
y <- x
qq <- unique(quantile(yhat, probs = seq(0, 1, 1/g), type = mytype))
cutyhat <- cut(yhat, breaks = qq, include.lowest = TRUE)
observed <- xtabs(cbind(y0 = 1 - y, y1 = y) ~ cutyhat)
expected <- xtabs(cbind(yhat0 = 1 - yhat, yhat1 = yhat) ~
cutyhat)
chisq <- sum((observed - expected)^2/expected)
PVAL = 1 - pchisq(chisq, g - 2)
PARAMETER <- g - 2
names(chisq) <- "X-squared"
names(PARAMETER) <- "df"
structure(list(statistic = chisq, parameter = PARAMETER,
p.value = PVAL, method = METHOD, data.name = DNAME, observed = observed,
expected = expected), class = "htest")
}
The key change here is :
qq <- unique(quantile(yhat, probs = seq(0, 1, 1/g), type = mytype))
and allowing mytype as a argument to the function with default as 6
The two answers suggest a wrapper function to flexibly modify hoslem.test
myhoslem.test<-function(x, y, g = 10, mytype = 6){
body(hoslem.test)[[6]] = substitute(qq <- unique(quantile(yhat,
probs = seq(0, 1, 1/g), type = mytype)))
hoslem.test(x,y, g=10)
}

Create R Function with flexibility to reference different datasets

I am trying to create a simple function in R that can reference multiple datasets and multiple variable names. Using the following code, I get an error, which I believe is due to referencing:
set.seed(123)
dat1 <- data.frame(x = sample(10), y = sample(10), z = sample(10))
dat2 <- data.frame(x = sample(10), y = sample(10), z = sample(10))
table(dat1$x, dat1$y)
table(dat2$x, dat2$y)
fun <- function(dat, sig, range){print(table(dat$sig, dat$range))}
fun(dat = dat1, sig = x, range = y)
fun(dat = dat2, sig = x, range = y)
Any idea how to adjust this code so that it can return the table appropriately?
The [[ ]] operator on data frame is similar to $ but allows you to introduce an object and look for it's value. Then outside of the function you assign "x" value to sig. if you don't put quotes there R will look for x object
fun <- function(dat, sig, range){print(table(dat[[sig]], dat[[range]]))}
fun(dat = dat1, sig = "x", range = "y")
fun(dat = dat2, sig = "x", range = "y")

Resources