Trying to use the exact=TRUE feature in R glmnet

I am trying to use the exact=TRUE feature in glmnet, but I am getting an error message.
> fit = glmnet(as.matrix(((x_values))), (as.matrix(y_values)),penalty=variable.list$penalty)
> coef.exact = coef(fit, s = 0.03, exact = TRUE)
Error: used coef.glmnet() or predict.glmnet() with `exact=TRUE` so must in addition supply original argument(s) x and y and penalty.factor in order to safely rerun glmnet
How can I supply penalty.factor to coef.exact?
Options tried:
> coef.exact = coef(as.matrix(((x_values))), (as.matrix(y_values)),penalty=variable.list$penalty, s = 0.03, exact = TRUE)
Error: $ operator is invalid for atomic vectors
>
> coef.exact = coef((as.matrix(((x_values))), (as.matrix(y_values)),penalty=variable.list$penalty), s = 0.03, exact = TRUE)
Error: unexpected ',' in "coef.exact = coef((as.matrix(((x_values))),"
>
> coef.exact = coef((as.matrix(((x_values))) (as.matrix(y_values)) penalty=variable.list$penalty), s = 0.03, exact = TRUE)
Error: unexpected symbol in "coef.exact = coef((as.matrix(((x_values))) (as.matrix(y_values)) penalty"
>
> coef.exact = coef(fit(as.matrix(((x_values))), (as.matrix(y_values)),penalty=variable.list$penalty), s = 0.03, exact = TRUE)
Error in fit(as.matrix(((x_values))), (as.matrix(y_values)), penalty = variable.list$penalty) :
could not find function "fit"
>
> coef.exact = coef(glmnet(as.matrix(((x_values))), (as.matrix(y_values)),penalty=variable.list$penalty), s = 0.03, exact = TRUE)
Error: used coef.glmnet() or predict.glmnet() with `exact=TRUE` so must in addition supply original argument(s) x and y and penalty.factor in order to safely rerun glmnet
>

Here is an example using mtcars as sample data. Note that it's always advisable to provide a minimal & reproducible example, including sample data, when posting on SO.
library(glmnet)

# Fit mpg ~ wt + disp
x <- as.matrix(mtcars[c("wt", "disp")])
y <- mtcars[, "mpg"]
fit <- glmnet(x, y, penalty.factor = 0.1)

# s is our regularisation parameter, and since we want exact results
# for s = 0.035, we need to refit the model using the full data (x, y)
coef.exact <- coef(fit, s = 0.035, exact = TRUE, x = x, y = y, penalty.factor = 0.1)
coef.exact
#3 x 1 sparse Matrix of class "dgCMatrix"
#                       1
#(Intercept) 34.40289989
#wt          -3.00225110
#disp        -0.02016836
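For comparison, leaving exact at its default FALSE makes coef() use linear interpolation on the lambda sequence stored in the fit, so no extra arguments are needed:
# Approximate coefficients: linear interpolation on the stored lambda path
coef.approx <- coef(fit, s = 0.035, exact = FALSE)
coef.approx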
The reason why you explicitly need to provide x and y again is given in ?coef.glmnet (also see @FelipeAlvarenga's post).
So in your case, the following should work:
fit <- glmnet(x = as.matrix(x_values), y = y_values,
              penalty.factor = variable.list$penalty)
coef.exact <- coef(fit,
                   s = 0.03,
                   exact = TRUE,
                   x = as.matrix(x_values),
                   y = y_values,
                   penalty.factor = variable.list$penalty)
Some comments
Perhaps the confusion arises from the difference between the model's overall regularisation parameter (s or lambda) and the penalty.factor values that can be applied to each coefficient. The latter allows for differential regularisation of individual parameters, whereas s controls the effect of the overall L1/L2 regularisation.
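To make that distinction concrete, here is a minimal sketch building on the mtcars example above (the c(2, 1) factors are illustrative only): penalty.factor rescales the penalty per coefficient, while s picks the overall lambda at which coefficients are extracted.
# Sketch: wt is penalised twice as hard as disp; a factor of 0 would
# leave a coefficient unpenalised. s still sets the overall lambda.
fit2 <- glmnet(x, y, penalty.factor = c(2, 1))
coef(fit2, s = 0.035, exact = TRUE, x = x, y = y, penalty.factor = c(2, 1))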

In coef the parameter s corresponds to the penalty parameter. In the help files:

s: Value(s) of the penalty parameter lambda at which predictions are required. Default is the entire sequence used to create the model. [...] With exact=TRUE, these different values of s are merged (and sorted) with object$lambda, and the model is refit before predictions are made. In this case, it is required to supply the original data x= and y= as additional named arguments to predict() or coef(). The workhorse predict.glmnet() needs to update the model, and so needs the data used to create it. The same is true of weights, offset, penalty.factor, lower.limits, upper.limits if these were used in the original call. Failure to do so will result in an error.
Therefore, to use exact = TRUE you must supply your original penalties, x, y, and any other arguments you passed in the original call.
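The same applies to the other arguments listed in the help text. For instance, if the original fit had used observation weights, a hypothetical refit would look like this (w stands in for whatever weights were used originally):
# Hypothetical: a model fit with weights also needs them at exact = TRUE
w <- rep(1, nrow(as.matrix(x_values)))  # placeholder weights
fit.w <- glmnet(as.matrix(x_values), y_values, weights = w,
                penalty.factor = variable.list$penalty)
coef(fit.w, s = 0.03, exact = TRUE,
     x = as.matrix(x_values), y = y_values,
     weights = w, penalty.factor = variable.list$penalty)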

Related

What causes improper input parameters when using nlsLM from minpack.lm?

I'm trying to fit a nonlinear curve to three data points. Later on, I'll need to integrate this snippet into a larger piece of software that fits the curve to these three points automatically. As can be seen below, I'm trying to estimate a curve of the form a*x^power1 + b*x^power2. I know that the function 0.666*x^(-0.18) - 0.016*x^0.36 satisfies these points. However, for some reason I am not able to reproduce it using nlsLM() from minpack.lm. No matter what combination I try for the start parameter, I end up with the same warning message:
Warning message: In nls.lm(par = start, fn = FCT, jac = jac, control = control, lower = lower, : lmdif: info = 0. Improper input parameters.
And even though it is "only" a warning message, it seems to break my code entirely. Because of the improper input parameters, the variable m to which I assign the result gets corrupted, and nothing that involves m works afterwards.
Here is the reproducible example:
library(ggplot2)
library(minpack.lm)
dataset <- read.table(text='
x y
1 0.1 1
2 30 0.3
3 1000 0', header=T)
ds <- data.frame(dataset)
str(ds)
plot(ds, main = "bla")
nlmInitial <- c(a = 0.5, power1 = -0.2, b = -0.02, power2 = 0.3)
m <- nlsLM(y ~ a*I(x^power1) + b*I(x^power2),
           data = ds,
           start = nlmInitial,
           trace = T)
summary(m)$coefficients
You are trying to estimate too many coefficients with too few observations. You say that 0.666*x^(-0.18) - 0.016*x^0.36 is a solution. R comes to:
m <- nlsLM(y ~ 0.666*I(x^power1) + b*I(x^power2), data = ds, trace = T,
           start = c(power1 = -0.2, b = -0.02, power2 = 0.3))
which gives 0.666*x^(-0.18053) - 0.01975*x^0.32879. But
m <- nlsLM(y ~ 0.7*I(x^power1) + b*I(x^power2), data = ds, trace = T,
           start = c(power1 = -0.2, b = -0.02, power2 = 0.3))
which gives 0.7*x^(-0.16599) - 0.04428*x^0.23363, is also a solution.
So you either have to increase the number of observations or reduce the number of coefficients to estimate.
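To see the first option in action, here is a sketch with simulated data (not the OP's three points): with 50 noisy observations drawn from the known curve, the full four-parameter model becomes estimable.
# Sketch: more observations make all four coefficients identifiable
set.seed(1)
ds2 <- data.frame(x = seq(0.1, 1000, length.out = 50))
ds2$y <- 0.666*ds2$x^(-0.18) - 0.016*ds2$x^0.36 + rnorm(50, sd = 0.005)
m2 <- nlsLM(y ~ a*I(x^power1) + b*I(x^power2),
            data = ds2,
            start = c(a = 0.5, power1 = -0.2, b = -0.02, power2 = 0.3))
coef(m2)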

Meaning of "trait" in MCMCglmm

Like in this post, I'm struggling with the notation of MCMCglmm, especially with what is meant by trait. My code is the following:
library("MCMCglmm")
set.seed(123)
y <- sample(letters[1:3], size = 100, replace = TRUE)
x <- rnorm(100)
id <- rep(1:10, each = 10)
dat <- data.frame(y, x, id)
mod <- MCMCglmm(fixed = y ~ x, random = ~us(x):id,
data = dat,
family = "categorical")
This gives me the error message For error structures involving catgeorical data with more than 2 categories pleasue use trait:units or variance.function(trait):units. (sic!). If I generate dichotomous data via letters[1:2], everything works fine. So what is meant by this error message in general, and by "trait" in particular?
Edit 2016-09-29:
From the linked question I copied rcov = ~ us(trait):units into my call of MCMCglmm, and from https://stat.ethz.ch/pipermail/r-sig-mixed-models/2010q3/004006.html I took (and slightly modified) the prior
prior <- list(R = list(V = diag(2), fix = 1), G = list(G1 = list(V = diag(2), nu = 1, alpha.mu = c(0, 0), alpha.V = diag(2) * 100)))
Now my model actually gives results:
MCMCglmm(fixed = y ~ 1 + x, random = ~us(1 + x):id,
         rcov = ~ us(trait):units, prior = prior, data = dat,
         family = "categorical")
But I still lack an understanding of what is meant by trait (and what by units, and the notation of the prior, and how us() compares to idh(), and ...).
Edit 2016-11-17:
I think trait is synonymous with "target variable" or "response" in general, or y in this case. In the formula for random there is nothing on the left side of ~ "because the response is known from the fixed effect specification." So the rationale behind rcov requiring trait:units could be that the fixed formula already defines what trait is (y in this case).
units is the response variable value, and trait is the response variable name, which corresponds to the categories. By specifying rcov = ~us(trait):units, you are allowing the residual variance to be heterogeneous across "traits" (response categories) so that all elements of the residual variance-covariance matrix will be estimated.
In Section 5.1 of Hadfield's MCMCglmm Course Notes (vignette("CourseNotes", "MCMCglmm")) you can read an explanation for the reserved variables trait and units.
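As a rough illustration of the us() versus idh() question (a sketch only, reusing dat and prior from above; not guaranteed to mix well without tuning): us(trait):units estimates the full residual (co)variance matrix across response categories, while idh(trait):units estimates a separate variance per category but fixes the covariances at zero.
# us(trait):units  -> full residual (co)variance matrix across categories
# idh(trait):units -> heterogeneous variances, covariances fixed at zero
mod.idh <- MCMCglmm(fixed = y ~ 1 + x, random = ~us(1 + x):id,
                    rcov = ~ idh(trait):units, prior = prior,
                    data = dat, family = "categorical")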

Piecewise linear regression in R (segmented.lm)

I appreciate any help to make segmented.lm (or any other function) find the obvious breakpoints in this example:
data = list(x=c(50,60,70,80,90) , y= c(703.786,705.857,708.153,711.056,709.257))
plot(data, type='b')
require(segmented)
model.lm = segmented(lm(y~x,data = data),seg.Z = ~x, psi = NA)
It returns with the following error:
Error in solve.default(crossprod(x1), crossprod(x1, y1)) :
system is computationally singular: reciprocal condition number = 1.51417e-20
If I change K:
model.lm = segmented(lm(y~x,data = data),seg.Z = ~x, psi = NA, control = seg.control(K=1))
I get another error:
Error in segmented.lm(lm(y ~ x, data = data), seg.Z = ~x, psi = NA, control = seg.control(K = 1)) :
only 1 datum in an interval: breakpoint(s) at the boundary or too close each other
A nice objective method to determine the break point is described in Crawley (2007: 427).
First, define a vector breaks for a range of potential break points:
breaks <- data$x[data$x >= 70 & data$x <= 90]
Then run a for loop fitting a piecewise regression for each potential break point and extract the residual standard error of each model from the summary output (collected here in a vector named mse):
mse <- numeric(length(breaks))
for (i in 1:length(breaks)) {
  # Regress y on x separately on each side of the candidate break point
  piecewise <- lm(data$y ~ data$x*(data$x < breaks[i]) +
                  data$x*(data$x >= breaks[i]))
  mse[i] <- summary(piecewise)$sigma
}
Finally, identify the break point with the least mse:
breaks[which(mse==min(mse))]
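As a final step you can refit the winning model and inspect it (same formula as in the loop, with the chosen break point):
# Refit the piecewise regression at the best break point
best <- breaks[which.min(mse)]
piecewise.best <- lm(data$y ~ data$x*(data$x < best) +
                     data$x*(data$x >= best))
summary(piecewise.best)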
Hope this helps.

Cross-validating a CART model

In an assignment, we are asked to perform a cross-validation on a CART model. I have tried using the cvFit function from cvTools but got a strange error message. Here's a minimal example:
library(rpart)
library(cvTools)
data(iris)
cvFit(rpart(formula=Species~., data=iris))
The error I'm seeing is:
Error in nobs(y) : argument "y" is missing, with no default
And the traceback():
5: nobs(y)
4: cvFit.call(call, data = data, x = x, y = y, cost = cost, K = K,
R = R, foldType = foldType, folds = folds, names = names,
predictArgs = predictArgs, costArgs = costArgs, envir = envir,
seed = seed)
3: cvFit(call, data = data, x = x, y = y, cost = cost, K = K, R = R,
foldType = foldType, folds = folds, names = names, predictArgs = predictArgs,
costArgs = costArgs, envir = envir, seed = seed)
2: cvFit.default(rpart(formula = Species ~ ., data = iris))
1: cvFit(rpart(formula = Species ~ ., data = iris))
It looks like y is mandatory for cvFit.default. But:
> cvFit(rpart(formula=Species~., data=iris), y=iris$Species)
Error in cvFit.call(call, data = data, x = x, y = y, cost = cost, K = K, :
'x' must have 0 observations
What am I doing wrong? Which package would allow me to do a cross-validation with a CART tree without having to code it myself? (I am sooo lazy...)
The caret package makes cross validation a snap:
> library(caret)
> data(iris)
> tc <- trainControl("cv",10)
> rpart.grid <- expand.grid(.cp=0.2)
>
> (train.rpart <- train(Species ~., data=iris, method="rpart",trControl=tc,tuneGrid=rpart.grid))
150 samples
4 predictors
3 classes: 'setosa', 'versicolor', 'virginica'
No pre-processing
Resampling: Cross-Validation (10 fold)
Summary of sample sizes: 135, 135, 135, 135, 135, 135, ...
Resampling results
  Accuracy  Kappa  Accuracy SD  Kappa SD
  0.94      0.91   0.0798       0.12
Tuning parameter 'cp' was held constant at a value of 0.2
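Predictions from the tuned model then go through caret's predict method, e.g.:
# Class predictions from the cross-validated caret model
head(predict(train.rpart, newdata = iris))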
Finally, I was able to get it to work. As Joran noted, the cost parameter needs to be adapted. In my case I am using 0/1 loss, which means a simple cost function that evaluates != instead of - between y and yHat. Also, predictArgs must include c(type='class'); otherwise the predict call used internally returns class probabilities instead of the most probable classification. To sum up:
library(rpart)
library(cvTools)
data(iris)
cvFit(rpart, formula = Species ~ ., data = iris,
      cost = function(y, yHat) (y != yHat) + 0, predictArgs = c(type = 'class'))
(This uses another variant of cvFit. Additional args to rpart can be passed by setting the args= parameter.)

How to save estimated parameters from nigFit() in a variable

I want to automatically fit time series returns to a NIG distribution.
With nigFit() from the package fBasics I estimate the mu, alpha, beta and delta of the distribution.
> nigFit(histDailyReturns,doplot=FALSE,trace=FALSE)
Title:
Normal Inverse Gaussian Parameter Estimation
Call:
.nigFit.mle(x = x, alpha = alpha, beta = beta, delta = delta,
mu = mu, scale = scale, doplot = doplot, span = span, trace = trace,
title = title, description = description)
Model:
Normal Inverse Gaussian Distribution
Estimated Parameter(s):
alpha beta delta mu
48.379735861 -1.648483055 0.012361539 0.001125734
This works fine, meaning that nigFit prints my parameters.
However, I would like to save the estimated parameters in variables so I can use them later.
> variable = nigFit(histDailyReturns, doplot = FALSE, trace = FALSE)
This doesn't work out: 'variable' is an S4 object of class fDISTFIT, and printing the variable just reproduces the output of nigFit shown above.
I tried the following notations, to get just one parameter:
> variable$alpha
> variable.alpha
> variable[1]
I couldn't find an answer in the documentation of nigFit().
Is it possible to save the estimated parameters in variables? How does it work?
Access the output components using @. variable has different slots; get their names using slotNames(). Using the example from the documentation:
set.seed(1953)
s <- rnig(n = 1000, alpha = 1.5, beta = 0.3, delta = 0.5, mu = -1.0)
a <- nigFit(s, alpha = 1, beta = 0, delta = 1, mu = mean(s), doplot = TRUE)
slotNames(a)
[1] "call" "model" "data" "fit" "title"
[6] "description"
# `fit` is a list with all the goodies. You're looking for the vector `estimate`:
a@fit$estimate
alpha beta delta mu
1.6959724 0.3597794 0.5601027 -1.0446402
Examine the structure of the output object using str(variable):
> variable@fit$par[["alpha"]]
[1] 48.379735861
> variable@fit$par[["beta"]]
[1] -1.648483055
> variable@fit$par[["delta"]]
[1] 0.012361539
> variable@fit$par[["mu"]]
[1] 0.001125734
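If the goal is to automate this for many return series, a small hypothetical wrapper could return the estimates as a plain named vector (whether the fitted values sit in $estimate or $par may depend on the fBasics version, as the two answers above suggest):
# Hypothetical helper: fit a NIG distribution and return the parameters
nigParams <- function(returns) {
  fit <- nigFit(returns, doplot = FALSE, trace = FALSE)
  est <- fit@fit$estimate
  if (is.null(est)) est <- unlist(fit@fit$par)  # fallback, see answers above
  est
}
nigParams(s)  # using the simulated sample `s` from above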
