Issue running glmnet() for mtcars dataset

Whenever I run glmnet(mpg ~ ., data = mtcars, alpha=1) (from the glmnet package) I get the following error:
"Error in glmnet(mpg ~ ., data = mtcars, alpha = 1) : unused argument (data = mtcars)"
Any ideas for how to deal with this?
I think it's because the glmnet() function is supposed to take x and y as separate arguments. If I need separate x and y arguments, how would I write the call so that glmnet::glmnet() runs for all variables of mtcars?

As the commenter suggests, glmnet() has no formula interface: pass the predictors as a numeric matrix x and the response as a vector y, like so:
fit <- glmnet(as.matrix(mtcars[-1]), mtcars$mpg, alpha=1)
plot(fit)
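If you would rather keep the formula, one common approach is to let model.matrix() expand it into the numeric matrix that glmnet() expects. This is only a minimal sketch: the glmnet call is guarded in case the package is not installed, and for mtcars (all numeric columns) the result is the same matrix as as.matrix(mtcars[-1]). The approach pays off mainly when the data frame contains factors that need dummy coding.

```r
# Expand the formula into a numeric design matrix and drop the intercept
# column, since glmnet() fits its own intercept by default.
x <- model.matrix(mpg ~ ., data = mtcars)[, -1]
y <- mtcars$mpg

# Only fit if glmnet is available; alpha = 1 requests the lasso penalty.
if (requireNamespace("glmnet", quietly = TRUE)) {
  fit <- glmnet::glmnet(x, y, alpha = 1)
}
```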

Related

Passing offset inside custom regression function

I am trying to create a custom regression function for running multiple models (simplified here for detail). However, I am unable to pass an offset into this function. I am aware that this can be done inside of a formula but for this particular use case it must be an optional parameter.
Here is what I have:
fit_glm <- function(formula, df,
                    model_offset = NULL,
                    family = quasipoisson) {
  fit <- glm(formula, data = df,
             offset = model_offset,
             family = family)
  return(fit)
}
data(mtcars)
fit_glm(mpg ~ hp, df = mtcars)
Whenever I run this I am met with Error in eval(extras, data, env) : object 'model_offset' not found. Perhaps I am missing something very simple.
Performing this call using:
glm(mpg ~ hp, data = mtcars, family = quasipoisson, offset = NULL)
works perfectly fine. I want the function to handle models both with and without an offset; currently neither works.
Any help is much appreciated. TIA.
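One possible fix, sketched here under the assumption that the error comes from glm() evaluating offset in the data frame and then in the formula's environment (the global environment in the call above), never in the wrapper's own frame. Re-pointing the formula's environment at the function frame lets the lookup succeed. The name fit_glm2 and the log(wt) offset are purely illustrative.

```r
fit_glm2 <- function(formula, df, model_offset = NULL, family = quasipoisson) {
  # Assumption: glm() looks up `offset` in `df`, then in environment(formula).
  # Pointing that environment at this frame makes `model_offset` visible.
  environment(formula) <- environment()
  glm(formula, data = df, offset = model_offset, family = family)
}

# Without an offset (model_offset stays NULL)
fit_no_offset <- fit_glm2(mpg ~ hp, df = mtcars)

# With an offset supplied as a vector (illustrative choice only)
fit_with_offset <- fit_glm2(mpg ~ hp, df = mtcars, model_offset = log(mtcars$wt))
```

Note the trade-off: any variable named in the formula that is not a column of df will now be looked up in the function frame first, not in the environment where the formula was written.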

glm fit with iris invalid first argument, must be vector (list or atomic)

I have the following working code
glm.fit <- glm(Income ~ .,data=train,family=binomial)
summary(glm.fit)
However, I have some questions about it, so to make them reproducible I tried to rewrite the code using the iris data set.
I tried
cf<-iris
glm.fit(Petal.Width ~ ., cf, family = binomial)
but I get an error
Error in dim(data) <- dim : invalid first argument, must be vector (list or atomic)
[Update]
I see the data I expect using the following
library(dplyr)
cf<-iris
cf %>% head(10)
There are some issues with your code.
First, there's no need to create the variable cf. You can just use iris.
Second, glm.fit takes as its first 2 arguments x and y. From the documentation, accessible at ?glm.fit:
For glm.fit: x is a design matrix of dimension n * p, and y is a vector of observations of length n.
Your first line of code uses glm to create a variable named glm.fit - this is not the same as the function of that name.
If you want to use glm, that function can take a formula and the name of a data frame as arguments. So this works:
glm(Petal.Width ~ ., data = iris)
But this gives an error:
glm(Petal.Width ~ ., data = iris, family = binomial)
Error in eval(family$initialize) : y values must be 0 <= y <= 1
That's because the response variable, Petal.Width, is continuous. You use the binomial family when the response takes 2 values (yes/no, 0/1, true/false).
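To make the difference between the two interfaces concrete, here is a small sketch using a binary response built from iris. The column name is_virginica and the choice of Sepal.Length as the predictor are assumptions made for illustration, not part of the original question.

```r
# Binary response: is the flower virginica (1) or not (0)?
iris2 <- transform(iris, is_virginica = as.numeric(Species == "virginica"))

# Formula interface: glm() builds the design matrix itself.
fit_formula <- glm(is_virginica ~ Sepal.Length, data = iris2, family = binomial)

# Matrix interface: glm.fit() needs the design matrix x and the response y.
x <- model.matrix(~ Sepal.Length, data = iris2)
fit_matrix <- glm.fit(x = x, y = iris2$is_virginica, family = binomial())

# Compare the coefficient estimates from the two routes.
all.equal(coef(fit_formula), fit_matrix$coefficients)
```

glm() is usually the more convenient entry point. glm.fit() is the lower-level routine that glm() calls, and it returns a plain list, not a full "glm" object.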

Calculating piecewise quantile linear regression with segmented package R

I am looking for a way to fit a piecewise (segmented) quantile linear regression in R. I have been able to compute the quantile regression with the quantreg package. However, I don't want a single slope; I want to check for breakpoints in my dataset. I have seen that the segmented package can do this. While it works well when the fit comes from lm or glm (as in the example below), it does not work for quantile regression.
On the segmented package info I have read that there is a segmented.default which can be used for specific regression models, such as Quantiles. However, when I apply it for my quantile outcome it gives me the following errors:
Error in diag(vv) : invalid 'nrow' value (too large or NA)
In addition: Warning message:
cannot compute the covariance matrix
If, instead of K=2, I specify psi, I get a different error:
Error in rq.fit.br(x, y, tau = tau, ...) : Singular design matrix
I have created an example with the mtcars data so you can see the errors that I get.
library(quantreg)
library(segmented)
data(mtcars)
out.rq <- rq(mpg ~ wt, data= mtcars)
out.lm <- lm(mpg ~ wt, data= mtcars)
# Plotting the results
plot(mpg ~ wt, data = mtcars, pch = 1, main = "mpg ~ wt")
abline(out.lm, col = "red", lty = 2)
abline(out.rq, col = "blue", lty = 2)
legend("topright", legend = c("linear", "quantile"), col = c("red", "blue"), lty = 2)
#Generating segmented LM
o <- segmented(out.lm, seg.Z= ~wt, npsi=2, control=seg.control(display=FALSE))
plot(o, lwd=2, col=2:6, main="Segmented regression", res=FALSE) #lwd: line width #col: from 2 to 6 #RES: show datapoints
#Generating segmented Quantile
#using K=2
o.quantile <- segmented.default(out.rq, seg.Z= ~wt, control=seg.control(display=FALSE, K=2))
# using psi
o.quantile <- segmented.default(out.rq, seg.Z= ~wt, psi=list(wt=c(2,4)), control=seg.control(display=FALSE))
I came across this post after a long time because I have the same issue. Just in case others might be stuck with the problem in the future, I wanted to point out what the problem is.
I examined "segmented.default". There is a line in the source code as follows:
Cov <- try(vcov(objF), silent = TRUE)
vcov is used to calculate the covariance matrix but does not work for quantile regression object objF. To get the covariance matrix for quantile regression, you need:
summary(objF,se="boot",cov=TRUE)$cov
Here, I used bootstrap method to compute the covariance matrix by selecting se="boot" but you should choose the appropriate method for you. Check ?summary.rq then "se" section for different methods.
Additionally, you need to assign the row/column names as follows:
dimnames(Cov)[[1]] <- dimnames(Cov)[[2]] <- unlist(attributes(objF$coef))
After modifying the function, it worked for me.
Maybe the other answer isn't particularly clean, as you need to modify a package function.
Additionally, maybe boot isn't such a good idea for SEs, according to this answer.
To get it working a bit easier, add a function to your workspace:
vcov.rq <- function(object, ...) {
  result <- summary(object, se = "nid", covariance = TRUE)$cov
  rownames(result) <- colnames(result) <- names(coef(object))
  return(result)
}
Caveats from the Cross-Validated link apply.

How to plot a SVM model in R [duplicate]

I am trying to plot my svm model.
library(foreign)
library(e1071)
x <- read.arff("contact-lenses.arff")
#alt: x <- read.arff("http://storm.cis.fordham.edu/~gweiss/data-mining/weka-data/contact-lenses.arff")
model <- svm(`contact-lenses` ~ . , data = x, type = "C-classification", kernel = "linear")
The contact lens arff is the inbuilt data file in weka.
However, I now run into an error when trying to plot the model.
plot(model, x)
Error in plot.svm(model, x) : missing formula.
The problem is that your model has multiple covariates. plot() will only run automatically if your data= argument has exactly three columns (one of which is the response). For example, from the ?plot.svm help page, you can call
data(cats, package = "MASS")
m1 <- svm(Sex~., data = cats)
plot(m1, cats)
So since you can only show two dimensions on a plot, you need to specify what you want to use for x and y when you have more than one to choose from
cplus<-cats
cplus$Oth<-rnorm(nrow(cplus))
m2 <- svm(Sex~., data = cplus)
plot(m2, cplus) #error
plot(m2, cplus, Bwt~Hwt) #Ok
plot(m2, cplus, Hwt~Oth) #Ok
So that's why you're getting the "Missing Formula" error.
There is another catch as well. The plot.svm will only plot continuous variables along the x and y axes. The contact-lenses data.frame has only categorical variables. The plot.svm function simply does not support this as far as I can tell. You'll have to decide how you want to summarize that information in your own visualization.

Passing Argument to lm in R within Function

I would like to be able to call lm within a function and specify the weights variable as an argument passed to the outside function that is then passed to lm. Below is a reproducible example where the call works if it is made to lm outside of a function, but produces the error message Error in eval(expr, envir, enclos) : object 'weightvar' not found when called from within a wrapper function.
olswrapper <- function(form, weightvar, df) {
  ols <- lm(formula(form), weights = weightvar, data = df)
}
df <- mtcars
ols <- lm(mpg ~ cyl + qsec, weights = gear, data = df)
summary(ols)
ols2 <- olswrapper(mpg ~ cyl + qsec, weightvar = gear, df = df)
#Produces error: "Error in eval(expr, envir, enclos) : object 'weightvar' not found"
Building on the comments, gear isn't defined globally. It works inside the stand-alone lm call as you specify the data you are using, so lm knows to take gear from df.
However, gear itself doesn't exist outside that stand-alone lm call. This is shown by the output of gear:
> gear
Error: object 'gear' not found
You can pass the gear into the function using df$gear
weightvar <- df$gear
ols <- olswrapper(mpg ~ cyl + qsec, weightvar , df = df)
I know I'm late on this, but I believe the previous explanation is incomplete. Declaring weightvar <- df$gear and then passing it in to the function only works because you use weightvar as the name for your weight argument. This is just using weightvar as a global variable. That's why df$gear doesn't work directly. It also doesn't work if you use any name except weightvar.
The reason why it doesn't work is that lm looks for data in two places: the dataframe argument (if specified), and the environment of your formula. In this case, your formula's environment is R_GlobalEnv. (You can test this by running print(str(form)) from inside olswrapper). Thus, lm will only look in the global environment and in df, not the function environment.
edit: In the lm documentation the description of the data argument says:
"an optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which lm is called."
A quick workaround is to say environment(form) <- environment() to change your formula's environment. This won't cause any problems because the data in the formula is in the data frame you specify.
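As a rough sketch of that workaround (the name olswrapper_env is made up for illustration, and the weights are passed as a vector, not as a bare column name):

```r
olswrapper_env <- function(form, weightvar, df) {
  # Point the formula's environment at this function's frame so that
  # lm() can find `weightvar` when it evaluates the weights argument.
  environment(form) <- environment()
  lm(form, weights = weightvar, data = df)
}

# Weights passed as a vector; a bare `gear` would still fail, because the
# argument is evaluated in the caller's environment, where no `gear` exists.
wrapped <- olswrapper_env(mpg ~ cyl + qsec, weightvar = mtcars$gear, df = mtcars)

# Reference fit, calling lm() directly
direct <- lm(mpg ~ cyl + qsec, weights = gear, data = mtcars)

# Compare the coefficient estimates of the two fits
all.equal(coef(wrapped), coef(direct))
```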
eval(substitute(...)) inside a body of a function allows us to employ non-standard evaluation
df <- mtcars
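olswrapper <- function(form, weightvar, df) {
  # substitute() swaps in the caller's expressions (e.g. gear for weightvar),
  # so lm() can resolve the weights column inside df.
  eval(substitute(ols <- lm(formula(form), weights = weightvar, data = df)))
  summary(ols)
}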
olswrapper(mpg ~ cyl + qsec, weightvar = gear, df = df)
More here:
http://adv-r.had.co.nz/Computing-on-the-language.html
