Get the formula of an SVM in R

I use this code for my SVM prediction
library(gdata)
data = read.csv2("test.csv")
data
library(e1071)
model <- svm(cote ~ .,data,kernel='radial')
#model1 <- svm(y ~ x1+x2, data=f, type='nu-classification',kernel='radial',tolerance=0.001,gamma=2.5,cost=2,nu=0.8,cross=10,shrinking=FALSE)
predict(model, subset(data, select = - c(cote)))
Now I need the literal formula of this SVM so that I can paste it into a C++ program. How can I do that?
Thanks

Maybe the formula can be recovered from the 'model'-object. Try this:
model$call[[2]]
Example:
> ?e1071::predict.svm
> model <- svm(Species ~ ., data = iris)
> model$call[[2]]
# Species ~ .
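If you want that as a character string, use deparse(model$call[[2]]), which returns "Species ~ .". Note that as.character() on the same object returns its pieces ("~", "Species" and ".") rather than a single string.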
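The answer above recovers the R model formula (cote ~ .), which is not what a C++ program needs to reproduce predictions. For a radial kernel, libsvm (which e1071 wraps) computes the decision value f(x) = sum_i coef_i * exp(-gamma * ||x - sv_i||^2) - rho. The base-R sketch below implements only that formula so it can be checked and then ported line by line. As far as I know, a fitted e1071 model stores the pieces as model$SV, model$coefs, model$rho and model$gamma, and with the default scale = TRUE the support vectors live on the scaled axis, so a new x must first be scaled with model$x.scale (and, for regression, the result unscaled with model$y.scale). Treat those field names and the scaling step as assumptions to verify against predict(model, newdata, decision.values = TRUE). The toy numbers are made up for illustration.

```r
# Decision value of a kernel SVM with a radial basis (RBF) kernel:
#   f(x) = sum_i coef_i * exp(-gamma * ||x - sv_i||^2) - rho
# x: one (already scaled) observation; sv: support vectors, one per row
rbf_decision <- function(x, sv, coefs, rho, gamma) {
  # squared Euclidean distance from x to every support vector
  d2 <- rowSums(sweep(sv, 2, x)^2)
  sum(coefs * exp(-gamma * d2)) - rho
}

# Hypothetical toy model: two support vectors in two dimensions
sv    <- matrix(c(0, 0,
                  1, 1), ncol = 2, byrow = TRUE)
coefs <- c(1, -1)

f0 <- rbf_decision(c(0, 0), sv, coefs, rho = 0, gamma = 1)  # 1 - exp(-2)
f1 <- rbf_decision(c(1, 1), sv, coefs, rho = 0, gamma = 1)  # exp(-2) - 1
```

For a two-class model the sign of f(x) picks the class (which class is positive depends on the label order stored in the model); for regression f(x) is the prediction on the scaled response. The loop over support vectors translates directly to C++.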

Related

Naming conventions of predictor variables

Specific Examples:
log1 <- glm(Outcome ~ Predictor1 + Predictor2, family = binomial(link="logit"),
data=data)
log2 <- glm(data$Outcome ~ data$Predictor1 + data$Predictor2,
family = binomial(link="logit"))
These will produce the same models and their summaries will be identical.
Then why do the predicted values differ when I use these models to predict an outcome from test data?
Example:
predict(log1, type = "response", newdata = test_dat) ==
predict(log2, type = "response", newdata = test_dat)
# FALSE
I am not as familiar with R as I would like, but I can't seem to explain the differences. Help?
To compare two objects use identical(log1, log2); however, the names are part of the objects, so if the names differ the objects cannot be identical even if all the numbers underlying them are the same.
For example, note how the names Time and BOD$Time are stored in fm1 and fm2:
fm1 <- lm(demand ~ Time, BOD)
fm2 <- lm(BOD$demand ~ BOD$Time)
fm1[[1]]
## (Intercept) Time
## 8.521429 1.721429
fm2[[1]]
## (Intercept) BOD$Time
## 8.521429 1.721429
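The different names also explain why the predictions differ. With the data$... form, predict() looks in newdata for a variable literally called BOD$Time, does not find one, and evaluates the expression BOD$Time instead, so it silently returns predictions for the original data (R only warns that 'newdata' had fewer rows than the variables found). The sketch below shows this with the built-in BOD data used above; the newdata values 10 and 20 are arbitrary.

```r
# Same data, two ways of writing the formula
fm1 <- lm(demand ~ Time, data = BOD)
fm2 <- lm(BOD$demand ~ BOD$Time)

new <- data.frame(Time = c(10, 20))

# fm1 looks up 'Time' inside 'new', so it predicts at Time = 10 and 20
p1 <- predict(fm1, newdata = new)

# fm2 needs 'BOD$Time', which is not a column of 'new'; R evaluates BOD$Time
# itself and returns the six fitted values of the original data
p2 <- suppressWarnings(predict(fm2, newdata = new))

length(p1)  # 2
length(p2)  # 6
```

So for prediction, the data= form (log1) is the one that actually uses test_dat.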

How do I create a "macro" for regressors in R?

For long and repeating models I want to create a "macro" (so called in Stata and there accomplished with global var1 var2 ...) which contains the regressors of the model formula.
For example from
library(car)
lm(income ~ education + prestige, data = Duncan)
I want something like:
regressors <- c("education", "prestige")
lm(income ~ #regressors, data = Duncan)
The closest I could find is this approach, but applying it to my regressors doesn't work:
reg = lm(income ~ bquote(y ~ .(regressors)), data = Duncan)
It throws:
Error in model.frame.default(formula = y ~ bquote(.y ~ (regressors)), data =
Duncan, : invalid type (language) for variable 'bquote(.y ~ (regressors))'
Even the accepted answer to the same question:
lm(formula(paste('var ~ ', regressors)), data = Duncan)
fails with:
Error in model.frame.default(formula = formula(paste("var ~ ", regressors)),
: object is not a matrix
And of course I tried as.matrix(regressors) :)
So, what else can I do?
Here are some alternatives. No packages are used in the first 3.
1) reformulate
fo <- reformulate(regressors, response = "income")
lm(fo, Duncan)
or you may wish to write the last line as this so that the formula that is shown in the output looks nicer:
do.call("lm", list(fo, quote(Duncan)))
in which case the Call: line of the output appears as expected, namely:
Call:
lm(formula = income ~ education + prestige, data = Duncan)
2) lm(dataframe)
lm( Duncan[c("income", regressors)] )
The Call: line of the output looks like this:
Call:
lm(formula = Duncan[c("income", regressors)])
but we can make it look exactly as in the do.call solution in (1) with this code:
fo <- formula(model.frame(income ~., Duncan[c("income", regressors)]))
do.call("lm", list(fo, quote(Duncan)))
3) dot
An alternative similar to that suggested by @jenesaisquoi in the comments is:
lm(income ~., Duncan[c("income", regressors)])
The approach discussed in (2) to the Call: output also works here.
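To check that alternatives (1) to (3) really fit the same model, here is a sketch that runs without packages. It swaps the car package's Duncan data for the built-in mtcars (mpg as the response, wt and hp as a hypothetical regressor set); that substitution is an assumption made only so the code is self-contained.

```r
# Stand-in for Duncan: mtcars ships with base R
regressors <- c("wt", "hp")

# 1) reformulate() builds mpg ~ wt + hp from the character vector
fo <- reformulate(regressors, response = "mpg")
fit1 <- lm(fo, data = mtcars)

# 2) lm(dataframe): the first column is the response, the rest are regressors
fit2 <- lm(mtcars[c("mpg", regressors)])

# 3) dot: '.' expands to every non-response column of the subsetted data
fit3 <- lm(mpg ~ ., data = mtcars[c("mpg", regressors)])

# All three give the same coefficients: (Intercept), wt, hp
rbind(coef(fit1), coef(fit2), coef(fit3))
```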
4) fn$
Prefacing a function with fn$ enables string interpolation in its arguments. This solution is nearly identical to the desired syntax shown in the question, using $ in place of # to perform substitution, and the flexible substitution could readily be extended to more complex scenarios. The quote(Duncan) in the code could be written as just Duncan and it would still run, but the Call: line shown in the lm output looks better if you use quote(Duncan).
library(gsubfn)
rhs <- paste(regressors, collapse = "+")
fn$lm("income ~ $rhs", quote(Duncan))
The Call: line looks almost identical to the do.call solutions above -- only spacing and quotes differ:
Call:
lm(formula = "income ~ education+prestige", data = Duncan)
If you wanted it absolutely the same then:
fo <- fn$formula("income ~ $rhs")
do.call("lm", list(fo, quote(Duncan)))
For the scenario you described, where regressors is in the global environment, you could use:
lm(as.formula(paste("income~", paste(regressors, collapse="+"))), data =
Duncan)
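The collapse = "+" argument is what the attempt in the question was missing: paste() is vectorised, so without collapse it returns one string per regressor instead of one formula. (The question's version also uses 'var' as the response, which is not a column of Duncan.) Building the formula does not touch the data, so this sketch needs neither car nor Duncan.

```r
regressors <- c("education", "prestige")

# Without collapse: a character vector with one formula string per regressor
paste("income ~", regressors)
# [1] "income ~ education" "income ~ prestige"

# With collapse: the regressors are joined into a single right-hand side
rhs <- paste(regressors, collapse = " + ")
fo <- as.formula(paste("income ~", rhs))
fo
```

The resulting fo can then be passed to lm(fo, data = Duncan).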
Alternatively, you could use a function:
modincome <- function(regressors){
lm(as.formula(paste("income~", paste(regressors, collapse="+"))), data =
Duncan)
}
modincome(c("education", "prestige"))

Referencing factor names in R for ANOVA

I'm relatively new to R and am trying to streamline an ANOVA script to read a set of factor names from a table, and perform statistical tests on the interactions between these factors.
My basic question is how to not have to manually write the name of factors when I call aov, like this:
aov2 <- aov(no_gap ~ Diag*Age, data=data)
But instead, to index a variable which contains the names of the factors of interest, like this (but this doesn't work):
aov2 <- aov(get(vars[5]) ~ get(vars[1])*get(vars[2]), data=data)
Here's my whole script:
#Load data
library(readr) # read_file() comes from readr
outName <- read_file("fileNameToWrite.txt")
data <- read.table(header=TRUE, "testDataTable.txt",stringsAsFactors = TRUE)
vars <- colnames(data)
# Make sure the subject and Diag columns (vars[1:2]) are factors
cols <- vars[1:2]
data[cols] <- lapply(data[cols], as.factor)
##
# 2x2 between:
aov2 <- aov(get(vars[5]) ~ get(vars[1])*get(vars[2]), data=data)
aov2 <- aov(no_gap ~ Diag*Age, data=data)
aov2 <- aov(apply(vars[5]) ~ get(vars[1])*get(vars[2]), data=data)
summary(aov2)
For reference, this is what "vars" looks like when evaluated:
> vars
[1] "subject" "Diag" "Age" "gap" "no_gap"
Thanks so much for your help!!
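The argument no_gap ~ Diag*Age that you are passing to aov is a formula object. You can create a formula object from vars as follows; with the vars shown above, the response no_gap is vars[5] and the factors Diag and Age are vars[2] and vars[3]:
myform <- as.formula(sprintf("%s ~ %s * %s", vars[5], vars[2], vars[3]))
aov2 <- aov(myform, data=data)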
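A self-contained sketch of the same idea, using the built-in warpbreaks data (columns breaks, wool, tension) as a stand-in for the question's table, which is not available. It also shows reformulate() as an alternative to sprintf(); passing "wool * tension" as a single term label relies on reformulate() parsing the label text as written.

```r
# warpbreaks ships with base R: "breaks" (response), "wool", "tension"
dat <- warpbreaks
vars <- colnames(dat)

# Build "breaks ~ wool * tension" from the column names with sprintf()
myform <- as.formula(sprintf("%s ~ %s * %s", vars[1], vars[2], vars[3]))
aov_a <- aov(myform, data = dat)

# Same formula via reformulate(): the interaction is given as one term label
myform2 <- reformulate(paste(vars[2], vars[3], sep = " * "), response = vars[1])
aov_b <- aov(myform2, data = dat)

summary(aov_a)
```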

Train a random forest algorithm using various columns

I asked this question before (Creating a loop for different random forest training algoritms) but haven't received a working answer yet, so here is another attempt with a more reproducible example.
I have the following datasets:
train <- read.csv(url("http://s3.amazonaws.com/assets.datacamp.com/course/Kaggle/train.csv"))
test <- read.csv(url("http://s3.amazonaws.com/assets.datacamp.com/course/Kaggle/test.csv"))
train <- train[complete.cases(train), ]
I would like to run several random forest algorithms to see which one performs best. So what I basically want to do is:
library(randomForest)
#predict based on Pclass
fit <- randomForest(as.factor(Survived) ~ Pclass, data=train, importance=TRUE, ntree=2000)
Prediction <- predict(fit, test)
#fetch accuracy
#predict based on Pclass and Sex
fit <- randomForest(as.factor(Survived) ~ Pclass + Sex, data=train, importance=TRUE, ntree=2000)
Prediction <- predict(fit, test)
#fetch accuracy
I would like to create some kind of loop so that I can store all values in a list and then loop over it. So like this:
list <- c(Pclass, Pclass + Sex)
for (R in list) {
modfit <- paste0("won ~ ", R, ", data=training, method=\"rf\", prox=\"TRUE")
modfit <- as.formula(modfit)
train(modfit)
}
But the code above doesn't work. It gives me the following error:
Error in parse(text = x, keep.source = FALSE) :
<text>:1:13: unexpected ','
1: won ~ Pclass,
Any thoughts on how I can get this working?
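library(randomForest)
predictors <- c("Pclass", "Pclass + Sex")
fits <- list()
for (R in predictors) {
# Only the formula goes into the string; data, ntree, etc. stay ordinary arguments
modfit <- as.formula(paste0("as.factor(Survived) ~ ", R))
fits[[R]] <- randomForest(modfit, data=train, importance=TRUE, ntree=2000)
}
The comma before data=training is one problem, but the string passed to as.formula() must contain only the formula itself (for example "as.factor(Survived) ~ Pclass"); data= and the other arguments belong in the call to the modelling function. The candidate right-hand sides also need to be strings, because c(Pclass, Pclass + Sex) tries to evaluate Pclass as a variable.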
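The same loop pattern can be tried without downloading the Titanic files. This sketch swaps randomForest for lm() on the built-in mtcars data, and uses adjusted R-squared in place of the accuracy the question wants to fetch; the candidate right-hand sides are hypothetical.

```r
# Hypothetical candidate right-hand sides, written as strings
rhs_list <- c("wt", "wt + hp", "wt + hp + qsec")

# Fit one model per right-hand side and keep them in a list named by formula
fits <- lapply(setNames(rhs_list, rhs_list), function(rhs) {
  fo <- as.formula(paste("mpg ~", rhs))
  lm(fo, data = mtcars)
})

# One summary number per model, so the candidates can be compared
adj_r2 <- sapply(fits, function(f) summary(f)$adj.r.squared)
adj_r2
names(which.max(adj_r2))  # right-hand side of the best-scoring model
```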

R: Regression of each variable depending on all the others

In R, I have the following data.frame:
df <- data.frame(var1,var2,var3)
I would like to fit a regression function, like multinom, for each variable with respect to the others, without using the variable names explicitly. In other words, I would like to obtain this result:
fit1 <- multinom(var1 ~ ., data=df)
fit2 <- multinom(var2 ~ ., data=df)
fit3 <- multinom(var3 ~ ., data=df)
But in a for loop, without using the variable names (so that I can use the same code for any data.frame). Something similar to this:
for (i in colnames(df))
{
fit[i] <- lm(i ~ ., data=df)
}
(This code does not work.)
Maybe my question is trivial, but I have no idea how to proceed.
Thanks!
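You need an extra step that builds the formula object with string operations. Name the list slots up front, so that fit[[i]] fills them instead of appending new elements after three empty ones:
fit <- setNames(vector(mode = "list", length = ncol(df)), colnames(df))
for (i in colnames(df)) {
fm <- as.formula(paste0(i, " ~ ."))
fit[[i]] <- lm(fm, data = df)
}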
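An equivalent loop-free sketch with lapply(), runnable on three numeric columns of the built-in mtcars (a stand-in for the question's df, which is not given). reformulate(".", response = v) builds v ~ . for each column name. The question mentions multinom() from nnet; swapping it in for lm() should follow the same pattern, but that is untested here.

```r
# Stand-in data: three numeric columns from mtcars
df <- mtcars[c("mpg", "wt", "hp")]

# One model per column, each regressed on all the other columns;
# naming the input vector gives a list named after the response variables
fits <- lapply(setNames(names(df), names(df)), function(v) {
  lm(reformulate(".", response = v), data = df)
})

names(coef(fits$mpg))  # "(Intercept)" "wt" "hp"
```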
