Fail to predict woe in R

I used the following code to compute WOE:
library("woe")
woe.object <- woe(data, Dependent="target", FALSE,
Independent="shop_id", C_Bin=20, Bad=0, Good=1)
Then I want to predict woe for the test data
test.woe <- predict(woe.object, newdata = test, replace = TRUE)
And it gives me an error
Error in UseMethod("predict") :
no applicable method for 'predict' applied to an object of class "data.frame"
Any suggestions please?

For prediction, you cannot do it with the woe package; you need the klaR package instead. Take note of the masking of the woe function when both packages are loaded, see below:
# let's say woe and then klaR were loaded
library(klaR)
data = data.frame(target=sample(0:1,100,replace=TRUE),
shop_id = sample(1:3,100,replace=TRUE),
another_var = sample(letters[1:3],100,replace=TRUE))
#make sure both dependent and independent are factors
data$target=factor(data$target)
data$shop_id = factor(data$shop_id)
data$another_var = factor(data$another_var)
You need two or more independent variables:
woemodel <- klaR::woe(target~ shop_id+another_var,
data = data)
If you provide only one, you get an error:
woemodel <- klaR::woe(target~ shop_id,
data = data)
Error in woe.default(x, grouping, weights = weights, ...) : All
factors with unique levels. No woes calculated! In addition: Warning
message: In woe.default(x, grouping, weights = weights, ...) : Only
one single input variable. Variable name in resulting object$woe is
only conserved in formula call.
If you want to predict the dependent variable with only one independent variable, something like logistic regression will work:
mdl = glm(target ~ shop_id,data=data,family="binomial")
prob = predict(mdl,data,type="response")
predicted_label = ifelse(prob > 0.5, levels(data$target)[2], levels(data$target)[1])
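To actually obtain WOE values for new data, klaR also provides a predict method for woe objects. Here is a minimal self-contained sketch (with simulated data like the example above); replace = TRUE swaps the factor columns for their WOE scores:

```r
library(klaR)

set.seed(1)
data <- data.frame(target = factor(sample(0:1, 100, replace = TRUE)),
                   shop_id = factor(sample(1:3, 100, replace = TRUE)),
                   another_var = factor(sample(letters[1:3], 100, replace = TRUE)))

# fit the WOE model on the training data
woemodel <- klaR::woe(target ~ shop_id + another_var, data = data)

# apply the learned WOE values to new data;
# replace = TRUE replaces the factor columns with their WOE scores
test <- data[1:10, ]
test_woe <- predict(woemodel, newdata = test, replace = TRUE)
head(test_woe)
```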

How do you compute average marginal effects for glm.cluster models?

I am looking for a way to compute average marginal effects with clustered standard errors, which I seem to be having a few problems with. My model is as follows:
cseLogit <- miceadds::glm.cluster(data = data_long,
formula = follow ~ f1_distance + f2_distance + PolFol + MediaFol,
cluster = "id",
family = binomial(link = "logit"))
Where the dependent variable is binary (0/1) and all explanatory variables are numeric. I've tried two different ways of getting average marginal effects. The first one is:
marginaleffects <- margins(cseLogit, vcov = your_matrix)
Which gives me the following error:
Error in find_data.default(model, parent.frame()) :
'find_data()' requires a formula call
I've also tried this:
marginaleffects <- with(cseLogit, margins(glm_res, vcov=vcov))
which gives me this error:
Error in eval(predvars, data, env) :
object 'f1_distance' was not found
In addition: warnings:
1: In dydx.default(X[[i]], ...) :
Class of variable, f1_distance, is unrecognized. Returning NA.
2: In dydx.default(X[[i]], ...) :
Class of variable, f2_distance, is unrecognized. Returning NA.
Can you tell me what I'm doing wrong? If I haven't provided enough information, please let me know. Thanks in advance.
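One workaround (my suggestion, not from the original post) is to fit a plain glm(), whose formula call margins() can work with, and supply a cluster-robust covariance matrix from sandwich::vcovCL(). The variable names below follow the question, but the data are simulated stand-ins:

```r
library(margins)
library(sandwich)

# simulated stand-in for the question's data_long
set.seed(1)
data_long <- data.frame(id = rep(1:50, each = 4),
                        follow = rbinom(200, 1, 0.5),
                        f1_distance = rnorm(200), f2_distance = rnorm(200),
                        PolFol = rnorm(200), MediaFol = rnorm(200))

# plain glm so that find_data() can recover the model frame
fit <- glm(follow ~ f1_distance + f2_distance + PolFol + MediaFol,
           data = data_long, family = binomial(link = "logit"))

# cluster-robust covariance matrix, clustered on id
vc <- sandwich::vcovCL(fit, cluster = ~ id)

# average marginal effects with clustered standard errors
ame <- margins(fit, vcov = vc)
summary(ame)
```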

Errors with dredge() function in MuMIn

I'm trying to use the dredge() function to evaluate models by completing every combination of variables (up to five variables per model) and comparing models using AIC corrected for small sample size (AICc).
However, I'm presented with one error and two warning messages as follows:
Fixed term is "(Intercept)"
Warning messages: 1: In dredge(MaxN.model,
m.min = 2, m.max = 5) : comparing models fitted by REML 2: In
dredge(MaxN.model, m.min = 2, m.max = 5) : arguments 'm.min' and
'm.max' are deprecated, use 'm.lim' instead
I've tried changing to 'm.lim' as specified but it comes up with the error:
Error in dredge(MaxN.model, m.lim = 5) : invalid 'm.lim' value In
addition: Warning message: In dredge(MaxN.model, m.lim = 5) :
comparing models fitted by REML
The code I'm using is:
MaxN.model<-lme(T_MaxN~Seagrass.cover+composition.pca1+composition.pca2+Sg.Richness+traits.pca1+
land.use.pc1+land.use.pc2+seascape.pc2+D.landing.site+T_Depth,
random=~1|site, data = sgdf, na.action = na.fail, method = "REML")
Dd_MaxN<-dredge(MaxN.model, m.min = 2 , m.max = 5)
What am I doing wrong?
You didn't tell us what you tried to specify for m.lim. ?dredge says:
m.lim ...optionally, the limits ‘c(lower, upper)’ for number of terms in a single model
so you should specify a two-element numeric (integer) vector.
You should definitely be using method="ML" rather than method="REML". The warning/error about REML is very serious; comparing models with different fixed effects that are fitted via REML will lead to nonsense.
So you should try:
MaxN.model <- lme(..., method = "ML") ## where ... is the rest of your fit
Dd_MaxN <- dredge(MaxN.model, m.lim=c(2,5))
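Putting both fixes together, a minimal runnable sketch (with simulated data and a shortened predictor list standing in for sgdf) might look like:

```r
library(nlme)
library(MuMIn)

# simulated stand-in for the question's sgdf
set.seed(1)
sgdf <- data.frame(T_MaxN = rnorm(60),
                   Seagrass.cover = rnorm(60), T_Depth = rnorm(60),
                   Sg.Richness = rnorm(60),
                   site = factor(rep(1:6, each = 10)))

# ML (not REML) so models with different fixed effects are comparable
MaxN.model <- lme(T_MaxN ~ Seagrass.cover + T_Depth + Sg.Richness,
                  random = ~ 1 | site, data = sgdf,
                  na.action = na.fail, method = "ML")

# m.lim takes a two-element vector: here, 2 to 3 terms per model
Dd_MaxN <- dredge(MaxN.model, m.lim = c(2, 3))
head(Dd_MaxN)
```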

Error in R: "Error in tree.control(nobs, ...) : unused argument (type = "class")"

I am building a decision tree using the tree library in R. I have attempted to fit my model as follows:
model <- tree(Outcome ~ Age + Sex + Income, data = train, type = "class")
Running the above line gives me an error as follows:
Error in tree.control(nobs, ...) : unused argument (type = "class")
I down sampled so that each class is equal and so did not specify any weights. If I remove the argument, type = "class", the model runs but when I predict using the model, it seems that it is building a regression model which I do not want.
Can someone help?
If you look at the help page ?tree, there is no argument called type. If you are getting a regression, that is because Outcome is a numeric variable. I expect that you can fix this by adding
train$Outcome = factor(train$Outcome)
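A minimal sketch with simulated stand-ins for train: once Outcome is a factor, tree() fits a classification tree, and type = "class" belongs in predict() rather than in tree():

```r
library(tree)

# simulated stand-in for the question's train data
set.seed(1)
train <- data.frame(Outcome = factor(sample(0:1, 200, replace = TRUE)),
                    Age = rnorm(200, 40, 10),
                    Sex = factor(sample(c("M", "F"), 200, replace = TRUE)),
                    Income = rnorm(200, 50000, 10000))

# classification tree: the factor response is enough, no type argument here
model <- tree(Outcome ~ Age + Sex + Income, data = train)

# type = "class" goes to predict(), returning predicted class labels
pred <- predict(model, newdata = train, type = "class")
table(pred, train$Outcome)
```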

Error when applying lsmeans to model

I have a dataset (data1) as below:
T,D,P,R
1,0,1,1
2,1,1,3
3,0,1,7
1,0,2,2
2,1,2,3
3,1,2,7
1,0,3,1
2,1,3,4
3,1,3,7
1,1,4,1
2,1,4,3
3,0,4,7
To start with, all variables except the response (which is R) are specified as factors (so D, T and P are factors). I then fit the model:
model1 <- lme(R~D+T,random=~1|P, method="REML",data=data1)
Then I want to use lsmeans to average across factor levels of D and P.
To do this I type:
lsm_model <- lsmeans(model1,~T)
Doing this, I get two error messages:
Error in [.data.frame(tbl, , vars, drop = FALSE) :
undefined columns selected
And:
Error in ref.grid(object = list(modelStruct = list(reStruct = list(Plot = -11.7209195395097)), :
Perhaps a 'data' or 'params' argument is needed
Why can't I fit lsmeans to this model? I have used lsmeans many times without any problems.
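One thing worth trying (my suggestion, not part of the original post): lsmeans sometimes cannot reconstruct the model frame from an lme fit, which matches the "Perhaps a 'data' or 'params' argument is needed" message, and passing the data explicitly often resolves it. A sketch using the data from the question:

```r
library(nlme)
library(lsmeans)

# the dataset from the question, with D, T and P as factors
data1 <- data.frame(T = factor(rep(1:3, 4)),
                    D = factor(c(0,1,0, 0,1,1, 0,1,1, 1,1,0)),
                    P = factor(rep(1:4, each = 3)),
                    R = c(1,3,7, 2,3,7, 1,4,7, 1,3,7))

model1 <- lme(R ~ D + T, random = ~ 1 | P, method = "REML", data = data1)

# supply the data explicitly so lsmeans can rebuild the reference grid
lsm_model <- lsmeans(model1, ~ T, data = data1)
lsm_model
```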

Using ksvm from the kernlab package for prediction gives an error

I use the ksvm function to train the data, but when predicting I get an error. Here is the code:
svmmodel4 <- ksvm(svm_train[,1]~., data=svm_train,kernel = "rbfdot",C=2.4,
kpar=list(sigma=.12),cross=5)
Warning message:
In .local(x, ...) : Variable(s) `' constant. Cannot scale data.
pred <- predict(svmmodel4, svm_test[,-1])
Error in eval(expr, envir, enclos) : object 'res_var' not found.
If I add the response variable, it works:
pred <- predict(svmmodel4, svm_test)
But if the response variable has to be included, how can this be called "prediction"? What is wrong with my code? Thanks for your help!
The complete code:
library(kernlab)
svmData <- read.csv("svmData.csv",header=T,stringsAsFactors = F)
svmData$res_var <- as.factor(svmData$res_var)
svm_train <- svmData[1:2110,]
svm_test <- svmData[2111:2814,]
svmmodel4 <- ksvm(svm_train[,1]~.,data = svm_train,kernel = "rbfdot",C=2.4,
kpar=list(sigma=.12),cross=5)
pred1 <- predict(svmmodel4,svm_test[,-1])
You cannot remove the response column from your test dataset. You split your data by rows only, meaning the response column must be present in both your training and testing datasets (and in a validation dataset if you have one).
Your call
pred <- predict(svmmodel4, svm_test)
is working just fine: predict() knows from the model formula which column holds the response and tests the remaining columns against the model. Your training and testing datasets must have the same columns, but the number of rows can differ.
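A minimal sketch (with simulated data standing in for svmData.csv), naming the response explicitly in the formula instead of the fragile svm_train[,1] ~ . construction:

```r
library(kernlab)

# simulated stand-in for the question's svmData.csv
set.seed(1)
svmData <- data.frame(res_var = factor(sample(c("a", "b"), 300, replace = TRUE)),
                      x1 = rnorm(300), x2 = rnorm(300))

svm_train <- svmData[1:200, ]
svm_test  <- svmData[201:300, ]

# name the response explicitly rather than writing svm_train[,1] ~ .
svmmodel <- ksvm(res_var ~ ., data = svm_train, kernel = "rbfdot",
                 C = 2.4, kpar = list(sigma = 0.12), cross = 5)

# predict() ignores the response column in svm_test, so keep all columns
pred <- predict(svmmodel, svm_test)
table(pred, svm_test$res_var)
```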
