Error in multiple regression prediction interval - r

This is the error message:
Error in qt((1 - level)/2, df) : Non-numeric argument to mathematical function
I am trying to fit a model to check the association between SBP and age, adjusting for sex and race. My code uses the uwIntroStats package, and the code to fit the model works. Sex (male) is coded as 0 for female and 1 for male; race is coded 1 to 4.
library(uwIntroStats)
data(mri)
model <- regress("mean", sbp~age*male+as.factor(race), data = mri)
predict(model, data.frame(age=70,male=0,race=2),interval="prediction")
Any reasons why the error occurs and how to fix it? Thanks!

You need to name the newdata argument: otherwise the predict method thinks you're trying to specify the next unmatched argument, which is level. From ?predict.uRegress:
## S3 method for class 'uRegress'
predict(object,interval="prediction",level=0.95, ...)
So
predict(model, newdata=data.frame(age=70,male=0,race=2),
interval="prediction")
works (you don't actually need to specify interval="prediction", since that's the default value).
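To see the argument matching at work, here is a minimal sketch with a made-up function f mirroring the predict.uRegress signature (f is hypothetical, purely for illustration):
f <- function(object, interval = "prediction", level = 0.95, ...) {
  str(level)  # show what actually landed in `level`
}
f("fit", data.frame(age = 70, male = 0, race = 2), interval = "prediction")
# 'data.frame': 1 obs. of 3 variables: ...
Since interval is matched by name, the unnamed data frame falls through to the next unmatched formal, level, which is exactly what qt((1 - level)/2, df) then chokes on.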

SVM: Need numeric dependent variable for regression

I have the following data
scorer <- function(points) {
  points["scores"] <- as.vector((points$X - 5)^2 + (points$Y - 5)^2 - 9)
  points["class"] <- as.vector(points$scores < 0)
  points
}
dt <- scorer(data.frame(X = c(0, 1, 5, 20, 5, 3, 9, 3, 5, 5), Y = c(0, 9, 9, 0, -18, 3, 4, 5, 7, 4)))
Then I am trying to predict the last column (class) using SVM:
library(e1071)
model <- svm(class ~ . , dt)
predictedClass <- predict(model, dt)
but it complains with:
Error in svm.default(x, y, scale = scale, ..., na.action = na.action) :
Need numeric dependent variable for regression.
The advice from nya really works. Please have a look at the type parameter description:
svm can be used as a classification machine, as a regression machine, or for
novelty detection. Depending on whether y is a factor or not, the default setting
for type is C-classification or eps-regression ... page 50
With your dataset you can do classification using the svm method. But if you absolutely want to do regression, try transforming your variable "class" into numeric form, taking the value 1 for a negative score and 0 for a positive score:
scorer <- function(points) {
  points["scores"] <- as.vector((points$X - 5)^2 + (points$Y - 5)^2 - 9)
  points["class"] <- as.vector(ifelse(points$scores < 0, 1, 0))
  points
}
dt <- scorer(data.frame(X = c(0, 1, 5, 20, 5, 3, 9, 3, 5, 5), Y = c(0, 9, 9, 0, -18, 3, 4, 5, 7, 4)))
svm(class ~ ., dt)
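If classification is what you actually want, a minimal sketch (reusing the dt built above) is to coerce class to a factor, so that svm() defaults to C-classification:
library(e1071)
dt$class <- factor(dt$class)
fit <- svm(class ~ ., data = dt)  # a factor response makes type default to C-classification
predict(fit, dt)                  # predicted class labels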

Back-transforming contrast lstrends results

I fitted a linear mixed model with the lmer function, using the packages lme4 and lsmeans, where I have one dependent variable rv and the interacting factors treatment, time, age, and race. I'm interested in the change of the response variable over time, which is why I use the lstrends function. So far so good. The problem is that I have to take the square root of the response variable in order to fit the model properly, but then the pairs function only reports contrasts on the square-root scale of rv, which is hard to interpret!
So I tried to back-transform the response variable after pairs:
model.lmer <- lmer(sqrt(rv) ~ treat*time*age*race + (1|individual), data=mydata)
model.lst <- lstrends(model.lmer, ~treat | age*race , var = "time", type="response")
pairs(model.lst, type="response")
This obviously doesn't work, as stated by the package itself:
# Transformed response
sqwarp.rg <- ref.grid(update(warp.lm, sqrt(breaks) ~ .))
summary(sqwarp.rg)
# Back-transformed results - compare with summary of 'warp.rg'
summary(sqwarp.rg, type = "response")
# But differences of sqrts can't be back-transformed
summary(pairs(sqwarp.rg, by = "wool"), type = "response")
# We can do it via regrid
sqwarp.rg2 <- regrid(sqwarp.rg)
summary(sqwarp.rg2) # same as for sqwarp.rg with type = "response"
pairs(sqwarp.rg2, by = "wool")
It could look like the following code:
summary(pairs(lsmeans(rg.regrid, ~ treat | race*age, trend="time")), type="response")
The problem is, I can't alter the reference grid for lstrends, only for lsmeans: the first argument of lstrends (or of lsmeans with trend="time") requires the linear mixed-effects model (model.lmer) instead of just the reference grid, unlike lsmeans without the trend argument. That's probably why I can't back-transform the data this way.
This here sums up my problem pretty well:
model.sqrt <- lmer(sqrt(rv) ~ time*treat*race*age, data=mydata)
rg <- ref.grid(model.sqrt)
rg.regrid <- regrid(rg)
summary(pairs(lsmeans(rg.regrid, ~treat | race*age*time), type = "response"))
Works perfectly.
summary(pairs(lsmeans(rg.regrid, ~treat | race*age, trend="time"), type = "response"))
Gives the following error:
Error in summary(pairs(lsmeans(rg.regrid, ~treat | race * age, trend = "time"), :
error in evaluating the argument 'object' in selecting a method for function 'summary': Error in data[[var]] : subscript out of bounds
How to avoid the error and still be able to back-transform my data?
It does NOT seem to be possible at all: the back-transformation would be a complicated procedure without any obvious pattern. That's what the creator of the package said.

Error in predict.svm in R

I'm new to SVM. I was trying the svm function in R to train my data, which consists of 40 attributes and 39 labels. All attributes are of double type (most of them are 0's or 1's because I performed dummy encoding on the categorical attributes); the class label consisted of different strings, which I later converted to a factor, so it is now of integer type.
model=svm(Category~.,data=train1,scale=FALSE)
p1=predict(model,test1,"prob")
This was the result I got once I trained the model using SVM:
Call:
svm(formula = Category ~ ., data = train1, scale = FALSE)
Parameters:
SVM-Type: C-classification
SVM-Kernel: radial
cost: 1
gamma: 0.02564103
Number of Support Vectors: 2230
When I used the predict function, I got:
Error in predict.svm(model, test1, "prob") :
NAs in foreign function call (arg 1)
In addition: Warning message:
In predict.svm(model, test1, "prob") : NAs introduced by coercion
I don't understand why this error is appearing; I checked all the attributes of my training data and none of them have NAs. Please help me with this.
Thanks
I'm assuming you are using the package e1071 (you don't specify which package you are using, and as far as I know there is no package called svm).
The error message is confusing, but the problem is that you are passing "prob" as the 3rd argument, while the function expects a boolean. Try it like this:
require(e1071)
model=svm(Category~.,data=train1, scale=FALSE, probability=TRUE)
p1=predict(model,test1, probability = TRUE)
head(attr(p1, "probabilities"))
This is a sample of the output I get.
WARRANTS OTHER OFFENSES LARCENY/THEFT VEHICLE THEFT VANDALISM NON-CRIMINAL ROBBERY ASSAULT WEAPON LAWS BURGLARY
1 0.04809877 0.1749634 0.2649921 0.02899535 0.03548131 0.1276913 0.02498949 0.08322866 0.01097913 0.03800846
SUSPICIOUS OCC DRUNKENNESS FORGERY/COUNTERFEITING DRUG/NARCOTIC STOLEN PROPERTY SECONDARY CODES TRESPASS MISSING PERSON
1 0.03255891 0.003790755 0.006249521 0.01944938 0.004843043 0.01305858 0.009727582 0.01840337
FRAUD KIDNAPPING RUNAWAY DRIVING UNDER THE INFLUENCE SEX OFFENSES FORCIBLE PROSTITUTION DISORDERLY CONDUCT ARSON
1 0.01884472 0.006089563 0.001378799 0.003289503 0.01071418 0.004562048 0.003107619 0.002124643
FAMILY OFFENSES LIQUOR LAWS BRIBERY EMBEZZLEMENT SUICIDE
1 0.0004787845 0.001669914 0.0007471968 0.0007465053 0.0007374036
Hope it helps.
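For reference, the third positional argument of predict.svm is decision.values, a logical; that is where the unnamed "prob" ended up, hence the coercion to NA. Paraphrasing the signature from ?predict.svm (worth double-checking against your e1071 version):
## S3 method for class 'svm'
predict(object, newdata, decision.values = FALSE, probability = FALSE, ...,
na.action = na.omit)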

Extracting predictions from a GAM model with splines and lagged predictors

I have some data and am trying to teach myself how to utilize lagged predictors within regression models. I'm currently trying to generate predictions from a generalized additive model that uses splines to smooth the data and contains lags.
Let's say I have the following data and have split the data into training and test samples.
head(mtcars)
Train <- sample(1:nrow(mtcars), ceiling(nrow(mtcars)*3/4), replace=FALSE)
Great, let's train the gam model on the training set.
f_gam <- gam(hp ~ s(qsec, bs="cr") + s(lag(disp, 1), bs="cr"), data=mtcars[Train,])
summary(f_gam)
When I go to predict on the holdout sample, I get an error message.
f_gam.pred <- predict(f_gam, mtcars[-Train,]); f_gam.pred
Error in ExtractData(object, data, NULL) :
'names' attribute [1] must be the same length as the vector [0]
Calls: predict ... predict.gam -> PredictMat -> Predict.matrix3 -> ExtractData
Can anyone help diagnose the issue and suggest a solution? I get that lag(__, 1) leaves a data point as NA, and that is likely the reason for the lengths being different. However, I don't have a solution to the problem.
I'm going to assume you're using gam() from the mgcv library. It appears that gam() doesn't like functions that are not defined in "base" inside the s() terms. You can get around this by adding a column that includes the transformed variable and then modeling using that variable. For example
tmtcars <- transform(mtcars, ldisp=lag(disp,1))
Train <- sample(1:nrow(mtcars), ceiling(nrow(mtcars)*3/4), replace=FALSE)
f_gam <- gam(hp ~ s(qsec, bs="cr") + s(ldisp, bs="cr"), data= tmtcars[Train,])
summary(f_gam)
predict(f_gam, tmtcars[-Train,])
works without error.
The problem appears to be coming from the mgcv:::get.var function. It tries to decode the terms with something like
eval(parse(text = txt), data, enclos = NULL)
and because they explicitly set the enclosure to NULL, variable and function names outside of base cannot be resolved. So because mean() is in the base package, this works
eval(parse(text="mean(x)"), data.frame(x=1:4), enclos=NULL)
# [1] 2.5
but because var() is defined in stats, this does not
eval(parse(text="var(x)"), data.frame(x=1:4), enclos=NULL)
# Error in eval(expr, envir, enclos) : could not find function "var"
and lag(), like var(), is defined in the stats package.
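You can reproduce the gam() failure the same way (a quick sanity check in the spirit of the var() example above):
eval(parse(text = "lag(x, 1)"), data.frame(x = 1:4), enclos = NULL)
# Error in eval(expr, envir, enclos) : could not find function "lag"
which is why precomputing the lagged column, as in tmtcars above, sidesteps the problem.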

Predicting with lm object in R - black box paradigm

I have a function that returns an lm object. I want to produce predicted values based on some new data. The new data is a data.frame in the exact format as the data passed to the lm function, except that the response has been removed (since we're predicting, not training). I would expect to execute the following, but get an error:
predict( model , newdata )
"Error in eval(expr, envir, enclos) : object 'ModelResponse' not found"
In my case, ModelResponse was the name of the response column in the data I originally trained on. So just for kicks, I tried to insert an NA response:
newdata$ModelResponse = NA
predict( model , newdata )
Error in terms.default(object, data = data) : no terms component nor attribute
Highly frustrating! R's notion of models/regression doesn't match mine:
1. I train a model with some data and get a model object.
2. I can score new data from any environment/function/frame/etc., so long as the data I feed the model object "looks like" the data I trained on (i.e. same column names).
This is a standard black-box paradigm.
So here are my questions:
1. What concept(s) am I missing here?
2. How do I get my scenario to work?
3. How can I get model object to be portable? str(model) shows me that the model object saved the original data it trained on! So the model object is massive. I want my model to be portable to any function/environment/etc. and only contain the data it needs to score.
In the absence of str() on either the model or the data offered to the model, here's my guess regarding this error message:
predict( model , newdata )
"Error in eval(expr, envir, enclos) : object 'ModelResponse' not found"
I guess that you made a model object named "model", that your outcome variable (the left-hand side of the formula in the original call to lm) was named "ModelResponse", and that you then named a column in newdata by the same name. But what you should have done was leave out the "ModelResponse" column (because that is what you are predicting) and put in "Model_Predictor1", "Model_Predictor2", etc., i.e. all the names on the right-hand side of the formula given to lm().
The coef() function will allow you to extract the information needed to make the model portable.
mod.coef <- coef(model)
mod.coef
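With the coefficients in hand, scoring is just a matrix product. Here is a minimal sketch on toy data (dat, x1, x2, and y are invented for illustration):
set.seed(42)
dat <- data.frame(x1 = rnorm(20), x2 = rnorm(20))
dat$y <- 1 + 2 * dat$x1 - 0.5 * dat$x2 + rnorm(20)
fit <- lm(y ~ x1 + x2, data = dat)
b <- coef(fit)
newdata <- data.frame(x1 = 0.3, x2 = -1)
X <- model.matrix(~ x1 + x2, newdata)  # design matrix for the new data
drop(X %*% b)                          # matches predict(fit, newdata)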
Since you expressed interest in the Function function from the rms/Hmisc package combo, here it is, using the help example from ols and comparing the output of an extracted function with the rms Predict method. Note the capitals, since these are designed to work with the rms equivalents of lm, glm(..., family="binomial"), and coxph, which in rms become ols, lrm, and cph.
> set.seed(1)
> x1 <- runif(200)
> x2 <- sample(0:3, 200, TRUE)
> distance <- (x1 + x2/3 + rnorm(200))^2
> d <- datadist(x1,x2)
> options(datadist="d") # No d -> no summary, plot without giving all details
>
>
> f <- ols(sqrt(distance) ~ rcs(x1,4) + scored(x2), x=TRUE)
>
> Function(f)
function(x1 = 0.50549065,x2 = 1) {0.50497361+1.0737604* x1-
0.79398383*pmax(x1-0.083887788,0)^3+ 1.4392827*pmax(x1-0.38792825,0)^3-
0.38627901*pmax(x1-0.65115162,0)^3-0.25901986*pmax(x1-0.92736774,0)^3+
0.06374433*x2+ 0.60885222*(x2==2)+0.38971577*(x2==3) }
<environment: 0x11b4568e8>
> ols.fun <- Function(f)
> pred1 <- Predict(f, x1=1, x2=3)
> pred1
x1 x2 yhat lower upper
1 1 3 1.862754 1.386107 2.339401
Response variable (y): sqrt(distance)
Limits are 0.95 confidence limits
# The "yhat" is the same as one produces with the extracted function
> ols.fun(x1=1, x2=3)
[1] 1.862754
(I have learned through experience that the restricted cubic spline fit functions coming from rms need to have spaces and carriage returns added to improve readability.)
Thinking long-term, you should probably take a look at the caret package. Many or most modeling functions work with data frames and matrices, others have a preference, and there may be other variations of their expectations. It's important to quickly get your head around each, but if you want a single wrapper that will simplify life for you, making the intricacies into a "black box", then caret is as close as you can get.
As a disclaimer: I do not use caret, as I don't think modeling should be a black box. I've had more than a few emails to maintainers of modeling packages resulting from looking into their code and seeing something amiss. Wrapping that in another layer would not serve my interests. So, in the very long run, avoid caret and develop an enjoyment for dissecting what's going into and out of the different modeling functions. :)
