When calling a function in R, how can I retrieve the result values? For example, I used the 'roc' function and I need to extract the AUC value and the CI (0.6693 and 0.6196-0.7191, respectively, in the following example).
> roc(tmpData[,lenCnames], fitted(model), ci=TRUE)
Call:
roc.default(response = tmpData[, lenCnames], predictor = fitted(model), ci = TRUE)
Data: fitted(model) in 127 controls (tmpData[, lenCnames] 0) < 3248 cases (tmpData[, lenCnames] 1).
Area under the curve: 0.6693
95% CI: 0.6196-0.7191 (DeLong)
I can use the following to fetch these values, but they come with the associated text:
> z$auc
Area under the curve: 0.6693
> z$ci
95% CI: 0.6196-0.7191 (DeLong)
Is there a way to get only the values, without the text?
I do know how to get these using a regular expression or the 'strsplit' function, but I suspect there is a more direct way to access these values.
It's helpful to use reproducible examples when asking a question. It's also best to name the package you're asking about ("pROC"), since it is not part of base R. pROC has functions that extract the auc and ci.auc objects from the roc object.
> library("pROC")
> data(aSAH)
# Basic example
> z <- roc(aSAH$outcome, aSAH$s100b,
+          levels=c("Good", "Poor"))
# Examining the class of 'auc' output shows us that it is also of class 'numeric'
> class(auc(z))
[1] "auc" "numeric"
# calling 'as.numeric' will extract the value
> as.numeric(auc(z))
[1] 0.7313686
# calling 'as.numeric' on the 'ci.auc' object extracts three values:
# the lower bound, the AUC itself, and the upper bound
> as.numeric(ci(z))
[1] 0.6301182 0.7313686 0.8326189
# The ones we want are 1 and 3
> as.numeric(ci(z))[c(1,3)]
[1] 0.6301182 0.8326189
Using the functions str, class, and attributes will often help you figure out how to get what you want out of an object.
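The same inspection approach works on base R objects too. As a small illustration that uses t.test() rather than pROC (so it runs without extra packages), an htest object prints as a block of text, but str() and attributes() reveal that its confidence interval is just a numeric vector carrying a conf.level attribute, which as.numeric() strips off:

```r
# Base-R illustration (not pROC): printed output is text, stored values are numbers.
tt <- t.test(c(1, 2, 3, 4, 5))

class(tt)                  # "htest": a list with a print method
str(tt$conf.int)           # a numeric vector of length 2 with a "conf.level" attribute
attributes(tt$conf.int)    # shows the conf.level attribute (0.95)

ci  <- as.numeric(tt$conf.int)   # drop the attribute, keep the two bounds
est <- unname(tt$estimate)       # drop the "mean of x" name, keep the number
ci
est
```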
I'm very new to R (and Stack Overflow). I've been trying to conduct a simple slopes analysis for my continuous x dichotomous regression model using lmres and simpleSlope from the pequod package.
My variables:
SLS - continuous DV
csibdiff - continuous predictor (I already manually centered the variable with another code)
culture - dichotomous moderator
newmod<-lmres(SLS ~ csibdiff*culture, data=sibdat2)
newmodss <-simpleSlope(newmod, pred="csibdiff", mod1="culture")
However, after running the simpleSlope function, I get this error message:
Error in if (nomZ %in% coded) { : argument is of length zero
I don't understand the nomZ part, but I assume something is wrong with my variables. What does this mean? I don't have anything named nomZ in my data. None of my variables are NULL (I checked them with the is.null() function), and I don't seem to have accidentally deleted the contents of any variable (I checked with the table() function).
If anyone can also suggest another function or package for simple slope analysis, I'd appreciate it. I've been stuck on this problem for a few days now.
EDIT: I subsetted the relevant variables into a csv file.
https://www.dropbox.com/s/6j82ky457ctepkz/sibdat2.csv?dl=0
tl;dr it looks like the authors of the package were thinking primarily about continuous moderators; if you specify mod1="cultureEuropean" (i.e. to match the name of the corresponding parameter in the output) the function returns an answer (I have no idea if it's sensible or not ...)
It would be a service to the community to let the maintainers of the pequod package (maintainer("pequod")) know about this issue ...
Read data and replicate error:
sibdat2 <- read.csv("sibdat2.csv")
library(pequod)
newmod <- lmres(SLS ~ csibdiff*culture, data=sibdat2)
newmodss <- simpleSlope(newmod, pred="csibdiff", mod1="culture")
Check the data:
summary(sibdat2)
We do have some NA values in csibdiff, so try removing these ...
sibdat2B <- na.omit(sibdat2)
But that doesn't actually help (same error as before).
Plot the data to check for other strangeness
library(ggplot2); theme_set(theme_bw())
ggplot(sibdat2B,aes(csibdiff,SLS,colour=culture))+
stat_sum(aes(size=factor(..n..))) +
geom_smooth(method="lm")
There's not much going on here, but nothing obviously wrong either ...
Use traceback() to see approximately where the problem is:
traceback()
3: simple.slope(object, pred, mod1, mod2, coded)
2: simpleSlope.default(newmod, pred = "csibdiff", mod1 = "culture")
1: simpleSlope(newmod, pred = "csibdiff", mod1 = "culture")
We could use options(error=recover) to jump right to the scene of the crime, but let's try step-by-step debugging instead ...
debug(pequod:::simple.slope)
As we go through we can see this:
nomZ <- names(regr$coef)[pos_mod]
nomZ ## character(0)
Looking a bit further back, we can see that pos_mod is also a zero-length integer. Further back still, the code looks through the parameter names (the row names of the variance-covariance matrix) for the name of the moderator ... but it's not there.
debug: pos_pred_mod1 <- fI + grep(paste0("\\b", mod1, "\\b"), jj[(fI +
1):(fI + fII)])
Browse[2]> pos_mod
## integer(0)
Browse[2]> jj[1:fI]
## [[1]]
## [1] "(Intercept)"
##
## [[2]]
## [1] "csibdiff"
##
## [[3]]
## [1] "cultureEuropean"
Browse[2]> mod1
## [1] "culture"
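The mismatch can be reproduced without pequod. The sketch below uses a made-up data frame (the level names "Other" and "European" are hypothetical, chosen so the dummy is named like the one above) and shows that R names a factor's coefficient by pasting the variable name and the level together, so a word-boundary pattern built from "culture" alone matches nothing, while "cultureEuropean" matches:

```r
# Hypothetical data: 'culture' is a two-level factor with "Other" as the reference level.
set.seed(42)
dat <- data.frame(
  x = rnorm(40),
  culture = factor(rep(c("Other", "European"), 20), levels = c("Other", "European"))
)
dat$y <- 1 + 0.5 * dat$x + rnorm(40)

fit <- lm(y ~ x * culture, data = dat)
coef_names <- names(coef(fit))
coef_names   # "(Intercept)" "x" "cultureEuropean" "x:cultureEuropean"

# Same kind of word-boundary pattern seen in the debugger:
grep(paste0("\\b", "culture", "\\b"), coef_names)          # no match: integer(0)
grep(paste0("\\b", "cultureEuropean", "\\b"), coef_names)  # matches elements 3 and 4
```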
The solution is to tell simpleSlope to look for a variable that is there ...
(newmodss <- simpleSlope(newmod, pred="csibdiff", mod1="cultureEuropean"))
## Simple Slope:
## simple slope standard error t-value p.value
## Low cultureEuropean (-1 SD) -0.2720128 0.2264635 -1.201133 0.2336911
## High cultureEuropean (+1 SD) 0.2149291 0.1668690 1.288011 0.2019241
We do get some warnings about NaNs produced -- you'll have to dig further yourself to see whether you need to worry about them.
I can use rpart to make predictions (as below):
library(rpart)
datpred <-tail(car.test.frame,10)
fit <- rpart(Mileage ~ Weight+Price, car.test.frame)
predict(fit,newdata=datpred)
plot(fit, uniform=TRUE)
text(fit, use.n=TRUE, all=TRUE, cex=.8)
objects(fit)
Is there an easy way to convert the fit objects into a simple function that contains only the splitting logic on the data input and then outputs the prediction?
The reason for this is that I can then have the function within a single script with no need to load the fit objects from an external source.
Thank you for your help.
You can save your object in a file using dput() then read it with dget():
dput(fit, 'fit.dput')
rm(fit)
fit <- dget('fit.dput')
The output you are interested in, namely the variable names and the values at which the tree is split, is assembled by the labels.rpart function:
labels(fit)
#-----------
[1] "root" "Weight>=2568" "Weight>=3088" "Weight< 3088"
[5] "Weight>=2748" "Weight< 2748" "Weight< 2568"
The 'splits' element of the fit object is where the cutpoints are stored (in the "index" column):
> fit$splits
count ncat improve index adj
Weight 60 1 0.5953491 2567.5 0
Weight 45 1 0.5045118 3087.5 0
Weight 23 1 0.1476996 2747.5 0
You can look at the code, but if you don't already know how to do that, this is not an easy function to understand:
> methods(labels)
[1] labels.default labels.dendrogram* labels.dist*
[4] labels.lm* labels.rpart* labels.survreg
[7] labels.terms*
see '?methods' for accessing help and source code
> getAnywhere(labels.rpart)
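For a self-contained script, the three cutpoints in fit$splits and the tree shape shown by labels(fit) can be hand-translated into nested ifelse() calls. This is only a sketch for this particular tree: the leaf predictions are not shown above, so they are passed in as a hypothetical argument `leaf` (with made-up names) that you would fill in from fit$frame$yval for the leaf rows. It also ignores rpart's surrogate splits, so a missing Weight simply gives NA here.

```r
# Hand-translated splitting logic for the tree printed by labels(fit).
# Cutpoints come from fit$splits; the leaf values are a hypothetical argument.
tree_predict <- function(weight, leaf) {
  ifelse(weight >= 2567.5,
         ifelse(weight >= 3087.5,
                leaf[["heavy"]],                        # Weight >= 3088
                ifelse(weight >= 2747.5,
                       leaf[["mid_high"]],              # 2748 <= Weight < 3088
                       leaf[["mid_low"]])),             # 2568 <= Weight < 2748
         leaf[["light"]])                               # Weight < 2568
}

# Placeholder leaf values only, to show the routing; use fit$frame$yval in practice.
leaf <- c(light = 4, mid_low = 3, mid_high = 2, heavy = 1)
tree_predict(c(2000, 2600, 2800, 3100), leaf)
```

Nested ifelse() keeps the function vectorised over a column of weights; for deeper trees, walking fit$frame programmatically would scale better than writing the branches by hand.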
I am comparing different machine learning techniques in R.
Here is the situation: I have written several functions that each automatically create a different prediction model (e.g. logistic regression, random forest, neural network, hybrid ensemble), along with predictions, confusion matrices, several statistics (e.g. AUC and F-score), and different plots.
Now I would like to create a list of S4 (or S3?) objects in R, where each object contains the model, the predictions, the plots, the confusion matrix, the AUC and the F-score.
The idea is that each function creates such an object and then appends it to the object list in its return statement.
How should I program such a class? And how can I specify that each model can be of a different type? (I suppose all the models I create are S3 objects, so how can I declare this in my S4 class?)
The end result should let me do something like this: modelList[i]@plot should, for example, summon the requested plot, and names(modelList[i]) should give the name of the model used (if that is not possible, modelList[i]@name will do). It should also be possible to select the best model from the list based on a parameter such as AUC.
I am not experienced in creating such objects, so this is the code/idea I have at the moment:
modelObject <- setClass(
  # Set the name for the class
  "modelObject",
  # Define the slots ("ANY" is a placeholder where the right type is still open)
  slots = c(
    modelName = "character",
    model = "ANY",              # should contain a glm, neural network, random forest, etc. model
    predictions = "ANY",        # should contain a matrix or data frame of custid and prediction
    rocCurve = "ANY",           # when summoned, the ROC curve should be plotted
    plotX = "ANY",              # when summoned, plot X should be plotted
    AUC = "numeric",            # contains the value of the AUC
    confusionMatrix = "matrix", # prints the confusion matrix in the console
    statX = "numeric"           # contains statistic X about the confusion matrix, e.g. F-score
  ),
  # Set the default values for the slots (optional)
  prototype = list(
    # I guess I can assign NULL to each variable of the S4 object
  ),
  # Make a function that can test to see if the data is consistent.
  # This is not called if you have an initialize function defined!
  validity = function(object) {
    # not really an idea how to handle this
    return(TRUE)
  }
)
Use setOldClass() to promote each S3 class to its S4 equivalent
setOldClass("lm")
setOldClass(c("glm", "lm"))
setOldClass(c("nnet.formula", "nnet"))
setOldClass("xx")
Use setClassUnion() to insert a common base class in the hierarchy
setClassUnion("lmORnnetORxx", c("lm", "nnet", "xx"))
.ModelObject <- setClass("ModelObject", slots=c(model="lmORnnetORxx"))
setMethod("show", "ModelObject", function(object) {
cat("model class: ", class(object@model), "\n")
})
In action:
> library(nnet)
> x <- y <- 1:10
> .ModelObject(model=lm(x~y))
model class: lm
> .ModelObject(model=glm(x~y))
model class: glm lm
> .ModelObject(model=nnet(x~y, size=10, trace=FALSE))
model class: nnet.formula nnet
I think that you would also like to implement a Models object that contains a list where all elements are ModelObject; the constraint would be imposed by a validity method (see ?setValidity).
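A minimal sketch of that idea, using only the methods package: a container class that extends "list", a validity method that rejects any element of the wrong class, and a helper that picks the element with the highest AUC. The class names ScoredModel and ScoredModels and the helper bestByAUC are hypothetical, and the element class is deliberately stripped down to three slots (model is typed "ANY" here for brevity; the class union above could be used instead).

```r
library(methods)

# Hypothetical, stripped-down element class.
setClass("ScoredModel", slots = c(name = "character", model = "ANY", AUC = "numeric"))

# Container: a list whose elements must all be ScoredModel objects.
setClass("ScoredModels", contains = "list")
setValidity("ScoredModels", function(object) {
  ok <- vapply(object@.Data, function(m) is(m, "ScoredModel"), logical(1))
  if (all(ok)) TRUE else "all elements must be 'ScoredModel' objects"
})

# Return the element with the highest AUC.
bestByAUC <- function(models) {
  aucs <- vapply(models@.Data, function(m) m@AUC, numeric(1))
  models@.Data[[which.max(aucs)]]
}

# Usage with made-up AUC values.
m1 <- new("ScoredModel", name = "lm", model = lm(dist ~ speed, data = cars), AUC = 0.71)
m2 <- new("ScoredModel", name = "glm", model = glm(dist ~ speed, data = cars), AUC = 0.78)
models <- new("ScoredModels", list(m1, m2))
bestByAUC(models)@name
```

Because validity is checked when new() is given arguments, something like new("ScoredModels", list(m1, 42)) should fail with the validity message.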
What I would do is, for each slot you want in your modelObject class, determine the range of expected values. For example, your model slot has to support all the possible classes of objects that can be returned by model-training functions (e.g. lm(), glm(), nnet(), etc.). In the example case, you see the following objects returned:
```
library(nnet)
x <- y <- 1:10
class(lm(x~y))
class(glm(x~y))
class(nnet(x~y, size=10))
```
Since there is no common class among the objects returned, it might make more sense to use an S3 class, which is less strict and would allow you to assign outputs of various classes to the same field name. Your question is actually quite tough to answer, given that there are so many different approaches to take with R's myriad OO systems.
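An S3 version could be as simple as a constructor that tags a list with a class, plus a print method. The sketch below is one possible layout, not a standard API: the names new_model_result, modelResult and the field names are hypothetical, and the AUC values are placeholders. Any model object can go into the model field, and names() on the resulting list gives the model names the question asks for.

```r
# Hypothetical S3 constructor: any fitted model can be stored in 'model'.
new_model_result <- function(name, model, auc) {
  stopifnot(is.character(name), length(name) == 1, is.numeric(auc))
  structure(list(name = name, model = model, auc = auc), class = "modelResult")
}

# Print method so each element shows a one-line summary.
print.modelResult <- function(x, ...) {
  cat("model:", x$name, "| class:", paste(class(x$model), collapse = " "),
      "| AUC:", x$auc, "\n")
  invisible(x)
}

results <- list(
  new_model_result("lm",  lm(dist ~ speed, data = cars),  auc = 0.70),  # placeholder AUC
  new_model_result("glm", glm(dist ~ speed, data = cars), auc = 0.74)   # placeholder AUC
)
names(results) <- vapply(results, `[[`, character(1), "name")

# Select the best model by AUC.
best <- results[[which.max(vapply(results, `[[`, numeric(1), "auc"))]]
print(best)   # model: glm | class: glm lm | AUC: 0.74
```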
I'm trying to understand the predict function in R.
There is a parameter called type, which I can set to "response" or "scores".
I'm having difficulty understanding the difference.
Thanks.
This isn't precisely an answer, but it shows how I looked through all of the predict() methods available in base R to see what the possible values of type were for all of those methods ...
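m <- methods("predict")
# fetch each method's definition
p <- lapply(m,getAnywhere)
# evaluate the default value of the method's 'type' argument (NULL if there is none)
tt <- function(x) {
  obj <- formals(x$objs[[1]])
  eval(obj$type)
}
res <- setNames(lapply(p,tt),
                sapply(p,"[[","name"))
# keep only the methods that have a 'type' argument
res[!sapply(res,is.null)]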
Results:
$predict.glm
[1] "link" "response" "terms"
$predict.lm
[1] "response" "terms"
So you're going to have to tell us which S3 predict() method allows type="scores" as an option ...
Googling cran predict type="scores": maybe the pls package?
From ?predict.mvr:
When ‘type’ is ‘"scores"’, predicted score values are returned for
the components given in ‘comps’. If ‘comps’ is missing or ‘NULL’,
‘ncomps’ is used instead.
I believe the score values are the predicted scores on the latent components (principal components for PCR, PLS components for PLSR) for a given set of predictors, as opposed to the predicted values of the response variable(s), which is what type="response" returns.
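A base-R analogy may make the distinction concrete. This is principal component regression done by hand with prcomp() and lm(), not the pls package's own code, and the mtcars columns and the choice of two components are arbitrary. The "scores" for new rows are their coordinates on the components; the "response" predictions are then computed from those scores through the regression coefficients.

```r
# Base-R analogy of principal component regression (not the pls package itself).
X <- as.matrix(mtcars[, c("disp", "hp", "wt")])
y <- mtcars$mpg

pc <- prcomp(X, scale. = TRUE)      # components of the predictors only
ncomp <- 2
train_scores <- pc$x[, 1:ncomp]     # training-set scores
fit <- lm(y ~ train_scores)         # regress the response on the scores

newX <- X[1:3, ]                    # pretend these are new observations
# "scores": coordinates of the new rows on the first ncomp components
new_scores <- predict(pc, newdata = newX)[, 1:ncomp, drop = FALSE]
# "response": predicted mpg, computed from those scores via the regression
new_response <- drop(cbind(1, new_scores) %*% coef(fit))
new_scores
new_response
```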
I want to get the values of the bootstrap statistics (original, bias, and std. error) into a separate list, but I cannot figure out how to do that.
Here's an example:
> library(boot)
> set.seed(123)
> mean.fun <- function(data, idx) { mean(data[idx]) }
> data <- boot(data=rnorm(100), statistic=mean.fun, R=999)
> names(data)
[1] "t0" "t" "R" "data"
[5] "seed" "statistic" "sim" "call"
[9] "stype" "strata" "weights"
> data
ORDINARY NONPARAMETRIC BOOTSTRAP
Call:
boot(data = rnorm(100), statistic = mean.fun, R = 999)
Bootstrap Statistics :
original bias std. error
t1* 0.09040591 0.004751773 0.08823615
Now, instead of text, I want the actual values. Apparently data$t0 is the "original", but I don't see how to get the values for bias and std. error.
Also, since typing a function name prints its code, I typed boot in R, copied a snippet of the source code, and tried to search for it in my local R installation, but couldn't find anything. Why? Shouldn't R get that source code from local storage?
The std. error and bias are not stored as part of the boot object; they are calculated on the fly when the object is printed (see: https://stat.ethz.ch/pipermail/r-help/2011-July/284660.html).
In your case, try:
mean(data$t) - data$t0
sd(data$t)
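To collect all three numbers at once (and to handle several statistics, one per column of t), a small helper can compute them column by column. This is a sketch: boot_summary is a made-up name, and it mirrors, as far as I can tell, what print.boot shows for an ordinary nonparametric bootstrap (bias as mean of the replicates minus t0, std. error as their standard deviation, ignoring NAs); other bootstrap types may be summarised differently by print.boot. The demo builds a boot-like list by hand so it runs without the boot package.

```r
# Hypothetical helper: original, bias and std. error for each bootstrapped statistic.
# Works on any list with a vector t0 and a replicate matrix t, as in a boot object.
boot_summary <- function(b) {
  t <- as.matrix(b$t)
  data.frame(original  = b$t0,
             bias      = colMeans(t, na.rm = TRUE) - b$t0,
             std.error = apply(t, 2, sd, na.rm = TRUE))
}

# Base-R demo: hand-rolled bootstrap of the mean (no boot package needed).
set.seed(123)
x <- rnorm(100)
reps <- replicate(999, mean(sample(x, replace = TRUE)))
fake_boot <- list(t0 = mean(x), t = matrix(reps, ncol = 1))
stats <- boot_summary(fake_boot)
stats
```

With the object from the question, boot_summary(data) should return the printed table's three columns as a one-row data frame.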