Grabbing R^2 from a linear model in R

After running something like:
mod.1<-lm(z~x+y)
I know I can do summary(mod.1) and see the $R^2$ value. I'm wondering how I might grab it from mod.1, sort of like grabbing the coefficients with mod.1$coefficients.

mod.1 = lm(c(1,2,3)~ c(1,2.3,3.4))
summary(mod.1)$r.squared

R-squared is actually not an element of the lm object itself, but of summary(mod.1). That is, if you type str(summary(mod.1)) you will see that the summary is itself a list (with a special print method) and that one of those list items is R-squared.
However, for programmatic use it is inefficient to compute the entire summary just to extract one element. Rolling your own extractor function will generally be faster, especially if you call lm with the argument y = TRUE so that the response is stored in the fit. For a model with an intercept, R-squared is then just 1 - sum(mod.1$residuals^2)/sum((mod.1$y - mean(mod.1$y))^2).
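A minimal sketch of such an extractor, assuming an unweighted lm fit that includes an intercept. The name r_squared and the use of the built-in mtcars data are illustrative choices, not part of the original answer:

```r
# Hypothetical helper: R-squared without building the full summary object.
# Assumes an unweighted lm fit with an intercept.
r_squared <- function(fit) {
  y <- fit$y
  # Fall back to the stored model frame if lm was not called with y = TRUE
  if (is.null(y)) y <- model.response(model.frame(fit))
  1 - sum(residuals(fit)^2) / sum((y - mean(y))^2)
}

fit <- lm(mpg ~ wt + hp, data = mtcars, y = TRUE)
r_squared(fit)
```

The result should agree with summary(fit)$r.squared up to floating-point error.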

Related

Atomic vectors in R and applying function to them

So I have a data set from UC Irvine's website (the "Wine Quality" dataset) and I want to look at a plot of its residuals. I'm doing this to see whether the variance increases, in which case I could run a log-based regression. To look at the residuals I run this code:
residuals(white.wine)
white.wine is the name of my data frame. However, all I get back is NULL. If I try to look at the residuals of a specific predictor variable, like Fixed Acidity, I get this error:
Error: $ operator is invalid for atomic vectors.
Any way around this? Thanks for any help!
@Hugh was right in saying that residuals must be used on a fitted model rather than on a data frame, but I think your question was also asking how to apply something over a data frame. In case you just want the variance of each predictor variable, you might want something like:
apply(white.wine, 2, var)
As the ?apply documentation says, you need to provide the data, the margin, and the function. The margin refers to applying over rows or columns, with 1 signaling to apply a function over the rows, and 2 indicating that the function should be applied over columns. I'm assuming you have predictor variables in columns, so I used a 2 in the code above.
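Both points can be sketched with the built-in iris data standing in for white.wine (an assumption, since the wine data is not reproduced here). The model formula is chosen purely for illustration:

```r
# Column-wise variance: keep only numeric columns so apply() gets a numeric matrix
num <- iris[sapply(iris, is.numeric)]
col_var <- apply(num, 2, var)

# Residuals come from a fitted model, not from the data frame itself
fit <- lm(Sepal.Length ~ Petal.Length, data = iris)
res <- residuals(fit)
head(res)
```

Plotting res against fitted(fit) is then the usual way to check whether the spread grows with the fitted values.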

R: parameter in update function

Here is a snippet of R script doing beta regression on data "GasolineYield":
library("betareg")
data("GasolineYield", package = "betareg")
gy_logit <- betareg(yield ~ batch + temp, data = GasolineYield)
gy_logit4 <- update(gy_logit, subset = -4)
The 4th line magically deletes the 4th observation and updates the fit automatically, but I don't quite understand why this parameter works in the update function here. I tried to look up the documentation with ?update, but couldn't find any such parameter.
I'm curious how to find the right documentation in this case, because I might want to add new observations instead of removing them. Any help?
subset in betareg works the same as subset in lm, therefore you can read lm documentation.
From the help file you can find:
subset an optional vector specifying a subset of observations to be used in the fitting process.
Hence, by setting subset = -4 you are leaving the fourth row out of the estimation.
update() contains the ... parameter, which means any parameters that are not matched in your call to update() are passed on to the function that does the estimation. In this case, that is betareg(), which does have the subset argument.
This type of thing is very common in R. Many higher-level functions that call other user-visible functions have the three-dot parameter and pass any unmatched parameters on, so you have to search all the user-visible functions that get called in order to know all the possible options.
You can check out the help file for the top level function (update() in this case) to get an idea of which functions get the leftover parameters.
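The same mechanism can be tried with lm from base R, which avoids needing betareg installed. This is a sketch on the built-in mtcars data, and the extra row added at the end is made up purely for illustration:

```r
fit <- lm(mpg ~ wt, data = mtcars)

# subset is not an argument of update() itself; it ends up in the refitted lm() call
fit_drop4  <- update(fit, subset = -4)
fit_direct <- lm(mpg ~ wt, data = mtcars, subset = -4)
all.equal(coef(fit_drop4), coef(fit_direct))

# Adding an observation: refit on an extended data frame via the data argument
new_row  <- data.frame(mpg = 20, wt = 3)   # made-up observation
bigger   <- rbind(mtcars[, c("mpg", "wt")], new_row)
fit_more <- update(fit, data = bigger)
nobs(fit_more)
```

So removing rows goes through subset, while adding rows means passing a larger data frame as data.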

Taylor diagram from existing Correlation and Standard Dev values

Is it possible to create a Taylor diagram from already calculated correlation and standard deviation values?
I am doing model evaluation, and I already have the correlation and standard deviation values. I understand that the plotrix package can create the diagram when given the observed and modeled values. However, for the type of work I am doing, it is easier to start from the correlation and standard deviation values.
Is there any way I can do this in R?
There's no reason it shouldn't be possible, but the authors didn't seem to allow for that when they wrote the function. The function is a bit long and complex, but the part that does the calculation is at the top. It is possible to swap out that code and replace it to allow for the passing of summary statistics. Now, keep in mind that what I'm about to do is a hack and I've only tested it with version 3.5-5 of plotrix. Other versions may not work.
Here we will create a new function taylor.diagram2 that takes all the code from taylor.diagram but adds an extra if statement to check for a list of summarized data as the first argument
taylor.diagram2<-taylor.diagram
bl<-as.list(body(taylor.diagram))
cond<-list(
as.name("if"),
quote(is.list(ref) & missing(model)), #condition
quote({R<-ref$R; sd.r<-ref$sd.r; sd.f<-ref$sd.f}), #if true
as.call(c(as.symbol("{"), bl[3:8]))) #else
bl<-c(bl[1:2], as.call(cond), bl[9:length(bl)]) #splice in new code
body(taylor.diagram2)<-as.call(bl) #update function
Now we can test the function. First, we'll do things the standard way
#test data
aref<-rnorm(30,sd=2)
amodel1<-aref+rnorm(30)/2
#standard behavior function
taylor.diagram2(aref,amodel1, main="Standard Behavior")
#summarized data
xx<-list(
R=cor(aref, amodel1, use = "pairwise"),
sd.r=sd(aref),
sd.f=sd(amodel1)
)
#modified behavior
taylor.diagram2(xx, main="Modified Behavior")
So the new taylor.diagram2 function can do both. If you pass it two vectors, it will do the standard behavior. If you pass it a list with the names R, sd.r, and sd.f, then it will do the same plot but with the values you passed in. Also, the model parameter must be empty for the modified version to work. That means if you want to set any additional parameters, you must use named parameters rather than positional arguments.
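The body-splicing trick used above is easier to follow on a toy function. This sketch is independent of plotrix; f and g are hypothetical names used only for illustration:

```r
# A toy function whose body has two statements
f <- function(x) {
  y <- x + 1
  y * 2
}

# The body is a call to `{`; as a list, element 1 is `{` and the rest are statements
bl <- as.list(body(f))

# Swap the first statement for a different one, then rebuild the body
bl[[2]] <- quote(y <- x + 10)
g <- f
body(g) <- as.call(bl)

f(1)  # 4
g(1)  # 22
```

The fragility of the hack comes from the positional indices: if the statements in the original function move between package versions, the spliced indices point at the wrong code.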

Accessing class values in R's poLCA

I am trying my hand at learning Latent Class Analysis, while also learning R. I'm using the poLCA package, and am having a bit of trouble accessing the attributes. I can run the sample code just fine:
ds = read.csv("http://www.math.smith.edu/r/data/help.csv")
ds = within(ds, (cesdcut = ifelse(cesd>20, 1, 0)))
library(poLCA)
res2 = poLCA(cbind(homeless=homeless+1,
cesdcut=cesdcut+1, satreat=satreat+1,
linkstatus=linkstatus+1) ~ 1,
maxiter=50000, nclass=3,
nrep=10, data=ds)
but in order to make this more useful, I'd like to access the attributes within the objects created by the poLCA class as such:
attr(res2, 'Nobs')
attr(res2, 'maxiter')
but they both come up as NULL. I expect Nobs to be 453 (determined by the function) and maxiter to be 50000 (dictated by my input value).
I'm sure I'm just being naive, but I could use any help available. Thanks a lot!
Welcome to R. You've got the model-fitting syntax right, in that you can get a model out (I don't know how latent class analysis works, so I can't speak to the statistical validity of your result). However, you've mixed up the different ways in which R can store information pertaining to a model.
poLCA returns an object of class poLCA, which is
a list containing the following elements:
(. . .)
Nobs number of fully observed cases (less than or equal to N).
maxiter maximum number of iterations through which the estimation algorithm was set to run.
Since it's a list, you can extract individual elements from your model object using the $ operator:
res2$Nobs # number of observations
res2$maxiter # maximum iterations
In some cases, there might be extractor functions to get this information without having to do low-level indexing. For example, many model-fitting functions will have a fitted method, which pulls out the vector of fitted values on the training data; and similarly residuals pulls out the vector of residuals. You should check whether there are such extractor functions provided by the poLCA package and use them if possible; that way, you're not making assumptions about the structure of the model object that might be broken in the future.
This is distinct from getting the attributes of an object, which is what you use attr for. Attributes in R are what you might call metadata: they contain R-specific information about an object itself, rather than information about whatever it is the object relates to. Examples of common attributes include class (the class of an object), dim (the dimensions of an array or matrix), names (names of individual elements of a vector/list/array) and so on.
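The difference can be seen without poLCA by using an lm fit, which is also a list with a class attribute. This is a sketch on the built-in mtcars data, standing in for the res2 object above:

```r
fit <- lm(mpg ~ wt, data = mtcars)

# List elements: information about the fitted model
names(fit)          # which elements are available
fit$rank            # 2 (intercept plus one slope)

# Attributes: metadata about the R object itself
attr(fit, "rank")   # NULL, because rank is a list element, not an attribute
attr(fit, "class")  # "lm"

# Extractor functions avoid depending on the object's internal layout
head(fitted(fit))

# dim is a typical attribute of a matrix
m <- matrix(1:6, nrow = 2)
attr(m, "dim")      # 2 3
```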

Returning Multiple Output Parameters from Optim

I'm running an optimisation routine using optim in R and I'm telling the programme what I want returned. For example, if I put return(op1$par), it will return all 4 of my variable values. That's fine, and if I run return(op1), I obviously get all the information from the optimisation routine (par, value, convergence etc). However, in this format the par values aren't accessible in the output; it simply states that there are 4 values.
Now what I need is to get the parameter values and the convergence information at the same time. R won't let me call return(op1$par, op1$convergence), so I'm looking for the best way to get these two entities in one run.
I should specify that I'm writing this to a file for 1000s of iterations and not just looking to call it up once on screen.
Cheers
Try something like this:
return(c(Parameters=op1$par, Convergence=op1$convergence))
The names Parameters and Convergence are only for identifying what are the parameters and what is the convergence, since this result will be a vector.
By design, a function can return only one object (or else assignments like a <- fn(b) would get confusing; which thing do you assign?). But that object can be a vector, or a list (which is what optim does). So wrap your arguments in something like
return(c(par=op1$par, convergence=op1$convergence))
or more generally (for objects of different types),
return(list(par=op1$par, convergence=op1$convergence))
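For the many-iterations case, one possible arrangement is to return a list from each run and bind the runs into a table before writing it out. This is a sketch under assumptions: the objective function, starting values, the helper name fit_once and the output file are all made up for illustration:

```r
# Hypothetical wrapper: one optimisation run, returning par and convergence together
fit_once <- function(start) {
  objective <- function(p) sum((p - c(1, 2))^2)   # toy objective, minimum at (1, 2)
  op1 <- optim(start, objective, method = "BFGS")
  list(par = op1$par, convergence = op1$convergence)
}

# Several runs from different (made-up) starting points
runs <- lapply(1:3, function(i) fit_once(c(i, -i)))

# One row per run: columns par1, par2, convergence
out <- do.call(rbind, lapply(runs, function(r) {
  c(par = r$par, convergence = r$convergence)
}))

# Write all runs to a file in one go (a temporary file here)
out_file <- tempfile(fileext = ".csv")
write.csv(as.data.frame(out), out_file, row.names = FALSE)
```

Keeping each run as a list until the final rbind means the parameter values and the convergence code stay separately accessible by name.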
