Plot which parameter where in R?

So... I'm looking at an example in a book that goes something like this:
library(daewr)
mod1 <- aov(height ~ time, data=bread)
summary(mod1)
...
par(mfrow=c(2,2))
plot(mod1, which=5)
plot(mod1, which=1)
plot(mod1, which=2)
plot(residuals(mod1) ~ loaf, main="Residuals vs Exp. Units", font.main=1, data=bread)
abline(h = 0, lty = 2)
That all works... but the text is a little vague about the purpose of the parameter 'which='. I dug around in the help (in RStudio) on plot() and par(), looked around online... found some references to a different 'which()'... but nothing really referring me to the purpose/syntax for the parameter 'which=' inside plot().
A bit later (next page, figures) I found a mention of using names(mod1) to view the list of quantities calculated by aov... which I presume is what which= is referring to, i.e. which item in the list to plot where in the 2x2 matrix of plots. Yay. Now where the heck is that buried in the docs?!?

which selects which of the six diagnostic plots is displayed:
1. A plot of residuals against fitted values
2. A normal Q-Q plot
3. A Scale-Location plot of sqrt(|residuals|) against fitted values
4. A plot of Cook's distances versus row labels
5. A plot of residuals against leverages
6. A plot of Cook's distances against leverage/(1-leverage)
By default, plots 1, 2, 3 and 5 are provided.
Check ?plot.lm in R for more details.
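A minimal sketch of why the documentation lives under ?plot.lm, assuming the built-in PlantGrowth data rather than the book's bread data (the model below is illustrative only). An aov fit also carries the class "lm", so plot(mod1, ...) is dispatched to plot.lm, and which= picks one of the six diagnostic plots listed above rather than an element of names(mod1).

```r
# Illustrative model on a built-in dataset (not the book's bread data)
mod <- aov(weight ~ group, data = PlantGrowth)

# An aov fit also inherits from "lm", so plot() dispatches to plot.lm
cls <- class(mod)
print(cls)

op <- par(mfrow = c(2, 2))  # 2x2 grid; panels are filled row by row, in call order
plot(mod, which = 5)        # residuals vs leverage (shown by factor level here)
plot(mod, which = 1)        # residuals vs fitted values
plot(mod, which = 2)        # normal Q-Q plot
plot(mod, which = 3)        # scale-location plot
par(op)                     # restore the previous layout
```

So the position in the 2x2 grid comes from the order of the plot() calls under par(mfrow=...), while which= only decides what each call draws.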

Related

How to smooth non-linear regression curve in R

So I'm asked to obtain an estimate theta of the variable Length in the MASS package. The code I used is shown below, as well as the resulting curve. Somehow, I don't end up with a smooth curve, but with a very "blocky" one, as well as some lines between points on the curve. Can anyone help me to get a smooth curve?
utils::data(muscle,package = "MASS")
Length.fit<-nls(Length~t1+t2*exp(-Conc/t3),muscle,
start=list(t1=3,t2=-3,t3=1))
plot(Length~Conc,data=muscle)
lines(muscle$Conc, predict(Length.fit))
[Image of the plot]
Edit: as a follow-up question:
If I want to more accurately predict the curve, I use nonlinear regression to predict the curve for each of the 21 species. This gives me a vector
theta=(T11,T12,...,T21,T22,...,T3).
I can create a for-loop that plots all of the graphs, but like before, I end up with the blocky curve. However, seeing as I have to plot these curves as follows:
for(i in 1:21) {
lines(muscle$Conc,theta[i]+theta[i+21]*
exp(-muscle$Conc/theta[43]), col=color[i])
i = i+1
}
I don't see how I can use the same trick to smooth out these curves, since muscle$Conc still only has 4 values.
Edit 2:
I figured it out, and changed it to the following:
lines(seq(0,4,0.1),theta[i]+theta[i+21]*exp(-seq(0,4,0.1)/theta[43]), col=color[i])
If you look at the output of cbind(muscle$Conc, predict(Length.fit)), you'll see that many points are repeated and that they're not sorted in order of Conc. lines just plots the points in order and connects the points, giving you multiple back-and-forth lines. The code below runs predict on a unique set of ordered values for Conc.
plot(Length ~ Conc,data=muscle)
lines(seq(0,4,0.1),
predict(Length.fit, newdata=data.frame(Conc=seq(0,4,0.1))))
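The same idea can also be written with curve(), which evaluates an expression on an evenly spaced grid of x values. This is only a sketch on made-up data: the data frame d, its columns conc and len, and the starting values are hypothetical and chosen so the example is self-contained; it is not the muscle data.

```r
# Hypothetical data: four distinct concentrations, repeated and unsorted
conc  <- rep(c(2, 0.25, 4, 1), times = 5)
noise <- rep(c(0.5, -0.5, 0.25, -0.25, 0), each = 4)
d <- data.frame(conc = conc, len = 30 - 25 * exp(-conc / 0.8) + noise)

# Same model form as in the question; starting values are assumptions
fit <- nls(len ~ t1 + t2 * exp(-conc / t3), data = d,
           start = list(t1 = 28, t2 = -20, t3 = 1))

plot(len ~ conc, data = d)
# curve() supplies x as a fine grid (101 points by default), so the line is smooth
curve(predict(fit, newdata = data.frame(conc = x)), from = 0, to = 4, add = TRUE)
```

Because the grid is generated in increasing order, there is no need to sort or de-duplicate the observed concentrations first.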

Building a logistic trend surface in R

I would like to build a very simple rectangular surface in R that would have a logistic trend. The values at the top would have the highest values (1) and at the bottom the lowest (0). I have drafted an image that shows example of the surface that I have in mind, with help of not the prettiest trend lines so you have an idea what is needed. I do not have any data, it is supposed to be a theoretical surface with logistic trend, that I am later going to modify.
Any help with how to start/approach it, or helpful packages in R would be highly appreciated!
Consider this as a hint.
library("graphics")
plot(0:1, type = "n",xaxt="n", ann=FALSE)
abline(h = seq(0,1,.1))
or
abline(h = c(0,.1,.2,.3,.6,.7,.8,.9))
abline(h = c(0.4,.5), col="red")
The only thing you have to do is place the variable, as you call it, with the “logistic trend,” in place of ‘0:1’
A second hint
df = as.matrix(c(0.131313131,0.111111111,0.090909091,
0.080808081,0.060606061,0.050505051,
0.060606061,0.080808081,0.090909091,
0.111111111,0.131313131))
barplot(prop.table(df, 2) )
This results in a single stacked bar (image not shown).
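One possible way to build the surface itself, offered as a sketch rather than a definitive answer: evaluate a logistic function of the vertical coordinate and repeat it across the horizontal one. The grid size, the steepness k and the midpoint mid are assumptions chosen for illustration; plogis() is the logistic function from the stats package.

```r
nx <- 50                      # number of cells across (assumed)
ny <- 100                     # number of cells bottom to top (assumed)
x <- seq(0, 1, length.out = nx)
y <- seq(0, 1, length.out = ny)

k   <- 10                     # steepness of the logistic trend (assumed)
mid <- 0.5                    # height at which the surface equals 0.5 (assumed)
trend <- plogis(k * (y - mid))

# z[i, j] is the value at (x[i], y[j]); every row repeats the same vertical trend
z <- matrix(trend, nrow = nx, ncol = ny, byrow = TRUE)

image(x, y, z, col = gray.colors(64), xlab = "x", ylab = "y")
contour(x, y, z, levels = seq(0.1, 0.9, 0.1), add = TRUE)
```

With these settings the values approach, but do not exactly reach, 0 at the bottom and 1 at the top; a larger k brings the edges closer to 0 and 1. The matrix z can then be modified like any other numeric matrix.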

How do I plot an abline() when I don't have any data points (in R)

I have to plot a few different simple linear models on a chart, the main point being to comment on them. I have no data for the models. I can't get R to create a plot with appropriate axes, i.e. I can't get the range of the axes correct. I think I'd like my y-axis to be 0-400 and x to be 0-50.
Models are:
$$
\widehat y=108+0.20x_1
$$
$$
\widehat y=101+2.15x_1
$$
$$
\widehat y=132+0.20x_1
$$
$$
\widehat y=119+8.15x_1
$$
I know I could possibly do this much more easily in a different software or create a dataset from the model and estimate and plot the model from that but I'd love to know if there is a better way in R.
As @Glen_b noticed, type = "n" in plot produces a plot with nothing on it. Because plot still requires data, you have to provide something as x; it can be NA or some actual data. If you provide actual data, plot works out the axis limits from it; otherwise you have to set the limits by hand using the xlim and ylim arguments. Next, use abline, which takes parameters a and b for the intercept and slope (or h and v if you just want a horizontal or vertical line).
plot(x=NA, type="n", ylim=c(100, 250), xlim=c(0, 50),
xlab=expression(x[1]), ylab=expression(hat(y)))
abline(a=108, b=0.2, col="red")
abline(a=101, b=2.15, col="green")
abline(a=132, b=0.2, col="blue")
abline(a=119, b=8.15, col="orange")
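If you would rather not pick ylim by hand, one option is to compute it from the models over the chosen x range, so that every line stays fully visible. This is a sketch under the assumption that x runs from 0 to 50 as in the question; the variable names are made up for illustration.

```r
intercepts <- c(108, 101, 132, 119)
slopes     <- c(0.20, 2.15, 0.20, 8.15)
cols       <- c("red", "green", "blue", "orange")
xlim       <- c(0, 50)

# Value of every model at both ends of the x range (a 2 x 4 matrix)
ends <- sapply(seq_along(slopes), function(i) intercepts[i] + slopes[i] * xlim)
ylim <- range(ends)  # a straight line takes its extremes at the ends of the range

plot(NA, xlim = xlim, ylim = ylim,
     xlab = expression(x[1]), ylab = expression(hat(y)))
for (i in seq_along(slopes)) {
  abline(a = intercepts[i], b = slopes[i], col = cols[i])
}
legend("topleft", legend = paste0(intercepts, " + ", slopes, " x"),
       col = cols, lty = 1)
```

Here the steepest model reaches 526.5 at x = 50, so the computed y-axis extends well past 250.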

Labelling the residuals on diagnostic plots

I have made a linear regression model in R with 3 continuous independent variables and one continuous dependent variable. I have generated the diagnostic plots.
I would now like to label/colour the data points for each residual on my diagnostic plots according to the binary categorical independent variable that was not included in the model;
i.e. when this variable = A, I want a blue dot on my diagnostic plot,
and when this variable = B, I want a red dot, so there will be red and blue dots on my diagnostic plots.
I would love some advice on how to do this.
[You don't specify what diagnostic plots you're trying to do this to. You also haven't given a minimal reproducible example, which makes it difficult to alter what you were doing to do what you want.]
I'll give an example of the kind of command that does what you need and you may be able to adapt it to whatever displays you need.
library(MASS)
catsmdl <- lm(Hwt~Bwt,cats)
plot(residuals(catsmdl)~fitted(catsmdl), col=cats$Sex)
abline(h=0, col=8, lty=3)
which gives: [plot of residuals against fitted values, coloured by sex]
This even works with plot.lm, because it has a ... argument to pass information along to the lower level plotting functions. So for example:
opar <- par(mfrow=c(2,2))  # set the 2x2 layout; par() returns the previous settings
plot(catsmdl,col=c("blue","darkorange")[as.numeric(cats$Sex)])
par(opar)  # restore the previous layout
If you replace c("blue","darkorange") with whatever colours you like, it should work. (There are a variety of ways to specify colours in R.)
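To get exactly the blue/red coding asked for, and to make the mapping explicit, one option is a named colour vector indexed by the factor's labels. This is a sketch on the cats data from MASS; the names fit, cols and pt_col are made up, and the sex variable stands in for your own binary variable.

```r
library(MASS)  # for the cats data

fit <- lm(Hwt ~ Bwt, data = cats)

# Named lookup: one colour per level of the grouping variable
cols   <- c(F = "blue", M = "red")
pt_col <- unname(cols[as.character(cats$Sex)])

plot(fitted(fit), residuals(fit), col = pt_col,
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 3)
legend("topleft", legend = names(cols), col = cols, pch = 1, title = "Sex")
```

Indexing by label rather than by as.numeric() keeps the colours correct even if the factor levels are later reordered.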

lines() not properly displaying quadratic fit

I'm simply trying to display the fit I've generated using lm(), but the lines function is giving me a weird result in which there are multiple lines coming out of one point.
Here is my code:
library(ISLR)
data(Wage)
lm.mod<-lm(wage~poly(age, 4), data=Wage)
Wage$lm.fit<-predict(lm.mod, Wage)
plot(Wage$age, Wage$wage)
lines(Wage$age, Wage$lm.fit, col="blue")
I've tried resetting my plot with dev.off(), but I've had no luck. I'm using RStudio. FWIW, the line shows up perfectly fine if I make the regression linear only, but as soon as I make it quadratic or higher (using I(age^2) or poly()), I get a weird graph. Also, the points() function works fine with poly().
Thanks for the help.
Because you didn't order the points by age first, lines() connects them in row order and jumps back and forth between ages. This happens for the linear regression too; the reason it looks fine there is that connecting any set of points on a straight line, in any order, stays on the line!
plot(Wage$age, Wage$wage)
lines(sort(Wage$age), Wage$lm.fit[order(Wage$age)], col = 'blue')
Consider increasing the line width for a better view:
lines(sort(Wage$age), Wage$lm.fit[order(Wage$age)], col = 'blue', lwd = 3)
Just to add another more general tip on plotting model predictions:
An often used strategy is to create a new data set (e.g. newdat) which contains a sequence of values for your predictor variables across a range of possible values. Then use this data to show your predicted values. In this data set, you have a good spread of predictor variable values, but this may not always be the case. With the new data set, you can ensure that your line represents evenly distributed values across the variable's range:
Example
newdat <- data.frame(age=seq(min(Wage$age), max(Wage$age),length=1000))
newdat$pred <- predict(lm.mod, newdata=newdat)
plot(Wage$age, Wage$wage, col=8, ylab="Wage", xlab="Age")
lines(newdat$age, newdat$pred, col="blue", lwd=2)
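For readers without the ISLR package installed, the ordering fix can be tried on a built-in dataset. This is a sketch on mtcars with a quadratic in hp, chosen only because its rows are not sorted by the predictor; the model itself is illustrative.

```r
fit <- lm(mpg ~ poly(hp, 2), data = mtcars)

# Row positions that put the predictor in increasing order
ord <- order(mtcars$hp)

plot(mpg ~ hp, data = mtcars)
# Reorder x and the fitted values with the same index so the pairs stay matched
lines(mtcars$hp[ord], fitted(fit)[ord], col = "blue", lwd = 2)
```

Using one index vector for both x and y avoids any mismatch between separately sorted vectors.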
