I have a small multiple plot that looks like this:
The plot presents the results from two models: mpg predicted by cyl and disp for two transmission types. 0 is the first model, fit for automatic transmission. 1 is the second model, fit for manual transmission. The code to get the plot is this:
library(tidyverse)
library(dotwhisker)
mod_mtcars='mpg~cyl+disp'
np_mtcars=mtcars%>%
group_by(am)%>%
do(broom::tidy(lm(mod_mtcars, data= . )))%>%
rename(model=am)
small_multiple(np_mtcars)
I would like to add a horizontal line to each subplot which corresponds to the coefficients of a model fit without groups (a complete pooling model: cp=lm(mpg~cyl+disp, data=mtcars)). I know how to add a generic horizontal line, with intercept 0, for instance. However, does anyone know how to add a different line to each subplot?
When I vectorise the coefficients of cp (cp_coeff=coef(cp)) and add them to the plot, I get all of them at once on every subplot. When I run the below loop, I get the last element of the vector printed on each subplot.
for (i in 1:2){
small_multiple(np_mtcars)+
geom_hline(cp_coeff[i])}
You need to add another layer as follows:
small_multiple(np_mtcars) +
geom_hline(data = broom::tidy(cp)[-1,], aes(yintercept=estimate))
Look at broom::tidy(cp) for an explanation as to why this works (compared to np_mtcars), and think about how it will be plotted given the facets already defined in the graph.
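As a rough sketch of why the extra layer lands in the right panels: the data passed to geom_hline has one row per coefficient and a term column. This assumes that small_multiple() facets its panels by a column named term, which is what the answer above relies on. The base-R code below builds an equivalent table without broom; the name cp_lines is made up for illustration.

```r
# Complete pooling model, as in the question
cp <- lm(mpg ~ cyl + disp, data = mtcars)

# One row per coefficient: the same shape as the term/estimate
# columns of broom::tidy(cp), built here with base R only
cp_lines <- data.frame(term = names(coef(cp)),
                       estimate = unname(coef(cp)))

# Drop the intercept, which has no panel of its own in the plot
cp_lines <- cp_lines[cp_lines$term != "(Intercept)", ]
print(cp_lines)
```

Because each row carries its own term value, a layer such as geom_hline(data = cp_lines, aes(yintercept = estimate)) should be split across the facets in the same way as the points, giving one line per panel instead of every line in every panel.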
I'm using GAM in R and I can't understand why the output of two different equations that should give the same plot is not exactly the same.
For example, when using the mpg dataset with a multivariate equation as follows, I get the plot for the additive effect of weight and rpm on hw.mpg. Then I want to see what happens when I plot rpm by fuel type. This gives me 3 plots, and I expected the first one (weight) to be exactly the same as the one plotted previously without the "by fuel" differentiation. Am I missing something? And what is graph 1 in figure 2 showing?
To get figure 1:
library(mgcv) # provides gam() and s()
par(mfrow=c(1,2))
data(mpg)
mod_hwy1 <- gam(hw.mpg ~ s(weight) + s(rpm), data = mpg, method = "REML")
plot(mod_hwy1)
To get figure 2:
par(mfrow=c(1,3))
mod_hwy2 <- gam(hw.mpg ~ s(weight) + s(rpm, by=fuel), data = mpg, method = "REML")
plot(mod_hwy2)
With my own data it is even more visible that the two graphs are not exactly the same:
Please someone help me understand!
The main problem with your model is that you forgot to include the group means for the levels of fuel. As a result, the smooths, which are centred about the overall mean of the response, also have to model the group means for the levels of fuel.
Fit the model as:
mod_hwy2 <- gam(hw.mpg ~ fuel + # <--- group means
s(weight) + s(rpm, by=fuel),
data = mpg, method = "REML")
Then add in Gregor's point about these effects being conditional upon the other terms in the model and you should be able to understand what's going on and why things change.
Regarding one of your comments: the locations are shown in your plot; look at the y-axis label of each plot.
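The effect of leaving out the group means can be seen in a small simulation. This is only a sketch on made-up data, not the mpg data from the question: the variables y, x, z and g and the shift of 3 between groups are all assumptions chosen for illustration.

```r
library(mgcv)

set.seed(1)
n <- 300
g <- factor(sample(c("a", "b"), n, replace = TRUE))  # hypothetical grouping factor
x <- runif(n)
z <- runif(n)

# Group "b" sits 3 units above group "a" on average
y <- 2 * x + sin(2 * pi * z) + ifelse(g == "b", 3, 0) + rnorm(n, sd = 0.3)
dat <- data.frame(y, x, z, g)

# Without the factor main effect the centred by-smooths cannot shift a group
m_no_means <- gam(y ~ s(x) + s(z, by = g), data = dat, method = "REML")

# With the factor main effect the group means are modelled explicitly
m_means <- gam(y ~ g + s(x) + s(z, by = g), data = dat, method = "REML")

print(AIC(m_no_means, m_means))
print(coef(m_means)["gb"])  # estimated difference between the groups
```

In the second model the parametric term picks up the difference between groups, so the smooths only need to describe the shape of each curve.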
I have the following matrix:
test <- matrix(c(2006,100,
2007,105,
2008,98,
2009,102,
2010,107),ncol=2,byrow=TRUE)
And I want to draw its boxplot with
boxplot.matrix(test)
However, I only get two flat lines:
I can't pinpoint what I am doing wrong. What could be the problem?
If you examine the nature of your data, you will see that there are 2 groups that are far apart but within each group, the data points are close together.
Due to the clustering and the scaling, your data appear the way they are.
If you examine each column separately, you will get a "typical" box plot
boxplot(test[,1], main="boxplot of column 1")
boxplot(test[,2], main="boxplot of column 2")
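To see the scale problem numerically before plotting, one option is to compare the range of each column. This is a minimal sketch using the matrix from the question; the name col_ranges is made up for illustration.

```r
test <- matrix(c(2006, 100,
                 2007, 105,
                 2008, 98,
                 2009, 102,
                 2010, 107), ncol = 2, byrow = TRUE)

# Minimum (row 1) and maximum (row 2) of each column
col_ranges <- apply(test, 2, range)
print(col_ranges)

# Spread inside each column compared with the gap between the columns
print(apply(col_ranges, 2, diff))
```

Each column spans fewer than ten units while the two columns are roughly 1900 units apart, so on a shared y-axis both boxes collapse into flat lines.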
So I'm asked to obtain an estimate theta for the variable Length in the muscle data set from the MASS package. The code I used is shown below, as well as the resulting curve. Somehow, I don't end up with a smooth curve, but with a very "blocky" one, as well as some lines between points on the curve. Can anyone help me to get a smooth curve?
utils::data(muscle,package = "MASS")
Length.fit<-nls(Length~t1+t2*exp(-Conc/t3),muscle,
start=list(t1=3,t2=-3,t3=1))
plot(Length~Conc,data=muscle)
lines(muscle$Conc, predict(Length.fit))
Image of the plot:
Edit: as a follow-up question:
If I want to more accurately predict the curve, I use nonlinear regression to predict the curve for each of the 21 species. This gives me a vector
theta=(T11,T12,...,T21,T22,...,T3).
I can create a for-loop that plots all of the curves but, as before, I end up with blocky curves. I have to plot these curves as follows:
for(i in 1:21) {
lines(muscle$Conc,theta[i]+theta[i+21]*
exp(-muscle$Conc/theta[43]), col=color[i])
i = i+1
}
I don't see how I can use the same trick to smooth out these curves, since muscle$Conc still only has 4 values.
Edit 2:
I figured it out, and changed it to the following:
lines(seq(0,4,0.1),theta[i]+theta[i+21]*exp(-seq(0,4,0.1)/theta[43]), col=color[i])
If you look at the output of cbind(muscle$Conc, predict(Length.fit)), you'll see that many points are repeated and that they're not sorted in order of Conc. lines just plots the points in order and connects the points, giving you multiple back-and-forth lines. The code below runs predict on a unique set of ordered values for Conc.
plot(Length ~ Conc,data=muscle)
lines(seq(0,4,0.1),
predict(Length.fit, newdata=data.frame(Conc=seq(0,4,0.1))))
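The two properties mentioned above (repeated values and no ordering) can be checked directly on muscle$Conc. This is a small diagnostic sketch; the name conc_grid is made up for illustration.

```r
data(muscle, package = "MASS")

# TRUE if the x values are not in increasing order, which makes
# lines() jump back and forth across the plot
print(is.unsorted(muscle$Conc))

# A non-zero result means at least one x value occurs more than once
print(anyDuplicated(muscle$Conc))

# A sorted, duplicate-free grid over the observed range is a safer
# x axis for drawing a fitted curve
conc_grid <- seq(min(muscle$Conc), max(muscle$Conc), length.out = 100)
print(range(conc_grid))
```

A grid like conc_grid can be passed as newdata to predict(), as in the answer above, or used directly in a formula when the coefficients are available as plain numbers.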
I am trying to output 2 different graphs, each with a regression line. I am using the mtcars data set, which I believe is built into R. I am comparing 2 different pairs of variables and fitting a regression line to each. The problem seems to be that the regression line from the 2nd graph is for some reason drawn in the first graph as well.
I just want it to show 1 regression line in each graph the way it should be.
mtcars
names(mtcars)
attach(mtcars)
par(mfrow=c(1,2), bg="white")
with(mtcars,
{
regrline=(lm(gear~mpg))
abline(regrline)
plot(mpg,gear,abline(regrline, col="red"),main="MPG vs Gear")
# The black line in the first graph is the regression line(blue) from the second graph
regrline=(lm(cyl~disp))
abline(regrline)
plot(disp,cyl,abline(regrline, col="blue"),main="Displacement vs Number of Cylinder")
})
Also, when I run the plotting code separately, I don't see the black line. It's only when I run it inside with() that the problem appears.
First of all, you really should avoid using attach. And for functions that have data= parameters (like plot and lm), it's usually wiser to use that parameter rather than with().
Also, abline() is a function that should be called after plot(). Passing it as a parameter to plot() doesn't really make any sense.
Here's a better arrangement of your code:
par(mfrow=c(1,2), bg="white")
regrline=lm(gear~mpg, mtcars)
plot(gear~mpg,mtcars,main="MPG vs Gear")
abline(regrline, col="red")
regrline=lm(cyl~disp, mtcars)
plot(cyl~disp,mtcars,main="Displacement vs Number of Cylinder")
abline(regrline, col="blue")
You got that second regression line because you were calling abline() before plot() for the second regression, so the line was drawn on the first plot.
Here is your code cleaned up a little. You were making redundant calls to abline that were drawing the extra lines.
By the way, you don't need to use attach when you use with. with is basically a temporary attach.
par(mfrow=c(1,2), bg="white")
with(mtcars,
{
regrline=(lm(gear~mpg))
plot(mpg,gear,main="MPG vs Gear")
abline(regrline, col="red")
regrline=(lm(cyl~disp))
plot(disp,cyl,main="Displacement vs Number of Cylinder")
abline(regrline, col="blue")
}
)
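Since both answers follow the same "fit, plot, then abline" order, that order can be wrapped in a small helper so it cannot be mixed up. This is only a sketch; plot_with_fit is a made-up name and not part of base R.

```r
# Hypothetical helper: fit a linear model, draw the scatter plot,
# then add the regression line to that same plot
plot_with_fit <- function(formula, data, col = "red", main = "") {
  fit <- lm(formula, data = data)
  plot(formula, data = data, main = main)  # new plot first
  abline(fit, col = col)                   # line goes on the plot just drawn
  invisible(fit)                           # return the model without printing it
}
```

After par(mfrow=c(1,2)), calling plot_with_fit(gear ~ mpg, mtcars, col = "red", main = "MPG vs Gear") and then plot_with_fit(cyl ~ disp, mtcars, col = "blue", main = "Displacement vs Number of Cylinder") should give one line per panel.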
I'm simply trying to display the fit I've generated using lm(), but the lines function is giving me a weird result in which there are multiple lines coming out of one point.
Here is my code:
library(ISLR)
data(Wage)
lm.mod<-lm(wage~poly(age, 4), data=Wage)
Wage$lm.fit<-predict(lm.mod, Wage)
plot(Wage$age, Wage$wage)
lines(Wage$age, Wage$lm.fit, col="blue")
I've tried resetting my plot with dev.off(), but I've had no luck. I'm using RStudio. FWIW, the line shows up perfectly fine if I make the regression linear only, but as soon as I make it quadratic or higher (using I(age^2) or poly()), I get a weird graph. Also, the points() function works fine with poly().
Thanks for the help.
Because you forgot to order the points by age first, the lines jump between ages in whatever order they appear in the data. This is happening for the linear regression too; the reason it looks fine there is that moving between any set of points on a straight line, in any order, stays on the line!
plot(Wage$age, Wage$wage)
lines(sort(Wage$age), Wage$lm.fit[order(Wage$age)], col = 'blue')
Consider increasing the line width for a better view:
lines(sort(Wage$age), Wage$lm.fit[order(Wage$age)], col = 'blue', lwd = 3)
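The pairing of sort() on the x values with order() on the fitted values can be checked on a toy example. The vectors below are made-up numbers, not taken from the Wage data.

```r
x <- c(30, 18, 45, 25)  # hypothetical ages, unsorted
y <- c(3, 1, 4, 2)      # hypothetical fitted values, in the same row order as x

o <- order(x)           # the permutation that sorts x

# x[o] is the same as sort(x), and y[o] keeps each fitted value
# next to the age it belongs to
print(cbind(x = x[o], y = y[o]))
```

Computing the permutation once and indexing both vectors with it (x[o], y[o]) is equivalent to the sort()/order() pair used above and makes the pairing explicit.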
Just to add another more general tip on plotting model predictions:
An often used strategy is to create a new data set (e.g. newdat) which contains a sequence of values for your predictor variables across a range of possible values. Then use this data to show your predicted values. In this data set, you have a good spread of predictor variable values, but this may not always be the case. With the new data set, you can ensure that your line represents evenly distributed values across the variable's range:
Example
newdat <- data.frame(age=seq(min(Wage$age), max(Wage$age),length=1000))
newdat$pred <- predict(lm.mod, newdata=newdat)
plot(Wage$age, Wage$wage, col=8, ylab="Wage", xlab="Age")
lines(newdat$age, newdat$pred, col="blue", lwd=2)