I'm trying to add a second-order (quadratic) curve to a scatterplot.
I've read the answers to previous similar questions and here's what I came up with:
x<-log2(c(100,500,1000,2000,4000))
y<-c(3.6,1.308,1.065,.960,.908)
plot(x,y,pch=1)
mod_<-lm(y~poly(x,2,raw=TRUE))
lines(x,predict(mod_),col='red',lty=2)
Still, I get linear segments instead of a smooth curve.
What mistake am I not seeing here? Thanks!
You are calling predict() with only the model. That evaluates the model only at the x values used in your lm() call, i.e. your five data points, so lines() just joins those five points with straight segments.
You need to supply a new set of values at which the model will be evaluated.
For instance, this gives you a nice smooth line:
x<-log2(c(100,500,1000,2000,4000))
y<-c(3.6,1.308,1.065,.960,.908)
plot(x,y,pch=1)
mod_<-lm(y~poly(x,2,raw=TRUE))
# Define the new points at which you want to evaluate your model
new.x <- seq(6, 12, 0.1)
lines(new.x, predict(mod_, newdata = list(x=new.x)),col='red',lty=2)
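A variant of the same fix, sketched under the assumption that you only want the curve over the observed x range: the grid is built from min(x) and max(x) instead of the hard-coded 6 and 12, and newdata is a data frame whose column is named x so it matches the variable in the formula. The grid size (n.grid) is an arbitrary choice of mine.

```r
x <- log2(c(100, 500, 1000, 2000, 4000))
y <- c(3.6, 1.308, 1.065, .960, .908)
mod_ <- lm(y ~ poly(x, 2, raw = TRUE))

# Evaluate the fitted quadratic on a fine grid spanning the observed x values
n.grid <- 200
new.x <- seq(min(x), max(x), length.out = n.grid)
new.y <- predict(mod_, newdata = data.frame(x = new.x))

plot(x, y, pch = 1)
lines(new.x, new.y, col = "red", lty = 2)
```

Because raw = TRUE, the curve is just b0 + b1*x + b2*x^2 with the coefficients from coef(mod_), evaluated at each grid point.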
You can also use ggplot2 like this
library(ggplot2)
df <- data.frame(x, y)
ggplot(data=df, aes(x, y))+geom_point()+stat_smooth(method="lm", formula = y ~ poly(x, 2, raw =TRUE))
I am plotting my coefficient estimates using the function plot_summs() and would like to divide my coefficients into two separate groups.
The function plot_summs() has a groups argument; however, when I use it as explained in the documentation, I get neither grouped panes nor an error. Can someone give me an example of how to use this argument, please?
This is the code I currently have:
plot_summs(model.c, scale = TRUE,
           groups = list(pane_1 = c("AQI_average", "temp_yearly"),
                         pane_2 = c("rain_1h_yearly", "snow_1h_yearly")),
           coefs = c("AQI Average" = "AQI_average",
                     "Temperature (in Fahrenheit)" = "temp_yearly",
                     "Rain volume in mm" = "rain_1h_yearly",
                     "Snow volume in mm" = "snow_1h_yearly"))
The image below shows what I get. What I would like is two separate panes: one with "AQI_average" and "temp_yearly", and the other with "rain_1h_yearly" and "snow_1h_yearly". Even though I use the groups argument, I do not get this.
Output of my code
By "minimal reproducible example", markus is referring to a piece of code that lets others reproduce exactly the issue you are describing on their own computers, as explained in the link they provided.
The problem seems to be that the groups argument does not work in plot_summs(); someone has also pointed this out here.
If plot_summs() is replaced by plot_coefs(), the groups argument works for me. However, the scale argument does not seem to be available there. A workaround might be:
library(jtools)   # plot_summs(), plot_coefs()
library(ggplot2)  # theme_linedraw()
r <- lm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width, data = iris)
y <- plot_summs(r, scale = TRUE) #Plot for scaled version
t <- plot_coefs(r, #Plot for unscaled versions but with facetting
groups =
list(
pane_1 = c("Sepal.Width", "Petal.Length"),
pane_2 = c("Petal.Width"))) + theme_linedraw()
y$data$group <- t$data$group #Add faceting column to data for the plot (assumes both data frames list the terms in the same order)
t$data <- y$data #Replace the data with the scaled version
t
I hope this is what you meant!
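If jtools is not an option, or swapping the data between two plot objects feels fragile, here is a base-R sketch of the same idea: fit the model on standardised predictors and draw each group of coefficients in its own pane. It assumes that scale = TRUE in jtools means "centre each predictor and divide by 1 SD, leave the response alone"; I believe that is its default (see its n.sd argument), but check. The names iris.s, fit.s and groups are my own.

```r
fit <- lm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width, data = iris)
preds <- c("Sepal.Width", "Petal.Length", "Petal.Width")

# Standardise the predictors (mean 0, SD 1); the response is left unscaled
iris.s <- iris
iris.s[preds] <- lapply(iris.s[preds], function(v) (v - mean(v)) / sd(v))
fit.s <- lm(formula(fit), data = iris.s)

est <- coef(fit.s)[preds]
ci <- confint(fit.s)[preds, , drop = FALSE]

# Hypothetical grouping, mirroring the groups= list used above
groups <- list(pane_1 = c("Sepal.Width", "Petal.Length"),
               pane_2 = c("Petal.Width"))

op <- par(mfrow = c(1, length(groups)), mar = c(4, 8, 3, 1))
for (g in names(groups)) {
  vars <- groups[[g]]
  k <- length(vars)
  # Point estimate per coefficient, one row per term
  plot(est[vars], seq_len(k), pch = 19, yaxt = "n", ylab = "",
       xlim = range(ci[vars, ], 0), ylim = c(0.5, k + 0.5),
       xlab = "Estimate per 1 SD of predictor", main = g)
  axis(2, at = seq_len(k), labels = vars, las = 1)
  # 95% confidence intervals from confint()
  segments(ci[vars, 1], seq_len(k), ci[vars, 2], seq_len(k))
  abline(v = 0, lty = 2)
}
par(op)
```

Centring does not change the slopes, so each standardised coefficient should equal the unscaled one multiplied by the predictor's SD.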
I apologize first for bringing what I imagine to be a ridiculously simple problem here, but I have been unable to glean from the help file for package 'polynom' how to solve this problem. For one out of several years, I have two vectors of x (d for day of year) and y (e for an index of egg production) data:
d=c(169,176,183,190,197,204,211,218,225,232,239,246)
e=c(0,0,0.006839425,0.027323127,0.024666883,0.005603878,0.016599262,0.002810977,0.005603878,0,0.002810977,0.002810977)
I want to, for each year, use the poly.calc function to create a polynomial function that I can use to interpolate the timing of maximum egg production. I then want to superimpose the function on a plot of the data. To begin, I have no problem with the poly.calc function:
egg1996<-poly.calc(d,e)
egg1996
3216904000 - 173356400*x + 4239900*x^2 - 62124.17*x^3 + 605.9178*x^4 - 4.13053*x^5 +
0.02008226*x^6 - 6.963636e-05*x^7 + 1.687736e-07*x^8
I can then simply
plot(d,e)
But when I try to use the lines function to superimpose the function on the plot, I get confused. The help file states that the output of poly.calc is an object of class polynomial, and so I assume that "egg1996" will be the "x" in:
lines(x, len = 100, xlim = NULL, ylim = NULL, ...)
But neither the example listed:
lines(poly.calc(2:4), lty = 2)
nor the argument descriptions:
x: an object of class "polynomial".
len: size of vector at which evaluations are to be made.
xlim, ylim: the range of x and y values, with sensible defaults.
has helped me come up with a command that successfully graphs the polynomial "egg1996" onto the raw data.
I understand that this question is beneath you folks, but I would be very grateful for a little help. Many thanks.
I don't work with the polynom package, but the resulting polynomial is on a completely different scale (on both the x and y axes) from the first plot() call. If you don't mind having it in two separate panels, this shows both plots for comparison:
library(polynom)
d <- c(169,176,183,190,197,204,211,218,225,232,239,246)
e <- c(0,0,0.006839425,0.027323127,0.024666883,0.005603878,
0.016599262,0.002810977,0.005603878,0,0.002810977,0.002810977)
egg1996 <- poly.calc(d,e)
par(mfrow=c(1,2))
plot(d, e)
plot(egg1996)
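If the goal is mainly to overlay a smooth curve on the raw data and read off the day of peak production, one option is to swap in a cubic interpolating spline (base R's splinefun()) for poly.calc(). This is a different technique from the one asked about: a single degree-11 polynomial through 12 points tends to oscillate between the points and is numerically awkward at day-of-year scale, whereas a spline still passes through every observation. The grid size and the names day and peak.day are my own choices. (If you prefer to keep the polynom object, as.function(egg1996) should give an R function you can evaluate on a similar grid, but I have not checked how well behaved that curve is.)

```r
d <- c(169, 176, 183, 190, 197, 204, 211, 218, 225, 232, 239, 246)
e <- c(0, 0, 0.006839425, 0.027323127, 0.024666883, 0.005603878,
       0.016599262, 0.002810977, 0.005603878, 0, 0.002810977, 0.002810977)

# Interpolating cubic spline through the observed points
# (swapped in for polynom::poly.calc)
f <- splinefun(d, e, method = "fmm")
day <- seq(min(d), max(d), length.out = 500)
egg <- f(day)

plot(d, e, xlab = "Day of year", ylab = "Egg production index")
lines(day, egg, col = "red")

# Day at which the interpolated curve peaks (to the resolution of the grid)
peak.day <- day[which.max(egg)]
abline(v = peak.day, lty = 2)
```

Because the curve is evaluated on the same x scale as the data, lines() draws it directly on the first plot instead of needing a second panel.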
I have a problem with the function lines().
This is what I have written so far:
model.ew<-lm(Empl~Wage)
summary(model.ew)
plot(Empl,Wage)
mean<-1:500
lw<-1:500
up<-1:500
for(i in 1:500){
mean[i]<-predict(model.ew,data.frame(Wage=i*100),interval="confidence",level=0.90)[1]
lw[i]<-predict(model.ew,data.frame(Wage=i*100),interval="confidence",level=0.90)[2]
up[i]<-predict(model.ew,data.frame(Wage=i*100),interval="confidence",level=0.90)[3]
}
plot(Wage,Empl)
lines(mean,type="l",col="red")
lines(up,type="l",col="blue")
lines(lw,type="l",col="blue")
My problem is that no line appears on my plot, and I cannot figure out why.
Can somebody help me?
You really need to read some introductory manuals for R. Go to this page, and select one that illustrates using R for linear regression: http://cran.r-project.org/other-docs.html
First we need to make some data:
set.seed(42)
Wage <- rnorm(100, 50)
Empl <- Wage + rnorm(100, 0)
Now we run your regression and plot the lines:
model.ew <- lm(Empl~Wage)
summary(model.ew)
plot(Empl~Wage) # Note. You had the axes flipped here
Your first problem was that you flipped the axes. The dependent variable (Empl) goes on the vertical axis. That is the main reason you didn't get any lines on the plot. Getting the prediction lines requires no loops at all, only a single call to matlines():
xval <- seq(min(Wage), max(Wage), length.out=101)
conf <- predict(model.ew, data.frame(Wage=xval),
interval="confidence", level=.90)
matlines(xval, conf, col=c("red", "blue", "blue"))
That's all there is to it.
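For comparison, here is a sketch that keeps a loop like the one in the question but evaluates the model over the observed wage range, so its output can be checked against the single predict() call. It reuses the simulated data above; loop.fit is a name of my own. Note also that in the question, lines(mean) gives lines() only y values, so it plots them against 1, 2, ..., 500 rather than against the wages the model was evaluated at (100, 200, ..., 50000); lines drawn there may well fall outside the plotted region.

```r
set.seed(42)
Wage <- rnorm(100, 50)
Empl <- Wage + rnorm(100, 0)
model.ew <- lm(Empl ~ Wage)

# Loop version, one predict() call per x value, over the observed Wage range
xval <- seq(min(Wage), max(Wage), length.out = 101)
loop.fit <- matrix(NA_real_, nrow = length(xval), ncol = 3)
for (i in seq_along(xval)) {
  loop.fit[i, ] <- predict(model.ew, data.frame(Wage = xval[i]),
                           interval = "confidence", level = 0.90)
}

# Vectorised version: one call returns the fit/lwr/upr matrix for all x values
conf <- predict(model.ew, data.frame(Wage = xval),
                interval = "confidence", level = 0.90)

plot(Empl ~ Wage)
# Pass the x values explicitly; without them lines() would use 1, 2, 3, ...
matlines(xval, loop.fit, col = c("red", "blue", "blue"), lty = c(1, 2, 2))
```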
Here is some code that tries to compute the marginal effects of each of the predictors in a model (using the effects package) and then plot the results. To do this, I loop over the "term.labels" attribute of the glm terms object.
library(DAAG)
library(effects)
formula = pres.abs ~ altitude + distance + NoOfPools + NoOfSites + avrain + meanmin + meanmax
summary(logitFrogs <- glm(formula = formula, data = frogs, family = binomial(link = "logit")))
par(mfrow = c(4, 2))
for (predictorName in attr(logitFrogs$terms, "term.labels")) {
print(predictorName)
effLogitFrogs <- effect(predictorName, logitFrogs)
plot(effLogitFrogs)
}
This produces no picture at all. On the other hand, explicitly stating the predictor names does work:
effLogitFrogs <- effect("distance", logitFrogs)
plot(effLogitFrogs)
What am I doing wrong?
Although you call plot(), it actually dispatches to plot.eff(), which produces a lattice plot, so par() settings are ignored. One solution is to use allEffects() and then plot(), which calls plot.efflist(). With this function you do not need a for loop, because all the plots are made automatically.
effLogitFrogs <- allEffects(logitFrogs)
plot(effLogitFrogs)
EDIT: solution with a for loop
There is an "ugly" solution that uses a for() loop. It also needs the grid package. First, store the number of rows and columns in variables (as written, it only works with 1 or 2 columns). Then grid.newpage() and pushViewport() set up the graphics window.
The predictor names are stored in a vector outside the loop. Using pushViewport() and popViewport(), all plots are put in the same graphics window.
library(lattice)
library(grid)
n.col=2
n.row= 4
grid.newpage()
pushViewport(viewport(layout = grid.layout(n.row,n.col)))
predictorName <- attr(logitFrogs$terms, "term.labels")
for (i in 1:length(predictorName)) {
print(predictorName[i])
effLogitFrogs <- effect(predictorName[i], logitFrogs)
pushViewport(viewport(layout.pos.col=ceiling(i/n.row), layout.pos.row=ifelse(i-n.row<=0,i,i-n.row)))
p<-plot(effLogitFrogs)
print(p,newpage=FALSE)
popViewport(1)
}
Adding print() to your loop resolves the problem:
print(plot(effLogitFrogs))
plot() calls plot.eff(), which creates the plot object without printing it.
allEffects() generates an object of class efflist. Plotting this object calls plot.efflist(), which prints the plots itself, so there is no need to call print() as there is with plot.eff().
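The same point can be seen without the effects or DAAG packages. This sketch uses lattice's xyplot() on the built-in iris data as a stand-in for the effect plots: inside the loop each trellis object is printed explicitly, and the split/more arguments of lattice's print method place the plots in a grid, which is an alternative to the viewport code above. I have not checked that the object returned by plot.eff() accepts split and more in every effects version, so treat that part as an assumption.

```r
library(lattice)

vars <- c("Sepal.Width", "Petal.Length", "Petal.Width")
n.col <- 2
n.row <- ceiling(length(vars) / n.col)
plots <- vector("list", length(vars))

for (i in seq_along(vars)) {
  form <- as.formula(paste("Sepal.Length ~", vars[i]))
  plots[[i]] <- xyplot(form, data = iris, main = vars[i])
  # Position of plot i in an n.col x n.row layout, filled row by row
  col.i <- (i - 1) %% n.col + 1
  row.i <- (i - 1) %/% n.col + 1
  # Trellis objects are not auto-printed inside a loop, so print() explicitly;
  # more = TRUE keeps the page open for the next plot
  print(plots[[i]], split = c(col.i, row.i, n.col, n.row),
        more = i < length(vars))
}
```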
In an effort to help populate the R tag here, I am posting a few questions I have often received from students. I have developed my own answers to these over the years, but perhaps there are better ways floating around that I don't know about.
The question: I just ran a regression with continuous y and x and a factor f (where levels(f) produces c("level1","level2")):
thelm <- lm(y~x*f,data=thedata)
Now I would like to plot the predicted values of y by x broken down by groups defined by f. All of the plots I get are ugly and show too many lines.
My answer: Try the predict() function.
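##restrict prediction to the valid data
##from the model by using thelm$model rather than thedata
##predictions go in their own data frame: there are only 4 of them
newdat <- expand.grid(x = range(thelm$model$x),
                      f = levels(thelm$model$f))
newdat$yhat <- predict(thelm, newdata = newdat)
plot(yhat ~ x, data = newdat, subset = f == "level1", type = "l",
     ylim = range(newdat$yhat))
lines(yhat ~ x, data = newdat, subset = f == "level2", lty = 2)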
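A self-contained sketch of the predict()/expand.grid() approach on simulated data (the intercepts, slopes and noise level are made up for illustration). With a full x*f interaction, each level's fitted line should be identical to a separate regression of y on x within that level, which is a handy way to sanity-check the result.

```r
set.seed(1)
thedata <- data.frame(x = runif(40, 0, 10),
                      f = factor(rep(c("level1", "level2"), 20)))
# Hypothetical data: a different intercept and slope for each level
thedata$y <- ifelse(thedata$f == "level1",
                    1 + 0.5 * thedata$x,
                    3 - 0.2 * thedata$x) + rnorm(40, sd = 0.5)

thelm <- lm(y ~ x * f, data = thedata)

# One row per combination of the x-range end points and the factor levels
newdat <- expand.grid(x = range(thelm$model$x), f = levels(thelm$model$f))
newdat$yhat <- predict(thelm, newdata = newdat)

plot(thedata$x, thedata$y, col = as.integer(thedata$f), pch = 19,
     xlab = "x", ylab = "y")
# One line per level, coloured to match the points
for (lev in levels(newdat$f)) {
  sub <- newdat[newdat$f == lev, ]
  lines(sub$x, sub$yhat, col = match(lev, levels(newdat$f)))
}
legend("topleft", legend = levels(thedata$f),
       col = seq_along(levels(thedata$f)), lty = 1)
```

Looping over levels(newdat$f) means the same code works unchanged when f has more than two levels.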
Are there other ideas out there that are (1) easier to understand for a newcomer and/or (2) better from some other perspective?
The effects package has good plotting methods for visualizing the predicted values of regressions.
thedata <- data.frame(x = rnorm(20), f = factor(rep(c("level1", "level2"), 10)))
thedata$y <- rnorm(20, sd = 3) + thedata$x * (as.numeric(thedata$f) - 1)
library(effects)
model.lm <- lm(formula=y ~ x*f,data=thedata)
plot(effect(term="x:f",mod=model.lm,default.levels=20),multiline=TRUE)
Huh, still trying to wrap my brain around expand.grid(). Just for comparison's sake, this is how I'd do it (using ggplot2):
library(ggplot2)
preddata <- data.frame(yhat = predict(thelm), x = thelm$model$x, f = thelm$model$f)
ggplot(preddata, aes(x = x, y = yhat, group = f, color = f)) + geom_line()
The ggplot() logic is pretty intuitive, I think - group and color the lines by f. With increasing numbers of groups, not having to specify a layer for each is increasingly helpful.
I am no expert in R. But I use:
library(lattice)
xyplot(y ~ x, groups = f, data = thedata, type = c('p', 'r'),
       grid = TRUE, lwd = 3, auto.key = TRUE)
This is also an option:
interaction.plot(f,x,y, type="b", col=c(1:3),
leg.bty="o", leg.bg="beige", lwd=1, pch=c(18,24),
xlab="",
ylab="",
trace.label="",
main="Interaction Plot")
Here is a small change to the excellent suggestion by Matt, and a solution similar to Helgi's but with ggplot2. The only difference from above is that I use geom_smooth(method = 'lm'), which plots the regression lines directly.
set.seed(1)
y = runif(100,1,10)
x = runif(100,1,10)
f = rep(c('level 1','level 2'),50)
thedata = data.frame(x,y,f)
library(ggplot2)
ggplot(thedata, aes(x = x, y = y, color = f)) + geom_smooth(method = 'lm', se = FALSE)