I'm working with this data in RStudio. I need to run a simple regression of lwage76 on ed76 and a saturated regression that turns ed76 into a dummy variable for every level within the column. Then I need to plot both regressions in an XY plot with lwage76 on the Y axis and ed76 on the X axis. This is what I have so far:
regression <- lm(nlsdata$lwage76~nlsdata$ed76)
predicted <- data.frame(Edu =nlsdata$ed76, Wage = predict(regression))
aggplot <- aggregate(Wage ~ Edu, data=predicted, mean)
library(lattice) # provides xyplot()
xyplot( Wage ~ Edu, data = aggplot, grid = TRUE, type = c("p","l"))
This gives me a very nice XY plot, but now I need to add the predicted values from my saturated model:
satreg <- lm(lwage76 ~ ed76*edu_1 + ed76*edu_2 + ed76*edu_3 +
ed76*edu_4 + ed76*edu_5 + ed76*edu_6 + ed76*edu_7 +
ed76*edu_8 + ed76*edu_9 + ed76*edu_10 + ed76*edu_11 +
ed76*edu_12 + ed76*edu_13 + ed76*edu_14 + ed76*edu_15 +
ed76*edu_16 + ed76*edu_17, data = nlsdata)
satmodel <- data.frame(Edu =nlsdata$ed76, Wage = predict(satreg))
So how do I add the second data set to the graph that I have?
Solution in ggplot:
ggplot(data=predicted, aes(Edu, Wage)) +
geom_line() +
geom_point() +
geom_line(data=satmodel, colour="blue") +
geom_point(data=satmodel, colour="blue")
Alternatively, you can label each table and combine them into a single data.frame.
library(dplyr) # provides %>% and mutate()
satmodel <- satmodel %>% mutate(type="sat_model")
predicted <- predicted %>% mutate(type="predicted")
df <- rbind(satmodel, predicted)
ggplot(df, aes(Edu, Wage, colour=type)) +
geom_line() +
geom_point()
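As a side note on the saturated model itself, here is a minimal sketch of the "one dummy per level" idea using factor(). It assumes a synthetic stand-in for nlsdata (the toy data frame and its values are made up for illustration), so it is not the asker's actual model, only a way to see what such a model predicts.

```r
# Synthetic stand-in for nlsdata (hypothetical values, not the real survey data)
set.seed(42)
toy <- data.frame(ed76 = sample(8:18, 500, replace = TRUE))
toy$lwage76 <- 5 + 0.05 * toy$ed76 + rnorm(500, sd = 0.4)

# Linear model: a single slope across all schooling levels
lin_fit <- lm(lwage76 ~ ed76, data = toy)

# Saturated model: factor() creates one dummy per schooling level
sat_fit <- lm(lwage76 ~ factor(ed76), data = toy)

# One row per schooling level, holding both sets of predictions
levels_df <- data.frame(ed76 = sort(unique(toy$ed76)))
levels_df$linear    <- predict(lin_fit, newdata = levels_df)
levels_df$saturated <- predict(sat_fit, newdata = levels_df)

# The saturated predictions coincide with the per-level sample means
cell_means <- tapply(toy$lwage76, toy$ed76, mean)
same_as_means <- all.equal(unname(levels_df$saturated), as.numeric(cell_means))
```

A data frame shaped like levels_df can then be plotted with either of the approaches above, one line per model.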
Related
In the R statistical package, is there a way to plot a graph of a second order polynomial regression with one continuous variable and one categorical variable?
To generate a linear regression graph with one categorical variable:
library(ggplot2)
library(ggthemes) ## theme_few()
set.seed(1)
df <- data.frame(minutes = runif(60, 5, 15), endtime=60, category="a")
df$category <- rep(letters[1:2], length.out = nrow(df)) # alternate categories "a" and "b"
df$endtime = df$endtime + df$minutes^3/180 + df$minutes*runif(60, 1, 2)
ggplot(df, aes(y=endtime, x=minutes, col = category)) +
geom_point() +
geom_smooth(method=lm) +
theme_few()
To plot a polynomial graph with one continuous variable:
ggplot(df, aes(x=minutes, y=endtime)) +
geom_point() +
stat_smooth(method='lm', formula = y ~ poly(x,2), size = 1) +
xlab('Minutes of warm up') +
ylab('End time')
But I can’t figure out how to plot a polynomial graph with one continuous variable and one categorical variable.
Just add a colour or group mapping. This will make ggplot fit and display separate polynomial regressions for each category. Two caveats: (1) stat_smooth() cannot display an additive model this way (i.e. lm(y ~ poly(x,2) + category)), because it fits each group separately; (2) what's shown here is not quite equivalent to the results of the interaction model lm(y ~ poly(x,2)*category), because the residual variances (and hence the widths of the confidence ribbons) are estimated separately for each group.
ggplot(df, aes(x=minutes, y=endtime, col = category)) +
geom_point() +
stat_smooth(method='lm', formula = y ~ poly(x,2)) +
labs(x = 'Minutes of warm up', y = 'End time') +
theme_few()
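If the additive model is what you are after, one possible workaround is to fit it yourself and draw the predictions with geom_line(). This is only a sketch under the assumption that the data look like the example in the question; add_fit and grid are names introduced here for illustration.

```r
library(ggplot2)

# Same kind of toy data as in the question
set.seed(1)
df <- data.frame(minutes = runif(60, 5, 15),
                 category = rep(letters[1:2], length.out = 60))
df$endtime <- 60 + df$minutes^3 / 180 + df$minutes * runif(60, 1, 2)

# Additive model: one shared quadratic shape, category only shifts the curve
add_fit <- lm(endtime ~ poly(minutes, 2) + category, data = df)

# Prediction grid covering the observed range for each category
grid <- expand.grid(minutes = seq(min(df$minutes), max(df$minutes), length.out = 100),
                    category = unique(df$category),
                    stringsAsFactors = FALSE)
grid$endtime <- predict(add_fit, newdata = grid)

# Points from the data, lines from the fitted additive model
p <- ggplot(df, aes(x = minutes, y = endtime, colour = category)) +
  geom_point() +
  geom_line(data = grid)
p
```

Because the model is additive, the two curves are parallel: they differ only by the estimated category effect.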
I'm trying to make a boxplot to visualize this regression model
library(lme4)
lmer(dv1 ~ intervention + (1|id/area),
data=data,
REML=T)
In this experiment, the control and treatment intervention are both applied to a subject within discrete areas.
Here's the data I'm using
data <- data.frame("id" = 1:2,
"intervention" = c(rep("a",27),rep("b", 27)),
"area" = 1:3,
"dv1" = rnorm(54),
"dv2" = rnorm(54),
"dv3" = rnorm(54))
data$area <- as.factor(data$area)
data$id <- as.factor(data$id)
Here's what I've tried
library(ggplot2)
ggplot(data,aes(x=area,y=dv1,col=intervention)) +
geom_point() +
geom_boxplot(alpha=0.2) +
facet_wrap(~id) +
ggtitle("DV1") +
xlab("Intervention") +
ylab("DV1")
Instead of the red points overlaying the red boxplot, they're all over the place. How do I fix this?
Edit: I used the jitter options that u/eipi10 suggested and this is what I have now.
ggplot(data,aes(x=area,y=dv1,col=intervention)) +
geom_point(position=position_jitterdodge(dodge.width=0.75, jitter.height=0, jitter.width=0.25), alpha=0.6) +
geom_boxplot(alpha=0.2, size=0.3) +
facet_wrap(~id) +
ggtitle("DV1") +
xlab("Area") +
ylab("DV1")
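For what it's worth, the reason the points were "all over the place" is that geom_boxplot() dodges the two interventions side by side within each area, while geom_point() by default does not. A minimal sketch without jitter, assuming toy data shaped like the example above (toy is a made-up name), is to give the points a matching dodge:

```r
library(ggplot2)

# Toy data: 3 areas x 2 interventions, 9 observations per cell
set.seed(1)
toy <- data.frame(area = factor(rep(1:3, times = 18)),
                  intervention = rep(c("a", "b"), each = 27),
                  dv1 = rnorm(54))

# Dodge the points by the same width the boxplots use by default (0.75),
# so each colour's points sit on top of that colour's box
p <- ggplot(toy, aes(x = area, y = dv1, colour = intervention)) +
  geom_boxplot(alpha = 0.2) +
  geom_point(position = position_dodge(width = 0.75))
p
```

position_jitterdodge() does the same dodging and additionally spreads the points horizontally, which helps when many points overlap.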
I have the following data
df <- data.frame(x= c(0,1,10,100,1000,0,1, 10,100,1000,0,1,10,100,1000),
y=c(7,15,135,1132,6459,-3,11,127,1120,6249,-5,13,126,1208,6208))
After fitting a linear model to the data, I used the model to predict y values from known x values and stored the predicted y values in a data frame, pred.fits.
fit <- lm(data = df, y ~ x)
pred.fits <- expand.grid(x=seq(1, 2000, length=2001))
pm <- predict(fit, newdata=pred.fits, interval="confidence")
pred.fits$py <- pm[,1]
I plot the data using both geom_smooth() and geom_line(), and the two lines coincide almost exactly.
ggplot(df, aes(x=x, y=y)) +
geom_point() +
geom_smooth(method = lm, formula = y ~ x, se = FALSE, size=1.5) +
geom_line(data=pred.fits, aes(x=x, y=py), size=.2)
However, when I plot the same data with the axes on a log scale, the two regression lines differ drastically.
ggplot(df, aes(x=x, y=y)) +
geom_point() +
geom_smooth(method = lm, formula = y ~ x, se = FALSE, size=1.5) +
geom_line(data=pred.fits, aes(x=x, y=py), size=.2) +
scale_x_log10() +
scale_y_log10()
Am I missing something here?
UPDATE
After @Duck pointed me in the right direction, I was able to get it right. The issue was that I wanted the data to stay untransformed but the axes to be on a log10 scale. This is how I did it:
df2 <- df[df$x>=1,] # remove annoying warning msgs.
fit2 <- lm(data = df2, log10(y) ~ log10(x))
pred.fits2 <- expand.grid(x=seq(10^0, 10^3 , length=200))
pm2 <- predict(fit2, newdata=pred.fits2, interval="confidence")
pred.fits2$py <- 10^pm2[,1] # convert the predicted y values to linear scale
ggplot(df2, aes(x=x, y=y)) +
geom_point() +
geom_smooth(method = lm, formula = y ~ x, se = FALSE, size=1.5) +
geom_line(data=pred.fits2, aes(x=x, y=py), size=1.5, linetype = "longdash") +
scale_x_log10() +
scale_y_log10()
Thanks everyone for your help.
This code may help your understanding (thanks to @BWilliams for the valuable comment). You want x and y on a log scale, so mixing a model fitted on the linear scale with log-scaled axes can throw everything off. If you want the lines to be comparable, it is better to fit a separate model on the log-transformed variables and then plot it using the matching values. Here is an approach where we build a log-log model and then plot it (rows where x is 0 or 1, which include the negative y values, are filtered out into a new data frame df2). Here is the code:
First linear model:
library(ggplot2)
#Data
df <- data.frame(x= c(0,1,10,100,1000,0,1, 10,100,1000,0,1,10,100,1000),
y=c(7,15,135,1132,6459,-3,11,127,1120,6249,-5,13,126,1208,6208))
#Model 1 all obs
fit <- lm(data = df, y ~ x)
pred.fits <- expand.grid(x=seq(1, 2000, length=2001))
pm <- predict(fit, newdata=pred.fits, interval="confidence")
pred.fits$py <- pm[,1]
#Plot 1
ggplot(df, aes(x=x, y=y)) +
geom_point() +
geom_smooth(method = lm, formula = y ~ x, se = FALSE, size=1.5) +
geom_line(data=pred.fits, aes(x=x, y=py), size=.2)
Output:
Now the sketch for the log variables. Notice how we use log() on both main variables and how the model is built:
#First remove issue values
df2 <- df[df$x>1,]
#Train a new model
pred.fits2 <- expand.grid(x=seq(1, 2000, length=2001))
fit2 <- lm(data = df2, log(y) ~ log(x))
pm2 <- predict(fit2, newdata=pred.fits2, interval="confidence")
pred.fits2$py <- pm2[,1]
#Plot 2
ggplot(df2, aes(x=log(x), y=log(y))) +
geom_point() +
geom_smooth(method = lm, formula = y ~ x, se = FALSE, size=1.5) +
geom_line(data=pred.fits2, aes(x=log(x), y=py), size=.2)
Output:
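To see why the two lines diverge on log axes: ggplot2 applies scale transformations before computing statistics, so with scale_x_log10() and scale_y_log10() the geom_smooth() layer is effectively fitting a log-log model, while the geom_line() layer still shows predictions from the linear-scale model. The sketch below compares the two fits directly in base R; fit_raw, fit_log and keep are names introduced here for illustration.

```r
# Data from the question
df <- data.frame(x = c(0,1,10,100,1000,0,1,10,100,1000,0,1,10,100,1000),
                 y = c(7,15,135,1132,6459,-3,11,127,1120,6249,-5,13,126,1208,6208))

# Model on the original scale (what the geom_line() layer shows)
fit_raw <- lm(y ~ x, data = df)

# Rows that survive a log10 transform of both axes
# (x = 0 and negative y cannot be shown on a log scale)
keep <- df$x > 0 & df$y > 0

# Model on the log10 scale (roughly what geom_smooth() fits under log10 scales)
fit_log <- lm(log10(y) ~ log10(x), data = df[keep, ])

coef(fit_raw)
coef(fit_log)

# Back-transform the log-log predictions to compare on the original scale
new_x <- data.frame(x = c(1, 10, 100, 1000))
cbind(new_x,
      linear = predict(fit_raw, newdata = new_x),
      loglog = 10^predict(fit_log, newdata = new_x))
```

The two models make noticeably different predictions at small x, which is the region a log axis stretches out.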
I've been trying different suggestions, such as the ggpmisc package, but nothing seems to work for me.
I'm using the iris dataframe and just trying to plot random variables:
modellm <- lm(`Sepal.Length` ~ `Sepal.Width` + `Petal.Length` + `Petal.Width`, data = iris)
model <- coef(modellm)["(Intercept)"] +
coef(modellm)["Sepal.Width"] * iris$`Sepal.Width` +
coef(modellm)["Petal.Length"] * iris$`Petal.Length` +
coef(modellm)["Petal.Width"] * iris$`Petal.Width` +
residuals(modellm)
library(ggplot2)
ggplot(iris, aes(`Sepal.Length`, model))+
geom_point(size=2, alpha=0.2)+
geom_smooth(method='lm')
How is it possible for me to get the R-squared value plotted in the ggplot?
If you want to display the r squared value just add this to the end of your plot:
+ annotate("text", x = 1, y = 1, label = paste0("R Squared = ", summary(modellm)$r.squared))
Adjust the placement with the x and y coordinates so that the label falls within the range of your data.
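Here is a self-contained sketch of that idea, with the value rounded and the label anchored to the top-left corner of the data. Note one assumption that differs from the question: it plots the fitted values against the observed values (plot_df, observed and fitted are names made up here), which is a common way to visualise such a fit.

```r
library(ggplot2)

# Fit the model from the question
modellm <- lm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width, data = iris)
r2 <- summary(modellm)$r.squared

# Observed vs fitted values (an assumption; the question plots a different y)
plot_df <- data.frame(observed = iris$Sepal.Length, fitted = fitted(modellm))

p <- ggplot(plot_df, aes(observed, fitted)) +
  geom_point(size = 2, alpha = 0.2) +
  geom_smooth(method = "lm", formula = y ~ x) +
  # Place the label inside the plotted range, left-aligned
  annotate("text", x = min(plot_df$observed), y = max(plot_df$fitted),
           hjust = 0, label = paste0("R squared = ", round(r2, 3)))
p
```

Deriving x and y from the data keeps the label from stretching the axes, which a fixed position such as x = 1, y = 1 would do here.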
If you really want to plot the R^2, you could do something like this.
library(ggplot2)
p <- ggplot(iris, aes(`Sepal.Length`, model))+
geom_point(size=2, alpha=0.2)+
geom_smooth(method='lm')
r2 <- summary(modellm)$r.squared
p + scale_y_continuous(
sec.axis=sec_axis(~ . * 4 / 30 , name = expression(paste(R^{2})))) +
geom_rect(xmin=7.9, xmax=8, ymin=0, ymax=1*30/4,
fill="white", color="#78B17E") +
geom_rect(xmin=7.9, xmax=8, ymin=0, ymax=r2*30/4, fill="#78B17E") +
annotate("text", x = 7.95, y = 7.62, size=3, color="#78B17E",
label = paste0(round(r2, 2)))
Yields
I wish to highlight segments above or below a certain value in a time series with a distinct colour or shape. In the example data I am decomposing a mortality time series into its components. My goal is to highlight the segments where the trend component falls below 35 (the dip between 1997 and 2000) and where the residual component is above 100 (the spike). I have tried to use annotate, but that did not produce what I wanted.
#Load library and obtain data
library(gamair)
library(tsModel)
library(ggplot2)
library(reshape2)
data(chicago) # loads the chicago data frame into the workspace
## create variables, decompose TS
chicago$date<-seq(from=as.Date("1987-01-01"), to=as.Date("2000-12-31"),length=5114)
data<- chicago[,c("date","death")]
mort <- tsdecomp(data$death, c(1, 2, 15, 5114))
## Convert matrix to df, rename, melt
df<-as.data.frame(mort)
names(df)[1] <- "Trend"
names(df)[2] <- "Seasonal"
names(df)[3] <- "Residual"
df$date<-seq(as.Date("1987-01-01"), as.Date("2000-12-31"), "day")
meltdf <- melt(df,id="date")
## Plot
ggplot(meltdf,aes(x=date,y=value,colour=variable,group=variable)) + geom_line() +
theme_bw() +
ylab("") + xlab("") +
facet_grid(variable ~ . , scales = "free") +
theme(legend.position = "none") +
annotate("rect", xmin=as.Date("1995-01-01"),xmax=as.Date("1996-01-01"),ymin= 10, ymax=300, alpha = .2,fill="blue")
Well, this works, but I must admit it's more work than I'd hoped.
get.box <- function(data) {
rng <- range(data$date) + c(-50,50)
z <- meltdf[meltdf$date>=rng[1] & meltdf$date <=rng[2] & meltdf$variable==unique(data$variable),]
data.frame(variable=unique(z$variable),
xmin=min(z$date),xmax=max(z$date),ymin=min(z$value),ymax=max(z$value))
}
hilight.trend <- get.box(with(meltdf,meltdf[variable=="Trend" & value<35,]))
hilight.resid <- get.box(with(meltdf,meltdf[variable=="Residual" & value>100,]))
ggplot(meltdf,aes(colour=variable,group=variable)) +
geom_line(aes(x=date,y=value)) +
theme_bw() +
ylab("") + xlab("") +
facet_grid(variable ~ . , scales = "free") +
theme(legend.position = "none") +
geom_rect(data=hilight.trend, alpha=0.2, fill="red",
aes(xmax=xmax,xmin=xmin,ymax=ymax,ymin=ymin)) +
geom_rect(data=hilight.resid, alpha=0.2, fill="blue",
aes(xmax=xmax,xmin=xmin,ymax=ymax,ymin=ymin))
You can't really use annotate(...) with facets, because you will get the same annotation on all the facets. So you're left with something like geom_rect(...). The problem here is that geom_rect(...) draws a rectangle for every row in the data. So you need to create an auxiliary dataset with just one row for each variable, containing the x- and y- min and max.
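To make the "one row per facet" idea concrete without the gamair data, here is a minimal sketch on made-up data (toy, resid_rows and box are hypothetical names). The auxiliary data frame carries the facetting variable, so the rectangle is drawn only in the matching panel.

```r
library(ggplot2)

# Toy long-format data with two facets
set.seed(1)
toy <- data.frame(
  date = rep(seq(as.Date("2000-01-01"), by = "day", length.out = 100), times = 2),
  variable = rep(c("Trend", "Residual"), each = 100),
  value = c(cumsum(rnorm(100)), rnorm(100))
)

# Rows of interest: residuals above an arbitrary threshold of 1
resid_rows <- toy[toy$variable == "Residual" & toy$value > 1, ]

# Auxiliary data: exactly one row, tagged with the facet it belongs to
box <- data.frame(variable = "Residual",
                  xmin = min(resid_rows$date), xmax = max(resid_rows$date),
                  ymin = min(resid_rows$value), ymax = max(resid_rows$value))

p <- ggplot(toy, aes(x = date, y = value)) +
  geom_line() +
  # inherit.aes = FALSE: the box has no date/value columns of its own
  geom_rect(data = box, inherit.aes = FALSE, alpha = 0.2, fill = "blue",
            aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax)) +
  facet_grid(variable ~ ., scales = "free")
p
```

Adding a second row to box with variable = "Trend" would draw a separate rectangle in the other panel.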