I have input the following dataset into R:
Distance Overall Average
-9.867344 0.001928728
-8.769583 0.0011855
-8.667573 0.001401361
-8.373794 0.001514139
-7.443686 0.001046918
-3.862468 0.000790981
1.817748 0.000945557
2.5333892 0.000940648
4.190557 0.001649773
When plotted on a scatter plot, these give a U-shaped curve, so I am trying to add a non-linear regression 'line' to the plot.
[Figure: plot of the original data showing the U-shaped curve]
I saw the following example in an answer to a previous question:
plot(speed ~ dist, data = cars)
fit1 = lm(speed ~ dist, cars) #fits a linear model
plot(speed ~ dist, data = cars)
abline(fit1) #puts line on plot
fit2 = lm(speed ~ I(dist^2) + dist, cars) #fits a model with a quadratic term
fit2line = predict(fit2, data.frame(dist = -10:130))
However, after trying this multiple times, I consistently get the same error message:
plot(Ammonia$Overall.Average~Ammonia$Distance)
fit1=lm(Ammonia$Overall.Average~Ammonia$Distance)
abline(fit1)
fit2=lm(Ammonia$Overall.Average ~ I(Ammonia$Distance^2) + Ammonia$Distance)
fit2line = predict(fit2, data.frame(Ammonia$Distance = 9,3))
Error: unexpected '=' in "fit2line = predict(fit2, data.frame(Ammonia$Distance ="
I am also not sure whether "9,3" are the right numbers to put here, as I don't know where the example I followed got its numbers from.
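A minimal sketch of one way around this error, assuming the data sit in a data frame called Ammonia with columns Distance and Overall.Average (the names used in the code above; the values are retyped from the table at the top). The parser rejects `Ammonia$Distance = ...` because that is not a valid argument name inside data.frame(). Fitting with a data = argument lets predict() match a plain Distance column instead. The numbers in the example being followed are presumably just a range of x-values wide enough to cover the data, so the sketch builds an equivalent grid from the observed range; the name grid and the 100-point resolution are arbitrary choices for illustration.

```r
# Data retyped from the question's table
Ammonia <- data.frame(
  Distance = c(-9.867344, -8.769583, -8.667573, -8.373794, -7.443686,
               -3.862468, 1.817748, 2.5333892, 4.190557),
  Overall.Average = c(0.001928728, 0.0011855, 0.001401361, 0.001514139,
                      0.001046918, 0.000790981, 0.000945557, 0.000940648,
                      0.001649773)
)

# Fit with bare column names plus data =, so predict() can use new data
fit2 <- lm(Overall.Average ~ Distance + I(Distance^2), data = Ammonia)

# Grid of x-values spanning the observed range (100 points is arbitrary)
grid <- data.frame(Distance = seq(min(Ammonia$Distance),
                                  max(Ammonia$Distance),
                                  length.out = 100))
fit2line <- predict(fit2, newdata = grid)

plot(Overall.Average ~ Distance, data = Ammonia)
lines(grid$Distance, fit2line)  # quadratic curve over the scatter plot
```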
I need to plot a scatterplot with the confidence interval for a robust linear regression (rlm) model, but all the examples I have found only work with lm.
This is my code:
model1 <- rlm(weightsE$brain ~ weightsE$body)
newx <- seq(min(weightsE$body), max(weightsE$body), length.out=70)
newx<-as.data.frame(newx)
colnames(newx)<-"brain"
conf_interval <- predict(model1, newdata = data.frame(x=newx), interval = 'confidence',
level=0.95)
#create scatterplot of values with regression line
plot(weightsE$body, weightsE$body)
abline(model1)
#add dashed lines (lty=2) for the 95% confidence interval
lines(newx, conf_interval[,2], col="blue", lty=2)
lines(newx, conf_interval[,3], col="blue", lty=2)
but the results of predict don't produce a straight line for the upper and lower levels; they look more like random predictions.
You have a few problems to fix here.
When you generate a model, don't use rlm(weightsE$brain ~ weightsE$body); instead use rlm(brain ~ body, data = weightsE). Otherwise, the model cannot take new data for predictions: any predictions you get will be produced from the original weightsE$body values, not from the new data you pass into predict.
You are trying to create a prediction data frame with a column called "brain", but brain is the value you are trying to predict, so you need a column called "body" instead.
newx is already a data frame, but for some reason you are wrapping it inside another data frame when you do newdata = data.frame(x=newx). Just pass newx.
You are plotting with plot(weightsE$body, weightsE$body), when it should be plot(weightsE$body, weightsE$brain).
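The first point can be checked with a small sketch. It uses lm() and the built-in mtcars data rather than rlm and weightsE (both substitutions are assumptions made to keep the example self-contained; as far as I know, predictions from rlm go through the same lm machinery, so the behaviour should carry over).

```r
# Illustration with lm() and the built-in mtcars data (stand-ins for
# rlm and weightsE, chosen only to keep the sketch self-contained)
bad  <- lm(mtcars$mpg ~ mtcars$wt)    # variables hard-wired with $
good <- lm(mpg ~ wt, data = mtcars)   # variables looked up in data

newx <- data.frame(wt = seq(2, 5, length.out = 5))

# The $-model cannot find "mtcars$wt" in newx, so it falls back to the
# original 32 values; R only emits a warning about the row mismatch
p_bad  <- suppressWarnings(predict(bad, newdata = newx))
p_good <- predict(good, newdata = newx)

length(p_bad)   # one prediction per original row, not per new row
length(p_good)  # one prediction per row of newx
```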
Putting all this together, and using a dummy data set with the same names as your own (see below), we get:
library(MASS)
model1 <- rlm(brain ~ body, data = weightsE)
newx <- data.frame(body = seq(min(weightsE$body),
                              max(weightsE$body), length.out = 70))
conf_interval <- predict(model1, newdata = newx,
                         interval = 'confidence',
                         level = 0.95)
#create scatterplot of values with regression line
plot(weightsE$body, weightsE$brain)
abline(model1)
#add dashed lines (lty=2) for the 95% confidence interval
lines(newx$body, conf_interval[, 2], col = "blue", lty = 2)
lines(newx$body, conf_interval[, 3], col = "blue", lty = 2)
Incidentally, you could do the whole thing in ggplot in much less code:
library(ggplot2)
ggplot(weightsE, aes(body, brain)) +
  geom_point() +
  geom_smooth(method = MASS::rlm)
Reproducible dummy data
data(mtcars)
weightsE <- setNames(mtcars[c(1, 6)], c("brain", "body"))
weightsE$body <- 10 - weightsE$body
For my homework, I am working with a dataset titled Default. I split my data into training and test sets and ran a logistic regression of default1 on the other three predictors: income (continuous), balance (continuous), and student (0/1).
I am supposed to plot the regression model, but it keeps showing a straight horizontal line on the graph and I don't think that's correct.
How can I graph multiple predictors with a single binary outcome using my Default_train_logistic glm?
Also, how can I obtain those coefficients and error rates of the model?
TIA!
set.seed(1234)
Default$subsample <- runif(nrow(Default))
Default$test <- ifelse(Default$subsample < 0.80, "train", "test")
Default_train <- filter(Default, test == "train")
Default_test <- filter(Default, test == "test")
###Q1 Part B: b. Construct a logistic regression to predict if an individual will default based on all of the provided predictors, and visualize your final predicted model.
#Immediately after loading data, I created default1 to use default as a numerical binary variable for logistic regression.
Default_train_logistic <- glm(default1 ~ ., data = Default_train %>% select(-test), family = "binomial")
summary(Default_train_logistic)
plot(Default_train_logistic)
G1 <- ggplot(Default_train_logistic, aes(balance + income + student1, default1)) +
geom_point() +
geom_smooth(method = "glm",
method.args = list(family = "binomial"),
se = FALSE)
print(G1)
I am plotting panel data using ggplot and I want to add the regression line for my fixed effects model "fixed" to the plot. This is the current code:
# Fixed Effects Model in plm
fixed <- plm(progenyMean ~ damMean, data=finalDT, model= "within", index = c("sireID", "cropNum"))
# Plotting Function
plotFunction <- function(Data){
ggplot(Data, aes(x=damMean, y=progenyMean)) +
geom_point() +
geom_smooth(method = "lm", se = T, formula=fixed)
}
However, the plot doesn't recognise the geom_smooth() and there is no regression line on the plot.
Is it possible to plot a regression line for a fixed effects model here?
OP, please include a reproducible example in your next question so that we can help you better. In this case, I'll answer using the same dataset that is used on Princeton's site here, since I'm not too familiar with the data structure needed by the plm() function from the plm package. I do wish the dataset were one that is more dependably going to be available, but hopefully this example remains illustrative even if the dataset is no longer available.
library(foreign)
library(plm)
library(ggplot2)
library(dplyr)
library(tidyr)
Panel <- read.dta("http://dss.princeton.edu/training/Panel101.dta")
fixed <- plm(y ~ x1, data=Panel, index=c("country", "year"), model="within")
my_lm <- lm(y ~ x1, data=Panel) # including for some reference
Example: Plotting a Simple Linear Regression
Note that I've also referenced a standard linear model - this is to show you how you can extract the values and plot a line from that to match geom_smooth(). Here's an example plot of that data plus a line plotted with the lm() function used by geom_smooth().
plot <- Panel %>%
ggplot(aes(x1, y)) + geom_point() + theme_bw() +
geom_smooth(method="lm", alpha=0.1, color='gray', size=4)
plot
If you want to plot a line to match the linear regression from geom_smooth(), you can use geom_abline() and specify slope= and intercept=. You can see that those come directly from our my_lm object:
> my_lm
Call:
lm(formula = y ~ x1, data = Panel)
Coefficients:
(Intercept) x1
1.524e+09 4.950e+08
Extracting those values from my_lm$coefficients gives us our slope and intercept (the named vector has the intercept in the first position and the slope in the second). You'll see that our new blue line runs directly over the top of the geom_smooth() line, which is why I made that one so fat :).
plot + geom_abline(
slope=my_lm$coefficients[2],
intercept = my_lm$coefficients[1], color='blue')
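As a small variation, the coefficients can also be picked out by name rather than by position, which avoids having to remember which comes first. This is only a sketch: it uses the built-in cars data and base abline() instead of the Panel data and geom_abline(), purely so that it runs without extra packages or downloads.

```r
# Sketch on the built-in cars data (not the Panel data used above)
fit <- lm(dist ~ speed, data = cars)

cf <- coef(fit)                      # named vector: "(Intercept)", "speed"
intercept <- cf[["(Intercept)"]]     # select by name, not by position
slope     <- cf[["speed"]]

plot(dist ~ speed, data = cars)
abline(a = intercept, b = slope, col = "blue")
```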
Plotting line from plm()
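The same strategy can be used to plot the line from your plm() model. Here it's simpler, since the within model from plm() reports only a slope and no overall intercept (the intercepts are absorbed into the per-country fixed effects), and geom_abline() uses an intercept of 0 when only slope= is given: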
> fixed
Model Formula: y ~ x1
Coefficients:
x1
2475617827
Well, then it's pretty easy to plot in the same way:
plot + geom_abline(slope=fixed$coefficients, color='red')
In your case, I'd try this:
ggplot(Data, aes(x=damMean, y=progenyMean)) +
geom_point() +
geom_abline(slope=fixed$coefficients)
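If a single line through the origin sits far from the points, one alternative is to draw one parallel line per group. The sketch below shows the idea in base R on simulated data (all names and numbers are made up for illustration). It relies on the fact that a one-way within estimator equals ordinary least squares on group-demeaned data. With a real plm model, I believe fixef(fixed) returns the corresponding per-group intercepts, but that is not shown here.

```r
# Simulated panel (made-up data, for illustration only)
set.seed(1)
g <- rep(1:4, each = 10)                   # hypothetical group id
x <- rnorm(40) + g                         # regressor correlated with group
y <- 2 * x + c(5, -3, 0, 8)[g] + rnorm(40, sd = 0.5)

# Within transformation: subtract each group's mean from x and y
x_w <- x - ave(x, g)
y_w <- y - ave(y, g)
slope <- coef(lm(y_w ~ x_w - 1))[["x_w"]]  # common within slope

# Group-specific intercepts implied by that slope
intercepts <- as.numeric(tapply(y, g, mean) - slope * tapply(x, g, mean))

# One parallel line per group instead of a single line through the origin
plot(x, y, col = g)
for (i in seq_along(intercepts)) abline(a = intercepts[i], b = slope, col = i)
```

The slope obtained this way matches the coefficient on x from lm(y ~ x + factor(g)), which is a quick check that the demeaning was done correctly.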
The following workflow for nonlinear quantile regression seems to work. However, I don't know how to plot the resulting curve.
By the way, I'd prefer to use the function graphics::curve() instead of graphics::lines().
require(quantreg)
# load sample data
dat <- DNase
# introduce variable
x <- DNase$conc
y <- DNase$density
# introduce function
f <- function(a, b, x) {(a*x/(b+x))}
# fit the model
fm0 <- nls(log(y) ~ log(f(a,b,x)), dat, start = c(a = 1, b = 1))
# fit a nonlinear least-square regression
fit <- nls(y ~ f(a,b,x), dat, start = coef(fm0))
# extract coefficients
co <- coef(fit)
a=co[1]
b=co[2]
# plot
plot(y~x)
# add curve
curve((a*x/(b+x)), add=T)
# then fit the median using nlrq
dat.nlrq <- nlrq(y ~ SSlogis(x, Asym, mid, scal), data=dat, tau=0.5)
# add curve
???
EDIT: What I'm looking for is a way to plot various quantile regressions of a formula, like a*x/(b+x).
Inserting the formula leads me to the question of what to put as the 'start' argument:
dat.nlrq.075 <- nlrq(formula=fit, data = dat, start=???, tau = 0.75)
curve uses lines, so there is really no reason to use curve when it's easier to use lines.
First, ensure that the data is sorted so that the plots come out right. Then fit with nls or nlrq and use fitted for the fitted line.
library(quantreg)
dat <- DNase[order(DNase$conc), ]
fit.nlrq <- nlrq(density ~ SSlogis(conc, Asym, mid, scal), data = dat, tau = 0.5)
plot(density ~ conc, dat)
lines(fitted(fit.nlrq) ~ conc, dat)
If you want to plot the fit at a different number of equally spaced points such as 250 then do the same except use predict instead of fitted:
x <- seq(min(dat$conc), max(dat$conc), length.out = 250)
lines(predict(fit.nlrq, list(conc = x)) ~ x, lty = 2, col = "red")
The same style works with nls.
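A minimal sketch of the nls version, assuming the standard example from the nls help page (a single DNase run, with SSlogis applied to log(conc)) rather than the exact model fitted above. The names dat1, fit.nls and pred are introduced here for illustration.

```r
# Sketch using the documented nls example on one DNase run
dat1 <- subset(DNase, Run == 1)
dat1 <- dat1[order(dat1$conc), ]   # sort so lines() draws left to right

fit.nls <- nls(density ~ SSlogis(log(conc), Asym, xmid, scal), data = dat1)

plot(density ~ conc, dat1)
lines(fitted(fit.nls) ~ conc, dat1)

# Smoother curve on 250 equally spaced points via predict()
x <- seq(min(dat1$conc), max(dat1$conc), length.out = 250)
pred <- predict(fit.nls, list(conc = x))
lines(pred ~ x, lty = 2, col = "red")
```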
Note that if you use require, its value should be checked. If you don't want to do that, use library instead.
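The difference can be seen in a short sketch. The package name below is deliberately nonexistent and purely hypothetical; it is there only to trigger the failure case.

```r
# "notARealPackage123" is a deliberately nonexistent, hypothetical name
pkg <- "notARealPackage123"

# require() returns FALSE (with a warning) when the package is missing,
# so the script would carry on unless the value is checked
ok <- suppressWarnings(require(pkg, character.only = TRUE, quietly = TRUE))
if (!ok) message("package '", pkg, "' is not available")

# library() signals an error instead, which stops the script by default
res <- tryCatch(
  { library(pkg, character.only = TRUE); "loaded" },
  error = function(e) "error"
)
res
```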
Still quite new to R (and statistics, to be honest), and I have so far only used it for simple linear regression models. But now one of my data sets clearly shows an inverted U pattern. I think I have to do a quadratic regression analysis on this data, but I'm not sure how. What I have tried so far is:
independentvar2 <- independentvar^2
regression <- lm(dependentvar ~ independentvar + independentvar2)
summary (regression)
plot (independentvar, dependentvar)
abline (regression)
While this would work for a normal linear regression, it doesn't work for non-linear regressions. Can I even use the lm function since I thought that meant linear model?
Thanks
Bert
This example is from this SO post by @Tom Liptrot.
plot(speed ~ dist, data = cars)
fit1 = lm(speed ~ dist, cars) #fits a linear model
plot(speed ~ dist, data = cars)
abline(fit1) #puts line on plot
fit2 = lm(speed ~ I(dist^2) + dist, cars) #fits a model with a quadratic term
fit2line = predict(fit2, data.frame(dist = -10:130))
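The snippet stops after computing fit2line, so the curve still has to be drawn. A minimal sketch, assuming the same cars data: lm() remains appropriate because the model is linear in its coefficients even though it is quadratic in dist, whereas abline() can only draw a straight line, so the predicted values are passed to lines() instead. The name xs is introduced here for illustration.

```r
fit2 <- lm(speed ~ I(dist^2) + dist, data = cars)  # quadratic in dist

xs <- -10:130                                      # x-values to predict at
fit2line <- predict(fit2, data.frame(dist = xs))

plot(speed ~ dist, data = cars)
lines(xs, fit2line, col = "red")                   # curved fit, not abline()
```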