Cannot draw prediction lines - r

So I have prediction data from:
predictedParams <- predict(parameters,
list(independentVar = log(median_income)),
interval='prediction', level=0.95)
predictedParamsDataFrame <- as.data.frame(predictedParams)
abline(independentVar, predictedParamsDataFrame$upr, lwd=4, col = 'red')
but that doesn't work. I've also tried this:
uprParams <- lm(independentVar ~ predictedParamsDataFrame$upr)
abline(uprParams, lwd=4, col = 'red')
But that results in a line that is way too high.
What should I do?

abline draws a single straight line, so it is rarely the right tool for anything other than the fit of a linear model. It also expects an intercept and a slope as arguments, which is not what you have here.
This will get you a matrix with the predicted values and interval:
predictedParams <- predict(parameters,
list(independentVar = log(median_income)),
interval='prediction', level=0.95)
Now plot the upper (or lower) bound of the interval against the same x-values you passed to predict:
lines(log(median_income), predictedParams[,"upr"], lwd=4, col = 'red')
Note that we use the lines function. On the x-axis we use log(median_income) because those are the values given to predict, and on the y-axis we use the upper end of the interval, predictedParams[,"upr"]. The lower bound works the same way with predictedParams[,"lwr"].
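One caveat: lines connects points in the order given, so unsorted x-values produce a zigzag. A minimal sketch of the whole workflow, assuming simulated data (median_income and outcome here are made-up stand-ins, not the asker's data) and predicting on a sorted grid:

```r
# Simulated stand-in data; the variable values are hypothetical
set.seed(1)
median_income <- runif(50, 20000, 120000)
independentVar <- log(median_income)
outcome <- 2 + 0.8 * independentVar + rnorm(50, sd = 0.3)
parameters <- lm(outcome ~ independentVar)

# Predict on a sorted grid so that lines() draws from left to right
grid <- data.frame(
  independentVar = seq(min(independentVar), max(independentVar), length.out = 100)
)
pred <- predict(parameters, newdata = grid, interval = "prediction", level = 0.95)

plot(independentVar, outcome)
abline(parameters)                                   # fitted straight line
lines(grid$independentVar, pred[, "lwr"], col = "red", lwd = 2)
lines(grid$independentVar, pred[, "upr"], col = "red", lwd = 2)
```

With your own data you could instead sort once, e.g. ord <- order(log(median_income)), and index both the x-values and the predicted bounds by ord.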

You need to create a plot first; abline only adds a line to an existing plot.
plot(concentration, signal)
res <- lm(signal ~ concentration)
abline(res)

Related

Fitting a curve in the points

This is my data:
y<-c(1.8, 2, 2.8, 2.9, 2.46, 1.8,0.3,1.1,0.664,0.86,1,1.9)
x<- c(1:12)
data<-as.data.frame(cbind(y,x))
plot(data$y ~ data$x)
I want to fit a curve through these points so that I can generate the intermediate predicted values. I need a curve that goes through the points. I don't care what function it fits.
I consulted this link.
Fitting a curve to specific data
install.packages("rgp")
library(rgp)
result <- symbolicRegression(y ~ x,data=data,functionSet=mathFunctionSet,
stopCondition=makeStepsStopCondition(2000))
# inspect results, they'll be different every time...
(symbreg <- result$population[[which.min(sapply(result$population,
result$fitnessFunction))]])
function (x)
exp(sin(sqrt(x)))
# inspect visual fit
ggplot() + geom_point(data=data, aes(x,y), size = 3) +
geom_line(data=data.frame(symbx=data$x, symby=sapply(data$x, symbreg)),
aes(symbx, symby), colour = "red")
If I repeat this analysis, the function above produces a different curve every time. Does anyone know why this happens, and whether this is the right way to fit a curve to these points? Also, the fitted function does not pass through every point, so I cannot obtain the intermediate points.
A standard approach is to fit a spline, which gives a smooth curve that goes through all the points. See spline. Concretely, you would use a call like:
spline(x = myX, y = myY, xout=whereToInterpolate)
or, computing 100 points for your example:
ss <- spline(x,y, n=100)
plot(x,y)
lines(ss)
Note there is also a smoothing spline (smooth.spline), which may help for noisy data.
If the curve doesn't need to be smooth there is the simpler approx which does linear interpolation.
approx(x = myX, y = myY, xout=whereToInterpolate)
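To make the xout usage concrete, here is a small sketch on the data from the question. The positions in xout are arbitrary choices for illustration:

```r
y <- c(1.8, 2, 2.8, 2.9, 2.46, 1.8, 0.3, 1.1, 0.664, 0.86, 1, 1.9)
x <- 1:12

# Arbitrary intermediate positions at which we want predicted values
xout <- c(1.5, 6.25, 11.5)

sp <- spline(x, y, xout = xout)   # smooth interpolating spline
li <- approx(x, y, xout = xout)   # piecewise-linear interpolation

# The interpolating spline reproduces the original points at the original x
at_knots <- spline(x, y, xout = x)$y

data.frame(x = xout, spline = sp$y, linear = li$y)
```

Both functions return a list with components x and y, so the result can be passed straight to lines or points.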

`abline` does not add line when producing regression diagnostic plots with `par()`

I am using par() function to draw a multi-panel plot and I want to add a line to exactly second plot...
par(mfrow = c(2, 2))
hist(model$residuals) # model is some predefined lm object
plot((model$residuals + model$fitted.values) ~ model$fitted.values)
# Now I want to add a line (or points or curve) to only above plot like
abline(model$coef) # but this doesn't work
qqnorm(model$residuals) # some more plots, doesn't matter which
Any help? I do not intend to use ggplot() and want to keep it simple.
The problem is not with par; it is simply that you are passing inappropriate values to abline. Your question changed several times, which suggests it is unclear which line belongs on which plot. I will clarify this below, assuming mod is your fitted model.
residuals vs. fitted
with(mod, plot(fitted.values, residuals))
abline(h = 0) ## residuals are centred, so we want a horizontal line
fitted vs. response
with(mod, plot(fitted.values + residuals, fitted.values))
abline(0, 1) ## perfect fit has `fitted = response`, so we want line `y = x`
scatter plot with regression line
v <- attr(mod$terms, "term.labels") ## independent variable name
with(mod, plot(model[[v]], fitted.values + residuals)) ## scatter plot
abline(mod$coef) ## or simply `abline(mod)`, to add the regression line
reproducible example
set.seed(0)
xx <- rnorm(100)
yy <- 1.3 * xx - 0.2 + rnorm(100, sd = 0.5)
mod <- lm(yy ~ xx)
rm(xx, yy)
par(mfrow = c(2,2))
with(mod, plot(fitted.values, residuals))
abline(h = 0)
with(mod, plot(fitted.values + residuals, fitted.values))
abline(0, 1)
v <- attr(mod$terms, "term.labels") ## independent variable name
with(mod, plot(model[[v]], fitted.values + residuals)) ## scatter plot
abline(mod$coef) ## or simply `abline(mod)`
As @ZheyuanLi says, it's hard to see exactly what you want. Some of your problems appear to come from adding lines that don't overlap with the existing plot limits.
model <- lm(Illiteracy~Income,data.frame(state.x77))
par(mfrow = c(2, 2))
hist(model$residuals)
plot(model$residuals ~ model$fitted.values)
plot((model$residuals+model$fitted.values) ~ model$fitted.values)
Adding elements immediately after the plot works fine:
abline(a=0,b=1)
What if you want to go back and add elements to a previous frame? That's a bit difficult. Reset plot to row 1, column 2: this does not put us inside the plotting frame of the previous plot, it just gets us ready to plot in this subframe.
par(mfg=c(1,2))
We want to set up the same plot frame again: we'll cheat by plotting the same thing again (ensuring the same axis limits, etc. etc.), but turning off all aspects of the plot (new=FALSE means we don't blank out the previous plot):
plot(model$residuals ~ model$fitted.values,
type="n",new=FALSE,axes=FALSE,ann=FALSE)
abline(h=0,col=2)
Base graphics are really not designed for modifying existing plots; if you want to do much of it, you should look into the grid graphics system (which lattice and ggplot2 graphics are built on).

Understanding the Local Polynomial Regression

Could someone explain to me why I get different lines when I plot? I thought the lines should be the same.
data(aircraft)
help(aircraft)
attach(aircraft)
lgWeight <- log(Weight)
library(KernSmooth)
# a) Fit a nonparametric regression to the data (x_i, y_i) and save the estimated values m̂(x_i).
# Regression of degree 2 polynomial of lgWeight against Yr
op <- par(mfrow=c(2,1))
lpr1 <- locpoly(Yr,lgWeight, bandwidth=7, degree = 2, gridsize = length(Yr))
plot(Yr,lgWeight,col="grey", ylab="Log(Weight)", xlab = "Year")
lines(lpr1,lwd=2, col="blue")
lines(lpr1$y, col="black")
How can I get the values from the model? If I print the model, it gives me the values in $x and $y, but if I plot them, the result is not the same as the blue line. I need the values of the fitted model (blue) for every x. Could someone help me?
The fitted model (blue curve) is stored correctly in lpr1. As you said, the correct y-values are in lpr1$y and the correct x-values are in lpr1$x.
The second line looks straight because you are giving the lines function only one variable, lpr1$y. Since you don't specify the x-coordinates, R plots the values against their index, from 1 to the length of the y variable.
The following are two explicit and equivalent ways to plot the curve and line:
lines(x = lpr1$x, y = lpr1$y,lwd=2, col="blue") # plots curve
lines(x = 1:length(lpr1$y), y = lpr1$y, col="black") # plots the same y-values against their index
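On getting fitted values for every x: locpoly evaluates the fit on an equally spaced grid over the range of x, so lpr1$x does not hold the observed x-values even when gridsize equals their number. One possible workaround, sketched here on simulated stand-ins for Yr and lgWeight (the data and variable names are assumptions), is to interpolate the fitted curve at the observed points with approx:

```r
library(KernSmooth)

# Simulated stand-ins for Yr and lgWeight (hypothetical data)
set.seed(42)
yr <- sort(runif(200, 14, 84))
lg_weight <- 7 + 0.03 * yr + rnorm(200, sd = 0.5)

fit <- locpoly(yr, lg_weight, bandwidth = 7, degree = 2,
               gridsize = length(yr))

# fit$x is an equally spaced grid, not the observed yr values, so
# interpolate the fitted curve at each observation.
# rule = 2 reuses the end values if a point falls just outside the grid.
fitted_at_obs <- approx(fit$x, fit$y, xout = yr, rule = 2)$y

plot(yr, lg_weight, col = "grey")
lines(fit, lwd = 2, col = "blue")                # curve on the grid
points(yr, fitted_at_obs, pch = 20, cex = 0.5)   # fit at the observed x
```

With a fine grid the linear interpolation error is small; increasing gridsize reduces it further.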

log-transformed density function not plotting correctly

I'm trying to log-transform the x axis of a density plot and get unexpected results. The code without the transformation works fine:
library(ggplot2)
data = data.frame(x=c(1,2,10,11,1000))
dens = density(data$x)
densy = sapply(data$x, function(x) { dens$y[findInterval(x, dens$x)] })
ggplot(data, aes(x = x)) +
geom_density() +
geom_point(y = densy)
If I add scale_x_log10(), I get the following result:
Apart from the y values having been rescaled, something seems to have happened to the x values as well -- the peaks of the density function are not quite where the points are.
Am I using the log transformation incorrectly here?
The shape of the density curve changes after the transformation because the distribution of the data has changed and the default bandwidths are different. If you set a bandwidth of 1000 (bw=1000) before the transformation and 10 afterward, you will get two normal-looking densities (with different y-axis values, because the support is much larger in the first case). Here is an example showing how varying the bandwidth changes the shape of the density.
data = data.frame(x=c(1,2,10,11,1000), y=0)
## Examine how changing bandwidth changes the shape of the curve
par(mfrow=c(2,1))
greys <- colorRampPalette(c("black", "red"))(10)
plot(density(data$x), main="No Transform")
points(data, pch=19)
plot(density(log10(data$x)), ylim=c(0,2), main="Log-transform w/ varying bw")
points(log10(data$x), data$y, pch=19)
for (i in 1:10)
points(density(log10(data$x), bw=0.02*i), col=greys[i], type="l")
legend("topright", paste(0.02*1:10), col=greys, lty=1, cex=0.8) # solid lines, matching the curves drawn above
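To see how different the default bandwidths are, you can compute them directly. A short sketch, relying on the fact that density uses bw.nrd0 as its default bandwidth selector:

```r
x <- c(1, 2, 10, 11, 1000)

# Default bandwidth (rule of thumb) on the raw and log10 scales
bw_raw <- bw.nrd0(x)
bw_log <- bw.nrd0(log10(x))
c(raw = bw_raw, log10 = bw_log)

# density() picks the same values when bw is not supplied
d_raw <- density(x)
d_log <- density(log10(x))
c(raw = d_raw$bw, log10 = d_log$bw)
```

The two bandwidths live on different scales, which is why the curve estimated on the log scale is not simply the raw-scale curve redrawn on a log axis.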

Smoothing using kernel and loess in R

I am trying to smooth my data set using a kernel or loess smoothing method, but the results are unclear or not what I want. My questions are as follows.
My x data is "conc" and y data is "depth", which is ex. cm.
1) Kernel smooth
k <- kernel("daniell", 150)
plot(k)
K <- kernapply(conc, k)
plot(conc~depth)
lines(K, col = "red")
Here, my data is smoothed with frequency=150. Does this mean that every data point is averaged with its 150 neighboring data points (to the right and left)? What does "daniell" mean? I could not find an explanation online.
2) Loess smooth
p<-qplot(depth, conc, data=total)
p1 <- p + geom_smooth(method = "loess", size = 1, level=0.95)
Here, what are the defaults of the loess smoothing function? If I want to smooth my data with frequency=150 as in the case above (a moving average over every 150 data points), how can I modify this code?
3) To show the y-axis on a log scale, I used "log10(conc)" instead of "conc", and it worked. But I cannot change the y-axis tick labels. I tried adding "scale_y_log10(limits = c(1,1e3))" to my code to show tick labels like 10^0, 10^1, 10^2..., but it did not work.
Please answer my questions. Thanks a lot for your help.
Sum
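On the first question: kernel("daniell", m) is a plain moving average that gives equal weight 1/(2m + 1) to each point and its m neighbours on either side, and kernapply returns a vector that is 2m values shorter than the input, so it has to be aligned with the matching x-values before plotting. A minimal sketch, using made-up depth and conc vectors as stand-ins and a smaller m for readability:

```r
# Made-up profile standing in for depth and conc (hypothetical data)
set.seed(7)
depth <- seq(0, 100, length.out = 501)
conc <- 50 + 20 * sin(depth / 15) + rnorm(501, sd = 5)

m <- 25
k <- kernel("daniell", m)        # equal weights 1 / (2 * m + 1)
smoothed <- kernapply(conc, k)   # length(conc) - 2 * m values

# Align the smoothed values with the depths they are centred on
keep <- (m + 1):(length(conc) - m)
plot(depth, conc, col = "grey")
lines(depth[keep], smoothed, col = "red", lwd = 2)
```

Calling lines(smoothed) without x-values would plot against the index 1, 2, 3, ... rather than depth, so the curve would not line up with the data unless depth happens to equal the index.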
