Knn Regression in R - r

I am investigating KNN regression methods and, later, kernel smoothing.
I wish to demonstrate these methods using plots in R. I have generated a data set using the following code:
x = runif(100,0,pi)
e = rnorm(100,0,0.1)
y = sin(x)+e
I have been trying to follow the description of how to use "knn.reg" in section 9.2 here:
https://daviddalpiaz.github.io/r4sl/k-nearest-neighbors.html#regression
grid2=data.frame(x)
knn10 = FNN::knn.reg(train = x, test = grid2, y = y, k = 10)
My predicted values seem reasonable to me but when I try to plot a line with them on top of my x~y plot I don't get what I'm hoping for.
plot(x,y)
lines(grid2$x,knn10$pred)
I feel like I'm missing something obvious and would really appreciate any help or advice you can offer, thank you for your time.

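You just need to sort the x values before plotting the lines. lines() joins the points in the order they are supplied, so unsorted x values make the line jump back and forth across the plot.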
plot(x,y)
ORD = order(grid2$x)
lines(grid2$x[ORD],knn10$pred[ORD])
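An alternative to sorting afterwards is to predict on a grid that is already in ascending order. The following is only a sketch under stated assumptions: the seed, the grid size of 200 and the names `grid` and `knn_grid` are illustrative choices, not part of the original question.

```r
library(FNN)

set.seed(1)                      # assumption: fixed seed for reproducibility
x <- runif(100, 0, pi)
y <- sin(x) + rnorm(100, 0, 0.1)

# Hypothetical evenly spaced, already sorted grid of test points
grid <- data.frame(x = seq(0, pi, length.out = 200))

# Fit on the training points and predict at every grid point
knn_grid <- knn.reg(train = data.frame(x = x), test = grid, y = y, k = 10)

plot(x, y)
lines(grid$x, knn_grid$pred, col = "red")   # no reordering needed
```

Because the grid is sorted, the predictions can be passed to lines() directly.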

Related

Lag 0 is not plotted in ggCcf

With the following code I plotted the cross-correlation of my data. Everything works well, but the plot does not show lag 0, which is very important for my study.
p= ggCcf(
df_ccf$Asia_Co,
df_ccf$EU_USA,
lag.max = 10,
type = c("correlation", "covariance"),
plot = TRUE,
na.action = na.contiguous)
plot(p)
[Images of the resulting plot and of the head of the data are not shown.]
I encountered the same issue; it might be a bug in 'ggCcf' from the forecast package. I couldn't get ggCcf to work, no matter what I tried. To reproduce this behaviour, try:
ggCcf(c(1,2,3,4),c(2,3,4,6))
The workaround is using regular/base R ccf:
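max_lag = 10
# plot = FALSE returns the values without drawing the base R plot
result = ccf(series1, series2, lag.max = max_lag, plot = FALSE)
y = as.vector(result$acf)   # cross-correlation at each lag
x = c(-max_lag:max_lag)     # lags, including lag 0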
You can use these two series to plot the ccf using ggplot2 and choosing an appropriate ylim.
The downside is less convenience, but the upside is that you can add some flair/styling to your plot now that you are doing everything yourself anyway ;).
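A minimal sketch of that workaround might look as follows. The two series are simulated stand-ins, and the names `ccf_df` and `ci` are illustrative. The dashed bounds use the common approximation of qnorm(0.975)/sqrt(n) for white noise, which is an assumption rather than something taken from the question.

```r
library(ggplot2)

set.seed(42)
# Hypothetical stand-ins for the two series being compared
series1 <- rnorm(100)
series2 <- 0.5 * series1 + rnorm(100)

max_lag <- 10
result <- ccf(series1, series2, lag.max = max_lag, plot = FALSE)

# Collect lags and correlations in a data frame; lag 0 is included
ccf_df <- data.frame(lag = as.vector(result$lag),
                     acf = as.vector(result$acf))

# Approximate 95% bounds under the white-noise assumption
ci <- qnorm(0.975) / sqrt(result$n.used)

p <- ggplot(ccf_df, aes(x = lag, y = acf)) +
  geom_hline(yintercept = 0) +
  geom_hline(yintercept = c(-ci, ci), linetype = "dashed", colour = "blue") +
  geom_segment(aes(xend = lag, yend = 0)) +   # one vertical bar per lag
  ylim(-1, 1)
print(p)
```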

R: Problems while plotting sampled values from a curve

I am trying to simulate a signal in order to apply some methods of non-linear fittings, but I have some problems when plotting it.
x<-sample(seq(0,1,length.out = 1000),200)
y<-2*sin(4*pi*x)-6*abs(x-0.4)^(0.3)+2*exp(-30*(4*x-2)^2)+8*x+rnorm(200,0,0.5)
s<-2*sin(4*pi*x)-6*abs(x-0.4)^(0.3)+2*exp(-30*(4*x-2)^2)+8*x
plot(x,y)
lines(x,s,col="red")
The idea is to have 200 uniformly sampled observations with an additive white noise term, and then to plot this "perturbed" signal together with the original signal (y and s, respectively).
When I run this code, though, the plot does not come out as expected.
This is probably something simple, but I'm stuck.
Any hint or suggestion will be greatly appreciated.
Lines are plotted sequentially, and you decided to randomly draw your X values, so x values sitting next to each other in x are not next to each other on the axis - hence the mess. Just sort it:
x<-sort(sample(seq(0,1,length.out = 1000),200))
y<-2*sin(4*pi*x)-6*abs(x-0.4)^(0.3)+2*exp(-30*(4*x-2)^2)+8*x+rnorm(200,0,0.5)
s<-2*sin(4*pi*x)-6*abs(x-0.4)^(0.3)+2*exp(-30*(4*x-2)^2)+8*x
plot(x,y)
lines(x,s,col="red")
Another way to do this on the fly mentioned by mickey is:
ord = order(x)
lines(x[ord], s[ord], col = 'red')
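To see what order() does here, a tiny example with made-up values may help. The vectors below are purely illustrative.

```r
x <- c(0.7, 0.1, 0.4)   # unsorted x values
s <- x^2                # some signal evaluated at x

ord <- order(x)         # positions of x from smallest to largest
ord                     # 2 3 1
x[ord]                  # 0.1 0.4 0.7
s[ord]                  # s rearranged the same way, so pairs stay matched
```

Indexing both vectors with the same `ord` keeps each (x, s) pair together while putting the pairs in ascending x order, which is what lines() needs.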
You need to put the observations in ascending order of x. You can do that by storing everything in a data frame and then ordering it:
x<-sample(seq(0,1,length.out = 1000),200)
df_p= data.frame(x)
df_p$y<-2*sin(4*pi*df_p$x)-6*abs(df_p$x-0.4)^(0.3)+2*exp(-30*(4*df_p$x-2)^2)+8*df_p$x+rnorm(200,0,0.5)
df_p$s<-2*sin(4*pi*df_p$x)-6*abs(df_p$x-0.4)^(0.3)+2*exp(-30*(4*df_p$x-2)^2)+8*df_p$x
df_p = df_p[order(df_p$x),]
plot(df_p$x,df_p$y)
lines(df_p$x, df_p$s,col="red")
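Alternatively, you can skip the sorting step by using the ggplot2 library, since geom_line() connects observations in order of x:
library(ggplot2)
# color is set outside aes() so the line is drawn in red rather than mapped to a legend entry
p <- ggplot(df_p) + geom_point(aes(x = x, y = y)) + geom_line(aes(x = x, y = s), color = 'red')
plot(p)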

Add second order fit to scatter plot in R

I'm trying to add a second-order curve to a scatterplot.
I've read the answers to previous similar questions and here's what I came up with:
x<-log2(c(100,500,1000,2000,4000))
y<-c(3.6,1.308,1.065,.960,.908)
plot(x,y,pch=1)
mod_<-lm(y~poly(x,2,raw=TRUE))
lines(x,predict(mod_),col='red',lty=2)
Still, I get linear segments instead of a smooth curve.
What mistake am I not seeing here? Thanks!
You are calling predict by passing the model only. This only results in the model being evaluated at the values you specified in your lm call (that is x).
You need to supply a new set of values at which the model will be evaluated.
For instance, this gives you a nice smooth line:
x<-log2(c(100,500,1000,2000,4000))
y<-c(3.6,1.308,1.065,.960,.908)
plot(x,y,pch=1)
mod_<-lm(y~poly(x,2,raw=TRUE))
# Define the new points at which you want to evaluate your model
new.x <- seq(6, 12, 0.1)
lines(new.x, predict(mod_, newdata = list(x=new.x)),col='red',lty=2)
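The difference between the two calls to predict() can be checked directly. This is a small sketch: the grid of 100 points spanning the observed range and the name `fit` are illustrative choices.

```r
x <- log2(c(100, 500, 1000, 2000, 4000))
y <- c(3.6, 1.308, 1.065, .960, .908)
mod_ <- lm(y ~ poly(x, 2, raw = TRUE))

# Without newdata, predict() returns one fitted value per original x
length(predict(mod_))        # 5

# With newdata, the model is evaluated on a fine grid instead
new.x <- seq(min(x), max(x), length.out = 100)
fit <- predict(mod_, newdata = data.frame(x = new.x))
length(fit)                  # 100

plot(x, y, pch = 1)
lines(new.x, fit, col = "red", lty = 2)
```

With only five points, lines() can draw just four straight segments. With 100 points the segments are short enough to look like a smooth curve.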
You can also use ggplot2 like this
library(ggplot2)
df <- data.frame(x, y)
ggplot(data=df, aes(x, y))+geom_point()+stat_smooth(method="lm", formula = y ~ poly(x, 2, raw =TRUE))

Graphing a polynomial output of poly.calc

I apologize first for bringing what I imagine to be a ridiculously simple problem here, but I have been unable to glean from the help file for package 'polynom' how to solve this problem. For one out of several years, I have two vectors of x (d for day of year) and y (e for an index of egg production) data:
d=c(169,176,183,190,197,204,211,218,225,232,239,246)
e=c(0,0,0.006839425,0.027323127,0.024666883,0.005603878,0.016599262,0.002810977,0.005603878,0,0.002810977,0.002810977)
I want to, for each year, use the poly.calc function to create a polynomial function that I can use to interpolate the timing of maximum egg production. I want then to superimpose the function on a plot of the data. To begin, I have no problem with the poly.calc function:
egg1996<-poly.calc(d,e)
egg1996
3216904000 - 173356400*x + 4239900*x^2 - 62124.17*x^3 + 605.9178*x^4 - 4.13053*x^5 +
0.02008226*x^6 - 6.963636e-05*x^7 + 1.687736e-07*x^8
I can then simply
plot(d,e)
But when I try to use the lines function to superimpose the function on the plot, I get confused. The help file states that the output of poly.calc is an object of class polynomial, and so I assume that "egg1996" will be the "x" in:
lines(x, len = 100, xlim = NULL, ylim = NULL, ...)
But I cannot seem to, based on the example listed:
lines (poly.calc( 2:4), lty = 2)
Or based on the arguments:
x           an object of class "polynomial".
len         size of vector at which evaluations are to be made.
xlim, ylim  the range of x and y values with sensible defaults
Come up with a command that successfully graphs the polynomial "egg1996" onto the raw data.
I understand that this question is beneath you folks, but I would be very grateful for a little help. Many thanks.
I don't work with the polynom package, but the resultant data set is on a completely different scale (both X & Y axes) than the first plot() call. If you don't mind having it in two separate panels, this provides both plots for comparison:
library(polynom)
d <- c(169,176,183,190,197,204,211,218,225,232,239,246)
e <- c(0,0,0.006839425,0.027323127,0.024666883,0.005603878,
0.016599262,0.002810977,0.005603878,0,0.002810977,0.002810977)
egg1996 <- poly.calc(d,e)
par(mfrow=c(1,2))
plot(d, e)
plot(egg1996)
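If a single panel is preferred, one possible approach is to evaluate the polynomial on a grid with predict(), which polynom provides for polynomial objects, and draw the result with lines(). The sketch below rests on an assumption that is not in the original: the days are rescaled to weeks since the first sample (the name `w` is illustrative), because a high-degree polynomial in raw day-of-year values can be numerically delicate. Note also that an interpolating polynomial of this degree may swing widely between data points, so this is not a claim about the quality of the fit.

```r
library(polynom)

d <- c(169,176,183,190,197,204,211,218,225,232,239,246)
e <- c(0,0,0.006839425,0.027323127,0.024666883,0.005603878,
       0.016599262,0.002810977,0.005603878,0,0.002810977,0.002810977)

# Assumption: rescale to weeks since the first sample (0, 1, ..., 11)
w <- (d - min(d)) / 7
egg1996 <- poly.calc(w, e)

# Evaluate the polynomial on a fine grid across the observed range
w_grid <- seq(min(w), max(w), length.out = 200)
e_grid <- predict(egg1996, w_grid)

# Plot the data, widening ylim so the whole curve is visible,
# then map the grid back to day of year for the overlay
plot(d, e, ylim = range(e, e_grid))
lines(min(d) + 7 * w_grid, e_grid, col = "red")
```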

Best way to plot interaction effects from a linear model

In an effort to help populate the R tag here, I am posting a few questions I have often received from students. I have developed my own answers to these over the years, but perhaps there are better ways floating around that I don't know about.
The question: I just ran a regression with continuous y and x but factor f (where levels(f) produces c("level1","level2"))
thelm <- lm(y~x*f,data=thedata)
Now I would like to plot the predicted values of y by x broken down by groups defined by f. All of the plots I get are ugly and show too many lines.
My answer: Try the predict() function.
##predict over the observed range of x for each level of f,
##taken from thelm$model rather than thedata
newdata <- expand.grid(x=range(thelm$model$x),
                       f=levels(thelm$model$f))
newdata$yhat <- predict(thelm, newdata=newdata)
plot(yhat~x,data=newdata,subset=f=="level1",type="l",ylim=range(newdata$yhat))
lines(yhat~x,data=newdata,subset=f=="level2",lty=2)
Are there other ideas out there that are (1) easier to understand for a newcomer and/or (2) better from some other perspective?
The effects package has good plotting methods for visualizing the predicted values of regressions.
# f must be a factor so that as.numeric() returns the level codes 1 and 2
thedata<-data.frame(x=rnorm(20),f=factor(rep(c("level1","level2"),10)))
thedata$y<-rnorm(20,,3)+thedata$x*(as.numeric(thedata$f)-1)
library(effects)
model.lm <- lm(formula=y ~ x*f,data=thedata)
plot(effect(term="x:f",mod=model.lm,default.levels=20),multiline=TRUE)
Huh - still trying to wrap my brain around expand.grid(). Just for comparison's sake, this is how I'd do it (using ggplot2):
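library(ggplot2)
# name the columns explicitly so they match the aesthetics used below
thedata <- data.frame(yhat = predict(thelm), x = thelm$model$x, f = thelm$model$f)
ggplot(thedata, aes(x = x, y = yhat, group = f, color = f)) + geom_line()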
The ggplot() logic is pretty intuitive, I think - group and color the lines by f. With increasing numbers of groups, not having to specify a layer for each is increasingly helpful.
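For a self-contained illustration of the predict-then-plot idea, here is a sketch on simulated data. The data-generating step, the seed and the name `pred` are assumptions made for the example. Only lm(), expand.grid(), predict() and the ggplot2 calls carry the technique.

```r
library(ggplot2)

set.seed(1)
# Hypothetical data: continuous x, two-level factor f, slope differs by group
thedata <- data.frame(x = rnorm(40),
                      f = factor(rep(c("level1", "level2"), 20)))
thedata$y <- 1 + thedata$x * ifelse(thedata$f == "level2", 2, 0.5) +
  rnorm(40, sd = 0.3)

thelm <- lm(y ~ x * f, data = thedata)

# Two x values per level are enough to draw each straight fitted line
pred <- expand.grid(x = range(thedata$x), f = levels(thedata$f))
pred$yhat <- predict(thelm, newdata = pred)

p <- ggplot(pred, aes(x = x, y = yhat, group = f, color = f)) +
  geom_line() +
  geom_point(data = thedata, aes(y = y))   # raw data for reference
print(p)
```

Adding more levels to f needs no extra plotting code, since the grouping aesthetic handles each level.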
I am no expert in R, but I use xyplot() from the lattice package:
library(lattice)
xyplot(y ~ x, groups = f, data = Dat, type = c('p','r'),
       grid = TRUE, lwd = 3, auto.key = TRUE)
This is also an option:
interaction.plot(f,x,y, type="b", col=c(1:3),
leg.bty="0", leg.bg="beige", lwd=1, pch=c(18,24),
xlab="",
ylab="",
trace.label="",
main="Interaction Plot")
Here is a small change to the excellent suggestion by Matt and a solution similar to Helgi's, but with ggplot. The only difference from the above is that I have used geom_smooth(method='lm'), which plots the regression lines directly.
set.seed(1)
y = runif(100,1,10)
x = runif(100,1,10)
f = rep(c('level 1','level 2'),50)
thedata = data.frame(x,y,f)
library(ggplot2)
ggplot(thedata,aes(x=x,y=y,color=f))+geom_smooth(method='lm',se=F)
