Can somebody explain why library(car) finds influential observations here?
library(car)
x = seq(1, 5, len = 100)
set.seed(99)
y = 2*x + 1 + rnorm(length(x), 0, 0.00005)
plot(x,y) # no influential observations!!
infl = influencePlot(lm(y ~ x))
infl # 4 influential observations??
If you read the help page for the function:
The default ‘method="noteworthy"’ is used only in this function and
indicates setting labels for points with large Studentized residuals,
hat-values or Cook's distances.
And the default settings:
‘id=TRUE’ is equivalent to ‘id=list(method="noteworthy", n=2, cex=1,
col=carPalette()[1], location="lr")’
Using your example, it labels the 2 most extreme values for Studentized residuals (y-axis) and the 2 most extreme values for hat-values (x-axis).
If you want the 3 most extreme, you can do:
influencePlot(lm(y ~ x), id = list(method = "noteworthy", n = 3))
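Note that influencePlot() also returns the flagged points as a data frame, so you can inspect the statistics behind the labels yourself. A quick sketch (the StudRes/Hat/CookD column names assume a recent version of car):
infl <- influencePlot(lm(y ~ x))
infl                                 # StudRes, Hat and CookD for each flagged point
# "noteworthy" is relative, not a formal test; apply an absolute rule of
# thumb such as Cook's distance > 4/n if you want a cutoff
subset(infl, CookD > 4 / length(x))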
I have three variables x, y, z, each containing an equal amount of data, say 40 numbers. Using linear regression,
x <- 1:40
y <- 1:40/2
z <- 41:80
model <- lm(x~y)
I can associate the x and y values and create a model to predict x values:
a <- data.frame(y = 52)  # newdata must name the predictor, y
res <- predict(model, a)
This predicts an x value based on the association. Now I can plot the prediction line using the following code:
plot(x, y)
plotdata <- cbind(x, predict(model))
lines(plotdata[order(x),], col = "red")
So my question is: if I have three variables x, y, z, how do I associate them and predict?
lm(x~y~z)
is not working. Plotting can be done by using:
library(rgl)
plot3d(x,y,z)
model1 <- lm(x ~ y + z)  # lm() was missing around the formula
plotdata <- cbind(y, z, predict(model1))
lines3d(plotdata[order(y,z),], col = "red")
Thanks in advance.
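For what it's worth, a minimal sketch of the multiple-regression pattern in R (illustrative data rather than the question's, since the y and z above are exactly collinear, which would give lm() an NA coefficient for z):
set.seed(1)
y <- runif(40, 0, 20)
z <- runif(40, 40, 80)
x <- 2 * y - 0.5 * z + rnorm(40)   # x depends on both y and z
model1 <- lm(x ~ y + z)            # "+" adds predictors; "~" appears only once
new <- data.frame(y = 10, z = 60)  # newdata must name every predictor
predict(model1, new)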
I was trying to predict the future values of a sample using polynomial regression in R. The y values within the sample form a wave pattern.
For example:
x = 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16
y= 1,2,3,4,5,4,3,2,1,0,1,2,3,4,5,4
But when the graph was plotted for future values, the resulting y values were completely different from what was expected. Instead of a wave pattern, I was getting a graph where the y values keep increasing.
futurY = 17,18,19,20,21,22
I tried different degrees of polynomial regression, but the predicted results for futurY were drastically different from what was expected.
Following is the sample R code that was used to get the results:
dfram <- data.frame('x'=c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16))
dfram$y <- c(1,2,3,4,5,4,3,2,1,0,1,2,3,4,5,4)
plot(dfram$x, dfram$y, type="l", lwd=3)
pred <- data.frame('x'=c(17,18,19,20,21,22))
myFit <- lm(y ~ poly(x,5), data=dfram)
newdata <- predict(myFit, pred)
print(newdata)
plot(pred[,1],data.frame(newdata)[,1],type="l",col="red", lwd=3)
Is this the correct technique for predicting the unknown future y values, or should I be using other techniques such as forecasting?
# Reproducing your data frame
dfram <- data.frame("x" = c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16),
"y" = c(1,2,3,4,5,4,3,2,1,0,1,2,3,4,5,4))
From your graph I've got the phase and period of the signal. There are better ways of calculating that automatically.
# Phase and period
fase = 1
per = 10
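As noted, these can also be estimated automatically; one rough sketch uses base R's periodogram (the estimate is coarse with only 16 samples):
sp <- spec.pgram(dfram$y, plot = FALSE)
1 / sp$freq[which.max(sp$spec)]  # period of the dominant frequency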
In the linear model formula I've put the triangular-signal equations.
# Two sawtooth basis terms (a rising ramp and a falling ramp over each
# half-period); the ((trunc(...) %% 2)*2 - 1) factor flips their sign
# every half-period, producing a triangular wave
fit <- lm(y ~ I((((trunc((x-fase)/(per/2))%%2)*2)-1) * ((x-fase)%%(per/2)))
            + I((((trunc((x-fase)/(per/2))%%2)*2)-1) * ((per/2)-((x-fase)%%(per/2)))),
          data=dfram)
# Predict the old data
p_olddata <- predict(fit,type="response")
# Predict the new data
newdata <- data.frame('x'=c(17,18,19,20,21,22))
p_newdata <- predict(fit,newdata,type="response")
# Plotting old and new data
plot(x=c(dfram$x,newdata$x),
y=c(p_olddata,p_newdata),
col=c(rep("blue",length(p_olddata)),rep("green",length(p_newdata))),
xlab="x",
ylab="y")
lines(dfram)
Here the black line is the original signal, the blue circles are the predictions for the original points, and the green circles are the predictions for the new data.
The graph shows a perfect fit because there is no noise in the data. A real dataset will contain noise, so the fit will not look as clean as this.
I have run a series of multiple linear regression models and am producing diagnostic plots using the method and code found at this link (http://www.r-bloggers.com/checking-glm-model-assumptions-in-r/).
I have no more than 53 data points in any model; however, some of the outliers in the regression plots are labeled with numbers above 53, ranging from 58 to 107. Do the labels of outliers or influential points in the regression plots not correspond to the individual data points? If not, what do the labels mean, and how do I know which of my data points are the outliers? I have counted the data points in my plots and none have more than 53.
I have attached a screenshot of my regression plot output. There are 53 points in this plot, yet two of the labeled points read 90 and 106.
plot.lm labels the points with the corresponding row names. Row names survive subsetting and NA removal, so if rows were dropped before or during fitting, the remaining rows keep their original numbers and labels can exceed the number of plotted points:
set.seed(42)
DF <- data.frame(x = 1:5, y = 2 + 3 * 1:5 + rnorm(5))
rownames(DF) <- letters[1:5]  # custom row names
DF$y[3] <- 1e3                # make row "c" an extreme outlier
mod <- lm(y ~ x, data = DF)
par(mfrow = c(2,2))
plot(mod, 1:4)                # extreme points are labeled "c" etc., not 3
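A quick sketch of the likely cause in your case (hypothetical data): rows dropped because of NAs keep their original row numbers, so labels can be larger than the count of plotted points.
set.seed(1)
DF2 <- data.frame(x = 1:60, y = 1:60 + rnorm(60))
DF2$y[c(5, 20)] <- NA          # lm() silently drops these two rows
mod2 <- lm(y ~ x, data = DF2)  # only 58 points are fitted
par(mfrow = c(2, 2))
plot(mod2, 1:4)                # labels use the original row numbers, up to 60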
I am trying to forecast future revenue using a sigmoid growth model in R. The model is like this:
Y = a / (1 + c * e^(-b*X)) + noise
My code:
n <- 100  # n was undefined in the original code
x <- seq(-5, 5, length = n)
y <- 1/(1 + exp(-x))
plot(y~x, type='l', lwd=3)
title(main='Sigmoid Growth')
I could draw the plot, but I don't know how to get the future values. Suppose I want to predict the revenue values for the next 6 years.
Make y a function, and plot that (plot has special support for functions):
y <- function(x) 1/(1+exp(-x))
plot(y,-5,11,type="l",lwd=3)
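Plotting the curve doesn't estimate a, b and c from your data, though. A rough sketch of the usual approach with nls() (simulated numbers standing in for your revenue series, since it isn't shown):
set.seed(1)
year <- 1:20
rev <- 100 / (1 + 5 * exp(-0.4 * year)) + rnorm(20, 0, 2)  # fake revenue
fit <- nls(rev ~ a / (1 + c * exp(-b * year)),
           start = list(a = max(rev), b = 0.5, c = 5))     # rough starting values
predict(fit, newdata = data.frame(year = 21:26))           # next 6 years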