Subtracting a fitted polynomial from a dataset in R

I have two curves: a scatterplot of the data set I am working with (named 'mydata') and the fitted 2nd-degree polynomial curve I obtained from that data set.
The scatterplot was obtained with a simple plot function:
plot(mydata)
The code I used for the fitting is:
fit<-lm(mydata$Volts ~ poly(mydata$Frequency, 2, raw=TRUE),data=mydata)
#summary(fit)
lines(mydata$Frequency, predict(fit))
Now I would like to subtract the fitted polynomial from the data set. This was my approach:
given<-plot(mydata)
fit<-lm(mydata$Volts ~ poly(mydata$Frequency, 2, raw=TRUE),data=mydata)
new<-lines(mydata$Frequency, predict(fit))
corrected<-given-new
plot(corrected)
The error I received was:
Error in plot(corrected) : object 'corrected' not found
How do I correct this?

Looks like you are trying to subtract graphical elements. You should perform any math or operations on your data before trying to plot it. Something like the following may work; however, without sample data this is just an educated guess.
given <- mydata$Volts
fit <- lm(mydata$Volts ~ poly(mydata$Frequency, 2, raw=TRUE),data=mydata)
new <- predict(fit)
corrected <- given-new
plot(mydata$Frequency, corrected)
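As a side note, since fit is an ordinary lm object, the same corrected values are available directly via residuals(), because residuals are defined as observed minus fitted values:
# residuals() gives observed minus fitted, i.e. the corrected series above
corrected <- residuals(fit)
plot(mydata$Frequency, corrected)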

I ran a reprex on nonsense data (technically a true reprex would need a set random seed, but given the actual issue in the code, that doesn't matter here).
volts=rnorm(50,mean=220,sd=5)
frequency=runif(50,min=30,max=90)
mydata=data.frame(Volts=volts,Frequency=frequency)
given<-plot(mydata)
fit<-lm(mydata$Volts ~ poly(mydata$Frequency, 2, raw=TRUE),data=mydata)
new<-lines(mydata$Frequency, predict(fit))
corrected<-given-new
plot(corrected)
The scope of my answer is strictly to explain why the "not found" error showed up; Daniel's code shows you the fix.
I'm not sure why Daniel O's answer was not accepted, because it works. I know it is frustrating when you have clearly defined something and your source code is right in front of you, yet the interpreter says NOT FOUND. The lesson here: when this happens, check for NULL. It's a good habit in R in general.
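To see why, note that plot() and lines() are called for their side effect of drawing; they return NULL (invisibly), so the assignments in the question capture nothing that can be subtracted. A quick check:
given <- plot(mydata)                          # draws the scatterplot, returns NULL invisibly
new <- lines(mydata$Frequency, predict(fit))   # draws the line, also returns NULL
is.null(given)  # TRUE
is.null(new)    # TRUE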

Related

Time Series and MA-model look equal in R

I am using the forecast package in R and created an MA(1) model using the Arima function. I plotted the time series itself (the $x component of ma_model), the fitted model (the $fitted component), and the residuals (the $residuals component). Strangely, the time series looks identical to the fitted model although the residuals are not zero. Here is the code that I used:
library(forecast)
ma_model<-Arima(ts(generationData$Price[1:200]), order=c(0,1,0))
plot(ma_model$fitted, main = "Fitted")
plot(ma_model$x, main = "X")
plot(ma_model$residuals, main = "Residuals")
Here is the result
Basically, the fitted model can't be equal to the real time series, especially when there are residuals. Can anyone explain this to me? I'd appreciate any comment.
Update: I tried order=c(0,0,20), so I have an MA(20) or AR(20) model (I am not sure which parameter stands for MA and which for AR). Now the fitted curve and the original time series look quite similar (but not exactly equal). Is this possible and usual? I'd appreciate any further comments.
Any comments on this issue?
I am not sure about your output, but from the code it seems that you only differenced the series rather than fitting an MA term.
I think it should be order=c(0,0,1) instead of order=c(0,1,0) to build the MA(1) model.
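For reference, in Arima() (as in stats::arima) the order argument is c(p, d, q): p is the AR order, d the degree of differencing, and q the MA order. A minimal sketch with the data from the question:
library(forecast)
# order = c(p, d, q): no AR term, no differencing, one MA term
ma_model <- Arima(ts(generationData$Price[1:200]), order = c(0, 0, 1))  # MA(1)
plot(ma_model$x, main = "Series with MA(1) fit")
lines(fitted(ma_model), col = "red")  # fitted values for comparison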

Can't plot smooth spline in R

I am trying to use a smoothing spline on my dataset with the smooth.spline function, and I want to plot my fit next. However, for some reason it won't plot my model, and it doesn't give any error. I only get a warning after running smooth.spline that 'cross-validation with non-unique 'x' values seems doubtful', but I don't think that should make much of a difference to the practical result.
My code is:
library('splines')
fit_spline <- smooth.spline(data.train$age,data.train$effect,cv = TRUE)
plot(data$effect,data$age,col="grey")
lines(fit_spline,lwd=2,col="purple")
legend("topright",("Smoothing Splines with 5.048163 df selected by CV"),col="purple",lwd=2)
What I get is:
Can someone tell me what I am doing wrong here?
Two issues:
Number 1. If you fit smooth.spline(x, y), plot your data with plot(x, y), not plot(y, x).
Number 2. Don't pass data.train for fitting and then a different dataset, data, for plotting. If you want to see what the spline looks like at new data points, use predict.smooth.spline first. See ?predict.smooth.spline.
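A minimal sketch of the corrected plotting code, assuming you want to plot the training data the spline was fitted on:
# same data and same (x, y) order as the smooth.spline() call
plot(data.train$age, data.train$effect, col = "grey")
lines(fit_spline, lwd = 2, col = "purple")   # a smooth.spline fit has $x and $y, so lines() can draw it
legend("topright", "Smoothing spline, df selected by CV", col = "purple", lwd = 2)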

R random forest using cforest, how to plot tree

I have created a random forest model using cforest:
library("party")
crs$rf <- cforest(as.factor(Censor) ~ .,
data=crs$dataset[crs$sample,c(crs$input, crs$target)],
controls=cforest_unbiased(ntree=500, mtry=4))
cf <- crs$rf
tr <- party:::prettytree(cf@ensemble[[1]], names(cf@data@get("input")))
#tr
plot(new("BinaryTree", tree=tr, data=cf@data, responses=cf@responses))
I get an error when plotting the tree:
Error: no string supplied for 'strwidth/height' unit
Any help on how to overcome this error?
Looking at your code, I assume crs refers to a data frame. The dollar sign may be the problem (specifically crs$rf). If crs is indeed a data frame, then $ tells R to extract an item from inside the data frame using the usual list indexing, which may conflict with the call that generates the random forest and cause the error. You could fix this by starting with:
crs_rf <- cforest(as.factor(Censor) ~ ., .....
Which would create the random forest object. This would replace:
crs$rf <- cforest(as.factor(Censor) ~ ., ......
As a reference, in case this doesn't fix it, I want to point you to a great guide from Stanford that covers random forests, with examples showing how the party package is used. To make troubleshooting easier, I would recommend pulling the call apart as they do in the guide. For example (taking from the guide provided), first set the controls:
data.controls <- cforest_unbiased(ntree=1000, mtry=3)
Then make the call:
data.cforest <- cforest(Resp ~ x + y + z…, data = mydata, controls=data.controls)
Then generate the plot once the call works. If you need help plotting trees from cforest(), you might look at this excellent resource, as I believe the root of the problem is in your plotting call: http://www.r-bloggers.com/a-brief-tour-of-the-trees-and-forests/
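Putting the renaming and the pulled-apart call together with the names from the question (whether this alone resolves the strwidth error is an assumption to verify):
data.controls <- cforest_unbiased(ntree = 500, mtry = 4)
crs_rf <- cforest(as.factor(Censor) ~ .,
                  data = crs$dataset[crs$sample, c(crs$input, crs$target)],
                  controls = data.controls)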

R: Use VAR model to predict response to change in values of certain variables

I've fitted a VECM model in R and converted it to a VAR representation. I would like to use this model to predict the future value of a response variable under different scenarios for the explanatory variables.
Here is the code for the model:
library(urca)
library(vars)
input <-read.csv("data.csv")
ts <- ts(input[16:52,],c(2000,1),frequency=4)
dat1 <- cbind(ts[,"dx"], ts[,"u"], ts[,"cci"],ts[,"bci"],ts[,"cpi"],ts[,"gdp"])
args('ca.jo')
vecm <- ca.jo(dat1, type = 'trace', K = 2, season = NULL,spec="longrun",dumvar=NULL)
vecm.var <- vec2var(vecm,r=2)
Now what I would like to do is predict "dx" into the future by varying the others. I am not sure whether something like "predict dx if u=30, cpi=15, bci=50, gdp=..." in the next period would work. What I have in mind is something along the lines of: increase "u" by 15% in the next period (which would obviously also affect all the other variables, including "dx") and predict the impact into the future.
Also, I am not sure if the "vec2var" step is necessary, so please ignore it if you think it is redundant.
Thanks
Karl
This subject is covered very nicely in Chapters 4 and 8 of Bernhard Pfaff's book, "Analysis of Integrated and Cointegrated Time Series with R", for which the vars and urca packages were written.
The vec2var step is necessary if you want to use the predict functionality that's available.
A more complete answer was provided on the R-Sig-Finance list. See also this related thread.
Here you go: ??forecast points to vars::predict, the predict method for objects of class varest and vec2var, which looks like precisely what you want. Increasing u sounds like impulse response analysis, so look that up!
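A rough sketch of both steps, using the vecm.var object from the question (the column names "u" and "dx" are assumptions; they depend on how dat1 was named when it was built):
fc <- predict(vecm.var, n.ahead = 8)   # forecasts for all variables, 8 quarters ahead
plot(fc)
# impulse response of dx to a shock in u
ir <- irf(vecm.var, impulse = "u", response = "dx", n.ahead = 8)
plot(ir)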
