Plotting smooth.spline in R [duplicate]

I am trying to use a smoothing spline on my dataset. I use the smooth.spline function and then want to plot my fit. However, for some reason it won't plot my model, and it doesn't give any error either. The only message I get is a warning after running smooth.spline: 'cross-validation with non-unique 'x' values seems doubtful'. But I don't think that should make much difference to the practical result.
My code is:
library('splines')
fit_spline <- smooth.spline(data.train$age,data.train$effect,cv = TRUE)
plot(data$effect,data$age,col="grey")
lines(fit_spline,lwd=2,col="purple")
legend("topright",("Smoothing Splines with 5.048163 df selected by CV"),col="purple",lwd=2)
What I get is the scatterplot without any spline curve on it (plot image not included).
Can someone tell me what I am doing wrong here?

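Two issues:
Number 1. If you fit with smooth.spline(x, y), plot your data with plot(x, y), not plot(y, x). With the axes swapped, the fitted curve (age on the x axis, effect on the y axis) most likely falls outside the plot's axis ranges, so lines() draws nothing visible and raises no error.
Number 2. Don't fit on data.train and then plot a different dataset, data. If you want to see what the spline looks like at new data points, use predict.smooth.spline first. See ?predict.smooth.spline.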
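A minimal sketch of both fixes, assuming simulated data: the data.train frame below is made up (only the column names age and effect come from the question). Note that smooth.spline() is part of base R's stats package, so library('splines') is not needed for it. The simulated ages are continuous, so cv = TRUE runs here without the warning; with tied x values, the default cv = FALSE (generalized cross-validation) avoids it.

```r
# Hypothetical stand-in for the question's data.train
set.seed(1)
data.train <- data.frame(age = runif(200, min = 20, max = 70))
data.train$effect <- sin(data.train$age / 10) + rnorm(200, sd = 0.3)

# Fit with x = age, y = effect
fit_spline <- smooth.spline(data.train$age, data.train$effect, cv = TRUE)

# Plot x against y in the same order used for the fit, using the same data
plot(data.train$age, data.train$effect, col = "grey")
lines(fit_spline, lwd = 2, col = "purple")
legend("topright",
       legend = sprintf("Smoothing spline with %.2f df selected by CV", fit_spline$df),
       col = "purple", lwd = 2)

# To evaluate the spline at new x values, use predict()
# (dispatches to predict.smooth.spline and returns a list with x and y)
age_grid <- seq(min(data.train$age), max(data.train$age), length.out = 100)
pred <- predict(fit_spline, x = age_grid)
lines(pred$x, pred$y, lty = 2)
```

Reading the degrees of freedom from fit_spline$df keeps the legend in sync with the fit instead of hard-coding the number.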

Related

Subtracting a fitted polynomial from a dataset in R

I have two curves: a scatterplot of the data set I am working with (named 'mydata'), and a fitted second-degree polynomial curve that I obtained from that data set.
The scatterplot was obtained with a simple plot function:
plot(mydata)
The code I used for the fitting is:
fit<-lm(mydata$Volts ~ poly(mydata$Frequency, 2, raw=TRUE),data=mydata)
#summary(fit)
lines(mydata$Frequency, predict(fit))
Now I would like to subtract the fitted polynomial from the data set. My approach was:
given<-plot(mydata)
fit<-lm(mydata$Volts ~ poly(mydata$Frequency, 2, raw=TRUE),data=mydata)
new<-lines(mydata$Frequency, predict(fit))
corrected<-given-new
plot(corrected)
The error I received was:
Error in plot(corrected) : object 'corrected' not found
How do I correct this?
Looks like you are trying to subtract graphical elements. You should perform any math or operations on your data before plotting it. Something like the following may work; however, without sample data this is just an educated guess.
given <- mydata$Volts
fit <- lm(Volts ~ poly(Frequency, 2, raw = TRUE), data = mydata)
new <- predict(fit)
corrected <- given - new
plot(mydata$Frequency, corrected)
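The corrected values are simply the model residuals, so residuals(fit) gives the same numbers directly. A minimal sketch, assuming simulated data built the same way as the nonsense-data reprex below (the values are made up). It also sorts by Frequency before calling lines(), since lines() joins points in the order given and unsorted x values produce a zigzag instead of a curve.

```r
# Hypothetical data, same construction as the reprex below
set.seed(1)
mydata <- data.frame(Volts = rnorm(50, mean = 220, sd = 5),
                     Frequency = runif(50, min = 30, max = 90))

fit <- lm(Volts ~ poly(Frequency, 2, raw = TRUE), data = mydata)

# Subtract the fitted polynomial from the data
corrected <- mydata$Volts - predict(fit)

# The same values, taken straight from the model
res <- residuals(fit)

# Draw the fitted curve with x sorted so lines() traces a smooth curve
ord <- order(mydata$Frequency)
plot(mydata$Frequency, mydata$Volts)
lines(mydata$Frequency[ord], fitted(fit)[ord])

# Plot the detrended data
plot(mydata$Frequency, corrected, ylab = "Volts minus fitted polynomial")
abline(h = 0, lty = 2)
```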
I ran a reprex on nonsense data (technically a true reprex needs a random seed, but because of the actual issue with the code, that doesn't matter here):
volts=rnorm(50,mean=220,sd=5)
frequency=runif(50,min=30,max=90)
mydata=data.frame(Volts=volts,Frequency=frequency)
given<-plot(mydata)
fit<-lm(mydata$Volts ~ poly(mydata$Frequency, 2, raw=TRUE),data=mydata)
new<-lines(mydata$Frequency, predict(fit))
corrected<-given-new
plot(corrected)
The scope of my answer is strictly to explain why the not found error showed up: plot() and lines() draw as a side effect and return NULL (invisibly), so given and new hold NULL rather than any data. Daniel's code shows you the fix.
I'm not sure why Daniel O's answer was not accepted, because it works. I know it is frustrating when you have clearly defined something and your source code is right in front of you, yet the interpreter says NOT FOUND. The lesson here: when you run into this situation, check for NULL. It's a good habit in general in R.
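A quick way to apply that check, sketched on the same kind of made-up data as the reprex (with a seed added): inspect what the plotting calls actually returned before doing arithmetic with them.

```r
# Hypothetical data, as in the reprex above
set.seed(42)
mydata <- data.frame(Volts = rnorm(50, mean = 220, sd = 5),
                     Frequency = runif(50, min = 30, max = 90))

given <- plot(mydata)
fit <- lm(Volts ~ poly(Frequency, 2, raw = TRUE), data = mydata)
new <- lines(mydata$Frequency, predict(fit))

# Both objects exist, but neither holds any data
print(is.null(given))  # TRUE
print(is.null(new))    # TRUE
```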

fitting an inverse gaussian distribution to data in R

I'm trying to use the fitdist function in R to fit data to three different distributions by maximum likelihood so that I can compare them. Lognormal and Weibull work fine, but I am struggling with the inverse Gaussian.
I need to specify starting values, however when I do I get an error message.
fw <- fitdist(claims, "weibull")    # works
fln <- fitdist(claims, "lnorm")     # works
fig <- fitdist(claims, "invgauss", start = list(mu = 0, lambda = 1))    # does not work
Error: 'The pinvgauss function should return a zero-length vector when input has length zero and not raise an error'
What is wrong with my code?
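I was working on a similar issue and found that the problem was how I had labeled my start values. The actuar package I was using, which supplies dinvgauss and pinvgauss, names the parameters "mean" and "shape", so the start list has to use those names; mu and lambda are not recognized. Note also that the mean must be positive, so a starting value of 0 would be invalid anyway. The following code worked for me: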
library(actuar)
library(fitdistrplus)
fig <- fitdist(claims, "invgauss", start = list(mean = 5, shape = 1))
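Instead of guessing start values, you can compute them: for the inverse Gaussian the maximum-likelihood estimates have a closed form, mean = sample mean and 1/shape = average of (1/x - 1/mean). A sketch under stated assumptions: claims is simulated here (lognormal draws, purely hypothetical) because the original data is not shown, and the fitdist call only runs if actuar and fitdistrplus are installed.

```r
# Hypothetical positive claim sizes standing in for the real data
set.seed(123)
claims <- rlnorm(500, meanlog = 1, sdlog = 0.5)

# Closed-form inverse Gaussian MLEs, usable as start values
mu_hat     <- mean(claims)
lambda_hat <- length(claims) / sum(1 / claims - 1 / mu_hat)

# Names must match actuar's dinvgauss/pinvgauss arguments
start_vals <- list(mean = mu_hat, shape = lambda_hat)

if (requireNamespace("actuar", quietly = TRUE) &&
    requireNamespace("fitdistrplus", quietly = TRUE)) {
  library(actuar)        # provides dinvgauss/pinvgauss with 'mean' and 'shape'
  library(fitdistrplus)
  fig <- fitdist(claims, "invgauss", start = start_vals)
  print(fig$estimate)
}
```

Since these start values are already the MLEs, the optimizer in fitdist should converge at or very near them.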

Plotting a difference between two ecdf()

I have two sets of 100,000 observations that come from a simulation.
Since one of the two cases is a 'baseline' case and the other is a 'treatment' case, I want to create a plot that highlights the difference in distribution between the two simulations.
I started with an ecdf() of the two populations. The result is in the picture (not included).
What I would like to do is to have a plot of the difference between the two ecdf curves.
A simple ecdf(baseline) - ecdf(treatment) does not work, since ecdf returns a function; even Ecdf from the Hmisc package does not work, since Ecdf returns a list, and again the difference operator '-' is ill-defined in such a case.
Running this code reproduces the scenario shown in the picture:
a <- runif(10000)
b <- rnorm(10000,0.5,0.5)
plot(ecdf(a))
lines(ecdf(b), col='red')
Any hints would be more than welcome.
Evaluate the functions: ecdf() returns a function, so call each ECDF at the same x values and subtract the results:
decdf <- function(x, baseline, treatment) ecdf(baseline)(x) - ecdf(treatment)(x)
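A minimal sketch of plotting that difference, reusing the question's a and b (with a seed added). Evaluating at the pooled, sorted sample points captures every jump of both step functions, and type = "s" draws the result as a step plot, which matches the shape of an ECDF.

```r
set.seed(1)
a <- runif(10000)
b <- rnorm(10000, 0.5, 0.5)

decdf <- function(x, baseline, treatment) ecdf(baseline)(x) - ecdf(treatment)(x)

# Evaluate at every observed value from either sample
xs <- sort(c(a, b))
d <- decdf(xs, a, b)

plot(xs, d, type = "s", xlab = "x", ylab = "ECDF(a) - ECDF(b)")
abline(h = 0, lty = 2)
```

For repeated calls on large samples, computing Fa <- ecdf(a) and Fb <- ecdf(b) once and then using Fa(xs) - Fb(xs) avoids rebuilding the step functions on every call.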

analytical derivative of splinefun()

I'm trying to fit a natural cubic spline to probabilistic data (probabilities that a random variable is smaller than certain values) to obtain a cumulative distribution function, which works well enough using splinefun():
cutoffs <- c(-90,-60,-30,0,30,60,90,120)
probs <- c(0,0,0.05,0.25,0.5,0.75,0.9,1)
CDF.spline <- splinefun(cutoffs,probs, method="natural")
plot(cutoffs,probs)
curve(CDF.spline(x), add=TRUE, col=2, n=1001)
I would then, however, like to use the density function, i.e. the derivative of the spline, to perform various calculations (e.g. to obtain the expected value of the random variable).
Is there any way of obtaining this derivative as a function, rather than just evaluating it at a discrete set of points via the deriv argument, e.g. CDF.spline(x, deriv = 1)?
This R-help post is pretty close to what I'm looking for, but alas, its example doesn't seem to work in R version 2.15.0.
Barring an analytical solution, what's the cleanest numerical way of going about this?
If you change the environment assignment line for g in the code that Berwin Turlach provided on R-help to this:
environment(g) <- environment(f)
... it works in R 2.15.1.
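In current R, a simpler route is to wrap the deriv argument in an ordinary function: the wrapper evaluates the spline's exact piecewise-polynomial derivative at any x, and it can be passed to integrate(). A sketch using the question's cutoffs and probs; PDF.spline is a hypothetical name. One caveat: a natural spline is not guaranteed to be monotone, so this "density" can dip below zero (likely between -90 and -60 here); splinefun's method = "hyman" or "monoH.FC" produces a monotone interpolant if that matters.

```r
cutoffs <- c(-90, -60, -30, 0, 30, 60, 90, 120)
probs   <- c(0, 0, 0.05, 0.25, 0.5, 0.75, 0.9, 1)
CDF.spline <- splinefun(cutoffs, probs, method = "natural")

# Density as a regular R function of x (first derivative of the CDF spline)
PDF.spline <- function(x) CDF.spline(x, deriv = 1)

curve(PDF.spline(x), from = -90, to = 120, n = 1001)

# Expected value: integral of x * f(x) over the support
EX <- integrate(function(x) x * PDF.spline(x), lower = -90, upper = 120)$value

# Sanity check: the density integrates to CDF(120) - CDF(-90) = 1
total <- integrate(PDF.spline, lower = -90, upper = 120)$value

# Cross-check via integration by parts: E[X] = 120 * F(120) - (-90) * F(-90) - integral of F
EX_parts <- 120 * CDF.spline(120) + 90 * CDF.spline(-90) -
  integrate(CDF.spline, lower = -90, upper = 120)$value
```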
