I'm trying to fit a natural cubic spline to probabilistic data (probabilities that a random variable is smaller than certain values) to obtain a cumulative distribution function, which works well enough using splinefun():
cutoffs <- c(-90,-60,-30,0,30,60,90,120)
probs <- c(0,0,0.05,0.25,0.5,0.75,0.9,1)
CDF.spline <- splinefun(cutoffs,probs, method="natural")
plot(cutoffs,probs)
curve(CDF.spline(x), add=TRUE, col=2, n=1001)
I would then, however, like to use the density function, i.e. the derivative of the spline, to perform various calculations (e.g. to obtain the expected value of the random variable).
Is there any way to obtain this derivative as a function, rather than just evaluated at a discrete set of points via CDF.spline(x, deriv=1)?
This is pretty close to what I'm looking for, but alas the example doesn't seem to work in R version 2.15.0.
Barring an analytical solution, what's the cleanest numerical way of going about this?
If you change the environment assignment line for g in the code that Berwin Turlach provided on R-help to this:
environment(g) <- environment(f)
... you succeed in R 2.15.1.
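The function returned by splinefun() accepts a deriv argument, so a one-line wrapper turns it into an ordinary density function that can be passed to integrate(). A minimal sketch, assuming the support is the range of the cutoffs (-90 to 120); the name PDF.spline is hypothetical. As a cross-check it also uses integration by parts, E[X] = hi - (integral of F over [lo, hi]), which holds here because the interpolated CDF equals 0 at -90 and 1 at 120:

```r
cutoffs <- c(-90, -60, -30, 0, 30, 60, 90, 120)
probs <- c(0, 0, 0.05, 0.25, 0.5, 0.75, 0.9, 1)
CDF.spline <- splinefun(cutoffs, probs, method = "natural")

# Density = first derivative of the fitted CDF, as an ordinary R function
PDF.spline <- function(x) CDF.spline(x, deriv = 1)

# Assumption: the support is [min(cutoffs), max(cutoffs)] = [-90, 120]
lo <- min(cutoffs)
hi <- max(cutoffs)

# E[X] = integral of x * f(x) over the support
EX <- integrate(function(x) x * PDF.spline(x), lower = lo, upper = hi)$value

# Cross-check via integration by parts: E[X] = hi - integral of F(x),
# valid because F(lo) = 0 and F(hi) = 1 at the interpolated endpoints
EX.check <- hi - integrate(CDF.spline, lower = lo, upper = hi)$value

# A natural spline need not be monotone, so this "density" can dip below zero
min(PDF.spline(seq(lo, hi, length.out = 1001)))
```

If a negative dip matters, splinefun() also offers method = "monoH.FC" and method = "hyman", which preserve monotonicity of the interpolated CDF (they are then no longer natural splines).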
I am writing code to estimate the parameters of a generalized Pareto distribution (GPD) using the weighted nonlinear least squares (WNLS) method.
The WNLS method consists of two steps.
Step 1: $(\hat{\xi}_1, \hat{b}_1) = \arg\min_{(\xi,b)} \sum_{i=1}^{n} \left[\log(1-F_n(x_i)) - \log(1-G_{\xi,b}(x_i))\right]^2$,
where $F_n$ is the ECDF and $G_{\xi,b}$ is the CDF of the generalized Pareto distribution, so $1-G_{\xi,b}$ is its survival function.
How can I calculate the EDF $F_n$ for a data vector X in R?
Will ecdf(X)(X) calculate the ECDF? If so, what is ecdf(X) needed for other than plotting? Also, it would be really helpful if someone could share some example code that calculates the ECDF for data.
The ecdf call creates a function, so you can apply ecdf(X) to data, as your ecdf(X)(X) call does. However, you might want to apply ecdf(X) to something other than X itself. If you want to know the empirical cumulative probabilities to which three numbers a, b, and c_ correspond, an easy way to do that is to call ecdf(X)(c(a, b, c_)).
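A short sketch of both points: ecdf(X) returns a step function, so evaluating it at the sample gives the values $F_n(x_i)$ needed in step 1, and it can be evaluated at any other point too. The step-1 objective is then an ordinary function to minimize with optim(). Everything specific here is an assumption of the sketch: the data are simulated from a GPD with $\xi = 0.2$ and $b = 2$ via the inverse CDF; $G_{\xi,b}(x) = 1 - (1 + \xi x / b)^{-1/\xi}$ is the usual parameterization; the helper names are hypothetical; and because $1 - F_n$ is 0 at the largest observation (so its log is -Inf), $F_n$ is rescaled by n/(n + 1), which is a choice made here rather than part of the method as described.

```r
set.seed(1)
n <- 500
# Hypothetical data: GPD(xi = 0.2, b = 2) draws via the inverse CDF
X <- (2 / 0.2) * ((1 - runif(n))^(-0.2) - 1)

Fn <- ecdf(X)          # a step function, not a vector
head(Fn(X))            # F_n(x_i) at the observations
Fn(c(1, 5, 10))        # ...or at any other points

# log(1 - G(x)) for the GPD; the xi -> 0 limit is the exponential, -x/b
gpd_log_surv <- function(x, xi, b) {
  if (abs(xi) < 1e-8) return(-x / b)
  -log1p(xi * x / b) / xi
}

# Step-1 objective: sum of squared differences of the log survival functions
step1_obj <- function(par, x) {
  xi <- par[1]
  b <- par[2]
  if (b <= 0 || any(1 + xi * x / b <= 0)) return(Inf)   # outside the GPD support
  n <- length(x)
  # Assumption: rescale F_n by n/(n + 1) so log(1 - F_n) stays finite at max(x)
  emp <- log(1 - ecdf(x)(x) * n / (n + 1))
  sum((emp - gpd_log_surv(x, xi, b))^2)
}

fit <- optim(c(0.1, 1), step1_obj, x = X)   # Nelder-Mead tolerates Inf values
fit$par                                     # (xi_hat_1, b_hat_1)
```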
I run:
library(Hmisc)
df <- as.matrix(replicate(20, rnorm(20)))
cor.df <- rcorr(df)
plot(cor.df$r,cor.df$P)
abline(h=0.05)
and I would like to know if R can compute the point where the horizontal line meets the bell curve. Since I have a scatterplot, do I need to model the x,y curve first and then equate the two functions? Or can R do that graphically?
What I actually want to know is the threshold for (uncorrected) p-values indicating a significant test statistic for a given dataset. I am not a trained statistician, so excuse me if that is a basic question.
Thank you very much!
There is no function to graphically calculate an intersection. There are functions like uniroot that you can use in R to find intersections, but you need to have proper functions and have a good idea of the interval where the intersection occurs.
It would be best to properly model the curve in question, but a simple way to approximate a function when you have a bunch of points on the curve is to use linear interpolation between the observed points. You can create a function for your points with approxfun:
f1 <- approxfun(cor.df$r,cor.df$P, rule=2)
(Again, a proper model would be better, but for the sake of example, I'll continue with this function.)
Now we can find the places where this curve crosses 0.05 with
uniroot(function(x) f1(x)-.05, c(-1,-.001))$root
# [1] -0.4437796
uniroot(function(x) f1(x)-.05, c(.001, 1))$root
# [1] 0.4440005
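For a Pearson correlation the curve does not actually need to be modelled: with n pairs, the statistic t = r * sqrt(n - 2) / sqrt(1 - r^2) follows a t distribution with n - 2 degrees of freedom, which, as far as I know, is also how Hmisc::rcorr computes its P matrix. Solving t = t_crit for r gives r = t_crit / sqrt(n - 2 + t_crit^2). A sketch, assuming a two-sided test and that every pair has all n = 20 observations (no missing values); the helper names are hypothetical:

```r
# Critical |r| for a two-sided Pearson test with n pairs at level alpha.
# From t = r * sqrt(n - 2) / sqrt(1 - r^2):  r = t / sqrt(n - 2 + t^2)
r_crit <- function(n, alpha = 0.05) {
  t_crit <- qt(1 - alpha / 2, df = n - 2)
  t_crit / sqrt(n - 2 + t_crit^2)
}

# Exact two-sided p-value as a function of r, for a cross-check with uniroot()
p_of_r <- function(r, n) 2 * pt(-abs(r) * sqrt(n - 2) / sqrt(1 - r^2), df = n - 2)

r_crit(20)
uniroot(function(r) p_of_r(r, 20) - 0.05, c(0.001, 0.999), tol = 1e-10)$root
```

For n = 20 this gives roughly 0.444, in line with the interpolated roots above; any |r| beyond it has an uncorrected p-value below 0.05.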
I have two sets of 100,000 observations that come from a simulation.
Since one of the two cases is a 'baseline' case and the other is a 'treatment' case, I want to create a plot that highlights the difference in distribution between the two simulations.
I started with an ecdf() of the two populations. The result is in the picture.
What I would like to do is to have a plot of the difference between the two ecdf curves.
A simple ecdf(baseline) - ecdf(treatment) does not work, since ecdf returns a function; even using Ecdf from the Hmisc package does not work, since Ecdf returns a list, and again the difference operator '-' is ill-defined in that case.
Running this code reproduces the scenario shown in the picture above:
a <- runif(10000)
b <- rnorm(10000,0.5,0.5)
plot(ecdf(a))
lines(ecdf(b), col='red')
Any hints would be more than welcome.
So evaluate the functions: each ecdf() result is a function, and their values at the same x can be subtracted.
decdf <- function(x, baseline, treatment) ecdf(baseline)(x) - ecdf(treatment)(x)
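Expanding on that: build each ECDF once (decdf above rebuilds both on every call), evaluate them on a common grid, and plot the difference as a step function. The pooled sample values are a natural grid because every jump of either ECDF occurs there, and the largest absolute difference on that grid is the two-sample Kolmogorov-Smirnov statistic, which gives a handy sanity check against ks.test(). The seed is arbitrary; the samples mirror the example in the question.

```r
set.seed(42)
a <- runif(10000)
b <- rnorm(10000, 0.5, 0.5)

Fa <- ecdf(a)   # build each step function once
Fb <- ecdf(b)

# Every jump of either ECDF happens at a pooled sample value
x <- sort(unique(c(a, b)))
d <- Fa(x) - Fb(x)

plot(x, d, type = "s", xlab = "x", ylab = "ECDF(a) - ECDF(b)")
abline(h = 0, lty = 2)

# Largest absolute gap = two-sample Kolmogorov-Smirnov statistic D
D <- max(abs(d))
```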
I want to generate samples from a scaled inverse chi-squared distribution in R. I know geoR has an R function for generating these, but I want to use the gamma distribution instead.
I think these two are equivalent:
X ~ rinvchisq(100, df=d, scale=s)
1/X ~ rgamma(100, shape=d/2, scale=2/(d*s))
Is that right? Could there be numerical problems due to extreme values?
More specifically you would need X <- rinvchisq(...) and X <- 1/rgamma(...) (the ~ notation works this way in programs such as WinBUGS, and in statistics notation, but not in R). If you look at the code of geoR::rinvchisq, the relevant part is just
return((df * scale)/rchisq(n, df = df))
so if you have problems taking the reciprocal of very large or small chi-squared deviates you'll be in trouble anyway (although rchisq is internally using .External(C_rchisq, n, df), which falls through to C code, presumably for efficiency in this special case, rather than calling rgamma). If I were you I would go ahead and superimpose densities of some test samples just to make sure I hadn't screwed up the arithmetic or parameterization somewhere ...
For what it's worth there are also rinvgamma() functions in a variety of packages (library(sos); findFn("rinvgamma"))
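Following the advice to check the parameterization, here is a sketch that draws from both routes and compares them with the exact quantiles of the scaled inverse chi-squared, using the same (df * scale) / chi-squared form as geoR::rinvchisq. The values d = 5 and s = 2 are arbitrary. On the numerical question, ?rgamma notes that for small shape values much of the mass lies so close to zero that draws may be represented as 0, whose reciprocal is Inf; a chi-squared draw with small df is a gamma draw with small shape, so both routes share that risk.

```r
set.seed(123)
n <- 1e5
d <- 5   # degrees of freedom (arbitrary)
s <- 2   # scale, as in geoR::rinvchisq (arbitrary)

x_chisq <- (d * s) / rchisq(n, df = d)                         # geoR's formula
x_gamma <- 1 / rgamma(n, shape = d / 2, scale = 2 / (d * s))   # gamma route

# Exact quantiles: P(X <= q) = P(chi2_d >= d*s/q), so q_p = d*s / qchisq(1 - p, d)
p <- c(0.1, 0.25, 0.5, 0.75, 0.9)
exact <- (d * s) / qchisq(1 - p, df = d)
rbind(exact = exact, chisq = quantile(x_chisq, p), gamma = quantile(x_gamma, p))

# Superimpose densities on the log scale, where the heavy right tail is easier to see
plot(density(log(x_chisq)), main = "log X: rchisq route (black) vs rgamma route (red)")
lines(density(log(x_gamma)), col = "red", lty = 2)
```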
note: originally posted on Cross Validated (stats SE) on 07-26-2011, with no correct answers to date.
Background
I have a model, f, where Y=f(X)
X is an n x m matrix of samples from m parameters and Y is the n x 1 vector of model outputs.
f is computationally intensive, so I would like to approximate f using a multivariate cubic spline through (X,Y) points, so that I can evaluate Y at a larger number of points.
Question
Is there an R function that will calculate an arbitrary relationship between X and Y?
Specifically, I am looking for a multivariate version of the splinefun function, which generates a spline function for the univariate case.
For example, this is how splinefun works for the univariate case:
x <- 1:100
y <- runif(100)
foo <- splinefun(x,y, method = "monoH.FC")
foo(x) #returns y, as example
The test that the function interpolates exactly through the points is successful:
all(y == foo(1:100))
## TRUE
What I have tried
I have reviewed the mda package, and it seems that the following should work:
library(mda)
x <- data.frame(a = 1:100, b = 1:100/2, c = 1:100*2)
y <- runif(100)
foo <- mars(x,y)
predict(foo, x) #all the same value
However, the function does not interpolate exactly through the design points:
all(y == predict(foo,x))
## FALSE
I also could not find a way to implement a cubic spline in any of the gam, marss, or earth packages.
Actually, several packages can do it. The one I use is the rms package, which has rcs(); the survival package also has pspline(), and the splines package has ns(). "Natural splines" (constructed with ns) are also cubic splines. You will need to form a multivariate fitting function with the '*' operator in the model formula, creating "crossed" spline terms.
that the example you offered was not sufficiently rich.
I guess I am confused that you want exact fits. R is a statistical package. Approximate estimation is the goal. Generally exact fits are more of a problem because they lead to multicollinearity.
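A minimal sketch of the crossed-spline idea with splines::ns() in an ordinary lm() fit. The test function, sample size, and df = 4 per variable are hypothetical choices. The predictors need to vary independently: in the question's example b and c are exact multiples of a, so any function of (a, b, c) on those design points is just a function of a. As noted above, this smooths rather than interpolates.

```r
library(splines)

set.seed(1)
n <- 200
dat <- data.frame(a = runif(n), b = runif(n))    # independent predictors
dat$y <- sin(2 * pi * dat$a) * cos(2 * pi * dat$b) + rnorm(n, sd = 0.05)

# '*' crosses the two natural-spline bases: main effects plus all products
fit <- lm(y ~ ns(a, df = 4) * ns(b, df = 4), data = dat)

# ns() stores its knots, so predict() works on new points
newdat <- data.frame(a = c(0.25, 0.5, 0.75), b = c(0.1, 0.5, 0.9))
predict(fit, newdata = newdat)

# A smoother, not an interpolant: residuals at the design points are not zero
summary(fit)$r.squared
```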
Have a look at the DiceKriging package which was developed to undertake tasks like this.
http://cran.r-project.org/web/packages/DiceKriging/index.html
I've provided an example application at
https://stats.stackexchange.com/questions/13510/fitting-multivariate-natural-cubic-spline/65012#65012
I'm not sure if this is precisely what you are looking for, but you could try Tps() in the R package fields. It's meant for thin-plate spline interpolation (the 2D equivalent of cubic splines) of spatial data, but it will take up to four covariates, although it will expect them to be Euclidean x, y, z + time, so you need to be clear that you are selecting the correct options for your particular case. If you want to interpolate, set the smoothing parameter lambda to zero. You might also try the function polymars() in the R package polspline.
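If an exact interpolant is really what's wanted and an extra package is not an option, a polyharmonic spline is short enough to write in base R. This is a hand-rolled sketch, not what Tps() or DiceKriging do internally: it uses the cubic kernel phi(r) = r^3 plus a linear polynomial, which works in any number of dimensions and, in one dimension, reproduces the natural cubic spline from splinefun(method = "natural"). The name rbf_interp is hypothetical. The design points must not all lie on one hyperplane (the question's mda example, where b and c are multiples of a, would make the linear system singular), and solving a dense (n + m + 1)-square system limits it to a few thousand points.

```r
# Interpolating polyharmonic spline: s(x) = sum_i w_i |x - x_i|^3 + beta0 + x' beta
rbf_interp <- function(X, y) {
  X <- as.matrix(X)
  n <- nrow(X)
  m <- ncol(X)
  K <- as.matrix(dist(X))^3                 # kernel matrix phi(|x_i - x_j|)
  P <- cbind(1, X)                          # linear polynomial part
  A <- rbind(cbind(K, P),
             cbind(t(P), matrix(0, m + 1, m + 1)))
  # Side conditions t(P) %*% w = 0 make the system uniquely solvable
  sol <- solve(A, c(y, rep(0, m + 1)))
  w <- sol[seq_len(n)]
  beta <- sol[n + seq_len(m + 1)]
  function(Xnew) {
    Xnew <- as.matrix(Xnew)
    # Euclidean distances from each new point to each design point
    D2 <- outer(rowSums(Xnew^2), rowSums(X^2), "+") - 2 * Xnew %*% t(X)
    D <- sqrt(pmax(D2, 0))
    drop(D^3 %*% w + cbind(1, Xnew) %*% beta)
  }
}

set.seed(1)
X <- matrix(runif(300), ncol = 3)   # 100 design points in 3 dimensions
y <- runif(100)
f <- rbf_interp(X, y)
max(abs(f(X) - y))                  # interpolation error at the design points
f(matrix(runif(30), ncol = 3))      # cheap evaluation at 10 new points
```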