Being more of a MATLAB user than an R user, I ran into this bit of code:
z <- knots(y)
k <- ecdf(data)(z)
So knots is an interpolating step function, so presumably the second line of code somehow "applies" this interpolation to the empirical CDF of the data? How exactly do you read this syntax? What does it mean?
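One way to read the syntax, as a minimal sketch: ecdf(data) returns a function, and the trailing (z) calls that function immediately. The toy data vector and the names Fn, k1 and k2 below are illustrative assumptions, and z is taken from the ECDF itself because the original y is not shown.

```r
# Toy data (an assumption; the original `data` is not shown)
data <- c(3, 1, 2, 2, 5)

Fn <- ecdf(data)      # ecdf() returns a step function object
z  <- knots(Fn)       # knots() extracts the jump locations (sorted unique values)

k1 <- Fn(z)           # call the returned function in a separate step
k2 <- ecdf(data)(z)   # same thing in one expression: build, then call

print(z)
print(k1)
identical(k1, k2)
```

In other words, f(a)(b) in R means "call f(a), then call whatever it returned with argument b".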
My question is how to generate a sample in R from a logistic distribution using the inverse CDF method. The logistic density is p(θ) = exp(θ)/(1 + exp(θ))^2.
Here is the algorithm for that method:
1: for t = 1 to T do
2: sample q(t) ∼ Unif(0, 1)
3: θ(t) ← F^−1(q(t))
4: end for
Here is my code, but it just generates a vector containing the same number repeated. The resulting density should be log-concave, but a histogram of this output clearly would not show that, so what is the problem?
First, define T as the number of draws to take from the uniform distribution:
T<-100000
sample_q<-runif(T,0,1)
It seems like plogis will give you the cumulative distribution function, so I suppose I can just take its inverse:
generate_samples_from_logistic_CDF <- function(p) {
for(t in length(T))
cdf<-plogis((1+exp(p)/(exp(p))))
inverse_cdf<-(1/cdf)
return(inverse_cdf)
}
Calling generate_samples_from_logistic_CDF(sample_q) should generate the samples,
but instead it only gives me the same value for everything.
Since the inverse CDF is already coded in R as qlogis(), this should work:
qlogis(runif(100000))
or if you want to do it "by hand" rather than using the built-in qlogis(), you can use R <- runif(100000); log(R/(1-R))
Note that rlogis(100000) should be more efficient.
One of your confusions is that "inverse" in the algorithm description above doesn't mean the multiplicative inverse or reciprocal (i.e. 1/x), but rather the function inverse (which in this case is log(q/(1-q))).
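As a rough check of that point, the sketch below compares the built-in quantile function with the hand-written function inverse. The seed and the variable names (n_draws, theta_builtin, theta_manual) are assumptions made for illustration.

```r
set.seed(42)                      # arbitrary seed, for reproducibility only
n_draws <- 100000
q <- runif(n_draws)               # step 2 of the algorithm: q ~ Unif(0, 1)

theta_builtin <- qlogis(q)        # built-in inverse CDF (quantile function)
theta_manual  <- log(q / (1 - q)) # function inverse of F(theta) = exp(theta) / (1 + exp(theta))

# Both routes should agree up to floating-point error
print(all.equal(theta_builtin, theta_manual))

# Round trip: applying the CDF to the samples should recover q
print(all.equal(plogis(theta_manual), q))

hist(theta_manual, breaks = 100)  # should look like a logistic density
```

Because runif(), log() and qlogis() are all vectorized, no explicit loop over t is needed.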
I have a time series of rainfall values in a csv file. I plotted the histogram of the data, and it is skewed to the left. I wanted to transform the values so that they have a normal distribution, so I used the Yeo-Johnson transform available in R. The transformed values are here.
My question is:
In the above transformation, I used a test value of 0.5 for lambda, which works fine. Is there a way to determine the optimal value of lambda based on the time series? I'd appreciate any suggestions.
So far, here's the code:
library(car)
dat <- scan("Zamboanga.csv")
hist(dat)
trans <- yjPower(dat,0.5,jacobian.adjusted=TRUE)
hist(trans)
Here is the csv file.
First, estimate the optimal λ by maximum likelihood using the boxCox function from the car package.
You can plot it like this:
boxCox(your_model, family="yjPower", plotit = TRUE)
As Ben Bolker said in a comment, the model here could be something like
your_model <- lm(dat~1)
Then use the optimized lambda in your existing code.
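Put together, the steps above might look like the sketch below. It uses simulated gamma data as a stand-in for the rainfall series (an assumption, since the csv is not available here), and it assumes that boxCox() returns the lambda grid and profile log-likelihood as x and y when plotit = FALSE. The name lambda_hat is illustrative.

```r
library(car)

set.seed(1)
# Stand-in for the rainfall values (assumption: the real data come from the csv)
dat <- rgamma(200, shape = 2, rate = 0.1)

# Intercept-only model, as suggested in the comment
your_model <- lm(dat ~ 1)

# Profile log-likelihood over the default lambda grid, without plotting
bc <- boxCox(your_model, family = "yjPower", plotit = FALSE)

# Pick the grid value of lambda with the highest log-likelihood
lambda_hat <- bc$x[which.max(bc$y)]
print(lambda_hat)

# Reuse the estimate in the original transformation
trans <- yjPower(dat, lambda_hat, jacobian.adjusted = TRUE)
hist(trans)
```

The result is only as fine as the lambda grid; passing a denser lambda sequence to boxCox() would refine it.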
I want to plot the fitted values versus the observed ones and add a straight line showing the goodness of fit. However, I do not want to use abline(), because I did not calculate the fitted values with the lm command; I used a model that R does not cover. I calculated the coefficients myself and used them to compute the fitted values. So, what can I do to obtain such a plot in R or in WinBUGS?
Here is what I want
Still no data provided, but maybe this simple example using the curve function will inform the process:
x <- 1:10
y <- 2+ 3*(1:10) + rnorm(10)
plot(1:10, y)
curve( 2+3*x, 0, 10, add=TRUE)
Note to new R users: the expression y_i = 1 - xbeta + delta_i + e_i would fail in R, in part because x and beta are not separated by an operator. But if you understand R's matrix syntax, it can be written very compactly even if "X" is multidimensional. All of this depends on the specifics, which we are so far lacking.
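To illustrate the matrix-syntax remark, here is a minimal sketch that computes fitted values from hand-supplied coefficients and plots observed against fitted with a 1:1 reference line. The coefficient vector beta and the design matrix X are hypothetical stand-ins for whatever the external model produced.

```r
set.seed(1)
x <- 1:10
y <- 2 + 3 * x + rnorm(10)       # observed values (simulated)

beta <- c(2, 3)                  # hypothetical coefficients estimated elsewhere
X <- cbind(1, x)                 # design matrix: intercept column plus x

fitted_vals <- drop(X %*% beta)  # matrix product replaces the invalid "xbeta"

plot(fitted_vals, y, xlab = "Fitted", ylab = "Observed")
abline(a = 0, b = 1)             # 1:1 line; needs no lm object
```

Note that abline(a = 0, b = 1) only draws a line with the given intercept and slope, so it works without any fitted lm model; points close to that line indicate a good fit.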
I'm trying to fit a natural cubic spline to probabilistic data (probabilities that a random variable is smaller than certain values) to obtain a cumulative distribution function, which works well enough using splinefun():
cutoffs <- c(-90,-60,-30,0,30,60,90,120)
probs <- c(0,0,0.05,0.25,0.5,0.75,0.9,1)
CDF.spline <- splinefun(cutoffs,probs, method="natural")
plot(cutoffs,probs)
curve(CDF.spline(x), add=TRUE, col=2, n=1001)
I would then, however, like to use the density function, i.e. the derivative of the spline, to perform various calculations (e.g. to obtain the expected value of the random variable).
Is there any way of obtaining this derivative as a function rather than just evaluated at a discrete number of points via splinefun(x, deriv=1)?
This is pretty close to what I'm looking for, but alas the example doesn't seem to work in R version 2.15.0.
Barring an analytical solution, what's the cleanest numerical way of going about this?
If you change the environment assignment line for g in the code that Berwin Turlach provided on R-help to this:
environment(g) <- environment(f)
... you succeed in R 2.15.1.
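As a simpler numerical alternative, the function returned by splinefun() already accepts a deriv argument, so a thin wrapper gives the derivative as a function. The sketch below reuses the cutoffs and probs from the question; the names pdf.spline, total_mass and expected_value are assumptions for illustration.

```r
cutoffs <- c(-90, -60, -30, 0, 30, 60, 90, 120)
probs   <- c(0, 0, 0.05, 0.25, 0.5, 0.75, 0.9, 1)
CDF.spline <- splinefun(cutoffs, probs, method = "natural")

# The derivative of the spline, wrapped as an ordinary function of x
pdf.spline <- function(x) CDF.spline(x, deriv = 1)

# Sanity check: the density should integrate to CDF(120) - CDF(-90) = 1
total_mass <- integrate(pdf.spline, lower = -90, upper = 120)$value
print(total_mass)

# Expected value of the random variable: integral of x * f(x)
expected_value <- integrate(function(x) x * pdf.spline(x),
                            lower = -90, upper = 120)$value
print(expected_value)
```

A natural spline is not guaranteed to be monotone, so pdf.spline may dip slightly below zero in places; method = "monoH.FC" may be worth trying if that matters.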
note: originally posted on Cross Validated (stats SE) on 07-26-2011, with no correct answers to date.
Background
I have a model, f, where Y=f(X)
X is an n x m matrix of samples from m parameters and Y is the n x 1 vector of model outputs.
f is computationally intensive, so I would like to approximate f using a multivariate cubic spline through (X,Y) points, so that I can evaluate Y at a larger number of points.
Question
Is there an R function that will calculate an arbitrary relationship between X and Y?
Specifically, I am looking for a multivariate version of the splinefun function, which generates a spline function for the univariate case.
e.g. this is how splinefun works for the univariate case
x <- 1:100
y <- runif(100)
foo <- splinefun(x,y, method = "monoH.FC")
foo(x) #returns y, as example
The test that the function interpolates exactly through the points is successful:
all(y == foo(1:100))
## TRUE
What I have tried
I have reviewed the mda package, and it seems that the following should work:
library(mda)
x <- data.frame(a = 1:100, b = 1:100/2, c = 1:100*2)
y <- runif(100)
foo <- mars(x,y)
predict(foo, x) #all the same value
however the function does not interpolate exactly through the design points:
all(y == predict(foo,x))
## FALSE
I also could not find a way to implement a cubic-spline in either the gam, marss, or earth packages.
Actually, several packages can do it. The one I use is the "rms" package, which has rcs, but the survival package also has pspline and the splines package has the ns function. "Natural splines" (constructed with ns) are also cubic splines. You will need to form a multivariate fitting function with the '*' operator in the model formula, creating "crossed" spline terms.
Note that the example you offered was not sufficiently rich.
I guess I am confused that you want exact fits. R is a statistical package. Approximate estimation is the goal. Generally exact fits are more of a problem because they lead to multicollinearity.
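A minimal sketch of such "crossed" spline terms, using ns() from the splines package. The simulated data, the degrees of freedom (df = 4) and the test function are all assumptions chosen for illustration; this gives an approximate fit, not an exact interpolant.

```r
library(splines)

set.seed(1)
n <- 200
d <- data.frame(a = runif(n), b = runif(n))
# Hypothetical model output with a little noise
d$y <- sin(2 * pi * d$a) * d$b + rnorm(n, sd = 0.05)

# '*' expands to main effects plus all products of the two spline bases
fit <- lm(y ~ ns(a, df = 4) * ns(b, df = 4), data = d)

# Evaluate the fitted surface at a new point
pred <- predict(fit, newdata = data.frame(a = 0.5, b = 0.5))
print(pred)
print(length(coef(fit)))   # intercept + 4 + 4 + 16 interaction terms
```

Unlike the example in the question, the predictors here are not perfectly collinear, which is what lets the crossed terms be estimated.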
Have a look at the DiceKriging package which was developed to undertake tasks like this.
http://cran.r-project.org/web/packages/DiceKriging/index.html
I've provided an example application at
https://stats.stackexchange.com/questions/13510/fitting-multivariate-natural-cubic-spline/65012#65012
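For orientation, a minimal sketch of how such an emulator might be set up with DiceKriging. It is not the linked example; the toy function f, the design size and the Matérn 5/2 covariance choice are assumptions. Without a nugget term, the kriging mean should pass through the design points.

```r
library(DiceKriging)

set.seed(1)
# Toy stand-in for the expensive model f (assumption)
f <- function(X) sin(3 * X$a) + X$b^2

X <- data.frame(a = runif(20), b = runif(20))   # n x m design
y <- f(X)                                       # model outputs

# Fit a kriging emulator with a constant trend
emulator <- km(formula = ~1, design = X, response = y,
               covtype = "matern5_2", control = list(trace = FALSE))

# Predictions at the design points should reproduce y (interpolation)
pred_design <- predict(emulator, newdata = X, type = "UK")$mean
print(max(abs(pred_design - y)))

# Cheap evaluation at new points
X_new <- data.frame(a = runif(5), b = runif(5))
pred_new <- predict(emulator, newdata = X_new, type = "UK")$mean
print(pred_new)
```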
I'm not sure if this is precisely what you are looking for, but you could try Tps() in the R package fields. It's meant for thin-plate spline interpolation (the 2D equivalent of cubic splines) of spatial data, but it will take up to four covariates. It expects them to be Euclidean x, y, z plus time, so make sure you select the correct options for your particular case. If you want to interpolate, set the smoothing parameter lambda to zero. You might also try the function polymars() in the R package polspline.