I am trying to perform 2D interpolation in GEKKO and have the x, y, and z data. However, when I pass them to the bspline function, I get the error "x_data and y_data must be strictly increasing". How do I calculate the knots and coefficients that define the surface?
I tried using scipy's interpolate function to generate z data on a flattened meshgrid of the x and y data, sorted so that x increases while y cycles through increasing values, but this still results in decreasing y values.
After looking at http://apmonitor.com/wiki/index.php/Main/ObjectBspline, I realized that the xdata and ydata are the knots of the bspline and the z values are the coefficients. These values can therefore be obtained from scipy.interpolate.bisplrep(xdata, ydata, zdata), which returns a list containing the knots and coefficients that define the surface. They can then be passed to the bspline function as m.bspline(xtest, ytest, ztest, tck[0], tck[1], tck[2], data=False).
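A minimal sketch of that workflow (the gridded data and the test point here are made up for illustration; both bisplrep and GEKKO default to cubic splines):

from gekko import GEKKO
import numpy as np
from scipy import interpolate

# hypothetical gridded data for illustration
x = np.linspace(-1, 1, 10)
y = np.linspace(-1, 1, 10)
X, Y = np.meshgrid(x, y)
Z = X**2 + Y**2

# bisplrep returns tck = [knots_x, knots_y, coefficients, kx, ky]
tck = interpolate.bisplrep(X.flatten(), Y.flatten(), Z.flatten())

m = GEKKO(remote=False)
xtest, ytest, ztest = m.Var(0.5), m.Var(0.5), m.Var()
# data=False tells bspline that the inputs are knots/coefficients, not raw data
m.bspline(xtest, ytest, ztest, tck[0], tck[1], tck[2], data=False)
m.Obj(ztest)  # e.g. minimize the interpolated surface
m.solve(disp=False)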
The goal is to compute the density function of a sum of n IID random variables from the density function of one of these random variables by:
1) transforming the density function into the characteristic function via fft,
2) raising the characteristic function to the n-th power, and
3) transforming the resulting characteristic function back into the density function of interest via fft(inverse=TRUE).
Below is my naive attempt at this:
sum_of_n <- function(density, n, xstart, xend, power_of_2)
{
  # grid of 2^power_of_2 equally spaced points covering the support
  x <- seq(from = xstart, to = xend, by = (xend - xstart) / (2^power_of_2 - 1))
  y <- density(x)
  # characteristic function (up to scaling and phase) via the DFT
  fft_y <- fft(y)
  # characteristic function of the n-fold sum
  fft_sum_of_y <- fft_y ^ n
  # invert; note fft(..., inverse = TRUE) does not normalize
  sum_of_y <- Re(fft(fft_sum_of_y, inverse = TRUE))
  return(sum_of_y)
}
In the above, density is an arbitrary density function: for example
density <- function(x){return(dgamma(x = x, shape = 2, rate = 1))}
n indicates the number of IID random variables being summed, and xstart and xend are the start and end of the approximate support of the random variable. power_of_2 sets the length of the numeric vectors to 2^power_of_2; as I understand it, lengths that are powers of two make the fft algorithm more efficient.
I understand, at least partially, why the above does not work as intended. First, the values themselves will not be scaled correctly, since fft(inverse=TRUE) does not normalize by default. However, I find that the values are still not correct when I divide by the length of the vector, i.e.
sum_of_y <- sum_of_y / length(sum_of_y)
which, based on my admittedly limited understanding of fft, should be the normalizing calculation. Second, the resulting vector will be out of phase due to (someone correct me on this if I am wrong) the shifting of the zero frequency that occurs when fft is performed. I have tried, for example, pracma's fftshift and ifftshift, but they do not appear to address this problem correctly. For symmetric distributions, e.g. the normal, this is not difficult to address, since the phase shift is typically exactly half the vector length, so that an operation like
sum_of_y <- c(sum_of_y[(length(y)/2+1):length(y)], sum_of_y[1:(length(y)/2)])
works as a correction. However, for asymmetric distributions like the gamma distribution above, this fails.
In conclusion, are there adjustments to the code above that will result in an appropriately scaled and appropriately shifted final density function for the IID sum?
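To make the scaling and shifting issues concrete, here is one candidate adjustment, sketched under the assumption that the density is negligible outside [xstart, xend] (the helper name sum_of_n2 and the amount of zero-padding are my own choices): zero-pad so the n-fold convolution cannot wrap around, divide by the padded length to normalize the inverse fft, multiply by dx^(n-1) to approximate the continuous convolution, and track the output grid, which starts at n*xstart, instead of phase-shifting.

sum_of_n2 <- function(density, n, xstart, xend, power_of_2)
{
  N <- 2^power_of_2
  dx <- (xend - xstart) / (N - 1)
  x <- seq(from = xstart, to = xend, by = dx)
  # zero-pad so the n-fold convolution is linear, not circular
  y <- c(density(x), rep(0, (n - 1) * N))
  M <- length(y)
  # n-fold discrete convolution via the convolution theorem;
  # divide by M because fft(..., inverse = TRUE) does not normalize,
  # multiply by dx^(n - 1) to approximate the continuous convolution
  f <- Re(fft(fft(y)^n, inverse = TRUE)) / M * dx^(n - 1)
  # the output lives on a grid starting at n * xstart with spacing dx,
  # so tracking the grid replaces any phase-shift correction
  x_out <- n * xstart + (seq_len(M) - 1) * dx
  list(x = x_out, y = f)
}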
I am a beginner in R. I applied the splinefun function to (x, y) values. I want to get the derivative of the resulting function at the x values and the y values interpolated by the function. I would also like to constrain the function to be > 0.
Maybe someone has already asked these questions?
I ran splinefun and have the impression that the resulting function does not "smooth" the observed (x, y) values but passes exactly through these points. Is that how splinefun interpolates, or is there a way to make the function "smooth" the (x, y) cloud?
With what I did, the interpolated y always equals the observed/measured y.
Does this have something to do with the "method" argument for the interpolation ("fmm", "monoH.FC", ...), or with the "ties" argument?
I also tried to get the first derivative with f(x, deriv=1), but I am not sure this is the right way to do that.
Example of the code:
x <- c(1, 8, 14, 21, 28, 35, 42, 65)
y <- c(65, 30, 70, 150, 40, 0, 15, 0)
f <- splinefun(x, y, method = "fmm", ties = mean)  # assign the returned function
deriv <- f(x, deriv = 1)  # get the first derivative at the x values
y_interp <- f(x)          # get the interpolated y (equals observed y at the knots)
I am searching how to:
1) force the interpolated function to return only values y > 0 (is there an argument in the splinefun function which handles that?),
2) get the values of the derivative of the interpolated function at the x values used for the interpolation, and
3) get interpolated y values that differ from the y used to perform the interpolation, i.e. a smoothed fit rather than an exact one (a sketch follows below).
Thanks a lot for your help.
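For points 2) and 3), one possibility would be a smoothing spline rather than an exact interpolant. A sketch using stats::smooth.spline (the > 0 constraint is not supported natively; clamping with pmax is only a crude workaround):

x <- c(1, 8, 14, 21, 28, 35, 42, 65)
y <- c(65, 30, 70, 150, 40, 0, 15, 0)
sf <- smooth.spline(x, y)          # smoothing spline: does not pass exactly through the points
y_smooth <- predict(sf, x)$y       # smoothed y at the original x (differs from observed y)
dy <- predict(sf, x, deriv = 1)$y  # first derivative at the original x
y_pos <- pmax(y_smooth, 0)         # crude clamp to enforce y >= 0 (not a true constraint)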
Using Greg's helpful answer here, I fit a second order polynomial regression line to my dataset:
poly.fit <- lm(y ~ poly(x, 2), data = df)
When I plot the line, I get the graph below:
The coefficients are:
# Coefficients:
# (Intercept) poly(x, 2)1 poly(x, 2)2
# 727.1 362.4 -269.0
I then wanted to find the x-value of the peak. I assume there is an easy way to do so in R but I did not know it,* so I went to Wolfram Alpha. I entered the equation:
y=727.1+362.4x-269x^2
Wolfram Alpha returned the following:
As you can see, the function intersects the x-axis at approximately x=2.4. This is obviously different from my plot in R, which ranges over 0 ≤ x ≤ 80. Why are these different? Does R interpret my x-values as a fraction of some backroom variable?
*I would also appreciate answers on how to find this peak. Obviously I could take the derivative, but how do I set it to zero?
Use predict.
plot(40:90, predict(poly.fit, list(x = 40:90)))
In the case of a quadratic polynomial, you can of course use a little calculus and algebra (once you have friendly coefficients).
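This is also why the Wolfram Alpha curve disagreed with the R plot: poly(x, 2) uses orthogonal polynomials, so the printed coefficients are not the coefficients of 1, x, and x^2. A sketch of the calculus route, assuming a refit of the same data with raw coefficients (raw.fit is a name I made up):

raw.fit <- lm(y ~ poly(x, 2, raw = TRUE), data = df)  # raw, i.e. "friendly", coefficients
b <- coef(raw.fit)            # intercept, x, and x^2 coefficients
x_peak <- -b[2] / (2 * b[3])  # dy/dx = b + 2*c*x = 0  =>  x = -b/(2*c)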
Somewhat more generally, you can get an estimate by evaluating your model over a range of candidate values and determining which one gives you the maximum response value.
Here is a (only moderately robust) function which will work here.
xmax <- function(fit, startx, endx, x='x', within=NA){
  ## find approximate value of variable x where model
  ## specified by fit takes maximum value, inside interval
  ## [startx, endx]; precision specified by within
  within <- ifelse(is.na(within), (endx - startx)/100, within)
  testx <- seq(startx, endx, by=within)
  testlist <- list(testx)
  names(testlist)[1] <- x
  testy <- predict(fit, testlist)
  testx[which.max(testy)]
}
Note that if your predictor variable is called something other than x, you have to pass its name as a string in the x parameter.
So to find the x value where your curve has its peak:
xmax(poly.fit, 50, 80, within=0.1)
I want to turn a continuous random variable X with cdf F_X(x) into a continuous random variable Y with cdf F_Y(y), and am wondering how to implement this in R.
For example, perform a probability transformation on data following normal distribution (X) to make it conform to a desirable Weibull distribution (Y).
(x=0 has CDF F_X(0)=0.5; the y with CDF F_Y(y)=0.5 is y=5, so x=0 corresponds to y=5, etc.)
There are many built-in distribution functions: those starting with a 'p' transform to a uniform and those starting with a 'q' transform from a uniform. So the transform in your example can be done by:
y <- qweibull(pnorm(x), 2, 6.0056)
Then just change the functions and/or parameters for other cases.
The distr package may also be of interest for additional capabilities.
In general, you can transform an observation x on X to an observation y on Y by:
1) getting the probability of X ≤ x, i.e. F_X(x),
2) then determining which observation y has the same probability, i.e. you want the probability P(Y ≤ y) = F_Y(y) to be the same as F_X(x).
This gives F_Y(y) = F_X(x), and therefore y = F_Y^{-1}(F_X(x)), where F_Y^{-1} is better known as the quantile function Q_Y. The overall transformation from X to Y is summarized as Y = Q_Y(F_X(X)).
In your particular example, from the R help, the distribution function for the normal distribution is pnorm and the quantile function for the Weibull distribution is qweibull, so you want to call pnorm first and then qweibull on the result.
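A quick empirical sketch of this (using the shape = 2, scale = 6.0056 Weibull from the example above; the sample size is arbitrary):

set.seed(1)
x <- rnorm(10000)                            # observations of X ~ N(0, 1)
u <- pnorm(x)                                # F_X(X) is Uniform(0, 1)
y <- qweibull(u, shape = 2, scale = 6.0056)  # Q_Y(U) follows the target Weibull
# compare the empirical cdf of y against the target cdf
plot(ecdf(y))
curve(pweibull(x, shape = 2, scale = 6.0056), add = TRUE, col = "red")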
I am experimenting with ways to deal with overplotting in R, and one thing I want to try is to plot individual points but color them by the density of their neighborhood. In order to do this I would need to compute a 2D kernel density estimate at each point. However, it seems that the standard kernel density estimation functions are all grid-based. Is there a function for computing 2D kernel density estimates at specific points that I specify? I would imagine a function that takes x and y vectors as arguments and returns a vector of density estimates.
If I understand what you want to do, it could be achieved by fitting a smoothing model to the grid density estimate and then using that to predict the density at each point you are interested in. For example:
# Simulate some data and put in data frame DF
n <- 100
x <- rnorm(n)
y <- 3 + 2 * x * rexp(n) + rnorm(n)
# add some outliers
y[sample(1:n,20)] <- rnorm(20,20,20)
DF <- data.frame(x,y)
# Calculate 2d density over a grid
library(MASS)
dens <- kde2d(x,y)
# create a new data frame of that 2d density grid
# (the order is correct: as.vector(dens$z) varies x fastest, as does expand.grid(x, y))
gr <- data.frame(with(dens, expand.grid(x, y)), as.vector(dens$z))
names(gr) <- c("xgr", "ygr", "zgr")
# Fit a model
mod <- loess(zgr~xgr*ygr, data=gr)
# Apply the model to the original data to estimate density at that point
DF$pointdens <- predict(mod, newdata=data.frame(xgr=x, ygr=y))
# Draw plot
library(ggplot2)
ggplot(DF, aes(x=x,y=y, color=pointdens)) + geom_point()
Or, if I just change n to 10^6, we get
I eventually found the precise function I was looking for: interp.surface from the fields package. From the help text:
Uses bilinear weights to interpolate values on a rectangular grid to arbitrary locations or to another grid.
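For example, reusing dens and DF from the kde2d example above (a sketch; interp.surface takes a list with x, y, and z components and a two-column matrix of locations):

library(fields)
# bilinear interpolation of the gridded density estimate at the original points
DF$pointdens2 <- interp.surface(dens, cbind(x, y))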