Computing integral of a line plot in R

I have two positive-valued vectors x, y of the same length in R. Calling plot(x, y, "l", ...) gives me a continuous line plot in two dimensions from my finite vectors x and y. Is there a way to compute a definite integral over some range of this line plot in R?
edit1: I've looked into the integrate function in R. I'm not sure, however, how to make a function out of my two vectors to pass to it, as both are finite.
edit2: For some more background: the length of x and y is ~10,000. I've written a function to find periods, [xi, xj], of abnormalities in the data I'm observing. For each of these abnormalities, I've used plot to see what's going on in these snippets of my data. Now I need to compute statistics on the values of the integrals over these abnormal periods, so I'm trying to get as accurate a number as possible to match my graphs. x is a time variable, and I've taken very fine intervals of time.

You can do the integration with integrate(). To create a function out of your vectors x and y, you need to interpolate between the values. approxfun() does exactly that.
integrate() takes a function and two integration bounds.
approxfun() takes two vectors x and y, just like the ones you have.
So my solution would be:
integrate(approxfun(x, y), range(x)[1], range(x)[2])
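As a quick sanity check, here is a sketch with made-up data (the sine curve is only an illustration, not from the question):
x <- seq(0, pi, length.out = 100)   # hypothetical fine time grid
y <- sin(x)                         # hypothetical positive-valued signal
f <- approxfun(x, y)                # piecewise-linear interpolant
integrate(f, min(x), max(x))        # ~2, the exact integral of sin on [0, pi]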

The approxfun function will take two vectors and return a function that gives the linear interpolation between the points. This can then be passed to functions like integrate. The splinefun function will also do interpolation, but based on a spline rather than a piecewise-linear fit.
In the piecewise-linear case the integral is just the sum of the trapezoids, so it may be faster/simpler to sum the trapezoid areas directly (each width, the difference between consecutive x values, times the average height, the mean of the two corresponding y values).

I landed here much later, but for future visitors, here is some code for the suggestion from Greg Snow's answer, for piecewise-linear functions:
line_integral <- function(x, y) {
  dx <- diff(x)                           # trapezoid widths
  end <- length(y)
  my <- (y[1:(end - 1)] + y[2:end]) / 2   # trapezoid mean heights
  sum(dx * my)
}
# example
x <- c(0, 2, 3, 4, 5, 5, 6)
y <- c(0, 0, 1, -2, -1, 0, 0)
plot(x, y, "l")
line_integral(x, y)   # -1.5

Related

Using FFT in R to Determine Density Function for IID Sum

The goal is to compute the density function of a sum of n IID random variables from the density function of one of these random variables by:
Transforming the density function into the characteristic function via fft
Raising the characteristic function to the nth power
Transforming the resulting characteristic function back into the density function of interest via fft(inverse=TRUE)
The below is my naive attempt at this:
sum_of_n <- function(density, n, xstart, xend, power_of_2) {
  x <- seq(from = xstart, to = xend, by = (xend - xstart) / (2^power_of_2 - 1))
  y <- density(x)
  fft_y <- fft(y)
  fft_sum_of_y <- fft_y^n
  sum_of_y <- Re(fft(fft_sum_of_y, inverse = TRUE))
  return(sum_of_y)
}
In the above, density is an arbitrary density function: for example
density <- function(x){return(dgamma(x = x, shape = 2, rate = 1))}
n indicates the number of IID random variables being summed. xstart and xend are the start and end of the approximate support of the random variable. power_of_2 is the power of 2 length for the numeric vectors used. As I understand things, lengths of powers of two increase the efficiency of the fft algorithm.
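For concreteness, a call under these assumptions might look like the following (the argument values are only illustrative):
out <- sum_of_n(density, n = 3, xstart = 0, xend = 20, power_of_2 = 10)  # the gamma density above
plot(seq(0, 20, length.out = 2^10), out, type = "l")  # wrong scale and phase, as described below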
I understand, at least partially, why the above does not work as intended in general. Firstly, the values themselves will not be scaled correctly, as fft(inverse=TRUE) does not normalize by default. However, I find that the values are still not correct when I divide by the length of the vector, i.e.
sum_of_y <- sum_of_y / length(sum_of_y)
which, based on my admittedly limited understanding of fft, is the normalizing calculation. Secondly, the resulting vector will be out of phase due to (someone correct me on this if I am wrong) the shifting of the zero frequency that occurs when fft is performed. I have tried to use, for example, pracma's fftshift and ifftshift, but they do not appear to address this problem correctly. For symmetric distributions, e.g. the normal, this is not difficult to address, since the phase shift is typically exactly half the length, so that an operation like
sum_of_y <- c(sum_of_y[(length(y)/2+1):length(y)], sum_of_y[1:(length(y)/2)])
works as a correction. However, for asymmetric distributions like the gamma distribution above this fails.
In conclusion, are there adjustments to the code above that will result in an appropriately scaled and appropriately shifted final density function for the IID sum?
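For future readers, here is one hedged sketch of how both issues are often handled: zero-pad so the FFT computes a linear rather than a circular convolution (which removes the wraparound/phase problem), divide by the transform length because R's fft(inverse = TRUE) is unnormalized, and track that the support of the sum starts at n * xstart. The helper name sum_of_n_padded is made up, and this is a sketch under those assumptions rather than a definitive fix:
sum_of_n_padded <- function(density, n, xstart, xend, m = 2^12) {
  x  <- seq(xstart, xend, length.out = m)
  dx <- x[2] - x[1]
  y  <- density(x) * dx                     # probability mass per grid cell
  N  <- n * m                               # pad so the n-fold support fits without wrapping
  fy <- fft(c(y, rep(0, N - m)))
  p  <- Re(fft(fy^n, inverse = TRUE)) / N   # divide by N: R's inverse fft is unnormalized
  list(x = n * xstart + dx * (0:(N - 1)),   # the sum's support starts at n * xstart
       y = p / dx)                          # convert mass back to a density scale
}
With the gamma example above, the returned y should then integrate to roughly 1 over the returned grid (up to truncation of the support at xend).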

Splinefun R: how to force interpolated values to be positive, and get the interpolated value and derivative at the x values

I'm a beginner in R, and I'm using the splinefun function on (x, y) values. I want to get the derivative of the resulting function at x and the y values it interpolates. I would also like to constrain the function to be > 0.
Maybe someone has already asked these questions?
I ran splinefun and have the impression that the resulting function does not "smooth" the observed (x, y) values but passes exactly through these points. Is that how splinefun interpolates, or is there a way to constrain the function to "smooth" the (x, y) cloud?
With what I did, the interpolated y always equals the observed/measured y.
Does this have to do with the "method" argument for the interpolation ("fmm", "monoH.FC", ...), or with the "ties" argument?
I also tried to get the first derivative by calling f(x, deriv = 1), but I am not sure this is the right way to do it.
example of the code:
x <- c(1, 8, 14, 21, 28, 35, 42, 65)
y <- c(65, 30, 70, 150, 40, 0, 15, 0)
f <- splinefun(x, y, method = "fmm", ties = mean)  # keep the returned function
deriv <- f(x, deriv = 1)  # first derivative at the original x values
y_interp <- f(x)          # interpolated y at the original x values
I am searching how to:
1) force the interpolated function to return only values of y > 0 (is there an argument in splinefun that handles that?);
2) get the values of the derivative of the interpolated function at the x values used for the interpolation;
3) get the interpolated y values (where they differ from the y used to perform the interpolation).
Thanks a lot for your help.
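For future visitors: splinefun() builds an exact interpolant, so f(x) equals the observed y at the knots by design. A minimal sketch of the smoothing alternative, assuming stats::smooth.spline is acceptable (this is not from the original question), using the x and y vectors above:
fit <- smooth.spline(x, y)            # penalized smoothing spline; does not pass through every point
y_smooth <- predict(fit, x)$y         # smoothed values, generally != observed y
dy <- predict(fit, x, deriv = 1)$y    # first derivative at the original x values
y_pos <- pmax(y_smooth, 0)            # crude clamp to force values >= 0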

Points uniformly distributed on unit disk (2D)

I am trying to generate 10,000 points from the uniform distribution on the unit disk and plot these points.
The method I am using has three steps. The first step is generating the magnitude of the point, x. This magnitude has cdf F(x) = x^2, with min(x) = 0 and max(x) = 1. The second step involves generating a 2-dimensional vector (which I will call y) from the multivariate normal distribution with mu being the zero vector and sigma being the 2x2 identity matrix, MVN(0, I). Lastly, I normalize the vector y to have length x. I have tried to code the solution in R, but I do not think my answer is correct. I would really appreciate it if I could be pointed in the right direction.
library(MASS)  # for mvrnorm
u <- runif(10000)
x <- u^2
y <- mvrnorm(10000, mu = rep(0, 2), Sigma = diag(2))
y_norm <- (x * y) / sqrt(sum(y^2))
plot(y_norm, asp = 1)
I used the MASS package for mvrnorm. Also, I have included the plot that I ended up with.
You need to compute the length of each of the rows in your y matrix; you are taking the square root of the sum of squares of all the numbers in y, which just scales your multivariate normal by a constant. Also, you need x to be sqrt(u) rather than u^2. This code normalizes each row by its length, uses the sqrt(u) scaling, and the result looks nice and uniform:
plot(sqrt(u) * y / sqrt(y[, 1]^2 + y[, 2]^2))
There are better ways of making uniform points on a disc, unless this is just an exercise to do it this way...
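One such alternative, as a sketch (not part of the original answer), is inverse-transform sampling in polar coordinates; the square root comes from inverting the cdf F(r) = r^2 noted in the question:
n <- 10000
r <- sqrt(runif(n))             # inverse cdf of F(r) = r^2
theta <- runif(n, 0, 2 * pi)    # uniform angle
plot(r * cos(theta), r * sin(theta), asp = 1, pch = ".")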

Finding x-value at intersection between a linear and nonlinear equation in R

I have two functions: one for a line (y) and another for a curve (hnc). I would like to determine the one x-value at which the two functions intersect.
sigma <- 0.075
mu <- 0
r <- 0.226
theta <- 0.908
H <- 0.16
hnc <- function(x) (1 / (sigma * sqrt(2 * pi))) * exp(-(x^2) / (2 * sigma^2))
y <- function(x) 2 * pi * x + pi * r^2 / ((360 / theta) / H)
curve(hnc, 0, r, n = 100, col = "blue")
plot(y, 0, r, add = TRUE, col = "red")
I have tried using the nleqslv package, but this results in two separate x-values that do not agree (perhaps because I am using it incorrectly).
library(nleqslv)
int <- function(x) {
  z <- numeric(2)
  z[1] <- (1 / (sigma * sqrt(2 * pi))) * exp(-(x[1]^2) / (2 * sigma^2))
  z[2] <- 2 * pi * x[2] + pi * r^2 / ((360 / theta) / H)
  z
}
nleqslv(c(0.14, 0.14), int, method = "Broyden")
Any help would be much appreciated!
Thanks,
Eric
Using optimize here to find the minimum of a function of a single variable seems to work well:
xx <- optimize(function(x) abs(hnc(x) - y(x)), c(0.10, 0.20))$minimum
abline(v = xx, lty = 2)
You are not using nleqslv in the correct way. It is meant for solving a system of nonlinear equations with as many variables as there are equations.
You have two functions and you want to determine the intersection, which in your case consists of a single value for x.
You need to define a new function, like this:
g <- function(x) hnc(x) - y(x)
Then you can use uniroot to find a zero of g(x):
uniroot(g, c(0, 1))
The root found will be 0.1417802, which corresponds with the graph in the first answer.
Minimizing won't always work to find a point of intersection; if there is no point of intersection you will get misleading results.

Why do results from optim() depend on initial values?

In R, I am using the function optim() to find the minimum of an objective function of two variables. The real objective functions I'm working with are quite complex, so I tried to familiarize myself with a simpler objective function. The simplest way to run optim() is optim(par, function), where par is a vector of initial values for the algorithm. I find that the answer I get depends heavily on the initial values I input. However, the function I used is so simple, I'm worried that I am misunderstanding either the input or output of optim().
The objective function I am using is:
f <- function(x) {
  abs(x[1]) * abs(x[2])
}
When I run optim(c(-1.2, 1), f) and optim(c(-1.2, 10), f), I get drastically different output for the optimal arguments (par) and the minimum (value). Does anyone have an idea why this would be so?
In this case your objective function has infinitely many optimal points (not necessarily just different local minima). Any point where one of the parameters is zero gives the value 0, which is just as good as any other point where a parameter is zero. I'm not sure if you were expecting (0, 0), but (0, 34) has the same value and can also be considered optimal.
An associated function is:
g <- function(x, y) abs(x) * abs(y)
We can visualize the levels of the graph with contour and plot the points given:
A reasonable field given your initial and final conditions (noted from running optim):
x <- seq(-1.5, 0, by = 0.1)
y <- seq(0, 11, by = 1)
A matrix of the values in g:
m <- outer(x, y, g)
Plot the results, including the results of optim. Note that the values at x==0 or y==0 are optimal.
contour(x, y, m)
o1 <- optim(c(-1.2, 1), f)$par
o2 <- optim(c(-1.2, 10), f)$par
segments(-1.2, 1, o1[1], o1[2], col = "red")
segments(-1.2, 10, o2[1], o2[2], col = "red")
# Add zero lines, since contour() does not draw them
segments(0, 11, 0, 0)
segments(-1.5, 0, 0, 0)
This shows a straight line from each initial condition (left side) to a zero of the function (right side). The optimization itself does not follow a straight line, but this makes it plausible that quite different results will be reached from different starting points.
