How can I plot this transcendental equation in R?

This is the Kepler equation in terms of the angle theta:
M=2*atan(tan(theta/2)*c)-e*sin(2*atan(tan(theta/2)*c))
where e=0.2056 and c=sqrt((1-e)/(1+e))
M goes from 0 to 2pi.
My X value is M and my Y value is theta. What code should I use to plot theta(M)?

Adjusted answer to make range of M be (0,2*pi)
Your equation:
M=2*atan(tan(theta/2)*c)-e*sin(2*atan(tan(theta/2)*c))
defines M as a function of theta. It may be that in actual use you will know M and need to compute theta, but to get a plot, there is no need for an analytic formula for theta as a function of M. You just need a series of x-y values. So, you can generate a sequence of thetas, compute M and plot them, like this:
e=0.2056
c=sqrt((1-e)/(1+e))
theta = seq(0,2*pi, 0.1)
M=2*atan(tan(theta/2)*c)-e*sin(2*atan(tan(theta/2)*c))
M[M<0] = M[M<0] + 2*pi
plot(M, theta, pch=20)
If you need to be able to compute values of theta from a given M, you can approximate the inverse function like this.
THETA = approxfun(M, theta)
plot(M, THETA(M), type="l", ylab="theta")
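If you need theta at an arbitrary M rather than values interpolated from a table, one alternative is to solve the equation numerically. The sketch below is not part of the answer above. It assumes the intermediate quantity E = 2*atan(tan(theta/2)*c) (the eccentric anomaly), finds E from M = E - e*sin(E) with uniroot(), and then maps E back to theta. The name theta_from_M is hypothetical.

```r
e <- 0.2056
c <- sqrt((1 - e) / (1 + e))

# Hypothetical helper: solve M = E - e*sin(E) for E on [0, 2*pi],
# then convert E to theta using tan(theta/2) = tan(E/2) / c.
theta_from_M <- function(M) {
  sapply(M, function(m) {
    E <- uniroot(function(E) E - e * sin(E) - m,
                 interval = c(0, 2 * pi), tol = 1e-12)$root
    2 * atan2(sin(E / 2), c * cos(E / 2))
  })
}

# Stay just inside (0, 2*pi) to keep the root strictly bracketed
M_grid <- seq(0.01, 2 * pi - 0.01, length.out = 200)
plot(M_grid, theta_from_M(M_grid), type = "l", xlab = "M", ylab = "theta")
```

Using atan2() keeps theta in [0, 2*pi] without the manual M[M<0] + 2*pi correction, at the cost of one root-finding call per value of M.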

Related

Monte Carlo to Estimate Theta using Gamma Distribution

I would like to run a Monte Carlo simulation in R to estimate theta. Could someone please recommend some resources and suggestions for how I could do this?
I have started with creating a sample with the gamma distribution and using the shape and rate of the distribution, but I am unsure of where to go next with this.
x = seq(0.25, 2.5, by = 0.25)
PHI <- pgamma(x, shape = 5.5, 2)
CDF <- c()
n= 10000
set.seed(12481632)
y = rgamma(n, shape = 5.5, rate = 2)
You could rewrite your expression for θ, factoring out an exponential distribution:
θ = ∫_0^∞ (x^4.5 / 2) (2 e^(-2x)) dx
Here (2 e^(-2x)) is the density of the exponential distribution with rate = 2, which suggests how to integrate it using Monte Carlo:
1. Sample random values from the exponential distribution.
2. Compute the function (x^4.5 / 2) at the sampled values.
3. The mean of those computed values is the Monte Carlo estimate of the integral.
Code, R 4.0.3 x64, Win 10
set.seed(312345)
n <- 10000
x <- rexp(n, rate = 2.0)
f <- 0.5*x**4.5
mean(f)
prints
[1] 1.160716
You could even estimate statistical error as
sd(f)/sqrt(n)
which prints
[1] 0.1275271
Thus the M-C estimate of your integral θ is 1.160716 ± 0.1275271.
What is implemented here follows, e.g., http://www.math.chalmers.se/Stat/Grundutb/CTH/tms150/1112/MC.pdf (section 6.1.2), where
g(x) is our power function (x^4.5 / 2), and f(x) is our exponential distribution.
UPDATE
Just to clarify one thing: there is no single canonical way to split the integrand into a sampling PDF f(x) and a computable function g(x) whose mean value is our integral.
E.g., I could write
θ = ∫_0^∞ (x^4.5 e^(-x)) (e^(-x)) dx
Here e^(-x) would be the PDF f(x), a simple exponential with rate = 1, and g(x) now carries the leftover exponential factor. Similar code
set.seed(312345)
n <- 10000
f <- rexp(n, rate = 1.0)
g <- f**4.5*exp(-f)
print(mean(g))
print(sd(g)/sqrt(n))
produced an integral value of 1.148697 ± 0.02158325. It is a slightly better approach, because the statistical error is smaller.
You could even write it as
θ = Γ(5.5) 0.5^5.5 ∫_0^∞ 1 · G(x | shape=5.5, scale=0.5) dx
where Γ(x) is the gamma function and G(x | shape, scale) is the density of the gamma distribution. So you could sample from the gamma distribution with g(x) = 1 for any sampled x. This gives the exact answer, since every sample contributes exactly 1. Code
set.seed(312345)
f <- rgamma(n, 5.5, scale=0.5)
g <- f**0.0 # g is equal to 1 for any value of f
print(mean(g)*gamma(5.5)*0.5**5.5)
print(sd(g)/sqrt(n))
produced an integral value of 1.156623 ± 0.
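The first two splits above follow the same pattern, so they can be wrapped in one small function. This is only a sketch: mc_integral is a hypothetical helper name, and the closed-form value gamma(5.5) / 2^5.5 is included purely as a reference point for the estimates.

```r
# Hypothetical helper: estimate the integral of g(x) * f(x) dx, where
# `sampler(n)` draws n values from the density f.
mc_integral <- function(sampler, g, n = 10000) {
  vals <- g(sampler(n))
  c(estimate = mean(vals), std_error = sd(vals) / sqrt(n))
}

set.seed(312345)
# Split 1: f = exponential(rate = 2), g(x) = x^4.5 / 2
split1 <- mc_integral(function(n) rexp(n, rate = 2),
                      function(x) 0.5 * x^4.5)
# Split 2: f = exponential(rate = 1), g(x) = x^4.5 * exp(-x)
split2 <- mc_integral(function(n) rexp(n, rate = 1),
                      function(x) x^4.5 * exp(-x))

exact <- gamma(5.5) / 2^5.5   # closed-form value of the integral
print(rbind(split1, split2))
print(exact)
```

Comparing the std_error column shows directly how the choice of split affects the variance of the estimator.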
The best way to estimate theta given its definition is
theta <- integrate(function(x) x^4.5 * exp(-2*x), from = 0, to = Inf)
Giving:
theta
#> [1] 1.156623
Another way to handle this is to note that the constant lambda^rate / gamma(rate) can be taken outside the cdf integral. Since we know that the cdf at infinity is 1, theta must equal gamma(rate) / lambda^rate:
gamma(5.5)/2^5.5
#> [1] 1.156623
Note that we can also write functions for your pdf and cdf and plot them:
pdf <- function(t, rate, lambda) {
(lambda^rate)/gamma(rate) * t^(rate-1) * exp(-lambda * t)
}
cdf <- function(x, rate, lambda) {
sapply(x, function(y) {
integrate(pdf, lower = 0, upper = y, lambda = lambda, rate = rate)$value
})
}
curve(pdf(x, 5.5, 2), from = 0, to = 10)
curve(cdf(x, 5.5, 2), from = 0, to = 10)
It's not quite clear how you would want a Monte Carlo simulation to help you with any of this.
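For reference, the pdf and cdf written out above are those of a gamma distribution, which base R already provides as dgamma() and pgamma(). A minimal sketch, assuming shape = 5.5 and rate = 2 as in the question (what the answer calls rate and lambda, respectively):

```r
shape <- 5.5
rate  <- 2

# Built-in gamma pdf and cdf
curve(dgamma(x, shape = shape, rate = rate), from = 0, to = 10, ylab = "pdf")
curve(pgamma(x, shape = shape, rate = rate), from = 0, to = 10, ylab = "cdf")

# Numerically integrating the pdf should agree with the built-in cdf
num_cdf <- integrate(dgamma, lower = 0, upper = 3,
                     shape = shape, rate = rate)$value
print(all.equal(num_cdf, pgamma(3, shape = shape, rate = rate),
                tolerance = 1e-6))

# theta is the reciprocal of the normalizing constant of this density
print(gamma(shape) / rate^shape)
```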

How to interpolate those signal data with a polynomial?

I am trying to find the coefficients of a polynomial in R, but I am not sure of which order the polynomial is.
I have data:
x=seq(6, 174, by=8)
y=rep(c(-1,1),11)
Now I want to find the (obviously) non-linear function that hits all these points. Function values should still be in the interval [-1,1], and all these points should be understood as the vertex of a parabola.
EDIT
Actually this is not example data, I just need exactly this function for exactly these points.
I tried to describe it with polynomials up to degree 25 and then gave up. With polynomials it seems only possible to approximate the curve, not to reproduce it exactly.
Comments suggested using a sine curve. Does someone know how to get the exact trigonometric function?
Your data look very much like samples from a sinusoidal signal. Since y is constrained to [-1,1], we know for sure the amplitude is 1, so let's assume we want a sine function:
y = sin((2 * pi / T) * x + phi)
where T is period and phi is phase. The period of your data is evident: 2 * 8 = 16. To get phi, just use the fact that when x = 6, y = -1. That is
sin(12 * pi / T + phi) = -1
which gives one solution: phi = -pi/2 - 12 * pi / T.
Here we go:
T <- 16
phi <- -pi/2 - 12 * pi / T
f <- function(x) sin(x * pi / 8 + phi)
plot(x, y)
x0 <- seq(6, 174, by = 0.2)
y0 <- f(x0)
lines(x0, y0, col = 2)
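As a quick sanity check, you can confirm numerically that this sinusoid passes through every data point. The sketch below simply repeats the definitions from the question and the answer and reports the largest deviation, which should be at the level of floating-point rounding.

```r
x <- seq(6, 174, by = 8)
y <- rep(c(-1, 1), 11)

T   <- 16
phi <- -pi / 2 - 12 * pi / T
f   <- function(x) sin((2 * pi / T) * x + phi)

# Largest deviation between the fitted sinusoid and the data points
print(max(abs(f(x) - y)))
```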
Your original intention to have a polynomial is not impossible, but it can't be an ordinary polynomial. An ordinary polynomial is unbounded: it tends to Inf or -Inf as x tends to Inf or -Inf.
A local polynomial is possible. Since you say that all these points should be understood as the vertex of a parabola, you seem to expect a smooth function. Then a cubic spline is ideal. Specifically, we don't want a natural cubic spline but a periodic cubic spline. The spline function from the stats package can help us:
int <- spline(x[-1], y[-1], method = "periodic", xout = x0)
Note, I have dropped the first datum, as with "periodic" method, spline wants y to have the same value on both ends. Once we drop the first datum, y values are 1 on both sides.
plot(x, y)
lines(int, col = 2)
I did not compare the spline interpolation with the sinusoid function. They can't be exactly the same, but in statistical modelling we can use either one to model the underlying cyclic signal / effect.
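If you want to see how close the two curves actually are, the following sketch evaluates both on the same fine grid and reports the largest pointwise gap. It only reuses the definitions above; the variable name max_gap is introduced here for illustration.

```r
x  <- seq(6, 174, by = 8)
y  <- rep(c(-1, 1), 11)
x0 <- seq(6, 174, by = 0.2)

# Sinusoid with period 16 and phase chosen so that f(6) = -1
f <- function(x) sin(x * pi / 8 - pi / 2 - 12 * pi / 16)

# Periodic cubic spline through the data (first point dropped, as above)
int <- spline(x[-1], y[-1], method = "periodic", xout = x0)

# Largest pointwise gap between the two curves on the fine grid
max_gap <- max(abs(int$y - f(x0)))
print(max_gap)
```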

Calculate the volume under a plot of kernel bivariate density estimation

I need to calculate a measure called mutual information. First of all, I need to calculate another measure, called entropy, for example, the joint entropy of x and y:
-∬p(x,y)·log p(x,y)dxdy
So, to calculate p(x,y), I used a kernel density estimator (the function kde2d), and it returned the Z values (the probability of having x and y in that window).
So now I have a 100 x 100 matrix of Z values, which is my p(x,y). But I have to integrate it by finding the volume under the surface (a double integral), and I haven't found a way to do that. The function quad2d, which computes a double quadrature, didn't work, because I only passed it a numerical matrix p(x,y), and it gives me a constant.
Anyone knows something to find that volume/calculate the double integral?
The image of the plot from persp3d:
Thanks everybody !!!!
Once you have the results from kde2d, it is very straightforward to compute a numerical integral. The example session below sketches how to do it.
As you know, a numerical double integral is just a 2D summation. By default, kde2d takes range(x) and range(y) as the 2D domain. I see that you got a 100 * 100 matrix, so I think you set n = 100 when calling kde2d. Now, den$x and den$y define a 100 * 100 grid, with den$z giving the density on each grid cell. It is easy to compute the size of each grid cell (they are all equal); then we do three steps:
1. Find the normalizing constant. In theory the density integrates to 1, but after discretization it only approximately does, so we compute this constant first for later rescaling.
2. The integrand for entropy is -z * log(z); since z is a 100 * 100 matrix, this is also a matrix. Sum its entries and multiply by the cell size cell_size to get a non-normalized entropy.
3. Rescale the non-normalized entropy to get a normalized one.
## sample data: bivariate normal, with covariance/correlation 0
set.seed(123); x <- rnorm(1000, 0, 2) ## marginal variance: 4
set.seed(456); y <- rnorm(1000, 0, 2) ## marginal variance: 4
## load MASS
library(MASS)
## domain:
xlim <- range(x)
ylim <- range(y)
## 2D Kernel Density Estimation
den <- kde2d(x, y, n = 100, lims = c(xlim, ylim))
##persp(den$x,den$y,den$z)
z <- den$z ## extract density
## den$x, den$y expands a 2D grid, with den$z being density on each grid cell
## numerical integration is straightforward, by aggregation over all cells
## the size of each grid cell (a rectangular cell) is:
cell_size <- (diff(xlim) / 100) * (diff(ylim) / 100)
## normalizing constant; ideally should be 1, but actually only close to 1 due to discretization
norm <- sum(z) * cell_size
## your integrand: z * log(z) * (-1):
integrand <- z * log(z) * (-1)
## get numerical integral by summation:
entropy <- sum(integrand) * cell_size
## self-normalization:
entropy <- entropy / norm
Verification
The above code gives an entropy of 4.230938. Now, the Wikipedia article on the multivariate normal distribution gives the entropy formula:
(k / 2) * (1 + log(2 * pi)) + (1 / 2) * log(det(Sigma))
For the above bivariate normal distribution, we have k = 2. We have Sigma (covariance matrix):
4 0
0 4
whose determinant is 16. Hence, the theoretical value is:
(1 + log(2 * pi)) + (1 / 2) * log(16) = 4.224171
Good match!
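The theoretical value can also be evaluated directly in R, which makes it easy to redo the check for a different covariance matrix. A minimal sketch; the variable names k, Sigma and theoretical are chosen here for illustration:

```r
k     <- 2
Sigma <- diag(4, k)   # covariance matrix: variance 4, no correlation

# Entropy of a k-dimensional normal distribution with covariance Sigma
theoretical <- (k / 2) * (1 + log(2 * pi)) + 0.5 * log(det(Sigma))
print(theoretical)
```

This evaluates to about 4.224171, the value quoted above.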

Calculate the derivative of a data-function in r

Is there an easy way to calculate the derivative of non-linear functions that are given by data?
for example:
x = 1 / c(1000:1)
y = x^-1.5
ycs = cumsum(y)
plot (x, ycs, log="xy")
How can I calculate the derivative function from the function given by x and ycs?
I was also going to suggest an example of a smoothed spline fit followed by prediction of the derivative. In this case, the results are very similar to the diff calculation described by @dbaupp:
spl <- smooth.spline(x, y=ycs)
pred <- predict(spl)
plot (x, ycs, log="xy")
lines(pred, col=2)
ycs.prime <- diff(ycs)/diff(x)
pred.prime <- predict(spl, deriv=1)
plot(ycs.prime)
lines(pred.prime$y, col=2)
Generating derivatives from raw data is risky unless you are very careful. Not for nothing is this process known as "error multiplier." Unless you know the noise content of your data and take some action (e.g. spline) to remove the noise prior to differentiation, you may well end up with a scary curve indeed.
The derivative of a function is dy/dx, which can be approximated by Δy/Δx, that is, "change in y over change in x". This can be written in R as
ycs.prime <- diff(ycs)/diff(x)
and now ycs.prime contains an approximation to the derivative of the function at each x: however it is a vector of length 999, so you will need to shorten x (i.e. use x[1:999] or x[2:1000]) when doing any analysis or plotting.
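Rather than dropping the first or last x value, one option is to plot each finite difference at the midpoint of the interval it spans. This is a sketch under that assumption, not part of the answer above; x.mid is a name introduced here.

```r
x   <- 1 / c(1000:1)
ycs <- cumsum(x^-1.5)

ycs.prime <- diff(ycs) / diff(x)

# Each finite difference spans two neighbouring x values, so the midpoint
# of that interval is a natural x position for it (assumption of this sketch).
x.mid <- (x[-1] + x[-length(x)]) / 2

print(length(ycs.prime) == length(x.mid))
plot(x.mid, ycs.prime, log = "xy", type = "l")
```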
There is also gradient from the pracma package.
grad <- pracma::gradient(ycs, h1 = x)
plot(grad, col = 1)

Given a random variable with probability density function f(x), how to compute the expected value of this random variable in R?

Given a random variable with probability density function f(x), how to compute the expected value of this random variable in R?
If you want to compute the expected value, just compute:
E(X) = integral of x * f(x) dx over the whole domain of X.
The integration can easily be done using the function integrate().
Say you have a standard normal density function (you can just as easily define your own density function):
f <- function(x){
1/sqrt(2*pi)*exp((-1/2)*x^2)
}
You calculate the expected value simply by:
f2 <- function(x){x*f(x)}
integrate(f2,-Inf,Inf )
Note that you sometimes need to wrap your function in Vectorize() for integrate() to work, because integrate() calls it with a vector of x values. For more info, see the help pages of integrate() and Vectorize().
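To illustrate the Vectorize() point, here is a small sketch with a density that is written using a scalar if and therefore cannot take a vector argument directly. The triangular density on [0, 2] is an assumption chosen for the example; its mean is 1.

```r
# Triangular density on [0, 2] written with scalar `if`, so it is NOT vectorized
f <- function(x) {
  if (x < 0 || x > 2) 0 else if (x <= 1) x else 2 - x
}

# integrate() passes a vector of x values, so wrap the integrand in Vectorize()
f2 <- Vectorize(function(x) x * f(x))

expected_value <- integrate(f2, lower = 0, upper = 2)$value
print(expected_value)   # the mean of this density is 1
```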
Does it help to know that the expectation E is the integral of x*f(x) dx for x in (-inf, inf)?
You could also use inverse transform sampling. All you need is the cumulative distribution function F(x) of your random variable X. It uses the fact that, when X has pdf f(x), the random variable U = F(X) is uniform on (0, 1), so X = F^-1(U). This means that you can sample from a uniform variable and then transform it through F^-1 to get a sample from X. You can then take the mean of your sample.
Here is an example for the exponential distribution with parameter lambda = 5, mean = 1/5, F(x) = 1 - exp(-lambda * x) and F^-1(u) = -log(1 - u) / lambda.
sample_exp = function(n, lambda = 5){
u = runif(n)
y = -log(1 - u) / lambda
mean(y)
}
n = seq(10, 4000, 10)
res = sapply(n, sample_exp)
plot(n, res, type = "l", xlab = "sample size",
ylab = "Estimated mean", main = "True mean = 0.2")
Below is a plot of the estimated mean as a function of the sample size:

Resources