Lebesgue–Stieltjes Integration in R

I calculated a distribution function numerically. First I plot the function. It looks wrong around 0.05. Is this due to rounding errors, please?
Second, I need to find the corresponding first and second non-central moments. That is,
EX = int x dF(x)
EX^2 = int x^2 dF(x)
Can I do this type of Lebesgue–Stieltjes integration in R, please? Is there a built-in method, please? If not in base R, what package offers such a calculation, please?
I guess alternatively, I can find the numerical derivative f(x) of F(x) and then carry out the usual integration like
EX = int x f(x) dx
But I remember from somewhere that numerical differentiation is much less stable. Which is the right way, please?
FYI my functions are attached below.
library(mvtnorm)
library(matrixcalc)
VAR <- matrix(c(1.043856e-03, 5.044899e-04, 3.239951e-04, 2.330992e-04, 0.0001779055, 0.0001403866, 0.0001127118, 9.074962e-05, 7.157144e-05,
5.044899e-04, 5.485889e-04, 3.523165e-04, 2.534751e-04, 0.0001934568, 0.0001526582, 0.0001225642, 9.868232e-05, 7.782773e-05,
3.239951e-04, 3.523165e-04, 3.878844e-04, 2.790645e-04, 0.0002129870, 0.0001680697, 0.0001349376, 1.086447e-04, 8.568475e-05,
2.330992e-04, 2.534751e-04, 2.790645e-04, 3.123147e-04, 0.0002383642, 0.0001880950, 0.0001510153, 1.215896e-04, 9.589399e-05,
1.779055e-04, 1.934568e-04, 2.129870e-04, 2.383642e-04, 0.0002728857, 0.0002153361, 0.0001728863, 1.391990e-04, 1.097820e-04,
1.403866e-04, 1.526582e-04, 1.680697e-04, 1.880950e-04, 0.0002153361, 0.0002548851, 0.0002046389, 1.647645e-04, 1.299447e-04,
1.127118e-04, 1.225642e-04, 1.349376e-04, 1.510153e-04, 0.0001728863, 0.0002046389, 0.0002555744, 2.057751e-04, 1.622886e-04,
9.074962e-05, 9.868232e-05, 1.086447e-04, 1.215896e-04, 0.0001391990, 0.0001647645, 0.0002057751, 2.840218e-04, 2.239993e-04,
7.157144e-05, 7.782773e-05, 8.568475e-05, 9.589399e-05, 0.0001097820, 0.0001299447, 0.0001622886, 2.239993e-04, 3.974881e-04),
nrow=9, ncol=9, byrow=TRUE)
is.symmetric.matrix(VAR)
is.positive.definite(VAR)
kappa(VAR)
CDF <- function(x){
summand <- rep(0, 5)
for(j in 5:9){
choice <- combn(9, j)
for(i in 1:ncol(choice)){
ub <- rep(Inf, 9)
ub[choice[, i]] <- x
summand[j-4] <- summand[j-4] + as.numeric(pmvnorm(lower=rep(-Inf, 9), upper=ub, sigma=VAR))
}
}
l <- c(1, -5, 15, -35, 70)
as.numeric(t(l)%*%summand)
}
CDF <- Vectorize(CDF)
x <- seq(-0.1, 0.1, by=0.01)
y <- CDF(x)
plot(x, y, type="l", lwd=2)

I initially plotted the result I got from taking first differences from numCDF <- CDF( seq(-10, 10, length=100) ), but that was rather disappointing, since only one value was different than 0. So I restricted the focus to:
numCDF <- CDF( seq(-.10, .10, length=100) )
plot( diff(numCDF) )
Simply plotting the values of numCDF produces similar chaotic results in the region where you expressed concern.
So I think maybe your function is not sufficiently well-behaved to yield good results.
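The moments asked about in the question can also be approximated directly from values of F on a grid, without differentiating it, using a Riemann–Stieltjes sum. The following is only a sketch: stieltjes_moment and its arguments are hypothetical names, and a normal CDF stands in for the question's CDF so that the result can be checked against known moments. It assumes the grid covers essentially all of the probability mass and that F is evaluated accurately; any noise in F carries over into the differences.

```r
# Riemann-Stieltjes sum: E[g(X)] is approximated by
#   sum over cells of g(cell midpoint) * (F(right edge) - F(left edge))
stieltjes_moment <- function(cdf, grid, g = identity) {
  dF   <- diff(cdf(grid))                          # probability mass in each cell
  mids <- (head(grid, -1) + tail(grid, -1)) / 2    # cell midpoints
  sum(g(mids) * dF)
}

# Stand-in CDF with known moments: mean 2, sd 1.5, so E[X^2] = 2^2 + 1.5^2 = 6.25
cdf_test <- function(x) pnorm(x, mean = 2, sd = 1.5)
grid <- seq(-10, 14, by = 0.001)

m1 <- stieltjes_moment(cdf_test, grid)                    # first moment
m2 <- stieltjes_moment(cdf_test, grid, function(x) x^2)   # second non-central moment
```

For the stand-in CDF, m1 and m2 should come out close to 2 and 6.25. With the CDF from the question, the same call would be stieltjes_moment(CDF, grid) on a grid wide enough for the distribution.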

Related

sophisticated horizontal axis distortion control

I stumbled upon this site and found there a very interesting transformation of the time axis through a sigmoid.
I modified the horizontal axis distortion code to suit my needs.
set.seed(123)
vec <- cumsum(rnorm(50))
sigmoid <- function(x) 1 / (1 + exp(-x))
y1 <- sigmoid(seq(-2,2,length.out = 25))
y2 <- sigmoid(seq(-5,5,length.out = 25))
fu <- function(y1,y2, vec){
n <- length(vec)
s1 <- splinefun(y1, y2, method = "natural")(vec)
s2 <- splinefun(y1, y2, method = "natural")(seq(min(vec),max(vec),by=ceiling(length(vec)/50)))
if (! all(s2 == sort(s2))) warning("Non monotonic transformation on y axis!")
d <- cbind(x=1:length(vec), y=s1)
par(mfrow=c(2,2), mar=rep(2,4))
plot(vec,t="b",lwd=2) ; abline(h=seq(-10,10,by = 1),col=8,lty=2)
plot(d, type="b", lwd=2) ; abline(h=s2,col=8,lty=2)
matplot(cbind(y1,y2),t="l",col=c(2,4))
}
fu(y1,y2,vec)
I would like to replace the sigmoid with more complex functions such as a sine wave or the sum of sine waveforms. But when I try to do this, I get unsatisfactory results.
my.sin <- function(ve,a,f,p) a*sin(f*ve+p)
y1 <- my.sin(1:20,a = 2,f = 0.3,p = 0)+2
y2 <- my.sin(1:20,a = 2,f = 0.6,p = 3)+2
fu(y1,y2,vec)
QUESTION: How can I make it so that the graph can only expand or contract, but is not distorted?
That is, the graph should retain its shape in every respect except along the horizontal axis.
In other words, I want the same kind of distortion as in my picture with the sigmoid, but with the sigmoid replaced by a sinusoid.
I apologize for the unclear wording of the question, but I do not know how to explain it better.
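One reading of the question is that the transformation must be strictly increasing, so that it only stretches or compresses the axis and never folds it. A raw sine wave is not monotone, which is what triggers the warning in fu. A possible sketch, not the method from the site mentioned above: treat the sinusoid as a strictly positive local stretch rate and integrate it to obtain the mapping. The names rate, warp_vals and warp, and the constants 2, 1.5 and 3, are illustrative assumptions.

```r
set.seed(123)
vec <- cumsum(rnorm(50))

# Grid over the range of values to be transformed
grid <- seq(min(vec), max(vec), length.out = 200)

# Sinusoidal stretch rate; the offset (2) exceeds the amplitude (1.5),
# so the rate stays strictly positive
rate <- 2 + 1.5 * sin(3 * grid)

# Trapezoid-rule cumulative integral of the rate: strictly increasing
warp_vals <- c(0, cumsum((head(rate, -1) + tail(rate, -1)) / 2 * diff(grid)))

# Piecewise-linear interpolation keeps the mapping monotone;
# rule = 2 clamps values at the ends of the grid
warp <- approxfun(grid, warp_vals, rule = 2)
warped <- warp(vec)

plot(warped, type = "b", lwd = 2)
# Integer levels of the original scale, drawn at their warped positions
abline(h = warp(seq(ceiling(min(vec)), floor(max(vec)), by = 1)), col = 8, lty = 2)
```

Because the mapping is increasing, the ordering of the points is preserved; only the spacing between levels changes.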

Passing arguments of an R function which is itself an argument

Environments and the like have always confused me incredibly in R. I guess therefore this is more of a reference request, since I've been surfing the site for the last hour in search of an answer to no avail.
I have a simple R function called target defined as follows
target <- function(x,scale,shape){
s <- scale
b <- shape
value <- 0.5*(sin(s*x)^b + x + 1)
return(value)
}
I then define the function AR
AR <- function(n,f,...){
variates <- NULL
for(i in 1:n){
z <- runif(1)
u <- runif(1)
if(u < f(z, scale, shape)/c){
variates[i] <- z
}else{next}
}
variates <- variates[!is.na(variates)]
return(variates)
}
in which the function target is being evaluated. Unfortunately, the call returns the following error
sample <- AR(n = 10000, f = target, shape = 8, scale = 5)
Error in fun(z, scale, shape) : object 'shape' not found
I know this has to do with the function AR not knowing where to look for the objects shape and scale, but I thought that was exactly the job of the ellipsis: allowing me to sort of put argument definition "on hold" until one actually calls the function. Where am I wrong and could anyone give me a lead as to where to look for insight on this specific problem?
You are very close; you just need to make use of your ellipsis.
NB: c was not defined in AR, so I added it as an argument and gave it a value.
NB2: I would refrain from using c and sample as names, since these are themselves functions and could cause some confusion down the road.
AR <- function(n, f, c, ...){
variates <- NULL
for(i in 1:n){
z <- runif(1)
u <- runif(1)
if(u < f(z, ...)/c){ ## pass ... instead of naming shape and scale; R forwards any arguments that AR does not define itself
variates[i] <- z
}else{next}
}
variates <- variates[!is.na(variates)]
return(variates)
}
sample <- AR(n = 10000, f = target, shape = 8, scale = 5, c = 100)
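As a compact illustration of the same forwarding, here is a vectorised variant of the sampler (a sketch, not the code from the answer). The names accept_reject and bound are hypothetical; bound plays the role of c and is assumed to be an upper bound for f on [0, 1]. For this target with shape = 8 the value never exceeds 1.5 on that interval, since sin(.)^8 lies in [0, 1].

```r
target <- function(x, scale, shape) {
  0.5 * (sin(scale * x)^shape + x + 1)
}

# '...' collects every argument that accept_reject does not name itself
# (here: scale and shape) and hands it on to f unchanged.
accept_reject <- function(n, f, bound, ...) {
  z <- runif(n)                 # candidate draws on [0, 1]
  u <- runif(n)                 # uniform variates for the acceptance test
  z[u < f(z, ...) / bound]      # keep the accepted candidates
}

set.seed(1)
draws <- accept_reject(n = 10000, f = target, bound = 1.5, shape = 8, scale = 5)
length(draws)   # number of accepted candidates, at most n
```

Passing scale and shape by name matters here: the arguments in ... are forwarded in the order given in the call, so named arguments avoid depending on that order.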

Plotting a sum in R Studio

I am trying to plot the following function in R Studio using the curve function as follows:
loglikelihood.func = function(x, mu){
n = length(x)
n*mu - sum(x) - sum(exp(mu)/(exp(x)))
}
curve(expr = loglikelihood.func(x = data, mu), xname = "mu", from = 0, to = 15)
Now, I have a vector of data that contains 50 data points, and the function currently appears to sum over both mu and x (my data), i.e. sum(exp(mu)/(exp(x))) seems to behave like sum(exp(mu))/sum(exp(x)).
I want my function to work so that I have the fraction sum for each different data point in x while keeping mu constant i.e. exp(mu)/exp(x1) + exp(mu)/exp(x2) + exp(mu)/exp(x3) + ... and repeat this for each separate mu when it plots in the curve function.
If I change my function to exp(mu)/sum(exp(x)) it's doing exp(mu)/[exp(x1)+exp(x2)+...] which is not what I want. Can someone offer some advice here?
Edit:
This is a subset of my data,
data = c(8.5,8.9,9.1,8.9,8.4,9.7,9.1,9.6,8.7,9.3,9.6,9.3,8.7,9.0,8.8,8.9,8.9,12.2)
Not sure I get this right... But curve() can't do as much by itself. You can define your function:
loglikelihood.func = function(x, mu) {
length(x) * mu - sum(x) - sum(exp(mu)/(exp(x)))
}
Then define a range (for x) over which you'd like to plot it, and specify the limits (xlim, ylim) on a first plot. If you don't want to have a sequence, you can use your data instead.
xrange <- seq(from=1, to=10, by = 0.1)
plot(x=xrange, y = sapply(xrange, function(x) loglikelihood.func(x, mu=0)),
xlim = c(1, 10),
ylim = c(-10, 0),
type = "l")
Then add other curves, specifying different mu's:
lines(x=xrange, y = sapply(xrange, function(x) loglikelihood.func(x, mu=1)))
lines(x=xrange, y = sapply(xrange, function(x) loglikelihood.func(x, mu=2)))
(It is more practical to do this with a loop if you don't need extra graphical parameters.)
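If the aim is to plot the log-likelihood against mu, note that curve() evaluates its expression once with a whole vector of mu values, so the function has to return one value per mu. A possible sketch, assuming the formula from the question and using the data subset given there; loglik and loglik_vec are hypothetical names:

```r
data <- c(8.5, 8.9, 9.1, 8.9, 8.4, 9.7, 9.1, 9.6, 8.7,
          9.3, 9.6, 9.3, 8.7, 9.0, 8.8, 8.9, 8.9, 12.2)

# Log-likelihood for a single mu: the sum runs over all data points,
# i.e. exp(mu)/exp(x1) + exp(mu)/exp(x2) + ...
loglik <- function(mu, x) {
  length(x) * mu - sum(x) - sum(exp(mu) / exp(x))
}

# Vectorize over mu only, so each mu is combined with the full data vector
loglik_vec <- Vectorize(loglik, vectorize.args = "mu")

curve(loglik_vec(mu, x = data), xname = "mu", from = 0, to = 15)
```

Vectorizing over mu alone keeps x as the complete data vector in every call, which is the "fraction sum for each data point while keeping mu constant" described in the question.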

How to extrapolate beyond the x points passed to `ksmooth`?

I have a kernel function like so:
x <- 1:100
y <- rnorm(100, mean=(x/2000)^2)
plot(x,y)
kernel <- ksmooth(x,y, kernel="normal", bandwidth=10)
print(kernel$y)
If I try to predict at a point outside the range of x values, it gives me NA, because it is attempting to extrapolate beyond the data:
x <- 1:100
y <- rnorm(100, mean=(x/2000)^2)
plot(x,y)
kernel <- ksmooth(x,y, kernel="normal", bandwidth=10, x.points=c(130))
print(kernel$y)
> print(kernel$y)
[1] NA
Even when I change range.x it doesn't budge:
x <- 1:100
y <- rnorm(100, mean=(x/2000)^2)
plot(x,y)
kernel <- ksmooth(x,y, kernel="normal", bandwidth=10, range.x=c(1,200) , x.points=c(130))
print(kernel$y)
> print(kernel$y)
[1] NA
How do I get the ksmooth function to extrapolate beyond the data? I know this is a bad idea in theory, but in practice this issue comes up all the time.
To answer your side question, looking at the code of ksmooth, range.x is only used when x.points is not provided so that explains why you do not see it used. Let's look at the code in ksmooth:
function (x, y, kernel = c("box", "normal"), bandwidth = 0.5,
range.x = range(x), n.points = max(100L, length(x)), x.points)
{
if (missing(y) || is.null(y))
stop("numeric y must be supplied.\nFor density estimation use density()")
kernel <- match.arg(kernel)
krn <- switch(kernel, box = 1L, normal = 2L)
x.points <- if (missing(x.points))
seq.int(range.x[1L], range.x[2L], length.out = n.points)
else {
n.points <- length(x.points)
sort(x.points)
}
ord <- order(x)
.Call(C_ksmooth, x[ord], y[ord], x.points, krn, bandwidth)
}
From this we see that we need to not provide x.points to make sure that range.x is used. If you run:
x <- 1:100
y <- rnorm(100, mean=(x/2000)^2)
plot(x,y)
kernel <- ksmooth(x,y, kernel="normal", bandwidth=10, range.x=c(1,200))
plot(kernel$x, kernel$y)
Now you'll see that your kernel is evaluated beyond 100 (although not up to 200). Increasing the bandwidth parameter allows you to get even further away from 100.
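If values far outside the data are needed, one alternative is to compute the Nadaraya–Watson estimate by hand, so that no evaluation point is dropped. This is a sketch rather than a drop-in replacement for ksmooth: nw_smooth is a hypothetical helper, and the kernel standard deviation is derived from the documented convention that the kernel's quartiles sit at ±0.25*bandwidth. Far from the data the estimate is dominated by the nearest observations, so it says little about any trend.

```r
nw_smooth <- function(x, y, x_new, bandwidth) {
  # Normal kernel scaled so that its quartiles are at +/- 0.25 * bandwidth
  sd_k <- 0.25 * bandwidth / qnorm(0.75)
  sapply(x_new, function(x0) {
    logw <- dnorm(x - x0, sd = sd_k, log = TRUE)  # log-weights avoid underflow
    w <- exp(logw - max(logw))                    # largest weight becomes 1
    sum(w * y) / sum(w)                           # weighted average of y
  })
}

set.seed(1)
x <- 1:100
y <- rnorm(100, mean = (x / 2000)^2)
est <- nw_smooth(x, y, x_new = c(50, 130), bandwidth = 10)
est
```

Inside the data range the result should be very close to what ksmooth returns for the same bandwidth; at 130 it returns a finite value instead of NA.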

Adding two random variables via convolution in R

I would like to compute the convolution of two probability distributions in R and I need some help. For the sake of simplicity, let's say I have a variable x that is normally distributed with mean = 1.0 and stdev = 0.5, and y that is log-normally distributed with mean = 1.5 and stdev = 0.75. I want to determine z = x + y. I understand that the distribution of z is not known a priori.
As an aside, the real-world example I am working with requires the addition of two random variables that are distributed according to a number of different distributions.
Does anyone know how to add two random variables by convolving the probability density functions of x and y?
I have tried generating n normally distributed random values (with above parameters) and adding them to n log-normally distributed random values. However, I wish to know if I can use the convolution method instead. Any help would be greatly appreciated.
EDIT
Thank you for these answers. I define a pdf, and try to do the convolution integral, but R complains on the integration step. My pdfs are Log Pearson 3 and are as follows
dlp3 <- function(x, a, b, g) {
p1 <- 1/(x*abs(b) * gamma(a))
p2 <- ((log(x)-g)/b)^(a-1)
p3 <- exp(-1* (log(x)-g) / b)
d <- p1 * p2 * p3
return(d)
}
f.m <- function(x) dlp3(x,3.2594,-0.18218,0.53441)
f.s <- function(x) dlp3(x,9.5645,-0.07676,1.184)
f.t <- function(z) integrate(function(x,z) f.s(z-x)*f.m(x),-Inf,Inf,z)$value
f.t <- Vectorize(f.t)
integrate(f.t, lower = 0, upper = 3.6)
R complains at the last step since the f.t function is bounded and my integration limits are probably not correct. Any ideas on how to solve this?
Here is one way.
f.X <- function(x) dnorm(x,1,0.5) # normal (mu=1, sigma=0.5)
f.Y <- function(y) dlnorm(y,1.5, 0.75) # log-normal (mu=1.5, sigma=0.75)
# convolution integral
f.Z <- function(z) integrate(function(x,z) f.Y(z-x)*f.X(x),-Inf,Inf,z)$value
f.Z <- Vectorize(f.Z) # need to vectorize the resulting fn.
set.seed(1) # for reproducible example
X <- rnorm(1000,1,0.5)
Y <- rlnorm(1000,1.5,0.75)
Z <- X + Y
# compare the methods
hist(Z,freq=F,breaks=50, xlim=c(0,30))
z <- seq(0,50,0.01)
lines(z,f.Z(z),lty=2,col="red")
Same thing using package distr.
library(distr)
N <- Norm(mean=1, sd=0.5) # N is signature for normal dist
L <- Lnorm(meanlog=1.5,sdlog=0.75) # same for log-normal
conv <- convpow(L+N,1) # object of class AbscontDistribution
f.Z <- d(conv) # density function
hist(Z,freq=F,breaks=50, xlim=c(0,30))
z <- seq(0,50,0.01)
lines(z,f.Z(z),lty=2,col="red")
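Regarding the Log Pearson 3 densities in the question's edit: with a negative b, the expression (log(x)-g)/b is positive only for 0 < x < exp(g). Outside that interval dlp3 raises a negative number to a fractional power and returns NaN, which makes integrate() fail. A possible sketch, under the assumption that the density is meant to be zero outside that interval: return 0 there and restrict the inner integral to the region where both factors can be non-zero. dlp3_safe, up_m, up_s and p_36 are hypothetical names, and the rewrite through dgamma() is algebraically the same formula as in the question.

```r
# Same formula as dlp3, written via dgamma and set to 0 outside the support.
# With u = (log(x) - g) / b:  dlp3 = dgamma(u, shape = a) / (x * |b|)
dlp3_safe <- function(x, a, b, g) {
  u   <- suppressWarnings((log(x) - g) / b)   # log() warns for x < 0
  ok  <- is.finite(u) & u > 0
  out <- numeric(length(x))
  out[ok] <- dgamma(u[ok], shape = a) / (x[ok] * abs(b))
  out
}

f.m <- function(x) dlp3_safe(x, 3.2594, -0.18218, 0.53441)
f.s <- function(x) dlp3_safe(x, 9.5645, -0.07676, 1.184)

# Upper ends of the two supports (valid because both b values are negative)
up_m <- exp(0.53441)
up_s <- exp(1.184)

# Convolution density of the sum; the inner integral only covers x values
# where both f.m(x) and f.s(z - x) can be non-zero
f.t <- function(z) {
  lo <- max(0, z - up_s)
  hi <- min(up_m, z)
  if (lo >= hi) return(0)
  integrate(function(x) f.s(z - x) * f.m(x), lo, hi)$value
}
f.t <- Vectorize(f.t)

# Probability that the sum is at most 3.6, via a simple Riemann sum on a grid
z_grid <- seq(0, 3.6, by = 0.01)
p_36 <- sum(f.t(z_grid)) * 0.01
```

The outer integral is done here with a plain Riemann sum rather than a second integrate() call, to avoid stacking two adaptive integrations on top of each other; that is a design choice of this sketch, not a requirement.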
I was having trouble getting integrate() to work for different density parameters, so I came up with an alternative to @jlhoward's answer using a Riemann sum approximation:
set.seed(1)
#densities to be convolved. could also put these in the function below
d1 <- function(x) dnorm(x,1,0.5) #
d2 <- function(y) dlnorm(y,1.5, 0.75)
#Riemann approximation of convolution
conv <- function(t, a, b, d) { #a to b needs to cover the range of densities above. d needs to be small for accurate approx.
z <- NA
x <- seq(a, b, d)
for (i in 1:length(t)){
print(i)
z[i] <- sum(d1(x)*d2(t[i]-x)*d)
}
return(z)
}
#check against sampled convolution
X <- rnorm(1000, 1, 0.5)
Y <- rlnorm(1000, 1.5, 0.75)
Z <- X + Y
t <- seq(0, 50, 0.05) #range to evaluate t, smaller increment -> smoother curve
hist(Z, breaks = 50, freq = F, xlim = c(0,30))
lines(t, conv(t, -100, 100, 0.1), type = "s", col = "red")
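The same Riemann sum can be written without an explicit loop, and two quick checks help to judge whether the integration range and step are adequate. This is a sketch under the same densities as above; conv_grid, z_grid, total_mass and mean_z are hypothetical names, and the grid limits are assumptions chosen to cover almost all of the mass of both densities.

```r
d1 <- function(x) dnorm(x, 1, 0.5)
d2 <- function(y) dlnorm(y, 1.5, 0.75)

# Riemann-sum convolution without an explicit loop:
# rows of the matrix index the evaluation points, columns the integration grid x
conv_grid <- function(z, a, b, h) {
  x <- seq(a, b, by = h)
  d2_mat <- outer(z, x, function(zz, xx) d2(zz - xx))
  as.vector(d2_mat %*% d1(x)) * h
}

step <- 0.05
z_grid <- seq(-5, 100, by = step)
fz <- conv_grid(z_grid, a = -4, b = 6, h = 0.01)

# Sanity checks: total mass should be close to 1, and the mean close to
# E[X] + E[Y] = 1 + exp(1.5 + 0.75^2 / 2)
total_mass <- sum(fz) * step
mean_z <- sum(z_grid * fz) * step
```

If total_mass is noticeably below 1, the range a to b or the evaluation grid is probably too narrow for the densities involved.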
