How to extrapolate beyond the x points passed to `ksmooth`? - r

I have a kernel function like so:
x <- 1:100
y <- rnorm(100, mean=(x/2000)^2)
plot(x,y)
kernel <- ksmooth(x,y, kernel="normal", bandwidth=10)
print(kernel$y)
If I try to predict at a point outside the range of the x values, it gives me NA, because it is attempting to extrapolate beyond the data:
x <- 1:100
y <- rnorm(100, mean=(x/2000)^2)
plot(x,y)
kernel <- ksmooth(x,y, kernel="normal", bandwidth=10, x.points=c(130))
print(kernel$y)
> print(kernel$y)
[1] NA
Even when I change range.x it doesn't budge:
x <- 1:100
y <- rnorm(100, mean=(x/2000)^2)
plot(x,y)
kernel <- ksmooth(x,y, kernel="normal", bandwidth=10, range.x=c(1,200) , x.points=c(130))
print(kernel$y)
> print(kernel$y)
[1] NA
How do I get the ksmooth function to extrapolate beyond the data? I know this is a bad idea in theory, but in practice this issue comes up all the time.

To answer your side question: the code of ksmooth shows that range.x is only used when x.points is not provided, which explains why changing it has no effect. Let's look at the code of ksmooth:
function (x, y, kernel = c("box", "normal"), bandwidth = 0.5,
range.x = range(x), n.points = max(100L, length(x)), x.points)
{
if (missing(y) || is.null(y))
stop("numeric y must be supplied.\nFor density estimation use density()")
kernel <- match.arg(kernel)
krn <- switch(kernel, box = 1L, normal = 2L)
x.points <- if (missing(x.points))
seq.int(range.x[1L], range.x[2L], length.out = n.points)
else {
n.points <- length(x.points)
sort(x.points)
}
ord <- order(x)
.Call(C_ksmooth, x[ord], y[ord], x.points, krn, bandwidth)
}
From this we see that we need to not provide x.points to make sure that range.x is used. If you run:
x <- 1:100
y <- rnorm(100, mean=(x/2000)^2)
plot(x,y)
kernel <- ksmooth(x,y, kernel="normal", bandwidth=10, range.x=c(1,200))
plot(kernel$x, kernel$y)
Now you'll see that the smooth is evaluated beyond 100 (although not up to 200: the estimate is NA wherever no data point falls within the kernel's reach). Increasing the bandwidth parameter lets you get even further away from 100.
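The NA at x = 130 can also be read as a bandwidth effect rather than an x.points effect. The sketch below assumes, from my reading of the C source behind ksmooth, that the normal kernel is truncated at roughly 1.5 times the bandwidth, so a target point only gets an estimate if some data lie within that distance. Treat the exact cutoff as an assumption and check the NA pattern on your own data. The bandwidth of 40 is an arbitrary illustrative choice.

```r
set.seed(42)  # only to make the illustration reproducible
x <- 1:100
y <- rnorm(100, mean = (x / 2000)^2)

# Bandwidth 10: 110 is still within reach of the data, 130 is not
narrow <- ksmooth(x, y, kernel = "normal", bandwidth = 10, x.points = c(110, 130))

# Bandwidth 40 (assumed large enough): both points get an estimate
wide <- ksmooth(x, y, kernel = "normal", bandwidth = 40, x.points = c(110, 130))

print(narrow$y)
print(wide$y)
```

Keep in mind that such an estimate is just a weighted mean of the nearest observed y values, so it is a flat continuation of the edge of the data rather than a trend extrapolation.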

Related

Time varying parameter-matrix in deSolve R

I have been struggling with this for a long time. I have a logistic growth function where the growth parameter
r is a matrix. The model is constructed so that it outputs two state variables, N1 and N2.
I would like to be able to change the r parameter over time. When time < 50 I would like
r = r1, where
r1=matrix(c(
2,3),
nrow=1, ncol=2)
When time >= 50 I would like r = r2, where
r2=matrix(c(
1,2),
nrow=1, ncol=2)
Here is my function. Any help is highly appreciated.
rm(list = ls())
library(deSolve)
model <- function(time, y, params) {
with(as.list(c(y,params)),{
N = y[paste("N",1:2, sep = "")]
dN <- r*N*(1-N/K)
return(list(c(dN)))
})
}
r=matrix(c(
4,5),
nrow=1, ncol=2)
K=100
params <- list(r = r, K = K) # named, so that with() finds r and K in params
y<- c(N1=0.1, N2=0.2)
times <- seq(0,100,1)
out <- ode(y, times, model, params)
plot(out)
Ideally I would like something like this, but it does not work:
model <- function(time, y, params) {
with(as.list(c(y,params)),{
N = y[paste("N",1:2, sep = "")]
r = ifelse(times < 10, matrix(c(1,3),nrow=1, ncol=2),
ifelse(times > 10, matrix(c(1,4),nrow=1, ncol=2), matrix(c(1,2),nrow=1, ncol=2)))
print(r)
dN <- r*N*(1-N/K)
return(list(c(dN)))
})
}
Thank you for your time.
Here is a generic approach that uses an extended version of the approx function. Note also some further simplifications of the model function and the additional plot of the parameter values.
Edit: changed according to the suggestion of Lewis Carter so that the parameter changes at t=3 and the effect can be seen.
library(deSolve) # provides ode
library(simecol) # contains approxTime, a vector version of approx
model <- function(time, N, params) {
r <- approxTime(params$signal, time, rule = 2, f=0, method="constant")[-1]
K <- params$K
dN <- r*N*(1-N/K)
return(list(c(dN), r))
}
signal <- matrix(
# time, r[1, 2],
c( 0, 2, 3,
3, 1, 2,
100, 1, 2), ncol=3, byrow=TRUE
)
## test of the interpolation
approxTime(signal, c(1, 2.9, 3, 100), rule = 2, f=0, method="constant")
params <- list(signal = signal, K = 100)
y <- c(N1=0.1, N2=0.2)
times <- seq(0, 10, 0.1)
out <- ode(y, times, model, params)
plot(out)
For a small number of state variables, as in the example, separate signals built with approxfun from package stats look less generic but may be slightly faster.
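The approxfun variant mentioned above could look roughly like this. It is a minimal sketch: the names r1_fun and r2_fun are made up for illustration, and the breakpoints simply mirror the signal matrix of the example. The ode call is guarded so that the rest runs even without deSolve installed.

```r
# One step function per growth rate; rule = 2 holds the end values outside 0..100
r1_fun <- approxfun(c(0, 3, 100), c(2, 1, 1), method = "constant", rule = 2)
r2_fun <- approxfun(c(0, 3, 100), c(3, 2, 2), method = "constant", rule = 2)

model <- function(time, N, params) {
  r <- c(r1_fun(time), r2_fun(time))   # current growth rates
  dN <- r * N * (1 - N / params$K)     # logistic growth
  list(c(dN), r = r)                   # also return r so it can be plotted
}

y <- c(N1 = 0.1, N2 = 0.2)
times <- seq(0, 10, 0.1)
params <- list(K = 100)

if (requireNamespace("deSolve", quietly = TRUE)) {
  out <- deSolve::ode(y, times, model, params)
  plot(out)
}
```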
As a further improvement, one may consider replacing the "hard" transitions with smoother ones. These can then be formulated directly as a function, without the need for approx, approxfun or approxTime.
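One way to write such a smooth transition is a logistic blend between the two parameter vectors. This is only a sketch: the function name r_smooth, the switch time of 3 and the steepness of 5 are illustrative assumptions, not part of the original answer.

```r
# Blend from r_before to r_after around t_switch; larger steepness = sharper switch
r_smooth <- function(time, r_before = c(2, 3), r_after = c(1, 2),
                     t_switch = 3, steepness = 5) {
  w <- plogis(steepness * (time - t_switch))  # goes from 0 to 1 around t_switch
  (1 - w) * r_before + w * r_after
}

model <- function(time, N, params) {
  r <- r_smooth(time)
  dN <- r * N * (1 - N / params$K)
  list(c(dN), r = r)
}

y <- c(N1 = 0.1, N2 = 0.2)
times <- seq(0, 10, 0.1)
params <- list(K = 100)

if (requireNamespace("deSolve", quietly = TRUE)) {
  out <- deSolve::ode(y, times, model, params)
  plot(out)
}
```

A continuous right-hand side of this kind also tends to be easier on adaptive solvers than an abrupt jump.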
Edit 2:
Package simecol imports deSolve, and we need only a small function from it. So instead of loading simecol it is also possible to include the approxTime function explicitly in the code. The conversion from data frame to matrix improves performance, but a matrix is preferred anyway in such cases.
approxTime <- function(x, xout, ...) {
if (is.data.frame(x)) {x <- as.matrix(x); wasdf <- TRUE} else wasdf <- FALSE
if (!is.matrix(x)) stop("x must be a matrix or data frame")
m <- ncol(x)
y <- matrix(0, nrow=length(xout), ncol=m)
y[,1] <- xout
for (i in 2:m) {
y[,i] <- as.vector(approx(x[,1], x[,i], xout, ...)$y)
}
if (wasdf) y <- as.data.frame(y)
names(y) <- dimnames(x)[[2]]
y
}
If you want to pass a matrix parameter, pass a list of parameters; you can then modify it inside the model once your time limit is exceeded (in the example below you don't even have to pass the r matrix to the model function):
library(deSolve)
model <- function(time, y, params) {
with(as.list(c(y,params)),{
if(time < 3) r = matrix(c(2,3), nrow = 1, ncol = 2)
else r = matrix(c(1,3), nrow = 1, ncol = 2)
N = y[paste("N",1:2, sep = "")]
dN <- r*N*(1-N/K)
return(list(c(dN)))
})
}
y <- c(N1=0.1, N2=0.2)
params <- list(r = matrix(c(0,0), nrow = 1, ncol = 2), K=100)
times <- seq(0,10,0.1)
out <- ode(y, times, model, params)
plot(out)
You can see examples of this for instance with Delay Differential Equations ?dede

How to write a function to get the x-value which yield the maximum Y in a loess smooth?

Can anyone help me write a function that returns the x-value at which a loess smooth reaches its maximum? I tried the following, but it seems wrong. What I want to get is the x-value that yields the maximum y of the loess fit. Thanks in advance.
myFmsy<-function(x,y){
model <- loess(y ~ x,span = 0.4)
return(x[which(y==max(y))])
}
The problem is that you are fitting a model and then not using it at all.
The return value of loess is a list (of class "loess") with a member fitted. This is the vector where you want to find the maximum.
myFmsy <- function(x, y){
model <- loess(y ~ x,span = 0.4)
yfit <- model$fitted
x[which(yfit == max(yfit))]
}
set.seed(6589) # Make the results reproducible
x <- rnorm(100)
y <- rnorm(100)
myFmsy(x, y)
#[1] -0.938093
There might be cases where, due to floating-point issues, several fitted values are within a given tolerance of the maximum. The following function checks this and also returns the fitted y and the index ix where it can be found.
myFmsy2 <- function(x, y, tol = .Machine$double.eps^0.5){
model <- loess(y ~ x,span = 0.4)
yfit <- model$fitted
inx <- which(abs(yfit - max(yfit)) < tol)
list(x = x[inx], y.fitted = yfit[inx], ix = inx)
}
myFmsy2(x, y)
#$`x`
#[1] -0.938093
#
#$y.fitted
#[1] 0.5046313
#
#$ix
#[1] 48
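The functions above look for the maximum only at the observed x values. If the peak of the smooth curve between data points is wanted, one option is to evaluate the fit on a fine grid with predict. The sketch below is an addition under that assumption; the function name x_at_max_smooth and the grid size of 1000 are arbitrary.

```r
x_at_max_smooth <- function(x, y, span = 0.4, n_grid = 1000) {
  model <- loess(y ~ x, span = span)
  # Stay inside the range of x: loess does not extrapolate by default
  grid <- seq(min(x), max(x), length.out = n_grid)
  yhat <- predict(model, newdata = data.frame(x = grid))
  i <- which.max(yhat)  # NA values, if any, are ignored
  list(x = grid[i], y.fitted = yhat[i])
}

set.seed(6589)
x <- rnorm(100)
y <- rnorm(100)
res <- x_at_max_smooth(x, y)
print(res)
```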

Lebesgue–Stieltjes Integration in R

I calculated a distribution function numerically. First I plotted the function. It looks wrong around 0.05. Is this due to rounding errors?
Second, I need to find the corresponding first and second non-central moments. That is,
E[X] = int x dF(x)
E[X^2] = int x^2 dF(x)
Can I do this type of Lebesgue–Stieltjes integration in R? Is there a built-in method? If not, which package offers such a calculation?
Alternatively, I guess I could compute the numerical derivative f(x) of F(x) and then carry out the usual integration,
E[X] = int x f(x) dx
But I remember from somewhere that numerical differentiation is much less stable. Which is the right way?
FYI my functions are attached below.
library(mvtnorm)
library(matrixcalc)
VAR <- matrix(c(1.043856e-03, 5.044899e-04, 3.239951e-04, 2.330992e-04, 0.0001779055, 0.0001403866, 0.0001127118, 9.074962e-05, 7.157144e-05,
5.044899e-04, 5.485889e-04, 3.523165e-04, 2.534751e-04, 0.0001934568, 0.0001526582, 0.0001225642, 9.868232e-05, 7.782773e-05,
3.239951e-04, 3.523165e-04, 3.878844e-04, 2.790645e-04, 0.0002129870, 0.0001680697, 0.0001349376, 1.086447e-04, 8.568475e-05,
2.330992e-04, 2.534751e-04, 2.790645e-04, 3.123147e-04, 0.0002383642, 0.0001880950, 0.0001510153, 1.215896e-04, 9.589399e-05,
1.779055e-04, 1.934568e-04, 2.129870e-04, 2.383642e-04, 0.0002728857, 0.0002153361, 0.0001728863, 1.391990e-04, 1.097820e-04,
1.403866e-04, 1.526582e-04, 1.680697e-04, 1.880950e-04, 0.0002153361, 0.0002548851, 0.0002046389, 1.647645e-04, 1.299447e-04,
1.127118e-04, 1.225642e-04, 1.349376e-04, 1.510153e-04, 0.0001728863, 0.0002046389, 0.0002555744, 2.057751e-04, 1.622886e-04,
9.074962e-05, 9.868232e-05, 1.086447e-04, 1.215896e-04, 0.0001391990, 0.0001647645, 0.0002057751, 2.840218e-04, 2.239993e-04,
7.157144e-05, 7.782773e-05, 8.568475e-05, 9.589399e-05, 0.0001097820, 0.0001299447, 0.0001622886, 2.239993e-04, 3.974881e-04),
nrow=9, ncol=9, byrow=TRUE)
is.symmetric.matrix(VAR)
is.positive.definite(VAR)
kappa(VAR)
CDF <- function(x){
summand <- rep(0, 5)
for(j in 5:9){
choice <- combn(9, j)
for(i in 1:ncol(choice)){
ub <- rep(Inf, 9)
ub[choice[, i]] <- x
summand[j-4] <- summand[j-4] + as.numeric(pmvnorm(lower=rep(-Inf, 9), upper=ub, sigma=VAR))
}
}
l <- c(1, -5, 15, -35, 70)
as.numeric(t(l)%*%summand)
}
CDF <- Vectorize(CDF)
x <- seq(-0.1, 0.1, by=0.01)
y <- CDF(x)
plot(x, y, type="l", lwd=2)
I initially plotted the result I got from taking first differences from numCDF <- CDF( seq(-10, 10, length=100) ), but that was rather disappointing, since only one value was different than 0. So I restricted the focus to:
numCDF <- CDF( seq(-.10, .10, length=100) )
plot( diff(numCDF) )
Simply plotting the values of numCDF produces similar chaotic results in the region where you expressed concern.
So I think maybe your function is not sufficiently well-behaved to yield good results.
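On the moment question itself: once a CDF is well behaved, the Stieltjes integrals can be approximated directly from increments of F, without forming a density. The sketch below is an illustration on a known CDF, not on the CDF from the question; the helper name stieltjes_moment and the grid settings are assumptions. The test distribution is normal with mean 1 and sd 0.5, for which E[X] = 1 and E[X^2] = 1.25.

```r
# Approximates int x^power dF(x) by summing midpoint^power * increment of F
stieltjes_moment <- function(cdf, lower, upper, n = 2000, power = 1) {
  grid <- seq(lower, upper, length.out = n + 1)
  mid <- (head(grid, -1) + tail(grid, -1)) / 2
  dF <- diff(cdf(grid))
  if (any(dF < 0)) warning("CDF is not monotone on this grid")
  sum(mid^power * dF)
}

cdf_known <- function(x) pnorm(x, mean = 1, sd = 0.5)
m1 <- stieltjes_moment(cdf_known, -5, 7, power = 1)
m2 <- stieltjes_moment(cdf_known, -5, 7, power = 2)
print(c(m1, m2))
```

The monotonicity warning is the relevant check for a numerically computed CDF: negative increments, like the ones visible in plot(diff(numCDF)), would make any moment estimate unreliable.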

R plot heatmap of matrix with superposed line

I have a matrix whose values I want to plot. Using the image function it looks like this.
How can I plot a line over the image?
(In my case, I want to plot a line over the maximum values along the x axis.)
Edit
The image and the line I want to plot over are the output from the Bayesian Online Changepoint detection. Since it is quite short I'll share the whole code. The plotting part is at the end:
# Bayesian Online Changepoint Detection
# Adams, MacKay 2007
# http://hips.seas.harvard.edu/content/bayesian-online-changepoint-detection
#######################################
# Other python and matlab implementations
# https://github.com/JackKelly/bayesianchangepoint
# http://hips.seas.harvard.edu/content/bayesian-online-changepoint-detection
# http://www.inference.phy.cam.ac.uk/rpa23/cp/gaussdemo.m
# http://www.inference.phy.cam.ac.uk/rpa23/cp/studentpdf.m
# http://www.inference.phy.cam.ac.uk/rpa23/cp/constant_hazard.m
# Even more commented, but different paper:
# https://github.com/davyfeng/ipdata/blob/master/csv/bocpd/core/bocpd.m
# Generate data
x1 <- rnorm(100, 10.5, 0.1)
x2 <- rnorm(100, 1, 0.1)
x3 <- rnorm(100, -10, 0.1)
x4 <- rnorm(100, -1, 0.1)
x5 <- rnorm(100, 5, 0.1)
x6 <- rnorm(30, 1, 0.1)
x7 <- rnorm(100, 8, 0.1)
x <- c(x1,x2,x3,x4,x5, x6,x7)
##############
# Algorithm
##############
# Prepare the scaled and shifted student-t
dt.scaled.shifted <- function(x, m, s, df) stats::dt((x-m)/s, df)/s
# Prepare the Hazard function
hazard <-function(x, lambda=200){rep(1/lambda, length(x))}
L <- length(x)
R <- matrix(rep(0,(L+1)*(L+1)), L+1, L+1)
R[1,1] <- 1 # for t=1 we are sure that p(r=1)=1
mu0 <- 0; kappa0 <- 1; alpha0 <-1; beta0 <- 1;
muT <- mu0
kappaT <- kappa0
alphaT <- alpha0
betaT <- beta0
maxes <- rep(0, L)
# Process data as they come in
for(t in 1:L){
# Evaluate predictive probability
# the scale of the predictive Student-t is sqrt(beta*(kappa+1)/(alpha*kappa))
predprobs <- dt.scaled.shifted(x[t], muT, sqrt(betaT*(kappaT+1)/(alphaT*kappaT)), 2*alphaT)
H <- hazard(x[1:t])
# Calculate growth probabilities
R[2:(t+1), t+1] <- R[1:t,t]*predprobs*(1-H)
# Calculate changepoint (reset) probabilities
R[1,t+1] <- sum(R[1:t,t]*predprobs*H)
# Renormalize
R[,t+1] <- R[,t+1] / sum(R[,t+1])
# Update parameters for each possible run length
# keep the past ones since they will be used iteratively
muT0 <- c(mu0, (kappaT*muT + x[t])/(kappaT+1))
kappaT0 <- c(kappa0,kappaT+1)
alphaT0 <- c(alpha0, alphaT + 0.5)
betaT0 <- c(beta0, betaT + (kappaT * (x[t]-muT)^2)/(2*(kappaT+1)))
muT <- muT0
kappaT <- kappaT0
alphaT <- alphaT0
betaT <- betaT0
# Store the maximum, to plot later
maxes[t] <- which.max(R[,t])
}
# Plot results
par(mfrow=c(2,1))
plot(x, type='l')
image((-t(log(R))), col = grey(seq(0,1,length=256)), axes=T)
par(new=T)
plot(1:(dim(R)[1]-1), maxes,type='l', col="red")
On the top there is the original data. On the bottom, the probability of a current run to have length y. The red line on the bottom should fit the dark shades.
(To be deleted. It does not work. I am leaving it temporarily to keep the comments.)
I got it, I thought I had already tried par(new=T) but obviously I hadn't:
m <- matrix(rnorm(100,1,1),50,50)
image(m, col = grey(seq(0,1,length=256)))
par(new=T)
plot(seq(0,1, length=50), type='l', col="red", lwd=5)
Quick example simulating the whole process:
data <- vector()
for(i in 1:50){
data <- rbind(data, dpois(1:50, i^1.2))
}
maxes <- apply(data, 1, which.max)
image(-data, col = grey(seq(0,1,length=256)))
par(new=T)
plot(1:dim(data)[1], c(maxes),type='l')
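An alternative that avoids par(new=T) is to give image explicit x and y coordinates and then add the line with lines, so that both layers share one coordinate system. The sketch below reuses the simulated Poisson data from the quick example; the axis labels are made up for illustration.

```r
# Row i holds a Poisson pmf on 1..50 whose mean grows with i
z <- t(sapply(1:50, function(i) dpois(1:50, i^1.2)))
maxes <- apply(z, 1, which.max)  # column index of the maximum in each row

xs <- seq_len(nrow(z))
ys <- seq_len(ncol(z))

# image() draws z[i, j] at (xs[i], ys[j]), so the axes are in index units
image(xs, ys, -z, col = grey(seq(0, 1, length.out = 256)),
      xlab = "row index", ylab = "column index")
lines(xs, maxes, col = "red", lwd = 2)
```

With par(new=T) the second plot sets up its own axis ranges, which is why the line and the image can end up misaligned. Here the line is drawn in the image's own coordinates.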

Adding two random variables via convolution in R

I would like to compute the convolution of two probability distributions in R and I need some help. For the sake of simplicity, let's say I have a variable x that is normally distributed with mean = 1.0 and stdev = 0.5, and y that is log-normally distributed with mean = 1.5 and stdev = 0.75. I want to determine z = x + y. I understand that the distribution of z is not known a priori.
As an aside, the real-world example I am working with requires the addition of two random variables that follow a number of different distributions.
Does anyone know how to add two random variables by convolving the probability density functions of x and y?
I have tried generating n normally distributed random values (with above parameters) and adding them to n log-normally distributed random values. However, I wish to know if I can use the convolution method instead. Any help would be greatly appreciated.
EDIT
Thank you for these answers. I define a pdf, and try to do the convolution integral, but R complains on the integration step. My pdfs are Log Pearson 3 and are as follows
dlp3 <- function(x, a, b, g) {
p1 <- 1/(x*abs(b) * gamma(a))
p2 <- ((log(x)-g)/b)^(a-1)
p3 <- exp(-1* (log(x)-g) / b)
d <- p1 * p2 * p3
return(d)
}
f.m <- function(x) dlp3(x,3.2594,-0.18218,0.53441)
f.s <- function(x) dlp3(x,9.5645,-0.07676,1.184)
f.t <- function(z) integrate(function(x,z) f.s(z-x)*f.m(x),-Inf,Inf,z)$value
f.t <- Vectorize(f.t)
integrate(f.t, lower = 0, upper = 3.6)
R complains at the last step since the f.t function is bounded and my integration limits are probably not correct. Any ideas on how to solve this?
Here is one way.
f.X <- function(x) dnorm(x,1,0.5) # normal (mu=1, sigma=0.5)
f.Y <- function(y) dlnorm(y,1.5, 0.75) # log-normal (meanlog=1.5, sdlog=0.75)
# convolution integral
f.Z <- function(z) integrate(function(x,z) f.Y(z-x)*f.X(x),-Inf,Inf,z)$value
f.Z <- Vectorize(f.Z) # need to vectorize the resulting fn.
set.seed(1) # for reproducible example
X <- rnorm(1000,1,0.5)
Y <- rlnorm(1000,1.5,0.75)
Z <- X + Y
# compare the methods
hist(Z,freq=F,breaks=50, xlim=c(0,30))
z <- seq(0,50,0.01)
lines(z,f.Z(z),lty=2,col="red")
Same thing using package distr.
library(distr)
N <- Norm(mean=1, sd=0.5) # N is signature for normal dist
L <- Lnorm(meanlog=1.5,sdlog=0.75) # same for log-normal
conv <- convpow(L+N,1) # object of class AbscontDistribution
f.Z <- d(conv) # density function
hist(Z,freq=F,breaks=50, xlim=c(0,30))
z <- seq(0,50,0.01)
lines(z,f.Z(z),lty=2,col="red")
I was having trouble getting integrate() to work for different density parameters, so I came up with an alternative to @jlhoward's answer using a Riemann-sum approximation:
set.seed(1)
#densities to be convolved. could also put these in the function below
d1 <- function(x) dnorm(x,1,0.5) #
d2 <- function(y) dlnorm(y,1.5, 0.75)
#Riemann approximation of convolution
conv <- function(t, a, b, d) { # a to b needs to cover the range of the densities above; d needs to be small for an accurate approximation
z <- numeric(length(t))
x <- seq(a, b, d)
for (i in seq_along(t)) {
z[i] <- sum(d1(x)*d2(t[i]-x)*d)
}
return(z)
}
#check against sampled convolution
X <- rnorm(1000, 1, 0.5)
Y <- rlnorm(1000, 1.5, 0.75)
Z <- X + Y
t <- seq(0, 50, 0.05) #range to evaluate t, smaller increment -> smoother curve
hist(Z, breaks = 50, freq = F, xlim = c(0,30))
lines(t, conv(t, -100, 100, 0.1), type = "s", col = "red")
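For the Log Pearson 3 densities in the question's EDIT, a likely cause of the integrate error is that dlp3 returns NaN outside its support (log of a negative x, or a negative base raised to a non-integer power), while the convolution is integrated from -Inf to Inf. The sketch below is one possible workaround, not the asker's code: it rewrites the same density via dgamma, returns 0 outside the support, and restricts the inner integral to the range where both factors are positive. It assumes b < 0 as in the question, so that each density lives on (0, exp(g)). The names upper.m, upper.s, total and prob are illustrative.

```r
# Same density as in the question, via dgamma, with 0 outside the support
dlp3 <- function(x, a, b, g) {
  d <- numeric(length(x))
  u <- suppressWarnings((log(x) - g) / b)  # log() of a negative x gives NaN
  ok <- is.finite(u) & u > 0
  d[ok] <- dgamma(u[ok], shape = a) / (x[ok] * abs(b))
  d
}
f.m <- function(x) dlp3(x, 3.2594, -0.18218, 0.53441)
f.s <- function(x) dlp3(x, 9.5645, -0.07676, 1.184)

# Upper ends of the two supports (valid because b < 0)
upper.m <- exp(0.53441)
upper.s <- exp(1.184)

# Convolution, integrated only where f.m(x) and f.s(z - x) can both be positive
f.t <- function(z) {
  lo <- max(0, z - upper.s)
  hi <- min(z, upper.m)
  if (lo >= hi) return(0)
  integrate(function(x) f.s(z - x) * f.m(x), lo, hi)$value
}
f.t <- Vectorize(f.t)

total <- integrate(f.t, lower = 0, upper = upper.m + upper.s)$value  # should be close to 1
prob <- integrate(f.t, lower = 0, upper = 3.6)$value                  # P(Z <= 3.6)
print(c(total, prob))
```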
