How to customise a kernel function in R density package? - kernel-density

I want to define a new kernel function, called Epanech, and use it in:
fit1 <- density(x = Data, bw = "nrd0", kernel = "Epanech")
Awaiting your response; please accept my best regards.
Thank you very much.
The code of my new kernel function (for example Epanech) is:
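x <- faithful$eruptions
xgrid <- seq(-1, 8, 0.1)
Epanech <- function(xvals, obs, h) {
  # density() treats bw as the kernel's standard deviation;
  # the Epanechnikov support half-width is then h * sqrt(5)
  h <- h * sqrt(5)
  dens <- sapply(xvals, function(x) {
    u <- abs(x - obs) / h
    u <- ifelse(u > 1, 1, u)   # points outside the support contribute 0
    sum(0.75 * (1 - u^2))
  })
  # normalise so that the estimate integrates to 1 over the grid
  dens / sum(dens * mean(diff(xvals)))
}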
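For context, stats::density() only accepts its built-in kernel names for the kernel argument ("gaussian", "epanechnikov", "rectangular", "triangular", "biweight", "cosine", "optcosine"), so a user-defined function cannot be passed there. A minimal sketch of two workarounds, under the assumption that the goal is an Epanechnikov estimate with the "nrd0" bandwidth: call the built-in kernel, or evaluate a hand-written estimator directly on a grid. The helper name epan_kde is hypothetical, not part of any package.

```r
x <- faithful$eruptions
xgrid <- seq(-1, 8, 0.1)

# Same bandwidth rule that density(bw = "nrd0") uses
h <- bw.nrd0(x)

# Hypothetical helper: vectorised Epanechnikov kernel density estimate.
# h is the kernel standard deviation, so the support half-width is h * sqrt(5).
epan_kde <- function(xvals, obs, h) {
  half_width <- h * sqrt(5)
  u <- outer(xvals, obs, "-") / half_width
  k <- ifelse(abs(u) <= 1, 0.75 * (1 - u^2), 0)
  rowSums(k) / (length(obs) * half_width)
}

dens <- epan_kde(xgrid, x, h)

# Built-in alternative: no custom kernel function needed
fit <- density(x, bw = "nrd0", kernel = "epanechnikov")
```

The two estimates can be compared visually with plot(fit) followed by lines(xgrid, dens, col = "red").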

Related

Error message about the plot in regression model

I have an R program for a regression that gives me an error message I do not understand. The regression model takes heat input data (Q_htg) and the corresponding temperature data (T_amb) as input and builds a linear regression on those two variables. Afterwards I want to use the fitted regression model to predict some outputs. Here is the code:
linearPowerScaling2.function <-
function(Dataset,
numberOfDaysForAggregation,
normOutsideTemperature) {
heatingPower <- Dataset$Q_htg
outSideTemperature <- Dataset$T_amb
aggregationLevel <- numberOfDaysForAggregation * 1440
index <- 0
meanValuesOutsideTemperature <-
vector(, length(outSideTemperature) / aggregationLevel)
for (i in seq(1, length(outSideTemperature), aggregationLevel)) {
sum <- 0
for (j in seq(i, i + aggregationLevel - 1, 1)) {
sum <- sum + outSideTemperature[j]
}
index <- index + 1
meanValuesOutsideTemperature[index] <- sum / aggregationLevel
}
index <- 0
meanValuesHeatingDemand <-
vector(, length(heatingPower) / aggregationLevel)
for (i in seq(1, length(heatingPower), aggregationLevel)) {
sum <- 0
for (j in seq(i, i + aggregationLevel - 1, 1)) {
sum <- sum + heatingPower[j]
}
index <- index + 1
meanValuesHeatingDemand[index] <- sum / aggregationLevel
}
linearModel <-
lm(meanValuesHeatingDemand ~ meanValuesOutsideTemperature)
abline(linearModel, col = "red")
pred <- predict(linearModel, data.frame(meanValuesOutsideTemperature = c(normOutsideTemperature)))
List<-list(meanValuesHeatingDemand, meanValuesOutsideTemperature)
List2 <- vector("list", length(heatingPower)/aggregationLevel)
for (i in seq(1, length(meanValuesHeatingDemand),1)){
List2 [[i]]<-c(meanValuesHeatingDemand[i], meanValuesOutsideTemperature[i])
}
List3<-List2[order(sapply(List2, function(x) x[1], simplify=TRUE), decreasing=FALSE)]
firstTemperatureWithHeatingDemand<-0
firstHeatingDemand<-0
for (i in seq(1, length(List3), 1)) {
if(List3[[i]][1]>0) {
firstTemperatureWithHeatingDemand<-List3[[i]][2]
firstHeatingDemand<-List3[[i]][1]
break}
}
regression2ValuesX <- vector(, 5)
regression2ValuesY <- vector(, 5)
regression2ValuesX [1] <- firstTemperatureWithHeatingDemand
regression2ValuesY [1] <-firstHeatingDemand
List3<-List2[order(sapply(List2, function(x) x[1], simplify=TRUE), decreasing=TRUE)]
for (i in seq(1, length(regression2ValuesX) - 1, 1)) {
regression2ValuesX[i + 1]<-List3[[i]][2]
regression2ValuesY[i + 1]<-List3[[i]][1]
}
plot(regression2ValuesX, regression2ValuesY)
linearModel2 <-
lm(regression2ValuesY ~ regression2ValuesX)
abline(linearModel2, col = "blue")
pred <- predict(linearModel2, data.frame(regression2ValuesX = c(normOutsideTemperature)))
paste("Predicted heating demand:", round(pred))
}
When I run with the command
linearPowerScaling2.function(data_heat_test, 1, -12)
I get the error message:
Error in int_abline(a = a, b = b, h = h, v = v, untf = untf, ...) :
plot.new has not been called yet
3. int_abline(a = a, b = b, h = h, v = v, untf = untf, ...)
2. abline(linearModel, col = "red") at LinearPowerScaling2_Function.R#33
1. linearPowerScaling2.function(data_heat_test, 1, -12)
The data itself should be okay. Can anyone tell me, what the problem is?
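Without a minimal reproducible example it's hard to test whether this solves it, but the error message tells you that you are calling abline() before calling plot(): abline() can only add a line to an existing plot.
That's exactly what happens on line 33 of your script, where abline(linearModel, col = "red") runs before the first plot() call.
Hope this helps.
See the guidance on how to make a minimal reproducible example.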
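A minimal sketch of the corrected call order, using made-up data (the vectors temp and demand are assumptions for illustration, not the asker's data set): draw the scatter plot first, then add the regression line.

```r
set.seed(1)

# Made-up daily mean temperatures and heating demand (illustration only)
temp <- seq(-10, 15, length.out = 20)
demand <- 50 - 2 * temp + rnorm(20)

fit <- lm(demand ~ temp)

# plot() opens the plot; abline() can then add the fitted line to it
plot(temp, demand)
abline(fit, col = "red")

# Predict at a norm outside temperature of -12
pred <- predict(fit, data.frame(temp = -12))
```

In the function from the question, the equivalent change would be to call plot(meanValuesOutsideTemperature, meanValuesHeatingDemand) before the first abline().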

Solving differential equation in R - deSolve

I have the following equation:
dN/dt = N(t)G(t)
where G(t) is given by the equation dG/dt = a * G.
How do I solve this in R, using the ode function from the deSolve package?
As dario already mentioned, the question lacks some details. Nevertheless, let's try an answer. If we assume that a < 0, the model looks like the ode formulation of Gompertz growth:
dN/dt = N * G
dG/dt = a * G
This can then be solved as:
library(deSolve)

# Right-hand side of the system: y holds the states N and G, p the parameter a
model <- function(t, y, p) {
  with(as.list(c(y, p)), {
    dN <- N * G
    dG <- a * G
    list(c(dN, dG))
  })
}
y <- c(N = 1, G = 1)
parms <- c(a = -0.1)
times <- seq(0, 100)
out <- ode(y, times, model, parms)
plot(out)
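As a sanity check, this particular system also has a closed-form solution: G(t) = G0 * exp(a * t), and integrating dN/N = G dt gives N(t) = N0 * exp(G0 / a * (exp(a * t) - 1)). The sketch below evaluates it in base R with the same initial values and parameter as above; the names N0, G0, N_exact and G_exact are introduced here for illustration only.

```r
a  <- -0.1
N0 <- 1
G0 <- 1
times <- seq(0, 100)

# Closed-form solution of dG/dt = a * G and dN/dt = N * G
G_exact <- G0 * exp(a * times)
N_exact <- N0 * exp(G0 / a * (exp(a * times) - 1))

# For a < 0, N levels off at N0 * exp(-G0 / a), the Gompertz asymptote
N_limit <- N0 * exp(-G0 / a)
```

If deSolve is available, the columns out[, "N"] and out[, "G"] from the numerical solution should agree closely with N_exact and G_exact.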

REBayes Error in KWDual MSK_RES_TRM_STALL

I am trying to run the following simulation below. Note that this does require Mosek and RMosek to be installed!
I keep getting the error
Error in KWDual(A, d, w, ...) :
Mosek error: MSK_RES_TRM_STALL: The optimizer is terminated due to slow progress.
How can I resolve the MSK_RES_TRM_STALL error?
Further Research
When looking up the documentation for this I found this:
The optimizer is terminated due to slow progress.
Stalling means that numerical problems prevent the optimizer from making reasonable progress and that it makes no sense to continue. In many cases this happens if the problem is badly scaled or otherwise ill-conditioned. There is no guarantee that the solution will be feasible or optimal. However, often stalling happens near the optimum, and the returned solution may be of good quality. Therefore, it is recommended to check the status of the solution. If the solution status is optimal the solution is most likely good enough for most practical purposes.
Please note that if a linear optimization problem is solved using the interior-point optimizer with basis identification turned on, the returned basic solution likely to have high accuracy, even though the optimizer stalled.
Some common causes of stalling are a) badly scaled models, b) near feasible or near infeasible problems.
So I checked the final value A, but nothing was in it. I found that if I change the simulations from 1000 to 30 I do get values (A <- sim1(30, 30, setting = 1)), but this is suboptimal.
Reproducible Script
KFE <- function(y, T = 300, lambda = 1/3){
# Kernel Fourier Estimator: Stefanski and Carroll (Statistics, 1990)
ks <- function(s,x) exp(s^2/2) * cos(s * x)
K <- function(t, y, lambda = 1/3){
k <- y
for(i in 1:length(y)){
k[i] <- integrate(ks, 0, 1/lambda, x = (y[i] - t))$value/pi
}
mean(k)
}
eps <- 1e-04
if(length(T) == 1) T <- seq(min(y)-eps, max(y)+eps, length = T)
g <- T
for(j in 1:length(T))
g[j] <- K(T[j], y, lambda = lambda)
list(x = T, y = g)
}
BDE <- function(y, T = 300, df = 5, c0 = 1){
# Bayesian Deconvolution Estimator: Efron (B'ka, 2016)
require(splines)
eps <- 1e-04
if(length(T) == 1) T <- seq(min(y)-eps, max(y)+eps, length = T)
X <- ns(T, df = df)
a0 <- rep(0, ncol(X))
A <- dnorm(outer(y,T,"-"))
qmle <- function(a, X, A, c0){
g <- exp(X %*% a)
g <- g/sum(g)
f <- A %*% g
-sum(log(f)) + c0 * sum(a^2)^.5
}
ahat <- nlm(qmle, a0, X=X, A=A, c0 = c0)$estimate
g <- exp(X %*% ahat)
g <- g/integrate(approxfun(T,g),min(T),max(T))$value
list(x = T,y = g)
}
W <- function(G, h, interp = FALSE, eps = 0.001){
#Wasserstein distance: ||G-H||_W
H <- cumsum(h$y)
H <- H/H[length(H)]
W <- integrate(approxfun(h$x, abs(G(h$x) - H)),min(h$x),max(h$x))$value
list(W=W, H=H)
}
biweight <- function(x0, x, bw){
t <- (x - x0)/bw
(1-t^2)^2*((t> -1 & t<1)-0) *15/16
}
Wasser <- function(G, h, interp = FALSE, eps = 0.001, bw = 0.7){
#Wasserstein distance: ||G-H||_W
if(interp == "biweight"){
yk = h$x
for (j in 1:length(yk))
yk[j] = sum(biweight(h$x[j], h$x, bw = bw)*h$y/sum(h$y))
H <- cumsum(yk)
H <- H/H[length(H)]
}
else {
H <- cumsum(h$y)
H <- H/H[length(H)]
}
W <- integrate(approxfun(h$x, abs(G(h$x) - H)),min(h$x),max(h$x),
rel.tol = 0.001, subdivisions = 500)$value
list(W=W, H=H)
}
sim1 <- function(n, R = 10, setting = 0){
A <- matrix(0, 4, R)
if(setting == 0){
G0 <- function(t) punif(t,0,6)/8 + 7 * pnorm(t, 0, 0.5)/8
rf0 <- function(n){
s <- sample(0:1, n, replace = TRUE, prob = c(1,7)/8)
rnorm(n) + (1-s) * runif(n,0,6) + s * rnorm(n,0,0.5)
}
}
else{
G0 <- function(t) 0 + 7 * (t > 0)/8 + (t > 2)/8
rf0 <- function(n){
s <- sample(0:1, n, replace = TRUE, prob = c(1,7)/8)
rnorm(n) + (1-s) * 2 + s * 0
}
}
for(i in 1:R){
y <- rf0(n)
g <- BDE(y)
Wg <- Wasser(G0, g)
h <- GLmix(y)
Wh <- Wasser(G0, h)
Whs <- Wasser(G0, h, interp = "biweight")
k <- KFE(y)
Wk <- Wasser(G0, k)
A[,i] <- c(Wg$W, Wk$W, Wh$W, Whs$W)
}
A
}
require(REBayes)
set.seed(12)
A <- sim1(1000, 1000, setting = 1)
I ran the code and indeed it stalls at the end, but the solution is not any worse than in the preceding cases that solve without stall:
17 1.7e-07 3.1e-10 6.8e-12 1.00e+00 5.345949918e+00 5.345949582e+00 2.4e-10 0.40
18 2.6e-08 3.8e-11 2.9e-13 1.00e+00 5.345949389e+00 5.345949348e+00 2.9e-11 0.41
19 2.6e-08 3.8e-11 2.9e-13 1.00e+00 5.345949389e+00 5.345949348e+00 2.9e-11 0.48
20 2.6e-08 3.8e-11 2.9e-13 1.00e+00 5.345949389e+00 5.345949348e+00 2.9e-11 0.54
Optimizer terminated. Time: 0.62
Interior-point solution summary
Problem status : PRIMAL_AND_DUAL_FEASIBLE
Solution status : OPTIMAL
Primal. obj: 5.3459493890e+00 nrm: 6e+00 Viol. con: 2e-08 var: 0e+00 cones: 4e-09
Dual. obj: 5.3459493482e+00 nrm: 7e-01 Viol. con: 1e-11 var: 4e-11 cones: 0e+00
A quick hack for now that worked for me is to relax the termination tolerances a little bit in the call to GLmix:
control <- list()
control$dparam <- list(INTPNT_CO_TOL_REL_GAP=1e-7,INTPNT_CO_TOL_PFEAS=1e-7,INTPNT_CO_TOL_DFEAS=1e-7)
h <- GLmix(y,control=control,verb=5)
A better solution, as I indicated in the comments, is for the REBayes package not to treat the stall termination code as an error, but to use the solution status/quality instead.
I have modified the return from KWDual to avoid such messages, provided that the status sol$itr$solsta from Mosek is "Optimal"; this is in REBayes v2.2, now on CRAN.
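Independently of the solver tolerances, a long simulation can be made robust to a failure in a single replication by wrapping the fitting call in tryCatch(), so that earlier results are not lost. This is a generic base R sketch, not something REBayes provides; flaky_fit is a hypothetical stand-in for a call such as GLmix(y) that occasionally throws an error.

```r
# Hypothetical stand-in for a solver call that sometimes fails
flaky_fit <- function(i) {
  if (i %% 3 == 0) stop("MSK_RES_TRM_STALL (simulated)")
  i^2
}

R <- 6
A <- rep(NA_real_, R)
for (i in seq_len(R)) {
  # On error, report the replication and record NA instead of aborting
  A[i] <- tryCatch(
    flaky_fit(i),
    error = function(e) {
      message("replication ", i, " failed: ", conditionMessage(e))
      NA_real_
    }
  )
}
A
```

The failed replications show up as NA in A and can be counted or re-run afterwards.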

Sampling a log-concave distribution using the adaptive rejection sampling method (R)

I am not very familiar with R. I have been trying to use the implementation of the adaptive rejection sampling method in R, in order to sample from the following distribution:
here is my R code:
library(ars)
g1 <- function(x,r){(1./r)*((1-x)^r)}
f1 <- function(x,a,k) {
add<-0
for(i in 1:k) {
add<- add+g1(x,i)
}
res <- (a* add)+(a-1)*log(x)+k*log(1-x)
return(res)
}
g2 <- function(x,r){(1-x)^(r-1)}
f1prima <- function(x,a,k) {
add<-0
for(i in 1:k) {
add<- add-g2(x,i)
}
res <- (a* add)+(a-1)/x-k/(1-x)
return(res)
}
mysample1<-ars(20,f1,f1prima,x=c(0.001,0.09),m=2,emax=128,lb=TRUE,xlb=0.0, ub=TRUE, xub=1,a=0.5,k=100)
The function is log-concave, but I get different error messages when I run ars, and fiddling with the input parameters doesn't help. Any suggestion would be appreciated.
First, as you already noticed, your log-density is not well defined at x = 0 and x = 1. So a useful interval would be something like 0.01 to 0.99, not 0.0 to 1.0.
Second, I don't like the idea of computing hundreds of terms in your summation.
So it might be a good idea to express it in the following way, starting with the derivative:
$\sum_{i=0}^{N-1} q^i$ is a geometric series and can be replaced with
$(1-q^N)/(1-q)$, where $q = 1-x$.
This is the derivative, so to get the corresponding term in the function itself, just integrate it.
http://www.wolframalpha.com/input/?i=integrate+(1-q%5EN)%2F(1-q)+dq returns the Gauss hypergeometric function 2F1 plus a logarithm:
$-\frac{q^{N+1}\,{}_2F_1(1, N+1; N+2; q)}{N+1} - \log(1-q)$
NB: It is the same integral as the Beta one before, but dealing with it was a bit more cumbersome.
So, code to compute those terms:
library(gsl)
library(ars)
library(ggplot2)
Gauss2F1 <- function(a, b, c, x) {
ifelse(x >= 0.0 & x < 1.0, hyperg_2F1(a, b, c, x), hyperg_2F1(c - a, b, c, 1.0 - 1.0/(1.0 - x))/(1.0 - x)^b)
}
f1sum <- function(x, N) {
q <- 1.0 - x
- q^(N+1) * Gauss2F1(1, N+1, N+2, q)/(N+1) - log(1.0 - q)
}
f1sum.1 <- function(x, N) {
  q <- 1.0 - x
  res <- rep(0.0, length.out = length(x))
  s <- rep(1.0, length.out = length(x))
  for(k in 1:N) {
    s <- s * q                      # s is now q^k
    res <- res + s / as.numeric(k)  # accumulate q^k / k, matching g1() in the question
  }
  res
}
f1 <- function(x, a, N) {
a * f1sum(x, N) + (a - 1.0)*log(x) + N*log(1.0 - x)
}
f1.1 <- function(x, a, N) {
a * f1sum.1(x, N) + (a - 1.0)*log(x) + N*log(1.0 - x)
}
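f1primesum <- function(x, N) {
  q <- 1.0 - x
  # d/dx of sum_{k=1}^{N} q^k / k; the minus sign comes from dq/dx = -1
  -(1.0 - q^N)/(1.0 - q)
}
f1primesum.1 <- function(x, N) {
  q <- 1.0 - x   # q was not defined in this function before
  res <- rep(0.0, length.out = length(x))
  s <- rep(1.0, length.out = length(x))
  for(k in 1:N) {
    res <- res + s
    s <- s * q
  }
  -res
}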
f1prime <- function(x, a, N) {
a* f1primesum(x, N) + (a - 1.0)/x - N/(1.0 - x)
}
f1prime.1 <- function(x, a, N) {
a* f1primesum.1(x, N) + (a - 1.0)/x - N/(1.0 - x)
}
p <- ggplot(data.frame(x = c(0, 1)), aes(x = x)) +
stat_function(fun = f1, args = list(0.5, 100), colour = "#4271AE") +
stat_function(fun = f1.1, args = list(0.5, 100), colour = "#1F3552") +
scale_x_continuous(name = "X", breaks = seq(0, 1, 0.2), limits=c(0.001, 0.5)) +
scale_y_continuous(name = "F") +
ggtitle("Log-concave function")
p
As you can see, I've implemented both versions: one using direct summation and another using the analytical form of the sums. The data are computed for a=0.5, N=100.
First, there is a bit of a difference between the direct sum and 2F1; I attribute it to precision loss in the summation.
Second, and more important: the function is NOT log-concave. No wonder ars() is failing left and right. See the graph below.
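Log-concavity can also be checked numerically without a plot: a concave function has non-positive second differences on an evenly spaced grid. The sketch below uses only base R and the direct-sum form of the log-density from the question (sum of q^k / k with q = 1 - x); the helper name log_dens and the grid are assumptions chosen for illustration.

```r
# Log-density from the question, written with the direct sum (base R only)
log_dens <- function(x, a, N) {
  q <- 1 - x
  k <- seq_len(N)
  s <- vapply(q, function(qi) sum(qi^k / k), numeric(1))
  a * s + (a - 1) * log(x) + N * log(1 - x)
}

x <- seq(0.01, 0.5, by = 0.001)
f <- log_dens(x, a = 0.5, N = 100)

# Second differences approximate f''(x) * step^2;
# any clearly positive value means f is convex there, hence not log-concave
d2 <- diff(f, differences = 2)
not_concave <- any(d2 > 1e-6)
not_concave
```

For a < 1 the term (a - 1) * log(x) is itself convex and dominates near x = 0, which is consistent with the conclusion above.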

ROI optimisation in R using multi-argument F_objective function

Trying to run a simple ROI optimisation in R, but after hours of fidgeting I'm at a loss. I keep getting the error:
Error in .check_function_for_sanity(F, n) :
cannot evaluate function 'F' using 'n' = 5 parameters.
Here is the sample code:
library(ROI)
library(nloptr)
library(ROI.plugin.nloptr)
#Generate some random data for this example
set.seed(3142)
myRet = matrix(runif(100 * 5, -0.1, 0.1), ncol = 5)
myCovMatrix = cov(myRet)
myRet <- myRet
myCovMatrix <- myCovMatrix
# Sample weights
w <- rep(1/ncol(myRet), ncol(myRet))
#Define functions for the optimisation
diversificationRatio = function(w, covMatrix)
{
weightedAvgVol = sum(w * sqrt(diag(covMatrix)))
portfolioVariance = (w %*% covMatrix %*% w)[1,1]
- 1 * weightedAvgVol / sqrt(portfolioVariance)
}
# Check that the F_objective function works:
diversificationRatio(w, myCovMatrix)
# Now construct the F_objective
foo <- F_objective(F = diversificationRatio, n = (ncol(myRet)))
Any ideas on how many parameters to pass to n?
F_objective expects a function with only one argument so you have to write a wrapper function.
#Define functions for the optimisation
diversificationRatio <- function(w, covMatrix) {
weightedAvgVol <- sum(w * sqrt(diag(covMatrix)))
portfolioVariance <- (w %*% covMatrix %*% w)[1,1]
- 1 * weightedAvgVol / sqrt(portfolioVariance)
}
# Wrap the objective so that it takes a single argument (the weights)
wrapper <- function(x) diversificationRatio(x, myCovMatrix)
# Now construct the F_objective
o <- OP(F_objective(F = wrapper, n = (ncol(myRet))))
ROI_applicable_solvers(o)
start <- runif(ncol(myRet))
s <- ROI_solve(o, solver = "nloptr", start = start, method = "NLOPT_LD_SLSQP")
s
solution(s)
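The wrapper above relies on myCovMatrix being found in the global environment. A slightly more explicit variant, sketched here in base R only, is a small factory that captures the covariance matrix in a closure and returns a one-argument function; the name make_objective is hypothetical and not part of ROI.

```r
diversificationRatio <- function(w, covMatrix) {
  weightedAvgVol <- sum(w * sqrt(diag(covMatrix)))
  portfolioVariance <- (w %*% covMatrix %*% w)[1, 1]
  -1 * weightedAvgVol / sqrt(portfolioVariance)
}

# Hypothetical factory: fixes covMatrix and returns a function of w only
make_objective <- function(covMatrix) {
  force(covMatrix)
  function(w) diversificationRatio(w, covMatrix)
}

set.seed(3142)
myRet <- matrix(runif(100 * 5, -0.1, 0.1), ncol = 5)
myCovMatrix <- cov(myRet)

objective <- make_objective(myCovMatrix)
w <- rep(1 / ncol(myRet), ncol(myRet))
objective(w)
```

The resulting function can be passed on in the same way as wrapper, for example F_objective(F = objective, n = ncol(myRet)).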
