Function to Produce Repeating Spikes - r

I asked a similar question on CrossValidated but did not get a response. I went ahead anyway and built out a function, but I am having a problem with replication...
The original question, posted here, was as follows:
I am seeking a function (or short algorithm, ideally implemented in R) that produces something similar to the following:
I would like to be able to generate a vector of n items that follows this sort of pattern, mapped to a set of inputs (say, seq(1:n)). Ideally, I would be able to tell the algorithm to "spike" to a maximum height h on every kth time period, and decay at rate r. However, I would be sufficiently happy with simply being able to generate a spike pattern that occurs periodically.
I wrote some code in R, which is included here, that works fairly well...
## Neural Networks / Deep Learning ##
# Python must be installed first, from either:
#   https://www.anaconda.com/download/#windows
#   https://www.python.org/downloads/
if (!require(keras)) devtools::install_github("rstudio/keras")
library(keras)
# install_tensorflow()
spikes_model <- function(maxiter, total_spikes = 10, max_height = 0.001,
                         min_height = 0.000005, decay_rate = 1) {
  value_at_iteration <- rep(0, maxiter)
  spike_at <- maxiter / total_spikes
  current_rate <- min_height
  holder_timeval <- 0
  for (i in 1:maxiter) {
    spike_indicator <- i / spike_at
    if (is.integer(spike_indicator)) {
      current_rate <- max_height
      value_at_iteration[i] <- current_rate
      holder_timeval <- spike_indicator
    } else if (i < spike_at) {
      current_rate <- min_height
      value_at_iteration[i] <- current_rate
    } else {
      timeval <- i - (holder_timeval * spike_at)
      current_rate <- max_height * exp(-decay_rate * timeval) + min_height
      value_at_iteration[i] <- current_rate
    }
  }
  return(value_at_iteration)
}
asdf <- spikes_model(maxiter = 100)
plot(asdf, type="l")
... which results in the following plot:
This is exactly what I want, except there is only one spike. I know there is a code or logic error somewhere, but I cannot find where I am going wrong. Please help me replicate this spike pattern across time.
The code this scheduler is used in:
eps <- 1000
sch <- spikes_model(eps)
lr_schedule <- function(epoch, lr) {
  # note: keras passes 0-based epoch indices to this callback, so indexing
  # with epoch + 1 may be needed to avoid sch[0] on the first epoch
  lrn <- sch[as.integer(epoch)]
  lrn <- k_cast_to_floatx(lrn)
  return(lrn)
}
## Add callback to set the learning rate according to the spike schedule ##
reduce_lr <- callback_learning_rate_scheduler(lr_schedule)
## Fit model using training data, validate with validation data ##
mod1.hst <- mod1 %>% fit(
  x = X.train, y = Y.train,
  epochs = eps, batch_size = nrow(X.train),
  validation_data = list(X.val, Y.val),
  shuffle = TRUE, callbacks = list(checkpoint, reduce_lr)
)

Wow, I just figured out my own error. I was using the is.integer() function, which does not work the way I assumed: it tests whether an object's storage type is integer, not whether a value is a whole number. I needed the is.whole.number() function from mosaic instead.
Fixing that single error, I find the following chart, which is exactly what I wanted.
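For reference, a base-R equivalent of that fix (a sketch; is_whole is just an illustrative name, and any tolerance-based whole-number test works here):
# is.integer() checks the storage type, so i / spike_at (a double) never passes;
# a tolerance-based test checks the value instead
is_whole <- function(x, tol = .Machine$double.eps^0.5) abs(x - round(x)) < tol
# inside spikes_model(), replace
#   if (is.integer(spike_indicator)) {
# with
#   if (is_whole(spike_indicator)) {
When spike_at is a whole number, the simpler test i %% spike_at == 0 is equivalent.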

Related

How to combine two optimization problems in R with optim(.)?

Roughly speaking, I seek to solve a maximization problem in R where the objective function has the following structure: log[f(theta)*g(theta)]. Thus I want to solve
Max log[f(theta)*g(theta)]
The problem comes from the fact that g(theta) is obtained from another (minimization) problem in theta with restrictions. g(theta) is defined as:
g(theta) = argmin_g { h(x,theta)*g }
Since h(x,theta) depends on theta, the optimal g that minimizes h(x,theta)*g must itself be a function of theta.
My approach so far has been to use constrOptim() to minimize h(x,theta)*g subject to the restrictions, and then to plug the resulting optimal g(theta) into the outer objective, which I maximize over theta using optim().
Here is my code:
First, the function problem_min, which is a function of theta, returns the optimal g(theta), obtained as the solution (sol_min) of the constrained minimization problem.
problem_min <- function(theta_est) {
  reg_theta_int <- reg_matrix %*% theta_est
  for (i in 1:(n * T)) {
    for (j in 1:length(alpha)) {
      L_1[i, j] = exp(alpha[j] + reg_theta_int[i]) / (1 + exp(alpha[j] + reg_theta_int[i]))
    }
  }
  y_tilde <- full_data$y - L_1[, 1]
  L_1_tilde <- L_1[, -1] - L_1[, 1]
  eval_funct_g <- function(g) {
    return((sum((y_tilde - L_1 %*% g)^2)) * 0.5 * (1 / n) * (1 / T))
  }
  sol_min <- constrOptim(theta = g_int, f = eval_funct_g, grad = NULL,
                         ui = R, ci = r, mu = 1e-04,
                         method = "Nelder-Mead",
                         outer.iterations = 100, outer.eps = 1e-05,
                         hessian = FALSE)
  g_theta = c(1 - sum(sol_min$par), sol_min$par)
  return(g_theta)
}
Once I have the optimal g(theta), which is a vector of numbers, I plug it into log[f(theta)*g(theta)] and maximize the whole expression using optim():
funct_f_g <- function(theta_est) {
  full_data$reg_theta_est <- reg_matrix %*% theta_est
  for (i in 1:n) {
    for (j in 1:length(alpha)) {
      for (t in 1:T) {
        product[t] = exp(full_data$y[full_data$t == t & full_data$id == i] *
                           (alpha[j] + full_data$reg_theta_est[full_data$t == t & full_data$id == i])) /
          (1 + exp(alpha[j] + full_data$reg_theta_est[full_data$t == t & full_data$id == i]))
      }
      L_ml[i, j] = prod(product)
    }
  }
  return(sum(log(L_ml %*% problem_min(theta_est))))
}

sol_ml <- optim(par = theta_int, fn = funct_f_g, method = "Nelder-Mead",
                lower = -Inf, upper = Inf,
                control = list(fnscale = -1),
                hessian = FALSE)
theta_opt <- sol_ml$par
sol_ml is intended to solve Max log[f(theta)*g(theta)] while incorporating the fact that g(theta) must itself be chosen optimally.
Variables with the suffix _int (theta_int, g_int) are initial values.
When I run the previous code, R tells me that the objective function in optim() cannot be evaluated. Nevertheless, when I evaluate funct_f_g at some given theta_est it runs perfectly. Thus I think something is wrong with how I am telling optim() that the problem has this nested structure.
If you have a different way to approach my problem, or an explanation of what I am doing incorrectly, that would be great!
I know that I am not describing all the matrices and operations involved in the problem above. I skip this for simplicity, hoping that the general structure of the problem can be understood.
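For what it's worth, the nested pattern itself is straightforward to set up in R; here is a minimal, self-contained toy sketch of the structure (the functions and numbers are illustrative assumptions, not the model above). One common cause of the "objective function cannot be evaluated" error from optim() is the objective returning NA or a non-finite value at the starting point, for example log() of a non-positive inner solution, so funct_f_g(theta_int) is worth checking by hand first.
## Toy version of the nested structure: an inner minimization defines
## g(theta); an outer optim() maximizes log(f(theta) * g(theta))
inner_g <- function(theta) {
  # g(theta) = argmin_g (g - theta)^2 over g in [0, 1]
  optimize(function(g) (g - theta)^2, interval = c(0, 1))$minimum
}
outer_obj <- function(theta) {
  f_theta <- dnorm(theta)          # stand-in for f(theta)
  log(f_theta * inner_g(theta))    # note: -Inf/NaN if the inner solution is <= 0
}
# Brent because this toy theta is one-dimensional; with a vector theta,
# Nelder-Mead as in the question is the natural choice
sol <- optim(par = 0.5, fn = outer_obj, method = "Brent",
             lower = 0.01, upper = 5, control = list(fnscale = -1))
sol$par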

R function logLik() returning -Inf?

I am simulating an SIR model in R and have a data set I am trying to fit the model to. I am currently using the particle filter function pfilter(), and then would like to use the corresponding logLik method on the result. When I do this, I get "[1] -Inf" as a result. I can't find anything in the documentation about why this happens or how I can avoid it. Are my parameters for the model not accurate enough? Is there something else wrong?
My function looks like this:
SIRsim %>%
pfilter(Np=5000) -> pf
logLik(pf)
The R script below is from an online course lesson entitled "Likelihood for POMPs" (https://kingaa.github.io/sbied/pfilter/). However, the code works there... I'm not sure how to reproduce my specific problem with it, and unfortunately I cannot share the dataset or code I am using because it is for academic research.
library(tidyverse)
library(pomp)
options(stringsAsFactors=FALSE)
stopifnot(packageVersion("pomp")>="3.0")
set.seed(1350254336)
sir_step <- Csnippet("
  double dN_SI = rbinom(S, 1 - exp(-Beta*I/N*dt));
  double dN_IR = rbinom(I, 1 - exp(-mu_IR*dt));
  S -= dN_SI;
  I += dN_SI - dN_IR;
  R += dN_IR;
  H += dN_IR;
")
sir_init <- Csnippet("
  S = nearbyint(eta*N);
  I = 1;
  R = nearbyint((1-eta)*N);
  H = 0;
")
dmeas <- Csnippet("
  lik = dbinom(reports, H, rho, give_log);
")
rmeas <- Csnippet("
  reports = rbinom(H, rho);
")
read_csv("https://kingaa.github.io/sbied/pfilter/Measles_Consett_1948.csv")
%>%
select(week,reports=cases) %>%
filter(week<=42) %>%
pomp(
times="week",t0=0,
rprocess=euler(sir_step,delta.t=1/7),
rinit=sir_init,
rmeasure=rmeas,
dmeasure=dmeas,
accumvars="H",
statenames=c("S","I","R","H"),
paramnames=c("Beta","mu_IR","eta","rho","N"),
params=c(Beta=15,mu_IR=0.5,rho=0.5,eta=0.06,N=38000)
) -> measSIR
measSIR %>%
pfilter(Np=5000) -> pf
logLik(pf)
library(doParallel)
library(doRNG)
registerDoParallel()
registerDoRNG(652643293)
foreach (i = 1:10, .combine = c) %dopar% {
  measSIR %>% pfilter(Np = 5000)
} -> pf
logLik(pf) -> ll
logmeanexp(ll, se = TRUE)
If I set Beta=100 in the code above I can get a negative-infinite log-likelihood.
Replacing the measurement-error snippet with this:
dmeas <- Csnippet("
  double ll = dbinom(reports, H, rho, give_log);
  lik = (!isfinite(ll) ? -1000 : ll);
")
appears to 'solve' the problem, although you should be a little careful; papering over numerical cracks like this is sometimes OK, but it could conceivably come back to bite you later. If you just need to avoid non-finite values long enough to get into a reasonable parameter range, this might be fine ...
Some guesses as to why this is happening:
you are somehow getting an "impossible" situation, like a positive number of reported cases when the underlying true number of infections is zero.
Sometimes non-finite log-likelihoods occur when a very small positive probability underflows to zero. The analogue here is that the per-step probability of infection, 1-exp(-Beta*I/N*dt), goes to 1.0; then any observed outcome where less than 100% of the population is infected becomes impossible. The snippet below shows how quickly this expression saturates with Beta = 100.
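Using the parameter values from the model above (the I values are arbitrary illustration points):
# per-step infection probability 1 - exp(-Beta*I/N*dt)
Beta <- 100; N <- 38000; dt <- 1/7
I <- c(10, 100, 1000, 5000, 20000)
round(1 - exp(-Beta * I / N * dt), 5)  # climbs rapidly toward 1 as I grows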
You can try to diagnose the situation by seeing what the filtered trajectory actually looks like and comparing it with the data, or by adding debugging statements to the code. If there's a way to run just the deterministic simulation with your parameter values that might tell you pretty quickly what's going wrong.
An easier/more direct way to debug would be to replace the Csnippet you're using for dmeas with an R function: this will be slower but easier to work with (especially if you're not familiar with C coding). If you uncomment the browser() statement below, the code will drop into debug mode when you encounter the bad situation ...
dmeas <- function(reports, H, rho, log, ...) {
  lik <- dbinom(reports, size = H, prob = rho, log = log)
  if (!is.finite(lik)) {
    lik <- -1000
    ## browser()
  }
  return(lik)
}
For example, the first time the guard triggers, the values in scope are:
t = 3, reports = 2, S = 2280, I = 0, R = 35721, H = 0, Beta = 100, mu_IR = 0.5, rho = 0.5, eta = 0.06, N = 38000, log = TRUE
Browse[1]> debug at /tmp/SO65554258.R!ZlSILG#7: return(lik)
Browse[2]> reports
[1] 2
Browse[2]> H
[1] 0
Browse[2]> rho
[1] 0.5
This shows that the problem is indeed that you have a positive number of reported cases when there have been zero infections ... R is trying to compute the binomial probability of observing reports cases when there are H infections that are potentially reportable, each reported with probability rho. When the number of trials N in a binomial distribution Binom(N, p) is zero, the only possible outcome is zero 'successes' (reported cases), which has probability 1. All other outcomes have probability 0 (and log-probability -Inf).
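The behaviour is easy to reproduce directly in base R, independently of pomp:
# a binomial with zero trials puts all probability mass on zero successes
dbinom(0, size = 0, prob = 0.5)              # 1
dbinom(2, size = 0, prob = 0.5)              # 0
dbinom(2, size = 0, prob = 0.5, log = TRUE)  # -Inf, which propagates to logLik(pf)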

Exponential distribution in R

I want to simulate some data from an Exp(1) distribution, but the values have to be > 0.5, so I used a while loop, but it does not seem to work as I would like. Thanks in advance for your responses!
x1 <- c()
w <- rexp(1)
while (length(x1) < 100) {
  if (w > 0.5) {
    x1 <- w
  } else {
    w <- rexp(1)
  }
}
1) The code in the question has these problems:
- we need a new random variable on each iteration, but the code only generates a new one when the if condition is FALSE
- x1 is repeatedly overwritten rather than extended
- although while could be used, repeat is a better fit here since the natural place for the test is at the end of the loop body rather than at the beginning
We can fix this up like this:
x1 <- c()
repeat {
  w <- rexp(1)
  if (w > 0.5) {
    x1 <- c(x1, w)
    if (length(x1) == 100) break
  }
}
1a) A variation would be the following. Note that an if with no else leg evaluates invisibly to NULL when its condition is FALSE, so if the condition on the line marked ## is FALSE, nothing is concatenated to x1.
x1 <- c()
repeat {
  w <- rexp(1)
  x1 <- c(x1, if (w > 0.5) w)  ##
  if (length(x1) == 100) break
}
2) Alternately, generate 200 exponential random variables at once, keeping only those greater than 0.5. If fewer than 100 survive, repeat. At the end, take the first 100 from the last batch generated. We have chosen 200 to be large enough that on most runs only one iteration of the loop is needed.
repeat {
  r <- rexp(200)
  r <- r[r > 0.5]
  if (length(r) >= 100) break
}
r <- head(r, 100)
Alternative (2) is actually faster than (1) or (1a) because it is more highly vectorized, even though it throws away more exponential draws than the other solutions.
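To see why 200 draws are almost always enough: each draw survives the cutoff with probability P(X > 0.5) = exp(-0.5), about 0.61, so a batch of 200 yields roughly 121 survivors on average. The chance of needing a second pass can be checked directly:
# probability that a batch of 200 Exp(1) draws yields fewer than 100 values above 0.5
pbinom(99, size = 200, prob = exp(-0.5))  # well under 1%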
I would advise against a while loop (or any other accept/reject scheme); instead use the methods from truncdist:
# Sample 1000 observations from an exponential truncated at 0.5
library(truncdist)
x <- rtrunc(1000, spec = "exp", a = 0.5)
# Plot
library(ggplot2)
ggplot(data.frame(x = x), aes(x)) + geom_histogram(bins = 50) + xlim(0, 10)
It's also fairly straightforward to implement a sampler based on inverse transform sampling to draw from a truncated exponential distribution, avoiding a rejection loop entirely. This is more efficient than any accept/reject-based method, and it works particularly well in your case, since the truncated exponential CDF has a closed form.
See for example this post for more details.
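A minimal sketch of that inverse-transform sampler (rtrunc_exp is an illustrative name, not a library function):
# inverse transform for Exp(rate) truncated to (a, Inf):
# draw U uniformly on (F(a), 1), then invert the exponential CDF
rtrunc_exp <- function(n, rate = 1, a = 0.5) {
  u <- runif(n, min = pexp(a, rate), max = 1)
  qexp(u, rate)
}
x1 <- rtrunc_exp(100)
all(x1 > 0.5)  # TRUE
For the exponential specifically, memorylessness gives the same distribution even more simply: 0.5 + rexp(n) is exactly Exp(1) truncated at 0.5.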

R - nloptr - Find the 50 best solutions, not only the best one

I'm using the neldermead() function from the nloptr package and I would like to find, for instance, the 50 most likely solutions. In this example:
opti <- function(x) { x - 12 }
x0 <- c(0)
lower <- c(0)
upper <- c(100)
solution <- neldermead(x0, opti, lower, upper,
                       control = list(maxeval = 1000, stopval = -Inf))
I will only obtain solution = 12, but I would like to obtain this best solution and 49 others around it. Is there a way to extract this information from the neldermead() function?
Thanks a lot!
The Nelder-Mead simplex is a local algorithm: it returns a single optimum (global or local) and won't enumerate alternative solutions by itself. You can, however, iterate your simplex optimization inside something like a Multi-Level Single-Linkage (MLSL) algorithm, which chooses new starting points for the simplex based on the results of the previous runs. Here is an example with your function:
require(nloptr)
table <- NULL
opti <- function(x) {
  res <- x - 12
  table <<- rbind(table, c(x, res))
  res
}
lower <- c(0)
upper <- c(100)
local_opts <- list("algorithm" = "NLOPT_LN_NELDERMEAD",
                   "maxeval" = 15,
                   "xtol_abs" = 1.0e-4)
opts <- list("algorithm" = "NLOPT_GN_MLSL_LDS",
             "local_opts" = local_opts,
             "maxeval" = 50 * 15,
             "print_level" = 3)
OPT <- nloptr(
  x0 = runif(1, min = lower, max = upper),  # random starting point
  eval_f = opti,
  lb = lower,
  ub = upper,
  eval_grad_f = NULL,
  opts = opts
)
table <- table[order(table[, 2]), ]
table[1:50, ]
As your function is simple, your 50 results are all the same, but on a rougher surface you can expect more interesting results. To my knowledge nloptr does not expose the trace of the optimization path, so you have to record it yourself inside the evaluation function, as done with table above. Note the iteration budget here is very low: 50 random starts of a 15-iteration simplex; don't forget to increase those values for a real problem.

Genetic algorithms under R, adding suggestions

In the genalg package, the rbga.bin function offers the possibility to supply a list of suggestions; however, I can't find any example of this feature actually working. Could anyone give me some help?
library(genalg)
evaluation <- function(x) {
  n <- 2
  if (sum(x) != n) {
    return(100)
  }
  if (sum(x) == n) {
    sequen <- which(x > 0)
    l <- sum(sequen)
    return(-l)
  }
}
vect1 <- rep(0, times = 40)
vect1[c(1, 2)] <- c(1, 1)
sug <- list(vect1)
for (i in 2:100) {
  vect1 <- sample(vect1)
  sug[[i]] <- vect1
}
GAmodel <- rbga.bin(size = 40, popSize = 100, iters = 100, suggestions = sug,
                    mutationChance = 0.01, elitism = TRUE, evalFunc = evaluation,
                    verbose = TRUE)
Although the documentation for the rbga.bin function says:
suggestions: optional list of suggested chromosomes
rbga.bin apparently wants a data.frame or matrix:
# taken from the rbga.bin source code
suggestionCount = dim(suggestions)[1]
for (i in 1:suggestionCount) {
  population[i, ] = suggestions[i, ]
}
When given a matrix, it seems to work fine:
sug2 <- t(replicate(sample(vect1), n = 10))  # needs to be transposed; check your solution with n = 99 and it will fail
GAmodel <- rbga.bin(size = 40, popSize = 100, iters = 100, suggestions = sug2,
                    mutationChance = 0.01, elitism = TRUE, evalFunc = evaluation,
                    verbose = TRUE)
Output:
Testing the sanity of parameters...
Not showing GA settings...
Adding suggestions to first population...
Filling others with random values in the given domains...
Starting iteration 1
Calucating evaluation values... .................................................................................................... done.
Creating next generation...
sorting results...
applying elitism...
applying crossover...
applying mutations... 40 mutations applied
Starting iteration 2
Calucating evaluation values... .................................................................................................. done.
Creating next generation...
<...>
Starting iteration 100
Calucating evaluation values... .................................................................................................. done.
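If you want to keep building the suggestions as a list, as in the question, it can be converted to the matrix form rbga.bin expects (a sketch assuming every element of sug has the same length, kept to fewer rows than popSize as in the working example above):
# stack the list of 0/1 vectors into a matrix, one suggestion per row
sug_mat <- do.call(rbind, sug)[1:10, ]
GAmodel <- rbga.bin(size = 40, popSize = 100, iters = 100, suggestions = sug_mat,
                    mutationChance = 0.01, elitism = TRUE, evalFunc = evaluation,
                    verbose = TRUE)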
