Death and birth process and proportion - r

I have this code which describes a birth and death process with four states: 0, 1, 2, 3.
bd_process <- function(lambda, mu, initial_state = 0, steps = 100) {
  time_now <- 0
  state_now <- initial_state
  time <- 0
  state <- initial_state

  for (i in 1:steps) {
    if (state_now == 3) {
      lambda_now <- 0
    } else {
      lambda_now <- lambda
    }
    if (state_now == 0) {
      mu_now <- 0
    } else {
      mu_now <- mu
    }

    time_to_transition <- rexp(1, mu_now + lambda_now)

    if (runif(1) < mu_now/(lambda_now + mu_now)) {
      state_now <- state_now - 1
    } else {
      state_now <- state_now + 1
    }

    time_now <- time_now + time_to_transition
    time <- c(time, time_now)
    state <- c(state, state_now)
  }
  return(list(time = time, state = state))
}
From this code I want to create a function which counts the proportion of time spent in each state. The function is meant to take two parameters, and I'm a bit lost. Any suggestions?

This is an example function that identifies the times for each state and sums them. It uses sapply to iterate over the possible states.
x <- bd_process(2, 3)

#' @param x a list returned by the bd_process function
#' @return a vector of times spent in states 0, 1, 2, 3
get_times <- function(x) {
  sapply(0:3, function(i) {
    sum(x$time[which(x$state == i)])
  })
}
get_times(x)
[1] 401.52405 607.44447 253.52741 38.58117
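If what you want is the proportion of time spent in each state, note that x$time holds the cumulative transition times, so a useful quantity is the holding times, i.e. the differences between successive transition times, with each holding time attributed to the state occupied before the jump. A minimal sketch along those lines (not part of the function above):
get_proportions <- function(x) {
  holding <- diff(x$time)           # length of each inter-transition interval
  occupied <- head(x$state, -1)     # state occupied during that interval
  times <- sapply(0:3, function(i) sum(holding[occupied == i]))
  times / sum(times)                # normalise to proportions
}
get_proportions(x)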

Related

Stuck in infinite loop, where is the problem I am seeing?

My code is stuck in an infinite loop, with the stop sign showing. I have read through it multiple times; can anyone help?
I am trying to run trials using samples of possession probabilities and the subsequent make probabilities for a basketball team. I am following along with a video and made sure each step was completed properly.
How can I tell where I am stuck in an infinite loop and how do I fix it?
mc_hoops_ex <- function(trials) {
  prob_pos <- c(0.148, 0.544, 0.308, 0.256)
  prob_2pm <- 0.524
  prob_3pm <- 0.378
  prob_ftm <- 0.761
  prob_orb <- 0.319
  a <- 1
  pts_ct <- 0
  while (a <= trials) {
    pos_outcome <- sample(c(1:4), 1, prob = prob_pos)
    if(pos_outcome == 2) {
      pos_end <- 0
      while (pos_end < 1) {
        shot_prob <- runif(1)
        if(shot_prob <= prob_2pm) {
          pts_ct <- pts_ct + 2
          pos_end <- 1
        }
        else {
          orb_prob <- runif(1)
          if(orb_prob >= prob_orb)
            pos_end <- 1
        }
      }
    }
  }
  if(pos_outcome == 3) {
    pos_end <- 0
    while (pos_end < 1) {
      shot_prob <- runif(1)
      if(shot_prob <= prob_3pm) {
        pts_ct <- pts_ct + 3
        pos_end <- 1
      }
      else {
        orb_prob <- runif(1)
        if(orb_prob >= prob_orb)
          pos_end <- 1
      }
    }
  }
  if(pos_outcome == 4) {
    pos_end <- 0
    while (pos_end < 1) {
      shot_prob <- runif(1)
      if(shot_prob <= prob_ftm) {
        pts_ct <- pts_ct + 1
        pos_end <- 1
      }
      else {
        orb_prob <- runif(1)
        if(orb_prob >= prob_orb)
          pos_end <- 1
      }
    }
  }
  a <- a + 1
  print(pts_ct / trials)
  print((pts_ct / trials) * 66.3)
}
This looks like R code, based on the syntax, <- assignment, and functions like c() and runif().
You have the following loop: while (a <= trials). This loop will continue running until the condition no longer holds. Since you initialize a <- 1, this loop will not stop unless (1) trials < 1, in which case the loop will not run a single time, or (2) a is incremented until a > trials.
We can see that the only time a is changed is near the bottom of the function: a <- a + 1. However, look closely at the braces. This increment is outside the while-loop, so it never occurs and the loop runs forever.
I'm not sure if this will produce the expected results, but a corrected version that does not have an infinite loop is given below.
mc_hoops_ex <- function(trials) {
  prob_pos <- c(0.148, 0.544, 0.308, 0.256)
  prob_2pm <- 0.524
  prob_3pm <- 0.378
  prob_ftm <- 0.761
  prob_orb <- 0.319
  a <- 1
  pts_ct <- 0
  while (a <= trials) {
    pos_outcome <- sample(c(1:4), 1, prob = prob_pos)
    if(pos_outcome == 2) {
      pos_end <- 0
      while (pos_end < 1) {
        shot_prob <- runif(1)
        if(shot_prob <= prob_2pm) {
          pts_ct <- pts_ct + 2
          pos_end <- 1
        }
        else {
          orb_prob <- runif(1)
          if(orb_prob >= prob_orb)
            pos_end <- 1
        }
      }
    }
    # Removed the closing brace that was here
    if(pos_outcome == 3) {
      pos_end <- 0
      while (pos_end < 1) {
        shot_prob <- runif(1)
        if(shot_prob <= prob_3pm) {
          pts_ct <- pts_ct + 3
          pos_end <- 1
        }
        else {
          orb_prob <- runif(1)
          if(orb_prob >= prob_orb)
            pos_end <- 1
        }
      }
    }
    if(pos_outcome == 4) {
      pos_end <- 0
      while (pos_end < 1) {
        shot_prob <- runif(1)
        if(shot_prob <= prob_ftm) {
          pts_ct <- pts_ct + 1
          pos_end <- 1
        }
        else {
          orb_prob <- runif(1)
          if(orb_prob >= prob_orb)
            pos_end <- 1
        }
      }
    }
    a <- a + 1
  } # Added a closing brace here instead
  print(pts_ct / trials)
  print((pts_ct / trials) * 66.3)
}
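With the brace moved, the while-loop now reaches a <- a + 1 and terminates after trials iterations. A quick run for illustration (the trial count below is an arbitrary choice, and the printed numbers will vary with the random draws unless a seed is set):
set.seed(1)          # only to make the run repeatable; any seed works
mc_hoops_ex(10000)   # prints average points per trial and the same value scaled by 66.3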

For optimx() with method of bobyqa, how to set the initial value

I was trying to maximize my likelihood with the R package 'optimx'. Here is my code. With the initial values (5, 5) and (1, 1) I got different maximized likelihoods. I have also tried other methods, such as 'Nelder-Mead', but the estimated log-likelihoods are different under different methods...
library('optimx')
n=225
X = matrix(runif(225),ncol=1)
e2 = matrix(runif(225,0,2),ncol=1)
set.seed(123)
This is the function to generate some data I will use
get_mls_basis <- function(p){
  depth <- ceiling(runif(1)*p)
  knot <- matrix(rep(0,depth+1),ncol=1)
  lr <- runif(1) > 0.5
  x <- matrix(rep(0,n),ncol=1)
  not_finished <- 1
  while (not_finished == 1) {
    data_indx = ceiling(runif(1)*n)
    var = matrix(rep(0,depth),ncol=1)
    for (j in 1:depth) {
      not_ok <- 1
      while (not_ok == 1) {
        ind <- ceiling(runif(1)*p)
        if (!is.element(ind, var[1:j])) {
          var[j] <- ind
          not_ok <- 0
        }
      }
    }
    x_v <- as.matrix(X[data_indx, var])
    knot[1:depth] <- rgamma(depth,1,1)
    knot[1:depth] <- knot[1:depth] / sqrt(sum(knot^2))
    knot[depth+1] <- -x_v %*% knot[1:depth]
    ones <- matrix(rep(1,n),ncol=1)
    temp <- as.matrix(cbind(X[,var], ones)) %*% knot
    if (lr == 0) {
      for (i in 1:n) {
        temp[i] <- max(0,temp[i])
      }
    }
    else {
      for (i in 1:n) {
        temp[i] <- min(0,temp[i])
      }
    }
    x <- temp
    not_finished <- all(x==0)
  }
  mx <- mean(x)
  stx <- sd(x)
  x <- (x-mx)/stx
  x
}
This is my log likelihood
Lik1 <- function(theta, basis){
  theta0 <- theta[1]
  theta1 <- theta[2]
  L <- -n/2*log(theta0) - sum(basis/2)*log(theta1) - 0.5/theta0*sum(e2/theta1^basis)
  return(L)
}
basis1=get_mls_basis(1)
Here I used (5, 5) as the initial value:
optimx(par=c(5,5), Lik1,
       basis=basis1, method='bobyqa', control=list(maximize=TRUE))
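One quick diagnostic, just as a sketch reusing Lik1 and basis1 from above (the particular starting points below are arbitrary), is to run the same method from several starting values and compare the reported objective values. Large differences suggest the optimizer is stopping at different local maxima, in which case a multi-start strategy or a coarse grid search for a sensible starting value usually helps:
starts <- list(c(1, 1), c(5, 5), c(0.5, 2), c(10, 10))
res <- lapply(starts, function(p)
  optimx(par = p, Lik1, basis = basis1,
         method = 'bobyqa', control = list(maximize = TRUE)))
sapply(res, function(r) r$value)   # maximized log-likelihood reported for each start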

A lot of variables in lm regression and large list

My task is to do some research on dummy variables. Here is the R code:
parameters_estimation2 <- function(n)
{
  age <- sample(20:40, n, replace=TRUE)
  male <- sample(0:1, n, replace=TRUE)
  education <- sample(0:6, n, replace=TRUE)
  experience <- floor(rexp(n, 0.2))
  for(i in 1:n) {
    if(experience[i] > 15) {
      experience[i] <- floor(rexp(1, 0.2))
      if(experience[i] > 15) { i <- i-1 }
    }
  }
  sqexperience <- experience*experience
  e <- rnorm(n, 0, 4)
  B0 <- -200; B1 <- 15; B2 <- 100; B3 <- 10; B4 <- 5; B5 <- 20
  wage <- B0 + B1*age + B2*male + B3*education + B4*experience + B5*sqexperience + e
  # Dummy making
  expe1 <- rep(0,n);  expe2 <- rep(0,n);  expe3 <- rep(0,n);  expe4 <- rep(0,n)
  expe5 <- rep(0,n);  expe6 <- rep(0,n);  expe7 <- rep(0,n);  expe8 <- rep(0,n)
  expe9 <- rep(0,n);  expe10 <- rep(0,n); expe11 <- rep(0,n); expe12 <- rep(0,n)
  expe13 <- rep(0,n); expe14 <- rep(0,n); expe15 <- rep(0,n)
  for(i in 1:n) {
    if(experience[i]==1) { expe1[i] <- 1
    } else if(experience[i]==2) { expe2[i] <- 1
    } else if(experience[i]==3) { expe3[i] <- 1
    } else if(experience[i]==4) { expe4[i] <- 1
    } else if(experience[i]==5) { expe5[i] <- 1
    } else if(experience[i]==6) { expe6[i] <- 1
    } else if(experience[i]==7) { expe7[i] <- 1
    } else if(experience[i]==8) { expe8[i] <- 1
    } else if(experience[i]==9) { expe9[i] <- 1
    } else if(experience[i]==10) { expe10[i] <- 1
    } else if(experience[i]==11) { expe11[i] <- 1
    } else if(experience[i]==12) { expe12[i] <- 1
    } else if(experience[i]==13) { expe13[i] <- 1
    } else if(experience[i]==14) { expe14[i] <- 1
    } else if(experience[i]==15) { expe15[i] <- 1
    }
  }
  regression <- lm(wage ~ age + male + education + expe1 + expe2 + expe3 + expe4 + expe5 +
                     expe6 + expe7 + expe8 + expe9 + expe10 + expe11 + expe12 + expe13 +
                     expe14 + expe15)
  return(summary(regression)$coefficients[,"Estimate"])
}
library(data.table)   # needed for as.data.table()

times <- 1000
size <- rep(200, times)
koeficientai1 <- mapply(parameters_estimation2, size)
blah <- as.data.table(koeficientai1)
beta0sample200d <- mean(koeficientai1[,"(Intercept)"])
And the problem is that on the last line I get:
Error in koeficientai1[, "(Intercept)"] : incorrect number of dimensions
I think the problem is that koeficientai1 is a large list. But when I try another lm regression with just 5 variables, the code works and I get a simple data frame.
Try replacing the last line with
beta0sample200d <- mean(sapply(koeficientai1, function(x) x["(Intercept)"]))
koeficientai1 is a list, but you try to access it as a data.frame, hence the error message.
sapply extracts the element named (Intercept) from each list element of koeficientai1 (in your case each list element is a named vector) and returns a vector that contains the results.
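As a quick check of why mapply() returned a list rather than a matrix here, you can look at the lengths of the returned coefficient vectors (a small sketch using the objects defined above). A likely reason is that some experience dummy never takes the value 1 in a given simulated sample, so lm() cannot estimate its coefficient, summary() drops it, and the resulting vectors have different lengths and cannot be simplified:
table(sapply(koeficientai1, length))   # how many coefficients each replication returned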

for loop creating vector r

I am trying to create a function to calculate the Box-Cox transformation in R, where you iterate over values of lambda (lambdas) in a formula to maximize L. What I ultimately want is a vector of L such that for each i in lambdas there is a corresponding L value.
y <- c(256,256,231,101,256,213,241,246,207,143,287,240,262,234,146,255,184,161,252,229,283,132,218,113,194,237,181,262,104)
df <- 28
n <- 29
lambdas <- seq(-3, 3, 0.001)
L <- rep(NA, length(lambdas))

for(i in lambdas) {
  if(i != 0) {
    yprime <- ((y^i) - 1) / i
  } else {
    yprime <- log(y)
  }
  st2 <- var(yprime)
  L <- ((-df/2) * log(st2)) + ((i-1) * (df/n) * sum(log(y)))
}
What I typically end up with is L as a vector of length 1, containing only the value from the final iteration.
Use seq_along to generate an index into lambdas[] and L[]. Note that the check for lambda equal to zero then has to be on lambdas[i] rather than on the index i:
for(i in seq_along(lambdas)) {
  if(lambdas[i] != 0) {
    yprime <- ((y^lambdas[i]) - 1) / lambdas[i]
  } else {
    yprime <- log(y)
  }
  st2 <- var(yprime)
  L[i] <- ((-df/2) * log(st2)) + ((lambdas[i] - 1) * (df/n) * sum(log(y)))
}
plot(L)
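Equivalently, the same formula can be written without an explicit loop (a compact sketch using the y, df, n and lambdas defined above), which also makes it easy to read off the maximizing lambda:
L <- sapply(lambdas, function(lam) {
  yprime <- if (lam != 0) (y^lam - 1) / lam else log(y)
  (-df/2) * log(var(yprime)) + (lam - 1) * (df/n) * sum(log(y))
})
plot(lambdas, L, type = "l")
lambdas[which.max(L)]   # lambda that maximizes L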

Speed up WMA (Weighted Moving Average) calculation

I am trying to calculate an exponential moving average on 15-day bars, but I want to see the "evolution" of the 15-day-bar EMA at the end of each day/bar. So I have 15-day bars, and when new daily data comes in I would like to recalculate the EMA using the new information. In other words, after each day my new 15-day bar starts to grow, and each new daily bar that comes along is supposed to be used for the EMA calculation together with the previous full 15-day bars.
Let's say we start at 2012-01-01 (we have data for every calendar day in this example); at the end of 2012-01-15 we have the first complete 15-day bar. After 4 completed full 15-day bars, on 2012-03-01, we can start calculating the 4-bar EMA (EMA(x, n=4)). At the end of 2012-03-02 we use the information available up to that moment and calculate the EMA, pretending that the OHLC for 2012-03-02 is the 15-day bar in progress. So we take the 4 complete bars plus the bar for 2012-03-02 and calculate EMA(x, n=4). We then wait another day, see what happened with the new 15-day bar in progress (see the function to.period.cumulative below for details) and calculate a new value for the EMA, and so on for the next 15 days... See the function EMA.cumulative below for details.
Below is what I have been able to come up with so far. The performance is not acceptable to me, and with my limited R knowledge I cannot make it any faster.
library(quantmod)

do.call.rbind <- function(lst) {
  while(length(lst) > 1) {
    idxlst <- seq(from=1, to=length(lst), by=2)
    lst <- lapply(idxlst, function(i) {
      if(i == length(lst)) { return(lst[[i]]) }
      return(rbind(lst[[i]], lst[[i+1]]))
    })
  }
  lst[[1]]
}

to.period.cumulative <- function(x, name=NULL, period="days", numPeriods=15) {
  if(is.null(name))
    name <- deparse(substitute(x))
  cnames <- c("Open", "High", "Low", "Close")
  if (has.Vo(x))
    cnames <- c(cnames, "Volume")
  cnames <- paste(name, cnames, sep=".")
  if (quantmod:::is.OHLCV(x)) {
    x <- OHLCV(x)
    out <- do.call.rbind(
      lapply(split(x, f=period, k=numPeriods),
             function(x) cbind(rep(first(x[,1]), NROW(x[,1])),
                               cummax(x[,2]), cummin(x[,3]), x[,4], cumsum(x[,5]))))
  } else if (quantmod:::is.OHLC(x)) {
    x <- OHLC(x)
    out <- do.call.rbind(
      lapply(split(x, f=period, k=numPeriods),
             function(x) cbind(rep(first(x[,1]), NROW(x[,1])),
                               cummax(x[,2]), cummin(x[,3]), x[,4])))
  } else {
    stop("Object does not have OHLC(V).")
  }
  colnames(out) <- cnames
  return(out)
}

EMA.cumulative <- function(cumulativeBars, nEMA = 4, period="days", numPeriods=15) {
  barsEndptCl <- Cl(cumulativeBars[endpoints(cumulativeBars, on=period, k=numPeriods)])
  # TODO: This is sloooooooooooooooooow...
  outEMA <- do.call.rbind(
    lapply(split(Cl(cumulativeBars), period),
           function(x) {
             previousFullBars <- barsEndptCl[index(barsEndptCl) < last(index(x)), ]
             if (NROW(previousFullBars) >= (nEMA - 1)) {
               last(EMA(last(rbind(previousFullBars, x), n=(nEMA + 1)), n=nEMA))
             } else {
               xts(NA, order.by=index(x))
             }
           }))
  colnames(outEMA) <- paste("EMA", nEMA, sep="")
  return(outEMA)
}

getSymbols("SPY", from="2010-01-01")
SPY.cumulative <- to.period.cumulative(SPY, name="SPY")
system.time(
  SPY.EMA <- EMA.cumulative(SPY.cumulative)
)
On my system it takes
user system elapsed
4.708 0.000 4.410
Acceptable execution time would be less than one second... Is it possible to achieve this using pure R?
This post is linked to Optimize moving averages calculation - is it possible?, where I received no answers. I have now been able to create a reproducible example with a more detailed explanation of what I want to speed up. I hope the question makes more sense now.
Any ideas on how to speed this up are highly appreciated.
I have not found a satisfactory solution to my question using R. So I took out the old tool, the C language, and the results are better than I would ever have expected. Thanks for "pushing" me into using these great tools (Rcpp, inline, etc.). Amazing. I guess that whenever I have performance requirements in the future that cannot be met using R, I will add C to R and the performance will be there. So, please see below my code and the resolution of the performance issues.
# How to speedup cumulative EMA calculation
#
###############################################################################
library(quantmod)
library(Rcpp)
library(inline)
library(rbenchmark)

do.call.rbind <- function(lst) {
  while(length(lst) > 1) {
    idxlst <- seq(from=1, to=length(lst), by=2)
    lst <- lapply(idxlst, function(i) {
      if(i == length(lst)) { return(lst[[i]]) }
      return(rbind(lst[[i]], lst[[i+1]]))
    })
  }
  lst[[1]]
}

to.period.cumulative <- function(x, name=NULL, period="days", numPeriods=15) {
  if(is.null(name))
    name <- deparse(substitute(x))
  cnames <- c("Open", "High", "Low", "Close")
  if (has.Vo(x))
    cnames <- c(cnames, "Volume")
  cnames <- paste(name, cnames, sep=".")
  if (quantmod:::is.OHLCV(x)) {
    x <- quantmod:::OHLCV(x)
    out <- do.call.rbind(
      lapply(split(x, f=period, k=numPeriods),
             function(x) cbind(rep(first(x[,1]), NROW(x[,1])),
                               cummax(x[,2]), cummin(x[,3]), x[,4], cumsum(x[,5]))))
  } else if (quantmod:::is.OHLC(x)) {
    x <- OHLC(x)
    out <- do.call.rbind(
      lapply(split(x, f=period, k=numPeriods),
             function(x) cbind(rep(first(x[,1]), NROW(x[,1])),
                               cummax(x[,2]), cummin(x[,3]), x[,4])))
  } else {
    stop("Object does not have OHLC(V).")
  }
  colnames(out) <- cnames
  return(out)
}

EMA.cumulative <- function(cumulativeBars, nEMA = 4, period="days", numPeriods=15) {
  barsEndptCl <- Cl(cumulativeBars[endpoints(cumulativeBars, on=period, k=numPeriods)])
  # TODO: This is sloooooooooooooooooow...
  outEMA <- do.call.rbind(
    lapply(split(Cl(cumulativeBars), period),
           function(x) {
             previousFullBars <- barsEndptCl[index(barsEndptCl) < last(index(x)), ]
             if (NROW(previousFullBars) >= (nEMA - 1)) {
               last(EMA(last(rbind(previousFullBars, x), n=(nEMA + 1)), n=nEMA))
             } else {
               xts(NA, order.by=index(x))
             }
           }))
  colnames(outEMA) <- paste("EMA", nEMA, sep="")
  return(outEMA)
}
EMA.c.c.code <- '
  /* Initialize loop and PROTECT counters */
  int i, P = 0;

  /* Ensure that cumbars is double */
  if(TYPEOF(cumbars) != REALSXP) {
    PROTECT(cumbars = coerceVector(cumbars, REALSXP)); P++;
  }

  /* Pointers to function arguments */
  double *d_cumbars = REAL(cumbars);
  int i_nper = asInteger(nperiod);
  int i_n = asInteger(n);
  double d_ratio = asReal(ratio);

  /* Input object length */
  int nr = nrows(cumbars);

  /* Initialize result R object */
  SEXP result;
  PROTECT(result = allocVector(REALSXP, nr)); P++;
  double *d_result = REAL(result);

  /* Find first non-NA input value */
  int beg = i_n*i_nper - 1;
  d_result[beg] = 0;
  for(i = 0; i <= beg; i++) {
    /* Account for leading NAs in input */
    if(ISNA(d_cumbars[i])) {
      d_result[i] = NA_REAL;
      beg++;
      d_result[beg] = 0;
      continue;
    }
    /* Set leading NAs in output */
    if(i < beg) {
      d_result[i] = NA_REAL;
    }
    /* Raw mean to start EMA - but only on full bars */
    if ((i != 0) && (i%i_nper == (i_nper - 1))) {
      d_result[beg] += d_cumbars[i] / i_n;
    }
  }

  /* Loop over non-NA input values */
  int i_lookback = 0;
  for(i = beg+1; i < nr; i++) {
    i_lookback = i%i_nper;
    if (i_lookback == 0) {
      i_lookback = 1;
    }
    /* Previous result should be based only on full bars */
    d_result[i] = d_cumbars[i] * d_ratio + d_result[i-i_lookback] * (1-d_ratio);
  }

  /* UNPROTECT R objects and return result */
  UNPROTECT(P);
  return(result);
'
EMA.c.c <- cfunction(signature(cumbars="numeric", nperiod="numeric", n="numeric", ratio="numeric"),
                     EMA.c.c.code)

EMA.cumulative.c <- function(cumulativeBars, nEMA = 4, period="days", numPeriods=15) {
  ratio <- 2/(nEMA+1)
  outEMA <- EMA.c.c(cumbars=Cl(cumulativeBars), nperiod=numPeriods, n=nEMA, ratio=ratio)
  outEMA <- reclass(outEMA, Cl(cumulativeBars))
  colnames(outEMA) <- paste("EMA", nEMA, sep="")
  return(outEMA)
}

getSymbols("SPY", from="2010-01-01")
SPY.cumulative <- to.period.cumulative(SPY, name="SPY")

system.time(
  SPY.EMA <- EMA.cumulative(SPY.cumulative)
)
system.time(
  SPY.EMA.c <- EMA.cumulative.c(SPY.cumulative)
)

res <- benchmark(EMA.cumulative(SPY.cumulative), EMA.cumulative.c(SPY.cumulative),
                 columns=c("test", "replications", "elapsed", "relative", "user.self", "sys.self"),
                 order="relative",
                 replications=10)
print(res)
EDIT: To give an indication of the performance improvement over my cumbersome R version (I am sure it can be made better, since in effect I have created a double for loop), here is a printout:
> print(res)
                              test replications elapsed relative user.self
2 EMA.cumulative.c(SPY.cumulative)           10   0.026    1.000     0.024
1   EMA.cumulative(SPY.cumulative)           10  57.732 2220.462    56.755
So, by my standards, a SF type of improvement...
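As a final sanity check (just a sketch, not part of the benchmark above), it is worth confirming that the C port reproduces the pure-R implementation on the rows where both produce values before relying on the fast version:
head(merge(SPY.EMA, SPY.EMA.c), 20)                    # eyeball the two series side by side
all.equal(as.numeric(SPY.EMA), as.numeric(SPY.EMA.c))  # flags any numerical discrepancies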
