I'm fitting a parametric model to some survival data with time-dependent covariates. The fitting procedure involves solving some ODEs iteratively - one ODE per time-interval per subject, but such that the initial condition for the ODE on the interval at hand is the last value of the solution to the ODE on the preceding interval. In that sense, the ODEs depend on each other.
My problem boils down to this: Right now, I'm solving these ODEs iteratively through a loop, since I need to use the last value of the previous solution as the starting point for the next. The problem is that this looping consumes a lot of time for large datasets. Is there some way in which I can use, say, vapply, or another vectorized function, to do the same thing?
I've been searching the archives, but nothing comes up as a solution to the problem of vectorizing an operation that depends on the previous value.
Here's a code example that doesn't produce anything statistically meaningful on its own, but illustrates my problem:
library(deSolve)  # ode() comes from the deSolve package
param <- c(a=1)
df <- function(t, state, param){
  with( as.list(c(state, param)), {dX<-a*X; list(c(dX))} )
}
Data.i <- data.frame( lt=seq(0, 5, length=10)[-10],rt=seq(0, 5, length=10)[2:10], X=rnorm(9) )
Result <- vector(length=10)
Result[1] <- Data.i$X[1]
init <- c(X=Data.i$X[1])
for (k in 1:9){
  t.seq <- seq(Data.i$lt[k],Data.i$rt[k],length=10)
  # last row of the solution, without the time column
  sol <- as.numeric(ode(y = init, times = t.seq, func = df, parms = param)[10,-1])
  # X is taken from Data.i (Data.i has 9 rows, so X[10] is NA)
  Result[k+1] <- log(sol+Data.i$X[k+1])
  init <- c(X=sol)  # the next interval starts where this one ended
}
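One way to write this kind of recurrence without an explicit for loop is Reduce() with accumulate = TRUE. What follows is only a minimal sketch: the hypothetical advance() function uses the closed-form solution of dX/dt = a*X as a stand-in for the ode() call, and x0 is an arbitrary starting value. Reduce() still evaluates the steps one after another, so it tidies the code rather than removing the sequential dependence.

```r
a  <- 1
lt <- seq(0, 5, length.out = 10)[-10]  # left interval endpoints
rt <- seq(0, 5, length.out = 10)[-1]   # right interval endpoints
x0 <- 0.5                              # arbitrary initial value (assumption)

# Hypothetical stand-in for one ode() call: advance X across interval k
# using the exact solution X(rt) = X(lt) * exp(a * (rt - lt)).
advance <- function(x, k) x * exp(a * (rt[k] - lt[k]))

# states[1] is x0; states[k + 1] is the value at the end of interval k.
states <- Reduce(advance, seq_along(lt), x0, accumulate = TRUE)
```

In the real problem, advance() would call ode() on interval k and return the last row of the solution. Whether that is faster than the loop is doubtful, since the cost sits in the ODE solves themselves.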
Related
I am trying to create a function that applies Monte Carlo simulation to two of the variables in a DCF model in RStudio. It is supposed to take a first value FCF_0 and apply a specific growth rate to it, FCF_0*(1 + growth), where growth is the first input variable. This continues until period 6, with each period growing from the previous FCF. After that I would like to discount each cash flow to get its present value, which would be FCF_n*(1/((1+WACC)^n)), where WACC is the second variable to simulate.
So far I have a function that calculates the FCF from a vector of specific growth values, which is the following:
This is my attempt so far at the simulation function, but I think it is bad.
Could you please help me understand how to create both simulations, and whether I need to create two functions or can do everything in one? I would expect the function to return the sum of all present values, with each sum being one element in a vector of 10,000 simulations. I am new at this and, even though I have been reading for almost two weeks, I don't get how to create these simulations.
Thank you very much!
revfunc <- function(hist, growth){
  rval <- c()
  help <- c(hist)
  for(i in growth){
    help <- help*(1+i)
    rval <- c(rval, help)
  }
  return(rval)
}
Monte Carlo Simulations
pvffcf_function <- function(fcf0, growth, wacc){
  rval1 <- c()
  help <- c(fcf0)
  pvs <- rval1*(1/((1+wacc)^n))
  random_growth <- rnorm(n=10000, mean(fcfgrowth), sd(fcfgrowth))
  wacc <- rnorm(n=10000, 0.03804, 0.007711)
  pvffcf <- sum(freecashflows)
  for(i in growth){
    help <- help*(1+i)
    rval1 <- c(rval1, help)
  }
  return(freecashflows)
}
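A single function can handle both simulations. The sketch below is one possible reading of the description, not a definitive implementation: it assumes growth is drawn independently for every period of every simulation, WACC is drawn once per simulation, and both are normally distributed. The name simulate_dcf, its arguments, and the growth mean and standard deviation in the example call are made up for illustration; only the WACC figures come from the code above.

```r
simulate_dcf <- function(fcf0, growth_mean, growth_sd, wacc_mean, wacc_sd,
                         n_periods = 6, n_sims = 10000) {
  # One row per simulation, one column per period.
  growth <- matrix(rnorm(n_sims * n_periods, growth_mean, growth_sd),
                   nrow = n_sims)
  # Cumulative growth factors along each row give FCF_1 ... FCF_n.
  fcf <- fcf0 * t(apply(1 + growth, 1, cumprod))
  # One WACC draw per simulation; discount[i, n] = (1 + wacc_i)^n.
  wacc <- rnorm(n_sims, wacc_mean, wacc_sd)
  discount <- outer(1 + wacc, seq_len(n_periods), `^`)
  # Sum of present values per simulation.
  rowSums(fcf / discount)
}

set.seed(1)
pv <- simulate_dcf(fcf0 = 100, growth_mean = 0.05, growth_sd = 0.02,
                   wacc_mean = 0.03804, wacc_sd = 0.007711)
```

pv is then a vector of 10,000 present-value sums, which can be summarised with summary(pv) or hist(pv).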
I am trying to calculate the area under the curve for every 10ms of a short piece of EEG wave. To first practice this I made a small dataset to run the auc (from package {flux}) function on.
x <- seq(1:10)
y <- c(0:4,5:1)
df <- data.frame(x,y)
attach(df)
plot(x,y)
for (i in 1:10){
  x1 <- c(i,(i+1))
  y1 <- c(subset(y, x == i),subset(y, x == (i+1)))
  auc(x1,y1,thresh = 0)
  rm(y1,x1,i)
}
The loop should subset two data points from each variable and then run auc over those data points. However, when running the loop, I get this error:
Error in seq.default(x[1], x[2], length.out = dens) : 'to' must be a finite number
When I run the subset and auc code outside of the loop, it works no problem. Can anyone tell me what's going wrong in the loop?
Thanks for updating the question. It's not because of the control statement (for loop); the error gets thrown precisely when i=10 -- because your x-coords and y-coords vectors then have different lengths. Specifically c(10,11) vs c(1): you have no point at x=11!
Just stop the loop early, at i=9 (for (i in 1:9)), so that i+1 never goes past the last point.
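If the goal is simply the area between each pair of neighbouring points, the loop can be avoided altogether. This is a sketch using the plain trapezoid rule in base R rather than flux::auc, so it is a swapped-in technique and may not match auc's handling of thresholds or interpolation.

```r
x <- 1:10
y <- c(0:4, 5:1)

# Trapezoid rule on each pair of neighbouring points:
# width of the interval times the mean of its two heights.
areas <- diff(x) * (head(y, -1) + tail(y, -1)) / 2

areas       # one area per interval: 9 values for 10 points
sum(areas)  # total area under the curve
```

Because diff(), head() and tail() work on whole vectors, there is no index that can run past the last point.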
I'm using the deSolve package to plot a couple differential equations (read if interested http://www.maa.org/press/periodicals/loci/joma/the-sir-model-for-spread-of-disease-the-differential-equation-model).
My eventual goal is to create an iterative function or process (for loop) to plot how changes in certain parameters (beta and gamma) will affect the solution. The preferred output would be a list that contains the ode solution for each specified value of beta in the loop. I'm running into issues integrating a loop into the setup that the deSolve package requires for the ode function.
In the code below, I am trying to plot how the range of values (1 to 2 by increments of 0.1) in parameter beta will affect the plot of the differential equations.
for(k in seq(1,2,by=0.1)){ #range of values for beta
init <- c(S=1-1e-6, I=1e-6, R=0) #initial conditions for odes
time <- seq(0,80,by=1) #time period
parameters <- c(beta=k, gamma=0.15) #parameters in ode
SIR <- function(time,state,parameters){ #function containing equations
with(as.list(c(state,parameters)),{
dS <- -beta*S*I
dI <- beta*S*I-gamma*I
dR <- gamma*I
return(list(c(dS,dI,dR)))
})
}
ode(y=init,times=time,func=SIR()[beta],parms=parameters[k])}
}
The first error I'm getting states that the argument parameters in the SIR function is missing
Error in as.list(c(init, parameters)) : argument "parameters" is missing, with no default
I don't understand why this error is being reported when I've assigned parameters in the lines previous.
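The error comes from func=SIR()[beta]: that calls SIR with no arguments instead of handing the function itself to ode(), so pass func=SIR. You might as well define your gradient function (and the other non-changing elements) outside the loop: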
SIR <- function(time,state,parameters) {
  with(as.list(c(state,parameters)),{
    dS <- -beta*S*I
    dI <- beta*S*I-gamma*I
    dR <- gamma*I
    return(list(c(dS,dI,dR)))
  })
}
init <- c(S=1-1e-6, I=1e-6, R=0) #initial conditions for odes
time <- seq(0,80,by=1) #time period
Now define the vector of values to try (not necessary but convenient):
betavec <- seq(1,2,by=0.1)
and define a list to hold the results:
res <- vector(length(betavec),mode="list")
library(deSolve)
for (k in seq_along(betavec)){ #range of values for beta
  res[[k]] <- ode(y=init,times=time,func=SIR,
                  parms=c(beta=betavec[k], gamma=0.15))
}
Now you have a list, each element of which contains the results from one run. You can sapply or lapply over this list, e.g. to get a matrix of the last states from each run:
t(sapply(res,tail,1))
Or if you want the results as one long data frame ...
names(res) <- betavec ## to get beta value incorporated in results
dd <- dplyr::bind_rows(lapply(res,as.data.frame),.id="beta")
dd$beta <- as.numeric(dd$beta)
do.call(rbind,...) would work nearly as well as bind_rows(), but bind_rows's .id argument is convenient for adding the beta values to each data frame. You could also leave the results as a list and loop over them while plotting with separate lines() calls, or (e.g.) bind just the infective columns together and use matplot() to plot them all at the same time. This is just a matter of style and idiom.
library(ggplot2); theme_set(theme_bw())
library(viridis)
ggplot(dd,aes(x=time,y=I,colour=beta))+
  geom_line(aes(group=beta))+
  scale_color_viridis()+
  scale_y_log10()
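For completeness, the base R alternatives mentioned above (do.call(rbind, ...) and binding the infective columns for matplot()) might look roughly like this. The res list here is a made-up stand-in with the same shape as the ode() output, so the snippet runs without deSolve; the numbers carry no meaning.

```r
# Stand-in for the list of ode() results: one matrix per beta value
# (made-up numbers, only the shape matters here).
res <- list("1"   = cbind(time = 0:2, I = c(0.01, 0.02, 0.04)),
            "1.1" = cbind(time = 0:2, I = c(0.01, 0.03, 0.06)))

# Base R alternative to bind_rows(): tag each run with its beta, then stack.
dd <- do.call(rbind, lapply(names(res), function(b) {
  data.frame(beta = as.numeric(b), res[[b]])
}))

# One column of I per beta value, ready for matplot().
I_mat <- sapply(res, function(m) m[, "I"])
```

With the real results, matplot(res[[1]][, "time"], I_mat, type = "l", log = "y") would draw all the infective curves in one call.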
I've been trying hard to recreate this model in R:
[Image: model equation] (Farhani 2012)
I've tried many things, such as a cumsum paste - however that would not work as I could not assign strings the correct variable as it kept thinking that L was a function.
I tried to do it manually, I'm only looking for p,q = 1,2,3,4,5 however after starting I realized how inefficient this is.
This is essentially what I am trying to do
model5 <- vector("list",20)
#p=1-5, q=0
model5[[1]] <- dynlm(DLUSGDP~L(DLUSGDP,1))
model5[[2]] <- dynlm(DLUSGDP~L(DLUSGDP,1)+L(DLUSGDP,2))
model5[[3]] <- dynlm(DLUSGDP~L(DLUSGDP,1)+L(DLUSGDP,2)+L(DLUSGDP,3))
model5[[4]] <- dynlm(DLUSGDP~L(DLUSGDP,1)+L(DLUSGDP,2)+L(DLUSGDP,3)+L(DLUSGDP,4))
model5[[5]] <- dynlm(DLUSGDP~L(DLUSGDP,1)+L(DLUSGDP,2)+L(DLUSGDP,3)+L(DLUSGDP,4)+L(DLUSGDP,5))
I'm also trying to do this for regressing DLUSGDP on DLWTI (my oil variable's name) for when p=0, q=1-5 and also p=1-5, q=1-5
cumsum would not work as it would sum the variables rather than treating them as independent regressors.
My goal is to run these models and then use IC to determine which should be analyzed further.
I hope you understand my problem and any help would be greatly appreciated.
I think this is what you are looking for:
reformulate(paste0("L(DLUSGDP,", 1:n,")"), "DLUSGDP")
where n is some order you want to try. For example,
n <- 3
reformulate(paste0("L(DLUSGDP,", 1:n,")"), "DLUSGDP")
# DLUSGDP ~ L(DLUSGDP, 1) + L(DLUSGDP, 2) + L(DLUSGDP, 3)
Then you can construct your model fitting by
model5 <- vector("list",20)
for (i in 1:20) {
  form <- reformulate(paste0("L(DLUSGDP,", 1:i,")"), "DLUSGDP")
  model5[[i]] <- dynlm(form)
}
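The same idea extends to the oil variable. The sketch below builds one formula for every combination of p and q from 0 to 5 (skipping p = q = 0). The helper name make_formula is hypothetical, and it assumes the q lags of DLWTI run from 1 to q, as described in the question.

```r
# Build DLUSGDP ~ L(DLUSGDP, 1..p) + L(DLWTI, 1..q) for given lag orders.
make_formula <- function(p, q, y = "DLUSGDP", x = "DLWTI") {
  terms <- c(if (p > 0) paste0("L(", y, ",", seq_len(p), ")"),
             if (q > 0) paste0("L(", x, ",", seq_len(q), ")"))
  reformulate(terms, response = y)
}

# All (p, q) pairs except the empty model p = q = 0.
grid <- expand.grid(p = 0:5, q = 0:5)
grid <- grid[grid$p + grid$q > 0, ]

# One formula per row of the grid.
forms <- Map(make_formula, grid$p, grid$q)
```

Assuming dynlm is loaded and both series are in the workspace, models <- lapply(forms, dynlm) fits them all, and sapply(models, AIC) gives one information criterion per row of grid.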
I have to do a Monte Carlo approach for AR(1) time series. I have to generate 10,000 time series of length 100, and afterwards I have to get the first-step autocorrelation rho_1 for every time series. My problem is that I just get NA values for the autocorrelation and the calculation takes way too much time. I have no problem with computing the AR(1) time series.
Thank you for your help :)
gen_ar <- function(a,b,length,start)
{
  z<-rep(0,length)
  e<-rnorm(n=length,sd=1)
  z[1]<-start
  for (i in 2:length)
  {
    z[i]<-a+b*z[i-1]+e[i]
  }
  z
}
mc <- matrix(c(rep(0,10000000)),nrow=10000)
for (i in 1:10000)
{
  mc[i,] <- gen_ar(0.99,1,100,0)
}
ac <- matrix(c(rep(0,10000)),nrow=1)
for (i in 1:10000){
  for (j in 1:99){
    ac[i] <- cor(mc[i,j],mc[i,j+1])
  }
}
Statistics aside, I think this achieves your goals, and I don't get NA's. I changed the way it was done b/c you said it was going slow.
mc <- matrix(NA_real_, nrow=100, ncol=1E4)  # one column per simulated series
for(i in seq_len(ncol(mc))){
  mc[,i] <- arima.sim(model=list(ar=0.99), n=100, sd=1) + 1
}
myAR <- function(x){
cor(x[-1], x[-length(x)])
}
answer <- apply(mc, 2, myAR)
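I skipped the last set of nested for loops and replaced them with apply(). It seems easier to read, and is likely faster. Also, to use apply(), I created a function called myAR, which computes the correlation between each series and its own first lag. Your loop got NA because cor() was called on two single numbers at a time, and the correlation of two single values is undefined.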
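As a small self-contained illustration of the same pattern, here is a sketch that simulates a few AR(1) series and estimates the lag-1 autocorrelation of each with apply(). The coefficient phi = 0.5, the number of series, and the use of stats::filter() for the recursion are my own choices for the demo, not taken from the question.

```r
set.seed(42)
n_series <- 200   # number of simulated series (demo value)
n_obs    <- 100   # length of each series
phi      <- 0.5   # AR(1) coefficient (demo value)

# Recursive filter: z[t] = e[t] + phi * z[t-1]; one column per series.
sim <- replicate(n_series,
                 as.numeric(stats::filter(rnorm(n_obs), phi,
                                          method = "recursive")))

# Lag-1 autocorrelation: correlate the series with itself shifted by one.
lag1 <- function(x) cor(x[-1], x[-length(x)])
rho1 <- apply(sim, 2, lag1)

# By contrast, cor() of two single numbers is NA:
# cor(sim[1, 1], sim[2, 1])
```

mean(rho1) should land reasonably close to phi, with some downward bias from the short series length.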
Now, there are a couple of statistical adjustments that I made. Primarily, these were in the simulation step.
First, your simulated AR(1) process has a coefficient that is equal to 1, which seems odd to me (this would not be stationary, and arima.sim() won't even let you simulate this type of process).
Moreover, your "a" parameter adds a constant (0.99 in your call) to the time series at each time step. In other words, because the coefficient is equal to 1, your time series is a random walk with drift that climbs from 0 to roughly 100. This too would make your time series nonstationary, and with such a strong positive slope the cor() function would likely return a value close to 1 as the estimated correlation, regardless of the value of the simulated AR coefficient. I assume that you wanted the long-term mean to hover near 1, so the 1 is simply added to the entire time series after it is simulated, not iteratively at each time step.
Assuming that you did want to generate a nonstationary time series by adding some constant (a) at each time step, you could do the following:
myInnov <- function(N=100, a=1, SD=1) {a + rnorm(n=N, sd=SD)}
mc2 <- matrix(NA_real_, nrow=100, ncol=1E4)  # 10,000 series, one per column
for(i in seq_len(ncol(mc2))){
  mc2[,i] <- arima.sim(model=list(ar=0.99), n=100, innov=myInnov(a=1, N=100, SD=1)) + 1
}
I hope that this helps.