For loop for time series data - r

Suppose I have the following simple formula
y[t] = alpha * y[t-1] + beta * y[t-2]
Where y[t-1] and y[t-2] are known observations.
Now the next observation at time t+1 can be written as
y[t+1] = alpha * y[t] + beta * y[t-1]
Using previous information of y[t] yields
y[t+1] = alpha * (alpha * y[t-1] + beta * y[t-2]) + beta * y[t-1]
Simplifying
y[t+1] = alpha^2 * y[t-1] + alpha * beta * y[t-2] + beta * y[t-1]
The next observation
y[t+2] = alpha^2 * y[t] + alpha * beta * y[t-1] + beta * y[t]
Where again y[t] can be inserted.
Now my question is how do I write a for loop such that I can construct y at time t+h where h is an integer value.

Something like this?
y = c(1, 2, rep(0, 49))
alpha <- .4
beta <- .9
for (t in 3:length(y)) {
y[t] <- alpha * y[t-1] + beta * y[t-2]
}
nms <- paste0('y[', as.character(seq(-2, 48, 1)), ']')
`names<-`(y, nms)
#> y[-2] y[-1] y[0] y[1] y[2] y[3]
#> 1.000000 2.000000 1.700000 2.480000 2.522000 3.240800
#> y[4] y[5] y[6] y[7] y[8] y[9]
#> 3.566120 4.343168 4.946775 5.887561 6.807122 8.021654
#> y[10] y[11] y[12] y[13] y[14] y[15]
#> 9.335072 10.953517 12.782971 14.971354 17.493216 20.471505
#> y[16] y[17] y[18] y[19] y[20] y[21]
#> 23.932496 27.997353 32.738188 38.292893 44.781526 52.376214
#> y[22] y[23] y[24] y[25] y[26] y[27]
#> 61.253859 71.640136 83.784528 97.989934 114.602048 134.031760
#> y[28] y[29] y[30] y[31] y[32] y[33]
#> 156.754547 183.330403 214.411254 250.761864 293.274874 342.995627
#> y[34] y[35] y[36] y[37] y[38] y[39]
#> 401.145637 469.154319 548.692801 641.716008 750.509925 877.748377
#> y[40] y[41] y[42] y[43] y[44] y[45]
#> 1026.558283 1200.596853 1404.141196 1642.193646 1920.604534 2246.216095
#> y[46] y[47] y[48]
#> 2627.030519 3072.406693 3593.290144
Created on 2021-06-12 by the reprex package (v2.0.0)

Related

Creating a 3D Plot of a Polynomial Function with Uniform Distributed Values

I have an equation which goes like this,
2* (1-x-a-b)^2 * x * *theta* + 2 * (1-a-b-x) * x^2 * *theta* - 2 * b * x^2 + 2 * a * (1-a-b-x)^2 = 0
I want to create a function in R, that selects a and b with restriction (a + b < 1 - a + b) from an uniform distribution. After selecting, I want it to find the solutions for x (both negative and positive).
I want to repeat this process t amount of time in a for loop where I will give the theta value as an input.
After that I want it to create a 3D density plot where solutions are shown with respect to values of a,b on two axes and x on one axis.
So far I have tried to use polynom package and solve function. But I am having hard time with R when it comes to mathematics.
You need to rewrite the polynomial in standard form a0 + a1*x + a2*x^2 + a3*x^3, then you can use the base function polyroot() to find the roots. For example,
a0 <- 2 * a * (1 - a - b)^2
a1 <- 2 * (1 - a - b)^2 * theta - 4 * a * (1 - a - b)
a2 <- -4 * (1 - a - b) * theta + 2 * (1 - a - b) * theta - 2 * b + 2 * a
a3 <- 0
So this is a quadratic equation, not a cubic as it appears at first glance.
Then use
polyroot(c(a0, a1, a2))
to find the roots. Select the real roots, and put them together into a matrix roots with columns a, b, root, then use rgl::plot3d(roots) to display them.
I think you have a typo in your restriction, so I'll ignore it, and this is the plot I get for theta == 1:
theta <- 1
a <- runif(1000)
b <- runif(1000)
a0 <- 2*a*(1-a-b)^2
a1 <- 2*(1-a-b)^2*theta -4*a*(1-a-b)
a2 <- -4*(1-a-b)*theta + 2*(1-a-b)*theta-2*b+2*a
result <- matrix(numeric(), ncol = 3, dimnames = list(NULL, c("a", "b", "root")))
for (i in seq_along(a)) {
root <- polyroot(c(a0[i], a1[i], a2[i]))
if (max(Im(root)) < 1.e8)
result <- rbind(result, cbind(a[i], b[i], Re(root)))
}
library(rgl)
plot3d(result)
Created on 2022-06-14 by the reprex package (v2.0.1)
Most of the roots are really small, but for some of them a2 is nearly zero, and then they can be very large.
You can create a table with a column for each variable and filter the rows not satisfying your equation:
library(tidyverse)
set.seed(1337)
n <- 1000
tibble(
a = runif(n),
b = runif(n)
) |>
filter(a + b < 1 - a + b) |>
expand_grid(
theta = seq(0, 1, by = 1),
x = seq(0, 1, by = 1)
) |>
filter(
2 * (1 - x - a - b)^2 * x * theta + 2 * (1 - a - b - x) * x^2 * theta - 2 *
b * x^2 + 2 * a * (1 - a - b - x)^2 == 0
)
#> # A tibble: 0 × 4
#> # … with 4 variables: a <dbl>, b <dbl>, theta <dbl>, x <dbl>
Created on 2022-06-13 by the reprex package (v2.0.0)
Unfortunately, there is no point in the sampled space satisfying your equation. This is probably due to ==0 instead of <e where e is a very small error. One needs to allow small errors in numerical sampling solutions.
Why just not solve the roots of the equation analytically?

Solve Equation in R for L

I have the following equation and would like R to solve for L.
Any thought?
Average = 370.4
m = 2
p = 0.2
n = 5
#L = ?
log10(Average) = 0.379933834 -0.107509315* m + 0.104445717 * p + 0.016517169 * n -0.025566689* L + 0.014393465 * m * p + 0.001601271 * m * n - 0.014250365 * n * L + 0.002523518 * m^2 + 0.237090759 * L^2
Your equation is a quadratic, so the quadratic formula works. Alternatively, you can solve numerically using uniroot:
Average = 370.4
m = 2
p = 0.2
n = 5
#L = ?
f0 <- function(L) {
0.379933834 - 0.107509315*m + 0.104445717*p + 0.016517169*n - 0.025566689*L + 0.014393465*m*p + 0.001601271*m*n - 0.014250365*n*L + 0.002523518*m^2 + 0.237090759*L^2 - log10(Average)
}
# solve numerically using uniroot
(nroots <- c(uniroot(f0, c(0, 10))$root, uniroot(f0, c(-10, 0))$root))
#> [1] 3.304099 -2.895724
# solve analytically using the quadratic formula
a <- 0.237090759
b <- -0.025566689 - 0.014250365*n
c <- 0.379933834 - 0.107509315*m + 0.104445717*p + 0.016517169*n + 0.014393465*m*p + 0.001601271*m*n + 0.002523518*m^2 - log10(Average)
(aroots <- (-b + c(1, -1)*sqrt(b^2 - 4*a*c))/(2*a))
#> [1] 3.304084 -2.895724
# check the solutions
f0(c(nroots, aroots))
#> [1] 2.255707e-05 -5.932209e-08 4.440892e-16 4.440892e-16

How can I code the mean and standard deviation for a weighted binomial distribution?

I'm working on a genetics problem where I have 20 genes which each have two alleles. This results in 40 values that can be 1 or 0.
For this distribution I get an expected value of 20 (np) and a variance of 10 (np(1-p)) because n=40 and p=0.5 (see here).
But I weight the contribution of each of these genes. The weights are calculated as follows:
res <- optimize(function(lambda) (sum(exp(-lambda * (1:20))) -5)^2, 0:1, tol = .Machine$double.eps)
res
x <- c(1:20)
lambda <- res$minimum
y<-exp(-lambda*x)
Note that because each of the genes has 2 alleles, each weight is used twice.
gene1.1 * weight1 + gene1.2 * weight 1 + gene2.1 * weight2 + gene2.2 * weight2...
I want to calculate the expected value and variance of this new distribution but I'm not sure how to do this in R. Indeed I don't know the mathematical form of this at all.
Hope you can help
Given n Bernoulli random variables X1, ..., X20 with the same parameter p and weights w1, ..., w20, the expectation of their sum is
E[sumin wiXi] = p sumin wi
and the variance is
Var[sumin wiXi] = sumin wi2Var[Xi] = p(1-p) sumin wi2
This gives
p <- 0.5
n <- 20
# No weights
2 * n * p # Mean
# [1] 20
2 * n * p * (1 - p) # Variance
# [1] 10
# Weights
2 * sum(y) * p # Mean
# [1] 5
2 * p * (1 - p) * sum(y^2) # Variance
# [1] 1.172048
# Unweighted case again
y <- rep(1, n)
2 * sum(y) * p # Mean
# [1] 20
2 * p * (1 - p) * sum(y^2) # Variance
# [1] 10

Mixed Integer Programming in R - Indicator functions

I hope this message finds you well.
I am trying to solve an optimization problem formulated as a Mixed Integer Program with the lpSolveAPI R-package. However, there are indicator functions in the objective function and in some constraints. To be more specific, consider the following optimization problem:
min{ 2.8 * x1 + 3.2 * x2 + 3.5 * x3 +
17.5 * delta(x1) + 2.3 * delta(x2) + 5.5 * delta(x3) }
subject to:
0.4 * x1 + 8.7 * x2 + 4.5 * x3 <=
387 - 3 * delta(x1) - 1 * delta(x2) - 3 * delta(x3)
x1 <= 93 * delta(x1)
x2 <= 94 * delta(x2),
x3 <= 100 * delta(x3), and
x1, x2, and x3 are non-negative integers.
In this problem, for all i in {1, 2, 3}, delta(xi) = 1 if xi > 0, whereas delta(xi) = 0 otherwise.
The R-code I have so far is:
install.packages("lpSolveAPI")
library(lpSolveAPI)
a <- c(3, 1, 3)
b <- c(0.4, 8.7, 4.5)
q <- 387
M <- c(93, 94, 100)
A <- c(17.5, 2.3, 5.5)
h <- c(2.8, 3.2, 3.5)
Fn <- function(u1, u2, u3, u4){
lprec <- make.lp(0, 3)
lp.control(lprec, "min")
set.objfn(lprec, u1)
add.constraint(lprec, u2, "<=", u3)
set.bounds(lprec, lower = rep(0, 3), upper = u4)
set.type(lprec, columns = 1:3, type = "integer")
solve(lprec)
return(list(Soln = get.variables(lprec), MinObj = get.objective(lprec)))
}
TheTest <- Fn(u1 = h, u2 = b, u3 = q, u4 = M)
Please, I was wondering if someone could tell me how to put delta functions into this R-code to solve the aforementioned optimization problem.
Rodrigo.
A constraint like x1 <= 93 * delta(x1) looks very strange to me. I think this is just x1 <= 93. For a MIP solver replace the function delta(x) by a binary variable d. Then add the constraint d <= x <= M*d where M is an upper bound on x. To be explicit, for your model we have:
min 2.8*x1 + 3.2*x2 + 3.5*x3 + 17.5*d1 + 2.3*d2 + 5.5*d3
0.4*x1 + 8.7*x2 + 4.5*x3 <= 387 - 3*d1 - d2 - 3*d3
d1 <= x1 <= 93*d1
d2 <= x2 <= 94*d2
d3 <= x3 <= 100*d3
x1 integer in [0,93]
x2 integer in [0,94]
x3 integer in [0,100]
d1,d2,d3 binary
This is now trivial to solve with any MIP solver. Note that a double inequality like d1 <= x1 <= 93*d1 can be written as two inequalities: d1<=x1 and x1<=93*d1.

How to structure ODEs in R based on multiple groups

I am trying to simulate cell uptake in R, having ported a model from Berkeley Madonna. The model is comprised of several constants and differential equations to calculate amounts and concentrations. A portion of the code is listed:
library(deSolve)
fb = 0.0510
Km = 23.5
Pdif = 0.429
Vmax = 270
Vol_cell = 9.33
Vol_media = 150
S = 10 #concentration of dosing media
yini = c(Amt_media=(S*Vol_media)-(S*fb*Vol_cell),
Amt_cell=S*fb*Vol_cell,
Amt_total=S*Vol_media,
Con_media=S-(S*fb),
Con_cell=S*fb)
Uptake = function(t, y, p){
dy1 = (- (Pdif * y[1]) + (Pdif * y[2]) - ((Vmax * y[4])/(Km + y[4])))
dy2 = (+ (Pdif * y[1]) - (Pdif * y[2]) + ((Vmax * y[4])/(Km + y[4])))
dy3 = dy1 + dy2
dy4 = dy1 / Vol_media
dy5 = dy2 / Vol_cell
list(c(dy1, dy2, dy3, dy4, dy5))}
times1 = seq(from=0, to=15, by=0.01)
out1 = ode(y=yini, times=times1, func=Uptake, parms=NULL, method="rk4")
The rest of the code is for output to dataframes and plotting. My question then is how to have the code structured to use "S" as a list of several concentrations such that each concentration can be applied to the differential equations (essentially giving me an out1 for S1, out2 for S2, etc, that can then be passed onto a dataframe)? In Berkeley Madonna this was achieved by writing over 35 differential equations, though I'd like to use a simplified approach in R if possible.
The only part where S is used is in the initialization of the yini values. Basically we just need to move that part and the part that runs ode with those values into a new function. Then you can call that function for what ever values you want. For example
#set up
library(deSolve)
fb <- 0.0510
Km <- 23.5
Pdif <- 0.429
Vmax <- 270
Vol_cell <- 9.33
Vol_media <- 150
Uptake <- function(t, y, p){
dy1 = (- (Pdif * y[1]) + (Pdif * y[2]) - ((Vmax * y[4])/(Km + y[4])))
dy2 = (+ (Pdif * y[1]) - (Pdif * y[2]) + ((Vmax * y[4])/(Km + y[4])))
dy3 = dy1 + dy2
dy4 = dy1 / Vol_media
dy5 = dy2 / Vol_cell
list(c(dy1, dy2, dy3, dy4, dy5))}
times1 <- seq(from=0, to=15, by=0.01)
# function with S as a parameter
runConc <- function(S) {
yini <- c(Amt_media=(S*Vol_media)-(S*fb*Vol_cell),
Amt_cell=S*fb*Vol_cell,
Amt_total=S*Vol_media,
Con_media=S-(S*fb),
Con_cell=S*fb)
ode(y=yini, times=times1, func=Uptake, parms=NULL, method="rk4")
}
#run for concentrations 10,20,30
out <- lapply(c(10,20,30), runConc)
This will result in a list object with the results for each concentration. So out[[1]] is the result for S=10, out[[2]] is S=20, etc. We can see the first few lines of each of the results with
lapply(out, head, 3)
# [[1]]
# time Amt_media Amt_cell Amt_total Con_media Con_cell
# [1,] 0.00 1495.242 4.75830 1500 9.490000 0.510000
# [2,] 0.01 1488.103 11.89710 1500 9.442408 1.275145
# [3,] 0.02 1481.028 18.97216 1500 9.395241 2.033457
#
# [[2]]
# time Amt_media Amt_cell Amt_total Con_media Con_cell
# [1,] 0.00 2990.483 9.51660 3000 18.98000 1.020000
# [2,] 0.01 2976.550 23.44980 3000 18.88711 2.513377
# [3,] 0.02 2962.739 37.26072 3000 18.79504 3.993646
#
# [[3]]
# time Amt_media Amt_cell Amt_total Con_media Con_cell
# [1,] 0.00 4485.725 14.27490 4500 28.47000 1.53000
# [2,] 0.01 4465.153 34.84653 4500 28.33286 3.73489
# [3,] 0.02 4444.761 55.23920 4500 28.19690 5.92060

Resources