I am trying to break apart the R code in this post:
x <- c(0.17,0.46,0.62,0.08,0.40,0.76,0.03,0.47,0.53,0.32,0.21,0.85,0.31,0.38,0.69)
convolve.binomial <- function(p) {
  # p is a vector of probabilities of Bernoulli distributions.
  # The convolution of these distributions is returned as a vector
  # `z` where z[i] is the probability of i-1, i=1, 2, ..., length(p)+1.
  n <- length(p) + 1
  z <- c(1, rep(0, n-1))
  sapply(p, function(q) {z <<- (1 - q) * z + q * (c(0, z[-n])); q})
  z
}
convolve.binomial(x)
[1] 5.826141e-05 1.068804e-03 8.233357e-03 3.565983e-02 9.775029e-02
[6] 1.804516e-01 2.323855e-01 2.127628e-01 1.394564e-01 6.519699e-02
[11] 2.141555e-02 4.799630e-03 6.979119e-04 6.038947e-05 2.647052e-06
[16] 4.091095e-08
I tried debugging in RStudio, but it is still opaque.
The issue is with the line: sapply(p, function(q) {z <<- (1 - q) * z + q * (c(0, z[-n])); q}).
I guess that within the context of the call convolve.binomial(x) we have p = q = x. At least I get identical results if I pull the lines out of the function and run sapply(x, function(x) {z <<- (1 - x) * z + x * (c(0, z[-n])); x}):
x <- c(0.17,0.46,0.62,0.08,0.40,0.76,0.03,0.47,0.53,0.32,0.21,0.85,0.31,0.38,0.69)
n <- length(x) + 1
z <- c(1, rep(0, n-1))
# [1] 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
sapply(x, function(x) {z <<- (1 - x) * z + x * (c(0, z[-n])); x})
z # Is extracted by calling it and contains the correct result
My questions are:
What is the purpose of the ;q} ending within sapply()?
How does it relate to the <<- symbol, meant to make z accessible outside of the "implicit" loop that is sapply()?
Below you can see my problem "hacking" this line of code:
(x_complem = 1 - x)
sapply(x, function(x) {z <<- x_complem * z + x * (c(0, z[-n])); x})
z # Returns 16 values, plus recycling warnings: x_complem (length 15) is recycled against z (length 16)
z_offset = c(0, z[-n])
# [1] 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
sapply(x, function(x) {z <<- (1 - x) * z + x * z_offset; x})
z # Returns different values: z_offset was computed once from the initial z and never updated inside the loop
If you want to see the intermediate values of z as the function proceeds then insert either a cat or a print command in the code below:
sapply(x, function(x) {z <<- (1 - x) * z + x * (c(0, z[-n])); cat(z,"\n"); x})
#--------
0.83 0.17 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0.4482 0.4736 0.0782 0 0 0 0 0 0 0 0 0 0 0 0 0
0.170316 0.457852 0.323348 0.048484 0 0 0 0 0 0 0 0 0 0 0 0
0.1566907 0.4348491 0.3341083 0.07047312 0.00387872 0 0 0 0 0 0 0 0 0 0 0
0.09401443 0.3235858 0.3744046 0.1759272 0.03051648 0.001551488 0 0 0 0 0 0 0 0 0 0
0.02256346 0.1491116 0.3357823 0.3267701 0.1410286 0.02356488 0.001179131 0 0 0 0 0 0 0 0 0
snipped rest of output
I think this makes it clearer what is happening: each intermediate step represents the probability distribution for a partial set of events. Each row sums to 1.0 and gives the probabilities of the individual counts when only that many of the Bernoulli parameters have been convolved in. The final result displays the probabilities of the possible sums of counts after the full sequence has been assembled.
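Since each update mixes z with its shifted copy using weights (1 - q) and q that sum to one, total probability is preserved at every step. A quick check of the row-sum claim (a sketch, reusing x from above):
n <- length(x) + 1
z <- c(1, rep(0, n - 1))
sapply(x, function(q) {z <<- (1 - q) * z + q * c(0, z[-n]); sum(z)})
# every returned value is 1 (up to floating-point error)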
Another interesting feature is that this result is invariant under random re-ordering of the probabilities in x (as it should be for the original question). Compare the results from:
z <- convolve.binomial(x)
plot(seq(length(z)), z, type = "l")
z2 <- convolve.binomial(sample(x))
lines(seq(length(z2)), z2, col = "red")
z3 <- convolve.binomial(sample(x))
lines(seq(length(z3)), z3, col = "blue")
What is the purpose of the ;q} ending within sapply()?
The anonymous function within sapply returns q, but that return value is actually not needed. The following function works just the same:
convolve.binomial <- function(p) {
  n <- length(p) + 1
  z <- c(1, rep(0, n-1))
  sapply(p, function(q) {z <<- (1 - q) * z + q * (c(0, z[-n]))})
  z
}
How does it relate to the <<- symbol, meant to make z accessible outside of the "implicit" loop that is sapply()?
In R, if you look up the documentation for the <<- operator using ?'<<-', it says:
The operators <<- and ->> are normally only used in functions, and cause a search to be made through parent environments for an existing definition of the variable being assigned. If such a variable is found (and its binding is not locked) then its value is redefined, otherwise assignment takes place in the global environment.
In the function convolve.binomial the variable z is defined locally, so z <<- inside the anonymous function finds and redefines that z rather than creating one in the global environment.
So to summarize: the z <<- in the sapply call updates the z variable already defined in convolve.binomial, and that updated z is what the function eventually returns. The ;q} ending is not needed within sapply().
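A minimal sketch of that scoping behaviour (demo_scope is a hypothetical toy function, not from the original post):
demo_scope <- function() {
  z <- 0                                # local to demo_scope
  sapply(1:3, function(i) z <<- z + i)  # <<- finds and updates demo_scope's z
  z
}
demo_scope()
# [1] 6
No variable z is created or modified in the global environment.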
Related
I am working on a simulation for pandemic data. I calculated the MLE, which has a value of, for example, 0.99. This is for SEIR modelling, so I have a data frame for S, E, I and R. Now I am running the simulation, but I want to replicate it 100 times and then take the average.
My simulation code is the following.
### Pre-define VALUES
# 50 days
sumofnew_infec<-rep(0,50)
Snew<-rep(0,50)
Enew<-rep(0,50)
Inew<-rep(0,50)
Rnew<-rep(0,50)
Snew[1]<-Current_dayStats$St[1]
Inew[1]<-Current_dayStats$It[1]
Enew[1]<-Current_dayStats$Et[1]
Rnew[1]<-Current_dayStats$Rt[1]
E_I<-0
I_R<-0
### SIMULATION STARTS HERE
for(i in 1:49)
{
  newinfections <- rbinom(n = Snew[i], size = 1, prob = (1 - MLE^Inew[i]))
  sumofnew_infec[i] <- sum(newinfections)
  Snew[i+1] <- Snew[i] - sumofnew_infec[i]
  if(i > 0)  # always TRUE for i in 1:49, so the else branch never runs
  {
    E_I <- sum(sumofnew_infec[i])
    #E_I <- 0
    I_R <- sum(Enew[i])
  }
  else
  {
    E_I <- sumofnew_infec[i]
    I_R <- sum(Enew[i])
  }
  Enew[i+1] <- Enew[i] - sumofnew_infec[i] + E_I
  Inew[i+1] <- Inew[i] + E_I - I_R
  Rnew[i+1] <- I_R + Rnew[i]
}
sumofnew_infec
Snew
Enew
Inew
Rnew
I want to store the results in a matrix, for example
S = S_{i,j}
where S_{i,j} = S[i] = susceptibles on day i, in the jth simulation.
Then I can find the average of S_{i,1}, S_{i,2}, ..., S_{i,100}, which would be the average model prediction for the number of susceptibles on day i. Finally I can plot all these averages to see the average susceptible process. That is the whole goal. I am trying to use replicate after wrapping the above in a function, but that's not working. Any help would be appreciated. Thanks in advance.
EDIT:
I created the simulation in a function.
> do_once()
[1] 180 176 173 167 155 136 105 57 19 3 1 0 0 0 0 0 0 0 0 0 0 0 0
[24] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[47] 0 0 0 0
If you want to save the data in a matrix over multiple simulations, here is an example (filled with dummy runif values):
dayNum <- 50
simNum <- 100
S <- matrix(0, nrow = dayNum, ncol = simNum)  # pre-allocate a dayNum x simNum matrix
for (j in 1:simNum) {
  for (i in 1:dayNum) {
    S[i,j] <- runif(1)
  }
}
When you want to calculate the average of S over simulations, you can use rowMeans
rowMeans(S)
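Alternatively, since the simulation is already wrapped in a function, replicate can build the matrix directly. A sketch, assuming do_once() returns the length-50 Snew vector:
S <- replicate(simNum, do_once())  # dayNum x simNum matrix, one column per run
S_avg <- rowMeans(S)               # average number of susceptibles on each day
plot(S_avg, type = "l", xlab = "day", ylab = "mean susceptibles")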
Edit
# Re-run the question's loop 100 times, accumulating each run as a new column.
# This reuses the initialisation (Snew, Enew, Inew, Rnew, MLE) from the question.
sumofnew_infec_out <- c()
Snew_out <- c()
Enew_out <- c()
Inew_out <- c()
Rnew_out <- c()
for (k in 1:100) {
  for (i in 1:49) {
    newinfections <- rbinom(n = Snew[i], size = 1, prob = (1 - MLE^Inew[i]))
    sumofnew_infec[i] <- sum(newinfections)
    Snew[i + 1] <- Snew[i] - sumofnew_infec[i]
    if (i > 0) {
      E_I <- sum(sumofnew_infec[i])
      # E_I <- 0
      I_R <- sum(Enew[i])
    } else {
      E_I <- sumofnew_infec[i]
      I_R <- sum(Enew[i])
    }
    Enew[i + 1] <- Enew[i] - sumofnew_infec[i] + E_I
    Inew[i + 1] <- Inew[i] + E_I - I_R
    Rnew[i + 1] <- I_R + Rnew[i]
  }
  sumofnew_infec_out <- cbind(sumofnew_infec_out, sumofnew_infec)
  Snew_out <- cbind(Snew_out, Snew)
  Enew_out <- cbind(Enew_out, Enew)
  Inew_out <- cbind(Inew_out, Inew)
  Rnew_out <- cbind(Rnew_out, Rnew)
}
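The day-by-day averages over the 100 replications are then obtained with rowMeans, e.g.:
rowMeans(Snew_out)  # average number of susceptibles on each day
rowMeans(Inew_out)  # average number of infectious on each day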
I know I can use expand.grid for this, but I am trying to learn actual programming. My goal is to take what I have below and use a recursion to get all 2^n binary sequences of length n.
I can do this for n = 1, but I don't understand how I would use the same function in a recursive way to get the answer for higher dimensions.
Here it is for n = 1:
binseq <- function(n){
  binmat <- matrix(nrow = 2^n, ncol = n)
  r <- 0  # row counter
  for (i in 0:1) {
    r <- r + 1
    binmat[r,] <- i
  }
  return(binmat)
}
I know I probably have to use cbind in the return statement. My intuition says the return statement should be something like cbind(binseq(n-1), binseq(n)). But, honestly, I'm completely lost at this point.
The desired output should produce something like what expand.grid gives:
n = 5
expand.grid(replicate(n, 0:1, simplify = FALSE))
But it should just be a matrix, like binmat, filled recursively (expand.grid returns a data frame).
As requested in a comment (below), here is a limited implementation for binary sequences only:
eg.binary <- function(n, digits=0:1) {
  if (n <= 0) return(matrix(0, 0, 0))
  if (n == 1) return(matrix(digits, 2))
  x <- eg.binary(n-1)
  rbind(cbind(digits[1], x), cbind(digits[2], x))
}
After taking care of an initial case that R cannot handle correctly, it treats the "base case" of n = 1 and then recursively obtains all (n-1)-digit binary strings and prepends each digit to each of them. The digits are prepended so that the binary strings end up in their usual lexicographic order (note that expand.grid itself uses a different order, varying the first column fastest).
Example:
eg.binary(3)
[,1] [,2] [,3]
[1,] 0 0 0
[2,] 0 0 1
[3,] 0 1 0
[4,] 0 1 1
[5,] 1 0 0
[6,] 1 0 1
[7,] 1 1 0
[8,] 1 1 1
A general explanation (with a more flexible solution) follows.
Distill the problem down to the basic operation of tacking the values of an array y onto the rows of a data frame X, associating a whole copy of X with each value (via cbind) and appending the whole lot (via rbind):
cross <- function(X, y) {
  do.call("rbind", lapply(y, function(z) cbind(X, z)))
}
For example,
cross(data.frame(A=1:2, b=letters[1:2]), c("X","Y"))
A b z
1 1 a X
2 2 b X
3 1 a Y
4 2 b Y
(Let's worry about the column names later.)
The recursive solution for a list of such arrays y assumes you have already carried out these operations for all but the last element of the list. It has to start somewhere, which evidently consists of converting an array into a one-column data frame. Thus:
eg_ <- function(y) {
  n <- length(y)
  if (n <= 1) {
    as.data.frame(y)
  } else {
    cross(eg_(y[-n]), y[[n]])
  }
}
Why the funny name? Because we might want to do some post-processing, such as giving the result nice names. Here's a fuller implementation:
eg <- function(y) {
  # (Define `eg_` here to keep it local to `eg` if you like)
  X <- eg_(y)
  names.default <- paste0("Var", seq.int(length(y)))
  if (is.null(names(y))) {
    colnames(X) <- names.default
  } else {
    colnames(X) <- ifelse(names(y)=="", names.default, names(y))
  }
  X
}
For example:
eg(replicate(3, 0:1, simplify=FALSE))
Var1 Var2 Var3
1 0 0 0
2 1 0 0
3 0 1 0
4 1 1 0
5 0 0 1
6 1 0 1
7 0 1 1
8 1 1 1
eg(list(0:1, B=2:3))
Var1 B
1 0 2
2 1 2
3 0 3
4 1 3
Apparently this was the desired recursive code:
binseq <- function(n){
  if(n == 1){
    binmat <- matrix(c(0,1), nrow = 2, ncol = 1)
  } else if(n > 1){
    A <- binseq(n-1)
    B <- cbind(rep(0, nrow(A)), A)
    C <- cbind(rep(1, nrow(A)), A)
    binmat <- rbind(B, C)
  }
  return(binmat)
}
Basically for n = 1 we create a [0, 1] matrix. For every n thereafter we add a column of 0's to the previous matrix and, separately, a column of 1's. Then we rbind the two matrices to get the final product. So I get what the algorithm is doing, but I don't really understand what the recursion is doing. For example, I don't understand the step from n = 2 to n = 3 based on the algorithm.
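A concrete trace of that step may help. binseq(3) first calls binseq(2), which returns the complete 4 x 2 matrix of 2-digit sequences; the n = 3 call then stacks two copies of it:
binseq(2)
#      [,1] [,2]
# [1,]    0    0
# [2,]    0    1
# [3,]    1    0
# [4,]    1    1
# binseq(3) then evaluates rbind(cbind(0, binseq(2)), cbind(1, binseq(2))):
# every 2-digit row appears once with a leading 0 and once with a leading 1,
# giving all 8 rows of the 3-digit matrix.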
I am looking for a way to solve - in R - a constrained optimisation problem of the form
min sum(x)
s.t. f(x) < k
where x is a binary variable (either 0 or 1) of length n, f(x) is a function that depends on the entire x vector, and k is an integer constant. Thus f(x) is not a set of n separate constraints applied elementwise to x (such as sqrt(x)), but a single constraint evaluated on the entire set of values of the binary variable x.
I have tried to use the ompr R package with the following syntax:
v <- 10
result <- MILPModel() %>%
  add_variable(x[i], i = 1:v, type = "binary") %>%
  set_objective(sum_expr(x[i], i = 1:v), sense = "min") %>%
  add_constraint(f(x) <= 60) %>%
  solve_model(with_ROI(solver = "glpk"))
but it does not work, because I believe the package does not accept a global f(x) constraint.
Here is a solution with the rgenoud package.
library(rgenoud)
g <- function(x){
  c(
    ifelse(sd(x) > 0.2, 0, 1),  # the constraint (here sd(x) > 0.2), encoded as 0 = satisfied, 1 = violated
    sum(x)                      # the objective function (to minimize/maximize)
  )
}
solution <- genoud(
  g, lexical = 2,
  nvars = 30,
  starting.values = rep(0, 30),
  Domains = cbind(rep(0, 30), rep(1, 30)),
  data.type.int = TRUE)
solution$par # the values of x
## [1] 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
sd(solution$par) # is the constraint satisfied ?
## [1] 0.2537081
solution$value
## [1] 0 2   # 0 is the value of ifelse(sd(x) > 0.2, 0, 1); 2 is the value of sum(x)
See the Notes section in ?genoud to understand the lexical argument: with lexical = 2, the two components returned by g are optimized lexicographically, so the constraint indicator (the first component) takes priority over the objective sum(x) (the second component).
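The same pattern adapts directly to the question's general form, min sum(x) s.t. f(x) < k. A sketch, where f and k stand in for your own function and constant:
g <- function(x) {
  c(
    ifelse(f(x) < k, 0, 1),  # 0 iff the constraint f(x) < k holds
    sum(x)                   # the objective to minimize
  )
}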
Apologies for the confusing title, but I don't know how to express my problem otherwise. In R, I have the following problem which I want to solve:
x <- seq(1,1, length.out=10)
y <- seq(0,0, length.out=10)
z <- c(x, y)
p <- c(1,0,1,1,0,0)
How can I remove the elements of vector p from vector z, so that a new vector i has three fewer occurrences of 1 and three fewer occurrences of 0 (one element of z removed per element of p)? What do I have to do to arrive at the following result? In the solution, the order of 1's and 0's in z should not matter; they might just as well have been in a random order, and other numbers could be involved as well.
i
[1] 1 1 1 1 1 1 1 0 0 0 0 0 0 0
Thanks in advance!
Similar to @VincentGuillemot's answer, but in functional programming style, using the purrr package:
library(purrr)
i <- z
map(p, function(x) { i <<- i[-min(which(i == x))] })
i
[1] 1 1 1 1 1 1 1 0 0 0 0 0 0 0
There might be numerous better ways to do it:
i <- z
for (val in p) {
  if (val %in% i) {
    i <- i[-which(i == val)[1]]
  }
}
Another solution that I like better because it does not require a test (thanks to @Franck's suggestion):
for (val in p)
  i <- i[-match(val, i, nomatch = integer(0))]
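A quick check (sketch) that exactly three 1's and three 0's were removed:
table(z)  # 10 zeros and 10 ones before
table(i)  # 7 zeros and 7 ones after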
I have a randomly generated vector from a normal distribution with 50 elements:
vector <- rnorm(50)
I want to change all negative values to 0 and all positive values to 1.
I used indexing as follows; however, I do not get the 0/1 vector I expect:
vector[vector < 0] <- 1
vector[vector > 0] <- 0
How should I proceed?
Generate some data
x = rnorm(50)
then either
x = ifelse(x > 0, 1, 0)
or
x[x < 0] = 0
x[x > 0] = 1
Or even better:
as.numeric(x > 0)
However, since the standard normal is symmetric about 0, why not simulate directly via
sample(0:1, 50, replace=TRUE)
The problem is that the first statement replaces all values smaller than 0 with values larger than zero, so the second statement then sets these new 1's (along with the originally positive values) to 0. The trick is to switch
vector[vector < 0] <- 1
vector[vector > 0] <- 0
into
vector[vector > 0] <- 0
vector[vector < 0] <- 1
Note that values exactly equal to 0 are left untouched, so the result is slightly biased towards 0, but that should only be marginal.
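Note, however, that even after switching the order this maps negative values to 1 and positive values to 0, which is the reverse of the mapping stated in the question. A sketch of the same two-statement approach with the assignments matching the stated goal (negatives to 0, positives to 1):
vector[vector < 0] <- 0
vector[vector > 0] <- 1
With this mapping the order no longer matters, because neither assignment moves a value across zero.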