I am simulating some draws using random numbers. Unlikely, the generated numbers are not random as I would like. In fact, I obtain that there are some linear combinations.
In details, I have the following starting data:
start_vector = c(1,10,30,40,50,100) # length equal to 6
residual_of_model = 5
n = 1000 # Number of simulations
I try to simulate n observations from a random normal distribution for each of the start_vector elements, assuming it as a "random noise" to add to the original value (that is the one into start_vector):
out_vec <- matrix(NA, nrow = n, ncol = length(start_vector))
for (h_aux in 1:length(start_vector))
{
random_noise <- rnorm(n, 0, residual_of_model)
out_vec[,h_aux] <- as.numeric(start_vector[h_aux]) + random_noise
}
At this point, I obtain a matrix of size 6x1000. In theory, I assume all the columns and the rows in the matrix are linearly independent among them.
If I try to check it, using the findLinearCombos() function from the caret package I obtain that all the columns are indepent:
caret::findLinearCombos(out_vec)
If I try to evaluate the independence among the rows, using the following code:
caret::findLinearCombos(t(out_vec))
I obtain that all the rows from 7 to 1000 are a linear combination of the first 6 (the length of start_vector).
It is really strange in my opinion, I would like to not observe no dependencies at all since the rows are generated adding a random number using rnorm.
What am I missing? Is there some bug? Thanks in advance!
Objective: The overall objective of the problem is to calculate the confidence interval (CI) of various sample sizes (n=2,4..1024) of rnorm, 10,000 times and then count the number of times each one fails (this likely requires a counter and an if/else statement). Finally the results are to be plotted
I am trying to calculate CI of the means for several simulations of a sample sizes, however, I am first trying to break down the code for one specific sample size a = 8.
The problem I have is that I do not know how to generate a linear model for each row. Would anyone know how I can do this? Here is what I have so far:
a <- 8
n.sim.3 <- 10000
for ( i in a) {
r.mat <- matrix(rnorm(i*n.sim.3), nrow=n.sim.3, ncol = a)
lm.tmp <- apply(three.mat,1,lm(n.sim.3~1) # The lm command is where I'm stuck I don't think this is correct)
confint.tmp <- confint(lm.tmp)
I have produce a stochastic model of infection (parasitic worm), using a Gillespie SSA. The model used the "GillespieSSA"package (https://cran.r-project.org/web/packages/GillespieSSA/index.html).
In short the code models a population of discrete compartments. Movement between compartments is dependent on user defined rate equations. The SSA algorithm acts to calculate the number of events produced by each rate equation for a given timestep (tau) and updates the population accordingly, process repeats up to a given time point. The problem is, the number of events is assumed Poisson distributed (Poisson(rate[i]*tau)), thus produces an error when the rate is negative, including when population numbers become negative.
# Parameter Values
sir.parms <- c(deltaHinfinity=0.00299, CHi=0.00586, deltaH0=0.0854, aH=0.5,
muH=0.02, SigmaW=0.1, SigmaM =0.8, SigmaL=104, phi=1.15, f = 0.6674,
deltaVo=0.0166, CVo=0.0205, alphaVo=0.5968, beta=52, mbeta=7300 ,muV=52, g=0.0096, N=100)
# Inital Population Values
sir.x0 <- c(W=20,M=10,L=0.02)
# Rate Equations
sir.a <- c("((deltaH0+deltaHinfinity*CHi*mbeta*L)/(1+CHi*mbeta*L))*mbeta*L*N"
,"SigmaW*W*N", "muH*W*N", "((1/2)*phi*f)*W*N", "SigmaM*M*N", "muH*M*N",
"(deltaVo/(1+CVo*M))*beta*M*N", "SigmaL*L*N", "muV*L*N", "alphaVo*M*L*N", "(aH/g)*L*N")
# Population change for even
sir.nu <- matrix(c(+0.01,0,0,
-0.01,0,0,
-0.01,0,0,
0,+0.01,0,
0,-0.01,0,
0,-0.01,0,
0,0,+0.01/230,
0,0,-0.01/230,
0,0,-0.01/230,
0,0,-0.01/230,
0,0,-0.01/32),nrow=3,ncol=11,byrow=FALSE)
runs <- 10
set.seed(1)
# Data Frame of output
sir.out <- data.frame(time=numeric(),W=numeric(),M=numeric(),L=numeric())
# Multiple runs and combining data and SSA methods
for(i in 1:runs){
sim <- ssa(sir.x0,sir.a,sir.nu,sir.parms, method="ETL", tau=1/12, tf=140, simName="SIR")
sim.out <- data.frame(time=sim$data[,1],W=sim$data[,2],M=sim$data[,3],L=sim$data[,4])
sim.out$run <- i
sir.out <- rbind(sir.out,sim.out)
}
Thus, rates are computed and the model updates the population values for each time step, with the data store in a data frame, then attached together with previous runs. However, when levels of the population get very low events can occur such that the number of events that occurs reducing a population is greater than the number in the compartment. One method is to make the time step very small, however this greatly increases the length of the simulation very long.
My question is there a way to augment the code so that as the data is created/ calculated at each time step any values of population numbers that are negative are converted to 0?
I have tried working on this problem, but only seem to be able to come up with methods that alter the values once the simulation is complete, with the negative values still causing issues in the runs themselves.
E.g.
if (sir.out$L < 0) sir.out$L == 0
Any help would be appreciated
I believe the problem is the method you set ("ETL") in the ssa function. The ETL method will eventually produce negative numbers. You can try the "OTL" method, based on Efficient step size selection for the tau-leaping simulation method- in which there are a few more parameters that you can tweak, but the basic command is:
ssa(sir.x0,sir.a,sir.nu,sir.parms, method="OTL", tf=140, simName="SIR")
Or the direct method, which will not produce negative number whatsoever:
ssa(sir.x0,sir.a,sir.nu,sir.parms, method="D", tf=140, simName="SIR")
I am trying to generate random numbers for a simulation (the example below uses the uniform distribution for simplicity). Why would these two methods produce different average values (a: 503.2999, b: 497.5372) when sampled 10k times with the same seed number:
set.seed(2)
a <- runif(10000, 1, 999)
draw <- function(x) {
runif(1, 1, 999)
}
b <- sapply(1:10000, draw)
print(c(mean(a), mean(b)))
In my model, the random number for the first method would be referenced within a simulation using a[sim_number] while in the second instance, the runif function would be placed inside the simulation function itself. Is there a correct way of doing it?
For completeness, the answer is that you need to set the seed before each random draw if you want them to be the same.
I want to generate a process where in every step there is a realisation of a Poisson random variable, this realisation should be saved and then it should be realize the next Poisson random variable and add it to the sum of all realisations before. Furthermore there should be a chance that in every step this process stops. Hope that makes sense to you guys... Any thought is appreciated!
More compactly, pick a single geometrically distributed random number for the total number of steps achieved before stopping, then use cumsum to sum that many Poisson deviates:
stopping.prob <- 0.3 ## for example
lambda <- 3.5 ## for example
n <- rgeom(1,1-stopping.prob)+1 ## constant probability per step of stopping
cumsum(rpois(n,lambda))
You are very vague on the parameters of your simulation but how's this?
Lambda for random Poisson number.
lambda <- 5
This is the threshold value when the function exits.
th <- 0.999
Create a vector of length 1000.
bin <- numeric(1000)
Run the darn thing. It basically rolls a "dice" (values generated are between 0 and 1). If the values is below th, it returns a random Poisson number. If the value is above th (but not equal), the function stops.
for (i in 1:length(bin)) {
if (runif(1) < th) {
bin[i] <- rpois(1, lambda = lambda)
} else {
stop("didn't meet criterion, exiting")
}
}
Remove zeros if any.
bin <- bin[bin != 0]
You can use cumsum to cumulatively sum values.
cumsum(bin)