Calculating lm() within a loop - r

Objective: The overall objective is to calculate the confidence interval (CI) of the mean for rnorm samples of various sizes (n = 2, 4, ..., 1024), 10,000 times each, and then count the number of times each one fails (this likely requires a counter and an if/else statement). Finally, the results are to be plotted.
I am trying to calculate the CI of the mean for several simulations at each sample size; however, I am first trying to get the code working for one specific sample size, a = 8.
The problem I have is that I do not know how to generate a linear model for each row. Would anyone know how I can do this? Here is what I have so far:
a <- 8
n.sim.3 <- 10000
for (i in a) {
  # one simulated sample of size i per row
  r.mat <- matrix(rnorm(i * n.sim.3), nrow = n.sim.3, ncol = i)
  # The lm() call below is where I'm stuck; I don't think it is correct
  lm.tmp <- apply(r.mat, 1, lm(n.sim.3 ~ 1))
  confint.tmp <- confint(lm.tmp)
}
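One possible way to get a CI per row is sketched here. It assumes that a CI "fails" when the 95% interval misses the true mean of 0. apply() expects a function to call on each row, so the intercept-only fit is wrapped in a small helper. The names ci_for_row, n_sim, sizes and fail_count are illustrative, and n_sim is kept well below 10,000 so the sketch runs quickly.

```r
set.seed(1)                 # arbitrary seed, only for reproducibility
n_sim <- 500                # smaller than 10,000 so the sketch runs quickly
sizes <- 2^(1:10)           # 2, 4, ..., 1024

# CI of the mean for one sample, via an intercept-only linear model
ci_for_row <- function(row) confint(lm(row ~ 1))[1, ]

fail_count <- sapply(sizes, function(n) {
  r.mat <- matrix(rnorm(n * n_sim), nrow = n_sim, ncol = n)
  ci <- t(apply(r.mat, 1, ci_for_row))   # n_sim x 2 matrix: lower, upper
  sum(ci[, 1] > 0 | ci[, 2] < 0)         # intervals that miss the true mean 0
})
```

A call such as plot(sizes, fail_count / n_sim, log = "x", type = "b") would then show the failure rate against sample size. Using sum() on a logical vector replaces the explicit counter and if/else.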

Related

Can you make an argument for a function to be a random sample in R?

I'd like to test how precise the t-test is at detecting a mean for various distributions, but I don't want to define the sampling distribution inside the function each time I run it. If I write function(data, mju) and pass rnorm(n) or any other random sample as data, I obviously get the same result on every replication, because there is only the one "data" sample that was passed in first. To show more clearly what I want, here is the code:
t_ci <- function(data, mju) {
  prod(t.test(data)$conf.int - mju)
}
set.seed(NULL)
prec_t <- function(data, n, N, mju) {
  sim <- replicate(N, t_ci(data, mju))
  sim[sim < 0] / N
}
The first function checks whether the true theoretical parameter "mju" is in the confidence interval. The second one replicates t_ci N times to see how precise the t-test confidence intervals are for the selected data. I'd like to have an option to just indicate the distribution, so that the function generates n-sized samples N times and calculates the precision. But as my code stands, it only replicates the same data over and over. Maybe there is a solution for this problem?
Also, it seems that something is wrong with the function prec_t, because I'd like it to count the number of times t_ci produced a negative outcome and then divide that count by N.
Any help would be greatly appreciated! Thanks in advance.
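One way this could be approached is to pass the random generator itself (for example rnorm or rexp) instead of a fixed sample, so that a fresh sample is drawn inside each replication. The sketch below assumes this design; the argument name rdist and the use of ... to forward distribution parameters are illustrative choices, not part of the original code.

```r
# Negative product means mju lies between the two CI limits
t_ci <- function(data, mju) {
  prod(t.test(data)$conf.int - mju)
}

# rdist is a generator such as rnorm or rexp; ... is forwarded to it
prec_t <- function(rdist, n, N, mju, ...) {
  sim <- replicate(N, t_ci(rdist(n, ...), mju))  # new sample on every replication
  sum(sim < 0) / N                               # share of intervals covering mju
}

set.seed(42)   # arbitrary seed, only for reproducibility
coverage_norm <- prec_t(rnorm, n = 30, N = 2000, mju = 0)
coverage_exp  <- prec_t(rexp,  n = 30, N = 2000, mju = 1)
```

Note that sim[sim < 0] / N in the original returns the negative values themselves divided by N, whereas sum(sim < 0) counts them.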

Speed up iterations of piecewise regressions with the segmented function in R

I am trying to fit about 300 piecewise regressions with the segmented function from the segmented package in R. This takes a lot of time (~4 days) because of the segmented function. I am already using all the cores of my computer, but I am not a programmer and I guess this code is probably not optimal. Can I improve the code below to make it run faster? How?
Here is a reproducible example. df is a simulated data frame that corresponds to one of the 300 datasets that I want to analyze. Each dataset is one day, and during each day I measure the temperature every 5 minutes; x is the temperature and y the time of day. The figure below shows what my data look like. The pattern is very specific and repeatable across days, and each change in slope corresponds to a well-understood biological mechanism. This is why I can guess all the values of psi (e.g. the times of sunrise and sunset).
Of course the real data are more variable and I use many iterations (about 200, here I reduced to 10 for the example) to increase my chances of getting a successful fit.
library(segmented)
y <- seq(1, 288, 1)
x <- c(seq(0, -30, -1), seq(-30, -54, -2), seq(-54, 30, 1),
       seq(30, 10, -1), seq(10, 90, 1), seq(90, 34, -1))
df <- data.frame(x, y)
head(df)
plot(x ~ y)
# initial guesses for the breakpoints
t1 <- 31
t2 <- 44
t3 <- 129
t4 <- 150
t5 <- 231
iterations <- 10
for (j in 1:iterations) {
  res <- lm(formula = x ~ y, data = df)
  try(result <- segmented(
    res, seg.Z = ~y, psi = c(t1, t2, t3, t4, t5),
    control = seg.control(it.max = 200, display = FALSE, K = 4, h = 0.1,
                          n.boot = 100, random = TRUE)))
}
result
Taking the lm out of the loop doesn't significantly improve the speed of the loop.
One thing that should help is to break out of the iterations once the result is found. In most cases it should find something on the first iteration and this will avoid running 200 unnecessary iterations.
rm(result)  # remove any earlier result so exists() only sees a new fit
for (j in 1:iterations) {
  res <- lm(formula = x ~ y, data = df)
  try(result <- segmented(
    res, seg.Z = ~y, psi = c(t1, t2, t3, t4, t5),
    control = seg.control(it.max = 200, display = FALSE, K = 4, h = 0.1,
                          n.boot = 100, random = TRUE)))
  if (exists("result")) break
}
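The same stop-at-first-success idea can also be written without rm() and exists(), by initialising the result to NULL and letting tryCatch() return NULL on failure. The sketch below uses a hypothetical stand-in function, flaky_fit, in place of the segmented() call so that it runs without the package; it is an illustration of the pattern, not a tested replacement for the code above.

```r
set.seed(123)   # arbitrary seed, only for reproducibility

# Hypothetical stand-in for a fit that sometimes fails (not part of segmented)
flaky_fit <- function() {
  if (runif(1) < 0.5) stop("no convergence")
  "fitted model"
}

iterations <- 10
result <- NULL
attempts <- 0
for (j in seq_len(iterations)) {
  attempts <- j
  result <- tryCatch(flaky_fit(), error = function(e) NULL)
  if (!is.null(result)) break   # stop as soon as one attempt succeeds
}
```

Starting from result <- NULL means a leftover object from an earlier dataset cannot be mistaken for a successful fit, which matters when looping over many datasets.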

Preventing a Gillespie SSA Stochastic Model From Running Negative

I have produced a stochastic model of infection (parasitic worm) using a Gillespie SSA. The model uses the "GillespieSSA" package (https://cran.r-project.org/web/packages/GillespieSSA/index.html).
In short, the code models a population of discrete compartments. Movement between compartments depends on user-defined rate equations. The SSA algorithm calculates the number of events produced by each rate equation for a given time step (tau) and updates the population accordingly; the process repeats up to a given time point. The problem is that the number of events is assumed to be Poisson distributed (Poisson(rate[i]*tau)), which produces an error when a rate is negative, as happens when population numbers become negative.
# Parameter values
sir.parms <- c(deltaHinfinity = 0.00299, CHi = 0.00586, deltaH0 = 0.0854, aH = 0.5,
               muH = 0.02, SigmaW = 0.1, SigmaM = 0.8, SigmaL = 104, phi = 1.15, f = 0.6674,
               deltaVo = 0.0166, CVo = 0.0205, alphaVo = 0.5968, beta = 52, mbeta = 7300,
               muV = 52, g = 0.0096, N = 100)
# Initial population values
sir.x0 <- c(W = 20, M = 10, L = 0.02)
# Rate equations
sir.a <- c("((deltaH0+deltaHinfinity*CHi*mbeta*L)/(1+CHi*mbeta*L))*mbeta*L*N",
           "SigmaW*W*N", "muH*W*N", "((1/2)*phi*f)*W*N", "SigmaM*M*N", "muH*M*N",
           "(deltaVo/(1+CVo*M))*beta*M*N", "SigmaL*L*N", "muV*L*N", "alphaVo*M*L*N",
           "(aH/g)*L*N")
# Population change for each event (one column per event; rows are W, M, L)
sir.nu <- matrix(c(+0.01, 0, 0,
                   -0.01, 0, 0,
                   -0.01, 0, 0,
                   0, +0.01, 0,
                   0, -0.01, 0,
                   0, -0.01, 0,
                   0, 0, +0.01/230,
                   0, 0, -0.01/230,
                   0, 0, -0.01/230,
                   0, 0, -0.01/230,
                   0, 0, -0.01/32), nrow = 3, ncol = 11, byrow = FALSE)
runs <- 10
set.seed(1)
# Data frame of output
sir.out <- data.frame(time = numeric(), W = numeric(), M = numeric(), L = numeric())
# Multiple runs, combining the data from each run
for (i in 1:runs) {
  sim <- ssa(sir.x0, sir.a, sir.nu, sir.parms, method = "ETL", tau = 1/12, tf = 140, simName = "SIR")
  sim.out <- data.frame(time = sim$data[, 1], W = sim$data[, 2], M = sim$data[, 3], L = sim$data[, 4])
  sim.out$run <- i
  sir.out <- rbind(sir.out, sim.out)
}
Thus, rates are computed and the model updates the population values at each time step, with the data stored in a data frame that is then appended to the previous runs. However, when population levels get very low, the number of events that reduce a population can be greater than the number in the compartment. One remedy is to make the time step very small, but this greatly increases the run time of the simulation.
My question is: is there a way to augment the code so that, as the data are calculated at each time step, any negative population values are converted to 0?
I have tried working on this problem, but I only seem to be able to come up with methods that alter the values once the simulation is complete, with the negative values still causing issues in the runs themselves.
E.g.
sir.out$L[sir.out$L < 0] <- 0
Any help would be appreciated.
I believe the problem is the method you set ("ETL") in the ssa function. The ETL method will eventually produce negative numbers. You can try the "OTL" method, based on "Efficient step size selection for the tau-leaping simulation method", which has a few more parameters that you can tweak, but the basic command is:
ssa(sir.x0,sir.a,sir.nu,sir.parms, method="OTL", tf=140, simName="SIR")
Or the direct method, which will not produce negative numbers at all:
ssa(sir.x0,sir.a,sir.nu,sir.parms, method="D", tf=140, simName="SIR")
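To illustrate what truncating at zero inside each time step would look like, here is a minimal hand-written explicit tau-leap loop in base R. It does not use GillespieSSA, and the toy birth-death model and its parameter values (birth, death, the starting value of x) are assumptions made purely for illustration. Truncation of this kind distorts the process statistics somewhat, so it is a pragmatic workaround rather than an exact method.

```r
set.seed(1)   # arbitrary seed, only for reproducibility

# Toy birth-death model; these values are illustrative, not from the question
birth <- 2
death <- 0.5
x <- 5
tau <- 1 / 12
tf <- 140
nu <- c(+1, -1)                       # state change per event type

times <- seq(0, tf, by = tau)
traj <- numeric(length(times))
traj[1] <- x

for (k in 2:length(times)) {
  rates <- c(birth, death * x)                  # propensities at the current state
  events <- rpois(length(rates), rates * tau)   # number of each event in this step
  x <- max(x + sum(nu * events), 0)             # truncate at zero before the next step
  traj[k] <- x
}
```

Because x is clamped before the next set of rates is computed, the Poisson means never become negative during the run.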

How to repeat 1000 times this random walk simulation in R? [duplicate]

I'm simulating a one-dimensional and symmetric random walk procedure:
$$y_t=y_{t-1}+\varepsilon_t$$
where white noise is denoted by $\varepsilon_t \sim N(0,1)$ in time period $t$. There is no drift in this procedure.
Also, RW is symmetric, because $Pr(y_i=+1)=Pr(y_i=-1)=0.5$.
Here's my code in R:
set.seed(1)
t <- 1000
epsilon <- sample(c(-1, 1), t, replace = TRUE)
y <- numeric(t)
y[1] <- 0
for (i in 2:t) {
  y[i] <- y[i - 1] + epsilon[i]
}
par(mfrow = c(1, 2))
plot(1:t, y, type = "l", main = "Random walk")
# cumsum() of a single value returns that value, so this just reproduces y
outcomes <- sapply(1:1000, function(i) cumsum(y[i]))
hist(outcomes)
I would like to simulate 1000 different $y_{it}$ series (i=1,...,1000; t=1,...,1000). (After that, I will check the probability of getting back to the origin ($y_1=0$) at $t=3$, $t=5$ and $t=10$.)
Which function allows me to do this kind of repetition with the $y_t$ random walk time series?
Try the following:
length_of_time_series <- 1000
num_replications <- 1000
errors <- matrix(rnorm(length_of_time_series*num_replications),ncol=num_replications)
rw <- apply(errors, 2, cumsum)
This creates 1000 random walks simultaneously by first defining a matrix filled with white noise error terms drawn from a standard normal distribution, and in the second step I calculate the cumulative sums, which should correspond to your random walk, assuming that y_0=0.
Note that I have ignored that your errors are either -1 or 1, since I am not sure that this is what you intended. Having said that, you can adjust the above code easily with the following line to create errors that are either 1 or -1:
errors2 <- ifelse(errors > 0, 1, -1)
If you really want to go the way of doing it repeatedly as opposed to doing it simultaneously, you can define a function that returns the random walk, and use replicate. Note that you should not hardcode the seed inside the function to avoid replicating the same random walk all the time.
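A minimal sketch of that replicate-based variant follows, assuming ±1 steps with equal probability and a start at y_0 = 0. The names simulate_rw, walks and p_return are illustrative.

```r
# One random walk of the given length; the seed is deliberately not set here
simulate_rw <- function(len) {
  steps <- sample(c(-1, 1), len, replace = TRUE)
  cumsum(steps)   # position after each step, starting from y_0 = 0
}

set.seed(1)   # set once, outside the function
walks <- replicate(1000, simulate_rw(1000))   # 1000 x 1000, one walk per column

# Estimated probability of being back at the origin after 2, 4 and 10 steps
p_return <- sapply(c(2, 4, 10), function(k) mean(walks[k, ] == 0))
```

For comparison, the exact probabilities for a symmetric ±1 walk are 0.5, 0.375 and 252/1024 ≈ 0.246 after 2, 4 and 10 steps; a return is only possible after an even number of steps.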

How do I get started with this?

I have been stuck on this problem for a long time.
I was thinking I should first create the two functions, like this:
n <- runif(10000)
estimator1_fun <- function(n) {
  total <- 0  # running sum; n is a vector, so this accumulates elementwise
  for (i in 1:10000) {
    total <- total + ((n / i) * runif(1))
  }
  return(total)
}
and then do the same for the other function and use the MSE formula? Am I even approaching this correctly? I tried formatting the problem as text, but found that using an image would be better.
Assuming U(0,Theta_0) is the uniform distribution from 0 to Theta_0, and that Theta_0 is a fixed constant, I would proceed as follows:
1. Define Theta_0. Give it a fixed value.
2. Write the function that gives random numbers from that distribution.
- The sampling function is runif(N, 0, Theta_0).
- Arguments could be Theta_0 and N.
3. Sample it a few thousand (or whatever) times into a vector X.
4. Calculate the two estimates.
5. Repeat steps 3 & 4 for more samples.
6. Plot the two estimates against the number of samples and see if they approach Theta_0.
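The steps above can be sketched as follows. The original problem statement was posted as an image and is not reproduced here, so the two estimators are an assumption: 2 * mean(X) and max(X) are used as placeholders because they are common estimators of the upper limit of U(0, Theta_0). The value of theta0 and the sample sizes are also illustrative.

```r
set.seed(1)          # arbitrary seed, only for reproducibility
theta0 <- 5          # step 1: assumed fixed true value
sample_sizes <- c(10, 100, 1000, 10000)

# Steps 2 to 5: draw a sample of each size and compute both (assumed) estimators
estimates <- t(sapply(sample_sizes, function(N) {
  x <- runif(N, min = 0, max = theta0)
  c(N = N, est_mean = 2 * mean(x), est_max = max(x))
}))
```

For step 6, a call such as matplot(estimates[, "N"], estimates[, c("est_mean", "est_max")], log = "x", type = "b") followed by abline(h = theta0) would show whether both estimates approach Theta_0. An MSE comparison would repeat the draw many times per sample size and average the squared errors.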
