Write output of R loop to output file or indicate progress? - r

I'm running quite a complex loop in R that takes ~ 20-30 days to run. The details of the loop might not be too relevant (it's the Viterbi algorithm to decode the most likely state sequence for post-processing the output of a hidden Markov model). So far, once it has been successively going through all the loops, I was writing the results to a data file.
saveRDS(z_star, file = "z_star.Rda")
However, this is quite frustrating, as I cannot see how far the loops have already made it and I fear that in the end something could happen that ruins all the computation progress (e.g., the resulting data frame could be larger than the RAM of the server I'm using).
My question is if I should rather 1) write to an output file with every loop, or 2) implement some kind of progress tracking. For both options I have no concrete idea on how to implement it and would therefore be very grateful for code.
N_draws = 2000
N = 100000
z_star = matrix(nrow = N_draws, ncol = N)
best_logp = data.frame(matrix(nrow = N, ncol = S))
back_ptr = best_logp
for(d in 1:N_draws){
for (k in 1:K)
best_logp[1, k] = dnorm(y[1],
mean = mu[k],
sd = sigma_q[d],
log = TRUE)
for (t in 2:N) {
for (k in 1:K) {
best_logp[t, k] = -Inf
for (j in 1:K) {
real logp;
logp = best_logp[t-1, j] + log(theta[j, k]) + dnorm(y[t],
mean = mu[k],
sd = sigma_q[d],
log = TRUE)
if (logp > best_logp[t, k]) {
back_ptr[t, k] = j;
best_logp[t, k] = logp;
}
}
}
}
log_p_z_star = max(best_logp[N]);
for (k in 1:K)
if (best_logp[N, k] == log_p_z_star)
z_star[N] = k;
for (t in 1:(N - 1))
z_star[N - t] = back_ptr[N - t + 1, z_star[N - t + 1]];
}

Thanks for your thoughts, I decided to do something simple:
I used the index and the system time to indicate the progress using
for(d in 1:N_draws){
old <- Sys.time()
cat(c("draw", d, "out of", N_draws, "\n"))
and before the last bracket closes
new = Sys.time() - old
cat(c("time for draw", d, ":", new, "seconds", "\n"))
percentage_completed = d/N_draws * 100
cat(c("percentage completed:", percentage_completed, "percent", "\n"))
}
returning
draw 1 out of 2000
time for draw 1 : 0.839755058288574 seconds
percentage completed: 0.05 percent
draw 2 out of 2000
time for draw 2 : 0.735047101974487 seconds
percentage completed: 0.1 percent
draw 3 out of 2000

Related

Renewal Function for Weibull Distribution

The renewal function for Weibull distribution m(t) with t = 10 is given as below.
I want to find the value of m(t). I wrote the following r code to compute m(t)
last_term = NULL
gamma_k = NULL
n = 50
for(k in 1:n){
gamma_k[k] = gamma(2*k + 1)/factorial(k)
}
for(j in 1: (n-1)){
prev = gamma_k[n-j]
last_term[j] = gamma(2*j + 1)/factorial(j)*prev
}
final_term = NULL
find_value = function(n){
for(i in 2:n){
final_term[i] = gamma_k[i] - sum(last_term[1:(i-1)])
}
return(final_term)
}
all_k = find_value(n)
af_sum = NULL
m_t = function(t){
for(k in 1:n){
af_sum[k] = (-1)^(k-1) * all_k[k] * t^(2*k)/gamma(2*k + 1)
}
return(sum(na.omit(af_sum)))
}
m_t(20)
The output is m(t) = 2.670408e+93. Does my iteratvie procedure correct? Thanks.
I don't think it will work. First, lets move Γ(2k+1) from denominator of m(t) into Ak. Thus, Ak will behave roughly as 1/k!.
In the nominator of the m(t) terms there is t2k, so roughly speaking you're computing sum with terms
100k/k!
From Stirling formula
k! ~ kk, making terms
(100/k)k
so yes, they will start to decrease and converge to something but after 100th term
Anyway, here is the code, you could try to improve it, but it breaks at k~70
N <- 20
A <- rep(0, N)
# compute A_k/gamma(2k+1) terms
ps <- 0.0 # previous sum
A[1] = 1.0
for(k in 2:N) {
ps <- ps + A[k-1]*gamma(2*(k-1) + 1)/factorial(k-1)
A[k] <- 1.0/factorial(k) - ps/gamma(2*k+1)
}
print(A)
t <- 10.0
t2 <- t*t
r <- 0.0
for(k in 1:N){
r <- r + (-t2)^k*A[k]
}
print(-r)
UPDATE
Ok, I calculated Ak as in your question, got the same answer. I want to estimate terms Ak/Γ(2k+1) from m(t), I believe it will be pretty much dominated by 1/k! term. To do that I made another array k!*Ak/Γ(2k+1), and it should be close to one.
Code
N <- 20
A <- rep(0.0, N)
psum <- function( pA, k ) {
ps <- 0.0
if (k >= 2) {
jmax <- k - 1
for(j in 1:jmax) {
ps <- ps + (gamma(2*j+1)/factorial(j))*pA[k-j]
}
}
ps
}
# compute A_k/gamma(2k+1) terms
A[1] = gamma(3)
for(k in 2:N) {
A[k] <- gamma(2*k+1)/factorial(k) - psum(A, k)
}
print(A)
B <- rep(0.0, N)
for(k in 1:N) {
B[k] <- (A[k]/gamma(2*k+1))*factorial(k)
}
print(B)
shows that
I got the same Ak values as you did.
Bk is indeed very close to 1
It means that term Ak/Γ(2k+1) could be replaced by 1/k! to get quick estimate of what we might get (with replacement)
m(t) ~= - Sum(k=1, k=Infinity) (-1)k (t2)k / k! = 1 - Sum(k=0, k=Infinity) (-t2)k / k!
This is actually well-known sum and it is equal to exp() with negative argument (well, you have to add term for k=0)
m(t) ~= 1 - exp(-t2)
Conclusions
Approximate value is positive. Probably will stay positive after all, Ak/Γ(2k+1) is a bit different from 1/k!.
We're talking about 1 - exp(-100), which is 1-3.72*10-44! And we're trying to compute it precisely summing and subtracting values on the order of 10100 or even higher. Even with MPFR I don't think this is possible.
Another approach is needed
OK, so I ended up going down a pretty different road on this. I have implemented a simple discretization of the integral equation which defines the renewal function:
m(t) = F(t) + integrate (m(t - s)*f(s), s, 0, t)
The integral is approximated with the rectangle rule. Approximating the integral for different values of t gives a system of linear equations. I wrote a function to generate the equations and extract a matrix of coefficients from it. After looking at some examples, I guessed a rule to define the coefficients directly and used that to generate solutions for some examples. In particular I tried shape = 2, t = 10, as in OP's example, with step = 0.1 (so 101 equations).
I found that the result agrees pretty well with an approximate result which I found in a paper (Baxter et al., cited in the code). Since the renewal function is the expected number of events, for large t it is approximately equal to t/mu where mu is the mean time between events; this is a handy way to know if we're anywhere in the neighborhood.
I was working with Maxima (http://maxima.sourceforge.net), which is not efficient for numerical stuff, but which makes it very easy to experiment with different aspects. At this point it would be straightforward to port the final, numerical stuff to another language such as Python.
Thanks to OP for suggesting the problem, and S. Pappadeux for insightful discussions. Here is the plot I got comparing the discretized approximation (red) with the approximation for large t (blue). Trying some examples with different step sizes, I saw that the values tend to increase a little as step size gets smaller, so I think the red line is probably a little low, and the blue line might be more nearly correct.
Here is my Maxima code:
/* discretize weibull renewal function and formulate system of linear equations
* copyright 2020 by Robert Dodier
* I release this work under terms of the GNU General Public License
*
* This is a program for Maxima, a computer algebra system.
* http://maxima.sourceforge.net/
*/
"Definition of the renewal function m(t):" $
renewal_eq: m(t) = F(t) + 'integrate (m(t - s)*f(s), s, 0, t);
"Approximate integral equation with rectangle rule:" $
discretize_renewal (delta_t, k) :=
if equal(k, 0)
then m(0) = F(0)
else m(k*delta_t) = F(k*delta_t)
+ m(k*delta_t)*f(0)*(delta_t / 2)
+ sum (m((k - j)*delta_t)*f(j*delta_t)*delta_t, j, 1, k - 1)
+ m(0)*f(k*delta_t)*(delta_t / 2);
make_eqs (n, delta_t) :=
makelist (discretize_renewal (delta_t, k), k, 0, n);
make_vars (n, delta_t) :=
makelist (m(k*delta_t), k, 0, n);
"Discretized integral equation and variables for n = 4, delta_t = 1/2:" $
make_eqs (4, 1/2);
make_vars (4, 1/2);
make_eqs_vars (n, delta_t) :=
[make_eqs (n, delta_t), make_vars (n, delta_t)];
load (distrib);
subst_pdf_cdf (shape, scale, e) :=
subst ([f = lambda ([x], pdf_weibull (x, shape, scale)), F = lambda ([x], cdf_weibull (x, shape, scale))], e);
matrix_from (eqs, vars) :=
(augcoefmatrix (eqs, vars),
[submatrix (%%, length(%%) + 1), - col (%%, length(%%) + 1)]);
"Subsitute Weibull pdf and cdf for shape = 2 into discretized equation:" $
apply (matrix_from, make_eqs_vars (4, 1/2));
subst_pdf_cdf (2, 1, %);
"Just the right-hand side matrix:" $
rhs_matrix_from (eqs, vars) :=
(map (rhs, eqs),
augcoefmatrix (%%, vars),
[submatrix (%%, length(%%) + 1), col (%%, length(%%) + 1)]);
"Generate the right-hand side matrix, instead of extracting it from equations:" $
generate_rhs_matrix (n, delta_t) :=
[delta_t * genmatrix (lambda ([i, j], if i = 1 and j = 1 then 0
elseif j > i then 0
elseif j = i then f(0)/2
elseif j = 1 then f(delta_t*(i - 1))/2
else f(delta_t*(i - j))), n + 1, n + 1),
transpose (makelist (F(k*delta_t), k, 0, n))];
"Generate numerical right-hand side matrix, skipping over formulas:" $
generate_rhs_matrix_numerical (shape, scale, n, delta_t) :=
block ([f, F, numer: true], local (f, F),
f: lambda ([x], pdf_weibull (x, shape, scale)),
F: lambda ([x], cdf_weibull (x, shape, scale)),
[genmatrix (lambda ([i, j], delta_t * if i = 1 and j = 1 then 0
elseif j > i then 0
elseif j = i then f(0)/2
elseif j = 1 then f(delta_t*(i - 1))/2
else f(delta_t*(i - j))), n + 1, n + 1),
transpose (makelist (F(k*delta_t), k, 0, n))]);
"Solve approximate integral equation (shape = 3, t = 1) via LU decomposition:" $
fpprintprec: 4 $
n: 20 $
t: 1;
[AA, bb]: generate_rhs_matrix_numerical (3, 1, n, t/n);
xx_by_lu: linsolve_by_lu (ident(n + 1) - AA, bb, floatfield);
"Iterative solution of approximate integral equation (shape = 3, t = 1):" $
xx: bb;
for i thru 10 do xx: AA . xx + bb;
xx - (AA.xx + bb);
xx_iterative: xx;
"Should find iterative and LU give same result:" $
xx_diff: xx_iterative - xx_by_lu[1];
sqrt (transpose(xx_diff) . xx_diff);
"Try shape = 2, t = 10:" $
n: 100 $
t: 10 $
[AA, bb]: generate_rhs_matrix_numerical (2, 1, n, t/n);
xx_by_lu: linsolve_by_lu (ident(n + 1) - AA, bb, floatfield);
"Baxter, et al., Eq. 3 (for large values of t) compared to discretization:" $
/* L.A. Baxter, E.M. Scheuer, D.J. McConalogue, W.R. Blischke.
* "On the Tabulation of the Renewal Function,"
* Econometrics, vol. 24, no. 2 (May 1982).
* H(t) is their notation for the renewal function.
*/
H(t) := t/mu + sigma^2/(2*mu^2) - 1/2;
tx_points: makelist ([float (k/n*t), xx_by_lu[1][k, 1]], k, 1, n);
plot2d ([H(u), [discrete, tx_points]], [u, 0, t]), mu = mean_weibull(2, 1), sigma = std_weibull(2, 1);

How to optimize multiple arrays in R?

I need minimize a funtion based in Hsieh Model, using R language. The main objective is to minimize a distance function that depends on a set of other functions.
obj = function(x1){
s = sf()
h_til = h_tilf()
w_til = w_tilf(x1)
w_r = w_rf()
p_ir = p_irf()
#H_tr = H_trf(x1)
W = Wf(x1)
f1 = matrix(0, i, r)
f2 = matrix(0, i, r)
for (c in 1:i){
for (j in 1:r){
f1[c, j] = ( (W[c, j] - W_t[c, j]) / W_t[c, j] ) ** 2
f2[c, j] = ( (p_ir[c, j] - p_t[c, j]) / p_t[c, j] ) ** 2
}
}
d1 = sum(f1)
d2 = sum(f2)
D = d1 + d2
return(D)
}
Therefore, my algorithm must find three parameters (w, tau_w, tau_h) that minimize this distance function. These three parameters are arrays with i rows and r columns. Given by:
w = runif(i*r, 0, 1)
tau_w = runif(i*r, -1, 1)
tau_h = runif(i*r, -1, 1)
x1 = array( c(tau_w, tau_h, w), dim = c(i, r, 3))
I trying solve this using optimx and Rsolnp libraries.
res = optim(x1, #starting values
obj) #function to optimise
But i get this error:
Error in x1[c, j, 1] : incorrect number of dimensions
This minimization is usually done using the Nelder-Mead algorithm.
I'm beginner in optimization and apreciate any help. My complete code is here.
The dimensions of the array x1 are lost when you do optim(x1, obj). So the error you get is returned by w_tilf(x1) because it involves x1[c,j,1].
Reconstruct the array at the beginning of the obj function:
obj = function(x1){
x1 = array(x1, dim = c(i, r, 3))
s = sf()
......
}
Then opt <- optim(x1, obj) should work now. It will return the solution in the opt$par field as a vector, you will have to do array(opt$par, dim = c(i, r, 3)) to get an array.

For Loop in R replacing Object Values at each iteration

I am struggling to figure out how to create a for loop in which some initial objects (u, l, h, and y) and their values are updated and reported at the end of each iteration of the loop. And that the loop takes into account the values of the prior iteration as the basis (for example after updating the above objects, the runif function takes the updated values of u and l in drawing a q. I keep getting the same result repeated with no variation, and I am unsure as to what might be the best way to resolve this.
Apologies in advance as I am fairly new to R and coding in general.
reset = {
l = 0.1 #lower bound of belief in theta
u = 0.9 #upper bound of belief in theta
h = 0.2 #lower legal threshold, below which an action is not liable
y = 0.8 #upper legal threshold, above which an action is liable
}
### need 1-u <= h <= y <= 1-l for each t along every path of play
period = c(1:100) ## Number of periods in the iteration of the loop.
for (t in 1:length(period)) {
q = runif(1,min = l, max = u) ### 1 draw of q from a uniform distribution
q
probg = function(q,l,u){(u - (1-q))/(u-l)} ### probability of being found guilty given q in the ambiguous region
probg(q,l,u)
probi = function(q,l,u){1-probg(q,l,u)} ### probability of being found innocent given q in the ambiguous region
probi(q,l,u)
ruling = if(q>=y | probg(q,l,u) > 1){print("Guilty") ###Strict liability
} else if(q<=h | probi(q,l,u) > 1) {print("Innocent") ###Permissible
} else if(q>h & q<y) { ###Ambiguous region
discovery = sample(c('guilty','not guilty'), size=1, replace=TRUE, prob=c(probg(q,l,u),probi(q,l,u))) ### court discovering whether a particular ambiguous q is permissible or not
}
discovery
ruling
if(ruling == "not guilty") {u = 1-q} else if (ruling == "guilty") {l = 1-q} else (print("beliefs unchanged"))
if(ruling == "not guilty"){h = 1 - u} else if (ruling == "guilty") {y = 1 - l} else (print("legal threshold unchanged")) #### legal adjustment and updating of beliefs in ambiguous region after discovery of liability
probg(q,l,u)
probi(q,l,u)
modelparam = c(l,u,h,y)
show(modelparam)
}

Newton's Method in R Precision/Output

So, I'm supposed to write the code to execute Newton's Method to calculate the square root of any arbitrary number to a specified precision (tolerance).
Here is my code:
MySqrt <- function(x, eps = 1e-6, itmax = 100, verbose = TRUE) {
GUESS <- 11
myvector <- integer(0)
i <- 1
if (x < 0) {
stop("Square root of negative value")
}
else {
myvector[i] <- GUESS
while (i <= itmax) {
GUESS <- (GUESS + (x/GUESS)) * 0.5
myvector[i+1] <- GUESS
if (abs(GUESS-myvector[i]) < eps) {
break()
}
if (verbose) {
cat("Iteration: ", formatC(i, width = 1), formatC(GUESS, digits = 10, width = 12), "\n")
}
i <- i + 1
}
}
myvector[i]
}
eps is the tolerance. When I use the function to calculate the square root of, say, 21, I got this as an output:
> MySqrt(21, eps = 1e-1, verbose = TRUE)
Iteration: 1 6.454545455
Iteration: 2 4.854033291
Iteration: 3 4.59016621
I'm not sure if the function stops carrying out iterations when it is supposed to, however. Can someone verify if my code is correct? This would be greatly appreciated!
Your code is almost correct. It is iterating the correct number of times. The only bug is that you don't increment i until after the break statement, so you are not returning the most recent approximation. Instead you are returning the previous one.
In order to verify that it is stopping at the right time, you can move the tracing line up above the break. You can also add GUESS-myvector[i] to the trace, so you can watch it halt as soon as the difference gets small enough. If you do this and run the function, the fact that it is stopping at the right time, as well as the fact that it is returning the wrong value, will be obvious:
> MySqrt(21,eps=1e-1)
Iteration: 1 6.454545 -4.545455
Iteration: 2 4.854033 -1.600512
Iteration: 3 4.590166 -0.2638671
Iteration: 4 4.582582 -0.007584239
[1] 4.590166
While your code is (almost) correct, it is not written in very good R style. For example, unless you want to return the entire vector of estimates, there is no reason that you need to keep them all around. Also, rather than using a while loop, here it would make more sense to use a for loop. Here one possible improved version of your function:
MySqrt <- function(x, eps = 1e-6, itmax = 100, verbose = TRUE) {
GUESS <- 11
if (x < 0) {
stop("Square root of negative value")
}
for(i in 1:itmax){
nextGUESS <- (GUESS + (x/GUESS)) * 0.5
if (verbose)
cat("Iteration: ", i, nextGUESS, nextGUESS-GUESS, "\n")
if (abs(GUESS-nextGUESS) < eps)
break
GUESS<- nextGUESS
}
nextGUESS
}

translating matlab script to R

I've just been working though converting some MATLAB scripts to work in R, however having never used MATLAB in my life, and not exactly being an expert on R I'm having some trouble.
Edit: It's a script I was given designed to correct temperature measurements for lag generated by insulation mass effects. My understanding is that It looks at the rate of change of the temperature and attempts to adjust for errors generated by the response time of the sensor. Unfortunately there is no literature available to me to give me an indication of the numbers i am expecting from the function, and the only way to find out will be to experimentally test it at a later date.
the original script:
function [Tc, dT] = CTD_TempTimelagCorrection(T0,Tau,t)
N1 = Tau/t;
Tc = T0;
N = 3;
for j=ceil(N/2):numel(T0)-ceil(N/2)
A = nan(N,1);
# Compute weights
for k=1:N
A(k) = (1/N) + N1 * ((12*k - (6*(N+1))) / (N*(N^2 - 1)));
end
A = A./sum(A);
# Verify unity
if sum(A) ~= 1
disp('Error: Sum of weights is not unity');
end
Comp = nan(N,1);
# Compute components
for k=1:N
Comp(k) = A(k)*T0(j - (ceil(N/2)) + k);
end
Tc(j) = sum(Comp);
dT = Tc - T0;
end
where I've managed to get to:
CTD_TempTimelagCorrection <- function(temp,Tau,t){
## Define which equation to use based on duration of lag and frequency
## With ESM2 profiler sampling # 2hz: N1>tau/t = TRUE
N1 = Tau/t
Tc = temp
N = 3
for(i in ceiling(N/2):length(temp)-ceiling(N/2)){
A = matrix(nrow=N,ncol=1)
# Compute weights
for(k in 1:N){
A[k] = (1/N) + N1 * ((12*k - (6*(N+1))) / (N*(N^2 - 1)))
}
A = A/sum(A)
# Verify unity
if(sum(A) != 1){
print("Error: Sum of weights is not unity")
}
Comp = matrix(nrow=N,ncol=1)
# Compute components
for(k in 1:N){
Comp[k] = A[k]*temp[i - (ceiling(N/2)) + k]
}
Tc[i] = sum(Comp)
dT = Tc - temp
}
return(dT)
}
I think the problem is the Comp[k] line, could someone point out what I've done wrong? I'm not sure I can select the elements of the array in such a way.
by the way, Tau = 1, t = 0.5 and temp (or T0) will be a vector.
Thanks
edit: apparently my description is too brief in explaining my code samples, not really sure what more I could write that would be relevant and not just wasting peoples time. Is this enough Mr Filter?
The error is as follows:
Error in Comp[k] = A[k] * temp[i - (ceiling(N/2)) + k] :
replacement has length zero
In addition: Warning message:
In Comp[k] = A[k] * temp[i - (ceiling(N/2)) + k] :
number of items to replace is not a multiple of replacement length
If you write print(i - (ceiling(N/2)) + k) before that line, you will see that you are using incorrect indices for temp[i - (ceiling(N/2)) + k], which means that nothing is returned to be inserted into Comp[k]. I assume this problem is due to Matlab allowing the use of 0 as an index and not R, and the way negative indices are handled (they don't work the same in both languages). You need to implement a fix to return the correct indices.

Resources