Get out of infinite while loop - r

What is the best way to have a while loop recognize when it is stuck in an infinite loop in R?
Here's my situation:
diff_val = Inf
last_val = 0
while(diff_val > 0.1){
### calculate val from data subset that is greater than the previous iteration's val
val = foo(subset(data, col1 > last_val))
diff_val = abs(val - last_val) ### how much did this change val?
last_val = val ### set last_val for the next iteration
}
The goal is to have val get progressively closer and closer to a stable value, and when val is within 0.1 of the val from the last iteration, then it is deemed sufficiently stable and is released from the while loop. My problem is that with some data sets, val gets stuck alternating back and forth between two values. For example, iterating back and forth between 27.0 and 27.7. Thus, it never stabilizes. How can I break the while loop if this occurs?
I know of break but do not know how to tell the loop when to use it. I imagine holding onto the value from two iterations before would work, but I do not know of a way to keep values two iterations ago...
while(diff_val > 0.1){
val = foo(subset(data, col1 > last_val))
diff_val = abs(val - last_val)
last_val = val
if(val == val_2_iterations_ago) break
}
How can I create val_2_iterations_ago?
Apologies for the non-reproducible code. The real foo() and data that are needed to replicate the situation are not mine to share... they aren't key to figuring out this issue with control flow, though.

I don't know if just keeping track of the previous two iterations will actually suffice, but it isn't too much trouble to add logic for this.
The logic is that at each iteration, the second to last value becomes the last value, the last value becomes the current value, and the current value is derived from foo(). Consider this code:
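val_2_iterations_ago <- NA  # no history yet, so nothing to compare against
while (diff_val > 0.1) {
  val <- foo(subset(data, col1 > last_val))
  # stop if we are back at the value from two iterations ago (a two-value cycle)
  if (!is.na(val_2_iterations_ago) && val == val_2_iterations_ago) break
  diff_val <- abs(val - last_val)
  val_2_iterations_ago <- last_val
  last_val <- val
}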
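To see the two-iteration history at work, here is a minimal sketch. toy_foo() is a made-up stand-in for the real foo() (which is not available) and simply flips between 27.0 and 27.7; the sketch also compares with all.equal() instead of ==, on the assumption that the real values are floating point and might not repeat bit-for-bit.

```r
# Hypothetical stand-in for foo(): alternates between 27.0 and 27.7 forever
toy_foo <- function(last_val) {
  if (isTRUE(all.equal(last_val, 27.0))) 27.7 else 27.0
}

diff_val <- Inf
last_val <- 0
val_2_iterations_ago <- NA_real_  # no history yet
cycled <- FALSE
iterations <- 0

while (diff_val > 0.1) {
  iterations <- iterations + 1
  val <- toy_foo(last_val)
  # all.equal() compares with a small numeric tolerance, unlike ==
  if (!is.na(val_2_iterations_ago) &&
      isTRUE(all.equal(val, val_2_iterations_ago))) {
    cycled <- TRUE
    break
  }
  diff_val <- abs(val - last_val)
  val_2_iterations_ago <- last_val
  last_val <- val
}
```

With this toy function the loop leaves on the third pass with cycled set to TRUE, which lets the calling code tell "stopped because of a cycle" apart from "stopped because it converged".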

Another approach, perhaps a little more general, would be to track your iterations and set a maximum.
Pairing this with Tim's nice answer:
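iter <- 0
max_iter <- 1e6
val_2_iterations_ago <- NA  # no history yet
# && is the scalar, short-circuiting "and", which is what a loop condition needs
while (diff_val > 0.1 && iter < max_iter) {
  val <- foo(subset(data, col1 > last_val))
  if (!is.na(val_2_iterations_ago) && val == val_2_iterations_ago) break
  diff_val <- abs(val - last_val)
  val_2_iterations_ago <- last_val
  last_val <- val
  iter <- iter + 1
}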

How this is generally done is that you have:
A convergence tolerance, so that when your objective function doesn't change appreciably, the algorithm is deemed to have converged
A limit on the number of iterations, so that the code is guaranteed to terminate eventually
A check that the objective function is actually decreasing, to catch the situation where it's diverging/cyclic (many optimisation algorithms are designed so this shouldn't happen, but in your case it does happen)
Pseudocode:
oldVal <- Inf
for(i in 1:NITERS)
{
val <- objective(x)
diffVal <- val - oldVal
converged <- (diffVal <= 0 && abs(diffVal) < TOL)
if(converged || diffVal > 0)
break
oldVal <- val
}
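A runnable sketch of the pseudocode above, under stated assumptions: objective() and step() are invented here purely for illustration (a one-dimensional quadratic and a gradient-descent update), and the names NITERS and TOL follow the pseudocode. The three exit conditions are kept separate so the reason for stopping is recorded.

```r
# Hypothetical objective and update rule, chosen only to make the sketch runnable
objective <- function(x) (x - 3)^2
step <- function(x) x - 0.25 * 2 * (x - 3)  # gradient descent with step size 0.25

x <- 10
NITERS <- 100
TOL <- 1e-8
old_val <- Inf
status <- "hit iteration limit"

for (i in seq_len(NITERS)) {
  val <- objective(x)
  diff_val <- val - old_val
  if (diff_val > 0) {          # objective went up: diverging or cycling
    status <- "objective increased"
    break
  }
  if (abs(diff_val) < TOL) {   # change is negligible: treat as converged
    status <- "converged"
    break
  }
  old_val <- val
  x <- step(x)
}
```

Because the for loop is bounded by NITERS, the code is guaranteed to terminate even if neither break is ever reached.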

Related

R dataframe uses values in current row from previous row

I have a data frame in R as defined below:
df <- data.frame('ID'=c(1,1,1,1),
'Month' =c('M1','M2','M3','M4'),
"Initial.Balance" =c(100,100,100,0),
"Value" = c(0.1,0.2,0.2,0.2),
"Threshold"=c(0.05,0.18,0.25,0.25),
"Intermediate.Balance"=c(0,0,100,0),
"Final.Balance"=c(100,100,0,0))
This task uses Initial.Balance (in current row) from the Final.Balance of the previous row.
When Value >= Threshold, Intermediate.Balance=0 and Final.Balance = Initial.Balance-Intermediate.Balance
When Value < Threshold, Intermediate.Balance = Initial.Balance and Final.Balance = Initial.Balance-Intermediate.Balance
I have tried to accomplish this task using a for loop, but it takes a lot of time on a large dataset (with many IDs).
Here is my solution:
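for (i in seq_len(nrow(df))) {
  # Value >= Threshold keeps the balance; otherwise the whole balance moves out
  df$Intermediate.Balance[i] <- ifelse(df$Value[i] >= df$Threshold[i], 0, df$Initial.Balance[i])
  df$Final.Balance[i] <- df$Initial.Balance[i] - df$Intermediate.Balance[i]
  if (i + 1 <= nrow(df)) {
    # carry this row's final balance into the next row's initial balance
    df$Initial.Balance[i + 1] <- df$Final.Balance[i]
  }
}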
Can we look for similar solution using Data Table? As data table operations are quicker than for loop on dataframe, I believe this will help me save computation time.
Thanks,
I think in this particular case, final balance goes to 0 once there is a row with Value less than Threshold and subsequent balances all go to 0. So you can use this:
library(data.table)
setDT(df)  # := only works on a data.table, so convert df by reference first
ib <- 100
df[, InitBal := ib * 0^shift(cumsum(Value<=Threshold), fill=0L)]
df[, ItmdBal := replace(rep(0, .N), which(Value<=Threshold)[1L], ib)]
df[, FinlBal := InitBal - ItmdBal]
or in one []:
df[, c("InitBal", "ItmdBal", "FinlBal") := {
v <- Value<=Threshold
InitBal <- ib * 0^shift(cumsum(v), fill=0L)
ItmdBal <- replace(rep(0, .N), which(v)[1L], ib)
.(InitBal, ItmdBal, InitBal - ItmdBal)
}]
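The same idea can be sketched in base R without data.table, as a rough check of the logic rather than a replacement for the code above. It uses the strict Value < Threshold rule stated in the question; the variable names below and hits_before are illustrative only.

```r
# Toy data: only the two columns the rule depends on, values from the question
df <- data.frame(Value = c(0.1, 0.2, 0.2, 0.2),
                 Threshold = c(0.05, 0.18, 0.25, 0.25))
ib <- 100  # starting balance

below <- df$Value < df$Threshold
# number of "below threshold" rows strictly before each row
hits_before <- c(0, head(cumsum(below), -1))

# the balance survives only until the first row that falls below its threshold
df$Initial.Balance <- ifelse(hits_before == 0, ib, 0)
df$Intermediate.Balance <- ifelse(below & hits_before == 0, ib, 0)
df$Final.Balance <- df$Initial.Balance - df$Intermediate.Balance
```

On the question's four rows this reproduces the expected columns: the balance stays at 100 through M2, is moved out in M3, and is 0 afterwards.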
Or a more general approach using Rcpp when the intermediate balance is not simply equal to the initial balance:
library(Rcpp)
cppFunction('List calc(NumericVector Value, NumericVector Threshold, double init) {
int n = Value.size();
NumericVector InitialBalance(n), IntermediateBalance(n), FinalBalance(n);
InitialBalance[0] = init;
for (int i=0; i<n; i++) {
if (Value[i] <= Threshold[i]) {
IntermediateBalance[i] = InitialBalance[i];
}
FinalBalance[i] = InitialBalance[i] - IntermediateBalance[i];
if (i < n-1) {
InitialBalance[i+1] = FinalBalance[i];
}
}
return List::create(Named("InitialBalance") = InitialBalance,
Named("IntermediateBalance") = IntermediateBalance,
Named("FinalBalance") = FinalBalance);
}')
setDT(df)[, calc(Value, Threshold, Initial.Balance[1L])]
I can't see an obvious way of getting rid of the loop, since each row determines the next. That being said, data.frames copy the whole frame, or at least whole columns, whenever you set some portion of them, whereas data.table can update individual cells by reference. As such you can do this:
dt<-as.data.table(df)
for(i in 1:nrow(dt)) {
dt[i,Intermediate.Balance:=ifelse(Value>Threshold,0,Initial.Balance)]
dt[i,Final.Balance:=Initial.Balance-Intermediate.Balance]
if(i+1<=nrow(dt)) dt[i+1,Initial.Balance:=dt[i,Final.Balance]]
}
You could also try the set function but I'm not sure if it'll be faster, or by how much, given that the data comes from the data.table anyway.
dt<-as.data.table(df)
for(i in 1:nrow(dt)) {
i<-as.integer(i)
set(dt,i,"Intermediate.Balance", ifelse(dt[i,Value]>dt[i,Threshold],0,dt[i,Initial.Balance]))
set(dt,i,"Final.Balance", dt[i,Initial.Balance-Intermediate.Balance])
if(i+1<=nrow(dt)) set(dt,i+1L,"Initial.Balance", dt[i,Final.Balance])
}

Error message in Bubble sort code in R language

I wrote a bubble sort in R. Sometimes it works perfectly without any error message, but sometimes it shows "Error in if (x[i] > x[i + 1]) { : argument is of length zero". Can anyone help me check what's wrong with it? I have attached my code below.
example <- function(x) {
n <- length(x)
repeat {
hasChanged <- FALSE
n <- n - 1
for(i in 1:n) {
if ( x[i] > x[i+1] ) {
temp <- x[i]
x[i] <- x[i+1]
x[i+1] <- temp
hasChanged <- TRUE
cat("The current Vector is", x ,"\n")
}
}
if ( !hasChanged ) break;
}
}
x <-sample(1:10,5)
cat("The original Vector is", x ,"\n")
example(x)
The error occurs because n is decremented on every pass. Depending on the original order of the vector, a swap can still happen in the pass where n is 1. The next pass then sets n to 0, so the loop runs over 1:0, which is c(1, 0), not an empty sequence. When i is 0, x[0] is a zero-length vector, the comparison x[i] > x[i + 1] returns logical(0), and if() fails with "argument is of length zero".
With a minimal correction your code will work properly, without giving error messages. Try to replace the line
if ( !hasChanged ) break;
with
if ( !hasChanged || n == 1 ) break
Basically you have two termination criteria: Either nothing has been changed in the previous iteration or n is equal to one. In both cases, a further iteration won't change the vector since it is already ordered.
By the way, in R programming you don't need a semicolon at the end of a command. It is tolerated/ignored by the interpreter, but it clutters the code and is not considered good programming style.
Hope this helps.
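Putting the correction together, a minimal sketch of the whole function might look like this. The name bubble_sort, the early return for very short vectors, and returning the sorted vector instead of printing it are additions that were not in the original code.

```r
bubble_sort <- function(x) {
  n <- length(x)
  if (n < 2) return(x)  # nothing to sort; also avoids looping over 1:0
  repeat {
    has_changed <- FALSE
    n <- n - 1
    for (i in seq_len(n)) {
      if (x[i] > x[i + 1]) {
        # swap neighbours that are out of order
        tmp <- x[i]
        x[i] <- x[i + 1]
        x[i + 1] <- tmp
        has_changed <- TRUE
      }
    }
    # stop when a pass made no swaps, or when only one comparison was left
    if (!has_changed || n == 1) break
  }
  x
}
```

Using seq_len(n) instead of 1:n is a second line of defence: seq_len(0) is empty, so the loop body is skipped rather than run with i equal to 0.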

Loop will not execute in R

I have a loop I want to execute that depends on the output of the previous loop in the code. This is the code;
holder <- list()
if (i < historyLength) movement <- movementType(relAngle, angleThreshold)
else if (i > historyLength-1) {
# Array to store speeds
speedHistory <- array(historyLength)
n = historyLength-1
# get the speeds from the previous n (historyLength) "Movements"
for (j in seq(1, length(historyLength))){
speedHistory [n] = R[i-j, 6]
n-1
}
if (!bayesFilter(speedHistory, minSpeed, GPS_accy)) movement <- "non-moving"
else if(bayesFilter(speedHistory, minSpeed, GPS_accy)) movement <- movementType(relAngle, angleThreshold)
}
holder [[i]] <- (movement)
for (t in seq(1, length(holder))){
if (t == t-1)
changes <- 0
else if (t != t-1)
changes <- 1
}
You cannot see the beginning of loop but it results in a column of data called 'movements.'
I have attempted to temporarily store the 'movements' in the object 'holder'. What I want is for the bottom for loop to go through 'holder' and label changes as either 0 or 1 in another column: if a 'movement' is equal to the previous record, the change is 0, otherwise it is 1. I think the problem is with the object 'holder' perhaps?
Currently I'm getting it to loop but it's only printing out a column of '1's.'
Any help much appreciated! Thanks.
Currently get the following output:
Movement Changes
left 1
right 1
forward 1
non-moving 1
non-moving 1
Think the problem lies in the list where movements are stored? Sorry, if I knew where the problem was I'd be more specific. Really new to this!
I end up with a data frame with column headers "Distance", "Speed", "Heading", "Movement" and "Changes". It's looping fine, but for some reason Changes results in a column of 1's as above. Is there an obvious mistake below?
holder[[i]] <- (movement)
for (t in seq(1, length(holder))){
if (t == t-1)
changes <- 0
else if (t != t-1)
changes <- 1
I have also tried this, but then it doesn't loop at all.
holder[[i]] <- (movement)
for (t in seq(1, length(holder))){
if (holder[t] == holder[t-1])
changes <- 0
else if (holder[t] != holder[t-1])
changes <- 1
I'm currently getting this error: Error in holder[[t - 1]] : attempt to select less than one element
for the following code:
holder <- list(movement)
for (t in length(holder)){
if (holder[[t]] == holder[[t-1]])
changes <- 0
else changes <- 1
This is too long for a comment so I'm putting this as answer (actually it might answer your problem):
As I already mentioned in a comment on your previous question, you should look at what seq(1, length(holder)) actually produces, and therefore what if (t == t-1) is testing: t is just an index, so you are asking something like "if 1 == 0", which can never be TRUE.
You need to go with "the second version" of your loop (or, actually, no loop at all), which compares the right things. Two caveats: holder is a list, so you need to either define it as a vector or use double brackets (holder[[t]]); and the loop has to start at 2, because holder[[t-1]] does not exist when t is 1 (that is the "attempt to select less than one element" error).
You don't need another if after else. What you are actually "saying" to R is "if A is true then do something; else, if 'opposite A' is true, then do something else", but if A is not TRUE then 'opposite A' necessarily is.
So something like:
changes <- integer(length(holder))  # first entry has no predecessor, so it stays 0
for (t in seq_along(holder)[-1]) {
  changes[t] <- if (holder[[t]] == holder[[t - 1]]) 0L else 1L
}
Please consider spending some time on the answer to your previous question to understand why your solution didn't work and why the answer provided did. (This includes reading the documentation for the different functions and looking at the values your variables take, e.g. by running the loop one "turn" at a time.)
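For the "without a loop" route, a minimal sketch is shown here. The holder contents are made up to match the output shown in the question, and the sketch assumes holder is non-empty and holds one string per element.

```r
# Hypothetical example data matching the Movement column in the question
holder <- list("left", "right", "forward", "non-moving", "non-moving")

movement <- unlist(holder)  # a character vector is easier to compare than a list
n <- length(movement)

# Compare every element with the one before it; the first has no predecessor,
# so it is recorded as 0 (no change)
changes <- c(0L, as.integer(movement[-1] != movement[-n]))
```

Here movement[-1] drops the first element and movement[-n] drops the last, so the two vectors line up each value with its predecessor and the comparison is done in one vectorised step.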

R and apply info

I couldn't find any answers to this. I've got the following code and am trying to convert it to apply so that it runs quicker; my data set is 130k rows long. I need an apply that will calculate the missing finish times of the horses from Behind (in lengths) and the winning horse's time. The problem is that the Behind column gives the distance behind the horse immediately in front, not behind the winner. So I need a variable that carries over as the function goes and resets when a new race is identified (position == 1).
missingTimes <- function(x) {
L <- 2.4384
for(i in 1:nrow(x) - 10) {
distanceL <- (x$distance[i] * 1000) / L
LperS <- x$Winner.Race.time[i] / distanceL
if(x$position[i] == 1 && !is.na(x$position[i])) {
distanceL <- NULL
LperS <- NULL
}
if(grepl("L",x$Behind[i])) {
x$results[i] <- (distanceL + as.numeric(sub("L", "", x$Behind[i]))) * LperS
}
}
}
I need at least 10 reputation to post images, so here are links instead:
http://i.stack.imgur.com/xN23M.png
http://i.stack.imgur.com/Cspfr.png
The results should just give me a column with the proper times for the finish times of the other horses, in a form like the column Winner Race Time
To make this clearer, here are a few results worked out by hand:
Starting with the first row, it sees position = 1, so it resets the variables.
Then it takes distance * 1000 and divides it by the constant L:
2.375 * 1000 / 2.4384 = 973.99
Then it needs the time in seconds it takes to cover one length (L):
290.9 / 973.99 = 0.298
To get the finish time of the second horse, it adds the lengths behind to the length of the track and multiplies the sum by the seconds per length:
973.99 + 2.25 = 976.24, and 976.24 * 0.298 = 290.91952
Then for the next horse's time it would be:
976.24 + 13 = 989.24, and 989.24 * 0.298 = 294.79352
And so on. Remember that when it hits position = 1, the distance needs to reset.
What I've done alternatively is put the distanceL in a separate column, same with LperS, of course after calculation.
If you could walk me through steps required to get that done It'd be great. I'm a complete rookie to the R stuff, so please be descriptive. I hope you catch my understanding!
Thank you!
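One way the hand calculation above could be vectorised is sketched here, not as a definitive solution. The toy races data frame is invented, its column names follow the function in the question, and the sketch assumes that every race starts with its position == 1 row and that the winner's Behind entry is empty.

```r
# Hypothetical toy data: two races, the first one mirroring the worked example
races <- data.frame(
  position = c(1, 2, 3, 1, 2),
  distance = c(2.375, 2.375, 2.375, 2.0, 2.0),
  Winner.Race.time = c(290.9, 290.9, 290.9, 240, 240),
  Behind = c("", "2.25L", "13L", "", "1L"),
  stringsAsFactors = FALSE
)
L <- 2.4384

# A new race starts at every position == 1 row (assumption about row order)
race_id <- cumsum(races$position == 1)

# Lengths behind the horse in front; the winner has none
behind <- as.numeric(sub("L", "", races$Behind))
behind[is.na(behind)] <- 0

# Running total within each race gives the lengths behind the winner
cum_behind <- ave(behind, race_id, FUN = cumsum)

distanceL <- races$distance * 1000 / L            # track length in lengths
LperS <- races$Winner.Race.time / distanceL       # seconds per length, as in the question
races$results <- (distanceL + cum_behind) * LperS
```

Because the running total is computed per race_id, it resets automatically at each winner, which replaces the manual reset inside the loop. The figures differ slightly from the hand calculation because nothing is rounded along the way.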

Finding duplicate values in r

So, In a string containing multiple 1's,
Now, it is possible that, the number
'1'
appears at several positions, let's say, at multiple positions. What I want is
(3)
This is not a complete answer, but some ideas (partly based on comments):
z <- "1101101101"
zz <- as.numeric(strsplit(z,"")[[1]])
Compute autocorrelation function and draw plot: in this case I'm getting the periodicity=3 pretty crudely as the first point at which there is an increase followed by a decrease ...
a1 <- acf(zz)
first.peak <- which(diff(sign(diff(a1$acf[,,1])))==-2)[1]
Now we know the periodicity is 3; create runs of 3 with embed() and analyze their similarities:
ee <- embed(zz,first.peak)
pp <- apply(ee,1,paste,collapse="")
mm <- outer(pp,pp,"==")
aa <- apply(mm[!duplicated(mm),],1,which)
sapply(aa,length) ## 3 3 2 ## number of repeats
sapply(aa,function(x) unique(diff(x))) ## 3 3 3
The following code does exactly what you ask for. Try it with str_groups('1101101101'). It returns a list of 3-vectors. Note that the first triple is (1, 3, 4) because the character at the 10th position is also a 1.
Final version, optimized and without errors
str_groups <- function (s) {
  digits <- as.numeric(strsplit(s, '')[[1]])
  index1 <- which(digits == 1)
  len <- length(digits)
  back <- length(index1)
  if (back == 0) return(list())
  maxpitch <- (len - 1) %/% 2
  patterns <- matrix(0, len, maxpitch)
  result <- list()
  for (pitch in 1:maxpitch) {
    divisors <- which(pitch %% 1:(pitch %/% 2) == 0)
    while (index1[back] > len - 2 * pitch) {
      back <- back - 1
      if (back == 0) return(result)
    }
    for (startpos in index1[1:back]) {
      if (patterns[startpos, pitch] != 0) next
      pos <- seq(startpos, len, pitch)
      if (digits[pos[2]] != 1 || digits[pos[3]] != 1) next
      repeats <- length(pos)
      if (repeats > 3) for (i in 4:repeats) {
        if (digits[pos[i]] != 1) {
          repeats <- i - 1
          break
        }
      }
      continue <- FALSE
      for (subpitch in divisors) {
        sublen <- patterns[startpos, subpitch]
        if (sublen > pitch / subpitch * (repeats - 1)) {
          continue <- TRUE
          break
        }
      }
      if (continue) next
      for (i in 1:repeats) patterns[pos[i], pitch] <- repeats - i + 1
      result <- append(result, list(c(startpos, pitch, repeats)))
    }
  }
  return(result)
}
Note: this algorithm has roughly quadratic runtime complexity, so if you make your strings twice as long, it will take four times as much time to find all patterns on average.
Pseudocode version
To aid understanding of the code. For particulars of R functions such as which, consult the R online documentation, for example by running ?which on the R command line.
PROCEDURE str_groups WITH INPUT $s (a string of the form /(0|1)*/):
  digits := array containing the digits in $s
  index1 := positions of the digits in $s that are equal to 1
  len := pointer to last item in $digits
  back := pointer to last item in $index1
  IF there are no items in $index1, EXIT WITH empty list
  maxpitch := the greatest possible interval between 1-digits, given $len
  patterns := array with $len rows and $maxpitch columns, initially all zero
  result := array of triplets, initially empty
  FOR EACH possible $pitch FROM 1 TO $maxpitch:
    divisors := array of divisors of $pitch (including 1, excluding $pitch)
    UPDATE $back TO the last position at which a pattern could start;
      IF no such position remains, EXIT WITH result
    FOR EACH possible $startpos IN $index1 up to $back:
      IF $startpos is marked as part of a pattern, SKIP TO NEXT $startpos
      pos := possible positions of pattern members given $startpos, $pitch
      IF either the 2nd or 3rd $pos is not 1, SKIP TO NEXT $startpos
      repeats := the number of positions in $pos
      IF there are more than 3 positions in $pos THEN
        count how long the pattern continues
        UPDATE $repeats TO the length of the pattern
      END IF (more than 3 positions)
      FOR EACH possible $subpitch IN $divisors:
        check $patterns for pattern with interval $subpitch at $startpos
        IF such a pattern is found AND it envelopes the current pattern,
          SKIP TO NEXT $startpos
          (using helper variable $continue to cross two loop levels)
        END IF (pattern found)
      END FOR (subpitch)
      FOR EACH consecutive position IN the pattern:
        UPDATE $patterns at row of position and column of $pitch TO ...
        ... the remaining length of the pattern at that position
      END FOR (position)
      APPEND the triplet ($startpos, $pitch, $repeats) TO $result
    END FOR (startpos)
  END FOR (pitch)
  EXIT WITH $result
END PROCEDURE (str_groups)
Perhaps the following route will help:
Convert the string to a vector of integers:
v <- as.integer(strsplit(s, "")[[1]])
Repeatedly convert this vector to matrices of varying number of rows...
m <- matrix(v, nrow=...)
...and use rle to find relevant patterns in the rows of the matrix m:
rle(m[1, ]); rle(m[2, ]); ...
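As a rough illustration of this route for a single candidate spacing, the sketch below fixes the number of rows at 3 and pads the vector with NA so that it fills the matrix exactly. The example string, the variable p and the padding step are assumptions added here; a full solution would repeat this for every candidate p.

```r
s <- "1101101101"
v <- as.integer(strsplit(s, "")[[1]])

p <- 3  # candidate spacing between repeated 1s (assumed for this example)

# Pad with NA so the length is a multiple of p and matrix() does not recycle
padded <- c(v, rep(NA, (-length(v)) %% p))

# Filled column by column, so row r holds positions r, r + p, r + 2p, ...
m <- matrix(padded, nrow = p)

# Runs of 1s within a row are 1s spaced exactly p apart in the original string
runs_row1 <- rle(m[1, ])
runs_row3 <- rle(m[3, ])
```

For this string, row 1 holds positions 1, 4, 7 and 10, all of which are 1, so rle() reports a single run of four 1s, while row 3 holds the 0s at positions 3, 6 and 9.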

Resources