I am trying to learn how to implement control structures such as for and while loops.
I created a function that simulates betting in a famous Brazilian lotto.
In the lotto, a player bets on 6 unique integers from a 1:60 vector (called your_bet).
The function samples 6 values from the 1 to 60 universe ("result") and tests how many values in result match your_bet, printing out:
your_bet
result
total points (out of 6 possible)
one of three possible commentaries on the result of the bet.
Code as follows:
```
LOTTO<-function(your_bet=sample(1:60, size=6, replace=FALSE)){
result<-sample(1:60, size=6, replace=FALSE)
logical_vector<-(your_bet %in% result)
total_points<-sum(as.integer(logical_vector))
print(paste(c("Your bet:", as.character(your_bet))), collapse="")
print(paste(c("Result", as.character(result))), collapse="")
print(paste(c("Total points", as.character(total_points))), collapse="")
if (total_points==6)
print("You are a millonaire")
else if (total_points==5)
print("5 points, you are rich!")
else print("good luck next time")
}
```
I then tried to implement a loop that would rerun the function until total_points >= a given target (target_points), modifying the function as below.
```
LOTTO<-function(your_bet=sample(1:60, size=6, replace=FALSE), stubborn_until_x_points=FALSE,
                target_points){ #inserted stubborn_until_x_points and target_points arguments
  result<-sample(1:60, size=6, replace=FALSE)
  logical_vector<-(your_bet %in% result)
  total_points<-sum(as.integer(logical_vector))
  print(paste(c("Your bet:", as.character(your_bet))), collapse="")
  print(paste(c("Result", as.character(result))), collapse="")
  print(paste(c("Total points", as.character(total_points))), collapse="")
  if (total_points==6)
    print("You are a millonaire")
  else if (total_points==5)
    print("5 points, you are rich!")
  else print("good luck next time")
  if (stubborn_until_x_points==TRUE){ #Inserted WHILE loop here
    while(total_points < target_points){
      LOTTO(your_bet, stubborn_until_x_points=TRUE, target_points)}
  }
}
```
This did make the function repeat in a loop, but for some reason, it keeps looping even when the condition is met.
Observation - I realized that when the condition is met on the first run, it actually stops, but after entering a loop, it goes on forever.
I can't find what is wrong. Any ideas?
Thanks in advance.
I haven't dug very deeply into the function, but the only reason the loop should never break is target_points > n_samples (6 here).
That case is obvious and simple to fix: keep target_points at or below 6, or add the sample size as an argument (6 at the moment) and check that target_points does not exceed it. However, I suspect the main problem lies in the recursion. R is rather restrictive when it comes to recursion. If one tries a simple recursion, for example
i <- 0
f <- function(){
i <<- i + 1
f()
}
f()
i
# 896 on my pc
one can see that we cannot use recursion for very deep recursive functions (in R). This throws the very unhelpful error
Error: C stack usage 7974196 is too close to the limit
To alleviate this, one *simply* has to remove the recursion (*simply* in italics, because sometimes it is not simple). In this case we just move the while loop so that it contains the main body of the function, and use an if statement to break early if necessary.
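As a minimal sketch of that idea (the helper names play_once, scope_demo and play_until are hypothetical and not taken from the question): a variable assigned inside a function call is local to that call, so a recursive call cannot change the total_points that the caller's while condition is testing. Reassigning the loop variable from a return value avoids the problem.

```r
# Hypothetical helpers for illustration only.

# One draw: returns the number of matches instead of printing it.
play_once <- function(your_bet) {
  result <- sample(1:60, size = 6, replace = FALSE)
  sum(your_bet %in% result)
}

# Assignments inside a called function do not touch the caller's variables.
scope_demo <- function() {
  total_points <- 0
  set_to_six <- function() {
    total_points <- 6   # creates a new local variable inside set_to_six()
    invisible(NULL)
  }
  set_to_six()
  total_points          # still 0 here
}

# Loop without recursion: the loop variable is updated on every pass.
# max_tries is an assumed safety cap so the loop always terminates.
play_until <- function(your_bet, target_points, max_tries = 1e5) {
  total_points <- -Inf
  tries <- 0
  while (total_points < target_points && tries < max_tries) {
    total_points <- play_once(your_bet)
    tries <- tries + 1
  }
  c(total_points = total_points, tries = tries)
}

scope_demo()
outcome <- play_until(your_bet = 1:6, target_points = 2)
outcome
```

In the question's version the recursive call plays the role of set_to_six(): it computes a new total_points of its own, but the outer loop keeps comparing the old value.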
Below is a slightly modified version of the function (note that sample_n and number_range have been added as arguments).
In this function the while loop has been moved to contain the main body, and the result is instead printed at the end (using the variable res to figure out the result). At the end of the loop I use an if(isFALSE(stubborn_until_x_points))break statement to exit early if necessary.
LOTTO <- function(your_bet,
                  sample_n = 6,
                  number_range = 1:60,
                  stubborn_until_x_points = FALSE,
                  target_points){
  if(missing(target_points) || target_points > sample_n)
    stop('missing target_points or target_points too large')
  total_points <- -Inf # Always smaller than target_points
  i <- 0
  res <- 0
  # If you want a new bet every iteration.
  # Add this at the end of the loop, but remove the 'if'
  if(missing(your_bet))
    your_bet <- sample(number_range, size=sample_n, replace=FALSE)
  while(total_points < target_points){
    result <- sample(number_range, size=sample_n, replace=FALSE)
    logical_vector <- your_bet %in% result
    total_points <- sum(logical_vector)
    if (total_points==6){
      res <- 1
    }else if (total_points==5){
      res <- 2
    }
    i <- i + 1
    if(isFALSE(stubborn_until_x_points))
      break
  }
  if(res == 1)
    cat(res <- 'You\'re a millionaire!\n', sep = '\n')
  else if(res == 2)
    cat(res <- '5 points, you are rich!\n', sep = '\n')
  else
    cat(res <- 'Better luck next time.', sep = '\n')
  c(result = res, number_of_tries = i)
}
The function is called as before, but now also returns the number of attempts and the result obtained from trials as shown below.
LOTTO(target_points = 6, stubborn_until_x_points = TRUE)
You're a millionaire!
#Output:
result number_of_tries
"You're a millionaire!\n" "8297820"
Removing the recursion, moving the body of the function inside the while loop, assigning -Inf to the initial total_points and adding the break statement were all very useful.
Adapting the answer by @Oliver yielded exactly what I was looking for:
LOTTO<-function(your_bet=sample(1:60, size=6, replace=FALSE), stubborn_until_x_points=FALSE,
                target_points=0){
  total_points<--Inf
  while(total_points < target_points){
    result<-sample(1:60, size=6, replace=FALSE)
    logical_vector<-(your_bet %in% result)
    total_points<-sum(as.integer(logical_vector))
    print(paste(c("Your bet:", as.character(your_bet))), collapse="")
    print(paste(c("Result", as.character(result))), collapse="")
    print(paste(c("Total points", as.character(total_points))), collapse="")
    if (total_points==6)
      print("You are a millonaire")
    else if (total_points==5)
      print("5 points, you are rich!")
    else print("good luck next time")
    # Stop after one draw unless the stubborn mode was requested
    if (isFALSE(stubborn_until_x_points))
      break
  }
}
His answer, however, yields interesting additional results, and better control over inadequate argument inputs
When working with R I frequently get the error message "subscript out of bounds". For example:
# Load necessary libraries and data
library(igraph)
library(NetData)
data(kracknets, package = "NetData")
# Reduce dataset to nonzero edges
krack_full_nonzero_edges <- subset(krack_full_data_frame, (advice_tie > 0 | friendship_tie > 0 | reports_to_tie > 0))
# Convert to graph data frame
krack_full <- graph.data.frame(krack_full_nonzero_edges)
# Set vertex attributes
for (i in V(krack_full)) {
for (j in names(attributes)) {
krack_full <- set.vertex.attribute(krack_full, j, index=i, attributes[i+1,j])
}
}
# Calculate reachability for each vertex
reachability <- function(g, m) {
reach_mat = matrix(nrow = vcount(g),
ncol = vcount(g))
for (i in 1:vcount(g)) {
reach_mat[i,] = 0
this_node_reach <- subcomponent(g, (i - 1), mode = m)
for (j in 1:(length(this_node_reach))) {
alter = this_node_reach[j] + 1
reach_mat[i, alter] = 1
}
}
return(reach_mat)
}
reach_full_in <- reachability(krack_full, 'in')
reach_full_in
This generates the following error: Error in reach_mat[i, alter] = 1 : subscript out of bounds.
However, my question is not about this particular piece of code (even though it would be helpful to solve that too), but my question is more general:
What is the definition of a subscript-out-of-bounds error? What causes it?
Are there any generic ways of approaching this kind of error?
This happens because you are trying to access an array element outside its bounds.
I will show you how you can debug such errors.
I set options(error=recover)
I run reach_full_in <- reachability(krack_full, 'in')
I get :
reach_full_in <- reachability(krack_full, 'in')
Error in reach_mat[i, alter] = 1 : subscript out of bounds
Enter a frame number, or 0 to exit
1: reachability(krack_full, "in")
I enter 1 and I get
Called from: top level
I type ls() to see my current variables
[1] "*tmp*" "alter" "g"
"i" "j" "m"
"reach_mat" "this_node_reach"
Now, I will see the dimensions of my variables :
Browse[1]> i
[1] 1
Browse[1]> j
[1] 21
Browse[1]> alter
[1] 22
Browse[1]> dim(reach_mat)
[1] 21 21
You can see that alter is out of bounds (22 > 21) in the line:
reach_mat[i, alter] = 1
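To avoid such errors, I personally do the following:
Use the apply family of functions (lapply, sapply, ...) where possible. They are safer than for loops.
Use seq_along(x) rather than 1:length(x), because 1:0 yields c(1, 0) instead of an empty sequence.
Look for a vectorized solution where you can, to avoid mat[i, j] index access.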
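The last two points can be sketched on toy data. The list reachable below is an assumption made up for illustration (it is not the Krackhardt data); it holds, for each of four nodes, the 1-based indices of the nodes it can reach. Looping with seq_along and assigning a whole row at once removes the inner loop and the manual index arithmetic.

```r
# Toy input (assumed): reachable[[i]] holds the 1-based node indices reachable from node i.
reachable <- list(c(1, 2), c(2, 3, 4), integer(0), 4)

n <- length(reachable)
reach_mat <- matrix(0, nrow = n, ncol = n)

for (i in seq_along(reachable)) {
  # Vectorized column index: sets every reachable column of row i in one step.
  # An empty index vector simply leaves the row unchanged.
  reach_mat[i, reachable[[i]]] <- 1
}

reach_mat
```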
EDIT vectorize the solution
For example, here I see that you don't use the fact that set.vertex.attribute is vectorized.
You can replace:
# Set vertex attributes
for (i in V(krack_full)) {
for (j in names(attributes)) {
krack_full <- set.vertex.attribute(krack_full, j, index=i, attributes[i+1,j])
}
}
by this:
## set.vertex.attribute is vectorized!
## no need to loop over vertex!
for (attr in names(attributes))
  krack_full <- set.vertex.attribute(krack_full,
                                     attr, value = attributes[,attr])
It just means that either alter > ncol( reach_mat ) or i > nrow( reach_mat ), in other words, your indices exceed the array boundary (i is greater than the number of rows, or alter is greater than the number of columns).
Just run the above tests to see what is happening and when.
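A small sketch of such a test, using a 3 x 3 matrix and a hypothetical helper in_bounds (neither is from the question): the check returns FALSE for exactly the kind of index that triggers the error.

```r
m <- matrix(0, nrow = 3, ncol = 3)

# Hypothetical helper: TRUE only if [i, j] is a valid cell of `mat`.
in_bounds <- function(mat, i, j) {
  i >= 1 && i <= nrow(mat) && j >= 1 && j <= ncol(mat)
}

ok_inside  <- in_bounds(m, 2, 3)   # column 3 exists
ok_outside <- in_bounds(m, 2, 4)   # column 4 does not

# Assigning to a cell beyond the dimensions raises the error;
# tryCatch captures the message instead of stopping the script.
msg <- tryCatch({
  m[2, 4] <- 1
  "no error"
}, error = function(e) conditionMessage(e))
msg
```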
Only an addition to the above responses: A possibility in such cases is that you are calling an object, that for some reason is not available to your query. For example you may subset by row names or column names, and you will receive this error message when your requested row or column is not part of the data matrix or data frame anymore.
Solution: As a short version of the responses above: you need to find the last working row name or column name, and the next called object should be the one that could not be found.
If you run parallel codes like "foreach", then you need to convert your code to a for loop to be able to troubleshoot it.
If this helps anybody, I encountered this while using purrr::map() with a function I wrote which was something like this:
find_nearby_shops <- function(base_account) {
states_table %>%
filter(state == base_account$state) %>%
left_join(target_locations, by = c('border_states' = 'state')) %>%
mutate(x_latitude = base_account$latitude,
x_longitude = base_account$longitude) %>%
mutate(dist_miles = geosphere::distHaversine(p1 = cbind(longitude, latitude),
p2 = cbind(x_longitude, x_latitude))/1609.344)
}
nearby_shop_numbers <- base_locations %>%
split(f = base_locations$id) %>%
purrr::map_df(find_nearby_shops)
I would get this error sometimes with samples, but most times I wouldn't. The root of the problem is that some of the states in the base_locations table (PR) did not exist in the states_table, so essentially I had filtered out everything, and passed an empty table on to mutate. The moral of the story is that you may have a data issue and not (just) a code problem (so you may need to clean your data.)
Thanks to agstudy and zx8754 for their answers above, which helped with the debugging.
I sometimes encounter the same issue. I can only answer your second bullet, because I am not as expert in R as I am with other languages. I have found that the standard for loop has some unexpected results. Say x = 0
for (i in 1:x) {
print(i)
}
The output is
[1] 1
[1] 0
Whereas with python, for example
for i in range(x):
    print(i)
does nothing. The loop is not entered.
I expected that if x = 0 that in R, the loop would not be entered. However, 1:0 is a valid range of numbers. I have not yet found a good workaround besides having an if statement wrapping the for loop
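One base R alternative to the wrapping if statement is seq_len(x), which returns an empty sequence when x is 0. The counters below are only there to make the difference visible.

```r
x <- 0

# 1:x counts down from 1 to 0, so the body runs twice.
runs_colon <- 0
for (i in 1:x) {
  runs_colon <- runs_colon + 1
}

# seq_len(x) is empty for x = 0, so the body is never entered.
runs_seq_len <- 0
for (i in seq_len(x)) {
  runs_seq_len <- runs_seq_len + 1
}

c(colon = runs_colon, seq_len = runs_seq_len)
```

For looping over the elements of an existing vector v, seq_along(v) gives the same protection.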
This code comes from Stanford's free SNA tutorial,
which states:
# Reachability can only be computed on one vertex at a time. To
# get graph-wide statistics, change the value of "vertex"
# manually or write a for loop. (Remember that, unlike R objects,
# igraph objects are numbered from 0.)
OK, so whenever you use igraph, the first row/column is 0 rather than 1, but matrices start at 1. Thus, for any calculation in igraph you would need x - 1, as shown at
this_node_reach <- subcomponent(g, (i - 1), mode = m)
but in the alter calculation there is a typo:
alter = this_node_reach[j] + 1
Delete the + 1 and it will work.
What did it for me was going back in the code and check for errors or uncertain changes and focus on need-to-have over nice-to-have.
I am new to R. I want to define an R function, find_element, that takes a list and a value of any type as its inputs and returns the value of the element in the input list that matches the value. Thanks for your help.
find_element <- function(arr, val){
count = 0
for(i in arr){
if (i == val){
print(count)
} else
count = count + 1
print ("No Match")
}
}
e.g.
arr <- 1:10
find_element(arr, 10)
# 10
find_element(arr, 12)
# NULL
Just for educational purposes, please, try (although this is not recommended practice in R!):
find_element <- function(arr, val) {
count = 1
for (i in arr) {
if (i == val) {
return(count)
} else
count = count + 1
}
return("No Match")
}
This will yield
arr <- 1:10
find_element(arr, 10)
#[1] 10
find_element(arr, 12)
#[1] "No Match"
Please, note
In R, elements of vectors, etc are numbered starting with 1
You have to use return instead of print to indicate the return value of a function (well, I know there's a short cut - but it's for the purpose of education, here)
The final return must come after the for loop.
Built-in function
Also for educational purposes, please, note that Sotos already has shown the R way in his comment:
which(arr == 10)
#[1] 10
which(arr == 12)
#integer(0)
In R, it's almost always better to use the well-documented built-in functions or those from packages. And, yes, try to avoid for loops in R.
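As a sketch of that advice, find_element could be written around the built-in match(), which returns the position of the first match or NA. The "No Match" return value is kept from the loop version above; returning NA directly would be an equally reasonable choice.

```r
# Position of the first element of `arr` equal to `val`, or "No Match".
find_element <- function(arr, val) {
  pos <- match(val, arr)   # NA when val does not occur in arr
  if (is.na(pos)) "No Match" else pos
}

arr <- 1:10
find_element(arr, 10)
find_element(arr, 12)
```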
Learning R online
As pointed out in the (now deleted) answer of engAnt, there are several resources for learning R; https://www.rstudio.com/online-learning/#R lists a number of them.
I'm trying to improve the speed of my code, which is trying to optimise a value using 3 variables which have large ranges. The most likely output uses values in the middle of the ranges, so it is wasting time starting from the lowest possible value of each variable. I want to start from the middle value and iterate out! The actual problem has thousands of lines with numbers from 150-650. C,H and O limits will be defined somewhat based on the starting number, but will always be more likely at a central value in the defined range. Is there a way to define the for loop to work outwards like I want? The only, quite shabby, way I can think of is to simply redefine the value within the loop from a vector (e.g. 1=20, 2=21, 3=19, etc). See current code below:
set_error<-2.5
ct<-c(325.00214,325.00952,325.02004,325.02762,325.03535,325.03831,325.04588, 325.05641,325.06402,325.06766,325.07167,325.07454,325.10396)
FormFun<-function(x){
for(C in 1:40){
for(H in 1:80){
for(O in 1:40){
test_mass=C*12+H*1.007825+O*15.9949146-1.0072765
error<-1000000*abs(test_mass-x)/x
if(error<set_error){
result<-paste("C",C,"H",H,"O",O,sep ="")
return(result)
break;break;break;break
}
}
}
}
}
old_t <- Sys.time()
ct2<-lapply(ct,FormFun)
new_t <- Sys.time() - old_t # calculate difference
print(new_t)
Use vectorization and create a closure:
FormFun1_fac <- function(gr) {
  force(gr)  # evaluate the argument now; the closure keeps its own copy of the grid
  function(x, set_error){
    test_mass <- with(gr, C*12+H*1.007825+O*15.9949146-1.0072765)
    error <- 1000000 * abs(test_mass - x) / x
    ind <- which(error < set_error)[1]
    if (is.na(ind)) return(NULL)
    paste0("C", gr[ind, "C"],"H", gr[ind, "H"],"O", gr[ind, "O"])
  }
}
FormFun1 <- FormFun1_fac(expand.grid(C = 1:40, H = 1:80, O = 1:40))
ct21 <- lapply(ct, FormFun1, set_error = set_error)
all.equal(ct2, ct21)
#[1] TRUE
This saves a grid of all combinations of C, H, O in the function environment and calculates the error for all combinations (which is fast in vectorized code). The first combination that passes the test is returned.
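The question also asked how to start from the middle of each range and work outwards. One possible sketch, using a hypothetical helper center_out that is not part of the answer above, orders each range by distance from its midpoint. The result can be used as the sequence of a for loop or to build the grid.

```r
# Hypothetical helper: integers lo..hi ordered by distance from the midpoint.
# Ties keep their original (ascending) order.
center_out <- function(lo, hi) {
  v <- lo:hi
  v[order(abs(v - (lo + hi) / 2))]
}

center_out(1, 5)

# The same ordering applied to a small grid (ranges shortened for the demo).
gr <- expand.grid(C = center_out(1, 4),
                  H = center_out(1, 6),
                  O = center_out(1, 4))
head(gr, 3)
```

Note that expand.grid varies its first column fastest, so the rows are only approximately ordered "most central first". Sorting the grid by a combined distance measure would be a further refinement.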
I'm writing code to get data from UN Comtrade, a UN database. Because the database has a usage limit of 100 queries per hour, I need to pause between batches of requests.
I want to write the code with tryCatch so that it will:
Automatically pause the program every time the error for the maximum limit appears
Rerun the current values of i, j and k if a connection error occurs
My current code still works, but I want to learn how to use tryCatch too.
Also, is there a way to get rid of the for loops? Can the apply family of functions be used here?
Thanks, guys.
n=0
a<-c()
for (i in (1996:2014)) {
for (j in c("0301","0302","0303","0304","0305","0306","0307","0308")) {
for (k in c("704","116","360","418","458","104","608","702","764")) {
s2<-paste(i,j,k,sep="")
a<-c(a,s2)
print (s2)
n<-n+1
if(n<=100) {
s1 <- get.Comtrade(r=k, ps=i, rg="2", cc=j, fmt="csv",px="H0")
Sys.sleep (1)
s1<-do.call(rbind.data.frame,s1)
library(foreign)
write.dta(s1,file=paste("D:/unTrade/",s2,".dta"))
}
else {
print(n)
print(s2)
print("reset here")
n=0
Sys.sleep(3610)
}
}
}
}
I can't really help you with tryCatch(); I don't have the experience myself.
Regarding the for loops, this is one solution (although I think in these cases the for-loops are not that evil; vectorization really counts in all kinds of matrix operations etc).
dat <- expand.grid(i = 1996:1999, j = c("0301","0302","0303","0304","0305","0306","0307","0308"), k = c("704","116","360","418","458","104","608","702","764"))
library(dplyr)
dat %>% group_by(i, j, k) %>%
do({
cat('s1 <- get.Comtrade(r=', .$k, ', ps=', .$i, ', cc=', .$j, ', rg=\"2\", fmt=\"csv\",px=\"H0\")\n')
flush.console()
# return(s1)
})
From your own code s1 (also) appears to be a data.frame, so in this case, the dplyr do() nicely glues all these data frames together.
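For the tryCatch part of the question, a minimal retry sketch might look like the following. The names with_retry, fetch and flaky are hypothetical; fetch stands in for a call such as get.Comtrade(...), whose actual error conditions are not known here, so the demo uses a function that fails twice on purpose.

```r
# Hypothetical wrapper: call fetch() up to max_tries times, waiting `wait`
# seconds after each failure, and return the first successful result.
with_retry <- function(fetch, max_tries = 3, wait = 0) {
  for (attempt in seq_len(max_tries)) {
    out <- tryCatch(fetch(), error = function(e) e)  # return the error object instead of stopping
    if (!inherits(out, "error")) {
      return(out)
    }
    message("Attempt ", attempt, " failed: ", conditionMessage(out))
    Sys.sleep(wait)
  }
  stop("all ", max_tries, " attempts failed")
}

# Demo stand-in for a download: fails twice, then succeeds.
calls <- 0
flaky <- function() {
  calls <<- calls + 1
  if (calls < 3) stop("connection error")
  "data"
}

fetched <- with_retry(flaky)
fetched
```

For the hourly limit, wait could be set to a long pause (the question uses 3610 seconds), under the assumption that the limit shows up as an R error rather than as a normal return value.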
HTH
I keep getting this error in my For loop:
Error in FUN(X[[i]], ...) : no loop for break/next, jumping to top level
autoAnal <- function(x){
if(!is.numeric(x)){
next
}
m <- median(x, na.rm = T)
a <- mean(x, na.rm = T)
s <- sd(x, na.rm = T)
q <- quantile(x, na.rm = T)
q3 <- q[4]
q1 <- q[2]
outhigh <- (1.5 * q3) + IQR(x, na.rm = T)
outlow <- (1.5 * q1) - IQR(x, na.rm = T)
data.table(Median = m, Average = a, StDev = s,
Outhigh = outhigh, Outlow = outlow)
}
Channel Data
Channel june july december
ATM 666 783 333
VISA 893 321 321
DEBIT 931 134 123
MERCHANT 913 111 134
The first variable, "Channel", is not numeric, so I want R to skip it and go to the next variable. However, I get the error above. I can run the function successfully on a numeric vector, but it doesn't work when I use a data frame or data.table.
When I run this code I get the error:
test <- lapply(channnel, autoAnal)
What am I doing wrong with the 'next' statement in my for loop?
Thank you.
In order to understand why next doesn't work, we have to break down the two parts of your code that you've attempted. To do this, it makes more sense to work backwards and examine how lapply works. Let's start with a basic variable called i and set it equal to the numbers 1 through 10. Once we have that, we'll use lapply to see how the function handles an input vector.
i <- 1:10
tmp <- lapply(i, as.character)
If you run that code and look at tmp, you'll see that we get a list with 10 elements, where each element in the list is a single number stored as a string. What this should show you is that lapply works as a loop going through each element of an object that you've passed to it and feeding that into whatever function you've chosen. So let's use that to look at what your code is doing.
test <- lapply(channnel, autoAnal)
This is going to take the first element of channnel and pass it into your function autoAnal. At this point, the code switches from the loop that lapply is running and steps into a single function call. The first thing your function does is checks if the element that has been passed into the function is numeric or not. If it's not, you use next to tell R to skip that variable.
However, next only works in loops, such as while and for. So let's go back to our i variable. Let's say we wanted to loop over i and print a message if the number is even, otherwise, skip to the next number. We can do that like this:
for (i in 1:10) {
if (i %% 2 == 0) {
print("i is even!")
} else {
next
}
}
In this instance, using next doesn't throw an error because it's used within a for loop. That's why in my comment I mentioned that there is no for loop, at least not where you think there is. My intention behind saying this was to point out that, yes, lapply is a loop, but not for the sake of what you're trying to do with next.
So, what if we just tried to use next with an if statement?
i <- 5
if (i %% 2 == 0) {
print("i is even")
} else {
next
}
This will throw an error because your code isn't a loop. There's just one iteration that it's going to go through, which involves checking if i is even or not, and then moving on. Much in the same way that next doesn't work here, it's not working in your code because the if statement is just checking if the element passed to it is numeric. There's nothing for it to "next" to, if that makes sense. For this reason, next is only used with loops.
In my opinion, next is never even really needed. You can always just use if/else statements to run code if needed or otherwise just ignore the variable. As an example, let's rewrite your code to take advantage of if/else and get rid of our next call.
autoAnal <- function(x){
  if(is.numeric(x)){
    m <- median(x, na.rm = TRUE)
    a <- mean(x, na.rm = TRUE)
    s <- sd(x, na.rm = TRUE)
    q <- quantile(x, na.rm = TRUE)
    q3 <- q[4]
    q1 <- q[2]
    outhigh <- (1.5 * q3) + IQR(x, na.rm = TRUE)
    outlow <- (1.5 * q1) - IQR(x, na.rm = TRUE)
    data.table(Median = m, Average = a, StDev = s,
               Outhigh = outhigh, Outlow = outlow)
  } else {
    print("Skipping this element")
  }
}
By using if/else, we tell the computer to only perform those calculations on variables if they're numeric. Otherwise, print a message saying we're skipping a variable. In normal code, I'd advise just dropping the else statement altogether. An if statement whose condition is FALSE and that has no else branch returns NULL invisibly, so the function skips all of the code for the first variable, since it's not numeric, and lapply stores NULL for it. In the end, you end up with an error-free function that only operates on numeric data.
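Another option, sketched here with base R only, is to drop the non-numeric columns before calling lapply, so the function never sees them. The data frame below retypes the sample table from the question, and summarise_col is a hypothetical, shortened stand-in for autoAnal.

```r
# Sample data retyped from the question.
channel <- data.frame(
  Channel  = c("ATM", "VISA", "DEBIT", "MERCHANT"),
  june     = c(666, 893, 931, 913),
  july     = c(783, 321, 134, 111),
  december = c(333, 321, 123, 134)
)

# Hypothetical, shortened stand-in for autoAnal.
summarise_col <- function(x) {
  c(Median  = median(x, na.rm = TRUE),
    Average = mean(x, na.rm = TRUE),
    StDev   = sd(x, na.rm = TRUE))
}

# Keep only the numeric columns, then summarise each one.
numeric_cols <- Filter(is.numeric, channel)
stats <- lapply(numeric_cols, summarise_col)
names(stats)
```

With this approach the result list contains entries only for june, july and december, with no NULL placeholder for Channel.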
Hopefully that helps illustrate why next doesn't work in your current context.