Why does for loop not iterate with (i in 1:1523)? - r

I have run into errors with my for loop. The code is as follows:
#finding IDs with >5% replicate variance
#initialize vectors
LS1repvariance = NULL
anomalylist = NULL
#open for loop iterating from 1 to end of dataset
for (i in 1:1523){
#call replicates, which start off as characters
charrep1 = widesubdat[i,2]
charrep2 = widesubdat[i,11]
#convert to numeric
rep1 = as.numeric(charrep1)
rep2 = as.numeric(charrep2)
#calculation
repvariance = (rep1-rep2)/((rep1+rep2)/2)*100
#if loop for anomalous replicates
if (abs(repvariance)>=5)
anomalylist[i]=widesubdat[i,0]
}
The error I get says
Error in if (abs(repvariance) >= 5) anomalylist[i] = widesubdat[i, 0]
: missing value where TRUE/FALSE needed
I think the error is in the iteration because it defines i as 336L, and it does not call charrep correctly, but I have no idea why. I've done for loops in python but never in R, but all of the for loop help pages seem to have the same structure. All of the lines that I can run outside of the for loop test out okay.
I've read that if statements also require curly brackets, but IDLE said unexpected "{" when I used them.

You could also drop the loop
pick <- abs(200*(widesubdat[,2]-widesubdat[,11])/(widesubdat[,2]+widesubdat[,11]))>=5
anomalylist <- widesubdat[,1] # Note the comment above with index 0
anomalylist[!pick] <- NA

Related

Calculating distance using latitude and longitude error [duplicate]

When working with R I frequently get the error message "subscript out of bounds". For example:
# Load necessary libraries and data
library(igraph)
library(NetData)
data(kracknets, package = "NetData")
# Reduce dataset to nonzero edges
krack_full_nonzero_edges <- subset(krack_full_data_frame, (advice_tie > 0 | friendship_tie > 0 | reports_to_tie > 0))
# convert to graph data farme
krack_full <- graph.data.frame(krack_full_nonzero_edges)
# Set vertex attributes
for (i in V(krack_full)) {
for (j in names(attributes)) {
krack_full <- set.vertex.attribute(krack_full, j, index=i, attributes[i+1,j])
}
}
# Calculate reachability for each vertix
reachability <- function(g, m) {
reach_mat = matrix(nrow = vcount(g),
ncol = vcount(g))
for (i in 1:vcount(g)) {
reach_mat[i,] = 0
this_node_reach <- subcomponent(g, (i - 1), mode = m)
for (j in 1:(length(this_node_reach))) {
alter = this_node_reach[j] + 1
reach_mat[i, alter] = 1
}
}
return(reach_mat)
}
reach_full_in <- reachability(krack_full, 'in')
reach_full_in
This generates the following error Error in reach_mat[i, alter] = 1 : subscript out of bounds.
However, my question is not about this particular piece of code (even though it would be helpful to solve that too), but my question is more general:
What is the definition of a subscript-out-of-bounds error? What causes it?
Are there any generic ways of approaching this kind of error?
This is because you try to access an array out of its boundary.
I will show you how you can debug such errors.
I set options(error=recover)
I run reach_full_in <- reachability(krack_full, 'in')
I get :
reach_full_in <- reachability(krack_full, 'in')
Error in reach_mat[i, alter] = 1 : subscript out of bounds
Enter a frame number, or 0 to exit
1: reachability(krack_full, "in")
I enter 1 and I get
Called from: top level
I type ls() to see my current variables
1] "*tmp*" "alter" "g"
"i" "j" "m"
"reach_mat" "this_node_reach"
Now, I will see the dimensions of my variables :
Browse[1]> i
[1] 1
Browse[1]> j
[1] 21
Browse[1]> alter
[1] 22
Browse[1]> dim(reach_mat)
[1] 21 21
You see that alter is out of bounds. 22 > 21 . in the line :
reach_mat[i, alter] = 1
To avoid such error, personally I do this :
Try to use applyxx function. They are safer than for
I use seq_along and not 1:n (1:0)
Try to think in a vectorized solution if you can to avoid mat[i,j] index access.
EDIT vectorize the solution
For example, here I see that you don't use the fact that set.vertex.attribute is vectorized.
You can replace:
# Set vertex attributes
for (i in V(krack_full)) {
for (j in names(attributes)) {
krack_full <- set.vertex.attribute(krack_full, j, index=i, attributes[i+1,j])
}
}
by this:
## set.vertex.attribute is vectorized!
## no need to loop over vertex!
for (attr in names(attributes))
krack_full <<- set.vertex.attribute(krack_full,
attr, value = attributes[,attr])
It just means that either alter > ncol( reach_mat ) or i > nrow( reach_mat ), in other words, your indices exceed the array boundary (i is greater than the number of rows, or alter is greater than the number of columns).
Just run the above tests to see what and when is happening.
Only an addition to the above responses: A possibility in such cases is that you are calling an object, that for some reason is not available to your query. For example you may subset by row names or column names, and you will receive this error message when your requested row or column is not part of the data matrix or data frame anymore.
Solution: As a short version of the responses above: you need to find the last working row name or column name, and the next called object should be the one that could not be found.
If you run parallel codes like "foreach", then you need to convert your code to a for loop to be able to troubleshoot it.
If this helps anybody, I encountered this while using purr::map() with a function I wrote which was something like this:
find_nearby_shops <- function(base_account) {
states_table %>%
filter(state == base_account$state) %>%
left_join(target_locations, by = c('border_states' = 'state')) %>%
mutate(x_latitude = base_account$latitude,
x_longitude = base_account$longitude) %>%
mutate(dist_miles = geosphere::distHaversine(p1 = cbind(longitude, latitude),
p2 = cbind(x_longitude, x_latitude))/1609.344)
}
nearby_shop_numbers <- base_locations %>%
split(f = base_locations$id) %>%
purrr::map_df(find_nearby_shops)
I would get this error sometimes with samples, but most times I wouldn't. The root of the problem is that some of the states in the base_locations table (PR) did not exist in the states_table, so essentially I had filtered out everything, and passed an empty table on to mutate. The moral of the story is that you may have a data issue and not (just) a code problem (so you may need to clean your data.)
Thanks for agstudy and zx8754's answers above for helping with the debug.
I sometimes encounter the same issue. I can only answer your second bullet, because I am not as expert in R as I am with other languages. I have found that the standard for loop has some unexpected results. Say x = 0
for (i in 1:x) {
print(i)
}
The output is
[1] 1
[1] 0
Whereas with python, for example
for i in range(x):
print i
does nothing. The loop is not entered.
I expected that if x = 0 that in R, the loop would not be entered. However, 1:0 is a valid range of numbers. I have not yet found a good workaround besides having an if statement wrapping the for loop
This came from standford's sna free tutorial
and it states that ...
# Reachability can only be computed on one vertex at a time. To
# get graph-wide statistics, change the value of "vertex"
# manually or write a for loop. (Remember that, unlike R objects,
# igraph objects are numbered from 0.)
ok, so when ever using igraph, the first roll/column is 0 other than 1, but matrix starts at 1, thus for any calculation under igraph, you would need x-1, shown at
this_node_reach <- subcomponent(g, (i - 1), mode = m)
but for the alter calculation, there is a typo here
alter = this_node_reach[j] + 1
delete +1 and it will work alright
What did it for me was going back in the code and check for errors or uncertain changes and focus on need-to-have over nice-to-have.

R - tryCatch - use last iteration index to restart the for loop

I already read documentations and several other question about tryCatch(), however, I am not able to figure out the solution to my class of problem.
The task to solve is:
1)There is a for loop that goes from row 1 to nrow of the dataframe.
2)Execute some instruction
3)In the case of error that would stop the program, instead restart the loop cycle from the current iteration.
Example
for (i in 1:50000) {
...execute instructions...
}
What I am looking for is a solution that in the case of error at the iteration 30250, it restart the loop such that
for (i in 30250:50000) {
...execute instructions...
}
The practical example I am working on is the following:
library(RDSTK)
library(jsonlite)
DF <- (id = seq(1:400000), lat = rep(38.929840, 400000), long = rep( -77.062343, 400000)
for (i in 1 : nrow(DF) {
location <- NULL
bo <- 0
while (bo != 10) { #re-try the instruction max 10 times per row
location <- NULL
location <- try(location <- #try to gather the data from internet
coordinates2politics(DF$lat[i], DF$long[i]))
if (class(location) == "try-error") { #if not able to gather the data
Sys.sleep(runif(1,2,7)) #wait a random time before query again
print("reconntecting...")
bo <- bo+1
print(bo)
} else #if there is NO error
break #stop trying on this individual
}
location <- lapply(location, jsonlite::fromJSON)
location <- data.frame(location[[1]]$politics)
DF$start_country[i] <- location$name[1]
DF$start_region[i] <- location$name[2]
Sys.sleep(runif(1,2,7)) #sleep random seconds before starting the new row
}
N.B.: the try() is part of the "...execute instruction..."
What I am looking for is a tryCatch that when a critical error that stops the program happen, it instead restart the for loop from the current index "i".
This program would allow me to automatically iterate over 400000 rows and restart where it interrupted in the case of errors. This means that the program would be able to work totally independently by human.
I hope my question is clear, thank you very much.
This is better addressed using while rather than for:
i <- 1
while(i < 10000){
tryCatch(doStuff()) -> doStuffResults
if(class(doStuffResults) != "try-catch") i <- i + 1
{
Beware though if some cases will always fail doStuff: the loop will never terminate!
If the program will produce a fatal error and need a human restart, then the first line i <- 1 may be replaced (as long as you don't use i elsewhere in your code) with
if(!exists("i")) i <- 1
so that the value from the loop will persist and not be reset to one.

R not printing the right thing in loop

I am trying to print the "result" of using table function, but when I tried to use the code here, I got something very strange:
for (i in 1:4){
print (table(paste("group",i,"$", "BMI_obese",sep=""), paste("group",i,"$","A1.1", sep="")))
}
This is the result in R output:
group1$A1.1
group1$BMI_obese 1
group2$A1.1
group2$BMI_obese 1
group3$A1.1
group3$BMI_obese 1
group4$A1.1
group4$BMI_obese 1
But when I type out the statement without typing inside the loop:
table(group2$BMI_obese, group2$A1.1)
I got what I want:
1 2 3 4 5
0 51 20 9 8 0
1 37 20 15 6 4
Does anyone know which part of my for loop code is not correct or can be modified to fit my purpose of printing the loop table result?
Hi, all but now I have another problem. I am trying to add an inner loop which will take the column name as an argument, because I would like to loop through mulitiple column for each of the group data (i.e. for group1, I would like to have table of BMI_obese vs A1.1, BMI_obese vs A1.2 ... BMI_obese vs A1.15. This is my code, but somehow it is not working, I think it is because it is not recognizing the A1.1, A1.2,... as an column taking from the data group1, group2, group3, group4. But instead it is treated as a string I think. I am not sure how to fix it:
for (i in 2:4) {
for (j in c("A1.1","A1.2"))
{
print(with(get(paste0("group", i)),table(BMI_obese,j)))
}
}
I keep getting this error message:
Error in table(BMI_obese, j) : all arguments must have the same length
Okay, you are trying to construct a variable name using paste and then do a table. You are simply passing the name of the variable to table, not the variable object itself. For this sort of approach you want to use get()
for (i in 1:4) {
with(get(paste0("group", i), table(BMI_obese, A1.1))
}
#example saving as a list (using lapply rather than for loop)
group1 <- data.frame(x=LETTERS[1:10], y=(1:10)[sample(10, replace=TRUE)])
group2 <- data.frame(x=LETTERS[1:10], y=(1:10)[sample(10, replace=TRUE)])
result <- lapply(1:2, function(i) with(get(paste0("group", i)), table(x, y)))
#look at first six rows of each:
head(result[[1]])
head(result[[2]])
#example illustrating fetching objects from a string name
data(mtcars)
head(with(get("mtcars"), table(disp, cyl)))
head(with(get("mtcars"), table(disp, "cyl")))
#Error in table(disp, "cyl") : all arguments must have the same length
head(with(get("mtcars"), table(disp, get("cyl"))))
You could also use a combination of eval and parse like this:
x1 <- c(sample(10, 100, replace = TRUE))
y1 <- c(sample(10, 100, replace = TRUE))
table(eval(parse(text = paste0("x", 1))),
eval(parse(text = paste0("y", 1))))
But I'd also say it is not the nicest practice to access variables that way...
Your types are used wrong. See the difference:
table(group2$BMI_obese, group2$A1.1)
and
table(paste(...),paste(...))
So what type does paste return? Certainly some string.
EDIT:
paste(...) was not meant to be syntactically correct but an abbreviation for paste("group",i,"$", "BMI_obese",sep=""), or whatever you paste together.
paste(...) is returning some string. If you put that result into a table, you get a table of strings (the unexpected result that you got). What you want to do is acessing variables or fields with the name which is returned by your paste(...). Just an an eval to your paste like Daniel said and do it like this.
for (i in 1:4){
print (table(eval(paste("group",i,"$", "BMI_obese",sep="")),eval(paste("group",i,"$","A1.1", sep=""))))
}

Substituting variables in a loop?

I am trying to write a loop in R but I think the nomenclature is not correct as it does not create the new objects, here is a simplified example of what I am trying to do:
for i in (1:8) {
List_i <-List
colsToGrab_i <-grep(predefinedRegex_i, colnames(List_i$table))
List_i$table <- List_i$table[,predefinedRegex_i]
}
I have created 'predefinedRegex'es 1:8 which the grep should use to search
The loop creates an object called "List_i" and then fails to find "predefinedRegex_i".
I have tried putting quotes around the "i" and $ in front of the i , also [i] but these do not work.
Any help much appreciated. Thank you.
#
Using #RyanGrammel's answer below::
#CREATING regular expressions for grabbing sets groups 1 -7 ::::
g_1 <- "DC*"
g_2 <- "BN_._X.*"
g_3 <- "BN_a*"
g_4 <- "BN_b*"
g_5 <- "BN_a_X.*"
g_6 <- "BN_b_X.*"
g_7 <- "BN_._Y.*"
for i in (1:8)
{
assign(x = paste("tableA_", i, sep=""), value = BigList$tableA)
assign(x = paste("Forgrep_", i, sep=""), value = colnames(get(x = paste("tableA_", i, sep=""))))
assign(x = paste("grab_", i, sep=""), value = grep((get(x = paste("g_",i, sep=""))), (get(x = paste("Forgrep_",i, sep="")))))
assign(x = paste("tableA_", i, sep=""), value = BigList$tableA[,get(x = paste("grab_",i, sep=""))])
}
This loop is repeated for each table inside "BigList".
I found I could not extract columnnames from
(get(x = paste("BigList_", i, "$tableA" sep=""))))
or from
(get(x = paste("BigList_", i, "[[2]]" sep=""))))
so it was easier to extract the tables first. I will now write a loop to repack the lists up.
Problem
Your syntax is off: you don't seem to understand how exactly R deals with variable names.
for(i in 1:10) name_i <- 1
The above code doesn't assign name_1, name_2,....,name_10. It assigns "name_i" over and over again
To create a list, you call 'list()', not List
creating a variable List_i in a loop doesn't assign List_1, List_2,...,List_8.
It repeatedly assigns an empty list to the name 'List_i'. Think about it; if R names variables in the way you tried to, it'd be equally likely to name your variables L1st_1, L2st_2...See 'Solution' for some valid R code do something similar
'predefinedRegex_i' isn't interpreted as an attempt to get the variable 'predefinedRegex_1', 'predefinedRegex_2', and so one.
However, get(paste0("predefinedRegex_", i)) is interpreted in this way. Just make sure i actually has a value when using this. See below.
Solution:
In general, use this to dynamically assign variables (List_1, List_2,..)
assign(x = paste0("prefix_", i), value = i)
if i is equal to 199, then this code assigns the variable prefix_199 the value 199.
In general, use this to dynamically get the variables you assigned using the above snippet of code.
get(x = paste0("prefix_", i))
if i is equal to 199, then this code gets the variable prefix_199.
That should solve the crux of your problem; if you need any further help feel free to ask for clarification here, or contact me via my Twitter Feed.

Error in *tmp*[[j]] : subscript out of bounds

Apologies for long post! I'm new to R and have been working hard to improve my command of the language. I stumbled across this interesting project on modelling football results: http://www1.maths.leeds.ac.uk/~voss/projects/2010-sports/JamesGardner.pdf
I keep running into problems when I run the code to Simulate a Full Season (first mentioned page 36, appendix page 59):
Games <- function(parameters)
{
teams <- rownames(parameters)
P <- parameters$teams
home <- parameters$home
n <- length(teams)
C <- data.frame()
row <- 1
for (i in 1:n) {
for (j in 1:n) {
if (i != j) {
C[row,1] <- teams[i]
C[row,2] <- teams[j]
C[row,3] <- rpois(1, exp(P[i,]$Attack - P[j,]$Defence + home))
C[row,4] <- rpois(1, exp(P[j,]$Attack - P[i,]$Defence))
row <- row + 1
}
}
}
return(C)
}
Games(TeamParameters)
The response I get is
Error in `*tmp*`[[j]] : subscript out of bounds
When I attempt a traceback(), this is what I get:
3: `[<-.data.frame`(`*tmp*`, row, 1, value = NULL) at #11
2: `[<-`(`*tmp*`, row, 1, value = NULL) at #11
1: Games(TeamParameters)
I don't really understand what the error means and I would appreciate any help. Once again, apologies for the long post but I'm really interested in this project and would love to learn what the problem is!
The data.frame objects are not extendable by row with the [<-.data.frame operation. (You would need to use rbind.) You should create an object that has sufficient space, either a pre-dimensioned matrix or data.frame. If "C" is an object of 0 rows, then trying to assign to row one will fail. There is a function named "C", so you might want to make its name something more distinct. It also seems likely that there are more efficient methods than the double loop but you haven't describe the parameter object very well.
You may notice that the Appendix of that paper you cited shows how to pre-dimension a dataframe:
teams <- sort(unique(c(games[,1], games[,2])), decreasing = FALSE)
T <- data.frame(Team=teams, ... )
... and the games-object was assumed to already have the proper number of rows and the results of computations were assigning new column values. The $<- operation will succeed if there is no current value for that referenced column.

Resources