When working with R, I frequently get the error message "subscript out of bounds". For example:
# Load necessary libraries and data
library(igraph)
library(NetData)
data(kracknets, package = "NetData")
# Reduce dataset to nonzero edges
krack_full_nonzero_edges <- subset(krack_full_data_frame, (advice_tie > 0 | friendship_tie > 0 | reports_to_tie > 0))
# Convert the edge list to a graph with graph.data.frame
krack_full <- graph.data.frame(krack_full_nonzero_edges)
# Set vertex attributes
for (i in V(krack_full)) {
for (j in names(attributes)) {
krack_full <- set.vertex.attribute(krack_full, j, index=i, attributes[i+1,j])
}
}
# Calculate reachability for each vertex
reachability <- function(g, m) {
reach_mat = matrix(nrow = vcount(g),
ncol = vcount(g))
for (i in 1:vcount(g)) {
reach_mat[i,] = 0
this_node_reach <- subcomponent(g, (i - 1), mode = m)
for (j in 1:(length(this_node_reach))) {
alter = this_node_reach[j] + 1
reach_mat[i, alter] = 1
}
}
return(reach_mat)
}
reach_full_in <- reachability(krack_full, 'in')
reach_full_in
This generates the following error: Error in reach_mat[i, alter] = 1 : subscript out of bounds
However, my question is not about this particular piece of code (even though it would be helpful to solve that too); it is more general:
What is the definition of a subscript-out-of-bounds error? What causes it?
Are there any generic ways of approaching this kind of error?
This error occurs when you try to access an array (matrix, list, ...) outside its boundaries.
I will show you how you can debug such errors.
I set options(error = recover), so that R offers to browse the call frames when an error occurs.
I run reach_full_in <- reachability(krack_full, 'in')
I get:
reach_full_in <- reachability(krack_full, 'in')
Error in reach_mat[i, alter] = 1 : subscript out of bounds
Enter a frame number, or 0 to exit
1: reachability(krack_full, "in")
I enter 1 and get:
Called from: top level
I type ls() to see my current variables:
[1] "*tmp*" "alter" "g"
[4] "i" "j" "m"
[7] "reach_mat" "this_node_reach"
Now I check the values of my variables and the dimensions of reach_mat:
Browse[1]> i
[1] 1
Browse[1]> j
[1] 21
Browse[1]> alter
[1] 22
Browse[1]> dim(reach_mat)
[1] 21 21
You can see that alter is out of bounds (22 > 21) in the line:
reach_mat[i, alter] = 1
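To avoid such errors, I personally do the following:
Use the apply family of functions (sapply, lapply, ...) where possible; they are safer than explicit for loops.
Use seq_along() (or seq_len()) rather than 1:n, because 1:n gives c(1, 0) when n is 0.
Think of a vectorized solution if you can, to avoid mat[i, j] index access altogether.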
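A minimal sketch of the 1:n pitfall mentioned in the list, using a hypothetical n of 0: 1:0 counts down to c(1, 0), so the loop body runs twice, whereas seq_len(0) is empty and the loop is skipped. seq_along(x) behaves the same way for an empty vector x.

```r
n <- 0

# 1:n evaluates to c(1, 0) when n is 0, so this body runs twice
with_colon <- integer(0)
for (i in 1:n) with_colon <- c(with_colon, i)
print(with_colon)   # [1] 1 0

# seq_len(n) is integer(0) when n is 0, so this body never runs
with_seq_len <- integer(0)
for (i in seq_len(n)) with_seq_len <- c(with_seq_len, i)
print(with_seq_len) # integer(0)
```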
EDIT: vectorize the solution
For example, here you don't take advantage of the fact that set.vertex.attribute is vectorized.
You can replace:
# Set vertex attributes
for (i in V(krack_full)) {
for (j in names(attributes)) {
krack_full <- set.vertex.attribute(krack_full, j, index=i, attributes[i+1,j])
}
}
by this:
## set.vertex.attribute is vectorized!
## no need to loop over vertices!
## (plain <- is enough here; <<- is not needed at the top level)
for (attr in names(attributes)) {
  krack_full <- set.vertex.attribute(krack_full, attr,
                                     value = attributes[, attr])
}
It just means that either alter > ncol(reach_mat) or i > nrow(reach_mat); in other words, your indices exceed the array boundary (i is greater than the number of rows, or alter is greater than the number of columns).
Check these conditions in your loop to see which index fails and when.
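These conditions can be checked explicitly before the assignment. A sketch with a hypothetical check_bounds() helper (not part of base R) that stops with a message naming the offending index, applied to the values seen in the browser session above (a 21 x 21 matrix and alter = 22):

```r
# Hypothetical helper: fail early with a message naming the offending index
check_bounds <- function(m, i, j) {
  if (i < 1 || i > nrow(m)) {
    stop("row index ", i, " is outside 1..", nrow(m))
  }
  if (j < 1 || j > ncol(m)) {
    stop("column index ", j, " is outside 1..", ncol(m))
  }
  invisible(TRUE)
}

# Reproduce the situation from the browser session: 21 x 21 matrix, alter = 22
reach_mat <- matrix(0, nrow = 21, ncol = 21)
msg <- tryCatch(check_bounds(reach_mat, 1, 22),
                error = function(e) conditionMessage(e))
print(msg)  # "column index 22 is outside 1..21"
```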
Just an addition to the responses above: another possibility is that you are requesting an object that, for some reason, is not available to your query. For example, you may subset by row names or column names, and you will receive this error message when the requested row or column is no longer part of the data matrix or data frame.
Solution: in short, as the responses above describe, find the last row or column name that still works; the next one requested is the one that could not be found.
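A small illustration with a hypothetical named matrix: asking for a row name that is not present gives exactly this error, and checking with %in% rownames() first avoids it. (Data frames report a missing column selected with [, ] as "undefined columns selected" instead.)

```r
# Toy matrix with row and column names (hypothetical data)
m <- matrix(1:4, nrow = 2, dimnames = list(c("a", "b"), c("x", "y")))

m["a", "x"]   # 1

# Row "c" does not exist, so this fails with "subscript out of bounds"
msg <- tryCatch(m["c", "x"], error = function(e) conditionMessage(e))
print(msg)

# Check that the name is still present before subsetting
if ("c" %in% rownames(m)) m["c", "x"] else message("row 'c' not found")
```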
If you run parallel code (e.g. with foreach), you need to convert it to an ordinary for loop to be able to troubleshoot it.
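A minimal sketch of that idea, with hypothetical inputs: running the body as a plain for loop and catching the error per iteration shows which iteration fails (here the third, whose element has no position 3). With foreach itself, swapping %dopar% for the sequential %do% is another way to bring the loop back into the current session.

```r
# Hypothetical inputs standing in for the chunks a parallel loop would process
inputs <- list(1:3, 4:6, integer(0))
results <- vector("list", length(inputs))
failed_at <- NA_integer_

for (k in seq_along(inputs)) {
  # Catch the error instead of letting it abort, so we know which k failed
  out <- tryCatch(inputs[[k]][[3]], error = function(e) e)
  if (inherits(out, "error")) {
    failed_at <- k
    message("iteration ", k, " failed: ", conditionMessage(out))
    break
  }
  results[[k]] <- out
}

failed_at  # 3
```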
In case this helps anybody: I encountered this while using purrr::map() with a function I wrote, which was something like this:
library(dplyr)  # for %>%, filter(), left_join(), mutate()

find_nearby_shops <- function(base_account) {
  states_table %>%
    filter(state == base_account$state) %>%
    left_join(target_locations, by = c('border_states' = 'state')) %>%
    mutate(x_latitude = base_account$latitude,
           x_longitude = base_account$longitude) %>%
    mutate(dist_miles = geosphere::distHaversine(p1 = cbind(longitude, latitude),
                                                 p2 = cbind(x_longitude, x_latitude)) / 1609.344)
}
nearby_shop_numbers <- base_locations %>%
  split(f = base_locations$id) %>%
  purrr::map_df(find_nearby_shops)
I would get this error with some samples, but most times I wouldn't. The root of the problem was that some of the states in the base_locations table (PR) did not exist in states_table, so filter() removed every row and an empty table was passed on to mutate(). The moral of the story is that you may have a data issue and not (just) a code problem, so you may need to clean your data.
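One way to catch this kind of data problem before mapping is to list the keys that have no match. A sketch with small hypothetical tables standing in for base_locations and states_table (only the state column is modelled):

```r
# Hypothetical toy versions of the tables from the answer above
states_table   <- data.frame(state = c("NY", "NJ", "CT"), stringsAsFactors = FALSE)
base_locations <- data.frame(id = 1:3, state = c("NY", "PR", "CT"),
                             stringsAsFactors = FALSE)

# States in base_locations with no match in states_table; for these rows
# filter(state == ...) would return an empty table
missing_states <- setdiff(unique(base_locations$state), states_table$state)
print(missing_states)  # "PR"

# Keep only locations whose state can be matched before mapping over them
clean_locations <- base_locations[base_locations$state %in% states_table$state, ]
nrow(clean_locations)  # 2
```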
Thanks to agstudy's and zx8754's answers above for helping with the debugging.
I sometimes encounter the same issue. I can only answer your second bullet, because I am not as expert in R as I am in other languages. I have found that the standard for loop can give unexpected results. Say x = 0:
for (i in 1:x) {
print(i)
}
The output is
[1] 1
[1] 0
Whereas in Python, for example,
for i in range(x):
    print(i)
does nothing. The loop is not entered.
I expected that with x = 0 the loop in R would not be entered. However, 1:0 is a valid sequence: it counts down, giving c(1, 0). I have not yet found a good workaround besides wrapping the for loop in an if statement.
This code came from Stanford's free SNA (social network analysis) tutorial, which states:
# Reachability can only be computed on one vertex at a time. To
# get graph-wide statistics, change the value of "vertex"
# manually or write a for loop. (Remember that, unlike R objects,
# igraph objects are numbered from 0.)
OK, so in igraph (at least in the version the tutorial used) vertices are numbered from 0 rather than 1, whereas R matrices start at 1. Thus for any igraph call you would need i - 1, as shown in:
this_node_reach <- subcomponent(g, (i - 1), mode = m)
but the alter calculation contains a typo:
alter = this_node_reach[j] + 1
Delete the + 1 and it will work.
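The tutorial's note about 0-based numbering applies to older igraph releases; starting with igraph 0.6, vertex ids in the R interface begin at 1. Assuming a newer igraph, a minimal sketch of the same function drops both the i - 1 and the + 1 and fills each row with one vectorized assignment (the small test graph is hypothetical):

```r
library(igraph)

# Sketch assuming igraph >= 0.6, where R vertex ids start at 1 (not 0)
reachability <- function(g, m) {
  n <- vcount(g)
  reach_mat <- matrix(0, nrow = n, ncol = n)
  for (i in seq_len(n)) {
    # ids returned by subcomponent() already match the matrix columns
    this_node_reach <- subcomponent(g, i, mode = m)
    reach_mat[i, as.integer(this_node_reach)] <- 1
  }
  reach_mat
}

# Small directed chain 1 -> 2 -> 3 (hypothetical test graph)
g <- make_graph(c(1, 2, 2, 3), directed = TRUE)
reachability(g, "in")
```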
What worked for me was going back through the code, checking for errors or uncertain changes, and focusing on need-to-have over nice-to-have.
In an R script, I use a dummy variable in a for loop.
The variable has no purpose itself, so I don't need it recorded at all.
For instance:
library(Hmisc)  # assumption: label() and label<- come from Hmisc
database = read.csv("data/somefile.csv")
for (i in 1:ncol(database)) {
  name <- names(database)[i]
  if (name %in% some_vector) {
    label(database[, i]) <- some_function(database$somecolumn)
  }
}
In RStudio, the "Global Environment" tab keeps track of the variables i and name (showing the last value each had), although they serve no purpose at all.
Is there an elegant way to declare these variables so they are not tracked in the global environment?
Use local for all your workspace hygiene needs.
foo <- local({
x <- 0
for(i in 1:nrow(mtcars))
x <- x + mtcars$mpg[i]
x
})
foo now contains the result of the calculation, and the temporary variables i and x are discarded.
To hide objects from RStudio's object explorer, you can prefix their names with a dot (.), like
.x = 2
Downsides. This still creates .x and keeps it in memory, where it might take up space or accidentally be used again after you've forgotten about it. It is also skipped by the standard "clear workspace" command rm(list = ls()). See the all.names argument in ?ls for a way of handling this.
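A short sketch of the all.names argument of ls(). It uses a separate environment e (an illustrative choice, not from the answer) so the example does not clear your workspace; the same calls work on the global environment.

```r
# Separate environment so the example does not touch your workspace
e <- new.env()
assign(".x", 2, envir = e)
assign("y", 3, envir = e)

visible   <- ls(e)                    # "y": dot-prefixed names are hidden
all_names <- ls(e, all.names = TRUE)  # ".x" "y"

# rm(list = ls()) would leave .x behind; all.names = TRUE removes it too
rm(list = ls(e, all.names = TRUE), envir = e)
remaining <- ls(e, all.names = TRUE)  # character(0)
```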
Aside. Generally, I would not create any variables like this; instead I would wrap any operation involving temporary objects in a function, as @Aurèle suggested, and not lean too heavily on what RStudio's object browser shows me.
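Wrapping the loop in a function works like local(): the loop variables live only in the function's own environment. A sketch with a hypothetical sum_mpg() (in practice sum(mtcars$mpg) does this directly):

```r
# Hypothetical helper: i and total exist only while the function runs
sum_mpg <- function(df) {
  total <- 0
  for (i in seq_len(nrow(df))) {
    total <- total + df$mpg[i]
  }
  total
}

foo <- sum_mpg(mtcars)  # only foo is added to the calling environment
```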
The only case so far where I've used dot-prefixed objects is for interactive use in a function, like:
f = function(x, y, debug.obj = FALSE){
dx = dim(x)
dy = dim(y)
if (!(length(dx) == 2 && length(dy) == 2 && dx[2] == dy[1])){
if (debug.obj){
.debug.f <<- list(dx = dx, dy = dy)
stop("Dims don't match. See .debug.f")
}
stop("Dims don't match.")
}
x %*% y
}
# example usage
f(matrix(1,1,1), matrix(2,2,2), debug.obj = TRUE)
# Error in f(matrix(1, 1, 1), matrix(2, 2, 2), debug.obj = TRUE) :
# Dims don't match. See .debug.f
.debug.f
# $dx
# [1] 1 1
#
# $dy
# [1] 2 2
Even this might be a bad idea, though.
I have run into errors with my for loop. The code is as follows:
#finding IDs with >5% replicate variance
#initialize vectors
LS1repvariance = NULL
anomalylist = NULL
#open for loop iterating from 1 to end of dataset
for (i in 1:1523){
#call replicates, which start off as characters
charrep1 = widesubdat[i,2]
charrep2 = widesubdat[i,11]
#convert to numeric
rep1 = as.numeric(charrep1)
rep2 = as.numeric(charrep2)
#calculation
repvariance = (rep1-rep2)/((rep1+rep2)/2)*100
#if loop for anomalous replicates
if (abs(repvariance)>=5)
anomalylist[i]=widesubdat[i,0]
}
The error I get says:
Error in if (abs(repvariance) >= 5) anomalylist[i] = widesubdat[i, 0]
: missing value where TRUE/FALSE needed
I think the error is in the iteration, because it defines i as 336L and does not read charrep correctly, but I have no idea why. I have written for loops in Python but never in R, and all of the for loop help pages seem to show the same structure. All of the lines test out okay when I run them outside the for loop.
I've read that if statements also require curly brackets, but IDLE said unexpected "{" when I used them.
You could also drop the loop entirely:
rep1 <- as.numeric(widesubdat[, 2])   # the replicates are stored as character
rep2 <- as.numeric(widesubdat[, 11])
pick <- abs(200 * (rep1 - rep2) / (rep1 + rep2)) >= 5
anomalylist <- widesubdat[, 1]  # column 1, not 0: R indices start at 1
anomalylist[!pick] <- NA
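The error in the question, "missing value where TRUE/FALSE needed", is what if() reports when its condition is NA, and as.numeric() turns any non-numeric text into NA. A sketch with hypothetical replicate values ("n/a" stands in for whatever non-numeric entry sits in the data):

```r
# Hypothetical replicate values; "n/a" stands in for any non-numeric entry
charrep1 <- "12.5"
charrep2 <- "n/a"

rep1 <- as.numeric(charrep1)
rep2 <- suppressWarnings(as.numeric(charrep2))  # non-numeric text becomes NA
repvariance <- (rep1 - rep2) / ((rep1 + rep2) / 2) * 100

# The condition is NA, so if() stops instead of choosing a branch
msg <- tryCatch(if (abs(repvariance) >= 5) "anomaly",
                error = function(e) conditionMessage(e))

# Guard against NA first; && short-circuits, so abs() is only tested when defined
is_anomaly <- !is.na(repvariance) && abs(repvariance) >= 5
```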
I am trying to understand for loops and if statements in R, so I wrote code that returns 1 if the sum of a row is bigger than 3, and 0 otherwise.
Here is the code:
set.seed(2)
x = rnorm(20)
y = 2*x
a = cbind(x,y)
hold = c()
Now comes the if statement:
for (i in nrow(a)) {
if ([i,1]+ [i,2] > 3) hold[i,] == 1
else ([i,1]+ [i,2]) <- hold[i,] == 0
return (cbind(a,hold)
}
I know that combining for and if may not be ideal, but I just want to understand what is going wrong. Please keep the explanation at a beginner level :) Thanks
You've got several issues. @mnel covered a better way to go about doing this; I'll focus on understanding what went wrong in this attempt (but don't do it this way at all, use a vectorized solution).
Line 1
for (i in nrow(a)) {
a has 20 rows. nrow(a) is 20. Thus your code is equivalent to for (i in 20), which means i will only ever be 20.
Fix:
for (i in 1:nrow(a)) {
Line 2
if ([i,1]+ [i,2] > 3) hold[i,] == 1
[i,1] isn't anything, it's the ith row and first column of... nothing. You need to reference your data: a[i,1]
You initialized hold as a vector, c(), so it only has one dimension, not rows and columns. So we want to assign to hold[i], not hold[i,].
== is used for equality testing; = or <- are for assignment. Right now, if the > 3 condition is met, you check whether hold[i,] is equal to 1 (and do nothing with the result).
Fix:
if (a[i,1]+ a[i,2] > 3) hold[i] <- 1
Line 3
else ([i,1]+ [i,2]) <- hold[i,] == 0
The same point about assignment vs. equality testing applies. (Here you used an arrow assignment, but put it in the wrong place, as if you were trying to assign to the else.)
else runs whenever the if condition isn't met, so you don't need to repeat the condition.
Fix:
else hold[i] <- 0
Fixed code together:
for (i in 1:nrow(a)) {
if (a[i,1] + a[i,2] > 3) hold[i] <- 1
else hold[i] <- 0
}
You aren't using curly braces for your if and else expressions. They are not required for single-line expressions (if something, do this one line). They are required for multi-line expressions (if something, do a bunch of stuff), but I think they're a good idea to use anyway. Also, in R it's good practice to put the else on the same line as the closing } of the preceding if (inside a for loop or a function it doesn't matter, but at the top level it does, so it's good to get in the habit of always doing it). I would recommend this reformatted code:
for (i in 1:nrow(a)) {
if (a[i, 1] + a[i, 2] > 3) {
hold[i] <- 1
} else {
hold[i] <- 0
}
}
Using ifelse
ifelse() is a vectorized if-else statement in R. It is appropriate when you want to test a vector of conditions and get a result out for each one. In this case you could use it like this:
hold <- ifelse(a[, 1] + a[, 2] > 3, 1, 0)
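ifelse will take care of the looping for you. If you want the result as a column of your data, add it directly (no need to initialize hold first). Since a was created with cbind(), it is a matrix, so use cbind() rather than $, which works on data frames and lists but not on matrices:
a <- cbind(a, hold = ifelse(a[, 1] + a[, 2] > 3, 1, 0))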
Such operations in R are nicely vectorised.
You haven't included a reference to the dataset you want to index in your calls to [ (e.g. a[i,1]).
using rowSums
h <- rowSums(a) > 3
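A self-contained sketch that rebuilds a from the question and computes hold three ways: the corrected loop (with hold pre-allocated by numeric() instead of c()), ifelse(), and rowSums() converted to 0/1. All three should give the same 0/1 values.

```r
set.seed(2)
x <- rnorm(20)
y <- 2 * x
a <- cbind(x, y)

# Loop version, with hold pre-allocated to one slot per row
hold_loop <- numeric(nrow(a))
for (i in seq_len(nrow(a))) {
  hold_loop[i] <- if (a[i, 1] + a[i, 2] > 3) 1 else 0
}

# Vectorized versions: ifelse() and rowSums() + as.numeric()
hold_ifelse  <- ifelse(a[, 1] + a[, 2] > 3, 1, 0)
hold_rowsums <- as.numeric(rowSums(a) > 3)

a <- cbind(a, hold = hold_ifelse)  # add the result as a matrix column
```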
I am going to assume that you are new to R and trying to learn how the for loop itself works. R has "apply" functions designed for applying an operation to each row or column of a matrix or data frame, but I am not going to talk about these.
You want to do the following on each row of the array.
Sum the elements of the row.
Test that the sum is greater than 3.
Return a value of 1 or 0 representing the result of 2.
For 1, luckily sum is a built-in function. It pays to check out the built-in functions of any programming language, because they save you time. To sum the elements of a row, just use sum(a[row_number,]).
For 2, you are evaluating the logical statement "is x > 3?", where x is the result from 1. The "> 3" comparison returns TRUE or FALSE. A logical expression is like an "if then" statement without the "if then".
> 4>3
[1] TRUE
> 2>3
[1] FALSE
For 3, a true or false value is a data structure called a "logical" value in R. A 1 or 0 value is a data structure called a "numeric" value in R. By converting the "logical" into a "numeric", you can change the TRUE to 1's and FALSE to 0's.
> class(4>3)
[1] "logical"
> as.numeric(4>3)
[1] 1
> class(as.numeric(4>3))
[1] "numeric"
A for loop like this one has a start, an end, a counter, and a body. The counter starts at the first value and steps through each value up to the last, and the body runs once for each value of the counter. Here you start at the first row and go to the last row. Putting all the pieces together looks like this:
for (i in 1:nrow(a)){
hold[i] <- as.numeric(sum(a[i,])>3)
}
If I want to check the existence of a variable, I use
exists("variable")
In a script I am working on, I sometimes get a "subscript out of bounds" error while running, and then my script stops. In an if statement, I would like to check whether a subscript will be out of bounds. If the answer is yes, an alternative piece of the script should be executed; if not, the script should just continue as intended.
I imagine that, for a list, it would look something like this:
if (subscriptOutofBounds(listvariable[[number]]) == TRUE) {
  ## execute this part of the code
} else {
  ## execute this part
}
Does something like that exist in R?
You can compare the length of your list with the index you want to use. As an illustration, say I have a list with 3 elements and want to compare its length with a vector of the numbers 1 to 100.
lol <- list(c(1:10),
c(100:200),
c(3:50))
lol
check_out <- function(x) {
  maxi <- max(x)
  if (maxi > length(lol)) {
    # Execute this part of the code
    print("Yes")
  } else {
    # Execute this part of the code
    print("No")
  }
}
num <- 1:100
check_out(num)
The largest number in num is 100, but the list has only 3 elements (length 3), so index 100 would be out of bounds and the function prints "Yes".
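A more direct check, sketched with a hypothetical helper get_or_default() (not a base R function): compare the index you are about to use with length() of the list before extracting it, and for named elements test membership in names(). Wrapping the extraction in tryCatch() is another option.

```r
listvariable <- list("a", "b", "c")  # hypothetical list

# Hypothetical helper: return lst[[k]] if k is a valid position, else a default
get_or_default <- function(lst, k, default = NULL) {
  if (k >= 1 && k <= length(lst)) lst[[k]] else default
}

get_or_default(listvariable, 2)               # "b"
get_or_default(listvariable, 5, default = NA) # NA

# For named elements, test membership in names() instead
named <- list(alpha = 1, beta = 2)
"gamma" %in% names(named)  # FALSE
```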
Apologies for the long post! I'm new to R and have been working hard to improve my command of the language. I stumbled across this interesting project on modelling football results: http://www1.maths.leeds.ac.uk/~voss/projects/2010-sports/JamesGardner.pdf
I keep running into problems when I run the code to simulate a full season (first mentioned on page 36, with the code in the appendix on page 59):
Games <- function(parameters)
{
teams <- rownames(parameters)
P <- parameters$teams
home <- parameters$home
n <- length(teams)
C <- data.frame()
row <- 1
for (i in 1:n) {
for (j in 1:n) {
if (i != j) {
C[row,1] <- teams[i]
C[row,2] <- teams[j]
C[row,3] <- rpois(1, exp(P[i,]$Attack - P[j,]$Defence + home))
C[row,4] <- rpois(1, exp(P[j,]$Attack - P[i,]$Defence))
row <- row + 1
}
}
}
return(C)
}
Games(TeamParameters)
The error I get is:
Error in `*tmp*`[[j]] : subscript out of bounds
When I attempt a traceback(), this is what I get:
3: `[<-.data.frame`(`*tmp*`, row, 1, value = NULL) at #11
2: `[<-`(`*tmp*`, row, 1, value = NULL) at #11
1: Games(TeamParameters)
I don't really understand what the error means and I would appreciate any help. Once again, apologies for the long post but I'm really interested in this project and would love to learn what the problem is!
Data frame objects are not extendable by row with the [<-.data.frame operation (you would need to use rbind). You should create an object that has sufficient space, either a pre-dimensioned matrix or a data frame. If "C" is an object with 0 rows, then trying to assign to row one will fail. There is also a function named "C", so you might want to give your object a more distinct name. It also seems likely that there are more efficient methods than the double loop, but you haven't described the parameter object very well.
You may notice that the appendix of the paper you cited shows how to pre-dimension a data frame:
teams <- sort(unique(c(games[,1], games[,2])), decreasing = FALSE)
T <- data.frame(Team=teams, ... )
... and the games object was assumed to already have the proper number of rows, with the results of the computations assigned as new column values. The $<- operation succeeds even when the referenced column does not exist yet.
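A minimal sketch of the pre-dimensioning approach, assuming three hypothetical teams and made-up attack, defence, and home values in place of the paper's fitted TeamParameters. It allocates all n * (n - 1) rows up front and then fills them in, following the structure of the question's loop.

```r
set.seed(1)

# Hypothetical inputs standing in for the fitted TeamParameters
teams   <- c("Team A", "Team B", "Team C")
attack  <- c(0.3, 0.2, 0.0)
defence <- c(-0.2, -0.3, 0.1)
home    <- 0.25

n <- length(teams)
n_games <- n * (n - 1)  # every team plays every other team home and away

# Pre-dimension the result so every row index used below already exists
games <- data.frame(home_team  = character(n_games),
                    away_team  = character(n_games),
                    home_goals = integer(n_games),
                    away_goals = integer(n_games),
                    stringsAsFactors = FALSE)

row <- 1
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    if (i != j) {
      games$home_team[row]  <- teams[i]
      games$away_team[row]  <- teams[j]
      games$home_goals[row] <- rpois(1, exp(attack[i] - defence[j] + home))
      games$away_goals[row] <- rpois(1, exp(attack[j] - defence[i]))
      row <- row + 1
    }
  }
}
games
```

Because every row exists before the loop starts, each assignment only overwrites a value instead of trying to grow the data frame.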