How to check whether a subscript will be out of bounds in R?

If I want to check the existence of a variable, I use
exists("variable")
In a script I am working on, I sometimes run into a "subscript out of bounds" error, and then my script stops. In an if statement I would like to check whether a subscript will be out of bounds. If it will, the script should execute an alternative piece of code; if not, it should just continue as intended.
In my imagination, for a list it would look something like:
if (subscriptOutofBounds(listvariable[[number]])) {
  ## execute this part of the code
} else {
  ## execute this part
}
Does something like that exist in R?

You can compare the index you want to use with the length of your list. As an illustration, say I have a list with 3 elements and want to check it against a vector of the numbers 1 to 100.
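lol <- list(c(1:10),
            c(100:200),
            c(3:50))
lol
# Compare the largest requested index with the length of the list
check_out <- function(x, lst = lol) {
  maxi <- max(x)
  if (maxi > length(lst)) {
    # Execute this part of the code: some index is out of bounds
    print("Yes")
  } else {
    # Execute this part of the code: all indices are in bounds
    print("No")
  }
}
num <- 1:100
check_out(num)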
The largest number in num is 100 and your list only has 3 elements (length 3), so indexing the list with it would be out of bounds, and the function prints "Yes".
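To check a single index before using it, as the question asks, a small predicate can test it against length() or names(). The index_in_bounds function below is a hypothetical helper (not part of base R), sketched under the assumption that you index with [[ and a single number or name. Names are checked with %in% because, for a list, a missing name in [[ returns NULL rather than raising an error. The tryCatch() line shows the alternative: attempt the access and fall back when it fails.

```r
# Hypothetical helper: TRUE if x[[i]] refers to an existing element,
# for a single numeric index or a single name.
index_in_bounds <- function(x, i) {
  if (length(i) != 1 || is.na(i)) return(FALSE)
  if (is.character(i)) return(i %in% names(x))
  i >= 1 && i <= length(x)
}

lol <- list(1:10, 100:200, 3:50)

if (index_in_bounds(lol, 5)) {
  value <- lol[[5]]
} else {
  value <- NULL  # alternative branch: element 5 does not exist
}

# Alternative: try the access and fall back if it raises an error
value2 <- tryCatch(lol[[5]], error = function(e) NULL)
```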

Related

Missing value often results in errors in lapply (in R)

I got errors in this code:
FUN = function(files) {
  df_week <- data.table::fread(files)
  #Sun rate
  for (i in 1:nrow(df_week)) {
    #check if df is not NA
    if (!is.na(df_week[i])) {
      if (df_week$Sun[i] >= 10) {df_week$Sunr[i] = 5}
      ....
    }
  }
}
files = list.files(pattern="1_Stas*")
lapply(files, FUN)
Output:
Error in if (!is.na(df_week[i])) { : argument is of length zero
In addition: Warning message:
13 failed to parse.
Why does the if () {} code give errors?
If the input contains missing values (NA), the output should be NaN or NA, and lapply should continue with the next file.
I have tried it with a single file without lapply and the function, and the output appears in the environment as empty data.
So when I do it one file at a time, there's no error. When it is done using lapply, there are very often problems. Should I use a for loop instead?
Any suggestions to fix it and make lapply continue with the next file when the previous file contains missing values?
Thanks.
Just use a for loop. The apply family is good for quick one-liners when you need to apply a single operation to one dimension of your data. Your code will become:
files = list.files(pattern="1_Stas*")
df_week <- data.table::fread(files) # Make sure length(files) == 1
#Sun rate
# seq_len() gives an empty loop for a 0-row table, where 1:nrow() would give c(1, 0)
for (i in seq_len(nrow(df_week))) {
  #check if df is not NA
  if (!is.na(df_week[i])) {
    if (df_week$Sun[i] >= 10) {
      df_week$Sunr[i] = 5
    }
    ...
  }
}
The line containing if(!is.na(df_week[i])) { needs clarification. From context, length(dim(df_week)) > 1, so you probably want
if (all(!is.na(df_week[i]))) {
...
}
all(!is.na(df_week[i])) returns TRUE when the ith row of df_week contains no NA values.
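A sketch of the per-file step written so that one bad file does not stop the rest, under stated assumptions: rate_sun and process_all are hypothetical names, only the single rule visible in the question (Sun >= 10 gives Sunr = 5) is implemented, and the inputs are data frames already in memory so the example runs without files. One plausible cause of "argument is of length zero", assuming some files parse to an empty table (the "failed to parse" warning hints at this), is that 1:nrow(df_week) becomes 1:0. The vectorized ifelse() avoids the row loop entirely, and tryCatch() turns any remaining error into a NULL result for that input.

```r
# Hypothetical per-table step: add a Sunr column following the single
# rule shown in the question (Sun >= 10 gives 5); other rows get NA.
rate_sun <- function(df_week) {
  if (nrow(df_week) == 0) return(df_week)  # empty table: nothing to rate
  df_week$Sunr <- ifelse(!is.na(df_week$Sun) & df_week$Sun >= 10, 5, NA)
  df_week
}

# Apply it to every input; an error in one input is reported and that
# input yields NULL instead of stopping the whole lapply() call.
process_all <- function(inputs) {
  lapply(inputs, function(df_week) {
    tryCatch(rate_sun(df_week), error = function(e) {
      message("Skipping input: ", conditionMessage(e))
      NULL
    })
  })
}
```

With files on disk you would read each one inside the anonymous function, for example rate_sun(data.table::fread(f)), keeping the same tryCatch() wrapper.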

Why does this work as a for loop but not within a function?

I'm trying to write a function that identifies if a number within a numerical vector is odd or even. The numerical vector has a length of 1000.
I know that the for loop works fine, and I just wanted to generalize it as a function that takes a vector of any length:
out<-vector()
f3<- function(arg){
  for(i in 1:length(arg)){
    if((arg[i]%%2==0)==TRUE){
      out[i]<-1
    }else{out[i]<-0
    }
  }
}
When run within a function, however, it just returns NULL. Why is that, and what do I need to do to make the function work with any numerical vector?
As already mentioned by PKumar in the comments: your function doesn't return out. The last expression it evaluates is the for loop, which returns NULL (invisibly), and the modified copy of out exists only in the environment of your function.
To change this, add return(out) at the end of your function. You should also create out inside the function, before the loop. Your function would then look like the version below.
Note that I assume you want to pass a vector of a certain length to your function and get back a vector of the same length containing 1 for even numbers and 0 for odd numbers. f3(c(1,1,2)) would return 0 0 1.
f3 <- function(arg){
  out <- vector(length = length(arg), mode = "integer")
  for(i in seq_along(arg)){  # seq_along() also copes with a zero-length arg
    if(arg[i] %% 2 == 0){    # no need for "== TRUE"
      out[i] <- 1
    } else {
      out[i] <- 0
    }
  }
  return(out) # calling out without return() is enough and more in line with the tidyverse style guide
}
However, as also pointed out by sebastiann in the comments, some_vector %% 2 yields almost the same result. The difference is that odd numbers yield 1 and even numbers 0. You can also put this into a function and subtract 1 from arg to swap 0 and 1:
f3 <- function(arg){
(arg-1) %% 2
}
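A third way to write it, shown as a sketch: compare with 0 directly and convert the logical result to integer. The name is_even is hypothetical; because %% and == are vectorized, there is no loop, and a zero-length input simply gives integer(0).

```r
# Vectorized: arg %% 2 == 0 is TRUE for even numbers,
# and as.integer() turns TRUE/FALSE into 1/0.
is_even <- function(arg) {
  as.integer(arg %% 2 == 0)
}

is_even(c(1, 1, 2))
```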
A few things to note about your code:
A function must return something.
The condition if((arg[i]%%2==0)==TRUE) is redundant; if(arg[i]%%2==0) is enough. But if you call the function as f3(1000), arg has a single element, so arg[i] is NA for every i > 1.
length(arg) is length(1000), which returns 1.
You should replace arg[i] with i and let i take all the values from 1 to 1000, as follows:
f3 <- function(arg){
  out <- vector()        # create out inside the function
  for(i in 1:arg){
    if(i %% 2 == 0){     # test i itself, since arg is a single number
      out[i] <- 1
    } else {
      out[i] <- 0
    }
  }
  return(out)
}
f3(1000)

Calculating distance using latitude and longitude error [duplicate]

When working with R I frequently get the error message "subscript out of bounds". For example:
# Load necessary libraries and data
library(igraph)
library(NetData)
data(kracknets, package = "NetData")
# Reduce dataset to nonzero edges
krack_full_nonzero_edges <- subset(krack_full_data_frame, (advice_tie > 0 | friendship_tie > 0 | reports_to_tie > 0))
# Convert the edge data frame to a graph
krack_full <- graph.data.frame(krack_full_nonzero_edges)
# Set vertex attributes
for (i in V(krack_full)) {
  for (j in names(attributes)) {
    krack_full <- set.vertex.attribute(krack_full, j, index=i, attributes[i+1,j])
  }
}
# Calculate reachability for each vertex
reachability <- function(g, m) {
  reach_mat = matrix(nrow = vcount(g),
                     ncol = vcount(g))
  for (i in 1:vcount(g)) {
    reach_mat[i,] = 0
    this_node_reach <- subcomponent(g, (i - 1), mode = m)
    for (j in 1:(length(this_node_reach))) {
      alter = this_node_reach[j] + 1
      reach_mat[i, alter] = 1
    }
  }
  return(reach_mat)
}
reach_full_in <- reachability(krack_full, 'in')
reach_full_in
This generates the following error: Error in reach_mat[i, alter] = 1 : subscript out of bounds.
However, my question is not about this particular piece of code (even though it would be helpful to solve that too); it is more general:
What is the definition of a subscript-out-of-bounds error? What causes it?
Are there any generic ways of approaching this kind of error?
This happens because you try to access an array outside its bounds.
I will show you how you can debug such errors.
I set options(error=recover)
I run reach_full_in <- reachability(krack_full, 'in')
I get :
reach_full_in <- reachability(krack_full, 'in')
Error in reach_mat[i, alter] = 1 : subscript out of bounds
Enter a frame number, or 0 to exit
1: reachability(krack_full, "in")
I enter 1 and I get
Called from: top level
I type ls() to see my current variables
[1] "*tmp*" "alter" "g" "i" "j" "m" "reach_mat" "this_node_reach"
Now I check the values of my variables and the dimensions of reach_mat:
Browse[1]> i
[1] 1
Browse[1]> j
[1] 21
Browse[1]> alter
[1] 22
Browse[1]> dim(reach_mat)
[1] 21 21
You can see that alter is out of bounds (22 > 21) in the line:
reach_mat[i, alter] = 1
To avoid such errors, I personally do the following:
Use the apply family of functions (lapply, sapply, ...). They are safer than for loops.
Use seq_along() (or seq_len()) rather than 1:n, because 1:n becomes 1:0, that is c(1, 0), when n is 0.
Try to think of a vectorized solution if you can, to avoid mat[i,j] index access.
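Applied to the reachability function, the inner loop over j can be replaced by one vectorized assignment per row. This is a sketch: fill_row is a hypothetical helper, and reached stands in for the vertices returned by igraph's subcomponent(), assumed here to be already converted to 1-based integer ids (for an igraph vertex sequence, as.integer() should do that).

```r
# Fill row i of a reachability matrix in one vectorized step
# instead of an inner for loop over j.
fill_row <- function(reach_mat, i, reached) {
  reach_mat[i, ] <- 0
  reach_mat[i, as.integer(reached)] <- 1  # errors if any id > ncol(reach_mat)
  reach_mat
}

m <- matrix(NA_real_, nrow = 3, ncol = 3)
m <- fill_row(m, 1, c(1, 3))
```

Any id larger than ncol(reach_mat) still fails with "subscript out of bounds", so an off-by-one in the ids shows up immediately instead of being buried inside a nested loop.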
EDIT: vectorize the solution
For example, here I see that you don't use the fact that set.vertex.attribute is vectorized.
You can replace:
# Set vertex attributes
for (i in V(krack_full)) {
  for (j in names(attributes)) {
    krack_full <- set.vertex.attribute(krack_full, j, index=i, attributes[i+1,j])
  }
}
by this:
## set.vertex.attribute is vectorized!
## no need to loop over vertices!
for (attr in names(attributes)) {
  krack_full <- set.vertex.attribute(krack_full,
                                     attr, value = attributes[, attr])
}
It just means that either alter > ncol(reach_mat) or i > nrow(reach_mat); in other words, your indices exceed the array boundary (i is greater than the number of rows, or alter is greater than the number of columns).
Run these tests inside the loop to see which index goes out of bounds, and when.
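Those tests can be wrapped in a small function that fails with a more specific message than "subscript out of bounds". check_index is a hypothetical helper for a matrix m and a single row/column pair; you would call it just before an assignment such as reach_mat[i, alter] = 1.

```r
# Hypothetical helper: stop with a message naming the offending index
# instead of R's generic "subscript out of bounds".
check_index <- function(m, i, j) {
  if (i < 1 || i > nrow(m)) {
    stop("row index ", i, " is outside 1..", nrow(m))
  }
  if (j < 1 || j > ncol(m)) {
    stop("column index ", j, " is outside 1..", ncol(m))
  }
  invisible(TRUE)
}

m <- matrix(0, nrow = 21, ncol = 21)
check_index(m, 1, 21)  # passes silently
```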
An addition to the above responses: another possibility is that you are requesting an element that, for some reason, is not available. For example, if you subset a matrix by row or column names, you receive this error when the requested row or column is not (or no longer) part of the matrix. (A data frame reports a missing column as "undefined columns selected" instead.)
Solution: in short, find the last row or column name that still works; the next one requested should be the one that could not be found.
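A sketch of that check for name-based subsetting, with made-up data: setdiff() lists the requested names that are missing, and intersect() keeps only the ones that exist. Whether silently dropping missing names is acceptable depends on your analysis; you may prefer to stop() instead.

```r
# Made-up matrix with row and column names
m <- matrix(1:4, nrow = 2, dimnames = list(c("r1", "r2"), c("a", "b")))
wanted <- c("a", "c")

# Report requested columns that do not exist (m[, "c"] would fail)
missing_cols <- setdiff(wanted, colnames(m))
if (length(missing_cols) > 0) {
  message("Columns not found: ", paste(missing_cols, collapse = ", "))
}

# Subset using only the names that are present
present <- m[, intersect(wanted, colnames(m)), drop = FALSE]
```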
If you run parallel code such as foreach, you need to convert it to an ordinary for loop to be able to troubleshoot it.
If this helps anybody, I encountered this while using purrr::map() with a function I wrote, which was something like this:
library(dplyr)  # for %>%, filter(), left_join(), mutate()

find_nearby_shops <- function(base_account) {
  states_table %>%
    filter(state == base_account$state) %>%
    left_join(target_locations, by = c('border_states' = 'state')) %>%
    mutate(x_latitude = base_account$latitude,
           x_longitude = base_account$longitude) %>%
    # distHaversine() returns metres; divide by 1609.344 for miles
    mutate(dist_miles = geosphere::distHaversine(p1 = cbind(longitude, latitude),
                                                 p2 = cbind(x_longitude, x_latitude))/1609.344)
}
nearby_shop_numbers <- base_locations %>%
  split(f = base_locations$id) %>%
  purrr::map_df(find_nearby_shops)
I would get this error with some samples, but most of the time I wouldn't. The root of the problem was that some of the states in the base_locations table (PR) did not exist in states_table, so I had essentially filtered out everything and passed an empty table on to mutate. The moral of the story is that you may have a data issue and not (just) a code problem, so you may need to clean your data.
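A minimal guard for that situation, sketched in base R with made-up data (rows_for_state and the two-row states_table are hypothetical): return NULL early when the lookup matches nothing, so the empty result is flagged instead of flowing into later steps. In the dplyr pipeline above, the same check would go right after filter(). Before running the pipeline, setdiff(unique(base_locations$state), states_table$state) should list the states that have no entry.

```r
# Hypothetical guard: look up one account's state and stop early
# (returning NULL with a warning) when no rows match.
rows_for_state <- function(base_account, states_table) {
  matches <- states_table[states_table$state == base_account$state, , drop = FALSE]
  if (nrow(matches) == 0) {
    warning("no rows in states_table for state ", base_account$state)
    return(NULL)
  }
  matches
}

# Made-up lookup table
states_table <- data.frame(state = c("NY", "NJ"), border_states = c("NJ", "NY"))
```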
Thanks to agstudy and zx8754 for their answers above, which helped with the debugging.
I sometimes encounter the same issue. I can only answer your second question, because I am less expert in R than in other languages. I have found that the standard for loop gives some unexpected results. Say x = 0:
for (i in 1:x) {
print(i)
}
The output is
[1] 1
[1] 0
Whereas in Python, for example,
for i in range(x):
    print(i)
does nothing: the loop is not entered.
I expected that with x = 0 the loop would not be entered in R either. However, 1:0 is a valid sequence: it counts down from 1 to 0 and gives c(1, 0). I have not yet found a good workaround besides wrapping the for loop in an if statement.
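One workaround that avoids the extra if statement is seq_len(), which returns an empty sequence for 0 (seq_along() is the equivalent when looping over the elements of a vector). The sketch below counts loop iterations to show the difference:

```r
x <- 0

# 1:x counts down when x is 0, so this loop runs twice (i = 1, then i = 0)
visits_colon <- 0
for (i in 1:x) visits_colon <- visits_colon + 1

# seq_len(x) is an empty sequence when x is 0, so the body never runs
visits_seq <- 0
for (i in seq_len(x)) visits_seq <- visits_seq + 1

# seq_along() does the same for "one iteration per element" loops
v <- c()
visits_along <- 0
for (i in seq_along(v)) visits_along <- visits_along + 1
```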
This code comes from Stanford's free SNA (social network analysis) tutorial, which states:
# Reachability can only be computed on one vertex at a time. To
# get graph-wide statistics, change the value of "vertex"
# manually or write a for loop. (Remember that, unlike R objects,
# igraph objects are numbered from 0.)
So the tutorial treats igraph vertices as numbered from 0, while R matrices start at 1, which is why the code converts with i - 1 in
this_node_reach <- subcomponent(g, (i - 1), mode = m)
However, in igraph 0.6 and later, vertex ids in R start at 1, so subcomponent() already returns ids that match the matrix columns, and the alter calculation goes one past the end:
alter = this_node_reach[j] + 1
Delete the + 1 and it will work.
What did it for me was going back through the code, checking for errors or uncertain changes, and focusing on need-to-have over nice-to-have.

For loop in R (no clue how to stop the loop and continue with the rest of my code)

I would like to stop the loop when my if condition becomes true and continue to the next lines of my code. When I run this, I get an error that the subscript i in blocks[[i]] is out of bounds. Any ideas?
for(i in 1:length(blocks)){
  if(length(blocks[[i]]) == 0){
    path <- path[ -i , -i]
    modes = rep("A", nrow(path))
    blocks[[i]] = NULL
  }
}
NOTE: I have read the help for loops, next, stop and break.
If I understand you correctly, you want to loop while a condition is true and exit the loop when the condition becomes false. So you should be using the while control-flow operator, not for.
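i <- 1
# && stops before evaluating blocks[[i]] once i > length(blocks);
# with & both sides are evaluated and blocks[[i]] would be out of bounds
while (i <= length(blocks) && length(blocks[[i]]) != 0) {
  i <- i + 1
}
# check that we exited the loop because there was a zero
# and not because we went through all the blocks
if (i != length(blocks) + 1) {
  path <- path[-i, -i]
  modes = rep("A", nrow(path))
  blocks[[i]] = NULL
}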
# rest of your code
But there's really no need for a loop here.
# apply 'length' to every element of the list blocks
# returns a vector containing all the lengths
all_length <- sapply(blocks, FUN=length)
# check that there is at least one zero
if(any(all_length==0)) {
  # find the indexes of the zeros in 'all_length'
  zero_length_ind <- which(all_length==0)
  # this is the index of the first zero
  i <- min(zero_length_ind)
}
I don't know what you want to do, but if your plan is to treat all the i values sequentially, you may actually want to work with zero_length_ind and treat all your zeros at once.
For example if you want to remove all the values in path corresponding to zero length in blocks, you should directly do:
path <- path[-zero_length_ind,-zero_length_ind]
(Note that if there is no zero length element in blocks, then zero_length_ind will be integer(0) (that is, an empty integer vector) and you can't use it to index path. This might save you some debugging time.)
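Putting that together, a sketch that removes every zero-length block at once and guards against the integer(0) case just mentioned. blocks and path here are made-up stand-ins for the question's objects, and lengths() is the base R shortcut for sapply(blocks, length).

```r
# Made-up stand-ins for the question's objects
blocks <- list(1:3, integer(0), 4:5, integer(0))
path <- matrix(1:16, nrow = 4)

zero_length_ind <- which(lengths(blocks) == 0)

# Only subset when there is something to drop: path[-integer(0), ]
# would select no rows at all rather than all of them.
if (length(zero_length_ind) > 0) {
  path <- path[-zero_length_ind, -zero_length_ind, drop = FALSE]
  blocks <- blocks[-zero_length_ind]
}
modes <- rep("A", nrow(path))
```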
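Use the stop() function; the code below should work. Note that stop() raises an error, which ends the whole script and not just the loop; break exits only the loop.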
for(i in 1:length(blocks)){
  if(length(blocks[[i]]) == 0){
    path <- path[ -i , -i]
    modes = rep("A", nrow(path))
    blocks[[i]] = NULL
    stop("Outside bounds")
  }
}
I have trouble parsing your title and your description. It seems to me you don't want to stop the loop, but to continue over the whole length of your object blocks. This could help; it skips to the next iteration when the length is different from 0:
for(i in 1:length(blocks)){
  if(length(blocks[[i]]) != 0) {
    next
  } else if(length(blocks[[i]]) == 0){
    path <- path[ -i , -i]
    modes = rep("A", nrow(path))
    blocks[[i]] = NULL
  }
}
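If the goal is the one in the title (leave the loop at the first empty block and carry on with the rest of the script), break does exactly that. It also avoids the out-of-bounds error, because the loop ends right after blocks shrinks. A sketch with made-up stand-ins for blocks and path:

```r
# Made-up stand-ins for the question's objects
blocks <- list(1:3, integer(0), 4:5)
path <- matrix(1:9, nrow = 3)

for (i in seq_along(blocks)) {
  if (length(blocks[[i]]) == 0) {
    path <- path[-i, -i, drop = FALSE]
    modes <- rep("A", nrow(path))
    blocks[[i]] <- NULL
    break  # leave the loop; execution continues after it
  }
}
# rest of your code runs here
```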

For loop and 'which' command

So I'm currently trying to get a random initial pathway between nodes. I've tried the following code, but at times it 'skips' a node, i.e. sometimes the same node is visited twice rather than the path traversing each one. But since I've set a visited node's column to all 0, I don't see why this should happen when using which(... > 0). Any advice?
A<-matrix(sample(1:15,25,replace=TRUE), ncol=5)
n=nrow(A)
b=c()
a=c(1:nrow(A))
b[1]=sample(a,1)
for(i in 2:n){
  A[,b[i-1]]<-rep(0,n)
  d=which(A[b[i-1],]>0)
  b[i]=sample(d,1)
}
print(b)
The problem is that sample behaves differently when you pass it a vector of length 1. Observe:
set.seed(14)
x<-c(5,3)
sample(x, 1)
# [1] 5
x<-5
sample(x, 1)
# [1] 4
You can see that sample returned 4, which is not in x: when x is a single number (at least 1), sample(x, 1) draws from 1:x. You can write your own wrapper if you like:
Sample <- function(x, n) {
  if (length(x) > 1) {
    sample(x, n)
  } else if (length(x) == 1 && n == 1) {
    x
  } else {
    stop("x has length 0, or length 1 with n > 1")
  }
}
and then use this function instead.
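The help page for sample (?sample) warns about this behaviour and, in its examples, suggests a resample() helper along these lines. It indexes x by position, so a length-1 x is never expanded to 1:x:

```r
# Index x by position with sample.int(), so length(x) == 1 is safe
resample <- function(x, ...) x[sample.int(length(x), ...)]

set.seed(1)
draws_one  <- resample(5, 1)        # always 5
draws_many <- resample(c(5, 3), 1)  # 5 or 3
```

In the question's loop, b[i] = resample(d, 1) then always returns an element of d.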
But it seems like you are just shuffling your rows. Why not just permute the index with one call to sample:
sample(seq_len(nrow(A)))
