Error in *tmp*[[j]] : subscript out of bounds - r

Apologies for long post! I'm new to R and have been working hard to improve my command of the language. I stumbled across this interesting project on modelling football results: http://www1.maths.leeds.ac.uk/~voss/projects/2010-sports/JamesGardner.pdf
I keep running into problems when I run the code to Simulate a Full Season (first mentioned page 36, appendix page 59):
Games <- function(parameters)
{
  teams <- rownames(parameters)
  P <- parameters$teams
  home <- parameters$home
  n <- length(teams)
  C <- data.frame()
  row <- 1
  for (i in 1:n) {
    for (j in 1:n) {
      if (i != j) {
        C[row,1] <- teams[i]
        C[row,2] <- teams[j]
        C[row,3] <- rpois(1, exp(P[i,]$Attack - P[j,]$Defence + home))
        C[row,4] <- rpois(1, exp(P[j,]$Attack - P[i,]$Defence))
        row <- row + 1
      }
    }
  }
  return(C)
}
Games(TeamParameters)
The response I get is
Error in `*tmp*`[[j]] : subscript out of bounds
When I attempt a traceback(), this is what I get:
3: `[<-.data.frame`(`*tmp*`, row, 1, value = NULL) at #11
2: `[<-`(`*tmp*`, row, 1, value = NULL) at #11
1: Games(TeamParameters)
I don't really understand what the error means and I would appreciate any help. Once again, apologies for the long post but I'm really interested in this project and would love to learn what the problem is!

The data.frame objects are not extendable by row with the [<-.data.frame operation. (You would need to use rbind.) You should create an object that has sufficient space, either a pre-dimensioned matrix or data.frame. If "C" is an object of 0 rows, then trying to assign to row one will fail. There is already a function named "C", so you might want to give your object a more distinct name. It also seems likely that there are more efficient methods than the double loop, but you haven't described the parameter object very well.
You may notice that the Appendix of that paper you cited shows how to pre-dimension a dataframe:
teams <- sort(unique(c(games[,1], games[,2])), decreasing = FALSE)
T <- data.frame(Team=teams, ... )
... and the games-object was assumed to already have the proper number of rows and the results of computations were assigning new column values. The $<- operation will succeed if there is no current value for that referenced column.
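A minimal sketch of the pre-allocation idea, under stated assumptions: `parameters` is taken to be a list holding a data frame `teams` (row names are team names, columns `Attack` and `Defence`) and a numeric `home`. That layout is inferred from the question's code, not confirmed by the thread, and `simulate_games` and the output column names are hypothetical. The traceback in the question shows `value = NULL`, which would be consistent with `rownames(parameters)` returning NULL when `parameters` is a list, so the sketch reads the row names from the data frame instead.

```r
# Build every fixture up front and fill the goal columns in one vectorized call,
# so no row is ever appended to an existing data frame.
simulate_games <- function(parameters) {
  P     <- parameters$teams            # assumed: data frame with Attack/Defence
  home  <- parameters$home             # assumed: single numeric home advantage
  teams <- rownames(P)                 # row names live on the data frame
  # All ordered (home side, away side) pairs of distinct teams
  fixtures <- expand.grid(away = seq_along(teams), home = seq_along(teams))
  fixtures <- fixtures[fixtures$home != fixtures$away, ]
  h <- fixtures$home
  a <- fixtures$away
  data.frame(
    HomeTeam  = teams[h],
    AwayTeam  = teams[a],
    HomeGoals = rpois(length(h), exp(P$Attack[h] - P$Defence[a] + home)),
    AwayGoals = rpois(length(a), exp(P$Attack[a] - P$Defence[h])),
    stringsAsFactors = FALSE
  )
}

# Made-up parameters for three teams, purely for illustration
example_parameters <- list(
  teams = data.frame(Attack  = c(0.3, 0.1, -0.2),
                     Defence = c(0.2, 0.0, -0.1),
                     row.names = c("A", "B", "C")),
  home = 0.25
)
set.seed(1)
season <- simulate_games(example_parameters)
```

With n teams the result has n * (n - 1) rows, one per home/away pairing, which matches what the double loop in the question was trying to build.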

Related

Loop Changing to Matrix then Running tests

I have a dataframe with ~9000 rows of human coded data in it, two coders per item so about 4500 unique pairs. I want to break the dataset into each of these pairs, so ~4500 dataframes, run a kripp.alpha on the scores that were assigned, and then save those into a coder sheet I have made. I cannot get the loop to work to do this.
I can get it to work individually, using this:
example.m <- as.matrix(example.m)
s <- kripp.alpha(example.m)
example$alpha <- s$value
However, when trying a loop I am getting either "Error in get(v) : object 'NA' not found" when running this:
for (i in items) {
  v <- i
  v <- v[c("V1","V2")]
  v <- assign(v, as.matrix(get(v)))
  s <- kripp.alpha(v)
  i$alpha <- s$value
}
Or am getting "In i$alpha <- s$value : Coercing LHS to a list" when running:
for (i in items) {
  i.m <- i[c("V1","V2")]
  i.m <- as.matrix(i.m)
  s <- kripp.alpha(i.m)
  i$alpha <- s$value
}
Here is an example set of data. Items is a list of individual dataframes.
l <- as.data.frame(matrix(c(4,3,3,3,1,1,3,3,3,3,1,1),nrow=2))
t <- as.data.frame(matrix(c(4,3,4,3,1,1,3,3,1,3,1,1),nrow=2))
items <- c("l","t")
I am sure this is a basic question, but what I want is for each file, i, to add a column with the alpha score at the end. Thanks!
Your problem is with scoping and with extracting names from objects when they are referenced through strings. You'd need to eval() some of your objects to make your current approach work.
Here's another solution:
library("irr") # For kripp.alpha
# Produce the data
l <- as.data.frame(matrix(c(4,3,3,3,1,1,3,3,3,3,1,1),nrow=2))
t <- as.data.frame(matrix(c(4,3,4,3,1,1,3,3,1,3,1,1),nrow=2))
# Collect the data as a list right away
items <- list(l, t)
Now you can sapply() directly over the elements in the list.
sapply(items, function(v) {
  kripp.alpha(as.matrix(v[c("V1","V2")]))$value
})
which produces
[1] 0.0 -0.5
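The question asked for the score to end up as a new column in each data frame rather than as a bare vector. A minimal sketch of that step follows; `add_stat_column` and `stat_fun` are hypothetical names, and `mean` is used only as a stand-in statistic so the example runs without extra packages.

```r
# Return a copy of `df` with an `alpha` column holding the statistic
# computed on the selected columns (the scalar is recycled down the rows).
add_stat_column <- function(df, stat_fun, cols = c("V1", "V2")) {
  df$alpha <- stat_fun(as.matrix(df[cols]))
  df
}

l <- as.data.frame(matrix(c(4,3,3,3,1,1,3,3,3,3,1,1), nrow = 2))
t <- as.data.frame(matrix(c(4,3,4,3,1,1,3,3,1,3,1,1), nrow = 2))
items <- list(l = l, t = t)   # a named list of data frames, not a vector of names

# Stand-in statistic; with irr installed one could instead pass
# function(m) irr::kripp.alpha(m)$value
items_with_stat <- lapply(items, add_stat_column, stat_fun = mean)
```

Because lapply() returns new objects instead of modifying `l` and `t` in place, the updated data frames are read back from `items_with_stat$l` and `items_with_stat$t`.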

Calculating distance using latitude and longitude error [duplicate]

When working with R I frequently get the error message "subscript out of bounds". For example:
# Load necessary libraries and data
library(igraph)
library(NetData)
data(kracknets, package = "NetData")
# Reduce dataset to nonzero edges
krack_full_nonzero_edges <- subset(krack_full_data_frame, (advice_tie > 0 | friendship_tie > 0 | reports_to_tie > 0))
# Convert to graph data frame
krack_full <- graph.data.frame(krack_full_nonzero_edges)
# Set vertex attributes
for (i in V(krack_full)) {
  for (j in names(attributes)) {
    krack_full <- set.vertex.attribute(krack_full, j, index=i, attributes[i+1,j])
  }
}
# Calculate reachability for each vertex
reachability <- function(g, m) {
  reach_mat = matrix(nrow = vcount(g),
                     ncol = vcount(g))
  for (i in 1:vcount(g)) {
    reach_mat[i,] = 0
    this_node_reach <- subcomponent(g, (i - 1), mode = m)
    for (j in 1:(length(this_node_reach))) {
      alter = this_node_reach[j] + 1
      reach_mat[i, alter] = 1
    }
  }
  return(reach_mat)
}
reach_full_in <- reachability(krack_full, 'in')
reach_full_in
This generates the following error: Error in reach_mat[i, alter] = 1 : subscript out of bounds.
However, my question is not about this particular piece of code (even though it would be helpful to solve that too), but my question is more general:
What is the definition of a subscript-out-of-bounds error? What causes it?
Are there any generic ways of approaching this kind of error?
This is because you try to access an array out of its boundary.
I will show you how you can debug such errors.
I set options(error=recover)
I run reach_full_in <- reachability(krack_full, 'in')
I get :
reach_full_in <- reachability(krack_full, 'in')
Error in reach_mat[i, alter] = 1 : subscript out of bounds
Enter a frame number, or 0 to exit
1: reachability(krack_full, "in")
I enter 1 and I get
Called from: top level
I type ls() to see my current variables
[1] "*tmp*"           "alter"           "g"
[4] "i"               "j"               "m"
[7] "reach_mat"       "this_node_reach"
Now, I will see the dimensions of my variables :
Browse[1]> i
[1] 1
Browse[1]> j
[1] 21
Browse[1]> alter
[1] 22
Browse[1]> dim(reach_mat)
[1] 21 21
You can see that alter is out of bounds (22 > 21) in the line:
reach_mat[i, alter] = 1
To avoid such errors, personally I do this:
Try to use the apply family of functions (lapply, sapply, ...). They are safer than for loops.
I use seq_along and not 1:n, because 1:n becomes 1:0 when n is 0.
Try to think of a vectorized solution if you can, to avoid mat[i,j] index access.
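As an illustration of the last point, here is a small sketch that replaces an inner loop over `mat[i, j]` with a single matrix-indexed assignment and filters out indices that would fall outside the matrix. The sizes and the `alters` vector are made up for the example.

```r
n <- 5
reach_mat <- matrix(0, nrow = n, ncol = n)

i <- 2
alters <- c(1, 3, 6)                   # 6 is deliberately out of range

# Keep only column indices that actually exist in the matrix
valid <- alters >= 1 & alters <= ncol(reach_mat)

# A two-column matrix of (row, col) pairs sets all cells in one assignment
reach_mat[cbind(i, alters[valid])] <- 1
```

Inspecting `alters[!valid]` afterwards shows which indices were dropped, which is often the quickest way to find where an off-by-one crept in.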
EDIT vectorize the solution
For example, here I see that you don't use the fact that set.vertex.attribute is vectorized.
You can replace:
# Set vertex attributes
for (i in V(krack_full)) {
  for (j in names(attributes)) {
    krack_full <- set.vertex.attribute(krack_full, j, index=i, attributes[i+1,j])
  }
}
by this:
## set.vertex.attribute is vectorized!
## no need to loop over vertices!
for (attr in names(attributes)) {
  krack_full <- set.vertex.attribute(krack_full,
                                     attr, value = attributes[, attr])
}
It just means that either alter > ncol( reach_mat ) or i > nrow( reach_mat ), in other words, your indices exceed the array boundary (i is greater than the number of rows, or alter is greater than the number of columns).
Just run the above tests to see what and when is happening.
Only an addition to the above responses: A possibility in such cases is that you are calling an object, that for some reason is not available to your query. For example you may subset by row names or column names, and you will receive this error message when your requested row or column is not part of the data matrix or data frame anymore.
Solution: As a short version of the responses above: you need to find the last working row name or column name, and the next called object should be the one that could not be found.
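A short sketch of checking names before subsetting, so that a missing row or column name is reported explicitly rather than surfacing as "subscript out of bounds". The matrix and the `wanted` vector are invented for the example.

```r
m <- matrix(1:6, nrow = 2,
            dimnames = list(c("r1", "r2"), c("a", "b", "c")))

wanted <- c("a", "c", "z")             # "z" does not exist in m

# Names that were requested but are not present
missing_cols <- setdiff(wanted, colnames(m))

# Subset only by the names that are present; drop = FALSE keeps a matrix
subset_m <- m[, intersect(wanted, colnames(m)), drop = FALSE]
```

Calling `m[, "z"]` directly on this matrix would raise the subscript error, whereas `missing_cols` names the culprit up front.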
If you run parallel code such as "foreach", you need to convert it to a plain for loop to be able to troubleshoot it.
If this helps anybody, I encountered this while using purrr::map() with a function I wrote which was something like this:
find_nearby_shops <- function(base_account) {
  states_table %>%
    filter(state == base_account$state) %>%
    left_join(target_locations, by = c('border_states' = 'state')) %>%
    mutate(x_latitude = base_account$latitude,
           x_longitude = base_account$longitude) %>%
    mutate(dist_miles = geosphere::distHaversine(p1 = cbind(longitude, latitude),
                                                 p2 = cbind(x_longitude, x_latitude))/1609.344)
}
nearby_shop_numbers <- base_locations %>%
  split(f = base_locations$id) %>%
  purrr::map_df(find_nearby_shops)
I would get this error sometimes with samples, but most times I wouldn't. The root of the problem is that some of the states in the base_locations table (PR) did not exist in the states_table, so essentially I had filtered out everything, and passed an empty table on to mutate. The moral of the story is that you may have a data issue and not (just) a code problem (so you may need to clean your data.)
Thanks to agstudy and zx8754 for their answers above, which helped with the debugging.
I sometimes encounter the same issue. I can only answer your second bullet, because I am not as expert in R as I am with other languages. I have found that the standard for loop has some unexpected results. Say x = 0
for (i in 1:x) {
  print(i)
}
The output is
[1] 1
[1] 0
Whereas with python, for example
for i in range(x):
    print(i)
does nothing. The loop is not entered.
I expected that if x = 0 that in R, the loop would not be entered. However, 1:0 is a valid range of numbers. I have not yet found a good workaround besides having an if statement wrapping the for loop
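One workaround that avoids the wrapping if statement is `seq_len()`, which returns an empty sequence for 0, so the loop body is skipped. A minimal sketch:

```r
x <- 0
visited <- integer(0)

# seq_len(0) is integer(0), so the body never runs
for (i in seq_len(x)) {
  visited <- c(visited, i)
}

# By contrast, 1:x counts down from 1 to 0 and has two elements
backwards_length <- length(1:x)
```

`seq_along(v)` plays the same role when looping over the positions of an existing vector `v`.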
This came from Stanford's free SNA tutorial,
which states that ...
# Reachability can only be computed on one vertex at a time. To
# get graph-wide statistics, change the value of "vertex"
# manually or write a for loop. (Remember that, unlike R objects,
# igraph objects are numbered from 0.)
OK, so whenever you use igraph, the first row/column is 0 rather than 1, but a matrix starts at 1; thus for any calculation under igraph you need x-1, as shown at
this_node_reach <- subcomponent(g, (i - 1), mode = m)
but in the alter calculation there is a typo here
alter = this_node_reach[j] + 1
Delete the +1 and it will work all right.
What did it for me was going back in the code and check for errors or uncertain changes and focus on need-to-have over nice-to-have.

is it possible to generate a list of seq() for loop in r?

I am very new to R and I have a problem performing a loop using seq() and list. I have searched the Q&A on SO, yet I have not found the same problem as this. I apologize if there is a duplicate Q&A on this.
I know the basics of how to generate a sequence of numbers and how to generate a list; however, I am wondering whether we can generate a list of sequences, one for each loop.
This is an example of my code:
J <- seq(50,200,50) # (I actually wanted to use 1:J to generate a sequence for each combination, i.e. 1:50, 1:100 etc.)
K <- seq(10,100,10) # (same as the above)
set.seed(1234)
for (i in J) {
  for (j in K) {
    f <- rnorm(i + 1) # I would like the f values to be generated as a list; since J has 4 values, could the list follow that, if possible?
  }
}
I tried using both the sequence and list functions, but I keep getting one of these messages:
if print(i)
output
[1]1
.
.
.
[1]50
Warning message:
In 1:(seq(50, 200, 50)) :
numerical expression has 6 elements: only the first used
for (i in 1:list(seq(50,200,50)))
Error in 1:list(seq(50, 200, 50)) : NA/NaN argument
May I know whether such loop combinations can be performed? Could you please guide me on this? Thank you very much.
I am not yet sure what you are asking, but is this what you are looking for? It was difficult to post this as a comment.
J <- seq(50,200,50)
l1 <- vector(length = length(J), mode = "list")
for (i in seq_along(J)){ # you know of seq_along() right?
  l1[[i]] = rnorm(J[i])
}
For the second question, where you want lists (J) of lists (K) of matrices: please do note that <<- has never been good practice, but for now this is what I could come up with!
Note : to understand what is actually happening, go into the debug mode : i.e. after defining func, also pass debug(func) which will then go into step-by-step execution.
l1 <- vector(length = length(J), mode = "list")
l2 <- vector(length = length(K), mode = "list")
func <- function(x){
  l1[[x]] <- l2
  func1 <- function(y) {
    l1[[x]][[y]] <<- matrix(rnorm(J[x]*K[y]),
                            ncol = J[x],
                            nrow = K[y])
  }
  lapply(seq_along(l1[[x]]),func1)
}
lapply(seq_along(l1), func)
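For comparison, a sketch of the same list-of-lists-of-matrices structure built from the return values of two nested lapply() calls, which avoids `<<-` altogether. It assumes the same `J` and `K` as above; `nested` is a name introduced here.

```r
J <- seq(50, 200, 50)
K <- seq(10, 100, 10)
set.seed(1234)

# Outer list has one element per value of J; each of those is a list with
# one K[y]-by-J[x] matrix of normal draws per value of K.
nested <- lapply(J, function(j) {
  lapply(K, function(k) matrix(rnorm(j * k), nrow = k, ncol = j))
})
```

Here `nested[[2]][[3]]` is the matrix for J = 100 and K = 30, so it has 30 rows and 100 columns.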

R: Erroneous adding column, misclassifies instances

I have an R assignment in which I have to add a column to my data frame. It concerns dates (time zones); I use the dplyr and lubridate libraries.
For the table below, I want to add an OlsonName column derived from the State column (e.g. NSW -> Australia/NSW):
Event.ID Database Date.Time Nearest.town State *OlsonName*
1 20812 Wind 23/11/1975 07:00 SYDNEY NSW *Australia/NSW*
2 20813 Tornado 02/12/1975 14:00 BARHAM NSW *Australia/NSW*
I implement that with a function and a loop:
#function
addOlsonNames <- function(aussieState,aussieTown){
  if(aussieState=="NSW"){
    if(aussieTown=="BROKEN HILL"){
      value <- "Australia/Broken_Hill";
    }else{
      value <- "Australia/NSW"
    }
  }else if(aussieState=="QLD"){
    value <- "Australia/Queensland"
  }else if(aussieState=="NT"){
    value <- "Australia/North"
  }else if(aussieState=="SA"){
    value <- "Australia/South"
  }else if(aussieState=="TAS"){
    value <- "Australia/Tasmania"
  }else if(aussieState=="VIC"){
    value <- "Australia/Victoria"
  }else if(aussieState=="WA"){
    value <- "Australia/West"
  }else if(aussieState=="ACT"){
    value <- "Australia/ACT"
  }
  else{
    value <- "NAN"
  }
  return(value)
}
#loop
for(i in 1:nrow(aussieStorms)){
  aussieStorms$OlsonName[i] <- addOlsonNames(State[i],Nearest.town[i])
}
Most of the instances are classified correctly, as in my table above, but some are misclassified (e.g. State~TAS -> OlsonName~Australia/West, although I also have some State~TAS -> OlsonName~Australia/Tasmania).
Seems strange to me. What might be the issue ?
Update:
I also tried mutate() and that's what I got:
aus1 <- mutate(aussieStorms,OlsonXYZ = addOlsonNames(State,Nearest.town))
Warning messages:
1: In if (aussieState == "NSW") { :
the condition has length > 1 and only the first element will be used
2: In if (aussieTown == "BROKEN HILL") { :
the condition has length > 1 and only the first element will be used
If Ben Bolker's comment is right then the problem is in here:
for(i in 1:nrow(aussieStorms)){
  aussieStorms$OlsonName[i] <- addOlsonNames(State[i],Nearest.town[i])
}
in that the values passed to addOlsonNames are not coming from rows of the aussieStorms data frame. If R isn't giving an error, then it must be getting State[i] from another object called State in your R workspace. Similarly for Nearest.town. If those objects aren't the same as the ones in your aussieStorms data frame, that would explain the apparent misclassification.
[It's also possible that you've used attach on a data frame at some point, and State is being taken from that. But attaching data frames is a bad idea, as you can see here...]
Ben's solution, i.e. making them aussieStorms$State and aussieStorms$Nearest.town, looks good to me.
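The warnings from mutate() arise because `if` only inspects the first element of a vector. One way to sidestep both the loop and that limitation is a vectorized lookup; the sketch below uses the same state-to-zone mapping as the question, while `olson_by_state`, `add_olson_names`, and the small `storms` data frame are names and data made up for illustration.

```r
# Named vector acting as a lookup table (values copied from the question)
olson_by_state <- c(
  NSW = "Australia/NSW",      QLD = "Australia/Queensland",
  NT  = "Australia/North",    SA  = "Australia/South",
  TAS = "Australia/Tasmania", VIC = "Australia/Victoria",
  WA  = "Australia/West",     ACT = "Australia/ACT"
)

add_olson_names <- function(state, town) {
  value <- unname(olson_by_state[as.character(state)])
  value[is.na(value)] <- "NAN"         # same fallback string as the original
  # Special case: Broken Hill keeps its own zone within NSW
  value[which(state == "NSW" & town == "BROKEN HILL")] <- "Australia/Broken_Hill"
  value
}

storms <- data.frame(
  State        = c("NSW", "TAS", "NSW", "XX"),
  Nearest.town = c("SYDNEY", "HOBART", "BROKEN HILL", "NOWHERE"),
  stringsAsFactors = FALSE
)
# Columns are referenced through the data frame, never as bare State[i]
storms$OlsonName <- add_olson_names(storms$State, storms$Nearest.town)
```

Because the function takes whole columns, it also works inside mutate() without the "condition has length > 1" warning.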

SVM Feature Selection using SCAD

Using penalizedSVM R package, I am trying to do feature selection. There is a list of several data.frames called trainingdata.
trainingdata <- lapply(trainingdata, function(data)
{
  levels(data$label) <- c(-1, 1)
  train_x <- data[, -1]
  train_x <- data.matrix(train_x)
  trainy <- data[, 1]
  print(which(!is.finite(train_x)))
  scad.fix <- svm.fs(train_x, y=trainy, fs.method="scad",
                     cross.outer=0, grid.search="discrete",
                     lambda1.set=lambda1.scad, parms.coding="none",
                     show="none", maxIter=1000, inner.val.method="cv",
                     cross.inner=5, seed=seed, verbose=FALSE)
  data <- data[c(1, scad.fix$model$xind)]
  data
})
Some iterations go well but then on one data.frame I am getting the following error message.
[1] "feature selection method is scad"
Error in svd(m, nv = 0, nu = 0) : infinite or missing values in 'x'
Calls: lapply ... scadsvc -> .calc.mult.inv_Q_mat2 -> rank.condition -> svd
Using the following call, I am also checking whether x is really infinite but the call returns 0 for all preceding and the current data.frame where the error has occurred.
print(which(!is.finite(train_x)))
Is there any other way to check for infinite values? What else could be done to rectify this error? Is there any way that one can determine the index of the current data.frame being processed within lapply?
For the first question, the message infinite or missing values in 'x' suggests checking for missing values as well as infinite ones, with something like:
idx <- is.na(train_x) | is.infinite(train_x)
You can then assign a value, for example 0, to these entries.
train_x[idx] <- 0
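To see where such values sit rather than only whether they exist, `which()` with `arr.ind = TRUE` reports row and column positions. Note that `is.finite()` is already FALSE for NA, NaN, Inf and -Inf, so `!is.finite(x)` covers both conditions on a numeric matrix. The small matrix below is invented for the example.

```r
train_x <- matrix(c(1, NA, 3, Inf, 5, 6), nrow = 2,
                  dimnames = list(NULL, c("f1", "f2", "f3")))

# One row per offending cell, with its row and column index
bad <- which(!is.finite(train_x), arr.ind = TRUE)

# Names of the features (columns) that contain at least one bad value
bad_features <- colnames(train_x)[unique(bad[, "col"])]
```

If `bad` has zero rows for the input matrix, the non-finite values reported by svd() would have to arise later in the computation rather than in the data passed in.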
For the second question, about identifying the current data.frame within lapply: you can loop over the names of the data.frames and do something like this:
lapply(names(trainingdata), function(data){ data <- trainingdata[data]....}
For example:
ll <- list(f=1,c=2)
> lapply(names(list(f=1,c=2)), function(x) data <- ll[x])
[[1]]
[[1]]$f
[1] 1
[[2]]
[[2]]$c
[1] 2
EDIT
You can use tryCatch before this line scad.fix<-svm.fs
tryCatch(
scad.fix<-svm.fs(....)
, error = function(e) e)
})
For example, here I test it on this list; the code continues executing to the end of the list even though it contains an NA.
lapply(list(1,NA,2), function(x){
  tryCatch(
    if (any(!is.finite(x)))
      stop("infinite or missing values in 'x'")
    , error = function(e) e)
})
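Combining the two suggestions, a sketch that iterates over positions so the failing element can be identified, and uses tryCatch() so the remaining elements are still processed. `process_one` is a hypothetical stand-in for the real feature-selection call, and `inputs` is made-up data.

```r
# Hypothetical stand-in for the per-data-frame work (e.g. the svm.fs call)
process_one <- function(x) {
  if (any(!is.finite(x))) stop("infinite or missing values in 'x'")
  sum(x)
}

inputs <- list(a = c(1, 2), b = c(1, NA), c = c(3, 4))

results <- lapply(seq_along(inputs), function(i) {
  tryCatch(
    process_one(inputs[[i]]),
    error = function(e) {
      # Report which element failed, then return NULL as a marker
      message("element ", i, " (", names(inputs)[i], ") failed: ",
              conditionMessage(e))
      NULL
    }
  )
})

# Positions of the elements that raised an error
failed <- which(vapply(results, is.null, logical(1)))
```

Returning NULL only works as a marker when a successful result can never be NULL; otherwise returning the condition object `e`, as in the answer above, is the safer choice.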
