I am currently building a recursive loop in R, where I have to keep track of how deep I am in a nested list. However, I have run into problems when counting in nested lists.
Here is the problem illustrated:
I have a list
myList <- list()
I test the value of a random index in the list
myList[["test1"]]
NULL
I can sum this value and get zero
sum(myList[["test1"]])
0
Now I assign a value to this index
myList[["test1"]] <- sum(myList[["test1"]]) + 1
Next I want to do the same just deeper into the list
myList[["test1"]][["test2"]]
Error in myList[["test1"]][["test2"]] : subscript out of bounds
Why is this happening?
When you set the value of myList[["test1"]] with...
myList[["test1"]] <- sum(myList[["test1"]]) + 1
myList[["test1"]] becomes a one-element numeric vector, not a list.
If you want to make test1 a list, with one of its elements named test2, you can do this...
myList <- list()
myList[["test1"]] <- list(sum(myList[["test1"]]) + 1)
myList[["test1"]]["test2"] <- list(sum(myList[["test1"]][[1]]) + 1)
myList
myList$test1
myList$test1$test2
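A minimal sketch of the difference, reusing the names from the question. The object `fixed` and the element name `count` are made up here for illustration:

```r
# Original approach: test1 ends up as a plain numeric vector
myList <- list()
myList[["test1"]] <- sum(myList[["test1"]]) + 1
is.list(myList[["test1"]])     # FALSE
is.numeric(myList[["test1"]])  # TRUE

# Keeping test1 a list means a missing name gives NULL instead of an error
fixed <- list()
fixed[["test1"]] <- list(count = 1)   # 'count' is a hypothetical element name
fixed[["test1"]][["test2"]]           # NULL
fixed[["test1"]][["test2"]] <- sum(fixed[["test1"]][["test2"]]) + 1
fixed$test1$test2                     # 1
```

Because sum(NULL) is 0, the same "read, add one, write back" step works at any depth as long as every intermediate level is a list.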
I'm not sure what you're trying to do, but this is a simpler version of what you did:
> x<-1
> x[["test2"]]
Error in x[["test2"]] : subscript out of bounds
Here x is a numeric vector. It can still be subscripted using [[ but there is no element named test2 in x, so you get an "out of bounds" error when you try to access that element.
Even a vector with more than one element would give this error:
> c(1,2)[["test2"]]
Error in c(1, 2)[["test2"]] : subscript out of bounds
However, if we name one of them test2 then the subscripting returns something:
> c(1,2,test2=3)[["test2"]]
[1] 3
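If the name may or may not be present, one option is to check names() before subscripting. This is only a sketch; `has_name` is a hypothetical helper, not a base R function:

```r
# Hypothetical helper: TRUE if vector v has an element named nm
has_name <- function(v, nm) nm %in% names(v)

x <- c(1, 2, test2 = 3)
has_name(x, "test2")        # TRUE
has_name(c(1, 2), "test2")  # FALSE, because names() is NULL here

# Guarded access: fall back to NA instead of raising an error
value <- if (has_name(c(1, 2), "test2")) c(1, 2)[["test2"]] else NA
```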
I have a few problems concerning the same topic.
(1) I am trying to loop over:
premium1999 <- as.data.frame(coef(summary(data1999_mod))[c(19:44), 1])
for 10 years, in which I wrote:
for (year in seq(1999,2008)) {
paste0('premium',year) <- as.data.frame(coef(summary(paste0('data',year,'_mod')))[c(19:44), 1])
}
Note:
data1999_mod holds regression results, and I want to extract some of its estimates as a data frame.
The coef(summary(data1999_mod)) looks like this:
#A matrix: ... of type dbl
Estimate Std. Error t value Pr(>|t|)
age 0.0388573570 2.196772e-03 17.6883885 3.362887e-6
age_sqr -0.0003065876 2.790296e-05 -10.9876373 5.826926e-28
relation 0.0724525759 9.168118e-03 7.9026659 2.950318e-15
sex -0.1348453659 8.970138e-03 -15.0326966 1.201003e-50
marital 0.0782049161 8.928773e-03 8.7587533 2.217825e-18
reg 0.1691004469 1.132230e-02 14.9351735 5.082589e-50
...
However, it returns Error: $ operator is invalid for atomic vectors, even if I did not use $ operator here.
(2) Also,
I want to create a column 'year' containing repeated values of the associated year and am trying to loop over this:
premium1999$year <- 1999
In which I wrote:
for (i in seq(1999,2008)) {
assign(paste0('premium',i)[['year']], i)
}
In this case, it returns Error in paste0("premium", i)[["year"]]: subscript out of bounds
(3) Moreover, I'd like to repeat some rows and loop over:
premium1999 <- rbind(premium1999, premium1999[rep(1, 2),])
for 10 years again and I wrote:
for (year in seq(1999,2008)) {
paste0('premium',year) <- rbind(paste0('premium',year), paste0('premium',year)[rep(1, 2),])
}
This time it returns Error in paste0("premium", year)[rep(1, 2), ]: incorrect number of dimensions
I also tried to loop over a few other similar things, but I always get an error.
Each piece of code works fine when run individually for a single year.
I could not find what I did wrong. Any help or suggestions would be highly appreciated.
The problem with the code is that paste0() returns a character string; it does not refer to the object whose name matches that string. For example, paste0('data',year,'_mod') returns a character vector of length 1, i.e., "data1999_mod", not the object data1999_mod.
To make this concrete, there is a huge difference between "data1999_mod"["Estimate"] and data1999_mod["Estimate"]. Subsetting the output of paste0() gives the former, whereas the expected output comes only from the latter. That is why you are getting Error: $ operator is invalid for atomic vectors.
The same mistake is found in all of your code snippets. In order to fetch the object named by the output of paste0(), we need to wrap it in get(); likewise, to assign to a name built by paste0(), we need assign() rather than <-.
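A small sketch of the distinction, using a toy named vector as a stand-in for the real model object (an assumption, since the original data is not available):

```r
# Toy stand-in for a fitted model object (assumption: any named object behaves the same)
data1999_mod <- c(Estimate = 0.5)

year <- 1999
obj_name <- paste0("data", year, "_mod")

is.character(obj_name)        # TRUE: this is only the name, as a string
get(obj_name)[["Estimate"]]   # 0.5: get() looks up the object with that name

# assign() is the counterpart for writing to a name built from a string
assign(paste0("premium", year), get(obj_name) * 2)
premium1999[["Estimate"]]     # 1
```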
As you have not supplied a reproducible example, I couldn't test this. However, you can try running these.
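#(1) get() fetches the model by name; assign() creates the object named by the string
for (year in seq(1999,2008)) {
  assign(paste0('premium',year), as.data.frame(coef(summary(get(paste0('data',year,'_mod'))))[c(19:44), 1]))
}
#(2) fetch the data frame, add the column, then store it back under the same name
for (i in seq(1999,2008)) {
  tmp <- get(paste0('premium',i))
  tmp[['year']] <- i
  assign(paste0('premium',i), tmp)
}
#(3) same pattern: fetch, append the repeated rows, store back
for (year in seq(1999,2008)) {
  tmp <- get(paste0('premium',year))
  assign(paste0('premium',year), rbind(tmp, tmp[rep(1, 2),]))
}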
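A commonly suggested alternative to get() and assign() is to keep the per-year objects in a named list and process them with lapply(). The sketch below is only an illustration: the models are toy lm() fits on the built-in mtcars data standing in for data1999_mod and friends, and the column names `term` and `estimate` are made up.

```r
# Toy stand-ins for data1999_mod, data2000_mod, ... (assumption: real models are lm-like)
years  <- 1999:2001
models <- lapply(years, function(y) lm(mpg ~ wt + hp, data = mtcars))
names(models) <- paste0("data", years, "_mod")

premiums <- lapply(seq_along(years), function(k) {
  est <- coef(summary(models[[k]]))[, "Estimate"]   # step (1): extract estimates
  df  <- data.frame(term = names(est), estimate = unname(est))
  df$year <- years[k]                               # step (2): add the year column
  rbind(df, df[rep(1, 2), ])                        # step (3): repeat the first row twice
})
names(premiums) <- paste0("premium", years)

premiums[["premium1999"]]
```

With this layout, all three steps run in one pass and no object names have to be built from strings.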
I have two lists, each list contains two vectors i.e,
x <- list(c(1,2),c(3,4))
y <- list(c(2,4),c(5,6))
z <- list(c(0,0),c(1,1), c(2,3),c(4,5))
I would like to use for loop to iterate over the first list and if statement for the second list as follows:
for (j in 1:seq(x)){
if(y[[j]] == c(2,4))
z[[j]] <- c(0,0)
}
I would like to iterate over the first list and for each iteration I would like to give a condition for the second list. My function is complex, so I upload this example which is similar to what I am trying to do with my original function. So that is, I would like to choose the values of z based on the values of y. For x I just want to run the code based on the length of x.
When I run it, I got this message:
Warning messages:
1: In 1:seq(x) : numerical expression has 2 elements: only the first used
2: In if (y[[j]] == c(2, 4)) y[[j]] <- c(0, 0) :
the condition has length > 1 and only the first element will be used
I searched this website and saw a similar question, but it is not helpful (if loop inside a for loop which iterates over a list in R?). That question only covers the first part of my question, so it does not help me with my problem.
Any help, please?
The first warning is caused by combining seq(), which returns [1] 1 2 here, with the colon operator, which creates a sequence from its left-hand side to its right-hand side. Both operands of the colon must have length 1; otherwise only the first element is used and the rest are discarded. So 1:seq(x) is the same as writing 1:1.
The second warning is that the if statement gets 2 logical values from your condition:
y[[1]] == c(2, 4)
[1] TRUE TRUE
If you want to test whether the elements of the vectors are equal one by one, you can use your notation. If you want to test whether the vectors as a whole are the same, you can use all.equal.
isTRUE(all.equal(y[[1]], c(2,4)))
[1] TRUE
It returns TRUE if the vectors are equal (but not FALSE if they are not; it returns a description of the differences instead, which is why it needs to be used along with isTRUE()).
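A short sketch of how these comparisons differ. The variable names are made up for illustration:

```r
a <- c(2, 4)
b <- c(2, 4)

elementwise <- a == b            # TRUE TRUE: one result per element
all_same    <- all(a == b)       # TRUE: collapsed to a single value

# all.equal() compares with a small numeric tolerance; identical() does not
nearly  <- isTRUE(all.equal(a, b + 1e-10))   # TRUE
exactly <- identical(a, b + 1e-10)           # FALSE
```

Any of the single-value forms is safe to use as an if() condition; the elementwise form is not.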
To get rid of the warnings, you can do:
for (j in seq_along(x)){
if (isTRUE(all.equal(y[[j]], c(2,4)))) {
z[[j]] <- c(0,0)
}
}
Note: seq_along(x) is a fast primitive that returns the same sequence as seq(x) here, and it returns an empty sequence when x is empty.
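To illustrate why seq_along() is usually the safer choice, here is a small sketch; `empty` is a made-up example object:

```r
x <- list(c(1, 2), c(3, 4))
seq(x)            # 1 2
seq_along(x)      # 1 2

empty <- list()
1:length(empty)   # 1 0, so a for loop over it would run twice
seq_along(empty)  # integer(0), so a for loop over it would not run at all
```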
For the first part, seq(x) returns [1] 1 2, so you need to use j in seq(x) or j in 1:length(x) rather than 1:seq(x).
For the second part, the comparison you used generates one TRUE or FALSE per element of the vectors, so you can use setequal(x, y) instead. It returns a single TRUE or FALSE indicating whether the two vectors contain the same set of values. Note that it ignores order and duplicates, so setequal(c(2,4), c(4,2)) is also TRUE.
The final code can be:
for (j in 1:length(x)){
if (setequal(y[[j]], c(2,4)) == TRUE) {
z[[j]] <- c(0,0)
}
}
or:
for (j in seq(x)){
if (setequal(y[[j]], c(2,4)) == TRUE) {
z[[j]] <- c(0,0)
}
}
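One caveat with setequal() is that it treats its arguments as sets. A brief sketch of what that means in practice (variable names are made up):

```r
same_order     <- setequal(c(2, 4), c(2, 4))      # TRUE
reversed       <- setequal(c(2, 4), c(4, 2))      # TRUE: order is ignored
with_duplicate <- setequal(c(2, 4), c(2, 2, 4))   # TRUE: duplicates are ignored

# identical() is stricter: same values, same order, same length
strict <- identical(c(2, 4), c(4, 2))             # FALSE
```

If the order of the values matters for the condition, identical() or isTRUE(all.equal()) may be a better fit than setequal().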
I have a function where I'm trying to compare a data frame column to a reference table of type character. I have downloaded some data from the Norwegian central statistics office with popular first names. I want to add a column to my data frame which is basically a 1 or a 0 depending on which list the name appears in (1 being a boy, 0 being a girl). I'm getting the following error with the code:
*Error in match(x, table, nomatch = 0L) : object 'x' not found*
Data frame is train.
Reference data is male_names
male_names <- read.csv("~/R/Functions_Practice/NO/BoysNames_Data.csv", sep=";",as.is = TRUE)[ ,1]
get.sex <- function(x, ref)
for (i in ref)
{
if(x %in% ref)
{return (1)}
}
# set default for column
train$sex <- 2
# Update column if it appears in the names list
train$sex <- sapply(train$sex, FUN=get.sex(x,male_names))
I would then use the function to run the second (girls' names) file against the table and set the flag for each record to zero where a match occurs.
Can anyone help?
When using sapply, you don't write arguments directly in the FUN parameter.
train$sex <- sapply(train$sex, FUN=get.sex,ref = male_names)
It is implied that train$sex is the x argument, and all other parameters are passed after that (in this case, it's just ref) and are explicitly defined.
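A self-contained sketch of this calling convention. The helper `in_ref` and the sample names are hypothetical; only the (x, ref) signature mirrors get.sex:

```r
# Hypothetical helper with the same (x, ref) signature as get.sex
in_ref <- function(x, ref) as.integer(x %in% ref)

first_names <- c("Ola", "Kari", "Per")   # made-up sample values
male_names  <- c("Ola", "Per", "Lars")

# Pass the function itself; extra arguments such as ref go after it by name
flags <- sapply(first_names, in_ref, ref = male_names, USE.NAMES = FALSE)
flags   # 1 0 1
```

Writing FUN = in_ref(x, male_names) instead would call the function immediately, before sapply() runs, which is where the "object 'x' not found" error comes from.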
Edit:
As joran noted, in this case sapply isn't particularly useful, and you can get the result in one line:
train$sex = (train$sex %in% male_names)*1
%in% can be used when the argument on the left is a vector, so you don't have to loop over it. Multiplying the result by one converts logical (boolean) values into integers. 1*TRUE yields 1, and 1*FALSE yields 0.
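Extending the same idea to both name files, a possible sketch looks like this. It assumes the data frame has a `name` column holding first names (the question does not say what the column is called), and the sample values are made up:

```r
# Assumption: first names live in a column called 'name'
train <- data.frame(name = c("Ola", "Kari", "Per", "Anne"),
                    stringsAsFactors = FALSE)
male_names   <- c("Ola", "Per")   # made-up reference lists
female_names <- c("Kari")

train$sex <- 2                                    # default: unknown
train$sex[train$name %in% male_names]   <- 1      # boys
train$sex[train$name %in% female_names] <- 0      # girls
train$sex   # 1 0 1 2
```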
I am using lapply to select elements from vectors in a list, but not all vectors in the list include the same number of elements. I typically use:
lapply(some.list,"[[",n)
where n is the index of the element in the vectors I am trying to parse out. However, this time my list looks more like this:
some.vect <- c("aaa_elem1","aa_elem2","elem3","bb_elem4","ccc_elem5","abc_elem6")
some.list <- strsplit(some.vect,"_")
When I use my normal lapply method:
lapply(some.list,"[[",2)
I get the following error: Error in FUN(X[[3L]], ...) : subscript out of bounds as expected, because not all vectors in the list have two elements. What I would like is a way to declare the index in lapply as the length of the vector.
I also tried defining a vector of the list vector lengths, and assigning it to index:
vect.length <- unlist(lapply(some.list,length))
lapply(some.list,"[[",vect.length)
(Error in FUN(X[[1L]], ...) : attempt to select more than one element)
and not assigning an index at all:
lapply(some.list,"[[")
(Error in FUN(X[[1L]], ...) : no index specified)
Is there a way to select all of the last elements of vectors in a list?
Use tail...
lapply(some.list, tail , 1 )
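If a plain character vector is preferred over a list, a variant with vapply() may be convenient. This is a sketch using the data from the question; `last_elems` is a made-up name:

```r
some.vect <- c("aaa_elem1", "aa_elem2", "elem3", "bb_elem4", "ccc_elem5", "abc_elem6")
some.list <- strsplit(some.vect, "_")

# Index each vector by its own length to pick the last element
last_elems <- vapply(some.list, function(v) v[length(v)], character(1))
last_elems   # "elem1" "elem2" "elem3" "elem4" "elem5" "elem6"
```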
I'd like to know the reason why the following does not work on the matrix structure I have posted here (I've used the dput command).
When I try running:
apply(mymatrix, 2, sum)
I get:
Error in FUN(newX[, i], ...) : invalid 'type' (list) of argument
However, when I check to make sure it's a matrix I get the following:
is.matrix(mymatrix)
[1] TRUE
I realize that I can get around this problem by unlisting the data into a temp variable and then just recreating the matrix, but I'm curious why this is happening.
?is.matrix says:
'is.matrix' returns 'TRUE' if 'x' is a vector and has a '"dim"'
attribute of length 2 and 'FALSE' otherwise.
Your object is a list with a dim attribute. A list is a type of vector (even though it is not an atomic type, which is what most people think of as vectors), so is.matrix returns TRUE. For example:
> l <- as.list(1:10)
> dim(l) <- c(10,1)
> is.matrix(l)
[1] TRUE
To convert mymatrix to an atomic matrix, you need to do something like this:
mymatrix2 <- unlist(mymatrix, use.names=FALSE)
dim(mymatrix2) <- dim(mymatrix)
# now your apply call will work
apply(mymatrix2, 2, sum)
# but you should really use (if you're really just summing columns)
colSums(mymatrix2)
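Since the original mymatrix is not shown here, the following sketch builds a small list-matrix from scratch to reproduce the situation. The object names `l` and `m` are placeholders:

```r
# Build a list that carries a dim attribute: a "list-matrix"
l <- as.list(1:6)
dim(l) <- c(3, 2)

is.matrix(l)   # TRUE, because it is a vector with a length-2 dim attribute
is.list(l)     # TRUE
typeof(l)      # "list", which is why sum() refuses it

# Flatten to an atomic vector and rebuild the matrix with the same shape
m <- matrix(unlist(l, use.names = FALSE), nrow = nrow(l))
typeof(m)      # "integer"
colSums(m)     # 6 15
```

Checking typeof() alongside is.matrix() is a quick way to tell an atomic matrix from a list with dimensions.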
The elements of your matrix are not numeric; they are lists. To see this you can do:
apply(m,2, class) # here m is your matrix
So if you want the column sums, you have to coerce the elements to numeric and then apply colSums, which is a shortcut for apply(x, 2, sum):
colSums(apply(m, 2, as.numeric)) # this will give you the sum you want.