How to select all elements of a list at a particular nested Level? - r

Or put it differently: How can I use the [[ operator in a nested list?
You can consider this as a follow-up question on this one, when I asked how to determine the depth level of a list. I got some decent answers from #Spacedman and #flodel who both suggested recursive functions. Both solutions where quite similar and worked for me.
However I haven't figured out yet what to do with the information I get from these functions. Let's say I have a list nested at level i and I want to get back a list that contains all i-th level elements, like this:
myList$firstLevel$secondLevel$thirdLevel$fourthLevel
# fourthLevel contains 5 data.frames and thirdLevel has
# three elements
How can I get back all 15 data.frames from mylist?
I am trying to use e.g.
lapply(mylist,"[[",2)
but obviously I just get the second element of all list elements at the first level.
EDIT: I found the following in the help of extract respectivel ?"[[" but can't really wrap my head around it so far:
"[[ can be applied recursively to lists, so that if the single index i is a vector of length p, alist[[i]] is equivalent to alist[[i1]]...[[ip]] providing all but the final indexing results in a list."
EDIT:
Don't want to end up nesting loops like this.
o <- list()
i=1
for (i in 1:2){
o[[i]] <- mylist[[c(i,1,1,1)]]
}

I found the answer in the meantime. Can't say say I did it on my own. This
link gives an elaborate explanation how to use another (complex) recursive function to linearize the whole nested list wad.
What's really nice about the solution provided by Akhil S. Behl: it deals with the fact that data.frames are lists too and recursing can stop before data.frames. It turned out that this was one of my major problems before.

Related

Extracting Nested Elements of an R List Generated by Loops

For lists within lists produced by a loop in R (in this example a list of caret models) I get an object with an unpredictable length and names for inner elements, such as list[[1]][[n repeats of 1]][[2]] where the internal [[1]] is repeated multiple times according to the function's input. In some cases, the length of n is not known, when accessing some older stored lists where input was not saved. While there are ways to work within a list index, like with list[length(list)], there appears to be no way to do this with repeated nested elements. This has made accessing them and passing them to various jobs awkward. I assume there is an efficient way to access them that I have missed, so I'm asking for help to do so, with an example case given below.
The function I'm generating gives out a list from a function that creates several outputs. The final list returned for a function having a complicated output structure is produced by returning something like:
return(list(listOfModels, trainingData, testingData))
The listofModels has variable length, depending on input of models given, and potentially other conditions depend on evaluation inside the function. It is made by:
listOfModels <- list(c(listOfModels, list(trainedModel)))
Where the "trainedModel" refers to the most recently trained model generated in the loop. The models used and the number of them may vary each time depending on choice. An unfortunate result is a complicated nested lists within a list.
That is, output[[1]] contains the models I want to access more efficiently, which are themselves list objects, while output[[2]] and output[[3]] are the dataframes used to train and evaluate the models. While accessing the dataframes is simple and has a defined, reproducible structure each time (simply being output[[2]], output[[3]] every time), output[[1]] becomes a mess. E.g., something like the following follows the "output[[1]]":
The only thing I am able to attempt in order to access this is using the fact that [[1]] is attached upon output[[1]] before [[2]]. All of the nested elements except one have a [[2]] at the end. Given the above pattern, there is an ugly solution that works, but is not a desirable format to work with. E.g., after evaluating n models given by a vector of strings called inputList, and a list given as output of the function, "output", I can have [[1]] repeated tens to hundreds of times.
for (i in (1:length(inputList)-1)){
eval(rlang::parse_expr(paste0(c("output", c(rep("[[1]]", 1+i)), "[[2]]" ) , collapse="")) )
}
This could be used to use all models for some downstream task like making predictions on new data, or whatever. In cases where the length of the inputList was not known, this could be found out by attempting to repeat this until finding an error, or something similar. This approach can be modified to call on a specific part of the list, for example, a certain model within inputList, if I know the original list input and can find the number for that model. Besides the bulkiness code working this way, compared to some way where I could just call on output[[1]][[n]] using some predictable format for various length n. One of the big problems is when accessing older runs that have been saved where the input list of models was not saved, leaving the length of n unknown. I don't know of any way of using something like length() or lengths() to count how many nested elements exist within a list. (For my example, output[[1]] is of length 1, no matter how many [[1]] repeat elements there are.)
I believe the simplest solution is to change the way the list is saved by the function, so that I can access it by a systematic reference, however, I have a bunch of old lists which I still want to access and perform some work with, and I'd also like to be able to have better control of working with lists in any case. So any help would be greatly appreciated.
I expected there would be some way to query the structure of nested R lists, which could be used to pass nested elements to separate functions, without having to use very long repetition of brackets.

Separating massive character element in a list as its own list (R)

I'm currently working with a list of adjectives from rcorpora that looks like this when I call corpora("words/adjs"):
$description
[1] "A list of English adjectives."
$adjs
[1] "Aristotelian" "Arthurian" "Bohemian" "Brethren" "Mosaic"
[6] "Oceanic" "Proctor" "Terran" "Tudor" "abroad"
There are 1000 words in $adjs but for brevity I only put the first 10. I want to individually save all of the adjectives as their own element in a new list, because the way it is now $adjs is the second element in list words/adjs and a massive character, according to R when I do class(adjectives$adjs). My goal is to be able to randomly go through this new list of adjectives, pick one out, and then use that in my code, but I'm having trouble creating this list of separate, discrete elements. Any help would be appreciated.
Try this:
my_adj <- corpora("words/adjs")$adjs
The corpora package looks like it gives you a nested list (first list is description second list is the adjectives). Nested lists are tough to work with so I like to pull objects out by name or index.
If you need to know if you're working with a nested list use the str(...) function, like this:
str(corpora("words/adjs"))
This will give you a good summary of your R object and then you can dig in and pull out what you need. str(...) is a great function for all objects, not just lists!

How to automatize listing many elements within a command line in R? [duplicate]

Currently, i have multiple dataframes with the same name and in running order (foo1, foo2, foo3, foo4, foo5... etc). I am trying to create a large dataframe containing all the rows of the above dataframes with rbind(). Is there an elegant way to do it which would be the equivalent of rbind(foo1, foo2, foo3, foo4, foo5...)?
I have tried do.call(rbind, paste0("foo",i)) where i=c(1,2,3...) to no avail.
There is a solution mentioned here, which is:
do.matrix <- do.call(rbind, lapply( paste0("variable", 1:10) , get) )
However, the answer mysteriously says "That is the wrong way to handle related items. Better to use a list or dataframe, but you will probably find out why in due course."
Why would that be the wrong way to do this, and what would be the "right" way?
Thanks.
Always try to rigorously capture relations between related instances of data, or related data and methods, or related methods. This generally helps ease aggregate manipulation such as your rbind requirement.
For your case, you should have defined your related data.frames as a single list from the beginning:
foo <- list(data.frame(...), data.frame(...), ... );
And then your requirement could be satisfied thusly:
do.call(rbind, foo );
If it's too late for that, then the solution involving repeated calls to get(), as described in the article to which you linked, can do the job.

multiplying two lists of matrices in continuos form

I have two lists of matrices and I want to multiply the first element of the first list with the first element of the second list and so on, without writing every operatios due to may be a large number of elements on each list (both lists have the same length)
this is what I mean
'(colSums(R1*t(M1))),(colSums(R2*t(M2))),...(colSums(Rn*t(Mn)))'
Do I need to create an extra list?
Although first I must be able to transpose the matrices of one of the lists before multiplying them. The results will be used for easier operations.
I already tried to use indexes and loops and doesn't work,
first tried to transpose matrices in one list like this (M is one of the lists and the other is named R, M contains M1,M2,..Mn and the same for list R)
The complete operation looks like this:
'for (i in 1:length(M)){Mt<-list(t(M[[i]]))}'
and only applies it to the last element.
The full operation looks like this:
'(cbind((colSums(R1*t(M1))),(colSums(R2*t(M2))),...(colSums(Rn*t(Mn))))'
any step of these will be useful
you could use the rlist package.
The function
list.apply(.data, .fun, ...)
will apply a function to each list element.
You can find documentation at [https://cran.r-project.org/web/packages/rlist/rlist.pdf][1].

Subsetting list containing multiple classes by same index/vector

I'm needing to subset a list which contains an array as well as a factor variable. Essentially if you imagine each component of the array is relative to a single individual which is then associated to a two factor variable (treatment).
list(array=array(rnorm(2,4,1),c(5,5,10)), treatment= rep(c(1,2),5))
Typically when sub-setting multiple components of the array from the first component of the list I would use something like
list$array[,,c(2,4,6)]
this would return the array components in location 2,4 and 6. However, for the factor component of the list this wouldn't work as subsetting is different, what you would need is this:
list$treatment[c(2,4,6)]
Need to subset a list with containing different classes (array and vector) by the same relative number.
You're treating your list of matrices as some kind of 3-dimensional object, but it's not.
Your list$matrices is of itself a list as well, which means you can index at as a list as well, it doesn't matter if it is a list of matrices, numerics, plot-objects, or whatever.
The data you provided as an example can just be indexed at one level, so list$matrices[c(2,4,6)] works fine.
And I don't really get your question about saving the indices in a numeric vector, what's to stop you from this code?
indices <- c(2,4,6)
mysubset <- list(list$matrices[indices], list$treatment[indices])
EDIT, adding new info for edited question:
I see you actually have an 3-D array now. Which is kind of weird, as there is no clear convention of what can be seen as "components". I mean, from your question I understand that list$array[,,n] refers to the n-th individual, but from a pure code-point of view there is no reason why something like list$array[n,,] couldn't refer to that.
Maybe you got the idea from other languages, but this is not really R-ish, your earlier example with a list of matrices made more sense to me. And I think the most logical would have been a data.frame with columns matrix and treatment (which is conceptually close to a list with a vector and a list of matrices, but it's clearer to others what you have).
But anyway, what is your desired output?
If it's just subsetting: with this structure, as there are no constraints on what could have been the content, you just have to tell R exactly what you want. There is no one operator that takes a subset of a vector and the 3rd index of an array at the same time. You're going to have to tell R that you want 3rd index to use for subsetting, and that you want to use the same index for subsetting a vector. Which is basically just the code you already have:
idx <- c(2,4,6)
output <- list(list$array[,,idx], list$treatment[idx])
The way that you use for subsetting multiple matrices actually gives an error since you are giving extra dimension although you already specify which sublist you are in. Hence in order to subset matrices for the given indices you can usemy_list[[1]][indices] or directly my_list$matrices[indices]. It is the same for the case treatement my_list[[2]][indices] or my_list$treatement[indices]

Categories

Resources