R replace variable name in all dimensions of multidimensional list - r

I have a large, multidimensional list as a result of a statistic's project. The list holds different objects, holding objects by themselves. There are also plots, matrices etc. It's a heterogeneous mix of a lot of different types and different dimensionalities.
Now I have to change the name of one variable completely. Every occurence has to be overriden. Is there a way to do this?
Here is a little example. There's no use in solving this example explicitely, as my list is much larger.
a <- list(entry1=list("a","b","c","xx",p=c(3,4,"xx")),
entry2=list(matrix(c(1,2,"xx",4), nrow = 2),xx=list(6,7,8,"xx")),
xx=list(1,2,3,4,"xx"))
How can I change the xx to yy? Thanks in advance!

Related

Extracting Nested Elements of an R List Generated by Loops

For lists within lists produced by a loop in R (in this example a list of caret models) I get an object with an unpredictable length and names for inner elements, such as list[[1]][[n repeats of 1]][[2]] where the internal [[1]] is repeated multiple times according to the function's input. In some cases, the length of n is not known, when accessing some older stored lists where input was not saved. While there are ways to work within a list index, like with list[length(list)], there appears to be no way to do this with repeated nested elements. This has made accessing them and passing them to various jobs awkward. I assume there is an efficient way to access them that I have missed, so I'm asking for help to do so, with an example case given below.
The function I'm generating gives out a list from a function that creates several outputs. The final list returned for a function having a complicated output structure is produced by returning something like:
return(list(listOfModels, trainingData, testingData))
The listofModels has variable length, depending on input of models given, and potentially other conditions depend on evaluation inside the function. It is made by:
listOfModels <- list(c(listOfModels, list(trainedModel)))
Where the "trainedModel" refers to the most recently trained model generated in the loop. The models used and the number of them may vary each time depending on choice. An unfortunate result is a complicated nested lists within a list.
That is, output[[1]] contains the models I want to access more efficiently, which are themselves list objects, while output[[2]] and output[[3]] are the dataframes used to train and evaluate the models. While accessing the dataframes is simple and has a defined, reproducible structure each time (simply being output[[2]], output[[3]] every time), output[[1]] becomes a mess. E.g., something like the following follows the "output[[1]]":
The only thing I am able to attempt in order to access this is using the fact that [[1]] is attached upon output[[1]] before [[2]]. All of the nested elements except one have a [[2]] at the end. Given the above pattern, there is an ugly solution that works, but is not a desirable format to work with. E.g., after evaluating n models given by a vector of strings called inputList, and a list given as output of the function, "output", I can have [[1]] repeated tens to hundreds of times.
for (i in (1:length(inputList)-1)){
eval(rlang::parse_expr(paste0(c("output", c(rep("[[1]]", 1+i)), "[[2]]" ) , collapse="")) )
}
This could be used to use all models for some downstream task like making predictions on new data, or whatever. In cases where the length of the inputList was not known, this could be found out by attempting to repeat this until finding an error, or something similar. This approach can be modified to call on a specific part of the list, for example, a certain model within inputList, if I know the original list input and can find the number for that model. Besides the bulkiness code working this way, compared to some way where I could just call on output[[1]][[n]] using some predictable format for various length n. One of the big problems is when accessing older runs that have been saved where the input list of models was not saved, leaving the length of n unknown. I don't know of any way of using something like length() or lengths() to count how many nested elements exist within a list. (For my example, output[[1]] is of length 1, no matter how many [[1]] repeat elements there are.)
I believe the simplest solution is to change the way the list is saved by the function, so that I can access it by a systematic reference, however, I have a bunch of old lists which I still want to access and perform some work with, and I'd also like to be able to have better control of working with lists in any case. So any help would be greatly appreciated.
I expected there would be some way to query the structure of nested R lists, which could be used to pass nested elements to separate functions, without having to use very long repetition of brackets.

Set range data frame in R

I have a large number (17000) of 96x97 matrices with names b1,b2...b17000. I need to combine them into an array 96x97x17000. I am trying to do this through an array function:
s=array(b1,b2...b17000,dim=c(96,97,17000))
The problem is that for this function to work, you need to write down the name of all the matrices. How can this be done without writing down the name of the matrices 17,000 times?
I tried to set this as a range b1:b17000 but it does not work correctly.
Based on your original question the below code should work for you.
names<-c(paste0("b",1:17000))
s<-array(unlist(mget(names)),dim=c(96,97,17000))
This will create a vector called names containing your the names of your matrices (assuming they're actually called m1, m2,... m17000). For example: names[1] will be b1 and names[2] will be b2, so on and so forth.
You can then use names to reference your matrices array as shown in my suggested code.
I hope this helps!

Subsetting list containing multiple classes by same index/vector

I'm needing to subset a list which contains an array as well as a factor variable. Essentially if you imagine each component of the array is relative to a single individual which is then associated to a two factor variable (treatment).
list(array=array(rnorm(2,4,1),c(5,5,10)), treatment= rep(c(1,2),5))
Typically when sub-setting multiple components of the array from the first component of the list I would use something like
list$array[,,c(2,4,6)]
this would return the array components in location 2,4 and 6. However, for the factor component of the list this wouldn't work as subsetting is different, what you would need is this:
list$treatment[c(2,4,6)]
Need to subset a list with containing different classes (array and vector) by the same relative number.
You're treating your list of matrices as some kind of 3-dimensional object, but it's not.
Your list$matrices is of itself a list as well, which means you can index at as a list as well, it doesn't matter if it is a list of matrices, numerics, plot-objects, or whatever.
The data you provided as an example can just be indexed at one level, so list$matrices[c(2,4,6)] works fine.
And I don't really get your question about saving the indices in a numeric vector, what's to stop you from this code?
indices <- c(2,4,6)
mysubset <- list(list$matrices[indices], list$treatment[indices])
EDIT, adding new info for edited question:
I see you actually have an 3-D array now. Which is kind of weird, as there is no clear convention of what can be seen as "components". I mean, from your question I understand that list$array[,,n] refers to the n-th individual, but from a pure code-point of view there is no reason why something like list$array[n,,] couldn't refer to that.
Maybe you got the idea from other languages, but this is not really R-ish, your earlier example with a list of matrices made more sense to me. And I think the most logical would have been a data.frame with columns matrix and treatment (which is conceptually close to a list with a vector and a list of matrices, but it's clearer to others what you have).
But anyway, what is your desired output?
If it's just subsetting: with this structure, as there are no constraints on what could have been the content, you just have to tell R exactly what you want. There is no one operator that takes a subset of a vector and the 3rd index of an array at the same time. You're going to have to tell R that you want 3rd index to use for subsetting, and that you want to use the same index for subsetting a vector. Which is basically just the code you already have:
idx <- c(2,4,6)
output <- list(list$array[,,idx], list$treatment[idx])
The way that you use for subsetting multiple matrices actually gives an error since you are giving extra dimension although you already specify which sublist you are in. Hence in order to subset matrices for the given indices you can usemy_list[[1]][indices] or directly my_list$matrices[indices]. It is the same for the case treatement my_list[[2]][indices] or my_list$treatement[indices]

Naming data frames in lists using a sequence

I have a rather simple question. So I have a list, and I want to name the data frames in the list according to a sequence. Right now I have a sequence that increase according to one letter per list (explained below):
nm1 <- paste0("Results_Comparison_",LETTERS[seq_along(Model_comparisons)])
This creates "Results_Comparison_A", "Results_Comparison_B", "Results_Comparison_C", "Results_Comparison_D", etc. What I want is it for it to be a number instead of a letter. (i.e. Results_Comparison_1, Results_Comparison_2, Results_Comparison_3, etc.) Does anyone know how I could change this? If extra information is needed let me know!
This should work paste0("Results_Comparison_",seq_along(Model_comparisons))

R: Share factor levels between different list items?

I am working with a dataframe, where one of the columns is a multivalued variable, which I've implemented as a list as column.
Here's a reproducible example:
df <- data.frame(title=c('one','two','three'), subjects=I(list(c('A'), c('A','B','C','D'), c('B','D','E'))))
The general idea being that I can attach as many subjects as I'd like without using too much space.
Now the set of possible subjects isn't that large, so if it were a simple column, I'd turn it into a factor. But if I do that here, R stores the levels attribute separately for each list item, (i.e. each row), once again needing a huge amount of storage.
Does anyone know of a way to store a list of factors, with the levels of these factors as a shared attribute?
The only thing I could think of was to do it myself, store the values as integers and create a separate lookup-table, but that doesn't look very efficient.

Resources