R - Multi-level list indexing

What is the convention for assigning an object to a multi-level list?
So far I thought the convention [1, 2] for indexing was to use [[]] instead of $.
Hence, when saving results in loops, I usually used the following approach:
> result <- matrix(2,2,2)
> result_list <- list()
> result_list[["A"]][["B"]][["C"]] <- result
> print(result_list)
$A
$A$B
$A$B$C
[,1] [,2]
[1,] 2 2
[2,] 2 2
This works as intended with a matrix.
But when assigning a single number, the list seems to skip the last level.
> result <- 2
> result_list <- list()
> result_list[["A"]][["B"]][["C"]] <- result
> print(result_list)
$A
B
2
At the same time, if I use $ instead of [[]], the list is again as intended:
> result_list <- list()
> result_list$A$B$C <- result
> print(result_list)
$A
$A$B
$A$B$C
[1] 2
As mentioned here you can also use list("A" = list("B" = list("C" = 2))).
Which of these methods should be used for indexing a multi-level list in R?

Although the title of the question refers to multi-level list indexing, and the syntax mylist[['a']][['b']][['c']] is the same one would use to retrieve an element of a multi-level list, the differences you're observing actually arise from using that same syntax to create (or not create) multi-level lists.
To show this, we can first explicitly create the multi-level (nested) lists, and then check that the indexing works as expected both for matrices and for single numbers.
mymatrix=matrix(1:4,nrow=2)
list_b=list(c=mymatrix)
list_a=list(b=list_b)
mynestedlist1=list(a=list_a)
str( mynestedlist1 )
# List of 1
# $ a:List of 1
# ..$ b:List of 1
# .. ..$ c: int [1:2, 1:2] 1 2 3 4
mynumber=2
list_e=list(f=mynumber)
list_d=list(e=list_e)
mynestedlist2=list(d=list_d)
str( mynestedlist2 )
# List of 1
# $ d:List of 1
# ..$ e:List of 1
# .. ..$ f: num 2
(Note that I've created the lists in sequential steps for clarity; they could all have been rolled together into a single line, like: mynestedlist2=list(d=list(e=list(f=mynumber))) )
Now we can check that indexing works correctly:
str(mynestedlist1[['a']][['b']][['c']])
# int [1:2, 1:2] 1 2 3 4
str(mynestedlist1$a$b$c)
# int [1:2, 1:2] 1 2 3 4
str(mynestedlist2[['d']][['e']][['f']])
# num 2
str(mynestedlist2$d$e$f)
# num 2
# and, just to check that we don't 'skip the last level':
str(mynestedlist2[['d']][['e']])
# List of 1
# $ f: num 2
So the direct answer to the question 'which of these methods should be used for indexing a multi-level list in R' is: 'any of them - they're all ok'.
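To make the "they're all ok" point concrete, here is a small check, assuming the same nested list as above. The chained [[ ]] form, the $ form, and R's recursive indexing (a character vector of names inside a single [[ ]], described in ?Extract) should all return the same object:

```r
# Rebuild the nested list from above in one line
mymatrix <- matrix(1:4, nrow = 2)
mynestedlist1 <- list(a = list(b = list(c = mymatrix)))

m1 <- mynestedlist1[['a']][['b']][['c']]   # chained [[ ]]
m2 <- mynestedlist1$a$b$c                  # chained $
m3 <- mynestedlist1[[c('a', 'b', 'c')]]    # recursive indexing with a path vector

identical(m1, m2) && identical(m2, m3)     # TRUE
```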
So what's going on with the examples in the question, then?
Here, the same syntax is being used to try to implicitly create lists, and since the structure of the nested list is not specified explicitly, this relies on whether R can infer the structure that you want.
In the first and third examples, there's no ambiguity, but each for a different reason:
First example:
mynestedlist1=list()
mynestedlist1[['a']][['b']][['c']]=mymatrix
We've specified that mynestedlist1 is a list. But its elements could be any kind of object, until we assign them. In this case, we put into the element named 'a' an object with an element 'b' that contains an object with an element 'c' that is a matrix. Since there's no R object that can contain a matrix in a single element except a list, the only way to achieve this assignment is by creating a nested list.
Third example:
mynestedlist3=list()
mynestedlist3$g$h$i=mynumber
In this case, we've used the $ notation, which only applies to lists (or to data types that are similar/equivalent to lists, like dataframes). So, again, the only way to follow the instructions of this assignment is by creating a nested list.
Finally, the pesky second example, but starting with a simpler variant of it:
mylist2=list()
mylist2[['c']][['d']]=mynumber
Here there's an ambiguity. We've specified that mylist2 is a list, and we've put into the element named 'c' an object with an element 'd' that contains a single number. This element could have been a list, but it can also be a simple vector, and in this case R chooses this as the simpler option:
str(mylist2)
# List of 1
# $ c: Named num 2
# ..- attr(*, "names")= chr "d"
Contrast this with the behaviour when trying to assign a matrix using exactly the same syntax: in that case, the only way to follow the syntax is to create another, nested, list inside the first one.
What about the full second example mylist2[['c']][['d']][['e']]=mynumber, where we try to assign a number named 'e' to the just-created but still-empty object 'd'?
This seems rather unclear, and this may be the reason for the different behaviours of different versions of R (as reported in the comments to the question). In the question, the action taken by R has been to assign the number while dropping its name, similarly to:
myvec=vector(); myvec2=vector()
myvec[['a']]=1
myvec2[['b']]=2
myvec[['a']]=myvec2
str(myvec)
# Named num 2
# - attr(*, "names")= chr "a"
However, the syntax alone doesn't seem to force this behaviour, so it would be sensible to avoid relying on this behaviour when trying to create nested lists, or lists of vectors.
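If the goal is to fill a nested list inside a loop without depending on that inference, one option (a sketch, not the only way) is to create each intermediate level explicitly as a list before assigning the leaf, or to assign the whole sub-list in one step. Either way, the single number should end up at the third level:

```r
result <- 2

# Option 1: create each intermediate level explicitly as an (empty) list
result_list <- list()
result_list[["A"]] <- list()
result_list[["A"]][["B"]] <- list()
result_list[["A"]][["B"]][["C"]] <- result

# Option 2: assign the whole nested structure at once
result_list2 <- list()
result_list2[["A"]] <- list(B = list(C = result))

str(result_list)
identical(result_list, result_list2)   # TRUE
```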

Related

Extract all values from a vector of named numerics with the same name in R

I'm trying to handle a vector of named numerics for the first time in R. The vector itself is named p.values. It consists of p-values which are named after their corresponding variables. Through simulation I obtained a huge number of p-values that are always named like one of the five variables they correspond to. However, I'm interested in the p-values of only one variable and tried to extract them with p.values[["var_a"]], but that gives me only a single p-value for var_a. p.values$var_a is invalid, and as.numeric(p.values) or unname(p.values) obviously gives me all values without names. Any idea how I can get R to give me the 1/5 of the named numerics that are named var_a?
Short example:
p.values <- as.numeric(c(rep(1:5, each = 5)))
names(p.values) <- rep(letters[1:5], 5)
str(p.values)
Named num [1:25] 1 1 1 1 1 2 2 2 2 2 ...
- attr(*, "names")= chr [1:25] "a" "b" "c" "d" ...
I'd like to get R to show me all 5 numbers named "a".
Thanks for reading my first post here and I hope some more experienced R users know how to deal with named numerics and can help me with this issue.
You can subset p.values using [ with names(p.values) == "a" to show all values named a.
p.values[names(p.values) == "a"]
#a a a a a
#1 2 3 4 5
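As an aside, subsetting with a single name (p.values["a"] or p.values[["a"]]) returns only one element, the first one carrying that name, which is why it cannot pull out all of them. If you need the values for every name at once, split() groups a vector by a factor. A sketch using the example data from the question:

```r
p.values <- as.numeric(rep(1:5, each = 5))
names(p.values) <- rep(letters[1:5], 5)

p.values[["a"]]          # only the first element named "a"

# Group all values by their name: a list with one element per distinct name
by_name <- split(p.values, names(p.values))
by_name$a                # all five values named "a"

# Several names at once with %in%
p.values[names(p.values) %in% c("a", "b")]
```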

Build column of data frame with character vectors of different length?

I want to create a data frame in R.
To make an easy 2x2 example of my problem:
Assume the first column is a simple vector:
first <- c(1:2)
The second column is for every row a character vector (but of different length), for example:
c('A') for the first row and c('B','C') for the second.
How can I build this data frame?
If you want to store vectors of different sizes in each row of a certain column, you will need to use a list. The problem is that (from ?data.frame)
If a list or data frame or matrix is passed to data.frame it is as if
each component or column had been passed as a separate argument
Thus you will need to wrap it in I() in order to protect your desired structure, e.g.
df <- data.frame(first = 1:2, Second = I(list("A", c("B", "C"))))
str(df)
# 'data.frame': 2 obs. of 2 variables:
# $ first : int 1 2
# $ Second:List of 2
# ..$ : chr "A"
# ..$ : chr "B" "C"
# ..- attr(*, "class")= chr "AsIs"
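An alternative that avoids I() is to create the data frame with the atomic column first and then assign the list column with $<-, which stores the list as a column as-is. A minimal sketch with the same 2x2 data; lengths() then reports how many strings each row holds:

```r
df <- data.frame(first = 1:2)
df$second <- list("A", c("B", "C"))   # list column, one character vector per row

str(df)
lengths(df$second)   # 1 2
df$second[[2]]       # "B" "C"
```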

save vector into matrix/data frame value

Is there a way, in R, to save a whole vector into one value of a matrix or data frame, without having to combine it into a single value first?
For example, if I had a vector
pk<-c(0.021477,0.021114,0.022794,0.014858,0.009690,0.003255,0.002715)
and a matrix
tst<-matrix(data=NA,nrow=4,ncol=4)
is there any way of saying, for example,
tst[1,1]<-pk
?
I know I could paste the vector together, but I'm wondering whether there's a way of avoiding this? It's a matter of efficiency as the actual matrix is 33427 x 33427, with each vector ~ 300 values long, and I need to run further analysis on each value in the matrix. I'm hoping to find a way to speed up the analysis.
You can certainly put a vector in each element of a matrix. Try something like
tst<-matrix(data=list(),nrow=4,ncol=4)
tst[[1,1]] <- pk #note double square brackets needed for assignment
It doesn't print 'nicely'
tst
[,1] [,2] [,3] [,4]
[1,] Numeric,7 NULL NULL NULL
[2,] NULL NULL NULL NULL
[3,] NULL NULL NULL NULL
[4,] NULL NULL NULL NULL
but elements can be extracted in the obvious ways
> tst[1,1]
[[1]]
[1] 0.021477 0.021114 0.022794 0.014858 0.009690 0.003255 0.002715
#note list
> tst[[1,1]]
[1] 0.021477 0.021114 0.022794 0.014858 0.009690 0.003255 0.002715
#original vector
If the vectors are of varying length, I can see two ways of dealing with it. Either pad every vector with NA to the maximum length and put everything in an array
maxLen <- length(pk) # use the length of the longest vector here
tst<- array(data=NA, dim=c(4,4,maxLen))
tst[1,1,1:length(pk)] <- pk
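A sketch of the padding approach with two vectors of different lengths, assuming NA never occurs inside the real vectors (so it can safely mark padding). A cell's vector is recovered by dropping the NA padding from its slice along the third dimension; maxLen = 10 is only an illustrative upper bound and get_cell() is a hypothetical helper. Keep in mind that at the question's full size (33427 x 33427 x ~300 doubles) a dense array like this would need roughly 2.7 TB of memory.

```r
pk <- c(0.021477, 0.021114, 0.022794, 0.014858, 0.009690, 0.003255, 0.002715)
maxLen <- 10                                  # illustrative: length of the longest vector

tst <- array(NA_real_, dim = c(4, 4, maxLen)) # NA marks unused (padding) slots
tst[1, 1, seq_along(pk)] <- pk                # 7 values, 3 slots of padding
tst[2, 3, 1:3] <- pk[1:3]                     # a shorter vector in another cell

# Hypothetical helper: recover the vector stored in cell [i, j] by dropping the padding
get_cell <- function(arr, i, j) {
  v <- arr[i, j, ]
  v[!is.na(v)]
}
identical(get_cell(tst, 1, 1), pk)            # TRUE
get_cell(tst, 2, 3)
```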
Alternatively, you can just create a list of the pks and generate a map that translates the 1-dimensional list index to the 2D matrix element it would have corresponded to.
Which of these is optimal will depend on the downstream analysis you wish to do. If there's 'inter-pk' communication (e.g. you use element 1 of the pk at [1,1] and element 1 of pk at [1,2], [2,1]...) then the array solution might be better. But if all computations are within an individual vector, the list may be a better way to go
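The list-plus-map alternative could be sketched like this, assuming the list is laid out in R's column-major order (the same order a matrix uses), so cell [i, j] of an n_row-row matrix maps to list position (j - 1) * n_row + i. Base R's arrayInd() performs the reverse translation; cell_index() is a hypothetical helper name:

```r
pk <- c(0.021477, 0.021114, 0.022794, 0.014858, 0.009690, 0.003255, 0.002715)
n_row <- 4
n_col <- 4

# One list slot per matrix cell, in column-major order
vec_list <- vector("list", n_row * n_col)

# Hypothetical helper: matrix cell [i, j] -> position in the list
cell_index <- function(i, j, n_row) (j - 1) * n_row + i

vec_list[[cell_index(1, 1, n_row)]] <- pk
vec_list[[cell_index(2, 3, n_row)]] <- pk[1:3]   # vectors may differ in length

# Reverse map: list position -> (row, col); position 10 is cell [2, 3]
arrayInd(10, .dim = c(n_row, n_col))

# Per-vector computations stay simple, e.g. the mean of every stored vector
vapply(vec_list, function(v) if (is.null(v)) NA_real_ else mean(v), numeric(1))
```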
You can also use the "as is" I function and make pk a 1 element list (that holds the vector):
pk <- c(0.021477,0.021114,0.022794,0.014858,0.009690,0.003255,0.002715)
dat <- data.frame(a = I(list(pk)))
str(dat)
## 'data.frame': 1 obs. of 1 variable:
## $ a:List of 1
## ..$ : num 0.02148 0.02111 0.02279 0.01486 0.00969 ...
## ..- attr(*, "class")= chr "AsIs"
dat[1,1]
## [[1]]
## [1] 0.021477 0.021114 0.022794 0.014858 0.009690 0.003255 0.002715

Data.frame with both characters and numerics in one column

I have a function I'm using in R that takes several parameters, each of which I want to pass either as a numeric (1) or as NULL (which I've been writing as a character). The default is NULL.
I want to apply the function using all possible combinations of parameters, so I used expand.grid to try and create a dataframe which stores these. However, I am running into problems with creating an object that contains both numerics and characters in one column.
This is what I've tried:
comb<-expand.grid(c("NULL",1),c("NULL",1),stringsAsFactors=FALSE), which returns:
comb
Var1 Var2
1 NULL NULL
2 1 NULL
3 NULL 1
4 1 1
with all entries characters:
class(comb[1,1])
[1] "character"
If I now try and insert a numeric into a specific spot, I still receive a character:
comb[2,1]<-as.numeric(1)
class(comb[2,1])
[1] "character"
I've also tried it using stringsAsFactors=TRUE, or using expand.grid(c(0,1),c(0,1)) and then switching out the 0 for NULL but always have the exact same problem: whenever I do this, I do not get a numeric 1.
Manually creating an object using cbind and then inserting the NULL as a character does not help either. I'd be grateful for a pointer, or a workaround for running the function with all possible combinations of parameters.
As you have been told, generally speaking columns of data frames need to be a single type. It's hard to solve your specific problem, because it is likely that the solution is not really "putting multiple types into a single column" but rather re-organizing your other unseen code to work within this restriction.
As I suggested, it probably will be better to use the built in NA value as expand.grid(c(NA,1),c(NA,1)) and then modify your function to use NA as an input. Or, of course, you could just use some "special" numeric value, like -1, or -99 or something.
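A minimal sketch of the NA suggestion: keep the grid numeric, and convert NA back to NULL only at the point where the function is called. my_fun() is a hypothetical stand-in for the asker's function (both parameters default to NULL), and na_to_null() is a hypothetical helper:

```r
# Hypothetical stand-in for the real function: both parameters default to NULL
my_fun <- function(p1 = NULL, p2 = NULL) {
  paste(if (is.null(p1)) "NULL" else p1,
        if (is.null(p2)) "NULL" else p2)
}

# Hypothetical helper: NA in the grid means "pass NULL"
na_to_null <- function(x) if (is.na(x)) NULL else x

comb <- expand.grid(Var1 = c(NA, 1), Var2 = c(NA, 1))
sapply(comb, class)      # both columns stay "numeric"

res <- mapply(function(a, b) my_fun(na_to_null(a), na_to_null(b)),
              comb$Var1, comb$Var2)
res
```

Because the conversion happens only inside the call, the grid itself never has to mix types.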
The related issue that I mentioned is that you really should avoid using the character string "NULL" to mean anything, since NULL is a special value in R, and confusion will ensue.
These sorts of strategies would all be preferable to mixing types, and using character strings of reserved words like NULL.
All that said, it technically is possible to get around this, but it is awkward, and not a good idea.
> d <- data.frame(x = 1:5)
> d$y <- list("a",1,2,3,"b")
> d
x y
1 1 a
2 2 1
3 3 2
4 4 3
5 5 b
> str(d)
'data.frame': 5 obs. of 2 variables:
$ x: int 1 2 3 4 5
$ y:List of 5
..$ : chr "a"
..$ : num 1
..$ : num 2
..$ : num 3
..$ : chr "b"
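Part of the awkwardness is that every operation on such a column has to go element by element. For instance, one way to separate the numbers from the strings in d$y above is vapply() with a type predicate:

```r
d <- data.frame(x = 1:5)
d$y <- list("a", 1, 2, 3, "b")

is_num <- vapply(d$y, is.numeric, logical(1))   # TRUE where the element is a number
unlist(d$y[is_num])                             # 1 2 3
unlist(d$y[!is_num])                            # "a" "b"
```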

Reading csv file, having numbers and strings in one column

I am importing a 3 column CSV file. The final column is a series of entries which are either an integer, or a string in quotation marks.
Here are a series of example entries:
1,4,"m"
1,5,20
1,6,"Canada"
1,7,4
1,8,5
When I import this using read.csv, these are all just turned into factors.
How can I set it up such that these are read as integers and strings?
Thank you!
This is not possible, since a given vector can only have a single mode (e.g. character, numeric, or logical).
However, you could split the vector into two separate vectors, one with numeric values and the second with character values:
vec <- c("m", 20, "Canada", 4, 5)
vnum <- as.numeric(vec)
vchar <- ifelse(is.na(vnum), vec, NA)
vnum
[1] NA 20 NA 4 5
vchar
[1] "m" NA "Canada" NA NA
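Applied to the file itself, the same idea might look like the sketch below: read the third column as character (colClasses is a standard read.csv/read.table argument), then derive a numeric column and a character column from it. The inline text= string stands in for the real file:

```r
csv_text <- '1,4,"m"
1,5,20
1,6,"Canada"
1,7,4
1,8,5'

dat <- read.csv(text = csv_text, header = FALSE,
                colClasses = c("integer", "integer", "character"))

# Numeric entries become numbers; everything else becomes NA
dat$V3_num <- suppressWarnings(as.numeric(dat$V3))
# Keep the original string only where the entry was not numeric
dat$V3_chr <- ifelse(is.na(dat$V3_num), dat$V3, NA)
str(dat)
```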
EDIT: Despite the OP's decision to accept this answer, @Andrie's answer is the preferred solution. My answer is meant only to inform about some odd features of data frames.
As others have pointed out, the short answer is that this isn't possible. data.frames are intended to contain columns of a single atomic type. @Andrie's suggestion is a good one, but just for kicks I thought I'd point out a way to shoehorn this type of data into a data.frame.
You can convert the offending column to a list (this code assumes you've set options(stringsAsFactors = FALSE)):
dat <- read.table(textConnection("1,4,'m'
1,5,20
1,6,'Canada'
1,7,4
1,8,5"),header = FALSE,sep = ",")
tmp <- as.list(as.numeric(dat$V3))
tmp[c(1,3)] <- dat$V3[c(1,3)]
dat$V3 <- tmp
str(dat)
'data.frame': 5 obs. of 3 variables:
$ V1: int 1 1 1 1 1
$ V2: int 4 5 6 7 8
$ V3:List of 5
..$ : chr "m"
..$ : num 20
..$ : chr "Canada"
..$ : num 4
..$ : num 5
Now, there are all sorts of reasons why this is a bad idea. For one, lots of code that you'd expect to play nicely with data.frames will not like this and either fail, or behave very strangely. But I thought I'd point it out as a curiosity.
No. A dataframe is a series of pasted-together vectors (a list of vectors or matrices). Because each column is a vector, it cannot be classified as both integer and factor. It must be one or the other. You could split the vector apart into numeric and factor (a column for each), but I don't believe this is what you want.
