R and double brackets for nested lists - r

There are MANY posts about indexing lists, but I still can't quite get my head around indexing methods for named and unnamed nested lists. Here's my example
person <- list("name"="John","age"=19,"speaks"=c("English","French"))
Johns_brother <- list("name"="Sam","age"=20,"speaks"=c("English","Spanish"))
Johns_sister <- list("name"="Minerva","age"=17,"speaks"=c("English","Italian"))
Johns_other_sister <- list("name"="Casandra","age"=23,"speaks"=c("English","Greek"))
person <- list("name"="John","age"=19,"speaks"=c("English","French"),"siblings"=list(Johns_brother,Johns_sister,Johns_other_sister))
Both of these indexing methods return lists
class(person$siblings[1])
class(person$siblings[[1]])
But only the second allows me to select named elements
person$siblings[1]$name
person$siblings[[1]]$name
Now I've seen posts that insist (all caps in the original) "A DOUBLE BRACKET WILL NEVER RETURN A LIST. RATHER A DOUBLE BRACKET WILL RETURN ONLY A SINGLE ELEMENT FROM THE LIST" But that's obviously not true since both indexing methods return lists. But the two forms of brackets are returning DIFFERENT lists, right? What is the underlying logic here?

Think about it. The [[ notation indexes the list element. But what if that element itself is a list?
list(a = list(b = 1))[[1]]
# $b
# [1] 1
In the above example, the return value is still a list because a is a list. The value returned depends on the value being indexed. The statement A DOUBLE BRACKET WILL NEVER RETURN A LIST is simply not true.
Help on this can be found in help(Extract) -
Indexing by [ is similar to atomic vectors and selects a list of the specified element(s).
Both [[ and $ select a single element of the list.
It also helps to know the difference between atomic and recursive (list-like) vectors.
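To make the difference concrete, here is a minimal sketch using a cut-down version of the list from the question (the names one_bracket and two_bracket are just for illustration). It shows that [ returns a list wrapped around the element, while [[ returns the element itself, which here happens to be a list too:

```r
# Smaller version of the nested list from the question
brother <- list(name = "Sam", age = 20)
person  <- list(name = "John", siblings = list(brother))

one_bracket <- person$siblings[1]    # a length-1 list that *contains* brother
two_bracket <- person$siblings[[1]]  # brother itself (also a list)

names(one_bracket)     # NULL: the wrapper list is unnamed, so $name finds nothing
names(two_bracket)     # "name" "age"

one_bracket$name       # NULL
two_bracket$name       # "Sam"
one_bracket[[1]]$name  # "Sam": unwrap first, then select by name
```

So both results have class "list", but they sit at different levels of nesting.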

Related

Filtering data, comma vs not comma

I have the following code
#abnormal return
exp.ret <- lm((RET-rf)~mkt.rf+smb+hml, data=tesla[tesla$period=="estimation.period",])
tesla$abn.ret <- (tesla$RET-tesla$rf)-predict(exp.ret,tesla)
#CAR during event window
CAR <- sum(tesla$abn.ret[tesla$period=="event.period",])
The first section runs fine, but the second gets this error:
Error in tesla$abn.ret[tesla$period == "event.period", ] :
incorrect number of dimensions
I know that the solution is to remove the last comma:
#CAR during event window
CAR <- sum(tesla$abn.ret[tesla$period=="event.period"])
I'm just wondering what the right way to understand this is: when I'm filtering for only part of the data frame, why do I need a trailing comma in some cases but not in others?
$, [[]] and [] have different meanings.
In short:
$ and [[]] extract a single column of a data frame or a single item of a list.
Extracting a column from a data frame gives a vector, while extracting an item from a list gives an object of the same class as that item, which can be a data frame, another list, etc.
Note that $ accepts only a name (not a numeric index), and that neither $ nor [[]] can select two columns at once.
[] slices a data frame or a list, keeping one or more elements.
The class of the output is generally the same as the class of the original object: if you slice a data frame using [], the output is a data frame, and the same applies to lists. The exception is the two-index form df[rows, col] with a single column, which drops to a vector by default.
In your specific case, you used $ to extract one column, which gave a vector. You then tried to slice that vector using [ , ], but a plain vector has no dimensions, so the two-index form raised an error. Slice the vector using [] with a single index instead (the output is a vector). [[]] would only work if the condition selected exactly one element.
Possible ways to subset tesla as you wish:
tesla$abn.ret[tesla$period == "event.period"]
tesla[["abn.ret"]][tesla$period == "event.period"]
tesla[tesla$period == "event.period", "abn.ret"]
You would achieve the same result using tesla[["period"]] instead of tesla$period.
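Here is a small sketch of why the comma matters. The data frame toy is a made-up stand-in for tesla (the values are invented for illustration only). The point is that the column has no dimensions while the data frame has two:

```r
# Hypothetical stand-in for `tesla`; the numbers are invented
toy <- data.frame(
  period  = c("estimation.period", "event.period", "event.period"),
  abn.ret = c(0.01, 0.02, -0.03)
)

is_event <- toy$period == "event.period"

dim(toy$abn.ret)  # NULL: a plain vector, so [rows, ] is not allowed
dim(toy)          # 3 2: two dimensions, so [rows, cols] is allowed

# Three equivalent ways to get the event-period abnormal returns
via_dollar <- toy$abn.ret[is_event]
via_double <- toy[["abn.ret"]][is_event]
via_matrix <- toy[is_event, "abn.ret"]

CAR <- sum(via_dollar)
```

A rule of thumb: count the dimensions of the object directly to the left of the bracket, and use that many index slots.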
For some extra details/examples, refer to An introduction to R, published by CRAN.
I hope it helped you somehow..!
tesla$abn.ret is one-dimensional. Each comma separates a dimension, so yours implies 2 dimensions.
Alternatively you could run
tesla[tesla$period=="event.period", "abn.ret"]
And get the same results, since tesla is 2-d.
If you look at the documentation with ?'[', you will find that when the two-index form selects a single column, the default behaviour (drop = TRUE) is to drop the result down to a vector.
If you want to keep the result as a data frame, you have to write drop = FALSE explicitly, e.g. tesla[tesla$period=="event.period", "abn.ret", drop = FALSE].
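A quick sketch of the drop argument on a made-up data frame (toy and its values are assumptions for illustration, not the original data):

```r
# Hypothetical data; only the shape of the results matters here
toy <- data.frame(period = c("a", "b", "b"), abn.ret = c(0.01, 0.02, -0.03))

dropped <- toy[toy$period == "b", "abn.ret"]                # numeric vector
kept    <- toy[toy$period == "b", "abn.ret", drop = FALSE]  # one-column data frame

class(dropped)  # "numeric"
class(kept)     # "data.frame"
dim(kept)       # 2 1
```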

How can I access data in a nested R list?

I want to learn how to access data from a nested list in R. I am relatively new to the R programming language, so I am unsure how to proceed.
The data is a 'Large list (947 elements, 654.9 Mb)' and takes the form:
The numbers within the datalist refer to station numbers and when I click on one (in Rstudio) it looks like this:
I want to know how I can access the data within 'doy', for example. I have tried:
data[[1]]
which returns all the data for the first element of the list (site, location, doy,ltm etc). So clearly the number used within the square brackets is interpreted as an index for the list, as opposed to an identifier for the elements/station in the list.
Then I tried:
data$1
but it returned the error:
Error: unexpected numeric constant in "data$1"
Then I tried:
data[data$1==doy]
But was returned this:
Error: unexpected numeric constant in "data[data$1"
So at this point, I realise that it is not construing the number of the station as a category/factor within the list. It's just reading it as a number. So I thought I'd put some quotes around it to see if that changed what happened:
data[data$"1"=="doy"]
This returned
named list()
But when I looked at it in the environment, it was a list of 0.
I looked at some of the similar question here on Stack (like: accessing nested lists in R) and tried:
data[data$"1"=="doy",][[1]]
But just got:
Error in data[data$"1" == "doy", ] : incorrect number of dimensions
How can I access this data? It reminds me of a structure in Matlab, but it doesn't seem to be indexed in a similar fashion in R.
Let's look at some ways to do what you want:
data[[1]]
This returns the first element of the list, which is itself a list. You can use the $ subsetting shorthand, but the name of the first element is nonstandard. R prefers names that start with letters and include only alphanumeric characters, periods and underscores. You can escape this behavior with backticks:
data$`1`
If you want to access one of the elements of list 1 in your list of lists, you need to subset further. To get to doy, which is the third element of 1, you can use any of these four ways:
data[[1]][[3]]
data$`1`[[3]]
data[[1]]$doy
data$`1`$doy
One way (in addition to what Ben Norris has shown):
our_list[[c("1", "doy")]]
Reproducible example data (please provide next time)
our_list <- list(`1` = list(site = "x", doy = 3))
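Putting the approaches side by side on a slightly larger toy list (the second station and all values are invented for illustration; here doy is the second element of each station, not the third as in the real data):

```r
# Hypothetical data shaped like the question's list of stations
our_list <- list(
  `1` = list(site = "x", doy = 3),
  `2` = list(site = "y", doy = 7)
)

by_position <- our_list[[1]][[2]]          # first station, second element
by_name     <- our_list[["1"]][["doy"]]    # "1" is a name here, not a position
by_dollar   <- our_list$`1`$doy            # backticks needed for the odd name
recursive   <- our_list[[c("1", "doy")]]   # recursive indexing in one step

# Pull doy out of every station at once
all_doy <- sapply(our_list, function(station) station$doy)
```

Note that our_list[[1]] (position) and our_list[["1"]] (name) only coincide because the station named "1" happens to come first.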

What's the difference between a list and a vector whose mode is list?

Title essentially says it all. I'm having trouble figuring out the difference between initializing a vector with vector(mode="list") and a list with list().
There are some minor differences in the signatures: list() can take value arguments or tag = value arguments, whereas vector() cannot.
And then there's the following quote from the list() documentation:
Almost all lists in R internally are Generic Vectors
So is there any actual difference beside the fact that lists can be initialized with tags and values?
I'd say they're the same:
identical(list(),vector(mode="list", length=0))
## [1] TRUE
(see also this question about the confusing fact that a list is a vector in R: usually when R users refer to "vectors", they actually mean atomic vectors ...)
In my experience the most common use case for vector(mode="list",...) is when you want to initialize a list with length>0. vector(mode="list",10) might be a little more expressive than replicate(10,NULL). If you want to create a length-0 list I can't see any reason to use vector() instead of list().
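For example, a minimal sketch of the pre-allocation use case (the loop body is a placeholder computation, chosen only for illustration):

```r
n <- 3
results <- vector(mode = "list", length = n)  # a list of n NULL elements

for (i in seq_len(n)) {
  results[[i]] <- i^2  # fill each slot in place with [[<-
}

# The empty forms are the same object either way
same_empty <- identical(list(), vector(mode = "list", length = 0))
```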

R lapply into data frame

I have a list that contains lists as elements. I want to convert all of the elements into data frames. Instead of using a for loop, I used the lapply function as follows:
myDF=lapply(mylist,FUN=as.data.frame)
However it's not converting.
class(myDF[1])
still returns list.
Any ideas? Thank you so much for your help,
To look at the first element of the list as-is (i.e. not as a list) you will want to use [[ instead of [. So
class(myDF[[1]])
will tell you the class of the first list element in myDF.
Another way to see this is to look at the difference between myDF[1] and myDF[[1]]. myDF[1] returns the first element of the list as a single-element list, whereas myDF[[1]] returns the first element of the list as itself. See help(Extract) for more.
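A short sketch with a made-up mylist (its contents are assumed, since the question does not show them) confirms that the conversion did work and only the check was misleading:

```r
# Hypothetical input: a list whose elements are themselves lists
mylist <- list(list(x = 1, y = "a"), list(x = 2, y = "b"))

myDF <- lapply(mylist, as.data.frame)

class(myDF)       # "list": lapply always returns a list
class(myDF[1])    # "list": a length-1 sub-list
class(myDF[[1]])  # "data.frame": the element itself

# If a single data frame is wanted, the pieces can be stacked
combined <- do.call(rbind, myDF)
```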

Are R lists generalised vectors? [duplicate]

This question already has answers here:
The difference between bracket [ ] and double bracket [[ ]] for accessing the elements of a list or dataframe
My question stems from the usage of [[ and ]] in user created functions to reference list elements. From what I can tell, [[ and ]] work the same way as [ and ] when applied to vectors.
Is this true of all other list operations though? As another example, I can use lapply on a vector.
It makes sense that this is true if a list is just a generalised vector, whose entries can be of differing modes.
EDIT: The one-and-a-half line answer is that both lists and atomic vectors are types of vectors, and subset exactly the same way.
This answer expands on the difference between lists and atomic vectors.
The best explanation of R's data structures, specifically the difference between lists and atomic vectors, is (in my opinion) Hadley Wickham's new book:
http://adv-r.had.co.nz/Data-structures.html
Both lists and atomic vectors are 1 dimensional data structures. However, atomic vectors are homogeneous and lists are heterogeneous. Lists can contain any type of vector, including other lists. Atomic vectors are flat on the other hand.
As far as subsetting using [] vs [[]] goes, [] is preserving for both lists and atomic vectors, whereas [[]] is simplifying. Thus, [] and [[]] are NOT the same, whether applied to lists OR atomic vectors. For example, [[]] will simplify a named vector by removing the name; subsetting a named vector by [] will keep the name. For a list, [[]] will pull out the contents of a list, and can return a number of simplified data structures. Subsetting a list by [] will always return a list (preserving).
Subsetting an atomic vector by [[]] returns a length one atomic vector. Subsetting a list by [[]] can return a number of different classes of data structures. This goes back to the fact that atomic vectors are homogeneous and lists are heterogeneous. However, according to Hadley, subsetting a list works exactly the same way as subsetting an atomic vector.
Take a look at this section of Hadley's book for further reference:
http://adv-r.had.co.nz/Subsetting.html#subsetting-operators
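A small sketch of preserving vs simplifying on both kinds of vector (v and l are throwaway example objects):

```r
v <- c(a = 1, b = 2)        # named atomic vector
l <- list(a = 1, b = "x")   # list with mixed contents

v_single <- v["a"]     # preserving: still a named vector
v_double <- v[["a"]]   # simplifying: the bare value, name dropped

l_single <- l["a"]     # preserving: a list of length 1
l_double <- l[["a"]]   # simplifying: the contents, here a number

names(v_single)  # "a"
names(v_double)  # NULL
class(l_single)  # "list"
class(l_double)  # "numeric"
```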
Since I wasn't able to come up with any more counter examples, I referred to the documentation on R's internals, and it appears your intuition is correct.
If you look at the section on SEXPTYPEs, which describes the underlying structure of R's data structures in C, lists are implied to be generic vectors:
19 VECSXP list (generic vector)
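You can see the same thing from the R prompt. This is only a sketch using base predicates on a throwaway list x:

```r
x <- list(1, "a")

typeof(x)        # "list"
is.vector(x)     # TRUE: a list is a vector...
is.atomic(x)     # FALSE: ...just not an atomic one
is.atomic(1:3)   # TRUE

# lapply treats an atomic vector and a list the same way
from_atomic <- lapply(1:3, function(i) i * 2L)
from_list   <- lapply(list(1L, 2L, 3L), function(i) i * 2L)
```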
