Subsetting in R

x <- list(l1=list(1:4),l2=list(2:5),l3=list(3:8))
I know [] is used to extract multiple elements and [[]] is used to extract a single element from a list. I need help extracting multiple elements from a list that is inside another list. For example, how can I extract 1 and 3 from l1, which is itself inside another list?

For full details, see help(Extract), which covers both [[ and [.
The [[ operator can walk/search nested lists in a single step, by providing a vector of names OR indices (a path):
> y = list(a=list(b=1))
> y[[c("a","b")]]
[1] 1
> y[[c(1,1)]]
[1] 1
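You can't mix names and indices. c("a", 1) is coerced to the character vector c("a", "1"), and there is no element named "1", so the result is NULL: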
> y[[c("a",1)]]
NULL
It seems like you are asking a different question, since your inner lists are not named.
Here's a solution using only numeric indices:
> x[[c(1,1)]]
[1] 1 2 3 4
> x[[c(1,1)]][c(1,3)]
[1] 1 3
The first 1 selects the first element of x (the list l1). The second 1 selects the first element of l1, exposing the vector inside.
This might be useful if your real use case involves more complex paths, but to avoid surprising other programmers, in the given example the following...
x[["l1"]][[1]][c(1,3)]
...is probably preferable. Here the [[1]] unwraps the inner list to expose the vector.
In your case, the following is also equivalent:
unlist(x[["l1"]])[c(1,3)]
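A quick check of the above, as a sketch that rebuilds x from the question: the three forms return the same vector, and wrapping the last step in lapply (an addition, not part of the original answer) extracts elements 1 and 3 from every inner list at once.

```r
x <- list(l1 = list(1:4), l2 = list(2:5), l3 = list(3:8))

# Three equivalent ways to pull elements 1 and 3 out of the vector inside l1
a <- x[[c(1, 1)]][c(1, 3)]        # recursive [[ with an index path
b <- x[["l1"]][[1]][c(1, 3)]      # explicit step-by-step indexing
d <- unlist(x[["l1"]])[c(1, 3)]   # flatten the inner list first
stopifnot(identical(a, b), identical(b, d))

# The same extraction applied to every inner list of x
picked <- lapply(x, function(inner) inner[[1]][c(1, 3)])
str(picked)
```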

It sounds like you might be interested in exploring the rapply function (recursive lapply).
If I understand your question correctly, you could do something like this:
rapply(x[["l1"]], f=`[`, ...=c(1, 3))
# [1] 1 3
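which is a little different from the lapply version, since rapply (with its default how = "unlist") returns a plain vector while lapply returns a list: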
lapply(x[["l1"]], `[`, c(1, 3))
# [[1]]
# [1] 1 3
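For comparison, a sketch that passes a small helper function instead of `[` with ...= (pick13 is a hypothetical name chosen for illustration). The how argument controls the shape of the result: the default "unlist" flattens it, while "list" keeps the nesting of the input.

```r
x <- list(l1 = list(1:4), l2 = list(2:5), l3 = list(3:8))

pick13 <- function(v) v[c(1, 3)]  # hypothetical helper: keep elements 1 and 3

flat   <- rapply(x[["l1"]], pick13, how = "unlist")  # plain vector: 1 3
nested <- rapply(x[["l1"]], pick13, how = "list")    # list mirroring x[["l1"]]

# rapply recurses through every level, so it can also process all of x at once
all_flat <- rapply(x, pick13, how = "unlist")
```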

Related

Storing a value in a nested list with an unknown depth in R

I am trying to optimize some very computationally intensive code that deals with subsets of an 80-element set.
A crucial step that I want to accelerate is checking whether the current subset in my loop has already been treated. For the moment, I check whether this subset is among the already-treated subsets of the same size k (cardinality). It would be much faster to store the treated subsets progressively in a nested list and check membership there (O(1) instead of a search over O(80 choose k) subsets).
I had no problem writing a function to check whether the current subset is in my nested list of treated subsets: access(treated, subset=c(2,5,3)) returns TRUE iff treated[[2]][[5]][[3]]==TRUE.
However, I have no idea how to store (inside my loop) my current subset in the treated list. I would like something like this to be possible: treated[h] <- TRUE, where h is my current subset (in the example above, h=c(2,5,3)).
The main problem I am facing is that the number of "[[..]]" varies inside my loop. Is there any option other than padding h to length 80 and writing a sequence of 80 "[[..]]", like treated[[h[1]]][[h[2]]]...[[h[80]]] <- TRUE?

If h is a vector of indices, then
"[["(treated, h)
recursively subsets the list items.
For example, I created a (not so highly) nested list:
> a
[[1]]
[[1]][[1]]
[1] 2
[[1]][[2]]
[[1]][[2]][[1]]
[1] 3
[[2]]
[1] 1
The following command correctly applies item subsetting recursively to the list:
> "[["(a, c(1,2,1))
[1] 3
The length of the recursively subsetting vector can vary without fixing the number of [[..]]'s. For example, subsetting two levels of depth with the same syntax:
> "[["(a, c(1,2))
[[1]]
[1] 3
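The answer above covers reading. For the writing side of the question, here is a minimal sketch built on two hypothetical helpers (set_path() and has_path() are illustrative names, not base R functions). set_path() walks the index vector h recursively and creates any missing intermediate lists, so the depth of h can vary from call to call. It assumes no stored subset is a prefix of another, for example because a separate structure is kept for each size k.

```r
# Hypothetical helper: performs lst[[path[1]]][[path[2]]]...[[path[n]]] <- value
# for a path of any length, creating missing intermediate levels as empty lists.
# Assumption: no stored path is a prefix of another; otherwise a TRUE leaf
# would be replaced by a list.
set_path <- function(lst, path, value) {
  i <- path[1]
  if (length(path) == 1L) {
    lst[[i]] <- value
    return(lst)
  }
  # Reuse the existing sub-list if there is one, otherwise start an empty list
  child <- if (length(lst) >= i && is.list(lst[[i]])) lst[[i]] else list()
  lst[[i]] <- set_path(child, path[-1], value)
  lst
}

# Hypothetical lookup: TRUE only if the whole path exists and ends in TRUE
has_path <- function(lst, path) {
  for (i in path) {
    if (!is.list(lst) || length(lst) < i || is.null(lst[[i]])) return(FALSE)
    lst <- lst[[i]]
  }
  isTRUE(lst)
}

treated <- list()
h <- c(2, 5, 3)
treated <- set_path(treated, h, TRUE)
has_path(treated, h)           # TRUE
has_path(treated, c(2, 5, 4))  # FALSE
```

Once stored, the value can also be read back with the recursive "[["(treated, h) shown above.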

$value in unidimensional integrals in R [duplicate]

I have transitioned from Stata to R, and I was experimenting with different data types so that R's data structures are clear in my mind.
Here's how I set up my data structure:
b<-list(u=5,v=12)
c<-list(u=7)
j<-list(name="Joe",salary=55000,union=T)
bcj<-list(b,c,j)
Now, I was trying to figure out different ways to access u=5. I believe there are three ways:
Try1:
bcj[[1]][[1]]
I got 5. Correct!
Try2:
bcj[[1]][["u"]]
I got 5. Correct!
Try3:
bcj[[1]]$u
I got 5. Correct!
Try4:
bcj[[1]][1][1]
Here's what I got:
bcj[[1]][1][1]
$u
[1] 5
class(bcj[[1]][1][1])
[1] "list"
Question 1: Why did this happen?
Also, I experimented with the following:
bcj[[1]][1][1][1][1][1]
$u
[1] 5
class(bcj[[1]][1][1][1][1][1])
[1] "list"
Question 2: I would have expected an error, because I don't think bcj contains that many nested lists, but R gave me a list. Why did this happen?
PS: I did look at this thread on SO, but it's talking about a different issue.

I think this is sufficient to answer your question. Consider a length-1 list:
x <- list(u = 5)
#$u
#[1] 5
length(x)
#[1] 1
x[1]
x[1][1]
x[1][1][1]
...
always gives you the same:
#$u
#[1] 5
In other words, x[1] is identical to x, so applying [1] again simply returns the same list. No matter how many [1] you write, you just get x itself.
If I create t1<-list(u=5,v=7) and then do t1[2][1][1][1]..., this works as well. However, t1[[2]][2] gives NA.
That is the difference between [[ and [ when indexing a list. Using [ will always end up with a list, while [[ will take out the content. Compare:
z1 <- t1[2]
## this is a length-1 list
#$v
#[1] 7
class(z1)
# "list"
z2 <- t1[[2]]
## this takes out the content; in this case, a vector
#[1] 7
class(z2)
#[1] "numeric"
When you do z1[1][1]..., as discussed above, you always end up with z1 itself. If instead you do z2[2], you get NA, because z2 has only one element and you are asking for the second one.
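The behaviour described above can be checked directly; a short sketch using the same x and t1:

```r
x <- list(u = 5)
# For a length-1 list, [1] returns a list identical to the input,
# so chaining more [1] calls never changes anything
stopifnot(identical(x[1], x), identical(x[1][1][1][1], x))

t1 <- list(u = 5, v = 7)
z1 <- t1[2]    # [ keeps the container: a length-1 list
z2 <- t1[[2]]  # [[ extracts the content: a numeric vector of length 1
class(z1)      # "list"
class(z2)      # "numeric"
z2[2]          # NA: there is no second element
```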
Perhaps this post and my answer there is useful for you: Extract nested list elements using bracketed numbers and names?

How to find common elements on two different length vectors in R?

I need to find common elements in 2 different length vectors.
For example, I have a vector A with 10 elements, and a vector B with 3 elements.
I need to get the positions of the elements of A that are equal to elements of B.
A=c(1,2,45,3,10,5,11,13,6,7)
B=c(45,3,10)
C would be [3,4,5]
I have already tried the "match" and "intersect" functions, but with no success :(
Thanks a lot! :)

You can use the which function together with %in%:
> which(A %in% B)
[1] 3 4 5
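A sketch comparing which(A %in% B) with match() and intersect() on the same vectors. Here match(B, A) happens to give the same answer, because every value of B occurs exactly once in A; the two differ once values repeat, as the last lines (with a made-up vector A2) show. Note that match(A, B), with the arguments swapped, answers a different question.

```r
A <- c(1, 2, 45, 3, 10, 5, 11, 13, 6, 7)
B <- c(45, 3, 10)

which(A %in% B)  # positions in A whose value occurs anywhere in B: 3 4 5
match(B, A)      # for each element of B, its first position in A: 3 4 5
intersect(A, B)  # the shared values themselves, not positions: 45 3 10

# With repeated values, which() reports every occurrence, match() only the first
A2 <- c(3, 1, 3)
which(A2 %in% 3)  # 1 3
match(3, A2)      # 1
```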

lapply generating different results when the elements of the list are accessed using a $ sign

When I created the list and indexed it with the "[ ]" operator, I got the following result:
x <- list(a=1:5, b=rnorm(5))
lapply(x[1], mean)
$a
[1] 3
lapply(x[2], sum)
$b
[1] 0.3653843
But when I access the same list with the $ sign, I get a different result:
> x <- list(a=1:5, b=rnorm(5))
> lapply(x$a, mean)
[[1]]
[1] 1
[[2]]
[1] 2
[[3]]
[1] 3
[[4]]
[1] 4
[[5]]
[1] 5
> lapply(x$b, sum)
[[1]]
[1] 0.7208679
[[2]]
[1] 1.367853
[[3]]
[1] -0.5799428
[[4]]
[1] -2.186257
[[5]]
[1] 0.1597629
I'm not able to understand why. Can someone explain?

There's a major difference between $ and [. While $ returns the list element, [ returns a list containing one element.
> x[1]
$a
[1] 1 2 3 4 5
> x$a
[1] 1 2 3 4 5
An equivalent expression to x$a is x[[1]]. [[ also returns the list element.
> x[[1]]
[1] 1 2 3 4 5
Since both $ and [[ return a single list element, you can't use them to return multiple ones. However, you can use [ to return a list with multiple elements. For example,
> x[1:2]
$a
[1] 1 2 3 4 5
$b
[1] 0.3465471 0.2955350 1.1292449 1.1136643 -0.9798430

In the first case, the input to lapply is a list with one element equal to 1:5 or rnorm(5). In the second case, the input to lapply is a vector with 5 elements, so the mean function gets each value 1, 2, 3, 4, 5 separately (and in this case does nothing but return the same value).
In other words, x[1] gives a list of one element:
> x <- list(a=1:5, b=rnorm(5))
> str(x[1])
List of 1
$ a: int [1:5] 1 2 3 4 5
Whereas, x$a is equal to x[["a"]] or x[[1]] and gives a vector with 5 elements:
str(x$a)
int [1:5] 1 2 3 4 5
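A sketch of the same comparison with deterministic data (the vector b below is an assumption standing in for rnorm(5), so the results are reproducible):

```r
x <- list(a = 1:5, b = c(2, 4, 6, 8, 10))  # b replaces rnorm(5) for reproducibility

lapply(x["a"], mean)  # list of 1, named a: the mean of the whole vector, 3
lapply(x$a, mean)     # list of 5: the mean of each single number
sapply(x, mean)       # the mean of every component of x, simplified to a vector
```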

The difference is relatively small but very meaningful. A list is essentially a vector whose elements don't have to be of the same type (int, char, logical, etc.). In fact, the components of a list can be anything, even other lists.
To use an analogy:
A vector is a box. The only rule that applies to this box is that all the things in the box have to be of the same type. Things we put in boxes (such as numbers or Boolean values) are called elements.
A list is a crate. The only rule for crates is that they can hold only boxes and other crates, not bare elements. Things we put in crates are called components.
To get things out of vectors/boxes or lists/crates we use three operators (which, like everything in R, are functions), each meaning something a little different:
square brackets [k]. These mean "get the k^th element from the vector", or in the case of the list "get me the k^th component in the list". What's the difference, you might ask? Requesting an element from a vector gets you a value (e.g. TRUE, "john doe", 3), whereas [k] on a list always gets you a list: in terms of the analogy, you get back a smaller crate holding the k^th box (or crate).
double square brackets [[k]]. These mean "get me the contents of the k^th element from the vector", or in the case of the list "get me the contents of the k^th component in the list". For a vector these double brackets aren't very useful, since an element cannot contain anything else. In the box analogy: you're asking for the contents of an element, and since an element has no contents, R returns the element itself. For a list, R goes to the k^th component and returns its contents. Using the analogy: R goes to the crate, picks out the k^th box (or crate) in it, and returns the contents of that box (or crate).
dollar symbol $. This symbol is used almost exclusively with lists, since it lets you access named components. Its main benefit is that it lets you refer to components in a list as if they were variables in the workspace.
In your example two things are going a little wrong:
First, lapply works on the components of a list: it calls FUN once for each element of its input X, and an input that is not already a list is handled as if it were one. You can see this by printing the lapply code:
lapply
function (X, FUN, ...)
{
FUN <- match.fun(FUN)
if (!is.vector(X) || is.object(X))
X <- as.list(X)
.Internal(lapply(X, FUN))
}
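Notice that if the input is not a plain vector (or is a classed object), R converts it to a list with the as.list function; a plain atomic vector such as 1:5 is iterated over element by element, which amounts to the same thing. Either way, FUN is called once per component, and each element of an atomic vector is treated as a component of its own.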
In your first input
lapply(x[1], mean)
lapply(x[2], sum)
you are giving the function a list containing one component (a box); in the second input
lapply(x$a, mean)
lapply(x$b, sum)
you are giving the function elements. You can see the difference in how R prints each: x$a prints like a vector, x[1] prints like a list. When the function receives elements, it treats each element as a separate component, just as the following conversion does:
as.list(x$a)
where each component in the new list is a vector with 1 element.
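A small sketch of the elements-versus-components point, using the same 1:5 vector:

```r
x <- list(a = 1:5)

comps <- as.list(x$a)  # each element of 1:5 becomes its own component
length(comps)          # 5
comps[[2]]             # 2 (an integer vector of length 1)

# lapply over the bare vector behaves like lapply over as.list() of it
identical(lapply(x$a, mean), lapply(comps, mean))  # TRUE

# Wrapping the vector in a list keeps it as ONE component,
# which is essentially what x[1] hands to lapply
length(list(x$a))      # 1
```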
tl;dr: don't confuse components and elements :).

Why does R need the name of the dataframe?

If you have a dataframe like this:
mydf <- data.frame(firstcol = c(1,2,1), secondcol = c(3,4,5))
Why would
mydf[mydf$firstcol,]
work but
mydf[firstcol,]
wouldn't?

You can do this:
mydf[,"firstcol"]
Remember that the column goes second, not first.
In your example, to see what mydf[mydf$firstcol,] gives you, let's break it down:
> mydf$firstcol
[1] 1 2 1
So really mydf[mydf$firstcol,] is the same as
> mydf[c(1,2,1),]
    firstcol secondcol
1          1         3
2          2         4
1.1        1         3
So you are asking for rows 1, 2, and 1. That is, you are asking for your row 1 to be the same as row 1 of mydf, your row 2 to be the same as row 2 of mydf, and your row 3 to be the same as row 1 of mydf; and you are asking for both columns.
Another question is why the following doesn't work:
> mydf[,firstcol]
Error in `[.data.frame`(mydf, , firstcol) : object 'firstcol' not found
That is, why do you have to put quotes around the column name when you ask for it like that, but not when you do mydf$firstcol? The answer is simply that the operators you are using require different types of arguments. If you look up ?'$', you'll see the form x$name, so the second argument can be a name, which is not quoted. You can then look up ?'[', which will actually lead you to the same help page, and there you will find the following, which explains it. Note that a "character" vector needs to have quoted entries (that is how you enter a character vector in R and many other languages).
i, j, ...: indices specifying elements to extract or replace. Indices
are ‘numeric’ or ‘character’ vectors or empty (missing) or
‘NULL’. Numeric values are coerced to integer as by
‘as.integer’ (and hence truncated towards zero). Character
vectors will be matched to the ‘names’ of the object (or for
matrices/arrays, the ‘dimnames’): see ‘Character indices’
below for further details.
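A sketch pulling the pieces of this answer together, with the mydf from the question. The last line uses subset(), a base R function not mentioned in the answer above, which evaluates its condition inside the data frame; that is one way to refer to firstcol without repeating mydf$.

```r
mydf <- data.frame(firstcol = c(1, 2, 1), secondcol = c(3, 4, 5))

# firstcol holds 1, 2, 1, so it is used as row numbers: row 1 appears twice
rownames(mydf[mydf$firstcol, ])  # "1" "2" "1.1"

# Selecting a column by name needs a quoted (character) index
mydf[, "firstcol"]               # 1 2 1

# To keep the rows where firstcol equals 1, index with a logical vector
mydf[mydf$firstcol == 1, ]

# subset() evaluates its condition inside mydf, so the bare name works
subset(mydf, firstcol == 1)
```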

Nothing to add to the very clear explanation of Xu Wang. You might also want to note that the data.table package lets you use notation such as mydf[firstcol==1,] or mydf[,firstcol] on a data.table, which many find more natural.
