I was working with a list of lists today and needed to replace an element of one of the second-level lists. The way to do this seemed obvious, but I realized I wasn't actually clear on why it worked.
Here's an example:
a <- list(aa=list(aaa=1:10,bbb=11:20,ccc=21:30),bb=list(ddd=1:5))
Given this data structure, let's say I want to replace the 3rd element of the nested numeric vector aaa. I could do something like:
newvalue <- 100
a$aa$aaa[3] <- newvalue
Doing this seems obvious enough, but I couldn't explain to myself how this expression actually gets evaluated. Working with the quote function, I cobbled together some rough logic, along the lines of:
(1) Create and submit the top-level function call:
`<-`(a$aa$aaa[3],newvalue)
(2) Lazy evaluation of first argument in (1), call function '[':
`[`(a$aa$aaa,3)
(3) Proceed recursivley down:
`$`(a$aa,"aaa")
(4) ...further down, call '$' again:
`$`(a,"aa")
(5) With (4) returning an actual data structure, proceed back "up the stack" substituting the returned data structures until the actual assignment is made in (1).
I guess my confusion involves some aspects of lazy evaluation and/or evaluation environments. In the example above, I've simply reassigned one element of a vector. But how is it that R keeps track of where that vector is within the greater data structure?
Cheers
I think a$aa$aaa[3] works as followed:
When you want to access an element still within the current object you use single square brackets, []. This makes the element accesible, but you can not perform complex manipulations with it because the element is still part of the object.
When you access the element using $, this translates from a$aa to a[["aa"]], thus releasing the element from the current object.
The total expression a$aa$aaa[3] translates to a[["aa"]][["aaa"]][3]. This is treated as a vector of vectors ->
Take object a, access vector aa.
Access vector aaa.
Access the third element in vector aaa.
The R evaluation:
> a
$aa
$aa$aaa
[1] 1 2 3 4 5 6 7 8 9 10
$aa$bbb
[1] 11 12 13 14 15 16 17 18 19 20
$aa$ccc
[1] 21 22 23 24 25 26 27 28 29 30
$bb
$bb$ddd
[1] 1 2 3 4 5
> a$aa
$aaa
[1] 1 2 3 4 5 6 7 8 9 10
$bbb
[1] 11 12 13 14 15 16 17 18 19 20
$ccc
[1] 21 22 23 24 25 26 27 28 29 30
> a$aa$aaa
[1] 1 2 3 4 5 6 7 8 9 10
> a[["aa"]]
$aaa
[1] 1 2 3 4 5 6 7 8 9 10
$bbb
[1] 11 12 13 14 15 16 17 18 19 20
$ccc
[1] 21 22 23 24 25 26 27 28 29 30
> a[["aa"]][["aaa"]]
[1] 1 2 3 4 5 6 7 8 9 10
> a["aa"]
$aa
$aa$aaa
[1] 1 2 3 4 5 6 7 8 9 10
$aa$bbb
[1] 11 12 13 14 15 16 17 18 19 20
$aa$ccc
[1] 21 22 23 24 25 26 27 28 29 30
> a["aa"]["aaa"]
$<NA>
NULL
Hope this helps.
Related
I am doing text analysis in R. I have a list of lists that contain ngrams.
Look like this:
> list_tetragrams[459]
[[1]]
[1] a small stage show album of jazz standards an album of jazz and play small rooms
[5] and release an album can translate into a her late s and i think she’ll wait
[9] in her late s into a small stage late s and release maybe something she can
[13] one can dream right play small rooms jazz release an album of s and release an
[17] she can translate into she’ll wait until she’s she’s in her late show and play small
I want to convert this list of lists into one list. Here's what I did and the output:
Fngram<- list(unlist(unlist(list_tetragrams)))
Output:
[1] 1 2 3 4 5 6 7 8 9 10 11 12 1 1 2 3 1 1 1 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
[39] 20 21 22 23 24 25 26 27 28 29 30 31 32 33 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
The code I have been used multiple times, and it is the first time something like this happened. I have tried to use flatten() function or do.all() function. All return the same output. What happened? Can someone please figure it out? Thank you!
An option is to use a recursive function to convert the values to character from factor (the integer coercion values suggest that the nested list elements are factor class), by default, the how = 'unlist' in rapply), then we wrap those vector with list to create a single list element
list(rapply(list_tetragrams, as.character))
i am having trouble understanding the difference between the R function rank and the R function order. they seem to produce the same output:
> rank(c(10,30,20,50,40))
[1] 1 3 2 5 4
> order(c(10,30,20,50,40))
[1] 1 3 2 5 4
Could somebody shed some light on this for me?
Thanks
set.seed(1)
x <- sample(1:50, 30)
x
# [1] 14 19 28 43 10 41 42 29 27 3 9 7 44 15 48 18 25 33 13 34 47 39 49 4 30 46 1 40 20 8
rank(x)
# [1] 9 12 16 25 7 23 24 17 15 2 6 4 26 10 29 11 14 19 8 20 28 21 30 3 18 27 1 22 13 5
order(x)
# [1] 27 10 24 12 30 11 5 19 1 14 16 2 29 17 9 3 8 25 18 20 22 28 6 7 4 13 26 21 15 23
rank returns a vector with the "rank" of each value. the number in the first position is the 9th lowest. order returns the indices that would put the initial vector x in order.
The 27th value of x is the lowest, so 27 is the first element of order(x) - and if you look at rank(x), the 27th element is 1.
x[order(x)]
# [1] 1 3 4 7 8 9 10 13 14 15 18 19 20 25 27 28 29 30 33 34 39 40 41 42 43 44 46 47 48 49
As it turned out this was a special case and made things confusing. I explain below for anyone interested:
rank returns the order of each element in an ascending list
order returns the index each element would have in an ascending list
I always find it confusing to think about the difference between the two, and I always think, "how can I get to order using rank"?
Starting with Justin's example:
Order using rank:
## Setup example to match Justin's example
set.seed(1)
x <- sample(1:50, 30)
## Make a vector to store the sorted x values
xx = integer(length(x))
## i is the index, ir is the ith "rank" value
i = 0
for(ir in rank(x)){
i = i + 1
xx[ir] = x[i]
}
all(xx==x[order(x)])
[1] TRUE
rank is more complicated and not neccessarily an index (integer):
> rank(c(1))
[1] 1
> rank(c(1,1))
[1] 1.5 1.5
> rank(c(1,1,1))
[1] 2 2 2
> rank(c(1,1,1,1))
[1] 2.5 2.5 2.5 2.5
In layman's language, order gives the actual place/position of a value after sorting the values
For eg:
a<-c(3,4,2,7,8,5,1,6)
sort(a) [1] 1 2 3 4 5 6 7 8
The position of 1 in a is 7. similarly position of 2 in a is 3.
order(a) [1] 7 3 1 2 6 8 4 5
as is stated by ?order() in R prompt,
order just return a permutation which sort the original vector into ascending/descending order.
suppose that we have a vector
A<-c(1,4,3,6,7,4);
A.sort<-sort(A);
then
order(A) == match(A.sort,A);
rank(A) == match(A,A.sort);
besides, i find that order has the following property(not validated theoratically):
1 order(A)∈(1,length(A))
2 order(order(order(....order(A)....))):if you take the order of A in odds number of times, the results remains the same, so as to even number of times.
some observations:
set.seed(0)
x<-matrix(rnorm(10),1)
dm<-diag(length(x))
# compute rank from order and backwards:
rank(x) == col(x)%*%dm[order(x),]
order(x) == col(x)%*%dm[rank(x),]
# in special cases like this
x<-cumsum(rep(c(2,0),times=5))+rep(c(0,-1),times=5)
# they are equal
order(x)==rank(x)
R Fiddle
vals<-c(10.3,10.3,10.2,16.4,18.8,19.7,15.6,18.2,22.6,19.9,24.2,21.0,21.4,21.3,19.1,22.2,33.8,27.4,25.7,24.9,34.5,31.7,36.3,38.3,42.6,55.4,55.7,58.3,51.5,51.0,77.0)
# Standard Order
# the second and third values should be reversed
order(vals)
# ------------------------------------------------------------
# [1] 3 1 2 7 4 8 5 15 6 10 12 14 13 16 9 11 20 19 18 22 17 21 23 24 25
# [26] 30 29 26 27 28 31
# ------------------------------------------------------------
# Reverse Decreasing
# should be the same as the original, but it isn't (it's correct)
rev(order(vals, decreasing=T))
# ------------------------------------------------------------
# [1] 3 2 1 7 4 8 5 15 6 10 12 14 13 16 9 11 20 19 18 22 17 21 23 24 25
# [26] 30 29 26 27 28 31
# ------------------------------------------------------------
I need some help in understanding what is happening in R. I think there's a bug when outputting order and how they are not the same. Notice the second and third values of both outputs. Shouldn't the order be 3,3,1 or 2,2,1 or 3,2,1 depending on how order treats the same value? Regardless.. the third value should have order=1.
Is my understanding correct, or am I missing something?
As per the documentation,
order returns a permutation which rearranges its first argument into ascending or descending order, breaking ties by further arguments.
i.e. order() returns a set of indices such that x[order(x)] is in increasing order, or that x[order(x,decreasing = TRUE)] is in decreasing order.
If two consecutive values in x are identical, then the order of their indices in the value returned by order is immaterial, and will simply depend on what is most efficient and involves the least amount of swapping values around in the internal C code.
i am having trouble understanding the difference between the R function rank and the R function order. they seem to produce the same output:
> rank(c(10,30,20,50,40))
[1] 1 3 2 5 4
> order(c(10,30,20,50,40))
[1] 1 3 2 5 4
Could somebody shed some light on this for me?
Thanks
set.seed(1)
x <- sample(1:50, 30)
x
# [1] 14 19 28 43 10 41 42 29 27 3 9 7 44 15 48 18 25 33 13 34 47 39 49 4 30 46 1 40 20 8
rank(x)
# [1] 9 12 16 25 7 23 24 17 15 2 6 4 26 10 29 11 14 19 8 20 28 21 30 3 18 27 1 22 13 5
order(x)
# [1] 27 10 24 12 30 11 5 19 1 14 16 2 29 17 9 3 8 25 18 20 22 28 6 7 4 13 26 21 15 23
rank returns a vector with the "rank" of each value. the number in the first position is the 9th lowest. order returns the indices that would put the initial vector x in order.
The 27th value of x is the lowest, so 27 is the first element of order(x) - and if you look at rank(x), the 27th element is 1.
x[order(x)]
# [1] 1 3 4 7 8 9 10 13 14 15 18 19 20 25 27 28 29 30 33 34 39 40 41 42 43 44 46 47 48 49
As it turned out this was a special case and made things confusing. I explain below for anyone interested:
rank returns the order of each element in an ascending list
order returns the index each element would have in an ascending list
I always find it confusing to think about the difference between the two, and I always think, "how can I get to order using rank"?
Starting with Justin's example:
Order using rank:
## Setup example to match Justin's example
set.seed(1)
x <- sample(1:50, 30)
## Make a vector to store the sorted x values
xx = integer(length(x))
## i is the index, ir is the ith "rank" value
i = 0
for(ir in rank(x)){
i = i + 1
xx[ir] = x[i]
}
all(xx==x[order(x)])
[1] TRUE
rank is more complicated and not neccessarily an index (integer):
> rank(c(1))
[1] 1
> rank(c(1,1))
[1] 1.5 1.5
> rank(c(1,1,1))
[1] 2 2 2
> rank(c(1,1,1,1))
[1] 2.5 2.5 2.5 2.5
In layman's language, order gives the actual place/position of a value after sorting the values
For eg:
a<-c(3,4,2,7,8,5,1,6)
sort(a) [1] 1 2 3 4 5 6 7 8
The position of 1 in a is 7. similarly position of 2 in a is 3.
order(a) [1] 7 3 1 2 6 8 4 5
as is stated by ?order() in R prompt,
order just return a permutation which sort the original vector into ascending/descending order.
suppose that we have a vector
A<-c(1,4,3,6,7,4);
A.sort<-sort(A);
then
order(A) == match(A.sort,A);
rank(A) == match(A,A.sort);
besides, i find that order has the following property(not validated theoratically):
1 order(A)∈(1,length(A))
2 order(order(order(....order(A)....))):if you take the order of A in odds number of times, the results remains the same, so as to even number of times.
some observations:
set.seed(0)
x<-matrix(rnorm(10),1)
dm<-diag(length(x))
# compute rank from order and backwards:
rank(x) == col(x)%*%dm[order(x),]
order(x) == col(x)%*%dm[rank(x),]
# in special cases like this
x<-cumsum(rep(c(2,0),times=5))+rep(c(0,-1),times=5)
# they are equal
order(x)==rank(x)
i am having trouble understanding the difference between the R function rank and the R function order. they seem to produce the same output:
> rank(c(10,30,20,50,40))
[1] 1 3 2 5 4
> order(c(10,30,20,50,40))
[1] 1 3 2 5 4
Could somebody shed some light on this for me?
Thanks
set.seed(1)
x <- sample(1:50, 30)
x
# [1] 14 19 28 43 10 41 42 29 27 3 9 7 44 15 48 18 25 33 13 34 47 39 49 4 30 46 1 40 20 8
rank(x)
# [1] 9 12 16 25 7 23 24 17 15 2 6 4 26 10 29 11 14 19 8 20 28 21 30 3 18 27 1 22 13 5
order(x)
# [1] 27 10 24 12 30 11 5 19 1 14 16 2 29 17 9 3 8 25 18 20 22 28 6 7 4 13 26 21 15 23
rank returns a vector with the "rank" of each value. the number in the first position is the 9th lowest. order returns the indices that would put the initial vector x in order.
The 27th value of x is the lowest, so 27 is the first element of order(x) - and if you look at rank(x), the 27th element is 1.
x[order(x)]
# [1] 1 3 4 7 8 9 10 13 14 15 18 19 20 25 27 28 29 30 33 34 39 40 41 42 43 44 46 47 48 49
As it turned out this was a special case and made things confusing. I explain below for anyone interested:
rank returns the order of each element in an ascending list
order returns the index each element would have in an ascending list
I always find it confusing to think about the difference between the two, and I always think, "how can I get to order using rank"?
Starting with Justin's example:
Order using rank:
## Setup example to match Justin's example
set.seed(1)
x <- sample(1:50, 30)
## Make a vector to store the sorted x values
xx = integer(length(x))
## i is the index, ir is the ith "rank" value
i = 0
for(ir in rank(x)){
i = i + 1
xx[ir] = x[i]
}
all(xx==x[order(x)])
[1] TRUE
rank is more complicated and not neccessarily an index (integer):
> rank(c(1))
[1] 1
> rank(c(1,1))
[1] 1.5 1.5
> rank(c(1,1,1))
[1] 2 2 2
> rank(c(1,1,1,1))
[1] 2.5 2.5 2.5 2.5
In layman's language, order gives the actual place/position of a value after sorting the values
For eg:
a<-c(3,4,2,7,8,5,1,6)
sort(a) [1] 1 2 3 4 5 6 7 8
The position of 1 in a is 7. similarly position of 2 in a is 3.
order(a) [1] 7 3 1 2 6 8 4 5
as is stated by ?order() in R prompt,
order just return a permutation which sort the original vector into ascending/descending order.
suppose that we have a vector
A<-c(1,4,3,6,7,4);
A.sort<-sort(A);
then
order(A) == match(A.sort,A);
rank(A) == match(A,A.sort);
besides, i find that order has the following property(not validated theoratically):
1 order(A)∈(1,length(A))
2 order(order(order(....order(A)....))):if you take the order of A in odds number of times, the results remains the same, so as to even number of times.
some observations:
set.seed(0)
x<-matrix(rnorm(10),1)
dm<-diag(length(x))
# compute rank from order and backwards:
rank(x) == col(x)%*%dm[order(x),]
order(x) == col(x)%*%dm[rank(x),]
# in special cases like this
x<-cumsum(rep(c(2,0),times=5))+rep(c(0,-1),times=5)
# they are equal
order(x)==rank(x)