R - sorting nested list - r

I have nested list, for example:
x <- c(as.list(c("b", 4)), as.list(c("a", 4)))
Is it possible to order it by the second element in the sublists?

I think you want this as an example:
x <- c(list(c("b", 4)), list(c("a", 4)), list(c("b", 3)) )
And to order by the second element in each list you can use this:
> x[ order ( sapply(x, "[[", 2) )]
[[1]]
[1] "b" "3"
[[2]]
[1] "b" "4"
[[3]]
[1] "a" "4"
The saplly(... , "[[" , <n>) paradigm is often useful for extracting from the results of strsplit:
> z <- strsplit(c( "test of sentence reading", "another test", "something esle") , split=" ")
> sapply(z, "[[", 2)
[1] "of" "test" "esle"

Related

Iteratively extract repeated word forms across speaking turns

I'm working on speaking turns in conversation. My interest is in the words that get repeated from a prior turn to a next turn:
turnsX <- data.frame(
speaker = c("A","B","A","B"),
speech = c("let's have a look",
"yeah let's take a look",
"yeah okay so where to start",
"let's start here"), stringsAsFactors = F
)
I want to extract the repeated word forms. To this end I've run a for loop, iteratively defining each speech turn as a regex pattern for the next speech turn and str_extracting the words that get repeated from turn to turn:
library(stringr)
pattern <- c()
extracted <- c()
for(i in 1:nrow(turnsX)){
pattern[i] <- paste0(unlist(str_split(turnsX$speech[i], " ")), collapse = "|")
extracted[i+1] <- str_extract_all(turnsX$speech[i+1], pattern[i])
}
The result however is partly incorrect:
extracted
[[1]]
NULL
[[2]]
[1] "a" "let's" "a" "a" "look"
[[3]]
[1] "yeah" "a" "a"
[[4]]
[1] "start"
[[5]]
[1] NA
The correct result should be:
extracted
[[1]]
NULL
[[2]]
[1] "let's" "a" "look"
[[3]]
[1] "yeah"
[[4]]
[1] "start"
Where's the mistake? How can the code be mended, or what other approach is there, to get the correct result?
Maybe you can use Map and %in%.
x <- strsplit(turnsX$speech, " ")
Map(function(y,z) y[y %in% z], x[-length(x)], x[-1])
#[[1]]
#[1] "let's" "a" "look"
#
#[[2]]
#[1] "yeah"
#
#[[3]]
#[1] "start"
Here's a base R approach using Map :
tmp <- strsplit(turnsX$speech, ' ')
c(NA, Map(intersect, tmp[-1], tmp[-length(tmp)]))
#[[1]]
#[1] NA
#[[2]]
#[1] "let's" "a" "look"
#[[3]]
#[1] "yeah"
#[[4]]
#[1] "start"
You want the word boundaries "\\b"
library(stringr)
pattern <- c()
extracted <- c()
for(i in 2:nrow(turnsX)){
pattern[i - 1] <- paste0(unlist(str_split(turnsX$speech[i - 1], " ")), collapse = "|\\b")
extracted[i] <- str_extract_all(turnsX$speech[i], pattern[i - 1])
}
# [[1]]
# NULL
#
# [[2]]
# [1] "let's" "a" "look"
#
# [[3]]
# [1] "yeah"
#
# [[4]]
# [1] "start"

Sort a list of vectors by a particular index

Given the following list :
l = list(
c('first', 5),
c('second',3),
c('third',2)
)
Which is
> l
[[1]]
[1] "first" "5"
[[2]]
[1] "second" "3"
[[3]]
[1] "third" "2"
How best to sort the elements of the list based on the second elements within
the elements. By this I mean that I have
c('first', 5),
c('second',3),
c('third',2)
And would like to have the ordering based on 5,3,2, giving
c('third',2)
c('second',3),
c('first', 5),
One approach would be :
x=as.double(sapply(l, function(x) x[2]))
l[order(x)]
I'm not sure if there's a better approach though.
An option is to extract the 2nd element with [, convert to numeric, order and use the index for ordering the list 'l'
l[order(as.numeric(sapply(l, `[`, 2)))]
Or unlist, then extract the 2nd element with a recycling logical index, convert to numreic and order
l[order(as.numeric(unlist(l)[c(FALSE, TRUE)]))]
Or using a faster approach with vapply
l[order(as.numeric(vapply(l, `[`, 2, FUN.VALUE = character(1))))]
Or with map and pluck
library(dplyr)
library(purrr)
map_chr(l, pluck, 2) %>%
as.integer %>%
order %>%
l[.]
Maybe data frame can help you somewhat, i.e.
l[order(as.numeric(data.frame(l)[2,]))]
which gives
> l[order(as.numeric(data.frame(l)[2,]))]
[[1]]
[1] "third" "2"
[[2]]
[1] "second" "3"
[[3]]
[1] "first" "5"

How to associate the factor and lists?

I have a lot of named lists. Now I want to separate them according to the number of letter "a" within each element. For instants,
library(stringr)
data1 <- c("apple","appreciate","available","account","adapt")
data2 <- c("tab","banana","cable","tatabox","aaaaaaa","aaaaaaaaaaa")
list1 <- list(data1,data2)
names(list1) <- c("a","b")
ca <- lapply(list1, function(x) str_count(x, "a")) #counting letter a
factor1 <- lapply(ca,as.factor) #convert ca to factor
#is that possible to associate factor1 to list1, then I can separate
#elements depends on the factor1?
#ideal results
result$1 or result[1]
$`1`
$`a`$`1`
[1] "apple" "account"
$`b`$`1`
[1] "tab" "cable"
You can get very close with one line using split and Map:
Map(split, list1, Map(stringr::str_count, list1, "a"))
$a
$a$`1`
[1] "apple" "account"
$a$`2`
[1] "appreciate" "adapt"
$a$`3`
[1] "available"
$b
$b$`1`
[1] "tab" "cable"
$b$`2`
[1] "tatabox"
$b$`3`
[1] "banana"
$b$`7`
[1] "aaaaaaa"
$b$`11`
[1] "aaaaaaaaaaa"
This lists all the "a" elements first and then all the "b" elements grouped by the number of "a" characters.

Extract list element by successive / cascading index chain

ll<-list(list(c('A', 'B', 'C'),"Peter"),"John","Hans")
looks like:
[[1]]
[[1]][[1]]
[1] "A" "B" "C"
[[1]][[2]]
[1] "Peter"
[[2]]
[1] "John"
[[3]]
[1] "Hans"
Lets say I have the indices in a list for "Peter" and "B" respectively.
peter.ind <- list(1,2) # correlates with ll[[1]][[2]]
B.ind <- list(1,1,2) # correlates with ll[[1]][[1]][[2]]
So how can I most effectively extract a "tangled" list element by its cascaded index chain?
Here is my already working function:
extract0r <- function(x,l) {
for(ind in l) {
x <- x[[ind]]
}
return(x)
}
call function:
extract0r(ll,peter.ind) #evals [1] "Peter"
extract0r(ll,B.ind) #evals [1] "B"
Is there a neater alternative to my function?
You can use a recursive function:
ll <- list(list(c('A', 'B', 'C'),"Peter"),"John","Hans")
my.ind <- function(L, ind) {
if (length(ind)==1) return(L[[ind]])
my.ind(L[[ind[1]]], ind[-1])
}
my.ind(ll, c(1,2))
my.ind(ll, c(1,1,2))
# > my.ind(ll, c(1,2))
# [1] "Peter"
# > my.ind(ll, c(1,1,2))
# [1] "B"
The recursive function has a (relative) clear coding, but during execution it has an overhead for the deep function calls.
There are many ways of doing this.
For example, you can build the commands from character strings:
my.ind.str <- function(L, ind) {
command <- paste0(c("L",sprintf("[[%i]]", ind)),collapse="")
return(eval(parse(text=command)))
}
With your example, I had to convert the lists of indices to vectors:
my.ind.str(ll, unlist(peter.ind))
[1] "Peter"
my.ind.str(ll, unlist(B.ind))
[1] "B"

R: Removing blanks from the list

I'm wondering if there is any way to remove blanks from the list.
As far as I've searched, I found out that there are many Q&As for removing
the whole element from the list, but couldn't find the one regarding
a specific component of the element.
To be specific, the list now I'm working with looks like this:
[[1]]
[1] "1" "" "" "2" "" "" "3"
[[2]]
[1] "weak"
[[3]]
[1] "22" "33"
[[4]]
[1] "44" "34p" "45"
From above, you can find " ", which should be removed.
I've tried different commands like
text.words.bl <- text.words.ll[-which(text.words.ll==" ")]
text.words.bl <- text.words.ll[!sapply(text.words.ll, is.null)]
etc, but seems like " "s in [[1]] of the list still remains.
Is it impossible to apply commands to small pieces in each element of the list?
(e.g. 1, 2, weak, 22, 33... respectively)
I've used "lapply" function to run specific commands to each elements,
and it seemed like those lapply commands all worked....
JY
Use %in%, but negate it with !:
## Sample data:
L <- list(c(1, 2, "", "", 4), c(1, "", "", 2), c("", "", 3))
L
# [[1]]
# [1] "1" "2" "" "" "4"
#
# [[2]]
# [1] "1" "" "" "2"
#
# [[3]]
# [1] "" "" "3"
The replacement:
lapply(L, function(x) x[!x %in% ""])
# [[1]]
# [1] "1" "2" "4"
#
# [[2]]
# [1] "1" "2"
#
# [[3]]
# [1] "3"
Obviously, assign the output to "L" if you want to overwrite the original dataset:
L[] <- lapply(L, function(x) x[!x %in% ""])
Another way would be to use nchar(). I borrowed L from #Ananda Mahto.
lapply(L, function(x) x[nchar(x) >= 1])
#[[1]]
#[1] "1" "2" "4"
#
#[[2]]
#[1] "1" "2"
#
#[[3]]
#[1] "3"

Resources