Program to iterate through lists within a list - r

I have a list containing a number of lists containing character vectors. The lists are always arranged so that the first list contains a vector with a single element, the second list contains a vector with two elements and the third contains one or more vectors containing three elements.
fruits <- list(
list(c("orange")),
list(c("pear", "orange")),
list(c("orange", "pear", "grape"),
c("orange", "lemon", "pear"))
)
I need to iterate through the lists in order to remove the elements from the vector in the previous list. i.e. I would first find the value from the vector in the first list ('orange') and remove it from the vector in the second list, then take the values from the second list ("pear", "orange") and remove them from both vectors in the third list, so I ended up with:
new_fruits <- list(
list(c("orange")),
list(c("pear")),
list(c("grape"),
c("lemon"))
)
I should add that I have had a go at doing this, but I'm finding the lists within lists make it quite complicated and my solution is long and not very efficient.

Maybe you can try the code below
new_fruits <- s <- c()
for (k in seq_along(fruits)) {
new_fruits[[k]] <- lapply(fruits[[k]],function(x) x[!x%in%s])
s <- union(s,unlist(fruits[[k]]))
}
which gives
> new_fruits
[[1]]
[[1]][[1]]
[1] "orange"
[[2]]
[[2]][[1]]
[1] "pear"
[[3]]
[[3]][[1]]
[1] "grape"
[[3]][[2]]
[1] "lemon"

Here is an idea where we unlist, convert to strings and resplit to differentiate between different vectors of same element. We then unlist one more time and get the unique values, i.e.
as.list(unique(unlist(strsplit(unlist(lapply(fruits, function(i) sapply(i, toString))), ', '))))
#[[1]]
#[1] "orange"
#[[2]]
#[1] "pear"
#[[3]]
#[1] "grape"
#[[4]]
#[1] "lemon"

Another two compact options:
> mapply(fruits,append(list(list("")),fruits[-length(fruits)], after = length(fruits)), FUN = function(x,y) sapply(x,function(item)list(setdiff(item,y[[1]]))))
[[1]]
[[1]][[1]]
[1] "orange"
[[2]]
[[2]][[1]]
[1] "pear"
[[3]]
[[3]][[1]]
[1] "grape"
[[3]][[2]]
[1] "lemon"
or also
> append(fruits[[1]],mapply(fruits[-1],fruits[-length(fruits)], FUN = function(x,y) sapply(x,function(item)list(setdiff(item,y[[1]])))), after = length(fruits))
[[1]]
[1] "orange"
[[2]]
[[2]][[1]]
[1] "pear"
[[3]]
[[3]][[1]]
[1] "grape"
[[3]][[2]]
[1] "lemon"

Related

Iteratively extract repeated word forms across speaking turns

I'm working on speaking turns in conversation. My interest is in the words that get repeated from a prior turn to a next turn:
turnsX <- data.frame(
speaker = c("A","B","A","B"),
speech = c("let's have a look",
"yeah let's take a look",
"yeah okay so where to start",
"let's start here"), stringsAsFactors = F
)
I want to extract the repeated word forms. To this end I've run a for loop, iteratively defining each speech turn as a regex pattern for the next speech turn and str_extracting the words that get repeated from turn to turn:
library(stringr)
pattern <- c()
extracted <- c()
for(i in 1:nrow(turnsX)){
pattern[i] <- paste0(unlist(str_split(turnsX$speech[i], " ")), collapse = "|")
extracted[i+1] <- str_extract_all(turnsX$speech[i+1], pattern[i])
}
The result however is partly incorrect:
extracted
[[1]]
NULL
[[2]]
[1] "a" "let's" "a" "a" "look"
[[3]]
[1] "yeah" "a" "a"
[[4]]
[1] "start"
[[5]]
[1] NA
The correct result should be:
extracted
[[1]]
NULL
[[2]]
[1] "let's" "a" "look"
[[3]]
[1] "yeah"
[[4]]
[1] "start"
Where's the mistake? How can the code be mended, or what other approach is there, to get the correct result?
Maybe you can use Map and %in%.
x <- strsplit(turnsX$speech, " ")
Map(function(y,z) y[y %in% z], x[-length(x)], x[-1])
#[[1]]
#[1] "let's" "a" "look"
#
#[[2]]
#[1] "yeah"
#
#[[3]]
#[1] "start"
Here's a base R approach using Map :
tmp <- strsplit(turnsX$speech, ' ')
c(NA, Map(intersect, tmp[-1], tmp[-length(tmp)]))
#[[1]]
#[1] NA
#[[2]]
#[1] "let's" "a" "look"
#[[3]]
#[1] "yeah"
#[[4]]
#[1] "start"
You want the word boundaries "\\b"
library(stringr)
pattern <- c()
extracted <- c()
for(i in 2:nrow(turnsX)){
pattern[i - 1] <- paste0(unlist(str_split(turnsX$speech[i - 1], " ")), collapse = "|\\b")
extracted[i] <- str_extract_all(turnsX$speech[i], pattern[i - 1])
}
# [[1]]
# NULL
#
# [[2]]
# [1] "let's" "a" "look"
#
# [[3]]
# [1] "yeah"
#
# [[4]]
# [1] "start"

How to perform a vectorised operation with a self-defined function, adding the results to a list?

library(tidyverse)
ridiculous_function <- function(a, b){
moo <- a
baz <- b
list(moo, baz)
}
test <- ridiculous_function("apple", "A")
> test
[[1]]
[1] "apple"
[[2]]
[1] "A"
This code produces a list of elements of a and b, however what I would like is to run the function over two vectors in parallel, and then put all of the results in the same list.
For example, with these two vectors:
fruits10 <- fruit[1:10]
letters10 <- LETTERS[1:10]
I would want to create a list which produces elements of character vectors for "apple", "A", "apricot", "B", "avocado", "C".. and so on. My real scenario is a lot more complex so I need a solution which works with the confines of my function.
Expected output:
> test
[[1]]
[1] "apple"
[[2]]
[1] "A"
[[3]]
[1] "apricot"
[[4]]
[1] "B"
[[5]]
[1] "avocado"
[[6]]
[1] "C"
....
[[19]]
[1] "blueberry"
[[20]]
[1] "T"
How about:
fruits10 <- fruit[1:10]
letters10 <- LETTERS[1:10]
ridiculous_function <- function(a, b){
moo <- a
baz <- b
list(moo, baz)
}
library(tidyverse)
flatten(map2(fruits10, letters10, ridiculous_function))
which gives you
[1]]
[1] "apple"
[[2]]
[1] "A"
[[3]]
[1] "apricot"
[[4]]
[1] "B"
[[5]]
[1] "avocado"
[[6]]
[1] "C"
[[7]]
[1] "banana"
[[8]]
[1] "D"
etc...
Here are a few different ways of doing this:
library(tidyverse)
fruits10 <- fruit[1:10]
letters10 <- LETTERS[1:10]
ridiculous_function <- function(a, b){
moo <- a
baz <- b
list(moo, baz)
}
# using mapply, base R for writing packages
mapply(ridiculous_function, fruits10, letters10) %>%
split(rep(1:ncol(.), each = nrow(.)))
# using map2, takes two args
map2(fruits10, letters10, ridiculous_function)
# using pmap, can take as many args as you want
list(a = fruits10,
b = letters10) %>%
pmap(ridiculous_function)
You ask for results in a flat list format, so you can pop a flatten at the end
of each of these, but usually you would want to retain the list structure.

How to associate the factor and lists?

I have a lot of named lists. Now I want to separate them according to the number of letter "a" within each element. For instants,
library(stringr)
data1 <- c("apple","appreciate","available","account","adapt")
data2 <- c("tab","banana","cable","tatabox","aaaaaaa","aaaaaaaaaaa")
list1 <- list(data1,data2)
names(list1) <- c("a","b")
ca <- lapply(list1, function(x) str_count(x, "a")) #counting letter a
factor1 <- lapply(ca,as.factor) #convert ca to factor
#is that possible to associate factor1 to list1, then I can separate
#elements depends on the factor1?
#ideal results
result$1 or result[1]
$`1`
$`a`$`1`
[1] "apple" "account"
$`b`$`1`
[1] "tab" "cable"
You can get very close with one line using split and Map:
Map(split, list1, Map(stringr::str_count, list1, "a"))
$a
$a$`1`
[1] "apple" "account"
$a$`2`
[1] "appreciate" "adapt"
$a$`3`
[1] "available"
$b
$b$`1`
[1] "tab" "cable"
$b$`2`
[1] "tatabox"
$b$`3`
[1] "banana"
$b$`7`
[1] "aaaaaaa"
$b$`11`
[1] "aaaaaaaaaaa"
This lists all the "a" elements first and then all the "b" elements grouped by the number of "a" characters.

Split a string on alternating index

I have a string similar to "HLeelmloon" which is two words interweaved together. How can I separate this into two separate words, splitting on alternating letters?
I can use strsplit() and a for loop to allocate alternating letters to two new vectors and then join the list but this seems very long winded:
string <- "HLeelmloon"
split<-el(strsplit(string,''))
> split
[1] "H" "L" "e" "e" "l" "m" "l" "o" "o" "n"
word1<-c()
word2<-c()
for(i in 1:length(split)){
if(i %% 2 == 1){
word1<-append(word1, split[i])
} else {
word2<-append(word2, split[i])
}
}
word1 = paste0(word1, collapse = '')
word2 = paste0(word2, collapse = '')
> word1
[1] "Hello"
> word2
[1] "Lemon"
My issue is it's not very elegant, and it doesn't upscale well if I want to split the string into N different words. Is there a better way to do this?
You could use gsub to capture alternating characters into the same group:
gsub("(.)(.)?", "\\1", string)
#[1] "Hello"
gsub("(.)(.)?", "\\2", string)
#[1] "Lemon"
You can do it by using TRUE and FALSE for indexing, i.e.
v1 = strsplit(string, '')[[1]]
paste(v1[c(TRUE, FALSE)], collapse = '')
#[1] "Hello"
paste(v1[c(FALSE, TRUE)], collapse = '')
#[1] "Lemon"
Considering your question is how to split into more than two words, you should use the split function. Using your example data can be a bit confusing because you chose to name one variable 'split'. In the following block, the first 'split' is the function, the second one your split variable.
number_of_words <- 2
lapply(split(split,1:number_of_words),paste0,collapse='')
$`1`
[1] "Hello"
$`2`
[1] "Lemon"
number_of_words <- 3
lapply(split(split,1:number_of_words),paste0,collapse='')
$`1`
[1] "Heln"
$`2`
[1] "Llo"
$`3`
[1] "emo"
To avoid confusion, here's the same code without the variable named split:
number_of_words <- 2
lapply(split(el(strsplit(string,'')),1:number_of_words),paste0,collapse='')
$`1`
[1] "Hello"
$`2`
[1] "Lemon"
Try this code:
paste0(split[seq(1,nchar(string),by = 2)],collapse="")
[1] "Hello"
> paste0(split[seq(2,nchar(string),by = 2)],collapse="")
[1] "Lemon"
It appends even and odd positions in the string string
Another way using your split variable, will work with any number of words:
N <- 2
apply(matrix(split,N),1,paste,collapse="")
# [1] "Hello" "Lemon"

Check list in list in R

I have 2 lists, I want to check if the second list in the first list, if yes, paste letters "a","b"... to each element in the first list
list1 <- list("Year","Age","Enrollment","SES","BOE")
list2 <- list("Year","Enrollment","SES")
I try to use lapply
text <- letters[1:length(list2)]
listText<- lapply(list1,function(i) ifelse(i %in% list2,paste(i,text[i],sep="^"),i))
I got wrong output
> listText
[[1]]
[1] "Year^NA"
[[2]]
[1] "Age"
[[3]]
[1] "Enrollment^NA"
[[4]]
[1] "SES^NA"
[[5]]
[1] "BOE"
This is the output I want
[[1]]
[1] "Year^a"
[[2]]
[1] "Age"
[[3]]
[1] "Enrollment^b"
[[4]]
[1] "SES^c"
[[5]]
[1] "BOE"
We can use match to find the index and then use it to subset the first list and paste the letters
i1 <- match(unlist(list2), unlist(list1))
list1[i1] <- paste(list1[i1], letters[seq(length(i1))], sep="^")
You just need change to :
text <- as.character(letters[1:length(list2)])
names(text) <- unlist(list2)
The result is :
> listText
[[1]]
[1] "Year^a"
[[2]]
[1] "Age"
[[3]]
[1] "Enrollment^b"
[[4]]
[1] "SES^c"
[[5]]
[1] "BOE"

Resources