Using purrr to extract elements whose names start with a common prefix from multiple lists - r

I have a list of lists. One element in each list has a name beginning with "n_". How do I extract these elements and store them in a separate list? Can I use a combination of map and starts_with?
E.g.:
m1 <- list(n_age = c(19,40,39),
names = c("a", "b", "c"))
m2 <- list(n_gender = c("m","f","f"),
names = c("f", "t", "d"))
nice_list <- list(m1, m2)
I was hoping something like the following would work (it doesn't!):
output <- map(nice_list, starts_with("n_"))

You could (ab)use partial matching of $:
map(nice_list, `$`, "n_")
(I don't really recommend it).
(And I can't figure out why lapply(nice_list, `$`, "n_") doesn't work; it gives list(NULL, NULL).)
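If you do want prefix matching inside lapply, `[[` with exact = FALSE makes the partial matching explicit and takes the prefix as an ordinary argument that lapply can pass through its `...`. A minimal sketch, assuming (as in the question) that each list has exactly one element whose name starts with "n_"; like `$`, it returns NULL when the prefix does not pick out a single element:

```r
# Example data from the question
m1 <- list(n_age = c(19, 40, 39), names = c("a", "b", "c"))
m2 <- list(n_gender = c("m", "f", "f"), names = c("f", "t", "d"))
nice_list <- list(m1, m2)

# exact = FALSE allows a unique partial match on the name, without a warning
n_values <- lapply(nice_list, `[[`, "n_", exact = FALSE)

# An ambiguous prefix ("n" matches both n_age and names) gives NULL
ambiguous <- m1[["n", exact = FALSE]]
```

Note that this returns the bare vectors without their names, unlike the `[`-based answers.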

How about this?
library(purrr)
map(nice_list, ~.x[grep("^n_", names(.x))])
#[[1]]
#[[1]]$n_age
#[1] 19 40 39
#
#
#[[2]]
#[[2]]$n_gender
#[1] "m" "f" "f"
Or, using starts_with() from tidyselect (note that it ignores case by default):
library(tidyselect)
map(nice_list, ~.x[starts_with("n_", vars = names(.x))])
Or, to flatten the nested list, you could do:
unlist(map(nice_list, ~.x[grep("^n_", names(.x))]), recursive = FALSE)
#$n_age
#[1] 19 40 39
#
#$n_gender
#[1] "m" "f" "f"
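For comparison, a base R sketch without purrr or tidyselect: startsWith() (in base R since 3.3.0) tests a literal prefix, so there is no regular expression to anchor. The helper name keep_prefixed is hypothetical:

```r
# Example data from the question
m1 <- list(n_age = c(19, 40, 39), names = c("a", "b", "c"))
m2 <- list(n_gender = c("m", "f", "f"), names = c("f", "t", "d"))
nice_list <- list(m1, m2)

# Hypothetical helper: keep only the elements whose names start with `prefix`
keep_prefixed <- function(x, prefix = "n_") {
  x[startsWith(names(x), prefix)]
}

nested <- lapply(nice_list, keep_prefixed)     # one sub-list per input list
flat   <- unlist(nested, recursive = FALSE)    # a single list: n_age, n_gender
```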

Related

Finding specific elements in lists

I am stuck on one of the challenges proposed in a tutorial I am reading.
# Using the following code:
challenge_list <- list(words = c("alpha", "beta", "gamma"),
                       numbers = 1:10,
                       letter = letters)
# challenge_list
# Extract the following things:
#
# - The word "gamma"
# - The letters "a", "e", "i", "o", and "u"
# - The numbers less than or equal to 3
I have tried the following:
## 1
challenge_list$"gamma"
## 2
challenge_list [[1]["gamma"]]
But nothing works.
> challenge_list$words[challenge_list$words == "gamma"]
[1] "gamma"
> challenge_list$letter[challenge_list$letter %in% c("a","e","i","o","u")]
[1] "a" "e" "i" "o" "u"
> challenge_list$numbers[challenge_list$numbers<=3]
[1] 1 2 3
We can write a function that subsets by value if its input is numeric and by membership otherwise, then use Map to pair each list element with its corresponding target (a word, a threshold, or a set of letters) and apply f1 to each pair. This returns a new list with the filtered values:
f1 <- function(x, y) if (is.numeric(x)) x[x <= y] else x[x %in% y]
out <- Map(f1, challenge_list, list('gamma', 3, c("a","e","i","o","u")))
out
Output:
#$words
#[1] "gamma"
#$numbers
#[1] 1 2 3
#$letter
#[1] "a" "e" "i" "o" "u"
Try this. Most R objects can be subset with brackets. For a list you combine two kinds, as in [[ ]][ ]: the double brackets extract an object from the list, and the single brackets then select elements inside that object. For a plain vector, a single pair of brackets with a condition is enough. Here is the code:
#Data
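challenge_list <- list(words = c("alpha", "beta", "gamma"),
                       numbers = 1:10,
                       letter = letters)
#Code
challenge_list[[1]][3]
# pull the vectors out of the list, then filter them with single brackets
letter <- challenge_list$letter
numbers <- challenge_list$numbers
letter[letter %in% c("a", "e", "i", "o", "u")]
numbers[numbers <= 3]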
Since your data is in a list, you can also index the elements by position, like this:
#Data 2
challenge_list <- list(words = c("alpha", "beta", "gamma"),numbers = 1:10,letter = letters)
#Code 2
challenge_list[[1]][3]
challenge_list[[3]][challenge_list[[3]] %in% c("a", "e", "i", "o","u")]
challenge_list[[2]][challenge_list[[2]]<=3]
Output:
challenge_list[[1]][3]
[1] "gamma"
challenge_list[[3]][challenge_list[[3]] %in% c("a", "e", "i", "o","u")]
[1] "a" "e" "i" "o" "u"
challenge_list[[2]][challenge_list[[2]]<=3]
[1] 1 2 3
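A small sketch of the bracket rules described above, using element names instead of positions so the code does not depend on the order of the list (the variable names are only for illustration):

```r
challenge_list <- list(words = c("alpha", "beta", "gamma"),
                       numbers = 1:10,
                       letter = letters)

# Single brackets on a list return a smaller list ...
sub_list <- challenge_list["words"]
# ... while [[ ]] (or $) returns the element itself, here a character vector
words_vec <- challenge_list[["words"]]

# [[ ]] picks the vector, then [ ] filters inside it
gamma_word <- challenge_list[["words"]][challenge_list[["words"]] == "gamma"]
vowels <- challenge_list[["letter"]][challenge_list[["letter"]] %in% c("a", "e", "i", "o", "u")]
```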

How to apply the same function to several variables in R?

I know that similar questions have already been asked (e.g. Passing list element names as a variable to functions within lapply or R - iteratively apply a function of a list of variables), but I couldn't manage to find a solution for my problem based on these posts.
I have an event dataset (~100 variables, >2000 observations) that contains variables with information on the involved actors. One variable can only contain one actor, so if several actors have been involved in the event, they are spread over several variables (e.g. actor1, actor2, ...). These actors can be classified into two groups ("s" and "nons"). For later use, I need two lists of actors: one that contains all actors of the category "s" and one that contains all actors of "nons". "s" only consists of three actors while "nons" consists of dozens of actors.
library(dplyr)
# create example data
df <- data.frame(id = c(1:8),
actor1 = c("A", "B", "D", "E", "F", "G", "H", NA),
actor2 = c("A", NA, "B", "C", "E", "I", "D", "G"))
df <-
df %>%
mutate(actor1 = as.character(actor1),
actor2 = as.character(actor2))
Since the script I am about to prepare is supposed to be used on updated versions of the dataset in the future, I would like to automate as much as possible and keep the parts of the script that would need to be adapted as limited as possible. My idea was to create one function per category that extracts the actors of the respective category (e.g. "nons") from one variable (e.g. actor1) in a list and then "loop" this function over the other variables (ideally with the apply family).
I know which category each actor belongs to ("A", "B", and "C" are category "s"), which allows me to define a separation rule as used in the function below (the filter command).
# create function
nons_function <- function(col) {
col_ <- enquo(col)
nons_list <-
df %>%
filter(!is.na(!!col_), !!col_ != "A", !!col_ != "B", !!col_ != "C") %>%
distinct(!!col_) %>%
pull()
nons_list
}
# create list of variables to "loop" over
actorlist <- c("actor1", "actor2")
This results in the following. Instead of two lists of actors I get a list that contains the variable names as character strings.
> lapply(actorlist, nons_function)
[[1]]
[1] "actor1"
[[2]]
[1] "actor2"
What I would like to get is something like the following:
> lapply(actorlist, nons_function)
[[1]]
[1] "D" "E" "F" "G" "H"
[[2]]
[1] "E" "I" "D" "G"
The problem is probably the way I am passing the variable names to my function within lapply. Apparently, my function is not able to use a character input as a variable name. However, I have not found a way either to adapt my function so that it accepts character input or to provide it with a list of variables in a form it can digest.
Any help appreciated!
EDIT: Initially I had named the actors in a misleading way (the actor names indicated which category an actor belongs to), which led to answers that do not really help in my case. I have now changed the actor names from "s1", "s2", "nons1", "nons2", etc. to "A", "B", "C", etc.
Here is an option using base R.
For nons-actors:
lapply( df[, 2:3], function(x) grep( "^nons", x, value = TRUE ) )
#$actor1
#[1] "nons1" "nons2" "nons3" "nons4" "nons5"
#
#$actor2
#[1] "nons2" "nons6" "nons1" "nons4"
and for s-actors:
lapply( df[, 2:3], function(x) grep( "^s", x, value = TRUE ) )
# $actor1
# [1] "s1" "s2"
#
# $actor2
# [1] "s1" "s2" "s3"
Here is an option:
library(dplyr)
library(stringr)
library(purrr)
map(actorlist, ~ df %>%
select(all_of(.x)) %>%
filter(!str_detect(!! rlang::sym(.x), "^s\\d+$")) %>%
pull(1))
#[[1]]
#[1] "nons1" "nons2" "nons3" "nons4" "nons5"
#[[2]]
#[1] "nons2" "nons6" "nons1" "nons4"
It can be wrapped in a function as well. Note that the input is a string, so instead of enquo, use rlang::sym to convert it to a symbol and then unquote it with !!:
f1 <- function(dat, colNm) {
dat %>%
select(all_of(colNm)) %>%
filter(!str_detect(!! rlang::sym(colNm), "^s\\d+$")) %>%
pull(1) %>%
unique
}
map(actorlist, f1, dat = df)
NOTE: This can be done more simply, but here we reuse code similar to the OP's.
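Another way to take the column name as a string is dplyr's .data pronoun, which avoids building a symbol by hand. A sketch on the renamed data from the EDIT, assuming (as stated in the question) that "A", "B" and "C" are the "s" actors; the `exclude` argument is a hypothetical addition, read through the .env pronoun so it cannot clash with a column name:

```r
library(dplyr)

# Example data from the question (actor names after the EDIT)
df <- data.frame(id = 1:8,
                 actor1 = c("A", "B", "D", "E", "F", "G", "H", NA),
                 actor2 = c("A", NA, "B", "C", "E", "I", "D", "G"),
                 stringsAsFactors = FALSE)

# `col` is a column name given as a string; .data[[col]] looks it up in `dat`
nons_function <- function(dat, col, exclude = c("A", "B", "C")) {
  dat %>%
    filter(!is.na(.data[[col]]), !(.data[[col]] %in% .env$exclude)) %>%
    distinct(.data[[col]]) %>%
    pull()   # the single remaining column as a vector
}

actorlist <- c("actor1", "actor2")
nons_lists <- lapply(actorlist, nons_function, dat = df)
```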
Another option is to use split with grepl in base R, which returns a list of both the 'nons' and the 's' actors after removing the NAs:
lapply(df[2:3], function(x) {
x1 <- x[!is.na(x)]
split(x1, grepl("nons", x1))})
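Since the actor names after the EDIT no longer encode the category, a pattern like grepl("nons", ...) cannot tell the groups apart. A sketch of the same split idea that tests membership in a known set instead (assuming, as the question states, that "A", "B" and "C" form category "s") and then pools each category across all actor columns into the two vectors the question asks for:

```r
# Example data from the question (actor names after the EDIT)
df <- data.frame(id = 1:8,
                 actor1 = c("A", "B", "D", "E", "F", "G", "H", NA),
                 actor2 = c("A", NA, "B", "C", "E", "I", "D", "G"),
                 stringsAsFactors = FALSE)
s_actors <- c("A", "B", "C")   # category "s"

actor_cols <- grep("^actor", names(df), value = TRUE)

# For each actor column: drop NAs, then split the values into "nons" and "s"
by_column <- lapply(df[actor_cols], function(x) {
  x1 <- x[!is.na(x)]
  split(x1, ifelse(x1 %in% s_actors, "s", "nons"))
})

# Pool each category across columns into one vector of distinct actors
nons_all <- unique(unlist(lapply(by_column, `[[`, "nons"), use.names = FALSE))
s_all    <- unique(unlist(lapply(by_column, `[[`, "s"), use.names = FALSE))
```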
Check my solution and see if it works for you.
library(dplyr)
# create example data
df <- data.frame(id = c(1:8),
actor1 = c("s1", "s2", "nons1", "nons2", "nons3", "nons4", "nons5", NA),
actor2 = c("s1", NA, "s2", "s3", "nons2", "nons6", "nons1", "nons4"))
df <-
df %>%
mutate(actor1 = as.character(actor1),
actor2 = as.character(actor2))
# Function for getting the category
category_function <- function(col, categ) {
  if (categ == "non") {
    outp <- grep("^non", col, value = TRUE)
  } else {
    outp <- grep("^s", col, value = TRUE)
  }
  return(outp)
}
# Apply the function to all variables whose name starts with "actor".
# lapply always returns a list; sapply would simplify to a matrix if every result had the same length.
lapply(df[grep("^actor", names(df), value = TRUE)], category_function, categ = "non")
lapply(df[grep("^actor", names(df), value = TRUE)], category_function, categ = "s")
My output was the following:
> lapply(df[grep("^actor", names(df), value = TRUE)], category_function, categ = "non")
$actor1
[1] "nons1" "nons2" "nons3" "nons4" "nons5"
$actor2
[1] "nons2" "nons6" "nons1" "nons4"
> lapply(df[grep("^actor", names(df), value = TRUE)], category_function, categ = "s")
$actor1
[1] "s1" "s2"
$actor2
[1] "s1" "s2" "s3"

Is there a R function for limiting the length of list elements?

I am struggling with a list manipulation in R right now. I have a list containing about 3000 elements, where each element is a character vector. The length of these character vectors is between 7 and 10.
I would like to manipulate this list so that character vectors with more than 7 elements are limited to their first 7 elements, i.e. the 8th, 9th, and 10th elements of those vectors are dropped.
Is there an easy way to do this? I hope you understand what I mean.
Thanks in advance!
You can use lapply as below:
mylist <- list(a = c("a", "b"),
b = c("a", "b", "c"))
mylist
$a
[1] "a" "b"
$b
[1] "a" "b" "c"
mylist2 <- lapply(mylist, function(x) {
x[seq_len(min(length(x), 2))]
})
mylist2
$a
[1] "a" "b"
$b
[1] "a" "b"
What you need is an auxiliary function that will shorten your vector. Something like
shorten_vector <- function(y, max_length = 7){
# NOTE: assumes that there are at least 7 elements in the vector.
y[seq_len(max_length)]
}
You can then shorten the vectors in your list with:
lapply(your_list, shorten_vector)
Or, better:
lapply(your_list, head, 7) # Thanks Moody
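The difference between the two versions shows up when a vector is shorter than the limit: indexing with seq_len(7) pads it with NA, while head() simply returns what is there. A small sketch with made-up vectors:

```r
char_list <- list(letters[1:9], letters[1:5])

# y[seq_len(7)] asks for positions 1..7, so a 5-element vector gets two NAs
padded  <- lapply(char_list, function(y) y[seq_len(7)])
# head(y, 7) returns at most 7 elements and never pads
trimmed <- lapply(char_list, head, 7)
```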
Reproducible example
# Make an object for an example. A list of length 15
# where each element is a character vector between length 7 and 10
random_length <- sample(7:10, 15, replace = TRUE)
char_list <-
lapply(random_length,
function(x){
letters[seq_len(x)]
})
# utility function
shorten_vector <- function(y, max_length = 7){
y[seq_len(max_length)]
}
lapply(char_list,
shorten_vector)
Bonus
You said in a comment on Sonny's answer that you weren't really sure how lapply works. At its conceptual core, lapply is a wrapper around a for loop. The equivalent for loop would be:
for(i in seq_along(char_list)){
char_list[[i]] <- shorten_vector(char_list[[i]])
}
char_list
The lapply just handles the iteration limits for you and looks a little cleaner on the screen.
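A quick check of that equivalence, on a small made-up list: the lapply call and the loop produce identical results.

```r
shorten_vector <- function(y, max_length = 7) {
  y[seq_len(max_length)]
}
char_list <- lapply(c(7, 10, 8), function(n) letters[seq_len(n)])

# lapply builds a new list from the function's return values
via_lapply <- lapply(char_list, shorten_vector)

# the loop overwrites each element of a copy in place
via_loop <- char_list
for (i in seq_along(via_loop)) {
  via_loop[[i]] <- shorten_vector(via_loop[[i]])
}
```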

Remove duplicated elements from list

I have a list of character vectors:
my.list <- list(e1 = c("a","b","c","k"),e2 = c("b","d","e"),e3 = c("t","d","g","a","f"))
I'm looking for a function that, for any character that appears more than once across the list's vectors (within a single vector, a character appears at most once), keeps only the first appearance.
The result list for this example would therefore be:
res.list <- list(e1 = c("a","b","c","k"),e2 = c("d","e"),e3 = c("t","g","f"))
Note that an entire vector in the list may be eliminated, so the number of elements in the resulting list does not necessarily equal that of the input list.
We can unlist the list, build a logical index with !duplicated, relist that index to the shape of 'my.list', and use it to extract the elements of 'my.list':
un <- unlist(my.list)
res <- Map(`[`, my.list, relist(!duplicated(un), skeleton = my.list))
identical(res, res.list)
#[1] TRUE
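When every element of a vector is a duplicate, the Map approach keeps that list entry as an empty character(0) rather than removing it. If you want such entries gone, as the question allows, Filter(length, ...) drops the zero-length ones. A sketch on made-up data where e2 disappears entirely:

```r
my.list <- list(e1 = c("a", "b"), e2 = c("b", "a"), e3 = c("c", "a"))

un <- unlist(my.list)
res <- Map(`[`, my.list, relist(!duplicated(un), skeleton = my.list))
# res$e2 is now character(0); Filter keeps only elements with non-zero length
res_nonempty <- Filter(length, res)
```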
Here is an alternative using mapply with setdiff and Reduce.
# make a copy of my.list
res.list <- my.list
# take set difference between contents of list elements and accumulated elements
res.list[-1] <- mapply("setdiff", res.list[-1],
                       head(Reduce(c, my.list, accumulate = TRUE), -1),
                       SIMPLIFY = FALSE)  # always return a list, even if all results have equal length
Keeping the first element of the list as is, we apply setdiff to each subsequent element and the cumulative vector of all preceding elements, which Reduce produces with c and accumulate = TRUE. head(..., -1) drops the final accumulated item, which contains all elements, so that the lengths align.
This returns
res.list
$e1
[1] "a" "b" "c" "k"
$e2
[1] "d" "e"
$e3
[1] "t" "g" "f"
Note that in Reduce, we could replace c with function(x, y) unique(c(x, y)) and accomplish the same ultimate output.
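That variant, with unique(c(x, y)) as the accumulator, can also be written with Map instead of mapply. Map always returns a list, so the result does not depend on whether the pieces happen to have equal lengths. A sketch on the question's data:

```r
my.list <- list(e1 = c("a", "b", "c", "k"), e2 = c("b", "d", "e"),
                e3 = c("t", "d", "g", "a", "f"))

# seen_before[[k]] holds every value in e1..ek, without repeats
seen_before <- head(Reduce(function(x, y) unique(c(x, y)), my.list,
                           accumulate = TRUE), -1)

res.list <- my.list
# remove from each later element whatever appeared in earlier elements
res.list[-1] <- Map(setdiff, my.list[-1], seen_before)
```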
I found the solutions here too complex to follow, so I looked for a simpler technique. Suppose you have the following list.
my_list <- list(a = c(1,2,3,4,5,5), b = c(1,2,2,3,3,4,4),
d = c("Mary", "Mary", "John", "John"))
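The following much simpler piece of code removes duplicates within each vector (note that, unlike the answers above, it does not remove values repeated across different vectors).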
sapply(my_list, unique)
You will end up with the following.
$a
[1] 1 2 3 4 5
$b
[1] 1 2 3 4
$d
[1] "Mary" "John"
There is beauty in simplicity!
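Two caveats with this approach, shown on small made-up lists: unique() only looks inside each vector, so values shared between vectors are kept, and sapply() simplifies its result to a matrix when all de-duplicated vectors have the same length, whereas lapply() always returns a list.

```r
same_len <- list(a = c(1, 1, 2), b = c(3, 4, 4))
simplified <- sapply(same_len, unique)   # 2 x 2 matrix with columns a and b
kept_list  <- lapply(same_len, unique)   # list(a = c(1, 2), b = c(3, 4))

# Duplicates across vectors survive: "x" stays in both elements
across <- lapply(list(p = c("x", "y"), q = c("x", "x")), unique)
```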

In R, how can I set the names of an object and return it in one line?

I would like to set the names of my R object and return it in one line. It should look something like:
names(doWork(), c("a", "b", "c"))
And perform the equivalent of:
x <- doWork()
names(x) <- c("a", "b", "c")
x
Is this possible?
You can try setNames
x <- setNames(doWork(), letters[1:3])
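stats::setNames does the three steps from the question inside a function: assign the names, then return the object. A sketch comparing the question's version, setNames, and a hand-written equivalent; doWork and set_names_inline are hypothetical stand-ins:

```r
# Hypothetical stand-in for the question's doWork()
doWork <- function() c(10, 20, 30)

# Three-step version from the question
x <- doWork()
names(x) <- c("a", "b", "c")

# One-line version
y <- setNames(doWork(), c("a", "b", "c"))

# Hypothetical hand-written equivalent of setNames
set_names_inline <- function(object, nm) {
  names(object) <- nm
  object
}
z <- set_names_inline(doWork(), c("a", "b", "c"))
```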
To add to what @rawr states:
`names<-`(x, letters[1:3])
works. This isn't very interesting for names, since setNames exists, but many other replacement functions have no non-replacement counterpart, so calling them directly can be useful (when playing code golf, say). For example, to set column names for a list of matrices:
mats <- replicate(2, matrix(sample(1:100, 4), 2), simplify = FALSE) # list of matrices
lapply(mats, `colnames<-`, LETTERS[1:2])
Produces:
[[1]]
A B
[1,] 78 59
[2,] 39 93
[[2]]
A B
[1,] 99 54
[2,] 1 16
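The same trick extends to giving each matrix its own column names: Map pairs every matrix with its own names vector and calls `colnames<-` on each pair. A sketch with fixed matrices so the result is reproducible:

```r
mats <- list(matrix(1:4, 2), matrix(5:8, 2))

# Same column names for every matrix
same <- lapply(mats, `colnames<-`, c("A", "B"))
# Different column names per matrix
different <- Map(`colnames<-`, mats, list(c("A", "B"), c("C", "D")))
```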
