How to subset from a list in R - r

I have a rather simple task but haven't found a good solution.
> mylist
[[1]]
[1] 1 2 3 4 5 6 7 8 9 10
[[2]]
[1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s" "t" "u" "v" "w" "x" "y" "z"
[[3]]
[1] 25 26 27 28 29 30 31 32
y <- c(3,5,9)
I would like to extract from mylist the sub-elements 3,5, and 9 of each component in the list.
I have tried sapply(mylist, "[[", y), but no luck, and others like vapply, lapply, etc.

You could use sapply(mylist, "[", y):
mylist <- list(1:5, 6:10, 11:15)
sapply(mylist, "[", c(2,3))

Try using [ instead of [[ (and depending on what you're after you might actually want lapply).
From ?'[[':
The most important distinction between [, [[ and $ is that the [ can
select more than one element whereas the other two select a single
element.
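To illustrate the point above with the data from the question, here is a minimal sketch. It assumes mylist is rebuilt as list(1:10, letters, 25:32); the names res_list and res_mat are only for illustration. lapply keeps each component's type, while sapply simplifies the three equal-length results into a single character matrix.

```r
# Rebuild the list from the question (assumed construction)
mylist <- list(1:10, letters, 25:32)
y <- c(3, 5, 9)

# `[` accepts several indices, so it can be passed straight to lapply/sapply
res_list <- lapply(mylist, `[`, y)  # list: each component keeps its own type
res_mat  <- sapply(mylist, `[`, y)  # simplified to a 3 x 3 character matrix

res_list
res_mat
```

Note that the third component has only eight elements, so asking for element 9 yields NA.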

Using lapply:
# create mylist
list1<-1:10
list2<-letters[1:26]
list3<-25:32
mylist<-list(list1,list2,list3)
# select 3,5,9th element from each list
list.2 <- lapply(mylist, function(x) {x[c(3,5,9)]})

purrr provides another way to do this kind of list manipulation within the tidyverse:
library(purrr)
library(dplyr)
desired_values <- c(1,3)
mylist <- list(1:5, letters[1:6], 11:15) %>%
purrr::map(`[`,desired_values)
mylist

An easy way to subset repeated named elements of a list, similar to other answers here.
(so I can find it next time I look this question up)
E.g., subset the "b" elements from a repeating list where each element includes an "a" and "b" sub-element:
mylist <- list(
list(
"a" = runif(3),
"b" = runif(1)
),
list(
"a" = runif(3),
"b" = runif(1)
)
)
mylist
#> [[1]]
#> [[1]]$a
#> [1] 0.7547490 0.6528348 0.2339767
#>
#> [[1]]$b
#> [1] 0.8815888
#>
#>
#> [[2]]
#> [[2]]$a
#> [1] 0.51352909 0.09637425 0.99291650
#>
#> [[2]]$b
#> [1] 0.8407162
blist <- lapply(
X = mylist,
FUN = function(x){x[["b"]]}
)
blist
#> [[1]]
#> [1] 0.8815888
#>
#> [[2]]
#> [1] 0.8407162
Created on 2019-11-06 by the reprex package (v0.3.0)

I don't think sgibb's answer gives what you would want. I suggest making a new function:
subsetList <- function(myList, elementNames) {
lapply(elementNames, FUN=function(x) myList[[x]])
}
Then you can use it like this:
x <- list(a=3, b="hello", c=4.5, d="world")
subsetList(x, c("d", "a"))
subsetList(x, c(4, 1))
These both give
[[1]]
[1] "world"
[[2]]
[1] 3
which is what you would want, I think.
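For comparison, single-bracket indexing on the list itself returns the same elements but keeps their names. This is a small sketch that repeats the subsetList definition from above so it runs on its own; the variable names by_name and by_pos are illustrative.

```r
# Helper from the answer above, repeated so the example is self-contained
subsetList <- function(myList, elementNames) {
  lapply(elementNames, FUN = function(x) myList[[x]])
}

x <- list(a = 3, b = "hello", c = 4.5, d = "world")

by_name <- x[c("d", "a")]  # named list: $d and $a
by_pos  <- x[c(4, 1)]      # same elements selected by position

# Dropping the names gives the same result as subsetList()
identical(unname(by_name), subsetList(x, c("d", "a")))
```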

There are better ways of doing this, but here's a quick solution.
# your values
list1<-1:10
list2<-letters[1:26]
list3<-25:32
# put 'em together in a list
mylist<-list(list1,list2,list3)
# function
foo<-function(x){x[c(3,5,9)]}
# apply function to each of the element in the list
foo(mylist[[1]])
foo(mylist[[2]])
foo(mylist[[3]])
# check the output
> foo(mylist[[1]])
[1] 3 5 9
> foo(mylist[[2]])
[1] "c" "e" "i"
> foo(mylist[[3]])
[1] 27 29 NA
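The NA above appears because the third component has only eight elements. If that is unwanted, one possible variation is to drop the indices that are out of range before subsetting. The helper name safe_subset is made up for this sketch and is not part of base R.

```r
mylist <- list(1:10, letters, 25:32)
y <- c(3, 5, 9)

# Hypothetical helper: keep only the indices that exist in x
safe_subset <- function(x, idx) x[idx[idx <= length(x)]]

res <- lapply(mylist, safe_subset, idx = y)
res
```

Whether dropping or keeping the NA is preferable depends on whether the results need to stay the same length.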

Related

Reorder vector so no certain items are positioned next to each other

Please consider the following example:
[[1]]
[1] 11 12 13 14
[[2]]
[1] 1 2 3
[[3]]
[1] 4
[[4]]
[1] 5
[[5]]
[1] 6
[[6]]
[1] 7
[[7]]
[1] 8
[[8]]
[1] 9
[[9]]
[1] 10
[[10]]
[1] 15
[[11]]
[1] 16
[[12]]
[1] 17
In this example, I have 12 unique values in a vector that is 17 elements long. For simplicity, let's say that this vector is:
foo_bar <- c("b","b","b","c","d","e","f","g","h","i","a","a","a","a", "j", "k", "l")
The first code block shows the index positions in foo_bar of each of the unique values (the letters a–l).
I am attempting to write an algorithm that reorders foo_bar so that, for all indices except the final one (index 17 in the foo_bar example), positions i and i+1 never contain the same value. Here's an example of what would be an appropriate outcome:
reordered_foo_bar <- c("b","c","b","d","b","e","f","g","h","a","i","a","j","a","k","a", "l")
something like this?
foo_bar <- c("b","b","b","c","d","e","f","g","h","i","a","a","a","a", "j", "k", "l")
test <- FALSE
while (test == FALSE) {
new_foo_bar <- sample(foo_bar, size = length(foo_bar), replace = FALSE)
test <- length(rle(new_foo_bar)$lengths) == length(foo_bar)
}
new_foo_bar
# [1] "f" "a" "g" "b" "h" "d" "j" "c" "e" "i" "a" "b" "k" "a" "l" "a" "b"
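The loop above keeps sampling until it succeeds, so it never terminates when no valid ordering exists. A possible variation, sketched here with a made-up function name (shuffle_no_adjacent) and an arbitrary default cap on attempts, gives up after max_tries and returns NULL instead.

```r
# Hypothetical wrapper around the rejection-sampling idea above
shuffle_no_adjacent <- function(x, max_tries = 10000) {
  for (attempt in seq_len(max_tries)) {
    candidate <- sample(x)
    # every run has length 1 exactly when no two neighbours are equal
    if (all(rle(candidate)$lengths == 1)) return(candidate)
  }
  NULL  # no valid ordering found within max_tries
}

foo_bar <- c("b","b","b","c","d","e","f","g","h","i","a","a","a","a","j","k","l")
set.seed(42)
shuffle_no_adjacent(foo_bar)
shuffle_no_adjacent(c("a", "a"), max_tries = 50)  # impossible case, returns NULL
```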
First we identify the indices of the unique values in the vector.
indices <-
unique(foo_bar) %>%
sort() %>%
lapply(function(x) which(foo_bar == x))
Then we create a position score based on 1) the rank of the value when the values are ordered by decreasing frequency and 2) how many previous occurrences of this value there have been, and we add these two values together. However, to ensure that we get a different value inserted between them, we divide 2) by 2. Finally, we order the position scores and reorder foo_bar with this new order.
This solution is also robust in case it is not possible to prevent duplicate values next to each other (for example because the values are c("a","a","b","a")).
out <-
lengths(indices) %>%
lapply(., function(x) 1:x) %>%
{lapply(seq_along(.), function(x) (unlist(.[x]) + x / 2))} %>%
unlist() %>%
order() %>%
{unlist(indices)[.]} %>%
foo_bar[.]
The output is then:
> "a" "b" "a" "c" "b" "d" "a" "e" "b" "f" "a" "g" "h" "i" "j" "k" "l"
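The same scoring can be written without pipes, which may make the arithmetic easier to follow. This is a sketch that mirrors the piped code as I read it: each element's score is its occurrence number within its value plus half the value's rank among the sorted unique values. The variable names are illustrative.

```r
foo_bar <- c("b","b","b","c","d","e","f","g","h","i","a","a","a","a","j","k","l")

vals <- sort(unique(foo_bar))
value_rank <- match(foo_bar, vals)  # 1 for "a", 2 for "b", ...

# occurrence number of each element within its own value (1, 2, 3, ...)
occurrence <- ave(seq_along(foo_bar), foo_bar, FUN = seq_along)

score <- occurrence + value_rank / 2

# ties in score are broken by the value's rank
reordered <- foo_bar[order(score, value_rank)]
reordered
```

On this input the result has no equal neighbours, but as the answer notes, that cannot be guaranteed for every input.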

Add positional index to a list

I would like to add a sequential element onto a list. Suppose I have the following list
lst <- list("A"=list(e1="a",e2="!"), "B"=list(e1="b", e2="#"))
$A
$A$e1
[1] "a"
$A$e2
[1] "!"
$B
$B$e1
[1] "b"
$B$e2
[1] "#"
I would like to append a e3 which is the position index of that element in the list so essentially I would like my list to be:
$A
$A$e1
[1] "a"
$A$e2
[1] "!"
$A$e3
[1] 1
$B
$B$e1
[1] "b"
$B$e2
[1] "#"
$B$e3
[1] 2
setNames(lapply(seq_along(lst), function(i){
temp = lst[[i]]
temp$e3 = i
temp
}), names(lst))
#$`A`
#$`A`$`e1`
#[1] "a"
#$`A`$e2
#[1] "!"
#$`A`$e3
#[1] 1
#$B
#$B$`e1`
#[1] "b"
#$B$e2
#[1] "#"
#$B$e3
#[1] 2
Here is a solution that doesn't assume that the sub-lists have the same known number of elements.
library("tidyverse")
library("glue")
lst <- list("A"=list(e1="a",e2="!"), "B"=list(e1="b", e2="#"))
# The part
# `setNames(list(.y), glue("e{length(.x) + 1}"))`
# creates a one-element list named accordingly to append to the previous list
map2(lst, seq(lst),
~ append(.x, setNames(list(.y), glue("e{length(.x) + 1}") )))
#> $A
#> $A$e1
#> [1] "a"
#>
#> $A$e2
#> [1] "!"
#>
#> $A$e3
#> [1] 1
#>
#>
#> $B
#> $B$e1
#> [1] "b"
#>
#> $B$e2
#> [1] "#"
#>
#> $B$e3
#> [1] 2
# If naming the additional element is not important, then this can simplified to
map2(lst, seq(lst), append)
# or
map2(lst, seq(lst), c)
Created on 2019-03-06 by the reprex package (v0.2.1)
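If glue and the tidyverse are not available, the same idea can be sketched in base R, building the new element name with paste0. In this example the second sub-list is given an extra element (a made-up e3 = "?") purely to show that the sub-lists may differ in length.

```r
lst <- list(A = list(e1 = "a", e2 = "!"),
            B = list(e1 = "b", e2 = "#", e3 = "?"))

res <- Map(function(x, i) {
  # name the new element after the current length: e3 for A, e4 for B
  x[[paste0("e", length(x) + 1)]] <- i
  x
}, lst, seq_along(lst))

str(res)
```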
Another option using Map
Map(function(x, y) c(x, "e3" = y), x = lst, y = seq_along(lst))
#$A
#$A$e1
#[1] "a"
#$A$e2
#[1] "!"
#$A$e3
#[1] 1
#$B
#$B$e1
#[1] "b"
#$B$e2
#[1] "#"
#$B$e3
#[1] 2
This could be written even more concisely as
Map(c, lst, e3 = seq_along(lst))
Thanks to @thelatemail
We can use a for loop as well
for(i in seq_along(lst)) lst[[i]]$e3 <- i
Assuming I understood correctly that you want to add a third element to each nested list containing the index of that list in its parent list, this works:
library(rlist)
lst <- list("A"=list(e1="a",e2="!"), "B"=list(e1="b", e2="#"))
for(i in seq_along(lst)){
lst[[i]] <- list.append(lst[[i]],e3=i)
}
lst
We can loop along the length of lst with lapply, adding this sequential index to each element.
lst2 <- lapply(seq_along(lst), function(i) {
df <- lst[[i]]
df$e3 <- i
return(df)
})
names(lst2) <- names(lst) # Preserve names from lst
Or, if you're not scared about modifying in place:
lapply(seq_along(lst), function(i) {
lst[[i]]$e3 <<- i
})
Both give the same output:
$A
$A$e1
[1] "a"
$A$e2
[1] "!"
$A$e3
[1] 1
$B
$B$e1
[1] "b"
$B$e2
[1] "#"
$B$e3
[1] 2

Appending list of dates to list of lists removes date format

I'm trying to append a list of dates to a list of lists such as myList below. This is working as expected except the date format for the date element in each list element is lost. Any ideas?
myList<-list(list("event"="A"),
list("event"="B"),
list("event"="C"))
dates<-as.Date(c("2011-06-05","2012-01-12","2016-05-09"))
outList<-mapply(FUN="c",myList,eventDate=as.list(dates),SIMPLIFY = FALSE)
I'm looking to achieve the below
[[1]]
[[1]]$event
[1] "A"
[[1]]$eventDate
[1] "2011-06-05"
[[2]]
[[2]]$event
[1] "B"
[[2]]$eventDate
[1] "2012-01-12"
[[3]]
[[3]]$event
[1] "C"
[[3]]$eventDate
[1] "2016-05-09"
Using Map, you can also create a small (lambda) function like so:
myList <- list(
list(event = "A"),
list(event = "B"),
list(event = "C")
)
dates <- as.Date(c("2011-06-05", "2012-01-12", "2016-05-09"))
outList <- Map(f = function(origList, date) {
origList$eventDate <- date
return(origList)
}, myList, dates)
outList
#> [[1]]
#> [[1]]$event
#> [1] "A"
#>
#> [[1]]$eventDate
#> [1] "2011-06-05"
#>
#>
#> [[2]]
#> [[2]]$event
#> [1] "B"
#>
#> [[2]]$eventDate
#> [1] "2012-01-12"
#>
#>
#> [[3]]
#> [[3]]$event
#> [1] "C"
#>
#> [[3]]$eventDate
#> [1] "2016-05-09"
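The reason the dates are converted to numbers is that c() dispatches on its first argument. When that argument is a plain list or a number, the default method is used, which drops the Date class and keeps only the underlying numeric value (the number of days since 1970-01-01).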
For example:
c(123, as.Date("2016-01-01"))
#> [1] 123 16801
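The same effect can be seen with a list as the first argument, which is the situation in the question. A minimal sketch, with illustrative variable names: combining a list with a bare Date loses the class, while wrapping the Date in its own list keeps it.

```r
d <- as.Date("2016-01-01")

# Date passed directly: the default c() method strips the class
dropped <- c(list(event = "A"), eventDate = d)

# Date wrapped in a list: the element is carried over unchanged
kept <- c(list(event = "A"), list(eventDate = d))

class(dropped$eventDate)
class(kept$eventDate)
```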
It may be better to assign by index, since c() can coerce the Date to its underlying numeric storage value:
for(i in seq_along(myList)) myList[[i]][['eventDate']] <- dates[i]
An additional list wrapper to insulate each Date element will also work here. I constructed that by running an lapply with the list function on the dates vector:
Map("c", myList, eventDate=lapply(dates, list))
[[1]]
[[1]]$event
[1] "A"
[[1]]$eventDate
[1] "2011-06-05"
[[2]]
[[2]]$event
[1] "B"
[[2]]$eventDate
[1] "2012-01-12"
[[3]]
[[3]]$event
[1] "C"
[[3]]$eventDate
[1] "2016-05-09"

Using lapply to apply function to each row in a tibble

I am trying to apply a function to each row in a tibble, mytib:
> mytib
# A tibble: 3 x 1
value
<chr>
1 1
2 2
3 3
Here is my code, where I'm attempting to apply a function to each line in the tibble:
mytib = as_tibble(c("1" , "2" ,"3"))
procLine <- function(f) {
print('here')
print(f)
}
lapply(mytib , procLine)
Using lapply :
> lapply(mytib , procLine)
[1] "here"
[1] "1" "2" "3"
$value
[1] "1" "2" "3"
This output suggests the function is not invoked once per line as I expect the output to be :
here
1
here
2
here
3
How do I apply a function to each row in a tibble?
Update: I appreciate the supplied answers that give my expected result, but what have I done incorrectly in my implementation? Shouldn't lapply apply a function to each element?
invisible is used to avoid displaying the output. Also you have to loop through elements of the column named 'value', instead of the column as a whole.
invisible( lapply(mytib$value , procLine) )
# [1] "here"
# [1] "1"
# [1] "here"
# [1] "2"
# [1] "here"
# [1] "3"
lapply loops through columns of a data frame by default. See the example below. The values of two columns are printed as a whole in each iteration.
mydf <- data.frame(a = letters[1:3], b = 1:3, stringsAsFactors = FALSE )
invisible(lapply( mydf, print))
# [1] "a" "b" "c"
# [1] 1 2 3
To iterate through each element of a column in a data frame, you have to loop twice like below.
invisible(lapply( mydf, function(x) lapply(x, print)))
# [1] "a"
# [1] "b"
# [1] "c"
# [1] 1
# [1] 2
# [1] 3
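When a function needs the values from several columns of the same row, another option is to iterate over row numbers instead of columns. This is a sketch using a plain data.frame so it runs without extra packages; the variable name row_strings is illustrative.

```r
mydf <- data.frame(a = letters[1:3], b = 1:3, stringsAsFactors = FALSE)

# A data frame is a list of columns, so its length is the number of columns
length(mydf)

# Loop over row indices and pick the matching value from each column
row_strings <- vapply(
  seq_len(nrow(mydf)),
  function(i) paste(mydf$a[i], mydf$b[i]),
  character(1)
)
row_strings
```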

Shuffling a vector - all possible outcomes of sample()?

I have a vector with five items.
my_vec <- c("a","b","a","c","d")
If I want to re-arrange those values into a new vector (shuffle), I could use sample():
shuffled_vec <- sample(my_vec)
Easy - but the sample() function only gives me one possible shuffle. What if I want to know all possible shuffling combinations? The various "combn" functions don't seem to help, and expand.grid() gives me every possible combination with replacement, when I need it without replacement. What's the most efficient way to do this?
Note that in my vector, I have the value "a" twice - therefore, in the set of shuffled vectors returned, they all should each have "a" twice in the set.
I think permn from the combinat package does what you want
library(combinat)
permn(my_vec)
A smaller example
> x
[1] "a" "a" "b"
> permn(x)
[[1]]
[1] "a" "a" "b"
[[2]]
[1] "a" "b" "a"
[[3]]
[1] "b" "a" "a"
[[4]]
[1] "b" "a" "a"
[[5]]
[1] "a" "b" "a"
[[6]]
[1] "a" "a" "b"
If the duplicates are a problem you could do something similar to this to get rid of duplicates
strsplit(unique(sapply(permn(my_vec), paste, collapse = ",")), ",")
Or, probably a better approach to removing duplicates, bind the permutations into a matrix and keep only the rows that are not duplicated:
dat <- do.call(rbind, permn(my_vec))
dat[!duplicated(dat), ]
Noting that your data is effectively 5 levels from 1-5, encoded as "a", "b", "a", "c", and "d", I went looking for ways to get the permutations of the numbers 1-5 and then remap those to the levels you use.
Let's start with the input data:
my_vec <- c("a","b","a","c","d") # the character
my_vec_ind <- seq(1,length(my_vec),1) # their identifier
To get the permutations, I applied the function given at Generating all distinct permutations of a list in R:
permutations <- function(n){
if(n==1){
return(matrix(1))
} else {
sp <- permutations(n-1)
p <- nrow(sp)
A <- matrix(nrow=n*p,ncol=n)
for(i in 1:n){
A[(i-1)*p+1:p,] <- cbind(i,sp+(sp>=i))
}
return(A)
}
}
First, create a data.frame with the permutations:
tmp <- data.frame(permutations(length(my_vec)))
You now have a data frame tmp of 120 rows, where each row is a unique permutation of the numbers, 1-5:
>tmp
X1 X2 X3 X4 X5
1 1 2 3 4 5
2 1 2 3 5 4
3 1 2 4 3 5
...
119 5 4 3 1 2
120 5 4 3 2 1
Now you need to remap them to the strings you had. You can remap them using a variation on the theme of gsub(), proposed here: R: replace characters using gsub, how to create a function?
gsub2 <- function(pattern, replacement, x, ...) {
for(i in 1:length(pattern))
x <- gsub(pattern[i], replacement[i], x, ...)
x
}
gsub() won't work because you have more than one value in the replacement array.
You also need a function you can call using lapply() to use the gsub2() function on every element of your tmp data.frame.
remap <- function(x,
old,
new){
return(gsub2(pattern = old,
replacement = new,
fixed = TRUE,
x = as.character(x)))
}
Almost there. We do the mapping like this:
shuffled_vec <- as.data.frame(lapply(tmp,
remap,
old = as.character(my_vec_ind),
new = my_vec))
which can be simplified to...
shuffled_vec <- as.data.frame(lapply(data.frame(permutations(length(my_vec))),
remap,
old = as.character(my_vec_ind),
new = my_vec))
.. should you feel the need.
That gives you your required answer:
> shuffled_vec
X1 X2 X3 X4 X5
1 a b a c d
2 a b a d c
3 a b c a d
...
119 d c a a b
120 d c a b a
Looking at a previous question (R: generate all permutations of vector without duplicated elements), I can see that the gtools package has a function for this. I couldn't however get this to work directly on your vector as such:
permutations(n = 5, r = 5, v = my_vec)
#Error in permutations(n = 5, r = 5, v = my_vec) :
# too few different elements
You can adapt it however like so:
apply(permutations(n = 5, r = 5), 1, function(x) my_vec[x])
# [,1] [,2] [,3] [,4]
#[1,] "a" "a" "a" "a" ...
#[2,] "b" "b" "b" "b" ...
#[3,] "a" "a" "c" "c" ...
#[4,] "c" "d" "a" "d" ...
#[5,] "d" "c" "d" "a" ...
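Because "a" appears twice, the 120 index permutations collapse to 5!/2! = 60 distinct shuffles. The following base R sketch shows one way to obtain them without extra packages. The helper name perm_indices is made up here; it follows the same recursive idea as the permutations function quoted earlier, and unique() then removes the repeated rows.

```r
# Hypothetical helper: all permutations of 1..n, one per row
perm_indices <- function(n) {
  if (n == 1) return(matrix(1L, 1, 1))
  prev <- perm_indices(n - 1)
  out <- NULL
  for (i in seq_len(n)) {
    # put i first, then shift the remaining indices >= i up by one
    out <- rbind(out, cbind(i, prev + (prev >= i)))
  }
  unname(out)
}

my_vec <- c("a", "b", "a", "c", "d")
idx <- perm_indices(length(my_vec))

# map indices back to values, one shuffle per row
all_perms <- matrix(my_vec[idx], nrow = nrow(idx))
distinct_perms <- unique(all_perms)

nrow(all_perms)
nrow(distinct_perms)
```

This enumerates all n! rows before removing duplicates, so it is only practical for short vectors.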
