Build a dataframe from pairwise combinations of list elements - r

I have a list named list. The first 5 elements of this list are:
[[1]]
[1] "#solarpanels" "#solar"
[[2]]
[1] "#Nuclear" "#Wind" "#solar"
[[3]]
[1] "#solar"
[[4]]
[1] "#steel" "#windenergy" "#solarenergy" "#carbonfootprint"
[[5]]
[1] "#solar" "#wind"
I would like to delete elements like [[3]] because they contain only one element. Moreover, I would like to build a data frame containing all the possible pairwise combinations within each element of the list, for example a data frame with two columns (the first named A, the second B) such as:
A B
"#solarpanels" "#solar"
"#Nuclear" "#Wind"
"#Nuclear" "#solar"
"#Wind" "#solar"
"#steel" "#windenergy"
"#steel" "#solarenergy"
"#steel" "#carbonfootprint"
"#windenergy" "#carbonfootprint"
"#windenergy" "#solarenergy"
"#solarenergy" "#carbonfootprint"
"#solar" "#wind"
I tried the following (for a single element only):
for (i in 1:(length(list[[4]])-1)) {
df$A = rep(list[[4]][i],length(list[[4]])-i)
df$B = list[[4]][(i+1):length(list[[4]])]
}
where
df=data.frame(A=character(),
B=character(),
stringsAsFactors=FALSE)
but for i = 1 I obtained:
Error in `$<-.data.frame`(`*tmp*`, A, value = c("#steel", "#steel", :
replacement has 3 rows, data has 0

Your data first:
l = list(
c("#solarpanels", "#solar"),
c("#Nuclear", "#Wind", "#solar"),
"#solar",
c("#steel", "#windenergy", "#solarenergy", "#carbonfootprint"),
c("#solar", "#wind")
)
Here's a two-liner version:
l = l[lengths(l) > 1L]
data.frame(do.call(rbind, unlist(lapply(l, combn, 2L, simplify = FALSE), recursive = FALSE)))
# X1 X2
# 1 #solarpanels #solar
# 2 #Nuclear #Wind
# 3 #Nuclear #solar
# 4 #Wind #solar
# 5 #steel #windenergy
# 6 #steel #solarenergy
# 7 #steel #carbonfootprint
# 8 #windenergy #solarenergy
# 9 #windenergy #carbonfootprint
# 10 #solarenergy #carbonfootprint
# 11 #solar #wind
More slowly, for clarity:
combn(x, k) returns every possible (unordered) subset of size k from x; what you're after is the pairs from each element of the list. By default, it returns this as a matrix with p = choose(length(x), k) columns, but that's not a helpful format for your use case; simplify = FALSE returns each subset as a new element of a list instead.
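To see the difference between the two return formats, here is a small sketch using the fourth element of your list (the names x, m and pairs are just for illustration):

```r
x <- c("#steel", "#windenergy", "#solarenergy", "#carbonfootprint")

# Default: a 2 x choose(4, 2) character matrix, one pair per column
m <- combn(x, 2L)
dim(m)

# simplify = FALSE: a list of choose(4, 2) = 6 length-2 character vectors
pairs <- combn(x, 2L, simplify = FALSE)
length(pairs)
pairs[[1]]  # the first pair: "#steel" and "#windenergy"
```

Each column of m holds the same pair as the corresponding element of pairs; only the container differs.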
So lapply(l, combn, 2L, simplify = FALSE) will look something like:
# [[1]]
# [[1]][[1]]
# [1] "#solarpanels" "#solar"
#
#
# [[2]]
# [[2]][[1]]
# [1] "#Nuclear" "#Wind"
#
# [[2]][[2]]
# [1] "#Nuclear" "#solar"
(We have to filter out the length-1 elements of l first, since asking combn for 2 elements from a length-1 object is an error; hence the first line.)
The lapply(.) bit is the crux of your issue; the rest is just kludging the output (which already has all the correct data) into a data.frame format.
First, the lapply output is nested: it's a list of lists. It's more uniform to have a list of length-2 vectors; unlist(., recursive = FALSE) accomplishes this by un-nesting only the first level of lists (with recursive = TRUE, we'd wind up with one long vector and lose the paired structure; we could work with that, but it's a bit unnatural).
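A toy illustration of the difference, using a hypothetical nested list shaped like the lapply output:

```r
nested <- list(
  list(c("a", "b")),
  list(c("c", "d"), c("c", "e"))
)

# recursive = FALSE removes only the outer level: a list of three pairs
flat_pairs <- unlist(nested, recursive = FALSE)
str(flat_pairs)

# The default recursive = TRUE flattens everything into one character vector
flat_all <- unlist(nested)
flat_all
```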
Next, we turn the list of length-2 vectors into a matrix (with an eye to the end goal -- a 2-column matrix is very easy to convert to a data.frame); list->matrix is done in base with do.call(rbind, .).
Finally, we pass this to data.frame, et voilà!
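Since the question asks for columns named A and B rather than X1 and X2, one way to get there is to build the matrix as above and name the columns when creating the data frame. This is a sketch; naming the columns inside the data.frame() call is my choice, not part of the answer above:

```r
l <- list(
  c("#solarpanels", "#solar"),
  c("#Nuclear", "#Wind", "#solar"),
  "#solar",
  c("#steel", "#windenergy", "#solarenergy", "#carbonfootprint"),
  c("#solar", "#wind")
)

# Drop length-1 elements, since combn() cannot pick 2 items from 1
l <- l[lengths(l) > 1L]

# One length-2 vector per pair, stacked into an 11 x 2 matrix
pairs <- unlist(lapply(l, combn, 2L, simplify = FALSE), recursive = FALSE)
m <- do.call(rbind, pairs)

# Name the columns A and B as in the question
df <- data.frame(A = m[, 1], B = m[, 2], stringsAsFactors = FALSE)
df
```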
In data.table, I would do it slightly more cleanly and in one command:
library(data.table)
setDT(transpose(
unlist(lapply(l[lengths(l) > 1L], combn, 2L, simplify = FALSE), recursive = FALSE)
))[]
Given you likely don't care much about intermediate output, this would also be a good place to use magrittr:
library(magrittr)
l[lengths(l) > 1L] %>%
lapply(combn, 2L, simplify = FALSE) %>%
unlist(recursive = FALSE) %>%
do.call(rbind, . ) %>%
data.frame
It's more readable, but in this case, it might be nice to see that data.frame is the end goal up-front, as the intent of the unlist & do.call steps might otherwise be obscure.

Related

How to use grep to search for patterns matches within a list of data frames using a second list of character vectors in R

I have two lists in R. One is a list of data frames whose rows contain strings (List 1). The other is a list of the same length containing character strings (List 2). I would like to go through the two lists in parallel, taking the character string from List 2 and using grep to find its position in the data frame at the corresponding element of List 1. Here is a toy example to show what my lists look like:
List1 <- list(data.frame(a = c("other","other","dog")),
data.frame(a = c("cat","other","other")),
data.frame(a = c("other","other","bird")))
List2 <- list("a" = c("dog|xxx|xxx"),
"a" = c("cat|xxx|xxx"),
"a" = c("bird|xxx|xxx"))
The output I would like is a list giving, for each data frame in List 1, the position of the pattern match; in this example, the positions are 3, 1 and 3. So the list would be:
[[1]]
[1] 3
[[2]]
[1] 1
[[3]]
[1] 3
I cannot seem to figure out how to do this.
I tried lapply:
NewList1 <- lapply(1:length(List1),
function(x) grep(List2[[x]]))
But that does not work. I also tried purrr::map2:
NewList2<-map2(List2, List1, grep(List2$A, List1))
This also does not work. I would be very grateful for any suggestions on how to fix this. Many thanks to anyone willing to wade in!
Try Map + unlist
> Map(grep, List2, unlist(List1, recursive = FALSE))
$a
[1] 3
$a
[1] 1
$a
[1] 3
Using Map, you can do:
Map(function(x, y) grep(y, x$a), List1, List2)
#[[1]]
#[1] 3
#[[2]]
#[1] 1
#[[3]]
#[1] 3
The map2 attempt was close, but the third argument must be a function (here a purrr formula) that refers to the current list elements as .x and .y:
purrr::map2(List2, List1, ~grep(.x, .y$a))
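For completeness, the original lapply attempt can also be repaired: it needs to index both lists with the same i and pass grep its x argument (column a of each data frame). A minimal base R sketch, using seq_along() instead of 1:length() so an empty list is handled safely:

```r
List1 <- list(data.frame(a = c("other", "other", "dog")),
              data.frame(a = c("cat", "other", "other")),
              data.frame(a = c("other", "other", "bird")))
List2 <- list("a" = c("dog|xxx|xxx"),
              "a" = c("cat|xxx|xxx"),
              "a" = c("bird|xxx|xxx"))

# grep(pattern, x): pattern from List2, x is column a of the matching data frame
NewList1 <- lapply(seq_along(List1),
                   function(i) grep(List2[[i]], List1[[i]]$a))
NewList1
```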

Filtering a list of lists based on list combinations - R

I have a list of lists:
library(partitions)
listParts(3)
[[1]]
[1] (1,2,3)
[[2]]
[1] (1,3)(2)
[[3]]
[1] (1,2)(3)
[[4]]
[1] (2,3)(1)
[[5]]
[1] (1)(2)(3)
I need to filter out certain lists based on combinations, as they are not feasible. For example, list [[4]] is not possible because (2,3) cannot be a block without 1. How can I filter based on a combination rule set, e.g. remove combinations where 2 and 3 are in a block without 1?
We can borrow the method from this answer to "Find vector in list of vectors" and wrap it in our own function.
library(partitions)
p = listParts(3)
detect = function(p, pattern) {
Position(function(x) identical(x, pattern), p, nomatch = 0) > 0
}
test = sapply(p, detect, pattern = 2:3)
p[!test]
# [[1]]
# [1] (1,2,3)
#
# [[2]]
# [1] (1,3)(2)
#
# [[3]]
# [1] (1,2)(3)
#
# [[4]]
# [1] (1)(2)(3)
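The answer above removes partitions containing exactly the block c(2, 3), which covers the n = 3 case. If the rule is meant more generally ("2 and 3 in the same block without 1"), a membership test with %in% expresses it directly. This is a sketch on plain lists of integer vectors that mirror the listParts(3) output; violates is a hypothetical helper name, and passing listParts() objects to it directly is an assumption I have not checked:

```r
# Hypothetical helper: TRUE if some block holds all of `together`
# but not all of `required`
violates <- function(partition, together, required) {
  any(vapply(partition,
             function(block) all(together %in% block) &&
                             !all(required %in% block),
             logical(1)))
}

# Hand-built stand-in for listParts(3): each partition is a list of blocks
parts <- list(list(1:3),
              list(c(1L, 3L), 2L),
              list(1:2, 3L),
              list(2:3, 1L),
              list(1L, 2L, 3L))

keep <- parts[!vapply(parts, violates, logical(1),
                      together = 2:3, required = 1L)]
length(keep)
```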

r: how to partition a list or vector into pairs at an offset of 1

Sorry for the elementary question, but I need to partition a list of numbers into overlapping pairs at an offset of 1.
For example,
I have a vector like:
c(194187, 193668, 192892, 192802 ..)
and need a list of pairs like:
list(c(194187, 193668), c(193668, 192892), c(192892, 192802)...)
where the last element of pair n is the first element of pair n+1. There must be a way to do this with
split()
but I can't figure it out.
In Mathematica, the command I need is Partition[list,2,1].
You can try this, using the zoo library:
library(zoo)
x <- 1:10 # Vector of 10 numbers
m <- rollapply(data = x, width = 2, FUN = c, by = 1) # (n-1) x 2 matrix; each row is one pair
l <- split(m, row(m)) # Split the matrix into a list with one pair per element
Output:
> l
$`1`
[1] 1 2
$`2`
[1] 2 3
$`3`
[1] 3 4
Here is a base R option that creates a vector of the paired elements:
v1 <- rbind(x[-length(x)], x[-1])
c(v1)
#[1] 194187 193668 193668 192892 192892 192802
If we need a list:
split(v1, col(v1))
data
x <- c(194187, 193668, 192892, 192802)
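Another base R option, not from the answers above, pairs each element with its successor using Map and returns the list of pairs directly:

```r
x <- c(194187, 193668, 192892, 192802)

# Map(c, a, b) builds c(a[i], b[i]) for each i, giving overlapping pairs
pairs <- Map(c, x[-length(x)], x[-1])
pairs
```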

Why doesn't is.vector return TRUE on a data frame?

tl;dr - What the hell is a vector in R?
Long version:
Lots of stuff is a vector in R. For instance, a number is a numeric vector of length 1:
is.vector(1)
[1] TRUE
A list is also a vector.
is.vector(list(1))
[1] TRUE
OK, so a list is a vector. And a data frame is a list, apparently.
is.list(data.frame(x=1))
[1] TRUE
But (seemingly violating the transitive property) a data frame is not a vector, even though a data frame is a list and a list is a vector. EDIT: It is a vector; it just has additional attributes, which leads to this behavior. See the accepted answer below.
is.vector(data.frame(x=1))
[1] FALSE
How can this be?
To answer your question another way, the R Internals manual lists R's eight built-in vector types: "logical", "numeric", "character", "list", "complex", "raw", "integer", and "expression".
To test whether the non-attribute part of an object is really one of those vector types "underneath it all", you can examine the results of is(), like this:
isVector <- function(X) "vector" %in% is(X)
df <- data.frame(a=1:4)
isVector(df)
# [1] TRUE
# Use isVector() to examine a number of other vector and non-vector objects
la <- structure(list(1:4), mycomment="nothing")
chr <- "word" ## STRSXP
lst <- list(1:4) ## VECSXP
exp <- expression(rnorm(99)) ## EXPRSXP
rw <- raw(44) ## RAWSXP
nm <- as.name("x") ## SYMSXP
pl <- pairlist(b=5:8) ## LISTSXP
sapply(list(df, la, chr, lst, exp, rw, nm, pl), isVector)
# [1] TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE
Illustrating what @joran pointed out: is.vector returns FALSE for a vector that has any attributes other than names (I never knew that)...
# 1) Example of when a vector stops being a vector...
> dubious = 7:11
> attributes(dubious)
NULL
> is.vector(dubious)
[1] TRUE
#now assign some additional attributes
> attributes(dubious) <- list(a = 1:5)
> attributes(dubious)
$a
[1] 1 2 3 4 5
> is.vector(dubious)
[1] FALSE
# 2) Example of how to strip a dataframe of attributes so it looks like a true vector ...
> df = data.frame()
> attributes(df)
$names
character(0)
$row.names
integer(0)
$class
[1] "data.frame"
> attributes(df)[['row.names']] <- NULL
> attributes(df)[['class']] <- NULL
> attributes(df)
$names
character(0)
> is.vector(df)
[1] TRUE
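A compact check of the "any attributes other than names" rule, and of why a data frame fails it even after unclass() (its row.names attribute survives):

```r
v <- c(a = 1, b = 2)                 # only a names attribute
w <- structure(1:2, note = "extra")  # an extra, non-names attribute
d <- data.frame(x = 1)

is.vector(v)           # names alone are allowed
is.vector(w)           # any other attribute makes this FALSE
is.vector(unclass(d))  # class is gone, but row.names is still there
is.vector(as.list(d))  # as.list() also drops row.names, leaving only names
```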
Not an answer, but here are some other interesting things that are definitely worth investigating. Some of this has to do with the way objects are stored in R.
One example:
If we set up a matrix of one element, that element being a list, we get the following. Even though it's a list, it can be stored in one element of the matrix.
> x <- matrix(list(1:5)) # we already know that list is also a vector
> x
# [,1]
# [1,] Integer,5
Now if we coerce x to a data frame, its dimensions are still (1, 1):
> y <- as.data.frame(x)
> dim(y)
# [1] 1 1
Now, if we look at the first element of y, we get the data frame column:
> y[1]
# V1
# 1 1, 2, 3, 4, 5
But if we look at the first column of y, it's a list:
> y[,1]
# [[1]]
# [1] 1 2 3 4 5
which is exactly the same as the first row of y.
> y[1,]
# [[1]]
# [1] 1 2 3 4 5
There are a lot of properties about R objects that are cool to investigate if you have the time.

Apply function to corresponding elements in list of data frames

I have a list of data frames in R. All of the data frames in the list are the same size. However, their columns may be of different types. For example, one data frame holds the numbers 1 to 6 and another the letters "a" to "f", each in two rows and three columns.
I would like to apply a function to corresponding elements of the data frames. For example, I want to use the paste function to produce a data frame such as
"1a" "2b" "3c"
"4d" "5e" "6f"
Is there a straightforward way to do this in R? I know it is possible to use the Reduce function to apply a function to corresponding elements of data frames within lists, but using Reduce in this case does not seem to have the desired effect.
Reduce(paste,l)
Produces:
"c(1, 4) c(\"a\", \"d\")" "c(2, 5) c(\"b\", \"e\")" "c(3, 6) c(\"c\", \"f\")"
Wondering if I can do this without writing messy for loops. Any help is appreciated!
Instead of Reduce, use Map.
# not quite the same as your data
l <- list(data.frame(matrix(1:6,ncol=3)),
data.frame(matrix(letters[1:6],ncol=3), stringsAsFactors=FALSE))
# this returns a list
LL <- do.call(Map, c(list(f=paste0),l))
#
as.data.frame(LL)
# X1 X2 X3
# 1 1a 3c 5e
# 2 2b 4d 6f
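Judging from the Reduce() output in the question, the asker's data frames appear to hold 1 to 6 and "a" to "f" filled by row. Assuming that, building the example with byrow = TRUE makes the same Map call produce exactly the requested layout:

```r
# Assumed reconstruction of the question's data: rows 1 2 3 / 4 5 6 and a b c / d e f
l <- list(
  data.frame(matrix(1:6, ncol = 3, byrow = TRUE)),
  data.frame(matrix(letters[1:6], ncol = 3, byrow = TRUE), stringsAsFactors = FALSE)
)

# Same idea as above: Map(paste0, df1, df2) pastes matching columns together
LL <- do.call(Map, c(list(f = paste0), l))
res <- as.data.frame(LL, stringsAsFactors = FALSE)
res
```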
To explain @mnel's excellent answer a bit more, consider the simple example of summing the corresponding elements of two vectors:
Map(sum,1:3,4:6)
[[1]]
[1] 5 # sum(1,4)
[[2]]
[1] 7 # sum(2,5)
[[3]]
[1] 9 # sum(3,6)
Map(sum,list(1:3,4:6))
[[1]]
[1] 6 # sum(1:3)
[[2]]
[1] 15 # sum(4:6)
Why the second one is the case might be made more obvious by adding a second list, like:
Map(sum,list(1:3,4:6),list(0,0))
[[1]]
[1] 6 # sum(1:3,0)
[[2]]
[1] 15 # sum(4:6,0)
Now, the next part is trickier. As the help page ?do.call states:
‘do.call’ constructs and executes a function call from a name or a
function and a list of arguments to be passed to it.
So, doing:
do.call(Map,c(sum,list(1:3,4:6)))
calls Map with the inputs of the list c(sum,list(1:3,4:6)), which looks like:
[[1]] # first argument to Map
function (..., na.rm = FALSE) .Primitive("sum") # the 'sum' function
[[2]] # second argument to Map
[1] 1 2 3
[[3]] # third argument to Map
[1] 4 5 6
...and which is therefore equivalent to:
Map(sum, 1:3, 4:6)
Looks familiar! It is equivalent to the first example at the top of this answer.
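The equivalence can be checked directly; this sketch builds both calls and compares them with identical():

```r
direct <- Map(sum, 1:3, 4:6)

# c(sum, list(1:3, 4:6)) is list(sum, 1:3, 4:6), so do.call runs Map(sum, 1:3, 4:6)
via_do_call <- do.call(Map, c(sum, list(1:3, 4:6)))

identical(direct, via_do_call)
unlist(direct)
```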

Resources