Merging intersecting list elements

I have an R list of integer vectors in which some elements are redundant and others intersect each other:
ll <- list(c(1,4), c(5,7,3,9), c(5,3,7,9), c(2,7,10), 8, 6)
The integers in the vector elements are just indices and not ranges, and the order of the elements in the list is arbitrary.
I'm looking for a function that'll return a list that'll merge ll's intersecting elements and remove the redundancy.
For the example ll above this function will return:
list(c(1,4), c(2,3,5,7,9,10), 6, 8)
Any idea?
Unfortunately, the solution offered by @alexis_laz in this post doesn't solve my problem because it assumes that the list is ordered, which is not the case for me.
For example, if I change the order of ll's elements:
ll <- list(c(2,7,10), c(1,4), c(5,7,3,9), 8, 6, c(5,3,7,9))
@alexis_laz's solution doesn't hold.

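It's pretty messy, but it may work for your case. Note that each element is compared with the later ones only once, so an overlap that only appears after an earlier merge can be missed.
# intersect() and union() are base R functions, so no package is needed
for (i in 1:(length(ll) - 1)) {
  # ll shrinks inside the loop, so skip indices that no longer exist
  if (!is.null(unlist(ll[i])) && length(ll) > i) {
    # walk backwards so that removing element j does not shift unvisited indices
    for (j in length(ll):(i + 1)) {
      if (length(intersect(ll[[i]], ll[[j]])) > 0) {
        ll[[i]] <- union(ll[[i]], ll[[j]])
        ll <- ll[-j]
      }
    }
  }
}
ll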
[[1]]
[1] 1 4
[[2]]
[1] 5 7 3 9 2 10
[[3]]
[1] 8
[[4]]
[1] 6
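A single pass like the one above can leave two groups unmerged when they only start to overlap after an earlier merge (for example `list(c(1,2), c(2,3), c(3,4))`). The following is a minimal sketch of a variant that keeps re-scanning until nothing changes, so the result should not depend on the order of the list. The name `merge_intersecting` is hypothetical, and the final `sort()` is only there to make the output easier to read.

```r
# Sketch: repeatedly merge any two elements that share a value
# until a full pass finds nothing left to merge.
merge_intersecting <- function(lst) {
  lst <- lapply(lst, unique)
  merged <- TRUE
  while (merged) {
    merged <- FALSE
    i <- 1
    while (i < length(lst)) {
      j <- i + 1
      while (j <= length(lst)) {
        if (length(intersect(lst[[i]], lst[[j]])) > 0) {
          lst[[i]] <- union(lst[[i]], lst[[j]])
          lst[[j]] <- NULL   # drop the absorbed element; do not advance j
          merged <- TRUE
        } else {
          j <- j + 1
        }
      }
      i <- i + 1
    }
  }
  lapply(lst, sort)
}

ll <- list(c(2, 7, 10), c(1, 4), c(5, 7, 3, 9), 8, 6, c(5, 3, 7, 9))
res <- merge_intersecting(ll)
res
```

The groups come back in the order in which their first member appeared in the input, so sort the list afterwards if a particular order matters.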

Related

concatenation sublists of two different lists [duplicate]

I have two lists
first = list(a = 1, b = 2, c = 3)
second = list(a = 2, b = 3, c = 4)
I want to merge these two lists so the final product is
$a
[1] 1 2
$b
[1] 2 3
$c
[1] 3 4
Is there a simple function to do this?

If lists always have the same structure, as in the example, then a simpler solution is
mapply(c, first, second, SIMPLIFY=FALSE)

This is a very simple adaptation of the modifyList function by Sarkar. Because it is recursive, it will handle more complex situations than mapply would, and it will handle mismatched names by appending the items in 'second' that are not in 'first' as new elements.
appendList <- function (x, val)
{
  stopifnot(is.list(x), is.list(val))
  xnames <- names(x)
  for (v in names(val)) {
    # recurse when both sides hold a list under the same name, otherwise concatenate
    x[[v]] <- if (v %in% xnames && is.list(x[[v]]) && is.list(val[[v]]))
      appendList(x[[v]], val[[v]])
    else c(x[[v]], val[[v]])
  }
  x
}
> appendList(first,second)
$a
[1] 1 2
$b
[1] 2 3
$c
[1] 3 4
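To illustrate the recursive behaviour, here is a small sketch with nested lists. The lists `a` and `b` are made-up data, and `appendList` is repeated from above only so that the snippet runs on its own.

```r
# appendList as defined in the answer above, repeated for self-containment
appendList <- function(x, val) {
  stopifnot(is.list(x), is.list(val))
  xnames <- names(x)
  for (v in names(val)) {
    x[[v]] <- if (v %in% xnames && is.list(x[[v]]) && is.list(val[[v]]))
      appendList(x[[v]], val[[v]])
    else c(x[[v]], val[[v]])
  }
  x
}

# Hypothetical nested input: 'inner' is itself a list on both sides
a <- list(id = 1, inner = list(p = "x", q = 10))
b <- list(id = 2, inner = list(p = "y", q = 20))

res <- appendList(a, b)
str(res)
```

Here `res$id` holds both ids, and `res$inner` is still a list whose `p` and `q` elements each hold the values from both inputs, which a flat `mapply(c, ...)` would not preserve.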

Here are two options, the first:
both <- list(first, second)
n <- unique(unlist(lapply(both, names)))
names(n) <- n
lapply(n, function(ni) unlist(lapply(both, `[[`, ni)))
and the second, which works only if they have the same structure:
apply(cbind(first, second),1,function(x) unname(unlist(x)))
Both give the desired result.

Here's some code that I ended up writing, based upon @Andrei's answer but without the elegance/simplicity. The advantage is that it allows a more complex recursive merge and also distinguishes between elements that should be connected with rbind and those that are just connected with c:
# Decided to move this outside the mapply, not sure this is
# that important for speed but I imagine redefining the function
# might be somewhat time-consuming
mergeLists_internal <- function(o_element, n_element){
  if (is.list(n_element)){
    # Fill in non-existent elements with NA elements
    if (length(n_element) != length(o_element)){
      n_unique <- names(n_element)[! names(n_element) %in% names(o_element)]
      if (length(n_unique) > 0){
        for (n in n_unique){
          if (is.matrix(n_element[[n]])){
            o_element[[n]] <- matrix(NA,
                                     nrow=nrow(n_element[[n]]),
                                     ncol=ncol(n_element[[n]]))
          }else{
            o_element[[n]] <- rep(NA,
                                  times=length(n_element[[n]]))
          }
        }
      }
      o_unique <- names(o_element)[! names(o_element) %in% names(n_element)]
      if (length(o_unique) > 0){
        for (n in o_unique){
          # the element only exists in o_element, so test that side
          if (is.matrix(o_element[[n]])){
            n_element[[n]] <- matrix(NA,
                                     nrow=nrow(o_element[[n]]),
                                     ncol=ncol(o_element[[n]]))
          }else{
            n_element[[n]] <- rep(NA,
                                  times=length(o_element[[n]]))
          }
        }
      }
    }
    # Now merge the two lists
    return(mergeLists(o_element,
                      n_element))
  }
  if(length(n_element)>1){
    new_cols <- ifelse(is.matrix(n_element), ncol(n_element), length(n_element))
    old_cols <- ifelse(is.matrix(o_element), ncol(o_element), length(o_element))
    if (new_cols != old_cols)
      stop("Your length doesn't match on the elements,",
           " new element (", new_cols , ") !=",
           " old element (", old_cols , ")")
    # Vectors and matrices are stacked row-wise
    return(rbind(o_element,
                 n_element,
                 deparse.level=0))
  }
  # Single values are simply concatenated
  return(c(o_element,
           n_element))
}
mergeLists <- function(old, new){
  if (is.null(old))
    return (new)
  # mapply pairs elements by position, so both lists must be in the same order
  m <- mapply(mergeLists_internal, old, new, SIMPLIFY=FALSE)
  return(m)
}
Here's my example:
v1 <- list("a"=c(1,2), b="test 1", sublist=list(one=20:21, two=21:22))
v2 <- list("a"=c(3,4), b="test 2", sublist=list(one=10:11, two=11:12, three=1:2))
mergeLists(v1, v2)
This results in:
$a
[,1] [,2]
[1,] 1 2
[2,] 3 4
$b
[1] "test 1" "test 2"
$sublist
$sublist$one
[,1] [,2]
[1,] 20 21
[2,] 10 11
$sublist$two
[,1] [,2]
[1,] 21 22
[2,] 11 12
$sublist$three
[,1] [,2]
[1,] NA NA
[2,] 1 2
Yeah, I know - perhaps not the most logical merge but I have a complex parallel loop that I had to generate a more customized .combine function for, and therefore I wrote this monster :-)

library(purrr)
merged = map(names(first), ~c(first[[.x]], second[[.x]]))
merged = set_names(merged, names(first))
Using purrr. Because the elements are looked up by name, this also works when the two lists are not in the same order.

In general one could,
merge_list <- function(...) by(v<-unlist(c(...)),names(v),base::c)
Note that the by() solution returns an attributed list, so it will print differently, but will still be a list. But you can get rid of the attributes with attr(x,"_attribute.name_")<-NULL. You can probably also use aggregate().
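If the extra attributes are unwanted, a similar grouping can be sketched with base R's `split()`, which returns a plain named list. This is an alternative to `by()`, not the answer's own code. It assumes every element has length one, as in the example, because `unlist()` appends numeric suffixes to the names of longer elements. The name `merge_list_split` is hypothetical.

```r
# Sketch: flatten all lists into one named vector, then group values by name
merge_list_split <- function(...) {
  v <- unlist(c(...))
  split(unname(v), names(v))
}

first  <- list(a = 1, b = 2, c = 3)
second <- list(a = 2, b = 3, c = 4)

merged <- merge_list_split(first, second)
merged
```

Note that `split()` orders the result by the sorted names rather than by their order in the input.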

We can do a lapply with c(), and use setNames to assign the original name to the output.
setNames(lapply(1:length(first), function(x) c(first[[x]], second[[x]])), names(first))
$a
[1] 1 2
$b
[1] 2 3
$c
[1] 3 4

Following the answers by @Aaron left Stack Overflow and @Theo, the elements of the merged list are vectors combined with c.
If you want to bind rows or columns instead, use rbind or cbind.
merged = map(names(first), ~rbind(first[[.x]], second[[.x]]))
merged = set_names(merged, names(first))

Using dplyr, I found that this line works for named lists using the same names:
as.list(bind_rows(first, second))

How to add possible divisor numbers?

How do I retrieve the maximum sum of possible divisors of a number?
I have the function below, which gives the possible divisors of a number.
Code
divisors <- function(x) {
y <- seq_len(ceiling(x / 2))
y[x %% y == 0]
}
Example
Divisors of 99 will give the below possible values.
divisors(99)
[1] 1 3 9 11 33
My expected logic:
Go from the last value to the first value in the divisors output.
The last number is 33. The next largest number that divides 33 is 11, so I select 11. Continuing from 11, the next largest number that divides 11 is 1, so I select 1. Now add all the numbers.
33 + 11 + 1 = 45
Move to the next number, 11. The next largest number that divides 11 is 1, so I select 1. Now add all the numbers.
11 + 1 = 12
Move to the next number, 9. The next largest number that divides 9 is 3, so I select 3, and the next largest number that divides 3 is 1, so I select 1. Now add all the numbers.
9 + 3 + 1 = 13
Move to the next number, 3. The next largest number that divides 3 is 1, so I select 1. Now add all the numbers.
3 + 1 = 4
The maximum among these is 45.
Now I am struggling to write this logic in R . Help / Advice much appreciated.
Note : Prime numbers can be ignored.

Update
For large integers, e.g., close to the maximum integer .Machine$integer.max (which is itself a prime number), you can run the code below (note that I modified the functions divisors and f a bit)
divisors <- function(x) {
  y <- seq(x / 2)
  y[as.integer(x) %% y == 0]
}
f <- function(y) {
  if (length(y) <= 2) {
    return(as.integer(sum(y)))
  }
  l <- length(y)
  h <- y[l]
  yy <- y[-l]
  h + f(yy[h %% yy == 0])
}
and you will see
> n <- .Machine$integer.max - 1
> x <- divisors(n)
> max(sapply(length(x):2, function(k) f(head(x, k))))
[1] 1569603656
You can define a recursive function f that gives successive divisors
f <- function(y) {
  if (length(y) == 1) {
    return(y)
  }
  h <- y[length(y)]
  yy <- y[-length(y)]
  c(f(yy[h %% yy == 0]), h)
}
and, with x <- divisors(99), you will see all possible successive divisor tuples
> sapply(rev(seq_along(x)), function(k) f(head(x, k)))
[[1]]
[1] 1 11 33
[[2]]
[1] 1 11
[[3]]
[1] 1 3 9
[[4]]
[1] 1 3
[[5]]
[1] 1
Then, we apply f within sapply like below
> max(sapply(rev(seq_along(x)), function(k) sum(f(head(x, k)))))
[1] 45
which gives the desired output.
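The same chain-following logic can also be sketched without recursion. The version below is a loop-based variant, not the answer's own code: starting from the largest remaining value, it keeps only the smaller values that divide it and repeats. The names `chain_sum` and `max_chain_sum` are hypothetical, and `divisors` is the function from the question.

```r
# divisors() as given in the question
divisors <- function(x) {
  y <- seq_len(ceiling(x / 2))
  y[x %% y == 0]
}

# Sum of the chain that starts at the last (largest) value of d,
# where d is an ascending vector of divisors
chain_sum <- function(d) {
  total <- 0
  while (length(d) > 0) {
    h <- d[length(d)]          # current head of the chain
    total <- total + h
    rest <- d[-length(d)]
    d <- rest[h %% rest == 0]  # keep only the smaller values that divide h
  }
  total
}

# Try every divisor as a starting point and keep the best sum
max_chain_sum <- function(n) {
  d <- divisors(n)
  max(vapply(seq_along(d), function(k) chain_sum(d[seq_len(k)]), numeric(1)))
}

max_chain_sum(99)
```

For 99 the chains are 33, 11, 1 and 11, 1 and 9, 3, 1 and 3, 1 and 1, so the call returns 45, matching the recursive version.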

You can also use the following solution. It may sound a little bit complicated and of course there is always an easier, more efficient solution. However, I thought this could be useful to you. I will take it from your divisors output:
> x
[1] 1 3 9 11 33
# First I created a list whose first element is our original x and from then on
# I subset the first element till the last element of the list
lst <- lapply(0:(length(x)-1), function(a) x[1:(length(x)-a)])
> lst
[[1]]
[1] 1 3 9 11 33
[[2]]
[1] 1 3 9 11
[[3]]
[1] 1 3 9
[[4]]
[1] 1 3
[[5]]
[1] 1
Then I wrote a custom function to implement your conditions and gather the desired output. For this purpose I created a function factory, that is, a function that creates a function:
As you might have noticed, the outermost function does not take any arguments. It only sets up an empty vector out to store the selected elements in. Because out is created in the execution environment of the outermost function, it is shielded from anything that happens in the global environment.
The inner function is the one that takes our vector x, so in general we call the whole setup as fnf()(x). The first element of out is the first element of the sorted x (33). Then I find all the elements of x that divide it with a remainder of 0. Of those, I take the second element (11), since the first one is 33 itself, and store it in out. Then I drop every value of x larger than it and repeat the same process.
Since the process is repeated over and over, I thought this might be a good case for recursion. Recursion is a programming technique in which a function calls itself from within its own body. As you might have noticed, I call fn inside fn to repeat the process, each time with fewer values.
This may sound a bit complicated, but I believe there are some good points here for you to pick up for future exploration. I found them very useful and hope that's the case for you too.
fnf <- function() {
  out <- c()
  fn <- function(x) {
    out <<- c(out, x[1])
    z <- x[out[length(out)]%%x == 0]
    if(length(z) >= 2) {
      out[length(out) + 1] <<- z[2]
    } else {
      return(out)
    }
    x <- x[!duplicated(x)][which(x[!duplicated(x)] == z[2]):length(x[!duplicated(x)])]
    fn(x)
    out[!duplicated(out)]
  }
}
# The result of applying the custom function on `lst` would result in your
# divisor values
lapply(lst, function(x) fnf()(sort(x, decreasing = TRUE)))
[[1]]
[1] 33 11 1
[[2]]
[1] 11 1
[[3]]
[1] 9 3 1
[[4]]
[1] 3 1
[[5]]
[1] 1
In the end we sum each element and extract the max value
Reduce(max, lapply(lst, function(x) sum(fnf()(sort(x, decreasing = TRUE)))))
[1] 45
Testing a very large integer, I used dear @ThomasIsCoding's modified divisors function:
divisors <- function(x) {
y <- seq(x / 2)
y[as.integer(x) %% y == 0]
}
x <- divisors(.Machine$integer.max - 1)
lst <- lapply(0:(length(x)-1), function(a) x[1:(length(x)-a)])
Reduce(max, lapply(lst, function(x) sum(fnf()(sort(x, decreasing = TRUE)))))
[1] 1569603656

You'll need to recurse. If I understand correctly, this should do what you want:
fact <- function(x) {
  x <- as.integer(x)
  div <- seq_len(abs(x)/2)
  factors <- div[x %% div == 0L]
  return(factors)
}
maxfact <- function(x) {
  factors <- fact(x)
  if (length(factors) < 3L) {
    return(sum(factors))
  } else {
    return(max(factors + mapply(maxfact, factors)))
  }
}
maxfact(99)
[1] 45

How to make variable size 2d arrays in R?

I would like to create a 2d array to store a known number of elements with unknown lengths of each element. However, I am not experienced with R and all the things I tried either require you to state the dimensions beforehand or don't let you append to the elements afterward. Here is something I tried:
b <- list(rep(c(), 5))
for (i in 1:5) {
for (j in 1:i) {
b[i] <- append(b[i], i)
}
print(b[i])
}
This gives this warning Warning message in b[i] <- append(b[i], i): “number of items to replace is not a multiple of replacement length” and does nothing
Here is what I wanted to achieve, but in python:
b = [[] for _ in range(5)]
for i in range(1,6):
for _ in range(i):
b[i-1].append(i)
I don't mind different data types, something like a dictionary will be ok, but I struggle with getting an idea of how this works in this particular case. The examples above are quite dumb, but, in short, what I want to achieve is: [ [1,2,3], [1,2,3,4], [1] ] or { 'a': [1,2], 'b': [1,2,3] }
What is the cleanest way to do that in R?

You don't need append(); you can simply assign a new element to the existing list.
b <- list()
for(i in 1:5){
  b[[length(b)+1]] <- seq_len(i)
}
If you want to add a value to an existing element, you can simply use c()
b[[3]] <- c(b[[3]], 1)
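For a closer match to the Python snippet in the question, here is a minimal sketch that preallocates a list of empty slots with `vector("list", n)` and grows each slot with `c()`. The variable names `n`, `b`, and `d` are only illustrative. The second part shows the same idea with names as keys, which is roughly what a dictionary of lists would look like.

```r
n <- 5
b <- vector("list", n)          # n empty (NULL) slots, like [[] for _ in range(5)]

for (i in seq_len(n)) {
  for (k in seq_len(i)) {
    b[[i]] <- c(b[[i]], i)      # append i to the i-th element
  }
}
b

# Dictionary-like variant: a named list, with elements created on first use
d <- list()
d[["a"]] <- c(d[["a"]], 1)
d[["a"]] <- c(d[["a"]], 2)
d[["b"]] <- c(d[["b"]], 1, 2, 3)
d
```

Using `[[` rather than `[` matters here: `b[[i]]` refers to the vector stored in slot i, while `b[i]` is a one-element list, which is what triggered the warning in the question.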

Based on what you say, that you want the equivalent in R of the following code in Python,
b = [[] for _ in range(5)]
for i in range(1,6):
for _ in range(i):
b[i-1].append(i)
Which returns the following output:
[[1], [2, 2], [3, 3, 3], [4, 4, 4, 4], [5, 5, 5, 5, 5]]
There are two ways to do so in R.
num_elements <- 5
1st method: You need to install and load the rlist library.
install.packages("rlist")
library(rlist)
list1 <- list()
for (i in 1:num_elements) {
  list1 <- list.append(list1, rep(i,i))
}
list1
2nd method: Preallocate the list with a fixed size.
list2 <- vector("list", num_elements)
for (i in 1:num_elements) {
  list2[[i]] <- rep(i,i)
}
list2
In both ways, the output would be the following:
[[1]]
[1] 1
[[2]]
[1] 2 2
[[3]]
[1] 3 3 3
[[4]]
[1] 4 4 4 4
[[5]]
[1] 5 5 5 5 5
To get each item as a sequence of numbers, you only have to replace rep with seq, as follows:
list1 <- list()
for (i in 1:num_elements) {
list1 <- list.append(list1, seq(1,i))
}
In this case, the output is:
[[1]]
[1] 1
[[2]]
[1] 1 2
[[3]]
[1] 1 2 3
[[4]]
[1] 1 2 3 4
[[5]]
[1] 1 2 3 4 5

lapply functions inside each other does not work as expected

I have two lists, and I need to apply for loops and if conditions to them. I decided to use lapply, but my code has become very complicated and does not work. How can I make it work in a simpler way? Is there a good way to avoid so many nested lapply calls?
The idea of my code:
First, I have some lists.
These lists do not all need to have the same length, or even a length > 0.
So my code checks each list to see whether it is > 0 or not.
If it is > 0, then:
check the values of the second list.
If the values equal specific values, then these values are changed to new values.
The last steps must be applied to all the lists that I have.
Here is my code:
The function gave me NULL.
nx <- list(1, 1) ## if my list > 0 then check it
x.t <- list(c(2, 3, 4, 4), c(2, 4, 5, 6)) #the list to apply if statement on it.
lapply(nx, function(x) if (x > 0) {
do.t <- lapply(x.t, function(x) { which(x %in% c(2, 7:10))})
##check the values of my list.
lapply(nx, function(x){
lapply(1:length(x), function(i){ for (j in 1:x[[i]]){ ## here I would like j from 1 to length of x where x is a list of two elements.
if (x.t[[i]][do.t[[j]]] == 2) ## here I want to have a condition says that, if the element of each list is equal 2 then this element will have the value 2.5.
x.t[[i]] <- 2.5
}})})})
My function will include many lists, and the condition will be extended. For example,
if (x.t[[i]][do.t[[j]]] == 2){
  x.t[[i]] <- 2.5
}else{ some condition}else{other condition}
and so on.
The result:
[[1]]
[[1]][[1]]
[[1]][[1]][[1]]
NULL
[[1]][[2]]
[[1]][[2]][[1]]
NULL
[[2]]
[[2]][[1]]
[[2]][[1]][[1]]
NULL
[[2]][[2]]
[[2]][[2]][[1]]
NULL
My real function is more complicated, so I have provided this example, which is very similar to it.

For a general function, it may be better to divide the code into parts, each doing just one thing.
Note that lapply passes entire vectors, the elements of the list x.t, to the function. Then complicated loops through the elements of each vector, processing one at a time.
complicated <- function(x){
  for(i in seq_along(x)){
    if(x[i] > 0){
      if(x[i] == 2)
        x[i] <- 2.5
    }
  }
  x
}
x.t.2 <- lapply(x.t, complicated)
x.t.2
#[[1]]
#[1] 2.5 3.0 4.0 4.0
#
#[[2]]
#[1] 2.5 4.0 5.0 6.0
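Since the condition is applied element by element, the inner loop can also be replaced with logical indexing. The sketch below is an alternative to the loop, assuming the same `x.t` as in the question. The helper name `recode` is hypothetical.

```r
x.t <- list(c(2, 3, 4, 4), c(2, 4, 5, 6))

# Replace every positive element equal to 2 with 2.5 in one vectorised step
recode <- function(x) {
  x[x > 0 & x == 2] <- 2.5
  x
}

x.t.2 <- lapply(x.t, recode)
x.t.2
```

Further conditions can be added as extra indexed assignments inside `recode`, one line per rule, which tends to stay readable longer than nested `if`/`else` branches.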

Chaining list of vectors

If I have a list of vectors such as below
list.x <- list(1:2, 1:3, 3:4, 5, 5:6)
Is there a way to replace each list element with an element that includes all the other values that the element can be paired with?
For example, the first element (list.x[[1]]) would be replaced with 1:4 because element 2 (list.x[[2]]) shows that 2 is also paired with 3, and element 3 shows that 3 is also paired with 4.
The final result I would like to achieve would be this list
final.list <- list(1:4, 1:4, 1:4, 5:6, 5:6)

I needed a change of pace today, so I decided to try to answer the question using base R. Here it goes:
First, I created a function that unions two vectors if they intersect, and if not, simply returns the first vector:
expand.if.toucing <- function(vector1, vector2) {
  i = intersect(vector1, vector2);
  if (NROW(i) > 0)
    union(vector1, vector2)
  else
    vector1;
}
Then I made a function that merges one element in the list of vectors with another:
list.reduce <- function (lst) {
  for(v1 in 1:NROW(lst))
    for (v2 in 1:NROW(lst)) {
      if (v1 == v2)
        next;
      prevLength <- NROW(lst[[v1]]);
      lst[[v1]] <- expand.if.toucing(lst[[v1]], lst[[v2]]);
      newLength <- NROW(lst[[v1]]);
      if (newLength == prevLength)
        next;
      return(lst[-v2]);
    }
  lst;
}
After this, I made a function that merges all vectors in the list that can be merged. This is sort of a proto cluster analysis, so I called it clusterize:
clusterize <- function (lst) {
  reduced = TRUE;
  while(reduced) {
    prevLength <- NROW(lst);
    lst <- list.reduce(lst);
    newLength <- NROW(lst);
    reduced <- prevLength != newLength;
  }
  lst;
}
Now it's just a matter of replacing each element in the original list with its associated cluster:
replace.with.clusters <- function(lst, clusters) {
  for(l in 1:NROW(lst))
    for(c in 1:NROW(clusters)) {
      lst[[l]] <- expand.if.toucing(lst[[l]], clusters[[c]]);
    }
  lst;
}
You're good to go. The two main functions are clusterize and replace.with.clusters. Use them like this:
list.x <- list(1:2, 1:3, 3:4, 5, 5:6)
clusters <- clusterize(list.x);
replace.with.clusters(list.x, clusters);
# Outputs the following:
#
# [[1]]
# [1] 1 2 3 4
#
# [[2]]
# [1] 1 2 3 4
#
# [[3]]
# [1] 3 4 1 2
#
# [[4]]
# [1] 5 6
#
# [[5]]
# [1] 5 6
The third element is in a different order than your list, but from the way you describe the problem, order is not truly relevant.
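A more compact way to get the same grouping is to build the clusters incrementally: each new vector is merged with every existing cluster it touches. This is a sketch of an alternative approach, not the answer's own code. It assumes every vector in the list is non-empty, and the name `chain_list` is hypothetical.

```r
chain_list <- function(lst) {
  clusters <- list()
  for (v in lst) {
    # which existing clusters share at least one value with v?
    hits <- vapply(clusters,
                   function(cl) length(intersect(cl, v)) > 0,
                   logical(1))
    # fuse v with all of them into one cluster, keep the others untouched
    merged <- sort(unique(c(v, unlist(clusters[hits]))))
    clusters <- c(clusters[!hits], list(merged))
  }
  # replace each original element with the cluster that contains it
  lapply(lst, function(v) {
    Filter(function(cl) any(v %in% cl), clusters)[[1]]
  })
}

list.x <- list(1:2, 1:3, 3:4, 5, 5:6)
res <- chain_list(list.x)
res
```

Because the clusters are kept disjoint at every step, each original vector falls into exactly one cluster, and the sorted output avoids the ordering difference mentioned above.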
