How to obtain the list of elements from a Venn diagram - r

I have a Venn diagram made from 3 lists. I would like to obtain all the different sub-lists: the elements common to each pair of lists, the elements common to all three of them, and the elements unique to each list. Is there a way to do this as straightforwardly as possible?
AW.DL <- c("a","b","c","d")
AW.FL <- c("a","b", "e", "f")
AW.UL <- c("a","c", "e", "g")
library(VennDiagram)
venn.diagram(
x = list(AW.DL, AW.FL, AW.UL),
category.names = c("AW.DL" , "AW.FL","AW.UL" ),
filename = '#14_venn_diagramm.png',
output=TRUE,
na = "remove"
)

I found that the VennDiagram package has a function calculate.overlap(), but I wasn't able to find a way to name the sections it returns. However, the gplots package has the function venn(), which returns an object with an intersections attribute.
AW.DL <- c("a","b","c","d")
AW.FL <- c("a","b", "e", "f")
AW.UL <- c("a","c", "e", "g")
library(gplots)
lst <- list(AW.DL,AW.FL,AW.UL)
ItemsList <- venn(lst, show.plot = FALSE)
lengths(attributes(ItemsList)$intersections)
Output:
> lengths(attributes(ItemsList)$intersections)
    A     B     C   A:B   A:C   B:C A:B:C
    1     1     1     1     1     1     1
To get the elements, just print attributes(ItemsList)$intersections (the list is unnamed, so A, B and C stand for AW.DL, AW.FL and AW.UL, in list order):
> attributes(ItemsList)$intersections
$A
[1] "d"
$B
[1] "f"
$C
[1] "g"
$`A:B`
[1] "b"
$`A:C`
[1] "c"
$`B:C`
[1] "e"
$`A:B:C`
[1] "a"
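If you would rather keep the original list names instead of A, B and C, the regions can also be computed in base R. The following is only a minimal sketch, not part of gplots or VennDiagram: the helper name venn_regions() is made up for illustration, and it assumes the input list is named.

```r
# Hypothetical helper (not from any package): returns the elements of every
# Venn region, labelled by the names of the sets that contain them.
venn_regions <- function(sets) {
  stopifnot(!is.null(names(sets)))
  items <- unique(unlist(sets))
  # For each item, build a label such as "AW.DL:AW.FL" from the sets it is in.
  membership <- vapply(items, function(item) {
    in_set <- vapply(sets, function(s) item %in% s, logical(1))
    paste(names(sets)[in_set], collapse = ":")
  }, character(1))
  # Group the items by their membership label.
  split(items, membership)
}

sets <- list(AW.DL = c("a", "b", "c", "d"),
             AW.FL = c("a", "b", "e", "f"),
             AW.UL = c("a", "c", "e", "g"))
regions <- venn_regions(sets)
regions[["AW.DL:AW.FL:AW.UL"]]  # elements shared by all three lists
regions[["AW.DL"]]              # elements found only in AW.DL
```

Regions that contain no elements simply do not appear in the result.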

Related

Turn a datatable into a two-dimensional list in R

I have a data.table (see dt). I want to turn it into a 2-dimensional list for future use (e.g. a, b and c are column names of another data.table; I want to select the value of a non-missing column among a, b and c and impute it into x, and so on). So the 2-dimensional list will act as a reference object for the fcoalesce function.
# example
library(data.table)
dt <- data.table(col1 = c("a", "b", "c", "d", "e", "f"),
                 col2 = c("x", "x", "x", "y", "y", "z"))
# desirable result
list.1 <- list(c("a", "b", "c"), c("d", "e"), c("f"))
list.2 <- list("x", "y", "z")
list(list.1, list.2)
Since the actual dt is much larger than the example dt, is there a more efficient way to do it?
You can use split():
lst1 <- split(dt$col1, dt$col2)
lst2 <- as.list(names(lst1))
result <- list(unname(lst1), lst2)
result
# [[1]]
# [[1]][[1]]
# [1] "a" "b" "c"
#
# [[1]][[2]]
# [1] "d" "e"
#
# [[1]][[3]]
# [1] "f"
#
#
# [[2]]
# [[2]][[1]]
# [1] "x"
#
# [[2]][[2]]
# [1] "y"
#
# [[2]][[3]]
# [1] "z"
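One thing to keep in mind with split(): the groups come back ordered by the sorted values (factor levels) of the grouping vector, not by first appearance. In the example above the two orders coincide. Below is a small sketch of how to keep appearance order if that matters, using a plain data.frame and made-up data in which the two orders differ:

```r
# Made-up data for illustration: col2 appears in the order z, y, x.
df <- data.frame(col1 = c("a", "b", "c", "d", "e", "f"),
                 col2 = c("z", "z", "z", "y", "y", "x"),
                 stringsAsFactors = FALSE)

# Default behaviour: groups are ordered by sorted value.
by_sorted <- split(df$col1, df$col2)
names(by_sorted)

# Fixing the factor levels to the order of first appearance keeps that order.
by_appearance <- split(df$col1, factor(df$col2, levels = unique(df$col2)))
names(by_appearance)

result <- list(unname(by_appearance), as.list(names(by_appearance)))
```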

Finding specific elements in lists

I am stuck at one of the challenges proposed in a tutorial I am reading.
# Using the following code:
challenge_list <- list(words = c("alpha", "beta", "gamma"),
                       numbers = 1:10,
                       letter = letters)
# challenge_list
# Extract the following things:
#
# - The word "gamma"
# - The letters "a", "e", "i", "o", and "u"
# - The numbers less than or equal to 3
I have tried the following:
## 1
challenge_list$"gamma"
## 2
challenge_list [[1]["gamma"]]
But nothing works.
> challenge_list$words[challenge_list$words == "gamma"]
[1] "gamma"
> challenge_list$letter[challenge_list$letter %in% c("a","e","i","o","u")]
[1] "a" "e" "i" "o" "u"
> challenge_list$numbers[challenge_list$numbers<=3]
[1] 1 2 3
We can write a function that subsets with <= when the element is numeric and with %in% otherwise, then use Map to pair each element of the list with its corresponding filter value and apply f1 to each pair. This returns a new list with the filtered values.
f1 <- function(x, y) if(is.numeric(x)) x[ x <= y] else x [x %in% y]
out <- Map(f1, challenge_list, list('gamma', 3, c("a","e","i","o","u")))
out
-output
#$words
#[1] "gamma"
#$numbers
#[1] 1 2 3
#$letter
#[1] "a" "e" "i" "o" "u"
Try this. Most R objects can be filtered using brackets. With lists you use two sets of brackets, as in [[ ]][ ]: the first selects the object inside the list and the second selects elements within that object. For a plain vector the task is easier, since a single pair of brackets with a condition extracts the elements. Here is the code:
#Data
challenge_list <- list(words = c("alpha", "beta", "gamma"),
                       numbers = 1:10,
                       letter = letters)
#Code
challenge_list[[1]][3]
# with() makes the list elements available as plain vectors
with(challenge_list, letter[letter %in% c("a", "e", "i", "o","u")])
with(challenge_list, numbers[numbers<=3])
Since your data is in a list, you can also work with the positions of the elements, like this:
#Data 2
challenge_list <- list(words = c("alpha", "beta", "gamma"),numbers = 1:10,letter = letters)
#Code 2
challenge_list[[1]][3]
challenge_list[[3]][challenge_list[[3]] %in% c("a", "e", "i", "o","u")]
challenge_list[[2]][challenge_list[[2]]<=3]
Output:
challenge_list[[1]][3]
[1] "gamma"
challenge_list[[3]][challenge_list[[3]] %in% c("a", "e", "i", "o","u")]
[1] "a" "e" "i" "o" "u"
challenge_list[[2]][challenge_list[[2]]<=3]
[1] 1 2 3
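To make the difference between the bracket forms concrete, here is a short sketch (the variable names sub_list and words are only for illustration). Single brackets on a list return a smaller list, while double brackets or $ return the element itself, which can then be filtered like any vector:

```r
challenge_list <- list(words = c("alpha", "beta", "gamma"),
                       numbers = 1:10,
                       letter = letters)

sub_list <- challenge_list["words"]    # still a list, of length 1
words    <- challenge_list[["words"]]  # the character vector itself
class(sub_list)
class(words)

# Once you hold the vector, a logical condition picks out the elements.
words[words == "gamma"]
# challenge_list$words is equivalent to challenge_list[["words"]].
identical(challenge_list$words, words)
```

This is also why the two attempts in the question fail: challenge_list$"gamma" looks for a list element named gamma, which does not exist, so it returns NULL.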

R: Split large character lines into slices

For a large dataframe containing 99150000 rows, the following code splits the data my_df into chunks of 1000 rows and writes to the disk.
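# Note the parentheses in i:(i+1000-1): without them, i:i+1000-1 is
# evaluated as (i:i)+1000-1 and selects a single row.
lapply(seq(1, nrow(my_df), by = 1000),
       function(i) write.table(my_df[i:(i+1000-1),],
                               file = paste0('path_to_logal_dir/data',
                                             i, '-', i+1000-1, '.csv'),
                               row.names = F, col.names = F, quote = F)
)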
Now, I have the same data (99150000 elements) in the character format, sample data below:
[1] "1979_1,532,40,7.7,12.9,116.9,12.9,85,2,2.001,4,25,55,5.3,55,85,7.7,85,145,7.5,145,265,5.0"
[2] "1979_2,532,40,7.7,12.9,116.9,12.9,85,2,2.001,4,25,55,5.3,55,85,7.7,85,145,7.5"
[3] "1979_3,532,40,7.7,12.9,116.9,12.9,85,2,2.001,4,25,55,5.3,55,85,7.7,85"
...
[99150000] ...
How could I achieve the same task above, that is, splitting the character format data into chunks (files containing 1000 lines)?
This is a solution using only base R. You can easily generalize it using the apply family or the purrr package. First I create some fake data:
fake_data <- c("A", "B", "C", "D", "E", "F", "G", "H")
fake_data
#> [1] "A" "B" "C" "D" "E" "F" "G" "H"
You want to divide your character vector into groups of 1000 lines. For simplicity I divide this vector into groups of 2 lines
group_length <- 2
This means that the first 2 elements of the character vector belong to the first group, the second 2 elements belong to the second group and so on
groups <- rep(1 : (length(fake_data) / group_length), each = group_length)
groups
#> [1] 1 1 2 2 3 3 4 4
Now I divide the character vector into subgroups based on these group labels
splitted_groups <- split(fake_data, groups)
splitted_groups
#> $`1`
#> [1] "A" "B"
#>
#> $`2`
#> [1] "C" "D"
#>
#> $`3`
#> [1] "E" "F"
#>
#> $`4`
#> [1] "G" "H"
and create a for loop to save each subgroup to a file
for (i in seq_len(length(fake_data) / group_length)) {
table_data <- data.frame(x = splitted_groups[[i]])
write.csv(table_data, file = paste0("data", i, ".csv"), row.names = FALSE)
}
Created on 2019-07-30 by the reprex package (v0.3.0)
You could also replace the last for loop using the map family defined in the purrr package.
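A variant of the same idea, sketched under a few assumptions: the lines are written with writeLines() because they are already comma-separated text, the group labels are built with ceiling() so that a final, shorter chunk is handled when the length is not a multiple of the chunk size, and the files go to a temporary directory. The toy data and file names are made up for illustration.

```r
# Toy stand-in for the large character vector (7 lines, chunks of 3).
lines <- sprintf("row_%d,1,2", 1:7)
chunk_size <- 3

# Chunk label per line: 1 1 1 2 2 2 3.
chunk_id <- ceiling(seq_along(lines) / chunk_size)
chunks <- split(lines, chunk_id)

# Write into a fresh temporary directory (an assumption; use your own path).
out_dir <- tempfile("chunks_")
dir.create(out_dir)

for (k in seq_along(chunks)) {
  first <- (k - 1) * chunk_size + 1
  last  <- first + length(chunks[[k]]) - 1
  # One line of text per element, no header and no quoting.
  writeLines(chunks[[k]],
             file.path(out_dir, paste0("data", first, "-", last, ".csv")))
}

list.files(out_dir)
```

For the real data, chunk_size would be 1000 and lines the full vector.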

How to paste vector elements comma-separated and in quotation marks?

I want to select columns of the data frame dfr by their names in a certain order, which I first obtain by column position.
> (x <- names(dfr)[c(3, 4, 2, 1, 5)])
[1] "c" "d" "b" "a" "e"
Only the names version should be included in the final code, because it's safer.
dfr[, c("c", "d", "b", "a", "e")]
I want to paste the elements, in quotation marks and separated by commas, into a string that I can include in the final code. I've tried a few options, but they don't give me what I want:
> paste(x, collapse='", "')
[1] "c\", \"d\", \"b\", \"a\", \"e"
> paste(x, collapse="', '")
[1] "c', 'd', 'b', 'a', 'e"
I need something like "'c', 'd', 'b', 'a', 'e'"; of course "c", "d", "b", "a", "e" would be much nicer.
Data
dfr <- setNames(data.frame(matrix(1:15, 3, 5)), letters[1:5])
So dput(x) is the correct answer, but in case you were wondering how to achieve this by modifying your existing code, you could do something like the following:
cat(paste0('c("', paste(x, collapse='", "'), '")'))
c("c", "d", "b", "a", "e")
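For reference, this is roughly what the dput() route looks like. dput(x) prints the vector as R code, and deparse(x) returns the same text as a string; a hand-built version with sprintf() and toString() is shown for comparison. The variable names code and manual are only for illustration, and note that deparse() may split long vectors across several strings.

```r
x <- c("c", "d", "b", "a", "e")

# Prints the vector as R code that can be pasted into a script.
dput(x)

# The same text as a character string.
code <- deparse(x)

# Built by hand: quote each element, join with ", ", wrap in c( ).
# This does not escape quotation marks inside the elements.
manual <- paste0("c(", toString(sprintf('"%s"', x)), ")")

identical(code, manual)
```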
This can also be done with packages (as Tung has shown); here is an example using glue:
library(glue)
glue('c("{v}")', v = glue_collapse(x, '", "'))
c("c", "d", "b", "a", "e")
Try the vector_paste() function from the datapasta package:
library(datapasta)
vector_paste(input_vector = letters[1:3])
#> c("a", "b", "c")
vector_paste_vertical(input_vector = letters[1:3])
#> c("a",
#> "b",
#> "c")
Or, using base R, this gives you what you want:
(x <- letters[1:3])
q <- "\""
( y <- paste0("c(", paste(paste0(q, x, q), collapse = ", ") , ")" ))
[1] "c(\"a\", \"b\", \"c\")"
Though I'm not really sure why you want it. Surely you can simply subset like this:
df <- data.frame(a=1:3, b = 1:3, c = 1:3)
df[ , x]
  a b c
1 1 1 1
2 2 2 2
3 3 3 3
df[ , rev(x)]
  c b a
1 1 1 1
2 2 2 2
3 3 3 3
Suppose you want to add a quotation mark in front of and at the end of a text and save it as an R object: use the capture.output function from the utils package.
Example: I want ABCDEFG to be saved as an R object as "ABCDEFG".
> cat("ABCDEFG")
ABCDEFG
> cat("\"ABCDEFG\"")
"ABCDEFG"
# To save the output of cat as an R object, including the quotation marks at the start and end of the word, use capture.output
> add_quote <- capture.output(cat("\"ABCDEFG\""))
> add_quote
[1] "\"ABCDEFG\""
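If the goal is only to wrap a string in quotation marks, going through cat() and capture.output() may not be necessary. A small sketch, assuming plain paste0() is acceptable for your use case (the variable names are for illustration only):

```r
txt <- "ABCDEFG"

# Wrap the text in double quotes directly.
quoted <- paste0('"', txt, '"')

# The capture.output() approach gives the same string.
via_capture <- capture.output(cat("\"ABCDEFG\""))

identical(quoted, via_capture)
```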

Getting the set of nodes connected till the main parent node in R

I have a data set with 6 rows and 3 columns. The first column holds the children, and the remaining columns hold the immediate parents of the corresponding child.
In this data, "a" and "b" don't have any parents, whereas "c" has only one parent, "a". "d" has parents "b" and "c", and so on.
What I need is: given a child as input, it should give me all the ancestors of that child, including the child itself.
E.g. if "f" is the child I chose, then the desired output should be:
{"f", "d", "b"}, {"f", "d", "c", "a"}, {"f", "e", "b"}, {"f", "e", "c", "a"}.
Note: Order of the nodes does not matter.
Thank you so much in advance.
Create sample data. Note use of stringsAsFactors here, I'm assuming your data are characters and not factors:
> d <- data.frame(list("c" = c("a", "b", "c", "d", "e", "f"), "p1" = c(NA, NA, "a", "b", "b", "d"), "p2" = c(NA, NA, NA, "c", "c", "e")),stringsAsFactors=FALSE)
First tidy it up - make the data long, not wide, with each row being a child-parent pair:
> pairs = subset(reshape2::melt(d,id.vars="c",value.name="parent"), !is.na(parent))[,c("c","parent")]
> pairs
   c parent
3  c      a
4  d      b
5  e      b
6  f      d
10 d      c
11 e      c
12 f      e
Now we can make a graph of the parent-child relationships. This is a directed graph, so plots child-parent as an arrow:
> library(igraph)
> g = graph.data.frame(pairs)
> plot(g)
Now I'm not sure exactly what you want, but igraph functions can do anything... So for example, here's a search of the graph starting at d from which we can get various bits of information:
> d_search = bfs(g,"d",neimode="out", unreachable=FALSE, order=TRUE, dist=TRUE)
First, which nodes are ancestors of d? It's the ones that can be reached from d via the exhaustive (here, breadth-first) search:
> d_search$order
+ 6/6 vertices, named:
[1] d c b a <NA> <NA>
Note it includes d as well. Trivial enough to drop from this list. That gives you the set of ancestors of d which is what you asked for.
What is the relationship of those nodes to d?
> d_search$dist
  c   d   e   f   a   b
  1   0 NaN NaN   2   1
We see that e and f are unreachable, so are not ancestors of d. c and b are direct parents, and a is a grandparent. You can check this from the graph.
You can also get all the paths from any child upwards using functions like shortest_paths and so on.
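If igraph is not available, the same ancestor-and-distance information can be sketched in base R with a level-by-level (breadth-first) walk over the child-parent pairs. This is only an illustration under the assumption that the data has no cycles; ancestor_dist() is a made-up name, not an igraph function.

```r
# Child-parent pairs, same content as the `pairs` data frame above.
pairs <- data.frame(c      = c("c", "d", "e", "f", "d", "e", "f"),
                    parent = c("a", "b", "b", "d", "c", "c", "e"),
                    stringsAsFactors = FALSE)

# Hypothetical helper: number of generations from `start` up to each ancestor.
ancestor_dist <- function(pairs, start) {
  dist <- c(0)
  names(dist) <- start
  frontier <- start
  level <- 0
  while (length(frontier) > 0) {
    level <- level + 1
    # Parents of every node in the current frontier.
    nxt <- unique(pairs$parent[pairs$c %in% frontier])
    # Keep only nodes that have not been reached yet.
    nxt <- setdiff(nxt, names(dist))
    dist[nxt] <- level
    frontier <- nxt
  }
  dist
}

res <- ancestor_dist(pairs, "d")
res         # d is at distance 0, its parents at 1, grandparents at 2
names(res)  # the ancestor set of d, including d itself
```

Unreachable nodes (here e and f) are simply absent from the result.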
Here is a recursive function that makes all possible family lines:
d <- data.frame(list("c" = c("a", "b", "c", "d", "e", "f"),
                     "p1" = c(NA, NA, "a", "b", "b", "d"),
                     "p2" = c(NA, NA, NA, "c", "c", "e")), stringsAsFactors = FALSE)
# Make data more convenient for the task.
library(reshape2)
dp <- melt(d, id = c("c"), value.name = "p")
# Recursive function builds ancestor vectors.
getAncestors <- function(data, x, ancestors = list(x)) {
  parents <- subset(data, c %in% x & !is.na(p), select = c("c", "p"))
  if(nrow(parents) == 0) {
    return(ancestors)
  }
  x.c <- parents$c
  p.c <- parents$p
  ancestors <- lapply(ancestors, function(x) {
    if (is.null(x)) return(NULL)
    # Here we want to repeat ancestor chain for each new parent.
    res <- list()
    matches <- 0
    for (i in 1:nrow(parents)) {
      if (tail(x, 1) == parents[i, ]$c){
        res[[i]] <- c(x, parents[i, ]$p)
        matches <- matches + 1
      }
    }
    if (matches == 0) { # There are no more parents.
      res[[1]] <- x
    }
    return (res)
  })
  # remove one level of lists.
  ancestors <- unlist(ancestors, recursive = FALSE)
  res <- getAncestors(data, p.c, ancestors)
  return (res)
}
# Demo of results for the lowest level.
res <- getAncestors(dp, "f")
res
#[[1]]
#[1] "f" "d" "b"
#[[2]]
#[1] "f" "d" "c" "a"
#[[3]]
#[1] "f" "e" "b"
#[[4]]
#[1] "f" "e" "c" "a"
You will need to implement this in a similar way through recursion or with a while loop.
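As one possible shape for such a recursion, here is a compact base R sketch. It assumes the relationships are first stored as a named list mapping each child to its parents (built by hand below from the same data) and that there are no cycles; ancestor_paths() is a made-up name for illustration.

```r
# Each node mapped to its immediate parents (empty for root nodes).
parents <- list(a = character(0),
                b = character(0),
                c = "a",
                d = c("b", "c"),
                e = c("b", "c"),
                f = c("d", "e"))

# Hypothetical helper: every path from `node` up to a node without parents.
ancestor_paths <- function(node, parents) {
  ps <- parents[[node]]
  if (length(ps) == 0) {
    return(list(node))  # a root ends the path
  }
  out <- list()
  for (p in ps) {
    # Prepend the current node to every path that starts at this parent.
    for (path in ancestor_paths(p, parents)) {
      out[[length(out) + 1]] <- c(node, path)
    }
  }
  out
}

paths <- ancestor_paths("f", parents)
paths
```

For "f" this yields the four family lines listed in the question.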
