Turn a data.table into a two-dimensional list in R

I have a data.table (see dt below) that I want to turn into a two-dimensional list for later use. For example, a, b and c are column names of another data.table; I want to take the value of whichever of a, b and c is non-missing and impute it into x, and so on for the other groups. The two-dimensional list will act as a reference object for the fcoalesce function.
# example
library(data.table)
dt <- data.table(col1 = c("a", "b", "c", "d", "e", "f"),
                 col2 = c("x", "x", "x", "y", "y", "z"))
# desirable result
list.1 <- list(c("a", "b", "c"), c("d", "e"), c("f"))
list.2 <- list("x", "y", "z")
list(list.1, list.2)
Since the actual dt is much larger than the example dt, is there a more efficient way to do it?

You can use split():
lst1 <- split(dt$col1, dt$col2)
lst2 <- as.list(names(lst1))
result <- list(unname(lst1), lst2)
result
# [[1]]
# [[1]][[1]]
# [1] "a" "b" "c"
#
# [[1]][[2]]
# [1] "d" "e"
#
# [[1]][[3]]
# [1] "f"
#
#
# [[2]]
# [[2]][[1]]
# [1] "x"
#
# [[2]][[2]]
# [1] "y"
#
# [[2]][[3]]
# [1] "z"
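As a hedged sketch of how the split() result could then drive fcoalesce: the wide table `other` and its values below are made up for illustration (they are not from the question), and the loop assumes every name in col1 is a column of that table and that the columns in each group share a type.

```r
library(data.table)

dt <- data.table(col1 = c("a", "b", "c", "d", "e", "f"),
                 col2 = c("x", "x", "x", "y", "y", "z"))

# Named list: one character vector of source columns per target column
groups <- split(dt$col1, dt$col2)

# Hypothetical wide table (illustrative values only)
other <- data.table(a = c(1, NA), b = c(NA, 2), c = c(3, NA),
                    d = c(NA, 4), e = c(5, 6), f = c(7, NA))

for (target in names(groups)) {
  cols <- groups[[target]]
  # fcoalesce() accepts a data.table of same-typed columns and keeps,
  # row by row, the first non-missing value
  set(other, j = target, value = fcoalesce(other[, cols, with = FALSE]))
}
```

Keeping the names on the split() result (rather than the unnamed two-part list) makes this lookup a little simpler, since each target name sits next to its source columns.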

Related

How to obtain the list of elements from a Venn diagram

I have a Venn diagram made from 3 lists. I would like to obtain all the different sub-lists: the elements common to each pair of lists, the elements common to all three of them, and the elements unique to each list. Is there a way to make this as straightforward as possible?
AW.DL <- c("a","b","c","d")
AW.FL <- c("a","b", "e", "f")
AW.UL <- c("a","c", "e", "g")
library(VennDiagram)
venn.diagram(
x = list(AW.DL, AW.FL, AW.UL),
category.names = c("AW.DL" , "AW.FL","AW.UL" ),
filename = '#14_venn_diagramm.png',
output=TRUE,
na = "remove"
)
The package VennDiagram has a function calculate.overlap(), but I wasn't able to find a way to name the sections it returns. However, the package gplots has the function venn(), whose return value carries an intersections attribute.
AW.DL <- c("a","b","c","d")
AW.FL <- c("a","b", "e", "f")
AW.UL <- c("a","c", "e", "g")
library(gplots)
lst <- list(AW.DL,AW.FL,AW.UL)
ItemsList <- venn(lst, show.plot = FALSE)
lengths(attributes(ItemsList)$intersections)
Output:
> lengths(attributes(ItemsList)$intersections)
    A     B     C   A:B   A:C   B:C A:B:C
    1     1     1     1     1     1     1
To get elements, just print attributes(ItemsList)$intersections:
> attributes(ItemsList)$intersections
$A
[1] "d"
$B
[1] "f"
$C
[1] "g"
$`A:B`
[1] "b"
$`A:C`
[1] "c"
$`B:C`
[1] "e"
$`A:B:C`
[1] "a"
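If you would rather not depend on a plotting package for this step, the same regions can be sketched in base R. The labels A, B and C are chosen here to mirror the gplots output; this is an illustration, not how either package computes the overlaps.

```r
AW.DL <- c("a", "b", "c", "d")
AW.FL <- c("a", "b", "e", "f")
AW.UL <- c("a", "c", "e", "g")

sets <- list(A = AW.DL, B = AW.FL, C = AW.UL)
all_items <- unique(unlist(sets))

# Logical matrix: one row per item, one column per set
membership <- sapply(sets, function(s) all_items %in% s)

# Label each item with the exact combination of sets it belongs to
region <- apply(membership, 1, function(m) paste(names(sets)[m], collapse = ":"))

# Group the items by region label
regions <- split(all_items, region)
```

Each element of `regions` then holds the items that fall in exactly that combination of sets, for example `regions[["A:B"]]` for items in the first two lists but not the third.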

Finding specific elements in lists

I am stuck at one of the challenges proposed in a tutorial I am reading.
# Using the following code:
challenge_list <- list(words = c("alpha", "beta", "gamma"),
numbers = 1:10,
letter = letters)
# challenge_list
# Extract the following things:
#
# - The word "gamma"
# - The letters "a", "e", "i", "o", and "u"
# - The numbers less than or equal to 3
I have tried the following:
## 1
challenge_list$"gamma"
## 2
challenge_list [[1]["gamma"]]
But nothing works.
> challenge_list$words[challenge_list$words == "gamma"]
[1] "gamma"
> challenge_list$letter[challenge_list$letter %in% c("a","e","i","o","u")]
[1] "a" "e" "i" "o" "u"
> challenge_list$numbers[challenge_list$numbers<=3]
[1] 1 2 3
We can write a function f1 that subsets its input differently depending on whether it is numeric: values less than or equal to y for numeric input, values contained in y otherwise. Map then applies f1 to each element of the list, paired with the corresponding element of a second list of targets, and returns a new list with the filtered values:
f1 <- function(x, y) if(is.numeric(x)) x[ x <= y] else x [x %in% y]
out <- Map(f1, challenge_list, list('gamma', 3, c("a","e","i","o","u")))
out
Output:
#$words
#[1] "gamma"
#$numbers
#[1] 1 2 3
#$letter
#[1] "a" "e" "i" "o" "u"
Try this. Most R objects can be subset with brackets. For a list you chain two sets, as in [[ ]][ ]: the double brackets select an object inside the list, and the single brackets select elements inside that object. For a plain vector you need only single brackets with a condition inside. Here is the code:
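#Data
challenge_list <- list(words = c("alpha", "beta", "gamma"),
numbers = 1:10,
letter = letters)
# Pull the vectors out of the list so they can be filtered directly
letter <- challenge_list$letter
numbers <- challenge_list$numbers
#Code
challenge_list[[1]][3]  # third element of the first component: "gamma"
letter[letter %in% c("a", "e", "i", "o","u")]
numbers[numbers<=3]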
As I have noticed your data is in a list, you can also play with the position of the elements like this:
#Data 2
challenge_list <- list(words = c("alpha", "beta", "gamma"),numbers = 1:10,letter = letters)
#Code 2
challenge_list[[1]][3]
challenge_list[[3]][challenge_list[[3]] %in% c("a", "e", "i", "o","u")]
challenge_list[[2]][challenge_list[[2]]<=3]
Output:
challenge_list[[1]][3]
[1] "gamma"
challenge_list[[3]][challenge_list[[3]] %in% c("a", "e", "i", "o","u")]
[1] "a" "e" "i" "o" "u"
challenge_list[[2]][challenge_list[[2]]<=3]
[1] 1 2 3
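To make the bracket distinction above concrete, here is a small sketch; the variable names `sub_list`, `vec` and `gamma_word` are only for illustration.

```r
challenge_list <- list(words = c("alpha", "beta", "gamma"),
                       numbers = 1:10,
                       letter = letters)

# Single brackets keep the list wrapper: the result is a list of length 1
sub_list <- challenge_list["words"]

# Double brackets (or $) return the object stored inside the list
vec <- challenge_list[["words"]]

# Only then can the usual vector filtering be applied
gamma_word <- vec[vec == "gamma"]
```

This is why `challenge_list$"gamma"` returns NULL: `$` looks up a list element by name, and the list has no element named gamma.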

Store loop result for each iteration as separate vectors

I have a character matrix with 13 rows and 28 columns (my_matrix):
N N ... S N
N S ... F N
N S ... Z NA
NA S ... F NA
.. .. ... .. ..
NA NA ... N NA
What I want to do is:
Delete all NAs
Delete every column that contains at least one of these characters: "Z", "B", "E", "V", "D", "VS", "VZ"
Store the remaining columns as separate vectors, like this (or by row, either way is okay):
my_vector1 = N N N
my_vector2 = N S S S ...etc.
What I have done is write a loop that looks like this:
forbidden_letter <- c("Z", "B", "E", "V", "D", "VS", "VZ")
for(i in 1:28) {
if(!any(forbidden_letter %in% my_matrix[,i]==TRUE)){
temp_my_vector <- matrix(my_matrix[,i], byrow = TRUE)
my_vector <- matrix(temp_my_vector[!temp_my_vector %in% NA], byrow= TRUE)
print(my_vector)}
}
my_vector
With this piece of code, I only get the my_vector from the last iteration.
How do I save the result of each iteration as a separate vector?
Let's consider some data:
(dat <- cbind(c("N", "N", "N", NA, NA), c("N", "S", "S", "S", NA), c("S", "F", "Z", "F", "N"), c("N", "N", NA, NA, NA)))
#      [,1] [,2] [,3] [,4]
# [1,] "N"  "N"  "S"  "N"
# [2,] "N"  "S"  "F"  "N"
# [3,] "N"  "S"  "Z"  NA
# [4,] NA   "S"  "F"  NA
# [5,] NA   NA   "N"  NA
forbidden_letter <- c("Z", "B", "E", "V", "D", "VS", "VZ")
You can process each column separately with apply:
(vectors <- apply(dat, 2, function(x) if (any(x %in% forbidden_letter)) NULL else x[!is.na(x)]))
# [[1]]
# [1] "N" "N" "N"
#
# [[2]]
# [1] "N" "S" "S" "S"
#
# [[3]]
# NULL
#
# [[4]]
# [1] "N" "N"
You could then get the vector for each column with:
vectors[[1]]
# [1] "N" "N" "N"
vectors[[4]]
# [1] "N" "N"
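A variant sketch, in case the NULL placeholders are unwanted: filter the columns first, then build the list with lapply. Note that apply simplifies its result to a matrix when every column yields a vector of the same length, whereas lapply always returns a list. The names my_vector1, my_vector2, ... are borrowed from the question and simply number the kept columns in order.

```r
dat <- cbind(c("N", "N", "N", NA, NA),
             c("N", "S", "S", "S", NA),
             c("S", "F", "Z", "F", "N"),
             c("N", "N", NA, NA, NA))
forbidden_letter <- c("Z", "B", "E", "V", "D", "VS", "VZ")

# TRUE for columns that contain none of the forbidden letters
keep <- !apply(dat, 2, function(x) any(x %in% forbidden_letter))

# One NA-free vector per kept column; lapply always returns a list
vectors <- lapply(which(keep), function(j) {
  x <- dat[, j]
  x[!is.na(x)]
})
names(vectors) <- paste0("my_vector", seq_along(vectors))
```

The individual vectors are then available as `vectors$my_vector1`, `vectors$my_vector2`, and so on, which is usually easier to work with than separate variables created inside a loop.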
I believe you want to filter your matrix based on some conditions.
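If you have a matrix my_matrix, you can first drop the columns that contain forbidden letters. A matrix column cannot be removed by assigning NULL to it, so build a logical index of the columns to keep and subset with it:
# TRUE for columns that contain none of the forbidden letters
keep <- !apply(my_matrix, 2, function(x) any(x %in% forbidden_letter))
my_matrix <- my_matrix[, keep, drop = FALSE]
This removes the columns with forbidden letters.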
Now, you can access any column via
my_matrix[, col_name]
or
my_matrix[, col_index]
And the result is already a vector, as you want. You don't need to store each column in another variable as you intended. The matrix object already does that for you.
Finally, to remove the NA values, you can simply do na.omit after filtering. For example,
print(na.omit(my_matrix[, 3]))
will print out only non-NA values of your third column. The logic can be extended to any column.

Extract rows from data frame as list of vectors

For example, I have the data
a <- c("a", "b", "c")
b <- c("x", "y", "z")
df <- data.frame(a = a, b = b)
I want to do something to df so my result is
list(c("a", "x"), c("b", "y"), c("c", "z"))
I want the solution to be vectorized, and I'm having trouble utilizing the right *apply functions...
If you want to split up a data.frame by rows, try
split(df, seq.int(nrow(df)))
Here's another way.
as.data.frame(t(df),stringsAsFactors=FALSE,row.names=NA)
#   V1 V2 V3
# 1  a  b  c
# 2  x  y  z
This produces a data frame, which is in fact a list of vectors. If you must have a "true" list, you could use this:
as.list(as.data.frame(t(df),stringsAsFactors=FALSE,row.names=NA))
# $V1
# [1] "a" "x"
#
# $V2
# [1] "b" "y"
#
# $V3
# [1] "c" "z"
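If the goal is exactly the unnamed list shown in the question, one possible sketch is to walk over the row indices. It assumes all columns are character (hence the explicit stringsAsFactors = FALSE), and it is a loop under the hood rather than a truly vectorized operation.

```r
df <- data.frame(a = c("a", "b", "c"),
                 b = c("x", "y", "z"),
                 stringsAsFactors = FALSE)

# One character vector per row; use.names = FALSE drops the column names
rows <- lapply(seq_len(nrow(df)), function(i) unlist(df[i, ], use.names = FALSE))
```

Unlike split(df, seq.int(nrow(df))), which returns one-row data frames, each element of `rows` is a plain vector.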

Inserting values in one list into another list by index

I have two lists x and y, and a vector of indices where.
x <- list(a = 1:4, b = letters[1:6])
y <- list(a = c(20, 50), b = c("abc", "xyz"))
where <- c(2, 4)
I want to insert y into x at the indices in where, so that the result is
list(a = c(1,20,2,50,3,4), b = c("a", "abc", "b", "xyz", "c", "d", "e", "f"))
#$a
#[1] 1 20 2 50 3 4
#
#$b
#[1] "a" "abc" "b" "xyz" "c" "d" "e" "f"
I've been trying it with append, but it's not working.
lapply(seq(x), function(i) append(x[[i]], y[[i]], after = where[i]))
#[[1]]
#[1] 1 2 20 50 3 4
#
#[[2]]
#[1] "a" "b" "c" "d" "abc" "xyz" "e" "f"
This is appending at the wrong index. Plus, I want to retain the list names in the process. I also don't know if append is the right function for this, since I've literally never seen it used anywhere.
What's the best way to insert values from one list into another list using an index vector?
How about a mapply solution:
x <- list(a = 1:4, b = letters[1:6])
y <- list(a = c(20, 50), b = c("abc", "xyz"))
where <- c(2, 4)
mapply(function(x,y,w) {
r <- vector(class(x), length(x)+length(y))
r[-w] <- x
r[w] <- y
r
}, x, y, MoreArgs=list(where), SIMPLIFY=FALSE)
which returns
$a
[1] 1 20 2 50 3 4
$b
[1] "a" "abc" "b" "xyz" "c" "d" "e" "f"
which seems to be the results you desire.
Here I created an APPEND function, an iterative (via Reduce) version of append:
APPEND <- function(x, where, y)
Reduce(function(z, args)do.call(append, c(list(z), args)),
Map(list, y, where - 1), init = x)
Then you just need to call that function via Map:
Map(APPEND, x, list(where), y)
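For readers who find the Reduce version hard to follow, the same idea can be sketched as an explicit loop. The original attempt went wrong because append inserts its whole `values` vector after a single position; inserting one value at a time avoids that. The helper name `insert_at` is hypothetical, and the sketch assumes the positions are sorted in increasing order and refer to indices in the final result.

```r
# Hypothetical helper: insert values[k] so that it ends up at positions[k]
insert_at <- function(x, values, positions) {
  for (k in seq_along(values)) {
    # append() inserts after an index, so target position p means after p - 1
    x <- append(x, values[k], after = positions[k] - 1)
  }
  x
}

x <- list(a = 1:4, b = letters[1:6])
y <- list(a = c(20, 50), b = c("abc", "xyz"))
where <- c(2, 4)

# Map keeps the names of x and recycles the single index vector
result <- Map(insert_at, x, y, list(where))
```

Because Map takes its names from the first list, the result keeps the names a and b, which the lapply(seq(x), ...) attempt in the question lost.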
