Finding specific elements in lists - r

I am stuck at one of the challenges proposed in a tutorial I am reading.
# Using the following code:
challenge_list <- list(words = c("alpha", "beta", "gamma"),
                       numbers = 1:10,
                       letter = letters)
# challenge_list
# Extract the following things:
#
# - The word "gamma"
# - The letters "a", "e", "i", "o", and "u"
# - The numbers less than or equal to 3
I have tried the following:
## 1
challenge_list$"gamma"
## 2
challenge_list [[1]["gamma"]]
But nothing works.

> challenge_list$words[challenge_list$words == "gamma"]
[1] "gamma"
> challenge_list$letter[challenge_list$letter %in% c("a","e","i","o","u")]
[1] "a" "e" "i" "o" "u"
> challenge_list$numbers[challenge_list$numbers<=3]
[1] 1 2 3

We can write a function that subsets by threshold if the element is numeric and by matching values otherwise, then use Map to pair each element of the list with its corresponding filter value and apply f1. This returns a new list with the filtered values.
f1 <- function(x, y) if(is.numeric(x)) x[ x <= y] else x [x %in% y]
out <- Map(f1, challenge_list, list('gamma', 3, c("a","e","i","o","u")))
out
-output
#$words
#[1] "gamma"
#$numbers
#[1] 1 2 3
#$letter
#[1] "a" "e" "i" "o" "u"
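A possible variation on the Map approach above, as a sketch only: Map pairs elements by position, so the filter list has to be in the same order as the data. If the filters are named, they can be reordered by name first. The names keep_matching and filters are made up for this illustration.

```r
challenge_list <- list(words = c("alpha", "beta", "gamma"),
                       numbers = 1:10,
                       letter = letters)

# hypothetical helper: numeric elements are cut off at a threshold,
# everything else is matched against a set of allowed values
keep_matching <- function(x, y) if (is.numeric(x)) x[x <= y] else x[x %in% y]

# the filters are named, so they can be listed in any order
filters <- list(letter = c("a", "e", "i", "o", "u"), words = "gamma", numbers = 3)

# reorder the filters to line up with the names of the data before pairing
out <- Map(keep_matching, challenge_list, filters[names(challenge_list)])
out
```

The result carries the names of challenge_list, because Map takes its names from the first list it is given.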

Try this. Most R objects can be subset using brackets. For a list you need two sets of brackets, as in [[ ]][ ]: the first selects the object inside the list, and the second selects elements within that object. For plain vectors the task is easier, since a single pair of brackets with a condition is enough to extract elements. Here is the code:
#Data
challenge_list <- list(words = c("alpha", "beta", "gamma"),
                       numbers = 1:10,
                       letter = letters)
#Code
challenge_list[[1]][3]
with(challenge_list, letter[letter %in% c("a", "e", "i", "o","u")])
with(challenge_list, numbers[numbers<=3])
Since your data is in a list, you can also select the elements by position, like this:
#Data 2
challenge_list <- list(words = c("alpha", "beta", "gamma"),numbers = 1:10,letter = letters)
#Code 2
challenge_list[[1]][3]
challenge_list[[3]][challenge_list[[3]] %in% c("a", "e", "i", "o","u")]
challenge_list[[2]][challenge_list[[2]]<=3]
Output:
challenge_list[[1]][3]
[1] "gamma"
challenge_list[[3]][challenge_list[[3]] %in% c("a", "e", "i", "o","u")]
[1] "a" "e" "i" "o" "u"
challenge_list[[2]][challenge_list[[2]]<=3]
[1] 1 2 3
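To see why the attempts in the question come back empty, it may help to compare the extraction operators side by side. This is a minimal sketch, assuming the same challenge_list as above; the variable names on the left are only for illustration.

```r
challenge_list <- list(words = c("alpha", "beta", "gamma"),
                       numbers = 1:10,
                       letter = letters)

# `$` and `[[` look up list elements by name; "gamma" is a value, not a name
no_such_element <- challenge_list$gamma        # NULL

# `[[` returns the element itself, `[` returns a sub-list
words_vector  <- challenge_list[["words"]]     # character vector
words_sublist <- challenge_list["words"]       # list of length 1

# once the vector is extracted, filter it by value or by position
gamma_by_value    <- words_vector[words_vector == "gamma"]
gamma_by_position <- words_vector[which(words_vector == "gamma")]
```

So the lookup has to happen in two steps: first pick the element of the list by name or position, then subset the vector inside it.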

Related

Turn a datatable into a two-dimensional list in R

I have a data.table (see dt) that I want to turn into a 2-dimensional list for later use. For example, a, b and c are column names of another data.table; I want to take the value of whichever of a, b and c is non-missing and impute it into x, and so on. The 2-dimensional list will act as a reference object for the fcoalesce function.
# example
library(data.table)
dt <- data.table(col1 = c("a", "b", "c", "d", "e", "f"),
                 col2 = c("x", "x", "x", "y", "y", "z"))
# desirable result
list.1 <- list(c("a", "b", "c"), c("d", "e"), c("f"))
list.2 <- list("x", "y", "z")
list(list.1, list.2)
Since the actual dt is much larger than the example dt, is there a more efficient way to do it?
You can use split():
lst1 <- split(dt$col1, dt$col2)
lst2 <- as.list(names(lst1))
result <- list(unname(lst1), lst2)
result
# [[1]]
# [[1]][[1]]
# [1] "a" "b" "c"
#
# [[1]][[2]]
# [1] "d" "e"
#
# [[1]][[3]]
# [1] "f"
#
#
# [[2]]
# [[2]][[1]]
# [1] "x"
#
# [[2]][[2]]
# [1] "y"
#
# [[2]][[3]]
# [1] "z"
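One thing to keep in mind with split(): the groups come back sorted by key, not in order of first appearance. In the example the two happen to coincide. If the original order matters, fixing the factor levels is one way to keep it. The sketch below uses a plain data frame with reordered keys (an assumption, to stay in base R); the same split() call should work on data.table columns.

```r
# plain data frame standing in for the data.table (an assumption)
dt <- data.frame(col1 = c("a", "b", "c", "d", "e", "f"),
                 col2 = c("z", "z", "z", "y", "y", "x"),
                 stringsAsFactors = FALSE)

# default: groups come back sorted by key
sorted_groups <- split(dt$col1, dt$col2)
names(sorted_groups)

# fixing the factor levels keeps the keys in order of first appearance
keys <- factor(dt$col2, levels = unique(dt$col2))
ordered_groups <- split(dt$col1, keys)
names(ordered_groups)

result <- list(unname(ordered_groups), as.list(names(ordered_groups)))
```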

How to order vectors with priority layout?

Consider the following vector of strings:
x <- c("B", "C_small", "A", "B_big", "C", "A_huge", "D", "A_big", "B_tremendous")
As you can see, some strings in this vector start the same way, e.g. "B" and "B_big".
What I want to end up with is a vector ordered so that all strings with the same prefix are next to each other, while the prefixes keep their order of first appearance ("B" first, "C" second, and so on).
In other words, I want to end up with this vector:
"B", "B_big", "B_tremendous", "C_small", "C", "A", "A_huge", "A_big", "D"
How I arrived at this vector: reading from the left, I see "B", so I look for all other strings that start the same way and put them to the right of "B". Next is "C", so I look through the remaining strings and put all those starting with "C" (e.g. "C_small") to its right, and so on.
I'm not sure how to do it. I suspect the gsub function could help, but I don't know how to combine it with this kind of searching and reordering. Could you please give me a hand?
Here's one option:
x <- c("B", "C_small", "A", "B_big", "C", "A_huge", "D", "A_big", "B_tremendous")
xorder <- unique(substr(x, 1, 1))
xnew <- c()
for (letter in xorder) {
  if (letter %in% substr(x, 1, 1)) {
    xnew <- c(xnew, x[substr(x, 1, 1) == letter])
  }
}
xnew
[1] "B" "B_big" "B_tremendous" "C_small" "C"
[6] "A" "A_huge" "A_big" "D"
Use the "prefix" as factor levels and then order:
sx = substr(x, 1, 1)
x[order(factor(sx, levels = unique(sx)))]
# [1] "B" "B_big" "B_tremendous" "C_small" "C" "A" "A_huge" "A_big" "D"
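To make the factor trick less opaque, here is the same idea unrolled step by step. It is only a sketch; taking the prefix with sub() instead of substr() is my own extension, so that prefixes longer than one character would also work.

```r
x <- c("B", "C_small", "A", "B_big", "C", "A_huge", "D", "A_big", "B_tremendous")

# everything before the first underscore (the whole string if there is none)
prefix <- sub("_.*$", "", x)

# levels in order of first appearance: "B" "C" "A" "D"
groups <- factor(prefix, levels = unique(prefix))

# the integer codes are what order() effectively sorts on
codes <- as.integer(groups)

# order() leaves ties in their original relative order,
# so strings within one group stay in input order
result <- x[order(codes)]
result
```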
If you are open to non-base alternatives, data.table::chgroup may be used. It "groups together duplicated values but retains the group order (according the first appearance order of each group), efficiently":
x[data.table::chgroup(substr(x, 1, 1))]
# [1] "B" "B_big" "B_tremendous" "C_small" "C" "A" "A_huge" "A_big" "D"
I suggest splitting the two parts of the text into separate dimensions. Then, define a clear rank order for the descriptive part of the name using a named character vector. From there you can reorder the input vector on the fly. Bundled as a function:
x <- c("B", "C_small", "A", "B_big", "C", "A_huge", "D", "A_big", "B_tremendous")
sorter <- function(x) {
  # separate the two parts
  prefix <- sub("_.*$", "", x)
  # inputs without an underscore have no suffix
  suffix <- ifelse(grepl("_", x), sub("^.*_", "", x), "none")
  # map each suffix to a rank ordering
  suffix_order <- c(
    "small" = -1,
    "none" = 0,
    "big" = 1,
    "huge" = 2,
    "tremendous" = 3
  )
  # return input vector,
  # ordered by the prefix and the mapping of suffix to rank
  x[order(prefix, suffix_order[suffix])]
}
sorter(x)
Result
[1] "A" "A_big" "A_huge" "B" "B_big" "B_tremendous" "C_small" "C"
[9] "D"

How to obtain the list of elements from a Venn diagram

I have a Venn diagram made from 3 lists. I would like to obtain all the different sub-lists: the elements common to two lists, the elements common to all three of them, and the elements unique to each list. Is there a way to do this as straightforwardly as possible?
AW.DL <- c("a","b","c","d")
AW.FL <- c("a","b", "e", "f")
AW.UL <- c("a","c", "e", "g")
venn.diagram(
x = list(AW.DL, AW.FL, AW.UL),
category.names = c("AW.DL" , "AW.FL","AW.UL" ),
filename = '#14_venn_diagramm.png',
output=TRUE,
na = "remove"
)
I found that the package VennDiagram has a function calculate.overlap(), but I wasn't able to find a way to name the sections it returns. However, the package gplots has the function venn(), whose return value carries an intersections attribute.
AW.DL <- c("a","b","c","d")
AW.FL <- c("a","b", "e", "f")
AW.UL <- c("a","c", "e", "g")
library(gplots)
lst <- list(AW.DL,AW.FL,AW.UL)
ItemsList <- venn(lst, show.plot = FALSE)
lengths(attributes(ItemsList)$intersections)
Output:
> lengths(attributes(ItemsList)$intersections)
A B C A:B A:C B:C A:B:C
1 1 1 1 1 1 1
To get elements, just print attributes(ItemsList)$intersections:
> attributes(ItemsList)$intersections
$A
[1] "d"
$B
[1] "f"
$C
[1] "g"
$`A:B`
[1] "b"
$`A:C`
[1] "c"
$`B:C`
[1] "e"
$`A:B:C`
[1] "a"
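If you would rather not depend on a package for this, the same seven regions can be worked out with base R set operations. This is a rough sketch for exactly three sets; the variable names on the left are made up for illustration.

```r
AW.DL <- c("a", "b", "c", "d")
AW.FL <- c("a", "b", "e", "f")
AW.UL <- c("a", "c", "e", "g")

# elements shared by all three sets
all_three <- Reduce(intersect, list(AW.DL, AW.FL, AW.UL))

# elements shared by exactly two sets: pairwise overlap minus the third set
dl_fl_only <- setdiff(intersect(AW.DL, AW.FL), AW.UL)
dl_ul_only <- setdiff(intersect(AW.DL, AW.UL), AW.FL)
fl_ul_only <- setdiff(intersect(AW.FL, AW.UL), AW.DL)

# elements unique to one set
dl_only <- setdiff(AW.DL, union(AW.FL, AW.UL))
fl_only <- setdiff(AW.FL, union(AW.DL, AW.UL))
ul_only <- setdiff(AW.UL, union(AW.DL, AW.FL))
```

This gets verbose quickly with more than three sets, which is where the intersections attribute from venn() is more convenient.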

Store loop result for each iteration as separate vectors

I have an all-character matrix with 13 rows and 28 columns (my_matrix):
N N ... S N
N S ... F N
N S ... Z NA
NA S ... F NA
.. .. ... .. ..
NA NA ... N NA
What I want to do is:
Delete all NAs
Delete every column that contains at least one of these characters: "Z", "B", "E", "V", "D", "VS", "VZ"
And store the remaining columns as separate vectors, like this (or by row, either way is okay):
my_vector1 = N N N
my_vector2 = N S S S ...etc.
What I have done is write a loop that looks like this:
forbidden_letter <- c("Z", "B", "E", "V", "D", "VS", "VZ")
for(i in 1:28) {
if(!any(forbidden_letter %in% my_matrix[,i]==TRUE)){
temp_my_vector <- matrix(my_matrix[,i], byrow = TRUE)
my_vector <- matrix(temp_my_vector[!temp_my_vector %in% NA], byrow= TRUE)
print(my_vector)}
}
my_vector
With this piece of code, I only get the my_vector from the last iteration.
How do I save each iteration's result as a separate vector?
Let's consider some data:
(dat <- cbind(c("N", "N", "N", NA, NA), c("N", "S", "S", "S", NA), c("S", "F", "Z", "F", "N"), c("N", "N", NA, NA, NA)))
# [,1] [,2] [,3] [,4]
# [1,] "N" "N" "S" "N"
# [2,] "N" "S" "F" "N"
# [3,] "N" "S" "Z" NA
# [4,] NA "S" "F" NA
# [5,] NA NA "N" NA
forbidden_letter <- c("Z", "B", "E", "V", "D", "VS", "VZ")
You can process each column separately with apply:
(vectors <- apply(dat, 2, function(x) if (any(x %in% forbidden_letter)) NULL else x[!is.na(x)]))
# [[1]]
# [1] "N" "N" "N"
#
# [[2]]
# [1] "N" "S" "S" "S"
#
# [[3]]
# NULL
#
# [[4]]
# [1] "N" "N"
You could then get the vector for each column with:
vectors[[1]]
# [1] "N" "N" "N"
vectors[[4]]
# [1] "N" "N"
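A caveat with apply() worth knowing: it simplifies the result to a matrix whenever every column yields a vector of the same length, so the list output above depends on the data. As a sketch of one way around that, lapply() over the column indices always returns a list, and Filter() can drop the NULL placeholders afterwards. The name kept is only for illustration.

```r
dat <- cbind(c("N", "N", "N", NA, NA),
             c("N", "S", "S", "S", NA),
             c("S", "F", "Z", "F", "N"),
             c("N", "N", NA, NA, NA))
forbidden_letter <- c("Z", "B", "E", "V", "D", "VS", "VZ")

# lapply() over column indices returns a list regardless of result lengths
vectors <- lapply(seq_len(ncol(dat)), function(j) {
  column <- dat[, j]
  if (any(column %in% forbidden_letter)) NULL else column[!is.na(column)]
})

# drop the NULL placeholders left by rejected columns
kept <- Filter(Negate(is.null), vectors)
kept
```

Note that after filtering, the positions in kept no longer match the original column numbers, so keep vectors around if you need that mapping.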
I believe you want to filter your matrix based on some conditions.
If you have a matrix my_matrix, you can first drop the columns that contain forbidden letters:
has_forbidden <- apply(my_matrix, 2, function(col) any(col %in% forbidden_letter))
my_matrix <- my_matrix[, !has_forbidden, drop = FALSE]
This keeps only the columns without forbidden letters.
Now, you can access any column via
my_matrix[, col_name]
or
my_matrix[, col_index]
And the result is already a vector, as you want. You don't need to store each column in another variable as you intended. The matrix object already does that for you.
Finally, to remove the NA values, you can simply do na.omit after filtering. For example,
print(na.omit(my_matrix[, 3]))
will print out only non-NA values of your third column. The logic can be extended to any column.

Extract rows from data frame as list of vectors

For example, I have the data
a <- c("a", "b", "c")
b <- c("x", "y", "z")
df <- data.frame(a = a, b = b)
I want to do something to df so my result is
list(c("a", "x"), c("b", "y"), c("c", "z"))
I want the solution to be vectorized, and I'm having trouble utilizing the right *apply functions...
If you want to split up a data.frame by rows, try
split(df, seq.int(nrow(df)))
Here's another way.
as.data.frame(t(df),stringsAsFactors=FALSE,row.names=NA)
# V1 V2 V3
# 1 a b c
# 2 x y z
This produces a data frame, which is in fact a list of vectors. If you must have a "true" list, you could use this:
as.list(as.data.frame(t(df),stringsAsFactors=FALSE,row.names=NA))
# $V1
# [1] "a" "x"
#
# $V2
# [1] "b" "y"
#
# $V3
# [1] "c" "z"
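For an unnamed list of plain vectors, exactly as in the question, Map() over the columns is another option. Note that split() above gives one-row data frames rather than vectors. The following is a sketch, assuming the columns are character (stringsAsFactors = FALSE); the names cols and rows are made up here.

```r
df <- data.frame(a = c("a", "b", "c"),
                 b = c("x", "y", "z"),
                 stringsAsFactors = FALSE)

# columns as an unnamed list, so c() does not attach column names to the values
cols <- unname(as.list(df))

# Map() walks the columns in parallel and combines the i-th entries with c();
# the outer unname() drops the names Map() derives from the first column
rows <- unname(do.call(Map, c(list(f = c), cols)))
rows
```

Since do.call() passes every column to Map(), this works for any number of columns, not just two.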
