Store loop result for each iteration as separate vectors in R

I have an all-character matrix with 13 rows and 28 columns (my_matrix):
N N ... S N
N S ... F N
N S ... Z NA
NA S ... F NA
.. .. ... .. ..
NA NA ... N NA
What I want to do is:
1. Delete all NAs.
2. Delete every column that contains at least one of these characters: "Z", "B", "E", "V", "D", "VS", "VZ".
3. Store the remaining columns as separate vectors, like this (storing by row is also okay):
my_vector1 = N N N
my_vector2 = N S S S ...etc.
So far, I have written a loop that looks like this:
forbidden_letter <- c("Z", "B", "E", "V", "D", "VS", "VZ")
for(i in 1:28) {
  if(!any(forbidden_letter %in% my_matrix[,i]==TRUE)){
    temp_my_vector <- matrix(my_matrix[,i], byrow = TRUE)
    my_vector <- matrix(temp_my_vector[!temp_my_vector %in% NA], byrow= TRUE)
    print(my_vector)
  }
}
my_vector
With this code, I only get my_vector from the last iteration.
How do I save the result of each iteration as a separate vector?

Let's consider some data:
(dat <- cbind(c("N", "N", "N", NA, NA), c("N", "S", "S", "S", NA), c("S", "F", "Z", "F", "N"), c("N", "N", NA, NA, NA)))
# [,1] [,2] [,3] [,4]
# [1,] "N" "N" "S" "N"
# [2,] "N" "S" "F" "N"
# [3,] "N" "S" "Z" NA
# [4,] NA "S" "F" NA
# [5,] NA NA "N" NA
forbidden_letter <- c("Z", "B", "E", "V", "D", "VS", "VZ")
You can process each column separately with apply:
(vectors <- apply(dat, 2, function(x) if (any(x %in% forbidden_letter)) NULL else x[!is.na(x)]))
# [[1]]
# [1] "N" "N" "N"
#
# [[2]]
# [1] "N" "S" "S" "S"
#
# [[3]]
# NULL
#
# [[4]]
# [1] "N" "N"
You could then get the vector for each column with:
vectors[[1]]
# [1] "N" "N" "N"
vectors[[4]]
# [1] "N" "N"
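The question's own for loop can also keep every iteration: the fix is to store each result in its own slot of a list instead of overwriting my_vector each time. A minimal sketch, assuming the example matrix dat and forbidden_letter from the apply() answer (redefined so the block runs on its own); my_vectors and column are illustrative names, not from the original.

```r
# Example matrix and forbidden values from the apply() answer above
dat <- cbind(c("N", "N", "N", NA, NA),
             c("N", "S", "S", "S", NA),
             c("S", "F", "Z", "F", "N"),
             c("N", "N", NA, NA, NA))
forbidden_letter <- c("Z", "B", "E", "V", "D", "VS", "VZ")

# Pre-allocate a list with one (NULL) slot per column
my_vectors <- vector("list", ncol(dat))

for (i in seq_len(ncol(dat))) {
  column <- dat[, i]
  if (!any(column %in% forbidden_letter)) {
    # Save this iteration's result in its own slot instead of overwriting
    my_vectors[[i]] <- column[!is.na(column)]
  }
}

# Skipped columns left NULL slots; drop them and name the rest
my_vectors <- Filter(Negate(is.null), my_vectors)
names(my_vectors) <- paste0("my_vector", seq_along(my_vectors))
my_vectors$my_vector2
```

If you really need separate variables named my_vector1, my_vector2, ..., list2env(my_vectors, envir = globalenv()) would create them, though a single list is usually easier to loop over later.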

I believe you want to filter your matrix based on some conditions.
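If you have a matrix my_matrix, you can first drop the columns that contain forbidden letters:
# TRUE for each column that contains at least one forbidden letter
has_forbidden <- apply(my_matrix, 2, function(x) any(x %in% forbidden_letter))
# Keep the other columns; drop = FALSE keeps a matrix even if one column remains
my_matrix <- my_matrix[, !has_forbidden, drop = FALSE]
This deletes the columns with forbidden letters. (Unlike a data frame, a matrix column cannot be removed by assigning NULL to it, so the matrix is subset instead.)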
Now you can access any column via
my_matrix[, col_name]
(if the matrix has column names) or
my_matrix[, col_index]
And the result is already a vector, as you want. You don't need to store each column in another variable as you intended. The matrix object already does that for you.
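Finally, to remove the NA values, you can use na.omit after filtering. For example,
print(na.omit(my_matrix[, 3]))
prints only the non-NA values of the third column, followed by an na.action attribute recording which positions were dropped (my_matrix[!is.na(my_matrix[, 3]), 3] returns a plain vector instead). The same works for any column.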
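Putting the two steps of this answer together, as a sketch that assumes the same example matrix dat as the apply() answer (kept and col_list are illustrative names): the column filter subsets the matrix rather than assigning NULL, and NAs are removed with x[!is.na(x)] so the results carry no na.action attribute.

```r
# Same example data as the apply() answer above
dat <- cbind(c("N", "N", "N", NA, NA),
             c("N", "S", "S", "S", NA),
             c("S", "F", "Z", "F", "N"),
             c("N", "N", NA, NA, NA))
forbidden_letter <- c("Z", "B", "E", "V", "D", "VS", "VZ")

# Step 1: drop columns containing any forbidden letter (subset, don't assign NULL)
has_forbidden <- apply(dat, 2, function(x) any(x %in% forbidden_letter))
kept <- dat[, !has_forbidden, drop = FALSE]

# Step 2: one vector per remaining column, with NAs removed
col_list <- lapply(seq_len(ncol(kept)), function(j) {
  x <- kept[, j]
  x[!is.na(x)]
})
col_list
```

lapply over the column indices is used here rather than apply because apply simplifies its result to a matrix whenever all returned vectors happen to have the same length, whereas lapply always returns a list.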

Related

Turn a datatable into a two-dimensional list in R

I have a data.table (see dt). I want to turn it into a two-level list for later use: for example, a, b and c are column names of another data.table, and I want to take the value from whichever of a, b and c is non-missing and put it into x, and so on. The two-level list will act as a reference object for the fcoalesce function.
# example
library(data.table)
dt <- data.table(col1 = c("a", "b", "c", "d", "e", "f"),
                 col2 = c("x", "x", "x", "y", "y", "z"))
# desired result
list.1 <- list(c("a", "b", "c"), c("d", "e"), c("f"))
list.2 <- list("x", "y", "z")
list(list.1, list.2)
Since the actual dt is much larger than the example dt, is there a more efficient way to do it?
You can use split():
lst1 <- split(dt$col1, dt$col2)
lst2 <- as.list(names(lst1))
result <- list(unname(lst1), lst2)
result
# [[1]]
# [[1]][[1]]
# [1] "a" "b" "c"
#
# [[1]][[2]]
# [1] "d" "e"
#
# [[1]][[3]]
# [1] "f"
#
#
# [[2]]
# [[2]][[1]]
# [1] "x"
#
# [[2]][[2]]
# [1] "y"
#
# [[2]][[3]]
# [1] "z"
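One detail worth knowing about split(): it converts the grouping vector to a factor, so the groups come out in sorted level order, not in order of first appearance. That does not matter for the example dt, but a sketch with hypothetical, unsorted col2 values shows the difference and one way to keep appearance order. It uses a plain data.frame so it runs without data.table installed; dt$col1 and dt$col2 are extracted the same way from a data.table.

```r
# Hypothetical data: col2 is deliberately not in sorted order
dt <- data.frame(col1 = c("a", "b", "c", "d", "e", "f"),
                 col2 = c("z", "z", "x", "x", "y", "y"),
                 stringsAsFactors = FALSE)

# split() turns col2 into a factor, so groups come out sorted: x, y, z
sorted_groups <- split(dt$col1, dt$col2)
names(sorted_groups)

# Setting the factor levels explicitly keeps first-appearance order: z, x, y
grp <- factor(dt$col2, levels = unique(dt$col2))
lst1 <- split(dt$col1, grp)
result <- list(unname(lst1), as.list(names(lst1)))
result[[2]]
```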

Finding specific elements in lists

I am stuck at one of the challenges proposed in a tutorial I am reading.
# Using the following code:
challenge_list <- list(words = c("alpha", "beta", "gamma"),
                       numbers = 1:10,
                       letter = letters)
# challenge_list
# Extract the following things:
#
# - The word "gamma"
# - The letters "a", "e", "i", "o", and "u"
# - The numbers less than or equal to 3
I have tried the following:
## 1
challenge_list$"gamma"
## 2
challenge_list [[1]["gamma"]]
But nothing works.
> challenge_list$words[challenge_list$words == "gamma"]
[1] "gamma"
> challenge_list$letter[challenge_list$letter %in% c("a","e","i","o","u")]
[1] "a" "e" "i" "o" "u"
> challenge_list$numbers[challenge_list$numbers<=3]
[1] 1 2 3
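As for why the attempts in the question fail: $ and [[ ]] look up list elements by name, and "gamma" is a value stored inside the words element, not a name. A short sketch using a corrected (comma-separated) challenge_list:

```r
challenge_list <- list(words = c("alpha", "beta", "gamma"),
                       numbers = 1:10,
                       letter = letters)

# $ and [[ ]] look elements up by name: these are the only valid names
names(challenge_list)

# "gamma" is not a name, so this is NULL (which is why $"gamma" fails)
challenge_list$gamma

# Pick the element by name first, then filter its values with a condition
words <- challenge_list[["words"]]
words[words == "gamma"]
```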
We can write a function f1 that subsets with <= when the element is numeric and with %in% otherwise, then use Map to pair each list element with its matching criterion and apply f1. This returns a new list with the filtered values:
f1 <- function(x, y) if (is.numeric(x)) x[x <= y] else x[x %in% y]
out <- Map(f1, challenge_list, list('gamma', 3, c("a","e","i","o","u")))
out
Output:
#$words
#[1] "gamma"
#$numbers
#[1] 1 2 3
#$letter
#[1] "a" "e" "i" "o" "u"
Try this. Most R objects can be filtered using brackets. For lists you need two sets, as in [[ ]][ ]: the first extracts an object from the list and the second selects elements inside that object. For vectors it is simpler: a single pair of brackets with a condition extracts the elements. Here is the code:
#Data
words <- c("alpha", "beta", "gamma")
numbers <- 1:10
letter <- letters
#Code
words[words == "gamma"]
letter[letter %in% c("a", "e", "i", "o","u")]
numbers[numbers<=3]
Since your data is in a list, you can also select the elements by position, like this:
#Data 2
challenge_list <- list(words = c("alpha", "beta", "gamma"),numbers = 1:10,letter = letters)
#Code 2
challenge_list[[1]][3]
challenge_list[[3]][challenge_list[[3]] %in% c("a", "e", "i", "o","u")]
challenge_list[[2]][challenge_list[[2]]<=3]
Output:
challenge_list[[1]][3]
[1] "gamma"
challenge_list[[3]][challenge_list[[3]] %in% c("a", "e", "i", "o","u")]
[1] "a" "e" "i" "o" "u"
challenge_list[[2]][challenge_list[[2]]<=3]
[1] 1 2 3

R: subset of character vector

I want to get a subset of a character vector: vector2 should contain the elements of the initial vector that lie between two specific elements.
vector <- c("a", "", "b", "c","","d", "e")
vector
How do I grab all the elements between "b" and "e" to get vector2?
#Expected result:
vector2
"c","","d"
You can also do something like this:
vector <- c("a", "", "b", "c","","d", "e")
vector[seq(which(vector=="b")+1,which(vector=="e")-1)]
#[1] "c" "" "d"
Here is one option
f <- function(x, left, right) {
idx <- x %in% c(left, right)
x[as.logical(cumsum(idx) * !idx)]
}
f(vector, "b", "e")
# [1] "c" "" "d"
The first step is to calculate idx as
vector %in% c("b", "e")
# [1] FALSE FALSE TRUE FALSE FALSE FALSE TRUE
then calculate the cumulative sum
cumsum(vector %in% c("b", "e"))
# [1] 0 0 1 1 1 1 2
multiply by !vector %in% c("b", "e") which gives
cumsum(vector %in% c("b", "e")) * !vector %in% c("b", "e")
# [1] 0 0 0 1 1 1 0
Convert this to a logical vector and use it to subset x.
For the given example another option is charmatch
x <- charmatch(c("b", "e"), vector) + c(1, -1)
vector[seq.int(x[1], x[2])]
# [1] "c" "" "d"
With negative subscripts (using x for the character vector):
x <- vector
x[-c(1:which(x == 'b'), which(x =='e'):length(x))]
#[1] "c" "" "d"
When e comes before b, it returns an empty vector:
(y <- rev(x))
#[1] "e" "d" "" "c" "b" "" "a"
y[-c(1:which(y == 'b'), which(y =='e'):length(y))]
#character(0)
You can also try:
vector[cumsum(vector %in% c("b", "e")) == 1][-1]
[1] "c" "" "d"
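Two of the approaches above have edge cases: the seq() version returns the two markers themselves when "b" and "e" are adjacent (seq(3, 2) counts down), and f() also keeps any elements after "e", because their cumulative sum is 2. A sketch of an index-based alternative, assuming you want the elements between the first "b" and the first "e" (between_markers is a hypothetical name):

```r
# Hypothetical helper: elements strictly between the first `left` and the
# first `right` marker, or an empty vector if that range is empty or invalid
between_markers <- function(x, left, right) {
  l <- match(left, x)   # position of first `left` (NA if absent)
  r <- match(right, x)  # position of first `right` (NA if absent)
  if (is.na(l) || is.na(r) || r - l < 2) {
    return(x[0])        # zero-length vector of the same type as x
  }
  x[(l + 1):(r - 1)]
}

vector <- c("a", "", "b", "c", "", "d", "e")
between_markers(vector, "b", "e")
```

Missing markers, adjacent markers, and "e" appearing before "b" all return character(0) instead of an error or the markers themselves.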

Extract rows from data frame as list of vectors

For example, I have the data
a <- c("a", "b", "c")
b <- c("x", "y", "z")
df <- data.frame(a = a, b = b)
I want to do something to df so my result is
list(c("a", "x"), c("b", "y"), c("c", "z"))
I want the solution to be vectorized, and I'm having trouble utilizing the right *apply functions...
If you want to split a data.frame by rows, try the following (note that this gives a list of one-row data frames rather than plain vectors):
split(df, seq.int(nrow(df)))
Here's another way.
as.data.frame(t(df),stringsAsFactors=FALSE,row.names=NA)
# V1 V2 V3
# 1 a b c
# 2 x y z
This produces a data frame, which is in fact a list of vectors. If you must have a "true" list, you could use this:
as.list(as.data.frame(t(df),stringsAsFactors=FALSE,row.names=NA))
# $V1
# [1] "a" "x"
#
# $V2
# [1] "b" "y"
#
# $V3
# [1] "c" "z"
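Another base R option that returns plain character vectors rather than one-row data frames is to convert to a matrix and split its cells by row index. A sketch; note that as.matrix() coerces all columns to one common type, so with mixed numeric and character columns the numbers become strings.

```r
a <- c("a", "b", "c")
b <- c("x", "y", "z")
df <- data.frame(a = a, b = b, stringsAsFactors = FALSE)

# as.matrix() gives a character matrix; row(m) labels each cell with its row
m <- as.matrix(df)
rows <- unname(split(m, row(m)))
rows
```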

How do I filter two matrices based on common values in a column?

I am trying to filter two matrices based on the first column
a <- matrix(c("b", "s", "a", "w", "r", "te", "fds", "s", "h", "a", "df", "tyi"), nrow = 4)
colnames(a) <- c("fir", "sec", "thi")
fir sec thi
[1,] "b" "r" "h"
[2,] "s" "te" "a"
[3,] "a" "fds" "df"
[4,] "w" "s" "tyi"
b <- matrix(c("a","b","c","d", "e", "f", "g", "h", "i"), nrow = 3)
colnames(b) <- c("fir", "sec", "thi")
fir sec thi
[1,] "a" "d" "g"
[2,] "b" "e" "h"
[3,] "c" "f" "i"
Basically, I want to subset matrix a based on the matches in b[,1].
Since (row1, col1) and (row3, col1) of matrix a match values in column 1 of matrix b, I'd like to extract those two rows.
I appreciate any tips and advice. Thank you.
Also, can someone explain why this doesn't work?
> c <- intersect(a[,1], b[,1])
> c
[1] "b" "a"
> a[a[,1]==c]
[1] "b" "r" "h"
You could try this, although there may be a more elegant way to do it.
matched <- a[,1] %in% b[,1]
a[matched,]
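As for why a[a[,1]==c] does not work: == compares element by element and recycles the shorter vector, and indexing a matrix without a comma treats it as one long vector. A sketch that traces both effects on the question's matrices (the intersection is stored as common to avoid confusion with the c() function; hits, wrong and right are illustrative names):

```r
a <- matrix(c("b", "s", "a", "w", "r", "te", "fds", "s", "h", "a", "df", "tyi"),
            nrow = 4)
b <- matrix(c("a", "b", "c", "d", "e", "f", "g", "h", "i"), nrow = 3)

common <- intersect(a[, 1], b[, 1])   # "b" "a"

# == recycles common to c("b", "a", "b", "a"), so only row 1 matches:
# TRUE FALSE FALSE FALSE
hits <- a[, 1] == common

# a[hits] (no comma) indexes all 12 cells as one vector; the length-4
# logical is recycled, selecting cells 1, 5 and 9: "b" "r" "h"
wrong <- a[hits]

# %in% tests set membership instead, and the comma selects whole rows
right <- a[a[, 1] %in% common, , drop = FALSE]
right
```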
