Subsetting matrices - r

Consider the following vector res and matrix team.
The vector res holds row indices, and I need to extract only those names whose row index is in res and whose gender is "F".
I need to do this in R, and as I am new to R I could not work it out.
res
[1] 2 12 16 5 6 19 17 14 9 4
team
names genders
[1,] "aa" "M"
[2,] "ab" "M"
[3,] "al" "M"
[4,] "alp" "M"
[5,] "amr" "F"
[6,] "and" "M"
[7,] "an" "M"
[8,] "anv" "F"
[9,] "as" "M"
[10,] "ed" "M"
[11,] "neh" "F"
[12,] "pan" "M"
[13,] "poo" "F"
[14,] "ra" "M"
[15,] "roh" "M"
[16,] "shr" "F"
[17,] "sub" "M"
[18,] "val" "M"
[19,] "xi" "M"

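There are many ways to do this. The examples below use $, so they assume team is a data frame; for a matrix, write team[,"names"] and team[,"genders"] instead.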
You could first pick which rows are in res:
team$names[res]
Then you can pick which ones have gender being "F":
team$names[res][ team$genders[res]=="F" ]
Note that team$genders[res] picks out the genders corresponding to the rows in res, and then you filter to only accept those that are female.
If you liked, you could do it the other way round:
team$names[ team$genders=="F" & (1:nrow(team) %in% res) ]
Here team$genders=="F" is a logical vector of length nrow(team), being TRUE whenever the gender is "F" and FALSE otherwise.
The 1:nrow(team) generates row numbers, and 1:nrow(team) %in% res is TRUE if the row number is in res.
The & says "make sure that the gender is "F" AND the row number is in res".
You could even do which(team$genders=="F") which returns a vector of row numbers for females, and then do:
team$names[ intersect( which(team$genders=="F") , res ) ]
where the intersect picks row numbers that are present in both res and the females.
And I'm sure people will think of more ways.
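As a quick check of the three approaches above, here is a minimal sketch that rebuilds the question's data as a data frame (an assumption on my part: the question shows a matrix, for which $ does not work) and runs each variant. The variable names by_res, by_mask and by_set are only for illustration.

```r
# Question data, stored as a data frame so that `$` works (assumption)
team <- data.frame(
  names = c("aa", "ab", "al", "alp", "amr", "and", "an", "anv", "as", "ed",
            "neh", "pan", "poo", "ra", "roh", "shr", "sub", "val", "xi"),
  genders = c("M", "M", "M", "M", "F", "M", "M", "F", "M", "M",
              "F", "M", "F", "M", "M", "F", "M", "M", "M"),
  stringsAsFactors = FALSE
)
res <- c(2, 12, 16, 5, 6, 19, 17, 14, 9, 4)

# 1. Subset by res first, then keep the females (result follows the order of res)
by_res <- team$names[res][team$genders[res] == "F"]

# 2. Combine two logical conditions over all rows (result follows row order)
by_mask <- team$names[team$genders == "F" & seq_len(nrow(team)) %in% res]

# 3. Intersect the female row numbers with res (result follows row order)
by_set <- team$names[intersect(which(team$genders == "F"), res)]

print(by_res)
print(by_mask)
print(by_set)
```

All three pick the same two names; only the first returns them in the order they occur in res rather than in row order.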

This should work if your team is either a matrix or a data.frame:
# emulate your data
team <- data.frame(names=LETTERS, genders=rep(c("M","F"), 13))
res <- 10:26
team[intersect(res, which(team[,"genders"]=="F")), "names"]
#[1] J L N P R T V X Z
#Levels: A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
# Try with a matrix instead of data.frame
team <- as.matrix(team)
team[intersect(res, which(team[,"genders"]=="F")), "names"]
#[1] "J" "L" "N" "P" "R" "T" "V" "X" "Z"
The basic idea is to get the indices of the "F" gender rows (using which) and then use the set operation intersect to AND it with your res indices. There are also union and setdiff variants that can be useful at times.
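To illustrate the union and setdiff variants mentioned above, here is a small sketch on the same emulated data. The names female_rows, in_res_not_female and in_res_or_female are made up for this example.

```r
# Same emulated data as above; stringsAsFactors = FALSE keeps names as character
team <- data.frame(names = LETTERS, genders = rep(c("M", "F"), 13),
                   stringsAsFactors = FALSE)
res <- 10:26

female_rows <- which(team[, "genders"] == "F")

# Rows that are in res but are NOT female
in_res_not_female <- team[setdiff(res, female_rows), "names"]

# Rows that are in res OR female (each row appears once)
in_res_or_female <- team[union(res, female_rows), "names"]

print(in_res_not_female)
print(length(in_res_or_female))
```

setdiff plays the role of "AND NOT" and union the role of "OR", in the same way that intersect plays the role of "AND".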

team <- structure(c("aa", "ab", "al", "alp", "amr", "and", "an", "anv",
"as", "ed", "neh", "pan", "poo", "ra", "roh", "shr", "sub", "val",
"xi", "M", "M", "M", "M", "F", "M", "M", "F", "M", "M", "F",
"M", "F", "M", "M", "F", "M", "M", "M"), .Dim = c(19L, 2L), .Dimnames = list(
NULL, c("names", "genders")))
team[,"names"][ intersect( which(team[,"genders"]=="F") , res ) ]
#[1] "amr" "shr"
team[,"names"][ team[,"genders"]=="F" & 1:NROW(team) %in% res ]
#[1] "amr" "shr"

Related

Reshape 2d into 3d matrix with rows as columns and columns as the 3rd dimension

How would I reshape a matrix where I'd need every 2 rows to start a new column and every column to be in the third dimension in R? I haven't really tried anything outside of dim() which obviously didn't work, as I'm having a hard time wrapping my head around this transformation.
What I have:
d h
c g
b f
a e
What I want:
[ , , 1]
b d
a c
[ , , 2]
g h
e f
I think you can get the required structure by adjusting the dim attribute of your matrix.
I added two rows to your matrix, so it looks like this:
mat <- structure(c("d", "c", "b", "a", "k", "l", "h", "g", "f", "e",
"j", "m"), .Dim = c(6L, 2L))
mat
# [,1] [,2]
#[1,] "d" "h"
#[2,] "c" "g"
#[3,] "b" "f"
#[4,] "a" "e"
#[5,] "k" "j"
#[6,] "l" "m"
To get every column into 3rd dimension and only 2 values in each column you can do :
dim(mat) <- c(2, nrow(mat)/2, ncol(mat))
mat
#, , 1
# [,1] [,2] [,3]
#[1,] "d" "b" "k"
#[2,] "c" "a" "l"
#, , 2
# [,1] [,2] [,3]
#[1,] "h" "f" "j"
#[2,] "g" "e" "m"
We can also use array, where df1 is the original matrix:
array(t(df1), c(2, 2, 2))

Obtain positions of multiple characters simultaneously from a list in R

My data structure is as follows:
m
[[1]]
[[1]][[1]]
[1] "g" "g" "h" "k" "k" "k" "l"
[[2]]
[[2]][[1]]
[1] "g" "h" "k" "k" "k" "l" "g"
[[3]]
[[3]][[1]]
[1] "g" "h" "h" "h" "k" "l" "h"
I want to find the positions of each unique character simultaneously. Individually, I can obtain the positions of each character using the following code:
t<-list()
for (i in 1:length(m)){
for (j in m[[i]][[1]]){
if (j=="k"){
t[[i]]<-grep(j,m[[i]][[1]],fixed=TRUE)}}}
The result I obtain is as follows:
t
[[1]]
[1] 4 5 6
[[2]]
[1] 3 4 5
[[3]]
[1] 5
There are 4 unique characters in list m, so my code would give 4 lists of positions, but I have to enter each character into the loop manually. I need a single piece of code that calculates the positions for all the unique characters simultaneously.
The vectors in your list seem all to be of equal length (if not this could be adjusted by making them equal length). I would first restructure the data:
m <- list(list(c("g", "g", "h", "k", "k", "k", "l")),
list(c("g", "h", "k", "k", "k", "l", "g")))
m <- do.call(rbind, lapply(m, "[[", 1))
# [,1] [,2] [,3] [,4] [,5] [,6] [,7]
#[1,] "g" "g" "h" "k" "k" "k" "l"
#[2,] "g" "h" "k" "k" "k" "l" "g"
Then you can use outer to make all comparisons at once:
res <- which(outer(m, unique(c(m)), "=="), arr.ind = TRUE)
res <- as.data.frame(res)
res$dim3 <- factor(res$dim3, labels = unique(c(m)))
names(res) <- c("list_element", "vector_element", "letter")
#check
res[res$letter == "k" & res$list_element == 1, "vector_element"]
#[1] 4 5 6
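This solution won't scale well to huge data, because outer builds a logical array with one entry for every combination of matrix element and unique letter.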
We can use a nested lapply, checking the positions with which for each unique value in m:
lst <- lapply(unique(unlist(m)), function(x)
lapply(m, function(y) which(x == y[[1]])))
lst
#[[1]]
#[[1]][[1]]
#[1] 1 2
#[[1]][[2]]
#[1] 1 7
#[[1]][[3]]
#[1] 1
#[[2]]
#[[2]][[1]]
#[1] 3
#[[2]][[2]]
#[1] 2
#[[2]][[3]]
#[1] 2 3 4 7
......
To identify which values represent which character we can name the list
names(lst) <- unique(unlist(m))
lst
#$g
#$g[[1]]
#[1] 1 2
#$g[[2]]
#[1] 1 7
#$g[[3]]
#[1] 1
...
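Another possibility, not taken from the answers above, is to split the position indices of each vector by the vector's own values. This is only a sketch; the result is grouped by list element first and by letter second, and the name positions is made up for the example.

```r
m <- list(list(c("g", "g", "h", "k", "k", "k", "l")),
          list(c("g", "h", "k", "k", "k", "l", "g")),
          list(c("g", "h", "h", "h", "k", "l", "h")))

# For each list element, group the positions 1..n by the letter found there
positions <- lapply(m, function(el) {
  v <- el[[1]]
  split(seq_along(v), v)
})

print(positions[[1]]$k)
print(positions[[3]]$h)
```

A letter that does not occur in a given vector simply has no entry in that vector's result.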

Applying split function to overlapping rows of a matrix

Suppose I have a matrix which looks like this:
[1] a b c
[2] d e f
[3] g h i
[4] j k l
[5] m n o
[6] p q r
Now I want to split this matrix into smaller ones of 3 rows each, starting from the first row, then the second, and so on, so it looks like this in the end:
[1] a b c
[2] d e f
[3] g h i
[1] d e f
[2] g h i
[3] j k l
[1] g h i
[2] j k l
[3] m n o
...
I tried the following code, which didn't do it for me:
lapply(split(1:nrow(matrix),(1:nrow(matrix)-1) %/%3+1),
function(i) matrix[i,])
Can someone help me with this?
The split method shown in the OP's post splits the matrix into blocks of 3 rows that are mutually exclusive, i.e. non-overlapping. If instead each list element should start at a successive row of the matrix and include the next two rows, we can loop through the sequence of rows, take the indices from each row to the two after it, and subset the matrix:
lapply(head(seq_len(nrow(matrix)), -2), function(i) matrix[i:(i+2),])
#[[1]]
# [,1] [,2] [,3]
#[1,] "a" "b" "c"
#[2,] "d" "e" "f"
#[3,] "g" "h" "i"
#[[2]]
# [,1] [,2] [,3]
#[1,] "d" "e" "f"
#[2,] "g" "h" "i"
#[3,] "h" "k" "l"
#[[3]]
# [,1] [,2] [,3]
#[1,] "g" "h" "i"
#[2,] "h" "k" "l"
#[3,] "m" "n" "o"
#[[4]]
# [,1] [,2] [,3]
#[1,] "h" "k" "l"
#[2,] "m" "n" "o"
#[3,] "p" "q" "r"
Or as @lmo suggested, another version of the above would be
lapply(seq_len(nrow(matrix) -2L) - 1L, function(x) matrix[x + 1:3,])
or another option is to create the splitting group with rollapply (from zoo) and then do the split
library(zoo)
grp <- rollapply(seq_len(nrow(matrix)), 3, FUN = I)
lapply(split(grp, row(grp)), function(i) matrix[i, ])
NOTE: matrix is a function name. It is better not to give objects the names of existing functions.
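Following that note, here is a minimal sketch of the same overlapping-window idea with the object called mat instead of matrix. The helper row_windows and its width argument are hypothetical names for this example, and the input is the matrix from the question.

```r
# Matrix from the question, under a name that does not mask a function
mat <- matrix(letters[1:18], nrow = 6, byrow = TRUE)

# Hypothetical helper: every run of `width` consecutive rows, as a list
row_windows <- function(x, width) {
  n_windows <- max(0L, nrow(x) - width + 1L)
  lapply(seq_len(n_windows), function(i) {
    x[i:(i + width - 1L), , drop = FALSE]
  })
}

windows <- row_windows(mat, 3L)
print(length(windows))
print(windows[[2]])
```

drop = FALSE keeps each piece a matrix even when width is 1, and the max(0L, ...) guard returns an empty list when the matrix has fewer rows than the window.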
data
matrix <- structure(c("a", "d", "g", "h", "m", "p", "b", "e", "h", "k",
"n", "q", "c", "f", "i", "l", "o", "r"), .Dim = c(6L, 3L))

How to merge two lists in parallel in R?

I'm asking how to merge two lists in parallel, that is, pairing up their elements rather than appending one list to the other.
For example,
A <- list(c(1,2,3), c(3,4,5), c(6,7,8))
B <- list(c("a", "b", "c"), c("d", "e", "f"), c("g", "h", "i"))
The desired result is:
[[1]]
[[1]][[1]]
[1] 1 2 3
[[1]][[2]]
[1] "a" "b" "c"
[[2]]
[[2]][[1]]
[1] 3 4 5
[[2]][[2]]
[1] "d" "e" "f"
[[3]]
[[3]][[1]]
[1] 6 7 8
[[3]][[2]]
[1] "g" "h" "i"
Simply use Map:
Map(list,A,B)
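To show what that call returns, here is a short sketch on the question's data. Map is a thin wrapper around mapply with SIMPLIFY = FALSE, so the two calls below should give the same result; the name merged is only for illustration.

```r
A <- list(c(1, 2, 3), c(3, 4, 5), c(6, 7, 8))
B <- list(c("a", "b", "c"), c("d", "e", "f"), c("g", "h", "i"))

# Pair up the i-th element of A with the i-th element of B
merged <- Map(list, A, B)

# Equivalent spelling with mapply
merged2 <- mapply(list, A, B, SIMPLIFY = FALSE)

str(merged[[2]])
```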
A longer approach (not recursive yet, up to second level merging):
A <- list(c(1,2,3), c(3,4,5), c(6,7,8))
B <- list(c("a", "b", "c"), c("d", "e", "f"), c("g", "h", "i"))
mergepar <- function(x, y) { # merge two lists in parallel
  ln <- max(length(x), length(y)) # length of the longer list
  newlist <- vector("list", ln) # empty list of that length
  for (i in seq_len(ln)) { # loop over positions
    # two-level subsetting ([ first, then [[) avoids a subscript-out-of-bounds
    # error when one list is shorter; the missing element becomes NULL
    newlist[[i]] <- lapply(list(x, y), function(l) l[i][[1]])
  }
  newlist
}
mergepar(A, B)

Appending values with different order in R

I have two data elements in R:
data1
1 M
2 T
3 Z
4 A
5 J
data2 values
[1,] "A" "aa"
[2,] "J" "ab"
[3,] "M" "ac"
[4,] "T" "ad"
[5,] "Z" "ae"
I would like to get:
data1 values
[1,] "M" "ac"
[2,] "T" "ad"
[3,] "Z" "ae"
[4,] "A" "aa"
[5,] "J" "ab"
How can I append the values to data 1 such that they are sorted according to the different order in data 1?
You can get this behavior with the match function:
dat1 = data.frame(data1=c("M", "T", "Z", "A", "J"), stringsAsFactors=FALSE)
dat2 = data.frame(data2=c("A", "J", "M", "T", "Z"),
values=c("aa", "ab", "ac", "ad", "ae"), stringsAsFactors=FALSE)
dat2[match(dat1$data1, dat2$data2),]
# data2 values
# 3 M ac
# 4 T ad
# 5 Z ae
# 1 A aa
# 2 J ab
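To make explicit what match is doing here: it returns, for each element of its first argument, the position of the first matching element in the second. A small sketch, assuming you want the values attached to dat1 as a new column (the name idx is just for illustration):

```r
dat1 <- data.frame(data1 = c("M", "T", "Z", "A", "J"), stringsAsFactors = FALSE)
dat2 <- data.frame(data2 = c("A", "J", "M", "T", "Z"),
                   values = c("aa", "ab", "ac", "ad", "ae"),
                   stringsAsFactors = FALSE)

# Position in dat2$data2 of each key in dat1$data1
idx <- match(dat1$data1, dat2$data2)
print(idx)

# Use those positions to pull the values in dat1's order
dat1$values <- dat2$values[idx]
print(dat1)

# A key with no match gives NA
print(match("Q", dat2$data2))
```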
