Creating special matrix in R - r

I have a matrix as follows.
dat = matrix(c(0, 0, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 2, 3, 4, 5, 6), ncol=4)
colnames(dat)=c("m1","m2","m3","m4")
dat
m1 m2 m3 m4
1 0 1 0 2
2 0 0 0 3
3 1 1 0 4
4 1 1 1 5
5 1 1 1 6
I would like to create four 5×4 matrices, each obtained by multiplying every column of dat element-wise by one of its columns: res1 = (m1*m1, m1*m2, m1*m3, m1*m4), res2 = (m1*m2, m2*m2, m2*m3, m2*m4), res3 = (m1*m3, m2*m3, m3*m3, m4*m3), res4 = (m1*m4, m2*m4, m3*m4, m4*m4), such as
res1
1 0 0 0 0
2 0 0 0 0
3 1 1 0 4
4 1 1 1 5
5 1 1 1 6
res2
1 1 1 0 2
2 0 0 0 0
3 1 1 0 4
4 1 1 1 5
5 1 1 1 6
res3
1 0 0 0 0
2 0 0 0 0
3 0 0 0 0
4 1 1 1 5
5 1 1 1 6
res4
1 0 2 0 4
2 0 0 0 9
3 4 4 0 16
4 5 5 5 25
5 6 6 6 36
How can I do it efficiently in R?

Running
res <- lapply(1:ncol(dat), function(i) dat * dat[,i])
will work because element-wise multiplication recycles the shorter operand. A column has one value per row and R stores matrices column by column, so dat[, i] is repeated down every column and each row of dat is scaled by that row's value in column i. lapply returns all the results in a list; you can extract them individually as res[[1]], res[[2]], etc.
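As a minimal sketch of why recycling does the right thing here, the snippet below rebuilds dat from the question and compares dat * dat[, 4] with an explicit row-wise scaling via sweep. The names res and check are only illustrative.

```r
# Rebuild the matrix from the question
dat <- matrix(c(0, 0, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 2, 3, 4, 5, 6), ncol = 4)
colnames(dat) <- c("m1", "m2", "m3", "m4")

# One product matrix per column; dat[, i] is recycled down each column
res <- lapply(seq_len(ncol(dat)), function(i) dat * dat[, i])
names(res) <- paste0("res", seq_along(res))

# sweep() states the same row-wise scaling explicitly (margin 1 = rows)
check <- sweep(dat, 1, dat[, 4], `*`)
all(res$res4 == check)
```

Naming the list elements lets you write res$res4 instead of res[[4]].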

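# Preallocate a list with one slot per column of dat
test <- vector("list", ncol(dat))
for (i in seq_len(ncol(dat))) {
  # dat[, i] is recycled down every column, so row r is scaled by dat[r, i]
  test[[i]] <- dat * dat[, i]
}
This gives the same result as @Mrflick's approach: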
test[[2]]
m1 m2 m3 m4
[1,] 0 1 0 2
[2,] 0 0 0 0
[3,] 1 1 0 4
[4,] 1 1 1 5
[5,] 1 1 1 6

Related

How to repeat rows by their value by multiple columns and divide back

Let's say I have this dataframe:
> df <- data.frame(A=1:5, B=c(0, 0, 3, 0, 0), C=c(1, 0, 0, 1, 0), D=c(0, 2, 0, 0, 1))
> df
A B C D
1 1 0 1 0
2 2 0 0 2
3 3 3 0 0
4 4 0 1 0
5 5 0 0 1
How would I go about converting it to:
A B C D
1 1 0 1 0
2 2 0 0 1
3 2 0 0 1
4 3 1 0 0
5 3 1 0 0
6 3 1 0 0
7 4 0 1 0
8 5 0 0 1
As you can see, some values are 2 or 3. I want to repeat each such row that many times and change the values back to 1. How would I do that?
I also want the A column to be duplicated along with the rows, as shown.
I tried:
replace(df[rep(rownames(df), select(df, -A)),], 2, 1)
But it gives me an error.
One option is to take the row-wise maximum of columns B, C and D with pmax, repeat each row that many times with uncount, and then use pmin to cap any value greater than 1 at 1.
library(dplyr)
library(tidyr)
df %>%
mutate(repeat_row = pmax(B, C, D)) %>%
uncount(repeat_row) %>%
mutate(across(-A, ~ pmin(.x, 1)))  # cap B, C and D at 1
# A B C D
#1 1 0 1 0
#2 2 0 0 1
#3 2 0 0 1
#4 3 1 0 0
#5 3 1 0 0
#6 3 1 0 0
#7 4 0 1 0
#8 5 0 0 1
Each row has exactly one value > 0 in columns B to D, so the rowSums over those columns give the number of repetitions for a replicate call on each row, with columns B to D binarized using > 0. To iterate over rows with Map, we transpose first so that each row becomes a column, and transpose back at the end. The rest is cosmetic.
t(do.call(cbind, Map(replicate,
rowSums(df[-1]),
as.data.frame(t(cbind(df[1], df[-1] > 0)))))) |>
as.data.frame() |>
setNames(names(df))
# A B C D
# 1 1 0 1 0
# 2 2 0 0 1
# 3 2 0 0 1
# 4 3 1 0 0
# 5 3 1 0 0
# 6 3 1 0 0
# 7 4 0 1 0
# 8 5 0 0 1
Note: R>=4.1 used.
Just to modify Ronak Shah's answer a bit, I realized you could simply just do it with only dplyr:
library(dplyr)
df[rep(rownames(df), apply(select(df, -A), 1, max)),] %>%
as.data.frame(row.names=1:nrow(.)) %>%
mutate(across(-A, ~ pmin(.x, 1)))  # cap B, C and D at 1
Output:
A B C D
1 1 0 1 0
2 2 0 0 1
3 2 0 0 1
4 3 1 0 0
5 3 1 0 0
6 3 1 0 0
7 4 0 1 0
8 5 0 0 1
Or with rowSums:
library(dplyr)
df[rep(rownames(df), rowSums(select(df, -A))),] %>%
as.data.frame(row.names=1:nrow(.)) %>%
mutate(across(-A, ~ pmin(.x, 1)))  # cap B, C and D at 1
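Note that the rowSums variant matches the max-based one only because each row of this data has a single nonzero value in B to D. A minimal base R sketch, assuming the df from the question, checks that assumption and then repeats the rows; row_max, row_sum, agree and out are illustrative names.

```r
df <- data.frame(A = 1:5, B = c(0, 0, 3, 0, 0), C = c(1, 0, 0, 1, 0), D = c(0, 2, 0, 0, 1))
counts <- df[-1]

# Row-wise maximum and row-wise sum of the count columns
row_max <- do.call(pmax, counts)
row_sum <- rowSums(counts)

# TRUE here because every row has at most one nonzero count
agree <- all(row_max == unname(row_sum))

# Repeat each row row_max times, then cap the counts at 1
out <- df[rep(seq_len(nrow(df)), row_max), ]
out[-1] <- lapply(out[-1], pmin, 1)
rownames(out) <- NULL
out
```

If a row could hold several nonzero counts, the two variants would repeat it a different number of times, so it is worth deciding which behaviour is wanted.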

Count number of pairs across elements in a list in R?

Similar questions have been asked about counting pairs, however none seem to be specifically useful for what I'm trying to do.
What I want is to count the number of pairs across multiple list elements and turn it into a matrix. For example, if I have a list like so:
myList <- list(
a = c(2,4,6),
b = c(1,2,3,4),
c = c(1,2,5,7),
d = c(1,2,4,5,8)
)
We can see that the pair 1:2 appears 3 times (once each in b, c, and d). The pair 1:3 appears only once in b. The pair 1:4 appears 2 times (once each in b and d)... etc.
I would like to count the number of times a pair appears and then turn it into a symmetrical matrix. For example, my desired output would look something like the matrix I created manually (where each element of the matrix is the total count for that pair of values):
> myMatrix
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] 0 3 1 2 2 0 1 1
[2,] 3 0 1 3 2 1 1 1
[3,] 1 1 0 1 0 0 0 0
[4,] 2 3 1 0 0 0 0 1
[5,] 2 2 0 0 0 0 1 1
[6,] 0 1 0 0 0 0 0 0
[7,] 1 1 0 0 1 0 0 0
[8,] 1 1 0 1 1 0 0 0
Any suggestions are greatly appreciated
Inspired by @akrun's answer, I think you can use a cross product to get this very quickly and simply:
out <- tcrossprod(table(stack(myList)))
diag(out) <- 0
# values
#values 1 2 3 4 5 6 7 8
# 1 0 3 1 2 2 0 1 1
# 2 3 0 1 3 2 1 1 1
# 3 1 1 0 1 0 0 0 0
# 4 2 3 1 0 1 1 0 1
# 5 2 2 0 1 0 0 1 1
# 6 0 1 0 1 0 0 0 0
# 7 1 1 0 0 1 0 0 0
# 8 1 1 0 1 1 0 0 0
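To see why the cross product works, a minimal sketch follows. table(stack(myList)) is an incidence table with one row per value and one column per list element, so multiplying it by its own transpose counts, for each pair of values, the list elements that contain both. The names inc and n_members are illustrative, and the sketch assumes no value is repeated within a single list element.

```r
myList <- list(
  a = c(2, 4, 6),
  b = c(1, 2, 3, 4),
  c = c(1, 2, 5, 7),
  d = c(1, 2, 4, 5, 8)
)

# Incidence table: rows are the values 1..8, columns are a, b, c, d
inc <- table(stack(myList))

# Entry [i, j] counts the list elements containing both value i and value j
out <- tcrossprod(inc)

# Before zeroing, the diagonal holds how many elements contain each value
n_members <- diag(out)
diag(out) <- 0
out["1", "2"]
```

If duplicates within an element were possible, binarizing the table first (for example inc > 0) would probably be the safer choice.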
Original answer:
Use combn to get the combinations, as well as reversing each combination.
Then convert to a data.frame and table the results.
tab <- lapply(myList, \(x) combn(x, m=2, FUN=\(cm) rbind(cm, rev(cm)), simplify=FALSE))
tab <- data.frame(do.call(rbind, unlist(tab, recursive=FALSE)))
table(tab)
# X2
#X1 1 2 3 4 5 6 7 8
# 1 0 3 1 2 2 0 1 1
# 2 3 0 1 3 2 1 1 1
# 3 1 1 0 1 0 0 0 0
# 4 2 3 1 0 1 1 0 1
# 5 2 2 0 1 0 0 1 1
# 6 0 1 0 1 0 0 0 0
# 7 1 1 0 0 1 0 0 0
# 8 1 1 0 1 1 0 0 0
We could loop over the list, get the pairwise combinations with combn, stack them into a two-column dataset, convert the 'values' column to a factor with levels 1 to 8, get the frequency count (table), take the cross product (crossprod), and convert the output to logical (> 0). Then Reduce the list elements by adding them elementwise, and finally set the diagonal elements to 0. (If needed, set the names attribute of dimnames to NULL.)
out <- Reduce(`+`, lapply(myList, function(x)
crossprod(table(transform(stack(setNames(
combn(x,
2, simplify = FALSE), combn(x, 2, paste, collapse="_"))),
values = factor(values, levels = 1:8))[2:1]))> 0))
diag(out) <- 0
names(dimnames(out)) <- NULL
-output
> out
1 2 3 4 5 6 7 8
1 0 3 1 2 2 0 1 1
2 3 0 1 3 2 1 1 1
3 1 1 0 1 0 0 0 0
4 2 3 1 0 1 1 0 1
5 2 2 0 1 0 0 1 1
6 0 1 0 1 0 0 0 0
7 1 1 0 0 1 0 0 0
8 1 1 0 1 1 0 0 0
I thought of a solution based on @TarJae's answer. It is not an elegant one, but it was a fun challenge!
Libraries
library(tidyverse)
Code
map_df(myList,function(x) as_tibble(t(combn(x,2)))) %>%
count(V1,V2) %>%
{. -> temp_df} %>%
bind_rows(
temp_df %>%
rename(V2 = V1, V1 = V2)
) %>%
full_join(
expand_grid(V1 = 1:8,V2 = 1:8)
) %>%
replace_na(replace = list(n = 0)) %>%
arrange(V2,V1) %>%
pivot_wider(names_from = V1,values_from = n) %>%
as.matrix()
Output
V2 1 2 3 4 5 6 7 8
[1,] 1 0 3 1 2 2 0 1 1
[2,] 2 3 0 1 3 2 1 1 1
[3,] 3 1 1 0 1 0 0 0 0
[4,] 4 2 3 1 0 1 1 0 1
[5,] 5 2 2 0 1 0 0 1 1
[6,] 6 0 1 0 1 0 0 0 0
[7,] 7 1 1 0 0 1 0 0 0
[8,] 8 1 1 0 1 1 0 0 0
First, get all pairwise combinations of each vector in the list as a tibble, then bind them into one tibble and count the combinations.
library(tidyverse)
a <- as_tibble(t(combn(myList[[1]],2)))
b <- as_tibble(t(combn(myList[[2]],2)))
c <- as_tibble(t(combn(myList[[3]],2)))
d <- as_tibble(t(combn(myList[[4]],2)))
bind_rows(a,b,c,d) %>%
count(V1, V2)
V1 V2 n
<dbl> <dbl> <int>
1 1 2 3
2 1 3 1
3 1 4 2
4 1 5 2
5 1 7 1
6 1 8 1
7 2 3 1
8 2 4 3
9 2 5 2
10 2 6 1
11 2 7 1
12 2 8 1
13 3 4 1
14 4 5 1
15 4 6 1
16 4 8 1
17 5 7 1
18 5 8 1

Transform variable length list into matrix in R

If I had a list of vectors of variable lengths :
[[1]]
[1] 1 2 3 4
[[2]]
[1] 4 5 6
[[3]]
[1] 1 2 3 4 5 6 7 8 9
[[4]]
[1] 'a' 'b' 'c'
How could I transform this into a data frame / logical matrix with elements of the list represented as columns?
i.e a dataframe like:
1 2 3 4 5 6 7 8 9 'a' 'b' 'c'
[1] 1 1 1 1 0 0 0 0 0 0 0 0
[2] 0 0 0 1 1 1 0 0 0 0 0 0
[3] 1 1 1 1 1 1 1 1 1 0 0 0
[4] 0 0 0 0 0 0 0 0 0 1 1 1
some data:
x <- list(c(1, 2, 3, 4), c(4, 5, 6), c(1, 2, 3, 4, 5, 6, 7, 8, 9), c("a", "b", "c"))
Here is a base R option:
# extract unique values from x
uv <- unique(unlist(x))
# Check which values are present in each list element and bind everything together
out <- do.call(rbind, lapply(x, function(e) as.integer(uv %in% e) ))
# Convert from matrix to data.frame and add column names
out <- setNames(as.data.frame(out), uv)
out
1 2 3 4 5 6 7 8 9 a b c
1 1 1 1 1 0 0 0 0 0 0 0 0
2 0 0 0 1 1 1 0 0 0 0 0 0
3 1 1 1 1 1 1 1 1 1 0 0 0
4 0 0 0 0 0 0 0 0 0 1 1 1
Here is a base R option with stack and table
table(stack(setNames(x, seq_along(x)))[2:1])
# values
#ind 1 2 3 4 5 6 7 8 9 a b c
# 1 1 1 1 1 0 0 0 0 0 0 0 0
# 2 0 0 0 1 1 1 0 0 0 0 0 0
# 3 1 1 1 1 1 1 1 1 1 0 0 0
# 4 0 0 0 0 0 0 0 0 0 1 1 1
Something like this?
library(tidyverse)
x = list(c(1, 2, 3, 4), c(4, 5, 6), c(1, 2, 3, 4, 5, 6, 7, 8, 9))
y = tibble(column1= map_chr(x, str_flatten, " "))
Where y is this:
# A tibble: 3 x 1
column1
<chr>
1 1 2 3 4
2 4 5 6
3 1 2 3 4 5 6 7 8 9

Shifting rows in R

My data is as follows:
1 2 3 4 5
0 1 2 3 4
0 0 1 2 3
0 0 0 0 1
0 0 0 0 1
How can I make the data so that it will look like this:
1 2 3 4 5
1 2 3 4 0
1 2 3 0 0
0 1 0 0 0
1 0 0 0 0
So that the first row doesn't shift, the second row is shifted left by 1, the third row by 2, the fourth row by 3, and the last row by 4?
I tried to at first shift all the rows below the first row to the left by 1, but apparently, it doesn't work.
nc <- ncol(df)
df[-(1), 2:nc] <- df[-(1), 2:(nc+1)]
df[-(1), 10] <- 0
df
You can use the shift function from data.table with fill = 0. If you want the output as a data.frame, put data.frame() around the last line.
mat <- as.matrix(df)
library(data.table)
t(sapply(seq(nrow(mat)), function(i) shift(mat[i,], i - 1, 'lead', fill = 0)))
# [,1] [,2] [,3] [,4] [,5]
# [1,] 1 2 3 4 5
# [2,] 1 2 3 4 0
# [3,] 1 2 3 0 0
# [4,] 0 1 0 0 0
# [5,] 1 0 0 0 0
A base R option:
m <- as.matrix(read.table(text = "1 2 3 4 5
0 1 2 3 4
0 0 1 2 3
0 0 0 0 1
0 0 0 0 1"))
# Row i keeps columns i..ncol(m) and is padded on the right with i - 1 zeros
do.call(rbind, lapply(seq_len(nrow(m)),
function(i) {c(m[i, i:ncol(m)], rep(0, i-1))}))
# V1 V2 V3 V4 V5
#[1,] 1 2 3 4 5
#[2,] 1 2 3 4 0
#[3,] 1 2 3 0 0
#[4,] 0 1 0 0 0
#[5,] 1 0 0 0 0
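The same row-wise left shift can also be written without a loop by using matrix indexing. This is only a sketch, assuming a numeric matrix like the one in the question; src, ok and out are illustrative names.

```r
m <- matrix(c(1, 2, 3, 4, 5,
              0, 1, 2, 3, 4,
              0, 0, 1, 2, 3,
              0, 0, 0, 0, 1,
              0, 0, 0, 0, 1), nrow = 5, byrow = TRUE)

# For target cell (i, j), the value comes from column j + i - 1 of row i
src <- col(m) + row(m) - 1

# Cells whose source column would fall past the last column stay 0
ok <- src <= ncol(m)

out <- matrix(0, nrow(m), ncol(m))
out[ok] <- m[cbind(row(m)[ok], src[ok])]
out
```

Indexing with a two-column matrix picks one (row, column) cell per row of the index, which is what lets the whole shift happen in a single assignment.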

Splitting one column into multiple columns

I have a huge dataset in which there is one column including several values for each subject (row). Here is a simplified sample dataframe:
data <- data.frame(subject = c(1:8), sex = c(1, 2, 2, 1, 2, 1, 1, 2),
age = c(35, 29, 31, 46, 64, 57, 49, 58),
v1 = c("2", "0", "3,5", "2 1", "A,4", "B,1,C", "A and B,3", "5, 6 A or C"))
> data
subject sex age v1
1 1 1 35 2
2 2 2 29 0
3 3 2 31 3,5 # separated by a comma
4 4 1 46 2 1 # separated by a blank space
5 5 2 64 A,4
6 6 1 57 B,1,C
7 7 1 49 A and B,3
8 8 2 58 5, 6 A or C
I first want to remove the letters (A, B, A and B, …) in the fourth column (v1), and then split the fourth column into multiple columns just like this:
subject sex age x1 x2 x3 x4 x5 x6
1 1 1 35 0 1 0 0 0 0
2 2 2 29 0 0 0 0 0 0
3 3 2 31 0 0 1 0 1 0
4 4 1 46 1 1 0 0 0 0
5 5 2 64 0 0 0 1 0 0
6 6 1 57 1 0 0 0 0 0
7 7 1 49 0 0 1 0 0 0
8 8 2 58 0 0 0 0 1 1
where the 1st subject takes 1 at x2 because it takes 2 at v1 in the original dataset, the 3rd subject takes 1 at both x3 and x5 because it takes 3 and 5 at v1 in the original dataset, and so on.
I would appreciate any help on this question. Thanks a lot.
You can cbind this result to data[-4] and get what you need:
0+t(sapply(as.character(data$v1), function(line)
sapply(1:6, function(x) x %in% unlist(strsplit(line, split="\\s|\\,"))) ))
#----------------
[,1] [,2] [,3] [,4] [,5] [,6]
2 0 1 0 0 0 0
0 0 0 0 0 0 0
3,5 0 0 1 0 1 0
2 1 1 1 0 0 0 0
A,4 0 0 0 1 0 0
B,1,C 1 0 0 0 0 0
A and B,3 0 0 1 0 0 0
5, 6 A or C 0 0 0 0 1 1
One solution:
r <- sapply(strsplit(as.character(data$v1), "[^0-9]+"), as.numeric)
m <- as.data.frame(t(sapply(r, function(x) {
y <- rep(0, 6)
y[x[!is.na(x)]] <- 1
y
})))
data <- cbind(data[, c("subject", "sex", "age")], m)
# subject sex age V1 V2 V3 V4 V5 V6
# 1 1 1 35 0 1 0 0 0 0
# 2 2 2 29 0 0 0 0 0 0
# 3 3 2 31 0 0 1 0 1 0
# 4 4 1 46 1 1 0 0 0 0
# 5 5 2 64 0 0 0 1 0 0
# 6 6 1 57 1 0 0 0 0 0
# 7 7 1 49 0 0 1 0 0 0
# 8 8 2 58 0 0 0 0 1 1
Following DWin's awesome solution, m could be modified as:
m <- as.data.frame(t(sapply(r, function(x) {
0 + 1:6 %in% x[!is.na(x)]
})))
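Another way to sketch the same idea is to extract the digit runs directly rather than splitting on everything else, which avoids the empty strings and NAs. This assumes the codes are the integers 1 to 6, as in the question; v1, nums and ind are illustrative names.

```r
v1 <- c("2", "0", "3,5", "2 1", "A,4", "B,1,C", "A and B,3", "5, 6 A or C")

# Pull out every run of digits from each string; letters are simply ignored
nums <- lapply(regmatches(v1, gregexpr("[0-9]+", v1)), as.integer)

# One row per subject, one 0/1 column per code 1..6
ind <- t(vapply(nums, function(x) as.integer(1:6 %in% x), integer(6)))
colnames(ind) <- paste0("x", 1:6)
ind
```

The result can then be attached to the other columns with cbind, as in the answers above.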
