Experimental design table in R [duplicate] - r

This question already has answers here:
How to create design matrix in r
(6 answers)
Closed 8 years ago.
How can I generate the following experimental design table in R?

Looks like you want every combination except 0 0 0 0.
> # create all combinations of 4 0s/1s
> design <- expand.grid(0:1, 0:1, 0:1, 0:1)
> design
Var1 Var2 Var3 Var4
1 0 0 0 0
2 1 0 0 0
3 0 1 0 0
4 1 1 0 0
5 0 0 1 0
6 1 0 1 0
7 0 1 1 0
8 1 1 1 0
9 0 0 0 1
10 1 0 0 1
11 0 1 0 1
12 1 1 0 1
13 0 0 1 1
14 1 0 1 1
15 0 1 1 1
16 1 1 1 1
> # remove the single run you don't want
> design[-1,]
Var1 Var2 Var3 Var4
2 1 0 0 0
3 0 1 0 0
4 1 1 0 0
5 0 0 1 0
6 1 0 1 0
7 0 1 1 0
8 1 1 1 0
9 0 0 0 1
10 1 0 0 1
11 0 1 0 1
12 1 1 0 1
13 0 0 1 1
14 1 0 1 1
15 0 1 1 1
16 1 1 1 1

You may make use of a nice trick connected with binary representations of consecutive integers (I assume you do not wish to generate a row with zeros only):
n <- 4
M <- matrix(NA_integer_, nrow=2^n-1, ncol=n)
for (i in 1:(2^n-1))
M[i, ] <- as.integer(intToBits(i)[1:n])
print(M)
which gives for n==4:
[,1] [,2] [,3] [,4]
[1,] 1 0 0 0
[2,] 0 1 0 0
[3,] 1 1 0 0
[4,] 0 0 1 0
[5,] 1 0 1 0
[6,] 0 1 1 0
[7,] 1 1 1 0
[8,] 0 0 0 1
[9,] 1 0 0 1
[10,] 0 1 0 1
[11,] 1 1 0 1
[12,] 0 0 1 1
[13,] 1 0 1 1
[14,] 0 1 1 1
[15,] 1 1 1 1

If you're going to analyze factorial designs in R, you're better off using one of the many DoE packages. For instance, the DoE.base package has a function, fac.design(...) which does essentially what you want:
library(DoE.base)
df <- fac.design(nlevels=2,nfactors=4,randomize=F,
factor.names=list(0:1,0:1,0:1,0:1))
As pointed out in another answer, your design is a full factorial, except that is it missing two of the combinations (which makes me wonder if it's a factorial design at all...).

Related

Count number of pairs across elements in a list in R?

Similar questions have been asked about counting pairs, however none seem to be specifically useful for what I'm trying to do.
What I want is to count the number of pairs across multiple list elements and turn it into a matrix. For example, if I have a list like so:
myList <- list(
a = c(2,4,6),
b = c(1,2,3,4),
c = c(1,2,5,7),
d = c(1,2,4,5,8)
)
We can see that the pair 1:2 appears 3 times (once each in a, b, and c). The pair 1:3 appears only once in b. The pair 1:4 appears 2 times (once each in b and d)... etc.
I would like to count the number of times a pair appears and then turn it into a symmetrical matrix. For example, my desired output would look something like the matrix I created manually (where each element of the matrix is the total count for that pair of values):
> myMatrix
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] 0 3 1 2 2 0 1 1
[2,] 3 0 1 3 2 1 1 1
[3,] 1 1 0 1 0 0 0 0
[4,] 2 3 1 0 0 0 0 1
[5,] 2 2 0 0 0 0 1 1
[6,] 0 1 0 0 0 0 0 0
[7,] 1 1 0 0 1 0 0 0
[8,] 1 1 0 1 1 0 0 0
Any suggestions are greatly appreciated
Inspired by #akrun's answer, I think you can use a crossproduct to get this very quickly and simply:
out <- tcrossprod(table(stack(myList)))
diag(out) <- 0
# values
#values 1 2 3 4 5 6 7 8
# 1 0 3 1 2 2 0 1 1
# 2 3 0 1 3 2 1 1 1
# 3 1 1 0 1 0 0 0 0
# 4 2 3 1 0 1 1 0 1
# 5 2 2 0 1 0 0 1 1
# 6 0 1 0 1 0 0 0 0
# 7 1 1 0 0 1 0 0 0
# 8 1 1 0 1 1 0 0 0
Original answer:
Use combn to get the combinations, as well as reversing each combination.
Then convert to a data.frame and table the results.
tab <- lapply(myList, \(x) combn(x, m=2, FUN=\(cm) rbind(cm, rev(cm)), simplify=FALSE))
tab <- data.frame(do.call(rbind, unlist(tab, rec=FALSE)))
table(tab)
# X2
#X1 1 2 3 4 5 6 7 8
# 1 0 3 1 2 2 0 1 1
# 2 3 0 1 3 2 1 1 1
# 3 1 1 0 1 0 0 0 0
# 4 2 3 1 0 1 1 0 1
# 5 2 2 0 1 0 0 1 1
# 6 0 1 0 1 0 0 0 0
# 7 1 1 0 0 1 0 0 0
# 8 1 1 0 1 1 0 0 0
We could loop over the list, get the pairwise combinations with combn, stack it to a two column dataset, convert the 'values' column to factor with levels specified as 1 to 8, get the frequency count (table), do a cross product (crossprod), convert the output back to logical, and then Reduce the list elements by adding elementwise and finally assign the diagonal elements to 0. (If needed set the names attributes of dimnames to NULL
out <- Reduce(`+`, lapply(myList, function(x)
crossprod(table(transform(stack(setNames(
combn(x,
2, simplify = FALSE), combn(x, 2, paste, collapse="_"))),
values = factor(values, levels = 1:8))[2:1]))> 0))
diag(out) <- 0
names(dimnames(out)) <- NULL
-output
> out
1 2 3 4 5 6 7 8
1 0 3 1 2 2 0 1 1
2 3 0 1 3 2 1 1 1
3 1 1 0 1 0 0 0 0
4 2 3 1 0 1 1 0 1
5 2 2 0 1 0 0 1 1
6 0 1 0 1 0 0 0 0
7 1 1 0 0 1 0 0 0
8 1 1 0 1 1 0 0 0
I thought of a solution based on #TarJae answer, is not a elegant one, but it was a fun challenge!
Libraries
library(tidyverse)
Code
map_df(myList,function(x) as_tibble(t(combn(x,2)))) %>%
count(V1,V2) %>%
{. -> temp_df} %>%
bind_rows(
temp_df %>%
rename(V2 = V1, V1 = V2)
) %>%
full_join(
expand_grid(V1 = 1:8,V2 = 1:8)
) %>%
replace_na(replace = list(n = 0)) %>%
arrange(V2,V1) %>%
pivot_wider(names_from = V1,values_from = n) %>%
as.matrix()
Output
V2 1 2 3 4 5 6 7 8
[1,] 1 0 3 1 2 2 0 1 1
[2,] 2 3 0 1 3 2 1 1 1
[3,] 3 1 1 0 1 0 0 0 0
[4,] 4 2 3 1 0 1 1 0 1
[5,] 5 2 2 0 1 0 0 1 1
[6,] 6 0 1 0 1 0 0 0 0
[7,] 7 1 1 0 0 1 0 0 0
[8,] 8 1 1 0 1 1 0 0 0
First identify the possible combination of each vector from the list to a tibble then I bind them to one tibble and count the combinations.
library(tidyverse)
a <- as_tibble(t(combn(myList[[1]],2)))
b <- as_tibble(t(combn(myList[[2]],2)))
c <- as_tibble(t(combn(myList[[3]],2)))
d <- as_tibble(t(combn(myList[[4]],2)))
bind_rows(a,b,c,d) %>%
count(V1, V2)
V1 V2 n
<dbl> <dbl> <int>
1 1 2 3
2 1 3 1
3 1 4 2
4 1 5 2
5 1 7 1
6 1 8 1
7 2 3 1
8 2 4 3
9 2 5 2
10 2 6 1
11 2 7 1
12 2 8 1
13 3 4 1
14 4 5 1
15 4 6 1
16 4 8 1
17 5 7 1
18 5 8 1

one hot encode each column in a Int matrix in R

I have an issue of translating matrix into one hot encoding in R. I implemented in Matlab but i have difficulty in handling the object in R. Here i have an object of type 'matrix'.
I would like to apply one hot encoding to this matrix. I have problem with column names.
here is an example:
> set.seed(4)
> t <- matrix(floor(runif(10, 1,9)),5,5)
[,1] [,2] [,3] [,4] [,5]
[1,] 5 3 5 3 5
[2,] 1 6 1 6 1
[3,] 3 8 3 8 3
[4,] 3 8 3 8 3
[5,] 7 1 7 1 7
> class(t)
[1] "matrix"
Expecting:
1_1 1_3 1_5 1_7 2_1 2_3 2_6 2_8 ...
[1,] 0 0 1 0 0 1 0 0 ...
[2,] 1 0 0 0 0 0 1 0 ...
[3,] 0 1 0 0 0 0 0 1 ...
[4,] 0 1 0 0 0 0 0 1 ...
[5,] 0 0 0 1 1 0 0 0 ...
I tried the following, but the matrix remains the same.
library(data.table)
library(mltools)
test_table <- one_hot(as.data.table(t))
Any suggestions would be very much appreciated.
Your data table must contain some columns (variables) that have class "factor". Try this:
> t <- data.table(t)
> t[,V1:=factor(V1)]
> one_hot(t)
V1_1 V1_3 V1_5 V1_7 V2 V3 V4 V5
1: 0 0 1 0 3 5 3 5
2: 1 0 0 0 6 1 6 1
3: 0 1 0 0 8 3 8 3
4: 0 1 0 0 8 3 8 3
5: 0 0 0 1 1 7 1 7
But I read that from here that the dummyVars function from the caret package is quicker if your matrix is large.
Edit: Forgot to set the seed. :P
And a quick way to factor all variables in a data table:
t.f <- t[, lapply(.SD, as.factor)]
There are probably more concise ways to do this but this should work (and is at least easy to read and understand ;)
Suggested solution using base R and double loop:
set.seed(4)
t <- matrix(floor(runif(10, 1,9)),5,5)
# initialize result object
#
t_hot <- NULL
# for each column in original matrix
#
for (col in seq_along(t[1,])) {
# for each unique value in this column (sorted so the resulting
# columns appear in order)
#
for (val in sort(unique(t[, col]))) {
t_hot <- cbind(t_hot, ifelse(t[, col] == val, 1, 0))
# make name for this column
#
colnames(t_hot)[ncol(t_hot)] <- paste0(col, "_", val)
}
}
This returns:
1_1 1_3 1_5 1_7 2_1 2_3 2_6 2_8 3_1 3_3 3_5 3_7 4_1 4_3 4_6 4_8 5_1 5_3 5_5 5_7
[1,] 0 0 1 0 0 1 0 0 0 0 1 0 0 1 0 0 0 0 1 0
[2,] 1 0 0 0 0 0 1 0 1 0 0 0 0 0 1 0 1 0 0 0
[3,] 0 1 0 0 0 0 0 1 0 1 0 0 0 0 0 1 0 1 0 0
[4,] 0 1 0 0 0 0 0 1 0 1 0 0 0 0 0 1 0 1 0 0
[5,] 0 0 0 1 1 0 0 0 0 0 0 1 1 0 0 0 0 0 0 1

R: when both events (columns) are true, refer to one of them to decide the value

I want to manipulate two columns in R, so that when both events are true, refer to one of the columns to decide the value. For example:
a<- c(0,0,0,0,0,1,1,1,1,1,0,0,0,0,0,0,0,0)
b<- c(0,1,1,0,0,0,0,0,1,1,1,1,1,1,1,1,0,0)
when a and b are both true, at a[9] and a[10], refer to b to decide the value of another column c in the following lines. Then, if b is FALSE at some line, (here is line 17) check again if both a and b are true. So, the desired output is like this:
c<- c(0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,0,0)
data <- cbind(a,b,c)
data
a b c
[1,] 0 0 0
[2,] 0 1 0
[3,] 0 1 0
[4,] 0 0 0
[5,] 0 0 0
[6,] 1 0 0
[7,] 1 0 0
[8,] 1 0 0
[9,] 1 1 1
[10,] 1 1 1
[11,] 0 1 1
[12,] 0 1 1
[13,] 0 1 1
[14,] 0 1 1
[15,] 0 1 1
[16,] 0 1 1
[17,] 0 0 0
[18,] 0 0 0
As the data comes in many lines, I would prefer the use vectorized method like ifelse() to handle this.
Many thanks to all the people who can help me with this.
I'm not 100% sure I understand the issue but here is a tidyverse solution that reproduces the output:
a<- c(0,0,0,0,0,1,1,1,1,1,0,0,0,0,0,0,0,0)
b<- c(0,1,1,0,0,0,0,0,1,1,1,1,1,1,1,1,0,0)
df <- data.frame(a,b)
library(tidyverse)
df %>% mutate(c=case_when(
a == 1 & b == 1 ~ 1,
a == 0 & b == 0 ~ 0,
TRUE ~ NA_real_
)) %>% fill(c)
# a b c
# 1 0 0 0
# 2 0 1 0
# 3 0 1 0
# 4 0 0 0
# 5 0 0 0
# 6 1 0 0
# 7 1 0 0
# 8 1 0 0
# 9 1 1 1
# 10 1 1 1
# 11 0 1 1
# 12 0 1 1
# 13 0 1 1
# 14 0 1 1
# 15 0 1 1
# 16 0 1 1
# 17 0 0 0
# 18 0 0 0

Matrix from rows with delimited items in R

I have a such database with semicolon delimited values in rows:
A;1;3;5;7;9
B;1;2;3
C;1;3;5
D;2;4;8
There is different count of items in each row. Each item is only once in each row (no repeating).
I'd like to make a matrix for item base collaborative filtering. The first column with letters is deleted and the numbers are transformed like this:
1 2 3 4 5 6 7 8 9
-----------------
1 0 1 0 1 0 1 0 1
1 1 1 0 0 0 0 0 0
1 0 1 0 1 0 0 0 0
0 1 0 1 0 0 0 0 0
Can you please give me an advice how to manage it?
Here is an option. We read in the string into a character vector, strsplit on ;, initialize the empty matrix, and then assign for each row using a matrix index of the row with all the column values:
DAT <- readLines(textConnection("A;1;3;5;7;9
B;1;2;3
C;1;3;5
D;2;4;8"))
DAT.NUM <- lapply(strsplit(DAT, ";"), function(x) as.integer(x[-1]))
RES <- matrix(0L, length(DAT), max(unlist(DAT.NUM)))
for(i in seq_along(DAT)) RES[cbind(i, DAT.NUM[[i]])] <- 1L
Produces:
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
[1,] 1 0 1 0 1 0 1 0 1
[2,] 1 1 1 0 0 0 0 0 0
[3,] 1 0 1 0 1 0 0 0 0
[4,] 0 1 0 1 0 0 0 1 0
Alternatively, inspired by #user227710, you can:
t(table(stack(setNames(DAT.NUM, seq_along(DAT.NUM)))))
Which produces:
values
ind 1 2 3 4 5 7 8 9
1 1 0 1 0 1 1 0 1
2 1 1 1 0 0 0 0 0
3 1 0 1 0 1 0 0 0
4 0 1 0 1 0 0 1 0

list all permutations of k numbers, taken from 0:k, that sums to k

This question is closely related to another question R:sample(). I want to find a way in R to list all the permutations of k numbers, that sums to k, where each number is chosen from 0:k. If k=7, I can choose 7 numbers from 0,1,...,7. A feasible solution is then 0,1,2,3,1,0,0 another is 1,1,1,1,1,1,1. I don't want to generate all permutations, since if k is just fairly larger than 7 this explodes.
Of course in the k=7 example I could use the following:
perms7<-matrix(numeric(7*1716),ncol=7)
count=0
for(i in 0:7)
for(j in 0:(7-i))
for(k in 0:(7-i-j))
for(l in 0:(7-i-j-k))
for(n in 0:(7-i-j-k-l))
for(m in 0:(7-i-j-k-l-n)){
res<-7-i-j-k-l-n-m
count<-count+1
perms7[count,]<-c(i,j,k,l,n,m,res)
}
head(perms7,10)
But how can I generalize this approach to account for any k without having to write (k-1) loops?
I tried to come up with a recursive scheme:
perms7<-matrix(numeric(7*1716),ncol=7) #store solutions (adjustable size later)
k<-7 #size of interest
d<-0 #depth
count=0 #count of permutations
rec<-function(j,d,a){
a<-a-j #max loop
d<-d+1 #depth (posistion)
for(i in 0:a ) {
if(d<(k-1)) rec(i,d,a)
count<<-count+1
perms7[count,d]<<-i
perms7[count,k]<<-k-sum(perms7[count,-k])
}
}
rec(0,0,k)
But got stuck, and I'm not quite sure this is the right way to go. Wonder if there is any "magic" R function that is neat for this (though very specific) problem or just part of it.
In the k=7 case, all the 2.097.152 permutations and the 1.716 that sum to k=7 can be found by:
library(gtools)
k=7
perms <- permutations(k+1, k, 0:k, repeats.allowed=T) #all permutations
perms.k <- perms[rowSums(perms) == k,] #permutations which sums to k
for k=8 there are 43.046.721 permutations but I only want to list the 6.435.
Any help is greatly appreciated!
There's a package for that...
require( partitions )
parts(7)
#[1,] 7 6 5 5 4 4 4 3 3 3 3 2 2 2 1
#[2,] 0 1 2 1 3 2 1 3 2 2 1 2 2 1 1
#[3,] 0 0 0 1 0 1 1 1 2 1 1 2 1 1 1
#[4,] 0 0 0 0 0 0 1 0 0 1 1 1 1 1 1
#[5,] 0 0 0 0 0 0 0 0 0 0 1 0 1 1 1
#[6,] 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1
#[7,] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
You appear to be looking for compositions(). e.g. for k=4:
parts(4)
#[1,] 4 3 2 2 1
#[2,] 0 1 2 1 1
#[3,] 0 0 0 1 1
#[4,] 0 0 0 0 1
compositions(4,4)
#[1,] 4 3 2 1 0 3 2 1 0 2 1 0 1 0 0 3 2 1 0 2 1 0 1 0 0 2 1 0 1 0 0 1 0 0 0
#[2,] 0 1 2 3 4 0 1 2 3 0 1 2 0 1 0 0 1 2 3 0 1 2 0 1 0 0 1 2 0 1 0 0 1 0 0
#[3,] 0 0 0 0 0 1 1 1 1 2 2 2 3 3 4 0 0 0 0 1 1 1 2 2 3 0 0 0 1 1 2 0 0 1 0
#[4,] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 3 3 3 4
And just to check your math... :-)
ncol(compositions(8,8))
#[1] 6435

Resources