Splitting one column into multiple columns - r

I have a huge dataset in which there is one column including several values for each subject (row). Here is a simplified sample dataframe:
data <- data.frame(subject = c(1:8), sex = c(1, 2, 2, 1, 2, 1, 1, 2),
age = c(35, 29, 31, 46, 64, 57, 49, 58),
v1 = c("2", "0", "3,5", "2 1", "A,4", "B,1,C", "A and B,3", "5, 6 A or C"))
> data
subject sex age v1
1 1 1 35 2
2 2 2 29 0
3 3 2 31 3,5 # separated by a comma
4 4 1 46 2 1 # separated by a blank space
5 5 2 64 A,4
6 6 1 57 B,1,C
7 7 1 49 A and B,3
8 8 2 58 5, 6 A or C
I first want to remove the letters (A, B, A and B, …) in the fourth column (v1), and then split the fourth column into multiple columns just like this:
subject sex age x1 x2 x3 x4 x5 x6
1 1 1 35 0 1 0 0 0 0
2 2 2 29 0 0 0 0 0 0
3 3 2 31 0 0 1 0 1 0
4 4 1 46 1 1 0 0 0 0
5 5 2 64 0 0 0 1 0 0
6 6 1 57 1 0 0 0 0 0
7 7 1 49 0 0 1 0 0 0
8 8 2 58 0 0 0 0 1 1
where the 1st subject takes 1 at x2 because it takes 2 at v1 in the original dataset, the 3rd subject takes 1 at both x3 and x5 because it takes 3 and 5 at v1 in the original dataset, and so on.
I would appreciate any help on this question. Thanks a lot.

You can cbind this result to data[-4] and get what you need:
0+t(sapply(as.character(data$v1), function(line)
sapply(1:6, function(x) x %in% unlist(strsplit(line, split="\\s|\\,"))) ))
#----------------
[,1] [,2] [,3] [,4] [,5] [,6]
2 0 1 0 0 0 0
0 0 0 0 0 0 0
3,5 0 0 1 0 1 0
2 1 1 1 0 0 0 0
A,4 0 0 0 1 0 0
B,1,C 1 0 0 0 0 0
A and B,3 0 0 1 0 0 0
5, 6 A or C 0 0 0 0 1 1

One solution:
r <- lapply(strsplit(as.character(data$v1), "[^0-9]+"), as.numeric)
m <- as.data.frame(t(sapply(r, function(x) {
y <- rep(0, 6)
y[x[!is.na(x)]] <- 1
y
})))
data <- cbind(data[, c("subject", "sex", "age")], m)
# subject sex age V1 V2 V3 V4 V5 V6
# 1 1 1 35 0 1 0 0 0 0
# 2 2 2 29 0 0 0 0 0 0
# 3 3 2 31 0 0 1 0 1 0
# 4 4 1 46 1 1 0 0 0 0
# 5 5 2 64 0 0 0 1 0 0
# 6 6 1 57 1 0 0 0 0 0
# 7 7 1 49 0 0 1 0 0 0
# 8 8 2 58 0 0 0 0 1 1
Following DWin's awesome solution, m could be modified as:
m <- as.data.frame(t(sapply(r, function(x) {
0 + 1:6 %in% x[!is.na(x)]
})))
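As a further illustration of the same idea, here is a minimal self-contained sketch that extracts the digits with regmatches() and gregexpr() instead of splitting on separators, and names the columns x1 to x6 as in the question. The fixed range 1:6 and the names digits, ind and result are assumptions made for this example, not part of the answers above.

```r
# Sample data from the question
data <- data.frame(subject = 1:8,
                   sex = c(1, 2, 2, 1, 2, 1, 1, 2),
                   age = c(35, 29, 31, 46, 64, 57, 49, 58),
                   v1 = c("2", "0", "3,5", "2 1", "A,4", "B,1,C",
                          "A and B,3", "5, 6 A or C"),
                   stringsAsFactors = FALSE)

# Keep only the runs of digits in each entry; letters and separators are ignored
digits <- regmatches(data$v1, gregexpr("[0-9]+", data$v1))

# One row per subject, one 0/1 column per value in 1:6 (assumed range)
ind <- t(vapply(digits,
                function(d) as.integer(1:6 %in% as.integer(d)),
                integer(6)))
colnames(ind) <- paste0("x", 1:6)

# Drop the original v1 column and attach the indicator columns
result <- cbind(data[-4], ind)
print(result)
```

Values outside 1:6 (such as the 0 for subject 2) simply produce a row of zeros, which matches the desired output in the question.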

Related

Creating a factor/categorical variable from 4 dummies

I have a data frame with four columns, let's call them V1-V4 and ten observations. Exactly one of V1-V4 is 1 for each row, and the others of V1-V4 are 0. I want to create a new column called NEWCOL that takes on the value of 3 if V3 is 1, 4 if V4 is 1, and is 0 otherwise.
I have to do this for MANY sets of variables V1-V4 so I would like the solution to be as short as possible so that it will be easy to replicate.
This does it for 4 columns to add a fifth using matrix multiplication:
> cbind( mydf, newcol=data.matrix(mydf) %*% c(0,0,3,4) )
V1 V2 V3 V4 newcol
1 1 0 0 0 0
2 1 0 0 0 0
3 0 1 0 0 0
4 0 1 0 0 0
5 0 0 1 0 3
6 0 0 1 0 3
7 0 0 0 1 4
8 0 0 0 1 4
9 0 0 0 1 4
10 0 0 0 1 4
It's generalizable to getting multiple columns.... we just need the rules. You need to make a matrix with the same number of rows as there are columns in the original data, with one column of multipliers for each new variable. This shows how to build one new column from the sum of 3 times the third column plus 4 times the fourth, and another new column from one times the first and 2 times the second.
> cbind( mydf, newcol=data.matrix(mydf) %*% matrix(c(0,0,3,4, # first set of factors
1,2,0,0), # second set
ncol=2) )
V1 V2 V3 V4 newcol.1 newcol.2
1 1 0 0 0 0 1
2 1 0 0 0 0 1
3 0 1 0 0 0 2
4 0 1 0 0 0 2
5 0 0 1 0 3 0
6 0 0 1 0 3 0
7 0 0 0 1 4 0
8 0 0 0 1 4 0
9 0 0 0 1 4 0
10 0 0 0 1 4 0
An example data set:
mydf <- data.frame(V1 = c(1, 1, rep(0, 8)),
V2 = c(0, 0, 1, 1, rep(0, 6)),
V3 = c(rep(0, 4), 1, 1, rep(0, 4)),
V4 = c(rep(0, 6), rep(1, 4)))
# V1 V2 V3 V4
# 1 1 0 0 0
# 2 1 0 0 0
# 3 0 1 0 0
# 4 0 1 0 0
# 5 0 0 1 0
# 6 0 0 1 0
# 7 0 0 0 1
# 8 0 0 0 1
# 9 0 0 0 1
# 10 0 0 0 1
Here's an easy approach to generate the new column:
mydf <- transform(mydf, NEWCOL = V3 * 3 + V4 * 4)
# V1 V2 V3 V4 NEWCOL
# 1 1 0 0 0 0
# 2 1 0 0 0 0
# 3 0 1 0 0 0
# 4 0 1 0 0 0
# 5 0 0 1 0 3
# 6 0 0 1 0 3
# 7 0 0 0 1 4
# 8 0 0 0 1 4
# 9 0 0 0 1 4
# 10 0 0 0 1 4
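Because exactly one of V1-V4 is 1 in each row, another option is to find the position of that 1 and look the code up in a vector. The sketch below assumes the codes c(0, 0, 3, 4) from the question; the names codes, active and NEWCOL_factor are made up for the example.

```r
mydf <- data.frame(V1 = c(1, 1, rep(0, 8)),
                   V2 = c(0, 0, 1, 1, rep(0, 6)),
                   V3 = c(rep(0, 4), 1, 1, rep(0, 4)),
                   V4 = c(rep(0, 6), rep(1, 4)))

# Code assigned to each dummy column (assumed mapping from the question)
codes <- c(0, 0, 3, 4)

# Index of the column holding the 1 in each row
active <- max.col(as.matrix(mydf), ties.method = "first")

# Look up the code for that column
NEWCOL <- codes[active]

# Optionally store the result as a factor rather than a number
NEWCOL_factor <- factor(NEWCOL)
print(NEWCOL)
```

To repeat this for many sets of dummies, only the column selection and the codes vector need to change.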

Creating special matrix in R

I have a matrix as follows.
dat = matrix(c(0, 0, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 2, 3, 4, 5, 6), ncol=4)
colnames(dat)=c("m1","m2","m3","m4")
dat
m1 m2 m3 m4
1 0 1 0 2
2 0 0 0 3
3 1 1 0 4
4 1 1 1 5
5 1 1 1 6
I would like to create four 5×4 matrices, each obtained by multiplying every column of dat element-wise by one of its columns: res1 = (m1*m1, m1*m2, m1*m3, m1*m4), res2 = (m1*m2, m2*m2, m2*m3, m2*m4), res3 = (m1*m3, m2*m3, m3*m3, m4*m3), res4 = (m1*m4, m2*m4, m3*m4, m4*m4), such as
res1
1 0 0 0 0
2 0 0 0 0
3 1 1 0 4
4 1 1 1 5
5 1 1 1 6
res2
1 0 1 0 2
2 0 0 0 0
3 1 1 0 4
4 1 1 1 5
5 1 1 1 6
res3
1 0 0 0 0
2 0 0 0 0
3 0 0 0 0
4 1 1 1 5
5 1 1 1 6
res4
1 0 2 0 4
2 0 0 0 9
3 4 4 0 16
4 5 5 5 25
5 6 6 6 36
How can I do it efficiently in R?
Running
res <- lapply(1:ncol(dat), function(i) dat * dat[,i])
will work thanks to the recycling of the element-wise multiplication. If you multiply by one column, those values will repeat over the entire matrix. And lapply will return them all in a list. You can get them out individually as res[[1]], res[[2]], etc.
test<-NULL
for (i in 1:ncol(dat)){
x<-dat*dat[,i]
test[i]<-list(x)
}
This gives the same result as @Mrflick's comment, for example:
test[[2]]
m1 m2 m3 m4
[1,] 0 1 0 2
[2,] 0 0 0 0
[3,] 1 1 0 4
[4,] 1 1 1 5
[5,] 1 1 1 6
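The recycling behaviour described above can be checked against sweep(), which states the row-wise scaling explicitly. This is only a sketch; the names res_sweep and same are assumptions for the example.

```r
dat <- matrix(c(0, 0, 1, 1, 1,
                1, 0, 1, 1, 1,
                0, 0, 0, 1, 1,
                2, 3, 4, 5, 6), ncol = 4)
colnames(dat) <- c("m1", "m2", "m3", "m4")

# dat[, i] has one value per row; it is recycled down every column of dat
res <- lapply(seq_len(ncol(dat)), function(i) dat * dat[, i])
names(res) <- paste0("res", seq_len(ncol(dat)))

# The same operation written with sweep(): scale each row by dat[row, i]
res_sweep <- lapply(seq_len(ncol(dat)),
                    function(i) sweep(dat, 1, dat[, i], `*`))

# Both approaches should agree element by element
same <- all(mapply(function(a, b) all(a == b), res, res_sweep))
print(same)
print(res$res4)
```

Naming the list elements lets you refer to res$res1, res$res2 and so on instead of positional indices.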

In R: efficiently convert one format (character vector) to another format (numeric matrix)

Using one software, I can calculate the fingerprint like this:
>L
[1] "1 1:1 2:1 3:1 5:1 6:1 8:1"
[2] "5 1:1 2:1 4:1"
[3] "9 1:1 2:1 7:1 10:1"
The first value on each line (1, 5, 9) is the molecule name, and the rest is the fingerprint, which has a fixed length, say 10. In each pair, the number to the left of ":" is the position and the number to the right is the bit, where 1 means the bit is set; positions whose bit is 0 are omitted. I would like to restore the full format, so that each of the 10 positions has an explicit value.
L should look like this, so that I can save it in CSV format:
mol 1 2 3 4 5 6 7 8 9 10
1 1 1 1 0 1 1 0 1 0 0
5 1 1 0 1 0 0 0 0 0 0
9 1 1 0 0 0 0 1 0 0 1
Here, L has on the order of a million rows. What is an efficient way to convert it to the wanted format?
Thanks.
Update
To avoid read.csv, just use strsplit and the non-exported splitstackshape:::numMat function:
M <- strsplit(L, "\\s+|:")
cbind(mol = as.numeric(sapply(M, `[`, 1)),
splitstackshape:::numMat(lapply(M, `[`, -1), fill=0))
Update 2: Benchmarks
For the curious....
The sample data:
L <- c("1 1:1 2:1 3:1 5:1 6:1 8:1",
"5 1:1 2:1 4:1",
"9 1:1 2:1 7:1 10:1")
M <- replicate(10000, L)
#thelatemail's answer:
fun1 <- function() {
spl <- lapply(strsplit(M,"\\s+|:.? |:.$"),as.numeric)
vals <- lapply(spl,"[",-1)
data.frame(
mol=sapply(spl,"[",1),
t(sapply(vals, function(x) {
out <- rep(0,max(unlist(vals)))
out[x] <- 1
out} ))
)
}
system.time(out_late <- fun1())
# user system elapsed
# 98.36 1.28 100.06
head(out_late)
# mol X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
# 1 1 1 1 1 0 1 1 0 1 0 0
# 2 5 1 1 0 1 0 0 0 0 0 0
# 3 9 1 1 0 0 0 0 1 0 0 1
# 4 1 1 1 1 0 1 1 0 1 0 0
# 5 5 1 1 0 1 0 0 0 0 0 0
# 6 9 1 1 0 0 0 0 1 0 0 1
My updated answer:
library(splitstackshape)
fun2 <- function() {
M <- strsplit(M, "\\s+|:")
cbind(mol = as.numeric(sapply(M, `[`, 1)),
splitstackshape:::numMat(lapply(M, `[`, -1), fill=0))
}
system.time(out_ananda <- fun2())
# user system elapsed
# 0.67 0.00 0.68
head(out_ananda)
# mol 1 2 3 4 5 6 7 8 9 10
# [1,] 1 1 1 1 0 1 1 0 1 0 0
# [2,] 5 1 1 0 1 0 0 0 0 0 0
# [3,] 9 1 1 0 0 0 0 1 0 0 1
# [4,] 1 1 1 1 0 1 1 0 1 0 0
# [5,] 5 1 1 0 1 0 0 0 0 0 0
# [6,] 9 1 1 0 0 0 0 1 0 0 1
#Matthew's answer. Note that this would need to be modified to accept different "val" values.
fun3 <- function() {
t(sapply(strsplit(M, "\\s+"), function(l) {
mol <- as.numeric(l[1])
names(mol) <- 'mol'
val <- numeric(10)
names(val) <- 1:10
for (x in strsplit(l[-1], ":"))
val[x[1]] <- as.numeric(x[2])
c(mol, val)
}))
}
system.time(out_matthew <- fun3())
# user system elapsed
# 2.33 0.00 2.34
head(out_matthew)
# mol 1 2 3 4 5 6 7 8 9 10
# [1,] 1 1 1 1 0 1 1 0 1 0 0
# [2,] 5 1 1 0 1 0 0 0 0 0 0
# [3,] 9 1 1 0 0 0 0 1 0 0 1
# [4,] 1 1 1 1 0 1 1 0 1 0 0
# [5,] 5 1 1 0 1 0 0 0 0 0 0
# [6,] 9 1 1 0 0 0 0 1 0 0 1
An attempt using base R functions, assuming L is the same as used by @Ananda.
spl <- lapply(strsplit(L,"\\s+|:.? |:.$"),as.numeric)
vals <- lapply(spl,"[",-1)
data.frame(
mol=sapply(spl,"[",1),
t(sapply(vals, function(x) {
out <- rep(0,max(unlist(vals)))
out[x] <- 1
out} ))
)
# mol X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
#1 1 1 1 1 0 1 1 0 1 0 0
#2 5 1 1 0 1 0 0 0 0 0 0
#3 9 1 1 0 0 0 0 1 0 0 1
Borrowing from thelatemail, here's an expression which returns a matrix of the proper elements. Rather than setting the value to 1, I set the value to whatever follows the : character in a for loop. Then the whole thing is transposed to give the format that you desire.
t(sapply(strsplit(L, "\\s+"), function(l) {
# Each line is passed in as a vector, the first element is "mol"
mol <- as.numeric(l[1])
names(mol) <- 'mol'
# Store the values in a vector of length 10, with names
val <- numeric(10)
names(val) <- 1:10
# Split the tail of the input vector on ":" and assign to the proper slot of the output vector
for (x in strsplit(l[-1], ":"))
val[x[1]] <- as.numeric(x[2])
# Put them back together
c(mol, val)
}))
## mol 1 2 3 4 5 6 7 8 9 10
## [1,] 1 1 1 1 0 1 1 0 1 0 0
## [2,] 5 1 1 0 1 0 0 0 0 0 0
## [3,] 9 1 1 0 0 0 0 1 0 0 1
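For very long inputs, a loop-free variant of the same idea is to fill the result in one step with a two-column (row, column) index matrix. This is a base R sketch under the assumption that every fingerprint has 10 positions; n_bits, row_id, kv and fp are names invented for the example, and its speed has not been benchmarked here.

```r
L <- c("1 1:1 2:1 3:1 5:1 6:1 8:1",
       "5 1:1 2:1 4:1",
       "9 1:1 2:1 7:1 10:1")

n_bits <- 10  # assumed fixed fingerprint length

# Split each line on whitespace: first token is mol, the rest are pos:bit pairs
tokens <- strsplit(L, "\\s+")
mol <- as.numeric(vapply(tokens, `[`, character(1), 1))
pairs <- lapply(tokens, function(tok) tok[-1])

# Row number of every pair, and a two-column matrix of (position, bit) strings
row_id <- rep(seq_along(pairs), lengths(pairs))
kv <- do.call(rbind, strsplit(unlist(pairs), ":", fixed = TRUE))

# Start from all zeros and fill the listed positions in a single assignment
fp <- matrix(0, nrow = length(L), ncol = n_bits,
             dimnames = list(NULL, seq_len(n_bits)))
fp[cbind(row_id, as.integer(kv[, 1]))] <- as.numeric(kv[, 2])

out <- cbind(mol = mol, fp)
print(out)
```

The result is a numeric matrix, so it can be passed to write.csv() directly.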

Sum pairs of columns by group

I wish to sum pairs of columns by group. In the example below I wish to sum pairs (v1 and v2), (v3 and v4), and (v5 and v6), each by r1, r2 and r3.
I can do this using the sapply statement below and I get the correct answer. However, the required code is complex. Could someone show me how to do the same operation perhaps in package data.table or with rollapply and/or other options? I have not yet explored those options.
Sorry if this is a duplicate.
my.data <- read.table(text= "
r1 r2 r3 t1 t2 t3 v1 v2 v3 v4 v5 v6
1 0 0 10 20 30 1 0 0 0 0 0
1 0 0 10 20 30 1 1 0 0 0 0
1 0 0 10 20 30 1 0 1 0 0 0
1 0 0 10 20 30 1 0 1 1 0 0
1 0 0 10 20 30 0 0 0 0 0 0
0 1 0 10 20 30 0 1 1 1 1 1
0 1 0 10 20 30 0 0 1 1 1 1
0 1 0 10 20 30 0 0 0 1 1 1
0 1 0 10 20 30 0 0 0 0 1 1
0 1 0 10 20 30 0 0 0 0 0 1
0 0 1 10 20 30 1 1 1 1 1 1
0 0 1 10 20 30 1 0 1 1 1 1
0 0 1 10 20 30 1 0 0 1 1 1
0 0 1 10 20 30 1 0 0 0 1 1
0 0 1 10 20 30 1 0 0 0 0 1
", header=TRUE, na.strings=NA)
my.data$my.group <- which(my.data[,1:3]==1, arr.ind=TRUE)[,2]
my.data
my.sums <- t(sapply(split(my.data[,7:(ncol(my.data)-1)], my.data$my.group), function(i) sapply(seq(2, ncol(i), 2), function(j) sum(i[,c((j-1),j)], na.rm=TRUE))))
my.sums
# [,1] [,2] [,3]
# 1 5 3 0
# 2 1 5 9
# 3 6 5 9
Here's a pretty general expression that you can probably simplify if you want it to match your specific data dimensions/column names/etc:
library(data.table)
dt = data.table(my.data)
dt[, lapply(1:(ncol(.SD)/2), function(x) sum(.SD[[2*x-1]], .SD[[2*x]])),
by = eval(grep('^r', names(dt), value = TRUE)),
.SDcols = grep('^v', names(dt), value = TRUE)]
# r1 r2 r3 V1 V2 V3
#1: 1 0 0 5 3 0
#2: 0 1 0 1 5 9
#3: 0 0 1 6 5 9
Also, using aggregate and mapply:
DF <- my.data
#function to sum 2 columns
fun <- function(col1, col2)
{
rowSums(aggregate(DF[c(col1, col2)], by = list(DF$r1, DF$r2, DF$r3), sum)[c(4, 5)])
}
#all pairs of columns, to be summed, in a matrix
#(the v columns start at column 7; selecting them by name leaves out my.group if it was added)
args_mat <- matrix(grep("^v", names(DF)), ncol = 2, byrow = TRUE)
#apply `fun` to all pairs
mapply(fun, args_mat[,1], args_mat[,2])
# [,1] [,2] [,3]
#[1,] 5 3 0
#[2,] 1 5 9
#[3,] 6 5 9
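A shorter base R route is to add the paired columns first and then sum by group with rowsum(). The sketch below assumes the pairs are always adjacent v columns and that exactly one of r1-r3 is 1 in each row; group, v and pair_sums are names chosen for the example.

```r
my.data <- read.table(text = "
r1 r2 r3 t1 t2 t3 v1 v2 v3 v4 v5 v6
1 0 0 10 20 30 1 0 0 0 0 0
1 0 0 10 20 30 1 1 0 0 0 0
1 0 0 10 20 30 1 0 1 0 0 0
1 0 0 10 20 30 1 0 1 1 0 0
1 0 0 10 20 30 0 0 0 0 0 0
0 1 0 10 20 30 0 1 1 1 1 1
0 1 0 10 20 30 0 0 1 1 1 1
0 1 0 10 20 30 0 0 0 1 1 1
0 1 0 10 20 30 0 0 0 0 1 1
0 1 0 10 20 30 0 0 0 0 0 1
0 0 1 10 20 30 1 1 1 1 1 1
0 0 1 10 20 30 1 0 1 1 1 1
0 0 1 10 20 30 1 0 0 1 1 1
0 0 1 10 20 30 1 0 0 0 1 1
0 0 1 10 20 30 1 0 0 0 0 1
", header = TRUE)

# Group number = position of the 1 among r1, r2, r3
group <- max.col(as.matrix(my.data[, c("r1", "r2", "r3")]),
                 ties.method = "first")

# Add each odd v column to the even column that follows it
v <- as.matrix(my.data[, grep("^v", names(my.data))])
odd <- seq(1, ncol(v), by = 2)
pair_sums <- v[, odd] + v[, odd + 1]
colnames(pair_sums) <- paste(colnames(v)[odd], colnames(v)[odd + 1], sep = "+")

# Sum the paired columns within each group
my.sums <- rowsum(pair_sums, group)
print(my.sums)
```

If the data contained missing values, they would need to be handled before the addition, since this sketch has no equivalent of na.rm = TRUE in the pairing step.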

How to create design matrix in r

I have two factors. Factor A has 2 levels and factor B has 3 levels.
How can I create the following design matrix?
factorA1 factorA2 factorB1 factorB2 factorB3
[1,] 1 0 1 0 0
[2,] 1 0 0 1 0
[3,] 1 0 0 0 1
[4,] 0 1 1 0 0
[5,] 0 1 0 1 0
[6,] 0 1 0 0 1
You have a couple of options:
Use base and piece it together yourself:
(iris.dummy<-with(iris,model.matrix(~Species-1)))
(IRIS<-data.frame(iris,iris.dummy))
Or use the ade4 package as follows:
dummy <- function(df) {
require(ade4)
ISFACT <- sapply(df, is.factor)
FACTS <- acm.disjonctif(df[, ISFACT, drop = FALSE])
NONFACTS <- df[, !ISFACT,drop = FALSE]
data.frame(NONFACTS, FACTS)
}
dat <- data.frame(eggs = c("foo", "foo", "bar", "bar"),
ham = c("red","blue","green","red"), x=rnorm(4),
stringsAsFactors = TRUE) # is.factor() needs factors; R >= 4.0 no longer creates them by default
dummy(dat)
## x eggs.bar eggs.foo ham.blue ham.green ham.red
## 1 0.3365302 0 1 0 0 1
## 2 1.1341354 0 1 1 0 0
## 3 2.0489741 1 0 0 1 0
## 4 1.1019108 1 0 0 0 1
Assuming your data is in a data.frame called dat, let's say the two factors are given as in this example (stringsAsFactors=TRUE is needed in R >= 4.0 so that f1 and f2 are factors):
> dat <- data.frame(f1=sample(LETTERS[1:3],20,T),f2=sample(LETTERS[4:5],20,T),id=1:20,stringsAsFactors=TRUE)
> dat
f1 f2 id
1 C D 1
2 B E 2
3 B E 3
4 A D 4
5 C E 5
6 C E 6
7 C D 7
8 B E 8
9 C D 9
10 A D 10
11 B E 11
12 C E 12
13 B D 13
14 B E 14
15 A D 15
16 C E 16
17 C D 17
18 C D 18
19 B D 19
20 C D 20
> dat$f1
[1] C B B A C C C B C A B C B B A C C C B C
Levels: A B C
> dat$f2
[1] D E E D E E D E D D E E D E D E D D D D
Levels: D E
You can use outer to get a matrix as you showed, for each factor:
> F1 <- with(dat, outer(f1, levels(f1), `==`)*1)
> colnames(F1) <- paste("f1",sep="=",levels(dat$f1))
> F1
f1=A f1=B f1=C
[1,] 0 0 1
[2,] 0 1 0
[3,] 0 1 0
[4,] 1 0 0
[5,] 0 0 1
[6,] 0 0 1
[7,] 0 0 1
[8,] 0 1 0
[9,] 0 0 1
[10,] 1 0 0
[11,] 0 1 0
[12,] 0 0 1
[13,] 0 1 0
[14,] 0 1 0
[15,] 1 0 0
[16,] 0 0 1
[17,] 0 0 1
[18,] 0 0 1
[19,] 0 1 0
[20,] 0 0 1
Now do the same for the second factor:
> F2 <- with(dat, outer(f2, levels(f2), `==`)*1)
> colnames(F2) <- paste("f2",sep="=",levels(dat$f2))
And cbind them to get the final result:
> cbind(F1,F2)
model.matrix is the process that lm and others use in the background to convert for you.
dat <- data.frame(f1=sample(LETTERS[1:3],20,T),f2=sample(LETTERS[4:5],20,T),id=1:20)
dat
model.matrix(~dat$f1 + dat$f2)
It creates the INTERCEPT variable as a column of 1's, but you can easily remove that if you need.
model.matrix(~dat$f1 + dat$f2)[,-1]
Edit: Now I see that this is essentially the same as one of the other comments, but more concise.
Expanding and generalizing @Ferdinand.kraft's answer:
dat <- data.frame(
f1 = sample(LETTERS[1:3], 20, TRUE),
f2 = sample(LETTERS[4:5], 20, TRUE),
row.names = paste0("id_", 1:20))
covariates <- c("f1", "f2") # in case you have other columns that you don't want to include in the design matrix
design <- do.call(cbind, lapply(covariates, function(covariate){
apply(outer(dat[[covariate]], unique(dat[[covariate]]), FUN = "=="), 2, as.integer)
}))
rownames(design) <- rownames(dat)
colnames(design) <- unlist(sapply(covariates, function(covariate) unique(dat[[covariate]])))
design <- design[, !duplicated(colnames(design))] # duplicated colnames happen sometimes
design
# C A B D E
# id_1 1 0 0 1 0
# id_2 0 1 0 1 0
# id_3 0 0 1 1 0
# id_4 1 0 0 1 0
# id_5 0 1 0 1 0
# id_6 0 1 0 0 1
# id_7 0 0 1 0 1
Model matrix only allows what it calls "dummy" coding for the first factor in a formula.
If the intercept is present, it plays that role. To get the desired effect of a redundant index matrix (where you have a 1 in every column for the corresponding factor level and 0 elsewhere), you can lie to model.matrix() and pretend there's an extra level. Then trim off the intercept column.
> a=rep(1:2,3)
> b=rep(1:3,2)
> df=data.frame(A=a,B=b)
> # Lie and pretend there's a level 0 in each factor.
> df$A=factor(a,as.character(0:2))
> df$B=factor(b,as.character(0:3))
> mm=model.matrix (~A+B,df)
> mm
(Intercept) A1 A2 B1 B2 B3
1 1 1 0 1 0 0
2 1 0 1 0 1 0
3 1 1 0 0 0 1
4 1 0 1 1 0 0
5 1 1 0 0 1 0
6 1 0 1 0 0 1
attr(,"assign")
[1] 0 1 1 2 2 2
attr(,"contrasts")
attr(,"contrasts")$A
[1] "contr.treatment"
attr(,"contrasts")$B
[1] "contr.treatment"
> # mm has an intercept column not requested, so kill it
> dm=as.matrix(mm[,-1])
> dm
A1 A2 B1 B2 B3
1 1 0 1 0 0
2 0 1 0 1 0
3 1 0 0 0 1
4 0 1 1 0 0
5 1 0 0 1 0
6 0 1 0 0 1
> # You can also add interactions
> mm2=model.matrix (~A*B,df)
> dm2=as.matrix(mm2[,-1])
> dm2
A1 A2 B1 B2 B3 A1:B1 A2:B1 A1:B2 A2:B2 A1:B3 A2:B3
1 1 0 1 0 0 1 0 0 0 0 0
2 0 1 0 1 0 0 0 0 1 0 0
3 1 0 0 0 1 0 0 0 0 1 0
4 0 1 1 0 0 0 1 0 0 0 0
5 1 0 0 1 0 0 0 1 0 0 0
6 0 1 0 0 1 0 0 0 0 0 1
Things get complicated with model.matrix() again if we add a covariate x and interactions of x with factors.
a=rep(1:2,3)
b=rep(1:3,2)
x=1:6
df=data.frame(A=a,B=b,x=x)
# Lie and pretend there's a level 0 in each factor.
df$A=factor(a,as.character(0:2))
df$B=factor(b,as.character(0:3))
mm=model.matrix (~A + B + A:x + B:x,df)
print(mm)
(Intercept) A1 A2 B1 B2 B3 A0:x A1:x A2:x B1:x B2:x B3:x
1 1 1 0 1 0 0 0 1 0 1 0 0
2 1 0 1 0 1 0 0 0 2 0 2 0
3 1 1 0 0 0 1 0 3 0 0 0 3
4 1 0 1 1 0 0 0 0 4 4 0 0
5 1 1 0 0 1 0 0 5 0 0 5 0
6 1 0 1 0 0 1 0 0 6 0 0 6
So mm has an intercept, but now the A:x interaction terms have an unwanted level A0:x.
If we reintroduce x as a separate term, we will cancel that unwanted level.
mm2=model.matrix (~ x + A + B + A:x + B:x, df)
print(mm2)
(Intercept) x A1 A2 B1 B2 B3 x:A1 x:A2 x:B1 x:B2 x:B3
1 1 1 1 0 1 0 0 1 0 1 0 0
2 1 2 0 1 0 1 0 0 2 0 2 0
3 1 3 1 0 0 0 1 3 0 0 0 3
4 1 4 0 1 1 0 0 0 4 4 0 0
5 1 5 1 0 0 1 0 5 0 0 5 0
6 1 6 0 1 0 0 1 0 6 0 0 6
We can get rid of the unwanted intercept and the unwanted bare x term
dm2=as.matrix(mm2[,c(-1,-2)])
print(dm2)
A1 A2 B1 B2 B3 x:A1 x:A2 x:B1 x:B2 x:B3
1 1 0 1 0 0 1 0 1 0 0
2 0 1 0 1 0 0 2 0 2 0
3 1 0 0 0 1 3 0 0 0 3
4 0 1 1 0 0 0 4 4 0 0
5 1 0 0 1 0 5 0 0 5 0
6 0 1 0 0 1 0 6 0 0 6
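An alternative to adding a pretend level is to ask model.matrix() for full indicator coding through its contrasts.arg argument. The following is a minimal sketch for the two-factor case in the question, without covariates or interactions; the names df, full and dm are assumptions for the example.

```r
# Two factors laid out as in the question: A has 2 levels, B has 3
df <- data.frame(A = factor(rep(1:2, each = 3)),
                 B = factor(rep(1:3, times = 2)))

# contrasts(f, contrasts = FALSE) gives an identity matrix over the levels,
# i.e. one indicator column per level with no reference level dropped
full <- lapply(df, contrasts, contrasts = FALSE)

# Remove the intercept and use the full coding for every factor
dm <- model.matrix(~ A + B - 1, data = df, contrasts.arg = full)
print(dm)
```

The resulting matrix is deliberately rank deficient (the A columns and the B columns each sum to 1), so it suits display or penalised fitting rather than an ordinary lm() fit.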
