How can I create dummy variables from a numeric variable in R?
I want to create N dummy variables such that the numeric variable gives the number of zeros, counting from the first column (capped at N); the remaining columns are 1. Imagine N=6. Like this:
x
a 5
b 2
c 4
d 1
e 9
It must become:
1 2 3 4 5 6
a 0 0 0 0 0 1
b 0 0 1 1 1 1
c 0 0 0 0 1 1
d 0 1 1 1 1 1
e 0 0 0 0 0 0
Thank you!
Here's a hacky solution for you
x = c(5,2,4,1,9)
N = 6
out = matrix(1, length(x), N)
for (i in seq_along(x))
  out[i, seq_len(min(x[i], N))] = 0  # seq_len() also handles x[i] == 0
> out
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 0 0 0 0 0 1
[2,] 0 0 1 1 1 1
[3,] 0 0 0 0 1 1
[4,] 0 1 1 1 1 1
[5,] 0 0 0 0 0 0
We could do this in a vectorized manner: create a matrix of 1s, build a row/column index of the cells that should be zero, and assign 0 at those positions.
m1 <- matrix(1, ncol = N, nrow = length(x),
dimnames = list(letters[seq_along(x)], seq_len(N)))
x1 <- pmin(x, ncol(m1))
m1[cbind(rep(seq_len(nrow(m1)), x1), sequence(x1))] <- 0
m1
# 1 2 3 4 5 6
#a 0 0 0 0 0 1
#b 0 0 1 1 1 1
#c 0 0 0 0 1 1
#d 0 1 1 1 1 1
#e 0 0 0 0 0 0
data
x <- c(5,2,4,1,9)
N <- 6
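To make the indexing step easier to follow, here is a small sketch that builds the row/column index piece by piece. The names rows, cols, idx and alt are only illustrative, and the outer() comparison at the end is a different technique that should give the same result under the same assumptions (zeros capped at N).

```r
x <- c(5, 2, 4, 1, 9)
N <- 6

x1   <- pmin(x, N)                 # cap the zero counts at N
rows <- rep(seq_along(x1), x1)     # row i repeated x1[i] times
cols <- sequence(x1)               # 1..x1[i] for each row
idx  <- cbind(rows, cols)          # one (row, col) pair per cell to zero

m1 <- matrix(1, nrow = length(x), ncol = N)
m1[idx] <- 0                       # matrix indexing with a two-column matrix

# Alternative: cell (i, j) is 1 exactly when x[i] < j
alt <- outer(x, seq_len(N), "<") * 1
```

Both m1 and alt should match the expected table from the question.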
I have been working on this for an hour and I feel like I've run into a wall: I want to transform a vector of comma-separated strings into a matrix.
I have a vector like:
'ABC,DFGH,IJ'
'KLMN,OP,DFGH,QR'
'ST,ABC'
I want to get a matrix like
ABC DFGH IJ KLMN OP QR ST
1 1 1 0 0 0 0
0 1 0 1 1 1 0
1 0 0 0 0 0 1
Sample data:
myvec<-c('ABC,DFGH,IJ','KLMN,OP,DFGH,QR','ST,ABC')
Base R answers are welcome as well. I might need this trick for some bigger datasets again.
Another base R solution:
> myvec<-c('ABC,DFGH,IJ','KLMN,OP,DFGH,QR','ST,ABC')
> mv <- strsplit(myvec,",")
> u <- unique(unlist(mv))
> t(sapply(mv, function(x) u %in% x)*1)
# output without colnames
[,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,] 1 1 1 0 0 0 0
[2,] 0 1 0 1 1 1 0
[3,] 1 0 0 0 0 0 1
> r <- t(sapply(mv, function(x) u %in% x)*1)
# adding colnames
> colnames(r) <- u
> r
ABC DFGH IJ KLMN OP QR ST
[1,] 1 1 1 0 0 0 0
[2,] 0 1 0 1 1 1 0
[3,] 1 0 0 0 0 0 1
library(tidyverse)
myvec<-c('ABC,DFGH,IJ','KLMN,OP,DFGH,QR','ST,ABC')
data.frame(myvec) %>% # create a data frame
mutate(id = row_number(), # create row id (helpful in order to reshape)
value = 1) %>% # create value = 1 (helpful in order to reshape)
separate_rows(myvec) %>% # separate values (using the commas; automatically done by this function)
spread(myvec, value, fill = 0) %>% # reshape dataset
select(-id) # remove row id column
# ABC DFGH IJ KLMN OP QR ST
# 1 1 1 1 0 0 0 0
# 2 0 1 0 1 1 1 0
# 3 1 0 0 0 0 0 1
You can try this with BASE R:
Data:
myvec<-c('ABC,DFGH,IJ','KLMN,OP,DFGH,QR','ST,ABC')
Solution:
unq <- unique(strsplit(paste0(myvec,collapse=","),",")[[1]])
sapply(unq, function(x)grepl(x,strsplit(myvec,","))+0)
Output:
> sapply(unq, function(x)grepl(x,strsplit(myvec,","))+0)
ABC DFGH IJ KLMN OP QR ST
[1,] 1 1 1 0 0 0 0
[2,] 0 1 0 1 1 1 0
[3,] 1 0 0 0 0 0 1
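One caveat worth keeping in mind: grepl() does substring matching, so a token such as 'ABC' would also match an entry like 'ABCD'. The sketch below uses %in% on the split pieces for exact matching instead; the fourth element 'ABCD' is hypothetical data added only to show the difference, and exact and parts are illustrative names.

```r
# 'ABCD' is a made-up extra entry to demonstrate the substring issue
myvec <- c('ABC,DFGH,IJ', 'KLMN,OP,DFGH,QR', 'ST,ABC', 'ABCD')

parts <- strsplit(myvec, ",")
unq   <- unique(unlist(parts))

# One row per input string, one column per unique token, exact matches only
exact <- t(vapply(parts, function(p) unq %in% p, logical(length(unq)))) + 0
colnames(exact) <- unq
exact

# For comparison: substring matching flags 'ABCD' as containing 'ABC'
grepl("ABC", myvec)
```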
I would like to create the following vector sequence.
0 1 0 0 2 0 0 0 3 0 0 0 0 4
My thought was to create the zeros first with rep(), but I'm not sure how to add the 1:4.
Create a diagonal matrix, take the upper triangle, and remove the first element:
d <- diag(0:4)
d[upper.tri(d, TRUE)][-1L]
# [1] 0 1 0 0 2 0 0 0 3 0 0 0 0 4
If you prefer a one-liner that makes no global assignments, wrap it up in a function:
(function() { d <- diag(0:4); d[upper.tri(d, TRUE)][-1L] })()
# [1] 0 1 0 0 2 0 0 0 3 0 0 0 0 4
And for code golf purposes, here's another variation using d from above:
d[!lower.tri(d)][-1L]
# [1] 0 1 0 0 2 0 0 0 3 0 0 0 0 4
rep and rbind up to their old tricks:
rep(rbind(0,1:4),rbind(1:4,1))
#[1] 0 1 0 0 2 0 0 0 3 0 0 0 0 4
This essentially creates 2 matrices, one for the value, and one for how many times the value is repeated. rep does not care if an input is a matrix, as it will just flatten it back to a vector going down each column in order.
rbind(0,1:4)
# [,1] [,2] [,3] [,4]
#[1,] 0 0 0 0
#[2,] 1 2 3 4
rbind(1:4,1)
# [,1] [,2] [,3] [,4]
#[1,] 1 2 3 4
#[2,] 1 1 1 1
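To see the flattening explicitly, the two matrices can be turned into plain vectors first. This is only a sketch of what rep() does with them internally; values, times and result are illustrative names.

```r
values <- as.vector(rbind(0, 1:4))   # column by column: 0 1 0 2 0 3 0 4
times  <- as.vector(rbind(1:4, 1))   # column by column: 1 1 2 1 3 1 4 1

# Each value is repeated by the count in the same position
result <- rep(values, times)
result
```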
You can use rep() to create a sequence that has n + 1 of each value:
n <- 4
myseq <- rep(seq_len(n), seq_len(n) + 1)
# [1] 1 1 2 2 2 3 3 3 3 4 4 4 4 4
Then you can use diff() to find the elements you want. You need to append a 1 to the end of the diff() output, since you always want the last value.
c(diff(myseq), 1)
# [1] 0 1 0 0 1 0 0 0 1 0 0 0 0 1
Then you just need to multiply the original sequence with the diff() output.
myseq <- myseq * c(diff(myseq), 1)
myseq
# [1] 0 1 0 0 2 0 0 0 3 0 0 0 0 4
unlist(lapply(1:4, function(i) c(rep(0,i),i)))
# the sequence
s = 1:4
# create zeros vector
vec = rep(0, sum(s+1))
# assign the sequence to the corresponding position in the zeros vector
vec[cumsum(s+1)] <- s
vec
# [1] 0 1 0 0 2 0 0 0 3 0 0 0 0 4
Or to be more succinct, use replace:
replace(rep(0, sum(s+1)), cumsum(s+1), s)
# [1] 0 1 0 0 2 0 0 0 3 0 0 0 0 4
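If the sequence is needed for other lengths, the same cumsum() idea can be wrapped in a small helper. The function name zero_runs and its argument n are hypothetical, not something from the answers above.

```r
# Hypothetical helper: for each i in 1..n, emit i zeros followed by i
zero_runs <- function(n) {
  s <- seq_len(n)
  # value i lands at position cumsum(s + 1)[i]; everything else stays 0
  replace(numeric(sum(s + 1)), cumsum(s + 1), s)
}

zero_runs(4)
zero_runs(2)
```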
I'd like to make some calculations on FIRST robotics teams and need to build, for lack of better words, a binary interaction matrix, that is, a matrix recording when two teams were on the same alliance. Each alliance has three teams, so each match adds 9 values to the matrix when considering (i,j), (j,i), and (i,i).
The full data I'm using is here: http://frc-events.firstinspires.org/2016/MOKC/qualifications
But for simplicity, here is an example of 9 teams playing 1 match each.
> data.frame(Team.1=1:3,Team.2=4:6,Team.3=7:9)
Team.1 Team.2 Team.3
1 1 4 7
2 2 5 8
3 3 6 9
The matrix should count each binary interaction, (1,4),(4,7),(3,6),(6,3),(9,9), etc, and will be an N x N matrix, where in the above example N=9. Here's the matrix that represents the above lists:
> matrix(data=c(1,0,0,1,0,0,1,0,0,
+ 0,1,0,0,1,0,0,1,0,
+ 0,0,1,0,0,1,0,0,1,
+ 1,0,0,1,0,0,1,0,0,
+ 0,1,0,0,1,0,0,1,0,
+ 0,0,1,0,0,1,0,0,1,
+ 1,0,0,1,0,0,1,0,0,
+ 0,1,0,0,1,0,0,1,0,
+ 0,0,1,0,0,1,0,0,1),9,9)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
[1,] 1 0 0 1 0 0 1 0 0
[2,] 0 1 0 0 1 0 0 1 0
[3,] 0 0 1 0 0 1 0 0 1
[4,] 1 0 0 1 0 0 1 0 0
[5,] 0 1 0 0 1 0 0 1 0
[6,] 0 0 1 0 0 1 0 0 1
[7,] 1 0 0 1 0 0 1 0 0
[8,] 0 1 0 0 1 0 0 1 0
[9,] 0 0 1 0 0 1 0 0 1
In the real data, the team numbers are not sequential and would be more like 5732, 1345, 3451, etc., and there are more matches per team, meaning the matrix values would be between 0 and the maximum number of matches any team played. This can be seen in the real data.
Thanks to anyone that can help.
There is probably a more elegant approach, but here is one using data.table.
library(data.table)
dat <- data.table(Team.1=1:3,Team.2=4:6,Team.3=7:9)
#add match ID
dat[,match:=1:.N]
#turn to long
mdat <- melt(dat,id="match",value.name="team")[,variable:=NULL]
#merge with itself
dat2 <- merge(mdat, mdat, by=c("match"), all=TRUE, allow.cartesian=TRUE)
# reshape
dcast(dat2, team.x~team.y, fun.agg=length)
team.x 1 2 3 4 5 6 7 8 9
1: 1 1 0 0 1 0 0 1 0 0
2: 2 0 1 0 0 1 0 0 1 0
3: 3 0 0 1 0 0 1 0 0 1
4: 4 1 0 0 1 0 0 1 0 0
5: 5 0 1 0 0 1 0 0 1 0
6: 6 0 0 1 0 0 1 0 0 1
7: 7 1 0 0 1 0 0 1 0 0
8: 8 0 1 0 0 1 0 0 1 0
9: 9 0 0 1 0 0 1 0 0 1
And, because I can, one in base-R. A case where I think the use of a for-loop is justified (as you keep modifying the same object).
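#use only the team columns (dat gained a match column in the data.table code)
teams <- as.data.frame(dat)[, c("Team.1","Team.2","Team.3")]
#make matrix to put results in
nteams = length(unique(unlist(teams)))
res <- matrix(0,nrow=nteams, ncol=nteams)
#split data by row, generate combinations for each row and add to matrix
for(i in seq_len(nrow(teams))){
  x=unlist(teams[i,])
  coords=as.matrix(expand.grid(x,x))
  res[coords] <- res[coords]+1
}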
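The loop above indexes the matrix by team number, which works for teams numbered 1 to N. For non-sequential team numbers, as in the real data, one option is to index by name instead. The following is a sketch under that assumption; the team numbers and the second match are made up for illustration.

```r
# Hypothetical matches with non-sequential team numbers
dat <- data.frame(Team.1 = c(5732, 1345),
                  Team.2 = c(1345, 3451),
                  Team.3 = c(3451, 9001))

# Use the team numbers as row and column names
teams <- as.character(sort(unique(unlist(dat))))
res <- matrix(0, nrow = length(teams), ncol = length(teams),
              dimnames = list(teams, teams))

for (i in seq_len(nrow(dat))) {
  x <- as.character(unlist(dat[i, ]))
  # all ordered pairs (including self-pairs) for this match, as names
  coords <- as.matrix(expand.grid(x, x, stringsAsFactors = FALSE))
  res[coords] <- res[coords] + 1
}
res
```

A two-column character matrix is matched against the dimnames, so no lookup table from team number to position is needed.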
Here is my suggestion with base functions. I tried to create a matrix. My approach was to look for the position indexes for 1.
library(magrittr)
mydf <- data.frame(Team.1 = 1:3, Team.2 = 4:6,Team.3 = 7:9)
### Create a matrix with position indexes
lapply(1:nrow(mydf), function(x){
a <- t(combn(mydf[x, ], 2)) # Get some combination
b <- a[, 2:1] # Get other combination by reversing columns
foo <- rbind(a, b)
foo
}) %>%
do.call(rbind, .) -> ana
ana <- matrix(unlist(ana), nrow = nrow(ana))
### Another set: Get indexes for self (e.g., (1,1), (2,2), (3,3))
foo <- rep(1:max(mydf), times = 2)
matrix(foo, nrow = length(foo) / 2) -> bob
### A matrix with all position indexes
cammy <- rbind(ana, bob)
### Create a plain matrix
mat <- matrix(0, nrow = max(mydf), ncol = max(mydf))
### Fill in the matrix with 1
mat[cammy] <- 1
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
# [1,] 1 0 0 1 0 0 1 0 0
# [2,] 0 1 0 0 1 0 0 1 0
# [3,] 0 0 1 0 0 1 0 0 1
# [4,] 1 0 0 1 0 0 1 0 0
# [5,] 0 1 0 0 1 0 0 1 0
# [6,] 0 0 1 0 0 1 0 0 1
# [7,] 1 0 0 1 0 0 1 0 0
# [8,] 0 1 0 0 1 0 0 1 0
# [9,] 0 0 1 0 0 1 0 0 1
EDIT
Here is a revised version based on the previous idea. It is not as concise as Heroka's base-R approach. In my modified data, teams 1 and 4 shared two matches. The idea is to count how many times each pair appears in the data set, which is what the dplyr part does. The for loop then fills in the matrix mat by going through each row of cammy.
mydf <- data.frame(Team.1=c(1:3,1),Team.2=c(4:6,4),Team.3=c(7:9,5))
# Team.1 Team.2 Team.3
#1 1 4 7
#2 2 5 8
#3 3 6 9
#4 1 4 5
library(dplyr)
lapply(1:nrow(mydf), function(x){
a <- t(combn(mydf[x, ], 2)) # Get some combination
b <- a[, 2:1] # Get other combination by reversing columns
foo <- rbind(a, b)
foo
}) %>%
do.call(rbind, .) -> ana
ana <- data.frame(matrix(unlist(ana), nrow = nrow(ana)))
### Another set: Get indexes for self (e.g., (1,1), (2,2), (3,3))
foo <- rep(1:max(mydf), times = 2)
data.frame(matrix(foo, nrow = length(foo) / 2)) -> bob
cammy <- bind_rows(ana, bob) %>%
group_by(X1, X2) %>%
mutate(total = n()) %>%
as.matrix
### Create a plain matrix
mat <- matrix(0, nrow = max(mydf), ncol = max(mydf))
for(i in 1:nrow(cammy)){
mat[cammy[i, 1], cammy[i, 2]] <- cammy[i, 3]
}
print(mat)
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
# [1,] 1 0 0 2 1 0 1 0 0
# [2,] 0 1 0 0 1 0 0 1 0
# [3,] 0 0 1 0 0 1 0 0 1
# [4,] 2 0 0 1 1 0 1 0 0
# [5,] 1 1 0 1 1 0 0 1 0
# [6,] 0 0 1 0 0 1 0 0 1
# [7,] 1 0 0 1 0 0 1 0 0
# [8,] 0 1 0 0 1 0 0 1 0
# [9,] 0 0 1 0 0 1 0 0 1
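A different technique that may be worth trying for pair counts is a match-by-team incidence matrix multiplied with itself via crossprod(). This is a sketch, not part of the answer above; long, inc and co are illustrative names. Note that here the diagonal holds the number of matches each team played, which differs from the printed matrix above, where the diagonal is 1.

```r
mydf <- data.frame(Team.1 = c(1:3, 1), Team.2 = c(4:6, 4), Team.3 = c(7:9, 5))

# Long form: one row per (match, team) appearance
long <- data.frame(match = rep(seq_len(nrow(mydf)), times = ncol(mydf)),
                   team  = unlist(mydf, use.names = FALSE))

# Incidence matrix: rows are matches, columns are teams, 1 if the team played
inc <- unclass(table(long$match, long$team))

# t(inc) %*% inc counts, for each pair of teams, the matches they shared
co <- crossprod(inc)
co
```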
Hi, I'm pretty much stumped trying to figure this out and could use a little help. Basically, I have an n x n matrix where the diagonal is set to a value k and every other value is 0.
1 2 3 4 5
1 k 0 0 0 0
2 0 k 0 0 0
3 0 0 k 0 0
4 0 0 0 k 0
5 0 0 0 0 k
Basically, I need to be able to make two other diagonals in this matrix with the value of 1 so it ends up looking like this:
1 2 3 4 5
1 k 1 0 0 0
2 1 k 1 0 0
3 0 1 k 1 0
4 0 0 1 k 1
5 0 0 0 1 k
So far, all I have for code is the diagonal matrix,
m=diag(k,n,n), but I have no idea how to add the two other diagonals. Would I use apply() and cbind() or rbind()?
You can use col and row to create an index, then use it to subset and assign the upper and lower diagonals.
k=3
m <- k* diag(6)
m[abs(row(m) - col(m)) == 1] <- 1
m
# [,1] [,2] [,3] [,4] [,5] [,6]
#[1,] 3 1 0 0 0 0
#[2,] 1 3 1 0 0 0
#[3,] 0 1 3 1 0 0
#[4,] 0 0 1 3 1 0
#[5,] 0 0 0 1 3 1
#[6,] 0 0 0 0 1 3
If you wanted reverse diagonals you could use col(m) - row(m)
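As a sketch of how the sign of the difference picks out each side: without abs(), col(m) - row(m) == 1 selects only the diagonal just above the main one, and row(m) - col(m) == 1 only the one just below. The values of k and n here are arbitrary.

```r
k <- 3
n <- 5
m <- diag(k, n, n)

upper <- col(m) - row(m) == 1   # cells one step above the main diagonal
lower <- row(m) - col(m) == 1   # cells one step below the main diagonal

m[upper] <- 1
m[lower] <- 1
m
```

Assigning different values to upper and lower would give an asymmetric band if that is what is needed.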
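Try this function; it builds a row x col matrix with the value diag on the main diagonal and 1s on the two adjacent diagonals.
matfun <- function(diag, row=4, col=4){
  x = base::diag(1,row,col)  # 1s on the main diagonal
  # scale the main diagonal, then add x shifted down one row (lower diagonal)
  # and x shifted right one column (upper diagonal)
  diag*x+rbind(rep(0,col),x[1:(row-1),])+cbind(rep(0,row),x[,1:(col-1)])
}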
HTH