There are N groups (aka judges, let's say 17), and M elements (let's call them cases, let's say 22) such that 3*M <= 4*N.
N <- LETTERS[1:17]
M <- 1:22
I want to assign each of the N judges 4 or fewer cases, such that each case is evaluated by exactly 3 judges and no judge sees the same case twice.
A : 1, 2, 19
B : 2, 3, 8, 22
...
Q : 1, 2, 12, 10
Is there a quick and easy way to do this in R?
I have tried this so far:
df <- data.frame(ID=rep(M,3))
values <- N
df$values[sample(1:nrow(df), nrow(df), FALSE)] <- rep(values, 4)
Usually when I see "random assignment subject to constraints" questions, my mind goes to the following idea:
Select a random weight for assigning item i to category j (in this case assigning case i to judge j)
Use linear programming to identify the assignments that satisfy all constraints (<= 4 cases/judge and 3 reviews per case) with maximum weight.
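Written out, the model behind these two steps looks roughly like the following. Here x_{ij} is 1 when case i is assigned to judge j and 0 otherwise, and w_{ij} is the random weight for that pair (the symbol w is just notation chosen for this sketch):

```latex
\begin{aligned}
\max_{x}\quad & \sum_{i=1}^{M}\sum_{j=1}^{N} w_{ij}\,x_{ij} \\
\text{s.t.}\quad & \sum_{j=1}^{N} x_{ij} = 3 \quad \text{for every case } i, \\
& \sum_{i=1}^{M} x_{ij} \le 4 \quad \text{for every judge } j, \\
& x_{ij} \in \{0, 1\}.
\end{aligned}
```

Because each x_{ij} is binary, a judge cannot be given the same case twice.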
This is pretty straightforward in R with a linear programming package like lpSolve, creating a binary variable x_ij that indicates whether we assign case i to judge j for every case/judge pair:
library(lpSolve)
set.seed(144)
# vars is a convenience matrix that tells us the i and j index of each variable in our model
vars <- expand.grid(i=M, j=N)
mod <- lp(direction = "max",
objective.in = rnorm(nrow(vars)),
const.mat = rbind(t(sapply(M, function(i) as.numeric(vars$i == i))),
t(sapply(N, function(j) as.numeric(vars$j == j)))),
const.dir = rep(c("=", "<="), c(length(M), length(N))),
const.rhs = rep(c(3, 4), c(length(M), length(N))),
all.bin = TRUE)
# Extract all cases assigned to each judge
sapply(N, function(j) vars$i[mod$solution > 0.999 & vars$j == j])
# $A
# [1] 2 10 15
#
# $B
# [1] 7 8 13 22
#
# $C
# [1] 2 3 7 9
# ...
Because of the way we've set up the weights and constraints, this can be thought of as randomly selecting one of the feasible assignments of cases to judges, although not every feasible assignment is necessarily equally likely.
Here's what I would do:
set.seed(1)
rM = sample(M)
rN = sample(N)
tasks = rep(rM, each=3)
judges = rep(rN, length.out = length(tasks))
matches = data.frame(judges, tasks)
You can verify that your conditions hold true by tabulating:
tab = with(matches, table(judges, tasks))
max(tab) # 1
addmargins(tab)
tasks
judges 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 Sum
A 0 0 0 0 0 0 1 1 0 1 1 0 0 0 0 0 0 0 0 0 0 0 4
B 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 1 0 4
C 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 4
D 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 1 1 0 0 4
E 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 4
F 0 0 0 0 0 0 1 1 0 0 1 0 0 0 0 0 1 0 0 0 0 0 4
G 0 0 1 1 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 4
H 1 0 0 1 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 4
I 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 1 0 0 1 0 4
J 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 1 1 0 0 4
K 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 1 0 4
L 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 4
M 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 3
N 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 3
O 0 0 0 0 0 0 0 1 0 1 1 0 0 0 0 0 0 0 1 0 0 0 4
P 0 0 1 1 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 4
Q 0 0 0 0 0 0 0 0 1 0 0 1 0 0 1 0 0 0 0 1 0 0 4
Sum 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 66
Note: Judges that are close together in rN will end up reviewing many of the same cases.
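To see what that note means in practice, here is a small sketch that rebuilds the same kind of assignment and counts how many cases each pair of judges has in common. The names shared and gap are made up for this illustration, and the seed and sampling order differ from the code above, so the exact table will not match.

```r
set.seed(1)
N <- LETTERS[1:17]
M <- 1:22
rN <- sample(N)
tasks <- rep(sample(M), each = 3)
judges <- rep(rN, length.out = length(tasks))
tab <- table(factor(judges, levels = N), tasks)

# shared[i, j] = number of cases that judges i and j both review
shared <- tcrossprod(tab)
diag(shared) <- 0

# cyclic distance between two judges' positions in rN
pos <- match(N, rN)
gap <- abs(outer(pos, pos, "-"))
gap <- pmin(gap, length(N) - gap)

# largest overlap among judges more than two positions apart in rN
max(shared[gap > 2])
# total overlap, grouped by distance in rN
tapply(as.vector(shared), as.vector(gap), sum)
```

Each case goes to three consecutive entries of the recycled rN, so only judges within two positions of each other can ever overlap. If that correlation matters, one of the other approaches may be a better fit.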
GetJudgeCaseList <- function(CaseList, judgeList, casesAllowed, NumJudges) {
e <- new.env()
e$casesLeft <- data.frame(Judges = judgeList, itersLeft = casesAllowed)
e$judgeList = judgeList
doCase <- function(i) {
pickJudges <- function(NumJudges, judgeList) {
CurJudges <- sample(judgeList, NumJudges)
return(CurJudges)
}
case <- pickJudges(NumJudges, e$judgeList)
# casesLeft lives in the environment e, so it must be referenced as e$casesLeft
e$casesLeft[e$casesLeft$Judges %in% case, 2] <- e$casesLeft[e$casesLeft$Judges %in% case, 2] - 1
e$judgeList <- e$casesLeft$Judges[e$casesLeft$itersLeft!=0]
return(data.frame(Case = CaseList[i], judges = paste0(case, collapse = ", ")))
}
Cases <- do.call(rbind, lapply(1:length(CaseList), doCase))
return(Cases)
}
GetJudgeCaseList(CaseList = c(1:22), judgeList = N, casesAllowed = 4, NumJudges = 3)
Case judges
1 1 a, h, o
2 2 k, i, j
3 3 j, q, a
4 4 j, n, p
5 5 g, o, n
6 6 q, g, l
7 7 g, d, i
8 8 b, l, f
9 9 m, b, i
10 10 k, m, c
11 11 l, m, p
12 12 m, o, q
13 13 p, g, b
14 14 p, f, b
15 15 l, e, i
16 16 d, h, o
17 17 d, c, q
18 18 a, f, e
19 19 e, d, c
20 20 e, n, k
21 21 a, k, f
22 22 j, n, c
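One caveat with sampling judges uniformly for each case: with some unlucky draws, fewer than NumJudges judges may have capacity left for a late case, and sample() then fails. A possible variant (not the method above) is to always pick the judges with the most remaining capacity and break ties at random. The function name assign_balanced and its arguments are hypothetical, and the result is less random than uniform sampling because loads are kept balanced.

```r
assign_balanced <- function(cases, judges, cases_allowed = 4, num_judges = 3) {
  # remaining capacity per judge
  left <- setNames(rep(cases_allowed, length(judges)), judges)
  out <- vector("list", length(cases))
  for (k in seq_along(cases)) {
    # rank judges by remaining capacity (descending), ties broken at random
    pick <- order(-left, runif(length(left)))[seq_len(num_judges)]
    if (any(left[pick] == 0)) stop("not enough judge capacity left")
    left[pick] <- left[pick] - 1
    out[[k]] <- names(left)[pick]
  }
  names(out) <- cases
  out
}

set.seed(42)
res <- assign_balanced(1:22, LETTERS[1:17])
table(unlist(res))  # number of cases per judge
```

Because capacities never differ by more than one between judges, this should not run out of eligible judges as long as the total capacity covers the total number of reviews.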
Related
I have a 1000*16 matrix from a simulation, with team names as characters. I want to count the number of occurrences per team in each of the 16 columns.
I know I could do apply(test, 2, table), but that makes the data hard to work with afterward, since not every team is included in every column.
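If you have a vector of all the unique team names, you could do something like this. Converting each slice to a factor with those names as levels ensures that a team (in this case a letter) that does not occur still gets a count of 0. Note that the code below counts per row (MARGIN = 1); pass 2 instead to count per column.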
set.seed(15)
letter_mat <- matrix(
sample(
LETTERS,
size = 1000*16,
replace = TRUE
),
ncol = 16,
nrow = 1000
)
output <- t(
apply(
letter_mat,
1,
function(x) table(factor(x, levels = LETTERS))
)
)
head(output)
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
[1,] 1 2 0 1 1 1 1 0 0 0 1 0 0 0 0 1 1 1 1 1 0 1 1 0 0 1
[2,] 0 1 0 2 2 1 0 1 0 0 0 0 0 1 0 0 0 0 0 1 1 1 0 2 2 1
[3,] 1 1 0 0 1 0 1 2 1 0 0 0 0 0 1 0 1 0 1 1 0 0 3 0 1 1
[4,] 0 1 0 0 0 1 0 0 0 2 0 1 0 0 1 1 1 1 2 0 2 3 0 0 0 0
[5,] 2 1 0 0 0 0 0 2 0 2 1 1 1 0 0 2 0 2 1 0 0 1 0 0 0 0
[6,] 0 0 0 0 0 1 3 1 0 0 0 0 1 1 3 0 1 0 0 1 0 0 0 1 0 3
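Since the question asks for counts per column, here is the same idea sketched with MARGIN = 2. The name per_col is just for illustration; the result has one row per team and one column per matrix column.

```r
set.seed(15)
letter_mat <- matrix(sample(LETTERS, 1000 * 16, replace = TRUE), ncol = 16)

# one vector of counts per matrix column; factor() keeps absent teams as 0
per_col <- apply(letter_mat, 2, function(x) table(factor(x, levels = LETTERS)))

dim(per_col)      # teams by columns
colSums(per_col)  # each column accounts for all 1000 rows
```

Use t(per_col) if you would rather have one row per matrix column.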
I have a matrix that is for example like this:
rownames V1
a 1
c 3
b 2
d 4
y 2
q 4
i 1
j 1
r 3
I want to make a symmetric binary matrix whose dimnames are the same as the rownames of the matrix above. I want to fill this matrix with 1s and 0s such that a 1 indicates that the two variables have the same number next to them, and a 0 indicates that they do not. This matrix would look like:
dimnames
a c b d y q i j r
a 1 0 0 0 0 0 1 1 0
c 0 1 0 0 0 0 0 0 1
b 0 0 1 0 1 0 0 0 0
d 0 0 0 1 0 1 0 0 0
y 0 0 1 0 1 0 0 0 0
q 0 0 0 1 0 1 0 0 0
i 1 0 0 0 0 0 1 1 0
j 1 0 0 0 0 0 1 1 0
r 0 1 0 0 0 0 0 0 1
Does anybody know how I can do that?
Use dist:
DF <- read.table(text = "rownames V1
a 1
c 3
b 2
d 4
y 2
q 4
i 1
j 1
r 3", header = TRUE)
res <- as.matrix(dist(DF$V1)) == 0L
#alternatively:
#res <- !as.matrix(dist(DF$V1))
#diag(res) <- 0L #for the first version of the question, i.e. a zero diagonal
res <- +(res) #for the second version, i.e. to coerce to an integer matrix
dimnames(res) <- list(DF$rownames, DF$rownames)
#  a c b d y q i j r
#a 1 0 0 0 0 0 1 1 0
#c 0 1 0 0 0 0 0 0 1
#b 0 0 1 0 1 0 0 0 0
#d 0 0 0 1 0 1 0 0 0
#y 0 0 1 0 1 0 0 0 0
#q 0 0 0 1 0 1 0 0 0
#i 1 0 0 0 0 0 1 1 0
#j 1 0 0 0 0 0 1 1 0
#r 0 1 0 0 0 0 0 0 1
You can do this using table and tcrossprod.
tcrossprod(table(DF))
# rownames
# rownames a b c d i j q r y
# a 1 0 0 0 1 1 0 0 0
# b 0 1 0 0 0 0 0 0 1
# c 0 0 1 0 0 0 0 1 0
# d 0 0 0 1 0 0 1 0 0
# i 1 0 0 0 1 1 0 0 0
# j 1 0 0 0 1 1 0 0 0
# q 0 0 0 1 0 0 1 0 0
# r 0 0 1 0 0 0 0 1 0
# y 0 1 0 0 0 0 0 0 1
If you want the row and column order as they are found in the data, rather than alphanumerically, you can subset
tcrossprod(table(DF))[DF$rownames, DF$rownames]
or use factor
tcrossprod(table(factor(DF$rownames, levels=unique(DF$rownames)), DF$V1))
If your data is large or sparse, you can have xtabs return a sparse matrix and use the tcrossprod method from the Matrix package, with the same ways of changing the order of the resulting table as before.
Matrix::tcrossprod(xtabs(data=DF, ~ rownames + V1, sparse=TRUE))
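For small inputs, another base R option is to compare every pair of values directly with outer. This is a sketch of an alternative rather than either method above, with the data frame rebuilt inline so the block runs on its own.

```r
DF <- data.frame(rownames = c("a", "c", "b", "d", "y", "q", "i", "j", "r"),
                 V1 = c(1, 3, 2, 4, 2, 4, 1, 1, 3))

# TRUE where the two values are equal; unary + turns the logicals into 0/1
res <- +outer(DF$V1, DF$V1, "==")
dimnames(res) <- list(DF$rownames, DF$rownames)
res
```

This keeps the rows and columns in the order of the data, at the cost of building a full n-by-n matrix.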
I've collected data as follows:
A B C D E F G
1 1 0 0 0 0 0 0
1,2 0 1 0 0 0 0 2
1,2,3 0 0 0 0 0 0 0
1,3 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0
2,3 4 0 0 0 5 0 0
3 1 3 0 0 0 2 0
4 0 0 0 0 0 0 0
Each Color (A,B,C,D,E,F,G) corresponds to one or more categories (1,2,3,4) at the same time, depending on the sample. When there are several categories, they are separated by commas.
I want to simplify my data so that it looks as follows:
A B C D E F G
1 1 1 0 0 0 0 2
3 4 0 0 0 5 2 0
2 4 1 0 0 5 0 2
4 0 0 0 0 0 0 0
Is there a simple way (a function) to do this?
Reproducible example:
DF <- read.table(text = " Color Cat
A 1
B 1
C 4,2
D 1,3
E 1,2
F 3
G 5
A 2
B 3
C 1,2
D 4,3
E 3
F 1
G 1" , header = TRUE)
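# convert the table to a data frame so that a column can be added with DF$cat below
DF <- as.data.frame.matrix(table(DF$Cat, DF$Color))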
cats <- strsplit(rownames(DF), ",", fixed = TRUE)
DF <- DF[rep(seq_len(nrow(DF)), sapply(cats, length)),]
DF$cat <- unlist(cats)
DF <- aggregate(. ~ cat, DF, FUN = sum)
DF <- read.table(text = " A B C D E F G
1 1 0 0 0 0 0 0
1,2 0 1 0 0 0 0 2
1,2,3 0 0 0 0 0 0 0
1,3 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0
2,3 4 0 0 0 5 0 0
3 1 3 0 0 0 2 0
4 0 0 0 0 0 0 0", header = TRUE)
#split the row names
cats <- strsplit(rownames(DF), ",", fixed = TRUE)
#repeat each row of the DF times the number of cats
DF <- DF[rep(seq_len(nrow(DF)), sapply(cats, length)),]
#add column with cats
DF$cat <- unlist(cats)
#aggregate (your question is unclear regarding how)
DF <- aggregate(. ~ cat, DF, FUN = sum) #or FUN = max???
# cat A B C D E F G
#1 1 1 1 0 0 0 0 2
#2 2 4 1 0 0 5 0 2
#3 3 5 3 0 0 5 2 0
#4 4 0 0 0 0 0 0 0
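Because the question leaves open how rows should be combined, here is a sketch that computes both variants side by side on the same data. It uses rowsum for the sums, as a base R alternative to aggregate, and aggregate with FUN = max for the maxima. The names expanded, res_sum and res_max are made up for this example.

```r
m <- matrix(0, nrow = 8, ncol = 7,
            dimnames = list(c("1", "1,2", "1,2,3", "1,3", "2", "2,3", "3", "4"),
                            LETTERS[1:7]))
m["1", "A"] <- 1
m["1,2", c("B", "G")] <- c(1, 2)
m["2,3", c("A", "E")] <- c(4, 5)
m["3", c("A", "B", "F")] <- c(1, 3, 2)

cats <- strsplit(rownames(m), ",", fixed = TRUE)
idx <- rep(seq_len(nrow(m)), lengths(cats))  # repeat each row once per category
grp <- unlist(cats)                          # the single category for each repeated row

expanded <- m[idx, , drop = FALSE]
rownames(expanded) <- NULL

res_sum <- rowsum(expanded, group = grp)     # sums per category
res_max <- aggregate(as.data.frame(expanded), by = list(cat = grp), FUN = max)
```

The two differ wherever a category appears in more than one non-zero row, for example column A of category 3.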
How can I force the dimensions of a table to be equal in R?
For example:
a <- c(0,1,2,3,4,5,1,3,4,5,3,4,5)
b <- c(1,2,3,3,3,3,3,3,3,3,5,5,6)
c <- table(a,b)
print(c)
# b
#a 1 2 3 5 6
# 0 1 0 0 0 0
# 1 0 1 1 0 0
# 2 0 0 1 0 0
# 3 0 0 2 1 0
# 4 0 0 2 1 0
# 5 0 0 2 0 1
However, I am looking for the following result:
print(c)
# b
#a 0 1 2 3 4 5 6
# 0 0 1 0 0 0 0 0
# 1 0 0 1 1 0 0 0
# 2 0 0 0 1 0 0 0
# 3 0 0 0 2 0 1 0
# 4 0 0 0 2 0 1 0
# 5 0 0 0 2 0 0 1
# 6 0 0 0 0 0 0 0
By using factors. table doesn't know the levels of your variable unless you tell it in some way!
a <- c(0,1,2,3,4,5,1,3,4,5,3,4,5)
b <- c(1,2,3,3,3,3,3,3,3,3,5,5,6)
a <- factor(a, levels = 0:6)
b <- factor(b, levels = 0:6)
table(a,b)
# b
#a 0 1 2 3 4 5 6
# 0 0 1 0 0 0 0 0
# 1 0 0 1 1 0 0 0
# 2 0 0 0 1 0 0 0
# 3 0 0 0 2 0 1 0
# 4 0 0 0 2 0 1 0
# 5 0 0 0 2 0 0 1
# 6 0 0 0 0 0 0 0
Edit: A general way to force a square cross-tabulation is to give both factors the same set of levels, something like
x <- factor(a, levels = union(a, b))
y <- factor(b, levels = union(a, b))
table(x, y)
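Wrapped up as a small helper, that idea might look like the sketch below. The function name square_table is hypothetical, and sorting the levels is an addition so that rows and columns come out in order even when the values first appear out of order.

```r
square_table <- function(x, y) {
  # shared, sorted set of levels taken from both vectors
  lev <- sort(union(x, y))
  table(factor(x, levels = lev), factor(y, levels = lev))
}

a <- c(0, 1, 2, 3, 4, 5, 1, 3, 4, 5, 3, 4, 5)
b <- c(1, 2, 3, 3, 3, 3, 3, 3, 3, 3, 5, 5, 6)
tab <- square_table(a, b)
tab
```

Unlike levels = 0:6, this only includes values that occur in at least one of the two vectors.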
Suppose I have a long table like this:
A <- rep(c("a","b","c","d"),each=4)
B <- rep(c("e","f","g","h"),4)
C <- rep(c("i","j"),8)
D <- rnorm(16)
df <- data.frame(A,B,C,D)
head(df)
A B C D
1 a e i -0.18984508
2 a f j -1.82703822
3 a g i -0.17307580
4 a h j -1.38104238
5 b e i 0.08699983
6 b f j -0.36442461
I would like to change the long table to a wide format so that each element in columns A and B becomes a column title. Each entry should be 1 or 0, indicating whether the element is present in that row. Columns C and D remain the same. The desired table is something like this:
C D a b e f g h
i -0.18984508 1 0 1 0 0 0
j -1.82703822 1 0 0 1 0 0
i -0.17307580 1 0 0 0 1 0
j -1.38104238 1 0 0 0 0 1
i 0.08699983 0 1 1 0 0 0
j -0.36442461 0 1 0 1 0 0
This is a form of reshaping which can be done with the reshape2 package.
library("reshape2")
dcast(melt(df, id.vars=c("C", "D")), C+D~value, fun.aggregate=length)
which gives
C D a b c d e f g h
1 i -1.44485242 0 1 0 0 0 0 1 0
2 i -0.80834639 0 0 0 1 0 0 1 0
3 i -0.15202085 0 0 0 1 1 0 0 0
4 i -0.05626233 1 0 0 0 1 0 0 0
5 i 0.12031754 1 0 0 0 0 0 1 0
6 i 0.62206658 0 0 1 0 0 0 1 0
7 i 0.77101891 0 1 0 0 1 0 0 0
8 i 1.38752097 0 0 1 0 1 0 0 0
9 j -2.52137154 0 0 0 1 0 0 0 1
10 j -0.53231537 0 1 0 0 0 0 0 1
11 j -0.30178539 1 0 0 0 0 0 0 1
12 j -0.29823112 1 0 0 0 0 1 0 0
13 j -0.12988540 0 1 0 0 0 1 0 0
14 j 0.00517754 0 0 1 0 0 1 0 0
15 j 0.51452289 0 0 1 0 0 0 0 1
16 j 0.53260223 0 0 0 1 0 1 0 0
The order is not the same as the original data set, but if that is important put an order column in, carry it through, and then sort on it at the end.
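If keeping the original row order is the main concern, a base R alternative (not the reshape2 approach above) is to tabulate the row index against the values in A and B. The names vals, ind and wide are chosen for this sketch.

```r
set.seed(1)
df <- data.frame(A = rep(c("a", "b", "c", "d"), each = 4),
                 B = rep(c("e", "f", "g", "h"), 4),
                 C = rep(c("i", "j"), 8),
                 D = rnorm(16),
                 stringsAsFactors = FALSE)

vals <- as.matrix(df[c("A", "B")])
# one row per original row, one column per distinct value in A or B
ind <- table(row = as.vector(row(vals)), value = as.vector(vals))
wide <- cbind(df[c("C", "D")], as.data.frame.matrix(ind))
head(wide)
```

The row index doubles as the order column here, so the result lines up with df row by row.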