R: Print omitted 0's in table() - contingency tables [duplicate]

R: Print omitted 0's in table() - contingency tables [duplicate] - r

I am using the following R code to produce a confusion matrix comparing the true labels of some data to the output of a neural network.
t <- table(as.factor(test.labels), as.factor(nnetpredict))
However, sometimes the neural network doesn't predict any of a certain class, so the table isn't square (as, for example, there are 5 levels in the test.labels factor, but only 3 levels in the nnetpredict factor). I want to make the table square by adding in any factor levels necessary, and setting their counts to zero.
How should I go about doing this?
Example:
> table(as.factor(a), as.factor(b))
1 2 3 4 5 6 7 8 9 10
1 1 0 0 0 0 0 0 1 0 0
2 0 1 0 0 0 0 0 0 1 0
3 0 0 1 0 0 0 0 0 0 1
4 0 0 0 1 0 0 0 0 0 0
5 0 0 0 0 1 0 0 0 0 0
6 0 0 0 0 0 1 0 0 0 0
7 0 0 0 0 0 0 1 0 0 0
You can see in the table above that there are 7 rows, but 10 columns, because the a factor only has 7 levels, whereas the b factor has 10 levels. What I want to do is to pad the table with zeros so that the row labels and the column labels are the same, and the matrix is square. From the example above, this would produce:
1 2 3 4 5 6 7 8 9 10
1 1 0 0 0 0 0 0 1 0 0
2 0 1 0 0 0 0 0 0 1 0
3 0 0 1 0 0 0 0 0 0 1
4 0 0 0 1 0 0 0 0 0 0
5 0 0 0 0 1 0 0 0 0 0
6 0 0 0 0 0 1 0 0 0 0
7 0 0 0 0 0 0 1 0 0 0
8 0 0 0 0 0 0 0 0 0 0
9 0 0 0 0 0 0 0 0 0 0
10 0 0 0 0 0 0 0 0 0 0
The reason I need to do this is two-fold:
For display to users/in reports
So that I can use a function to calculate the Kappa statistic, which requires a table formatted like this (square, same row and col labels)

EDIT - round II to address the additional details in the question. I deleted my first answer since it wasn't relevant anymore.
This has produced the desired output for the test cases I've given it, but I definitely advise testing thoroughly with your real data. The approach here is to find the full list of levels for both inputs into the table and set that full list as the levels before generating the table.
squareTable <- function(x,y) {
x <- factor(x)
y <- factor(y)
commonLevels <- sort(unique(c(levels(x), levels(y))))
x <- factor(x, levels = commonLevels)
y <- factor(y, levels = commonLevels)
table(x,y)
}
Two test cases:
> #Test case 1
> set.seed(1)
> x <- factor(sample(0:9, 100, TRUE))
> y <- factor(sample(3:7, 100, TRUE))
>
> table(x,y)
y
x 3 4 5 6 7
0 2 1 3 1 0
1 1 0 2 3 0
2 1 0 3 4 3
3 0 3 6 3 2
4 4 4 3 2 1
5 2 2 0 1 0
6 1 2 3 2 3
7 3 3 3 4 2
8 0 4 1 2 4
9 2 1 0 0 3
> squareTable(x,y)
y
x 0 1 2 3 4 5 6 7 8 9
0 0 0 0 2 1 3 1 0 0 0
1 0 0 0 1 0 2 3 0 0 0
2 0 0 0 1 0 3 4 3 0 0
3 0 0 0 0 3 6 3 2 0 0
4 0 0 0 4 4 3 2 1 0 0
5 0 0 0 2 2 0 1 0 0 0
6 0 0 0 1 2 3 2 3 0 0
7 0 0 0 3 3 3 4 2 0 0
8 0 0 0 0 4 1 2 4 0 0
9 0 0 0 2 1 0 0 3 0 0
> squareTable(y,x)
y
x 0 1 2 3 4 5 6 7 8 9
0 0 0 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0 0 0
3 2 1 1 0 4 2 1 3 0 2
4 1 0 0 3 4 2 2 3 4 1
5 3 2 3 6 3 0 3 3 1 0
6 1 3 4 3 2 1 2 4 2 0
7 0 0 3 2 1 0 3 2 4 3
8 0 0 0 0 0 0 0 0 0 0
9 0 0 0 0 0 0 0 0 0 0
>
> #Test case 2
> set.seed(1)
> xx <- factor(sample(0:2, 100, TRUE))
> yy <- factor(sample(3:5, 100, TRUE))
>
> table(xx,yy)
yy
xx 3 4 5
0 4 14 9
1 14 15 9
2 11 11 13
> squareTable(xx,yy)
y
x 0 1 2 3 4 5
0 0 0 0 4 14 9
1 0 0 0 14 15 9
2 0 0 0 11 11 13
3 0 0 0 0 0 0
4 0 0 0 0 0 0
5 0 0 0 0 0 0
> squareTable(yy,xx)
y
x 0 1 2 3 4 5
0 0 0 0 0 0 0
1 0 0 0 0 0 0
2 0 0 0 0 0 0
3 4 14 11 0 0 0
4 14 15 11 0 0 0
5 9 9 13 0 0 0

Related

How to keep ID in dummyVars()

I would like to do transform Gender and Country using One-Hot-Encoding.
With the code below I can not create the new dataset including the ID
library(caret)
ID<-1:10
Gender<-c("F","F","F","M","M","F","M","M","F","M")
Country<-c("Mali","France","France","Guinea","Senegal",
"Mali","France","Mali","Senegal","France")
data<-data.frame(ID,Gender,Country)
#One hot encoding
dmy <- dummyVars(" ~Gender+Country", data = data, fullRank = T)
dat_transformed <- data.frame(predict(dmy, newdata = data))
dat_transformed
Gender.M Country.Guinea Country.Mali Country.Senegal
1 0 0 1 0
2 0 0 0 0
3 0 0 0 0
4 1 1 0 0
5 1 0 0 1
6 0 0 1 0
7 1 0 0 0
8 1 0 1 0
9 0 0 0 1
10 1 0 0 0
I want to get a dataset that include the ID without enconding it.
ID Gender.M Country.Guinea Country.Mali Country.Senegal
1 1 0 0 1 0
2 2 0 0 0 0
3 3 0 0 0 0
4 4 1 1 0 0
5 5 1 0 0 1
6 6 0 0 1 0
7 7 1 0 0 0
8 8 1 0 1 0
9 9 0 0 0 1
10 10 1 0 0 0

dat_transformed <- cbind(ID,dat_transformed)
dat_transformed
ID Gender.M Country.Guinea Country.Mali Country.Senegal
1 0 0 1 0
2 0 0 0 0
3 0 0 0 0
4 1 1 0 0
5 1 0 0 1
6 0 0 1 0
7 1 0 0 0
8 1 0 1 0
9 0 0 0 1
10 1 0 0 0

How to add multiple columns in R with different condition for each column?

Here is my data set. I would like to add 5 new columns to mydata with 5 different conditions.
mydata=data.frame(sub=rep(c(1:4),c(3,4,5,5)),t=c(1:3,1:4,1:5,1:5),
y.val=c(10,20,13,
5,7,8,0,
45,17,25,12,10,
40,0,0,5,8))
mydata
sub t y.val
1 1 1 10
2 1 2 20
3 1 3 13
4 2 1 5
5 2 2 7
6 2 3 8
7 2 4 0
8 3 1 45
9 3 2 17
10 3 3 25
11 3 4 12
12 3 5 10
13 4 1 40
14 4 2 0
15 4 3 0
16 4 4 5
17 4 5 8
I would like to add the following 5 (max of 't' column) columns as
mydata$It1=ifelse(mydata$t==1 & mydata$y.val>0,1,0)
mydata$It2=ifelse(mydata$t==2 & mydata$y.val>0,1,0)
mydata$It3=ifelse(mydata$t==3 & mydata$y.val>0,1,0)
mydata$It4=ifelse(mydata$t==4 & mydata$y.val>0,1,0)
mydata$It5=ifelse(mydata$t==5 & mydata$y.val>0,1,0)
Here is the expected outcome.
> mydata
sub t y.val It1 It2 It3 It4 It5
1 1 1 10 1 0 0 0 0
2 1 2 20 0 1 0 0 0
3 1 3 13 0 0 1 0 0
4 2 1 5 1 0 0 0 0
5 2 2 7 0 1 0 0 0
6 2 3 8 0 0 1 0 0
7 2 4 0 0 0 0 0 0
8 3 1 45 1 0 0 0 0
9 3 2 17 0 1 0 0 0
10 3 3 25 0 0 1 0 0
11 3 4 12 0 0 0 1 0
12 3 5 10 0 0 0 0 1
13 4 1 40 1 0 0 0 0
14 4 2 0 0 0 0 0 0
15 4 3 0 0 0 0 0 0
16 4 4 5 0 0 0 1 0
17 4 5 8 0 0 0 0 1
I appreciate your help if it can be written as a function using for loop or any other technique.

You could use sapply/lapply
n <- seq_len(5)
mydata[paste0("It", n)] <- +(sapply(n, function(x) mydata$t==x & mydata$y.val>0))
mydata
# sub t y.val It1 It2 It3 It4 It5
#1 1 1 10 1 0 0 0 0
#2 1 2 20 0 1 0 0 0
#3 1 3 13 0 0 1 0 0
#4 2 1 5 1 0 0 0 0
#5 2 2 7 0 1 0 0 0
#6 2 3 8 0 0 1 0 0
#7 2 4 0 0 0 0 0 0
#8 3 1 45 1 0 0 0 0
#9 3 2 17 0 1 0 0 0
#10 3 3 25 0 0 1 0 0
#11 3 4 12 0 0 0 1 0
#12 3 5 10 0 0 0 0 1
#13 4 1 40 1 0 0 0 0
#14 4 2 0 0 0 0 0 0
#15 4 3 0 0 0 0 0 0
#16 4 4 5 0 0 0 1 0
#17 4 5 8 0 0 0 0 1
mydata$t==x & mydata$y.val>0 returns a logical value of TRUE/FALSE based on condition. The + changes those logical values to 1/0 respectively. (Try +c(FALSE, TRUE)). It avoids using ifelse i.e ifelse(condition, 1, 0).

Here's another approach based on multiplying a model matrix by the logical y.val > 0.
df <- cbind(mydata[1:3], model.matrix(~ factor(t) + 0, mydata)*(mydata$y.val>0))
Which gives:
sub t y.val factor.t.1 factor.t.2 factor.t.3 factor.t.4 factor.t.5
1 1 1 10 1 0 0 0 0
2 1 2 20 0 1 0 0 0
3 1 3 13 0 0 1 0 0
4 2 1 5 1 0 0 0 0
5 2 2 7 0 1 0 0 0
6 2 3 8 0 0 1 0 0
7 2 4 0 0 0 0 0 0
8 3 1 45 1 0 0 0 0
9 3 2 17 0 1 0 0 0
10 3 3 25 0 0 1 0 0
11 3 4 12 0 0 0 1 0
12 3 5 10 0 0 0 0 1
13 4 1 40 1 0 0 0 0
14 4 2 0 0 0 0 0 0
15 4 3 0 0 0 0 0 0
16 4 4 5 0 0 0 1 0
17 4 5 8 0 0 0 0 1
To clean up the names you can do:
names(df) <- sub("factor.t.", "It", names(df), fixed = TRUE)

You can use sapply to compare each t for equality against 1:5 and combine this with an & of y.val>0.
within(mydata, It <- +(sapply(1:5, `==`, t) & y.val>0))
# sub t y.val It.1 It.2 It.3 It.4 It.5
#1 1 1 10 1 0 0 0 0
#2 1 2 20 0 1 0 0 0
#3 1 3 13 0 0 1 0 0
#4 2 1 5 1 0 0 0 0
#5 2 2 7 0 1 0 0 0
#6 2 3 8 0 0 1 0 0
#7 2 4 0 0 0 0 0 0
#8 3 1 45 1 0 0 0 0
#9 3 2 17 0 1 0 0 0
#10 3 3 25 0 0 1 0 0
#11 3 4 12 0 0 0 1 0
#12 3 5 10 0 0 0 0 1
#13 4 1 40 1 0 0 0 0
#14 4 2 0 0 0 0 0 0
#15 4 3 0 0 0 0 0 0
#16 4 4 5 0 0 0 1 0
#17 4 5 8 0 0 0 0 1

Here's a tidyverse solution, using pivot_wider:
library(tidyverse)
mydata %>%
mutate(new_col = paste0("It", t),
y_test = as.integer(y.val > 0)) %>%
pivot_wider(id_cols = c(sub, t, y.val),
names_from = new_col,
values_from = y_test,
values_fill = list(y_test = 0))
sub t y.val It1 It2 It3 It4 It5
<int> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 1 10 1 0 0 0 0
2 1 2 20 0 1 0 0 0
3 1 3 13 0 0 1 0 0
4 2 1 5 1 0 0 0 0
5 2 2 7 0 1 0 0 0
6 2 3 8 0 0 1 0 0
7 2 4 0 0 0 0 0 0
8 3 1 45 1 0 0 0 0
9 3 2 17 0 1 0 0 0
10 3 3 25 0 0 1 0 0
11 3 4 12 0 0 0 1 0
12 3 5 10 0 0 0 0 1
13 4 1 40 1 0 0 0 0
14 4 2 0 0 0 0 0 0
15 4 3 0 0 0 0 0 0
16 4 4 5 0 0 0 1 0
17 4 5 8 0 0 0 0 1
Explanation:
Make two columns, new_col (new column names with "It") and y_test (y.val > 0).
Pivot new_col values into column names.
Fill in the NA values with zeros.

One purrr and dplyr option could be:
map_dfc(.x = 1:5,
~ mydata %>%
mutate(!!paste0("It", .x) := as.integer(t == .x & y.val > 0)) %>%
select(starts_with("It"))) %>%
bind_cols(mydata)
It1 It2 It3 It4 It5 sub t y.val
1 1 0 0 0 0 1 1 10
2 0 1 0 0 0 1 2 20
3 0 0 1 0 0 1 3 13
4 1 0 0 0 0 2 1 5
5 0 1 0 0 0 2 2 7
6 0 0 1 0 0 2 3 8
7 0 0 0 0 0 2 4 0
8 1 0 0 0 0 3 1 45
9 0 1 0 0 0 3 2 17
10 0 0 1 0 0 3 3 25
11 0 0 0 1 0 3 4 12
12 0 0 0 0 1 3 5 10
13 1 0 0 0 0 4 1 40
14 0 0 0 0 0 4 2 0
15 0 0 0 0 0 4 3 0
16 0 0 0 1 0 4 4 5
17 0 0 0 0 1 4 5 8
Or if you want to perform it dynamically according the range in t column:
map_dfc(.x = reduce(as.list(range(mydata$t)), `:`),
~ mydata %>%
mutate(!!paste0("It", .x) := as.integer(t == .x & y.val > 0)) %>%
select(starts_with("It"))) %>%
bind_cols(mydata)

How to merge various levels into one attribute after dummy coding the data in R?

I have dummy coded the data in R using the package named "dummies". This gave me the output in various levels and create the dummies for each level. I want to consolidate all those levels on the basis of attributes. Please help me out! Thanks in advance. The following is the code I used:
#To read the data from the working directory
bankfull<-read.csv("bank.csv")
#Calling the library named dummies
library(dummies)
#Redefining the bankfull data with the dummy codes
bankfull<-dummy.data.frame(bankfull,sep=",")
#Viewing the data after dummy coding
print(bankfull)
The output is as follows:
#To read the data from the working directory
> bankfull<-read.csv("bank.csv")
> #Calling the library named dummies
> library(dummies)
dummies-1.5.6 provided by Decision Patterns
> #Redefining the bankfull data with the dummy codes
> bankfull<-dummy.data.frame(bankfull,sep=",")
> #Viewing the data after dummy coding
> View(bankfull)
> library(carData)
> #Viewing the data after dummy coding
> print(bankfull)
CHK_ACCT DURATION HISTORY NEW_CAR USED_CAR FURNITURE RADIO_TV EDUCATION RETRAINING AMOUNT SAV_ACCT EMPLOYMENT INSTALL_RATE MALE_DIV
1 0 6 4 0 0 0 1 0 0 1169 4 4 4 0
2 1 48 2 0 0 0 1 0 0 5951 0 2 2 0
3 3 12 4 0 0 0 0 1 0 2096 0 3 2 0
4 0 42 2 0 0 1 0 0 0 7882 0 3 2 0
5 0 24 3 1 0 0 0 0 0 4870 0 2 3 0
6 3 36 2 0 0 0 0 1 0 9055 4 2 2 0
7 3 24 2 0 0 1 0 0 0 2835 2 4 3 0
8 1 36 2 0 1 0 0 0 0 6948 0 2 2 0
9 3 12 2 0 0 0 1 0 0 3059 3 3 2 1
10 1 30 4 1 0 0 0 0 0 5234 0 0 4 0
11 1 12 2 1 0 0 0 0 0 1295 0 1 3 0
12 0 48 2 0 0 0 0 0 1 4308 0 1 3 0
13 1 12 2 0 0 0 1 0 0 1567 0 2 1 0
14 0 24 4 1 0 0 0 0 0 1199 0 4 4 0
15 0 15 2 1 0 0 0 0 0 1403 0 2 2 0
16 0 24 2 0 0 0 1 0 0 1282 1 2 4 0
17 3 24 4 0 0 0 1 0 0 2424 4 4 4 0
18 0 30 0 0 0 0 0 0 1 8072 4 1 2 0
19 1 24 2 0 1 0 0 0 0 12579 0 4 4 0
20 3 24 2 0 0 0 1 0 0 3430 2 4 3 0
21 3 9 4 1 0 0 0 0 0 2134 0 2 4 0
22 0 6 2 0 0 0 1 0 0 2647 2 2 2 0
23 0 10 4 1 0 0 0 0 0 2241 0 1 1 0
24 1 12 4 0 1 0 0 0 0 1804 1 1 3 0
25 3 10 4 0 0 1 0 0 0 2069 4 2 2 0
26 0 6 2 0 0 1 0 0 0 1374 0 2 1 0
27 3 6 0 0 0 0 1 0 0 426 0 4 4 0
28 2 12 1 0 0 0 1 0 0 409 3 2 3 0
29 1 7 2 0 0 0 1 0 0 2415 0 2 3 0
30 0 60 3 0 0 0 0 0 1 6836 0 4 3 0
31 1 18 2 0 0 0 0 0 1 1913 3 1 3 0
32 0 24 2 0 0 1 0 0 0 4020 0 2 2 0
MALE_SINGLE MALE_MAR_or_WID CO_APPLICANT GUARANTOR PRESENT_RESIDENT REAL_ESTATE PROP_UNKN_NONE AGE OTHER_INSTALL RENT OWN_RES NUM_CREDITS
1 1 0 0 0 4 1 0 67 0 0 1 2
2 0 0 0 0 2 1 0 22 0 0 1 1
3 1 0 0 0 3 1 0 49 0 0 1 1
4 1 0 0 1 4 0 0 45 0 0 0 1
5 1 0 0 0 4 0 1 53 0 0 0 2
6 1 0 0 0 4 0 1 35 0 0 0 1
7 1 0 0 0 4 0 0 53 0 0 1 1
8 1 0 0 0 2 0 0 35 0 1 0 1
9 0 0 0 0 4 1 0 61 0 0 1 1
10 0 1 0 0 2 0 0 28 0 0 1 2
11 0 0 0 0 1 0 0 25 0 1 0 1
12 0 0 0 0 4 0 0 24 0 1 0 1
13 0 0 0 0 1 0 0 22 0 0 1 1
14 1 0 0 0 4 0 0 60 0 0 1 2
15 0 0 0 0 4 0 0 28 0 1 0 1
16 0 0 0 0 2 0 0 32 0 0 1 1
17 1 0 0 0 4 0 0 53 0 0 1 2
18 1 0 0 0 3 0 0 25 1 0 1 3
19 0 0 0 0 2 0 1 44 0 0 0 1
20 1 0 0 0 2 0 0 31 0 0 1 1
21 1 0 0 0 4 0 0 48 0 0 1 3
22 1 0 0 0 3 1 0 44 0 1 0 1
23 1 0 0 0 3 1 0 48 0 1 0 2
24 1 0 0 0 4 0 0 44 0 0 1 1
25 0 1 0 0 1 0 0 26 0 0 1 2
26 1 0 0 0 2 1 0 36 1 0 1 1
27 0 1 0 0 4 0 0 39 0 0 1 1
28 0 0 0 0 3 1 0 42 0 1 0 2
29 1 0 0 1 2 1 0 34 0 0 1 1
30 1 0 0 0 4 0 1 63 0 0 1 2
31 0 1 0 0 3 1 0 36 1 0 1 1
32 1 0 0 0 2 0 0 27 1 0 1 1
JOB NUM_DEPENDENTS TELEPHONE FOREIGN RESPONSE
1 2 1 1 0 1
2 2 1 0 0 0
3 1 2 0 0 1
4 2 2 0 0 1
5 2 2 0 0 0
6 1 2 1 0 1
7 2 1 0 0 1
8 3 1 1 0 1
9 1 1 0 0 1
10 3 1 0 0 0
11 2 1 0 0 0
12 2 1 0 0 0
13 2 1 1 0 1
14 1 1 0 0 0
15 2 1 0 0 1
16 1 1 0 0 0
17 2 1 0 0 1
18 2 1 0 0 1
19 3 1 1 0 0
20 2 2 1 0 1
21 2 1 1 0 1
22 2 2 0 0 1
23 1 2 0 1 1
24 2 1 0 0 1
25 2 1 0 1 1
26 1 1 1 0 1
27 1 1 0 0 1
28 2 1 0 0 1
29 2 1 0 0 1
30 2 1 1 0 0
31 2 1 1 0 1
32 2 1 0 0 1
[ reached getOption("max.print") -- omitted 968 rows ]
In the output the dummies are given according to the levels. I need the output according to the attributes. Can I merge all of them without using any third party packages?

How to sum leading diagonal of table in R

I have a table created using the table() command in R:
y
x 0 1 2 3 4 5 6 7 8 9
0 23 0 0 0 0 1 0 0 0 0
1 0 23 1 0 1 0 1 2 0 2
2 1 1 28 0 0 0 1 0 2 2
3 0 1 0 24 0 1 0 0 0 1
4 1 1 0 0 34 0 3 0 0 0
5 0 0 0 0 0 33 0 0 0 0
6 0 0 0 0 0 2 32 0 0 0
7 0 1 0 1 0 0 0 36 0 1
8 1 1 1 1 0 0 0 1 20 1
9 1 3 0 1 0 1 0 1 0 24
This table shows the results of a classification, and I want to sum the leading diagonal of it (the diagonal with the large numbers - like 23, 23, 28 etc). Is there a sensible/easy way to do this in R?

How about sum(diag(tbl)), where tbl is your table?

Force `table` to include all factors from both arrays in R

I am using the following R code to produce a confusion matrix comparing the true labels of some data to the output of a neural network.
t <- table(as.factor(test.labels), as.factor(nnetpredict))
However, sometimes the neural network doesn't predict any of a certain class, so the table isn't square (as, for example, there are 5 levels in the test.labels factor, but only 3 levels in the nnetpredict factor). I want to make the table square by adding in any factor levels necessary, and setting their counts to zero.
How should I go about doing this?
Example:
> table(as.factor(a), as.factor(b))
1 2 3 4 5 6 7 8 9 10
1 1 0 0 0 0 0 0 1 0 0
2 0 1 0 0 0 0 0 0 1 0
3 0 0 1 0 0 0 0 0 0 1
4 0 0 0 1 0 0 0 0 0 0
5 0 0 0 0 1 0 0 0 0 0
6 0 0 0 0 0 1 0 0 0 0
7 0 0 0 0 0 0 1 0 0 0
You can see in the table above that there are 7 rows, but 10 columns, because the a factor only has 7 levels, whereas the b factor has 10 levels. What I want to do is to pad the table with zeros so that the row labels and the column labels are the same, and the matrix is square. From the example above, this would produce:
1 2 3 4 5 6 7 8 9 10
1 1 0 0 0 0 0 0 1 0 0
2 0 1 0 0 0 0 0 0 1 0
3 0 0 1 0 0 0 0 0 0 1
4 0 0 0 1 0 0 0 0 0 0
5 0 0 0 0 1 0 0 0 0 0
6 0 0 0 0 0 1 0 0 0 0
7 0 0 0 0 0 0 1 0 0 0
8 0 0 0 0 0 0 0 0 0 0
9 0 0 0 0 0 0 0 0 0 0
10 0 0 0 0 0 0 0 0 0 0
The reason I need to do this is two-fold:
For display to users/in reports
So that I can use a function to calculate the Kappa statistic, which requires a table formatted like this (square, same row and col labels)

EDIT - round II to address the additional details in the question. I deleted my first answer since it wasn't relevant anymore.
This has produced the desired output for the test cases I've given it, but I definitely advise testing thoroughly with your real data. The approach here is to find the full list of levels for both inputs into the table and set that full list as the levels before generating the table.
squareTable <- function(x,y) {
x <- factor(x)
y <- factor(y)
commonLevels <- sort(unique(c(levels(x), levels(y))))
x <- factor(x, levels = commonLevels)
y <- factor(y, levels = commonLevels)
table(x,y)
}
Two test cases:
> #Test case 1
> set.seed(1)
> x <- factor(sample(0:9, 100, TRUE))
> y <- factor(sample(3:7, 100, TRUE))
>
> table(x,y)
y
x 3 4 5 6 7
0 2 1 3 1 0
1 1 0 2 3 0
2 1 0 3 4 3
3 0 3 6 3 2
4 4 4 3 2 1
5 2 2 0 1 0
6 1 2 3 2 3
7 3 3 3 4 2
8 0 4 1 2 4
9 2 1 0 0 3
> squareTable(x,y)
y
x 0 1 2 3 4 5 6 7 8 9
0 0 0 0 2 1 3 1 0 0 0
1 0 0 0 1 0 2 3 0 0 0
2 0 0 0 1 0 3 4 3 0 0
3 0 0 0 0 3 6 3 2 0 0
4 0 0 0 4 4 3 2 1 0 0
5 0 0 0 2 2 0 1 0 0 0
6 0 0 0 1 2 3 2 3 0 0
7 0 0 0 3 3 3 4 2 0 0
8 0 0 0 0 4 1 2 4 0 0
9 0 0 0 2 1 0 0 3 0 0
> squareTable(y,x)
y
x 0 1 2 3 4 5 6 7 8 9
0 0 0 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0 0 0
3 2 1 1 0 4 2 1 3 0 2
4 1 0 0 3 4 2 2 3 4 1
5 3 2 3 6 3 0 3 3 1 0
6 1 3 4 3 2 1 2 4 2 0
7 0 0 3 2 1 0 3 2 4 3
8 0 0 0 0 0 0 0 0 0 0
9 0 0 0 0 0 0 0 0 0 0
>
> #Test case 2
> set.seed(1)
> xx <- factor(sample(0:2, 100, TRUE))
> yy <- factor(sample(3:5, 100, TRUE))
>
> table(xx,yy)
yy
xx 3 4 5
0 4 14 9
1 14 15 9
2 11 11 13
> squareTable(xx,yy)
y
x 0 1 2 3 4 5
0 0 0 0 4 14 9
1 0 0 0 14 15 9
2 0 0 0 11 11 13
3 0 0 0 0 0 0
4 0 0 0 0 0 0
5 0 0 0 0 0 0
> squareTable(yy,xx)
y
x 0 1 2 3 4 5
0 0 0 0 0 0 0
1 0 0 0 0 0 0
2 0 0 0 0 0 0
3 4 14 11 0 0 0
4 14 15 11 0 0 0
5 9 9 13 0 0 0

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

R: Print omitted 0's in table() - contingency tables [duplicate] - r

Related

How to keep ID in dummyVars()

How to add multiple columns in R with different condition for each column?

How to merge various levels into one attribute after dummy coding the data in R?

How to sum leading diagonal of table in R

Force `table` to include all factors from both arrays in R

Categories

Resources