Extend table by adding missing values [duplicate] - r

This question already has an answer here: Include levels of zero count in result of table() (1 answer)
Closed 8 years ago.
I need to extend a table in R.
result 3 4 5 6 7 8
5 6 29 295 104 6 0
6 1 9 112 238 66 5
7 0 0 5 29 40 6
Should be extended to
result 1 2 3 4 5 6 7 8 9 10
1 0 0 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0 0 0 0
5 0 0 6 29 295 104 6 0 0 0
6 0 0 1 9 112 238 66 5 0 0
7 0 0 0 0 5 29 40 6 0 0
8 0 0 0 0 0 0 0 0 0 0
9 0 0 0 0 0 0 0 0 0 0
10 0 0 0 0 0 0 0 0 0 0
So I need to fill the missing cells with zeros. Alternatively, a 10x10 matrix containing the same data would also be acceptable.
EDIT:
table(factor(x, levels = 1:10), factor(y, levels = 1:10)) worked perfectly.

As mentioned in the comments, converting both vectors to factors with an explicit set of levels works perfectly:
table(factor(x, levels = 1:10), factor(y, levels = 1:10))
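A self-contained sketch of both variants asked for, on made-up x and y vectors (the question's raw data are not shown). The last part covers an assumed case where only the finished table is available: it copies the counts into a zero-filled 10x10 matrix by matching row and column names.

```r
# Made-up raw data standing in for the question's x and y vectors
x <- c(5, 5, 6, 7)
y <- c(3, 5, 5, 8)

# Declaring levels 1:10 up front makes table() keep every level,
# so values that never occur show up as rows/columns of zeros
tab <- table(factor(x, levels = 1:10), factor(y, levels = 1:10))

# Plain-matrix variant: drop the "table" class
# (as.matrix() would keep it, since a 2-d table already counts as a matrix)
m <- unclass(tab)

# If only an already-computed table is available, copy it into a
# zero-filled 10x10 matrix, matching cells by row and column name
small <- table(x, y)
full <- matrix(0L, nrow = 10, ncol = 10,
               dimnames = list(as.character(1:10), as.character(1:10)))
full[rownames(small), colnames(small)] <- small
```

Indexing full by names rather than by position is what lets the smaller table land in the right cells.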

R: sorting a matrix? [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 6 years ago.
I have the following R matrix:
> matrix
8 4 3 5 7 2 1 6 ...
8 0 0 1 0 0 0 0 0
4 1 0 1 1 0 2 0 0
3 5 0 0 1 0 0 0 0
5 0 0 1 0 0 3 0 0
7 0 0 0 0 0 0 0 0
2 3 4 1 0 0 7 0 0
1 8 0 4 0 0 0 8 0
6 9 0 1 0 0 0 0 0
...
[ reached getOption("max.print") -- omitted 23 rows ]
Question: Is it possible to sort the matrix rows and columns, so that
1 2 3 4 5 6 7 8 ...
1 ...
2
3
4
5
6
7
8
...
?
I only found this here and wondered whether there is a better native option.
Thanks!
Assuming you are referring to the row and column names of the matrix, as in this example matrix:
m<-matrix(scan(text="
0 0 1 0 0 0 0 0
1 0 1 1 0 2 0 0
5 0 0 1 0 0 0 0
0 0 1 0 0 3 0 0
0 0 0 0 0 0 0 0
3 4 1 0 0 7 0 0
8 0 4 0 0 0 8 0
9 0 1 0 0 0 0 0"), ncol=8, byrow=TRUE)
colnames(m)<-c(8,4,3,5,7,2,1,6)
rownames(m)<-c(8,4,3,5,7,2,1,6)
You could sort the rows and columns by name with
m[, sort(colnames(m))][sort(rownames(m)), ]
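Row and column names are always stored as character strings, so sort() orders them alphabetically: with multi-digit names you get "1", "10", "2", and so on. In that case, order by the numeric value instead, e.g. m[order(as.numeric(rownames(m))), order(as.numeric(colnames(m)))]. Note that sort(as.numeric(colnames(m))) on its own returns numbers, which would then index columns by position rather than by name.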
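A small sketch of the multi-digit pitfall, using a hypothetical 3x3 matrix whose names are 10, 2 and 1 (not the matrix from the question). Ordering by as.numeric() of the names returns positions, which avoids both the alphabetical order of sort() and accidental positional indexing with numeric names:

```r
# Hypothetical 3x3 matrix whose row/column names include a two-digit label
m <- matrix(1:9, nrow = 3, byrow = TRUE,
            dimnames = list(c("10", "2", "1"), c("10", "2", "1")))

# Character sorting places "10" before "2"
sort(rownames(m))  # "1" "10" "2"

# order() on the numeric values returns positions, so the result can be
# used directly as row/column indices
ord_r <- order(as.numeric(rownames(m)))
ord_c <- order(as.numeric(colnames(m)))
m_sorted <- m[ord_r, ord_c]  # rows and columns now in the order 1, 2, 10
```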
You can also use the order() function and select rows and columns by position:
mat[order(rownames(mat)),order(colnames(mat))]
# 1 2 3 4 5 6 7 8
#1 8 0 4 0 0 0 0 8
#2 0 7 1 4 0 0 0 3
#3 0 0 0 0 1 0 0 5
#4 0 2 1 0 1 0 0 1
#5 0 3 1 0 0 0 0 0
#6 0 0 1 0 0 0 0 9
#7 0 0 0 0 0 0 0 0
#8 0 0 1 0 0 0 0 0

Merge/match two data frames

I would like to merge two data frames, y$genes and symbol_annotations, by the row names of y and the second column, "hgnc_symbol", of symbol_annotations, and create a column labeled "Symbol", y$genes$Symbol, listing all of the matches. If there is no match between "hgnc_symbol" and the row name, I would like NA to be filled in instead of an empty cell. I keep getting an error because the two data frames don't have the same dimensions and contain NAs, and I'm not sure how to correct it.
>read.counts <- read.table("gene_counts.txt", header=TRUE)
>row.names(read.counts) <- read.counts$Geneid
>treatment <- factor(treatment)
> head(treatment)
[1] T0 IL2 IL2.ZA IL2.OKT3 IL2.OKT3.ZA T0
Levels: T0 IL2 IL2.OKT3 IL2.OKT3.ZA IL2.ZA
>y <- DGEList(read.counts, group=treatment, genes=read.counts)
>head(y$genes)
SM01 SM02 SM03 SM04 SM05 SM06 SM07 SM08 SM09 SM10 SM11 SM12 SM13 SM14 SM15 SM16 SM17 SM18 SM19
ENSG00000223972 0 1 1 1 0 0 1 0 0 3 0 0 1 2 0 0 0 0 1
ENSG00000227232 33 31 13 15 20 43 36 32 43 43 61 42 92 73 80 64 33 25 28
ENSG00000278267 1 0 1 0 0 5 3 1 1 2 1 0 2 4 6 0 2 2 1
ENSG00000243485 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0
ENSG00000237613 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ENSG00000268020 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
SM20 SM21 SM22 SM23 SM24 SM25 SM26 SM27 SM28 SM29 SM30
ENSG00000223972 0 0 0 0 1 0 0 0 0 0 0
ENSG00000227232 15 60 13 29 22 28 87 42 61 67 74
ENSG00000278267 2 3 5 1 3 4 4 3 2 4 3
ENSG00000243485 0 0 0 0 0 1 0 0 0 0 1
ENSG00000237613 0 0 0 0 0 0 0 0 0 0 0
ENSG00000268020 0 0 0 0 0 0 0 0 0 0 0
>head(symbol_annotations, n=10)
ensembl_gene_id hgnc_symbol
1 ENSG00000210049 MT-TF
2 ENSG00000211459 MT-RNR1
3 ENSG00000210077 MT-TV
4 ENSG00000210082 MT-RNR2
5 ENSG00000209082 MT-TL1
6 ENSG00000198888 MT-ND1
7 ENSG00000210100 MT-TI
8 ENSG00000223795 <NA>
9 ENSG00000210107 MT-TQ
10 ENSG00000210112 MT-TM
>dim(symbol_annotations)
[1] 58069 2
>dim(y$genes)
[1] 58051 30
>y$genes$Symbol <- merge((rownames(y)), symbol_annotations[,c(2)])
Error in if (n > 0) c(NA_integer_, -n) else integer() :
missing value where TRUE/FALSE needed
In addition: Warning messages:
1: In rep.fac * nx : NAs produced by integer overflow
2: In .set_row_names(as.integer(prod(d))) :
NAs introduced by coercion to integer range
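A likely explanation for the error: called on two bare vectors, merge() finds no column name in common and therefore builds their Cartesian product, about 58,051 x 58,069, or roughly 3.4 billion rows, which exceeds R's integer range (hence the overflow warnings). Since the row names of y are Ensembl IDs, the sketch below assumes the lookup key is ensembl_gene_id and that hgnc_symbol is the value to copy across. It uses match() instead of merge(), which yields NA wherever an ID has no annotation. The small vectors are made-up stand-ins, not the real data:

```r
# Made-up stand-ins: gene_ids plays the role of rownames(y), and
# symbol_annotations mimics the two-column annotation table
gene_ids <- c("ENSG00000223972", "ENSG00000227232", "ENSG00000210049")
symbol_annotations <- data.frame(
  ensembl_gene_id = c("ENSG00000210049", "ENSG00000211459", "ENSG00000223795"),
  hgnc_symbol     = c("MT-TF", "MT-RNR1", NA),
  stringsAsFactors = FALSE
)

# match() gives, for each ID, the row of its first match in the lookup column,
# or NA when there is none; indexing with NA then yields NA
idx <- match(gene_ids, symbol_annotations$ensembl_gene_id)
Symbol <- symbol_annotations$hgnc_symbol[idx]
Symbol  # c(NA, NA, "MT-TF")

# Applied to the objects in the question (assuming y is the DGEList shown),
# this would become:
# y$genes$Symbol <- symbol_annotations$hgnc_symbol[
#   match(rownames(y), symbol_annotations$ensembl_gene_id)]
```

Because match() returns exactly one position per ID, the new column has one entry per gene, in the same order as the rows of y$genes, so the differing row counts of the two tables no longer matter.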

R: Print omitted 0's in table() - contingency tables [duplicate]

I am using the following R code to produce a confusion matrix comparing the true labels of some data to the output of a neural network.
t <- table(as.factor(test.labels), as.factor(nnetpredict))
However, sometimes the neural network doesn't predict any instances of a certain class, so the table isn't square (for example, there may be 5 levels in the test.labels factor but only 3 in the nnetpredict factor). I want to make the table square by adding any missing factor levels and setting their counts to zero.
How should I go about doing this?
Example:
> table(as.factor(a), as.factor(b))
1 2 3 4 5 6 7 8 9 10
1 1 0 0 0 0 0 0 1 0 0
2 0 1 0 0 0 0 0 0 1 0
3 0 0 1 0 0 0 0 0 0 1
4 0 0 0 1 0 0 0 0 0 0
5 0 0 0 0 1 0 0 0 0 0
6 0 0 0 0 0 1 0 0 0 0
7 0 0 0 0 0 0 1 0 0 0
You can see in the table above that there are 7 rows but 10 columns, because factor a has only 7 levels, whereas factor b has 10. What I want to do is pad the table with zeros so that the row labels and the column labels are the same and the matrix is square. From the example above, this would produce:
1 2 3 4 5 6 7 8 9 10
1 1 0 0 0 0 0 0 1 0 0
2 0 1 0 0 0 0 0 0 1 0
3 0 0 1 0 0 0 0 0 0 1
4 0 0 0 1 0 0 0 0 0 0
5 0 0 0 0 1 0 0 0 0 0
6 0 0 0 0 0 1 0 0 0 0
7 0 0 0 0 0 0 1 0 0 0
8 0 0 0 0 0 0 0 0 0 0
9 0 0 0 0 0 0 0 0 0 0
10 0 0 0 0 0 0 0 0 0 0
The reason I need to do this is two-fold:
For display to users/in reports
So that I can use a function to calculate the Kappa statistic, which requires a table formatted like this (square, same row and col labels)
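To see why matching labels matter for kappa: Cohen's kappa compares the observed agreement on the diagonal, p_o, with the agreement expected by chance from the row and column totals, p_e, as kappa = (p_o - p_e) / (1 - p_e). The diagonal only means "agreement" if row i and column i carry the same label. A hand-rolled sketch on made-up data (cohenKappa is a hypothetical helper, not the Kappa function the asker refers to):

```r
# Minimal Cohen's kappa for a square confusion table whose rows and columns
# use the same labels in the same order (checked up front)
cohenKappa <- function(tab) {
  stopifnot(nrow(tab) == ncol(tab), identical(rownames(tab), colnames(tab)))
  n  <- sum(tab)
  po <- sum(diag(tab)) / n                      # observed agreement
  pe <- sum(rowSums(tab) * colSums(tab)) / n^2  # agreement expected by chance
  (po - pe) / (1 - pe)
}

# Made-up labels; class "c" never occurs but is kept as a level
truth <- factor(c("a", "a", "b", "b"), levels = c("a", "b", "c"))
pred  <- factor(c("a", "b", "b", "b"), levels = c("a", "b", "c"))
k <- cohenKappa(table(truth, pred))  # po = 0.75, pe = 0.5, so k = 0.5
```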
EDIT - round II to address the additional details in the question. I deleted my first answer since it wasn't relevant anymore.
This has produced the desired output for the test cases I've given it, but I definitely advise testing thoroughly with your real data. The approach here is to find the full list of levels for both inputs into the table and set that full list as the levels before generating the table.
squareTable <- function(x,y) {
x <- factor(x)
y <- factor(y)
commonLevels <- sort(unique(c(levels(x), levels(y))))
x <- factor(x, levels = commonLevels)
y <- factor(y, levels = commonLevels)
table(x,y)
}
Two test cases:
> #Test case 1
> set.seed(1)
> x <- factor(sample(0:9, 100, TRUE))
> y <- factor(sample(3:7, 100, TRUE))
>
> table(x,y)
y
x 3 4 5 6 7
0 2 1 3 1 0
1 1 0 2 3 0
2 1 0 3 4 3
3 0 3 6 3 2
4 4 4 3 2 1
5 2 2 0 1 0
6 1 2 3 2 3
7 3 3 3 4 2
8 0 4 1 2 4
9 2 1 0 0 3
> squareTable(x,y)
y
x 0 1 2 3 4 5 6 7 8 9
0 0 0 0 2 1 3 1 0 0 0
1 0 0 0 1 0 2 3 0 0 0
2 0 0 0 1 0 3 4 3 0 0
3 0 0 0 0 3 6 3 2 0 0
4 0 0 0 4 4 3 2 1 0 0
5 0 0 0 2 2 0 1 0 0 0
6 0 0 0 1 2 3 2 3 0 0
7 0 0 0 3 3 3 4 2 0 0
8 0 0 0 0 4 1 2 4 0 0
9 0 0 0 2 1 0 0 3 0 0
> squareTable(y,x)
y
x 0 1 2 3 4 5 6 7 8 9
0 0 0 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0 0 0
3 2 1 1 0 4 2 1 3 0 2
4 1 0 0 3 4 2 2 3 4 1
5 3 2 3 6 3 0 3 3 1 0
6 1 3 4 3 2 1 2 4 2 0
7 0 0 3 2 1 0 3 2 4 3
8 0 0 0 0 0 0 0 0 0 0
9 0 0 0 0 0 0 0 0 0 0
>
> #Test case 2
> set.seed(1)
> xx <- factor(sample(0:2, 100, TRUE))
> yy <- factor(sample(3:5, 100, TRUE))
>
> table(xx,yy)
yy
xx 3 4 5
0 4 14 9
1 14 15 9
2 11 11 13
> squareTable(xx,yy)
y
x 0 1 2 3 4 5
0 0 0 0 4 14 9
1 0 0 0 14 15 9
2 0 0 0 11 11 13
3 0 0 0 0 0 0
4 0 0 0 0 0 0
5 0 0 0 0 0 0
> squareTable(yy,xx)
y
x 0 1 2 3 4 5
0 0 0 0 0 0 0
1 0 0 0 0 0 0
2 0 0 0 0 0 0
3 4 14 11 0 0 0
4 14 15 11 0 0 0
5 9 9 13 0 0 0
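One caveat if the labels go beyond a single digit: squareTable() sorts the combined levels as character strings, so labels 1 to 10 (as in the question's example) would come out in the order 1, 10, 2, 3, and so on. Below is a hypothetical variant, squareTableNum(), that keeps the same approach but orders the levels numerically whenever all of them parse as numbers, falling back to sort() otherwise:

```r
# Hypothetical variant of squareTable(): same idea (give both factors the
# union of their levels), but numeric-looking labels are ordered by value
squareTableNum <- function(x, y) {
  x <- factor(x)
  y <- factor(y)
  commonLevels <- union(levels(x), levels(y))
  num <- suppressWarnings(as.numeric(commonLevels))
  commonLevels <- if (anyNA(num)) sort(commonLevels) else commonLevels[order(num)]
  x <- factor(x, levels = commonLevels)
  y <- factor(y, levels = commonLevels)
  table(x, y)
}

a <- c(1, 2, 3, 10)
b <- c(1, 2, 9, 10)

# Character sorting, as in squareTable(), gives "1" "10" "2" "3" "9"
sort(union(levels(factor(a)), levels(factor(b))))

tab <- squareTableNum(a, b)
rownames(tab)  # "1" "2" "3" "9" "10"
```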

How to get a square table?

I've got the following code to create a classification table in R:
> table(class = class1, truth = valid[,1])
1 2 3 4 5 6 7 8 9 10 11 12
1 357 73 0 0 47 0 5 32 20 0 4 7
2 25 71 0 0 23 4 1 0 2 1 8 3
3 1 2 120 1 5 0 1 0 0 0 0 0
4 0 0 0 77 0 0 0 0 1 0 0 0
5 15 27 0 0 67 6 7 0 4 1 5 7
6 1 2 0 0 2 44 0 0 0 7 7 0
7 1 1 0 0 10 0 66 0 1 0 1 7
9 1 0 0 0 3 0 0 2 8 0 0 2
10 1 1 0 0 1 6 0 0 0 17 0 0
11 0 7 0 0 3 1 0 0 0 4 10 2
12 0 1 0 0 1 0 0 0 0 0 0 1
However, I need this table to be square (row 8 is missing in this example): the number of rows should equal the number of columns, and the row and column names must be preserved. The missing row should be filled with zeros. Is there any way to do this?
The problem most likely comes from the two vectors having different levels.
Try copying the levels from valid[,1] to class1:
class1 <- factor(class1, levels=levels(valid[,1]))
table(class = class1, truth = valid[,1])
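A caveat with copying levels this way: levels() returns NULL when valid[,1] is a plain numeric or character vector rather than a factor, and factor(class1, levels = NULL) would then turn every prediction into NA. A sketch that sidesteps this by taking the levels from factor() of the true labels; class1 and truth below are made-up stand-ins in which class 8 occurs in the truth but is never predicted:

```r
# Made-up stand-ins: class 8 appears in the truth but is never predicted
truth  <- c(1, 2, 8, 8, 3)
class1 <- c(1, 2, 2, 3, 3)

# Build the level set from factor(truth), which works whether or not
# truth is already a factor
lv <- levels(factor(truth))

tab <- table(class = factor(class1, levels = lv),
             truth = factor(truth, levels = lv))
tab["8", ]  # the row for class 8 is present and filled with zeros
```

This assumes every predicted class also occurs among the true labels; a prediction outside lv would become NA and be silently dropped. If that can happen, build the levels from both vectors, as the squareTable() function earlier on this page does.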
