I have a square matrix with information on the co-voting behavior between individuals (15x15 in toy example below). The rows and columns of the matrix are arranged according to the groups the individuals belong to (A, B or C). The entries indicate whether or not two individuals voted the same way 50% of the time (possible entries: 1, 0, NaN).
I need to calculate the rate/fraction of co-voting within and between groups. The resulting matrix in the toy example should be a 3x3 matrix with A, B, C on the rows and columns and values ranging from 0 to 1. How can I do this using for loops?
A A A A A B B B B B C C C C C
A 1 1 1 1 1 1 0 0 0 0 0 1 1 1 0
A 1 1 1 1 1 1 0 0 0 0 0 1 1 1 0
A 1 1 1 1 1 1 0 0 0 0 1 1 1 1 0
A 1 1 1 1 1 1 0 0 0 0 0 1 1 1 0
A 1 1 1 1 1 1 0 0 0 0 0 1 1 1 0
B 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1
B 0 0 0 0 0 1 1 1 1 1 1 0 0 1 1
B 0 0 0 0 0 1 1 1 1 1 1 1 0 1 1
B 0 0 0 0 0 0 1 1 1 1 1 0 0 0 1
B 0 0 0 0 0 1 1 1 1 1 1 0 0 1 1
C 0 0 1 0 0 1 1 1 1 1 1 1 0 0 1
C 1 1 1 1 1 1 0 1 0 0 1 1 1 0 1
C 1 1 1 1 1 1 0 0 0 0 0 1 1 1 0
C 1 1 1 1 1 1 1 1 0 1 0 0 1 1 0
C 0 0 0 0 0 1 1 1 1 1 1 1 0 0 1
If your matrix is called m, you could do:
groups <- unique(colnames(m))
res <- matrix(0, length(groups), length(groups), dimnames = list(groups, groups))
for (i in groups) {
  for (j in groups) {
    # block of m for row group i and column group j
    mat <- m[rownames(m) == i, colnames(m) == j]
    # share of 1s in the block; na.rm = TRUE skips NaN entries
    res[i, j] <- mean(mat, na.rm = TRUE)
  }
}
res
#> A B C
#> A 1.00 0.20 0.64
#> B 0.20 0.92 0.68
#> C 0.64 0.68 0.60
Created on 2022-06-02 by the reprex package (v2.0.1)
Data taken from question in reproducible format
m <- structure(c(1L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 1L,
1L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L,
0L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 0L,
1L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 1L,
1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L,
0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L,
0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 0L,
0L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 1L,
1L, 1L, 1L, 1L, 1L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 0L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 1L,
0L, 0L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L,
0L, 0L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 1L,
0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 0L, 0L, 1L), dim = c(15L, 15L), dimnames = list(c("A", "A",
"A", "A", "A", "B", "B", "B", "B", "B", "C", "C", "C", "C", "C"
), c("A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "C", "C",
"C", "C", "C")))
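If loops are not a hard requirement, the same block rates can be sketched with rowsum(). This is only an illustration on a made-up 4x4 matrix (m_toy and g are hypothetical names, not the data above). It counts the usable entries per block separately so that NaN cells do not end up in the denominator.

```r
# Hypothetical toy matrix with two groups; NaN marks a missing comparison
g <- c("A", "A", "B", "B")
m_toy <- matrix(c(1, 1, 0,   1,
                  1, 1, 0,   0,
                  0, 0, 1,   NaN,
                  1, 0, NaN, 1),
                nrow = 4, byrow = TRUE, dimnames = list(g, g))

ok <- !is.na(m_toy)   # TRUE where an entry is usable (is.na() is TRUE for NaN)
vals <- m_toy
vals[!ok] <- 0        # missing entries contribute nothing to the sums

# Sum over row groups, then over column groups (via the transpose)
block_sum <- t(rowsum(t(rowsum(vals, g)), g))
block_n   <- t(rowsum(t(rowsum(ok * 1, g)), g))

rates <- block_sum / block_n
rates
```

The result has row groups on the rows and column groups on the columns, like res in the loop version.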
I have a dataset that looks something like this:
> head(BurnData)
Treatment Gender Race Surface head buttock trunk up.leg low.leg resp.tract type ex.time excision antib.time antibiotic
1 0 0 0 15 0 0 1 1 0 0 2 12 0 12 0
2 0 0 1 20 0 0 1 0 0 0 4 9 0 9 0
3 0 0 1 15 0 0 0 1 1 0 2 13 0 13 0
4 0 0 0 20 1 0 1 0 0 0 2 11 1 29 0
5 0 0 1 70 1 1 1 1 0 0 2 28 1 31 0
6 0 0 1 20 1 0 1 0 0 0 4 11 0 11 0
inf.time infection
1 12 0
2 9 0
3 7 1
4 29 0
5 4 1
6 8 1
I want to run a Cox regression on the variables Surface, ex.time, antib.time and Treatment. Treatment is an indicator variable. Surface denotes the % of body burned. ex.time and antib.time both record time to event in days.
I am aware that to run a time-dependent Cox regression I need to convert the data to a longitudinal structure, but how can I do that in R?
Then I will use the formula:
coxph(formula = Surv(tstart, tstop, infection) ~ covariate)
DATA
> dput(head(BurnData))
structure(list(Treatment = c(0L, 0L, 0L, 0L, 0L, 0L), Gender = c(0L,
0L, 0L, 0L, 0L, 0L), Race = c(0L, 1L, 1L, 0L, 1L, 1L), Surface = c(15L,
20L, 15L, 20L, 70L, 20L), head = c(0L, 0L, 0L, 1L, 1L, 1L), buttock = c(0L,
0L, 0L, 0L, 1L, 0L), trunk = c(1L, 1L, 0L, 1L, 1L, 1L), up.leg = c(1L,
0L, 1L, 0L, 1L, 0L), low.leg = c(0L, 0L, 1L, 0L, 0L, 0L), resp.tract = c(0L,
0L, 0L, 0L, 0L, 0L), type = c(2L, 4L, 2L, 2L, 2L, 4L), ex.time = c(12L,
9L, 13L, 11L, 28L, 11L), excision = c(0L, 0L, 0L, 1L, 1L, 0L),
antib.time = c(12L, 9L, 13L, 29L, 31L, 11L), antibiotic = c(0L,
0L, 0L, 0L, 0L, 0L), inf.time = c(12L, 9L, 7L, 29L, 4L, 8L
), infection = c(0L, 0L, 1L, 0L, 1L, 1L), Surface_discr = structure(c(1L,
1L, 1L, 1L, 2L, 1L), .Label = c("1", "2"), class = "factor"),
ex.time_discr = c(1L, 1L, 1L, 1L, 2L, 1L), antib.time_discr = c(1L,
1L, 1L, 2L, 2L, 1L)), .Names = c("Treatment", "Gender", "Race",
"Surface", "head", "buttock", "trunk", "up.leg", "low.leg", "resp.tract",
"type", "ex.time", "excision", "antib.time", "antibiotic", "inf.time",
"infection", "Surface_discr", "ex.time_discr", "antib.time_discr"
), row.names = c(NA, 6L), class = "data.frame")
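One possible way to get the (tstart, tstop) layout, sketched in base R under some assumptions: only the excision covariate is made time-dependent, an id column is added for illustration, and to_long and excised are hypothetical names. The values are copied from head(BurnData) above. A subject is split into two intervals only if excision happened before the end of follow-up.

```r
# Columns copied from head(BurnData); `id` is added here for illustration
burn <- data.frame(
  id        = 1:6,
  ex.time   = c(12, 9, 13, 11, 28, 11),
  excision  = c(0, 0, 0, 1, 1, 0),
  inf.time  = c(12, 9, 7, 29, 4, 8),
  infection = c(0, 0, 1, 0, 1, 1)
)

# Hypothetical helper: one row per interval on which `excised` is constant
to_long <- function(d) {
  rows <- lapply(seq_len(nrow(d)), function(k) {
    r <- d[k, ]
    switched <- r$excision == 1 && r$ex.time < r$inf.time
    if (switched) {
      # before excision (no event yet), then after excision up to inf.time
      data.frame(id = r$id,
                 tstart = c(0, r$ex.time),
                 tstop = c(r$ex.time, r$inf.time),
                 infection = c(0, r$infection),
                 excised = c(0, 1))
    } else {
      # covariate never changes during follow-up: a single interval
      data.frame(id = r$id, tstart = 0, tstop = r$inf.time,
                 infection = r$infection, excised = 0)
    }
  })
  do.call(rbind, rows)
}

burn_long <- to_long(burn)
burn_long
# The long table could then be passed to something like
# coxph(Surv(tstart, tstop, infection) ~ excised, data = burn_long)
```

Here only subject 4 is split, into the intervals (0, 11] and (11, 29]. Subject 5 has excision recorded at day 28, after the infection at day 4, so it keeps a single interval. The same pattern would have to be repeated for antib.time.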
My data looks like this, all columns with binary presence/absence data:
POP1 POP2 POP3 T1 T2 T3 T4 T5 T6 T7 T8 T9
1 1 0 1 1 1 1 0 1 0 0 1
1 0 1 0 1 1 0 1 1 0 1 1
1 1 0 1 1 1 1 0 0 1 0 1
0 0 0 0 1 1 0 1 0 1 1 0
1 0 1 0 0 1 1 1 0 1 1 0
0 1 0 0 1 1 1 0 0 0 0 1
0 1 0 1 1 0 1 0 0 0 0 0
1 1 1 0 1 0 0 0 1 0 0 0
0 0 0 0 1 1 1 1 1 0 0 1
1 0 0 1 0 1 0 1 0 1 1 1
1 1 0 0 1 0 1 0 0 1 0 0
1 0 1 0 1 1 1 0 1 0 1 0
0 1 0 1 1 1 1 0 0 0 0 0
1 0 0 0 1 1 0 0 0 0 1 1
The POP1:POP3 are populations, and I need counts of all 1's for all T1:T9 for all POP1=1, POP2=1 and POP3=1. I need a table that crosstabulates my data like this:
T1 T2 T3 T4 T5 T6 T7 T8 T9
POP1=1 3 9 7 5 3 4 4 5 5
POP2=1 4 7 8 6 2 3 2 0 3
POP3=1 0 3 4 2 2 2 1 3 1
Don't bother checking the aggregated counts; they're not necessarily correct. I've tried lots of syntaxes without getting what I want. Thankful for some guidance.
You need the matrix multiplication %*% here:
t(df[1:3]) %*% as.matrix(df[4:12])
T1 T2 T3 T4 T5 T6 T7 T8 T9
POP1 3 7 7 5 3 4 4 5 5
POP2 4 7 4 6 0 2 2 0 3
POP3 0 3 3 2 2 3 1 3 1
df = structure(list(POP1 = c(1L, 1L, 1L, 0L, 1L, 0L, 0L, 1L, 0L, 1L,
1L, 1L, 0L, 1L), POP2 = c(1L, 0L, 1L, 0L, 0L, 1L, 1L, 1L, 0L,
0L, 1L, 0L, 1L, 0L), POP3 = c(0L, 1L, 0L, 0L, 1L, 0L, 0L, 1L,
0L, 0L, 0L, 1L, 0L, 0L), T1 = c(1L, 0L, 1L, 0L, 0L, 0L, 1L, 0L,
0L, 1L, 0L, 0L, 1L, 0L), T2 = c(1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L,
1L, 0L, 1L, 1L, 1L, 1L), T3 = c(1L, 1L, 1L, 1L, 1L, 1L, 0L, 0L,
1L, 1L, 0L, 1L, 1L, 1L), T4 = c(1L, 0L, 1L, 0L, 1L, 1L, 1L, 0L,
1L, 0L, 1L, 1L, 1L, 0L), T5 = c(0L, 1L, 0L, 1L, 1L, 0L, 0L, 0L,
1L, 1L, 0L, 0L, 0L, 0L), T6 = c(1L, 1L, 0L, 0L, 0L, 0L, 0L, 1L,
1L, 0L, 0L, 1L, 0L, 0L), T7 = c(0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L,
0L, 1L, 1L, 0L, 0L, 0L), T8 = c(0L, 1L, 0L, 1L, 1L, 0L, 0L, 0L,
0L, 1L, 0L, 1L, 0L, 1L), T9 = c(1L, 1L, 1L, 0L, 0L, 1L, 0L, 0L,
1L, 1L, 0L, 0L, 0L, 1L)), .Names = c("POP1", "POP2", "POP3",
"T1", "T2", "T3", "T4", "T5", "T6", "T7", "T8", "T9"), class = "data.frame",
row.names = c(NA, -14L))
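Why the matrix product works: for 0/1 columns, the product of a POP column and a T column is 1 only in rows where both are 1, so summing those products counts the co-occurrences. A small sketch on a made-up data frame (toy is hypothetical, not the df above), using crossprod(), which computes t(x) %*% y:

```r
# Hypothetical 3-row example with two populations and two T columns
toy <- data.frame(POP1 = c(1, 1, 0),
                  POP2 = c(0, 1, 1),
                  T1   = c(1, 0, 1),
                  T2   = c(1, 1, 0))

# crossprod(x, y) is t(x) %*% y
counts <- crossprod(as.matrix(toy[1:2]), as.matrix(toy[3:4]))
counts

# Same number for one cell, computed by explicit subsetting
sum(toy$T2[toy$POP1 == 1])
```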
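library(reshape2)
# long format: one row per (original row, POP column) pair
df = melt(df, id.vars = colnames(df)[-(1:3)])
# for each POP column, keep rows where it equals 1 and sum the T columns
do.call(rbind, lapply(split(df, df$variable), function(x)
  apply(x[x$value == 1, 1:9], 2, sum)))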
# T1 T2 T3 T4 T5 T6 T7 T8 T9
#POP1 3 7 7 5 3 4 4 5 5
#POP2 4 7 4 6 0 2 2 0 3
#POP3 0 3 3 2 2 3 1 3 1
This is a question about the fourthcorner algorithm in R. It's designed to measure the relationship between three different tables: an n x m table (table R) of m environmental variables (columns) at n sites (rows), an n x p table (table L) of p abundances (columns) at n sites (rows), and a p x s table (table Q) of s traits (columns) for p species (rows).
The fourthcorner function is in the package ade4.
All three of my dataframes are binary (0s and 1s denoting the presence or absence of a variable, a species at a site, or a trait, respectively). I've tried using "yes" and "no" instead of 0s and 1s without success.
Here are some example matrices in the format I'm using:
tabQ
Trait1 Trait2 Trait3 Trait4
Sp1 0 1 0 0
Sp2 0 1 0 0
Sp3 1 0 1 0
Sp4 1 0 1 0
Sp5 0 1 0 0
Sp6 0 1 0 0
Sp7 0 0 0 1
Sp8 0 0 0 1
tabR
EnV1 EnV2 EnV3 EnV4
Site1 1 1 1 1
Site2 1 1 0 1
Site3 0 1 0 1
Site4 1 1 1 1
Site5 1 1 0 1
Site6 0 1 0 0
Site7 0 1 0 1
Site8 0 1 0 1
Site9 1 1 1 1
Site10 1 1 0 1
Site11 1 1 1 1
Site12 0 1 0 0
Site13 1 1 0 1
Site14 1 1 0 1
Site15 0 1 0 1
Site16 1 1 0 1
Site17 0 1 0 1
Site18 1 1 1 1
Site19 1 1 0 1
Site20 1 1 0 1
tabL
Sp1 Sp2 Sp3 Sp4 Sp5 Sp6 Sp7 Sp8
Site1 1 1 0 0 0 0 0 0
Site2 1 1 0 0 0 0 0 0
Site3 1 1 0 0 0 0 0 0
Site4 1 0 0 0 0 0 0 1
Site5 1 1 0 0 0 0 0 0
Site6 1 0 0 0 1 0 0 0
Site7 1 0 0 0 0 0 0 0
Site8 0 0 0 0 1 0 0 0
Site9 1 0 0 0 0 0 0 0
Site10 1 1 0 0 0 0 0 0
Site11 0 0 1 1 0 0 0 0
Site12 0 0 0 0 0 1 0 0
Site13 1 0 0 0 0 0 0 0
Site14 0 0 0 0 1 0 0 0
Site15 1 1 0 0 0 0 0 0
Site16 1 1 0 0 0 0 0 0
Site17 1 0 0 0 0 0 0 0
Site18 0 0 1 0 0 0 0 0
Site19 1 0 0 0 0 0 0 0
Site20 1 1 0 0 0 0 1 0
I read these dataframes into R from text files, and I specify that the first column is row names.
This is the error I get when I try to use the fourthcorner function on my matrices:
fourth1=fourthcorner(tabR,tabL,tabQ,nrepet=1)
Error in apply(sim, 2, function(x) length(na.omit(x))) :
dim(X) must have a positive length
I don't understand where the problem lies, is it a formatting issue? If so, should I reformat one of the matrices? Which one is causing the trouble? Or can I not use binary traits and environmental variables for this function? In other words, can I solve this problem by changing a piece of code, or is it impossible to use this function for this question?
As an additional tidbit of information, I did email the author of the function, but unfortunately I did not understand his response fully, possibly because my R skills still leave much to be desired. Here is his response if it is helpful:
Q could contain quantitative or qualitative traits. In R, qualitative traits should be coded as factors to obtain adapted statistics (i.e. chi2 or eta2). If you code qualitative variables as dummy variables, they would be considered as quantitative.
Thank you very much for any and all insight.
I noted that your example fails only when nrepet is equal to one, so if you can use any other positive number you should be fine.
However, if you do need nrepet=1, you should contact the author of ade4 and ask them to fix the fourthcorner function code. I traced the error back and found that fourthcorner calls as.krandtest with sim = res$tabD[-1,], where res$tabD is a matrix with nrepet+1 rows. When nrepet=1, removing one row from a two-row matrix leaves a single row, which R automatically converts into a vector; as.krandtest expects sim to be a matrix and thus raises the error.
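The dimension-dropping behaviour described above can be reproduced in isolation. In this sketch tabD is just a stand-in matrix with two rows, not the actual object inside fourthcorner, and drop = FALSE is shown as the usual base R way to keep a one-row result as a matrix. Whether ade4 should be patched this way is for its maintainers to decide.

```r
# Stand-in for a matrix with nrepet + 1 = 2 rows
tabD <- matrix(1:6, nrow = 2)

sim_vec <- tabD[-1, ]                # one row left: R drops the dim attribute
sim_mat <- tabD[-1, , drop = FALSE]  # drop = FALSE keeps a 1 x 3 matrix

is.null(dim(sim_vec))  # TRUE: a plain vector
dim(sim_mat)           # 1 3

# apply() needs dimensions, so the vector version fails
msg <- tryCatch(apply(sim_vec, 2, length),
                error = function(e) conditionMessage(e))
msg
```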
Here is your input data just in case somebody else would like to answer your question:
tabR
structure(list(EnV1 = c(1L, 1L, 0L, 1L, 1L, 0L, 0L, 0L, 1L, 1L,
1L, 0L, 1L, 1L, 0L, 1L, 0L, 1L, 1L, 1L), EnV2 = c(1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L), EnV3 = c(1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L,
0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L), EnV4 = c(1L, 1L, 1L, 1L, 1L,
0L, 1L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L)), .Names = c("EnV1",
"EnV2", "EnV3", "EnV4"), row.names = c("Site1", "Site2", "Site3",
"Site4", "Site5", "Site6", "Site7", "Site8", "Site9", "Site10",
"Site11", "Site12", "Site13", "Site14", "Site15", "Site16", "Site17",
"Site18", "Site19", "Site20"), class = "data.frame")
tabL
structure(list(Sp1 = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 1L, 1L,
0L, 0L, 1L, 0L, 1L, 1L, 1L, 0L, 1L, 1L), Sp2 = c(1L, 1L, 1L,
0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L,
1L), Sp3 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L,
0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L), Sp4 = c(0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L),
Sp5 = c(0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 0L,
1L, 0L, 0L, 0L, 0L, 0L, 0L), Sp6 = c(0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L
), Sp7 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L), Sp8 = c(0L, 0L, 0L, 1L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L)), .Names = c("Sp1", "Sp2", "Sp3", "Sp4", "Sp5", "Sp6",
"Sp7", "Sp8"), row.names = c("Site1", "Site2", "Site3", "Site4",
"Site5", "Site6", "Site7", "Site8", "Site9", "Site10", "Site11",
"Site12", "Site13", "Site14", "Site15", "Site16", "Site17", "Site18",
"Site19", "Site20"), class = "data.frame")
tabQ
structure(list(Trait1 = c(0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L), Trait2 = c(1L,
1L, 0L, 0L, 1L, 1L, 0L, 0L), Trait3 = c(0L, 0L, 1L, 1L, 0L, 0L,
0L, 0L), Trait4 = c(0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L)), .Names = c("Trait1",
"Trait2", "Trait3", "Trait4"), row.names = c("Sp1", "Sp2", "Sp3",
"Sp4", "Sp5", "Sp6", "Sp7", "Sp8"), class = "data.frame")
If I have
ex1 <-
structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 0L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 0L, 0L, 1L, 1L,
1L, 1L, 1L, 0L, 0L, 1L, 1L, 0L, 1L, 1L, 0L, 1L, 1L, 0L, 0L, 0L,
0L, 1L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 1L, 1L,
1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L), .Dim = c(10L, 12L
), .Dimnames = list(c("q1", "q2", "q3", "q4", "q5", "q6", "q7",
"q8", "q9", "q10"), c("q1", "q2", "q3", "q4", "q5", "q6", "q7",
"q8", "q9", "q10", "q11", "q12")))
and
ex2 <-
structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 0L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 0L, 0L, 1L, 1L,
1L, 1L, 1L, 0L, 0L, 1L, 1L, 0L, 1L, 1L, 0L, 1L, 1L, 0L, 0L, 0L,
0L, 1L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 1L, 1L,
1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L), .Dim = c(10L, 12L
), .Dimnames = list(c("q4", "q7", "q10", "q9", "q2", "q1", "q6",
"q3", "q5", "q8"), c("q12", "q9", "q10", "q6", "q5", "q7", "q4",
"q1", "q11", "q2", "q3", "q8")))
How can I make the row and column order of ex2 match ex1, and vice versa?
I tried methods in this post but to no avail.
Just use the rownames and colnames to subset. [ is pretty flexible:
ex2[rownames(ex1),colnames(ex1)]
q1 q2 q3 q4 q5 q6 q7 q8 q9 q10 q11 q12
q1 1 0 0 1 1 1 1 0 1 1 1 1
q2 0 1 0 1 1 1 1 0 1 1 0 1
q3 1 1 1 1 1 1 1 0 1 1 0 1
q4 0 0 0 1 1 0 0 0 0 0 0 1
q5 1 1 1 1 1 1 1 0 1 1 1 1
q6 1 0 0 1 1 1 1 0 1 1 1 1
q7 1 0 0 0 0 0 1 0 1 1 0 1
q8 1 1 1 1 1 1 1 1 1 1 1 1
q9 0 0 1 0 1 1 1 0 1 1 1 1
q10 1 0 0 0 0 1 0 1 1 0 1 1
ex1[rownames(ex2),colnames(ex2)]
q12 q9 q10 q6 q5 q7 q4 q1 q11 q2 q3 q8
q4 0 1 0 1 1 0 1 1 1 1 1 0
q7 0 1 0 1 1 1 1 1 0 1 1 1
q10 1 1 1 1 1 1 1 1 1 1 1 1
q9 0 1 1 1 1 1 1 1 1 1 1 1
q2 0 0 0 1 0 0 0 1 0 1 1 1
q1 0 0 0 0 1 1 0 1 0 0 0 0
q6 0 1 0 1 1 1 1 1 0 1 1 1
q3 1 1 0 0 0 0 1 1 0 1 0 1
q5 0 0 1 1 1 1 1 1 0 1 1 0
q8 0 0 1 1 1 1 1 1 1 1 1 1
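To check that the reordering really lines the two matrices up, one can compare dimnames afterwards. A minimal sketch on made-up matrices (a and b are hypothetical; it assumes every row and column name of one matrix also exists in the other, otherwise the subsetting raises a "subscript out of bounds" error):

```r
# Hypothetical small matrices: b is a shuffled copy of a
a <- matrix(1:6, nrow = 2,
            dimnames = list(c("q1", "q2"), c("x", "y", "z")))
b <- a[c("q2", "q1"), c("z", "x", "y")]

# Character indexing picks rows and columns by name, in the order given
b_aligned <- b[rownames(a), colnames(a)]

identical(dimnames(b_aligned), dimnames(a))  # same labels, same order
identical(b_aligned, a)                      # and here the same values too
```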
I have a table that looks like this:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
586 0 0 0 1 0 0 0 1 3 1 0 1 0 0 0 0 0 1 0 2 0 3 0 0 0 4 0 1 2 0
637 0 0 0 0 0 0 2 3 2 2 0 4 0 0 0 0 1 0 0 2 0 1 1 1 0 0 0 0 0 1
989 0 0 1 0 0 0 2 1 0 0 0 2 1 0 0 1 2 1 0 3 0 2 0 1 1 0 1 0 1 0
1081 0 0 0 1 0 0 1 0 1 1 0 0 2 0 0 0 0 0 0 3 0 5 0 0 2 1 0 1 1 1
2922 0 1 1 1 0 0 0 2 1 0 0 0 2 0 0 0 1 1 0 1 0 3 1 1 2 0 0 1 0 1
3032 0 1 0 0 0 0 0 3 0 0 1 0 2 1 0 1 0 1 1 0 0 3 1 1 1 1 0 0 1 1
Numbers 1 to 30 in the first row are my labels, and the columns are my items. I would like to find, for each item, the label with the most counts. E.g. 586 has 4 counts of 26, which is the highest number in that row, so for 586, I would like to assign 26.
I can get the maximum value for the first row with max(table1[1,]), but that doesn't give me the label it corresponds to, and I don't know how to proceed. All help is appreciated!
dput:
structure(c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 0L,
0L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 2L, 1L, 0L, 0L, 1L, 3L, 1L,
0L, 2L, 3L, 3L, 2L, 0L, 1L, 1L, 0L, 1L, 2L, 0L, 1L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 1L, 1L, 4L, 2L, 0L, 0L, 0L, 0L, 0L, 1L, 2L, 2L,
2L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L,
0L, 0L, 1L, 0L, 1L, 2L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 1L, 0L,
0L, 0L, 0L, 0L, 1L, 2L, 2L, 3L, 3L, 1L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 3L, 1L, 2L, 5L, 3L, 3L, 0L, 1L, 0L, 0L, 1L, 1L, 0L, 1L, 1L,
0L, 1L, 1L, 0L, 0L, 1L, 2L, 2L, 1L, 4L, 0L, 0L, 1L, 0L, 1L, 0L,
0L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 1L, 0L, 2L, 0L, 1L, 1L, 0L,
1L, 0L, 1L, 0L, 1L, 1L, 1L), .Dim = c(6L, 30L), .Dimnames = structure(list(
c("586", "637", "989", "1081", "2922", "3032"), c("1", "2",
"3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13",
"14", "15", "16", "17", "18", "19", "20", "21", "22", "23",
"24", "25", "26", "27", "28", "29", "30")), .Names = c("",
"")))
max.col will give you a vector of column numbers, one per row, each pointing at that row's maximum value.
> max.col(df, ties.method = "first")
[1] 26 12 20 22 22 8
You can use that vector to get column names for each row.
> colnames(df)[max.col(df, ties.method = "first")]
[1] "26" "12" "20" "22" "22" "8"
Perhaps you are looking for which.max. Assuming your matrix is called "temp":
> apply(temp, 1, which.max)
586 637 989 1081 2922 3032
26 12 20 22 22 8
apply with MARGIN = 1 (the second argument) will apply a function by row.
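The two approaches differ mainly in how ties are handled: which.max always returns the first maximum, while max.col breaks ties at random unless ties.method is set. A small sketch on a made-up table (tab and its labels are hypothetical) whose second row has a tie:

```r
# Hypothetical 2 x 3 table; row "r2" has its maximum in columns "1" and "3"
tab <- matrix(c(0, 4, 1,
                3, 0, 3),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("r1", "r2"), c("1", "2", "3")))

first_lab <- colnames(tab)[max.col(tab, ties.method = "first")]
last_lab  <- colnames(tab)[max.col(tab, ties.method = "last")]
wm_lab    <- colnames(tab)[apply(tab, 1, which.max)]

first_lab  # "2" "1"
last_lab   # "2" "3"
wm_lab     # "2" "1", same as ties.method = "first"
```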