How many rows in a matrix do not contain a zero?

This is my matrix in R:
      [,1] [,2] [,3]
 [1,]    5    0    0
 [2,]    0    0    5
 [3,]    0    2    3
 [4,]    1    2    2
 [5,]    5    0    0
 [6,]    1    4    0
 [7,]    4    1    0
 [8,]    0    0    5
 [9,]    1    2    2
[10,]    3    2    0
[11,]    4    0    1
mat <- structure(c(5L, 0L, 0L, 1L, 5L, 1L, 4L, 0L, 1L, 3L, 4L, 0L, 0L,
2L, 2L, 0L, 4L, 1L, 0L, 2L, 2L, 0L, 0L, 5L, 3L, 2L, 0L, 0L, 0L,
5L, 2L, 0L, 1L), .Dim = c(11L, 3L))
I need to find how many rows do not contain a zero; in this case the answer is 2:
the 4th row (1, 2, 2)
the 9th row, also (1, 2, 2)
Is there a command for this, or should I write a routine? I tried it with two for() loops, but it's clunky.

quick answer:
sum( 0 < apply(mat,1,prod) )
also:
nonzerorows <- 0 < apply(mat,1,prod) # logical selector of rows
mat[ nonzerorows, ]
mat[!nonzerorows, ]
which(nonzerorows)
sum(nonzerorows)
OP's data:
mat <- structure(c(5L, 0L, 0L, 1L, 5L, 1L, 4L, 0L, 1L, 3L, 4L, 0L, 0L,
2L, 2L, 0L, 4L, 1L, 0L, 2L, 2L, 0L, 0L, 5L, 3L, 2L, 0L, 0L, 0L,
5L, 2L, 0L, 1L), .Dim = c(11L, 3L))

mat <- matrix(sample(0:4, 16, replace=T), 4, 4)
mat
#      [,1] [,2] [,3] [,4]
# [1,]    4    1    2    2
# [2,]    3    3    1    1
# [3,]    1    2    4    4
# [4,]    0    4    4    4
apply(mat, 1, function(x) all(x!=0))
# [1] TRUE TRUE TRUE FALSE
which(apply(mat, 1, function(x) all(x!=0)))
# [1] 1 2 3

A more general approach, e.g. when you're concerned about other numbers or elements, could be this:
sum(apply(mat, 1, function(x) !(0 %in% x)))
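A fully vectorized alternative (my own addition, plain base R rather than anything from the answers above) counts the zeros per row with rowSums; unlike the prod() trick it does not depend on the entries being non-negative:
no.zero <- rowSums(mat == 0) == 0   # TRUE for rows without any zero
sum(no.zero)                        # 2
which(no.zero)                      # 4 9
mat[no.zero, , drop = FALSE]        # the rows themselves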

Related

Choice experiment data: mlogit Exercise 3, "Error in reshapeLong ... 'varying' arguments must be the same length"

I am following Exercise 3 of the mlogit package (https://cran.r-project.org/web/packages/mlogit/vignettes/e3mxlogit.html), but attempting to use my own data (see below):
structure(list(Choice.Set = c(4L, 5L, 7L, 8L, 10L, 12L), Alternative = c(2L,
1L, 1L, 2L, 2L, 2L), respondent = c(1L, 1L, 1L, 1L, 1L, 1L),
code = c(7L, 9L, 13L, 15L, 19L, 23L), Choice = c(1L, 1L,
1L, 1L, 1L, 1L), price1 = c(0L, 0L, 1L, 1L, 0L, 0L), price2 = c(0L,
1L, 0L, 0L, 1L, 1L), price3 = c(0L, 0L, 0L, 0L, 0L, 0L),
price4 = c(1L, 0L, 0L, 0L, 0L, 0L), price5 = c(0L, 0L, 0L,
0L, 0L, 0L), zone1 = c(0L, 0L, 0L, 1L, 1L, 1L), zone2 = c(0L,
0L, 0L, 0L, 0L, 0L), zone3 = c(1L, 0L, 1L, 0L, 0L, 0L), zone4 = c(0L,
1L, 0L, 0L, 0L, 0L), lic1 = c(0L, 0L, 0L, 0L, 0L, 0L), lic2 = c(1L,
0L, 1L, 0L, 1L, 1L), lic3 = c(0L, 1L, 0L, 1L, 0L, 0L), enf1 = c(0L,
0L, 1L, 0L, 1L, 0L), enf2 = c(0L, 0L, 0L, 1L, 0L, 1L), enf3 = c(1L,
1L, 0L, 0L, 0L, 0L), chid = 1:6), row.names = c(4L, 5L, 7L,
8L, 10L, 12L), class = "data.frame")
I have run into an error when running the code:
dfml <- dfidx(df, idx=list(c("chid", "respondent")),
choice="Alternative", varying=6:20, sep ="")
"Error in reshapeLong(data, idvar = idvar, timevar = timevar, varying = varying, :
'varying' arguments must be the same length"
I have checked the data and each column from 6:20 is the same length; however, some respondents chose some of the options more often than others. Can someone point out where I have gone wrong? It's my first attempt at analyzing choice experiment data.
The error means that your price attribute has five options, whereas the others (zone, lic, enf) have fewer. dfidx obviously can't handle that, so you need to provide the missing columns, at least as NA columns.
df <- transform(df, zone5=NA, lic4=NA, lic5=NA, enf4=NA, enf5=NA)
library(mlogit)
dfml <- dfidx(df, idx=list(c("chid","respondent")), choice="Alternative",
varying=grep('^price|^zone|^lic|^enf', names(df)), sep="")
dfml
# ~~~~~~~
# first 10 observations out of 30
# ~~~~~~~
#    Choice.Set Alternative code Choice price zone lic enf idx
# 1           4       FALSE    7      1     0    0   0   0 1:1
# 2           4        TRUE    7      1     0    0   1   0 1:2
# 3           4       FALSE    7      1     0    1   0   1 1:3
# 4           4       FALSE    7      1     1    0  NA  NA 1:4
# 5           4       FALSE    7      1     0   NA  NA  NA 1:5
# 6           5        TRUE    9      1     0    0   0   0 2:1
# 7           5       FALSE    9      1     1    0   0   0 2:2
# 8           5       FALSE    9      1     0    0   1   1 2:3
# 9           5       FALSE    9      1     0    1  NA  NA 2:4
# 10          5       FALSE    9      1     0   NA  NA  NA 2:5
#
# ~~~ indexes ~~~~
#    chid respondent id2
# 1     1          1   1
# 2     1          1   2
# 3     1          1   3
# 4     1          1   4
# 5     1          1   5
# 6     2          1   1
# 7     2          1   2
# 8     2          1   3
# 9     2          1   4
# 10    2          1   5
# indexes: 1, 1, 2
I use grep here to identify the varying= columns. Get out of the habit of lazily specifying variables by position; it's dangerous, since the column order can easily change with small edits to the script.
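As a quick diagnostic (my own sketch, not part of the original answer), you can count how many columns each attribute prefix has before calling dfidx; unequal counts are exactly what triggers the reshapeLong error:
vary <- grep("^price|^zone|^lic|^enf", names(df), value = TRUE)
table(sub("[0-9]+$", "", vary))   # strip the trailing digits and tabulate
#   enf   lic price  zone
#     3     3     5     4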

Calculating co-voting within and between groups

I have a square matrix with information on the co-voting behavior between individuals (15x15 in toy example below). The rows and columns of the matrix are arranged according to the groups the individuals belong to (A, B or C). The entries indicate whether or not two individuals voted the same way 50% of the time (possible entries: 1, 0, NaN).
I need to calculate the rate/fraction of co-voting within and between groups. The resulting matrix in the toy example should be a 3x3 matrix with A, B, C on the rows and columns and values ranging from 0 to 1. How can I do this using for loops?
A A A A A B B B B B C C C C C
A 1 1 1 1 1 1 0 0 0 0 0 1 1 1 0
A 1 1 1 1 1 1 0 0 0 0 0 1 1 1 0
A 1 1 1 1 1 1 0 0 0 0 1 1 1 1 0
A 1 1 1 1 1 1 0 0 0 0 0 1 1 1 0
A 1 1 1 1 1 1 0 0 0 0 0 1 1 1 0
B 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1
B 0 0 0 0 0 1 1 1 1 1 1 0 0 1 1
B 0 0 0 0 0 1 1 1 1 1 1 1 0 1 1
B 0 0 0 0 0 0 1 1 1 1 1 0 0 0 1
B 0 0 0 0 0 1 1 1 1 1 1 0 0 1 1
C 0 0 1 0 0 1 1 1 1 1 1 1 0 0 1
C 1 1 1 1 1 1 0 1 0 0 1 1 1 0 1
C 1 1 1 1 1 1 0 0 0 0 0 1 1 1 0
C 1 1 1 1 1 1 1 1 0 1 0 0 1 1 0
C 0 0 0 0 0 1 1 1 1 1 1 1 0 0 1
If your matrix is called m, you could do
groups <- unique(colnames(m))
res <- matrix(0, 3, 3, dimnames = list(groups, groups))
for(i in groups) {
  for(j in groups) {
    mat <- m[rownames(m) %in% i, colnames(m) %in% j]
    res[rownames(res) %in% i, colnames(res) %in% j] <- sum(mat) / length(mat)
  }
}
res
#>      A    B    C
#> A 1.00 0.20 0.64
#> B 0.20 0.92 0.68
#> C 0.64 0.68 0.60
Created on 2022-06-02 by the reprex package (v2.0.1)
Data taken from question in reproducible format
m <- structure(c(1L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 1L,
1L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L,
0L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 0L,
1L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 1L,
1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L,
0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L,
0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 0L,
0L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 1L,
1L, 1L, 1L, 1L, 1L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 0L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 1L,
0L, 0L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L,
0L, 0L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 1L,
0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 0L, 0L, 1L), dim = c(15L, 15L), dimnames = list(c("A", "A",
"A", "A", "A", "B", "B", "B", "B", "B", "C", "C", "C", "C", "C"
), c("A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "C", "C",
"C", "C", "C")))

How do you select rows based on the values of a column?

Probably the question is very simple, but I'm pretty new to RStudio. I have a matrix with a lot of scores for a given team, so it looks like this:
     H/A Goals Goals Against Goals H Goals Against H Goals A Goals Against A
[1,]   1     2             1       2               1      -1              -1
[2,]   0     0             2      -1              -1       0               2
[3,]   1     1             0       1               0      -1              -1
[4,]   0     3             2      -1              -1       3               2
[5,]   1     0             1       0               1      -1              -1
[6,]   0     1             3      -1              -1       1               3
In column 1 (H/A), 1 corresponds to home games and 0 to away games.
How could I remove the rows that have a -1 in both column 4 and column 5? I don't want those rows because I want to do some maths using only home games or only away games.
Try this
old_dataframe<-data.frame(x=rnorm(100),y=rpois(100,1),z=rnorm(100),q=rnorm(100),l=rpois(100,1))
new_dataframe <- old_dataframe[old_dataframe[,4] > 0 & old_dataframe[,5]>0, ]
Because you are dealing with a matrix, you can refer to the columns by number (column 4 and column 5) and build the condition as an intersection of two sub-conditions:
m <- m[m[,4] != -1 & m[,5] != -1,]
Data
m <- structure(c(1L, 0L, 1L, 0L, 1L, 0L, 2L, 0L, 1L, 3L, 0L, 1L, 1L,
2L, 0L, 2L, 1L, 3L, 2L, -1L, 1L, -1L, 0L, -1L, 1L, -1L, 0L, -1L,
1L, -1L, -1L, 0L, -1L, 3L, -1L, 1L, -1L, 2L, -1L, 2L, -1L, 3L
), .Dim = 6:7, .Dimnames = list(NULL, c("a", "b", "c", "d", "e",
"f", "g")))

Handling zeros in Cramer's V for contingency table

I'm following the vcd docs, where assocstats is called on the result of an xtabs call for multiple subsets of a data frame. However, I get NaNs with one particular subset because the expected count for many cells is 0:
          factor.2
factor.1    0   1   2   3   4   5 or more
   0        0  12   7   1   0           1
   1        0   2   1   1   0           0
   2        0   8   2   1   0           0
   3        0   5   4   0   0           0
   4        0   1   2   2   0           0
   5        0   6   8   0   0           0
   6        0   5   3   0   0           0
   7        0   5   1   0   0           0
   8        0   5   4   0   1           0
   9        0   1   1   0   1           0
   10       0   5   6   0   0           1
temp.table <- structure(c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 12L,
2L, 8L, 5L, 1L, 6L, 5L, 5L, 5L, 1L, 5L, 7L, 1L, 2L, 4L, 2L, 8L,
3L, 1L, 4L, 1L, 6L, 1L, 1L, 1L, 0L, 2L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 1L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 1L), .Dim = c(11L, 6L), .Dimnames = structure(list(
factor.1 = c("0", "1", "2", "3", "4", "5", "6", "7", "8",
"9", "10"), factor.2 = c("0", "1", "2", "3", "4", "5 or more"
)), .Names = c("factor.1", "factor.2")), class = c("xtabs",
"table"), call = xtabs(data = cases.limited, na.action = na.omit))
library(vcd)
assocstats(temp.table)
                     X^2 df P(> X^2)
Likelihood Ratio  35.004 50  0.94676
Pearson              NaN 50      NaN

Phi-Coefficient   : NA
Contingency Coeff.: NaN
Cramer's V        : NaN
Is there a way to quickly and efficiently avoid including these cases in the analysis without extensively rewriting what assocstats or xtabs do? I understand that there is arguably less statistical power, but Cramer's V is already an optimistic estimator, so the results will still be useful to me.
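One quick option (my own sketch, not something the vcd documentation prescribes) is to drop the all-zero rows and columns of the table before handing it to assocstats, so that every expected count is positive; assocstats also accepts a plain matrix of counts:
library(vcd)
# here only the "0" column of factor.2 is empty, but the same line also drops empty rows
trimmed <- temp.table[rowSums(temp.table) > 0, colSums(temp.table) > 0, drop = FALSE]
assocstats(trimmed)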

How to extract dimension of a submatrix from a bigger matrix using R?

I have the following matrix:
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    0    0    0    0    0    0    0    0    0     0
 [2,]    0    0    0    0    0    0    0    0    0     0
 [3,]    0    0    0    0    0    0    0    0    0     0
 [4,]    1    1    1    1    0    0    0    0    0     0
 [5,]    1    1    1    1    0    0    0    0    0     0
 [6,]    1    1    1    1    0    0    0    0    0     0
 [7,]    1    1    1    1    0    0    0    0    0     0
 [8,]    1    1    1    1    0    0    0    0    0     0
 [9,]    1    1    1    1    0    0    0    0    0     0
[10,]    1    1    1    1    0    0    0    0    0     0
and I would like to know how to extract the dimensions (7x4) of the submatrix whose elements are equal to 1.
Similar to JDL's answer, but giving you the submatrix dimensions directly:
mat <- structure(c(
0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L
), .Dim = c(10L, 10L), .Dimnames = list(NULL, NULL))
dim(mat[apply(mat, 1, any), apply(mat, 2, any)])
#[1] 7 4
This will remove rows and columns containing only zeros. If you want to keep rows and columns containing at least one 1, you could do:
mat[3, 5] <- 2 #just to complicate it a little
f <- function(x) any(x==1) #create a simple function
dim(mat[apply(mat, 1, f), apply(mat, 2, f)])
#[1] 7 4
You are effectively asking "which rows and columns have a one in them?" These questions are answered most easily using apply:
apply(M, 1, any)
apply(M, 2, any)
These return logical vectors telling you which rows and which columns, respectively, contain anything non-zero; wrap them in sum() to get the counts, or in which() to get the indices.
If testing for non-zero-ness isn't really your problem, replace any with a function that returns TRUE for the desired rows and FALSE otherwise.
If you can't guarantee that the ones form a submatrix (i.e. they aren't in a rectangular formation), then you will need to do some more work than this.
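Concretely, a small illustration of the above (with M standing for the 0/1 matrix from the question):
sum(apply(M, 1, any))    # 7 rows contain something non-zero
sum(apply(M, 2, any))    # 4 columns contain something non-zero
which(apply(M, 1, any))  # 4 5 6 7 8 9 10
which(apply(M, 2, any))  # 1 2 3 4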
You can try:
apply(which(matrix==1, arr.ind = T), 2, function(x) length(unique(x)))
row col
7 4
You could coerce to a sparse matrix and extract the index slots:
library(Matrix)
m <- as(M, "TsparseMatrix")
# row dim:
diff(range(m@i)) + 1L
# [1] 7
# column dim:
diff(range(m@j)) + 1L
# [1] 4
I expect this to be quite efficient and it might be useful to store/treat your matrix as a sparse matrix anyway.

Resources