Find the proportion of even numbers per row - r

I have a matrix with 5 columns and 20 rows. For each row, I want to find the proportion of even numbers in that row and append it to the row. My trouble is computing the proportion of even numbers.
So here is part of the matrix:
     [,1] [,2] [,3] [,4] [,5]
[1,]    6    5    1    2    5
And here is my attempt so far:
x <- apply(matrix, 1, length(matrix %% 2 == 0)/5)
matrix <- cbind(matrix, x)

Take a look at ?"%%". Here is an example:
## reproducible example
set.seed(1)
mat <- matrix(
  sample(1:10, 5 * 20, replace = TRUE),
  nrow = 20, ncol = 5, byrow = TRUE)
## 1. convert the matrix to a logical matrix using %%
## 2. count the TRUE values per row with the vectorised rowSums
## 3. divide by the number of columns to turn counts into proportions
rowSums(mat %% 2 == 0) / ncol(mat)
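If you also want to attach the proportions to the matrix as an extra column, as the cbind step in the question suggests, a minimal sketch along those lines (the name prop_even is just illustrative) would be:
prop_even <- rowSums(mat %% 2 == 0) / ncol(mat)  # proportion of even entries per row
mat_with_prop <- cbind(mat, prop_even)           # append as an extra column
head(mat_with_prop)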


How to randomly pick a % of observations from a matrix in R?

I have a matrix, and I want to randomly pick 10% of its elements and store them in a data frame recording row, column and value.
Note that I want to sample over both rows and columns, so I am not interested in partial solutions that sample 10% of the rows and keep all the columns, or the other way around, sample 10% of the columns and keep all the rows.
For example,
M = matrix(rnorm(30), 10, 3)
Given this matrix, which has 30 elements, I would like to randomly sample 10% of them (0.1 * 30 = 3) and store them in a data set of the form
row column value
4 2 x
7 1 x
2 1 x
You can sample from seq_along(M), recover the row and column indices with arrayInd, and cbind these with the corresponding values of the matrix.
i <- sample(seq_along(M), length(M) %/% 10)
cbind(arrayInd(i, dim(M)), M[i])
#cbind(arrayInd(i, dim(M), c("row", "column"), TRUE), value = M[i]) #Alternative with names
# [,1] [,2] [,3]
#[1,] 5 1 -0.72818419
#[2,] 9 1 1.14609041
#[3,] 2 2 0.01162598
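If the result is wanted as a data frame with named columns, as described in the question, one possible variation (the column names row, column and value simply follow the question's example) is:
df <- as.data.frame(arrayInd(i, dim(M)))  # row and column indices of the sampled cells
names(df) <- c("row", "column")
df$value <- M[i]                          # the sampled values themselves
df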
View the 2-d matrix as one long 1-d array, i.e. ravel it logically rather than actually flattening it.
Then draw 0.1 * nrows * ncols random numbers in the range 0 to nrows * ncols - 1.
For any such random number randVar, the row and column are
row = randVar / ncols (integer division)
col = randVar % ncols
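A rough translation of this idea into R might look like the sketch below; note that R uses 1-based indexing and stores matrices in column-major order, so rows vary fastest along the linear index:
n <- round(0.1 * length(M))        # how many elements to draw (10%)
idx <- sample(length(M), n)        # linear indices into the matrix
row <- (idx - 1) %% nrow(M) + 1    # recover the row (column-major layout)
col <- (idx - 1) %/% nrow(M) + 1   # recover the column
data.frame(row = row, column = col, value = M[idx])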

Index a matrix using two equal-length vectors for row and column indices

Imagine you have a 5x5 matrix and want to select values using two equal-length vectors, one giving the row indices and the other the column indices:
m <- matrix(1:25, ncol = 5)
R <- c(1, 3, 5)
C <- c(2, 4, 4)
The desired output here is a single vector containing 6 (row 1, column 2), 18 (row 3, column 4) and 20 (row 5, column 4).
The best I could come up with was:
diag(m[R,C])
But this constructs a matrix of size length(R) * length(C) and so is slow if R and C are long.
How do you do this?
You can use cbind(R, C) for indexing with [:
m[cbind(R,C)]
#> [1] 6 18 20
Using mapply:
mapply(function(r, c) m[r, c], r = R, c = C)
# [1] 6 18 20
Creating a matrix of indices:
m[matrix(c(R, C), ncol = 2)]
# [1] 6 18 20
Another option is
m[do.call(cbind, list(R, C))]
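As a side note, the same two-column index matrix also works on the left-hand side of an assignment, which is handy if you want to replace exactly those cells:
m[cbind(R, C)] <- NA  # overwrite the selected cells
m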

Can a column be a vector or list class?

I'm working with multiple response questions in a survey, and I have a character column that contains values that look like "1,2,3" and "1,4,5". The participants click all values that apply, and I'm given this result.
What is the best solution to deal with this problem? Should I create new columns that tell me if a value in that list is present or not? Or can I create a column that has a list/vector class?
One can't say what is best without knowing the purpose, but storing the answers as indicator columns, i.e. one 0/1 column per option, lets you run regressions or tabulate them easily. Below we convert x into a 0/1 matrix m, look at what fraction of respondents answered yes to each option, regress with the indicators in various ways (two are shown), and take some correlations and plots.
We also show a plot based on applying stack to the list representation, so it may be useful to keep more than one representation and convert among them.
x <- c("1,2,3", "1,4,5")
m <- t(+outer(1:5, lapply(strsplit(x, ","), as.numeric), Vectorize(`%in%`)))
colMeans(m)
y <- 1:2
lm(y ~ m+0)
lapply(1:5, function(i) glm(m[, i] ~ y, family = binomial()))
cor(m)
cor(t(m))
heatmap(m)
stk <- stack(setNames(lapply(strsplit(x, ","), as.numeric), seq_along(x)))
plot(stk)
Here is a data frame holding 4 different representations of the same responses:
library(dst) # encode/decode
DF <- data.frame(x, stringsAsFactors = FALSE)
DF$list <- strsplit(x, ",")
DF <- cbind(DF, m, code = apply(m, 1, decode, base = 2))
DF
##       x    list 1 2 3 4 5 code
## 1 1,2,3 1, 2, 3 1 1 1 0 0   28
## 2 1,4,5 1, 4, 5 1 0 0 1 1   19
Note that decode converts 0/1 values into a numeric value and encode can be used to reverse that:
t(encode(base = rep(2, 5), c(28, 19)))
##   [,1] [,2] [,3] [,4] [,5]
## r    1    1    1    0    0
##      1    0    0    1    1
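If you do keep the list column from above, individual options can still be queried directly; for example (a small illustration, not part of the original answer), to flag which respondents selected option 4:
sapply(DF$list, function(v) "4" %in% v)  # TRUE where option 4 was ticked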

How to iterate through each element in a matrix in R

Context: I am iterating through several variables in my dataset and performing a pairwise t.test between the factor levels for each of those variables (which I have successfully managed to do). An example of the result is shown below:
Table of P-values between classes 11,12,13 and 14
My next task, which I am having difficulty with, is presenting those values as a table in which each element indicates whether the test between the two classes passes a threshold (say 0.05): a 1 if the p-value is below 0.05 and a 0 if it is above. The table should also display the number of tests passed as a proportion of the number of tests conducted (the number of entries below 0.05 over the total number of entries in the triangular matrix). In reference to the image above, the output should look like this:
Ideal Matrix
So the problem is essentially that I have to iterate through the first matrix (excluding the first row and first column), apply a function, and then generate a new row and column holding a row and column summary. Any help or advice would be appreciated.
R is not really a useful tool to build such a table, but here is one solution.
Data (shortened the decimals for convenience):
mat <- matrix(c(.569, .0001, .1211, NA, .0001, .3262, NA, NA, .0001), nrow = 3)
       [,1]   [,2]  [,3]
[1,] 0.5690     NA    NA
[2,] 0.0001 0.0001    NA
[3,] 0.1211 0.3262 1e-04
First we convert to the 0,1 scheme by using ifelse with the condition < .05:
mat <- ifelse(mat < .05, 1, 0)
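With the data above, this step should give:
     [,1] [,2] [,3]
[1,]    0   NA   NA
[2,]    1    1   NA
[3,]    0    0    1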
Then we add another column with the rowSums:
mat <- cbind(mat, rowSums(mat, na.rm = T))
Then we add another row with the colSums of the boolean matrix !is.na(mat), thereby counting the number of non-NA entries per column:
mat <- rbind(mat, colSums(!is.na(mat)))
Then we change the lower right cell to the sum of the inner matrix divided by the number of non-NA entries in the inner matrix:
mat[nrow(mat), ncol(mat)] <- sum(mat[1:(nrow(mat) - 1), 1:(ncol(mat) - 1)], na.rm = TRUE) /
  sum(!is.na(mat[1:(nrow(mat) - 1), 1:(ncol(mat) - 1)]))
Finally, we change the row and column names:
rownames(mat) <- c(12:14, "SumCount")
colnames(mat) <- c(11:13, "SumScore")
End result:
> mat
         11 12 13 SumScore
12        0 NA NA      0.0
13        1  1 NA      2.0
14        0  0  1      1.0
SumCount  3  2  1      0.5
Notice that no looping was necessary, as R is very efficient with vectorized operations on matrices.
Here is one way of doing what you want.
First I will make up a matrix.
set.seed(3781)
pval <- matrix(runif(9, 0, 0.07), 3)
is.na(pval) <- upper.tri(pval)
dimnames(pval) <- list(12:14, 11:13)
Now the question.
Ideal <- matrix(as.integer(pval < 0.05), nrow(pval))
dimnames(Ideal) <- dimnames(pval)
Ideal
#   11 12 13
#12  1 NA NA
#13  1  1 NA
#14  1  0  0
r <- sum(Ideal, na.rm = TRUE)/sum(!is.na(Ideal))
r
#[1] 0.6666667
So now all that is needed is to add the extra row and column.
Ideal <- rbind(Ideal, colSums(!is.na(Ideal)))
Ideal <- cbind(Ideal, rowSums(Ideal, na.rm = TRUE))
Ideal[nrow(pval) + 1, ncol(pval) + 1] <- r
rownames(Ideal)[nrow(pval) + 1] <- "SumCount"
colnames(Ideal)[ncol(pval) + 1] <- "SumScore"

Cumulative sum for rows up to a selected row number, followed by summing up columns of a matrix in R

I am trying to perform the following kind of summation on a matrix:
Let's say the matrix is:
mat <- matrix(c(1:5,rep(0,7),c(1:7),rep(0,5),c(1:10), 0,0), 12,3)
I want to take the cumulative sum of rows up to row numbers 5, 7 and 10 for columns 1, 2 and 3 respectively. (The real data can have an arbitrary number of rows and columns.)
For now, I have been using following code:
sum1 <- matrix(rep(0, 36), 12, 3)
row_index <- c(5, 7, 10)
for (k in 1:3) {
  sum1[1:row_index[k], k] <- cumsum(mat[1:row_index[k], k])
}
sum1 <- matrix(apply(sum1, 1, sum))
To start with, I have the matrix and row_index. I want to avoid using the loop as the data has a lot of columns. I am wondering if there is a way to do that.
depth <- c(5,7,10)
mapply(function(x, y) cumsum(mat[1:x, y]), depth, seq_along(depth))
[[1]]
[1] 1 3 6 10 15
[[2]]
[1] 1 3 6 10 15 21 28
[[3]]
[1] 1 3 6 10 15 21 28 36 45 55
First, define a function:
sumcolumn <- function(rows, columns, mat) {
  cumsum(mat[1:rows, columns])
}
then use mapply on your vectors of columns/rows:
mapply(sumcolumn, rows = c(5, 7, 10), columns = c(1, 2, 3), MoreArgs = list(mat = mat))
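If, as in the original loop, a single summed-up column padded with zeros is ultimately wanted, one possible follow-up (assuming that zero-padding is indeed the desired behaviour) is to pad each cumulative sum to nrow(mat) and add them:
res <- mapply(sumcolumn, rows = c(5, 7, 10), columns = c(1, 2, 3),
              MoreArgs = list(mat = mat))
sum1 <- Reduce(`+`, lapply(res, function(v) c(v, rep(0, nrow(mat) - length(v)))))
sum1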
