How to iterate through each element in a matrix in R

Context: I am iterating through several variables in my dataset and performing a pairwise t.test between the factors for each of those variables (which I have successfully managed to do). An example of the result I have is as follows:
Table of P-values between classes 11,12,13 and 14
My next task, which I am having difficulty with, is presenting these values as a table in which each element shows whether the test between the two classes passes: 1 if its p-value is below a threshold (say 0.05) and 0 if it is above. The table should also display the number of tests passed as a proportion of the number of tests conducted (the number of entries below 0.05 divided by the total number of entries in the triangular matrix). For the table above, the output should look like this:
Ideal Matrix
So the problem is essentially that I have to iterate through the first matrix (excluding the first row and first column), apply a function, and then add a new row and column holding the column and row summaries. Any help or advice would be appreciated.

R is not really the ideal tool for building such a table, but here is one solution.
Data (shortened the decimals for convenience):
mat <- matrix(c(.569, .0001, .1211, NA, .0001, .3262, NA, NA, .0001), nrow = 3)
[,1] [,2] [,3]
[1,] 0.5690 NA NA
[2,] 0.0001 0.0001 NA
[3,] 0.1211 0.3262 1e-04
First we convert to the 0,1 scheme by using ifelse with the condition < .05:
mat <- ifelse(mat < .05, 1, 0)
Then we add another column with the rowSums:
mat <- cbind(mat, rowSums(mat, na.rm = TRUE))
Then we add another row with the colSums of the logical matrix !is.na(mat), thereby counting the number of non-NA values per column:
mat <- rbind(mat, colSums(!is.na(mat)))
Then we set the lower-right cell to the sum of the inner matrix divided by the number of non-NA entries in the inner matrix (the negative indices drop the summary row and column):
mat[nrow(mat), ncol(mat)] <- sum(mat[-nrow(mat), -ncol(mat)], na.rm = TRUE) /
  sum(!is.na(mat[-nrow(mat), -ncol(mat)]))
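A common pitfall when indexing the inner matrix: in R, `:` binds more tightly than binary `-`, so an index written as `1:nrow(mat)-1` means `(1:nrow(mat)) - 1` and starts at 0. It only appears to work because R silently drops a zero index. Parenthesizing, as in `1:(nrow(mat)-1)`, or using a negative index such as `-nrow(mat)` states the intent directly. A small sketch with made-up values:

```r
n <- 4
a <- 1:n-1       # parsed as (1:n) - 1, giving 0 1 2 3
b <- 1:(n-1)     # parentheses force the intended 1 2 3
x <- c(10, 20, 30, 40)
x[a]             # index 0 is silently dropped, so this still returns 10 20 30
x[b]             # 10 20 30
x[-n]            # a negative index drops the last element: 10 20 30
```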
Finally, we change the row and column names:
rownames(mat) <- c(12:14, "SumCount")
colnames(mat) <- c(11:13, "SumScore")
End result:
> mat
11 12 13 SumScore
12 0 NA NA 0.0
13 1 1 NA 2.0
14 0 0 1 1.0
SumCount 3 2 1 0.5
Notice that no looping was necessary, as R is very efficient with vectorized operations on matrices.
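Since the question computes this table for several variables, the same steps can be wrapped in a small function and called once per p-value matrix. This is only a sketch: the name `summarize_pvals` and its `alpha` argument are hypothetical, and it assumes the input is the lower-triangular p-value matrix with `NA` for the pairs that were not tested.

```r
# Hypothetical helper bundling the steps above
summarize_pvals <- function(pmat, alpha = 0.05) {
  pass <- ifelse(pmat < alpha, 1, 0)        # 1 = p below alpha; NA stays NA
  n_r <- nrow(pass)
  n_c <- ncol(pass)
  # Extra column: number of passed tests in each row
  out <- cbind(pass, SumScore = rowSums(pass, na.rm = TRUE))
  # Extra row: number of tests conducted per column (last cell filled below)
  out <- rbind(out, SumCount = c(colSums(!is.na(pass)), NA))
  # Bottom-right cell: passed tests as a proportion of tests conducted
  out[n_r + 1, n_c + 1] <- sum(pass, na.rm = TRUE) / sum(!is.na(pass))
  out
}

mat <- matrix(c(.569, .0001, .1211, NA, .0001, .3262, NA, NA, .0001),
              nrow = 3, dimnames = list(12:14, 11:13))
summarize_pvals(mat)
```

Because `ifelse()` keeps the dimensions and dimnames of its test, the class labels carry through to the summary table.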

Here is one way of doing what you want.
First I will make up a matrix.
set.seed(3781)
pval <- matrix(runif(9, 0, 0.07), 3)
is.na(pval) <- upper.tri(pval)
dimnames(pval) <- list(12:14, 11:13)
Now the answer to the question.
Ideal <- matrix(as.integer(pval < 0.05), nrow(pval))
dimnames(Ideal) <- dimnames(pval)
Ideal
# 11 12 13
#12 1 NA NA
#13 1 1 NA
#14 1 0 0
r <- sum(Ideal, na.rm = TRUE)/sum(!is.na(Ideal))
r
#[1] 0.6666667
Now all that is needed is to add the extra row and column.
Ideal <- rbind(Ideal, colSums(!is.na(Ideal)))
Ideal <- cbind(Ideal, rowSums(Ideal, na.rm = TRUE))
Ideal[nrow(pval) + 1, ncol(pval) + 1] <- r
rownames(Ideal)[nrow(pval) + 1] <- "SumCount"
colnames(Ideal)[ncol(pval) + 1] <- "SumScore"
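Because the untested pairs are exactly the upper triangle, the proportion `r` can also be computed from the tested p-values alone with `lower.tri()`. A sketch that regenerates the made-up `pval` matrix from this answer:

```r
set.seed(3781)
pval <- matrix(runif(9, 0, 0.07), 3)
is.na(pval) <- upper.tri(pval)
dimnames(pval) <- list(12:14, 11:13)

# Lower triangle plus diagonal = the pairs that were actually tested
tested <- pval[lower.tri(pval, diag = TRUE)]
length(tested)                 # 6 tests for 3 x 3
ratio <- mean(tested < 0.05)   # mean of a logical = proportion of TRUE
ratio
```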

Related

How to create a row that is calculated from another row automatically, like in Excel?

Does anyone know how to have a row in R that is calculated from another row automatically? I.e.
let's say in Excel I want to make a row C, which is made up of (B2/B1),
e.g. C1 = B2/B1
C2 = B3/B2
...
Cn = B(n+1)/Bn
In Excel we only need to do one calculation and then drag it down. How do we do it in R?
In R you work with columns as vectors, so operations are vectorized. The calculations as described could be implemented with the following commands, given a data.frame df (i.e. a table) with the column names mentioned:
df["C1"] <- df["B2"]/df["B1"]
df["C2"] <- df["B3"]/df["B2"]
In R you usually would name the columns according to the content they hold. With that, you refer to the columns by their name, although you can also address the first column as df[, 1], the first row as df[1, ] and so on.
EDIT 1:
There are multiple ways to get this done, and certainly some more elegant ones, but for clarity I kept it in simple base R:
Example dataset for demonstration:
df <- data.frame("B1" = c(1, 2, 3),
"B2" = c(2, 4, 6),
"B3" = c(4, 8, 12))
Column calculation:
for (i in seq_len(ncol(df) - 1)) {  # 1, ..., ncol(df) - 1 (1:ncol(df)-1 would start at 0)
col_name <- paste0("C", i)
df[col_name] <- df[, i+1]/df[, i]
}
Output:
B1 B2 B3 C1 C2
1 1 2 4 2 2
2 2 4 8 2 2
3 3 6 12 2 2
So you iterate through the available columns B1/B2/B3, dynamically create a column name in each iteration based on the iteration number, and then calculate the respective column contents.
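The loop can also be replaced by a single vectorized assignment: dividing the data frame without its first column by the data frame without its last column divides each column by its left neighbour. A sketch on the same example data:

```r
df <- data.frame("B1" = c(1, 2, 3),
                 "B2" = c(2, 4, 6),
                 "B3" = c(4, 8, 12))
n <- ncol(df)
# df[-1] holds B2, B3 and df[-n] holds B1, B2; the division is element-wise
df[paste0("C", seq_len(n - 1))] <- df[-1] / df[-n]
df
```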
EDIT 2:
Row-wise, which is apparently what you actually meant, works similarly:
a <- c(10,15,20, 1)
df <- data.frame(a)
df$b <- NA_real_  # create the new column first
for (i in seq_len(nrow(df))) {
  df$b[i] <- df$a[i+1]/df$a[i]  # a[n+1] is NA, so the last row gets NA
}
Output:
a b
1 10 1.500000
2 15 1.333333
3 20 0.050000
4 1 NA
You can do this using just vectors, without a for loop.
a <- c(10,15,20, 1)
df <- data.frame(a)
df$b <- c(df$a[-1], 0) / df$a
print(df)
a b
1 10 1.500000
2 15 1.333333
3 20 0.050000
4 1 0.000000
Explanation:
In the example data, df$a is the vector 10 15 20 1.
df$a[-1] is the same vector with its first element removed, 15 20 1.
Using c() to add a new element to the end gives the vector the same length as before:
c(df$a[-1],0) which is 15 20 1 0
What we want for column b is this vector divided by the original df$a.
So:
df$b <- c(df$a[-1], 0) / df$a
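If the last row should be `NA` (as in the loop version) rather than `0`, divide the shifted vector by the original without its last element and pad the end with `NA`. A sketch:

```r
a <- c(10, 15, 20, 1)
df <- data.frame(a)
n <- nrow(df)
# next value / current value for rows 1..n-1; row n has no next value
df$b <- c(df$a[-1] / df$a[-n], NA)
print(df)
```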

Split integers based on a value in a second column, assign new values, and recombine into a new dataset

In R, I have an n x 2 matrix of data containing only integers.
The first column indicates the size of an item. Some of these sizes are the result of merging, so the second column indicates the number of items that went into that size (including 1); I call it the 'index'. The sum of the indices indicates how many items were actually in the original data.
I now need to create a new dataset that splits any merged sizes back out according to the number in the index, resulting in an n x 2 matrix (where the new n is the total of the indices) whose second column is all 1's.
I need this split to happen in two ways.
"Homogeneously", where any merged size is split across the number of indices as evenly as possible. For instance, a size of 6 with an index of 3 would become c(2,2,2). Importantly, all numbers have to be integers, so a size of 3 with an index of 2 should become something like c(1,2) or c(2,1); it can't be c(1.5,1.5).
"Heterogeneously", where the sizes are skewed so that 1 is assigned to every position in the index except one, which holds the remainder. For instance, a size of 6 with an index of 3 would become c(1,1,4), or any permutation of 1, 1 and 4.
Below I am providing some sample data that gives an example of what I have, what I want, and what I have tried.
#Example data that I have
Y.have<-cbind(c(19,1,1,1,1,4,3,1,1,8),c(3,1,1,1,1,2,1,1,1,3))
The data show that three items went into the size of 19 in the first row, one item went into the size of 1 in the second row, and so on. Importantly, these data originally contained 15 items (i.e. sum(Y.have[,2])), some of which got merged, so the final data will need to have length 15.
What I want the data to look like is:
####Homogeneous separation - split values as evenly as possible
#' The value of 19 in row 1 is now a vector of c(6,6,7) (or any combination thereof, i.e. c(6,7,6) is fine) since the position in the second column is a 3
#' Rows 2-5 are unchanged since they have a 1 in the second column
#' The value of 4 in row 6 is now a vector of c(2,2) since the position in the second column is a 2
#' Rows 7-9 are unchanged since they have a 1 in the second column
#' The value of 8 in row 10 is now a vector of c(3,3,2) (or any combination thereof) since the position in the second column is a 3
Y.want.hom<-cbind(c(c(6,6,7),1,1,1,1,c(2,2),3,1,1,c(3,3,2)),c(rep(1,times=sum(Y.have[,2]))))
####Heterogeneous separation - split values with as many singles as possible
#' The value of 19 in row 1 is now a vector of c(1,1,17) (or any combination thereof, i.e. c(1,17,1) is fine) since the position in the second column is a 3
#' Rows 2-5 are unchanged since they have a 1 in the second column
#' The value of 4 in row 6 is now a vector of c(1,3) since the position in the second column is a 2
#' Rows 7-9 are unchanged since they have a 1 in the second column
#' The value of 8 in row 10 is now a vector of c(1,1,6) (or any combination thereof) since the position in the second column is a 3
Y.want.het<-cbind(c(c(1,1,17),1,1,1,1,c(1,3),3,1,1,c(1,1,6)),c(rep(1,times=sum(Y.have[,2]))))
Note that the positions of the integers in the final data don't matter since they will all have one index case.
I have tried splitting the data (split) according to the index. This creates a list whose length is the number of unique index values. I then iterated through the positions in that list and divided by the position.
a<-split(Y.have[,1],Y.have[,2]) #Split into a list according to the index
b<-list() #initiate new list
for (i in 1:length(a)){
b[[i]]<-a[[i]]/i #get homogenous values
b[[i]]<-rep(b[[i]],times=i) #repeat the values based on the number of indices
}
Y.test<-cbind(unlist(b),rep(1,times=length(unlist(b)))) #create new dataset
This was a terrible approach. First, it produces decimals. Second, the position in the list does not necessarily equal the index number (i.e. if there were no index of 2, the second position would hold the next-lowest index but would still divide by 2).
However, it at least allowed me to separate the data by index, manipulate it, and recombine it to the proper length. I now need help with that middle part: manipulating the data for both homogeneous and heterogeneous reassignment. I would prefer base R, but any approach would be fine! Thank you in advance!
Here might be one approach.
Create two functions for homogeneous and heterogeneous splits:
get_hom_ints <- function(M, N) {
vec <- rep(floor(M/N), N)
for (i in seq_len(M - sum(vec))) {
vec[i] <- vec[i] + 1
}
vec
}
get_het_ints <- function(M, N) {
vec <- rep(1, N)
vec[1] <- M - sum(vec) + 1
vec
}
Then use apply to go through each row of the matrix:
het_vec <- unlist(apply(Y.have, 1, function(x) get_het_ints(x[1], x[2])))
unname(cbind(het_vec, rep(1, length(het_vec))))
hom_vec <- unlist(apply(Y.have, 1, function(x) get_hom_ints(x[1], x[2])))
unname(cbind(hom_vec, rep(1, length(hom_vec))))
Output
(heterogeneous)
[,1] [,2]
[1,] 17 1
[2,] 1 1
[3,] 1 1
[4,] 1 1
[5,] 1 1
[6,] 1 1
[7,] 1 1
[8,] 3 1
[9,] 1 1
[10,] 3 1
[11,] 1 1
[12,] 1 1
[13,] 6 1
[14,] 1 1
[15,] 1 1
(homogeneous)
[,1] [,2]
[1,] 7 1
[2,] 6 1
[3,] 6 1
[4,] 1 1
[5,] 1 1
[6,] 1 1
[7,] 1 1
[8,] 2 1
[9,] 2 1
[10,] 3 1
[11,] 1 1
[12,] 1 1
[13,] 3 1
[14,] 3 1
[15,] 2 1
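The same two splits can also be done without a per-row function call, using integer division (`%/%`), the remainder (`%%`) and base R's `sequence()`, which gives each output element its position within its original row. A sketch; the function names `split_hom` and `split_het` are hypothetical:

```r
Y.have <- cbind(c(19, 1, 1, 1, 1, 4, 3, 1, 1, 8), c(3, 1, 1, 1, 1, 2, 1, 1, 1, 3))

split_hom <- function(Y) {
  size <- Y[, 1]
  n <- Y[, 2]
  grp <- rep(seq_len(nrow(Y)), n)   # original row of each output element
  pos <- sequence(n)                # position 1..n within that row
  q <- (size %/% n)[grp]            # integer part of the even split
  r <- (size %% n)[grp]             # leftover units
  vals <- q + (pos <= r)            # the first r pieces get one extra unit
  unname(cbind(vals, 1))
}

split_het <- function(Y) {
  size <- Y[, 1]
  n <- Y[, 2]
  grp <- rep(seq_len(nrow(Y)), n)
  pos <- sequence(n)
  # the first piece takes the remainder, every other piece is 1
  vals <- ifelse(pos == 1, (size - n + 1)[grp], 1)
  unname(cbind(vals, 1))
}

split_hom(Y.have)
split_het(Y.have)
```

For rows with an index of 1, both functions return the size unchanged, and every split sums back to the original size.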
The partitions package is designed for this type of requirement; check it out.
Apply the logic below to your code and it should work.
ex:
library(partitions)
hom <- restrictedparts(19,3) #where 19 is Y.have[,1][1] and 3 is Y.have[,2][1] as per your data
print(hom[,ncol(hom)])
#output : 7 6 6
het <- Reduce(intersect, list(which(hom[2,1:ncol(hom)] %in% 1),which(hom[3,1:ncol(hom)] %in% 1)))
hom[,het]
#output : 17 1 1
One option would be to use integer division (%/%) and modulus (%%). It may not give exactly the results you specified, i.e. 8 and 3 give (2,2,4) rather than (3,3,2), but it generally does what you described.
Y.have<-cbind(c(19,1,1,1,1,4,3,1,1,8),c(3,1,1,1,1,2,1,1,1,3))
homoVec <- c()
for (i in 1:length(Y.have[,1])){
if (Y.have[i,2] == 1) {
a = Y.have[i,1]
homoVec <- append(homoVec, a)
} else {
quantNum <- Y.have[i,1]
indexNum <- Y.have[i,2]
b <- quantNum %/% indexNum
c <- quantNum %% indexNum
a <- c(rep(b, indexNum-1), b + c)
homoVec <- append(homoVec, a)
}
}
homoOut <- data.frame(homoVec, 1)
heteroVec <- c()
for (i in 1:length(Y.have[,1])){
if (Y.have[i,2] == 1) {
a = Y.have[i,1]  # unmerged sizes stay as they are
heteroVec <- append(heteroVec, a)
} else {
quantNum <- Y.have[i,1]
indexNum <- Y.have[i,2]
firstNum <- quantNum - (indexNum - 1)
a <- c(firstNum, rep(1, indexNum - 1))
heteroVec <- append(heteroVec, a)
}
}
heteroOut <- data.frame(heteroVec, 1)
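If it is really important to reproduce the numbers exactly as in your example, this version should work. Note that rounding can make the split less even for other inputs: for example, a size of 13 with an index of 5 gives c(3,3,3,3,1).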
homoVec <- c()
for (i in 1:length(Y.have[,1])){
if (Y.have[i,2] == 1) {
a = Y.have[i,1]
homoVec <- append(homoVec, a)
} else {
quantNum <- Y.have[i,1]
indexNum <- Y.have[i,2]
b <- round(quantNum/indexNum)
roundSum <- b * (indexNum - 1)
c <- quantNum - roundSum
a <- c(rep(b, indexNum-1), c)
homoVec <- append(homoVec, a)
}
}
homoOut <- data.frame(homoVec, 1)

Incorrect number of subscripts on matrix while assigning values from a data frame to a matrix

An error message pops up when assigning values from data frame A to matrix B.
A is a data frame containing 9000 observations of 3 variables. The data are simulated values from 1000 iterations. Each iteration contains 9 values, i.e. 9 * 1000 = 9000.
V1 is the iteration ID, V2 is the variable name (not useful for now), and V3 is the variable I need.
I create a matrix B to hold the values from A[,3]. However, the first value in each iteration is discarded, so only 8 values per iteration are kept.
B <- matrix(NA, nrow = 1000, ncol = 8)
for(i in 1:iter){
for(m in 1:8){
B[i,m] <- A[9*(i-1)+m+1,3]
}
}
Then I got the error message and couldn't figure it out. Any help, suggestions or ideas are most welcome!
So, if I understand correctly, you basically want to fill the matrix row by row with all values of A[,3] except the first value of each group of 9 values.
Instead of using two for loops, you can fill the matrix directly with A[,3] when creating the matrix object B. Since matrix() fills column by column, you just have to transpose the matrix and remove the first column to get your result. The code looks like this:
B <- t(matrix(A$V3, nrow = 9, ncol = 1000))
B <- B[,-1]
Example
We define a data frame A with 3 variables and 9000 observations:
A = data.frame(V1 = rnorm(9000),
V2 = rnorm(9000),
V3 = rnorm(9000))
> head(A)
V1 V2 V3
1 1.0755625 2.82414180 1.76860717
2 0.3421535 0.85857695 0.05682035
3 1.3747495 -0.01151905 0.90259357
4 1.1589849 0.91009114 0.35132258
5 -0.1107268 1.38244412 0.76163226
6 -1.5551836 1.27199029 -0.56923898
Then we apply the code above to generate B, and we can check B:
> head(B[,1:5])
[,1] [,2] [,3] [,4] [,5]
[1,] 0.05682035 0.9025936 0.35132258 0.7616323 -0.5692390
[2,] -0.75018285 -0.6160903 -1.43556979 -0.3983150 2.0722279
[3,] 0.97226064 1.5366989 0.06546405 -0.5666010 2.3127568
[4,] -0.66904980 -1.9877136 -0.49963116 0.9217295 -0.6338961
[5,] 0.42339924 -0.6077871 0.16467356 -0.3301223 -0.6031495
[6,] 0.82212429 0.3383385 -0.26872905 1.1513397 -0.2644223
Notice that the first row of B corresponds to the first nine values of A$V3 without the first one. If we check the dimensions of B, we see:
> dim(B)
[1] 1000 8
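An equivalent one-step version fills the matrix row by row with `byrow = TRUE` and then drops the first column. The sketch below, on simulated data shaped like the question's, also checks that the question's double loop gives the same matrix when `B` really is a 1000 x 8 matrix; `iter <- 1000` is an assumption, since the question never shows how `iter` is defined.

```r
set.seed(1)
A <- data.frame(V1 = rep(1:1000, each = 9),   # iteration ID
                V2 = rnorm(9000),
                V3 = rnorm(9000))

# One step: 9 values per row, filled row by row, first value of each row dropped
B1 <- matrix(A$V3, ncol = 9, byrow = TRUE)[, -1]
# Transpose approach from the answer
B2 <- t(matrix(A$V3, nrow = 9, ncol = 1000))[, -1]

# The question's loop, assuming iter <- 1000
iter <- 1000
B3 <- matrix(NA, nrow = iter, ncol = 8)
for (i in 1:iter) {
  for (m in 1:8) {
    B3[i, m] <- A[9 * (i - 1) + m + 1, 3]
  }
}
identical(B1, B2)
identical(B1, B3)
```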

Can a column be a vector or list class?

I'm working with multiple response questions in a survey, and I have a character column that contains values that look like "1,2,3" and "1,4,5". The participants click all values that apply, and I'm given this result.
What is the best solution to deal with this problem? Should I create new columns that tell me if a value in that list is present or not? Or can I create a column that has a list/vector class?
One can't say what is best without knowing the purpose, but storing the answers as indicator columns, i.e. one 0/1 column per option, lets you run regressions or tabulate them easily. Here we convert x into a 0/1 matrix m, compute the fraction of respondents who chose each option, fit regressions on it in two of the many possible ways, and compute some correlations and a heatmap.
We also show a plot based on applying stack to the list representation, so it can be useful to keep more than one representation and convert among them.
x <- c("1,2,3", "1,4,5")
m <- t(+outer(1:5, lapply(strsplit(x, ","), as.numeric), Vectorize(`%in%`)))
colMeans(m)
y <- 1:2
lm(y ~ m+0)
lapply(1:5, function(i) glm(m[, i] ~ y, family = binomial()))
cor(m)
cor(t(m))
heatmap(m)
stk <- stack(setNames(lapply(strsplit(x, ","), as.numeric), seq_along(x)))
plot(stk)
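If the `outer()`/`Vectorize()` one-liner is hard to read, the same 0/1 indicator matrix can be built with `sapply()` and `%in%`. A sketch, assuming the possible options are the integers 1 to 5:

```r
x <- c("1,2,3", "1,4,5")
opts <- 1:5                                   # assumed set of possible answers
answers <- lapply(strsplit(x, ","), as.numeric)
# One row per respondent, one 0/1 column per option
m <- t(sapply(answers, function(v) as.integer(opts %in% v)))
colnames(m) <- opts
m
colMeans(m)   # share of respondents choosing each option
```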
Here is a data frame with 4 different possibilities:
library(dst) # encode/decode
DF <- data.frame(x, stringsAsFactors = FALSE)
DF$list <- strsplit(x, ",")
DF <- cbind(DF, m, code = apply(m, 1, decode, base = 2))
DF
## x list 1 2 3 4 5 code
## 1 1,2,3 1, 2, 3 1 1 1 0 0 28
## 2 1,4,5 1, 4, 5 1 0 0 1 1 19
Note that decode converts 0/1 values into a numeric value and encode can be used to reverse that:
t(encode(base = rep(2, 5), c(28, 19)))
## [,1] [,2] [,3] [,4] [,5]
## r 1 1 1 0 0
## 1 0 0 1 1
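To answer the title question directly: yes, a data frame column can be a list, as the `list` column of `DF` above shows. A minimal sketch of creating and querying such a column (the column name `answers` is only illustrative):

```r
x <- c("1,2,3", "1,4,5")
DF <- data.frame(x = x, stringsAsFactors = FALSE)
# Each cell of the list column holds an integer vector of chosen options
DF$answers <- lapply(strsplit(DF$x, ","), as.integer)
str(DF)
DF$answers[[2]]                              # options chosen by respondent 2
sapply(DF$answers, function(v) 4 %in% v)     # who chose option 4?
```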

Find the proportion of even numbers per row

I have a matrix with 5 columns and 20 rows. For each row, I want to find the proportion of even numbers in that row and add it as a new column. My trouble is computing the proportion of even numbers.
Here is part of the matrix:
1 2 3 4 5
[1,] 6 5 1 2 5
x <- apply(matrix, 1, length(matrix %% 2 == 0)/5)
matrix <- cbind(matrix, x)
Take a look at ?"%%". Here is an example:
## reproducible example
set.seed(1)
mat <- matrix(
sample(1:10,5*20,replace = TRUE),
nrow = 20, ncol = 5, byrow = TRUE)
## 1- convert matrix to a logical one using %%
## 2- compute occurrence of TRUE value using the vectorised rowSums
## 3- divide by the number of column to convert occurrence to proportions
rowSums(mat %% 2 ==0)/ncol(mat)
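Since the mean of a logical vector is the proportion of `TRUE` values, `rowMeans()` gives the same result directly, and the `apply()` call from the question works once its third argument is an actual function. A sketch with the same simulated matrix (`prop_even` is just an illustrative column name):

```r
set.seed(1)
mat <- matrix(sample(1:10, 5 * 20, replace = TRUE),
              nrow = 20, ncol = 5, byrow = TRUE)
p1 <- rowSums(mat %% 2 == 0) / ncol(mat)
p2 <- rowMeans(mat %% 2 == 0)
# apply() needs a function as its third argument; this is the question's attempt, fixed
p3 <- apply(mat, 1, function(r) mean(r %% 2 == 0))
mat <- cbind(mat, prop_even = p2)
head(mat)
```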
