How to randomly pick a % of observations from a matrix in R?

I have a matrix, and I want to randomly pick 10% of its elements and store them in a data frame holding the row index, column index and value of each picked element.
Note that I want to sample rows and columns jointly, so I am not interested in partial solutions that sample 10% of the rows and keep all the columns, or that sample 10% of the columns and keep all the rows.
For example,
M = matrix(rnorm(30), 10, 3)
Given this matrix, which has 30 elements, I would like to randomly sample 10% of them (0.1 * 30 = 3) and store them in a data frame of the form
row column value
4 2 x
7 1 x
2 1 x

You can sample linear indices from seq_along(M), convert them to row and column indices with arrayInd, and cbind the result with the corresponding values of the matrix.
i <- sample(seq_along(M), length(M) %/% 10)
cbind(arrayInd(i, dim(M)), M[i])
#cbind(arrayInd(i, dim(M), useNames = TRUE), value = M[i]) #Alternative with column names row, col and value
# [,1] [,2] [,3]
#[1,] 5 1 -0.72818419
#[2,] 9 1 1.14609041
#[3,] 2 2 0.01162598
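Since the question asks for a data frame with named columns, the same idea can be wrapped in a small function. This is a sketch: the helper name sample_cells and the use of round() to turn the proportion into a count are my own assumptions, not part of the answer above.

```r
# Hypothetical helper: sample a proportion of the cells of a matrix and
# return them as a data frame with row, column and value columns.
sample_cells <- function(M, prop = 0.1) {
  n  <- round(length(M) * prop)     # number of cells to draw
  i  <- sample(seq_along(M), n)     # distinct linear indices (no replacement)
  rc <- arrayInd(i, dim(M))         # column-major linear index -> (row, col)
  data.frame(row = rc[, 1], column = rc[, 2], value = M[i])
}

set.seed(1)
M <- matrix(rnorm(30), 10, 3)
sample_cells(M, 0.1)
```

Because the indices are drawn without replacement from all cells, each cell can appear at most once, and rows and columns are sampled jointly as the question requires.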

View the 2D matrix as one long 1D array, i.e. ravel it logically without actually flattening it.
Then draw 0.1 * len(matrix) * len(matrix[0]) distinct random numbers from 0 to len(matrix) * len(matrix[0]) - 1.
For example, given one such random number, randVar,
it can be represented as a row and column (0-based, row-major order):
row = randVar // len(matrix[0])   (integer division)
col = randVar % len(matrix[0])
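In R the same arithmetic looks slightly different, because matrices are stored column-major and indices start at 1: the row comes from the remainder and the column from the integer division by nrow(M). A sketch, where the seed and the variable names are arbitrary choices:

```r
set.seed(42)
M <- matrix(rnorm(30), 10, 3)

n_cells <- length(M)                         # nrow(M) * ncol(M) = 30
k <- sample(n_cells, round(0.1 * n_cells))   # 3 distinct 1-based linear indices

# R stores matrices column-major, so the row index cycles fastest
r_idx <- (k - 1) %% nrow(M) + 1
c_idx <- (k - 1) %/% nrow(M) + 1

data.frame(row = r_idx, column = c_idx, value = M[k])
```

This is exactly the conversion that arrayInd(k, dim(M)) performs, so in practice arrayInd is the simpler choice.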

Related

Split integers based on a value in second column, assign new values, and, recombine into new dataset

In R, I have an n x 2 matrix of data containing only integers.
The first column gives the size of an item. Some of these sizes are the result of merging, so the second column (which I call 'index') gives the number of items that went into that size (1 if nothing was merged). The sum of the indices is the number of items in the original data.
I now need to create a new data set that splits each merged size back out according to its index, resulting in an n x 2 matrix (where the new n is the sum of the indices) whose second column is all 1's.
I need this split to happen in two ways.
"Homogeneously", where each merged size is split across its index as evenly as possible. For instance, a size of 6 with an index of 3 would become c(2,2,2). Importantly, all numbers have to be integers, so a size of 3 with an index of 2 should become c(1,2) or c(2,1); it can't be c(1.5,1.5).
"Heterogeneously", where the split is skewed so that every position in the index gets 1 except one, which gets the remainder. For instance, a size of 6 with an index of 3 would become c(1,1,4), or any permutation of 1, 1 and 4.
Below I am providing some sample data that gives an example of what I have, what I want, and what I have tried.
#Example data that I have
Y.have<-cbind(c(19,1,1,1,1,4,3,1,1,8),c(3,1,1,1,1,2,1,1,1,3))
The data show that three items went into the size of 19 in the first row, one item went into the size of 1 in the second row, and so on. Importantly, these data originally had 15 items (i.e. sum(Y.have[,2])), some of which got merged, so the final data will need to have 15 rows.
What I want the data to look like is:
####Homogeneous separation - split values as evenly as possible
#' The value of 19 in row 1 is now a vector of c(6,6,7) (or any combination thereof, i.e. c(6,7,6) is fine) since the position in the second column is a 3
#' Rows 2-5 are unchanged since they have a 1 in the second column
#' The value of 4 in row 6 is now a vector of c(2,2) since the position of the second column is a 2
#' Rows 7-9 are unchanged since they have a 1 in the second column
#' The value of 8 in row 10 is now a vector of c(3,3,2) (or any combination thereof) since the position in the second column is a 3
Y.want.hom<-cbind(c(c(6,6,7),1,1,1,1,c(2,2),3,1,1,c(3,3,2)),c(rep(1,times=sum(Y.have[,2]))))
####Heterogeneous separation - split values with as many singles as possible
#' The value of 19 in row 1 is now a vector of c(1,1,17) (or any combination thereof, i.e. c(1,17,1) is fine) since the position in the second column is a 3
#' Rows 2-5 are unchanged since they have a 1 in the second column
#' The value of 4 in row 6 is now a vector of c(1,3) since the position of the second column is a 2
#' Rows 7-9 are unchanged since they have a 1 in the second column
#' The value of 8 in row 10 is now a vector of c(1,1,6) (or any combination thereof) since the position in the second column is a 3
Y.want.het<-cbind(c(c(1,1,17),1,1,1,1,c(1,3),3,1,1,c(1,1,6)),c(rep(1,times=sum(Y.have[,2]))))
Note that the positions of the integers in the final data don't matter since they will all have one index case.
I have tried splitting the data (with split) by index. This creates a list with one element per unique index value. I then iterated through the positions in that list and divided each element by its position.
a<-split(Y.have[,1],Y.have[,2]) #Split into a list according to the index
b<-list() #initiate new list
for (i in 1:length(a)){
b[[i]]<-a[[i]]/i #get homogenous values
b[[i]]<-rep(b[i],times=i) #repeat the values based on the number of indices
}
Y.test<-cbind(unlist(b),rep(1,times=length(unlist(b)))) #create new dataset
This was a terrible approach. First, it produces decimals. Second, the position in the list does not necessarily equal the index number (i.e. if there was no index of 2, the second position would hold the next index present, such as 3, but its values would still be divided by 2).
However, it at least allowed me to separate the data by index, manipulate it, and recombine it to the proper length. I now need help with the middle part: manipulating the data for both homogeneous and heterogeneous reassignment. I would prefer base R, but any approach would be fine! Thank you in advance!
Here is one possible approach.
Create two functions for homogeneous and heterogeneous splits:
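get_hom_ints <- function(M, N) {
# start with the integer share floor(M/N) in each of the N positions
vec <- rep(floor(M/N), N)
# hand out the remainder one unit at a time to the first positions
for (i in seq_len(M - sum(vec))) {
vec[i] <- vec[i] + 1
}
vec
}
get_het_ints <- function(M, N) {
# N - 1 positions get 1; the first position takes what is left
vec <- rep(1, N)
vec[1] <- M - sum(vec) + 1
vec
}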
Then use apply to go through each row of the matrix:
het_vec <- unlist(apply(Y.have, 1, function(x) get_het_ints(x[1], x[2])))
unname(cbind(het_vec, rep(1, length(het_vec))))
hom_vec <- unlist(apply(Y.have, 1, function(x) get_hom_ints(x[1], x[2])))
unname(cbind(hom_vec, rep(1, length(hom_vec))))
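The per-row apply() calls can also be replaced by a fully vectorized base R version built on rep() and sequence(). This is a sketch rather than part of the answer; it assumes every size is at least as large as its index (otherwise the heterogeneous split would produce values below 1).

```r
Y.have <- cbind(c(19,1,1,1,1,4,3,1,1,8), c(3,1,1,1,1,2,1,1,1,3))

size  <- Y.have[, 1]
index <- Y.have[, 2]

# Position of each output element within its original row (1..index)
pos <- sequence(index)
# Expand each row's size and index to one entry per output element
s <- rep(size, index)
n <- rep(index, index)

# Homogeneous: base share s %/% n, plus 1 for the first (s %% n) positions
hom <- s %/% n + (pos <= s %% n)

# Heterogeneous: 1 everywhere except the first position, which takes the rest
het <- ifelse(pos == 1, s - n + 1, 1)

unname(cbind(hom, 1))
unname(cbind(het, 1))
```

The homogeneous rule is the same as in get_hom_ints (floor share plus one extra unit for the first remainder positions), so the two columns should match the outputs shown below.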
Output
(heterogeneous)
[,1] [,2]
[1,] 17 1
[2,] 1 1
[3,] 1 1
[4,] 1 1
[5,] 1 1
[6,] 1 1
[7,] 1 1
[8,] 3 1
[9,] 1 1
[10,] 3 1
[11,] 1 1
[12,] 1 1
[13,] 6 1
[14,] 1 1
[15,] 1 1
(homogeneous)
[,1] [,2]
[1,] 7 1
[2,] 6 1
[3,] 6 1
[4,] 1 1
[5,] 1 1
[6,] 1 1
[7,] 1 1
[8,] 2 1
[9,] 2 1
[10,] 3 1
[11,] 1 1
[12,] 1 1
[13,] 3 1
[14,] 3 1
[15,] 2 1
The partitions package is designed for this type of requirement; check it out.
Apply the logic below to your code and it should work.
For example:
hom <- restrictedparts(19,3) #where 19 is Y.have[,1][1] and 3 is Y.have[,2][1] as per your data
print(hom[,ncol(hom)])
#output : 7 6 6
het <- Reduce(intersect, list(which(hom[2,1:ncol(hom)] %in% 1),which(hom[3,1:ncol(hom)] %in% 1)))
hom[,het]
#output : 17 1 1
One option would be to use integer division (%/%) and the modulus (%%). It may not give exactly the results you specified, i.e. 8 and 3 give (2,2,4) rather than (3,3,2), but it does generally do what you described.
Y.have<-cbind(c(19,1,1,1,1,4,3,1,1,8),c(3,1,1,1,1,2,1,1,1,3))
homoVec <- c()
for (i in 1:length(Y.have[,1])){
if (Y.have[i,2] == 1) {
a = Y.have[i,1]
homoVec <- append(homoVec, a)
} else {
quantNum <- Y.have[i,1]
indexNum <- Y.have[i,2]
b <- quantNum %/% indexNum
r <- quantNum %% indexNum
a <- c(rep(b, indexNum-1), b + r)
homoVec <- append(homoVec, a)
}
}
homoOut <- data.frame(homoVec, 1)
heteroVec <- c()
for (i in 1:length(Y.have[,1])){
if (Y.have[i,2] == 1) {
a = Y.have[i,1]
heteroVec <- append(heteroVec, a)
} else {
quantNum <- Y.have[i,1]
indexNum <- Y.have[i,2]
firstNum <- quantNum - (indexNum - 1)
a <- c(firstNum, rep(1, indexNum - 1))
heteroVec <- append(heteroVec, a)
}
}
heteroOut <- data.frame(heteroVec, 1)
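If it is really important to have the numbers exactly as in your example, then this should work. Note that it relies on round() and is not balanced in general: for example, a size of 6 with an index of 4 gives c(2,2,2,0).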
homoVec <- c()
for (i in 1:length(Y.have[,1])){
if (Y.have[i,2] == 1) {
a = Y.have[i,1]
homoVec <- append(homoVec, a)
} else {
quantNum <- Y.have[i,1]
indexNum <- Y.have[i,2]
b <- round(quantNum/indexNum)
roundSum <- b * (indexNum - 1)
last <- quantNum - roundSum
a <- c(rep(b, indexNum-1), last)
homoVec <- append(homoVec, a)
}
}
homoOut <- data.frame(homoVec, 1)

Incorrect number of subscripts on matrix while assigning values from data frame to matrix

An error message pops up when I assign values from data frame A to matrix B.
A is a data frame containing 9000 observations of 3 variables. The data are simulated values from 1000 iterations; each iteration contains 9 values, i.e. 9 * 1000 = 9000.
V1 is the iteration ID, V2 is the variable name (not useful for now), and V3 is the variable I need.
I create a matrix B to hold values from A[,3]. However, the first value in each iteration is discarded, so only 8 values per iteration are kept.
B <- matrix(NA, nrow = 1000, ncol = 8)
for(i in 1:iter){
for(m in 1:8){
B[i,m] <- A[9*(i-1)+m+1,3]
}
}
Then I got the error message and couldn't figure out why. Any help, suggestions or ideas are welcome!
So, if I understand well, you basically want to fill the matrix row by row with all values of A[,3] except the first value of each group of 9 values.
Instead of using two for loops, you can fill the matrix directly with A[,3] when creating B. matrix() fills column by column, so build a 9 x 1000 matrix (one iteration per column), transpose it, and remove the first column to get your result. The code looks like this:
B <- t(matrix(A$V3, nrow = 9, ncol = 1000))
B <- B[,-1]
Example
We define a data frame A with 3 variables and 9000 observations:
A = data.frame(V1 = rnorm(9000),
V2 = rnorm(9000),
V3 = rnorm(9000))
> head(A)
V1 V2 V3
1 1.0755625 2.82414180 1.76860717
2 0.3421535 0.85857695 0.05682035
3 1.3747495 -0.01151905 0.90259357
4 1.1589849 0.91009114 0.35132258
5 -0.1107268 1.38244412 0.76163226
6 -1.5551836 1.27199029 -0.56923898
Then we apply the code above to generate B, and we can check B:
> head(B[,1:5])
[,1] [,2] [,3] [,4] [,5]
[1,] 0.05682035 0.9025936 0.35132258 0.7616323 -0.5692390
[2,] -0.75018285 -0.6160903 -1.43556979 -0.3983150 2.0722279
[3,] 0.97226064 1.5366989 0.06546405 -0.5666010 2.3127568
[4,] -0.66904980 -1.9877136 -0.49963116 0.9217295 -0.6338961
[5,] 0.42339924 -0.6077871 0.16467356 -0.3301223 -0.6031495
[6,] 0.82212429 0.3383385 -0.26872905 1.1513397 -0.2644223
Notice that the first row of B corresponds to the first nine values of A$V3 without the first one. And if we check the dimensions of B, we see:
> dim(B)
[1] 1000 8
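matrix() also has a byrow argument, which avoids the transpose. The sketch below uses simulated data and checks that the byrow version gives the same result as the original double loop once B is a proper 1000 x 8 matrix; iter = 1000 is an assumption, since the question does not define it.

```r
set.seed(1)
A <- data.frame(V1 = rnorm(9000), V2 = rnorm(9000), V3 = rnorm(9000))
iter <- 1000

# Fill row by row: each row of the 1000 x 9 matrix holds one iteration,
# then drop the first value of every iteration
B_byrow <- matrix(A$V3, nrow = iter, ncol = 9, byrow = TRUE)[, -1]

# The original double loop, with B pre-allocated as a 1000 x 8 matrix
B_loop <- matrix(NA, nrow = iter, ncol = 8)
for (i in 1:iter) {
  for (m in 1:8) {
    B_loop[i, m] <- A[9 * (i - 1) + m + 1, 3]
  }
}

identical(B_byrow, B_loop)
```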

How to iterate through each element in a matrix in R

Context: I am iterating through several variables in my dataset and performing a pairwise t.test between the factors for each of those variables, which I have managed to do. An example of the result I have is:
[Image: table of p-values between classes 11, 12, 13 and 14]
My next task, which I am having difficulty with, is presenting these values as a table: for each element, the table should show whether the test between the two classes passes (1 if the p-value is below a threshold such as 0.05, and 0 otherwise). The table should also show the number of tests passed as a proportion of the number of tests conducted (the number of entries below 0.05 divided by the total number of entries in the lower-triangular matrix). For the table above, the output should look like this:
[Image: ideal matrix]
So the problem is essentially that I have to iterate through the first matrix (excluding the first row and first column), apply a function, and then add a summary row and column. Any help or advice would be appreciated!
R is not really a useful tool to build such a table, but here is one solution.
Data (shortened the decimals for convenience):
mat <- matrix(c(.569, .0001, .1211, NA, .0001, .3262, NA, NA, .0001), nrow = 3)
[,1] [,2] [,3]
[1,] 0.5690 NA NA
[2,] 0.0001 0.0001 NA
[3,] 0.1211 0.3262 1e-04
First we convert to the 0,1 scheme by using ifelse with the condition < .05:
mat <- ifelse(mat < .05, 1, 0)
Then we add another column with the rowSums:
mat <- cbind(mat, rowSums(mat, na.rm = TRUE))
Then we add another row with the colSums of the logical matrix !is.na(mat), thereby counting the non-NA entries in each column:
mat <- rbind(mat, colSums(!is.na(mat)))
Then we change the lower right cell to the sum of the inner matrix divided by the number of non-NA entries in the inner matrix:
inner <- mat[seq_len(nrow(mat) - 1), seq_len(ncol(mat) - 1)]
mat[nrow(mat), ncol(mat)] <- sum(inner, na.rm = TRUE) / sum(!is.na(inner))
Finally, we change the row and column names:
rownames(mat) <- c(12:14, "SumCount")
colnames(mat) <- c(11:13, "SumScore")
End result:
> mat
11 12 13 SumScore
12 0 NA NA 0.0
13 1 1 NA 2.0
14 0 0 1 1.0
SumCount 3 2 1 0.5
Notice that no looping was necessary, as R is very efficient with vectorized operations on matrices.
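The steps above can be collected into one reusable function. This is a sketch: the name summarise_pvals and the alpha argument are hypothetical, and the input is assumed to be a lower-triangular p-value matrix with NA above the diagonal and its dimnames already set.

```r
# Hypothetical helper: turn a p-value matrix into a 0/1 pass matrix with
# per-row pass counts (SumScore), per-column test counts (SumCount) and
# the overall pass ratio in the bottom-right cell.
summarise_pvals <- function(p, alpha = 0.05) {
  pass  <- ifelse(p < alpha, 1, 0)                        # NA stays NA
  ratio <- sum(pass, na.rm = TRUE) / sum(!is.na(pass))    # passed / conducted
  out <- cbind(pass, SumScore = rowSums(pass, na.rm = TRUE))
  rbind(out, SumCount = c(colSums(!is.na(pass)), ratio))
}

p <- matrix(c(.569, .0001, .1211, NA, .0001, .3262, NA, NA, .0001), nrow = 3,
            dimnames = list(12:14, 11:13))
summarise_pvals(p)
```

Computing the ratio from the inner 0/1 matrix before adding the summary row and column avoids having to index the inner block afterwards.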
Here is one way of doing what you want.
First I will make up a matrix.
set.seed(3781)
pval <- matrix(runif(9, 0, 0.07), 3)
is.na(pval) <- upper.tri(pval)
dimnames(pval) <- list(12:14, 11:13)
Now the answer to the question.
Ideal <- matrix(as.integer(pval < 0.05), nrow(pval))
dimnames(Ideal) <- dimnames(pval)
Ideal
# 11 12 13
#12 1 NA NA
#13 1 1 NA
#14 1 0 0
r <- sum(Ideal, na.rm = TRUE)/sum(!is.na(Ideal))
r
#[1] 0.6666667
Now all that is needed is to add the extra row and column.
Ideal <- rbind(Ideal, colSums(!is.na(Ideal)))
Ideal <- cbind(Ideal, rowSums(Ideal, na.rm = TRUE))
Ideal[nrow(pval) + 1, ncol(pval) + 1] <- r
rownames(Ideal)[nrow(pval) + 1] <- "SumCount"
colnames(Ideal)[ncol(pval) + 1] <- "SumScore"

Find the proportion of even numbers per row

I have a matrix with 5 columns and 20 rows. For each row, I want to find the proportion of even numbers in that row and add it to that row. My trouble is finding the proportion of even numbers.
Here is part of the matrix, followed by my attempt:
1 2 3 4 5
[1,] 6 5 1 2 5
x <- apply(matrix, 1, length(matrix %% 2 == 0)/5)
matrix <- cbind(matrix, x)
Take a look at ?"%%". Here is an example:
## reproducible example
set.seed(1)
mat <- matrix(
sample(1:10,5*20,replace = TRUE),
nrow = 20, ncol = 5, byrow = TRUE)
## 1- convert matrix to a logical one using %%
## 2- compute occurrence of TRUE value using the vectorised rowSums
## 3- divide by the number of column to convert occurrence to proportions
rowSums(mat %% 2 ==0)/ncol(mat)
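For comparison with the question's attempt: apply() expects a function as its third argument, and length() counts every element rather than only the TRUE ones. A corrected apply() version, sketched below with the same simulated matrix, should give the same proportions as the rowSums() approach; the names prop_even and mat2 are just illustrative.

```r
set.seed(1)
mat <- matrix(sample(1:10, 5 * 20, replace = TRUE),
              nrow = 20, ncol = 5, byrow = TRUE)

# FUN receives one row at a time; mean() of a logical vector is the
# proportion of TRUE values
prop_even <- apply(mat, 1, function(r) mean(r %% 2 == 0))

# Append the proportions as a new column, as in the question
mat2 <- cbind(mat, prop_even)
head(mat2)
```

The rowSums() version is usually preferable for large matrices because it is vectorized, but the apply() form makes the per-row logic explicit.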

Reorder columns in a matrix

Suppose that I have a matrix A with n rows and m columns, and I want to reorder every column according to the sorting of one specific column.
For instance, order(A[,k]) gives the numeric or alphabetical order of the elements in column k. I now want to sort every column of A according to those rankings, so that elements 1...n in every column are ordered to correspond to elements 1...n (by rank) in column k. Is there a simple way to do this without looping over all columns?
Just use:
A[order(A[,k]),]
For example:
set.seed(21)
A <- matrix(rnorm(50),10,5)
A[order(A[,1]),]
To elaborate on @Joshua's answer: I think the confusion may arise from the fact that you are ordering on a column but then passing that ordering as an index to the rows.
That's likely why you tried A[, order(A[,k])] instead of A[order(A[,k]),].
Contrary to its name, order(x) does not sort x; it returns the permutation of indices that would sort x.
For example:
set.seed(1)
A <- matrix(sample(LETTERS[2:8], 24, replace = TRUE), ncol=6)
print(A, quote=F)
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] C C F F G H
[2,] D H B D H C
[3,] F H C G D F
[4,] H F C E G B
order(A[, 2])
[1] 1 4 2 3
*Note that the output is only 4 elements long, which is the number of rows of A, not columns.*
The output essentially says that within column 2 of A,
the 1st element goes first,
the 4th element goes second,
the 2nd element goes third,
etc.
But each element of a column of A belongs to a row, so we need to reorder the rows, not the columns.
To apply that ordering to the entire matrix (or data frame), we use the ordering as a row index:
rowIndex <- order(A[, 2])
# Note that these are all equivalent
A[rowIndex, ]
A[order(A[, 2]), ]
A[c(1, 4, 2, 3), ]
Lastly, we can pass order() more than one vector, and it will use subsequent vectors to break ties.
However, regardless of the number of columns from A we give it, order will still give us a single vector, equal in size to the number of rows of A:
# Order according to column 2; ties are left according to their original order
order(A[, 2])
[1] 1 4 2 3
# Order according to column 2; ties are ordered according to column 5
order(A[, 2], A[, 5])
[1] 1 4 3 2
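A small deterministic example (no random seed, so the orders are fixed) showing that A[order(A[, k]), ] moves whole rows and how ties are handled. The matrix here is made up for illustration.

```r
# Column 1 has a tie: rows 2 and 4 both contain 1
A <- matrix(c(3, 1, 2, 1,
              10, 40, 30, 20), ncol = 2)

ord <- order(A[, 1])    # 2 4 3 1: tied rows 2 and 4 keep their original order
A[ord, ]                # whole rows move together, so each row stays intact

order(A[, 1], A[, 2])   # 4 2 3 1: the tie is broken by column 2
order(A[, 1], decreasing = TRUE)   # descending order of column 1
```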
