Index matrix with randomly sampled columns - r

I'm trying to fill the columns of an index matrix with samples from 1:whatever using a for loop, as part of a bootstrap coding problem. The issue is that the for loop won't run correctly once it reaches a number that does not divide the number of rows evenly. For some reason R thinks I want an equal representation of the numbers in each column. How do I get this to stop?
index.mat=matrix(NA,nr=12,nc=10,byrow=FALSE)
for(i in 1:5)
{
index.mat[,i] <- sample(1:i, i, replace=TRUE)
print(index.mat)
}
will print
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 1 1 2 1 NA NA NA NA NA NA
[2,] 1 1 2 4 NA NA NA NA NA NA
[3,] 1 1 2 1 NA NA NA NA NA NA
[4,] 1 1 2 2 NA NA NA NA NA NA
[5,] 1 1 2 1 NA NA NA NA NA NA
[6,] 1 1 2 4 NA NA NA NA NA NA
[7,] 1 1 2 1 NA NA NA NA NA NA
[8,] 1 1 2 2 NA NA NA NA NA NA
[9,] 1 1 2 1 NA NA NA NA NA NA
[10,] 1 1 2 4 NA NA NA NA NA NA
[11,] 1 1 2 1 NA NA NA NA NA NA
[12,] 1 1 2 2 NA NA NA NA NA NA
as the final matrix before giving the error
Error in index.mat[, i] <- sample(1:i, i, replace = TRUE) :
number of items to replace is not a multiple of replacement length

Just use sample(i, size = 12, replace = TRUE).
Your LHS is index.mat[,i] which has length 12.
Your RHS is sample(1:i, i, replace = TRUE), which has length i.
By design, R recycles the RHS when the lengths don't match. When i=2, your RHS has length 2, so it is simply repeated 6 times to match the LHS length of 12.
In this particular case, if the RHS length isn't a divisor of the LHS length, you get an error. That first happens when, you guessed it, i=5 (since 1, 2, 3, and 4 all divide 12 evenly).
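A minimal sketch of the corrected loop, assuming the goal is to fill every column with a full-length resample. The names n_rows and n_cols are introduced here for illustration, and the loop runs over all 10 columns rather than the 1:5 used in the question.

```r
set.seed(1)  # only so the sketch is reproducible
n_rows <- 12
n_cols <- 10
index.mat <- matrix(NA_integer_, nrow = n_rows, ncol = n_cols)

for (i in seq_len(n_cols)) {
  # sample(i, ...) draws from 1:i; size = n_rows yields one value per row,
  # so the RHS always has the same length as the column and nothing is recycled
  index.mat[, i] <- sample(i, size = n_rows, replace = TRUE)
}
index.mat
```

Because the size argument is tied to the number of rows rather than to i, the assignment works for any i, including values that do not divide 12.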

Related

Produce a triangular matrix of integers increasing by 1

I am trying to produce a matrix of variable dimensions of the form below (i.e. integers increasing by 1 at a time, with a lower triangle of NAs)
NA 1 2 3 4
NA NA 5 6 7
NA NA NA 8 9
NA NA NA NA 10
NA NA NA NA 11
I have used the below code
sample_vector <- c(1:(total_nodes^2))
sample_matrix <- matrix(sample_vector, nrow=total_nodes, byrow=FALSE)
sample_matrix[lower.tri(sample_matrix, diag = TRUE)] <- NA
However the matrix I get with this method is of the form:
NA 2 3 4 5
NA NA 8 9 10
NA NA NA 14 15
NA NA NA NA 20
NA NA NA NA 25
How about this
total_nodes <- 5
sample_matrix <- matrix(NA, nrow=total_nodes, ncol=total_nodes)
sample_matrix[lower.tri(sample_matrix)]<-1:sum(lower.tri(sample_matrix))
sample_matrix <- t(sample_matrix)
sample_matrix
# [,1] [,2] [,3] [,4] [,5]
# [1,] NA 1 2 3 4
# [2,] NA NA 5 6 7
# [3,] NA NA NA 8 9
# [4,] NA NA NA NA 10
# [5,] NA NA NA NA NA
I'm using the diag function to construct a matrix and upper.tri to turn it into a "target" as well as a logical indexing tool:
upr5 <- upper.tri(diag(5))
upr5
upr5[upr5] <- 1:sum(upr5)
upr5[upr5==0] <- NA # would otherwise have been zeroes
upr5
[,1] [,2] [,3] [,4] [,5]
[1,] NA 1 2 4 7
[2,] NA NA 3 5 8
[3,] NA NA NA 6 9
[4,] NA NA NA NA 10
[5,] NA NA NA NA NA
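Note that the two answers number the cells in different orders: the first fills the upper triangle row by row, the second column by column. Since the question asks for variable dimensions, the row-by-row version could be wrapped in a small function. This is only a sketch, and upper_seq_by_row is a hypothetical name introduced here.

```r
# Hypothetical helper: n x n matrix with 1, 2, ... placed row by row
# above the diagonal and NA everywhere else.
upper_seq_by_row <- function(n) {
  m <- matrix(NA_integer_, nrow = n, ncol = n)
  # lower.tri() cells are assigned in column-major order; transposing
  # turns that into a row-by-row fill of the upper triangle
  m[lower.tri(m)] <- seq_len(n * (n - 1) / 2)
  t(m)
}

upper_seq_by_row(4)
```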

R matchingMarkets::hri() Error in x[y] : invalid subscript type 'list'

I am trying to use the Gale-Shapley algorithm in R matchingMarkets::hri() to assign 10 students (A-J) to 6 groups (1-6) based on their preferences and subject to capacity constraints in each group. Each student ranks their top 3 choices for groups and all other choices are null. My issue is that
> hri(nSlots=capacities$capacity, s.prefs = student_prefs_matrix, c.prefs = null_matrix)
returns this:
Error in x[y] : invalid subscript type 'list'
hri() does allow missing values, according to the documentation (unlike the similar matchingR::galeShapley.collegeAdmissions()), so that is not where the issue lies. I compared my inputs to the example in the documentation (p. 7) and they all have the same type of structure. Here are my inputs:
> student_prefs_matrix
a b c d e f g h i j
1 3 1 NA NA NA NA NA NA NA NA
2 NA NA 3 NA NA 3 3 2 2 2
3 NA NA NA NA 1 2 NA NA 1 3
4 1 3 NA 3 NA NA NA NA NA NA
5 NA 2 2 1 3 1 1 1 3 1
6 2 NA 1 2 2 NA 2 3 NA NA
> null_matrix
1 2 3 4 5 6
a NA NA NA NA NA NA
b NA NA NA NA NA NA
c NA NA NA NA NA NA
d NA NA NA NA NA NA
e NA NA NA NA NA NA
f NA NA NA NA NA NA
g NA NA NA NA NA NA
h NA NA NA NA NA NA
i NA NA NA NA NA NA
j NA NA NA NA NA NA
> capacities$capacity
[1] 2 2 2 2 1 1
Can anyone give a hint as to what this error may mean? The only list (vector) I give is for nSlots, which is supposed to be a list. Alternatively, is there a better way to solve this matching problem? I know Gale-Shapley is meant for two-sided matchings, but I thought this might work anyway if I always look for the "student optimal" matching. Thanks for the help! This is my first time posting a question on here.
To fix your problem, make sure to provide the preference lists in the appropriate format. See the documentation of the matchingMarkets package at https://matchingmarkets.org/hri.html for several examples on this.
Let us look at an example with 7 students, 2 colleges with 3 slots each, and given preference lists as follows:
> s.prefs <- matrix(c(1,2, 1,2, 1,NA, 1,2, 1,2, 1,2, 1,2), 2,7)
> c.prefs <- matrix(c(1,2,3,4,5,6,7, 1,2,3,4,5,NA,NA), 7,2)
> hri(s.prefs=s.prefs, c.prefs=c.prefs, nSlots=c(3,3))
> s.prefs
[,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,] 1 1 1 1 1 1 1
[2,] 2 2 NA 2 2 2 2
> c.prefs
[,1] [,2]
[1,] 1 1
[2,] 2 2
[3,] 3 3
[4,] 4 4
[5,] 5 5
[6,] 6 NA
[7,] 7 NA
For your specification of the preference lists, there are two problems. The most obvious problem is your c.prefs argument (null_matrix), which currently indicates that none of the colleges find any student acceptable. Thus no stable matching can exist. The other problem is in your student_prefs_matrix: each column should list the acceptable colleges in order of preference, not hold a rank for each college. In your example, student a finds colleges 4, 6, and 1 acceptable (in that order). Thus the preference list should be:
> student_prefs_matrix
a b c d ...
[1,] 4 1 6 5 ...
[2,] 6 5 5 6 ...
[3,] 1 4 2 4 ...
[4,] NA NA NA NA ...
[5,] NA NA NA NA ...
[6,] NA NA NA NA ...
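The conversion from a rank matrix (one rank per group) to the preference-order layout shown above can be done in base R. The following is a sketch only: rank_to_pref is a hypothetical helper, not part of matchingMarkets, and it assumes that rank 1 means most preferred and NA means unacceptable.

```r
# The rank matrix from the question: rows are groups 1-6, columns are students a-j
student_prefs_matrix <- matrix(
  c( 3,  1, NA, NA, NA, NA, NA, NA, NA, NA,
    NA, NA,  3, NA, NA,  3,  3,  2,  2,  2,
    NA, NA, NA, NA,  1,  2, NA, NA,  1,  3,
     1,  3, NA,  3, NA, NA, NA, NA, NA, NA,
    NA,  2,  2,  1,  3,  1,  1,  1,  3,  1,
     2, NA,  1,  2,  2, NA,  2,  3, NA, NA),
  nrow = 6, byrow = TRUE,
  dimnames = list(as.character(1:6), letters[1:10])
)

# Hypothetical helper: for each student (column), list the group indices
# in order of increasing rank and pad the rest of the column with NA.
rank_to_pref <- function(rank_mat) {
  apply(rank_mat, 2, function(r) {
    ord <- order(r, na.last = NA)            # ranked groups only, best first
    c(ord, rep(NA, length(r) - length(ord)))
  })
}

s.prefs <- rank_to_pref(student_prefs_matrix)
s.prefs[, 1:4]
```

The first three rows of columns a to d should reproduce the preference lists given above. The result would still need a c.prefs matrix in which the groups actually rank the students before being passed to hri().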

Create a Triangular Matrix from a Vector performing sequential operations

I have been trying to solve the following problem.
Suppose I have the following vector:
aux1<-c(0,0,0,4,5,0,7,0,0,10,11,12) where the numbers represent the number of the row.
I want to calculate the distance between the different elements of this vector, fixing the first component, then the second, and so on.
If the element is zero, I do not want to count it, so I put a NA instead. The output I want should look like this:
NA NA NA NA NA
NA NA NA NA NA
NA NA NA NA NA
NA NA NA NA NA
1 NA NA NA NA
NA NA NA NA NA
3 2 NA NA NA
NA NA NA NA NA
NA NA NA NA NA
6 5 3 NA NA
7 6 4 1 NA
8 7 5 2 1
In the first column, I have the difference between the first element different from zero and all other elements, i.e., Matrix[5,1]=5-4=1 and Matrix[12,1]=12-4=8. Also, Matrix[7,2]=7-5=2, where 5 is the second element in the vector non-equal to zero. Notice that Matrix[10,3]=10-7=3, where 7 is third element non-equal to zero, but the seventh element in my vector.
I have tried to do this in a loop. My current code looks like this:
M=matrix(nrow=N-1, ncol=N-1)
for (i in 1:N-1){
for (j in 1:N-1){
if(j<=i)
next
else
if(aux1[j]>0)
M[j,i]=aux1[j]-aux1[i]
else
M[j,i]=0
}
}
Unfortunately, I have not been able to solve my problem. Any help would be greatly appreciated.
You could try something like the following (with generous help from @thela)
res <- outer(aux1, head(aux1[aux1 > 0], -1), `-`)
is.na(res) <- res <= 0
# [,1] [,2] [,3] [,4] [,5]
# [1,] NA NA NA NA NA
# [2,] NA NA NA NA NA
# [3,] NA NA NA NA NA
# [4,] NA NA NA NA NA
# [5,] 1 NA NA NA NA
# [6,] NA NA NA NA NA
# [7,] 3 2 NA NA NA
# [8,] NA NA NA NA NA
# [9,] NA NA NA NA NA
# [10,] 6 5 3 NA NA
# [11,] 7 6 4 1 NA
# [12,] 8 7 5 2 1
Using sapply and ifelse:
sapply(head(aux1[aux1>0],-1), function(y) ifelse(aux1-y>0, aux1-y, NA))
You loop over the positive values (dropping the last one), then subtract each of them from the original vector. ifelse replaces the non-positive differences with NA.
# [,1] [,2] [,3] [,4] [,5]
# [1,] NA NA NA NA NA
# [2,] NA NA NA NA NA
# [3,] NA NA NA NA NA
# [4,] NA NA NA NA NA
# [5,] 1 NA NA NA NA
# [6,] NA NA NA NA NA
# [7,] 3 2 NA NA NA
# [8,] NA NA NA NA NA
# [9,] NA NA NA NA NA
# [10,] 6 5 3 NA NA
# [11,] 7 6 4 1 NA
# [12,] 8 7 5 2 1
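As a quick check, the two approaches can be run side by side on the vector from the question. This is a sketch; pivots, res_outer and res_sapply are names introduced here for illustration.

```r
aux1 <- c(0, 0, 0, 4, 5, 0, 7, 0, 0, 10, 11, 12)

# The non-zero values, minus the last one, are the reference points (one per column)
pivots <- head(aux1[aux1 > 0], -1)

# Approach 1: all pairwise differences at once, then blank out the non-positive ones
res_outer <- outer(aux1, pivots, `-`)
is.na(res_outer) <- res_outer <= 0

# Approach 2: one column per reference point
res_sapply <- sapply(pivots, function(y) ifelse(aux1 - y > 0, aux1 - y, NA))

# Compare the two results and spot-check the cells mentioned in the question
isTRUE(all.equal(res_outer, res_sapply))
res_outer[c(5, 12), 1]   # Matrix[5,1] and Matrix[12,1]
res_outer[7, 2]          # Matrix[7,2]
res_outer[10, 3]         # Matrix[10,3]
```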

How to order columns and rows to create a relatively dense submatrix

I have a large matrix which comprises 1,2 and missing (coded as NA) values. The matrix has 500000 rows by 10000 columns. There are approximately 0.05% 1- or 2-values, and the remaining values are NA.
I would like to reorder the rows and columns of the matrix so that the top left corner of the matrix comprises a relatively high number of 1s and 2s compared to the rest of the matrix. In other words, I would like to create a relatively datarich subset of the matrix, by reordering the matrix rows and columns.
Is there an efficient method of achieving this in R?
In particular, I'm interested in solutions where sorting by the number of non-NA values in each row and column is not sufficient to produce a dense corner.
In addition, I'll add a constraint. The size of the dense corner will be pre-defined.
In the following example, the goal is to reorder the rows and columns so that the top leftmost 3x3 submatrix is relatively dense (i.e. few or no NA values).
m1 <- matrix(c(rep(c(rep(NA, 3), rep(1, 7)), 1),
rep(c(rep(2, 3), rep(NA, 7)), 7),
rep(c(rep(NA, 3), rep(1, 7)), 2)
), nrow=10, byrow=TRUE)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] NA NA NA 1 1 1 1 1 1 1
[2,] 2 2 2 NA NA NA NA NA NA NA
[3,] 2 2 2 NA NA NA NA NA NA NA
[4,] 2 2 2 NA NA NA NA NA NA NA
[5,] 2 2 2 NA NA NA NA NA NA NA
[6,] 2 2 2 NA NA NA NA NA NA NA
[7,] 2 2 2 NA NA NA NA NA NA NA
[8,] 2 2 2 NA NA NA NA NA NA NA
[9,] NA NA NA 1 1 1 1 1 1 1
[10,] NA NA NA 1 1 1 1 1 1 1
The rows and columns are ordered by the number of non-NA values (using code from an answer below):
m1 <- m1[order(rowSums(is.na(m1))), order(colSums(is.na(m1)))]
However, this does not result in a dense 3x3 top leftmost corner:
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] NA NA NA 1 1 1 1 1 1 1
[2,] NA NA NA 1 1 1 1 1 1 1
[3,] NA NA NA 1 1 1 1 1 1 1
[4,] 2 2 2 NA NA NA NA NA NA NA
[5,] 2 2 2 NA NA NA NA NA NA NA
[6,] 2 2 2 NA NA NA NA NA NA NA
[7,] 2 2 2 NA NA NA NA NA NA NA
[8,] 2 2 2 NA NA NA NA NA NA NA
[9,] 2 2 2 NA NA NA NA NA NA NA
[10,] 2 2 2 NA NA NA NA NA NA NA
I thought that there may be a set of optimisation procedures that I could implement, as my working matrix is too large to do the reorganisation by eye.
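To make "dense corner" concrete, one option is to measure the fraction of non-NA cells in the top-left k x k block. The sketch below only illustrates the problem on the example matrix; it is not an optimisation method, and corner_density is a hypothetical helper introduced here.

```r
# Hypothetical helper: share of non-NA cells in the top-left k x k block
corner_density <- function(m, k = 3) {
  mean(!is.na(m[seq_len(k), seq_len(k), drop = FALSE]))
}

m1 <- matrix(c(rep(c(rep(NA, 3), rep(1, 7)), 1),
               rep(c(rep(2, 3), rep(NA, 7)), 7),
               rep(c(rep(NA, 3), rep(1, 7)), 2)),
             nrow = 10, byrow = TRUE)

# Ordering by NA counts puts the three long rows first, so the corner is empty
by_counts <- m1[order(rowSums(is.na(m1))), order(colSums(is.na(m1)))]
corner_density(by_counts)

# A hand-picked ordering that moves three of the rows containing 2s to the top
by_hand <- m1[c(2:4, 1, 5:10), ]
corner_density(by_hand)
```

On this example the count-based ordering gives a corner density of 0, while the hand-picked ordering gives 1, which is the gap an optimisation procedure would need to close.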

Assign Value to Diagonal Entries of Matrix

I need to access and assign single slots of an m*n matrix inside a for loop. The code so far:
rowCount <- 9
similMatrix = matrix(nrow = rowCount - 1, ncol = rowCount)
show(similMatrix)
for(i in (rowCount - 1)){
for (j in rowCount)
if (i == j){
similMatrix[i == j] <- 0;
}
}
show(similMatrix)
so if i = j the NA value in the matrix needs to be replaced with 0.
You want the function diag<-
m <- matrix(1:12, nrow=3)
m
[,1] [,2] [,3] [,4]
[1,] 1 4 7 10
[2,] 2 5 8 11
[3,] 3 6 9 12
diag(m) <- 0
m
[,1] [,2] [,3] [,4]
[1,] 0 4 7 10
[2,] 2 0 8 11
[3,] 3 6 0 12
For the purpose of setting the "diagonal" elements to zero you have already been given an answer, but I wonder if you were hoping for something more general. The reasons for lack of success with that code were two-fold: the construction of your loop sequences was flawed and the indexing was wrong. This would have succeeded:
for(i in 1:(rowCount - 1)){ # need an expression that returns a sequence
for (j in 1:rowCount) # ditto
if (i == j){
similMatrix[i,j] <- 0; # need to index the matrix with two elements if using i,j
}
}
#----------
> show(similMatrix)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
[1,] 0 NA NA NA NA NA NA NA NA
[2,] NA 0 NA NA NA NA NA NA NA
[3,] NA NA 0 NA NA NA NA NA NA
[4,] NA NA NA 0 NA NA NA NA NA
[5,] NA NA NA NA 0 NA NA NA NA
[6,] NA NA NA NA NA 0 NA NA NA
[7,] NA NA NA NA NA NA 0 NA NA
[8,] NA NA NA NA NA NA NA 0 NA
But loops in R are generally considered a last resort (sometimes for the wrong reasons). There is a much more compact way of doing the same "loop" operation, and it generalizes more widely than just setting the diagonal.
similMatrix[ row(similMatrix) == col(similMatrix) ] <- 0
> similMatrix
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
[1,] 0 NA NA NA NA NA NA NA NA
[2,] NA 0 NA NA NA NA NA NA NA
[3,] NA NA 0 NA NA NA NA NA NA
[4,] NA NA NA 0 NA NA NA NA NA
[5,] NA NA NA NA 0 NA NA NA NA
[6,] NA NA NA NA NA 0 NA NA NA
[7,] NA NA NA NA NA NA 0 NA NA
[8,] NA NA NA NA NA NA NA 0 NA
If you wanted to set the subdiagonal to zero you could just use:
similMatrix[ row(similMatrix)-1 == col(similMatrix) ] <- 0
You can avoid generating the extra row and col matrices using this:
mind <- min( dim(similMatrix) )
# avoid going outside the dimensions if the matrix is not square
similMatrix[ cbind( seq(mind), seq(mind) ) ] <- 0
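The three approaches from the answers can be compared on the 8 x 9 matrix from the question. This is a sketch; the names template, a, b, d and k are introduced here for illustration.

```r
rowCount <- 9
template <- matrix(NA_real_, nrow = rowCount - 1, ncol = rowCount)

# 1. The diag<- replacement function
a <- template
diag(a) <- 0

# 2. Logical mask built from row() and col()
b <- template
b[row(b) == col(b)] <- 0

# 3. Two-column index matrix of (row, column) pairs
d <- template
k <- seq_len(min(dim(d)))   # stop at the shorter dimension
d[cbind(k, k)] <- 0

# Do all three give the same matrix?
identical(a, b) && identical(b, d)
```

The matrix-index form in step 3 touches only the cells being changed, which matters for large matrices where building full row() and col() matrices would be wasteful.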
