I'm trying to subset a matrix so that I keep only the rows where the first column is larger than the second. The matrix, out, is 3000x2.
I tried
out<-out[out[,1] > out[,2]]
but this eliminates the row.names altogether, and I get a string of integers between 1 and 3000. Would there be a way to preserve the row.names?
Of note, if the subset consists of a single row, R drops the result to a plain vector and the row name is lost unless you pass drop = FALSE:
m <- matrix(1:9, ncol = 3)
rownames(m) <- c("a", "b", "c")
m[1, ] # lost the row name
m[1, , drop = FALSE] # got row name back and a matrix
m[c(1,1), ] # the row name is back when result has nrow > 1
The simplest workaround is to pass drop = FALSE whenever the result might have only one row; otherwise you have to check for a one-row result and assign the row name yourself.
R stores a matrix as a vector that carries a dim attribute for its rows and columns.
> A <- matrix(1:9, ncol=3)
# A is filled with 1,...,9 columnwise
> A
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
# only elements with even number in 2nd column of same row
> v <- A[A[,2] %% 2 == 0]
> m <- A[A[,2] %% 2 == 0,]
> v
[1] 1 3 4 6 7 9
> m
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    3    6    9
# The result of evaluating the odd/even-ness of the middle column.
# This logical vector is recycled column-wise by default
# until the fate of every element in A is determined.
> A[,2] %% 2 == 0
[1] TRUE FALSE TRUE
When you leave out the comma (v), you address A as a one-dimensional data structure and R implicitly handles your expression as a vector.
v is in that sense not a "string of integers" but a vector of integers. When you add the comma, you tell R that your condition only addresses the first dimension while indicating a second one (after the comma), which causes R to handle your expression as a matrix (m).
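Applied to the original question, the same comma fixes the subsetting and keeps the row names. Below is a minimal sketch on a small made-up matrix (the values and the row names r1 to r3 are assumptions for illustration); drop = FALSE is added so that a single matching row still comes back as a matrix:

```r
# Small stand-in for the 3000x2 matrix; values and names are made up.
out <- matrix(c(5, 1, 9,
                2, 4, 3),
              ncol = 2,
              dimnames = list(c("r1", "r2", "r3"), c("x", "y")))

keep <- out[, 1] > out[, 2]        # one logical value per row
res  <- out[keep, , drop = FALSE]  # the comma selects rows, not elements

rownames(res)                      # row names of the rows that were kept
```

Without the comma, the logical vector is recycled over all elements and the result is a plain vector, which is why the row names disappeared.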
Suppose we have a "test" matrix that looks like this: (1,2,3, 4,5,6, 7,8,9, 10,11,12) generated by running test <- matrix(1:12, ncol = 4). A simple 3 x 4 (rows x columns) matrix of numbers running from 1 to 12.
Now suppose we'd like to add a value of 1 to each element in each odd-numbered matrix column, so we end up with a matrix of the following values: (2,3,4, 4,5,6, 8,9,10, 10,11,12). How would we use an apply() function to do this?
Note that this is a simplified example. In the more complete code I'm working with, the matrix dynamically expands and contracts based on user inputs, so I need an apply() function that counts the actual number of matrix columns rather than assuming a fixed 4 columns as in the example above. (And I'm not adding a value of 1 to the elements; I'm running the parallel minima function, test[,1] <- pmin(test[,1], 5), to cap each value at 5, say.)
With my current limited understanding of the apply() family of functions, all I can so far do is apply(test, 2, function(x) {return(x+1)}) but this is adding a value of 1 to all elements in all columns rather than only the odd-numbered columns.
You may simply subset the input data frame to access only odd or even numbered columns. Consider:
test[c(TRUE, FALSE)] <- apply(test[c(TRUE, FALSE)], 2, function(x) f(x))
test[c(FALSE, TRUE)] <- apply(test[c(FALSE, TRUE)], 2, function(x) f(x))
This works because the recycling rules in R cause e.g. c(TRUE, FALSE) to be repeated as many times as needed to cover all columns in the input test data frame.
For a matrix, we need to use the drop=FALSE flag when subsetting the matrix in order to keep it in matrix form when using apply():
test <- matrix(1:12, ncol = 4)
test[,c(TRUE, FALSE)] <- apply(test[,c(TRUE, FALSE),drop=FALSE], 2, function(x) x+1)
test
     [,1] [,2] [,3] [,4]
[1,]    2    4    8   10
[2,]    3    5    9   11
[3,]    4    6   10   12
        ^         ^  ... these columns incremented by 1
You may use modulo %% 2.
odd <- seq_len(ncol(test)) %% 2 == 1
test[, odd] <- apply(test[, odd, drop = FALSE], 2, function(x) x + 1)
#      [,1] [,2] [,3] [,4]
# [1,]    2    4    8   10
# [2,]    3    5    9   11
# [3,]    4    6   10   12
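For the pmin() use case mentioned in the question, apply() may not be needed at all, since pmin() already works element-wise on a matrix. A minimal sketch, assuming the cap of 5 from the question and the same 3 x 4 test matrix; the odd-column mask is computed from ncol(test), so it adapts if the matrix grows or shrinks:

```r
test <- matrix(1:12, ncol = 4)

# Logical mask for odd-numbered columns, derived from the actual column count.
odd <- seq_len(ncol(test)) %% 2 == 1

# Cap the odd-numbered columns at 5; drop = FALSE keeps a one-column
# selection in matrix form.
test[, odd] <- pmin(test[, odd, drop = FALSE], 5)
test
```

Columns 2 and 4 are left untouched, while every value above 5 in columns 1 and 3 is replaced by 5.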
Let's say I want to create a sparse matrix SMatrix where all non-zero values are 1.
I already have a matrix of positions, where column 1 stores row index and column 2 stores col index:
vec1 <- c(10,1)
vec2 <- c(12,1)
vec3 <- c(2,3)
positions <- matrix(c(vec1, vec2, vec3),
                    ncol = 2,
                    dimnames = list(NULL, c("row", "col")),
                    byrow = TRUE)
positions
     row col
[1,]  10   1
[2,]  12   1
[3,]   2   3
I can create the vectors x and i, which will be the equivalent of SMatrix@x and SMatrix@i, like this:
x <- rep(1, nrow(positions))
i <- positions[order(positions[,2]),1] - 1
But how can I create the vector p, which should be the equivalent of SMatrix@p?
You can use Matrix::sparseMatrix to get the compressed, or pointer representation of the row or column indices.
Matrix::sparseMatrix(positions[,1], positions[,2], x=1)@p
#[1] 0 2 2 3
or use diffinv like:
diffinv(c(table(factor(positions[,2], seq_len(max(positions[,2]))))))
#[1] 0 2 2 3
This does the opposite of
dp <- diff(p)
rep(seq_along(dp),dp)
which is what the manual gives for expanding p into row or column indices.
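The relationship between the column indices and p can also be sketched in base R with tabulate() and cumsum(), which is just another way of writing the diffinv() call. This is a minimal sketch that assumes the matrix has no empty trailing columns, so the number of columns is taken as the largest column index in positions:

```r
positions <- matrix(c(10, 1,
                      12, 1,
                       2, 3),
                    ncol = 2, byrow = TRUE,
                    dimnames = list(NULL, c("row", "col")))

n_col  <- max(positions[, "col"])                       # assumed column count
counts <- tabulate(positions[, "col"], nbins = n_col)   # entries per column
p      <- c(0L, cumsum(counts))                         # 0-based column pointers

# Round trip: expanding p gives back the sorted column indices.
dp       <- diff(p)
expanded <- rep(seq_along(dp), dp)
```

Each entry of p says where a column starts within x and i, so an empty column (column 2 here) simply repeats the previous pointer.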
I have a numeric matrix, and I need to extract the set of elements with the largest possible sum, subject to the constraint that no 2 elements can come from the same row or the same column. Is there any efficient algorithm for this, and is there an implementation of that algorithm for R?
For example, if the matrix is (using R's matrix notation):
     [,1] [,2] [,3]
[1,]    7    1    9
[2,]    8    4    2
[3,]    3    6    5
then the unique solution is [1,3], [2,1], [3,2], which extracts the numbers 9, 8, and 6 for a total of 23. However, if the matrix is:
     [,1] [,2] [,3]
[1,]    6    2    1
[2,]    4    9    5
[3,]    8    7    3
then there are 3 equally good solutions: 1,8,9; 3,6,9; and 5,6,7. These all add up to 18.
Additional notes:
If there are multiple equally good solutions, I need to find all of them. (Being able to find additional solutions that are almost as good would be useful as well, but not essential.)
The matrix elements are all non-negative, and many of them will be zero. Each row and column will contain at least 1 element that is nonzero.
The matrix can contain repeated elements.
The matrix need not be square. It might have more rows than columns or vice versa, but the constraint is always the same: no row or column may be used twice.
This problem could also be reformulated as finding a maximal-scoring set of edges between the 2 halves of a bipartite graph without re-using any node.
If it helps, you may assume that there is some small fixed k such that no row or column contains more than k non-zero values.
If anyone is curious, the rows of the matrix represent items to be labeled, the columns represent the labels, and each matrix element represents the "consistency score" for assigning a label to an item. I want to assign each label to exactly one item in the way that maximizes the total consistency.
My suggestion would be to (1) find all the combinations of elements in which no two elements come from the same row or the same column, (2) calculate the sum of the elements in each combination, and (3) find the maximum sum and the corresponding combination.
Here I only show the square matrix case; the non-square case follows a similar idea.
(1) Suppose the matrix is n*n. Keeping the row order as 1 to n, all I need to do is find all the permutations of the column indices (1:n). Combining the row indices with one permutation of the column indices gives the positions of the elements in one combination that follows the rule, and in this way I can identify the positions of the elements in all the combinations.
library(combinat)
## provides permn()
matrix_data <- matrix(c(6,2,1,4,9,5,8,7,3), byrow=TRUE, nrow = 3)
## example matrix
n_length <- dim(matrix_data)[1]
## row length
all_permutation <- permn(c(1:n_length))
## list of all the permutations of columns index
(2) Find sum of elements in each combination
index_func <- function(x){ ## x will be a permutation from the list all_permutation
  matrix_indexs <- matrix(data = c(c(1:n_length),x),
                          byrow = FALSE, nrow = n_length)
  ## combine row index and column index to construct the positions of the elements in the matrix
  matrix_elements <- matrix_data[matrix_indexs]
  ## extract the elements based on their position
  matrix_combine <- cbind(matrix_indexs,matrix_elements)
  ## combine the above two matrices
  return(matrix_combine)
}
results <- sapply(all_permutation, function(x) sum(index_func(x)[,"matrix_elements"]))
## find the sums of all the combination
(3) Find the maximum sum and corresponding combination
max(results) ## 18 maximum sum is 18
max_index <- which(results==max(results)) ## 1 2 4 there are three combinations
## if you want the complete position index
lapply(all_permutation[max_index], index_func)
## output, first column is row index, second column is column index, last column is the corresponding matrix elements
[[1]]
         matrix_elements
[1,] 1 1               6
[2,] 2 2               9
[3,] 3 3               3
[[2]]
         matrix_elements
[1,] 1 1               6
[2,] 2 3               5
[3,] 3 2               7
[[3]]
         matrix_elements
[1,] 1 3               1
[2,] 2 2               9
[3,] 3 1               8
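The non-square case mentioned above can be sketched in base R by enumerating ordered selections of distinct columns instead of full permutations. This is only a brute-force illustration for small matrices, not an efficient algorithm; the helper names k_permutations and best_assignments are made up for this sketch, and it assumes the matrix has at most as many rows as columns (transpose it first otherwise):

```r
# All ordered selections of k distinct values from `pool` (k-permutations).
k_permutations <- function(pool, k) {
  if (k == 0) return(list(integer(0)))
  out <- list()
  for (p in pool) {
    for (rest in k_permutations(setdiff(pool, p), k - 1)) {
      out[[length(out) + 1]] <- c(p, rest)
    }
  }
  out
}

# Brute force: try every way of giving each row its own column.
best_assignments <- function(m) {
  n <- nrow(m)
  stopifnot(n <= ncol(m))  # assumption of this sketch
  perms <- k_permutations(seq_len(ncol(m)), n)
  sums  <- vapply(perms,
                  function(cols) sum(m[cbind(seq_len(n), cols)]),
                  numeric(1))
  list(best = max(sums), columns = perms[sums == max(sums)])
}

square <- matrix(c(6, 2, 1, 4, 9, 5, 8, 7, 3), byrow = TRUE, nrow = 3)
wide   <- square[1:2, ]  # 2 x 3 example
res_square <- best_assignments(square)
res_wide   <- best_assignments(wide)
```

Each element of columns lists, row by row, which column was chosen, so ties are returned together. The number of candidates grows factorially, which is why this only suits small inputs.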
Here are 2 options:
1) Approaching this as an optimization problem where the objective function is to maximize the sum of elements chosen subject to the constraints that each row and column cannot be selected more than once.
sample data:
set.seed(0L)
m <- matrix(sample(12), nrow=4)
#m <- matrix(sample(16), nrow=4)
m
     [,1] [,2] [,3]
[1,]    9    2    6
[2,]    4    5   11
[3,]    7    3   12
[4,]    1    8   10
code:
library(lpSolve)
nr <- nrow(m)
nc <- ncol(m)
#create the indicator matrix for column indexes
colmat <- data.table::shift(c(rep(1, nr), rep(0, (nc-1)*nr)), seq(0, by=nr, length.out=nc), fill=0)
#create indicator matrix for row indexes
rowmat <- data.table::shift(rep(c(1, rep(0, nr-1)), nc), 0:(nr-1), fill=0)
A <- do.call(rbind, c(colmat, rowmat))
#call lp solver
res <- lp("max",
          as.vector(m),
          A,
          rep("<=", nrow(A)),
          rep(1, nrow(A)),
          all.bin=TRUE,
          num.bin.solns=3)
sample output:
which(matrix(res$solution[1:ncol(A)], nrow=nr)==1L, arr.ind=TRUE)
     row col
[1,]   1   1
[2,]   4   2
[3,]   3   3
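To see what the constraint matrix A encodes, it may help to build it a second way. The sketch below uses kronecker() from base R instead of data.table::shift; as far as I can tell it produces the same 0/1 layout for a 4 x 3 problem, but treat that equivalence as an assumption to verify on your own data. Each row of A marks the cells of vec(m) (column-major order) that belong to one column or one row:

```r
nr <- 4
nc <- 3

# One constraint per column: ones over the nr cells of that column.
col_constraints <- kronecker(diag(nc), matrix(1, nrow = 1, ncol = nr))
# One constraint per row: ones over the nc cells of that row.
row_constraints <- kronecker(matrix(1, nrow = 1, ncol = nc), diag(nr))
A <- rbind(col_constraints, row_constraints)

# Check with the selection reported above: (1,1), (4,2), (3,3).
sel <- matrix(0, nr, nc)
sel[cbind(c(1, 4, 3), c(1, 2, 3))] <- 1
usage <- as.vector(A %*% as.vector(sel))  # how often each column/row is used
```

A selection is feasible exactly when every entry of usage is at most 1, which is what the "<=" constraints with right-hand side 1 express.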
2) A greedy heuristic: pick the largest element, eliminate its row and column, and then repeat on the smaller matrix:
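v <- integer(min(nc, nr))
allix <- matrix(0, nrow=length(v), ncol=2)
for (k in seq_along(v)) {
    # take the first maximum if there are ties
    ix <- which(m == max(m), arr.ind=TRUE)[1, , drop=FALSE]
    allix[k,] <- ix   # position within the current (reduced) matrix
    v[k] <- m[ix]
    m <- m[-ix[1], -ix[2], drop=FALSE]
}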
v
#[1] 12 9 8
But this does not find multiple solutions, so I have not developed it further to extract the indices in the original matrix.
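If the original positions are wanted from the greedy approach, one option is to keep track of which rows and columns are still available instead of shrinking the matrix. A minimal sketch (the function name greedy_assign is made up here, and the matrix is typed in by hand to match the sample data shown earlier); like any greedy heuristic it returns a single selection and is not guaranteed to be optimal:

```r
greedy_assign <- function(m) {
  rows <- seq_len(nrow(m))  # rows still available
  cols <- seq_len(ncol(m))  # columns still available
  k <- min(nrow(m), ncol(m))
  picks <- matrix(NA_real_, nrow = k, ncol = 3,
                  dimnames = list(NULL, c("row", "col", "value")))
  for (step in seq_len(k)) {
    sub <- m[rows, cols, drop = FALSE]
    ix  <- which(sub == max(sub), arr.ind = TRUE)[1, ]  # first maximum on ties
    # Translate the position in `sub` back to the original matrix.
    picks[step, ] <- c(rows[ix[1]], cols[ix[2]], sub[ix[1], ix[2]])
    rows <- rows[-ix[1]]
    cols <- cols[-ix[2]]
  }
  picks
}

m <- matrix(c(9, 4, 7, 1,
              2, 5, 3, 8,
              6, 11, 12, 10), nrow = 4)
picks <- greedy_assign(m)
```

The row and col columns of picks refer to the original matrix, so they can be compared directly with the positions returned by the lp() solution.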
I'm trying to name a vector with only a single column, i.e. say I have
vector<-c(1,2,3,4)
I want to name a single column of (1,2,3,4) as "a", i.e. I want something like:
a
1
2
3
4
If I try
colnames(vector)<- c("a")
It gives me output:
Error in `colnames<-`(`*tmp*`, value = "a") :
attempt to set 'colnames' on an object with less than two dimensions
If I try
names(vector)<- c("a")
Vector is named as
a <NA> <NA> <NA>
1 2 3 4
My question is whether such a vector is allowed in R. Specifically, is this allowed without using a matrix or data.frame or any other class that can store more than one column? If yes, how do I create it?
If you want something with a column name and that will print in the column format then use a single column matrix or data.frame:
vector <- matrix( c(1,2,3,4), dimnames=list(NULL, "a") )
vector <- data.frame( a=c(1,2,3,4) )
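A slightly shorter way to get the single-column matrix is cbind() with a named argument. This is a small sketch; the names v and col_a are just placeholders chosen for the example:

```r
v <- c(1, 2, 3, 4)

col_a <- cbind(a = v)   # 4 x 1 matrix whose only column is named "a"
col_a

col_a[, "a"]            # extracting the column gives a plain vector again
```

The result is still a matrix, so this does not avoid the matrix class; it only saves typing the dimnames by hand.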
There is a 1d object type, but rather confusingly the single dimension value you assign must equal the object's length. See:
?dim
dim(vector)=1L
Error in dim(vector) = 1L :
dims [product 1] do not match the length of object [4]
> dim(vector)=4L
> vector
[1] 1 2 3 4
> str(vector)
num [1:4(1d)] 1 2 3 4
Actually the dim function help page doesn't appear to document the requirement that the product of the dim-result will equal the length. My guess is that your homework assignment was intended to get you to read the dim help page and then discover (as I just did) that a one-d object is possible but a bit confusing.
As it turns out the distinction between row and column vectors is not enforced:
> vector %*% matrix(1:16,4)
[,1] [,2] [,3] [,4]
[1,] 30 70 110 150
> t(vector) %*% matrix(1:16,4)
[,1] [,2] [,3] [,4]
[1,] 30 70 110 150
> t(vector) %*% matrix(1:16,4) %*% vector
[,1]
[1,] 1100
> vector %*% matrix(1:16,4) %*% vector
[,1]
[1,] 1100
I have the following matrices :
> matrix <- matrix(c(1,3,4,NA,NA,NA,3,0,4,6,0,NA,2,NA,NA,2,0,1,0,0), nrow=5,ncol=4)
> n <- matrix(c(1,2,5,6,2),nrow=5,ncol=1)
As you can see, in each row I have
multiple NAs (the number of NAs is not fixed)
exactly ONE "0"
I would like to replace the 0 in each row with the corresponding value of n. Intended output below.
> output <- matrix(c(1, 3, 4,NA,NA,NA,3,5,4,6,1,NA,2,NA,NA,2,2,1,6,2), nrow=5,ncol=4)
I have tried the following
subset <- matrix == 0 & !is.na(matrix)
matrix[subset] <- n
#does not give intended output, but subset locates the values i want to change
When used on my "real" data i get the following message :
Warning message: In m[subset] <- n : number of items to replace is not
a multiple of replacement length
Thanks
EDIT: added a row to the matrix, as my real-life problem is with an unbalanced matrix. I am using matrices rather than data frames here because I think (but am not sure) that, with very large datasets, R is quicker with large matrices than with subsets of data frames.
We can do this using
out1 <- matrix+n[row(matrix)]*(matrix==0)
identical(output, out1)
#[1] TRUE
It appears you want to replace the values by row, but logical subsetting walks the matrix in column-major order, so the values of n are assigned down the columns instead. Transposing the matrix will get the desired output:
matrix <- t(matrix)
subset <- matrix == 0 & !is.na(matrix)
matrix[subset] <- n
matrix <- t(matrix)
identical(output, matrix)
[1] TRUE
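An alternative that avoids the double transpose is to ask which() for the row and column of every zero and then use the row number to pick the matching value of n. A minimal sketch using the data from the question (the names mat, expected and zero_pos are chosen for this example so that the base function matrix() is not masked):

```r
mat <- matrix(c(1, 3, 4, NA, NA,
                NA, 3, 0, 4, 6,
                0, NA, 2, NA, NA,
                2, 0, 1, 0, 0), nrow = 5, ncol = 4)
n <- matrix(c(1, 2, 5, 6, 2), nrow = 5, ncol = 1)
expected <- matrix(c(1, 3, 4, NA, NA,
                     NA, 3, 5, 4, 6,
                     1, NA, 2, NA, NA,
                     2, 2, 1, 6, 2), nrow = 5, ncol = 4)

# Row/column position of every zero; which() skips the NAs.
zero_pos <- which(mat == 0, arr.ind = TRUE)

# Index the matrix by position and n by the row each zero sits in.
mat[zero_pos] <- n[zero_pos[, "row"]]

identical(mat, expected)
```

Because each replacement value is looked up by row, the order in which the zeros are visited no longer matters.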
You can try this option with ifelse:
ifelse(matrix == 0, c(n) * (matrix == 0), matrix)
#      [,1] [,2] [,3] [,4]
# [1,]    1   NA    1    2
# [2,]    3    3   NA    2
# [3,]    4    5    2    1
# [4,]   NA    4   NA    6
# [5,]   NA    6   NA    2
zero = matrix == 0
identical(ifelse(zero, c(n) * zero, matrix), output)
# [1] TRUE