Consistently subset matrix to a vector and avoid colnames? - r

I would like to know if there is R syntax to extract a column from a matrix and always have no name attribute on the returned vector (I wish to rely on this behaviour).
My problem is the following inconsistency:
when a matrix has more than one row and I do myMatrix[, 1] I will get the first column of myMatrix with no name attribute. This is what I want.
when a matrix has exactly one row and I do myMatrix[, 1], I will get the first column of myMatrix but it has the first colname as its name.
I would like to be able to do myMatrix[, 1] and consistently get something with no name.
An example to demonstrate this:
# make a matrix with more than one row,
x <- matrix(1:2, nrow=2)
colnames(x) <- 'foo'
x
# foo
# [1,] 1
# [2,] 2
# extract first column. Note no 'foo' name is attached.
x[, 1]
# [1] 1 2
# now suppose x has just one row (and is a matrix)
x <- x[1, , drop = FALSE]
# extract first column
x[, 1]
# foo # <-- we keep the name!!
# 1
Now, the documentation for [ (?'[') mentions this behaviour, so it's not a bug or anything (although, why?! why this inconsistency?!):
A vector obtained by matrix indexing will be unnamed unless ‘x’ is one-dimensional when the row names (if any) will be indexed to provide names for the result.
My question is, is there a way to do x[, 1] such that the result is always unnamed, where x is a matrix?
Is my only hope unname(x[, 1]) or is there something analogous to ['s drop argument? Or is there an option I can set to say "always unname"? Some trick I can use (somehow override ['s behaviour when the extracted result is a vector?)

Update on why the code below works (as far as I can tell)
Subsetting with [ is handled by functions in the R source file src/main/subset.c. When using matrix indexing to subset a matrix, the function VectorSubset is called. When more than one index is used (i.e., one each for rows and columns, as in x[, 1]), MatrixSubset is called instead.
The function VectorSubset only assigns names to 1-dimensional arrays being subsetted. Since a matrix is a 2-D array, no names are assigned to the result when using matrix indexing. The function MatrixSubset, however, keeps the dimnames of the submatrix it extracts, and when drop = TRUE reduces that submatrix to a vector, those dimnames can become its names: the names of whichever dimension still has more than one element or, for a single-element result, the row or column names if exactly one of the two is non-NULL. The 1-row, 1-column case in the question is that last situation.
Therefore, the matrix indexing you refer to in the quote from the help page seems to be the key:
x <- matrix(1)
colnames(x) <- "foo"
x[, 1] ## 'Normal' indexing
# foo
# 1
x[matrix(c(1, 1), ncol = 2)] ## Matrix indexing
# [1] 1
And with a wider 1-row matrix:
xx <- matrix(1:10, nrow = 1)
colnames(xx) <- sprintf('foo%i', seq_len(ncol(xx)))
xx[, 6] ## 'Normal' indexing
# foo6
# 6
xx[matrix(c(1, 6), ncol = 2)] ## Matrix indexing
# [1] 6
With a matrix with both dimensions > 1:
yy <- matrix(1:10, nrow = 2, dimnames = list(NULL,
sprintf('foo%i', 1:5)))
yy[cbind(seq_len(nrow(yy)), 3)] ## Matrix indexing
# [1] 5 6
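As a sketch of a reusable version of the matrix-indexing trick, the hypothetical helper `col_unnamed()` below (not part of base R) pairs every row index with the requested column. It assumes `j` is a single numeric column index; a character index would turn the index matrix into a character matrix, which R matches against dimnames instead. The second example shows that ordinary `[, j]` indexing also returns names whenever the matrix has row names, even with more than one row.

```r
# Hypothetical helper (not in base R): return column j of matrix m
# as a plain vector with no names, using matrix indexing.
col_unnamed <- function(m, j) {
  stopifnot(is.numeric(j), length(j) == 1)
  # one (row, column) pair per row of m; rep() keeps the index
  # matrix two columns wide even when m has zero rows
  m[cbind(seq_len(nrow(m)), rep(j, nrow(m)))]
}

one_row <- matrix(1, dimnames = list(NULL, "foo"))
one_row[, 1]              # named "foo"
col_unnamed(one_row, 1)   # unnamed

# With row names, [, j] names the result even for a multi-row matrix
named_rows <- matrix(1:4, nrow = 2,
                     dimnames = list(c("a", "b"), c("foo", "bar")))
named_rows[, 1]              # names "a" and "b"
col_unnamed(named_rows, 1)   # unnamed 1 2
```

If a helper feels like overkill, `unname(m[, j])` gives the same unnamed vector for an ordinary numeric matrix.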

Related

Can I further vectorize this function

I am relatively new to R, and to matrix-based scripting languages in general. I have written this function to return the indices of each row whose content is similar to any other row's content. It is a primitive form of spam reduction that I am developing.
if (!require("RecordLinkage")) install.packages("RecordLinkage")
library("RecordLinkage")
# Takes a column of strings, returns a vector of indices
check_similarity <- function(x) {
  threshold <- 0.8
  values <- NULL
  for (i in seq_along(x)) {
    values <- c(values, which(jarowinkler(x[i], x[-i]) > threshold))
  }
  return(values)
}
Is there a way that I could write this to avoid the for loop entirely?
We can simplify the code somewhat using sapply.
library(RecordLinkage)
threshold = 0.8
# some test data #
x = c('hello', 'hollow', 'cat', 'turtle', 'bottle', 'xxx')
# create a length(x) by length(x) matrix specifying which strings are alike
m = sapply(x, jarowinkler, x) > threshold
# set diagonal to FALSE: we're not interested in strings being identical to themselves
diag(m) = FALSE
# And find index positions of all strings that are similar to at least one other string
which(rowSums(m) > 0)
# [1] 1 2 4 5
I.e. this returns the index positions of 'hello', 'hollow', 'turtle', and 'bottle' as being similar to another string
If you prefer, you can use colSums instead of rowSums to get a named vector, but this could be messy if the strings are long:
which(colSums(m) > 0)
# hello hollow turtle bottle
# 1 2 4 5
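For readers without RecordLinkage, the same loop-free pattern (build a pairwise similarity matrix, blank the diagonal, then test the rows) can be sketched with base R's `adist()`. Note this swaps Jaro-Winkler for a normalised Levenshtein similarity, so thresholds are not comparable between the two; the helper name `similar_idx()` and the 0.6 cut-off are illustrative assumptions, not part of the answer above.

```r
# Hypothetical helper: indices of strings similar to at least one other string,
# with similarity = 1 - (edit distance / length of the longer string).
similar_idx <- function(x, threshold = 0.6) {
  d <- adist(x)                              # pairwise Levenshtein distances
  longest <- outer(nchar(x), nchar(x), pmax) # longer length in each pair
  sim <- 1 - d / longest
  m <- sim > threshold
  diag(m) <- FALSE                           # ignore self-matches
  which(rowSums(m) > 0)
}

x <- c('hello', 'hollow', 'cat', 'turtle', 'bottle', 'xxx')
similar_idx(x)
```

With this metric only 'hello' and 'hollow' clear 0.6 (edit distance 2 over length 6), so the selection differs from the Jaro-Winkler result; the structure of the computation is the point here.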

Subsetting a matrix created from a list of lists

I am having some trouble manipulating a matrix that I have created from a list of lists. I don't really understand why the resulting matrix doesn't act like a normal matrix. That is, I expect when I subset a column for it to return a vector, but instead I get a list. Here is a working example:
x = list()
x[[1]] = list(1, 2, 3)
x[[2]] = list(4, 5, 6)
x[[3]] = list(7, 8, 9)
y = do.call(rbind, x)
y
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 4 5 6
[3,] 7 8 9
y is in the format that I expect. Ultimately I will have a list of these matrices that I want to average, but I keep getting an error which appears to be due to the fact that when you subset this matrix you get lists instead of vectors.
y[,1]
[[1]]
[1] 1
[[2]]
[1] 4
[[3]]
[1] 7
Does anyone know a) why this is happening, and b) how I could avoid or solve this problem?
Thanks in advance
This is just another problem with "matrix of list". You need
do.call(rbind, lapply(x, unlist))
or even simpler:
matrix(unlist(x), nrow = length(x), byrow = TRUE)
If you need some background reading, see: Why a row of my matrix is a list. That is a more complex case than yours.
It looks like the problem is due to x being a list of lists, rather than a list of vectors. This is not great, but it'll work:
y = do.call(rbind, lapply(x, unlist))
do.call passes each element of x to rbind as a separate argument, so you are really just binding your three lists together. The result is a matrix whose cells are list elements, which is why subsetting it gives you lists.
Unlist each element of x and then use sapply to create your matrix. Because sapply puts each element's result in its own column, you'll need to transpose it to get your desired matrix.
y <- t(sapply(x, unlist))
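A short sketch of how to confirm the diagnosis and convert a list-matrix you already have, assuming (as in the question) every cell holds a single number. `array(unlist(y), dim = dim(y))` works because a list-matrix, like any matrix, stores its cells in column-major order. The last lines illustrate the averaging step the question mentions, using a hypothetical list `mats` of converted matrices.

```r
x <- list(list(1, 2, 3), list(4, 5, 6), list(7, 8, 9))
y <- do.call(rbind, x)

is.list(y)   # TRUE: a matrix whose cells are list elements
dim(y)       # 3 3

# Convert an existing list-matrix; assumes each cell has length 1
y_num <- array(unlist(y), dim = dim(y))
y_num[, 1]   # an ordinary numeric vector: 1 4 7

# Element-wise average of a list of numeric matrices
mats <- list(y_num, y_num * 3)   # hypothetical example data
avg <- Reduce(`+`, mats) / length(mats)
```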

R: Expand a vector of matrix column numbers into a matrix with those columns filled

I have two vectors in R and want to generate a new matrix based on them.
a=c(1,2,1,2,3) # a[1] is 1: thus row 1, column 1 should be equal to...
b=c(10,20,30,40,50) # ...b[1], or 10.
I want to produce matrix 'v' BUT without my 'for' loop through columns of v and my multiplication:
v = as.data.frame(matrix(0,nrow=length(a),ncol=length(unique(a))))
for(i in 1:ncol(v)) v[[i]][a==i] <- 1 # looping through columns of 'v'
v <- v*b
I am sure there is a fast, elegant way to do this in R, or at least a way of expanding 'a' into the earlier version of 'v' (before its multiplication by 'b').
Thanks a lot!
This is one way that sparse matrices can be defined.
Matrix::sparseMatrix(i = seq_along(a), j = a, x = b)
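To check the `sparseMatrix()` answer against the question's small example, the sketch below densifies the result with `as.matrix()`. It assumes the Matrix package (a recommended package shipped with R) is installed. Each row index in `seq_along(a)` occurs once, so no entries collide; `sparseMatrix()` sums the `x` values of repeated (i, j) pairs. Setting every `x` to 1 gives the 0/1 version of `v` from before the multiplication by `b`.

```r
library(Matrix)

a <- c(1, 2, 1, 2, 3)
b <- c(10, 20, 30, 40, 50)

# row k gets value b[k] in column a[k]
s <- sparseMatrix(i = seq_along(a), j = a, x = b)
dense <- as.matrix(s)

# 0/1 indicator version (the earlier 'v', before multiplying by b)
ind <- sparseMatrix(i = seq_along(a), j = a, x = rep(1, length(a)))
```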
# Setup the problem:
set.seed(4242)
a <- sample(1:100, 1000000, replace = TRUE)
b <- sample(1:500, length(a), replace = TRUE)
# Start the timer
start.time <- proc.time()[3]
# Actual code
# We use a matrix instead of a data.frame
# The number of columns matches the largest column index in vector "a"
v <- matrix(0,nrow=length(a), ncol= max(a))
v[cbind(seq_along(a), a)] <- b
# Show elapsed time
stop.time <- proc.time()[3]
cat("elapsed time is: ", stop.time - start.time, "seconds.\n")
# For a million rows and a hundred columns, my prehistoric
# ... laptop says: elapsed time is: 2.597 seconds.
# these checks take much longer to run than the code itself
# Make sure the modified column in each row matches vector "a"
stopifnot(isTRUE(all.equal(a, apply(v != 0, 1, which))))
# Make sure the modified value in each row equals vector "b"
stopifnot(isTRUE(all.equal(rowSums(v), b)))
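The key line in the answer above is the assignment through a two-column index matrix, `v[cbind(seq_along(a), a)] <- b`. The sketch below applies it to the question's small vectors and also builds the 0/1 version of `v` that the question asks about. Using `max(a)` for the column count assumes the values of `a` are column numbers starting at 1, as in the question.

```r
a <- c(1, 2, 1, 2, 3)
b <- c(10, 20, 30, 40, 50)

# one (row, column) pair per element of a
idx <- cbind(seq_along(a), a)

v <- matrix(0, nrow = length(a), ncol = max(a))
v[idx] <- b        # row k, column a[k] gets b[k]

ind <- matrix(0, nrow = length(a), ncol = max(a))
ind[idx] <- 1      # 0/1 indicator matrix

# b is recycled down the columns, so this scales row k by b[k]
stopifnot(all(ind * b == v))
```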

colMeans of a sparse matrix times a matrix stored as the last element of a list

I want to get the column means of the last list element, which is a sparse matrix multiplied by a regular matrix. Whenever I use colMeans, however, I get an error. For example:
# Use the igraph package to create a sparse matrix
library(igraph)
my.lattice <- get.adjacency(graph.lattice(length = 5, dim = 2))
# Create a conformable matrix of TRUE and FALSE values
start <- matrix(sample(c(TRUE, FALSE), 50, replace = TRUE), ncol = 2)
# Multiply the sparse matrix by the logical matrix, and save the results to a list
out <- list()
out[[1]] <- my.lattice %*% start
out[[2]] <- my.lattice %*% out[[1]]
# Try to get column means of the last element
colMeans(tail(out, 1)[[1]]) # Selecting first element because tail creates a list
# Error in colMeans(tail(out, 1)[[1]]) :
# 'x' must be an array of at least two dimensions
# But tail(out, 1)[[1]] seems to have two dimensions
dim(tail(out, 1)[[1]])
# [1] 25 2
Any idea what's causing this error, or what I can do about it?
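It looks like explicitly calling the colMeans function from the Matrix package works. base::colMeans() only accepts ordinary arrays, while the product here is still an object from the Matrix package (an S4 class), so unless Matrix is attached the plain colMeans() call goes to base R:
Matrix::colMeans(tail(out, 1)[[1]])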
# [1] 4.48 5.48
Thanks to user20650 for this suggestion.
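A small sketch of the same situation without igraph, assuming the Matrix package is installed: a sparse matrix built with `Matrix::sparseMatrix()` is handled by Matrix's `colMeans()` method, while converting it to an ordinary matrix first lets base R's `colMeans()` work. Attaching Matrix with `library(Matrix)` would also make a plain `colMeans()` call dispatch to the Matrix method.

```r
# 3 x 2 sparse matrix with nonzeros at (1,1), (2,2) and (3,1)
s <- Matrix::sparseMatrix(i = c(1, 2, 3), j = c(1, 2, 1), x = c(2, 4, 6))

Matrix::colMeans(s)       # Matrix's method for its own classes
colMeans(as.matrix(s))    # densify, then use base colMeans
```

Both calls should give 8/3 and 4/3; converting with `as.matrix()` is simplest for small results, while the Matrix method avoids densifying a large sparse matrix.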

Select elements from a matrix in R all at once

I have a vector of row indices and a vector of column indices, say c(1,2) and c(7,100). I want to extract the elements at (1,7) and (2,100).
But I find that Matrix[row, column] returns every combination of the rows and columns (a 2 x 2 submatrix), not just a vector of two numbers.
What should I do?
You want to exploit the feature that if m is a matrix containing the row/col indices required, then subsetting by passing m as argument i of [ gives the desired behaviour. From ?'['
i, j, ...: indices specifying elements to extract or replace.
.... snipped ....
When indexing arrays by ‘[’ a single argument ‘i’ can be a
matrix with as many columns as there are dimensions of ‘x’;
the result is then a vector with elements corresponding to
the sets of indices in each row of ‘i’.
Here is an example
rv <- 1:2
cv <- 3:4
mat <- matrix(1:25, ncol = 5)
mat[cbind(rv, cv)]
R> cbind(rv, cv)
rv cv
[1,] 1 3
[2,] 2 4
R> mat[cbind(rv, cv)]
[1] 11 17
You can use a two-column index matrix inside [:
mx <- matrix(1:200, nrow=2)
mx[cbind(c(1, 2), c(7, 100))]
produces:
[1] 13 200
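The sketch below contrasts the two forms on the second answer's matrix: `mx[rows, cols]` returns every combination of the indices (a 2 x 2 submatrix whose diagonal holds the wanted pairs), while a two-column index matrix returns only the paired elements. The last line is an equivalent alternative that computes linear indices from R's column-major storage.

```r
mx <- matrix(1:200, nrow = 2)
rows <- c(1, 2)
cols <- c(7, 100)

mx[rows, cols]            # 2 x 2 submatrix: all row/column combinations
mx[cbind(rows, cols)]     # paired elements only: 13 200

# element (r, c) sits at position (c - 1) * nrow + r in column-major order
mx[(cols - 1) * nrow(mx) + rows]   # also 13 200
```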

Resources