Printing matrices and vectors side by side in R

For tutorial purposes, I'd like to be able to print or display matrices and vectors side-by-side, often to illustrate the result of a matrix equation, like $A x = b$.
I could do this in SAS/IML, where the print statement takes an arbitrary collection of (space-separated) expressions, evaluates them, and prints the results, e.g.,
print A ' * ' x '=' (A * x) '=' b;
     A              X        #TEM1001      B
 1   1  -4   *    0.733   =      2     =   2
 1  -2   1       -0.33           1         1
 1   1   1       -0.4            0         0
Note that quoted strings are printed as is.
I've searched but can't find anything like this in R. I imagine it could be done by a function showObj(object, ...) that takes its list of arguments, formats each one into a block of characters, and joins the blocks side by side.
Another use of this would be a compact way of displaying a 3D array as the side-by-side collection of its slices.
Does this ring a bell or does anyone have a suggestion for getting started?

I have created a very simple function that can print matrices and vectors with arbitrary character strings (typically operators) in between. It allows for matrices with different numbers of rows and treats vectors as column matrices. It is not very elaborate, so I fear there are many examples where it fails. But for an example as simple as the one in your question, it should be enough.
format() is used to convert the numbers to characters. This has the advantage that all the entries of a matrix get the same width, so its rows are nicely aligned when printed. If needed, you could also add some of the arguments of format() as arguments of mat_op_print() to make them configurable. As an example, I have added the argument width, which can be used to control the minimal width of the columns.
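A quick way to see the alignment that format() provides (a sketch, not part of the answer; m, fm and fm6 are just illustrative names): it turns a numeric matrix into a character matrix of the same shape, uses one number of decimal places for all cells, and treats width as a minimum field width.

```r
# A numeric matrix like the A used in the example below
m <- matrix(c(0.5, 1, 3, 0.75, 2.8, 4), nrow = 2)

# format() keeps the dimensions and pads every cell to a common width,
# using a common number of decimal places for the whole matrix
fm <- format(m)
print(fm)
print(nchar(fm))

# width is a minimum field width; cells are right-justified to it
fm6 <- format(m, width = 6)
print(fm6)
```

Because every cell of fm has the same number of characters, pasting a row together with single spaces always produces lines of the same length, which is what keeps the blocks aligned.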
If the matrices and vectors are named in the function call, these names are printed as headers in the first line. Otherwise, only the numbers are printed.
So, this is the function:
mat_op_print <- function(..., width = 0) {
# get arguments
args <- list(...)
chars <- sapply(args, is.character)
# auxiliary function to create a string of n spaces
spaces <- function(n) paste(rep(" ", n), collapse = "")
# convert (non-character) vectors to column matrices
vecs <- sapply(args, is.vector)
args[vecs & !chars] <- lapply(args[vecs & !chars], function(v) matrix(v, ncol = 1))
# convert all non-characters to character with format
args[!chars] <- lapply(args[!chars], format, width = width)
# print names as the first line, if present
arg_names <- names(args)
if (!is.null(arg_names)) {
get_title <- function(x, name) {
if (is.matrix(x)) {
paste0(name, spaces(sum(nchar(x[1, ])) + ncol(x) - 1 - nchar(name)))
} else {
spaces(nchar(x))
}
}
cat(mapply(get_title, args, arg_names), "\n")
}
# auxiliary function to create the lines
get_line <- function(x, n) {
if (is.matrix(x)) {
if (nrow(x) < n) {
spaces(sum(nchar(x[1, ])) + ncol(x) - 1)
} else {
paste(x[n, ], collapse = " ")
}
} else if (n == 1) {
x
} else {
spaces(nchar(x))
}
}
# print as many lines as needed for the matrix with most rows
N <- max(sapply(args[!chars], nrow))
for (n in 1:N) {
cat(sapply(args, get_line, n), "\n")
}
}
And this is an example of how it works:
A = matrix(c(0.5, 1, 3, 0.75, 2.8, 4), nrow = 2)
x = c(0.5, 3.7, 2.3)
y = c(0.7, -1.2)
b = A %*% x - y
mat_op_print(A = A, " * ", x = x, " - ", y = y, " = ", b = b, width = 6)
## A x y b
## 0.50 3.00 2.80 * 0.5 - 0.7 = 17.090
## 1.00 0.75 4.00 3.7 -1.2 13.675
## 2.3
Printing the slices of a 3-dimensional array side by side is also possible:
A <- array(1:12, dim = c(2, 2, 3))
mat_op_print(A1 = A[, , 1], " | ", A2 = A[, , 2], " | ", A3 = A[, , 3])
## A1 A2 A3
## 1 3 | 5 7 | 9 11
## 2 4 6 8 10 12
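If only the 3D-array case is needed, a shorter base-R sketch (my own, not from the answer; slices and slice_rows are illustrative names) formats each slice separately and pastes the rows together. It relies on every slice having the same number of rows, which always holds for slices of one array.

```r
A <- array(1:12, dim = c(2, 2, 3))

# Format each 2-D slice separately so its own columns line up
slices <- lapply(seq_len(dim(A)[3]), function(k) format(A[, , k]))

# For row r, join the cells of each slice with a space,
# then join the slices with " | "
slice_rows <- vapply(seq_len(dim(A)[1]), function(r) {
  pieces <- vapply(slices, function(s) paste(s[r, ], collapse = " "), character(1))
  paste(pieces, collapse = " | ")
}, character(1))

cat(slice_rows, sep = "\n")
```

Unlike mat_op_print(), this does not print headers and assumes all blocks have equal height; it is only meant as a minimal starting point.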

Related

Faster ways to generate Yellowstone sequence (A098550) in R?

I just saw a YouTube video from Numberphile on the Yellowstone sequence (A098550). It's based on a sequence starting with 1 and 2, with subsequent terms generated by the rules:
no repeated terms
always pick the lowest integer
gcd(a_n, a_(n-1)) = 1
gcd(a_n, a_(n-2)) > 1
The first 15 terms would be: 1 2 3 4 9 8 15 14 5 6 25 12 35 16 7
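Before optimizing, it can help to have a plain implementation of the rules to check faster versions against. The sketch below (the names gcd() and yellowstone() are my own) scans candidates upward from 1 and grows its record of used numbers on demand, so it does not need a guessed upper bound such as 5000.

```r
# Euclidean algorithm for two scalars
gcd <- function(a, b) {
  while (b != 0) {
    r <- a %% b
    a <- b
    b <- r
  }
  a
}

# Applies the rules literally, starting from 1, 2, 3.
# used[k] is TRUE once k is in the sequence; assigning past the end
# extends the vector with NA, which isTRUE() treats as "not used".
yellowstone <- function(n_terms) {
  a <- c(1, 2, 3)
  used <- c(TRUE, TRUE, TRUE)
  while (length(a) < n_terms) {
    prev1 <- a[length(a)]
    prev2 <- a[length(a) - 1]
    k <- 1
    # skip k while it is used, not coprime to the last term,
    # or shares no factor with the term before that
    while (isTRUE(used[k]) || gcd(k, prev1) != 1 || gcd(k, prev2) <= 1) {
      k <- k + 1
    }
    a <- c(a, k)
    used[k] <- TRUE
  }
  a[seq_len(n_terms)]
}

yellowstone(15)
```

It is deliberately simple and slow; its only job is to give a trusted result to compare against.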
A quick-and-dirty (Q&D) approach in R could be something like this, but understandably, it becomes very slow when trying to generate longer sequences. It also makes some assumptions about the highest number that can occur in the sequence (for reference, the sequence of 10,000 items never goes higher than 5000).
What can we do to make this faster?
library(DescTools)
a <- c(1, 2, 3)
p <- length(a)
# all natural numbers
all_ints <- 1:5000
for (n in p:1000) {
# rule 1 - remove all numbers that are in the sequence already
next_a_set <- all_ints[which(!all_ints %in% a)]
# rule 3 - search the remaining set for numbers that have gcd == 1
next_a_option <- next_a_set[which(
sapply(
next_a_set,
function(x) GCD(a[n], x)
) == 1
)]
# rule 4 - search the remaining numbers for gcd > 1
next_a <- next_a_option[which(
sapply(
next_a_option,
function(x) GCD(a[n - 1], x)
) > 1
)]
# select the lowest
a <- c(a, min(next_a))
n <- n + 1
}
Here's a version that's about 20 times faster than yours, with comments about the changes:
# Set a to the final length from the start.
a <- c(1, 2, 3, rep(NA, 997))
p <- 3
# Define a vectorized gcd() function. We'll be testing
# lots of gcds at once. This uses the Euclidean algorithm.
gcd <- function(x, y) { # vectorized gcd
while (any(y != 0)) {
x1 <- ifelse(y == 0, x, y)
y <- ifelse(y == 0, 0, x %% y)
x <- x1
}
x
}
# Guess at a reasonably large vector to work from,
# but we'll grow it later if not big enough.
allnum <- 1:1000
# Keep a logical record of what has been used
used <- c(rep(TRUE, 3), rep(FALSE, length(allnum) - 3))
for (n in p:1000) {
# rule 1 - remove all numbers that are in the sequence already
# nothing to do -- used already records that.
repeat {
# rule 3 - search the remaining set for numbers that have gcd == 1
keep <- !used & gcd(a[n], allnum) == 1
# rule 4 - search the remaining numbers for gcd > 1
keep <- keep & gcd(a[n-1], allnum) > 1
# If we found anything, break out of this loop
if (any(keep))
break
# Otherwise, make the set of possible values twice as big,
# and try again
allnum <- seq_len(2*length(allnum))
used <- c(used, rep(FALSE, length(used)))
}
# select the lowest
newval <- which.max(keep)
# Assign into the appropriate place
a[n+1] <- newval
# Record that it has been used
used[newval] <- TRUE
}
If you profile it, you'll see it spends most of its time in the gcd() function. You could probably make that a lot faster by redoing it in C or C++.
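Before going to C or C++, one R-level change that may help is to avoid ifelse(), which evaluates both of its branches over the full vectors on every iteration, and instead update only the pairs whose remainder is still nonzero. This is a sketch that I have not timed; gcd_vec() is an illustrative name, and it should return the same values as the gcd() above.

```r
# Vectorized Euclidean gcd that only touches the pairs still in progress
gcd_vec <- function(x, y) {
  n <- max(length(x), length(y))
  x <- rep_len(x, n)
  y <- rep_len(y, n)
  active <- y != 0
  while (any(active)) {
    r <- x[active] %% y[active]
    x[active] <- y[active]
    y[active] <- r
    active <- y != 0
  }
  x
}

gcd_vec(12, c(8, 9, 5, 0))
```

It could be dropped into the loop above in place of gcd(); whether it is actually faster depends on how many pairs finish early, so profile before relying on it.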
The biggest change here is pre-allocation and restricting the search to numbers that have not yet been used.
library(numbers)
N <- 5e3
a <- integer(N)
a[1:3] <- 1:3
b <- logical(N) # which numbers have been used already?
b[1:3] <- TRUE
NN <- 1:N
system.time({
for (n in 4:N) {
a1 <- a[n - 1L]
a2 <- a[n - 2L]
for (k in NN[!b]) {
if (GCD(k, a1) == 1L & GCD(k, a2) > 1L) {
a[n] <- k
b[k] <- TRUE
break
}
}
if (!a[n]) {
a <- a[1:(n - 1L)]
break
}
}
})
#> user system elapsed
#> 1.28 0.00 1.28
length(a)
#> [1] 1137
For a fast C++ algorithm, see here.

Find maximum depth of nested parentheses in a string (R code)

I'm having trouble finishing this R code. We are given a string containing parentheses, like this:
"( ((X)) (((Y))) )"
We need to find the maximum depth of balanced parentheses, which is 4 in the example above, since Y is surrounded by 4 balanced pairs of parentheses.
If the parentheses are unbalanced, return -1.
My code looks like this:
current_max = 0
max = 0
def = function (S){
n=S
for (i in nchar(n))
if (is.element('(',n[i]))
{
current_max <- current_max + 1
}
if (current_max > max)
{
max <- current_max
}
else if (is.element(')',n[i]))
{
if (current_max > 0)
{
current_max <- current_max - 1
}
else
{
return -1
}
}
if (current_max != 0)
{
return -1
}
return (max)
}
But when I call def("(A((B)))") the answer should be 2, yet every time it returns 0, even when the parentheses are unbalanced. I'm not sure whether the code is correct or where the mistake is. I'm trying to learn R, so please be patient with me. Thanks
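For comparison with the answers below, here is one way the loop from the question could be repaired (a sketch; max_paren_depth() is an illustrative name). The original always returns 0 because for (i in nchar(n)) runs only once, with i equal to the string length; n[i] indexes the character vector rather than picking out a single character; and without braces the for loop's body is only the first if. In addition, return -1 is parsed as return - 1 and should be written return(-1).

```r
max_paren_depth <- function(s) {
  current <- 0
  deepest <- 0
  # split the string into single characters and loop over them
  for (ch in strsplit(s, "")[[1]]) {
    if (ch == "(") {
      current <- current + 1
      deepest <- max(deepest, current)
    } else if (ch == ")") {
      # a ")" with no open "(" means the string is unbalanced
      if (current == 0) return(-1)
      current <- current - 1
    }
  }
  # any "(" left open also means the string is unbalanced
  if (current != 0) return(-1)
  deepest
}

max_paren_depth("( ((X)) (((Y))) )")
max_paren_depth("(A((B)))")
```

With this version, "(A((B)))" gives 3, matching the outputs of the answers below.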
If x <- "( ((X)) (((Y))) )", then remove all of the non-parentheses and split into characters...
y <- unlist(strsplit(gsub("[^\\(\\)]", "", x), ""))
y
[1] "(" "(" "(" ")" ")" "(" "(" "(" ")" ")" ")" ")"
and then the maximum nesting is the highest value of the cumulative sum of +1 for each "(" and -1 for each ")"...
z <- max(cumsum(ifelse(y=="(", 1, -1)))
z
[1] 4
The parentheses are balanced only if sum(ifelse(y=="(", 1, -1)) equals zero and the cumulative sum never drops below zero; a string such as ")(" has a zero sum but is still unbalanced.
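Putting these steps together, a small function might look like the sketch below (paren_depth() is a hypothetical name). It returns -1 when the running level goes negative or does not end at zero, and the length check avoids calling max() on an empty vector when the string contains no parentheses.

```r
paren_depth <- function(s) {
  chars <- strsplit(s, "")[[1]]
  # keep only the parentheses
  y <- chars[chars %in% c("(", ")")]
  # no parentheses at all: depth 0, trivially balanced
  if (length(y) == 0) return(0)
  # running nesting level after each parenthesis
  depth <- cumsum(ifelse(y == "(", 1, -1))
  # balanced only if the level never goes negative and ends at zero
  if (any(depth < 0) || depth[length(depth)] != 0) return(-1)
  max(depth)
}

paren_depth("( ((X)) (((Y))) )")
paren_depth(")(")
```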
Here are three solutions. They are all vectorized, i.e. the input x can be a character vector, and they all handle the case of no parentheses properly.
1) strapply/proto strapply in the gsubfn package matches the regular expression given as the second argument, running the function fun in the proto object p, which should also be passed to strapply. The pre function in p initializes the calculation for each component of the input x. The proto object can be used to retain memory of past matches (here lev is the nesting level), allowing counting to be done. We append an arbitrary character, here "X", to each string to ensure that there is always at least one match. If we knew there were no zero-length character string inputs, this could be omitted. The sapply uses Max, which takes the maximum of the returned depths or returns -1 if the parentheses are unbalanced.
library(gsubfn) # also pulls in proto
# test input
x <- c("(A((B)))", "((A) ((())) (B))", "abc", "", "(A)((B)", "(A(B)))")
p <- proto(pre = function(.) .$lev <- 0,
fun = function(., x) .$lev <- .$lev + (x == "(") - (x == ")") )
Max <- function(x) if (tail(x, 1) == 0 && min(x) == 0) max(x) else -1
sapply(strapply(paste(x, "X"), ".", p), Max)
## [1] 3 4 0 0 -1 -1
2) Reduce This is a base solution. It makes use of Max from (1).
fun <- function(lev, char) lev + (char == "(") - (char == ")")
sapply(x, function(x) Max(Reduce(fun, init = 0, unlist(strsplit(x, "")), acc = TRUE)))
(A((B))) ((A) ((())) (B)) abc
3 4 0 0
(A)((B) (A(B)))
-1 -1
3) strapply/list Another possibility is to extract the parentheses and return with +1 or -1 for ( and ) using strapply with a replacement list. Then run cumsum and Max (from above) over that.
library(gsubfn)
fn$sapply(strapply(x, "[()]", list("(" = +1, ")" = -1), empty = 0), ~ Max(cumsum(x)))
## [1] 3 4 0 0 -1 -1

Round only elements exceeding 1

I'm trying to run a simple custom function test that takes a vector x and rounds the contents if the entry is greater than 1, returning a new vector. For instance, for input x <- c(.5, .2, 1.6, 7.9) I would expect test(x) to output [1] 0.5 0.2 2 8.
However, my code either returns 1 entry or only alters 1 entry and leaves the rest of the output blank.
test <- function(x) {
nr <- nrow(x)
e <- numeric(nr)
for (i in x) {
if(i > 1) {
e <- round(i, 1)
}
else {
e <- i
}
}
return(e)
}
How do I iterate through a predetermined list instead of a range of integers?
Instead of performing a for loop with an embedded if statement, it would be more efficient (both in terms of runtime and amount of code written) to use ifelse. You would transform your entire function to a single line of code:
test <- function(x) ifelse(x > 1, round(x, 0), x)
Here it is in action:
x <- c(.5, .2, 1.6, 7.9)
test(x)
# [1] 0.5 0.2 2.0 8.0
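To answer the literal question about loops as well: for (i in x) gives you the values, not their positions, and e <- round(i, 1) overwrites e on every pass (and rounds to one decimal place, so 1.6 and 7.9 would stay unchanged). Looping over seq_along(x) gives the positions, so each result can be stored in its own slot. A sketch of a loop-based version (test_loop() is an illustrative name), preallocating with length(x) because nrow() returns NULL for a plain vector:

```r
test_loop <- function(x) {
  e <- numeric(length(x))        # one slot per element
  for (i in seq_along(x)) {      # i runs over positions 1, 2, ..., length(x)
    e[i] <- if (x[i] > 1) round(x[i]) else x[i]
  }
  e
}

test_loop(c(.5, .2, 1.6, 7.9))
```

The ifelse() version above is still the more idiomatic choice; the loop is only here to show the indexing pattern.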

R: deep copy a function argument

Consider the following code
i = 3
j = i
i = 4 # j != i
However, what I want is
i = 3
f <- function(x, j=i)
x * j
i = 4
f(4) # 16, but i want it to be 12
In case you are wondering why I want to do this, consider this code: the application is a multiple decrements model. The diagonal entries of a transition matrix are the negative of the sum of the other decrements in that row. I would like to define the decrements I need, then calculate the other functions using those decrements. In this case, I only need uxt01 and uxt10, and from these I want to produce the functions uxt00 and uxt11. I wanted something that scales to higher dimensions.
Qxt <- matrix(c(uxt00=function(t=0,x=0) 0,
uxt01=function(t=0,x=0) 0.05,
uxt10=function(t=0,x=0) 0.07,
uxt11=function(t=0,x=0) 0), 2, 2, byrow=TRUE)
Qxt.diag <- function(Qxt) {
ndecrements <- length(Qxt[1,])
for(index in seq(1, ndecrements^2, ndecrements+1)) { # 1, 4
Qxt[[index]] <- function(t=0, x=0, i=index, N=ndecrements) {
# Qxt is stored column-major, so row r holds elements r, r+N, r+2N, ...
row <- (i - 1) %% N + 1
row.decrements <- seq(row, N*N, by = N)
other.decrements <- row.decrements[which(row.decrements != i)]
-sum(unlist(lapply(Qxt[other.decrements],
function(f) f(t,x))))
}
}
Qxt
}
This can be done by assigning the default expression for the formal parameter j manually, after creating the function:
i <- 3;
f <- function(x,j) x*j;
f;
## function(x,j) x*j
formals(f);
## $x
##
##
## $j
##
##
formals(f)$j <- i;
f;
## function (x, j = 3)
## x * j
formals(f);
## $x
##
##
## $j
## [1] 3
##
i <- 4;
f(4);
## [1] 12
This is only possible because R is an awesome language and provides you complete read/write access to all three special properties of functions, which are:
The parse tree that comprises the body: body().
The formal parameters and their default values (which are themselves parse trees): formals().
The enclosing environment (which is used to implement closures): environment().
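A short illustration of those three accessors (f, e and k are just names for the example): each replacement function changes one part of an existing closure while leaving the other two alone.

```r
f <- function(x, j) x * j

# 1. formals(): give j a default value
formals(f)$j <- 3

# 2. body(): replace the expression that is evaluated
body(f) <- quote(x * j + k)

# 3. environment(): choose where free variables such as k are looked up
e <- new.env()
assign("k", 10, envir = e)
environment(f) <- e

f(4)  # 4 * 3 + 10
```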
Assign it to a different variable if you want to reuse i:
default_value = i
f = function(x, j = default_value)
x * j
i = 4
f(4) # 12
Of course, you should not leave this variable lying around; that’s as bad as the original code. You can make it “private” to the function, though, by defining both together in a local environment:
f = local({
default_value = i
function(x, j = default_value)
x * j
})
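A closely related idiom, not taken from the answers above, is a function factory: the default is passed in as an argument and force()d, so it is evaluated once when the function is created, and later changes to i have no effect. make_f() is an illustrative name; j stays an ordinary argument that can still be overridden.

```r
make_f <- function(default_j) {
  force(default_j)  # evaluate now, not lazily at the first call
  function(x, j = default_j) x * j
}

i <- 3
f <- make_f(i)
i <- 4
f(4)         # uses the captured default 3
f(4, j = 5)  # an explicit j still wins
```

Without the force() call, default_j would remain a promise pointing at i and would be evaluated only at the first call, giving 16 again.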

Matching patterns in a matrix

My data looks like this:
S
0101001010000000000000000100111100000000000011101100010101010
1001010000000001100000000100000000000100000010101110101010010
1101010101010010000000000100000000100101010010110101010101011
0000000000000000001000000111000110000000000000000000000000000
The S marks the column I am talking about: column 26. All four rows have a 1 at that position.
For each row from 2 to 4, I need to count how many columns to the left and to the right of S have the same values as in row 1.
For row 2, it would be 3 to the right (where it reaches a 1/0 mismatch) and 8 to the left (where it reaches a 0/1 mismatch).
The result for every row should be entered into a matrix like this (left count, right count):
row2 8 3
row3 11 9
Is there a fast and efficient way to do that? The matrix I am dealing with is very large.
If you need something fast, you could use Rcpp:
mat <- as.matrix(read.fwf(textConnection("0101001010000000000000000100111100000000000011101100010101010
1001010000000001100000000100000000000100000010101110101010010
1101010101010010000000000100000000100101010010110101010101011
0000000000000000001000000111000110000000000000000000000000000"), widths = rep(1, 61)))
library(Rcpp)
cppFunction('
IntegerMatrix countLR(const LogicalMatrix& mat, const int S) {
const int nr(mat.nrow()), nc(mat.ncol());
IntegerMatrix res(nr - 1, 2);
for(int i=1; i<nr;i++){
for(int j=S-2; j>=0;j--) {
if (mat(0,j) != mat(i,j)) break;
else res(i-1,0)++;
}
for(int j=S; j<nc;j++) {
if (mat(0,j) != mat(i,j)) break;
else res(i-1,1)++;
}
}
return(res);
}' )
countLR(mat, 26)
# [,1] [,2]
#[1,] 8 2
#[2,] 10 2
#[3,] 6 0
I assumed that column 26 itself doesn't count for the result. I also assumed that the matrix can only contain 0/1 (i.e., boolean) values. Adjust as needed.
It's pretty easy with strsplit and rle to pull apart and assemble this data:
> S <- scan(what="") #input of character mode
1: 0101001010000000000000000100111100000000000011101100010101010
2: 1001010000000001100000000100000000000100000010101110101010010
3: 1101010101010010000000000100000000100101010010110101010101011
4: 0000000000000000001000000111000110000000000000000000000000000
5:
s2 <- strsplit(S, split="")
sapply(s2, "[[", 26) # verify the 26th position is all ones
#[1] "1" "1" "1" "1"
# length of the strings from the 26th position to the right end
rtlen <- length(s2[[1]])-(26-1)
# Pick from the `rle` $values where values TRUE
rle( tail( s2[[1]] == s2[[2]], rtlen) )
Run Length Encoding
lengths: int [1:11] 3 4 5 1 7 1 4 1 1 6 ...
values : logi [1:11] TRUE FALSE TRUE FALSE TRUE FALSE ...
Now that you have an algorithm for a single instance, you can iterate over the rest of the items in s2. To do the backwards look, I just did the same operation on a reversed (rev()) section of the strings.
m<-matrix(NA, 3,2);
for (i in 2:4) { m[i-1,2] <- rle(tail( s2[[1]] == s2[[i]], rtlen) )$lengths[1]
m[i-1, 1] <- rle( rev( head( s2[[1]] == s2[[i]], 26)) )$lengths[1] }
m
[,1] [,2]
[1,] 9 3 # I think you counted wrong
[2,] 11 3
[3,] 7 1
Notice that I was comparing each one to the first row and your results suggest you were doing something else...perhaps comparing to the row above. That could easily be done instead with only a very small mod to the code indices for choice of the comparison vector:
m<-matrix(NA, 3,2);
for (i in 2:4) { m[i-1,2] <- rle(tail( s2[[i-1]] == s2[[i]], rtlen) )$lengths[1]
m[i-1, 1] <- rle( rev( head( s2[[i-1]] == s2[[i]], 26)) )$lengths[1] }
m
[,1] [,2]
[1,] 9 3
[2,] 9 9 #Again I think you may have miscounted. Easy to do, eh?
[3,] 7 1
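Two details of the rle() approach are worth knowing. Because head(..., 26) and tail(..., rtlen) both include column 26, which all rows share, these counts are one higher than the Rcpp output above, which excludes that column. And if you drop column 26 from the slices to count only neighbours, the first run can start with FALSE, in which case rle(...)$lengths[1] reports the length of the mismatch run rather than 0. A small guard, sketched with the hypothetical helper first_true_run():

```r
# Length of the leading run of TRUE values (0 if v starts with FALSE or is empty)
first_true_run <- function(v) {
  if (length(v) == 0 || !v[1]) return(0L)
  rle(v)$lengths[1]
}

first_true_run(c(TRUE, TRUE, FALSE, TRUE))
first_true_run(c(FALSE, FALSE, TRUE))
```

In the loops above, rle(...)$lengths[1] would be replaced by first_true_run(...) applied to the same logical vector.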
This problem intrigued me. Since the matrix is binary, it's far more efficient to pack the matrix into a raw matrix than it is to use sparse matrices. It means that the storage for a 1,000 x 21,000,000 pattern matrix is approx. 2.4 GiB (print(object.size(raw(1000 * 21000000 / 8)), units = "GB")).
The following should be a relatively efficient way to tackle the problem. The Rcpp code takes a raw matrix which indicates the differences between the first row of the original matrix and the other rows. For efficiency in the R code, it's actually arranged with the patterns in columns rather than rows. The other functions help to convert existing sparse or regular matrices into packed ones and to read a matrix directly from a file.
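The packing and comparison rely on two base-R facts that are easy to check in isolation: packBits() turns eight 0/1 values into one raw byte (the first value goes into the lowest bit), and xor() on raw vectors works bit by bit, so set bits in the result mark positions where two packed rows differ. A tiny sketch with made-up rows row1 and row2:

```r
row1 <- c(1, 0, 1, 1, 0, 0, 0, 0)
row2 <- c(1, 1, 0, 1, 0, 0, 0, 0)

# eight bits -> one byte; the first element becomes the least significant bit
p1 <- packBits(as.raw(row1))
p2 <- packBits(as.raw(row2))

# bitwise xor: a 1 bit marks a position where the rows differ
d <- xor(p1, p2)

# unpack again to see which positions differ
which(as.integer(rawToBits(d)) == 1)
```

The Rcpp code below does the same kind of bit search, but directly on the packed bytes and starting from column S.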
library("Rcpp")
library("Matrix")
writeLines("0101001010000000000000000100111100000000000011101100010101010
1001010000000001100000000100000000000100000010101110101010010
1101010101010010000000000100000000100101010010110101010101011
0000000000000000001000000111000110000000000000000000000000000", "example.txt")
cppFunction('
IntegerMatrix countLRPacked(IntegerMatrix mat, long S) {
long l = S - 2;
long r = S;
long i, cl, cr;
int nr(mat.nrow()), nc(mat.ncol());
IntegerMatrix res(nc, 2);
for(int i=0; i<nc;i++){
// First the left side
// Work out which byte is the first to have a 1 in it
long j = l >> 3;
int x = mat(j, i) & ((1 << ((l & 7) + 1)) - 1);
long cl = l & 7;
while(j > 0 && !x) {
j --;
x = mat(j, i);
cl += 8;
}
// Then work out where the 1 is in the byte
while (x >>= 1) --cl;
// Now the right side
j = r >> 3;
x = mat(j, i) & ~((1 << ((r & 7))) - 1);
cr = 8 - (r & 7);
while(j < (nr-1) && !x) {
j ++;
x = mat(j, i);
cr += 8;
}
cr--;
while (x = (x << 1) & 0xff) --cr;
res(i, 0) = cl;
res(i, 1) = cr;
}
return(res);
}')
# Reads a binary matrix from file or character vector
# Borrows the first bit of code from read.table
readBinaryMatrix <- function(file = NULL, text = NULL) {
if (missing(file) && !missing(text)) {
file <- textConnection(text)
on.exit(close(file))
}
if (is.character(file)) {
file <- file(file, "rt")
on.exit(close(file))
}
if (!inherits(file, "connection"))
stop("'file' must be a character string or connection")
if (!isOpen(file, "rt")) {
open(file, "rt")
on.exit(close(file))
}
lst <- list()
i <- 1
while(length(line <- readLines(file, n = 1)) > 0) {
lst[[i]] <- packRow(as.integer(strsplit(line, "", fixed = TRUE)[[1]]))
i <- i + 1
}
do.call("cbind", lst)
}
# Converts a binary integer vector into a packed raw vector,
# padding out at the end to make the input length a multiple of 8
packRow <- function(row) {
packBits(as.raw(c(row, rep(0, (8 - length(row)) %% 8 ))))
}
# Converts a binary integer matrix to a packed raw matrix
# Note the matrix is transposed (makes the subsequent xor more efficient)
packMatrix <- function(mat) {
# test with is.matrix(): since R 4.0.0 a matrix has class c("matrix", "array")
stopifnot(is.matrix(mat) || inherits(mat, "dgCMatrix"))
apply(mat, 1, packRow)
}
# Takes either a packed raw matrix or a binary integer matrix, uses xor to compare all the first row
# with the others and then hands it over to the Rcpp code for processing
countLR <- function(mat, S) {
stopifnot(is.matrix(mat) || inherits(mat, "dgCMatrix"))
if (storage.mode(mat) != "raw") {
mat <- packMatrix(mat)
}
stopifnot(8 * nrow(mat) > S)
y <- xor(mat[, -1, drop = FALSE], mat[, 1, drop = TRUE])
countLRPacked(y, S)
}
sMat <- Matrix(as.matrix(read.fwf("example.txt", widths = rep(1, 61))))
pMat <- readBinaryMatrix("example.txt")
countLR(sMat, 26)
countLR(pMat, 26)
You should note that the width of the pattern matrix is right-padded to a multiple of 8, so if the patterns match all the way to the right hand side this will result in the right hand count being possibly a bit high. This could be corrected if need be.
Slow R version to do this (moved from duplicate):
countLR <- function(mat, S) {
mat2 <- mat[1, ] != t(mat[-1, , drop = FALSE])
l <- apply(mat2[(S - 1):1, , drop = FALSE], 2, function(x) which(x)[1] - 1)
l[is.na(l)] <- S - 1
r <- apply(mat2[(S + 1):nrow(mat2), , drop = FALSE], 2, function(x) which(x)[1] - 1)
r[is.na(r)] <- ncol(mat) - S
cbind(l, r)
}
