Solving underdetermined linear systems with R

R can solve underdetermined linear systems:
A = matrix((1:12)^2,3,4,T)
B = 1:3
qr(A)$rank # 3
qr.solve(A, B) # solutions will have one zero, not necessarily the same one
# 0.1875 -0.5000 0.3125 0.0000
solve(qr(A, LAPACK = TRUE), B)
# 0.08333333 -0.18750000 0.00000000 0.10416667
(It returns one particular solution out of the infinitely many solutions.)
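A quick way to see this (a sketch, assuming the MASS package; Null(t(A)) returns a basis N of the null space of A, so that A %*% N = 0): shifting a solution by any null-space vector gives another solution.
library(MASS)
X1 <- qr.solve(A, B)
N <- Null(t(A))                    # 4 x 1 here, since rank(A) = 3
range(A %*% (X1 + 5 * N[, 1]) - B) # ~ c(0, 0): still a solution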
However, if the rank (here 2) is lower than the number of rows (here 3), it won't work:
A = matrix(c((1:8)^2,0,0,0,0),3,4,T)
B = c(1,2,0)
A
# [,1] [,2] [,3] [,4]
# [1,] 1 4 9 16
# [2,] 25 36 49 64
# [3,] 0 0 0 0
qr.solve(A, B) # Error in qr.solve(A, B) : singular matrix
solve(qr(A, LAPACK = TRUE), B) # Error in qr.coef(a, b) : error code 3
But this system does have a solution!
I know that the general solution is to use the SVD or a generalized/pseudo-inverse of A (see this question and its answers), but:
Is there a way with solve or qr.solve to automatically reduce the system AX = B to an equivalent system CX = D of only rank(A) rows, for which qr.solve(C, D) would simply work out of the box?
Example:
C = matrix(c((1:8)^2),2,4,T)
D = c(1,2)
qr.solve(C, D)
# -0.437500 0.359375 0.000000 0.000000
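One way to do that reduction manually (a sketch; neither solve nor qr.solve seems to offer it as a built-in option): the default pivoted qr() of t(A) moves the linearly dependent columns of t(A), i.e. the dependent rows of A, to the end of the pivot, so the first rank(A) pivot entries index a full-row-rank subsystem:
qa <- qr(t(A))
keep <- qa$pivot[seq_len(qa$rank)] # indices of rank(A) independent rows of A; here 1 and 2
qr.solve(A[keep, , drop = FALSE], B[keep])
# -0.437500 0.359375 0.000000 0.000000 (same as qr.solve(C, D) above)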

qr.coef along with qr seems to do the job: the coefficients that qr.coef leaves as NA correspond to pivoted-out (linearly dependent) columns, and setting them to 0 yields a particular solution:
(A <- matrix(c((1:8)^2, 0, 0, 0, 0), nrow = 3, ncol = 4, byrow = TRUE))
# [,1] [,2] [,3] [,4]
# [1,] 1 4 9 16
# [2,] 25 36 49 64
# [3,] 0 0 0 0
(B <- c(1, 2, 0))
# [1] 1 2 0
(X0 <- qr.coef(qr(A), B))
# [1] -0.437500 0.359375 NA NA
X0[is.na(X0)] <- 0
X0
# [1] -0.437500 0.359375 0.000000 0.000000
# Verification:
A %*% X0
# [,1]
# [1,] 1
# [2,] 2
# [3,] 0
Second example:
(A <- matrix(c(1, 2, 0, 0, 1, 2, 0, 0, 1, 2, 1, 0), nrow = 3, ncol = 4, byrow = TRUE))
# [,1] [,2] [,3] [,4]
# [1,] 1 2 0 0
# [2,] 1 2 0 0
# [3,] 1 2 1 0
(B <- c(1, 1, 2))
# [1] 1 1 2
qr.solve(A, B)
# Error in qr.solve(A, B) : singular matrix 'a' in solve
(X0 <- qr.coef(qr(A), B))
# [1] 1 NA 1 NA
X0[is.na(X0)] <- 0
X0
# [1] 1 0 1 0
A %*% X0
# [,1]
# [1,] 1
# [2,] 1
# [3,] 2
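A minimal helper wrapping this trick (a sketch; the name solve_underdetermined is made up here), which also warns when B is not in the column space of A, i.e. when the system has no solution at all:
solve_underdetermined <- function(A, B, tol = 1e-8) {
  X <- qr.coef(qr(A), B) # NA marks pivoted-out (redundant) columns
  X[is.na(X)] <- 0
  if (max(abs(A %*% X - B)) > tol) warning("system appears to be inconsistent")
  X
}
solve_underdetermined(A, B)
# [1] 1 0 1 0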

Extension/Optimisation of code: from one to several iterations

For a matrix of pairwise distances pdm (symmetric), where each row/column represents a point, and a vector of distances r, I will do the following for each point:
# some small toy data
# note. real data is bigger, e.g. ~15k points.
pdm <- matrix(data = c(0, 4, 3,
                       4, 0, 2,
                       3, 2, 0),
              nrow = 3, ncol = 3)
r <- seq(0, 5, .5)
length(r)
#> [1] 11
# index m corresponds to the order of points.
m <- c(1, 2, 3)
# change format
pdml <- as.list(as.data.frame(pdm))
# ---- 1
# procedure for first point (1)
a <- list()
for (i in seq_along(r)) {
  a[[i]] <- ifelse(0 < pdml[[1]] & pdml[[1]] <= r[i], 1, 0)
  a[[i]] <- which(a[[i]] != 0)
  # if-statement is needed since which() produces annoying integer(0) entries
  if (identical(a[[i]], integer(0))) a[[i]] <- 0
  a[[i]] <- sum(m[1] * m[a[[i]]])
}
# change format
do.call(rbind, a)
#> [,1]
#> [1,] 0
#> [2,] 0
#> [3,] 0
#> [4,] 0
#> [5,] 0
#> [6,] 0
#> [7,] 3
#> [8,] 3
#> [9,] 5
#> [10,] 5
#> [11,] 5
# ---- 2
# procedure for second point (2),
# ... adaptation: pdml[[2]] and m[2]
Created on 2022-08-09 by the reprex package (v2.0.1)
Desired Output
After the calculation is done, I would like to calculate the average for each distance $r_i$ across all points.
Can somebody please provide a solution that extends my approach to all points, or show an alternative, which certainly would be more efficient? Also, any recommendation on how to improve my question is much appreciated.
Note, if it makes things easier, it is, of course, also an option to use the upper/lower half of pdm only.
If you precalculate the possible products you want to sum with tcrossprod(m),
you can simplify the calculation to a couple of matrix operations:
# Input data
m <- c(1, 2, 3)
d <- matrix(
  data = c(
    0, 4, 3,
    4, 0, 2,
    3, 2, 0
  ),
  nrow = 3,
  ncol = 3
)
r <- seq(0, 5) # Reduced for simplicity
# Possible summands
v <- tcrossprod(m) * (d != 0)
v
#> [,1] [,2] [,3]
#> [1,] 0 2 3
#> [2,] 2 0 6
#> [3,] 3 6 0
# The calculation
a <- sapply(r, \(r) colSums(v * (d <= r)))
a
#> [,1] [,2] [,3] [,4] [,5] [,6]
#> [1,] 0 0 0 3 5 5
#> [2,] 0 0 6 6 8 8
#> [3,] 0 0 6 9 9 9
And since you said you then wanted the mean for each distance, over points:
colMeans(a)
#> [1] 0.000000 0.000000 4.000000 6.000000 7.333333 7.333333
A slightly more obscure but potentially faster way to find a would be
with 3-d arrays:
colSums(outer(v, rep(1, length(r))) * outer(d, r, `<=`))
#> [,1] [,2] [,3] [,4] [,5] [,6]
#> [1,] 0 0 0 3 5 5
#> [2,] 0 0 6 6 8 8
#> [3,] 0 0 6 9 9 9
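One caveat worth adding, given the ~15k points mentioned in the question: the 3-d array version materializes full n x n x length(r) arrays, which will not fit in memory at that scale, whereas the sapply version holds only one n x n matrix per distance. If you want type safety on top of that, vapply is a drop-in replacement (a sketch):
a <- vapply(r, function(ri) colSums(v * (d <= ri)), numeric(nrow(d)))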

calculate frequency or percentage matrix in R

If I have the following:
mm <- matrix(0, 4, 3)
mm<-apply(mm, c(1, 2), function(x) sample(c(0, 1), 1))
> mm
[,1] [,2] [,3]
[1,] 1 1 1
[2,] 1 1 0
[3,] 0 0 0
[4,] 1 0 1
How do I output a matrix that expresses the frequency or percentage of different columns where both values equal 1? For example, there are two rows out of 4 where columns 1 and 2 both equal 1 (= 0.5), and one row out of 4 where columns 2 and 3 both equal 1 (= 0.25), so in this case I'd need:
[,1] [,2] [,3]
[1,] 1 0.5 0.5
[2,] 0.5 1 0.25
[3,] 0.5 0.25 1
I am not interested in comparing the same columns, so by default the diagonal remains at 1.
I thought I might get somewhere with cor(mm), hoping there was a way to output co-frequencies or co-percentages instead of correlation coefficients, but this appears not to be the case. Still, the dimensions of the final output should be an N by N matrix, as cor() outputs:
> cor(mm)
[,1] [,2] [,3]
[1,] 1.0000000 0.5773503 0.5773503
[2,] 0.5773503 1.0000000 0.0000000
[3,] 0.5773503 0.0000000 1.0000000
but obviously these are correlation coefficients; I just want co-frequencies or co-percentages instead.
A base R solution uses crossprod, i.e.,
r <- `diag<-`(crossprod(mm) / nrow(mm), 1)
such that
> r
[,1] [,2] [,3]
[1,] 1.0 0.50 0.50
[2,] 0.5 1.00 0.25
[3,] 0.5 0.25 1.00
DATA
mm <- structure(c(1, 1, 0, 1, 1, 1, 0, 0, 1, 0, 0, 1), .Dim = 4:3)
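Why this works, as an added note: for 0/1 data, crossprod(mm)[i, j] equals sum(mm[, i] * mm[, j]), i.e. the number of rows where columns i and j are both 1, and dividing by nrow(mm) turns that count into a frequency:
all.equal(crossprod(mm)[1, 2] / nrow(mm), mean(mm[, 1] == 1 & mm[, 2] == 1))
# [1] TRUE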
Another approach builds each cell of the output explicitly with Map (using a reproducible mm):
set.seed(123)
mm <- matrix(0, 4, 3)
mm <- apply(mm, c(1, 2), function(x) sample(c(0, 1), 1))
combinations <- expand.grid(1:ncol(mm), 1:ncol(mm))
matrix(unlist(Map(function(x, y) {
  if (x == y) {
    res <- 1
  } else {
    res <- sum(mm[, x] * mm[, y]) / nrow(mm)
  }
  res
}, combinations[, 1], combinations[, 2])), 3)
# [,1] [,2] [,3]
# [1,] 1.00 0.25 0.0
# [2,] 0.25 1.00 0.5
# [3,] 0.00 0.50 1.0

Splitting list elements, expanding the list

I'm doing some kind of optical character recognition and face the following issue. I store the glyphs in a list of binary matrices, and they can be of different sizes, but their maximum possible width is wid = 3 columns (it may be any defined constant, not just 3). In some cases, after the first stage of processing, I get data which look like this:
myll <- list(matrix(c(0, 0, 0, 1, 1, 0), ncol = 2),
             matrix(c(0), ncol = 1),
             matrix(c(1, 1, 0), ncol = 3),
             matrix(c(1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1), ncol = 7),
             matrix(c(1, 1, 1, 1), ncol = 2))
# [[1]]
# [,1] [,2]
# [1,] 0 1
# [2,] 0 1
# [3,] 0 0
#
# [[2]]
# [,1]
# [1,] 0
#
# [[3]]
# [,1] [,2] [,3]
# [1,] 1 1 0
#
# [[4]]
# [,1] [,2] [,3] [,4] [,5] [,6] [,7]
# [1,] 1 1 1 0 0 0 1
# [2,] 0 1 0 1 0 0 1
# [3,] 1 1 1 1 0 0 1
#
# [[5]]
# [,1] [,2]
# [1,] 1 1
# [2,] 1 1
So, some glyphs may not be separated for some reason. This happens only with glyphs of the maximum possible width. Moreover, there may be some junk at the end of the matrix. I have to split them into matrices of width ncol = wid, leaving the last piece (junk) as is. Then I store these matrices in separate elements of the list to get the following output:
# [[1]]
# [,1] [,2]
# [1,] 0 1
# [2,] 0 1
# [3,] 0 0
#
# [[2]]
# [,1]
# [1,] 0
#
# [[3]]
# [,1] [,2] [,3]
# [1,] 1 1 0
#
# [[4]]
# [,1] [,2] [,3]
# [1,] 1 1 1
# [2,] 0 1 0
# [3,] 1 1 1
#
# [[5]]
# [,1] [,2] [,3]
# [1,] 0 0 0
# [2,] 1 0 0
# [3,] 1 0 0
#
# [[6]]
# [,1]
# [1,] 1
# [2,] 1
# [3,] 1
#
# [[7]]
# [,1] [,2]
# [1,] 1 1
# [2,] 1 1
At the moment I can make it work with the help of these functions:
checkGlyphs <- function(gl_m, wid = 3) {
  if (ncol(gl_m) > wid) {
    return(list(gl_m[, 1:wid], matrix(gl_m[, -(1:wid)], nrow = nrow(gl_m))))
  } else {
    return(gl_m)
  }
}
separateGlyphs <- function(myll, wid = 3) {
  require("magrittr")
  presplit <- lapply(myll, checkGlyphs, wid)
  total_new_length <-
    presplit[unlist(lapply(presplit, is.list))] %>% lapply(length) %>% unlist() %>% sum() +
    as.integer(!unlist(lapply(presplit, is.list))) %>% sum()
  splitted <- vector("list", length = total_new_length)
  spl_index <- 1
  for (i in 1:length(presplit)) {
    if (!is.list(presplit[[i]])) {
      splitted[[spl_index]] <- presplit[[i]]
      spl_index <- spl_index + 1
    } else {
      for (j in 1:length(presplit[[i]])) {
        splitted[[spl_index]] <- presplit[[i]][[j]]
        spl_index <- spl_index + 1
      }
    }
  }
  if (any(lapply(splitted, ncol) > wid)) {
    return(separateGlyphs(splitted, wid))
  } else {
    return(splitted)
  }
}
But I believe there is a faster and more convenient way to achieve the same result (without for loops with element-by-element reassignment inside them, and recursion when needed O_o).
I will be thankful for any suggestions on this point or, alternatively, for recommending some OCR packages for R.
This should do the trick, with the values in final being what you're after: split each matrix's columns into groups of at most wid (here 3), then flatten one level of the nested list.
final <- unlist(lapply(myll, function(m) {
  grp <- ceiling(seq_len(ncol(m)) / 3) # column groups 1,1,1,2,2,2,...
  lapply(split(seq_len(ncol(m)), grp), function(j) m[, j, drop = FALSE])
}), recursive = FALSE, use.names = FALSE)
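A quick check that the pieces have the expected widths, matching the desired output above:
vapply(final, ncol, integer(1))
# [1] 2 1 3 3 3 1 2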

R: mean values of a sum of matrices, not counting entries equal to zero

A = matrix(c(1, 2, 3, 0, 2, 2, 0, 2, 3), nrow = 3, ncol = 3)
[,1] [,2] [,3]
[1,] 1 0 0
[2,] 2 2 2
[3,] 3 2 3
B = matrix(c(1, 2, 3, 1, 4, 2, 2, 2, 1), nrow = 3, ncol = 3)
[,1] [,2] [,3]
[1,] 1 1 2
[2,] 2 4 2
[3,] 3 2 1
C = (A + B) divided element-wise by the number of values different from zero at each position:
C = matrix(c(1, 2, 3, 1, 3, 2, 2, 2, 2), nrow = 3, ncol = 3)
[,1] [,2] [,3]
[1,] 1 1 2
[2,] 2 3 2
[3,] 3 2 2
I need to do it for a list of N matrices (mat_vect[[i]]):
list_mat_vect[[i]] <- assign(paste("a", i, sep = ""), mat_vect[[i]])
Sum the matrices and get the mean value:
mat_sum_mean = Reduce("+", list_mat_vect) / length(file_list)
This divides by the total count of matrices, including positions where the value is zero. I don't want that.
You can do
(A+B)/((A!=0) + (B!=0))
to get
[,1] [,2] [,3]
[1,] 1 1 2
[2,] 2 3 2
[3,] 3 2 2
Here != tests for inequality with zero, returning TRUE or FALSE. When we add those up, the TRUEs are treated like 1 and the FALSEs become 0.
You can do this with a list of matrices as well
list_mat_vect <- list(A, B)
Reduce("+", list_mat_vect) / Reduce("+", lapply(list_mat_vect, function(x) x!=0))

Remove NA/NaN/Inf in a matrix

I want to try two things:
1. How do I remove rows that contain NA/NaN/Inf?
2. How do I set the value of a data point from NA/NaN/Inf to 0?
So far, I have tried using the following for NA values, but I have been getting errors and warnings:
> eg <- data[rowSums(is.na(data)) == 0,]
Error in rowSums(is.na(data)) :
'x' must be an array of at least two dimensions
In addition: Warning message:
In is.na(data) : is.na() applied to non-(list or vector) of type 'closure'
I guess I'll throw my hat into the ring with my preferred methods:
# sample data
m <- matrix(c(1,2,NA,NaN,1,Inf,-1,1,9,3),5)
# remove all rows with non-finite values
m[!rowSums(!is.finite(m)),]
# replace all non-finite values with 0
m[!is.finite(m)] <- 0
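Why the row filter works, as an added note: rowSums(!is.finite(m)) counts the non-finite entries in each row, and ! turns that count into TRUE exactly where it is 0:
rowSums(!is.finite(m))  # 1 0 1 1 0 for the sample m above
!rowSums(!is.finite(m)) # FALSE TRUE FALSE FALSE TRUE: keep rows 2 and 5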
library(functional)
m[apply(m, 1, Compose(is.finite, all)),]
Demonstration:
m <- matrix(c(1,2,3,NA,4,5), 3)
m
## [,1] [,2]
## [1,] 1 NA
## [2,] 2 4
## [3,] 3 5
m[apply(m, 1, Compose(is.finite, all)),]
## [,1] [,2]
## [1,] 2 4
## [2,] 3 5
Note: Compose(is.finite, all) is equivalent to function(x) all(is.finite(x))
To set the values to 0, use matrix indexing:
m[!is.finite(m)] <- 0
m
## [,1] [,2]
## [1,] 1 0
## [2,] 2 4
## [3,] 3 5
NaRV.omit(x) is my preferred option for question 1. The mnemonic NaRV means "not a regular value".
require(IDPmisc)
m <- matrix(c(1,2,3,NA,5, NaN, 7, 8, 9, Inf, 11, 12, -Inf, 14, 15), 5)
> m
[,1] [,2] [,3]
[1,] 1 NaN 11
[2,] 2 7 12
[3,] 3 8 -Inf
[4,] NA 9 14
[5,] 5 Inf 15
> NaRV.omit(m)
[,1] [,2] [,3]
[1,] 2 7 12
attr(,"na.action")
[1] 1 3 4 5
attr(,"class")
[1] "omit"
Just another way (for the first question):
m <- structure(c(1, 2, 3, NA, 4, 5, Inf, 5, 6, NaN, 7, 8),
               .Dim = c(4L, 3L))
# [,1] [,2] [,3]
# [1,] 1 4 6
# [2,] 2 5 NaN
# [3,] 3 Inf 7
# [4,] NA 5 8
m[complete.cases(m * 0), , drop=FALSE]
# [,1] [,2] [,3]
# [1,] 1 4 6
This last trick works because multiplying by zero turns Inf and -Inf into NaN (Inf * 0 is NaN), so complete.cases flags all three kinds of non-finite values at once. I can't think of anything other than Matthew's answer for the second part.
