I have a list of matrices that I've created. The matrices in the list have different dimensions, and I would like to pad the matrices that aren't 3x3 with NAs.
I have included my expected outcome below. I would like to put this in an if statement: if a matrix in the list isn't 3x3, I would like to add empty rows/columns to it and fill them with NA. Is there an efficient way of doing this in base R?
# Created Matrices
m1 <- matrix(1:9, 3,3)
m2 <- matrix(1:4, 2,2)
m3 <- matrix(1:3, 3, 1)
# Matrices into a list
l1 <- list(m1, m2, m3)
l1
# Expected Matrices and outputs
m2_new <- matrix(c(1,2,NA,3, 4, rep(NA, 4)), 3,3)
m3_new <- matrix(c(1,2,3,rep(NA, 6)), 3,3)
expected <- list(m1, m2_new, m3_new)
One option would be to create an NA matrix and replace its values with those of 'x', using x's row/column indices as a two-column index matrix:
dummy <- matrix(ncol = 3, nrow = 3)
l2 <- lapply(l1, function(x) replace(dummy, cbind(c(row(x)), c(col(x))), x))
Checking:
> all.equal(l2, expected)
[1] TRUE
You can also fill the top-left block of an all-NA matrix by calling the replacement function `[<-` directly with row and column indices.
mat <- array(dim = c(3, 3))
lapply(l1, function(x) `[<-`(mat, 1:nrow(x), 1:ncol(x), x))
# [[1]]
# [,1] [,2] [,3]
# [1,] 1 4 7
# [2,] 2 5 8
# [3,] 3 6 9
#
# [[2]]
# [,1] [,2] [,3]
# [1,] 1 3 NA
# [2,] 2 4 NA
# [3,] NA NA NA
#
# [[3]]
# [,1] [,2] [,3]
# [1,] 1 NA NA
# [2,] 2 NA NA
# [3,] 3 NA NA
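Building on the `[<-` approach, a small helper can make the target size a parameter instead of hardcoding 3x3. This is a sketch: pad_na(), nrow_out and ncol_out are hypothetical names, and it assumes no input matrix is larger than the target. Using seq_len() instead of 1:nrow(x) also keeps it from misbehaving on zero-row or zero-column input.

```r
# Hypothetical helper: pad a matrix with NA up to nrow_out x ncol_out
pad_na <- function(x, nrow_out = 3, ncol_out = 3) {
  # assumption: x never exceeds the target size
  stopifnot(nrow(x) <= nrow_out, ncol(x) <= ncol_out)
  out <- matrix(NA, nrow = nrow_out, ncol = ncol_out)
  # seq_len() gives an empty index for 0 rows/cols, unlike 1:0
  out[seq_len(nrow(x)), seq_len(ncol(x))] <- x
  out
}

m1 <- matrix(1:9, 3, 3)
m2 <- matrix(1:4, 2, 2)
m3 <- matrix(1:3, 3, 1)
l2 <- lapply(list(m1, m2, m3), pad_na)

# a 1 x 3 input, and a non-default 4 x 2 target
pad_na(matrix(1:3, 1, 3))
pad_na(m2, nrow_out = 4, ncol_out = 2)
```

Because the NA matrix is logical, assigning an integer block into it promotes the whole result to integer, so l2[[1]] is identical to m1.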
Update (see comment by Darren Tsai):
n <- 3
l2 <- lapply(l1, function(x) rbind(x, matrix(ncol = ncol(x), nrow = n - nrow(x))))
x <- sapply(l2, `length<-`, max(lengths(l2)))
list(m1 = matrix(x[,1],3), m2 = matrix(x[,2],3), m3 = matrix(x[,3],3))
$m1
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
$m2
[,1] [,2] [,3]
[1,] 1 3 NA
[2,] 2 4 NA
[3,] NA NA NA
$m3
[,1] [,2] [,3]
[1,] 1 NA NA
[2,] 2 NA NA
[3,] 3 NA NA
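The last line of the update hardcodes one list element per matrix. A sketch that builds the list for any number of matrices is below; it also pads to n * n rather than max(lengths(l2)), on the assumption that the target is always n x n (max(lengths(l2)) would fall short if no matrix in the list were already full size). The m1, m2, ... names are generated only to mirror the output above.

```r
m1 <- matrix(1:9, 3, 3)
m2 <- matrix(1:4, 2, 2)
m3 <- matrix(1:3, 3, 1)
l1 <- list(m1, m2, m3)
n <- 3

# Pad rows first, then pad each flattened (column-major) matrix to n * n values
l2 <- lapply(l1, function(x) rbind(x, matrix(ncol = ncol(x), nrow = n - nrow(x))))
x <- sapply(l2, `length<-`, n * n)

# One n x n matrix per column of x, however many matrices l1 holds
res <- lapply(seq_len(ncol(x)), function(i) matrix(x[, i], nrow = n))
names(res) <- paste0("m", seq_along(res))
res
```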
First answer (does not give the correct output):
Here is another approach:
x <- t(sapply(l1, `length<-`, max(lengths(l1))))
l2 <- list(x[,1:3], x[,4:6], x[,7:9])
l2
[[1]]
[,1] [,2] [,3]
[1,] 1 1 1
[2,] 1 2 3
[3,] 1 2 3
[[2]]
[,1] [,2] [,3]
[1,] 2 2 2
[2,] 4 NA NA
[3,] NA NA NA
[[3]]
[,1] [,2] [,3]
[1,] 3 3 3
[2,] NA NA NA
[3,] NA NA NA
I think there are better solutions, but mine also handles a 1x1 matrix, which is really just a vector.
You can use the function I've made here, resize_matrix, in your code however you'd like. It is pretty verbose, but I think it's easy to understand exactly what it's doing under the hood. Note: the function is meant to be used in an lapply() call.
The input:
m1 <- matrix(1:9, 3,3)
m2 <- matrix(1:4, 2,2)
m3 <- matrix(1:3, 3, 1)
m4 <- matrix(1:3, 1, 3)
m5 <- matrix(1, 1, 1)
# Matrices into a list
l1 <- list(m1, m2, m3, m4, m5)
l1
#> [[1]]
#> [,1] [,2] [,3]
#> [1,] 1 4 7
#> [2,] 2 5 8
#> [3,] 3 6 9
#>
#> [[2]]
#> [,1] [,2]
#> [1,] 1 3
#> [2,] 2 4
#>
#> [[3]]
#> [,1]
#> [1,] 1
#> [2,] 2
#> [3,] 3
#>
#> [[4]]
#> [,1] [,2] [,3]
#> [1,] 1 2 3
#>
#> [[5]]
#> [,1]
#> [1,] 1
The function:
resize_matrix <- function(mat, desired_rows = 3, desired_columns = 3){
needed_cols <- desired_columns - dim(mat)[2]
needed_rows <- desired_rows - dim(mat)[1]
if (dim(mat)[1] == 1 && dim(mat)[2] == 1){
# we're given a matrix with a single value, expand correctly
final_mat <- matrix(NA, nrow = desired_rows, ncol = desired_columns)
final_mat[1,1] <- mat
} else if (needed_cols > 0 && needed_rows > 0){
# we need to add both rows and columns; use full NA blocks so that
# more than one missing row/column is added (a bare NA vector adds only one)
col_res <- matrix(NA, nrow = needed_rows, ncol = dim(mat)[2])
row_res <- matrix(NA, nrow = desired_rows, ncol = needed_cols)
mat_temp1 <- rbind(mat, col_res)
final_mat <- unname(cbind(mat_temp1, row_res))
} else if (needed_cols > 0 && needed_rows == 0) {
# we need to add only columns
row_res <- matrix(rep(NA, needed_cols),
ncol = needed_cols, nrow = desired_rows)
final_mat <- unname(cbind(mat, row_res))
} else if (needed_cols == 0 && needed_rows > 0) {
# we need to add only rows
col_res <- matrix(rep(NA, needed_rows),
ncol = desired_columns, nrow = needed_rows)
final_mat <- unname(rbind(mat, col_res))
} else {
# we don't need to add anything, return the matrix
final_mat <- mat
}
return(final_mat)
}
The output:
lapply(l1, FUN = resize_matrix)
#> [[1]]
#> [,1] [,2] [,3]
#> [1,] 1 4 7
#> [2,] 2 5 8
#> [3,] 3 6 9
#>
#> [[2]]
#> [,1] [,2] [,3]
#> [1,] 1 3 NA
#> [2,] 2 4 NA
#> [3,] NA NA NA
#>
#> [[3]]
#> [,1] [,2] [,3]
#> [1,] 1 NA NA
#> [2,] 2 NA NA
#> [3,] 3 NA NA
#>
#> [[4]]
#> [,1] [,2] [,3]
#> [1,] 1 2 3
#> [2,] NA NA NA
#> [3,] NA NA NA
#>
#> [[5]]
#> [,1] [,2] [,3]
#> [1,] 1 NA NA
#> [2,] NA NA NA
#> [3,] NA NA NA
Created on 2022-04-16 by the reprex package (v2.0.1)
cbind(1:2, 1:10)
[,1] [,2]
[1,] 1 1
[2,] 2 2
[3,] 1 3
[4,] 2 4
[5,] 1 5
[6,] 2 6
[7,] 1 7
[8,] 2 8
[9,] 1 9
[10,] 2 10
I want an output like below
[,1] [,2]
[1,] 1 1
[2,] 2 2
[3,] 3
[4,] 4
[5,] 5
[6,] 6
[7,] 7
[8,] 8
[9,] 9
[10,] 10
The trick is to make all your inputs the same length.
x <- 1:2
y <- 1:10
n <- max(length(x), length(y))
length(x) <- n
length(y) <- n
If you want your output to be a matrix, then cbind works, but you get additional NA values to pad out the rectangle.
cbind(x, y)
x y
[1,] 1 1
[2,] 2 2
[3,] NA 3
[4,] NA 4
[5,] NA 5
[6,] NA 6
[7,] NA 7
[8,] NA 8
[9,] NA 9
[10,] NA 10
To get rid of the NAs, the output must be a list.
Map(function(...)
{
ans <- c(...)
ans[!is.na(ans)]
}, as.list(x), as.list(y)
)
[[1]]
[1] 1 1
[[2]]
[1] 2 2
[[3]]
[1] 3
[[4]]
[1] 4
[[5]]
[1] 5
[[6]]
[1] 6
[[7]]
[1] 7
[[8]]
[1] 8
[[9]]
[1] 9
[[10]]
[1] 10
EDIT: I swapped mapply(..., SIMPLIFY = FALSE) for Map.
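If the NAs are only unwanted in the printed output (as in the OP's desired layout), the padded matrix can be kept as is and printed with the na.print argument of print() instead of being turned into a list. A minimal sketch; the NAs are still stored in the matrix, only their display changes:

```r
x <- 1:2
y <- 1:10
n <- max(length(x), length(y))
length(x) <- n
length(y) <- n
m <- cbind(x, y)
colnames(m) <- NULL

# na.print controls how NA cells are shown; "" leaves them blank
print(m, na.print = "")
```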
I came across a similar problem and would like to suggest an additional solution that I hope some may find useful. The solution is fairly straightforward and makes use of the cbind.na function from the qpcR package.
Example
x <- 1:2
y <- 1:10
dta <- qpcR:::cbind.na(x, y)
Results
> head(dta)
x y
[1,] 1 1
[2,] 2 2
[3,] NA 3
[4,] NA 4
[5,] NA 5
[6,] NA 6
Side comments
Following the OP's original example, column names can be easily removed:
colnames(dta) <- NULL
which reproduces the OP's layout, with NA in the empty cells:
> head(dta)
[,1] [,2]
[1,] 1 1
[2,] 2 2
[3,] NA 3
[4,] NA 4
[5,] NA 5
[6,] NA 6
I would like to propose an alternative solution that makes use of the rowr package and its cbind.fill function.
> rowr::cbind.fill(1:2,1:10, fill = NA);
object object
1 1 1
2 2 2
3 NA 3
4 NA 4
5 NA 5
6 NA 6
7 NA 7
8 NA 8
9 NA 9
10 NA 10
Or alternatively, to match the OP's desired output:
> rowr::cbind.fill(1:2,1:10, fill = '');
object object
1 1 1
2 2 2
3 3
4 4
5 5
6 6
7 7
8 8
9 9
10 10
Helper function...
bind.pad <- function(l, side="r", len=max(sapply(l,length)))
{
if (side %in% c("b", "r")) {
out <- sapply(l, 'length<-', value=len)
} else {
out <- sapply(lapply(lapply(l, rev), 'length<-', value=len), rev)}
if (side %in% c("r", "l")) out <- t(out)
out
}
Examples:
> l <- lapply(c(3,2,1,2,3),seq)
> lapply(c("t","l","b","r"), bind.pad, l=l, len=4)
[[1]]
[,1] [,2] [,3] [,4] [,5]
[1,] NA NA NA NA NA
[2,] 1 NA NA NA 1
[3,] 2 1 NA 1 2
[4,] 3 2 1 2 3
[[2]]
[,1] [,2] [,3] [,4]
[1,] NA 1 2 3
[2,] NA NA 1 2
[3,] NA NA NA 1
[4,] NA NA 1 2
[5,] NA 1 2 3
[[3]]
[,1] [,2] [,3] [,4] [,5]
[1,] 1 1 1 1 1
[2,] 2 2 NA 2 2
[3,] 3 NA NA NA 3
[4,] NA NA NA NA NA
[[4]]
[,1] [,2] [,3] [,4]
[1,] 1 2 3 NA
[2,] 1 2 NA NA
[3,] 1 NA NA NA
[4,] 1 2 NA NA
[5,] 1 2 3 NA
Given that some of the solutions above rely on packages that are no longer available, here is a helper function that only uses tidyverse packages: dplyr, plus purrr for map()/map_int() and tibble for add_row().
library(dplyr)
library(purrr)
library(tibble)
bind_cols_fill <- function(df_list) {
max_rows <- map_int(df_list, nrow) %>% max()
map(df_list, function(df) {
if(nrow(df) == max_rows) return(df)
first <- names(df)[1] %>% sym()
df %>% add_row(!!first := rep(NA, max_rows - nrow(df)))
}) %>% bind_cols()
}
Note that this takes a list of data frames, so it is slightly cumbersome if one only wants to combine two vectors:
x <- 1:2
y <- 1:10
bind_cols_fill(list(tibble(x), tibble(y)))
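For plain data frames, a similar padding can be sketched in base R with no packages. bind_cols_fill_base() is a hypothetical name; the trick relies on `[.data.frame` returning all-NA rows for row indices past nrow(). Tibbles may raise an error for such indices instead, so the inputs are assumed to be ordinary data.frames.

```r
# Hypothetical base-R counterpart of bind_cols_fill() for plain data frames
bind_cols_fill_base <- function(df_list) {
  max_rows <- max(vapply(df_list, nrow, integer(1)))
  padded <- lapply(df_list, function(df) {
    # row indices past nrow(df) come back as rows of NA
    out <- df[seq_len(max_rows), , drop = FALSE]
    rownames(out) <- NULL  # drop the "NA", "NA.1", ... row names
    out
  })
  do.call(cbind, padded)
}

res <- bind_cols_fill_base(list(data.frame(x = 1:2), data.frame(y = 1:10)))
res
```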
Another solution with no dependencies:
my_bind <- function(x, y){
if(length(x = x) > length(x = y)){
len_diff <- length(x) - length(y)
y <- c(y, rep(NA, len_diff))
}else if(length(x = x) < length(x = y)){
len_diff <- length(y) - length(x)
x <- c(x, rep(NA, len_diff))
}
cbind(x, y)
}
my_bind(x = letters[1:4], y = letters[1:2])
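The `length<-` padding used in several answers above generalizes to any number of vectors, not just two. A minimal sketch; cbind_pad() is a hypothetical helper name, not part of base R. Named arguments become column names because do.call() passes the list names through to cbind().

```r
# Hypothetical helper: cbind any number of vectors, padding the shorter
# ones with NA through `length<-` so that cbind() never recycles them
cbind_pad <- function(...) {
  vecs <- list(...)
  n <- max(lengths(vecs))
  do.call(cbind, lapply(vecs, `length<-`, n))
}

res <- cbind_pad(1:2, 1:10, 1:5)
named <- cbind_pad(x = letters[1:4], y = letters[1:2])
res
named
```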
Let's say I have a 3-dimensional array:
a <- array(dim = c(3, 2, 3))
and a vector with indices for the 1st dimension:
ind <- c(1,2,3)
Now I want to put a number (e.g. 1) into row ind[i], across the whole 2nd dimension, of slice ind[i] of the 3rd dimension, for each i.
a[ind,,] <- matrix(1, ncol = 2, nrow = length(ind))
clearly does not work, because it fills the ind rows of every slice. A loop would work, but is there a better solution?
The result should be:
, , 1
[,1] [,2]
[1,] 1 1
[2,] NA NA
[3,] NA NA
, , 2
[,1] [,2]
[1,] NA NA
[2,] 1 1
[3,] NA NA
, , 3
[,1] [,2]
[1,] NA NA
[2,] NA NA
[3,] 1 1
Using mapply.
mapply(function(x, y) {a[x,,y] <<- 1;a}, ind, ind)
a
# , , 1
#
# [,1] [,2]
# [1,] 1 1
# [2,] NA NA
# [3,] NA NA
#
# , , 2
#
# [,1] [,2]
# [1,] NA NA
# [2,] 1 1
# [3,] NA NA
#
# , , 3
#
# [,1] [,2]
# [1,] NA NA
# [2,] NA NA
# [3,] 1 1
Or, for this specific case where x and y are the same, just:
sapply(ind, function(x) {a[x,,x] <<- 1;a})
Note: if you don't need the console output, wrap the code in invisible().
Another alternative replaces everything in one step. It takes advantage of how `[` behaves when its index is a matrix: each column of the index matrix indexes one dimension of the array, and each row picks out one cell.
To confirm that it is putting values where we intend, I'll modify the replacement matrix to be 1:6.
# original data
a <- array(dim = c(3, 2, 3))
# the replacement matrix
r <- matrix(1:6, ncol = 2, nrow = length(ind))
# this will be what we index `a` on
m <- expand.grid(ind, seq_len(dim(a)[2]))
m$Var3 <- m$Var1 # repeat `ind` in the third dimension
m
# Var1 Var2 Var3
# 1 1 1 1
# 2 2 1 2
# 3 3 1 3
# 4 1 2 1
# 5 2 2 2
# 6 3 2 3
And the replacement:
a[as.matrix(m)] <- r
a
# , , 1
# [,1] [,2]
# [1,] 1 4
# [2,] NA NA
# [3,] NA NA
# , , 2
# [,1] [,2]
# [1,] NA NA
# [2,] 2 5
# [3,] NA NA
# , , 3
# [,1] [,2]
# [1,] NA NA
# [2,] NA NA
# [3,] 3 6
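The same index matrix can be built without expand.grid() by combining rep() and cbind(). A sketch that writes the question's constant 1; idx and j are illustrative names. Each row of idx is one (dim1, dim2, dim3) cell, pairing every column of the 2nd dimension with each value of ind in the 1st and 3rd dimensions.

```r
a <- array(dim = c(3, 2, 3))
ind <- c(1, 2, 3)

# Column index of the 2nd dimension, repeated once per element of ind
j <- rep(seq_len(dim(a)[2]), each = length(ind))
# Rows: (ind, j, ind), i.e. row ind[i] of slice ind[i], for every column j
idx <- cbind(rep(ind, times = dim(a)[2]), j, rep(ind, times = dim(a)[2]))
a[idx] <- 1
a
```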
I plan to write a function called lagit(a,k) that gives a result like this:
lagit(c(1,5,6,4,7),c(1,3))
then it should output:
L0 L1 L3
1 NA NA
5 1 NA
6 5 NA
4 6 1
7 4 5
I'm facing 2 problems right now:
1. how to lag each column by each element of vector k;
2. how to bind a vector to a matrix (I used a for loop).
I was asked to use base functions only, so I can't call any functions from other packages.
Try this:
lagit <- function(a,k) {
tmp <- lapply(k,function(i) c(rep(NA,i),head(a,length(a)-i)))
res <- cbind(a,do.call(cbind,tmp))
colnames(res) <- paste0("L",c(0,k))
res
}
lagit(a,k)
# L0 L1 L3
#[1,] 1 NA NA
#[2,] 5 1 NA
#[3,] 6 5 NA
#[4,] 4 6 1
#[5,] 7 4 5
Where:
a <- c(1,5,6,4,7)
k <- c(1,3)
Here is an alternative approach
x <- c(1,5,6,4,7)
# Define a function that operates on a vector x
lagit <- function(x, k) {
stopifnot(k >= 0 & k <= length(x))
n <- length(x)
# seq_len() gives an empty index when k == n (1:0 would count down to 0)
replace(rep(NA, n), k + seq_len(n - k), x[seq_len(n - k)])
}
While not strictly necessary, I've added a stopifnot() call to ensure that the lag is non-negative and no greater than the length of the vector.
# Use sapply to apply lagit to different lags and store result as a matrix
sapply(c(0, 1, 3), function(k) lagit(x, k))
# [,1] [,2] [,3]
#[1,] 1 NA NA
#[2,] 5 1 NA
#[3,] 6 5 NA
#[4,] 4 6 1
#[5,] 7 4 5
A recursive solution:
myLag <- function(x, n){
if(n > 0) myLag(c(NA, x)[1:length(x)], n-1) else x
}
For a non-negative lag, this mimics the basic behavior of dplyr::lag() and data.table::shift(). Let's test it:
myLag(1:10, 3)
# [1] NA NA NA 1 2 3 4 5 6 7
In your case:
a <- c(1,5,6,4,7)
b <- c(1,3)
> sapply(b, myLag, x = a)
[1,] NA NA
[2,] 1 NA
[3,] 5 NA
[4,] 6 1
[5,] 4 5
> cbind(a, sapply(b, myLag, x = a))
[1,] 1 NA NA
[2,] 5 1 NA
[3,] 6 5 NA
[4,] 4 6 1
[5,] 7 4 5
Yet another option, which uses vapply and `length<-` under the hood:
lagit <- function(a, k) {
l <- length(a)
k <- if (0 %in% k) k else c(0, k)
vapply(k, function(x) `length<-`(c(rep(NA, times = x), a), l), numeric(l))
}
lagit(1:5, c(1, 3, 6))
# [,1] [,2] [,3] [,4]
#[1,] 1 NA NA NA
#[2,] 2 1 NA NA
#[3,] 3 2 NA NA
#[4,] 4 3 1 NA
#[5,] 5 4 2 NA
A base R solution
myLag <- function(x, n){
# a lag as long as x (or longer) leaves only NAs, but keep length(x)
if(n >= length(x))
return(rep(NA, length(x)))
else if(n < length(x) && n > 0)
c(rep(NA,n), x[1:(length(x)-n)])
else
x
}
lagit <- function(x,y){
cbind(x, sapply(y, function(z) myLag(x,z)))
}
> lagit(c(1,5,6,4,7),c(1,3))
x
[1,] 1 NA NA
[2,] 5 1 NA
[3,] 6 5 NA
[4,] 4 6 1
[5,] 7 4 5
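A swapped-in technique, not from the answers above: embed() builds all lags in one call. Column d + 1 of embed(v, D) is v lagged by d, so padding a with max(k) leading NAs yields exactly one row per element of a. Note that embed() lives in the stats package, which ships with R and is attached by default; whether that counts as "base only" depends on the asker's constraints. lagit_embed() is a hypothetical name.

```r
# Sketch: all requested lags at once via stats::embed()
lagit_embed <- function(a, k) {
  lags <- unique(c(0, k))          # always include the unlagged column L0
  stopifnot(all(lags >= 0))
  m <- max(lags)
  # m leading NAs + a has length(a) + m values, so embed() returns length(a) rows
  res <- embed(c(rep(NA, m), a), m + 1)[, lags + 1, drop = FALSE]
  colnames(res) <- paste0("L", lags)
  res
}

lagit_embed(c(1, 5, 6, 4, 7), c(1, 3))
```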