I have a list with 535 elements where each of these elements is a 1575x1575 matrix.
Some of the rows and columns are however entirely NAs.
I want to remove these rows and columns and already wrote a line which works when I just apply it for one entry.
But I can't figure out how to apply this function to the whole list. covmatrix is my list in this example.
testf <- function(i){
covmatrix[[i]][apply(!is.na(covmatrix[[i]]),2,any),apply(!is.na(covmatrix[[i]]),2,any)]
}
newlist <- lapply(covmatrix, testf)
I get the error code: Error in covmatrix[[i]] : no such Index at Level 1
I guess I do not understand properly how lapply works.
Let's take the following toy example data:
matlist <- lapply(1:3, function(x) matrix(1:9, ncol = 3))
matlist[[2]][1,] <- NA
matlist[[3]][,1] <- NA
matlist
#> [[1]]
#> [,1] [,2] [,3]
#> [1,] 1 4 7
#> [2,] 2 5 8
#> [3,] 3 6 9
#>
#> [[2]]
#> [,1] [,2] [,3]
#> [1,] NA NA NA
#> [2,] 2 5 8
#> [3,] 3 6 9
#>
#> [[3]]
#> [,1] [,2] [,3]
#> [1,] NA 4 7
#> [2,] NA 5 8
#> [3,] NA 6 9
It makes coding a lot easier if we break down the problem into little chunks. For a complex problem, clarity of code is more important than brevity.
First we need a function that will return FALSE if all elements of a vector are NA, and TRUE otherwise:
notallNA <- function(vector) !all(is.na(vector))
Now we write a second function that uses our first function to remove rows and columns that consist purely of NAs from a matrix:
remove_NA <- function(mat) {
valid_rows <- apply(mat, 1, notallNA)
valid_cols <- apply(mat, 2, notallNA)
return(mat[valid_rows, valid_cols])
}
Finally, we can lapply this function to our list of matrices:
lapply(matlist, remove_NA)
#> [[1]]
#> [,1] [,2] [,3]
#> [1,] 1 4 7
#> [2,] 2 5 8
#> [3,] 3 6 9
#>
#> [[2]]
#> [,1] [,2] [,3]
#> [1,] 2 5 8
#> [2,] 3 6 9
#>
#> [[3]]
#> [,1] [,2]
#> [1,] 4 7
#> [2,] 5 8
#> [3,] 6 9
Note that, although we could squash these two functions into one or two lines of code, and do the whole thing as a lambda inside an lapply, the above code is simpler and easier to read / maintain than:
lapply(matlist, function(x) x[apply(x, 1, function(y) !all(is.na(y))),
apply(x, 2, function(y) !all(is.na(y)))])
#> [[1]]
#> [,1] [,2] [,3]
#> [1,] 1 4 7
#> [2,] 2 5 8
#> [3,] 3 6 9
#>
#> [[2]]
#> [,1] [,2] [,3]
#> [1,] 2 5 8
#> [2,] 3 6 9
#>
#> [[3]]
#> [,1] [,2]
#> [1,] 4 7
#> [2,] 5 8
#> [3,] 6 9
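As for why the original attempt failed: lapply hands each element of the list (here, a whole matrix) to the function, not its position, so covmatrix[[i]] ends up being indexed with a matrix. The following is only a sketch of the two usual ways around that; covmatrix_toy is a made-up stand-in for the real list, and the row/column test uses rowSums and colSums rather than the original apply calls.

```r
# Hypothetical stand-in for the asker's list of matrices
covmatrix_toy <- lapply(1:3, function(x) matrix(1:9, ncol = 3))
covmatrix_toy[[2]][1, ] <- NA
covmatrix_toy[[2]][, 1] <- NA

# Option 1: iterate over positions, so `i` really is an index into the list
by_index <- lapply(seq_along(covmatrix_toy), function(i) {
  m <- covmatrix_toy[[i]]
  m[rowSums(!is.na(m)) > 0, colSums(!is.na(m)) > 0, drop = FALSE]
})

# Option 2: iterate over the elements themselves, so `m` is already a matrix
by_element <- lapply(covmatrix_toy, function(m) {
  m[rowSums(!is.na(m)) > 0, colSums(!is.na(m)) > 0, drop = FALSE]
})
```

Option 2 is usually preferable because the function no longer depends on a variable outside its own arguments.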
Assume that your list of matrices looks like this
set.seed(100)
ls_of_mat <- replicate(5, matrix(sample(c(NA, 1:10), size = 36, T, c(.7, rep(.3 / 10, 10))), 6), F)
[[1]]
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] NA 5 NA NA NA NA
[2,] NA NA NA NA NA 9
[3,] NA NA 4 NA 4 NA
[4,] NA NA NA 2 8 10
[5,] NA NA NA NA NA NA
[6,] NA 8 NA 7 NA 8
[[2]]
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] NA 4 NA NA NA NA
[2,] NA 6 NA NA NA NA
[3,] 1 NA NA NA 10 NA
[4,] NA NA NA NA NA NA
[5,] NA 4 NA NA NA NA
[6,] 3 8 NA NA NA NA
[[3]]
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] NA 6 NA 8 NA NA
[2,] 10 NA NA NA NA NA
[3,] NA NA 7 NA NA NA
[4,] NA NA NA NA 4 NA
[5,] 3 9 NA 8 NA 1
[6,] 4 1 7 NA NA 2
Your logic simplifies to
# 1. find non-NA elements
# 2. drop rows and cols with less than one (zero) non-NA element
lapply(ls_of_mat, function(x) {
is_value <- !is.na(x)
x[rowSums(is_value) > 0L, colSums(is_value) > 0L]
})
Output
[[1]]
[,1] [,2] [,3] [,4] [,5]
[1,] 5 NA NA NA NA
[2,] NA NA NA NA 9
[3,] NA 4 NA 4 NA
[4,] NA NA 2 8 10
[5,] 8 NA 7 NA 8
[[2]]
[,1] [,2] [,3]
[1,] NA 4 NA
[2,] NA 6 NA
[3,] 1 NA 10
[4,] NA 4 NA
[5,] 3 8 NA
[[3]]
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] NA 6 NA 8 NA NA
[2,] 10 NA NA NA NA NA
[3,] NA NA 7 NA NA NA
[4,] NA NA NA NA 4 NA
[5,] 3 9 NA 8 NA 1
[6,] 4 1 7 NA NA 2
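One edge case worth keeping in mind with this kind of subsetting: if only a single row or column survives, R drops the result to a plain vector unless drop = FALSE is given. A small sketch with a made-up matrix m (not part of the data above):

```r
# Hypothetical 3x3 matrix with a single non-NA value
m <- matrix(NA_real_, nrow = 3, ncol = 3)
m[2, 2] <- 5

is_value <- !is.na(m)
keep_rows <- rowSums(is_value) > 0
keep_cols <- colSums(is_value) > 0

dropped <- m[keep_rows, keep_cols]                # collapses to a plain vector
kept    <- m[keep_rows, keep_cols, drop = FALSE]  # stays a 1x1 matrix

is.matrix(dropped)
is.matrix(kept)
```

If every element of the resulting list has to remain a matrix, adding drop = FALSE inside the lapply call is the safer choice.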
I have a bunch of columns which all start on the same row but I would rather them all end on the same row. Here is a simplified example
A <- c(2,7,3,5,5,9,8,1,NA,NA)
B <- c(NA,5,2,1,6,4,6,7,NA,NA)
C <- c(NA,NA,NA,NA,3,6,7,1,5,6)
Start <- cbind(A,B,C)
Which gives:
A B C
[1,] 2 NA NA
[2,] 7 5 NA
[3,] 3 2 NA
[4,] 5 1 NA
[5,] 5 6 3
[6,] 9 4 6
[7,] 8 6 7
[8,] 1 7 1
[9,] NA NA 5
[10,] NA NA 6
But I want to manipulate this so it is output like this:
A B C
[1,] NA NA NA
[2,] NA NA NA
[3,] 2 NA NA
[4,] 7 5 NA
[5,] 3 2 3
[6,] 5 1 6
[7,] 5 6 7
[8,] 9 4 1
[9,] 8 6 5
[10,] 1 7 6
Couldn't really find a solution on this site. Thanks for any help.
You can try:
apply(Start, 2, function(x) rev(`length<-`(na.omit(rev(x)), nrow(Start))))
A B C
[1,] NA NA NA
[2,] NA NA NA
[3,] 2 NA NA
[4,] 7 5 NA
[5,] 3 2 3
[6,] 5 1 6
[7,] 5 6 7
[8,] 9 4 1
[9,] 8 6 5
[10,] 1 7 6
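The one-liner packs several steps together. Unrolled for a single column (column B from the question), it might look like the sketch below; the intermediate names are only for illustration.

```r
x <- c(NA, 5, 2, 1, 6, 4, 6, 7, NA, NA)   # column B from the question

reversed   <- rev(x)                       # trailing NAs are now at the front
without_na <- na.omit(reversed)            # drop every NA, keep the value order
padded     <- `length<-`(without_na, length(x))  # pad back to full length with NA at the end
result     <- rev(padded)                  # reverse again: NAs first, values in original order

result
```

Because all NAs are removed and then re-added on one side, this also moves NAs that sit in the middle of a column, not only those at the end.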
We can try apply + is.na
apply(Start,2,function(x) c(x[is.na(x)],x[!is.na(x)]))
or
apply(Start,2,function(x) do.call(c,rev(split(x,is.na(x)))))
such that
A B C
[1,] NA NA NA
[2,] NA NA NA
[3,] 2 NA NA
[4,] 7 5 NA
[5,] 3 2 3
[6,] 5 1 6
[7,] 5 6 7
[8,] 9 4 1
[9,] 8 6 5
[10,] 1 7 6
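sort() has an na.last argument for this. Note that a plain sort() also puts the non-NA values in ascending order, so the first version below only matches the desired output if that reordering is acceptable: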
A <- c(2,7,3,5,5,9,8,1,NA,NA)
B <- c(NA,5,2,1,6,4,6,7,NA,NA)
C <- c(NA,NA,NA,NA,3,6,7,1,5,6)
Start <- as.data.frame(cbind(A,B,C) ) # added "as.data.frame" here ..
do.call(cbind, lapply(Start, sort, na.last = FALSE))
Or, to keep the non-NA values in their original order:
do.call(cbind, lapply(Start, function(x) {
res <- sort(x, na.last = FALSE)
res[!is.na(res)] <- x[!is.na(x)]
res
}))
# A B C
# [1,] NA NA NA
# [2,] NA NA NA
# [3,] 2 NA NA
# [4,] 7 5 NA
# [5,] 3 2 3
# [6,] 5 1 6
# [7,] 5 6 7
# [8,] 9 4 1
# [9,] 8 6 5
#[10,] 1 7 6
Goal
I want to use a long vector of numbers, to create a matrix where each column is a successive offset (lag or lead) of the original vector. If n is the maximum offset, the matrix will have dimensions [length(vector), n * 2 + 1] (because we want offsets in both directions, and include the 0 offset, i.e. the original vector).
Example
To illustrate, consider the following vector:
test <- c(2, 8, 1, 10, 7, 5, 9, 3, 4, 6)
[1] 2 8 1 10 7 5 9 3 4 6
Expected output
Now we create offsets of values, let's say for n == 3:
[,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,] NA NA NA 2 8 1 10
[2,] NA NA 2 8 1 10 7
[3,] NA 2 8 1 10 7 5
[4,] 2 8 1 10 7 5 9
[5,] 8 1 10 7 5 9 3
[6,] 1 10 7 5 9 3 4
[7,] 10 7 5 9 3 4 6
[8,] 7 5 9 3 4 6 NA
[9,] 5 9 3 4 6 NA NA
[10,] 9 3 4 6 NA NA NA
I am looking for an efficient solution. data.table or tidyverse solutions more than welcome.
Returning only the rows that have no NA's (i.e. rows 4 to 7) is also ok.
Current solution
lags <- lapply(3:1, function(x) dplyr::lag(test, x))
leads <- lapply(1:3, function(x) dplyr::lead(test, x))
l <- c(lags, test, leads)
matrix(unlist(l), nrow = length(test))
In base R, you can use embed to get rows 4 through 7. You have to reverse the column order, however.
embed(test, 7)[, 7:1]
[,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,] 2 8 1 10 7 5 9
[2,] 8 1 10 7 5 9 3
[3,] 1 10 7 5 9 3 4
[4,] 10 7 5 9 3 4 6
data
test <- c(2, 8, 1, 10, 7, 5, 9, 3, 4, 6)
This will produce what you need...
n <- 3
t(embed(c(rep(NA,n), test, rep(NA,n)), length(test)))[length(test):1,]
[,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,] NA NA NA 2 8 1 10
[2,] NA NA 2 8 1 10 7
[3,] NA 2 8 1 10 7 5
[4,] 2 8 1 10 7 5 9
[5,] 8 1 10 7 5 9 3
[6,] 1 10 7 5 9 3 4
[7,] 10 7 5 9 3 4 6
[8,] 7 5 9 3 4 6 NA
[9,] 5 9 3 4 6 NA NA
[10,] 9 3 4 6 NA NA NA
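A slightly more direct variant of the same padding idea, wrapped in a function, could look like this. It is only a sketch: offset_matrix is a made-up name, and it embeds with a window of 2 * n + 1 and reverses the columns instead of transposing.

```r
# Hypothetical helper: one row per element of x, columns from lag n to lead n
offset_matrix <- function(x, n) {
  width  <- 2L * n + 1L
  padded <- c(rep(NA, n), x, rep(NA, n))
  # embed() lists each window newest-first, so reverse the column order
  embed(padded, width)[, width:1, drop = FALSE]
}

test <- c(2, 8, 1, 10, 7, 5, 9, 3, 4, 6)
res <- offset_matrix(test, 3L)
dim(res)
```

The middle column (column n + 1) is the original vector, which gives a quick sanity check.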
This can be solved by constructing the matrix from a long vector and returning only the wanted columns and rows:
test <- c(2, 8, 1, 10, 7, 5, 9, 3, 4, 6)
n_offs <- 3L
n_row <- length(test) + n_offs + 1L
matrix(rep(c(rep(NA, n_offs), test), n_row), nrow = n_row)[1:length(test), 1:(n_offs * 2L + 1L)]
[,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,] NA NA NA 2 8 1 10
[2,] NA NA 2 8 1 10 7
[3,] NA 2 8 1 10 7 5
[4,] 2 8 1 10 7 5 9
[5,] 8 1 10 7 5 9 3
[6,] 1 10 7 5 9 3 4
[7,] 10 7 5 9 3 4 6
[8,] 7 5 9 3 4 6 NA
[9,] 5 9 3 4 6 NA NA
[10,] 9 3 4 6 NA NA NA
A variant which just returns the same result as embed(test, 7)[, 7:1] is:
matrix(rep(test, length(test) + 1L), nrow = length(test) + 1L)[
seq_len(length(test) - 2L * n_offs), seq_len(n_offs * 2L + 1L)]
[,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,] 2 8 1 10 7 5 9
[2,] 8 1 10 7 5 9 3
[3,] 1 10 7 5 9 3 4
[4,] 10 7 5 9 3 4 6
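The reason this works is that the matrix has one more row than the repeated vector has elements, so every new column starts one element later than the previous one. A tiny illustration with a made-up vector v:

```r
v <- 1:4

# 5 rows for a vector of length 4: each column is shifted by one position
shifted <- matrix(rep(v, length(v) + 1L), nrow = length(v) + 1L)
shifted
#>      [,1] [,2] [,3] [,4]
#> [1,]    1    2    3    4
#> [2,]    2    3    4    1
#> [3,]    3    4    1    2
#> [4,]    4    1    2    3
#> [5,]    1    2    3    4
```

The values wrap around at the end of the vector, which is why the answer pads with NA or selects only the rows that contain no wrapped values.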
I want to iterate the following matrix and print sets of 2 cell values. Is there a way to do this without a for-loop?
Input:
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 1 4 7 10 13 16
[2,] 2 5 8 11 14 17
[3,] 3 6 9 12 15 18
Expected Output:
[,1] [,2]
[1,] 1 4
[2,] 7 10
[3,] 13 16
[4,] 2 5
[5,] 8 11
[6,] 14 17
[7,] 3 6
[8,] 9 12
[9,] 15 18
This is my code:
mat<-matrix(data=seq(1:18), nrow=3,ncol=6)
r <- rep(seq(1,3),each=2)
c1 <- seq(1,6,2)
c2 <- seq(2,6,2)
m <- mat[r,c(c1:c2)] # This does not work, it only outputs the first two cells
We can transpose the matrix, then reshape it into a two-column matrix by specifying ncol and byrow = TRUE:
matrix(t(mat), ncol=2, byrow=TRUE)
# [,1] [,2]
# [1,] 1 4
# [2,] 7 10
# [3,] 13 16
# [4,] 2 5
# [5,] 8 11
# [6,] 14 17
# [7,] 3 6
# [8,] 9 12
# [9,] 15 18
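To see why the transpose is needed: R stores matrices column by column, so flattening t(mat) reads the original matrix row by row. A short sketch that splits the step in two (row_major and pairs are illustrative names):

```r
mat <- matrix(1:18, nrow = 3, ncol = 6)

# Flattening the transpose walks through mat one row at a time
row_major <- as.vector(t(mat))
row_major[1:6]

# Refill two values per row, in that same order
pairs <- matrix(row_major, ncol = 2, byrow = TRUE)
```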
I have written a function with 3 nested foreach loops, running in parallel. The goal of the function is to split a list of 30 [10,5] matrices (i.e. [[30]][10,5]) into a list of 5 [10,30] matrices (i.e. [[5]][10,30]).
However, I am trying to run this function with 1,000,000 paths (i.e. foreach (m = 1:1000000)), and obviously, the performance is terrible.
I'd like to avoid apply functions if possible because I've found that they don't work well when used in conjunction with parallel foreach loops:
library(foreach)
library(doParallel)
# input matr: a list of 30 [10,5] matrices
matrix_splitter <- function(matr) {
time_horizon <- 30
paths <- 10
asset <- 5
security_paths <- foreach(i = 1:asset, .combine = rbind, .packages = "doParallel", .export = "daily") %dopar% {
foreach(m = 1:paths, .combine = rbind, .packages = "doParallel", .export = "daily") %dopar% {
foreach(p = daily, .combine = c) %dopar% {
p[m,i]
}
}
}
df_securities <- as.data.frame(security_paths)
split(df_securities, sample(rep(1:paths), asset))
}
Overall, I'm trying to convert this data format:
[[30]]
[,1] [,2] [,3] [,4] [,5]
[1,] 0.2800977 2.06715521 0.9196326 0.3560659 1.36126507
[2,] -0.5119867 0.24329025 0.1513218 -1.2528092 -0.04795098
[3,] -2.0293933 -1.17989270 0.3053376 -0.9528611 0.86758140
[4,] -0.6419024 -0.24846720 -0.6640066 -1.7104961 -0.32759406
[5,] -0.4340359 -0.44034013 3.3440507 0.7380613 2.01237069
[6,] -0.6679914 -0.01332117 1.9286056 -0.7194116 0.15549978
[7,] 0.5919820 0.11616685 -0.8424634 -0.7652715 1.34176688
[8,] 0.8079152 0.40592119 -0.4291811 0.9358829 -0.97479314
[9,] -0.0265207 -0.03598320 1.1287344 0.4732984 1.37792596
[10,] 1.0553966 0.65776721 -1.2833613 -0.2414846 0.81528686
To this format (obviously up to V30):
$`5`
V1 V2 V3 V4 V5 V6 V7
result.2 -0.11822260 1.7712833 1.97737285 -1.6643193 0.4788075 1.2394064 1.4800787
result.7 -1.23251178 0.4267885 -0.07728632 0.3463092 0.8766395 0.6324840 0.5946710
result.2.1 -1.27309457 -0.3128173 -0.79561297 -0.4713307 -0.4344864 0.4688124 -0.5646857
result.7.1 0.51702719 -1.6242650 -2.37976199 -0.1088408 0.4846507 -0.7594376 0.9326529
result.2.2 1.77550390 0.9279155 0.26168402 0.4893835 1.4131326 0.5989508 -0.3434010
result.7.2 -0.01590682 -0.5568578 1.35789122 -0.1385092 -0.4501515 -0.2581724 0.5451699
result.2.3 0.30400225 -1.0245640 -0.05285694 -0.1354228 0.3070331 -0.7618850 1.0330961
result.7.3 -0.08139912 0.4106541 1.40418839 0.2471505 1.2106539 1.3844721 0.4006751
result.2.4 0.94977544 -0.8045054 1.48791211 1.4361686 -0.3789274 -1.9570125 -1.6576634
result.7.4 0.70449194 1.6887800 0.56447340 0.6465640 2.6865388 -0.7367524 0.6242624
V8 V9 V10 V11 V12 V13
result.2 -0.432404728 -1.6225350 0.09855465 0.17371907 0.3081843 0.15148452
result.7 -0.597420706 0.6173004 0.07518596 2.01741406 0.1767152 -0.39219471
result.2.1 0.918408322 -1.6896424 -0.13409626 0.38674224 0.3491750 -1.61083286
result.7.1 2.564057340 -0.7696399 1.06103614 1.38528367 1.1684045 -0.08467871
result.2.2 0.951995816 0.1910284 1.79943500 2.13909498 0.2847664 0.31094568
result.7.2 -0.479349220 -0.2368760 0.04298525 -0.40385960 0.3986555 -1.93499213
result.2.3 -1.382370069 1.0459845 -0.33106323 -0.43362925 0.7045572 -0.30211601
result.7.3 -1.457106442 0.1487447 -2.52392942 -0.02399523 -1.0349746 0.87666365
result.2.4 -0.848879365 0.7521024 0.16790915 0.47112444 0.8886361 -0.12733039
result.7.4 -0.003350467 0.4021858 -1.80031445 -1.42399232 1.0507765 -0.36193846
The package plyr is designed for this kind of problem thanks to alply. The idea is: unlist your list, format it appropriately as an array, and convert this array to a list of matrices using alply.
Example of transforming a list of 2 3x5 matrices into a list of 5 2x3 matrices:
library(plyr)
lst = list(matrix(1:15, ncol=5), matrix(10:24, ncol=5))
alply(array(unlist(lst), c(2,3,5)),3)
#$`1`
# [,1] [,2] [,3]
#[1,] 1 3 5
#[2,] 2 4 6
#$`2`
# [,1] [,2] [,3]
#[1,] 7 9 11
#[2,] 8 10 12
#$`3`
# [,1] [,2] [,3]
#[1,] 13 15 11
#[2,] 14 10 12
#$`4`
# [,1] [,2] [,3]
#[1,] 13 15 17
#[2,] 14 16 18
#$`5`
# [,1] [,2] [,3]
#[1,] 19 21 23
#[2,] 20 22 24
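Note that the third matrix in the output above mixes values from both input matrices (13 to 15 come from the first, 10 to 12 from the second), because array() simply refills the flattened vector in order. If the goal is to gather column k of every input matrix into the k-th output matrix, which is an assumption about what is wanted here, a base R sketch would be to keep the original row and column dimensions and put the list index last:

```r
lst <- list(matrix(1:15, ncol = 5), matrix(10:24, ncol = 5))

# Stack into a 3 x 5 x 2 array: arr[r, k, m] is lst[[m]][r, k]
arr <- array(unlist(lst), dim = c(3, 5, 2))

# One 3 x 2 matrix per original column; column m holds the values from lst[[m]]
by_column <- lapply(seq_len(dim(arr)[2]), function(k) arr[, k, ])

by_column[[3]]
#>      [,1] [,2]
#> [1,]    7   16
#> [2,]    8   17
#> [3,]    9   18
```

Wrapping each element in t() gives the 2x3 orientation if that is preferred.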
I believe you are looking for the answer to this:
Function to split a matrix into sub-matrices in R
You would just use do.call(rbind, matlist) as an input to those functions.
I would convert all of your list into a great big vector, and then re-dimension it.
For my solution, I started with:
[[28]]
[,1] [,2] [,3] [,4] [,5]
[1,] 1 11 21 31 41
[2,] 2 12 22 32 42
[3,] 3 13 23 33 43
[4,] 4 14 24 34 44
[5,] 5 15 25 35 45
[6,] 6 16 26 36 46
[7,] 7 17 27 37 47
[8,] 8 18 28 38 48
[9,] 9 19 29 39 49
[10,] 10 20 30 40 50
Repeated thirty times. This is the variable orig. My code:
flattened.vec <- unlist(orig) #flatten the list of matrices into one big vector
dim(flattened.vec) <-c(10,150) #need to rearrange the vector so the re-shape comes out right
transposed.matrix <- t(flattened.vec) #transposing to make sure right elements go to the right place
new.matrix.list <- split(transposed.matrix,cut(seq_along(transposed.matrix)%%5, 10, labels = FALSE)) #split the big, transposed matrix into 5 vectors of 300 values, one per column of the original matrices
This code gives you 5 vectors. Give each one dimensions c(30, 10) and then use t() on it in a foreach to get 5 10x30 matrices (I would normally use an apply function, and am not familiar with the foreach library). Note that the first vector belongs to column 5 (where the index modulo 5 is 0), followed by columns 1 to 4.
End result for one of the 5 matrices result after doing so:
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16] [,17]
[1,] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[2,] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
[3,] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[4,] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[5,] 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
[6,] 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6
[7,] 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7
[8,] 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8
[9,] 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9
[10,] 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10
[,18] [,19] [,20] [,21] [,22] [,23] [,24] [,25] [,26] [,27] [,28] [,29] [,30]
[1,] 1 1 1 1 1 1 1 1 1 1 1 1 1
[2,] 2 2 2 2 2 2 2 2 2 2 2 2 2
[3,] 3 3 3 3 3 3 3 3 3 3 3 3 3
[4,] 4 4 4 4 4 4 4 4 4 4 4 4 4
[5,] 5 5 5 5 5 5 5 5 5 5 5 5 5
[6,] 6 6 6 6 6 6 6 6 6 6 6 6 6
[7,] 7 7 7 7 7 7 7 7 7 7 7 7 7
[8,] 8 8 8 8 8 8 8 8 8 8 8 8 8
[9,] 9 9 9 9 9 9 9 9 9 9 9 9 9
[10,] 10 10 10 10 10 10 10 10 10 10 10 10 10
Incidentally, this is probably what the plyr package already does on its own (as posted by Colonel Beauvel), just done manually instead of with an external library.
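For comparison, the same regrouping can be sketched without flattening at all, by pulling column a out of every matrix and binding the pieces side by side. The data below are randomly generated stand-ins with the dimensions from the question, and by_asset is an illustrative name.

```r
set.seed(1)
# Hypothetical input: 30 matrices of 10 paths x 5 assets
orig <- replicate(30, matrix(rnorm(50), nrow = 10, ncol = 5), simplify = FALSE)

# For each asset, take its column from every matrix; vapply binds the
# 30 length-10 columns into one 10 x 30 matrix
by_asset <- lapply(seq_len(ncol(orig[[1]])), function(a) {
  vapply(orig, function(m) m[, a], numeric(nrow(orig[[1]])))
})

dim(by_asset[[1]])
```

Here by_asset[[a]][p, t] equals orig[[t]][p, a], so the list is in asset order 1 to 5 with no extra reordering step.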