I have a matrix:
mat <- matrix(c(2,11,3,1,2,4,55,65,12,4,6,6,7,9,3,23,16,77,5,5,7),ncol = 3, byrow = TRUE)
[,1] [,2] [,3]
[1,] 2 11 3
[2,] 1 2 4
[3,] 55 65 12
[4,] 4 6 6
[5,] 7 9 3
[6,] 23 16 77
[7,] 5 5 7
I want to add a column with a row index. The index starts at 1 and repeats the same value from row to row; a row whose row sum is > 100 gets the next value, and the rows following it move on to the value after that.
     Indx [,2] [,3] [,4]
[1,] 1 2 11 3
[2,] 1 1 2 4
[3,] 2 55 65 12
[4,] 3 4 6 6
[5,] 3 7 9 3
[6,] 4 23 16 77
[7,] 5 5 5 7
Using rle:
matRle <- rle(rowSums(mat) > 100)$lengths
cbind(rep(seq(length(matRle)), matRle), mat)
# [,1] [,2] [,3] [,4]
# [1,] 1 2 11 3
# [2,] 1 1 2 4
# [3,] 2 55 65 12
# [4,] 3 4 6 6
# [5,] 3 7 9 3
# [6,] 4 23 16 77
# [7,] 5 5 5 7
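For reference, the rle() call encodes the runs of the rowSums condition; on this data it is:
rle(rowSums(mat) > 100)
# Run Length Encoding
#   lengths: int [1:5] 2 1 2 1 1
#   values : logi [1:5] FALSE TRUE FALSE TRUE FALSE
so rep(seq(length(matRle)), matRle) expands 1:5 into 1 1 2 3 3 4 5.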
A solution using dplyr.
library(dplyr)
mat2 <- mat %>%
  as.data.frame() %>%
  mutate(Indx = cumsum(rowSums(mat) > 100 | lag(rowSums(mat) > 100, default = TRUE))) %>%
  select(Indx, paste0("V", 1:ncol(mat))) %>%
  as.matrix()
mat2
# Indx V1 V2 V3
# [1,] 1 2 11 3
# [2,] 1 1 2 4
# [3,] 2 55 65 12
# [4,] 3 4 6 6
# [5,] 3 7 9 3
# [6,] 4 23 16 77
# [7,] 5 5 5 7
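Here Indx increments whenever the current row or the previous row has a sum above 100 (default = TRUE seeds the first group); for reference, the intermediates are:
rs <- rowSums(mat)
rs > 100 | lag(rs > 100, default = TRUE)
# [1]  TRUE FALSE  TRUE  TRUE FALSE  TRUE  TRUE
cumsum(rs > 100 | lag(rs > 100, default = TRUE))
# [1] 1 1 2 3 3 4 5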
cbind(cumsum(replace(a <- rowSums(mat) > 100, which(a == 1) + 1, 1)) + 1, mat)
[,1] [,2] [,3] [,4]
[1,] 1 2 11 3
[2,] 1 1 2 4
[3,] 2 55 65 12
[4,] 3 4 6 6
[5,] 3 7 9 3
[6,] 4 23 16 77
[7,] 5 5 5 7
What does this do?
First, find the rows whose row sums are greater than 100:
a <- rowSums(mat) > 100
Then the row after every row > 100 should start the next index, so do a replace and a cumsum:
cumsum(replace(a, which(a == 1) + 1, 1))
This count starts from zero, so add 1 at the end.
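Stepping through the intermediates on this data:
a
# [1] FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE
replace(a, which(a == 1) + 1, 1)  # assigning 1 coerces the logical vector to numeric
# [1] 0 0 1 1 0 1 1
cumsum(replace(a, which(a == 1) + 1, 1)) + 1
# [1] 1 1 2 3 3 4 5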
We could do this with rleid from data.table
library(data.table)
cbind(Indx = rleid(rowSums(mat) > 100), mat)
# Indx
#[1,] 1 2 11 3
#[2,] 1 1 2 4
#[3,] 2 55 65 12
#[4,] 3 4 6 6
#[5,] 3 7 9 3
#[6,] 4 23 16 77
#[7,] 5 5 5 7
I have a list of matrices, generated with the code below
a<-c(0,5,0,1,5,1,5,4,6,7)
b<-c(3,1,0,2,4,2,5,5,7,8)
c<-c(5,9,0,1,3,2,5,6,2,7)
d<-c(6,5,0,1,3,4,5,6,7,1)
k<-data.frame(a,b,c,d)
k<-as.matrix(k)
#dimnames(k)<-list(cntry,cntry)
e<-c(0,5,2,2,1,2,3,6,9,2)
f<-c(2,0,4,1,1,3,4,5,1,4)
g<-c(3,3,0,2,0,9,3,2,1,9)
h<-c(6,1,1,1,5,7,8,8,0,2)
l<-data.frame(e,f,g,h)
l<-as.matrix(l)
#dimnames(l)<-list(cntry,cntry)
list<-list(k,l)
names(list)<-2010:2011
list
$`2010`
a b c d
[1,] 0 3 5 6
[2,] 5 1 9 5
[3,] 0 3 2 2
[4,] 1 2 1 1
[5,] 5 4 3 3
[6,] 1 2 2 4
[7,] 5 5 5 5
[8,] 4 5 6 6
[9,] 6 7 2 7
[10,] 7 8 7 1
$`2011`
e f g h
[1,] 0 2 3 6
[2,] 5 0 3 1
[3,] 2 4 0 1
[4,] 2 1 2 1
[5,] 1 1 0 5
[6,] 2 3 9 7
[7,] 3 4 3 8
[8,] 6 5 2 8
[9,] 9 1 1 0
[10,] 2 4 9 2
In each matrix I would like to delete the rows that contain a value smaller than 1, and the deletion should be synchronized across the matrices: when I delete the first row of matrix "2010" (because it contains a value < 1), the first rows of both 2010 and 2011 should be deleted. Likewise, the third row has a value < 1 in its first column, so the third row of every matrix should be deleted, and so on...
The result should look like:
$`2010`
      a b c d
[4,] 1 2 1 1
[6,] 1 2 2 4
[7,] 5 5 5 5
[8,] 4 5 6 6
[10,] 7 8 7 1
$`2011`
e f g h
[4,] 2 1 2 1
[6,] 2 3 9 7
[7,] 3 4 3 8
[8,] 6 5 2 8
[10,] 2 4 9 2
We can use rowSums
lapply(list, function(x) x[!rowSums(x < 1), ])
If we need to remove the same rows across all the matrices, as the question asks:
ind <- Reduce(`&`, lapply(list, function(x) !rowSums(x < 1)))
lapply(list, function(x) x[ind,])
# a b c d
#[1,] 1 2 1 1
#[2,] 1 2 2 4
#[3,] 5 5 5 5
#[4,] 4 5 6 6
#[5,] 7 8 7 1
#$`2011`
# e f g h
#[1,] 2 1 2 1
#[2,] 2 3 9 7
#[3,] 3 4 3 8
#[4,] 6 5 2 8
#[5,] 2 4 9 2
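For reference, the common indicator ind computed above is:
ind
# [1] FALSE FALSE FALSE  TRUE FALSE  TRUE  TRUE  TRUE FALSE  TRUE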
Update
Based on the OP's comments about removing rows that contain a value greater than the standard deviation of each column:
lapply(list, function(x) {
  for(i in seq_len(ncol(x))) x <- x[!rowSums(x > sd(x[, i])), ]
  x
})
# get the union of row indices where at least one element is less than 1
removed <- Reduce(union, lapply(list, function(x) which(rowSums(x < 1) != 0)))
lapply(list, function(x) x[-removed, ])
$`2010`
a b c d
[1,] 1 2 1 1
[2,] 1 2 2 4
[3,] 5 5 5 5
[4,] 4 5 6 6
[5,] 7 8 7 1
$`2011`
e f g h
[1,] 2 1 2 1
[2,] 2 3 9 7
[3,] 3 4 3 8
[4,] 6 5 2 8
[5,] 2 4 9 2
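As a quick check, removed here is the union of the offending row indices from both matrices:
removed
# [1] 1 3 2 5 9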
I have written a function with 3 nested foreach loops, running in parallel. The goal of the function is to split a list of 30 [10,5] matrices (i.e. [[30]][10,5]) into a list of 5 [10,30] matrices (i.e. [[5]][10,30]).
However, I am trying to run this function with 1,000,000 paths (i.e. foreach (m = 1:1000000)), and obviously, the performance is terrible.
I'd like to avoid apply functions if possible because I've found that they don't work well when used in conjunction with parallel foreach loops:
library(foreach)
library(doParallel)
# input matr: a list of 30 [10,5] matrices
matrix_splitter <- function(matr) {
  time_horizon <- 30
  paths <- 10
  asset <- 5
  security_paths <- foreach(i = 1:asset, .combine = rbind, .packages = "doParallel", .export = "daily") %dopar% {
    foreach(m = 1:paths, .combine = rbind, .packages = "doParallel", .export = "daily") %dopar% {
      foreach(p = daily, .combine = c) %dopar% {
        p[m, i]
      }
    }
  }
  df_securities <- as.data.frame(security_paths)
  split(df_securities, sample(rep(1:paths), asset))
}
Overall, I'm trying to convert this data format:
[[30]]
[,1] [,2] [,3] [,4] [,5]
[1,] 0.2800977 2.06715521 0.9196326 0.3560659 1.36126507
[2,] -0.5119867 0.24329025 0.1513218 -1.2528092 -0.04795098
[3,] -2.0293933 -1.17989270 0.3053376 -0.9528611 0.86758140
[4,] -0.6419024 -0.24846720 -0.6640066 -1.7104961 -0.32759406
[5,] -0.4340359 -0.44034013 3.3440507 0.7380613 2.01237069
[6,] -0.6679914 -0.01332117 1.9286056 -0.7194116 0.15549978
[7,] 0.5919820 0.11616685 -0.8424634 -0.7652715 1.34176688
[8,] 0.8079152 0.40592119 -0.4291811 0.9358829 -0.97479314
[9,] -0.0265207 -0.03598320 1.1287344 0.4732984 1.37792596
[10,] 1.0553966 0.65776721 -1.2833613 -0.2414846 0.81528686
To this format (obviously up to V30):
$`5`
V1 V2 V3 V4 V5 V6 V7
result.2 -0.11822260 1.7712833 1.97737285 -1.6643193 0.4788075 1.2394064 1.4800787
result.7 -1.23251178 0.4267885 -0.07728632 0.3463092 0.8766395 0.6324840 0.5946710
result.2.1 -1.27309457 -0.3128173 -0.79561297 -0.4713307 -0.4344864 0.4688124 -0.5646857
result.7.1 0.51702719 -1.6242650 -2.37976199 -0.1088408 0.4846507 -0.7594376 0.9326529
result.2.2 1.77550390 0.9279155 0.26168402 0.4893835 1.4131326 0.5989508 -0.3434010
result.7.2 -0.01590682 -0.5568578 1.35789122 -0.1385092 -0.4501515 -0.2581724 0.5451699
result.2.3 0.30400225 -1.0245640 -0.05285694 -0.1354228 0.3070331 -0.7618850 1.0330961
result.7.3 -0.08139912 0.4106541 1.40418839 0.2471505 1.2106539 1.3844721 0.4006751
result.2.4 0.94977544 -0.8045054 1.48791211 1.4361686 -0.3789274 -1.9570125 -1.6576634
result.7.4 0.70449194 1.6887800 0.56447340 0.6465640 2.6865388 -0.7367524 0.6242624
V8 V9 V10 V11 V12 V13
result.2 -0.432404728 -1.6225350 0.09855465 0.17371907 0.3081843 0.15148452
result.7 -0.597420706 0.6173004 0.07518596 2.01741406 0.1767152 -0.39219471
result.2.1 0.918408322 -1.6896424 -0.13409626 0.38674224 0.3491750 -1.61083286
result.7.1 2.564057340 -0.7696399 1.06103614 1.38528367 1.1684045 -0.08467871
result.2.2 0.951995816 0.1910284 1.79943500 2.13909498 0.2847664 0.31094568
result.7.2 -0.479349220 -0.2368760 0.04298525 -0.40385960 0.3986555 -1.93499213
result.2.3 -1.382370069 1.0459845 -0.33106323 -0.43362925 0.7045572 -0.30211601
result.7.3 -1.457106442 0.1487447 -2.52392942 -0.02399523 -1.0349746 0.87666365
result.2.4 -0.848879365 0.7521024 0.16790915 0.47112444 0.8886361 -0.12733039
result.7.4 -0.003350467 0.4021858 -1.80031445 -1.42399232 1.0507765 -0.36193846
The package plyr is designed for this problem thanks to alply. The idea is: unlist your list, format it in the appropriate way in an array, and convert this array to a list of matrices using alply.
Example transforming a list of two 3x5 matrices into a list of five 2x3 matrices:
library(plyr)
lst = list(matrix(1:15, ncol=5), matrix(10:24, ncol=5))
alply(array(unlist(lst), c(2,3,5)),3)
#$`1`
# [,1] [,2] [,3]
#[1,] 1 3 5
#[2,] 2 4 6
#$`2`
# [,1] [,2] [,3]
#[1,] 7 9 11
#[2,] 8 10 12
#$`3`
# [,1] [,2] [,3]
#[1,] 13 15 11
#[2,] 14 10 12
#$`4`
# [,1] [,2] [,3]
#[1,] 13 15 17
#[2,] 14 16 18
#$`5`
# [,1] [,2] [,3]
#[1,] 19 21 23
#[2,] 20 22 24
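Applied to the dimensions in the question, the same idea would look roughly like this (a sketch, assuming daily is the list of 30 10x5 path-by-asset matrices, as in the question's loops; aperm() swaps the asset and time dimensions before slicing):
library(plyr)
arr <- array(unlist(daily), c(10, 5, 30))  # dims: path x asset x time
res <- alply(aperm(arr, c(1, 3, 2)), 3)    # a list of 5 matrices, each 10x30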
I believe you are looking for the answer to this:
Function to split a matrix into sub-matrices in R
You would just use do.call(rbind, matlist) as an input to those functions.
I would convert all of your list into a great big vector, and then re-dimension it.
For my solution, I started with:
[[28]]
[,1] [,2] [,3] [,4] [,5]
[1,] 1 11 21 31 41
[2,] 2 12 22 32 42
[3,] 3 13 23 33 43
[4,] 4 14 24 34 44
[5,] 5 15 25 35 45
[6,] 6 16 26 36 46
[7,] 7 17 27 37 47
[8,] 8 18 28 38 48
[9,] 9 19 29 39 49
[10,] 10 20 30 40 50
Repeated thirty times. This is the variable orig. My code:
flattened.vec <- unlist(orig) #flatten the list of matrices into one big vector
dim(flattened.vec) <-c(10,150) #need to rearrange the vector so the re-shape comes out right
transposed.matrix <- t(flattened.vec) #transposing to make sure right elements go to the right place
new.matrix.list <- split(transposed.matrix,cut(seq_along(transposed.matrix)%%5, 10, labels = FALSE)) #split the big, transposed matrix into 5 10x30 matrices
This code gives you 5 vectors; set each one's dim to c(10, 30), and apply t() to them in a foreach if you need the transposed 30x10 orientation (I would normally use an apply function, and am not familiar with the foreach library).
End result for one of the 5 matrices after doing so:
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16] [,17]
[1,] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[2,] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
[3,] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[4,] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[5,] 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
[6,] 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6
[7,] 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7
[8,] 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8
[9,] 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9
[10,] 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10
[,18] [,19] [,20] [,21] [,22] [,23] [,24] [,25] [,26] [,27] [,28] [,29] [,30]
[1,] 1 1 1 1 1 1 1 1 1 1 1 1 1
[2,] 2 2 2 2 2 2 2 2 2 2 2 2 2
[3,] 3 3 3 3 3 3 3 3 3 3 3 3 3
[4,] 4 4 4 4 4 4 4 4 4 4 4 4 4
[5,] 5 5 5 5 5 5 5 5 5 5 5 5 5
[6,] 6 6 6 6 6 6 6 6 6 6 6 6 6
[7,] 7 7 7 7 7 7 7 7 7 7 7 7 7
[8,] 8 8 8 8 8 8 8 8 8 8 8 8 8
[9,] 9 9 9 9 9 9 9 9 9 9 9 9 9
[10,] 10 10 10 10 10 10 10 10 10 10 10 10 10
Incidentally, this is probably what the plyr package does on its own already (as posted by Colonel Beauvel); this is just the manual version, without the external library.
I have a table which is
[,1] [,2] [,3] [,4] [,5]
[1,] 1 5 10 0.00040803 0.00255277
[2,] 1 11 3 0.01765470 0.01584580
[3,] 1 6 2 0.15514850 0.15509000
[4,] 1 8 14 0.02100531 0.02572320
[5,] 1 9 4 0.04748648 0.00843252
[6,] 2 5 10 0.00040760 0.06782680
[7,] 2 11 3 0.01765480 0.01584580
[8,] 2 6 2 0.15514810 0.15509000
[9,] 2 8 14 0.02100491 0.02572320
[10,] 2 9 4 0.04748608 0.00843252
[11,] 3 5 10 0.00040760 0.06782680
[12,] 3 11 3 0.01765480 0.01584580
[13,] 3 8 14 0.02100391 0.02572320
[14,] 3 9 4 0.04748508 0.00843252
[15,] 4 5 10 0.00040760 0.06782680
[16,] 4 11 3 0.01765480 0.01584580
[17,] 4 8 14 0.02100391 0.02572320
[18,] 4 9 4 0.04748508 0.00843252
[19,] 5 8 14 0.02100391 0.02572320
[20,] 5 9 4 0.04748508 0.00843252
I want to remove duplicates from this table. However, only columns 2, 3, and 4 matter. Example: rows 1, 6, 11, and 15 are identical if only columns 2, 3, and 4 are observed. Note for column 4: is it possible to treat two values as the same as long as they are within 10e-5 of each other? Then rows 1 and 6 would be considered identical although their values in column 4 differ slightly (within the tolerance I mentioned).
Then it would be great to get an output which would be like:
column 2 value | column 3 value | column 1 value at which the pair was first observed (with the tolerance) (in the example, 1) | column 1 value at which the pair was last observed (with the tolerance) (in the example, 4) | value of column 4 at first appearance (0.00040803 in the example)
This is a way of thinking about it, but I'm not sure it's what you're looking for. The logic should be able to get you started though.
dat <- YOUR DATA SET
dat
V1 V2 V3 V4 V5
1 1 5 10 0.00040803 0.00255277
2 1 11 3 0.01765470 0.01584580
3 1 6 2 0.15514850 0.15509000
4 1 8 14 0.02100531 0.02572320
5 1 9 4 0.04748648 0.00843252
# TRUNCATED
dat <- dat[, c(2, 3, 4)]
dat$V4 <- round(dat$V4, 5)
unique(dat)
V2 V3 V4
1 5 10 0.00041
2 11 3 0.01765
3 6 2 0.15515
4 8 14 0.02101
5 9 4 0.04749
9 8 14 0.02100
You could do something like this:
# read your data
yy <- read.csv('your-data.csv', header=F)
## V1 V2 V3 V4 V5
## 1 1 5 10 0.00040803 0.00255277
## 2 1 11 3 0.01765470 0.01584580
## 3 1 6 2 0.15514850 0.15509000
## 4 1 8 14 0.02100531 0.02572320
# create a logical matrix indicating value is within tolerance
mat.eq.tol <- sapply(yy$V4, function(x) abs(yy$V4-x) < 1E-5)
# minimum index
eq.min <- apply(mat.eq.tol, 1, function(x) min(which(x)))
# maximum index
eq.max <- apply(mat.eq.tol, 1, function(x) max(which(x)))
# combine result
res <- cbind(yy$V2, yy$V3, yy$V1[eq.min], yy$V1[eq.max], yy$V4[eq.min])
## [,1] [,2] [,3] [,4] [,5]
## [1,] 5 10 1 4 0.00040803
## [2,] 11 3 1 4 0.01765470
## [3,] 6 2 1 2 0.15514850
## [4,] 8 14 1 5 0.02100531
## [5,] 9 4 1 5 0.04748648
## [6,] 5 10 1 4 0.00040803
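Since res keeps one row per original row, it can be collapsed to one row per (column 2, column 3) pair, which is closer to the output format the question describes:
res.unique <- res[!duplicated(res[, 1:2]), ]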
Suppose I have a matrix m and a positive integer vector v. What I want is a new matrix m_new in which each row of m (say m[i, ]) is replicated v[i] times. For example:
m = matrix(1:6, nrow = 3)
## [,1] [,2]
## [1,] 1 4
## [2,] 2 5
## [3,] 3 6
v = c(3, 1, 2)
And m_new should be:
[,1] [,2]
[1,] 1 4 # m[1, ] is replicated by
[2,] 1 4 # v[1] = 3
[3,] 1 4 # times
[4,] 2 5
[5,] 3 6
[6,] 3 6
A for loop works for the small case:
m_new = matrix(0, sum(v), ncol(m))
k = 1
for(i in 1:nrow(m)){
  for(j in k:(k+v[i]-1)){
    m_new[j, ] = m[i, ]
  }
  k = k + v[i]
}
But in the real world the number of rows of m is usually big. Is there an efficient way to do this?
m[rep(1:nrow(m), times = v), ]
# [,1] [,2]
# [1,] 1 4
# [2,] 1 4
# [3,] 1 4
# [4,] 2 5
# [5,] 3 6
# [6,] 3 6
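One edge case worth noting: if m has only one column, R drops the matrix dimensions when subsetting, so adding drop = FALSE keeps the result a matrix:
m_new <- m[rep(seq_len(nrow(m)), times = v), , drop = FALSE]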
> m <- matrix(1:25, ncol=5)
> m
[,1] [,2] [,3] [,4] [,5]
[1,] 1 6 11 16 21
[2,] 2 7 12 17 22
[3,] 3 8 13 18 23
[4,] 4 9 14 19 24
[5,] 5 10 15 20 25
> v <- 1:5  # v implied by the output below: row i repeated i times
> apply(m, 2, function(c) rep(c, v))
[,1] [,2] [,3] [,4] [,5]
[1,] 1 6 11 16 21
[2,] 2 7 12 17 22
[3,] 2 7 12 17 22
[4,] 3 8 13 18 23
[5,] 3 8 13 18 23
[6,] 3 8 13 18 23
[7,] 4 9 14 19 24
[8,] 4 9 14 19 24
[9,] 4 9 14 19 24
[10,] 4 9 14 19 24
[11,] 5 10 15 20 25
[12,] 5 10 15 20 25
[13,] 5 10 15 20 25
[14,] 5 10 15 20 25
[15,] 5 10 15 20 25
I have a data.frame with numeric values. How can I replace each row's maximum with 0?
So for example in a row:
10, 1, 3, 4
the output would be
0, 1, 3, 4
I tried:
df1 <- data.frame(df)[apply(df,1,which.max=0)]
but something is wrong.
I would be grateful for your help.
How about
replace_max <- function(x){x[which.max(x)] <- 0;x}
t(apply(df, 1, replace_max))
or
library(plyr)
adply(df, 1, replace_max)
EDIT: apply over rows.
EDIT 2: ensure the result is a data.frame.
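Note that which.max() only zeroes the first maximum in each row. If tied maxima should all become 0, a small variant of the same idea works (a sketch):
replace_all_max <- function(x){x[x == max(x)] <- 0; x}
t(apply(df, 1, replace_all_max))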
Here's how I would do it:
a <-matrix(round(runif(25)*100,0),ncol=5) #create matrix
my.max <-apply(a,1,which.max) #find max position by row
> a
[,1] [,2] [,3] [,4] [,5]
[1,] 62 14 19 64 40
[2,] 74 83 26 95 14
[3,] 32 69 24 12 67
[4,] 100 57 19 3 16
[5,] 41 6 93 85 67
z <-cbind(1:5,my.max) #create coordinates
a[z] <-0 #replace those entries
> a
[,1] [,2] [,3] [,4] [,5]
[1,] 62 14 19 0 40
[2,] 74 83 26 0 14
[3,] 32 0 24 12 67
[4,] 0 57 19 3 16
[5,] 41 6 0 85 67
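The same matrix-indexing trick generalizes beyond the 5-row example by deriving the row sequence from the data:
z <- cbind(seq_len(nrow(a)), my.max)  # (row, column-of-max) coordinate pairs
a[z] <- 0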
Try this:
#Generating a fake dataframe:
df=data.frame(A=c(1:5), B=c(6,111,5,7,10), C=c(11,28,65,7,15) , D=c(21:25))
> df
A B C D
1 1 6 11 21
2 2 111 28 22
3 3 5 65 23
4 4 7 7 24
5 5 10 15 25
n = nrow(df)
for(i in 1:n){
  c1 = as.numeric(which.max(df[i, ]))
  df[i, c1] = 0
}
df #output
A B C D
1 1 6 11 0
2 2 0 28 22
3 3 5 0 23
4 4 7 7 0
5 5 10 15 0
How about:
x <- matrix(sample(1:16),nrow=4)
x
[,1] [,2] [,3] [,4]
[1,] 1 12 6 4
[2,] 16 2 13 15
[3,] 11 8 10 7
[4,] 14 9 5 3
x*as.logical(x-apply(x,1,max))
[,1] [,2] [,3] [,4]
[1,] 1 0 6 4
[2,] 0 2 13 15
[3,] 0 8 10 7
[4,] 0 9 5 3
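Unlike the which.max() answers above, this version zeroes every tied maximum in a row. A quick sketch with a hypothetical tie:
y <- matrix(c(5, 5, 1, 2, 7, 3), nrow = 2, byrow = TRUE)
y * as.logical(y - apply(y, 1, max))
#      [,1] [,2] [,3]
# [1,]    0    0    1
# [2,]    2    0    3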