I have a data.frame with numeric values. How can I replace each row's maximum with 0?
So for example in a row:
10, 1, 3, 4
the output would be
0, 1, 3, 4
I tried:
df1 <- data.frame(df)[apply(df,1,which.max=0)]
but something is wrong with it.
I would be grateful for your help.
How about
replace_max <- function(x){x[which.max(x)] <- 0;x}
t(apply(df, 1, replace_max))
or
library(plyr)
adply(df, 1, replace_max)
EDIT: changed to operate on rows.
EDIT 2: added the adply() variant to ensure a data.frame is returned.
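For example, applied to the row from the question (a quick check; df here is just a one-row data.frame made up for illustration):
df <- data.frame(a = 10, b = 1, c = 3, d = 4)
replace_max(unlist(df))        # 0 1 3 4 -- the row maximum is zeroed
t(apply(df, 1, replace_max))   # same values, but the result is a matrix, hence the adply() variant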
Here's how I would do it:
a <- matrix(round(runif(25)*100, 0), ncol=5)  # create a 5x5 matrix of random values
my.max <- apply(a, 1, which.max)              # position of the maximum in each row
> a
[,1] [,2] [,3] [,4] [,5]
[1,] 62 14 19 64 40
[2,] 74 83 26 95 14
[3,] 32 69 24 12 67
[4,] 100 57 19 3 16
[5,] 41 6 93 85 67
z <- cbind(1:5, my.max)  # (row, column) coordinates of the maxima
a[z] <- 0                # replace those entries
> a
[,1] [,2] [,3] [,4] [,5]
[1,] 62 14 19 0 40
[2,] 74 83 26 0 14
[3,] 32 0 24 12 67
[4,] 0 57 19 3 16
[5,] 41 6 0 85 67
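The same matrix-index trick generalizes to any number of rows if the coordinates are built with seq_len() instead of the hard-coded 1:5, and it works for a data.frame after converting with as.matrix() (a small generalization sketch, not part of the original answer):
z <- cbind(seq_len(nrow(a)), apply(a, 1, which.max))  # (row, column-of-max) pairs
a[z] <- 0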
Try this:
#Generating a fake dataframe:
df=data.frame(A=c(1:5), B=c(6,111,5,7,10), C=c(11,28,65,7,15) , D=c(21:25))
> df
A B C D
1 1 6 11 21
2 2 111 28 22
3 3 5 65 23
4 4 7 7 24
5 5 10 15 25
n=length(rownames(df))
for(i in 1:n){
c1=as.numeric(which.max(df[i,]))
df[i,c1]=0
}
df #output
A B C D
1 1 6 11 0
2 2 0 28 22
3 3 5 0 23
4 4 7 7 0
5 5 10 15 0
How about exploiting the fact that x - apply(x, 1, max) is zero exactly at each row's maximum? as.logical() turns those zeros into FALSE (0) and everything else into TRUE (1), so multiplying by x zeroes out each row's maximum:
x <- matrix(sample(1:16),nrow=4)
x
[,1] [,2] [,3] [,4]
[1,] 1 12 6 4
[2,] 16 2 13 15
[3,] 11 8 10 7
[4,] 14 9 5 3
x*as.logical(x-apply(x,1,max))
[,1] [,2] [,3] [,4]
[1,] 1 0 6 4
[2,] 0 2 13 15
[3,] 0 8 10 7
[4,] 0 9 5 3
Imagine that you have these variables:
> a <- list(matrix(1:25, 5, 5, byrow = TRUE), matrix(31:55, 5, 5, byrow = TRUE))
> b <- list(rep(1, 5), rep(2, 5))
> a
[[1]]
[,1] [,2] [,3] [,4] [,5]
[1,] 1 2 3 4 5
[2,] 6 7 8 9 10
[3,] 11 12 13 14 15
[4,] 16 17 18 19 20
[5,] 21 22 23 24 25
[[2]]
[,1] [,2] [,3] [,4] [,5]
[1,] 31 32 33 34 35
[2,] 36 37 38 39 40
[3,] 41 42 43 44 45
[4,] 46 47 48 49 50
[5,] 51 52 53 54 55
> b
[[1]]
[1] 1 1 1 1 1
[[2]]
[1] 2 2 2 2 2
I want to end up with something like this:
[,1] [,2] [,3] [,4] [,5]
[1,] 1 1 1 1 1
[2,] 1 2 3 4 5
[3,] 6 7 8 9 10
[4,] 11 12 13 14 15
[5,] 16 17 18 19 20
[6,] 21 22 23 24 25
[,1] [,2] [,3] [,4] [,5]
[1,] 2 2 2 2 2
[2,] 31 32 33 34 35
[3,] 36 37 38 39 40
[4,] 41 42 43 44 45
[5,] 46 47 48 49 50
[6,] 51 52 53 54 55
So it is like having a Python-style zip function and then applying rbind.
Any idea?
An option is Map from base R
Map(rbind, b, a)
Or you can try:
lapply(seq_along(a), function(i) rbind(b[[i]], a[[i]]))
Assuming length(a) == length(b)
One option is to use the purrr package.
library(purrr)
map2(b, a, rbind)
I have a matrix:
mat <- matrix(c(2,11,3,1,2,4,55,65,12,4,6,6,7,9,3,23,16,77,5,5,7),ncol = 3, byrow = TRUE)
[,1] [,2] [,3]
[1,] 2 11 3
[2,] 1 2 4
[3,] 55 65 12
[4,] 4 6 6
[5,] 7 9 3
[6,] 23 16 77
[7,] 5 5 7
I want to add a column with a row index. The index starts at 1 and repeats the same value until it reaches a row whose row sum is > 100; after that row it moves on to the next value.
     Indx [,2] [,3] [,4]
[1,] 1 2 11 3
[2,] 1 1 2 4
[3,] 2 55 65 12
[4,] 3 4 6 6
[5,] 3 7 9 3
[6,] 4 23 16 77
[7,] 5 5 5 7
Using rle:
matRle <- rle(rowSums(mat) > 100)$lengths
cbind(rep(seq(length(matRle)), matRle), mat)
# [,1] [,2] [,3] [,4]
# [1,] 1 2 11 3
# [2,] 1 1 2 4
# [3,] 2 55 65 12
# [4,] 3 4 6 6
# [5,] 3 7 9 3
# [6,] 4 23 16 77
# [7,] 5 5 5 7
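To see how this builds the index, here are the intermediate values for the example matrix (shown for illustration); rle() splits the logical vector into runs, and rep() then numbers the rows run by run:
rowSums(mat) > 100
# [1] FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE
matRle
# [1] 2 1 2 1 1
rep(seq(length(matRle)), matRle)
# [1] 1 1 2 3 3 4 5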
A solution using dplyr.
library(dplyr)
mat2 <- mat %>%
as.data.frame() %>%
mutate(Indx = cumsum(rowSums(mat) > 100 | lag(rowSums(mat) > 100, default = TRUE))) %>%
select(Indx, paste0("V", 1:ncol(mat))) %>%
as.matrix()
mat2
# Indx V1 V2 V3
# [1,] 1 2 11 3
# [2,] 1 1 2 4
# [3,] 2 55 65 12
# [4,] 3 4 6 6
# [5,] 3 7 9 3
# [6,] 4 23 16 77
# [7,] 5 5 5 7
cbind(cumsum(replace(a<-rowSums(mat)>100,which(a==1)+1,1))+1,mat)
[,1] [,2] [,3] [,4]
[1,] 1 2 11 3
[2,] 1 1 2 4
[3,] 2 55 65 12
[4,] 3 4 6 6
[5,] 3 7 9 3
[6,] 4 23 16 77
[7,] 5 5 5 7
What does this do?
First, obtain which row sums are greater than 100:
a <- rowSums(mat) > 100
Then the row after every row > 100 should get the next index, so do a replace and a cumsum:
cumsum(replace(a, which(a == 1) + 1, 1))
This starts from zero, so add 1.
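For the example matrix, the intermediate values are (shown step by step for illustration):
a
# [1] FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE
replace(a, which(a == 1) + 1, 1)
# [1] 0 0 1 1 0 1 1
cumsum(replace(a, which(a == 1) + 1, 1)) + 1
# [1] 1 1 2 3 3 4 5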
We could do this with rleid from data.table
library(data.table)
cbind(Indx = rleid(rowSums(mat) > 100), mat)
# Indx
#[1,] 1 2 11 3
#[2,] 1 1 2 4
#[3,] 2 55 65 12
#[4,] 3 4 6 6
#[5,] 3 7 9 3
#[6,] 4 23 16 77
#[7,] 5 5 5 7
I have a large matrix/data.frame (2500x20) and need, by column, a rolling deviation: the maximum minus the minimum of all previous cells, excluding the current one.
I apply my function by column. The function shifts the whole column (as a vector) by 1:length, producing a list of vectors, which I turn into a matrix so I can apply max(x) - min(x). This works for a small matrix but runs far too long for the size I need.
The (small) source data:
[,1] [,2] [,3] [,4] [,5]
[1,] 55 9 99 0 NA
[2,] 54 7 98 1 NA
[3,] 56 12 97 2 NA
[4,] 53 8 96 3 1
[5,] 57 22 95 4 0
[6,] 52 51 94 5 -1
[7,] 58 6 93 6 NA
[8,] 51 6 93 7 0
[9,] 59 51 92 8 2
[10,] 50 78 91 9 NA
[11,] 60 12 90 10 NA
[12,] 49 5 89 11 -2
Expected outcome:
[,1] [,2] [,3] [,4] [,5]
[1,] 0 0 0 0 0
[2,] 0 0 0 0 0
[3,] 1 2 1 1 0
[4,] 2 5 2 2 0
[5,] 3 5 3 3 0
[6,] 4 15 4 4 1
[7,] 5 44 5 5 2
[8,] 6 45 6 6 2
[9,] 7 45 6 7 2
[10,] 8 45 7 8 3
[11,] 9 72 8 9 3
[12,] 10 72 9 10 3
The first result-row will always be 0 because it has no previous rows it could calculate from.
The second result-row will always be zero because its single previous row is both the maximum and the minimum.
The last source-row will never influence the result.
What I have so far:
library(data.table)
mytest <- matrix(c(
55, 9,99, 0,NA,
54, 7,98, 1,NA,
56,12,97, 2,NA,
53, 8,96, 3, 1,
57,22,95, 4, 0,
52,51,94, 5,-1,
58, 6,93, 6,NA,
51, 6,93, 7, 0,
59,51,92, 8, 2,
50,78,91, 9,NA,
60,12,90,10,NA,
49, 5,89,11,-2
), ncol=5, byrow = TRUE)
rolling_deviation <- function (a_column){
tmp1 = shift(a_column, 1:(length(a_column)-1), NA, "lag")
tmp2 = matrix(unlist(tmp1), ncol = length(a_column), byrow = TRUE)
apply(tmp2,2,function(x){
x = x[!is.na(x)]
ifelse(length(x)==0, 0, max(x) - min(x))
})
}
apply(mytest,2,rolling_deviation)
I need this to run faster; there will be more rows as observations accumulate.
How about this?
> system.time(outcome<- apply(mytest,2,rolling_deviation))
user system elapsed
0.014 0.002 0.038
> system.time(outcome1<- setDT(data.frame(mytest))[, lapply(.SD, rolling_deviation)])
user system elapsed
0.002 0.000 0.002
The results are the same:
> outcome
[,1] [,2] [,3] [,4] [,5]
[1,] 0 0 0 0 0
[2,] 0 0 0 0 0
[3,] 1 2 1 1 0
[4,] 2 5 2 2 0
[5,] 3 5 3 3 0
[6,] 4 15 4 4 1
[7,] 5 44 5 5 2
[8,] 6 45 6 6 2
[9,] 7 45 6 7 2
[10,] 8 45 7 8 3
[11,] 9 72 8 9 3
[12,] 10 72 9 10 3
> outcome1
X1 X2 X3 X4 X5
1: 0 0 0 0 0
2: 0 0 0 0 0
3: 1 2 1 1 0
4: 2 5 2 2 0
5: 3 5 3 3 0
6: 4 15 4 4 1
7: 5 44 5 5 2
8: 6 45 6 6 2
9: 7 45 6 7 2
10: 8 45 7 8 3
11: 9 72 8 9 3
12: 10 72 9 10 3
I solved my problem with a nested loop. I was taught that resorting to loops is almost always bad for performance, but I can't find a better way. I need a helper function for the min/max operations because NA arguments are likely and are not an error for my purposes.
rolling_range <- function(in_mat){
ignore_na = function(op, a,b){
if(is.na(a)){
return(b)
}else if(is.na(b)){
return(a)
}else{
return(op(a,b))
}
}
out_mat.min = matrix(NA, nrow = nrow(in_mat), ncol = ncol(in_mat))
out_mat.max = matrix(NA, nrow = nrow(in_mat), ncol = ncol(in_mat))
out_mat.result = matrix(0, nrow = nrow(in_mat), ncol = ncol(in_mat))
for(m in 1:ncol(in_mat)){
for(n in 2:nrow(in_mat)){
# extend the running min/max with the value from the previous row
out_mat.min[n,m] = ignore_na(`min`, out_mat.min[(n-1),m], in_mat[(n-1),m])
out_mat.max[n,m] = ignore_na(`max`, out_mat.max[(n-1),m], in_mat[(n-1),m])
# range of all previous values; stays 0 while everything so far is NA
out_mat.result[n,m] = ifelse(is.na(out_mat.min[n,m]) || is.na(out_mat.max[n,m]), 0, out_mat.max[n,m] - out_mat.min[n,m])
}
}
return(out_mat.result)
}
rolling_range(mytest)
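If this still is not fast enough, the same running min/max idea can be vectorized per column with cummax()/cummin(). This is only a sketch added for comparison (it is not from the original answers); it keeps the convention that rows with no non-NA history give 0:
rolling_range_cum <- function(in_mat){
  apply(in_mat, 2, function(col){
    prev <- col[-length(col)]                            # only earlier rows matter
    run_max <- cummax(ifelse(is.na(prev), -Inf, prev))   # running max, treating NA as -Inf
    run_min <- cummin(ifelse(is.na(prev),  Inf, prev))   # running min, treating NA as +Inf
    d <- run_max - run_min
    d[!is.finite(d)] <- 0                                # all-NA history so far -> 0
    c(0, d)                                              # first row has no history
  })
}
rolling_range_cum(mytest)   # same values as rolling_range(mytest) above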
How could I build a function that extracts the diagonal block matrices of a larger one? The problem is as follows: the function takes a centred matrix as its argument, computes the full error covariance matrix, and extracts the blocks on the leading diagonal. I tried the following, but it is not working.
err_cov <- function(x){
m <- nrow(x)
n <- ncol(x)
#compute the full error covariance matrix as the inner product
#of vec(x) and its transpose. Note that, omega is a mnxmn matrix
vec <- as.vector(x)
omega <- vec%*%t(vec)
sigmas <- list()
for(i in 0:n-1){
#here the blocks have to be m nxn matrices along the
#leading diagonal
for (j in 1:m)
sigmas[[j]] <- omega[(n*i+1):n*(i+1), (n*i+1):n*(i+1)]
}
return(sigmas)
}
So, for instance for
A
[,1] [,2] [,3] [,4]
[1,] 1 4 7 10
[2,] 2 5 8 11
[3,] 3 6 9 12
> B<-as.vector(A)
> B
[1] 1 2 3 4 5 6 7 8 9 10 11 12
> C<-B%*%t(B)
> C
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
[1,] 1 2 3 4 5 6 7 8 9 10 11 12
[2,] 2 4 6 8 10 12 14 16 18 20 22 24
[3,] 3 6 9 12 15 18 21 24 27 30 33 36
[4,] 4 8 12 16 20 24 28 32 36 40 44 48
[5,] 5 10 15 20 25 30 35 40 45 50 55 60
[6,] 6 12 18 24 30 36 42 48 54 60 66 72
[7,] 7 14 21 28 35 42 49 56 63 70 77 84
[8,] 8 16 24 32 40 48 56 64 72 80 88 96
[9,] 9 18 27 36 45 54 63 72 81 90 99 108
[10,] 10 20 30 40 50 60 70 80 90 100 110 120
[11,] 11 22 33 44 55 66 77 88 99 110 121 132
[12,] 12 24 36 48 60 72 84 96 108 120 132 144
The function should return:
> C1
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 2 4 6
[3,] 3 6 9
> C2
[,1] [,2] [,3]
[1,] 16 20 24
[2,] 20 25 30
[3,] 24 30 36
> C3
[,1] [,2] [,3]
[1,] 49 56 63
[2,] 56 64 72
[3,] 63 72 81
> C4
[,1] [,2] [,3]
[1,] 100 110 120
[2,] 110 121 132
[3,] 120 132 144
Thanks for answering.
I think a clearer solution is to reset the dimensions and then let R do the index calculations for you:
err_cov <- function(x){
m <- nrow(x)
n <- ncol(x)
#compute the full error covariance matrix as the inner product
#of vec(x) and its transpose
vec <- as.vector(x)
omega <- tcrossprod(vec)
dim(omega) <- c(m,n,m,n)   # vec is column-major, so each index of omega splits into a (row, column) pair of x
sigmas <- list()
for (j in 1:n)
sigmas[[j]] <- omega[,j,,j]
return(sigmas)
}
Here is an example:
> x
[,1] [,2]
[1,] 1 4
[2,] 2 5
[3,] 3 6
> vec <- as.vector(x)
> tcrossprod(vec)
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 1 2 3 4 5 6
[2,] 2 4 6 8 10 12
[3,] 3 6 9 12 15 18
[4,] 4 8 12 16 20 24
[5,] 5 10 15 20 25 30
[6,] 6 12 18 24 30 36
> err_cov(x)
[[1]]
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 2 4 6
[3,] 3 6 9
[[2]]
[,1] [,2] [,3]
[1,] 16 20 24
[2,] 20 25 30
[3,] 24 30 36
Suppose I have a matrix m and a positive integer vector v. What I want is a new matrix m_new in which each row of m (say m[i, ]) is replicated v[i] times. For example:
m = matrix(1:6, nrow = 3)
## [,1] [,2]
## [1,] 1 4
## [2,] 2 5
## [3,] 3 6
v = c(3, 1, 2)
And m_new should be:
[,1] [,2]
[1,] 1 4 # m[1, ] is replicated by
[2,] 1 4 # v[1] = 3
[3,] 1 4 # times
[4,] 2 5
[5,] 3 6
[6,] 3 6
A for loop works for the small case:
m_new = matrix(0, sum(v), ncol(m))
k = 1
for(i in 1:nrow(m)){
for(j in k:(k+v[i]-1)){
m_new[j, ] = m[i, ]
}
k = k + v[i]
}
However, the number of rows of m in the real world is usually large. Is there an efficient way to do this?
m[rep(1:nrow(m), times = v), ]
# [,1] [,2]
# [1,] 1 4
# [2,] 1 4
# [3,] 1 4
# [4,] 2 5
# [5,] 3 6
# [6,] 3 6
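The same one-line indexing idiom carries over to a data.frame, and note that rep() with a vector times argument requires length(v) == nrow(m) (a small sketch for illustration):
df <- as.data.frame(m)
df[rep(seq_len(nrow(df)), times = v), ]   # duplicated rows get suffixed row names like "1.1", "1.2"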
> v <- 1:5   # note: this answer uses its own v, one value per row of the 5-row m below
> m <- matrix(1:25, ncol=5)
> m
[,1] [,2] [,3] [,4] [,5]
[1,] 1 6 11 16 21
[2,] 2 7 12 17 22
[3,] 3 8 13 18 23
[4,] 4 9 14 19 24
[5,] 5 10 15 20 25
> apply(m, 2, function(c) rep(c,v))
[,1] [,2] [,3] [,4] [,5]
[1,] 1 6 11 16 21
[2,] 2 7 12 17 22
[3,] 2 7 12 17 22
[4,] 3 8 13 18 23
[5,] 3 8 13 18 23
[6,] 3 8 13 18 23
[7,] 4 9 14 19 24
[8,] 4 9 14 19 24
[9,] 4 9 14 19 24
[10,] 4 9 14 19 24
[11,] 5 10 15 20 25
[12,] 5 10 15 20 25
[13,] 5 10 15 20 25
[14,] 5 10 15 20 25
[15,] 5 10 15 20 25