I have this matrix:
mat=matrix(c(1,1,1,2,2,2,3,4,
4,4,4,4,4,3,5,6,
3,3,5,5,6,8,0,9,
1,1,1,1,1,4,5,6),nrow=4,byrow=TRUE)
print(mat)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] 1 1 1 2 2 2 3 4
[2,] 4 4 4 4 4 3 5 6
[3,] 3 3 5 5 6 8 0 9
[4,] 1 1 1 1 1 4 5 6
and a subset with the index of the row I want to apply my function:
subset=c(2,4)
I would like to add a new column in the matrix "mat" which contains, only for the subset I specified, the value of the object with the max frequency in the row.
In this case:
for row number 1, I would like to have an empty cell in the new column,
for row number 2, I would like to have the value "4" in the new column,
for row number 3, I would like to have an empty cell in the new column,
for row number 4, I would like to have the value "1" in the new column.
EDIT:
Thanks for the code in the answer!
Now I need to replace the NA values with other values.
I have another matrix:
mat2=matrix(c(24,1,3,2, 4,4,4,4, 3,2,2,5, 1,3,5,1),nrow=4,byrow=TRUE)
[,1] [,2] [,3] [,4]
[1,] 24 1 3 2
[2,] 4 4 4 4
[3,] 3 2 2 5
[4,] 1 3 5 1
and the subset:
subset=c(1,3)
I want to replace the NA values in the matrix (the rows left out of the first subset) with the column index of the maximum value in the corresponding row of mat2.
In this case, I will have "1" for the first row and "4" for the third one.
You are looking for the mode. Unfortunately, R doesn't provide a built-in function for the statistical mode (mode() returns an object's storage mode instead). But it is not too hard to write your own:
## create mode function
modeValue <- function(x) {
ux <- unique(x)
ux[which.max(tabulate(match(x, ux)))]
}
## add new column with NA
smat <- cbind(mat, NA)
## calculate mode for subset
smat[subset, ncol(smat)] <- apply(smat[subset, , drop=FALSE], 1, modeValue)
smat
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
# [1,] 1 1 1 2 2 2 3 4 NA
# [2,] 4 4 4 4 4 3 5 6 4
# [3,] 3 3 5 5 6 8 0 9 NA
# [4,] 1 1 1 1 1 4 5 6 1
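If you need this more than once, the steps above can be wrapped up as a small helper. This is only a sketch: addModeColumn and its argument names are made up for illustration, and it computes the mode on the original columns only (not on the freshly added NA column).

```r
# Mode helper from the answer above
modeValue <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# Hypothetical wrapper (name and arguments are illustrative only)
addModeColumn <- function(m, rows) {
  out <- cbind(m, NA)                      # new last column, all NA
  out[rows, ncol(out)] <- apply(m[rows, , drop = FALSE], 1, modeValue)
  out
}

mat <- matrix(c(1,1,1,2,2,2,3,4,
                4,4,4,4,4,3,5,6,
                3,3,5,5,6,8,0,9,
                1,1,1,1,1,4,5,6), nrow = 4, byrow = TRUE)
res <- addModeColumn(mat, c(2, 4))
res[, 9]
# [1] NA  4 NA  1
```

The drop = FALSE keeps the subset a matrix even when only one row is requested, so apply() still sees rows.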
Here is a function that will work. It calculates these values (the modes) for all rows, then substitutes NA for the rows outside the subset:
myFunc <- function(x, myRows) {
  # mode of each row of x
  myModes <- apply(x, 1, FUN=function(i) {
    temp <- table(i)
    as.numeric(names(temp)[which.max(temp)])
  })
  # blank out the rows that were not requested
  myModes[setdiff(seq.int(nrow(x)), myRows)] <- NA
  myModes
}
For the example, this returns
myFunc(mat, c(2,4))
[1] NA 4 NA 1
To add this to your matrix, just use cbind:
cbind(mat, myFunc(mat, c(2,4)))
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
[1,] 1 1 1 2 2 2 3 4 NA
[2,] 4 4 4 4 4 3 5 6 4
[3,] 3 3 5 5 6 8 0 9 NA
[4,] 1 1 1 1 1 4 5 6 1
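One thing worth knowing is that the two mode approaches in the answers above can disagree when a row has ties. A small sketch (the function names modeFirstSeen and modeSmallest are mine, just to tell the two apart):

```r
# unique()/tabulate() version: on a tie, returns the value that appears first
modeFirstSeen <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# table() version: table() sorts its names, so on a tie the smallest value wins
modeSmallest <- function(x) {
  tab <- table(x)
  as.numeric(names(tab)[which.max(tab)])
}

tied <- c(3, 3, 2, 2)   # 3 and 2 both occur twice
modeFirstSeen(tied)
# [1] 3
modeSmallest(tied)
# [1] 2
```

For the rows in the question both versions agree, so this only matters if your real data has ties and you care which value is reported.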
Related
Given a vector, 1:4, and a sequence length, 2, I would like to separate the vector into 'sub-vectors', each with a length of 2, and generate a matrix of all possible combinations of these sub-vectors.
Output would look like this:
[,1] [,2] [,3] [,4]
[1,] 1 2 3 4
[2,] 3 4 1 2
Another example. With vector 1:8 and sub-vector length of 4, output would look like this:
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] 1 2 3 4 5 6 7 8
[2,] 5 6 7 8 1 2 3 4
With a vector 1:9 and sub-vector length of 3, output would look like this:
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
[1,] 1 2 3 4 5 6 7 8 9
[2,] 1 2 3 7 8 9 4 5 6
[3,] 4 5 6 1 2 3 7 8 9
[4,] 4 5 6 7 8 9 1 2 3
[5,] 7 8 9 4 5 6 1 2 3
[6,] 7 8 9 1 2 3 4 5 6
It's a given that the vector length must be divisible by the sub-vector length.
I can answer the whole question, but it will take a bit longer. This should give you the flavour of the answer.
The package combinat has a function called permn which gives you all the permutations of a vector. You want this, but not quite. What you need is the permutations of the blocks. So in your first example you have two blocks of length two, and in your 1:9 example you have three blocks of length three. If we look at the first, and think about ordering the blocks:
> library(combinat)
> numBlocks = 2
> permn(1:numBlocks)
[[1]]
[1] 1 2
[[2]]
[1] 2 1
So I hope you can see that the first permutation would take the blocks b1 = c(1,2), and b2 = c(3,4) and order them c(b1,b2), and the second would order them c(b2,b1).
Equally if you had three blocks, b1 = 1:3; b2 = 4:6; b3 = 7:9 then
permn(1:3)
[[1]]
[1] 1 2 3
[[2]]
[1] 1 3 2
[[3]]
[1] 3 1 2
[[4]]
[1] 3 2 1
[[5]]
[1] 2 3 1
[[6]]
[1] 2 1 3
gives you the ordering of these blocks. The more general solution is figuring out how to move the blocks around, but that isn't too hard.
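To give an idea of how the blocks can be moved around, here is a rough base R sketch. It does not use combinat: allPerms is a small hypothetical helper I wrote for illustration, and blockPerms is likewise an invented name. The row order it produces is not guaranteed to match the order shown in the question.

```r
# Hypothetical helper: all permutations of v, one per row (base R only)
allPerms <- function(v) {
  if (length(v) <= 1) return(matrix(v, nrow = 1))
  out <- NULL
  for (i in seq_along(v)) {
    # fix v[i] in front, permute the rest
    out <- rbind(out, cbind(v[i], allPerms(v[-i])))
  }
  out
}

# Hypothetical function: permute blocks of length blockLength within v
blockPerms <- function(v, blockLength) {
  stopifnot(length(v) %% blockLength == 0)
  numBlocks <- length(v) / blockLength
  blocks <- matrix(v, ncol = blockLength, byrow = TRUE)  # one block per row
  ord <- allPerms(seq_len(numBlocks))                    # block orderings
  # for each ordering, stack the blocks in that order and flatten to a row
  t(apply(ord, 1, function(p) as.vector(t(blocks[p, , drop = FALSE]))))
}

blockPerms(1:4, 2)
#      [,1] [,2] [,3] [,4]
# [1,]    1    2    3    4
# [2,]    3    4    1    2
```

The recursive helper is fine for a handful of blocks, but the number of rows grows as factorial(numBlocks), so it gets large quickly.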
Update: Using my multicool package. Note that cool-lex ordering isn't the order you'd come up with by yourself.
library(multicool)
combs = function(v, blockLength){
  if(length(v) %% blockLength != 0){
    stop("vector length must be divisible by blockLength")
  }
  numBlocks = length(v) / blockLength
  # one block per row
  blockWise = matrix(v, ncol = blockLength, byrow = TRUE)
  m = initMC(1:numBlocks)
  Perms = allPerm(m)
  # reorder the blocks for each permutation and flatten back to a row
  t(apply(Perms, 1, function(p) as.vector(t(blockWise[p, ]))))
}
> combs(1:4, 2)
[,1] [,2] [,3] [,4]
[1,] 3 4 1 2
[2,] 1 2 3 4
> combs(1:9, 3)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
[1,] 7 8 9 4 5 6 1 2 3
[2,] 1 2 3 7 8 9 4 5 6
[3,] 7 8 9 1 2 3 4 5 6
[4,] 4 5 6 7 8 9 1 2 3
[5,] 1 2 3 4 5 6 7 8 9
[6,] 4 5 6 1 2 3 7 8 9
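If the row order matters to you, one option is to sort the rows afterwards. A minimal sketch in base R, where res is just the combs(1:9, 3) output above typed in by hand so the example runs without multicool:

```r
# The six rows printed above, entered manually
res <- rbind(c(7, 8, 9, 4, 5, 6, 1, 2, 3),
             c(1, 2, 3, 7, 8, 9, 4, 5, 6),
             c(7, 8, 9, 1, 2, 3, 4, 5, 6),
             c(4, 5, 6, 7, 8, 9, 1, 2, 3),
             1:9,
             c(4, 5, 6, 1, 2, 3, 7, 8, 9))

# Sort rows by column 1, then column 2, and so on
sorted <- res[do.call(order, as.data.frame(res)), ]
sorted[1, ]
# [1] 1 2 3 4 5 6 7 8 9
```

as.data.frame() turns each column into a list element, so do.call(order, ...) uses every column in turn as a tie-breaker.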
The title with the following example should be self-explanatory:
m = unique(replicate(5, sample(1:5, 5, rep=F)), MARGIN = 2)
m
[,1] [,2] [,3] [,4] [,5]
[1,] 1 5 1 4 3
[2,] 5 1 5 1 2
[3,] 4 3 3 3 1
[4,] 3 4 4 5 5
[5,] 2 2 2 2 4
But what I want is instead:
[,1] [,2] [,3] [,4] [,5]
[1,] 1 1 3 4 5
[2,] 5 5 2 1 1
[3,] 3 4 1 3 3
[4,] 4 3 5 5 4
[5,] 2 2 4 2 2
Ideally, I would like to find a method that allows the same process to be carried out when the column vectors are words (alphabetic order).
I tried things like m[ , sort(m)] but nothing did the trick...
m[, order(m[1, ])] will order the columns by the first row. m[, order(m[1, ], m[2, ])] will order by the first row, using the second row as a tie-breaker. Getting fancy, m[, do.call(order, split(m, row(m)))] will order the columns by the first row, using all subsequent rows as tie-breakers. This will work on character data just as well as numeric.
set.seed(47)
m = replicate(5, sample(1:5, 5, rep=F))
m
# [,1] [,2] [,3] [,4] [,5]
# [1,] 5 4 1 5 1
# [2,] 2 2 3 2 3
# [3,] 3 5 5 1 2
# [4,] 4 3 2 3 5
# [5,] 1 1 4 4 4
m[, do.call(order, split(m, row(m)))]
# [,1] [,2] [,3] [,4] [,5]
# [1,] 1 1 4 5 5
# [2,] 3 3 2 2 2
# [3,] 2 5 5 1 3
# [4,] 5 2 3 3 4
# [5,] 4 4 1 4 1
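Since the question also asked about words, here is a small sketch of the same idiom on a character matrix. The matrix w is made-up example data, and the exact collation of strings can depend on your locale (these lowercase words should sort the same way everywhere).

```r
# Made-up character data: two columns tie on the first row ("pear")
w <- matrix(c("pear", "apple", "pear",
              "kiwi", "plum",  "fig"), nrow = 2, byrow = TRUE)

# Order columns by row 1, breaking ties with row 2
sortedW <- w[, do.call(order, split(w, row(w)))]
sortedW
#      [,1]    [,2]   [,3]
# [1,] "apple" "pear" "pear"
# [2,] "plum"  "fig"  "kiwi"
```

The two "pear" columns end up ordered by their second-row entries, "fig" before "kiwi".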
Let's say I have the below matrix:
[,1] [,2]
[1,] 1 2
[2,] 3 4
[3,] 5 6
I want to generate a matrix which is the concatenation (by column) of matrices that are generated by repetition of each column k times. For example, when k=3, below is what I want to get:
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 1 1 1 2 2 2
[2,] 3 3 3 4 4 4
[3,] 5 5 5 6 6 6
How can I do that without a for loop?
You can do this with column indexing. A convenient way to repeat each column number the correct number of times is the rep function:
mat[,rep(seq_len(ncol(mat)), each=3)]
# [,1] [,2] [,3] [,4] [,5] [,6]
# [1,] 1 1 1 2 2 2
# [2,] 3 3 3 4 4 4
# [3,] 5 5 5 6 6 6
In the above expression, seq_len(ncol(mat)) is the sequence from 1 through the number of columns in the matrix (you could think of it like 1:ncol(mat), except it deals nicely with some special cases like 0-column matrices).
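To make the special case concrete, here is a short sketch with a 0-column matrix (empty is just an illustrative name). It shows why 1:ncol(mat) can bite you where seq_len(ncol(mat)) does not.

```r
# A matrix with 3 rows and no columns
empty <- matrix(numeric(0), nrow = 3, ncol = 0)

seq_len(ncol(empty))   # empty index vector
# integer(0)
1:ncol(empty)          # counts down from 1 to 0 instead
# [1] 1 0

# seq_len() version: returns a 3 x 0 matrix
res <- empty[, rep(seq_len(ncol(empty)), each = 3), drop = FALSE]
dim(res)
# [1] 3 0

# 1:ncol() version: tries to select column 1, which does not exist
failed <- inherits(try(empty[, 1:ncol(empty)], silent = TRUE), "try-error")
failed
# [1] TRUE
```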
Data:
(mat <- matrix(1:6, nrow=3, byrow = TRUE))
# [,1] [,2]
# [1,] 1 2
# [2,] 3 4
# [3,] 5 6
We can repeat each element of the matrix k times and fit the resulting vector into a matrix with k times as many columns as the original one.
k <- 3
matrix(rep(t(mat), each = k), ncol = ncol(mat) * k, byrow = TRUE)
# [,1] [,2] [,3] [,4] [,5] [,6]
#[1,] 1 1 1 2 2 2
#[2,] 3 3 3 4 4 4
#[3,] 5 5 5 6 6 6
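For numeric matrices there is also a different technique that neither answer above uses: a Kronecker product with a 1 x k row of ones. This is just a sketch of that alternative, and it will not work for character data.

```r
mat <- matrix(1:6, nrow = 3, byrow = TRUE)
k <- 3

# Each element of mat is multiplied by the row (1, 1, 1), i.e. repeated k times
viaKronecker <- kronecker(mat, matrix(1, nrow = 1, ncol = k))

# Column-indexing approach, for comparison
viaIndexing <- mat[, rep(seq_len(ncol(mat)), each = k)]

all(viaKronecker == viaIndexing)
# [1] TRUE
```

The indexing version keeps the original storage type (integer here), whereas kronecker() returns doubles, so compare with == or all.equal() rather than identical().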
I have this matrix:
mat=matrix(c(1,1,1,2,2,2,3,4,NA,
4,4,4,4,4,3,5,6,4,
3,3,5,5,6,8,0,9,NA,
1,1,1,1,1,4,5,6,1),nrow=4,byrow=TRUE)
print(mat)
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
# [1,] 1 1 1 2 2 2 3 4 NA
# [2,] 4 4 4 4 4 3 5 6 4
# [3,] 3 3 5 5 6 8 0 9 NA
# [4,] 1 1 1 1 1 4 5 6 1
I should replace the NA values with other values, in this way:
I have another matrix:
mat2=matrix(c(24,1,3,2, 4,4,4,4, 3,2,2,5, 1,3,5,1),nrow=4,byrow=TRUE)
[,1] [,2] [,3] [,4]
[1,] 24 1 3 2
[2,] 4 4 4 4
[3,] 3 2 2 5
[4,] 1 3 5 1
and the subset with the index of the rows with NA of the first matrix "mat":
subset=c(1,3)
I want to replace the NA values in the matrix with the column index of the maximum value in the corresponding row of mat2.
In this case, I will have "1" for the first row and "4" for the third one; I don't care about rows 2 and 4.
Use this to fill in the column index of the row maximum:
mat[subset, 9] <- apply(mat2[subset, , drop = FALSE], 1, which.max)
If you want the maximum value itself rather than its position, use this instead:
naPos <- which(is.na(mat), arr.ind = TRUE)
mat[naPos] <- apply(mat2, 1, max)[naPos[, "row"]]
This replaces every NA value with the maximum value from the same row in mat2. I couldn't test this when I wrote it, so if you have any questions or it fails, just comment.
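Here is a self-contained sketch of the column-index version that finds the NA rows itself instead of relying on subset. It uses max.col() from base R as an alternative to apply(..., which.max); naRows is just an illustrative name, and I am assuming the NAs only occur in the last column, as in the question.

```r
mat <- matrix(c(1,1,1,2,2,2,3,4,NA,
                4,4,4,4,4,3,5,6,4,
                3,3,5,5,6,8,0,9,NA,
                1,1,1,1,1,4,5,6,1), nrow = 4, byrow = TRUE)
mat2 <- matrix(c(24,1,3,2, 4,4,4,4, 3,2,2,5, 1,3,5,1), nrow = 4, byrow = TRUE)

# Rows whose last column is NA (assumption: NAs only occur there)
naRows <- which(is.na(mat[, ncol(mat)]))

# Column index of each row maximum in mat2; ties go to the first maximum
mat[naRows, ncol(mat)] <- max.col(mat2[naRows, , drop = FALSE],
                                  ties.method = "first")
mat[, ncol(mat)]
# [1] 1 4 4 1
```

Setting ties.method = "first" matters here, because the default for max.col() breaks ties at random.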
How can I sum the number of complete cases of two columns?
With c equal to:
a b
[1,] NA NA
[2,] 1 1
[3,] 1 1
[4,] NA 1
Applying something like
rollapply(c, 2, function(x) sum(complete.cases(x)),fill=NA)
I'd like to get back a single number, 2 in this case. This will be for a large data set with many columns, so I'd like to use rollapply across the whole set instead of simply doing sum(complete.cases(a,b)).
Am I over thinking it?
Thanks!
Did you try sum(complete.cases(x))?!
set.seed(123)
x <- matrix( sample( c(NA,1:5) , 15 , TRUE ) , 5 )
# [,1] [,2] [,3]
#[1,] 1 NA 5
#[2,] 4 3 2
#[3,] 2 5 4
#[4,] 5 3 3
#[5,] 5 2 NA
sum(complete.cases(x))
#[1] 3
To find the complete.cases() of the first two columns:
sum(complete.cases(x[,1:2]))
#[1] 4
And to apply to two columns of a matrix across the whole matrix you could do this:
# Bigger data for example
set.seed(123)
x <- matrix( sample( c(NA,1:5) , 50 , TRUE ) , 5 )
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
#[1,] 1 NA 5 5 5 4 5 2 NA NA
#[2,] 4 3 2 1 4 3 5 4 2 1
#[3,] 2 5 4 NA 3 3 4 1 2 2
#[4,] 5 3 3 1 5 1 4 1 2 1
#[5,] 5 2 NA 5 3 NA NA 1 NA 5
# Column indices
id <- seq( 1 , ncol(x) , by = 2 )
# [1] 1 3 5 7 9
apply( cbind(id,id+1) , 1 , function(i) sum(complete.cases(x[,c(i)])) )
# [1] 4 3 4 4 3
complete.cases() works row-wise across the whole data.frame or matrix returning TRUE for those rows which are not missing any data. A minor aside, "c" is a bad variable name because c() is one of the most commonly used functions.
You can calculate the number of complete cases in neighboring matrix columns using rollapply like this:
m <- matrix(c(NA,1,1,NA,1,1,1,1),ncol=4)
# [,1] [,2] [,3] [,4]
#[1,] NA 1 1 1
#[2,] 1 NA 1 1
library(zoo)
rowSums(rollapply(is.na(t(m)), 2, function(x) !any(x)))
#[1] 0 1 2
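If you would rather not depend on zoo, the same counts for neighboring column pairs can be sketched in base R. countAdjacentPairs is a made-up name, and cc stands in for the matrix called c in the question.

```r
# Hypothetical helper: complete cases for each pair of adjacent columns
countAdjacentPairs <- function(m) {
  vapply(seq_len(ncol(m) - 1),
         function(j) sum(complete.cases(m[, c(j, j + 1), drop = FALSE])),
         integer(1))
}

# Matrix from the rollapply example above
m <- matrix(c(NA, 1, 1, NA, 1, 1, 1, 1), ncol = 4)
countAdjacentPairs(m)
# [1] 0 1 2

# Data from the question (columns a and b)
cc <- cbind(a = c(NA, 1, 1, NA), b = c(NA, 1, 1, 1))
countAdjacentPairs(cc)
# [1] 2
```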
This should work for both a matrix and a data.frame:
> sum(apply(c, 1, function(x) all(!is.na(x))))
[1] 2
and you could simply iterate over the adjacent column pairs of a large matrix M, storing one count per pair:
agreement <- numeric(ncol(M) - 1)
for (i in seq_len(ncol(M) - 1)) {
  pair <- M[, c(i, i + 1)]
  agreement[i] <- sum(apply(pair, 1, function(x) all(!is.na(x))))
}