conditionally dropping columns from a matrix - r

I have a matrix:
set.seed(23)
dt.data <- unname(as.matrix(data.frame(rnorm(5,30,2),rnorm(5,-3,1),
replicate(3,rnorm(5,5,1)))))
dt.data
# [,1] [,2] [,3] [,4] [,5]
#[1,] 30.38642 -1.892510 5.218288 5.308137 5.835391
#[2,] 29.13064 -3.278086 3.953465 4.479822 4.433985
#[3,] 31.82653 -1.980795 4.711311 4.557686 5.788419
#[4,] 33.58678 -2.954563 5.481550 4.400687 3.834071
#[5,] 31.99321 -1.424220 3.783624 6.294578 4.469180
I'd like to drop all columns from the matrix whose mean is less than zero OR greater than 25 (i.e. - the first 2 columns above). I've been trying it with an apply function:
apply(dt.data,2,
function(x) if ((mean(x,na.rm=TRUE))>25 | (mean(x,na.rm=TRUE)<0)) {
dt.data<-dt.data[,-x]
})
I can use another apply function to check the means and drop them manually, but I'd like to generalize the procedure. The above function doesn't work, and throws the following error:
Error in dt.data[, -x] : only 0's may be mixed with negative subscripts
Any tips?

colMeans should do it:
m <- colMeans(dt.data, na.rm=TRUE)
dt.data[, !(m > 25 | m < 0)]
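The original attempt fails because, inside `apply`, `x` holds a column's values rather than its index, so `dt.data[, -x]` ends up indexing with data values. For reuse, the `colMeans` idea can be wrapped in a small helper. This is only a sketch: the name `drop_cols_by_mean` and its `lower`/`upper` arguments are made up for illustration, and `drop = FALSE` is added so the result stays a matrix even when a single column survives.

```r
# Hypothetical helper: keep only columns whose mean lies in [lower, upper].
drop_cols_by_mean <- function(mat, lower = 0, upper = 25) {
  m <- colMeans(mat, na.rm = TRUE)     # one mean per column
  keep <- m >= lower & m <= upper      # logical mask of columns to keep
  mat[, keep, drop = FALSE]            # drop = FALSE preserves matrix shape
}

# Small deterministic example: column means are 30.5, -2.5 and 5.5.
example <- cbind(c(30, 31), c(-3, -2), c(5, 6))
kept <- drop_cols_by_mean(example)
kept
```

Only the third column should remain here, returned as a 2x1 matrix rather than a plain vector.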

Related

Fill inner cells based on margins

I have the following data
margin1 <- c(72,34,446,40,33,71,2,96)
margin2 <- c(70,36,455,41,36,56,2,98)
probabilities <- matrix(1/8,8,8)
Now I would like to fill the inner cells of an 8x8 matrix using the following logic
matrix <- matrix(0,8,8)
matrix[1,] <- probabilities[1,]*margin2[1]
matrix[2,] <- probabilities[2,]*margin2[2]
matrix[3,] <- probabilities[3,]*margin2[3]
matrix[4,] <- probabilities[4,]*margin2[4]
matrix[5,] <- probabilities[5,]*margin2[5]
matrix[6,] <- probabilities[6,]*margin2[6]
matrix[7,] <- probabilities[7,]*margin2[7]
matrix[8,] <- probabilities[8,]*margin2[8]
However, what makes this difficult is, that the inner cells should always be integers. Therefore, I wrote the following rounding function:
rounding <- function(x) {
output <- matrix(0,8,8)
for(i in 1:nrow(x)){
obj <- x[i,]
y <- floor(obj)
indices <- tail(order(obj-y), round(sum(obj)) - sum(y))
y[indices] <- y[indices] + 1
output[i,]<- y
}
x <- output
return(x)
}
My expected output is the following:
matrix <- rounding(matrix)
While this ensures that the rowSums of the matrix object equal margin2, the colSums do not equal margin1, which is exactly what I need. Is there any way to rewrite the rounding function to achieve this?
Provided I have understood you correctly, the problem you're describing is how to fill a matrix given its row and column sums (the "margins", as you call them).
In your particular case, you're trying to fill an 8x8 matrix. Since you have 64 unknowns, but 8 + 8 - 1 = 15 independent equations (8 row sums, 8 column sums, minus 1 because the sum of the row sums must be equal to the sum of the column sums) the bottom line is that there doesn't exist a unique solution, and instead there will be many.
If matrix values can be rational numbers you can fill the matrix with values margin2_i * margin1_j / sum(margin2) for row i and column j, or in R
mat <- margin2 %*% t(margin1) / sum(margin1)
mat
# [,1] [,2] [,3] [,4] [,5] [,6]
#[1,] 6.3476071 2.99748111 39.319899 3.5264484 2.90931990 6.2594458
#[2,] 3.2644836 1.54156171 20.221662 1.8136020 1.49622166 3.2191436
#[3,] 41.2594458 19.48362720 255.579345 22.9219144 18.91057935 40.6863980
#[4,] 3.7178841 1.75566751 23.030227 2.0654912 1.70403023 3.6662469
#[5,] 3.2644836 1.54156171 20.221662 1.8136020 1.49622166 3.2191436
#[6,] 5.0780856 2.39798489 31.455919 2.8211587 2.32745592 5.0075567
#[7,] 0.1813602 0.08564232 1.123426 0.1007557 0.08312343 0.1788413
#[8,] 8.8866499 4.19647355 55.047859 4.9370277 4.07304786 8.7632242
# [,7] [,8]
#[1,] 0.176322418 8.4634761
#[2,] 0.090680101 4.3526448
#[3,] 1.146095718 55.0125945
#[4,] 0.103274559 4.9571788
#[5,] 0.090680101 4.3526448
#[6,] 0.141057935 6.7707809
#[7,] 0.005037783 0.2418136
#[8,] 0.246851385 11.8488665
We can confirm that indeed
the row sum of mat is equal to margin2
identical(rowSums(mat), margin2)
#[1] TRUE
and that
the column sum of mat is equal to margin1
identical(colSums(mat), margin1)
#[1] TRUE
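Note that `identical()` demands bit-for-bit equality, so with floating-point sums the result can depend on rounding. If it returns FALSE on your machine, a tolerance-based comparison with `all.equal()` is a safer check. A minimal sketch, reusing the margins from the question:

```r
margin1 <- c(72, 34, 446, 40, 33, 71, 2, 96)
margin2 <- c(70, 36, 455, 41, 36, 56, 2, 98)

# Same fill as above: row i, column j gets margin2[i] * margin1[j] / total.
mat <- margin2 %*% t(margin1) / sum(margin1)

# all.equal() compares within a small numerical tolerance.
rows_ok <- isTRUE(all.equal(rowSums(mat), margin2))
cols_ok <- isTRUE(all.equal(colSums(mat), margin1))
c(rows_ok, cols_ok)
```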
The problem is more complex if you want to restrict matrix values to only integer values. Here I would refer you to an excellent post on Mathematics that illustrates an iterative solution strategy.
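As an illustration only, and not the iterative strategy from the linked post, one simple way to obtain some non-negative integer matrix with the required margins is the "northwest corner" rule from transportation problems. The function name `fill_nw` is made up here, and the sketch assumes non-negative integer margins with equal totals. The result is just one of many valid solutions and is far from the evenly spread fill above.

```r
# Northwest-corner fill: walk from the top-left cell, placing as much as
# the current row and column still allow, then move right or down.
fill_nw <- function(row_sums, col_sums) {
  stopifnot(sum(row_sums) == sum(col_sums))
  out <- matrix(0, length(row_sums), length(col_sums))
  i <- 1
  j <- 1
  while (i <= length(row_sums) && j <= length(col_sums)) {
    k <- min(row_sums[i], col_sums[j])   # largest value that fits here
    out[i, j] <- k
    row_sums[i] <- row_sums[i] - k       # remaining demand of row i
    col_sums[j] <- col_sums[j] - k       # remaining demand of column j
    if (row_sums[i] == 0) i <- i + 1 else j <- j + 1
  }
  out
}

margin1 <- c(72, 34, 446, 40, 33, 71, 2, 96)   # column sums
margin2 <- c(70, 36, 455, 41, 36, 56, 2, 98)   # row sums
mat_int <- fill_nw(margin2, margin1)
```

Because every entry is built from integer subtractions, `rowSums(mat_int)` and `colSums(mat_int)` can be compared to the margins exactly.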

Finding cumulative sum and then average the values in R

I want to compute cumulative sum for the first (n-1) columns(if we have n columns matrix) and subsequently average the values. I created a sample matrix to do this task. I have the following matrix
ma = matrix(c(1:10), nrow = 2, ncol = 5)
ma
[,1] [,2] [,3] [,4] [,5]
[1,] 1 3 5 7 9
[2,] 2 4 6 8 10
I wanted to find the following
ans = matrix(c(1,2,2,3,3,4,4,5), nrow = 2, ncol = 4)
ans
[,1] [,2] [,3] [,4]
[1,] 1 2 3 4
[2,] 2 3 4 5
The following is my R function.
ColCumSumsAve <- function(y){
for(i in seq_len(dim(y)[2]-1)) {
y[,i] <- cumsum(y[,i])/i
}
}
ColCumSumsAve(ma)
However, when I run the above function it's not producing any output. Are there any mistakes in the code?
Thanks.
There were several mistakes.
Solution
This is what I tested and what works:
colCumSumAve <- function(m) {
csum <- t(apply(X=m, MARGIN=1, FUN=cumsum))
res <- t(Reduce(`/`, list(t(csum), 1:ncol(m))))
res[, 1:(ncol(m)-1)]
}
Test it with:
> colCumSumAve(ma)
[,1] [,2] [,3] [,4]
[1,] 1 2 3 4
[2,] 2 3 4 5
which is correct.
Explanation:
colCumSumAve <- function(m) {
csum <- t(apply(X=m, MARGIN=1, FUN=cumsum)) # calculate row-wise cumsum
res <- t(Reduce(`/`, list(t(csum), 1:ncol(m))))
# This is the trickiest part.
# Because `csum` is a matrix, it is treated like a vector (column by column)
# when `Reduce`-ing using `/` with the vector `1:ncol(m)`.
# To get quasi-row-wise treatment, I change the orientation
# of the matrix with `t()`.
# As a consequence, the output is in this transposed
# orientation. So I transpose again by applying `t()`
# to the entire result at the end, to get back the original
# input matrix orientation.
# `Reduce` using `/` over the sequential list of `t(csum)` and
# `1:ncol(m)` has the effect of dividing the `csum` values by their
# corresponding column position.
res[, 1:(ncol(m)-1)] # removes last column for the answer.
# This could of course be done right at the beginning,
# saving the calculation of values in the last column,
# but that calculation is not the speed-limiting step
# (since it is vectorized);
# rather, `apply` and `Reduce` will be speed-limiting.
}
Well, okay, I could do then:
colCumSumAve <- function(m) {
csum <- t(apply(X=m[, 1:(ncol(m)-1)], MARGIN=1, FUN=cumsum))
t(Reduce(`/`, list(t(csum), 1:(ncol(m)-1)))) # divisor must match the reduced column count
}
or:
colCumSumAve <- function(m) {
m <- m[, 1:(ncol(m)-1)] # remove last column
csum <- t(apply(X=m, MARGIN=1, FUN=cumsum))
t(Reduce(`/`, list(t(csum), 1:ncol(m))))
}
This is actually the more optimized solution, then.
Original Function
Your original function makes only assignments in the for-loop and doesn't return anything.
So I copied first your input into a res, processed it with your for-loop and then returned res.
ColCumSumsAve <- function(y){
res <- y
for(i in seq_len(dim(y)[2]-1)) {
res[,i] <- cumsum(y[,i])/i
}
res
}
However, this gives:
> ColCumSumsAve(ma)
[,1] [,2] [,3] [,4] [,5]
[1,] 1 1.5 1.666667 1.75 9
[2,] 3 3.5 3.666667 3.75 10
The problem is that cumsum(y[,i]) accumulates down the rows of a single column, whereas the desired result accumulates across the columns of each row.
Corrected Original Function
After some fiddling, I realized the correct solution is:
ColCumSumsAve <- function(y){
res <- matrix(NA, nrow(y), ncol(y)-1)
# create empty matrix with the dimensions of y minus last column
for (i in 1:(nrow(y))) { # go through rows
for (j in 1:(ncol(y)-1)) { # go through columns
res[i, j] <- sum(y[i, 1:j])/j # for each position do this
}
}
res # return `res`ult by calling it at the end!
}
with the testing:
> ColCumSumsAve(ma)
[,1] [,2] [,3] [,4]
[1,] 1 2 3 4
[2,] 2 3 4 5
Note: dim(y)[2] is ncol(y), and dim(y)[1] is nrow(y).
Instead of seq_len(), 1: is shorter, although seq_len() is safer when the length can be zero (1:0 yields c(1, 0)).
Note: My solution given first will be faster, since it uses apply, vectorized cumsum and Reduce; for-loops in R are slower.
Late Note: I am not so sure that the first solution is faster. Since R 3.x it seems that for-loops are faster. Reduce will be the speed-limiting function and can sometimes be incredibly slow.
k <- t(apply(ma,1,cumsum))[,-ncol(ma)]
for (i in 1:ncol(k)){
k[,i] <- k[,i]/i
}
k
This should work.
All you need is rowMeans:
nc <- 4
cbind(ma[,1],sapply(2:nc,function(x) rowMeans(ma[,1:x])))
[,1] [,2] [,3] [,4]
[1,] 1 2 3 4
[2,] 2 3 4 5
Here's how I did it
> t(apply(ma, 1, function(x) cumsum(x) / 1:length(x)))[,-NCOL(ma)]
[,1] [,2] [,3] [,4]
[1,] 1 2 3 4
[2,] 2 3 4 5
This applies the cumsum function row-wise to the matrix ma and then divides by the correct length to get the average (cumsum(x) and 1:length(x) will have the same length). Then simply transpose with t and remove the last column with [,-NCOL(ma)].
The reason why there is no output from your function is because you aren't returning anything. You should end the function with return(y) or simply y as Marius suggested. Regardless, your function doesn't seem to give you the correct response anyway.
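The "divide each cumulative sum by its column position" step can also be written with `sweep()`, which makes the per-column division explicit. This is a sketch of the same idea rather than a new method; the intermediate names `csum`, `avg` and `running_mean` are chosen here for illustration.

```r
ma <- matrix(1:10, nrow = 2, ncol = 5)

# Row-wise cumulative sums; apply() returns them transposed, so t() restores
# the original orientation.
csum <- t(apply(ma, 1, cumsum))

# Divide column j by j (MARGIN = 2 means the statistic varies by column).
avg <- sweep(csum, 2, seq_len(ncol(ma)), "/")

# Drop the last column, keeping the result a matrix.
running_mean <- avg[, -ncol(ma), drop = FALSE]
running_mean
```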

How to use some apply function to solve what requires two for-loops in R

I have a matrix, named "mat", and a smaller matrix, named "center".
temp = c(1.8421,5.6586,6.3526,2.904,3.232,4.6076,4.8,3.2909,4.6122,4.9399)
mat = matrix(temp, ncol=2)
[,1] [,2]
[1,] 1.8421 4.6076
[2,] 5.6586 4.8000
[3,] 6.3526 3.2909
[4,] 2.9040 4.6122
[5,] 3.2320 4.9399
center = matrix(c(3, 6, 3, 2), ncol=2)
[,1] [,2]
[1,] 3 3
[2,] 6 2
I need to compute the distance between each row of mat with every row of center. For example, the distance of mat[1,] and center[1,] can be computed as
diff = mat[1,]-center[1,]
t(diff)%*%diff
[,1]
[1,] 3.92511
Similarly, I can find the distance of mat[1,] and center[2,]
diff = mat[1,]-center[2,]
t(diff)%*%diff
[,1]
[1,] 24.08771
Repeat this process for each row of mat, I will end up with
[,1] [,2]
[1,] 3.925110 24.087710
[2,] 10.308154 7.956554
[3,] 11.324550 1.790750
[4,] 2.608405 16.408805
[5,] 3.817036 16.304836
I know how to implement it with for-loops. I was really hoping someone could tell me how to do it with some kind of an apply() function, maybe mapply() I guess.
Thanks
apply(center, 1, function(x) colSums((x - t(mat)) ^ 2))
# [,1] [,2]
# [1,] 3.925110 24.087710
# [2,] 10.308154 7.956554
# [3,] 11.324550 1.790750
# [4,] 2.608405 16.408805
# [5,] 3.817036 16.304836
If you want the apply for expressiveness of code that's one thing but it's still looping, just different syntax. This can be done without any loops, or with a very small one across center instead of mat. I'd just transpose first because it's wise to get into the habit of getting as much as possible out of the apply statement. (The BrodieG answer is pretty much identical in function.) These are working because R will automatically recycle the smaller vector along the matrix and do it much faster than apply or for.
tm <- t(mat)
apply(center, 1, function(m){
colSums((tm - m)^2) })
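For completeness, a version with no explicit loop or `apply` at all can be built from the identity |a - b|^2 = |a|^2 + |b|^2 - 2 a.b. This is a sketch of that standard identity, not something taken from the answers here, and `sq_dist` is just an illustrative name.

```r
temp <- c(1.8421, 5.6586, 6.3526, 2.904, 3.232,
          4.6076, 4.8, 3.2909, 4.6122, 4.9399)
mat <- matrix(temp, ncol = 2)
center <- matrix(c(3, 6, 3, 2), ncol = 2)

# Squared norms of every row, combined pairwise, minus twice the dot products.
sq_dist <- outer(rowSums(mat^2), rowSums(center^2), "+") - 2 * mat %*% t(center)
sq_dist
```

Because it subtracts large, nearly equal numbers, this form can lose a little precision (and may yield tiny negative values for near-identical points), so it is worth comparing against the `apply` version on real data.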
Use dist and then extract the relevant submatrix:
ix <- 1:nrow(mat)
as.matrix( dist( rbind(mat, center) )^2 )[ix, -ix]
#           6         7
# 1 3.925110 24.087710
# 2 10.308154 7.956554
# 3 11.324550 1.790750
# 4 2.608405 16.408805
# 5 3.817036 16.304836
REVISION: simplified slightly.
You could use outer as well
d <- function(i, j) sum((mat[i, ] - center[j, ])^2)
outer(1:nrow(mat), 1:nrow(center), Vectorize(d))
This will solve it
t(apply(mat,1,function(row){
d1<-sum((row-center[1,])^2)
d2<-sum((row-center[2,])^2)
return(c(d1,d2))
}))
Result:
[,1] [,2]
[1,] 3.925110 24.087710
[2,] 10.308154 7.956554
[3,] 11.324550 1.790750
[4,] 2.608405 16.408805
[5,] 3.817036 16.304836

check column values and print/delete rows satisfying condition based on percent of columns

I have a matrix of values arranged in different columns per row.
What I want my code to do :
Iterate over a row -> check if value of Column < threshold (e.g. 1)
Within the row, if there are say 80% columns satisfying that condition, Keep the row ; else remove the full row.
Code so far :
myfilt <- function(t,x){
if ((length(which(t[x,] > 1)) / 60) >= 0.8){
return(1)
}else{
return(0)
}
}
y=c()
for(i in 1:length(t[,1])){
y = c(y,myfilt(t,i))
}
But when I print t[v,] all the rows have same value :(
Not sure what I am doing wrong. Also if there is a shorter way to do this, let me know.
P.S. : Here 't' is the name of matrix I am testing
Here's a way to do it :
## Parameters
threshold <- 0.8
perc.to.keep <- 0.5
## Example Matrix
set.seed(1337)
m <- matrix(rnorm(25,1,1),nrow=5,ncol=5)
# [,1] [,2] [,3] [,4] [,5]
# [1,] 1.7122837 0.8383025 -0.02718379 2.2157099 2.1291008
# [2,] 0.2462742 2.4602621 -0.04117532 -0.6214087 1.4501467
# [3,] 1.0381899 3.0094584 0.12937698 0.9314247 1.0505864
# [4,] 2.1784211 0.9220618 1.85313022 0.9370171 0.8756698
# [5,] 0.8467962 2.3543421 0.37723981 2.0757077 1.9120115
test <- m < threshold
sel <- apply(test,1,function(v) sum(v)/length(v)) < perc.to.keep
m[sel,]
# [,1] [,2] [,3] [,4] [,5]
# [1,] 1.7122837 0.8383025 -0.02718379 2.2157099 2.1291008
# [2,] 1.0381899 3.0094584 0.12937698 0.9314247 1.0505864
# [3,] 2.1784211 0.9220618 1.85313022 0.9370171 0.8756698
# [4,] 0.8467962 2.3543421 0.37723981 2.0757077 1.9120115
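The example above keeps rows in which fewer than half of the values fall below the threshold. If the rule is instead the one stated in the question (keep a row when at least 80% of its values are below the threshold), `rowMeans()` on the logical matrix gives the fraction directly. A sketch under that reading of the question; `keep_rows`, `threshold` and `min_frac` are hypothetical names.

```r
# Keep rows where at least `min_frac` of the values are below `threshold`.
keep_rows <- function(m, threshold = 1, min_frac = 0.8) {
  frac_below <- rowMeans(m < threshold)      # share of TRUEs per row
  m[frac_below >= min_frac, , drop = FALSE]  # drop = FALSE keeps a matrix
}

demo <- rbind(c(0.1, 0.2, 0.3, 0.4, 0.5),   # 5 of 5 below 1: kept
              c(0.1, 2.0, 3.0, 0.4, 2.0),   # 2 of 5 below 1: dropped
              c(0.5, 0.5, 0.5, 0.5, 0.5))   # 5 of 5 below 1: kept
kept_rows <- keep_rows(demo)
kept_rows
```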

In R: How to perform log on elements of a matrix without a loop

In R: How do I perform a log operation on every element of a matrix without using a loop?
I have a matrix m and I want each element to be replaced by its log.
log(m) doesn't work.
params = array(list(),c(2, 2))
then I manually set all elements.
params
[,1] [,2]
[1,] 3 3
[2,] 3 3
log(params)
Error in log(params) : Non-numeric argument to mathematical function
log(M) works for all of us on correct matrices:
R> M <- matrix(1:4,2)
R> M
[,1] [,2]
[1,] 1 3
[2,] 2 4
R> log(M)
[,1] [,2]
[1,] 0.00000 1.0986
[2,] 0.69315 1.3863
R>
Can you show us your M ?
You did not create a numeric matrix: this way you have a matrix of lists, whereas you want a matrix of numbers.
Try this:
params<-array(3,c(2,2))
log(params)
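If the list-matrix already exists and has been filled in, it can be converted rather than rebuilt. A minimal sketch, assuming every cell holds a single number; the name `params_num` is made up here.

```r
# A 2x2 matrix whose cells are list elements, similar to the question's setup.
params <- matrix(list(3, 3, 3, 3), nrow = 2, ncol = 2)

# unlist() flattens the cells to a numeric vector (column by column);
# array() puts the original dimensions back.
params_num <- array(unlist(params), dim(params))

log_params <- log(params_num)
log_params
```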
Now assume there is a 0 in some matrix cells. Furthermore, assume you want a 0 to appear in the output for such cells.
How would you compute the solution now?
Notice that log(0) is not a nice number.
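One possible way to handle this, sketched under the assumption that the matrix is numeric and non-negative: take the log of everything (in R, `log(0)` is `-Inf`), then overwrite the cells that were exactly zero. The function name `log_or_zero` is hypothetical.

```r
# Element-wise log, but cells that were exactly 0 map to 0 instead of -Inf.
log_or_zero <- function(m) {
  out <- log(m)        # vectorized; keeps the matrix dimensions
  out[m == 0] <- 0     # replace the -Inf entries that came from zeros
  out
}

M0 <- matrix(c(0, 1, 4, 0), nrow = 2)
res0 <- log_or_zero(M0)
res0
```

Note that this makes a cell containing 0 indistinguishable from a cell containing 1 in the output, since `log(1)` is also 0.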
