# R - Create new matrix of only newly changed values, else insert NAs

In R, I want to create a new matrix that contains only the values that changed, with NA everywhere else.
t12 below gives the correct logical answer, but I need code that produces t12desired, please.
t1 <- matrix(c(1,2,1,3,1,4), ncol=3, byrow=TRUE); t1
t2 <- matrix(c(1,1,1,3,1,4), ncol=3, byrow=TRUE); t2
t12 <- t2 != t1; t12
t12desired <- matrix(c(NA,1,NA,NA,NA,NA), ncol=3, byrow=TRUE); t12desired

We can compare the matrices and raise NA to the result: NA^TRUE is NA and NA^FALSE is 1, so matching cells become NA and changed cells become 1
NA^(t2 == t1)
It is a bit unclear whether the OP wanted 1s and NAs (1 for the non-matching cells, NA elsewhere) or wanted to keep the values of 't2' where they differ from 't1' and set the matching cells to NA (as @Onyambu mentioned). If it is the latter
`is.na<-`(t2, t2==t1)
or multiply by 't2' (if it is a numeric matrix)
NA^(t2 == t1) * t2
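As a quick check of the two readings above, here is a minimal sketch using the matrices from the question. The names `flags` and `changed` are introduced only for illustration. In this particular example both readings give the same matrix, because the one changed value in t2 happens to be 1.

```r
t1 <- matrix(c(1, 2, 1, 3, 1, 4), ncol = 3, byrow = TRUE)
t2 <- matrix(c(1, 1, 1, 3, 1, 4), ncol = 3, byrow = TRUE)
t12desired <- matrix(c(NA, 1, NA, NA, NA, NA), ncol = 3, byrow = TRUE)

# Reading 1: 1 where the cell changed, NA where it did not
flags <- NA^(t2 == t1)

# Reading 2: the new value from t2 where the cell changed, NA elsewhere
changed <- `is.na<-`(t2, t2 == t1)

# Compare both against the desired result from the question
identical(flags, t12desired)
identical(changed, flags * t2)
```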

I think you can do:
# Step 1: Get values if equal or not (vector)
vals = sapply(t2 == t1, function(x) ifelse(isTRUE(x), NA, 1))
# Step 2: Convert to matrix
mat = matrix(vals,nrow = 2,ncol = 3)
print(mat)
     [,1] [,2] [,3]
[1,]   NA    1   NA
[2,]   NA   NA   NA
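The same idea can probably be written without sapply, since ifelse is already vectorised and returns a result with the shape of its test argument. This is only a sketch of that alternative, not the answerer's code; it avoids hard-coding nrow and ncol.

```r
t1 <- matrix(c(1, 2, 1, 3, 1, 4), ncol = 3, byrow = TRUE)
t2 <- matrix(c(1, 1, 1, 3, 1, 4), ncol = 3, byrow = TRUE)

# ifelse keeps the dim attribute of the logical test matrix,
# so no reshaping step is needed afterwards
mat <- ifelse(t2 == t1, NA, 1)
print(mat)
```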

Related

How to calculate matrix cumsum by condition in R?

Let's say I have a 2x2 matrix like:
and I try to find the cumsum up to position [0,1], which would be 1+3 = 4,
or up to [1,0], which equals 1+2 = 3.
So only the values that match the criteria are summed together.
Is there a function or method for this?
You are looking for the sum of a leading block of a matrix? This is most straightforward if you work with numeric indices. With character indices (i.e., row names and column names), we can use match to find the numeric indices before taking the sum.
mat <- matrix(1:4, 2, 2, dimnames = list(0:1, 0:1))
rn <- "0"; cn <- "1"
sum(mat[1:match(rn, rownames(mat)), 1:match(cn, colnames(mat))])
#[1] 4
rn <- "1"; cn <- "0"
sum(mat[1:match(rn, rownames(mat)), 1:match(cn, colnames(mat))])
#[1] 3
Could you maybe explain to me why this code works?
In general, you can extract a block of a matrix mat, between rows i1 ~ i2 and columns j1 ~ j2, using mat[i1:i2, j1:j2]. A leading block means that the starting row and column are i1 = 1 and j1 = 1. In your case, the terminating row and column are determined by names, so I use match to first find the right i2 and j2.
I could sort of see your motivation. This is like selecting a region in an Excel sheet. :)
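The match-then-sum steps explained above could be wrapped in a small helper. The function name `leading_block_sum` is hypothetical, and the error for an unknown name is an added assumption rather than part of the original answer.

```r
# Sum the leading block of `mat` that ends at row name `rn` and column name `cn`
leading_block_sum <- function(mat, rn, cn) {
  i2 <- match(rn, rownames(mat))  # numeric row index of the terminating row
  j2 <- match(cn, colnames(mat))  # numeric column index of the terminating column
  if (is.na(i2) || is.na(j2)) stop("row or column name not found")
  sum(mat[seq_len(i2), seq_len(j2)])
}

mat <- matrix(1:4, 2, 2, dimnames = list(0:1, 0:1))
leading_block_sum(mat, "0", "1")
leading_block_sum(mat, "1", "0")
```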
Another possibility:
m <- matrix(1:4,nrow=2)
m
#>      [,1] [,2]
#> [1,]    1    3
#> [2,]    2    4
pos <- c(1,0)
pos <- pos + 1
sum(m[1:pos[1],1:pos[2]])
#> [1] 3
pos <- c(0,1)
pos <- pos + 1
sum(m[1:pos[1],1:pos[2]])
#> [1] 4
Another option: take the cumsum of the first column of the matrix as is (I), then of its transpose (t); the last element of each cumsum is the required sum.
lapply(list(I, t), \(f) {r <- unname(cumsum(f(m)[, 1])); r[length(r)]})
# [[1]]
# [1] 3
#
# [[2]]
# [1] 4
Data:
m <- matrix(c(1, 2, 3, 4), 2, 2)

Return Index of Minimum Row for Each Column of Matrix

Suppose I have a matrix like the example below called m1:
m1<-matrix(6:1,nrow=3,ncol=2)
     [,1] [,2]
[1,]    6    3
[2,]    5    2
[3,]    4    1
How do I get the row index of the minimum value of each column?
I know which.min() will return the column index value for each row.
The output should be: 3 and 3 because the minimum for column [,1] is 4 corresponding to row [3,] and the minimum for column [,2] is 1 corresponding row [3,].
If we need the index column-wise, use apply with MARGIN = 2 and pass which.min as the function
apply(m1, 2, which.min)
#[1] 3 3
If 1 column at a time is needed:
apply(as.matrix(m1[,1, drop = FALSE]), 2, which.min)
If we check ?Extract, the default usage is
x[i, j, ... , drop = TRUE]
drop - For matrices and arrays. If TRUE the result is coerced to the lowest possible dimension (see the examples). This only works for extracting elements, not for the replacement. See drop for further details.
To avoid getting dimensions dropped, use drop = FALSE
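To make the effect of drop concrete, here is a small sketch on the m1 matrix from the question. The names `v` and `k` are illustrative only.

```r
m1 <- matrix(6:1, nrow = 3, ncol = 2)

v <- m1[, 1]                # dimensions dropped: a plain vector
k <- m1[, 1, drop = FALSE]  # dimensions kept: still a 3 x 1 matrix

is.null(dim(v))
dim(k)

# apply needs an object with dimensions, so the drop = FALSE form works here
apply(k, 2, which.min)
```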
If we need the min values of each row
do.call(pmin, as.data.frame(m1))
Or
apply(m1, 1, min)
Or
library(matrixStats)
rowMins(m1)
data
m1 <- matrix(6:1,nrow=3,ncol=2)

subsetting values of matrix with a vector (conditions)

I have the following matrices :
> matrix <- matrix(c(1,3,4,NA,NA,NA,3,0,4,6,0,NA,2,NA,NA,2,0,1,0,0), nrow=5,ncol=4)
> n <- matrix(c(1,2,5,6,2),nrow=5,ncol=1)
As you can see, each row has:
multiple NAs (the number of NAs varies)
exactly ONE "0"
I would like to replace each 0 with the value of n for that row. Intended output below.
> output <- matrix(c(1, 3, 4,NA,NA,NA,3,5,4,6,1,NA,2,NA,NA,2,2,1,6,2), nrow=5,ncol=4)
I have tried the following
subset <- matrix == 0 & !is.na(matrix)
matrix[subset] <- n
#does not give intended output, but subset locates the values i want to change
When used on my "real" data I get the following message:
Warning message: In m[subset] <- n : number of items to replace is not a multiple of replacement length
Thanks
EDIT: added a row to the matrix, as my real-life problem is with an unbalanced matrix. I am using matrices and not data frames here, because I think (not sure) that with very large datasets R is quicker with large matrices than with subsets of data frames.
We can do this using
out1 <- matrix+n[row(matrix)]*(matrix==0)
identical(output, out1)
#[1] TRUE
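A further option, sketched here as an assumption rather than taken from the answers, is to locate the zeros with which(..., arr.ind = TRUE) and use the row index of each zero to pick the matching element of n. The matrix is named `m` here instead of `matrix` to avoid shadowing the base function.

```r
m <- matrix(c(1,3,4,NA,NA,NA,3,0,4,6,0,NA,2,NA,NA,2,0,1,0,0), nrow = 5, ncol = 4)
n <- c(1, 2, 5, 6, 2)
output <- matrix(c(1,3,4,NA,NA,NA,3,5,4,6,1,NA,2,NA,NA,2,2,1,6,2), nrow = 5, ncol = 4)

# which() ignores NA comparisons, so only the true zeros are returned,
# one (row, col) pair per zero
idx <- which(m == 0, arr.ind = TRUE)

# Index with the two-column matrix and take n by the row of each zero
m[idx] <- n[idx[, "row"]]

identical(m, output)
```

Because the replacement is looked up by row index, this does not depend on the order in which the zeros are found.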
It appears you want to replace the values by row, but assignment through a logical index fills the selected positions in column-major order, so the values of n are consumed column by column. Transposing the matrix first gives the desired output:
matrix <- t(matrix)
subset <- matrix == 0 & !is.na(matrix)
matrix[subset] <- n
matrix <- t(matrix)
identical(output, matrix)
#[1] TRUE
You can try this option with ifelse:
ifelse(matrix == 0, c(n) * (matrix == 0), matrix)
#     [,1] [,2] [,3] [,4]
#[1,]    1   NA    1    2
#[2,]    3    3   NA    2
#[3,]    4    5    2    1
#[4,]   NA    4   NA    6
#[5,]   NA    6   NA    2
zero = matrix == 0
identical(ifelse(zero, c(n) * zero, matrix), output)
# [1] TRUE

Triplicates in R

I have a set of 80 samples, with 2 variables, each measured as triplicate:
sample var1a var1b var1c var2a var2b var2c
1 -169.784 -155.414 -146.555 -175.295 -159.534 -132.511
2 -180.577 -180.792 -178.192 -177.294 -171.809 -166.147
3 -178.605 -184.183 -177.672 -167.321 -168.572 -165.335
and so on. How do I apply functions like mean, sd, se etc. for each row for var1 and var2? Also, the dataset contains NAs. Thanks for bothering with such basic questions
What is your expected result when there are NAs? apply(df[-1], 1, mean) (or whatever function) will work, but it would give NA as a result for the row. If you can replace NA with 0 then you could do df[is.na(df)] <- 0 first, and then the apply function in order to get the results.
One approach could be to reshape your data set. Another one might be just apply a function over rows of a subset of the data frame.
So, for var2X you have:
apply(dat[5:7], 1, function(x) c(mean(x), sd(x)))
           [,1]        [,2]        [,3]
[1,] -155.78000 -171.750000 -167.076000
[2,]   21.63763    5.573734    1.632348
and for var1X:
apply(dat[2:4], 1, function(x) c(mean(x), sd(x)))
           [,1]        [,2]        [,3]
[1,] -157.25100 -179.853667 -180.153333
[2,]   11.72295    1.443055    3.520835
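Since the question mentions NAs, one possible variation is to pass na.rm = TRUE and compute the standard error from the number of non-missing replicates. This is a sketch under assumptions: the helper name `row_stats` is hypothetical, "se" is taken to mean the standard error of the mean, and `dat` is rebuilt from the three rows shown in the question.

```r
dat <- data.frame(
  sample = 1:3,
  var1a = c(-169.784, -180.577, -178.605),
  var1b = c(-155.414, -180.792, -184.183),
  var1c = c(-146.555, -178.192, -177.672),
  var2a = c(-175.295, -177.294, -167.321),
  var2b = c(-159.534, -171.809, -168.572),
  var2c = c(-132.511, -166.147, -165.335)
)

# Mean, sd and standard error of one row, ignoring missing replicates
row_stats <- function(x) {
  n <- sum(!is.na(x))       # number of replicates actually measured
  s <- sd(x, na.rm = TRUE)
  c(mean = mean(x, na.rm = TRUE), sd = s, se = s / sqrt(n))
}

# One row of results per sample
var1_stats <- t(apply(dat[2:4], 1, row_stats))
var2_stats <- t(apply(dat[5:7], 1, row_stats))
var1_stats
var2_stats
```

Transposing the apply result gives one row per sample, which is usually easier to bind back onto the original data frame.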

subsetting matrix while preserving row.names

I'm trying to subset a matrix so that I only get the matrix where the first variable is larger than the second variable. I have the matrix out which is a 3000x2 matrix.
I tried
out<-out[out[,1] > out[,2]]
but this eliminates the row.names altogether, and I get a string of integers between 1 to 3000. Would there be a way to preserve the row.names?
Of note, if the subset consists of a single row, R drops the dimensions by default and the row name is lost:
m <- matrix(1:9, ncol = 3)
rownames(m) <- c("a", "b", "c")
m[1, ] # lost the row name
m[1, , drop = FALSE] # got row name back and a matrix
m[c(1,1), ] # the row name is back when result has nrow > 1
Without drop = FALSE, there appears to be no simple way of working around this other than checking for a one-row result and assigning the row name.
R treats a matrix as a vector that additionally has row and column dimensions.
> A <- matrix(1:9, ncol=3)
# A is filled with 1,...,9 columnwise
> A
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
# only elements with even number in 2nd column of same row
> v <- A[A[,2] %% 2 == 0]
> m <- A[A[,2] %% 2 == 0,]
> v
[1] 1 3 4 6 7 9
> m
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    3    6    9
# The result of testing the middle column for evenness.
# Without the comma, this logical vector is recycled over A in
# column-major order until every element of A has been selected or skipped.
> A[,2] %% 2 == 0
[1] TRUE FALSE TRUE
When you leave out the comma (v), you address A as a 1-dimensional data structure and R implicitly handles your expression as a vector.
In that sense v is not a "string of integers" but a vector of integers. When you add the comma, you tell R that your condition only addresses the first dimension while indicating a second one (after the comma), which causes R to handle your expression as a matrix (m).
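Applied to the question, the fix is therefore most likely just the missing comma. Below is a minimal sketch with a small stand-in for the 3000x2 matrix; the values and row names are made up for illustration. Adding drop = FALSE also keeps the result a matrix, with its row name, when only one row passes the condition.

```r
# Hypothetical stand-in for the question's matrix `out`
out <- matrix(c(5, 1, 3,
                2, 4, 1),
              ncol = 2,
              dimnames = list(c("a", "b", "c"), NULL))

# The comma makes the condition select whole rows; row names are kept
kept <- out[out[, 1] > out[, 2], , drop = FALSE]
rownames(kept)

# Even a single matching row stays a matrix and keeps its row name
one <- out[out[, 1] > 4, , drop = FALSE]
dim(one)
rownames(one)
```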
