How to remove parts of a column from matrix in R - r

Let's say I have a matrix
     [,1] [,2] [,3] [,4]
[1,]   10   11   12   13
[2,]    9   10   15    4
[3,]    5    7    4   10
[4,]    1    2    6    2
I want to remove the part of each column from the first value that is <= 5 downwards. Even if a later row in that column holds a higher value (e.g. [3,4] comes after [2,4], which is <= 5), it should become NA as well, so I should be left with:
     [,1] [,2] [,3] [,4]
[1,]   10   11   12   13
[2,]    9   10   15   NA
[3,]   NA    7   NA   NA
[4,]   NA   NA   NA   NA
The matrix was created by using a for-loop to iterate a population 100 times so my matrix is 100x100.
I tried to use an if statement inside the for-loop to remove parts of the column, but instead it just removed all columns after the first one.
if(matrix[,col]<=5) break

Here's a way to replace the required values in a matrix with NA:
# Create a random matrix with 20 rows and 20 columns
m <- matrix(floor(runif(400, min = 0, max = 101)), nrow = 20)
# Function that iterates through a vector and replaces values <= 5
# and the following values with NA
f <- function(x) {
  fillNA <- FALSE
  for (i in seq_along(x)) {
    if (fillNA || x[i] <= 5) {
      x[i] <- NA
      fillNA <- TRUE
    }
  }
  x
}
# Apply the function column-wise
apply(m, 2, f)
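To check the function against the 4x4 example from the question rather than a random matrix, something like the sketch below should do. The name m4 is only a placeholder for the question's matrix, and seq_along(x) is used in place of 1:length(x) so that an empty vector is handled safely.

```r
# The 4 x 4 example matrix from the question (m4 is an illustrative name)
m4 <- matrix(c(10, 11, 12, 13,
                9, 10, 15,  4,
                5,  7,  4, 10,
                1,  2,  6,  2), nrow = 4, byrow = TRUE)

# Same logic as f above: once a value <= 5 is seen,
# blank it and everything below it in the column
f <- function(x) {
  fillNA <- FALSE
  for (i in seq_along(x)) {
    if (fillNA || x[i] <= 5) {
      x[i] <- NA
      fillNA <- TRUE
    }
  }
  x
}

# Apply column-wise; the result should match the desired output in the question
res <- apply(m4, 2, f)
res
```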

We can do this in base R. Let's assume that your matrix is called m. The function below does the following:
Check each element to see if it is <= 5, producing TRUE/FALSE values.
Cumulatively sum the TRUE/FALSE values.
Replace any non-zero cumulative values with NA.
Use apply to perform this operation per column of the matrix.
This can be fit on one line:
m2 <- apply(m, 2, \(x) ifelse(cumsum(x <= 5), NA, x))
     [,1] [,2] [,3] [,4]
[1,]   10   11   12   13
[2,]    9   10   15   NA
[3,]   NA    7   NA   NA
[4,]   NA   NA   NA   NA
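To see what each step does, here is a small sketch that walks through the fourth column of the question's matrix. The intermediate names below, hits and out are only illustrative, and hits > 0 is written out explicitly instead of relying on ifelse coercing the counts to logical. Note that the \(x) shorthand in the one-liner needs R 4.1 or later; function(x) works on older versions.

```r
# Fourth column of the question's matrix
x <- c(13, 4, 10, 2)

# Step 1: flag values that are <= 5
below <- x <= 5

# Step 2: the running count becomes non-zero at the first flagged value
# and stays non-zero for every row after it
hits <- cumsum(below)

# Step 3: replace every position with a non-zero count by NA
out <- ifelse(hits > 0, NA, x)
out
```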

# Set the seed for reproducibility
set.seed(123)
# Create a random 100 x 100 matrix of integers between 0 and 100
m <- matrix(sample(0:100, 10000, replace = TRUE), nrow = 100)
# In each column, replace the first value <= 5 and everything below it with NA
m[apply(m <= 5, 2, cumsum) > 0] <- NA
# View the modified matrix
m
This code needs no extra packages. It first sets the seed for reproducibility, so that the same random matrix is generated every time the code is run. Next, it creates a matrix with 100 rows and 100 columns using sample, drawing integers between 0 and 100 so that some values actually fall at or below 5. Finally, it uses apply to take a cumulative sum of the test m <= 5 down each column. Any position where that sum is greater than zero sits at or below the first value <= 5 in its column, so it is replaced with NA.

Related

returning matrix column indices matching value(s) in R

I'm looking for a fast way to return the indices of columns of a matrix that match values provided in a vector (ideally of length 1 or the same as the number of rows in the matrix)
for instance:
mat <- matrix(1:100,10)
values <- c(11,2,23,12,35,6,97,3,9,10)
the desired function, which I call rowMatches() would return:
rowMatches(mat, values)
[1] 2 1 3 NA 4 1 10 NA 1 1
Indeed, value 11 is first found at the 2nd column of the first row, value 2 appears at the 1st column of the 2nd row, value 23 is at the 3rd column of the 3rd row, value 12 is not in the 4th row... and so on.
Since I haven't found any solution in package matrixStats, I came up with this function:
rowMatches <- function(mat,values) {
res <- integer(nrow(mat))
matches <- mat == values
for (col in ncol(mat):1) {
res[matches[,col]] <- col
}
res[res==0] <- NA
res
}
For my intended use, there will be millions of rows and few columns. So splitting the matrix into rows (in a list called, say, rows) and calling Map(match, as.list(values), rows) would be way too slow.
But I'm not satisfied by my function because there is a loop, which may be slow if there are many columns. It should be possible to use apply() on columns, but it won't make it faster.
Any ideas?
res <- arrayInd(match(values, mat), .dim = dim(mat))
res[res[, 1] != seq_len(nrow(res)), 2] <- NA
#      [,1] [,2]
# [1,]    1    2
# [2,]    2    1
# [3,]    3    3
# [4,]    2   NA
# [5,]    5    4
# [6,]    6    1
# [7,]    7   10
# [8,]    3   NA
# [9,]    9    1
#[10,]   10    1
Roland's answer is good, but I'll post an alternative solution:
res <- which(mat == values, arr.ind = TRUE)
res <- res[match(seq_len(nrow(mat)), res[,1]), 2]
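As a quick sanity check, the loop from the question and the which() approach can be run side by side on the example data. This is only a sketch: the wrapper name rowMatchesWhich is made up for illustration, and both versions assume values has length 1 or nrow(mat) so that it recycles down each column.

```r
mat <- matrix(1:100, 10)
values <- c(11, 2, 23, 12, 35, 6, 97, 3, 9, 10)

# Loop version from the question
rowMatches <- function(mat, values) {
  res <- integer(nrow(mat))
  matches <- mat == values        # values is recycled down each column
  for (col in ncol(mat):1) {
    res[matches[, col]] <- col    # lower columns overwrite higher ones
  }
  res[res == 0] <- NA
  res
}

# which()-based version (hypothetical wrapper name)
rowMatchesWhich <- function(mat, values) {
  hits <- which(mat == values, arr.ind = TRUE)
  # hits is ordered column by column, so the first hit per row
  # is the lowest matching column; rows without a hit give NA
  unname(hits[match(seq_len(nrow(mat)), hits[, "row"]), "col"])
}

rowMatches(mat, values)
rowMatchesWhich(mat, values)
```

Both calls should return the vector shown in the question.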

Delete specific values in a matrix according to two position vectors

My aim is to delete specific positions in a matrix according to a vector. Just giving you a small example.
Users_pos <- c(1,2)
Items_pos <- c(3,2)
Given a Matrix A:
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9
My aim, according to the two vectors Users_pos and Items_pos, is to delete the following values
A[1,3] and A[3,2]
I'm wondering if there's a possibility to do so without typing in the values for rows and columns by hand.
You can index k elements in a matrix A using A[X], where X is a k-row, 2-column matrix where each row is the (row, col) value of the indicated element. Therefore, you can index your two elements in A with the following indexing matrix:
rbind(Users_pos, Items_pos)
#           [,1] [,2]
# Users_pos    1    2
# Items_pos    3    2
Using this indexing, you could choose to extract the information currently stored with A[X] or replace those elements with A[X] <- new.values. If you, for instance, wanted to replace these elements with NA, you could do:
A[rbind(Users_pos, Items_pos)] <- NA
A
#      [,1] [,2] [,3]
# [1,]    1   NA    3
# [2,]    4    5    6
# [3,]    7   NA    9
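Note that rbind treats each vector as one complete (row, col) pair, which is why A[1,2] and A[3,2] are the cells replaced above. If the vectors are instead meant as a list of row indices and a list of column indices, cbind pairs them element by element. The sketch below contrasts the two readings; A_rbind and A_cbind are illustrative names, and which reading matches the question is an assumption.

```r
A <- matrix(1:9, nrow = 3, byrow = TRUE)
Users_pos <- c(1, 2)
Items_pos <- c(3, 2)

# rbind: each vector is one (row, col) pair -> cells [1,2] and [3,2]
A_rbind <- A
A_rbind[rbind(Users_pos, Items_pos)] <- NA

# cbind: Users_pos holds rows, Items_pos holds columns -> cells [1,3] and [2,2]
A_cbind <- A
A_cbind[cbind(Users_pos, Items_pos)] <- NA

A_rbind
A_cbind
```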

Moving Data from Matrix A to Matrix B in R

I want to cut/move/replace some data (to be precise 2500) from Matrix A to Matrix B in R.
For example, move cell (i,j) from matrix A to cell (i,j) in matrix B, where i and j each range over a fixed number of values (50, to be precise), and replace that cell (i,j) in matrix A with "0".
Since I am new to programming, can anyone help me with the coding?
Thanks in Advance
Regards
You can first define a two-column coordinate matrix of the cells you want to replace, where the first column is the row index and the second column is the column index. As an example, suppose you want to replace the cells c(2,1), c(2,2) and c(1,2) in a 3x3 matrix B with the values from a 3x3 matrix A:
ind <- cbind(c(2,2,1), c(1,2,2))
A <- matrix(1:9, ncol = 3)
B <- matrix(NA, ncol = 3, nrow = 3)
B[ind] <- A[ind]; A[ind] <- 0
B
     [,1] [,2] [,3]
[1,]   NA    4   NA
[2,]    2    5   NA
[3,]   NA   NA   NA
A
     [,1] [,2] [,3]
[1,]    1    0    7
[2,]    0    0    8
[3,]    3    6    9
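The same idea scales to the 2500 cells mentioned in the question. The sketch below assumes the cells form a 50 x 50 block (rows 1 to 50, columns 1 to 50) of two 100 x 100 matrices. The block location, the matrix sizes and the names rows and cols are all assumptions for illustration, not something stated in the question.

```r
A <- matrix(1:10000, nrow = 100)
B <- matrix(NA_integer_, nrow = 100, ncol = 100)

# Assumed block of cells to move: every combination of these rows and columns
rows <- 1:50
cols <- 1:50
ind <- as.matrix(expand.grid(row = rows, col = cols))  # 2500 x 2 index matrix

B[ind] <- A[ind]   # copy the cells into B
A[ind] <- 0L       # then blank them in A

sum(!is.na(B))     # number of cells moved
```

For a rectangular block like this, B[rows, cols] <- A[rows, cols] does the same job. The index matrix is only needed when the cells do not form a rectangle.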

Getting elements of a matrix with vectors of coordinates

This is a really basic question, but I can't seem to solve it or find an answer for it anywhere : suppose I have two vectors x,y of coordinates and a matrix m.
I would like a vector z such that z[i] = m[x[i],y[i]] for all i.
I tried z=m[x,y], but that creates a memory overflow. The vector and matrix are quite large so looping is pretty much out of the question. Any ideas ?
Use cbind. Here's a simple example:
mat <- matrix(1:25, ncol = 5)
mat
#      [,1] [,2] [,3] [,4] [,5]
# [1,]    1    6   11   16   21
# [2,]    2    7   12   17   22
# [3,]    3    8   13   18   23
# [4,]    4    9   14   19   24
# [5,]    5   10   15   20   25
x <- 1:5
y <- c(2, 3, 1, 4, 3)
mat[cbind(x, y)]
# [1] 6 12 3 19 15
## Verify with a few values...
mat[1, 2]
# [1] 6
mat[2, 3]
# [1] 12
mat[3, 1]
# [1] 3
From ?Extract:
A third form of indexing is via a numeric matrix with the one column for each dimension: each row of the index matrix then selects a single element of the array, and the result is a vector. Negative indices are not allowed in the index matrix. NA and zero values are allowed: rows of an index matrix containing a zero are ignored, whereas rows containing an NA produce an NA in the result.
Another way is to use the fact that you can index a matrix as if it were a vector, with elements numbered in column-major form. Using the example from @AnandoMahto:
mat[x+nrow(mat)*(y-1)]
[1] 6 12 3 19 15
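The reason m[x,y] runs out of memory is that it pairs every element of x with every element of y, so it builds a length(x) by length(y) matrix instead of a vector. A small sketch on the same example data shows the difference; the names full, paired and linear are illustrative only.

```r
mat <- matrix(1:25, ncol = 5)
x <- 1:5
y <- c(2, 3, 1, 4, 3)

full   <- mat[x, y]                     # 5 x 5 block: every x with every y
paired <- mat[cbind(x, y)]              # 5 values: x[i] paired with y[i]
linear <- mat[x + nrow(mat) * (y - 1)]  # same 5 values via column-major positions

dim(full)
paired
linear
```

The wanted values are the diagonal of full, which is why the two-vector form does far more work than needed on large inputs.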

Questions about missing data

Suppose a matrix has some missing data recorded as NA.
How can I delete the rows that contain NA?
Can I use na.rm?
na.omit() will take matrices (and data frames) and return only those rows with no NA values whatsoever - it takes complete.cases() one step further by deleting the FALSE rows for you.
> x <- data.frame(c(1,2,3), c(4, NA, 6))
> x
  c.1..2..3. c.4..NA..6.
1          1           4
2          2          NA
3          3           6
> na.omit(x)
  c.1..2..3. c.4..NA..6.
1          1           4
3          3           6
I think na.rm usually only works within functions, say for the mean function. I would go with complete.cases: http://stat.ethz.ch/R-manual/R-patched/library/stats/html/complete.cases.htm
Let's say you have the following 3x3 matrix:
x <- matrix(c(1:8, NA), 3, 3)
> x
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6   NA
then you can get the complete cases of this matrix with
y <- x[complete.cases(x),]
> y
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
The complete.cases-function returns a vector of truth values that says whether or not a case is complete:
> complete.cases(x)
[1] TRUE TRUE FALSE
and then you index the rows of matrix x and add the "," to say that you want all columns.
If you want to remove rows that contain NAs, you can use apply() to run a quick check on each row. E.g., if your matrix is x,
goodIdx <- apply(x, 1, function(r) !any(is.na(r)))
newX <- x[goodIdx,]
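For comparison, the approaches from the answers above can be run on the same 3x3 matrix. This is a sketch, and the result names by_cases, by_apply and by_omit are illustrative. drop = FALSE is added so the result stays a matrix even if only one row survives, and the last line shows the kind of place where na.rm does apply, as an argument to a summary function.

```r
x <- matrix(c(1:8, NA), 3, 3)

# Keep only rows without any NA, three ways
by_cases <- x[complete.cases(x), , drop = FALSE]
by_apply <- x[apply(x, 1, function(r) !any(is.na(r))), , drop = FALSE]
by_omit  <- na.omit(x)   # also records the dropped rows in an "na.action" attribute

by_cases
by_omit

# na.rm is an argument of functions such as mean(); it does not remove rows
col_mean <- mean(x[, 3], na.rm = TRUE)
col_mean
```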

Resources