Replacing values in a dataframe by row in R

Is there a way in R to replace values in each row of a matrix/dataframe with a specific value from that row?
For example, I have the following matrix:
df<-cbind(c("A","C","G","T"),c("T","G","C","A"),c(0,1,0,1),c(1,0,1,0),c(0,1,0,1))
df
# [,1] [,2] [,3] [,4] [,5]
#[1,] "A" "T" "0" "1" "0"
#[2,] "C" "G" "1" "0" "1"
#[3,] "G" "C" "0" "1" "0"
#[4,] "T" "A" "1" "0" "1"
and I want to replace the zeros in each row with the corresponding letter from the first column of that row, such that the new matrix will look like this:
newdf
# [,1] [,2] [,3] [,4] [,5]
#[1,] "A" "T" "A" "1" "A"
#[2,] "C" "G" "1" "C" "1"
#[3,] "G" "C" "G" "1" "G"
#[4,] "T" "A" "1" "T" "1"
The closest I have been able to get is with the following commands, but it does not replace the zeros with the correct values from column 1.
df[df==0]<-NA
df[, 3:ncol(df)][is.na(df[, 3:ncol(df)])] <- df[,1]

We can index the first column by row(df), which replicates it to the same length as the matrix, and then assign based on the logical matrix i1. Subsetting the replicated vector with i1 makes the right-hand side the same length as the set of elements being replaced.
i1 <- df == 0
newdf <- df
newdf[i1] <- df[,1][row(df)][i1]
newdf
# [,1] [,2] [,3] [,4] [,5]
#[1,] "A" "T" "A" "1" "A"
#[2,] "C" "G" "1" "C" "1"
#[3,] "G" "C" "G" "1" "G"
#[4,] "T" "A" "1" "T" "1"
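The same row-wise replacement can also be written with explicit row/column indices. This is only a sketch of an alternative, not part of the original answer; the name idx is made up for illustration. which(..., arr.ind = TRUE) returns one (row, col) pair per zero, and the row number of each pair selects the replacement letter from column 1.

```r
df <- cbind(c("A","C","G","T"), c("T","G","C","A"),
            c(0,1,0,1), c(1,0,1,0), c(0,1,0,1))

# One (row, col) pair for every cell that holds "0"
idx <- which(df == "0", arr.ind = TRUE)

newdf <- df
# Matrix indexing: each zero cell gets the column-1 value of its own row
newdf[idx] <- df[idx[, "row"], 1]
newdf
```

Because the right-hand side is built from the row index of each replaced cell, it does not depend on the column-major recycling that makes the attempt in the question go wrong.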

Related

Making symmetrical matrix of characters [duplicate]

I have a matrix of characters:
mat1
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] "0" "B" "A" "C" "D" "D"
[2,] "0" "0" "B" "C" "C" "C"
[3,] "0" "0" "0" "D" "D" "C"
[4,] "0" "0" "0" "0" "B" "B"
[5,] "0" "0" "0" "0" "0" "A"
[6,] "0" "0" "0" "0" "0" "0"
I want to turn it into a symmetric matrix, as below:
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] "0" "B" "A" "C" "D" "D"
[2,] "B" "0" "B" "C" "C" "C"
[3,] "A" "B" "0" "D" "D" "C"
[4,] "C" "C" "D" "0" "B" "B"
[5,] "D" "C" "D" "B" "0" "A"
[6,] "D" "C" "C" "B" "A" "0"
You can set the lower triangular part of the matrix equal to the lower triangular part of its transpose, using the lower.tri function on mat1:
mat1[lower.tri(mat1)] <- t(mat1)[lower.tri(mat1)]
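As a quick check, the matrix from the question can be rebuilt and the assignment verified. This is a minimal sketch: the way mat1 is constructed here (filling the upper triangle column by column) and the name sym are assumptions made for the example, not something the original poster showed.

```r
# Rebuild the upper-triangular character matrix from the question
mat1 <- matrix("0", nrow = 6, ncol = 6)
mat1[upper.tri(mat1)] <- c("B",
                           "A", "B",
                           "C", "C", "D",
                           "D", "C", "D", "B",
                           "D", "C", "C", "B", "A")

# Mirror the upper triangle into the lower triangle
sym <- mat1
sym[lower.tri(sym)] <- t(sym)[lower.tri(sym)]

# A symmetric matrix equals its own transpose
identical(sym, t(sym))
```

The diagonal is left untouched, since lower.tri excludes it by default.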

Replace values in one matrix with values from another

I am a programming newbie attempting to compare two matrices. If an element in the first column of mat1 matches any element in the first column of mat2, I want that element in mat1 to be replaced with the neighbour of the match in mat2 (same row, different column).
INPUT:
mat1<-matrix(letters[1:5])
mat2<-cbind(letters[4:8],1:5)
> mat1
[,1]
[1,] "a"
[2,] "b"
[3,] "c"
[4,] "d"
[5,] "e"
> mat2
[,1] [,2]
[1,] "d" "1"
[2,] "e" "2"
[3,] "f" "3"
[4,] "g" "4"
[5,] "h" "5"
Desired OUTPUT:
> mat3
[,1]
[1,] "a"
[2,] "b"
[3,] "c"
[4,] "1"
[5,] "2"
I have attempted the following without succeeding:
> for(x in mat1){mat3<-ifelse(x==mat2,mat2[which(x==mat2),2],mat1)}
> mat3
[,1] [,2]
[1,] "a" "a"
[2,] "2" "b"
[3,] "c" "c"
[4,] "d" "d"
[5,] "e" "e"
Any advice will be very appreciated. Have spent a whole day without making it work. It doesn't matter to me if the elements are in a matrix or a data frame.
Thanks.
ifelse is vectorized, so we can apply it to the whole column. Build the test condition by checking whether the first-column values of 'mat1' are %in% the first column of 'mat2'. Where they are, use match to get the index of the corresponding row in 'mat2' and extract the second-column value at that index; otherwise return the first-column value of 'mat1'.
mat3 <- matrix(ifelse(mat1[,1] %in% mat2[,1],
mat2[,2][match(mat1[,1], mat2[,1])], mat1[,1]))
mat3
# [,1]
#[1,] "a"
#[2,] "b"
#[3,] "c"
#[4,] "1"
#[5,] "2"
Here is another base R solution
v <- `names<-`(mat2[,2],mat2[,1])
mat3 <- matrix(unname(ifelse(is.na(v[mat1]),mat1,v[mat1])))
which gives
> mat3
[,1]
[1,] "a"
[2,] "b"
[3,] "c"
[4,] "1"
[5,] "2"
An option using only logical subsetting rather than ifelse:
mat3 <- mat1
mat3[mat1[,1] %in% mat2[,1], 1] <- mat2[mat2[,1] %in% mat1[,1], 2]
This subsets the values that occur in both matrices and replaces them where they do. Note that it relies on the matching values appearing in the same order in mat1 and mat2.
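The logical-subsetting option pairs matches up by position, so it only gives the right answer when the shared values appear in the same order in both matrices. A sketch that looks each value up with match instead, and therefore does not depend on row order, might look like this. The reversed mat2 and the names idx and hit are made up for the example.

```r
mat1 <- matrix(letters[1:5])
# Same lookup table as in the question, but with the rows reversed
mat2 <- cbind(rev(letters[4:8]), 5:1)

# Position of each mat1 value in mat2's first column (NA if absent)
idx <- match(mat1[, 1], mat2[, 1])
hit <- !is.na(idx)

mat3 <- mat1
# Replace only the matched entries with their neighbour in mat2
mat3[hit, 1] <- mat2[idx[hit], 2]
mat3
```

Unmatched entries keep their original value because only the rows flagged in hit are assigned.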

What's the most economical way to fill a matrix with random values?

I'm trying to find the most economical and elegant code for a simple task: fill an empty matrix with randomly sampled values (here, A, B, or C). For illustration, let's take this matrix:
x <- matrix(NA, nrow=8, ncol=4)
[,1] [,2] [,3] [,4]
[1,] NA NA NA NA
[2,] NA NA NA NA
[3,] NA NA NA NA
[4,] NA NA NA NA
[5,] NA NA NA NA
[6,] NA NA NA NA
[7,] NA NA NA NA
[8,] NA NA NA NA
To fill it I've used two approaches so far, each successfully doing the job. The first uses sapply:
x[] <- sapply(x, function(i) sample(LETTERS[1:3], 1, replace = F))
x
[,1] [,2] [,3] [,4]
[1,] "C" "A" "B" "C"
[2,] "B" "B" "B" "B"
[3,] "A" "B" "B" "B"
[4,] "B" "C" "A" "C"
[5,] "B" "A" "C" "A"
[6,] "A" "B" "C" "A"
[7,] "A" "C" "C" "A"
[8,] "C" "B" "B" "C"
while the second is a for loop:
for(i in 1:nrow(x)){
x[i,] <- sample(LETTERS[1:3], 4, replace = T)
}
x
[,1] [,2] [,3] [,4]
[1,] "C" "A" "C" "C"
[2,] "C" "A" "B" "B"
[3,] "C" "C" "A" "B"
[4,] "C" "C" "A" "C"
[5,] "A" "C" "C" "C"
[6,] "B" "C" "A" "A"
[7,] "C" "C" "B" "A"
[8,] "B" "C" "B" "C"
I like neither of them as they both look bulky. Is there a better way to get the expected result, that is, is there a shorter and/or more elegant way?
How about assigning it directly?
x[] <- sample(LETTERS, length(x), replace = TRUE)
x
# [,1] [,2] [,3] [,4]
#[1,] "A" "H" "V" "A"
#[2,] "X" "M" "Y" "O"
#[3,] "A" "W" "N" "I"
#[4,] "H" "Y" "Y" "C"
#[5,] "W" "N" "O" "P"
#[6,] "Y" "H" "P" "J"
#[7,] "I" "Y" "N" "H"
#[8,] "S" "F" "Z" "I"
If you want to include only the first three LETTERS, this would work:
x[] <- sample(LETTERS[1:3], length(x), replace = TRUE)
We can use replace without changing the original matrix
replace(x, TRUE, sample(LETTERS, length(x), replace = TRUE))
# [,1] [,2] [,3] [,4]
#[1,] "B" "O" "S" "D"
#[2,] "N" "C" "Q" "Z"
#[3,] "X" "X" "Z" "X"
#[4,] "O" "G" "R" "R"
#[5,] "L" "B" "S" "U"
#[6,] "Y" "I" "O" "A"
#[7,] "L" "Y" "P" "M"
#[8,] "R" "X" "H" "T"
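If the empty matrix is not needed in the first place, the sample can also be passed straight to matrix(). This is a small sketch rather than one of the original answers; the seed is set only to make the run repeatable.

```r
set.seed(1)  # only for repeatability of the example

# Draw nrow * ncol letters and shape them into a matrix in one step
x <- matrix(sample(LETTERS[1:3], 8 * 4, replace = TRUE), nrow = 8, ncol = 4)
dim(x)
```

This avoids creating and then overwriting a matrix of NA values.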

Row number in dataframe based on multiple parameters in R

I wish to find the row number, based on multiple parameters. I have made this test matrix:
data=
[,1] [,2] [,3]
[1,] "1" "a" "0"
[2,] "2" "b" "0"
[3,] "3" "c" "0"
[4,] "4" "a" "0"
[5,] "1" "b" "0"
[6,] "2" "c" "0"
[7,] "3" "a" "0"
[8,] "4" "b" "0"
Then I want to get the row number where
data[,1]==1 and data[,2]=='b'
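One possible way to do this, sketched here under the assumption that data is the character matrix shown above, is to combine both conditions with & and pass the result to which(). The construction of data below is only a guess at how the test matrix was made.

```r
# Rebuild the test matrix from the question (character matrix)
data <- cbind(rep(1:4, 2), rep(c("a", "b", "c"), length.out = 8), 0)

# Row numbers where both conditions hold
hits <- which(data[, 1] == "1" & data[, 2] == "b")
hits
```

Comparing against "1" rather than 1 makes it explicit that the matrix holds strings; R would coerce the number to a string for the comparison either way.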

Exclude rows where element has been previously met for N times

I have following input data:
# [,1] [,2]
#[1,] "A" "B"
#[2,] "A" "C"
#[3,] "A" "D"
#[4,] "B" "C"
#[5,] "B" "D"
#[6,] "C" "D"
Next I want to exclude rows where the first or second element has already appeared N times in earlier rows. For example, if N = 2, the following rows need to be excluded:
#[3,] "A" "D" - element "A" has already appeared 2 times
#[5,] "B" "D" - element "B" has already appeared 2 times
#[6,] "C" "D" - element "C" has already appeared 2 times
Note: exclusions must take effect immediately. For example, if an element occurs 5 times but, after removing earlier rows, has been kept only once, then the next row with this element must be kept, because that is only its second appearance.
Example (N=2):
Input data:
[,1] [,2]
[1,] "A" "B"
[2,] "A" "C"
[3,] "A" "D"
[4,] "A" "E"
[5,] "B" "C"
[6,] "B" "D"
[7,] "B" "E"
[8,] "C" "D"
[9,] "C" "E"
[10,] "D" "E"
Output data:
[,1] [,2]
[1,] "A" "B"
[2,] "A" "C"
[5,] "B" "C"
[10,] "D" "E"
There are possibly more elegant solutions... but this seems to work:
v <- c("A", "B", "C", "D", "E")
cmb <- t(combn(v, 2))
n <- 2
# Go through each letter
for (l in v) {
  # Find the combinations using that letter
  rows <- apply(cmb, 1, function(x) l %in% x)
  rows.2 <- which(rows)
  # Take the first n rows containing the letter,
  # then append all the ones not containing it
  if (length(rows.2) > n)
    rows.2 <- rows.2[1:n]
  cmb <- rbind(cmb[rows.2, ], cmb[!rows, ])
}
cmb
which outputs:
[,1] [,2]
[1,] "D" "E"
[2,] "B" "C"
[3,] "A" "C"
[4,] "A" "B"
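The loop above keeps the right rows but reorders them. If the original row order matters, a single pass with a running count per element is another option. The following is a sketch, not the answerer's method: filter_rows and its arguments are hypothetical names, and it assumes the two elements in a row are distinct.

```r
filter_rows <- function(m, n) {
  # Running count of how often each element appears in the kept rows
  elems <- unique(as.vector(m))
  counts <- setNames(integer(length(elems)), elems)
  keep <- logical(nrow(m))
  for (i in seq_len(nrow(m))) {
    pair <- m[i, ]
    # Keep the row only if neither element has reached the limit yet
    if (all(counts[pair] < n)) {
      keep[i] <- TRUE
      counts[pair] <- counts[pair] + 1L
    }
  }
  m[keep, , drop = FALSE]
}

cmb <- t(combn(c("A", "B", "C", "D", "E"), 2))
res <- filter_rows(cmb, 2)
res
```

Because only kept rows increase the counts, an excluded row has no effect on later rows, which is the behaviour the note in the question asks for.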
