Ordering rows and columns of R Matrix by criteria - r

I have a matrix in R like this:
A B C D E F
A 2 5 0 1 3 6
B 5 0 0 1 5 9
C 0 0 0 0 0 1
D 6 1 1 3 4 4
E 3 1 5 2 1 6
F 0 0 1 1 7 9
mat = structure(c(2L, 5L, 0L, 6L, 3L, 0L, 5L, 0L, 0L, 1L, 1L, 0L, 0L,
0L, 0L, 1L, 5L, 1L, 1L, 1L, 0L, 3L, 2L, 1L, 3L, 5L, 0L, 4L, 1L,
7L, 6L, 9L, 1L, 4L, 6L, 9L), .Dim = c(6L, 6L), .Dimnames = list(
c("A", "B", "C", "D", "E", "F"), c("A", "B", "C", "D", "E",
"F")))
The matrix is not symmetric.
I want to reorder the rows and columns according to the following criteria:
NAME TYPE
A Dog
B Cat
C Cat
D Other
E Cat
F Dog
crit = structure(list(NAME = c("A", "B", "C", "D", "E", "F"), TYPE = c("Dog",
"Cat", "Cat", "Other", "Cat", "Dog")), .Names = c("NAME", "TYPE"
), row.names = c(NA, -6L), class = "data.frame")
I am trying to get the matrix rows and columns to be re-ordered, so that each category is grouped together:
A F B C E D
A
F
B
C
E
D
I am unable to find any reasonable way of doing this.
In case it matters, or makes things simpler, I can drop the category 'Other' and just stick with 'Cat' and 'Dog'.
I need to do this re-ordering in code, as the matrix is quite big.

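In base R, just index both dimensions by order (this assumes the rows of crit are in the same order as the row and column names of mat):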
mat[order(crit$TYPE), order(crit$TYPE)]
#
# B C E A F D
# B 0 0 5 5 9 1
# C 0 0 0 0 1 0
# E 1 5 1 3 6 2
# A 5 0 3 2 6 1
# F 0 1 7 0 9 1
# D 1 1 4 6 4 3
It orders on an alphabetical sort of crit$TYPE, so Cat (B, C, and E) comes before Dog (A and F). If you want to set the order, use factor levels:
mat[order(factor(crit$TYPE, levels = c('Dog', 'Cat', 'Other'))),
order(factor(crit$TYPE, levels = c('Dog', 'Cat', 'Other')))]
#
# A F B C E D
# A 2 6 5 0 3 1
# F 0 9 0 1 7 1
# B 5 9 0 0 5 1
# C 0 1 0 0 0 0
# E 3 6 1 5 1 2
# D 6 4 1 1 4 3
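If the rows of crit are not guaranteed to be in the same order as the matrix names, the lookup can be aligned first with match(). The following is only a sketch under that assumption: crit is deliberately shuffled here for illustration, and type_by_row and ord are names introduced for the example, not part of the original answer.

```r
# Matrix from the question, entered row by row
mat <- matrix(c(2, 5, 0, 1, 3, 6,
                5, 0, 0, 1, 5, 9,
                0, 0, 0, 0, 0, 1,
                6, 1, 1, 3, 4, 4,
                3, 1, 5, 2, 1, 6,
                0, 0, 1, 1, 7, 9),
              nrow = 6, byrow = TRUE,
              dimnames = list(LETTERS[1:6], LETTERS[1:6]))

# Same criteria as the question, but in a shuffled row order (assumption for the demo)
crit <- data.frame(NAME = c("F", "D", "B", "A", "E", "C"),
                   TYPE = c("Dog", "Other", "Cat", "Dog", "Cat", "Cat"))

# The single permutation below is only valid if both dimensions share the same names
stopifnot(identical(rownames(mat), colnames(mat)))

# Look up the TYPE of each matrix row by name, then order by the desired level order
type_by_row <- crit$TYPE[match(rownames(mat), crit$NAME)]
ord <- order(factor(type_by_row, levels = c("Dog", "Cat", "Other")))

res <- mat[ord, ord]
res
```

Because order() keeps ties in their original position, names within each category stay in the order they had in the matrix.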

Related

New data frame, if specific value(s) is contained AND other values aren't included in a range of columns in r

So, I have a large data frame with monthly observations of n individuals.
ind y_0101 y_0102 y_0103 y_0104_ .... y_0311 y_0312
A 33 6 1 2 1 5
B 36 5 0 2 1 5
C 22 4 1 NA 1 5
D 2 2 0 2 1 5
E 5 2 1 2 1 6
F 7 1 0 2 1 5
G 8 6 1 2 1 5
H 2 8 0 2 2 5
I 1 3 1 2 1 5
J 3 2 0 2 1 5
I want to create a new data frame that includes only the individuals who meet some specific conditions.
E.g. if, for individual i, the columns y_0101:y_0312 do NOT contain any of the values 3, 6 or NA, AND contain at least one value of 1 or 2, THEN individual i should be included in the new data frame. This would produce the following data frame:
ind y_0101 y_0102 y_0103 y_0104_ .... y_0311 y_0312
B 36 5 0 2 1 5
D 2 2 0 2 1 5
F 7 1 0 2 1 5
H 2 8 0 2 2 5
I tried different ways, but I can't figure out how to get multiple conditions included.
df <- df %>% filter(vars(starts_with("y_"))!=3 | !=6 | != NA)
or
df <- df %>% filter_at(vars(starts_with("y_")), all_vars(!=3 | !=6 | != NA)
I've tried some other things as well, like !%in%, but that doesn't seem to work. Any ideas?
I think you're almost there, but might need a slight shift in the logic:
library(dplyr)

df <- data.frame(A1 = 1:10,
                 A2 = 10:1,
                 A3 = 1:10,
                 B1 = 1:10)
df %>%
  # keep rows where none of the "A" columns is 3, 6, or NA
  filter_at(vars(starts_with("A")), ~!(.x %in% c(3, 6, NA))) %>%
  # then keep rows where at least one "A" column is 1 or 2
  filter(if_any(starts_with("A"), ~ .x %in% c(1, 2)))
In the first step, I filter out all rows where any of the columns are 3, 6, or NA. In the second step, I filter down to only rows where at least one of the columns is 1 or 2. Does this help with your case?
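Applied to the data from the question, the same two-step logic might look like the sketch below. It swaps filter_at() for if_all(), since the scoped verbs are superseded in recent dplyr versions; this assumes dplyr 1.0.4 or later, where if_all() and if_any() are available. The data frame is retyped from the question's table.

```r
library(dplyr)

# Data from the question
df <- data.frame(
  ind    = LETTERS[1:10],
  y_0101 = c(33, 36, 22, 2, 5, 7, 8, 2, 1, 3),
  y_0102 = c(6, 5, 4, 2, 2, 1, 6, 8, 3, 2),
  y_0103 = c(1, 0, 1, 0, 1, 0, 1, 0, 1, 0),
  y_0104 = c(2, 2, NA, 2, 2, 2, 2, 2, 2, 2),
  y_0311 = c(1, 1, 1, 1, 1, 1, 1, 2, 1, 1),
  y_0312 = c(5, 5, 5, 5, 6, 5, 5, 5, 5, 5)
)

result <- df %>%
  # every y_ column must be free of 3, 6 and NA
  filter(if_all(starts_with("y_"), ~ !(.x %in% c(3, 6, NA)))) %>%
  # at least one y_ column must be 1 or 2
  filter(if_any(starts_with("y_"), ~ .x %in% c(1, 2)))

result$ind
```

Using %in% rather than != matters here: a comparison with NA yields NA, whereas NA %in% c(3, 6, NA) is TRUE, so missing values are handled by the same test.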
Here is a base R option using rowSums:
cols <- grep('y_', names(df))
include <- c(1, 2)
not_include <- c(3, 6, NA)
result <- subset(df, rowSums(sapply(df[cols], `%in%`, include)) > 0 &
rowSums(sapply(df[cols], `%in%`, not_include)) == 0)
result
# ind y_0101 y_0102 y_0103 y_0104 y_0311 y_0312
#2 B 36 5 0 2 1 5
#4 D 2 2 0 2 1 5
#6 F 7 1 0 2 1 5
#8 H 2 8 0 2 2 5
data
df <- structure(list(ind = c("A", "B", "C", "D", "E", "F", "G", "H",
"I", "J"), y_0101 = c(33L, 36L, 22L, 2L, 5L, 7L, 8L, 2L, 1L,
3L), y_0102 = c(6L, 5L, 4L, 2L, 2L, 1L, 6L, 8L, 3L, 2L), y_0103 = c(1L,
0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L), y_0104 = c(2L, 2L, NA, 2L,
2L, 2L, 2L, 2L, 2L, 2L), y_0311 = c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
2L, 1L, 1L), y_0312 = c(5L, 5L, 5L, 5L, 6L, 5L, 5L, 5L, 5L, 5L
)), class = "data.frame", row.names = c(NA, -10L))

Counting the common indices shared between columns of a dataframe containing only binary values in R

Suppose I have a dataframe containing binary values as:
A B C D
a 1 0 0 0
b 0 1 1 0
c 1 1 0 1
d 0 0 1 1
e 1 1 1 1
f 1 0 0 1
I'd like to count the number of common indices (rows where both columns are 1) shared between each pair of columns in the dataframe,
as shown below. What is the most efficient way to do so in R?
A B C D
A - 2 1 3
B 2 - 2 2
C 1 2 - 2
D 3 2 2 -
Any help is greatly appreciated. Thanks!
Maybe this is what you are after
> `diag<-`(crossprod(as.matrix(df)),NA)
A B C D
A NA 2 1 3
B 2 NA 2 2
C 1 2 NA 2
D 3 2 2 NA
Data
> dput(df)
structure(list(A = c(1L, 0L, 1L, 0L, 1L, 1L), B = c(0L, 1L, 1L,
0L, 1L, 0L), C = c(0L, 1L, 0L, 1L, 1L, 0L), D = c(0L, 0L, 1L,
1L, 1L, 1L)), class = "data.frame", row.names = c("a", "b", "c",
"d", "e", "f"))
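Why this works: crossprod(m) computes t(m) %*% m, so entry (i, j) is the sum over rows of m[, i] * m[, j], and for 0/1 data that product is 1 only where both columns are 1. A small sketch checking one pair by hand (m, counts and shared are names introduced for the example):

```r
# Data from the question
df <- data.frame(A = c(1, 0, 1, 0, 1, 1),
                 B = c(0, 1, 1, 0, 1, 0),
                 C = c(0, 1, 0, 1, 1, 0),
                 D = c(0, 0, 1, 1, 1, 1),
                 row.names = letters[1:6])

m <- as.matrix(df)

# Pairwise co-occurrence counts: t(m) %*% m
counts <- crossprod(m)

# Explicit count for one pair: rows where both A and D are 1 (rows c, e, f)
sum(df$A == 1 & df$D == 1)

# Blank out the diagonal, as in the answer above
shared <- counts
diag(shared) <- NA
shared
```

The diagonal of counts holds the number of 1s in each column, which is why the answer replaces it with NA.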

How to change value with condition dataframe in r?

I have a dataframe that looks something like this:
myData <- User X Y Similar
A 1 4 100
A 1 2 100
A 1 1 100
A 3 2 80
A 2 1 20
A 2 4 100
B 3 1 50
B 4 2 90
B 1 3 100
To something like this:
myData <- User X Y Similar
A 1 4 0
A 1 2 0
A 1 1 0
A 3 2 80
A 2 1 20
A 2 4 100
B 3 1 50
B 4 2 90
B 1 3 0
Question
I want to change the value in the Similar column to 0 under a condition: X equals 1 and Similar equals 100. How can I do that in R?
Thanks
We create a logical vector based on 'X' and 'Similar', then use it as an index to assign 0 to the matching values of 'Similar'
i1 <- with(myData, X ==1 & Similar == 100)
myData$Similar[i1] <- 0
-output
myData
# User X Y Similar
#1 A 1 4 0
#2 A 1 2 0
#3 A 1 1 0
#4 A 3 2 80
#5 A 2 1 20
#6 A 2 4 100
#7 B 3 1 50
#8 B 4 2 90
#9 B 1 3 0
data
myData <- structure(list(User = c("A", "A", "A", "A", "A", "A", "B", "B",
"B"), X = c(1L, 1L, 1L, 3L, 2L, 2L, 3L, 4L, 1L), Y = c(4L, 2L,
1L, 2L, 1L, 4L, 1L, 2L, 3L), Similar = c(100L, 100L, 100L, 80L,
20L, 100L, 50L, 90L, 100L)), class = "data.frame", row.names = c(NA,
-9L))
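The same conditional replacement can also be written in one step with replace(). This is just an alternative sketch of the indexing approach above, not a different technique; the data frame is retyped from the question.

```r
# Data from the question
myData <- data.frame(User    = c("A", "A", "A", "A", "A", "A", "B", "B", "B"),
                     X       = c(1, 1, 1, 3, 2, 2, 3, 4, 1),
                     Y       = c(4, 2, 1, 2, 1, 4, 1, 2, 3),
                     Similar = c(100, 100, 100, 80, 20, 100, 50, 90, 100))

# replace(x, condition, value) returns x with the selected elements set to value
myData$Similar <- replace(myData$Similar,
                          myData$X == 1 & myData$Similar == 100,
                          0)
myData$Similar
```

Row 6 keeps its value of 100 because X is 2 there, so only one of the two conditions holds.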

Refer to column name and row name within an apply statement in R

I have a dataframe in R which looks like the one below.
a b c d e f
0 1 1 0 0 0
1 1 1 1 0 1
0 0 0 1 0 1
1 0 0 1 0 1
1 1 1 0 0 0
The dataset is big, spanning over 100 columns and 5000 rows, and contains only binary values (0's and 1's). I want to compute the overlap between every pair of columns in R, something like the one given below. This overlap dataframe will be a square matrix whose number of rows and columns both equal the number of columns in the first dataframe.
a b c d e f
a 3 2 2 2 0 2
b 2 3 3 1 0 1
c 2 3 3 1 0 1
d 2 1 1 3 0 3
e 0 0 0 0 0 0
f 2 1 1 3 0 3
Each cell of the second dataframe is populated by the number of cases where both row and column have 1 in the first dataframe.
I'm thinking of constructing an empty matrix like this:
df <- matrix(ncol = ncol(data), nrow = ncol(data))
colnames(df) <- names(data)
rownames(df) <- names(data)
.. and iterating over each cell of this matrix using an apply command reading the corresponding row name (say, x) and column name (say, y) and running a function like the one below.
summation <- function (x,y) (return (sum(data$x * data$y)))
The problem is that I can't find out the row name and column name while inside an apply function. Any help will be appreciated.
Any approach more efficient than what I'm thinking of is more than welcome.
You are looking for crossprod
crossprod(as.matrix(df1))
# a b c d e f
#a 3 2 2 2 0 2
#b 2 3 3 1 0 1
#c 2 3 3 1 0 1
#d 2 1 1 3 0 3
#e 0 0 0 0 0 0
#f 2 1 1 3 0 3
data
df1 <- structure(list(a = c(0L, 1L, 0L, 1L, 1L), b = c(1L, 1L, 0L, 0L,
1L), c = c(1L, 1L, 0L, 0L, 1L), d = c(0L, 1L, 1L, 1L, 0L), e = c(0L,
0L, 0L, 0L, 0L), f = c(0L, 1L, 1L, 1L, 0L)), .Names = c("a",
"b", "c", "d", "e", "f"), class = "data.frame", row.names = c(NA,
-5L))
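For the literal question of how to get at the row and column names inside an apply-style call, one possibility is to iterate over the names themselves and index with [[ ]], since data$x looks for a column literally called "x". The sketch below does that with nested sapply() and checks the result against crossprod(); nms and overlap are names introduced for the example, and crossprod() remains the far more efficient choice for a large dataset.

```r
# Data from the answer above
df1 <- data.frame(a = c(0, 1, 0, 1, 1),
                  b = c(1, 1, 0, 0, 1),
                  c = c(1, 1, 0, 0, 1),
                  d = c(0, 1, 1, 1, 0),
                  e = c(0, 0, 0, 0, 0),
                  f = c(0, 1, 1, 1, 0))

nms <- names(df1)

# Outer loop supplies the column name y, inner loop the row name x;
# df1[[x]] looks the column up by the value of x
overlap <- sapply(nms, function(y) {
  sapply(nms, function(x) sum(df1[[x]] * df1[[y]]))
})

overlap

# Same numbers as the matrix product
all.equal(overlap, crossprod(as.matrix(df1)), check.attributes = FALSE)
```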

How to convert a non-square matrix to a square matrix with R?

I have some network data and am trying to analyze it. The problem is that it has some missing rows or columns. I want the rows and columns to match, so that it becomes a square matrix.
My data looks like this:
A B C D E
A 0 2 1 4 5
B 1 0 2 4 2
D 2 4 0 2 2
E 1 2 2 2 0
And I want to make it look like this:
A B C D E
A 0 2 1 4 5
B 1 0 2 4 2
C NA NA NA NA NA
D 2 4 0 2 2
E 1 2 2 2 0
As my data is huge, I cannot do it by hand. Is there any syntax to do it automatically?
One option is to create an NA matrix based on the unique column names and row names (assuming the result should carry the same names on both dimensions) and then fill it by matching the row names and column names of the original dataset
un1 <- unique(sort(c(colnames(m1), rownames(m1))))
m2 <- matrix(NA, length(un1), length(un1), dimnames = list(un1, un1))
m2[row.names(m1), colnames(m1)] <- m1
m2
# A B C D E
#A 0 2 1 4 5
#B 1 0 2 4 2
#C NA NA NA NA NA
#D 2 4 0 2 2
#E 1 2 2 2 0
data
m1 <- structure(c(0L, 1L, 2L, 1L, 2L, 0L, 4L, 2L, 1L, 2L, 0L, 2L, 4L,
4L, 2L, 2L, 5L, 2L, 2L, 0L), .Dim = 4:5, .Dimnames = list(c("A",
"B", "D", "E"), c("A", "B", "C", "D", "E")))
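The same steps can be wrapped in a small helper so that missing rows and missing columns are handled in one call. This is a sketch only: square_up is a hypothetical name, and the toy matrix below is made up for illustration, with rows B and D and column C absent.

```r
# Hypothetical helper: pad a named matrix with NA so rows and columns share one name set
square_up <- function(m) {
  nm <- sort(union(rownames(m), colnames(m)))
  out <- matrix(NA, length(nm), length(nm), dimnames = list(nm, nm))
  # Character indexing places each original cell at its named position
  out[rownames(m), colnames(m)] <- m
  out
}

# Toy input (assumption for the demo): 2 rows, 3 columns
m <- matrix(1:6, nrow = 2,
            dimnames = list(c("A", "C"), c("A", "B", "D")))

sq <- square_up(m)
sq
```

The helper relies on both dimensions having names; a matrix without dimnames would need them assigned first.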
