Arrange a matrix with regard to the rownames and colnames - r

Image we have a matrix, M*N, M rows and N columns, like
b a d c e
a 2 1 4 3 5
b 3 2 5 4 6
c 1 3 3 2 4
I want to write a function, where take the above matrix, return the following matrix:
a b c d e
a 1 2 3 4 5
b 2 3 4 5 6
c 3 1 2 3 4
Where the first part of the matrix M*M, 3*3 in this case is symmetric in terms of rownames and colnames, and 3*5 in total, the rest 3*2 matrix is pushed afterwards.

For an N x M matrix where N <= M and all row names are contained in col names, this will bring the columns with names existing in row names to the front in the same order as the row names, and leave the rest of the columns in their original order after that:
mat_ord <- function(mx) mx[, c(rownames(mx), setdiff(colnames(mx), rownames(mx)))]
mat_ord(mx)
produces:
a b c d e
a 1 2 3 4 5
b 2 3 4 5 6
c 3 1 2 3 4
To see the difference, consider mx2 which has rows and columns ordered differently than mx:
e a b d c
b 6 2 3 5 4
a 5 1 2 4 3
c 4 3 1 3 2
And with mat_ord(mx2) we get:
b a c e d
b 3 2 4 6 5
a 2 1 3 5 4
c 1 3 2 4 3
UPDATE: this sorts rows and columns while ensuring symmetry on first N cols/rows:
mat_ord2 <- function(mx) mx[sort(rownames(mx)), c(sort(rownames(mx)), sort(setdiff(colnames(mx), rownames(mx))))]
mat_ord2(mx2)
produces:
a b c d e
a 1 2 3 4 5
b 2 3 4 5 6
c 3 1 2 3 4

Related

Difference Matrix in R

I would like to take the difference between two dataframes that are of different lengths and output a matrix in R.
x = data.frame(name=c('a','b','c','d','e'),length=c(5,6,7,8,9))
y = data.frame(name=c('r','t','v'),length=c(10,11,12))
> x
name length
1 a 5
2 b 6
3 c 7
4 d 8
5 e 9
> y
name length
1 r 10
2 t 11
3 v 12
The result I want is the difference in a matrix. Length of y minus length of x. I also want to keep the names consistent. So something like this:
>
0 r t v
a 5 6 7
b 4 5 6
c 3 4 5
d 2 3 4
e 1 2 3
How can I approach this problem?
This is an outer operation:
outer(setNames(x$length, x$name), setNames(y$length, y$name), FUN=\(x,y) y-x)
# r t v
#a 5 6 7
#b 4 5 6
#c 3 4 5
#d 2 3 4
#e 1 2 3

How do I group_by if the column that I want to summarize with has all the same values

x l
1 1 a
2 3 b
3 2 c
4 3 b
5 2 c
6 4 d
7 5 f
8 2 c
9 1 a
10 1 a
11 3 b
12 4 d
The above is the input.
The below is the output.
x l
1 1 a
2 3 b
3 2 c
4 4 d
5 5 f
I know that column l will have the same value for each group_by(x).
l is a string
# Creation of dataset
x <- c(1,3,2,3,2,4,5,2,1,1,3,4)
l<- c("a","b","c","b","c","d","f","c","a","a","b","d")
df <- data.frame(x,l)
# Simply call unique function on your dataframe
dfu <- unique(df)

Create a new variable which count length of duplicate in R

I have a data frame,I want to create a variable z,count duplicate of "y variable", if y have 1,1 set z = 2,2, if y have 3,3,3, set z = 3,3,3.
x = c("a","b","c","d","e","a","b","c","d","e","a","b","c")
y = c(1,1,2,2,2,3,3,4,4,4,5,5,5)
data <- data.frame(x,y)
data
x y z
1 a 1 2
2 b 1 2
3 c 2 3
4 d 2 3
5 e 2 3
6 a 3 2
7 b 3 2
8 c 4 3
9 d 4 3
10 e 4 3
11 a 5 3
12 b 5 3
13 c 5 3
Thanks for your help.
You can try the rle:
data$z <- with(data, unlist(mapply(rep, rle(y)$lengths, rle(y)$lengths)))
data
x y z
1 a 1 2
2 b 1 2
3 c 2 3
4 d 2 3
5 e 2 3
6 a 3 2
7 b 3 2
8 c 4 3
9 d 4 3
10 e 4 3
11 a 5 3
12 b 5 3
13 c 5 3
If your your variable y is sorted as an increasing sequence as you say, then the following solution will work:
# calculate counts of each level
counts <- table(data$y)
# fill in z
data$z <- counts[match(data$y, names(counts))]
Note, however, that this method will fail if y is not ordered and, since you want to restart the count when a different level occurs. For these purposes, #psidom's solution is more robust to mis-ordered data as rle will reset the count.
This method calculates the total occurrences of a level and then feeds these total counts to the proper location using match.
Here is a quick method using dplyr, and its rather intuitive syntax:
library(dplyr)
left_join(data, data %>%
group_by(y) %>%
summarize(z = n()),
by = "y")
x y z
1 a 1 2
2 b 1 2
3 c 2 3
4 d 2 3
5 e 2 3
6 a 3 2
7 b 3 2
8 c 4 3
9 d 4 3
10 e 4 3
11 a 5 3
12 b 5 3
13 c 5 3
We can do this easily with data.table
library(data.table)
setDT(data)[, z := .N , rleid(y)]
data
# x y z
# 1: a 1 2
# 2: b 1 2
# 3: c 2 3
# 4: d 2 3
# 5: e 2 3
# 6: a 3 2
# 7: b 3 2
# 8: c 4 3
# 9: d 4 3
#10: e 4 3
#11: a 5 3
#12: b 5 3
#13: c 5 3
Or using rle from base R without any loops
inverse.rle(within.list(rle(data$y), values <- lengths))
#[1] 2 2 3 3 3 2 2 3 3 3 3 3 3
Or another base R method with ave
with(data, ave(y, cumsum(c(TRUE, y[-1]!= y[-length(y)])), FUN=length))
#[1] 2 2 3 3 3 2 2 3 3 3 3 3 3

How to reverse a column in R

I have a dataframe as described below. Now I want to reverse the order of column B without hampering the total order of the dataframe. So now the column B has 5,4,3,2,1. I want to change it to 1,2,3,4,5. I don't want to sort as it will hamper the total ordering.
A B C
1 5 6
2 4 8
3 3 5
4 2 5
5 1 3
You can replace just that column:
x$B <- rev(x$B)
On your data:
> x$B <- rev(x$B)
> x
A B C
1 1 1 6
2 2 2 8
3 3 3 5
4 4 4 5
5 5 5 3
transform is also handy for this:
> transform(x, B = rev(B))
A B C
1 1 1 6
2 2 2 8
3 3 3 5
4 4 4 5
5 5 5 3
This doesn't modify x so you need to assign the result to something (perhaps back to x).

Returning 1st Largest and 2nd Largest numbers

df <- A B C D E F G H
0 1 2 3 4 5 6 7
1 2 3 8 5 6 7 4
Need to find the 1st and 2nd largest number in the above given data frame . Result should be as below .
A B C D E F G H 1st Largest 2nd Largest
0 1 2 3 4 5 6 7 7 6
1 2 3 8 5 6 7 4 8 7
We can loop through the rows using apply (with MARGIN=1), sort the elements with decreasing=TRUE option, and get the first two elements with head or just [1:2], transpose the output and assign it to create two new columns in 'df'.
df[c("firstLargest", "SecondLargest")] <- t(apply(df, 1,
function(x) head(sort(x, decreasing=TRUE),2)))
df
# A B C D E F G H firstLargest SecondLargest
#1 0 1 2 3 4 5 6 7 7 6
#2 1 2 3 8 5 6 7 4 8 7

Resources