Difference Matrix in R

I would like to take the difference between two dataframes that are of different lengths and output a matrix in R.
x = data.frame(name=c('a','b','c','d','e'),length=c(5,6,7,8,9))
y = data.frame(name=c('r','t','v'),length=c(10,11,12))
> x
name length
1 a 5
2 b 6
3 c 7
4 d 8
5 e 9
> y
name length
1 r 10
2 t 11
3 v 12
The result I want is a matrix of differences: each length in y minus each length in x. I also want to keep the names as row and column labels. So something like this:
>
0 r t v
a 5 6 7
b 4 5 6
c 3 4 5
d 2 3 4
e 1 2 3
How can I approach this problem?

This is an outer operation:
outer(setNames(x$length, x$name), setNames(y$length, y$name), FUN=\(x,y) y-x)
# r t v
#a 5 6 7
#b 4 5 6
#c 3 4 5
#d 2 3 4
#e 1 2 3
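The `\(x, y)` shorthand in the call above needs R 4.1 or later. As a rough sketch for older versions, the same matrix can be built with the plain subtraction operator and a transpose. The names `xl`, `yl` and `res` are only illustrative, and `stringsAsFactors = FALSE` is added here as an assumption so that the names stay character vectors on older R.

```r
x <- data.frame(name = c("a", "b", "c", "d", "e"), length = c(5, 6, 7, 8, 9),
                stringsAsFactors = FALSE)
y <- data.frame(name = c("r", "t", "v"), length = c(10, 11, 12),
                stringsAsFactors = FALSE)

# Named vectors carry their names into the dimnames of the result
xl <- setNames(x$length, x$name)
yl <- setNames(y$length, y$name)

# outer(yl, xl, "-") computes y - x with y in rows; transpose to put x in rows
res <- t(outer(yl, xl, "-"))
res
```

This should give the same 5 x 3 matrix as the answer above, with the names of x as row names and the names of y as column names.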

Related

How to reverse a column in R

I have a dataframe as described below. I want to reverse the order of column B without disturbing the order of the other columns. Column B currently holds 5,4,3,2,1 and I want to change it to 1,2,3,4,5. I don't want to sort the dataframe, as that would change the overall row order.
A B C
1 5 6
2 4 8
3 3 5
4 2 5
5 1 3
You can replace just that column:
x$B <- rev(x$B)
On your data:
> x$B <- rev(x$B)
> x
A B C
1 1 1 6
2 2 2 8
3 3 3 5
4 4 4 5
5 5 5 3
transform is also handy for this:
> transform(x, B = rev(B))
A B C
1 1 1 6
2 2 2 8
3 3 3 5
4 4 4 5
5 5 5 3
This doesn't modify x so you need to assign the result to something (perhaps back to x).
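A small sketch of that last point, assuming the data frame from the question is rebuilt by hand (the constructor call and the names `reversed` and `b_before` are illustrative, not from the original post):

```r
# Hypothetical reconstruction of the data frame shown in the question
x <- data.frame(A = 1:5, B = 5:1, C = c(6, 8, 5, 5, 3))

# transform() returns a modified copy; x itself is left untouched
reversed <- transform(x, B = rev(B))
b_before <- x$B   # still 5 4 3 2 1

# Assign the result back to x to make the change stick
x <- transform(x, B = rev(B))
x$B               # now 1 2 3 4 5
```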

Returning 1st Largest and 2nd Largest numbers

df <- read.table(header = TRUE, text = "
A B C D E F G H
0 1 2 3 4 5 6 7
1 2 3 8 5 6 7 4")
I need to find the 1st and 2nd largest numbers in each row of the data frame above. The result should be as below.
A B C D E F G H 1st Largest 2nd Largest
0 1 2 3 4 5 6 7 7 6
1 2 3 8 5 6 7 4 8 7
We can loop through the rows using apply (with MARGIN=1), sort the elements of each row with the decreasing=TRUE option, and take the first two elements with head or just [1:2]. Because apply returns one column per row, we transpose the output before assigning it to two new columns in 'df'.
df[c("firstLargest", "SecondLargest")] <- t(apply(df, 1,
function(x) head(sort(x, decreasing=TRUE),2)))
df
# A B C D E F G H firstLargest SecondLargest
#1 0 1 2 3 4 5 6 7 7 6
#2 1 2 3 8 5 6 7 4 8 7
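One thing to watch: the call above sorts every column of df, so running it a second time would include the two new columns in the ranking. The sketch below restricts the computation to the original columns. The data frame is rebuilt by hand from the question, and `value_cols` and `top2` are illustrative names.

```r
# Hypothetical reconstruction of the data frame from the question
df <- data.frame(A = c(0, 1), B = c(1, 2), C = c(2, 3), D = c(3, 8),
                 E = c(4, 5), F = c(5, 6), G = c(6, 7), H = c(7, 4))

# Only rank the original value columns, so re-running stays safe
value_cols <- LETTERS[1:8]

# apply() yields a 2 x nrow matrix; t() turns it into one row per df row
top2 <- t(apply(df[value_cols], 1,
                function(r) sort(r, decreasing = TRUE)[1:2]))

df$firstLargest  <- top2[, 1]
df$SecondLargest <- top2[, 2]
df
```

Note that with ties, for example a row whose maximum occurs twice, both new columns would hold the same value.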

How to delete a dataframe's rows where 3 of the 5 elements are equal?

I have a dataframe with 5 columns and many rows, in which elements are repeated only in the first 3 columns. In short, it is a volume built from several volumes, so the same coordinates (x,y,z) appear with different labels, and I would like to eliminate the repeated coordinates.
How can I eliminate these with R commands?
You can use the duplicated function, e.g.:
# create an example data.frame
Lab1<-letters[1:10]
Lab2<-LETTERS[1:10]
x <- c(3,4,3,3,4,2,4,3,9,0)
y <- c(3,4,3,5,4,2,1,5,7,2)
z <- c(8,7,8,8,4,3,1,8,6,3)
DF <- data.frame(Lab1,Lab2,x,y,z)
> DF
Lab1 Lab2 x y z
1 a A 3 3 8
2 b B 4 4 7
3 c C 3 3 8
4 d D 3 5 8
5 e E 4 4 4
6 f F 2 2 3
7 g G 4 1 1
8 h H 3 5 8
9 i I 9 7 6
10 j J 0 2 3
# remove rows having repeated x,y,z
DF2 <- DF[!duplicated(DF[,c('x','y','z')]),]
> DF2
Lab1 Lab2 x y z
1 a A 3 3 8
2 b B 4 4 7
4 d D 3 5 8
5 e E 4 4 4
6 f F 2 2 3
7 g G 4 1 1
9 i I 9 7 6
10 j J 0 2 3
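By default duplicated keeps the first row of each repeated coordinate. If the last occurrence should survive instead, a minimal sketch is to pass `fromLast = TRUE` (a documented argument of `duplicated`). The names `coords` and `DF_last` are illustrative.

```r
# Same example data as above, rebuilt so the snippet runs on its own
DF <- data.frame(Lab1 = letters[1:10], Lab2 = LETTERS[1:10],
                 x = c(3, 4, 3, 3, 4, 2, 4, 3, 9, 0),
                 y = c(3, 4, 3, 5, 4, 2, 1, 5, 7, 2),
                 z = c(8, 7, 8, 8, 4, 3, 1, 8, 6, 3))

coords <- c("x", "y", "z")

# fromLast = TRUE flags the earlier copies, so the last one is kept
DF_last <- DF[!duplicated(DF[, coords], fromLast = TRUE), ]
DF_last
```

With this data, rows 1 and 4 are dropped in favour of rows 3 and 8.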
EDIT :
To allow choosing among the rows that share the same coordinates, you can use, for example, the by function (even though it is less efficient than the previous approach):
res <- by(DF,
INDICES=paste(DF$x,DF$y,DF$z,sep='|'),
FUN=function(equalRows){
# equalRows is a data.frame with the rows having the same x,y,z
# for example, here we choose the first row after ordering by Lab1, then Lab2
row <- equalRows[order(equalRows$Lab1,equalRows$Lab2),][1,]
return(row)
})
DF2 <- do.call(rbind.data.frame,res)
> DF2
Lab1 Lab2 x y z
0|2|3 j J 0 2 3
2|2|3 f F 2 2 3
3|3|8 a A 3 3 8
3|5|8 d D 3 5 8
4|1|1 g G 4 1 1
4|4|4 e E 4 4 4
4|4|7 b B 4 4 7
9|7|6 i I 9 7 6
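When the rule for choosing a row is just "first after sorting", as in the example above, a possible alternative is to sort the whole data frame once and then reuse duplicated. This is a sketch of a different technique from the by approach, not a drop-in replacement: the resulting rows come out ordered by label rather than by coordinate key. `DF_sorted` and `DF3` are illustrative names.

```r
# Same example data as above, rebuilt so the snippet runs on its own
DF <- data.frame(Lab1 = letters[1:10], Lab2 = LETTERS[1:10],
                 x = c(3, 4, 3, 3, 4, 2, 4, 3, 9, 0),
                 y = c(3, 4, 3, 5, 4, 2, 1, 5, 7, 2),
                 z = c(8, 7, 8, 8, 4, 3, 1, 8, 6, 3))

# Put the preferred row of each coordinate group first ...
DF_sorted <- DF[order(DF$Lab1, DF$Lab2), ]

# ... then keep only the first row seen for each (x, y, z)
DF3 <- DF_sorted[!duplicated(DF_sorted[, c("x", "y", "z")]), ]
DF3
```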

How to group consecutive columns of a dataframe using split function of R

Here's a short version of my large dataframe
>k
a b c d e f
1 3 4 5 7 8
2 1 7 9 0 3
3 2 2 5 6 9
I want to split it so that I get separate dataframes for columns a, b, c and for columns d, e, f, like this
>k
$`1`
a b c
1 3 4
2 1 7
3 2 2
$`2`
d e f
5 7 8
9 0 3
5 6 9
I tried something like this -
range = seq(3,6,3)
k<-split(k, cut(colnames(k), range))
But it doesn't work since colnames(k) has to be numeric. Any other simple idea?
Something like this?
group <- rep(1:2, each=3)
lapply(unique(group), FUN=function(n) k[group==n])
# [[1]]
# a b c
# 1 1 3 4
# 2 2 1 7
# 3 3 2 2
#
# [[2]]
# d e f
# 1 5 7 8
# 2 9 0 3
# 3 5 6 9
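Since the question asks about split specifically, here is a sketch using `split.default`, which treats the data frame as a list of columns and groups them by the factor. The data frame is rebuilt by hand from the question, and `parts` is an illustrative name.

```r
# Hypothetical reconstruction of the data frame from the question
k <- data.frame(a = 1:3, b = c(3, 1, 2), c = c(4, 7, 2),
                d = c(5, 9, 5), e = c(7, 0, 6), f = c(8, 3, 9))

# One group label per column: first three columns -> 1, last three -> 2
group <- rep(1:2, each = 3)

# split.default() splits the columns (not the rows) of the data frame
parts <- split.default(k, group)

names(parts)         # "1" "2"
names(parts[["2"]])  # "d" "e" "f"
```

Calling `split(k, group)` directly would dispatch to the data frame method and split rows instead, which is why `split.default` is named explicitly here.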

Arrange a matrix with regard to the rownames and colnames

Imagine we have an M*N matrix, with M rows and N columns, like
b a d c e
a 2 1 4 3 5
b 3 2 5 4 6
c 1 3 3 2 4
I want to write a function that takes the above matrix and returns the following matrix:
a b c d e
a 1 2 3 4 5
b 2 3 4 5 6
c 3 1 2 3 4
Here the first M columns (3 in this case) are ordered to match the row names, so the leading M*M block (3*3) is symmetric in terms of rownames and colnames. The remaining columns (the 3*2 block) are pushed afterwards, for 3*5 in total.
For an N x M matrix where N <= M and all row names are contained in col names, this will bring the columns with names existing in row names to the front in the same order as the row names, and leave the rest of the columns in their original order after that:
mat_ord <- function(mx) mx[, c(rownames(mx), setdiff(colnames(mx), rownames(mx)))]
mat_ord(mx)
produces:
a b c d e
a 1 2 3 4 5
b 2 3 4 5 6
c 3 1 2 3 4
To see the difference, consider mx2 which has rows and columns ordered differently than mx:
e a b d c
b 6 2 3 5 4
a 5 1 2 4 3
c 4 3 1 3 2
And with mat_ord(mx2) we get:
b a c e d
b 3 2 4 6 5
a 2 1 3 5 4
c 1 3 2 4 3
UPDATE: this sorts rows and columns while ensuring symmetry on first N cols/rows:
mat_ord2 <- function(mx) mx[sort(rownames(mx)), c(sort(rownames(mx)), sort(setdiff(colnames(mx), rownames(mx))))]
mat_ord2(mx2)
produces:
a b c d e
a 1 2 3 4 5
b 2 3 4 5 6
c 3 1 2 3 4
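To try this end to end, the sketch below builds mx2 from the values printed above and applies an expanded version of mat_ord2. The `matrix()` call is a reconstruction, `rn` and `extra` are illustrative names, and `drop = FALSE` is an addition (not in the original) so that a one-row input would still come back as a matrix.

```r
# Reconstruction of mx2 from the printed values above
mx2 <- matrix(c(6, 2, 3, 5, 4,
                5, 1, 2, 4, 3,
                4, 3, 1, 3, 2),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("b", "a", "c"),
                              c("e", "a", "b", "d", "c")))

mat_ord2 <- function(mx) {
  rn <- sort(rownames(mx))                  # sorted row names
  extra <- sort(setdiff(colnames(mx), rn))  # columns with no matching row
  mx[rn, c(rn, extra), drop = FALSE]        # assumes every row name is a column name
}

res <- mat_ord2(mx2)
res
```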
