df <- A B C D E F G H
0 1 2 3 4 5 6 7
1 2 3 8 5 6 7 4
I need to find the largest and second-largest number in each row of the data frame above. The result should look like this:
A B C D E F G H 1st Largest 2nd Largest
0 1 2 3 4 5 6 7 7 6
1 2 3 8 5 6 7 4 8 7
We can loop through the rows using apply (with MARGIN=1), sort the elements of each row with the decreasing=TRUE option, and take the first two elements with head or just [1:2]. We then transpose the output and assign it to two new columns in 'df'.
df[c("firstLargest", "SecondLargest")] <- t(apply(df, 1,
function(x) head(sort(x, decreasing=TRUE),2)))
df
# A B C D E F G H firstLargest SecondLargest
#1 0 1 2 3 4 5 6 7 7 6
#2 1 2 3 8 5 6 7 4 8 7
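For a self-contained check, the data frame from the question can be rebuilt by hand and the same row-wise sort run against it. This is only a minimal sketch: the name top2 is introduced here for illustration, and the column values are copied from the printed example.

```r
# Rebuild the example data frame from the question
df <- data.frame(A = c(0, 1), B = c(1, 2), C = c(2, 3), D = c(3, 8),
                 E = c(4, 5), F = c(5, 6), G = c(6, 7), H = c(7, 4))

# Sort each row in decreasing order and keep the first two values.
# apply() returns one column per input row, so transpose with t().
top2 <- t(apply(df, 1, function(x) sort(x, decreasing = TRUE)[1:2]))

# unname() drops the leftover element names before the assignment
df[c("firstLargest", "SecondLargest")] <- unname(top2)
df
```

Computing top2 before adding the new columns matters if the code is re-run: once firstLargest and SecondLargest exist in df, a second apply over the whole data frame would include them in the sort.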
I would like to take the difference between two dataframes that are of different lengths and output a matrix in R.
x = data.frame(name=c('a','b','c','d','e'),length=c(5,6,7,8,9))
y = data.frame(name=c('r','t','v'),length=c(10,11,12))
> x
name length
1 a 5
2 b 6
3 c 7
4 d 8
5 e 9
> y
name length
1 r 10
2 t 11
3 v 12
The result I want is a matrix of differences: each length in y minus each length in x. I also want to keep the names as row and column labels. So something like this:
0 r t v
a 5 6 7
b 4 5 6
c 3 4 5
d 2 3 4
e 1 2 3
How can I approach this problem?
This is an outer operation:
outer(setNames(x$length, x$name), setNames(y$length, y$name), FUN=\(x,y) y-x)
# r t v
#a 5 6 7
#b 4 5 6
#c 3 4 5
#d 2 3 4
#e 1 2 3
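The `\(x,y)` shorthand for anonymous functions requires R 4.1.0 or later. As a sketch for older versions, the same call can be written with function(); the argument names xl and yl are illustrative and chosen only to avoid shadowing the data frames x and y.

```r
x <- data.frame(name = c("a", "b", "c", "d", "e"), length = c(5, 6, 7, 8, 9))
y <- data.frame(name = c("r", "t", "v"), length = c(10, 11, 12))

# outer() evaluates FUN on every (x element, y element) pair;
# the names set by setNames() become the row and column names
diff_mat <- outer(setNames(x$length, x$name),
                  setNames(y$length, y$name),
                  FUN = function(xl, yl) yl - xl)
diff_mat
```

Because the names travel with the vectors, no separate dimnames step is needed afterwards.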
x l
1 1 a
2 3 b
3 2 c
4 3 b
5 2 c
6 4 d
7 5 f
8 2 c
9 1 a
10 1 a
11 3 b
12 4 d
The above is the input.
The below is the output.
x l
1 1 a
2 3 b
3 2 c
4 4 d
5 5 f
I know that column l has the same value within each group of x.
l is a string column.
# Creation of dataset
x <- c(1,3,2,3,2,4,5,2,1,1,3,4)
l<- c("a","b","c","b","c","d","f","c","a","a","b","d")
df <- data.frame(x,l)
# Simply call unique function on your dataframe
dfu <- unique(df)
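One detail worth checking: unique() keeps the original row names of the rows it retains, so the result is numbered 1, 2, 3, 6, 7 rather than 1 to 5 as in the desired output. A minimal sketch that also resets the row names, using the same data as above:

```r
x <- c(1, 3, 2, 3, 2, 4, 5, 2, 1, 1, 3, 4)
l <- c("a", "b", "c", "b", "c", "d", "f", "c", "a", "a", "b", "d")
df <- data.frame(x, l)

dfu <- unique(df)      # keeps the first occurrence of each distinct row
rownames(dfu) <- NULL  # renumber the rows 1..5
dfu
```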
I have a data frame with 5 columns and a great many rows, in which values repeat only in the first 3 columns. In short, it is a volume built from several volumes, so the same coordinates (x,y,z) appear with different labels, and I would like to eliminate the repeated coordinates.
How can I eliminate these with R commands?
Thanks
AV
You can use the duplicated function, e.g.:
# create an example data.frame
Lab1<-letters[1:10]
Lab2<-LETTERS[1:10]
x <- c(3,4,3,3,4,2,4,3,9,0)
y <- c(3,4,3,5,4,2,1,5,7,2)
z <- c(8,7,8,8,4,3,1,8,6,3)
DF <- data.frame(Lab1,Lab2,x,y,z)
> DF
Lab1 Lab2 x y z
1 a A 3 3 8
2 b B 4 4 7
3 c C 3 3 8
4 d D 3 5 8
5 e E 4 4 4
6 f F 2 2 3
7 g G 4 1 1
8 h H 3 5 8
9 i I 9 7 6
10 j J 0 2 3
# remove rows having repeated x,y,z
DF2 <- DF[!duplicated(DF[,c('x','y','z')]),]
> DF2
Lab1 Lab2 x y z
1 a A 3 3 8
2 b B 4 4 7
4 d D 3 5 8
5 e E 4 4 4
6 f F 2 2 3
7 g G 4 1 1
9 i I 9 7 6
10 j J 0 2 3
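duplicated marks every occurrence after the first, so the approach above keeps the first row of each set of repeated coordinates. If the last row should be kept instead, a sketch using the fromLast argument of duplicated could look like this (DF_last is a name introduced here for illustration):

```r
Lab1 <- letters[1:10]
Lab2 <- LETTERS[1:10]
x <- c(3, 4, 3, 3, 4, 2, 4, 3, 9, 0)
y <- c(3, 4, 3, 5, 4, 2, 1, 5, 7, 2)
z <- c(8, 7, 8, 8, 4, 3, 1, 8, 6, 3)
DF <- data.frame(Lab1, Lab2, x, y, z)

# fromLast = TRUE scans from the bottom, so within each set of
# repeated coordinates the last row is the one that is kept
DF_last <- DF[!duplicated(DF[, c("x", "y", "z")], fromLast = TRUE), ]
DF_last
```

With this data, rows 1 and 4 are dropped in favour of rows 3 and 8.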
EDIT:
To allow choosing among the rows that have the same coordinates, you can use, for example, the by function (even though it is less efficient than the previous approach):
res <- by(DF,
          INDICES=paste(DF$x,DF$y,DF$z,sep='|'),
          FUN=function(equalRows){
            # equalRows is a data.frame with the rows having the same x,y,z
            # for example, here we choose the first row after ordering by Lab1, then Lab2
            row <- equalRows[order(equalRows$Lab1,equalRows$Lab2),][1,]
            return(row)
          })
DF2 <- do.call(rbind.data.frame,res)
> DF2
Lab1 Lab2 x y z
0|2|3 j J 0 2 3
2|2|3 f F 2 2 3
3|3|8 a A 3 3 8
3|5|8 d D 3 5 8
4|1|1 g G 4 1 1
4|4|4 e E 4 4 4
4|4|7 b B 4 4 7
9|7|6 i I 9 7 6
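When the choice among equal rows can be expressed as a sort order, another option is to sort first and then apply duplicated, which avoids the per-group function call. This is a sketch rather than a drop-in replacement for by: it assumes the preferred row is the one with the largest Lab1, and the names sorted and kept are illustrative.

```r
Lab1 <- letters[1:10]
Lab2 <- LETTERS[1:10]
x <- c(3, 4, 3, 3, 4, 2, 4, 3, 9, 0)
y <- c(3, 4, 3, 5, 4, 2, 1, 5, 7, 2)
z <- c(8, 7, 8, 8, 4, 3, 1, 8, 6, 3)
DF <- data.frame(Lab1, Lab2, x, y, z)

# Put the preferred row of each group first (here: largest Lab1)
sorted <- DF[order(DF$Lab1, decreasing = TRUE), ]

# duplicated() then keeps that first row for each x,y,z combination
kept <- sorted[!duplicated(sorted[, c("x", "y", "z")]), ]
kept
```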
Here's a short version of my large dataframe
>k
a b c d e f
1 3 4 5 7 8
2 1 7 9 0 3
3 2 2 5 6 9
I want to split it into separate data frames, one with columns a, b and c and one with columns d, e and f, like this:
>k
$`1`
a b c
1 3 4
2 1 7
3 2 2
$`2`
d e f
5 7 8
9 0 3
5 6 9
I tried something like this -
range = seq(3,6,3)
k<-split(k, cut(colnames(k), range))
But it doesn't work since colnames(k) has to be numeric. Any other simple idea?
Something like this?
group <- rep(1:2, each=3)
lapply(unique(group), FUN=function(n) k[group==n])
# [[1]]
# a b c
# 1 1 3 4
# 2 2 1 7
# 3 3 2 2
#
# [[2]]
# d e f
# 1 5 7 8
# 2 9 0 3
# 3 5 6 9
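As an alternative sketch, split.default treats a data frame as a list of columns, so it can split the columns by a grouping vector directly and returns a list named after the groups, close to the `$`1`` / `$`2`` layout shown in the question. The data below is rebuilt from the printed example; parts is a name introduced here.

```r
k <- data.frame(a = c(1, 2, 3), b = c(3, 1, 2), c = c(4, 7, 2),
                d = c(5, 9, 5), e = c(7, 0, 6), f = c(8, 3, 9))

# One group label per column: a, b, c -> 1 and d, e, f -> 2
group <- rep(1:2, each = 3)

# split.default() splits the columns (not the rows) of the data frame
parts <- split.default(k, group)
parts
```

Calling split(k, group) instead would dispatch to the data frame method and try to split rows, which is why split.default is named explicitly.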
Imagine we have an M*N matrix (M rows and N columns), like
b a d c e
a 2 1 4 3 5
b 3 2 5 4 6
c 1 3 3 2 4
I want to write a function that takes the above matrix and returns the following matrix:
a b c d e
a 1 2 3 4 5
b 2 3 4 5 6
c 3 1 2 3 4
Here the first M*M part of the matrix (3*3 in this case) is symmetric in terms of rownames and colnames, and the remaining 3*2 part is placed after it, giving 3*5 in total.
For an N x M matrix where N <= M and all row names are contained in col names, this will bring the columns with names existing in row names to the front in the same order as the row names, and leave the rest of the columns in their original order after that:
mat_ord <- function(mx) mx[, c(rownames(mx), setdiff(colnames(mx), rownames(mx)))]
mat_ord(mx)
produces:
a b c d e
a 1 2 3 4 5
b 2 3 4 5 6
c 3 1 2 3 4
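The matrix mx is never constructed in code above, so here is a sketch that builds it from the values printed in the question and applies mat_ord. The construction with matrix() and dimnames is an assumption about how the data was stored.

```r
# The example matrix from the question, filled row by row
mx <- matrix(c(2, 1, 4, 3, 5,
               3, 2, 5, 4, 6,
               1, 3, 3, 2, 4),
             nrow = 3, byrow = TRUE,
             dimnames = list(c("a", "b", "c"), c("b", "a", "d", "c", "e")))

# Columns named like the rows come first, in row order; the rest follow
mat_ord <- function(mx) mx[, c(rownames(mx), setdiff(colnames(mx), rownames(mx)))]

res <- mat_ord(mx)
res
```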
To see the difference, consider mx2 which has rows and columns ordered differently than mx:
e a b d c
b 6 2 3 5 4
a 5 1 2 4 3
c 4 3 1 3 2
And with mat_ord(mx2) we get:
b a c e d
b 3 2 4 6 5
a 2 1 3 5 4
c 1 3 2 4 3
UPDATE: this sorts rows and columns while ensuring symmetry on first N cols/rows:
mat_ord2 <- function(mx) mx[sort(rownames(mx)), c(sort(rownames(mx)), sort(setdiff(colnames(mx), rownames(mx))))]
mat_ord2(mx2)
produces:
a b c d e
a 1 2 3 4 5
b 2 3 4 5 6
c 3 1 2 3 4
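Both functions rely on every row name also being a column name; if one is missing, the indexing fails with a "subscript out of bounds" error. A hedged variant that checks this up front might look like the following, where mat_ord2_checked is a hypothetical name and mx2 is rebuilt from the values printed above.

```r
mx2 <- matrix(c(6, 2, 3, 5, 4,
                5, 1, 2, 4, 3,
                4, 3, 1, 3, 2),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("b", "a", "c"), c("e", "a", "b", "d", "c")))

mat_ord2_checked <- function(mx) {
  rn <- rownames(mx)
  cn <- colnames(mx)
  # Fail early if names are missing or a row name has no matching column
  stopifnot(!is.null(rn), !is.null(cn), all(rn %in% cn))
  # drop = FALSE keeps the result a matrix even for a single row
  mx[sort(rn), c(sort(rn), sort(setdiff(cn, rn))), drop = FALSE]
}

res2 <- mat_ord2_checked(mx2)
res2
```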