How to order column of data.frame by another variable

mydata <- data.frame(id = c(rep(1, 3), rep(2, 3), rep(3, 3)),
score = c(c(1, 2, 3), c(3, 2, 1), c(1, 3, 2)),
location = c(rep(c("X", "Y", "Z"), 3)))
> mydata
id score location
1 1 1 X
2 1 2 Y
3 1 3 Z
4 2 3 X
5 2 2 Y
6 2 1 Z
7 3 1 X
8 3 3 Y
9 3 2 Z
I would like to sort my data.frame according to score from smallest to largest for each id.
Simply ordering by score ignores the id column.
> mydata[with(mydata, order(score)),]
id score location
1 1 1 X
6 2 1 Z
7 3 1 X
2 1 2 Y
5 2 2 Y
9 3 2 Z
3 1 3 Z
4 2 3 X
8 3 3 Y
Essentially, I want my output to be
id score location
1 1 1 X
2 1 2 Y
3 1 3 Z
4 2 1 Z
5 2 2 Y
6 2 3 X
7 3 1 X
8 3 2 Z
9 3 3 Y

Using base R only.
mydata[order(mydata$id, mydata$score), ]
id score location
1 1 1 X
2 1 2 Y
3 1 3 Z
6 2 1 Z
5 2 2 Y
4 2 3 X
7 3 1 X
9 3 2 Z
8 3 3 Y
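As a rough sketch of the same base R idea (the names sorted and sorted_desc are made up here for illustration): order() accepts several keys and uses the second key to break ties in the first. Resetting the row names afterwards gives the 1 to 9 numbering shown in the question's desired output, and negating a numeric key sorts that key in descending order.

```r
# Data from the question
mydata <- data.frame(id = c(rep(1, 3), rep(2, 3), rep(3, 3)),
                     score = c(c(1, 2, 3), c(3, 2, 1), c(1, 3, 2)),
                     location = c(rep(c("X", "Y", "Z"), 3)))

# Sort by id first, then by score within each id
sorted <- mydata[order(mydata$id, mydata$score), ]

# Drop the original row names so the rows are numbered 1..n again
rownames(sorted) <- NULL

# Variation: score from largest to smallest within each id
sorted_desc <- mydata[order(mydata$id, -mydata$score), ]

print(sorted)
print(sorted_desc)
```

The minus sign only works for numeric keys; for character keys, order() has a decreasing argument instead.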

You can use the dplyr package:
library(dplyr)
mydata %>% arrange(id,score)
# id score location
# 1 1 1 X
# 2 1 2 Y
# 3 1 3 Z
# 4 2 1 Z
# 5 2 2 Y
# 6 2 3 X
# 7 3 1 X
# 8 3 2 Z
# 9 3 3 Y

Creating list of all pairwise comparisons within data frame in R

From a data frame in R that has X and Y coordinates (see example), I would like to add two columns (final_X and final_Y) to show all possible pairwise comparisons between the two.
dt = data.frame(X = seq(1, 5, by=1), Y = seq(1, 5, by=1))
The final goal is a data frame with one row for every possible combination of X, Y, final_X and final_Y.
You can use expand.grid:
eg <- expand.grid(final_Y = 1:5, Y = 1:5, final_X = 1:5, X = 1:5)[,c(4,2,3,1)]
head(eg, n=20)
# X Y final_X final_Y
# 1 1 1 1 1
# 2 1 1 1 2
# 3 1 1 1 3
# 4 1 1 1 4
# 5 1 1 1 5
# 6 1 2 1 1
# 7 1 2 1 2
# 8 1 2 1 3
# 9 1 2 1 4
# 10 1 2 1 5
# 11 1 3 1 1
# 12 1 3 1 2
# 13 1 3 1 3
# 14 1 3 1 4
# 15 1 3 1 5
# 16 1 4 1 1
# 17 1 4 1 2
# 18 1 4 1 3
# 19 1 4 1 4
# 20 1 4 1 5
nrow(eg)
# [1] 625
I defined the columns out of order and reordered them simply to match the ordering of your expected output. One could easily do expand.grid(X=,Y=,final_X=,final_Y=) and leave off the [,c(...)] and the effective results would be the same but in a different row-order.
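To illustrate the remark about row order, here is a small sketch (the names eg2, rows_eg, rows_eg2, same_rows and same_order are made up for this example). It builds the grid both ways and checks that the two results hold the same combinations, only in a different row order.

```r
# Columns defined out of order and rearranged, as in the answer
eg <- expand.grid(final_Y = 1:5, Y = 1:5, final_X = 1:5, X = 1:5)[, c(4, 2, 3, 1)]

# Columns defined in their natural order, no rearranging
eg2 <- expand.grid(X = 1:5, Y = 1:5, final_X = 1:5, final_Y = 1:5)

# Collapse each row into one string so rows can be compared as a set
rows_eg  <- do.call(paste, eg)
rows_eg2 <- do.call(paste, eg2)

same_rows  <- setequal(rows_eg, rows_eg2)   # TRUE: same combinations
same_order <- identical(rows_eg, rows_eg2)  # FALSE: the row order differs

print(c(same_rows = same_rows, same_order = same_order))
```

expand.grid varies its first argument fastest, which is why the order of the arguments decides the row order.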

R: how to obtain unique pairwise combinations of 2 vectors

x = 1:3
y = 1:3
> expand.grid(x = 1:3, y = 1:3)
x y
1 1 1
2 2 1
3 3 1
4 1 2
5 2 2
6 3 2
7 1 3
8 2 3
9 3 3
Using expand.grid gives me all of the combinations. However, I want only pairwise comparisons, that is, I don't want a comparison of 1 vs 1, 2 vs 2, or 3 vs 3. Moreover, I want to keep only the unique pairs, i.e., I want to keep 1 vs 2 (and not 2 vs 1).
In summary, for the above x and y, I want the following 3 pairwise combinations:
x y
1 1 2
2 1 3
3 2 3
Similarly, for x = y = 1:4, I want the following pairwise combinations:
x y
1 1 2
2 1 3
3 1 4
4 2 3
5 2 4
6 3 4
We can use combn
f1 <- function(x) setNames(as.data.frame(t(combn(x, 2))), c("x", "y"))
f1(1:3)
# x y
#1 1 2
#2 1 3
#3 2 3
f1(1:4)
# x y
#1 1 2
#2 1 3
#3 1 4
#4 2 3
#5 2 4
#6 3 4
Using data.table,
library(data.table)
x <- 1:4
y <- 1:4
CJ(x, y)[x < y]
x y
1: 1 2
2: 1 3
3: 1 4
4: 2 3
5: 2 4
6: 3 4
Actually, you are already very close to the desired output. You just need subset as well (shown here with x = y = 1:3, as in the question):
> subset(expand.grid(x = x, y = y), x < y)
x y
4 1 2
7 1 3
8 2 3
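A quick sketch comparing the two base R approaches above, combn and subset on expand.grid (the helper names pairs_combn and pairs_grid are made up for this example). Both should give choose(n, 2) pairs; the rows only come out in a different order, because expand.grid varies x fastest.

```r
# Hypothetical helpers wrapping the two base R approaches
pairs_combn <- function(v) setNames(as.data.frame(t(combn(v, 2))), c("x", "y"))
pairs_grid  <- function(v) subset(expand.grid(x = v, y = v), x < y)

v  <- 1:4
p1 <- pairs_combn(v)
p2 <- pairs_grid(v)

# Both approaches give choose(n, 2) = n * (n - 1) / 2 pairs
n_pairs <- c(nrow(p1), nrow(p2), choose(length(v), 2))

# After sorting the expand.grid result by x, then y, the pairs match combn
p2_sorted  <- p2[order(p2$x, p2$y), ]
same_pairs <- all(p1$x == p2_sorted$x & p1$y == p2_sorted$y)

print(n_pairs)
print(same_pairs)
```

The x < y filter assumes the values can be compared with <; combn works on positions, so it also handles values without a natural order.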
Here is another option but with longer code
v <- letters[1:5] # dummy data vector
mat <- diag(length(v))
inds <- upper.tri(mat)
data.frame(
x = v[row(mat)[inds]],
y = v[col(mat)[inds]]
)
which gives
x y
1 a b
2 a c
3 b c
4 a d
5 b d
6 c d
7 a e
8 b e
9 c e
10 d e

How to create columns from anothers columns?

I want to build a data frame like df2 from df1, always looking for the name of the column whose value is closest to 0, where: clossets_1 is the column with the value closest to 0 among x, y and z; clossets_2 is the column with the value closest to 0 among x and a, because x is the most frequent value in clossets_1; clossets_3 is the column with the value closest to 0 among a and b, because a is the most frequent value in clossets_2.
df1
# x y z a b
#1 1 2 3 4 3
#2 2 3 4 1 2
#3 3 2 4 2 1
#4 4 3 2 3 6
Desired output:
df2
# x y z clossets_1 a clossets_2 b clossets_3
#1 1 2 3 x 4 x 3 b
#2 2 3 4 x 1 a 2 a
#3 3 2 4 y 2 a 1 b
#4 4 3 2 z 3 a 2 b
Here is the first step to get you started:
cols = c("x","y","z")
df2 = df1
# which.min(abs(x)) returns a single index (the first one on ties), so apply() always yields a vector
df2$clossets_1 = cols[apply(df1[,cols], 1, function(x) which.min(abs(x)))]
df2
## x y z a b clossets_1
## 1 1 2 3 4 3 x
## 2 2 3 4 1 2 x
## 3 3 2 4 2 1 y
## 4 4 3 2 3 6 z
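The remaining steps could be sketched in base R along the following lines. This is only an illustration under some assumptions: the helpers closest_col and most_common are hypothetical names, ties go to the first candidate, and b is taken as c(3, 2, 1, 2) to match the desired output (the printed df1 shows 6 in the last row instead).

```r
# Assumption: b = c(3, 2, 1, 2), as in the desired output
df1 <- data.frame(x = c(1, 2, 3, 4), y = c(2, 3, 2, 3), z = c(3, 4, 4, 2),
                  a = c(4, 1, 2, 3), b = c(3, 2, 1, 2))

# Hypothetical helper: per row, name of the column in `cols` closest to 0
closest_col <- function(d, cols) cols[apply(abs(d[, cols]), 1, which.min)]

# Hypothetical helper: most frequent value of a character vector
# (on ties, the alphabetically first value wins)
most_common <- function(v) names(which.max(table(v)))

df2 <- df1
df2$clossets_1 <- closest_col(df2, c("x", "y", "z"))
# Step 2: compare the most frequent column of step 1 with a
df2$clossets_2 <- closest_col(df2, c(most_common(df2$clossets_1), "a"))
# Step 3: compare the most frequent column of step 2 with b
df2$clossets_3 <- closest_col(df2, c(most_common(df2$clossets_2), "b"))

print(df2)
```

The new columns are appended at the end here; reordering them to sit next to a and b, as in the desired output, is a matter of column indexing.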
I solved it this way, using the first step of @BigFinger's answer and the mlv() function from the modeest package to find the most frequent value in the clossest columns:
library(DescTools)
library(modeest)
library(tibble)
df1 = tibble(x = c(1,2,3,4),
y = c(2,3,2,3),
z = c(3,4,4,2),
clossest_1 = c("x","y","z")[apply(data.frame(x,y,z),1,function(x){which(x == Closest(x,0))})],
a = c(4,1,2,3),
clossest_2 = c(mlv(clossest_1),"a")[apply(data.frame(get(mlv(clossest_1)),a),1,function(x){which(x == Closest(x,0))})],
b = c(3,2,1,2),
clossest_3 = c(mlv(clossest_2),"b")[apply(data.frame(get(mlv(clossest_2)),b),1,function(x){which(x == Closest(x,0))})])
df1
# A tibble: 4 x 8
# x y z clossest_1 a clossest_2 b clossest_3
# <dbl> <dbl> <dbl> <chr> <dbl> <chr> <dbl> <chr>
#1 1 2 3 x 4 x 3 b
#2 2 3 4 x 1 a 2 a
#3 3 2 4 y 2 a 1 b
#4 4 3 2 z 3 a 2 b

Creating unique group id's within a dataframe with more than one group

I have a dataframe:
y <- c(3, 3, 3, 2, 2, 2, 2, 1, 1, 2)
z <- c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4)
df <- data.frame(y, z)
> df
y z
1 3 1
2 3 1
3 3 1
4 2 2
5 2 2
6 2 3
7 2 3
8 1 3
9 1 4
10 2 4
Now I want to create a group id. The groups are based on y and should be numbered from 1 to n. Repeated values of y belong to one group. In addition, the groups are nested in other groups based on z, so equal values of y represent different groups if they fall in different groups of z. That means there are 6 groups for y and 4 groups for z. The result should be:
> df
y z group_id
1 3 1 1
2 3 1 1
3 3 1 1
4 2 2 2
5 2 2 2
6 2 3 3
7 2 3 3
8 1 3 4
9 1 4 5
10 2 4 6
I am happy about any help.
You can use rleid from the data.table package:
df$group_id <- data.table::rleid(paste(df$y, df$z))
df
y z group_id
1 3 1 1
2 3 1 1
3 3 1 1
4 2 2 2
5 2 2 2
6 2 3 3
7 2 3 3
8 1 3 4
9 1 4 5
10 2 4 6
We can use rleid from data.table
library(data.table)
setDT(df)[, group_id := rleid(y, z)]
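If data.table is not available, a run-length id like this can be approximated in base R. The sketch below (the names key and changed are made up) marks every row whose (y, z) pair differs from the row before it and takes a cumulative sum of those marks.

```r
# Data from the question
y <- c(3, 3, 3, 2, 2, 2, 2, 1, 1, 2)
z <- c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4)
df <- data.frame(y, z)

# One key per row combining both grouping columns
key <- paste(df$y, df$z)

# TRUE where a new run starts: the first row, or a key that differs from the previous row
changed <- c(TRUE, key[-1] != key[-length(key)])

# Counting the run starts so far numbers the groups 1..n
df$group_id <- cumsum(changed)

print(df)
```

Like rleid, this numbers consecutive runs, so the same (y, z) pair appearing again later in the data would get a new id.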

convert lists of vectors in just one tibble data frame

I have two lists. Each of them with many vectors (around 500) of different lengths and I would like to get a tibble data frame with three columns.
My reproducible example is the following:
> a
[[1]]
[1] 1 3 6
[[2]]
[1] 5 4
> b
[[1]]
[1] 3 4
[[2]]
[1] 5 6 7
I would like to get the following tibble data frame:
name index value
a 1 1
a 1 3
a 1 6
a 2 5
a 2 4
b 1 3
b 1 4
b 2 5
b 2 6
b 2 7
I would be grateful if someone could help me with this issue
Using base R:
transform(stack(c(a=a,b=b)),name=substr(ind,1,1),ind=substr(ind,2,2))
values ind name
1 1 1 a
2 3 1 a
3 6 1 a
4 5 2 a
5 4 2 a
6 3 1 b
7 4 1 b
8 5 2 b
9 6 2 b
10 7 2 b
Using tidyverse:
library(tidyverse)
list(a=a,b=b)%>%map(~stack(setNames(.x,1:length(.x))))%>%bind_rows(.id = "name")
name values ind
1 a 1 1
2 a 3 1
3 a 6 1
4 a 5 2
5 a 4 2
6 b 3 1
7 b 4 1
8 b 5 2
9 b 6 2
10 b 7 2
Here is one option with tidyverse
library(tidyverse)
list(a= a, b = b) %>%
map_df(enframe, name = "index", .id = 'name') %>%
unnest(cols = value)
# A tibble: 10 x 3
# name index value
# <chr> <int> <dbl>
# 1 a 1 1
# 2 a 1 3
# 3 a 1 6
# 4 a 2 5
# 5 a 2 4
# 6 b 1 3
# 7 b 1 4
# 8 b 2 5
# 9 b 2 6
#10 b 2 7
data
a <- list(c(1, 3, 6), c(5, 4))
b <- list(c(3, 4), c(5, 6, 7))
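For comparison, here is a base R sketch that produces the name, index and value columns directly (the names lists, pieces and result are made up for this example). It builds one data frame per list and stacks them with rbind.

```r
# Data from the question
a <- list(c(1, 3, 6), c(5, 4))
b <- list(c(3, 4), c(5, 6, 7))

# Put the lists in a named list so their names can be reused as the name column
lists <- list(a = a, b = b)

pieces <- lapply(names(lists), function(nm) {
  l <- lists[[nm]]
  data.frame(name  = nm,
             # position of each vector in its list, repeated once per element
             index = rep(seq_along(l), lengths(l)),
             value = unlist(l),
             stringsAsFactors = FALSE)
})

# Stack the per-list pieces into one data frame
result <- do.call(rbind, pieces)

print(result)
```

The result is a plain data frame; if a tibble is needed, tibble::as_tibble(result) should convert it.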
