I have a dataframe with two columns, "id" and "detail" (df_current below). I need to group the dataframe by id and spread it so that the columns become "Interface1", "Interface2", etc., and each interface column holds the values that immediately follow that interface name in "detail". Essentially the "!" works as a separator, but it is not needed in the output.
The desired output is shown below as: "df_needed_from_current".
I have tried multiple approaches (group_by, spread, reshape, dcast etc.), but can't get it to work. Any help would be greatly appreciated!
Sample Current Dataframe (code to create under):
id  detail
1   !
1   Interface1
1   a
1   b
1   !
1   Interface2
1   a
1   b
2   !
2   Interface1
2   a
2   b
2   c
2   !
2   Interface2
2   a
3   !
3   Interface1
3   a
3   b
3   c
3   d
df_current <- data.frame(
id = c("1","1","1","1","1","1","1","1","2",
"2","2","2","2","2","2","2","3","3",
"3","3","3","3","4","4","4","4","4",
"4","4","4","4","4","4","4","4","4",
"5","5","5","5","5","5","5","5","5",
"5","5","5","5"),
detail = c("!", "Interface1","a","b","!",
"Interface2","a","b","!","Interface1",
"a","b","c","!","Interface2","a",
"!", "Interface1","a","b","c","d",
"!", "Interface1","a","b","!",
"Interface2","a","b","c","!","Interface3",
"a","b","c","!","Interface1","a","b","!",
"Interface2","a","b","c","!","Interface3",
"a","b"))
Dataframe Needed (code to create under):
ID  Interface1  Interface2  Interface3
1   a           a           NA
1   b           b           NA
2   a           a           NA
2   b           NA          NA
2   c           NA          NA
3   a           NA          NA
3   b           NA          NA
3   c           NA          NA
3   d           NA          NA
df_needed_from_current <- data.frame(
id = c("1","1","2","2","2","3","3","3","3","4","4","4","5","5","5"),
Interface1 = c("a","b","a","b","c","a","b","c","d","a","b","NA","a","b","NA"),
Interface2 = c("a","b","a","NA","NA","NA","NA","NA","NA","a","b","c","a","b","c"),
Interface3 = c("NA","NA","NA","NA","NA","NA","NA","NA","NA","a","b","c","a","b","NA")
)
We remove the rows where 'detail' is "!", then create a new column 'interface' that keeps only the 'detail' values containing 'Interface' (everything else becomes NA). fill from tidyr then replaces each NA with the previous non-NA value within each id. Next we filter out the rows where 'detail' equals 'interface' (the header rows themselves), create a row sequence id with rowid (from data.table), and reshape to 'wide' format with pivot_wider.
library(dplyr)
library(tidyr)
library(data.table)
library(stringr)
df_current %>%
filter(detail != "!") %>%
mutate(interface = case_when(str_detect(detail, 'Interface') ~ detail)) %>%
group_by(id) %>%
fill(interface) %>%
ungroup %>%
filter(detail != interface) %>%
mutate(rn = rowid(id, interface)) %>%
pivot_wider(names_from = interface, values_from = detail) %>%
select(-rn)
# A tibble: 15 x 4
# id Interface1 Interface2 Interface3
# <chr> <chr> <chr> <chr>
# 1 1 a a <NA>
# 2 1 b b <NA>
# 3 2 a a <NA>
# 4 2 b <NA> <NA>
# 5 2 c <NA> <NA>
# 6 3 a <NA> <NA>
# 7 3 b <NA> <NA>
# 8 3 c <NA> <NA>
# 9 3 d <NA> <NA>
#10 4 a a a
#11 4 b b b
#12 4 <NA> c c
#13 5 a a a
#14 5 b b b
#15 5 <NA> c <NA>
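To see what the rowid step contributes, here is a minimal sketch on made-up vectors (the toy id and interface values are assumptions for illustration, not the question's data). rowid numbers the rows within each id/interface combination, which gives pivot_wider a key for lining the values up side by side.

```r
library(data.table)

# Toy vectors (hypothetical, not taken from df_current)
id        <- c("1", "1", "1", "1", "2")
interface <- c("Interface1", "Interface1", "Interface2", "Interface2", "Interface1")

# Running count within each (id, interface) combination
rn <- rowid(id, interface)
rn
# 1 2 1 2 1
```

Without such a counter, pivot_wider would find several values per id/interface cell and could not place them in separate rows.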
Related
I have a shopping list data like this:
df <- data.frame(id = 1:5, item = c("apple2milk5", "milk1", "juice3apple5", "egg10juice1", "egg8milk2"), stringsAsFactors = F)
# id item
# 1 1 apple2milk5
# 2 2 milk1
# 3 3 juice3apple5
# 4 4 egg10juice1
# 5 5 egg8milk2
I want to separate the variable item into multiple columns and record the number behind the goods. The problem I met is that the goods each person purchases are different so I cannot solve it using tidyr::separate() or other analogous functions. What I expect is:
# id apple milk juice egg
# 1 1 2 5 NA NA
# 2 2 NA 1 NA NA
# 3 3 5 NA 3 NA
# 4 4 NA NA 1 10
# 5 5 NA 2 NA 8
Note: The categories of goods in the market are unknown. So don't assume there are only 4 kinds of goods.
Thanks for any help!
I just came up with a tidyverse solution which uses stringr::str_extract_all() to extract the quantities, sets their names as product names, and expands them to wide using tidyr::unnest_wider().
library(tidyverse)
df %>%
mutate(N = map2(str_extract_all(item, "\\d+"), str_extract_all(item, "\\D+"), set_names)) %>%
unnest_wider(N, transform = as.numeric)
# # A tibble: 5 × 6
# id item apple milk juice egg
# <int> <chr> <dbl> <dbl> <dbl> <dbl>
# 1 1 apple2milk5 2 5 NA NA
# 2 2 milk1 NA 1 NA NA
# 3 3 juice3apple5 5 NA 3 NA
# 4 4 egg10juice1 NA NA 1 10
# 5 5 egg8milk2 NA 2 NA 8
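As a rough illustration of what happens to a single row before unnest_wider spreads it, the sketch below builds the named vector for one string by hand. The variable names quantities, products and named are made up for this example.

```r
library(stringr)
library(purrr)

item <- "apple2milk5"

quantities <- str_extract_all(item, "\\d+")[[1]]  # runs of digits
products   <- str_extract_all(item, "\\D+")[[1]]  # runs of non-digits

# Name each quantity after the product that precedes it
named <- set_names(quantities, products)
named
#  apple   milk
#    "2"    "5"
```

unnest_wider then turns each name into a column, so rows with different products simply get NA where a name is absent.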
I'll add yet another answer. It differs only slightly from @ASuliman's but uses a bit of the newer tidyr and some cute regex to be a bit more straightforward.
The regex trick is that the pattern "(?<=\\d)\\B(?=[a-z])" will match the non-boundary (i.e. an empty location) between numbers and letters, allowing you to create rows for every "apple5" type of entry. Extract the letters into an item column and numbers into a count column. Using the new pivot_wider which replaces spread, you can convert those counts to numeric values as you reshape.
library(dplyr)
library(tidyr)
df %>%
separate_rows(item, sep = "(?<=\\d)\\B(?=[a-z])") %>%
extract(item, into = c("item", "count"), regex = "^([a-z]+)(\\d+)$") %>%
pivot_wider(names_from = item, values_from = count, values_fn = list(count = as.numeric))
#> # A tibble: 5 x 5
#> id apple milk juice egg
#> <int> <dbl> <dbl> <dbl> <dbl>
#> 1 1 2 5 NA NA
#> 2 2 NA 1 NA NA
#> 3 3 5 NA 3 NA
#> 4 4 NA NA 1 10
#> 5 5 NA 2 NA 8
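To check where that pattern matches, one option is to replace the empty match with a visible marker. This is only a sketch in base R (gsub with perl = TRUE, not the separate_rows call used in the answer), and the "|" marker and the items vector are arbitrary choices for illustration.

```r
items <- c("apple2milk5", "milk1", "egg10juice1")

# Insert "|" at every empty position that follows a digit and precedes a letter
marked <- gsub("(?<=\\d)\\B(?=[a-z])", "|", items, perl = TRUE)
marked
# "apple2|milk5" "milk1" "egg10|juice1"
```

Note that "egg10" stays intact: the position between "1" and "0" is followed by a digit, so the look-ahead fails there.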
Possibly something like this, which should work with any item/quantity.
It just assumes that the quantity follows the item.
Let's use a custom function that extracts the item and quantity:
my_fun <- function(w) {
items <- stringr::str_split(w, "\\d+", simplify = T)
items <- items[items != ""] # drop empty strings (the split leaves one after the trailing number)
quantities <- stringr::str_split(w, "\\D+", simplify = T)
quantities <- quantities[quantities!=""]
d <- data.frame(item = items, quantity=quantities, stringsAsFactors = F)
return(d)
}
Example:
my_fun("apple2milk5")
# gives:
# item quantity
# 1 apple 2
# 2 milk 5
Now we can apply the function to each id, using nest and map:
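library(dplyr)
library(tidyr)
df_result <- df %>%
  nest(data = item) %>%                                # one nested tibble per id
  mutate(res = purrr::map(data, ~my_fun(.x$item))) %>% # parse the item string of each id
  select(-data) %>%                                    # drop the nested input column
  unnest(res)
df_result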
# # A tibble: 9 x 3
# id item quantity
# <int> <chr> <chr>
# 1 1 apple 2
# 2 1 milk 5
# 3 2 milk 1
# 4 3 juice 3
# 5 3 apple 5
# 6 4 egg 10
# 7 4 juice 1
# 8 5 egg 8
# 9 5 milk 2
Now we can use dcast() (spread would probably work too):
data.table::dcast(df_result, id~item, value.var="quantity")
# id apple egg juice milk
# 1 1 2 <NA> <NA> 5
# 2 2 <NA> <NA> <NA> 1
# 3 3 5 <NA> 3 <NA>
# 4 4 <NA> 10 1 <NA>
# 5 5 <NA> 8 <NA> 2
Data:
df <- data.frame(id = 1:5, item = c("apple2milk5", "milk1", "juice3apple5", "egg10juice1", "egg8milk2"), stringsAsFactors = F)
tmp = lapply(strsplit(df$item, "(?<=\\d)(?=\\D)|(?<=\\D)(?=\\d)", perl = TRUE),
function(x) {
d = split(x, 0:1)
setNames(as.numeric(d[[2]]), d[[1]])
})
nm = unique(unlist(lapply(tmp, names)))
cbind(df, do.call(rbind, lapply(tmp, function(x) setNames(x[nm], nm))))
# id item apple milk juice egg
#1 1 apple2milk5 2 5 NA NA
#2 2 milk1 NA 1 NA NA
#3 3 juice3apple5 5 NA 3 NA
#4 4 egg10juice1 NA NA 1 10
#5 5 egg8milk2 NA 2 NA 8
Place a space before each numeric substring and a newline after it. Then read that data using read.table and unnest it. Finally use pivot_wider to convert from long to wide form.
library(dplyr)
library(tidyr)
df %>%
mutate(item = gsub("(\\d+)", " \\1\n", item)) %>%
rowwise %>%
mutate(item = list(read.table(text = item, as.is = TRUE))) %>%
ungroup %>%
unnest(item) %>%
pivot_wider(names_from = "V1", values_from = "V2")
giving:
# A tibble: 5 x 5
id apple milk juice egg
<int> <int> <int> <int> <int>
1 1 2 5 NA NA
2 2 NA 1 NA NA
3 3 5 NA 3 NA
4 4 NA NA 1 10
5 5 NA 2 NA 8
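For a single string, the two steps (rewrite with gsub, then parse with read.table) look roughly like this. The names txt and parsed are made up for this sketch.

```r
# Put a space before each number and a newline after it
txt <- gsub("(\\d+)", " \\1\n", "egg10juice1")
txt
# "egg 10\njuice 1\n"

# read.table now sees one "name number" pair per line
parsed <- read.table(text = txt, as.is = TRUE)
parsed
#      V1 V2
# 1   egg 10
# 2 juice  1
```

Because read.table guesses column types, V2 comes back as integer, which is why the final result above has integer columns.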
Variation
This is a variation of the above code that eliminates the unnest. We replace each numeric string by a space, that string, another space, the id and a newline. Then use read.table to read that in. Note the use of %$% rather than %>% before the read.table. Finally use pivot_wider to convert from long to wide form.
library(dplyr)
library(magrittr)
library(tidyr)
df %>%
rowwise %>%
mutate(item = gsub("(\\d+)", paste(" \\1", id, "\n"), item)) %$%
read.table(text = item, as.is = TRUE, col.names = c("nm", "no", "id")) %>%
ungroup %>%
pivot_wider(names_from = "nm", values_from = "no")
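The %$% operator from magrittr exposes the columns of its left-hand side by name to the expression on the right, which is why item can be passed straight to read.table in the code above. A minimal sketch with a made-up data frame (toy, x and y are assumptions for illustration):

```r
library(magrittr)

toy <- data.frame(x = c(1, 2, 3), y = c(10, 20, 30))

# x and y are looked up inside toy, much like with(toy, sum(x * y))
total <- toy %$% sum(x * y)
total
# 140
```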
You can try
library(tidyverse)
library(stringi)
df %>%
mutate(item2 =gsub("[0-9]", " ", df$item)) %>%
mutate(item3 =gsub("[a-z]", " ", df$item)) %>%
mutate_at(vars(item2, item3), ~stringi::stri_extract_all_words(.) %>% map(paste, collapse=",")) %>%
separate_rows(item2, item3, sep = ",") %>%
spread(item2, item3)
id item apple egg juice milk
1 1 apple2milk5 2 <NA> <NA> 5
2 2 milk1 <NA> <NA> <NA> 1
3 3 juice3apple5 5 <NA> 3 <NA>
4 4 egg10juice1 <NA> 10 1 <NA>
5 5 egg8milk2 <NA> 8 <NA> 2
# Use a positive look-ahead assertion to find each run of digits that is followed by a non-digit, and append a comma to it
library(dplyr)
library(tidyr)
df %>% mutate(item=gsub('(\\d+(?=\\D))','\\1,' ,item, perl = TRUE)) %>%
separate_rows(item, sep = ",") %>%
extract(item, into = c('prod','quan'), '(\\D+)(\\d+)') %>%
spread(prod, quan, fill=0)
id apple egg juice milk
1 1 2 0 0 5
2 2 0 0 0 1
3 3 5 0 3 0
4 4 0 10 1 0
5 5 0 8 0 2
This is a simple solution in base R and stringr:
goods <- unique(unlist(stringr::str_split(df$item, pattern = "[0-9]")))
goods <- goods[goods != ""]
df <- cbind(df$id, sapply(goods,
function(x) stringr::str_extract(df$item, pattern = paste0(x,"[0-9]*"))))
df <- as.data.frame(df)
df[-1] <- lapply(df[-1], function(x) as.numeric(stringr::str_extract(x, pattern = "[0-9]*$")))
names(df)[1] <- "id"
Output
id apple milk juice egg
1 1 2 5 NA NA
2 2 NA 1 NA NA
3 3 5 NA 3 NA
4 4 NA NA 1 10
5 5 NA 2 NA 8
Mostly base R with some input from stringr and data.table:
library(stringr)
library(data.table)
cbind(
id = df$id,
rbindlist(
lapply(df$item, function(x) as.list(setNames(str_extract_all(x, "[0-9]+")[[1]], strsplit(x, "[0-9]+")[[1]]))),
fill = TRUE
)
)
id apple milk juice egg
1: 1 2 5 <NA> <NA>
2: 2 <NA> 1 <NA> <NA>
3: 3 5 <NA> 3 <NA>
4: 4 <NA> <NA> 1 10
5: 5 <NA> 2 <NA> 8
A cleaner data.table solution with input from stringr:
df[,
.(it_count = str_extract_all(item, "[0-9]+")[[1]],
it_name = str_extract_all(item, "[^0-9]+")[[1]]),
by = id
][, dcast(.SD, id ~ it_name, value.var = "it_count")]
id apple egg juice milk
1: 1 2 <NA> <NA> 5
2: 2 <NA> <NA> <NA> 1
3: 3 5 <NA> 3 <NA>
4: 4 <NA> 10 1 <NA>
5: 5 <NA> 8 <NA> 2
I have 2 data frames. One (df1) looks like this:
var.1 var.2 var.3 var.4
1 7 9 1 2
2 4 6 9 7
3 2 NA NA NA
And the other (df2) looks like this:
var.a var.b var.c var.d
1 1 b c d
2 2 f g h
3 4 j k l
3 7 j k z
...
Every value listed in var.1 through var.4 of df1 also appears in var.a of df2.
I want to match var.a from df2 across all of the columns listed in df1 and then add these columns to df1 with new/combined column names. So for instance it'll look like this:
var.1 var1.b var1.c var1.d ... var.4 var4.b var4.c var4.d
1 7 j k z 2 f g h
2 4 j k l 7 j k z
3 2 f g h NA NA NA NA
Thanks in advance!
Here's a tidyverse solution. First, I define the data frames.
df1 <- read.table(text = " var.1 var.2 var.3 var.4
1 7 9 1 2
2 4 6 9 7
3 2 NA NA NA", header = TRUE)
df2 <- read.table(text = " var.a var.b var.c var.d
1 1 b c d
2 2 f g h
3 4 j k l
4 7 j k z", header=TRUE)
Then, I load the libraries.
# Load libraries
library(tidyr)
library(dplyr)
library(tibble)
Finally, I restructure the data.
# Manipulate data
df1 %>%
rownames_to_column() %>%
gather(variable, value, -rowname) %>%
left_join(df2, by = c("value" = "var.a")) %>%
gather(foo, bar, -variable, -rowname) %>%
unite(goop, variable, foo) %>%
spread(goop, bar) %>%
select(-rowname)
#> Warning: attributes are not identical across measure variables;
#> they will be dropped
which gives,
#> var.1_value var.1_var.b var.1_var.c var.1_var.d var.2_value var.2_var.b
#> 1 7 j k z 9 <NA>
#> 2 4 j k l 6 <NA>
#> 3 2 f g h <NA> <NA>
#> var.2_var.c var.2_var.d var.3_value var.3_var.b var.3_var.c var.3_var.d
#> 1 <NA> <NA> 1 b c d
#> 2 <NA> <NA> 9 <NA> <NA> <NA>
#> 3 <NA> <NA> <NA> <NA> <NA> <NA>
#> var.4_value var.4_var.b var.4_var.c var.4_var.d
#> 1 2 f g h
#> 2 7 j k z
#> 3 <NA> <NA> <NA> <NA>
Created on 2019-05-30 by the reprex package (v0.3.0)
This is a little bit convoluted, but I'll try to explain.
I turn row numbers into a column at first, as this will help me put the data back together at the very end.
I go from wide to long format for df1.
I join df2 to df1 based on var.a and var.1 (now called value), respectively.
I go from wide to long again.
I combine the variable names from each data frame into one variable.
Finally, I go from long to wide format (this is where the row numbers come in handy) and drop the row numbers.
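The same steps can also be sketched with pivot_longer and pivot_wider, the newer tidyr replacements for gather and spread. This is an alternative under stated assumptions rather than the answer's own code: df1_small and df2_small are cut-down, made-up versions of the question's data, and row is a helper column introduced here.

```r
library(dplyr)
library(tidyr)

# Cut-down stand-ins for df1 and df2 (hypothetical)
df1_small <- data.frame(var.1 = c(7, 4, 2), var.2 = c(9, 6, NA))
df2_small <- data.frame(var.a = c(2, 4, 7),
                        var.b = c("f", "j", "j"),
                        var.c = c("g", "k", "k"),
                        stringsAsFactors = FALSE)

res <- df1_small %>%
  mutate(row = row_number()) %>%                          # remember the original row
  pivot_longer(-row, names_to = "variable",
               values_to = "value") %>%                   # wide to long
  left_join(df2_small, by = c("value" = "var.a")) %>%     # look up each value in df2
  pivot_wider(names_from = variable,
              values_from = c(value, var.b, var.c),
              names_glue = "{variable}_{.value}") %>%     # long to wide, combined names
  select(-row)
res
```

Since the lookup columns are pivoted in a single step, the second wide-to-long pass and the unite call are not needed here.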
I have the following data frame in R
df1 <- data.frame(
"ID" = c("A", "B", "A", "B"),
"Value" = c(1, 2, 5, 5),
"freq" = c(1, 3, 5, 3)
)
I wish to obtain the following data frame
Value freq ID
1 1 A
2 NA A
3 NA A
4 NA A
5 1 A
1 NA B
2 2 B
3 NA B
4 NA B
5 5 B
I have tried the following code
library(tidyverse)
df_new <- bind_cols(df1 %>%
select(Value, freq, ID) %>%
complete(., expand(.,
Value = min(df1$Value):max(df1$Value))),)
I am getting the following output
Value freq ID
<dbl> <dbl> <fct>
1 1 A
2 3 B
3 NA NA
4 NA NA
5 5 A
5 3 B
I request someone to help me.
Using tidyr::full_seq we can find the full sequence of Value, but nesting(full_seq(Value, 1)) will return an error:
Error: by can't contain join column full_seq(Value, 1) which is missing from RHS
so we need to give it a name, hence nesting(Value = full_seq(Value, 1)):
library(tidyr)
df1 %>% complete(ID, nesting(Value=full_seq(Value,1)))
# A tibble: 10 x 3
ID Value freq
<fct> <dbl> <dbl>
1 A 1. 1.
2 A 2. NA
3 A 3. NA
4 A 4. NA
5 A 5. 5.
6 B 1. NA
7 B 2. 3.
8 B 3. NA
9 B 4. NA
10 B 5. 3.
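For reference, this is what full_seq does on its own with the Value column from the question. The names value and full are made up for this small sketch.

```r
library(tidyr)

value <- c(1, 2, 5, 5)

# Fill the gaps between min and max in steps of 1
full <- full_seq(value, 1)
full
# 1 2 3 4 5
```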
Using data.table:
library(data.table)
setDT(df1)
setkey(df1, ID, Value)
df1[CJ(ID = c("A", "B"), Value = 1:5)]
ID Value freq
1: A 1 1
2: A 2 NA
3: A 3 NA
4: A 4 NA
5: A 5 5
6: B 1 NA
7: B 2 3
8: B 3 NA
9: B 4 NA
10: B 5 3
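The join works because CJ (cross join) builds every combination of its arguments, and indexing df1 by that table keeps unmatched combinations as NA rows. A minimal sketch of CJ by itself, with a shortened Value range chosen only for illustration:

```r
library(data.table)

# Every ID paired with every Value
grid <- CJ(ID = c("A", "B"), Value = 1:3)
grid
#    ID Value
# 1:  A     1
# 2:  A     2
# 3:  A     3
# 4:  B     1
# 5:  B     2
# 6:  B     3
```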
Would the following approach work for you?
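library(dplyr)
with(data = df1,
     expr = {
       vals <- wrapr::seqi(min(Value), max(Value))
       ids <- unique(ID)
       # Pair every ID with every Value explicitly rather than relying on recycling
       data.frame(Value = rep(vals, times = length(ids)),
                  ID = rep(ids, each = length(vals)))
     }) %>%
  left_join(y = df1,
            by = c("ID" = "ID", "Value" = "Value")) %>%
  arrange(ID, Value)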
Results
Value ID freq
1 1 A 1
2 2 A NA
3 3 A NA
4 4 A NA
5 5 A 5
6 1 B NA
7 2 B 3
8 3 B NA
9 4 B NA
10 5 B 3
Comments
If I'm following your example correctly, Value takes values from 1 to 5 within each ID group. If so, my approach would be to generate every combination of the unique IDs and the full range of Value from the original data frame.
The only variable carried over from the original data frame is freq, which may or may not be available for a given ID-Value pair. I would join that variable via left_join (as you seem to like the tidyverse).
In your data, freq has the values 1, 3, 5, 3, but your desired output lists 1, 2, 5? In my example, I took the original freq and left-joined it. You can modify it further in a normal dplyr pipeline if that is what you intended.
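The same idea (build all ID-Value combinations, then left-join freq) can be sketched in base R with expand.grid and merge, which avoids the wrapr and dplyr dependencies. This is a swapped-in variant, not the code above; grid and out are names made up for the example.

```r
df1 <- data.frame(ID = c("A", "B", "A", "B"),
                  Value = c(1, 2, 5, 5),
                  freq = c(1, 3, 5, 3),
                  stringsAsFactors = FALSE)

# All combinations of the full Value range and the unique IDs
grid <- expand.grid(Value = seq(min(df1$Value), max(df1$Value), by = 1),
                    ID = unique(df1$ID),
                    stringsAsFactors = FALSE)

# Left join: keep every combination, fill freq with NA where df1 has no row
out <- merge(grid, df1, by = c("ID", "Value"), all.x = TRUE)
out <- out[order(out$ID, out$Value), ]
out
```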
I have a messy table which has a single column that contains multiple category labels, separated by several delimiters. I want to use R to split that column at each delimiter, and create a new column for each category label. The methods I have seen can only split at one delimiter at a time.
My current table looks like this:
my_table = read.csv("./my_table.csv")
# > my_table
# ID TYPE TEXT
# 1 1 a blue water
# 2 2 a,b,c fresh water
# 3 3 a;b,f cold stream
# 4 4 f, b and c lovely sunset
# 5 5 b;c up there
I want a table that looks like this:
# ID A B C D TEXT
# 1 1 a blue water
# 2 2 a b c fresh water
# 3 3 a b d cold stream
# 4 4 b c d lovely sunset
# 5 5 b c up there
Here is what I have tried:
my_table1 <- my_table %>%
separate(TYPE, c('A', 'B'), ",")
my_table1
# > docs1
# ID A B TEXT
# 1 1 a <NA> blue water
# 2 2 a b fresh water
# 3 3 a;b f cold stream
# 4 4 f b and c lovely sunset
# 5 5 b;c <NA> up there
my_table2 <- my_table1 %>%
separate(A, c('A', 'C' ), ";")
# > docs2
# ID A C B TEXT
# 1 1 a <NA> <NA> blue water
# 2 2 a <NA> b fresh water
# 3 3 a b f cold stream
# 4 4 f <NA> b and c lovely sunset
# 5 5 b c <NA> up there
my_table3 <- my_table2 %>%
separate(A, c('A', 'D'), "and")
# > docs3
# ID A D C B TEXT
# 1 1 a <NA> <NA> <NA> blue water
# 2 2 a <NA> <NA> b fresh water
# 3 3 a <NA> b f cold stream
# 4 4 f <NA> <NA> b and c lovely sunset
# 5 5 b <NA> c <NA> up there
This gets me close, but the column names are off. Plus, I don't want to have to guess about where the string "b and c" ends up after a couple iterations. I have thousands of rows and maybe five or six categories. My guess is that there is a simpler way to do this.
As an alternative and to extend your tidyverse attempt, here is a solution using strsplit and unnest:
library(dplyr)
library(tidyr)
df %>%
  mutate(
    val = strsplit(as.character(TYPE), "(;|,\\s*|\\s*and\\s*)")) %>%
  unnest(val) %>%  # name the list-column explicitly; a bare unnest() is deprecated
  select(-TYPE) %>%
  group_by(ID, TEXT) %>%
  mutate(n = 1:n()) %>%
  spread(n, val)
## A tibble: 5 x 5
## Groups: ID, TEXT [5]
# ID TEXT `1` `2` `3`
# <int> <fct> <chr> <chr> <chr>
#1 1 blue water a NA NA
#2 2 fresh water a b c
#3 3 cold stream a b f
#4 4 lovely sunset f b c
#5 5 up there b c NA
Note that this is not exactly the same as your expected output. It does however match @MKR's output.
Sample data
df <- read.table(text =
"ID TYPE TEXT
1 1 'a' 'blue water'
2 2 'a,b,c' 'fresh water'
3 3 'a;b,f' 'cold stream'
4 4 'f, b and c' 'lovely sunset'
5 5 'b;c' 'up there'")
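To see the multi-delimiter pattern on its own, here is a small sketch that applies the same regular expression to the TYPE values from the sample data. The names types and parts are made up for this example.

```r
types <- c("a", "a,b,c", "a;b,f", "f, b and c", "b;c")

# Split on ";" or "," (plus trailing spaces) or "and" (plus surrounding spaces)
parts <- strsplit(types, "(;|,\\s*|\\s*and\\s*)")
parts[[3]]
# "a" "b" "f"
parts[[4]]
# "f" "b" "c"
```

One caveat: the pattern would also split inside a label that happens to contain "and", so it assumes the labels themselves never do.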
The cSplit function from the splitstackshape package can make the problem easier to solve. One approach:
library(splitstackshape)
# First use `gsub` to replace other delimiter and have only ',' delimiter.
my_table$TYPE <- gsub("and|;",",",my_table$TYPE)
Mod_df <- cSplit(my_table, "TYPE", sep = ",")
Mod_df
# ID TEXT TYPE_1 TYPE_2 TYPE_3
# 1: 1 blue water a NA NA
# 2: 2 fresh water a b c
# 3: 3 cold stream a b f
# 4: 4 lovely sunset f b c
# 5: 5 up there b c NA
tidyr::gather and spread can then be used to get the format the OP asked for (mutate, select and filter come from dplyr):
library(dplyr)
library(tidyr)
gather(Mod_df, key, value, -ID,-TEXT) %>% mutate_if(is.factor, as.character) %>%
mutate(K = toupper(value)) %>%
select(-key) %>%
filter(!is.na(K)) %>%
spread(K, value)
# ID TEXT A B C F
# 1 1 blue water a <NA> <NA> <NA>
# 2 2 fresh water a b c <NA>
# 3 3 cold stream a b <NA> f
# 4 4 lovely sunset <NA> b c f
# 5 5 up there <NA> b c <NA>
Data
my_table <- read.table(text =
" ID TYPE TEXT
1 1 a 'blue water'
2 2 'a,b,c' 'fresh water'
3 3 'a;b,f' 'cold stream'
4 4 'f, b and c' 'lovely sunset'
5 5 'b;c' 'up there'",
header = TRUE, stringsAsFactors = FALSE)
I have some data as follows:
library(tidyr)
library(data.table)
thisdata <- data.frame(numbers = c(1,3,4,5,6,1,2,4,5,6)
,letters = c('A','A','A','A','A','B','B','B','B','B'))
otherdata <- data.frame(numbers = c(1,2,3,4,5,6))
I am looking to split 'thisdata' by the letters column, merge the two lists to 'otherdata' by the numbers column, then fill letters NA with the corresponding letter in that list. So:
out <- split(thisdata , f = thisdata$letters )
out2 <- lapply(out, function(x) merge(x,otherdata,by="numbers",all = TRUE))
However, I can't get the 'fill' function in tidyr to work within the lapply
out3 <- lapply(out2,function(x) fill(x$channel))
Error in UseMethod("fill_") :
no applicable method for 'fill_' applied to an object of class "NULL"
This is the output I'm after, but would rather perform the calculation within the list format:
out4 <- rbindlist(out2)
out5 <- out4 %>%
fill(letters) %>% #default direction down
fill(letters,.direction = "up")
numbers letters
1: 1 A
2: 2 A
3: 3 A
4: 4 A
5: 5 A
6: 6 A
7: 1 B
8: 2 B
9: 3 B
10: 4 B
11: 5 B
12: 6 B
fill expects a data frame as first parameter, try fill(x, letters) or x %>% fill(letters) with magrittr pipe:
out3 <- lapply(out2,function(x) fill(x, letters))
out3
#$A
# numbers letters
#1 1 A
#2 2 A
#3 3 A
#4 4 A
#5 5 A
#6 6 A
#$B
# numbers letters
#1 1 B
#2 2 B
#3 3 B
#4 4 B
#5 5 B
#6 6 B
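If a group's first row is the missing one, a downward fill alone leaves it as NA. A minimal sketch on a made-up data frame (toy is an assumption, and .direction = "downup" requires a reasonably recent tidyr) that fills down and then up in one call:

```r
library(tidyr)

# Hypothetical merged piece where the first letter is missing
toy <- data.frame(numbers = 1:4,
                  letters = c(NA, "A", NA, NA),
                  stringsAsFactors = FALSE)

# Fill downwards first, then upwards for any leading NA
filled <- fill(toy, letters, .direction = "downup")
filled$letters
# "A" "A" "A" "A"
```

Inside lapply this would read lapply(out2, function(x) fill(x, letters, .direction = "downup")).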
A simpler method is to use tidyr::complete:
thisdata %>%
complete(numbers = otherdata$numbers, letters) %>%
arrange(letters)
# A tibble: 12 x 2
# numbers letters
# <dbl> <fctr>
# 1 1 A
# 2 2 A
# 3 3 A
# 4 4 A
# 5 5 A
# 6 6 A
# 7 1 B
# 8 2 B
# 9 3 B
#10 4 B
#11 5 B
#12 6 B