I have a tibble whose column names contain spaces and special characters, which makes it a hassle to work with. I want to change these column names to easier-to-use names while I'm working with the data, and then change them back to the original names at the end for display. Ideally, I want to do this as part of a pipe, but I haven't figured out how to do it with rename_with().
Sample data:
df <- tibble(oldname1 = seq(1:10),
oldname2 = letters[seq(1:10)],
oldname3 = LETTERS[seq(1:10)])
cols_lookup <- tibble(old_names = c("oldname4", "oldname2", "oldname1"),
new_names = c("newname4", "newname2", "newname1"))
Desired output:
> head(df_renamed)
# A tibble: 6 x 3
newname1 newname2 oldname3
<int> <chr> <chr>
1 1 a A
2 2 b B
3 3 c C
4 4 d D
5 5 e E
6 6 f F
Some columns are removed & reordered during this work so when converting them back there will be entries in the cols_lookup table which are no longer in df. There are also new columns created in df which I want to remain named the same.
I am aware there are similar questions which have already been asked, however the answers either don't work well with tibbles or in a pipe (eg. those using match()), or don't work if the columns aren't all present in the same order in both tables.
We can use rename_at. First filter the master lookup table down to the rows whose old_names match a column of the dataset (filtered_lookup), then pass that to rename_at, giving the 'old_names' in vars and replacing them with the 'new_names'.
library(dplyr)
filtered_lookup <- cols_lookup %>%
filter(old_names %in% names(df))
df %>%
rename_at(vars(filtered_lookup$old_names), ~ filtered_lookup$new_names)
Or using rename_with, use the same logic
df %>%
rename_with(.fn = ~filtered_lookup$new_names, .cols = filtered_lookup$old_names)
Or another option is rename with splicing (!!!) from a named vector
library(tibble)
df %>%
rename(!!! deframe(filtered_lookup[2:1]))
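For the round trip described in the question (rename, work, rename back), a small base R helper can also be dropped into a pipe. This is only a sketch: rename_from_lookup is a hypothetical name, not a dplyr function, and it assumes both lookup columns are character vectors without duplicates. Lookup entries that are missing from the data are skipped, and columns that are not in the lookup keep their names.

```r
# Hypothetical helper (not part of dplyr): rename columns found in `from` to
# the matching entry in `to`; every other column is left untouched.
rename_from_lookup <- function(data, from, to) {
  idx <- match(names(data), from)   # NA where a column is not in the lookup
  hit <- !is.na(idx)
  names(data)[hit] <- to[idx[hit]]
  data
}

df <- data.frame(oldname1 = 1:10,
                 oldname2 = letters[1:10],
                 oldname3 = LETTERS[1:10])
cols_lookup <- data.frame(old_names = c("oldname4", "oldname2", "oldname1"),
                          new_names = c("newname4", "newname2", "newname1"))

# Forward: old -> new ("oldname4" is not in df, so it is ignored)
df_renamed <- rename_from_lookup(df, cols_lookup$old_names, cols_lookup$new_names)
names(df_renamed)

# Back again: swap the two lookup columns
df_restored <- rename_from_lookup(df_renamed, cols_lookup$new_names, cols_lookup$old_names)
names(df_restored)
```

Because the data is the first argument, the helper can be used as df %>% rename_from_lookup(cols_lookup$old_names, cols_lookup$new_names).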
You can use rename_ with setNames (note that rename_ is deprecated in recent versions of dplyr)
cols_lookup <- tibble(old_names = c("oldname3", "oldname2", "oldname1"),
new_names = c("newname3", "newname2", "newname1"))
df
rename_(df, .dots=setNames(cols_lookup$old_names, cols_lookup$new_names))
Output:
# A tibble: 10 x 3
newname1 newname2 newname3
<int> <chr> <chr>
1 1 a A
2 2 b B
3 3 c C
4 4 d D
5 5 e E
6 6 f F
7 7 g G
8 8 h H
9 9 i I
10 10 j J
I am struggling to count the number of unique combinations in my data. I would like to first group them by id and then count how many times each combination of values occurs. It does not matter whether the elements are combined as 'd-f' or 'f-d'; they still belong to the same category, as they have the same elements:
combinations:
n
c-f: 2 # also f-c
c-d-f: 1 # also c-f-d or f-d-c
d-f: 2 # also f-d. The dash is only for visualization purposes
Dummy example:
# my data
dd <- data.frame(id = c(1,1,2,2,2,3,3,4, 4, 5,5),
cat = c('c','f','c','d','f','c','f', 'd', 'f', 'f', 'd'))
> dd
id cat
1 1 c
2 1 f
3 2 c
4 2 d
5 2 f
6 3 c
7 3 f
8 4 d
9 4 f
10 5 f
11 5 d
Using paste is a great solution provided by @benson23, but it treats f-d and d-f as separate categories. I would like the order not to matter. Thank you!
Create a "combination" column in summarise, we can count this column afterwards.
An easy way to count the category is to order them at the beginning, then in this case they will all be in the same order.
library(dplyr)
dd %>%
group_by(id) %>%
arrange(id, cat) %>%
summarize(combination = paste0(cat, collapse = "-"), .groups = "drop") %>%
count(combination)
# A tibble: 3 x 2
combination n
<chr> <int>
1 c-d-f 1
2 c-f 2
3 d-f 2
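The same idea (sort within each id, paste, then count) can be sketched in base R without dplyr. The call to unique() is an assumption that goes beyond the answer above: it makes a category that is repeated within an id count only once.

```r
dd <- data.frame(id  = c(1, 1, 2, 2, 2, 3, 3, 4, 4, 5, 5),
                 cat = c('c', 'f', 'c', 'd', 'f', 'c', 'f', 'd', 'f', 'f', 'd'))

# One label per id: sorting first makes "f-d" and "d-f" produce the same string
combo <- tapply(dd$cat, dd$id,
                function(x) paste(sort(unique(x)), collapse = "-"))

# Count how many ids share each label
counts <- table(combo)
counts
```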
Suppose you have a dataframe with ids and the elements assigned to each id. For example:
example <- data.frame(id = c(1,1,1,1,1,2,2,2,3,4,4,4,4,4,4,4,5,5,5,5),
vals = c("a","b",'c','d','e','a','b','d','c',
'd','f','g','h','a','k','l','m', 'a',
'b', 'c'))
I want to find all possible pair combinations. The main struggle here is not which R functions to use, but the logic. How can I iterate through all elements and find patterns? For instance, a was picked together with b 3 times in my sample dataframe. But the original dataframe has more than 30k rows, so I cannot count these combinations manually. How do I automate counting how often each pair of elements is picked together?
I was thinking about widening my df with pivot_wider and then using map_lgl to find matches. The problem is that applying map_lgl to every pair of elements would take a long time.
I asked nearly the same question less than a month ago and fellow users answered it, but the result is not what I really need.
Do you have any ideas how to create a dataframe with all possible combinations of values for all ids?
I understand that this code is slow, but here is another example code to get the expected output based on tidyverse package.
What I do here is first create a nested dataframe by id, then produce all pair combinations for each id, unnest the dataframe, and finally count the pairs.
library(tidyverse)
example <- data.frame(
id = c(1,1,1,1,1,2,2,2,3,4,4,4,4,4,4,4,5,5,5,5),
vals = c("a","b",'c','d','e','a','b','d','c','d','f','g','h','a','k','l','m','a','b', 'c')
)
example %>% nest(dataset=-id) %>% mutate(dataset=map(dataset, function(dataset){
if(nrow(dataset)>1){
dataset %>% .$vals %>% combn(., 2) %>% t() %>% as_tibble(.name_repair=~c("val1", "val2")) %>% return()
}else{
return(NULL)
}
})) %>% unnest(cols=dataset) %>% group_by(val1, val2) %>% summarize(n=n(), .groups="drop") %>% arrange(desc(n), val1, val2)
#> # A tibble: 34 x 3
#> val1 val2 n
#> <chr> <chr> <int>
#> 1 a b 3
#> 2 a c 2
#> 3 a d 2
#> 4 b c 2
#> 5 b d 2
#> 6 a e 1
#> 7 a k 1
#> 8 a l 1
#> 9 b e 1
#> 10 c d 1
#> # … with 24 more rows
Created on 2021-03-04 by the reprex package (v1.0.0)
This won't (can't) be fast for many IDs. If it is too slow, you need to parallelize or implement it in a compiled language (e.g., using Rcpp).
We sort vals. We can then create all combination of two items grouped by ID. We exclude ID's with 1 item. Finally we tabulate the result.
library(data.table)
setDT(example)
setorder(example, id, vals)
example[, if (.N > 1) split(combn(vals, 2), 1:2), by = id][, .N, by = c("1", "2")]
# 1 2 N
# 1: a b 3
# 2: a c 2
# 3: a d 3
# 4: a e 1
# 5: b c 2
# 6: b d 2
# 7: b e 1
#<...>
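If enumerating pairs per id is too slow, the counts can also be obtained from a co-occurrence matrix. This is a sketch of a different technique from the combn-based answers above, not a drop-in replacement: it assumes the id-by-value matrix fits in memory, and it counts each value at most once per id.

```r
example <- data.frame(
  id   = c(1,1,1,1,1,2,2,2,3,4,4,4,4,4,4,4,5,5,5,5),
  vals = c("a","b","c","d","e","a","b","d","c","d",
           "f","g","h","a","k","l","m","a","b","c")
)

# id x value incidence matrix: 1 if the id has the value, 0 otherwise
inc <- (unclass(table(example$id, example$vals)) > 0) * 1

# co[i, j] = number of ids that contain both value i and value j
co <- crossprod(inc)

# Keep each unordered pair once (upper triangle) and drop pairs that never occur
idx <- which(upper.tri(co) & co > 0, arr.ind = TRUE)
pairs <- data.frame(val1 = rownames(co)[idx[, "row"]],
                    val2 = colnames(co)[idx[, "col"]],
                    n    = co[idx])
pairs <- pairs[order(-pairs$n, pairs$val1, pairs$val2), ]
head(pairs)
```

Because the pairs come from the upper triangle of a matrix whose rows and columns are sorted, val1 always sorts before val2, so a-d and d-a are counted as the same pair.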
I have a data frame with three columns. Each row contains three unique numbers between 1 and 5 (inclusive).
df <- data.frame(a=c(1,4,2),
b=c(5,3,1),
c=c(3,1,5))
I want to use mutate to create two additional columns that, for each row, contain the two numbers between 1 and 5 that do not appear in the initial three columns in ascending order. The desired data frame in the example would be:
df2 <- data.frame(a=c(1,4,2),
b=c(5,3,1),
c=c(3,1,5),
d=c(2,2,3),
e=c(4,5,4))
I tried the mutate call below, using setdiff, but it returned NAs rather than the values I was looking for:
df <- df %>% mutate(d=setdiff(c(a,b,c),c(1:5))[1],
e=setdiff(c(a,b,c),c(1:5))[2])
I can get around this by looping through each row (or using an apply function) but would prefer a mutate approach if possible.
Thank you for your help!
Base R:
cbind(df, t(apply(df, 1, setdiff, x = 1:5)))
# a b c 1 2
# 1 1 5 3 2 4
# 2 4 3 1 2 5
# 3 2 1 5 3 4
Warning: if there are any non-numerical columns, apply will happily up-convert things (converting to a matrix internally).
We can use pmap to loop over the rows, create a list column and then unnest it to create two new columns
library(dplyr)
library(purrr)
library(tidyr)
df %>%
mutate(out = pmap(., ~ setdiff(1:5, c(...)) %>%
as.list%>%
set_names(c('d', 'e')))) %>%
unnest_wider(c(out))
# A tibble: 3 x 5
# a b c d e
# <dbl> <dbl> <dbl> <int> <int>
#1 1 5 3 2 4
#2 4 3 1 2 5
#3 2 1 5 3 4
Or using base R
df[c('d', 'e')] <- do.call(rbind, lapply(asplit(df, 1), function(x) setdiff(1:5, x)))
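As for why the attempt in the question returned NA, the small base R sketch below reproduces it outside mutate. Two things go wrong: the arguments to setdiff are the wrong way round, and c(a, b, c) combines the whole columns rather than the values of one row. The name c_col is only used here to avoid masking the c() function.

```r
a <- c(1, 4, 2)
b <- c(5, 3, 1)
c_col <- c(3, 1, 5)

# Inside mutate(), c(a, b, c) is all nine values from the three columns
all_vals <- c(a, b, c_col)

# Nothing in the data falls outside 1:5, so the difference is empty ...
empty <- setdiff(all_vals, 1:5)
# ... and indexing an empty vector gives NA
wrong <- empty[1]

# For a single row, with the arguments the right way round
right <- setdiff(1:5, c(a[1], b[1], c_col[1]))
right
```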
This is a simplified version of a problem involving a large list containing complex tables. I want to extract the tables from the list and apply a function to each one. Here we can create a simple list containing small named data frames:
library(tidyverse)
table_names <- c('dfA', 'dfB', 'dfC')
dfA <- tibble(a = 1:3, b = 4:6, c = 7:9)
dfB <- tibble(a = 10:12, b = 13:15, c = 16:18)
dfC <- tibble(a = 19:21, b = 22:24, c = 25:27)
df_list <- list(dfA, dfB, dfC) %>% setNames(table_names)
Here is a simplified example of the kind of operation I would like to apply:
dfA_mod <- df_list$dfA %>%
mutate(name = 'dfA') %>%
select(name, everything())
In this example, I extract one of three tables in the list df_list$dfA, create a new column with the same value in each row mutate(name = 'dfA'), and re-order the columns so that the new column appears in the left-most position select(name, everything()). The resulting object is assigned to dfA_mod.
To solve the larger problem, I want to use one of the purrr::map() variants to apply the function over the character vector table_names, which was initiated in the first block of code above. The elements of table_names serve two purposes: 1) naming the tables held in the list; and 2) supplying values for the name column in the modified table.
I could write a function such as:
fun <- function(x) {
df_list$x %>%
mutate(name = x) %>%
select(name, everything()) %>%
assign(paste0(x, '_mod'), ., envir = .GlobalEnv)
}
And then use map() to create a new list of modified tables:
new_list <- df_list %>% map(table_name, fun(x))
But of course this code does not work, with the main obstacle being (for me at least) figuring out how to quote and unquote the right terms within the function. I'm a beginner at tidy evaluation, and I could use some help in specifying the function and using map properly.
Here is the desired output (for one modified table):
# A tibble: 3 x 4
name a b c
<chr> <int> <int> <int>
1 dfA 1 4 7
2 dfA 2 5 8
3 dfA 3 6 9
Thanks in advance for any help!
We can use purrr::imap, which passes each element of the list together with its name
library(dplyr)
library(purrr)
df_out <- imap(df_list, ~.x %>% mutate(name = .y) %>% select(name, everything()))
df_out
#$dfA
# A tibble: 3 x 4
# name a b c
# <chr> <int> <int> <int>
#1 dfA 1 4 7
#2 dfA 2 5 8
#3 dfA 3 6 9
#$dfB
# A tibble: 3 x 4
# name a b c
# <chr> <int> <int> <int>
#1 dfB 10 13 16
#....
#....
This gives a list of the desired dataframes. If you want them as separate dataframes, you can do
names(df_out) <- paste0(names(df_out), "_mod")
list2env(df_out, .GlobalEnv)
We can also do it using base R Map
df_out <- Map(function(x, y) transform(x, name = y)[c('name', names(x))],
df_list, names(df_list))
and then rename the list elements and export them in the same way as above.
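The function in the question fails for a simpler reason than tidy evaluation: df_list$x looks for an element literally named "x", whereas df_list[[x]] uses the value stored in x. The sketch below shows the difference with a cut-down list and a hypothetical helper, add_name, that follows the structure of the question's function.

```r
df_list <- list(dfA = data.frame(a = 1:3, b = 4:6),
                dfB = data.frame(a = 10:12, b = 13:15))

x <- "dfA"
is.null(df_list$x)                    # TRUE: there is no element called "x"
identical(df_list[[x]], df_list$dfA)  # TRUE: [[ ]] evaluates x first

# Hypothetical helper: look the table up by name and put `name` first
add_name <- function(x) {
  out <- df_list[[x]]
  out$name <- x
  out[c("name", setdiff(names(out), "name"))]
}

# Iterate over the names; setNames keeps them as names of the result
new_list <- lapply(setNames(names(df_list), names(df_list)), add_name)
new_list$dfA
```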
We can convert it to a single data.frame with map while passing the .id
library(purrr)
map_dfr(df_list, I, .id = 'name')
Or with bind_rows
library(dplyr)
bind_rows(df_list, .id = 'name')
# A tibble: 9 x 4
# name a b c
# <chr> <int> <int> <int>
#1 dfA 1 4 7
#2 dfA 2 5 8
#3 dfA 3 6 9
#4 dfB 10 13 16
#5 dfB 11 14 17
#6 dfB 12 15 18
#7 dfC 19 22 25
#8 dfC 20 23 26
#9 dfC 21 24 27