I want to rename many columns. Currently I repeat the rename statement for each column:
df <- data.frame(col1 = 1:4, col2 = c('a', 'b', 'c', 'd'), col3 = rep(1,4))
df %>%
rename(col1_new = col1) %>%
rename(col2_new = col2) %>%
rename(col3_new = col3)
How do I avoid the duplication of the rename statement? Is there a solution using functional programming with R?
It is easier to use setNames than rename here:
df %>%
setNames(., paste0(names(.), "_new"))
# col1_new col2_new col3_new
#1 1 a 1
#2 2 b 1
#3 3 c 1
#4 4 d 1
If the steps do not all have to happen inside the %>% chain, an easier and more general approach is
colnames(df) <- paste0(colnames(df), "_new")
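If the pipe should be kept and only some columns need the suffix, dplyr's rename_with (available from dplyr 1.0.0) may be a closer match to the "functional" style asked about. A minimal sketch, assuming the same df as in the question; the names renamed_all and renamed_some are only illustrative:

```r
library(dplyr)

df <- data.frame(col1 = 1:4, col2 = c("a", "b", "c", "d"), col3 = rep(1, 4))

# Append "_new" to every column name in a single call
renamed_all <- df %>% rename_with(~ paste0(.x, "_new"))

# Restrict the renaming to selected columns through the .cols argument
renamed_some <- df %>% rename_with(~ paste0(.x, "_new"), .cols = c(col1, col2))

names(renamed_all)
names(renamed_some)
```

The function passed to rename_with receives the selected column names as a character vector and must return a vector of the same length.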
Related
My data frame is as such:
#generic dataset
datatest <- data.frame(col1 = c(1,2,3,4), col2 = c('A', 'B', 'C', 'D'))
#character objects
name1 <- 'A'
name2 <- 'B'
I want to rename my columns using the name1 and name2 objects. These dynamically change in the code so I can't use the following:
#I DON'T WANT THIS
datatest %>% rename(A = col1, B = col2)
I want to use this:
datatest %>% rename(name1 = col1, name2 = col2)
but then the data table columns end up becoming 'name1' and 'name2' respectively, when they should be A and B. Here is the data table at the moment.
name1 (I want this to be A)   name2 (I want this to be B)
1                             A
2                             B
3                             C
4                             D
Any help is hugely appreciated. I have the same issue with kable tables too.
Thanks in advance!
A couple of options -
Using rename_with -
library(dplyr)
name1 <- 'A'
name2 <- 'B'
datatest %>% rename_with(~c(name1, name2), c(col1, col2))
#If there are only two columns in datatest
datatest %>% rename_with(~c(name1, name2))
# A B
#1 1 A
#2 2 B
#3 3 C
#4 4 D
Use a named vector
name <- c(A = 'col1', B = 'col2')
datatest %>% rename(!!name)
You may try
datatest %>% rename({{name1}} := col1, {{name2}} := col2)
A B
1 1 A
2 2 B
3 3 C
4 4 D
Here is one more option using !!!setNames
datatest %>%
rename(!!!setNames(names(.), c(name1, name2)))
A B
1 1 A
2 2 B
3 3 C
4 4 D
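The named-vector idea can also be written with all_of(), which the dplyr documentation shows for renaming from a lookup vector. A minimal sketch, assuming the datatest, name1 and name2 objects from the question; lookup and renamed are illustrative names:

```r
library(dplyr)

datatest <- data.frame(col1 = c(1, 2, 3, 4), col2 = c("A", "B", "C", "D"))
name1 <- "A"
name2 <- "B"

# Named character vector: names are the new column names, values the old ones
lookup <- setNames(c("col1", "col2"), c(name1, name2))

# all_of() makes it explicit that lookup is an external vector of column names
renamed <- datatest %>% rename(all_of(lookup))

names(renamed)
```

Because the lookup is built at run time, the same call keeps working when name1 and name2 change elsewhere in the code.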
Data:
ID  B   C
1   NA  x
2   x   NA
3   x   x
Results:
ID  Unified
1   C
2   B
3   B_C
I'm trying to combine columns B and C using mutate and unite, but how would I scale this up so that I can reuse it for many columns (think 100+) instead of writing out the variables each time? Or is there a built-in function that already does this?
My current solution is this:
library(tidyverse)
Data %>%
mutate(B = replace(B, B == 'x', 'B'), C = replace(C, C == 'x', 'C')) %>%
unite("Unified", B:C, na.rm = TRUE, remove= TRUE)
We may use across to loop over the columns and replace each value equal to 'x' with the name of its column (cur_column()):
library(dplyr)
library(tidyr)
Data %>%
mutate(across(B:C, ~ replace(., .== 'x', cur_column()))) %>%
unite(Unified, B:C, na.rm = TRUE, remove = TRUE)
-output
ID Unified
1 1 C
2 2 B
3 3 B_C
data
Data <- structure(list(ID = 1:3, B = c(NA, "x", "x"), C = c("x", NA,
"x")), class = "data.frame", row.names = c(NA, -3L))
Here are a couple of options.
Using dplyr -
library(dplyr)
cols <- names(Data)[-1]
Data %>%
rowwise() %>%
mutate(Unified = paste0(cols[!is.na(c_across(B:C))], collapse = '_')) %>%
ungroup -> Data
Data
# ID B C Unified
# <int> <chr> <chr> <chr>
#1 1 NA x C
#2 2 x NA B
#3 3 x x B_C
Base R
Data$Unified <- apply(Data[cols], 1, function(x)
paste0(cols[!is.na(x)], collapse = '_'))
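To reuse the across/unite approach for an arbitrary set of columns, the two steps could be wrapped in a small function that takes the column names as a character vector. This is only a sketch: unite_flags and its arguments are hypothetical names, not part of dplyr or tidyr, and it assumes the flagged columns are character columns.

```r
library(dplyr)
library(tidyr)

Data <- data.frame(ID = 1:3, B = c(NA, "x", "x"), C = c("x", NA, "x"),
                   stringsAsFactors = FALSE)

# Hypothetical helper: flag_cols is a character vector of column names
unite_flags <- function(data, flag_cols, new_col = "Unified", flag = "x") {
  data %>%
    # Replace every flag value with the name of the column it sits in
    mutate(across(all_of(flag_cols), ~ replace(.x, .x == flag, cur_column()))) %>%
    # Paste the non-missing column names together; !! unquotes the new name
    unite(!!new_col, all_of(flag_cols), na.rm = TRUE, remove = TRUE)
}

unified <- unite_flags(Data, c("B", "C"))
unified
```

With 100+ columns, flag_cols could be built programmatically, for example with setdiff(names(Data), "ID").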
I'm trying to iterate over the rows of a data frame and access the values in the columns of each row. Perhaps I need a paradigm shift; I've also attempted a vectorized approach. My ultimate objective is to use specific column values in each row to filter another data frame.
Any help would be appreciated.
df <- data.frame(a = 1:3, b = letters[24:26], c = 7:9)
f <- function(row) {
var1 <- row$a
var2 <- row$b
var3 <- row$c
}
pmap(df, f)
Is there a way to do this in purrr?
Using pmap, we can do
library(purrr)
pmap(df, ~ f(list(...)))
#[[1]]
#[1] 7
#[[2]]
#[1] 8
#[[3]]
#[1] 9
Or use rowwise with cur_data (deprecated since dplyr 1.1.0 in favor of pick())
library(dplyr)
df %>%
rowwise %>%
transmute(new = f(cur_data()))
-output
# A tibble: 3 x 1
# Rowwise:
# new
# <int>
#1 7
#2 8
#3 9
library(tidyverse)
df <- data.frame(a = 1:3, b = letters[24:26], c = 7:9)
f <- function(row) {
var1 <- row$a
var2 <- row$b
var3 <- row$c
}
df %>%
split(rownames(.)) %>%
map(
~f(.x)
)
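Another way to read the pmap approach: pmap passes each column of the data frame as a named argument, so a function whose parameters are named after the columns gets the row values directly, with no row$... lookups. A minimal sketch, assuming the df from the question; describe_row and row_labels are illustrative names:

```r
library(purrr)

df <- data.frame(a = 1:3, b = letters[24:26], c = 7:9, stringsAsFactors = FALSE)

# Parameter names match the column names, so pmap supplies them per row
describe_row <- function(a, b, c) {
  paste(a, b, c)
}

# pmap_chr collects the per-row results into a character vector
row_labels <- pmap_chr(df, describe_row)
row_labels
```

If the data frame has more columns than the function needs, adding ... to the function's signature lets it ignore the extras.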
I am looking to find common cases across groups in R, based on a tidy data set.
I could split the data sets and then join them, or use Reduce, but that seems laborious and I am sure there must be a way to do this easily for tidy data, likely using dplyr and group_by().
Here is an example:
data <- data.frame(case = c('A', 'B', 'C', 'D', 'B', 'C', 'D', 'E'),
var = c(rep(1,4), rep(2, 4)))
case var
1 A 1
2 B 1
3 C 1
4 D 1
5 B 2
6 C 2
7 D 2
8 E 2
What I want is the cases common across variables: 'B', 'C', 'D'. I am thinking this should be easy but can't find an answer.
Group by case, then grab the first row for those cases that have the correct number of occurrences.
library(dplyr)
data %>%
group_by(case) %>%
slice(which(n_distinct(var) == n_distinct(.$var))[1])
After grouping by 'case', filter the groups whose number of distinct values in 'var' equals the number of distinct values of 'var' in the whole data set, then ungroup and keep the distinct 'case' values.
library(dplyr)
data %>%
group_by(case) %>%
filter(n_distinct(var) == n_distinct(.$var)) %>%
ungroup %>%
distinct(case)
# A tibble: 3 x 1
# case
# <fct>
#1 B
#2 C
#3 D
Or using data.table
library(data.table)
setDT(data)[, .GRP[uniqueN(var) == uniqueN(data$var)], case]$case
#[1] B C D
Or using base R
with(data, names(Filter(function(x) all(unique(var) %in% x), split(var, case))))
#[1] "B" "C" "D"
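For comparison, the Reduce route mentioned in the question can be fairly short in base R: split the cases by 'var' and intersect the resulting vectors. A minimal sketch, assuming 'case' is stored as character (stringsAsFactors = FALSE is set explicitly for older R versions); cases_by_var and common_cases are illustrative names:

```r
data <- data.frame(case = c("A", "B", "C", "D", "B", "C", "D", "E"),
                   var = c(rep(1, 4), rep(2, 4)),
                   stringsAsFactors = FALSE)

# One character vector of cases per value of var
cases_by_var <- split(data$case, data$var)

# Intersect the vectors pairwise until only the common cases remain
common_cases <- Reduce(intersect, cases_by_var)
common_cases
```

This works for any number of groups, since Reduce folds intersect over however many vectors split returns.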
I have a list of data frames to which I am trying to apply a script that works for a single data frame.
Part of the script uses both piping and group_by:
df2 <- df1 %>%
group_by (col1) %>%
summarise(newcol = sum(col2))
I've tried various loops and variations with lapply but haven't been able to find a way for it to work with a list of data frames, where it would be something along the lines of:
mylist2 <- mylist1 %>%
group_by (col1) %>%
summarise(newcol = sum(col2))
But obviously changed around to work with loops or lapply. I'm probably missing something simple here but would appreciate some help. Thanks
PS - I looked at providing the data from the lists but wasn't able to provide reproducible samples.
Here is a tidyverse way.
# generate some data
mylist1 <- replicate(2, data.frame(col1 = rep(letters[1:2], 2),
col2 = 1:4),
simplify = FALSE)
library(purrr)
library(dplyr)
mylist1 %>%
map(., ~ group_by(., col1) %>%
summarise(new_col = sum(col2)))
#[[1]]
# A tibble: 2 x 2
# col1 new_col
# <fct> <int>
#1 a 4
#2 b 6
#[[2]]
# A tibble: 2 x 2
# col1 new_col
# <fct> <int>
#1 a 4
#2 b 6
In base R you might try lapply and tapply
lapply(mylist1, function(x)
tapply(X = x[["col2"]], INDEX = x[["col1"]], FUN = 'sum'))
#[[1]]
#a b
#4 6
#[[2]]
#a b
#4 6
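Since the question mentions lapply specifically, the original dplyr pipeline can also be kept as is by wrapping it in a function and applying that function to each list element. A minimal sketch, assuming the mylist1 generated above; summarise_one is an illustrative name:

```r
library(dplyr)

mylist1 <- replicate(2, data.frame(col1 = rep(letters[1:2], 2),
                                   col2 = 1:4),
                     simplify = FALSE)

# The single-data-frame script, wrapped so it can be applied per element
summarise_one <- function(d) {
  d %>%
    group_by(col1) %>%
    summarise(newcol = sum(col2))
}

# lapply returns a list with one summarised tibble per input data frame
mylist2 <- lapply(mylist1, summarise_one)
mylist2
```

The pipe in the question fails because group_by receives the whole list rather than a data frame; lapply hands it one data frame at a time.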