How to rename last n columns of dataframe with vector of strings, using rename_with or rename?

I'm using rename_at, but since it is superseded, I need a way to rename the last n columns with a vector of strings using rename_with() or rename().
library(tidyverse)
df <- tibble(
a = 1:10,
b = 1:10,
c = 1:10,
d = 1:10,
e = 1:10
)
new_names <- c("1", "2", "4", "5", "10")
df %>%
rename_at(vars(names(.) %>% tail(5)), funs(paste0("", new_names))) # only `funs(new_names)` won't work

Base R approach :
n <- ncol(df)
names(df)[(n-4):n] <- new_names
df
# A tibble: 10 x 5
# `1` `2` `4` `5` `10`
# <int> <int> <int> <int> <int>
# 1 1 1 1 1 1
# 2 2 2 2 2 2
# 3 3 3 3 3 3
# 4 4 4 4 4 4
# 5 5 5 5 5 5
# 6 6 6 6 6 6
# 7 7 7 7 7 7
# 8 8 8 8 8 8
# 9 9 9 9 9 9
#10 10 10 10 10 10

Using rename_with
library(dplyr)
library(stringr)
df %>%
rename_with(~ str_c(., new_names), tail(names(.), 5))
# A tibble: 10 x 5
# a1 b2 c4 d5 e10
# <int> <int> <int> <int> <int>
# 1 1 1 1 1 1
# 2 2 2 2 2 2
# 3 3 3 3 3 3
# 4 4 4 4 4 4
# 5 5 5 5 5 5
# 6 6 6 6 6 6
# 7 7 7 7 7 7
# 8 8 8 8 8 8
# 9 9 9 9 9 9
#10 10 10 10 10 10
Or with rename
df %>%
rename(!!! setNames(tail(names(.), 5), new_names))
Or using rename_at directly on the tail of names
df %>%
rename_at(vars(tail(names(.), 5)), ~ str_c(., new_names))
-output
# A tibble: 10 x 5
# a1 b2 c4 d5 e10
# <int> <int> <int> <int> <int>
# 1 1 1 1 1 1
# 2 2 2 2 2 2
# 3 3 3 3 3 3
# 4 4 4 4 4 4
# 5 5 5 5 5 5
# 6 6 6 6 6 6
# 7 7 7 7 7 7
# 8 8 8 8 8 8
# 9 9 9 9 9 9
#10 10 10 10 10 10
If the goal is just to replace the names:
df %>%
rename_at(vars(tail(names(.), 5)), ~ new_names)
# A tibble: 10 x 5
# `1` `2` `4` `5` `10`
# <int> <int> <int> <int> <int>
# 1 1 1 1 1 1
# 2 2 2 2 2 2
# 3 3 3 3 3 3
# 4 4 4 4 4 4
# 5 5 5 5 5 5
# 6 6 6 6 6 6
# 7 7 7 7 7 7
# 8 8 8 8 8 8
# 9 9 9 9 9 9
#10 10 10 10 10 10
The example has only 5 columns. To rename only the last 3 columns:
df %>%
rename_at(vars(tail(names(.), 3)), ~ str_c(., tail(new_names, 3)))
funs takes a function, so wrapping new_names in paste0 or as.character supplies one, whereas a bare input vector does not. Note that funs() is itself deprecated in current dplyr.
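The rename_with examples above append to the existing names. A minimal sketch of a full replacement with rename_with follows, assuming dplyr 1.0.0 or later; the data and the names "x", "y", "z" are illustrative, not from the question.

```r
library(dplyr)

df <- tibble(a = 1:10, b = 1:10, c = 1:10, d = 1:10, e = 1:10)
new_names <- c("x", "y", "z")   # illustrative replacement names
n <- length(new_names)

# Select the last n columns by name; the lambda ignores the old names
# and returns the replacement vector (it must have length n).
out <- df %>%
  rename_with(~ new_names, .cols = all_of(tail(names(df), n)))

names(out)
```

Because the lambda returns new_names regardless of its input, the number of selected columns has to equal length(new_names), otherwise rename_with raises an error.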

Related

How to get the selected max/min value (i.e. second largest/smallest) across row by dplyr

As the title says, how do I get the second/third largest/smallest value across rows with dplyr? Is there an elegant way to achieve it?
a <- data.frame(gp1=c(3:11), gp2=c(1:9), gp3=c(8,8,2,6,6,6,12,12,6))
## the max/min value is very simple
a %>%
rowwise() %>%
mutate(max1=max(gp1, gp2, gp3))
#
# # A tibble: 9 × 4
# # Rowwise:
# gp1 gp2 gp3 max1
# <int> <int> <dbl> <dbl>
# 1 3 1 8 8
# 2 4 2 8 8
# 3 5 3 2 5
# 4 6 4 6 6
# 5 7 5 6 7
# 6 8 6 6 8
# 7 9 7 12 12
# 8 10 8 12 12
# 9 11 9 6 11
The result should be similar to this:
#
# # A tibble: 9 × 4
# # Rowwise:
# gp1 gp2 gp3 max1 max2
# <int> <int> <dbl> <dbl> <dbl>
# 1 3 1 8 8 3
# 2 4 2 8 8 4
# 3 5 3 2 5 3
# 4 6 4 6 6 6
# 5 7 5 6 7 6
# 6 8 6 6 8 6
# 7 9 7 12 12 9
# 8 10 8 12 12 12
# 9 11 9 6 11 9
You can use c_across along with sort. The use of rev here reverses the sorted data, making it easy to select the largest value with index 1, the second-largest with index 2, etc.
Note that the "max2" column in your expected output contains an error in row 8: the second-largest value there is 10, not 12 (I think you may have included the "max1" column in that case).
a %>%
rowwise() %>%
mutate(
max1 = max(gp1, gp2, gp3),
max2 = rev(sort(c_across(c(gp1, gp2, gp3))))[2]
)
gp1 gp2 gp3 max1 max2
<int> <int> <dbl> <dbl> <dbl>
1 3 1 8 8 3
2 4 2 8 8 4
3 5 3 2 5 3
4 6 4 6 6 6
5 7 5 6 7 6
6 8 6 6 8 6
7 9 7 12 12 9
8 10 8 12 12 10
9 11 9 6 11 9
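The same idea generalizes to any rank. The sketch below wraps the sort in a small helper; nth_largest is a hypothetical name introduced here for illustration, and ties are counted as separate values, as in the answer above.

```r
library(dplyr)

a <- data.frame(gp1 = 3:11, gp2 = 1:9, gp3 = c(8, 8, 2, 6, 6, 6, 12, 12, 6))

# Hypothetical helper: k-th largest element of x (ties count separately)
nth_largest <- function(x, k) sort(x, decreasing = TRUE)[k]

res <- a %>%
  rowwise() %>%
  mutate(
    max2 = nth_largest(c_across(gp1:gp3), 2),
    max3 = nth_largest(c_across(gp1:gp3), 3)
  ) %>%
  ungroup()

res
```

With three columns, max3 is simply the row minimum; for the k-th smallest value, sort in increasing order instead.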
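A solution with pmap_dbl which does not involve rowwise (pmap_dbl returns a numeric vector, whereas plain pmap would create a list column):
library(dplyr)
library(purrr)
a %>%
mutate(max1 = pmax(gp1, gp2, gp3),
max2 = pmap_dbl(., ~ rev(sort(c(..1, ..2, ..3)))[2]))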
gp1 gp2 gp3 max1 max2
1 3 1 8 8 3
2 4 2 8 8 4
3 5 3 2 5 3
4 6 4 6 6 6
5 7 5 6 7 6
6 8 6 6 8 6
7 9 7 12 12 9
8 10 8 12 12 10
9 11 9 6 11 9
I am sure there is a shorter way to automate it, but here is a quick solution for now:
library(dplyr)
library(tidyr) # for unnest_wider()
library(slider)
a %>%
rowwise() %>%
mutate(output = list(slide_dfc(sort(c_across(everything()), decreasing = TRUE), max, .before = 1, .complete = TRUE))) %>%
unnest_wider(output) %>%
rename_with(~ sub('\\.+(\\d)', 'Max_\\1', .), contains('.')) %>%
suppressMessages()
# A tibble: 9 × 5
gp1 gp2 gp3 Max_1 Max_2
<int> <int> <dbl> <dbl> <dbl>
1 3 1 8 8 3
2 4 2 8 8 4
3 5 3 2 5 3
4 6 4 6 6 6
5 7 5 6 7 6
6 8 6 6 8 6
7 9 7 12 12 9
8 10 8 12 12 10
9 11 9 6 11 9
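An option with pmax. Note that this blanks out every value equal to the row maximum, so a tied maximum is skipped (row 4 yields 4 rather than 6):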
library(dplyr)
a %>%
mutate(max1 = do.call(pmax, across(everything())),
across(starts_with('gp'), ~ replace(.x, .x == max1, NA))) %>%
transmute(max2 = do.call(pmax, c(across(starts_with('gp')), na.rm = TRUE))) %>%
bind_cols(a, .)
-output
gp1 gp2 gp3 max2
1 3 1 8 3
2 4 2 8 4
3 5 3 2 3
4 6 4 6 4
5 7 5 6 6
6 8 6 6 6
7 9 7 12 9
8 10 8 12 10
9 11 9 6 9
Or in base R
a$max2 <- do.call(pmax, c(replace(a, cbind(seq_len(nrow(a)),
max.col(a, 'first')), NA), na.rm = TRUE))
a$max2
[1] 3 4 3 6 6 6 9 10 9

join columns recursively in R

Hello, I have a data frame of 245 columns. To add up some sets of columns and generate new columns, I tried to do it recursively as follows:
cl1<-sample(1:4,10,replace=TRUE)
cl2<-sample(1:4,10,replace=TRUE)
cl3<-sample(1:4,10,replace=TRUE)
cl4<-sample(1:4,10,replace=TRUE)
cl5<-sample(1:4,10,replace=TRUE)
cl6<-sample(1:4,10,replace=TRUE)
dat<-data.frame(cl1,cl2,cl3,cl4,cl5,cl6)
My intention is to add column 1 to columns 3 and 5, and likewise column 2 to columns 4 and 6, ending up with a data frame of two columns.
It should give me something like that.
I have programmed the following code:
revisar<- function(a){
todos = list()
i=1
j=3
l=5
k=1
while(i<=2 ){
cl<-a[,i]
cl2<-a[,j]
cl3<-a[,l]
cl[is.na(cl)] <- 0
cl2[is.na(cl2)] <- 0
cl3[is.na(cl3)] <- 0
colu<-cl+cl2+cl3
col<-cbind(colu,colu)
i<-i+1
j<-j+1
l<-l+1
k<-k+1
}
return(col)
}
It turns out that it only returns column 2 repeated twice, and I must replicate the same thing to join those 245 columns.
I would like to know what is failing in the example.
base R
Literal programming:
with(dat, data.frame(s1 = cl1+cl3+cl5, s2 = cl2+cl4+cl6))
# s1 s2
# 1 7 11
# 2 7 7
# 3 4 11
# 4 4 10
# 5 9 8
# 6 12 5
# 7 7 6
# 8 7 10
# 9 4 9
# 10 6 5
Programmatically,
L <- list(s1 = c(1,3,5), s2 = c(2,4,6))
out <- data.frame(lapply(L, function(z) do.call(rowSums, list(as.matrix(dat[,z])))))
out
# s1 s2
# 1 7 11
# 2 7 7
# 3 4 11
# 4 4 10
# 5 9 8
# 6 12 5
# 7 7 6
# 8 7 10
# 9 4 9
# 10 6 5
dplyr
library(dplyr)
dat %>%
transmute(
s1 = rowSums(cbind(cl1, cl3, cl5)),
s2 = rowSums(cbind(cl2, cl4, cl6))
)
or programmatically using purrr:
purrr::map_dfc(L, ~ rowSums(dat[, .]))
Data
set.seed(42)
# your `dat` above
Here is an alternative general approach.
Here we sum all odd-numbered columns into s1 and
all even-numbered columns into s2:
library(dplyr)
dat %>%
rowwise() %>%
mutate(s1 = sum(c_across(seq(1,ncol(dat),2)), na.rm = TRUE),
s2 = sum(c_across(seq(2,ncol(dat),2)), na.rm = TRUE))
cl1 cl2 cl3 cl4 cl5 cl6 s1 s2
<int> <int> <int> <int> <int> <int> <int> <int>
1 1 1 3 2 3 2 7 5
2 2 4 1 4 2 3 5 11
3 2 2 2 2 1 3 5 7
4 2 4 4 3 1 4 7 11
5 2 4 4 3 2 2 8 9
6 3 3 3 2 2 2 8 7
7 2 1 1 2 1 4 4 7
8 2 4 1 3 2 3 5 10
9 3 1 1 2 3 4 7 7
10 2 4 1 3 4 4 7 11
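As for what fails in the question's loop: col is overwritten on every pass, and cbind(colu, colu) binds the same sum to itself, so only the last iteration's column survives, twice. A minimal sketch of a repaired version follows, assuming columns are grouped by position modulo the number of groups; the n_groups parameter and the s1, s2 names are illustrative, not from the question.

```r
set.seed(42)
dat <- as.data.frame(replicate(6, sample(1:4, 10, replace = TRUE)))
names(dat) <- paste0("cl", 1:6)

revisar <- function(a, n_groups = 2) {
  # One slot per output column, filled inside the loop instead of overwritten
  todos <- vector("list", n_groups)
  for (i in seq_len(n_groups)) {
    idx <- seq(i, ncol(a), by = n_groups)   # i = 1 -> 1, 3, 5; i = 2 -> 2, 4, 6
    block <- as.matrix(a[, idx, drop = FALSE])
    block[is.na(block)] <- 0                # treat NA as 0, as in the question
    todos[[i]] <- rowSums(block)
  }
  names(todos) <- paste0("s", seq_len(n_groups))
  as.data.frame(todos)
}

out <- revisar(dat)
out
```

Storing each sum in its own list slot is what keeps the first group's result from being lost when the second is computed.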

Is there an R function which can pass elements of lists as arguments without specifying individual elements

Is there an R function which can pass all the elements of a list as the arguments of a function?
library(tidyr)
a <- c(1,2,3)
b <- c(4,5,6)
c <- c(7,8,9)
d <- list(a,b,c)
crossing(d[[1]],d[[2]],d[[3]])
Instead of specifying d[[1]],d[[2]],d[[3]], I'd like to just pass d.
Expected result:
> crossing(d[[1]],d[[2]],d[[3]])
# A tibble: 27 x 3
`d[[1]]` `d[[2]]` `d[[3]]`
<dbl> <dbl> <dbl>
1 1 4 7
2 1 4 8
3 1 4 9
4 1 5 7
5 1 5 8
6 1 5 9
7 1 6 7
8 1 6 8
9 1 6 9
10 2 4 7
# ... with 17 more rows
You can use do.call, which executes a function call from a function (or its name) and a list of arguments to be passed to it.
c(d[[1]],d[[2]],d[[3]])
#[1] 1 2 3 4 5 6 7 8 9
do.call("c", d)
#[1] 1 2 3 4 5 6 7 8 9
And for crossing, which needs unique (non-duplicated) column names:
library(tidyr)
names(d) <- seq_along(d)
do.call(crossing, d)
## A tibble: 27 x 3
# `1` `2` `3`
# <dbl> <dbl> <dbl>
# 1 1 4 7
# 2 1 4 8
# 3 1 4 9
# 4 1 5 7
# 5 1 5 8
# 6 1 5 9
# 7 1 6 7
# 8 1 6 8
# 9 1 6 9
#10 2 4 7
## … with 17 more rows
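do.call also accepts named list elements, which are matched to the function's named arguments, and extra arguments can be appended to the list with c(). A small base R sketch, assuming illustrative element names x, y, z:

```r
d <- list(x = c(1, 2, 3), y = c(4, 5, 6), z = c(7, 8, 9))

# Names of the list become argument (and here column) names
grid <- do.call(expand.grid, d)
dim(grid)

# Extra named arguments can be added to the argument list
pasted <- do.call(paste, c(d, sep = "-"))
pasted
```

Here expand.grid stands in for crossing so that the example needs no packages; it produces the same 27 combinations, though in a different row order.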

R how to fill in NA with rules

data=data.frame(person=c(1,1,1,2,2,2,2,3,3,3,3),
t=c(3,NA,9,4,7,NA,13,3,NA,NA,12),
WANT=c(3,6,9,4,7,10,13,3,6,9,12))
So basically I want to create a new variable 'WANT' that takes the previous value in t and adds 3 to it; if there are several NAs in a row, it keeps doing this. My attempt is:
library(dplyr)
data %>%
group_by(person) %>%
mutate(WANT_TRY = fill(t) + 3)
Here's one way -
data %>%
group_by(person) %>%
mutate(
# cs = cumsum(!is.na(t)), # creates index for reference value; uncomment if interested
w = case_when(
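# rle() gives the run lengths of NA / non-NA stretches; sequence() numbers positions within each run
# t[!is.na(t)][cumsum(!is.na(t))] is the last observed value (assumes each group starts with a non-NA)
is.na(t) ~ t[!is.na(t)][cumsum(!is.na(t))] + 3 * sequence(rle(is.na(t))$lengths),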
TRUE ~ t
)
) %>%
ungroup()
# A tibble: 11 x 4
person t WANT w
<dbl> <dbl> <dbl> <dbl>
1 1 3 3 3
2 1 NA 6 6
3 1 9 9 9
4 2 4 4 4
5 2 7 7 7
6 2 NA 10 10
7 2 13 13 13
8 3 3 3 3
9 3 NA 6 6
10 3 NA 9 9
11 3 12 12 12
Here is another way. We can do linear interpolation on t with the imputeTS package. This matches the add-3 rule here only because the observed values happen to be evenly spaced.
library(dplyr)
library(imputeTS)
data2 <- data %>%
group_by(person) %>%
mutate(WANT2 = na_interpolation(t)) %>%
ungroup()
data2
# # A tibble: 11 x 4
# person t WANT WANT2
# <dbl> <dbl> <dbl> <dbl>
# 1 1 3 3 3
# 2 1 NA 6 6
# 3 1 9 9 9
# 4 2 4 4 4
# 5 2 7 7 7
# 6 2 NA 10 10
# 7 2 13 13 13
# 8 3 3 3 3
# 9 3 NA 6 6
# 10 3 NA 9 9
# 11 3 12 12 12
This is harder than it seems because of the double NA at the end. If it weren't for that, then the following:
ifelse(is.na(data$t), c(0, data$t[-nrow(data)])+3, data$t)
...would give you what you want. The simplest way, which uses the same logic but doesn't look very clever (sorry!), would be:
.impute <- function(x) ifelse(is.na(x), c(0, x[-length(x)])+3, x)
.impute(.impute(data$t))
...which just cheats by doing it twice. Does that help?
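You can use functional programming from purrr and "NA-safe" addition from hablar. Note that the formula ignores the next element of t, so each group is simply its first value plus successive multiples of 3; this reproduces WANT here only because the observed values also step by 3: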
library(hablar)
library(dplyr)
library(purrr)
data %>%
group_by(person) %>%
mutate(WANT2 = accumulate(t, ~.x %plus_% 3))
Result
# A tibble: 11 x 4
# Groups: person [3]
person t WANT WANT2
<dbl> <dbl> <dbl> <dbl>
1 1 3 3 3
2 1 NA 6 6
3 1 9 9 9
4 2 4 4 4
5 2 7 7 7
6 2 NA 10 10
7 2 13 13 13
8 3 3 3 3
9 3 NA 6 6
10 3 NA 9 9
11 3 12 12 12
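The rule can also be written out literally as a loop, which keeps observed values and only fills the gaps. This is a base R sketch; fill_step is a hypothetical helper name, and it assumes each person's first value of t is not NA.

```r
data <- data.frame(
  person = c(1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3),
  t      = c(3, NA, 9, 4, 7, NA, 13, 3, NA, NA, 12)
)

# Hypothetical helper: replace each NA with the previous (possibly just
# filled) value plus `step`; observed values are left untouched.
fill_step <- function(x, step = 3) {
  for (i in seq_along(x)[-1]) {
    if (is.na(x[i])) x[i] <- x[i - 1] + step
  }
  x
}

# ave() applies the helper within each person
data$WANT <- ave(data$t, data$person, FUN = fill_step)
data$WANT
```

Because each filled value feeds the next iteration, runs of consecutive NAs are handled without a second pass.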

R: Summing a sequence of columns row-wise with dplyr

In the spirit of similar questions along these lines here and here, I would like to be able to sum across a sequence of columns in my data_frame & create a new column:
library(dplyr)
df_abc = tibble( # data_frame() is deprecated in favour of tibble()
FJDFjdfF = seq(1:100),
FfdfFxfj = seq(1:100),
orfOiRFj = seq(1:100),
xDGHdj = seq(1:100),
jfdIDFF = seq(1:100),
DJHhhjhF = seq(1:100),
KhjhjFlFLF = seq(1:100),
IgiGJIJFG= seq(1:100),
)
# this does what I want
df_abc %>%
mutate(
sum_1 = orfOiRFj + xDGHdj + jfdIDFF + DJHhhjhF
)
Clearly, if there are a lot of variables in this sequence, typing them out is not feasible. Also, the names of the variables are not regex-friendly, so cannot be selected by a rule, other than the fact that they occur in a sequence.
I am hoping that there exists an abstraction in the tidyverse, that allows something like:
df_abc %>%
mutate(
sum_1 = sum(orfOiRFj:DJHhhjhF)
)
Thanks.
You can use rowSums to do that:
# option 1
df_abc %>% mutate(sum_1 = rowSums(.[3:6]))
# option 2
df_abc %>% mutate(sum_1 = rowSums(select(.,orfOiRFj:DJHhhjhF)))
The result:
# A tibble: 100 x 9
FJDFjdfF FfdfFxfj orfOiRFj xDGHdj jfdIDFF DJHhhjhF KhjhjFlFLF IgiGJIJFG sum_1
<int> <int> <int> <int> <int> <int> <int> <int> <dbl>
1 1 1 1 1 1 1 1 1 4
2 2 2 2 2 2 2 2 2 8
3 3 3 3 3 3 3 3 3 12
4 4 4 4 4 4 4 4 4 16
5 5 5 5 5 5 5 5 5 20
6 6 6 6 6 6 6 6 6 24
7 7 7 7 7 7 7 7 7 28
8 8 8 8 8 8 8 8 8 32
9 9 9 9 9 9 9 9 9 36
10 10 10 10 10 10 10 10 10 40
# ... with 90 more rows
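With dplyr 1.0.0 or later, the column range can also be selected with across() inside mutate, which avoids the `.` placeholder. A minimal sketch on the question's data (written with 1:100 instead of seq(1:100), which is equivalent):

```r
library(dplyr)

df_abc <- tibble(
  FJDFjdfF = 1:100, FfdfFxfj = 1:100, orfOiRFj = 1:100, xDGHdj = 1:100,
  jfdIDFF = 1:100, DJHhhjhF = 1:100, KhjhjFlFLF = 1:100, IgiGJIJFG = 1:100
)

# across() without a function returns the selected columns as a tibble,
# which rowSums() then sums row by row
res <- df_abc %>%
  mutate(sum_1 = rowSums(across(orfOiRFj:DJHhhjhF)))

head(res$sum_1)
```

rowSums is vectorized over rows, so this tends to be much faster than a rowwise() plus sum(c_across(...)) formulation on large data.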
