Achieving the equivalent of rbind using tidyr [duplicate] - r

This question already has answers here:
tidyverse pivot_longer several sets of columns, but avoid intermediate mutate_wider steps [duplicate]
(3 answers)
Closed 2 years ago.
I have some data that looks like this.
set.seed(1)
df <- data.frame(group = rep(letters[1:2],each=3),
day = rep(1:3,2),
var1_mean = round(rnorm(6),2),
var1_sd = round(rnorm(6,5),2),
var2_mean = round(rnorm(6),2),
var2_sd = round(rnorm(6,5),2))
df
# output
# group day var1_mean var1_sd var2_mean var2_sd
# a 1 -0.63 5.49 -0.62 5.82
# a 2 0.18 5.74 -2.21 5.59
# a 3 -0.84 5.58 1.12 5.92
# b 1 1.60 4.69 -0.04 5.78
# b 2 0.33 6.51 -0.02 5.07
# b 3 -0.82 5.39 0.94 3.01
Now here is what I would like it to look like (and the code I used to get there)
library(tidyverse)
rbind(df %>% select(group, day, starts_with("var1")) %>% rename(mean = var1_mean, sd = var1_sd),
df %>% select(group, day, starts_with("var2")) %>% rename(mean = var2_mean, sd = var2_sd)) %>%
add_column(var = rep(paste0("var",1:2),each=6), .before = "group")
# output
# var group day mean sd
# var1 a 1 -0.63 5.49
# var1 a 2 0.18 5.74
# var1 a 3 -0.84 5.58
# var1 b 1 1.60 4.69
# var1 b 2 0.33 6.51
# var1 b 3 -0.82 5.39
# var2 a 1 -0.62 5.82
# var2 a 2 -2.21 5.59
# var2 a 3 1.12 5.92
# var2 b 1 -0.04 5.78
# var2 b 2 -0.02 5.07
# var2 b 3 0.94 3.01
Now my code obviously gets the job done but I was wondering if there is some way to use pivot_longer() or some other function to do it less clunkily.

We can use pivot_longer where we specify the names_sep as _ and the names_to with ".value" and a grouping name
library(dplyr)
library(tidyr)
df %>%
pivot_longer(cols = starts_with('var'),
names_to = c('grp', '.value'), names_sep="_")
# group day grp mean sd
# <chr> <int> <chr> <dbl> <dbl>
# 1 a 1 var1 -0.63 5.49
# 2 a 1 var2 -0.62 5.82
# 3 a 2 var1 0.18 5.74
# 4 a 2 var2 -2.21 5.59
# 5 a 3 var1 -0.84 5.58
# 6 a 3 var2 1.12 5.92
# 7 b 1 var1 1.6 4.69
# 8 b 1 var2 -0.04 5.78
# 9 b 2 var1 0.33 6.51
#10 b 2 var2 -0.02 5.07
#11 b 3 var1 -0.82 5.39
#12 b 3 var2 0.94 3.01
we could remove the 'grp' column later
df %>%
pivot_longer(cols = starts_with('var'),
names_to = c('grp', '.value'), names_sep="_") %>%
select(-grp)

Related

Converting a list of named list to data frame

I have a list returned by sapply which looks like this:
> my_list
[,1] [,2] [,3] [,4]
val 1.73 2.73 4.71 5.27
cost 10.1 8.71 9.95 0.01
time 5.36 5.84 5.68 2.10
I'd like to convert it into a data frame:
id
val
cost
time
1
1.73
10.1
5.36
2
2.73
8.71
5.84
3
4.71
9.95
5.68
4
5.27
0.01
2.10
How can I transform the list into the data frame this way?
Edit: Here is the output of dput(my_list):
structure(list(1.73, 10.1, 5.36,2.73,8.71,5.84,
4.71,9.95,5.68, 5.27, 0.01, 2.10),
dim = c(3L, 4L), dimnames = list(c("val",
"cost", "time"), NULL))
Use t
t(dat) |>
transform(id = seq(ncol(dat)))
val cost time id
1 1.73 10.1 5.36 1
2 2.73 8.71 5.84 2
3 4.71 9.95 5.68 3
4 5.27 0.01 2.1 4
The elements in the data are list elements, if we want to make it regular vectors, an option is unnest
library(dplyr)
library(tidyr)
t(my_list) %>%
as_tibble %>%
unnest(where(is.list)) %>%
mutate(id = row_number(), .before = 1)
-output
# A tibble: 4 × 4
id val cost time
<int> <dbl> <dbl> <dbl>
1 1 1.73 10.1 5.36
2 2 2.73 8.71 5.84
3 3 4.71 9.95 5.68
4 4 5.27 0.01 2.1

creating a data.frame from a larger data.frame based on index values

I have two small data.frames (d1&d2) below. In d1, the column post varies across the rows (length(unique(d1$post)) == 1L gives FALSE).
From d1, I wonder how to form the following data.frame (ALWAYS, items with 1 suffix (ex. mpre1) are from control==F subset of the dataframe & items with 2 suffix (ex. mpre2) are from control==T subset):
# Desired output from `d1` (4 rows x 6 columns):
# mpre1 sdpre1 n1 mpre2 sdpre2 n2
#1 0.31 0.39 20 0.23 0.39 18 ##group=1,control=F&T,outcome=1
#2 3.54 1.21 20 3.08 1.57 18 ##group=1,control=F&T,outcome=2
#3 0.16 0.27 19 0.23 0.39 18 ##group=2,control=F&T,outcome=1
#4 2.85 1.99 19 3.08 1.57 18 ##group=2,control=F&T,outcome=2
In d2, the column post does NOT vary across the rows (length(unique(d2$post)) == 1L gives TRUE). From d2, I wonder how to form the following data.frame:
# Desired output from `d2`(4 rows x 6 columns):
# mpre1 sdpre1 n1 mpre2 sdpre2 n2
#1 81.6 10.8 73 80.50 11.20 80 ##group=1,control=F&T,outcome=1
#2 85.7 13.7 66 90.30 6.60 74 ##group=1,control=F&T,outcome=2
#3 81.4 10.9 72 80.50 11.20 80 ##group=2,control=F&T,outcome=1
#4 90.4 8.2 61 90.30 6.60 74 ##group=2,control=F&T,outcome=2
The index values (for group & outcome) to extract the above vectors from either d1 or d2 are given by (I mean for d1 put d1 for d2 put d2):
with(subset(d1,!control),rev(expand.grid(outcome=unique(outcome),group=unique(group))))
I have a list of these data.frames, thus a functional BASE R answer is highly appreciated (d1& d2 are below).
(d1 = read.csv("https://raw.githubusercontent.com/rnorouzian/m2/main/g.csv"))
# study.name group n mpre sdpre mpos sdpos post control outcome
#1 Diab_a 1 20 0.31 0.39 0.02 0.06 1 FALSE 1
#2 Diab_a 1 20 0.31 0.39 0.05 0.08 2 FALSE 1
#3 Diab_a 1 20 3.54 1.21 1.38 0.89 1 FALSE 2
#4 Diab_a 1 20 3.54 1.21 1.38 0.55 2 FALSE 2
#5 Diab_a 2 19 0.16 0.27 0.12 0.19 1 FALSE 1
#6 Diab_a 2 19 0.16 0.27 0.03 0.06 2 FALSE 1
#7 Diab_a 2 19 2.85 1.99 1.22 0.43 1 FALSE 2
#8 Diab_a 2 19 2.85 1.99 1.94 1.12 2 FALSE 2
#9 Diab_a 3 18 0.23 0.39 0.07 0.12 1 TRUE 1
#10 Diab_a 3 18 0.23 0.39 0.06 0.09 2 TRUE 1
#11 Diab_a 3 18 3.08 1.57 1.53 0.64 1 TRUE 2
#12 Diab_a 3 18 3.08 1.57 1.93 0.61 2 TRUE 2
(d2 = read.csv("https://raw.githubusercontent.com/rnorouzian/m2/main/g2.csv"))
# study.name group n mpre sdpre mpos sdpos post control outcome
#1 Dlsk_Krlr 1 73 81.6 10.8 83.1 11.1 1 FALSE 1
#2 Dlsk_Krlr 1 66 85.7 13.7 88.8 10.5 1 FALSE 2
#3 Dlsk_Krlr 2 72 81.4 10.9 85.0 8.1 1 FALSE 1
#4 Dlsk_Krlr 2 61 90.4 8.2 91.2 7.6 1 FALSE 2
#5 Dlsk_Krlr 3 80 80.5 11.2 80.8 10.7 1 TRUE 1
#6 Dlsk_Krlr 3 74 90.3 6.6 89.6 6.3 1 TRUE 2
Do you want this?
library(tidyverse)
d1 %>% select( n, mpre, sdpre, control, outcome, post) %>%
unique %>%
mutate(control = control + 1) %>%
pivot_wider(values_from = c(mpre, sdpre, n), names_from = control, names_glue = '{.value}{control}',
values_fn = list) %>%
mutate(across(ends_with('1') | ends_with('2'), ~ifelse(post ==1, map_dbl(., first),
map_dbl(., last)))) %>%
arrange(post) %>%
select(ends_with('1'), ends_with('2'))
# A tibble: 4 x 6
mpre1 sdpre1 n1 mpre2 sdpre2 n2
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 0.31 0.39 20 0.23 0.39 18
2 3.54 1.21 20 3.08 1.57 18
3 0.16 0.27 19 0.23 0.39 18
4 2.85 1.99 19 3.08 1.57 18
for d2?
d2 %>% select(n, mpre, sdpre, control, outcome, post) %>%
mutate(control = control + 1) %>%
pivot_wider(values_from = c(mpre, sdpre, n), names_from = control,
names_glue = '{.value}{control}', values_fn = list) %>%
unnest(everything()) %>%
select(ends_with('1'), ends_with('2'))
# A tibble: 4 x 6
mpre1 sdpre1 n1 mpre2 sdpre2 n2
<dbl> <dbl> <int> <dbl> <dbl> <int>
1 81.6 10.8 73 80.5 11.2 80
2 81.4 10.9 72 80.5 11.2 80
3 85.7 13.7 66 90.3 6.6 74
4 90.4 8.2 61 90.3 6.6 74
If the strategy adopted in d2 is near to your expectation, you can do similar for `d1 also
d1 %>% select(n, mpre, sdpre, control, outcome, post) %>%
mutate(control = control + 1) %>%
pivot_wider(values_from = c(mpre, sdpre, n), names_from = control,
names_glue = '{.value}{control}', values_fn = list) %>%
unnest(everything()) %>%
select(ends_with('1'), ends_with('2')) %>% unique
# A tibble: 4 x 6
mpre1 sdpre1 n1 mpre2 sdpre2 n2
<dbl> <dbl> <int> <dbl> <dbl> <int>
1 0.31 0.39 20 0.23 0.39 18
2 0.16 0.27 19 0.23 0.39 18
3 3.54 1.21 20 3.08 1.57 18
4 2.85 1.99 19 3.08 1.57 18
I wrote the code in full BASE R.
Then checked it on d1 and d2 dataframe.
Then to checked it in an other dataframe, I created an imaginary dataset(d3) with 6 groups and 4 outomes
I implemented them on tidyverse answer, The results are similar to my code. answer(albeit with different rows order, because I wanted to make it similar to the desired output order).
But the difference is that this code is written completely in BASE R as question required.
d1 <- read.csv("https://raw.githubusercontent.com/rnorouzian/m2/main/g.csv")
d2 <- read.csv("https://raw.githubusercontent.com/rnorouzian/m2/main/g2.csv")
d1_index <- rev(expand.grid(outcome = unique(d1$outcome), group =unique(d1$group)))
d2_index <- rev(expand.grid(outcome = unique(d2$outcome), group =unique(d2$group)))
df_fin <- function(index, data){
group1 <- group2 <- data.frame()
sd <- data.frame()
group2_df <- group1_df <- final_df <- data.frame()
Groups_list <- list()
outcome_no <- range(data$outcome)
for (i in unique(data$group)) {
for (j in seq_len(range(outcome_no)[2])){
couple <- subset(index, group == i & (outcome == j))
subsetted_df <- subset(data, group == couple[1, 1] & (outcome == couple[1,2] | outcome == couple[2,2]) & post == 1)
if (TRUE %in% subsetted_df$control){
subsetted_df <- subset(data, group == couple[1, 1] & (outcome == couple[1,2] | outcome == couple[2,2]))
subsetted_df <- subsetted_df[order(subsetted_df$post),]
} else {
subsetted_df <- subset(data, group == couple[1, 1] & (outcome == couple[1,2] | outcome == couple[2,2]) & post == 1)
}
sd <- rbind(sd, subsetted_df)
}
}
G1 <- subset(sd, control == FALSE)
G2 <- subset(sd, control == TRUE)
G2 <- G2[order(G2$post),]
G1_r <- G1[,c("mpre", "sdpre", "n")]
G2_r <- G2[,c("mpre", "sdpre", "n")]
colnames(G1_r) <- c("mpre1", "sdpre1", "n1")
colnames(G2_r) <- c("mpre2", "sdpre2", "n2")
final_df <- cbind(G1_r, G2_r)
return(final_df)
}
#Check the function(Base R) on d1,d2
# d1
df_fin(d1_index, d1)
## mpre1 sdpre1 n1 mpre2 sdpre2 n2
## 1 0.31 0.39 20 0.23 0.39 18
## 3 3.54 1.21 20 3.08 1.57 18
## 5 0.16 0.27 19 0.23 0.39 18
## 7 2.85 1.99 19 3.08 1.57 18
#d2
df_fin(d2_index, d2)
## mpre1 sdpre1 n1 mpre2 sdpre2 n2
## 1 81.6 10.8 73 80.5 11.2 80
## 2 85.7 13.7 66 90.3 6.6 74
## 3 81.4 10.9 72 80.5 11.2 80
## 4 90.4 8.2 61 90.3 6.6 74
#Creating an example dataframe d3 to ckeck the code functionality for other dataset
d3 <- d1
d3 <- rbind(d3, d1)
d3[13:24, 'group'] <- rep(c(4,5,6), each = 4)
d3[13:24, 'outcome'] <- rep(c(3,4), each = 2)
d3[13:24, 'mpre'] <- rep(c(111,999), each = 2)
d3_index <- rev(expand.grid(outcome = unique(d3$outcome), group =unique(d3$group)))
#Check on d3
df_fin(d3_index, d3)
## mpre1 sdpre1 n1 mpre2 sdpre2 n2
## 1 0.31 0.39 20 0.23 0.39 18
## 3 3.54 1.21 20 3.08 1.57 18
## 5 0.16 0.27 19 111.00 0.39 18
## 7 2.85 1.99 19 999.00 1.57 18
## 13 111.00 0.39 20 0.23 0.39 18
## 15 999.00 1.21 20 3.08 1.57 18
## 17 111.00 0.27 19 111.00 0.39 18
## 19 999.00 1.99 19 999.00 1.57 18

Using pivot_longer in tidyr with a complex separator [duplicate]

This question already has an answer here:
How to use Pivot_longer to reshape from wide-type data to long-type data with multiple variables
(1 answer)
Closed 2 years ago.
In a previous post here I tried to get the equivalent of an rbind using tidyr::pivotlonger(). This is the data and the solution.
set.seed(1)
df1 <- data.frame(group = rep(letters[1:2],each=3),
day = rep(1:3,2),
var1_mean = round(rnorm(6),2),
var1_sd = round(rnorm(6,5),2),
var2_mean = round(rnorm(6),2),
var2_sd = round(rnorm(6,5),2))
# group day var1_mean var1_sd var2_mean var2_sd
# 1 a 1 -0.63 5.49 -0.62 5.82
# 2 a 2 0.18 5.74 -2.21 5.59
# 3 a 3 -0.84 5.58 1.12 5.92
# 4 b 1 1.60 4.69 -0.04 5.78
# 5 b 2 0.33 6.51 -0.02 5.07
# 6 b 3 -0.82 5.39 0.94 3.01
df1 %>%
pivot_longer(cols = starts_with('var'),
names_to = c('grp', '.value'),
names_sep="_")
# group day grp mean sd
# <fct> <int> <chr> <dbl> <dbl>
# 1 a 1 var1 -0.63 5.49
# 2 a 1 var2 -0.62 5.82
# 3 a 2 var1 0.18 5.74
# 4 a 2 var2 -2.21 5.59
# 5 a 3 var1 -0.84 5.58
# 6 a 3 var2 1.12 5.92
# 7 b 1 var1 1.6 4.69
# 8 b 1 var2 -0.04 5.78
# 9 b 2 var1 0.33 6.51
# 10 b 2 var2 -0.02 5.07
# 11 b 3 var1 -0.82 5.39
# 12 b 3 var2 0.94 3.01
This solution is quite contingent on the naming convention used for the mean and sd variables. If there is a different naming convention, with a more complex separator between the two important nodes of the column names, like so...
df2 <- data.frame(group = rep(letters[1:2],each=3),
day = rep(1:3,2),
mean_var_1 = round(rnorm(6),2),
sd_var_1 = round(rnorm(6,5),2),
mean_var_2 = round(rnorm(6),2),
sd_var_2 = round(rnorm(6,5),2))
df2
# group day mean_var_1 sd_var_1 mean_var_2 sd_var_2
# 1 a 1 0.62 6.36 -0.39 5.70
# 2 a 2 -0.06 4.90 -0.06 5.56
# 3 a 3 -0.16 5.39 1.10 4.31
# 4 b 1 -1.47 4.95 0.76 4.29
# 5 b 2 -0.48 3.62 -0.16 5.36
# 6 b 3 0.42 4.59 -0.25 5.77
How would I achieve a similar result to the first example, with a single mean and sd column and with var_1 and var_2 as the grouping variable?
If you have names that are complicated you can use names_pattern argument where you can specify how each part of column name would be used to get data in long format.
tidyr::pivot_longer(df2,
cols = contains('var'),
names_to = c('.value', 'grp'),
names_pattern = '(.*?)_(.*)')
# group day grp mean sd
# <chr> <int> <chr> <dbl> <dbl>
# 1 a 1 var_1 0.62 6.36
# 2 a 1 var_2 -0.39 5.7
# 3 a 2 var_1 -0.06 4.9
# 4 a 2 var_2 -0.06 5.56
# 5 a 3 var_1 -0.16 5.39
# 6 a 3 var_2 1.1 4.31
# 7 b 1 var_1 -1.47 4.95
# 8 b 1 var_2 0.76 4.29
# 9 b 2 var_1 -0.48 3.62
#10 b 2 var_2 -0.16 5.36
#11 b 3 var_1 0.42 4.59
#12 b 3 var_2 -0.25 5.77
'(.*?)_(.*)' uses two groups of data where the first group is everything until the first underscore ((.*?)) in the column name and the second group is everything after the underscore following the first group ((.*)).

Provide tibble names in purrr

I would like to know if it is possible to provide column names in the as_tibble function. I know that I could use the rename function to change column names, but I would like to save the number of lines I write. Lets say I want my column names to be a1, a2, a3.
> library(purrr)
> library(tidyverse)
> 1:3 %>%
+ map(~ rnorm(104, .x)) %>%
+ map_dfc(~as_tibble(.x))
# A tibble: 104 x 3
value value1 value2
<dbl> <dbl> <dbl>
1 2.91139409 1.44646163 1.298360
2 0.87725704 4.05341889 3.892296
3 0.73230088 2.72506579 3.520865
4 1.02862344 2.09576397 4.009980
5 0.49159059 -1.23746772 3.172201
6 0.24665840 1.80876495 2.927716
7 0.75112051 2.22486452 2.896452
8 -0.06036349 3.63503054 3.218324
9 1.84431314 1.88562406 2.398761
10 0.70866474 0.08947359 3.954770
# ... with 94 more rows
We can put as_tibble with map_dfc, and then use setNames(paste0("a", seq_len(ncol(.)))) to change column name based on the number of columns.
library(tidyverse)
set.seed(123)
1:3 %>%
map_dfc(~as_tibble(rnorm(104, .x))) %>%
setNames(paste0("a", seq_len(ncol(.))))
# A tibble: 104 x 3
a1 a2 a3
<dbl> <dbl> <dbl>
1 0.440 1.05 4.65
2 0.770 1.95 2.95
3 2.56 1.22 3.12
4 1.07 0.332 3.24
5 1.13 1.62 4.23
6 2.72 2.92 2.48
7 1.46 1.42 2.01
8 -0.265 2.61 4.68
9 0.313 0.382 2.56
10 0.554 1.94 2.28
# ... with 94 more rows

change several column names() in data.frame() with str_replace_all()

I read this this question and practiced matching patterns, but I am still not figuring it.
I have a panel with the same measure, several times per year. Now, I want to rename them in a logical way. My raw data looks a bit like this,
set.seed(667)
dta <- data.frame(id = 1:6,
R1213 = runif(6),
R1224 = runif(6, 1, 2),
R1255 = runif(6, 2, 3),
R1235 = runif(6, 3, 4))
# install.packages(c("tidyverse"), dependencies = TRUE)
require(tidyverse)
(tbl <- dta %>% as_tibble())
#> # A tibble: 6 x 5
#> id R1213 R1224 R1255 R1235
#> <int> <dbl> <dbl> <dbl> <dbl>
#> 1 1 0.488 1.60 2.07 3.07
#> 2 2 0.692 1.42 2.76 3.19
#> 3 3 0.262 1.34 2.33 3.82
#> 4 4 0.330 1.77 2.61 3.93
#> 5 5 0.582 1.92 2.15 3.86
#> 6 6 0.930 1.88 2.56 3.59
Now, I use str_replace_all() to rename them, here with only one variable in where I use pate, and everything is fine (it might also be possible to optimize this in other ways, if so please feel to let me know),
names(tbl) <- tbl %>% names() %>%
str_replace_all('^R1.[125].$', 'A') %>%
str_replace_all('^R1.[3].$', paste0('A.2018.', 1))
tbl
#> # A tibble: 6 x 5
#> id A A A A.2018.1
#> <int> <dbl> <dbl> <dbl> <dbl>
#> 1 1 0.488 1.60 2.07 3.07
#> 2 2 0.692 1.42 2.76 3.19
#> 3 3 0.262 1.34 2.33 3.82
#> 4 4 0.330 1.77 2.61 3.93
#> 5 5 0.582 1.92 2.15 3.86
#> 6 6 0.930 1.88 2.56 3.59
Eveything call A is actually from the same year, let's say 2017, but with the suffix .1, .2, etc. need to appended. I start over and again use paste0('A.2017.', 1:3), but this time with three suffices,
tbl <- dta %>% as_tibble()
names(tbl) <- tbl %>% names() %>%
str_replace_all('^R1.[125].$', paste0('A.2017.', 1:3)) %>%
str_replace_all('^R1.[7].$', paste0('A.2018.', 1))
tbl
#> Warning message:
#> In stri_replace_all_regex(string, pattern, fix_replacement(replacement), :
#> longer object length is not a multiple of shorter object length
#> > tbl
#> # A tibble: 6 x 5
#> id A.2017.2 A.2017.3 A.2017.1 R1235
#> <int> <dbl> <dbl> <dbl> <dbl>
#> 1 1 0.488 1.60 2.07 3.07
#> 2 2 0.692 1.42 2.76 3.19
#> 3 3 0.262 1.34 2.33 3.82
#> 4 4 0.330 1.77 2.61 3.93
#> 5 5 0.582 1.92 2.15 3.86
#> 6 6 0.930 1.88 2.56 3.59
this does come out, but the order is reversed and I am told longer object length is not a multiple of shorter object length, but isen't 3 the right length? I am looking to do this in a cleaner and simpler way. Also, I don't really like names(tbl) <-, if that can be done in a more elegant way.
Building on David's suggestion - how about something like the following using dplyr::rename_at?
library(dplyr)
## Get data
set.seed(667)
dta <- data.frame(id = 1:6,
R1213 = runif(6),
R1224 = runif(6, 1, 2),
R1255 = runif(6, 2, 3),
R1235 = runif(6, 3, 4)) %>%
as_tibble()
## Rename
dta <- dta %>%
rename_at(.vars = grep('^R1.[125].$', names(.)),
.funs = ~paste0("A.2017.", 1:length(.)))
dta
#> # A tibble: 6 x 5
#> id A.2017.1 A.2017.2 A.2017.3 R1235
#> <int> <dbl> <dbl> <dbl> <dbl>
#> 1 1 0.196 1.74 2.51 3.49
#> 2 2 0.478 1.85 2.06 3.69
#> 3 3 0.780 1.32 2.21 3.26
#> 4 4 0.705 1.49 2.49 3.33
#> 5 5 0.942 1.59 2.66 3.58
#> 6 6 0.906 1.90 2.87 3.93
Vectorised solution for multiple patterns
For a complete solution that can be used for multiple patterns and replacements, we can make use of purr::map2_dfc as follows.
library(dplyr)
library(purrr)
## Get data
set.seed(667)
dta <- data.frame(id = 1:6,
R1213 = runif(6),
R1224 = runif(6, 1, 2),
R1255 = runif(6, 2, 3),
R1235 = runif(6, 3, 4)) %>%
as_tibble()
## Define a function to keep a hold out data set, then rename iteratively for each pattern and replacement.
rename_multiple_years <- function(df, patterns,
replacements,
hold_out_var = "id") {
hold_out_df <- df %>%
select_at(.vars = hold_out_var)
rename_df <- map2_dfc(patterns, replacements, function(pattern, replacement) {
df %>%
rename_at(.vars = grep(pattern, names(.)),
.funs = ~paste0(replacement, 1:length(.))) %>%
select_at(.vars = grep(replacement, names(.)))
})
final_df <- bind_cols(hold_out_df, rename_df)
return(final_df)
}
## Call function on specified patterns and replacements
renamed_dta <- dta %>%
rename_multiple_years(patterns = c("^R1.[125].$", "^R1.[3].$"),
replacements = c("A.2017.", "A.2018."))
renamed_dta
#> # A tibble: 6 x 5
#> id A.2017.1 A.2017.2 A.2017.3 A.2018.1
#> <int> <dbl> <dbl> <dbl> <dbl>
#> 1 1 0.196 1.74 2.51 3.49
#> 2 2 0.478 1.85 2.06 3.69
#> 3 3 0.780 1.32 2.21 3.26
#> 4 4 0.705 1.49 2.49 3.33
#> 5 5 0.942 1.59 2.66 3.58
#> 6 6 0.906 1.90 2.87 3.93
Towards tidy data
Now that the variables have been renamed you might find it useful to have your data in a tidy format. The following using tidyr::gather might be useful.
library(tidyr)
library(dplyr)
#Use tidy dataframe gather all variables, split by "." and drop A column (or keep if a measurement id)
renamed_dta %>%
gather(key = "measure", value = "value", -id) %>%
separate(measure, c("A", "year", "measure"), "[[.]]") %>%
select(-A)
#> # A tibble: 24 x 4
#> id year measure value
#> <int> <chr> <chr> <dbl>
#> 1 1 2017 1 0.196
#> 2 2 2017 1 0.478
#> 3 3 2017 1 0.780
#> 4 4 2017 1 0.705
#> 5 5 2017 1 0.942
#> 6 6 2017 1 0.906
#> 7 1 2017 2 1.74
#> 8 2 2017 2 1.85
#> 9 3 2017 2 1.32
#> 10 4 2017 2 1.49
#> # ... with 14 more rows

Resources