Apply same recoding rules to multiple data frames - r

I have 5 data frames. I want to recode all variables ending with "_comfort", "_agree", and "_effective" using the same rules for each data frame. As is, the values in each column are 1:5 and I want to recode 5's to 1, 4's to 2, 2's to 4, and 1's to 5 (3 will stay the same).
I do not want the end result to be one merged dataset, but instead to apply the same recoding rules across all 5 independent data frames. For simplicity's sake, let's just assume I have 2 data frames:
df1 <- data.frame(a_comfort = c(1, 2, 3, 4, 5),
b_comfort = c(1, 2, 3, 4, 5),
c_effective = c(1, 2, 3, 4, 5))
df2 <- data.frame(a_comfort = c(1, 2, 3, 4, 5),
b_comfort = c(1, 2, 3, 4, 5),
c_effective = c(1, 2, 3, 4, 5))
What I want is:
df1 <- data.frame(a_comfort = c(5, 4, 3, 2, 1),
b_comfort = c(5, 4, 3, 2, 1),
c_effective = c(5, 4, 3, 2, 1))
df2 <- data.frame(a_comfort = c(5, 4, 3, 2, 1),
b_comfort = c(5, 4, 3, 2, 1),
c_effective = c(5, 4, 3, 2, 1))
Conventionally, I would use dplyr's mutate_at and ends_with to achieve my goal, but I have not been successful with this method across multiple data frames. I am thinking a combination of the purrr and dplyr packages will work, but I haven't nailed down the exact technique.
Thanks in advance for any help!

You can use get() and assign() in a loop:
library(dplyr)
for (df_name in c("df1", "df2")) {
df <- mutate(
get(df_name),
across(
ends_with(c("_comfort", "_agree", "_effective")),
\(x) 6 - x
)
)
assign(df_name, df)
}
Result:
#> df1
a_comfort b_comfort c_effective
1 5 5 5
2 4 4 4
3 3 3 3
4 2 2 2
5 1 1 1
#> df2
a_comfort b_comfort c_effective
1 5 5 5
2 4 4 4
3 3 3 3
4 2 2 2
5 1 1 1
Note, however, that it’s often better practice to keep multiple related data frames in a list than loose in the global environment. In this case, you can use purrr::map() (or base::lapply()):
library(dplyr)
library(purrr)
dfs <- list(df1, df2)
dfs <- map(
dfs,
\(df) mutate(
df,
across(
ends_with(c("_comfort", "_agree", "_effective")),
\(x) 6 - x
)
)
)
Result:
#> dfs
[[1]]
a_comfort b_comfort c_effective
1 5 5 5
2 4 4 4
3 3 3 3
4 2 2 2
5 1 1 1
[[2]]
a_comfort b_comfort c_effective
1 5 5 5
2 4 4 4
3 3 3 3
4 2 2 2
5 1 1 1
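The base::lapply() route mentioned above needs no packages. Here is a minimal sketch, assuming the same df1 and df2 as in the question; recode_cols is a hypothetical helper name, and the regular expression is just one way to match the three suffixes.

```r
df1 <- data.frame(a_comfort = c(1, 2, 3, 4, 5),
                  b_comfort = c(1, 2, 3, 4, 5),
                  c_effective = c(1, 2, 3, 4, 5))
df2 <- df1

# Hypothetical helper: reverse-code (6 - x) every column whose name ends
# with one of the three suffixes; all other columns are left untouched.
recode_cols <- function(df) {
  cols <- grep("(_comfort|_agree|_effective)$", names(df), value = TRUE)
  if (length(cols) > 0) {
    df[cols] <- lapply(df[cols], function(x) 6 - x)
  }
  df
}

# A named list keeps the data frames together and identifiable
dfs <- lapply(list(df1 = df1, df2 = df2), recode_cols)
```

Naming the list elements means the results can be accessed as dfs$df1 and dfs$df2 rather than by position.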

You can use ls(pattern = 'df\\d+') to find all objects whose names match a certain pattern. Then store them into a list and pass to purrr::map or lapply for recoding.
library(dplyr)
df.lst <- purrr::map(
mget(ls(pattern = 'df\\d+')),
~ .x %>% mutate(6 - across(ends_with(c("_comfort", "_agree", "effective"))))
)
# $df1
# a_comfort b_comfort c_effective
# 1 5 5 5
# 2 4 4 4
# 3 3 3 3
# 4 2 2 2
# 5 1 1 1
#
# $df2
# a_comfort b_comfort c_effective
# 1 5 5 5
# 2 4 4 4
# 3 3 3 3
# 4 2 2 2
# 5 1 1 1
You can further overwrite those dataframes in your workspace from the list through list2env().
list2env(df.lst, .GlobalEnv)

Please try the code below, where I convert the columns to factors and then recode them
data
a_comfort b_comfort c_effective
1 1 1 1
2 2 2 2
3 3 3 3
4 4 4 4
5 5 5 5
code
library(tidyverse)
df1 %>% mutate(across(c(ends_with('comfort'),ends_with('effective')), ~ factor(.x, levels=c('1','2','3','4','5'), labels=c('5','4','3','2','1'))))
output
a_comfort b_comfort c_effective
1 5 5 5
2 4 4 4
3 3 3 3
4 2 2 2
5 1 1 1
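One thing to keep in mind with this approach: the recoded columns are factors, not numbers. If numeric values are needed afterwards, going through as.character() first avoids picking up the internal level codes. A small sketch on a plain vector (x and f are illustrative names):

```r
x <- c(1, 2, 3, 4, 5)

# Same recoding as above: level "1" is relabelled "5", "2" becomes "4", etc.
f <- factor(x,
            levels = c("1", "2", "3", "4", "5"),
            labels = c("5", "4", "3", "2", "1"))

codes  <- as.numeric(f)                # internal level codes, not the labels
values <- as.numeric(as.character(f))  # the labels converted back to numbers
```

Here codes holds the original 1 to 5 ordering, while values holds the reversed scores.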

Related

Count how many rows have the same ID and add the number in an new column

My dataframe contains data about political careers, such as a unique identifier column (called ui) for each politician and the electoral term (called electoral_term) in which they were elected. Since a politician can be elected in multiple electoral terms, there are multiple rows that contain the same ui.
Now I would like to add another column to my dataframe that counts how many times the politician got re-elected.
So e.g. the politician with ui=1 was re-elected 2 times, since he occurred in 3 electoral_terms.
I already tried
df %>% count(ui)
But that only gives out a table which can't be added into my dataframe.
Thanks in advance!
We may use base R
df$reelected <- with(df, ave(ui, ui, FUN = length)-1)
-output
> df
ui electoral reelected
1 1 1 2
2 1 2 2
3 1 3 2
4 2 2 0
5 3 7 1
6 3 9 1
data
df <- structure(list(ui = c(1, 1, 1, 2, 3, 3), electoral = c(1, 2,
3, 2, 7, 9)), class = "data.frame", row.names = c(NA, -6L))
mydf <- tibble::tribble(~ui, ~electoral, 1, 1, 1, 2, 1, 3, 2, 2, 3, 7, 3, 9)
library(dplyr)
mydf |>
add_count(ui, name = "re_elected") |>
mutate(re_elected = re_elected - 1)
# A tibble: 6 × 3
ui electoral re_elected
<dbl> <dbl> <dbl>
1 1 1 2
2 1 2 2
3 1 3 2
4 2 2 0
5 3 7 1
6 3 9 1
library(tidyverse)
df %>%
group_by(ui) %>%
mutate(re_elected = n() - 1)
# A tibble: 6 × 3
# Groups: ui [3]
ui electoral re_elected
<dbl> <dbl> <dbl>
1 1 1 2
2 1 2 2
3 1 3 2
4 2 2 0
5 3 7 1
6 3 9 1
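The table returned by count() can also be joined back onto the original data, which is close to what the question first attempted. A sketch assuming dplyr is installed and using the sample data from the first answer; the names counts, result and reelected are illustrative.

```r
library(dplyr)

df <- data.frame(ui = c(1, 1, 1, 2, 3, 3),
                 electoral = c(1, 2, 3, 2, 7, 9))

# count() gives one row per ui, with the number of rows in column n
counts <- count(df, ui)

# Join the counts back by ui, turn them into re-election counts, drop n
result <- df %>%
  left_join(counts, by = "ui") %>%
  mutate(reelected = n - 1) %>%
  select(-n)
```

add_count() does the same in one step, so this version is mainly useful when the counts table is needed separately as well.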

How to bind tibbles by row with different number of columns in R [duplicate]

This question already has answers here:
Combine two data frames by rows (rbind) when they have different sets of columns
(14 answers)
Closed 2 years ago.
I want to bind df1 with df2 by row, keeping the same column name, to obtain df3.
library(tidyverse)
df1 <- tibble(a = c(1, 2, 3),
b = c(4, 5, 6),
c = c(1, 5, 7))
df2 <- tibble(a = c(8, 9),
b = c(5, 6))
# how to bind these tibbles by row to get
df3 <- tibble(a = c(1, 2, 3, 8, 9),
b = c(4, 5, 6, 5, 6),
c = c(1, 5, 7, NA, NA))
Created on 2020-10-30 by the reprex package (v0.3.0)
Try this using bind_rows() from dplyr. Updated credit to @AbdessabourMtk:
df3 <- dplyr::bind_rows(df1,df2)
Output:
# A tibble: 5 x 3
a b c
<dbl> <dbl> <dbl>
1 1 4 1
2 2 5 5
3 3 6 7
4 8 5 NA
5 9 6 NA
A base R option
df2[setdiff(names(df1),names(df2))]<-NA
df3 <- rbind(df1,df2)
giving
> df3
# A tibble: 5 x 3
a b c
<dbl> <dbl> <dbl>
1 1 4 1
2 2 5 5
3 3 6 7
4 8 5 NA
5 9 6 NA
We can use rbindlist from data.table
library(data.table)
rbindlist(list(df1, df2), fill = TRUE)
-output
# a b c
#1: 1 4 1
#2: 2 5 5
#3: 3 6 7
#4: 8 5 NA
#5: 9 6 NA

R - How to recode multiple columns [duplicate]

This question already has answers here:
Replacing character values with NA in a data frame
(7 answers)
Closed 1 year ago.
I am trying to change the 6s to NAs across multiple columns. I have tried using the mutate_at command in dplyr, but can't seem to make it work. Any ideas?
library(dplyr)
ID <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10) #Create vector of IDs for ID column.
Score1 <- c(1, 2, 3, 2, 5, 6, 6, 2, 5, 4) #Create vector of scores for Score1 column.
Score2 <- c(2, 2, 3, 6, 5, 6, 6, 2, 3, 4) #Create vector of scores for Score2 column.
Score3 <- c(3, 2, 3, 4, 5, 5, 6, 2, 6, 4) #Create vector of scores for Score3 column.
df <- data.frame(ID, Score1, Score2, Score3) #Combine columns into a data frame.
VectorOfNames <- as.vector(c("Score1", "Score2", "Score3")) #Create a vector of column names.
df <- mutate_at(df, VectorOfNames, 6=NA) #Within the data frame, apply the function (6=NA) to the columns specified in VectorOfNames.
dplyr has the na_if() function for precisely this task. You were almost there with your code and can use:
mutate_at(df, VectorOfNames, ~na_if(.x, 6))
ID Score1 Score2 Score3
1 1 1 2 3
2 2 2 2 2
3 3 3 3 3
4 4 2 NA 4
5 5 5 5 5
6 6 NA NA 5
7 7 NA NA NA
8 8 2 2 2
9 9 5 3 NA
10 10 4 4 4
You could use:
library(dplyr)
df %>%mutate_at(VectorOfNames, ~replace(., . == 6, NA))
#OR
#df %>%mutate_at(VectorOfNames, ~ifelse(. == 6, NA, .))
# ID Score1 Score2 Score3
#1 1 1 2 3
#2 2 2 2 2
#3 3 3 3 3
#4 4 2 NA 4
#5 5 5 5 5
#6 6 NA NA 5
#7 7 NA NA NA
#8 8 2 2 2
#9 9 5 3 NA
#10 10 4 4 4
Or in base R:
df[VectorOfNames][df[VectorOfNames] == 6] <- NA
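In current dplyr, mutate_at() is superseded by across(). A sketch of the same replacement in that style, assuming dplyr 1.0.0 or later and the data from the question; df_na is an illustrative name.

```r
library(dplyr)

df <- data.frame(
  ID     = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
  Score1 = c(1, 2, 3, 2, 5, 6, 6, 2, 5, 4),
  Score2 = c(2, 2, 3, 6, 5, 6, 6, 2, 3, 4),
  Score3 = c(3, 2, 3, 4, 5, 5, 6, 2, 6, 4)
)
VectorOfNames <- c("Score1", "Score2", "Score3")

# all_of() selects exactly the columns named in the character vector;
# na_if() turns every 6 in those columns into NA. ID is not selected,
# so its value 6 is kept.
df_na <- mutate(df, across(all_of(VectorOfNames), ~ na_if(.x, 6)))
```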

Generate random sequential number by group with multiple times

I'm trying to generate random numbers by group, repeated multiple times.
For example,
> set.seed(1002)
> df<-data.frame(ID=LETTERS[seq(1:5)],num=sample(c(2,3,4), size=5, replace=TRUE))
> df
ID num
1 A 3
2 B 4
3 C 3
4 D 2
5 E 3
Within each ID, I want to generate a random sequence of the numbers 1:num, without replacement, repeated (for example) 4 times.
If ID is A, it will randomly draw from 1:3 four times, so this will be
sample(c(1,2,3,1,2,3,1,2,3),replace=FALSE)
or
rep(sample(c(1:4), replace=FALSE),times=4)
If the result is 3 2 1 2 1 3 2 3 3 1 1 2, then the data will be
ID num
1 A 3
2 A 2
3 A 2
4 A 1
5 A 1
6 A 3
7 A 2
8 A 1
9 A 3
I tried several things, like
df%>%group_by(ID)%>%mutate(random=sample(rep(1:num,times=4),replace=FALSE))
It failed. The warning appeared with In 1:num
I also tried this.
ddply(df,.(ID),function(x) sample(rep(1:num,times=4),replace=FALSE))
The error appeared again, with NA/NaN.
I would really appreciate it if you could let me know how to solve this problem.
We can create a list-column and then unnest it to have separate rows.
n <- 4
library(dplyr)
df %>%
group_by(ID) %>%
mutate(num = list(sample(rep(seq_len(num), n)))) %>%
tidyr::unnest(num)
# ID num
# <fct> <int>
# 1 A 2
# 2 A 2
# 3 A 2
# 4 A 3
# 5 A 3
# 6 A 1
# 7 A 3
# 8 A 1
# 9 A 1
#10 A 3
# … with 50 more rows
I'm not quite clear on your expected output.
The following samples num elements from 1:num with replacement, and stores samples in a list column sample.
library(tidyverse)
set.seed(2018)
df %>% mutate(sample = map(num, ~sample(1:.x, replace = T)))
# ID num sample
#1 A 2 1, 1
#2 B 4 3, 4, 1, 2
#3 C 2 1, 1
#4 D 4 3, 3, 4, 4
#5 E 2 2, 2
Or if you want to repeat sampling num elements (with replacement) 4 times, you can do
set.seed(2018)
df %>%
mutate(sample = map(num, ~as.numeric(replicate(4, sample(1:.x, replace = T)))))
#ID num sample
#1 A 2 1, 1, 1, 2, 1, 2, 1, 1
#2 B 4 3, 3, 4, 4, 4, 4, 4, 2, 3, 4, 3, 3, 2, 1, 1, 2
#3 C 2 1, 1, 1, 1, 1, 1, 1, 2
#4 D 4 2, 3, 2, 1, 3, 4, 1, 2, 1, 2, 2, 1, 1, 1, 2, 1
#5 E 2 2, 1, 2, 2, 1, 1, 1, 2
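For comparison, here is a base R sketch of one reading of the question: for each ID, shuffle 4 copies of 1:num and return one row per drawn number. The num values are copied from the question's printed df rather than re-sampled, and n_rep, pieces, out and the column name random are illustrative.

```r
set.seed(1002)

df <- data.frame(ID = LETTERS[1:5], num = c(3, 4, 3, 2, 3))
n_rep <- 4  # how many copies of 1:num to shuffle per ID

# One small data frame per row of df: the ID repeated alongside a random
# permutation of rep(1:num, n_rep). sample() on a vector longer than 1
# permutes it without replacement.
pieces <- lapply(seq_len(nrow(df)), function(i) {
  draws <- sample(rep(seq_len(df$num[i]), times = n_rep))
  data.frame(ID = df$ID[i], random = draws)
})

out <- do.call(rbind, pieces)
```

Every value in 1:num appears exactly n_rep times within its ID; only the order is random.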

dplyr how to lag by group

I have a data frame of orders and receivables with lead times.
Can I use dplyr to fill in the receive column according to the groups lead time?
df <- data.frame(team = c("a","a","a","a", "a", "b", "b", "b", "b", "b"),
order = c(2, 4, 3, 5, 6, 7, 8, 5, 4, 5),
lead_time = c(3, 3, 3, 3, 3, 2, 2, 2, 2, 2))
>df
team order lead_time
a 2 3
a 4 3
a 3 3
a 5 3
a 6 3
b 7 2
b 8 2
b 5 2
b 4 2
b 5 2
And adding a receive column like so:
dfb <- data.frame(team = c("a","a","a","a", "a", "b", "b", "b", "b", "b"),
order = c(2, 4, 3, 5, 6, 7, 8, 5, 4, 5),
lead_time = c(3, 3, 3, 3, 3, 2, 2, 2, 2, 2),
receive = c(0, 0, 0, 2, 4, 0, 0, 7, 8, 5))
>dfb
team order lead_time receive
a 2 3 0
a 4 3 0
a 3 3 0
a 5 3 2
a 6 3 4
b 7 2 0
b 8 2 0
b 5 2 7
b 4 2 8
b 5 2 5
I was thinking along these lines but run into an error
dfc <- df %>%
group_by(team) %>%
mutate(receive = if_else( row_number() < lead_time, 0, lag(order, n = lead_time)))
Error in mutate_impl(.data, dots) :
could not convert second argument to an integer. type=SYMSXP, length = 1
Thanks for the help!
This looks like a bug; there may be some unintended masking of the lag function between the dplyr and stats packages. Try this workaround, which calls dplyr::lag() explicitly and passes a single lag value per group:
df %>%
group_by(team) %>%
# explicitly specify the source of the lag function here
mutate(receive = dplyr::lag(order, n=unique(lead_time), default=0))
#Source: local data frame [10 x 4]
#Groups: team [2]
# team order lead_time receive
# <fctr> <dbl> <dbl> <dbl>
#1 a 2 3 0
#2 a 4 3 0
#3 a 3 3 0
#4 a 5 3 2
#5 a 6 3 4
#6 b 7 2 0
#7 b 8 2 0
#8 b 5 2 7
#9 b 4 2 8
#10 b 5 2 5
We can also use shift from data.table
library(data.table)
setDT(df)[, receive := shift(order, n = lead_time[1], fill=0), by = team]
df
# team order lead_time receive
# 1: a 2 3 0
# 2: a 4 3 0
# 3: a 3 3 0
# 4: a 5 3 2
# 5: a 6 3 4
# 6: b 7 2 0
# 7: b 8 2 0
# 8: b 5 2 7
# 9: b 4 2 8
#10: b 5 2 5
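dplyr::lag() expects n to be a single number, which is why both answers reduce lead_time to one value per group (unique(lead_time) or lead_time[1]). The same idea can be sketched in base R, assuming lead_time is constant within each team; lag_vec is a hypothetical helper, not part of any package.

```r
df <- data.frame(team = c("a", "a", "a", "a", "a", "b", "b", "b", "b", "b"),
                 order = c(2, 4, 3, 5, 6, 7, 8, 5, 4, 5),
                 lead_time = c(3, 3, 3, 3, 3, 2, 2, 2, 2, 2))

# Hypothetical helper: shift x down by n positions, padding the start with fill
lag_vec <- function(x, n, fill = 0) {
  len <- length(x)
  c(rep(fill, min(n, len)), head(x, max(len - n, 0)))
}

# Split by team, lag each group's orders by that group's (single) lead time,
# then put the pieces back in the original row order.
by_team <- lapply(split(df, df$team),
                  function(g) lag_vec(g$order, g$lead_time[1]))
df$receive <- unsplit(by_team, df$team)
```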
