Extract rows from a single column to form two new columns

Update:
I realized that the dummy data frame I created originally does not reflect the structure of the data frame that I am working with. Allow me to rephrase my question here.
Data frame that I'm starting with:
StudentAndClass <- c("Anthropology College_Name","x","y",
"Geology College_Name","z","History College_Name", "x","y","z")
df <- data.frame(StudentAndClass)
Students ("x", "y", "z") are enrolled in the classes they are listed under, e.g. "x" and "y" are in Anthropology, while "x", "y", and "z" are in History.
How can I create the desired data frame below?
Student <- c("x", "y", "z", "x", "y","z")
Class <- c("Anthropology College_Name", "Anthropology College_Name",
"Geology College_Name", "History College_Name",
"History College_Name", "History College_Name")
df_tidy <- data.frame(Student, Class)
Original post:
I have a data frame with observations of two variables merged in a single column like so:
StudentAndClass <- c("A","x","y","A","B","z","B","C","x","y","z","C")
df <- data.frame(StudentAndClass)
where "A", "B", "C" represent classes, and "x", "y", "z" students who are taking these classes. Notice that observations of students are wedged between observations of classes.
I'm wondering how I can create a new data frame with the following format:
Student <- c("x", "y", "z", "x", "y","z")
Class <- c("A", "A", "B", "C", "C", "C")
df_tidy <- data.frame(Student, Class)
I want to extract the rows containing observations of students and put them in a new column, while making sure that each Student observation is paired with the corresponding Class observation in the Class column.

One option is to create a vector
v1 <- c('x', 'y', 'z')
Then split the data based on the logical vector and cbind the two pieces:
setNames(do.call(cbind, split(df, !df[,1] %in% v1)), c('Student', 'Class'))
# Student Class
#2 x A
#3 y A
#6 z B
#9 x B
#10 y C
#11 z C
Or with tidyverse
library(tidyverse)
df %>%
  group_by(grp = c('Class', 'Student')[(StudentAndClass %in% v1) + 1]) %>%
  mutate(n = row_number()) %>%
  spread(grp, StudentAndClass) %>%
  select(-n)
# A tibble: 6 x 2
# Class Student
#* <fctr> <fctr>
#1 A x
#2 A y
#3 B z
#4 B x
#5 C y
#6 C z
Update
If we need this based on the elements between each pair of the same letter:
grp <- with(df, cummax(match(StudentAndClass, LETTERS[1:3], nomatch = 0)))
do.call(rbind, lapply(split(df, grp), function(x)
  data.frame(Student = x[,1][2:(nrow(x)-1)], Class = x[[1]][1], stringsAsFactors = FALSE)))

Updated
In essence, you just need to find which indexes have college names, use those to get the range of students in each college, then subset the main vector by those ranges. Since students aren't guaranteed to be nested between two similar values, you have to be careful about any "empty" colleges.
college_indices <- which(endsWith(StudentAndClass, 'College_Name'))
colleges <- StudentAndClass[college_indices]
bounds_mat <- rbind(
  start = college_indices,
  end = c(college_indices[-1], length(StudentAndClass) + 1L)
)
colnames(bounds_mat) <- colleges
bounds_mat['start', ] <- bounds_mat['start', ] + 1
bounds_mat['end', ] <- bounds_mat['end', ] - 1
# This prevents any problems if a college has no listed students
empty_college <- bounds_mat['start', ] > bounds_mat['end', ]
bounds_mat <- bounds_mat[, !empty_college]
class_listing <- apply(
  bounds_mat,
  2,
  function(bounds) {
    StudentAndClass[bounds[1]:bounds[2]]
  }
)
df_tidy <- data.frame(
  Student = unlist(class_listing),
  Class = rep(names(class_listing), lengths(class_listing)),
  row.names = NULL
)

Related

How to rename multiple Columns in R?

My goal is to get a concise way to rename multiple columns in a data frame. Let's consider a small data frame df as below:
df <- data.frame(a=1, b=2, c=3)
df
Let's say we want to change the names from a, b, and c to Y, Z, and E respectively.
Define a character vector mapping the new names to the old names:
df_names <- c(Y = "a", Z = "b", E = "c")
I would use this to rename the columns:
rename(df, !!!df_names)
df
suggestions?
One more option, using !!!:
df <- data.frame(a=1, b=2, c=3)
df_names <- c(Y = "a", Z ="b", E = "c")
library(dplyr)
df %>% rename(!!!df_names)
## Y Z E
##1 1 2 3
A non-tidy way might be through match:
names(df) <- names(df_names)[match(names(df), df_names)]
df
## Y Z E
##1 1 2 3
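Note that with the match() line above, any column not present in df_names ends up with an NA name. A guarded variant (just a sketch) that leaves unmatched columns untouched:
hits <- match(names(df), df_names, nomatch = 0)
names(df)[hits > 0] <- names(df_names)[hits]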
You could try:
sample(LETTERS[which(LETTERS %in% names(df) == FALSE)], size= length(names(df)), replace = FALSE)
[1] "S" "D" "N"
Here you don't really care what the new names are, since you're using sample. Otherwise a straightforward names(df) <- c('name1', 'name2', ...) does the job.
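For completeness, a small sketch of applying either idea to df (assuming the original column order a, b, c):
names(df) <- sample(LETTERS[!LETTERS %in% names(df)], length(names(df))) # random unused letters
names(df) <- c("Y", "Z", "E") # or assign the new names directly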

Compare each row in two dataframes in R

I have 2 data frames with account numbers and amounts, plus some other irrelevant columns. I would like to compare them and output a Y or N depending on whether they match.
I need to compare the account number in row 1 in dataframe A to the account number in row 1 in dataframe B and if they match put a Y in a column or an N if they don't. I've managed to get the code to check if there is a match in the entire dataframe but I need to check each row individually.
E.g.
df1
|account.num|x1|x2|x3|
|---|---|---|---|
|100|a|b|c|
|101|a|b|c|
|102|a|b|c|
|103|a|b|c|
df2
|account.num|x1|x2|x3|
|---|---|---|---|
|100|a|b|c|
|102|a|b|c|
|101|a|b|c|
|103|a|b|c|
output
|account.num|x1|x2|x3|match|
|---|---|---|---|---|
|100|a|b|c|Y|
|101|a|b|c|N|
|102|a|b|c|N|
|103|a|b|c|Y|
So row 1 matches because both have the same account number, but row 2 doesn't because they differ. The other data in the data frames doesn't matter, just that column. Can I do this without merging the data frames?
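The whole-data-frame check mentioned above was presumably something like the membership test below (an assumption, since that code isn't shown); it only says whether an account exists anywhere in df2, not whether it sits in the same row:
df1$account.num %in% df2$account.num
# [1] TRUE TRUE TRUE TRUE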
You can use == to check whether account.num is equal row by row, and use the resulting logical vector to index into c("N", "Y"):
df1$match <- c("N", "Y")[1 + (df1[[1]] == df2[[1]])]
df1
# account.num x1 x2 x3 match
#1 100 a b c Y
#2 101 a b c N
#3 102 a b c N
#4 103 a b c Y
Data:
df1 <- data.frame(account.num=100:103, x1="a", x2="b", x3="c")
df2 <- data.frame(account.num=c(100,102,101,103), x1="a", x2="b", x3="c")
If you want a base R solution, here is a quick sketch. Assuming both data frames have the same number of rows, it should work with your data.
# example dataframes
a <- data.frame(A=c(1,2,3), B=c("one","two","three"))
b <- data.frame(A=c(3,2,1), B=c("three","two","one"))
res <- c() # initialise empty result vector
for (rownum in seq_len(nrow(a))) {
  # iterate over all row numbers
  res[rownum] <- all(a[rownum, ] == b[rownum, ])
}
res # result vector
# [1] FALSE TRUE FALSE
# you can put it in frame a like this. example colname is "equalB"
a$equalB <- res
If you want a tidyverse solution, you can use left_join.
The principle here would be to try to match the data from df2 to the data from df1. If it matches, it adds TRUE to a match column. Then the code replaces the NA values with FALSE.
I'm also adding code to create the data frames from the example.
library(tidyverse)
df1 <-
  tribble(~account_num, ~x1, ~x2, ~x3,
          100, "a", "b", "c",
          101, "a", "b", "c",
          102, "a", "b", "c",
          103, "a", "b", "c") %>%
  rowid_to_column() # because the position in the df is important information,
                    # I need to hardcode it in the df
df2 <-
  tribble(~account_num, ~x1, ~x2, ~x3,
          100, "a", "b", "c",
          102, "a", "b", "c",
          101, "a", "b", "c",
          103, "a", "b", "c") %>%
  rowid_to_column()
# take df1
df1 %>%
  # try to match df1 with a version of df2 that has a new column where `match` = TRUE,
  # according to `rowid`, `account_num`, `x1`, `x2`, and `x3`
  left_join(df2 %>%
              tibble::add_column(match = TRUE),
            by = c("rowid", "account_num", "x1", "x2", "x3")
  ) %>%
  # replace the NA in `match` with FALSE in the df
  replace_na(list(match = FALSE))

Canonical tidyverse method to update some values of a vector from a look-up table

I frequently need to recode some (not all!) values in a data frame column based off of a look-up table. I'm not satisfied by the ways I know of to solve the problem. I'd like to be able to do it in a clear, stable, and efficient way. Before I write my own function, I'd want to make sure I'm not duplicating something standard that's already out there.
## Toy example
data = data.frame(
id = 1:7,
x = c("A", "A", "B", "C", "D", "AA", ".")
)
lookup = data.frame(
old = c("A", "D", "."),
new = c("a", "d", "!")
)
## desired result
# id x
# 1 1 a
# 2 2 a
# 3 3 B
# 4 4 C
# 5 5 d
# 6 6 AA
# 7 7 !
I can do it with a join, coalesce, unselect as below, but this isn't as clear as I'd like - too many steps.
## This works, but is more steps than I want
library(dplyr)
data %>%
left_join(lookup, by = c("x" = "old")) %>%
mutate(x = coalesce(new, x)) %>%
select(-new)
It can also be done with dplyr::recode, as below, converting the lookup table to a named lookup vector. I prefer the lookup as a data frame, but I'm okay with the named-vector solution. My concern here is that recode is in the Questioning lifecycle phase, so I'm worried that this method isn't stable.
lookup_v = pull(lookup, new) %>% setNames(lookup$old)
data %>%
mutate(x = recode(x, !!!lookup_v))
It could also be done with, say, stringr::str_replace, but using regex for whole-string matching isn't efficient. I suppose forcats::fct_recode is a stable version of recode, but I don't want a factor output (though mutate(x = as.character(fct_recode(x, !!!lookup_v))) is perhaps my favorite option so far...).
I had hoped that the new-ish rows_update() family of dplyr functions would work, but it is strict about column names, and I don't think it can update the column it's joining on. (And it's Experimental, so it doesn't yet meet my stability requirement.)
Summary of my requirements:
A single data column is updated based off of a lookup data frame (preferably) or named vector (allowable)
Not all values in the data are included in the lookup--the ones that are not present are not modified
Must work on character class input. Working more generally is a nice-to-have.
No dependencies outside of base R and tidyverse packages (though I'd also be interested in seeing a data.table solution)
No functions used that are in lifecycle phases like superseded or questioning. Please note any experimental lifecycle functions, as they have future potential.
Concise, clear code
I don't need extreme optimization, but nothing wildly inefficient (like regex when it's not needed)
A direct data.table solution, without %in%.
Depending on the length of the lookup / data tables, adding keys could improve performance substantially, but this isn't the case on this simple example.
library(data.table)
setDT(data)
setDT(lookup)
## If needed
# setkey(data,x)
# setkey(lookup,old)
data[lookup, x:=new, on=.(x=old)]
data
id x
1: 1 a
2: 2 a
3: 3 B
4: 4 C
5: 5 d
6: 6 AA
7: 7 !
Benchmarking
Expanding the original dataset to 10M rows, 15 runs using microbenchmark gave the following results on my computer:
Note that the forcats::fct_recode and dplyr::recode solutions mentioned by the OP have also been included. Neither works with the updated data, because the named vector that resolves to . = ! throws an error, which is why the solutions are tested on the original dataset.
data = data.frame(
id = 1:5,
x = c("A", "A", "B", "C", "D")
)
lookup = data.frame(
old = c("A", "D"),
new = c("a", "d")
)
set.seed(1)
data <- data[sample(1:5, 1E7, replace = T),]
dt_lookup <- data.table::copy(lookup)
dplyr_coalesce <- function(){
library(dplyr)
lookupV <- setNames(lookup$new, lookup$old)
data %>%
dplyr::mutate(x = coalesce(lookupV[ x ], x))
}
datatable_in <- function(){
library(data.table)
lookupV <- setNames(lookup$new, lookup$old)
setDT(dt_data)
dt_data[ x %in% names(lookupV), x := lookupV[ x ] ]
}
datatable <- function(){
library(data.table)
setDT(dt_data)
setDT(dt_lookup)
## If needed
# setkey(data,x)
# setkey(lookup,old)
dt_data[dt_lookup, x:=new, on =.(x=old)]
}
purrr_modify_if <- function(){
library(dplyr)
library(purrr)
lookupV <- setNames(lookup$new, lookup$old)
data %>%
dplyr::mutate(x = modify_if(x, x %in% lookup$old, ~ lookupV[.x]))
}
stringr_str_replace_all_update <- function(){
library(dplyr)
library(stringr)
lookupV <- setNames(lookup$new, do.call(sprintf, list("^\\Q%s\\E$", lookup$old)))
data %>%
dplyr::mutate(x = str_replace_all(x, lookupV))
}
base_named_vector <- function(){
lookupV <- c(with(lookup, setNames(new, old)), rlang::set_names(setdiff(unique(data$x), lookup$old)))
lookupV[data$x]
}
base_ifelse <- function(){
lookupV <- setNames(lookup$new, lookup$old)
with(data, ifelse(x %in% lookup$old, lookupV[x], x))
}
plyr_mapvalues <- function(){
library(plyr)
data %>%
dplyr::mutate(x = plyr::mapvalues(x, lookup$old, lookup$new, warn_missing = F))
}
base_match <- function(){
tochange <- match(data$x, lookup$old, nomatch = 0)
data$x[tochange > 0] <- lookup$new[tochange]
}
base_local_safe_lookup <- function(){
lv <- structure(lookup$new, names = lookup$old)
safe_lookup <- function(val) {
new_val <- lv[val]
unname(ifelse(is.na(new_val), val, new_val))
}
safe_lookup(data$x)
}
dplyr_recode <- function(){
library(dplyr)
lookupV <- setNames(lookup$new, lookup$old)
data %>%
dplyr::mutate(x = recode(x, !!!lookupV))
}
base_for <- function(){
for (i in seq_len(nrow(lookup))) {
data$x[data$x == lookup$old[i]] = lookup$new[i]
}
}
datatable_for <- function(){
library(data.table)
setDT(dt_data)
for (i in seq_len(nrow(lookup))) {
dt_data[x == lookup$old[i], x := lookup$new[i]]
}
}
forcats_fct_recode <- function(){
library(dplyr)
library(forcats)
lookupV <- setNames(lookup$new, lookup$old)
data %>%
dplyr::mutate(x = as.character(fct_recode(x, !!!lookupV)))
}
datatable_set <- function(){
library(data.table)
setDT(dt_data)
tochange <- dt_data[, chmatch(x, lookup$old, nomatch = 0)]
set(dt_data, i = which(tochange > 0), j = "x", value = lookup$new[tochange])
}
library(microbenchmark)
bench <- microbenchmark(dplyr_coalesce(),
datatable(),
datatable_in(),
datatable_for(),
base_for(),
purrr_modify_if(),
stringr_str_replace_all_update(),
base_named_vector(),
base_ifelse(),
plyr_mapvalues(),
base_match(),
base_local_safe_lookup(),
dplyr_recode(),
forcats_fct_recode(),
datatable_set(),
times = 15L,
setup = dt_data <- data.table::copy(data))
bench$expr <- forcats::fct_rev(forcats::fct_reorder(bench$expr, bench$time, mean))
ggplot2::autoplot(bench)
Thanks to @Waldi and @nicola for advice on implementing the data.table solutions in the benchmark.
Combination of a named vector and coalesce:
# make lookup vector
lookupV <- setNames(lookup$new, lookup$old)
data %>%
mutate(x = coalesce(lookupV[ x ], x))
# id x
# 1 1 a
# 2 2 a
# 3 3 B
# 4 4 C
# 5 5 d
Or data.table:
library(data.table)
setDT(data)
data[ x %in% names(lookupV), x := lookupV[ x ] ]
This post might have a better solution for data.table - "update on merge":
R data table: update join
A base R option using match - thanks to @LMc & @nicola
tochange <- match(data$x, lookup$old, nomatch = 0)
data$x[tochange > 0] <- lookup$new[tochange]
One more data.table option using set() and chmatch
library(data.table)
setDT(data)
tochange <- data[, chmatch(x, lookup$old, nomatch = 0)]
set(data, i = which(tochange > 0), j = "x", value = lookup$new[tochange])
Result
data
# id x
#1 1 a
#2 2 a
#3 3 B
#4 4 C
#5 5 d
#6 6 AA
#7 7 !
modify_if
You could use purrr::modify_if to only apply the named vector to values that exist in it. Though not a specified requirement, it has the benefit of the .else argument, which allows you to apply a different function to values not in your lookup.
I also wanted to include the use of tibble::deframe here to create the named vector. It is slower than setNames, though.
lookupV <- deframe(lookup)
data %>%
mutate(x = modify_if(x, x %in% lookup$old, ~ lookupV[.x]))
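For example, a small sketch of the .else argument mentioned above (replacing unmatched values with NA is an arbitrary choice for illustration, not part of the question's requirements):
data %>%
  mutate(x = modify_if(x, x %in% lookup$old, ~ lookupV[.x], .else = ~ NA_character_))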
str_replace_all
Alternatively, you could use stringr::str_replace_all, which can take a named vector for the replacement argument.
data %>%
mutate(x = str_replace_all(x, lookupV))
Update
To accommodate the change to your edited example, the named vector used in str_replace_all needs to be modified. This way, the entire literal string needs to be matched so that "A" does not get substituted inside "AA" and "." does not replace everything:
lookupV <- setNames(lookup$new, do.call(sprintf, list("^\\Q%s\\E$", lookup$old)))
data %>%
mutate(x = str_replace_all(x, lookupV))
left_join
Using dplyr::left_join, this is very similar to the OP's solution, but it uses the .keep argument of mutate so it has fewer steps. This argument is currently in the experimental lifecycle and so it is not included in the benchmark (though it lands around the middle of the posted solutions).
left_join(data, lookup, by = c("x" = "old")) %>%
mutate(x = coalesce(new, x) , .keep = "unused")
Base R
Named Vector
Create a substitution value for every unique value in your dataframe.
lookupV <- c(with(lookup, setNames(new, old)), setNames(nm = setdiff(unique(data$x), lookup$old)))
data$x <- lookupV[data$x]
ifelse
with(data, ifelse(x %in% lookup$old, lookupV[x], x))
Another clear option is to use a for-loop with subsetting to loop through the rows of the lookup table. This will almost always be quicker with data.table because of auto indexing, or if you set the key (i.e., ?data.table::setkey()) ahead of time. It will, of course, get slower as the lookup table gets longer. I would guess an update join would be preferred if there is a long lookup table.
Base R:
for (i in seq_len(nrow(lookup))) {
data$x[data$x == lookup$old[i]] <- lookup$new[i]
}
data$x
# [1] "a" "a" "B" "C" "d" "AA" "!"
Or the same logic with data.table:
library(data.table)
setDT(data)
for (i in seq_len(nrow(lookup))) {
data[x == lookup$old[i], x := lookup$new[i]]
}
data$x
# [1] "a" "a" "B" "C" "d" "AA" "!"
Data:
data = data.frame(
id = 1:7,
x = c("A", "A", "B", "C", "D", "AA", ".")
)
lookup = data.frame(
old = c("A", "D", "."),
new = c("a", "d", "!")
)
Another base solution, with a lookup vector:
## Toy example
data = data.frame(
id = 1:5,
x = c("A", "A", "B", "C", "D"),
stringsAsFactors = F
)
lookup = data.frame(
old = c("A", "D"),
new = c("a", "d"),
stringsAsFactors = F
)
lv <- structure(lookup$new, names = lookup$old)
safe_lookup <- function(val) {
new_val <- lv[val]
unname(ifelse(is.na(new_val), val, new_val))
}
data$x <- safe_lookup(data$x)
A dplyr + plyr solution that meets all your bullet points (if you consider plyr part of the tidyverse):
data <- data %>%
  dplyr::mutate(
    x = plyr::mapvalues(x, lookup$old, lookup$new) # add warn_missing = FALSE to silence warnings
  )
I basically share the same problem. Although dplyr::recode is in the "questioning" lifecycle, I don't expect it to become deprecated. At some point it might be superseded, but even in that case it should still be usable. Therefore I'm using a wrapper around dplyr::recode which allows the use of a named vector or two separate vectors (which could come from a lookup table).
library(dplyr)
library(rlang)
recode2 <- function(x, new, old = NULL, .default = NULL, .missing = NULL) {
  if (!rlang::is_named(new) && !is.null(old)) {
    new <- setNames(new, old)
  }
  do.call(dplyr::recode,
          c(.x = list(x),
            .default = list(.default),
            .missing = list(.missing),
            as.list(new)))
}
data = data.frame(
id = 1:7,
x = c("A", "A", "B", "C", "D", "AA", ".")
)
lookup = data.frame(
old = c("A", "D", "."),
new = c("a", "d", "!")
)
# two vectors new / old
data %>%
mutate(x = recode2(x, lookup$new, lookup$old))
#> id x
#> 1 1 a
#> 2 2 a
#> 3 3 B
#> 4 4 C
#> 5 5 d
#> 6 6 AA
#> 7 7 !
# named vector
data %>%
mutate(x = recode2(x, c("A" = "a",
"D" = "d",
"." = "!")))
#> id x
#> 1 1 a
#> 2 2 a
#> 3 3 B
#> 4 4 C
#> 5 5 d
#> 6 6 AA
#> 7 7 !
Created on 2021-04-21 by the reprex package (v0.3.0)

How can I create a new data frame with several rows for each observation based on string column?

I have a data frame in R with data on observations. One column contains several data points for each observation, recorded as one long string with separators. I would like to restructure this data so that one observation can span several rows instead, as per the example below.
The data right now looks like this:
df <- data.frame(matrix(c("A", "B",
"X", "Y",
"{data1},{data2}", "{data1}"),
nrow = 2,
ncol = 3,
byrow = F))
names(df) <- c("key", "info", "more_info")
I would like it to look like this:
df <- data.frame(matrix(c("A", "A", "B",
"X", "X", "Y",
"{data1}", "{data2}", "{data1}"),
nrow = 3,
ncol = 3,
byrow = F))
names(df) <- c("key", "info", "more_info")
My first idea was to use separate() and then pivot_longer(), but this ran into issues since the number of values in the last column is not the same for each observation. In fact, for some observations it may consist of hundreds of records.
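Presumably that attempt looked something like the sketch below (the m1/m2 names and the hard-coded count of 2 are assumptions, and that hard-coded count is exactly why it doesn't generalise when a row holds hundreds of values):
library(dplyr)
library(tidyr)
df %>%
  separate(more_info, into = c("m1", "m2"), sep = ",", fill = "right") %>%
  pivot_longer(c(m1, m2), values_to = "more_info", values_drop_na = TRUE) %>%
  select(-name)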
You can use separate_rows from tidyr:
> library(tidyr)
> separate_rows(df, more_info, sep=",")
# A tibble: 3 x 3
key info more_info
<fct> <fct> <chr>
1 A X {data1}
2 A X {data2}
3 B Y {data1}
An option with unnest after strsplit
library(dplyr)
library(tidyr)
df %>%
mutate(more_info = strsplit(more_info, ",")) %>%
unnest(c(more_info))

Extracting first column that meets certain criteria for each row

I will try to explain what I am doing as best I can; it is kind of confusing, but I'll give it a shot. Essentially I start with 2 data frames, each containing a unique row per person and two items per person as columns. My goal is to turn this into 1 data frame with one unique row per person and the first item from each of the two data frames, on the condition that the items do not repeat. For example, if for customer 1 the items in the first data frame are "a" and "d" and the items in the second data frame are "a" and "c", I would want the final data frame to contain "a" and "c" for this customer. I have written an apply that does this, but when I run it on roughly 160,000 rows it takes quite a bit of time. I was hoping someone could come up with a more efficient solution.
d1 <- data.frame(id = c("1", "2", "3"), stringsAsFactors = F)
r1 <- data.frame(i1 = c("a", "b", "c"), i2 = c("d", "e", "f"), stringsAsFactors = F)
rownames(r1) = d1$id
r2 <- data.frame(i1 = c("a", "c", "f"), i2 = c("c", "t", "l"), stringsAsFactors = F)
rownames(r2) = d1$id
dFinal <- data.frame(id = d1$id, r1 = "", r2 = "", stringsAsFactors = F)
dFinal$r1 = apply(dFinal, 1, function(x){r1[rownames(r1) == x["id"], "i1"]})
dFinal$r2 = apply(dFinal, 1, function(x){r2[rownames(r2) == x["id"], which(!r2[rownames(r2) == x["id"],c("i1","i2")] %in% x["r1"])[1]]})
Would the following do what you're looking for?
# Keep only first column of first data.frame
df <- cbind(d1,r1,r2)[,-3]
names(df) <- c("id","r1_final","r2_i1","r2_i2")
df$r2_final <- df$r2_i1
# Keep only second column of second data.frame
# if the value in the first column is found in first data.frame
df[df$r1_final == df$r2_i1,"r2_final"] <- df[df$r1_final == df$r2_i1,"r2_i2"]
df_final <- df[,c("id","r1_final","r2_final")]
print(df_final)
id r1_final r2_final
1 1 a c
2 2 b c
3 3 c f
Edit:
The OP asked for a solution for four data frames instead of the 2 in the example. Here is some code that I haven't tested, but it should work with the two additional columns:
df$r2_final <- df$r2_i1
df$r3_final <- df$r3_i1
df$r4_final <- df$r4_i1
df[df$r1_final == df$r2_i1,"r2_final"] <- df[df$r1_final == df$r2_i1,"r2_i2"]
df[df$r3_i1 %in% c(df$r1_final,df$r2_final),"r3_final"] <- df[df$r3_i1 %in% c(df$r1_final,df$r2_final),"r3_i2"]
df[df$r4_i1 %in% c(df$r1_final,df$r2_final,df$r3_final),"r4_final"] <- df[df$r4_i1 %in% c(df$r1_final,df$r2_final,df$r3_final),"r4_i2"]
df_final <- df[,c("id","r1_final","r2_final","r3_final","r4_final")]
Thanks for the accepted answer, which worked perfectly! However, it gave me an idea to use ifelse. While it doesn't perform any better or worse than the accepted answer, it was a little easier for me to wrap my head around when adding more columns or data frames.
dfInt <- cbind(df1, df2, df3, df4)
dfInt$R1_Final <- dfInt$R1_1
dfInt$R2_Final <- ifelse(dfInt$R1_Final == dfInt$R2_1,
                         dfInt$R2_2,
                         dfInt$R2_1)
dfInt$R3_Final <- ifelse(dfInt$R1_Final != dfInt$R3_1 & dfInt$R2_Final != dfInt$R3_1,
                         dfInt$R3_1,
                         ifelse(dfInt$R2_Final != dfInt$R3_2,
                                dfInt$R3_2,
                                dfInt$R3_3))
dfInt$R4_Final <- ifelse(dfInt$R1_Final != dfInt$R4_1 & dfInt$R2_Final != dfInt$R4_1 & dfInt$R3_Final != dfInt$R4_1,
                         dfInt$R4_1,
                         ifelse(dfInt$R2_Final != dfInt$R4_2 & dfInt$R3_Final != dfInt$R4_2,
                                dfInt$R4_2,
                                ifelse(dfInt$R3_Final != dfInt$R4_3,
                                       dfInt$R4_3,
                                       dfInt$R4_4)))
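As a rough generalisation of the same first-unused-item idea (a sketch only, assuming the per-source tables are collected in a list and share row names, as r1 and r2 do above):
pick_first_unused <- function(dfs) {
  # dfs: list of data frames with identical row names and item columns (i1, i2, ...)
  out <- data.frame(id = rownames(dfs[[1]]), stringsAsFactors = FALSE)
  used <- rep(list(character(0)), nrow(out)) # items already assigned, per row
  for (k in seq_along(dfs)) {
    items <- as.matrix(dfs[[k]])
    pick <- character(nrow(out))
    for (r in seq_len(nrow(out))) {
      candidates <- setdiff(items[r, ], used[[r]])
      pick[r] <- candidates[1] # NA if every item in this source was already used
      used[[r]] <- c(used[[r]], pick[r])
    }
    out[[paste0("r", k)]] <- pick
  }
  out
}
pick_first_unused(list(r1, r2))
#   id r1 r2
# 1  1  a  c
# 2  2  b  c
# 3  3  c  f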
