Finding difference based on interaction between two vectors - r

I have two data frames, each with two columns I'd like to compare. I want output containing only the combinations of the two columns that appear in the first data frame but not in the second.
I've tried merge, %in%, interaction(), and match(), and I can't seem to get the correct output. I've also searched extensively on SO and am not finding a similar problem.
The closest response I've found is:
newdat <- match(interaction(dfA$colA, dfA$colB), interaction(dfB$colA, dfB$colB))
But obviously this code isn't correct: even if it worked, it would give me what is common between the data frames, and I want the difference between them. (As written it also just returns a numeric vector of match positions, even though both colA and colB are strings.)
Example data:
#Dataframe A
colA colB
Aspirin Smith, John
Aspirin Doe, Jane
Atorva Smith, John
Simva Doe, Jane
#Dataframe B
colA colB
Aspirin Smith, John
Aspirin Doe, Jane
Atorva Doe, Jane
## GOAL:
#Dataframe
colA colB
Atorva Smith, John
Simva Doe, Jane
Thanks!

We can use setdiff from the dplyr package.
library(dplyr)
setdiff(datA, datB)
# colA colB
# 1 Atorva Smith, John
# 2 Simva Doe, Jane
DATA
datA <- read.table(text = " colA colB
Aspirin 'Smith, John'
Aspirin 'Doe, Jane'
Atorva 'Smith, John'
Simva 'Doe, Jane'",
header = TRUE, stringsAsFactors = FALSE)
datB <- read.table(text = " colA colB
Aspirin 'Smith, John'
Aspirin 'Doe, Jane'
Atorva 'Doe, Jane'",
header = TRUE, stringsAsFactors = FALSE)

If you want a base R solution, it's easy to write a setdiffDF function.
setdiffDF <- function(x, y){
  # keep the rows of x that do not already appear in y
  ix <- !duplicated(rbind(y, x))[nrow(y) + 1:nrow(x)]
  x[ix, ]
}
setdiffDF(dfA, dfB)
# colA colB
#3 Atorva Smith, John
#4 Simva Doe, Jane
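As an aside, the interaction()/match() idea from the question can also be made to work by keeping the rows of dfA whose colA/colB combination has no match in dfB (a quick sketch, assuming the key columns contain no NAs):
# rows of dfA whose (colA, colB) combination does not appear in dfB
no_match <- is.na(match(interaction(dfA$colA, dfA$colB),
                        interaction(dfB$colA, dfB$colB)))
dfA[no_match, ]   # gives the same Atorva/Simva rows as above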
Data in dput format.
dfA <-
structure(list(colA = structure(c(1L, 1L, 2L, 3L),
.Label = c("Aspirin", "Atorva", "Simva"), class = "factor"),
colB = structure(c(2L, 1L, 2L, 1L), .Label = c("Doe, Jane",
"Smith, John"), class = "factor")), class = "data.frame",
row.names = c(NA, -4L))
dfB <-
structure(list(colA = structure(c(1L, 1L, 2L),
.Label = c("Aspirin", "Atorva"), class = "factor"),
colB = structure(c(2L, 1L, 1L), .Label = c("Doe, Jane",
"Smith, John"), class = "factor")), class = "data.frame",
row.names = c(NA, -3L))

Related

How to combine values within a row with comma

I would like to know how to combine columns of a data frame/list in R with a comma separator. Below is the sample dataset.
Name Red Blue Green
Jack 4 5 3
John 5 6 4
Gen 3 7 1
Pra 4 6 2
Expected would be:
Name Colors
Jack 4,5,3
John 5,6,4
Gen 3,7,1
Pra 4,6,2
Immediate help would be appreciated.
Thanks in advance
We can use paste with do.call. Note that even if you have 100 columns to paste, the code below handles them automatically, without having to painfully spell out paste(df1$Red, df1$Blue, df1$Green, df1$Orange, ..., sep=",") by hand.
newdf1 <- cbind(df1[1], Colors=do.call(paste, c(df1[-1], sep=",")))
newdf1
# Name Colors
#1 Jack 4,5,3
#2 John 5,6,4
#3 Gen 3,7,1
#4 Pra 4,6,2
Or a similar option with sprintf
cbind(df1[1], Colors=do.call(sprintf, c(df1[-1], list(fmt="%d,%d,%d"))))
Or with unite from tidyr
library(dplyr)
library(tidyr)
df1 %>%
  unite(Colors, Red:Green, sep = ",")
# Name Colors
#1 Jack 4,5,3
#2 John 5,6,4
#3 Gen 3,7,1
#4 Pra 4,6,2
I'd suggest the paste function with a "," separator.
df$Colors<-paste(df$Red, df$Blue, df$Green, sep =",")
You can achieve this by using the unite function from the tidyr package:
tidyr::unite(df_test, Color, -Name, sep = ', ')
data:
structure(list(Name = c("Jack", "John", "Gen", "Pra"),
Red = c(4L, 5L, 3L, 4L),
Blue = c(5L, 6L, 7L, 6L),
Green = c(3L, 4L, 1L, 2L)),
class = "data.frame",
row.names = c(NA, -4L)) -> df_test
I would copy the data into Excel, add a column D containing just a comma, and use the formula =A1&D1&B1&D1&C1,
where A, B, and C are the data columns and D is the comma column.

How to divide the column values in to ranges

I would like to know how to divide the column values into three different ranges based on scores.
Here's the data I have:
Name V1.1 V1.2 V2.1 V2.2 V3.1 V3.2
John French 86 Math 78 English 56
Sam Math 97 French 86 English 79
Viru English 93 Math 44 French 34
I consider three ranges: the first range is 0-60, the second range is 61-90, and the third range is 91-100.
I would like to list the subject names in each range across all the skills.
Expected result would be
Name Level1 Level2 Level3
John English Math,French Null
Sam Null French,Eng Math
Viru Math,Fren Null English
First you need to convert the data to long form, one row per observation (where an observation is a single score). You need to do a melt, but it is complicated because your wide form consists of not only observations but also observation classes. One way to do it is to use melt.data.table twice, but you may be more comfortable with dplyr, which has more accessible syntax.
# first you need to convert to long form
library("data.table")
setDT(df)
lhs <- melt.data.table(df, id = "Name", measure = patterns("\\.2"),
                       variable.name = "obs", value.name = "score")
lhs[, obs := gsub("(V\\d+)\\.\\d+", "\\1", obs)]
lhs
rhs <- melt.data.table(df, id = "Name", measure = patterns("V\\d\\.1"),
                       variable.name = "obs", value.name = "subject")
rhs[, obs := gsub("(V\\d+)\\.\\d+", "\\1", obs)]
rhs
df2 <- merge(lhs, rhs, by = c("Name", "obs"))
df2
#    Name obs score subject
# 1: John V1 86 French
# 2: John V2 78 Math
# 3: John V3 56 English
# 4: Sam V1 97 Math
# 5: Sam V2 86 French
# 6: Sam V3 79 English
# 7: Viru V1 93 English
# 8: Viru V2 44 Math
# 9: Viru V3 34 French
Then you need to use cut or some other function to create the three levels based on score.
Then you should group by these levels and apply concatenation to the subjects, such as paste(..., collapse = ",").
Then you need to use cast or spread to return it to wide form.
Do give it some effort, and edit your question with what you've tried, and try to come up with a more specific question, not just "please do this for me".
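For reference, here is a minimal sketch of those remaining steps, continuing from df2 above (using cut() for the ranges and data.table's dcast() to go back to wide form; combinations with no subjects come out as NA rather than Null):
# assign a level to each score, collapse subjects per Name/level, then widen
df2[, level := cut(score, breaks = c(0, 60, 90, 100),
                   labels = c("Level1", "Level2", "Level3"))]
dcast(df2[, .(subjects = paste(subject, collapse = ",")), by = .(Name, level)],
      Name ~ level, value.var = "subjects")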
Another option using splitstackshape and nested ifelse
library(splitstackshape)
library(tidyr)
# prepare data to convert in long format
data$subjects = do.call(paste, c(data[,grep("\\.1", colnames(data))], sep = ','))
data$marks = do.call(paste, c(data[,grep("\\.2", colnames(data))], sep = ','))
data = data[,-grep("V", colnames(data))]
# use cSplit to convert wide to long
out = cSplit(setDT(data), sep = ",", c("subjects", "marks"), "long")
# nested ifelse to assign level based on the score range
out[, level := ifelse(marks <= 60, "level1",
                      ifelse(marks <= 90, "level2", "level3"))]
req = out[, toString(subjects), by= c("Name","level")]
This will give:
#> req
# Name level V1
#1: John level2 French, Math
#2: John level1 English
#3: Sam level3 Math
#4: Sam level2 French, English
#5: Viru level3 English
#6: Viru level1 Math, French
You can reshape using either dcast or spread from tidyr:
spread(req, level, V1)
# Name level1 level2 level3
#1: John English French, Math NA
#2: Sam NA French, English Math
#3: Viru Math, French NA English
data
data = structure(list(Name = structure(1:3, .Label = c("John", "Sam",
"Viru"), class = "factor"), V1.1 = structure(c(2L, 3L, 1L), .Label = c("English",
"French", "Math"), class = "factor"), V1.2 = c(86L, 97L, 93L),
V2.1 = structure(c(2L, 1L, 2L), .Label = c("French", "Math"
), class = "factor"), V2.2 = c(78L, 86L, 44L), V3.1 = structure(c(1L,
1L, 2L), .Label = c("English", "French"), class = "factor"),
V3.2 = c(56L, 79L, 34L)), .Names = c("Name", "V1.1", "V1.2",
"V2.1", "V2.2", "V3.1", "V3.2"), class = "data.frame", row.names = c(NA,
-3L))
Not very intuitive, but leads to the requested output. Requires the package sjmisc!
library(sjmisc)
library(dplyr)  # needed for %>%, select() and arrange() below

mydat <- data.frame(Name = c("John", "Sam", "Viru"),
                    V1.1 = c("French", "Math", "English"),
                    V1.2 = c(86, 97, 93),
                    V2.1 = c("Math", "French", "Math"),
                    V2.2 = c(78, 86, 44),
                    V3.1 = c("English", "English", "French"),
                    V3.2 = c(56, 79, 34))
# recode into groups
rec(mydat[, c(3, 5, 7)]) <- "min:60=1; 61:90=2; 91:max=3"
# convert to long format
newdf <- to_long(mydat, "no_use",
                 c("subject", "score"),
                 c("V1.1", "V2.1", "V3.1"),
                 c("V1.2", "V2.2", "V3.2")) %>%
  select(-no_use) %>%
  arrange(Name, score)
# at this point we are at a similar stage as described in the
# other answers, so we have our data in a long format
newdf
fdf <- list()
# iterate all unique names
for (i in unique(newdf$Name)) {
  dummy <- c()
  # iterate all three scores
  for (s in 1:3) {
    # find subjects related to the score
    dat <- newdf$subject[newdf$Name == i & newdf$score == s]
    if (length(dat) == 0) dat <- ""
    dat <- paste0(dat, collapse = ",")
    dummy <- c(dummy, dat)
  }
  # add character vector with sorted subjects to list
  fdf[[length(fdf) + 1]] <- dummy
}
# list to data frame
finaldf <- as.data.frame(t(as.data.frame(fdf)))
finaldf <- cbind(unique(newdf$Name), finaldf)
# proper row/col names
colnames(finaldf) <- c("Names", "Level1", "Level2", "Level3")
rownames(finaldf) <- 1:nrow(finaldf)
finaldf

Tidy data Melt and Cast

In Wickham's Tidy Data paper (PDF) he has an example of going from messy data to tidy data.
I wonder where the code is?
For example, what code is used to go from
Table 1: Typical presentation dataset.
to
Table 3: The same data as in Table 1 but with variables in columns and observations in rows.
Perhaps melt or cast. But from http://www.statmethods.net/management/reshape.html I can't see how.
(Note to self: Need it for GDPpercapita...)
The answer sort of depends on what the structure of your data is. In the paper you linked to, Hadley was writing about the "reshape" and "reshape2" packages.
It's ambiguous what the data structure is in "Table 1". Judging by the description, it would sound like a matrix with named dimnames (like I show in mymat). In that case, a simple melt would work:
library(reshape2)
melt(mymat)
# Var1 Var2 value
# 1 John Smith treatmenta —
# 2 Jane Doe treatmenta 16
# 3 Mary Johnson treatmenta 3
# 4 John Smith treatmentb 2
# 5 Jane Doe treatmentb 11
# 6 Mary Johnson treatmentb 1
If it were not a matrix, but a data.frame with row.names, you can still use the matrix method by using something like melt(as.matrix(mymat)).
If, on the other hand, the "names" are a column in a data.frame (as they are in the "tidyr" vignette), you need to specify either the id.vars or the measure.vars so that melt knows how to treat the columns.
melt(mydf, id.vars = "name")
# name variable value
# 1 John Smith treatmenta —
# 2 Jane Doe treatmenta 16
# 3 Mary Johnson treatmenta 3
# 4 John Smith treatmentb 2
# 5 Jane Doe treatmentb 11
# 6 Mary Johnson treatmentb 1
The new kid on the block is "tidyr". The "tidyr" package works with data.frames because it is often used in conjunction with dplyr. I won't reproduce the code for "tidyr" here, because that is sufficiently covered in the vignette.
Sample data:
mymat <- structure(c("—", "16", "3", " 2", "11", " 1"), .Dim = c(3L,
2L), .Dimnames = list(c("John Smith", "Jane Doe", "Mary Johnson"
), c("treatmenta", "treatmentb")))
mydf <- structure(list(name = structure(c(2L, 1L, 3L), .Label = c("Jane Doe",
"John Smith", "Mary Johnson"), class = "factor"), treatmenta = c("—",
"16", "3"), treatmentb = c(2L, 11L, 1L)), .Names = c("name",
"treatmenta", "treatmentb"), row.names = c(NA, 3L), class = "data.frame")

Find discrepancies between two tables

I'm working with R from a SAS/SQL background, and am trying to write code to take two tables, compare them, and provide a list of the discrepancies. This code would be used repeatedly for many different sets of tables, so I need to avoid hardcoding.
I'm working from Identifying specific differences between two data sets in R, but it doesn't get me all the way there.
Example Data, using the combination of LastName/FirstName (which is unique) as a key --
Dataset One --
Last_Name First_Name Street_Address ZIP VisitCount
Doe John 1234 Main St 12345 20
Doe Jane 4321 Tower St 54321 10
Don Bob 771 North Ave 23232 5
Smith Mike 732 South Blvd. 77777 3
Dataset Two --
Last_Name First_Name Street_Address ZIP VisitCount
Doe John 1234 Main St 12345 20
Doe Jane 4111 Tower St 32132 17
Donn Bob 771 North Ave 11111 5
Desired Output --
LastName FirstName VarName TableOne TableTwo
Doe Jane StreetAddress 4321 Tower St 4111 Tower St
Doe Jane Zip 23232 32132
Doe Jane VisitCount 5 17
Note that this output ignores records where I don't have the same ID in both tables (for instance, because Bob's last name is "Don" in one table, and "Donn" in another table, we ignore that record entirely).
I've explored doing this by applying the melt function on both datasets and then comparing them, but the size of the data I'm working with indicates that wouldn't be practical. In SAS, I used Proc Compare for this kind of work, but I haven't found an exact equivalent in R.
Here is a solution based on data.table:
library(data.table)
# Convert into data.table, melt
setDT(d1)
d1 <- d1[, list(VarName = names(.SD), TableOne = unlist(.SD, use.names = FALSE)),
         by = c('Last_Name', 'First_Name')]
setDT(d2)
d2 <- d2[, list(VarName = names(.SD), TableTwo = unlist(.SD, use.names = FALSE)),
         by = c('Last_Name', 'First_Name')]
# Set keys for merging
setkey(d1,Last_Name,First_Name,VarName)
# Merge, remove duplicates
d1[d2,nomatch=0][TableOne!=TableTwo]
# Last_Name First_Name VarName TableOne TableTwo
# 1: Doe Jane Street_Address 4321 Tower St 4111 Tower St
# 2: Doe Jane ZIP 54321 32132
# 3: Doe Jane VisitCount 10 17
where input data sets are:
# Input Data Sets
d1 <- structure(list(Last_Name = c("Doe", "Doe", "Don", "Smith"), First_Name = c("John",
"Jane", "Bob", "Mike"), Street_Address = c("1234 Main St", "4321 Tower St",
"771 North Ave", "732 South Blvd."), ZIP = c(12345L, 54321L,
23232L, 77777L), VisitCount = c(20L, 10L, 5L, 3L)), .Names = c("Last_Name",
"First_Name", "Street_Address", "ZIP", "VisitCount"), class = "data.frame", row.names = c(NA, -4L))
d2 <- structure(list(Last_Name = c("Doe", "Doe", "Donn"), First_Name = c("John",
"Jane", "Bob"), Street_Address = c("1234 Main St", "4111 Tower St",
"771 North Ave"), ZIP = c(12345L, 32132L, 11111L), VisitCount = c(20L,
17L, 5L)), .Names = c("Last_Name", "First_Name", "Street_Address",
"ZIP", "VisitCount"), class = "data.frame", row.names = c(NA, -3L))
dplyr and tidyr work well here. First, a slightly reduced dataset:
dat1 <- data.frame(Last_Name = c('Doe', 'Doe', 'Don', 'Smith'),
                   First_Name = c('John', 'Jane', 'Bob', 'Mike'),
                   ZIP = c(12345, 54321, 23232, 77777),
                   VisitCount = c(20, 10, 5, 3),
                   stringsAsFactors = FALSE)
dat2 <- data.frame(Last_Name = c('Doe', 'Doe', 'Donn'),
                   First_Name = c('John', 'Jane', 'Bob'),
                   ZIP = c(12345, 32132, 11111),
                   VisitCount = c(20, 17, 5),
                   stringsAsFactors = FALSE)
(Sorry, I didn't want to type it all in. If it's important, please provide a reproducible example with well-defined data structures.)
Additionally, it looks like your "desired output" is a little off with Jane Doe's ZIP and VisitCount.
Your thought to melt them works well:
library(dplyr)
library(tidyr)
dat1g <- gather(dat1, key, value, -Last_Name, -First_Name)
dat2g <- gather(dat2, key, value, -Last_Name, -First_Name)
head(dat1g)
## Last_Name First_Name key value
## 1 Doe John ZIP 12345
## 2 Doe Jane ZIP 54321
## 3 Don Bob ZIP 23232
## 4 Smith Mike ZIP 77777
## 5 Doe John VisitCount 20
## 6 Doe Jane VisitCount 10
From here, it's deceptively simple:
dat1g %>%
inner_join(dat2g, by = c('Last_Name', 'First_Name', 'key')) %>%
filter(value.x != value.y)
## Last_Name First_Name key value.x value.y
## 1 Doe Jane ZIP 54321 32132
## 2 Doe Jane VisitCount 10 17
The dataCompareR package aims to solve this exact problem. The vignette for the package includes some simple examples, and I've used this package to solve the original problem below.
Disclaimer: I was involved with creating this package.
library(dataCompareR)
d1 <- structure(list(Last_Name = c("Doe", "Doe", "Don", "Smith"), First_Name = c("John", "Jane", "Bob", "Mike"), Street_Address = c("1234 Main St", "4321 Tower St", "771 North Ave", "732 South Blvd."), ZIP = c(12345L, 54321L, 23232L, 77777L), VisitCount = c(20L, 10L, 5L, 3L)), .Names = c("Last_Name", "First_Name", "Street_Address", "ZIP", "VisitCount"), class = "data.frame", row.names = c(NA, -4L))
d2 <- structure(list(Last_Name = c("Doe", "Doe", "Donn"), First_Name = c("John", "Jane", "Bob"), Street_Address = c("1234 Main St", "4111 Tower St", "771 North Ave"), ZIP = c(12345L, 32132L, 11111L), VisitCount = c(20L, 17L, 5L)), .Names = c("Last_Name", "First_Name", "Street_Address", "ZIP", "VisitCount"), class = "data.frame", row.names = c(NA, -3L))
compd1d2 <- rCompare(d1, d2, keys = c("First_Name", "Last_Name"))
print(compd1d2)
All columns were compared, 3 row(s) were dropped from comparison
There are 3 mismatched variables:
First and last 5 observations for the 3 mismatched variables
FIRST_NAME LAST_NAME valueA valueB variable typeA typeB diffAB
1 Jane Doe 4321 Tower St 4111 Tower St STREET_ADDRESS character character
2 Jane Doe 10 17 VISITCOUNT integer integer -7
3 Jane Doe 54321 32132 ZIP integer integer 22189
To get a more detailed and pretty summary, the user can run
summary(compd1d2)
The use of FIRST_NAME and LAST_NAME as the 'join' between the two tables is controlled by the keys = argument to the rCompare function. In this case any rows that do not match on these two variables are dropped from the comparison, but you can get more detailed output on the comparison performed by using summary().

transform one long row in data-frame to individual records

I have a variable-length list of people that I get as one long row in a data frame, and I am interested in reorganizing these records into a more meaningful format.
My raw data looks like this,
df <- data.frame(name1 = "John Doe", email1 = "John#Doe.com", phone1 = "(444) 444-4444",
                 name2 = "Jane Doe", email2 = "Jane#Doe.com", phone2 = "(444) 444-4445",
                 name3 = "John Smith", email3 = "John#Smith.com", phone3 = "(444) 444-4446",
                 name4 = NA, email4 = "Jane#Smith.com", phone4 = NA,
                 name5 = NA, email5 = NA, phone5 = NA)
df
# name1 email1 phone1 name2 email2 phone2
# 1 John Doe John#Doe.com (444) 444-4444 Jane Doe Jane#Doe.com (444) 444-4445
# name3 email3 phone3 name4 email4 phone4 name5
# 1 John Smith John#Smith.com (444) 444-4446 NA Jane#Smith.com NA NA
# email5 phone5
# 1 NA NA
and I am trying to bend it into a format like this,
df_transform <- structure(list(name = structure(c(2L, 1L, 3L, NA, NA), .Label = c("Jane Doe",
"John Doe", "John Smith"), class = "factor"), email = structure(c(3L,
1L, 4L, 2L, NA), .Label = c("Jane#Doe.com", "Jane#Smith.com",
"John#Doe.com", "John#Smith.com"), class = "factor"), phone = structure(c(1L,
2L, 3L, NA, NA), .Label = c("(444) 444-4444", "(444) 444-4445",
"(444) 444-4446"), class = "factor")), .Names = c("name", "email",
"phone"), class = "data.frame", row.names = c(NA, -5L))
df_transform
# name email phone
# 1 John Doe John#Doe.com (444) 444-4444
# 2 Jane Doe Jane#Doe.com (444) 444-4445
# 3 John Smith John#Smith.com (444) 444-4446
# 4 <NA> Jane#Smith.com <NA>
# 5 <NA> <NA> <NA>
It should be added that it's not always five records; it could be any number between 1 and 99. I tried with reshape2's melt and t() but it got way too complicated. I imagine there is some known method that I simply do not know about.
You're on the right track, try this:
library(reshape2)
# melt it down
df.melted = melt(t(df))
# get rid of the numbers at the end
df.melted$Var1 = sub('[0-9]+$', '', df.melted$Var1)
# cast it back
dcast(df.melted, (seq_len(nrow(df.melted)) - 1) %/% 3 ~ Var1)[,-1]
# email name phone
#1 John#Doe.com John Doe (444) 444-4444
#2 Jane#Doe.com Jane Doe (444) 444-4445
#3 John#Smith.com John Smith (444) 444-4446
#4 Jane#Smith.com <NA> <NA>
#5 <NA> <NA> <NA>
1) reshape() First we strip off the digits from the column names giving the reduced column names, names0. Then we split the columns into groups producing g (which has three components corresponding to the email, name and phone column groups). Then use reshape (from the base of R) to perform the wide to long transformation and select from the resulting long data frame the desired columns in order to exclude the columns that are added automatically by reshape. That selection vector, unique(names0), is such that it reorders the resulting columns in the desired way.
names0 <- sub("\\d+$", "", names(df))
g <- split(names(df), names0)
reshape(df, dir = "long", varying = g, v.names = names(g))[unique(names0)]
and the last line gives this:
name email phone
1.1 John Doe John#Doe.com (444) 444-4444
1.2 Jane Doe Jane#Doe.com (444) 444-4445
1.3 John Smith John#Smith.com (444) 444-4446
1.4 <NA> Jane#Smith.com <NA>
1.5 <NA> <NA> <NA>
2) reshape2 package Here is a solution using reshape2. We add a rowname column to df and melt it to long form. Then we split the variable column into the name portion (name, email, phone) and the numeric suffix portion which we call id. Finally we convert it back to wide form using dcast and select out the appropriate columns as we did before.
library(reshape2)
m <- melt(data.frame(rowname = 1:nrow(df), df), id = 1)
mt <- transform(m,
                variable = sub("\\d+$", "", variable),
                id = sub("^\\D+", "", variable))
dcast(mt, rowname + id ~ variable)[, unique(mt$variable)]
where the last line gives this:
name email phone
1 John Doe John#Doe.com (444) 444-4444
2 Jane Doe Jane#Doe.com (444) 444-4445
3 John Smith John#Smith.com (444) 444-4446
4 <NA> Jane#Smith.com <NA>
5 <NA> <NA> <NA>
3) Simple matrix reshaping. Remove the numeric suffixes from the column names and set cn to the unique remaining names (cn stands for column names). Then we merely reshape the df row into an n x length(cn) matrix, adding the column names.
cn <- unique(sub("\\d+$", "", names(df)))
matrix(as.matrix(df), ncol = length(cn), byrow = TRUE,
       dimnames = list(NULL, cn))
name email phone
[1,] "John Doe" "John#Doe.com" "(444) 444-4444"
[2,] "Jane Doe" "Jane#Doe.com" "(444) 444-4445"
[3,] "John Smith" "John#Smith.com" "(444) 444-4446"
[4,] NA "Jane#Smith.com" NA
[5,] NA NA NA
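If you need a data frame rather than a character matrix here (to match df_transform), you can wrap the same call in as.data.frame, e.g.:
as.data.frame(matrix(as.matrix(df), ncol = length(cn), byrow = TRUE,
                     dimnames = list(NULL, cn)),
              stringsAsFactors = FALSE)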
4) tapply This problem can also be solved with a simple tapply. As before, names0 is the column names without the numeric suffixes, and names.suffix is just the suffixes. Now use tapply:
names0 <- sub("\\d+$", "", names(df))
names.suffix <- sub("^\\D+", "", names(df))
tapply(as.matrix(df), list(names.suffix, names0), c)[, unique(names0)]
The last line gives:
name email phone
1 "John Doe" "John#Doe.com" "(444) 444-4444"
2 "Jane Doe" "Jane#Doe.com" "(444) 444-4445"
3 "John Smith" "John#Smith.com" "(444) 444-4446"
4 NA "Jane#Smith.com" NA
5 NA NA NA
