Replace all non NAs across multiple columns with a specific string - r

Given the following example dataset:
df <- structure(list(Id = 1:10,
Department = c("A", "B", "A", "C",
"A", "B", "B", "C", "D", "A"),
Q1 = c("US", NA, NA, "US",
NA, "US", NA, "US", NA, "US"),
Q2 = c("Comp B", NA, NA,
"Comp B", "Comp B", NA, "Comp B", NA, "Comp B", "Comp B"),
Q3 = c(NA, NA, NA, NA, NA, NA, "Comp C", NA, NA, NA),
Q4 = c(NA, "Comp D", NA, "Comp D", NA, NA, NA, NA, "Comp D", NA),
Sales = c(10, 23, 12, 5, 5, 76, 236, 4, 3, 10)),
row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"))
Is there a way to replace all non NA values in columns Q2:Q4 with, for instance, the word "Competitor" all at once? I know how to do string_replace on individual columns but with over 100 columns, with different words to be replaced in each, I'm hoping there is a quicker way. I tried messing around with various versions of mutate(across(Q2:Q4, ~str_replace(.x, !is.na, "Competitor"))), which I modelled after mutate(across(Q2:Q4, ~replace_na(.x, 0))) but that didn't work. I'm still not clear on the syntax on across except for the most simple operations and don't even know if it is applicable here.
Thanks!

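str_replace is for replacing a substring within each element. Its second argument must be a pattern, but !is.na is not a pattern and is not even a call: is.na is a function, and here it is never applied to the column. Instead, we can use replace to overwrite each non-NA element as a whole: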
library(dplyr)
df1 <- df %>%
mutate(across(Q2:Q4, ~ replace(., !is.na(.), "Competitor")))
-output
# A tibble: 10 x 7
Id Department Q1 Q2 Q3 Q4 Sales
<int> <chr> <chr> <chr> <chr> <chr> <dbl>
1 1 A US Competitor <NA> <NA> 10
2 2 B <NA> <NA> <NA> Competitor 23
3 3 A <NA> <NA> <NA> <NA> 12
4 4 C US Competitor <NA> Competitor 5
5 5 A <NA> Competitor <NA> <NA> 5
6 6 B US <NA> <NA> <NA> 76
7 7 B <NA> Competitor Competitor <NA> 236
8 8 C US <NA> <NA> <NA> 4
9 9 D <NA> Competitor <NA> Competitor 3
10 10 A US Competitor <NA> <NA> 10
Or in base R
nm1 <- grep("^Q[2-4]$", names(df), value = TRUE)
df[nm1][!is.na(df[nm1])] <- "Competitor"
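To see what replace() does before wrapping it in across(), here is a minimal base R sketch on a single vector. The vector x is a made-up example, not part of the original data.

```r
# Hypothetical toy vector standing in for one of the Q columns
x <- c("Comp B", NA, "Comp B")

# Logical mask that is TRUE at the non-missing positions
mask <- !is.na(x)

# replace() returns a copy of x with the masked positions overwritten;
# the NA at position 2 is left untouched
y <- replace(x, mask, "Competitor")
print(y)
```

The same call works inside across() because each selected column is passed to the lambda as a vector like x.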

Here is another option:
library(dplyr)
library(purrr)
df %>%
mutate(pmap_df(select(df, Q2:Q4), ~ replace(c(...), !is.na(c(...)), "Competitor")))
# A tibble: 10 x 7
Id Department Q1 Q2 Q3 Q4 Sales
<int> <chr> <chr> <chr> <chr> <chr> <dbl>
1 1 A US Competitor NA NA 10
2 2 B NA NA NA Competitor 23
3 3 A NA NA NA NA 12
4 4 C US Competitor NA Competitor 5
5 5 A NA Competitor NA NA 5
6 6 B US NA NA NA 76
7 7 B NA Competitor Competitor NA 236
8 8 C US NA NA NA 4
9 9 D NA Competitor NA Competitor 3
10 10 A US Competitor NA NA 10

Related

How to remove column(s) if a row contains a value?

I have seen lots of posts on how to remove rows if user specified columns contain a certain string.
I want to do the reverse and generalise it. I want to remove every column if any row in that column contains a certain string. (To compare with Excel, I would find all cells containing a given string and then delete every column.)
How can I do this? I was thinking with dplyr and filter, but I have to specify columns I think, or at least the way I would know how to approach it. But I have 300 odd columns and almost 4000 rows.
EDIT: Here is a sample of my dataframe.
# A tibble: 6 x 310
ISIN AU000KFWHAC9 AU3CB0243657 AU3CB0256162 AU3CB0260321 AU3CB0265239 AU3CB0283190 AU3SG0001928 AU3SG0002371
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 Timestamp MID_PRICE Mid Price Cl~ Mid Price C~ Mid Price C~ Mid Price C~ Mid Price C~ Mid Price C~ Mid Price C~
2 41275 Invalid RIC. NA NA Invalid RIC. NA Invalid RIC. NA NA
3 41276 NA NA NA NA NA NA NA NA
4 41277 NA NA NA NA 3 NA NA NA
5 41278 NA NA NA NA NA NA NA NA
6 41279 5 NA 4 NA NA NA NA NA
So as you can see, the dataframe is full of lots of NA's. I am unsure if this will affect some functions' ability.
With a dataframe of:
> df <- data.frame(a=c("a", "b", "c"), b=c("bad string", "d", "e"), c=c("f", "g", "h"))
> df
a b c
1 a bad string f
2 b d g
3 c e h
>
Use colSums:
> df[, !colSums(df == "bad string")]
a c
1 a f
2 b g
3 c h
>
Only keep columns where colSums is 0.
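One caveat, given that the asker's data is full of NAs: comparing NA with a string gives NA, so colSums() returns NA for any column that contains a missing value, and the result can no longer be used directly to pick columns. A hedged sketch of the same idea with na.rm = TRUE, using a made-up data frame df_na:

```r
# Made-up data with missing values, for illustration only
df_na <- data.frame(
  a = c("a", NA, "c"),
  b = c("bad string", NA, "e"),
  c = c("f", "g", NA),
  stringsAsFactors = FALSE
)

# Count matches per column, ignoring the NA comparisons
hits <- colSums(df_na == "bad string", na.rm = TRUE)

# Keep only the columns with zero matches; drop = FALSE keeps a data frame
kept <- df_na[, hits == 0, drop = FALSE]
print(kept)
```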
You can grep your search:
dat[,-grep("Invalid", dat)]
ISIN AU3CB0243657 AU3CB0256162 AU3CB0265239 AU3SG0001928 AU3SG0002371
1 Timestamp MidPriceC~ MidPriceC~ MidPriceC~ MidPriceC~ MidPriceC~
2 41275 <NA> <NA> <NA> <NA> <NA>
3 41276 <NA> <NA> <NA> <NA> <NA>
4 41277 <NA> <NA> 3 <NA> <NA>
5 41278 <NA> <NA> <NA> <NA> <NA>
6 41279 <NA> 4 <NA> <NA> <NA>
Data:
dat <- structure(list(ISIN = c("Timestamp", "41275", "41276", "41277",
"41278", "41279"), AU000KFWHAC9 = c("MID_PRICE", "Invalid_RIC.",
NA, NA, NA, "5"), AU3CB0243657 = c("MidPriceC~", NA, NA, NA,
NA, NA), AU3CB0256162 = c("MidPriceC~", NA, NA, NA, NA, "4"),
AU3CB0260321 = c("MidPriceC~", "Invalid_RIC.", NA, NA, NA,
NA), AU3CB0265239 = c("MidPriceC~", NA, NA, "3", NA, NA),
AU3CB0283190 = c("MidPriceC~", "Invalid_RIC.", NA, NA, NA,
NA), AU3SG0001928 = c("MidPriceC~", NA, NA, NA, NA, NA),
AU3SG0002371 = c("MidPriceC~", NA, NA, NA, NA, NA)), class = "data.frame", row.names = c(NA,
-6L))
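A caveat on the grep() approach: if no column matches, grep() returns integer(0), and negative indexing with an empty vector selects no columns at all rather than all of them. Below is a minimal sketch of a variant built on a logical mask, which does not have this problem. The data frame dat_small is made up for illustration.

```r
# Made-up two-column example
dat_small <- data.frame(
  x = c("a", "Invalid RIC."),
  y = c("b", NA),
  stringsAsFactors = FALSE
)

# TRUE for every column that has at least one matching cell
# (grepl() returns FALSE for NA, so missing values are harmless)
has_invalid <- vapply(dat_small, function(col) any(grepl("Invalid", col)), logical(1))
clean <- dat_small[, !has_invalid, drop = FALSE]

# Pitfall: a pattern that matches nothing drops every column
none <- dat_small[, -grep("zzz", dat_small), drop = FALSE]
print(clean)
print(ncol(none))
```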
A solution using dplyr. We can use select and where to apply a function to check if a column contains a certain string or not. dat is from Andre Wildberg's answer.
library(dplyr)
dat2 <- dat %>%
select(where(function(x) all(!grepl("Invalid", x))))
dat2
# ISIN AU3CB0243657 AU3CB0256162 AU3CB0265239 AU3SG0001928 AU3SG0002371
# 1 Timestamp MidPriceC~ MidPriceC~ MidPriceC~ MidPriceC~ MidPriceC~
# 2 41275 <NA> <NA> <NA> <NA> <NA>
# 3 41276 <NA> <NA> <NA> <NA> <NA>
# 4 41277 <NA> <NA> 3 <NA> <NA>
# 5 41278 <NA> <NA> <NA> <NA> <NA>
# 6 41279 <NA> 4 <NA> <NA> <NA>

How to move data from one column to another in R

I am trying to move data from one column to another, due to the underlying forms being filled out incorrectly.
In the form it asks for information on a household and asks for their age(AGE) and gender(SEX) for each member, allowing up to 5 people per household. However some users have filled in information for person 1,3 and 4, but not filled in any info for person 2 because they filled out person 2 incorrectly, crossed out the details and have filled person 2 details into the person 3 boxes etc.
The data looks like this (ref 1 and 5 are correct in this data, all others are incorrect)
df <- data.frame(
ref = c(1, 2, 3, 4, 5, 6),
AGE1 = c(45, 36, 26, 47, 24, NA),
AGE2 = c(NA, 24, NA, 13, 57, 28),
AGE3 = c(NA, NA, 35, NA, NA, 26),
AGE4 = c(NA, NA, 15, 11, NA, NA),
AGE5 = c(NA, 15, NA, NA, NA, NA),
SEX1 = c("M", "F", "M", "M", "M", NA),
SEX2 = c(NA, "M", NA, "F", "F", "F"),
SEX3 = c(NA, NA, "M", NA, NA, "M"),
SEX4 = c(NA, NA, "F", "F", NA, NA),
SEX5 = c(NA, "F", NA, NA, NA, NA)
)
This is what the table looks like currently
(I have replaced NA with - to make reading easier)
ref AGE1 AGE2 AGE3 AGE4 AGE5 SEX1 SEX2 SEX3 SEX4 SEX5
1   45   -    -    -    -    M    -    -    -    -
2   36   24   -    -    15   F    M    -    -    F
3   26   -    35   15   -    M    -    M    F    -
4   47   13   -    11   -    M    F    -    F    -
5   24   57   -    -    -    M    F    -    -    -
6   -    28   26   -    -    -    F    M    -    -
But I would like it to look like this:
ref AGE1 AGE2 AGE3 AGE4 AGE5 SEX1 SEX2 SEX3 SEX4 SEX5
1   45   -    -    -    -    M    -    -    -    -
2   36   24   15   -    -    F    M    F    -    -
3   26   35   15   -    -    M    M    F    -    -
4   47   13   11   -    -    M    F    F    -    -
5   24   57   -    -    -    M    F    -    -    -
6   28   26   -    -    -    F    M    -    -    -
Is there a way of correcting this using dplyr? If not, is there another way in R of correcting the data?
Here is a way using dplyr and tidyr. The approach involves pivoting the data to longer format, sorting the NA values to the end, renumbering the column names, and then pivoting to wide form again.
library(dplyr)
library(tidyr)
# df is the data frame defined in the question
df %>%
mutate(across(starts_with("AGE"), as.character)) %>%
pivot_longer(2:11) %>%
separate(name, into = c("cat", "num"), 3) %>%
arrange(is.na(value)) %>%
group_by(ref, cat) %>%
mutate(num = seq_along(value)) %>%
ungroup() %>%
arrange(cat) %>%
unite(name, cat, num, sep = "") %>%
pivot_wider(id_cols = ref) %>%
mutate(across(starts_with("AGE"), as.numeric))
# A tibble: 6 x 11
ref AGE1 AGE2 AGE3 AGE4 AGE5 SEX1 SEX2 SEX3 SEX4 SEX5
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr> <chr> <chr> <chr>
1 1 45 NA NA NA NA M NA NA NA NA
2 2 36 24 15 NA NA F M F NA NA
3 3 26 35 15 NA NA M M F NA NA
4 4 47 13 11 NA NA M F F NA NA
5 5 24 57 NA NA NA M F NA NA NA
6 6 28 26 NA NA NA F M NA NA NA
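Here's a way using the dplyr and tidyr libraries. Note that arrange(ref, AGE, SEX) sorts the members of each household by age, so the order of people differs from the requested output, and the resulting columns are named AGE_1, SEX_1, and so on.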
library(dplyr)
library(tidyr)
df %>%
pivot_longer(cols = -ref,
names_to = c('.value', 'num'),
names_pattern = '([A-Z]+)(\\d+)') %>%
arrange(ref, AGE, SEX) %>%
group_by(ref) %>%
mutate(num = row_number()) %>%
ungroup %>%
pivot_wider(names_from = num, values_from = c(AGE, SEX))
# ref AGE_1 AGE_2 AGE_3 AGE_4 AGE_5 SEX_1 SEX_2 SEX_3 SEX_4 SEX_5
# <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr> <chr> <chr> <chr>
#1 1 45 NA NA NA NA M NA NA NA NA
#2 2 15 24 36 NA NA F M F NA NA
#3 3 15 26 35 NA NA F M M NA NA
#4 4 11 13 47 NA NA F F M NA NA
#5 5 24 57 NA NA NA M F NA NA NA
#6 6 26 28 NA NA NA M F NA NA NA
Try the base code below
u1 <- reshape(
setNames(df, sub("(\\d)", ".\\1", names(df))),
direction = "long",
idvar = "ref",
varying = -1
)
u2 <- reshape(
transform(
u1[with(u1, order(is.na(AGE), is.na(SEX))), ],
time = ave(time, ref, FUN = seq_along)
),
direction = "wide",
idvar = "ref"
)
out <- u2[match(names(df),sub("\\.","",names(u2)))]
and you will get
> out
ref AGE.1 AGE.2 AGE.3 AGE.4 AGE.5 SEX.1 SEX.2 SEX.3 SEX.4 SEX.5
1.1 1 45 NA NA NA NA M <NA> <NA> <NA> <NA>
2.1 2 36 24 15 NA NA F M F <NA> <NA>
3.1 3 26 35 15 NA NA M M F <NA> <NA>
4.1 4 47 13 11 NA NA M F F <NA> <NA>
5.1 5 24 57 NA NA NA M F <NA> <NA> <NA>
6.2 6 28 26 NA NA NA F M <NA> <NA> <NA>
data
df <- data.frame(
ref = c(1, 2, 3, 4, 5, 6),
AGE1 = c(45, 36, 26, 47, 24, NA),
AGE2 = c(NA, 24, NA, 13, 57, 28),
AGE3 = c(NA, NA, 35, NA, NA, 26),
AGE4 = c(NA, NA, 15, 11, NA, NA),
AGE5 = c(NA, 15, NA, NA, NA, NA),
SEX1 = c("M", "F", "M", "M", "M", NA),
SEX2 = c(NA, "M", NA, "F", "F", "F"),
SEX3 = c(NA, NA, "M", NA, NA, "M"),
SEX4 = c(NA, NA, "F", "F", NA, NA),
SEX5 = c(NA, "F", NA, NA, NA, NA)
)
Here is a solution using package dedupewider:
library(dedupewider)
df <- data.frame(
ref = c(1, 2, 3, 4, 5, 6),
AGE1 = c(45, 36, 26, 47, 24, NA),
AGE2 = c(NA, 24, NA, 13, 57, 28),
AGE3 = c(NA, NA, 35, NA, NA, 26),
AGE4 = c(NA, NA, 15, 11, NA, NA),
AGE5 = c(NA, 15, NA, NA, NA, NA),
SEX1 = c("M", "F", "M", "M", "M", NA),
SEX2 = c(NA, "M", NA, "F", "F", "F"),
SEX3 = c(NA, NA, "M", NA, NA, "M"),
SEX4 = c(NA, NA, "F", "F", NA, NA),
SEX5 = c(NA, "F", NA, NA, NA, NA)
)
age_moved <- na_move(df, cols = names(df)[grepl("^AGE\\d$", names(df))]) # 'right' direction is by default
sex_moved <- na_move(age_moved, cols = names(df)[grepl("^SEX\\d$", names(df))])
sex_moved
#> ref AGE1 AGE2 AGE3 AGE4 AGE5 SEX1 SEX2 SEX3 SEX4 SEX5
#> 1 1 45 NA NA NA NA M <NA> <NA> NA NA
#> 2 2 36 24 15 NA NA F M F NA NA
#> 3 3 26 35 15 NA NA M M F NA NA
#> 4 4 47 13 11 NA NA M F F NA NA
#> 5 5 24 57 NA NA NA M F <NA> NA NA
#> 6 6 28 26 NA NA NA F M <NA> NA NA
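For comparison, the same left shift can be sketched in base R without reshaping, by reordering each row so that its non-NA values come first. The helper shift_left and the data frame df_demo are made up for this illustration. Like the answers above, it moves the AGE and SEX blocks independently, so it assumes that every person has either both fields filled in or neither.

```r
# Small made-up data frame in the same shape as the question's data
df_demo <- data.frame(
  ref  = c(1, 2, 3),
  AGE1 = c(45, 36, NA),
  AGE2 = c(NA, NA, 28),
  AGE3 = c(NA, 15, 26),
  SEX1 = c("M", "F", NA),
  SEX2 = c(NA, NA, "F"),
  SEX3 = c(NA, "F", "M"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: move the non-NA values of each row to the front,
# keeping their original left-to-right order
shift_left <- function(block) {
  shifted <- t(apply(block, 1, function(r) c(r[!is.na(r)], r[is.na(r)])))
  block[] <- shifted
  block
}

age_cols <- grep("^AGE\\d+$", names(df_demo))
sex_cols <- grep("^SEX\\d+$", names(df_demo))
df_demo[age_cols] <- shift_left(df_demo[age_cols])
df_demo[sex_cols] <- shift_left(df_demo[sex_cols])
print(df_demo)
```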

In R, a more elegant solution for finding "not-missing" values in one column, then adding a string based on these rows in a new variable?

In the multiple choice questions I have to analyze, there is an "other" option. For those where they can only pick one option, I will be uniting the separate columns of each answer choice into 1 column using unite. Wherever a person has written in an "other" string, instead of choosing one of the provided options, I want to change that row to say "Other."
For example:
ID Sector1 Sector2 Sector3 .... Sector13(Other)
A NA NA "String3" NA
B "String1" NA NA NA
C "String1" NA NA NA
D NA NA NA "Other string1"
E NA NA NA "Other string2"
ID NewSectorColumn
A "String3"
B "String1"
C "String1"
D "Other"
E "Other"
Here's my code:
I first create the new variable ($SectorOther) in RawData1, then change the values of $SectorOther into "Other" if $Sector13 (the column which contains all the "other" answers) is not blank.
rawdata1$SectorOther <- rawdata1$sector13
rawdata1$SectorOther [which(!is.na(rawdata1$SectorOther))] <- "Other Sector"
Just wondering if there is a more elegant solution, where I can combine doing the same thing to a few other variables like this one.
Maybe you can use apply() with a custom function to organize your code like this. The function takes the first non-NA value in each row, and grepl() checks whether that value contains the word "other". Here is the code:
#Code
myfun <- function(x)
{
y <- x[min(which(!is.na(x)))]
y <- ifelse(grepl('other',y,ignore.case = TRUE),'Other',y)
return(y)
}
#Apply
df$Newvar <- apply(df[,-1],1,myfun)
Output:
df
ID Sector1 Sector2 Sector3 Sector13.Other. Newvar
1 A <NA> NA String3 <NA> String3
2 B String1 NA <NA> <NA> String1
3 C String1 NA <NA> <NA> String1
4 D <NA> NA <NA> Other string1 Other
5 E <NA> NA <NA> Other string2 Other
Some data used:
#Data
df <- structure(list(ID = c("A", "B", "C", "D", "E"), Sector1 = c(NA,
"String1", "String1", NA, NA), Sector2 = c(NA, NA, NA, NA, NA
), Sector3 = c("String3", NA, NA, NA, NA), Sector13.Other. = c(NA,
NA, NA, "Other string1", "Other string2")), row.names = c(NA,
-5L), class = "data.frame")
In base R you could do:
indices <- cbind(1:nrow(df),max.col(!is.na(df[-1]), 'first'))
df$New_col <- sub('(Other).*', '\\1', df[-1][indices])
df
ID Sector1 Sector2 Sector3 Sector13.Other. New_col
1 A <NA> NA String3 <NA> String3
2 B String1 NA <NA> <NA> String1
3 C String1 NA <NA> <NA> String1
4 D <NA> NA <NA> Other string1 Other
5 E <NA> NA <NA> Other string2 Other
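To unpack the max.col() step above: applied to the logical matrix !is.na(...), with ties resolved as "first", it returns for each row the position of the first non-missing cell. A minimal sketch on a made-up character matrix m:

```r
# Made-up matrix: row 1 has its first value in column 2, row 2 in column 1
m <- rbind(c(NA, "x", "y"),
           c("z", NA, NA))

# Position of the first TRUE (non-missing) cell in each row
first_idx <- max.col(!is.na(m), ties.method = "first")

# Matrix indexing with (row, column) pairs picks one cell per row
picked <- m[cbind(seq_len(nrow(m)), first_idx)]
print(first_idx)
print(picked)
```

For a row that is entirely NA, every cell ties, so the first column is returned and the picked value is simply NA.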
Another option is to use coalesce from dplyr:
df$NewSectorColumn <- sub("(Other).*","\\1",dplyr::coalesce(!!!df[-1]))
df
ID Sector1 Sector2 Sector3 Sector13.Other. NewSectorColumn
1 A <NA> NA String3 <NA> String3
2 B String1 NA <NA> <NA> String1
3 C String1 NA <NA> <NA> String1
4 D <NA> NA <NA> Other string1 Other
5 E <NA> NA <NA> Other string2 Other
An option with fcoalesce from data.table:
library(data.table)
setDT(df)[, NewSectorColumn := sub("\\s+.*", "",
do.call(fcoalesce, lapply(.SD, as.character))),
.SDcols = patterns('^Sector')]
-output
df
# ID Sector1 Sector2 Sector3 Sector13.Other. NewSectorColumn
#1: A <NA> NA String3 <NA> String3
#2: B String1 NA <NA> <NA> String1
#3: C String1 NA <NA> <NA> String1
#4: D <NA> NA <NA> Other string1 Other
#5: E <NA> NA <NA> Other string2 Other
data
df <- structure(list(ID = c("A", "B", "C", "D", "E"), Sector1 = c(NA,
"String1", "String1", NA, NA), Sector2 = c(NA, NA, NA, NA, NA
), Sector3 = c("String3", NA, NA, NA, NA), Sector13.Other. = c(NA,
NA, NA, "Other string1", "Other string2")), row.names = c(NA,
-5L), class = "data.frame")
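Returning to the asker's own two-line recode, one way to apply it to several write-in columns at once is lapply() over the relevant column names. This is only a sketch: the data frame rawdata_demo and the column names Sector13 and Region9 are hypothetical.

```r
# Made-up data with two write-in ("other") columns
rawdata_demo <- data.frame(
  ID       = c("A", "B", "C"),
  Sector13 = c(NA, "Other string1", NA),
  Region9  = c("Other string2", NA, NA),
  stringsAsFactors = FALSE
)

# Hypothetical list of the write-in columns to recode
other_cols <- c("Sector13", "Region9")

# Overwrite every non-missing write-in answer with the label "Other"
rawdata_demo[other_cols] <- lapply(
  rawdata_demo[other_cols],
  function(x) replace(x, !is.na(x), "Other")
)
print(rawdata_demo)
```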

collapsing columns values in a specific order and leaving the missing values as NA in R

I am using R.
I have 4 different databases. Each one has values for my variables. Some of the databases have more values than others, so I want to use the one with the most values first and the one with the fewest values last. The data looks like this...
Variables  A  B  C  D
John       2  4
Mike          6
Walter        7
Jennifer      9  8
Amanda     3
Carlos     9
Michael       3
James               5
Kevin      4
Dennis           7
Frank
Steven
Joseph
Elvis         2
Maria         1
So, in order to fill in the data, I need to create a new column that first uses the data from column B because it is the one that contains the most values, then A, then C, and then D; anything still missing needs to be NA. I also need to add another column that gives the source of the data. In other words, if I am using column B for John's value, I need a column that tells me that the data comes from column B.
The column should look like this...
Variables E D
John 4 B
Mike 6 B
Walter 7 B
Jennifer 9 B
Amanda 3 A
Carlos 9 A
Michael 3 B
James 5 D
Kevin 4 A
Dennis 7 C
Frank NA NA
Steven NA NA
Joseph NA NA
Elvis 2 B
Maria 1 B
With tidyverse you can do the following...
Use pivot_longer to put into long form. Make name an ordered factor by "B", "A", "C", and "D". Then when you arrange, you can get the first value by this order within each person's name.
This assumes your missing data are NA. If they are instead blank character values, you can filter those out with filter(value != "") instead of drop_na(value).
library(tidyverse)
df %>%
pivot_longer(cols = -Variables) %>%
mutate(name = ordered(name, levels = c('B', 'A', 'C', 'D'))) %>%
group_by(Variables) %>%
drop_na(value) %>%
arrange(name) %>%
summarise(E = first(value),
New_D = first(name)) %>%
right_join(df)
Output
Variables E New_D A B C D
<chr> <dbl> <ord> <dbl> <dbl> <dbl> <dbl>
1 Amanda 3 A 3 NA NA NA
2 Carlos 9 A 9 NA NA NA
3 Dennis 7 C NA NA 7 NA
4 Elvis 2 B NA 2 NA NA
5 James 5 D NA NA NA 5
6 Jennifer 9 B NA 9 8 NA
7 John 4 B 2 4 NA NA
8 Kevin 4 A 4 NA NA NA
9 Maria 1 B NA 1 NA NA
10 Michael 3 B NA 3 NA NA
11 Mike 6 B NA 6 NA NA
12 Walter 7 B NA 7 NA NA
13 Frank NA NA NA NA NA NA
14 Steven NA NA NA NA NA NA
15 Joseph NA NA NA NA NA NA
Data
df <- structure(list(Variables = c("John", "Mike", "Walter", "Jennifer",
"Amanda", "Carlos", "Michael", "James", "Kevin", "Dennis", "Frank",
"Steven", "Joseph", "Elvis", "Maria"), A = c(2, NA, NA, NA, 3,
9, NA, NA, 4, NA, NA, NA, NA, NA, NA), B = c(4, 6, 7, 9, NA,
NA, 3, NA, NA, NA, NA, NA, NA, 2, 1), C = c(NA, NA, NA, 8, NA,
NA, NA, NA, NA, 7, NA, NA, NA, NA, NA), D = c(NA, NA, NA, NA,
NA, NA, NA, 5, NA, NA, NA, NA, NA, NA, NA)), class = "data.frame", row.names = c(NA,
-15L))
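The same priority rule can also be sketched in base R: reorder the columns by priority, find the first non-missing column per row, and read off both the value and the column name. The data frame df_demo below is a three-row, made-up subset, and the output column name Source is an assumption.

```r
# Made-up subset of the data: one row per case (B wins, A wins, nothing)
df_demo <- data.frame(
  Variables = c("John", "Amanda", "Frank"),
  A = c(2, 3, NA),
  B = c(4, NA, NA),
  C = c(NA_real_, NA, NA),
  D = c(NA_real_, NA, NA),
  stringsAsFactors = FALSE
)

# Columns in order of preference
priority <- c("B", "A", "C", "D")
vals <- as.matrix(df_demo[priority])

# First non-missing column per row, in priority order
first_col <- max.col(!is.na(vals), ties.method = "first")
has_any <- rowSums(!is.na(vals)) > 0

# Value and source column; rows with no data at all stay NA
df_demo$E <- ifelse(has_any, vals[cbind(seq_len(nrow(vals)), first_col)], NA)
df_demo$Source <- ifelse(has_any, priority[first_col], NA)
print(df_demo)
```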

Appending a value from one table into another based on a criteria

I have two tables that I'm trying to join in a particular way. One is a simple tibble that provides a HEX color and its category that it is associated with:
library(tibble)
library(dplyr)
colors <- tibble(Category = c("A", "B", "C", "D"),
Colors = c("#0079c0", "#cc9900", "#252525", "#c5120e"))
# A tibble: 4 × 2
Category Colors
<chr> <chr>
1 A #0079c0
2 B #cc9900
3 C #252525
4 D #c5120e
I have another tibble that lists the categories both as rows and columns, and those appear in a specific way:
Main_Table <- tibble(Category = c("A", "B", "C", "D"),
A = c(NA, "A", NA, NA),
B = c(NA, NA, NA, NA),
C = c(NA, "C", NA, NA),
D = c("D", "D", NA, NA))
# A tibble: 4 × 5
Category A B C D
<chr> <chr> <lgl> <chr> <chr>
1 A <NA> NA <NA> D
2 B A NA C D
3 C <NA> NA <NA> <NA>
4 D <NA> NA <NA> <NA>
I want to join the color into the main table based on whether its corresponding category is present under the variable that bears its name. For example, let's say that if I want category D's color to be included, I'd end up with the below:
Main_Table_Goal <- tibble(Category = c("A", "B", "C", "D"),
A = c(NA, "A", NA, NA),
B = c(NA, NA, NA, NA),
C = c(NA, "C", NA, NA),
D = c("D", "D", NA, NA),
color = c("#c5120e", "#c5120e", NA, NA))
# A tibble: 4 × 6
Category A B C D color
<chr> <chr> <lgl> <chr> <chr> <chr>
1 A <NA> NA <NA> D #c5120e
2 B A NA C D #c5120e
3 C <NA> NA <NA> <NA> <NA>
4 D <NA> NA <NA> <NA> <NA>
How do I achieve this using dplyr? I've been trying with *_join and other tricks, but I've not gotten anywhere.
EDIT: I should have mentioned that I'd like to eventually include this in a function, so ideally the code can be flexible to accommodate any number of categories.
Here is an option using match
Main_Table %>%
mutate(color = colors$Colors[match(D, colors$Category)])
# A tibble: 4 × 6
# Category A B C D color
# <chr> <chr> <lgl> <chr> <chr> <chr>
#1 A <NA> NA <NA> D #c5120e
#2 B A NA C D #c5120e
#3 C <NA> NA <NA> <NA> <NA>
#4 D <NA> NA <NA> <NA> <NA>
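Since the asker wants to wrap this in a function eventually, the match() lookup can be sketched as a small base R helper that takes the target column as a string. The function name add_color and its arguments are made up for illustration; plain data frames stand in for the tibbles.

```r
# Lookup table and main table rebuilt as plain data frames
colors <- data.frame(
  Category = c("A", "B", "C", "D"),
  Colors   = c("#0079c0", "#cc9900", "#252525", "#c5120e"),
  stringsAsFactors = FALSE
)
Main_Table <- data.frame(
  Category = c("A", "B", "C", "D"),
  A = c(NA, "A", NA, NA),
  B = c(NA, NA, NA, NA),
  C = c(NA, "C", NA, NA),
  D = c("D", "D", NA, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: look up the color of whatever category appears
# in the column named by `target`; NA cells give NA colors
add_color <- function(main, lookup, target) {
  idx <- match(main[[target]], lookup$Category)
  main$color <- lookup$Colors[idx]
  main
}

res_d <- add_color(Main_Table, colors, "D")
res_a <- add_color(Main_Table, colors, "A")
print(res_d)
```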
I am not sure how many categories you have in your data. But if you have only four (i.e., A, B, C, and D), the following would be one way. I wanted to work with one data frame, so I first merged the two data frames. I converted B from logical to character since I wanted to use mutate_at(). Then, I replaced the four categories with the four colors. Finally, I removed Colors and converted B back to logical.
library(dplyr)
left_join(Main_Table, colors) %>%
mutate(B = as.character(B)) %>%
mutate_at(vars(A:D),
funs(color = recode(., A = Colors[1],
B = Colors[2],
C = Colors[3],
D = Colors[4]))) %>%
select(-Colors) %>%
mutate(B = as.logical(B))
Given akrun's idea, you can do the following. As long as you know how many categories you have, you just specify the columns in vars(). If all columns are character, there is no need to convert logical to character.
left_join(Main_Table, colors) %>%
mutate(B = as.character(B)) %>%
mutate_at(vars(A:D),funs(color = Colors[match(., Category)])) %>%
select(-Colors) %>%
mutate(B = as.logical(B))
# Category A B C D A_color B_color C_color D_color
# <chr> <chr> <lgl> <chr> <chr> <chr> <chr> <chr> <chr>
#1 A <NA> NA <NA> D <NA> <NA> <NA> #c5120e
#2 B A NA C D #0079c0 <NA> #252525 #c5120e
#3 C <NA> NA <NA> <NA> <NA> <NA> <NA> <NA>
#4 D <NA> NA <NA> <NA> <NA> <NA> <NA> <NA>
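mutate_at() has since been superseded by across(), and funs() is deprecated in current dplyr. A hedged sketch of the same per-column lookup with across(), assuming dplyr 1.0.0 or later. It looks the colors up directly with match(), so no join and no conversion of B are needed.

```r
library(dplyr)

colors <- tibble(Category = c("A", "B", "C", "D"),
                 Colors = c("#0079c0", "#cc9900", "#252525", "#c5120e"))
Main_Table <- tibble(Category = c("A", "B", "C", "D"),
                     A = c(NA, "A", NA, NA),
                     B = c(NA, NA, NA, NA),
                     C = c(NA, "C", NA, NA),
                     D = c("D", "D", NA, NA))

# For each of A:D, look up the color of the category found in that column;
# .names appends "_color" so the original columns are kept
res <- Main_Table %>%
  mutate(across(A:D,
                ~ colors$Colors[match(.x, colors$Category)],
                .names = "{.col}_color"))
print(res)
```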
This is a dynamic solution where you set the target category once at the top:
target_category <- 'D' # set the category whose color should be added
target_category_table <- Main_Table %>%
select_(target_category) %>%
left_join(colors %>%
filter(Category == target_category) %>%
setNames(c(target_category, 'color')))
goal_table <- Main_Table %>%
bind_cols(select(target_category_table, color))
goal_table
Result:
# A tibble: 4 × 6
Category A B C D color
<chr> <chr> <lgl> <chr> <chr> <chr>
1 A <NA> NA <NA> D #c5120e
2 B A NA C D #c5120e
3 C <NA> NA <NA> <NA> <NA>
4 D <NA> NA <NA> <NA> <NA>

Resources