Create a new column using mutate_at() in R

I'm trying to make some modifications to the following data frame:
df <- data.frame(
zgen = c("100003446", "100001749","100002644","100001755"),
Name_mat = c("EVEROLIMUS 10 MG CM", "GALSULFASA 5MG/5ML FAM", "IDURSULFASE 2MG/ML SOL. P/INFUSION FAM","IMIGLUCERASA 400U POL. LIOF. FAM"),
details= c("CM", "FAM", "SOL. P/INFUSION FAM","NA")
)
I'm using mutate_at() from the dplyr package to create a new column called "type". Its value depends on which of a list of strings appears in two columns of my data frame ("Name_mat" and "details"). The code is:
df <- df %>% mutate_at(vars(one_of("Name_mat ","details")),
funs(case_when( "FAM|FRA" == TRUE ~ "FA",
"CM|COMPRIMIDO" == TRUE~ "COM",
"SOL"== TRUE~"SOL",
"CP|CAPSULA"== TRUE~"CAP",
TRUE ~ "bad_mat")))
This is my first time using mutate_at() and I don't know how to create a new column called "type" in my data frame "df". In the end I need something like:
ZGEN Name_mat details Type
1 100003446 EVEROLIMUS 10 MG CM CM COM
2 100001749 GALSULFASA 5MG/5ML FAM FAM FA
3 100002644 IDURSULFASE 2MG/ML SOL. P/INFUSION FAM SOL. P/INFUSION FAM FA
4 100001755 IMIGLUCERASA 400U POL. LIOF. FAM NA FA
I appreciate any help or any other point of view about how to do this.
Thanks!

Try doing it this way:
library(tidyverse)
library(stringr)
# Conditions are checked in order; the first one that matches decides the type
df %>% mutate(TYPE = case_when(
str_detect(Name_mat, pattern = "FAM") | str_detect(details, "FRA") ~ "FA",
str_detect(Name_mat, pattern = "CM") | str_detect(details, "COMPRIMIDO") ~ "COM",
str_detect(Name_mat, pattern = "SOL") ~ "SOL",
str_detect(Name_mat, pattern = "CP") | str_detect(details, "CAPSULA") ~ "CAP",
TRUE ~ "bad_mat"))
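For what it's worth, a condition like "FAM|FRA" == TRUE in the question compares a literal string with TRUE, which can never be true, so every row falls through to the default. The pattern has to be tested against a column instead. Below is a minimal sketch of a variant that tests each pattern against both columns at once. The helper column `both` and the object `result` are names made up for this illustration, and the priority order is assumed to be the order used in the question.

```r
library(dplyr)

df <- data.frame(
  zgen = c("100003446", "100001749", "100002644", "100001755"),
  Name_mat = c("EVEROLIMUS 10 MG CM", "GALSULFASA 5MG/5ML FAM",
               "IDURSULFASE 2MG/ML SOL. P/INFUSION FAM",
               "IMIGLUCERASA 400U POL. LIOF. FAM"),
  details = c("CM", "FAM", "SOL. P/INFUSION FAM", "NA"),
  stringsAsFactors = FALSE
)

# A string compared with TRUE is never TRUE
"FAM|FRA" == TRUE
#> [1] FALSE

result <- df %>%
  mutate(
    # hypothetical helper column: both text columns glued together
    both = paste(Name_mat, details),
    Type = case_when(
      grepl("FAM|FRA", both)       ~ "FA",
      grepl("CM|COMPRIMIDO", both) ~ "COM",
      grepl("SOL", both)           ~ "SOL",
      grepl("CP|CAPSULA", both)    ~ "CAP",
      TRUE                         ~ "bad_mat"
    )
  ) %>%
  select(-both)  # drop the helper column again
```

Because case_when() stops at the first matching condition, row 3 (which contains both SOL and FAM) ends up as "FA", as in the desired output.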

We can also extract the matching keywords and look them up in a named vector:
library(dplyr)
library(purrr)
library(stringr)
pat <- "\\b(FAM|FRA|CM|COMPRIMIDO|SOL|CP|CAPSULA)\\b"
nm1 <- setNames(c("FA", "FA", "COM", "COM", "SOL", "CAP", "CAP"),
c("FAM", "FRA", "CM", "COMPRIMIDO", "SOL", "CP", "CAPSULA"))
df %>%
select(Name_mat, details) %>%
map(str_extract_all, pattern = pat) %>%
transpose %>%
map_chr( ~ nm1[flatten_chr(.x)][1] ) %>%
bind_cols(df, Type = .)
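One thing to be aware of with the lookup approach: it takes the first keyword found in the text, so a row containing both SOL and FAM is classified by whichever appears first. The base R sketch below illustrates this on row 3 of the example data and shows one possible way to pick by priority instead. The `priority` vector and the names `hits`, `first_hit` and `by_priority` are assumptions made for this illustration.

```r
pat <- "\\b(FAM|FRA|CM|COMPRIMIDO|SOL|CP|CAPSULA)\\b"
nm1 <- setNames(c("FA", "FA", "COM", "COM", "SOL", "CAP", "CAP"),
                c("FAM", "FRA", "CM", "COMPRIMIDO", "SOL", "CP", "CAPSULA"))

x <- "IDURSULFASE 2MG/ML SOL. P/INFUSION FAM"

# All keywords found in the text, in order of appearance
hits <- regmatches(x, gregexpr(pat, x, perl = TRUE))[[1]]

# First keyword found wins: here that is SOL
first_hit <- unname(nm1[hits][1])

# Assumed priority order (taken from the order of conditions in the question)
priority <- c("FA", "COM", "SOL", "CAP")

# Highest-priority type among the keywords found: here that is FA
by_priority <- priority[priority %in% nm1[hits]][1]
```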

Related

How to use grepl function to subset rows based on multiple conditions across columns in R

I am trying to subset a dataset using the grepl function.
I want to retain only the rows where every column contains '#'.
I tried this code but it doesn't work.
all_nullx <- riv %>% with(riv, riv[ grepl( '#', col1) & grepl( '#', col2) & grepl('#', col3) & grepl('#', left_index) & grepl('#', right_middle) & grepl('#', col4), ])
Thanks
Tidyverse
We can apply a function across all columns using everything() inside across(), then keep only the rows that have # in every column. (In recent dplyr releases, using across() inside filter() is deprecated in favor of if_all().)
library(tidyverse)
riv %>%
filter(across(everything(), ~ grepl("#", .)))
Or with stringr:
riv %>%
filter(across(everything(), ~ str_detect(., "#")))
base R
Or we can use grepl with Reduce from base R:
riv[Reduce(`&`, lapply(riv, grepl, pattern = "#")),]
Or one more base R possibility:
riv[apply(riv , 1 , function(x) all(grepl("#", x))), ]
Output
a b c
1 A#r C#r F#r
2 B#r D#r G#r
Data
riv <- structure(list(
a = c("A#r", "B#r", "Rr"),
b = c("C#r", "D#r", "E#r"),
c = c("F#r", "G#r", "Hr")
),
class = "data.frame",
row.names = c(NA,-3L))
One nice option within the tidyverse, specifically dplyr, is the if_all() function (I will use @andrews' data):
riv <- structure(list(
a = c("A#r", "B#r", "Rr"),
b = c("C#r", "D#r", "E#r"),
c = c("F#r", "G#r", "Hr")
),
class = "data.frame",
row.names = c(NA,-3L))
library(dplyr)
riv %>%
dplyr::filter(if_all(everything(), ~grepl("#", .x)))
a b c
1 A#r C#r F#r
2 B#r D#r G#r
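As a small aside, if_all() has a counterpart, if_any(), for the case where a match in at least one column is enough. The sketch below runs both on the same data; the object names `all_rows` and `any_rows` are made up for this illustration.

```r
library(dplyr)

riv <- data.frame(
  a = c("A#r", "B#r", "Rr"),
  b = c("C#r", "D#r", "E#r"),
  c = c("F#r", "G#r", "Hr"),
  stringsAsFactors = FALSE
)

# Rows where every column contains "#"
all_rows <- riv %>%
  filter(if_all(everything(), ~ grepl("#", .x)))

# Rows where at least one column contains "#"
any_rows <- riv %>%
  filter(if_any(everything(), ~ grepl("#", .x)))
```

Here `all_rows` keeps the first two rows, while `any_rows` keeps all three, because the third row still has "E#r" in column b.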

Match text from one column with another column (vlookup + like)

I'm trying to match two columns, without success. I have a data frame DF1 with two columns, Id and json. In a second data frame, DF2, I have one column of patterns to be matched against each row of DF1$json (something like vlookup + a LIKE function).
As output, I'd like to get DF1$Id, but only for the rows where any DF2 pattern matches DF1$json.
I've tried some combinations with str_detect but it doesn't work on non-vector values. Maybe some tricks with grep or stringr functions?
For example:
str_detect(DF1$json, fixed(DF2[1,1], ignore_case = TRUE))
df1 <- data.frame(
Id = c("AA", "BB", "CC", "DD"),
json = c("{xxx:yyy:zzz};{mmm:zzz:vvv}", "{ccc:yyy:zzz};{ddd:zzz:vvv}", "{ttt:yyy:zzz};{mmm:zzz:vvv}", "{uuu:yyy:zzz};{mmm:zzz:vvv}")
)
matches <- c("mmm:zzz:vvv", "mmm:yyy:zzz")
library(stringr) # needed for str_extract_all()
Solution using data.table
library(data.table)
setDT(df1)
df1[, match := any(str_extract_all(json, "(?<=\\{).+?(?=\\})")[[1]] %in% matches), by = Id]
df1[match == TRUE, .(Id)]
Solution using dplyr
library(dplyr)
df1 %>%
group_by(Id) %>%
mutate(match = any(str_extract_all(json, "(?<=\\{).+?(?=\\})")[[1]] %in% matches)) %>%
filter(match) %>%
select(Id)
Or just directly filter()
df1 %>%
group_by(Id) %>%
filter(any(str_extract_all(json, "(?<=\\{).+?(?=\\})")[[1]] %in% matches)) %>%
select(Id)
Output of both methods
Id
1: AA
2: CC
3: DD
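Note that the [[1]] in the snippets above only looks at the first element of the extracted list, which is why they group by Id so that each group holds a single row. A minimal sketch of a variant without grouping, in base R only, is below. The names `tokens`, `keep` and `matched_ids` are made up for this illustration, and it assumes the same exact-token matching as the answers above.

```r
df1 <- data.frame(
  Id = c("AA", "BB", "CC", "DD"),
  json = c("{xxx:yyy:zzz};{mmm:zzz:vvv}", "{ccc:yyy:zzz};{ddd:zzz:vvv}",
           "{ttt:yyy:zzz};{mmm:zzz:vvv}", "{uuu:yyy:zzz};{mmm:zzz:vvv}"),
  stringsAsFactors = FALSE
)
matches <- c("mmm:zzz:vvv", "mmm:yyy:zzz")

# One character vector per row, holding the text between each pair of braces
tokens <- regmatches(df1$json,
                     gregexpr("(?<=\\{).+?(?=\\})", df1$json, perl = TRUE))

# TRUE for rows where at least one token is in the lookup vector
keep <- vapply(tokens, function(tok) any(tok %in% matches), logical(1))

matched_ids <- df1$Id[keep]
```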
Does this give you the expected result?
my_df <- data.frame("id" = c("AA", "BB", "CC", "DD"),
"json" = c("{x:y:z};{m:z:v}", "{c:y:z};{d:z:v}", "{t:y:z};{m:z:v}", "{u:y:z};{m:z:v}"),
"pattern" = c("m:z:v", "t:y:z", "m:z:v", "t"),
stringsAsFactors = FALSE)
my_f <- function(x) {
my_var <- paste(grep(pattern = my_df[x, "pattern"], x = my_df$json), collapse = " ")
return (my_var)
}
my_df$Value <- sapply(1:nrow(my_df), my_f)

How to use dplyr to recode all values that are not in a list of values?

I have a dataframe with a list of names I want to keep, the rest I want to recode to "Other".
namesToKeep = data.frame(
car = c("Merc 240D", "Cadillac Fleetwood"),
reason = c("Great car","Me like")
)
selectedCars <- namesToKeep$car
names(selectedCars) <- selectedCars
mtcars %>% mutate(CarName = rownames(mtcars)) %>%
mutate(
CarName = recode(CarName, !!!selectedCars, .default = "Other")
)
The above code works and demonstrates what I want to do, but I would like to clean it up a bit and not have to build a named vector of selected names beforehand.
Using base R I can do this by directly modifying the data frame, but I wonder what the idiomatic dplyr way of doing this is?
There is a deframe function in tibble that converts a two-column data.frame directly to a named vector (names from the first column, values from the second):
library(dplyr)
library(tibble)
mtcars %>%
mutate(CarName = rownames(mtcars),
CarName = recode(CarName, !!!deframe(namesToKeep), .default = "Other"))
If the names and the values should both come from the same column (so that the kept names stay unchanged), use setNames to create the names on the fly:
mtcars %>%
mutate(CarName = rownames(.),
CarName = recode(CarName,
!!! setNames(namesToKeep$car, namesToKeep$car), .default = "Other"))
Or a hacky option is to duplicate the column with indexing and then deframe
mtcars %>%
mutate(CarName = rownames(.),
CarName = recode(CarName,
!!! deframe(namesToKeep[c(1, 1)]), .default = "Other"))
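If the goal is only "keep these names, turn everything else into Other", a membership test may be simpler than recode(). This is a minimal sketch under that assumption, using if_else() and %in% instead of the recode() approach above; the object name `recoded` is made up for this illustration.

```r
library(dplyr)

namesToKeep <- data.frame(
  car = c("Merc 240D", "Cadillac Fleetwood"),
  reason = c("Great car", "Me like"),
  stringsAsFactors = FALSE
)

recoded <- mtcars %>%
  mutate(
    CarName = rownames(mtcars),
    # keep the name if it is in the list, otherwise replace it with "Other"
    CarName = if_else(CarName %in% namesToKeep$car, CarName, "Other")
  )

table(recoded$CarName)
```

No named vector is needed here, since the values to keep are never renamed.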

Shorthand for using str_detect and & in filter

I'm trying to find a shorthand for using str_detect and & to filter a data frame:
library(tidyverse)
df <- data.frame(type = c("age", "age and sex", "sex"))
# type
# 1 age
# 2 age and sex
# 3 sex
I want to shorten this pipe
df %>%
filter(str_detect(type, "age") & str_detect(type, "sex"))
# type
# 1 age and sex
So I'd like to pipe the filter to map over pattern <- c("age", "sex") and maybe use reduce somehow?
Thanks
We can use a regex that matches 'age', followed by zero or more characters (.*), followed by 'sex'. The \\b specifies a word boundary so that it won't match 'adage' etc. Note that this only matches when 'age' comes before 'sex' in the string.
library(dplyr)
library(stringr)
df %>%
filter(str_detect(type, '\\bage\\b.*\\bsex'))
Or use map/reduce
library(purrr)
df %>%
filter(map(c('age', 'sex'), ~ str_detect(type, .x)) %>% reduce(`&`))
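The map/reduce idea can also be wrapped in a small helper so the filter call stays short. Below is a minimal base R sketch; `detect_all` is a hypothetical helper name, not a stringr function, and the extra row "sex and age" is added here only to show that the order of the words does not matter.

```r
# Hypothetical helper: TRUE where x matches every pattern in `patterns`
detect_all <- function(x, patterns) {
  Reduce(`&`, lapply(patterns, function(p) grepl(p, x)))
}

df <- data.frame(
  type = c("age", "age and sex", "sex", "sex and age"),
  stringsAsFactors = FALSE
)

kept <- df[detect_all(df$type, c("age", "sex")), , drop = FALSE]
kept$type
```

Inside a dplyr pipeline the same helper could be used as filter(detect_all(type, c("age", "sex"))).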

How to subset the next column in R

df <- data.frame(intro = c("bob","bob","bob"),
intro_score = c("Excellent","Excellent","Good"),
method = c("sally","sally","sally"),
method_score = c("Excellent","Excellent","Excellent"),
result = c("Norman","Norman","Norman"),
result_score = c("Good","Good","Good"))
If I want to look for "bob" in this dataframe, how do I return the column next to "bob" (intro_score only), assuming I'm not sure if "bob" is in here. Say, if I were to look for "ken", the result should be null. If I were to look for "Norman", the result should return result_score.
I have tried something like this:
name <- "bob"
df_name <- df %>%
if (str_detect(intro, name)) {
select((which(colnames==str_detect(intro, name)))+1)
} else {}
Thank you for your help!
Using base R, if you need the column names you could do:
names(df[unique(which(df=="bob",TRUE)[,2]+1)])
[1] "intro_score"
Or, if you need the column values:
df[unique(which(df=="bob",TRUE)[,2]+1)]
intro_score
1 Excellent
2 Excellent
3 Good
You could reshape your data into long format, with columns for time (intro, method, result), name, and score:
df2 <- reshape(df, direction = "long", varying = list(c(1,3,5), c(2,4,6)), v.names = c("name", "score"), times = c("intro", "method", "result"))
df2[df2$name == "Norman", "score"]
Another option with purrr:
library(purrr)
search_person <- "bob"
colnames(df)[which(map_lgl(df, ~ all(.x == search_person))) + 1]
[1] "intro_score"
Here is one option with select_if
library(dplyr)
library(magrittr)
df %>%
select_if(~ any(. == "bob")) %>%
names %>%
match(., names(df)) %>%
add(1) %>%
names(df)[.]
#[1] "intro_score"
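To also cover the "name is not there" case from the question, the lookup can be wrapped in a function that returns NULL when nothing matches. This is a minimal base R sketch; `next_col` is a hypothetical helper name, and it assumes a match anywhere in a column counts.

```r
df <- data.frame(
  intro = c("bob", "bob", "bob"),
  intro_score = c("Excellent", "Excellent", "Good"),
  method = c("sally", "sally", "sally"),
  method_score = c("Excellent", "Excellent", "Excellent"),
  result = c("Norman", "Norman", "Norman"),
  result_score = c("Good", "Good", "Good"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: name(s) of the column right after any column containing `value`
next_col <- function(data, value) {
  # positions of the columns that contain the value at least once
  hit <- which(vapply(data, function(col) any(col == value, na.rm = TRUE), logical(1)))
  idx <- unique(hit + 1)
  idx <- idx[idx <= ncol(data)]  # ignore a match in the last column
  if (length(idx) == 0) return(NULL)
  names(data)[idx]
}

next_col(df, "bob")
next_col(df, "ken")
next_col(df, "Norman")
```

The three calls give "intro_score", NULL and "result_score" respectively; df[next_col(df, "bob")] would then return the column values.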
