mutate and if_any with a condition over multiple columns in R

I tried to combine mutate, case_when and if_any to create a variable equal to 1 if any of the columns whose names begin with "string" equals a specific string.
I can't figure out what I'm missing in the way I combine these functions.
Here is my attempt:
library(dplyr)
df <- data.frame(string1 = c("a", "b", "c"), string2 = c("d", "a", "f"), string3 = c("a", "d", "c"), id = c(1, 2, 3))
df <- df %>%
  mutate(cod = case_when(if_any(starts_with("string") == "a" ~ 1)))

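The syntax was slightly off, but you were close. if_any() works like across(): the first argument selects the columns and the second is the condition, i.e. if_any(columns, condition). The condition must be a function, written as function(x), the \(x) shorthand (R 4.1+), or a purrr-style ~ formula such as ~ .x == "a".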
df %>%
mutate(cod = case_when(if_any(starts_with("string"), ~ .x == "a") ~ 1))
string1 string2 string3 id cod
1 a d a 1 1
2 b a d 2 1
3 c f c 3 NA
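Row 3 gets NA because case_when() returns NA for rows that match no condition. Adding TRUE ~ 0 as a last condition (or .default = 0 in dplyr 1.1.0 and later) gives 0 instead. For comparison, here is a minimal base R sketch of the same check without dplyr; the helper name str_cols is just for illustration.

```r
# Same example data as in the question
df <- data.frame(string1 = c("a", "b", "c"),
                 string2 = c("d", "a", "f"),
                 string3 = c("a", "d", "c"),
                 id = c(1, 2, 3),
                 stringsAsFactors = FALSE)

# Logical index of the columns whose names start with "string"
str_cols <- startsWith(names(df), "string")

# Comparing the selected columns with "a" gives a logical matrix;
# a row qualifies if at least one of its "string" columns is TRUE
df$cod <- as.integer(rowSums(df[str_cols] == "a") > 0)
df
```

Unlike the case_when() version, rows with no match get 0 rather than NA. If the columns can contain NA, rowSums(..., na.rm = TRUE) avoids propagating it.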


In R, subset a dataframe on rows whose ID appears more than once [duplicate]

This question already has answers here:
Finding ALL duplicate rows, including "elements with smaller subscripts"
(9 answers)
Closed last month.
Background
I have a dataframe d with ~10,000 rows and n columns, one of which is an ID variable. Most IDs appear once, but some appear more than once. Say that it looks like this:
Problem
I'd like a new dataframe d_sub that contains only the rows whose ID appears more than once in d. I'd like to have something that looks like this:
What I've tried
I've tried something like this:
d_sub <- subset(d, duplicated(d$ID))
But that only gets me one entry for IDs b and d, and I want all of their respective rows:
Any thoughts?
duplicated() on its own returns FALSE for the first occurrence of each ID, so that row is dropped. Combine it with a second call using fromLast = TRUE, joined by |, so every occurrence is flagged:
d_sub <- subset(d, duplicated(ID)|duplicated(ID, fromLast = TRUE))
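To see why both calls are needed, here is a small sketch on hypothetical data shaped like the example used in the next answer (IDs "b" and "d" each appear twice).

```r
# Hypothetical data: IDs "b" and "d" appear twice
d <- data.frame(ID = c("a", "b", "b", "c", "d", "d"),
                gender = c("f", "f", "m", "f", "f", "m"),
                zip = c(1, NA, 2, 3, NA, 4),
                stringsAsFactors = FALSE)

# Scanning forward: flags the 2nd, 3rd, ... occurrence of each ID
fwd <- duplicated(d$ID)
# Scanning backward: flags every occurrence except the last one
bwd <- duplicated(d$ID, fromLast = TRUE)

# OR-ing the two flags marks every row whose ID occurs more than once
d_sub <- subset(d, fwd | bwd)
d_sub
```

fwd alone misses rows 2 and 5 (the first "b" and the first "d"), and bwd alone misses rows 3 and 6; together they cover all four.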
We could use add_count, then filter on n:
library(dplyr)
df %>%
add_count(ID) %>%
filter(n!=1) %>%
select(-n)
Example:
library(dplyr)
df <- tribble(
~ID, ~gender, ~zip,
"a", "f", 1,
"b", "f", NA,
"b", "m", 2,
"c", "f", 3,
"d", "f", NA,
"d", "m", 4)
df %>%
add_count(ID) %>%
filter(n!=1) %>%
select(-n)
Output:
ID gender zip
<chr> <chr> <dbl>
1 b f NA
2 b m 2
3 d f NA
4 d m 4
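A base R sketch of the same count-then-filter idea, assuming the same example data: ave() with FUN = length returns, for each row, the number of rows sharing its ID, which plays the role of the n column created by add_count().

```r
# Hypothetical data mirroring the tribble above
df <- data.frame(ID = c("a", "b", "b", "c", "d", "d"),
                 gender = c("f", "f", "m", "f", "f", "m"),
                 zip = c(1, NA, 2, 3, NA, 4),
                 stringsAsFactors = FALSE)

# Per-row group size: how many rows share this row's ID
n <- ave(seq_along(df$ID), df$ID, FUN = length)

# Keep only rows whose ID occurs more than once
df_sub <- df[n != 1, ]
df_sub
```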

recode several values stored in a vector within a column

I want to recode several values of a column, where the values to recode are stored in a vector. Here is an example of my code:
library(tidyverse)
df <- data.frame(
"num" = 1:5,
"letter" = c("a", "b", "c", "d" , "e"))
vector<-c("a","c")
df<-df %>% mutate(letter=recode(letter,vector="no"))
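recode() doesn't expand a vector argument: for character input, replacements are matched by argument name, so vector = "no" only recodes the literal value "vector". Instead of recode you can use a simple if_else() with %in%: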
df<-df %>% mutate(letter= if_else(letter %in% vector, 'no', letter))
In base R, you can replace the values of letter that appear in vector using %in%:
df$letter[df$letter %in% vector] <- 'no'
df
# num letter
#1 1 no
#2 2 b
#3 3 no
#4 4 d
#5 5 e
Or with data.table:
library(data.table)
setDT(df)[letter %in% vector, letter := 'no']
df
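If each old value needs its own replacement rather than a single "no", the %in% idea generalises to a named lookup vector. A minimal base R sketch; lookup, idx and hit are hypothetical names chosen for illustration.

```r
df <- data.frame(num = 1:5,
                 letter = c("a", "b", "c", "d", "e"),
                 stringsAsFactors = FALSE)

# Hypothetical lookup: names are the old values, elements the new ones
# (each name could map to a different replacement)
lookup <- c(a = "no", c = "no")

# Position of each letter in the lookup; NA where there is no match
idx <- match(df$letter, names(lookup))
hit <- !is.na(idx)

# Replace only the matched values and leave the rest unchanged
df$letter[hit] <- lookup[idx[hit]]
df
```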

How do I turn a list-column into an ordinary column, keeping only the last value when an element is a vector?

I am trying to use purrr::map_chr to replace each element of a list-column with the last element of its vector, when one exists.
A reproducible example:
library(data.table)
library(purrr)
x <- data.table(one = c("a", "b", "c"), two = list("d", c("e","f","g"), NULL))
I want to keep the data as it is but turn my list-column into an ordinary column, with "g" as the value of x[2, 2]. What I've tried:
x %>% mutate(two = ifelse(is.null(.$two), map_chr(~NA_character_), map_chr(~last(.))))
The result should be:
# one two
# a d
# b g
# c NA
Thanks in advance!
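Here is one option. Use if/else inside the mapped function instead of ifelse(): inside map_chr() the test runs on one element at a time, whereas is.null(.$two) checks the whole list-column at once and is simply FALSE.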
library(dplyr)
library(purrr)
x %>%
mutate(two = map_chr(two, ~ if(is.null(.x)) NA_character_ else last(.x)))
# one two
#1 a d
#2 b g
#3 c NA
Or replace the NULL elements with NA and extract the last
x %>%
mutate(two = map_chr(two, ~ last(replace(.x, is.null(.x), NA_character_))))
I would propose this solution, which is a bit cleaner:
library(tidyverse)
df <- tibble(one = c("a", "b", "c"), two = list("d", c("e","f","g"), NULL))
df %>%
mutate_at("two", replace_na, NA_character_) %>%
mutate_at("two", map_chr, last)
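For comparison, a base R sketch without purrr, applied to a plain list standing in for the list-column two. vapply() with character(1) enforces one string per element, much like map_chr(). Testing length(v) == 0 (an assumption beyond the question) covers both NULL and empty vectors.

```r
# Plain list standing in for the list-column x$two
two <- list("d", c("e", "f", "g"), NULL)

# NULL or empty elements become NA; otherwise keep only the last value
last_or_na <- vapply(two, function(v) {
  if (length(v) == 0) NA_character_ else as.character(v[length(v)])
}, character(1))
last_or_na
```

With the data.table from the question, the same function could be used as x[, two := vapply(two, ...)].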

Add a row to a dataframe that repeats a row and replaces 2 entries

I want to add rows to a dataframe (or tibble) as part of a data entry project. I need to:
Find one row that holds a specific value in one column (obsid)
Duplicate that row, but replace the value in column "word".
Append the new row to the dataframe
I want to write a function that makes this easy. However, when I call the function, it doesn't add the new rows: I can print the new row, but the original dataframe is not altered.
If I run the same code outside a function, it works.
Why won't the function add the row?
df <- tibble(obsid = c("a","b" , "c", "d"), b=c("a", "a", "b", "b"), word= c("what", "is", "the", "answer"))
df$main <- 1
addrow <- function(id, newword) {
rowtoadd <- df %>%
filter(obsid== id & main==1) %>%
mutate(word=replace(word, main==1, newword)) %>%
mutate(main=replace(main, word==newword, 0))
df <- bind_rows(df, rowtoadd)
print(rowtoadd)
print(filter(df, df$obsid== id))}
addrow("a", "xxx")
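R functions don't modify objects in the caller's environment. Inside addrow(), df <- bind_rows(df, rowtoadd) creates a new local df that is discarded when the function exits, so the global df never changes. Return the modified copy from the function (with return(), or as the last evaluated expression) and assign the result back, e.g. df <- addrow("a", "xxx").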
Change your function to:
df <- tibble(obsid = c("a","b" , "c", "d"), b=c("a", "a", "b", "b"), word= c("what", "is", "the", "answer"))
df$main <- 1
addrow <- function(id, newword) {
rowtoadd <- df %>%
filter(obsid== id & main==1) %>%
mutate(word=replace(word, main==1, newword)) %>%
mutate(main=replace(main, word==newword, 0))
df <- bind_rows(df, rowtoadd)
return(df)
}
> addrow("a", "xxx")
# A tibble: 5 x 4
obsid b word main
<chr> <chr> <chr> <dbl>
1 a a what 1
2 b a is 1
3 c b the 1
4 d b answer 1
5 a a xxx 0
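The function above still reads the global df. A minimal base R sketch of a variant that takes the data frame as an argument (data is a hypothetical extra parameter, not part of the original signature) and relies on the caller to assign the result:

```r
# Hypothetical variant of addrow(): data is passed in explicitly and the
# updated copy is returned instead of being modified in place
addrow <- function(data, id, newword) {
  # Select the "main" row(s) for this id
  rowtoadd <- data[data$obsid == id & data$main == 1, ]
  # Overwrite the two entries that should differ in the copy
  # (rep() keeps this safe when no row matches)
  rowtoadd$word <- rep(newword, nrow(rowtoadd))
  rowtoadd$main <- rep(0, nrow(rowtoadd))
  # Append the copy and hand the result back to the caller
  rbind(data, rowtoadd)
}

df <- data.frame(obsid = c("a", "b", "c", "d"),
                 b = c("a", "a", "b", "b"),
                 word = c("what", "is", "the", "answer"),
                 stringsAsFactors = FALSE)
df$main <- 1

# The change only persists because the result is assigned back to df
df <- addrow(df, "a", "xxx")
df
```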

How to change the df column name within a list

I have a list of dfs. The dfs all have the same column names. I would like to:
(1) Change one of the column names to the name of the df within the list
(2) full_join all the dfs after name change
Example of my list:
my_list <- list(one = data.frame(Type = c(1,2,3), Class = c("a", "a", "b")),
two = data.frame(Type = c(1,2,3), Class = c("a", "a", "b")))
Output that I want:
data.frame(Type = c(1,2,3),
one = c("a", "a", "b"),
two = c("a", "a", "b"))
Type one two
1 a a
2 a a
3 b b
You could use dplyr::bind_rows combined with tidyr::spread to get the same result, if you are open to an alternative approach (in current tidyr, spread() is superseded by pivot_wider()). For example:
library(tidyverse)
my_list %>% bind_rows(.id = "groups") %>% spread(groups, Class)
#> Type one two
#> 1 1 a a
#> 2 2 a a
#> 3 3 b b
The first step can be tricky, but it's simple if you iterate over names(my_list).
transformed <- sapply(names(my_list), function(name) {
df <- my_list[[name]]
colnames(df)[colnames(df) == 'Class'] <- name
df
}, simplify = FALSE, USE.NAMES = TRUE)
Then purrr::reduce with dplyr::full_join joins the renamed data frames on their shared Type column:
purrr::reduce(transformed, dplyr::full_join)
# Type one two
# 1 1 a a
# 2 2 a a
# 3 3 b b
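A base R sketch of the same two steps, swapping in Map() for the renaming and Reduce() with merge(all = TRUE) for purrr::reduce and dplyr::full_join. Note that merge() sorts the result by the key column.

```r
my_list <- list(one = data.frame(Type = c(1, 2, 3), Class = c("a", "a", "b"),
                                 stringsAsFactors = FALSE),
                two = data.frame(Type = c(1, 2, 3), Class = c("a", "a", "b"),
                                 stringsAsFactors = FALSE))

# Step 1: rename "Class" in each data frame to that element's list name
renamed <- Map(function(df, name) {
  names(df)[names(df) == "Class"] <- name
  df
}, my_list, names(my_list))

# Step 2: full outer join on Type, folding over the list pairwise
result <- Reduce(function(a, b) merge(a, b, by = "Type", all = TRUE), renamed)
result
```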
