Fill cells dataframe based on multiple conditions - r

How to fill cells based on multiple conditions?
There are a lot of players (columns) in this game, but I only included 2 for the sake of this example. I want to loop over a lot of players.
Every row represents a game round.
Conditions:
IF player00[i] score = 0 &
IF lossallowed00[i] = "no"
THEN Fill flag00[i] with "FLAG"
df <-data.frame(
player001 = c(1,0,3),
player002 = c(1,0,5),
lossallowed001 = c("no", "yes", "no"),
lossallowed002 = c("no", "no", "yes"),
flag001 = NA,
flag002 = NA
)
#desired output:
#player001 player002 lossallowed001 lossallowed002 flag001 flag002
# 1 1 no no NA NA
# 0 0 yes no NA FLAG
# 3 5 no yes NA NA

If you reshape to long format, splitting each column name into a variable part (the letters) and an ID part (the digits), you can do the operation all at once in a couple of lines and then reshape back to wide. Using regex means you're not bound by either the number of players or the names of the columns. I added an ID column for the games to differentiate rows; you could drop it afterward.
The reshaping itself is covered pretty extensively already (see, for example, "Reshaping multiple sets of measurement columns (wide format) into single columns (long format)"), but it is useful for problems that need to scale like this.
library(dplyr)

df %>%
  tibble::rowid_to_column(var = "game") %>%
  tidyr::pivot_longer(-game, names_to = c(".value", "num"),
                      names_pattern = "(^[a-z]+)(\\d+$)") %>%
  mutate(flag = ifelse(player == 0 & lossallowed == "no", "FLAG", NA_character_)) %>%
  tidyr::pivot_wider(id_cols = game, names_from = num, values_from = player:flag,
                     names_glue = "{.value}{num}")
#> # A tibble: 3 × 7
#> game player001 player002 lossallowed001 lossallowed002 flag001 flag002
#> <int> <dbl> <dbl> <chr> <chr> <chr> <chr>
#> 1 1 1 1 no no <NA> <NA>
#> 2 2 0 0 yes no <NA> FLAG
#> 3 3 3 5 no yes <NA> <NA>
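For comparison, the same rule can be applied without reshaping by looping over the numeric suffixes in base R. This is only a minimal sketch: it assumes every player column is named player plus digits and has matching lossallowed and flag columns with the same suffix, and the helper names ids and hit are made up for illustration.

```r
# Sample data from the question
df <- data.frame(
  player001 = c(1, 0, 3),
  player002 = c(1, 0, 5),
  lossallowed001 = c("no", "yes", "no"),
  lossallowed002 = c("no", "no", "yes"),
  flag001 = NA,
  flag002 = NA
)

# Collect the numeric suffixes ("001", "002", ...) from the player columns
ids <- sub("^player", "", grep("^player\\d+$", names(df), value = TRUE))

for (id in ids) {
  # TRUE where the score is 0 and a loss is not allowed
  hit <- df[[paste0("player", id)]] == 0 & df[[paste0("lossallowed", id)]] == "no"
  # Overwrite the matching flag column
  df[[paste0("flag", id)]] <- ifelse(hit, "FLAG", NA_character_)
}

df
```

Because the suffixes are read from the column names, the loop should not need to change when more players are added, as long as the naming pattern holds.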

A possible solution:
library(tidyverse)
df <- data.frame(player001 = c(1,0,3), player002 = c(1,0,5), lossallowed001 = c("no", "yes", "no"), lossallowed002 = c("no", "no", "yes"), flag001 = NA, flag002 = NA)
df %>%
  rownames_to_column("id") %>%
  mutate(across(where(is.numeric), as.character)) %>%
  pivot_longer(cols = -id) %>%
  # Group by player suffix and row; within each group the order is score, lossallowed, flag
  group_by(str_extract(name, "\\d{3}$"), id) %>%
  mutate(value = if_else(row_number() == 3 & first(value) == "0" &
                           nth(value, 2) == "no", "FLAG", value)) %>%
  ungroup %>% select(name, value) %>%
  pivot_wider(names_from = name, values_from = value, values_fn = list) %>%
  unnest(cols = everything()) %>% type.convert(as.is = TRUE)
#> # A tibble: 3 × 6
#> player001 player002 lossallowed001 lossallowed002 flag001 flag002
#> <int> <int> <chr> <chr> <lgl> <chr>
#> 1 1 1 no no NA <NA>
#> 2 0 0 yes no NA FLAG
#> 3 3 5 no yes NA <NA>

You can do it like this. First split the data into one data frame per player suffix, and then add the new column to each piece. Use bind_cols if you want the pieces merged back together.
library(purrr)
library(dplyr)
map(set_names(paste0("00", 1:2)), ~ select(df, ends_with(.x))) %>%
map(., ~ mutate(., newcol = ifelse(.[[1]] == 0 & .[[2]] == "no", "FLAG", NA)))
$`001`
  player001 lossallowed001 flag001 newcol
1         1             no      NA     NA
2         0            yes      NA     NA
3         3             no      NA     NA

$`002`
  player002 lossallowed002 flag002 newcol
1         1             no      NA   <NA>
2         0             no      NA   FLAG
3         5            yes      NA   <NA>
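The answer mentions bind_cols for merging the pieces back but does not show it. One way it might look is sketched below. It assumes the question's data, relies on the score column coming before the lossallowed column for each player, and uses the made-up names ids, flags and result.

```r
library(purrr)
library(dplyr)

# Sample data from the question
df <- data.frame(
  player001 = c(1, 0, 3),
  player002 = c(1, 0, 5),
  lossallowed001 = c("no", "yes", "no"),
  lossallowed002 = c("no", "no", "yes"),
  flag001 = NA,
  flag002 = NA
)

ids <- paste0("00", 1:2)

# Build one flag vector per player, named flag001, flag002, ...
flags <- map(set_names(ids, paste0("flag", ids)), function(id) {
  part <- select(df, ends_with(id))  # columns: score, lossallowed, flag
  ifelse(part[[1]] == 0 & part[[2]] == "no", "FLAG", NA_character_)
})

# Drop the empty flag columns and bind the computed ones back on
result <- df %>%
  select(-starts_with("flag")) %>%
  bind_cols(as_tibble(flags))

result
```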

Here's a solution in the tidyverse. While I arrived at this solution independently, it is likely a duplicate of @camille's solution, which was posted shortly before mine.
library(tidyverse)
# ...
# Code to generate 'df'.
# ...
df %>%
  # Index the matches.
  mutate(match_id = row_number()) %>%
  # Pivot to get a row for each player {001, 002, ...} and match.
  pivot_longer(
    # Target columns whose names end with a separate suffix of 3+ digits.
    matches("^(.*\\D)(\\d{3,})$"),
    names_pattern = "^(.*\\D)(\\d{3,})$",
    # Index the players by their suffixes; and give each the following three columns:
    # 'player' (score), 'lossallowed', and 'flag'.
    names_to = c(".value", "player_id")
  ) %>%
  # Flag the appropriate cases.
  mutate(
    flag = if_else(player == 0 & lossallowed == "no", "FLAG", NA_character_)
  ) %>%
  # Return to original, wide format.
  pivot_wider(
    names_from = player_id,
    values_from = !c(match_id, player_id),
    names_glue = "{.value}{player_id}"
  ) %>%
  arrange(match_id) %>% select(!match_id)

Related

Extract values from another dataframe if the value of one column partially matches in R

Sorry I didn't clarify my question.
My aim: if dt$id matches (part of) df$id, extract df$score and add it as a new column in dt.
I have a dataframe like this :
df <- tibble(
score = c(2587,002,885,901,2587,3371,3372,002),
id = c("AR01.0","AR01.1","AR01.12","ERS02.00","ERS02.01","ERS02.02","QR01","QR01.03"))
And I have another dataframe like
dt <- tibble(
id = c("AR01","QR01","KVC"),
city = c("AM", "Bis","CHB"))
I want to add a new column "score" with mutate.
I want to get output like the following:
id     city  score
AR01   AM    2587/2/885
ERS02  Bis   901/3371
KVC    CHB   NA
or
id     city  score  score2  score3
AR01   AM    2587   2       885
ERS02  Bis   901    3371    NA
KVC    CHB   NA     NA      NA
I tried to use ifelse to achieve this but always got an error.
Can anyone provide ideas? Thank you.
A simple left_join (after mutating the id values in df) is all that is required:
library(dplyr)
library(stringr)
left_join(df %>% mutate(id = str_extract(id, "[\\w]+")), dt, by = "id") %>%
group_by(id) %>%
summarise(across(city,first),
score = paste(score, collapse = "/"))
# A tibble: 3 × 3
id city score
<chr> <chr> <chr>
1 AR01 AM 2587/2/885
2 ERS02 NA 901/2587/3371
3 QR01 Bis 3372/2
For the second solution you can use separate:
library(dplyr)
library(stringr)
library(tidyr)
left_join(df %>% mutate(id = str_extract(id, "[\\w]+")), dt, by = "id") %>%
group_by(id) %>%
summarise(across(city,first),
score = paste(score, collapse = "/")) %>%
separate(score,
into = paste("score", 1:3),
sep = "/" )
# A tibble: 3 × 5
id city `score 1` `score 2` `score 3`
<chr> <chr> <chr> <chr> <chr>
1 AR01 AM 2587 2 885
2 ERS02 NA 901 2587 3371
3 QR01 Bis 3372 2 NA
You could create groups by using sub to extract everything before the . in id, and group_by on that. Then merge the scores within each group with paste, separated by /, and right_join the result to dt by id, like this:
library(tibble)
df <- tibble(
score = c(2587,002,885,901,2587,3371,3372,002),
id = c("AR01.0","AR01.1","AR01.12","ERS02.00","ERS02.01","ERS02.02","QR01","QR01.03"))
dt <- tibble(
id = c("AR01","QR01","KVC"),
city = c("AM", "Bis","CHB"))
library(dplyr)
df %>%
mutate(id = sub('\\..*', "", id)) %>%
group_by(id) %>%
mutate(score = paste(score, collapse = '/')) %>%
distinct(id, .keep_all = TRUE) %>%
ungroup() %>%
right_join(., dt, by = 'id')
#> # A tibble: 3 × 3
#> score id city
#> <chr> <chr> <chr>
#> 1 2587/2/885 AR01 AM
#> 2 3372/2 QR01 Bis
#> 3 <NA> KVC CHB
Created on 2022-10-01 with reprex v2.0.2
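The question's second layout (one column per score) can probably be reached without pasting and splitting strings, by numbering the scores within each id and pivoting. This is a sketch under the same assumptions as the answers above: the prefix before the first . is the join key, and the column names score1, score2, ... are my own choice rather than anything from the question.

```r
library(dplyr)
library(tidyr)

df <- tibble(
  score = c(2587, 2, 885, 901, 2587, 3371, 3372, 2),
  id = c("AR01.0", "AR01.1", "AR01.12", "ERS02.00", "ERS02.01", "ERS02.02", "QR01", "QR01.03")
)
dt <- tibble(
  id = c("AR01", "QR01", "KVC"),
  city = c("AM", "Bis", "CHB")
)

wide <- df %>%
  mutate(id = sub("\\..*", "", id)) %>%            # keep the part before the first "."
  group_by(id) %>%
  mutate(n = paste0("score", row_number())) %>%    # score1, score2, ... within each id
  ungroup() %>%
  pivot_wider(names_from = n, values_from = score) %>%
  right_join(dt, by = "id")                        # keep every id in dt, matched or not

wide
```

Using right_join keeps the KVC row from dt even though it has no scores, so its score columns come out as NA.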

Reshape () and modify_shape()

df <- data.frame(
code1 = c ("ZAZ","ZAZ","ZAZ","ZAZ","ZAZ","ZAZ","JOZ","JOZ","JOZ","JOZ","JOZ","JOZ","TSV","TSV"),
code2 = c("NAN","NAN","NAN","NAN","NAN","NAN","NAN","NAN","NAN","NAN","NAN","NAN","TSA","TSA"),
start = c("Date1.1","Date1.1","Date1.3","Date1.3","Date1.5","Date1.5","Date3.1","Date3.1","Date3.3","Date3.3","Date3.5","Date3.5","Date 5.1","Date 5.1"),
end = c("Date2.1","Date2.1","Date2.3","Date2.3","Date2.5","Date2.5","Date4.1","Date4.1","Date4.3","Date4.3","Date4.5","Date4.5","Date6.1","Date6.1"),
price = c(1,2,3,4,5,6,1,2,3,4,5,6,1,2))
I'm trying to achieve:
I have so far done:
library(dplyr)

df <- df %>%
  group_by(code1, code2, start, end) %>%
  slice_min(price) # %>%
  # group_modify()  # unsure what belongs here
df <- df[order(df$price), ]
All well explained in the image but in brief:
To group by code1,code2,start,end and select smallest price for each
Reshape, sending start,end,price to different columns (max 3 start,end,price per key code1,code2)
I understand that this can be done within group_modify() but unsure how
Any help so much appreciated!
Brian
Here is one way using dplyr and tidyr libraries.
For each group (code1, code2, start and end) calculate the minimum value of price.
Create an index column for code1 and code2. This is to name start, end and price as start_1, start_2 etc.
Get the data in wide format using pivot_wider.
library(dplyr)
library(tidyr)
df %>%
group_by(code1, code2, start, end) %>%
summarise(price = min(price, na.rm = TRUE)) %>%
group_by(code1, code2) %>%
mutate(index = row_number()) %>%
ungroup() %>%
pivot_wider(names_from = index, values_from = c(start, end, price),
names_vary = "slowest")
# code1 code2 start_1 end_1 price_1 start_2 end_2 price_2 start_3 end_3 price_3
# <chr> <chr> <chr> <chr> <dbl> <chr> <chr> <dbl> <chr> <chr> <dbl>
#1 JOZ NAN Date3.1 Date4.1 1 Date3.3 Date4.3 3 Date3.5 Date4.5 5
#2 TSV TSA Date 5.1 Date6.1 1 NA NA NA NA NA NA
#3 ZAZ NAN Date1.1 Date2.1 1 Date1.3 Date2.3 3 Date1.5 Date2.5 5
Note that names_vary = "slowest" orders the output columns by index, giving start_1, end_1, price_1, ... instead of the default start_1, start_2, ..., end_1, end_2, ...
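A small illustration of how names_vary changes the column order, using a made-up two-row data frame (long, fast and slow are hypothetical names, and the argument requires tidyr 1.2.0 or later):

```r
library(tidyr)

# Toy long-format data: one key with two indexed start/end pairs
long <- data.frame(
  key = c("a", "a"),
  index = c(1, 2),
  start = c("s1", "s2"),
  end = c("e1", "e2")
)

# Default ("fastest"): all start_* columns first, then all end_* columns
fast <- pivot_wider(long, names_from = index, values_from = c(start, end))
names(fast)
#> [1] "key"     "start_1" "start_2" "end_1"   "end_2"

# "slowest": columns grouped by index instead
slow <- pivot_wider(long, names_from = index, values_from = c(start, end),
                    names_vary = "slowest")
names(slow)
#> [1] "key"     "start_1" "end_1"   "start_2" "end_2"
```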
I guess you can try aggregate + reshape + ave (all from base R)
reshape(
transform(
aggregate(price ~ ., df, min),
id = ave(seq_along(price), code1, code2, FUN = seq_along)
),
direction = "wide",
idvar = c("code1", "code2"),
timevar = "id"
)
which gives
code1 code2 start.1 end.1 price.1 start.2 end.2 price.2 start.3 end.3
1 ZAZ NAN Date1.1 Date2.1 1 Date1.3 Date2.3 3 Date1.5 Date2.5
4 JOZ NAN Date3.1 Date4.1 1 Date3.3 Date4.3 3 Date3.5 Date4.5
7 TSV TSA Date5.1 Date6.1 1 <NA> <NA> NA <NA> <NA>
price.3
1 5
4 5
7 NA

Create a column to indicate the presence of a value in other columns

I have a dataset with symptoms (20+), the symptoms are categorized as Yes/No/Unknown. I would like to create a new column which indicates if the subject (ID) has no symptoms (I'm defining this as they have no symptoms with 'Yes').
I've got a sample dataset below and I can create a column as desired but it feels like there must be a better/cleaner way just using dplyr::mutate() rather than the filtering and joining that I'm doing?
library(dplyr)
test <- tibble(
ID = c(1:10),
col1 = sample(c("Yes", "No", "Unknown"), 10, replace = TRUE),
col2 = sample(c("Yes", "No", "Unknown"), 10, replace = TRUE),
col3 = sample(c("Yes", "No", "Unknown"), 10, replace = TRUE)
)
left_join(test, test %>%
filter_at(vars(col1:col3), any_vars(. == "Yes")) %>%
mutate(any_symptoms = "Yes") %>%
select(ID, any_symptoms),
by = "ID"
) %>%
mutate(any_symptoms = recode(any_symptoms, .missing = "No"))
#> # A tibble: 10 x 5
#> ID col1 col2 col3 any_symptoms
#> <int> <chr> <chr> <chr> <chr>
#> 1 1 Unknown Unknown Unknown No
#> 2 2 Unknown No No No
#> 3 3 Yes Yes Unknown Yes
#> 4 4 No Unknown Unknown No
#> 5 5 No No Unknown No
#> 6 6 Unknown Yes Unknown Yes
#> 7 7 Yes Unknown Unknown Yes
#> 8 8 No No No No
#> 9 9 No Unknown Unknown No
#> 10 10 No No No No
Created on 2020-05-29 by the reprex package (v0.3.0)
You can use rowSums to check whether a row contains more than 0 "Yes" values.
test$any_symptoms <- c('No', 'Yes')[(rowSums(test[-1] == 'Yes') > 0) + 1]
You can also use this in dplyr pipes :
library(dplyr)
test %>% mutate(any_symptoms = c('No', 'Yes')[(rowSums(.[-1] == 'Yes') > 0) + 1])
Or using pmap from purrr
library(purrr)
test %>%
mutate(any_symptoms = c('No', 'Yes')[pmap_lgl(select(., starts_with('col')),
~any(c(...) == 'Yes')) + 1])
This should work:
library(dplyr)
library(tidyr)
test %>%
  left_join(
    test %>%
      pivot_longer(-ID) %>%
      group_by(ID) %>%
      mutate(is_yes = value == "Yes") %>%
      summarise(any_symptoms = ifelse(sum(is_yes) > 0, "Yes", "No")),
    by = "ID"
  )
This works, but might be a bit annoying if you have 20 columns:
test %>% mutate(any_symptoms = case_when(grepl("Yes", paste(col1, col2, col3), fixed = TRUE) ~ "Yes", TRUE ~ "No"))
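Another option that stays inside a single mutate() is if_any(), available in dplyr 1.0.4 and later. The sketch below uses a small fixed data set instead of the random sample so the result is reproducible, and it assumes the symptom columns can be selected with starts_with("col").

```r
library(dplyr)

# Fixed example data (the question's version is randomly sampled)
test <- tibble(
  ID = 1:4,
  col1 = c("Yes", "No", "Unknown", "No"),
  col2 = c("No", "No", "Yes", "Unknown"),
  col3 = c("Unknown", "No", "No", "Unknown")
)

result <- test %>%
  mutate(
    # TRUE for a row if any selected column equals "Yes"
    any_symptoms = if_else(if_any(starts_with("col"), ~ .x == "Yes"), "Yes", "No")
  )

result$any_symptoms
#> [1] "Yes" "No"  "Yes" "No"
```

The selection helper can be swapped for a range such as col1:col3, so the call stays the same length with 20 or more symptom columns.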

Rename a dataframe Column with text from within the column itself

Given a (simplified) dataframe with format
df <- data.frame(a = c(1,2,3,4),
b = c(4,3,2,1),
temp1 = c("-","-","-","foo: 3"),
temp2 = c("-","bar: 10","-","bar: 4")
)
a b temp1 temp2
1 4 - -
2 3 - bar: 10
3 2 - -
4 1 foo: 3 bar: 4
I need to rename all temp columns with the names contained within the columns themselves. My end goal is to end up with this:
a b foo bar
1 4 - -
2 3 - 10
3 2 - -
4 1 3 4
The df column names and the data contained within them will be unknown. However, the columns that need changing will always contain temp in their names, and the delimiter will always be a ":".
As such I can easily remove the name from within the columns using dplyr like this:
df <- df %>%
mutate_at(vars(contains("temp")), ~(substr(., str_locate(., ":")+1,str_length(.))))
but first I need to rename the columns using some function that scans the column and returns the name(s) found within it, i.e.
rename_at(vars(contains("temp")), ~(...some function.....))
As per the example given there's no guarantee that specific rows will have data so I can't simply grab value from row 1
Any ideas welcome.
Thanks in advance
One possibility involving dplyr and tidyr could be:
library(dplyr)
library(tidyr)
df %>%
pivot_longer(names_to = "variables", values_to = "values", -c(a:b)) %>%
mutate(values = replace(values, values == "-", NA_character_)) %>%
separate(values, into = c("variables2", "values"), sep = ": ") %>%
group_by(variables) %>%
fill(variables2, .direction = "downup") %>%
ungroup() %>%
select(-variables) %>%
pivot_wider(names_from = "variables2", values_from = "values")
a b foo bar
<dbl> <dbl> <chr> <chr>
1 1 4 <NA> <NA>
2 2 3 <NA> 10
3 3 2 <NA> <NA>
4 4 1 3 4
If you want to further replace the NAs with -:
df %>%
pivot_longer(names_to = "variables", values_to = "values", -c(a:b)) %>%
mutate(values = replace(values, values == "-", NA_character_)) %>%
separate(values, into = c("variables2", "values"), sep = ": ") %>%
group_by(variables) %>%
fill(variables2, .direction = "downup") %>%
ungroup() %>%
select(-variables) %>%
pivot_wider(names_from = "variables2", values_from = "values") %>%
mutate_at(vars(-a, -b), ~ replace_na(., "-"))
a b foo bar
<dbl> <dbl> <chr> <chr>
1 1 4 - -
2 2 3 - 10
3 3 2 - -
4 4 1 3 4
This will do the job:
colnames(df)[which(grepl("temp", colnames(df)))] <- unique(unlist(sapply(df[,grepl("temp", colnames(df))],
function(x){gsub("[:].*",
"",
grep("\\w+",
x,
value = TRUE))})))
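The one-liner above renames the columns but leaves the "foo: " prefixes in the values. A base R loop that does both steps might look like the sketch below. It assumes each temp column contains exactly one distinct label before the ":", and the names temp_cols, has_label and new_name are illustrative only.

```r
df <- data.frame(
  a = c(1, 2, 3, 4),
  b = c(4, 3, 2, 1),
  temp1 = c("-", "-", "-", "foo: 3"),
  temp2 = c("-", "bar: 10", "-", "bar: 4"),
  stringsAsFactors = FALSE
)

temp_cols <- grep("temp", names(df), value = TRUE)

for (col in temp_cols) {
  values <- df[[col]]
  has_label <- grepl(":", values, fixed = TRUE)

  # The label is whatever precedes the ":" in the labelled cells
  new_name <- unique(sub(":.*", "", values[has_label]))
  stopifnot(length(new_name) == 1)  # assumption: one label per column

  # Strip "label:" and surrounding spaces; leave "-" cells untouched
  df[[col]] <- ifelse(has_label, trimws(sub("^[^:]*:", "", values)), values)
  names(df)[names(df) == col] <- new_name
}

df
```

The stopifnot() call makes the loop fail loudly if a column has no label at all or mixes several labels, since the question does not say what should happen in those cases.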

Compute row-wise counts in subsets of columns in dplyr

I want to count the number of instances of some text (or factor level) row wise, across a subset of columns using dplyr.
Here's the input:
> input_df
num_col_1 num_col_2 text_col_1 text_col_2
1 1 4 yes yes
2 2 5 no yes
3 3 6 no <NA>
And here's the desired output:
> output_df
num_col_1 num_col_2 text_col_1 text_col_2 sum_yes
1 1 4 yes yes 2
2 2 5 no yes 1
3 3 6 no <NA> 0
In sum_yes we have counted the number of "yes" in that row.
I have tried two methods:
Attempted solution 1:
text_cols = c("text_col_1","text_col_2")
df = input_df %>% mutate(sum_yes = rowSums( select(text_cols) == "yes" ), na.rm = TRUE)
Errors with:
Error in mutate_impl(.data, dots) :
Evaluation error: no applicable method for 'select_' applied to an object of class "character".
Attempted solution 2:
text_cols = c("text_col_1","text_col_2")
df = input_df %>% select(text_cols) %>% rowsum("yes", na.rm = TRUE)
Errors with:
Error in rowsum.data.frame(., "yes", na.rm = TRUE) :
incorrect length for 'group'
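We can use mutate and take the row-wise sum of the "yes" matches (df here is the question's input_df). Passing na.rm = TRUE makes a missing value count as 0; without it, a row containing NA gets NA as its sum.
library(dplyr)
df %>% mutate(sum_yes = rowSums(.[text_cols] == "yes", na.rm = TRUE))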
# num_col_1 num_col_2 text_col_1 text_col_2 sum_yes
#* <int> <int> <fct> <fct> <int>
#1 1 4 yes yes 2
#2 2 5 no yes 1
#3 3 6 no <NA> 0
Inspired from this answer.
rowwise with c_across:
df %>%
  rowwise() %>%
  mutate(sum_yes = sum(c_across(all_of(text_cols)) == "yes", na.rm = TRUE))
do with rowwise:
df %>%
  rowwise() %>%
  do((.) %>% as.data.frame %>%
    mutate(sum_yes = sum(. == "yes", na.rm = TRUE)))
Without do and rowwise (note that this keeps only the text columns):
df %>%
  select(all_of(text_cols)) %>%
  mutate(sum_yes = rowSums(. == "yes", na.rm = TRUE))
In base R, it is actually simpler:
df$sum_yes <- rowSums(df[text_cols] == "yes", na.rm = TRUE)
We can also use reduce with map
library(tidyverse)
df %>%
select(text_cols) %>%
map(~ .x == "yes" & !is.na(.x)) %>%
reduce(`+`) %>%
bind_cols(df, sum_yes = .)
# num_col_1 num_col_2 text_col_1 text_col_2 sum_yes
#1 1 4 yes yes 2
#2 2 5 no yes 1
#3 3 6 no <NA> 0
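For a self-contained check of the NA handling, here is a sketch that rebuilds the question's input and counts the matches with across() inside mutate(). The construction of input_df is my reconstruction from the printed table, so the column types are an assumption.

```r
library(dplyr)

# Reconstructed from the table in the question (types assumed)
input_df <- data.frame(
  num_col_1 = 1:3,
  num_col_2 = 4:6,
  text_col_1 = c("yes", "no", "no"),
  text_col_2 = c("yes", "yes", NA),
  stringsAsFactors = FALSE
)
text_cols <- c("text_col_1", "text_col_2")

output_df <- input_df %>%
  mutate(
    # across() returns one logical column per text column;
    # na.rm = TRUE makes a missing value count as 0 instead of giving NA
    sum_yes = rowSums(across(all_of(text_cols), ~ .x == "yes"), na.rm = TRUE)
  )

output_df$sum_yes
#> [1] 2 1 0
```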

Resources