how to build a string variable to capture muti cols info - r

I have a df that looks like this:
It can be build using codes:
structure(list(ID = c(1, 2, 3, 4, 5), Pass = c(0, 1, 1, 1, 1),
Math = c(0, 0, 1, 1, 1), ELA = c(0, 1, 0, 1, 0), PE = c(0,
0, 1, 1, 1)), row.names = c(NA, -5L), class = c("tbl_df",
"tbl", "data.frame"))
Where pass stand for a student pass any test or not. Now I want to build a new var Result to capture a student's test results like following, what should I do?

Try the base R code below
q <- with(data.frame(which(df[-(1:2)] == 1, arr.ind = TRUE)),
tapply(names(df[-(1:2)])[col], factor(row, levels = 1:nrow(df)), toString))
df$Result <- ifelse(is.na(q), "Not Pass", paste0("Pass: ", q))
which gives
> df
# A tibble: 5 x 6
ID Pass Math ELA PE Result
<dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1 1 0 0 0 0 Not Pass
2 2 1 0 1 0 Pass: ELA
3 3 1 1 0 1 Pass: Math, PE
4 4 1 1 1 1 Pass: Math, ELA, PE
5 5 1 1 0 1 Pass: Math, PE

Using dplyr with rowwise
library(dplyr)
library(stringr)
df1 %>%
rowwise %>%
mutate(Result = if(as.logical(Pass))
str_c('Pass: ', toString(names(select(., Math:PE))[as.logical(c_across(Math:PE))])) else 'Not pass' ) %>%
ungroup
# A tibble: 5 x 6
# ID Pass Math ELA PE Result
# <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
#1 1 0 0 0 0 Not pass
#2 2 1 0 1 0 Pass: ELA
#3 3 1 1 0 1 Pass: Math, PE
#4 4 1 1 1 1 Pass: Math, ELA, PE
#5 5 1 1 0 1 Pass: Math, PE
data
df1 <- structure(list(ID = c(1, 2, 3, 4, 5), Pass = c(0, 1, 1, 1, 1),
Math = c(0, 0, 1, 1, 1), ELA = c(0, 1, 0, 1, 0), PE = c(0,
0, 1, 1, 1)), row.names = c(NA, -5L), class = c("tbl_df",
"tbl", "data.frame"))

Here's one solution:
library(dplyr)
library(magrittr)
library(stringr)
df <- structure(list(ID = c(1, 2, 3, 4, 5), Pass = c(0, 1, 1, 1, 1),
Math = c(0, 0, 1, 1, 1), ELA = c(0, 1, 0, 1, 0), PE = c(0,
0, 1, 1, 1)), row.names = c(NA, -5L), class = c("tbl_df",
"tbl", "data.frame"))
df %<>% pivot_longer(cols = -c(ID, Pass), names_to = "sub", values_to = "done")
df %<>% group_by(ID) %>% mutate(Result = paste0(ifelse(done == 1, sub, NA), collapse = ", ")) %>% ungroup()
df %<>% pivot_wider(names_from = sub, values_from = done)
df %<>% mutate(Result = paste0("Pass: ", str_replace_all(Result, "NA[, ]*", "")))
df %<>% mutate(Result = ifelse(str_detect(Result, "Pass: $"), "Not pass", str_replace_all(Result, ",[\\s]*$", "")))
df
# # A tibble: 5 x 6
# ID Pass Result Math ELA PE
# <dbl> <dbl> <chr> <dbl> <dbl> <dbl>
# 1 1 0 Not pass 0 0 0
# 2 2 1 Pass: ELA 0 1 0
# 3 3 1 Pass: Math, PE 1 0 1
# 4 4 1 Pass: Math, ELA, PE 1 1 1
# 5 5 1 Pass: Math, PE 1 0 1
I can provide an explanation of what the code is doing if necessary.

Related

manipulate a pair data in R

I would like to reshape the data sample below, so that to get the output like in the table. How can I reach to that? the idea is to split the column e into two columns according to the disease. Those with disease 0 in one column and those with disease 1 in the other column. thanks in advance.
structure(list(id = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), fid = c(1,
1, 2, 2, 3, 3, 4, 4, 5, 5), disease = c(0, 1, 0, 1, 1, 0, 1, 0, 0,
1), e = c(3, 2, 6, 1, 2, 5, 2, 3, 1, 1)), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -10L))
library(tidyverse)
df %>%
pivot_wider(fid, names_from = disease, values_from = e, names_prefix = 'e') %>%
select(-fid)
e0 e1
<dbl> <dbl>
1 3 2
2 6 1
3 5 2
4 3 2
5 1 1
if you want the e1,e2 you could do:
df %>%
pivot_wider(fid, names_from = disease, values_from = e,
names_glue = 'e{disease + 1}') %>%
select(-fid)
# A tibble: 5 x 2
e1 e2
<dbl> <dbl>
1 3 2
2 6 1
3 5 2
4 3 2
5 1 1
We could use lead() combined with ìfelse statements for this:
library(dplyr)
df %>%
mutate(e2 = lead(e)) %>%
filter(row_number() %% 2 == 1) %>%
mutate(e1 = ifelse(disease==1, e2,e),
e2 = ifelse(disease==0, e2,e)) %>%
select(e1, e2)
e1 e2
<dbl> <dbl>
1 3 2
2 6 1
3 5 2
4 3 2
5 1 1

how to add condition to mutate(across

I have df and I would like to calculate percentage (.x/.x[1] * 100 ) when row_number >2 and the first row in the same col is not 0. What should I do if we want to use mutate(across...? where and how can I add the part on .x[1]!=0?
mutate(across(.fns = ~ifelse(row_number() > 2 ... sprintf("%1.0f (%.2f%%)", .x, .x/.x[1] * 100), .x)))
df<-structure(list(Total = c(4, 2, 1, 1, 0, 0), `ELA` = c(0,
0, 0, 0, 0, 0), `Math` = c(4, 2, 1, 1, 0,
0), `PE` = c(0, 0, 0, 0, 0, 0)), row.names = c(NA,
-6L), class = c("tbl_df", "tbl", "data.frame"))
df %>%
mutate(across(
where(~.x[1] > 0),
~ifelse(
row_number() > 2,
sprintf("%1.0f (%.2f%%)", .x, .x/.x[1] * 100),
.x
)))
# # A tibble: 6 × 4
# Total ELA Math PE
# <chr> <dbl> <chr> <dbl>
# 1 4 0 4 0
# 2 2 0 2 0
# 3 1 (25.00%) 0 1 (25.00%) 0
# 4 1 (25.00%) 0 1 (25.00%) 0
# 5 0 (0.00%) 0 0 (0.00%) 0
# 6 0 (0.00%) 0 0 (0.00%) 0
Have a look at the ?across help page for more examples.

Subtracting each column from its previous one in a data frame

I have a very simple case here in which I would like to subtract each column from its previous one. As a matter of fact I am looking for a sliding subtraction as the first column stays as is and then the first one subtracts the second one and second one subtracts the third one and so on till the last column.
here is my sample data set:
structure(list(x = c(1, 0, 0, 0), y = c(1, 0, 1, 1), z = c(0,
1, 1, 1)), class = "data.frame", row.names = c(NA, -4L))
and my desired output:
structure(list(x = c(1, 0, 0, 0), y = c(0, 0, 1, 1), z = c(-1,
1, 0, 0)), class = "data.frame", row.names = c(NA, -4L))
I am personally looking for a solution with purrr family of functions. I also thought about slider but I'm not quite familiar with the latter one. So I would appreciate any help and idea with these two packages in advance. Thank you very much.
A simple dplyr only solution-
cur_data() inside mutate/summarise just creates a whole copy. So
just substract cur_data()[-ncol(.)] from cur_data()[-1]
with pmap_df you can do similar things
df <- structure(list(x = c(1, 0, 0, 0), y = c(1, 0, 1, 1), z = c(0,
1, 1, 1)), class = "data.frame", row.names = c(NA, -4L))
library(dplyr)
df %>%
mutate(cur_data()[-1] - cur_data()[-ncol(.)])
#> x y z
#> 1 1 0 -1
#> 2 0 0 1
#> 3 0 1 0
#> 4 0 1 0
similarly
pmap_dfr(df, ~c(c(...)[1], c(...)[-1] - c(...)[-ncol(df)]))
I think you are looking for pmap_df with lag to subtract the previous value.
library(purrr)
library(dplyr)
pmap_df(df, ~{x <- c(...);x - lag(x, default = 0)})
# A tibble: 4 x 3
# x y z
# <dbl> <dbl> <dbl>
#1 1 0 -1
#2 0 0 1
#3 0 1 0
#4 0 1 0
Verbose, but simple:
df %>%
select(x) %>%
bind_cols(df %>%
select(-1) %>%
map2_dfc(df %>%
select(-ncol(df)), ~.x -.y))
# x y z
#1 1 0 -1
#2 0 0 1
#3 0 1 0
#4 0 1 0
We can just do (no need of any packages)
cbind(df1[1], df1[-1] - df1[-ncol(df1)])
-output
x y z
1 1 0 -1
2 0 0 1
3 0 1 0
4 0 1 0
Or using dplyr
library(dplyr)
df1 %>%
mutate(.[-1] - .[-ncol(.)])

How to convert indicator columns to a concatenated column (of column names)

I have 3 columns consisting of indicator (0/1)
icols <-
structure(list(delivery_group = c(0, 1, 1, 0, 0), culturally_tailored = c(0,
0, 1, 0, 1), integrated_intervention = c(1, 0, 0, 0, 0)), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -5L))
I would like to return a single character column 'qualifiers', such that column names with indicator == 1 are concatenated in a string as below:
*qualifiers*
integrated_intervention
delivery_group
delivery_group, culturally_tailored
culturally_tailored
I tried extdplyr::ind (with various options) without success. The one below crashed my R session.
icols <- extdplyr::ind_to_char(col = qualifiers, ret_factor = FALSE, remove = TRUE,
from = c("delivery_group", "culturally_tailored", "integrated_intervention"),
mutually_exclusive = FALSE, collectively_exhaustive = FALSE)
I found Convert Boolean indicator columns to a single factor column, but thought there might be a simpler solution.
You can try:
icols$collapsed <- apply(icols, 1, function(x) paste0(names(icols)[x == 1], collapse = ", "))
icols
delivery_group culturally_tailored integrated_intervention collapsed
1 0 0 1 integrated_intervention
2 1 0 0 delivery_group
3 1 1 0 delivery_group, culturally_tailored
4 0 0 0
5 0 1 0 culturally_tailored
Or, even more compactly as Maurits suggested:
apply(icols, 1, function(x) toString(names(icols)[x == 1]))
I'm not sure this is a "simple" solution, but here is a solution using the tidyverse.
library(tidyverse)
icols <- tibble(
delivery_group = c(0, 1, 1, 0, 0),
culturally_tailored = c(0, 0, 1, 0, 1),
integrated_intervention = c(1, 0, 0, 0, 0)
)
icols %>%
rowid_to_column(var = "rowid") %>%
gather(key = "qualifiers", value = "indicator", -rowid) %>%
filter(indicator == 1) %>%
group_by(rowid) %>%
summarize(qualifiers = paste(qualifiers, collapse = ", ")) %>%
ungroup() %>%
complete(rowid = 1:nrow(icols)) %>%
select(qualifiers)
#> # A tibble: 5 x 1
#> qualifiers
#> <chr>
#> 1 integrated_intervention
#> 2 delivery_group
#> 3 delivery_group, culturally_tailored
#> 4 <NA>
#> 5 culturally_tailored
Created on 2019-02-27 by the reprex package (v0.2.1)
Here's a crazy way:
library(tidyverse)
icols %>%
mutate(qualifiers = case_when(
delivery_group & culturally_tailored == 1 ~ "delivery_group, culturally_tailored",
delivery_group & integrated_intervention == 1 ~ "delivery_group, integrated_intervation",
culturally_tailored & integrated_intervention == 1 ~ "culturally_tailored, integrated_intervation",
culturally_tailored == 1 ~ "culturally_tailored",
integrated_intervention == 1 ~ "integrated_intervention",
delivery_group == 1 ~ "delivery_group"))
# A tibble: 5 x 4
delivery_group culturally_tailored integrated_intervention qualifiers
<dbl> <dbl> <dbl> <chr>
1 0 0 1 integrated_intervention
2 1 0 0 delivery_group
3 1 1 0 delivery_group, culturally_tailored
4 0 0 0 NA
5 0 1 0 culturally_tailored

How do you use dot referencing (for the data frame) and groups in dplyr?

Consider the following example data:
tmp_df_dplyr <- data.frame(groups = rep(c("C", "B", "A"), each = 3),
a = c(-2, 0, -1, -1, 0, 1, 0, 1, 2),
b = rep(c(-1, 0, 1), each = 3))
I wish to do the following, except using colSums:
tmp_df_dplyr %>%
group_by(groups) %>%
summarise(min_group = min(c(sum(a), sum(b))))
# produces:
# A tibble: 3 × 2
groups min_group
<fctr> <dbl>
1 A 3
2 B 0
3 C -3
Using dot referencing, I get an unexpected result:
tmp_df_dplyr %>%
group_by(groups) %>%
summarise(min_group = min(colSums(.[, c('a', 'b')])))
# produces
# A tibble: 3 × 2
groups min_group
<fctr> <dbl>
1 A 0
2 B 0
3 C 0
that is, it looks like the groups are not being applied.

Resources