How to pass a vector of column names in case_when - r

I am using case_when to summarise a data frame row by row with rowwise in dplyr. My sample data frame is shown below:
df <- structure(list(A = c(NA, 1, 0, 0, 0, 0, 0), B = c(NA, 0, 0, 1,
0, 0, 0), C = c(NA, 1, 0, 0, 0, 0, 0), D = c(NA, 1, 0, 1, 0,
0, 1), E = c(NA, 1, 0, 1, 0, 0, 1)), row.names = c(NA, -7L), class = "data.frame")
The code works when I list all the column names explicitly:
df %>%
rowwise() %>%
mutate(New = case_when(any(c(A,B,C,D,E) == 1) ~ 1,
all(c(A,B,C,D,E) == 0 ) ~ 0
))
Can I pass the names in a vector, e.g. cols <- colnames(df), and then use that vector inside case_when?

To answer your question, you can use cur_data() (available from dplyr 1.0.0; deprecated since dplyr 1.1.0 in favour of pick()) or c_across():
library(dplyr)
df %>%
rowwise() %>%
mutate(New = case_when(any(cur_data() == 1) ~ 1,
all(cur_data() == 0 ) ~ 0))
# A B C D E New
# <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 NA NA NA NA NA NA
#2 1 0 1 1 1 1
#3 0 0 0 0 0 0
#4 0 1 0 1 1 1
#5 0 0 0 0 0 0
#6 0 0 0 0 0 0
#7 0 0 0 1 1 1
With c_across():
df %>%
rowwise() %>%
mutate(New = case_when(any(c_across(everything()) == 1) ~ 1,
all(c_across(everything()) == 0) ~ 0))
But you can also solve this with rowSums(), without rowwise():
df %>%
mutate(New = case_when(rowSums(. == 1, na.rm = TRUE) > 0 ~ 1,
rowSums(. == 0, na.rm = TRUE) == ncol(.) ~ 0))
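To use an actual vector of column names, one option is to wrap it in all_of() inside c_across(). The following is only a sketch: the subset stored in cols is a hypothetical choice for illustration, and the data frame is rebuilt from the question so the snippet runs on its own.

```r
library(dplyr)

# Sample data from the question
df <- data.frame(
  A = c(NA, 1, 0, 0, 0, 0, 0),
  B = c(NA, 0, 0, 1, 0, 0, 0),
  C = c(NA, 1, 0, 0, 0, 0, 0),
  D = c(NA, 1, 0, 1, 0, 0, 1),
  E = c(NA, 1, 0, 1, 0, 0, 1)
)

# Hypothetical subset of columns; colnames(df) would work the same way
cols <- c("A", "B", "C")

res <- df %>%
  rowwise() %>%
  mutate(New = case_when(
    any(c_across(all_of(cols)) == 1) ~ 1,  # at least one 1 in the chosen columns
    all(c_across(all_of(cols)) == 0) ~ 0   # every chosen column is 0
  )) %>%
  ungroup()

res$New
```

Rows where every selected column is NA match neither condition, so New stays NA there, just as in the versions above.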

If you only have 0s and 1s in your dataset, you could use this:
df$New <- ifelse(rowSums(df) > 0, 1, 0)
If the row sum is greater than 0, at least one 1 is present. Output:
A B C D E New
1 NA NA NA NA NA NA
2 1 0 1 1 1 1
3 0 0 0 0 0 0
4 0 1 0 1 1 1
5 0 0 0 0 0 0
6 0 0 0 0 0 0
7 0 0 0 1 1 1

In base R, we can do this with the following, where the unary + converts the logical result to 0/1:
df$New <- +( rowSums(df) > 0)
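The same base R idea also accepts a character vector of column names, because df[cols] subsets by name. This is a minimal sketch that assumes the selected columns contain only 0, 1, or NA; cols is taken from colnames(df) as suggested in the question.

```r
# Sample data from the question
df <- data.frame(
  A = c(NA, 1, 0, 0, 0, 0, 0),
  B = c(NA, 0, 0, 1, 0, 0, 0),
  C = c(NA, 1, 0, 0, 0, 0, 0),
  D = c(NA, 1, 0, 1, 0, 0, 1),
  E = c(NA, 1, 0, 1, 0, 0, 1)
)

cols <- colnames(df)  # any character vector of column names works here

# Count the 1s per row in the chosen columns, then turn TRUE/FALSE into 1/0.
# A row containing NA yields NA because na.rm is left at its default (FALSE).
df$New <- +(rowSums(df[cols] == 1) > 0)

df$New
```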

Related

Counts of factor levels for multiple variables grouped by row

I want to count the number of times a specific factor level occurs across multiple factor variables per row.
Simplified, I want to know how many times each factor level is chosen across specific variables per row (memberID).
Example data:
results=data.frame(MemID=c('A','B','C','D','E','F','G','H'),
value_a = c(1,2,1,4,5,1,4,0),
value_b = c(1,5,2,3,4,1,0,3),
value_c = c(3,5,2,1,1,1,2,1)
)
In this example, I want to know the frequency of each factor level for value_a and value_b for each MemID. How many times does A respond 1? How many times does A respond 2? Etc...for each level and for each MemID but only for value_a and value_b.
I would like the output to look something like this:
counts_by_level = data.frame(MemID=c('A','B','C','D','E','F','G','H'),
count_1 = c(2, 0, 1, 0, 0, 2, 0, 0),
count_2 = c(0, 1, 1, 0, 0, 0, 0, 0),
count_3 = c(0, 0, 0, 1, 0, 0, 0, 1),
count_4 = c(0, 0, 0, 1, 1, 0, 1, 0),
count_5 = c(0, 1, 0, 0, 1, 0, 0, 0))
I have been trying to use add_count or add_tally as well as table and searching other ways to answer this question. However, I am struggling to identify specific factor levels across multiple variables and then output new columns for the counts of those levels for each row.
You could do something like this. Note that you didn't include a zero count, but there are some zero selections.
library(tidyverse)
results |>
select(-value_c) |>
pivot_longer(cols = starts_with("value"),
names_pattern = "(value)") |>
mutate(count = 1) |>
select(-name) |>
pivot_wider(names_from = value,
values_from = count,
names_prefix = "count_",
values_fill = 0,
values_fn = sum)
#> # A tibble: 8 x 7
#> MemID count_1 count_2 count_5 count_4 count_3 count_0
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 A 2 0 0 0 0 0
#> 2 B 0 1 1 0 0 0
#> 3 C 1 1 0 0 0 0
#> 4 D 0 0 0 1 1 0
#> 5 E 0 0 1 1 0 0
#> 6 F 2 0 0 0 0 0
#> 7 G 0 0 0 1 0 1
#> 8 H 0 0 0 0 1 1
Another solution:
results %>%
group_by(MemID, value_a, value_b) %>%
summarise(n=n()) %>%
pivot_longer(c(value_a,value_b)) %>%
group_by(MemID, value) %>%
summarise(n=sum(n)) %>%
pivot_wider(id_cols = MemID,
names_from = value, names_sort = TRUE, names_prefix = "count_",
values_from = n, values_fn = sum, values_fill = 0)
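A shorter variant of the same reshape idea is to let count() do the tallying after going long. This is a sketch rather than either answerer's code; the column names variable and level are made up for the example.

```r
library(dplyr)
library(tidyr)

# Example data from the question
results <- data.frame(
  MemID   = c("A", "B", "C", "D", "E", "F", "G", "H"),
  value_a = c(1, 2, 1, 4, 5, 1, 4, 0),
  value_b = c(1, 5, 2, 3, 4, 1, 0, 3),
  value_c = c(3, 5, 2, 1, 1, 1, 2, 1)
)

counts <- results %>%
  select(MemID, value_a, value_b) %>%               # only the columns of interest
  pivot_longer(-MemID, names_to = "variable",       # one row per MemID/answer
               values_to = "level") %>%
  count(MemID, level) %>%                           # occurrences of each level
  pivot_wider(names_from = level, values_from = n,
              names_prefix = "count_",
              names_sort = TRUE,                    # count_0, count_1, ... in order
              values_fill = 0L)                     # levels never chosen become 0

counts
```

As in the first answer, a count_0 column appears because some responses are 0.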

How do you make a new factor column based on other columns in r?

I have a data set that looks like this
ID Group 1 Group 2 Group 3 Group 4
1 1 0 1 0
2 0 1 1 1
3 1 1 0 0
.
.
.
100 0 1 0 1
I want to make another column, let's say Group 5, where if Group 1 is 1 then Group 5 is 1. If Group 2 = 1, then Group 5 = 2. If Group 3 = 1, then Group 5 = 3, and if Group 4 = 1, then Group 5 = 4. How do I do this?
I tried these lines of code, but I seem to be missing something.
Group5 <- data.frame(Group1, Group2, Group3, Group4, stringsAsFactors=FALSE)
df$Group5 <- with(finalmerge, ifelse(Group1 %in% c("1", "0"),
"1", ""))
Any advice would be helpful, thanks in advance.
You could apply which.max() to each row. It returns the position of the first maximum, so a row with several 1s gets the lowest matching group number:
df["Group_5"] <- apply(df[, -1], 1, which.max)
Output:
ID Group_1 Group_2 Group_3 Group_4 Group_5
1 1 0 0 0 1 4
2 2 0 1 0 0 2
3 3 0 0 1 0 3
4 4 1 0 0 0 1
Input:
df = structure(list(ID = c(1, 2, 3, 4), Group_1 = c(0, 0, 0, 1), Group_2 = c(0,
1, 0, 0), Group_3 = c(0, 0, 1, 0), Group_4 = c(1, 0, 0, 0)), class = "data.frame", row.names = c(NA,
-4L))
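A vectorised alternative to the apply() call is base R's max.col() with ties.method = "first", which also picks the first column holding the row maximum. The sketch below uses the input above; the guard for rows without any 1 is an extra assumption about what you want, since the question does not say how such rows should be handled.

```r
# Input data from the answer above
df <- data.frame(
  ID      = c(1, 2, 3, 4),
  Group_1 = c(0, 0, 0, 1),
  Group_2 = c(0, 1, 0, 0),
  Group_3 = c(0, 0, 1, 0),
  Group_4 = c(1, 0, 0, 0)
)

m <- as.matrix(df[, -1])  # indicator columns only, ID dropped

# Column index of the first maximum in each row
df$Group_5 <- max.col(m, ties.method = "first")

# Assumed behaviour: rows with no 1 at all get NA instead of 1
df$Group_5[rowSums(m) == 0] <- NA

df$Group_5
```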

how to add condition to mutate(across

I have a data frame df, and I would like to show a percentage (.x/.x[1] * 100) for rows where row_number() > 2, but only in columns whose first row is not 0. How can I do this with mutate(across(...))? Where and how can I add the .x[1] != 0 condition?
mutate(across(.fns = ~ifelse(row_number() > 2 ... sprintf("%1.0f (%.2f%%)", .x, .x/.x[1] * 100), .x)))
df<-structure(list(Total = c(4, 2, 1, 1, 0, 0), `ELA` = c(0,
0, 0, 0, 0, 0), `Math` = c(4, 2, 1, 1, 0,
0), `PE` = c(0, 0, 0, 0, 0, 0)), row.names = c(NA,
-6L), class = c("tbl_df", "tbl", "data.frame"))
df %>%
mutate(across(
where(~.x[1] > 0),
~ifelse(
row_number() > 2,
sprintf("%1.0f (%.2f%%)", .x, .x/.x[1] * 100),
.x
)))
# # A tibble: 6 × 4
# Total ELA Math PE
# <chr> <dbl> <chr> <dbl>
# 1 4 0 4 0
# 2 2 0 2 0
# 3 1 (25.00%) 0 1 (25.00%) 0
# 4 1 (25.00%) 0 1 (25.00%) 0
# 5 0 (0.00%) 0 0 (0.00%) 0
# 6 0 (0.00%) 0 0 (0.00%) 0
Have a look at the ?across help page for more examples.
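If you would rather keep the .x[1] != 0 test inside the function than in the column selection, a plain if/else in the lambda should also work, because the condition is a single value per column. This is a sketch of that variant, not the code from the answer above; the data are rebuilt with tibble() so the snippet is self-contained.

```r
library(dplyr)

# Data from the question
df <- tibble(
  Total = c(4, 2, 1, 1, 0, 0),
  ELA   = c(0, 0, 0, 0, 0, 0),
  Math  = c(4, 2, 1, 1, 0, 0),
  PE    = c(0, 0, 0, 0, 0, 0)
)

res <- df %>%
  mutate(across(everything(), ~ if (.x[1] != 0) {
    # first value is non-zero: format rows 3 and later as "value (percent%)"
    ifelse(row_number() > 2,
           sprintf("%1.0f (%.2f%%)", .x, .x / .x[1] * 100),
           .x)
  } else {
    .x  # first value is 0: leave the column untouched
  }))

res
```

Columns that are reformatted become character, while the untouched ones stay numeric, which matches the output shown above.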

Subtracting each column from its previous one in a data frame

I have a very simple case in which I would like to subtract from each column the one before it. In other words, I am looking for a sliding subtraction: the first column stays as is, the first column is subtracted from the second, the second from the third, and so on until the last column.
Here is my sample data set:
structure(list(x = c(1, 0, 0, 0), y = c(1, 0, 1, 1), z = c(0,
1, 1, 1)), class = "data.frame", row.names = c(NA, -4L))
and my desired output:
structure(list(x = c(1, 0, 0, 0), y = c(0, 0, 1, 1), z = c(-1,
1, 0, 0)), class = "data.frame", row.names = c(NA, -4L))
I am personally looking for a solution with purrr family of functions. I also thought about slider but I'm not quite familiar with the latter one. So I would appreciate any help and idea with these two packages in advance. Thank you very much.
A simple dplyr-only solution:
cur_data() inside mutate/summarise returns a copy of the current data, so
just subtract cur_data()[-ncol(.)] from cur_data()[-1].
You can do similar things with pmap_dfr.
df <- structure(list(x = c(1, 0, 0, 0), y = c(1, 0, 1, 1), z = c(0,
1, 1, 1)), class = "data.frame", row.names = c(NA, -4L))
library(dplyr)
df %>%
mutate(cur_data()[-1] - cur_data()[-ncol(.)])
#> x y z
#> 1 1 0 -1
#> 2 0 0 1
#> 3 0 1 0
#> 4 0 1 0
Similarly, with purrr:
library(purrr)
pmap_dfr(df, ~c(c(...)[1], c(...)[-1] - c(...)[-ncol(df)]))
I think you are looking for pmap_df with lag to subtract the previous value.
library(purrr)
library(dplyr)
pmap_df(df, ~{x <- c(...);x - lag(x, default = 0)})
# A tibble: 4 x 3
# x y z
# <dbl> <dbl> <dbl>
#1 1 0 -1
#2 0 0 1
#3 0 1 0
#4 0 1 0
Verbose, but simple:
df %>%
select(x) %>%
bind_cols(df %>%
select(-1) %>%
map2_dfc(df %>%
select(-ncol(df)), ~.x -.y))
# x y z
#1 1 0 -1
#2 0 0 1
#3 0 1 0
#4 0 1 0
We can just do this (no packages needed):
cbind(df[1], df[-1] - df[-ncol(df)])
Output:
x y z
1 1 0 -1
2 0 0 1
3 0 1 0
4 0 1 0
Or using dplyr:
library(dplyr)
df %>%
mutate(.[-1] - .[-ncol(.)])
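All of these compute, for each row, the first value followed by the successive differences, which is what base R's diff() returns. As a cross-check, here is a small sketch using apply() and diff(); it is a different technique from the answers above, and row_diff is a helper name made up for this example.

```r
# Sample data from the question
df <- data.frame(x = c(1, 0, 0, 0), y = c(1, 0, 1, 1), z = c(0, 1, 1, 1))

# Keep the first element, then append the differences between neighbours
row_diff <- function(r) c(r[1], diff(r))

# apply() returns one column per input row, so transpose back
out <- as.data.frame(t(apply(df, 1, row_diff)))

out
```

This assumes all columns are numeric, since apply() converts the data frame to a matrix first.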

Conditional aggregating by column pairs in R

UPDATE : I've updated the example because it wasn't clear enough.
I am trying to aggregate in R columns of a dataframe based on a condition.
My dataframe looks like this:
df <- data.frame(year = rep(2005, 8),
id = 1:8,
crash_x = c(0, 2, 0, 0, 4, 0,1,2),
crash_y = c(1, 0, 0, 0, 0, 1,0,0),
crash_z = c(0, 0, 3, 1, 0, 0,0,0),
injured_x = c(0, 1, 0, 0, 3, 0,0,0),
injured_y = c(0, 0, 2, 1, 0, 0,1,2),
injured_z = c(3, 0, 0, 0, 0, 2, 0,0))
year id crash_x crash_y crash_z injured_x injured_y injured_z
2005 1 0 1 0 0 0 3
2005 2 2 0 0 1 0 0
2005 3 0 0 3 0 2 0
2005 4 0 0 1 0 1 0
2005 5 4 0 0 3 0 0
2005 6 0 1 0 0 0 2
2005 7 1 0 0 0 1 0
2005 8 2 0 0 0 2 0
I would like to sum the columns on the condition that the columns crash_ and injured_ that share the same suffix (x, y, or z) have numbers greater than 0 in the same rows, e.g., rows 1 and 6, rows 3 and 4, rows 2 and 5, rows 7 and 8, etc.
The output should look like:
year crash_x crash_y crash_z injured_x injured_y injured_z
2005 0 2 0 0 0 5
2005 6 0 0 4 0 0
2005 0 0 4 0 3 0
2005 3 0 0 0 3 0
Is this possible ? Thanks!!
This solution first creates a new column with the "pattern" of 0 and non-0 values:
df <- data.frame(year = rep(2005, 8),
id = 1:8,
crash_x = c(0, 2, 0, 0, 4, 0,1,2),
crash_y = c(1, 0, 0, 0, 0, 1,0,0),
crash_z = c(0, 0, 3, 1, 0, 0,0,0),
injured_x = c(0, 1, 0, 0, 3, 0,0,0),
injured_y = c(0, 0, 2, 1, 0, 0,1,2),
injured_z = c(3, 0, 0, 0, 0, 2, 0,0))
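library(magrittr) # for %<>%
library(tidyr) # for unite()
library(dplyr)
df %<>% unite("pattern", c(crash_x, crash_y, crash_z, injured_x, injured_y, injured_z), remove = FALSE) %>%
mutate(pattern = gsub("[1-9][0-9]*", "1", pattern)) # any positive count becomes 1, including multi-digit ones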
Then summarizes each column according to pattern group with dplyr:
df %>% group_by(pattern, year) %>%
summarise_at(vars(crash_x, crash_y, crash_z, injured_x, injured_y, injured_z), sum)
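The pattern idea can also be written without unite() and a regular expression, by testing each value against 0 directly. The following is a sketch under the assumption that all crash_ and injured_ columns hold non-negative counts; value_cols is a name introduced here for illustration.

```r
library(dplyr)

# Data from the question
df <- data.frame(
  year = rep(2005, 8),
  id = 1:8,
  crash_x = c(0, 2, 0, 0, 4, 0, 1, 2),
  crash_y = c(1, 0, 0, 0, 0, 1, 0, 0),
  crash_z = c(0, 0, 3, 1, 0, 0, 0, 0),
  injured_x = c(0, 1, 0, 0, 3, 0, 0, 0),
  injured_y = c(0, 0, 2, 1, 0, 0, 1, 2),
  injured_z = c(3, 0, 0, 0, 0, 2, 0, 0)
)

# Columns that take part in the zero / non-zero pattern
value_cols <- grep("^(crash|injured)_", names(df), value = TRUE)

# One string per row, e.g. "0_1_0_0_0_1", marking which columns are non-zero
df$pattern <- apply(df[value_cols] > 0, 1,
                    function(r) paste(as.integer(r), collapse = "_"))

# Sum every value column within each year/pattern group
res <- df %>%
  group_by(year, pattern) %>%
  summarise(across(all_of(value_cols), sum), .groups = "drop")

res
```

With the example data this gives four groups, one for each pair of rows named in the question.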
The easiest way is to reshape, here with reshape2 plus base R's aggregate (using the first six rows of the example):
library(reshape2)
d <- read.table(text = "year id crash_x crash_y crash_z injured_x injured_y injured_z
2005 1 0 1 0 0 0 3
2005 2 2 0 0 1 0 0
2005 3 0 0 3 0 2 0
2005 4 0 0 1 0 1 0
2005 5 4 0 0 3 0 0
2005 6 0 1 0 0 0 2", header = TRUE, stringsAsFactors = FALSE)
want <- melt(subset(d, select = -id), id.vars = "year", variable.name = "crash", value.name = "val")
want$postfix <- gsub("(^crash_)|(^injured_)", "", want$crash)
want <- aggregate(val ~ crash + year + postfix, want, sum)
dcast(want, year + postfix ~ crash, value.var = "val", fill = 0)
# year postfix crash_x crash_y crash_z injured_x injured_y injured_z
#1 2005 x 6 0 0 4 0 0
#2 2005 y 0 2 0 0 3 0
#3 2005 z 0 0 4 0 0 5
