Get frequency of values for multiple columns using dplyr? - r

I have a dataframe that looks like this:
a b c d e
1 0 0 1 1
.5 1 1 0 1
1 1. .5 .5. 0
0 0 1 NA 1
0 1 0 1 .5
I am looking for an output like:
col val count
a 1 2
.5 1
0 2
b 1 3
0 2
c 1 2
.5 1
0 2
d 1 2
.5 1
0 1
NA 1
e 1 3
.5 1
0 1
I have tried using
data %>%
summarize_at(colnames(data)), n(), na.rm = TRUE)
but this doesn't give me what I want. Any suggestions greatly appreciated, thank you!

I've assumed column d row 3 is a typo and .5. really is 0.5, in which case you could do the following:
library(tidyr)
library(dplyr)
df %>%
pivot_longer(everything()) %>%
group_by(name, value) %>%
summarise(count = n()) %>%
arrange(name, desc(value))
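# or, more succinctly, as pointed out by @LMc
# (name = "count" keeps the column name used in the output below;
# count() would otherwise call it n)
df %>%
  pivot_longer(everything()) %>%
  count(name, value, name = "count") %>%
  arrange(name, desc(value))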
#> # A tibble: 15 x 3
#> name value count
#> <chr> <dbl> <int>
#> 1 a 1 2
#> 2 a 0.5 1
#> 3 a 0 2
#> 4 b 1 3
#> 5 b 0 2
#> 6 c 1 2
#> 7 c 0.5 1
#> 8 c 0 2
#> 9 d 1 2
#> 10 d 0.5 1
#> 11 d 0 1
#> 12 d NA 1
#> 13 e 1 3
#> 14 e 0.5 1
#> 15 e 0 1
data
df <- structure(list(a = c(1, 0.5, 1, 0, 0), b = c(0, 1, 1, 0, 1),
c = c(0, 1, 0.5, 1, 0), d = c(1, 0, 0.5, NA, 1),
e = c(1, 1, 0, 1, 0.5)), class = "data.frame", row.names = c(NA,
-5L))
Created on 2021-04-13 by the reprex package (v2.0.0)
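For comparison, here is a minimal base R sketch of the same count, assuming the df from the data block above. It tabulates each column with table(useNA = "ifany") so that NA gets its own row, then stacks the per-column results. The names count_one and counts are illustrative only, and values come out in ascending order within each column rather than descending.

```r
df <- data.frame(
  a = c(1, 0.5, 1, 0, 0),
  b = c(0, 1, 1, 0, 1),
  c = c(0, 1, 0.5, 1, 0),
  d = c(1, 0, 0.5, NA, 1),
  e = c(1, 1, 0, 1, 0.5)
)

# Tabulate one column, keeping NA as its own category.
count_one <- function(col) {
  tab <- table(df[[col]], useNA = "ifany")
  data.frame(
    col   = col,
    val   = as.numeric(names(tab)),  # the NA category has an NA name
    count = as.integer(tab)
  )
}

# One small data frame per column, stacked into a single long result.
counts <- do.call(rbind, lapply(names(df), count_one))
counts
```

This avoids reshaping the data, at the cost of a little more code than the pivot_longer() version.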

Related

R: Add numeric sequence including negative values starting from middle of group

An example dataframe with 2 columns:
groupID <- c(1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3)
index_ad <- c( 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0)
df <- data.frame(groupID, index_ad)
I want to add another column with a sequence for each group starting at the row where index_ad = 1 and then adding sequential positive/negative numbers depending on whether the row comes before or after the row where index_ad = 1.
ep_id <- c(0, 1, 2, 3, -2, -1, 0, 1, 2, -1, 0, 1, 2)
df1 <- data.frame(groupID, index_ad, ep_id)
I've tried using row_number, but that always starts from the first row in each group.
df <- df %>% group_by(groupID) %>% mutate(ep_num = row_number()) %>% ungroup()
The real dataset has >10,000 rows and multiple other variables including date/times. The groups are arranged/sorted by date/time and the 'index_ad' variable refers to whether the case/row should be considered the index case for that group. All cases/rows before the index case have date/times that occurred before it and all cases/rows after it have date/times that occurred after it.
Please help me figure out how to add the 'ep_id' numeric sequence using R! Thank you!
You can try
library(dplyr)
df |> group_by(groupID) |> mutate(ep_id = 1:n() - which(index_ad == 1))
output
# A tibble: 13 × 3
# Groups: groupID [3]
groupID index_ad ep_id
<dbl> <dbl> <int>
1 1 1 0
2 1 0 1
3 1 0 2
4 1 0 3
5 2 0 -2
6 2 0 -1
7 2 1 0
8 2 0 1
9 2 0 2
10 3 0 -1
11 3 1 0
12 3 0 1
13 3 0 2
df %>%
group_by(groupID) %>%
mutate(row = row_number(),
ep_num = row - row[index_ad == 1]) %>%
ungroup()
# A tibble: 13 × 4
groupID index_ad row ep_num
<dbl> <dbl> <int> <int>
1 1 1 1 0
2 1 0 2 1
3 1 0 3 2
4 1 0 4 3
5 2 0 1 -2
6 2 0 2 -1
7 2 1 3 0
8 2 0 4 1
9 2 0 5 2
10 3 0 1 -1
11 3 1 2 0
12 3 0 3 1
13 3 0 4 2
Here is a way. Within each group, subtract the position of the row where index_ad equals 1 from the row number to get the result.
groupID <- c(1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3)
index_ad <- c( 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0)
df <- data.frame(groupID, index_ad)
suppressPackageStartupMessages(library(dplyr))
df %>%
group_by(groupID) %>%
mutate(ep_num = row_number(),
ep_num = ep_num - which(index_ad == 1)) %>%
ungroup()
#> # A tibble: 13 × 3
#> groupID index_ad ep_num
#> <dbl> <dbl> <int>
#> 1 1 1 0
#> 2 1 0 1
#> 3 1 0 2
#> 4 1 0 3
#> 5 2 0 -2
#> 6 2 0 -1
#> 7 2 1 0
#> 8 2 0 1
#> 9 2 0 2
#> 10 3 0 -1
#> 11 3 1 0
#> 12 3 0 1
#> 13 3 0 2
Created on 2022-08-12 by the reprex package (v2.0.1)
I have coded the mutate above in two lines to make it clearer but it can be simplified to
df %>%
group_by(groupID) %>%
mutate(ep_num = row_number() - which(index_ad == 1)) %>%
ungroup()
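The answers above rely on each group having exactly one row with index_ad == 1, as the question states. As a hedged base R sketch of the same subtraction, the version below uses match() so that a group with no index row yields NA instead of failing, and a group with several index rows uses the first. The helper name offset_from_index is hypothetical.

```r
groupID  <- c(1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3)
index_ad <- c(1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0)
df <- data.frame(groupID, index_ad)

# Position within the group minus the position of the first index row.
# match(1, x) is NA when the group has no index row, so the offsets become NA.
offset_from_index <- function(x) seq_along(x) - match(1, x)

# ave() applies the helper separately to each groupID.
df$ep_id <- ave(df$index_ad, df$groupID, FUN = offset_from_index)
df
```

Like the dplyr versions, this assumes the rows are already sorted by date/time within each group.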

how to mutate new variables with different conditions in r

Say I have a df.
df = data.frame(status = c(1, 0, 0, 0, 1, 0, 0, 0),
stratum = c(1,1,1,1, 2,2,2,2),
death = 1:8)
> df
status stratum death
1 1 1 1
2 0 1 2
3 0 1 3
4 0 1 4
5 1 2 5
6 0 2 6
7 0 2 7
8 0 2 8
I want to add a new variable named weights with mutate. It should meet the following conditions:
weights should be computed within each stratum group.
weights should take the death value of the row where status is 1.
What I expect should look like this:
df_wanted = data.frame(status = c(1, 0, 0, 0, 1, 0, 0, 0),
stratum = c(1,1,1,1, 2,2,2,2),
death = 1:8,
weights = c(1,1,1,1, 5,5,5,5))
> df_wanted
status stratum death weights
1 1 1 1 1
2 0 1 2 1
3 0 1 3 1
4 0 1 4 1
5 1 2 5 5
6 0 2 6 5
7 0 2 7 5
8 0 2 8 5
I do not know how to write the code.
Any help will be highly appreciated!
You may get the death value where status = 1.
library(dplyr)
df %>%
group_by(stratum) %>%
mutate(weights = death[status == 1]) %>%
ungroup
The above works because there is exactly one row in each group where status = 1. If a group can have zero or more than one row where status = 1, then a better option is to use match, which returns NA when there is no such row and the first death value when there are several.
df %>%
group_by(stratum) %>%
mutate(weights = death[match(1, status)]) %>%
ungroup
# status stratum death weights
# <dbl> <dbl> <int> <int>
#1 1 1 1 1
#2 0 1 2 1
#3 0 1 3 1
#4 0 1 4 1
#5 1 2 5 5
#6 0 2 6 5
#7 0 2 7 5
#8 0 2 8 5
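To see the difference between the two indexing styles outside of mutate, here is a small base R sketch on made-up vectors (the values are illustrative, not from the question). Logical indexing returns as many values as there are matches, whereas match() always returns exactly one position or NA.

```r
death <- c(10, 20, 30, 40)

# A group with two rows where status is 1.
status_two <- c(0, 0, 1, 1)
two_logical <- death[status_two == 1]        # both matching values
two_match   <- death[match(1, status_two)]   # only the first match

# A group with no row where status is 1.
status_none <- c(0, 0, 0, 0)
none_logical <- death[status_none == 1]      # zero-length result
none_match   <- death[match(1, status_none)] # a single NA
```

Inside a grouped mutate, a result of length 0 or 2 generally cannot be recycled to a group of four rows, which is why the single value from match() is the safer choice.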

Sum rows by group with condition in R

I have a dataset in R like this one:
and I want to keep the same dataset while adding a column that gives, for each ID, the number of rows where A = B = 1.
This is the required dataset:
I tried the following R code but it doesn't give the result I want:
library(dplyr)
data1<-data%>% group_by(ID) %>%
mutate(result=case_when(A==1 & B==1 ~ sum(A),TRUE ~ 0)) %>% ungroup()
Not as neat and clean, but still:
data %>%
mutate(row_sum = apply(across(A:B), 1, sum)) %>%
group_by(ID) %>%
mutate(result = sum(row_sum == 2)) %>%
ungroup() %>%
select(-row_sum)
which gives:
# A tibble: 10 x 4
ID A B result
<dbl> <dbl> <dbl> <int>
1 1 1 0 3
2 1 1 1 3
3 1 0 1 3
4 1 0 0 3
5 1 1 1 3
6 1 1 1 3
7 2 1 0 2
8 2 1 1 2
9 2 1 1 2
10 2 0 0 2
After grouping by 'ID', multiply 'A' by 'B' (the product is 1 only where both are 1, and 0 otherwise) and then take the sum
library(dplyr)
data %>%
group_by(ID) %>%
mutate(result = sum(A*B)) %>%
ungroup
-output
# A tibble: 10 × 4
ID A B result
<dbl> <dbl> <dbl> <dbl>
1 1 1 0 3
2 1 1 1 3
3 1 0 1 3
4 1 0 0 3
5 1 1 1 3
6 1 1 1 3
7 2 1 0 2
8 2 1 1 2
9 2 1 1 2
10 2 0 0 2
data
data <- structure(list(ID = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2), A = c(1,
1, 0, 0, 1, 1, 1, 1, 1, 0), B = c(0, 1, 1, 0, 1, 1, 0, 1, 1,
0)), class = "data.frame", row.names = c(NA, -10L))
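As a possible explanation of why the attempt in the question falls short: sum(A) is the group total of A (4 for ID 1 and 3 for ID 2 in this data), and case_when only assigns it on rows where the condition holds. A minimal base R sketch of the intended count, assuming the data above and the illustrative helper name both:

```r
data <- data.frame(
  ID = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2),
  A  = c(1, 1, 0, 0, 1, 1, 1, 1, 1, 0),
  B  = c(0, 1, 1, 0, 1, 1, 0, 1, 1, 0)
)

# 1 on rows where both A and B are 1, 0 elsewhere.
both <- as.numeric(data$A == 1 & data$B == 1)

# Sum the flags within each ID and repeat the total on every row of that ID.
data$result <- ave(both, data$ID, FUN = sum)
data
```

Testing A == 1 & B == 1 explicitly also works if the columns ever hold values other than 0 and 1, where the product A * B would not.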

Conditional replacing of a numeric value in dplyr

Dear all, I have a data frame that looks like this:
df <- data.frame(time=c(1,2,3,4,1,2,3,4,5), type=c("A","A","A","A","B","B","B","B","B"), count=c(10,0,0,1,8,0,1,0,1))
df
time type count
1 1 A 10
2 2 A 0
3 3 A 0
4 4 A 1
5 1 B 8
6 2 B 0
7 3 B 1
8 4 B 0
9 5 B 1
I want to examine each group of type and, if a count is 0, replace the counts that follow it in time with 0. I do not want the count to be resurrected from zero.
I want my data to look like this:
time type count
1 1 A 10
2 2 A 0
3 3 A 0
4 4 A 0
5 1 B 8
6 2 B 0
7 3 B 0
8 4 B 0
9 5 B 0
If I understood correctly
library(tidyverse)
df <-
data.frame(
time = c(1, 2, 3, 4, 1, 2, 3, 4, 5),
type = c("A", "A", "A", "A", "B", "B", "B", "B", "B"),
count = c(10, 0, 0, 1, 8, 0, 1, 0, 1)
)
df %>%
group_by(type) %>%
mutate(count = if_else(lag(count, default = first(count)) == 0, 0, count))
#> # A tibble: 9 x 3
#> # Groups: type [2]
#> time type count
#> <dbl> <chr> <dbl>
#> 1 1 A 10
#> 2 2 A 0
#> 3 3 A 0
#> 4 4 A 0
#> 5 1 B 8
#> 6 2 B 0
#> 7 3 B 0
#> 8 4 B 0
#> 9 5 B 0
Created on 2021-09-10 by the reprex package (v2.0.1)
You may use cummin function.
library(dplyr)
df %>% group_by(type) %>% mutate(count = cummin(count))
# time type count
# <dbl> <chr> <dbl>
#1 1 A 10
#2 2 A 0
#3 3 A 0
#4 4 A 0
#5 1 B 8
#6 2 B 0
#7 3 B 0
#8 4 B 0
#9 5 B 0
Since cummin is a base R function you may also implement it in base R -
transform(df, count = ave(count, type, FUN = cummin))
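One caveat: both answers fit the example data, but they behave differently on other inputs. The lag-based version only looks one row back, so 8, 0, 1, 1 would become 8, 0, 0, 1. cummin also lowers any count that rises before the first zero. The base R sketch below flags everything from the first zero onward instead. It assumes there are no NA counts, and the helper name zero_after_first_zero is hypothetical.

```r
# Keep values before the first zero untouched; zero out everything after it.
zero_after_first_zero <- function(x) ifelse(cumsum(x == 0) > 0, 0, x)

# A series that rises before its first zero shows the difference.
x <- c(5, 8, 0, 1)
via_cummin <- cummin(x)                 # 5 5 0 0: the 8 is lowered to 5
via_flag   <- zero_after_first_zero(x)  # 5 8 0 0: the 8 is kept

# Applied per type to the data from the question.
df <- data.frame(
  time  = c(1, 2, 3, 4, 1, 2, 3, 4, 5),
  type  = c("A", "A", "A", "A", "B", "B", "B", "B", "B"),
  count = c(10, 0, 0, 1, 8, 0, 1, 0, 1)
)
df$count <- ave(df$count, df$type, FUN = zero_after_first_zero)
df
```

On the question's data all three approaches give the same result, because each group starts at its maximum and hits zero on the second row.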

Change value by group based in reference within group

Given a table with defined groups, where each group has exactly one reference row (query), I'd like to change all values of a column based on the value of the reference.
These values are just 1 or -1.
The idea is:
- if the reference is equal to 1, keep all values as they are
- but if the reference is -1, all values should be multiplied by -1, so that the reference becomes 1 and the items with value 1 become -1
- Also, modified groups should have their row order reversed
I'm trying to do it this way:
library(tidyverse)
item <- c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l")
grou <- c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4)
quer <- c(0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0)
dir <- c(1, 1, 1, -1, 1, 1, 1, -1, 1, -1, -1, -1)
ds <- tibble(item = item,
group = grou,
query = quer,
direction = dir)
ds %>%
group_by(group) %>%
mutate(
direction = ifelse(
direction[query == 1] == 1, direction, (-1 * direction)
)
)
So this
# A tibble: 12 x 5
# Groups: group [4]
item group query direction
<chr> <dbl> <dbl> <dbl>
1 a 1 0 1
2 b 1 1 1
3 c 1 0 1
4 d 2 0 -1
5 e 2 1 1
6 f 2 0 1
7 g 3 0 1
8 h 3 1 -1
9 i 3 0 1
10 j 4 0 -1
11 k 4 1 -1
12 l 4 0 -1
Should become this
# A tibble: 12 x 5
# Groups: group [4]
item group query direction
<chr> <dbl> <dbl> <dbl>
1 a 1 0 1
2 b 1 1 1
3 c 1 0 1
4 d 2 0 -1
5 e 2 1 1
6 f 2 0 1
7 i 3 0 -1
8 h 3 1 1
9 g 3 0 -1
10 l 4 0 1
11 k 4 1 1
12 j 4 0 1
But it is not working.
Thanks in advance
Here is a way to do it:
ds %>%
rowid_to_column("id") %>%
group_by(group) %>%
mutate(tmp = max(query * direction) - 0.5,
direction = tmp * 2 * direction) %>%
arrange(id * tmp, .by_group = TRUE) %>%
select(-c(id, tmp))
The result:
# A tibble: 12 x 4
# Groups: group [4]
item group query direction
<chr> <dbl> <dbl> <dbl>
1 a 1 0 1
2 b 1 1 1
3 c 1 0 1
4 d 2 0 -1
5 e 2 1 1
6 f 2 0 1
7 i 3 0 -1
8 h 3 1 1
9 g 3 0 -1
10 l 4 0 1
11 k 4 1 1
12 j 4 0 1
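The arithmetic in that answer is compact: tmp is 0.5 when the reference direction is 1 and -0.5 when it is -1, so tmp * 2 is the sign to multiply by, and sorting on id * tmp reverses the rows of flipped groups. As a more explicit sketch of the same idea in base R, assuming exactly one query row per group as the question states (the helper name flip_group is hypothetical):

```r
ds <- data.frame(
  item      = letters[1:12],
  group     = rep(1:4, each = 3),
  query     = rep(c(0, 1, 0), times = 4),
  direction = c(1, 1, 1, -1, 1, 1, 1, -1, 1, -1, -1, -1),
  stringsAsFactors = FALSE
)

flip_group <- function(g) {
  # Direction of the reference row (first row with query == 1).
  ref <- g$direction[match(1, g$query)]
  # Multiplying by 1 keeps the group as is; multiplying by -1 flips every sign.
  g$direction <- g$direction * ref
  # Reverse the row order only for groups that were flipped.
  if (isTRUE(ref == -1)) g <- g[rev(seq_len(nrow(g))), ]
  g
}

# Process each group separately, then stack the pieces back together.
out <- do.call(rbind, lapply(split(ds, ds$group), flip_group))
rownames(out) <- NULL
out
```

The attempt in the question returns too few values because ifelse() produces a result the same length as its test, and direction[query == 1] == 1 has length one per group.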
