Dplyr rolling balance - r

I am trying to compute a balance column.
So, to show an example, I want to go from this:
df <- data.frame(group = c("A", "A", "A", "A", "A"),
start = c(5, 0, 0, 0, 0),
receipt = c(1, 5, 6, 4, 6),
out = c(4, 5, 3, 2, 5))
> df
group start receipt out
1 A 5 1 4
2 A 0 5 5
3 A 0 6 3
4 A 0 4 2
5 A 0 6 5
to creating a new balance column like the following
> dfb
group start receipt out balance
1 A 5 1 4 2
2 A 0 5 5 2
3 A 0 6 3 5
4 A 0 4 2 7
5 A 0 6 5 8
I tried the following attempt but it isn't working
dfc <- df %>%
group_by(group) %>%
mutate(balance = if_else(row_number() == 1, start + receipt - out, (lag(balance) + receipt) - out)) %>%
ungroup()
Would really appreciate some help with this. Thanks!

You could use cumsum (a base R function) inside dplyr's mutate. Note: I had to change your initial df so that it matches the one in your required result, because the "out" column originally held different data.
df <- data.frame(group = c("A", "A", "A", "A", "A"),
start = c(5, 0, 0, 0, 0),
receipt = c(1, 5, 6, 4, 6),
out = c(4, 5, 3, 2, 5))
library(dplyr)
# running total of the net change (start + receipt - out) within each group
dfc <- df %>%
group_by(group) %>%
mutate(balance = cumsum(start + receipt - out))
Source: local data frame [5 x 5]
Groups: group [1]
group start receipt out balance
<fctr> <dbl> <dbl> <dbl> <dbl>
1 A 5 1 4 2
2 A 0 5 5 2
3 A 0 6 3 5
4 A 0 4 2 7
5 A 0 6 5 8
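As for why the original attempt fails: inside mutate, balance does not exist yet when lag(balance) is evaluated, so the column cannot refer to its own previous value. The recurrence balance[i] = balance[i - 1] + start[i] + receipt[i] - out[i] is simply a running sum of the net change, which is why cumsum works. A minimal base R sketch to check this on the example data (the names net and balance_loop are only for illustration):

```r
start   <- c(5, 0, 0, 0, 0)
receipt <- c(1, 5, 6, 4, 6)
out     <- c(4, 5, 3, 2, 5)

# net change per row
net <- start + receipt - out

# explicit recurrence: previous balance plus this row's net change
balance_loop <- numeric(length(net))
for (i in seq_along(net)) {
  prev <- if (i == 1) 0 else balance_loop[i - 1]
  balance_loop[i] <- prev + net[i]
}

# the loop and cumsum agree
stopifnot(identical(balance_loop, cumsum(net)))
balance_loop
```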


R Create a table of like against like

Could someone help me with this transformation in R? I would like to transform this table
ID Condition Count
1 A 1
1 B 0
2 A 1
2 B 1
3 A 0
3 B 1
4 A 1
4 B 1
5 A 1
5 B 1
6 A 1
6 B 0
7 A 0
7 B 1
8 A 0
9 B 0
into this table, to create a table of like against like:
A B Count of ID
1 0 2
0 0 1
1 1 3
0 1 2
Any help would be appreciated. Thank you.
Phil,
You can do:
with(dat, split(Count, Condition)) |>
table() |>
data.frame()
A B Freq
1 0 0 1
2 1 0 2
3 0 1 2
4 1 1 3
Data:
dat <- structure(list(ID = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7,
7, 8, 9), Condition = c("A", "B", "A", "B", "A", "B", "A", "B",
"A", "B", "A", "B", "A", "B", "A", "B"), Count = c(1, 0, 1, 1,
0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0)), class = "data.frame", row.names = c(NA,
-16L))
Here is a tidyverse solution. I filled missing values with 0, please note that this leads to a different count than in your table (do you mean to have 8, 8 as the last two IDs and not 8, 9?):
data <- read.table(text = "ID Condition Count
1 A 1
1 B 0
2 A 1
2 B 1
3 A 0
3 B 1
4 A 1
4 B 1
5 A 1
5 B 1
6 A 1
6 B 0
7 A 0
7 B 1
8 A 0
9 B 0", header = TRUE)
library(tidyr)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
data %>%
pivot_wider(
id_cols = ID,
names_from = Condition,
values_from = Count,
values_fill = 0
) %>%
count(A, B, name = "Count of ID")
#> # A tibble: 4 × 3
#> A B `Count of ID`
#> <int> <int> <int>
#> 1 0 0 2
#> 2 0 1 2
#> 3 1 0 2
#> 4 1 1 3
Created on 2023-01-20 by the reprex package (v1.0.0)
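If you would rather stay in base R and still pair A and B by ID (the split/table approach pairs the values by position), something like the following sketch might work. It assumes, as in the tidyverse answer, that a missing condition should count as 0; the names a, b, wide and res are made up for the example.

```r
dat <- data.frame(
  ID = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 9),
  Condition = rep(c("A", "B"), 8),
  Count = c(1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0)
)

# one data frame per condition, then a full outer join on ID
a <- dat[dat$Condition == "A", c("ID", "Count")]
b <- dat[dat$Condition == "B", c("ID", "Count")]
wide <- merge(a, b, by = "ID", all = TRUE, suffixes = c("_A", "_B"))

# assumption: an ID without a row for a condition counts as 0
wide[is.na(wide)] <- 0

# cross-tabulate A against B and flatten to a data frame
res <- as.data.frame(table(A = wide$Count_A, B = wide$Count_B))
res
```

With IDs 8 and 9 each filled with a 0, this gives the same counts as the pivot_wider version (2, 2, 2, 3).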

dplyr Find records that have specific set of values

I have a dataset that has some IDs and associated timepoints. I want to keep only the IDs that have a specific combination of timepoints. If I filter using %in% or |, I also get IDs that do not have the full combination. How do I do this in R?
ID Timepoint
1 1
1 6
1 12
2 1
3 1
3 6
3 12
3 18
4 1
4 6
4 12
I want to filter IDs that have timepoints 1,6 and 12 and exclude other IDs.
Result would be IDs 1,3 and 4
library(dplyr)
library(tidyr) # complete() comes from tidyr
df <- data.frame(ID = c(1, 1, 1, 2, 3, 3, 3, 3, 4, 4, 4),
Timepoint = c(1, 6, 12, 1, 1, 6, 12, 18, 1, 6, 12))
df %>%
filter(Timepoint %in% c(1, 6, 12)) %>%
mutate(indicator = 1) %>%
group_by(ID) %>%
complete(Timepoint = c(1, 6, 12)) %>%
filter(!ID %in% pull(filter(., is.na(indicator)), ID)) %>%
select(indicator)
Output:
# A tibble: 9 × 2
# Groups: ID [3]
ID indicator
<dbl> <dbl>
1 1 1
2 1 1
3 1 1
4 3 1
5 3 1
6 3 1
7 4 1
8 4 1
9 4 1
We can use
library(dplyr)
df %>%
group_by(ID) %>%
filter(all(c(1, 6, 12) %in% Timepoint)) %>%
ungroup()
-output
# A tibble: 10 x 2
ID Timepoint
<dbl> <dbl>
1 1 1
2 1 6
3 1 12
4 3 1
5 3 6
6 3 12
7 3 18
8 4 1
9 4 6
10 4 12
In your data, ID 2 has timepoint 1. So if you filter for rows whose timepoint is any of 1, 6 or 12, the result will be IDs 1, 2, 3, 4 instead of 1, 3, 4:
ids <- c(1, 1, 1, 2, 3, 3, 3, 3, 4, 4, 4)
time_points <- c(1, 6, 12, 1, 1, 6, 12, 18, 1, 6, 12)
dat <- data.frame(ids, time_points)
unique(dat$ids[dat$time_points %in% c(1, 6, 12)])
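For comparison, a base R sketch that keeps only the IDs having all three timepoints (rather than any of them) could look like this. has_all, keep_ids and res are illustrative names, not anything from the question.

```r
df <- data.frame(ID = c(1, 1, 1, 2, 3, 3, 3, 3, 4, 4, 4),
                 Timepoint = c(1, 6, 12, 1, 1, 6, 12, 18, 1, 6, 12))

# per ID: does the set of timepoints contain 1, 6 and 12?
has_all <- tapply(df$Timepoint, df$ID, function(tp) all(c(1, 6, 12) %in% tp))

# IDs for which the check is TRUE
keep_ids <- as.numeric(names(has_all)[has_all])

# keep every row of those IDs (including extra timepoints such as 18)
res <- df[df$ID %in% keep_ids, ]
keep_ids
```

This keeps IDs 1, 3 and 4 and drops ID 2, which only has timepoint 1.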

Allocating resources based on a priority

I want to create an assign column based on rank and limit, by group.
In particular, for each group I have a priority rank (e.g., 1,2,3 or 1,3,6 or 3,4,5 etc). Based on the rank (a smaller number means higher priority), I want to allocate the resource given in the limit column. Right now I am doing this by hand, but I would like to express it using tidyverse. How do I allocate with mutate and group_by (or other methods)?
Using tidyverse, you can use top_n after grouping. This will filter the top values based on rank - where the n to keep in each group is determined by limit. Those kept will be assigned 1, and then merged with your original data.
Let me know if this provides the desired result.
library(tidyverse)
df %>%
group_by(group) %>%
top_n(limit[1], desc(rank)) %>%
mutate(assign = 1) %>%
right_join(df) %>%
replace_na(list(assign = 0)) %>%
arrange(group, rank)
Output
group rank limit assign
<chr> <dbl> <dbl> <dbl>
1 A 1 1 1
2 A 2 1 0
3 A 3 1 0
4 B 1 1 1
5 B 3 1 0
6 B 6 1 0
7 C 3 2 1
8 C 4 2 1
9 C 5 2 0
10 C 6 2 0
Data
df <- structure(list(group = c("A", "A", "A", "B", "B", "B", "C", "C",
"C", "C"), rank = c(1, 2, 3, 1, 3, 6, 3, 4, 5, 6), limit = c(1,
1, 1, 1, 1, 1, 2, 2, 2, 2)), class = "data.frame", row.names = c(NA,
-10L))
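An alternative that avoids the join is to rank the rows within each group and compare that position with limit directly. Below is a rough base R sketch of the idea, using ave; pos is a made-up helper name, and ties are broken by row order, which is an assumption since the question does not say how ties should be handled. Inside a dplyr pipeline the same comparison should be expressible as group_by(group) %>% mutate(assign = as.integer(row_number(rank) <= limit)).

```r
df <- data.frame(
  group = c("A", "A", "A", "B", "B", "B", "C", "C", "C", "C"),
  rank  = c(1, 2, 3, 1, 3, 6, 3, 4, 5, 6),
  limit = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2)
)

# position of each row within its group, smallest rank first
# (ties broken by row order: an assumption)
pos <- ave(df$rank, df$group, FUN = function(r) rank(r, ties.method = "first"))

# the first `limit` rows of each group get the resource
df$assign <- as.integer(pos <= df$limit)
df
```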

Find last non-zero element in column for each group, fill different column

I am trying to create a for loop that does the following:
for (i in 2:length(Exampledata$Levels)) {
  if (is.na(Exampledata$Levels[i])) {
    # find the last instance where is.na(Exampledata$Levels) is FALSE
    # for that same ID, and input the day from that row into last_entry[i]
  }
}
Example data:
ID<-c("QYZ","MMM","QYZ","bb2","gm6","gm6","YOU","LLL","LLL","LLL")
day<-c(1,2,3,4,5,6,7,8,9,10)
values<-c(1,2,4,5,5,6,8,9,6,4)
Levels<-c("A","","A","C",'D','D',"C","y","","")
last_entry<-c(0,0,0,0,0,0,0,0,0,0)
What data currently looks like:
ID values Levels day last_entry
1 QYZ 1 A 1 0
2 MMM 2 2 0
3 QYZ 4 A 3 0
4 bb2 5 C 4 0
5 gm6 5 D 5 0
6 gm6 6 D 6 0
7 YOU 8 C 7 0
8 LLL 9 y 8 0
9 LLL 6 9 0
10 LLL 4 10 0
What I want it to look like:
ID values Levels day last_entry
1 QYZ 1 A 1 0
2 MMM 2 2 0
3 QYZ 4 A 3 0
4 bb2 5 C 4 0
5 gm6 5 D 5 0
6 gm6 6 D 6 0
7 YOU 8 C 7 0
8 LLL 9 y 8 0
9 LLL 6 9 8
10 LLL 4 10 8
I have seen a lot of code that looks for last non-zero elements or last is.na=FALSE, but none that can do it by ID, and extract a value from that row. I also need to ignore cases where there is no entry for that ID.
Essentially I want to know the last day that a level was entered for that ID.
Here's a solution using data.table:
library('data.table')
dt <- data.table(ID = c("QYZ","MMM","QYZ","bb2","gm6","gm6","YOU","LLL","LLL","LLL"),
Day = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
values = c(1, 2, 4, 5, 5, 6, 8, 9, 6, 4),
Levels = c("A", NA, "A", "C", "D", "D", "C", "y", NA, NA),
last_entry = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0))
func <- function(days, levels){
if(!any(is.na(levels)) | all(is.na(levels))) return(0)
return(last(days[which(!is.na(levels))]))
}
dt[, last_entry := ifelse(!is.na(Levels), 0, func(Day, Levels)), by = ID]
But if you're set on using a for loop:
ID <- c("QYZ","MMM","QYZ","bb2","gm6","gm6","YOU","LLL","LLL","LLL")
Day <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
Levels <- c("A", NA, "A", "C", "D", "D", "C", "y", NA, NA)
last_entry <- c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
i.na <- which(is.na(Levels))
for(id in unique(ID)){
i.id <- which(ID == id)
if(all(is.na(Levels[i.id])) | !any(is.na(Levels[i.id]))) next
day <- tail(Day[i.id[!(i.id %in% i.na)]], 1) # base R tail(), so the loop does not need last() from data.table
last_entry[i.na[i.na %in% i.id]] <- day
}
Here is one way using tidyr::fill. We set last_entry to day where Levels is non-empty and to NA otherwise, use fill to replace those NAs with the latest non-NA value within each ID, and then set last_entry to 0 for all rows with non-empty Levels (and for IDs that have only one row).
library(dplyr)
df %>%
mutate(last_entry = ifelse(Levels != "", day, NA)) %>%
group_by(ID) %>%
tidyr::fill(last_entry) %>%
mutate(last_entry = replace(last_entry, Levels != "" | n() == 1, 0))
# ID day values Levels last_entry
# <fct> <dbl> <dbl> <fct> <dbl>
# 1 QYZ 1 1 A 0
# 2 MMM 2 2 "" 0
# 3 QYZ 3 4 A 0
# 4 bb2 4 5 C 0
# 5 gm6 5 5 D 0
# 6 gm6 6 6 D 0
# 7 YOU 7 8 C 0
# 8 LLL 8 9 y 0
# 9 LLL 9 6 "" 8
#10 LLL 10 4 "" 8
We can also do
df %>%
group_by(ID) %>%
mutate(last_entry = purrr::map_dbl(row_number(), ~if (Levels[.x] == "" & n() > 1)
day[max(which(Levels[1:.x] != ""))] else 0))
data
ID<-c("QYZ","MMM","QYZ","bb2","gm6","gm6","YOU","LLL","LLL","LLL")
day<-c(1,2,3,4,5,6,7,8,9,10)
values<-c(1,2,4,5,5,6,8,9,6,4)
Levels<-c("A","","A","C",'D','D',"C","y","","")
last_entry<-c(0,0,0,0,0,0,0,0,0,0)
df <- data.frame(ID, day, values, Levels, last_entry)
If you want to do it properly, you may want to code "empty" cells to NA beforehand.
Exampledata[Exampledata == ""] <- NA
Then you may use by from base R to look up the "day" of the last non-NA entry of "Levels" in the data split by "ID".
res <- do.call(rbind, by(Exampledata, Exampledata$ID, function(x) {
x$last_entry <- ifelse(is.na(x$Levels), x$day[tail(which(!is.na(x$Levels)), 1)], 0)
x
}))
Since the rbinded result comes out ordered alphabetically by "ID" we can re-order it by day.
res <- res[order(res$day), ]
res
# ID day values Levels last_entry
# QYZ.1 QYZ 1 1 A 0
# MMM MMM 2 2 <NA> NA
# QYZ.3 QYZ 3 4 A 0
# bb2 bb2 4 5 C 0
# gm6.5 gm6 5 5 D 0
# gm6.6 gm6 6 6 D 0
# YOU YOU 7 8 C 0
# LLL.8 LLL 8 9 y 0
# LLL.9 LLL 9 6 <NA> 8
# LLL.10 LLL 10 4 <NA> 8
Now there are the desired last entries for the "LLL" ID, and an NA for MMM, which is what it logically should have, since its "Levels" is NA and it has no last entry.
Data
Exampledata <- structure(list(ID = structure(c(5L, 4L, 5L, 1L, 2L, 2L, 6L, 3L,
3L, 3L), .Label = c("bb2", "gm6", "LLL", "MMM", "QYZ", "YOU"), class = "factor"),
day = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), values = c(1, 2,
4, 5, 5, 6, 8, 9, 6, 4), Levels = structure(c(2L, NA, 2L,
3L, 4L, 4L, 3L, 5L, NA, NA), .Label = c("", "A", "C", "D",
"y"), class = "factor"), last_entry = c(0, 0, 0, 0, 0, 0,
0, 0, 0, 0)), row.names = c(NA, -10L), class = "data.frame")

Unique body count column

I'm trying to add a body count for each unique person. Each person has multiple data points.
df <- data.frame(PERSON = c("A", "A", "A", "B", "B", "C", "C", "C", "C"),
Y = c(2, 5, 4, 1, 2, 5, 3, 7, 1))
This is what I'd like it to look like:
PERSON Y UNIQ_CT
1 A 2 1
2 A 5 0
3 A 4 0
4 B 1 1
5 B 2 0
6 C 5 1
7 C 3 0
8 C 7 0
9 C 1 0
You can use duplicated and negate it:
transform(df, UNIQ_CT = as.integer(!duplicated(PERSON)))
Since the question has a dplyr tag, here is a dplyr option
library(dplyr)
df %>%
group_by(PERSON) %>%
mutate(UNIQ_CT = ifelse(row_number( ) == 1, 1, 0))
