R - Insert Missing Numbers in a Sequence by Group's Max Value

I'd like to insert the missing numbers in the index column, following these conditions:
Partitioned by multiple columns
The minimum value is always 1
The maximum value is always the maximum for the group and type
Current Data:
group type index vol
A 1 1 200
A 1 2 244
A 1 5 33
A 2 2 66
A 2 3 2
A 2 4 199
A 2 10 319
B 1 4 290
B 1 5 188
B 1 6 573
B 1 9 122
Desired Data:
group type index vol
A 1 1 200
A 1 2 244
A 1 3 0
A 1 4 0
A 1 5 33

A 2 1 0
A 2 2 66
A 2 3 2
A 2 4 199
A 2 5 0
A 2 6 0
A 2 7 0
A 2 8 0
A 2 9 0
A 2 10 319

B 1 1 0
B 1 2 0
B 1 3 0
B 1 4 290
B 1 5 188
B 1 6 573
B 1 7 0
B 1 8 0
B 1 9 122
I've just added in spaces between the partitions for clarity.
Hope you can help out!

You can do the following
library(dplyr)
library(tidyr)
my_df %>%
group_by(group, type) %>%
complete(index = 1:max(index), fill = list(vol = 0))
# group type index vol
# 1 A 1 1 200
# 2 A 1 2 244
# 3 A 1 3 0
# 4 A 1 4 0
# 5 A 1 5 33
# 6 A 2 1 0
# 7 A 2 2 66
# 8 A 2 3 2
# 9 A 2 4 199
# 10 A 2 5 0
# 11 A 2 6 0
# 12 A 2 7 0
# 13 A 2 8 0
# 14 A 2 9 0
# 15 A 2 10 319
# 16 B 1 1 0
# 17 B 1 2 0
# 18 B 1 3 0
# 19 B 1 4 290
# 20 B 1 5 188
# 21 B 1 6 573
# 22 B 1 7 0
# 23 B 1 8 0
# 24 B 1 9 122
With group_by() you specify the groups you indicated with the white spaces. With complete() you specify which columns should be completed and what values should be filled in for the remaining columns (the default is NA).
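For comparison, a minimal sketch of what happens when the fill argument is left out (using the my_df defined under Data below): the inserted rows then carry the default NA.
library(dplyr)
library(tidyr)
my_df %>%
group_by(group, type) %>%
complete(index = 1:max(index))
# the rows added for the missing index values now have vol = NA instead of 0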
Data
my_df <-
structure(list(group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), .Label = c("A", "B"), class = "factor"),
type = c(1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L),
index = c(1L, 2L, 5L, 2L, 3L, 4L, 10L, 4L, 5L, 6L, 9L),
vol = c(200L, 244L, 33L, 66L, 2L, 199L, 319L, 290L, 188L, 573L, 122L)),
class = "data.frame", row.names = c(NA, -11L))

One dplyr and tidyr possibility could be:
df %>%
group_by(group, type) %>%
complete(index = full_seq(1:max(index), 1), fill = list(vol = 0))
group type index vol
<fct> <int> <dbl> <dbl>
1 A 1 1 200
2 A 1 2 244
3 A 1 3 0
4 A 1 4 0
5 A 1 5 33
6 A 2 1 0
7 A 2 2 66
8 A 2 3 2
9 A 2 4 199
10 A 2 5 0
11 A 2 6 0
12 A 2 7 0
13 A 2 8 0
14 A 2 9 0
15 A 2 10 319
16 B 1 1 0
17 B 1 2 0
18 B 1 3 0
19 B 1 4 290
20 B 1 5 188
21 B 1 6 573
22 B 1 7 0
23 B 1 8 0
24 B 1 9 122

Related

Summing consecutive values, broken up by specific value, in R

I'm having trouble figuring out how to group variables to achieve the desired result from dplyr. I have an experimental dataset set up like this:
subject task_phase block_number trial_number ResponseCorrect
<chr> <chr> <dbl> <dbl> <dbl>
1 268301377 1 1 2 1
2 268301377 1 1 3 1
3 268301377 1 1 4 1
4 268301377 1 2 2 -1
5 268301377 1 2 3 1
6 268301377 1 2 4 1
7 268301377 1 3 2 1
8 268301377 1 3 3 -1
9 268301377 1 3 4 1
10 268301377 2 1 50 1
11 268301377 2 1 51 1
12 268301377 2 1 52 1
13 268301377 2 2 37 -1
14 268301377 2 2 38 1
15 268301377 2 2 39 1
16 268301377 2 3 41 -1
17 268301377 2 3 42 -1
18 268301377 2 3 43 1
I'm hoping to sum the consecutive "correct" responses, and to have this tally "reset" each time there was an incorrect response:
subject task_phase block_number trial_number ResponseCorrect ConsecutiveCorrect
<chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 268301377 1 1 1 1 1
2 268301377 1 1 2 1 2
3 268301377 1 1 3 1 3
4 268301377 1 2 1 -1 0
5 268301377 1 2 2 1 1
6 268301377 1 2 3 1 2
7 268301377 1 3 1 1 1
8 268301377 1 3 2 -1 0
9 268301377 1 3 3 1 1
10 268301377 2 1 1 1 1
11 268301377 2 1 2 1 2
12 268301377 2 1 3 1 3
13 268301377 2 2 1 -1 0
14 268301377 2 2 2 1 1
15 268301377 2 2 3 1 2
16 268301377 2 3 1 -1 0
17 268301377 2 3 2 -1 0
18 268301377 2 3 3 1 1
I originally thought I could do something along the lines of df %>% group_by(subject, task_phase, block_number, ResponseCorrect) %>% mutate(ConsecutiveCorrect = cumsum(ResponseCorrect)), and that almost works. But it doesn't give a consecutive value: it just sums up the total number of correct responses per block. I'm essentially trying to use the -1s as break points that start the summation over again.
Is there a grouping function (Tidyverse or otherwise) that I'm not aware of that could do something along these lines?
You could try
library(dplyr)
data %>%
group_by(
subject,
task_phase,
block_number,
grp = lag(cumsum(ResponseCorrect == -1), default = 0)
) %>%
mutate(ConsecutiveCorrect = ifelse(ResponseCorrect == -1, 0, cumsum(ResponseCorrect))) %>%
ungroup() %>%
select(-grp)
which returns
# A tibble: 18 x 6
subject task_phase block_number trial_number ResponseCorrect ConsecutiveCorrect
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 268301377 1 1 2 1 1
2 268301377 1 1 3 1 2
3 268301377 1 1 4 1 3
4 268301377 1 2 2 -1 0
5 268301377 1 2 3 1 1
6 268301377 1 2 4 1 2
7 268301377 1 3 2 1 1
8 268301377 1 3 3 -1 0
9 268301377 1 3 4 1 1
10 268301377 2 1 50 1 1
11 268301377 2 1 51 1 2
12 268301377 2 1 52 1 3
13 268301377 2 2 37 -1 0
14 268301377 2 2 38 1 1
15 268301377 2 2 39 1 2
16 268301377 2 3 41 -1 0
17 268301377 2 3 42 -1 0
18 268301377 2 3 43 1 1
An option with data.table: grouped by 'subject', 'task_phase' and 'block_number', get the run-length id (rleid) of 'ResponseCorrect', take the row id (rowid) within each run, and multiply by the logical vector (ResponseCorrect == 1) so that elements corresponding to -1 return 0 (FALSE -> 0) while the others keep their running count (TRUE -> 1).
library(data.table)
setDT(df)[, ConsecutiveCorrect := rowid(rleid(ResponseCorrect)) *
(ResponseCorrect == 1), by = .(subject, task_phase, block_number)]
-output
df
subject task_phase block_number trial_number ResponseCorrect ConsecutiveCorrect
1: 268301377 1 1 2 1 1
2: 268301377 1 1 3 1 2
3: 268301377 1 1 4 1 3
4: 268301377 1 2 2 -1 0
5: 268301377 1 2 3 1 1
6: 268301377 1 2 4 1 2
7: 268301377 1 3 2 1 1
8: 268301377 1 3 3 -1 0
9: 268301377 1 3 4 1 1
10: 268301377 2 1 50 1 1
11: 268301377 2 1 51 1 2
12: 268301377 2 1 52 1 3
13: 268301377 2 2 37 -1 0
14: 268301377 2 2 38 1 1
15: 268301377 2 2 39 1 2
16: 268301377 2 3 41 -1 0
17: 268301377 2 3 42 -1 0
18: 268301377 2 3 43 1 1
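To see the mechanics on a single block, here is a small illustration using block 2 of task_phase 1 from the data above:
library(data.table)
x <- c(-1, 1, 1)            # ResponseCorrect for that block
rleid(x)                    # 1 2 2  (run id of each element)
rowid(rleid(x))             # 1 1 2  (position within each run)
rowid(rleid(x)) * (x == 1)  # 0 1 2  (the -1 row is zeroed out)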
data
df <- structure(list(subject = c(268301377L, 268301377L, 268301377L,
268301377L, 268301377L, 268301377L, 268301377L, 268301377L, 268301377L,
268301377L, 268301377L, 268301377L, 268301377L, 268301377L, 268301377L,
268301377L, 268301377L, 268301377L), task_phase = c(1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L),
block_number = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 1L,
1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L), trial_number = c(2L, 3L,
4L, 2L, 3L, 4L, 2L, 3L, 4L, 50L, 51L, 52L, 37L, 38L, 39L,
41L, 42L, 43L), ResponseCorrect = c(1L, 1L, 1L, -1L, 1L,
1L, 1L, -1L, 1L, 1L, 1L, 1L, -1L, 1L, 1L, -1L, -1L, 1L)),
class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13",
"14", "15", "16", "17", "18"))

R data.table : rolling lag sum for previous 3 days by group

I am currently working R in data.table and am looking for an easy way to implement a rolling lag sum. I can find posts on lags and posts on various sum functions but haven't been successful finding one in which sum and lag are combined in the way I am looking to implement it (rolling back 3 days).
I have a data set that resembles the following-
id agedays diar
1 1 1
1 2 0
1 3 1
1 4 1
1 5 0
1 6 0
1 7 0
1 8 1
1 9 1
1 10 1
3 2 0
3 5 0
3 6 0
3 8 1
3 9 1
4 1 0
4 4 0
4 5 0
4 6 1
4 7 0
I want to create a variable "diar_prev3" that holds the rolling sum of diar for the 3 days prior to the current agedays value. diar_prev3 would be NA for the rows in which agedays < 4. The data set would look like the following:
id agedays diar diar_prev3
1 1 1 NA
1 2 0 NA
1 3 1 NA
1 4 1 2
1 5 0 2
1 6 0 2
1 7 0 1
1 8 1 0
1 9 1 1
1 10 1 2
3 2 0 NA
3 5 0 0
3 6 0 0
3 8 1 0
3 9 1 1
4 1 0 NA
4 4 0 0
4 5 0 0
4 6 1 0
4 7 0 1
I have tried a basic lag function, but am unsure how to implement this with a rolling sum function included. Does anyone have any functions they recommend using to accomplish this?
****Edited to fix an error with ID==2
I don't get the logic; it does not appear to be by id, otherwise the results for id==2 don't make sense - but what is going on with id==3 and 4?
In principle, you could do something like this - either by ID or not:
library(data.table)
library(RcppRoll)
DT <- structure(list(id = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L,
3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L),
agedays = c(1L, 2L,
3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 2L, 5L, 6L, 8L, 9L, 1L, 4L,
5L, 6L, 7L), diar = c(1L, 0L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 1L,
0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 1L, 0L)),
class = "data.frame", row.names = c(NA, -20L))
setDT(DT)
DT[, diar_prev3 := ifelse(agedays < 4, NA, RcppRoll::roll_sum(lag(diar, 1), n=3L, fill=NA, align = "right"))][]
#> id agedays diar diar_prev3
#> 1: 1 1 1 NA
#> 2: 1 2 0 NA
#> 3: 1 3 1 NA
#> 4: 1 4 1 2
#> 5: 1 5 0 2
#> 6: 2 6 0 1
#> 7: 2 7 0 0
#> 8: 2 8 1 1
#> 9: 2 9 1 2
#> 10: 2 10 1 3
#> 11: 3 2 0 NA
#> 12: 3 5 0 1
#> 13: 3 6 0 0
#> 14: 3 8 1 1
#> 15: 3 9 1 2
#> 16: 4 1 0 NA
#> 17: 4 4 0 1
#> 18: 4 5 0 0
#> 19: 4 6 1 1
#> 20: 4 7 0 1
DT[, diar_prev3 := ifelse(agedays < 4, NA, RcppRoll::roll_sum(lag(diar, 1), n=3L, fill=NA, align = "right")), by=id][]
#> id agedays diar diar_prev3
#> 1: 1 1 1 NA
#> 2: 1 2 0 NA
#> 3: 1 3 1 NA
#> 4: 1 4 1 2
#> 5: 1 5 0 2
#> 6: 2 6 0 NA
#> 7: 2 7 0 NA
#> 8: 2 8 1 1
#> 9: 2 9 1 2
#> 10: 2 10 1 3
#> 11: 3 2 0 NA
#> 12: 3 5 0 NA
#> 13: 3 6 0 0
#> 14: 3 8 1 1
#> 15: 3 9 1 2
#> 16: 4 1 0 NA
#> 17: 4 4 0 NA
#> 18: 4 5 0 0
#> 19: 4 6 1 1
#> 20: 4 7 0 1
Created on 2020-07-20 by the reprex package (v0.3.0)
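If the window is meant to be the three agedays values immediately before the current row within each id, with NA whenever agedays < 4 (which is how the desired table in the question reads), a minimal data.table sketch on the question's own data could be:
library(data.table)
dat <- data.table(
  id      = c(rep(1L, 10), rep(3L, 5), rep(4L, 5)),
  agedays = c(1:10, 2L, 5L, 6L, 8L, 9L, 1L, 4L, 5L, 6L, 7L),
  diar    = c(1L, 0L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 1L,
              0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 1L, 0L)
)
dat[, diar_prev3 := sapply(agedays, function(d)
      if (d < 4) NA_real_
      else sum(diar[agedays >= d - 3 & agedays <= d - 1])),
    by = id]
dat
# this reproduces the diar_prev3 column of the desired output shown in the question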

Longest consecutive count of the same value per group

I have a data.frame as below and I want to add a variable giving the longest consecutive count of 1s in the VALUE column observed in the group (i.e. the longest run of consecutive rows with VALUE == 1 per group).
GROUP_ID VALUE
1 0
1 1
1 1
1 1
1 1
1 0
2 1
2 1
2 0
2 1
2 1
2 1
3 1
3 0
3 1
3 0
So the output would look like this:
GROUP_ID VALUE CONSECUTIVE
1 0 4
1 1 4
1 1 4
1 1 4
1 1 4
1 0 4
2 1 3
2 1 3
2 0 3
2 1 3
2 1 3
2 1 3
3 1 1
3 0 1
3 1 1
3 0 1
Any help would be greatly appreciated!
Using dplyr:
library(dplyr)
dat %>%
group_by(GROUP_ID) %>%
mutate(CONSECUTIVE = {rl <- rle(VALUE); max(rl$lengths[rl$values == 1])})
which gives:
# A tibble: 16 x 3
# Groups: GROUP_ID [3]
GROUP_ID VALUE CONSECUTIVE
<int> <int> <int>
1 1 0 4
2 1 1 4
3 1 1 4
4 1 1 4
5 1 1 4
6 1 0 4
7 2 1 3
8 2 1 3
9 2 0 3
10 2 1 3
11 2 1 3
12 2 1 3
13 3 1 1
14 3 0 1
15 3 1 1
16 3 0 1
Or with data.table:
library(data.table)
setDT(dat) # convert to a 'data.table'
dat[, CONSECUTIVE := {rl <- rle(VALUE); max(rl$lengths[rl$values == 1])}
, by = GROUP_ID][]
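Both versions hinge on rle(); a quick illustration of what it returns for group 1's VALUE column:
rle(c(0, 1, 1, 1, 1, 0))
# lengths: 1 4 1
# values : 0 1 0
# so max(lengths[values == 1]) is 4, the longest run of 1s in that group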
We can use ave with rle to get the maximum number of consecutive 1s for each group (GROUP_ID).
df$Consecutive <- ave(df$VALUE, df$GROUP_ID, FUN = function(x) {
y <- rle(x == 1)
max(y$lengths[y$values])
})
df
# GROUP_ID VALUE Consecutive
#1 1 0 4
#2 1 1 4
#3 1 1 4
#4 1 1 4
#5 1 1 4
#6 1 0 4
#7 2 1 3
#8 2 1 3
#9 2 0 3
#10 2 1 3
#11 2 1 3
#12 2 1 3
#13 3 1 1
#14 3 0 1
#15 3 1 1
#16 3 0 1
Here is another option with data.table: rleid(VALUE) * VALUE assigns each run of 1s a distinct non-zero id (and 0 elsewhere), na_if() turns those zeros into NA so that table() only counts the runs of 1s, and we take the maximum count per GROUP_ID.
library(data.table)
library(dplyr)
setDT(df1)[, CONSECUTIVE := max(table(na_if(rleid(VALUE)*VALUE, 0))), .(GROUP_ID)]
df1
# GROUP_ID VALUE CONSECUTIVE
# 1: 1 0 4
# 2: 1 1 4
# 3: 1 1 4
# 4: 1 1 4
# 5: 1 1 4
# 6: 1 0 4
# 7: 2 1 3
# 8: 2 1 3
# 9: 2 0 3
#10: 2 1 3
#11: 2 1 3
#12: 2 1 3
#13: 3 1 1
#14: 3 0 1
#15: 3 1 1
#16: 3 0 1
data
df1 <- structure(list(GROUP_ID = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L, 2L, 2L, 3L, 3L, 3L, 3L), VALUE = c(0L, 1L, 1L, 1L, 1L, 0L,
1L, 1L, 0L, 1L, 1L, 1L, 1L, 0L, 1L, 0L)), class = "data.frame", row.names = c(NA,
-16L))

Creating dummy variable based on group properties

My data looks something like this:
ID CSEX MID CMOB CYRB 1ST 2ND
1 1 1 1 1991 0 1
2 1 1 7 1989 1 0
3 2 2 1 1985 1 0
4 2 2 11 1985 0 1
5 1 2 9 1994 0 0
6 2 3 4 1992 1 0
7 2 4 2 1992 0 1
8 1 4 10 1983 1 0
With ID = child ID, CSEX = child sex, MID = mother ID, CMOB = month of birth, CYRB = year of birth, 1ST = first-born dummy, 2ND = second-born dummy.
And I'm trying to make a dummy variable that takes the value 1 if the first two children born into a family (i.e. with the same MID) are the same sex.
I tried
Identifiers_age <- Identifiers_age %>% group_by(MPUBID) %>%
mutate(samesex =
as.numeric(((first == 1 & CSEX == 1) & (second == 1 & CSEX == 1))
| ((first == 1 & CSEX == 2) & (second == 1 & CSEX == 2))))
But clearly this still only checks the condition for each individual ID rather than by MID, so it returns a dummy that always takes the value 0.
Thanks
Edit for expected output:
ID CSEX MID CMOB CYRB 1ST 2ND SAMESEX
1 1 1 1 1991 0 1 1
2 1 1 7 1989 1 0 1
3 2 2 1 1985 1 0 1
4 2 2 11 1985 0 1 1
5 1 2 9 1994 0 0 1
6 2 3 4 1992 1 0 0
7 2 4 2 1992 0 1 0
8 1 4 10 1983 1 0 0
i.e. for any individual that is in a family where the first two children born are of the same sex, the dummy SAMESEX = 1
Edit 2 (what I showed before was just an example I made; printing the true dataset gives):
CPUBID MPUBID CSEX CMOB CYRB first second
<int> <int> <int> <int> <int> <dbl> <dbl>
1 201 2 2 3 1993 1 0
2 202 2 2 11 1994 0 1
3 301 3 2 6 1981 1 0
4 302 3 2 10 1983 0 1
5 303 3 2 4 1986 0 0
6 401 4 1 8 1980 1 0
7 403 4 2 3 1997 0 1
8 801 8 2 3 1976 1 0
9 802 8 1 5 1979 0 1
10 803 8 2 9 1982 0 0
and str:
Classes ‘grouped_df’, ‘tbl_df’, ‘tbl’ and 'data.frame': 11512 obs. of 7 variables:
$ CPUBID : int 201 202 301 302 303 401 403 801 802 803 ...
$ MPUBID : int 2 2 3 3 3 4 4 8 8 8 ...
$ CSEX : int 2 2 2 2 2 1 2 2 1 2 ...
$ CMOB : int 3 11 6 10 4 8 3 3 5 9 ...
$ CYRB : int 1993 1994 1981 1983 1986 1980 1997 1976 1979 1982 ...
$ first : num 1 0 1 0 0 1 0 1 0 0 ...
$ second : num 0 1 0 1 0 0 1 0 1 0 ...
Maybe this helps:
library(dplyr)
Identifiers_age %>%
group_by(MID) %>%
mutate(ind1 = CSEX *`1ST`,
ind2 = CSEX *`2ND`,
SAMESEX = as.integer(n_distinct(c(ind1[ind1!=0],
ind2[ind2!=0]))==1 & sum(ind1) >0 & sum(ind2) > 0)) %>%
select(-ind1, -ind2)
# ID CSEX MID CMOB CYRB 1ST 2ND SAMESEX
# <int> <int> <int> <int> <int> <int> <int> <int>
#1 1 1 1 1 1991 0 1 1
#2 2 1 1 7 1989 1 0 1
#3 3 2 2 1 1985 1 0 1
#4 4 2 2 11 1985 0 1 1
#5 5 1 2 9 1994 0 0 1
#6 6 2 3 4 1992 1 0 0
#7 7 2 4 2 1992 0 1 0
#8 8 1 4 10 1983 1 0 0
Or it can be made slightly more compact (NA^!x is 1 when x is 1 and NA when x is 0, so CSEX * NA^!`1ST` keeps CSEX only on the first-born row):
Identifiers_age %>%
group_by(MID) %>%
mutate(SAMESEX = as.integer(n_distinct(c(CSEX * NA^!`1ST`, CSEX * NA^!`2ND`),
na.rm = TRUE)==1 & sum(`1ST`) > 0 & sum(`2ND`) > 0))
data
Identifiers_age <- structure(list(ID = 1:8, CSEX = c(1L, 1L, 2L, 2L, 1L,
2L, 2L,
1L), MID = c(1L, 1L, 2L, 2L, 2L, 3L, 4L, 4L), CMOB = c(1L, 7L,
1L, 11L, 9L, 4L, 2L, 10L), CYRB = c(1991L, 1989L, 1985L, 1985L,
1994L, 1992L, 1992L, 1983L), `1ST` = c(0L, 1L, 1L, 0L, 0L, 1L,
0L, 1L), `2ND` = c(1L, 0L, 0L, 1L, 0L, 0L, 1L, 0L)), .Names = c("ID",
"CSEX", "MID", "CMOB", "CYRB", "1ST", "2ND"), class = "data.frame",
row.names = c(NA, -8L))
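For the true dataset from Edit 2 (the OP's Identifiers_age, with columns MPUBID, CSEX, first and second), the same idea could be sketched as follows; samesex is 1 only when the family has both a first- and a second-born child and their CSEX values agree:
library(dplyr)
Identifiers_age %>%
group_by(MPUBID) %>%
mutate(samesex = as.integer(
  n_distinct(c(CSEX[first == 1], CSEX[second == 1])) == 1 &
  sum(first) > 0 & sum(second) > 0)) %>%
ungroup()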

R: adding in rows of zero based on the values in multiple columns

I am trying to append rows to an R data.frame. Here is an example of a data.frame "foo":
A B C D
1 1 1 200
1 1 2 50
1 1 3 15
1 2 1 150
1 2 4 50
1 3 1 300
2 1 2 40
2 1 4 90
2 3 2 80
For every A, there are 3 possible values of B, and for every B, there are 4 possible values of C. However, the initial df only contains rows where D is non-zero. I'd like to manipulate the df so that every combination of B and C appears, with D = 0 for any combination that was missing. I have seen questions that address this with one column, but couldn't find one addressing it with multiple columns. The final df would look like this:
A B C D
1 1 1 200
1 1 2 50
1 1 3 15
1 1 4 0
1 2 1 150
1 2 2 0
1 2 3 0
1 2 4 50
1 3 1 300
1 3 2 0
1 3 3 0
1 3 4 0
2 1 1 0
2 1 2 40
2 1 3 0
2 1 4 90
2 2 1 0
2 2 2 0
2 2 3 0
2 2 4 0
2 3 1 0
2 3 2 80
2 3 3 0
2 3 4 0
I first tried creating a dummy data frame to merge with the initial df, but something isn't working right. Here's the current code, which I know is wrong because it only generates rows based on A. I think I want to build the dummy frame based on A and B, but I don't know how; could an if/else function work here?
# create dummy df
dummy <- as.data.frame(
cbind(
sort(rep(unique(foo$A), 12)),
rep(1:3,length(unique(foo$A)))))
colnames(dummy) <- c("A","B")
foo$A <- as.numeric(foo$A)
foo$B <- as.numeric(foo$C)
# merge with foo
mergedummy <- merge(dummy,foo,all.x=T)
Any insight is greatly appreciated - thanks!
A one-liner (data.frame(table(dat[1:3])) enumerates every A/B/C combination, [-4] drops its Freq column, and all.y = TRUE keeps the full grid in the merge):
merge(dat, data.frame(table(dat[1:3]))[-4],all.y=TRUE)
# A B C D
#1 1 1 1 200
#2 1 1 2 50
#3 1 1 3 15
#4 1 1 4 NA
#...
Or maybe less complicated:
out <- data.frame(xtabs(D ~ ., data=dat))
out[do.call(order,out[1:3]),]
# A B C Freq
#1 1 1 1 200
#7 1 1 2 50
#13 1 1 3 15
#19 1 1 4 0
#...
Where dat is:
dat <- structure(list(A = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L), B = c(1L,
1L, 1L, 2L, 2L, 3L, 1L, 1L, 3L), C = c(1L, 2L, 3L, 1L, 4L, 1L,
2L, 4L, 2L), D = c(200L, 50L, 15L, 150L, 50L, 300L, 40L, 90L,
80L)), .Names = c("A", "B", "C", "D"), class = "data.frame", row.names = c(NA,
-9L))
I created a master data frame which includes all combinations of A, B, and C as described in the expected outcome. Then I merged the master data frame with your data frame and replaced NA with 0.
master <- data.frame(A = rep(1:2, each = 12),
B = rep(1:3, each = 4),
C = rep(1:4, times = 6))
library(dplyr)
master %>%
left_join(., foo) %>%
mutate(D = ifelse(D %in% NA, 0, D))
# A B C D
#1 1 1 1 200
#2 1 1 2 50
#3 1 1 3 15
#4 1 1 4 0
#5 1 2 1 150
#6 1 2 2 0
#7 1 2 3 0
#8 1 2 4 50
#9 1 3 1 300
#10 1 3 2 0
#11 1 3 3 0
#12 1 3 4 0
#13 2 1 1 0
#14 2 1 2 40
#15 2 1 3 0
#16 2 1 4 90
#17 2 2 1 0
#18 2 2 2 0
#19 2 2 3 0
#20 2 2 4 0
#21 2 3 1 0
#22 2 3 2 80
#23 2 3 3 0
#24 2 3 4 0
Here is one solution:
foo <- merge(expand.grid(lapply(foo[,1:3], unique)), foo, all=TRUE, sort=TRUE)
foo[is.na(foo)] <- 0
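The tidyr approach from the first question above also works here; a sketch, assuming B really runs over 1:3 and C over 1:4 as stated:
library(tidyr)
complete(foo, A, B = 1:3, C = 1:4, fill = list(D = 0))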
