Using complete() to get all possible combinations of two character columns

I am trying to expand the tibble below such that each combination of filename and move appears in the tibble, and those with no observations have 0 in the n column.
structure(list(filename = c("Texts/Bio.CSU.1.txt", "Texts/Bio.CSU.1.txt",
"Texts/Bio.CSU.1.txt", "Texts/Bio.CSU.1.txt", "Texts/Bio.CSU.1.txt",
"Texts/Bio.CSU.1.txt", "Texts/Bio.CSU.1.txt", "Texts/Bio.CSU.1.txt",
"Texts/Bio.CSU.1.txt", "Texts/Bio.EMich.1.txt", "Texts/Bio.EMich.1.txt",
"Texts/Bio.EMich.1.txt", "Texts/Bio.EMich.1.txt", "Texts/Bio.EMich.1.txt",
"Texts/Bio.EMich.1.txt", "Texts/Bio.EMich.1.txt", "Texts/Bio.EMich.1.txt",
"Texts/Bio.EMich.1.txt", "Texts/Bio.EMich.1.txt", "Texts/Bio.Gustavus.1.txt"
), move = c("achievement", "benefits", "competence", "gap", "goal",
"importance", "means", "previous_research", "timeline", "achievement",
"benefits", "gap", "goal", "hypothesis", "importance", "means",
"previous_research", "territory", "timeline", "achievement"),
n = c(12L, 1L, 9L, 1L, 1L, 3L, 5L, 7L, 2L, 1L, 3L, 3L, 1L,
2L, 2L, 35L, 3L, 34L, 2L, 2L)), class = c("grouped_df", "tbl_df",
"tbl", "data.frame"), row.names = c(NA, -20L), groups = structure(list(
filename = c("Texts/Bio.CSU.1.txt", "Texts/Bio.EMich.1.txt",
"Texts/Bio.Gustavus.1.txt"), .rows = structure(list(1:9,
10:19, 20L), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -3L), .drop = TRUE))
I tried the following code:
moves_expanded <- moves %>% complete(move, filename, fill = list(n = 0L))
But the output looks exactly like the original tibble, with no rows added. I'm not sure why it doesn't work.
structure(list(move = c("achievement", "benefits", "competence",
"gap", "goal", "importance", "means", "previous_research", "timeline",
"achievement", "benefits", "gap", "goal", "hypothesis", "importance",
"means", "previous_research", "territory", "timeline", "achievement"
), filename = c("Texts/Bio.CSU.1.txt", "Texts/Bio.CSU.1.txt",
"Texts/Bio.CSU.1.txt", "Texts/Bio.CSU.1.txt", "Texts/Bio.CSU.1.txt",
"Texts/Bio.CSU.1.txt", "Texts/Bio.CSU.1.txt", "Texts/Bio.CSU.1.txt",
"Texts/Bio.CSU.1.txt", "Texts/Bio.EMich.1.txt", "Texts/Bio.EMich.1.txt",
"Texts/Bio.EMich.1.txt", "Texts/Bio.EMich.1.txt", "Texts/Bio.EMich.1.txt",
"Texts/Bio.EMich.1.txt", "Texts/Bio.EMich.1.txt", "Texts/Bio.EMich.1.txt",
"Texts/Bio.EMich.1.txt", "Texts/Bio.EMich.1.txt", "Texts/Bio.Gustavus.1.txt"
), n = c(12L, 1L, 9L, 1L, 1L, 3L, 5L, 7L, 2L, 1L, 3L, 3L, 1L,
2L, 2L, 35L, 3L, 34L, 2L, 2L)), class = c("grouped_df", "tbl_df",
"tbl", "data.frame"), row.names = c(NA, -20L), groups = structure(list(
filename = c("Texts/Bio.CSU.1.txt", "Texts/Bio.EMich.1.txt",
"Texts/Bio.Gustavus.1.txt"), .rows = structure(list(1:9,
10:19, 20L), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -3L), .drop = TRUE))
I also tried the following, which gave an error:
moves %>% expand_grid(move, filename)
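A likely explanation, judging from the dput() output above: the tibble is a grouped_df grouped by filename, and complete() works within each group, so no combinations are added across files (newer tidyr versions may refuse to complete a grouping column at all). The sketch below uses a small made-up stand-in for `moves` (the file names and counts are hypothetical) and drops the grouping before calling complete():

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for the question's `moves` tibble, grouped by filename
moves <- tibble(
  filename = c("a.txt", "a.txt", "b.txt"),
  move     = c("gap", "goal", "gap"),
  n        = c(1L, 2L, 3L)
) %>%
  group_by(filename)

# Drop the grouping first so complete() sees all filename/move values at once
moves_expanded <- moves %>%
  ungroup() %>%
  complete(filename, move, fill = list(n = 0L))

moves_expanded
```

The expand_grid() attempt probably fails for a different reason: expand_grid() expects vectors as arguments rather than bare column names of a piped data frame, so `move` and `filename` are not found.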

Related

NA values in R when dividing two variables

I have the following dataset:
structure(list(decils_renda = structure(c(1L, 3L, 5L, 3L, 2L,
10L, 3L, 7L, 2L, 8L, 4L, 7L, 6L, 2L, 5L, 1L, 1L, 9L, 4L, 2L), .Label = c("1r",
"2n", "3r", "4t", "5è", "6è", "7è", "8è", "9è", "10è"), class = "factor"),
nombre_families_decils = c(2107410.879995, 1919694.803749,
1871204.79901, 1919694.803749, 2000467.089601, 1756059.188985,
1919694.803749, 1865871.935523, 2000467.089601, 1832456.399842,
1929142.572451, 1865871.935523, 1857086.601994, 2000467.089601,
1871204.79901, 2107410.879995, 2107410.879995, 1726965.615762,
1929142.572451, 2000467.089601), despesatotal = structure(c(3692812.45,
9798007.97, 11479590.32, 7022441.93, 32068770.61, 43498810.27,
14197075.72, 30361832.13, 12884341.18, 86317384.39, 17834496.58,
7124896.58, 31555170.18, 6652264.05, 5166912.67, 22087897.14,
28243177.88, 13478665.67, 7722015.78, 11334536.72), format.stata = "%12.0g"),
despesamonetaria = structure(c(1750165.37, 5424793.37, 8354996.5,
5009218.41, 20577773.88, 38507968.12, 10922966.92, 30361832.13,
7139635.72, 80050637.69, 14429261.22, 5429467.01, 25528438.99,
3315187.59, 5166912.67, 14379160.67, 20813842.46, 9559187.02,
5939555.08, 9223340.12), format.stata = "%12.0g")), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -20L), groups = structure(list(
decils_renda = structure(1:10, .Label = c("1r", "2n", "3r",
"4t", "5è", "6è", "7è", "8è", "9è", "10è"), class = "factor"),
.rows = structure(list(c(1L, 16L, 17L), c(5L, 9L, 14L, 20L
), c(2L, 4L, 7L), c(11L, 19L), c(3L, 15L), 13L, c(8L, 12L
), 10L, 18L, 6L), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -10L), .drop = TRUE))
I want to divide despesatotal and despesamonetaria by nombre_families_decils. However, when decils_renda is 1r, I only get NA values, and it shouldn't be NA.
I am using the following code:
Llar_2021_Red <- Llar_2021_Red %>%
group_by(decils_renda) %>%
mutate(despesa_total_decils=sum(despesatotal)/nombre_families_decils, na.rm=TRUE) %>%
mutate(despesa_monetaria_decils=sum(despesamonetaria)/nombre_families_decils, na.rm=TRUE)
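One thing that stands out in the code above: na.rm=TRUE is passed to mutate() instead of sum(), so it creates a column called na.rm and the sum still returns NA whenever a group contains a missing value. Whether the full data has an NA in the 1r group is an assumption, since the sample shown has none. A minimal sketch with made-up data (column names here are shortened and hypothetical):

```r
library(dplyr)

# Hypothetical data: one missing spend value in decile "1r"
df <- tibble(
  decil    = c("1r", "1r", "2n"),
  families = c(10, 10, 20),
  spend    = c(100, NA, 60)
)

res <- df %>%
  group_by(decil) %>%
  # na.rm = TRUE belongs inside sum(), not as an argument to mutate()
  mutate(spend_per_family = sum(spend, na.rm = TRUE) / families) %>%
  ungroup()

res
```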

How can I distinguish same variable with different order? (drug prescription data)

I'm trying to analyse drug prescription data with dates.
Here is the code for my data:
structure(list(id = c(4L, 4L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L,
6L), claim = c(1L, 2L, 1L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L), start_date = structure(c(12267,
12298, 12626, 12818, 12846, 12877, 12907, 12938, 13091, 13121,
13152), class = "Date"), drug = c("a", "a", "a", "b", "b", "a",
"a", "a", "a", "a", "b"), total.price = c(100L, 100L, 100L, 100L,
100L, 100L, 100L, 100L, 100L, 100L, 100L), dose = c(1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), IBD = c("CD", "CD", "CD", "CD",
"CD", "CD", "CD", "CD", "CD", "CD", "CD"), naivety = c(1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1), diff_drug = c(0, 0, 0, 0, 0, 1, 0,
0, 0, 0, 1)), class = c("grouped_df", "tbl_df", "tbl", "data.frame"
), row.names = c(NA, -11L), groups = structure(list(id = 4:6,
.rows = structure(list(1:2, 3L, 4:11), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), row.names = c(NA, -3L), class = c("tbl_df",
"tbl", "data.frame"), .drop = TRUE))
What I'm trying to calculate is the start date and end date of the drug prescriptions for each id.
First, I grouped data by "id" and "drug" variables.
If the variable 'discontinuation' ==1, the person's end date will be the discontinuation date.
If the variable 'discontinuation' == 0, the person's end date will be the last date of the prescriptions(max(start_date)).
I tried to calculate this by the code below.
bio_naive <- bio_naive %>% arrange(id,start_date) %>% mutate(Diff = lead(start_date) - (start_date))
bio_naive <- bio_naive %>% group_by(id, drug) %>% mutate(discontinuation = ifelse(Diff > 90, '1', '0'))
bio_naive$discontinuation[is.na(bio_naive $discontinuation)] <- 0
bio_naive %>%
group_by(id,drug) %>%
summarise(discont=max(discontinuation), start = min(start_date, na.rm = TRUE), dc_final_date = if_else(any(discontinuation == 1), start_date[match(1, discontinuation)], max(start_date)))
However, a problem arose with id 6, drug b.
I want to separate the dates of the two drug b prescription periods, like the results in the picture below,
but my code above combines them.
Therefore, I want to ask two questions regarding this data.
Is there any way to distinguish the two drug b prescription periods for id 6?
Is there code to calculate the total price for each id and drug group during the follow-up period (from start until dc_final_date)?
Desired Results
Thank you in advance.
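One possible approach, offered as a sketch rather than a full answer: give each run of consecutive prescriptions of the same drug its own episode number within an id, then summarise by id, drug and episode. The `episode` column and the `rx` object are names introduced here for illustration, and the 90-day discontinuation rule from the question is not applied in this sketch.

```r
library(dplyr)

# Relevant columns from the question's data
rx <- tibble(
  id = c(4L, 4L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L),
  start_date = as.Date(c(12267, 12298, 12626, 12818, 12846, 12877,
                         12907, 12938, 13091, 13121, 13152),
                       origin = "1970-01-01"),
  drug = c("a", "a", "a", "b", "b", "a", "a", "a", "a", "a", "b"),
  total.price = rep(100L, 11)
)

episodes <- rx %>%
  arrange(id, start_date) %>%
  group_by(id) %>%
  # A new episode starts whenever the drug differs from the previous row
  mutate(episode = cumsum(drug != lag(drug, default = first(drug))) + 1L) %>%
  group_by(id, drug, episode) %>%
  summarise(
    start       = min(start_date),
    last_rx     = max(start_date),
    total_price = sum(total.price),
    .groups     = "drop"
  )

episodes
```

With this grouping, id 6 gets two separate drug b rows (episodes 1 and 3) and the price is summed per episode.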

subsample random rows of tibble

Suppose I have two data objects, df.A and df.B.
df.A <- structure(list(Species = structure(c(7L, 7L, 1L, 1L, 1L, 1L,
4L, 6L, 5L, 5L), .Label = c("Carcharhinus leucas", "Carcharhinus limbatus",
"Carcharhinus perezi", "Galeocerdo cuvier", "Ginglymostoma cirratum",
"Hypanus americanus", "Negaprion brevirostris", "Sphyrna mokarran"
), class = "factor"), Sex = structure(c(1L, 1L, 1L, 2L, 1L, 1L,
1L, 1L, 1L, 2L), .Label = c("f", "m"), class = "factor")), row.names = c(NA,
10L), class = "data.frame")
> class(df.A)
[1] "data.frame"
df.B <- structure(list(Diel.phase = structure(c(2L, 2L, 1L, 2L, 1L, 2L,
2L, 1L, 1L, 1L), .Label = c("Day", "Night"), class = "factor"),
Season = structure(c(2L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 1L,
2L), .Label = c("Summer", "Winter"), class = "factor")), row.names = c(NA,
-10L), groups = structure(list(.rows = structure(list(1L, 2L,
3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), row.names = c(NA, -10L), class = c("tbl_df",
"tbl", "data.frame")), class = c("rowwise_df", "tbl_df", "tbl",
"data.frame"))
> class(df.B)
[1] "rowwise_df" "tbl_df" "tbl" "data.frame"
Let's say I want to subsample 2 rows from each object. The code below works for df.A but not for df.B. Instead, all rows for df.B are returned.
df.B %>% slice_sample(n=2)
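Can someone explain this result? And how can I apply slice_sample to an object of class(df.B) without converting it back to a data.frame first?
df.B is a rowwise_df, so each row is treated as its own group. slice_sample(n = 2) samples within each group, and since every group holds only one row, all rows are returned.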
You can do this:
df.B %>% ungroup() %>% slice_sample(n=2)

Convert list into a data frame, collapsing one column and keeping the others unaltered in R

I have a list of 12 elements, each of which is a data frame. Each data frame contains three columns: two that are common to all elements and one that differs.
The two common columns are:
coche_OEM
dia_hora_OEM
The other column, which is different in every element, can be collapsed into a single column when converting the list into a data frame. For instance, column U0073 in one of the elements contains a value with the same name as the column, and column B1182 likewise contains a value named after the column.
The issue is that I would like to convert this list into a data frame with three columns (variables):
coche_OEM
dia_hora_OEM
DTC: this column with all the values present in each column with their codes.
The list is this one:
listdf <- list(structure(list(B1182 = structure(1L, .Label = c("B1182",
"NULL"), class = "factor"), coche_OEM = structure(3L, .Label = c("356232050832996",
"356232050836666", "356232050880755", "356232050882736", "356232050899078",
"356232050905933"), class = "factor"), dia_hora_OEM = structure(1577774413, class = c("POSIXct",
"POSIXt"), tzone = "UTC")), row.names = c(NA, -1L), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"), groups = structure(list(B1182 = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("B1182",
"NULL"), class = "factor"), coche_OEM = structure(c(1L, 2L, 3L,
4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L), .Label = c("356232050832996",
"356232050836666", "356232050880755", "356232050882736", "356232050899078",
"356232050905933"), class = "factor"), .rows = list(integer(0),
integer(0), 1L, integer(0), integer(0), integer(0), integer(0),
integer(0), integer(0), integer(0), integer(0), integer(0))), .Names = c("B1182",
"coche_OEM", ".rows"), row.names = c(NA, -12L), class = c("tbl_df",
"tbl", "data.frame"), .drop = FALSE), .Names = c("B1182", "coche_OEM",
"dia_hora_OEM")), structure(list(B124D = structure(1L, .Label = c("B124D",
"NULL"), class = "factor"), coche_OEM = structure(3L, .Label = c("356232050832996",
"356232050836666", "356232050880755", "356232050882736", "356232050899078",
"356232050905933"), class = "factor"), dia_hora_OEM = structure(1577774413, class = c("POSIXct",
"POSIXt"), tzone = "UTC")), row.names = c(NA, -1L), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"), groups = structure(list(B124D = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("B124D",
"NULL"), class = "factor"), coche_OEM = structure(c(1L, 2L, 3L,
4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L), .Label = c("356232050832996",
"356232050836666", "356232050880755", "356232050882736", "356232050899078",
"356232050905933"), class = "factor"), .rows = list(integer(0),
integer(0), 1L, integer(0), integer(0), integer(0), integer(0),
integer(0), integer(0), integer(0), integer(0), integer(0))), .Names = c("B124D",
"coche_OEM", ".rows"), row.names = c(NA, -12L), class = c("tbl_df",
"tbl", "data.frame"), .drop = FALSE), .Names = c("B124D", "coche_OEM",
"dia_hora_OEM")), structure(list(P2000 = structure(1L, .Label = c("c(\"P2000\", \"P2000\", \"P2000\")",
"NULL"), class = "factor"), coche_OEM = structure(5L, .Label = c("356232050832996",
"356232050836666", "356232050880755", "356232050882736", "356232050899078",
"356232050905933"), class = "factor"), dia_hora_OEM = structure(1577793330, class = c("POSIXct",
"POSIXt"), tzone = "UTC")), row.names = c(NA, -1L), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"), groups = structure(list(P2000 = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("c(\"P2000\", \"P2000\", \"P2000\")",
"NULL"), class = "factor"), coche_OEM = structure(c(1L, 2L, 3L,
4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L), .Label = c("356232050832996",
"356232050836666", "356232050880755", "356232050882736", "356232050899078",
"356232050905933"), class = "factor"), .rows = list(integer(0),
integer(0), integer(0), integer(0), 1L, integer(0), integer(0),
integer(0), integer(0), integer(0), integer(0), integer(0))), .Names = c("P2000",
"coche_OEM", ".rows"), row.names = c(NA, -12L), class = c("tbl_df",
"tbl", "data.frame"), .drop = FALSE), .Names = c("P2000", "coche_OEM",
"dia_hora_OEM")), structure(list(U3003 = structure(c(2L, 2L), .Label = c("NULL",
"U3003"), class = "factor"), coche_OEM = structure(c(5L, 1L), .Label = c("356232050832996",
"356232050836666", "356232050880755", "356232050882736", "356232050899078",
"356232050905933"), class = "factor"), dia_hora_OEM = structure(c(1577793330,
1582648789), class = c("POSIXct", "POSIXt"), tzone = "UTC")), row.names = c(NA,
-2L), class = c("grouped_df", "tbl_df", "tbl", "data.frame"), groups = structure(list(
U3003 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
2L, 2L), .Label = c("NULL", "U3003"), class = "factor"),
coche_OEM = structure(c(1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L,
4L, 5L, 6L), .Label = c("356232050832996", "356232050836666",
"356232050880755", "356232050882736", "356232050899078",
"356232050905933"), class = "factor"), .rows = list(integer(0),
integer(0), integer(0), integer(0), integer(0), integer(0),
2L, integer(0), integer(0), integer(0), 1L, integer(0))), .Names = c("U3003",
"coche_OEM", ".rows"), row.names = c(NA, -12L), class = c("tbl_df",
"tbl", "data.frame"), .drop = FALSE), .Names = c("U3003", "coche_OEM",
"dia_hora_OEM")), structure(list(B1D01 = structure(c(1L, 1L,
2L), .Label = c("B1D01", "c(\"B1D01\", \"B1D01\")", "NULL"), class = "factor"),
coche_OEM = structure(c(2L, 1L, 1L), .Label = c("356232050832996",
"356232050836666", "356232050880755", "356232050882736",
"356232050899078", "356232050905933"), class = "factor"),
dia_hora_OEM = structure(c(1581690876, 1582648789, 1582651926
), class = c("POSIXct", "POSIXt"), tzone = "UTC")), row.names = c(NA,
-3L), class = c("grouped_df", "tbl_df", "tbl", "data.frame"), groups = structure(list(
B1D01 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("B1D01", "c(\"B1D01\", \"B1D01\")",
"NULL"), class = "factor"), coche_OEM = structure(c(1L, 2L,
3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L,
6L), .Label = c("356232050832996", "356232050836666", "356232050880755",
"356232050882736", "356232050899078", "356232050905933"), class = "factor"),
.rows = list(2L, 1L, integer(0), integer(0), integer(0),
integer(0), 3L, integer(0), integer(0), integer(0), integer(0),
integer(0), integer(0), integer(0), integer(0), integer(0),
integer(0), integer(0))), .Names = c("B1D01", "coche_OEM",
".rows"), row.names = c(NA, -18L), class = c("tbl_df", "tbl",
"data.frame"), .drop = FALSE), .Names = c("B1D01", "coche_OEM",
"dia_hora_OEM")), structure(list(U0155 = structure(2L, .Label = c("NULL",
"U0155"), class = "factor"), coche_OEM = structure(1L, .Label = c("356232050832996",
"356232050836666", "356232050880755", "356232050882736", "356232050899078",
"356232050905933"), class = "factor"), dia_hora_OEM = structure(1582648789, class = c("POSIXct",
"POSIXt"), tzone = "UTC")), row.names = c(NA, -1L), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"), groups = structure(list(U0155 = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("NULL",
"U0155"), class = "factor"), coche_OEM = structure(c(1L, 2L,
3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L), .Label = c("356232050832996",
"356232050836666", "356232050880755", "356232050882736", "356232050899078",
"356232050905933"), class = "factor"), .rows = list(integer(0),
integer(0), integer(0), integer(0), integer(0), integer(0),
1L, integer(0), integer(0), integer(0), integer(0), integer(0))), .Names = c("U0155",
"coche_OEM", ".rows"), row.names = c(NA, -12L), class = c("tbl_df",
"tbl", "data.frame"), .drop = FALSE), .Names = c("U0155", "coche_OEM",
"dia_hora_OEM")), structure(list(C1B00 = structure(1L, .Label = c("C1B00",
"NULL"), class = "factor"), coche_OEM = structure(1L, .Label = c("356232050832996",
"356232050836666", "356232050880755", "356232050882736", "356232050899078",
"356232050905933"), class = "factor"), dia_hora_OEM = structure(1582648789, class = c("POSIXct",
"POSIXt"), tzone = "UTC")), row.names = c(NA, -1L), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"), groups = structure(list(C1B00 = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("C1B00",
"NULL"), class = "factor"), coche_OEM = structure(c(1L, 2L, 3L,
4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L), .Label = c("356232050832996",
"356232050836666", "356232050880755", "356232050882736", "356232050899078",
"356232050905933"), class = "factor"), .rows = list(1L, integer(0),
integer(0), integer(0), integer(0), integer(0), integer(0),
integer(0), integer(0), integer(0), integer(0), integer(0))), .Names = c("C1B00",
"coche_OEM", ".rows"), row.names = c(NA, -12L), class = c("tbl_df",
"tbl", "data.frame"), .drop = FALSE), .Names = c("C1B00", "coche_OEM",
"dia_hora_OEM")), structure(list(P037D = structure(2L, .Label = c("NULL",
"P037D"), class = "factor"), coche_OEM = structure(1L, .Label = c("356232050832996",
"356232050836666", "356232050880755", "356232050882736", "356232050899078",
"356232050905933"), class = "factor"), dia_hora_OEM = structure(1582648789, class = c("POSIXct",
"POSIXt"), tzone = "UTC")), row.names = c(NA, -1L), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"), groups = structure(list(P037D = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("NULL",
"P037D"), class = "factor"), coche_OEM = structure(c(1L, 2L,
3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L), .Label = c("356232050832996",
"356232050836666", "356232050880755", "356232050882736", "356232050899078",
"356232050905933"), class = "factor"), .rows = list(integer(0),
integer(0), integer(0), integer(0), integer(0), integer(0),
1L, integer(0), integer(0), integer(0), integer(0), integer(0))), .Names = c("P037D",
"coche_OEM", ".rows"), row.names = c(NA, -12L), class = c("tbl_df",
"tbl", "data.frame"), .drop = FALSE), .Names = c("P037D", "coche_OEM",
"dia_hora_OEM")), structure(list(P0616 = structure(2L, .Label = c("NULL",
"P0616"), class = "factor"), coche_OEM = structure(1L, .Label = c("356232050832996",
"356232050836666", "356232050880755", "356232050882736", "356232050899078",
"356232050905933"), class = "factor"), dia_hora_OEM = structure(1582648789, class = c("POSIXct",
"POSIXt"), tzone = "UTC")), row.names = c(NA, -1L), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"), groups = structure(list(P0616 = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("NULL",
"P0616"), class = "factor"), coche_OEM = structure(c(1L, 2L,
3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L), .Label = c("356232050832996",
"356232050836666", "356232050880755", "356232050882736", "356232050899078",
"356232050905933"), class = "factor"), .rows = list(integer(0),
integer(0), integer(0), integer(0), integer(0), integer(0),
1L, integer(0), integer(0), integer(0), integer(0), integer(0))), .Names = c("P0616",
"coche_OEM", ".rows"), row.names = c(NA, -12L), class = c("tbl_df",
"tbl", "data.frame"), .drop = FALSE), .Names = c("P0616", "coche_OEM",
"dia_hora_OEM")), structure(list(P0562 = structure(2L, .Label = c("NULL",
"P0562"), class = "factor"), coche_OEM = structure(1L, .Label = c("356232050832996",
"356232050836666", "356232050880755", "356232050882736", "356232050899078",
"356232050905933"), class = "factor"), dia_hora_OEM = structure(1582648789, class = c("POSIXct",
"POSIXt"), tzone = "UTC")), row.names = c(NA, -1L), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"), groups = structure(list(P0562 = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("NULL",
"P0562"), class = "factor"), coche_OEM = structure(c(1L, 2L,
3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L), .Label = c("356232050832996",
"356232050836666", "356232050880755", "356232050882736", "356232050899078",
"356232050905933"), class = "factor"), .rows = list(integer(0),
integer(0), integer(0), integer(0), integer(0), integer(0),
1L, integer(0), integer(0), integer(0), integer(0), integer(0))), .Names = c("P0562",
"coche_OEM", ".rows"), row.names = c(NA, -12L), class = c("tbl_df",
"tbl", "data.frame"), .drop = FALSE), .Names = c("P0562", "coche_OEM",
"dia_hora_OEM")), structure(list(U0073 = structure(2L, .Label = c("NULL",
"U0073"), class = "factor"), coche_OEM = structure(1L, .Label = c("356232050832996",
"356232050836666", "356232050880755", "356232050882736", "356232050899078",
"356232050905933"), class = "factor"), dia_hora_OEM = structure(1582648789, class = c("POSIXct",
"POSIXt"), tzone = "UTC")), row.names = c(NA, -1L), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"), groups = structure(list(U0073 = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("NULL",
"U0073"), class = "factor"), coche_OEM = structure(c(1L, 2L,
3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L), .Label = c("356232050832996",
"356232050836666", "356232050880755", "356232050882736", "356232050899078",
"356232050905933"), class = "factor"), .rows = list(integer(0),
integer(0), integer(0), integer(0), integer(0), integer(0),
1L, integer(0), integer(0), integer(0), integer(0), integer(0))), .Names = c("U0073",
"coche_OEM", ".rows"), row.names = c(NA, -12L), class = c("tbl_df",
"tbl", "data.frame"), .drop = FALSE), .Names = c("U0073", "coche_OEM",
"dia_hora_OEM")), structure(list(P0138 = structure(1L, .Label = c("c(\"P0138\", \"P0138\", \"P0138\")",
"NULL"), class = "factor"), coche_OEM = structure(5L, .Label = c("356232050832996",
"356232050836666", "356232050880755", "356232050882736", "356232050899078",
"356232050905933"), class = "factor"), dia_hora_OEM = structure(1583391111, class = c("POSIXct",
"POSIXt"), tzone = "UTC")), row.names = c(NA, -1L), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"), groups = structure(list(P0138 = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("c(\"P0138\", \"P0138\", \"P0138\")",
"NULL"), class = "factor"), coche_OEM = structure(c(1L, 2L, 3L,
4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L), .Label = c("356232050832996",
"356232050836666", "356232050880755", "356232050882736", "356232050899078",
"356232050905933"), class = "factor"), .rows = list(integer(0),
integer(0), integer(0), integer(0), 1L, integer(0), integer(0),
integer(0), integer(0), integer(0), integer(0), integer(0))), .Names = c("P0138",
"coche_OEM", ".rows"), row.names = c(NA, -12L), class = c("tbl_df",
"tbl", "data.frame"), .drop = FALSE), .Names = c("P0138", "coche_OEM",
"dia_hora_OEM")))
So, how could I convert this list into a data frame with my requirements?
We can rename all the columns that are not 'coche_OEM' or 'dia_hora_OEM' to a predefined string ('id' here):
map_df(listdf, ~rename_at(.x, vars(-c('coche_OEM', 'dia_hora_OEM')), ~'id'))
# A tibble: 15 x 3
# Groups:   id, coche_OEM [78]
   id                                   coche_OEM       dia_hora_OEM
   <chr>                                <fct>           <dttm>
 1 "B1182"                              356232050880755 2019-12-31 06:40:13
 2 "B124D"                              356232050880755 2019-12-31 06:40:13
 3 "c(\"P2000\", \"P2000\", \"P2000\")" 356232050899078 2019-12-31 11:55:30
 4 "U3003"                              356232050899078 2019-12-31 11:55:30
 5 "U3003"                              356232050832996 2020-02-25 16:39:49
 6 "B1D01"                              356232050836666 2020-02-14 14:34:36
 7 "B1D01"                              356232050832996 2020-02-25 16:39:49
 8 "c(\"B1D01\", \"B1D01\")"            356232050832996 2020-02-25 17:32:06
 9 "U0155"                              356232050832996 2020-02-25 16:39:49
10 "C1B00"                              356232050832996 2020-02-25 16:39:49
11 "P037D"                              356232050832996 2020-02-25 16:39:49
12 "P0616"                              356232050832996 2020-02-25 16:39:49
13 "P0562"                              356232050832996 2020-02-25 16:39:49
14 "U0073"                              356232050832996 2020-02-25 16:39:49
15 "c(\"P0138\", \"P0138\", \"P0138\")" 356232050899078 2020-03-05 06:51:51
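The same idea can also be sketched with rename_with() and bind_rows(), dropping the leftover grouping along the way and naming the collapsed column DTC as the question asks. The two-element `listdf` below is a simplified, hypothetical stand-in for the real list (plain character columns instead of grouped factors), and `keep` and `dtc` are names introduced for illustration:

```r
library(dplyr)

# Simplified stand-in for the question's list of data frames
listdf <- list(
  tibble(
    B1182        = "B1182",
    coche_OEM    = "356232050880755",
    dia_hora_OEM = as.POSIXct("2019-12-31 06:40:13", tz = "UTC")
  ),
  tibble(
    U3003        = c("U3003", "U3003"),
    coche_OEM    = c("356232050899078", "356232050832996"),
    dia_hora_OEM = as.POSIXct(c("2019-12-31 11:55:30", "2020-02-25 16:39:49"),
                              tz = "UTC")
  )
)

keep <- c("coche_OEM", "dia_hora_OEM")

dtc <- lapply(listdf, function(df) {
  df %>%
    ungroup() %>%
    # Rename whichever column is not one of the two common ones
    rename_with(function(nm) "DTC", .cols = !all_of(keep))
}) %>%
  bind_rows() %>%
  select(all_of(keep), DTC)

dtc
```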

Cartesian Rolling Join using Data.table

I have two tables:
dat: contains the data
dates: contains the table of dates
library(data.table)
dates = structure(list(date = structure(c(17562, 17590, 17621, 17651,
17682, 17712, 17743, 17774, 17804, 17835, 17865, 17896), class = "Date")),
row.names = c(NA, -12L), class = "data.frame")
dat = structure(list(date = structure(c(17546, 17743, 17778, 17901,
17536, 17806, 17901, 17981, 17532, 17722, 17969, 18234), class = "Date"),
country = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L,
3L, 3L, 3L), .Label = c("AAA", "BBB", "CCC"), class = "factor"),
state = structure(c(1L, 1L, 2L, 3L, 4L, 1L, 2L, 5L, 6L, 1L,
2L, 2L), .Label = c("S1", "S2", "S3", "S4", "S5", "S6"), class = "factor"),
item = structure(c(1L, 2L, 4L, 6L, 3L, 5L, 3L, 2L, 2L, 4L,
5L, 7L), .Label = c("M1", "M2", "M3", "M4", "M5", "M6", "M7"
), class = "factor"), value = c(67L, 10L, 50L, 52L, 93L,
50L, 62L, 46L, 6L, 30L, 30L, 14L)), row.names = c(NA, -12L
), class = "data.frame")
dates = data.table(dates)
dat = data.table(dat)
setkey(dates, date)
setkey(dat, date)
The result I'm after is below, i.e. doing a rolling join with each individual row of dat and then combining the results.
rbind(
dat[1,][dates, roll = 90],
dat[2,][dates, roll = 90],
dat[3,][dates, roll = 90],
...
dat[12,][dates, roll = 90]
)
My actual dataset is much larger, so it's not practical to list every row of dat. Is there a shorthand way of doing the same thing without a loop?
If I understand your intent correctly, you want each record to roll forward for up to 90 days.
I used a cross join and then subset the result using the rolling criteria.
Your original tables:
library(data.table)
dates = structure(list(date = structure(c(17562, 17590, 17621, 17651,
17682, 17712, 17743, 17774, 17804, 17835, 17865, 17896), class = "Date")),
row.names = c(NA, -12L), class = "data.frame")
dat = structure(list(date = structure(c(17546, 17743, 17778, 17901,
17536, 17806, 17901, 17981, 17532, 17722, 17969, 18234), class = "Date"),
country = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L,
3L, 3L, 3L), .Label = c("AAA", "BBB", "CCC"), class = "factor"),
state = structure(c(1L, 1L, 2L, 3L, 4L, 1L, 2L, 5L, 6L, 1L,
2L, 2L), .Label = c("S1", "S2", "S3", "S4", "S5", "S6"), class = "factor"),
item = structure(c(1L, 2L, 4L, 6L, 3L, 5L, 3L, 2L, 2L, 4L,
5L, 7L), .Label = c("M1", "M2", "M3", "M4", "M5", "M6", "M7"
), class = "factor"), value = c(67L, 10L, 50L, 52L, 93L,
50L, 62L, 46L, 6L, 30L, 30L, 14L)), row.names = c(NA, -12L
), class = "data.frame")
dates = data.table(dates)
dat = data.table(dat)
Note that I haven't called setkey.
I am using a cross join function from the reference: How to do cross join in R?
CJ.table.1 <- function(X,Y)
setkey(X[,c(k=1,.SD)],k)[Y[,c(k=1,.SD)],allow.cartesian=TRUE][,k:=NULL]
Then I cross join, subset for the roll join, rename columns and sort
dsn1 <- CJ.table.1(dat, dates)[
  i.date - date <= 90 & i.date - date >= 0
][
  , .(date = i.date, country, state, item, value)
][
  order(country, state, item, value, date)
]
This is not necessarily the best way to do it, but you could simply write a loop here to iterate through your data:
df <- data.frame()
for (i in 1:nrow(dat)){
df <- rbind(df, dat[i,][dates, roll = 90])
}
head(df)
date country state item value
1: 2018-01-31 CCC S6 M2 6
2: 2018-02-28 CCC S6 M2 6
3: 2018-03-31 CCC S6 M2 6
4: 2018-04-30 <NA> <NA> <NA> NA
5: 2018-05-31 <NA> <NA> <NA> NA
Edit: just saw you said "without a loop", it's been a long day. This is one way to solve the problem though.
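If an explicit for loop is the main concern, the same row-by-row rolling join can be written with lapply() and rbindlist(), which also avoids growing a data frame with rbind() on every iteration. This is still an implicit loop over the rows of dat, and the tiny tables below are made-up stand-ins for the question's data:

```r
library(data.table)

# Hypothetical miniature versions of `dates` and `dat`
dates <- data.table(date = as.Date(c("2018-01-31", "2018-02-28", "2018-06-30")))
dat   <- data.table(date  = as.Date(c("2018-01-15", "2018-06-01")),
                    value = c(67L, 10L))

# Rolling join of each single row of dat against dates, then stack the results
res <- rbindlist(lapply(seq_len(nrow(dat)), function(i) {
  dat[i][dates, on = "date", roll = 90]
}))

res
```

Each row of dat contributes one result row per entry in dates, with NA where no record falls within the preceding 90 days.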
