Explicit factor NAs in a data frame

Explicit factor NAs in a data frame - r

I have the following data frame with ages binned in ranges of 5 years and the frequency of a condition happening in males/females. The problem is that there were no occurrences in either gender for example in the range 15-20.
structure(list(age = structure(c(1L, 2L, 3L, 5L, 6L, 7L, 8L,
9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L), .Label = c("[0,5]",
"(5,10]", "(10,15]", "(15,20]", "(20,25]", "(25,30]", "(30,35]",
"(35,40]", "(40,45]", "(45,50]", "(50,55]", "(55,60]", "(60,65]",
"(65,70]", "(70,75]", "(75,80]", "(80,85]", "(85,90]", "(90,95]",
"(95,100]"), class = "factor"), male = c(2L, 1L, 1L, 4L, 8L,
9L, 20L, 33L, 49L, 104L, 112L, 176L, 159L, 140L, 94L, 72L, 32L,
6L, 2L), female = c(1L, 1L, NA, 7L, 7L, 4L, 23L, 39L, 44L, 74L,
94L, 111L, 124L, 129L, 110L, 92L, 76L, 30L, 7L)), row.names = c(NA,
-19L), groups = structure(list(age = structure(c(1L, 2L, 3L,
5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L,
19L, 20L), .Label = c("[0,5]", "(5,10]", "(10,15]", "(15,20]",
"(20,25]", "(25,30]", "(30,35]", "(35,40]", "(40,45]", "(45,50]",
"(50,55]", "(55,60]", "(60,65]", "(65,70]", "(70,75]", "(75,80]",
"(80,85]", "(85,90]", "(90,95]", "(95,100]"), class = "factor"),
.rows = structure(list(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L,
10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), row.names = c(NA, -19L), class = c("tbl_df",
"tbl", "data.frame"), .drop = TRUE), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"))
If I check the levels it properly shows all levels.
what I would want is a data frame where all ranges of ages show up and when they don't exist substitute then by 0.

You can use complete :
library(dplyr)
library(tidyr)
df %>%
ungroup %>%
complete(age, fill = list(male = 0, female = 0))
# age male female
# <fct> <dbl> <dbl>
# 1 [0,5] 2 1
# 2 (5,10] 1 1
# 3 (10,15] 1 0
# 4 (15,20] 0 0
# 5 (20,25] 4 7
# 6 (25,30] 8 7
# 7 (30,35] 9 4
# 8 (35,40] 20 23
# 9 (40,45] 33 39
#10 (45,50] 49 44
#11 (50,55] 104 74
#12 (55,60] 112 94
#13 (60,65] 176 111
#14 (65,70] 159 124
#15 (70,75] 140 129
#16 (75,80] 94 110
#17 (80,85] 72 92
#18 (85,90] 32 76
#19 (90,95] 6 30
#20 (95,100] 2 7

Related

Finding the 3 coldest consecutive months

I am trying to write a code that finds the 3 consecutives months that are the coldest.
For now I have written a code for the 3 first months (1,2,3) but then it should be applied to (4,5,6), (7,8,9), (10,11,12), (2,3,4), (5,6,7), (8,9,10), (11,12,1), (3,4,5), (6,7,8), (9,10,11) and (12,1,2) which are all the possible combinations of 3 consecutives months.
The code I wrote is here :
cold <- data_example %>%
group_by(Site) %>%
filter(Month %in% c(1,2,3)) %>%
mutate(mean_temperature = mean(t_q)) %>%
dplyr::select(-c(t_q,Month)) %>%
distinct(Site, mean_temperature)
average_temp_month_1_2_3 <- cold$mean_temperature
Then I replaced the c(1,2,3) by all possiblities, I have created a new column for each output.
I end up with a dataset with row corresponding to Site and columns are all the possibilities of 3 consecutive months.
After I took the min value for each row using the function apply() and min() and it gives me the coldest quarter for each Site.
I am looking for a way to generalize it, like creating a loop on the possiblities.
The structure of data_example is as follow :
structure(list(Site = c(4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 13L,
13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 14L, 14L,
14L, 14L, 14L, 14L, 14L, 14L, 14L, 14L, 14L, 14L, 15L, 15L, 15L,
15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 16L, 16L, 16L, 16L,
16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 17L, 17L, 17L, 17L, 17L,
17L, 17L, 17L, 17L, 17L, 17L, 17L, 18L, 18L, 18L, 18L, 18L, 18L,
18L, 18L, 18L, 18L, 18L, 18L, 25L, 25L, 25L, 25L, 25L, 25L, 25L,
25L, 25L, 25L, 25L, 25L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L,
26L, 26L, 26L, 26L), Month = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L,
9L, 10L, 11L, 12L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L,
12L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 1L, 2L,
3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 1L, 2L, 3L, 4L, 5L,
6L, 7L, 8L, 9L, 10L, 11L, 12L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L,
9L, 10L, 11L, 12L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L,
12L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 1L, 2L,
3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 1L, 2L, 3L, 4L, 5L,
6L, 7L, 8L, 9L, 10L, 11L, 12L), t_q = c(9.67754848470332, -6.74555496540183,
5.67969761460384, 12.537207581471, -9.4899105618945, 21.0747672424502,
15.2643039243614, -3.62839910494421, 11.3919028351455, 1.69988257436554,
4.22015024307287, 11.7045830784212, 8.91437673833493, 0.579081429509138,
-10.8207481229903, 7.05356868592628, 13.0911580912516, 17.2032089167605,
-2.47642708849114, -11.2105599344486, 33.986736305027, 17.8578689773214,
-14.9114468266335, 14.4681380389141, 0.568074240873411, 7.65458408777801,
1.91368344556659, 6.01571556896127, 11.4858297513536, 2.2608458985328,
-2.08200762781776, 12.1540989284163, 20.9941815285413, 0.375777604316208,
-2.7137027317614, -6.17690210400591, 11.2549857164403, 17.447156776654,
-6.96565197389579, -5.41542361226991, 11.1680111873065, 16.2266522778922,
-11.4503938582433, 5.93300314835716, -18.2818398656237, 16.2930210946949,
9.80219192652316, -0.48237356523527, 7.72680942503686, 5.84113084181759,
9.66129413490096, -4.53018262186904, 7.42187509892118, 9.2559478576895,
8.25120948667013, 8.18182063263247, 16.3703081943971, 19.5469951420341,
3.71888263185773, -0.150179891749435, 1.32057298670562, -5.63556532224354,
21.3918542474341, 4.58752188336035, 5.49430262894033, 5.99587512047837,
-3.76459024109216, -8.53522098071824, 8.01805680562232, 26.2227490426066,
8.90822434139878, 5.04259034084471, 6.89740304247746, 11.9484584922927,
-11.5085102739471, 30.4526759119379, 21.878533782357, -5.39936677076962,
-9.83965056853816, 19.3083455159472, 7.90653548036154, 3.11876660277767,
-8.85027083180008, -9.9225496831988, 5.97307112581907, -2.83528336599284,
-2.75758002814396, 4.68388181004449, 6.61649031537118, -6.65988084338133,
-0.981075313384259, 5.84898952305179, -5.20962191660178, 0.416662319713158,
-10.5336993269853, 19.5350642296553, 26.9696625385792, 15.3291059661081,
15.0799591208354, 13.2310653499033, 7.2053382722482, -7.87288386491102,
20.8083797469715, 6.16664220270041, 8.3360949793043, -14.4000921795463,
-10.5503025782944, 14.3185205291177, 5.83802399796341, 2.49660818997943,
15.7399297014092, -0.834086173817971, 12.4883230222372, 6.73548467376379,
7.7988835803825, -5.13583355913738, 7.51054162811707, 11.6610602814336,
-11.8864185954223, 4.2704440943851)), row.names = c(NA, -120L
), groups = structure(list(Site = c(4L, 5L, 13L, 14L, 15L, 16L,
17L, 18L, 25L, 26L), .rows = structure(list(1:12, 13:24, 25:36,
37:48, 49:60, 61:72, 73:84, 85:96, 97:108, 109:120), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -10L), .drop = TRUE), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"))

You can use raster::movingFun to do a moving average with circular data, then use slice_min to get the minimum value per group.
library(dplyr)
circ <- function(x, by) ifelse(x%%by == 0, by, x%%by)
df %>%
group_by(Site) %>%
mutate(rolmean = raster::movingFun(t_q, n = 3, fun = mean, circular = TRUE)) %>%
slice_min(rolmean) %>%
mutate(coldest = toString(circ(c(Month-1, Month, Month+1), 12)))
output
# A tibble: 10 × 5
# Groups: Site [10]
Site Month t_q rolmean coldest
<int> <int> <dbl> <dbl> <chr>
1 4 2 -6.75 2.87 1, 2, 3
2 5 3 -10.8 -1.06 2, 3, 4
3 13 11 -2.71 -2.84 10, 11, 12
4 14 8 5.93 -7.93 7, 8, 9
5 15 3 9.66 3.66 2, 3, 4
6 16 7 -3.76 -2.10 6, 7, 8
7 17 11 -8.85 -5.22 10, 11, 12
8 18 10 0.417 -5.11 9, 10, 11
9 25 10 -14.4 -5.54 9, 10, 11
10 26 12 4.27 -0.593 11, 12, 1

Using which.min in aggregate on a moving average window.
aggregate(t_q ~ Site, dat, \(s) {
win <- 3 ## window length
sq <- Map(seq, 1:(length(s) - win + 1), win:length(s))
toString(sq[[which.min(sapply(sq, \(sq) mean(s[sq])))]])
})
# Site t_q
# 1 4 1, 2, 3
# 2 5 2, 3, 4
# 3 13 10, 11, 12
# 4 14 7, 8, 9
# 5 15 2, 3, 4
# 6 16 6, 7, 8
# 7 17 10, 11, 12
# 8 18 9, 10, 11
# 9 25 9, 10, 11
# 10 26 10, 11, 12

How to create a cross tabulation table between two variables with the counts in R?

I have a data frame with two columns that I want to cross tabulate. The data also includes the counts for the combination. I am trying to create the cross table and include those counts within the table. I am struggling to use the counts from the dataframe into the cross table.
> df %>% arrange(d1)%>% head()
count d1 d2
1 3 1 15
2 86 1 14
3 13 1 12
4 186 1 16
5 29 1 9
6 86 1 13
> table(df$d1,df$d2)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
2 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
3 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Expecting [1,15] and [1,14] to show 3, 86 based on the counts in df table.
Right now it shows 0s and 1s only based on if the combinations exists.
Here is my sample data:
structure(list(count = c(37L, 6L, 44L, 21L, 8L, 3L, 9L, 17L,
13L, 32L, 106L, 34L, 505L, 173L, 12L, 2L, 4L, 45L, 3L, 43L, 5L,
16L, 1L, 27L, 17L, 3L, 4L, 1L, 27L, 86L, 79L, 10L, 161L, 32L,
3L, 209L, 9L, 83L, 23L, 108L, 161L, 22L, 4L, 16L, 2L, 6L, 67L,
86L, 3L, 1L, 14L, 14L, 111L, 5L, 5L, 44L, 105L, 13L, 269L, 186L,
3L, 5L, 5L, 27L, 3L, 186L, 58L, 29L, 34L, 43L, 8L, 92L, 9L, 455L,
22L, 32L, 4L, 14L, 58L, 22L, 190L, 94L, 27L, 152L, 264L, 36L,
1L, 505L, 86L, 44L, 3L, 1L, 79L, 75L, 12L, 32L, 11L, 197L, 90L,
269L, 9L, 6L, 47L, 14L, 158L, 303L, 335L, 37L, 33L, 3L, 83L,
15L, 31L, 124L, 146L, 26L, 36L, 27L, 37L, 31L, 108L, 121L, 111L,
11L, 5L, 26L, 166L, 11L, 18L, 11L, 8L, 15L, 18L, 165L, 80L, 14L,
5L, 3L, 492L, 7L, 90L, 146L, 130L, 197L, 165L, 34L, 22L, 122L,
29L, 74L, 455L, 303L, 45L, 5L, 173L, 33L, 24L, 229L, 79L, 43L,
68L, 16L, 10L, 73L, 35L, 99L, 229L, 94L, 23L, 492L, 18L, 84L,
92L, 86L, 35L, 31L, 1L, 23L, 8L, 121L, 1L, 173L, 400L, 124L,
20L, 11L, 6L, 3L, 166L, 84L, 31L, 122L, 15L, 24L, 70L, 43L, 74L,
209L, 45L, 158L, 44L, 15L, 37L, 35L, 27L, 68L, 20L, 15L, 11L,
21L, 4L, 18L, 44L, 234L, 80L, 10L, 44L, 4L, 47L, 7L, 67L, 10L,
3L, 173L, 99L, 79L, 130L, 3L, 75L, 1L, 335L, 14L, 106L, 15L,
34L, 190L, 152L, 16L, 73L, 45L, 1L, 3L, 264L, 160L, 23L, 1L,
160L, 400L, 105L, 234L, 70L, 35L), d1 = c(10L, 17L, 5L, 3L, 12L,
1L, 10L, 10L, 12L, 7L, 14L, 6L, 16L, 3L, 7L, 9L, 7L, 13L, 4L,
8L, 9L, 2L, 7L, 16L, 8L, 15L, 12L, 12L, 2L, 1L, 16L, 15L, 14L,
5L, 8L, 14L, 11L, 11L, 4L, 4L, 13L, 7L, 12L, 11L, 17L, 8L, 4L,
13L, 15L, 15L, 12L, 13L, 4L, 5L, 5L, 5L, 2L, 1L, 2L, 1L, 2L,
13L, 12L, 5L, 3L, 16L, 10L, 1L, 14L, 2L, 7L, 9L, 15L, 16L, 3L,
11L, 8L, 12L, 9L, 9L, 14L, 11L, 8L, 11L, 16L, 10L, 17L, 6L, 1L,
3L, 5L, 1L, 3L, 11L, 10L, 14L, 5L, 3L, 6L, 16L, 15L, 15L, 4L,
14L, 14L, 16L, 16L, 8L, 3L, 7L, 1L, 15L, 6L, 11L, 6L, 5L, 1L,
15L, 2L, 7L, 14L, 2L, 13L, 10L, 6L, 1L, 3L, 15L, 2L, 3L, 9L,
7L, 11L, 3L, 10L, 16L, 17L, 7L, 3L, 15L, 1L, 2L, 10L, 13L, 4L,
5L, 8L, 4L, 9L, 16L, 13L, 4L, 10L, 17L, 6L, 8L, 7L, 11L, 8L,
9L, 16L, 7L, 14L, 9L, 4L, 3L, 13L, 4L, 8L, 16L, 8L, 6L, 14L,
14L, 9L, 13L, 17L, 12L, 10L, 1L, 17L, 11L, 16L, 2L, 1L, 7L, 14L,
12L, 2L, 9L, 8L, 6L, 4L, 13L, 9L, 6L, 5L, 6L, 12L, 11L, 4L, 2L,
14L, 12L, 11L, 7L, 8L, 6L, 1L, 12L, 9L, 12L, 5L, 3L, 6L, 15L,
13L, 8L, 10L, 4L, 1L, 13L, 17L, 13L, 1L, 10L, 14L, 17L, 9L, 2L,
10L, 17L, 2L, 12L, 5L, 3L, 6L, 7L, 3L, 16L, 15L, 5L, 9L, 2L,
6L, 5L, 13L, 11L, 4L, 6L, 13L, 4L), d2 = c(2L, 14L, 4L, 12L,
10L, 15L, 15L, 8L, 1L, 14L, 2L, 5L, 6L, 11L, 10L, 17L, 8L, 10L,
17L, 6L, 5L, 7L, 15L, 15L, 10L, 1L, 9L, 17L, 5L, 14L, 8L, 14L,
13L, 11L, 5L, 6L, 15L, 1L, 8L, 14L, 14L, 3L, 8L, 7L, 9L, 15L,
1L, 1L, 2L, 5L, 13L, 12L, 13L, 12L, 9L, 3L, 4L, 12L, 16L, 16L,
15L, 17L, 5L, 2L, 17L, 1L, 9L, 9L, 5L, 9L, 9L, 14L, 11L, 13L,
7L, 5L, 12L, 14L, 10L, 8L, 3L, 4L, 11L, 6L, 9L, 1L, 1L, 16L,
13L, 5L, 8L, 17L, 10L, 9L, 7L, 7L, 10L, 13L, 1L, 2L, 10L, 8L,
10L, 12L, 11L, 4L, 10L, 14L, 8L, 12L, 11L, 6L, 7L, 2L, 2L, 1L,
10L, 16L, 10L, 6L, 4L, 1L, 4L, 5L, 17L, 5L, 2L, 3L, 8L, 15L,
7L, 4L, 12L, 4L, 6L, 17L, 6L, 5L, 16L, 4L, 6L, 6L, 14L, 3L, 3L,
14L, 9L, 6L, 1L, 5L, 16L, 16L, 13L, 13L, 13L, 3L, 13L, 13L, 16L,
2L, 7L, 2L, 15L, 3L, 12L, 1L, 11L, 11L, 4L, 3L, 2L, 9L, 9L, 1L,
4L, 8L, 12L, 6L, 12L, 2L, 2L, 3L, 11L, 11L, 8L, 1L, 17L, 7L,
3L, 6L, 13L, 4L, 7L, 7L, 13L, 8L, 16L, 14L, 16L, 14L, 5L, 12L,
8L, 4L, 8L, 16L, 1L, 15L, 7L, 3L, 12L, 11L, 13L, 6L, 10L, 13L,
5L, 7L, 4L, 15L, 4L, 15L, 4L, 6L, 3L, 3L, 10L, 3L, 11L, 17L,
16L, 16L, 14L, 2L, 6L, 14L, 11L, 11L, 9L, 12L, 7L, 7L, 16L, 13L,
12L, 15L, 2L, 16L, 2L, 3L, 9L, 9L)), row.names = c(NA, 252L), class = "data.frame")

xtabs may be useful here
> xtabs(count ~ d1 + d2, df)
d2
d1 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
1 0 121 99 67 26 90 11 20 29 36 83 13 86 86 3 186 1
2 121 0 166 105 27 146 16 18 43 37 124 15 160 106 3 269 1
3 99 166 0 165 44 234 22 33 73 79 173 21 197 190 11 492 3
4 67 105 165 0 44 122 15 23 35 47 94 35 111 108 7 303 3
5 26 27 44 44 0 34 3 3 5 11 32 5 44 34 1 74 0
6 90 146 234 122 34 0 31 43 84 80 152 23 173 209 15 505 5
7 11 16 22 15 3 31 0 4 8 12 16 3 24 32 1 68 0
8 20 18 33 23 3 43 4 0 22 17 27 4 31 37 6 79 0
9 29 43 73 35 5 84 8 22 0 58 75 4 70 92 0 264 2
10 36 37 79 47 11 80 12 17 58 0 0 8 45 130 9 335 0
11 83 124 173 94 32 152 16 27 75 0 0 18 229 158 9 400 0
12 13 15 21 35 5 23 3 4 4 8 18 0 14 14 0 45 1
13 86 160 197 111 44 173 24 31 70 45 229 14 0 161 10 455 5
14 86 106 190 108 34 209 32 37 92 130 158 14 161 0 10 0 6
15 3 3 11 7 1 15 1 6 0 9 9 0 10 10 0 27 0
16 186 269 492 303 74 505 68 79 264 335 400 45 455 0 27 0 14
17 1 1 3 3 0 5 0 0 2 0 0 1 5 6 0 14 0
Convert to data.frame if required
as.data.frame.matrix(xtabs(count ~ d1 + d2, df))
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
1 0 121 99 67 26 90 11 20 29 36 83 13 86 86 3 186 1
2 121 0 166 105 27 146 16 18 43 37 124 15 160 106 3 269 1
3 99 166 0 165 44 234 22 33 73 79 173 21 197 190 11 492 3
4 67 105 165 0 44 122 15 23 35 47 94 35 111 108 7 303 3
5 26 27 44 44 0 34 3 3 5 11 32 5 44 34 1 74 0
6 90 146 234 122 34 0 31 43 84 80 152 23 173 209 15 505 5
7 11 16 22 15 3 31 0 4 8 12 16 3 24 32 1 68 0
8 20 18 33 23 3 43 4 0 22 17 27 4 31 37 6 79 0
9 29 43 73 35 5 84 8 22 0 58 75 4 70 92 0 264 2
10 36 37 79 47 11 80 12 17 58 0 0 8 45 130 9 335 0
11 83 124 173 94 32 152 16 27 75 0 0 18 229 158 9 400 0
12 13 15 21 35 5 23 3 4 4 8 18 0 14 14 0 45 1
13 86 160 197 111 44 173 24 31 70 45 229 14 0 161 10 455 5
14 86 106 190 108 34 209 32 37 92 130 158 14 161 0 10 0 6
15 3 3 11 7 1 15 1 6 0 9 9 0 10 10 0 27 0
16 186 269 492 303 74 505 68 79 264 335 400 45 455 0 27 0 14
17 1 1 3 3 0 5 0 0 2 0 0 1 5 6 0 14 0
Or may use dcast
library(data.table)
dcast(df, d1 ~ d2, value.var = 'count')
Key: <d1>
d1 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
<int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
1: 1 NA 121 99 67 26 90 11 20 29 36 83 13 86 86 3 186 1
2: 2 121 NA 166 105 27 146 16 18 43 37 124 15 160 106 3 269 1
3: 3 99 166 NA 165 44 234 22 33 73 79 173 21 197 190 11 492 3
4: 4 67 105 165 NA 44 122 15 23 35 47 94 35 111 108 7 303 3
5: 5 26 27 44 44 NA 34 3 3 5 11 32 5 44 34 1 74 NA
6: 6 90 146 234 122 34 NA 31 43 84 80 152 23 173 209 15 505 5
7: 7 11 16 22 15 3 31 NA 4 8 12 16 3 24 32 1 68 NA
8: 8 20 18 33 23 3 43 4 NA 22 17 27 4 31 37 6 79 NA
9: 9 29 43 73 35 5 84 8 22 NA 58 75 4 70 92 NA 264 2
10: 10 36 37 79 47 11 80 12 17 58 NA NA 8 45 130 9 335 NA
11: 11 83 124 173 94 32 152 16 27 75 NA NA 18 229 158 9 400 NA
12: 12 13 15 21 35 5 23 3 4 4 8 18 NA 14 14 NA 45 1
13: 13 86 160 197 111 44 173 24 31 70 45 229 14 NA 161 10 455 5
14: 14 86 106 190 108 34 209 32 37 92 130 158 14 161 NA 10 NA 6
15: 15 3 3 11 7 1 15 1 6 NA 9 9 NA 10 10 NA 27 NA
16: 16 186 269 492 303 74 505 68 79 264 335 400 45 455 NA 27 NA 14
17: 17 1 1 3 3 NA 5 NA NA 2 NA NA 1 5 6 NA 14 NA

If you are looking for a more publishable solution, you might want to try crosstable::crosstable() as it would let you output a nice HTML table.
This would require a few parameters though, as it is not meant for crossing long vectors of numbers in the first place.
Here is the code:
library(dplyr)
library(crosstable)
ct = df %>%
crosstable(d1, by=d2, percent_pattern="{n}", unique_numeric=Inf)
as_flextable(ct)

delete observations by days in R

My dataset has the next structure
df=structure(list(Data = structure(c(12L, 13L, 14L, 15L, 16L, 17L,
18L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L), .Label = c("01.01.2018",
"02.01.2018", "03.01.2018", "04.01.2018", "05.01.2018", "06.01.2018",
"07.01.2018", "12.02.2018", "13.02.2018", "14.02.2018", "15.02.2018",
"25.12.2017", "26.12.2017", "27.12.2017", "28.12.2017", "29.12.2017",
"30.12.2017", "31.12.2017"), class = "factor"), sku = 1:18, metric = c(100L,
210L, 320L, 430L, 540L, 650L, 760L, 870L, 980L, 1090L, 1200L,
1310L, 1420L, 1530L, 1640L, 1750L, 1860L, 1970L), action = c(0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L)), .Names = c("Data", "sku", "metric", "action"), class = "data.frame", row.names = c(NA,
-18L))
I need to delete observations that have certain dates.
But in this dataset there is action variable. The action column has only two values 0 and 1.
Observations on these certain dates should be deleted only for the zero category of action.
these dates are presented in a separate datase.
datedata=structure(list(Data = structure(c(18L, 19L, 20L, 21L, 22L, 5L,
7L, 9L, 11L, 13L, 15L, 17L, 23L, 1L, 2L, 3L, 4L, 6L, 8L, 10L,
12L, 14L, 16L), .Label = c("01.05.2018", "02.05.2018", "03.05.2018",
"04.05.2018", "05.03.2018", "05.05.2018", "06.03.2018", "06.05.2018",
"07.03.2018", "07.05.2018", "08.03.2018", "08.05.2018", "09.03.2018",
"09.05.2018", "10.03.2018", "10.05.2018", "11.03.2018", "21.02.2018",
"22.02.2018", "23.02.2018", "24.02.2018", "25.02.2018", "30.04.2018"
), class = "factor")), .Names = "Data", class = "data.frame", row.names = c(NA,
-23L))
how can i do it?

A solution is to use dplyr::filter as:
library(dplyr)
library(lubridate)
df %>% mutate(Data = dmy(Data)) %>%
filter(action==1 | (action==0 & !(Data %in% dmy(datedata$Data))))
# Data sku metric action
# 1 2017-12-25 1 100 0
# 2 2017-12-26 2 210 0
# 3 2017-12-27 3 320 0
# 4 2017-12-28 4 430 0
# 5 2017-12-29 5 540 0
# 6 2017-12-30 6 650 0
# 7 2017-12-31 7 760 0
# 8 2018-01-01 8 870 0
# 9 2018-01-02 9 980 1
# 10 2018-01-03 10 1090 1
# 11 2018-01-04 11 1200 1
# 12 2018-01-05 12 1310 1
# 13 2018-01-06 13 1420 1
# 14 2018-01-07 14 1530 1
# 15 2018-02-12 15 1640 1
# 16 2018-02-13 16 1750 1
# 17 2018-02-14 17 1860 1
# 18 2018-02-15 18 1970 1

I guess this will work. Fist use match to see weather there is a match in the day of df and the day in datedata, then filter it
library (dplyr)
df <- df %>% mutate (Data.flag = match(Data,datedata$Data)) %>%
filter(!is.na(Data.flag) & action == 0)

ungrouping dataframe in r which is grouped by rows

I have this dataframe
Source: local data frame [159 x 2]
Groups: <by row>
# A tibble: 159 × 2
session_id requestId
* <int> <list>
1 1105 <int [3]>
2 1107 <int [2]>
3 1108 <int [6]>
4 1109 <int [1]>
5 1110 <int [6]>
6 1111 <int [8]>
7 1112 <int [4]>
8 1114 <int [8]>
9 1117 <int [7]>
10 1118 <int [4]>
# ... with 149 more rows
I dont know how to ungroup it or its not working as it is grouped by rows not by some variable ..
I want my output look like something in given pattern/format
# A tibble: 342 × 2
session_id requestId
<int> <dbl>
1 1105 10
2 1105 3
3 1107 13
4 1107 13
5 1108 4
6 1108 6
7 1109 12
8 1109 5
9 1110 6
10 1110 10
I dont know how to do it ,must be simple if known..Thanks for helping
Edit :-
structure(list(session_id = c(1105L, 1107L, 1108L, 1109L, 1110L,
1111L, 1112L, 1114L, 1117L, 1118L), requestId = list(c(8L, 14L,
20L), c(7L, 14L), c(1L, 7L, 8L, 20L, 16L, 17L), 8L, c(1L, 16L,
17L, 8L, 14L, 20L), c(1L, 7L, 8L, 20L, 4L, 11L, 13L, 14L), c(4L,
11L, 13L, 14L), c(6L, 8L, 14L, 2L, 4L, 10L, 15L, 18L), c(4L,
5L, 10L, 16L, 2L, 15L, 18L), c(20L, 1L, 7L, 8L))), .Names = c("session_id",
"requestId"), row.names = c(NA, -10L), class = c("tbl_df", "tbl",
"data.frame"))

Here's a possible starting point, although it's still missing a reproducible example. Can be edited when we get one:
tlengths <- sapply( tbl$requestId, length)
#Error in lapply(X = X, FUN = FUN, ...) : object 'tbl' not found
sessions <- mapply(rep, sessionId, tlengths)
#Error in mapply(rep, sessionId, tlengths) : object 'sessionId' not found
newtbl <- tibble(session_id=sessions, requestId=unlist(tbl$requestId))
#Error in tibble(session_id = sessions, requestId = unlist(tbl$requestId)) :
# could not find function "tibble"
library(tidyverse)

As -pkumar suggested in comment , I did this using tidyverse's unnest
df = structure(list(session_id = c(1105L, 1107L, 1108L, 1109L, 1110L,
1111L, 1112L, 1114L, 1117L, 1118L), requestId = list(c(8L, 14L,
20L), c(7L, 14L), c(1L, 7L, 8L, 20L, 16L, 17L), 8L, c(1L, 16L,
17L, 8L, 14L, 20L), c(1L, 7L, 8L, 20L, 4L, 11L, 13L, 14L), c(4L,
11L, 13L, 14L), c(6L, 8L, 14L, 2L, 4L, 10L, 15L, 18L), c(4L,
5L, 10L, 16L, 2L, 15L, 18L), c(20L, 1L, 7L, 8L))), .Names = c("session_id",
"requestId"), row.names = c(NA, -10L), class = c("tbl_df", "tbl",
"data.frame"))
library('tidyverse')
unnest(df)

Conditionally remove a row based on another id code

In a dataset which contains many ids, I am only trying to manipulate rows which have id 7 or 9, and leave everything else untouched.
I am trying to conditionally remove a row from 7 or 9 in all instances where there isn't a variable that corresponds to it. So, if in the case of the dput example below, I want to remove the ninth row from id=9 because id=7 does not have an itemcode=2. Vice versa for id=7, I am trying to remove its itemcode=9 because id=9 does not have it.
id client item itemcode unit X2001 X2002 X2003 X2004 X2005 X2006 X2007
...
7 7 Bob eighth 8 100 13 18 15 NA NA NA NA
8 7 Bob ninth 9 100 11 21 10 NA NA NA NA
9 9 Bob_new first 1 100 NA NA NA 23 18 25 18
Code:
structure(list(id = c(7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 9L, 9L,
9L, 9L, 9L, 9L, 9L, 9L, 10L), client = structure(c(1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L), .Label = c("Bob",
"Bob_new", "Mark"), class = "factor"), item = structure(c(3L,
9L, 4L, 2L, 8L, 7L, 1L, 5L, 3L, 6L, 9L, 4L, 2L, 8L, 7L, 1L, 3L
), .Label = c("eighth", "fifth", "first", "fourth", "ninth",
"second", "seventh", "sixth", "third"), class = "factor"), itemcode = c(1L,
3L, 4L, 5L, 6L, 7L, 8L, 9L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L
), unit = c(100L, 100L, 100L, 100L, 100L, 100L, 100L, 100L, 100L,
100L, 100L, 100L, 100L, 100L, 100L, 100L, 100L), X2001 = structure(c(5L,
6L, 1L, 4L, 2L, 5L, 3L, 1L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L
), .Label = c("11", "12", "13", "22", "24", "25", "NA"), class = "factor"),
X2002 = structure(c(4L, 8L, 1L, 3L, 7L, 2L, 5L, 6L, 9L, 9L,
9L, 9L, 9L, 9L, 9L, 9L, 9L), .Label = c("13", "14", "15",
"17", "18", "21", "22", "24", "NA"), class = "factor"), X2003 = structure(c(5L,
1L, 4L, 2L, 6L, 1L, 3L, 1L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L,
7L), .Label = c("10", "11", "15", "19", "23", "24", "NA"), class = "factor"),
X2004 = structure(c(7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 5L, 4L,
2L, 6L, 1L, 3L, 4L, 3L, 4L), .Label = c("11", "14", "15",
"20", "23", "25", "NA"), class = "factor"), X2005 = structure(c(6L,
6L, 6L, 6L, 6L, 6L, 6L, 6L, 3L, 2L, 4L, 3L, 5L, 3L, 1L, 4L,
3L), .Label = c("11", "13", "18", "19", "25", "NA"), class = "factor"),
X2006 = structure(c(9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 8L, 6L,
1L, 2L, 5L, 3L, 7L, 8L, 4L), .Label = c("10", "15", "18",
"19", "20", "22", "23", "25", "NA"), class = "factor"), X2007 = structure(c(8L,
8L, 8L, 8L, 8L, 8L, 8L, 8L, 4L, 7L, 6L, 2L, 4L, 1L, 5L, 5L,
3L), .Label = c("12", "13", "16", "18", "19", "21", "24",
"NA"), class = "factor")), .Names = c("id", "client", "item",
"itemcode", "unit", "X2001", "X2002", "X2003", "X2004", "X2005",
"X2006", "X2007"), class = "data.frame", row.names = c(NA, -17L
))
————————————————————————————————————————
ANOTHER SCENARIO:
before:
structure(list(id = c(7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L,
7L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 10L), client = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 3L), .Label = c("Bob", "Bob_new", "Mark"), class = "factor"),
item = structure(c(3L, 9L, 10L, 9L, 4L, 2L, 8L, 7L, 7L, 1L,
5L, 3L, 6L, 9L, 4L, 2L, 8L, 7L, 1L, 3L), .Label = c("eighth",
"fifth", "first", "fourth", "ninth", "second", "seventh",
"sixth", "third", "third "), class = "factor"), itemcode = c(1L,
3L, 3L, 3L, 4L, 5L, 6L, 7L, 7L, 8L, 9L, 1L, 2L, 3L, 4L, 5L,
6L, 7L, 8L, 1L), type = structure(c(1L, 1L, 2L, 3L, 1L, 1L,
1L, 1L, 2L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("A",
"B", "C"), class = "factor"), unit = c(100L, 100L, 100L,
100L, 100L, 100L, 100L, 100L, 100L, 100L, 100L, 100L, 100L,
100L, 100L, 100L, 100L, 100L, 100L, 100L), X2001 = c(24L,
25L, 30L, 26L, 11L, 22L, 12L, 25L, 24L, 13L, 11L, NA, NA,
NA, NA, NA, NA, NA, NA, NA), X2002 = c(17L, 24L, 12L, 96L,
13L, 15L, 22L, 21L, 14L, 18L, 21L, NA, NA, NA, NA, NA, NA,
NA, NA, NA), X2003 = c(23L, 10L, 46L, 94L, 19L, 11L, 24L,
19L, 10L, 15L, 10L, NA, NA, NA, NA, NA, NA, NA, NA, NA),
X2004 = c(NA, NA, 43L, 83L, NA, NA, NA, 6L, NA, NA, NA, 23L,
20L, 14L, 25L, 11L, 15L, 20L, 15L, 20L), X2005 = c(NA, NA,
97L, 86L, NA, NA, NA, 17L, NA, NA, NA, 18L, 13L, 19L, 18L,
25L, 18L, 11L, 19L, 18L), X2006 = c(NA, NA, 11L, 91L, NA,
NA, NA, 11L, NA, NA, NA, 25L, 22L, 10L, 15L, 20L, 18L, 23L,
25L, 19L), X2007 = c(NA, NA, 19L, 27L, NA, NA, NA, 15L, NA,
NA, NA, 18L, 24L, 21L, 13L, 18L, 12L, 19L, 19L, 16L)), .Names = c("id",
"client", "item", "itemcode", "type", "unit", "X2001", "X2002",
"X2003", "X2004", "X2005", "X2006", "X2007"), class = "data.frame", row.names = c(NA,
-20L))
after:
structure(list(id = c(7L, 7L, 7L, 7L, 7L, 7L, 9L, 9L, 9L, 9L,
9L, 9L, 10L), client = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 2L, 2L, 3L), .Label = c("Bob", "Bob_new", "Mark"), class = "factor"),
item = structure(c(2L, 7L, 3L, 1L, 5L, 4L, 2L, 6L, 3L, 1L,
5L, 4L, 2L), .Label = c("fifth", "first", "fourth", "seventh",
"sixth", "third", "third "), class = "factor"), itemcode = c(1L,
3L, 4L, 5L, 6L, 7L, 1L, 3L, 4L, 5L, 6L, 7L, 1L), type = structure(c(1L,
2L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L), .Label = c("A",
"B"), class = "factor"), unit = c(100L, 100L, 100L, 100L,
100L, 100L, 100L, 100L, 100L, 100L, 100L, 100L, 100L), X2001 = c(24L,
10L, 11L, 22L, 12L, 17L, NA, NA, NA, NA, NA, NA, NA), X2002 = c(17L,
87L, 13L, 15L, 22L, 19L, NA, NA, NA, NA, NA, NA, NA), X2003 = c(23L,
47L, 19L, 11L, 24L, 17L, NA, NA, NA, NA, NA, NA, NA), X2004 = c(NA,
28L, NA, NA, NA, 28L, 23L, 14L, 25L, 11L, 15L, 20L, 20L),
X2005 = c(NA, 43L, NA, NA, NA, 16L, 18L, 19L, 18L, 25L, 18L,
11L, 18L), X2006 = c(NA, 69L, NA, NA, NA, 5L, 25L, 10L, 15L,
20L, 18L, 23L, 19L), X2007 = c(NA, 72L, NA, NA, NA, 20L,
18L, 21L, 13L, 18L, 12L, 19L, 16L)), .Names = c("id", "client",
"item", "itemcode", "type", "unit", "X2001", "X2002", "X2003",
"X2004", "X2005", "X2006", "X2007"), class = "data.frame", row.names = c(NA,
-13L))
I could implement the said filter code to remove items which do not exist in its corresponding place (id 7 and 9).
But if there are sub levels for items, like type of item. I am also trying to remove items if they don't have a type similar in the corresponding field.

You could use filter from dplyr
library(dplyr)
filter(df_all, itemcode %in% intersect(itemcode[id==7],
itemcode[id==9])|!id %in% c(7,9) )
# id client item itemcode unit X2001 X2002 X2003 X2004 X2005 X2006 X2007
#1 7 Bob first 1 100 24 17 23 NA NA NA NA
#2 7 Bob third 3 100 25 24 10 NA NA NA NA
#3 7 Bob fourth 4 100 11 13 19 NA NA NA NA
#4 7 Bob fifth 5 100 22 15 11 NA NA NA NA
#5 7 Bob sixth 6 100 12 22 24 NA NA NA NA
#6 7 Bob seventh 7 100 24 14 10 NA NA NA NA
#7 7 Bob eighth 8 100 13 18 15 NA NA NA NA
#8 9 Bob_new first 1 100 NA NA NA 23 18 25 18
#9 9 Bob_new third 3 100 NA NA NA 14 19 10 21
#10 9 Bob_new fourth 4 100 NA NA NA 25 18 15 13
#11 9 Bob_new fifth 5 100 NA NA NA 11 25 20 18
#12 9 Bob_new sixth 6 100 NA NA NA 15 18 18 12
#13 9 Bob_new seventh 7 100 NA NA NA 20 11 23 19
#14 9 Bob_new eighth 8 100 NA NA NA 15 19 25 19
#15 10 Mark first 1 100 NA NA NA 20 18 19 16
Update
Based on the new dataset, perhaps this helps
library(dplyr)
library(tidyr)
dfnew %>%
unite(itemtype, itemcode,type) %>%
filter(itemtype %in% intersect(itemtype[id==7],
itemtype[id==9])|!id %in% c(7,9)) %>%
separate(itemtype, c('itemcode', 'type'))
# id client item itemcode type unit X2001 X2002 X2003 X2004 X2005 X2006
# 1 7 Bob first 1 A 100 24 17 23 NA NA NA
# 2 7 Bob third 3 B 100 30 12 46 43 97 11
# 3 7 Bob fourth 4 A 100 11 13 19 NA NA NA
# 4 7 Bob fifth 5 A 100 22 15 11 NA NA NA
# 5 7 Bob sixth 6 A 100 12 22 24 NA NA NA
# 6 7 Bob seventh 7 A 100 25 21 19 6 17 11
# 7 9 Bob_new first 1 A 100 NA NA NA 23 18 25
# 8 9 Bob_new third 3 B 100 NA NA NA 14 19 10
# 9 9 Bob_new fourth 4 A 100 NA NA NA 25 18 15
# 10 9 Bob_new fifth 5 A 100 NA NA NA 11 25 20
# 11 9 Bob_new sixth 6 A 100 NA NA NA 15 18 18
# 12 9 Bob_new seventh 7 A 100 NA NA NA 20 11 23
# 13 10 Mark first 1 A 100 NA NA NA 20 18 19
# X2007
#1 NA
#2 19
#3 NA
#4 NA
#5 NA
#6 15
#7 18
#8 21
#9 13
#10 18
#11 12
#12 19
#13 16

If I understand the problem: every itemcode in id=9 subset must have identical itemcode in id=7 subset (and reverse). If it is not the case then we filter the row with the non-pair itemcode out, but leave everything with id not in 7 or 9. Here is one way of doing it:
First get common item codes:
items_9 <- df_all$itemcode[ df_all$id==9 ]
items_7 <- df_all$itemcode[ df_all$id==7 ]
items_common <- items_9[ items_9 %in% items_7 ]
select everything with common itemcodes for 7 and 9 and the rest:
df_new <- df_all[
which(
( df_all$id %in% c(7, 9) &
df_all$itemcode %in% items_common
) |
!df_all$id %in% c(7,9)
)
,]

library(dplyr)
df$remove <- paste(df$itemcode, df$type)
df<-invisible(filter(df,
remove %in% intersect(remove[type==7],
remove[type==9])|!type %in% c(7,9) ))
#Remove the additional column after filter
df$remove <- NULL

You could do something like this, which runs setdiff in both directions. The cl() function wasn't really necessary, but I really don't like writing the same expression over and over again.
f <- function(x, y) setdiff(union(x, y), x)
cl <- function(var) substitute(df$itemcode[df$id == x], list(x = var))
So now you can call f() on c(id7, id9) and then reverse it and get the c(id9, id7) result.
do.call(f, x <- list(cl(7), cl(9)))
# [1] 2
do.call(f, rev(x))
# [1] 9

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

Explicit factor NAs in a data frame - r

Related

Finding the 3 coldest consecutive months

How to create a cross tabulation table between two variables with the counts in R?

delete observations by days in R

ungrouping dataframe in r which is grouped by rows

Conditionally remove a row based on another id code

Categories

Resources