I have a data frame with two columns that I want to cross-tabulate. The data also includes a count for each combination. I am trying to build the cross table so that its cells contain those counts, but I cannot work out how to get the counts from the data frame into the table.
> df %>% arrange(d1)%>% head()
count d1 d2
1 3 1 15
2 86 1 14
3 13 1 12
4 186 1 16
5 29 1 9
6 86 1 13
> table(df$d1,df$d2)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
2 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
3 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1
I expect cells [1,15] and [1,14] to show 3 and 86, the counts from df.
Right now the table shows only 0s and 1s, depending on whether the combination exists.
Here is my sample data:
structure(list(count = c(37L, 6L, 44L, 21L, 8L, 3L, 9L, 17L,
13L, 32L, 106L, 34L, 505L, 173L, 12L, 2L, 4L, 45L, 3L, 43L, 5L,
16L, 1L, 27L, 17L, 3L, 4L, 1L, 27L, 86L, 79L, 10L, 161L, 32L,
3L, 209L, 9L, 83L, 23L, 108L, 161L, 22L, 4L, 16L, 2L, 6L, 67L,
86L, 3L, 1L, 14L, 14L, 111L, 5L, 5L, 44L, 105L, 13L, 269L, 186L,
3L, 5L, 5L, 27L, 3L, 186L, 58L, 29L, 34L, 43L, 8L, 92L, 9L, 455L,
22L, 32L, 4L, 14L, 58L, 22L, 190L, 94L, 27L, 152L, 264L, 36L,
1L, 505L, 86L, 44L, 3L, 1L, 79L, 75L, 12L, 32L, 11L, 197L, 90L,
269L, 9L, 6L, 47L, 14L, 158L, 303L, 335L, 37L, 33L, 3L, 83L,
15L, 31L, 124L, 146L, 26L, 36L, 27L, 37L, 31L, 108L, 121L, 111L,
11L, 5L, 26L, 166L, 11L, 18L, 11L, 8L, 15L, 18L, 165L, 80L, 14L,
5L, 3L, 492L, 7L, 90L, 146L, 130L, 197L, 165L, 34L, 22L, 122L,
29L, 74L, 455L, 303L, 45L, 5L, 173L, 33L, 24L, 229L, 79L, 43L,
68L, 16L, 10L, 73L, 35L, 99L, 229L, 94L, 23L, 492L, 18L, 84L,
92L, 86L, 35L, 31L, 1L, 23L, 8L, 121L, 1L, 173L, 400L, 124L,
20L, 11L, 6L, 3L, 166L, 84L, 31L, 122L, 15L, 24L, 70L, 43L, 74L,
209L, 45L, 158L, 44L, 15L, 37L, 35L, 27L, 68L, 20L, 15L, 11L,
21L, 4L, 18L, 44L, 234L, 80L, 10L, 44L, 4L, 47L, 7L, 67L, 10L,
3L, 173L, 99L, 79L, 130L, 3L, 75L, 1L, 335L, 14L, 106L, 15L,
34L, 190L, 152L, 16L, 73L, 45L, 1L, 3L, 264L, 160L, 23L, 1L,
160L, 400L, 105L, 234L, 70L, 35L), d1 = c(10L, 17L, 5L, 3L, 12L,
1L, 10L, 10L, 12L, 7L, 14L, 6L, 16L, 3L, 7L, 9L, 7L, 13L, 4L,
8L, 9L, 2L, 7L, 16L, 8L, 15L, 12L, 12L, 2L, 1L, 16L, 15L, 14L,
5L, 8L, 14L, 11L, 11L, 4L, 4L, 13L, 7L, 12L, 11L, 17L, 8L, 4L,
13L, 15L, 15L, 12L, 13L, 4L, 5L, 5L, 5L, 2L, 1L, 2L, 1L, 2L,
13L, 12L, 5L, 3L, 16L, 10L, 1L, 14L, 2L, 7L, 9L, 15L, 16L, 3L,
11L, 8L, 12L, 9L, 9L, 14L, 11L, 8L, 11L, 16L, 10L, 17L, 6L, 1L,
3L, 5L, 1L, 3L, 11L, 10L, 14L, 5L, 3L, 6L, 16L, 15L, 15L, 4L,
14L, 14L, 16L, 16L, 8L, 3L, 7L, 1L, 15L, 6L, 11L, 6L, 5L, 1L,
15L, 2L, 7L, 14L, 2L, 13L, 10L, 6L, 1L, 3L, 15L, 2L, 3L, 9L,
7L, 11L, 3L, 10L, 16L, 17L, 7L, 3L, 15L, 1L, 2L, 10L, 13L, 4L,
5L, 8L, 4L, 9L, 16L, 13L, 4L, 10L, 17L, 6L, 8L, 7L, 11L, 8L,
9L, 16L, 7L, 14L, 9L, 4L, 3L, 13L, 4L, 8L, 16L, 8L, 6L, 14L,
14L, 9L, 13L, 17L, 12L, 10L, 1L, 17L, 11L, 16L, 2L, 1L, 7L, 14L,
12L, 2L, 9L, 8L, 6L, 4L, 13L, 9L, 6L, 5L, 6L, 12L, 11L, 4L, 2L,
14L, 12L, 11L, 7L, 8L, 6L, 1L, 12L, 9L, 12L, 5L, 3L, 6L, 15L,
13L, 8L, 10L, 4L, 1L, 13L, 17L, 13L, 1L, 10L, 14L, 17L, 9L, 2L,
10L, 17L, 2L, 12L, 5L, 3L, 6L, 7L, 3L, 16L, 15L, 5L, 9L, 2L,
6L, 5L, 13L, 11L, 4L, 6L, 13L, 4L), d2 = c(2L, 14L, 4L, 12L,
10L, 15L, 15L, 8L, 1L, 14L, 2L, 5L, 6L, 11L, 10L, 17L, 8L, 10L,
17L, 6L, 5L, 7L, 15L, 15L, 10L, 1L, 9L, 17L, 5L, 14L, 8L, 14L,
13L, 11L, 5L, 6L, 15L, 1L, 8L, 14L, 14L, 3L, 8L, 7L, 9L, 15L,
1L, 1L, 2L, 5L, 13L, 12L, 13L, 12L, 9L, 3L, 4L, 12L, 16L, 16L,
15L, 17L, 5L, 2L, 17L, 1L, 9L, 9L, 5L, 9L, 9L, 14L, 11L, 13L,
7L, 5L, 12L, 14L, 10L, 8L, 3L, 4L, 11L, 6L, 9L, 1L, 1L, 16L,
13L, 5L, 8L, 17L, 10L, 9L, 7L, 7L, 10L, 13L, 1L, 2L, 10L, 8L,
10L, 12L, 11L, 4L, 10L, 14L, 8L, 12L, 11L, 6L, 7L, 2L, 2L, 1L,
10L, 16L, 10L, 6L, 4L, 1L, 4L, 5L, 17L, 5L, 2L, 3L, 8L, 15L,
7L, 4L, 12L, 4L, 6L, 17L, 6L, 5L, 16L, 4L, 6L, 6L, 14L, 3L, 3L,
14L, 9L, 6L, 1L, 5L, 16L, 16L, 13L, 13L, 13L, 3L, 13L, 13L, 16L,
2L, 7L, 2L, 15L, 3L, 12L, 1L, 11L, 11L, 4L, 3L, 2L, 9L, 9L, 1L,
4L, 8L, 12L, 6L, 12L, 2L, 2L, 3L, 11L, 11L, 8L, 1L, 17L, 7L,
3L, 6L, 13L, 4L, 7L, 7L, 13L, 8L, 16L, 14L, 16L, 14L, 5L, 12L,
8L, 4L, 8L, 16L, 1L, 15L, 7L, 3L, 12L, 11L, 13L, 6L, 10L, 13L,
5L, 7L, 4L, 15L, 4L, 15L, 4L, 6L, 3L, 3L, 10L, 3L, 11L, 17L,
16L, 16L, 14L, 2L, 6L, 14L, 11L, 11L, 9L, 12L, 7L, 7L, 16L, 13L,
12L, 15L, 2L, 16L, 2L, 3L, 9L, 9L)), row.names = c(NA, 252L), class = "data.frame")
xtabs may be useful here. With count on the left-hand side of the formula, it sums count for each d1/d2 combination:
> xtabs(count ~ d1 + d2, df)
d2
d1 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
1 0 121 99 67 26 90 11 20 29 36 83 13 86 86 3 186 1
2 121 0 166 105 27 146 16 18 43 37 124 15 160 106 3 269 1
3 99 166 0 165 44 234 22 33 73 79 173 21 197 190 11 492 3
4 67 105 165 0 44 122 15 23 35 47 94 35 111 108 7 303 3
5 26 27 44 44 0 34 3 3 5 11 32 5 44 34 1 74 0
6 90 146 234 122 34 0 31 43 84 80 152 23 173 209 15 505 5
7 11 16 22 15 3 31 0 4 8 12 16 3 24 32 1 68 0
8 20 18 33 23 3 43 4 0 22 17 27 4 31 37 6 79 0
9 29 43 73 35 5 84 8 22 0 58 75 4 70 92 0 264 2
10 36 37 79 47 11 80 12 17 58 0 0 8 45 130 9 335 0
11 83 124 173 94 32 152 16 27 75 0 0 18 229 158 9 400 0
12 13 15 21 35 5 23 3 4 4 8 18 0 14 14 0 45 1
13 86 160 197 111 44 173 24 31 70 45 229 14 0 161 10 455 5
14 86 106 190 108 34 209 32 37 92 130 158 14 161 0 10 0 6
15 3 3 11 7 1 15 1 6 0 9 9 0 10 10 0 27 0
16 186 269 492 303 74 505 68 79 264 335 400 45 455 0 27 0 14
17 1 1 3 3 0 5 0 0 2 0 0 1 5 6 0 14 0
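To see why table() shows only 0s and 1s while xtabs() shows the counts, here is a minimal sketch on a made-up three-row data frame (the name toy and its values are illustrative, not taken from the question's data). table() counts how many rows have each combination, whereas xtabs() with count on the left-hand side sums that column per combination.

```r
# Toy data (hypothetical values): one row per d1/d2 combination with a count
toy <- data.frame(count = c(3L, 86L, 13L),
                  d1    = c(1L, 1L, 2L),
                  d2    = c(15L, 14L, 14L))

# table() tallies rows, so every existing combination shows 1
row_tally <- table(toy$d1, toy$d2)

# xtabs() sums the left-hand-side variable within each d1/d2 cell
count_sum <- xtabs(count ~ d1 + d2, data = toy)

print(row_tally)
print(count_sum)
```

Combinations that never occur (here d1 = 2, d2 = 15) are 0 in both tables.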
Convert to a data.frame if required:
as.data.frame.matrix(xtabs(count ~ d1 + d2, df))
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
1 0 121 99 67 26 90 11 20 29 36 83 13 86 86 3 186 1
2 121 0 166 105 27 146 16 18 43 37 124 15 160 106 3 269 1
3 99 166 0 165 44 234 22 33 73 79 173 21 197 190 11 492 3
4 67 105 165 0 44 122 15 23 35 47 94 35 111 108 7 303 3
5 26 27 44 44 0 34 3 3 5 11 32 5 44 34 1 74 0
6 90 146 234 122 34 0 31 43 84 80 152 23 173 209 15 505 5
7 11 16 22 15 3 31 0 4 8 12 16 3 24 32 1 68 0
8 20 18 33 23 3 43 4 0 22 17 27 4 31 37 6 79 0
9 29 43 73 35 5 84 8 22 0 58 75 4 70 92 0 264 2
10 36 37 79 47 11 80 12 17 58 0 0 8 45 130 9 335 0
11 83 124 173 94 32 152 16 27 75 0 0 18 229 158 9 400 0
12 13 15 21 35 5 23 3 4 4 8 18 0 14 14 0 45 1
13 86 160 197 111 44 173 24 31 70 45 229 14 0 161 10 455 5
14 86 106 190 108 34 209 32 37 92 130 158 14 161 0 10 0 6
15 3 3 11 7 1 15 1 6 0 9 9 0 10 10 0 27 0
16 186 269 492 303 74 505 68 79 264 335 400 45 455 0 27 0 14
17 1 1 3 3 0 5 0 0 2 0 0 1 5 6 0 14 0
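Or use dcast from data.table. The data.table method works on a data.table, so convert df with as.data.table() first. Combinations that do not occur come back as NA rather than 0:
library(data.table)
dcast(as.data.table(df), d1 ~ d2, value.var = 'count')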
Key: <d1>
d1 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
<int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
1: 1 NA 121 99 67 26 90 11 20 29 36 83 13 86 86 3 186 1
2: 2 121 NA 166 105 27 146 16 18 43 37 124 15 160 106 3 269 1
3: 3 99 166 NA 165 44 234 22 33 73 79 173 21 197 190 11 492 3
4: 4 67 105 165 NA 44 122 15 23 35 47 94 35 111 108 7 303 3
5: 5 26 27 44 44 NA 34 3 3 5 11 32 5 44 34 1 74 NA
6: 6 90 146 234 122 34 NA 31 43 84 80 152 23 173 209 15 505 5
7: 7 11 16 22 15 3 31 NA 4 8 12 16 3 24 32 1 68 NA
8: 8 20 18 33 23 3 43 4 NA 22 17 27 4 31 37 6 79 NA
9: 9 29 43 73 35 5 84 8 22 NA 58 75 4 70 92 NA 264 2
10: 10 36 37 79 47 11 80 12 17 58 NA NA 8 45 130 9 335 NA
11: 11 83 124 173 94 32 152 16 27 75 NA NA 18 229 158 9 400 NA
12: 12 13 15 21 35 5 23 3 4 4 8 18 NA 14 14 NA 45 1
13: 13 86 160 197 111 44 173 24 31 70 45 229 14 NA 161 10 455 5
14: 14 86 106 190 108 34 209 32 37 92 130 158 14 161 NA 10 NA 6
15: 15 3 3 11 7 1 15 1 6 NA 9 9 NA 10 10 NA 27 NA
16: 16 186 269 492 303 74 505 68 79 264 335 400 45 455 NA 27 NA 14
17: 17 1 1 3 3 NA 5 NA NA 2 NA NA 1 5 6 NA 14 NA
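If zeros are preferred over NA, so that the result lines up with the xtabs output, dcast accepts fun.aggregate and fill arguments. The following is a small sketch on made-up data (toy and its values are hypothetical), not a rerun on the question's data.

```r
library(data.table)

# Toy data (hypothetical values); the d1 = 2, d2 = 15 combination is absent
toy <- data.table(count = c(3L, 86L, 13L),
                  d1    = c(1L, 1L, 2L),
                  d2    = c(15L, 14L, 14L))

# fun.aggregate = sum adds up any duplicate combinations;
# fill = 0L is used for d1/d2 cells that have no rows
wide <- dcast(toy, d1 ~ d2, value.var = "count",
              fun.aggregate = sum, fill = 0L)
print(wide)
```

The same two arguments should work on as.data.table(df) from the question.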
If you are looking for a more publishable solution, you might want to try crosstable::crosstable() as it would let you output a nice HTML table.
This would require a few parameters though, as it is not meant for crossing long vectors of numbers in the first place.
Here is the code:
library(dplyr)
library(crosstable)
ct = df %>%
crosstable(d1, by=d2, percent_pattern="{n}", unique_numeric=Inf)
as_flextable(ct)
I have a data.frame with multiple columns recording values for months of the year.
zdate mwe1.x mwe2.x mwe3.x mwe4.x mwe5.x mwe6.x mwe7.x mwe1.y mwe2.y
1 Jan 2017 10 0 1 0 0 4 0 41 5
2 Feb 2017 7 0 0 0 0 0 0 76 33
3 Mar 2017 16 0 0 0 0 6 0 261 59
4 Apr 2017 40 4 0 0 1 0 0 546 80
5 May 2017 8 0 0 0 1 4 0 154 18
6 Jun 2017 7 0 0 0 2 1 0 74 4
7 Jul 2017 20 0 0 0 0 1 0 116 8
8 Aug 2017 25 6 1 0 3 6 1 243 54
9 Sep 2017 8 2 2 0 3 5 0 257 46
10 Oct 2017 2 0 0 0 0 0 0 74 7
11 Nov 2017 13 0 0 0 1 0 0 144 9
12 Dec 2017 6 0 3 0 2 1 0 164 20
mwe3.y mwe4.y mwe5.y mwe6.y mwe7.y
1 17 4 11 4 28
2 61 0 22 7 72
3 91 1 69 16 309
4 71 0 94 19 206
5 29 0 44 3 58
6 21 0 15 2 66
7 12 0 23 2 20
8 20 0 36 2 55
9 42 0 55 7 89
10 13 0 24 0 7
11 39 0 18 1 11
12 54 0 88 5 51
I would like to plot separate line charts for these columns, but with the same x-axis, and stacked on top of each other. I am trying to use 'ggplot2' and the 'facet_wrap' to do this, but I can't seem to work out how exactly to do it. I can get a single line plot:
plot <- ggplot(all, aes(x = zdate, y = mwe1.x)) +
  geom_line()
But I want to have this stacked with the line plot for 'mwe1.y' directly below it. Can anybody help me out?
Maybe you are looking for this. The key is to reshape your data to long format so you can use the full potential of ggplot2. Here is the code using tidyverse functions:
library(tidyverse)
#Code
df %>% mutate(zdate=factor(zdate,levels = unique(zdate),ordered = T)) %>%
pivot_longer(-zdate) %>%
ggplot(aes(x=zdate,y=value,group=name,color=name))+
geom_line()
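For readers unfamiliar with the long format, here is a small sketch of what pivot_longer() produces, using a made-up two-row frame (the name wide and its contents are illustrative only). Each non-date column is turned into a pair of columns, name and value, which is what lets ggplot2 map name to colour or to facets.

```r
library(tidyr)

# Toy wide data (hypothetical subset): one column per series
wide <- data.frame(zdate  = c("Jan-2017", "Feb-2017"),
                   mwe1.x = c(10L, 7L),
                   mwe1.y = c(41L, 76L))

# Every column except zdate becomes a (name, value) pair
long <- pivot_longer(wide, -zdate)
print(long)
```

The result has one row per date and series, so two dates and two series give four rows.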
Some data used:
#Data
df <- structure(list(zdate = c("Jan-2017", "Feb-2017", "Mar-2017",
"Apr-2017", "May-2017", "Jun-2017", "Jul-2017", "Aug-2017", "Sep-2017",
"Oct-2017", "Nov-2017", "Dec-2017"), mwe1.x = c(10L, 7L, 16L,
40L, 8L, 7L, 20L, 25L, 8L, 2L, 13L, 6L), mwe2.x = c(0L, 0L, 0L,
4L, 0L, 0L, 0L, 6L, 2L, 0L, 0L, 0L), mwe3.x = c(1L, 0L, 0L, 0L,
0L, 0L, 0L, 1L, 2L, 0L, 0L, 3L), mwe4.x = c(0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L), mwe5.x = c(0L, 0L, 0L, 1L, 1L, 2L,
0L, 3L, 3L, 0L, 1L, 2L), mwe6.x = c(4L, 0L, 6L, 0L, 4L, 1L, 1L,
6L, 5L, 0L, 0L, 1L), mwe7.x = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L,
0L, 0L, 0L, 0L), mwe1.y = c(41L, 76L, 261L, 546L, 154L, 74L,
116L, 243L, 257L, 74L, 144L, 164L), mwe2.y = c(5L, 33L, 59L,
80L, 18L, 4L, 8L, 54L, 46L, 7L, 9L, 20L), mwe3.y = c(17L, 61L,
91L, 71L, 29L, 21L, 12L, 20L, 42L, 13L, 39L, 54L), mwe4.y = c(4L,
0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), mwe5.y = c(11L,
22L, 69L, 94L, 44L, 15L, 23L, 36L, 55L, 24L, 18L, 88L), mwe6.y = c(4L,
7L, 16L, 19L, 3L, 2L, 2L, 2L, 7L, 0L, 1L, 5L), mwe7.y = c(28L,
72L, 309L, 206L, 58L, 66L, 20L, 55L, 89L, 7L, 11L, 51L)), class = "data.frame", row.names = c(NA,
-12L))
Or maybe this:
#Code 2
df %>% mutate(zdate=factor(zdate,levels = unique(zdate),ordered = T)) %>%
pivot_longer(-zdate) %>%
mutate(name=factor(name,levels = unique(name),ordered = T)) %>%
ggplot(aes(x=zdate,y=value,group=name,color=name))+
geom_line()+
facet_wrap(.~name,ncol = 7,scales = 'free_y')+
theme(legend.position = 'none',
axis.text.x = element_text(angle=90))
Update: the OP only wants specific variables, so we can use filter():
#Code 3
df %>% mutate(zdate=factor(zdate,levels = unique(zdate),ordered = T)) %>%
pivot_longer(-zdate) %>%
filter(name %in% c('mwe1.x','mwe1.y')) %>%
mutate(name=factor(name,levels = unique(name),ordered = T)) %>%
ggplot(aes(x=zdate,y=value,group=name,color=name))+
geom_line()+
facet_wrap(.~name,ncol = 1,scales = 'free_y')+
theme(legend.position = 'none',
axis.text.x = element_text(angle=90))
My dataset has the following structure:
df=structure(list(Data = structure(c(12L, 13L, 14L, 15L, 16L, 17L,
18L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L), .Label = c("01.01.2018",
"02.01.2018", "03.01.2018", "04.01.2018", "05.01.2018", "06.01.2018",
"07.01.2018", "12.02.2018", "13.02.2018", "14.02.2018", "15.02.2018",
"25.12.2017", "26.12.2017", "27.12.2017", "28.12.2017", "29.12.2017",
"30.12.2017", "31.12.2017"), class = "factor"), sku = 1:18, metric = c(100L,
210L, 320L, 430L, 540L, 650L, 760L, 870L, 980L, 1090L, 1200L,
1310L, 1420L, 1530L, 1640L, 1750L, 1860L, 1970L), action = c(0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L)), .Names = c("Data", "sku", "metric", "action"), class = "data.frame", row.names = c(NA,
-18L))
I need to delete observations that fall on certain dates.
The dataset also has an action variable, which takes only two values, 0 and 1.
Observations on these dates should be deleted only when action is 0.
The dates are given in a separate dataset:
datedata=structure(list(Data = structure(c(18L, 19L, 20L, 21L, 22L, 5L,
7L, 9L, 11L, 13L, 15L, 17L, 23L, 1L, 2L, 3L, 4L, 6L, 8L, 10L,
12L, 14L, 16L), .Label = c("01.05.2018", "02.05.2018", "03.05.2018",
"04.05.2018", "05.03.2018", "05.05.2018", "06.03.2018", "06.05.2018",
"07.03.2018", "07.05.2018", "08.03.2018", "08.05.2018", "09.03.2018",
"09.05.2018", "10.03.2018", "10.05.2018", "11.03.2018", "21.02.2018",
"22.02.2018", "23.02.2018", "24.02.2018", "25.02.2018", "30.04.2018"
), class = "factor")), .Names = "Data", class = "data.frame", row.names = c(NA,
-23L))
How can I do it?
A solution is to use dplyr::filter as:
library(dplyr)
library(lubridate)
df %>% mutate(Data = dmy(Data)) %>%
filter(action==1 | (action==0 & !(Data %in% dmy(datedata$Data))))
# Data sku metric action
# 1 2017-12-25 1 100 0
# 2 2017-12-26 2 210 0
# 3 2017-12-27 3 320 0
# 4 2017-12-28 4 430 0
# 5 2017-12-29 5 540 0
# 6 2017-12-30 6 650 0
# 7 2017-12-31 7 760 0
# 8 2018-01-01 8 870 0
# 9 2018-01-02 9 980 1
# 10 2018-01-03 10 1090 1
# 11 2018-01-04 11 1200 1
# 12 2018-01-05 12 1310 1
# 13 2018-01-06 13 1420 1
# 14 2018-01-07 14 1530 1
# 15 2018-02-12 15 1640 1
# 16 2018-02-13 16 1750 1
# 17 2018-02-14 17 1860 1
# 18 2018-02-15 18 1970 1
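Note that in the sample data none of the dates in df appear in datedata, which is why all 18 rows survive above. To see the filter actually remove something, here is a base R sketch of the same condition on made-up data (toy, drop_dates and their values are hypothetical).

```r
# Toy data (hypothetical values): rows 1 and 2 share a date listed in drop_dates
toy <- data.frame(Data   = c("01.05.2018", "01.05.2018", "02.01.2018"),
                  sku    = 1:3,
                  action = c(0L, 1L, 0L))
drop_dates <- data.frame(Data = c("01.05.2018", "02.05.2018"))

# A row is removed only when its date is listed AND action is 0
to_drop <- toy$action == 0 & toy$Data %in% drop_dates$Data
kept <- toy[!to_drop, ]
print(kept)
```

Only sku 1 is dropped: sku 2 has a listed date but action 1, and sku 3 has action 0 but an unlisted date.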
I guess this will work. First use match to see whether each date in df also appears in datedata, then drop the rows that match and have action == 0:
library(dplyr)
df <- df %>% mutate(Data.flag = match(Data, datedata$Data)) %>%
  # keep rows with no matching date, or with a non-zero action
  filter(is.na(Data.flag) | action != 0)