Adding spaces between gathered bars in ggplot? - r

I have the following R code to plot the counts for two values (d1 and d2) side by side for each month. Before summarising with ddply, the data holds the counts for separate 'posts' made on different dates.
I would like to add a space between the paired bars for each month, and a label for each month, so that the chart is easier to read. Can anybody suggest how to do this?
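library(plyr)     # ddply
library(tidyr)    # gather
library(ggplot2)  # ggplot, geom_bar
# Sum d1 and d2 per month, then reshape to one row per month and event
m_scores <- ddply(data_NKsam2, .(zdate), summarise, d1 = sum(d1), d2 = sum(d2))
df <- gather(m_scores, event, total, d1:d2)
plot <- ggplot(df, aes(zdate, total, fill = event))
plot <- plot + geom_bar(stat = "identity", position = "dodge")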
This code currently produces a chart like this:
Here is a section of the data 'm_scores':
zdate d1 d2
11 Nov 2014 263 318
12 Dec 2014 430 662
13 Jan 2015 507 326
14 Feb 2015 326 279
15 Mar 2015 281 345
16 Apr 2015 352 260
17 May 2015 280 315
18 Jun 2015 243 238
19 Jul 2015 313 251
20 Aug 2015 446 439
21 Sep 2015 416 404
22 Oct 2015 616 423
23 Nov 2015 269 242
24 Dec 2015 781 527
25 Jan 2016 865 861
26 Feb 2016 997 2139
27 Mar 2016 920 1421
28 Apr 2016 376 498
29 May 2016 434 309
30 Jun 2016 271 284
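
One possible approach, offered as a sketch rather than a definitive answer: make the bars narrower than the default so that empty space opens up between the month groups, and put the months on a discrete axis so that every month gets its own label. The example assumes that zdate can be treated as a text label (for a yearmon column, as.character(zdate) would be one way to get there), uses only the first four rows of m_scores typed in by hand, and picks 0.7 as an arbitrary width.

```r
library(ggplot2)

# A few rows of m_scores from the question, typed in by hand for illustration
m_scores <- data.frame(
  zdate = c("Nov 2014", "Dec 2014", "Jan 2015", "Feb 2015"),
  d1 = c(263, 430, 507, 326),
  d2 = c(318, 662, 326, 279)
)

# Long format built with base R: one row per month and event
df <- data.frame(
  zdate = rep(m_scores$zdate, times = 2),
  event = rep(c("d1", "d2"), each = nrow(m_scores)),
  total = c(m_scores$d1, m_scores$d2)
)

# A factor with levels in chronological order gives a discrete axis with
# one labelled tick per month (assumes m_scores is already in date order)
df$zdate <- factor(df$zdate, levels = m_scores$zdate)

# width < 1 leaves empty space between the month groups;
# a dodge width equal to the bar width keeps each pair of bars touching
p <- ggplot(df, aes(zdate, total, fill = event)) +
  geom_bar(stat = "identity",
           position = position_dodge(width = 0.7),
           width = 0.7) +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5))
p
```

A smaller width gives a larger gap between months. The rotated axis text is only there to keep twenty or so month labels from overlapping.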

Related

How to dynamically loop through a split dataframe in R

I have a data frame df3:
df3
x d y
1 bbc Sep 2020 123
2 rsb Sep 2020 234
3 atc Sep 2020 345
4 svc Sep 2020 543
5 mwe Sep 2020 567
6 bpa Oct 2020 322
7 mwe Oct 2020 456
8 uhs Oct 2020 786
9 se Oct 2020 543
10 db Oct 2020 778
11 rsb Nov 2020 358
12 svc Nov 2020 678
13 db Nov 2020 321
14 rb Nov 2020 689
15 bpa Nov 2020 765
After applying the split function:
> df5 = split(df3,f=df3$d)
> df5
$`Sep 2020`
x d y
1 bbc Sep 2020 123
2 rsb Sep 2020 234
3 atc Sep 2020 345
4 svc Sep 2020 543
5 mwe Sep 2020 567
$`Oct 2020`
x d y
6 bpa Oct 2020 322
7 mwe Oct 2020 456
8 uhs Oct 2020 786
9 se Oct 2020 543
10 db Oct 2020 778
$`Nov 2020`
x d y
11 rsb Nov 2020 358
12 svc Nov 2020 678
13 db Nov 2020 321
14 rb Nov 2020 689
15 bpa Nov 2020 765
I would like to loop through the split data frame dynamically.
I need to find out whether any values present in Nov 2020 are also present in Oct 2020.
If a value is present in both, I then have to check the month before that (Sep 2020), and also find the number of months in which the name has occurred. Here df3$d is in as.yearmon format. Any name in df5[["Nov 2020"]]$x that is also present in the preceding month should be extracted and stored in an object along with its count. Here the count is 2, since the names are present in Nov 2020 and Oct 2020. The previous months should be checked only for names that are present in the most recent month. For this example, the output should be:
> df4
names_present present_for
1 bpa 2
2 db 2
Thank you in advance
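
A minimal base R sketch of one way to do this, under stated assumptions: the months are walked from the most recent one backwards, and a name keeps counting only while it appears in every consecutive earlier month. The d column is kept as plain text here instead of yearmon, and the vector months (the month labels in chronological order) is a helper introduced for this example, not something from the question.

```r
# Data from the question; d is plain text here rather than zoo's yearmon
df3 <- data.frame(
  x = c("bbc", "rsb", "atc", "svc", "mwe",
        "bpa", "mwe", "uhs", "se", "db",
        "rsb", "svc", "db", "rb", "bpa"),
  d = rep(c("Sep 2020", "Oct 2020", "Nov 2020"), each = 5)
)

# Assumed helper: month labels in chronological order
months <- c("Sep 2020", "Oct 2020", "Nov 2020")
df5 <- split(df3, f = df3$d)

# Start with the names in the most recent month, each seen once so far
candidates <- unique(as.character(df5[[months[length(months)]]]$x))
present_for <- setNames(rep(1, length(candidates)), candidates)
still_running <- candidates

# Walk backwards through the earlier months
for (m in rev(months)[-1]) {
  # keep only the names whose streak continues into this earlier month
  still_running <- intersect(still_running, as.character(df5[[m]]$x))
  if (length(still_running) == 0) break
  present_for[still_running] <- present_for[still_running] + 1
}

# Report the names seen in at least two consecutive months
keep <- present_for[present_for > 1]
keep <- keep[order(names(keep))]
df4 <- data.frame(names_present = names(keep),
                  present_for = as.vector(keep))
df4
```

For the data above this gives bpa and db with a count of 2 each. rsb and svc are not reported because they are missing from Oct 2020, so their streak ends at the most recent month.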

How to subset a data frame based on another data frame in base R

Suppose there is a data frame like (call it df1)
203 Feb 2014 353
204 Feb 2015 416
205 Feb 2057 2
206 Feb 2058 1
207 Feb 2062 1
208 Feb 2064 1
209 Feb 2065 1
210 Feb 2066 4
211 Feb 2067 10
212 Feb 2068 3
213 Jan 1969 123
214 Jan 1970 120
215 Jan 1971 162
216 Jan 1972 159
217 Jan 1973 109
218 Jan 1974 98
and another data frame like this (call it df2):
1 Feb 2014
2 Jan 1974
How do I make a subset of df1 that contains only the following rows?
204 Feb 2015 416
218 Jan 1974 98
Is there a way of doing this with base R?
Assuming that your data looks like this:
df1 <- read.table(text='203 "Feb 2014" 353
204 "Feb 2015" 416
205 "Feb 2057" 2
206 "Feb 2058" 1
207 "Feb 2062" 1
208 "Feb 2064" 1
209 "Feb 2065" 1
210 "Feb 2066" 4
211 "Feb 2067" 10
212 "Feb 2068" 3
213 "Jan 1969" 123
214 "Jan 1970" 120
215 "Jan 1971" 162
216 "Jan 1972" 159
217 "Jan 1973" 109
218 "Jan 1974" 98')
df2 <- read.table(text='1 "Feb 2015"
2 "Jan 1974"')
And also assuming that you mean Feb 2015 and not Feb 2014 (because Feb 2015 is the row with 416 in the other column), you could do:
# the %in% operator flags the elements of df1$V2 that also occur in df2$V2;
# the resulting logical vector is used to subset the rows of df1
df1[df1$V2 %in% df2$V2, ]
V1 V2 V3
2 204 Feb 2015 416
16 218 Jan 1974 98
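Or try dplyr. Join on the date column only, because V1 holds unrelated row ids in the two data frames and a join on all shared columns would find no matches:
library(dplyr)
semi_join(df1, df2, by = "V2")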

How to switch a row dimension to a column dimension in R? [duplicate]

I have a df that looks like this:
Year Month Cont
1 2011 Apr 1376
2 2012 Apr 1232
3 2013 Apr 1360
4 2014 Apr 1294
5 2015 Apr 1344
6 2011 Aug 1933
7 2012 Aug 1930
8 2013 Aug 1821
9 2014 Aug 1845
10 2015 Aug 1855
So my question is: how can I turn the values in the "Month" column into columns of their own? The result should look like this:
Cont Apr Aug
1 2011 1376 1933
2 2012 1232 1930
3 2013 1360 1821
4 2014 1294 1845
5 2015 1344 1855
You can use reshape2:
library(reshape2)
dcast(df, Year~Month, value.var="Cont")
Or tidyr:
library(tidyr)
spread(df, Month, Cont)
Please refer to the following code:
> dat <- read.table("data.txt", quote="\"", comment.char="")
> dat
V1 V2 V3
1 2011 Apr 1376
2 2012 Apr 1232
3 2013 Apr 1360
4 2014 Apr 1294
5 2015 Apr 1344
6 2011 Aug 1933
7 2012 Aug 1930
8 2013 Aug 1821
9 2014 Aug 1845
10 2015 Aug 1855
> library(reshape2)
> dcast(dat, V1~V2)
Using V3 as value column: use value.var to override.
V1 Apr Aug
1 2011 1376 1933
2 2012 1232 1930
3 2013 1360 1821
4 2014 1294 1845
5 2015 1344 1855
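
For completeness, the same long-to-wide step can also be sketched in base R with reshape(), without any extra package. This is an illustration that assumes the column names Year, Month and Cont from the question; the data frame is rebuilt by hand here so that the example runs on its own.

```r
# Data from the question
df <- data.frame(
  Year = rep(2011:2015, times = 2),
  Month = rep(c("Apr", "Aug"), each = 5),
  Cont = c(1376, 1232, 1360, 1294, 1344, 1933, 1930, 1821, 1845, 1855)
)

# One row per Year, one column per Month
wide <- reshape(df, idvar = "Year", timevar = "Month", direction = "wide")

# reshape() prefixes the new columns with the value column name ("Cont.Apr");
# strip that prefix to get plain month names
names(wide) <- sub("^Cont\\.", "", names(wide))
wide
```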

How can I have a predictive Termination & Active model?

I am a newbie in R. I have a dataset with one row per year and month. Active is the number of stores in the enterprise network. Termination is the number of stores that left the network. With these two variables, I can calculate the turnover. My turnover is Termination / ((Active + Termination)) / (number of days in the month). Example: Jan. 2013, Active = 5936, Termination = 100, Turnover = 1,75%
My question is: with the dataset below, how can I estimate the Active number and the Termination number up to 12-2015?
Is it possible to have a view of the scenario?
Dataset:
Year Month Active Termination To (%)
2013 1 5936 100 1,75%
2013 2 6182 190 3,21%
2013 3 6501 117 1,91%
2013 4 6675 92 1,43%
2013 5 6749 111 1,67%
2013 6 6719 145 2,20%
2013 7 6814 121 1,83%
2013 8 6854 90 1,34%
2013 9 6972 99 1,45%
2013 10 7320 99 1,42%
2013 11 7606 98 1,33%
2013 12 7976 155 1,99%
2014 1 7934 87 1,11%
2014 2 8079 127 1,61%
2014 3 8198 125 1,56%
2014 4 8135 154 1,91%
2014 5 8113 136 1,70%
2014 6 8095 173 2,17%
2014 7 8131 220 2,76%
2014 8 7950 135 1,72%
2014 9 7978 108 1,38%
2014 10 8117 199 2,51%
2014 11 8269 117 1,45%
2014 12 8471 177 2,11%
2015 1 8472 132 1,59%
2015 2 8591 117 1,39%
2015 3 8691 161 1,90%
2015 4 8647 126 1,48%
2015 5 8623 123 1,45%
2015 6 8739 177 2,07%
2015 7 8740 218 2,55%
2015 8 8548 35 0,41%
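
As a very rough starting point, and only as a sketch: the snippet below fits a straight-line trend to each series with lm() and extrapolates it to the four missing months of 2015. The linear trend is an assumption made for this illustration, not something stated in the question, and it ignores seasonality. The names dat, fit_active, fit_term and projection are made up for the example.

```r
# Active and Termination from the dataset above, Jan 2013 to Aug 2015
active <- c(5936, 6182, 6501, 6675, 6749, 6719, 6814, 6854, 6972, 7320, 7606, 7976,
            7934, 8079, 8198, 8135, 8113, 8095, 8131, 7950, 7978, 8117, 8269, 8471,
            8472, 8591, 8691, 8647, 8623, 8739, 8740, 8548)
termination <- c(100, 190, 117, 92, 111, 145, 121, 90, 99, 99, 98, 155,
                 87, 127, 125, 154, 136, 173, 220, 135, 108, 199, 117, 177,
                 132, 117, 161, 126, 123, 177, 218, 35)

# t is a running month index: 1 = Jan 2013, 32 = Aug 2015
dat <- data.frame(t = seq_along(active), active, termination)

# Assumption: each series follows a linear trend in t
fit_active <- lm(active ~ t, data = dat)
fit_term <- lm(termination ~ t, data = dat)

# Months 33 to 36 correspond to Sep 2015 to Dec 2015
future <- data.frame(t = 33:36)
projection <- data.frame(
  Year = 2015,
  Month = 9:12,
  Active = round(predict(fit_active, newdata = future)),
  Termination = round(predict(fit_term, newdata = future))
)
projection
```

Plotting dat together with projection would give a first view of the scenario. A proper time-series model would be the next step if the trend assumption turns out to be too crude.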

Order dataframe by month

I have calculated the maximum counts per month in this data.frame:
counts <- data.frame(year = sort(rep(2000:2009, 12)), month = rep(month.abb,10), count = sample(1:500, 120, replace = T))
library(plyr)
count_max <- ddply(counts, .(month), summarise, max.count = max(count))
month max.count
1 Apr 470
2 Aug 389
3 Dec 446
4 Feb 487
5 Jan 473
6 Jul 460
7 Jun 488
8 Mar 449
9 May 488
10 Nov 464
11 Oct 483
12 Sep 394
I now want to sort count_max by the month.abb vector, so that month is in the usual order Jan-Dec. This is what I tried:
count_max[match(count_max$month, month.abb),]
...but it didn't work. How can I arrange count_max$month in the order Jan-Dec?
An alternate without conversion:
count_max[order(match(count_max$month, month.abb)), ]
# month max.count
# 5 Jan 466
# 4 Feb 356
# 8 Mar 496
# 1 Apr 489
# 9 May 498
# 7 Jun 497
# 6 Jul 491
# 2 Aug 446
# 12 Sep 414
# 11 Oct 490
# 10 Nov 416
# 3 Dec 475
Note that in your example, match(count...) returns the position of a given month in month.abb, which is what you want to sort by. You came real close, but instead of sorting by that vector, you subsetted by it. So, for example, August is the 2nd value in your original DF, but the 8th value in month.abb, so the match value for the 2nd value in your subset vector is 8, which means you are going to put the 8th row of your original data frame (in your case March), into the second position of your new DF, instead of ranking the 2nd row in your original DF into 8th position of the new one.
The distinction is a bit of a brain twister, but if you think it through it should make sense.
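
To make the distinction concrete, here is a small sketch that runs both versions side by side. The max.count values are placeholders (1 to 12), not the numbers from the question, and the names pos, wrong and right exist only in this example.

```r
# Months in the alphabetical order that ddply returned; counts are placeholders
count_max <- data.frame(
  month = c("Apr", "Aug", "Dec", "Feb", "Jan", "Jul",
            "Jun", "Mar", "May", "Nov", "Oct", "Sep"),
  max.count = 1:12
)

# Calendar position of the month in each row: 4 8 12 2 1 7 6 3 5 11 10 9
pos <- match(count_max$month, month.abb)

# Subsetting by pos treats the calendar positions as row numbers
wrong <- count_max[pos, ]

# order(pos) answers "which row comes first, second, ..." instead
right <- count_max[order(pos), ]

as.character(wrong$month[2])   # "Mar": row 8 of count_max lands in position 2
as.character(right$month)      # "Jan" "Feb" ... "Dec"
```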
Convert your "month" column into an ordered factor:
factor(count_max$month, month.abb, ordered=TRUE)
# [1] Apr Aug Dec Feb Jan Jul Jun Mar May Nov Oct Sep
# Levels: Jan < Feb < Mar < Apr < May < Jun < Jul < Aug < Sep < Oct < Nov < Dec
Example:
count_max$month <- factor(count_max$month, month.abb, ordered=TRUE)
count_max[order(count_max$month), ]
# month max.count
# 5 Jan 482
# 4 Feb 408
# 8 Mar 483
# 1 Apr 489
# 9 May 369
# 7 Jun 432
# 6 Jul 344
# 2 Aug 470
# 12 Sep 474
# 11 Oct 450
# 10 Nov 492
# 3 Dec 366
