R order numeric values

My question is trivial, but somehow I cannot find how to sort numbers. I would like the result to be ordered by group and then by rank (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14).
means <- ddply(Data, ~Group ~rank, summarise, mean=mean(Foo))
#My column types
str(means)
#'data.frame': 56 obs. of 3 variables:
# $ Group: chr "dEC" "dEC" "dEC" "dEC" ...
# $ rank : chr "1" "10" "11" "12" ...
# $ mean : num 41.4 67.4 NA 65.9 71.3 ...
#means
Group rank mean
1 dEC 1 41.37500
2 dEC 10 67.37500
3 dEC 11 NA
4 dEC 12 65.88889
5 dEC 13 71.33333
6 dEC 14 69.87500
7 dEC 2 60.87500
8 dEC 3 65.75000
9 dEC 4 66.00000
10 dEC 5 64.50000
11 dEC 6 70.25000
12 dEC 7 66.75000
13 dEC 8 65.12500
14 dEC 9 68.75000
15 Sham - dEC 1 46.90909
16 Sham - dEC 10 67.54545
17 Sham - dEC 11 68.90909
18 Sham - dEC 12 70.00000
19 Sham - dEC 13 68.36364
20 Sham - dEC 14 71.27273
21 Sham - dEC 2 55.72727
22 Sham - dEC 3 62.09091
23 Sham - dEC 4 61.54545
24 Sham - dEC 5 66.09091
25 Sham - dEC 6 67.63636
26 Sham - dEC 7 66.09091
27 Sham - dEC 8 65.90909
28 Sham - dEC 9 65.81818
#Desired results
#Ordered means
Group rank mean
1 dEC 1 41.37500
7 dEC 2 60.87500
8 dEC 3 65.75000
9 dEC 4 66.00000
10 dEC 5 64.50000
11 dEC 6 70.25000
12 dEC 7 66.75000
13 dEC 8 65.12500
14 dEC 9 68.75000
2 dEC 10 67.37500
3 dEC 11 NA
4 dEC 12 65.88889
5 dEC 13 71.33333
6 dEC 14 69.87500
15 Sham - dEC 1 46.90909
21 Sham - dEC 2 55.72727
22 Sham - dEC 3 62.09091
23 Sham - dEC 4 61.54545
24 Sham - dEC 5 66.09091
25 Sham - dEC 6 67.63636
26 Sham - dEC 7 66.09091
27 Sham - dEC 8 65.90909
28 Sham - dEC 9 65.81818
16 Sham - dEC 10 67.54545
17 Sham - dEC 11 68.90909
18 Sham - dEC 12 70.00000
19 Sham - dEC 13 68.36364
20 Sham - dEC 14 71.27273

The rank column is character, not numeric, so it sorts lexically ("1", "10", "11", ..., "2"). Convert it from 'character' to 'numeric' inside order() and order by 'Group' and then 'rank':
means[with(means, order(Group,as.numeric(rank))),]
Another option is arrange() from plyr (as commented by @Wistar):
library(plyr)
arrange(means, Group, as.numeric(rank))
If we are using dplyr, all the steps can be chained together (not tested)
library(dplyr)
Data %>%
group_by(Group, rank) %>%
summarise(mean=mean(Foo)) %>%
arrange(Group, as.numeric(rank))
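The root cause is that rank is stored as text, so it may be simpler to convert it once instead of wrapping every sort in as.numeric(). A minimal base R sketch on a hypothetical four-rank subset of the table above (means rounded), assuming rank always holds whole numbers:

```r
# Hypothetical toy data mimicking `means`: rank is stored as character
means <- data.frame(
  Group = rep(c("dEC", "Sham - dEC"), each = 4),
  rank  = rep(c("1", "10", "2", "9"), times = 2),
  mean  = c(41.4, 67.4, 60.9, 68.8, 46.9, 67.5, 55.7, 65.8),
  stringsAsFactors = FALSE
)

# Character sorting is lexical, so "10" lands before "2"
sort(unique(means$rank))

# Convert once, so every later sort (order, arrange, plots) is numeric
means$rank <- as.numeric(means$rank)
means_sorted <- means[order(means$Group, means$rank), ]
means_sorted
```

After the conversion, arrange(means, Group, rank) works the same way without as.numeric().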

Related

Mutate Year with Month Column for a Time Series Data Input in R Using Lubridate Package

I have this time series data frame as follows:
df <- read.table(text =
"Year Month Value
2021 1 4
2021 2 11
2021 3 18
2021 4 6
2021 5 20
2021 6 5
2021 7 12
2021 8 4
2021 9 11
2021 10 18
2021 11 6
2021 12 20
2022 1 14
2022 2 11
2022 3 18
2022 4 9
2022 5 22
2022 6 19
2022 7 22
2022 8 24
2022 9 17
2022 10 28
2022 11 16
2022 12 26",
header = TRUE)
I want to turn this data frame into a time series object with only a date column and a value column, so that I can use the ts function to set the start and end points, e.g. ts(ts, start = starts, frequency = 12). R should know that 2022 is a year and that the corresponding 1:12 are its months; the same applies to 2021. I would prefer to use the lubridate package.
pacman::p_load(
dplyr,
lubridate)
UPDATE
I am now using the unite function from the tidyr package (it is not part of dplyr):
df|>
unite(col='date', c('Year', 'Month'), sep='')
Perhaps this?
df |>
tidyr::unite(col='date', c('Year', 'Month'), sep='-') |>
mutate(date = lubridate::ym(date))
# date Value
# 1 2021-01-01 4
# 2 2021-02-01 11
# 3 2021-03-01 18
# 4 2021-04-01 6
# 5 2021-05-01 20
# 6 2021-06-01 5
# 7 2021-07-01 12
# 8 2021-08-01 4
# 9 2021-09-01 11
# 10 2021-10-01 18
# 11 2021-11-01 6
# 12 2021-12-01 20
# 13 2022-01-01 14
# 14 2022-02-01 11
# 15 2022-03-01 18
# 16 2022-04-01 9
# 17 2022-05-01 22
# 18 2022-06-01 19
# 19 2022-07-01 22
# 20 2022-08-01 24
# 21 2022-09-01 17
# 22 2022-10-01 28
# 23 2022-11-01 16
# 24 2022-12-01 26
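Since the stated goal is a ts object whose start and end can be chosen, a date column is not strictly required: ts() only needs the values, start = c(year, month) and frequency = 12, and window() from the stats package then subsets by date. A minimal base R sketch, assuming the rows are sorted by Year and Month with no missing months:

```r
# Rebuild the example data (same values as in the question)
df <- data.frame(
  Year  = rep(2021:2022, each = 12),
  Month = rep(1:12, times = 2),
  Value = c(4, 11, 18, 6, 20, 5, 12, 4, 11, 18, 6, 20,
            14, 11, 18, 9, 22, 19, 22, 24, 17, 28, 16, 26)
)

# Monthly series: start is c(year, period); frequency = 12 makes periods months.
# Assumes rows are already ordered by Year, then Month, with no gaps.
value_ts <- ts(df$Value, start = c(df$Year[1], df$Month[1]), frequency = 12)

# Pick the start and end points with window() rather than calling ts() again
sub_ts <- window(value_ts, start = c(2021, 6), end = c(2022, 3))
sub_ts

# If a Date column is still wanted, base R can build it without lubridate
df$date <- as.Date(sprintf("%d-%02d-01", df$Year, df$Month))
```

The lubridate::ym() answer above gives the same dates; the sketch only shows that the ts step itself works from the Year and Month columns directly.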

R loop over nominal list and integers

I have a dataset where I have been able to loop over different test values with ppois. For simplicity's sake, I have used an average of 4 events per month, and I wanted to know the likelihood of n or more events given that average. Here is what I have managed to make work:
MonthlyAverage <- 4
cnt <- c(0:10)
for (i in cnt) {
CountProb <- ppois(cnt,MonthlyAverage,lower.tail=FALSE)
}
dfProb <- data.frame(cnt,CountProb)
I am interested in investigating this to figure out how many events I may expect each month given the mean of that month.
I would be looking to say:
For January, what is the probability of 0
For January, what is the probability of 1
For January, what is the probability of 2
etc...
For February, what is the probability of 0
For February, what is the probability of 1
For February, what is the probability of 2
etc.
To give something like (numbers here are just an example):
I thought about using one loop to select the correct month, then removing the month column so that I am left with just the single "Monthly Average" value, and then performing the count loop, but that doesn't seem to work: I still get "Non-numeric argument to mathematical function". I feel like I'm close; can anyone point me in the right direction for the formatting?
A "tidy-style" solution:
library(tidyr)
library(dplyr)
## example data:
df <- data.frame(Month = c('Jan', 'Feb'),
MonthlyAverage = c(5, 2)
)
df
#>   Month MonthlyAverage
#> 1   Jan              5
#> 2   Feb              2
df |>
mutate(n = list(1:10)) |>
unnest_longer(n) |>
mutate(CountProb = ppois(n, MonthlyAverage,
lower.tail=FALSE
)
)
# A tibble: 20 x 4
Month MonthlyAverage n CountProb
<chr> <dbl> <int> <dbl>
1 Jan 5 1 0.960
2 Jan 5 2 0.875
3 Jan 5 3 0.735
4 Jan 5 4 0.560
5 Jan 5 5 0.384
6 Jan 5 6 0.238
## ...
How about something like this:
cnt <- 0:10
MonthlyAverage <- c(1.8, 1.56, 2.44, 1.86, 2.1, 2.3, 2, 2.78, 1.89, 1.86, 1.4, 1.71)
grid <- expand.grid(cnt =cnt, m_num = 1:12)
grid$MonthlyAverage <- MonthlyAverage[grid$m_num]
mnames <- c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")
grid$month <- mnames[grid$m_num]
grid$prob <- ppois(grid$cnt, grid$MonthlyAverage, lower.tail=FALSE)
grid[,c("month", "cnt", "prob")]
#> month cnt prob
#> 1 Jan 0 8.347011e-01
#> 2 Jan 1 5.371631e-01
#> 3 Jan 2 2.693789e-01
#> 4 Jan 3 1.087084e-01
#> 5 Jan 4 3.640666e-02
#> 6 Jan 5 1.037804e-02
#> 7 Jan 6 2.569450e-03
#> 8 Jan 7 5.615272e-04
#> 9 Jan 8 1.097446e-04
#> 10 Jan 9 1.938814e-05
#> 11 Jan 10 3.123964e-06
#> 12 Feb 0 7.898639e-01
#> 13 Feb 1 4.620517e-01
#> 14 Feb 2 2.063581e-01
#> 15 Feb 3 7.339743e-02
#> 16 Feb 4 2.154277e-02
#> 17 Feb 5 5.364120e-03
#> 18 Feb 6 1.157670e-03
#> 19 Feb 7 2.202330e-04
#> 20 Feb 8 3.743272e-05
#> 21 Feb 9 5.747339e-06
#> 22 Feb 10 8.044197e-07
#> 23 Mar 0 9.128391e-01
#> 24 Mar 1 7.001667e-01
#> 25 Mar 2 4.407062e-01
#> 26 Mar 3 2.296784e-01
#> 27 Mar 4 1.009515e-01
#> 28 Mar 5 3.813271e-02
#> 29 Mar 6 1.258642e-02
#> 30 Mar 7 3.681711e-03
#> 31 Mar 8 9.657751e-04
#> 32 Mar 9 2.294546e-04
#> 33 Mar 10 4.979244e-05
#> 34 Apr 0 8.443274e-01
#> 35 Apr 1 5.547763e-01
#> 36 Apr 2 2.854938e-01
#> 37 Apr 3 1.185386e-01
#> 38 Apr 4 4.090445e-02
#> 39 Apr 5 1.202455e-02
#> 40 Apr 6 3.071778e-03
#> 41 Apr 7 6.928993e-04
#> 42 Apr 8 1.398099e-04
#> 43 Apr 9 2.550478e-05
#> 44 Apr 10 4.244028e-06
#> 45 May 0 8.775436e-01
#> 46 May 1 6.203851e-01
#> 47 May 2 3.503686e-01
#> 48 May 3 1.613572e-01
#> 49 May 4 6.212612e-02
#> 50 May 5 2.044908e-02
#> 51 May 6 5.862118e-03
#> 52 May 7 1.486029e-03
#> 53 May 8 3.373058e-04
#> 54 May 9 6.927041e-05
#> 55 May 10 1.298297e-05
#> 56 Jun 0 8.997412e-01
#> 57 Jun 1 6.691458e-01
#> 58 Jun 2 4.039612e-01
#> 59 Jun 3 2.006529e-01
#> 60 Jun 4 8.375072e-02
#> 61 Jun 5 2.997569e-02
#> 62 Jun 6 9.361934e-03
#> 63 Jun 7 2.588841e-03
#> 64 Jun 8 6.415773e-04
#> 65 Jun 9 1.439431e-04
#> 66 Jun 10 2.948727e-05
#> 67 Jul 0 8.646647e-01
#> 68 Jul 1 5.939942e-01
#> 69 Jul 2 3.233236e-01
#> 70 Jul 3 1.428765e-01
#> 71 Jul 4 5.265302e-02
#> 72 Jul 5 1.656361e-02
#> 73 Jul 6 4.533806e-03
#> 74 Jul 7 1.096719e-03
#> 75 Jul 8 2.374473e-04
#> 76 Jul 9 4.649808e-05
#> 77 Jul 10 8.308224e-06
#> 78 Aug 0 9.379615e-01
#> 79 Aug 1 7.654944e-01
#> 80 Aug 2 5.257652e-01
#> 81 Aug 3 3.036162e-01
#> 82 Aug 4 1.492226e-01
#> 83 Aug 5 6.337975e-02
#> 84 Aug 6 2.360590e-02
#> 85 Aug 7 7.809999e-03
#> 86 Aug 8 2.320924e-03
#> 87 Aug 9 6.254093e-04
#> 88 Aug 10 1.540564e-04
#> 89 Sep 0 8.489282e-01
#> 90 Sep 1 5.634025e-01
#> 91 Sep 2 2.935807e-01
#> 92 Sep 3 1.235929e-01
#> 93 Sep 4 4.327373e-02
#> 94 Sep 5 1.291307e-02
#> 95 Sep 6 3.349459e-03
#> 96 Sep 7 7.672845e-04
#> 97 Sep 8 1.572459e-04
#> 98 Sep 9 2.913775e-05
#> 99 Sep 10 4.925312e-06
#> 100 Oct 0 8.443274e-01
#> 101 Oct 1 5.547763e-01
#> 102 Oct 2 2.854938e-01
#> 103 Oct 3 1.185386e-01
#> 104 Oct 4 4.090445e-02
#> 105 Oct 5 1.202455e-02
#> 106 Oct 6 3.071778e-03
#> 107 Oct 7 6.928993e-04
#> 108 Oct 8 1.398099e-04
#> 109 Oct 9 2.550478e-05
#> 110 Oct 10 4.244028e-06
#> 111 Nov 0 7.534030e-01
#> 112 Nov 1 4.081673e-01
#> 113 Nov 2 1.665023e-01
#> 114 Nov 3 5.372525e-02
#> 115 Nov 4 1.425330e-02
#> 116 Nov 5 3.201149e-03
#> 117 Nov 6 6.223149e-04
#> 118 Nov 7 1.065480e-04
#> 119 Nov 8 1.628881e-05
#> 120 Nov 9 2.248494e-06
#> 121 Nov 10 2.828495e-07
#> 122 Dec 0 8.191342e-01
#> 123 Dec 1 5.098537e-01
#> 124 Dec 2 2.454189e-01
#> 125 Dec 3 9.469102e-02
#> 126 Dec 4 3.025486e-02
#> 127 Dec 5 8.217692e-03
#> 128 Dec 6 1.937100e-03
#> 129 Dec 7 4.028407e-04
#> 130 Dec 8 7.489285e-05
#> 131 Dec 9 1.258275e-05
#> 132 Dec 10 1.927729e-06
Created on 2023-01-09 by the reprex package (v2.0.1)
If you have each month's mean, in base R you can use sapply to compute, for each month, the probability of observing more than 0, 1, ..., 10 events (ppois with lower.tail = FALSE gives P(X > n)). Then combine the results in a data frame:
# Data
df <- data.frame(month = month.name,
mean = c(1.8, 2.8, 1.7, 1.6, 1.8, 2,
2.3, 2.4, 2.1, 1.4, 1.9, 1.9))
probs <- sapply(1:12, function(x) ppois(0:10, df$mean[x], lower.tail = FALSE))
finaldata <- data.frame(month = rep(month.name, each = 11),
events = rep(0:10, times = 12),
prob = as.vector(probs))
Output:
# month events prob
# 1 January 0 8.347011e-01
# 2 January 1 5.371631e-01
# 3 January 2 2.693789e-01
# 4 January 3 1.087084e-01
# 5 January 4 3.640666e-02
# 6 January 5 1.037804e-02
# 7 January 6 2.569450e-03
# 8 January 7 5.615272e-04
# 9 January 8 1.097446e-04
# 10 January 9 1.938814e-05
# 11 January 10 3.123964e-06
# 12 February 0 9.391899e-01
# 13 February 1 7.689218e-01
# 14 February 2 5.305463e-01
# 15 February 3 3.080626e-01
# ...
# 131 December 9 3.044317e-05
# 132 December 10 5.172695e-06
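Note that the question asks for "n or more" events, while ppois(n, lambda, lower.tail = FALSE) returns P(X > n), strictly more than n; the answers above follow the question's code and therefore report P(X > n). A minimal base R sketch computing both tails for every month at once with outer(), which also removes the need for an explicit loop. The three monthly means are placeholders taken from the sapply answer:

```r
# Placeholder monthly means (illustration only; replace with your own)
monthly_mean <- c(Jan = 1.8, Feb = 2.8, Mar = 1.7)
events <- 0:10

# Tail conventions:
#   ppois(n, lambda, lower.tail = FALSE)     is P(X >  n), strictly more than n
#   ppois(n - 1, lambda, lower.tail = FALSE) is P(X >= n), n or more
# outer() evaluates ppois on every (events, lambda) pair; lower.tail is passed on.
p_more_than <- outer(events, monthly_mean, ppois, lower.tail = FALSE)
p_at_least  <- outer(events - 1, monthly_mean, ppois, lower.tail = FALSE)

# One row per month and event count; as.vector() reads the matrix column by
# column, i.e. month by month, matching rep(..., each = length(events)).
result <- data.frame(
  month       = rep(names(monthly_mean), each = length(events)),
  events      = rep(events, times = length(monthly_mean)),
  p_more_than = as.vector(p_more_than),
  p_at_least  = as.vector(p_at_least)
)
head(result)
```

The two columns differ by exactly dpois(n, lambda), the probability of observing n events.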

How to change grouped data into ungrouped data

I have grouped data that I want to convert to ungrouped data.
year<-c(rep(2014,4),rep(2015,4))
Age<-rep(c(22,23,24,25),2)
n<-c(1,1,3,2,0,2,3,1)
mydata<-data.frame(year,Age,n)
I would like to have a dataset like the one below created from the previous one.
year Age
1 2014 22
2 2014 23
3 2014 24
4 2014 24
5 2014 24
6 2014 25
7 2014 25
8 2015 23
9 2015 23
10 2015 24
11 2015 24
12 2015 24
13 2015 25
Try
mydata[rep(1:nrow(mydata),mydata$n),]
year Age n
1 2014 22 1
2 2014 23 1
3 2014 24 3
3.1 2014 24 3
3.2 2014 24 3
4 2014 25 2
4.1 2014 25 2
6 2015 23 2
6.1 2015 23 2
7 2015 24 3
7.1 2015 24 3
7.2 2015 24 3
8 2015 25 1
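A small base R refinement of the rep() answer: keep only year and Age, and reset the row names (3.1, 3.2, ...) that duplicated indices produce. Rows with n = 0 disappear automatically because rep() repeats them zero times. A minimal sketch:

```r
year <- c(rep(2014, 4), rep(2015, 4))
Age  <- rep(c(22, 23, 24, 25), 2)
n    <- c(1, 1, 3, 2, 0, 2, 3, 1)
mydata <- data.frame(year, Age, n)

# Repeat each row index n times; a count of 0 drops that row
idx <- rep(seq_len(nrow(mydata)), times = mydata$n)
ungrouped <- mydata[idx, c("year", "Age")]

# Replace the "3.1", "3.2", ... row names with 1..N
rownames(ungrouped) <- NULL
ungrouped
```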
Here's a tidyverse solution:
library(tidyverse)
mydata %>%
uncount(n)
which gives:
year Age
1 2014 22
2 2014 23
3 2014 24
4 2014 24
5 2014 24
6 2014 25
7 2014 25
8 2015 23
9 2015 23
10 2015 24
11 2015 24
12 2015 24
13 2015 25
You can also use tidyr syntax for this:
library(tidyr)
year<-c(rep(2014,4),rep(2015,4))
Age<-rep(c(22,23,24,25),2)
n<-c(1,1,3,2,0,2,3,1)
mydata<-data.frame(year,Age,n)
uncount(mydata, n)
#> year Age
#> 1 2014 22
#> 2 2014 23
#> 3 2014 24
#> 4 2014 24
#> 5 2014 24
#> 6 2014 25
#> 7 2014 25
#> 8 2015 23
#> 9 2015 23
#> 10 2015 24
#> 11 2015 24
#> 12 2015 24
#> 13 2015 25
But of course you shouldn't use tidyr just because it is tidyr :) See also: An alternate view of the Tidyverse "dialect" of the R language, and its promotion by RStudio.
We can use tidyr::complete
library(tidyr)
library(dplyr)
mydata %>%
filter(n > 0) %>% # without this, complete() keeps the n = 0 row (2015, 22)
group_by(year, Age) %>%
complete(n = seq_len(n)) %>%
select(-n) %>%
ungroup()
# A tibble: 13 × 2
year Age
<dbl> <dbl>
1 2014 22
2 2014 23
3 2014 24
4 2014 24
5 2014 24
6 2014 25
7 2014 25
8 2015 23
9 2015 23
10 2015 24
11 2015 24
12 2015 24
13 2015 25

Daily Average of Time series derived from monthly data R monthdays()

I have a monthly time series object ts, printed in full below, with data from Jan 2013 to Dec 2017. I am trying to find the daily average, i.e. each value divided by the number of days in its month.
Expected output
The first value in ts, for Jan 2013, is 23770; I want it to become 23770/31, since January has 31 days. The second value, for Feb 2013, is 23482; I want 23482/28, since Feb 2013 had 28 days, and so on.
Tried so far:
I know monthdays() can do this: something like ts/monthdays(), since monthdays() returns the number of days in a month. I am not able to implement it here. I also read about tapply, but it does not give me the desired result, since I need a value for each month-year combination.
ts
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2013 23770 23482 23601 22889 23401 24240 23873 23647 23378 23871 22624 23496
2014 26765 27619 26341 27320 27389 27418 26874 27005 27538 26324 27267 27583
2015 28354 27452 28336 28998 28595 28338 27806 28660 27226 28317 28666 28574
2016 30209 30659 31554 30248 30358 31091 30389 30247 31227 31839 30602 30609
2017 32180 32203 31639 31784 32375 30856 31863 32827 32506 31702 31681 32176
> cycle(ts_actual_group2)
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2013 1 2 3 4 5 6 7 8 9 10 11 12
2014 1 2 3 4 5 6 7 8 9 10 11 12
2015 1 2 3 4 5 6 7 8 9 10 11 12
2016 1 2 3 4 5 6 7 8 9 10 11 12
2017 1 2 3 4 5 6 7 8 9 10 11 12
I tried tapply, since I had read about it, but it does not give the desired output:
tapply(ts_actual_group2, cycle(ts_actual_group2), mean)
1 2 3 4 5 6 7 8 9 10 11 12
28255.6 28283.0 28294.2 28247.8 28423.6 28388.6 28161.0 28477.2 28375.0 28410.6 28168.0 28487.6
I am not able to implement it here.
I'm not sure why you couldn't. The monthdays function from the forecast package, when applied to a ts object, returns the number of days in each month of the series. The object returned is a time-series of the same dimension as the input. So you can simply divide them.
library(forecast)
ts/monthdays(ts)
Jan Feb Mar Apr May Jun
2013 766.7742 838.6429 761.3226 762.9667 754.8710 808.0000
2014 863.3871 986.3929 849.7097 910.6667 883.5161 913.9333
2015 914.6452 980.4286 914.0645 966.6000 922.4194 944.6000
2016 974.4839 1057.2069 1017.8710 1008.2667 979.2903 1036.3667
2017 1038.0645 1150.1071 1020.6129 1059.4667 1044.3548 1028.5333
monthdays(ts) # Accepts a time-series object
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2013 31 28 31 30 31 30 31 31 30 31 30 31
2014 31 28 31 30 31 30 31 31 30 31 30 31
2015 31 28 31 30 31 30 31 31 30 31 30 31
2016 31 29 31 30 31 30 31 31 30 31 30 31
2017 31 28 31 30 31 30 31 31 30 31 30 31
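If you would rather not depend on the forecast package, the days-per-month vector can be built in base R from the series' start date. A minimal sketch, assuming a regular monthly series (frequency 12) with no gaps; to stay short it uses only the first six 2013 values from the question:

```r
# Monthly series: the first six 2013 values from the question
x <- ts(c(23770, 23482, 23601, 22889, 23401, 24240),
        start = c(2013, 1), frequency = 12)

# First day of the first month in the series (start() returns c(year, month))
s <- start(x)
first_day <- as.Date(sprintf("%d-%02d-01", as.integer(s[1]), as.integer(s[2])))

# First day of every month in the series plus one extra month;
# the gaps between consecutive dates are the month lengths in days
month_starts  <- seq(first_day, by = "month", length.out = length(x) + 1)
days_in_month <- as.numeric(diff(month_starts))

# Dividing a ts by a plain numeric vector of equal length keeps it a ts
daily_avg <- x / days_in_month
daily_avg
```

Because the month lengths come from real calendar dates, leap years such as Feb 2016 get 29 days automatically.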

Sum of previous five years

For each row, I need the sum of N_C over a five-year window: the row's year plus the four years before it.
For example, for 2017: Sum_Five_Year = 10 (2017) + 21 (2015) + 14 (2014) + 16 (2013) = 61 (there is no 2016 data).
Data:
library(dplyr)
DF<-data.frame(company = c("DEL MAR PHARM","DEL MAR PHARM","DEL MAR PHARM","DEL MAR PHARM","DEL MAR PHARM","DEL MAR PHARM","DEL MAR PHARM","DEL MAR PHARM","DEL MAR PHARM","DEL MAR PHARM","DEL MAR PHARM","DEL MAR PHARM","DEL MAR PHARM","DEL MAR PHARM","DEL MAR PHARM","DEL MAR PHARM"),
year= c("2017","2015","2015","2015","2013","2012","2012","2012","2010","2010","2015","2014","2014","2013","2013","2012"),
N_C= c("10","7","5","4","3","24","52","99","43","37","5","7","7","4","9","20"), Sum_Year = c("10","21","21","21","16","195","195","195","80","80","21","14","14","16","16","195"))
DF <- DF %>% arrange(year)
company year N_C Sum_Year
1 DEL MAR PHARM 2010 43 80
2 DEL MAR PHARM 2010 37 80
3 DEL MAR PHARM 2012 24 195
4 DEL MAR PHARM 2012 52 195
5 DEL MAR PHARM 2012 99 195
6 DEL MAR PHARM 2012 20 195
7 DEL MAR PHARM 2013 3 16
8 DEL MAR PHARM 2013 4 16
9 DEL MAR PHARM 2013 9 16
10 DEL MAR PHARM 2014 7 14
11 DEL MAR PHARM 2014 7 14
12 DEL MAR PHARM 2015 7 21
13 DEL MAR PHARM 2015 5 21
14 DEL MAR PHARM 2015 4 21
15 DEL MAR PHARM 2015 5 21
16 DEL MAR PHARM 2017 10 10
Expected Outcome
DF$Sum_Five_Year <- c("80","80","275","275","275","275","291","291","291","305","305","246","246","246","246","61")
> DF
company year N_C Sum_Year Sum_Five_Year
1 DEL MAR PHARM 2010 43 80 80
2 DEL MAR PHARM 2010 37 80 80
3 DEL MAR PHARM 2012 24 195 275
4 DEL MAR PHARM 2012 52 195 275
5 DEL MAR PHARM 2012 99 195 275
6 DEL MAR PHARM 2012 20 195 275
7 DEL MAR PHARM 2013 3 16 291
8 DEL MAR PHARM 2013 4 16 291
9 DEL MAR PHARM 2013 9 16 291
10 DEL MAR PHARM 2014 7 14 305
11 DEL MAR PHARM 2014 7 14 305
12 DEL MAR PHARM 2015 7 21 246
13 DEL MAR PHARM 2015 5 21 246
14 DEL MAR PHARM 2015 4 21 246
15 DEL MAR PHARM 2015 5 21 246
16 DEL MAR PHARM 2017 10 10 61
I have tried the following code but it does not work:
library(data.table)
setDT(DF)
DF[, `:=` (Sum_Five_Year= sum(N_C)), by= list(company,cut(year, breaks = c(5), right = F))]
Any suggestion would be very appreciated :)
Without any packages beyond dplyr, you could use sapply inside mutate().
The code below assumes that Sum_Year has already been created. You could apply the following directly to your example:
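distinct(DF, company, year, Sum_Year) %>%
group_by(company) %>%
mutate(
year = as.integer(as.character(year)),
Sum_Year = as.numeric(as.character(Sum_Year)), # Sum_Year is stored as character
Sum_Five_Year = sapply(year, function(x) sum(Sum_Year[between(year, x - 5 + 1, x)]))
) %>%
# year must have the same type on both sides of the join
left_join(DF %>% select(-Sum_Year) %>% mutate(year = as.integer(as.character(year))),
by = c("company", "year"))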
Output:
# A tibble: 16 x 5
# Groups: company [?]
company year Sum_Year Sum_Five_Year N_C
<chr> <int> <int> <int> <int>
1 DELMARPHARM 2010 80 80 43
2 DELMARPHARM 2010 80 80 37
3 DELMARPHARM 2012 195 275 24
4 DELMARPHARM 2012 195 275 52
5 DELMARPHARM 2012 195 275 99
6 DELMARPHARM 2012 195 275 20
7 DELMARPHARM 2013 16 291 3
8 DELMARPHARM 2013 16 291 4
9 DELMARPHARM 2013 16 291 9
10 DELMARPHARM 2014 14 305 7
11 DELMARPHARM 2014 14 305 7
12 DELMARPHARM 2015 21 246 7
13 DELMARPHARM 2015 21 246 5
14 DELMARPHARM 2015 21 246 4
15 DELMARPHARM 2015 21 246 5
16 DELMARPHARM 2017 10 61 10
Otherwise you can do:
DF %>%
group_by(company, year) %>%
mutate(N_C = as.numeric(as.character(N_C))) %>%
summarise(Sum_Year = sum(N_C)) %>%
mutate(
year = as.integer(as.character(year)),
Sum_Five_Year = sapply(year, function(x) sum(Sum_Year[between(year, x - 5 + 1, x)]))
) %>%
# year must have the same type on both sides of the join
left_join(DF %>% select(-Sum_Year) %>% mutate(year = as.integer(as.character(year))),
by = c("company", "year"))
If you don't need one row per original observation, leave out the join at the end:
DF %>%
group_by(company, year) %>%
mutate(N_C = as.numeric(as.character(N_C))) %>%
summarise(Sum_Year = sum(N_C)) %>%
mutate(
year = as.integer(as.character(year)),
Sum_Five_Year = sapply(year, function(x) sum(Sum_Year[between(year, x - 5 + 1, x)]))
)
Output:
# A tibble: 6 x 4
# Groups: company [1]
company year Sum_Year Sum_Five_Year
<chr> <int> <dbl> <dbl>
1 DELMARPHARM 2010 80 80
2 DELMARPHARM 2012 195 275
3 DELMARPHARM 2013 16 291
4 DELMARPHARM 2014 14 305
5 DELMARPHARM 2015 21 246
6 DELMARPHARM 2017 10 61
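For comparison, a base R sketch of the same rolling window (the current year plus the four before it), without dplyr or data.table. It uses the N_C values as printed in the question's table (2017 = 10), converts the character columns to numbers first, and assumes years are whole numbers:

```r
DF <- data.frame(
  company = "DEL MAR PHARM",
  year = c("2017","2015","2015","2015","2013","2012","2012","2012",
           "2010","2010","2015","2014","2014","2013","2013","2012"),
  N_C  = c("10","7","5","4","3","24","52","99","43","37","5","7","7","4","9","20")
)
DF$year <- as.integer(DF$year)
DF$N_C  <- as.numeric(DF$N_C)

# One row per company and year, holding the yearly total of N_C
yearly <- aggregate(N_C ~ company + year, data = DF, FUN = sum)
names(yearly)[names(yearly) == "N_C"] <- "Sum_Year"

# For each company-year, add up Sum_Year over years in [year - 4, year]
yearly$Sum_Five_Year <- sapply(seq_len(nrow(yearly)), function(i) {
  same_co <- yearly$company == yearly$company[i]
  in_win  <- yearly$year >= yearly$year[i] - 4 & yearly$year <= yearly$year[i]
  sum(yearly$Sum_Year[same_co & in_win])
})

# Attach the window sums back to every original row
out <- merge(DF, yearly, by = c("company", "year"))
out[order(out$year), ]
```

The window is defined on calendar years, so a missing year (here 2016) simply contributes nothing rather than shifting the window.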
