Summarizing percentages by subgroup in R

I don't quite know how to explain my problem, but I want to summarize the distance categories and get the percentage for each distance per month. In my table each week totals 100%, and now I want to calculate the same for each month, based on the weekly percentages.
Something like sum(percent) / number of weeks in that month.
This is what I have:
year month year_week distance object_remarks weeksum percent
1 2017 05 2017_21 15 ctenolabrus_rupestris 3 0.75
2 2017 05 2017_21 10 ctenolabrus_rupestris 1 0.25
3 2017 05 2017_22 5 ctenolabrus_rupestris 5 0.833
4 2017 05 2017_22 0 ctenolabrus_rupestris 1 0.167
5 2017 06 2017_22 0 ctenolabrus_rupestris 9 1
6 2017 06 2017_23 20 ctenolabrus_rupestris 6 0.545
7 2017 06 2017_23 0 ctenolabrus_rupestris 5 0.455
I want to have an output like this:
year month distance object_remarks weeksum percent percent_month
1 2017 05 15 ctenolabrus_rupestris 3 0.75 0.375
2 2017 05 10 ctenolabrus_rupestris 1 0.25 0.125
3 2017 05 5 ctenolabrus_rupestris 5 0.833 0.4165
4 2017 05 0 ctenolabrus_rupestris 1 0.167 0.0835
5 2017 06 0 ctenolabrus_rupestris 14 1.455 0.7275
6 2017 06 20 ctenolabrus_rupestris 6 0.545 0.2725
Thanks a lot!

You may need to use group_by() twice.
df %>%
  select(-year_week) %>%
  group_by(month, distance) %>%
  mutate(percent = sum(percent), weeksum = sum(weeksum)) %>%
  distinct() %>%
  group_by(month) %>%
  mutate(percent_month = percent / sum(percent))
# A tibble: 6 x 7
# Groups: month [2]
# year month distance object_remarks weeksum percent percent_month
# <int> <int> <int> <chr> <int> <dbl> <dbl>
# 1 2017 5 15 ctenolabrus_rupestris 3 0.75 0.375
# 2 2017 5 10 ctenolabrus_rupestris 1 0.25 0.125
# 3 2017 5 5 ctenolabrus_rupestris 5 0.833 0.416
# 4 2017 5 0 ctenolabrus_rupestris 1 0.167 0.0835
# 5 2017 6 0 ctenolabrus_rupestris 14 1.46 0.728
# 6 2017 6 20 ctenolabrus_rupestris 6 0.545 0.272
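If you want the month percentage computed exactly as described in the question (sum of the weekly percentages divided by the number of weeks in the month), a variant is to count the distinct weeks with n_distinct() before dropping year_week. This is only a sketch, not the answer above, and it assumes each week's percentages sum to 1:
library(dplyr)
df %>%
  group_by(year, month) %>%
  mutate(n_weeks = n_distinct(year_week)) %>%
  group_by(year, month, distance, object_remarks) %>%
  summarise(weeksum = sum(weeksum),
            percent = sum(percent),
            percent_month = percent / first(n_weeks),  # divide by weeks in the month
            .groups = "drop")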


Annual moving window over a data frame

I have a data frame of discharge data. Below is a reproducible example:
library(lubridate)
Date <- sample(seq(as.Date('1981/01/01'), as.Date('1982/12/31'), by="day"), 24)
Date <- sort(Date, decreasing = F)
Station <- rep(as.character("A"), 24)
Discharge <- rnorm(n = 24, mean = 1, 1)
df <- cbind.data.frame(Station, Date, Discharge)
df$Year <- year(df$Date)
df$Month <- month(df$Date)
df$Day <- day(df$Date)
The output:
> df
Station Date Discharge Year Month Day
1 A 1981-01-23 0.75514968 1981 1 23
2 A 1981-02-17 -0.08552776 1981 2 17
3 A 1981-03-20 1.47586712 1981 3 20
4 A 1981-04-26 3.64823544 1981 4 26
5 A 1981-05-22 1.21880453 1981 5 22
6 A 1981-05-23 2.19482857 1981 5 23
7 A 1981-07-02 -0.13598754 1981 7 2
8 A 1981-07-23 0.12365626 1981 7 23
9 A 1981-07-24 2.12557882 1981 7 24
10 A 1981-09-02 2.79879494 1981 9 2
11 A 1981-09-04 1.67926948 1981 9 4
12 A 1981-11-06 0.49720784 1981 11 6
13 A 1981-12-21 -0.25272271 1981 12 21
14 A 1982-04-08 1.39706157 1982 4 8
15 A 1982-04-19 -0.13965981 1982 4 19
16 A 1982-05-26 0.55238425 1982 5 26
17 A 1982-06-23 3.94639154 1982 6 23
18 A 1982-06-25 -0.03415929 1982 6 25
19 A 1982-07-15 1.00996167 1982 7 15
20 A 1982-09-11 3.18225186 1982 9 11
21 A 1982-10-17 0.30875497 1982 10 17
22 A 1982-10-30 2.26209011 1982 10 30
23 A 1982-11-06 0.34430489 1982 11 6
24 A 1982-11-19 2.28251458 1982 11 19
What I need to do is create a moving-window function using base R. I have tried the runner package, but it is proving not to be flexible enough. The moving window (say, of width 3) shall take 3 rows at a time and calculate the mean discharge, continuing until the last date of 1981. Another window shall start from 1982 and do the same. How should I approach this?
Using base R only:
w <- 3
df$DischargeM <- sapply(1:nrow(df), function(x) {
  tmp <- NA
  if (x >= w) {
    # only compute the mean when all w rows belong to the same year
    if (length(unique(df$Year[(x - w + 1):x])) == 1) {
      tmp <- mean(df$Discharge[(x - w + 1):x])
    }
  }
  tmp
})
Station Date Discharge Year Month Day DischargeM
1 A 1981-01-21 2.0009355 1981 1 21 NA
2 A 1981-02-11 0.5948567 1981 2 11 NA
3 A 1981-04-17 0.2637090 1981 4 17 0.95316705
4 A 1981-04-18 3.9180253 1981 4 18 1.59219699
5 A 1981-05-09 -0.2589129 1981 5 9 1.30760712
6 A 1981-07-05 1.1055913 1981 7 5 1.58823456
7 A 1981-07-11 0.7561600 1981 7 11 0.53427946
8 A 1981-07-22 0.0978999 1981 7 22 0.65321706
9 A 1981-08-04 0.5410163 1981 8 4 0.46502541
10 A 1981-08-13 -0.5044425 1981 8 13 0.04482458
11 A 1981-10-06 1.5954315 1981 10 6 0.54400178
12 A 1981-11-08 -0.5757041 1981 11 8 0.17176164
13 A 1981-12-24 1.3892440 1981 12 24 0.80299047
14 A 1982-01-07 1.9363874 1982 1 7 NA
15 A 1982-02-20 1.4340554 1982 2 20 NA
16 A 1982-05-29 0.4536461 1982 5 29 1.27469632
17 A 1982-06-10 2.9776761 1982 6 10 1.62179253
18 A 1982-06-17 1.6371733 1982 6 17 1.68949847
19 A 1982-06-28 1.7585579 1982 6 28 2.12446908
20 A 1982-08-17 0.8297518 1982 8 17 1.40849432
21 A 1982-09-21 1.6853808 1982 9 21 1.42456348
22 A 1982-11-13 0.6066167 1982 11 13 1.04058309
23 A 1982-11-16 1.4989263 1982 11 16 1.26364126
24 A 1982-11-28 0.2273658 1982 11 28 0.77763625
(make sure your df is ordered).
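A shorter base-only route, if you would rather not index the window yourself: stats::filter() computes a trailing mean (returning NA for the first w - 1 positions) and ave() applies it within each year. A sketch, under the same assumption that df is ordered by Date:
w <- 3
df$DischargeM <- ave(df$Discharge, df$Year, FUN = function(x)
  as.numeric(stats::filter(x, rep(1 / w, w), sides = 1)))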
You can do this by using dplyr and the rollmean or rollmeanr function from zoo.
You group the data by year and apply rollmeanr() inside mutate().
library(dplyr)
df %>%
  group_by(Year) %>%
  mutate(avg = zoo::rollmeanr(Discharge, k = 3, fill = NA))
# A tibble: 24 x 7
# Groups: Year [2]
Station Date Discharge Year Month Day avg
<chr> <date> <dbl> <dbl> <dbl> <int> <dbl>
1 A 1981-01-04 1.00 1981 1 4 NA
2 A 1981-03-26 0.0468 1981 3 26 NA
3 A 1981-03-28 0.431 1981 3 28 0.494
4 A 1981-05-04 1.30 1981 5 4 0.593
5 A 1981-08-26 2.06 1981 8 26 1.26
6 A 1981-10-14 1.09 1981 10 14 1.48
7 A 1981-12-10 1.28 1981 12 10 1.48
8 A 1981-12-23 0.668 1981 12 23 1.01
9 A 1982-01-02 -0.333 1982 1 2 NA
10 A 1982-04-13 0.800 1982 4 13 NA
# ... with 14 more rows
Kindly let me know if this is what you were anticipating
Base version (rollapply() comes from zoo, so it also needs library(zoo)):
library(zoo)
result <- transform(df,
  Discharge_mean = ave(Discharge, Year,
    FUN = function(x) rollapply(x, width = 3, mean, align = 'right', fill = NA))
)
dplyr version:
result <- df %>%
  group_by(Year) %>%
  mutate(Discharge_mean = rollapply(Discharge, 3, mean, align = 'right', fill = NA))
Output:
> result
Station Date Discharge Year Month Day Discharge_mean
1 A 1981-01-09 0.560448487 1981 1 9 NA
2 A 1981-01-17 0.006777809 1981 1 17 NA
3 A 1981-02-08 2.008959399 1981 2 8 0.8587286
4 A 1981-02-21 1.166452993 1981 2 21 1.0607301
5 A 1981-04-12 3.120080595 1981 4 12 2.0984977
6 A 1981-04-24 2.647325960 1981 4 24 2.3112865
7 A 1981-05-01 0.764980310 1981 5 1 2.1774623
8 A 1981-05-20 2.203700845 1981 5 20 1.8720024
9 A 1981-06-19 0.519390897 1981 6 19 1.1626907
10 A 1981-07-06 1.704146872 1981 7 6 1.4757462
# 14 more rows
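For large tables, a data.table sketch may also be worth considering; frollmean() is right-aligned and NA-filled by default (this assumes data.table >= 1.12):
library(data.table)
setDT(df)[, DischargeM := frollmean(Discharge, n = 3), by = Year]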

For loop in R to rewrite initial datasets

UPD:
Here is what I need.
Example of some datasets are here (I have 8 of them):
https://drive.google.com/drive/folders/1gBV2ZkywW6JqDjRICafCwtYhh2DHWaUq?usp=sharing
What I need is:
For example, in those datasets there is a lev variable. Let's say this is a snapshot of the data:
ID Year lev
1 2011 0.19
1 2012 0.19
1 2013 0.21
1 2014 0.18
2 2013 0.39
2 2014 0.15
2 2015 0.47
2 2016 0.35
3 2013 0.30
3 2015 0.1
3 2017 0.13
3 2018 0.78
4 2011 0.13
4 2012 0.35
Now, in each of my datasets (EE_AB, EE_C, EE_H, etc.) I need to create variables ff1 and ff2, where each ID's value in a given year is scaled by the median of that variable across all IDs in that same year.
Let's take an example of the year 2011. The median of the variable lev in this dataset in 2011 is (0.19+0.13)/2 = 0.16, so ff1 for ID 1 in 2011 should be 0.19/0.16 = 1.1875, and for ID 4 in 2011 ff1 = 0.13/0.16 = 0.8125.
Now take 2013: the median lev is 0.3, so ff1 for IDs 1, 2 and 3 will be 0.7, 1.3 and 1 respectively.
The desired output should be the ff1 variable in each dataset (e.g., EE_AB, EE_C, EE_H) as:
ID Year lev ff1
1 2011 0.19 1.1875
1 2012 0.19 0.7037
1 2013 0.21 0.7
1 2014 0.18 1.0909
2 2013 0.39 1.3
2 2014 0.15 0.9091
2 2015 0.47 1.6491
2 2016 0.35 1
3 2013 0.30 1
3 2015 0.1 0.3509
3 2017 0.13 1
3 2018 0.78 1
4 2011 0.13 0.8125
4 2012 0.35 1.2963
And this should be in the same way for other dataframes.
Here's a tidyverse method:
library(dplyr)
# library(purrr)
data_frameAB %>%
  group_by(Year) %>%
  mutate(ff1 = (c+d) / purrr::map2_dbl(c, d, median)) %>%
  ungroup()
# # A tibble: 14 x 5
# ID Year c d ff1
# <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 1 2011 10 12 2.2
# 2 1 2012 11 13 2.18
# 3 1 2013 12 14 2.17
# 4 1 2014 13 15 2.15
# 5 1 2015 14 16 2.14
# 6 1 2016 15 34 3.27
# 7 1 2017 16 25 2.56
# 8 1 2018 17 26 2.53
# 9 1 2019 18 56 4.11
# 10 15 2015 23 38 2.65
# 11 15 2016 26 25 1.96
# 12 15 2017 30 38 2.27
# 13 45 2011 100 250 3.5
# 14 45 2012 200 111 1.56
Without purrr, that inner expression would be
mutate(ff1 = (c+d) / mapply(median, c, d))
albeit without purrr's type-safety.
Since you have multiple frames in your data management, I have two suggestions:
Combine them into a list. This recommendation stems from the assumption that whatever you do to one frame you are likely to do to all of them. In that case, you can use lapply or purrr::map on the list of frames, processing all frames in one step. See https://stackoverflow.com/a/24376207/3358227.
list_of_frames <- list(AB = data_frameAB, C = data_frameC, F = data_frameF)
list_of_frames2 <- purrr::map(
  list_of_frames,
  ~ .x %>%
    group_by(Year) %>%
    mutate(ff1 = (c+d) / purrr::map2_dbl(c, d, median)) %>%
    ungroup()
)
Again, without purrr, that would be
list_of_frames2 <- lapply(
  list_of_frames,
  function(.x) group_by(.x, Year) %>%
    mutate(ff1 = (c+d) / mapply(median, c, d)) %>%
    ungroup()
)
Combine them into one frame, preserving the original data. Starting with list_of_frames,
bind_rows(list_of_frames, .id = "Frame") %>%
  group_by(Frame, Year) %>%
  mutate(ff1 = (c+d) / purrr::map2_dbl(c, d, median)) %>%
  ungroup()
# # A tibble: 42 x 6
# Frame ID Year c d ff1
# <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 AB 1 2011 10 12 2.2
# 2 AB 1 2012 11 13 2.18
# 3 AB 1 2013 12 14 2.17
# 4 AB 1 2014 13 15 2.15
# 5 AB 1 2015 14 16 2.14
# 6 AB 1 2016 15 34 3.27
# 7 AB 1 2017 16 25 2.56
# 8 AB 1 2018 17 26 2.53
# 9 AB 1 2019 18 56 4.11
# 10 AB 15 2015 23 38 2.65
# # ... with 32 more rows
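For the lev snapshot actually shown in the question, the per-year median scaling reduces to a one-line mutate(). A sketch against that snapshot (the answer above works on the c and d columns of the linked files instead):
library(dplyr)
EE_AB %>%
  group_by(Year) %>%
  mutate(ff1 = lev / median(lev)) %>%  # e.g. 2011: 0.19 / 0.16 = 1.1875
  ungroup()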

How do I make a cohort life expectancy data table in R?

Say if I have a data frame like this:
df <- data.frame(Year = c(2019,2019,2019,2020,2020,2020,2021,2021,2021), Age = c(0,1,2,0,1,2,0,1,2), px = c(0.99,0.88,0.77,0.99,0.88,0.77,0.99,0.88,0.77))
Which should look like this
> df
Year Age px
1 2019 0 0.99
2 2019 1 0.88
3 2019 2 0.77
4 2020 0 0.99
5 2020 1 0.88
6 2020 2 0.77
7 2021 0 0.99
8 2021 1 0.88
9 2021 2 0.77
How do I make a cohort life expectancy table so that it looks like this:
Year Age px
1 2019 0 0.99
2 2020 1 0.88
3 2021 2 0.77
I suggest using package dplyr:
df %>%
  filter(as.numeric(as.character(Year)) - as.numeric(as.character(Age)) == 2019)
# A tibble: 3 x 4
# id Year Age px
# <dbl> <dbl> <dbl> <dbl>
# 1 1 2019 0 0.99
# 2 5 2020 1 0.88
# 3 9 2021 2 0.77
Incorporates @Ian Campbell's improvement.
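A base-R equivalent, assuming Year and Age are numeric as in the df above:
df[df$Year - df$Age == 2019, ]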

dplyr - right join after group_by not producing desired/expected result

I am trying to get, for each of my id/year/month combinations, rows for all seven weekdays, with NA for the missing weekdays.
Here is the data frame and my attempt at achieving this task:
> df
id year month weekday amount
1 1 2015 1 Friday 3650.43
2 2 2015 1 Monday 1271.12
3 1 2015 2 Friday 1315.79
4 2 2015 2 Monday 2195.37
> wday
weekday
1 Friday
2 Saturday
3 Wednesday
4 Sunday
5 Tuesday
6 Monday
7 Thursday
I tried using group_by() and a right join, but it is not producing what I expected. Is there a simple way to achieve the result I am after?
> df <- df %>% group_by(id, year, month) %>% right_join(wday)
Joining by: "weekday"
> df
Source: local data frame [9 x 5]
Groups: id, year, month [?]
id year month weekday amount
(dbl) (int) (int) (chr) (dbl)
1 1 2015 1 Friday 3650.43
2 1 2015 2 Friday 1315.79
3 NA NA NA Saturday NA
4 NA NA NA Wednesday NA
5 NA NA NA Sunday NA
6 NA NA NA Tuesday NA
7 2 2015 1 Monday 1271.12
8 2 2015 2 Monday 2195.37
9 NA NA NA Thursday NA
I want 7 rows per id/year/month combination, where amount for the missing weekdays is NA (or ideally zero, but I know how to get that with mutate()).
Resulting data frame should look like this:
> df
id year month weekday amount
1 1 2015 1 Friday 3650.43
2 1 2015 1 Monday 0.00
3 1 2015 1 Saturday 0.00
4 1 2015 1 Sunday 0.00
5 1 2015 1 Thursday 0.00
6 1 2015 1 Tuesday 0.00
7 1 2015 1 Wednesday 0.00
8 1 2015 2 Friday 1315.79
9 1 2015 2 Monday 0.00
10 1 2015 2 Saturday 0.00
11 1 2015 2 Sunday 0.00
12 1 2015 2 Thursday 0.00
13 1 2015 2 Tuesday 0.00
14 1 2015 2 Wednesday 0.00
15 2 2015 1 Friday 0.00
16 2 2015 1 Monday 1271.12
17 2 2015 1 Saturday 0.00
18 2 2015 1 Sunday 0.00
19 2 2015 1 Thursday 0.00
20 2 2015 1 Tuesday 0.00
21 2 2015 1 Wednesday 0.00
22 2 2015 2 Friday 0.00
23 2 2015 2 Monday 2195.37
24 2 2015 2 Saturday 0.00
25 2 2015 2 Sunday 0.00
26 2 2015 2 Thursday 0.00
27 2 2015 2 Tuesday 0.00
28 2 2015 2 Wednesday 0.00
We can use expand.grid:
expand.grid(c(lapply(df[1:3], unique), wday['weekday'])) %>%
  left_join(., df) %>%
  mutate(amount = replace(amount, is.na(amount), 0)) %>%
  arrange(id, year, month, weekday)
# id year month weekday amount
#1 1 2015 1 Friday 3650.43
#2 1 2015 1 Monday 0.00
#3 1 2015 1 Saturday 0.00
#4 1 2015 1 Sunday 0.00
#5 1 2015 1 Thursday 0.00
#6 1 2015 1 Tuesday 0.00
#7 1 2015 1 Wednesday 0.00
#8 1 2015 2 Friday 1315.79
#9 1 2015 2 Monday 0.00
#10 1 2015 2 Saturday 0.00
#11 1 2015 2 Sunday 0.00
#12 1 2015 2 Thursday 0.00
#13 1 2015 2 Tuesday 0.00
#14 1 2015 2 Wednesday 0.00
#15 2 2015 1 Friday 0.00
#16 2 2015 1 Monday 1271.12
#17 2 2015 1 Saturday 0.00
#18 2 2015 1 Sunday 0.00
#19 2 2015 1 Thursday 0.00
#20 2 2015 1 Tuesday 0.00
#21 2 2015 1 Wednesday 0.00
#22 2 2015 2 Friday 0.00
#23 2 2015 2 Monday 2195.37
#24 2 2015 2 Saturday 0.00
#25 2 2015 2 Sunday 0.00
#26 2 2015 2 Thursday 0.00
#27 2 2015 2 Tuesday 0.00
#28 2 2015 2 Wednesday 0.00
sqldf: For complex joins it is usually easier to use SQL. The cross join pairs every df row with all seven weekdays, and the conditional sum keeps amount only where the weekdays match:
library(sqldf)
sqldf("select
id,
year,
month,
wday.weekday,
sum((df.weekday = wday.weekday) * amount) amount
from df
join wday
group by 1, 2, 3, 4")
giving:
id year month weekday amount
1 1 2015 1 Friday 3650.43
2 1 2015 1 Saturday 0.00
3 1 2015 1 Wednesday 0.00
4 1 2015 1 Sunday 0.00
5 1 2015 1 Tuesday 0.00
6 1 2015 1 Monday 0.00
7 1 2015 1 Thursday 0.00
8 2 2015 1 Friday 0.00
9 2 2015 1 Saturday 0.00
10 2 2015 1 Wednesday 0.00
11 2 2015 1 Sunday 0.00
12 2 2015 1 Tuesday 0.00
13 2 2015 1 Monday 1271.12
14 2 2015 1 Thursday 0.00
15 1 2015 2 Friday 1315.79
16 1 2015 2 Saturday 0.00
17 1 2015 2 Wednesday 0.00
18 1 2015 2 Sunday 0.00
19 1 2015 2 Tuesday 0.00
20 1 2015 2 Monday 0.00
21 1 2015 2 Thursday 0.00
22 2 2015 2 Friday 0.00
23 2 2015 2 Saturday 0.00
24 2 2015 2 Wednesday 0.00
25 2 2015 2 Sunday 0.00
26 2 2015 2 Tuesday 0.00
27 2 2015 2 Monday 2195.37
28 2 2015 2 Thursday 0.00
base R: We can replicate this in base R using merge() and transform():
xt <- transform(
  merge(df, wday, by = c()),
  amount = (as.character(weekday.x) == as.character(weekday.y)) * amount,
  weekday = weekday.y,
  weekday.x = NULL,
  weekday.y = NULL
)
aggregate(amount ~ ., xt, sum)
dplyr: And if we really wanted to use dplyr, we could replace the transform() with mutate(), rename() and select():
library(dplyr)
merge(df, wday, by = c()) %>%
  mutate(amount = (as.character(weekday.x) == as.character(weekday.y)) * amount) %>%
  rename(weekday = weekday.y) %>%
  select(-weekday.x) %>%
  group_by(id, year, month, weekday) %>%
  summarise(amount = sum(amount))
Note: If there is only one weekday per group (as in the question) we could optionally omit group by/sum, aggregate and group_by/summarise in the three solutions respectively.
Using tidyr and dplyr: complete() does the heavy lifting here. If every weekday already appears somewhere in df, you won't need the bind_rows() or na.omit() steps (or dplyr at all).
library(dplyr)
library(tidyr)
df %>%                               # initial data
  bind_rows(wday) %>%                # add rows so every weekday appears at least once
  complete(id, year, month, weekday, # complete all levels of id:year:month:weekday
           fill = list(amount = 0)) %>% # fill the amount column with 0
  na.omit()                          # remove the NAs introduced by bind_rows
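If the existing id/year/month combinations should only be crossed with weekday (not with each other), tidyr's nesting() helper avoids the bind_rows()/na.omit() dance entirely. A sketch, assuming a reasonably recent tidyr:
df %>%
  complete(nesting(id, year, month), weekday = wday$weekday,
           fill = list(amount = 0))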

Find daily percentile for large data set of irregular data

I have a very large data set (> 1 million rows) for which percentiles need to be calculated across all rows that share the same calendar day (e.g., all Jan 1, all Jan 2, ..., all Dec 31). There are many rows with the same year, month and day but different data. Below is an example of the data:
Year Month Day A B C D
2007 Jan 1 1 2 3 4
2007 Jan 1 5 6 7 8
2007 Feb 1 1 2 3 4
2007 Feb 1 5 6 7 8
.
.
2010 Dec 30 1 2 3 4
2010 Dec 30 5 6 7 8
2010 Dec 31 1 2 3 4
2010 Dec 31 5 6 7 8
So to calculate the 95th percentile for Jan 1, it would need to include all Jan 1 rows for all years (e.g., 2007-2010) and all columns (A, B, C and D). The same is then done for Jan 2, Jan 3, ..., Dec 30 and Dec 31. With a small data set this is easy in Excel using nested IF statements, e.g., ={PERCENTILE(IF(Month($B$2:$B$1000000)="Jan",IF(Day($C$2:$C$1000000)="1",$D$2:$G$1000000)),95%)}
The percentiles could then be added to a new data table containing only the months and days:
Month Day P95 P05
Jan 1
Jan 2
Jan 3
.
.
Dec 30
Dec 31
Then, using the percentiles, I need to evaluate whether each value in columns A, B, C and D for its respective date (e.g., Jan 1) is larger than P95 or smaller than P05. New columns could then be added to the first data table containing 1 or 0 (1 if the value is outside the percentile bounds, 0 otherwise):
Year Month Day A B C D A05 B05 C05 D05 A95 B95 C95 D95
2007 Jan 1 1 2 3 4 1 0 0 0 0 0 0 0
2007 Jan 1 5 6 7 8 0 0 0 0 0 0 1 1
.
.
2010 Dec 31 5 6 7 8 0 0 0 0 0 0 0 1
I've called your data dat:
library(plyr)
library(reshape2)
# melt values so all values are in 1 column
dat_melt <- melt(dat, id.vars=c("Year", "Month", "Day"), variable.name="letter", value.name="value")
# get quantiles, split by day
dat_quantiles <- ddply(dat_melt, .(Month, Day), summarise,
                       P05 = quantile(value, 0.05), P95 = quantile(value, 0.95))
# merge original data with quantiles
all_dat <- merge(dat_melt, dat_quantiles)
# See if in bounds
all_dat <- transform(all_dat, less05=ifelse(value < P05, 1, 0), greater95=ifelse(value > P95, 1, 0))
Month Day Year letter value P05 P95 less05 greater95
1 Dec 30 2010 A 1 1.35 7.65 1 0
2 Dec 30 2010 A 5 1.35 7.65 0 0
3 Dec 30 2010 B 2 1.35 7.65 0 0
4 Dec 30 2010 B 6 1.35 7.65 0 0
5 Dec 30 2010 C 3 1.35 7.65 0 0
6 Dec 30 2010 C 7 1.35 7.65 0 0
7 Dec 30 2010 D 4 1.35 7.65 0 0
8 Dec 30 2010 D 8 1.35 7.65 0 1
9 Dec 31 2010 A 1 1.35 7.65 1 0
10 Dec 31 2010 A 5 1.35 7.65 0 0
11 Dec 31 2010 B 2 1.35 7.65 0 0
12 Dec 31 2010 B 6 1.35 7.65 0 0
13 Dec 31 2010 C 3 1.35 7.65 0 0
14 Dec 31 2010 C 7 1.35 7.65 0 0
15 Dec 31 2010 D 4 1.35 7.65 0 0
16 Dec 31 2010 D 8 1.35 7.65 0 1
17 Feb 1 2007 A 1 1.35 7.65 1 0
18 Feb 1 2007 A 5 1.35 7.65 0 0
19 Feb 1 2007 B 2 1.35 7.65 0 0
20 Feb 1 2007 B 6 1.35 7.65 0 0
21 Feb 1 2007 C 3 1.35 7.65 0 0
22 Feb 1 2007 C 7 1.35 7.65 0 0
23 Feb 1 2007 D 4 1.35 7.65 0 0
24 Feb 1 2007 D 8 1.35 7.65 0 1
25 Jan 1 2007 A 1 1.35 7.65 1 0
26 Jan 1 2007 A 5 1.35 7.65 0 0
27 Jan 1 2007 B 2 1.35 7.65 0 0
28 Jan 1 2007 B 6 1.35 7.65 0 0
29 Jan 1 2007 C 3 1.35 7.65 0 0
30 Jan 1 2007 C 7 1.35 7.65 0 0
31 Jan 1 2007 D 4 1.35 7.65 0 0
32 Jan 1 2007 D 8 1.35 7.65 0 1
Something along these lines can be merged with the original data frame:
aggregate(dfrm[, c("A", "B", "C", "D")], list(dfrm$month, dfrm$day),
          FUN = quantile, probs = c(0.05, 0.95))
Notice I suggested merge(). Your description suggested (but was not explicit) that you wanted all years' worth of Jan-1 values to be pooled together. I think this is a lot "easier" than the expression you are using in Excel. This does both the 0.05 and 0.95 quantiles on all four columns.
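For comparison, a dplyr/tidyr sketch of the same melt-quantile-flag pipeline (plyr and reshape2 are superseded; this assumes dplyr >= 1.0):
library(dplyr)
library(tidyr)
dat %>%
  pivot_longer(A:D, names_to = "letter", values_to = "value") %>%
  group_by(Month, Day) %>%
  mutate(P05 = quantile(value, 0.05),
         P95 = quantile(value, 0.95),
         less05 = as.integer(value < P05),
         greater95 = as.integer(value > P95)) %>%
  ungroup()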
