dplyr - right join after group_by not producing desired/expected result - r

I am trying to get each of my id/year/month rows to have all rows corresponding to all seven weekdays with NAs for 'missing weekdays.'
Here is the data frame and my attempt at achieving this task:
> df
id year month weekday amount
1 1 2015 1 Friday 3650.43
2 2 2015 1 Monday 1271.12
3 1 2015 2 Friday 1315.79
4 2 2015 2 Monday 2195.37
> wday
weekday
1 Friday
2 Saturday
3 Wednesday
4 Sunday
5 Tuesday
6 Monday
7 Thursday
I tried group_by() followed by a right join, but it does not produce what I expected. Is there a simple way to achieve the result I am after?
> df <- df %>% group_by(id, year, month) %>% right_join(wday)
Joining by: "weekday"
> df
Source: local data frame [9 x 5]
Groups: id, year, month [?]
id year month weekday amount
(dbl) (int) (int) (chr) (dbl)
1 1 2015 1 Friday 3650.43
2 1 2015 2 Friday 1315.79
3 NA NA NA Saturday NA
4 NA NA NA Wednesday NA
5 NA NA NA Sunday NA
6 NA NA NA Tuesday NA
7 2 2015 1 Monday 1271.12
8 2 2015 2 Monday 2195.37
9 NA NA NA Thursday NA
I want 7 rows per id/year/month combination where amount for missing weekdays will be NA (or zeroes ideally, but I know how to get that by mutate()).
Resulting data frame should look like this:
> df
id year month weekday amount
1 1 2015 1 Friday 3650.43
2 1 2015 1 Monday 0.00
3 1 2015 1 Saturday 0.00
4 1 2015 1 Sunday 0.00
5 1 2015 1 Thursday 0.00
6 1 2015 1 Tuesday 0.00
7 1 2015 1 Wednesday 0.00
8 1 2015 2 Friday 1315.79
9 1 2015 2 Monday 0.00
10 1 2015 2 Saturday 0.00
11 1 2015 2 Sunday 0.00
12 1 2015 2 Thursday 0.00
13 1 2015 2 Tuesday 0.00
14 1 2015 2 Wednesday 0.00
15 2 2015 1 Friday 0.00
16 2 2015 1 Monday 1271.12
17 2 2015 1 Saturday 0.00
18 2 2015 1 Sunday 0.00
19 2 2015 1 Thursday 0.00
20 2 2015 1 Tuesday 0.00
21 2 2015 1 Wednesday 0.00
22 2 2015 2 Friday 0.00
23 2 2015 2 Monday 2195.37
24 2 2015 2 Saturday 0.00
25 2 2015 2 Sunday 0.00
26 2 2015 2 Thursday 0.00
27 2 2015 2 Tuesday 0.00
28 2 2015 2 Wednesday 0.00

We can use expand.grid
expand.grid(c(lapply(df[1:3], unique), wday['weekday'])) %>%
left_join(., df) %>%
mutate(amount=replace(amount, is.na(amount), 0)) %>%
arrange(id, year, month, weekday)
# id year month weekday amount
#1 1 2015 1 Friday 3650.43
#2 1 2015 1 Monday 0.00
#3 1 2015 1 Saturday 0.00
#4 1 2015 1 Sunday 0.00
#5 1 2015 1 Thursday 0.00
#6 1 2015 1 Tuesday 0.00
#7 1 2015 1 Wednesday 0.00
#8 1 2015 2 Friday 1315.79
#9 1 2015 2 Monday 0.00
#10 1 2015 2 Saturday 0.00
#11 1 2015 2 Sunday 0.00
#12 1 2015 2 Thursday 0.00
#13 1 2015 2 Tuesday 0.00
#14 1 2015 2 Wednesday 0.00
#15 2 2015 1 Friday 0.00
#16 2 2015 1 Monday 1271.12
#17 2 2015 1 Saturday 0.00
#18 2 2015 1 Sunday 0.00
#19 2 2015 1 Thursday 0.00
#20 2 2015 1 Tuesday 0.00
#21 2 2015 1 Wednesday 0.00
#22 2 2015 2 Friday 0.00
#23 2 2015 2 Monday 2195.37
#24 2 2015 2 Saturday 0.00
#25 2 2015 2 Sunday 0.00
#26 2 2015 2 Thursday 0.00
#27 2 2015 2 Tuesday 0.00
#28 2 2015 2 Wednesday 0.00
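To see why the attempt in the question returned 9 rows rather than 28: a join matches rows only on the shared key (weekday here), and group_by() does not change how rows are matched. The following is a minimal base R sketch with the question's data; merge(..., all.y = TRUE) is used as a stand-in for right_join(), which is an assumption made only to keep the example free of package dependencies.

```r
# Data from the question
df <- data.frame(
  id = c(1, 2, 1, 2),
  year = 2015L,
  month = c(1L, 1L, 2L, 2L),
  weekday = c("Friday", "Monday", "Friday", "Monday"),
  amount = c(3650.43, 1271.12, 1315.79, 2195.37),
  stringsAsFactors = FALSE
)
wday <- data.frame(
  weekday = c("Friday", "Saturday", "Wednesday", "Sunday",
              "Tuesday", "Monday", "Thursday"),
  stringsAsFactors = FALSE
)

# Right join on weekday alone: each weekday in wday is matched against
# every df row with that weekday, regardless of id/year/month
rj <- merge(df, wday, by = "weekday", all.y = TRUE)

n_rows <- nrow(rj)              # 2 Fridays + 2 Mondays + 5 unmatched weekdays
n_unmatched <- sum(is.na(rj$id)) # weekdays that never occur in df
```

The unmatched weekdays appear once each with NA in every other column, which is why the grid of id/year/month/weekday combinations has to be built explicitly first, as expand.grid does above.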

sqldf: For complex joins it is usually easier to use SQL:
library(sqldf)
sqldf("select
id,
year,
month,
wday.weekday,
sum((df.weekday = wday.weekday) * amount) amount
from df
join wday
group by 1, 2, 3, 4")
giving:
id year month weekday amount
1 1 2015 1 Friday 3650.43
2 1 2015 1 Saturday 0.00
3 1 2015 1 Wednesday 0.00
4 1 2015 1 Sunday 0.00
5 1 2015 1 Tuesday 0.00
6 1 2015 1 Monday 0.00
7 1 2015 1 Thursday 0.00
8 2 2015 1 Friday 0.00
9 2 2015 1 Saturday 0.00
10 2 2015 1 Wednesday 0.00
11 2 2015 1 Sunday 0.00
12 2 2015 1 Tuesday 0.00
13 2 2015 1 Monday 1271.12
14 2 2015 1 Thursday 0.00
15 1 2015 2 Friday 1315.79
16 1 2015 2 Saturday 0.00
17 1 2015 2 Wednesday 0.00
18 1 2015 2 Sunday 0.00
19 1 2015 2 Tuesday 0.00
20 1 2015 2 Monday 0.00
21 1 2015 2 Thursday 0.00
22 2 2015 2 Friday 0.00
23 2 2015 2 Saturday 0.00
24 2 2015 2 Wednesday 0.00
25 2 2015 2 Sunday 0.00
26 2 2015 2 Tuesday 0.00
27 2 2015 2 Monday 2195.37
28 2 2015 2 Thursday 0.00
base R: We could replicate this in base R using merge and transform:
xt <- transform(
merge(df, wday, by = c()),
amount = (as.character(weekday.x) == as.character(weekday.y)) * amount,
weekday = weekday.y,
weekday.x = NULL,
weekday.y = NULL
)
aggregate(amount ~., xt, sum)
dplyr: If we really wanted to use dplyr, we could replace transform with mutate, rename and select, and aggregate with group_by/summarise:
library(dplyr)
merge(df, wday, by = c()) %>%
mutate(amount = (as.character(weekday.x) == as.character(weekday.y)) * amount) %>%
rename(weekday = weekday.y) %>%
select(-weekday.x) %>%
group_by(id, year, month, weekday) %>%
summarise(amount = sum(amount))
Note: If there is only one weekday per group (as in the question) we could optionally omit group by/sum, aggregate and group_by/summarise in the three solutions respectively.
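The three solutions above share one idea: build the Cartesian product of df and wday, then zero out amount wherever the two weekday columns disagree. Here is a small sketch of that idea in isolation, using the question's data. The variable names cj and res are illustrative only.

```r
# Data from the question
df <- data.frame(
  id = c(1, 2, 1, 2),
  year = 2015L,
  month = c(1L, 1L, 2L, 2L),
  weekday = c("Friday", "Monday", "Friday", "Monday"),
  amount = c(3650.43, 1271.12, 1315.79, 2195.37),
  stringsAsFactors = FALSE
)
wday <- data.frame(
  weekday = c("Friday", "Saturday", "Wednesday", "Sunday",
              "Tuesday", "Monday", "Thursday"),
  stringsAsFactors = FALSE
)

# A zero-length `by` (NULL or c()) makes merge() return the Cartesian product;
# the clashing weekday columns get the suffixes .x (from df) and .y (from wday)
cj <- merge(df, wday, by = NULL)

# logical * numeric: TRUE counts as 1 and FALSE as 0, so amount survives
# only on the row where the original weekday equals the calendar weekday
cj$amount <- (cj$weekday.x == cj$weekday.y) * cj$amount

# Collapse to one row per id/year/month/calendar weekday
res <- aggregate(amount ~ id + year + month + weekday.y, data = cj, FUN = sum)
```

With one weekday per id/year/month group, as in the question, each group contributes exactly one non-zero row to res.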

Using tidyr and dplyr. complete here does the heavy lifting - if you already have each weekday somewhere in df, you won't need the bind_rows or na.omit (or dplyr).
library(dplyr)
library(tidyr)
df %>% #initial data
bind_rows(wday) %>% #adding on so we have all the weekdays
complete(id, year, month, weekday, #completing all levels of id:year:month:weekday
fill = list(amount = 0)) %>% #filling amount column with 0
na.omit() #remove the NAs we got from the bind_rows
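As a possible variant of the approach above, complete() can be given the full set of weekdays directly, which avoids the bind_rows/na.omit round trip. This sketch assumes a tidyr version that provides nesting(), and that weekday is a character column in both data frames. nesting(id, year, month) restricts the expansion to id/year/month combinations that actually occur in df.

```r
library(tidyr)

# Data from the question
df <- data.frame(
  id = c(1, 2, 1, 2),
  year = 2015L,
  month = c(1L, 1L, 2L, 2L),
  weekday = c("Friday", "Monday", "Friday", "Monday"),
  amount = c(3650.43, 1271.12, 1315.79, 2195.37),
  stringsAsFactors = FALSE
)
wday <- data.frame(
  weekday = c("Friday", "Saturday", "Wednesday", "Sunday",
              "Tuesday", "Monday", "Thursday"),
  stringsAsFactors = FALSE
)

# Cross every observed id/year/month combination with all seven weekdays,
# filling amount with 0 for the rows that complete() has to create
out <- complete(
  df,
  nesting(id, year, month),
  weekday = wday$weekday,
  fill = list(amount = 0)
)
```

Because the missing rows are filled at creation time, no NA rows need to be removed afterwards.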

Related

Monthly table filtered by year of a hospital in R

I work in a hospital and I'm having a problem. I was asked to report the number of patients who were in the hospital (not the number of admissions) in each month filtered by the year in R. How could I do this? Here is an example data frame:
id <- c("1","2","3","4","5")
hospitalization_date <- c(as.Date("2010-01-01"), as.Date("2011-03-01"),as.Date("2010-04-01"),as.Date("2010-02-01"),as.Date("2011-06-01"))
discharged_date <- c(as.Date("2010-05-11"), as.Date("2011-08-04"),as.Date("2010-06-13"),as.Date("2010-08-02"),as.Date("2011-06-23"))
cid <- c("F29","F33","F71","F29","F09")
gender <- c("F","M","M","M","F")
df <- data.frame(id,hospitalization_date,discharged_date,cid,gender)
I would like the result to be like this, filtering by the year 2010:
monthly number_of_patients
1 jan 1
2 feb 2
3 mar 2
4 apr 3
5 may 3
6 jun 2
7 jul 1
8 aug 1
9 sep 0
10 oct 0
11 nov 0
12 dec 0
Notice that patient id=1 is present in Jan, Feb, Mar, Apr and May; patient id=3 is present in Apr, May and Jun; and patient id=4 is present in Feb, Mar, Apr, May, Jun, Jul and Aug, all in the year 2010.
And likewise, filtered by the year 2011:
monthly number_of_patients
1 jan 0
2 feb 0
3 mar 1
4 apr 1
5 may 1
6 jun 2
7 jul 1
8 aug 1
9 sep 0
10 oct 0
11 nov 0
12 dec 0
Help me please.
Perhaps this helps
library(dplyr)
library(lubridate)
library(tidyr)
library(purrr)
out <- df %>%
mutate(across(ends_with('_date'), ymd)) %>%
transmute(id,
dates = map2(hospitalization_date, discharged_date,
~ seq(.x, .y, by = 'month'))) %>%
unnest(dates) %>%
mutate(year = year(dates), monthly = format(dates, '%b')) %>%
count(year, monthly) %>%
group_by(year) %>%
complete(monthly = month.abb, fill = list(n = 0)) %>%
arrange(match(monthly, month.abb), .by_group = TRUE) %>%
ungroup
-output
> as.data.frame(out)
year monthly n
1 2010 Jan 1
2 2010 Feb 2
3 2010 Mar 2
4 2010 Apr 3
5 2010 May 3
6 2010 Jun 2
7 2010 Jul 1
8 2010 Aug 1
9 2010 Sep 0
10 2010 Oct 0
11 2010 Nov 0
12 2010 Dec 0
13 2011 Jan 0
14 2011 Feb 0
15 2011 Mar 1
16 2011 Apr 1
17 2011 May 1
18 2011 Jun 2
19 2011 Jul 1
20 2011 Aug 1
21 2011 Sep 0
22 2011 Oct 0
23 2011 Nov 0
24 2011 Dec 0
library(tidyverse); library(lubridate)
df %>%
pivot_longer(contains("date")) %>%
mutate(chg = ifelse(name == "hospitalization_date", 1, -1),
month = if_else(chg == -1,
ceiling_date(value, "month"),
floor_date(value, "month"))) %>%
count(month, wt = chg, name = "change") %>%
complete(month = seq.Date(min(month), max(month),
by = "month"), fill = list(change=0)) %>%
mutate(count = cumsum(change))
Result
month change count
1 2010-01-01 1 1
2 2010-02-01 1 2
3 2010-03-01 0 2
4 2010-04-01 1 3
5 2010-05-01 0 3
6 2010-06-01 -1 2
7 2010-07-01 -1 1
8 2010-08-01 0 1
9 2010-09-01 -1 0
10 2010-10-01 0 0
11 2010-11-01 0 0
12 2010-12-01 0 0
13 2011-01-01 0 0
14 2011-02-01 0 0
15 2011-03-01 1 1
16 2011-04-01 0 1
17 2011-05-01 0 1
18 2011-06-01 1 2
19 2011-07-01 -1 1
20 2011-08-01 0 1
21 2011-09-01 -1 0
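Another way to look at the problem, sketched here in base R, is as an interval overlap test: a patient counts for a month if they were admitted on or before the last day of that month and discharged on or after its first day. The function name patients_per_month and its arguments are hypothetical, chosen only for this illustration. The data is the example from the question.

```r
# Example data from the question (only the columns needed here)
df <- data.frame(
  id = c("1", "2", "3", "4", "5"),
  hospitalization_date = as.Date(c("2010-01-01", "2011-03-01", "2010-04-01",
                                   "2010-02-01", "2011-06-01")),
  discharged_date = as.Date(c("2010-05-11", "2011-08-04", "2010-06-13",
                              "2010-08-02", "2011-06-23"))
)

# Hypothetical helper: count patients whose stay overlaps each month of `yr`
patients_per_month <- function(data, yr) {
  yr <- as.integer(yr)
  # First day of each month in the year
  month_start <- seq(as.Date(sprintf("%d-01-01", yr)), by = "month", length.out = 12)
  # Last day of each month = first day of the following month minus one day
  month_end <- seq(as.Date(sprintf("%d-02-01", yr)), by = "month", length.out = 12) - 1
  n <- vapply(seq_len(12), function(i) {
    sum(data$hospitalization_date <= month_end[i] &
          data$discharged_date >= month_start[i])
  }, integer(1))
  data.frame(monthly = tolower(month.abb), number_of_patients = n)
}

res_2010 <- patients_per_month(df, 2010)
res_2011 <- patients_per_month(df, 2011)
```

This does not depend on admissions falling on the first day of a month, since it compares each stay against the month boundaries rather than stepping through the stay in monthly increments.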

How to row bind all cases of a particular weekday in a given year-month into an R dataset

I have data that includes a date and day of week.
I would like to identify all instances of a particular weekday that match the given year/month/weekday
in the original data.
For instance, if the first record has the date "2010-07-15", which is a Thursday, I want to row-bind all Thursdays
that occur in July of 2010 to my original dataset.
While adding those new rows, I also want to fill in those new rows with values from the original data for all columns, except one. The exception is a variable which indicates whether or not that row
was in the original dataset or not.
Example data:
(1) alldays -- this data includes all dates and weekdays for the appropriate years.
(2) dt1 -- this is the example dataset that includes the date Adate and the day of week dow, which are used to identify the year/month/weekday and then look up all dates within that same month for the given weekday. For example, all Thursdays in July of 2010 will need to be row-bound to the original data.
library(data.table)
library(tidyverse)
library(lubridate)
alldays <- data.table (date = seq(as.Date("2010-01-01"),
as.Date("2011-12-31"), by="days"))
alldays <- alldays %>%
dplyr::mutate(year = lubridate::year(date),
month = lubridate::month(date),
day = lubridate::day(date),
dow = weekdays(date))
setDT(alldays)
head(alldays)
date year month day dow
1 2010-01-01 2010 1 1 Friday
2 2010-01-02 2010 1 2 Saturday
3 2010-01-03 2010 1 3 Sunday
4 2010-01-04 2010 1 4 Monday
5 2010-01-05 2010 1 5 Tuesday
6 2010-01-06 2010 1 6 Wednesday
Here is an example of the primary dataset
id <- seq(1:2)
admit <- rep(1,2)
zip <- c(54123, 54789)
Adate <- as.Date(c("2010-07-15","2011-03-14"))
year <- c(2010, 2011)
month <- c(7,3)
day <- c(15,14)
dow <- c("Thursday","Monday")
dt1 <- data.table(id, admit, zip, Adate, year, month, day, dow)
dt1
#> id admit zip Adate year month day dow
#> 1: 1 1 54123 2010-07-15 2010 7 15 Thursday
#> 2: 2 1 54789 2011-03-14 2011 3 14 Monday
The resulting dataset should be:
id admit zip Adate year month day dow
1: 1 0 54123 2010-07-01 2010 7 1 Thursday
2: 1 0 54123 2010-07-08 2010 7 8 Thursday
3: 1 1 54123 2010-07-15 2010 7 15 Thursday
4: 1 0 54123 2010-07-22 2010 7 22 Thursday
5: 1 0 54123 2010-07-29 2010 7 29 Thursday
6: 2 0 54789 2011-03-07 2011 3 7 Monday
7: 2 1 54789 2011-03-14 2011 3 14 Monday
8: 2 0 54789 2011-03-21 2011 3 21 Monday
9: 2 0 54789 2011-03-28 2011 3 28 Monday
So we can see that the first date in dt1, 2010-07-15 (associated with id=1), was a Thursday and fell within a month that had 4 additional Thursdays, which were added to the dataset. The variable admit indicates whether a row was in the original data (1) or was added by the matching (0).
I have tried first selecting the additional dates from alldays with matching weekdays, but I am running into issues with how to row-bind those back into the original dataset while filling in the other values appropriately. Eventually I will be running this on a dataset with about 300,000 rows.
Here is an option:
alldays[dt1[, .(id, zip, admit=0L, year, month, dow)],
on=.(year, month, dow), allow.cartesian=TRUE][
dt1, on=.(id, date=Adate), admit := i.admit][]
output:
date year month day dow id zip admit
1: 2010-07-01 2010 7 1 Thursday 1 54123 0
2: 2010-07-08 2010 7 8 Thursday 1 54123 0
3: 2010-07-15 2010 7 15 Thursday 1 54123 1
4: 2010-07-22 2010 7 22 Thursday 1 54123 0
5: 2010-07-29 2010 7 29 Thursday 1 54123 0
6: 2011-03-07 2011 3 7 Monday 2 54789 0
7: 2011-03-14 2011 3 14 Monday 2 54789 1
8: 2011-03-21 2011 3 21 Monday 2 54789 0
9: 2011-03-28 2011 3 28 Monday 2 54789 0
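For comparison, the same idea can be sketched in base R: join the calendar to the data on year/month/dow, then flag the rows whose calendar date equals the original Adate. This sketch assumes admit is always 1 in the original data (as in the example), so it recomputes admit instead of carrying it over. The dow column is derived with weekdays() on both sides so that the match does not depend on the locale.

```r
# Calendar of all days, as in the question
alldays <- data.frame(date = seq(as.Date("2010-01-01"), as.Date("2011-12-31"), by = "day"))
alldays$year  <- as.integer(format(alldays$date, "%Y"))
alldays$month <- as.integer(format(alldays$date, "%m"))
alldays$dow   <- weekdays(alldays$date)

# Primary data; year, month and dow are derived from Adate
dt1 <- data.frame(
  id = 1:2,
  zip = c(54123, 54789),
  Adate = as.Date(c("2010-07-15", "2011-03-14"))
)
dt1$year  <- as.integer(format(dt1$Adate, "%Y"))
dt1$month <- as.integer(format(dt1$Adate, "%m"))
dt1$dow   <- weekdays(dt1$Adate)

# One output row per calendar day sharing year, month and weekday with a dt1 row
res <- merge(dt1, alldays, by = c("year", "month", "dow"))

# admit = 1 only for the date that was in the original data
res$admit <- as.integer(res$date == res$Adate)
res <- res[order(res$id, res$date),
           c("id", "admit", "zip", "date", "year", "month", "dow")]
rownames(res) <- NULL
```

For a dataset of about 300,000 rows, the data.table join above is likely to be the faster option; the base R version is meant only to make the two steps explicit.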

Summarizing percentage by subgroups

I don't know how to explain my problem, but I want to summarize the categories distance and get the percentage for each distance per month. In my table 1 week is 100% and now I want to calculate the same for the month but using the percentage from the weeks.
Something like sum(percent)/ amount of weeks in this month
This is what I have:
year month year_week distance object_remarks weeksum percent
1 2017 05 2017_21 15 ctenolabrus_rupestris 3 0.75
2 2017 05 2017_21 10 ctenolabrus_rupestris 1 0.25
3 2017 05 2017_22 5 ctenolabrus_rupestris 5 0.833
4 2017 05 2017_22 0 ctenolabrus_rupestris 1 0.167
5 2017 06 2017_22 0 ctenolabrus_rupestris 9 1
6 2017 06 2017_23 20 ctenolabrus_rupestris 6 0.545
7 2017 06 2017_23 0 ctenolabrus_rupestris 5 0.455
I want to have an output like this:
year month distance object_remarks weeksum percent percent_month
1 2017 05 15 ctenolabrus_rupestris 3 0.75 0.375
2 2017 05 10 ctenolabrus_rupestris 1 0.25 0.125
3 2017 05 5 ctenolabrus_rupestris 5 0.833 0.4165
4 2017 05 0 ctenolabrus_rupestris 1 0.167 0.0835
5 2017 06 0 ctenolabrus_rupestris 14 1.455 0.7275
6 2017 06 20 ctenolabrus_rupestris 6 0.545 0.2725
Thanks a lot!
You may need to use group_by() twice.
df %>%
select(-year_week) %>%
group_by(month, distance) %>%
mutate(percent = sum(percent), weeksum = sum(weeksum)) %>%
distinct %>%
group_by(month) %>%
mutate(percent_month = percent/sum(percent))
# A tibble: 6 x 7
# Groups: month [2]
# year month distance object_remarks weeksum percent percent_month
# <int> <int> <int> <chr> <int> <dbl> <dbl>
# 1 2017 5 15 ctenolabrus_rupestris 3 0.75 0.375
# 2 2017 5 10 ctenolabrus_rupestris 1 0.25 0.125
# 3 2017 5 5 ctenolabrus_rupestris 5 0.833 0.416
# 4 2017 5 0 ctenolabrus_rupestris 1 0.167 0.0835
# 5 2017 6 0 ctenolabrus_rupestris 14 1.46 0.728
# 6 2017 6 20 ctenolabrus_rupestris 6 0.545 0.272
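The answer above divides by sum(percent) within each month, which equals the number of weeks only because every week's percentages add up to 1. A base R sketch that follows the question's wording literally (sum of percent divided by the number of distinct weeks in the month) might look like this. The column n_weeks is a helper introduced here for illustration, and the data is retyped from the question.

```r
# Data from the question
df <- data.frame(
  year = 2017L,
  month = c(5L, 5L, 5L, 5L, 6L, 6L, 6L),
  year_week = c("2017_21", "2017_21", "2017_22", "2017_22",
                "2017_22", "2017_23", "2017_23"),
  distance = c(15, 10, 5, 0, 0, 20, 0),
  object_remarks = "ctenolabrus_rupestris",
  weeksum = c(3, 1, 5, 1, 9, 6, 5),
  percent = c(0.75, 0.25, 0.833, 0.167, 1, 0.545, 0.455),
  stringsAsFactors = FALSE
)

# Number of distinct weeks observed in each year/month (helper column)
df$n_weeks <- ave(
  seq_len(nrow(df)), df$year, df$month,
  FUN = function(i) length(unique(df$year_week[i]))
)

# Sum weeksum and percent per month and distance, then divide by the week count
out <- aggregate(
  cbind(weeksum, percent) ~ year + month + distance + object_remarks + n_weeks,
  data = df, FUN = sum
)
out$percent_month <- out$percent / out$n_weeks
```

On this data both approaches agree; they would differ if a week's percentages did not sum to 1.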

Delete next row after every count/list of day; in R

I am finding it difficult to wrap my head around this:
In the dataframe below I want to delete the next row after every count/list of, say, Thursday, same for Friday and so on. I would prefer not using a loop since the data is big.
mydata<- read.table(header=TRUE, text="
Date AAPL.ret Weekday Thursday
1 2001-01-04 0.000000000 Thursday 1
2 2001-01-04 0.000000000 Thursday 1
3 2001-01-04 -0.025317808 Thursday 1
4 2001-01-04 0.014545711 Thursday 1
5 2001-01-04 0.007194276 Thursday 1
6 2001-01-04 -0.007194276 Thursday 1
7 2001-01-05 -0.0278569545 Friday 0
8 2001-01-05 0.0056338177 Friday 0
9 2001-01-05 0.0037383221 Friday 0
10 2001-01-05 0.0000000000 Friday 0
11 2002-02-25 3.511856e-03 Monday 0
12 2002-02-25 -3.511856e-03 Monday 0
13 2002-02-25 -4.398505e-04 Monday 0
14 2002-02-25 -2.643173e-03 Monday 0
15 2002-02-25 4.401416e-03 Monday 0
16 2002-02-26 9.189066e-03 Tuesday 0
17 2002-02-26 -8.243166e-04 Tuesday 0
18 2002-02-26 9.533751e-03 Tuesday 0
19 2002-02-26 4.527688e-03 Tuesday 0
20 2002-02-26 4.105933e-04 Tuesday 0
.............
100 2002-03-01 8.717651e-03 Friday 0
101 2002-03-01 1.990115e-02 Friday 0
102 2002-03-01 -1.344387e-03 Friday 0
103 2002-03-01 -1.445373e-02 Friday 0
")
The output I need should be like this:
Date AAPL.ret Weekday Thursday
1 2001-01-04 0.000000000 Thursday 1
2 2001-01-04 0.000000000 Thursday 1
3 2001-01-04 -0.025317808 Thursday 1
4 2001-01-04 0.014545711 Thursday 1
5 2001-01-04 0.007194276 Thursday 1
6 2001-01-04 -0.007194276 Thursday 1
7 2001-01-05 0.0056338177 Friday 0
8 2001-01-05 0.0037383221 Friday 0
9 2001-01-05 0.0000000000 Friday 0
11 2002-02-25 -3.511856e-03 Monday 0
12 2002-02-25 -4.398505e-04 Monday 0
13 2002-02-25 -2.643173e-03 Monday 0
14 2002-02-25 4.401416e-03 Monday 0
15 2002-02-26 -8.243166e-04 Tuesday 0
16 2002-02-26 9.533751e-03 Tuesday 0
17 2002-02-26 4.527688e-03 Tuesday 0
18 2002-02-26 4.105933e-04 Tuesday 0
.............
100 2002-03-01 1.990115e-02 Friday 0
101 2002-03-01 -1.344387e-03 Friday 0
102 2002-03-01 -1.445373e-02 Friday 0
Thank you in advance. Sorry if I have asked the question incorrectly. This is my first time asking a question here; I have tried to follow the rules as best I can, especially regarding how the table should appear.
The code I have tried is, I believe, far from the answer I want. It only counts and subsets; see below.
table(ret.df$Weekday=="Thursday")
r1<-ret.df[!(ret.df$Weekday=="Thursday"),]
I hope my question is less vague now.
A follow up from the previous answer:
removing rows based on condition in ret_1ON
ret_1ON<- ret.df[duplicated(ret.df$Date)|1:nrow(ret.df)==1,]
dim(ret_1ON)
[1] 98734 4
head(ret_1ON)
Date AAPL.ret Weekday Thursday
1 2001-01-04 0.000000000 Thursday 1
2 2001-01-04 0.000000000 Thursday 1
3 2001-01-04 -0.025317808 Thursday 1
4 2001-01-04 0.014545711 Thursday 1
5 2001-01-04 0.007194276 Thursday 1
6 2001-01-04 -0.007194276 Thursday 1
tail(ret_1ON)
Date AAPL.ret Weekday Thursday
99994 2006-01-19 0.0013771520 Thursday 1
99995 2006-01-19 -0.0007321584 Thursday 1
99996 2006-01-19 -0.0029026141 Thursday 1
99997 2006-01-19 -0.0002511616 Thursday 1
99998 2006-01-19 0.0011297309 Thursday 1
99999 2006-01-19 -0.0002509410 Thursday 1
I'm wondering why the last row label in tail is 99999 rather than 98734.
dim(ret.df)
[1] 99999 4
which means the condition was applied, though.
We can do this with data.table
library(data.table)
setDT(mydata)[, .SD[(seq_len(.N) != 1)], Date]
if we wanted to keep the first row of the dataset
setDT(mydata)[, .SD[(seq_len(.N) != 1)|seq_len(.N)==.I[1]], Date]
Or with dplyr
library(dplyr)
mydata %>%
group_by(Date) %>%
filter(row_number() != 1)
Or using base R, if the 'Date' column is ordered
mydata[duplicated(mydata$Date),]
or with including the first row
mydata[duplicated(mydata$Date)|1:nrow(mydata)==1,]
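A base R sketch that does not rely on Date being ordered uses ave() to number the rows within each date. It also illustrates the follow-up about tail(): subsetting a data frame keeps the original row names, so the last label can exceed the number of rows until the row names are reset. The toy data below is made up for illustration and is not the asker's data.

```r
# Toy data (illustrative values only)
mydata <- data.frame(
  Date = c("2001-01-04", "2001-01-04", "2001-01-04",
           "2001-01-05", "2001-01-05",
           "2002-02-25", "2002-02-25"),
  AAPL.ret = c(0, -0.0253, 0.0145, -0.0279, 0.0056, 0.0035, -0.0035),
  stringsAsFactors = FALSE
)

# Position of each row within its Date group: 1, 2, 3, 1, 2, 1, 2
pos <- ave(seq_len(nrow(mydata)), mydata$Date, FUN = seq_along)

# Drop the first row of every date
kept <- mydata[pos != 1, ]
labels_before <- rownames(kept)  # original row labels are retained

# Renumber the rows from 1 to nrow(kept)
rownames(kept) <- NULL
labels_after <- rownames(kept)
```

After the reset, the labels shown by head() and tail() agree with dim().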

Find daily percentile for large data set of irregular data

I have a very large data set (> 1 million rows) with percentiles that need to be calculated for all of the same day (e.g., all Jan 1, all Jan 2, ..., all Dec 31). There are many rows of the same year, month and day with different data. Below is an example of the data:
Year Month Day A B C D
2007 Jan 1 1 2 3 4
2007 Jan 1 5 6 7 8
2007 Feb 1 1 2 3 4
2007 Feb 1 5 6 7 8
.
.
2010 Dec 30 1 2 3 4
2010 Dec 30 5 6 7 8
2010 Dec 31 1 2 3 4
2010 Dec 31 5 6 7 8
So to calculate the 95th percentile for Jan 1, it would need to include all Jan 1 for all years (e.g., 2007-2010) and for all columns (A, B, C and D). This is then done for all Jan 2, Jan 3, ..., Dec 30 and Dec 31. This can easily be done with small data sets in Excel by using nested if statements; e.g., ={PERCENTILE(IF(Month($B$2:$B$1000000)="Jan",IF(Day($C$2:$C$1000000)="1",$D$2:$G$1000000)),95%)}
The percentiles could then be added to a new data table containing only the month and days:
Month Day P95 P05
Jan 1
Jan 2
Jan 3
.
.
Dec 30
Dec 31
Then using the percentiles, I need to evaluate whether each data value in column names A, B, C and D for their respective date (e.g., Jan 1) is larger than P95 or smaller than P05. Then new columns could be added to the first data table containing 1 or 0 (1 if larger or smaller, 0 if not larger or smaller than the percentiles):
Year Month Day A B C D A05 B05 C05 D05 A95 B95 C95 D95
2007 Jan 1 1 2 3 4 1 0 0 0 0 0 0 0
2007 Jan 1 5 6 7 8 0 0 0 0 0 0 1 1
.
.
2010 Dec 31 5 6 7 8 0 0 0 0 0 0 0 1
I've called your data dat:
library(plyr)
library(reshape2)
# melt values so all values are in 1 column
dat_melt <- melt(dat, id.vars=c("Year", "Month", "Day"), variable.name="letter", value.name="value")
# get quantiles, split by day
dat_quantiles <- ddply(dat_melt, .(Month, Day), summarise,
P05=quantile(value, 0.05), P95=quantile(value, 0.95))
# merge original data with quantiles
all_dat <- merge(dat_melt, dat_quantiles)
# See if in bounds
all_dat <- transform(all_dat, less05=ifelse(value < P05, 1, 0), greater95=ifelse(value > P95, 1, 0))
Month Day Year letter value P05 P95 less05 greater95
1 Dec 30 2010 A 1 1.35 7.65 1 0
2 Dec 30 2010 A 5 1.35 7.65 0 0
3 Dec 30 2010 B 2 1.35 7.65 0 0
4 Dec 30 2010 B 6 1.35 7.65 0 0
5 Dec 30 2010 C 3 1.35 7.65 0 0
6 Dec 30 2010 C 7 1.35 7.65 0 0
7 Dec 30 2010 D 4 1.35 7.65 0 0
8 Dec 30 2010 D 8 1.35 7.65 0 1
9 Dec 31 2010 A 1 1.35 7.65 1 0
10 Dec 31 2010 A 5 1.35 7.65 0 0
11 Dec 31 2010 B 2 1.35 7.65 0 0
12 Dec 31 2010 B 6 1.35 7.65 0 0
13 Dec 31 2010 C 3 1.35 7.65 0 0
14 Dec 31 2010 C 7 1.35 7.65 0 0
15 Dec 31 2010 D 4 1.35 7.65 0 0
16 Dec 31 2010 D 8 1.35 7.65 0 1
17 Feb 1 2007 A 1 1.35 7.65 1 0
18 Feb 1 2007 A 5 1.35 7.65 0 0
19 Feb 1 2007 B 2 1.35 7.65 0 0
20 Feb 1 2007 B 6 1.35 7.65 0 0
21 Feb 1 2007 C 3 1.35 7.65 0 0
22 Feb 1 2007 C 7 1.35 7.65 0 0
23 Feb 1 2007 D 4 1.35 7.65 0 0
24 Feb 1 2007 D 8 1.35 7.65 0 1
25 Jan 1 2007 A 1 1.35 7.65 1 0
26 Jan 1 2007 A 5 1.35 7.65 0 0
27 Jan 1 2007 B 2 1.35 7.65 0 0
28 Jan 1 2007 B 6 1.35 7.65 0 0
29 Jan 1 2007 C 3 1.35 7.65 0 0
30 Jan 1 2007 C 7 1.35 7.65 0 0
31 Jan 1 2007 D 4 1.35 7.65 0 0
32 Jan 1 2007 D 8 1.35 7.65 0 1
Something along these lines can be merged to the original dataframe:
aggregate(dfrm[ , c("A","B","C","D")] , list(dfrm$Month, dfrm$Day),
FUN=quantile, probs=c(0.05,0.95))
Notice I suggested merge(). Your description suggested (but was not explicit) that you wanted all years' worth of Jan-1 values to be submitted together. I think this is a lot "easier" than the expression you are using in Excel. This computes both the 0.05 and 0.95 quantiles for each of the four columns.
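If the percentiles should pool columns A to D together for each Month/Day, as the question describes, and the result should stay in the wide layout with the A05 ... D95 flag columns, a base R sketch could look like the following. It uses the four 2007 rows shown in the question; the names key, pooled, low and high are illustrative.

```r
# The four 2007 rows shown in the question
dat <- data.frame(
  Year = 2007L,
  Month = c("Jan", "Jan", "Feb", "Feb"),
  Day = 1L,
  A = c(1, 5, 1, 5), B = c(2, 6, 2, 6),
  C = c(3, 7, 3, 7), D = c(4, 8, 4, 8),
  stringsAsFactors = FALSE
)
vals <- c("A", "B", "C", "D")

# One key per calendar day, ignoring the year
key <- paste(dat$Month, dat$Day)

# Pool all four columns per key: unlist() stacks the columns one after
# another, so the key vector is repeated once per column
pooled <- split(unlist(dat[vals], use.names = FALSE), rep(key, times = length(vals)))

# Rows = keys, columns = "5%" and "95%"
q <- t(sapply(pooled, quantile, probs = c(0.05, 0.95)))
P05 <- q[key, "5%"]
P95 <- q[key, "95%"]

# Row-wise comparison: element i of P05/P95 is compared with row i of each column
m <- as.matrix(dat[vals])
low <- (m < P05) * 1L
high <- (m > P95) * 1L
colnames(low) <- paste0(vals, "05")
colnames(high) <- paste0(vals, "95")
dat <- cbind(dat, low, high)
```

For more than a million rows the split/sapply step may be slow; the plyr and aggregate answers above, or a data.table grouping, would be the natural things to try in that case.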
