I am trying to apply some basic math to daily stock values based on a corresponding yearly value.
reprex
(daily prices)
library(tidyquant)
data(FANG)
# daily prices
FANG %>%
select(c(date, symbol, adjusted)) %>%
group_by(symbol)
# A tibble: 4,032 x 3
# Groups: symbol [4]
date symbol adjusted
<date> <chr> <dbl>
1 2013-01-02 FB 28
2 2013-01-03 FB 27.8
3 2013-01-04 FB 28.8
4 2013-01-07 FB 29.4
5 2013-01-08 FB 29.1
6 2013-01-09 FB 30.6
7 2013-01-10 FB 31.3
8 2013-01-11 FB 31.7
9 2013-01-14 FB 31.0
10 2013-01-15 FB 30.1
# ... with 4,022 more rows
(max price per year)
FANG_yearly_high <-
FANG %>%
group_by(symbol) %>%
summarise_by_time(
.date_var = date,
.by = "year",
price = AVERAGE(adjusted))
# Groups: symbol [4]
symbol date price
<chr> <date> <dbl>
1 AMZN 2013-01-01 404.
2 AMZN 2014-01-01 407.
3 AMZN 2015-01-01 694.
4 AMZN 2016-01-01 844.
5 FB 2013-01-01 58.0
6 FB 2014-01-01 81.4
7 FB 2015-01-01 109.
8 FB 2016-01-01 133.
9 GOOG 2013-01-01 560.
10 GOOG 2014-01-01 609.
11 GOOG 2015-01-01 777.
12 GOOG 2016-01-01 813.
13 NFLX 2013-01-01 54.4
14 NFLX 2014-01-01 69.2
15 NFLX 2015-01-01 131.
16 NFLX 2016-01-01 128.
I would like to divide each daily price by the corresponding max price for the year.
I tried:
FANG %>%
group_by(symbol) %>%
summarise_by_time(
.date_var = date,
.by = "year",
price = AVERAGE(adjusted) / YEAR(date(MAX(adjusted)))
)
and the get this error:
Error in as.POSIXlt.numeric(x, tz = tz(x)) : 'origin' must be supplied
Any sensible way to accomplish this?
Thank you
summarise_by_time is good if you just want to summarise. But you want to divide a daily price by a max of a period. So you need to use mutate. Below are 2 examples. The first one for daily prices, the second for weekly. You can adjust the weekly version easily to monthly.
library(tidyquant)
library(dplyr)
data(FANG)
# daily prices
FANG %>%
select(c(date, symbol, adjusted)) %>%
group_by(symbol, year = year(date)) %>%
mutate(price_pct = adjusted / max(adjusted))
# A tibble: 4,032 x 5
# Groups: symbol, year [16]
date symbol adjusted year price_pct
<date> <chr> <dbl> <dbl> <dbl>
1 2013-01-02 FB 28 2013 0.483
2 2013-01-03 FB 27.8 2013 0.479
3 2013-01-04 FB 28.8 2013 0.496
4 2013-01-07 FB 29.4 2013 0.508
5 2013-01-08 FB 29.1 2013 0.501
6 2013-01-09 FB 30.6 2013 0.528
7 2013-01-10 FB 31.3 2013 0.540
8 2013-01-11 FB 31.7 2013 0.547
9 2013-01-14 FB 31.0 2013 0.534
10 2013-01-15 FB 30.1 2013 0.519
# ... with 4,022 more rows
Weekly / monthly:
# weekly
FANG %>%
select(c(date, symbol, adjusted)) %>%
group_by(symbol) %>%
tq_transmute(mutate_fun = to.period,
period = "weeks" # change weeks to months for monthly
) %>%
group_by(symbol, year = year(date)) %>%
mutate(price_pct = adjusted / max(adjusted))
# A tibble: 836 x 5
# Groups: symbol, year [16]
symbol date adjusted year price_pct
<chr> <date> <dbl> <dbl> <dbl>
1 FB 2013-01-04 28.8 2013 0.519
2 FB 2013-01-11 31.7 2013 0.572
3 FB 2013-01-18 29.7 2013 0.535
4 FB 2013-01-25 31.5 2013 0.569
5 FB 2013-02-01 29.7 2013 0.536
6 FB 2013-02-08 28.5 2013 0.515
7 FB 2013-02-15 28.3 2013 0.511
8 FB 2013-02-22 27.1 2013 0.489
9 FB 2013-03-01 27.8 2013 0.501
10 FB 2013-03-08 28.0 2013 0.504
# ... with 826 more rows
Related
I have daily discharge data from a local stream near me. I am trying to sum and take the average of the daily data into weekly or monthly chunks so I can plot discharge_m3d(discharge) and Qs_sum(depletion) by weekly and monthly timeframes. Does anyone know how I can do this? I attached a figure of how my data frame looks.
People often use floor_date() from lubridate for these purposes. You can floor to a unit of month or week and then group by the resulting date column. Then you can use summarize() to compute the monthly or weekly sums/averages. From there you can use your plotting library of choice to visualize the result (like ggplot2, not shown).
This works even if you have more than one year of data (i.e. where the month or week number might repeat).
library(dplyr)
library(lubridate)
set.seed(123)
df <- tibble(
date = seq(
from = as.Date("2014-03-01"),
to = as.Date("2016-12-31"),
by = 1
),
Qs_sum = runif(length(date)),
discharge_m3d = runif(length(date))
)
df
#> # A tibble: 1,037 × 3
#> date Qs_sum discharge_m3d
#> <date> <dbl> <dbl>
#> 1 2014-03-01 0.288 0.560
#> 2 2014-03-02 0.788 0.427
#> 3 2014-03-03 0.409 0.448
#> 4 2014-03-04 0.883 0.833
#> 5 2014-03-05 0.940 0.720
#> 6 2014-03-06 0.0456 0.457
#> 7 2014-03-07 0.528 0.521
#> 8 2014-03-08 0.892 0.242
#> 9 2014-03-09 0.551 0.0759
#> 10 2014-03-10 0.457 0.391
#> # … with 1,027 more rows
df %>%
mutate(date = floor_date(date, unit = "month")) %>%
group_by(date) %>%
summarise(
n = n(),
qs_total = sum(Qs_sum),
qs_average = mean(Qs_sum),
discharge_total = sum(discharge_m3d),
discharge_average = mean(discharge_m3d),
.groups = "drop"
)
#> # A tibble: 34 × 6
#> date n qs_total qs_average discharge_total discharge_average
#> <date> <int> <dbl> <dbl> <dbl> <dbl>
#> 1 2014-03-01 31 18.1 0.585 15.3 0.494
#> 2 2014-04-01 30 12.9 0.429 15.2 0.507
#> 3 2014-05-01 31 15.5 0.500 15.3 0.493
#> 4 2014-06-01 30 15.8 0.525 16.3 0.542
#> 5 2014-07-01 31 15.1 0.487 13.9 0.449
#> 6 2014-08-01 31 14.8 0.478 16.2 0.522
#> 7 2014-09-01 30 15.3 0.511 13.1 0.436
#> 8 2014-10-01 31 15.6 0.504 14.7 0.475
#> 9 2014-11-01 30 16.0 0.532 15.1 0.502
#> 10 2014-12-01 31 14.2 0.458 15.5 0.502
#> # … with 24 more rows
# Assert that the "start of the week" is Sunday.
# So groups are made of data from [Sunday -> Monday]
sunday <- 7L
df %>%
mutate(date = floor_date(date, unit = "week", week_start = sunday)) %>%
group_by(date) %>%
summarise(
n = n(),
qs_total = sum(Qs_sum),
qs_average = mean(Qs_sum),
discharge_total = sum(discharge_m3d),
discharge_average = mean(discharge_m3d),
.groups = "drop"
)
#> # A tibble: 149 × 6
#> date n qs_total qs_average discharge_total discharge_average
#> <date> <int> <dbl> <dbl> <dbl> <dbl>
#> 1 2014-02-23 1 0.288 0.288 0.560 0.560
#> 2 2014-03-02 7 4.49 0.641 3.65 0.521
#> 3 2014-03-09 7 3.77 0.539 3.88 0.554
#> 4 2014-03-16 7 4.05 0.579 3.45 0.493
#> 5 2014-03-23 7 4.43 0.632 3.08 0.440
#> 6 2014-03-30 7 4.00 0.572 4.74 0.677
#> 7 2014-04-06 7 2.50 0.357 3.15 0.449
#> 8 2014-04-13 7 2.48 0.355 2.44 0.349
#> 9 2014-04-20 7 2.30 0.329 2.45 0.349
#> 10 2014-04-27 7 3.44 0.492 4.40 0.629
#> # … with 139 more rows
Created on 2022-04-13 by the reprex package (v2.0.1)
One way to approach this is using the lubridate and dplyr packages in the tidyverse. I assume here that your dates are year-month-day which they appear to be and that you only have one calendar year or at least no repeated months/weeks across two years.
monthly_discharge <- discharge %>%
filter(variable == "discharge") # First select just the rows that represent discharge (not clear if that's necessary here)
mutate(date = ymd(date), # convert date to a lubridate date object
month = month(date), # extract the numbered month from the date
week = week(date)) %>% # extract the numbered week in a year from the date
group_by(month, stream) %>% # group your data by month and stream
summarize(discharge_summary = mean(discharge_m3d)) # summarize your data so that each month has a single row with a single (mean) discharge value
# you can include multiple summary variables within the summarize function
This should produce a data frame with one row per month for each stream and a summary value for discharge. You could summarize by week by changing the month label in group_by to week.
Make use of the functions week(), month() and year() from the package lubridate to get the corresponding values for your date column. Afterwards we can find the means per week, month or year. For illustration, I added a row with year 2015, since there was only year 2014 in your sample data. Furthermore, for plotting reasons, I added a column "Year_Month" that shows the abbreviated month followed by year (x axis of the plot).
library(dplyr)
library(lubridate)
data <- data %>% mutate(Week = week(date), Month = month(date), Year = year(date)) %>%
group_by(Year, Week) %>%
mutate(mean_Week_Qs = mean(Qs_sum)) %>%
ungroup() %>%
group_by(Year, Month) %>%
mutate(mean_Month_Qs = mean(Qs_sum)) %>%
ungroup() %>%
group_by(Year) %>%
mutate(mean_Year_Qs = mean(Qs_sum)) %>%
ungroup() %>%
mutate(Year_Month = paste0(lubridate::month(date, label = TRUE), " ", Year)) %>%
ungroup()
> data
# A tibble: 12 x 10
date discharge_m3d Qs_sum Week Month Year mean_Week_Qs mean_Month_Qs mean_Year_Qs Year_Month
<date> <dbl> <dbl> <int> <int> <int> <dbl> <dbl> <dbl> <chr>
1 2014-03-01 797 0 9 3 2014 0.0409 0.629 0.629 Mar 2014
2 2014-03-02 826 0.00833 9 3 2014 0.0409 0.629 0.629 Mar 2014
3 2014-03-03 3760 0.114 9 3 2014 0.0409 0.629 0.629 Mar 2014
4 2014-03-04 4330 0.292 10 3 2014 0.785 0.629 0.629 Mar 2014
5 2014-03-05 2600 0.480 10 3 2014 0.785 0.629 0.629 Mar 2014
6 2014-03-06 4620 0.656 10 3 2014 0.785 0.629 0.629 Mar 2014
7 2014-03-07 2510 0.816 10 3 2014 0.785 0.629 0.629 Mar 2014
8 2014-03-08 1620 0.959 10 3 2014 0.785 0.629 0.629 Mar 2014
9 2014-03-09 2270 1.09 10 3 2014 0.785 0.629 0.629 Mar 2014
10 2014-03-10 5650 1.20 10 3 2014 0.785 0.629 0.629 Mar 2014
11 2014-03-11 2530 1.31 11 3 2014 1.31 0.629 0.629 Mar 2014
12 2015-03-06 1470 1.52 10 3 2015 1.52 1.52 1.52 Mar 2015
Now we can plot, for example Qs_sum per year and month, and add the mean as a red dot:
ggplot(data, aes(Year_Month, Qs_sum)) +
theme_classic() +
geom_point(size = 2) +
geom_point(aes(Year_Month, mean_Month_Qs), color = "red", size = 5, alpha = 0.6)
To summarize the results by weekly or monthly averages, you can do as follows, using distinct():
data %>% distinct(Year, Week, mean_Week_Qs)
# A tibble: 4 x 3
Week Year mean_Week_Qs
<int> <int> <dbl>
1 9 2014 0.0409
2 10 2014 0.785
3 11 2014 1.31
4 10 2015 1.52
data %>% distinct(Year, Month, mean_Month_Qs)
# A tibble: 2 x 3
Month Year mean_Month_Qs
<int> <int> <dbl>
1 3 2014 0.629
2 3 2015 1.52
This can only be done after the mutate() and mean() commands above. If you want to get directly to summarized results, you can use summarize() directly on the initial dataframe:
data %>% group_by(Year, Week) %>% summarise(Week_Avg = mean(Qs_sum))
# A tibble: 4 x 3
# Groups: Year [2]
Year Week Week_Avg
<int> <int> <dbl>
1 2014 9 0.0409
2 2014 10 0.785
3 2014 11 1.31
4 2015 10 1.52
data %>% group_by(Year, Month) %>% summarise(Month_Avg = mean(Qs_sum))
# A tibble: 2 x 3
# Groups: Year [2]
Year Month Month_Avg
<int> <int> <dbl>
1 2014 3 0.629
2 2015 3 1.52
Note that for plotting, mutate() is preferred, since it preserves the single weekly points (black in the plot above), if we used summarise() instead, we would be left with only the red points.
Data
data <- structure(list(date = structure(16130:16140, class = "Date"),
discharge_m3d = c(797, 826, 3760, 4330, 2600, 4620, 2510,
1620, 2270, 5650, 2530), Qs_sum = c(0, 0.00833424, 0.114224781,
0.291812109, 0.479780482, 0.656321971, 0.816140731, 0.959334606,
1.087579095, 1.20284046, 1.30695595), Week = c(9L, 9L, 9L,
10L, 10L, 10L, 10L, 10L, 10L, 10L, 11L), Month = c(3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L)), row.names = c(NA, -11L
), class = c("tbl_df", "tbl", "data.frame"))
I am new to coding. I have a data set of daily stream flow averages over 20 years. Following is an example:
DATE FLOW
1 10/1/2001 88.2
2 10/2/2001 77.6
3 10/3/2001 68.4
4 10/4/2001 61.5
5 10/5/2001 55.3
6 10/6/2001 52.5
7 10/7/2001 49.7
8 10/8/2001 46.7
9 10/9/2001 43.3
10 10/10/2001 41.3
11 10/11/2001 39.3
12 10/12/2001 37.7
13 10/13/2001 35.8
14 10/14/2001 34.1
15 10/15/2001 39.8
I need to create a loop summing the previous 6 days as well as the current day (rolling weekly average), and print it to an array for the designated water year. I have already created an aggregate function to separate yearly average daily means into their designated water years.
# Separating dates into specific water years
wtr_yr <- function(dates, start_month=9)
# Convert dates into POSIXlt
POSIDATE = as.POSIXlt(NEW_DATE)
# Year offset
offset = ifelse(POSIDATE$mon >= start_month - 1, 1, 0)
# Water year
adj.year = POSIDATE$year + 1900 + offset
# Aggregating the water year function to take the mean
mean.FLOW=aggregate(data_set$FLOW,list(adj.year), mean)
It seems that it can be done much more easily.
But first I need to prepare a bit more data.
library(tidyverse)
library(lubridate)
df = tibble(
DATE = seq(mdy("1/1/2010"), mdy("12/31/2022"), 1),
FLOW = rnorm(length(DATE), 40, 10)
)
output
# A tibble: 4,748 x 2
DATE FLOW
<date> <dbl>
1 2010-01-01 34.4
2 2010-01-02 37.7
3 2010-01-03 55.6
4 2010-01-04 40.7
5 2010-01-05 41.3
6 2010-01-06 57.2
7 2010-01-07 44.6
8 2010-01-08 27.3
9 2010-01-09 33.1
10 2010-01-10 35.5
# ... with 4,738 more rows
Now let's do the aggregation by year and week number
df %>%
group_by(year(DATE), week(DATE)) %>%
summarise(mean = mean(FLOW))
output
# A tibble: 689 x 3
# Groups: year(DATE) [13]
`year(DATE)` `week(DATE)` mean
<dbl> <dbl> <dbl>
1 2010 1 44.5
2 2010 2 39.6
3 2010 3 38.5
4 2010 4 35.3
5 2010 5 44.1
6 2010 6 39.4
7 2010 7 41.3
8 2010 8 43.9
9 2010 9 38.5
10 2010 10 42.4
# ... with 679 more rows
Note, for the function week, the first week starts on January 1st. If you want to number the weeks according to the ISO 8601 standard, use the isoweek function. Alternatively, you can also use an epiweek compatible with the US CDC.
df %>%
group_by(year(DATE), isoweek(DATE)) %>%
summarise(mean = mean(FLOW))
output
# A tibble: 681 x 3
# Groups: year(DATE) [13]
`year(DATE)` `isoweek(DATE)` mean
<dbl> <dbl> <dbl>
1 2010 1 40.0
2 2010 2 45.5
3 2010 3 33.2
4 2010 4 38.9
5 2010 5 45.0
6 2010 6 40.7
7 2010 7 38.5
8 2010 8 42.5
9 2010 9 37.1
10 2010 10 42.4
# ... with 671 more rows
If you want to better understand how these functions work, please follow the code below
df %>%
mutate(
w1 = week(DATE),
w2 = isoweek(DATE),
w3 = epiweek(DATE)
)
output
# A tibble: 4,748 x 5
DATE FLOW w1 w2 w3
<date> <dbl> <dbl> <dbl> <dbl>
1 2010-01-01 34.4 1 53 52
2 2010-01-02 37.7 1 53 52
3 2010-01-03 55.6 1 53 1
4 2010-01-04 40.7 1 1 1
5 2010-01-05 41.3 1 1 1
6 2010-01-06 57.2 1 1 1
7 2010-01-07 44.6 1 1 1
8 2010-01-08 27.3 2 1 1
9 2010-01-09 33.1 2 1 1
10 2010-01-10 35.5 2 1 2
# ... with 4,738 more rows
I am trying to find mean of A and B for each row and save it as separate column but seems like the code only average the first row and fill the rest of the rows with that value. Any suggestion how to fix this?
library(tidyverse)
library(lubridate)
set.seed(123)
DF <- data.frame(Date = seq(as.Date("2001-01-01"), to = as.Date("2003-12-31"), by = "day"),
A = runif(1095, 1,60),
Z = runif(1095, 5,100)) %>%
mutate(MeanofAandZ= mean(A:Z))
Are you looking for this:
DF %>% rowwise() %>% mutate(MeanofAandZ = mean(c_across(A:Z)))
# A tibble: 1,095 x 4
# Rowwise:
Date A Z MeanofAandZ
<date> <dbl> <dbl> <dbl>
1 2001-01-01 26.5 7.68 17.1
2 2001-01-02 54.9 33.1 44.0
3 2001-01-03 37.1 82.0 59.5
4 2001-01-04 6.91 18.0 12.4
5 2001-01-05 53.0 8.76 30.9
6 2001-01-06 26.1 7.63 16.9
7 2001-01-07 59.3 30.8 45.0
8 2001-01-08 39.9 14.6 27.3
9 2001-01-09 59.2 93.6 76.4
10 2001-01-10 30.7 89.1 59.9
you can do it with Base R: rowMeans
Full Base R:
DF$MeanofAandZ <- rowMeans(DF[c("A", "Z")])
head(DF)
#> Date A Z MeanofAandZ
#> 1 2001-01-01 17.967074 76.92436 47.44572
#> 2 2001-01-02 47.510003 99.28325 73.39663
#> 3 2001-01-03 25.129638 64.33253 44.73109
#> 4 2001-01-04 53.098027 32.42556 42.76179
#> 5 2001-01-05 56.487570 23.99162 40.23959
#> 6 2001-01-06 3.687833 81.08720 42.38751
or inside a mutate:
library(dplyr)
DF <- DF %>% mutate(MeanofAandZ = rowMeans(cbind(A,Z)))
head(DF)
#> Date A Z MeanofAandZ
#> 1 2001-01-01 17.967074 76.92436 47.44572
#> 2 2001-01-02 47.510003 99.28325 73.39663
#> 3 2001-01-03 25.129638 64.33253 44.73109
#> 4 2001-01-04 53.098027 32.42556 42.76179
#> 5 2001-01-05 56.487570 23.99162 40.23959
#> 6 2001-01-06 3.687833 81.08720 42.38751
We can also do
DF$MeanofAandZ <- Reduce(`+`, DF[c("A", "Z")])/2
Or using apply
DF$MeanofAandZ <- apply(DF[c("A", "Z")], 1, mean)
I have the following data:
# A tibble: 7,971 x 10
symbol date open high low close volume adjusted start_date end_date
<chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <date> <date>
1 AAPL 2009-01-02 12.3 13.0 12.2 13.0 186503800 11.4 2009-07-31 2010-06-30
2 AAPL 2009-01-05 13.3 13.7 13.2 13.5 295402100 11.8 2009-07-31 2010-06-30
3 AAPL 2009-01-06 13.7 13.9 13.2 13.3 322327600 11.6 2009-07-31 2010-06-30
4 AAPL 2009-01-07 13.1 13.2 12.9 13.0 188262200 11.4 2009-07-31 2010-06-30
5 AAPL 2009-01-08 12.9 13.3 12.9 13.2 168375200 11.6 2009-07-31 2010-06-30
6 AAPL 2009-01-09 13.3 13.3 12.9 12.9 136711400 11.3 2009-07-31 2010-06-30
7 AAPL 2009-01-12 12.9 13.0 12.5 12.7 154429100 11.1 2009-07-31 2010-06-30
8 AAPL 2009-01-13 12.6 12.8 12.3 12.5 199599400 11.0 2009-07-31 2010-06-30
9 AAPL 2009-01-14 12.3 12.5 12.1 12.2 255416000 10.7 2009-07-31 2010-06-30
10 AAPL 2009-01-15 11.5 12.0 11.4 11.9 457908500 10.4 2009-07-31 2010-06-30
I am trying to group by symbol, start date and end date and take the difference between the first observation on the start date and the last observation on the end date. I just can't seem to get it working.
That is take the difference of the "close" on the start date and the "close" on the end date.
Any help would be great, thanks!
syms <- c("AAPL", "MSFT", "GOOG")
library(tidyquant)
data <- tq_get(syms)
data <- data %>%
mutate( start_date = paste(year(date %m+% months(6)), "07", "31", sep = "-"), # note this is the start_date for when we calculate the returns - we will have bought this portfolio on the 1st July but we get returns on the 31st
end_date = paste(year(date %m+% months(18)), "06", "30", sep = "-"),
start_date = as.Date(start_date),
end_date = as.Date(end_date))
My attempt...
data %>%
group_by(symbol, start_date, end_date) %>%
summarise(diff = diff(close))
EDIT:
I am trying to group by symbol and then take start_date - end_date. So first, I should be grouping by symbol and filtering the date column down to between the start_date and end_date values. i.e. I am only interested in the "close" price on the start_date and end_date days (which is fixed). Then just take the difference between the close price on the start_date and end_date. So most of the stock price data is useless here and I am only interested in the close on the start_date and end_date then take the difference between these two values.
I think what you are looking for is to subtract first and last close value for each group
library(dplyr)
data %>%
group_by(symbol, start_date, end_date) %>%
summarise(diff = first(close) - last(close))
# symbol start_date end_date diff
# <chr> <date> <date> <dbl>
# 1 AAPL 2009-07-31 2010-06-30 -7.38
# 2 AAPL 2010-07-31 2011-06-30 -15.5
# 3 AAPL 2011-07-31 2012-06-30 -12.5
# 4 AAPL 2012-07-31 2013-06-30 -34.4
# 5 AAPL 2013-07-31 2014-06-30 28.0
# 6 AAPL 2014-07-31 2015-06-30 -34.5
# 7 AAPL 2015-07-31 2016-06-30 -31.9
# 8 AAPL 2016-07-31 2017-06-30 31
# 9 AAPL 2017-07-31 2018-06-30 -48.1
#10 AAPL 2018-07-31 2019-06-30 -41.6
# … with 26 more rows
Another way to write it could be
data %>%
group_by(symbol, start_date, end_date) %>%
summarise(diff = close[1L] - close[n()])
Or it can be also done using base R aggregate
aggregate(close~symbol +start_date + end_date,data,function(x) x[1L] - x[length(x)])
You can take this approach...
# create df of unique symbol, start, and end date combos
df1 <- df %>% distinct(symbol,start_date,end_date)
# join original data that match the desired start/end dates
df1 <- df %>% select(start_close=close,symbol,start_date=date) %>% left_join(df1,.)
df1 <- df %>% select(end_close=close,symbol,end_date=date) %>% left_join(df1,.)
# find difference in close values
df1 %>% mutate(diff=end_close - start_close)
# A tibble: 36 x 6
symbol start_date end_date start_close end_close diff
<chr> <date> <date> <dbl> <dbl> <dbl>
1 AAPL 2009-07-31 2010-06-30 23.3 35.9 12.6
2 AAPL 2010-07-31 2011-06-30 NA 48.0 NA
3 AAPL 2011-07-31 2012-06-30 NA NA NA
4 AAPL 2012-07-31 2013-06-30 87.3 NA NA
5 AAPL 2013-07-31 2014-06-30 64.6 92.9 28.3
6 AAPL 2014-07-31 2015-06-30 95.6 125. 29.8
7 AAPL 2015-07-31 2016-06-30 121. 95.6 -25.7
8 AAPL 2016-07-31 2017-06-30 NA 144. NA
9 AAPL 2017-07-31 2018-06-30 149. NA NA
10 AAPL 2018-07-31 2019-06-30 190. NA NA
# ... with 26 more rows
There are NAs since not every start/end date are in the original date column.
I am trying to summarise this daily time serie of rainfall by groups of 10-day periods within each month and calculate the acummulated rainfall.
library(tidyverse)
(dat <- tibble(
date = seq(as.Date("2016-01-01"), as.Date("2016-12-31"), by=1),
rainfall = rgamma(length(date), shape=2, scale=2)))
Therefore, I will obtain variability in the third group along the year, for instance: in january the third period has 11 days, february 9 days, and so on. This is my try:
library(lubridate)
dat %>%
group_by(decade=floor_date(date, "10 days")) %>%
summarize(acum_rainfall=sum(rainfall),
days = n())
this is the resulting output
# A tibble: 43 x 3
decade acum_rainfall days
<date> <dbl> <int>
1 2016-01-01 48.5 10
2 2016-01-11 39.9 10
3 2016-01-21 36.1 10
4 2016-01-31 1.87 1
5 2016-02-01 50.6 10
6 2016-02-11 32.1 10
7 2016-02-21 22.1 9
8 2016-03-01 45.9 10
9 2016-03-11 30.0 10
10 2016-03-21 42.4 10
# ... with 33 more rows
can someone help me to sum the residuals periods to the third one to obtain always 3 periods within each month? This would be the desired output (pay attention to the row 3):
decade acum_rainfall days
<date> <dbl> <int>
1 2016-01-01 48.5 10
2 2016-01-11 39.9 10
3 2016-01-21 37.97 11
4 2016-02-01 50.6 10
5 2016-02-11 32.1 10
6 2016-02-21 22.1 9
One way to do this is to use if_else to apply floor_date with different arguments depending on the day value of date. If day(date) is <30, use the normal way, if it's >= 30, then use '20 days' to ensure it gets rounded to day 21:
dat %>%
group_by(decade=if_else(day(date) >= 30,
floor_date(date, "20 days"),
floor_date(date, "10 days"))) %>%
summarize(acum_rainfall=sum(rainfall),
days = n())
# A tibble: 36 x 3
decade acum_rainfall days
<date> <dbl> <int>
1 2016-01-01 38.8 10
2 2016-01-11 38.4 10
3 2016-01-21 43.4 11
4 2016-02-01 34.4 10
5 2016-02-11 34.8 10
6 2016-02-21 25.3 9
7 2016-03-01 39.6 10
8 2016-03-11 53.9 10
9 2016-03-21 38.1 11
10 2016-04-01 36.6 10
# … with 26 more rows