R Tidyquant adding a price difference column to a tibble

I'm trying to add a column to a Tidyquant tibble. Here's the code:
library(tidyquant)
symbol <- 'AAPL'
start_date <- as.Date('2022-01-01')
end_date <- as.Date('2022-03-31')
prices <- tq_get(symbol,
                 from = start_date,
                 to = end_date,
                 get = 'stock.prices')
head(prices)
# A tibble: 6 x 8
symbol date open high low close volume adjusted
<chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 AAPL 2022-01-03 178. 183. 178. 182. 104487900 182.
2 AAPL 2022-01-04 183. 183. 179. 180. 99310400 179.
3 AAPL 2022-01-05 180. 180. 175. 175. 94537600 175.
4 AAPL 2022-01-06 173. 175. 172. 172 96904000 172.
5 AAPL 2022-01-07 173. 174. 171. 172. 86709100 172.
6 AAPL 2022-01-10 169. 172. 168. 172. 106765600 172.
Now, I'm attempting to add the change_on_day column (that's just the difference in the 'adjusted' prices between one day and the next) using the following:
prices$change_on_day <- diff(prices$adjusted)
The error message is:
Error: Assigned data `diff(prices$adjusted)` must be compatible with existing data.
x Existing data has 61 rows.
x Assigned data has 60 rows.
i Only vectors of size 1 are recycled.
How would I add this price difference column?
Thanks!

If you are trying to get the difference between each day's value and the previous day's value, you can do that with the lag() function:
prices %>%
  mutate(change_on_day = adjusted - lag(adjusted, 1))
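For illustration, here is the same idea in base R with a small made-up price vector (no network call); the first element is NA because day one has no previous day:

```r
# Made-up adjusted prices (not real AAPL values)
adjusted <- c(182, 179, 175, 172, 172)

# Equivalent of adjusted - lag(adjusted): shift by one and pad with NA
change_on_day <- adjusted - c(NA, head(adjusted, -1))
change_on_day
#> [1] NA -3 -4 -3  0
```

This also shows why diff() failed: diff() returns n-1 values, while the lag-style version keeps all n rows so it fits back into the tibble.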

We can use tq_transmute with quantmod::periodReturn, setting the period argument to 'daily', to calculate daily returns.
library(tidyquant)
symbol <- "AAPL"
start_date <- as.Date("2022-01-01")
end_date <- as.Date("2022-03-31")
prices <- tq_get(symbol,
                 from = start_date,
                 to = end_date,
                 get = "stock.prices"
)
stock_returns_monthly <- prices %>%
  tq_transmute(
    select = adjusted,
    mutate_fun = periodReturn,
    period = "daily",
    col_rename = "change_on_day"
  )
stock_returns_monthly
#> # A tibble: 61 × 2
#> date change_on_day
#> <date> <dbl>
#> 1 2022-01-03 0
#> 2 2022-01-04 -0.0127
#> 3 2022-01-05 -0.0266
#> 4 2022-01-06 -0.0167
#> 5 2022-01-07 0.000988
#> 6 2022-01-10 0.000116
#> 7 2022-01-11 0.0168
#> 8 2022-01-12 0.00257
#> 9 2022-01-13 -0.0190
#> 10 2022-01-14 0.00511
#> # … with 51 more rows
Created on 2022-04-18 by the reprex package (v2.0.1)
For more information, check the tidyquant vignettes.

Related

R: Accessing first three elements of split dataframe

For example,
dateIntervals <- as.Date(c("2020-08-10", "2020-11-11", "2021-07-05"))
possibleDates <- seq(as.Date("2020-01-02"), dateIntervals[3], by = "day")
genDF<-function() data.frame(Date = sample(possibleDates, 100), Value = runif(100))
listdf <-replicate(2, genDF(), simplify = FALSE)
Yes, listdf has two data.frame elements (each with 100 random dates drawn from possibleDates, plus random values),
and listdf[[1]] is like this
A data.frame: 100 × 2
Date Value
<date> <dbl>
2020-07-24 0.63482411
2020-02-26 0.25989280
2020-10-26 0.21721077
2020-10-11 0.34774192
2020-08-18 0.67758312
2020-02-03 0.22929624
2020-06-10 0.30279353
2020-05-29 0.95549488
...
lapply(listdf, function(x) split(x, findInterval(x$Date, dateIntervals)))
This makes each element of listdf a list of three data.frames, split by date interval:
1.$`0`
A data.frame: 43 × 2
Date Value
<date> <dbl>
1 2020-07-24 0.63482411
2 2020-02-26 0.25989280
6 2020-02-03 0.22929624
7 2020-06-10 0.30279353
...
$`1`
A data.frame: 15 × 2
Date Value
<date> <dbl>
3 2020-10-26 0.21721077
4 2020-10-11 0.34774192
5 2020-08-18 0.67758312
31 2020-11-09 0.59149301
...
$`2`
A data.frame: 42 × 2
Date Value
<date> <dbl>
9 2021-06-28 0.10055644
10 2021-05-17 0.63942936
12 2021-04-22 0.63589801
13 2021-02-01 0.70106156
...
2.$`0`
A data.frame: 43 × 2
Date Value
<date> <dbl>
2 2020-07-16 0.81376364
4 2020-07-03 0.05152627
7 2020-01-21 0.98677433
8 2020-03-23 0.13513921
...
$`1`
A data.frame: 18 × 2
Date Value
<date> <dbl>
5 2020-11-01 0.02740125
12 2020-09-04 0.82042568
15 2020-08-12 0.54190868
16 2020-09-19 0.05933666
18 2020-10-05 0.04983061
...
$`2`
A data.frame: 38 × 2
Date Value
<date> <dbl>
1 2021-04-13 0.46199245
3 2021-06-12 0.71461155
6 2021-01-24 0.56527997
9 2021-04-17 0.72634151
13 2021-04-20 0.55489499
...
I want only the first two of the split data.frames ($`0` and $`1`, for both 1. and 2.).
Is there any parameter in the split function which does things like this
(getting only the first or last n elements)?
I want something like this...
lapply(listdf, function(x) split(x, findInterval(x$Date, dateIntervals), some_parameter = 2))
...where the "2" means getting only the first two. Is there a function parameter in split which can do this?
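split() itself has no such parameter (its arguments are x, f, drop, sep and lex.order), but since it returns a plain list you can keep only the first two pieces with head(). A sketch using the data from the question:

```r
set.seed(1)  # for reproducibility of the random draws
dateIntervals <- as.Date(c("2020-08-10", "2020-11-11", "2021-07-05"))
possibleDates <- seq(as.Date("2020-01-02"), dateIntervals[3], by = "day")
genDF  <- function() data.frame(Date = sample(possibleDates, 100), Value = runif(100))
listdf <- replicate(2, genDF(), simplify = FALSE)

# head() on the list returned by split() keeps only the first two pieces
res <- lapply(listdf, function(x)
  head(split(x, findInterval(x$Date, dateIntervals)), 2))
lengths(res)  # each element now holds at most two data.frames
```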

rename() function in R not working with dplyr pipe inside for loop

I retrieve a bunch of data using the Tidyquant API, as follows:
library(tidyquant)
symbols_list <- c('DGS2', 'DGS5', 'DGS10', 'DGS30')
start_date <- as.Date('2022-01-01')
end_date <- as.Date('2022-03-31')
for (asset in symbols_list){
  # retrieves the data
  assign(paste('sym', asset, sep = '_'),
         tq_get(asset,
                from = start_date,
                to = end_date,
                get = 'economic.data'))
}
The data looks as follows:
sym_DGS5
# A tibble: 64 x 3
symbol date price
<chr> <date> <dbl>
1 DGS5 2022-01-03 1.37
2 DGS5 2022-01-04 1.37
3 DGS5 2022-01-05 1.43
4 DGS5 2022-01-06 1.47
5 DGS5 2022-01-07 1.5
6 DGS5 2022-01-10 1.53
7 DGS5 2022-01-11 1.51
8 DGS5 2022-01-12 1.5
9 DGS5 2022-01-13 1.47
10 DGS5 2022-01-14 1.55
# ... with 54 more rows
I would like to rename the generic price column with the name of the symbol. In this case, I would like to rename "price" as "DGS5". To do so, I use the rename() function and the dplyr pipe, as follows:
start_date <- as.Date('2022-01-01')
end_date <- as.Date('2022-03-31')
symbol_list <- c('T10Y2Y', 'DGS2', 'DGS5', 'DGS10', 'DGS30')
symbol_list
for (asset in symbol_list){
  # retrieves the data
  assign(paste('sym', asset, sep = '_'),
         tq_get(asset,
                from = start_date,
                to = end_date,
                get = 'economic.data'
         ) %>% rename(asset = price)
  )
}
But, doing so gives:
sym_DGS5
# A tibble: 64 x 3
symbol date asset
<chr> <date> <dbl>
1 DGS5 2022-01-03 1.37
2 DGS5 2022-01-04 1.37
3 DGS5 2022-01-05 1.43
4 DGS5 2022-01-06 1.47
5 DGS5 2022-01-07 1.5
6 DGS5 2022-01-10 1.53
7 DGS5 2022-01-11 1.51
8 DGS5 2022-01-12 1.5
9 DGS5 2022-01-13 1.47
10 DGS5 2022-01-14 1.55
# ... with 54 more rows
Notice how the column was renamed to "asset". I would have expected the column to be renamed to "DGS5".
What did I do incorrectly here?
Thanks!
Note that asset here is an env-variable that holds a string, so in rename(asset = price) the name asset is used literally rather than its value. To use the value of asset as the new column name, you need the := operator, injecting the value with !! (or using glue syntax; this pattern is called indirection):
rename(!!asset := price)
See the "Programming with dplyr" vignette for more information.
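As a minimal sketch of the fix, with a made-up tibble standing in for one tq_get() result (the values are invented):

```r
library(dplyr)

asset <- "DGS5"
# Stand-in for a tq_get() result; values are invented
sym <- tibble(symbol = "DGS5",
              date   = as.Date("2022-01-03") + 0:2,
              price  = c(1.37, 1.37, 1.43))

# Inject the *value* of `asset` as the new column name
renamed <- sym %>% rename(!!asset := price)
names(renamed)
#> [1] "symbol" "date"   "DGS5"
```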

Is there a way to group data according to time in R?

I'm working with trip ticket data and it includes a column with dates and times. I want to group trips into Morning (05:00-10:59), Lunch (11:00-12:59), Afternoon (13:00-17:59), Evening (18:00-23:59), and Dawn/Graveyard (00:00-04:59), and then count the number of trips (by counting the unique values in the trip_id column) for each of those categories.
Only I don't know how to group/summarize according to time values. Is this possible in R?
trip_id start_time end_time day_of_week
1 CFA86D4455AA1030 2021-03-16 08:32:30 2021-03-16 08:36:34 Tuesday
2 30D9DC61227D1AF3 2021-03-28 01:26:28 2021-03-28 01:36:55 Sunday
3 846D87A15682A284 2021-03-11 21:17:29 2021-03-11 21:33:53 Thursday
4 994D05AA75A168F2 2021-03-11 13:26:42 2021-03-11 13:55:41 Thursday
5 DF7464FBE92D8308 2021-03-21 09:09:37 2021-03-21 09:27:33 Sunday
Here's a solution with hour() and case_when().
library(tidyverse)
library(lubridate)
#>
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#>
#> date, intersect, setdiff, union
trip <- tibble(start_time = mdy_hm("1/1/2022 1:00") + minutes(seq(0, 700, 15)))
trip <- trip %>%
  mutate(
    hr = hour(start_time),
    time_of_day = case_when(
      hr >= 5 & hr < 11 ~ "morning",
      hr >= 11 & hr < 13 ~ "afternoon",
      TRUE ~ "fill in the rest yourself :)"
    )
  )
print(trip)
#> # A tibble: 47 x 3
#> start_time hr time_of_day
#> <dttm> <int> <chr>
#> 1 2022-01-01 01:00:00 1 fill in the rest yourself :)
#> 2 2022-01-01 01:15:00 1 fill in the rest yourself :)
#> 3 2022-01-01 01:30:00 1 fill in the rest yourself :)
#> 4 2022-01-01 01:45:00 1 fill in the rest yourself :)
#> 5 2022-01-01 02:00:00 2 fill in the rest yourself :)
#> 6 2022-01-01 02:15:00 2 fill in the rest yourself :)
#> 7 2022-01-01 02:30:00 2 fill in the rest yourself :)
#> 8 2022-01-01 02:45:00 2 fill in the rest yourself :)
#> 9 2022-01-01 03:00:00 3 fill in the rest yourself :)
#> 10 2022-01-01 03:15:00 3 fill in the rest yourself :)
#> # ... with 37 more rows
trips <- trip %>%
count(time_of_day)
print(trips)
#> # A tibble: 3 x 2
#> time_of_day n
#> <chr> <int>
#> 1 afternoon 7
#> 2 fill in the rest yourself :) 16
#> 3 morning 24
Created on 2022-03-21 by the reprex package (v2.0.1)
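As an alternative sketch (not from the original answer), the hour boundaries from the question can also be mapped in one step with findInterval(); the label names here are assumptions:

```r
hrs <- c(1, 8, 12, 15, 21)  # made-up trip start hours

labels  <- c("dawn", "morning", "lunch", "afternoon", "evening")
# Boundaries 0, 5, 11, 13, 18 correspond to the category start hours
buckets <- labels[findInterval(hrs, c(0, 5, 11, 13, 18))]
buckets
#> [1] "dawn" "morning" "lunch" "afternoon" "evening"

# Counting trips per bucket is then just:
table(buckets)
```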

Calculate Stocks biWeekly Return Standard Deviation in R

I have a very long dataset of numerous stocks for many years, similar to this one:
one_ticker = tq_get("AAPL", from = "2021-06-01")
one_ticker <- one_ticker %>%
mutate(day = day(date),
month = month(date),
year = year(date))
symbol date open high low close volume adjusted day month year
<chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <dbl> <dbl>
1 AAPL 2021-06-01 125. 125. 124. 124. 67637100 124. 1 6 2021
2 AAPL 2021-06-02 124. 125. 124. 125. 59278900 125. 2 6 2021
3 AAPL 2021-06-03 125. 125. 123. 124. 76229200 123. 3 6 2021
4 AAPL 2021-06-04 124. 126. 124. 126. 75169300 126. 4 6 2021
5 AAPL 2021-06-07 126. 126. 125. 126. 71057600 126. 7 6 2021
6 AAPL 2021-06-08 127. 128. 126. 127. 74403800 126. 8 6 2021
7 AAPL 2021-06-09 127. 128. 127. 127. 56877900 127. 9 6 2021
8 AAPL 2021-06-10 127. 128. 126. 126. 71186400 126. 10 6 2021
9 AAPL 2021-06-11 127. 127. 126. 127. 53522400 127. 11 6 2021
10 AAPL 2021-06-14 128. 131. 127. 130. 96906500 130. 14 6 2021
I first want to calculate the bi-weekly adjusted price return within each month:
- first bi-weekly interval: days 1-15
- second bi-weekly interval: days 16-30
Then calculate the standard deviation of the adjusted returns within each quarter.
Here are the results (for Apple last 6 months):
1. Adjusted_biWeekly_Returns
[1] 0.043128324
[2] 0.052324355
[3] 0.081663817
[4] -0.003620508
[5] 0.026136504
[6] 0.004698278
[7] -0.022818187
[8] -0.048995111
[9] 0.0153523
[10] 0.022176775
Explanations:
[1] 129.257401/123.913231-1 = 0.043128324
(15/06/2021 adjusted price// 01/06/2021 adjusted price)
[5] 148.882721/145.090561-1 = 0.026136504
(13/08/2021 & 02/08/2021) - because there was no trading on the 15th and the 1st.
2. Quarterly Standard Deviation:
1/06/2021 - 1/09/2021 0.028944365 ([1]-[6] standard deviation)
1/09/2021 - 1/01/2022 Not available yet.
How can I calculate it in R?
(There is the tq_transmute function, which is very useful for weekly but not bi-weekly calculations.)
You could do each step separately and make use of the to.weekly and apply.quarterly functions as done in the code below:
library(tidyverse)
library(tidyquant)
library(xts)
library(PerformanceAnalytics)
library(TTR)
one_ticker = tq_get("AAPL", from = "2021-06-01")
one_ticker <- one_ticker %>%
mutate(day = day(date),
month = month(date),
year = year(date))
aapl_adj <- xts(x = one_ticker$adjusted,
order.by = one_ticker$date)
aapl_adj_weekly <- to.weekly(aapl_adj, OHLC = F) # convert to weekly
idx <- seq_along(aapl_adj_weekly) %% 2 > 0 # create index for bi-weekly dates
aapl_adj_biweekly <- aapl_adj_weekly[idx, ] # extract bi-weekly values
aapl_adj_biweekly_returns <- TTR::ROC(aapl_adj_biweekly, type = "discrete", na.pad = F)
aapl_adj_biweekly_returns
# compute quarterly standard deviation
xts::apply.quarterly(aapl_adj_biweekly_returns, sd)
e1
2021-06-18 NA
2021-09-24 0.03159961
2021-11-16 0.02900001
If you don't need to downsample to a biweekly frequency, you could just run this in one go for each ticker. This has the advantage of better estimates of the return standard deviation too, since you use all the available data points instead of only biweekly data:
# fast version without downsampling and annualized standard deviation
aapl_adj |> TTR::ROC(type = "discrete", na.pad = F) |> xts::apply.quarterly(FUN = PerformanceAnalytics::sd.annualized)
e1
2021-06-30 0.1551537
2021-09-30 0.2063587
2021-11-16 0.1701798
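If you would rather stay in dplyr, the days 1-15 / 16-31 grouping from the question can also be sketched directly. The prices below are made up (not real AAPL data), and first()/last() rely on the rows being in date order:

```r
library(dplyr)

set.seed(42)
# Made-up daily prices for two months (not real AAPL data)
df <- tibble(date     = seq(as.Date("2021-06-01"), as.Date("2021-07-31"), by = "day"),
             adjusted = 100 + cumsum(rnorm(61)))

biweekly <- df %>%
  mutate(month = format(date, "%Y-%m"),
         half  = if_else(as.integer(format(date, "%d")) <= 15, 1L, 2L)) %>%
  group_by(month, half) %>%
  # return of the last vs. first available day in each half-month
  summarize(change = last(adjusted) / first(adjusted) - 1, .groups = "drop")
biweekly  # four rows: two halves for each of the two months
```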

How do I use dplyr to correlate each column in a for loop?

I have a dataframe of 19 stocks, including the S&P500 (SPX), throughout time. I want to correlate each one of these stocks with the S&P for each month (Jan-Dec), making 18 x 12 = 216 different correlations, and store these in a list called stockList.
> tokens
# A tibble: 366 x 21
Month Date SPX TZERO .....(16 more columns of stocks)...... MPS
<dbl> <dttm> <dbl> <dbl> <dbl>
1 2020-01-02 3245.50 0.95 176.72
...
12 2020-12-31 3733.42 2.90 .....(16 more columns of stocks)..... 360.73
Here's where my error pops up, when using the index [i] (or [[i]]) in the cor() function:
stockList <- list()
for(i in 1:18) {
  stockList[[i]] <- tokens %>%
    group_by(Month) %>%
    summarize(correlation = cor(SPX, tokens[i+3], use = 'complete.obs'))
}
Error in summarise_impl(.data, dots) :
Evaluation error: incompatible dimensions.
How do I use column indexing in the cor() function when trying to summarize? Is there an alternative way?
First, to recreate data like yours:
library(tidyquant)
# Get gamestop, apple, and S&P 500 index prices
prices <- tq_get(c("GME", "AAPL", "^GSPC"),
                 get = "stock.prices",
                 from = "2020-01-01",
                 to = "2020-12-31")
library(tidyverse)
prices_wide <- prices %>%
  select(date, close, symbol) %>%
  pivot_wider(names_from = symbol, values_from = close) %>%
  mutate(Month = lubridate::month(date)) %>%
  select(Month, Date = date, GME, AAPL, SPX = `^GSPC`)
This should look like your data:
> prices_wide
# A tibble: 252 x 5
Month Date GME AAPL SPX
<dbl> <date> <dbl> <dbl> <dbl>
1 1 2020-01-02 6.31 75.1 3258.
2 1 2020-01-03 5.88 74.4 3235.
3 1 2020-01-06 5.85 74.9 3246.
4 1 2020-01-07 5.52 74.6 3237.
5 1 2020-01-08 5.72 75.8 3253.
6 1 2020-01-09 5.55 77.4 3275.
7 1 2020-01-10 5.43 77.6 3265.
8 1 2020-01-13 5.43 79.2 3288.
9 1 2020-01-14 4.71 78.2 3283.
10 1 2020-01-15 4.61 77.8 3289.
# … with 242 more rows
Then I put that data in longer "tidy" format where each row has the stock value and the SPX value so I can compare them:
prices_wide %>%
  # I want every row to have month, date, and SPX
  pivot_longer(cols = -c(Month, Date, SPX),
               names_to = "symbol",
               values_to = "price") %>%
  group_by(Month, symbol) %>%
  summarize(correlation = cor(price, SPX)) %>%
  ungroup()
# A tibble: 24 x 3
Month symbol correlation
<dbl> <chr> <dbl>
1 1 AAPL 0.709
2 1 GME -0.324
3 2 AAPL 0.980
4 2 GME 0.874
5 3 AAPL 0.985
6 3 GME -0.177
7 4 AAPL 0.956
8 4 GME 0.873
9 5 AAPL 0.792
10 5 GME -0.435
# … with 14 more rows
