How to get Indian stock market index data in R?

I was trying to get BSE SENSEX data in R and came across the question "Import Indian stock prices into R", which is useful, but I am not able to get the SENSEX index data using getSymbols from quantmod.
I have tried many options, but none of them worked:
quantmod::getSymbols("SENSEX", src="yahoo")
quantmod::getSymbols("SENSEX.BO", src="yahoo")
quantmod::getSymbols("BSE SENSEX", src="yahoo")
quantmod::getSymbols("BSE SENSEX.BO", src="yahoo")

The tidyquant and quantmod packages both use Yahoo Finance as their data source, so check what the security is called on the Yahoo Finance website. For the SENSEX, the symbol is ^BSESN:
library(tidyquant)
tq_get("^BSESN")
# A tibble: 2,643 × 8
symbol date open high low close volume adjusted
<chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 ^BSESN 2012-01-03 15641. 15970. 15641. 15939. 16200 15939.
2 ^BSESN 2012-01-04 15967. 16005. 15822. 15883. 17800 15883.
3 ^BSESN 2012-01-05 15893. 15980. 15809. 15857. 21200 15857.
4 ^BSESN 2012-01-06 15789. 16001. 15665. 15868. 17200 15868.
5 ^BSESN 2012-01-07 NA NA NA NA NA NA
6 ^BSESN 2012-01-09 15840. 15872. 15678. 15815. 11200 15815.
7 ^BSESN 2012-01-10 15898. 16181. 15898. 16165. 19600 16165.
8 ^BSESN 2012-01-11 16222. 16245. 16128. 16176. 18600 16176.
9 ^BSESN 2012-01-12 16117. 16179. 15963. 16038. 14400 16038.
10 ^BSESN 2012-01-13 16145. 16257. 16050. 16155. 19600 16155.
# … with 2,633 more rows
# ℹ Use `print(n = ...)` to see more rows
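The printout above contains a row with no data (2012-01-07). A minimal base-R sketch of dropping such rows before further analysis follows; the small data frame `sensex` is typed in by hand from the printout (rounded values, not a live download), and the names `sensex` and `sensex_clean` are only illustrative.

```r
# Three rows copied by hand from the printout above (values as displayed, i.e. rounded).
sensex <- data.frame(
  date  = as.Date(c("2012-01-06", "2012-01-07", "2012-01-09")),
  close = c(15868, NA, 15815)
)

# Keep only the rows in which every column has a value.
sensex_clean <- sensex[complete.cases(sensex), ]
print(sensex_clean)
```

With a tibble returned by tq_get, tidyr::drop_na() should do the same job inside a pipe.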

Related

How can I find the total number of financial observations within a time frame in R?

Let's say that we have a data frame of stocks beginning at 2017/01/01 and ending 2022/04/01.
library(tidyquant)
symbol = c("DOW","TWTR","FB","GOOG","TSLA","NOK","AMZN","AAPL")
ASSETS = tq_get(symbol,
from = "2017-01-01",
to = "2022-04-01")
DATAWEB = ASSETS%>%
select(symbol,date,close,volume)
DATAWEB
I want to supply a date (any date) and calculate the number of annual observations for each stock. A calendar year has 365 days, but in finance a year has about 252 trading days. If I take the percentage change at lag 2 over a year of 252 trading days, I should get 250 observations. But the code below, which looks back one calendar year (365 days), gives 251. Why?
DATE = as.Date("2022-01-31",tryFormats = "%Y-%m-%d");DATE # for example, 31 January
DATAWEB%>%
dplyr::group_by(symbol, .drop=FALSE)%>%
dplyr::filter(date <=DATE & date >=(DATE-lubridate::years(1)) )%>%
dplyr::mutate(y = as.numeric( close - dplyr::lag(close, 2) / close))%>%
tidyr::drop_na()%>%
dplyr::select(symbol,date,y)%>%
dplyr::summarize(OBS1Y = n())
# A tibble: 8 x 2
symbol OBS1Y
<chr> <int>
1 AAPL 251
2 AMZN 251
3 DOW 251
4 FB 251
5 GOOG 251
6 NOK 251
7 TSLA 251
8 TWTR 251
The problem is that if I go back 252 observations instead, I end up with 250:
DATAWEB%>%
dplyr::group_by(symbol, .drop=FALSE)%>%
dplyr::filter(date <=DATE)%>%
dplyr::slice_tail(n=252)%>%
dplyr::mutate(y = as.numeric( close - dplyr::lag(close, 2) / close))%>%
tidyr::drop_na()%>%
dplyr::select(symbol,date,y)%>%
dplyr::summarize(OBS1Y = n())
# A tibble: 8 x 2
symbol OBS1Y
<chr> <int>
1 AAPL 250
2 AMZN 250
3 DOW 250
4 FB 250
5 GOOG 250
6 NOK 250
7 TSLA 250
8 TWTR 250
but then the window might not go back exactly one year; it can be shorter (for example, AAPL goes 252 observations back and starts later than one year before the chosen date):
# A tibble: 250 x 3
symbol date y
<chr> <date> <dbl>
1 AAPL 2021-02-04 136.
2 AAPL 2021-02-05 136.
3 AAPL 2021-02-08 136.
4 AAPL 2021-02-09 135.
5 AAPL 2021-02-10 134.
6 AAPL 2021-02-11 134.
7 AAPL 2021-02-12 134.
8 AAPL 2021-02-16 132.
9 AAPL 2021-02-17 130.
10 AAPL 2021-02-18 129.
# ... with 240 more rows
# i Use `print(n = ...)` to see more rows
My question is: how can I fix this? I want to supply any date (probably the last day of a month) and go back one year so that I get 250 observations (252, minus the 2 NA rows, gives 250).
Any help would be appreciated.
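A small base-R sketch may help with reading the counts above. It uses made-up prices (the vector `close` is hypothetical) and shows two things: a lag of 2 always removes exactly two rows, so a window of n rows leaves n - 2 observations; and, because of operator precedence, `close - lag(close, 2) / close` is not the same as `(close - lag(close, 2)) / close`. The y values near 136 in the printout look consistent with the first form, though that is an inference from the output, not something the post states.

```r
# Hypothetical closing prices for one symbol.
close <- c(100, 102, 101, 105, 110)

# Base-R equivalent of dplyr::lag(close, 2): shift down by two, pad with NA.
lag2 <- c(NA, NA, head(close, -2))

# As written in the pipeline: division binds tighter than subtraction.
as_written <- close - lag2 / close

# Change over two rows relative to the current close (parentheses added).
pct_change <- (close - lag2) / close

# Number of usable observations is always the row count minus the lag.
n_obs <- sum(!is.na(pct_change))
print(n_obs)
```

Read this way, a result of 251 suggests that the one-calendar-year filter kept 253 rows per symbol, while slice_tail(n = 252) keeps exactly 252 rows and therefore leaves 250.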

R Tidyquant adding a price difference column to a tibble

I'm trying to add a column to a Tidyquant tibble. Here's the code:
library(tidyquant)
symbol <- 'AAPL'
start_date <- as.Date('2022-01-01')
end_date <- as.Date('2022-03-31')
prices <- tq_get(symbol,
from = start_date,
to = end_date,
get = 'stock.prices')
head(prices)
# A tibble: 6 x 8
symbol date open high low close volume adjusted
<chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 AAPL 2022-01-03 178. 183. 178. 182. 104487900 182.
2 AAPL 2022-01-04 183. 183. 179. 180. 99310400 179.
3 AAPL 2022-01-05 180. 180. 175. 175. 94537600 175.
4 AAPL 2022-01-06 173. 175. 172. 172 96904000 172.
5 AAPL 2022-01-07 173. 174. 171. 172. 86709100 172.
6 AAPL 2022-01-10 169. 172. 168. 172. 106765600 172.
Now, I'm attempting to add the change_on_day column (that's just the difference in the 'adjusted' prices between one day and the next) using the following:
prices$change_on_day <- diff(prices$adjusted)
The error message is:
Error: Assigned data `diff(prices$adjusted)` must be compatible with existing data.
x Existing data has 61 rows.
x Assigned data has 60 rows.
i Only vectors of size 1 are recycled.
How would I add this price difference column?
Thanks!
If you are trying to subtract the previous day's value from today's value, you can do that with the lag() function:
prices %>%
mutate(change_on_day=adjusted-lag(adjusted,1))
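The error in the question arises because diff() returns one element fewer than its input. A minimal base-R sketch with made-up prices (the vector `adjusted` here is hypothetical, not the downloaded data) shows that padding the result with a leading NA gives a column of the right length, matching what the lag() approach produces:

```r
# Hypothetical adjusted prices for four consecutive days.
adjusted <- c(182, 179, 175, 172)

# diff() yields length(adjusted) - 1 values, so pad the first day with NA.
change_on_day <- c(NA, diff(adjusted))

# Same thing written as "today minus previous day".
previous      <- c(NA, head(adjusted, -1))
change_by_lag <- adjusted - previous

print(data.frame(adjusted, change_on_day, change_by_lag))
```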
We can use tq_transmute with quantmod::periodReturn, setting the period argument to 'daily', to calculate daily returns. Note that this gives relative returns rather than absolute price differences.
library(tidyquant)
symbol <- "AAPL"
start_date <- as.Date("2022-01-01")
end_date <- as.Date("2022-03-31")
prices <- tq_get(symbol,
from = start_date,
to = end_date,
get = "stock.prices"
)
stock_returns_daily <- prices %>%
tq_transmute(
select = adjusted,
mutate_fun = periodReturn,
period = "daily",
col_rename = "change_on_day"
)
stock_returns_daily
#> # A tibble: 61 × 2
#> date change_on_day
#> <date> <dbl>
#> 1 2022-01-03 0
#> 2 2022-01-04 -0.0127
#> 3 2022-01-05 -0.0266
#> 4 2022-01-06 -0.0167
#> 5 2022-01-07 0.000988
#> 6 2022-01-10 0.000116
#> 7 2022-01-11 0.0168
#> 8 2022-01-12 0.00257
#> 9 2022-01-13 -0.0190
#> 10 2022-01-14 0.00511
#> # … with 51 more rows
Created on 2022-04-18 by the reprex package (v2.0.1)
For more information check this vignette

Generalize "$-notation"

I'm still getting used to working in R and thought constructing a "simple" MACD-screener would be a great way to get into some of the inner workings of R. However, I have encountered the following problem.
I've been able to calculate the MACD and signal line for a single stock. Now, in order to scan multiple stocks, I have to generalize the code. My question is: how can I use a variable (e.g. the name of the stock currently being looked at) in the "$-notation"?
After this I'm planning to write a for loop iterating over the names of the stocks in a list object. Is this a practical way of doing it?
Below is the code I have so far. In this code I'm looking to replace "QQQ" with a variable.
library(quantmod)
tickers <- c('QQQ','SPY','AAPL','MMM')
ema.s = 12
ema.l = 26
ema.k = 9
ema.t = 200
getSymbols(tickers, from = '2021-01-6',
to = "2021-10-21",warnings = FALSE,
auto.assign = TRUE)
QQQ$QQQ.EMA.S <- EMA(QQQ[,6], n = ema.s)
QQQ$QQQ.EMA.L <- EMA(QQQ[,6], n = ema.l)
QQQ$QQQ.MACD <- QQQ$QQQ.EMA.S - QQQ$QQQ.EMA.L
QQQ$QQQ.SIG <- EMA(QQQ$QQQ.MACD, n = ema.k)
You can use tidyquant to do all of this in one go.
library(tidyquant)
ema.s = 12
ema.l = 26
tickers <- c('QQQ','SPY','AAPL','MMM')
# get all the data in a tibble
stock_data <- tq_get(tickers,
from = '2021-01-6',
to = "2021-10-21")
stock_data <- stock_data %>%
group_by(symbol) %>%
tq_mutate(select = adjusted,
mutate_fun = MACD,
nFast = ema.s,
nSlow = ema.l)
stock_data
# A tibble: 800 x 10
# Groups: symbol [4]
symbol date open high low close volume adjusted macd signal
<chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 QQQ 2021-01-06 307 312. 306. 308. 52809600 306. NA NA
2 QQQ 2021-01-07 310. 316. 310. 315. 30394800 314. NA NA
3 QQQ 2021-01-08 317. 319. 315. 319. 33955800 318. NA NA
4 QQQ 2021-01-11 316. 317. 314. 314. 32746400 313. NA NA
5 QQQ 2021-01-12 314. 316. 311. 314. 29266800 313. NA NA
6 QQQ 2021-01-13 314. 317. 314. 316. 22898400 315. NA NA
7 QQQ 2021-01-14 316. 318. 314. 314. 23500100 313. NA NA
8 QQQ 2021-01-15 314. 315. 311. 312. 35118700 311. NA NA
9 QQQ 2021-01-19 314. 317. 313. 316. 24537000 315. NA NA
10 QQQ 2021-01-20 320. 325. 317. 324. 30728100 323. NA NA
If you want to do this in base R functions combined with only quantmod functions, check the quantmod tag, there are a few posts that use lapply to do this. If you don't find what you need, let me know.
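On the literal question of using a variable where `$` expects a fixed name: the `[[ ]]` operator accepts a character variable. A minimal base-R sketch follows, using a hypothetical list of toy data frames in place of the downloaded xts objects; the names `stocks`, `col_name` and the `.CHANGE` suffix are made up for illustration.

```r
# Toy stand-in for downloaded price data: one data frame per ticker.
stocks <- list(
  QQQ = data.frame(adjusted = c(10, 11, 12, 13)),
  SPY = data.frame(adjusted = c(20, 22, 21, 23))
)

for (ticker in names(stocks)) {
  col_name <- paste0(ticker, ".CHANGE")       # e.g. "QQQ.CHANGE"
  prices   <- stocks[[ticker]][["adjusted"]]  # [[ ]] takes a variable, $ does not
  stocks[[ticker]][[col_name]] <- c(NA, diff(prices))
}

print(stocks$QQQ)
```

Keeping the series in a named list, rather than as separate objects created by auto.assign = TRUE, is what makes the loop straightforward.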

Mutate while accessing value in list column in a pipe with map and pluck

I would like to achieve the following:
filter dataframe catalogs based on multiple columns in dataframe orders, for each row in dataframe orders and store the result in a list column in dataframe orders. (succeeded)
calculate the difference between a date in data frame orders and another date in the new listcolumn.
Table s_orders contains order data for different people (account keys). Table s_catalogs contains all catalogs that were sent to each account key
For each order, I want to know:
if and what catalogs were sent from the previous order (or the beginning) until the day before the focal order. More specifically, consumers received a (paper) catalog at s_catalogs$CATDATE. I want to know for each order what catalogs were received between the previous order (s_orders$PREVORDER) and the latest order. Because some consumers do not have a previous order I set the previous order date startdate to date("1999-12-31") which is the beginning of my dataset.
Then I want to do some calculations on the catalog data. (in this example: calculate the difference between date of a catalog and the order date)
For this, I have written a function getCatalogs, which takes the account key and two dates as input and outputs a data frame with the matching rows from the other table. I would much appreciate a better, more efficient solution, maybe with some sort of join.
I think my main problem is how to use mutate, pmap, pipes, pluck interchangeably for building complex queries on multiple tables.
My actual problem is outlined in sections Desired result and Problem.
# packages needed
library("dplyr")
library("lubridate")
library("purrr")
#library("tidyverse")
Example data
(I sampled some users from my data; s_ stands for 'sample'.)
# orders
s_orders <- structure(list(ACCNTKEY = c(2806, 2806, 2806, 3729, 3729, 3729,
3729, 4607, 4607, 4607, 4607, 4742, 11040, 11040, 11040, 11040,
11040, 17384), ORDDATE = structure(c(11325, 11703, 11709, 11330,
11375, 11384, 12153, 11332, 11445, 11589, 11713, 11333, 11353,
11429, 11662, 11868, 11960, 11382), class = "Date")), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -18L))
# # A tibble: 18 x 2
# ACCNTKEY ORDDATE
# <dbl> <date>
# 1 2806 2001-01-03
# 2 2806 2002-01-16
# 3 2806 2002-01-22
# 4 3729 2001-01-08
# 5 3729 2001-02-22
# 6 3729 2001-03-03
# 7 3729 2003-04-11
# 8 4607 2001-01-10
# 9 4607 2001-05-03
# 10 4607 2001-09-24
# 11 4607 2002-01-26
# 12 4742 2001-01-11
# 13 11040 2001-01-31
# 14 11040 2001-04-17
# 15 11040 2001-12-06
# 16 11040 2002-06-30
# 17 11040 2002-09-30
# 18 17384 2001-03-01
# catalogs
s_catalogs <- structure(list(ACCNTKEY = c("2806", "2806", "4607", "2806", "4607",
"4607", "4607"), CATDATE = structure(c(11480, 11494, 11522, 11858,
11886, 12264, 12250), class = "Date"), CODE = c("2806/07/2001",
"2806/21/2001", "4607/19/2001", "2806/20/2002", "4607/18/2002",
"4607/31/2003", "4607/17/2003")), row.names = c(NA, -7L), class = c("tbl_df",
"tbl", "data.frame"))
# # A tibble: 7 x 3
# ACCNTKEY CATDATE CODE
# <chr> <date> <chr>
# 1 2806 2001-06-07 2806/07/2001
# 2 2806 2001-06-21 2806/21/2001
# 3 4607 2001-07-19 4607/19/2001
# 4 2806 2002-06-20 2806/20/2002
# 5 4607 2002-07-18 4607/18/2002
# 6 4607 2003-07-31 4607/31/2003
# 7 4607 2003-07-17 4607/17/2003
calculate the lagged order date
# calculate previous order date for each order in s_orders
s_orders<-s_orders %>%
group_by(ACCNTKEY) %>%
arrange(ORDDATE) %>%
mutate(PREVORDER=as_date(lag(ORDDATE)))
So now we know the previous order (if any)
Function getCatalogs (improvement appreciated)
The function getCatalogs below returns a data frame with the catalogs that the account key received before the order (more precisely, the catalogs received between startdate and enddate).
# in case _startdate_ is missing then I set it to some starting value
getCatalogs<-function(key,startdate,enddate){
if(is.na(startdate)){
startdate<-as_date(date("1999-12-31"))
}
tmp <- s_catalogs[s_catalogs$ACCNTKEY==key &
s_catalogs$CATDATE<enddate &
s_catalogs$CATDATE>=startdate,]
if (NROW(tmp)>0){
return(tmp)
}else{return(NA)}
}
Use the function
let's get for each order all catalogs in a listcolumn
# For each row in s_orders search in dataframe s_catalogs all catalogs that were received for that account key before the order date but after the previous order.
s_orders <- s_orders %>% as_tibble() %>%
mutate(catalogs =
pmap(c(list(ACCNTKEY),list(PREVORDER),list(ORDDATE)),.f= function(x,y,z){getCatalogs(x,y,z)}))
This line for example gets the date of the latest catalog, which is what i need:
s_orders %>% pluck("catalogs") %>% pluck(13) %>% pluck("CATDATE") %>% max()
# [1] "2001-06-21"
Desired result:
Now I would like to retrieve the number of days between the above date and the date of the order (ORDDATE). The following code does it exactly but it is only correct in row 13.
# get amount of days since last catalog
s_orders3 <- s_orders %>%
mutate(diff = ORDDATE - s_orders %>%
pluck("catalogs") %>% pluck(13) %>% pluck("CATDATE") %>% max())
# # A tibble: 18 x 5
# ACCNTKEY ORDDATE PREVORDER catalogs diff
# <dbl> <date> <date> <list> <time>
# 1 2806 2001-01-03 NA <lgl [1]> -169 days
# 2 3729 2001-01-08 NA <lgl [1]> -164 days
# 3 4607 2001-01-10 NA <lgl [1]> -162 days
# 4 4742 2001-01-11 NA <lgl [1]> -161 days
# 5 11040 2001-01-31 NA <lgl [1]> -141 days
# 6 3729 2001-02-22 2001-01-08 <lgl [1]> -119 days
# 7 17384 2001-03-01 NA <lgl [1]> -112 days
# 8 3729 2001-03-03 2001-02-22 <lgl [1]> -110 days
# 9 11040 2001-04-17 2001-01-31 <lgl [1]> -65 days
# 10 4607 2001-05-03 2001-01-10 <lgl [1]> -49 days
# 11 4607 2001-09-24 2001-05-03 <tibble [1 × 3]> 95 days
# 12 11040 2001-12-06 2001-04-17 <lgl [1]> 168 days
# 13 2806 2002-01-16 2001-01-03 <tibble [2 × 3]> 209 days
# 14 2806 2002-01-22 2002-01-16 <lgl [1]> 215 days
# 15 4607 2002-01-26 2001-09-24 <lgl [1]> 219 days
# 16 11040 2002-06-30 2001-12-06 <lgl [1]> 374 days
# 17 11040 2002-09-30 2002-06-30 <lgl [1]> 466 days
# 18 3729 2003-04-11 2001-03-03 <lgl [1]> 659 days
Check manually:
date("2002-01-16")-date("2001-06-21")
# Time difference of 209 days
Problem
However, the code subtracts the same date from order date in every row. I want it to use the date that belongs to each particular row.
So the problem is how to replace the %>% pluck(13) %>% step with some command that does this for every row and puts the result in the diff column.
I am really searching for a solution that uses either purrr or dplyr or some other package that is just as efficient and clear.
Hoping that I have understood the question correctly, here is my attempt at solving the problem. I changed the getCatalogs function to return only the maximum CATDATE, if one is present.
library(dplyr)
library(purrr)
getCatalogs<-function(key,startdate,enddate){
if(is.na(startdate)) startdate<- as.Date("1999-12-31")
tmp <- s_catalogs$CATDATE[s_catalogs$ACCNTKEY==key &
s_catalogs$CATDATE<enddate &
s_catalogs$CATDATE>=startdate]
if (length(tmp) > 0) max(tmp) else NA
}
s1_orders<- s_orders %>%
group_by(ACCNTKEY) %>%
arrange(ORDDATE) %>%
mutate(PREVORDER=lag(ORDDATE))
and then use pmap_dbl like this:
s1_orders %>%
mutate(catalogs = pmap_dbl(list(ACCNTKEY,PREVORDER,ORDDATE), getCatalogs),
catalogs = as.Date(catalogs, origin = "1970-01-01"),
diff = ORDDATE - catalogs)
# ACCNTKEY ORDDATE PREVORDER catalogs diff
# <dbl> <date> <date> <date> <drtn>
# 1 2806 2001-01-03 NA NA NA days
# 2 3729 2001-01-08 NA NA NA days
# 3 4607 2001-01-10 NA NA NA days
# 4 4742 2001-01-11 NA NA NA days
# 5 11040 2001-01-31 NA NA NA days
# 6 3729 2001-02-22 2001-01-08 NA NA days
# 7 17384 2001-03-01 NA NA NA days
# 8 3729 2001-03-03 2001-02-22 NA NA days
# 9 11040 2001-04-17 2001-01-31 NA NA days
#10 4607 2001-05-03 2001-01-10 NA NA days
#11 4607 2001-09-24 2001-05-03 2001-07-19 67 days
#12 11040 2001-12-06 2001-04-17 NA NA days
#13 2806 2002-01-16 2001-01-03 2001-06-21 209 days
#14 2806 2002-01-22 2002-01-16 NA NA days
#15 4607 2002-01-26 2001-09-24 NA NA days
#16 11040 2002-06-30 2001-12-06 NA NA days
#17 11040 2002-09-30 2002-06-30 NA NA days
#18 3729 2003-04-11 2001-03-03 NA NA days
Update
Without changing the current getCatalogs function, we can test the length of catalogs
s1_orders %>%
mutate(catalogs = pmap(list(ACCNTKEY,PREVORDER,ORDDATE), getCatalogs),
temp = map_dbl(catalogs, ~if (length(.x) > 1)
.x %>% pluck("CATDATE") %>% max else NA),
temp = as.Date(temp, origin = "1970-01-01"),
diff = ORDDATE - temp)
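The same row-by-row idea can be sketched without purrr, which may make the mechanics easier to see. The example below is a base-R illustration on a hypothetical two-row table (`orders`, with a list column built by hand), not the full sample data: each list element is either NA or a data frame, and vapply() pulls the latest CATDATE out of each one.

```r
# Two orders; the first has no catalogs (NA), the second has two.
orders <- data.frame(ORDDATE = as.Date(c("2001-01-03", "2002-01-16")))
orders$catalogs <- list(
  NA,
  data.frame(CATDATE = as.Date(c("2001-06-07", "2001-06-21")))
)

# For every row, take the latest catalog date, or NA if there is no data frame.
latest <- vapply(
  orders$catalogs,
  function(ctl) if (is.data.frame(ctl)) as.numeric(max(ctl$CATDATE)) else NA_real_,
  numeric(1)
)

# Convert back to Date and subtract row by row.
orders$diff <- orders$ORDDATE - as.Date(latest, origin = "1970-01-01")
print(orders[, c("ORDDATE", "diff")])
```

Testing with is.data.frame() instead of length() is a design choice made here so that the check does not depend on how many columns the catalog table has.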

Reconstructing daily values backwards using a final value in dplyr

This should be pretty straightforward to do in R using dplyr, but I am a bit stuck as to how exactly to do this.
I have aggregated by day a set of transaction revenues, and I want to calculate the daily balance using a final balance. In Excel this would be a trivial formula to do - input the first balance and then for subsequent rows subtract the daily revenues from the previous row's balance.
I am trying to do this in dplyr and keep hitting a wall. Any advice as to how I could achieve this would be great. I'm pretty sure you need to use lag() but I can't seem to figure out exactly how.
Sample data:
library(tidyverse)
x <- tibble(date = c('2018-04-03','2018-04-02','2018-04-01','2018-03-31','2018-03-30'),
daily_spend = c(575,-5.26,-112.45,-73.5,25.6))
final_balance <- 1000
Here's what the data looks like:
# A tibble: 5 x 2
date daily_spend
<chr> <dbl>
1 2018-04-03 575.
2 2018-04-02 -5.26
3 2018-04-01 -112.
4 2018-03-31 -73.5
5 2018-03-30 25.6
What I would like to do is add an additional column 'balance' and, for each row, set the value to the previous balance minus the previous row's daily spend, to give a daily balance.
Here are some expected values:
# A tibble: 5 x 3
date daily_spend end_balance
<chr> <dbl> <dbl>
1 2018-04-03 575. 1000
2 2018-04-02 -5.26 425
3 2018-04-01 -112. 430.
4 2018-03-31 -73.5 542.71
5 2018-03-30 25.6 616.21
Here's what I have been trying, which doesn't work beyond the first two rows (due to the nature of the way lag() works):
x <- x %>%
mutate(end_balance = ifelse(row_number() ==1,final_balance,0),
end_balance = ifelse(row_number()>1,lag(end_balance)-lag(daily_spend),end_balance))
The results of this method:
# A tibble: 5 x 3
date daily_spend end_balance
<chr> <dbl> <dbl>
1 2018-04-03 575. 1000.
2 2018-04-02 -5.26 425.
3 2018-04-01 -112. 5.26
4 2018-03-31 -73.5 112.
5 2018-03-30 25.6 73.5
Here you go
mutate(x, end_balance = final_balance - cumsum(daily_spend) + daily_spend)
Subtract the cumulative sum of the lagged daily_spend from final_balance:
x %>%
mutate(end_balance = final_balance - cumsum(lag(daily_spend, default = 0))) %>%
as.data.frame()
# date daily_spend end_balance
#1 2018-04-03 575.00 1000.00
#2 2018-04-02 -5.26 425.00
#3 2018-04-01 -112.45 430.26
#4 2018-03-31 -73.50 542.71
#5 2018-03-30 25.60 616.21
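As a quick check that the two answers above agree, here is a base-R sketch using the sample data from the question. The lag is written out by hand with head(), and the variable names `lagged_spend`, `end_balance` and `end_balance_alt` are only illustrative.

```r
daily_spend   <- c(575, -5.26, -112.45, -73.5, 25.6)
final_balance <- 1000

# Shift daily_spend down one row, filling the first row with 0
# (the base-R equivalent of dplyr::lag(daily_spend, default = 0)).
lagged_spend <- c(0, head(daily_spend, -1))

# Second answer: subtract the running total of the lagged spend.
end_balance <- final_balance - cumsum(lagged_spend)

# First answer: subtract the running total, then add the current row back.
end_balance_alt <- final_balance - cumsum(daily_spend) + daily_spend

print(data.frame(daily_spend, end_balance, end_balance_alt))
```

Both forms work because the balance on a given row only depends on the spend of the rows above it, so a cumulative sum replaces the row-by-row recursion that lag() alone cannot express.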

Resources