Using ifelse to create a running tally in R - r

I am trying to do some quantitative modeling in R. I'm not getting an error message, but the results are not what I actually need.
I am a newbie, but here is my complete code sample.
`library(quantmod)
#Building the data frame and xts to show dividends, splits and technical indicators
getSymbols(c("AMZN"))
Playground <- data.frame(AMZN)
Playground$date <- as.Date(row.names(Playground))
Playground$wday <- as.POSIXlt(Playground$date)$wday #day of the week
Playground$yday <- as.POSIXlt(Playground$date)$mday #day of the month
Playground$mon <- as.POSIXlt(Playground$date)$mon #month of the year
Playground$RSI <- RSI(Playground$AMZN.Adjusted, n = 5, maType="EMA") #can add Moving Average Type with maType =
Playground$MACD <- MACD(AMZN, nFast = 12, nSlow = 26, nSig = 9)
Playground$Div <- getDividends('AMZN', from = "2007-01-01", to = Sys.Date(), src = "google", auto.assign = FALSE)
Playground$Split <- getSplits('AMZN', from = "2007-01-01", to = Sys.Date(), src = "google", auto.assign = FALSE)
Playground$BuySignal <- ifelse(Playground$RSI < 30 & Playground$MACD < 0, "Buy", "Hold")
All is well up until this point when I start using some logical conditions to come up with decision points.
Playground$boughts <- ifelse(Playground$BuySignal == "Buy", lag(Playground$boughts) + 1000, lag(Playground$boughts))
It will execute but the result will be nothing but NA. I suppose this is because you are trying to add NA to a number, but I'm not 100% sure. How do you tell the computer I want you to keep a running tally of how much you have bought?
Thanks so much for the help.

So we want ot buy 1000 shares every time a buy signal is generated?
Your problem stems from MACD idicator. It actually generates two columns, macd and signal. You have to decide which one you want to keep.
Playground$MACD <- MACD(AMZN, nFast = 12, nSlow = 26, nSig = 9)$signal
This should solve the problem at hand.
Also, please check the reference for ifelse. The class of return value can be tricky at times, and so the approach suggested by Floo0 is preferable.
Also, I'd advocate using 1 and 0 instead of buy and sell to show weather you are holding . It makes the math much easier.
And I'd strongly suggest reading some beginner tutorial on backtesting with PerformanceAnalytics. They make the going much much easier.
BTW, you missed this line in the code:
Playground$boughts<- 0
Hope it helps.
EDIT: And I forgot to mention the obvious. discard the first few rows where MACD will be NA
Something like:
Playground<- Playground[-c(1:26),]

Whenever you want to do an ifelse like
if ... Do something, else stay the same: Do not use ifelse
Try this instead
ind <- which(Playground$BuySignal == "Buy")
Playground$boughts[ind] <- lag(Playground$boughts) + 1000

Related

How can I get specific data out of a list (with lists) under a certain condition

I have lists in lists and would like to take the activities that occur just within the first 600 seconds (journey time < 600). The "journey time" starts with 0 and adds the time of corresponding activity "code" on top.
homepage1[["customer_data"]][["activity_list"]][[i]][["journey_time"]]
homepage1[["customer_data"]][["activity_list"]][[i]][["code"]]
So for example [["journey_time"]] could look like this 0, 46.7, 79.4, ...., 1800.
[["code"]] looks like StartPage, ClickItem1, ScrollItem1, ..., ClosePage.
"i" are the customers here.
I tried it for each customer alone, but I, of course, would prefer an iterative process with loops.
Thank you in advance! Appreciate it much!
Marius
Your data structure looks like a JSON -> R conversion, so in case you have the original JSON you may not need to convert it to JSON first.
Having said this, you can use fromJSON with flatten = TRUE to get the relevant data in a nice data.frame format, which makes the processing much easier:
Data
homepage1 <- list(
customer_data = list(
activity_list = list(
list(
list(journey_time = 0, code = "StartPage"),
list(journey_time = 46.7, code = "ClickItem1"),
list(journey_time = 79.4, code = "ScrollItem1"),
list(journey_time = 1800, code = "ClosePage")
)
)
)
)
library(jsonlite)
(mdat <- fromJSON(toJSON(homepage1), flatten = TRUE))
# $customer_data
# $customer_data$activity_list
# $customer_data$activity_list[[1]]
# journey_time code
# 1 0 StartPage
# 2 46.7 ClickItem1
# 3 79.4 ScrollItem1
# 4 1800 ClosePage
So all you need to do is to use cumsum on column journey_time (assuming that each timing measures the time spent on the element since the last visit and not from the beginning, if the later is true you do not need cumsum) to get cumulative timings and use that as a filter:
idx <- cumsum(mdat$customer_data$activity_list[[1]]$journey_time) < 600
unlist(mdat$customer_data$activity_list[[1]]$code[idx])
# [1] "StartPage" "ClickItem1" "ScrollItem1"
Now it is easy to loop over all customers like this:
lapply(mdat$customer_data$activity_list, function(al) {
idx <- cumsum(al$journey_time) < 600
unlist(al$code[idx])
})

Error in xy.coords when trying to fit ARIMA model, please advise

I'm hoping you might be able to help me with an issue that I'm having when trying to fit an ARIMA model for a school project that I`m working on.
The data that I'm using shows weekly sales figures starting from 2019 and going till 2021. My goal is to produce a forecast for the remainder of 2021 based on those figures. As my dataset is comprised of weekly data and the seasonality based on the ACF and PACF plots seems to occur once a year I've set the "S =" argument from the sarima() function to 52. The problem is that every time I try to run the model, I keep getting an error and I can't figure out any way of getting rid of it.
I've tried to use the same code with other data sets on the datacamp environment with "S = 52" and the model runs without a problem. I'm hoping that somebody might be able to give me some advice on how to deal with this issue. Thank you!
P.S.
If the "S =" argument is set lower than 35 then the model will run. (Just in case this information might help)
####Install Packages####
library(tidyverse)
library(zoo)
library(xts)
library(lubridate)
library(astsa)
library(tseries)
library(forecast)
######Load and inspect the data########
unit_sales <- structure(list(Date = c("30/03/2019", "06/04/2019", "13/04/2019",
"20/04/2019", "27/04/2019", "04/05/2019", "11/05/2019", "18/05/2019",
"25/05/2019", "01/06/2019", "08/06/2019", "15/06/2019", "22/06/2019",
"29/06/2019", "06/07/2019", "13/07/2019", "20/07/2019", "27/07/2019",
"03/08/2019", "10/08/2019", "17/08/2019", "24/08/2019", "31/08/2019",
"07/09/2019", "14/09/2019", "21/09/2019", "28/09/2019", "05/10/2019",
"12/10/2019", "19/10/2019", "26/10/2019", "02/11/2019", "09/11/2019",
"16/11/2019", "23/11/2019", "30/11/2019", "07/12/2019", "14/12/2019",
"21/12/2019", "28/12/2019", "04/01/2020", "11/01/2020", "18/01/2020",
"25/01/2020", "01/02/2020", "08/02/2020", "15/02/2020", "22/02/2020",
"29/02/2020", "07/03/2020", "14/03/2020", "21/03/2020", "28/03/2020",
"04/04/2020", "11/04/2020", "18/04/2020", "25/04/2020", "02/05/2020",
"09/05/2020", "16/05/2020", "23/05/2020", "30/05/2020", "06/06/2020",
"13/06/2020", "20/06/2020", "27/06/2020", "04/07/2020", "11/07/2020",
"18/07/2020", "25/07/2020", "01/08/2020", "08/08/2020", "15/08/2020",
"22/08/2020", "29/08/2020", "05/09/2020", "12/09/2020", "19/09/2020",
"26/09/2020", "03/10/2020", "10/10/2020", "17/10/2020", "24/10/2020",
"31/10/2020", "07/11/2020", "14/11/2020", "21/11/2020", "28/11/2020",
"05/12/2020", "12/12/2020", "19/12/2020", "26/12/2020", "02/01/2021",
"09/01/2021", "16/01/2021", "23/01/2021", "30/01/2021", "06/02/2021",
"13/02/2021", "20/02/2021", "27/02/2021", "06/03/2021", "13/03/2021",
"20/03/2021", "27/03/2021"), Units = c(967053.4, 633226.9, 523264,
473914.2, 418087.5, 504342.2, 477819, 415650, 406972.3, 429791.4,
441724.4, 453221.8, 402005.8, 414993.4, 381457.2, 391218.7, 486925.9,
409791.8, 399217.9, 409210, 478121.2, 495549.1, 503918.3, 535949.5,
517450.4, 523036.8, 616456.9, 665979.3, 705201.5, 700168.1, 763538.8,
875501.2, 886586.6, 967806, 1094195, 1285950.5, 1450436.1, 1592162.8,
2038160.5, 1676988.8, 1026193.7, 820405.5, 738643.9, 669657.6,
720287.7, 673194.1, 754102.5, 639532, 680413.6, 710702, 711722.8,
834036.8, 427817.2, 505849.6, 441047.4, 439411, 487634.1, 594594.8,
548796.7, 565682, 528275.2, 448092, 467780.1, 544160.3, 538275.8,
485055.5, 592097.3, 537514.3, 493381.9, 445280.8, 448111.2, 419263.4,
457125.7, 561169.6, 704575.3, 656423.1, 653751.3, 622937.7, 718022.8,
768901.9, 793443, 814604.2, 876269.3, 982921.8, 1064920.7, 1201494.4,
1337374.9, 1619595.8, 1734773.8, 1624071, 1777832.3, 1648201.9,
1106253.8, 940141.1, 796129.1, 853392.9, 932059.1, 905990.4,
981188.6, 907823.9, 956098.8, 1003966.7, 1331125.5, 805593.6,
799486.2)), class = "data.frame", row.names = c(NA, -105L))
####Convert date column to date format
unit_sales$Date <- as.Date(unit_sales$Date, format ="%d/%m/%Y" )
###Convert to xts object
unit_sales_xts <- xts(unit_sales, unit_sales$Date)
periodicity(unit_sales_xts)
###Convert to ts object
unit_sales_vector <- unit_sales$Units
unit_sales_ts <- ts(unit_sales_vector, start = decimal_date(as.Date("2019-03-30")), frequency = 52)
###Plot data
ts.plot(unit_sales_ts)
###Make data stationary and plot it
ts.plot(diff(log(unit_sales_ts)))
###Plot ACF and PACF
pacf_plot <- pacf(diff(log(unit_sales_ts)), lag.max = 105)
acf_plot <- acf(diff(log(unit_sales_ts)), lag.max = 105)
###Test if data is stationary
adf.test(diff(log(unit_sales_ts)))
###Fit ARIMA model
sarima(unit_sales_ts, p = 1, d = 1, q = 0)
sarima.for(unit_sales_ts, n.ahead = 39, 1,1,0)
**###Fit Seasona ARIMA model - THIS IS WHERE THE ERROR OCCURS -**
sarima(unit_sales_ts, p = 1, d = 1, q = 0, P = 0, D = 1, Q = 0, S = 52)
###Forecast using the above model
sarima.for(unit_sales_ts,n.ahead = 39, p = 1, d = 1, q = 0, P = 0, D = 1, Q = 0, S = 52)
I tested you code and get the same error, so I read into the astsa::sarima() implementation and found these two lines, concerning the use of seasonality and your data:
alag <- max(10 + sqrt(num), 3 * S)
nlag <- ifelse(S < 7, 20, 3 * S)
Without reading the whole implementation, I deduce, that the package creator suposes 3 times the season size for the parameter to work correctly. Which is not your case with 105 observation when using S = 52. Now if that is a bug or just not well documented or properly treated in the code, I can not tell you. I do not know which version of the package datacamp runs and what is the update history of the package itself. But we can assume that at least one of the two lines causes the error since all values from 35 for S cause the same error.
One way to work arround is printing the implementation code of the function to console (just write "astsa::sarima" and hit enter, without the " though), copy it to modify the lines (I tried to use 2 * instead of 3 *) and assing it to a function name of your own. Then the code runs. Also you could try the print at the datacamp environment and compare to you local installation.

R Programming Random Stock Pick

I stuck in a problem with R Programming.
My aim is to randomly select 2 stocks out of the Swiss Market Index, which contains of 30 stocks.
Until now I solved the random pick of the 2 stocks with the following code:
SMI_components <- cbind("ABB (ABBN.VX)", "ADECCO (ADEN.VX)", "ACTELION (ATLN.VX)", "JULIUS BAER GRP (BAER.VX)", "RICHEMONT (CFR.VX)", "CREDIT SUISSE (CSGN.VX)", "GEBERIT (GEBN.VX)", "GIVAUDAN (GIVN.VX)", "HOLCIM (HOLN.VX)", "NESTLE (NESN.VX)", "NOVARTIS (NOVN.VX)", "TRANSOCEAN (RIGN.VX)", "ROCHE HOLDING (ROG.VX)", "SWISSCOM (SCMN.VX)", "SGS (SGSN.VX)", "SWISS RE (SREN.VX)", "SYNGENTA (SYNN.VX)", "UBS (UBSG.VX)", "SWATCH GROUP (UHR.VX)", "ZURICH INSURANCE GROUP (ZURN.VX)")
for(i in 1:1){
print(sample(SMI_components, 2))
}
How do I continue my code, if I want to download the historical data from these two random picked stocks?
For example, the random selection is:
"NOVARTIS (NOVN.VX)" and "ZURICH INSURANCE GROUP (ZURN.VX)"
how to continue that ...
SMI_NOVARTIS <- yahooSeries ("NOVN.VX", from = "2005-01-01", to = "2015-07-30", frequency = "daily")
SMI_ZURICH <- yahooSeries ("ZURN.VX", from = "2005-01-01", to = "2015-07-30", frequency = "daily")
I would really appreciate your help
Regards
print outputs to the console but doesn't store anything. So the first thing to do is assign the output of sample into a variable.
my_picks <- sample(SMI_components, 2)
Extract ticker symbol between parens (courtesy the comment below):
my_picks <- sub(".*\\((.*)\\).*", "\\1", my_picks)
Then you can use lapply, to call a function (yahooSeries) for each value in my_picks.
series_list <- lapply(my_picks, yahooSeries, from = "2005-01-01", to = "2015-07-30", frequency = "daily")
Then you'll get the output in a list. series_list[[1]] will have the output of yahooSeries for the first value of my_picks, and series_list[[2]] for the second
Lastly, not sure why you bothered with the single-iteration for loop, but you don't need that

Relative Performance loop in R

I'm not a programmer by any means and have been trying to learn R to code various trading strategies. I'm trying to calculate the relative performance of a list of stocks versus the S&P 500 and save it to a matrix. It appears that what I've written only goes through the first symbol and then stops. Below is the code that I've come up with. I appreciate any help, input and advice on how to proceed. Thank you.
library(quantmod)
library(PerformanceAnalytics)
Sys.setenv(TZ = "UTC")
symbols <- c('IBM', 'GE', '^GSPC')
getSymbols(symbols, src = "yahoo", from = "2010-12-31", to = Sys.Date())
symadj <- cbind(IBM[,6], GE[,6])
sp5adj <- GSPC[,6]
# Calculate Relative Performance vs S&P and save data
for (i in length(symadj)) {
rp <- matrix(symadj[,1]/sp5adj, nrow = 1070, ncol = 3)
print(tail(rp))
}
_You are not looping over an array but over a single number:
for (i in length(symadj))
Try (see the seq added, watch the parenthesis. Plus, be careful with length, the iteration is over ncol - i.e. the columns):
for (i in seq(1,ncol(rp),1))
_Also, you are going always through the same column:
rp <- matrix(symadj[,1]/sp5adj, nrow = 1070, ncol = 3)
_A thing I skipped: you should build your matrix before the loop:
rp <- matrix(0,nrow=1071,ncol=2)
And then assign without overwritting your previous matrix - you have already build it (plus, look at the i where the 1 was, now you are iterating)
rp[,i] <- symadj[,i]/sp5adj #This inside the loop
_Your for loop should end up looking something like this:
rp <- matrix(0,nrow=1071,ncol=2)
for (i in seq(1,ncol(rp),1)) {
rp[,i] <- symadj[,i]/sp5adj #This inside the loop
print(tail(rp))
}
\!/ Now there are 1071 days in that period, so the matrix should have one more row - that's why the 1071.

Avoid for-loop: Define blocks of actions within a time range

I need to define blocks of actions - so I want to group together all actions for a single id that take place less than 30 days since the last action. If it's more than 30 days since the last action, then I'd increment the label by one (so label 2, 3, 4...). Every new id would start at 1 again.
Here's the data:
dat = data.frame(cbind(
id = c(rep(1,2), rep(16,3), rep(17,24)),
##day_id is the action date in %Y%m%d format - I keep it as numeric but could potentially turn to a date.
day_id = c(20130702, 20130121, 20131028, 20131028, 20130531, 20140513, 20140509,
20140430, 20140417, 20140411, 20140410, 20140404,
20140320, 20140313, 20140305, 20140224, 20140213, 20140131, 20140114,
20130827, 20130820, 20130806, 20130730, 20130723,
20130719, 20130716, 20130620, 20130620, 20130614 ),
###diff is the # of days between actions/day_ids
diff =c(NA,162,NA,0,150,NA,4,9,13,6,1,6,15,7,8,9,11,13,17,140,7,14,
7,7,4,3,26,0,6),
###Just a flag to say whether it's a new id
new_id = c(1,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)
))
I've done it with a for loop and managed to avoid loops within loops (see below) but can't seem to get rid of that outer loop. Of course, it gets extremely slow with thousands of ids. In the example below, 'call_block' is what I'm trying to reproduce but without the for loop. Can anyone help me get this out of a loop??
max_days = 30
r = NULL
for(i in unique(dat$id)){
d = dat$diff[dat$id==i]
w = c(1,which(d>=max_days) , length(d)+1)
w2 = diff(w)
r = c(r,rep(1:(length(w)-1), w2))
}
dat$call_block = r
Thank you!
Posting #alexis_laz's answer here to close out the question
library(data.table)
f = function(x){
ret = c(1, cumsum((x >= 30)[-1]) + 1)
return(ret = ret)
}
df = data.table(dat)
df2 = df[,list(call_block= f(diff)), by = id]

Resources