Time series in R, unwanted variable class changes

I'm trying to program the Coppock Curve in R and finding time series exceedingly difficult to work with in R. The S&P 500 data can be downloaded from finance.yahoo.com. Just bring in the date and the adjusted close.
sp500 = read.csv(file="/.../sp500.csv",header=TRUE)
attach(zoo)
sp500.z = zoo(sp500)
lag11 = lag(sp500.z$SP500, -11, na.pad=TRUE)
lag14 = lag(sp500.z$SP500, -14, na.pad=TRUE)
sp500.z = cbind(sp500.z, lag11, lag14)
str(sp500.z)
sp500.z[1:25,]
data = (sp500.z)
data[1:25,] ### everything looks good up to here
str(data)
data = as.data.frame(data) ### problem arises here, everything becomes factor even if it wasn't before, so I try to convert, but it doesn't work
data$SP500 = as.numeric(data$SP500)
data$lag11 = as.numeric(data$lag11)
data$lag14 = as.numeric(data$lag14)
data$date = as.Date(data$date)
In order to do further data manipulation I need to convert to a data frame, because you cannot attach a zoo matrix or perform dataset$variable operations on it. When I convert to data frame the lag11 and lag14 variables turn into index numbers. The data frame conversion makes everything a factor, and when the variable types are corrected the problem occurs.
The Coppock Curve is calculated as a 10-month weighted moving average of the sum of the 14-month rate of change and the 11-month rate of change for the index.
Coppock Curve = 10-month weighted MA(of 14-month ROC + 11-month ROC)
Where the ROC is:
ROC = [(Close - Close n periods ago) / (Close n periods ago)] * 100
where n is 11 and 14. The weights on the ROC terms go backwards in time from 10/55 for period t, 9/55 for t-1,..., 1/55 for t-9.

You do not need to convert to a data.frame. While you cannot use $ on a matrix, you can use it on zoo and xts objects. And you really shouldn't be using attach, especially if this is something you plan to put into a reusable script.
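As for why the converted columns turned into "index numbers": a likely cause (an assumption, since the CSV itself is not shown) is that the columns became factors and as.numeric() was applied to them directly. On a factor, as.numeric() returns the internal level codes, not the printed values. A minimal sketch with made-up prices:

```r
# Made-up closing prices stored as a factor (hypothetical values)
f <- factor(c("1848.36", "1782.59", "1859.45"))

codes  <- as.numeric(f)                # level codes: positions in the sorted levels
values <- as.numeric(as.character(f))  # the numbers that were actually printed

print(codes)
print(values)
```

Going through as.character() first recovers the real values.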
What you want to do is very easy with xts/zoo, quantmod, and TTR.
library(quantmod) # also loads TTR, xts, and zoo
# download data from Yahoo Finance
sp500 <- getSymbols("^GSPC", auto.assign=FALSE)
# convert to monthly
sp500m <- to.monthly(sp500)
# add lags (via $<-, like you claimed couldn't be done)
sp500m$lag11 <- ROC(Ad(sp500m), n=11, type="discrete")
sp500m$lag14 <- ROC(Ad(sp500m), n=14, type="discrete")
# calculate Coppock Curve
# wts must have length n; the last weight applies to the most recent observation
sp500m$Coppock <- WMA(sp500m$lag11 + sp500m$lag14, n=10, wts=(1:10)/55)
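To see what the calculation does without any packages or downloads, here is a rough base-R sketch of the formula from the question. The price series is synthetic, and the helper name roc() is made up for illustration; it is not the TTR function.

```r
# Synthetic monthly closes (hypothetical data, not real S&P 500 values)
set.seed(1)
close <- 100 * cumprod(1 + rnorm(60, 0.005, 0.04))

# n-period rate of change in percent; the first n values are NA
roc <- function(x, n) {
  c(rep(NA_real_, n),
    (x[-seq_len(n)] / x[seq_len(length(x) - n)] - 1) * 100)
}

roc_sum <- roc(close, 11) + roc(close, 14)

# One-sided weighted MA: 10/55 on period t, 9/55 on t-1, ..., 1/55 on t-9
w <- (10:1) / 55
coppock <- as.numeric(stats::filter(roc_sum, w, sides = 1))
```

The first 23 values are NA: 14 months are needed for the longer ROC, then 9 more to fill the 10-month window.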

One option would be not to put the data into zoo format and then use the lag() function in dplyr instead. So:
library(dplyr)
sp500 = read.csv(file="/.../sp500.csv",header=TRUE)
sp500.v2 <- sp500 %>%
mutate(lag11 = lag(SP500, 11),
lag14 = lag(SP500, 14))
Are these data grouped somehow, like maybe by ticker symbol? If so, you could accommodate that like this:
sp500.v2 <- sp500 %>%
group_by([grouping variable name, no quotes]) %>%
mutate(lag11 = lag(SP500, 11),
lag14 = lag(SP500, 14))
And if the data aren't pre-sorted by date or you want to make sure that's done before you lag, you can use arrange() like so:
sp500.v2 <- sp500 %>%
group_by([grouping variable name, no quotes]) %>%
arrange([date variable]) %>%
mutate(lag11 = lag(SP500, 11),
lag14 = lag(SP500, 14))
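As a small illustration of how the lagged column feeds into a rate of change, here is a sketch on a tiny made-up data frame (the values and the 1-period lag are chosen only to keep the output short). It calls dplyr::lag() explicitly, because stats::lag() has the same name and behaves differently.

```r
library(dplyr)

# Hypothetical prices that grow by 10% per period
prices <- data.frame(
  date  = as.Date("2020-01-01") + 0:4,
  SP500 = c(100, 110, 121, 133.1, 146.41)
)

out <- prices %>%
  arrange(date) %>%
  mutate(lag1 = dplyr::lag(SP500, 1),            # previous row's value, NA in row 1
         roc1 = (SP500 - lag1) / lag1 * 100)     # percent change vs. previous row
```

Every roc1 value after the first row should come out at roughly 10.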

Related

create and plot a cumulative probability density function with custom bin # and sizes of stock price ROC in R

I want to import daily stock market price data into R from any ticker, and examine one historical time segment of it. Then, from this segment, convert these prices into daily ROC/rateofchange % changes. Next, take this ROC series and create a cumulative probability density function which allows me to set any custom number of sorting bins, and any size limit for each bin. example: 22 bins with .3% limit. Next, plot this CPDF as either a histogram or a scatterplot. The final step would be to do this for 2 different sections of the same stock and plot them next to each other for visual inspection. I have started a code on stock ticker SPY, but I cannot get it to work.
library(quantmod)
library(tidyquant)
library(tidyverse)
# using tidyverse to import a ticker
spy <- tq_get("spy")
spy010422 <- tq_get("spy", get ="stock.prices", from ='2022-01-04', to = '2022-01-24')
str(spy010422)
# getting ROC between prices in the series
spy010422.rtn = ROC(spy010422$close, n = 1, type = c("discrete"), na.pad = TRUE)
str(spy010422.rtn)
# trying to use ggplot and tibble to create an ECDF function
spy010422.rtn %>%
tibble() %>%
ggplot() +
stat_ecdf(aes(.))
# another attempt at running ECDF on the ROC series
spy010422.rtn %>%
ggplot(spy010422.rtn) +
stat_ecdf(aes(close))
# trying to set the number of bins and bin size for the ECDF
spy010422.rtn %>%
mutate(rounded = round(close/.3, 0) *.3,
bin = min_rank(rounded)) %>%
ggplot(aes(close, bin)) +
geom_line()
# next time segment of the ticker spy to compare this to
spy020222 <- tq_get("spy", get ="stock.prices", from ='2022-02-02', to = '2022-02-24')
I couldn't understand exactly what you wanted to plot. Normally a CPDF is just a continuous line and doesn't have bins to customise. Also, "plot this CPDF as either a histogram or a scatterplot" is a strange phrase to me, as one normally plots the histogram/scatterplot of the variable, not of the CPDF of the variable. Given that, I made a function that plots the histogram of the ROC of the ticker, and you can comment on whether that was what you wanted.
The function takes a list of dates in the format list(c(from1, to1), c(from2, to2), ...) (you can add as many intervals as you want), and loops over each interval in this list (with the purrr::map function). For each iteration, it creates the histogram, customizing the bins argument. After the loop, the graphs are bound together in one figure using the ggpubr::ggarrange function (you must run install.packages("ggpubr") if you don't have the package installed).
library(quantmod)
library(tidyquant)
library(tidyverse)
gg.roc.hist = function(ticker, dates, bins = 30){
map(dates, function(dates){ #loop for each interval in the 'dates' list
df = tq_get(ticker, get ="stock.prices", from = dates[1], to = dates[2]) #get the prices
df$roc = ROC(df$close, n = 1, type = c("discrete"), na.pad = TRUE) #add a column with the ROC
ggplot(df, aes(x = roc)) +
geom_histogram(bins = bins) + #create a histogram changing the bins
labs(title = paste0(dates[1], " to ", dates[2]))}) %>%
ggpubr::ggarrange(plotlist = .) #bind the graphs together
}
Running:
gg.roc.hist('spy', list(c('2022-01-04','2022-01-24'), c('2022-02-02', '2022-02-24')), 22)
Yields this graph:
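If binned cumulative probabilities are closer to what the question is after, one possible reading is an empirical CDF evaluated on a fixed grid of bins. The sketch below uses base R only, with simulated returns standing in for the ROC series; the 22 bins of width 0.3% are taken from the question, while centering the bins on zero and clamping outliers into the end bins are my assumptions.

```r
set.seed(42)
roc <- rnorm(250, mean = 0, sd = 0.01)  # simulated daily returns, not real SPY data

n_bins    <- 22
bin_width <- 0.003                      # 0.3% per bin

# n_bins + 1 break points centred on zero
breaks <- (seq_len(n_bins + 1) - 1 - n_bins / 2) * bin_width

# Clamp values outside the grid so they fall into the first or last bin
clamped <- pmin(pmax(roc, min(breaks)), max(breaks))

counts   <- table(cut(clamped, breaks = breaks, include.lowest = TRUE))
cum_prob <- cumsum(as.vector(counts)) / length(roc)
```

Plotting breaks[-1] against cum_prob (for example with plot(..., type = "s")) then gives a step-shaped cumulative curve, and two periods can be compared by computing cum_prob for each on the same breaks.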

Moving average on several time series using ggplot

Hi, I am trying desperately to plot several time series with a 12-month moving average.
Here is an example with two time series of flower and seeds densities. (I have much more time series to work on...)
#datasets
library(lubridate) # provides ymd()
taxon <- c(rep("Flower",36),rep("Seeds",36))
density <- c(seq(20, 228, length=36),seq(33, 259, length=36))
year <- rep(c(rep("2000",12),rep("2001",12),rep("2002",12)),2)
ymd <- c(rep(seq(ymd('2000-01-01'),ymd('2002-12-01'), by = 'months'),2))
#dataframe
df <- data.frame(taxon, density, year, ymd)
library(forecast)
#create function that does a Symmetric Weighted Moving Average (2x12) of the monthly log density of flowers and seeds
ma_12 <- function(x) {
ts_x <- ts(x, freq = 12, start = c(2000, 1), end = c(2002, 12)) # transform to time-series object as it is necessary to run the ma function
return(ma(log(ts_x + 1), order = 12, centre = T))
}
#trial of the function
ma_12(df[df$taxon=="Flower",]$density) #works well
library(ggplot2)
#Trying to plot flower and seeds log density as two time series
ggplot(df,aes(x=year,y=density,colour=factor(taxon),group=factor(taxon))) +
stat_summary(fun.y = ma_12, geom = "line") #or geom = "smooth"
#Warning message:
#Computation failed in `stat_summary()`:
#invalid time series parameters specified
Function ma_12 works correctly. The problem comes when I try to plot both time-series (Flower and Seed) using ggplot. I cannot define both taxa as different time series and apply a moving average on them. Seems that it has to do with "stat_summary"...
Any help would be more than welcome! Thanks in advance
Note: The following link is quite useful but does not directly help me, as I want to apply a specific function and plot the result according to the levels of a grouping variable. For now, I can't find a solution. Anyway, thank you for suggesting it.
Multiple time series in one plot
Is this what you need?
f <- ma_12(df[df$taxon=="Flower", ]$density)
s <- ma_12(df[df$taxon=="Seeds", ]$density)
f <- cbind(f,time(f))
s <- cbind(s,time(s))
serie <- data.frame(rbind(f,s),
taxon=c(rep("Flower", dim(f)[1]), rep("Seeds", dim(s)[1])))
serie$density <- exp(serie$f) - 1 # invert the log(x + 1) transform applied in ma_12
library(lubridate)
serie$time <- ymd(format(date_decimal(serie$time), "%Y-%m-%d"))
library(ggplot2)
ggplot() + geom_point(data=df, aes(x=ymd, y=density, color=taxon, group=taxon)) +
geom_line(data=serie, aes(x= time, y=density, color=taxon, group=taxon))
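With many taxa, repeating the subsetting by hand gets tedious. A possible alternative, sketched here in base R, is to compute the moving average per group with ave(). It relies on the fact that a centred 2x12 moving average is a 13-term filter with half weights at both ends; the helper name ma_2x12 and the column names are made up for this example.

```r
# Same toy data as in the question, rebuilt without extra packages
taxon   <- rep(c("Flower", "Seeds"), each = 36)
density <- c(seq(20, 228, length.out = 36), seq(33, 259, length.out = 36))
month   <- rep(seq(as.Date("2000-01-01"), by = "month", length.out = 36), 2)
df <- data.frame(taxon, density, month)

# Centred 2x12 MA weights: 1/24 at both ends, 1/12 in between
w <- c(0.5, rep(1, 11), 0.5) / 12
ma_2x12 <- function(x) as.numeric(stats::filter(log(x + 1), w, sides = 2))

# Apply the filter separately within each taxon
df$ma <- ave(df$density, df$taxon, FUN = ma_2x12)
```

The result can then go straight into ggplot with aes(month, ma, colour = taxon) and geom_line(); the first and last six months of each taxon are NA.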

ts.plot() not plotting Time Series data against custom x-axis

I am having issues with trying to plot some Time Series data; namely, trying to plot the date (increments in months) against a real number (which represents price).
I can plot the data with just plot(months, mydata) with no issue, but it's in a scatter plot format.
However, when I try the same with ts.plot, i.e. ts.plot(months, mydata), I get the following error:
Error in .cbind.ts(list(...), .makeNamesTs(...), dframe = dframe, union = TRUE) : no time series supplied
I tried to bypass this by doing ts.plot(ts(months, mydata)), but with this I get a straight line (which I know isn't correct).
I have made sure that both months and mydata have the same length.
EDIT: What I mean by custom x-axis
I need the data to be in monthly increments (specifically from 03/1998 to 02/2018) - so I ran the following in R:
d <- seq(as.Date("1998-03-01"), as.Date("2018-02-01"), "day")
months <- seq(min(d), max(d), "month")
Now that I have attained the monthly increments, I need the above variable, months, to act as the x-axis for the Time Series plot (perhaps more accurately, the time index).
With package zoo you can do the following.
library(zoo)
z <- zoo(mydata, order.by = months)
labs <- seq(min(index(z)), max(index(z)), length.out = 10)
plot(z, xaxt = "n")
axis(1, at = labs, labels = format(labs, "%m/%Y"))
Data creation code.
set.seed(1234)
d <- seq(as.Date("1998-03-01"), as.Date("2018-02-01"), "day")
months <- seq(min(d), max(d), "month")
n <- length(months)
mydata <- cumsum(rnorm(n))
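If you would rather stay in base R, two other routes are worth sketching: build a monthly ts object with start and frequency so that ts.plot has a time index to use, or keep the Date vector and ask plot() for a line. The data below is the same kind of simulated series as above; only the plotting calls are guarded so the script also runs non-interactively.

```r
set.seed(1234)
months <- seq(as.Date("1998-03-01"), as.Date("2018-02-01"), by = "month")
mydata <- cumsum(rnorm(length(months)))  # made-up prices

# Monthly ts object: start = c(year, month), 12 observations per year
x <- ts(mydata, start = c(1998, 3), frequency = 12)

if (interactive()) {
  ts.plot(x)                        # line plot with a year-based time axis
  plot(months, mydata, type = "l")  # same line, Date-based axis
}
```

Note that ts() takes only the values; the dates are implied by start and frequency, which is why ts(months, mydata) produced a straight line.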

starting a daily time series in R

I have a daily time series of the number of visitors to a web site. The series runs from 01/06/2014 until today, 14/10/2015, and I want to predict the number of visitors in the future. How can I read my series with R? I'm thinking:
series <- ts(visitors, frequency=365, start=c(2014, 6))
If so, after fitting my time series model with arimadata = auto.arima(), I want to predict the number of visitors for the next 60 days. How can I do this?
forecast(arimadata, h=..)
What should the value of h be?
Thanks in advance for your help.
The ts specification is wrong; if you are setting this up as daily observations, then you need to specify what day of the year 2014 is June 1st and specify this in start:
## Create a daily Date object - helps my work on dates
inds <- seq(as.Date("2014-06-01"), as.Date("2015-10-14"), by = "day")
## Create a time series object
set.seed(25)
myts <- ts(rnorm(length(inds)), # random data
start = c(2014, as.numeric(format(inds[1], "%j"))),
frequency = 365)
Note that I specify start as c(2014, as.numeric(format(inds[1], "%j"))). All the complicated bit is doing is working out what day of the year June 1st is:
> as.numeric(format(inds[1], "%j"))
[1] 152
Once you have this, you're effectively there:
library(forecast) # provides auto.arima() and forecast()
## use auto.arima to choose ARIMA terms
fit <- auto.arima(myts)
## forecast for next 60 time points
fore <- forecast(fit, h = 60)
## plot it
plot(fore)
That seems suitable given the random data I supplied...
You'll need to select appropriate arguments for auto.arima() as suits your data.
Note that the x-axis labels refer to 0.5 (half) of a year.
Doing this via zoo
This might be easier to do via a zoo object created using the zoo package:
library(zoo)
## create the zoo object as before
set.seed(25)
myzoo <- zoo(rnorm(length(inds)), inds)
Note you now don't need to specify any start or frequency info; just use inds computed earlier from the daily Date object.
Proceed as before
## use auto.arima to choose ARIMA terms
fit <- auto.arima(myzoo)
## forecast for next 60 time points
fore <- forecast(fit, h = 60)
The plot though will cause an issue as the x-axis is in days since the epoch (1970-01-01), so we need to suppress the auto plotting of this axis and then draw our own. This is easy as we have inds
## plot it
plot(fore, xaxt = "n") # no x-axis
Axis(inds, side = 1)
This only produces a couple of labeled ticks; if you want more control, tell R where you want the ticks and labels:
## plot it
plot(fore, xaxt = "n") # no x-axis
Axis(inds, side = 1,
at = seq(inds[1], tail(inds, 1) + 60, by = "3 months"),
format = "%b %Y")
Here we plot every 3 months.
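Since h counts time steps and the data are daily, h = 60 means 60 calendar days past the last observation. The following base-R sketch just does the date bookkeeping, so you can attach real dates to the forecast values; the variable names start_doy and future are made up here.

```r
inds <- seq(as.Date("2014-06-01"), as.Date("2015-10-14"), by = "day")

# Day of the year of the first observation, used as the second element of start
start_doy <- as.numeric(format(inds[1], "%j"))

# Calendar dates covered by a 60-step-ahead forecast
h <- 60
future <- seq(tail(inds, 1) + 1, by = "day", length.out = h)

range(future)
```

The forecast window therefore runs from 2015-10-15 to 2015-12-13.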
ts objects do not work well for daily time series. I suggest you use the zoo package instead.
library(zoo)
zoo(visitors, seq(from = as.Date("2014-06-01"), to = as.Date("2015-10-14"), by = 1))
Here's how I created a time series when I was given some daily observations with quite a few observations missing. @gavin-simpson's answer was a big help. Hopefully this saves someone some grief.
The original data looked something like this:
library(lubridate)
set.seed(42)
minday = as.Date("2001-01-01")
maxday = as.Date("2005-12-31")
dates <- seq(minday, maxday, "days")
dates <- dates[sample(1:length(dates),length(dates)/4)] # create some holes
df <- data.frame(date=sort(dates), val=sin(seq(from=0, to=2*pi, length=length(dates))))
To create a time-series with this data I created a 'dummy' dataframe with one row per date and merged that with the existing dataframe:
df <- merge(df, data.frame(date=seq(minday, maxday, "days")), all=T)
This dataframe can be cast into a timeseries. Missing dates are NA.
nts <- ts(df$val, frequency=365, start=c(year(minday), as.numeric(format(minday, "%j"))))
plot(nts)
series <- ts(visitors, frequency=365, start=c(2014, 152))
Here 152 is the day of the year of 01-06-2014; with frequency=365, the series starts at day 152 of 2014.
To forecast for 60 days, h=60.
forecast(arimadata , h=60)

Convert absolute values to ranges for charting in R

Warning: still new to R.
I'm trying to construct some charts (specifically, a bubble chart) in R that shows political donations to a campaign. The idea is that the x-axis will show the amount of contributions, the y-axis the number of contributions, and the area of the circles the total amount contributed at this level.
The data looks like this:
CTRIB_NAML CTRIB_NAMF CTRIB_AMT FILER_ID
John Smith $49 123456789
The FILER_ID field is used to filter the data for a particular candidate.
I've used the following functions to convert this data frame into a bubble chart (thanks to help here and here).
vals<-sort(unique(dfr$CTRIB_AMT))
sums<-tapply( dfr$CTRIB_AMT, dfr$CTRIB_AMT, sum)
counts<-tapply( dfr$CTRIB_AMT, dfr$CTRIB_AMT, length)
symbols(vals,counts, circles=sums, fg="white", bg="red", xlab="Amount of Contribution", ylab="Number of Contributions")
text(vals, counts, sums, cex=0.75)
However, this results in way too many intervals on the x-axis. There are several million records all told, and divided up for some candidates could still result in an overwhelming amount of data. How can I convert the absolute contributions into ranges? For instance, how can I group the vals into ranges, e.g., 0-10, 11-20, 21-30, etc.?
----EDIT----
Following comments, I can convert vals to numeric and then slice into intervals, but I'm not sure then how I combine that back into the bubble chart syntax.
new_vals <- as.numeric(as.character(sub("\\$","",vals)))
new_vals <- cut(new_vals,100)
But regraphing:
symbols(new_vals,counts, circles=sums)
Is nonsensical -- all the values line up at zero on the x-axis.
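Once the amounts are binned into a factor with cut, you can just use tapply again to find the counts and the sums for the new breaks. Note that the bins must be computed on the full numeric CTRIB_AMT column (not on the unique vals), so that both tapply arguments have the same length. For example:
amt = as.numeric(sub("\\$", "", as.character(dfr$CTRIB_AMT)))
bins = cut(amt, 100)
counts = tapply(amt, bins, length)
sums = tapply(amt, bins, sum)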
For this type of thing, though, you might find the plyr and ggplot2 packages helpful. Here is a complete reproducible example:
require(plyr) # provides ddply()
require(ggplot2)
# Options
n = 1000
breaks = 10
# Generate data
set.seed(12345)
CTRIB_NAML = replicate(n, paste(letters[sample(10)], collapse=''))
CTRIB_NAMF = replicate(n, paste(letters[sample(10)], collapse=''))
CTRIB_AMT = paste('$', round(runif(n, 0, 100), 2), sep='')
FILER_ID = replicate(10, paste(as.character((0:9)[sample(9)]), collapse=''))[sample(10, n, replace=T)]
dfr = data.frame(CTRIB_NAML, CTRIB_NAMF, CTRIB_AMT, FILER_ID)
# Format data
dfr$CTRIB_AMT = as.numeric(sub('\\$', '', dfr$CTRIB_AMT))
dfr$CTRIB_AMT_cut = cut(dfr$CTRIB_AMT, breaks)
# Summarize data for plotting
plot_data = ddply(dfr, 'CTRIB_AMT_cut', function(x) data.frame(count=nrow(x), total=sum(x$CTRIB_AMT)))
# Make plot
dev.new(width=4, height=4)
qplot(CTRIB_AMT_cut, count, data=plot_data, geom='point', size=total) + theme(axis.text.x=element_text(angle=90, hjust=1))
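If you want the fixed ranges mentioned in the question (0-10, 10-20, and so on) and prefer to stay with base graphics, something along these lines might work. The six amounts are invented, the $10 bin width and the $100 upper limit are assumptions, and the plotting call is guarded so the rest runs as a script.

```r
# Invented contribution amounts in the same "$" format as the question
amt_raw <- c("$5", "$49", "$12.50", "$49", "$95", "$7")
amt <- as.numeric(sub("\\$", "", amt_raw))

# Fixed $10-wide ranges from 0 to 100
breaks <- seq(0, 100, by = 10)
bins   <- cut(amt, breaks = breaks, include.lowest = TRUE)

counts <- as.vector(table(bins))              # contributions per range
sums   <- as.vector(tapply(amt, bins, sum))   # dollars per range
sums[is.na(sums)] <- 0                        # empty ranges contribute nothing

mids <- breaks[-1] - 5                        # midpoint of each range for the x-axis

if (interactive()) {
  # Radius proportional to sqrt(total) so that circle area tracks the total
  symbols(mids, counts, circles = sqrt(sums / pi), inches = 0.35,
          fg = "white", bg = "red",
          xlab = "Amount of Contribution", ylab = "Number of Contributions")
}
```

Using the range midpoints as numeric x values avoids the problem of all points lining up at zero, which happens when a factor is passed to symbols().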
