Evolution of linear regression coefficients over time - datetime

I would like to observe the evolution of linear regression coefficients over time. More precisely, take a time frame of 2 years in which each regression always uses one year of data. After the first regression, we move one week forward (i.e. we add a new week at the end and drop one from the beginning) and run the regression again, until we reach the final date: altogether, there will be 52 regressions.
My problem is that there are some holidays in the data set, so we cannot simply add 7 days as one might first suggest. I would like a wrapper function that does the above for many functions from different packages, for example forecast.lm() from the forecast package or any other function one can think of: the objective in every case is to follow the evolution of the linear regression parameters week by week.

I think you might get more answers if you edit/subdivide your question in a clear way. (1) how do I find holidays (it's not clear what your definition of holidays is)? (2) how do I slice up a data set accordingly? (3) how do I run a linear regression in each chunk?
(1) find holidays: can't really help here, as I don't know how they're defined/coded in your data set. library(sos); findFn("holiday") finds some options
(2) partition the data set according to inter-holiday/weekend intervals. The example below supposes holidays are coded as 1 and non-holidays are coded as zero.
(3) run the linear regression for each chunk and extract the coefficients.
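d <- data.frame(holiday=c(0,0,0,1,1,0,0,0,0,1,0,0,0,0),
                x=runif(14),y=runif(14))
## period index: increases by one each time a run of holidays ends
per <- cumsum(c(1,diff(d$holiday)==-1)) ## maybe use rle() instead
## drop the holiday rows, then split the remaining rows by period
dd <- with(d,split(subset(d,!holiday),per[!holiday]))
## fit y~x within each chunk; one row of coefficients per chunk
t(sapply(lapply(dd,lm,formula=y~x),coef))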
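The week-by-week moving window from the question can also be sketched directly on calendar dates, so that missing days (holidays, weekends) simply contribute no rows instead of breaking a fixed "add 7 rows" rule. This is only a minimal sketch: the data frame, its column names (date, x, y) and the helper roll_fit() are hypothetical, and it assumes the fitting function accepts a formula and a data argument and has a coef() method.

```r
set.seed(1)

# Hypothetical daily data over two years; weekends are dropped to mimic
# days that are missing from the data set (holidays, non-trading days).
all_days <- seq(as.Date("2020-01-01"), as.Date("2021-12-31"), by = "day")
dates <- all_days[!(as.POSIXlt(all_days)$wday %in% c(0, 6))]
d <- data.frame(date = dates, x = runif(length(dates)))
d$y <- 1 + 2 * d$x + rnorm(nrow(d), sd = 0.1)

# roll_fit() is a hypothetical helper, not part of any package.
# It applies `fit_fun` to each one-year window, moving the window start
# forward by one calendar week, and returns one row of coefficients per window.
roll_fit <- function(data, fit_fun, formula, window_days = 365, step = "week") {
  starts <- seq(min(data$date), max(data$date) - window_days, by = step)
  out <- lapply(starts, function(s) {
    # the window is defined by dates, so missing days are skipped automatically
    chunk <- data[data$date >= s & data$date < s + window_days, ]
    coef(fit_fun(formula, data = chunk))
  })
  data.frame(start = starts, do.call(rbind, out), check.names = FALSE)
}

coefs <- roll_fit(d, lm, y ~ x)
head(coefs)
```

Because the window is selected by date rather than by row count, the same wrapper should work for other model functions with a formula/data interface, under the assumptions stated above.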

Related

How to create a rolling linear regression in R?

I am trying to create (as the title suggests) a rolling linear regression on a set of data: daily returns of two variables, 257 observations each, linked together by date. I want the rolling window to be 100 observations. I have searched for rolling regression packages but have not found one that works on my data. The two series are stored in one data frame.
Also, I am pretty new to programming, so any advice would help.
Some of the code I have used is below.
WeightedIMV_VIX_returns_combined_ID100896 <- left_join(ID100896_WeightedIMV_daily_returns, ID100896_VIX_daily_returns, by=c("Date"))
head(WeightedIMV_VIX_returns_combined_ID100896, n=20)
plot(WeightedIMV_returns ~ VIX_returns, data = WeightedIMV_VIX_returns_combined_ID100896) # the data seem correlated enough to run a regression; it doesn't matter which variable you put first
ID100896.lm <- lm(WeightedIMV_returns ~ VIX_returns, data = WeightedIMV_VIX_returns_combined_ID100896)
summary(ID100896.lm) #so the estimate Intercept is 1.2370, estimate Slope is 5.8266.
termplot(ID100896.lm)
Again, sorry if this code is poor, or if I am missing any information that some of you may need to help. This is my first time on here! Just let me know what I can do better. Thanks!
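A rolling regression of the kind described above can be sketched in base R without any extra package. This is a minimal sketch on simulated data: the data frame `returns` is a hypothetical stand-in for the joined data frame in the question (same column names, made-up values), and the window width of 100 is taken from the question.

```r
set.seed(123)
n <- 257      # number of observations, as in the question
width <- 100  # rolling window width, as in the question

# Hypothetical stand-in for the joined data frame of daily returns
returns <- data.frame(
  Date = seq(as.Date("2020-01-01"), by = "day", length.out = n),
  VIX_returns = rnorm(n)
)
returns$WeightedIMV_returns <- 1.2 + 5.8 * returns$VIX_returns + rnorm(n)

# One regression per window: rows i .. i + width - 1
starts <- seq_len(n - width + 1)
roll_coefs <- t(sapply(starts, function(i) {
  rows <- i:(i + width - 1)
  coef(lm(WeightedIMV_returns ~ VIX_returns, data = returns[rows, ]))
}))

# Label each row by the last date in its window
roll_coefs <- data.frame(end_date = returns$Date[starts + width - 1],
                         roll_coefs, check.names = FALSE)

# Slope over time
plot(roll_coefs$end_date, roll_coefs$VIX_returns, type = "l",
     xlab = "Window end date", ylab = "Slope on VIX_returns")
```

With 257 observations and a width of 100 this gives 158 windows, one row of intercept and slope per window.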

Account for different variable lengths after differencing in a linear model in R

Struggling with R.
My linear model (lm()) contains variables that are differenced via diff() and variables that are not differenced. The differenced variables are one observation shorter due to differencing.
Therefore, lm() gives an error message about differing variable lengths.
My idea for a solution is to somehow define the variables as time series (which they are anyway, but R doesn't know that) and then tell lm() exactly which years to use (the data are yearly).
As I understand it, a time series loses its first observation after differencing, so when I use the ts() function I should set the starting year one year later for the differenced variable.
More concrete:
Let's say I imported the variables x and y.
Then I run
dx<-diff(x, lag=1, differences=1)
while y remains the same
lm(y~dx)
will then produce the said error.
Let's say x and y both start in 1900. Then dx starts in 1901, so lm() has to start in 1901 for all variables.
My idea is, as stated above, to explicitly make both variables time series
tsdx<-ts(dx, frequency=1, start=1901)
tsy<-ts(y, frequency=1, start=1900)
and then somehow tell lm() to start with the year 1901.
Is this a good way to deal with this problem? And how would I code the last step? Thanks a lot!
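The alignment step asked about above can be sketched in two ways in base R. This is a minimal sketch on simulated data (x and y are made up here); the first option simply drops the first observation of y, the second uses ts.intersect() from the stats package to keep only the years both series share.

```r
set.seed(1)

# Made-up yearly data standing in for the imported x and y
x <- cumsum(rnorm(50))
y <- 0.5 * c(0, diff(x)) + rnorm(50, sd = 0.1)

dx <- diff(x, lag = 1, differences = 1)  # one observation shorter than x

# Option 1: drop the first observation of y so both have the same length
fit1 <- lm(y[-1] ~ dx)

# Option 2: declare both as time series and let ts.intersect() align them
tsy  <- ts(y,  frequency = 1, start = 1900)
tsdx <- ts(dx, frequency = 1, start = 1901)
aligned <- ts.intersect(tsy, tsdx)       # keeps only the overlapping years
fit2 <- lm(tsy ~ tsdx, data = as.data.frame(aligned))

coef(fit1)
coef(fit2)
```

Both fits use the same 49 overlapping observations, so the estimated coefficients agree; the ts.intersect() route has the advantage that the alignment follows from the declared start years rather than from manual indexing.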

Generated non-stationary Data and auto.arima in R

I generated my own fictional sales data in order to run a time series analysis.
It is supposed to represent a growing company, so I included a trend. However, I read through some tutorials and often came across the claim that non-stationary time series should not be forecast with the auto.arima function.
But I get results that make sense, and if I difference the data (which I did as well) the output doesn't make much sense.
So here is my question: can I use the auto.arima function with my data, which obviously has a trend?
Best regards and thanks in advance,
Francisco
eps <- rnorm(100, 30, 20)
trend <- seq(1, 100, 1)
trend <- 3 * trend
Sales <- trend + eps
timeframe<-seq(as.Date("2008/9/1"),by="month",length.out=100)
Data<-data.frame(Sales,timeframe)
plot(Data$timeframe,Data$Sales)
ts=ts(t(Data[,1]))
plot(ts[1,],type='o',col="black")
library(forecast) # provides rwf(), auto.arima(), forecast(), Acf() and ndiffs()
md=rwf(ts[1,],h=12,drift=T,level=c(80,95))
auto.arima(ts[1,])
Using the forecast function allows us to plot the expected sales for the next year: plot(forecast(auto.arima(ts[1,]),h=12))
Using the forecast function with our automated ARIMA can help us plan for the next quarter
forecast(auto.arima(ts[1,]),h=4)
plot(forecast(auto.arima(ts[1,])))
Another way would be to use the autoplot function
fc<-forecast(ts[1,])
autoplot(fc)
The next step is to analyze our time series. I run the ADF test, whose null hypothesis is that the data is non-stationary.
So at the usual 5% level, a p-value greater than 0.05 means we cannot reject non-stationarity.
library(tseries)
adf=adf.test(ts[1,])
adf
The output suggests that the data is non-stationary:
acf
acf=Acf(ts[1,])
Acf(ts[1,])
The autocorrelation decreases almost steadily, which also points to non-stationary data. A kpss.test should confirm this, since its null hypothesis (stationarity) is the opposite of the ADF test's.
So here we expect a p-value smaller than 0.05
kpss=kpss.test(ts[1,])
kpss
We receive a p-value of 0.01, which is further evidence that the data has a trend
ndiffs(ts[1,])
diff.data=diff(ts[1,])
auto.arima(diff.data)
plot(forecast(diff.data))
To answer your question - yes, you can use the auto.arima() function in the forecast package on non-stationary data.
If you look at the help file for auto.arima() (by typing ?auto.arima) you will see that it explains that you can choose to specify the "d" parameter - this is the order of differencing - first order means you difference the data once, second order means you difference the data twice etc. You can also choose not to specify this parameter and in this case, the auto.arima() function will determine the appropriate order of differencing using the "kpss" test. There are other unit root tests such as the Augmented Dickey-Fuller which you can choose to use in the auto.arima function by setting test="adf". It really depends on your preference.
You can refer to page 11 and subsequent pages for more information on the auto.arima function here:
https://cran.r-project.org/web/packages/forecast/forecast.pdf
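The options described in the answer can be sketched as follows. This is a minimal sketch on simulated trending data similar to the question's (the series `sales` is made up); it assumes the forecast package is installed and uses its auto.arima(), ndiffs() and arimaorder() functions.

```r
library(forecast)

set.seed(1)
# Made-up trending series, similar in spirit to the question's data
sales <- ts(3 * (1:100) + rnorm(100, mean = 30, sd = 20))

# Let auto.arima() choose the order of differencing (KPSS test by default)
fit_default <- auto.arima(sales)

# Same, but choose d with the Augmented Dickey-Fuller test instead
fit_adf <- auto.arima(sales, test = "adf")

# Or fix the order of differencing yourself
fit_d1 <- auto.arima(sales, d = 1)

# Number of differences suggested by the KPSS test alone
ndiffs(sales, test = "kpss")

# Compare the selected (p, d, q) orders
arimaorder(fit_default)
arimaorder(fit_adf)
arimaorder(fit_d1)
```

In each case the undifferenced series is passed in; any differencing happens inside the fitted model, so forecasts come back on the original scale.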

Trying to do a simulation in R

I'm pretty new to R, so I hope you can help me!
I'm trying to do a simulation for my Bachelor's thesis, where I want to simulate how a stock evolves.
I've done the simulation in Excel, but the problem is that I can't make that large of a simulation, as the program crashes! Therefore I'm trying in R.
The stock evolves as follows (everything except $\epsilon$ consists of constants which are known):
$$W_{t+\Delta t} = W_t e^{r \Delta t}\left(1+\pi\left(\exp\left((\sigma \lambda -0.5\sigma^2) \Delta t+\sigma \epsilon_{t+\Delta t} \sqrt{\Delta t}-1\right)\right)\right)$$
The only stochastic term here is $\epsilon$, the Brownian-motion shock, which is N(0,1) distributed.
What I've done in Excel:
Made 100 samples of size 40. All these samples are drawn from the standard normal distribution, N(0,1).
These outcomes are then used to calculate how the stock is affected by them (the normal draws represent shocks from the economy).
My problem in R:
I've used the sample function:
x <- sample(norm(0,1), 1000, T)
So I have 1000 samples, which are normally distributed. Now I don't know how to put these results into the formula I have for the evolution of my stock. Can anyone help?
Using R for (discrete) simulation
There are two aspects to your question: conceptual and coding.
Let's deal with the conceptual first, starting with the meaning of your equation:
1. Conceptual issues
The first thing to note is that your evolution equation is continuous in time, so running your simulation as described above means accepting a discretisation of the problem. Whether or not that is appropriate depends on your model and how you have obtained the evolution equation.
If you do run a discrete simulation, then the key decision you have to make is what stepsize $\Delta t$ you will use. You can explore different step-sizes to observe the effect of step-size, or you can proceed analytically and attempt to derive an appropriate step-size.
Once you have your step-size, your simulation consists of pulling new shocks (samples of your standard normal distribution), and evolving the equation iteratively until the desired time has elapsed. The final state $W_t$ is then available for you to analyse however you wish. (If you retain all of the $W_t$, you have a distribution of the trajectory of the system as well, which you can analyse.)
So:
your $x$ are a sample from the distribution of your shocks, i.e. they are $\epsilon_{t=0}$, or $\epsilon_0$ for short.
To simulate the evolution of the $W_t$, you will need some initial condition $W_0$. What this is depends on what you're modelling. If you're modelling the likely values of a single stock starting at an initial price $W_0$, then your initial state is a 1000 element vector with constant value.
Now evaluate your equation, plugging in all your constants, $W_0$, and your initial shocks $\epsilon_0 = x$ to get the distribution of prices $W_1$.
Repeat: sample $x$ again -- this is now $\epsilon_1$. Plugging this in, gives you $W_2$ etc.
2. Coding the simulation (simple example)
One of the useful features of R is that most operators work element-wise over vectors.
So you can type in your equation more or less as it is.
I've made a few assumptions about the parameters in your equation, and I've ignored the $\pi$ function -- you can add that in later.
So you end up with code that looks something like this:
dt <- 0.5 # step-size
r <- 1 # parameters
lambda <- 1
sigma <- 1 # std deviation
w0 <- rep(1,1000) # presumed initial condition -- prices start at 1
# Show an example iteration -- incorporate into one line for production code...
x <- rnorm(1000,mean=0,sd=1) # random shock
w1 <- w0*exp(r*dt)*(1+exp((sigma*lambda-0.5*sigma^2)*dt +
sigma*x*sqrt(dt) -1)) # evolution
When you're ready to let the simulation run, then merge the last two lines, i.e. include the sampling statement in the evolution statement. You then get one line of code which you can run manually or embed into a loop, along with any other analysis you want to run.
# General simulation step (w holds the current state; start from w <- w0)
w <- w*exp(r*dt)*(1+exp((sigma*lambda-0.5*sigma^2)*dt +
sigma*rnorm(1000,mean=0,sd=1)*sqrt(dt) -1))
You can also easily visualise the changes and obtain summary statistics (5-number summary):
hist(w)
summary(w)
Of course, you'll still need to work through the details of what you actually want to model and how you want to go about analysing it --- and you've got the $\pi$ function to deal with --- but this should get you started toward using R for discrete simulation.
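Putting the single step into a loop might look like the sketch below. It is only an illustration under assumptions: all parameter values are made up, the $\pi$ term is still ignored as in the step above, and the 40 steps and 1000 paths are chosen to echo the numbers in the question. Each column of the matrix is one simulated path, so the whole trajectory is retained.

```r
set.seed(42)

n_paths <- 1000  # number of simulated paths (assumed)
n_steps <- 40    # number of time steps per path (assumed)

# Assumed parameter values, for illustration only
dt     <- 0.25   # step size
r      <- 0.02
lambda <- 0.3
sigma  <- 0.2

# Rows are time points, columns are paths; every path starts at 1
w <- matrix(NA_real_, nrow = n_steps + 1, ncol = n_paths)
w[1, ] <- 1

for (i in seq_len(n_steps)) {
  eps <- rnorm(n_paths, mean = 0, sd = 1)  # fresh shocks for this step
  w[i + 1, ] <- w[i, ] * exp(r * dt) *
    (1 + exp((sigma * lambda - 0.5 * sigma^2) * dt +
             sigma * eps * sqrt(dt) - 1))
}

# Distribution of the final state across paths
summary(w[n_steps + 1, ])
hist(w[n_steps + 1, ], main = "Final state", xlab = "w")

# A few individual trajectories
matplot(w[, 1:10], type = "l", lty = 1, xlab = "Step", ylab = "w")
```

Vectorising over paths and looping only over time keeps the loop short, which is usually fast enough for simulations of this size.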

Is it possible to arrange a time series in such a way that a specific autocorrelation is created?

I have a file containing 2,500 random numbers. Is it possible to rearrange these saved numbers in such a way that a specific autocorrelation is created? Let's say an autocorrelation at lag 1 of 0.2, an autocorrelation at lag 2 of 0.4, etc.
Any help is greatly appreciated!
To be more specific:
The time series of a daily return in percent of an asset has the following characteristics that I am trying to recreate:
Leptokurtic, symmetric distribution, let's say centered at a daily return of zero
No significant autocorrelations (because the sign of a daily return is not predictable)
Significant autocorrelations if the time series is squared
The aim is to produce a random time series which satisfies all these three characteristics. The only two inputs should be the leptokurtic distribution (this I have already created) and the specific autocorrelation of the squared resulting time series (e.g. the final squared time series should have an autocorrelation at lag 1 of 0.2).
I only know how to produce random numbers from my own mixed distribution. Naturally, if I squared the resulting time series, there would be no autocorrelation. I would like to find a way that takes this into account.
Generally the most straightforward way to create autocorrelated data is to generate the data so that it's autocorrelated. For example, you could create an autocorrelated path by always using the value at p-1 as the mean for the random draw at time period p.
Rearranging is not only hard, but sort of odd conceptually. What are you really trying to do in the end? Giving some context might allow better answers.
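The "use the previous value as the mean" idea can be sketched as below. This is a minimal sketch with one assumption added: the previous value is scaled by a hypothetical coefficient phi before being used as the mean, because using it unscaled (phi = 1) gives a random walk rather than a stationary series. With phi = 0.2 the lag-1 autocorrelation should come out close to 0.2.

```r
set.seed(2024)

n   <- 2500  # series length, as in the question
phi <- 0.2   # assumed scaling of the previous value (target lag-1 autocorrelation)

x <- numeric(n)
x[1] <- rnorm(1)
for (p in 2:n) {
  # the mean of the draw at period p depends on the value at p - 1
  x[p] <- rnorm(1, mean = phi * x[p - 1], sd = 1)
}

# Sample autocorrelations at lags 1 and 2
rho <- acf(x, lag.max = 2, plot = FALSE)$acf[2:3]
rho
```

Note that this construction fixes the whole autocorrelation function through phi (lag 2 is roughly phi squared), and it creates autocorrelation in the series itself, not in the squared series.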
There are functions for simulating correlated data. arima.sim() from stats package and simulate.Arima() from the forecast package.
simulate.Arima() has the advantages that (1) it can simulate seasonal ARIMA models (sometimes called "SARIMA") and (2) it can simulate a continuation of an existing time series to which you have already fit an ARIMA model. To use simulate.Arima(), you do need to already have an Arima object.
UPDATE:
Type ?arima.sim, then scroll down to "Examples".
Alternatively:
install.packages("forecast")
library(forecast)
fit <- auto.arima(USAccDeaths)
plot(USAccDeaths,xlim=c(1973,1982))
lines(simulate(fit, 36),col="red")
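The arima.sim() route mentioned above can be sketched for the target values from the question (autocorrelation of 0.2 at lag 1 and 0.4 at lag 2). This is a minimal sketch assuming an AR(2) model; the two coefficients are obtained from the Yule-Walker relations $\rho_1 = \phi_1/(1-\phi_2)$ and $\rho_2 = \phi_1\rho_1 + \phi_2$, which give $\phi_1 = 0.125$ and $\phi_2 = 0.375$.

```r
set.seed(7)

# AR(2) coefficients solved from the Yule-Walker relations for
# target autocorrelations rho1 = 0.2 and rho2 = 0.4
phi <- c(0.125, 0.375)

sim <- arima.sim(model = list(ar = phi), n = 2500)

# Sample autocorrelations at lags 1 and 2 (should be near 0.2 and 0.4)
rho <- acf(sim, lag.max = 2, plot = FALSE)$acf[2:3]
rho
```

This targets the autocorrelation of the series itself with Gaussian innovations; matching autocorrelation only in the squared series, as described in the question, would need a different model for the variance.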
