Difftime Error using Looping Regressions in R

With the code below I get the error Error in Ops.difftime((f - mean(f)), 2) : '^' not defined for "difftime" objects.
This error only occurs when I include the line r_sq[[counter-lookback]] <- summary(temp_lm)$r.squared near the end of the loop. I cannot find a solution to a similar error online. Thank you for your help.
#Import necessary packages
require(quantmod)
require(ggplot2)
#Measure time used in processing
ptm <- proc.time()
#########
#Write in the ticker symbols of the pair
tickers <- c("GS","JPM")
########
#Pull data down for symbols
A <- getSymbols(tickers[1],auto.assign=FALSE)
B <- getSymbols(tickers[2],auto.assign=FALSE)
#Strip data such as high and low prices
A <- A[,4]
B <- B[,4]
#Create data frame of both price series
AB_DF <- data.frame(A,B)
#Create a time series of the spread & rename header
S <- A-B
colnames(S) <- "Spread.Close"
#Separate the index of times from the spread data for regression
TS <- index(S)
SP <- coredata(S)
#Perform regressions of past 'lookback' days of the spread, incrementing by 1, beginning at T = lookback+1
########
# Change below variable to alter length of data in regression
lookback <- 250
#######
#Initialize a counter, and lists to hold data from the spread regressions
counter <- lookback+1
res_store <- list()
spread_coef <- list()
r_sq <- list()
while (counter<length(SP)) {
  temp_lm <- lm(TS[(counter-lookback):counter]~SP[(counter-lookback):counter])
  res_store[[counter-lookback]] <- residuals(temp_lm)
  spread_coef[[counter-lookback]] <- coefficients(temp_lm)[[2]]
  r_sq[[counter-lookback]] <- summary(temp_lm)$r.squared
  counter <- counter+1
}

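OK all, I have figured it out. The issue is that TS, the Date index, is the response in lm(), so the fitted values keep a date-based class. summary() computes R^2 from sum((f - mean(f))^2), where f holds the fitted values; subtracting dates yields a difftime object, and '^' is not defined for difftime. That is why only the r.squared line fails, while residuals() and coefficients() work. I solved this by changing the index from time values to a standard integer index, and everything ran fine.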
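A minimal base-R sketch of that fix follows. It assumes a simulated spread in place of the GS/JPM prices (so it runs offline) and regresses the spread on an integer time index, which is what the comment about regressing "past 'lookback' days of the spread" suggests; swap the formula sides if time really should be the response, since the integer index avoids the error either way. Looping up to and including the last observation (the original while (counter<length(SP)) stops one short) is also a choice of this sketch.

```r
# Simulated spread standing in for A - B (hypothetical data, not real prices)
set.seed(1)
spread <- cumsum(rnorm(300))

lookback <- 250
n_fits <- length(spread) - lookback   # one regression per window end point
spread_coef <- numeric(n_fits)
r_sq <- numeric(n_fits)

for (k in seq_len(n_fits)) {
  rows <- k:(k + lookback)            # lookback + 1 observations, as in the question
  t_idx <- seq_along(rows)            # plain integer time index instead of Dates
  temp_lm <- lm(spread[rows] ~ t_idx)
  spread_coef[k] <- coef(temp_lm)[[2]]
  r_sq[k] <- summary(temp_lm)$r.squared
}
```

If the actual dates are needed later, keep TS alongside the results and look them up with TS[rows].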

Related

Convert dataframe to mids using mice in R

I have used mice to impute data, saved the data as CSV, then ran a factor analysis in SPSS and generated some factors. I now want to load the CSV in R and run a linear regression on the imputed data. However, when I try to convert the data frame to a mids object, I get an error. Here is a reproducible example:
library(mice)
# assign mtcars to a new dataframe
df <- mtcars
# loop 10 times
for (x in 1:10){
# create a fake imp number
a <- rep(x, 1, nrow(df))
# bind the fake imp number to the df
df2 <- cbind(df, a)
# create a 10-fold version of mtcars that also has the fake imp number
if (x ==1){
new_df <- df2
} else{
new_df <- rbind(new_df, df2)
}
}
# change the column name of the fake imp to ".imp"
names(new_df)[names(new_df) == 'a'] <- '.imp'
# convert df to mids
df_imp <- as.mids(new_df, .imp = .imp)
Error in as.mids(df) : Original data not found. Use `complete(..., action = 'long', include = TRUE)` to save original data.
Can you please help me with this error?
From the as.mids() documentation.
This function converts imputed data stored in long format into an object of class mids. The original incomplete dataset needs to be available so that we know where the missing data are. The function is useful to convert back operations applied to the imputed data back in a mids object. It may also be used to store multiply imputed data sets from other software into the format used by mice.
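In the long format, the incomplete (original) data are stored as imputation 0, so that as.mids() knows where the missing values are. Starting your loop at 0 instead of 1 therefore resolves the issue. (Also, the .imp argument of as.mids() takes a column name as a string, so write .imp = '.imp' with quotes. Since '.imp' is the default, you can also just omit it. Or skip the renaming step and supply "a" as the imputation variable, .imp = 'a'.)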
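library(mice)
df <- mtcars
# imputation 0 holds the original (incomplete) data; 1:10 are the imputations
for (x in 0:10){
  a <- rep(x, nrow(df))   # imputation number for every row
  df2 <- cbind(df, a)
  if (x == 0){
    new_df <- df2
  } else{
    new_df <- rbind(new_df, df2)
  }
}
names(new_df)[names(new_df) == 'a'] <- '.imp'
df_imp <- as.mids(new_df)   # .imp = ".imp" is the default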
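For reference, the same stacked data frame can be built without the loop in base R. Every "imputation" here is just an unchanged copy of mtcars (no imputation is performed), and the .id row-identifier column is an optional extra assumed for illustration; the as.mids() call is left as a comment so the sketch runs without mice installed.

```r
m <- 10                               # number of imputations
df <- mtcars
# Imputation 0 is the original (incomplete) data; 1..m are the imputed copies
long <- do.call(rbind, lapply(0:m, function(i)
  data.frame(.imp = i, .id = seq_len(nrow(df)), df)))
rownames(long) <- NULL
# With mice installed: df_imp <- mice::as.mids(long)
```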

How to apply `lapply` function to a set of time series data set

I have time-series data containing four univariate time series (columns) over several years. I would like to fit an ARIMA model to each univariate series for each of the first 4 years.
I tried this:
library(ggplot2)
library(forecast)
library(tseries)
library(zoo)
library(dplyr)
library(data.table)
data("EuStockMarkets")
dat <- EuStockMarkets
res <- lapply(split(as.zoo(EuStockMarkets), as.integer(time(EuStockMarkets))), as.ts)
datNew <- lapply(5:8, function(i) ts(res[[i]]))
dat.log <- lapply(1:4, function(i) log(datNew[[i]]))
dat.diff <- lapply(1:4, function(i)diff(dat.log[[i]]))
Logreturns <- dat.diff
The following code does not work properly: it gives me only 4 models, but I expect 16 (4 years, with 4 univariate time series in each year).
Arima.model <- lapply(1:4, function(i)auto.arima(Logreturns[[i]][,i]))
I think what you need is a nested lapply: one to iterate over the list elements (years) and another to iterate over the columns of each element.
result <- unlist(lapply(seq_along(Logreturns), function(i)
lapply(seq_len(ncol(Logreturns[[i]])), function(j)
auto.arima(Logreturns[[i]][,j]))), recursive = FALSE)
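The nested pattern can be tried with base R alone. This sketch swaps forecast::auto.arima for stats::arima with a fixed AR(1) order (an arbitrary assumption, not a model choice from the question) and splits EuStockMarkets by year in a slightly different way, so it runs without forecast, zoo or tseries.

```r
# EuStockMarkets ships with base R (datasets package): DAX, SMI, CAC, FTSE
years <- as.integer(time(EuStockMarkets))
by_year <- split(as.data.frame(EuStockMarkets), years)  # one data frame per year
# Log returns for elements 5:8, as in the question (a matrix per year)
logret <- lapply(by_year[5:8], function(d) diff(log(as.matrix(d))))

# Outer lapply over years, inner lapply over columns, then flatten one level
fits <- unlist(lapply(logret, function(m)
  lapply(seq_len(ncol(m)), function(j) arima(m[, j], order = c(1, 0, 0)))),
  recursive = FALSE)
length(fits)  # 16
```

recursive = FALSE matters: it removes only the outer list level, so each of the 16 elements is still a complete model object.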

Data extraction in for loop using r

I am trying to compute and compare the daily gain/loss percentage of two stocks in R. This is my code:
library(quantmod)
stockData <- new.env() #Make a new environment for quantmod to store data in
tickers <- c("AAPL","GOOG")
#Set start date
start_date=as.Date("2014-01-01")
getSymbols(tickers, src="yahoo", env=stockData,from=start_date)
for (tick in tickers) {
x <- get(tick, pos=stockData) # get data from stockData environment
x$gl<-((Cl(x)-Op(x))/Op(x))*100 #Daily gain loss percentage
}
I can calculate the daily gain/loss percentage for individual stocks, but I don't know how to proceed from there to extract and compare the gain/loss percentages of multiple stocks.
For example:
if AAPL's gain/loss percentage is greater than GOOG's, then 1, else -1.
Your code is a good starting point. However, I would suggest that you store the data of the tickers in a list. As the code is now, x is overwritten on every iteration, so only the data of the last ticker treated in the loop remains in x.
This slightly modified version might help:
library(quantmod)
stockData <- new.env() #Make a new environment for quantmod to store data in
tickers <- c("AAPL","GOOG","YHOO","FB")
#Set start date
start_date <- as.Date("2014-01-01")
getSymbols(tickers, src="yahoo", env=stockData, from=start_date)
x <- list()
for (i in seq_along(tickers)) {
  x[[i]] <- get(tickers[i], pos=stockData) # get data from stockData environment
  x[[i]]$gl <- ((Cl(x[[i]])-Op(x[[i]]))/Op(x[[i]]))*100 #Daily gain loss percentage
}
compare_pl <- function(x,y){ifelse(x$gl > y$gl, 1, -1)}
aapl_vs_goog <- compare_pl(x[[1]],x[[2]])
Now the variable aapl_vs_goog contains the data on the days where AAPL outperformed GOOG (+1) or vice versa (-1):
> tail(aapl_vs_goog)
# gl
#2015-08-19 -1
#2015-08-20 1
#2015-08-21 1
#2015-08-24 1
#2015-08-25 -1
#2015-08-26 -1
The same comparison can be performed for any other pair of tickers in x.
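The list-plus-ifelse pattern can be checked offline with a small base-R sketch. The prices below are made up and stand in for the Op()/Cl() columns that quantmod returns; keeping the list named by ticker is a choice of this sketch rather than part of the answer.

```r
# Hypothetical open/close prices for two tickers over 4 days (made-up numbers)
prices <- list(
  AAPL = data.frame(Open = c(100, 102, 101, 105), Close = c(102, 101, 104, 104)),
  GOOG = data.frame(Open = c(500, 505, 510, 508), Close = c(505, 506, 515, 512))
)
# Daily gain/loss percentage per ticker, kept in a list named by ticker
gl <- lapply(prices, function(p) (p$Close - p$Open) / p$Open * 100)

compare_pl <- function(a, b) ifelse(a > b, 1, -1)
aapl_vs_goog <- compare_pl(gl$AAPL, gl$GOOG)
aapl_vs_goog  # 1 -1 1 -1
```

In the answer's loop, adding names(x) <- tickers after the loop would likewise let you write x$AAPL instead of x[[1]].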

Prepend xts rows to a subset

Suppose I need to apply an MA(5) to a batch of market data stored in an xts object. I can easily pull the subset of data I want smoothed with xts subsetting:
x['2013-12-05 17:00:01/2013-12-06 17:00:00']
However, I need an additional 5 observations prior to the first one in my subset to "prime" the filter. Is there an easy way to do this?
The only thing I have been able to figure out is really ugly, with explicit row numbers (here using xts sample data):
require(xts)
data(sample_matrix)
x <- as.xts(sample_matrix)
x$rn <- row(x[,1])
frst <- first(x['2007-05-18'])$rn
finl <- last(x['2007-06-09'])$rn
ans <- x[(frst-5):finl,]
Can I just say bleah? Somebody help me.
UPDATE: by popular request, a short example that applies an MA(5) to the daily data in sample_matrix:
require(xts)
data(sample_matrix)
x <- as.xts(sample_matrix)$Close
calc_weights <- function(x) {
  ##replace rnorm with sophisticated analysis
  wgts <- matrix(rnorm(5,0,0.5), nrow=1)
  xts(wgts, index(last(x)))
}
smooth_days <- function(x, wgts) {
  w <- wgts[index(last(x))]
  out <- stats::filter(x, w, sides=1)  # one-sided linear filter with weights w
  xts(out, index(x))
}
set.seed(1.23456789)
wgts <- apply.weekly(x, calc_weights)
lapply(split(x, f='weeks'), smooth_days, wgts)
For brevity, only the final week's output:
[[26]]
[,1]
2007-06-25 NA
2007-06-26 NA
2007-06-27 NA
2007-06-28 NA
2007-06-29 -9.581503
2007-06-30 -9.581208
The NAs here are my problem. I want to recalculate my weights for each week of data, and apply those new weights to the upcoming week. Rinse, repeat. In real life, I replace the lapply with some ugly stuff with row indexes, but I'm sure there's a better way.
To define the problem clearly: I want to run the analysis on non-overlapping time periods (weeks, in this case), but each calculation requires overlapping periods of data (2 weeks, in this case).
Here's one way to do this using endpoints and a for loop. You could still use the which.i=TRUE suggestion in my comment, but integer subsetting is faster.
y <- x*NA                       # pre-allocate result
ep <- endpoints(x,"weeks")      # time points where parameters change
set.seed(1.23456789)
for(i in seq_along(ep)[-(1:2)]) {
  rng1 <- (ep[i-1]+1):ep[i]     # current week: obs to calc weights
  rng2 <- (ep[i-2]+1):ep[i]     # previous + current week: "prime" obs
  wgts <- calc_weights(x[rng1])
  # calc smooth_days on rng2, but only keep rng1 results
  y[rng1] <- smooth_days(x[rng2], wgts)[index(x[rng1])]
}
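The priming idea itself does not depend on xts. Here it is on a plain numeric vector in base R, with hypothetical data rather than sample_matrix: find the integer positions of the wanted range, prepend k - 1 earlier positions (a one-sided MA(5) needs 4 prior observations), filter, then keep only the wanted tail.

```r
# Hypothetical daily series (not the xts sample_matrix data)
set.seed(42)
dates <- seq(as.Date("2007-01-02"), by = "day", length.out = 60)
close <- 50 + cumsum(rnorm(60))

k <- 5                                     # MA(5) window length
wanted <- which(dates >= as.Date("2007-02-01") & dates <= as.Date("2007-02-10"))
# Prepend k - 1 earlier rows so the one-sided filter is fully "primed"
primed <- max(1, wanted[1] - (k - 1)):wanted[length(wanted)]
ma <- stats::filter(close[primed], rep(1 / k, k), sides = 1)
# Drop the priming rows again, keeping only the requested subset
ma_subset <- tail(as.numeric(ma), length(wanted))
```

This is the same integer-subsetting trick the loop above uses with ep[i-2], reduced to a single window.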

Adding Values In a Stochastic Simulation using R

With help from other users, I've managed to extract time in discrete form (1, 2, 3, ..., 50) from each simulation. However, there are no values for the interval 20-21 and some later intervals. Is there a way to add these values myself? If there is no reading for a time interval, the readings stay the same until the next interval. The code I used is below:
library(GillespieSSA)
parms <- c(beta=0.591,sigma=1/8,gamma=1/7)
x0 <- c(S=50,E=0,I=1,R=0)
a <- c("beta*S*I","sigma*E","gamma*I")
nu <- matrix(c(-1,0,0, 1,-1,0, 0,1,-1, 0,0,1),nrow=4,byrow=TRUE)
set.seed(12345)
out <- lapply(X=1:1,FUN=function(x)
ssa(x0,a,nu,parms,tf=50)$data)
out
a <- as.data.frame(out)
idx <- diff(ceiling(a$V1)) == 1
a[idx,]
## change ==1 to >0
idx <- diff(ceiling(a$V1)) > 0
## get discrete time series
discrete.data <- a[idx,]
## get the last time step value
end.time <- ceiling(tail(discrete.data$V1,1))
## create an empty data frame with all time steps
new.df <- data.frame(t=0:end.time, S=0, E=0, I=0, R=0)
## replace only those time steps that have valid values
new.df[new.df$t %in% ceiling(discrete.data$V1),2:5] <- discrete.data[,2:5]
If necessary, the missing values (the zeros left in new.df) can be replaced by NA, depending on how you want to handle them.
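Since the question says readings stay the same until the next interval, carrying the last observation forward may fit better than zeros or NA. A base-R sketch using findInterval(), on made-up readings standing in for discrete.data (the column names S, E, I, R follow the question; the numbers are hypothetical):

```r
# Hypothetical discrete readings with gaps at t = 3 and t = 4 (made-up numbers)
obs <- data.frame(t = c(0, 1, 2, 5, 6),
                  S = c(50, 49, 47, 44, 44), E = c(0, 1, 2, 2, 1),
                  I = c(1, 1, 2, 4, 5),      R = c(0, 0, 0, 1, 1))
# For each whole time step, find the most recent reading at or before it
last_row <- findInterval(0:max(obs$t), obs$t)
vals <- obs[last_row, c("S", "E", "I", "R")]
rownames(vals) <- NULL
full <- data.frame(t = 0:max(obs$t), vals)
```

Time steps 3 and 4 repeat the reading from t = 2, which is the "same until the next interval" rule from the question.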
