R subsetting by date range - r

This seems simple enough, and I've been through all the similar questions and applied their answers, but I'm getting either nothing or everything.
I'm trying to look at water temperatures (WTEMP) for a specific date range (SAMPLE_DATE), 2007-06-01 to 2007-09-30, from a data frame (allconmon).
Here is my code so far:
bydate<-subset(allconmon, allconmon$SAMPLE_DATE > as.Date("2007-06-01") & allconmon$SAMPLE_DATE < as.Date("2007-09-30"))
I've also tried this, but I get an error:
bydate2<- as.xts(allconmon$WTEMP,order.by=allconmon$SAMPLE_DATE)
bydate2['2007-06-01/2007-09-30']
Error in xts(x, order.by = order.by, frequency = frequency, .CLASS = "double", :
order.by requires an appropriate time-based object
I'm not sure what I'm doing wrong here; it seems to work for other people.

I highly recommend using the zoo package in R when dealing with time series data.
The operation you describe is what the window function does in zoo.
Here is the example from ?window:
Examples
window(presidents, 1960, c(1969,4)) # values in the 1960's
window(presidents, deltat = 1) # All Qtr1s
window(presidents, start = c(1945,3), deltat = 1) # All Qtr3s
window(presidents, 1944, c(1979,2), extend = TRUE)
pres <- window(presidents, 1945, c(1949,4)) # values in the 1940's
window(pres, 1945.25, 1945.50) <- c(60, 70)
window(pres, 1944, 1944.75) <- 0 # will generate a warning
window(pres, c(1945,4), c(1949,4), frequency = 1) <- 85:89
pres
Here is a list of papers from JSS demonstrating the usage of the zoo package and also how to reshape your data, which I found very inspiring.

I figured it out, on multiple levels. First off, I didn't notice that R had done something funky with my sample date label when I imported from the text file (probably my fault).
Here is a small sample of the data set. It has 5,573,301 observations of 30 variables.
Notice the funky symbol in front of SAMPLE_DATE; I'm not sure why R did that.
ï..SAMPLE_DATE SampleTime STATION SONDE Layer TOTAL_DEPTH TOTAL_DEPTH_A BATT BATT_A WTEMP WTEMP_A SPCOND SPCOND_A SALINITY SALINITY_A DO_SAT DO_SAT_A
However, here is what I did (I changed the name to x, as allconmon was a bit excessive):
x <- read.csv(file = "C:/Users/Desktop/cmon2001-08.txt",quote = "",header = TRUE,sep = "\t", na.strings = c("","NULL"))
library(chron)
x$month <- months(as.Date(x$ï..SAMPLE_DATE, "%Y-%m-%d"))
x$year <- substr(as.character(x$ï..SAMPLE_DATE), 1, 4)
y <- x[x$month == 'June' | x$month == 'July' | x$month == 'August' | x$month == 'September' ,]
So now I was able to subset all my data by those four months, and then later by year, station, and water temperature.
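For the date range originally asked about, here is a minimal base R sketch. It assumes SAMPLE_DATE arrives as text in %Y-%m-%d form; the small data frame is made up for illustration and stands in for allconmon. As an aside, the `ï..` prefix is typically what a UTF-8 byte order mark at the start of a file turns into when it is read as part of the first header, and passing fileEncoding = "UTF-8-BOM" to read.csv usually avoids it.

```r
# Hypothetical stand-in for allconmon (the real data has 30 columns)
allconmon <- data.frame(
  SAMPLE_DATE = c("2007-05-31", "2007-06-01", "2007-07-15",
                  "2007-09-30", "2007-10-01"),
  WTEMP = c(18.2, 19.0, 26.4, 22.1, 20.3),
  stringsAsFactors = FALSE
)

# Convert the text column to Date once, so comparisons are made on dates
allconmon$SAMPLE_DATE <- as.Date(allconmon$SAMPLE_DATE, format = "%Y-%m-%d")

# Inclusive range: >= and <= keep both end dates
start <- as.Date("2007-06-01")
end <- as.Date("2007-09-30")
in_range <- allconmon$SAMPLE_DATE >= start & allconmon$SAMPLE_DATE <= end
bydate <- allconmon[in_range, ]
```

Note that the strict > and < in the original attempt would drop rows dated exactly 2007-06-01 or 2007-09-30.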

Related

Recursive / Expanding Window forecasts

I am having a small issue with my RStudio code. I will try to replicate my code, but unfortunately there is no easy data for me to show. This is about the forecast package. What I am looking for is somewhat simpler than what is in the manual, but unfortunately I am not able to work around it.
The issue is with an expanding window forecast. I have a dependent variable Y and 3 regressors (X). I am trying to build a recursive one-step-ahead forecast for each X.
Here is my code:
library(forecast)
library(zoo)
library(timeDate)
library(xts)
## Load data
data = Dataset[,2:ncol(Dataset)]
st <- as.Date("1990-1-1")
en <- as.Date("2020-12-1")
tt <- seq(st, en, by = "1 month")
data = xts(data, order.by=tt)
##########################################################################
RECFORECAST=function (Y,X,h,window){
st <- as.Date("1990-1-1")
en <- as.Date("2020-12-1")
tt <- seq(st, en, by = "1 month")
datas= cbind(Y,X)
newfcast= matrix(0,nrow(datas),h)
for (k in 1:nrow(datas)){
sample =datas[1:(window+k-1),]
# print(sample)
v= window+k
# print(v)
# fit = Arima(sample[,1], order=c(0,0,0),xreg=sample[,2])
fit = lm(sample[,1]~sample[,2], data = sample)
# fcast=forecast(fit,xreg=rep(sample[v,2],h))$mean
fcast = forecast.lm(fit,sample[v,2],h=1)$mean
print(fcast)
# print(fcast)
# newfcast[k+window+1,]=fcast
}
print(newfcast)
return(newfcast)
}
## Code to send the loop into forecasts
StoreMatrix = data$growth ## This is the first column data[,1]
for (i in 2:4)
{
try({
X=data[,i]
Y=data[,1]
RecModel=RECFORECAST(Y,X,h=1,window=60) ##Here the initial window is 60 obs
StoreMatrix=cbind(StoreMatrix,RecModel)
print(StoreMatrix)
}, silent=T)
}
The commented-out lines (#) were different ways I tried to cross-check my data, and they may not be useful. I have tried so many things, but I don't seem to be able to get my head around it. In the end I want to have a matrix (StoreMatrix) with the first column being the realization, and each of the other columns holding the corresponding one-step-ahead forecast.
The main lines where there seems to be an issue are these:
# fcast=forecast(fit,xreg=rep(sample[v,2],h))$mean
fcast = forecast.lm(fit,sample[v,2],h=1)$mean
Not sure how to solve this. Thank you very much.
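One way to approach the expanding window is sketched below in base R only, using lm and predict rather than forecast.lm. It is a sketch under assumptions, not a drop-in fix: the data are simulated, the names dat and rec_forecast are made up, a plain data frame is used instead of xts, and the regressor value for the forecast period is assumed to be known. The formula refers to column names (y ~ x) so that predict can look up the regressor in newdata, and the loop stops at n - window so that the row being forecast always exists.

```r
set.seed(1)
n <- 100
x <- rnorm(n)
y <- 0.5 * x + rnorm(n, sd = 0.1)
dat <- data.frame(y = y, x = x)   # simulated data, for illustration only

rec_forecast <- function(dat, window) {
  n <- nrow(dat)
  fcast <- rep(NA_real_, n)        # NA wherever no forecast is made
  for (k in seq_len(n - window)) {
    last <- window + k - 1         # last row of the estimation sample
    fit <- lm(y ~ x, data = dat[1:last, ])
    # one-step-ahead prediction for the row right after the sample
    fcast[last + 1] <- predict(fit, newdata = dat[last + 1, , drop = FALSE])
  }
  fcast
}

fc <- rec_forecast(dat, window = 60)
result <- cbind(actual = dat$y, forecast = fc)
```

Calling rec_forecast once per regressor and binding the results with cbind would give a matrix laid out like the StoreMatrix described above.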

addTA - Error in naCheck(x, n) : Series contains non-leading NAs

I recently tried to create my own technical indicator, a simple golden cross indicator (50-day minus 200-day EMA), to be added to my chartSeries chart. This worked fine with the code below at first, but after an updated version of quantmod was released it gives me an error message.
Code (stock data is downloaded through the getSymbols function in quantmod):
#20dayEMA - 50dayEMA Technical indicator, Price and Volume
newEMA <- function(x){(removeNA(EMA(p[,6],n=50)-(EMA(p[,6],n=200))))
}
emaTA <- newTA(newEMA)
emaTA(col='lightgoldenrod3', 'Price')
Then it gives me this error message:
Error in naCheck(x, n) : Series contains non-leading NAs
Does anyone know how to remove these non-leading NAs?
You can use na.omit and there is no need to convert to an xts-object as this is the default.
library(quantmod)
getSymbols("VELO.CO")
p <- na.omit(VELO.CO)
newEMA <- function(x) {
EMA(p[,6], n = 20) - (EMA(p[,6], n = 50))
}
emaTA <- newTA(newEMA)
barChart(VELO.CO)
emaTA(col = "lightgoldenrod3", "Price")
I'm not familiar with the quantmod package, but I played around with your code and I think I found a working solution:
library("quantmod")
getSymbols("VELO.CO")
p <- as.xts(c(VELO.CO))
# remove incomplete cases
vec <- which(!complete.cases(p)) # rows 2305 2398
p2 <- p[-vec, ]
newEMA <- function(x) {
EMA(p2[, 6], n = 20) - (EMA(p2[, 6], n = 50))
}
emaTA <- newTA(newEMA)
barChart(VELO.CO)
emaTA(col = "lightgoldenrod3", "Price")
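To see what "non-leading" means in that error, here is a small base R sketch on a made-up vector. Leading NAs come before the first observed value; any NA after that point is non-leading. The variable names are illustrative only.

```r
# Made-up price vector: two leading NAs and one NA in the middle
price <- c(NA, NA, 10.1, 10.4, NA, 10.9, 11.2)

# Position of the first observed value
first_obs <- which(!is.na(price))[1]

# NAs that occur after the first observed value are the non-leading ones
non_leading_na <- which(is.na(price) & seq_along(price) > first_obs)

# Dropping every NA, as na.omit() does, removes both kinds
clean <- price[!is.na(price)]
```

Here non_leading_na picks out position 5, which is the kind of gap both answers above remove before calling EMA.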

Error in if ((location <= 1) | (location >= length(x)) - R - Eventstudies

I am trying my best at a simple event study in R, with some data retrieved from the Wharton Research Data Service (WRDS). I am not completely new to R, but I would describe my expertise level as intermediate. So, here is the problem. I am using the eventstudies package and one of the steps is converting the physical dates to event time frame dates with the phys2eventtime(..) function. This function takes multiple arguments:
z: time series data for which the event frame is to be generated, in the form of an xts object.
events: a data frame with two columns, unit and when. unit holds the column name of the response to be measured on the event date, while when holds the event date.
width: the number of days on each side of the event date. For a given width, if there is any NA in the event window, the last observation is carried forward.
The authors of the package have provided an example for the xts object (StockPriceReturns) and for Events (SplitDates). This looks like the following:
> data(StockPriceReturns)
> data(SplitDates)
> head(SplitDates)
            unit       when
5           BHEL 2011-10-03
6  Bharti.Airtel 2009-07-24
8          Cipla 2004-05-11
9     Coal.India 2010-02-16
10      Dr.Reddy 2001-10-10
11     HDFC.Bank 2011-07-14
> head(StockPriceReturns)
           Mahindra.&.Mahindra
2000-04-03          -8.3381609
2000-04-04           0.5923550
2000-04-05           6.8097616
2000-04-06          -0.9448889
2000-04-07           7.6843828
2000-04-10           4.1220462
2000-04-11          -1.9078480
2000-04-12          -8.3286900
2000-04-13          -3.8876847
2000-04-17          -8.2886060
So I have constructed my data in the same way, an xts object (DS_xts) and a data.frame (cDS) with the columns "unit" and "when". This is how it looks:
> head(DS_xts)
               61241
2011-01-03  0.024247
2011-01-04  0.039307
2011-01-05  0.010589
2011-01-06 -0.022172
2011-01-07  0.018057
2011-01-10  0.041488
> head(cDS)
   unit       when
1 11754 2012-01-05
2 10104 2012-01-24
3 61241 2012-01-31
4 13928 2012-02-07
5 14656 2012-02-08
6 60097 2012-02-14
These are similar in my opinion, but how it looks does not tell the whole story. I am quite certain that my problem is in how I have constructed these two objects. Below is my R code:
#install.packages("eventstudies")
library("eventstudies")
DS = read.csv("ReturnData.csv")
cDS = read.csv("EventData.csv")
#Calculate Abnormal Returns
DS$AR = DS$RET - DS$VWRETD
#Clean up and let only necessary columns remain
DS = DS[, c("PERMNO", "DATE", "AR")]
cDS = cDS[, c("PERMNO", "DATE")]
#Generate correct date format according to R's as.Date
for (i in 1:nrow(DS)) {
DS$DATE[i] = format(as.Date(toString(DS$DATE[i]), format = "%Y %m %d"), format = "%Y-%m-%d")
}
for (i in 1:nrow(cDS)) {
cDS$DATE[i] = format(as.Date(toString(cDS$DATE[i]), format = "%Y %m %d"), format = "%Y-%m-%d")
}
#Rename cDS columns according to phys2eventtime format
colnames(cDS)[1] = "unit"
colnames(cDS)[2] = "when"
#Create list of unique PERMNO's
PERMNO <- unique(DS$PERMNO)
for (i in 1:length(PERMNO)) {
#Subset based on PERMNO
DStmp <- DS[DS$PERMNO == PERMNO[i], ]
#Remove PERMNO column and rename AR to PERMNO
DStmp <- DStmp[, c("DATE", "AR")]
colnames(DStmp)[2] = as.character(PERMNO[i])
dates <- as.Date(DStmp$DATE)
DStmp <- DStmp[, -c(1)]
#Create a temporary XTS object
DStmp_xts <- xts(DStmp, order.by = dates)
#If first iteration, just create new variable, otherwise merge
if (i == 1) {
DS_xts <- DStmp_xts
} else {
DS_xts <- merge(DS_xts, DStmp_xts, all = TRUE)
}
}
#Renaming columns for matching
colnames(DS_xts) <- c(PERMNO)
#Making sure classes are the same
cDS$unit <- as.character(cDS$unit)
eventList <- phys2eventtime(z = DS_xts, events = cDS, width = 10)
So, if I run phys2eventtime(..) it returns:
> eventList <- phys2eventtime(z = DS_xts, events = cDS, width = 10)
Error in if ((location <= 1) | (location >= length(x))) { :
missing value where TRUE/FALSE needed
In addition: Warning message:
In findInterval(when, index(x)) : NAs introduced by coercion
I have looked at the original function (it is available at their GitHub, can't use more than two links yet) to figure out this error, but I ran out of ideas how to debug it. I hope someone can help me sort it out. As a final note, I have also looked at another (magnificent) answer related to this R package (question: "format a zoo object with “dimnames”=List of 2"), but it wasn't enough to help me solve it (or I couldn't yet comprehend it).
Here is the link for the two CSV files if you would like to reproduce my error (or solve it!).
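One possible cause, going by the warning "NAs introduced by coercion" from findInterval: if the when column holds character strings rather than Date values, findInterval cannot turn them into numbers and returns NA, and the subsequent if () then fails on that NA. The base R sketch below reproduces the behaviour on made-up dates; it is a guess at the mechanism, not a confirmed diagnosis of the data linked above.

```r
# Made-up index dates, standing in for index(DS_xts)
idx <- as.Date(c("2012-01-03", "2012-01-04", "2012-01-05", "2012-01-06"))
when_chr <- "2012-01-05"

# A character date cannot be coerced to a number, so the lookup gives NA
loc_chr <- suppressWarnings(findInterval(when_chr, idx))

# Converting to Date first gives a usable position
loc_date <- findInterval(as.Date(when_chr), idx)

# The same conversion applied to a whole (hypothetical) events data frame
cDS <- data.frame(unit = c("11754", "61241"),
                  when = c("2012-01-05", "2012-01-31"),
                  stringsAsFactors = FALSE)
cDS$when <- as.Date(cDS$when)
```

as.Date is vectorised, so it can also replace the two row-by-row for loops that reformat DATE.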

Using ifelse to create a running tally in R

I am trying to do some quantitative modeling in R. I'm not getting an error message, but the results are not what I actually need.
I am a newbie, but here is my complete code sample.
library(quantmod)
#Building the data frame and xts to show dividends, splits and technical indicators
getSymbols(c("AMZN"))
Playground <- data.frame(AMZN)
Playground$date <- as.Date(row.names(Playground))
Playground$wday <- as.POSIXlt(Playground$date)$wday #day of the week
Playground$yday <- as.POSIXlt(Playground$date)$mday #day of the month
Playground$mon <- as.POSIXlt(Playground$date)$mon #month of the year
Playground$RSI <- RSI(Playground$AMZN.Adjusted, n = 5, maType="EMA") #can add Moving Average Type with maType =
Playground$MACD <- MACD(AMZN, nFast = 12, nSlow = 26, nSig = 9)
Playground$Div <- getDividends('AMZN', from = "2007-01-01", to = Sys.Date(), src = "google", auto.assign = FALSE)
Playground$Split <- getSplits('AMZN', from = "2007-01-01", to = Sys.Date(), src = "google", auto.assign = FALSE)
Playground$BuySignal <- ifelse(Playground$RSI < 30 & Playground$MACD < 0, "Buy", "Hold")
All is well up until this point when I start using some logical conditions to come up with decision points.
Playground$boughts <- ifelse(Playground$BuySignal == "Buy", lag(Playground$boughts) + 1000, lag(Playground$boughts))
It will execute but the result will be nothing but NA. I suppose this is because you are trying to add NA to a number, but I'm not 100% sure. How do you tell the computer I want you to keep a running tally of how much you have bought?
Thanks so much for the help.
So we want to buy 1000 shares every time a buy signal is generated?
Your problem stems from the MACD indicator. It actually generates two columns, macd and signal. You have to decide which one you want to keep.
Playground$MACD <- MACD(AMZN, nFast = 12, nSlow = 26, nSig = 9)$signal
This should solve the problem at hand.
Also, please check the reference for ifelse. The class of return value can be tricky at times, and so the approach suggested by Floo0 is preferable.
Also, I'd advocate using 1 and 0 instead of buy and sell to show whether you are holding. It makes the math much easier.
And I'd strongly suggest reading a beginner tutorial on backtesting with PerformanceAnalytics. It makes the going much, much easier.
BTW, you missed this line in the code:
Playground$boughts<- 0
Hope it helps.
EDIT: And I forgot to mention the obvious. Discard the first few rows, where MACD will be NA.
Something like:
Playground<- Playground[-c(1:26),]
Whenever you want an ifelse of the form "if ..., do something; else stay the same", do not use ifelse.
Try this instead:
ind <- which(Playground$BuySignal == "Buy")
Playground$boughts[ind] <- lag(Playground$boughts) + 1000
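For the running tally itself, cumsum is another option, since each row's total is the previous total plus 1000 on buy days. This is a minimal base R sketch on a made-up signal vector; signal, is_buy and boughts are illustrative names, and NA signals are assumed to count as "no purchase".

```r
# Made-up signal column, including an NA such as early MACD rows produce
signal <- c("Hold", "Buy", "Hold", "Buy", "Buy", NA, "Hold")

# Treat NA as "not a buy" so it does not propagate into the tally
is_buy <- !is.na(signal) & signal == "Buy"

# Add 1000 shares on every buy day and carry the total forward
boughts <- cumsum(ifelse(is_buy, 1000, 0))
```

Because cumsum carries the total forward on its own, no lag() or initial boughts column is needed.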

How can I apply factors to different subsets of a larger time series using a custom function?

I'm measuring a physiological variable with a millisecond timestamp on a number of patients. For each patient I want to apply a factor to a subset of the timestamped rows describing their posture at that exact moment.
I've tried creating the following function, which works fine when describing the first posture. When trying to apply the next "posture-factor," the previously registered posture is deleted.
TestPatient <- data.frame(Time=seq(c(ISOdatetime(2011,12,22,12,00,00)), by = "sec", length.out = 100),Value=rnorm(100, 9, 3))
patientpositionslice <- function(patient,positiontype,timestart,timestop) {
patient$Position[
format(patient$Time, "%Y-%m-%d %H:%M:%S") >= timestart &
format(patient$Time, "%Y-%m-%d %H:%M:%S") < timestop] <- positiontype
patient
}
TestPatientNew <- patientpositionslice(TestPatient,"Horizontal","2011-12-22 12:00:05","2011-12-22 12:00:10")
TestPatientNew <- patientpositionslice(TestPatient,"Vertical","2011-12-22 12:00:15","2011-12-22 12:00:20")
How do I modify the function so I can apply it repeatedly on the same patient with different postures such as "Horizontal", "Vertical", "Sitting" etc.?
Here's your solution. Probably there are more elegant ways but this is mine ;)
TestPatient <- data.frame(Time=seq(c(ISOdatetime(2011,12,22,12,00,00)), by = "sec", length.out = 100),Value=rnorm(100, 9, 3))
#Included column with position
TestPatient$position <- NA
patientpositionslice <- function(patient,positiontype,timestart,timestop) {
#changed the test to ifelse() function
new<-ifelse(
format(patient$Time, "%Y-%m-%d %H:%M:%S") >= timestart &
format(patient$Time, "%Y-%m-%d %H:%M:%S") < timestop , positiontype, patient$position)
patient$position <- new
patient
}
TestPatientNew <- patientpositionslice(TestPatient,"Horizontal","2011-12-22 12:00:05","2011-12-22 12:00:10")
#For repeated insertion use the previous object
TestPatientNew <- patientpositionslice(TestPatientNew ,"Vertical","2011-12-22 12:00:15","2011-12-22 12:00:20")
I commented the changes. I hope this is what you wanted; if not, just correct me.
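A variation on the same idea, sketched in base R: compare the timestamps as POSIXct values instead of formatted strings, and assign only to the matching rows so earlier postures are left alone. The function name set_position and the fixed "UTC" time zone are assumptions made for this example.

```r
# Small example patient; tz = "UTC" is an assumption for reproducibility
patient <- data.frame(
  Time = seq(ISOdatetime(2011, 12, 22, 12, 0, 0, tz = "UTC"),
             by = "sec", length.out = 30)
)
patient$Position <- NA_character_

set_position <- function(patient, positiontype, timestart, timestop) {
  start <- as.POSIXct(timestart, tz = "UTC")
  end <- as.POSIXct(timestop, tz = "UTC")
  # rows inside the half-open interval [start, end)
  hit <- patient$Time >= start & patient$Time < end
  patient$Position[hit] <- positiontype   # other rows keep their value
  patient
}

# Feed the result of each call into the next one
patient <- set_position(patient, "Horizontal",
                        "2011-12-22 12:00:05", "2011-12-22 12:00:10")
patient <- set_position(patient, "Vertical",
                        "2011-12-22 12:00:15", "2011-12-22 12:00:20")
```

As in the answer above, the key point is passing the already updated object into each subsequent call.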

Resources