What is the difference the zoo object and ts object in R? - r

I want to know the differences into use ts() or zoo() function.

A zoo object has the time values (possibly irregular) in an index attribute displayed like a row name at the console by the print.zoo method and the values in a matrix or atomic vector which places constraints on the values that can be used (generally numeric, but necessarily all of a single mode, i.e. not as a list with multiple modes like a dataframe might hold). With pkg:zoo loaded, to get a list of functions that have zoo-methods:
library(zoo)
methods(class="zoo")
The yrmon- class is added to allow monthly date indices. you can see the range of methods:
methods(class="yearmon")
The xts-class is an important extension to the zoo methods but an additional package is needed. There are many worked examples of zoo and xts functions on SO.
A ts-object has values of a single mode with attributes that always imply regular observations and those attributes support a recurring cycle such as years and months. Rather than storing the index item by item or row by row, the index is calculated on the fly using 'start', 'end' and 'frequency' values stored as attributes and accessible with functions by those names. The list of functions for ts-objects is distinctly small (and most people find them more difficult to work with):
methods(class="ts")
There was also an its-package for irregular time series, but it was distinctly less popular than the zoo-package and has apparently been abandoned.

Related

Properly define the members and invariants of a class in R

We have an R package for a certain purpose. The basic data structure is a correlation function which is a real/complex valued function for a smallish (100) number (T) of time slices. We have multiple measurements (N) of it, so at its core it is a N×T matrix. But then there are more things that it can become:
One can bootstrap it with R samples such that it becomes an R×T matrix. However we want to keep the original data, so there is a field for the R×T matrix and another for the N×T matrix.
It can be symmetrized which will cut T in half and also alter various other functions that work with those objects.
Also it can be shifted which takes the difference between consecutive elements and therefore drops one time slice. The first column in the matrix then corresponds to t = 1 and not t = 0 any more, which becomes important in fits to the data.
Correlation functions may have an imaginary part, this is stored as a second real matrix. But they might not.
When doing non-linear operations with the data, we do that once with the average of the original data and each bootstrap sample. If the result is another correlation function, that object will not have “original data” but only the average.
So basically we have a class that can have various fields and only the average of the original data is really common.
To make things worse, there is no formal documentation for the possible members and the invariants associated with them. Coming from C++ where a concise class definition allows me do encapsulation, The S3 class system in R seems like an invitation for inconsistencies.
This surfaced a few times when some function taking such a correlation function as argument and expected some field to be present while it was not. The code is riddled with lines that just add another field to the class when performing an operation.
Long story short: Is there some automatically enforcable way in the S3 class system to have an exhaustive list of all the fields that a class can have? Right now I only see the possibility to document (in English) in the constructor function and just hope nobody missed a line where fields were added.

Backtest over specific dates in Quanstrat R

How do I backtest over specific dates, for example 2008::2010 in Quanstrat?
I want to load symbols from 2001::2017, but i only want to back test over a subset of dates. (rather than reload the symbols every time for specific date ranges)
There is no built-in way to do this in quantstrat. In fact, there is a comment at the beginning of the apply* functions that says:
#TODO add Date subsetting
(patches welcome)
There are a number of possible ways to do this with the existing code though.
Probably the simplest way is to load all your market data into an environment, and then subset your market data into the .GlobalEnv before each call to applyStrategy.
Indicators and signals should use vectorized functions, and should take (at most) seconds to apply to the entire series. So the simplest thing is probably to run applyIndicators and applySignals manually over the entire series, and then call applyRules with just the subset you want.
You could also add a signal function that does understand subsets. This signal function would be last in the strategy specification, and would filter all your other signals to 0 outside of your preferred date range.

Summarizing attributes across sequences in a single sequence object?

I'm using TraMineR to analyze sets of sequences. Each coherent set of sequences may contain 100 work processes from a single project for a single period of time. Using TraMineR I can easily calculate descriptive statistics for each sequence, however I'm more interested in descriptive statistics of the sequence object itself - subsuming all the smaller sequences within.
For example, to get state frequencies, I run:
seqstatd(sequences.sts)
However, this gives me the state frequencies for each sequence within my sequence object. I want to access the frequencies of states across all sequences inside of my sequence object. How can I accomplish this?
I am not sure to understand your question since seqstatd() returns the cross-sectional frequencies at each successive position, and NOT the state frequencies for each sequence. The latter is returned by seqistatd().
Assuming you refer to the outcome of seqistatd() you would get the mean time spent in each state with seqmeant(sequence.sts).
For other summaries you can use the apply function. For instance, you get the variance of the time spent in each state with
tab <- seqistatd(mvad.seq)
vart <- apply(tab,2,var)
head(vart)
Hope this helps.

Can an xts object be created from a list?

I would like to apply the xts class to a list.
y <- list(1, 2, 3)
tm <- Sys.time() + 1:3
require(xts)
xts(x = y, order.by = tm)
## Error in coredata.xts(x) : currently unsupported data type
Fair enough, is it fairly straightforward to extend this so that I can make this work for my own (extension of list) class? Do I write methods for coredata, index and xts that apply to my own class or do I need to first add similar methods for list?
I couldn't find anything in the documentation or vignettes on this, but I'm probably missing something obvious.
Primarily I would like to create a simple class based on a recursive vector, and then apply the xts tools of index and [ to that. The extraction tools allow indexing by time interval with simple character strings, i.e. ["2013-05-31 10"] means the interval between 10:00:00 and 10:59:59 on that day and this is the feature I'd like to get for free.
An xts object is (in essence) a numerical matrix plus an index attribute.
The constraints are hence a) to have a numeric matrix (which you know how to create from a list) and b) to have a POSIXt object for the index.
If you are beholden to lists, keep your data as a list of ... xts objects.
Exploring the source code shows that this is really not possible without substantial work (as Joshua says in the comment above).
The code that provides the general support for input types is in C in xts, so that alone makes it extra effort to apply this outside of atomic vectors, matrices and data.frames.
The analogous code in zoo is pure R so that could work a little more easily, but I wanted the support that allows indexing by time interval with simple character strings, i.e. ["2013-05-31 10"] means the interval between 10:00:00 and 10:59:59 on that day.
The best options I can see are
Poach or recreate the code for time interval indexing and apply to the new class
Create an object containing xts and define methods to propagate the support to the recursive list component. (There are examples of this in an overall S4 context, e.g. in spacetime.)

Index xts using string and return only observations at that exact time

I have an xts time series in R and am using the very handy function to subset the time series based on a string, for example
time_series["17/06/2006 12:00:00"]
This will return the nearest observation to that date/time - which is very handy in many situations. However, in this particular situation I only want to return the elements of the time series which are at that exact time. Is there a way to do this in xts using a nice date/time string like this?
In a more general case (I don't have this problem immediately now, but suspect I may run into it soon) - is it possible to extract the closest observation within a certain period of time? For example, the closest observation to the given date/time, assuming it is within 10 minutes of the given date/time - otherwise just discard that observation.
I suspect this more general case may require me writing a function to do this - which I am happy to do - I just wanted to check whether the more specific case (or the general case) was already catered for in xts.
AFAIK, the only way to do this is to use a subset that begins at the time you're interested in, then get the first observation of that.
e.g.
first(time_series["2006-06-17 12:00:00/2006-06-17 12:01"])
or, more generally, to get the 12:00 price every day, you can subset down to 1 minute of each day, then split by days and extract the first observation of each.
do.call(rbind, lapply(split(time_series["T12:00:00/T12:01"],'days'), first))
Here's a thread where Jeff (the xts author) contemplates adding the functionality you want
http://r.789695.n4.nabble.com/Find-first-trade-of-day-in-xts-object-td3598441.html#a3599887

Resources