What does nPercentile do in Graphite and how does it differ from percentileOfSeries?

I have read the docs a million times but I can't figure out what it does. What does nPercentile do in Graphite, and how does it differ from percentileOfSeries?

nPercentile: "Returns n-percent of each series in the seriesList."
This converts every series you give it to a single value*, namely the n-th percentile of that series. It outputs as many series as it was given as input.
*Note that, like everything in Graphite, this single value is returned as a series: the same value repeated as many times as needed to fill the requested time range.
percentileOfSeries: "percentileOfSeries returns a single series which is composed of the n-percentile values taken across a wildcard series at each point."
This returns a single series, which, for every timepoint, contains the n-th percentile of the different input series.
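The difference between the two functions can be sketched in plain Python. This is only an illustration, not Graphite's implementation: the helper names are hypothetical, and the simple nearest-rank percentile used here is an assumption that may not match Graphite's exact percentile calculation.

```python
import math

def percentile(values, n):
    """Nearest-rank n-th percentile of a list (a simplifying assumption)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(n / 100 * len(ordered)))
    return ordered[rank - 1]

def n_percentile(series_list, n):
    """Like nPercentile: one flat output series per input series."""
    return [[percentile(series, n)] * len(series) for series in series_list]

def percentile_of_series(series_list, n):
    """Like percentileOfSeries: one series, computed across inputs per timepoint."""
    return [percentile(column, n) for column in zip(*series_list)]

# Three hypothetical input series with four timepoints each.
series_list = [
    [1, 2, 3, 4],
    [10, 20, 30, 40],
    [5, 6, 7, 8],
]

per_series = n_percentile(series_list, 50)    # three flat series
across = percentile_of_series(series_list, 50)  # one series
print(per_series)
print(across)
```

The first result has one constant series per input, while the second is a single series that changes over time.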

Related

How to calculate the moving average of a binary value with no arrays or while/for statments

I've got a binary object (in my case it represents a valve).
I want to calculate a value that represents its average (or estimated average) value over the last hour.
I'm doing this calculation in a language called PPCL, which was created for controlling HVAC equipment.
It is a scripting language that doesn't use arrays, and it has no for or while statements.
I don't want to create 60 variables and sample the value every 60 seconds, since I would end up with hundreds of variables for the multiple valves I've got to average.
Thanks
Use the TIMAVG(output,sampleTime,numSamples,thingToAverage) function in PPCL to calculate moving averages.
Make sure that every pass of the program hits this line, and that there is no SAMPLE() on that line.
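For readers without access to TIMAVG, an estimated average can also be kept in a single variable per valve using an exponentially weighted moving average. This is a different technique from TIMAVG, not a description of how TIMAVG works, and it is shown in Python rather than PPCL. The function name and the choice of smoothing factor are assumptions made for illustration.

```python
def update_average(previous_average, sample, samples_per_window):
    """Exponentially weighted average: needs one stored value, no arrays.

    samples_per_window is a hypothetical tuning value; 60 roughly corresponds
    to one sample per minute over an hour.
    """
    alpha = 1.0 / samples_per_window
    return previous_average + alpha * (sample - previous_average)

average = 0.0
# The loop only simulates 60 successive program passes with the valve open (1).
for valve_state in [1] * 60:
    average = update_average(average, valve_state, 60)

print(average)
```

The estimate approaches the true average gradually, so it lags behind sudden changes; that is the trade-off for storing only one value.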

How in opentsdb do I add up all datapoints in a specified time range?

I'm using opentsdb. I have ONE time series, with values at 10-minute intervals. I want to specify a start time and an end time, and get back a single number that is the sum of all the values in the specified time range. I tried what I thought to be correct
...start=<start>&end=<end>&m=sum...
but got back all the individual values rather than their sum.
Add the element downsample="0all-sum"; apparently, the "0all" is interpreted as "the interval containing all timestamps".
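As a rough sketch, the query URL could be assembled like this, assuming the OpenTSDB 2.x HTTP endpoint /api/query, where the downsampler sits between the aggregator and the metric name in the m parameter. The host, port, metric name, and time range below are hypothetical placeholders.

```python
from urllib.parse import urlencode

def build_sum_query(base_url, metric, start, end):
    """Build a query URL that downsamples the whole range into one summed point."""
    params = {
        "start": start,
        "end": end,
        # aggregator : downsample : metric
        "m": f"sum:0all-sum:{metric}",
    }
    return f"{base_url}/api/query?{urlencode(params)}"

# Hypothetical server address and metric name.
url = build_sum_query(
    "http://localhost:4242",
    "my.metric",
    "2015/01/01-00:00:00",
    "2015/01/02-00:00:00",
)
print(url)
```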

Stretching multiple time series in R

I have a csv, with the columns representing a set of measurements taken over a period of time (in this case, the opening area of a larynx during a breath).
However, the time series may have different numbers of measurements, e.g.:
23,34,44
25,35,39
23,33,,
23,,,
Using ts.plot(data) I've been able to plot these on the same graph. However, I need each series to be "stretched" to the same length. (Such that each column in the CSV represents the same distance on the x-axis, but with varying resolution) How might this be best achieved?
Additionally, I had been using lines(rowMeans(data, na.rm = TRUE)) to produce an average, which I also need to do with the "stretched" series.
I had been considering performing the interpolation (up to some arbitrary resolution such as 1000) in Ruby, and then producing a new CSV file to run the original R code against. I would expect there to be a more elegant solution in R, however.
Maybe you just need approx? E.g., approx(some.series, n=length.max.series). This function offers constant or linear interpolation.
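To make the idea concrete, here is a minimal sketch in Python of what linear interpolation to a common length does. It is an illustration of the technique rather than a reimplementation of R's approx; the stretch helper is hypothetical, and the data are the three columns of the example CSV above.

```python
def stretch(series, n):
    """Linearly resample `series` to `n` evenly spaced points (n >= 2)."""
    if n < 2:
        raise ValueError("n must be at least 2")
    if len(series) == 1:
        return [float(series[0])] * n
    last = len(series) - 1
    out = []
    for i in range(n):
        pos = i * last / (n - 1)        # fractional index into the input
        lo = min(int(pos), last - 1)    # left neighbour, clamped at the end
        frac = pos - lo
        out.append(series[lo] + frac * (series[lo + 1] - series[lo]))
    return out

# Columns of the example CSV, with missing trailing values dropped.
columns = [[23, 25, 23, 23], [34, 35, 33], [44, 39]]
target = max(len(c) for c in columns)

stretched = [stretch(c, target) for c in columns]
# Pointwise mean across the stretched series.
averages = [sum(points) / len(points) for points in zip(*stretched)]
print(stretched)
print(averages)
```

Each stretched series keeps its first and last measurement and fills the points in between by interpolation, so the pointwise mean is taken over series of equal length.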

Summarizing attributes across sequences in a single sequence object?

I'm using TraMineR to analyze sets of sequences. Each coherent set of sequences may contain 100 work processes from a single project for a single period of time. Using TraMineR I can easily calculate descriptive statistics for each sequence, however I'm more interested in descriptive statistics of the sequence object itself - subsuming all the smaller sequences within.
For example, to get state frequencies, I run:
seqstatd(sequences.sts)
However, this gives me the state frequencies for each sequence within my sequence object. I want to access the frequencies of states across all sequences inside of my sequence object. How can I accomplish this?
I am not sure I understand your question, since seqstatd() returns the cross-sectional frequencies at each successive position, and NOT the state frequencies for each sequence. The latter is returned by seqistatd().
Assuming you are referring to the outcome of seqistatd(), you would get the mean time spent in each state with seqmeant(sequences.sts).
For other summaries you can use the apply function. For instance, you get the variance of the time spent in each state with
tab <- seqistatd(mvad.seq)
vart <- apply(tab,2,var)
head(vart)
Hope this helps.

Index xts using string and return only observations at that exact time

I have an xts time series in R and am using the very handy function to subset the time series based on a string, for example
time_series["17/06/2006 12:00:00"]
This will return the nearest observation to that date/time - which is very handy in many situations. However, in this particular situation I only want to return the elements of the time series which are at that exact time. Is there a way to do this in xts using a nice date/time string like this?
In a more general case (I don't have this problem immediately now, but suspect I may run into it soon) - is it possible to extract the closest observation within a certain period of time? For example, the closest observation to the given date/time, assuming it is within 10 minutes of the given date/time - otherwise just discard that observation.
I suspect this more general case may require me to write a function, which I am happy to do. I just wanted to check whether the more specific case (or the general case) was already catered for in xts.
AFAIK, the only way to do this is to use a subset that begins at the time you're interested in, then get the first observation of that.
e.g.
first(time_series["2006-06-17 12:00:00/2006-06-17 12:01"])
or, more generally, to get the 12:00 price every day, you can subset down to 1 minute of each day, then split by days and extract the first observation of each.
do.call(rbind, lapply(split(time_series["T12:00:00/T12:01"],'days'), first))
Here's a thread where Jeff (the xts author) contemplates adding the functionality you want:
http://r.789695.n4.nabble.com/Find-first-trade-of-day-in-xts-object-td3598441.html#a3599887
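The more general case from the question (the closest observation, but only if it lies within a given tolerance) would likely need a small helper function. The logic can be sketched in Python as follows; this is not xts code, the function names are hypothetical, and the timestamps are assumed to be sorted.

```python
from bisect import bisect_left
from datetime import datetime, timedelta

def exact(times, values, when):
    """Return the value observed exactly at `when`, or None."""
    i = bisect_left(times, when)
    if i < len(times) and times[i] == when:
        return values[i]
    return None

def nearest_within(times, values, when, tolerance):
    """Return the value closest in time to `when` if within `tolerance`, else None."""
    i = bisect_left(times, when)
    # Only the neighbours on either side of the insertion point can be closest.
    candidates = [j for j in (i - 1, i) if 0 <= j < len(times)]
    if not candidates:
        return None
    best = min(candidates, key=lambda j: abs(times[j] - when))
    if abs(times[best] - when) <= tolerance:
        return values[best]
    return None

# Hypothetical observations on 17 June 2006.
times = [
    datetime(2006, 6, 17, 11, 55),
    datetime(2006, 6, 17, 12, 3),
    datetime(2006, 6, 17, 12, 30),
]
values = [1.0, 2.0, 3.0]
ten_minutes = timedelta(minutes=10)

noon = datetime(2006, 6, 17, 12, 0)
print(exact(times, values, noon))
print(nearest_within(times, values, noon, ten_minutes))
```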
