Graphite - using movingAverage with groupByNode - graphite

Is there any way to use movingAverage as the callback to groupByNode, with its second argument bound to something like 5? If not, is there another way to achieve the same result?
So when using a query like this:
groupByNode(some.query.* , 2, "avg")
I'd like to replace "avg" with something that calls movingAverage with the results of the groupByNode as the first argument and 5 as the 2nd one.

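groupByNode(movingAverage(some.query.*, 5), 2, "avg") should give the value you're after: each matching series is smoothed first, and the smoothed series are then averaged per group. Note that a bare 5 is a window of five data points, which is five minutes only if the series has one point per minute; pass a quoted interval such as '5min' to fix the window in time instead. Wrapping the calls the other way round, movingAverage(groupByNode(some.query.*, 2, "avg"), 5), applies the moving average to the grouped result directly.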
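Why averaging the moving averages gives the same numbers as smoothing the average can be checked outside Graphite. The following is a minimal sketch in R, assuming two made-up series without missing points and a trailing five-point window; moving_avg is a hypothetical helper, and Graphite's own handling of null values may differ.

```r
# Two hypothetical series that would end up in the same group.
set.seed(1)
a <- cumsum(rnorm(20))
b <- cumsum(rnorm(20))

# Trailing n-point moving average (NA until n points are available).
moving_avg <- function(x, n = 5) {
  as.numeric(stats::filter(x, rep(1 / n, n), sides = 1))
}

# Average of the moving averages vs. moving average of the average.
avg_of_ma <- (moving_avg(a) + moving_avg(b)) / 2
ma_of_avg <- moving_avg((a + b) / 2)

# Compare the two results within floating-point tolerance.
all.equal(avg_of_ma, ma_of_avg)
```

Both operations are linear, so the order in which they are applied does not change the result when every series has a value at every point.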

Related

Partitioning Data in 'R' based on data size

I'm currently working on a program that analyzes leaf area and compares it to the position of the leaf within the cluster (i.e. whether it is the first leaf, the third, the last, etc.), and I am analyzing the relationship between position, area, mass, and more. I have a database of approximately 5,000 leaves and 1,000 clusters, and that's where the problem arises.
Clusters come in different sizes: most have 5 leaves, but some have 2, 8, or anywhere in between. I need a way to separate the clusters by the number of leaves they contain so that the program isn't treating clusters with 3 leaves the same as clusters with 7. My .csv has each leaf entered individually, so manually entering separate sets isn't practical.
I'm rather new at 'R' so I might be missing an obvious skill here but any help would be greatly appreciated. I also understand this is rather confusing so please feel free to reply with clarifying questions.
Thanks in advance.

If I understand the question correctly, it sounds like you want to calculate things based on some defined group (in your case clusterPosition?). One way to do this with dplyr is to use group_by with summarize or mutate. The latter keeps all the rows in your original data set and simply adds a new column to it; the former aggregates like rows and returns a summary statistic for each unique value of the grouping variable.
As an example, if your data looks something like this:
df <- data.frame(leafArea = c(2.0, 3.0, 4.0, 5.0, 6.0),
                 cluster = c(1, 2, 1, 2, 3),
                 clusterPosition = c(1, 1, 2, 2, 1))
To get the mean and standard deviation for each unique clusterPosition, you would do something like the following, which returns one row per unique clusterPosition:
library(dplyr)
df %>% group_by(clusterPosition) %>% summarize(meanArea = mean(leafArea), sdArea = sd(leafArea))
If you want to compare each individual leaf to some characteristic of its clusterPosition, i.e. you want to preserve all the individual rows in your original dataset, you can use mutate instead of summarize.
library(dplyr)
df %>% group_by(clusterPosition) %>% mutate(meanPositionArea = mean(leafArea), diffMean = leafArea - meanPositionArea)
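The question also asks how to separate clusters by the number of leaves they contain. A minimal sketch of that step in base R follows, assuming one row per leaf and a cluster column identifying the cluster; the leaves data frame and the clusterSize and bySize names are made up for illustration.

```r
# Hypothetical data: one row per leaf, 'cluster' identifies the cluster.
leaves <- data.frame(
  leafArea = c(2.0, 3.0, 4.0, 5.0, 6.0, 7.0),
  cluster = c(1, 2, 1, 2, 3, 2),
  clusterPosition = c(1, 1, 2, 2, 1, 3)
)

# Number of leaves in each leaf's cluster, repeated on every row of that cluster.
leaves$clusterSize <- ave(leaves$leafArea, leaves$cluster, FUN = length)

# One data frame per cluster size; bySize[["3"]] holds the leaves of all
# three-leaf clusters.
bySize <- split(leaves, leaves$clusterSize)
```

With clusterSize as a column, it can also be used as an extra grouping variable, so that positions in three-leaf clusters are summarized separately from positions in seven-leaf clusters.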

limit data in aggregate function

Is there a way to limit the aggregated data?
An example:
aggregate(cars$speed, FUN=mean, by=list(cars$dist), data=cars)
gives exactly the same output as:
aggregate(cars$speed, FUN=mean, by=list(cars$dist), data=cars[cars$speed >= 15, ])
In this case there are only two variables but in my case I want to limit the data by a third variable. Is this possible within the aggregate function or is a new dataframe necessary?
Thanks a lot.
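
In the two calls above the values come straight from cars$speed and cars$dist, so the data argument has no effect on which rows are used. A minimal sketch of two ways to restrict the rows, assuming the formula interface of aggregate is acceptable: subset the data frame passed as data, or use the formula method's subset argument. A third column of the data frame can be used in the condition in the same way.

```r
# Formula interface: columns are looked up in 'data', so subsetting 'data' matters.
full <- aggregate(speed ~ dist, data = cars, FUN = mean)
limited <- aggregate(speed ~ dist, data = cars[cars$speed >= 15, ], FUN = mean)

# Equivalent, using the 'subset' argument of the formula method.
limited2 <- aggregate(speed ~ dist, data = cars, FUN = mean, subset = speed >= 15)

# Fewer groups remain once the slower cars are excluded.
nrow(full)
nrow(limited)
```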

How can I apply a user defined function to every column of a data frame

I am trying to use the count() function from the dplyr package on every column of my data frame to count the number of each value per column in my df.
I tried:
apply(df, 2, function(x){count_(df,X[1])})
However, it does not work. If I do
apply(df, 2, function(x){count_(df,"one of my column's name")})
it only applies it to that column.
How can I apply it to every column in my data frame?

How about the following:
apply(df, 2, table)

There are probably more elegant ways to do this, but try this:
zz <- apply(iris, 2, function(x) table(x))
zz will be a list of occurrence counts, one list item for each column. You could merge it all together from there if you want.
Edit: I just noticed that the answer above does the same with just table instead of the function definition. Both will work; I always use a function since I end up messing around with it a bit.
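
Since a data frame is a list of columns, lapply is another way to run table over every column, and it avoids the conversion to a matrix that apply performs first. A minimal sketch, using a made-up data frame:

```r
# Hypothetical data frame with one character and one numeric column.
df <- data.frame(
  colour = c("red", "blue", "red", "green"),
  size = c(1, 2, 2, 2)
)

# One table of value counts per column, returned as a named list.
counts <- lapply(df, table)

# Counts for a single column.
counts$colour
```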

How do I generate row-specific means in a data frame?

I'm looking to generate means of ratings as a new variable/column in a data frame. Currently every method I've tried either generates columns that show the mean of the entire dataset (for the chosen items) or don't generate means at all. Using the rowMeans function doesn't work as I'm not looking for a mean of every value in a row, just a mean that reflects the chosen values in a given row. So for example, I'm looking for the mean of 10 ratings:
fun <- mean(T1.1,T2.1,T3.1,T4.1,T5.1,T6.1,T7.1,T8.1,T9.1,T10.1, trim = 0, na.rm = TRUE)
I want a different mean printed for every row because each row represents a different set of observations (a different subject, in my case). The issues I'm looking to correct with this are twofold: 1) it generates only one mean, the mean of all values for each of the 10 variables, and 2) this vector is not a part of the dataframe. I tried to generate a new column in the dataframe by using "exp$fun" but that just creates a column whose every value (for every row) is the grand mean. Could anyone advise as to how to program this sort of row-based mean? I'm sure it's simple enough but I haven't been able to figure it out through Googling or trawling StackOverflow.
Thanks!

It's hard to give an answer without a reproducible example, but have you tried subsetting your dataset to only the 10 columns from which you'd like to derive the means and then using an apply statement? Something along the lines of apply(df, 1, mean), where the first argument is your (subsetted) data frame, the second specifies whether to apply the function by rows (1) or columns (2), and the third is the function you wish to apply.
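
A minimal sketch of that idea, assuming the rating columns are named as in the question; the ratings data frame, its values, and the meanRating column are made up, and only three of the ten columns are shown. rowMeans works here too once it is given only the chosen columns.

```r
# Hypothetical data: one row per subject, three of the rating columns.
ratings <- data.frame(
  subject = c("a", "b", "c"),
  T1.1 = c(1, 4, NA),
  T2.1 = c(2, 5, 6),
  T3.1 = c(3, NA, 9)
)

# Columns that should enter the per-row mean.
cols <- c("T1.1", "T2.1", "T3.1")

# Per-row mean over the chosen columns, ignoring missing ratings,
# stored as a new column of the data frame.
ratings$meanRating <- rowMeans(ratings[, cols], na.rm = TRUE)

# The same result with apply over rows.
viaApply <- apply(ratings[, cols], 1, mean, na.rm = TRUE)
```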

How does the function argument work in R's 'combn'?

Despite reading the documentation, I'm struggling to understand how the function argument works in the combn utility.
I have a table with two columns of data. For each column, I want to calculate the ratio for each unique pair of values in that column. Let's just focus on one column for simplicity:
V1
1 342.3
2 123.5
3 472.0
4 678.3
...
14 567.2
I can use the following to return all the unique combinations:
combn(table[,1], 2)
but of course this just returns each pair of values. I want to divide them to get a ratio, but can't seem to figure out how to set this up.
I understand that for something like outer, for example, you can just provide the operator as the argument but how does this transfer to combn?
combn(table[,1], 2, FUN = "/")
# obviously not correct

The issue is that the function will receive exactly one parameter, and that parameter will be a vector of the elements in that particular combination. The / function requires two separate parameters, not a single vector of values. Instead you could write
combn(table[,1], 2, FUN = function(x) x[1]/x[2])
So here we get one parameter x and we divide the first value by the second.
Other functions such as
combn(1:4, 2, FUN = sum)
work just fine because they expect to receive a single vector of values.
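
A minimal sketch of the ratio calculation on a small vector, using the first four values from the question in place of the full column; the vector v and the labelling step are illustrative additions.

```r
# Hypothetical values standing in for one column of the table.
v <- c(342.3, 123.5, 472.0, 678.3)

# FUN receives each pair as a single length-2 vector.
ratios <- combn(v, 2, FUN = function(x) x[1] / x[2])

# Label each ratio with the positions of the two values it was built from.
names(ratios) <- combn(seq_along(v), 2, FUN = paste, collapse = "/")
ratios
```

Four values give six unique pairs, so ratios has six elements, ordered the same way as the columns of combn(v, 2).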
