Graphite: group by node fragment

If I have metrics named:
statsite.gauges.a-ABC-1.thing
statsite.gauges.a-ABC-2.thing
statsite.gauges.a-CBA-1.thing
Is it possible to group these metrics by a particular fragment, for instance:
statsite.gauges.a-{groupByThisPart}-*.thing
So that I can feed them into another function such as sumSeries.

This is possible by using aliasSub to convert each '-' into a '.'. First apply:
aliasByNode(seriesName, 2)
which outputs 'a-CBA-1'. Then apply:
aliasSub(seriesName, '(\w+)-(\w+)-(\w+)', '\1.\2.\3')
which outputs 'a.CBA.1'.
Then you can use groupByNode to sum all the series that share the 2nd fragment (node index 1):
groupByNode(seriesName, 1, 'sum')
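
To make the three steps above concrete, here is a minimal Python sketch that mimics them on plain strings, outside Graphite. It is not Graphite's implementation: the `group_by_fragment` helper and the sample values are hypothetical, and each series is reduced to a single number for brevity.

```python
import re
from collections import defaultdict

def group_by_fragment(series, node=1):
    """Sum the values of all series that share the same name fragment.

    `series` maps full metric names to one numeric value (a stand-in for
    a real time series). Hypothetical helper, for illustration only.
    """
    totals = defaultdict(float)
    for name, value in series.items():
        # Step 1: like aliasByNode(..., 2), keep only the third dotted node.
        alias = name.split(".")[2]
        # Step 2: like aliasSub, turn 'a-ABC-1' into 'a.ABC.1'.
        alias = re.sub(r"(\w+)-(\w+)-(\w+)", r"\1.\2.\3", alias)
        # Step 3: like groupByNode with a sum, group on the chosen node.
        totals[alias.split(".")[node]] += value
    return dict(totals)

sample = {
    "statsite.gauges.a-ABC-1.thing": 1.0,
    "statsite.gauges.a-ABC-2.thing": 2.0,
    "statsite.gauges.a-CBA-1.thing": 5.0,
}
print(group_by_fragment(sample))
```

The two ABC series end up in one group and the CBA series in another, which is the grouping the question asks for.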

Every series matched by the expression you use will be rendered separately. So if you do:
statsite.gauges.a-*-*.thing
All series matching that pattern will be displayed. There are some functions like sumSeriesWithWildcards that you can use to perform the aggregation only with respect to a certain position, but positions are delimited by dots, so I don't think you can do what you want with Graphite.
I believe the best option is to rename your metrics so that every part you'd like to group by is separated by dots.

Related

Is there an R function to remove repetition within observation?

I have a large dataset that contains one column called "TYPE_DESCRIPTION" that describes the type of activity of each observation.
However, the raw dataset that I obtained somehow may contain more than one repetition of the same activity within the "TYPE_DESCRIPTION" column.
Let's say for one observation, the activity (or value) shown within the "TYPE_DESCRIPTION" column can contain "Walking, Walking, Walking, Walking", instead of just "Walking". How do I remove the repetition of "Walking" within that column so I only have the value once?
I have tried the distinct() function, but it defines the "Walking, Walking, Walking, Walking" as one unique value. Whereas what I want is just "Walking".
This became a problem later, when I wanted to add a new column with mutate() that groups the activities into higher-order categories. Since I only wrote "Walking" in the code, it does not recognize the variants of "Walking" with different numbers of repetitions, and puts them under a different category than the one I need.
Thanks.

In base R:
transform(df, uniq=sapply(strsplit(TYPE_DESCRIPTION, ', ?'), \(x)toString(unique(x))))
                   TYPE_DESCRIPTION             uniq
1 Walking,Walking, Walking, Walking          Walking
2                  Running, Walking Running, Walking

Naming data frames in lists using a sequence

I have a rather simple question. I have a list, and I want to name the data frames in the list according to a sequence. Right now I have a sequence that advances by one letter per list element (explained below):
nm1 <- paste0("Results_Comparison_",LETTERS[seq_along(Model_comparisons)])
This creates "Results_Comparison_A", "Results_Comparison_B", "Results_Comparison_C", "Results_Comparison_D", etc. What I want is for it to be a number instead of a letter (i.e. Results_Comparison_1, Results_Comparison_2, Results_Comparison_3, etc.). Does anyone know how I could change this? If extra information is needed, let me know!

This should work: paste0("Results_Comparison_",seq_along(Model_comparisons))

Regex in R match specified words when they all (two or more) occur in whatever order within certain distance in particular line

I have a double challenge.
First, I want to match lines that contain two (or eventually more) specified words within certain distance in whatever order.
Using lookarounds I manage to select lines matching two or more words, regardless of the order in which they occur. I can also easily add more words to be found in the same line, so this can be applied without much effort when more words must occur for a line to be selected. The disadvantage is that I can't specify the maximal distance between them.
^(?=.*\bjohn)(?=.*\bjack).*$
By using the pipe operator I can specify both orders in which the terms may occur as well as the accepted distance between them, but when more words should be matched the code becomes lengthy and error-prone.
jack.{0,100}john|john.{0,100}jack
Is there a way to combine the respective advantages of both approaches in one regular expression?
Second, ideally I would like only 'jack' and 'john' to be selected in the line, not the whole line.
Is there a possibility to do this all at once?

For this case you have to use the second approach, but it can't be done with regex alone. You need help from the language's string tools, such as paste, to build the regex in the second format.
In Python, I would create the long regex like this:
>>> def create_reg(lis):
...     out = []
...     for i in lis:
...         # each tuple is (word, gap, word); emit both orderings
...         out.append(''.join(i) + '|' + ''.join([i[2], i[1], i[0]]))
...     return '(?:' + '|'.join(out) + ')'
...
>>> lst = [('john', '.{0,100}', 'jack'), ('foo', '.{0,100}', 'bar')]
>>> create_reg(lst)
'(?:john.{0,100}jack|jack.{0,100}john|foo.{0,100}bar|bar.{0,100}foo)'
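
The question also asks about more than two words and about selecting only the words rather than the whole line. A possible extension, sketched under assumptions: the `build_proximity_regex` and `matched_words` helpers below are hypothetical names, the gap limit applies between neighbouring words (so three words may span up to twice the gap), and the number of alternatives grows factorially with the number of words.

```python
import re
from itertools import permutations

def build_proximity_regex(words, max_gap=100):
    """Build an alternation covering every ordering of `words`, allowing
    at most `max_gap` characters between neighbouring words."""
    gap = ".{0,%d}" % max_gap
    alternatives = [
        gap.join(r"\b%s\b" % re.escape(w) for w in order)
        for order in permutations(words)
    ]
    return re.compile("|".join(alternatives))

def matched_words(line, words, max_gap=100):
    """Return only the target words inside the matching span, in order of
    appearance, or [] when the proximity condition is not met."""
    match = build_proximity_regex(words, max_gap).search(line)
    if match is None:
        return []
    word_re = re.compile(r"\b(?:%s)\b" % "|".join(map(re.escape, words)))
    return word_re.findall(match.group(0))

print(matched_words("john met jack today", ["jack", "john"]))
print(matched_words("john " + "x" * 200 + " jack", ["jack", "john"]))
```

The first call finds both names within the allowed distance; the second returns an empty list because the names are more than 100 characters apart.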

Transform graphite metric name

I'm trying to use Grafana's world map plugin, which requires a certain form for its metric names: DE, FR etc.
I don't have those metrics available in my graphite data and I don't have control over it, but I do have urls available e.g. www.foo.de, www.foo.fr.
Is there a way to transform a metric name i.e take the last two characters before using it?

The answer is the aliasSub function which can do a regex replace.
I used this in combination with aliasByNode to replace the parts of the url I didn't need e.g.:
aliasByNode(aliasSub(xxx.yyy.zzz.www_foo_fr.aaa.bbb, 'www_foo_', ''), 4)

How to add a column to a dataframe in lapply

I have two separate dataframes: one (frame1) has the general info about the location of sensors, and the other (frame2) has the time series for all the locations. The siteIDs column is common to both.
I want to add another column to frame2. I thought it would be possible to use lapply, but it is not working. I have also tried using [[ instead of $, no gain. It does not produce any warning or error. It simply does not do anything.
gaugeList<-as.list(unique(frame2$siteIDs))
frame2[['timeZone']]<-as.character(NA)
lapply(gaugeList,function(gaugeX) { frame2$timeZone[which(frame2$siteIDs==gaugeX)] <- (as.character(frame1$timeZone[which (frame1$siteIDs==gaugeX)]))})