Transform graphite metric name

I'm trying to use Grafana's world map plugin, which requires a specific form for its metric names: DE, FR, etc.
I don't have those metrics available in my graphite data and I don't have control over it, but I do have urls available e.g. www.foo.de, www.foo.fr.
Is there a way to transform a metric name, i.e. take its last two characters, before using it?

The answer is the aliasSub function, which can do a regex replace.
I used it in combination with aliasByNode to strip the parts of the URL I didn't need, e.g.:
aliasByNode(aliasSub(xxx.yyy.zzz.www_foo_fr.aaa.bbb, 'www_foo_', ''), 4)
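Since the goal is just the last two characters, aliasSub's capture groups can also pull the country code out directly. A sketch (assuming the code is always the final two letters of that node):
aliasSub(xxx.yyy.zzz.www_foo_*.aaa.bbb, '^.*www_foo_([a-z]{2}).*$', '\1')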

Generate Splunk report with only extracted fields

First and foremost, maybe what I am looking for isn't possible or I am going down the wrong path. Please advise.
Consider raw data with n parameters, each separated by '&':
Id=1234&ACC=bc3gds5&X=TESTX&Y=456567&Z=4457656&M=TESTM&N=TESTN&P=5ec3a
Using SPL, I've extracted only the few fields (ACC, X, Y) that I'm interested in. Now I would like to generate a report containing only those fields in a tabular format, not the whole raw event.
There may be more than one way to do that, but I like to use rex. The rex command extracts text that matches a regular expression into fields. Once you have the fields, you can use SPL on them to do whatever you need.
index=foo
| rex "ACC=(?<ACC>[^&]+)&X=(?<X>[^&]+)&Y=(?<Y>[^&]+)"
| table ACC X Y
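The pattern above assumes ACC, X and Y appear adjacent and in that exact order. If other parameters can sit between them, one rex per field is a bit more robust; a sketch (the (?:^|&) guard keeps, say, X= from matching inside another value):
index=foo
| rex "(?:^|&)ACC=(?<ACC>[^&]+)"
| rex "(?:^|&)X=(?<X>[^&]+)"
| rex "(?:^|&)Y=(?<Y>[^&]+)"
| table ACC X Y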

Can I extract all matches with functions like aregexec?

I've been enjoying the powerful function aregexec that allows me to mine strings in a fuzzy way.
With it I can search for a nucleotide string such as "ATGGCTTCGTC" within a DNA section, with a defined allowance for insertions, deletions and substitutions.
However, it only shows me the first match rather than scanning the whole string. For example,
If I run
aregexec("a","adfasdfasdfaa")
only the first "a" shows up in the result. I'd like to see all the matches.
I wonder if there is a more powerful function, or an argument that can be added to this one.
Thank you very much.
P.S. I explained the fuzzy search poorly. I mean, the match doesn't have to be perfect. Say I allow a substitution of one character and search AATTGG in ctagtactaAATGGGatctgct; the capitalized part will be considered a match. I can similarly allow insertions and deletions of certain characters.
gregexpr will report every occurrence of the pattern in the string, as in this example:
gregexpr("as","adfasdfasdfaa")
There is much more information under ?grep in R, which explains every aspect of using regular expressions.
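For example, to see the matched substrings rather than just their positions, gregexpr pairs with regmatches (base R):
x <- "adfasdfasdfaa"
m <- gregexpr("a", x)   # start positions of every exact match
regmatches(x, m)        # the matched text itself
Note that gregexpr does exact matching only. As far as I know there is no built-in "find all" variant of aregexec, but you can reapply it after each hit; a rough sketch (max.distance is the number of edits allowed):
find_all_fuzzy <- function(pattern, x, max.distance = 1) {
  hits <- integer(0)
  offset <- 0
  repeat {
    m <- aregexec(pattern, substring(x, offset + 1),
                  max.distance = max.distance)[[1]]
    if (m[1] == -1) break                                     # no further match
    hits <- c(hits, offset + m[1])                            # absolute start
    offset <- offset + m[1] + attr(m, "match.length")[1] - 1  # skip past match
  }
  hits
}
find_all_fuzzy("AATTGG", "ctagtactaAATGGGatctgct")  # e.g. 10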

Name to Twitter user name using R

I have a list of local election candidates and I would like to find out
(i) whether these individuals have a Twitter account, and
(ii) if so, what their screen names/user names are.
search_users seemed to be the best option, but it does not do a good job. Here is an example:
y1 <- search_users(q="suleyman kilinc", n=5, parse=TRUE)
This gives me a list of 5 users and none of them is the one I am looking for. This is often the case. But when I do the same search on Google with the keywords "suleyman+kilinc+twitter", the first result Google offers is exactly what I need. This is true for 95% of the random names I searched manually. Is there a good way to automate the name-to-user-name search through R, or a better option than the search_users function?
Any help is appreciated.
It is a very interesting question. The q parameter accepts a string, as indicated above. When you pass words separated by a space as the value of q, you are instructing the function to search for "suleyman" & "kilinc", so "suleyman kilinc" is the same as "suleyman AND kilinc". The Twitter REST API in this case will return any user matching both "suleyman" and "kilinc", regardless of order.
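To illustrate, under the AND semantics described above these two calls should return the same users (a sketch reusing the call from the question):
search_users(q = "suleyman kilinc", n = 5, parse = TRUE)
search_users(q = "suleyman AND kilinc", n = 5, parse = TRUE)
If your client also returns the profile's display name alongside the screen name, comparing that field against your candidate list is one way to filter the results afterwards.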

Graphite: group by node fragment

If I have metrics named:
statsite.gauges.a-ABC-1.thing
statsite.gauges.a-ABC-2.thing
statsite.gauges.a-CBA-1.thing
Is it possible to group these metrics by a particular fragment, for instance:
statsite.gauges.a-{groupByThisPart}-*.thing
So that I can feed them into another function such as sumSeries.
This is possible by using aliasSub to convert the '-' separators into '.'. First, apply:
aliasByNode(seriesName, 2)
which outputs 'a-CBA-1'. Then apply:
aliasSub(seriesName, '(\w+)-(\w+)-(\d+)', '\1.\2.\3')
which outputs 'a.CBA.1'.
Then you can use groupByNode to sum all the series by the 2nd fragment:
groupByNode(seriesName, 1, 'sum')
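Chained into a single target, that might look like this (an untested sketch built from the metric names in the question):
groupByNode(aliasSub(aliasByNode(statsite.gauges.a-*-*.thing, 2), '(\w+)-(\w+)-(\d+)', '\1.\2.\3'), 1, 'sum')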
Every series matched by the expression you use will be rendered separately. So if you do:
statsite.gauges.a-*-*.thing
All series matching that pattern will be displayed. There are some functions like sumSeriesWithWildcards that you can use to perform the aggregation only with respect to a certain position, but positions are delimited by dots, so I don't think you can do what you want with Graphite.
I believe the best option is to rename your metrics so that every part you'd like to group by is separated by dots.
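For example, the following sums every matching series but collapses the whole third position, rather than grouping on the ABC/CBA fragment inside it:
sumSeriesWithWildcards(statsite.gauges.a-*-*.thing, 2)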

How to replace English abbreviated forms with their dictionary forms

I'm working on a system to analyze texts in English: I use Stanford CoreNLP to split whole documents into sentences and sentences into tokens. I also use the MaxEnt tagger to get the tokens' POS tags.
Now, considering that I use this corpus to build a supervised classifier, it would be good if I could replace any word like 're, 's, havin, sayin', etc. with its standard form (are, is, having, saying). I've been searching for an English dictionary file, but I don't know how to use one. There are so many distinct cases to consider that I don't think it's an easy task: is there similar work, or a whole project, that I could use?
Ideas:
I) Use string edit distance on a subset of your text: for tokens that do not exist in the dictionary, match them against existing dictionary words by edit distance (see the sketch after these ideas).
II) The key feature of many of your examples is that they are only one character away from the correct spelling. So, for the words you fail to match against a dictionary entry, try adding each English character to the front or back and look the resulting word up in the dictionary. This is expensive at first, but if you keep the resolved misspellings in a lookup table (re -> are), at some point you will have 99.99% of the common misspellings, with their correct spellings, in your lookup table.
III) Train a word-level 2-gram or 3-gram language model on clean English text (e.g. newspaper articles), then run it over your entire corpus; for the words the model treats as unknown (i.e. words it never saw during training), check which word the model considers most probable in that context. Most likely the model's top-10 predictions will contain the correctly spelled word.
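A minimal sketch of the lookup-table idea (II) with an edit-distance fallback (I), here in R using base adist; the tiny dictionary and table are illustrative stand-ins, not real resources:
# Known contractions/misspellings; in practice you would grow this from your corpus.
lookup <- c("'re" = "are", "'s" = "is", "havin" = "having", "sayin'" = "saying")
# Stand-in dictionary; a real one would come from a word-list file.
dictionary <- c("are", "is", "having", "saying", "the", "dog")
normalize <- function(token) {
  if (token %in% dictionary) return(token)               # already a valid word
  if (token %in% names(lookup)) return(lookup[[token]])  # known misspelling
  d <- adist(token, dictionary)                          # Levenshtein distances
  if (min(d) <= 1) return(dictionary[which.min(d)])      # one-edit fallback
  token                                                  # give up, keep as-is
}
sapply(c("havin", "sayin'", "dog"), normalize)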
