Is it possible to aggregate data with varying nesting depth in Grafana? - graphite

I have data in Grafana with different nesting depths. It looks like this (the nesting depth differs depending on the message type):
foo.<host>.type.<type-id>
foo.<host>.type.<type-id>.<subtype-id>
foo.<host>.type.<type-id>.<subtype-id>.<more-nesting>
...
The <host> field can be the IP of the server sending the data and <type-id> is the type of message that it handled. There are quite a lot of message types but for the visualization I am only interested in the first level of <type-id> aggregated over all hosts.
For example, if I have this data:
foo.ip1.type.type1 = 3
foo.ip1.type.type2.subtype1 = 5
foo.ip2.type.type1 = 4
foo.ip2.type.type2.subtype1 = 9
foo.ip2.type.type2.subtype2 = 13
I would rather see it like this:
foo.*.type.type1 = 7 (3+4)
foo.*.type.type2 = 27 (5+9+13)
Then it would be easier to produce a graph where you can see which types of messages are most frequent.
I have not found a way to express that in Grafana. The only option that I see is to create a graph by manually creating queries for each message type. If there were only a handful of types that would be OK, but in my example, the number of types is quite high and even worse, they can change over time. When new message types are added, I would like to see them without having to change the graph.
Does Grafana support to aggregate the data in such a way? Can it visualize the data aggregated by one node and while summing up everything that comes after the node (like the --max-depth option in the Unix du command)?
I am not very experienced with Grafana, but I am starting to believe this functionality is not supported. Not sure whether Grafana allows to preprocess the data, but if the data could be transformed to
foo.ip1.type.type1 = 3
foo.ip1.type.type2_subtype1 = 5
foo.ip2.type.type1 = 4
foo.ip2.type.type2_subtype1 = 9
foo.ip2.type.type2_subtype2 = 13
it would also be valid workaround as the number of subtypes in very low in my data (often there is even only one subtype).

I think the groupByNode function might be useful to you. By doing something like:
groupByNode(foo.ip1.type.*.*,3,"sumSeries")
You'll need to repeat this for each level of nesting. Hope that helps.
More information is available here:
http://graphite.readthedocs.io/en/latest/functions.html#graphite.render.functions.groupByNode
If you want to do it the way you alluded to in your example you could use aliasSub

Related

How to include all elements of a vector in an exams2moodle or exams2pdf output?

I am working on a simple code to find the square root of the following elements:
dat <- c(4,9,16,25,36,49,64,81,100,121,144,169,196,225)
num<-sample(dat ,1,replace=F)
Parts of the seed file are configured like this:
examen01<-c("SinRad.Rmd")
semilla<-sample(100:1000, 1)
set.seed(semilla)
exams2moodle(examen01,n=14,svg=TRUE,name="SinRadConRad",
encoding="UTF-8",dir="salida",edir="ejercicios",
mchoice = list(shuffle = TRUE,answernumbering = "ABCD",
solution = FALSE,
eval = list(partial = TRUE,rule = "none")))
I intend that, with n=14, all the responses to the options found in the "dat" vector are included, but I see that there are responses that are repeated.
How to achieve 14 answers for the 14 possibilities, without repeating or missing any?
Thank you very much
The exams2xyz() functions have been written to draw large numbers of random variations from sets of exercises. There is no dedicated functionality that draws a small number of deterministic variations. So in your case I would just draw, say, a hundred variations from the exercise template even if it can only yield 14 distinct versions. Sure, this wastes a bit of memory but not so much that I would worry about this.
Having said that, it is possible to set up a temporary file with a specific version of an exercise by using the expar() function. For example, expar("SinRad.Rmd", num = 4) would yield an exercise where the num parameter has been fixed to 4. Then in the same way you can cycle through the other 13 numbers you want. In the following post we also provide an expargrid() function that does this for all possible combinationso of parameters: Making deterministic versions of a parametrized question
Then you can run exams2moodle() on the resulting 14 deterministic exercise files.

summary of row of numbers in R

I just hope to learn how to make a simple statistical summary of the random numbers fra row 1 to 5 in R. (as shown in picture).
And then assign these rows to a single variable.
enter image description here
Hope you can help!
When you type something like 3 on a single line and ask R to "run" it, it doesn't store that anywhere -- it just evaluates it, meaning that it tries to make sense out of whatever you've typed (such as 3, or 2+1, or sqrt(9), all of which would return the same value) and then it more or less evaporates. You can think of your lines 1 through 5 as behaving like you've used a handheld scientific calculator; once you type something like 300 / 100 into such a calculator, it just shows you a 3, and then after you have executed another computation, that 3 is more or less permanently gone.
To do something with your data, you need to do one of two things: either store it into your environment somehow, or to "pipe" your data directly into a useful function.
In your question, you used this script:
1
3
2
7
6
summary()
I don't think it's possible to repair this strategy in the way that you're hoping -- and if it is possible, it's not quite the "right" approach. By typing the numbers on individual lines, you've structured them so that they'll evaluate individually and then evaporate. In order to run the summary() function on those numbers, you will need to bind them together inside a single vector somehow, then feed that vector into summary(). The "store it" approach would be
my_vector <- c(1, 3, 7, 2, 6)
summary(my_vector)
The importance isn't actually the parentheses; it's the function c(), which stands for concatenate, and instructs R to treat those 5 numbers as a collective object (i.e. a vector). We then pass that single object into my_vector().
To use the "piping" approach and avoid having to store something in the environment, you can do this instead (requires R 4.1.0+):
c(1, 3, 7, 2, 6) |> summary()
Note again that the use of c() is required, because we need to bind the five numbers together first. If you have an older version of R, you can get a slightly different pipe operator from the magrittr library instead that will work the same way. The point is that this "binding" part of the process is an essential part that can't be skipped.
Now, the crux of your question: presumably, your data doesn't really look like the example you used. Most likely, it's in some separate .csv file or something like that; if not, hopefully it is easy to get it into that format. Assuming this is true, this means that R will actually be able to do the heavy lifting for you in terms of formatting your data.
As a very simple example, let's say I have a plain text file, my_example.txt, whose contents are
1
3
7
2
6
In this case, I can ask R to parse this file for me. Assuming you're using RStudio, the simplest way to do this is to use the File -> Import Dataset part of the GUI. There are various options dealing with things such as headers, separators, and so forth, but I can't say much meaningful about what you'd need to do there without seeing your actual dataset.
When I import that file, I notice that it does two things in my R console:
my_example <- read.table(...)
View(my_example)
The first line stores an object (called a "data frame" in this case) in my environment; the second shows a nice view of how it's rendered. To get the summary I wanted, I just need to extract the vector of numbers I want, which I see from the view is called V1, which I can do with summary(my_example$V1).
This example is probably not helpful for your actual data set, because there are so many variations on the theme here, but the theme itself is important: point R at a file, as it to render an object, then work with that object. That's the approach I'd recommend instead of typing data as lines within an R script, as it's much faster and less error-prone.
Hopefully this will get you pointed in the right direction in terms of getting your data into R and working with it.

How to source statments with neoj4 graph database?

I'm looking forward to use NeoJ4 for some researchs. However I have to check if it can do what I want first.
I would like to build a graph that says :
StatementID1 = Cannabidiol hasPositiveEffectOn ChronicPain
StatementID1 isSupportedBy Study1
StatementID1 isSupportedBy Study2
StatementID1 isNotSupportedBy Study3
This is easy to add key:Value properties to a relationship in NeoJ4.
The difficulty is that I want Study1,2,3 to be nodes. So that these can have them own set of properties.
This can be done in a triplestore where each triple has an ID like "Statment1" here. This is a matter of adding triples where the object is another triple ID.
url:TripleID1 = url:Cannabidiol url:hasPositiveEffectOn url:ChronicPain
url:TripleID2 = url:TripleID1 url:isSupportedBy url:Study1
url:TripleID3 = url:TripleID1 url:isSupportedBy url:Study2
url:TripleID4 = url:TripleID1 url:isNotSupportedBy url:Study3
For the moment I can't find how to do it simplely in NeoJ4.
I could add the DOI of the study as a property :
Study 1 :
DOI:123/123
Then add the same DOI in the link :
isSupportedBy:
DOI:123/123
Since the DOI is unique it could be possible to make some searchs. However this will make things much more complex.
Is there a simpler method to achieve that?
I suppose this is a database design issue.
Would a node/relationship model something like the following fit your data and make your queries easy?
Neo4j doesn’t support edges going from an edge to a node. Edges are always between nodes. So you’ll have to convert your positiveEffect edge to a node (as proposed in rickhg12hs’s answer) or model the positiveEffect as a non-edge (as you yourself proposed).

Multiplying two complex SeriesLists by a specific node

Let's say I have 4 different Serieslists:
foo.total
foo.succesful
bar.total
bar.succesful
I have generated a complex graphite query for both of them, so it goes like
function1(function2(foo.total))
function3(function4(foo.succesful))
I want to multiply these by each other. Well, that's not very difficult:
multiplySeries(function1(function2(foo.total)),function3(function4(foo.succesful)))
This draws one graph and works as intended.
The problem I am facing when trying to wildcard the foo-part, so I can do *.total. In this case I want to draw 2 graphs, because there are 2 wildcarded variables.
So my question is, how can I generalize the above query to not only work with foo but with n number of variables?
Thank you!
You need some kind of template function, the only one graphite provides is applyByNode and it should be sufficient in your case:
applyByNode(
*.{total,succesful},
'multiplySeries(f1(f2(%.total)), f3(f4(%.succesful)))',
'% some line'
)

Creating New Variables in R that relate to

I have 7 different variable in an excel spreadsheet that I have imported into R. They each are columns with a size of 3331. They are:
'Tribe' - there are 8 of them
'Month' - when the sampling was carried out
'Year' - the year when the sampling was carried out
'ID" - an identifier for each snail
'Weight' - weight of a snail in grams
'Length' - length of a snail shell in millimetres
'Width' - width of a snail shell in millimetres
This is a case where 8 different tribes have been asked to record data on a suspected endangered species of snail to see if they are getting rarer, or changing in size or weight.
This happened at different frequencies between 1993 and 1998.
I would like to know how to be able to create a new variables to the data so that if I entered names(Snails) # then it would list the 7 given variables plus any added variable that I have.
The dataset is limited to the point where I would like to add new variables. Such as, knowing the counts per month of snails in any given month.
This would rely on me using - Tribe,Month,Year and ID. Where if an ID (snail identifier) was were listed according to the rates in any given month then I would be able to sum them to see if there are any changes in counts. I have tried:
count=c(Tribe,Year,Month,ID)
count
But, after doing things like that, R just has a large list of that is 4X the size of the dataset. I would like to be able to create a given new variable that is of column size n=3331.
Or maybe I would like to create a simpler variable so I can see if a tribe collected at any given month. I don't know how I can do this.
I have looked at other forums and searched but, there is nothing that I can see that helps me in my case. I appreciate any help. Thanks
I'm guessing you need to organise your variables in a single structure, such as a data.frame.
See ?data.frame for the help file.
To get you started, you could do something like:
snails <- data.frame(Tribe,Year,Month,ID)
snails
# or for just the first few rows
head(snails)
Then this would have your data looking similar to your Excel file like:
Tribe Year Month ID
1 1 1 1 a
2 2 2 2 b
3 3 3 3 c
<<etc>>
Then if you do names(snails) it will list out your column names.
You could possibly avoid some of this mucking about by just importing your Excel file either directly from Excel, or saving as a csv (comma separated values) file first and then using read.csv("name_of_your_file.csv")
See http://www.statmethods.net/input/importingdata.html for some more specifics on this.
To tabulate your data, you can do things like...
table(snails$Tribe)
...to see the number of snail records collected by each tribe. Or...
table(snails$Tribe,snails$Year)
...to see the trends in each tribe by each year. The $ character will let you access the named variable (column) inside a data.frame in the same way you are currently using the free floating variables. This might seem like more work initially, but it will pay off greatly when you need to do some more involved analysis.
Take for example if you want to only analyse the weights from tribe "1", you could do:
snails$Weight[snails$Tribe==1]
# mean of these weights
mean(snails$Weight[snails$Tribe==1])
There are a lot more things I could explain but you would probably be better served by reading an excellent website like Quick-R here: http://www.statmethods.net/management/index.html to get you doing some more advanced analysis and plotting.

Resources