Graphite Wildcard Percentage Calculation

I am working on a graph of the servers in our infrastructure; each server has a Metric1 and a Metric2, and I need Metric1 / Metric2.
My Grafana dashboard graph has
Row A: DC_Servers.*.Metric1
Row B: DC_Servers.*.Metric2
At this point I see all three servers with Metric1 and Metric2. How do I get the percentage, i.e. Metric1 / Metric2, on the same graph, given that I have to use a wildcard to include all servers in the DC?

This should give you Metric1 as a percentage of Metric2 for every server (so it will generate one % series per server):
reduceSeries(mapSeries(DC_Servers.*.{Metric1,Metric2},1),"asPercent",2,"Metric1","Metric2")
mapSeries(..., 1) groups the series by node 1 (the server name), and reduceSeries then applies asPercent to the Metric1 and Metric2 series within each group, matching them on node 2.
The functions reduceSeries and mapSeries are new in Graphite 1.0.0.

Related

How do I represent domain knowledge information with bnlearn

I am learning about Dynamic Bayesian Network models using the R package bnlearn. To this end, I am following this paper where they impose certain constraints in the form of 6 layers (Table 1 in the paper):
1 Gender, age at ALS onset
2 Onset site, onset delta (start of the trial - onset)
3 Riluzole intake, placebo/treatment
4 Variables at time t-1
5 Variables at time t, TSO
6 Survival
In this example, since gender and age are in the top layer, they cannot be influenced by Riluzole intake, but they can influence (or have a causal connection to) Riluzole intake and ultimately survival. This guarantees acyclicity in the network, that is, we do not have never-ending feedback loops among the variables.
My question is: how can we model such prior knowledge using the R package bnlearn?
You can add domain knowledge or constraints to structure learning in a couple of ways.
If you want to specify the network structure and parameters using domain knowledge, you can build the network manually using custom.fit.
If you want to estimate the structure of the BN from data then you can impose constraints on edge direction & edge presence using the whitelist and blacklist parameters in the structure learning algorithms.
A prior can be placed on the edges in structure learning (e.g. prior="cs", where "If prior is cs, beta is a data frame with columns from, to and prob specifying the prior probability for a set of arcs. A uniform probability distribution is assumed for the remaining arcs."). There are other priors that can be used.
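For the layered constraints in the question, one way to encode them is bnlearn's tiers2blacklist(), which blacklists every arc pointing from a later layer back to an earlier one. A minimal sketch, assuming hypothetical variable names and a data frame als_data that are not taken from the paper:

library(bnlearn)

# One character vector per layer, ordered from layer 1 (top) to layer 6 (bottom);
# the variable names are made up for illustration.
tiers <- list(
  c("gender", "age_onset"),            # 1: gender, age at ALS onset
  c("onset_site", "onset_delta"),      # 2: onset site, onset delta
  c("riluzole", "treatment"),          # 3: Riluzole intake, placebo/treatment
  c("fvc_tminus1", "alsfrs_tminus1"),  # 4: variables at time t-1
  c("fvc_t", "alsfrs_t", "tso"),       # 5: variables at time t, TSO
  "survival"                           # 6: survival
)

bl  <- tiers2blacklist(tiers)          # forbid arcs from later tiers to earlier tiers
dag <- hc(als_data, blacklist = bl)    # score-based structure learning under the constraint

The resulting blacklist can be passed to any of the structure learning algorithms, not just hc(), and a whitelist can be supplied alongside it for arcs that must be present.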

How to analyse an impulse response function with more than 2 variables?

I am running an impulse response function in R, using the package vars.
My data has 3 variables: inflation (the Brazilian CPI, or IPCA), the exchange rate and the output gap.
My goal is to calculate the exchange rate pass-through (both the maximum impact and the lag), and I am following an academic recommendation to add the output gap (the monthly industrial production with an HP filter).
The pass-through I am interested in is exchange rate -> CPI. The output gap is of interest only in the way it affects this pass-through relation. So I wrote the code as:
model_irf <- vars::irf(model_var,
                       impulse = "Exchange Rate",
                       response = "CPI",
                       n.ahead = 12,
                       cumulative = TRUE)
This gives me the expected response of variable “CPI” t+12 to a unit change in variable “Exchange Rate”.
I imagine (from macroeconomic theory) that the output gap affects the magnitude of the pass-through, so in periods of a larger output gap companies have less room to increase prices; this relation is not visible in the model I wrote.
My question is: how is the output gap related to the IRF I calculated? (Or, if the model is wrong, how should I write it differently to test this assumption?)
Thank you very much for your time!

prop.test alternative argument usage

I am testing whether sending consumers information about a promotion convinces them to buy anything. Out of 100k consumers we randomly selected 90% and sent them catalogs. After some time we tracked who had bought.
To recreate the problem, let's use:
set.seed(1)
got <- rbinom(n=100000, size=1, prob=0.1)
bought <- rbinom(n=100000, size=1, prob=0.05)
table(got, bought)
   bought
got     0    1
  0 85525 4448
  1  9567  460
As I read here, I should use prop.test(table(got, bought), correct=FALSE), but I want to check not only whether the proportions are equal, but whether the proportion of buyers among those who got the leaflet was greater than among those who didn't get it.
Should I use the argument alternative = "less" or alternative = "greater"? And does the order of got and bought matter?
You usually want to use a two-sided alternative (for all you know, sending the promotion annoys people and they are less likely to purchase).
prop.test is doing a chi-square test, which by definition does not look at which group is bigger.
You could do a t.test like this:
t.test(bought ~ got, data = data.frame(got = got, bought = bought))
Depending on your typical conversion rate, sample size and alpha, you can get confidence intervals implying negative conversion rates, so a bootstrapping or Bayesian approach may be better suited.
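If you go the bootstrap route, here is a minimal sketch of a percentile interval for the difference in conversion rates, using the simulated got/bought vectors from the question; the number of resamples B is an arbitrary choice.

set.seed(1)
got    <- rbinom(n = 100000, size = 1, prob = 0.1)
bought <- rbinom(n = 100000, size = 1, prob = 0.05)

B <- 2000
diffs <- replicate(B, {
  idx <- sample(length(got), replace = TRUE)   # resample consumers with replacement
  mean(bought[idx][got[idx] == 1]) -           # conversion rate with catalog
    mean(bought[idx][got[idx] == 0])           # conversion rate without catalog
})
quantile(diffs, c(0.025, 0.975))               # 95% percentile interval for the difference

If the whole interval sits above zero, that supports the one-sided claim that the leaflet group converted better.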

Artificially increasing training data for regression by Random Forest and Neural Networks

We are trying to predict sales quantity based on attribute values. We have around 8000 records of data for training. Is it correct to increase the training data by adding small variations to the sales quantity for the same 8000 records?
I want to prepare a new training set with 24000 (3 * 8000) records, with the sales quantity +/- 0.1 for those 8000 records.
For example: if the original sales quantity = 2, then the new data will have 2, 2.1 and 1.9 for the same item.
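For concreteness, a minimal sketch of the augmentation described above; the data frame sales_df and the column name sales_qty are hypothetical placeholders.

# Duplicate every row twice and shift the target by +/- 0.1,
# tripling the training set from 8000 to 24000 rows.
augmented <- rbind(
  sales_df,
  transform(sales_df, sales_qty = sales_qty + 0.1),
  transform(sales_df, sales_qty = sales_qty - 0.1)
)
nrow(augmented)   # 3 * nrow(sales_df)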
The usefulness of the variation depends on the scale of the quantity. For example, if your value range is (0-100), adding +/- 0.1 is useless. If it is (0 < x < 1, just an example), then yes, the variations can make a good difference.
I think a better way would be to normalize your data (http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.normalize.html) and then add the variations.
If you have categorical data, you can convert it to dummy variables if needed (https://pandas.pydata.org/pandas-docs/stable/generated/pandas.get_dummies.html).
Bad idea! Intuitively it will not really help; it might just unnecessarily overfit the random forest or NN model.

cluster many curves representing gas consumption

I have 700 hourly time series of gas consumption from 2010 to 2014. Each time series represents the consumption of one company.
Some have constant consumption, others consume only 4 months of the year and some have highly volatile consumption. As a consequence, I would like to cluster them according to the shape of the consumption curve.
I tried the R package "kml", but I do not get good results. I also tried the "kmlShape" package, but it seems that I have too much data, and each time R quits.
I wondered if applying a fast Fourier transform and then clustering could be a good idea. My goal is really to distinguish the group whose consumption is constant from those whose consumption is variable.
Then I would like to cluster the variable consumers according to their peaks and when they consume.
I also tried to calculate the mean and variance of each client, then cluster them with k-means, but it is not very good: I get 2 clusters, one with 650 clients and the other with 50.
Thanks
[Plots: first example, second example]
Here are three examples of what I have; I have 700 curves like that, some highly variable, some pretty constant.
I would like to cluster them according to their shape, in order to have one group where the consumption is pretty constant and another where the consumption is highly variable, and then to cluster the variable group according to when the peaks appear.
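For reference, a minimal sketch of the mean/variance + k-means attempt described above; the 700-by-T matrix consumption (one row per company, one column per hour) is a hypothetical placeholder for the data.

# Summarise each series by its mean and standard deviation,
# then run k-means on the scaled summary features.
feats <- data.frame(
  mu  = apply(consumption, 1, mean),
  sdv = apply(consumption, 1, sd)
)
km <- kmeans(scale(feats), centers = 3, nstart = 25)
table(km$cluster)   # cluster sizes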
