Compute integration with OpenTSDB and Grafana - opentsdb

I have instant power consumption samples stored in OpenTSDB.
I would like to compute the daily power consumption, which is the integral of all the instant power consumption samples. This could have been done with a simple average if all sampling intervals were identical, which is not the case...
The formula would be something like this:
Daily consumption = Sum of ( Delta T * InstantPower ) / 24
Delta T = time between the current consumption sample and the previous sample
InstantPower = current power consumption sample
Is it possible to do it through OpenTSDB (or Grafana)?

This is not possible using OpenTSDB. You are limited to the aggregation functions listed in the documentation.
Grafana does not add any metric manipulation abilities on top of OpenTSDB.
When working with time series, it is actually recommended to submit the data as the integral instead (i.e. a monotonically increasing counter). OpenTSDB can then "differentiate" it using the rate function.
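If you control the collector, that usually means doing the integration on the write path. A minimal sketch of the idea in Python, assuming a hypothetical metric name and OpenTSDB's standard /api/put HTTP endpoint (the power * elapsed-time accumulation below plays the role of your Delta T * InstantPower sum):

import time
import requests

last_ts = None
energy_wh = 0.0   # running integral, i.e. a monotonically increasing counter

def push_sample(power_w):
    # accumulate energy since the previous sample and push the counter to OpenTSDB
    global last_ts, energy_wh
    now = time.time()
    if last_ts is not None:
        energy_wh += power_w * (now - last_ts) / 3600.0   # W * s / 3600 = Wh
    last_ts = now
    requests.post("http://localhost:4242/api/put", json={
        "metric": "home.energy.total_wh",    # hypothetical metric name
        "timestamp": int(now),
        "value": energy_wh,
        "tags": {"meter": "main"},           # hypothetical tag
    })

A rate() query on that counter then gives you back an approximation of the instantaneous power, and the daily consumption is simply the difference between the counter value at the end and at the start of the day.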

Firebase Revenue AB testing algorithm

We have run an AB test on Firebase which has the following results:
I was also building my own Bayesian AB-test suite and was wondering how they came to these conclusions.
What I was doing was querying the data of this test for the Control Group and Variant C:
Control Group: $11943 Revenue from 900 payers of 80491 users.
Variant C: $16487 Revenue from 894 payers of 80224 users.
I based my algorithm on this tool: https://vidogreg.shinyapps.io/bayes-arpu-test/. When I enter these inputs I get the following result:
This tool seems to be much more confident than Firebase that Variant C is better than the control group. It also seems like the Firebase distributions for revenue per user are skewed, while the Bayesian ARPU tool produces very symmetrical distributions.
The code for the Bayesian ARPU tool is available. They used conjugate priors to get to these conclusions based on this paper:
https://cdn2.hubspot.net/hubfs/310840/VWO_SmartStats_technical_whitepaper.pdf
Can anyone help me figure out which results are correct?
I have found out what my problem was.
The first problem is that it has to be broken into two steps. As it is a freemium app, most users do not pay. This means that these users do not add extra information to the distribution.
So, we first need to find posterior distributions for the payer percentage. This can be done as explained in the paper I mentioned. In Python, a function that samples from this posterior distribution is:
import numpy as np

def binomial_rvar(successes, total, samples):
    # Beta(1, 1) prior on the payer rate gives the posterior Beta(1 + successes, 1 + failures)
    rvar = np.random.beta(1 + successes, 1 + (total - successes), samples)
    return rvar
Secondly, for the users who do pay, we want to get the revenue. The paper also describes how to handle revenue, but it assumes the revenue is exponentially distributed. This is not the case for our app. We have some users that spend insane amounts of money on the app. If such a user ends up in one of the groups, this method will immediately conclude that group is the best.
What we can do is take the log of the Pareto-distributed samples, which transforms a Pareto distribution into an exponential distribution. We first take the log of each paying user's revenue, sum these together to create the "logsum", and count how many users it came from. We can then use the same approach as the paper. In Python this would be something like this:
def get_exponential_rvars(total_sum, users, samples):
    # conjugate Gamma posterior on the rate of the (exponential) log-revenues;
    # inverting the draws gives samples of the mean log-revenue per payer
    r_var = 1. / np.random.gamma(users, 1 / (1 + total_sum), samples)
    return r_var
We can now multiply both these r_var results, giving the final distribution for the revenue per user.
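To make this concrete, here is a rough sketch of how the two functions above can be combined. The payer and user counts are the ones from the question, but the logsum values are hypothetical placeholders you would compute from your own per-payer revenues as np.log(revenues).sum():

samples = 100000

ctrl_logsum = 1200.0   # hypothetical placeholder: np.log(control_payer_revenues).sum()
var_logsum = 1350.0    # hypothetical placeholder: np.log(variant_payer_revenues).sum()

ctrl_rpu = binomial_rvar(900, 80491, samples) * get_exponential_rvars(ctrl_logsum, 900, samples)
var_rpu = binomial_rvar(894, 80224, samples) * get_exponential_rvars(var_logsum, 894, samples)

print("P(Variant C beats Control):", (var_rpu > ctrl_rpu).mean())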

Is it possible to retrieve duration of less than 1 second from the Google Maps Distance Matrix API?

I'm trying to reduce uncertainty when using the Google Maps Distance Matrix API to extract journey time and distance data between a start and end node on the road network to calculate average speeds over fairly short distances (30m to 500m).
I am using the Python googlemaps library.
The standard journey time provided by the API is at 1-second (i.e. integer) resolution. Does anyone know if there is a way to extract the journey time at a finer temporal resolution, e.g. 0.1 seconds, when requesting journey distance and duration data from the API?
According to the "unit systems" section in the API documentation,
duration: The length of time it takes to travel this route, expressed in seconds (the value field) and as text. The textual representation is localized according to the query's language parameter.
There are no required or optional parameters that can change this to return values expressed as fractions of a second.
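For illustration, here is a minimal sketch with the Python googlemaps client (the API key and coordinates are placeholders) showing where the integer-second value sits in the response:

import googlemaps

gmaps = googlemaps.Client(key="YOUR_API_KEY")   # placeholder key

result = gmaps.distance_matrix(origins=[(51.5007, -0.1246)],       # placeholder start node
                               destinations=[(51.5014, -0.1419)],  # placeholder end node
                               mode="driving")

element = result["rows"][0]["elements"][0]
print(element["duration"]["value"])   # whole seconds (int); there is no sub-second field
print(element["distance"]["value"])   # metres (int)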

How to multiply two series lists in Grafana / Graphite?

I have data in Graphite in the following format:
app.service.method_*.m1_rate (rate of calls per minute)
app.service.method_*.avg_time (avg response time per minute)
I would like to get a graph with the total estimated time each method spends running per minute. In other words, multiply the rate by the avg time so I can see from one graph which calls are taking the most time. If I can get that going I can then limit it (I know how :) ) to the top N results of such a multiplication.
Neither the rate by itself (a high rate of very fast calls is not a problem) nor the avg time (a high average time on a service called once per 5 minutes is also not a problem) gives me that information.
Any suggestions?
This can be done with multiplySeriesWithWildcards; the second argument (3) is the index of the node where the two series differ, i.e. the m1_rate/avg_time part:
multiplySeriesWithWildcards(app.service.method_*.{m1_rate,avg_time}, 3)
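Since you mentioned limiting it to the top N afterwards: assuming your Graphite version ships highestMax, you should be able to wrap the expression directly, e.g.
highestMax(multiplySeriesWithWildcards(app.service.method_*.{m1_rate,avg_time}, 3), 5)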
Maybe multiplySeries could help you.

50 Hz sine lookup table using PWM

Can someone please guide me on how to generate a lookup table for producing a 50 Hz sine wave using PWM on an ATmega32?
This is what I have done so far, but I am confused about what to do next.
50 Hz sine wave, so 20 ms time period
256 samples (no. of divisions)
step I need to increase = 20 ms / 256 = 0.078125 ms (period of the PWM signal)
angle step rate = 360 / 256 = 1.40625 degrees
Amplitude of the sine wave should be 1.
I think you are starting from the wrong end and getting lost because of that.
Ignoring the lookup table, can you generate a 50 Hz PWM signal using explicit calls to sin()? Good. Now the lookup table saves you those expensive sin calls. sin is a periodic function, so you only need to store one period (*). How many points that is depends on your digital output frequency, which is going to be much higher than 50 Hz. How much higher defines the number of points in your lookup table.
To fill the lookup table, you don't send the result of your PWM calculation to the digital output but write it to the lookup table instead. To use the lookup table, you don't call the expensive function; you just copy the table entries straight to your output.
(*) There is one common optimization: a sine function has a lot of repetition. You don't need to store the second half, that's just the negative of the first half, and the second quarter is just the first quarter mirrored.
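For example, here is a quick sketch in plain Python (run offline, not on the AVR) that generates a 256-entry table of 8-bit compare values you could paste into your C source. The 256 entries and the 8-bit timer are assumptions, so size both to your actual PWM setup:

import math

N = 256      # table entries per sine period (assumption: match your PWM update rate)
TOP = 255    # 8-bit PWM: compare values 0..255

# shift the sine up so it fits an unsigned duty cycle: duty = TOP/2 * (1 + sin(angle))
table = [round(TOP / 2 * (1 + math.sin(2 * math.pi * i / N))) for i in range(N)]

print("const uint8_t sine_table[%d] = {" % N)
for i in range(0, N, 16):
    print("    " + ", ".join("%3d" % v for v in table[i:i + 16]) + ",")
print("};")

At 50 Hz with 256 entries you would advance one table index every 20 ms / 256 ≈ 78 µs, which matches the step you calculated.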

Getting accurate graphite stats_counts

We have the etsy/statsd node application running, flushing stats to carbon/whisper every 10 seconds. If you send 100 increments (counts) in the first 10 seconds, Graphite displays them properly, like:
localhost:3000/render?from=-20min&target=stats_counts.test.count&format=json
[{"target": "stats_counts.test.count", "datapoints": [
[0.0, 1372951380], [0.0, 1372951440], ...
[0.0, 1372952460], [100.0, 1372952520]]}]
However, 10 seconds later this number falls to 0, null, or 33.3. Eventually it settles at a value 1/6th of the initial number of increments, in this case 16.6.
/opt/graphite/conf/storage-schemas.conf is:
[sixty_secs_for_1_days_then_15m_for_a_month]
pattern = .*
retentions = 10s:10m,1m:1d,15m:30d
I would like to get accurate counts. Is Graphite averaging the data over the 60-second windows rather than summing it, perhaps? Using the integral function, after some time has passed, obviously gives:
localhost:3000/render?from=-20min&target=integral(stats_counts.test.count)&format=json
[{"target": "stats_counts.test.count", "datapoints": [
[0.0, 1372951380], [16.6, 1372951440], ...
[16.6, 1372952460], [16.6, 1372952520]]}]
Graphite data storage
Graphite manages the retention of data using a combination of the settings stored in storage-schemas.conf and storage-aggregation.conf. Your retention policy (the snippet from your storage-schemas.conf) is telling Graphite to store 1 data point per 10 seconds at its highest resolution (10s:10m) and to manage the aggregation of those data points as the data ages and moves into the older intervals with the lower resolutions defined (1m:1d, then 15m:30d). In your case, the data crosses into the next retention interval at 10 minutes, and after 10 minutes the data will roll up according to the settings in storage-aggregation.conf.
Aggregation / Downsampling
Aggregation/downsampling happens when data ages and falls into a time interval that has a lower resolution retention specified. In your case, you'll have been storing 1 data point for each 10-second interval, but once that data is over 10 minutes old Graphite will store the data as 1 data point for each 1-minute interval. This means you must tell Graphite how it should take the 10-second data points (of which you have 6 per minute) and aggregate them into 1 data point for the entire minute. Should it average? Should it sum? Depending on the type of data (e.g. timing, counter), this can make a big difference, as you hinted at in your post.
By default, Graphite will average data as it aggregates it into lower-resolution data. Using average to perform the aggregation makes sense for timer (and even gauge) data. That said, you are dealing with counters, so you'll want to sum.
For example, in storage-aggregation.conf:
[count]
pattern = \.count$
xFilesFactor = 0
aggregationMethod = sum
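One caveat worth double-checking (this is from memory): storage-aggregation.conf is only consulted when a whisper file is first created, so for metrics that already exist on disk you would need to change the method in place, e.g. with the whisper-set-aggregation-method.py script that ships with whisper (the path below is hypothetical):
whisper-set-aggregation-method.py /opt/graphite/storage/whisper/stats_counts/test/count.wsp sum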
UI (and raw data) aggregation / downsampling
It is also important to understand how the aggregated/downsampled data is represented when viewing a graph or looking at raw (JSON) data for different time periods, as the data retention schema thresholds directly impact the graphs. In your case you are querying render?from=-20min, which crosses your 10s:10m boundary.
Graphite will display (and perform realtime downsampling of) data according to the lowest-resolution precision defined. Stated another way, it means if you graph data that spans one or more retention intervals you will get rollups accordingly. An example will help (assuming the retention of: retentions = 10s:10m,1m:1d,15m:30d)
Any graph with data no older than the last 10 minutes will display 10-second aggregations. When you cross the 10-minute threshold, you will begin seeing 1-minute counts rolled up according to the policy set in storage-aggregation.conf.
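For example, with the render URL from your question, shortening the window changes which archive you read from:
localhost:3000/render?from=-9min&target=stats_counts.test.count&format=json (stays inside 10s:10m, so you get the raw 10-second points)
localhost:3000/render?from=-20min&target=stats_counts.test.count&format=json (crosses the boundary, so you get 1-minute rollups)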
Summary / tldr;
Because you are graphing/querying 20 minutes' worth of data (e.g. render?from=-20min), you are definitely falling into a lower-precision storage setting (i.e. 10s:10m,1m:1d,15m:30d), which means that aggregation is occurring according to your aggregation policy. You should confirm that you are using sum for the correct pattern in the storage-aggregation.conf file. Additionally, you can shorten the graph/query time range to less than 10 minutes, which would avoid the dynamic rollup.
