I have data in Graphite in the following format:
app.service.method_*.m1_rate (rate of calls per minute)
app.service.method_*.avg_time (avg response time per minute)
I would like to get a graph of the total estimated time a given method is running per minute. In other words, multiply the rate by the average time so I can see from one graph which calls are taking the most time. If I can get that going I can then limit it (I know how :) ) to the top N results of the multiplication.
The rate by itself does not give me that information (a high rate of very fast calls is not a problem), and neither does the average time (a high average time on a service called once per 5 minutes is also not a problem).
Any suggestions?
This can be done with multiplySeriesWithWildcards:
multiplySeriesWithWildcards(app.service.method_*.{m1_rate,avg_time}, 3)
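To then keep only the biggest offenders (the "top N" mentioned in the question), the product can be wrapped in one of Graphite's filtering functions; a sketch using highestAverage, where 5 is just an illustrative value:
highestAverage(multiplySeriesWithWildcards(app.service.method_*.{m1_rate,avg_time}, 3), 5)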
Maybe multiplySeries could help you.
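For a single concrete method (method_foo is an illustrative name), that would look something like:
multiplySeries(app.service.method_foo.m1_rate, app.service.method_foo.avg_time)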
Related
I have a counter named, for example, "mysvr.method_name1", with 3 tag key/value pairs. It's an OpenTSDB counter type, which in my situation counts the number of queries. How can I get its accumulated value over the past 30 days (in my situation, the total number of requests in 30 days)?
I use the q method like below:
q("sum:mysvr.method_name1{tag1=v1}", "1590940800", "1593532800")
but it looks like the number series is not monotonically increasing, due to server restarts, missing tag key/values, or some other reasons.
So it seems like the query below will not meet my requirement:
diff(q("sum:mysvr.method_name1{tag1=v1}", "1590940800", "1593532800"))
How can I fetch the accumulated value of the counter over a given time period?
The only thing I can be sure of is that the expression below gives the mean QPS in my situation:
avg(q("sum:rate{counter}:mysvr.method_name1{tag1=v1}", "1590940800", "1593532800"))
sum(q("sum:rate{counter}:mysvr.method_name1{tag1=v1}", "1590940800", "1593532800"))
works for my situation; the remaining gap is to multiply by the sample interval, which in my situation is 30 seconds.
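If the expression language in use supports scalar arithmetic on the result of sum (Bosun's does, and the q/avg/diff functions above look like Bosun), that multiplication can be written directly into the expression; 30 here is my sample interval in seconds:
sum(q("sum:rate{counter}:mysvr.method_name1{tag1=v1}", "1590940800", "1593532800")) * 30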
I have an instant power consumption stored in OpenTSDB
I would like to compute the daily power consumption, which is the integral of the instant power consumption. This could have been done using an average if all sampling intervals were identical, which is not the case...
The formula would be something like this:
Daily consumption = Sum of ( (Delta T) * InstantPower ) / 24
Delta T = time between the current power consumption sample and the previous sample
InstantPower = current power consumption sample
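To make the intended computation concrete, here is a minimal client-side sketch in C++ of what I mean, assuming the samples have already been fetched as (timestamp, power) pairs; all names are made up for the example, and the units follow whatever convention Delta T and InstantPower are expressed in:

#include <cstddef>
#include <utility>
#include <vector>

// One fetched data point: (timestamp, instant power sample).
using Sample = std::pair<double, double>;

// Literal transcription of the formula above:
//   Daily consumption = Sum of ( (Delta T) * InstantPower ) / 24
// where Delta T is the time between the current sample and the previous one.
double dailyConsumption(const std::vector<Sample>& samples)
{
    double sum = 0.0;
    for (std::size_t i = 1; i < samples.size(); ++i)
    {
        double deltaT = samples[i].first - samples[i - 1].first;  // Delta T
        sum += deltaT * samples[i].second;                        // (Delta T) * InstantPower
    }
    return sum / 24.0;
}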
Is it possible to do it through OpenTSDB (or Grafana)?
This is not possible using OpenTSDB. You are limited to the aggregation functions listed in the documentation.
Grafana does not add metric manipulation ability to OpenTSDB.
When working with time series, it is actually recommended to rather submit data as the integral (i.e. a monotonically increasing counter). OpenTSDB can then "differentiate" this using the rate function.
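For example, if the device reported a cumulative energy counter rather than instant power, a query along the lines of the rate syntax used elsewhere on this page would give back the consumption rate (the metric and tag names here are purely illustrative):
sum:rate{counter}:power.energy_total{meter=main}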
I'm writing a bank simulation program and I'm trying to find that percentage so I know how often to generate a new person, based on a timer that executes code every second. Sorry if it sounds kind of confusing, but I appreciate any help!
If you need to generate a new person entity every 2-6 seconds, why not generate a random number between 2 and 6, and set the timer to wait that amount of time. When the timer expires, generate the new customer.
However, if you really want the equivalent probability, you can get it by asking what it represents: the stochastic experiment is "at any given second of the clock, what is the probability of a client entering, such that it results in one client every 2-6 seconds?". Pick a specific incidence: say one client every 2 seconds. If on average you get 1 client every 2 seconds, then clearly the probability of getting a client at any given second is 1/2. If on average you get 1 client every 6 seconds, the probability of getting a client at any given second is 1/6.
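A minimal sketch of that per-second check in C++ (the function name and the use of std::rand are illustrative only; a real simulation would call this from its one-second timer):

#include <cstdlib>

// Decide, once per one-second timer tick, whether a customer arrives,
// given the average number of seconds between arrivals (2 to 6 here).
bool customerArrivesThisSecond(double avgSecondsBetweenArrivals)
{
    double p = 1.0 / avgSecondsBetweenArrivals;   // e.g. 1/2 for "one every 2 seconds"
    double u = std::rand() / (RAND_MAX + 1.0);    // crude uniform draw in [0, 1)
    return u < p;
}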
The Poisson distribution gives the probability of observing k independent events in a period for which the average number of events is λ:
P(k) = λ^k · e^(-λ) / k!
This covers the case of more than one customer arriving at the same time.
The easiest way to generate Poisson distributed random numbers is to repeatedly draw from the exponential distribution, which yields the waiting time for the next event, until the total time exceeds the period.
int k = 0;          // number of arrivals observed in this period
double t = 0.0;     // accumulated waiting time so far
while (t < period)
{
    // draw an exponentially distributed waiting time until the next arrival
    t += -log(1.0 - rnd()) / lambda;
    if (t < period) ++k;   // the arrival fell inside the period, so count it
}
where rnd returns a uniform random number between 0 and (strictly less than) 1, period is the number of seconds and lambda is the average number of arrivals per second (or, as noted in the previous answer, 1 divided by the average number of seconds between arrivals).
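For completeness, one possible way to provide rnd using C++11 <random> (just a sketch; any generator of uniform doubles in [0, 1) will do):

#include <random>

// Returns a uniform random double in [0, 1).
double rnd()
{
    static std::mt19937 gen{std::random_device{}()};
    static std::uniform_real_distribution<double> dist(0.0, 1.0);
    return dist(gen);
}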
I get the following result when I run a load test. Can anyone help me read the report?
number of threads = '500'
ramp up period = '1'
Sample = '500'
Avg = '20917'
min = '820'
max = '48158'
Std Deviation = '10563.178194669255'
Error % = '0.046'
throughput = '10.375381295262601'
KB/Sec = '247.05023046315702'
Avg. Bytes = '24382.664'
Short explanation looks like:
Sample - number of requests sent
Avg - an Arithmetic mean for all responses (sum of all times / count)
Min - minimal response time (ms)
Max - maximum response time (ms)
Deviation - see Standard Deviation article
Error rate - percentage of failed tests
Throughput - how many requests per second does your server handle. Larger is better.
KB/Sec - self-explanatory
Avg. Bytes - average response size
If you are having trouble interpreting the results, you could try the BM.Sense results analysis service
The JMeter docs say the following:
The summary report creates a table row for each differently named request in your test. This is similar to the Aggregate Report , except that it uses less memory.
The throughput is calculated from the point of view of the sampler target (e.g. the remote server in the case of HTTP samples). JMeter takes into account the total time over which the requests have been generated. If other samplers and timers are in the same thread, these will increase the total time, and therefore reduce the throughput value. So two identical samplers with different names will have half the throughput of two samplers with the same name. It is important to choose the sampler labels correctly to get the best results from the Report.
Label - The label of the sample. If "Include group name in label?" is
selected, then the name of the thread group is added as a prefix.
This allows identical labels from different thread groups to be
collated separately if required.
# Samples - The number of samples with the same label
Average - The average elapsed time of a set of results
Min - The lowest elapsed time for the samples with the same label
Max - The longest elapsed time for the samples with the same label
Std. Dev. - the Standard Deviation of the sample elapsed time
Error % - Percent of requests with errors
Throughput - the Throughput is measured in requests per
second/minute/hour. The time unit is chosen so that the displayed
rate is at least 1.0. When the throughput is saved to a CSV file, it
is expressed in requests/second, i.e. 30.0 requests/minute is saved
as 0.5.
Kb/sec - The throughput measured in Kilobytes per second
Avg. Bytes - average size of the sample response in bytes. (in JMeter
2.2 it wrongly showed the value in kB)
Times are in milliseconds.
A JMeter Test Plan must have a listener to showcase the results of the performance test execution.
Listeners capture the responses coming back from the server while JMeter runs and showcase them in the form of trees, tables, graphs and log files.
They also allow you to save the results to a file for future reference. There are many types of listeners JMeter provides. Some of them are: Summary Report, Aggregate Report, Aggregate Graph, View Results Tree, View Results in Table, etc.
Here is a detailed explanation of each parameter in the Summary Report.
(The values referenced below come from an example Summary Report screenshot that is not reproduced here.)
Label: It is the name/URL for the specific HTTP(s) Request. If you have selected “Include group name in label?” option then the name of the Thread Group is applied as the prefix to each label.
Samples: This indicates the total number of samples (requests) sent to the server for this label.
Average: It is the average time taken by all the samples for a specific label. In our case, the average time for Label 1 is 942 milliseconds and the total average time is 584 milliseconds.
Min: The shortest time taken by a sample for a specific label. If we look at the Min value for Label 1, the shortest response time among its 20 samples was 584 milliseconds.
Max: The longest time taken by a sample for a specific label. If we look at the Max value for Label 1, the longest response time among its 20 samples was 2867 milliseconds.
Std. Dev.: This shows how much the sample response times deviate from the average response time. The smaller this value, the more consistent the data. As a rule of thumb, the standard deviation should be less than or equal to half of the average time for a label.
Error%: Percentage of Failed requests per Label.
Throughput: Throughput is the number of requests processed by the server per unit of time (seconds, minutes, hours). This time is calculated from the start of the first sample to the end of the last sample. Larger throughput is better.
KB/Sec: This indicates the amount of data downloaded from the server during the performance test execution. In short, it is the throughput measured in kilobytes per second.
For more information:
http://www.testingjournals.com/understand-summary-report-jmeter/
Sample: Number of requests sent.
The Throughput: is the number of requests per unit of time (seconds, minutes, hours) that are sent to your server during the test.
The Response time: is the elapsed time from the moment when a given request is sent to the server until the moment when the last bit of information has returned to the client.
The throughput is the real load processed by your server during a run but it does not tell you anything about the performance of your server during this same run. This is the reason why you need both measures in order to get a real idea about your server’s performance during a run. The response time tells you how fast your server is handling a given load.
Average: This is the average (arithmetic mean, μ = (1/n) · Σ(i=1…n) xi) response time of your total samples.
Min and Max are the minimum and maximum response time.
An important thing to understand is that the mean value can be very misleading, as it does not show you how close (or far) your values are from the average. For this purpose we need the Deviation value, since the Average can be the same for very different sets of sample response times!!
Deviation: The standard deviation (σ) measures the mean distance of the values from their average (μ). It gives you a good idea of the dispersion or variability of the measures around their mean value.
The following equation shows how the standard deviation (σ) is calculated:
σ = √( (1/n) · Σ(i=1…n) (xi - μ)² )
For Details, see here!!
So, if the deviation value is low compared to the mean value, it indicates that your measures are not dispersed (they are mostly close to the mean value) and that the mean value is significant.
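For instance (illustrative numbers): two samples of 400 ms and 600 ms have μ = 500 ms and σ = 100 ms, so the mean is quite representative; two samples of 100 ms and 900 ms also have μ = 500 ms but σ = 400 ms, so the same average hides far more variability.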
Kb/sec: The throughput measured in Kilobytes per second.
Error % : Percent of requests with errors.
An example is always better to understand!!! I think, this article will help you.
There are lots of explanations of the JMeter Summary Report out there; I have been using this tool for quite some time to generate performance testing reports with relevant data. The explanation available at the link below comes right from field experience:
Jmeter:Understanding Summary Report
This is one of the most useful reports generated by JMeter for understanding load test results.
# Label: Name of the HTTP sample request sent to the server.
# Samples: This captures the total number of samples sent to the server. Suppose you put a Loop Controller around this particular request to run it 5 times, the Loop Count in the Thread Group is set to 2 iterations, and the load test is run for 100 users; then the count displayed here will be 1 * 5 * 2 * 100 = 1000. Total = total number of samples sent to the server during the entire run.
# Average: The average response time for a particular HTTP request, in milliseconds; here it is averaged over the 5 loops and 2 iterations for 100 users. Total = the average of all the sample averages: add up the averages for all samples and divide by the number of samples.
# Min: Minimum time spent by the sample requests sent for this label.
The total equals the minimum time across all samples.
# Max: Maximum time spent by the sample requests sent for this label.
The total equals the maximum time across all samples.
# Std. Dev.: Knowing the standard deviation of your data set tells you how densely the data points are clustered around the mean. The smaller the standard deviation, the more consistent the data. The standard deviation should be less than or equal to half of the average time for a label; if it is more than that, something is wrong and you need to figure out the problem and fix it.
https://en.wikipedia.org/wiki/Standard_deviation
The total equals the highest deviation across all samples.
# Error: Total percentage of errors found for a particular sample request. 0.0% shows that all requests completed successfully.
The total equals the percentage of error samples among all samples (Total Samples).
# Throughput: Hits/sec, or the total number of requests sent to the server per unit of time (sec, min, hr) during the test.
endTime = lastSampleStartTime + lastSampleLoadTime
startTime = firstSampleStartTime
conversion = unit time conversion value
Throughput = Numrequests / ((endTime - startTime)*conversion)
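As an illustrative calculation with numbers similar to the question's: if 500 requests were generated over roughly 48.2 seconds (endTime - startTime = 48200 ms, conversion = 1/1000 for ms to seconds), then Throughput = 500 / 48.2 ≈ 10.4 requests/second, close to the throughput value shown in the question's output above.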
# KB/sec: The throughput measured in kilobytes per second.
# Avg. Bytes: Average of the total bytes of data downloaded from the server.
The total is the average bytes across all samples.
We have an etsy/statsd node application running that flushes stats to carbon/whisper every 10 seconds. If you send 100 increments (counts) in the first 10 seconds, graphite displays them properly, like:
localhost:3000/render?from=-20min&target=stats_counts.test.count&format=json
[{"target": "stats_counts.test.count", "datapoints": [
[0.0, 1372951380], [0.0, 1372951440], ...
[0.0, 1372952460], [100.0, 1372952520]]}]
However, 10 seconds later this number falls to 0, null, or 33.3. Eventually it settles at a value 1/6th of the initial number of increments, in this case 16.6.
/opt/graphite/conf/storage-schemas.conf is:
[sixty_secs_for_1_days_then_15m_for_a_month]
pattern = .*
retentions = 10s:10m,1m:1d,15m:30d
I would like to get accurate counts. Is graphite averaging the data over the 60-second windows rather than summing it, perhaps? Using the integral function after some time has passed obviously gives:
localhost:3000/render?from=-20min&target=integral(stats_counts.test.count)&format=json
[{"target": "stats_counts.test.count", "datapoints": [
[0.0, 1372951380], [16.6, 1372951440], ...
[16.6, 1372952460], [16.6, 1372952520]]}]
Graphite data storage
Graphite manages the retention of data using a combination of the settings stored in storage-schemas.conf and storage-aggregation.conf. I see that your retention policy (the snippet from your storage-schemas.conf) is telling Graphite to store 1 data point per 10-second interval at its highest resolution (10s:10m), and that it should manage the aggregation of those data points as the data ages and moves into the older intervals (with the lower resolutions defined, e.g. 1m:1d). In your case the data crosses into the next retention interval at 10 minutes, and after 10 minutes the data will roll up according to the settings in storage-aggregation.conf.
Aggregation / Downsampling
Aggregation/downsampling happens when data ages and falls into a time interval that has a lower-resolution retention specified. In your case, you'll have been storing 1 data point for each 10-second interval, but once that data is over 10 minutes old Graphite will store it as 1 data point for a 1-minute interval. This means you must tell Graphite how it should take the 10-second data points (of which you have 6 per minute) and aggregate them into 1 data point for the entire minute. Should it average? Should it sum? Depending on the type of data (e.g. timing, counter) this can make a big difference, as you hinted at in your post.
By default graphite will average data as it aggregates into lower resolution data. Using average to perform the aggregation makes sense when applied to timer (and even gauge) data. That said, you are dealing with counters so you'll want to sum.
For example, in storage-aggregation.conf:
[count]
pattern = \.count$
xFilesFactor = 0
aggregationMethod = sum
UI (and raw data) aggregation / downsampling
It is also important to understand how the aggregated/downsampled data is represented when viewing a graph or looking at raw (json) data for different time periods, as the data retention schema thresholds directly impact the graphs. In your case you are querying render?from=-20min which crosses your 10s:10m boundary.
Graphite will display (and perform realtime downsampling of) data according to the lowest-resolution precision defined. Stated another way, it means if you graph data that spans one or more retention intervals you will get rollups accordingly. An example will help (assuming the retention of: retentions = 10s:10m,1m:1d,15m:30d)
Any graph with data no older than the last 10 minutes will be displaying 10 second aggregations. When you cross the 10 minute threshold, you will begin seeing 1 minute worth of count data rolled up according to the policy set in the storage-aggregation.conf.
Summary / tldr;
Because you are graphing/querying for 20 minutes worth of data (e.g. render?from=-20min) you are definitely falling into a lower precision storage setting (i.e. 10s:10m,1m:1d,15m:30d) which means that aggregation is occurring according to your aggregation policy. You should confirm that you are using sum for the correct pattern in the storage-aggregation.conf file. Additionally, you can shorten the graph/query time range to less than 10min which would avoid the dynamic rollup.
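For instance, keeping the query window inside the highest-resolution retention (a variation on the query from the question) avoids the roll-up entirely:
localhost:3000/render?from=-9min&target=stats_counts.test.count&format=json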