I'm trying to graph average response time from HTTP logs. When I go to Visualize and try either a bar or line graph, any time I select an Aggregation type other than Count (i.e. Average, Sum, Max, etc.), I never get any values in the Field drop-down. I believe that the X-Axis should/could just be a Date Histogram.
My query looks like this: "host:'hostname' AND file:'access.log'", which generates a ton of results as a Count, but again, I can't seem to figure out how to graph any other trend over time (outside of a count). I can confirm all my fields are being indexed.
Thanks.
The issue in this case came down to mappings: the fields were all being interpreted as strings, which makes it impossible to do any of the other number-related Aggregations. The only way I found this out was via a tweet to me from Rashid (the lead dev of Kibana).
Essentially, as documented in the grok docs, I needed to define the mapping type:
%{NUMBER:request_time}
Became:
%{NUMBER:request_time:float}
After re-indexing and re-mapping, my fields now map to the right type, and I can do number-based aggregations.
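To illustrate why the type matters, here is a minimal Python sketch. It is not Logstash or Elasticsearch code; the sample values and the function name are hypothetical. Values captured as strings cannot be averaged until they are cast to a number, which is roughly what the :float suffix asks grok to do at parse time.

```python
# Hypothetical request_time values as captured without a type suffix (all strings).
raw_request_times = ["0.120", "0.340", "0.560"]


def average_request_time(values):
    """Cast each captured value to float, then average."""
    numbers = [float(v) for v in values]
    return sum(numbers) / len(numbers)


# sum(raw_request_times) would raise TypeError, because strings are not numeric.
print(average_request_time(raw_request_times))
```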
I'm logging some custom metrics in Application insights using the TelemetryClient.TrackMetric method in .NET, and I've noticed that occasionally some of the events are duplicated when I view them in the Azure portal.
I've drilled into the data, and the duplicate events have the same itemId and timestamp, but if I show the ingestion time by adding | extend ingestionTime = ingestion_time() to the query then I can see that the ingestion times are different.
This GitHub issue indicates that this behavior is expected, as AI uses at-least-once delivery.
I plot these metrics in charts in the Azure portal using a sum aggregation, however these duplicates are creating trust issues with the charts as the duplicates are simply treated as two separate events.
Is there a way to de-dupe the events based on itemId before plotting the data in the Azure portal?
Update
A more specific example:
I'm running an algorithm, triggered by an event, which results in a reward. The algorithm may be triggered several dozen times a day, and the reward is a positive or negative floating point value. It logs the reward each time to Application Insights as a custom metric (called say custom-reward), along with some additional properties for data splitting.
In the Azure portal I'm creating a simple chart by going to Application Insights -> Metrics and customising the chart. I select my custom-reward metric in the Metric dropdown, and select Sum as the aggregation. I may or may not apply splitting. I save the chart to my dashboard.
This simple chart gives me a nice way of monitoring the system to make sure nothing unexpected is happening, and the Sum value in the bottom left of the chart allows me to quickly see whether the sum of the rewards is positive or negative over the chart's range, and by how much.
However, on occasion I've been surprised by the result (say over the last 12 hours the sum of the rewards was surprisingly negative), and on closer inspection I discovered that a few large negative results have been duplicated. Further investigation shows this has been happening with other events, but with smaller results I tend not to notice.
I'm not that familiar with the advanced querying bit of Application Insights, I actually just used it for the first time today to dig into the events. But it does sound like there might be something I can do there to create a query that I can then plot, with the results deduped?
Update 2
I've managed to make progress with this thanks to the tips by @JohnGardner, so I'll mark that as the answer. I've deduped and plotted the results by adding the following line to the query:
| summarize timestamp=any(timestamp), value=any(value), name=any(name), customDimensions=any(customDimensions) by itemId
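As a rough illustration of what that summarize line does, the Python sketch below keeps one row per itemId and compares the sum before and after. The row shape and the sample values are hypothetical, and this is only a model of the idea, not how Application Insights evaluates the query.

```python
def dedupe_by_item_id(rows):
    """Keep one row per itemId (an arbitrary one, similar in spirit to any())."""
    unique = {}
    for row in rows:
        # setdefault stores the first row seen for an itemId and ignores repeats.
        unique.setdefault(row["itemId"], row)
    return list(unique.values())


# Hypothetical telemetry rows; "a1" was delivered twice.
rows = [
    {"itemId": "a1", "value": -50.0},
    {"itemId": "a1", "value": -50.0},
    {"itemId": "b2", "value": 12.5},
]

naive_sum = sum(r["value"] for r in rows)
deduped_sum = sum(r["value"] for r in dedupe_by_item_id(rows))
print(naive_sum, deduped_sum)  # -87.5 -37.5
```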
Update 3
Adding the following line to the query allowed me to split on custom data (in this case splitting by algorithm ID):
| extend algorithmId = tostring(customDimensions.["algorithm-id"])
With that line added, when you select "Chart" in the query results, algorithmId now shows up as an option in the split dropdown. After that you can click "Pin to dashboard". You lose the handy "sum over the time period" indicator in the bottom left of the chart which you get via the simple "Metrics" chart, however I'm sure I'll be able to recreate that in other ways.
if you are doing your own queries, you would generally be using something like summarize or makeseries to do this deduping for a chart. you wouldn't generally plot individual items unless you are looking at a very small time range?
so instead of something like
summarize count() ...
you could do
summarize dcount(itemId) ...
or you might add a "fake" summarize to a query that didn't need it before with by itemId to coalesce multiple rows into just one, using any(x) to grab any individual row's value for each column for each itemId.
but it really depends on what you are doing in your specific query. if you were using something like sum(itemCount) to also deal with sampling, you have other odd cases now, where the at-least-once delivery might have duplicated sampled items? (updating your question to add a specific query and hypothetical result would possibly lead to a more specific answer).
I want to calculate and display the average scroll depth in Data Studio from analytics.
I’m looking to get an average scroll depth in Studio. I’ve got the 10%, 25%, etc. scroll depth data coming in, but I now need to be able to calculate the average scroll % from this data.
To calculate the average scroll depth:
Multiply each scroll threshold by its number of events and add the results: (10x500) + (20x400) + (30x475) + (40x300) + (50x200) + (60x100) + (70x75) + (80x60) + (90x20) + (100x10)
Then divide that total by the total number of events: 500 + 400 + 475... etc
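The calculation described above is a weighted average. As a sanity check outside Data Studio, here is a small Python sketch using the figures from the question; the variable and function names are made up for illustration.

```python
# Scroll threshold (%) -> number of events, using the figures from the question.
events_by_threshold = {
    10: 500, 20: 400, 30: 475, 40: 300, 50: 200,
    60: 100, 70: 75, 80: 60, 90: 20, 100: 10,
}


def average_scroll_depth(counts):
    """Weighted average: sum(threshold * events) / sum(events)."""
    weighted_total = sum(threshold * n for threshold, n in counts.items())
    total_events = sum(counts.values())
    return weighted_total / total_events


print(round(average_scroll_depth(events_by_threshold), 2))  # 31.82
```

With these numbers the weighted total is 68100 over 2140 events, so the average scroll depth comes out at about 31.8%.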
Because I can’t reference cells in Studio I can’t get it to work. I’ve also tried Google Sheets, which does work for the calculation, but then I can’t use Data Studio's filter to select a specific page path.
I'm thinking that perhaps the calculation will need to be done at data source, but I am not sure how to reference a 'cell'?
Data Studio doesn't work based on a concept of "cells", it works based on a concept of "fields"—which are basically properties of the data source. Similarly, you don't have "formulas" per se, but rather "calculated fields". These fields can be created either at the chart-level (single-use, but doesn't require permissions to modify the data source) or in the data source (reusable across many charts, requires permissions to modify the data source). Most fields also have an aggregation type, which tells the report how to aggregate it in charts by default (e.g. Sum or Average).
When you either edit your data source and hit "Add Field" or use the option with the same name under the "Add metric" or "Add dimension" menus on a chart, you'll be presented with a box to input the formula. To access a field, just type its name (or if you're in the data source, select it from the list on the left). The editor will also typically give you an auto-complete list below your cursor based on what you're typing. Once your entry matches a field, it will get a highlight box around it (the color is based on the type; green = dimension/string, blue = metric/number). The functions available are sort of a mash-up of something between what you'd expect in Google Sheets and in a SQL query, but with more constraints on when you're allowed to use certain functions.
The documentation for calculated fields is pretty simple, so I'd recommend starting there before you try to do too much heavy-lifting in Data Studio. Because of constraints in Data Studio's data model, you'll often find that you need to create separate calculated fields for different parts of the formula, and then combine them in a new calculated field. I'll warn you that the error messages in the field editor aren't super helpful sometimes, so you may need to re-read the documentation for the functions and field types you're working with to ensure you get a valid result.
If you're running into problems, it may help to include the field names and values that you need in your calculation, as well as the source of the data (are these GA events?). The more details you give, including what you've already tried, the more helpful we can be. Also, make sure to read the docs first so you have a good handle on the product you're using and the terminology the community is most likely to understand.
I have been playing with Grafana for a while, and got stuck with the following issue.
I need to produce a chart that displays the round trip time for some REST calls my application is doing.
I am using Dropwizard metrics and starting a PausableTimer when I fire the call, and stop it when the expected response comes back. From Grafana, then, I can see the data source I need but when I build the graph, I am forced to apply some statistical calculation on my data (stddev, mean, max, min, etc.) whereas all I need is to chart the numbers from the series.
Below is a screenshot of the config part I am referring to:
The dropdown contains a number of transformations to apply to the original data, and I cannot find a way to tell Grafana that I want to display the data the way it is.
I am not really sure how to do this. Has anyone had this issue before -- and solved it?
Thanks in advance.
Is there a way to simply show the change of a value over the selected time period? All I'm interested in is the offset of the last value compared to the initial one. The values can vary above and below these over the time period, it's not really relevant (and would be exceptions in my case).
For an initial value of 100 and a final value of 105, I'd expect a single stat box displaying 5%.
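Spelled out in Python, the figure being asked for is just the last value relative to the first, ignoring everything in between. This is a sketch of the arithmetic only, not a Grafana or Graphite feature; the function name is hypothetical, and skipping None values is an assumption about how gaps in a series might be represented.

```python
def percent_change(series):
    """Percentage offset of the last value relative to the first one."""
    values = [v for v in series if v is not None]  # assumed: gaps appear as None
    if len(values) < 2 or values[0] == 0:
        return None  # not enough data, or division by zero
    first, last = values[0], values[-1]
    return (last - first) * 100 / first


# Intermediate values are ignored; only the endpoints matter.
print(percent_change([100, 103, 98, 105]))  # 5.0
```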
I have the feeling I'm missing something obvious, but I can't find a way to accomplish this deceptively simple task.
Edit:
I'm trying to create a scripted Grafana dashboard that will automatically populate disk consumption growth for all our various volumes. The data is already in Graphite, but for purposes of capacity management and finance planning (which projects/departments get billed) it would be helpful for managers to have a simple and coarse overview of which volumes grow outside expected parameters.
The idea was to create a list of single-stat values with color coding that could easily be scrolled through to find abnormalities. Disk usage would obviously never be negative, but volatility in usage between the start and end of the time period would be lost in this view. That's not a big concern for us as this is all shared storage and such usage is expected to a certain degree.
The perfect solution would be to have the calculations change dynamically based on the selected time period.
I'm thinking that this is not really possible (at least not easily) to do with just Graphite and Grafana and have started looking for alternative methods. We might have to implement a different reporting system for this purpose.
Edit 2
I've tried implementing the suggested solution from Leonid, and it works after a fashion. The calculations seem somewhat off from what I expected though.
My test dashboard looks as follows:
If I were to calculate the change manually, I'd end up with roughly a 24% change between the start (7.23) and end (8.96) values. Graphite calculates this as 19%. There's probably a reason for the discrepancy, possibly something to do with it being a time series and not discrete values?
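For reference, here is the manual arithmetic in Python, reading the start value as 7.23. It only checks the two quoted numbers and does not establish what Graphite actually computed. As an unconfirmed observation, dividing the same difference by the end value instead of the start value gives a figure close to the reported 19%.

```python
start, end = 7.23, 8.96

# Change expressed relative to the start value (the manual calculation).
change_vs_start = (end - start) / start * 100

# The same difference relative to the end value, shown only for comparison.
change_vs_end = (end - start) / end * 100

print(round(change_vs_start, 1), round(change_vs_end, 1))  # 23.9 19.3
```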
As a sidenote: The example is only 30 days, even though the most interesting number would be a year. We don't have quite a year of data in Graphite yet and having a 30 day view is also interesting. It seems I have to implement several dashboards with static times.
You certainly can do that for some fixed period. For example, the following query takes the absolute difference between the current metric value and the value the metric had one minute ago (i.e. the initial value), and then calculates its percentage of the initial value.
asPercent(absolute(diffSeries(my_metric, timeShift(my_metric, '1m'))), timeShift(my_metric, '1m'))
I believe you can't do that for the time period selected in the Grafana picker.
But is that really what you need? It seems strange because, as you said, the value can change in both directions. Maybe standard deviation would be more suitable for you? It's available in Graphite as the stdev function.
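To make the shape of the Graphite expression above easier to follow, here is a rough Python model of it applied to a plain list of values. It assumes evenly spaced points and a shift expressed as a number of points rather than a time string; the function name is hypothetical and this is not how Graphite is implemented.

```python
def percent_diff_from_shifted(values, shift):
    """For each point: |current - value `shift` points earlier| as a % of the earlier value."""
    result = []
    for i, current in enumerate(values):
        if i < shift or values[i - shift] == 0:
            # No earlier point to compare with, or it is zero: leave a gap.
            result.append(None)
            continue
        earlier = values[i - shift]
        result.append(abs(current - earlier) * 100 / earlier)
    return result


# With a shift of one point, each value is compared with its predecessor.
print(percent_diff_from_shifted([100, 102, 105], shift=1))
```

Note that the output is itself a series, one percentage per point, which may be part of why a single-stat panel shows something other than a simple start-to-end comparison.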
I've added the suggested metrics filter from http://logstash.net/docs/1.4.2/filters/metrics and can now see the metrics coming through in kibana, so for example I have the following fields:
http.200.count
http.201.count
http.304.count
http.404.count
along with associated rates (i.e. http.200.rate_1m).
I can create a graph if I add the various rates manually on the Y-axis, but that means I need to know the various responses upfront (not difficult in this example I guess). Is there any way to tell Kibana to graph the various fields as separate lines?
I believe what you want is a "Split Lines" aggregation. If you have a field that you can use to distinguish the data, then you can use a "Terms" aggregation on that field and Kibana will graph a separate line for each unique value found in that field.
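Conceptually, splitting lines on a Terms aggregation groups the events by the distinct values of one field and draws one series per group. The Python sketch below models that grouping only. The event shape (a status field plus a rate_1m value) is an assumption for illustration; the metrics filter output described in the question uses one field per response code, so the data would need a single distinguishing field for this to apply.

```python
from collections import defaultdict


def split_by_term(events, field):
    """Group (timestamp, rate) points into one series per distinct value of `field`."""
    series = defaultdict(list)
    for event in events:
        series[event[field]].append((event["timestamp"], event["rate_1m"]))
    return dict(series)


# Hypothetical events that carry the response code in a single "status" field.
events = [
    {"timestamp": 1, "status": "200", "rate_1m": 12.0},
    {"timestamp": 1, "status": "404", "rate_1m": 0.5},
    {"timestamp": 2, "status": "200", "rate_1m": 11.0},
]

lines = split_by_term(events, "status")
print(sorted(lines))  # ['200', '404']
```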