Apache response time on custom chart on Stackdriver dashboard - stackdriver

We would like a line chart (or distribution) in our custom Stackdriver dashboard showing the response time reported by the Apache logs: a simple line chart of response time (or latency, as others call it) coming from structured logs.
We set up the logging agent and added structured logs to fluentd; as the image shows, the entries appear on the log screen.
In the httpRequest we have the latency:
httpRequest: {
latency: "0.081215s"
referer: "-"
requestMethod: "GET"
requestUrl: "/v1/call/match-in?q=spdif&fields=&limit=200&ra..."
responseSize: "636"
serverIp: "3.89.69.139"
status: 200
userAgent: "HTTPClient/1.0 (2.8.3, ruby 2.2.3 (2015-08-18))"
}
We tried creating a custom log-based metric by picking that field and setting the extraction expression to ([0-9.]+)s to strip the trailing 's'. Our values are around 79 ms (e.g. 0.079800s); we are not sure whether we need to modify the bucket configuration.
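As a sanity check on the extraction expression itself, here is a minimal Python sketch (the sample strings are made up, in the same format as the log above); sub-second values are captured fine by ([0-9.]+)s:
import re

# the same expression used for the log-based metric: capture the number, drop the trailing 's'
LATENCY_RE = re.compile(r'([0-9.]+)s')

# made-up sample values in the same format as httpRequest.latency above
for raw in ["0.081215s", "0.079800s", "1.204511s"]:
    m = LATENCY_RE.search(raw)
    if m:
        print(raw, "->", float(m.group(1)) * 1000, "ms")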
But the chart is coming back empty:
How do we plot it as a line chart on a custom dashboard in Stackdriver?
Is there a way to plot an extracted time field from logs on the screen?
Update 1:
How can we actually check whether the metric we created has too many time series? The Troubleshooting section lists three cases, but how do we verify them?
Update 2:
We found that this doesn't work with values below 1. We had a chart with sub-second numbers, and it appears that while the value is between 0 and 1 nothing is plotted.
The image below clearly shows that while the latency was below 1 it didn't plot on the distribution chart, and once it went above 1 it was plotted on the right-hand chart.
Based on these findings, what is the real workaround? What did we miss?
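Since the bucket configuration was already a question mark above, one thing worth trying is explicit buckets that cover the sub-second range. Whether that is the actual cause of the empty chart is an assumption, but roughly, the metric body passed to the Logging API's projects.metrics.create could look like this (shown as a Python dict; the name, filter, and bucket bounds are placeholders):
# A sketch only: a log-based distribution metric with explicit buckets covering
# sub-second latencies. Field names follow the Cloud Logging LogMetric resource.
log_metric = {
    "name": "apache_response_time",
    "filter": 'resource.type="gce_instance"',  # placeholder; use your real log filter
    "valueExtractor": 'REGEXP_EXTRACT(httpRequest.latency, "([0-9.]+)s")',
    "metricDescriptor": {
        "metricKind": "DELTA",
        "valueType": "DISTRIBUTION",
    },
    "bucketOptions": {
        "explicitBuckets": {
            # finite buckets from 10 ms up to 10 s, so a value around 0.08 s gets its own bucket
            "bounds": [0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10],
        }
    },
}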
Here is our config:

Related

Mobile data reported in GA Measurement Protocol appear in realtime but not in daily summary

I've been attempting to log activity on a mobile-like device using the Google Analytics Measurement Protocol. All of these attempts have validated using the validation URL, and I can see activity when I look at the real-time reports on the Analytics website. But when I look at the Home or Overview reports for the day - no activity is shown.
The view is set for "All Mobile App Data".
The POST body looks something like this:
v=1&tid=UA-000000000-1&ds=app&qt=1601&uid=uid-zzzzz&t=screenview&cd=Foo&an=Foo%20App%20Name&aid=com.example.foo&aiid=com.example.foo&av=0.0.1&ua=Mozilla%2F5.0%20(Linux%3B%20Android%207.0%3B%20SM-G930V%20Build%2FNRD90M)%20AppleWebKit%2F537.36%20(KHTML%2C%20like%20Gecko)%20Chrome%2F59.0.3071.125%20Mobile%20Safari%2F537.36
The ua field is just a pre-defined string. I found that if I omitted it, the Real Time monitoring listed the hits as desktop hits, although I was in a Mobile report and the ds field was "app".
Am I missing a field that is required? Is there some reason why it is showing up in the real-time report, but not in a daily report? Is there some other way to diagnose why the data is vanishing, or confirm the data is actually being captured?
When I check the debug endpoint, the hit is valid.
Request:
https://www.google-analytics.com/debug/collect?v=1&tid=UA-XXX-1&ds=app&qt=1601&uid=uid-zzzzz&t=screenview&cd=Foo&an=Foo%20App%20Name&aid=com.example.foo&aiid=com.example.foo&av=0.0.1&ua=Mozilla%2F5.0%20(Linux%3B%20Android%207.0%3B%20SM-G930V%20Build%2FNRD90M)%20AppleWebKit%2F537.36%20(KHTML%2C%20like%20Gecko)%20Chrome%2F59.0.3071.125%20Mobile%20Safari%2F537.36
Response
{
"hitParsingResult": [ {
"valid": true,
"parserMessage": [ ],
"hit": "/debug/collect?v=1\u0026tid=UA-53766825-1\u0026ds=app\u0026qt=1601\u0026uid=uid-zzzzz\u0026t=screenview\u0026cd=Foo\u0026an=Foo%20App%20Name\u0026aid=com.example.foo\u0026aiid=com.example.foo\u0026av=0.0.1\u0026ua=Mozilla%2F5.0%20(Linux%3B%20Android%207.0%3B%20SM-G930V%20Build%2FNRD90M)%20AppleWebKit%2F537.36%20(KHTML%2C%20like%20Gecko)%20Chrome%2F59.0.3071.125%20Mobile%20Safari%2F537.36"
} ],
"parserMessage": [ {
"messageType": "INFO",
"description": "Found 1 hit in the request."
} ]
}
I cannot use one of the mobile libraries from Firebase - this is not one of the platforms they support. I do not wish to pretend this is a web page - there is no associated hostname or path. I do not wish to use Events since I can't do event Behavior Flow, which is one of the things I'm interested in seeing.
I'm aware that it can sometimes take "a day or so" for results to first appear. The site was set up over five days ago at this point, and has received data during that time.
Good thought about the anti-spam setting, however the setting appears to be correct:
I've also tried using GET instead of POST - no change, it still shows the hit in real-time, but then it vanishes.
However, I know that it can record hits permanently. There were two hits from a spammer in Russia that have shown up in the daily report (I wasn't there to see it show up in real-time). I don't know what they did, but would love to find out since it might help figure out how I can add a record.
In the real-time reports, it correctly points out the data center all the hits are coming from. Perhaps that is filtering it out somewhere out of my control?
Try adding cid. I know the documentation says this is an optional parameter, but for mobile accounts I believe it may be required.
Client ID
Optional.
This field is required if User ID (uid) is not specified in the request. This anonymously identifies a particular user, device, or browser instance. For the web, this is generally stored as a first-party cookie with a two-year expiration. For mobile apps, this is randomly generated for each particular instance of an application install. The value of this field should be a random UUID (version 4) as described in http://www.ietf.org/rfc/rfc4122.txt.
Example value: 35009a79-1a05-49d7-b876-2b884d0f825b
Although this says it needs to be a UUIDv4, it does work with other UUIDs (I've tested it with a v5, which is a hash against the value used for the uid parameter).
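If you want to test that quickly, here is a minimal sketch (Python with the requests library; the tid, uid, and app fields are the placeholder values from the question) that adds a freshly generated version-4 cid to the same screenview hit:
import uuid
import requests

# placeholder values copied from the question; tid must be your real property ID
payload = {
    "v": "1",
    "tid": "UA-000000000-1",
    "ds": "app",
    "cid": str(uuid.uuid4()),   # random version-4 UUID client ID, per the docs quoted above
    "uid": "uid-zzzzz",
    "t": "screenview",
    "cd": "Foo",
    "an": "Foo App Name",
    "aid": "com.example.foo",
    "aiid": "com.example.foo",
    "av": "0.0.1",
}

# POST to the normal collect endpoint; swap in /debug/collect to validate the hit first
resp = requests.post("https://www.google-analytics.com/collect", data=payload)
print(resp.status_code)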

Graphite Derivative shows no data

Using graphite/Grafana to record the sizes of all collections in a mongodb instance. I wrote a simple (WIP) python script to do so:
#!/usr/bin/python
from pymongo import MongoClient
import socket
import time
statsd_ip = '127.0.0.1'
statsd_port = 8125
# create a udp socket
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
client = MongoClient(host='12.34.56.78', port=12345)
db = client.my_DB
# get collection list each runtime
collections = db.collection_names()
sizes = {}
# main
while True:
    # get collection size per name
    for collection in collections:
        sizes[collection] = db.command('collstats', collection)['size']
    # write to statsd
    for size in sizes:
        MESSAGE = "collection_%s:%d|c" % (size, sizes[size])
        sock.sendto(MESSAGE.encode(), (statsd_ip, statsd_port))
    time.sleep(60)
This properly shows all of my collection sizes in Grafana. However, I want to get a rate of change on these sizes, so I built the following graphite query in Grafana:
derivative(statsd.myHost.collection_myCollection)
And the graph shows up totally blank. Any ideas?
FOLLOW-UP: When selecting a time range greater than 24h, all data similarly disappears from the graph. Can't for the life of me figure out that one.
Update: This was due to the fact that my collectd was configured to send samples every second. The statsd plugin for collectd, however, was receiving data every 60 seconds, so I ended up with None for most data points.
I discovered this by checking the raw data in Graphite by appending &format=raw to the end of a graphite-api query in a browser, which gives you the value of each data point as a comma-separated list.
The temporary fix for this was to surround the graphite query with keepLastValue(60). This however creates a stair-step graph, as the value for each None (60 values) becomes the last valid value within 60 steps. Graphing a derivative of this then becomes a widely spaced sawtooth graph.
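To illustrate both points (a sketch only; the Graphite host below is hypothetical), the raw datapoints and the keepLastValue-wrapped derivative can be pulled from the render API like this:
import requests

GRAPHITE = "http://graphite.example.com"   # hypothetical graphite-web/graphite-api host
metric = "statsd.myHost.collection_myCollection"

# 1. inspect the raw datapoints (shows the None values between 60 s flushes)
raw = requests.get(
    f"{GRAPHITE}/render",
    params={"target": metric, "from": "-1h", "format": "raw"},
)
print(raw.text)

# 2. the temporary fix: fill the gaps before taking the derivative
target = f"derivative(keepLastValue({metric}, 60))"
fixed = requests.get(
    f"{GRAPHITE}/render",
    params={"target": target, "from": "-1h", "format": "json"},
)
print(fixed.json())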
In order to fix this, I will probably go on to fix the flush interval on collectd or switch to a standalone statsd instance and configure as necessary from there.

Grafana annotations work inconsistently

I am using grafana to pull in graphite events and overlay them on graphs as annotations. This seems to work very inconsistently for me so I was hoping that someone might have an idea as to what I may be doing wrong.
I am able to see all of the events in the graphite dashboard so I know they are available.
When I create the annotation I am using Graphite event tags:
The one above seems to work as expected:
I added a second annotation and this one does not seem to show up at all. When I look at the network console in Chrome, both annotations are being fetched as expected but for some reason the second one is not added to the screen:
First network event (appears on graph):
[{"data": "Fixed issue with metrics not being collected properly for bamboo.", "what": "metrics bug fixed", "when": 1444197389.0, "id": 11, "tags": "bamboo_events"}]
Second network event (does not appear on graph):
[{"data": "Sync graphiteprod-c02 data to graphiteprod-c01", "what": "sync", "when": 1446665626.0, "id": 13, "tags": "testtag"}]
I have tried creating a new dashboard that only has the second annotation defined, and it does not show up there either.
It looks like there might be a discrepancy between the graphite event epoch time and grafana.
For graphite it is returning 2015-11-04 08:33:46 as 1446662026.0
When compared to the current epoch time (1446651804), the graphite event is in the future. The time appears to be about 5 hours in the future; it might be some sort of issue with time zone conversion.
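To double-check the offset, the epoch values quoted above can be converted to UTC directly, e.g. with a couple of lines of Python:
from datetime import datetime, timezone

# timestamps quoted above: the graphite event's 'when' and the current epoch time
for label, ts in [("graphite event", 1446662026.0), ("current time", 1446651804)]:
    print(label, datetime.fromtimestamp(ts, tz=timezone.utc).isoformat())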

How to get Graphite to simply count counters, not time-rate them

I'm using Graphite and Collectd to monitor my server. In particular, I'm using the tail plugin to count failed SSH logins. I'm using a counter for this metric, so I expect to see 1, 2, 3, 0, etc. for data points. However, what I'm seeing is 0.1, 0.2, 0.3, 0, etc. It seems to me like Graphite is providing counts per second. I say this because my retention policy is one data point every 10 seconds for two hours, so 1 failed login per 10 seconds = 0.1 per second. I'm looking at this in a graph, which looks like this:
Furthermore, when I scale out to the next retention level, the numbers get adjusted accordingly: so 1 failed login which was shown as 0.1 is now shown as much less than this: 0.017 or something.
I don't think this is related to the aggregation method used: even the finest data is off. How can I get Graphite to treat this metric as a pure, raw, counter?
Here's my storage-schemas.conf (the retention policy):
[my_server]
pattern = .*
retentions = 10s:2h,1m:2d,30m:400d
Here's my configuration of the collectd tail plugin:
<Plugin "tail">
<File "/var/log/auth.log">
Instance "auth"
<Match>
Regex "sshd[^:]*: Failed password"
DSType "CounterInc"
Type "counter"
Instance "sshd-invalid_user"
</Match>
</File>
</Plugin>
And here's my configuration of the write_graphite plugin (which sends data to graphite):
<Plugin write_graphite>
<Node "my_server_name">
Host "localhost"
Port "2003"
Protocol "tcp"
LogSendErrors true
Prefix "collectd."
#Postfix ""
StoreRates true
AlwaysAppendDS false
EscapeCharacter "_"
</Node>
</Plugin>
I tried setting StoreRates to false for the write_graphite plugin, but this didn't work. It did change the behaviour: when I performed a single failed SSH login, that metric showed as 1. However, it didn't drop back down to 0. When I performed two more failed logins, the metric popped up to 3.
Also of interest: I've also loaded the users plugin, which simply shows the number of users logged in, and it's working great: it shows 1 when I SSH in, 2 when I SSH in again, and back to 1 when I exit one SSH session. This holds for both settings of StoreRates. So it seems like what I want is possible somehow, maybe just not with the tail plugin.
The SSH logins with StoreRates false along with correct behaviour for Users Logged in can be seen in these graphs:
Any ideas? Thanks,
You are asking the system to count the number of events. And this is exactly what it's doing: it's counting the number of failed logins since its startup. Whether you're using StoreRates or not simply changes the way that information is displayed: as a rate or as the raw counter. A counter may never decrease! What you're actually asking for is a counter that resets itself upon reading: count the number of failed logins since the last time collectd checked.
As it happens the ABSOLUTE data source type in rrdtool can be used to achieve this, but that won't help you.
Step back, and think about what you're trying to achieve: the number of failed logins per second seems to me like a perfectly sane metric!
Although swissunix's answer is very helpful, to achieve the behaviour I was looking for, I ended up using Logster instead of Collectd. With Logster, you write the bit of code that parses the file as well as the bit that returns the metric. So although dividing a count by the time is common with Logster, you don't have to do this if you don't want to: there's lots of flexibility.
I've put my parsers here: https://github.com/camlee/logster-parsers
If you set StoreRates to false, in graphite you can apply the derivative function to the ever-increasing counter to get your rate of increase per retention interval, which would match your requirement.
E.g. in your example of reporting 1 failed login, then 2, you saw the values 1 and 3. The derivative is 1 and 2: the failed logins per interval that graphite tracks.
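For example, the target would look something like the line below; the exact metric path is a guess based on the collectd config above (write_graphite Prefix plus host name, tail plugin instance "auth", type "counter", type instance "sshd-invalid_user"), so substitute your host's name as it appears in Graphite:
derivative(collectd.my_host.tail-auth.counter-sshd-invalid_user)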

Google Analytics Realtime Sandbox Environment

I am looking for a way to set up a Google Analytics sandbox environment that will allow me to test out my custom JS code in near real time.
My app will be using custom variables for advanced segmentation, and I would like to test out multiple scenarios quickly, as opposed to setting up a dummy GA account and wait for a whole day to confirm the test.
Thanks
Great question.
For GA, server updates occur every four hours, and after every sixth such update, the entire set is recalculated, which means a 24-hour lag from code change to reliable feedback. This delay also applies to most customizations to the GA Browser (e.g., "custom filters").
So if you are going to use GA as your web metrics system, and you expect to actually rely on those data, then a test rig is essential.
For me, it's useful to group test systems for client-side analytics using two rubrics: (i) complete, self-contained (closed-loop) systems; or (ii) simpler automated data pulls from the production system (by "production system" here I mean GA's system, not the Site whose pages the GA code is tracking).
For the latter, just add this line to each page of your Site that contains the GA tracking code, just below '_trackPageview()':
pageTracker._setLocalRemoteServerMode();
That line will cause a copy of each transaction line to be logged to your server's activity log--so in essence, you get the data captured by GA in real time. That's all you need to do to capture the data; to parse it, you can use, for instance, any of the excellent open source web log analyzers like AWStats, or roll your own.
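If you do roll your own, here is a minimal sketch (assuming an Apache-style access log at a hypothetical path) that pulls out the __utm.gif hits and decodes their query strings:
import urllib.parse

LOG_PATH = "/var/log/apache2/access.log"   # hypothetical location of your server's activity log

with open(LOG_PATH) as log:
    for line in log:
        if "__utm.gif?" not in line:
            continue
        # each __utm.gif request is one logged transaction; grab its query string
        query = line.split("__utm.gif?", 1)[1].split()[0]
        params = urllib.parse.parse_qs(query)
        # e.g. utmdt = page title, utmp = page request (see the parameter list below)
        print(params.get("utmdt"), params.get("utmp"))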
This is simple and reliable--but all it can do is tell you (in real time) "does the analytics code I just implemented on pages served by my production server actually work?"
Usually, that's not good enough--you would rather know if your code will work before it's on your production server. To do that, you need to simulate the production environment and find a way to access in real-time the data GA collects.
This kind of test rig is a little more involved, but still not difficult.
In sum, it requires these steps:
1. host/serve the ga.js and the tracking pixel locally;
2. log the __utm.gif requests (in the GA data flow, each request corresponds to one logged transaction); and
3. parse the headers into some convenient human-readable form.
If you want more detail than that (ie, a step-by-step implementation), here it is:
I. Hosting/Serving the GA Script (& automating updates)
To do that, you can create a small shell script like this one to wget the latest ga.js version into your local directory (replacing the extant version it finds there).
#!/bin/sh
rm /My_Sites/sitename.com/analytics/ga.js
cd /My_Sites/sitename.com/analytics/
wget http://www.google-analytics.com/ga.js
chmod 644 /My_Sites/sitename.com/analytics/ga.js
cd ${OLDPWD}
exit 0;
(Thanks to AskApache.com, which provided the original motivation and config details to do this in a production context.)
II. Create __utm.gif file
This is just a transparent 1x1 pixel gif image, which you will place in your Site directory (it doesn't matter where; it just needs to match the location referenced in your pages).
III. Log the __utm.gif Requests
For a testing protocol in which you are the source of the client-side activity (e.g., you want to verify the cross-browser fidelity of some event-tracking code you've added to a page on your Site, so you automate 5000 clicks on the button you just wired up, serving the page from a dev server set up for this purpose), it's probably simplest to just log the Request Headers, because it's in those headers that the GA script directs the client to gather various data from the DOM, from the location bar (url), and from prior http headers, and append them to a request for a resource on the GA server (__utm.gif, which is just a 1x1 transparent pixel).
For this type of protocol, I use the Firefox addon LiveHTTPHeaders. You install it like any other Firefox addon; a few mouse clicks is all. Next, open it, and click the "Generator" tab. From this window, you can see the actual requests in real time. At the bottom of the window is a 'save' button to store the log. I find it easier to configure LiveHTTPHeaders to log only the __utm.gif requests; to do that, just click the 'Edit' tab and create a simple filter to exclude everything except these particular gif images (using the check boxes and the large text box on the right).
Other kinds of test protocols require you to work from your Server Activity Logs; in that case just add this line to each page of your Site, just below _trackPageview():
pageTracker._setLocalRemoteServerMode();
IV. Parse those logged requests so you can actually read them
So now your log will contain individual transaction lines, each one of which is a string appended to an HTTP Request for the GA tracking pixel. This string is just a concatenation of key-value pairs; each key begins with the letters "utm" (probably for "urchin tracker"). Each of these parameters corresponds to a variable that you see in the GA Dashboard (here's a complete list and description of them). This is all you need to know to build a parser. In more detail:
First, here's a sanitized __utm.gif request (the entries in your LiveHTTPHeaders log):
http://www.google-analytics.com/__utm.gif?utmwv=1&utmn=1669045322&utmcs=UTF-8&utmsr=1280x800&utmsc=24-bit&utmul=en-us&utmje=1&utmfl=10.0%20r45&utmcn=1&utmdt=Position%20Listings%20%7C%20Linden%20Lab&utmhn=lindenlab.hrmdirect.com&utmr=http://lindenlab.com/employment&utmp=/employment/openings.php?sort=da&&utmac=UA-XXXXXX-X&utmcc=__utma%3D87045125.1669045322.1274256051.1274256051.1274256051.1%3B%2B__utmb%3D87045125%3B%2B__utmc%3D87045125%3B%2B__utmz%3D87045125.1274256051.1.1.utmccn%3D(referral)%7Cutmcsr%3Dlindenlab.com%7Cutmcct%3D%2Femployment%7Cutmcmd%3Dreferral%3B%2B
This is my parser (in Python):
# regular expression module imported
import re
pattern = r'\&{1,2}'
pat_obj = re.compile(pattern)
# splitting the gif request on the '&' character
# (which GA originally used to concatenate each piece to build the request)
# (here, the __utm.gif request string shown above is assumed to be bound to the variable 'gfx')
gfx1 = pat_obj.split(gfx)
# create a look-up table to map a descriptive name to each gif request parameter
# (note, this isn't the entire list, which I've linked to above)
keys = "utmje utmsc utmsr utmac utmcc utmcn utmcr utmcs utmdt utme utmfl utmhn utmn utmp utmr utmul utmwv"
values = "java_enabled screen_color_depth screen_resolution account_string cookies campaign_session_new repeat_campaign_visit language_encoding page_title event_tracking_data flash_version host_name GIF_req_unique_id page_request referral_url browser_language gatc_version"
keys = keys.strip().split()
values = values.strip().split()
# create the look-up table
GIF_REQUEST_PARAMS = dict(zip(keys, values))
# parse each request parameter and map the parameter name to a descriptive name:
pattern = r'(utm\w{1,2})=(.*?)$'
pat_obj = re.compile(pattern)
for itm in gfx1:
    m = pat_obj.search(itm)
    if m:
        fmt = '{0:25} {1:10}'
        print(fmt.format(GIF_REQUEST_PARAMS[m.group(1)], m.group(2)))
The result looks like this:
gatc_version              1         
GIF_req_unique_id         1669045322
language_encoding         UTF-8     
screen_resolution         1280x800  
screen_color_depth        24-bit    
browser_language          en-us     
java_enabled              1         
flash_version             10.0%20r45
campaign_session_new      1         
page_title                Position%20Listings%20%7C%20Linden%20Lab
host_name                 lindenlab.hrmdirect.com
referral_url              http://lindenlab.com/employment
page_request              /employment/openings.php?sort=da
account_string            UA-XXXXXX-X
cookies
To avoid making this longer still, I left out the cookies' value. They obviously require a separate parsing step, though it's virtually identical to the step I just showed. Again, each request represents a single transaction, so you can store them as you need to.
