Cloud Datastore metric baselines for alerting - google-cloud-datastore

In my company, there is a request to be alerted on the following metrics for a Cloud Datastore service, but nobody knows what the baseline for each of these metrics should be.
Should the thresholds be a fixed count (i.e. request_count > 100), or perhaps an average?
api/request_count
index/write_count
entity/read_sizes
entity/write_sizes
I was checking the free quota limits here, but I'm not really sure which values would be the right ones to use as a baseline.
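In case it helps frame the question: in Cloud Monitoring, the "fixed count vs. average" choice mostly comes down to which aligner the alert condition uses. Below is a minimal sketch with the Python google-cloud-monitoring client for api/request_count; the project ID, the 100-per-minute threshold, and the filter are placeholders for illustration, not recommended baselines.

```python
# Minimal sketch with google-cloud-monitoring (pip install google-cloud-monitoring).
# The project ID and the 100-requests-per-minute threshold are placeholders,
# not recommended baselines; adjust the filter to your environment.
from google.cloud import monitoring_v3

client = monitoring_v3.AlertPolicyServiceClient()
project_name = "projects/my-project-id"  # hypothetical project

policy = monitoring_v3.AlertPolicy(
    display_name="Datastore api/request_count above baseline",
    combiner=monitoring_v3.AlertPolicy.ConditionCombinerType.OR,
    conditions=[
        monitoring_v3.AlertPolicy.Condition(
            display_name="api/request_count > 100 per minute",
            condition_threshold=monitoring_v3.AlertPolicy.Condition.MetricThreshold(
                filter=(
                    'metric.type="datastore.googleapis.com/api/request_count" '
                    'AND resource.type="datastore_request"'
                ),
                # ALIGN_SUM -> "fixed count per minute"; swap in ALIGN_RATE for a
                # per-second rate, i.e. the "average" flavour of baseline.
                aggregations=[
                    monitoring_v3.Aggregation(
                        alignment_period={"seconds": 60},
                        per_series_aligner=monitoring_v3.Aggregation.Aligner.ALIGN_SUM,
                    )
                ],
                comparison=monitoring_v3.ComparisonType.COMPARISON_GT,
                threshold_value=100,        # placeholder baseline
                duration={"seconds": 300},  # condition must hold for 5 minutes
            ),
        )
    ],
)

created = client.create_alert_policy(name=project_name, alert_policy=policy)
print("created", created.name)
```

Whichever style you pick, the threshold value itself still has to come from observing your own traffic, not from the free quota numbers.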

Related

Resolve stackdriver incident when no more timeseries with available data violate the policy

I have Stackdriver alerts/incidents on metrics like Cloud Run revision request latencies.
If a few calls a long time ago had high latency, but no new low-latency requests have come in since then, the incident keeps firing permanently. This is because, when no new requests come in, there are no data points for the metric.
Is there a way to automatically stop an incident from firing when there are no recent data points for the underlying metrics? Or is there an alternative way to alert on high request latencies in Cloud Run that automatically switches the alarm off again when no new high-latency requests are coming in?
The solution in https://stackoverflow.com/a/63997540/6473907 does not work as-is, because Cloud Run's built-in request count metric does not go to zero when requests stop coming in; it simply stops providing data points. The solution for us was to create a custom logs-based metric that counts the log entries Cloud Run writes for every request, because that logs-based metric does go to zero, and then combine it with AND_WITH_MATCHING_RESOURCE as described in https://stackoverflow.com/a/63997540/6473907.
The chart compares the request count as obtained from Google's pre-defined metric run.googleapis.com/request_count (in violet) with the count produced by the custom logs-based metric (in blue). Only the latter goes to zero when no more requests are coming in.
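As a sketch of that workaround, the logs-based counter could be created with the google-cloud-logging Python client roughly like this (the metric name and the log filter are illustrative and may need adjusting to your service's request logs):

```python
# Sketch: create a logs-based counter metric for Cloud Run request log entries.
# Metric name and filter are illustrative; adjust them to your project/service.
from google.cloud import logging

client = logging.Client(project="my-project-id")  # hypothetical project

metric = client.metric(
    "cloud_run_request_count",
    filter_=(
        'resource.type="cloud_run_revision" '
        'AND logName:"run.googleapis.com%2Frequests"'
    ),
    description="One count per Cloud Run request log entry; drops to zero when idle.",
)

if not metric.exists():
    metric.create()
```

The resulting metric appears in Monitoring as logging.googleapis.com/user/cloud_run_request_count and can be used as the request-count condition in the AND_WITH_MATCHING_RESOURCE policy from the linked answer.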
Edit: This solution will not work, because the request count metric stops being sent to Stackdriver instead of dropping to zero. As explained in the other (more correct) answer, the solution is to create a logs-based metric for the requests, which does properly drop to zero when no additional requests arrive.
This behaviour is documented in the alerting docs:
If measurements are missing (for example, if there are no HTTP requests for a couple of minutes), the policy uses the last recorded value to evaluate conditions.
There are a few recommendations in there to mitigate this issue, but all of them assume you are actually collecting metrics, not your situation where there are no metrics at all (because you stopped receiving requests).
This is probably by design: even if you are not receiving additional requests, you might still want to check why all the latest requests had this increased latency.
To work around this feature, you could try to use multiple conditions in your alert policy:
One condition related to the latency: if latency > X
One condition related to the existence of requests: if request count > 1
If you combine those with AND_WITH_MATCHING_RESOURCE, the policy should only trigger when there is high latency and there are requests. The incident should be resolved when either of the two conditions is no longer met: even if no new latency metrics are ingested (so the alerting policy still thinks the latency is high), the request-count condition will stop matching after the specified duration.
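A rough sketch of such a two-condition policy with the Python google-cloud-monitoring client (the latency threshold, durations, and the request-count metric, here assumed to be a logs-based counter like the one sketched above, are placeholders):

```python
# Sketch of a two-condition policy combined with AND_WITH_MATCHING_RESOURCE.
# Thresholds, durations and the request-count metric are placeholders.
from google.cloud import monitoring_v3

client = monitoring_v3.AlertPolicyServiceClient()

latency_condition = monitoring_v3.AlertPolicy.Condition(
    display_name="request latency > 2s",
    condition_threshold=monitoring_v3.AlertPolicy.Condition.MetricThreshold(
        filter=(
            'metric.type="run.googleapis.com/request_latencies" '
            'AND resource.type="cloud_run_revision"'
        ),
        aggregations=[
            monitoring_v3.Aggregation(
                alignment_period={"seconds": 60},
                per_series_aligner=monitoring_v3.Aggregation.Aligner.ALIGN_PERCENTILE_99,
            )
        ],
        comparison=monitoring_v3.ComparisonType.COMPARISON_GT,
        threshold_value=2000,          # milliseconds, placeholder
        duration={"seconds": 300},
    ),
)

request_condition = monitoring_v3.AlertPolicy.Condition(
    display_name="requests are still coming in",
    condition_threshold=monitoring_v3.AlertPolicy.Condition.MetricThreshold(
        # A logs-based counter that really drops to zero when traffic stops.
        filter=(
            'metric.type="logging.googleapis.com/user/cloud_run_request_count" '
            'AND resource.type="cloud_run_revision"'
        ),
        aggregations=[
            monitoring_v3.Aggregation(
                alignment_period={"seconds": 60},
                per_series_aligner=monitoring_v3.Aggregation.Aligner.ALIGN_SUM,
            )
        ],
        comparison=monitoring_v3.ComparisonType.COMPARISON_GT,
        threshold_value=0,
        duration={"seconds": 300},
    ),
)

policy = monitoring_v3.AlertPolicy(
    display_name="High latency while traffic is present",
    # Incident resolves when either condition stops being met on the same resource.
    combiner=monitoring_v3.AlertPolicy.ConditionCombinerType.AND_WITH_MATCHING_RESOURCE,
    conditions=[latency_condition, request_condition],
)

client.create_alert_policy(name="projects/my-project-id", alert_policy=policy)
```

The important detail is the AND_WITH_MATCHING_RESOURCE combiner, which ties both conditions to the same Cloud Run revision.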

Cloud Functions for Firebase invocations displayed in Stackdriver

I'm using Firebase with Cloud Functions on the Blaze plan, and I would like to get an overview of how many of the 2 million free monthly invocations I'm using. After that (that's another story, though) I would like to build the same overview for the other billable resources, such as GB-seconds, CPU-seconds, and outbound networking. I can't see these metrics in the Firebase console, so my approach is to use the Google Cloud Console and Stackdriver.
Function Invocations in Firebase
Function Executions in Stackdriver
As seen in the two graphs, they clearly have the same shape, but the scales are different. The Firebase graph appears to have values roughly 3-4 times larger than the Stackdriver graph.
Questions:
Why don't the two graphs match each other?
Which one am I being billed for?
Here is how I've configured the Stackdriver chart:
So I asked the Firebase Support Team as Doug proposed.
Here's what they replied:
I just got an update from our engineers. They're hinting that the difference might come from the aligner being count and not sum, which is what the Firebase Console uses. With this, could you try changing it and see if the values change?
Additionally, the console is pulling the data from Stackdriver, so the source of the data should be the same. You should also be billed based on the Stackdriver values. If there's still a discrepancy, please let me know and provide the updated screenshots so I can investigate further.
Afterwards I changed the aligner to sum instead of count, which resulted in a chart with roughly the same scale as the Firebase graph.
So to answer my own questions:
Why don't the two graphs match each other?
They should! The aligner in the Stackdriver chart must be set to sum to reflect the Firebase graph.
Which one am I being billed for?
Both! The data in Firebase originates from Stackdriver, so the graphs should match. If they don't, it's likely an issue with the Stackdriver chart settings, and the Firebase graph should be treated as the true value.
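For reference, the same sum-aligned numbers can be read back through the Monitoring API; here is a sketch with the Python google-cloud-monitoring client (project ID and time window are placeholders), where the only important detail is the ALIGN_SUM aligner on execution_count:

```python
# Sketch: read Cloud Functions execution_count with the SUM aligner, which
# matches the Firebase console (COUNT would count data points instead).
import time

from google.cloud import monitoring_v3

client = monitoring_v3.MetricServiceClient()
project = "projects/my-project-id"  # hypothetical project

now = int(time.time())
interval = monitoring_v3.TimeInterval(
    {"end_time": {"seconds": now}, "start_time": {"seconds": now - 24 * 3600}}
)
aggregation = monitoring_v3.Aggregation(
    {
        "alignment_period": {"seconds": 3600},
        "per_series_aligner": monitoring_v3.Aggregation.Aligner.ALIGN_SUM,  # sum, not count
    }
)

series = client.list_time_series(
    request={
        "name": project,
        "filter": 'metric.type="cloudfunctions.googleapis.com/function/execution_count"',
        "interval": interval,
        "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
        "aggregation": aggregation,
    }
)
for ts in series:
    total = sum(point.value.int64_value for point in ts.points)
    print(ts.resource.labels["function_name"], total)
```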

Google Analytics Hit Quotas

I wonder whether someone can help me please.
I have a user who, under a specific property, sporadically receives the following error:
Some hits sent on 03-Jul-2018 to property ...... exceeded one or more hit quotas and were therefore not processed.
Hits can be dropped when daily or monthly hit limits are exceeded. You can view your hit volume levels in Property Settings in Analytics.
Hits can also be dropped if visitor hit limits are exceeded. This can happen when your site is incorrectly generating the visitor ID for a GA session. Contact your website administrator to check that the visitor ID generation has been correctly implemented.
They are not using a Premium account, but when I look at the data for the day in question there aren't any issues with regard to 'high cardinality', which (unless I've misunderstood) is what I'd expect to see.
Could someone look at this please and offer some guidance on where the issue may be, because this area is fairly new to me.
Many thanks and kind regards
Chris
Collection limits are influenced by 2 factors:
The tracker: whether you use ga.js, gtag.js, analytics.js, etc.; here are the details.
The property type: whether you are using GA (10M hits / month) or GA 360 (2B hits / month).
In your case you are facing a property limit. To find out when such limits were reached, you can create a custom report using a time dimension (e.g. date + hour) combined with the hits metric (see the sketch at the end of this answer). You can also combine the hits metric with other dimensions (country, browser, device) to see if you find any patterns as to why you're getting so many hits.
Cardinality is something else: it refers to the number of unique value combinations for your dimensions. For instance, if you have 500K events where each event category is different, you'll have a cardinality of 500K on the event category dimension. The more hits, the more likely you are to have high cardinality, but the two aren't necessarily related (if you send 10B events with the same category, the cardinality of the category is 1).
So focus on identifying and solving your limits/quotas issue, as it's the real issue here:
If the number of hits is legitimate (you have a huge amount of traffic), then the only options are to upgrade to GA 360 or reduce the number of hits per session.
If the number of hits is abnormally high (e.g. traffic is stable but hits increased dramatically), look for implementation issues, especially generic event trackers such as error tracking implemented with tools like Google Tag Manager.
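If you prefer pulling that breakdown through the Core Reporting API rather than building the custom report in the UI, a sketch with google-api-python-client could look like the following (the view ID, key file, and date are placeholders):

```python
# Sketch: pull hits per date+hour from the Core Reporting API (v3) to find
# when the property limit was reached. View ID, key file and date are placeholders.
from googleapiclient.discovery import build
from google.oauth2 import service_account

SCOPES = ["https://www.googleapis.com/auth/analytics.readonly"]
credentials = service_account.Credentials.from_service_account_file(
    "service-account.json", scopes=SCOPES  # hypothetical key file
)
analytics = build("analytics", "v3", credentials=credentials)

response = analytics.data().ga().get(
    ids="ga:12345678",        # placeholder view (profile) ID
    start_date="2018-07-03",
    end_date="2018-07-03",
    metrics="ga:hits",
    dimensions="ga:date,ga:hour",
    sort="-ga:hits",
).execute()

for date, hour, hits in response.get("rows", []):
    print(date, hour, hits)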

Google Analytics client sampling when over 10 million hits/month

Google Analytics Documentation says the following:
These limits apply to the Web Property / Property / Tracking ID.
10 million hits per month per property
If you go over this limit, the Google Analytics team might contact you and ask you to upgrade to Analytics 360 or implement client sampling to reduce the amount of data being sent to Google Analytics.
For monthly total Analytics 360 limits, please contact your account manager or service representative.
What does this mean, exactly?
I know there is sampling, the kind you see in your reports.
But if your traffic exceeds 10 million hits per month, is there an automatic sampling system that prevents you from capturing all incoming traffic?
In other words: does Google limit your traffic automatically at the source? Not in the reports but at the source. Let's say I capture 20 million hits a month, will I have all that traffic in my property, or does it stop at a certain point?
Again: I'm not talking about report sampling but about the actual data captured each month.
Thanks in advance
No, Google does not limit data collection. You have to implement this yourself, although they at least give you a means to do that in the JavaScript tracking code. Implementing sampling yourself would be a little tricky, since you want to sample out whole sessions, not individual pageviews (see the sketch below).
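One generic way to get session-level (rather than per-hit) sampling, shown here as an illustrative Python sketch and not anything GA provides, is to make the keep/drop decision a deterministic function of the client ID, so every hit from the same visitor gets the same decision:

```python
# Generic sketch of session-level sampling: the keep/drop decision is a pure
# function of the client ID, so all hits from the same visitor are kept or
# dropped together instead of thinning out individual pageviews.
import hashlib

SAMPLE_RATE = 0.5  # keep roughly 50% of visitors

def should_track(client_id: str, sample_rate: float = SAMPLE_RATE) -> bool:
    digest = hashlib.md5(client_id.encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF  # roughly uniform value in [0, 1]
    return bucket < sample_rate

# Example: decide once per visitor, then send (or skip) every hit accordingly.
print(should_track("GA1.2.1234567890.1530576000"))  # hypothetical client ID
```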
If you record 20 million hits you will have them in your property. But at that point you are operating outside your quota, and Google has the right to terminate your account (they will not do that without getting in contact with you, provided you respond to the mails sent to the Google accounts authorized to use your GA properties).
So far Google has, in my experience, been very generous even with large overruns, but you should not base something business-critical on violating the TOS of a free service.

Is the Google Analytics API containsSampledData field reliable?

We are running the Google Analytics free version and I'm seeing some inconsistent results regarding data sampling. I have tried my requests in Google Analytics Query Explorer, the GA Sheets add-on, and within the GA interface.
Basically, I am comparing results from a complete date range against the sum of results for that date range broken into smaller chunks (to reduce/remove the chance of sampling occurring). Metrics are sessions, transactions, and revenue. I have a session-level dynamic segment applied: sessions::condition::!ga:landingPagePath=#/thanks
As you might expect, the results from the single request are different (the counts are lower) from those obtained by summing the multiple smaller requests. For example, sessions are 45,311 vs. 51,596, and revenue is even further apart. This implies that sampling is being used for the larger request. The trouble is that the API response explicitly says that sampling is not used in any case, i.e. "Contains Sampled Data" equals "No", even for the full date range, within which our property should be exceeding the 500,000-session threshold for sampling to kick in.
I'm almost certain that the results from summing smaller date ranges are correct, as these are pretty close to what we see in our CMS analytics.
Can anyone explain the mechanics behind this? Is GA doing some sort of behind-the-scenes sampling to produce this inconsistency?
Thanks,
Daniel
Sounds like sampling. Check all your sources to see if they contain sampled data, and make sure you have the sampling level set to "HIGHER_PRECISION".
1) Google Sheets Google Analytics add-on: in cell B6 of the data for each query, check whether it says "Yes" for "Contains Sampled Data".
2) Google Analytics Query Explorer: in the header below your profile name, check whether it says "Contains Sampled Data: Yes".
You are on the right track in breaking your query down into smaller chunks with smaller date ranges to avoid sampling. Here is a post on how to Avoid Google Analytics Sampling using Python
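A sketch of both suggestions combined, querying one day at a time with samplingLevel=HIGHER_PRECISION via google-api-python-client (the view ID, key file, and date range are placeholders; the segment is the one from the question):

```python
# Sketch: query the Core Reporting API (v3) one day at a time with
# samplingLevel=HIGHER_PRECISION, and check containsSampledData per response.
# View ID, key file and date range are placeholders.
from datetime import date, timedelta

from googleapiclient.discovery import build
from google.oauth2 import service_account

SCOPES = ["https://www.googleapis.com/auth/analytics.readonly"]
credentials = service_account.Credentials.from_service_account_file(
    "service-account.json", scopes=SCOPES  # hypothetical key file
)
analytics = build("analytics", "v3", credentials=credentials)

start, end = date(2018, 6, 1), date(2018, 6, 30)
totals = {"ga:sessions": 0.0, "ga:transactions": 0.0, "ga:transactionRevenue": 0.0}

day = start
while day <= end:
    response = analytics.data().ga().get(
        ids="ga:12345678",  # placeholder view (profile) ID
        start_date=day.isoformat(),
        end_date=day.isoformat(),
        metrics="ga:sessions,ga:transactions,ga:transactionRevenue",
        segment="sessions::condition::!ga:landingPagePath=#/thanks",
        samplingLevel="HIGHER_PRECISION",
    ).execute()

    if response.get("containsSampledData"):
        print(f"{day}: still sampled, consider splitting the range further")

    for metric, value in response.get("totalsForAllResults", {}).items():
        totals[metric] += float(value)
    day += timedelta(days=1)

print(totals)
```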
