Storage schema retention and aggregation - Graphite

In Graphite, storage schema retention is configured as:
"10s:1d,1m:30d"
and the aggregation method is one of min, max, sum, or average.
Question: when viewing the charts/graphs, where can you see which retention and aggregation method Graphite is using?

Graphite's retentions and aggregation methods are not visible on graphs because they don't belong to graphs - they belong to individual metrics, and a graph can contain many metrics.
You can run the whisper-info.py tool against a metric's .wsp file to see which retention and aggregation it has. Usually, though, your .wsp files have the same retention / aggregation that was described in the configs.
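For example (an illustrative sketch - the file path is hypothetical and the exact values depend on your installation), running it against a metric stored with the "10s:1d,1m:30d" schema would look roughly like:

whisper-info.py /opt/graphite/storage/whisper/stats/example_metric.wsp
maxRetention: 2592000
xFilesFactor: 0.5
aggregationMethod: average

Archive 0
retention: 86400
secondsPerPoint: 10
points: 8640

Archive 1
retention: 2592000
secondsPerPoint: 60
points: 43200

The aggregationMethod and xFilesFactor shown here are whisper's defaults; yours will reflect whichever storage-aggregation.conf rule matched the metric.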

Related

GA4 limits and sampling

I tried to find information on the internet, but it is very hard because every forum I found gives different information.
How many hits can I collect in a GA4 property in one month without sampling and limits?
In GA3 I could collect only 10M in one month, but for GA4 I don't know.
And after 10M in a month, will data keep being collected, or will Google stop collecting new data beyond 10M hits?
The official Google docs - https://support.google.com/analytics/answer/11202874?hl=en - say the Explore sampling limit is 10M events per query.
What does that mean?
Does it mean that a report cannot contain more than 10M rows?
It means that when you generate a non-standard (Explore) report that involves more than 10 million events, sampling is applied. In that case you can reduce the time frame so that fewer events are involved.

Cloud Datastore metric baseline to be alerted

In my company, there is a request to be alerted on the following metrics of a Cloud Datastore service configuration, but they don't know what the baselines for each of these metrics should be.
Do they have to be a fixed count (e.g. request_count > 100), or maybe an average?
api/request_count
index/write_count
entity/read_sizes
entity/write_sizes
I was checking the free quota limits here, but I'm not really sure which values are the right ones to use for a baseline.

How to store data in Graphite with retention of 100ms?

I am using graphite to display our application stats.
storage-schemas.conf
[stats]
pattern = ^stats\.
retentions = 1s:1h,1m:1d,1h:100d
storage-aggregation.conf
[stats]
pattern = ^stats.*
xFilesFactor = 0
aggregationMethod = sum
I am sending data 100 times per second.
With the above configuration, it is storing only one value per second.
I want to sum all 100 values sent within a second and store that sum for that second.
How can I aggregate this data in Graphite?
I tried setting the retention to 0.01s:1h, but it's not working.
Is there any way to store data every 100 ms?
I searched everywhere but didn't find a proper solution.
The proper solution is to use Graphite together with StatsD. StatsD aggregates your irregular / high-resolution data and sends it to Graphite at a regular interval.
Keep in mind that you have to change your client side to send data to StatsD instead of Graphite.
You can find more information here: https://github.com/statsd/statsd
If you have a high number of metrics, I recommend using the C version of it: https://github.com/statsite/statsite
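As a rough illustration (the metric name and address are placeholders; StatsD listens on UDP port 8125 by default), incrementing a counter in the plain StatsD line protocol from a shell looks like this:

echo "myapp.messages_received:1|c" | nc -u -w0 127.0.0.1 8125

StatsD then sums everything it received for that counter during its flush interval (10 seconds by default, configurable) and sends a single aggregated value per interval to Graphite.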

How to extract Google Analytics historical data using APIs. Pros and cons?

I'm doing a quick proof of concept to understand the procedure for extracting historical data from Google Analytics, to be further used for offline data stitching and a holistic view of the data and its analysis. I have not found any detailed online documentation covering the pros and cons.
Would like to know any limitations on:
The time period for which data can be extracted or any limitation for max. calendar days?
Whether all dimensions/metrics can be extracted or any specific ones?
Will the data be real-time or sampled?
Can all data be pulled into a single table or separate ones?
Will it be available for both freeware and premium version?
The time period for which data can be extracted or any limitation for max. calendar days?
The start date cannot be before the launch of Google Analytics on '2005-01-01'. Due to processing time lag, extracting data that is newer than 2 days old can result in incomplete data; I recommend checking the isDataGolden flag on the response.
Requesting large date ranges can result in sampling, which cannot be prevented. It's best to request the data in small chunks.
Whether all dimensions/metrics can be extracted or any specific ones?
A list of the dimensions and metrics you can extract can be found here. Each request can contain a maximum of 7 dimensions and 10 metrics.
Will the data be real-time or sampled?
The Real-Time API and the Reporting API are two different APIs. The Real-Time API is not, to my knowledge, sampled, but as it only covers about five minutes of data I find it hard to imagine anyone but really big websites hitting that problem even if it is.
Will it be available for both freeware and premium version?
Accessing the Google Analytics APIs is free; there is no charge. There are, however, limits on how much data you can extract in a given day.
By default your application can make a maximum of 50k requests a day. This can be extended.
Each view you are extracting from can make a maximum of 10k requests a day. This cannot be extended.
See: limits and quotas for more info.
Note: I am a developer on a business intelligence application that extracts Google Analytics data. I can tell you that it's definitely doable.
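To give a feel for what an extraction request looks like, here is a rough sketch against the Reporting API v4 (the view ID, date range, and the way you obtain the OAuth access token are placeholders you would replace with your own):

curl -s -X POST \
  -H "Authorization: Bearer $ACCESS_TOKEN" \
  -H "Content-Type: application/json" \
  https://analyticsreporting.googleapis.com/v4/reports:batchGet \
  -d '{
        "reportRequests": [{
          "viewId": "123456789",
          "dateRanges": [{"startDate": "2019-01-01", "endDate": "2019-01-31"}],
          "dimensions": [{"name": "ga:date"}, {"name": "ga:medium"}],
          "metrics": [{"expression": "ga:sessions"}, {"expression": "ga:users"}]
        }]
      }'

Each report request is capped at the 7 dimensions and 10 metrics mentioned above, and the response carries the isDataGolden flag you can check for completeness.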

Having trouble getting accurate numbers from graphite

I have an application that publishes a number of stats to Graphite via StatsD. One of the stats simply sends a stat increment to StatsD every time a message is received by the service. I need to display a graph that shows the relative traffic over time for this stat. Generally speaking, I should be able to display a graph that refreshes every, say, 10 seconds, and displays how many messages were received in those 10 seconds as well as the history for a given period of time. However, no matter how I format my API query I cannot seem to get accurate data. I've read a number of articles including this one:
http://code.hootsuite.com/accurate-counting-with-graphite-and-statsd/
That seems to give some good insight but still doesn't quite give me what I need. This is the closest I have come:
integral(hitcount(stats.recieved, "10seconds"))
However, I don't like the cumulative result of this, and when I run it I get statistics that come nowhere near what I see in my logs for messages received. I am OK with accepting some packet loss, but we're talking about orders of magnitude. I know I am doing something wrong; just hoping someone can give me some insight as to what.
A couple of things to check/try:
Configure Graphite for Statsd
Check to make sure that you've used the retention schema and aggregation settings in Graphite that match how Statsd will be sending data (i.e. it sends one data point per 10 second flush interval).
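For reference, a storage-schemas.conf entry along the lines of what the statsd project recommends (a sketch - pick retentions that suit you; this one just matches the 10 second flush interval and the retentions referenced below):

[stats]
pattern = ^stats.*
retentions = 10s:6h,1min:7d,10min:5y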
Run a single Statsd aggregator
Be sure you are only running one instance of Statsd, as running multiple statsd daemons will cause metrics to be dropped (since Graphite will be configured to only store one data point for its highest precision of 10s:6h).
Limit the time range in the UI or URL API to less than 6 hours
When displaying graphs with data that crosses over the 6 hour threshold (e.g. from now to 7 hours ago), you will begin seeing 1 minute worth of aggregated count data for the displayed graph (if you've configured Graphite for statsd with retentions = 10s:6h,1min:7d,10min:5y). Rollups will occur based on the oldest data point in the time range (e.g. now till 7+ days = you'll get 10 min rollups).
If sending sparse or "bursty" data AND displaying old time range (triggering aggregation)
Confirm that your xFilesFactor is low enough that aggregation produces non-null values even with a high rate of nulls. For example, 100 requests in the first 10 seconds and none for the remaining 50 seconds in a minute would cause a storage of 100, null, null, null, null, null, which would be summed up to null when the data ages if the xFilesFactor is higher than 1/6. Using the statsd recommended graphite configuration handles this, but it is good to know about... as this can give the appearance of lost data.
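A sketch of matching storage-aggregation.conf rules for count data, patterned after the statsd project's recommended Graphite settings (adjust the patterns if your metric naming differs):

[count]
pattern = \.count$
xFilesFactor = 0
aggregationMethod = sum

[count_legacy]
pattern = ^stats_counts.*
xFilesFactor = 0
aggregationMethod = sum

[default_average]
pattern = .*
xFilesFactor = 0.3
aggregationMethod = average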
Saving schema or aggregation changes
If you changed the graphite schema or aggregation settings after any metrics were stored (in whisper = graphite's storage) you'll need to either delete the .wsp files for the metric (graphite will recreate them) or run whisper-resize.py.
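For example (the path and retentions here are placeholders), resizing an existing metric file to a new schema looks roughly like:

whisper-resize.py /graphite/storage/whisper/stats/my_metric/count.wsp 10s:6h 1min:7d 10min:5y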
Validating settings
You can verify the settings against some whisper data by running whisper-info.py on a .wsp file. Find the .wsp file for one of your metrics in /graphite/storage/whisper/
Run: whisper-info.py my_metric_data.wsp. whisper-info.py output should tell you more about how the storage settings are working.
TLDR;
You should ensure that Graphite is set to store one data point per 10 second interval for metrics coming from StatsD. You should make sure that Graphite is summing (not averaging) for count data coming from Statsd. Both of these can be handled by using the recommended Statsd configuration settings. Don't run more than one Statsd aggregator. When using the UI, limit the data returned to less than 6 hours OR understand what rollup you are viewing when looking at data that crosses retention thresholds. Lastly, make sure the settings take (if you've already been sending metrics).
