How to store data in Graphite with retention of 100ms? - graphite

I am using graphite to display our application stats.
storage-schemas.conf
[stats]
pattern = ^stats\.
retentions = 1s:1h,1m:1d,1h:100d
storage-aggregation.conf
[stats]
pattern = ^stats.*
xFilesFactor = 0
aggregationMethod = sum
I am sending data 100 times per second.
With the above configuration, only one value per second is kept (each new value overwrites the previous one within that second).
I want to sum all 100 values sent within a second and store that sum for that second.
How can I aggregate this data in Graphite?
I tried setting the retention to 0.01s:1h, but it's not working.
Is there any way to store data every 100 ms?
I searched everywhere but didn't find a proper solution.

The proper solution is to use Graphite together with StatsD. StatsD aggregates your irregular / high-resolution data and sends it to Graphite at a regular interval.
Keep in mind that you have to change your client side to send data to StatsD instead of Graphite.
You can find more information here: https://github.com/statsd/statsd
If you have a high number of metrics, I recommend using the C implementation, statsite: https://github.com/statsite/statsite
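For illustration, the flow looks roughly like this (the metric name, ports and intervals are examples, not prescribed values; see the StatsD docs for the full config format). The application fires counter increments at StatsD over UDP as often as it likes, StatsD sums them and flushes one value per flush interval to Graphite, and the finest Graphite retention should match that flush interval:
# sent from the application to StatsD over UDP (default port 8125), 100 times per second:
myapp.requests:1|c
# statsd config.js excerpt - flush the aggregated sum to Graphite once per second:
{ graphiteHost: "127.0.0.1", graphitePort: 2003, port: 8125, flushInterval: 1000 }
# storage-schemas.conf - finest retention matches the 1-second flush (as in the question):
[stats]
pattern = ^stats\.
retentions = 1s:1h,1m:1d,1h:100d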

Related

GA4 limits and sampling

I tried to find information on the internet, but it is very hard because every forum says something different.
How many hits can I collect in a GA4 property in one month without sampling and limits?
In GA3 I could collect only 10M per month, but for GA4 I don't know.
And after 10M in a month, will data keep being collected, or will Google stop collecting new data beyond 10M hits?
The official Google docs - https://support.google.com/analytics/answer/11202874?hl=en
say that the Explore sampling limit is 10M events per query.
What does that mean?
Is it impossible for a report to be based on more than 10M rows?
It means that when you generate a non-standard report involving more than 10 million hits, sampling is applied. In that case, you can reduce the time frame to involve fewer hits.

Different Active Users count when using segments

I would love to understand what I'm looking at - why are the numbers different in this report when I add a segment?
This is the report without any segmentation:
This is the same report with the Mobile Traffic segment:
There are two methods that Google uses to calculate the number of users.
Calculation 1: Pre-calculated data
This calculation relies only on the number of sessions in the given date range and the time of each session. (This is determined by technology managed on the device, like a web browser, and is often referred to as the client-side time.) Because the result of this calculation can be added to the pre-aggregated data tables, Analytics can reference the table to quickly retrieve and serve this data in a report, including when you change the date range.
Calculation 2: Data calculated on the fly
Calculation 2 is based on the way you assign, collect, and store persistent data about your traffic. There are many solutions you can implement to customize this, but the most common way this data is going to be assigned and stored is through cookies managed via a web browser.
Adding a segment will force GA to calculate the data on the fly and that's why you are seeing a difference in the numbers.
Are you using GA free or 360? And is the time range the same in both reports?
You can also have a look into the Google article https://support.google.com/analytics/answer/2992042?hl=en
You are a victim of sampling:
https://support.google.com/analytics/answer/2637192?hl=en
Sampling applies when:
you customize the reports
the number of sessions for the report time range exceeds 500K (GA) or 100M (GA 360)
The consequence is that:
the report will be based on a subset of the data (the % depends on the total number of sessions)
therefore your report data won't be as accurate as usual
What you can do to reduce sampling:
increase the sample size in the UI (this only decreases sampling to a certain extent and in most cases won't completely remove it)
reduce time range
create filtered views so your reports contain the data you need and you don't have to customize them

R: Google Distance Matrix API request rate limit exceeded

I know that similar questions have been asked before, but from what I've been able to gather, none of the answers seem to apply to my case.
What I'm trying to do is replicate this, but in the R language : Computing the optimal road trip across the us
Everything works perfectly until I ask the Googles for the distance matrix for more than 10 locations. In my script (to follow) I list my API key, and on the API website I can see that my successful runs of the program (when the number of locations is less than 10) increase my usage for the day, so I know that my API is working... I think.
What I don't understand is why do I receive the "rate limit exceeded" error for, say, a distance matrix with 11 locations? If I have 1,500 requests left, I should certainly not have any issues, right? I should add that I am not familiar with other programming languages such as Java and Python, so that could explain part of my confusion.
Here be the relevant code:
# Request object from API (uses the httr package)
library(httr)

r <- GET(
  "https://maps.googleapis.com/maps/api/distancematrix/json",
  query = list(
    origins = places,
    destinations = places,
    key = "INSERT API KEY HERE"
  )
)
stop_for_status(r)
distances <- content(r)
The variable 'places' is simply a list containing the locations that I want distances to/from.
RTM?
Each query sent to the Google Maps Distance Matrix API is limited by the number of allowed elements, where the number of origins times the number of destinations defines the number of elements.
The Google Maps Distance Matrix API has the following limits in place:
Standard Usage Limits
Users of the standard API:
2,500 free elements per day
100 elements per query
100 elements per 10 seconds
Ergo: I think you have to split it up into several queries, with a 10-second pause in between, in order to get the full distance matrix.
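A rough sketch of that approach in R, assuming 'places' holds fewer than 100 locations so that at least one origin fits per request (the chunk-size calculation is illustrative, not an official client):
# Split the origins into chunks so each request stays under the
# 100-elements-per-query limit (origins x destinations), and pause
# 10 seconds between requests to respect the per-10-seconds limit.
library(httr)

dest_string <- paste(places, collapse = "|")
origins_per_request <- floor(100 / length(places))
origin_chunks <- split(places, ceiling(seq_along(places) / origins_per_request))

results <- list()
for (i in seq_along(origin_chunks)) {
  r <- GET(
    "https://maps.googleapis.com/maps/api/distancematrix/json",
    query = list(
      origins      = paste(origin_chunks[[i]], collapse = "|"),
      destinations = dest_string,
      key          = "INSERT API KEY HERE"
    )
  )
  stop_for_status(r)
  results[[i]] <- content(r)
  Sys.sleep(10)
}
# 'results' now holds the rows of the full distance matrix, one chunk of origins per element.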

Best retention practice using Graphite

I have been a happy user of Graphite+Grafana for a few months now and I have been advocating it around my firm.
My approach has been to measure data of interest, collect it into 1-minute or 5-minute buckets, and send that information to Graphite. I was recently contacted by a group that processes quotes (billions a day!) and their approach has been to create a log line each time their applications process 1 million quotes. The problem is that the interval between two log lines can be highly erratic, from 1 second to a few hours.
The dilemma is this: should I set my retention policy to a 1-second bucket so that I can see all measurements associated with spikes, or should I use, say, a 1-minute bucket so that the number of data points to be saved and later queried is much more manageable? FYI, when I set it to 1 second, showing the data for 8 or 10 charts over a few days brought the system (or at least my browser) to a crawl because of the number of data points (mostly NULL) being pushed from Graphite to Grafana.
Here's my retention policy: 1s:10d,1m:36d,5m:180d
Alternatively, is there a way to configure Grafana+Graphite to only retrieve non-NULL data points?
What do you recommend?
You can always specify a shorter retention period for the 1s precision, so that when you display a longer range Graphite will only send you the coarser levels.
For example, you can specify: 1s:2d,1m:7d,5m:180d
This way, if you show a range reaching more than 2 days into the past you will get 1m resolution (and so on), which won't make your browser crawl, while you will still be able to inspect spikes within the last 2 days.
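In storage-schemas.conf and storage-aggregation.conf that could look something like this (the [quotes] section name and pattern are placeholders for your metric naming; if these are counts, sum aggregation with xFilesFactor = 0 keeps sparse 1s data from rolling up to NULL, and existing whisper files need whisper-resize.py to pick up a changed retention):
# storage-schemas.conf
[quotes]
pattern = ^quotes\.
retentions = 1s:2d,1m:7d,5m:180d
# storage-aggregation.conf
[quotes]
pattern = ^quotes\.
xFilesFactor = 0
aggregationMethod = sum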

Having trouble getting accurate numbers from graphite

I have an application that publishes a number of stats to Graphite via statsd. One of the stats simply sends an increment to statsd every time a message is received by the service. I need to display a graph that shows the relative traffic over time for this stat. Generally speaking, I should be able to display a graph that refreshes every, say, 10 seconds, and displays how many messages were received in those 10 seconds, as well as the history for a given period of time. However, no matter how I format my API query, I cannot seem to get accurate data. I've read a number of articles, including this one:
http://code.hootsuite.com/accurate-counting-with-graphite-and-statsd/
That seems to give some good insight but is still not quite giving me what I need. This is the closest I have come:
integral(hitcount(stats.recieved, "10seconds"))
However, I don't like the cumulative result of this, and when I run it I get statistics that come nowhere near what I see in my logs for messages received. I am OK with accepting some packet loss, but we are talking about orders of magnitude. I know I am doing something wrong; just hoping someone can give me some insight as to what.
A couple of things to check/try:
Configure Graphite for Statsd
Check to make sure that you've used retention schema and aggregation settings in Graphite that match how Statsd will be sending data (i.e. it sends one data point per 10-second flush interval).
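For reference, the storage schema that matches Statsd's default 10-second flush looks roughly like this (it mirrors the retentions mentioned below; see the Statsd Graphite docs for the complete recommended settings):
# storage-schemas.conf
[stats]
pattern = ^stats.*
retentions = 10s:6h,1min:7d,10min:5y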
Run a single Statsd aggregator
Be sure you are only running one instance of Statsd, as running multiple statsd daemons will cause metrics to be dropped (since Graphite will be configured to store only one data point for its highest precision of 10s:6h).
Limit the time range in the UI or URL API to less than 6 hours
When displaying graphs with data that crosses the 6-hour threshold (e.g. from now to 7 hours ago), you will begin seeing 1 minute's worth of aggregated count data per point in the displayed graph (if you've configured Graphite for statsd with retentions = 10s:6h,1min:7d,10min:5y). Rollups occur based on the oldest data point in the time range (e.g. now till 7+ days = you'll get 10 min rollups).
If sending sparse or "bursty" data AND displaying old time range (triggering aggregation)
Confirm that your xFilesFactor is low enough that aggregation produces non-null values even with a high rate of nulls. For example, 100 requests in the first 10 seconds and none for the remaining 50 seconds in a minute would cause a storage of 100, null, null, null, null, null, which would be summed up to null when the data ages if the xFilesFactor is higher than 1/6. Using the statsd-recommended graphite configuration handles this, but it is good to know about... as this can give the appearance of lost data.
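The relevant part of the Statsd-recommended storage-aggregation.conf looks roughly like this excerpt (check the statsd docs for the full set of rules; the point is sum aggregation and a low xFilesFactor for count metrics):
# storage-aggregation.conf (excerpt)
[count]
pattern = \.count$
xFilesFactor = 0
aggregationMethod = sum

[count_legacy]
pattern = ^stats_counts.*
xFilesFactor = 0
aggregationMethod = sum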
Saving schema or aggregation changes
If you changed the graphite schema or aggregation settings after any metrics were stored (in whisper = graphite's storage) you'll need to either delete the .wsp files for the metric (graphite will recreate them) or run whisper-resize.py.
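For example, resizing an existing file to the retentions above could look like this (the filename is just an example; run it against each affected .wsp file):
whisper-resize.py my_metric_data.wsp 10s:6h 1min:7d 10min:5y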
Validating settings
You can verify the settings against some whisper data by running whisper-info.py on a .wsp file. Find the .wsp file for one of your metrics in /graphite/storage/whisper/ and run: whisper-info.py my_metric_data.wsp. The output should tell you more about how the storage settings are working.
TLDR;
You should ensure that Graphite is set to store one data point per 10-second interval for metrics coming from StatsD. You should make sure that Graphite is summing (not averaging) count data coming from Statsd. Both of these can be handled by using the recommended Statsd configuration settings. Don't run more than one Statsd aggregator. When using the UI, limit the data returned to less than 6 hours, or understand what rollup you are viewing when looking at data that crosses retention thresholds. Lastly, make sure the settings have taken effect (if you've already been sending metrics).
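As a concrete example, once the retention and sum aggregation match StatsD's flush, a non-cumulative per-10-second count can be charted by dropping the integral() from the expression in the question:
hitcount(stats.recieved, "10seconds")
If the legacy stats_counts namespace is enabled in StatsD, summarize(stats_counts.recieved, "10s", "sum") should give the same per-interval counts directly from the raw counter values.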
