When we request a metric's data points in Graphite, say http://graphite-server/render?target=&width=1200&height=750&format=json,
how many data points are retrieved? That is, how many data points does it return by default if we don't specify the 'from' and 'until' parameters?
Does Graphite return all the data points from the Whisper database, or
does it return all the data points that fall within the first retention period, or
does it return a certain fixed number of data points?
For my local Graphite instance I see that, by default, it returns all the data points within the last 24 hours. But on my company's server I see that Graphite returns data points for the last 30 hours.
Recently, I read somewhere that by default Graphite returns data points for the last 24 hours.
(In the question, I said I saw data points for the last 30 hours on my company's Graphite instance. That turned out not to be true (maybe I misunderstood it :)). The Graphite instance at my company also sends data points for the last 24 hours by default :))
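For reference, you can always request an explicit window instead of relying on the default. A minimal sketch (the metric name some.metric is a placeholder, not from the original question):

http://graphite-server/render?target=some.metric&from=-6h&until=now&format=json

Both parameters accept relative offsets like -6h or -30min as well as absolute timestamps.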
I have Graphite running in a Docker container and I've fed 24 hours' worth of data, sampled at 20-minute intervals, to nine metrics – far from a large payload. If I graph each metric in the Graphite web app, the last six hours of data are invisible. If I pull the raw data from the render API, these data points are indeed null (timestamps with no value).
However, if I narrow the time range down to the last six hours, the graphs display all the data I would expect. Weirder still, if I try to view this data using Grafana, the same thing happens: the last six hours are not shown unless I shrink the time range.
Is there any way to fix this so that recent data points are visible while viewing more than 6 hours of data?
I would start by looking at the storage-schemas.conf and storage-aggregation.conf files.
Do you have a different retention after the 6-hour mark?
We had a similar issue with data disappearing after the first 24 hours, where we had high resolution. We had to tune how data is aggregated to the "next level".
Or maybe the data just hasn't been written to disk yet and only exists in the carbon-cache at the moment.
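As an illustration, here is roughly the kind of setup that produces this symptom when the first (high-resolution) retention covers exactly 6 hours; the section names, patterns, and values below are assumptions, not your actual config:

# storage-schemas.conf
[my_metrics]
pattern = ^my_metrics\.
retentions = 20m:6h,1h:30d

# storage-aggregation.conf
[my_metrics]
pattern = ^my_metrics\.
xFilesFactor = 0.5
aggregationMethod = average

With a schema like this, any view wider than 6 hours is served from the coarser 1h archive, and a rollup point only survives if at least xFilesFactor of the underlying slots are non-null, so sparse data can vanish exactly the way you describe. Lowering xFilesFactor (toward 0) is the usual tuning knob.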
When using https://www.linkedin.com/countserv/count/share?format=json&url= to access an article's share count, is there a daily API limit?
We noticed that retrieving count data was taking as much as 20 seconds on our production server. We added logic to cache the counts, and the 20-second delay stopped the next day. We are still left wondering what the limit might be (we can't seem to find it in your documentation).
I am tracking a number of events on a website and am trying to extract some analytics data via the api. The problem I have found can be boiled down to this scenario. If I want to know how many unique events have happened per day, I can run a query such as:
?start-date=2016-02-19&end-date=2016-02-24&metrics=ga%3AuniqueEvents&dimensions=ga%3Adate
which will give me a table of the number of unique events per day from Feb 19th to Feb 24th. In my specific example, I will have a row that says I had 12,914 unique events on Feb 22nd.
If I now change the time period for the query to something like this:
?start-date=2016-02-01&end-date=2016-05-01&metrics=ga%3AuniqueEvents&dimensions=ga%3Adate
I will basically get the same table, only from Feb 1st to May 1st. What surprises me though is that the column for Feb 22nd now reads 12,966 events, while my assumption would be that this number should stay the same.
Is there something I'm missing here? In which scenario would these numbers make sense? Thanks for your help!
Check the API response for the value of containsSampledData.
Sampling is the practice of selecting a subset of data from your traffic and reporting on the trends available in that sample set.
You can specify the sampling level to use for a request by setting the samplingLevel parameter to HIGHER_PRECISION.
You can also try simplifying your request by shortening the date range, or requesting fewer dimensions.
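For example, the earlier query with the sampling level raised would look like this (identical to the query above except for the added samplingLevel parameter):

?start-date=2016-02-01&end-date=2016-05-01&metrics=ga%3AuniqueEvents&dimensions=ga%3Adate&samplingLevel=HIGHER_PRECISION

If containsSampledData then comes back false, the per-day numbers should be stable regardless of the surrounding date range.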
I have been a happy user of Graphite+Grafana for a few months now and I have been advocating it around my firm.
My approach has been to measure data of interest, collect it into 1-minute or 5-minute buckets, and send that information to Graphite. I was recently contacted by a group that processes quotes (billions a day!) whose approach has been to create a log line each time their applications process 1 million quotes. The problem is that the interval between two log lines can be highly erratic, ranging from 1 second to a few hours.
The dilemma is this: should I set my retention policy to a 1-second bucket so that I can see all measurements associated with spikes, or should I use, say, a 1-minute bucket so that the number of data points to be saved and later queried is much more manageable? FYI, when I set it to 1 second, showing the data for 8 or 10 charts over a few days brought the system (or at least my browser) to a crawl because of the number of data points (mostly NULL) being pushed from Graphite to Grafana.
Here's my retention policy: 1s:10d,1m:36d,5m:180d
Alternatively, is there a way to configure Grafana+Graphite to only retrieve non-NULL data points?
What do you recommend?
You can always specify a lower retention period for 1s metrics, so when you show a longer range Graphite will send you only the coarser level.
For example, you can specify: 1s:2d, 1m:7d, 5m:180d
This way, if you show a range reaching back more than 2 days you will get 1m resolution (and so on), which won't bring your browser to a crawl, while you will still be able to inspect spikes within the last 2 days.
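As a sketch, that policy would live in storage-schemas.conf along these lines (the section name and pattern are placeholders for your actual metric prefix):

[quotes]
pattern = ^quotes\.
retentions = 1s:2d,1m:7d,5m:180d

Keep in mind that a changed retention only applies to newly created Whisper files; existing .wsp files have to be converted with whisper-resize.py before they pick up the new policy.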
I have an application that publishes a number of stats to Graphite via StatsD. One of the stats simply sends an increment to StatsD every time a message is received by the service. I need to display a graph that shows the relative traffic over time for this stat. Generally speaking, I should be able to display a graph that refreshes every, say, 10 seconds and displays how many messages were received in those 10 seconds, as well as the history for a given period of time. However, no matter how I format my API query I cannot seem to get accurate data. I've read a number of articles, including this one:
http://code.hootsuite.com/accurate-counting-with-graphite-and-statsd/
That seems to give some good insight but is still not quite giving me what I need. This is the closest I have come:
integral(hitcount(stats.recieved, "10seconds"))
However, I don't like the cumulative result of this, and when I run it I get statistics that come nowhere near what I see in my logs for messages received. I am OK with accepting some packet loss, but we're talking about orders of magnitude. I know I am doing something wrong; I'm just hoping someone can give me some insight as to what.
A couple of things to check/try:
Configure Graphite for StatsD
Check to make sure that you've used the retention schema and aggregation settings in Graphite that match how StatsD will be sending data (i.e. it sends one data point per 10-second flush interval); see the sketch below.
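For reference, a schema matching a 10-second flush interval would look like this in storage-schemas.conf (this mirrors the commonly recommended StatsD settings; adjust the pattern to your namespace):

[stats]
pattern = ^stats.*
retentions = 10s:6h,1min:7d,10min:5y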
Run a single StatsD aggregator
Be sure you are only running one instance of StatsD, as running multiple StatsD daemons will cause metrics to be dropped (since Graphite will be configured to store only one data point for its highest precision of 10s:6h).
Limit the time range in the UI or URL API to less than 6 hours
When displaying graphs with data that crosses the 6-hour threshold (e.g. from now to 7 hours ago), you will begin seeing 1 minute's worth of aggregated count data per point in the displayed graph (if you've configured Graphite for StatsD with retentions = 10s:6h,1min:7d,10min:5y). Rollups occur based on the oldest data point in the time range (e.g. a range reaching back more than 7 days gets you 10-minute rollups).
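For example, a render request that stays within the highest-precision 10s archive from the schema above would look like this (using the metric name and hitcount interval from the question):

/render?target=hitcount(stats.recieved,"10seconds")&from=-5h&format=json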
If sending sparse or "bursty" data AND displaying an old time range (triggering aggregation)
Confirm that your xFilesFactor is low enough that aggregation produces non-null values even with a high rate of nulls. For example, 100 requests in the first 10 seconds and none for the remaining 50 seconds of a minute would be stored as 100, null, null, null, null, null, which would be rolled up to null as the data ages if the xFilesFactor is higher than 1/6. Using the StatsD-recommended Graphite configuration handles this, but it is good to know about, as this can give the appearance of lost data.
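The corresponding storage-aggregation.conf entry would look something like this (again mirroring the commonly recommended StatsD settings; the pattern is illustrative):

[stats_counts]
pattern = ^stats_counts.*
xFilesFactor = 0
aggregationMethod = sum

xFilesFactor = 0 keeps a rollup point even when only one of the six 10-second slots in a minute is non-null, and aggregationMethod = sum keeps counts additive instead of averaging them away.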
Saving schema or aggregation changes
If you changed the Graphite schema or aggregation settings after any metrics were stored (in Whisper, Graphite's storage), you'll need to either delete the .wsp files for the metric (Graphite will recreate them) or run whisper-resize.py.
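A sketch of the resize invocation, assuming the Whisper location mentioned below and the metric name from the question:

whisper-resize.py /graphite/storage/whisper/stats/recieved.wsp 10s:6h 1min:7d 10min:5y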
Validating settings
You can verify the settings against some Whisper data by running whisper-info.py on a .wsp file. Find the .wsp file for one of your metrics in /graphite/storage/whisper/
and run: whisper-info.py my_metric_data.wsp. The output should tell you more about how the storage settings are working.
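For example, to locate and inspect a file in one go (the path assumes the layout mentioned above):

find /graphite/storage/whisper/stats -name '*.wsp'
whisper-info.py /graphite/storage/whisper/stats/recieved.wsp

In the output, check that secondsPerPoint and retention for each archive match your storage-schemas.conf, and that xFilesFactor and aggregationMethod match your storage-aggregation.conf.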
TL;DR
You should ensure that Graphite is set to store one data point per 10-second interval for metrics coming from StatsD. You should make sure that Graphite is summing (not averaging) count data coming from StatsD. Both of these can be handled by using the recommended StatsD configuration settings. Don't run more than one StatsD aggregator. When using the UI, limit the data returned to less than 6 hours, OR understand what rollup you are viewing when looking at data that crosses retention thresholds. Lastly, make sure the settings take effect (if you've already been sending metrics).