My current retention rule is like so:
[whatever]
priority = 110
pattern = ^stats\.whatever\..*
retentions = 60:10080,600:262974
If I understand correctly, this will save 7 days of 1 minute data (10080 points at 60 seconds each) and 5 years of ten minute data.
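The arithmetic behind a retention string like this can be checked with a short sketch. It assumes the legacy format shown above, where each archive is written as secondsPerPoint:numberOfPoints with plain integers; the function name retention_spans is made up for illustration.

```python
def retention_spans(retentions):
    """Parse 'secondsPerPoint:points' archives and return (seconds_per_point, points, days)."""
    spans = []
    for archive in retentions.split(","):
        seconds_per_point, points = (int(part) for part in archive.strip().split(":"))
        total_seconds = seconds_per_point * points
        spans.append((seconds_per_point, points, total_seconds / 86400))
    return spans


spans = retention_spans("60:10080,600:262974")
for seconds_per_point, points, days in spans:
    print(f"{seconds_per_point}s x {points} points = {days:.1f} days")
```

The first archive works out to 7 days and the second to roughly 5 years.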
I have been sending data to graphite for the last couple of hours and I can see a graph of this data, but only for ranges of less than 7 hours. If I try to visualize this data for a range of, for example, 1 day, the resulting graph doesn't show a single data point.
Is this caused by my retention rule?
thanks in advance.
I had this same problem. After you change your retention rules, you need to restart carbon-cache.py. If you want to keep the data you have, you also need to run whisper-resize.py on your existing whisper files (.wsp).
This link should help too:
https://answers.launchpad.net/graphite/+question/140289
However, in that link the parameters passed to whisper-resize.py are in the wrong order. It should be: whisper-resize.py <file> <retention rate>
Here's a helpful command for resizing:
find /opt/graphite/storage/whisper -type f -name "*.wsp" -exec whisper-resize.py {} <retention rate> \;
Adjust it as needed.
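If you would rather review the commands before running them, here is a minimal dry-run sketch in Python that builds the same whisper-resize.py invocations as the find command above. The helper name build_resize_commands and the example retention values are assumptions for illustration; it only prints the commands and does not execute anything.

```python
import os
import shlex


def build_resize_commands(whisper_root, retentions):
    """Return one whisper-resize.py argument list per .wsp file under whisper_root."""
    commands = []
    for dirpath, _dirnames, filenames in os.walk(whisper_root):
        for name in sorted(filenames):
            if name.endswith(".wsp"):
                path = os.path.join(dirpath, name)
                # Argument order: file first, then one or more retention archives.
                commands.append(["whisper-resize.py", path, *retentions])
    return commands


# Dry run: print the commands instead of executing them.
for command in build_resize_commands("/opt/graphite/storage/whisper",
                                     ["60:10080", "600:262974"]):
    print(shlex.join(command))
```

Once the printed list looks right, each argument list could be handed to subprocess.run, ideally after backing up the whisper directory.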
I had a similar problem; for me it wasn't the retention rules, but the aggregation rules. By default, my counters were being assigned --aggregationMethod average and --xFilesFactor 0.5. But my data was nowhere near that dense, so the aggregator was throwing away my data on the grounds that there wasn't a statistically significant sample available.
In my particular use case, I was interested in the peak value over some time period, so I used whisper-resize.py to reconfigure my database: --aggregationMethod max, --xFilesFactor 0.0 gave me the behavior I was expecting.
See also storage-aggregation.conf
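To make the xFilesFactor effect concrete, here is a simplified sketch of the roll-up decision as I understand it: a lower-resolution point is only written when the fraction of known values in the window is at least xFilesFactor. This is an illustration, not whisper's actual code, and the rollup function is a made-up name.

```python
def rollup(values, method="average", x_files_factor=0.5):
    """Aggregate one window of higher-resolution slots (None means no data)."""
    known = [v for v in values if v is not None]
    if not values or not known:
        return None
    # Too sparse: the window is discarded instead of aggregated.
    if len(known) / len(values) < x_files_factor:
        return None
    if method == "average":
        return sum(known) / len(known)
    if method == "max":
        return max(known)
    if method == "sum":
        return sum(known)
    raise ValueError(f"unsupported method: {method}")


# Ten one-minute slots with a single data point in them.
sparse_window = [None] * 9 + [42.0]
print(rollup(sparse_window, "average", 0.5))  # dropped, only 10% of slots are known
print(rollup(sparse_window, "max", 0.0))      # kept
```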
I am currently trying to compile temperature information from the WDFE5 data set, which is quite large, and am struggling to find an efficient way to meet my goal. My main goals are to:
Determine the max temperature for individual days for each individual grid cell
Change the time step from hourly to daily and from UTC to MST.
The data set is stored in monthly NC4 files and contains the temperature data in a 3-dimensional matrix (time, lat, lon). My main question is whether there is an efficient way to compile this data to meet my goals, or to manipulate the NC4 files so they are easier to work with (somehow merge the monthly files into one mega file?).
I have tried two rather convoluted ways to catch holes between months (for example, due to the time conversion, some dates end up spanning two months, which requires me to read in the next file and then continue reading the data).
My first try was to read one month (one file) at a time, using pmax() to get the max value of each grid cell across 24 hourly time steps, and then repeating the process. I have been using ncvar_get() with start and count to read only one time step at a time. To catch days that span two months, I wrote a convoluted function that merges the two by calculating the number of 1-hour periods left in one month and how many are needed from the next.
My second try still involved pmax(), but I used a different method to fill in any holes between months. I built a date vector from the time variable, with one entry per hourly time step, and matched time steps by day. While this seems better, it still has to read multiple NC4 files, which gets very convoluted compared to just reading one NC4 file with all the needed information.
In the end, I tested a few cases and both solutions seem to work, but they run extremely slowly and seem very overcomplicated to me. I was wondering if anyone had suggestions on how to better set up the NC4 files for reading and time conversion.
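For reference, the "match by same day" idea can be sketched for a single grid cell as below. This is only an illustration in Python, not the R code described above. It assumes the time values have already been decoded into timezone-aware UTC datetimes, and it treats MST as a fixed UTC-7 offset with no daylight saving. The name daily_max and the sample data are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: MST is a fixed UTC-7 offset (no daylight saving).
MST = timezone(timedelta(hours=-7), "MST")


def daily_max(times_utc, temps, tz=MST):
    """Group hourly values by local calendar day and keep the maximum per day."""
    best = {}
    for moment, temp in zip(times_utc, temps):
        local_day = moment.astimezone(tz).date()
        if local_day not in best or temp > best[local_day]:
            best[local_day] = temp
    return best


# 48 hourly values that cross a month boundary (31 Jan to 1 Feb, UTC).
start = datetime(2020, 1, 31, 0, tzinfo=timezone.utc)
times = [start + timedelta(hours=h) for h in range(48)]
temps = [float(h) for h in range(48)]
result = daily_max(times, temps)
for day, value in sorted(result.items()):
    print(day, value)
```

Because the grouping key is the local date rather than the file, hours read from two consecutive monthly files land in the same bucket without any special handling of the month boundary.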
I am concatenating 1000s of nc-files (outputs from simulations) to allow me to handle them more easily in Matlab. To do this I use ncrcat. The files have different sizes, and the time variable is not unique between files. The concatenation works well and allows me to read the data into Matlab much more quickly than reading the files individually. However, I want to be able to identify the original nc-file from which each data point originates. Is it possible to, say, add the source filename as an extra variable so I can trace back the data?
Easiest way: Online indexing
Before we start, I would use an integer index rather than the filename to identify each run, as it is a lot easier to handle, both for writing and then for handling in the Matlab programme. Rather than a simple monotonically increasing index, the identifier can have relevance for your run, or you can even write several separate indices if necessary (e.g. a number for the resolution, the date, the model version, etc.).
So, the obvious way to do this that I can think of would be that each simulation writes an index to the file to identify itself. i.e. the first model run would write a variable
myrun=1
the second
myrun=2
and so on... then when you cat the files the data can be uniquely identified very easily using this index.
Note that if your spatial dimensions are not unique and the number of time steps also changes from run to run from what you write, your index will need to be a function of all the non-unique dimensions, e.g. myrun(x,y,t). If any of your dimensions are unique across all files then that dimension is redundant in the index and can be omitted.
Of course, the only issue with this solution is it means running the simulations again :-D and you might be talking about an expensive model to run or someone else's runs you can't repeat. If rerunning is out of the question you will need to try to add an index offline...
Offline indexing (easy if grids are same, more complex otherwise)
If your space dimensions are the same across all files, then this is still an easy task, as you can add an index offline across all the time steps in each file using nco:
ncap2 -s 'myrun[$time]=array(X,0,$time)' infile.nc outfile.nc
or if you are happy to overwrite the original file (be careful!)
ncap2 -O -s 'myrun[$time]=array(X,0,$time)' infile.nc infile.nc
where X is the run number. This adds a new variable myrun, which is a function of time, and puts X at each time step. When you merge the files you can see which data slice came from which run.
By the way, the zero (the second argument) is the increment. As it is set to zero, the number X is written for all time steps in a given file. If it were 1, the index would increase by one each time step, which could be useful in some cases: for example, you might use two indices, one with an increment of zero to identify the run, and a second with an increment of unity to tell you which step of the Xth run the data slice belongs to.
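To illustrate the start/increment semantics without touching any files, here is a small Python sketch that mimics what array(start, increment, $time) produces and what the concatenated index looks like. The helper index_values and the run lengths are made up, and starting the step index at 0 is my own choice.

```python
def index_values(start, increment, n_steps):
    """Mimic array(start, increment, $time): start, start+increment, ..."""
    return [start + increment * step for step in range(n_steps)]


# Two hypothetical runs with 3 and 2 time steps.
run_lengths = {1: 3, 2: 2}

myrun = []   # increment 0: constant run number per file
mystep = []  # increment 1: step counter within each file
for run_number, n_steps in run_lengths.items():
    myrun += index_values(run_number, 0, n_steps)
    mystep += index_values(0, 1, n_steps)

print(myrun)
print(mystep)
```

After concatenation, each time slice carries both its run number and its position within that run.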
If your files are for different domains too, then you might want to put them on a common grid before you do that... I think for that
cdo enlarge
might be of help; see this post: https://code.mpimet.mpg.de/boards/2/topics/1459
I agree that an index will be simpler than a filename. I would just add to the above answer that the command to add a unique index X with a time dimension to each input file can be simplified to
ncap2 -s 'myrun[$time]=X' in.nc out.nc
I am trying to understand how Graphite treats oversampling. I read the documentation but could not find the answer.
For example, if I specify in Graphite that the retention policy should be 1 sample per 60 seconds and Graphite receives something like 200 values in 60 seconds, what exactly will be stored? Will Graphite take an average, or a random point among those 200 points?
Short answer: it depends on the configuration; the default is to keep the last one.
Long answer: Graphite lets you configure, using regular expressions, a strategy for aggregating several points into one sample.
These strategies are configured in the storage-aggregation.conf file, using a regexp to select metrics:
[all_min]
pattern = \.min$
aggregationMethod = min
This example configuration aggregates points using their minimum.
By default, the last point to arrive wins.
This strategy is always used when aggregating from higher resolutions to lower resolutions.
For example, if storage-schemas.conf contains:
[all]
pattern = .*
retentions = 1s:8d,1h:1y
Given the sum aggregation method, the points in each one-hour window of the 1s archive are summed and stored at one-hour resolution.
Data older than 8 days is available only at that one-hour resolution.
The aggregation configuration only applies when moving from archive i to archive i+1. For oversampling, Graphite always picks the last sample in the period.
The recommendation is to match sampling rate with the configuration.
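The "last sample wins" behaviour for oversampled data can be pictured with a small sketch. It assumes each incoming timestamp is aligned down to the start of its slot and that a later write to the same slot overwrites the earlier one. The function store_points is illustrative only and is not Graphite code.

```python
def store_points(points, seconds_per_point):
    """Simulate writing (timestamp, value) pairs into fixed-width slots."""
    slots = {}
    for timestamp, value in points:
        # Align the timestamp to the start of its slot.
        slot = timestamp - (timestamp % seconds_per_point)
        # A later point for the same slot replaces the earlier one.
        slots[slot] = value
    return slots


# Three values arriving within the same 60-second slot.
incoming = [(1395315785, 1.0), (1395315800, 100.0), (1395315830, 7.0)]
stored = store_points(incoming, 60)
print(stored)
```

Only one value per slot survives, which is why matching the sampling rate to the retention configuration is the safer setup.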
see graphite issue
I'm using Graphite and Carbon-cache and trying to understand why it doesn't appear to be applying aggregation to data.
I have a whisper database:
whisper-create.py /opt/graphite/storage/whisper/test/test.wsp 60:1y
From the metadata I am using an average aggregation method:
Meta data:
aggregation method: average
max retention: 31536000
xFilesFactor: 0.5
And I am writing two values to it:
echo "test.test 1 `date +%s`" | nc localhost 2003;
echo "test.test 100 `date +%s`" | nc localhost 2003;
When I look at my whisper database I see the following value:
42: 1395315780, 100
I would have expected this value to be (100 + 1) / 2 = 50.5
It appears to be using the last value, rather than an average of the two values.
I feel like I may be missing something here. Could anyone explain?
The answer was to use the carbon-aggregator, not the carbon-cache.
The carbon-cache will always replace the value, no matter what. If time per point is 1 second and you send more than one value within a second then the last value is what will be stored.
If you want more than one value to be kept you need to use the carbon-aggregator (running on a different port) and configure how it should aggregate the data (sum, average).
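Conceptually, the aggregation step buffers the values that fall into the same interval and emits one combined value per interval. The sketch below shows that idea in Python and formats the result as lines for the plaintext port used in the question. It is a rough illustration, not carbon-aggregator's implementation, and the function names are made up.

```python
from collections import defaultdict


def aggregate_by_interval(points, interval, method="average"):
    """Combine (timestamp, value) pairs that fall into the same interval."""
    buckets = defaultdict(list)
    for timestamp, value in points:
        buckets[timestamp - (timestamp % interval)].append(value)
    aggregated = {}
    for slot, values in sorted(buckets.items()):
        if method == "sum":
            aggregated[slot] = sum(values)
        else:
            aggregated[slot] = sum(values) / len(values)
    return aggregated


def to_plaintext(metric, aggregated):
    """Format one 'metric value timestamp' line per aggregated slot."""
    return [f"{metric} {value} {slot}" for slot, value in aggregated.items()]


points = [(1395315785, 1), (1395315800, 100)]
averaged = aggregate_by_interval(points, 60)
lines = to_plaintext("test.test", averaged)
print(lines)
```

With the two values from the question, the single line sent for that minute carries 50.5 rather than 100.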
I had the same problem and no access to the graphite / whisper settings.
There is another solution: aggregate the data externally and then send it to the graphite data port.
https://github.com/floringavrila/graphite-feeder
I'm trying to graph data using statsd and graphite. I have a simple counter, I increment it by 1, and then when I graph the values for the counter over the day, I see strange values like 0.09 as the peak in my graph (see http://i.stack.imgur.com/o4gmz.png)
This graph should be showing 2 logins, but instead it's showing 0.09. If I change the time scale from 1 day to the last 15 minutes, then it correctly shows the two logins (see http://i.stack.imgur.com/23vDJ.png)
I've set up my finest retention to be in 10s increments in storage-schemas.conf:
retentions = 10s:7d,1m:21d,24h:5y
I've set up my storage-aggregation.conf file to sum counts:
[sum]
pattern = \.count$
xFilesFactor = 0
aggregationMethod = sum
(And, before you ask, yes; this is a .count).
If I try my URL with &rawData=true then in either case I see some Nones, some 0.0s, and a pair of 1.0s separated by some 0.0s. I never see these fractional values that somehow show up on the graph. So... Is this a bug? Am I doing something wrong?
There's also the consolidateBy function, which tells Graphite what to do if there are not enough pixels to draw everything accurately. By default it uses the average, hence the strange results when time ranges are larger. Here is an excerpt from the documentation:
When a graph is drawn where width of the graph size in pixels is
smaller than the number of datapoints to be graphed, Graphite
consolidates the values to prevent line overlap. The
consolidateBy() function changes the consolidation function from the
default of ‘average’ to one of ‘sum’, ‘max’, or ‘min’. This is
especially useful in sales graphs, where fractional values make no
sense and a ‘sum’ of consolidated values is appropriate.
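A quick way to see where a value like 0.09 can come from is to consolidate a run of raw points by hand. The sketch below assumes, hypothetically, that 22 raw datapoints end up behind one pixel and that None values are ignored. The function consolidate is made up for illustration and is not Graphite's code.

```python
def consolidate(values, points_per_pixel, func="average"):
    """Collapse consecutive datapoints into one value per pixel."""
    consolidated = []
    for i in range(0, len(values), points_per_pixel):
        known = [v for v in values[i:i + points_per_pixel] if v is not None]
        if not known:
            consolidated.append(None)
        elif func == "sum":
            consolidated.append(sum(known))
        elif func == "max":
            consolidated.append(max(known))
        else:
            consolidated.append(sum(known) / len(known))
    return consolidated


# 22 raw 10-second points: mostly 0.0, with two logins.
raw = [0.0] * 22
raw[5] = 1.0
raw[12] = 1.0

print(round(consolidate(raw, 22)[0], 2))  # averaged: a fraction of a login
print(consolidate(raw, 22, "sum")[0])     # summed: both logins
```

With the default average the two logins are smeared into a fraction, while sum keeps the count intact, which is the effect of wrapping the target in consolidateBy(..., 'sum').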
Another function that could be useful is hitcount. A short excerpt explaining why it's useful:
This function is like summarize(), except that it compensates
automatically for different time scales (so that a similar graph
results from using either fine-grained or coarse-grained records) and
handles rarely-occurring events gracefully.
I spent some time scratching my head over why I got fractions for my counter with time ranges longer than a couple of hours when my aggregation rule is max. It's pretty confusing, especially at the beginning when you play with single counters to see if everything works. Checking rawData is quite a good sanity check when debugging ;)