My problem is that Graphite is not showing the correct data, as if it were not aggregating the data properly.
What I want to do is create a view counter using Graphite.
My Configuration
I'm using Node-StatsD to send data to Carbon:
var client = new StatsD({
  host: config.host,
  port: config.port,
  prefix: config.prefix
});
client.increment("bucketName");
I have the following storage-schemas.conf:
[default]
pattern = .*
retentions = 1h:14d,1d:99y
Everything else is left at its default, including storage-aggregation.conf, which by default uses the aggregation method sum and an xFilesFactor of 0.
The Problem
I've confirmed that the data is able to reach Graphite as it did create my bucket and stats.counters.statsd.metrics_received.count does increase.
However, my bucket count does not increase. Here are the results of my queries:
target=bucket.count&rawData=true gives
bucket.count,1553497200,1553583600,3600|0.0,0.0,0.0,...,0.0,0.0
target=summarize(bucket.count,"1d")&format=json gives
{"datapoints": [[0.0, 1553472000], [0.0, 1553558400]], "target":...}
hitcount(bucket.count,"1d") gives
{"datapoints": [[3600.0, 1553497200]], "target":...}
hitcount does return something other than 0, but the count is far higher than what I sent to Graphite. I sent at most one or two dozen increments in the last hour.
Using: telegraf version 1.23.1
The workflow is Telegraf => Influx => Grafana.
I am using Telegraf to check my metrics on a shared server. So far so good: I could already initialize the Telegraf uWSGI plugin and display the data of my running Django projects in Grafana.
Problem
Now I also want to check some folder sizes with the [[inputs.filecount]] Telegraf plugin, and this works well too. However, I do not need metrics every 10s for this plugin, so I changed the interval inside the [[inputs.filecount]] plugin section, as described in the documentation.
telegraf.conf
[agent]
interval = "10s"
round_interval = true
metric_batch_size = 1000
metric_buffer_limit = 10000
collection_jitter = "5s"
flush_interval = "10s"
flush_jitter = "0s"
#... PLUGIN
[[inputs.filecount]]
# set different interval for this input plugin every 10min
interval=“600s”
collection_jitter=“20s”
# Default from Doc =>
directories = ["/home/myserver/logs", "/home/someName/growingData, ]
name = "*"
recursive = true
regular_only = false
follow_symlinks = false
size = "0B"
mtime = "0s"
After restarting Telegraf with Supervisor, it crashed because it could not parse the new lines.
supervisor.log
Error running agent: Error loading config file /home/user/etc/telegraf/telegraf.conf: Error parsing data: line 208: invalid TOML syntax
These are the lines I added, because I understood the documentation to describe it this way.
telegraf.conf
# set different interval for this input plugin every 10min
interval=“600s”
collection_jitter=“20s”
Question
So my question is: how can I change or set up the interval for a single input plugin in Telegraf?
Or do I have to use a different TOML syntax, such as [[inputs.filecount.agent]]?
I assume that I do not have to change any output interval as well? Even though the flush interval is currently 10s, if this input plugin only collects data every 600s it should not matter, because some flush cycle will push the data to Influx.
How can i change or setup the interval for a single input plugin in telegraf?
As the link you pointed to shows, individual inputs can set the interval and collection_jitter options. There is no difference in the TOML syntax. For example, I can do the following for the memory input plugin:
[[inputs.mem]]
interval="600s"
collection_jitter="20s"
I assume that i do not have to change any output interval also?
Correct, these are independent of each other.
line 208: invalid TOML syntax
Knowing exactly what is on line 208 and around it will hopefully resolve your issue and get you going again. Also make sure the quotes you used are plain ASCII quotes. When people copy and paste, they sometimes end up with ” instead of ", which TOML does not accept as a string delimiter.
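To locate such characters quickly, a small script can scan the config text for typographic quotes. This is only a sketch: the function name find_smart_quotes and the sample text are made up for illustration, and it does not parse TOML at all, so it will also flag typographic quotes that sit harmlessly inside comments.

```python
# Characters that look like quotes but are not valid TOML string delimiters.
SMART_QUOTES = {
    "\u201c": "left double quotation mark",
    "\u201d": "right double quotation mark",
    "\u2018": "left single quotation mark",
    "\u2019": "right single quotation mark",
}


def find_smart_quotes(text):
    """Return (line_number, column, description) for each typographic quote."""
    hits = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for column, char in enumerate(line, start=1):
            if char in SMART_QUOTES:
                hits.append((line_number, column, SMART_QUOTES[char]))
    return hits


if __name__ == "__main__":
    # Hypothetical sample; in practice read the text of telegraf.conf instead.
    sample = '[[inputs.filecount]]\ninterval=\u201c600s\u201d\nsize = "0B"\n'
    for line_number, column, description in find_smart_quotes(sample):
        print(f"line {line_number}, column {column}: {description}")
```

Running it against the real file (read with encoding="utf-8") should point at the same line the Telegraf error reports, if the quotes are the cause.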
Using python-ldap.search_s() function (https://www.python-ldap.org/en/python-ldap-3.3.0/reference/ldap.html#ldap.LDAPObject.search_s) with params...
base = DC=myorg,DC=local
filterstr = (&(sAMAccountName={login})(|(memberOf=CN=zone1,OU=zones,OU=datagroups,DC=myorg,DC=local)(memberOf=CN=zone2,OU=zones,OU=datagroups,DC=myorg,DC=local)))
...to try to match against a specific AD user.
Yet when I look at the result returned (with login = myuser), I see something like:
[
(u'CN=zone1,OU=zones,OU=datagroups,DC=myorg,DC=local', {u'sAMAccountName': ['myuser']}),
(None, [u'ldap://DomainDnsZones.myorg.local/DC=DomainDnsZones,DC=myorg,DC=local']),
(None, [u'ldap://ForestDnsZones.myorg.local/DC=ForestDnsZones,DC=myorg,DC=local']),
(None, [u'ldap://myorg.local/CN=Configuration,DC=myorg,DC=local'])
]
where there are multiple other hits in the list (besides the myuser sAMAccountName match) that have nothing to do with the search filter.
Looking at the docs (https://www.python-ldap.org/en/python-ldap-3.3.0/faq.html) these appear to be "search continuations" / referrals that are included when the search base is at the domain level and it says that they can be turned off by including the code like...
l = ldap.initialize('ldap://foobar')
l.set_option(ldap.OPT_REFERRALS,0)
as well as trying
ldap.set_option(ldap.OPT_REFERRALS,0)
l = ldap.initialize('ldap://foobar')
...yet adding this code does not change the behavior at all and I get the same results (see https://www.python-ldap.org/en/python-ldap-3.3.0/reference/ldap.html?highlight=set_option#ldap.set_option).
Am I misunderstanding something here? Does anyone know how to stop these from showing up? Does anyone know the structure of the tuples that this function returns (the docs do not describe it)?
I just talked to someone more familiar with python-ldap and was told that OPT_REFERRALS controls whether the client automatically follows referrals, but it does not stop AD from sending them.
For now, the only approach they recommended was to filter these values out (here l is the connection object returned by ldap.initialize()) with something like:
results = l.search_s(...)
results = [x for x in results if x[0] is not None]
Noting that the structure of the results returned from search_s() is
[
( dn, {
attrname: [ value, value, ... ],
attrname: [ value, value, ... ],
}),
]
When an item is a referral, its DN is None and the entry dict is replaced with a list of URIs.
* (Note that in the search_s call you can request specific attributes to be returned in your search too)
* (Note that since my base DN is a domain level path, using the ldap.set_option(ldap.OPT_REFERRALS,0) snippet was still useful just to stop the search_s() from actually going down the referral paths (which was adding a few seconds to the search time))
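As a slightly fuller sketch of the filtering idea, the helper below separates real entries from referrals instead of discarding the referrals silently. The name split_search_results is hypothetical, and the sample data simply mirrors the shape of the result shown in the question; it works on any list shaped like the return value of search_s(), so it needs no LDAP connection to try out.

```python
def split_search_results(results):
    """Separate real entries from referrals in a search_s()-style result list."""
    entries = []
    referrals = []
    for dn, payload in results:
        if dn is None:
            # Referral: payload is a list of LDAP URIs, not an attribute dict.
            referrals.extend(payload)
        else:
            # Regular entry: payload maps attribute names to lists of values.
            entries.append((dn, payload))
    return entries, referrals


# Sample data in the same shape as the result from the question.
sample = [
    ("CN=zone1,OU=zones,OU=datagroups,DC=myorg,DC=local",
     {"sAMAccountName": ["myuser"]}),
    (None, ["ldap://DomainDnsZones.myorg.local/DC=DomainDnsZones,DC=myorg,DC=local"]),
    (None, ["ldap://myorg.local/CN=Configuration,DC=myorg,DC=local"]),
]
entries, referrals = split_search_results(sample)
print(len(entries), "entries,", len(referrals), "referral URIs")
```

Keeping the referral URIs around can help when debugging why a domain-level search base returns more than expected.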
Again, I believe this problem is caused by the base DN being a domain-level path (unless there is some other base DN or search filter I am missing that would cope with the group users being scattered across various AD paths in the domain).
I have the following aggregation rule:
abc.prod.ALL.<service>.<metric>.count (60) = sum abc.prod.*.<service>.<<metric>>.count
Given metrics like:
abc.prod.host1.aservice.ametric.count
abc.prod.host2.aservice.ametric.count
I would expect them to be aggregated to
abc.prod.ALL.aservice.ametric.count
But that metric is never created. In aggregator logs, I see
Allocating new metric buffer for abc.prod.ALL.aservice.ametric.count
but it's not created. If I add a layer to the generated metric like:
abc.prod.extralayer.ALL.<service>.<metric>.count (60) = sum abc.prod.*.<service>.<<metric>>.count
then we seem to get a recursive explosion of created metrics like:
abc.prod.extralayer.ALL.aservice.ametric.count
abc.prod.extralayer.ALL.ALL.aservice.ametric.count
abc.prod.extralayer.ALL.ALL.ALL.aservice.ametric.count
abc.prod.extralayer.ALL.ALL.ALL.ALL.aservice.ametric.count
Which led me to believe that the generated metric is then aggregated again...
I added a logging line to AggregationProcessor.process:
else:
    log.clients("Found aggregate " + aggregate_metric + " for " + metric)
    aggregate_metrics.add(aggregate_metric)
I then tried again with my original, desired rule, and eventually started to see log lines like:
Found aggregate abc.prod.ALL.aservice.ametric.count for abc.prod.ALL.aservice.ametric.count
It matched itself as if it was a new incoming metric... Why is it being fed back into the aggregator?
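The self-match can be reproduced outside of carbon with a rough approximation of how a rule's input pattern is matched. This is a simplified sketch, not carbon's actual code: it assumes the input pattern starts with abc.prod (the prefix of the example metrics), that * and <name> each match one dot-separated segment, and that <<name>> may span several segments. The function names are made up for illustration.

```python
import re


def rule_to_regex(input_pattern):
    """Translate an aggregation-rule input pattern into a regex (approximation)."""
    parts = []
    for part in input_pattern.split("."):
        if part.startswith("<<") and part.endswith(">>"):
            parts.append(r"(?P<%s>.+)" % part[2:-2])     # may span several segments
        elif part.startswith("<") and part.endswith(">"):
            parts.append(r"(?P<%s>[^.]+)" % part[1:-1])  # exactly one segment
        elif part == "*":
            parts.append(r"[^.]+")                       # exactly one segment
        else:
            parts.append(re.escape(part))
    return re.compile("^" + r"\.".join(parts) + "$")


def aggregate_name(output_template, input_regex, metric):
    """Return the aggregate metric name for `metric`, or None if it does not match."""
    match = input_regex.match(metric)
    if match is None:
        return None
    name = output_template
    for field, value in match.groupdict().items():
        name = name.replace("<%s>" % field, value)
    return name


regex = rule_to_regex("abc.prod.*.<service>.<<metric>>.count")

# The original rule: the aggregate matches the input pattern and maps to itself.
out = "abc.prod.ALL.<service>.<metric>.count"
print(aggregate_name(out, regex, "abc.prod.host1.aservice.ametric.count"))
print(aggregate_name(out, regex, "abc.prod.ALL.aservice.ametric.count"))

# The "extralayer" rule: each pass through the rule adds another ALL segment.
out2 = "abc.prod.extralayer.ALL.<service>.<metric>.count"
print(aggregate_name(out2, regex, "abc.prod.extralayer.ALL.aservice.ametric.count"))
```

Under these assumptions, the rule only loops if generated metrics are fed back through the rules, which is exactly what the added log line shows happening.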
This appears to have been a bug. It was not present in older versions but was in master at the time of my question.
If you are seeing this behaviour, follow the issue on GitHub:
https://github.com/graphite-project/carbon/issues/560
https://github.com/graphite-project/carbon/issues/455
There is no point in continuing the question here on SO.
Note: I am using the older version, 0.9.15, and am not seeing the problem, so I recommend that version until the bug is confirmed to be resolved in master.
I'm working with scollector and I want to have specific frequencies for different collectors.
For example:
get info from disk usage every 5 minutes
info from memory every minute
iostat every 30 seconds
and so on...
Here is a part of the conf.toml I made:
FullHost = true
Freq = 60
DisableSelf = true
[[iostat]]
Filter = "iostat"
Freq = 30
[[memory]]
Filter = "memory"
Freq = 60
But I get an error:
./scollector -conf="perso.toml" -p
2016/04/19 14:40:45 fatal: main.go:297: extra keys in perso.toml: [iostat iostat.Freq memory memory.Freq]
It seems that I cannot set multiple frequencies.
What should I do to get what I want?
Thank you all
According to scollector documentation, Freq is a global setting, so it's not possible to set different frequencies for each collector. The exception is for external collectors, which may be put in a folder named after the desired frequency (in seconds).
Freq is indeed a global setting, and a collector's interval is usually set to it. Some collectors do override the interval with a different value, though; for example, elasticsearch-indices runs every 15 minutes because there is a lot of data to pull.
To change it, either:
(best) modify the scollector code to read a freq parameter and pass it to every collector
(second best) file a GitHub issue
(last resort) change the intervals in the code of specific collectors and recompile scollector
Well, we might have found something.
We created different folders representing several Freq values (0, 30, 60, 120...), and in each folder we put the external collectors we need.
'/etc/collectors/0',
'/etc/collectors/15',
'/etc/collectors/30',
'/etc/collectors/60',
'/etc/collectors/120',
'/etc/collectors/300',
'/etc/collectors/600'
In the conf.toml:
ColDir = "/etc/scollector/collectors"
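Creating that layout by hand is easy, but a short script keeps it repeatable. This is just a sketch: make_frequency_dirs is a made-up helper, the list of frequencies is the one from the listing above, and the root path must match whatever ColDir points to in your conf.toml.

```python
import os


def make_frequency_dirs(col_dir, frequencies):
    """Create one sub-directory per frequency (in seconds) under col_dir."""
    paths = []
    for seconds in frequencies:
        path = os.path.join(col_dir, str(seconds))
        os.makedirs(path, exist_ok=True)  # no error if it already exists
        paths.append(path)
    return paths


if __name__ == "__main__":
    # Assumed root; use the same directory as ColDir in conf.toml.
    for path in make_frequency_dirs("/etc/scollector/collectors",
                                    [0, 15, 30, 60, 120, 300, 600]):
        print(path)
```

Each external collector script then goes into the folder named after the frequency it should run at.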
If we want the internal collectors, we have to rewrite them :(
I downloaded statsd and graphite 0.9.x
I used the statsd-client script provided with the StatsD source as follows:
./statsd-client.sh 'development.com.alpha.operation.testing.rate:1|c'
I did the above operation 10 times.
Then I tried querying for a summary for last 24 hours:
http://example.com/render?format=json&target=summarize(stats.development.com.alpha.operation.testing.rate,"24hours","sum",true)&from=-24hours&tz=UTC
I get 1 datapoint as follows:
"datapoints": [[0.0, 1386277560]]}]
Why am I getting 0.0? Even Graphite Composer does not display anything.
I was expecting a value of 10, as I performed the operation 10 times. What did I do wrong?
storage-schemas.conf
[carbon]
pattern = ^carbon\.
retentions = 60:90d
[default_1min_for_1day]
pattern = .*
retentions = 60s:1d
Please help me understand the problem.
EDIT:
As per the answer below, I changed storage-aggregation.conf, and I get the following output when running whisper-info on metric_file.wsp. But I am still getting 0.0 as the datapoint value, and the Graphite browser does not display anything.
maxRetention: 86400
xFilesFactor: 0.0
aggregationMethod: sum
fileSize: 17308
Archive 0
retention: 86400
secondsPerPoint: 60
points: 1440
size: 17280
offset: 28
I also looked at the stats_counts tree as suggested in another answer, but the result is the same.
What is wrong with my setup? I am using default settings for everything except the storage-aggregation changes suggested by an answer below.
The whisper package includes a script, whisper-info.py. Invoke it on the appropriate metric file:
/whisper-info.py /opt/graphite/storage/whisper/alpha/beta/charlie.wsp
You will get something like this:
maxRetention: 31536000
xFilesFactor: 0.0
aggregationMethod: sum
fileSize: 1261468
Archive 0
retention: 31536000
secondsPerPoint: 300
points: 105120
size: 1261440
offset: 28
Here, make sure that aggregationMethod is sum and xFilesFactor is 0.0. Most probably they are not, since this isn't Graphite's default behavior. Now write a regex that matches your metrics and put a rule for it at the beginning of the config file storage-aggregation.conf. This ensures that newly created metrics follow the new aggregation rule. Existing .wsp files keep the settings they were created with. You can read more about how xFilesFactor works here.
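To see why these two settings matter for a sparse counter, here is a rough model of what happens when several fine-grained points are rolled up into one coarser point. It is a simplified sketch rather than whisper's actual code; the rollup function is made up for illustration, and the defaults of average and 0.5 are assumed to be what you get when no aggregation rule matches.

```python
def rollup(points, method="average", x_files_factor=0.5):
    """Aggregate fine-grained points into one coarse point (None = no data)."""
    known = [p for p in points if p is not None]
    if not known:
        return None
    # Too few known points: the coarse point is left empty.
    if len(known) / len(points) < x_files_factor:
        return None
    if method == "sum":
        return float(sum(known))
    if method == "average":
        return sum(known) / len(known)
    raise ValueError("unsupported method: %s" % method)


# One hit recorded in a window of five points.
window = [1, None, None, None, None]
print(rollup(window))              # assumed defaults: the hit is dropped
print(rollup(window, "sum", 0.0))  # sum with xFilesFactor 0: the hit survives
```

With sum and an xFilesFactor of 0.0, a single known point is enough for the rolled-up value to be stored, which is what a low-volume counter needs.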
Have you tried using the stats_counts tree instead of stats? StatsD populates both for regular counters. By default, stats does some fancy averaging which can tend to make low-intensity stat signals disappear, whereas stats_counts just gives you the straight-up count, which sounds like what you want.
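The difference between the two trees can be shown with a little arithmetic. This sketch assumes StatsD's legacy namespace, where stats holds a per-second rate and stats_counts holds the raw count for each flush, and a flush interval of 10 seconds; the function name is made up for illustration.

```python
def counter_values(increments, flush_interval_seconds=10):
    """Return (per-second rate, raw count) for one flush interval."""
    count = sum(increments)
    rate = count / flush_interval_seconds
    return rate, count


# Ten "name:1|c" packets arriving within a single flush interval.
rate, count = counter_values([1] * 10)
print("stats value:", rate)          # per-second rate
print("stats_counts value:", count)  # raw count
```

Summing the rate series over a day therefore does not give the number of events, while summing the stats_counts series does.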