Configuring scollector to get different frequencies for different collectors - opentsdb

I'm working on scollector and I want to have specific frequencies for different collectors.
For example:
get info from disk usage every 5 minutes
info from memory every minute
iostat every 30 seconds
and so on...
Here is a part of the conf.toml I made:
FullHost = true
Freq = 60
DisableSelf = true
[[iostat]]
Filter = "iostat"
Freq = 30
[[memory]]
Filter = "memory"
Freq = 60
But I get an error:
./scollector -conf="perso.toml" -p
2016/04/19 14:40:45 fatal: main.go:297: extra keys in perso.toml: [iostat iostat.Freq memory memory.Freq]
It seems that I cannot specify multiple frequencies.
What should I do to get what I want?
Thank you all

According to the scollector documentation, Freq is a global setting, so it's not possible to set different frequencies for individual collectors. The exception is external collectors, which may be put in a folder named after the desired frequency (in seconds).

Freq is indeed a global setting, and the collector interval is usually set to it, although some collectors override the interval with a different value; e.g. elasticsearch-indices runs every 15 minutes because there's a lot of data to pull.
To change it, either:
(best) hack the scollector code to read and pass a per-collector freq parameter to every collector
(second best) file a GitHub issue
(last resort) change the intervals in the scollector code for the specific collectors and recompile scollector

Well, we might have found something.
We created different folders representing the frequencies we need (0, 30, 60, 120...), and in each folder we put the external collectors we need.
'/etc/collectors/0',
'/etc/collectors/15',
'/etc/collectors/30',
'/etc/collectors/60',
'/etc/collectors/120',
'/etc/collectors/300',
'/etc/collectors/600'
In the conf.toml:
ColDir = "/etc/scollector/collectors"
If we want the internal collectors, we have to rewrite them :(
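For reference, an external collector is just an executable dropped into one of those frequency folders (e.g. the 300 folder to run every 5 minutes) that prints OpenTSDB-style lines to stdout. A minimal sketch in Python; the metric name and tag below are made up for illustration:
#!/usr/bin/env python3
# Hypothetical external collector (metric name and tag are illustrative only).
# scollector runs executables found in the numbered frequency folders and reads
# OpenTSDB-style lines from stdout: <metric> <unix-timestamp> <value> [tag=value ...]
import os
import time

st = os.statvfs("/")
used_bytes = (st.f_blocks - st.f_bfree) * st.f_frsize
print("custom.disk.used_bytes %d %d mount=/" % (int(time.time()), used_bytes))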

How to change the interval of a plugin in telegraf?

Using: telegraf version 1.23.1
That's the workflow: Telegraf => Influx => Grafana.
I am using Telegraf to collect metrics on a shared server. So far so good; I was already able to initialize the Telegraf uWSGI plugin and display the data of my running Django projects in Grafana.
Problem
Now I wanted to check some folder sizes too with the [[inputs.filecount]] Telegraf plugin, and this also works well. However, I do not need metrics every 10s for this plugin, so I changed the interval in the [[inputs.filecount]] plugin as mentioned in the documentation.
telegraf.conf
[agent]
interval = "10s"
round_interval = true
metric_batch_size = 1000
metric_buffer_limit = 10000
collection_jitter = "5s"
flush_interval = "10s"
flush_jitter = "0s"
#... PLUGIN
[[inputs.filecount]]
# set different interval for this input plugin every 10min
interval=“600s”
collection_jitter=“20s”
# Default from Doc =>
directories = ["/home/myserver/logs", "/home/someName/growingData, ]
name = "*"
recursive = true
regular_only = false
follow_symlinks = false
size = "0B"
mtime = "0s"
After restarting Telegraf with Supervisor it crashed because it could not parse the new lines.
supervisor.log
Error running agent: Error loading config file /home/user/etc/telegraf/telegraf.conf: Error parsing data: line 208: invalid TOML syntax
So these are the lines I added, because I thought that is how the documentation describes it.
telegraf.conf
# set different interval for this input plugin every 10min
interval=“600s”
collection_jitter=“20s”
Question
So my question is: how can I change or set up the interval for a single input plugin in Telegraf?
Or do I have to apply a different TOML syntax like [[inputs.filecount.agent]] or so?
I assume that I do not have to change any output interval as well? Because I assume that even though it's currently 10s, if this input plugin only pulls data every 600s it should not matter; some flush cycle will push the data to Influx.
How can i change or setup the interval for a single input plugin in telegraf?
As the link you pointed to shows, individual inputs can set the interval and collection_jitter options. There is no difference in the TOML syntax; for example, I can do the following for the memory input plugin:
[[inputs.mem]]
interval="600s"
collection_jitter="20s"
I assume that i do not have to change any output interval also?
Correct, these are independent of each other.
line 208: invalid TOML syntax
Knowing what exactly is on line 208 and around that line will hopefully resolve your issue and get you going again. Also make sure the quotes you used are correct. Sometimes when people copy and paste quotes they get ” instead of ", which can cause issues!
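For example, the two lines added in the question, rewritten with straight ASCII quotes inside the plugin section, would look like this:
[[inputs.filecount]]
# collect this input every 10 minutes instead of the global 10s
interval = "600s"
collection_jitter = "20s"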

AzureML Dataset.File.from_files creation extremely slow even with 4 files

I have a few thousand video files in my Blob Storage, which I set up as a datastore.
This blob storage receives new files every night and I need to split the data and register each split as a new version of AzureML Dataset.
This is how I do the data split, simply getting the blob paths and splitting them.
container_client = ContainerClient.from_connection_string(AZ_CONN_STR,'keymoments-clips')
blobs = container_client.list_blobs('soccer')
blobs = map(lambda x: Path(x['name']), blobs)
train_set, test_set = get_train_test(blobs, 0.75, 3, class_subset={'goal', 'hitWoodwork', 'penalty', 'redCard', 'contentiousRefereeDecision'})
valid_set, test_set = split_data(test_set, 0.5, 3)
train_set, test_set, valid_set are just nx2 numpy arrays containing blob storage path and class.
Here is when I try to create a new version of my Dataset:
datastore = Datastore.get(workspace, 'clips_datastore')
dataset_train = Dataset.File.from_files([(datastore, b) for b, _ in train_set[:4]], validate=True, partition_format='**/{class_label}/*.mp4')
dataset_train.register(workspace, 'train_video_clips', create_new_version=True)
How is it possible that the Dataset creation seems to hang for an indefinite time even with only 4 paths?
I saw in the doc that providing a list of Tuple[datastore, path] is perfectly fine. Do you know why?
Thanks
Do you have your Azure Machine Learning Workspace and your Azure Storage Account in different Azure Regions? If that's true, latency may be a contributing factor with validate=True.
Another possibility may be slowness in the way datastore paths are resolved. This is an area where improvements are being worked on.
As an experiment, could you try creating the dataset using a url instead of datastore? Let us know if that makes a difference to performance, and whether it can unblock your current issue in the short term.
Something like this:
dataset_train = Dataset.File.from_files(path="https://bloburl/**/*.mp4?accesstoken", validate=True, partition_format='**/{class_label}/*.mp4')
dataset_train.register(workspace, 'train_video_clips', create_new_version=True)
I'd be interested to see what happens if you run the dataset creation code twice in the same notebook/script. Is it faster the second time? I ask because it might be an issue with the .NET core runtime startup (which would only happen on the first time you run the code)
EDIT 9/16/20
While it doesn't seem to make sense that .NET Core is invoked when no data is moving, I suspect it is the validate=True param that requires all the data to be inspected (which can be computationally expensive). I'd be interested to see what happens if that param is False.
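As a rough sketch (reusing the datastore and train_set variables from the question), skipping the validation would look like this:
dataset_train = Dataset.File.from_files(
    [(datastore, b) for b, _ in train_set[:4]],
    validate=False,  # skip the upfront check that every path exists
    partition_format='**/{class_label}/*.mp4'
)
dataset_train.register(workspace, 'train_video_clips', create_new_version=True)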

Carbon Aggregator re-aggregating metric

I have the following aggregation rule:
abc.prod.ALL.<service>.<metric>.count (60) = sum abc.local.*.<service>.<<metric>>.count
Given metrics like:
abc.prod.host1.aservice.ametric.count
abc.prod.host2.aservice.ametric.count
I would expect them to be aggregated to
abc.prod.ALL.aservice.ametric.count
But that metric is never created. In aggregator logs, I see
Allocating new metric buffer for abc.prod.ALL.aservice.ametric.count
but it's not created. If I add a layer to the generated metric like:
abc.prod.extralayer.ALL.<service>.<metric>.count (60) = sum abc.local.*.<service>.<<metric>>.count
then we seem to get a recursive explosion of created metrics like:
abc.prod.extralayer.ALL.aservice.ametric.count
abc.prod.extralayer.ALL.ALL.aservice.ametric.count
abc.prod.extralayer.ALL.ALL.ALL.aservice.ametric.count
abc.prod.extralayer.ALL.ALL.ALL.ALL.aservice.ametric.count
Which led me to believe that the generated metric is then aggregated again...
I added a logging line to AggregationProcessor.process:
else:
    log.clients("Found aggregate " + aggregate_metric + " for " + metric)
    aggregate_metrics.add(aggregate_metric)
And then I tried with my original, desired rule, and I eventually started to see log lines like:
Found aggregate abc.prod.ALL.aservice.ametric.count for abc.prod.ALL.aservice.ametric.count
It matched itself as if it was a new incoming metric... Why is it being fed back into the aggregator?
This appears to have been a bug. It was not present in older versions but was in master at the time of my question.
If you are seeing this behaviour, follow the issue on GitHub:
https://github.com/graphite-project/carbon/issues/560
https://github.com/graphite-project/carbon/issues/455
There is no point in continuing the question here on SO.
Note: I am using the older version, 0.9.15, and not seeing the problem, so I recommend that version until the issue is confirmed to be resolved in master.

Titan Graph Queries taking too long to execute

I have a problem with the executing speed of Titan queries.
To be more specific:
I created a properties file for my graph using BerkeleyJE, which looks like this:
storage.backend=berkeleyje
storage.directory=/finalGraph_script/graph
Afterwards, I used gremlin.bat to open my graph.
I set up all the necessary index keys for my nodes:
m = g.getManagementSystem();
username = m.makePropertyKey('username').dataType(String.class).make()
m.buildIndex('byUsername',Vertex.class).addKey(username).unique().buildCompositeIndex()
m.commit()
g.commit()
(all other keys are created the same way...)
I imported a CSV file containing about 100,000 lines; each line produces at least 2 nodes and some edges. All this is done via batch loading.
That works without a problem.
Then I execute a groupBy query which looks like this:
m = g.V.has("imageLink").groupBy{it.imageLink}{it.in("is_on_image").out("is_species")}{it._().species.groupCount().cap.next()}.cap.next()
With this query I want, for every node with the property key "imageLink", the count of the different "species". "Species" are also nodes, and can be reached by going back along the edge "is_on_image" and following the edge "is_species".
Well, this also works like a charm for my current nodes. This query takes about 2 minutes on my local PC.
But now to the problem.
My whole dataset is a CSV with 10 million entries. The structure is the same as above, and each line also creates at least 2 nodes and some edges.
With my local PC I can't even import this set; it causes a memory exception after 3 days of loading.
So I tried the same on a server with much more RAM. There the import works and takes about 1 day, but the groupBy fails after about 3 days.
I actually don't know if the groupBy itself fails, or just the connection to the server after such a long time.
So my first Question:
In my opinion, about 15 million nodes shouldn't be that big a deal for a graph database, should it?
Second Question:
Is it normal that it takes so long? Or is there any way to speed it up using indices? I configured the indices as listed above :(
I don't know exactly which information you need to help me, but please just tell me what you need in addition to that.
Thanks a lot!
Best regards,
Ricardo
EDIT 1: The way I'm loading the CSV into the graph:
I'm using this code; I deleted some unnecessary properties, which are also set as properties for some nodes and loaded the same way.
bg = new BatchGraph(g, VertexIDType.STRING, 10000)
new File("annotation_nodes_wNothing.csv").eachLine({ final String line ->def (annotationId,species,username,imageLink) = line.split('\t')*.trim();def userVertex = bg.getVertex(username) ?: bg.addVertex(username);def imageVertex = bg.getVertex(imageLink) ?: bg.addVertex(imageLink);def speciesVertex = bg.getVertex(species) ?: bg.addVertex(species);def annotationVertex = bg.getVertex(annotationId) ?: bg.addVertex(annotationId);userVertex.setProperty("username",username);imageVertex.setProperty("imageLink", imageLink);speciesVertex.setProperty("species",species);annotationVertex.setProperty("annotationId", annotationId);def classifies = bg.addEdge(null, userVertex, annotationVertex, "classifies");def is_on_image = bg.addEdge(null, annotationVertex, imageVertex, "is_on_image");def is_species = bg.addEdge(null, annotationVertex, speciesVertex, "is_species");})
bg.commit()
g.commit()

Graphite returning incorrect datapoint

I downloaded statsd and graphite 0.9.x
I used the statsd client script provided with the statsd source as follows:
./statsd-client.sh 'development.com.alpha.operation.testing.rate:1|c'
I did the above operation 10 times.
Then I tried querying for a summary of the last 24 hours:
http://example.com/render?format=json&target=summarize(stats.development.com.alpha.operation.testing.rate,"24hours","sum",true)&from=-24hours&tz=UTC
I get 1 datapoint as follows:
"datapoints": [[0.0, 1386277560]]}]
Why am I getting 0.0? Even the Graphite Composer does not display anything.
I was expecting a value of "10" as I performed the operation 10 times. What did I do wrong?
storage-schemas.conf
[carbon]
pattern = ^carbon\.
retentions = 60:90d
[default_1min_for_1day]
pattern = .*
retentions = 60s:1d
Please help me understand the problem.
EDIT:
As per the answer below, I changed storage-aggregation.conf, and I get the following output after running whisper-info on metric_file.wsp. But I am still getting "0.0" as the value in the datapoint, and the Graphite browser does not display anything.
maxRetention: 86400
xFilesFactor: 0.0
aggregationMethod: sum
fileSize: 17308
Archive 0
retention: 86400
secondsPerPoint: 60
points: 1440
size: 17280
offset: 28
I also looked at the stats_counts tree as suggested in another answer, but it's the same.
What is wrong with my setup? I am using the default settings for everything except the changes suggested by an answer below in storage-aggregation.conf.
Within the whisper package, you will get a script, whisper-info.py. Invoke it on the appropriate metric file:
/whisper-info.py /opt/graphite/storage/whisper/alpha/beta/charlie.wsp
You will get something like this:
maxRetention: 31536000
xFilesFactor: 0.0
aggregationMethod: sum
fileSize: 1261468
Archive 0
retention: 31536000
secondsPerPoint: 300
points: 105120
size: 1261440
offset: 28
Here, make sure that aggregationMethod is sum, and xFilesFactor is 0.0. Most probably it is not, since this isn't graphite's default behavior. Now make a regex that picks up your metrics and put it at the beginning of the config file storage-aggregation.conf. This will ensure that the newly created metrics follow this new aggregation rule. You can read more about how xFilesFactor works here.
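As an illustrative sketch (the section name and pattern here are assumptions, matched to the metric names from the question), a rule at the top of storage-aggregation.conf could look like this:
[sum_development_counters]
pattern = ^stats\.development\..*\.rate$
xFilesFactor = 0.0
aggregationMethod = sum
Note that this only applies to whisper files created after the change; existing .wsp files keep their old settings unless you delete them so they are recreated or adjust them with the whisper utilities.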
Have you tried using the stats_counts tree instead of stats? StatsD populates both for regular counters. stats by default does some fancy averaging which can tend to make low-intensity stat signals disappear, whereas stats_counts just gives you the straight-up count, which sounds like what you want.
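For instance, the query from the question pointed at the stats_counts tree would just swap the prefix (assuming the same metric path exists under stats_counts):
http://example.com/render?format=json&target=summarize(stats_counts.development.com.alpha.operation.testing.rate,"24hours","sum",true)&from=-24hours&tz=UTC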
