I'm looking at making a Processor / CPU Queue Length graph in Kibana from some Metricbeat stats that I have, and I was wondering how to do it, or whether it is possible at all. From searching around I cannot seem to find much about it.
Thanks.
Try using Topbeat with ELK; it is helpful for getting system infrastructure metrics.
https://www.elastic.co/guide/en/beats/topbeat/current/_step_5_loading_sample_kibana_dashboards.html
I tried to run a Gremlin query adding a property to a vertex through the Gremlin Console.
g.V().hasLabel("user").has("status", "valid").property(single, "type", "valid")
I constantly get this error:
org.apache.tinkerpop.gremlin.jsr223.console.RemoteException: Connection to server is no longer active
This error happens after the query has been running for one or two minutes.
I tried some simple queries like g.V().limit(10) and they work fine.
Since the affected vertex count is more than 4 million, I am not sure whether it is failing due to a timeout.
I also tried to split it into small batches:
g.V().hasLabel("user").has("status", "valid").hasNot("type").limit(200000).property(single, "type", "valid")
It succeeded for the first few batches and then started failing again.
Are there any recommendations for updating millions of vertices?
The precise approach you take may vary depending on the backend graph database and storage you are using. The capacity of the hardware where Gremlin Server is running, in terms of the number of CPUs and, most importantly, memory, will also be a factor, as will the query timeout setting.
To do this in Gremlin, if you had a way to easily identify distinct ranges of vertices, you could split the work across multiple threads, each doing batches of updates. If the example you show is representative of your actual need, that is likely not possible here.
Likewise, some graph databases provide a bulk load capability that is often a good way to do large batch updates, but that is probably not an option here, since you essentially need a conditional update based on whether a property is currently present.
Without more information about your data model, hardware, etc., the best answer is probably to do two things:
Use smaller limits. Maybe try 5K or even just 1K at first and work up from there until you find a reliable sweet spot (see the sketch below).
Increase the query timeout settings.
You may need to experiment to find the right values for your environment, as the capacity of the hardware will definitely play a role in situations like this, as will how you write your query.
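For the first suggestion, a minimal Gremlin Console sketch (assuming the label and property keys from your query, and an arbitrary starting batch size of 5000) could simply repeat the conditional update until nothing is left to change:

// keep updating in small batches until the traversal finds no more matches
batchSize = 5000
updated = batchSize
while (updated > 0) {
    updated = g.V().hasLabel("user").has("status", "valid").
                hasNot("type").limit(batchSize).
                property(single, "type", "valid").
                count().next()
}

For the second suggestion, depending on your TinkerPop version, the server-side timeout is evaluationTimeout (scriptEvaluationTimeout on older releases) in gremlin-server.yaml, and the console's remote timeout can be raised with :remote config timeout <milliseconds>.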
1. Does the Enterprise version support distributed graph algorithms? Or can Neo4j graph data and graph computation be distributed over cloud infrastructure? And how does it work?
2. If I have a server (16-core CPU, 256 GB memory, 2 TB HDD) and each node or relationship carries 1 KB of data, how many nodes and relationships can the server hold? The ratio between nodes and relationships is 1:5.
If we want to import more data, what should we do?
3. For fast importing we used the BatchInserter, but a single Lucene index has a size limit of 2^32, so we can import fewer than 2^32 nodes. What should we do about this limit, other than using more indexes?
4. After two days of importing, the import speed has become too slow to accept (200-600 nodes per second). It is only 1% of what it was at the beginning! I can see that memory is full; what should we do to improve the speed?
It has imported about 0.2B nodes and 0.5B relationships so far, which is half of my data. And my server has 32 GB of memory.
Thanks a lot.
You have many questions here that might be better suited as individual questions or asked on the Neo4j Slack channel.
I started to write this as a comment but ran out of characters, so I'll try to point you to some resources:
1) Neo4j distributed graph model
I'm not sure exactly what you're asking here. See this document for general information on Neo4j scalability. If you're asking if graph traversals can be distributed across machines in a Neo4j cluster, the answer is no.
2) Hardware sizing
This depends a bit on your access patterns. See the hardware sizing calculator that can take workload / access patterns into account.
3-4) Import
Can you create a new question and share your code for this? You should be able to achieve much better performance than this.
I have a graphite + collectd setup to collect system-related metrics. This question concerns the memory plugin for collectd.
My infra has this format for collecting memory usage data using collectd:
<cluster>.<host>.memory.memory-{buffered,cached,free,used}
I want to plot the percentage of memory used for each host.
So basically, I have to do something like this:
divideSeries(sumSeriesWithWildcards(*.*.memory.memory-{buffered,cached,free},1),sumSeriesWithWildcards(*.*.memory.memory-{buffered,cached,free,used},1))
But I am not able to do this, as divideSeries wants the divisor metric to return only one metric.
I basically want a single target to monitor all hosts in a cluster.
How can I do this?
try this one:
asPercent(host.memory.memory-used, sumSeries(host.memory.memory-{used,free,cached,buffered}))
You'll get a graph of memory usage in percent for one host. Unfortunately I wasn't able to make it work with wildcards (multiple hosts).
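Adapted to the naming scheme in the question, it would look something like this (mycluster and web01 are just placeholders for your actual cluster and host names):
asPercent(mycluster.web01.memory.memory-used, sumSeries(mycluster.web01.memory.memory-{used,free,cached,buffered}))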
Try this for multiple nodes, using wildcards:
alias(asPercent(sumSeries(collectd.nodexx*_internal_cloudapp_net.memory.memory.used), sumSeries(collectd.nodexx*_internal_cloudapp_net.memory.memory.{used,free,cached,buffered})),"Memory Used")
alias(asPercent(sumSeries(collectd.nodexx*_internal_cloudapp_net.memory.memory.{cached,buffered}), sumSeries(collectd.nodexx*_internal_cloudapp_net.memory.memory.{used,free,cached,buffered})),"Memory Cached")
alias(asPercent(sumSeries(collectd.nodexx*_internal_cloudapp_net.memory.memory.free), sumSeries(collectd.nodexx*_internal_cloudapp_net.memory.memory.{used,free,cached,buffered})),"Memory Free")
Hello fellow StackOverflow Users,
I have this problem: I have one very big image which I want to work on. My first idea is to divide the big image into a number of sub-images and then send these sub-images to different GPUs. I don't use the image object, because I don't work with the RGB values; I only use the brightness value to manipulate the image.
My questions are:
Can I use one context with many command queues, one for every device? Or should I use one context with one command queue for each device?
Can anyone give me an example or ideas of how I can dynamically change the input memory (the sub-image data) when setting up the kernel arguments for each device? (I only know how to send the same input data to every device.)
For example, if I have more sub-images than GPUs, how can I distribute the sub-images among the GPUs?
Or is there maybe another, smarter approach?
I'd appreciate any help and ideas.
Thank you very much.
Use 1 context, and many queues. The simple method is one queue per device.
Create 1 program, and a kernel for each device (created from the same program). Then create different buffers (one per device) and set each kernel with each buffer. Now you have different kernels, and you can queue them in parallel with different arguments.
To distribute the jobs, simply use the event system: check whether a GPU is free and enqueue the next job there.
I can provide a more detailed example with code, but as a general sketch that is the way to follow.
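For instance, a bare-bones sketch of that structure in C might look like the following (one shared context, one queue and one kernel/buffer pair per GPU; error checking, data transfers and cleanup are omitted, and the brighten kernel is only a placeholder for your brightness manipulation):

#include <CL/cl.h>

#define MAX_DEVICES 8

int main(void) {
    cl_platform_id platform;
    clGetPlatformIDs(1, &platform, NULL);

    cl_device_id devices[MAX_DEVICES];
    cl_uint num_devices;
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, MAX_DEVICES, devices, &num_devices);

    /* one context shared by all GPUs */
    cl_context ctx = clCreateContext(NULL, num_devices, devices, NULL, NULL, NULL);

    /* one command queue per device */
    cl_command_queue queues[MAX_DEVICES];
    for (cl_uint i = 0; i < num_devices; ++i)
        queues[i] = clCreateCommandQueue(ctx, devices[i], 0, NULL);

    /* one program, but a separate kernel object per device so each
       can hold its own arguments */
    const char *src =
        "__kernel void brighten(__global float *img, float gain) {"
        "    size_t gid = get_global_id(0);"
        "    img[gid] *= gain;"
        "}";
    cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, NULL);
    clBuildProgram(prog, num_devices, devices, NULL, NULL, NULL);

    size_t sub_size = 1024 * 1024;   /* pixels per sub-image, just an example */
    float gain = 1.2f;

    for (cl_uint i = 0; i < num_devices; ++i) {
        cl_kernel k = clCreateKernel(prog, "brighten", NULL);
        cl_mem buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE,
                                    sub_size * sizeof(float), NULL, NULL);
        /* clEnqueueWriteBuffer(queues[i], buf, ...) with the i-th sub-image here */
        clSetKernelArg(k, 0, sizeof(cl_mem), &buf);
        clSetKernelArg(k, 1, sizeof(float), &gain);
        clEnqueueNDRangeKernel(queues[i], k, 1, NULL, &sub_size, NULL, 0, NULL, NULL);
        /* clEnqueueReadBuffer(queues[i], buf, ...) to read the result back */
    }
    return 0;
}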
The AMD APP SDK has a few samples on multi-GPU handling. You should look at these two samples:
SimpleMultiDevice: shows how to create multiple command queues on a single context, along with some performance results.
BinomialOptionMultiGPU: look at the loadBalancing method. It divides the buffer based on the compute units and max clock frequency of the available GPUs.
I have a dozen load balanced cloud servers all monitored by Munin.
I can track each one individually just fine. But I'm wondering if I can somehow bundle them up to see just how much collective CPU usage (for example) there is among the cloud cluster as a whole.
How can I do this?
The munin.conf file makes it easy enough to handle this for subdomains, but I'm not sure how to configure this for simple web nodes. Assume my web nodes are named web_node_1 through web_node_10.
My conf looks something like this right now:
[web_node_1]
address 10.1.1.1
use_node_name yes
...
[web_node_10]
address 10.1.1.10
use_node_name yes
Your help is much appreciated.
You can achieve this with sum and stack.
I've just had to do the same thing, and I found this article pretty helpful.
Essentially you want to do something like the following:
[web_nodes;Aggregated]
update no
cpu_aggregate.update no
cpu_aggregate.graph_args --base 1000 -r --lower-limit 0 --upper-limit 200
cpu_aggregate.graph_category system
cpu_aggregate.graph_title Aggregated CPU usage
cpu_aggregate.graph_vlabel %
cpu_aggregate.graph_order system user nice idle
cpu_aggregate.graph_period second
cpu_aggregate.user.label user
cpu_aggregate.nice.label nice
cpu_aggregate.system.label system
cpu_aggregate.idle.label idle
cpu_aggregate.user.sum web_node_1:cpu.user web_node_2:cpu.user
cpu_aggregate.nice.sum web_node_1:cpu.nice web_node_2:cpu.nice
cpu_aggregate.system.sum web_node_1:cpu.system web_node_2:cpu.system
cpu_aggregate.idle.sum web_node_1:cpu.idle web_node_2:cpu.idle
There are a few other things to tweak so the graph has the same scale, min/max, etc. as the main plugin; those can be copied from the "cpu" plugin file. The key thing here is the last four lines: that's where the summing of values from the other nodes' graphs comes in.
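Since you have ten web nodes, each .sum line simply lists all of them; for example, the user line would become the following (the nice, system, and idle lines follow the same pattern):
cpu_aggregate.user.sum web_node_1:cpu.user web_node_2:cpu.user web_node_3:cpu.user web_node_4:cpu.user web_node_5:cpu.user web_node_6:cpu.user web_node_7:cpu.user web_node_8:cpu.user web_node_9:cpu.user web_node_10:cpu.user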