Remove duplicate remote syslog messages - syslog

For redundancy, every host in our distributed network sends its syslog messages to two dedicated rsyslog nodes. These in turn forward the logs to a central Graylog instance:
        /--> rsyslog --\
host --                 --> graylog
        \--> rsyslog --/
Now every log-message gets duplicated!
Question: How can we keep the redundancy but remove the duplicates? Does fluentd have a way to deal with this? Or is there any other open-source software designed to aggregate log messages?
We do not want to add much more complexity to the whole setup, but inserting one additional component is fine.

AFAIK, there is no open-source software with built-in message de-duplication, because such requirements are application-specific.
You could implement such message handling with a master node, but that has performance and scalability problems.
There are several approaches to relax the problem:
1) Add a unique ID to each record and de-duplicate at query time. (I don't know how to do a distinct query in Graylog, though...)
2) Use one stream for real time and the other as a backup. Don't merge the two streams in one Graylog.
On Fluentd, you may use fluent-plugin-suppress (https://github.com/fujiwara/fluent-plugin-suppress) or fluent-plugin-dedup (https://github.com/edvakf/fluent-plugin-dedup) for such cases.
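For example, a minimal fluent-plugin-dedup filter on the aggregator could look like the sketch below. It assumes each event already carries a unique_id field (stamped at the source, e.g. with record_transformer) and that the tag and parameter names match your setup and the plugin's README:

    # Drop events whose "unique_id" was already seen; the syslog.** tag
    # and the unique_id field are assumptions about your pipeline.
    <filter syslog.**>
      @type dedup
      key unique_id
    </filter>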

Related

Custom routing via nginx - read from third party source

I am new to nginx, and am wondering if it can help me to solve a use-case we've encountered.
I have n nodes, which are reading from a Kafka topic with the same group ID, which means that each node holds disjoint data, partitioned by some key.
Nginx has no way of knowing a priori which node has the data corresponding to which keys. But we can build an API, or keep a Redis instance, that can tell us the node given a key.
Is there a way nginx can incorporate third party information of this kind to route requests?
I'd also welcome any answers, even if it doesn't involve nginx.
"Nginx has no way of knowing a priori which node has data corresponding to which keys"
Nginx doesn't need to know. You would do this in the Kafka Streams RPC layer with Interactive Queries. (Spring Kafka has an InteractiveQueryService interface, by the way, that can be used from Spring Web.)
If you want to present users with a single address for the KStreams HTTP/RPC endpoints, that is a standard Nginx upstream definition for a reverse proxy: it routes to any of the backend servers, which in turn communicate among themselves to fetch the necessary key/value and return the response back to the client.
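If you go that route, a minimal sketch of such an upstream is below; the host names and port are placeholders for your n nodes:

    upstream kstreams_rpc {
        server node1.internal:8080;
        server node2.internal:8080;
        server node3.internal:8080;
    }

    server {
        listen 80;
        location / {
            # Any backend can answer: it looks up the owning instance via
            # Kafka Streams interactive queries and returns the value.
            proxy_pass http://kstreams_rpc;
        }
    }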
"I have no idea how Kafka partitions"
You could look at the source code and see that it uses a murmur2 hash, which is available in Lua and can be used in Nginx.
But again, this is a rabbit hole you should probably avoid.
Other option: use Kafka Connect to dump the data into Redis (or whatever database you want). Then write a very similar HTTP API service, and (optionally) point Nginx at that.
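A sketch of the Connect side in standalone mode; the connector.class is a placeholder for whichever Redis sink connector you deploy, and the topic name is illustrative. The surrounding keys are standard Kafka Connect settings:

    # redis-sink.properties (hypothetical connector class)
    name=keys-to-redis
    connector.class=com.example.RedisSinkConnector
    tasks.max=2
    topics=my-topic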

Collect data for Bosun from multiple endpoints

In the observability system we're building from scratch, we'd like a single scollector to collect data from all the web servers and send it to Bosun, instead of running an instance of scollector on each server.
Do you know if there's a way to achieve that?
Scollector is implemented as an agent, similar to OpenTSDB's tcollector. It's lightweight and doesn't cause much overhead on the hosts.
If you want all the data that scollector is capable of collecting forwarded to Bosun, there needs to be a single agent per monitored host. Scollector makes use of procfs and the like, which is only accessible on the host itself.
You can additionally create your own collectors that scollector will invoke for you (see the sketch below).
With that, depending on your use case, you might be able to collect data from remote hosts, but scollector is really designed to run as an agent on every host and collect the data locally.
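As a sketch of such an external collector: scollector executes programs placed in its collector directory and parses their stdout in the tcollector line format, metric timestamp value tag=value (see the scollector docs for the exact directory layout). The metric name, value, and tag below are made up:

    // Hypothetical external collector; any executable works, Java shown
    // here to match the other examples. scollector parses each stdout
    // line as: metric <unix-timestamp> <value> tag=value
    public class QueueDepthCollector {
        public static void main(String[] args) {
            long now = System.currentTimeMillis() / 1000L; // Unix seconds
            long queued = readQueueDepth();
            System.out.printf("web.requests.queued %d %d host=web01%n", now, queued);
        }

        // Stand-in for reading a real value from your web server.
        private static long readQueueDepth() {
            return 42L;
        }
    }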

Load balancing MessageQ (ActiveMQ)

This is my scenario:
1) A REST-based web service (say X) takes in requests and puts them into ActiveMQ.
2) There is a listener on the other side of the Q that will read and process the messages. This is async.
I decided to go with ActiveMQ.
But I am trying to find a solution where the Q and the Q listeners are scalable.
1) I have many instances of X running, hence there are multiple producers to the Q.
2) Ordering is important to me.
3) Since my REST service is sessionless, I don't have a way to tag a bunch of requests with the same message ID.
4) Now, if I use a single Q, it works fine,
but I want to scale it up and use multiple Qs and multiple Q consumers without compromising on order.
Can someone suggest me a solution to this problem?
Thanks much,
To preserve the order of messages, there are two mechanisms in ActiveMQ:
1) Message Groups, based on JMSXGroupID
2) Exclusive Consumer
Message Groups are more useful than an Exclusive Consumer: internally, an Exclusive Consumer uses only one consumer at a time, connecting to another consumer only when that consumer fails.
You can read the ActiveMQ documentation on this here:
http://activemq.apache.org/how-do-i-preserve-order-of-messages.html
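As a sketch of the Message Groups approach in plain JMS (the broker URL, queue name, and group key are illustrative; since your REST service is sessionless, the group key could be derived from request data such as a customer or order ID):

    import javax.jms.Connection;
    import javax.jms.ConnectionFactory;
    import javax.jms.JMSException;
    import javax.jms.MessageProducer;
    import javax.jms.Session;
    import javax.jms.TextMessage;
    import org.apache.activemq.ActiveMQConnectionFactory;

    public class GroupedProducer {
        public static void main(String[] args) throws JMSException {
            ConnectionFactory factory = new ActiveMQConnectionFactory("tcp://localhost:61616");
            Connection connection = factory.createConnection();
            try {
                connection.start();
                Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
                MessageProducer producer = session.createProducer(session.createQueue("ORDERS"));
                TextMessage message = session.createTextMessage("payload");
                // All messages sharing a JMSXGroupID go, in order, to one
                // consumer; different groups spread across consumers.
                message.setStringProperty("JMSXGroupID", "customer-42");
                producer.send(message);
            } finally {
                connection.close();
            }
        }
    }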
Hope this helps, good luck!
You can use the following kind of connection URL to spread load across ActiveMQ brokers:
failover://(tcp://192.nnn.nn.nn:61616,tcp://192.nnn.nn.nn:61616)?randomize=false
With randomize=true, messages are shuffled between the two ActiveMQ brokers in active-active fashion, rather than moving only on failover of one broker.
The complete reference can be found on the Apache failover transport page:
http://activemq.apache.org/failover-transport-reference.html
Still, a high-availability (i.e., cluster) configuration will make things more stable for your app, although Apache still needs to advance ActiveMQ's high-availability support before everything runs smoothly.
Note that because of a KahaDB restriction, load-balancing/fault-tolerant configuration is limited. The current Apache ActiveMQ high-availability options are described here:
http://activemq.apache.org/clustering.html
Since KahaDB has a file-lock restriction, the following alternative configurations can be used:
1) Shared File System Master Slave - a shared file system such as a SAN (see the snippet after this list):
http://activemq.apache.org/shared-file-system-master-slave.html
2) JDBC Master Slave - a shared database:
http://activemq.apache.org/jdbc-master-slave.html
3) Replicated LevelDB Store - a ZooKeeper server:
http://activemq.apache.org/replicated-leveldb-store.html
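As a sketch of option 1: both brokers point their persistence adapter at the same shared directory in activemq.xml; the broker that wins the file lock becomes master and the other waits as slave. The path below is illustrative:

    <persistenceAdapter>
        <!-- same SAN-backed directory configured on both brokers -->
        <kahaDB directory="/mnt/san/activemq/kahadb"/>
    </persistenceAdapter>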
Over & above by having JCA connectors,- AS like JBoss, Weblogic, Websphere, Geronimo, Glassfish,- ActimeMQ patching as a kind of Resource Adapter can be done. And with Apache Camel (karaf), JBoss Fuse ESB kind of products HA & clustering of ActiveMQ can be done.

Using NGINX to forward tracking data to Flume

I am working on providing analytics for our web property based on instrumentation data we collect via a simple image beacon. Our data pipeline starts with Flume, and I need the fastest possible way to parse query string parameters, form a simple text message and shove it into Flume.
For performance reasons, I am leaning towards nginx. Since serving a static image from memory is already supported, my task is reduced to handling the query string and forwarding a message to Flume. Hence the question:
What is the simplest reliable way to integrate nginx with Flume? I am thinking about using syslog (Flume supports syslog listeners), but I struggle with how to configure nginx to forward custom log messages to a syslog (or just TCP) listener running on a remote server and on a custom port. Is it possible with existing third-party modules for nginx, or would I have to write my own?
Separately, anything existing you can recommend for writing a fast $args parser would be much appreciated.
If you think I am on a completely wrong path and can recommend something better performance-wise, feel free to let me know.
Thanks in advance!
You should tail the nginx log file (the way tail -f does) and pass the results to Flume. That will be the simplest and most reliable way. The problem with syslog is that it blocks nginx and may stall completely under high load or if something goes wrong (this is why nginx doesn't support it).
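A minimal sketch of that approach using Flume's exec source, which runs a command and turns its stdout lines into events; the agent, channel, and sink names (and the logger sink itself) are placeholders for your real pipeline:

    # flume.conf: tail the nginx access log into a memory channel
    agent.sources = tailsrc
    agent.channels = memch
    agent.sinks = logsink

    agent.sources.tailsrc.type = exec
    agent.sources.tailsrc.command = tail -F /var/log/nginx/access.log
    agent.sources.tailsrc.channels = memch

    agent.channels.memch.type = memory
    agent.channels.memch.capacity = 10000

    # Replace the logger sink with your real downstream sink
    agent.sinks.logsink.type = logger
    agent.sinks.logsink.channel = memch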

Can two or more SNMP agents be run on the same port (on the same machine)?

Just a technical question -
Can two or more SNMP agents be run on the same port (on the same machine)?
My first instinct would be no since host:port identifies an instance of an application but I'm not sure.
Thank you!
Technically, if the OS supports it, the SO_REUSEADDR and SO_REUSEPORT options may be set on a socket to allow other processes to bind to the same address/port, and thus allow multiple processes to receive messages on that address/port. But both processes would have to set the option, and I doubt any agent implementations do, because it would make no sense: it would just cause headaches to have both agents potentially responding to a single request, and managers won't be equipped to handle it.
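For illustration only, this is what setting those options looks like at the socket level; a hypothetical Java sketch (SO_REUSEPORT needs Java 9+ and OS support, and binding port 161 needs privileges):

    import java.net.InetSocketAddress;
    import java.net.StandardSocketOptions;
    import java.nio.channels.DatagramChannel;

    public class ReusePortDemo {
        public static void main(String[] args) throws Exception {
            // Each cooperating process would have to do this before binding.
            DatagramChannel channel = DatagramChannel.open();
            channel.setOption(StandardSocketOptions.SO_REUSEADDR, true);
            channel.setOption(StandardSocketOptions.SO_REUSEPORT, true); // Java 9+
            channel.bind(new InetSocketAddress(161)); // SNMP port; needs root
            System.out.println("Bound to " + channel.getLocalAddress());
        }
    }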
However, you can instead run an SNMP proxy on the primary address/port, configured to forward requests to one of multiple agents based on query, security, or (with SNMPv3) context/engine-ID parameters, and to forward the responses back.
Alternatively, with AgentX, you run an SNMP master agent on the primary address/port, with one or more SNMP sub-agents connected to it. The master agent dispatches requests to the sub-agents as appropriate and merges the results into a single response, so that to the outside world it appears as a single agent. Each sub-agent typically handles a different branch of the OID space (one sub-agent implementing certain modules, another implementing others).
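As a sketch of both options with Net-SNMP (directives per the snmpd.conf man page; the enterprise OID and ports below are illustrative):

    # Proxy one subtree to a second agent listening on port 1161:
    proxy -v 2c -c public localhost:1161 .1.3.6.1.4.1.99999

    # Or run as an AgentX master so sub-agents can register subtrees:
    master agentx
    agentXSocket tcp:localhost:705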
But taking two agents that expect to own the address/port exclusively and forcing them to share it through the REUSE options, while it may be possible, would not be wise.
You can run multiple agents on the same host and with the same port if they bind to different IP addresses (you can use a netsh script to add those addresses).
Personally, I use the nsoftware DLL (SecureSNMP V8 .NET edition) to do this.
You can look at this post: Multiple SNMP Agents with nsoftware dll
No, two agents cannot both run on the same port as separate applications, for the reasons you assumed (except with a brittle packet-sniffing hack, which we'll not go into).
However, two agents can be accessed through the same port if there is some mechanism that owns the actual port and distributes requests based on MIB. For example, the Windows SNMP service does this, allowing any number of SNMP agents to be added as "extensions" through the registry (HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\SNMP\Parameters\ExtensionAgents) by writing them as DLLs and using the snmp.h headers in the Platform SDK.
You are correct: ports can't be shared.
If both agents were designed by you, then the answer can be different.
Consider the HTTP and FTP cases: we can use host names to distinguish multiple sites on the same port, so why can't we do the same for SNMP?
We can create a dispatcher that monitors port 161 for incoming traffic, and then use multiple real agents behind it to handle that traffic. We are free to design how to distinguish them; personally, I prefer the FTP virtual-host-name manner and would use | to distinguish agents.
Maybe I can create a demo for #SNMP Suite in the future.
But if you need to work with existing agents on the same server, then such flexibility is lost.
