How to sink data to syslog using Apache Flume - syslog

Is it possible to sink data from Apache Flume to syslog? I have checked the official documentation, but it seems there is no sink connector for syslog available at present. I am thinking of using Logstash as a replacement because it supports syslog as both a source and a sink. However, the downside is that it is not distributed and scalable.
Thanks in advance.

You can create your own custom sink for syslog if you know of any syslog API for Java. This question on Stack Overflow itself could help you.
Regarding custom sink creation, it is not very complex. You have to extend the AbstractSink class and implement the Configurable interface. This basically means overriding the public Status process() and public void configure(Context context) methods, respectively.
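As a rough illustration, a minimal sketch of such a sink might look like the one below. The syslog.host / syslog.port property names and the sendToSyslog() helper are placeholders for whichever syslog API you end up choosing:

```java
import org.apache.flume.Channel;
import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.Transaction;
import org.apache.flume.conf.Configurable;
import org.apache.flume.sink.AbstractSink;

// Sketch of a custom syslog sink: event bodies are handed to a syslog client of your choice.
public class SyslogSink extends AbstractSink implements Configurable {

    private String host;
    private int port;

    @Override
    public void configure(Context context) {
        // Properties come from the agent configuration file; the names are illustrative.
        host = context.getString("syslog.host", "localhost");
        port = context.getInteger("syslog.port", 514);
    }

    @Override
    public Status process() throws EventDeliveryException {
        Status status = Status.READY;
        Channel channel = getChannel();
        Transaction txn = channel.getTransaction();
        txn.begin();
        try {
            Event event = channel.take();
            if (event != null) {
                // Hand the event body to the syslog API you picked.
                sendToSyslog(new String(event.getBody()));
            } else {
                status = Status.BACKOFF;   // nothing in the channel right now
            }
            txn.commit();
        } catch (Exception e) {
            txn.rollback();
            status = Status.BACKOFF;
        } finally {
            txn.close();
        }
        return status;
    }

    private void sendToSyslog(String message) {
        // Placeholder: send the message to host:port with whichever syslog client you choose.
    }
}
```

In the agent configuration you would then reference this sink by setting its type to the fully qualified class name.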

Related

How to allow gRPC Channel in Java to use multiple connections?

The Javadoc for the io.grpc.Channel class mentions that:
A channel is free to have zero or many actual connections to the endpoint based on configuration, load, etc.
My question is: how do I enable this configuration, which allows gRPC's channel to open multiple actual connections when needed?
Update:
According to this Microsoft Doc, only one TCP connection is made!
This is the default behavior; no configuration is needed. See https://grpc.io/docs/guides/performance/ for a few tips on gRPC best practices; there's a short blurb in there about channels and HTTP/2 connection limits.
I don't think the feature (or configuration) is readily available in the grpc-java or grpc-go clients. However, you can explore the grpc-dotnet client; I think they found a workaround (which even they don't recommend using).
who manages these multiple connections
In grpc-java, the channel itself manages the HTTP/2 connections. We don't have access to these connections (even if we want to configure them).
There may be a possible workaround: implementing the same thing by creating multiple channels. See this.
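To illustrate that workaround, here is a rough sketch in grpc-java. The ChannelPool class, host, port and pool size are all illustrative; it is not part of the gRPC API:

```java
import io.grpc.ManagedChannel;
import io.grpc.ManagedChannelBuilder;

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative pool of gRPC channels; spreading calls across several channels
// spreads them across several underlying HTTP/2 (TCP) connections.
public class ChannelPool {

    private final List<ManagedChannel> channels = new ArrayList<>();
    private final AtomicInteger counter = new AtomicInteger();

    public ChannelPool(String host, int port, int size) {
        // Each ManagedChannel manages its own connection(s) to the endpoint.
        for (int i = 0; i < size; i++) {
            channels.add(ManagedChannelBuilder.forAddress(host, port)
                    .usePlaintext()
                    .build());
        }
    }

    // Pick a channel round-robin; build your stub from whatever this returns.
    public ManagedChannel next() {
        return channels.get(Math.floorMod(counter.getAndIncrement(), channels.size()));
    }

    public void shutdown() {
        channels.forEach(ManagedChannel::shutdown);
    }
}
```

You would then build your stubs from pool.next() so successive calls are spread over the pooled channels and their underlying TCP connections.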

Does Kaa have messaging mechanism to external systems?

Is there any option in Kaa so that when a device contacts the Kaa platform with some data, it can send the same information to our external systems through a message broker? For example, when a temperature sensor updates the current temperature value in Kaa, is Kaa able to send the same information to messaging brokers like ActiveMQ?
Maybe you can try the Kafka or Flume appender of Kaa.
I tried using the Kafka appender to send data from some sensors to a Storm server, as in the reference below, and it works fine.
https://www.kaaproject.org/iot-real-time-data-processing-in-storm-using-kaa/
You can also write a custom appender by following the URL below:
https://kaaproject.github.io/kaa/docs/v0.10.0/Customization-guide/Log-appenders/
There are many possibilities for doing that, which are better or worse depending on your particular use case.
Usually, though, the most efficient way is to use one of the existing log appenders that run on the Kaa server side and were created specifically for such messaging.

how is flume distributed?

I am working with Flume to ingest a ton of data into HDFS (about petabytes of data). I would like to know how Flume makes use of its distributed architecture. I have over 200 servers, and I have installed Flume on one of them, which is where I get the data from (aka the data source); the sink is HDFS (Hadoop is running over Serengeti on these servers). I am not sure whether Flume distributes itself over the cluster or whether I have installed it incorrectly. I followed Apache's user guide for Flume installation and this post on SO.
How to install and configure apache flume?
http://flume.apache.org/FlumeUserGuide.html#setup
I am a newbie to Flume and trying to understand more about it. Any help would be greatly appreciated. Thanks!!
I'm not going to speak to Cloudera's specific recommendations but instead to Apache Flume itself.
It's distributed however you decide to distribute it. Decide on your own topology and implement it.
You should think of Flume as a durable pipe. It has a source (you can choose from a number), a channel (you can choose from a number) and a sink (again, you can choose from a number). It is pretty typical to use an Avro sink in one agent to connect to an Avro source in another.
Assume you are installing Flume to gather Apache webserver logs. The common architecture would be to install Flume on each Apache webserver machine. You would probably use the Spooling Directory Source to get the Apache logs and the Syslog Source to get syslog. You would use the memory channel for speed and so as not to affect the server (at the cost of durability) and use the Avro sink.
That Avro sink would be connected, via Flume load balancing, to two or more collectors. The collectors would use an Avro source, a file channel, and whatever you want (Elasticsearch? HDFS?) as the sink. You may even add another tier of agents to handle the final output.
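As a rough sketch (all component names, hostnames, ports and paths below are illustrative), the first-tier agent described above could be configured along these lines:

```properties
# First-tier agent on each webserver machine
web.sources = apacheLogs
web.channels = memCh
web.sinks = avro1 avro2
web.sinkgroups = collectors

# Pick up Apache log files dropped into a spooling directory
web.sources.apacheLogs.type = spooldir
web.sources.apacheLogs.spoolDir = /var/log/apache2/spool
web.sources.apacheLogs.channels = memCh

# Memory channel: fast, but events are lost if the agent dies
web.channels.memCh.type = memory
web.channels.memCh.capacity = 10000

# Two Avro sinks pointing at two collector agents
web.sinks.avro1.type = avro
web.sinks.avro1.hostname = collector1.example.com
web.sinks.avro1.port = 4141
web.sinks.avro1.channel = memCh

web.sinks.avro2.type = avro
web.sinks.avro2.hostname = collector2.example.com
web.sinks.avro2.port = 4141
web.sinks.avro2.channel = memCh

# Load-balance events across the collectors
web.sinkgroups.collectors.sinks = avro1 avro2
web.sinkgroups.collectors.processor.type = load_balance
```

Each collector would then declare an Avro source listening on the matching port, its own file channel, and an HDFS or Elasticsearch sink.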
In recent versions, Apache Flume no longer follows a master-slave architecture; that design was dropped as of Flume 1.x.
There is no longer a Master, and no Zookeeper dependency. Flume now runs with a simple file-based configuration system.
If we want it to scale, we need to install it on multiple physical nodes and run our own topology. As far as a single node is concerned:
Say we hook into a JMS server that delivers 2000 XML events per second, and I need two Flume agents to consume that data. I then have two options for distributing them:
Two Flume agents started and running to get the JMS data on the same physical node.
Two Flume agents started and running to get the JMS data on two different physical nodes.

Using NGINX to forward tracking data to Flume

I am working on providing analytics for our web property based on instrumentation data we collect via a simple image beacon. Our data pipeline starts with Flume, and I need the fastest possible way to parse query string parameters, form a simple text message and shove it into Flume.
For performance reasons, I am leaning towards nginx. Since serving a static image from memory is already supported, my task is reduced to handling the query string and forwarding a message to Flume. Hence, the question:
What is the simplest reliable way to integrate nginx with Flume? I am thinking about using syslog (Flume supports syslog listeners), but I struggle with how to configure nginx to forward custom log messages to a syslog (or just TCP) listener running on a remote server and on a custom port. Is it possible with existing 3rd party modules for nginx or would I have to write my own?
Separately, anything existing you can recommend for writing a fast $args parser would be much appreciated.
If you think I am on a completely wrong path and can recommend something better performance-wise, feel free to let me know.
Thanks in advance!
You should parse the nginx log file the way tail -f does and then pass the results to Flume. That will be the simplest and most reliable way. The problem with syslog is that it blocks nginx and may get completely stuck under high load or if something goes wrong (this is why nginx doesn't support it).
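One way to wire that up is Flume's exec source tailing the access log (the query-string parsing itself would happen in an interceptor or further downstream). A minimal sketch, with illustrative names, path and collector address:

```properties
# Single agent on the nginx box
nginx.sources = accessLog
nginx.channels = memCh
nginx.sinks = avroOut

# The exec source runs the command and turns each output line into a Flume event
nginx.sources.accessLog.type = exec
nginx.sources.accessLog.command = tail -F /var/log/nginx/access.log
nginx.sources.accessLog.channels = memCh

nginx.channels.memCh.type = memory

# Forward events to a downstream Flume collector
nginx.sinks.avroOut.type = avro
nginx.sinks.avroOut.hostname = flume-collector.example.com
nginx.sinks.avroOut.port = 4141
nginx.sinks.avroOut.channel = memCh
```

Keep in mind that the exec source offers no delivery guarantee if the agent or the tail process dies, so it trades durability for simplicity.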

Serial Port Commands

I need to send commands through a serial port to control an electronic device. According to the datasheet of this device, the command structure is as follows: Prefix, Command, Carriage Return. There are some commands, e.g. GOCW_BY1, STATUSRQ, etc.
The program will be developed in C++/CLI. After I create the SerialPort object and set the port parameters, I send commands using the Write("String") method of the SerialPort class. However, I still haven't figured out what kind of string I must pass to the Write method.
Moreover, I don't know the meaning of the prefix. Could you help me?
In C++/CLI, I recommend against using the .NET System::IO::Ports::SerialPort class. C++/CLI gives you convenient access to the Win32 API, which is far more powerful (and IMO easier to use) than the .NET API.
See, for example, this question about accessing serial ports from C++.
