How to write a custom flume-ng source for creating Avro files on an HDFS sink? - flume-ng

I'm trying to write a custom source that can create Avro files on an HDFS sink, but I can't figure it out.
I would like to see some guidelines or an example.

This will help you get the details:
https://flume.apache.org/FlumeUserGuide.html
But at a high level, to create a custom Flume source:
1. Change the Flume config: add your source configuration and point its type at your MyCustomSource class.
2. Write the source logic in MyCustomSource.java, copy the jar to the Flume nodes, and restart the agents (a sketch of both follows).
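For illustration, here is a minimal sketch of a pollable custom source, following the general pattern of the Flume Developer Guide; the package, class name, and the myProp property are placeholders:

package com.example;

import java.nio.charset.StandardCharsets;

import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.PollableSource;
import org.apache.flume.conf.Configurable;
import org.apache.flume.event.EventBuilder;
import org.apache.flume.source.AbstractSource;

public class MyCustomSource extends AbstractSource implements Configurable, PollableSource {

    private String myProp; // placeholder property read from the agent config

    @Override
    public void configure(Context context) {
        myProp = context.getString("myProp", "default");
    }

    @Override
    public Status process() throws EventDeliveryException {
        try {
            // A real source would fetch data from an external system here.
            Event event = EventBuilder.withBody(myProp.getBytes(StandardCharsets.UTF_8));
            getChannelProcessor().processEvent(event);
            return Status.READY;
        } catch (Throwable t) {
            return Status.BACKOFF;
        }
    }

    // Required by the PollableSource interface since Flume 1.6.
    @Override
    public long getBackOffSleepIncrement() { return 1000; }

    @Override
    public long getMaxBackOffSleepInterval() { return 5000; }
}

And a hypothetical agent configuration wiring that source to an HDFS sink; per the Flume User Guide, serializer = avro_event together with hdfs.fileType = DataStream makes the sink write Avro container files:

a1.sources = r1
a1.channels = c1
a1.sinks = k1

a1.sources.r1.type = com.example.MyCustomSource
a1.sources.r1.channels = c1

a1.channels.c1.type = memory

a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = hdfs://namenode/flume/events
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.serializer = avro_event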

Related

log4r package - MongoDB integration

I explored the log4r package and found that it has console and file appenders. I am looking to push the logs to a MongoDB database, but I am unable to find any way to do this. Is there an appender for that?
TIA
You can try the mongoimport tool.
If you do not specify a file (option --file=), mongoimport reads data from standard input (stdin).
Alternatively, write the logs into a temporary file and use mongoimport to import it.
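For example, assuming the log4r file appender writes one JSON document per line to app.log (the database and collection names here are made up):

mongoimport --db=logs --collection=app_logs --file=app.log

If you omit --file, mongoimport reads the same data from standard input, so a log stream can also be piped straight into it.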

Starting a SparkR session using an external config file

I have an RStudio driver instance which is connected to a Spark cluster. I would like to know if there is a way to connect to the Spark cluster from RStudio using an external configuration file which specifies the number of executors, memory, and other Spark parameters. I know we can do it using the command below:
sparkR.session(sparkConfig = list(spark.cores.max='2',spark.executor.memory = '8g'))
I am specifically looking for a method that takes the Spark parameters from an external file to start the SparkR session.
Spark uses a standardized configuration layout, with spark-defaults.conf used for specifying configuration options. This file should be located in one of the following directories:
SPARK_HOME/conf
SPARK_CONF_DIR
All you have to do is configure the SPARK_HOME or SPARK_CONF_DIR environment variable and put the configuration there.
Each Spark installation comes with template files you can use as inspiration.
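For example, a spark-defaults.conf equivalent of the sparkR.session() call above, with the same illustrative values, placed in SPARK_HOME/conf:

spark.cores.max        2
spark.executor.memory  8g

With that file in place, a plain sparkR.session() call should pick these settings up without a sparkConfig argument.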

In Pentaho, can I load a file and process the data directly into an Oracle database?

As of now, I am downloading the file from SFTP to a local machine and then loading it into the database. I want to remove the extra step of downloading the file to the machine.
The Text file input step is based on Apache Commons VFS, which can read from an SFTP server. So the solution is to define the Filename/Directory with the appropriate syntax:
sftp://[username[:password]@]hostname[:port][relative-path]
Supported file systems are listed on the Apache Commons VFS web page.
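For example, a hypothetical value for the Filename field (user, password, host, and path are made up):

sftp://etl_user:secret@sftp.example.com:22/inbound/sales.csv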

Rserve: "fileio disable" not working

I would like to prevent programs using Rserve from being able to navigate my file system, read/write files, etc. Based on the Rserve documentation on the config file, it looks like this should be possible using the "fileio disable" option:
https://www.rforge.net/Rserve/doc.html
However, I created an Rserve.cfg file as follows:
port 6312
fileio disable
and I am still able to read/write/delete files on my system through Rserve. It definitely registered the config file, since it is using port 6312, but it gives no indication of whether it registered the fileio disable option.
Does anyone know how to disable file access from Rserve? I realize the documentation isn't perfect; is my formatting wrong?

SFTP polling using Java

My scenario is as follows:
One Java program is uploading some random files to an SFTP location.
My requirement is that as soon as a file is uploaded by that program, I need to download it using Java. The files can be 100 MB in size. I am searching for a Java API that is helpful here. I don't even know the names of the files, but I can match them with a regular expression. The same file can be uploaded by the other program periodically. Since the file size is large, I need to wait until the file is completely uploaded.
I used JSch to download files, but I cannot figure out how to poll using JSch.
Polling
All you can do is keep listing the remote directory periodically until you find a new file. There's no better way with SFTP. For that you would obviously use ChannelSftp.ls().
Regarding selecting files matching a certain pattern, see:
JSch ChannelSftp.ls - pass match patterns in java
Waiting until the upload is complete
Again, there's no support for this in widespread implementations of SFTP.
For details, see my answer to:
SFTP file lock mechanism
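Putting both points together, here is a minimal JSch polling sketch; the host, credentials, and paths are made up, ls() is used with a glob pattern in place of a regular expression, and the size-stability check is a general heuristic rather than the mechanism discussed in the linked answer:

import com.jcraft.jsch.ChannelSftp;
import com.jcraft.jsch.JSch;
import com.jcraft.jsch.Session;
import com.jcraft.jsch.SftpException;

import java.util.HashSet;
import java.util.Set;
import java.util.Vector;

public class SftpPoller {

    // Heuristic: treat a file as fully uploaded once its size stops
    // changing between two consecutive checks.
    static boolean isStable(ChannelSftp sftp, String path)
            throws SftpException, InterruptedException {
        long before = sftp.stat(path).getSize();
        Thread.sleep(5_000);
        return sftp.stat(path).getSize() == before;
    }

    public static void main(String[] args) throws Exception {
        JSch jsch = new JSch();
        Session session = jsch.getSession("user", "sftp.example.com", 22);
        session.setPassword("secret");
        session.setConfig("StrictHostKeyChecking", "no"); // sketch only; verify host keys in production
        session.connect();

        ChannelSftp sftp = (ChannelSftp) session.openChannel("sftp");
        sftp.connect();

        Set<String> downloaded = new HashSet<>();
        while (true) {
            // ls() accepts glob patterns such as *.dat
            @SuppressWarnings("unchecked")
            Vector<ChannelSftp.LsEntry> entries = sftp.ls("/inbound/*.dat");
            for (ChannelSftp.LsEntry entry : entries) {
                String name = entry.getFilename();
                String remote = "/inbound/" + name;
                if (!downloaded.contains(name) && isStable(sftp, remote)) {
                    sftp.get(remote, "/local/dir/" + name); // download the file
                    downloaded.add(name);
                }
            }
            Thread.sleep(10_000); // polling interval
        }
    }
}

Note that the downloaded set only detects new file names; if the same name is re-uploaded periodically, you would additionally compare modification times from entry.getAttrs().getMTime().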
