Move whole file into HDFS as single file using flume spooling directory - flume-ng

As far as the Flume documentation goes, we can move data into HDFS based on event size, event count, or duration. Is there any way to move a whole file from the spooling directory into HDFS as a single file?
Example:
Spooling Dir              HDFS
file1 - 1000 events ----> file1 - 1000 events
file2 - 1008 events ----> file2 - 1008 events
file3 - 800 events  ----> file3 - 800 events
Thanks.

Well, sort of. You need to tweak your configuration to reflect that, because Flume wasn't designed to move entire files regardless of their size; you can do that more effectively with hadoop fs -copyFromLocal.
Here's a list of things you need to configure:
a) The batch size must be smaller than the number of events in your files if you only spool files sporadically. Otherwise your events may stay stuck in your channels.
b) hdfs.rollSize = 0 to make sure your files don't get rolled over after any size limit.
c) hdfs.rollCount = 0 to make sure your files don't get rolled over after any number of events.
d) hdfs.rollInterval set to a decent amount to make sure your files get spooled on time.
e) Spool one file at a time to avoid mix-ups.
That's basically it.
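The settings above could be put together roughly as follows. This is only a sketch: the agent name a1, the component names, the paths, and the 300-second roll interval are placeholders chosen for illustration, and using basenameHeader with %{basename} to carry the original file name into HDFS is an addition that the list above does not mention.

```properties
# Hypothetical agent "a1"; component names and paths are placeholders.
a1.sources = spool
a1.channels = ch
a1.sinks = out

# Spooling directory source; drop one file at a time into spoolDir.
a1.sources.spool.type = spooldir
a1.sources.spool.spoolDir = /path/to/spool
# Assumption: keep the original file name in an event header.
a1.sources.spool.basenameHeader = true
a1.sources.spool.channels = ch

a1.channels.ch.type = file

a1.sinks.out.type = hdfs
a1.sinks.out.channel = ch
a1.sinks.out.hdfs.path = /flume/spooled
a1.sinks.out.hdfs.filePrefix = %{basename}
a1.sinks.out.hdfs.fileType = DataStream
# Never roll on size or event count.
a1.sinks.out.hdfs.rollSize = 0
a1.sinks.out.hdfs.rollCount = 0
# Roll on time only; pick a value longer than it takes to spool one file.
a1.sinks.out.hdfs.rollInterval = 300
```

With this layout the output file is closed only when the interval expires, so a second file dropped into the spool directory before then would end up in the same HDFS file.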

Related

log4j, order of every event in file

logger.info(1)
logger.info(2)
logger.info(3)
log4j writes events to a .log file in an async manner. So, is there a chance that the order of events in the physical file will be like:
2
1
3
?

how to flush page data in python using mmap

I am trying to map a region of FPGA memory into the host system:
resource0 = os.open("/sys/bus/pci/devices/0000:0b:00.0/resource0", os.O_RDWR | os.O_SYNC)
resource_size = os.fstat(resource0).st_size
mem = mmap.mmap(resource0, 65536, flags=mmap.MAP_SHARED, prot=mmap.PROT_WRITE | mmap.PROT_READ, offset=0)
If I flush my host page with
mem.flush()
and then print the contents, the data is the same as before; nothing is cleared from the page:
print(mem[0:131072])
mem.flush()
print(mem[0:131072])
From my reading of the Python mmap docs, flush clears the content:
https://docs.python.org/3.6/library/mmap.html
but when I test it, the content stays the same.
I am using Python 3.6.9.
Why do you expect flush to clear a page?
https://docs.python.org/2/library/mmap.html
flush([offset, size])
Flushes changes made to the in-memory copy of a file back to disk. Without use of this call there is no guarantee that changes are written back before the object is destroyed. If offset and size are specified, only changes to the given range of bytes will be flushed to disk; otherwise, the whole extent of the mapping is flushed. offset must be a multiple of the PAGESIZE or ALLOCATIONGRANULARITY.
So if you want to clear anything, you have to assign a new value to the mapping first and then write it back to the underlying memory, i.e. flush it.
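A minimal sketch of that behaviour, using an ordinary temporary file as a stand-in for the FPGA resource (an assumption made so the example can run without the hardware): flush() leaves the mapped contents untouched, and the region only reads back as zeros once zeros have been assigned to it.

```python
import mmap
import os
import tempfile

SIZE = mmap.PAGESIZE  # one page; the mapping in the question is 65536 bytes

# Stand-in for the PCI resource file: a temp file filled with 0xFF bytes.
fd, path = tempfile.mkstemp()
os.write(fd, b"\xff" * SIZE)

mem = mmap.mmap(fd, SIZE, flags=mmap.MAP_SHARED,
                prot=mmap.PROT_READ | mmap.PROT_WRITE, offset=0)

before = mem[:SIZE]
mem.flush()                    # writes pending changes back; clears nothing
after_flush = mem[:SIZE]

mem[:SIZE] = b"\x00" * SIZE    # clearing means assigning new values...
mem.flush()                    # ...and then flushing them to the backing file
after_clear = mem[:SIZE]

mem.close()
os.close(fd)
os.remove(path)
```

Whether writing zeros actually resets anything on a real device depends on how the hardware interprets writes to that region, which this sketch does not model.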

How to limit the number of call record in CDR file

My call records are being stored in the /var/log/asterisk/cdr-csv/Master.csv file. I want to limit the number of calls stored in this file; once the limit is reached, it should start again from the beginning.
What would be the procedure for this?
You can't limit the number of records from within Asterisk.
But you can easily rotate the files using the Linux logrotate utility.
To do that, create the file /etc/logrotate.d/asterisk_cdr:
/var/log/asterisk/cdr-csv/*csv {
    missingok
    rotate 5
    size 2000k
    create 0640 asterisk asterisk
}
For more info, see the logrotate documentation.

Asterisk: Record application is generating empty files

The user making the call is asked to dial an extension. This is done by (1) playing a prompt with Background and then (2) calling wait_for_digit. Based on the extension that has been dialed, the destination number is determined and the call is forwarded to that number.
If the called person does not answer, Playback is used to play a prompt that asks the user to record a voice message; the voice message is recorded with the Record application.
The Record application always generates empty wav files, 44 bytes in size. If I remove step (1), playing a prompt with Background, the Record application generates proper files. If Background is included, all recordings are empty.
I am using Perl Asterisk::AGI module.
$agi->exec('Answer');
....
.....
$agi->exec('Background', 'en/extra/please-enter-the-extension,n'); # this is the troubling part
my $my_extension = $agi->wait_for_digit(5000);
....
.....
$agi->exec('Playback', 'en/extra/the-party-you-are-calling&en/extra/is-curntly-busy,noanswer');
$agi->exec('Playback', 'en/vm-intro,noanswer');
my $file = 'xyz.wav';
$agi->exec('Record', "$file,0,10,k");
...
...
What should I do to make it work as I want it to?
Thank you.
UPDATE 1:
The same script is working without glitches now. Not sure if something unrelated to the script has changed.
Most likely you need to check your codecs. If you use g729 or g723 with no transcoder, it simply can't write in wav format.

Qt or PyQt - check when file is used by another process. Wait until finish copy

Good morning,
What is the best strategy for checking when a big file or big directory has finished copying?
I want to wait until a file has been fully copied. Is there a code example in q
I'm working on Mac OS X.
Thanks
Update
I use QFileSystemWatcher. The problem is that I receive file or directory change notifications while the copy is in progress. So when a user copies a big folder (with many files inside), the operating system's copy process starts and takes 5 minutes, but in the meantime my application receives file-changed notifications. This is a problem because when I receive a change notification my application starts doing some operations on those files, but the copy is still in progress!
There is only one reliable way to do this: Change the copy process to write to temporary files and then rename them after the copy is finished.
That way, you can ignore new files which end with .tmp and rename is an atomic operation.
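A minimal sketch of the write-then-rename idea in Python, assuming you control the copying side. The .tmp suffix and the helper names copy_then_publish and ready_files are hypothetical conventions for this example, and the rename is only atomic when the temporary and final names are on the same filesystem.

```python
import os
import shutil
import tempfile

TMP_SUFFIX = ".tmp"  # hypothetical convention shared by writer and reader


def copy_then_publish(src, dst):
    """Copy src to dst + '.tmp', then rename it to dst once the copy is done."""
    tmp = dst + TMP_SUFFIX
    shutil.copyfile(src, tmp)
    os.replace(tmp, dst)  # atomic when tmp and dst share a filesystem


def ready_files(directory):
    """Return names of fully copied files, skipping anything still ending in .tmp."""
    return sorted(n for n in os.listdir(directory) if not n.endswith(TMP_SUFFIX))


# Usage: a source file and a separate destination directory.
src_dir = tempfile.mkdtemp()
dst_dir = tempfile.mkdtemp()
src = os.path.join(src_dir, "big.bin")
with open(src, "wb") as f:
    f.write(b"x" * 1024)

# A copy that is still in progress is invisible to the reader...
open(os.path.join(dst_dir, "partial.bin" + TMP_SUFFIX), "wb").close()
in_progress = ready_files(dst_dir)

# ...and a finished copy shows up only under its final name.
copy_then_publish(src, os.path.join(dst_dir, "big.bin"))
finished = ready_files(dst_dir)
```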
If you can't change the copy process, all you can do is add a timer to wait for, say, half an hour to make sure the copy is really finished.
A more fine-grained (and more risky) approach is to add a loop that checks the file size and stops when the file size hasn't changed for a certain time, but that's also hard to get right.
Worse, this doesn't prevent you from reading partial files (when the copy process was terminated in the middle).
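The size-check loop could look roughly like the sketch below. The function name wait_until_stable and its parameters are made up for this example, it also compares the modification time as an extra signal, and it shares the weakness described above: a copy that was killed halfway also looks "stable".

```python
import os
import tempfile
import time


def wait_until_stable(path, quiet_seconds=5.0, poll_seconds=1.0, timeout=None):
    """Block until the size and mtime of path stop changing for quiet_seconds.

    Returns True once the file looks stable, False if timeout (seconds) expires.
    """
    deadline = None if timeout is None else time.monotonic() + timeout
    last = None
    stable_since = time.monotonic()
    while True:
        st = os.stat(path)
        current = (st.st_size, st.st_mtime)
        now = time.monotonic()
        if current != last:
            # Something changed: remember the new state and restart the clock.
            last = current
            stable_since = now
        elif now - stable_since >= quiet_seconds:
            return True
        if deadline is not None and now >= deadline:
            return False
        time.sleep(poll_seconds)


# Usage with short intervals so the example finishes quickly.
fd, path = tempfile.mkstemp()
os.write(fd, b"finished copy")
os.close(fd)

stable = wait_until_stable(path, quiet_seconds=0.2, poll_seconds=0.05, timeout=5)
timed_out = wait_until_stable(path, quiet_seconds=10, poll_seconds=0.05, timeout=0.2)
os.remove(path)
```

In a Qt application this blocking loop would need to run outside the GUI thread, or be rewritten around a timer.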
I think that the QFileSystemWatcher is the right start for you to get to the point of monitoring for changes, but as you have found, these changes are ANY changes. From this point, I think it should be easy enough for you to just check the modification time of the file.
Here is a simple example of a Watcher class that will let you specify a file to monitor and see if it has been modified after a given time. It can run a callback or emit a signal that anyone can watch:
import os.path
import time

from PyQt4 import QtCore


class Watcher(QtCore.QObject):
    fileNotModified = QtCore.pyqtSignal(str)
    MOD_TIME_DIFF = 5  # seconds

    def __init__(self, aFile, callback=None, checkEvery=5):
        super(Watcher, self).__init__()
        self.file = aFile
        self.callback = callback
        self._timer = QtCore.QTimer(self)
        self._timer.setInterval(checkEvery*1000)
        self._timer.timeout.connect(self._checkFile)

    def _checkFile(self):
        diff = time.time() - os.path.getmtime(self.file)
        if diff > self.MOD_TIME_DIFF:
            self._timer.stop()
            self.fileNotModified.emit(self.file)
            if self.callback:
                self.callback()

    def start(self):
        self._timer.start()

    def stop(self):
        self._timer.stop()
An example of using it:
def callbackNotify():
    print("Callback!")

def signalNotify(f):
    # The signal fires when the file has NOT changed recently
    print("Signal: %s has not been modified!" % f)

# You could directly give it a callback
watcher = Watcher("/path/to/file.file", callback=callbackNotify)
# Or could use a signal
watcher.fileNotModified.connect(signalNotify)
# tell the watcher timer to start checking
watcher.start()
## after the file hasn't been modified in 5 seconds ##
# Signal: /path/to/file.file has not been modified!
# Callback!
Try using the QtConcurrent framework.
In particular, check out QFuture and QFutureWatcher. You can execute asynchronous copy operations inside a QFuture object and monitor its progress through signals and slots with a watcher.
#include <QFutureWatcher>
#include <QtConcurrentRun>

bool copyFunction() {
    // copy operations here, return true on success
    return true;
}

MyHandlerClass myObject;
QFutureWatcher<bool> watcher;
QObject::connect(&watcher, SIGNAL(finished()), &myObject, SLOT(handleFinished()));
QFuture<bool> future = QtConcurrent::run(copyFunction);
watcher.setFuture(future);  // the watcher only emits finished() for a future it was given
Since you have no control over the external application, my suggestion is that you lock the files while you work on them. That way, other programs will not be able to access them while they are locked.
Alternatively, if you have access to the other program's source, you should implement some form of inter-process communication, via sockets, messages, or whatever method you prefer.
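A rough Python sketch of the locking suggestion, using fcntl.flock from the standard library (the helper name try_lock is made up for this example). One caveat worth stating plainly: on Mac OS X and other POSIX systems these locks are advisory, so they only keep out programs that also ask for the lock. A copy process that never calls flock will not be stopped by it.

```python
import fcntl
import os
import tempfile


def try_lock(fd):
    """Try to take an exclusive advisory lock on fd without blocking."""
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return True
    except BlockingIOError:
        return False


tmp_fd, path = tempfile.mkstemp()
os.close(tmp_fd)

# Two independent opens of the same file stand in for two cooperating programs.
worker = os.open(path, os.O_RDWR)
other = os.open(path, os.O_RDWR)

first = try_lock(worker)             # our application takes the lock
second = try_lock(other)             # the other opener is refused while we hold it
fcntl.flock(worker, fcntl.LOCK_UN)   # done working on the file
third = try_lock(other)              # now the lock is available again

os.close(worker)
os.close(other)
os.remove(path)
```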

Resources