AWS IoT MQTT Client for Python - How do you subscribe forever? - infinite-loop

I'm using the example from samples/basicPubSub/basicPubSub.py with useWebsocket=True.
I have some file my_test_file.py where I connect() and subscribe() with a message callback. Let's say the callback just writes to some log file. How can I have this file always be running, so that if I publish from somewhere else, today, tomorrow, a year from now - this log file is constantly being written to with the message?
import time

from AWSIoTPythonSDK.MQTTLib import AWSIoTMQTTClient

client = AWSIoTMQTTClient('client_id', useWebsocket=True)
client.configureEndpoint('host', port)
client.configureCredentials('path to cert')
client.configureAutoReconnectBackoffTime(1, 32, 20)
client.configureOfflinePublishQueueing(-1)  # unlimited offline publish queueing
client.configureDrainingFrequency(2)  # drain queued messages at 2 Hz
client.configureConnectDisconnectTimeout(10)  # seconds
client.configureMQTTOperationTimeout(5)  # seconds
client.connect()
client.subscribe('topic name', 1, _some_callback_func)
while True:
    time.sleep(1)
Is having an infinite while loop at the end of my_test_file.py the only way? With the infinite while loop, I run the file and it's a blocking process, but it stays subscribed indefinitely. Is it a combination of a systemd service and this infinite while loop? I saw some loop_forever() methods in the Paho MQTT client; does the AWS IoT MQTT client have something similar? Is loop_forever() just implementing an infinite while loop?

An infinite loop is the right way to handle this, as long as it doesn't block your main process (core logic). To keep them separate, I recommend isolating this script in its own Python file and having the main Python script run it as a subprocess. Before you get hands-on, read Multiprocessing - process-based parallelism.
Something always needs to keep the Python process resident in memory, unless the OS reboots unexpectedly.
It may not seem like an elegant way to handle it, but I advise you to build an execution-management structure around this Python script: always start it at machine/device boot, add a separate watchdog script, or something similar, so that internet connection problems can be handled by forcing the system to reboot or reconnect.
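For illustration, here is a minimal sketch of that layout: the blocking subscribe loop is isolated in its own function, and the main script runs it as a separate process (endpoint, credential path, topic, and the log file name are placeholders taken from the question):

import time
from multiprocessing import Process
from AWSIoTPythonSDK.MQTTLib import AWSIoTMQTTClient

def _some_callback_func(client, userdata, message):
    # The callback just appends each message to a log file.
    with open('messages.log', 'a') as f:
        f.write('%s %s\n' % (message.topic, message.payload))

def run_subscriber():
    client = AWSIoTMQTTClient('client_id', useWebsocket=True)
    client.configureEndpoint('host', 443)  # placeholder endpoint/port
    client.configureCredentials('path to root CA')  # placeholder path
    client.connect()
    client.subscribe('topic name', 1, _some_callback_func)
    while True:  # block forever; the SDK delivers messages on its own thread
        time.sleep(1)

if __name__ == '__main__':
    p = Process(target=run_subscriber)  # keep the blocking loop out of the main process
    p.start()
    # ... core logic of the main script goes here ...
    p.join()

Run this under systemd (or a watchdog script, as suggested above) so it is restarted if it ever dies.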

Related

MPICH2, the failure of one process will crash all other processes

I use MPICH2. When I launch processes with mpiexec, the failure of one process will crash all other processes. How can I avoid this?
In MPICH, there is a flag called -disable-auto-cleanup which will prevent the process manager from automatically cleaning up all processes when a single process fails.
However, MPI itself does not have much support for fault tolerance and this is something that the Fault Tolerance Working Group is working on adding in a future version of the MPI Standard.
For now, the best you can do is change the default MPI Error Handler away from MPI_ERRORS_ARE_FATAL, which causes all processes to abort, to something else like MPI_ERRORS_RETURN which would return the error code to the application and allow it to do something else. However, you're not likely to be able to communicate anymore after a failure has occurred, especially if you are trying to use collective communication.
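Sketched in Python with the mpi4py bindings (an assumption for illustration; in C the equivalent call is MPI_Comm_set_errhandler):

from mpi4py import MPI

comm = MPI.COMM_WORLD

# Swap the default MPI_ERRORS_ARE_FATAL handler for MPI_ERRORS_RETURN,
# so a failure raises an exception here instead of aborting every rank.
comm.Set_errhandler(MPI.ERRORS_RETURN)

try:
    # Point-to-point traffic with a peer that may have died.
    comm.send(b'ping', dest=(comm.rank + 1) % comm.size, tag=0)
except MPI.Exception as e:
    # The error code comes back to the application; decide how to carry on.
    print('rank %d: communication failed: %s' % (comm.rank, e))

As noted above, collectives are unlikely to keep working after a failure, so this mostly helps point-to-point patterns.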

Process stops getting network data

We have a process (written in C++/managed) which receives network data via TCP/IP.
After running the process for a while under network load, the network seems to enter a frozen state and the process stops getting data, while other processes on the system that use the same NIC keep operating normally.
The process gets out of this frozen state by itself after several minutes.
Any idea what is happening?
Is there any counter I can track to see if my process is reaching some limit?
It is going to be very difficult to answer specifically
-- without knowing what exactly your process/application is about,
-- whether it is a network chat application, a file server/client, or something else,
-- without other details about how your process is implemented and what libraries it uses, if relevant to the problem.
You also haven't mentioned what OS and environment you are running this process under, so there is very little anyone can say for certain. It could be anything: a busy-wait loop in your code, locking problems if it's multi-threaded code, ...
Nonetheless, here are some options to check.
If it's Linux, try the commands below to debug and monitor the behaviour of the process and see what the problem could be:
top
Check top to see how much CPU and memory your process is using, and whether any value (e.g. CPU usage) is abnormally high.
pstack
This should show the stack frames the process is executing at the time of the problem.
netstat
Run this with the necessary options (tcp/udp) to check the state of the network sockets opened by your process.
gcore -s -c
This forces your process to dump core when the problem happens; you can then analyze that core file using gdb.
gdb
Use the where command at the gdb prompt to get a full backtrace of the process (the function it was executing last and the preceding function calls).
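If you would rather poll the same numbers programmatically, here is a rough Python sketch using the third-party psutil library (the PID is hypothetical; substitute your process's):

import time
import psutil  # third-party: pip install psutil

PID = 1234  # hypothetical PID of the process that freezes

proc = psutil.Process(PID)
while True:
    cpu = proc.cpu_percent(interval=1.0)           # like top's %CPU column
    rss = proc.memory_info().rss / (1024 * 1024)   # resident memory, in MB
    socks = proc.connections(kind='tcp')           # like netstat, scoped to this PID
    print('cpu=%.1f%% rss=%.1fMB tcp_sockets=%d' % (cpu, rss, len(socks)))
    time.sleep(5)

Logging this across the freeze window should show whether the process itself stalls (CPU drops to zero) or keeps spinning while its sockets back up.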

Potential Concerns in Stopping Meteor Ungracefully

Just getting into Meteor, which by many accounts seems like a great project. One potential issue (which it may not be) is there doesn't seem to be a meteor stop or another programmatic way to shut down meteor gracefully. Please let me know if I am wrong about this!
Are there potential concerns about maintaining database integrity (for example), if we interrupt the process using CTRL-C or shutting it down via an Activity Monitor? And are there steps we can take to reduce or eliminate such issues?
Caveat: I recognize the above questions are somewhat vague, and I understand that this is usually considered harmful on Stack, but I hope they are still answerable ones.
Thanks,
It does look like there is a cleanup which takes place before the process is terminated (https://github.com/meteor/meteor/blob/master/tools/cleanup.js).
The first signal sent is SIGINT, which is a polite way to ask the process to shut down (and gives it time to finish its last running thread).
With database integrity, the mongod process also tries to clean itself up before it shuts down, and it has a recovery mechanism (from the journal files) for a quick recovery on restart if it is forced to shut down.
That being said, in the middle of a longer-running thread I'm not too sure whether it's allowed to finish or is killed immediately. But Meteor does attempt to give it a chance at a graceful termination first, and then escalates to a SIGHUP and finally a SIGTERM (which is still a graceful termination signal). At no point does Meteor force or send a SIGKILL or SIGSTOP.
So Meteor apps should be safe from Ctrl+C termination. With Activity Monitor termination, it depends on what type of signal is sent (i.e. Force Quit or just Quit).
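The same polite-shutdown pattern applies to any long-running script you manage yourself; a minimal sketch in Python (not Meteor's actual code) of catching those signals for cleanup:

import signal
import sys

def graceful_exit(signum, frame):
    # Flush buffers, close database handles, etc., then exit cleanly.
    print('caught signal %d, cleaning up' % signum)
    sys.exit(0)

# SIGINT is what Ctrl+C sends; SIGTERM is the usual "please stop" signal.
signal.signal(signal.SIGINT, graceful_exit)
signal.signal(signal.SIGTERM, graceful_exit)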
So to add some closure to this: if your mongodb is externally managed, i.e. on a production deployment server, meteor doesn't stop it, as mongo-runner.js notes:
// Since it is externally managed, asking it to actually stop would be
// impolite, so our stoppable handle is a noop
if (process.env.MONGO_URL) {
launch_callback();
return handle;
}

Communication between two programs signals or shared mem?

I need to implement (in Qt) a solution for communication between two programs running on a Linux machine. One program is the Worker and the second is the Watchdog. Basically I need the Watchdog to periodically check on the Worker and, in case something is wrong (no process, hang, no answer from the Worker), kill the Worker (if present) and start it again.
The Worker runs as a daemon, so I think starting it from /etc/init.d/worker would be appropriate.
I can see two solutions:
Unix signals - both of them can send and receive the Unix SIGUSR1 signal
Shared memory
Which one to choose?
With signals, both programs will have to know each other's PID, probably by reading it from /var/run, so that looks like a drawback.
With shared memory, all I need is a key that both programs have hardcoded, so there is no need to read PIDs from the filesystem. Since the Watchdog should start first, it can create the shared memory segment, and the Worker will only attach to it and maybe update a timestamp value in it? However, to stop the Worker in case of a hang, the Watchdog will still need the Worker's PID to send it SIGKILL - maybe it can read that from shared memory too? Both concepts are new to me.
So what is the proper way to build a reliable Watchdog, or am I missing something?
Best regards,
Marek
I think this is the best solution available through Qt:
http://qt-project.org/doc/qt-4.8/qlocalsocket.html
http://qt-project.org/doc/qt-4.8/qlocalserver.html
The QLocalSocket class provides a local socket. On Windows this is a named pipe and on Unix this is a local domain socket.
http://qt-project.org/doc/qt-4.8/ipc-localfortuneserver.html
http://qt-project.org/doc/qt-4.8/ipc-localfortuneclient.html
Hope that helps.
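On Unix those Qt classes boil down to a local domain socket, so the heartbeat idea can also be sketched with nothing but Python's standard socket module (the socket path and timeout values here are made up for illustration):

import os
import socket
import time

SOCK_PATH = '/tmp/worker-watchdog.sock'  # hypothetical rendezvous name

def watchdog():
    # Watchdog side: listen and expect a heartbeat every few seconds.
    if os.path.exists(SOCK_PATH):
        os.unlink(SOCK_PATH)
    srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    srv.bind(SOCK_PATH)
    srv.listen(1)
    conn, _ = srv.accept()
    conn.settimeout(10.0)  # no heartbeat within 10 s => treat the Worker as hung
    try:
        while conn.recv(16):  # each recv is one heartbeat
            pass
    except socket.timeout:
        print('worker hung: kill and restart it here')

def worker():
    # Worker side: connect once and send a heartbeat periodically.
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    s.connect(SOCK_PATH)
    while True:
        s.sendall(b'ping')
        time.sleep(3)

A broken or silent connection is itself the failure signal, so neither side needs to read PIDs from the filesystem just to detect a hang; and if the Watchdog is the one that spawns the Worker, it already has the PID for the kill/restart step.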

Persistent connection to external source in uwsgi project

I have a project which needs to make a tcp connection to an external source. Each worker thread will be sending messages to this external service.
I'm wondering how I can do this without having a connection be brought up and torn down for every request. I'm pretty sure the pymongo module does something similar but I can't find any documentation on it. Would it be possible to set up some kind of thread-safe queue and have a separate thread consume that queue? I understand I could probably use gearman for this, but I'd like to avoid having another moving part in the system.
uWSGI has a thread-safe, process-shared queueing system (http://projects.unbit.it/uwsgi/wiki/QueueFramework), but are you sure Python's plain thread-safe Queue class is not enough?
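For the simple case, a sketch of that Queue approach: one background thread owns a single persistent TCP connection, and worker threads only enqueue (host and port are placeholders):

import queue
import socket
import threading

outbox = queue.Queue()  # thread-safe; workers put, the sender thread gets

def sender():
    # One persistent connection, owned by this thread alone.
    conn = socket.create_connection(('external-service.example', 9000))
    while True:
        msg = outbox.get()  # blocks until a worker enqueues a message
        conn.sendall(msg)
        outbox.task_done()

threading.Thread(target=sender, daemon=True).start()

# Any worker thread can now send without touching the socket:
outbox.put(b'hello from a worker\n')

Note that a Queue like this is only shared within one process; if your uWSGI workers run as separate processes, that is where the process-shared uWSGI queue framework comes in.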