munin df plugin configuration - munin

Every time an LVM snapshot is dropped, I receive Munin warnings:
Disk latency per device :: Average latency for /dev/vg/lv_mysql_snapshot
UNKNOWNs: Read IO Wait time is unknown, Write IO Wait time is unknown.
Disk latency per device :: Average latency for /dev/mapper/vg-lv_mysql-real
UNKNOWNs: Read IO Wait time is unknown, Write IO Wait time is unknown.
Disk latency per device :: Average latency for /dev/mapper/vg-lv_mysql_snapshot-cow
UNKNOWNs: Read IO Wait time is unknown, Write IO Wait time is unknown.
Is there any way to ignore these LVM partition notifications in Munin?
Thanks.

In your case a quick:
[diskstats]
env.exclude lv
should be enough. Since env.* settings are read by munin-node when it runs the plugin, this belongs in the plugin configuration on the monitored host (typically a file under /etc/munin/plugin-conf.d/), not in munin.conf on the master.
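For reference, a minimal sketch of how the stanza might sit in the node's plugin configuration file; the path shown is the Debian-style default and only an assumption here:

# /etc/munin/plugin-conf.d/munin-node  (path assumed; adjust for your distribution)
[diskstats]
env.exclude lv

After saving the change, restart munin-node so the plugin is re-run with the new environment.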

Related

Need advice on serial port software access implementation on single CPU core device

I am designing software controlling several serial ports; the operating system is OpenWrt. The device the application runs on is a single-core ARM9 @ 450 MHz. The protocol on the serial ports is Modbus.
The problem is with the Modbus slave implementation. I designed it in a real-time manner, looping over reads from the serial port (the port is open in non-blocking mode with 0 characters to wait for and no timeouts). The sequence/data-stream timeout is about 4 milliseconds at 9600/8/N/1 (3.5 character times, as advised). The timeout is only checked when the application sees nothing in the buffer, so if the application is slower than the incoming stream of characters, the timeout mechanism will not kick in until all characters have been drained from the buffer and the bus is quiet.
But I see that the CPU switches between threads and this thread is starved for about 40-60 milliseconds, which is far too long for measuring these timeouts. While I guess the serial port buffer still receives the data (how large is that buffer?), I am unable to tell how much time passed between characters, so I may treat the next message as a continuation of the previous one and miss a Modbus request targeted at my device.
Therefore, I guess, something must be redesigned (for the slave; the master is a different story). The first idea that comes to mind is to forget about timeouts and just parse the incoming data after synchronizing with the stream once (by finding an initial timeout). However, there are several problems: I must know everything about all types of Modbus messages to parse them correctly, find where each one ends and the next one starts, and I do not see a way to differentiate a Modbus request from a Modbus response from another device. If the designers of the Modbus protocol had put a special bit in the function-code field identifying whether a message is a request or a response... but that is not the case, and I do not see a reliable way to tell whether the message I am receiving is a request or a response without reading further bytes and checking the CRC16 at the would-be byte counts, which costs time while bytes are still arriving and may make me miss the window for responding to a request targeted at me.
Another idea would be to use a blocking read with the VTIME timeout setting, but that value can only be set in tenths of a second (so the minimum is 100 ms), which is too much given another +50 ms of possible CPU switching between threads; I think something like a 10 ms timeout is needed here. It is also a good question whether VTIME is measured in hardware or in software/the driver (and therefore also subject to thread scheduling), and how many bytes the FIFO in the chip/driver can accumulate.
Any help/idea/advice will be greatly appreciated.
Update per @sawdust's comments:
non-blocking use has turned out not to be a good approach, because I am not polling the hardware but the software, and the whole design is again subject to CPU switching and execution scheduling;
"using the built-in event-based blocking mode" would work in blocking mode if I would be able to configure UART timeout (VTIME) in 100 us periods, plus if again I would be sure I am not interrupted during the executing - only then I will be able to have timing properly, and restart reading in time to get next character's timing assessed properly.

BizTalk send port retry interval and retry count

There is one dynamic send port (Req/response) in my orchestration.
The request is sent to an external system and the response is received in the orchestration. There is a chance the external system has a monthly maintenance window of 2 days. To handle that scenario:
If I set the retry interval to 2 days, does that impact performance? Is it a good idea?
I wouldn't think it is a good idea, as even a transitory error of another type would then mean that message would be delayed by two days.
As maintenance is usually scheduled, either stop the send port (but don't unenlist) or stop the receive port that picks up the messages to send (preferable, especially if it is high volume), and start them again after the maintenance period.
The other option would be to build that logic into the orchestration, so that if it catches an exception it increases the retry interval on each retry. However, as above, if it is high volume you might be better off switching off the receive location, as otherwise you will have a high number of running instances.
Set a service window on the send port if you know when the receiving system will be down. If the schedule is unknown, I would rather set:
retry count = 290
retry interval = 10 minutes
so that the messages keep being retried for a bit over two days.
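A quick check of the arithmetic behind those values: two days are 2 × 24 × 60 = 2880 minutes, and 2880 / 10 = 288 retries, so a retry count of 290 at a 10-minute interval covers the whole maintenance window with a small margin.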

Adafruit IO data rate limit

I'm trying to send data from multiple ESP8266 boards to feeds on my Adafruit IO account.
The problem is that when I try to send new values, I get banned from publishing because the 2-second time limit is violated when two or more of my MCUs happen to send data at the same time (I can't synchronize them to avoid this).
Is there any possible solution to this problem?
I suggest considering these three options:
A sending token which is passed from one ESP to the next. Basically no ESP is allowed to send by default; when an ESP receives the token, it is allowed to send, waits the appropriate time limit, and then hands the token to the next ESP. This solution has all the ESPs connected via an AP/router and uses client-to-client communication. It can be set up fail-safe, so if the next ESP is not available (reset, out of battery, etc.) you take the next one on the list and issue an additional warning to the server.
The second solution could be (more flexible and dynamic, but a single point of failure) to set up one ESP as a data collector that does all the sending.
If the ESPs are in different locations, you have to set them up so that they meet the following requirement:
If you have a free Adafruit IO Account, the rate limit is 30 data points per minute. If you exceed this limit, a notice will be sent to the {username}/throttle MQTT topic. You can subscribe to the topic if you wish to know when the Adafruit IO rate limit has been exceeded for your user account. This limit applies to all Data record modification actions over the HTTP and MQTT APIs, so if you have multiple devices or clients publishing data, be sure to delay their updates enough that the total rate is below your account limit.
So it is not a 2-second limit but 30/min (60/min if Pro), and you limit each ESP's sending according to the formula:
30 / number of ESPs sending to Adafruit IO -> 30 / 5 = 6 ==> 5 incl. safety margin
meaning each ESP is only allowed to send 5 times within a minute. Important: once those 5 sends are used up, it has to wait out the rest of the minute before the next send.
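To make the budgeting concrete, here is a minimal standalone sketch (plain C rather than an ESP8266 sketch, with an assumed device count of 5) of the calculation above:

/* Sketch: per-device send budget for a shared Adafruit IO rate limit.
 * The 30/min limit is from the quoted documentation (free tier); the
 * number of devices is a hypothetical example. */
#include <stdio.h>

int main(void)
{
    const int limit_per_minute = 30;   /* account-wide limit, free tier */
    const int num_devices      = 5;    /* assumed number of ESP8266 boards */

    int budget = limit_per_minute / num_devices;   /* 6 sends per device per minute */
    budget -= 1;                                   /* safety margin -> 5 */

    double interval_s = 60.0 / budget;             /* 12 s between sends per device */
    printf("per device: %d sends/min, i.e. at least %.0f s between sends\n",
           budget, interval_s);
    return 0;
}

On the boards themselves the same numbers simply translate into a minimum delay between publishes (12 s in this example).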
The answer is simple: just don't send that frequently.
In the IoT world:
If the data needs frequent updates (motor/servo positions, accelerometer readings, etc.), you often want to keep it local and won't want or need to send it to the cloud.
If the data needs to be in the cloud, it often does not need to be updated that frequently (temperature/humidity, for example).
Alternatively, if you still think your data is so critical that it needs such frequent updates, dedicate one ESP as your Edge Gateway to collect the data from the sensor nodes and send it to the cloud in one go; that is actually the proper way to design an IoT network with multiple sensor nodes.
If that still doesn't work for you, you still have the choice of paying for the premium service to raise the rate limit, or building your own cloud service and integrating it with your Edge Gateway.

Why does MPI_Barrier not return at the same time for workers across different nodes?

I used the following piece of code to synchronize between worker nodes on different boxes:
MPI_Barrier(MPI_COMM_WORLD);
gettimeofday(&time[0], NULL);
printf("RANK: %d starts at %lld sec, %lld usec\n",rank, time[0].tv_sec, time[0].tv_usec);
When I run two tasks on the same node, the starting times are quite close:
RANK: 0 starts at 1379381886 sec, 27296 usec
RANK: 1 starts at 1379381886 sec, 27290 usec
However, when I run two tasks across two different nodes, I end up with starting times that differ more:
RANK: 0 starts at 1379381798 sec, 720113 usec
RANK: 1 starts at 1379381798 sec, 718676 usec
Is this difference reasonable? Or does it imply some communication issue between the nodes?
A barrier means that the different nodes synchronize, which they do by exchanging messages. However, once a node has received the message from all other nodes saying they have reached the barrier, that node continues. There is no reason to keep waiting, since barriers are mainly used to guarantee, for instance, that all nodes have processed their data, not to synchronize the nodes in time.
One can never perfectly synchronize nodes in time; only by using time-synchronization protocols such as (S)NTP can one guarantee that the clocks are set approximately equal.
A barrier guarantees only that the code before the barrier has been executed on every node before any node starts executing what comes after it.
For instance let's say all nodes execute the following code:
MethodA();
MPI_Barrier();
MethodB();
Then you can be sure that if a node executes MethodB, all other nodes have executed MethodA; however, you don't know anything about how far they have already progressed through MethodB.
Latency has a strong influence on the measured difference. Say, for instance, machineA is somehow faster than machineB (assume a WORLD of two machines; such differences can be caused by caching, ...).
When machineA reaches the barrier, it sends a message to machineB and waits for a message from machineB saying that it has reached the barrier as well. A moment later machineB reaches the barrier and sends its message to machineA. However, machineB can immediately continue processing data, since it has already received machineA's message; machineA, on the other hand, must wait until the message arrives. Of course that message arrives quite soon, but it causes some time difference. Furthermore, it is not guaranteed that the message is received correctly the first time: if machineA does not confirm reception, machineB will resend it after a while, causing more delay. In a LAN, however, packet loss is very unlikely.
One can thus say that the transmission time (latency) accounts for the difference you see.
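If you want to timestamp the barrier exit without worrying about printf format specifiers, a minimal sketch using MPI_Wtime might look like the following; note that MPI_Wtime still reads each node's local clock, so the clock-synchronization caveat above still applies:

/* Sketch: timestamping the barrier exit with MPI_Wtime on each rank.
 * MPI_Wtime reads the local clock, so cross-node comparisons are only as
 * good as the nodes' clock synchronization (see the note on NTP above). */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Barrier(MPI_COMM_WORLD);
    double t = MPI_Wtime();            /* local time right after leaving the barrier */
    printf("RANK: %d left the barrier at %.6f s (local clock)\n", rank, t);

    MPI_Finalize();
    return 0;
}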

How would I simulate TCP-RTM using/in NS2?

There is a paper called "TCP-RTM: Using TCP for Real Time Multimedia Applications" by Sam Liang and David Cheriton.
The paper adapts TCP for use in real-time applications.
The two major modifications, which I actually want help with, are:
On application-level read on the TCP connection, if there is no in sequence data queued to read but one or more out-of-order packets are queued for the connection, the first contiguous range of out-of-order packets is moved from the out-of-order queue to the receive queue, the receive pointer is advanced beyond these packets, and the resulting data delivered to the application. On reception of an out-of-order packet with a sequence number logically greater than the current receive pointer (rcv next ptr) and with a reader waiting on the connection, the packet data is delivered to the waiting receiver, the receive pointer is advanced past this data and this new receive pointer is returned in the next acknowledgment segment.
In the case that the sender’s send-buffer is full due to large amount of backlogged data, TCP-RTM discards the oldest data segment in the buffer and accepts the new data written by the application. TCP-RTM also advances its send-window past the discarded data segment. This way, the application write calls are never blocked and the timing of the sender application is not broken.
They actually modified the 'TCP Reno with SACK' variant of TCP in an old Linux 2.2 kernel in a real environment.
But I want to simulate this in NS2.
I can work with NS2 (e.g. analyzing traces, making performance graphs, etc.), and I looked through all the related files but cannot find where to make the changes.
So, would you please help me to do this?
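For concreteness, here is a tiny standalone sketch (plain C, not NS2 code) of the first modification quoted above: when a read finds no in-sequence data, the receive pointer jumps to the first contiguous out-of-order range and that range is delivered. The sequence numbers and segment sizes are hypothetical, and in NS2 the corresponding logic would have to be ported into its TCP agent and sink implementations.

/* Standalone model (not NS2 code) of the TCP-RTM receive-side rule:
 * if no in-sequence data is available but out-of-order segments are queued,
 * skip the hole, advance the receive pointer and deliver the first
 * contiguous range. Values below are hypothetical. */
#include <stdio.h>

#define MAX_SEG 16

struct seg { unsigned seq, len; };          /* one buffered out-of-order segment */

static struct seg ooo[MAX_SEG];             /* out-of-order queue, sorted by seq */
static int n_ooo;
static unsigned rcv_next = 0;               /* next byte the application expects */

static void rtm_read(void)                  /* what one application read() delivers */
{
    if (n_ooo == 0) {
        printf("read: nothing buffered\n");
        return;
    }
    if (ooo[0].seq > rcv_next) {
        /* No in-sequence data: this is the RTM change - skip the hole. */
        printf("read: skipping hole [%u..%u)\n", rcv_next, ooo[0].seq);
        rcv_next = ooo[0].seq;
    }
    /* Deliver the first contiguous range starting at rcv_next. */
    int delivered = 0;
    while (delivered < n_ooo && ooo[delivered].seq == rcv_next) {
        printf("read: delivering [%u..%u)\n", rcv_next, rcv_next + ooo[delivered].len);
        rcv_next += ooo[delivered].len;
        delivered++;
    }
    for (int i = delivered; i < n_ooo; i++)  /* drop delivered segments */
        ooo[i - delivered] = ooo[i];
    n_ooo -= delivered;
}

int main(void)
{
    /* Segments 100..200 and 200..300 arrived, but 0..100 was lost. */
    ooo[n_ooo++] = (struct seg){100, 100};
    ooo[n_ooo++] = (struct seg){200, 100};
    rtm_read();          /* jumps past [0..100) and delivers bytes 100..300 */
    return 0;
}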
