Can telegraf store metrics data on the local system during network outages and later forward it? - telegraf

We have IoT devices which are mostly connected well to the internet, but there is a possibility that the network goes down. For this case, the device itself will do the right thing (while it cannot be actively controlled any more). We would still like to get metrics data for the time in which the network is down.
It means a device-local telegraf would need to collect the metrics data, store it and check on the network connection. If the network is up (again), then forward to a influxDB for example.
Is it possible to achieve this scenario with Telegraf/InfluxDB or prometheus?

Telegraf can not store metrics on the local drive in the case of a failure. However, in can buffer unsuccessfully sent metrics (I believe in RAM) and flush the buffer on the successful write. Take a look at metric_buffer_limit option in Telegraf config:
# Configuration for telegraf agent
[agent]
## For failed writes, telegraf will cache metric_buffer_limit metrics for each
## output, and will flush this buffer on a successful write. Oldest metrics
## are dropped first when this buffer fills.
## This buffer only fills when writes fail to output plugin(s).
metric_buffer_limit = 10000
This way, as far as you don't overflow this buffer, metrics collected while InfluxDB is down will still be preserved and resent later.
Edit: you can track a similar feature request here.

Related

Adafruit IO data rate limit

I'm trying to send data from multiple ESP-8266 to feeds on my Adafruit IO account.
The problem is that when I try to send new values, I'm faced with a ban from publishing because the 2 seconds time limit is violated when two or more of my MCUs happen to send data at the same time (I can't synchronize them to avoid this).
is there any possible solution to this problem?
I suggest to consider those three options:
A sending token which is send from one ESp to the next. So basically all ESPs are mot allowed to send. If the token is received its allowed to send - waits the appropriate time limit hands the token to the next ESP. This solution has all Arduinos connected via an AP/router and would use client to client communication. It can be setup fail safe, so if the next ESP is not available (reset/out of battery etc) you take the next on the list and issue an additional warning to the server
The second solution could be (more flexible and dynamic BUT SPO - single point of failure) to set up one ESP as data collector to do the sending.
If the ESps are in different locations you have to set them up that they meet the following requirement:
If you have a free Adafruit IO Account, the rate limit is 30 data
points per minute.
If you exceed this limit, a notice will be sent to the
{username}/throttle MQTT topic. You can subscribe to the topic if you
wish to know when the Adafruit IO rate limit has been exceeded for
your user account. This limit applies to all Data record modification
actions over the HTTP and MQTT APIs, so if you have multiple devices
or clients publishing data, be sure to delay their updates enough that
the total rate is below your account limit.
so its not 2 sec limit but 30/min (60/min if pro) so you limit sending each ESP to the formula:
30 / Number of ESPs sending to I/O -> 30 / 5 = 6 ==> 5 incl. saftey margin
means each ESP is within a minute only allowed to send 5 times. Important if the 5 times send limit is up it HAS to wait a minute before the next send.
The answer is simple, just don't send that frequent.
In the IoT world
If data need frequent update (such as motor/servo, accelerometer, etc.), it is often that you'd want to keep it local and won't want/need to send it to the cloud.
If the data need to be in the cloud, it is often not necessary need to be updated so frequently (such as temperature/humidity).
Alternatively, if you still think that your data is so critical that need to be updated so frequently, dedicate one ESP as your Edge Gateway to collect the data from sensor nodes, and send it to the cloud at once, that actually the proper way of an IoT network design with multiple sensor nodes.
If that still doesn't work for you, you still have the choice of pay for the premium service to raise the rate limit, or build your own cloud service and integrate it with your Edge Gateway.

mqtt paho network loop unnecessary?

I have seen a number of examples of paho clients reading sensor data then publishing, e.g., https://github.com/jamesmoulding/motion-sensor/blob/master/open.py. None that I have seen have started a network loop as suggested in https://eclipse.org/paho/clients/python/docs/#network-loop. I am wondering if the network loop is unnecessary for publishing? Perhaps only needed if I am subscribed to something?
To expand on what #hardillb has said a bit, his point 2 "To send the ping packets needed to keep a connection alive" is only strictly necessary if you aren't publishing at a rate sufficient to match the keepalive you set when connecting. In other words, it's entirely possible the client will never need to send a PINGREQ and hence never need to receive a PINGRESP.
However, the more important point is that it is impossible to guarantee that calling publish() will actually complete sending the message without using the network loop. It may work some of the time, but could fail to complete sending a message at any time.
The next version of the client will allow you to do this:
m = mqttc.publish("class", "bar", qos=2)
m.wait_for_publish()
But this will require that the network loop is being processed in a separate thread, as with loop_start().
The network loop is needed for a number of things:
To deal with incoming messages
To send the ping packets needed to keep a connection alive
To handle the extra packets needed for high QOS
Send messages that take up more than one network packet (e.g. bigger than local MTU)
The ping messages are only needed if you have a low message rate (less than 1 msg per keep alive period).
Given you can start the network loop in the background on a separate thread these days, I would recommend starting it regardless

How to send big data to an HTTP server where the client has limited RAM?

I am designing a temperature and humidity logging system using microcontrollers and embedded C. The system sends the data to the server using GET method (could be POST,too) whenever it gathers a new data from the sensors. Whenever the power and/or the internet is gone, it logs to an SD card.
I want to send these logged data to the server whenever the internet comes back. When sent separately, each of my request is as follows and it takes about 5 seconds to complete even on my local network.
GET /add.php?userID=0000mac=000:000:000:000:000:000&id=0000000000sensor=000&temp=%2B000.000&hum=000.000 HTTP/1.1\r\nHost: 192.168.10.25\r\n\r\n
However, since my available RAM is very limited to only about 400 bytes in this microcontroller, I cannot buffer and send all the logged data in one request.
For an electricity/internet loss of 2 days, about 3000 of data set is logged. How do I send these to the server in a very quick way?
Well I think a simple for loop will do it. Here is the pseudocode:
foreach(loggedRequest){
loadTheRequestFromTheLogFile();
sendTheDataForThisRequest(); // Hang here until the server returns a response
clearMemoryFromPreviousRequest();
}
By waiting for the server's response, you will be sure that the data got to the server as well as that you will not have the ram full.
You can also go with an inner for loop with a fixed number of requests and then wait for all their responses. This will be faster but you have to do multiple tests to be sure you're not reaching to the ram limit.

syslog drops logs silently

I'm using syslog to log data to a file - the data is pretty intensive, the order of thousands of rows every few seconds. What I observe is that trace amounts of logs are being missed - less than 0.1 % most of the times - but they're still missing. I have no explanation for why this occurs.
It doesn't seem to correlate directly to the amount of data being written because increasing the amount of data being written did not increase the rate of missed logs.
I'm wondering of ways to debug this - how could we understand or confirm if it is indeed syslog which is dropping data and if so why?
If you look at the source code for syslogd, you will see that the syslogd program only uses datagram sockets (type SOCK_DGRAM). These are by definition connectionsless but also not completely reliable in the sense that stream sockets are.
This is by design. Using stream sockets would mean that the syslog() call would have to wait for a confirmation that the message that it sent was received properly. So if syslogd was busy, every application that calls syslog() would block.
Syslogd was simply not designed with the volume of data that you are subjecting it to in mind. You could try enlarging the value of the sysctl variable kern.ipc.maxsockbuf, giving the logging socket a larger buffer.
If you want to make sure you capture everything, write to a file instead.

Xively: sending commands to devices - how to mark commands as read?

I can see how a datastream/channel can be used to send commands to a device (e.g. an actuator). The device can periodically poll the channel for incoming commands, but if the device has no storage of its own how can it tell which commands it has already received/processed?
This all depends on your implementation and hardware choices. And the real answer to this question lays far beyond the scope of Xively. You say that the device has no storage of its own, but I assume it has some kind of volatile memory at the very least.
The best thing to do would be to store the timestamp of the last datapoint that was received and compare it to whatever current data you have. If the timestamp is greater than the one in memory then you know it is new data.
An alternative to HTTP polling would be to use a socket with some kind of publish/subscribe interface that will allow you to received only new data from the server. Xively offers this on it's TCP, WebSockets, and MQTT socket servers.

Resources