A question over whether emonCMS is compatible with MQTT

pb66 · 15 April 2018 18:24

A couple of recent discussions have brought this into question for me.

I’m specifically thinking about how it is/should be implemented in emonHub, but I now wonder if this is actually a more global question. Since I do not regularly use MQTT for various reasons, it might just be that I’m overlooking something, or maybe there is an issue but it can be easily overcome.

The posts I refer to are the “Mqtt_input service high memory usage” thread that got me thinking about MQTT QoS and what that means to the route to emoncms. The other post was “Data not logged locally” which made me question the value of the QoS 2 settings and therefore question the suitability of using MQTT to post to emoncms with the current emoncms implementation.

When MQTT is published with a QoS level of 2 (“exactly once”), the data is buffered until receipt is confirmed (twice per connection), and all MQTT traffic involves 2 connections: 1 to the broker and 1 from the broker. What is unclear to me is what is happening with the queue(s). I did find a blog post that says the queue is “FIFO”, but that isn’t official documentation.

“In summary, this is really just a FIFO queue that is stored on the broker per topic and for each client that has subscribed to the topic with qos >= 1. It is useful (and necessary) whenever you have events or sensor readings that you do not want to miss, even if the receiving client is temporarily unavailable.”

For now I will assume that to be the case; it’s what I would expect, or at least hope for. If the buffered messages do not get released in order once a connection is re-established, that raises another question, because emoncms can only post data in chronological order. This is more so with the “buffered-write” implementation of the low-write mode of emoncms. Because the “buffered-write” mode makes it so difficult to insert data, and the low-write mode is essential to the widely used SD card operation of the emonSD, the insert functions and APIs do not get developed, so essentially it applies to all emoncms installs.

The main concern is whether the benefit of QoS 2 is being felt in emoncms, or whether it is just adding extra processing and slowing the connection down so much that it creates a queue that is unlikely to reach its destination.

From what I understand for the FIFO buffer and therefore QoS 2 to work there must be a solid connection that never disconnects (when there is a queue) or the connection between emoncms and the broker must be made “persistent” (see Persistent Session and Queuing Messages - MQTT Essentials: Part 7).

To create a “persistent” MQTT connection the client must supply a client id so that the buffered queue can be repatriated with the correct client. Ordinarily it seems that the default connection method is a “clean” session, which basically tells the broker the new connection is not interested in any previously queued data and that once disconnected, it will not want any data queued until it reconnects. This undermines the intention of using QoS 2 (or even 1) in emoncms applications.
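To make the clean versus persistent distinction concrete, here is a toy model of the broker side in Python. It is only a sketch of the behaviour described above, not Mosquitto's implementation: the `ToyBroker` class and its methods are made up for illustration, and it assumes every client is subscribed to everything at QoS 1 or higher.

```python
from collections import deque


class ToyBroker:
    """Toy model of per-client session queues (hypothetical, not Mosquitto's code)."""

    def __init__(self):
        self.sessions = {}  # client_id -> deque of messages queued for a persistent session
        self.online = {}    # client_id -> list that receives delivered messages

    def connect(self, client_id, clean_session, inbox):
        if clean_session:
            # A clean session discards anything stored under this client id.
            self.sessions.pop(client_id, None)
        else:
            # A persistent session resumes: release queued messages in FIFO order.
            queue = self.sessions.setdefault(client_id, deque())
            while queue:
                inbox.append(queue.popleft())
        self.online[client_id] = inbox

    def disconnect(self, client_id):
        self.online.pop(client_id, None)

    def publish(self, message):
        # Deliver immediately to every connected client.
        for inbox in self.online.values():
            inbox.append(message)
        # Queue for persistent clients that are currently offline.
        for client_id, queue in self.sessions.items():
            if client_id not in self.online:
                queue.append(message)


def received_after_outage(clean_session):
    """Replay a short outage and return what the client ends up with."""
    broker = ToyBroker()
    inbox = []
    broker.connect("emoncms", clean_session, inbox)
    broker.publish("reading 1")
    broker.disconnect("emoncms")
    broker.publish("reading 2")  # published while the client is offline
    broker.publish("reading 3")
    broker.connect("emoncms", clean_session, inbox)
    broker.publish("reading 4")
    return inbox
```

With `clean_session=True` the two readings published during the outage never arrive; with `clean_session=False` and a stable client id they are released in order on reconnection.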

Looking at the docs for the MQTT lib used in emoncms, it seems the constructor needs to be called with a client ID and cleanSession = false (see Mosquitto\Client — Mosquitto-PHP 0.4.0 documentation).

Currently emoncms constructs an MQTT client with no parameters:

github.com

emoncms/emoncms/blob/master/scripts/phpmqtt_input.php#L101

    
      
                $log->error("Cannot connect to redis at ".$settings['redis']['host'].", autentication failed"); die('Check log\n');
            }
        }
    } else {
        $redis = false;
    }

    require("Modules/user/user_model.php");
    $user = new User($mysqli,$redis,null);

    require_once "Modules/feed/feed_model.php";
    $feed = new Feed($mysqli,$redis,$settings['feed']);

    require_once "Modules/input/input_model.php";
    $input = new Input($mysqli,$redis,$feed);

    require_once "Modules/process/process_model.php";
    $process = new Process($mysqli,$input,$feed,$user->get_timezone($mqttsettings['userid']));

    $device = false;
    if (file_exists("Modules/device/device_model.php")) {

So each time it disconnects and reconnects it is assigned a new random client ID by the broker. A null/omitted client ID implies cleanSession = true, so no queued data will be passed on reconnection. Essentially, every time there is a disconnect and reconnect, emoncms is starting a fresh connection; even when there is queued data for the taking, emoncms will not receive that data. QoS 2, in this instance, doesn’t survive a break in the connection, no matter how short or prolonged it is. Unfortunately, emoncms sees a huge number of reconnects, so I assume there is little or no queued data making it to emoncms.

I believe that we should at least be supplying a client ID parameter in the MQTT client constructor. The cleanSession = false might be implied by supplying the client ID, but it might be better to explicitly set it to false anyway.

I suggest we (currently) use the write apikey of the first user; this will currently work as only the first user has MQTT input capability in emoncms. We don’t want a hardcoded client ID in case more than one emoncms instance is connected to any one broker, but when mqtt_input becomes multi-user we do not want this client ID to be per user either, so maybe we need a unique ref for each emoncms (in settings.php?), or we can continue to use the apikey of “the admin”, assuming only one admin user.

I have submitted a PR to the emoncms repo for consideration, but I have not tested it; it is predominantly to get a discussion going. If anyone feels like trying it out, please do. We can devise a test to determine if it works by taking emoncms offline for a few minutes whilst a queue builds up in the broker, and then restarting emoncms to see if the data is complete, both with and without the “client id”.

Can anyone confirm or disprove my ramblings?

pb66 · 15 April 2018 22:30

It has just dawned on me that all of this is pretty irrelevant. In fact emoncms should be changed to QoS 0 until proper timestamping is implemented across the board for MQTT. Who wants data that is potentially hours old arriving late, getting timestamped with the current time and blocking the new, current and correct data?

stephen · 15 April 2018 22:32

I would say having a unique client name is very important. I do not use emoncms per se, but I use MQTT extensively in my devices’ communication (I can have hundreds of connections in the background). You would probably be better off linking the device name to the MAC, as I doubt most people are running more than one instance on a Pi. I think most people’s problems are that their MQTT client names are the same on several of their devices; they start to overlap, and then data is lost or the MQTT input service has high memory usage.

I actually ran into that problem when building my TFT dashboard, and it took me a bit to figure out what was going on. I had several dashboards loaded with the same MQTT client name and all sorts of weird things going on. Now when I write my code (for both host and MQTT client) I use a device name, say “TFT dashboard”, and automatically tack the MAC on at the end. It also makes debugging easier later: you will see in the DHCP list that the dashboard loaded and connected, and you can figure out any network issues because you have the MAC of the device listed.

I do not worry about QoS, I just use the default settings. Since it is really only trivial data, why backlog the system to guarantee delivery? I send my data every 1/2 second on some of my devices; the highest is every 3 seconds. If I lose 1 in 100000 or even 1 in 1000 sends, what difference is it going to make? If you have a cr@ppy network (i.e. a satellite connection or something with very high latency) then enable QoS, but then you are also going to have to reduce your send frequency to compensate for any major delays.
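The “device name plus MAC” idea could look something like the Python sketch below. The function name and the exact naming scheme are only assumptions for illustration; `uuid.getnode()` is the standard library call that returns the hardware address as an integer (it can fall back to a random number if no address is found, so a fixed value may be safer on some machines).

```python
import uuid


def make_client_id(device_name, mac=None):
    """Build a client id such as 'TFT-dashboard-a1b2c3d4e5f6' (naming scheme is an assumption)."""
    if mac is None:
        # 48-bit hardware address as an integer; may be random if none can be read.
        mac = uuid.getnode()
    return f"{device_name.replace(' ', '-')}-{mac:012x}"
```

One caveat: as far as I know the MQTT 3.1.1 spec only obliges a broker to accept client IDs of up to 23 alphanumeric characters, so a long name plus MAC relies on the broker being more lenient than the minimum.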

stephen · 15 April 2018 22:43

Why would they implement that when all you need to do is send the timestamp within the payload?

pb66 · 15 April 2018 23:09

To put the timestamp in the payload is a proper implementation. Currently emoncms is set up to receive mqtt data as a single raw value in a dedicated topic, there is no place for a timestamp.

There have already been discussions on this (in the EmonHub Development thread) and hopefully there will soon be a more resilient method of posting MQTT data to emoncms. It has been recognized that there is a significant difference between “broadcasting” the current value/status of everything as a separate topic for systems to link into, and passing time-specific data for the purpose of keeping records and accurately “monitoring” energy and environmental data.

The MAC id idea is fairly good, but for me it wouldn’t work as I try to have a “test” emoncms running to pull in changes and updates to, before pulling them in to the live instance.

Most energy data is collected at 10s intervals and every datapoint is valuable. I do not subscribe to the idea that if we have an abundance of data we can afford to be wasteful. I would like to get as complete a picture as possible without any missing data, and that is easily done with buffering and confirmed delivery. If I was sending so much data that some was getting lost, that tells me I need to slow things down. If the data isn’t important enough to want to receive every datapoint, why send it?

The reason QoS 2 should be used is (for example) updating and upgrading. When apache or emoncms or mysql needs updating, the MQTT broker should queue the data and pass it on when the emoncms instance comes back up. Likewise, when the MQTT broker needs updating, emonhub will (one day) buffer the data until mosquitto comes back online. emonhub can be updated whilst it is running and only needs restarting once the update is done; if the broker is online, there will be no buffered data to lose and only the few seconds it takes emonhub to restart will be missing. This also applies to network outages etc. Following a power cut everything powers up, but the router can be somewhat slower, so buffering data is a good solution if all your servers are not on one device.

stephen · 16 April 2018 00:24

Okay, I did not know that as I do not use emoncms. Then maybe think about Influx as your back end in emoncms at some point.

Okay, do you send as a batch or as an average over 10 seconds? Most of mine is ~1/2 - 1 second sends, so I have a lot of data coming in, and I might average it at the end, say every 3 seconds, dividing by the number of successful sends (usually it is 5). I can see your point: if you send a single datapoint every 10 seconds, that’s a lot of missing info if you lose a few datapoints in a row. If I lose a few datapoints every so often it has very little effect on my overall data profile.

If you used Influx as a backend you could have used the method of data preservation that I use on my OpenWrt router, which is my MQTT server. All the important incoming MQTT data is stored in a CSV, timestamped as it arrives. The CSV is deleted once it has been sent to InfluxDB and recreated when the next MQTT message is received, and at the end of each day a new CSV is created with the day’s timestamp. I then batch send the data every 30 seconds (user defined) to InfluxDB. If the network goes out for days, it sends the data when it finally connects: it goes through all the CSVs, deleting each one as it is uploaded and moving on to the next. But I guess one could apply it to how you currently send your data to emoncms as well.
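A rough Python sketch of that store-and-forward idea follows. It is not the actual router script: the function names are invented, and `send_batch` is a hypothetical callable standing in for whatever uploads the rows (InfluxDB, emoncms, ...) and returns True on success.

```python
import csv
import os


def store_reading(csv_path, timestamp, topic, value):
    """Append one timestamped reading to the spool file as it arrives."""
    with open(csv_path, "a", newline="") as f:
        csv.writer(f).writerow([timestamp, topic, value])


def forward_spool(csv_path, send_batch):
    """Try to upload the spool; delete it only if send_batch reports success.

    Returns the number of rows that were forwarded.
    """
    if not os.path.exists(csv_path):
        return 0
    with open(csv_path, newline="") as f:
        rows = list(csv.reader(f))
    if rows and send_batch(rows):
        os.remove(csv_path)  # the next store_reading() call recreates the file
        return len(rows)
    return 0  # upload failed (or nothing to send): keep the file for the next attempt
```

Because the timestamp is written when the reading arrives, a late upload does not change the time recorded against the data, which is the point being made about timestamps elsewhere in this thread.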

borpin · 20 April 2018 08:17

Yes there is, as part of a full JSON object.

There is an outstanding PR though, to improve the date time handling.

github.com/emoncms/emoncms

Modify time validation for API and MQTT input

emoncms:master ← borpin:bpo-API_MQTT-input

opened 09:36AM - 09 Apr 18 UTC

borpin

+97 -27

Fix for issue #777 and a replacement PR for #804. Validate the time input value for both the API (less bulk input) and MQTT input, to allow time to be in ISO string and seconds. Updated validation of time input to MQTT to make it more robust. Used the same method in input_methods so both input methods allow the same time format. Updated API help page. Not tested any impact to bulk upload method. Should be none.
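As an illustration of what “time in ISO string and seconds” could mean in practice, here is a small Python sketch. It is not the code from the PR (which is PHP); the function name, the fallback to the arrival time and the choice to treat naive times as UTC are all assumptions.

```python
from datetime import datetime, timezone


def parse_time(value, now):
    """Return a Unix timestamp from epoch seconds or an ISO 8601 string, else fall back to `now`."""
    if value is None:
        return now
    try:
        # Plain seconds, given as a number or a numeric string.
        return int(float(value))
    except (TypeError, ValueError, OverflowError):
        pass
    try:
        parsed = datetime.fromisoformat(str(value))
    except ValueError:
        return now  # unrecognised format: use the arrival time instead
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)  # assumption: naive times are UTC
    return int(parsed.timestamp())
```

For example, `parse_time("2018-04-15T18:24:00+00:00", now)` and `parse_time("1523816640", now)` give the same result, while an unparseable value falls back to `now`.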

The QoS issue is an interesting one. I don’t think we use it correctly. As you say, QoS 2 can only really be used if there is a timestamp. QoS 1 is probably the best as it does actually retry, whereas QoS 0, I think, just fires and forgets.

I have also added a PR that should prevent the MQTT client raising an exception.

github.com/emoncms/emoncms

Prevent MQTT Client exception

emoncms:master ← borpin:bpo-MQTTClient

opened 09:56AM - 09 Apr 18 UTC

borpin

+23 -14

Issue - subscribing to a topic when the MQTT client was not connected caused an exception; not fatal but not nice. This has been noticed on a number of topics recently on the Forum. [Documentation](http://mosquitto-php.readthedocs.io/en/latest/client.html) The onConnect callback is only called if there is a CONNACK from the MQTT server. The onConnect callback assumed that any call to it meant a successful connection. This is not correct, only a response of '0' means success, so $connected is now set as such. The reconnect try did not check the connected flag before attempting to subscribe to the topic. By doing so the exception can be avoided.
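The logic described in that PR can be sketched in a few lines of Python. This is not the PHP from the PR: the `ConnectionState` class is a hypothetical helper, and `subscribe` is any callable that performs the real subscription.

```python
class ConnectionState:
    """Tracks whether the client may safely subscribe (hypothetical helper, not emoncms code)."""

    def __init__(self):
        self.connected = False
        self.subscribed = False

    def on_connect(self, return_code):
        # Only a CONNACK return code of 0 means the connection was accepted.
        self.connected = (return_code == 0)

    def on_disconnect(self):
        self.connected = False
        self.subscribed = False

    def try_subscribe(self, subscribe):
        """Call subscribe() only when connected; return True if a subscription is in place."""
        if self.connected and not self.subscribed:
            subscribe()
            self.subscribed = True
        return self.subscribed
```

The reconnect loop would call `try_subscribe()` on each pass; nothing is sent to the broker until a CONNACK with code 0 has been seen.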

borpin · 26 April 2018 16:54

@pb66, I think you might have missed my reply.

In addition,

Need a bit more work around this. If it is a persistent client, you do not need (nor probably want) to resubscribe to the topic. Currently every reconnect includes a subscribe (which can throw an exception if the connect request was in fact unsuccessful).

Does every implementation of an MQTT broker support persistent connection?

Also looking at that reference you supplied;

each MQTT client must store a persistent session too. So when a client requests the server to hold session data, it also has the responsibility to hold some information by itself:

Not sure how to achieve that.

For now, I’d actually suggest a QOS of 1 until we can get the session stuff sorted out.

pb66 · 27 April 2018 12:36

IMO QoS 1 is actually the last thing the current implementation needs. With an uncontrolled “at least once” this could cause all sorts of problems with duplicate posts, not only with “zero time changes” when the payload gets duplicated within the same second, but also when it gets duplicated with a different timestamp.

QoS 0 would mean only the data sent during “outages” is missed, but at least all the timestamps applied at arrival in emoncms will be almost correct and duplication will be avoided regardless of all the disconnections and reconnections. QoS 2 should work ok between disconnections (i.e. no duplicates) and should result in fewer duplicated inputs than QoS 1, because only the unconfirmed messages are resent (exactly once). But currently any QoS 2 attempt to ensure delivery is made, even if late, results in skewed timestamps.

Originally emoncms was QoS0 and at some point it’s been deemed that QoS2 would be better, I can’t argue with that sentiment, but changing the MQTT implementation to a QoS 2 level implementation is more than just changing the QoS setting at connection time.

I think QoS0 must be used when data is just “fired off” with no timestamps as the data is time specific and if it’s not delivered immediately so that emoncms can give it a meaningful timestamp it probably is better “forgotten”. Ideally QoS2 should be used with timestamped data if a persistent connection can be made. I really do not like the idea of encouraging duplicate inputs with QoS1. Besides QoS1 requires a client id too.

Since there is only one connection in emoncms, the same QoS setting is used across all the subscriptions and APIs, so we cannot have QoS2 for the timestamped data and QoS0 for non-timestamped. Therefore IMO it has to be QoS0 all round until either ALL inputs are timestamped, or the publishing source(s) dial back the QoS on non-timestamped data to 0, or emoncms gets multiple topics to subscribe to, allowing different QoS settings to be set.

Perhaps the thing to do for now is to change emonhub to QoS0, since it’s the source of the un-timestamped data, which is what deems it “short life” and therefore better suited to “fire and forget”. If/when the MQTT implementation gets improved in emoncms, emonhub can then change to use the new APIs.

The interaction between emonhub and emoncms is discussed at length in the EmonHub Development thread and will possibly result in multiple APIs (perhaps mirroring the HTTP APIs). Perhaps multiple topics to allow different QoS levels might need to be considered.

However even if emoncms has a single QoS2 level implementation it will be able to get QoS 0,1 and 2 level data with the source QoS levels intact (lowest prevails), so it still needs to include some way of filtering duplicate posts in case a 3rd party source publishes data with QoS1 which could still get duplicated even with a steady QoS2 level “persistent” connection between emoncms and the broker.
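To illustrate both points, “lowest prevails” and duplicate filtering, here is a Python sketch. The `DuplicateFilter` class is hypothetical, and it only works for data that carries its own timestamp, since a payload stamped on arrival has nothing stable to compare against.

```python
def effective_qos(publish_qos, subscribe_qos):
    """QoS a subscriber actually gets for a message: the lower of the two levels."""
    return min(publish_qos, subscribe_qos)


class DuplicateFilter:
    """Drops repeats of (topic, timestamp) pairs (hypothetical helper)."""

    def __init__(self, max_entries=1000):
        self.max_entries = max_entries
        self.seen = {}  # insertion-ordered dict used as a bounded set

    def accept(self, topic, timestamp):
        """Return True the first time a (topic, timestamp) pair is seen, False for repeats."""
        key = (topic, timestamp)
        if key in self.seen:
            return False
        self.seen[key] = True
        if len(self.seen) > self.max_entries:
            # Forget the oldest key so memory use stays bounded.
            self.seen.pop(next(iter(self.seen)))
        return True
```

So a third-party source publishing at QoS 1 to a QoS 2 subscription is still delivered at QoS 1 (`effective_qos(1, 2)` is 1), and a resent message would be rejected by `accept()` as long as its key is still remembered.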

borpin · 29 April 2018 10:26

Had not realised that emonhub was using the MQTT interface by default on EmonPis.

Yes I think we need the ability to subscribe to different topics at different QOS.

borpin · 9 May 2018 21:03

I have updated the PR for my attempts to avoid the exceptions and at the same time added in @pb66 code for a persistent connection.

github.com/emoncms/emoncms

Prevent MQTT Client exception

emoncms:master ← borpin:bpo-MQTTClient


With the persistent connection made, the same topic ID is returned on subscribing when either the daemon is stopped or if the MQTT broker is stopped. This was tested with data from node-red (QOS2, retain=true) rather than emonhub.

Without a persistent connection, the ID increases on each re-subscription.

pb66 · 13 May 2018 16:53

Stumbled across something else today.

I happened to see this error message and got curious

2018-05-13 15:05:05.803|ERROR|phpmqtt_input.php|exception 'Mosquitto\Exception' in /var/www/emoncms/scripts/phpmqtt_input.php:125
Stack trace:
#0 /var/www/emoncms/scripts/phpmqtt_input.php(125): Mosquitto\Client->connect('localhost', 1883, 5)
#1 {main}
2018-05-13 15:05:05.848|WARN|phpmqtt_input.php|Not connected, retrying connection
2018-05-13 15:05:11.001|WARN|phpmqtt_input.php|Not connected, retrying connection
2018-05-13 15:05:11.003|WARN|phpmqtt_input.php|Connecting to MQTT server: Connection Accepted.: code: 0
2018-05-13 15:05:21.288|WARN|phpmqtt_input.php|Not connected, retrying connection
2018-05-13 15:05:21.298|WARN|phpmqtt_input.php|Connecting to MQTT server: Connection Accepted.: code: 0

Line 125 of phpmqtt_input.php is $mqtt_client->connect($mqtt_server['host'], $mqtt_server['port'], 5); and it looks pretty straightforward, but I didn’t know what the “5” was, so I looked into it and found it is the $keepalive setting.

Having read a little about it, I now believe it is being used incorrectly, or at least it might be incomplete. The usual disclaimers apply here: I’m not an MQTT expert and I’m learning “on the job”, so to speak.

My understanding is that a client (in this case emoncms) can define a time interval that the broker will use to assess if the client is still “alive” and communicating with the broker.

Bearing in mind that the emoncms mqtt input script is for subscribing only, it doesn’t publish any data, that means that the only time the client contacts the broker is in reply to a received topic ie a confirmation etc.

So I wonder to myself, why is it setting such a low $keepalive setting of 5 secs?

Apparently the broker will assume there is a problem if 5 secs pass between messages from the client, and after a further 2.5 secs (a total of 150% of the defined interval) it will drop the connection to initiate a reconnection.

This would then suggest that emoncms is volunteering to be cut-off from the broker after a maximum of 7.5 secs following any received data.

From what I’ve read it is the client’s responsibility to issue/publish a PINGREQ packet to tell the broker it is still connected if 5 secs pass since IT last communicated with the broker, to ensure the broker never kills a good but quiet connection. It seems to be like a watchdog, where the broker will reset/restart the connection if the client appears absent for longer than it has defined as a “keepalive” interval.

I see no evidence of emoncms doing that and since it only responds to published data it appears to be unnecessarily leaving itself open to the broker disconnecting if a 5sec datapoint is just 2.5s late.

In the case of an emonPi this might mean a single missed 5s payload results in a disconnection. But an emonBase that has no regular 5s data to keep it alive would be forever resetting, unless it has several devices well spread out time-wise so that 7.5 secs never pass between MQTT payloads.
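The arithmetic above can be checked with a few lines of Python. This is only a model of the reasoning in this post: it assumes the client never sends a PINGREQ and only talks to the broker when it acknowledges received data, which is exactly the assumption being questioned. The function names are made up.

```python
def broker_timeout(keepalive):
    """Seconds of client silence after which the broker may drop the connection."""
    return 1.5 * keepalive


def survives(client_send_times, keepalive):
    """True if no gap between consecutive client packets exceeds the broker timeout."""
    limit = broker_timeout(keepalive)
    gaps = (b - a for a, b in zip(client_send_times, client_send_times[1:]))
    return all(gap <= limit for gap in gaps)
```

With a keepalive of 5 the limit is 7.5 secs, so regular 5s data survives, one missed 5s payload (a 10s gap) does not, and a device reporting every 10s never would. The same 10s data is comfortably inside the limit with a keepalive of 60.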

Apparently by omitting the “5” the default is used (although I cannot find definitively what that is), and if a value of “0” is defined the function is disabled. Although just extending the keepalive to a minute or 2 might be a big improvement, I suspect the proper solution for emoncms would be to set a “keepalive” variable in emoncms and use that both to inform the broker and to time the gap between service messages, triggering a PINGREQ if the interval passes. When emoncms sends a PINGREQ the broker will respond with a PINGRESP; if emoncms doesn’t get a PINGRESP within a reasonable time, it (emoncms) should then initiate a reconnection from the client end.
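As a sketch of that suggestion (in Python rather than PHP, and not tied to any MQTT library), the client-side timing could be modelled like this. The `KeepaliveTimer` class, its method names and the `response_timeout` parameter are all hypothetical; the caller would send the real PINGREQ or reconnect depending on the action returned.

```python
class KeepaliveTimer:
    """Decides when a client should ping or give up (hypothetical sketch)."""

    def __init__(self, keepalive, response_timeout):
        self.keepalive = keepalive              # seconds; 0 disables the mechanism
        self.response_timeout = response_timeout
        self.last_sent = 0.0                    # time of the last packet sent to the broker
        self.ping_sent_at = None                # set while a PINGRESP is outstanding

    def packet_sent(self, now):
        """Record that the client sent something (e.g. an acknowledgement)."""
        self.last_sent = now

    def ping_response(self):
        """Record that the broker answered the outstanding PINGREQ."""
        self.ping_sent_at = None

    def next_action(self, now):
        """Return 'reconnect', 'ping' or 'wait'; returning 'ping' marks a ping as outstanding."""
        if self.ping_sent_at is not None:
            if now - self.ping_sent_at > self.response_timeout:
                return "reconnect"
            return "wait"
        if self.keepalive and now - self.last_sent >= self.keepalive:
            self.ping_sent_at = now
            self.last_sent = now
            return "ping"
        return "wait"
```

Whether emoncms needs this itself, or whether the underlying library already handles the pings, is something I have not confirmed.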

I believe the documentation is wrong for the php-mqtt extension we use

$keepalive (int) – Optional. Number of sections after which the broker should PING the client if no messages have been recieved.

and have found this issue on github (API docs describe keepalive incorrectly · Issue #621 · eclipse/mosquitto · GitHub) for the Mosquitto broker, which shows it was documented the same (incorrect?) way until late last year.

Although not for Mosquitto specifically, this page (What Is MQTT Keep Alive and Client Take-Over? – MQTT Essentials Part 10) describes it quite well, and the official MQTT spec is here (MQTT Version 3.1.1).

[edit] I have created another PR so this might attract some attention for testing or further consideration.

github.com/emoncms/emoncms

decrease keepalive sensitivity to reduce disconnections

emoncms:master ← emoncms:pb66-patch-2

opened 05:02PM - 13 May 18 UTC

pb66

+1 -1

See https://community.openenergymonitor.org/t/a-question-over-whether-emoncms-is-compatible-with-mqtt/7180/12?u=pb66 I suspect setting a keepalive of "5" is actually resulting in MORE disconnections than a more relaxed setting of "60". But it may be better to go for "180" or more to accommodate setups with just a couple of emonTH's. **PLEASE DON'T MERGE WITHOUT DISCUSSION AND CONSIDERATION OF THE CLIENT ID PR TOO!**

Perhaps users with frequent MQTT reconnection issues could try relaxing the setting to see what effect it has.

borpin · 13 May 2018 21:16

Remembering that Mosquitto-PHP is a wrapper for libmosquitto, the default is set within the Mosquitto-PHP code. Of course, this is simply sending a setting to libmosquitto. (BTW is there markdown that will reduce the number of lines displayed below?)

Click here to show code

github.com

mgdm/Mosquitto-PHP/blob/35d77cdc01bfb0db46df0cb57165487cc0a75001/mosquitto.c#L386-L389


      
          	int retval;
          	zend_long port = 1883;
          	zend_long keepalive = 60;

However, the Eclipse Mosquitto documentation for the ‘mosquitto’ broker, also states that the default is 60 seconds and the minimum is 5.

Does seem rather odd to have it set to the minimum and I’d suggest simply reverting to the default.

Looking through my mosquitto logs, I had noticed the occasional disconnection (this is a non-emonhub setup that is getting data every 10s from node-red, both machines are a VM). I’ll go to the default and look and see if the disconnections cease.

pb66 · 13 May 2018 23:00

Yes I had read that the default was 60s for php-mosquitto, but I hadn’t seen any official docs. I have also read there is no default setting in Mosquitto itself. I had seen the link you provide, but it crops up in several discussions that the setting is (as indicated on that page) exclusively for broker to broker “bridges”, not for clients as we are discussing here.

There are several unsupported claims in various discussions, ranging from “there is no default” (as in the default is “off”), to someone who conducted some tests and concluded it was 15s, to others that quote/debate the 60s “bridge” keepalive, so I didn’t want to draw any conclusions.

I guess the Mosquitto default is a moot point if we are using php-mosquitto anyway, since that default will kick in if emoncms doesn’t define an interval, so the broker default will never get used.

Indeed, I think someone has chosen that hoping to get more frequent checks, thinking that would keep the connection alive: the faster you check it, the faster it will be reconnected. In actual fact that approach ensures you get a very sensitive connection and the maximum chance of causing dis/re-connections.

Cool, let’s see what you discover.

Not that I know of, unfortunately. That’s why I’ve started to do things like

Line 388 of mosquitto.c is

    zend_long keepalive = 60;

but it’s a PITA

Bill.Thomson · 14 May 2018 04:21

@borpin,

I edited your post to include a line that toggles showing / hiding the text when that line is clicked on.
Is that what you were asking about? Here’s how that’s implemented:

<details>
<summary>Click here to show code</summary>

https://github.com/mgdm/Mosquitto-PHP/blob/35d77cdc01bfb0db46df0cb57165487cc0a75001/mosquitto.c#L388

</details>

borpin · 14 May 2018 07:12

Thanks, useful, but I am sure I have seen somewhere (possibly on a different Discourse based forum) a code quote that was just a couple of lines long, but was definitely a direct link.

[Edit] Ahha (said piglet). If you amend the URL to specify the lines, you will just get those lines. Edited post above.

borpin · 14 May 2018 07:32

Ah missed that

The other thing that has just occurred to me is that if the client is sending a ping, that should be visible using tshark. I’ll have a look tonight.

pb66 · 14 May 2018 08:48

Ahh ok, I thought you were trying to pinpoint the one exact line as per the first url, to specifically identify as well as show the code and its location. But as it turns out, you can still use the same “code block” url format but set the from and to line numbers to the one line, e.g.

github.com

mgdm/Mosquitto-PHP/blob/35d77cdc01bfb0db46df0cb57165487cc0a75001/mosquitto.c#L388-L388


      
          	zend_long keepalive = 60;

That would be good to know, although looking at the emoncms code, I doubt it will be there unless it is handled by the mosquitto-php extension, and I would have thought the extension was just various functions wrapped specifically for use in php, as in a Lib, I didn’t expect the extension to play any directly active role.

Bill.Thomson · 14 May 2018 08:53

Nice.
Thanks for the tip!

borpin · 14 May 2018 09:10

It was a bit of both really. The standard number of lines just takes up too much space.

I think it should be the broker that is sending out the Ping to confirm the connection is still alive rather than the client (I think if I have read correctly). However, the broker/client protocol for handling that seems a little fuzzy.