How to keep mqtt_input alive?

It happened again, the emonpi stopped working because the log partition was full. See Update below Last logged value is from 3 August 13:54. In this case, the following error message occurred in millisecond increments and inflated the emoncms log to 26 MB.

2018-08-03 13:40:51.978|WARN|phpmqtt_input.php|Connecting to MQTT server: Connection Accepted.: code: 0
2018-08-03 13:55:59.677|WARN|phpmqtt_input.php|Not connected, retrying connection
2018-08-03 13:56:04.041|WARN|phpmqtt_input.php|Connecting to MQTT server: Connection Accepted.: code: 0
2018-08-03 13:56:35.081|WARN|phpmqtt_input.php|Not connected, retrying connection
2018-08-03 13:56:36.935|WARN|phpmqtt_input.php|Connecting to MQTT server: Connection Accepted.: code: 0
2018-08-03 13:57:30.695|ERROR|phpmqtt_input.php|exception 'Mosquitto\Exception' in /var/www/emoncms/scripts/phpmqtt_input.php:116
Stack trace:
#0 /var/www/emoncms/scripts/phpmqtt_input.php(116): Mosquitto\Client->loop()
#1 {main}
2018-08-03 13:57:35.175|ERROR|phpmqtt_input.php|exception 'Mosquitto\Exception' in /var/www/emoncms/scripts/phpmqtt_input.php:116
Stack trace:
#0 /var/www/emoncms/scripts/phpmqtt_input.php(116): Mosquitto\Client->loop()
#1 {main}
2018-08-03 13:57:40.153|ERROR|phpmqtt_input.php|exception 'Mosquitto\Exception' in /var/www/emoncms/scripts/phpmqtt_input.php:116
Stack trace:

… a total of 471360 more lines with the error message.

emoncms.log.zip (370.8 KB)

On the one hand it would be nice to find and fix the problem, but on the other hand it would also be nice if a bug didn’t cause the complete system to fail. As it is now, it is unfortunately very unreliable and hardly recommended for productive use.

Server Information
Emoncms Version low-write 9.8.31 : 2018.06.21
Modules Administration : App v1.1.1 : Backup v1.1.4 : EmonHub Config v1.0.0 : Dashboard v1.3.0 : EventProcesses : Feed : Graph v1.2.0 : Input : postprocess : CoreProcess : Schedule : setup : sync : Time : User : Visualisation : WiFi v1.3.0
Buffer loading…
Writer Daemon is running with sleep 60s
Server OS Linux 4.4.26-v7+
Host emonpi emonpi (127.0.1.1)
Date 2018-08-11 17:14:44 UTC
Uptime 17:14:44 up 159 days, 8:20, 0 users, load average: 0.75, 0.59, 0.56
HTTP Server Apache/2.4.10 (Raspbian) HTTP/1.1 CGI/1.1 80
MySQL Version 5.5.52-0+deb8u1
Host localhost (127.0.0.1)
Date 2018-08-11 17:14:44 (UTC 00:00‌​)
Stats Uptime: 13767616 Threads: 2 Questions: 1483013 Slow queries: 28 Opens: 74 Flush tables: 1 Open tables: 52 Queries per second avg: 0.107
Redis Version 2.8.17
Host localhost:6379 (127.0.0.1)
Size 712 keys (639.35K)
Uptime 159 days
MQTT Version 1.4.10
Host localhost:1883 (127.0.0.1)
Pi Model Raspberry Pi 3 Model B Rev 1.2 - 1 GB (Sony UK)
SoC Broadcom BCM2709
Serial num. F50B302B
Temperature CPU: 53.69°C - GPU: 53.7’C
Release emonSD-07Nov16
File-system Current: read-write - Set root file-system temporarily to read-write, (default read-only)
Memory RAM Used: 76.96% Total: 973.11 MB Used: 748.87 MB Free: 224.23 MB
Disk Mount Stats
/ Used: 67.69% Total: 3.33 GB Used: 2.25 GB Free: 936.63 MB
/boot Used: 34.72% Total: 59.95 MB Used: 20.81 MB Free: 39.13 MB
/home/pi/data Used: 20.02% Total: 3.46 GB Used: 709.53 MB Free: 2.59 GB
PHP Version 5.6.27-0+deb8u1 (Zend Version 2.6.0)
Modules apache2handler : bcmath : bz2 : calendar : Core v5.6.27-0+deb8u1 : ctype : curl : date v5.6.27-0+deb8u1 : dba : dio v0.0.4RC4 : dom v20031129 : ereg : exif v1.4 $Id: 8b0e34c10dc8a04b8e81d9d79985b2566141b03d $ : fileinfo v1.0.5 : filter v0.11.0 : ftp : gettext : hash v1.0 : iconv : json v1.3.6 : libxml : mbstring : mcrypt : mhash : mosquitto v0.3.0 : mysql v1.0 : mysqli v0.1 : openssl : pcre : PDO v1.0.4dev : pdo_mysql v1.0.2 : Phar v2.0.2 : posix : readline v5.6.27-0+deb8u1 : redis v2.2.7 : Reflection v$Id: 5f15287237d5f78d75b19c26915aa7bd83dee8b8 $ : session : shmop : SimpleXML v0.1 : soap : sockets : SPL v0.2 : standard v5.6.27-0+deb8u1 : sysvmsg : sysvsem : sysvshm : tokenizer v0.1 : wddx : xml : xmlreader v0.1 : xmlwriter v0.1 : Zend OPcache v7.0.6-devFE : zip v1.12.5 : zlib v2.0

Client Information
HTTP Browser Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/11.1 Safari/605.1.15
Screen Resolution 1280 x 800
Window Size 1676 x 844

All services seem to be running, but no inputs arrive when the log partition is full:

[email protected](rw):~$ sudo systemctl status {feedwriter,emonhub,mosquitto,redis-server,mqtt_input}
● feedwriter.service - LSB: feedwriter script daemon
   Loaded: loaded (/etc/init.d/feedwriter)
   Active: active (exited) since Mon 2018-05-21 22:23:24 UTC; 2 months 20 days ago

Warning: Journal has been rotated since unit was started. Log output is incomplete or unavailable.

● emonhub.service - LSB: Start/stop emonHub
   Loaded: loaded (/etc/init.d/emonhub)
   Active: active (exited) since Mon 2018-06-11 20:50:32 UTC; 1 months 30 days ago

Warning: Journal has been rotated since unit was started. Log output is incomplete or unavailable.

● mosquitto.service - LSB: mosquitto MQTT v3.1 message broker
   Loaded: loaded (/etc/init.d/mosquitto)
   Active: active (running) since Mon 2018-03-26 22:44:58 UTC; 4 months 16 days ago
   CGroup: /system.slice/mosquitto.service
           └─26901 /usr/sbin/mosquitto -c /etc/mosquitto/mosquitto.conf

Warning: Journal has been rotated since unit was started. Log output is incomplete or unavailable.

● redis-server.service - Advanced key-value store
   Loaded: loaded (/lib/systemd/system/redis-server.service; enabled)
   Active: active (running) since Mon 2018-03-05 08:55:08 UTC; 5 months 7 days ago
 Main PID: 1851 (redis-server)
   CGroup: /system.slice/redis-server.service
           └─1851 /usr/bin/redis-server 127.0.0.1:6379

Warning: Journal has been rotated since unit was started. Log output is incomplete or unavailable.

● mqtt_input.service - Emoncms MQTT Input Script
   Loaded: loaded (/etc/systemd/system/mqtt_input.service; enabled)
   Active: active (running) since Wed 2018-07-18 03:31:47 UTC; 3 weeks 3 days ago
     Docs: https://github.com/emoncms/emoncms/blob/master/docs/RaspberryPi/MQTT.md
 Main PID: 27612 (php)
   CGroup: /system.slice/mqtt_input.service
           └─27612 /usr/bin/php /var/www/emoncms/scripts/phpmqtt_input.php

Warning: Journal has been rotated since unit was started. Log output is incomplete or unavailable.
[email protected](rw):~$ 

Update2: sudo service mqtt_input restart was enough to make the system work again! I thought I had already deleted the log files, but have only copied them to a persistent location for debugging. The system is running again although the log portion is still full. Maybe the system does not stop because of the full log partition, but only because mqtt_input terminates due to an error and cannot restart itself?

Update3: Ok I did the test: restarted emonPi and completely filled the log partition with test files. The emonPi continues to run. So that is not the problem. The problem is “only” that mqtt_input gets a value that it does not expect and terminates and is not able to recover from the error.

Can you look and see what Git Branch and/or release you are running? There have been some major changes to the MQTT interface recently which I think have reached the master branch but not the stable.

Could you also state how it is installed (EmonSD, EmonPi from store etc).

Hi Brian,

thanks for your answer!
It is an emonPi purchased from the OpenEnergyMonitor Shop.
It is good to know that improvements have already been made here. Could you please refer me to the relevant git repository where the changes were made?

I would then be happy to test the corresponding changes.

Current Git branches in use:

  • emonpi: master
  • emonhub: emon-pi
  • emoncms: stable

Are others relevant?

Honestly I am not the person to ask as I don’t run an EmonPi just a self install emoncms.

I do note the emonhub & feedwriter have also exited and that is odd.

The rep that will have the changes is the emoncms:master. However that is really a use at own risk as I don’t think it has been fully tested. It significantly changes how the publish works. Are you just subscribing or do you have some publish processes?

You might find it better to initially try and just pull in the script involved as I think that should be backwards compatible.

The key issue I found is that the subscribe is attempted before the connect callback is complete. subscribing before a connection has been established causes the exception.

@pb66 and @TrystanLea - it seems stable is a little unstable for MQTT :slight_smile:

If interested look at the recent MQTT discussions here and closed PRs on github.

1 Like

you could modify the systemd service file so it always restarts… i dont know if this is desired behaviour for this script though

I think the stock file already does this. Modifying the stock systemd control file has been the source of discussion elsewhere as it is not good practice as it may get overwritten by an update. It is better to add in a new file or a snippet but no decision was made on what to do.