Another emonPi stopped logging thread

Hi

I’ve got an error similar to the ones that have been discussed in these threads:
https://community.openenergymonitor.org/t/emonpi-low-write-9-8-25-inputs-stopped-working/
https://community.openenergymonitor.org/t/mqtt-input-service-high-memory-usage

It’s been happening regularly for a while. I’ve been restarting emonhub each time and was hoping a fix would pop out of the above threads. However, emonPi has stopped logging readings locally, but is still sending them to emoncms.org. I’m not sure if this is happening in other cases? I’m pretty sure it wasn’t doing this before…

local emoncms feeds look like this:

remote feeds look like this:

the emoncms.log file contains the following:

2018-04-23 08:00:23.271|WARN|phpmqtt_input.php|Not connected, retrying connection
2018-04-23 08:00:24.270|WARN|phpmqtt_input.php|Connecting to MQTT server: Connection Accepted.: code: 0
2018-04-23 08:00:33.780|WARN|phpmqtt_input.php|Not connected, retrying connection
2018-04-23 08:00:35.581|WARN|phpmqtt_input.php|Connecting to MQTT server: Connection Accepted.: code: 0
2018-04-23 08:10:09.712|WARN|phpmqtt_input.php|Not connected, retrying connection
2018-04-23 08:10:11.093|WARN|phpmqtt_input.php|Connecting to MQTT server: Connection Accepted.: code: 0
2018-04-23 08:10:35.620|WARN|phpmqtt_input.php|Not connected, retrying connection
2018-04-23 08:10:38.882|WARN|phpmqtt_input.php|Connecting to MQTT server: Connection Accepted.: code: 0
2018-04-23 08:10:50.367|WARN|phpmqtt_input.php|Not connected, retrying connection
2018-04-23 08:10:50.940|WARN|phpmqtt_input.php|Connecting to MQTT server: Connection Accepted.: code: 0
2018-04-23 08:11:36.846|ERROR|phpmqtt_input.php|exception 'ErrorException' with message 'Division by zero' in /var/www/emoncms/Modules/process/process_processlist.php:571
Stack trace:
#0 /var/www/emoncms/Modules/process/process_processlist.php(571): exceptions_error_handler(2, 'Division by zer...', '/var/www/emoncm...', 571, Array)
#1 [internal function]: Process_ProcessList->kwh_to_power('18', 1524471096, 10583.32236375, NULL, Object(Process))
#2 /var/www/emoncms/Modules/process/process_model.php(61): call_user_func_array(Array, Array)
#3 /var/www/emoncms/Modules/process/process_model.php(104): Process->__call('kwh_to_power', Array)
#4 /var/www/emoncms/Modules/process/process_model.php(104): Process->kwh_to_power('18', 1524471096, 10583.32236375, NULL)
#5 /var/www/emoncms/scripts/phpmqtt_input.php(281): Process->input(1524471096, '4112', '1:8,2:0.11192,3...')
#6 [internal function]: message(Object(Mosquitto\Message))
#7 /var/www/emoncms/scripts/phpmqtt_input.php(116): Mosquitto\Client->loop()
#8 {main}
2018-04-23 08:11:49.126|ERROR|phpmqtt_input.php|exception 'ErrorException' with message 'Division by zero' in /var/www/emoncms/Modules/process/process_processlist.php:571
Stack trace:
#0 /var/www/emoncms/Modules/process/process_processlist.php(571): exceptions_error_handler(2, 'Division by zer...', '/var/www/emoncm...', 571, Array)
#1 [internal function]: Process_ProcessList->kwh_to_power('18', 1524471108, 10583.32236375, NULL, Object(Process))
#2 /var/www/emoncms/Modules/process/process_model.php(61): call_user_func_array(Array, Array)
#3 /var/www/emoncms/Modules/process/process_model.php(104): Process->__call('kwh_to_power', Array)
#4 /var/www/emoncms/Modules/process/process_model.php(104): Process->kwh_to_power('18', 1524471108, 10583.32236375, NULL)
#5 /var/www/emoncms/scripts/phpmqtt_input.php(281): Process->input(1524471108, '4112', '1:8,2:0.11192,3...')
#6 [internal function]: message(Object(Mosquitto\Message))
#7 /var/www/emoncms/scripts/phpmqtt_input.php(116): Mosquitto\Client->loop()
#8 {main}
2018-04-23 08:11:59.832|ERROR|phpmqtt_input.php|exception 'ErrorException' with message 'Division by zero' in /var/www/emoncms/Modules/process/process_processlist.php:571
Stack trace:
#0 /var/www/emoncms/Modules/process/process_processlist.php(571): exceptions_error_handler(2, 'Division by zer...', '/var/www/emoncm...', 571, Array)
#1 [internal function]: Process_ProcessList->kwh_to_power('18', 1524471119, 10583.32236375, NULL, Object(Process))
#2 /var/www/emoncms/Modules/process/process_model.php(61): call_user_func_array(Array, Array)
#3 /var/www/emoncms/Modules/process/process_model.php(104): Process->__call('kwh_to_power', Array)
#4 /var/www/emoncms/Modules/process/process_model.php(104): Process->kwh_to_power('18', 1524471119, 10583.32236375, NULL)
#5 /var/www/emoncms/scripts/phpmqtt_input.php(281): Process->input(1524471119, '4112', '1:8,2:0.11192,3...')
#6 [internal function]: message(Object(Mosquitto\Message))
#7 /var/www/emoncms/scripts/phpmqtt_input.php(116): Mosquitto\Client->loop()
#8 {main}
2018-04-23 08:40:03.563|WARN|phpmqtt_input.php|Not connected, retrying connection
2018-04-23 08:40:05.426|WARN|phpmqtt_input.php|Connecting to MQTT server: Connection Accepted.: code: 0
2018-04-23 08:40:58.049|WARN|phpmqtt_input.php|Not connected, retrying connection
2018-04-23 08:41:00.734|WARN|phpmqtt_input.php|Connecting to MQTT server: Connection Accepted.: code: 0
2018-04-23 08:41:22.275|WARN|phpmqtt_input.php|Not connected, retrying connection
2018-04-23 08:41:25.282|WARN|phpmqtt_input.php|Connecting to MQTT server: Connection Accepted.: code: 0

My emonpi is running what I think is the latest software:

Server Information
Emoncms Version low-write 9.8.28 : 2018.01.27
Modules Administration : App v1.1.0 : Backup v1.1.2 : EmonHub Config v1.0.0 : Dashboard v1.1.1 : EventProcesses : Feed : Graph v1.2.0 : Input : postprocess : CoreProcess : Schedule : setup : Time : User : Visualisation : WiFi v1.0.0
Buffer loading…
Writer Daemon is not running, start it at ~/scripts/feedwriter
Server OS Linux 4.9.35+
Host emonpi emonpi (127.0.1.1)
Date 2018-04-24 20:38:34 UTC
Uptime 20:38:34 up 65 days, 3:34, 1 user, load average: 0.64, 0.45, 0.36
HTTP Server Apache/2.4.10 (Raspbian) HTTP/1.1 CGI/1.1 80
MySQL Version 5.5.57-0+deb8u1
Host localhost (127.0.0.1)
Date 2018-04-24 20:38:34 (UTC 00:00‌​)
Stats Uptime: 5635188 Threads: 3 Questions: 403306 Slow queries: 0 Opens: 59 Flush tables: 1 Open tables: 51 Queries per second avg: 0.071
Redis Version 2.8.17
Host localhost:6379 (127.0.0.1)
Size 116 keys (509.30K)
Uptime 9 days
MQTT Version 1.4.14
Host localhost:1883 (127.0.0.1)
Pi CPU Temp 36.86°C
Release emonSD-26Oct17
File-system Set root file-system temporarily to read-write, (default read-only)
Memory RAM Used: 60.34% Total: 481.7 MB Used: 290.65 MB Free: 191.05 MB
Disk Mount Stats
/ Used: 63.21% Total: 3.33 GB Used: 2.1 GB Free: 1.06 GB
/boot Used: 36.32% Total: 59.95 MB Used: 21.77 MB Free: 38.17 MB
/home/pi/data Used: 2.25% Total: 3.68 GB Used: 84.89 MB Free: 3.4 GB
PHP Version 5.6.30-0+deb8u1 (Zend Version 2.6.0)
Modules apache2handler : bcmath : bz2 : calendar : Core v5.6.30-0+deb8u1 : ctype : curl : date v5.6.30-0+deb8u1 : dba : dio v0.0.4RC4 : dom v20031129 : ereg : exif v1.4 : fileinfo v1.0.5 : filter v0.11.0 : ftp : gettext : hash v1.0 : iconv : json v1.3.6 : libxml : mbstring : mcrypt : mhash : mosquitto v0.3.0 : mysql v1.0 : mysqli v0.1 : openssl : pcre : PDO v1.0.4dev : pdo_mysql v1.0.2 : Phar v2.0.2 : posix : readline v5.6.30-0+deb8u1 : redis v2.2.7 : Reflection : session : shmop : SimpleXML v0.1 : soap : sockets : SPL v0.2 : standard v5.6.30-0+deb8u1 : sysvmsg : sysvsem : sysvshm : tokenizer v0.1 : wddx : xml : xmlreader v0.1 : xmlwriter v0.1 : Zend OPcache v7.0.6-devFE : zip v1.12.5 : zlib v2.0 :

Client Information
HTTP Browser Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:59.0) Gecko/20100101 Firefox/59.0
Screen Resolution 1920 x 1080
Window Size 1778 x 773

I’m pretty sure restarting emonhub will bring it back to life for a few days before it crashes again… Can anyone suggest avenues for further investigation? Is there any additional info I can share that would be helpful?

Thanks

sorry, I’ve just noticed this post:

emonhub.log (364.1 KB) doesn’t cover the period when the error first occurred looks like it has a lot of noise? I’ve attached the most recent version, but I think I need to adjust the logrotate.conf file so I keep logs over a longer period?

looking at /etc/systemd/system/mqtt_input.service I think it’s not logging by default? and I need to uncomment for it to log to syslog?

[Unit]
Description=Emoncms MQTT Input Script
After=mosquitto.service mysql.service redis.service
Documentation=https://github.com/emoncms/emoncms/blob/master/docs/RaspberryPi/MQTT.md

[Service]
Type=idle
ExecStart=/usr/bin/php /var/www/emoncms/scripts/phpmqtt_input.php

# Uncomment instead of above to use standard log file, else use systemd log
# Type=forking
# ExecStart=/bin/sh -c '/usr/bin/php /var/www/emoncms/scripts/phpmqtt_input.php 2>&1 > /var/log/mqtt_input.log &'

# Restart script if stopped
Restart=always
# Wait 60s before restart
RestartSec=60

# Tag things in the log
# View with: sudo journalctl -f -u mqtt_input -o cat
SyslogIdentifier=mqtt_input

# Un-comment to pipe log to syslog
#StandardOutput=syslog

[Install]
WantedBy=multi-user.target

I can’t figure out where I need to enable logs for feedwriter? should I edit /etc/init.d/feedwriter? or the thing that calls the init script?

sorry for the shotgun approach to asking questions, would welcome any help :slight_smile:

@mozster is your system fully up to date? I notice a divide by zero error in your emoncms.log for kwh_to_power processor. Im not sure if that’s related, you mentioned restarting emonhub fixes the issue which would suggests it isnt related. Perhaps try removing the kwh_to_power process for a period of time?

thanks @TrystanLea

I updated last week or so. so I think I’m up to date, I’ll run an update again from the admin interface and also do a full power cycle ( update.log.txt (6.8 KB) )

EDIT: I think from looking at the log above, I needed to do a manual upgrade on account of disabling RF in src.ino? I’ve done that now (git pull/re-apply changes/compile/update) emonpiupdate.log.txt (6.3 KB)

:flushed: I’ve done that now, restarted emonhub and it’s kicked back into life again, I’ve tried enabling some more logging in mqtt_input and keeping more emonhub logs, will report back if it falls over again

From the server information (really useful);

So Feedwriter Daemon is not running but this is a Low-write install (not actually sure what the impact of this is).

Next time it stops can you please run the following commands from SSH and post the output

sudo systemctl status emonhub
sudo systemctl status feedwriter
sudo systemctl status mqtt_input
sudo systemctl status mosquitto

My suspicion is the mosquitto service has stopped as the phpmqtt_input.php script cannot establish a connection.

If it has try sudo systemctl restart mosquitto then check the status. If you can get the mosquitto log it may help diagnose why it stopped.

Do you really mean just restarting emonhub or do you mean rebooting the EmonPi?

1 Like

thanks Brian, will share the requested logs next time it falls over

Yup restarting the service: sudo systemctl restart emonhub

@mozster check out Collecting Local EmonCMS debug information
maybe it helps :slight_smile:

1 Like

I think it is related, the time it occurred aligns with the feeds “last updated” time in the screendump.

This is not the first time we have seen this, checkout the “Emonhub crashed with OOM error in /var/log/messages” thread for a similar event. On that occasion I added a fix PR to emoncms to avoid divide by zero errors when using the wh_accumulator process, but this has occured with the kwh_to_power process this time round.

There are several places that need a similar fix to make the emoncms code more robust against divide by zero errors whenever an input is duplicated or even just end up getting an identical timestamp due to the use of integer timestamps.

In that thread I question how that could happen with a QoS 2 level connection, since QoS 2 messages require 4 messages to complete (2 in each direction), until all the messages are completed, it is assumed that the payload wasn’t successfully received and will be resent until all 4 parts are successful.

All the MQTT disconnections and re-connections play havoc with the chances of all 4 messages completing and because a new random client id is issued by the broker each reconnection, the 4 part interaction cannot resume after a disconnection, all 4 parts must occur in one session because emoncms doesn’t supply a reuseable client id (see the “A question over whether emonCMS is compatible with MQTT” thread for more on this).

There must be multiple elements at play here, possibly but not definitely inter-connected.

The fact the feed-writer isn’t running will mean that no feed data will be written to disk, it will all remain in the buffer, but that is not the reason the feeds page is not updating, we have seen issue when the feedwriter has stopped previously and the feeds page continues to update as normal.

Is this an emonpi or an emonBase? I suspect it is an emonPi due to the emonhub.log and screen dumps showing node 5 and the fact you are editing src.ino, but the start of the update log shows there is no emonLCD found and if you are changing the emonPi FW, you should avoid using the “update emonpi” button in emoncms as it will over-write your FW changes, you must use the “wrong” button ie emonbase for an emonpi. (see the Update EmonPi Button or Update EmonBase Button? thread for more on that).

The emonhub.log seems to show that the rfm is still working as there are many “discarding RX frame ‘unreliable content’” log messages.

In the emonhub.logs it also shows that every 5secs the MQTT connection has to be re-done and appears to be successful but then gets disconnected, perhaps restarting emonhub is giving the broker a chance to get back on it’s feet after being knocked over by emoncms? (bit of a stab in the dark there as I do not (yet?) understand how restarting emonhub can effect the broker or make the feedwriter spring back to life)

1 Like

I have checked the other processes and I think this is the only one that is currently vulnerable to this so I have submitted a PR for consideration, but I do not think it is an ideal fix. It simply tries to avoid a crash/exception.

It is not ideal to be passing a value of zero to the next process, for example if that is a log to feed, the previous good “current” value could be over-written with 0 watts just because the input was duplicated (if buffered write is not in use, see Data Viewer questions - #14 by pb66).

Thankyou @pb66. I have merged your pull request. Its not yet in the stable branch that runs on the emonpi, Il look into merging later.

@pb66 thanks - really appreciate you looking into it

I bought an emonPi PCB only and plugged it into an old RPi 1 I had spare, so don’t have the enclosure & LCD connected. I wasn’t sure at first if the update button was using a downloaded FW or recompiling after applying changes, but I’ll have a look at the thread - thanks for the link.

It shouldn’t be a problem if the kwh_power process has been removed for testing in this instance.

No problem! The emonSD update routine (AKA the emonpi updater) always re-uploads the latest.hex each time it’s run. Your best bet is to compile your own hex file and call it something else and put it somewhere safe, then upload the hex to your emonpi board using the same avrdude command but with a different path and file name. Then always use the “update emonbase” (not update emonpi) button so that the upload command fails because it’s the wrong upload baud, leaving your FW intact but updates everything else. By not using the src.ino or latest.hex files, the updates to the repo will not get blocked by local changes.

@TrystanLea can the PR I put in to stop the MQTT input throwing an exception be merged please. the exception occurs on a retry connection which is followed by a subscribe whether the connection was successful or not.

I’m not too sure how the logs above are able to be generated. It looks to me that the code hits the ‘retrying session’ log entry before the callback onConnect records the ‘connection successful’.

Just checking, do you actually have any MQTT inputs to emoncms? Just wonder if we are barking up the wrong tree here!

Also, it seems from the log emonhub cannot get a reliable MQTT connection either. What is your emonhub configuration?

Thanks @borpin I will have a look at your pull request. All @mozster’s inputs should be being posted via MQTT if they are coming from a standard configuration emonhub.

I think they are, the inputs page uses a named input and the mqtt reconnection attempts in the emonhub.log seem to be triggered by the attempts to publish the emonpi data.

Ah, OK. Had not realised MQTT was the standard way for an EmonPi. I use the HTTP API personally.

As a matter of interest, why use the MQTT interface?

@mozster - are you still having problems?

That’s a very good question.

As it stands users are trading reliability and accuracy to use MQTT “exclusively” and thus make the same data available to other SW like openHab etc.

I would recommend using http to post data to emoncms, at least until the emoncms mqtt inputs are sorted and emonhub can post mqtt timestamped data to emoncms reliably. It is possible to just set up another interfacer like the emoncmsorg one and point it at the emonPi/emonbase but to avoid double posting (timestamped via http and untimestamped via mqtt) one of 2 things needs to happen, either stop the mqtt interfacer in emonhub, but that means other SW cannot subscribe to the energy data OR disable the emoncms mqtt_input so that emonhub can continue to publish without emoncms getting that data via mqtt, but that means other SW cannot publish mqtt to emoncms.

So you can only successfully have any 2 of these 3 things emonhub publish MQTT, emoncms subscribe to MQTT or reliable and accurate data, you cannot have all 3.

Unless of course you can somehow change the topics so that data emonhub publishes is not subscribed to by emoncms mqtt input.

It is now understood that different topics are required for “local status info” and “time-sensitive data”, future emonHub versions should have the ability to double publish mqtt as single value per topic and timestamped batched data to serve both needs (discussed in depth in the EmonHub Development thread)

That “2 streams” method is basically what I’ve suggested above, just using http and the one existing mqtt interfacer for now, until there is a timestamped mqtt interfacer to replace the more reliable and accurate http method.