MERGED: emonHub 'emon-pi' V2.0.0 - fixing thread is dead issue

Between @tim @beaylott @pb66 and I we have been working on fixing ongoing issues with the emon-pi variant of emonhub that has been resulting in crashes and ‘thread is dead’ errors, as reported multiple times on the forums e.g:

@beaylott has contributed a fix that implements thread restarting, restart counting, and logging of tracebacks within threads here, which I have briefly tested and appears to work well, see: Thread exception reporting, thread restarting, restart counting... by beaylott · Pull Request #36 · openenergymonitor/emonhub · GitHub

@tim has been working on a number of improvements including systemd support with watchdog reboot which is ongoing see: RFC - systemd integration with watchdog support and restart on thread failure - #17 by TrystanLea

In a PM between myself and @pb66 we have been discussing an ongoing issue that @pb66 bought to my attention where the intended separate threads of different interfacers where not being preserved. Our discussion has resulted in a move away from using pydispatcher to a native message queue implementation, here’s my explanation copied from my post on github:


Explanation of native message queue

In order to move data between the interfacers in emonhub e.g data received by the JeeLib interfacer and then published by the MQTT interfacer, emonhub has been using a python library called pydispatcher.

I had thought that it was a neat solution, moving data between threads with the ease of use similar to MQTT. However @pb66 highlighted that there was an issue with the implementation, that the intention of one thread per interfacer was not being preserved, the MQTT interfacer in particular was actually running in the RFM2Pi thread!

Paul then described a solution that would remove the need for the external pydispatcher library with a simple native message queue implementation. Here is my attempted summary (with references to the lines of code that implement it):

  1. We start with the main thread (in emonhub.py)

  2. The main thread starts a scan of each interfacer (L143)

  3. The main thread finds a pubchannel on the RFM2Pi interfacer called ‘ToEmonCMS’ (L149)

  4. The main thread reads and removes one item from the ‘ToEmonCMS’ queue on the RFM2Pi interfacer. (L155)

  5. The main thread iterates through each interfacer looking for subchannel entries called ‘ToEmonCMS’ (L158-L162)

  6. The main thread posts a copy of the item read from RFM2Pi into the ‘ToEmonCMS’ queue on each subscribing interfacer. (L168)

  7. The subscribing interfacer then reads and acts on the queue in its own time and in its own thread.

  8. The main thread then goes on to the next interfacer checking its pubchannel’s, repeating steps 4 to 6 above.

A single item is read at a time in order to avoid the case where a runaway interfacer monopolizes the main thread which could block the other interfacers, it also allows for the simple use of pop and append.


Paul @pb66 has been quite busy over recent weeks but I have tried my best to implement his suggested approach, which appears to work well. The result is now available in the emon-pi-develop branch of emonhub and is ready for testing!

Testing emonhub emon-pi-develop branch

I would very much welcome help with testing the emon-pi-develop branch. To do this the steps are:

cd /home/pi/emonhub
git pull
git checkout emon-pi-develop
sudo service emonhub restart

I will likely move forward with merging the development branch into the main emon-pi branch in a week or so if no large issues are found as the improvements are a big step forward over the current implementation.

Thanks again to @pb66 @tim and @beaylott’s input on all of this.

1 Like

It looks good to me. In these situations I am wary of thread safety but it looks alright because there is no way for the threads to interact really as only the master pop/push’s the buffers.

I would like people to look at what I did to restart the threads as it was a bit hacky:

I’m working on a branch where I’m SENDING data over the RFM module via calls embedded into the JeeLib interfacer python code - will this still be supported by the above changes?

This is similar to how the emonGLCD time is updated.

Sounds interesting! Yes the changes made in the above branch make only minimal changes to each interfacer, Im happy to help with merging the code.

Thanks @beaylott!

Nice work guys. I’ve switched my home emonPi system to run the dev branch. All good so far, been running for about 7hrs. Although admittedly I’ve never had ‘thread is dead’ issues on my system.

You are making the assumption here that the “name” is unique across the whole system - is it?

‘name’ is referencing elements in ‘kill_list’ in the scope of the for loop. When it enters the for loop it gets set to the first element of kill_list and then the next etc. It also isn’t set within the for loop (which could cause other problems). Have I mis-understood?

Will this implementation bring together the various emonhub implementations together so there is no longer a specific one for the emonpi?

Before updating, is this a working update?

Regards
Dave

@dave Between Glyn and me, we have it running on 4 systems now without an issue, but it’s only been a couple of days.

@borpin I hope so, but we are not quite there yet.

Hi Trystan
I get the following error when trying to update…

error: Your local changes to the following files would be overwritten by checkout:
        src/emonhub.py
        src/interfacers/EmonHubMqttInterfacer.py
        src/interfacers/EmonModbusTcpInterfacer.py
Please, commit your changes or stash them before you can switch branches.
Aborting

How do you stash things in a pi?

Regards
Dave

What are the changes you have made? Happy to help with merging if it makes sense?

Hi Trystan
TBH I dont have a clue what changes Ive made, I was running Stuarts script that reads SMA inverters over bluetooth but that stopped working after an update.
I’m happy to make a backup and force git to do an update but I dont know the git commands

Regards
Dave

can you copy the results here of:

git diff
pi@emonpi(rw):emonhub$ git diff
diff --git a/src/emonhub.py b/src/emonhub.py
index aee21c0..acdbf40 100755
--- a/src/emonhub.py
+++ b/src/emonhub.py
@@ -138,6 +138,7 @@ class EmonHub(object):
                 if not I.isAlive():
                     #I.start()
                     self._log.warning(I.name + " thread is dead") # had to be r                                                                                                 estarted")
+                    self._exit = True

             # Sleep until next iteration
             time.sleep(0.2)
diff --git a/src/interfacers/EmonHubMqttInterfacer.py b/src/interfacers/EmonHubM                                                                                                 qttInterfacer.py
index d25958c..4f458cd 100644
--- a/src/interfacers/EmonHubMqttInterfacer.py
+++ b/src/interfacers/EmonHubMqttInterfacer.py
@@ -123,8 +123,14 @@ class EmonHubMqttInterfacer(EmonHubInterfacer):
                     payload = str(value)

                     self._log.info("Publishing: "+topic+" "+payload)
+
:...skipping...
diff --git a/src/emonhub.py b/src/emonhub.py
index aee21c0..acdbf40 100755
--- a/src/emonhub.py
+++ b/src/emonhub.py
@@ -138,6 +138,7 @@ class EmonHub(object):
                 if not I.isAlive():
                     #I.start()
                     self._log.warning(I.name + " thread is dead") # had to be restarted")
+                    self._exit = True

             # Sleep until next iteration
             time.sleep(0.2)
diff --git a/src/interfacers/EmonHubMqttInterfacer.py b/src/interfacers/EmonHubMqttInterfacer.py
index d25958c..4f458cd 100644
--- a/src/interfacers/EmonHubMqttInterfacer.py
+++ b/src/interfacers/EmonHubMqttInterfacer.py
@@ -123,8 +123,14 @@ class EmonHubMqttInterfacer(EmonHubInterfacer):
                     payload = str(value)

                     self._log.info("Publishing: "+topic+" "+payload)
+
                     result =self._mqttc.publish(topic, payload=payload, qos=2, retain=False)

+                    try:
+                        result =self._mqttc.publish(topic, payload=payload, qos=2, retain=False)
+                    except Exception as e:
+                        self._log.warning("Unable to publish: "+topic+" "+payload+" error: "+ str(e))
+
                     if result[0]==4:
                         self._log.info("Publishing error? returned 4")

diff --git a/src/interfacers/EmonModbusTcpInterfacer.py b/src/interfacers/EmonModbusTcpInterfacer.py
index 6afe843..27036b5 100644
--- a/src/interfacers/EmonModbusTcpInterfacer.py
+++ b/src/interfacers/EmonModbusTcpInterfacer.py
@@ -16,7 +16,7 @@ Monitors Solar Inverter using modbus tcp

Thanks @dave, the changes displayed are in relation to the proposed changes in the original thread here.

You can now discard these with

git checkout src/emonhub.py
git checkout src/interfacers/EmonHubMqttInterfacer.py

does it list any changes in: src/interfacers/EmonModbusTcpInterfacer.py? is what you copied in the full output?
If so you can also do:

git checkout src/interfacers/EmonModbusTcpInterfacer.py

Or all in one go with a single command:

git checkout .

Then you should be able to follow the steps above.

Thanks

Dave this script should be working fine in the current code base - isn’t yours ?

So there are no duplicate names in the “kill_list” variable ?

@stuart
No, I did an update once and I was never able to get it work again :frowning:
I’ll have a go again following your guide.

@TrystanLea
I’ve changed the branch and have the Your branch is up-to-date with 'origin/emon-pi-develop'.
All seems to be working so far

Regards
Dave