WARNING MainThread RFM2Pi thread is dead

Have just started a test installation and learning how to log data, but the feed only runs for a few hours and then stops.
The log shows - “WARNING MainThread RFM2Pi thread is dead”
It restarts after a reboot. I’ve looked at the thread RFM69Pi stops updating/freezes, but couldn’t find anything to help.
The system is emonTX v3 and emonbase RFM69pi, there is only 1 CT and it’s running from 9V AC.
The emonTX LED has always flashed every second.
Any help gratefully received.

Are you able to show us some emonhub.log leading up to the “WARNING MainThread RFM2Pi thread is dead” messages?

The discussion you link to is related to RFM device firmware for when devices stop receiving/processing valid packets which I do not think is the case here. You can find some relevant by searching “thread is dead”, this issue is more likely to be (the emonpi variant of) emonhub than the rfm69pi itself.

When I download the log file from the emonhub page, it has no other content except
2016-12-06 15:17:01,182 WARNING MainThread RFM2Pi thread is dead
2016-12-06 15:17:01,408 WARNING MainThread RFM2Pi thread is dead
2016-12-06 15:17:01,636 WARNING MainThread RFM2Pi thread is dead
for the next hour or so.
Is there a way to get the full log file from a reboot?

You could reboot and wait until it reoccurs to see the log, but if you don’t catch it when it happens you could find yourself in the same position again.

Are you running an emonSD image? If your OS is configured to be read-only then the logs are only held in RAM and lost on a reboot, so after a reboot you have no choice but wait for a recurrence,

Can you SSH into the Pi? If so the logfiles can be found at /log/var/emonhub/emonhub.log, there may also be some rotated logs or archives depending on how long it’s been running and the activity levels. Use ls -la /log/var/emonhub for a list of logfiles and a utility like nano or less to read the logfiles eg nano /log/var/emonhub/emonhub.log.

1 Like

Yeh I have this, I started a thread a while ago but couldn’t find the cause EmonHUB Crashing - #24 by Dave
The way I overcame this is by putting a watchdog on NodeRED that monitors the update of my emonpi and if it exceeds 60 secs it restarts emonhub service and pushes me a message.
Every now and again thou it crashes and the watchdog doesn’t pick it up.
Just import the below code into NodeRED…

Regards
Dave

[{“id”:“9936972b.8753f8”,“type”:“pushover”,“z”:“a5ce04b2.5a31f8”,“name”:“Pushover”,“device”:"",“title”:“Node-RED”,“priority”:0,“sound”:“pushover”,“url”:"",“url_title”:"",“x”:977.78564453125,“y”:375.59124755859375,“wires”:[]},{“id”:“74c97519.99dcdc”,“type”:“mqtt in”,“z”:“a5ce04b2.5a31f8”,“name”:“Watchdog”,“topic”:“emon/emonpi/TotalGeneration”,“qos”:“2”,“broker”:“a06667e4.71e22”,“x”:75.70233154296875,“y”:431.24600982666016,“wires”:[[“11da4deb.ffda1a”,“a88f159e.1b5d28”]]},{“id”:“11da4deb.ffda1a”,“type”:“trigger”,“z”:“a5ce04b2.5a31f8”,“op1”:"",“op2”:“Emoncms has stopped updating!”,“op1type”:“nul”,“op2type”:“str”,“duration”:“60”,“extend”:true,“units”:“s”,“reset”:"",“name”:“Wait 60 Secs”,“x”:279.7023162841797,“y”:414.24600982666016,“wires”:[[“326f6f17.065868”,“e6a5be2.31155c”]]},{“id”:“326f6f17.065868”,“type”:“delay”,“z”:“a5ce04b2.5a31f8”,“name”:“5 Sec Delay”,“pauseType”:“delay”,“timeout”:“5”,“timeoutUnits”:“seconds”,“rate”:“1”,“rateUnits”:“second”,“randomFirst”:“1”,“randomLast”:“5”,“randomUnits”:“seconds”,“drop”:false,“x”:264.7023162841797,“y”:478.24600982666016,“wires”:[[“11da4deb.ffda1a”]]},{“id”:“e6a5be2.31155c”,“type”:“delay”,“z”:“a5ce04b2.5a31f8”,“name”:“Limit Messages”,“pauseType”:“rate”,“timeout”:“5”,“timeoutUnits”:“seconds”,“rate”:“1”,“rateUnits”:“minute”,“randomFirst”:“1”,“randomLast”:“5”,“randomUnits”:“seconds”,“drop”:true,“x”:463.70233154296875,“y”:430.24600982666016,“wires”:[[“cb7c12af.69c4f”,“9936972b.8753f8”,“57380ccf.2fb024”]]},{“id”:“55abcfae.660fb8”,“type”:“comment”,“z”:“a5ce04b2.5a31f8”,“name”:“Emoncms Watchdog”,“info”:"",“x”:117.70233154296875,“y”:378.24600982666016,“wires”:[]},{“id”:“57380ccf.2fb024”,“type”:“exec”,“z”:“a5ce04b2.5a31f8”,“command”:“sudo service emonhub restart”,“addpay”:true,“append”:"",“useSpawn”:"",“name”:“Reset emonHub”,“x”:726.9522552490234,“y”:481.9960708618164,“wires”:[[],[],[]]},{“id”:“da8c97ba.3b91b”,“type”:“inject”,“z”:“a5ce04b2.5a31f8”,“name”:“Restart emonHub Now?”,“topic”:"",“payload”:"",“payloadType”:“date”,“repeat”:"",“crontab”:"",“once”:false,“x”:493.45225524902344,“y”:481.24600982666016,“wires”:[[“57380ccf.2fb024”]]},{“id”:“ccf9ca21.a8c25”,“type”:“file”,“z”:“a5ce04b2.5a31f8”,“name”:“Logfile”,“filename”:"/home/pi/watchdog.txt",“appendNewline”:true,“createDir”:false,“overwriteFile”:“false”,“x”:808.7023315429688,“y”:430.24600982666016,“wires”:[]},{“id”:“cb7c12af.69c4f”,“type”:“function”,“z”:“a5ce04b2.5a31f8”,“name”:“Add datestamp”,“func”:“var datestamp = Date();\nmsg.payload = ((“RFM2Pi Reset script called - “) + (datestamp));\nreturn msg;”,“outputs”:1,“noerr”:0,“x”:658.7023315429688,“y”:430.24600982666016,“wires”:[[“ccf9ca21.a8c25”]]},{“id”:“a88f159e.1b5d28”,“type”:“trigger”,“z”:“a5ce04b2.5a31f8”,“op1”:””,“op2”:“Emoncms has crashed!”,“op1type”:“nul”,“op2type”:“str”,“duration”:“5”,“extend”:true,“units”:“min”,“reset”:"",“name”:“Wait 5 Mins”,“x”:485.7023162841797,“y”:525.2460403442383,“wires”:[[“fbeb29f8.631688”,“ca133f4a.e7ca”,“9936972b.8753f8”,“a2f784f0.808498”]]},{“id”:“fbeb29f8.631688”,“type”:“delay”,“z”:“a5ce04b2.5a31f8”,“name”:“5 Sec Delay”,“pauseType”:“delay”,“timeout”:“5”,“timeoutUnits”:“seconds”,“rate”:“1”,“rateUnits”:“second”,“randomFirst”:“1”,“randomLast”:“5”,“randomUnits”:“seconds”,“drop”:false,“x”:482.7023162841797,“y”:572.2460098266602,“wires”:[[“a88f159e.1b5d28”]]},{“id”:“a2f784f0.808498”,“type”:“exec”,“z”:“a5ce04b2.5a31f8”,“command”:“sudo reboot”,“addpay”:true,“append”:"",“useSpawn”:"",“timer”:"",“name”:“Reboot”,“x”:696.9522552490234,“y”:593.9960708618164,“wires”:[[],[],[]]},{“id”:“a9976826.0c7a4”,“type”:“inject”,“z”:“a5ce04b2.5a31f8”,“name”:“Reboot Now?”,“topic”:"",“payload”:"",“payloadType”:“date”,“repeat”:"",“crontab”:"",“once”:false,“x”:503.45225524902344,“y”:615.2460098266602,“wires”:[[“a2f784f0.808498”]]},{“id”:“816d1cc0.42e048”,“type”:“file”,“z”:“a5ce04b2.5a31f8”,“name”:“Logfile”,“filename”:"/home/pi/watchdog.txt",“appendNewline”:true,“createDir”:false,“overwriteFile”:“false”,“x”:856.7023162841797,“y”:529.2460403442383,“wires”:[]},{“id”:“ca133f4a.e7ca”,“type”:“function”,“z”:“a5ce04b2.5a31f8”,“name”:“Add datestamp”,“func”:“var datestamp = Date();\nmsg.payload = ((“RFM2Pi Reset script called - “) + (datestamp));\nreturn msg;”,“outputs”:1,“noerr”:0,“x”:706.7023162841797,“y”:530.2460403442383,“wires”:[[“816d1cc0.42e048”]]},{“id”:“a06667e4.71e22”,“type”:“mqtt-broker”,“z”:””,“broker”:“127.0.0.1”,“port”:“1883”,“clientid”:"",“usetls”:false,“compatmode”:true,“keepalive”:“15”,“cleansession”:true,“willTopic”:"",“willQos”:“0”,“willPayload”:"",“birthTopic”:"",“birthQos”:“0”,“birthPayload”:""}]

Happened again, but also missed the logs. I did ssh into the emonpi but the logs have been rotated and overwritten. I did notice though in the live log view that there was a lot of these lines at the end of the last logfile I manually download:

DEBUG    RFM2Pi     Discarding RX frame 'unreliable content'? 31 98 9 58 67 217 94 163 226 205 243 207 228 125 161 100 58 139 222 174 113 (-89)
2016-12-06 22:09:05,773 DEBUG    RFM2Pi     Discarding RX frame 'unreliable content'? 9 158 166 121 123 42 146 129 124 66 231 142 23 220 224 204 223 167 145 156 197 (-87)
2016-12-06 22:09:06,142 DEBUG    RFM2Pi     Discarding RX frame 'unreliable content'? 5 255 180 141 75 109 255 251 187 254 137 122 119 175 196 143 120 1 251 213 197 (-86)
2016-12-06 22:09:07,377 DEBUG    RFM2Pi     Discarding RX frame 'unreliable content'? 7 7 47 211 159 211 46 226 136 214 229 157 227 145 199 235 114 224 27 165 176 (-85)

however it recovers and keeps working.

Caught the log file this time! But it doesn’t look very helpful. Here are the lines before the crash:

2016-12-07 09:13:20,150 DEBUG    RFM2Pi     17800 Sent to channel(end)' : ToEmonCMS
2016-12-07 09:13:20,254 DEBUG    RFM2Pi     17801 NEW FRAME : OK 8 10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 (-38)
2016-12-07 09:13:20,258 DEBUG    RFM2Pi     17801 Timestamp : 1481062400.25
2016-12-07 09:13:20,258 DEBUG    RFM2Pi     17801 From Node : 8
2016-12-07 09:13:20,259 DEBUG    RFM2Pi     17801    Values : [10, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
2016-12-07 09:13:20,260 DEBUG    RFM2Pi     17801      RSSI : -38
2016-12-07 09:13:20,260 DEBUG    RFM2Pi     17801 Sent to channel(start)' : ToEmonCMS
2016-12-07 09:13:20,261 INFO     RFM2Pi     Publishing: emon/emontx3/power1 10
2016-12-07 09:13:20,263 INFO     RFM2Pi     Publishing: emon/emontx3/power2 0
2016-12-07 09:13:20,264 INFO     RFM2Pi     Publishing: emon/emontx3/power3 0
2016-12-07 09:13:20,266 INFO     RFM2Pi     Publishing: emon/emontx3/power4 0
2016-12-07 09:13:20,268 INFO     RFM2Pi     Publishing: emon/emontx3/vrms 0
2016-12-07 09:13:20,301 INFO     RFM2Pi     Publishing: emon/emontx3/temp1 0
2016-12-07 09:13:20,306 INFO     RFM2Pi     Publishing: emon/emontx3/temp2 0
2016-12-07 09:13:20,309 INFO     RFM2Pi     Publishing: emon/emontx3/temp3 0
2016-12-07 09:13:20,312 INFO     RFM2Pi     Publishing: emon/emontx3/temp4 0
2016-12-07 09:13:20,314 INFO     RFM2Pi     Publishing: emon/emontx3/temp5 0
2016-12-07 09:13:20,316 INFO     RFM2Pi     Publishing: emon/emontx3/temp6 0
2016-12-07 09:13:20,318 INFO     RFM2Pi     Publishing: emon/emontx3/pulse 0
2016-12-07 09:13:20,320 INFO     RFM2Pi     Publishing: emon/emontx3/rssi -38
2016-12-07 09:13:20,322 INFO     RFM2Pi     Publishing: emonhub/rx/8/values 10,0,0,0,0,0,0,0,0,0,0,0
2016-12-07 09:13:20,323 INFO     RFM2Pi     Publishing: emonhub/rx/8/rssi -38
2016-12-07 09:13:20,325 DEBUG    RFM2Pi     17801 adding frame to buffer => [1481062400.254247, 8, 10, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -38]
2016-12-07 09:13:20,326 DEBUG    RFM2Pi     17801 Sent to channel(end)' : ToEmonCMS
2016-12-07 09:13:20,526 WARNING  MainThread RFM2Pi thread is dead
2016-12-07 09:13:20,755 WARNING  MainThread RFM2Pi thread is dead
2016-12-07 09:13:20,982 WARNING  MainThread RFM2Pi thread is dead
2016-12-07 09:13:21,210 WARNING  MainThread RFM2Pi thread is dead
2016-12-07 09:13:21,437 WARNING  MainThread RFM2Pi thread is dead
2016-12-07 09:13:21,664 WARNING  MainThread RFM2Pi thread is dead
2016-12-07 09:13:21,892 WARNING  MainThread RFM2Pi thread is dead
2016-12-07 09:13:22,120 WARNING  MainThread RFM2Pi thread is dead
2016-12-07 09:13:22,348 WARNING  MainThread RFM2Pi thread is dead
2016-12-07 09:13:22,575 WARNING  MainThread RFM2Pi thread is dead
2016-12-07 09:13:22,818 WARNING  MainThread RFM2Pi thread is dead
2016-12-07 09:13:23,046 WARNING  MainThread RFM2Pi thread is dead
2016-12-07 09:13:23,273 WARNING  MainThread RFM2Pi thread is dead
2016-12-07 09:13:23,500 WARNING  MainThread RFM2Pi thread is dead

This might be a clue - when I ssh’d to the pi and checked the status, this is what showed:
[email protected](ro):/var/log/emonhub$ sudo service emonhub status
● emonhub.service - LSB: Start/stop emonHub
Loaded: loaded (/etc/init.d/emonhub)
Active: active (exited) since Tue 2016-12-06 20:54:24 AEDT; 12h ago
Process: 522 ExecStart=/etc/init.d/emonhub start (code=exited, status=0/SUCCESS)

Dec 06 20:54:24 emonpi systemd[1]: Started LSB: Start/stop emonHub.
Dec 06 22:25:22 emonpi emonhub[522]: Exception in thread RFM2Pi:
Dec 06 22:25:22 emonpi emonhub[522]: Traceback (most recent call last):
Dec 06 22:25:22 emonpi emonhub[522]: File “/usr/lib/python2.7/threading.py”, line 810, in __bootstrap_inner
Dec 06 22:25:22 emonpi emonhub[522]: self.run()
Dec 06 22:25:22 emonpi emonhub[522]: File “/home/pi/emonhub/src/interfacers/emonhub_interfacer.py”, line 73, in run
Dec 06 22:25:22 emonpi emonhub[522]: rxc = self.read()
Dec 06 22:25:22 emonpi emonhub[522]: File “/home/pi/emonhub/src/interfacers/EmonHubJeeInterfacer.py”, line 127, in read
Dec 06 22:25:22 emonpi emonhub[522]: c.rssi = int(f[-1][1:-1])
Dec 06 22:25:22 emonpi emonhub[522]: ValueError: invalid literal for int() with base 10: ‘-31(-90’
[email protected](ro):/var/log/emonhub$ sudo service emonhub restart

maybe a python error “invalid literal”? (I’m not sure if this is significant or not)

These “unreliable content” entries are packets that have failed crc checks within the rfm firmware, probably interference, they are only passed from the rfm69pi to emonhub for the purpose of logging them to the log files for debugging, they are not passed any further and are not going to be the cause of the “thread is dead” issue but may shed some light, they can be turned off by setting “quiet = true” in the rfm2pi settings in emonhub.conf

Actually the line immediately before the first “thread is dead” shows us the code returned succesfully after passing the data to both the MQTT and the emonCMS interfacers before crashing, that helps narrow the search considerably, I will need to look closer at the code to determine what that means exactly, but it’s a step forward.

It will probably be tomorrow before I make much sense of it though, as the old grey matter is starting to shut down for the night.

Yes very helpful, it is seeing the string " -31(-90 " as an RSSI value which sould be an integer, the space is missing to create 2 values “-31” and “(-90)”, I will need to take a closer look tomorrow.

EDIT - actually I’m not sure how relevant it is as the time and dates do not agree (or is it time diff?)

So on that occasion emonhub tripped up at line 127 of EmonHubJeeInterfacer.py, c.rssi = int(f[-1][1:-1]) because the “-31(-90” value cannot be cast to an int.

The proceeding line 126 must have been true to reach that line, so the value was actually " (-31(-90) " before having the brackets stripped

We cannot tell from the traceback provided what the rest of the packet was like, we can assume that the correct RSSI was probably " (-31) " as a RSSI of " (-90) " would have been prefixed with a “?” and not reached this section of code, so 2 questions that still need answering are “Was the rest of the packet intact or was there other defects too?” and “Where did the " (-90 " come from?”

If there was a buffer being reused without properly clearing it beforehand, and the later packet was smaller so the tail of the previous packet was not overwritten, I would expect the value to have been " (-31)-90) " (note the bracket in the centre is the other way round) so could this be serial line corruption or a poorly formed string passed faithfully?

(much of this is “notes to self” in case I have to refer back to it later!)

@icenov - Are you able to edit /home/pi/emonhub/src/interfacers/EmonHubJeeInterfacer.py to try a fix? I have submitted a PR but it should be tested before it’s pulled in. ( I will try and set up a test rig as I do not use this version)

So these lines

need to be changed to

note that Python is very fussy about indentation.

Because the timestamps and values do not marry up between the traceback and the logs shown we do not know if this issue is definitely the same, but it is probable. If it reoccurs can you try to provide a “matching pair” of traceback from the service status and emonhub.log section? unfortunately it may involve trawling back through the logs a bit,

I’ll give it a try but traveling for the next 2 days, so a bit of a delay I’m afraid.

The change I suggested has now been merged into the main repo so just doing a normal emonPi update should pull in the changes, let us know how it goes.

I did the update which pulled down the changes, but unfortunately the problem is still there. I’ll try and reset everything to default and start again.

Unfortunately that doesn’t surprise me much. The fix was for the specific error seen in the traceback that wasn’t necessarily the same error seen in the emonhub logs, the timestamps weren’t even the same.

Can you show us any logs or the service status tracebacks again and we’ll hava another go.

I’ve replaced the AC/AC power supply with the 3.5 V DC battery pack and it seems to be working fine now. I’m not sure why, but its been running smoothly for about 24hr now, which I’ve never achieved before. My AC supply was from a small transformer I had lying around - checked with multimeter and was definitely 9 VAC.

Is the transformer supplying the emonTx? What was the actual voltage it was giving you? The “shop” a.c. adapter is rated at 9 V, but that is at full load. It actually gives 11.6 V under no-load at the nominal mains voltage, and the a.c. supply in the emonTx was designed only for use with that. The power supply was not designed for use with any other transformer.

Yes - only supplying the emonTx. Measured 9 VAC under load with multimeter (normal operation, not sure how the load changes over time, but was logging).

In that case, it’s almost certain that the power supply inside the emonTx was not working properly due to insufficient voltage, and the emonTx was possibly restarting itself either randomly, or when it recovered after the power supply had collapsed when trying to supply the extra power required to transmit the data.

Which could also be the source of the corrupt serial prints (eg “-31(-90”) that we have now created error handling for.