Stops running after an hour or so

Hi pb, thanks for the reply

It stopped posting data, so I copied the log file and posted it. I then changed the setting to ‘quiet = false’ and restarted.

Sorry, I can’t seem to quote properly on my device. Anywhere I have said it has frozen or stopped, I mean that it is no longer logging data as coming in or going out (in the emonHUB View Log screen), and emonCMS is also not receiving data. It literally stops/freezes, at which point I try to grab the log files I hoped would help.

When I mentioned ‘it froze then restarted’, I mean the emonHUB stopped reporting data coming in (in the emonHUB View Log screen), and going out (in the emonHUB View Log screen), and I was also not getting any data reported in emonCMS. When I say it restarted, I mean my connection to the emonHUB was lost, followed by a short wait, me needing to log back in and followed by the typical startup log being reported in emonHUB, which I assumed was a system restart. I just happened to grab a copy of what was in the log before this happened.

What would you ideally need from me, in addition to the logs I am posting from around when the ‘freezing’ and ‘rebooting’ happens, to be able to help? Would you like a dump of all log files from the entire system? (Thanks for increasing the limit so I can do this.)

Thanks

That log excerpt contains this line

2017-12-01 17:17:25,565 INFO     MainThread Setting RFM2Pi quiet: 0 (0q)

which is the result of quiet=false already being set, unless you had already changed it since posting the emonhub.conf. As long as it’s set now that’s fine; it’s just that trying to piece all the snippets of info together is difficult. What we need is a decent-sized emonhub.log that starts way before the issue occurs and extends well beyond that time. That will paint the best picture of what is happening.

We need to determine if the emonhub service is actually stopping, and we need to determine if there is an issue with the clock or if logging is being paused due to disk space. I suggested some checks in my last post.
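
The disk space and clock checks can also be scripted. A minimal sketch, assuming Python 3 is available on the Pi and that the logs live under /var/log (as the emonhub.log path in this thread suggests); the function name and the 10% threshold are illustrative choices, not part of emonhub:

```python
import os
import shutil
import time


def free_fraction(path):
    """Return the fraction (0.0 to 1.0) of the filesystem holding `path` that is free."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0
    return usage.free / usage.total


if __name__ == "__main__":
    log_dir = "/var/log"  # assumption: the log partition used in this thread
    if os.path.isdir(log_dir):
        frac = free_fraction(log_dir)
        print("%s: %.1f%% free" % (log_dir, frac * 100))
        if frac < 0.10:  # illustrative threshold, not an emonhub setting
            print("Log partition is nearly full; logging may stall.")
    # Compare this against a trusted clock to spot a wrong system time.
    print("System time (UTC):", time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime()))
```

If the free space is close to zero, or the printed time is well off, that would explain gaps in the log without the service itself having stopped.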

In my last post I also suggested disabling openhab for now.

The emonhub config page where you see the logs is limited in its ability: it only starts showing the log from when you open the page. This doesn’t mean that’s all there is, it just means that is all that has been logged since you opened the page. You need to use SSH and view/copy the files directly, and try not to be tempted by that restart button until you have checked whether the service is stopped or in trouble. At the moment restarting or rebooting isn’t helping you find the cause; you are better off exploring whilst it isn’t working rather than restarting it.
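
To see over SSH whether logging has really stalled, it can help to compare when each log file was last written. A rough sketch, assuming the /var/log/emonhub directory that appears elsewhere in this thread; `log_ages` is a hypothetical helper name:

```python
import os
import time


def log_ages(directory, now=None):
    """Return (name, size_bytes, seconds_since_last_write) per file, newest first."""
    now = time.time() if now is None else now
    rows = []
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            st = os.stat(path)
            rows.append((name, st.st_size, now - st.st_mtime))
    # Smallest age first, so the most recently written file is at the top.
    return sorted(rows, key=lambda row: row[2])


if __name__ == "__main__":
    log_dir = "/var/log/emonhub"  # assumption: directory of the emonhub.log in this thread
    if os.path.isdir(log_dir):
        for name, size, age in log_ages(log_dir):
            print("%-20s %10d bytes  last written %6.0f s ago" % (name, size, age))
    else:
        print("No such directory:", log_dir)
```

An emonhub.log that has not been written for hours while the service still reports "active (running)" points at missing input rather than a stopped service.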

Then I can only assume that log file data was after it was restarted, and everything before had been cleared out. I was sure I took it before but clearly not.

Right, well, what I’ll do is see if it happens again (it has not gone down for a while), and if/when it does, I’ll post every log I can find via SSH. I have seen quite a few log-related files in the log folder; I’ll grab them all, post them all and not restart anything.

Thanks for looking at what I have so far though, appreciate it.

OK, so it’s stopped posting any data again, and no, I have not restarted it; I just logged in via SSH to get all the log files. I’ll start with the things in the troubleshooting section.

The last reading I had posted was 4 hours ago (so 9pm 2nd Dec)

pi@emonpi(ro):~$ sudo service emonhub status -l

● emonhub.service - LSB: Start/stop emonHub
   Loaded: loaded (/etc/init.d/emonhub)
   Active: active (running) since Sat 2017-12-02 17:17:23 UTC; 7h ago
  Process: 502 ExecStart=/etc/init.d/emonhub start (code=exited, status=0/SUCCESS)
   CGroup: /system.slice/emonhub.service
           └─708 python /usr/share/emonhub/emonhub.py --config-file /home/pi/...

Dec 02 17:17:23 emonpi sudo[640]: pam_unix(sudo:session): session opened fo...0)
Dec 02 17:17:23 emonpi sudo[640]: pam_unix(sudo:session): session closed fo...ot
Dec 02 17:17:23 emonpi sudo[649]: root : TTY=unknown ; PWD=/ ; USER=root ; ...ub
Dec 02 17:17:23 emonpi sudo[649]: pam_unix(sudo:session): session opened fo...0)
Dec 02 17:17:23 emonpi sudo[649]: pam_unix(sudo:session): session closed fo...ot
Dec 02 17:17:23 emonpi sudo[668]: root : TTY=unknown ; PWD=/ ; USER=root ; ...ub
Dec 02 17:17:23 emonpi sudo[668]: pam_unix(sudo:session): session opened fo...0)
Dec 02 17:17:23 emonpi sudo[668]: pam_unix(sudo:session): session closed fo...ot
Dec 02 17:17:23 emonpi emonhub[502]: Starting OpenEnergyMonitor emonHub: em...k.
Dec 02 17:17:23 emonpi systemd[1]: Started LSB: Start/stop emonHub.
Hint: Some lines were ellipsized, use -l to show in full. 

pi@emonpi(ro):~$ tail /var/log/emonhub/emonhub.log

(it’s empty)

pi@emonpi(ro):~$

pi@emonpi(ro):~$ sudo service mqtt_input status

● mqtt_input.service - Emoncms MQTT Input Script
   Loaded: loaded (/etc/systemd/system/mqtt_input.service; enabled)
   Active: active (running) since Sat 2017-12-02 17:55:31 UTC; 7h ago
     Docs: https://github.com/emoncms/emoncms/blob/master/docs/RaspberryPi/MQTT.md
 Main PID: 2215 (php)
   CGroup: /system.slice/mqtt_input.service
           └─2215 /usr/bin/php /var/www/emoncms/scripts/phpmqtt_input.php

Dec 02 17:55:31 emonpi systemd[1]: Started Emoncms MQTT Input Script.

pi@emonpi(ro):~$ tail /var/log/emoncms.log

2017-12-02 17:55:24.000|ERROR|phpmqtt_input.php|exception 'Mosquitto\Exception' with message 'The client is not currently connected.' in /var/www/emoncms/scripts/phpmqtt_input.php:112
Stack trace:
#0 /var/www/emoncms/scripts/phpmqtt_input.php(112): Mosquitto\Client->loop()
#1 {main}
2017-12-02 17:55:24.001|WARN|phpmqtt_input.php|Not connected, retrying connection
2017-12-02 17:55:33.260|ERROR|phpmqtt_input.php|exception 'Mosquitto\Exception' with message 'The client is not currently connected.' in /var/www/emoncms/scripts/phpmqtt_input.php:112
Stack trace:
#0 /var/www/emoncms/scripts/phpmqtt_input.php(112): Mosquitto\Client->loop()
#1 {main}
2017-12-02 17:55:33.282|WARN|phpmqtt_input.php|Not connected, retrying connection

pi@emonpi(ro):~$ date

(correct value)

Sun 3 Dec 00:59:37 UTC 2017

Now for the log files found within /var/log

Archive.zip (1.1 MB)

I’ll leave the emonHUB unreset, so if you need anything I have missed, just let me know.

Thanks

It actually started logging again at some point after I was using SSH to gather those files… :roll_eyes: I just noticed after I finished my post.

However if you look at the Archive.zip content there is an emonhub.log.1 and an emonhub.log.2 where the log file has been rotated out.

I had a very brief look (as I’m doing other stuff today) and there doesn’t appear to be anything of huge concern towards the end of the file. The last packet received was fully processed; it simply looks like the packets stopped coming, therefore everything (emonhub, emoncms and the log files) just appears to have stopped because there isn’t anything to process.

Can you confirm the LED on the rfm2pi is flashing (when running ok and when “stopped”)?

Try creating this little script as /home/pi/reset_rfm2pi.py. You will need to use the rpi-rw command first to make the filesystem writable.

#!/usr/bin/env python
"""
Pulse the RFM2Pi reset line once so that its processor restarts.
"""

__author__ = 'Paul Burnell (pb66)'
__date__ = '14-05-2015'
# http://openenergymonitor.org/emon/node/5549

import RPi.GPIO as GPIO
import time

pin = 7 # P1-7 (BCM pin 4 or WiringPi pin 7)

try:
    GPIO.setmode(GPIO.BOARD)      # use physical header pin numbering
    GPIO.setup(pin, GPIO.OUT)
    GPIO.output(pin, GPIO.HIGH)   # drive the reset line high
    time.sleep(0.12)              # hold it for 120 ms
    GPIO.output(pin, GPIO.LOW)    # drive it low again
    GPIO.cleanup()                # release the pin
except Exception as e:
    print(e)

make it executable

sudo chmod +x /home/pi/reset_rfm2pi.py

and next time it stops try running it with

sudo /home/pi/reset_rfm2pi.py

It simply pulses the reset line for the rfm2pi, causing the processor to restart.

If this kickstarts everything back off then it is the RFM2Pi that is freezing and restarting emonhub is simply resetting the serial port and the rfm2pi in the process. (See RFM69Pi stops updating/freezes and the many linked threads on the old forum for further info on this)

However! I have never seen a locked-out rfm2pi restart of its own accord. This makes me think this might be an RF collision issue. Although I have never experienced it, in theory it might be possible that the last received packet was the last packet transmitted in “clean air”: the next packet may have started transmitting (or deferred transmitting) before this one had finished, and the next before that one finished, effectively creating a rolling blockage. 4hrs seems a long time, but with 20 nodes?

You can also check the rfm2pi is seated correctly (square and straight) and it’s not shorting on the unused gpio pins below the antenna wire.

Hello,

Yes now when I log in, I do see .1 versions of log files, would you like me to upload them for you?

The LED on the rfm2pi is not flashing; it is a solid red, and the system is again not processing/receiving anything. When it is working as expected the LED flashes, I assume every time it gets a packet. The RF module seems seated correctly: it is sitting flat and fully down on the pins, and the emonHUB is located somewhere it can’t be knocked about to upset any connections.

I’ll grab the rest of the log files and store them in case you would like to see them, create the script you posted (thanks for that) and let you know how it goes.

Thanks

Here are the .1 and .2 files you mentioned

Archive 2.zip (931.6 KB)

I have run your script, by the way. As soon as I did, the RF module started flashing and data started getting posted. I will have a read about the issue in the link you provided.

No need, you already have, unless they have changed and have something new to tell us. No worries, you posted them whilst I was mid-post (the phone rang).

That doesn’t sound right, if/when you use the script can you also view the led? Can you confirm it gives a 1 second flash at reset? The flashes on receipt of data should be noticeably shorter.

Do you have another RFM69Pi handy if needed?

From what I recall, the issue as it was previously didn’t put the led on permanently (I would need to read the other threads to confirm) can you please try swapping the rfm69pi if you have another one handy?

It now sounds like the rfm69pi might be failing during startup, why it is restarting and subsequently failing to do so, I have no idea at this time. Tell us about your power supply, do you have an alternative (beefier) 5v dc supply you can try?

I just ran the script again whilst looking at the LEDs; it gave a one-second flash, then resumed the shorter flashes.

I do have another new emonHUB setup, including the power supplies purchased with it. I’ll switch out the RFM69Pi now for you; shall I also change the power supply?

I do not have any alternative power supplies other than the ones you can buy from the shop with the emonHUB.

IMO, and this is mainly unproven, the power supply to the rfm69cw (or rfm12b) may be the cause of the previously known issue (this issue may well be different though). I believe the 5vdc going into the Pi, which is then passed out to the rfm2pi, dropped to 3.3v and distributed to the cpu and the rfm69cw, may not be stable enough at the rfm69pi connection. I have not experienced the issue more than a handful of times in the years I have been pursuing it on behalf of others, so I cannot test for it myself. This may be because I have only ever used quality 3 amp PSUs with any RF-based installs.

You may need to consider another psu if for no other reason than to eliminate it from suspicion IF nothing else turns up.

I do think you should try a different rfm69pi board as this issue appears to have different symptoms and is more in keeping with a handful of instances of bad rfm modules (or connections to that module) seen on emonTx’s. We haven’t seen this issue before on RFM2Pi so swapping out is a good test, it’s unlikely you will have 2 faulty boards as it is that rare.

If the fault continues, it is most likely a PSU issue (PSU itself, conn to PI, Pi itself or conn to RFM2Pi), although I cannot rule it out completely, it’s less likely to be an RF traffic/collision based issue unless the RFM module’s duty cycle is causing it to crash out (possibly still

We can eliminate a faulty PSU (but not an under powered one), faulty Pi or faulty RFM2Pi board by substitution as you have multiple units (do not swap everything out at once as that tells you nothing if it works ok).

Another thing you might want to consider is purchasing a JeeLink, it might not help much with this specific issue, but getting this far would have been easier and you could plug the JeeLink into the emonBase (Pi can supply up to 1.2a via USB so it may be more stable or a better physical connection than gpio) and temporarily point the [[RFM2Pi]] interfacer to /dev/ttyUSB0 to test.

I wouldn’t want to be without mine now; basically it is a 16MHz (faster and more stable) USB “RFM2Pi” type receiver. You can plug it into your PC and get an instant picture of what’s happening “in the air” by using a serial console (e.g. the Arduino IDE). The power supply to it via the PC’s USB port will be more stable and it will run the same RFM2Pi sketches. It would prove invaluable to you whilst trying to set this project up and for debugging now and possibly later.

By plugging the JeeLink into a laptop you can walk around the building and see all the nodes by switching between the floors/groups (typing just 210g or 209g etc into the console) you can test for RF dead spots etc.

I’m not suggesting you need one, but IMO managing, debugging or developing for RFM networks is soooo much easier with one, if you might end up getting one down the line, might as well get it now and make use of it here.

Hi,

That JeeLink sounds great! I will get one ordered. Can you recommend a power supply?

I need to deploy asap, and in one of the forum posts you linked to earlier, someone mentioned making a watchdog. I know it’s not a fix but wondered if you happened to have something that perhaps looks for no activity, then calls your reset script you posted earlier.

Till then, I’ll switch out bits one by one so we can maybe work out what the cause is.

The power supplies I use are “Genuine LA-530 5V 3000mA”

https://www.ebay.co.uk/itm/Genuine-LA-530-5V-3000mA-3A-Mains-AC-DC-Adaptor-Power-Supply-Charger-MICRO-B-USB/152575335774?hash=item238632dd5e:g:I78AAOSwPh5ZNpST

is where I have got them before (£10.98) but I have just spotted this significantly cheaper listing (£8.75) I might buy a couple more for the stock cupboard:-)

https://www.ebay.co.uk/itm/5V-3000ma-3A-Mains-Micro-USB-Power-Adapter-Charger-for-Raspberry-Pi-3-2-zero/112388117523?hash=item1a2ada9413:g:khYAAOSwYmZXOGET

but as always, on fleabay “genuine” doesn’t always mean genuine but returns are really easy when the seller says “genuine” and they turn out not to be.

I got hooked on these power supplies through using an HD-PVR years ago. Every conceivable issue with an HD-PVR always boiled down to the same thing on their forums: using a different PSU than the ones supplied as standard (different plug but same manufacturer and spec).

[edit] @Paul might be able to give you some pointers on the watchdog, I have never had cause to set one up. But please do not enable it whilst we are debugging as that will hide the issue and delay finding a solution, enabling the watchdog should only occur once you’ve given up or run out of time.

I use node-red to act as a watchdog for emoncms, using a simple flow as attached below.
The flow monitors emoncms by MQTT subscribing to one of the feeds, and if it is not updated at least every 90 seconds (can be any time period), then it sends the message “Emoncms has stopped updating!” out via the delay node msg.payload.
You can attach anything that you want to the delay node, a push node, email node, twitter, Twilio etc.
I use a ‘Pushover’ node to push the message to my mobile.

[{"id":"55067d65.aaf984","type":"mqtt in","z":"7fc7e6a6.803818","name":"Watchdog","topic":"trigger","qos":"0","broker":"ed3cf937.12c308","x":90,"y":431,"wires":[["882ad1c7.0555d"]]},{"id":"882ad1c7.0555d","type":"trigger","z":"7fc7e6a6.803818","op1":"","op2":"Emoncms  has stopped updating!","op1type":"nul","op2type":"str","duration":"90","extend":true,"units":"s","reset":"","name":"","x":269.8957824707031,"y":430.7777404785156,"wires":[["9cab2767.b157d8","8080a96.0362358"]]},{"id":"9cab2767.b157d8","type":"delay","z":"7fc7e6a6.803818","name":"","pauseType":"delay","timeout":"5","timeoutUnits":"seconds","rate":"1","rateUnits":"second","randomFirst":"1","randomLast":"5","randomUnits":"seconds","drop":false,"x":265.89581298828125,"y":473.7777404785156,"wires":[["882ad1c7.0555d"]]},{"id":"8080a96.0362358","type":"delay","z":"7fc7e6a6.803818","name":"Limit Messages","pauseType":"rate","timeout":"5","timeoutUnits":"seconds","rate":"1","rateUnits":"hour","randomFirst":"1","randomLast":"5","randomUnits":"seconds","drop":true,"x":473.8957824707031,"y":429.7777404785156,"wires":[[]]},{"id":"ed3cf937.12c308","type":"mqtt-broker","z":"","broker":"192.168.1.8","port":"1883","clientid":"","usetls":false,"verifyservercert":true,"compatmode":true,"keepalive":"15","cleansession":true,"willTopic":"","willQos":"0","willRetain":null,"willPayload":"","birthTopic":"","birthQos":"0","birthRetain":null,"birthPayload":""}]
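
For anyone not running Node-RED, the timeout idea behind the flow can be sketched in plain Python: every feed update re-arms a 90 second timer, and the feed counts as stopped once the timer runs out. This is a simplified model of the logic only; it does not connect to MQTT or send notifications, and the class and method names are hypothetical:

```python
import time


class StaleFeedMonitor:
    """Track the last update time and report when a feed has gone quiet."""

    def __init__(self, timeout_s=90.0, clock=time.monotonic):
        self.timeout_s = timeout_s
        self._clock = clock
        # Simplification: timing starts at creation, whereas the Node-RED
        # trigger node only starts timing on its first message.
        self._last_seen = clock()

    def message_received(self):
        """Call on every feed update; re-arms the timer."""
        self._last_seen = self._clock()

    def is_stale(self):
        """True once more than timeout_s has passed since the last update."""
        return self._clock() - self._last_seen > self.timeout_s
```

In use, `message_received()` would be called from whatever receives the feed, and `is_stale()` polled periodically; what happens on a stale feed (push message, email and so on) is left open, as in the flow above.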

Paul

How often does it get triggered these days @Paul ? I’ve not heard of any new cases for a while now.

Funny you should ask…
It’s only triggered when I’ve had a low signal strength blip, and since I’ve used a ground plane, it doesn’t trigger at all.
The big change came when I updated to Raspbian Stretch, the old problem stopped completely, and I haven’t had to restart the RFM2Pi once. It’s been rock-solid since.

Paul

That’s not a bad idea on the node-red watcher, but at this rate I’ll be getting quite a few emails, as it has gone down once more with the new module on. I have the solid red light again on the RF module.

I’ll swap power supplies over now. Come to think of it, I have one of these kits so I will try that.

In the Raspberry PI in the above kit, I have it doing some processing of temp sensors from another setup of sensors we already had, which involves connecting a hub to the PI via USB; it also has a mouse and keyboard attached. Anyway, when I plugged the OOM power supply into the above PI, it went ‘mental’: the keyboard didn’t work, the mouse didn’t work, and all the lights on the device I have attached to it via USB started flashing in a very unusual way… Maybe the power supply is the cause, I suppose time will tell!

Yes some pointers on a watchdog that calls your script Paul would be a great help, I won’t enable till I deploy.

While I was writing this post, it restarted. The PI seemed to actually restart as when I tried to SSH to the PI right after the RF LED started to flash, it would not connect for a few moments (like when you first power on).

I think that might be worth trying, but it doesn’t prove it’s not a power-related issue if it doesn’t improve. If you do not have a beefier PSU to try, is there any way to reduce the load on the PSU? Are you using WiFi? Try disabling WiFi and using an Ethernet cable, and try disabling some other services.

Yes I think that would be a beefier supply.

We can look at that later if all else fails. To be honest, if this is a power issue, restarting is only adding to the drain, and it may become a lucky dip whether there is enough juice available when the reset is called, resulting in a rapid resetting frenzy unless you only reset once and wait the full X mins again, which might mean several laps before actually starting. This is all a bit hit or miss right now.
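
If a watchdog does get used later (and left switched off while debugging, as asked above), the "reset once, then wait the full period again" idea could look something like this. It is only a sketch: it assumes emonhub.log is written whenever packets arrive, that the reset script is the one posted earlier at /home/pi/reset_rfm2pi.py, and the 10 minute and 30 second values are placeholders rather than tested settings:

```python
import os
import subprocess
import time

LOG_PATH = "/var/log/emonhub/emonhub.log"         # log path seen in this thread
RESET_CMD = ["sudo", "/home/pi/reset_rfm2pi.py"]  # reset script posted earlier
MAX_SILENCE_S = 600  # assumption: 10 minutes of silence counts as "stopped"
POLL_S = 30          # assumption: how often to look at the log


def should_reset(log_mtime, last_reset, now, max_silence=MAX_SILENCE_S):
    """True only if neither a log write nor a reset happened within max_silence."""
    return now - max(log_mtime, last_reset) > max_silence


def watch():
    """Poll the log forever, resetting at most once per max_silence window."""
    last_reset = 0.0
    while True:
        now = time.time()
        try:
            log_mtime = os.path.getmtime(LOG_PATH)
        except OSError:
            log_mtime = now  # log missing (e.g. mid-rotation): do not reset
        if should_reset(log_mtime, last_reset, now):
            subprocess.call(RESET_CMD)
            last_reset = now
        time.sleep(POLL_S)
```

Calling `watch()` starts the loop. Because a reset re-arms the same timer as a log write, a board that fails to come back is retried once per period rather than continuously, which avoids the resetting frenzy described above.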

That also suggests power issues.

[edit] the shop sold unit is only 1.2a and the recommended minimum for a Pi3 is 2.5a although “bareboard” consumption is given as only 400ma, I do not know what the RFM2Pi draws but you are working it pretty hard and the emonSD is loaded with services so it would be understandable if the 1.2a supply was struggling a little.
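
Putting the figures quoted above side by side makes the margin clearer. This is plain arithmetic on the numbers in this post (1.2 A supply, 2.5 A recommended minimum, 400 mA bare-board draw); the RFM2Pi and USB loads are unknown, so they are left out rather than guessed:

```python
def headroom_ma(supply_ma, load_ma):
    """Current left over (in mA) once the given load is drawn from the supply."""
    return supply_ma - load_ma


SUPPLY_MA = 1200        # shop-sold unit, as quoted above
RECOMMENDED_MA = 2500   # recommended minimum for a Pi 3, as quoted above
BAREBOARD_MA = 400      # "bareboard" consumption, as quoted above

# What is left for the RFM2Pi, USB devices and load peaks.
print("Headroom over bare board:", headroom_ma(SUPPLY_MA, BAREBOARD_MA), "mA")
# A negative value means the supply is below the recommendation.
print("Margin against recommendation:", headroom_ma(SUPPLY_MA, RECOMMENDED_MA), "mA")
```

So everything beyond the bare board has to fit into whatever remains of the 1.2 A, and the supply sits well under the recommended figure.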

Sorry, I was doing the annoying thing for you of editing my post; you might be interested in what happened when I connected the OOM supply to another Raspberry PI I have.