Reboot Required Every 20 Days Or So To Restore Logging

Absolutely, the perceived “need” to reboot should be sought out and remedied.

1 Like

Rebooting is quick and dirty. I know! :smiley:
The right thing to do is to tackle the root cause and fix it. It’s all a question of priority (c.q. time).

  1. update to the latest version (there is little point in filing bug reports on three versions ago)
  2. re-evaluate if the problem persist
    if it doesn’t => post it in the community
    if it does => proceed with the next steps
  3. sift through logs, gathering as many symptoms and pointers to the root cause
  4. either file a bug report or fix it yourself (and posting the bugfix in this forum)

As Trystan pointed out, the fix could be in the update. So, let’s first see what an update brings. I’m still on low-write 9.8.24 | 2017.11.27.

Hi Mike,

the logs you quote look like Apache logs. Though I’m not sure which specific log file you got them from.
I don’t have them in syslog>

pi@emonpi(rw):~$ grep WARN /var/log/syslog 
Apr 12 03:58:12 emonpi kernel: [    0.752229] WARN::dwc_otg_hcd_init:1032: FIQ DMA bounce buffers: virt = 0xbd904000 dma = 0xfd904000 len=9024
Apr 12 03:58:12 emonpi kernel: [    0.761497] WARN::hcd_init_fiq:459: FIQ on core 1 at 0x8058ef90
Apr 12 03:58:12 emonpi kernel: [    0.762842] WARN::hcd_init_fiq:460: FIQ ASM at 0x8058f300 length 36
Apr 12 03:58:12 emonpi kernel: [    0.764199] WARN::hcd_init_fiq:486: MPHI regs_base at 0xbe87a000

Nor in the Apache log

pi@emonpi(ro):~$ head -n10 /var/log/apache2/error.log
[Thu Apr 12 03:58:29.297445 2018] [mpm_prefork:notice] [pid 1638] AH00163: Apache/2.4.10 (Raspbian) configured – resuming normal operations
[Thu Apr 12 03:58:29.297665 2018] [core:notice] [pid 1638] AH00094: Command line: ‘/usr/sbin/apache2’
[Thu Apr 12 03:58:43.176439 2018] [mpm_prefork:notice] [pid 1638] AH00169: caught SIGTERM, shutting down
[Thu Apr 12 03:58:45.018413 2018] [mpm_prefork:notice] [pid 1947] AH00163: Apache/2.4.10 (Raspbian) configured – resuming normal operations
[Thu Apr 12 03:58:45.018649 2018] [core:notice] [pid 1947] AH00094: Command line: ‘/usr/sbin/apache2’
cat: /home/pi/data/emoncms-import.log: No such file or directory
cat: /home/pi/data/emoncms-export.log: No such file or directory
cat: /home/pi/data/emoncms-import.log: No such file or directory
cat: /home/pi/data/emoncms-export.log: No such file or directory
cat: /home/pi/data/emoncms-import.log: No such file or directory

@IM35461 Found the log file. You’re using the web interface, which shows /var/log/emoncms.log
I have these log entries too:

pi@emonpi(ro):~$ head -n10 /var/log/emoncms.log
2018-04-12 03:58:43.660|ERROR|phpmqtt_input.php|exception ‘Mosquitto\Exception’ with message ‘The client is not currently connected.’ in /var/www/emoncms/scripts/phpmqtt_input.php:112
Stack trace:
#0 /var/www/emoncms/scripts/phpmqtt_input.php(112): Mosquitto\Client->loop()
#1 {main}
2018-04-12 03:58:43.751|ERROR|phpmqtt_input.php|exception ‘Mosquitto\Exception’ in /var/www/emoncms/scripts/phpmqtt_input.php:121
Stack trace:
#0 /var/www/emoncms/scripts/phpmqtt_input.php(121): Mosquitto\Client->connect(‘localhost’, 1883, 5)
#1 {main}
2018-04-12 03:58:43.752|WARN|phpmqtt_input.php|Not connected, retrying connection
2018-04-12 03:58:44.755|ERROR|phpmqtt_input.php|exception ‘Mosquitto\Exception’ with message ‘The client is not currently connected.’ in /var/www/emoncms/scripts/phpmqtt_input.php:112

At night, with near constant electricity use, constant temperature, the watchdog is a bit nervous. So it interprets that as an anomaly and executes a reboot. That means that a number of restarts by the watchdog were false positives.

I’ve now adjusted the watchdog to do more logging and to not reboot. I’ve included

  • process list (output of ps -AHf)
  • a copy of the three most recent openhab outputs
  • a copy of /var/log/syslog
  • a copy of /var/log/emoncms.log

Any other logfiles that could contain useful pointers to the root cause?

The most useful log is emonhub.log.

You also need to log the status of the following services, emonhub, feedwriter and mqtt_input.

A “top” printout to show RAM usage by process and cpu usage might be useful too.

[edit - also if you must use a watchdog, you really should check emoncms directly rather than openhab. you are not checking to see it the data is updating, you are checking to see if the data is being shared with openhab. emoncms and emonhub might not be failing if you are checking openhab. Also using an emoncms feed api would allow you to check the time the feeds were last updated NOT the last time the value changed. very different things, a static value may never change even when it is successfully updated every 5secs.]

I’ve added

  • status of all services
  • top printout

You’re right about OpenHAB. Retrieving the data from EmonCMS will be much more reliable. @pb66 Good idea! Will make the watchdog less nervous at night :slight_smile:

The script pulling data from EmonCMS API is ready. It checks when the temperature, power1 and power2 feeds have been last updated. If any of these feeds have been update more than 5 minutes ago, it logs a bunch of data.

The waiting game has begun. I hope it crashes soon :wink:

1 Like

@IM35461 I’ve updated to low-write 9.8.28 | 2018.01.27 about a week ago and so far so good. Looks like the update fixed whatever the root cause was. Based on that experience, I’d recommend you install this update to fix the issue you’re having.

Just to let you know it has been working well till yesterday and failed at 70+ days.

Did a reboot and installed an update and now have a new problem. The feeds web page just gives the loading cogwheel but the other webpages and the Android app still work.
Tried cold boot but no better:(

Any ideas?

I’ve updated as you have suggested and after 21 days the feeds stop logging.
In the server information next to writter (in red) “Daemon is not running, start it at ~/scripts/feedwriter” (see attached)
How do I resolve this issue?

Have you restarted the feedwiriter?

sudo systemctl restart feedwriter

It won’t fix the root issue but it will either return a fault/error that might give you insight to why it stopped running or it will start and write all that buffered data to disk before it is lost through a reboot or power outage as you apparently have “1012788 feed points pending write” held in RAM.

No, I had not tried that, I’m afraid I’ve already rebooted the system so I’ll try that next time it happens (I assume it will do this again in approximately 21 days).
Any ideas what the root problem could be, which is causing these feed data to remain in the buffer and not write to disk?

You need to take a copy of the current logs before rebooting.

Also looking at the output of

sudo systemctl status feedwriter

should give you the last few log entries for the service and may well tell you why it failed. It will also tell you exactly when it failed.

Paul and Brian,
Thanks for the help, I’m relatively new to monitoring at this level so when I’ve got the logs (BTW, exactly which log do I need?) and the output of the command line where do I begin sifting through these data? Is there somewhere that I can get information on how to interpret the log; I’ve had a cursory look at the EmonHub log and don’t know where to begin etc (‘needle and haystack’ spring to mind)?

If you try the status command now, you should see a ‘normal’ output. It is just a few lines of the log in reality.

Couple of examples (I do not use feedwriter) - this is a service that I run that I know is not working. You can see it says active (exited). Looking at the log entries, it looks like it lost network (socket error) in this case…

root@OrPi1:~# systemctl status helios.service
● helios.service - LSB: Start/stop heliosd
   Loaded: loaded (/etc/init.d/helios)
   Active: active (exited) since Wed 2018-09-19 12:38:31 BST; 4 days ago
  Process: 583 ExecStart=/etc/init.d/helios start (code=exited, status=0/SUCCESS)

Sep 19 12:38:56 OrPi1 helios[583]: multiple([msg], hostname, port, client_id, keepalive, will, auth, tl...port)
Sep 19 12:38:56 OrPi1 helios[583]: File "/usr/local/lib/python2.7/dist-packages/paho/mqtt/publish.py", ...tiple
Sep 19 12:38:56 OrPi1 helios[583]: client.connect(hostname, port, keepalive)
Sep 19 12:38:56 OrPi1 helios[583]: File "/usr/local/lib/python2.7/dist-packages/paho/mqtt/client.py", l...nnect
Sep 19 12:38:56 OrPi1 helios[583]: return self.reconnect()
Sep 19 12:38:56 OrPi1 helios[583]: File "/usr/local/lib/python2.7/dist-packages/paho/mqtt/client.py", l...nnect
Sep 19 12:38:56 OrPi1 helios[583]: sock = socket.create_connection((self._host, self._port), source_add..., 0))
Sep 19 12:38:56 OrPi1 helios[583]: File "/usr/lib/python2.7/socket.py", line 571, in create_connection
Sep 19 12:38:56 OrPi1 helios[583]: raise err
Sep 19 12:38:56 OrPi1 helios[583]: socket.error: [Errno 113] No route to host
Hint: Some lines were ellipsized, use -l to show in full.

This is my Emonhub - looks happy active (running). Note you can see where the actual log file is located.

root@OrPi1:~# systemctl status emonhub.service
● emonhub.service - LSB: Start/stop emonHub
   Loaded: loaded (/etc/init.d/emonhub)
   Active: active (running) since Wed 2018-09-19 12:38:46 BST; 4 days ago
  Process: 1058 ExecStart=/etc/init.d/emonhub start (code=exited, status=0/SUCCESS)
   CGroup: /system.slice/emonhub.service
           └─1069 python /usr/share/emonhub/emonhub.py --logfile /var/log/emonhub/emonhub.log --config-file ...

Sep 19 12:38:46 OrPi1 emonhub[1058]: Starting OpenEnergyMonitor emonHub: emonhub has been started ok.
Sep 19 12:38:46 OrPi1 systemd[1]: Started LSB: Start/stop emonHub.

Actually, in this case of yours, the error you are seeing is nothing to do with the emonhub. The 2 logs you need to look at are the feedwriter log and the main emoncms log. If the Feedwriter log tells you when the service stopped, you can check that time stamp on the main log and see if there are any corresponding entries.

Thanks for that. I’ll try these suggestions when it next fails, and report back

Sorry forgot to upload out from status command:

● feedwriter.service - LSB: feedwriter script daemon
   Loaded: loaded (/etc/init.d/feedwriter)
   Active: active (running) since Sat 2018-09-22 06:19:16 UTC; 1 weeks 0 days ago
   CGroup: /system.slice/feedwriter.service
           └─1833 /usr/bin/php -f /var/www/emoncms/scripts/feedwriter.php

Warning: Journal has been rotated since unit was started. Log output is incomplete or unavailable.

It’s been running for 31 days so far and the error has not occurred again and I have not needed to reboot etc (fingers crossed).
Thanks for your assistance and hopefully it does not reoccur anytime in the future.

How do I turn on the feedwriter log, because when I restart the feed writer and then get the status i’m informed that the log is:

Log is turned off

pi@emonpi(ro):~$ sudo systemctl status feedwriter
● feedwriter.service - LSB: feedwriter script daemon
Loaded: loaded (/etc/init.d/feedwriter)
Active: active (exited) since Thu 2018-12-20 08:27:37 UTC; 1 months 14 days ago
Process: 8484 ExecStop=/etc/init.d/feedwriter stop (code=exited, status=0/SUCCESS)
Process: 15768 ExecStart=/etc/init.d/feedwriter start (code=exited, status=0/SUCCESS)

Warning: Journal has been rotated since unit was started. Log output is incomplete or unavailable.
pi@emonpi(ro):~$ sudo systemctl restart feedwriter
pi@emonpi(ro):~$ sudo systemctl status feedwriter
● feedwriter.service - LSB: feedwriter script daemon
   Loaded: loaded (/etc/init.d/feedwriter)
   Active: active (running) since Sun 2019-02-03 07:53:41 UTC; 31s ago
  Process: 7078 ExecStop=/etc/init.d/feedwriter stop (code=exited, status=0/SUCCESS)
  Process: 7087 ExecStart=/etc/init.d/feedwriter start (code=exited, status=0/SUCCESS)
   CGroup: /system.slice/feedwriter.service
           └─7117 /usr/bin/php -f /var/www/emoncms/scripts/feedwriter.php

Feb 03 07:53:33 emonpi feedwriter[7087]: Log is turned off
Feb 03 07:53:33 emonpi feedwriter[7087]: Starting RPI
Feb 03 07:53:41 emonpi systemd[1]: Started LSB: feedwriter script daemon.