Reboot Required Every 20 Days Or So To Restore Logging

borpin · 22 September 2018 18:30

You need to take a copy of the current logs before rebooting.

Also looking at the output of

sudo systemctl status feedwriter

should give you the last few log entries for the service and may well tell you why it failed. It will also tell you exactly when it failed.

andrewhf01 · 23 September 2018 07:50

Paul and Brian,
Thanks for the help, I’m relatively new to monitoring at this level so when I’ve got the logs (BTW, exactly which log do I need?) and the output of the command line where do I begin sifting through these data? Is there somewhere that I can get information on how to interpret the log; I’ve had a cursory look at the EmonHub log and don’t know where to begin etc (‘needle and haystack’ spring to mind)?

borpin · 24 September 2018 06:27

If you try the status command now, you should see a ‘normal’ output. It is just a few lines of the log in reality.

Couple of examples (I do not use feedwriter) - this is a service that I run that I know is not working. You can see it says active (exited). Looking at the log entries, it looks like it lost network (socket error) in this case…

root@OrPi1:~# systemctl status helios.service
● helios.service - LSB: Start/stop heliosd
   Loaded: loaded (/etc/init.d/helios)
   Active: active (exited) since Wed 2018-09-19 12:38:31 BST; 4 days ago
  Process: 583 ExecStart=/etc/init.d/helios start (code=exited, status=0/SUCCESS)

Sep 19 12:38:56 OrPi1 helios[583]: multiple([msg], hostname, port, client_id, keepalive, will, auth, tl...port)
Sep 19 12:38:56 OrPi1 helios[583]: File "/usr/local/lib/python2.7/dist-packages/paho/mqtt/publish.py", ...tiple
Sep 19 12:38:56 OrPi1 helios[583]: client.connect(hostname, port, keepalive)
Sep 19 12:38:56 OrPi1 helios[583]: File "/usr/local/lib/python2.7/dist-packages/paho/mqtt/client.py", l...nnect
Sep 19 12:38:56 OrPi1 helios[583]: return self.reconnect()
Sep 19 12:38:56 OrPi1 helios[583]: File "/usr/local/lib/python2.7/dist-packages/paho/mqtt/client.py", l...nnect
Sep 19 12:38:56 OrPi1 helios[583]: sock = socket.create_connection((self._host, self._port), source_add..., 0))
Sep 19 12:38:56 OrPi1 helios[583]: File "/usr/lib/python2.7/socket.py", line 571, in create_connection
Sep 19 12:38:56 OrPi1 helios[583]: raise err
Sep 19 12:38:56 OrPi1 helios[583]: socket.error: [Errno 113] No route to host
Hint: Some lines were ellipsized, use -l to show in full.

This is my Emonhub - looks happy active (running). Note you can see where the actual log file is located.

root@OrPi1:~# systemctl status emonhub.service
● emonhub.service - LSB: Start/stop emonHub
   Loaded: loaded (/etc/init.d/emonhub)
   Active: active (running) since Wed 2018-09-19 12:38:46 BST; 4 days ago
  Process: 1058 ExecStart=/etc/init.d/emonhub start (code=exited, status=0/SUCCESS)
   CGroup: /system.slice/emonhub.service
           └─1069 python /usr/share/emonhub/emonhub.py --logfile /var/log/emonhub/emonhub.log --config-file ...

Sep 19 12:38:46 OrPi1 emonhub[1058]: Starting OpenEnergyMonitor emonHub: emonhub has been started ok.
Sep 19 12:38:46 OrPi1 systemd[1]: Started LSB: Start/stop emonHub.

Actually, in this case of yours, the error you are seeing is nothing to do with the emonhub. The 2 logs you need to look at are the feedwriter log and the main emoncms log. If the Feedwriter log tells you when the service stopped, you can check that time stamp on the main log and see if there are any corresponding entries.

andrewhf01 · 29 September 2018 06:45

Thanks for that. I’ll try these suggestions when it next fails, and report back

andrewhf01 · 29 September 2018 07:02

Sorry forgot to upload out from status command:

● feedwriter.service - LSB: feedwriter script daemon
   Loaded: loaded (/etc/init.d/feedwriter)
   Active: active (running) since Sat 2018-09-22 06:19:16 UTC; 1 weeks 0 days ago
   CGroup: /system.slice/feedwriter.service
           └─1833 /usr/bin/php -f /var/www/emoncms/scripts/feedwriter.php

Warning: Journal has been rotated since unit was started. Log output is incomplete or unavailable.

andrewhf01 · 23 October 2018 16:01

It’s been running for 31 days so far and the error has not occurred again and I have not needed to reboot etc (fingers crossed).
Thanks for your assistance and hopefully it does not reoccur anytime in the future.

andrewhf01 · 3 February 2019 08:05

How do I turn on the feedwriter log, because when I restart the feed writer and then get the status i’m informed that the log is:

Log is turned off

pi@emonpi(ro):~$ sudo systemctl status feedwriter
● feedwriter.service - LSB: feedwriter script daemon
Loaded: loaded (/etc/init.d/feedwriter)
Active: active (exited) since Thu 2018-12-20 08:27:37 UTC; 1 months 14 days ago
Process: 8484 ExecStop=/etc/init.d/feedwriter stop (code=exited, status=0/SUCCESS)
Process: 15768 ExecStart=/etc/init.d/feedwriter start (code=exited, status=0/SUCCESS)

Warning: Journal has been rotated since unit was started. Log output is incomplete or unavailable.
pi@emonpi(ro):~$ sudo systemctl restart feedwriter
pi@emonpi(ro):~$ sudo systemctl status feedwriter
● feedwriter.service - LSB: feedwriter script daemon
   Loaded: loaded (/etc/init.d/feedwriter)
   Active: active (running) since Sun 2019-02-03 07:53:41 UTC; 31s ago
  Process: 7078 ExecStop=/etc/init.d/feedwriter stop (code=exited, status=0/SUCCESS)
  Process: 7087 ExecStart=/etc/init.d/feedwriter start (code=exited, status=0/SUCCESS)
   CGroup: /system.slice/feedwriter.service
           └─7117 /usr/bin/php -f /var/www/emoncms/scripts/feedwriter.php

Feb 03 07:53:33 emonpi feedwriter[7087]: Log is turned off
Feb 03 07:53:33 emonpi feedwriter[7087]: Starting RPI
Feb 03 07:53:41 emonpi systemd[1]: Started LSB: feedwriter script daemon.

borpin · 3 February 2019 13:21

Hi @andrewhf01

It is most odd that Feedwriter keeps stopping for you.

I’d suggest a reboot may solve it in the short term.

However, there is a new feedwriter service about to be rolled out with the next stable build so I suggest that we revisit it once that has happened - shouldn’t be too long should it @TrystanLea?

andrewhf01 · 3 February 2019 16:58

I’ve tried rebooting several times but when I have rebooted I then lose all the data over the period of time that the error has occurred with the feedwriter. If I restart the feedwriter, as suggested further up this thread, then these data are not lost (which is some consolation).
Ok, I’ll wait until the new feedwriter is available.
Thanks,

IM35461 · 18 March 2019 19:30

I guess perhaps 20 days I started on a new microsd card with the latest release 9.8.8 beta and have never made any command line changes.

Guess, what… it stopped updating

I did not have much time and did not want to loose any more data (14hrs lost) so I just selected the reboot option.

I just took the screen dump which seem to report all the services were running…

Any ideas on a cause?

TrystanLea · 18 March 2019 20:58

O dear! and I can see you are running all the latest services using systemd with automatic restart. Perhaps it is related to the issue seen by @borpin, his stopped logging yesterday I think, and may be related to logs filling up - it may also be unconnected. @glyn.hudson has pushed a fix today to fix the log issue. You could try pulling that in with an update and seeing if the problem happens again?

IM35461 · 18 March 2019 21:25

I have applied the update and will see how it goes…

Many thanks.

IM35461 · 4 April 2019 19:59

Just went to check how my one was going and it no longer showed up on the local network. It was working this morning.

The units LCD was displaying WiFi signal level but the button did nothing so I assume a total crash (never had one before).

Unplugged the USB briefly and it is going again…

IM35461 · 22 April 2019 09:03

Failed again after 18 days

I have left it not logging should anyone want me to check something.

borpin · 22 April 2019 09:17

@IM35461, Can you do a

ls -lah /var/log

and post the output please?

Any rotated files (with a .X where X is a number) can be deleted.

If you use this command -

sudo truncate -s 0 /var/log/[the log file name]

on the largest files listed, you should find the data inputs restart.

I have outlined a process here to change the way the logging happens and then how to manage the log files. If you want to have a go at this the please start a new thread. It would be useful as it would add evidence to my theory.

IM35461 · 22 April 2019 09:21

Not sure I understand the second bit but the output from the first command is

emonSD-30Oct18
pi@emonpi:~ $ ls -lah /var/log
total 50M
drwxr-xr-x 11 root      root       520 Apr 22 06:25 .
drwxr-xr-x 12 root      root      4.0K Aug  2  2018 ..
drwxr-xr-x  2 root      adm        100 Apr  4 19:21 apache2
-rw-r-----  1 root      adm        40K Apr 22 09:17 auth.log
-rw-r--r--  1 root      root      6.3K Apr  4 19:21 boot.log
-rw-------  1 root      utmp         0 Apr 22 09:14 btmp
-rw-r-----  1 root      adm        25M Apr 22 09:18 daemon.log
-rw-r-----  1 root      adm       1.2K Apr  4 19:21 debug
-rw-rw-rw-  1 root      root       400 Apr 21 08:43 emoncms.log
drwxr-xr-x  2 pi        root        60 Apr  4 19:20 emonpilcd
-rw-r-----  1 root      adm        28K Apr 22 05:08 kern.log
drwxr-xr-x  2 pi        pi          80 Apr 22 06:25 logrotate
-rw-r-----  1 root      adm        24K Apr 22 09:18 messages
drwxr-xr-x  2 mosquitto mosquitto   60 Apr  4 19:21 mosquitto
-rw-rw-rw-  1 root      root         0 Apr  4 19:21 mqtt_input.log
drwxr-xr-x  2 mysql     adm         60 Apr  4 19:21 mysql
-rw-rw-rw-  1 root      root         0 Apr  4 19:21 mysql.log
-rw-rw-rw-  1 root      root         0 Apr  4 19:21 ntp_update.log
drwxr-xr-x  2 root      root        40 Apr  4 19:21 openhab
drwxr-xr-x  2 root      root        40 Apr  4 19:21 openhab2
drwxr-xr-x  2 redis     redis       60 Apr  4 19:21 redis
-rw-rw-rw-  1 root      root         0 Apr  4 19:21 service-runner.log
drwxr-xr-x  2 root      root        60 Apr  4 19:21 supervisor
-rw-r-----  1 root      adm        25M Apr 22 09:18 syslog
-rw-r-----  1 root      adm        190 Apr  4 19:21 user.log
-rw-rw-r--  1 root      utmp      1.9K Apr 22 09:17 wtmp
pi@emonpi:~ $

IM35461 · 22 April 2019 09:29

I entered also these two command on what seem to be the big files

pi@emonpi:~ $ sudo truncate -s 0 /var/log/syslog
pi@emonpi:~ $ sudo truncate -s 0 /var/log/daemon.log

Nothing seems to change…

EDIT

Swap file use seems to have been reduced

borpin · 22 April 2019 09:51

I should have said to do

df -h

It would have said /var/log/ was 100%. It should now say much less (in terms of %).

I’d like to see if the data logging can be restored without rebooting.

Could you post a new output of ls -lah /var/log/ please?

You showed the Feeds page before, are there entries in the Inputs page?

Is there anything in the log window of the Administration page?

[edit]All the data may be currently sitting in REDIS. What does this look like for you?

IM35461 · 22 April 2019 10:08

The first commands give

emonSD-30Oct18
pi@emonpi:~ $ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/root       3.9G  1.6G  2.2G  42% /
devtmpfs        484M     0  484M   0% /dev
tmpfs           489M     0  489M   0% /dev/shm
tmpfs           489M   50M  440M  11% /run
tmpfs           5.0M  4.0K  5.0M   1% /run/lock
tmpfs           489M     0  489M   0% /sys/fs/cgroup
tmpfs            30M   80K   30M   1% /tmp
tmpfs            50M  2.3M   48M   5% /var/log
tmpfs           1.0M     0  1.0M   0% /var/tmp
/dev/mmcblk0p1   43M   22M   21M  52% /boot
/dev/mmcblk0p3  3.5G  136M  3.2G   5% /home/pi/data
tmpfs            98M     0   98M   0% /run/user/1000
pi@emonpi:~ $
pi@emonpi:~ $ ls -lah /var/log/
total 2.3M
drwxr-xr-x 11 root      root       520 Apr 22 06:25 .
drwxr-xr-x 12 root      root      4.0K Aug  2  2018 ..
drwxr-xr-x  2 root      adm        100 Apr  4 19:21 apache2
-rw-r-----  1 root      adm        43K Apr 22 10:00 auth.log
-rw-r--r--  1 root      root      6.3K Apr  4 19:21 boot.log
-rw-------  1 root      utmp         0 Apr 22 09:14 btmp
-rw-r-----  1 root      adm       1.1M Apr 22 10:00 daemon.log
-rw-r-----  1 root      adm       1.2K Apr  4 19:21 debug
-rw-rw-rw-  1 root      root       400 Apr 21 08:43 emoncms.log
drwxr-xr-x  2 pi        root        60 Apr  4 19:20 emonpilcd
-rw-r-----  1 root      adm        28K Apr 22 05:08 kern.log
drwxr-xr-x  2 pi        pi          80 Apr 22 06:25 logrotate
-rw-r-----  1 root      adm        26K Apr 22 09:27 messages
drwxr-xr-x  2 mosquitto mosquitto   60 Apr  4 19:21 mosquitto
-rw-rw-rw-  1 root      root         0 Apr  4 19:21 mqtt_input.log
drwxr-xr-x  2 mysql     adm         60 Apr  4 19:21 mysql
-rw-rw-rw-  1 root      root         0 Apr  4 19:21 mysql.log
-rw-rw-rw-  1 root      root         0 Apr  4 19:21 ntp_update.log
drwxr-xr-x  2 root      root        40 Apr  4 19:21 openhab
drwxr-xr-x  2 root      root        40 Apr  4 19:21 openhab2
drwxr-xr-x  2 redis     redis       60 Apr  4 19:21 redis
-rw-rw-rw-  1 root      root         0 Apr  4 19:21 service-runner.log
drwxr-xr-x  2 root      root        60 Apr  4 19:21 supervisor
-rw-r-----  1 root      adm       1.2M Apr 22 10:00 syslog
-rw-r-----  1 root      adm        190 Apr  4 19:21 user.log
-rw-rw-r--  1 root      utmp      2.7K Apr 22 09:59 wtmp
pi@emonpi:~ $

Adminstration Log

LAST ENTRIES ON THE LOG FILE

Inputs
I have no extra nodes and on my old image it never created a phantom nodes.

borpin · 22 April 2019 10:14

Can you do a screen shot of the services status please?