/var/log still filling up

borpin · 12 May 2019 16:13

Continuing the discussion from EmonPi stopped updating inputs - /var/log/ full:

@glyn.hudson @TrystanLea The current solution does not work.

I created a fresh install from the SDImage, ran the updates, and this is the output from df -h; a couple of hours later.

tmpfs            50M   37M   14M  73% /var/log

pi@emonpi:/var/log $ ls -la /etc/cron.hourly/logrotate
lrwxrwxrwx 1 root root 46 May 12 10:56 /etc/cron.hourly/logrotate -> /var/www/html/emoncms/scripts/logger/logrotate

Using the command in that file from the command line results in…

pi@emonpi:/var/log $ /usr/sbin/logrotate -v -s /var/log/logrotate/logrotate.status /etc/logrotate.conf 2>&1 | tee /var/log/logrotate/logrotate.log
tee: /var/log/logrotate/logrotate.log: Permission denied
reading config file /etc/logrotate.conf
error: /etc/logrotate.conf:32 duplicate log entry for /var/log/auth.log
Reading state from file: /var/log/logrotate/logrotate.status
Allocating hash table for state file, size 64 entries
Creating new state
Creating new state
Creating new state
Creating new state
Creating new state
Creating new state
Creating new state
Creating new state
Creating new state
Creating new state
Creating new state
Creating new state
Creating new state
Creating new state
Creating new state
Creating new state
Creating new state
Creating new state
Creating new state
Creating new state
Creating new state
Creating new state
Creating new state
Creating new state
Creating new state

Handling 1 logs

rotating pattern: /var/log/*.log /var/log/*/*.log /var/log/syslog /var/log/messages /var/log/btmp /var/log/*  1048576 bytes (1 rotations)
empty log files are not rotated, old logs are removed
switching euid to 0 and egid to 0
error: error switching euid to 0 and egid to 0: Operation not permitted
error: error setting owner of /var/log/logrotate/logrotate.status.tmp to uid 0 and gid 0: Operation not permitted

Nothing is rotated.

Note the output from ls -la /var/log/

drwxr-xr-x 12 root      root           560 May 12 10:45 .
drwxr-xr-x 12 root      root          4096 Aug  2  2018 ..
drwxr-xr-x  2 root      adm            100 Oct 30  2018 apache2
drwxr-xr-x  2 root      root           100 May 12 10:45 apt
-rw-r-----  1 root      adm          55170 May 12 17:05 auth.log
-rw-r--r--  1 root      root          6590 May 12 10:48 boot.log
-rw-------  1 root      utmp             0 Oct 30  2018 btmp
-rw-r-----  1 root      adm       18902971 May 12 17:07 daemon.log
-rw-r-----  1 root      adm           1177 Oct 30  2018 debug
-rw-r--r--  1 root      root          3565 May 12 10:46 dpkg.log
-rw-rw-rw-  1 root      root         30928 May 12 10:56 emoncms.log
drwxr-xr-x  2 emonhub   root            60 Oct 30  2018 emonhub
drwxr-xr-x  2 pi        root            60 Oct 30  2018 emonpilcd
-rw-r-----  1 root      adm          74204 May 12 17:07 kern.log
drwxr-xr-x  2 pi        pi             100 May 12 17:06 logrotate
-rw-r-----  1 root      adm          54983 May 12 17:07 messages
drwxr-xr-x  2 mosquitto mosquitto       60 Oct 30  2018 mosquitto
-rw-rw-rw-  1 root      root             0 Oct 30  2018 mqtt_input.log
drwxr-xr-x  2 mysql     adm             60 Oct 30  2018 mysql
-rw-rw-rw-  1 root      root             0 Oct 30  2018 mysql.log
-rw-rw-rw-  1 root      root             0 Oct 30  2018 ntp_update.log
drwxr-xr-x  2 root      root            40 Oct 30  2018 openhab
drwxr-xr-x  2 redis     redis           60 Oct 30  2018 redis
-rw-rw-rw-  1 root      root             0 Oct 30  2018 service-runner.log
drwxr-xr-x  2 root      root            60 Oct 30  2018 supervisor
-rw-r-----  1 root      adm       18989391 May 12 17:07 syslog
-rw-r-----  1 root      adm            396 Oct 30  2018 user.log
-rw-rw-r--  1 root      utmp          1920 May 12 10:57 wtmp

You can see that the daemon.log and syslog are filling up, when I look, the emonhub entries are appearing in both.

I have pointed out on numerous occasions how to stop that happening (your original edits of rsyslog.conf do not seem to be included) and to improve the rotation before further improvement (Log2Ram) are introduced.

Users will continue to experience issues because of the log folder filling up until this is fixed.

pb66 · 12 May 2019 17:21

Yeah, I reported that 3weeks ago

emonSD next steps: filesystem & logrotate

The logrotaion cron seems to try and use a custom logrotate.status file that “should” apparently be in /var/log .
/usr/sbin/logrotate -v -s /var/log/logrotate/logrotate.status /etc/logrotate.conf 2>&1 | tee /var/log/logrotate/logrotate.log
Since /var/log is in tmpfs and any file there will be lost on reboot, this logrotate command fails due to not being able to find the nominated status file, thus logrotate doesn’t run.

That isn’t really the way to fix that issue, the emonhub service unit needs to be corrected to include the log file path in the command as suggested here

emonSD next steps: emonhub logging

EG emonhub already has a --logfile commandline option for this purpose so if the command in the service unit was
ExecStart=/usr/share/emonhub/emonhub.py --config-file=/home/pi/data/emonhub.conf --logfile=/var/log/emonhub/emonhub.log
instead of
ExecStart=/usr/share/emonhub/emonhub.py --config-file=/home/pi/data/emonhub.conf
normal logging would be resumed despite the move to a systemd unit.

it is far better to direct the log messages to the correct place using the correct tool at source than to let them fall into the wrong place (via stdout) and then try and fix after the event with various tweaks and edits. This correction doesn’t seem to have filtered through to the update script as yet, it will of course also need emonhub adding to the list of log folder creation and service restarts in rc.local until a fix to that issue is rolled out too (Grrrr!!!).

Another point re the logrotation is that the wildcard entries have been edited since the logrotation last worked and when logrotation gets fixed again and resumes operation, the /var/log/* will rotate the rotations as if they were separate logs. This is why vague wildcards cannot be used (unless olddir is used and the rotations are not held in the same folder as the logs to be rotated).

borpin · 12 May 2019 18:21

We’ll agree to disagree on that one . Suffice to say there are 2 viable alternatives for Glyn/Trystan to deploy.

Yes much better to put the logrotation.conf back to standard and use a drop in to manage the emonhub.log for now.

pb66 · 12 May 2019 18:53

No, that won’t work! This has been discussed already at some length, so I won’t go over it again here. But in the short term the old rotation needs restoring by fixing the command line, and then to remove the last wildcard that was slipped in with some other changes but hasn’t been operational yet.

See right at the bottom, yet another wildcard was added for some reason, along with the kept rotations being chopped from 2 to 1. this would be a breaking change if the rotation worked at all.

The old logrotation conf already had wildcards and emonhub.log was caught by the /var/log/*/*.log

# Include all logs including 1 level deep subdirectories
/var/log/*.log /var/log/*/*.log /var/log/syslog /var/log/messages /var/log/btmp /var/log/* {}

so an additional drop-in is not required whilst the existing OEM config is in play. in fact it specifically excludes the package installed “includes”.

The OEM config is needed in the short term as a “stock” logrotation conf is not suitable for use with logs in RAM ) whether it be the current image or L2R), a custom logrotation conf is needed for use with L2R to get the best results without building in more issues.

borpin · 12 May 2019 19:01

Yes it does!!

Precisely. Go back to the default, let the other packages with their own, carefully crafted rotation options just run and add to it with a carefully crafted one just for emon* log files.

Stock works fine for me (no persistence but I can live with that for now). Better than the log folder filling up!

Agreed but a different, long term, solution.

pb66 · 12 May 2019 19:06

I guess we’'ll just have to agree to disagree on this too then as I’m not going over it all again.

djh · 14 May 2019 11:43

Doesn’t it need to be run as root, not by the pi user?

djh · 14 May 2019 11:50

FWIW, on my system neither daemon.log nor syslog have grown huge. syslog is mainly reporting wi-fi checks and daemon.log is mostly about php session files. emoncms.log was what killed my system seemingly mostly with

2019-05-14 11:02:02.822|WARN|PHPTimeSeries.php|post_bulk_prepare() data in past, nothing saved. feedid=50 timestamp=1557820800 last=1557824400 value=1
which is emoncms’ reaction to one of my import scripts. I’m not clear why it thinks worth logging that.

pb66 · 14 May 2019 12:33

Indeed it should, I had some vague recollection that it was being run as pi not root but when I looked the other day I couldn’t find where that was the case. However I did find reference to the idea it wasn’t being run as root at some point due to these 2 lines

github.com

openenergymonitor/emonpi/blob/ac0846fa9b322fe3a04ec7bacb199beb4774e3fb/stretch/rc.local#L21-L21


      
          chown -R pi:pi /var/log/logrotate

Why would the logrotate log need to be pi:pi unless logrotate was being run as pi? and

Why do we need to switch to the root user mid rotate if it was already being run as root?

I have no idea why logrotate would need running as anything other than root from a cronjob.

When logrotation is working as I believe was intended by the emonSD mods, all logs are rotated at 1M and checked hourly. Depending on how updated your emonSD is (assuming it is an emonSD) you might have a daily check, which is obviously not enough, a lot can happen even in an hour let alone a day, or you might have a logrotate that fails altogether in which case the /var/log filling up is just inevitable, whether it be a hour, a day, a month or a year.

Even with the 1M “limit” a runaway error can cause mayhem as if the limit isn’t hit on one rotation eg 999K then a fault condition raises many messages, it might not last til the next check when the file size will be over 1M (it could be 30M by then!)

The issue with the emonhub.log being mis-directed since being changed to systemd has just brought this “/var/log error” error to the forefront, but it’s always been there and fixing emonhub logs and/or the daemon/syslogs etc will not “fix the issue”, that’s in the hands of a better logrotation strategy (and backed up by monit?). I have been banging on about this since the “rock solid gateway” days (pre-emonhub which itself pre-dates the emonPi and emonSD’s).

Also in connection with this and possibly your other post too (How to recover emontx inputs after log full) what has changed with the change to systemd or more specifically using stdout over proper logs, is that services tend to fall over when /var/log is full, previously the services would soldier on and the data was held in ram and recoverable once there was some room in /var/log, this seems to no longer be the case, now a “stop due to /var/log full” is apparently no longer recoverable without data loss.

it’s only worth logging for indepth debugging purposes and should have a “debug” or “info” loglevel, it should not be a “warning” level event.

borpin · 14 May 2019 13:17

That is always true with a ‘regular’ rotation mechanism unless you put in a monitoring solution like Monit as well (even with Log2Ram).

pb66 · 14 May 2019 13:50

Correct, (L2R plays no part in this), it’s all down to how much log info is produced, how much space is available and how it is rotated.

With the logrotate strategy I’ve suggested (using with L2R,)

the size available in ram is effectively increased significantly as the rotated logs are no longer held in ram.
with dedicated “sdcard (or L2R) optimised” logrotate configs for applications the limit can be set to suit each log, who really needs 1M of any non-OEM log held in RAM? Many logs can be rotated at 1K (for example) whilst emonhub/emoncms etc are larger.
do not use copytruncate so that rotating isn’t blocked by a full /var/log.
forced logrotate if overall size reaches limit.
increased size limit of /var/log
maybe even Increase frequency of checks if needed? (perhaps better than depending on monit)

With available size in ram vastly increased and size of logs severely decreased the chance of filling up is vastly reduced. Currently (eg take 10 minor system logs ie not emonhub/emoncms) 10x logs at a permissible 1M uses 10M (25%) of available ram, given the rotations are also of a similar size (can be smaller, but can also be much larger) that’s 50% of the 40M used on a healthy system. Reduce those to 1K with no rotated logs in ram, that’s just 10K of 40M (0.00025%), much more room for “incidents”! Plus if /var/log is managed better then there is no reason not to use a bigger limit (eg 100M?) as race conditions would be fewer and better managed so the 100M (vs 40M) is less likely to impact the rest of the OS, plus, when /var/log is filling up over 40M (but less than 100M) the OS can still function, albeit serving web pages at a lower speed or taking longer to process inputs as Redis isn’t populated etc etc But no crash!!!

It would be far better to reduce the ram available to the rest of the system to a plentiful 900M than resort to using swapfile or crashing out!

[edit] remember the 40M is only a max limit not a reservation of ram. if the logs are only 10M you still have 990M for other usage even if the partition has a 100M limit.

djh · 14 May 2019 21:28

Is that the sort of thing I should raise a bug for then?

edit: PS +1 for your suggested organization of logging.

pb66 · 14 May 2019 22:21

Actually, I was wrong. I misread the message, it should be a warning level (sorry!) What it is telling you is that the data you are trying to post, is earlier that the last saved data. The data posted has a timestamp of 1557820800 (Tuesday, 14 May 2019 08:00:00 UTC) which is 1 hour earlier than the current “last data” which has a timestamp of (Tuesday, 14 May 2019 09:00:00 UTC). You cannot post (or edit) data that is before the “last data” if you are using emoncms feedwriter (which the emonSD image does). Therefore this is a warning to the user that their data is not being recorded.

borpin · 15 May 2019 07:20

Original

As editied.

Thanks for editing my quote Paul.

It is not the rotation mechanism, it is the trigger for the rotation mechanism that creates a ‘regular’ check to see if rotation is required i.e. time based. Thus between the 2 checks there is always a possibility (however slight) that the logs will runaway.

So an irregular rotation trigger (actually it is still regular just checking more often) via something like Monit, reduces the chance of a log running away to the don’t need to worry about it level.

djh · 15 May 2019 13:30

The way weather stations send hourly observations via DataPoint is that they send the previous 24 values. This is vital if there is an interruption in service in order to avoid a gap in records. So yes, I could complicate my input program by first reading the observations already recorded by emoncms and not uploading them, but it seems to me to be simpler and safer all round to upload them all and have emoncms simply discard the ones it doesn’t need. So as far as I’m concerned it’s not an error and not something that needs warning about; it’s normal. It would be nice if I could tell emoncms that.

TrystanLea · 20 May 2019 13:41

Id like to go forward and fix this issue by reverting to using the in-built emonhub.log.

1. I would amend emonhub/service/emonhub.service at emon-pi · openenergymonitor/emonhub · GitHub to include the --logfile parameter

ExecStart=/usr/share/emonhub/emonhub.py --config-file=/home/pi/data/emonhub.conf --logfile=/var/log/emonhub/emonhub.log

2. I would update rc.local for the stretch image to setup the emonhub log directory emonpi/stretch/rc.local at master · openenergymonitor/emonpi · GitHub

3. I would update the emonhub update script to create the emonhub log directory to avoid the need for a system restart during the update emonpi/update/emonhub.sh at master · openenergymonitor/emonpi · GitHub

4. Update the emonhub log viewer UI to use /var/log/emonhub/emonhub.log

I think I would prefer to do this than just disabling the systemd service installation, it would also fix existing systems that have already run an update. Would you be happy if I proceed with the above @pb66 @borpin?

The rc.local mod is of course temporary, to be replaced with our non rc.local requirements as part of the new emonSD image & scripts in development.

Im not sure if I need to make any logrotate modifications yet to fix this on existing systems.

borpin · 20 May 2019 14:27

I am content with the approach.

Not to fix the log folder filling up as emonhub will rotate at 5M. What is does mean is that there is a maximum of 10M of logs which is not very long in time with MQTT publish messages.

Longer term, I’d like to see a solution that enables a defined set of days of logs but that would be compatible with emonhub handling the logging.

djh · 20 May 2019 14:52

Why not look at MQTT logs direct from the broker if they’re required? It’s very easy to just subscribe if desired, and the output can be stored in a file wherever you’d like. Emonhub publishes them as INFO so it’s easy enough to not see them there if required. Most of the time, logging MQTT messages is a waste, IMHO.

borpin · 20 May 2019 15:06

Most of the time maybe, but they helped sort out this issue (Emon MQTT data incomplete) as it showed emonhub was definitely sending them .

djh · 20 May 2019 16:28

Well, investigating an MQTT issue is clearly a time when you would want to look at MQTT messages, yes! But that’s not most of the time. And the OP did it using mosquitto_sub, not by using emonhub logs, IIUC.