How to recover emontx inputs after log full

So what loglevel are you using? it should be DEBUG for debugging or WARNING for a less verbose output, INFO never really got off the ground. IMO those example lines you gave us should be DEBUG not INFO as they are only good for in-depth debugging, connecting to the broker or subscribing to a topic is INFO, but printing every value of every publish, is DEBUG IMO. But INFO just hasn’t really been utilised much, in practice there are essentially 2 levels, DEBUG = verbose and WARNING = quiet.

Ok, therefore as we know emonhub is publishing, that would suggest the serial data is being passed via MQTT to the Mosquitto broker so that bits ok. (you can check that by setting the emonhub log level to DEBUG)

If you have other inputs functioning ok then feedwriter and core emoncms parts are apparently ok.

I would say the issue lays with the MQTT input to emoncms. It failed when the log was full as it couldn’t place it’s log messages, plus it’s possibly the only service you haven’t restarted. It was this I had in mind when I mentioned duplicated inputs, but the charateristics of the mqtt input crashing due to a full log, may differ to what we have previously seen.

there is information to be found, it’s just that the changes from init.d to systemd, mqtt_input to emoncms_mqtt, the move to emoncms.log for the mqtt, feedwriter and service-runner etc etc all happening at different times but in rapid succession really doesn’t help us know what you might have and where to find stuff. Plus, at the very core of your issue is the fact the logging stopped, that alone may explain the absence of some logs, eg the emoncms mqtt input.

what does

sudo systemctl status emoncms_mqtt mqtt_input

return?

and also

ls -la /var/log

How do I find out? And how do I do that? I found Troubleshooting — OpenEnergyMonitor 0.0.1 documentation but it doesn’t say anything.

There is no emoncms_input service.
mqtt_input is running. I just restarted it in case. No difference - no wait, I spoke too soon. Everything’s updating now.

$ systemctl status mqtt_input
● mqtt_input.service - Emoncms MQTT Input Script
   Loaded: loaded (/etc/systemd/system/mqtt_input.service; enabled; vendor preset: enabled)
   Active: active (running) since Wed 2019-05-15 16:06:15 UTC; 3min 18s ago
     Docs: https://github.com/emoncms/emoncms/blob/master/docs/RaspberryPi/MQTT.md
 Main PID: 9800 (php)
   CGroup: /system.slice/mqtt_input.service
           └─9800 /usr/bin/php /var/www/emoncms/scripts/phpmqtt_input.php

May 15 16:06:15 emonpi systemd[1]: Started Emoncms MQTT Input Script.


$ ls -la /var/log
total 2336
drwxr-xr-x 11 root      root          920 May 15 06:25 .
drwxr-xr-x 12 root      root         4096 Aug  2  2018 ..
drwxr-xr-x  2 root      adm           140 May 15 06:25 apache2
-rw-r-----  1 root      adm        422696 May 15 16:06 auth.log
-rw-r-----  1 root      adm        414199 Apr 21 06:25 auth.log.1
-rw-r-----  1 root      adm         28790 Apr 15 06:25 auth.log.2.gz
-rw-r-----  1 root      adm         22215 Apr  7 06:25 auth.log.3.gz
-rw-r-----  1 root      adm         29276 Apr  1 06:25 auth.log.4.gz
-rw-r--r--  1 root      root        11201 Mar 16 09:23 boot.log
-rw-rw----  1 root      utmp            0 May  1 06:25 btmp
-rw-rw----  1 root      utmp            0 Apr  1 06:25 btmp.1
-rw-r-----  1 root      adm         71981 May 15 16:06 daemon.log
-rw-r-----  1 root      adm         61503 Apr 21 06:25 daemon.log.1
-rw-r-----  1 root      adm          5388 Apr 15 06:25 daemon.log.2.gz
-rw-r-----  1 root      adm          4168 Apr  7 06:25 daemon.log.3.gz
-rw-r-----  1 root      adm          5452 Apr  1 06:25 daemon.log.4.gz
-rw-r-----  1 root      adm             0 Mar 17 06:25 debug
-rw-r-----  1 root      adm          1177 Mar 16 09:22 debug.1
-rw-rw-rw-  1 root      root      1107675 May 15 16:06 emoncms.log
drwxr-xr-x  2 emonhub   root           80 May 14 19:57 emonhub
drwxr-xr-x  2 pi        root           60 Mar 16 09:22 emonpilcd
-rw-r-----  1 root      adm             0 May 13 13:44 kern.log
-rw-r-----  1 root      adm           109 Mar 19 17:37 kern.log.1
-rw-r-----  1 root      adm          7046 Mar 16 09:23 kern.log.2.gz
drwxr-xr-x  2 pi        pi             40 Mar 16 09:23 logrotate
-rw-r-----  1 root      adm          5366 May 15 06:25 messages
-rw-r-----  1 root      adm          1078 Apr 21 06:25 messages.1
-rw-r-----  1 root      adm           191 Apr 15 06:25 messages.2.gz
-rw-r-----  1 root      adm           180 Apr  7 06:25 messages.3.gz
-rw-r-----  1 root      adm           194 Apr  1 06:25 messages.4.gz
drwxr-xr-x  2 mosquitto mosquitto      60 Mar 16 09:23 mosquitto
-rw-rw-rw-  1 root      root            0 Mar 16 09:23 mqtt_input.log
drwxr-xr-x  2 mysql     adm            80 May 15 06:25 mysql
-rw-rw-rw-  1 root      root            0 Mar 16 09:23 mysql.log
-rw-rw-rw-  1 root      root            0 Mar 16 09:23 ntp_update.log
drwxr-xr-x  2 root      root           40 Mar 16 09:23 openhab
drwxr-xr-x  2 redis     redis         100 Mar 24 06:25 redis
-rw-rw-rw-  1 root      root            0 Mar 16 09:23 service-runner.log
drwxr-xr-x  2 root      root           60 Mar 16 09:23 supervisor
-rw-r-----  1 root      adm         20760 May 15 16:06 syslog
-rw-r-----  1 root      adm         87046 May 15 06:25 syslog.1
-rw-r-----  1 root      adm          4207 Apr 25 06:25 syslog.2.gz
-rw-r-----  1 root      adm             0 Mar 17 06:25 user.log
-rw-r-----  1 root      adm           198 Mar 16 09:22 user.log.1
-rw-rw-r--  1 root      utmp            0 May  1 06:25 wtmp
-rw-rw-r--  1 root      utmp            0 Apr  1 06:25 wtmp.1

So something in mqtt_input screws up when the log fills and doesn’t recover aftwerwards, but apparently doesn’t log anything to say it’s failed.

If you have the emoncms/config module installed (by default on emonSD) you can check the “emonhub” page, it’s near the top of the emonhub.conf.

Alternatively (I think) it’s also found at /home/pi/data/emonhub.conf and/or /etc/emonhub/emonhub.conf, the latter is the generic location for it, the former is where it was put due to the RO OS (which the latest image isn’t). It should have been symlinked so that the generic path was still valid, but I don’t think that was necessarily the case.

That is pretty much the case for all the OEM services (except pre-systemd emonhub)

Yeah! That’s 'bout the size of it, the issue is there is no room for new log messages, so it cannot leave that error message (even if it existed in code). The issue is that the services are not robust enough to manage that situation. emonhub (with it’s own log) is the exception to that rule, I have tested this thoroughly and emonhub doesn’t fail due to a full /var/log (or at least if it has it’s own direct log it doesn’t).

The content of your /var/log folder raises another question over logrotation. If the emonSD logrotation was in play, those .2.gz, .3.gz and .4.gz files wouldn’t be there. So I assume you’re on a daily “standard” logrotation. I do not recommend updating at the moment as you will not only pull in the emonhub service changes, but you will most likely lose the logrotation altogether as the “hourly fix” doesn’t seem to work at all! can you please post the output of the following

ls -la /etc/logrotate.conf
cat /etc/logrotate.conf
cat /etc/cron.{daily,hourly}/logrotate

and

ls -la /var/log/logrotate
ls -la /var/lib/logrotate/status
cat /var/lib/logrotate/status 

Thanks. I’d forgotten that page was there. So the answer was INFO. I think I might hack those warnings about old data to be INFO and then turn the logging level down.

Yes, the latter. It continues running after the log space becomes available again so it could log a warning at that point. But then if it recognized that situation, it should also be smart enough to fix whatever actually broke and stopped it working!

It occurred to me that it might not be mqtt_input that really has a problem. It might be mosquitto that is broken, plus the combination being dumb enough to not spot that the input was disconnected when I restarted mosquitto and not reconnect. Either way, it’s pretty obvious there’s some poor logic in there somewhere.

I don’t think I’ve adjusted log rotation at all.

$ ls -la /etc/logrotate.conf 
-rw-r--r-- 1 root root 599 May  5  2015 /etc/logrotate.conf

$ cat /etc/logrotate.conf
# see "man logrotate" for details
# rotate log files weekly
weekly

# keep 4 weeks worth of backlogs
rotate 4

# create new (empty) log files after rotating old ones
create

# uncomment this if you want your log files compressed
#compress

# packages drop log rotation information into this directory
include /etc/logrotate.d

# no packages own wtmp, or btmp -- we'll rotate them here
/var/log/wtmp {
    missingok
    monthly
    create 0664 root utmp
    rotate 1
}

/var/log/btmp {
    missingok
    monthly
    create 0660 root utmp
    rotate 1
}

# system-specific logs may be configured here

$ cat /etc/cron.{daily,hourly}/logrotate
#!/bin/sh

test -x /usr/sbin/logrotate || exit 0
/usr/sbin/logrotate /etc/logrotate.conf
cat: /etc/cron.hourly/logrotate: No such file or directory

There is no /var/log/logrotate

-rw-r--r-- 1 root root 1279 May 15 06:25 status

$ cat /var/lib/logrotate/status 
logrotate state -- version 2
"/var/log/syslog" 2019-5-15-6:25:1
"/var/log/dpkg.log" 2018-10-5-6:25:1
"/var/log/mosquitto/mosquitto.log" 2018-10-30-6:25:1
"/var/log/auth.log" 2019-5-12-6:25:2
"/var/log/apt/term.log" 2018-10-5-6:25:1
"/var/log/mysql/mysql-slow.log" 2018-8-7-6:0:0
"/var/log/apt/history.log" 2018-10-5-6:25:1
"/var/log/mysql/error.log" 2019-5-15-6:25:1
"/var/log/alternatives.log" 2018-8-7-6:0:0
"/var/log/debug" 2019-3-17-6:25:1
"/var/log/mail.log" 2018-8-7-6:0:0
"/var/log/kern.log" 2019-3-24-6:25:2
"/var/log/aptitude" 2018-8-7-6:0:0
"/var/log/redis/redis-server.log" 2019-3-24-6:25:2
"/var/log/apache2/access.log" 2019-5-15-6:25:1
"/var/log/ufw.log" 2018-9-19-6:0:0
"/var/log/wtmp" 2019-5-1-6:25:2
"/var/log/daemon.log" 2019-5-12-6:25:2
"/var/log/mail.warn" 2018-8-7-6:0:0
"/var/log/btmp" 2019-5-1-6:25:2
"/var/log/lpr.log" 2018-8-7-6:0:0
"/var/log/mail.err" 2018-8-7-6:0:0
"/var/log/mysql/mysql.log" 2018-8-7-6:0:0
"/var/log/lighttpd/error.log" 2018-8-12-6:25:2
"/var/log/user.log" 2019-3-17-6:25:1
"/var/log/mail.info" 2018-8-7-6:0:0
"/var/log/apache2/other_vhosts_access.log" 2018-8-7-6:0:0
"/var/log/apache2/error.log" 2019-1-19-6:25:1
"/var/log/cron.log" 2018-8-7-6:0:0
"/var/log/messages" 2019-5-12-6:25:2
"/var/log/mysql/mariadb-slow.log" 2018-8-7-6:0:0

No, I’m not planning on updating, thanks :wink:

It’s not as simple as that. I just restarted mosquitto again and the input automatically reconnected.

Sorry to say I think that I worked through this with someone else - should have remembered earlier :frowning:

yep Reboot Required Every 20 Days Or So To Restore Logging - #45 by IM35461.

Don’t know why I didn’t remember…

I think you’ve got some crossed wires there. The config on the emonhub page is purely for emonhub. Those “old data” messages you refer to are from emoncms not emonhub and that loglevel is set in emoncms/settings.php and not accesible via the emoncms gui.

I’m pretty confident it is, we’ve had issues before, we also saw that the broker was ok as emonhub was able to publish to it, so the issue lay inbetween the working broker and the emoncms inputs, of which, the http ones were updating ok.

That’s good to hear, thanks for sharing that. I’m trying to get a handle on what it and isn’t going on with the emonSD without a valid build guide and it’s a very fluid moving target as the updates and updater are frequently changing.

From your outputs

You are using the original package installed logrotate.conf (dated 2015), this and some other observations confirms what I suspected, that the whole logrotation modification was missed on the oct 2018 image.

So this means you are getting daily checks and weekly rotations not based on size unless overriidden by a package specific conf in /etc/logrotate.d/

although there may not be a /var/log/logrotate/logrotate.status file, the logrotate folder is there in your /var/log printout above, it’s ownership is pi:pi, that tells me it was (most probably) created by /etc/rc.local.

I’ve just been looking at what the logrotation needs for a interim fix and this info is useful, thanks.

Hopefully that won’t always be the case, but either way, I can let you know what I find and recommend with logrotate so you can decide whether to manual adjust or update.

No it looks pretty standard e.g. auth.log rotating weekly.

Pretty sure this is an ‘install’ rather than an SDImage IIRC. In which case the update scripts are not used.

What is the contents of /etc/logrotate.d/. Probably a load of specific configs for various things.

I think if you move the logrotate cron trigger to hourly from daily, and then create the 2 files I list in my issue Logrotate setup does not prevent /var/log filling up · Issue #103 · openenergymonitor/emonpi · GitHub you should get a stable system (check the directories in the post rotate command).

YMMV

There are certainly similarities (i.e. logs filling up and stopping the system) but it doesn’t seem to be the same situation as me (different logs, different services etc)

You’re right that I hadn’t realized/remembered they were two different logs. It seems par for the course that the emoncms webUI would provide screen editing for an alien log (emonhub) but not for its own log :roll_eyes:

emoncms.log is set to WARN level, so I’ve hacked PHPTimeSeries.php to make those messages INFO instead of WARN. I suppose I need to restart that somehow for it to take effect? Also, emoncms.log doesn’t seem to get rotated. Where is that done? (possibly a multi-valued answer :grin: )

You’re quite right. Apologies for my mis-statement.

Ifyou’re talking about my system, it is a shop-bought pre-installed SD card.

$ ls /etc/logrotate.d/
apache2  apt  aptitude  dpkg  lighttpd  mosquitto  mysql-server  redis-server  rsyslog  ufw

Probably not, emoncms is not a long running service, the script(s) are called for each request.

No apologies due, I wasn’t aware you had mis-stated anything! That was just one of the things I noted under “From your outputs”

Well the WARNings are still being logged, so something is not unloaded between requests or whatever.

Mmmmmh! That’s odd, but I guess nothing surprises me much these days.

Where did you do the editing, just to confirm we’re on the same page?

I wasn’t aware of any php caching (server side), I have not needed to restart following code edits before now (aside from browser refresh, which shouldn’t apply here). You could try a sudo systemctl reload apache2, that will do a “graceful restart” which IIRC isn’t actually a restart, it just reloads everything, but AFAIA apache2 doesn’t cache and php caching isn’t installed (but php is not my strong point), perhaps there’s just a shared copy of a included script lingering in ram?

/var/www/emoncms/Modules/feed/engine/PHPTimeSeries.php

This is the actual edit:

// 2019-05-16 djh  Changed 'warn' to 'info' to stop nuisance logging of DataPoint uploads
$this->log->info("post_bulk_prepare() data in past, nothing saved.  feedid=$feedid timestamp=$timestamp last=".$lastvalue_static_cache[$feedid]['time']." value=$value");

It does a graceful restart every morning as part of the logrotate. Graceful just means that apache allows its threads to run to completion (i.e. finish servicing a request) before terminating and restarting them. It rereads its configuration as well. So it’s a method of completely restarting but with no visible effects on clients.

May 18 06:25:03 emonpi systemd[1]: Reloading The Apache HTTP Server.
May 18 06:25:03 emonpi apachectl[20841]: AH00558: apache2: Could not reliably determine the server's fully qualified domain name, using 127.0.1.
May 18 06:25:03 emonpi systemd[1]: Reloaded The Apache HTTP Server.

Sadly the WARNings are still being logged. I’m out of my depth on PHP as well.

2019-05-18 11:01:35.817|WARN|PHPTimeSeries.php|post_bulk_prepare() data in past, nothing saved.  feedid=50 timestamp=1558166400 last=1558170000 value=1
2019-05-18 11:01:35.818|WARN|PHPTimeSeries.php|post_bulk_prepare() data in past, nothing saved.  feedid=50 timestamp=1558170000 last=1558170000 value=1

Solved it :slight_smile: Entirely by luck.

I reread this thread and had just been looking at the apache log and realized as I reread my explanation about graceful restarts that apache wouldn’t actually restart until all the threads finalised their requests. But emoncms keeps active requests with continual log entries showing it updating lists etc. So I shut down the browser window (on my openSUSE PC, note) that was displaying emoncms pages.

And lo and behold the WARN messages on the emonpi stopped!

So, obvious with hindsight, in order to unload PHP you need to stop accessing the web pages it is serving. Or I suppose a hard reset of apache would do the job.

2 Likes

I spoke too soon. They’re still coming. :frowning:

Oh dear! That’s annoying!

Have you tried restarting the feedwriter service?

Not yet. I restarted emonhub after posting my last message, but that doesn’t seem to have fixed it. I’ll try feedwriter now.

Restarting feedwriter seems to have done the trick. :fingers-crossed: