How to recover emontx inputs after log full

My emonBase stopped recording data from my emonTxen, presumably because /var/log filled up. They’re currently showing no data for 23 hours.

I’ve deleted some data from the beginning of emoncms.log, which was huge, and deleted some old apache2 and syslog files, so there is now space in the directory but the emonTx inputs are still stale. Feedwriter had exited, so I’ve restarted that and it says it’s caught up.

I’m pretty much at a loss where to look next. Is there documentation of how the input data is processed, or a troubleshooting guide? Alternatively, can anybody suggest what I need to look at next?

edit: a bit more context. The feeds that are imported from elsewhere by scripts (weather and solar readings) are still updating normally, hence why I think it’s something emoncms/emonhub specific.

The emonhub service status was a log that was saying No space left on device up until the time I made some space and nothing after that. So I restarted the service and now the log is up-to-date with startup information:

May 14 11:32:49 emonpi sudo[32128]: pam_unix(sudo:session): session opened for user root by (uid=0)
May 14 11:32:49 emonpi sudo[32128]: pam_unix(sudo:session): session closed for user root
May 14 11:32:49 emonpi sudo[32138]:     root : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/bin/chown -R emonhub /var/log/emonhub
May 14 11:32:49 emonpi sudo[32138]: pam_unix(sudo:session): session opened for user root by (uid=0)
May 14 11:32:49 emonpi sudo[32138]: pam_unix(sudo:session): session closed for user root
May 14 11:32:49 emonpi sudo[32147]:     root : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/bin/chown -R emonhub /var/log/emonhub
May 14 11:32:49 emonpi sudo[32147]: pam_unix(sudo:session): session opened for user root by (uid=0)
May 14 11:32:49 emonpi sudo[32147]: pam_unix(sudo:session): session closed for user root
May 14 11:32:49 emonpi emonhub[32118]: Starting OpenEnergyMonitor emonHub: emonhub has been started ok.
May 14 11:32:49 emonpi systemd[1]: Started LSB: Start/stop emonHub.

but my emonTxen are still not updating.

edit: Oh and all the services on the admin page are shown as running and green except for emonPiLCD but then I don’t have an emonPi.

Are your emonTx’s posting via RFM (rfm2pi receiver) or via WiFi (huzzah modules)?

If it’s the former, check the emonhub logs, they maybe at /var/log/emonhub/emonhub.log if you are still running the init.d script version. Or maybe in journald or systemd or daemon.log if using systemd
service depending on where your setup is at currently update wise!

Whether you can see this log info in emoncms depends on what emoncms version you have and whether it ties in with the emonhub service type (I know it’s a mess!), but to be fair, I think if you are updating via the emonpi updater, the changes should happen simultaneously, but IIRC you aren’t using a Raspbian based OS (correct? or am i getting crossed wires?) so I’m not sure how that translates to your set up.

[edit] also worth checking your inputs in emoncms inputs page are not duplicated, if they are see duplicated inputs · Issue #1198 · emoncms/emoncms · GitHub and the linked threads for info on deleting the duplicates to restore data logging. .

To reduce the size of big log files use truncate - e.g.

sudo truncate -s 0 /var/log/daemon.log

just in case the service does not recreate the file automatically. If you just delete the file it can make things worse.

What does df -h tell you?

You can use journalctl commands to view the entries being passed to rsyslog by systemd.

Are you using /var/log on tmpfs?

RF

emonhub.log is just full of MQTT publishing messages (incidentally, can I just disable MQTT since I don’t use it?)

systemctl status emonhub just shows the same as I posted above. Nothing new since.

My logging system is a standard emonBase. I was trying to get emoncms running on my openSUSE desktop so I could do some development, but I gave that up when I discovered how unportable it is.

My inputs are fine (there was a mysterious Node 3 that appeared out of nowhere, but I deleted that). The red light on the emonTx is flashing regularly.

Thanks. I didn’t want to remove all the log entries so I deleted the first part of the files with vi. That’s always worked for me in the past.

Well, it told me that /var/log was full and now it tells me that it isn’t. Is there anything else I should be looking for?

Yes.

So you do have a /var/log/emonhub/emonhub.log (ie it’s not via journald)?

If your emonhub.log is showing MQTT publishing, it must also show the packets arriving via serial from the rfm2pi as emonhub can only publish what it receives, so I assume all is ok in emonhub, ie the rfm packets must be arriving if the’re being publised, but you would need to confirm it’s the right traffic to be sure. Do you recognise the emonTx data passing through emonhub?

Are you saying “my inputs are updating just fine” or “my inputs are fine as there are no duplicates” ?

If the inputs are updating then the issue is not emonhub or MQTT/Mosquitto, it’s within emoncms. That can only be the feedwriter IF the new data is timestamped later than the last recorded update (ref your other thread). What does the feedwriter line on the admin page say? usually 60s interval and ??datapoints buffered to write, or words to that effect.

MQTT is how emonhub passes data to emoncms on the emonSD so unless you have set up your local emoncms as a target for an additional http interfacer in emonhub, you are dependent on MQTT for local data.

This used to get me confused as well. I think this is right for a default install. Personally I still use the HTTP interface locally.

Untitled%20Diagram%20(2)

If the Inputs page shows that the data is fresh, but the data on the Feeds page is not being refreshed, then it might be feedwriter (assuming you use it). Even though feedwriter shows it is running I have found before that it isn’t actually doing anything.

sudo systemctl restart feedwriter.service

If the data is still in Redis (so you have not rebooted) it should actually catch up.

As would I if I used the emonSD. I understand publishing MQTT data for the benefit of other services is handy, but just because that data is there for that purpose, it is not (IMO) essential that is the route used to emoncms. It adds another step (MQTT broker) between emonhub and emoncms which adds both complexity and as we have seen so many times, potential for failures (mqtt broker or emoncms mqtt inputs). The appeal of the MQTT was AFAIR the inputs labels, but that came at the cost of timestamps being set closer to source. Now with the device module being promoted to auto-create processing and feeds, the input naming is less of a attraction. The timestamps and ability to buffer data (eg to pause for a backup creation etc) wins hands down for me, let alone the reliability and accuracy benefits.

1 Like

Totally agree. One less thing to go wrong. If I want the data for another purpose, I’ll publish it by MQTT from the input process.

BTW, what route do you take for your Grafana feeds? Direct from emonhub or via emoncms? (way OT now) :smile:.

Yes, that’s correct.

No, the file contains MQTT entries and only MQTT entries. Like these:

2019-05-15 09:24:52,637 INFO     MQTT       Publishing: emonhub/rx/8/values 2854,3464,-2823,0,252.22,300,300,300,300,300,300,874001,-32
2019-05-15 09:24:56,582 INFO     MQTT       Publishing: emonhub/rx/7/values 595,0,0,0,248.35,15.7,300,300,300,300,300,621608,-60

My inputs are NOT updating. There are the correct number of them, except there was the addition of ‘Node 3’. The inputs from the emonTxen currently show an Updated value of 45 hrs. My imported data OTOH is updating regularly as normal.

feedwriter **Active** Running - sleep 60s 0 feed points pending write
All the other srevices say ‘Active Running’

Ah, OK.

No, sorry, I seem to have confused both you and Paul. My emonTx inputs are NOT updating. feedwriter had stopped when the logs were full so I restarted it after clearing some space. It claims to be fine now.

OK, have you tried restarting emonhub service?

[edit]
It might be mosquitto, try restarting that service as well.

Yes

Just done. No difference.

I’m a bit concerned that there doesn’t seem to be anywhere to look for information to diagnose this problem. Anything to look for in some log or other? Or anything on the admin page?

I’m getting to the point where I’m concerned about the loss of logged data, so I’m beginning to think about rebooting the system to see if that fixes the problem. If I do that we will lose all evidence and be no nearer finding the cause of the problem. Does @TrystanLea or anybody have any other ideas to try first to diagnose this?

So what loglevel are you using? it should be DEBUG for debugging or WARNING for a less verbose output, INFO never really got off the ground. IMO those example lines you gave us should be DEBUG not INFO as they are only good for in-depth debugging, connecting to the broker or subscribing to a topic is INFO, but printing every value of every publish, is DEBUG IMO. But INFO just hasn’t really been utilised much, in practice there are essentially 2 levels, DEBUG = verbose and WARNING = quiet.

Ok, therefore as we know emonhub is publishing, that would suggest the serial data is being passed via MQTT to the Mosquitto broker so that bits ok. (you can check that by setting the emonhub log level to DEBUG)

If you have other inputs functioning ok then feedwriter and core emoncms parts are apparently ok.

I would say the issue lays with the MQTT input to emoncms. It failed when the log was full as it couldn’t place it’s log messages, plus it’s possibly the only service you haven’t restarted. It was this I had in mind when I mentioned duplicated inputs, but the charateristics of the mqtt input crashing due to a full log, may differ to what we have previously seen.

there is information to be found, it’s just that the changes from init.d to systemd, mqtt_input to emoncms_mqtt, the move to emoncms.log for the mqtt, feedwriter and service-runner etc etc all happening at different times but in rapid succession really doesn’t help us know what you might have and where to find stuff. Plus, at the very core of your issue is the fact the logging stopped, that alone may explain the absence of some logs, eg the emoncms mqtt input.

what does

sudo systemctl status emoncms_mqtt mqtt_input

return?

and also

ls -la /var/log

How do I find out? And how do I do that? I found Troubleshooting — OpenEnergyMonitor 0.0.1 documentation but it doesn’t say anything.

There is no emoncms_input service.
mqtt_input is running. I just restarted it in case. No difference - no wait, I spoke too soon. Everything’s updating now.

$ systemctl status mqtt_input
● mqtt_input.service - Emoncms MQTT Input Script
   Loaded: loaded (/etc/systemd/system/mqtt_input.service; enabled; vendor preset: enabled)
   Active: active (running) since Wed 2019-05-15 16:06:15 UTC; 3min 18s ago
     Docs: https://github.com/emoncms/emoncms/blob/master/docs/RaspberryPi/MQTT.md
 Main PID: 9800 (php)
   CGroup: /system.slice/mqtt_input.service
           └─9800 /usr/bin/php /var/www/emoncms/scripts/phpmqtt_input.php

May 15 16:06:15 emonpi systemd[1]: Started Emoncms MQTT Input Script.


$ ls -la /var/log
total 2336
drwxr-xr-x 11 root      root          920 May 15 06:25 .
drwxr-xr-x 12 root      root         4096 Aug  2  2018 ..
drwxr-xr-x  2 root      adm           140 May 15 06:25 apache2
-rw-r-----  1 root      adm        422696 May 15 16:06 auth.log
-rw-r-----  1 root      adm        414199 Apr 21 06:25 auth.log.1
-rw-r-----  1 root      adm         28790 Apr 15 06:25 auth.log.2.gz
-rw-r-----  1 root      adm         22215 Apr  7 06:25 auth.log.3.gz
-rw-r-----  1 root      adm         29276 Apr  1 06:25 auth.log.4.gz
-rw-r--r--  1 root      root        11201 Mar 16 09:23 boot.log
-rw-rw----  1 root      utmp            0 May  1 06:25 btmp
-rw-rw----  1 root      utmp            0 Apr  1 06:25 btmp.1
-rw-r-----  1 root      adm         71981 May 15 16:06 daemon.log
-rw-r-----  1 root      adm         61503 Apr 21 06:25 daemon.log.1
-rw-r-----  1 root      adm          5388 Apr 15 06:25 daemon.log.2.gz
-rw-r-----  1 root      adm          4168 Apr  7 06:25 daemon.log.3.gz
-rw-r-----  1 root      adm          5452 Apr  1 06:25 daemon.log.4.gz
-rw-r-----  1 root      adm             0 Mar 17 06:25 debug
-rw-r-----  1 root      adm          1177 Mar 16 09:22 debug.1
-rw-rw-rw-  1 root      root      1107675 May 15 16:06 emoncms.log
drwxr-xr-x  2 emonhub   root           80 May 14 19:57 emonhub
drwxr-xr-x  2 pi        root           60 Mar 16 09:22 emonpilcd
-rw-r-----  1 root      adm             0 May 13 13:44 kern.log
-rw-r-----  1 root      adm           109 Mar 19 17:37 kern.log.1
-rw-r-----  1 root      adm          7046 Mar 16 09:23 kern.log.2.gz
drwxr-xr-x  2 pi        pi             40 Mar 16 09:23 logrotate
-rw-r-----  1 root      adm          5366 May 15 06:25 messages
-rw-r-----  1 root      adm          1078 Apr 21 06:25 messages.1
-rw-r-----  1 root      adm           191 Apr 15 06:25 messages.2.gz
-rw-r-----  1 root      adm           180 Apr  7 06:25 messages.3.gz
-rw-r-----  1 root      adm           194 Apr  1 06:25 messages.4.gz
drwxr-xr-x  2 mosquitto mosquitto      60 Mar 16 09:23 mosquitto
-rw-rw-rw-  1 root      root            0 Mar 16 09:23 mqtt_input.log
drwxr-xr-x  2 mysql     adm            80 May 15 06:25 mysql
-rw-rw-rw-  1 root      root            0 Mar 16 09:23 mysql.log
-rw-rw-rw-  1 root      root            0 Mar 16 09:23 ntp_update.log
drwxr-xr-x  2 root      root           40 Mar 16 09:23 openhab
drwxr-xr-x  2 redis     redis         100 Mar 24 06:25 redis
-rw-rw-rw-  1 root      root            0 Mar 16 09:23 service-runner.log
drwxr-xr-x  2 root      root           60 Mar 16 09:23 supervisor
-rw-r-----  1 root      adm         20760 May 15 16:06 syslog
-rw-r-----  1 root      adm         87046 May 15 06:25 syslog.1
-rw-r-----  1 root      adm          4207 Apr 25 06:25 syslog.2.gz
-rw-r-----  1 root      adm             0 Mar 17 06:25 user.log
-rw-r-----  1 root      adm           198 Mar 16 09:22 user.log.1
-rw-rw-r--  1 root      utmp            0 May  1 06:25 wtmp
-rw-rw-r--  1 root      utmp            0 Apr  1 06:25 wtmp.1

So something in mqtt_input screws up when the log fills and doesn’t recover aftwerwards, but apparently doesn’t log anything to say it’s failed.

If you have the emoncms/config module installed (by default on emonSD) you can check the “emonhub” page, it’s near the top of the emonhub.conf.

Alternatively (I think) it’s also found at /home/pi/data/emonhub.conf and/or /etc/emonhub/emonhub.conf, the latter is the generic location for it, the former is where it was put due to the RO OS (which the latest image isn’t). It should have been symlinked so that the generic path was still valid, but I don’t think that was necessarily the case.

That is pretty much the case for all the OEM services (except pre-systemd emonhub)

Yeah! That’s 'bout the size of it, the issue is there is no room for new log messages, so it cannot leave that error message (even if it existed in code). The issue is that the services are not robust enough to manage that situation. emonhub (with it’s own log) is the exception to that rule, I have tested this thoroughly and emonhub doesn’t fail due to a full /var/log (or at least if it has it’s own direct log it doesn’t).

The content of your /var/log folder raises another question over logrotation. If the emonSD logrotation was in play, those .2.gz, .3.gz and .4.gz files wouldn’t be there. So I assume you’re on a daily “standard” logrotation. I do not recommend updating at the moment as you will not only pull in the emonhub service changes, but you will most likely lose the logrotation altogether as the “hourly fix” doesn’t seem to work at all! can you please post the output of the following

ls -la /etc/logrotate.conf
cat /etc/logrotate.conf
cat /etc/cron.{daily,hourly}/logrotate

and

ls -la /var/log/logrotate
ls -la /var/lib/logrotate/status
cat /var/lib/logrotate/status 

Thanks. I’d forgotten that page was there. So the answer was INFO. I think I might hack those warnings about old data to be INFO and then turn the logging level down.

Yes, the latter. It continues running after the log space becomes available again so it could log a warning at that point. But then if it recognized that situation, it should also be smart enough to fix whatever actually broke and stopped it working!

It occurred to me that it might not be mqtt_input that really has a problem. It might be mosquitto that is broken, plus the combination being dumb enough to not spot that the input was disconnected when I restarted mosquitto and not reconnect. Either way, it’s pretty obvious there’s some poor logic in there somewhere.

I don’t think I’ve adjusted log rotation at all.

$ ls -la /etc/logrotate.conf 
-rw-r--r-- 1 root root 599 May  5  2015 /etc/logrotate.conf

$ cat /etc/logrotate.conf
# see "man logrotate" for details
# rotate log files weekly
weekly

# keep 4 weeks worth of backlogs
rotate 4

# create new (empty) log files after rotating old ones
create

# uncomment this if you want your log files compressed
#compress

# packages drop log rotation information into this directory
include /etc/logrotate.d

# no packages own wtmp, or btmp -- we'll rotate them here
/var/log/wtmp {
    missingok
    monthly
    create 0664 root utmp
    rotate 1
}

/var/log/btmp {
    missingok
    monthly
    create 0660 root utmp
    rotate 1
}

# system-specific logs may be configured here

$ cat /etc/cron.{daily,hourly}/logrotate
#!/bin/sh

test -x /usr/sbin/logrotate || exit 0
/usr/sbin/logrotate /etc/logrotate.conf
cat: /etc/cron.hourly/logrotate: No such file or directory

There is no /var/log/logrotate

-rw-r--r-- 1 root root 1279 May 15 06:25 status

$ cat /var/lib/logrotate/status 
logrotate state -- version 2
"/var/log/syslog" 2019-5-15-6:25:1
"/var/log/dpkg.log" 2018-10-5-6:25:1
"/var/log/mosquitto/mosquitto.log" 2018-10-30-6:25:1
"/var/log/auth.log" 2019-5-12-6:25:2
"/var/log/apt/term.log" 2018-10-5-6:25:1
"/var/log/mysql/mysql-slow.log" 2018-8-7-6:0:0
"/var/log/apt/history.log" 2018-10-5-6:25:1
"/var/log/mysql/error.log" 2019-5-15-6:25:1
"/var/log/alternatives.log" 2018-8-7-6:0:0
"/var/log/debug" 2019-3-17-6:25:1
"/var/log/mail.log" 2018-8-7-6:0:0
"/var/log/kern.log" 2019-3-24-6:25:2
"/var/log/aptitude" 2018-8-7-6:0:0
"/var/log/redis/redis-server.log" 2019-3-24-6:25:2
"/var/log/apache2/access.log" 2019-5-15-6:25:1
"/var/log/ufw.log" 2018-9-19-6:0:0
"/var/log/wtmp" 2019-5-1-6:25:2
"/var/log/daemon.log" 2019-5-12-6:25:2
"/var/log/mail.warn" 2018-8-7-6:0:0
"/var/log/btmp" 2019-5-1-6:25:2
"/var/log/lpr.log" 2018-8-7-6:0:0
"/var/log/mail.err" 2018-8-7-6:0:0
"/var/log/mysql/mysql.log" 2018-8-7-6:0:0
"/var/log/lighttpd/error.log" 2018-8-12-6:25:2
"/var/log/user.log" 2019-3-17-6:25:1
"/var/log/mail.info" 2018-8-7-6:0:0
"/var/log/apache2/other_vhosts_access.log" 2018-8-7-6:0:0
"/var/log/apache2/error.log" 2019-1-19-6:25:1
"/var/log/cron.log" 2018-8-7-6:0:0
"/var/log/messages" 2019-5-12-6:25:2
"/var/log/mysql/mariadb-slow.log" 2018-8-7-6:0:0

No, I’m not planning on updating, thanks :wink:

It’s not as simple as that. I just restarted mosquitto again and the input automatically reconnected.