emonSD next steps: log2ram

borpin · 17 April 2019 20:12

Continuing the discussion from emonSD next steps and Better handling of files in tmpfs /var/log

@pb66 I’ve been looking at this as the log folder filling up is teeing me off.

Whilst it appears to do what we want, I really do not like the mixing of systemd and cron. It works more by luck than design I fear, and that often means it will break easily .

Stage 1 - I have installed your fork but I see this

The red I think means it is not accessible / an orphan. Am I right? If so how do I fix it

pi@emonpi:~ $ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/root       3.9G  1.6G  2.2G  42% /
devtmpfs        484M     0  484M   0% /dev
tmpfs           489M     0  489M   0% /dev/shm
tmpfs           489M  6.6M  482M   2% /run
tmpfs           5.0M  4.0K  5.0M   1% /run/lock
tmpfs           489M     0  489M   0% /sys/fs/cgroup
log2ram          40M  1.4M   39M   4% /var/log
tmpfs            30M     0   30M   0% /tmp
tmpfs           1.0M     0  1.0M   0% /var/tmp
/dev/mmcblk0p1   43M   22M   21M  52% /boot
/dev/mmcblk0p3   11G  119M  9.7G   2% /home/pi/data
tmpfs            50M  8.0K   50M   1% /var/log.bak
tmpfs            98M     0   98M   0% /run/user/1000

I note there is a ‘t’ in the permissions string - don’t know what that means.

Bill.Thomson · 17 April 2019 20:27

t is the “sticky” bit.

So what is the sticky bit?

A sticky bit is a permission bit that is set on a directory that allows only the owner of the file within that directory, the owner of the directory or the root user to delete or rename the file. No other user has the needed privileges to delete the file created by some other user.

Why does /tmp have the t sticky bit?

The /tmp directory can be used by different Linux users to create temporary files.
Now, what if a user deletes/renames a file in this directory created by another user?

To avoid these kind of issues, the concept of sticky bit is used.
So for that, a 777 is given but preserving the sticky bit is not a bad idea.

Ref: permissions - What is the "t" letter in the output of "ls -ld /tmp"? - Ask Ubuntu

Bill.Thomson · 17 April 2019 20:33

Presented as food for thought, i.e the concept vice the actual module itself:

emlog is a Linux kernel module that makes it easy to access the most recent (and only the most recent) output from a process. It works just like “tail -f” on a log file, except that the storage required never grows. This can be useful in embedded systems where there isn’t enough memory or disk space for keeping complete log files, but the most recent debugging messages are sometimes needed (e.g., after an error is observed).

The emlog kernel module implements simple character device driver. The driver acts like a named pipe that has a finite, circular buffer. The size of the buffer is easily configurable. As more data is written into the buffer, the oldest data is discarded. A process that reads from an emlog device will first read the existing buffer, then see new text as it’s written, similar to monitoring a log file using “tail -f”. (Non-blocking reads are also supported, if a process needs to get the current contents of the log without blocking to wait for new data.)

borpin · 17 April 2019 20:44

@pb66 I think the folder /var/log.old is not created as part of the install script so the symlink is an orphan.

github.com

pb66/log2ram/blob/765238b9f8280441f05d9776dd4f09089bc52f24/log2ram#L63-L63

    
      
          [ -d $RAM_LOG/rotated_logs ] || ln -s $OLD_LOG $RAM_LOG/rotated_logs

pb66 · 17 April 2019 20:59

The log.old directory is created by logrotate if it doesn’t already exist when it has files to rotate. When that happens the symlink will then be fully functional.

Another line or 2 could be added to the install script to add an empty folder and set permissions/ownership, but all that effectively does is change the color of the symlink until the first rotation happens.

borpin · 17 April 2019 21:03

BTW testing this on an EmonPi not a DietPi (it has it’s own RAM log setup).

But that means the problem isn’t solved as currently logrotate does not work correctly.

I’m confused as to what log2ram does more than the standard emonSD setup.

[edit]
I’m not understanding this. From the df -h above, it seems to me that both

log2ram          40M  1.4M   39M   4% /var/log

and

tmpfs            50M  8.0K   50M   1% /var/log.bak

are both in RAM.

Something not right here

If I force the log2ram (sudo /usr/local/bin/log2ram write), I simply get a mirror of /var/log/ in /var/log.bak/ (RSYNC enabled).

~~If I force logrotate ( sudo logrotate -fvd /etc/logrotate.conf) it tells me logs need rotating, tells me they are being rotated, but there are no rotated logs!~~

Doh! I had the -d flag.

However, this morning, log folder full, log2ram says out of space Only option is reboot.

pb66 · 18 April 2019 09:10

I would recommend starting a new thread about that “issue” then. whilst log2ram may alleviate or even resolve that issue, it’s not a given since we do not know if it’s “correct behaviour” or if it’s a fault since it has only started happening very recently and regularly to a select few (just you?).

There has always been a possibility of filling /var/log, it only really ever happened to my non-emonsd installs when there was a fault condition that produced a vast amount of logs, each time I have located and remedied the source of the rush of logs. I have also seen on very rare occasions that a /var/log partition would fill up over time (6mths+) which were addressed by increasing the partition size marginally so that the size and rotation better fitted the rate of log generation. I could have also reduced the log levels to the same ends, but I prefer to have the log data for debugging and improving systems.

The emonpi/sd however has a different approach. That tries to tackle the issue by rapidly rotating logs and not retaining more than the single last rotation (previously the last 2 until very recently). Prior to the more recent changes (maybe specifically the changes to emonhub systemd service unit and logging) this “filling up” wasn’t an issue and/or certainly not a daily issue.

Using log2ram will give us the log folder structure required at boot, it will retain logs across reboots for debugging and it will better manage the space in ram by putting rotated logs to disk rather than holding them in ram. All these things are essential and also helpful in resolving the issue you have but using log2ram has not been suggested over the last year or so to directly address an issue that has popped up in the last couple of weeks.

Think about what the current emonpi logging regime is doing when there is a lot of log traffic. If it is filling up when only a single set of rotated logs are retained (in ram) then effectively (and approximately!) you only have half of the 50mb log space after each rotation for new logs, so just by moving the rotated logs out of ram you would have double the space, this is what my mod to log2ram aims to do.

Retaining the logs longer (across boots) is very useful for debugging and the fact it removes the need for all that insanity in rc.local is a no brainier IMO. log2ram is not the reason you are having issues, even if it was working correctly which yours obviously isn’t, it only might resolve the issue. We need to work out if/why you logging has increased.

I hear what you say, but we need to stop focusing on the bits that clearly do work and try and fix the bits that don’t first and foremost.

So does/did the emonSD, have you restored normal logging before installing log2ram? You cannot just install it on something that already has /var/log in ram/tmpfs.

No they are not both in ram and you are right something is very wrong, possibly caused by installing over an existing logging to ram set up?

That is correct, but log.bak should be on disk not in ram!!!

I have no idea what the current emonSD is like, with no buildguide and then recent changes added to the unknown, I’m not sure what to suggest to remedy this. I did start to try and figure that out but the emonsd failed to update on firstboot and the reaction here on the forum was less than useful so I gave up, I have better things to do with my time since I do not use the emonSD myself, I do not have these issues.

Are the changes to emonhub logging live on the emonsd updates? Is that the cause of the increased logging?

I will try and help you install and try log2ram if you want!

I will also try and help you find why your log is filling up as a separate thing (certainly initially until we know better)

There will definitely be overlapping area there, but it would be daft to try and fix log2ram for the sole aim of fixing the log filling up, it might not do it and that is not how you gauge log2rams effectiveness.

To get this right we need to balance (with ample headroom) the generation of logs, the rotation of logs and the space for logs. I am not keen on stemming the flow of log data, that is the easy option and means we are not improving on the previous setup that just dumped the logs older than a couple of hours. But that doesn’t mean we shouldn’t take a hard look at the logs for duplication and/or unuseful logs and/or fault conditions.

[edit]

Here’s a df -h and a ls /var/log* on a basic Raspbian image plus log2ram and a few other bit’s and bobs, no emoncms or emonhub.

pi@raspberrypi:~ $ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/root       7.2G  3.1G  3.8G  46% /
devtmpfs        460M     0  460M   0% /dev
tmpfs           464M   64K  464M   1% /dev/shm
tmpfs           464M  460K  464M   1% /run
tmpfs           5.0M  4.0K  5.0M   1% /run/lock
tmpfs           464M     0  464M   0% /sys/fs/cgroup
/dev/mmcblk0p1   44M   22M   22M  50% /boot
log2ram          40M  6.6M   34M  17% /var/log
tmpfs            93M     0   93M   0% /run/user/1000
pi@raspberrypi:~ $ ls /var/log*
/var/log:
alternatives.log  boot.log       daemon.log  faillog         kern.log  log2ram.log  messages   rotated_logs  user.log    Xorg.0.log.old
apt               bootstrap.log  debug       fontconfig.log  lastlog   mail.info    monit.log  samba         wtmp
auth.log          btmp           dpkg.log    journal         lightdm   mail.log     netdata    syslog        Xorg.0.log

/var/log.bak:
alternatives.log  boot.log       daemon.log  faillog         kern.log  log2ram.log  messages   rotated_logs  user.log    Xorg.0.log.old
apt               bootstrap.log  debug       fontconfig.log  lastlog   mail.info    monit.log  samba         wtmp
auth.log          btmp           dpkg.log    journal         lightdm   mail.log     netdata    syslog        Xorg.0.log

/var/log.old:
access.log.1           auth.log.3.gz    dpkg.log.1       error.log.5.gz    log2ram.log.1     messages.3.gz  user.log.1
access.log.2.gz        auth.log.4.gz    dpkg.log.2.gz    error.log.6.gz    log2ram.log.2.gz  messages.4.gz  user.log.2.gz
access.log.3.gz        btmp.1           error.log.1      error.log.7.gz    log2ram.log.3.gz  syslog.1       user.log.3.gz
access.log.4.gz        daemon.log.1     error.log.10.gz  error.log.8.gz    log2ram.log.4.gz  syslog.2.gz    user.log.4.gz
access.log.5.gz        daemon.log.2.gz  error.log.11.gz  error.log.9.gz    log2ram.log.5.gz  syslog.3.gz    wtmp.1
access.log.6.gz        daemon.log.3.gz  error.log.12.gz  history.log.1.gz  log2ram.log.6.gz  syslog.4.gz
access.log.7.gz        daemon.log.4.gz  error.log.13.gz  history.log.2.gz  log2ram.log.7.gz  syslog.5.gz
alternatives.log.1     debug.1          error.log.14.gz  kern.log.1        mail.info.1       syslog.6.gz
alternatives.log.2.gz  debug.2.gz       error.log.2.gz   kern.log.2.gz     mail.log.1        syslog.7.gz
auth.log.1             debug.3.gz       error.log.3.gz   kern.log.3.gz     messages.1        term.log.1.gz
auth.log.2.gz          debug.4.gz       error.log.4.gz   kern.log.4.gz     messages.2.gz     term.log.2.gz
pi@raspberrypi:~ $

PPS - I do like the concept of adding monit to force a rotation as a backstop (omg I didn’t just say that did i?) but again it shouldn’t be the fix, it should be there to ensure when there is an unexpected fault condition increasing the log traffic, it is managed in a graceful way so the info needed to fix that issue is not lost. NOT to manage the everyday log traffic.

borpin · 18 April 2019 09:37

I’m learning so much about logging (though I may be wrong in some of this)! - Overlapped with your reply @pb66.

With the advent of systemd, most Linux implementations are doing dual logging. Log entries are going to either the journal (systemd-journal.service) or syslog (rsyslog) but anything going to the journal is by default sent to syslog as well.

The journal is recorded in a file structure under /run/log/journal. As you can see above (df -h output), the journal is on tempfs but is much larger than /var/log It is a binary file which also means, I think, it is more efficient in terms of space.

Daemons (systemd services) by default log to the daemon.log file. The default /etc/rsyslog.conf breaks out some log entries into a separate file as well as syslog. To prevent this @glyn.hudson added a daemon.none entry. Unfortunately, my EmonPi did not have that in the rsyslog.conf

*.*;auth,authpriv.none,daemon.none              -/var/log/syslog

I’ve raised an issue Duplicate daemon.log logging · Issue #97 · openenergymonitor/emonpi · GitHub

I think I have been noticing it as I had set emonhub to DEBUG and emoncms to DEBUG as well but I suspect full logs is the reason for systems simply stopping.

What I now think is, that there is no need for log2ram. By either adjusting the rsyslog.conf file or setting the systemd-journal config to ForwardToSyslog=no we can eliminate virtually all the logging to /var/log.

There is an option to set the Journal to save the logs to disk (need to investigate if that can be time bound) and a log file can be generated on the fly journalctl -u emonhub.py > your.file.name from what is in the journal.

I think that was probably the issue .

CelsoHenriques · 18 April 2019 09:59

Is this related to: RAM usage?
Check the Last post.
Since i installed the node-red nodes, i assume that it was the reason of my RAM increase, but now i have some doubts.
Maybe some update change the logging?

borpin · 18 April 2019 10:02

Sorry, I missed that. I will answer over there.

pb66 · 18 April 2019 10:41

Which is most likely why most distos do not include the changes made to the emonSD to go all “systemd only” and also why most softwares still use traditional logging. On a personal note I still prefer the program logs seperate to the deamon logs so a mix of traditional logging and journalctl just for daemon seems the best option, possibly a position also prefered by the major software and distro vendors at this time. Perhaps this will mature and change (as my my own opinion) over time but we do not need to be leading the charge on adopting this. we have so many other things to resolve and so little experience with the cutting edge systemd and journalctl etc.

tmpfs           464M  460K  464M   1% /run
. . . 
log2ram          40M  6.6M   34M  17% /var/log

Looking at my (non-sick) Pi I have 460K /run total and 6.6M in /var/log. The 464M and 40M are just max limits, this space is not reserved.

Your sick, loaded and recently rebooted emonpi shows

tmpfs           489M  6.6M  482M   2% /run
. . . 
log2ram          40M  1.4M   39M   4% /var/log

which isn’t a vast difference.

The default for emonhub has always been debug (thankfully) and it has been the most useful debugging tool for the whole OEM project for years.

If it is, that is a new position. even when emonhub cannot post to emonhub.log it has always continued to function 100%. I mentioned above that I had seen full logs in the past. Some of these full logs I found in passing when updating something or just doing a health check. The logging had stopped months previously but gone unnoticed as the system continued to work flawlessly. In contrast to that, I have a local Pi that has also filled it’s log partition and whilst emonhub continues to function and emoncms still gets most of the data, my “cron based” nodes have ceased to update, on inspection, the logs were full. It is apparent that those cron entires fail when syslog cannot write to /var/log.

It only takes one non-systemd software to upset that idea, for example emoncms!!!

We still need a ram log to handle even just one log, plus we need the folder structure at boot, plus we need to try and retain those logs beyond a reboot (or hourly rotation).

installing log2ram is ordinarily a one line install and a total solution to all those issues system wide.

What you propose would involve editing each and every install software (not just OEM stuff) to ensure it logged to systemd only and any future installs by the end user will need to comply to, they will not be able to “just install” stuff and it get handled transparently behind the scenes by log2ram.

I have no problem with working towards a total systemd setup if that’s how it pans out, but I have a problem with us forcing the rollout at the cost of more important stuff especially when it creates new issues too, eg emonhub stopping.

The fact Trystan has shown interest in log2ram is potentially the most significant leap forward for this project in years and I am not exaggerating. Installing log2ram is a quick and well rounded solution to many issues, we are still dependent on rc.local for anything to run! Please do not try and block this progress in hope of fully implementing a far more complex 100% systemd solution in the same timescale and expect that to not cause a multitude of new issues. installing log2ram is an immediate and long overdue repair to the existing image, once that is implemented we can indeed explore the advantages of journalctl etc at a more relaxed pace and get it right.

I do not see the aim to use more systemctl and journalctl as a reason to abandon log2ram, we can use both, eventually if journalctl does successfully negate the need for log2ram, it can be dropped then, but I suspect it will be needed for some time to come.

I’m not against using journalctl, but I am concerned about putting all our eggs in a single (new, unknown and cutting edge) basket. This feels like the adoption of MQTT all over again. It doesn’t seem to matter if it works so much, just as long as it is sytemd and journalctl! We are only just getting to grips with MQTT in emoncms now, 4years later! are we in for years of issues with sytemd and journalctl just because we dropped all the alternatives too soon? What ever happened to the value of “tried and tested”?

borpin · 18 April 2019 12:32

We have never solved why it (systems stopping) has been happening - still seems most likely cause to me (unsubstantiated).

Did this change when the systemd emonhub service was introduced and the logging mechanism changed? I suspect it may have. Emonhub itself seems to continue to work but other elements of Emoncms grind to a halt.

I’ve seen it here. The EmonPi continues for quite a while once /var/log is full but eventually Emoncms stops responding and recording feed data.

Accepted.

Starter for 10…

Logging Strategy

Use log2ram to reduce SDCard writes.
Use journald for daemon services configured to use RAM to reduce SDCard writes (standard setup).
Use logrotate to limit space consumed in /var/log.
Use monit to monitor log directory and force specified actions to recover space.
Do not direct output from daemon (services) to syslog (modify rsyslog).
Develop mechanism to store specific journald log entries for specified time (such as emonhub).

As Journald is here to stay, I’ll investigate the persistent options for journald. There are various settings to control size and rotation but I am not sure it uses RAM initially, then will flush to disk - it is unclear.

For now perhaps the solution for emonhub is to have a service timer that extracts the emonhub log entries once a day to file and also a service that extracts the log entries immediately before a reboot. Those log files could then be rotated as normal although that does feel a little Heath Robinson.

Are you still running the init.d emonhub?

Greebo · 19 April 2019 02:35

There’s a stretch backport of the updated systemd that adds the “log to specified file” functionality for any service unit. It may be worth including that in any considerations?

pb66 · 19 April 2019 09:11

On the contrary, it has been solved many times! Not all “it’s stopped” issues are caused by the same thing, we have found the issues with mqtt, feedwriter, emonhub etc and we have found those by checking the logs (emonhub mainly) if the stoppage was due to /var/log being full, the logs would out of date. Not saying it has never happened, but I do not think is could be as prevalent as you suggest

I believe so, I do not think this rate of filling up would have been previously missed.

Perhaps the mqtt and/or feedwriter cannot function for the same reason the recent change stops emonhub since they all use the same systemd and logging strategy?

I don’t agree with any proposed strategy if it means losing emonhub.log (as it was), but thb the jump has been made and I doubt there will be any backtracking so it doesn’t matter what I think, bolting the stable door after . . . doesn’t help anyone!

Absolutely and as each day passes I can see that is less likely to change.

Just a little?

Just to be clear, are you suggesting that all the apache,redis,mariadb,mosquitto (plus anything else that gets installed) is edited on a one by one basis just to move the logs from log2ram to journalctl? Bear in mind both “use RAM to reduce SDCard writes” one works transparently and must be present for non-systemd stuff, the other demands loads of editing.

borpin · 19 April 2019 10:48

I think we will agree to disagree. I think it is more prevalent than we think but masked by folk simply rebooting. Perhaps add in an email warning mechanism for when the log folder fills up and memory use exceeds x% (as I think once the folder is full, rsyslog might keep the logs in RAM).

No, just those that already do out of the box although where there is a simple solution (e.g. redis config item syslog-enabled) it could be considered - we fiddle with the logging quite a lot to reduce writes this would just be a different fiddle. Inherently, our fiddles with the stock image is the reason there is a need to fix this i.e. we mount /var/log to tempfs, so limiting the size, without doing sufficient other actions to prevent it filling up and which stock applications do not take into account with their standard logging configuration. Log2ram will only help if the folder does not fill up in the first place AFAICS.

Not necessarily. If a combination of log2ram, logrotate, journald & rsyslog are setup correctly, then any package (and more will go the journald route I believe) installed will just work with that setup. I don’t think we can say simply installing log2ram will solve everything (and I don’t think that is what you are saying).

For instance, logrotate only works if there is enough destination space to rotate to.

I am not wedded to any approach, I just want to discuss and establish a clear strategy to work to in creating an install script that works.

pb66 · 19 April 2019 11:27

Amen, I have brought this up so many times in the past, but first I was told it isn’t a problem, then when it was, the hourly logrotation was rolled out as a cure.

This has already been suggested by me previously, I also suggested adding /var/log to the admin page alongside the ram and disc usage.

We need to fiddle less, full stop! I am not suggesting log2ram is a total solution (especially as it stands now) but it can remove most if not all of those fiddles. This is why I added the olddir logrotate to the standard L2R (log2ram) to try and fill it out to better suit our needs, a forced rotate via monit sounds like a good move (not totally researched yet).

True, but when logrotate has been running for a while it has files to delete to make room for the new ones so it becomes less of an issue after the retention period is reached after start up. But this is a managable issue anyway. The monit trigger to force a rotate when the ram partition is full can simply check for space first, make space (by deleting older logs) if needed or dump the rotations to /dev/null (or simply delete/empty the logfiles in situ) to stop the system failing. This is where we should be focusing are efforts, in a global policy that works transparently for all logs and promotes logfile retention without impacting the card life. NOT fiddling here there and everywhere, removing logs, reducing loglevels, deleting valuable log data etc etc.

Exactly my point!
I’m not against using journalctl were appropriate, I’m just not wanting us to waste time converting everything to use it when it won’t suit everything and it will introduce more issues. I am always for choice! L2R and/or journalctl, MQTT and/or HTTP, JSON and/or CSV etc etc I always try to reign in the sudden revolt to another option, it only because those trying to force the total rollout of what ever is new and shiny to everyone whether they want it or not, that causes me to argue against that position that makes me appear against it, I’m not, but it can seem that way when I am forced to keep going over the same points again and again to retain CHOICE or slow the rollout in order to get it right.

You are right, it is not what I’m saying.

If only that was the position BEFORE making all the changes! There is no point being so sure that something is right to roll out and then being open to discussion after it doesn’t work out so well and the horse has bolted.

pb66 · 19 April 2019 13:36

Another point about using L2R with journalctl.

If you simply create a folder in /var/log called journal it would cause journalctl to be “persisted” to /var/log, yes that is still in ram, but then L2R would back it up and rotate it as needed just like any other log. Much simpler than “fiddling” with journalctl to get it to periodically persist.

Dave has made a good point about that not potentially catching vital crash logs if L2R hasn’t got time to write to disk, but that could be explored further (and it’s not something we would miss, currently emonSD starts with no previous log data at all), as can doing it directly via journalctl, my point here is that if we were using L2R already, “just add a folder” and we easily and instantly have persisted journals, no edits, no fiddling, no complex stuff and no additional write load to wear the sdcard, job done with one “mkdir”, takes the heat off to allow a more leisurely look into how to better that rather than a hastily rollout of a irreversible complex solution with very little noticeable operational difference.

borpin · 19 April 2019 13:43

Yes I’d thought of that. The problem could be the size. As /run/log/journal is quite large (your RAM size effectively), you might run into issues as journald would not expect the size to be limited. But you can manage that by some other journald directives. You might need a L2R write commanded by monit.

It is an alternative approach to restoring the generation of a specific emonhub log file.

I think it would need a bit more than that as I suggest above.

Not suggesting that.

pb66 · 19 April 2019 13:56

That sounds like it’s making things over complex. Remember regardless of whether the journal is inside or outside /var/log it is still in memory and using the same amount of space, all we perhaps need is to increase the /var/log to accommodate the extra load. BUT that extra size in /var/log is no bigger than a non-persisted “this boot only” journalctl file most of the time as the (default) daily logrotate would move the previous days/boots data off to disk.

The default max size of the journal file is 26Mb whether it’s in /var/log or not, IMO the 50Mb of the emonSD is too small, as I stated elsewhere the 50MB is only a ceiling not a reservation of space so I would set that to perhaps 10% of the total ram, eg 100M on a Pi 3, the image works fine on a 512M ram Zero or Pi B (effectively leaving only 462M for non log data) so I think we can run in 924 (1G minus 100M) without an issue. Especially if using L2R so that whilst the ceiling is higher, the logs held in ram are smaller (no rotated files) and journalctl is the same size (effectively the move to 100M would potentially only rob 24M from the system even when maxed out) because the 26M journal has moved along with the remapped RAM)

borpin · 19 April 2019 14:06

Where did you get this figure from?

Yes agree with all you say - my point was that we can’t simply create the folder and assume all will be fine and dandy. There will need to be some other tweaks along the way.