emonSD next steps: filesystem & logrotate

But does /var/log fill up as a result?

Not really surprising, but is that a problem?

I’d suggest that these optimisations (including log2ram) are made optional within the install process, so that users not running on flash memory can skip them.

With emonhub on log level warning and demandshaper turned off, the logs fill up much more slowly. I’m going to provide better debug/warning level logging for both feedwriter and demandshaper to improve it further. I think I would personally like to have ‘warning’ as the default log level across the emon scripts, with the option to set it to ‘debug’ if there is an issue and ‘warning’ level is not enough.

I agree that making this all user configurable is a good idea.

It’s a good question. If a higher write rate equals a shorter SD card lifespan, then making efforts to reduce the write rate makes sense. How significant the difference is I don’t know. The optimisations so far have reduced the OS partition write rate by 97%. It is of course a balance between functionality and reducing the write rate. I think an hourly log2ram sync using rsync, plus the tmpfs changes for /tmp and /var/tmp, may achieve most of the reduction; I will keep testing.
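For anyone wanting to try the /tmp and /var/tmp part, the mounts would look something like this in /etc/fstab (a sketch only; the sizes here are illustrative guesses, not values from this thread):

```
# mount /tmp and /var/tmp in RAM to keep temp-file churn off the SD card
tmpfs  /tmp      tmpfs  defaults,noatime,nosuid,size=50M  0  0
tmpfs  /var/tmp  tmpfs  defaults,noatime,nosuid,size=10M  0  0
```

Anything in these directories is lost on reboot, which is usually fine for temp files but worth knowing before enabling it.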

Here’s a screenshot of the output of iotop; it provides a useful indication of the processes writing to disk:

jbd2 being the journaling block device (see the Journaling block device article on Wikipedia).

I do think there needs to be a more dynamic solution to the problem (and we risk moving off the thread title - perhaps another split) of the logs filling up.

I’d advocate something like the use of monit and the space usage test to trigger a log rotation if /var/log fills up.
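Something along these lines, using monit’s filesystem space usage test. This is a hedged sketch, not a tested config; the check name, threshold and exec command are illustrative assumptions:

```
# Sketch: force a log rotation when /var/log passes 80% full
check filesystem var_log with path /var/log
    if space usage > 80% then exec "/usr/sbin/logrotate -f /etc/logrotate.conf"
```

The advantage over a fixed schedule is that rotation only happens when space is actually under pressure.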


Comparing iotop with /proc/diskstats I’m getting quite different results.

Test period 10:32 → 12:12 (6000 seconds)

System1:

OS: 3.1 sectors/minute x 512/60 = 0.0258 kB/s x 6000s = 155k (iotop: 72k, 0.5x)
data: 43.0 sectors/minute x 512/60 = 0.358 kB/s x 6000s = 2148k (iotop: 7.5M, 3.6x)

System2:

OS: 3.1 sectors/minute x 512/60 = 0.0258 kB/s x 6000s = 155k (iotop: 60k, 0.4x)
data: 16.7 sectors/minute x 512/60 = 0.139 kB/s x 6000s = 834k (iotop: 3.4M, 4.2x)

iotop is reporting ~4x the /proc/diskstats figure for the ext2 (block size: 1024) data partition and under half of it for the ext4 (block size: 4096) OS partition.

Any ideas?
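For anyone wanting to reproduce the conversion used above, this is the arithmetic as a one-liner (sector size 512 bytes, test period 100 minutes; the values are the System1 OS figures from this post):

```shell
# sectors/minute -> KiB written over the test period:
# 3.1 sectors/min x 100 min x 512 bytes/sector / 1024 bytes/KiB
awk -v spm=3.1 -v min=100 'BEGIN { printf "%.0f KiB\n", spm * min * 512 / 1024 }'
# → prints "155 KiB"
```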

Just a point, as I have run into this issue: mounting the data partition differently to the OS makes expanding the SD card really hard! See the Expanding Filesystem thread.

What is the rationale for mounting the data partition separately? Are there really any significant gains to be had in a world of such large SD cards for pennies?

The data partition is ext2 and uses a different block size, this is part of the low-write optimisations.

I do not think it is the financial cost of the SD card that is being considered here. It’s the loss of data and the aggravation of rebuilding a new SD card image each time it fails.

For the record, whilst I see the reduction in write levels reported and welcome any reduction, I was never totally sold on the idea of a 3rd partition on the same drive. I would prefer to see a small USB stick in place of the 3rd partition if it must be separated; that would lower the writes on the SD card significantly further. I do however like having a root /data folder for collating all my data and MySQL etc. for easy backup and/or easier retrieval in the event of a failure.


Yes, I get that, but does block size change the number of writes or just reduce the space used by small amounts of data? If the latter, when a 4 GB card was the largest you could get and was very expensive, I understand the optimisation. Does this rationale still hold?

It is not just about the cost of the card; it is about the cost of the storage, which has dramatically reduced since this design was implemented. But that is not mitigated by the smaller block size, is it?

Loss of data is down to a poor backup regime - all storage devices are prone (at some point) to failing, even a USB stick.

Yes agreed but I presume you mean a logical folder rather than a physical partition? If all the data is grouped under any logical location it makes things far simpler (which is what DietPi does by default :smile:).

Trystan has documented his previous findings and is working through the tests again currently. I think the basic idea is that smaller block sizes mean fewer writes, because every block is read and rewritten in full even to change just one bit. So by increasing the amount of data saved per write (feedwriter buffering) and lowering your block size, you reduce the partial block writes, i.e. the multiple rewrites.

Those days pre-date even the very first emon images, I think. I agree lower cost means we maybe do not need to be quite so careful, but as I said, it’s the inconvenience, not the cost, that is the main factor here. Not to mention the environmental impact of “disposable SD cards”. Plus, whilst the cost may have gone down, the tech has gotten better; cards do not fail as often these days, so while they are cheaper, they have a longer expected life too.

Not sure I understand this

No, not at all! You cannot back up every other minute of every day; even swapping out an SD card for another pre-formatted and imaged card results in lost data, even a reboot results in lost data!

If your card fails every few months, are you going to watch it every day? I guarantee it will fail when you are the other side of the globe on your hols! Even with a daily backup you could lose a day or so by the time you notice, create a new image, set it up and import the old data, etc. Most of us have better things to do and want an almost fit-and-forget setup.

Indeed, even humans fail eventually, but that doesn’t stop us taking reasonable steps to delay the inevitable.

I’ve got some SD-card-based RPi devices that have been logging for 5 years on the same card. Just as well, as some of them are over 4 hours’ drive away; that’s a long way to go to change an SD card!

I’m all for delaying the inevitable; I just want to be sure it is not a case of ‘we’ve always done it that way’, which blinds one to other options. Equally, ‘if it ain’t broke, don’t fix it’ is a great mantra.

[edit] Your other thread reminds me that one of the reasons for the separate partition was that the root partition was RO and the data partition RW. Again, things have changed.

Wow!!! You surprise me. Why are we changing all the init.d and logs then? The emonhub init.d not only “ain’t broke”, it has been the single most consistently dependable part of the project for years. It was the only service on the whole image that created its own log directory, avoiding any need to be listed in the rc.local shenanigans, and the log files have been the main source of debugging info for everything from rfm firmware issues, MQTT and data loss to emoncms.org rejecting data, etc. Those logs were also self-rotated and were the only logs that couldn’t stuff up the RAM partition prior to logrotate being implemented; even when other logs filled the RAM, emonhub kept running. Now, since it’s been “fixed”, it is not only less dependable, it is causing issues for other services and stopping when the log is full.

And not always for the better. The fact that part of the SD card is RW undermines the RO part to a degree, because it’s the same card: if any partition wears out, the whole card is no good. And now that the rootfs is no longer RO, it is prone to other errors too.

IMO, RO should still be an option at least! Notice I say “option”! I know many users do not want RO emonPis and I get that, and TBH there’s not a great deal of gain in having a RO FS and RW data on the same SD card. But there is still a good argument for a RO, data-forwarding-only image option, one that is locked down with no disk writes at all; the “rock-solid gateway” was a great image.

Not me Guv. However, we are where we are and a sensible way forward needs to be identified.

As I never used a standard emonhub, I never really knew how it worked (in terms of logging etc).

And I don’t disagree there either.

As a matter of interest, what was the retention period for the emonhub logs previously?

Since it predominantly wrote its logs to a space that is capped in size, it didn’t have a time-based retention, just a size cap: it rotated the log at 5M and retained just one old file, so it would (eventually) maintain between 5M and 10M of logs. All this was undermined by the new logrotate regime, which rotated hourly at 1M and originally retained 2 files, effectively reducing the emonhub logs to between 2M and 3M after it had been running a while.
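For reference, the old self-rotation behaviour described here (5M cap, one old file retained) could be approximated with a logrotate stanza like this. A sketch only; the log path and directives are assumptions based on the behaviour described above, not a config from the image:

```
/var/log/emonhub/emonhub.log {
    size 5M
    rotate 1
    copytruncate
    missingok
}
```

`copytruncate` avoids needing the service to reopen its log file after rotation, which matters for daemons that keep the file handle open.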

Mmmmmmmmmh!

I’m all for consistency, but I would not consider that “broke”. For me, if consistency was the goal, I would like to have made all the other services as reliable, robust and dependable as the emonhub init.d service.

Then we could try to get a single systemd service functioning well (by well I mean a prolonged test under real and stressed conditions, not a quick spin-up “it works fine for me” test) and then migrate the known-good init.d services to a known-good systemd implementation, fixing any issues en route.

@TrystanLea

Your results from diskstats and iotop are very different. But if the objective is low-write, then both suggest that your System 2 is better.

I’m running emonSD images updated to 9.9.8. I tried iotop but found it difficult/unfriendly …

  • It takes over the ssh screen
  • Data did not persist on the 2 top lines
  • There is no indication of the start time
  • When you quit …
    • The data is gone (a screen shot before quitting is clunky)
    • There is no indication of the finish time
  • I ran it for over an hour on a node …
    • No disk reads recorded
    • 32 K written to jbd2 mmcblk0p2 (the OS?)
    • 5.39M written to /var/www/emoncms/scripts/feedwriter.php … is that to the OS partition or to the data partition? It was not clear.

My diskstats sectors-written results are very different from yours (I assume you are only talking about writes?).

Also, my results are very different between my different nodes – the number of sensors, the amount of input processing, and whether the node is just doing the emoncms task all clearly have an impact. And this could also explain the difference between our results.

I guess it would be disappointing if it turns out that the choice of best low-write solution is dependent on number of sensors, amount of input processing, etc.

Using diskstats involves a bit of manual/Excel calculation. I made a very simple script that helps to get the basic snapshot data …

#!/bin/bash
echo "starting diskstats-snapshot.sh"
echo
echo "date ... gives the time & date now"
date
echo
echo "uptime -p ... gives the time since last boot"
uptime -p
echo
echo "cat /proc/diskstats ... gives the diskstats since last boot up until now"
cat /proc/diskstats
echo
echo "it's now time for manual calculations ... :("
echo "the 6th field on each line is the SECTORS READ since last boot up until now"
echo "the 10th field on each line is the SECTORS WRITTEN since last boot up until now - the key stat?"
echo
echo "that's it ..."
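The manual step could also be automated. A sketch, assuming the mmcblk0 device naming used elsewhere in this thread, pulling the device name (field 3), sectors read (field 6) and sectors written (field 10) straight out of /proc/diskstats:

```shell
# Print read/written sector counts for each SD card partition since boot
awk '$3 ~ /^mmcblk0p/ { printf "%s: read=%d written=%d sectors\n", $3, $6, $10 }' /proc/diskstats
```

Run it twice, a known interval apart, and subtract to get sectors written over the test period.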

Ah, but I was referring to the install instructions not the move to the systemd method :grinning:.

[edit]i.e. not copying the service file but linking it to the /lib folder.

Ah ok!

Thanks again for sharing your test results @johnbanks, it looks very similar to what I’m seeing.

Yes, it doesn’t say in the iotop output as far as I’m aware; it should be the data partition.

That’s true, you’re seeing 122.4 sectors/min for 13 feeds while I was seeing 16.4 sectors/minute for 23 feeds… I have extended the interval at which I save to disk to 5 mins rather than the default 1 min period… it might be that… I will try again with a 60 s write period.

@TrystanLea

Thx for the additional insights.

I use phptimeseries almost exclusively.

From the emonhub.conf logs, I can see that 2 of my nodes are updating every 10 secs.

A third node was updating every 5 secs which threw me a bit until I remembered that its emonTx was running 3 phase firmware that I updated with a new release last June. Apparently it has a user defined reporting interval. After a quick look thru’ the sketch, I didn’t see any ref - so I must be blindly using the default and it’s 5 secs.

Similarly, I don’t recall selecting a write to disk frequency and so again I’m blindly using the 60 sec default.

Now I appreciate that sensor reporting and writing to disk are different aspects. However, I would have thought the number of sector writes was more dependent on the sensor data ‘batch’ size rather than the frequency with which ‘batches’ are written to disk. Pls point my simplistic thinking in the right direction.

Thx

Thanks @johnbanks

I think it should be a combination of both. If the filesystem block size is 1024 bytes and you can batch the data in such a way that utilises a greater proportion of each block write (by buffering and writing at a lower frequency), that should reduce the overall sector writes per minute. There’s a limit to how much this can be done: if you buffer one feed for 5 mins and it’s posting 4 bytes every 10 s, that’s 120 bytes but a write load of 1024 bytes.

PHPTIMESERIES writes 9 bytes per data point, so 270 bytes. In theory the write load should be the same? But I’m not sure that it is… it would be good to test and double check.
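To put numbers on the block utilisation in the two posts above (30 samples buffered over 5 mins: 30 x 4 bytes = 120 bytes raw vs 30 x 9 bytes = 270 bytes as PHPTIMESERIES points, against a 1024-byte block):

```shell
# Percentage of one 1024-byte block actually filled by the buffered data
for bytes in 120 270; do
    awk -v b="$bytes" 'BEGIN { printf "%d bytes -> %.1f%% of a 1024-byte block\n", b, 100 * b / 1024 }'
done
# → 120 bytes -> 11.7% of a 1024-byte block
# → 270 bytes -> 26.4% of a 1024-byte block
```

Either way the full 1024-byte block is written, which is why in theory the write load should be the same regardless of the per-point size.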