emonSD next steps: filesystem & logrotate

TrystanLea · 20 April 2019 21:51

Here are the feedwriter changes: move all feedwriter logging to emoncms log · emoncms/emoncms@b561ac1 · GitHub all logging now goes via the logger, Im not sure why there was still print statements in there.

no, its just log2ram with rsync setting on/off.

the measurement is the output of proc/diskstats read every 60s, so that ends up being sectors written over a 60s period.

borpin · 20 April 2019 21:55

Ah Ok into the emoncms.log. Thanks.

TrystanLea · 20 April 2019 21:56

Whats not quite clear to me yet is how much less write load log2ram with an hourly sync is resulting in vs logging straight to the SD card in the first place and what the difference is between rsync enabled or disabled. That’s what Im trying to test at the moment. Im testing a reduced logging configuration but the results should maintain similar proportions if logging was more granular… although that would also be good to confirm with testing

pb66 · 20 April 2019 22:17

I would be surprised if there is any real advantage using rsync over cp. In general rsync only offers an advantage if there are large files (>GB) and/or remote targets. The small files in play here on a local device I expect to be at a disadvantage using rsync due to the additional processing to calculate.

That may be true, however whilst you are working at such a macro level, but still with the same size sectors and blocks the differences are going to be that much harder to detect and/or measure. Ideally you should be over logging so that the differences are obvious. If it were me, I would write a small program to log dummy data to a handful of log files at such a rapid rate as to make any other un-controlled activity negligible and therefore all tests are comparable. It would speed up the tests, make the tests more consistent and the results more obvious. Doing it this way also stress tests things at the same time.

[edit] the results of your tests above seem to show L2R at least halves the write activity if I’m reading correctly. Half the wear means twice the life span, what are your aims?

The fact you see so much reduction when generating so few log messages is outstanding IMO, that effectiveness can only get better with more traffic, there is only so far you can reduce “barely writing” down to. With larger writes there would be more to save.

TrystanLea · 21 April 2019 08:15

My understanding was that rsync transfered only the content of the file that had changed rather than copying the file anew. rsync - Wikipedia I guess it depends what the chunk size is which isnt entirely clear?

The test showing half the writes is for logs starting from a cleared state. As the log file size increases it appears the size of the hourly sync spike increase as well, the question is does it start becoming a larger write load than direct writes after x number of days, or with a lower logging resolution, that is what is not entirely clear to me…

TrystanLea · 21 April 2019 08:21

Here are the overnight results for log2ram with rsync turned off

again showing increasing spike magnitudes over time.

The mean sector writes per minute is 38.4 = 0.32 kb/s
The size of the /var/log folder using ‘du -sh /var/log’ increased from 384k to 1016k over 12h15 mins
iotop reported 1784 k of jbd2/mmcblk0p2 writes over the same period.

TrystanLea · 21 April 2019 17:01

The write rate from 9pm to 6am not including the large spike just before 7am was 20.8 or 0.17kb/s, from 7am until 5:30pm this evening it was 36.4 or 0.3 kb/s, the spike magnitude stepped down a little after midday today before continuing an upwards trend. Still not really enough data to be sure what the longer term implications are. It would be good to have 3-4 days at least.

yes that would be good as well, i was keen to test a reduced logging level to see what kind of write rate having the emoncms and emonhub logging at ‘error’ entails. I’d be grateful for any help replicating the above logs on different configurations. Do you see your sector writes per minute continue to increase over time?

Ideally the sync step should only copy the new data and not overwrite the entire file every time?..

borpin · 21 April 2019 18:43

Can you do some setup instructions?

TrystanLea · 21 April 2019 18:53

Sure, here’s the script Im using to record /proc/diskstats:

<?php
$apikey = "";
$host = "http://localhost/emoncms";

$redis = new Redis();
$redis->connect("localhost");

// Retrieve last value from redis
$last_sectors_written2 = $redis->get("diskstats:sectorswritten:2");
// Fetch current value
$total_sectors_written2 = exec("sudo awk '/mmcblk0p2/ {print $10}' /proc/diskstats");
// Save current value to redis
$redis->set("diskstats:sectorswritten:2",$total_sectors_written2);
// Calculate difference
$sectors_written2 = $total_sectors_written2 - $last_sectors_written2;

// Retrieve last value from redis
$last_sectors_written3 = $redis->get("diskstats:sectorswritten:3");
// Fetch current value
$total_sectors_written3 = exec("sudo awk '/mmcblk0p3/ {print $10}' /proc/diskstats");
// Save current value to redis
$redis->set("diskstats:sectorswritten:3",$total_sectors_written3);
// Calculate difference
$sectors_written3 = $total_sectors_written3 - $last_sectors_written3;

// Send to emoncms
file_get_contents("$host/input/post?node=diskstats&data=totalsectorswritten2:$total_sectors_written2,sectorswritten2:$sectors_written2,totalsectorswritten3:$total_sectors_written3,sectorswritten3:$sectors_written3&apikey=$apikey");

I run this from cron with:

* * * * * php /opt/emon/emonpi/track_diskstats.php > /dev/null 2>&1

This then creates 4 inputs, all of which I log to feed using PHPFINA at 60s interval. The screenshots above show examples of the resulting graphs.

TrystanLea · 22 April 2019 07:05

The trend is still increasing, average from the start is now 47.2 sectors/minute 0.4kb/s, the last 6 hours 72.7 sectors/minute 0.6 kb/s (higher than the write rate logging direct to disk of 0.54 kb/s). The /var/log folder size itself is now 2.1mb (increased at 0.014 kb/s for last 15 hours, the same rate as in the 12h15 period posted above).

borpin · 22 April 2019 07:23

I ran these stats overnight. Running a standard EmonSD on an EmonPi with just a CT but picking up RF from my other setup. Using the modified emonhub logging as mentioned elsewhere. emonSD next steps: emonhub logging - #13 by borpin

Overnight.

What might be interesting is plotting where feedwriter is writing and what volume. I cannot explain why for some parts of the period the minimum was not zero then from 05:00 it was (not me guv!)! The spike is probably a logrotate of emonhub log. Could it be feedwriter is not exactly 60s?

A close up of the writes as a result of the logrotation at midnight UTC. It was 3 stage rotation, concatination of latest log, a 1.gz file rotated to 2.gz and a new 1.gz file created. [edit]Log compression is about 10:1 - first compressed log was not 24Hrs

-rw-r--r-- 1 root root  19M Apr 22 06:58 /home/pi/data/emonhub-log/emonhub.log
-rw-r--r-- 1 root root 6.3M Apr 22 00:21 /home/pi/data/emonhub-log/emonhub.log.1.gz
-rw-r--r-- 1 root root 792K Apr 21 00:14 /home/pi/data/emonhub-log/emonhub.log.2.gz

TrystanLea · 22 April 2019 08:53

Thanks @borpin

did you record sectorswritten3 as well? that should include all your emoncms data write load.

other than the emonhub log modification are your logs otherwise on tmpfs without any persistance to disk? as in standard emonSD configuration?

What logging level is emonhub set? and can you post the mean writes/minute given in the graph stats below the graph?

borpin · 22 April 2019 09:44

Yes

Standard setup so DEBUG I think.

Yes. The peaks coincide with the emonhub log rotations.

I’ll repost the graph as well so the 2 match

BUT those peaks are unexplained they might have been something I did but are very strange. The data before the first peak is:

Not sure how useful this is…

pb66 · 22 April 2019 10:30

Yes that is it general intention of rsync but it is not that clear cut.

From the wiki page you linked says “The rsync algorithm is a type of delta encoding, and is used for minimizing network usage.” and it seems that network saving could be at a cost of additional disk writes if not set up correctly.

For example, from the rsync man page

-W, --whole-file

With this option rsync’s delta-transfer algorithm is not used and the whole file is sent as-is instead. The transfer may be faster if this option is used when the bandwidth between the source and destination machines is higher than the bandwidth to disk (especially when the lqdiskrq is actually a networked filesystem). This is the default when both the source and destination are specified as local paths."

We are doing local to local rsync’s! So the default operation is to copy the whole file, this is because it is considered faster to “just copy all” on a local to local device where network resource is unimportant.

There was a issue raised on L2R repo about the use of --whole-file (Why only copy file when using rsync? · Issue #25 · azlux/log2ram · GitHub) as it is specifically defined in L2R’s rsync command. The use of --no-whole-file would fall back to only transferring the delta (changes). I choose the word “transferring” rather than “copying” for a good reason.

When rsync does it’s thing it can be working on live files being accessed by other services, therefore at the destination it creates a copy of the target file, makes the changes to that and then deletes the original target and renames the new updated file to replace the original. This results in the whole file being re-written regardless of the --no-whole-file setting. So nothing is gained (in disk writes) by setting just that unless the --inplace option is also used.

–inplace

This option changes how rsync transfers a file when the file’s data needs to be updated: instead of the default method of creating a new copy of the file and moving it into place when it is complete, rsync instead writes the updated data directly to the destination file.

This has several effects: (1) in-use binaries cannot be updated (either the

OS will prevent this from happening, or binaries that attempt to swap-in their data will misbehave or crash), (2) the file’s data will be in an inconsistent state during the transfer, (3) a file’s data may be left in an inconsistent state after the transfer if the transfer is interrupted or if an update fails, (4) a file that does not have write permissions can not be updated, and (5) the efficiency of rsync’s delta-transfer algorithm may be reduced if some data in the destination file is overwritten before it can be copied to a position later in the file (one exception to this is if you combine this option with –backup , since rsync is smart enough to use the backup file as the basis file for the transfer).

WARNING: you should not use this option to update files that are being

accessed by others, so be careful when choosing to use this for a copy.

This option is useful for transfer of large files with block-based changes

or appended data, and also on systems that are disk bound, not network bound.

There is an L2R issue discussing use of --in-place (Current rsync usage may cause excessive SD card writes · Issue #45 · azlux/log2ram · GitHub) and quite rightly at that time there were concerns about the rotated log files. My modifications to L2R negate these issues and concerns as the rotated files are not handled by L2R directly. L2R only sets the logrotation “olddir” directive so that when logrotate occurs, it uses olddir and puts the files to another directory.

With these changes the rsync option should only append the changes directly to the existing file.

Although I have not fully tested this, I believe the rsync command in L2R should be changed from

github.com

azlux/log2ram/blob/07b3599823669a93870f946c7591f885aa370e2e/log2ram#L21-L21


      
          rsync -aXWv --delete --links $RAM_LOG/ $HDD_LOG/ 2>&1 | $LOG_OUTPUT

to

rsync -aXv --inplace --no-whole-file --delete --links $RAM_LOG/ $HDD_LOG/ 2>&1 | $LOG_OUTPUT

I have removed the -W option and added --inplace --no-whole-file, I will update my repo now by adding a new feature branch called “rsync_mods” but unless you plan to reinstall, it is easier to edit in place with sudo nano /usr/local/bin/log2ram, I just commented out the original line and added the new line below it so it is easy to revert.

Going back to whether the tests being done are valid/relative, the emonSD image creates a lot of log messages, had these been previously retained and better managed they would have been very useful to debug all the issues we’ve seen over the last few years. I’m keen to retain a higher level of logging until we are confident all the services are as error free as possible. It was because I expect a high turnover of logs that I had preferred to stick with the cp option in L2R and avoid investigating the potential advances with L2R rsync at this time. When there is a high log rate the files will be rotated more frequently and regardless of the L2R mode, we will be pretty much copying a whole new file each time even with the rsync mode operating as expected.

I believe moving to L2R (in basic cp mode) fixes the retention and folder structure for reboot issues and with the new longer retained verbose logs we can improve both the emonSD software and also L2R so one day we can indeed reduce the loglevels of the services once they are all fairly error free and also switch to a more streamline L2R through thorough testing over time.

With your current testing strategy of reducing the log levels you are introducing a unrealistic test which is causing a major reduction in the efficiency of L2R as the log files are only getting small additions to them, therefore the same whole file is being copied over and over each hour rather than the hourly rotated high traffic logfile being sync’d where the whole file is new and therefore the “cp” or existing “rsync -W” are not disadvantaged since the difference IS the whole file for some services. The use of unrealistically low log levels forces us to lift the bonnet on L2R to accommodate a log level that is unlikely to be seen for some time to come.

Oh and another minor point, I’ve noticed that the L2R logfile records very little (if anything) in cp mode whereas it logs the full --verbose output of rsync, so using rsync mode currently also adds to the logs being sync’d each time. If we start using L2R I would look at adding loglevels and more debugging logs to L2R eg the disk stats immediately before and after a L2R sync. And whilst I’m at it I would add a simple line to send the stats to a emonhub socket, so for debugging and running tests (or just displaying in emoncms) a “L2R” socket interfacer would just need enabling in emonhub, This would also help monitor multiple emonbases/Pi’s in a single emoncms. eg comparing several different logging strategies running on several test devices in one graph.

TrystanLea · 22 April 2019 10:57

are you writing the log rotations to the data partition? I would have though the log write load should be on partition 2?

Thanks @pb66 that’s really useful

I think if we can make it easy for the user to switch between logging levels in the emoncms interface that would be a best of both worlds. The default would be for limited ‘error’ level logging to preserve SD card lifespan, then if a bug is found and the error logs are not sufficient, we suggest turning the debug level logs on in order to catch the next instance of the issue.

I think its a step forwards to persist the ‘error’ level logs but am concerned about persisting debug level logs unless we can be sure that the write load is sufficiently small.

Greebo · 22 April 2019 11:31

“sufficiently small” for what precisely? How many “writes” per year do you want to aim for?
This SanDisk paper from 2003 (Archived SanDisk Paper) describes the SanDisk wear levelling algorithm for their flash memory.
Example 2 describes writing data to a 1GB card with a 4kB file being updated every 5 seconds. They predicted (back in 2003) a write lifetime of 317 years. As the size of the card goes up, the wear levelling has more room to play with so the lifespan increases.

How many years should we be aiming for?

I think it would be useful to understand what we’re shooting for because the law of diminishing returns is a thing.

[EDIT] Just for clarity, I’m not suggesting that reduced writes should be ignored. I’ve personally had a card in a RPi go read-only after 12 months logging weather data every 5 seconds. That was my introduction to reduced write methods and logging to RAM every 5 seconds and then flushing to “disk” every 5 minutes was my solution at the time, still going strong today many years later.

TrystanLea · 22 April 2019 11:58

Thanks @Greebo that’s an interesting paper. Ideally it would be in the 15-25 year range?, though the SD card may well fill up before that. In addition to standard write load there would also be a new image version written to the card every 1-2 years perhaps with a full backup and restore operation…

It’s not entirely clear to me what we are achieving now, I replace my SD card more often than not due to new software versions or/and larger SD card capacities and wanting to keep the old SD card for backup.

Greebo · 22 April 2019 12:25

I find myself in the same position!

Is this the right place to have the discussion about what we should be aiming for or should that be a different thread? This seems like the right place as it is directly related, but feel free to split this post into a new thread if that makes more sense.

There’s two ways to approach the question: a) How long before a given sized card will fill up and need to be replaced with a larger one; or b) how long is a reasonable time an EmonSD consumer should expect their SD card (or EmonPI system) to last without any hardware upgrade?

For a), you could take an “average” (like that exists!) scenario and calculate the time to fill a given sized card, i.e. How long before a 4GB card fills up in an EmonPi/emonTX system with 5 CT’s, 2 voltages and 8 temperature inputs and associated feeds? (is there a more “typical” use case to calculate instead?)
[EDIT]
I just checked my EmonSD card, I’ve got an 8GB card logging 3 phase power, 1 x voltage and associated kWh’s, 11 feeds in total. My data partition is using ~~607MB out of 3.8GB with 20 months worth of data~~, so if I make no further changes, I can expect my card to fill up after about ~~128 months (10.6 years)~~ Oops, 344MB of that was backups! Turns out its about 294 months (24.5 years)
[/EDIT]

For b) Lets start with your suggested 15 - 25 years. How many current-gen RPi’s are still going to be around in 15 years? Is Micro SD still going to be the card format then (its almost 15 years old now)? I know my original Pi B used a full SD Card, that hasn’t been the case for any subsequent ones, so I couldn’t use the same card when I upgraded my hardware. I realise none of us can predict the future, but IMHO perhaps 10 years is more realistic?
How likely is it that someone is still going to be using their originally jessie imaged 4GB card in 2026 (10 years after the image was created)?

TrystanLea · 22 April 2019 12:41

Great thought process. I think this is fine to continue here.

For a) 1x PHPFina feed at 10s resolution for 1 year is 12Mb, a SolarPV configuration has 6 feeds: 72 mb per year, so a 16gb card with~4gb for the OS would provide… ~170 years!

I have 60 feeds, of which roughly half are 60s (60Mb/yr) and half 10s interval (360Mb/yr), 420 Mb/year total. A 16GB card should give me 30 years.

We have only recently started shipping with 16Gb SD cards so this kind of duration is relatively new…

Edit: 10 years is probably realistic but it would be great if we can design it for 20 years.

Greebo · 22 April 2019 12:57

Looks like option a) and b) might both have similar timeframes, so not really that helpful a split at all

I know we can’t be sure that the calculation on page 5 of that archived whitepaper is still valid with current technology cards, but how close are we to that “15-25 years” lifespan based on the current write load you’re seeing in your data collection?