Hi all, we have 5 units (emonTX/emonBASE setups) running remotely from our office, all sending data to Emoncms.org over the internet, which is generally working quite well.
However, we have had some intermittent failures where a device stops sending data and the emoncms.org screen shows it in red. Generally, all we need to do to resolve the issue is turn the power to the unit off and on again; this sort of “kick starts” the unit and it continues recording.
To help find a resolution to the sudden stops, is there a way of getting into the unit and downloading, e.g., a log file that might paint a picture of why the failures happen? I can see the logs for emonhub etc. from the UI of the emonBASE device, but they seem to be temporary files.
Thanks for the info. I will ssh into the unit next time I’m on site and download the file. Hopefully something will come up on it.
The length of time between failures varies, and I wouldn’t rule out drops in the internet connection in some cases, e.g., one particular failure happened last weekend during a storm that caused an internet outage in the area. It’s just the more random failures that are probably more difficult to investigate, especially off-site.
What vintage of emonSD images are you running @bez ?
It is strongly advised that you do not remove the power without shutting the Pi down first. It can be tempting and seem easier to just flick the switch or pull the power cord, but that can have a terminal impact on the SD card.
If you can access the local emoncms instance, which you seem to be able to for accessing the emonhub logs, there should be a restart or shutdown button on the admin page of emoncms. Or you can ssh in and issue a shutdown command. Either will result in a more graceful and less risky shutdown.
If you are running an image dated/released in the last year or so, it should be running log2ram, which backs up all logs to /var/log.old once every hour AND at (a graceful) shutdown. So if you want recent/historic logs, check out /var/log.old, bearing in mind how you shut down: if you pulled the power you may have up to 1 hour missing, but after a graceful shutdown they should all be there.
By “all” I mean not only the emoncms, emonhub and mosquitto logs, but also syslog etc.
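If you would rather script the check than browse by hand, here is a minimal sketch that lists whatever has been preserved, newest first. It assumes only that the copies live under /var/log.old as described above; the function name list_saved_logs is made up for illustration and is not part of emonSD or log2ram.

```python
from datetime import datetime
from pathlib import Path


def list_saved_logs(log_dir="/var/log.old"):
    """Return (path, size_bytes, modified) for every file under log_dir, newest first.

    log_dir defaults to the backup location mentioned above; pass a different
    directory if your image keeps its copies elsewhere.
    """
    entries = []
    for path in Path(log_dir).rglob("*"):
        if path.is_file():
            info = path.stat()
            modified = datetime.fromtimestamp(info.st_mtime)
            entries.append((path, info.st_size, modified))
    # Most recently written files first, so the run-up to a failure is at the top.
    return sorted(entries, key=lambda entry: entry[2], reverse=True)
```

Run on the Pi, looping over list_saved_logs() and printing each entry gives a quick view of which logs survived and when each was last written, which also shows whether the last hour is missing.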
P.S. Whilst the logs may help after a reboot, the best time to debug this is when the data is not flowing, before you reboot. As tempting as it is to get things running ASAP, another hour of data loss while you debug (whilst the fault is present) might result in finding the cause and preventing any future loss.
The problem is emonhub controls log rotation and this happens quite quickly so occasional errors are difficult to identify. I did previously suggest a method of being able to extend the volume/timespan of emonhub logs retained for just this reason.
If emonhub is controlling the rotation something is broke. The rotation should be controlled by log2ram. The in-built log rotation in emonhub shouldn’t come into play if log2ram is working and set up correctly.
The settings may not be ideal. For example, if an emonhub instance regularly produces 2.5mb per hour, there is a chance it could reach emonhub’s 5mb rotation setting before log2ram runs again, because of the 3mb maxsize (2.5mb is under the limit at the first check, and 5mb is then reached within the next hour). Ideally that limit should be less than half of emonhub’s, i.e. 2mb. That still won’t cover runaway faults, but that is a different issue.
The number of old logs kept, and for how long, is another discussion. Just upping the log count is by far the easiest route, but that means lots of small files to check. Stitching all the rotated files together as part of a post-rotation routine is more complex, and I think that’s where dev stalled.
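To make that grey area concrete, here is a rough sketch of the race between the two rotation systems. It is a simplified model, not measured behaviour: it assumes the log grows at a constant rate and that the size-based check only runs once an hour. The function name and its parameters are invented for the illustration.

```python
def emonhub_rotates_first(rate_mb_per_hour, logrotate_maxsize_mb,
                          emonhub_max_mb=5.0, hours=24):
    """Return True if emonhub's own size limit is hit before the hourly check rotates.

    Assumptions: constant growth, one size check per hour, and a rotation
    resets the live log to empty.
    """
    size_mb = 0.0
    for _ in range(hours):
        size_mb += rate_mb_per_hour          # one hour of logging
        if size_mb >= emonhub_max_mb:
            return True                      # emonhub's built-in rotation fires
        if size_mb >= logrotate_maxsize_mb:
            size_mb = 0.0                    # hourly check rotates the file
    return False


print(emonhub_rotates_first(2.5, 3.0))  # True: 2.5 passes the check, then 5.0 is reached
print(emonhub_rotates_first(2.5, 2.0))  # False: rotated every hour at 2.5
```

With the limit at 2mb, any steady rate up to 2.5mb per hour gets rotated by the hourly check before emonhub’s 5mb is reached, which is the reasoning behind “less than half”.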
Also the rate of rotation will depend on the traffic, with just one emontx I suspect traffic and therefore logging to be lower. YMMV if you have more going on.
The point I was making above was that the logs may well be there (but it’s not guaranteed). They are no longer just in /var/log, so you shouldn’t assume all logs are lost on a reboot. Even more important is not to reboot until you have checked the logs, if that is possible.
I suggest you check again, we’ve had this discussion more than once! And you have repeated this error with Bruce too.
So, I’m at least glad you have struck out the value in your post above, because the line you have quoted to me, which by the way has been there pretty much since day one of original emonhub and is the line I have relentlessly pointed at over the last few years, is 5000 x 1024 bytes, which is 5000 kilobytes or a little shy of true 5 megabytes (5120 x 1024 or 5 x 1024 x 1024 would be 5 megabytes, but we are not here to split hairs, it’s a long way from “500mb”).
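For anyone who wants to check the sums, the arithmetic works out as below. This is plain arithmetic on the figures quoted above, nothing emonhub-specific; the variable names are just for the illustration.

```python
KIB = 1024

quoted_limit = 5000 * KIB        # the value in the quoted line
true_5_mib = 5 * KIB * KIB       # 5 x 1024 x 1024
claimed_500mb = 500 * KIB * KIB  # what "500mb" would have been

print(quoted_limit)                          # 5120000
print(true_5_mib)                            # 5242880
print(round(quoted_limit / (KIB * KIB), 2))  # 4.88
print(claimed_500mb // quoted_limit)         # 102
```

So the quoted limit is about 4.88 “true” megabytes, and roughly a hundred times smaller than 500mb.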
You are correct in “detecting” that the count and size can now be set in the conf; this “feature” was slipped in amongst Bruce’s Python3 changes by Trystan 10 months ago.
But since I’m not aware of these settings being provided in the current default conf, I expect them to be at their original settings, 1 x 5mb.
Going back to what I said in my last post.
5mb was plenty big enough during development, but G&T decided to globally set all rotations at 1 x 1mb in the emonSD images, which crippled emonhub logging and set the precedent that you refer to, where the logs are too small/short/few for any proper debugging. In the long log2ram discussions no decision was reached. Then, during testing, Trystan experienced why we needed to set a log2ram (or rather a logrotate) limit for emonhub so that the 2 rotation systems are not working against each other. He chose 3mb, but I think 2mb would be better to remove that grey area in my example above.
What happens to the logs after rotation is managed by logrotate, not emonhub or log2ram. Since the global count is set to 7, there should be 7 x 3mb = 21mb of logs, which should be ample.
To improve logging, either the logrotate setting needs reducing to 2mb or the emonhub log_max_bytes needs increasing. The aim is not JUST to set the rotation size but to ensure emonhub’s limit is at least double the logrotate setting, to avoid the example given.
To increase the number of logs retained, simply add a rotate 20 directive, or whatever value suits better.
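One knock-on effect of the sizes above is worth spelling out: dropping the rotation size from 3mb to 2mb while leaving the count at 7 shrinks the total history kept. A small sketch of the arithmetic follows; the function names are invented for the example, and it treats every rotated file as exactly the rotation size, which is an upper bound rather than what you will see on disk.

```python
import math


def retained_mb(rotate_count, rotate_size_mb):
    """Upper bound on rotated log data kept, in the same "mb" units as above."""
    return rotate_count * rotate_size_mb


def rotations_needed(target_mb, rotate_size_mb):
    """Smallest rotate count that keeps at least target_mb of history."""
    return math.ceil(target_mb / rotate_size_mb)


print(retained_mb(7, 3))        # 21
print(retained_mb(7, 2))        # 14
print(rotations_needed(21, 2))  # 11
print(retained_mb(20, 2))       # 40
```

So at 2mb a count of around 11 keeps roughly the same 21mb of history, and a rotate 20 would keep up to 40mb.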
So, now going back to my first post, Ryan should have quite a few logs to look at, whether he has pulled the power or not will determine if any will be missing, and if that isn’t the case “something (else) is broke”.
Also don’t forget the log partition is just 40mb and has been since the days of the Pi B with 500mb ram, despite my efforts to increase it to 50mb on Pis with less than 1GB and 100mb for 1GB and above. So emonhub’s 1 x 5mb (rotated) log settings could use up to 2 x 5mb, which is 25% of the ram limit (between log2ram iterations), whereas the current settings could use less than 3mb (13%). So be wary of just upping the emonhub settings without looking at the partition size; we don’t want it filling up again.
I have no idea if setting backupCount to zero in emonhub would prevent a second file being produced, but that may not stop emonhub from “rotating” and effectively just clearing the log; I haven’t looked closely at Bruce’s code. However, I would hate to see emonhub’s logging “fiddled with” until, at the very least, logrotate/log2ram is working 100%, 100% of the time. emonhub’s own logging has been very stable and useful over the years, and we need to fix the replacement before taking away the fallback.
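On the backupCount question, one way to find out without touching a live system is to try it against Python’s standard library handler. The sketch below assumes emonhub’s log goes through logging.handlers.RotatingFileHandler; I have not checked that against the current code, so treat it as an experiment on the stdlib class rather than a statement about emonhub. The file name, logger names and the small 1024-byte limit are made up to keep the test quick.

```python
import logging
import logging.handlers
import os
import tempfile


def write_log(backup_count, max_bytes=1024, lines=200):
    """Log `lines` messages through a RotatingFileHandler in a fresh temp dir.

    Returns (size of the live log in bytes, sorted list of files in that dir).
    """
    log_dir = tempfile.mkdtemp()
    log_path = os.path.join(log_dir, "demo.log")   # hypothetical file name
    handler = logging.handlers.RotatingFileHandler(
        log_path, maxBytes=max_bytes, backupCount=backup_count)
    logger = logging.getLogger(f"demo-{backup_count}")
    logger.setLevel(logging.INFO)
    logger.propagate = False                       # keep output out of the root logger
    logger.addHandler(handler)
    for i in range(lines):
        logger.info("message number %04d", i)
    logger.removeHandler(handler)
    handler.close()
    return os.path.getsize(log_path), sorted(os.listdir(log_dir))


print(write_log(backup_count=1))  # live log stays under max_bytes, plus demo.log.1
print(write_log(backup_count=0))  # single file that has grown past max_bytes
```

With the stdlib handler, a backupCount of zero does not clear the log: no second file appears, but the live file just keeps growing past maxBytes, which would be the opposite of helpful on a 40mb partition.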