Whilst I think a failed SD card is unlikely, I’m not certain enough to rule it out as a possibility. Installing to another SD card (new or otherwise) would have gone a long way towards putting that idea to bed.
Sorry, but I’m still a little vague on exactly what you’ve done. Are these the same errors as last time, and were they seen before or after attempting stuff? Your last post seems to suggest the errors are NOT the same.
What we need to move forward is precise information about the status when it first fails and before any attempts to restart or reboot.
Yes, that is the issue you are eventually seeing, but emonhub does start perfectly OK and runs without issue for some time (from a fresh image). Then something happens! emonhub doesn’t suddenly need to restart for no apparent reason. Something is causing emonhub to NEED to restart first, e.g. you are updating, you have rebooted, there has been a power outage, or there has been a crash due to an error induced within emonhub, external to emonhub, or simply due to the log partition filling up. We need to know why it needs to restart.
If you are flying down a motorway in your car and it suddenly goes kaput (and then won’t restart again), the issue is that it cut out whilst driving, not that it’s a non-starter. The non-starting is a subsequent symptom; it’s not the same as waking up one morning to find the car doesn’t start. The fact you were doing 70mph and there was a loud bang and loads of smoke is far more useful info than a description of what now happens when it will no longer start.
Yes, 8 emonTx’s is quite a load, but I have had the original emonhub running with 9 emonTx’s reporting at 5sec intervals alongside 12 emonTH’s. There are differences in that my emonTx’s were connected via USB, but the traffic within emonhub (albeit a different version) would have been well over double what you have, so it is not an obvious concern (yet). Again though, I cannot rule it out due to the differences.
If there were going to be issues with so many emonTx’s, I would have expected them to be in the RF domain, not within emonhub, e.g. RF collisions due to 8 unsync’d 10s transmissions. However, if you feel strongly about it being a traffic thing, you could try running with fewer devices for a trial period, or alter the reporting period so they report half as often (e.g. a 20s interval) to see if that has an effect.
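To put a rough number on the RF collision idea, here is a minimal back-of-envelope sketch. It assumes every node sends one packet per interval at an independent random offset, and that two packets collide if their airtimes overlap at all. The 5 ms airtime is purely an assumed figure for illustration, not a measured value for the emonTx, so substitute a real one if you have it.

```python
def collision_probability(n_nodes: int, interval_s: float, airtime_s: float) -> float:
    """Chance that one node's packet overlaps a packet from at least one other node.

    Simplified model: unsynchronised nodes, equal airtime, one packet per interval,
    each node's start time uniformly random within the interval.
    """
    if n_nodes < 2:
        return 0.0
    # Another packet collides if it starts within one airtime either side of ours.
    vulnerable_fraction = min(1.0, 2.0 * airtime_s / interval_s)
    # Probability that none of the other nodes lands in that window, inverted.
    return 1.0 - (1.0 - vulnerable_fraction) ** (n_nodes - 1)


ASSUMED_AIRTIME_S = 0.005  # assumption for illustration only

for interval in (10.0, 20.0):
    p = collision_probability(8, interval, ASSUMED_AIRTIME_S)
    print(f"8 nodes, {interval:.0f}s interval: {p:.4%} of packets expected to collide")
```

Under these assumptions the collision rate is small and roughly halves when the interval doubles, which is why comparing a 10s trial against a 20s trial should show whether traffic is a factor at all.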
The problem I’m having with getting my head around this is that the issue sticks, as if it is damaging something that doesn’t get rectified during a reboot, but does get rectified by redoing the image. This suggests something is being written to a file that is used in subsequent reboots.
As well as the known issue with RAM filling up, I have since learnt about some additional issues with the log rotation. I would be keen to walk you through some changes to revert emonhub to log directly to /var/log and to check your log rotation settings too.
The trouble we have with starting from a new image is that it pulls in all the latest updates on firstboot, so each time we do that the playing field changes. We do not know if the current version will fail or not until it does. Likewise, if you start afresh and we make some changes and it’s successful, it’s not clear whether it’s the changes or the latest update that has resolved things.
Actually, another good test might be to block the firstboot updates (on a fresh image) to see if that runs for a longer period of time (without pulling in any updates).
Whichever method you opt for, when/if it fails, that is the condition we want to know about, so do not try any restarts or reboots. Currently a reboot will wipe all the relevant log data, and we now know that emonhub will not restart after it fails, so there is nothing to gain by trying to reboot/restart. All that does is change the debug data to something possibly quite different to the initial failure.
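One way to preserve that first-failure state before anything gets wiped is to copy the logs somewhere that survives a reboot. The following is only a sketch of the idea: the function name and the destination folder are hypothetical, and the destination must be on persistent storage rather than on the log partition itself.

```python
import shutil
import time
from pathlib import Path


def snapshot_failure_state(log_dirs, dest_root):
    """Copy every file under each log directory into a timestamped folder.

    Also records how full each log directory's partition is, since a full
    log partition is one of the suspected causes.
    """
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = Path(dest_root) / f"failure-{stamp}"
    dest.mkdir(parents=True, exist_ok=True)

    report = []
    for log_dir in map(Path, log_dirs):
        if not log_dir.is_dir():
            report.append(f"{log_dir}: not found")
            continue
        usage = shutil.disk_usage(log_dir)
        report.append(
            f"{log_dir}: {usage.used} of {usage.total} bytes used, {usage.free} free"
        )
        for src in sorted(log_dir.rglob("*")):
            if not src.is_file():
                continue
            target = dest / log_dir.name / src.relative_to(log_dir)
            target.parent.mkdir(parents=True, exist_ok=True)
            try:
                shutil.copy2(src, target)  # copy2 keeps the modification times
            except OSError as err:
                report.append(f"could not copy {src}: {err}")

    (dest / "disk_usage.txt").write_text("\n".join(report) + "\n")
    return dest


# Example call (hypothetical destination, run BEFORE any restart or reboot):
# snapshot_failure_state(["/var/log"], "/home/pi/failure-snapshots")
```

Keeping the file timestamps matters here, as they show what was last written to just before things went wrong. Reading everything under /var/log may need root privileges; files that cannot be read are noted in disk_usage.txt rather than stopping the copy.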
The main maintainers are @TrystanLea and @glyn.hudson, but I wrote the “experimental” PoC version of emonhub that was taken and modified quite substantially to produce the emonpi variant. So I have a good understanding of what emonhub used to do or what it should do, but it is not my code that is deployed. Unfortunately, I do not use the emonpi variant or the emonSD myself, so the bulk of my “emonpi” experience is derived solely from helping debug issues such as yours.