Discarding RX frame 'unreliable content'? - After Upgrade to latest version 9.8.28

looking at the emonpiupdate.log there are 2 separate entries that say to check the settings.php after the update, have you done that? It usually means the settings have been revised and the update script wasn’t able to do the update automatically because the setting.php file wasn’t “stock” there were some user specific settings (other than mysql user/password).

settings.php HAS been user modified
Update required to settings.php...
**ERROR: unable to autoupdate settings.php since user changes are present, manual review required**

this second entry seems to be in html, perhaps @glyn.hudson can say why that might be?

<div style='width:600px; background-color:#eee; padding:20px; font-family:arial;'><h3>settings.php file error</h3><p>settings.php config file has new settings for this version. Copy default.settings.php to settings.php and modify the later.</p><p>To fix, check that the settings are set in <i>settings.php</i> or try re-creating your <i>settings.php</i> file from <i>default.settings.php</i> template</p></div>

Beyond that,the fact your remote emoncms works part of the time, but then fails, might be linked to the local emoncms issue but not necessarily so.

Can you plaese provide some emonhub.log when the fault occurs before you reboot? The current emonhub.log shows us that the remote is being updated and is also shows the local mqtt being published.

so can you do the following 3 checks to confirm all the services are running?

sudo systemctl status feedwriter

sudo systemctl status mqtt_input

sudo systemctl status emonhub

can you check these both now (whilst the remote emoncms is ok) and later when the remote emoncms stops updating, but before you reboot?

On the local emoncms can you please check to see if there are any inputs getting updated by uncollapsing the whole page (rapid double click one inputs “+” button and they all open) and looking for any green updated times.

I’m not sure about the LCD but I suspect it’s unrelated at this point.

Hi Paul -

Yes, I had to modify my settings.php manually to input the database connection information.

I suspect that the issue with my remote emoncms failing IS in fact related to my local emoncms (emonpi in my case) since the data has no other way to get to my remote emoncms server except via my local emoncms server.

I am going to start from scratch with the latest sdcard image for the emonpi, and then upload my backup and see if that helps. At least that will get me to the latest code (from scratch) and eliminate any issues with the update script.

I will report back if any of that helps.

OK, so here is the latest update.

I removed my old SD image card from the EmonPi. I downloaded the latest available version, installed it on a new 16GB card, resized it per the instructions, updated all of the emon software via the interface and restarted.

Once the new, clean system came online, I did a basic network config and then used the backup/restore utility to restore the full backup that I made before the upgrade to the latest SD image (emonSD-26Oct17 with low-write 9.8.28 | 2018.01.27).

I rebooted and everything “appears” to be working, but I still have the exact same problem in that nothing is getting logged or written locally, only to my remote EmonCMS system:

What is very interesting is that I am not getting anywhere near the number of “unreliable content” errors that I was getting before I did this upgrade.

However, one of my main problems still exists - nothing is getting written to the local inputs on the local emonpi.

have you rebooted since setting up?

otherwise can you provide the stuff asked for previously? good and bad logs and service checks, check for duplicated inputs etc

Yep, rebooted several times during the process. I will put together the logs and get them posted.

Thanks!

1 Like

OK, have not had the time to get all the logs and stuff requested from @pb66 due to a hectic flight schedule but I did notice that when the local EmonPi stops sending data to my remote emoncms install, simply going into the Emonhub screen and hitting restart fixes the issue, so the problem with getting the data from my local EmonPi to my remote emoncms appears to be with the emonhub program.

I will work on getting the logs over the next couple of days…

Thanks

Another update -

I was seeing all kinds of errors in the EMonPi logs and it kept failing causing me to have to restart the emonhub service so I decided to do another fresh install.

It has been running all day and has not stopped updating my remote emoncms server however it does not update its own feeds, but I feel like I am beating my head against the wall so I am going to give up on the local emonpi. So long as it continue to update my remote server I am happy.

:slight_smile:

1 Like

Well if you change your mind we’ll still be here, but please supply the info we need to help you.

The emonhub.logs leading up to and beyond a failure are the most useful, the status of the services at the time it’s faulting is also very useful.

I hope it runs ok, but the fact there is a current fault that’s not being dealt with, might impact the part that is working, there’s no telling as we do not know what the fault is.

Good Morning Everyone -

So the problem has cropped up again even with the new install. I have attached the emonhub.log that encompasses several iterations of failures, restarts of emonhub followed by success followed by failures. In all cases it appears that everything is working correctly, then I start getting json failures from my remote emoncms server. Once I restart the local emonhub, it works just fine for some period of time (random) before failing again.

emonhub.log.txt (606.8 KB)

I appreciate the help, I am at a loss. This has been running so smooth for several years now. I am not sure what my net step is at this point.

At the beginning of that log there is already over 9500 buffered entries so this log doesn’t show where that particular fault actually occurs, but the url being passed to your emoncms server shows “[1522490634.268513,28,23514.5,Infinity,0.0,-60]” which suggests that on Saturday, 31 March 2018 at 10:03:54.268 node “28” passed an “infinity” as the second value in it’s payload.

That will have caused emoncms to reject that payload and because of the way emonhub buffers and resends data in chronological order, so it has become stuck because until that payload in the buffer is accepted by emoncms it cannot move on. Unless the buffer gets full and starts deleting the oldest data to make room for the newest data, at that point when that particular frame gets discarded emoncms will accept that payload and things can carry on after the buffer is completely uploaded, assuming of course, that is the only occurrence of “infinity”.

After you restarted emonhub it seems to occur again at 2018-03-31 17:20:24,972 (Line 553)

2018-03-31 17:20:24,969 DEBUG    RFM2Pi     10 NEW FRAME : OK 28 51 74 184 70 0 0 128 127 0 0 0 0 (-61)
2018-03-31 17:20:24,971 DEBUG    RFM2Pi     10 Timestamp : 1522516824.97
2018-03-31 17:20:24,972 DEBUG    RFM2Pi     10 From Node : 28
2018-03-31 17:20:24,972 DEBUG    RFM2Pi     10    Values : [23589.099609375, inf, 0.0]
2018-03-31 17:20:24,973 DEBUG    RFM2Pi     10      RSSI : -61

bytes 5 - 8 are “0 0 128 127” which is the float representation of infinity. I do not know what node 28 is but that seems to be the source of the problem. Infinity is not a valid value, emonhub has passed that faithfully to emoncms and emoncms isn’t happy about it.

we can see in line 596 that the url containing the new “infinity” value does not return “ok” instead it returns “Format error, json string supplied is not valid” as “infinity” is not a valid JSON value.

emoncms already accepts “null” despite that not being valid JSON and I have suggested it accepts “NaN” as well but that hasn’t happened as yet. I do not think emoncms is likely to be changed any time soon, if and when it does, I do not know if it will handle this particular case as it’s really rare (possibly the first time I’ve seen it). I have mentioned to @TrystanLea that a mechanism that emoncms can discard and continue would be very useful for the bulk upload methods.

I think the quickest way for you to get this remedied is to ensure your node is sending valid data. The most common cause is usually divide by zero errors.

[edit] I suspect the reason behind the change is because the emonpi version of emonhub had the buffering removed for 2 years and @TrystanLea has not that long ago, re introduced the buffering. Prior to that the emonpi variant of emonhub was just firing the data off in a url and discarding the local copy rather than waiting for a confirmation of receipt before deleting the local copy. This is to reduce/eliminate data loss due to a flaky network etc.

I suspect the invalid data from node 28 just hasn’t been spotted before as emoncms would have still rejected it, but no one knew that all the data in the same payload was being silently discarded.

Paul -

Thank you for the information. Node 28 is the exact same as two other nodes, it measures water utilization from a smart water meter. The code is pretty simple and it is the same on all three units except for the node number, other than that the code is identical.

This is what I get from Node 28:

2018-03-31 21:34:49,230 DEBUG    RFM2Pi     4571 From Node : 28
2018-03-31 21:34:49,231 DEBUG    RFM2Pi     4571    Values : [23604.900390625, 0.0, 0.699999988079071]
2018-03-31 21:34:49,232 DEBUG    RFM2Pi     4571      RSSI : -63
2018-03-31 21:34:49,232 DEBUG    RFM2Pi     4571 Sent to channel(start)' : ToEmonCMS
2018-03-31 21:34:49,233 DEBUG    RFM2Pi     4571 Sent to channel(end)' : ToEmonCMS

So I guess I don’t understand what you are saying, if emonhub receives data that it does not know what to do with it crashes and requires a restart?

Is this something new in the code? I have been running this setup for almost 2 years now and NEVER had this issue, not even once until I upgraded to the latest version of code. If it is a code change, then is my only option to install an old version of code until the problem is resolved within the emonhub software?

You had said the Node 28 sent an infinity and that “emonhub has passed that faithfully to emoncms and emoncms isn’t happy about it.” But it’s not emoncms that is crashing, it is the emonhub that quits working and requires a restart.

I am running 9.8.10 on the receiving emoncms server and 9.8.28 on the transmitting one, would that make any difference?

The other thing you said is that it was node 28 causing the problems, but I have the exact same code running on three different water meters with no problems. Maybe I have screwed something up in my code for those nodes and it is just showing up now with the new software?

I have attached the sketch running Node 28, maybe I missed something obvious.

Or maybe the sensor itself is going bad and throwing bad data?

moteino_R5_RFM69HW_Pulse_RJS_Pool_Meter.ino.txt (19.9 KB)

Looking at the sketch for Node 28, I’m not convinced that this line is innocent:

  water.GPM = pulseAVGInterval > 0 ? 60.0 * 1000 * (1.0/PULSESPERGALLON)/(pulseAVGInterval/pulsesPerXMITperiod):0;

pulsesPerXMITperiod hasn’t been tested for zero. The parentheses around pulseAVGInterval/pulsesPerXMITperiod force that to be evaluated second (after 1.0/PULSESPERGALLON) but if pulsesPerXMITperiod is zero, then that surely will produce a divide-by-zero error, which will propagate back into water.GPM
I think the cure will be to put that on the top and multiply:

  water.GPM = pulseAVGInterval > 0 ? 60.0 * 1000.0 / PULSESPERGALLON * pulsesPerXMITperiod / pulseAVGInterval : 0;

I have no idea what the nodes are. I could just see on this occasion that node 28 is responsible. If they are all identical then it stands to reason that is is most likely not always node 28 that causes the problem. Unless of course the data node 28 reports happens to be different (eg less water usage etc), I don’t know enough about your setup, the data you are monitoring or the behavior of the service being monitored.

Around 2.5 years ago Trystan created a “emonpi variant” of emonhub, in his implementation, the intended behavior of emonhub was lost and for 2.5years the emonpi variant of emonhub just fired whatever data it had off over http and that was that, it did not wait for a confirmation before discarding the local copy. What emonhub used to do, should have done and now does again, is to wait for an “ok” from emoncms to say it has resieved the data and it’s ok to delete the original local copy.

For 2.5 years it hasn’t mattered whether the data was good or bad, received or not, emonhub just moved on to the next load of data.

Your node 28 (and possibly the others) has basically been ok to send invalid data, because emoncms could reject the data and emonhub didn’t care. The error in the emonpi variant of emonhub hid the error in your node(s) and now it’s fixed your node can’t getting away with it any more.

What is happening now is that when emonhub doesn’t get that “ok” it will buffer the data and keep trying to get emoncms to accept it, but emoncms isn’t having any of it, so the buffer just keeps growing.

emonhub isn’t failing, it’s just not able to off load the data onto emoncms because it’s invalid and emoncms doesn’t want it.

When you restarted emonhub (shown in your log) emonhub was still buffering nearly 10000 frames of data and trying in vain to pass them to emoncms, it’s trying to do it’s job but emoncms won’t allow it, because the data is invalid.

I wouldn’t say “screwed something up” but what you thought was ok when you wrote it, wasn’t ok, but no one noticed because emonhub wasn’t working as intended and didn’t care if emoncms didn’t accept the data. In this instance 2 wrongs did make a right, for a while. However, it was never totally right because the invalid values were causing other data to get discarded with it.

It appears that Node 28 is the only one that ever sends the “128 127” in it data stream.

Robert - I will try changing out the offending line and see if that fixes the issue.

Thank you both for your help and Paul for your great explanation!!

I will report back and let you know what happens!!

If all the nodes have the same code, I wouldn’t bet on the remainder being OK. That line can produce invalid data, and the law of natural perversity say that one day, it will.

More for my own purpose here, I want to make a note I can refer back to for future emonhub development.

Although not the case here, there could be a scenario where the node definition tries to scale the value passed through emonhub. In this instance we do not want emonhub to crash because it cannot multiply “infinity” by what ever scale is set by the user.

What does happen in that scenario will need to comply with emonhub’s global policy on handling errors without interfering with the data, perhaps we will just not do the math if it’s not possible, leaving the source data as it is. (eg infinity is infinity whether it’s multiplied by 0.01 or not).

But as emoncms still won’t accept that and emonhub needs to be robust, perhaps we will end up omitting a result (null) so the rest of that frame is accepted by emoncms, this depends on whether emoncms can handle the invalid data.

Ideally any emoncms specific casting of data should be done in the emoncms interfacer/reporter, but that adds additional processing.

or emitting?

Possibly, I guess omitting a value is the same as emitting a null. Although emonhub is actually written in Python so null is nothing more than a string, it would be None but then emoncms is PHP so the string “None” has no meaning, that’s why omitting is simpler val1,val2,,val4,val5 here we can all see val3 is empty, neither “null” or “None” and yet it’s both null and None depending on which end is looking at it, If that makes any sense whatsoever!

Just a quick update Paul & Robert -

After updating the code and reflashing the mote, all seems well. I will keep an eye on it and let you know if it keeps behaving like it is supposed to in a day or two.

Thank you both very much for your help, this is not something I would have ever caught on my own. Maybe I should have had my 13-year-old math genius daughter help with the math!

:slight_smile:

I do very much appreciate the time both of you put into helping me.

4/1/2018 - Still running great guys…

2 Likes

Paul & Robert -

Just a quick followup. The change in the math did the trick. I have not had a single problem with emonhub having to be restarted since changing the sketch.

My local emonpi still won’t log data but since I use my offsite server for all of my data gathering anyway, this is not a problem I need to address right now.

Thank you both again for your help!

Edit - removed [Solved] from thread name and marked issue as solved. BT - Moderator

2 Likes