Discarding RX frame 'unreliable content'? - After Upgrade to latest version 9.8.28

richard · 24 March 2018 01:54

Hello -

I was running an older version of EmonPi and decided to do an update today via the admin panel. I did a backup and then did the upgrade and other than having to update my MySQL username/password in settings.php everything went well.

Unfortunately, it turns out that something did not go quite as well as planned as I now have hundreds (if not thousands) of these entries in my log file:

Discarding RX frame ‘unreliable content’? 0 0 0 0 32 24 0 192 96 0 96 32 32 2 0 64 8 1 32 40 4 (-100)

All my nodes are offine as of now with errors like this:

2018-03-24 03:06:24,024 DEBUG    RFM2Pi     356 NEW FRAME : OK 25 208 2 32 0 0 0 (-85)
2018-03-24 03:06:24,025 DEBUG    RFM2Pi     356 Timestamp : 1521860784.02
2018-03-24 03:06:24,026 DEBUG    RFM2Pi     356 From Node : 25
2018-03-24 03:06:24,026 DEBUG    RFM2Pi     356    Values : [720, 3.2, 0]
2018-03-24 03:06:24,027 DEBUG    RFM2Pi     356      RSSI : -85
2018-03-24 03:06:24,027 DEBUG    RFM2Pi     356 Sent to channel(start)' : ToEmonCMS
2018-03-24 03:06:24,028 DEBUG    RFM2Pi     356 Sent to channel(end)' : ToEmonCMS
2018-03-24 03:06:26,976 DEBUG    RFM2Pi     Discarding RX frame 'unreliable content'? 0 60 160 0 32 8 0 
32 0 40 0 40 0 50 128 0 32 32 0 59 32 (-102)
2018-03-24 03:06:27,796 DEBUG    RFM2Pi     Discarding RX frame 'unreliable content'? 2 4 32 0 60 128 5 
32 0 49 60 8 201 96 32 2 5 160 192 16 52 (-104)

Many Thanks

Robert.Wall · 24 March 2018 12:21

That’s rubbish data that you’re seeing there. The number on the end "(-100) indicates that the received signal strength is very low, and the message doesn’t start with a recognised NodeID - unlike the “OK” message that came from Node 25, at a signal strength of -85. So those are transmissions from somewhere else, not part of your system, that fail the checksum. So they get thrown out.

I’m not an emonPi expert, but that looks to be incorrect. In the clip, I can see Node 25 was received and sent to emonCMS.

pb66 · 24 March 2018 13:09

As @Robert.Wall says the ‘unreliable content’ isn’t your valid data. It might be that your early emonPi didn’t show those discarded packets, the “quiet” setting was missing from the early emonPi’s and they were all effectively set to “quiet = true”, you are now using “quiet = false” which is a feature you can switch on to debug missing or corrupt data packets etc.

If you are getting swamped by distant signals that might make it difficult for your valid data to get through so the log entries might prove useful, but you could have a perfect functional system and also get many of these “foreign” packets, it could be some incompatible equipment using the same frequency, it could be a neighbors equipment.

I see nothing there to suggest anything is offline, can you provide more emonhub.log and tell us what nodes you expect to see in there?

richard · 25 March 2018 17:00

Thanks, Robert -

I don’t know why the logs are showing that is was successful but I can tell you without reservation that since I upgraded, nothing is getting through.

I have rebooted, double checked the configuration, rebooted again and again and still nothing.

I also have it sending to another emoncms (using [[emoncmsorg]] in my emonhub.conf) installed on another system and it has been working fantastic for that last 2 years without a single hitch until this upgrade, so something broke and I am not sure where to go from here.

In the above example, that is my system on another system located in my colo space in a data center. GEM is an electrical monitoring unit that talks directly to EmonCMS without talking to the EmonPi.

Any help would be greatly appreciated.

pb66 · 25 March 2018 17:24

we need to see significantly more to determine what might be happening

to be clear, are you saying that the data is no longer reaching the other emoncms server via http(s) either?

Can you also post the emonpiupdate.log from the ~/data folder?

Can you post your emonhub.conf too, fudge any private urls or apikeys but please try to not make too many changes and tell us what you have changed.

richard · 25 March 2018 18:19

I have an EmonPi that receives wireless data from my sensors. Then I have a second EmonCMS server that receives this same data via this entry in my emonhub.conf:

[[emoncmsorg]]
Type = EmonHubEmoncmsHTTPInterfacer
[[[init_settings]]]
[[[runtimesettings]]]
    pubchannels = ToRFM12,
    subchannels = ToEmonCMS,
    url = http://emoncms.mydomain.com
    apikey = xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    senddata = 1                    # Enable sending data to Emoncms.org
    sendstatus = 0                  # Enable sending WAN IP to Emoncms.org MyIP > https://emoncms.org/myip/list
    sendinterval= 5                # Bulk send interval to Emocnms.org in seconds

Everything was working great, but I had been running release emonSD-03May16 and saw that I was way behind, so I made a backup using the backup utility and then did the upgrade using the built-in upgrade utility.

Once all of this was done, I had to update settings.php and my emonhub.conf, restarted the emonpi and then that is when I started seeing a ton of the ‘unreliable content’ entries in my logs and was not seeing anything getting to my feeds on my local EmonPi. I WAS seeing data getting to my second EmonCMS server so it seems like the EmonPi is receiving the data and forwarding it to my second server, but locally it is not seeing (or maybe writing) the data.

So I thought I was ok since I only really use the local EmonPi to get the data wirelessly but I use my data center copy of EmonCMS to store everything and interact with all of my scripts and home automation. Then my local EmonPi stopped sending data to my remote EmonCMS server.

When I reboot the local EmonPi, it will start sending data again, but only for about 3 or 4 hours, then it stops sending the data and I have to reboot it again.

Another thing to note that I am not sure make any difference but thought I would share anyway:

The LCD display on the local EmonPi is screwed up. When it boots up it is just fine, shows the IP address and the release but then just goes to all gibberish.

Thanks for the help!!

emonhub.conf.txt (4.9 KB)
emonpiupdate.log.txt (38.3 KB)
emonhub.log.txt (1.1 MB)

pb66 · 25 March 2018 21:19

looking at the emonpiupdate.log there are 2 separate entries that say to check the settings.php after the update, have you done that? It usually means the settings have been revised and the update script wasn’t able to do the update automatically because the setting.php file wasn’t “stock” there were some user specific settings (other than mysql user/password).

settings.php HAS been user modified
Update required to settings.php...
**ERROR: unable to autoupdate settings.php since user changes are present, manual review required**

this second entry seems to be in html, perhaps @glyn.hudson can say why that might be?

<div style='width:600px; background-color:#eee; padding:20px; font-family:arial;'><h3>settings.php file error</h3><p>settings.php config file has new settings for this version. Copy default.settings.php to settings.php and modify the later.</p><p>To fix, check that the settings are set in <i>settings.php</i> or try re-creating your <i>settings.php</i> file from <i>default.settings.php</i> template</p></div>

Beyond that,the fact your remote emoncms works part of the time, but then fails, might be linked to the local emoncms issue but not necessarily so.

Can you plaese provide some emonhub.log when the fault occurs before you reboot? The current emonhub.log shows us that the remote is being updated and is also shows the local mqtt being published.

so can you do the following 3 checks to confirm all the services are running?

sudo systemctl status feedwriter

sudo systemctl status mqtt_input

sudo systemctl status emonhub

can you check these both now (whilst the remote emoncms is ok) and later when the remote emoncms stops updating, but before you reboot?

On the local emoncms can you please check to see if there are any inputs getting updated by uncollapsing the whole page (rapid double click one inputs “+” button and they all open) and looking for any green updated times.

I’m not sure about the LCD but I suspect it’s unrelated at this point.

richard · 25 March 2018 21:49

Hi Paul -

Yes, I had to modify my settings.php manually to input the database connection information.

I suspect that the issue with my remote emoncms failing IS in fact related to my local emoncms (emonpi in my case) since the data has no other way to get to my remote emoncms server except via my local emoncms server.

I am going to start from scratch with the latest sdcard image for the emonpi, and then upload my backup and see if that helps. At least that will get me to the latest code (from scratch) and eliminate any issues with the update script.

I will report back if any of that helps.

richard · 25 March 2018 22:20

OK, so here is the latest update.

I removed my old SD image card from the EmonPi. I downloaded the latest available version, installed it on a new 16GB card, resized it per the instructions, updated all of the emon software via the interface and restarted.

Once the new, clean system came online, I did a basic network config and then used the backup/restore utility to restore the full backup that I made before the upgrade to the latest SD image (emonSD-26Oct17 with low-write 9.8.28 | 2018.01.27).

I rebooted and everything “appears” to be working, but I still have the exact same problem in that nothing is getting logged or written locally, only to my remote EmonCMS system:

What is very interesting is that I am not getting anywhere near the number of “unreliable content” errors that I was getting before I did this upgrade.

However, one of my main problems still exists - nothing is getting written to the local inputs on the local emonpi.

pb66 · 25 March 2018 22:51

have you rebooted since setting up?

otherwise can you provide the stuff asked for previously? good and bad logs and service checks, check for duplicated inputs etc

richard · 25 March 2018 23:18

Yep, rebooted several times during the process. I will put together the logs and get them posted.

Thanks!

richard · 27 March 2018 01:08

OK, have not had the time to get all the logs and stuff requested from @pb66 due to a hectic flight schedule but I did notice that when the local EmonPi stops sending data to my remote emoncms install, simply going into the Emonhub screen and hitting restart fixes the issue, so the problem with getting the data from my local EmonPi to my remote emoncms appears to be with the emonhub program.

I will work on getting the logs over the next couple of days…

Thanks

richard · 29 March 2018 02:12

Another update -

I was seeing all kinds of errors in the EMonPi logs and it kept failing causing me to have to restart the emonhub service so I decided to do another fresh install.

It has been running all day and has not stopped updating my remote emoncms server however it does not update its own feeds, but I feel like I am beating my head against the wall so I am going to give up on the local emonpi. So long as it continue to update my remote server I am happy.

pb66 · 29 March 2018 11:43

Well if you change your mind we’ll still be here, but please supply the info we need to help you.

The emonhub.logs leading up to and beyond a failure are the most useful, the status of the services at the time it’s faulting is also very useful.

I hope it runs ok, but the fact there is a current fault that’s not being dealt with, might impact the part that is working, there’s no telling as we do not know what the fault is.

richard · 31 March 2018 18:03

Good Morning Everyone -

So the problem has cropped up again even with the new install. I have attached the emonhub.log that encompasses several iterations of failures, restarts of emonhub followed by success followed by failures. In all cases it appears that everything is working correctly, then I start getting json failures from my remote emoncms server. Once I restart the local emonhub, it works just fine for some period of time (random) before failing again.

emonhub.log.txt (606.8 KB)

I appreciate the help, I am at a loss. This has been running so smooth for several years now. I am not sure what my net step is at this point.

pb66 · 31 March 2018 19:13

At the beginning of that log there is already over 9500 buffered entries so this log doesn’t show where that particular fault actually occurs, but the url being passed to your emoncms server shows “[1522490634.268513,28,23514.5,Infinity,0.0,-60]” which suggests that on Saturday, 31 March 2018 at 10:03:54.268 node “28” passed an “infinity” as the second value in it’s payload.

That will have caused emoncms to reject that payload and because of the way emonhub buffers and resends data in chronological order, so it has become stuck because until that payload in the buffer is accepted by emoncms it cannot move on. Unless the buffer gets full and starts deleting the oldest data to make room for the newest data, at that point when that particular frame gets discarded emoncms will accept that payload and things can carry on after the buffer is completely uploaded, assuming of course, that is the only occurrence of “infinity”.

After you restarted emonhub it seems to occur again at 2018-03-31 17:20:24,972 (Line 553)

2018-03-31 17:20:24,969 DEBUG    RFM2Pi     10 NEW FRAME : OK 28 51 74 184 70 0 0 128 127 0 0 0 0 (-61)
2018-03-31 17:20:24,971 DEBUG    RFM2Pi     10 Timestamp : 1522516824.97
2018-03-31 17:20:24,972 DEBUG    RFM2Pi     10 From Node : 28
2018-03-31 17:20:24,972 DEBUG    RFM2Pi     10    Values : [23589.099609375, inf, 0.0]
2018-03-31 17:20:24,973 DEBUG    RFM2Pi     10      RSSI : -61

bytes 5 - 8 are “0 0 128 127” which is the float representation of infinity. I do not know what node 28 is but that seems to be the source of the problem. Infinity is not a valid value, emonhub has passed that faithfully to emoncms and emoncms isn’t happy about it.

we can see in line 596 that the url containing the new “infinity” value does not return “ok” instead it returns “Format error, json string supplied is not valid” as “infinity” is not a valid JSON value.

emoncms already accepts “null” despite that not being valid JSON and I have suggested it accepts “NaN” as well but that hasn’t happened as yet. I do not think emoncms is likely to be changed any time soon, if and when it does, I do not know if it will handle this particular case as it’s really rare (possibly the first time I’ve seen it). I have mentioned to @TrystanLea that a mechanism that emoncms can discard and continue would be very useful for the bulk upload methods.

I think the quickest way for you to get this remedied is to ensure your node is sending valid data. The most common cause is usually divide by zero errors.

[edit] I suspect the reason behind the change is because the emonpi version of emonhub had the buffering removed for 2 years and @TrystanLea has not that long ago, re introduced the buffering. Prior to that the emonpi variant of emonhub was just firing the data off in a url and discarding the local copy rather than waiting for a confirmation of receipt before deleting the local copy. This is to reduce/eliminate data loss due to a flaky network etc.

I suspect the invalid data from node 28 just hasn’t been spotted before as emoncms would have still rejected it, but no one knew that all the data in the same payload was being silently discarded.

richard · 31 March 2018 22:08

Paul -

Thank you for the information. Node 28 is the exact same as two other nodes, it measures water utilization from a smart water meter. The code is pretty simple and it is the same on all three units except for the node number, other than that the code is identical.

This is what I get from Node 28:

2018-03-31 21:34:49,230 DEBUG    RFM2Pi     4571 From Node : 28
2018-03-31 21:34:49,231 DEBUG    RFM2Pi     4571    Values : [23604.900390625, 0.0, 0.699999988079071]
2018-03-31 21:34:49,232 DEBUG    RFM2Pi     4571      RSSI : -63
2018-03-31 21:34:49,232 DEBUG    RFM2Pi     4571 Sent to channel(start)' : ToEmonCMS
2018-03-31 21:34:49,233 DEBUG    RFM2Pi     4571 Sent to channel(end)' : ToEmonCMS

So I guess I don’t understand what you are saying, if emonhub receives data that it does not know what to do with it crashes and requires a restart?

Is this something new in the code? I have been running this setup for almost 2 years now and NEVER had this issue, not even once until I upgraded to the latest version of code. If it is a code change, then is my only option to install an old version of code until the problem is resolved within the emonhub software?

You had said the Node 28 sent an infinity and that “emonhub has passed that faithfully to emoncms and emoncms isn’t happy about it.” But it’s not emoncms that is crashing, it is the emonhub that quits working and requires a restart.

I am running 9.8.10 on the receiving emoncms server and 9.8.28 on the transmitting one, would that make any difference?

The other thing you said is that it was node 28 causing the problems, but I have the exact same code running on three different water meters with no problems. Maybe I have screwed something up in my code for those nodes and it is just showing up now with the new software?

I have attached the sketch running Node 28, maybe I missed something obvious.

Or maybe the sensor itself is going bad and throwing bad data?

moteino_R5_RFM69HW_Pulse_RJS_Pool_Meter.ino.txt (19.9 KB)

Robert.Wall · 31 March 2018 22:45

Looking at the sketch for Node 28, I’m not convinced that this line is innocent:

  water.GPM = pulseAVGInterval > 0 ? 60.0 * 1000 * (1.0/PULSESPERGALLON)/(pulseAVGInterval/pulsesPerXMITperiod):0;

pulsesPerXMITperiod hasn’t been tested for zero. The parentheses around pulseAVGInterval/pulsesPerXMITperiod force that to be evaluated second (after 1.0/PULSESPERGALLON) but if pulsesPerXMITperiod is zero, then that surely will produce a divide-by-zero error, which will propagate back into water.GPM
I think the cure will be to put that on the top and multiply:

  water.GPM = pulseAVGInterval > 0 ? 60.0 * 1000.0 / PULSESPERGALLON * pulsesPerXMITperiod / pulseAVGInterval : 0;

pb66 · 31 March 2018 22:55

I have no idea what the nodes are. I could just see on this occasion that node 28 is responsible. If they are all identical then it stands to reason that is is most likely not always node 28 that causes the problem. Unless of course the data node 28 reports happens to be different (eg less water usage etc), I don’t know enough about your setup, the data you are monitoring or the behavior of the service being monitored.

Around 2.5 years ago Trystan created a “emonpi variant” of emonhub, in his implementation, the intended behavior of emonhub was lost and for 2.5years the emonpi variant of emonhub just fired whatever data it had off over http and that was that, it did not wait for a confirmation before discarding the local copy. What emonhub used to do, should have done and now does again, is to wait for an “ok” from emoncms to say it has resieved the data and it’s ok to delete the original local copy.

For 2.5 years it hasn’t mattered whether the data was good or bad, received or not, emonhub just moved on to the next load of data.

Your node 28 (and possibly the others) has basically been ok to send invalid data, because emoncms could reject the data and emonhub didn’t care. The error in the emonpi variant of emonhub hid the error in your node(s) and now it’s fixed your node can’t getting away with it any more.

What is happening now is that when emonhub doesn’t get that “ok” it will buffer the data and keep trying to get emoncms to accept it, but emoncms isn’t having any of it, so the buffer just keeps growing.

emonhub isn’t failing, it’s just not able to off load the data onto emoncms because it’s invalid and emoncms doesn’t want it.

When you restarted emonhub (shown in your log) emonhub was still buffering nearly 10000 frames of data and trying in vain to pass them to emoncms, it’s trying to do it’s job but emoncms won’t allow it, because the data is invalid.

I wouldn’t say “screwed something up” but what you thought was ok when you wrote it, wasn’t ok, but no one noticed because emonhub wasn’t working as intended and didn’t care if emoncms didn’t accept the data. In this instance 2 wrongs did make a right, for a while. However, it was never totally right because the invalid values were causing other data to get discarded with it.

richard · 31 March 2018 23:18

It appears that Node 28 is the only one that ever sends the “128 127” in it data stream.

Robert - I will try changing out the offending line and see if that fixes the issue.

Thank you both for your help and Paul for your great explanation!!

I will report back and let you know what happens!!