One EmonTH node fails for hours, then works for a few minutes

I’ve just re-read my post back to myself and I definitely said

Which is basically acknowledging there is a lot of background noise being logged, and offering some reassurance that isn’t likely to be the problem, but with a small caveat to cover if the interference isn’t consistent ie spiking or perhaps stronger at times.

I them went on to say

Which clearly states “intermittent packet loss” as in there could be an occasional packet lost if there is high traffic due to background noise keeping the receiver busy, yes the JeeLib library is written to retry and that usually works but if the network is very noisy occasionally a packet may fail on the 2nd attempt and yes that could happen to all nodes equally. BUT it will NOT knock out a single node for a prolonged period.

The JeeLib traffic management I believe is in the hands of the sender alone, ie it will not transmit unless there is an opening and it will wait and try again if it is unable to send straight away. SO IN THEORY it is possible that noise is high enough at one node to postpone sending, whilst being barely seen at the receiver if the noise is more local to the sender.

BUT I AM NOT SUGGESTING THAT TO BE THE CASE HERE!!!

But on the flip side we cannot rule it out either, without knowing the source of the interference and the locations of each device and even then it would be a informed guess at best. We will need to see if there is anything useful in the logs or maybe just replace the hardware without considering any other possible cause.

Perhaps I should have restricted my first post to just “No idea! Can you post some emonhub.log”

Paul - I wasn’t getting at you, I was postulating that if there was a very local source of interference to the transmitting node and knowing that JeeLib does have rudimentary detection of channel use but not knowing exactly what happens next, i.e. does it wait until a break, or after n attempts does it transmit regardless or quit, it could be causing a problem just for that one node.

Since making the change (Quiet = False) I have seen the longest consistent up time for the apparently faulty node (6) since I started. I have placed a known good node (7) right next to it. Within a few minutes they were logging near-identical temp (within .1 of a degree). Node 7 is showing consistently higher humidity (3%, eg if 7 shows ~60%, 6 is showing ~57%), and it tracks changes with the same variance which leads me to believe this is a calibration difference. RSSI is pretty similar with 6 being about 2.5 lower than 7 (if 6 is showing -62.5, 7 is showing -60) .

I’ve attached a fresh log, but the impression I get is that node 6 was consistently failing CRCs - despite the Pi recieving apparently good data. Does that make sense?emonhub.txt (407.8 KB)

Can you confirm the troublesome node is still located in the same position as when you were experiencing issues earlier?

Have you changed anything else other than setting quiet = false ?

This setting doesn’t impact the reception or the way the packets pass crc, all it does is toggle on/off whether the first 20 bytes of any “discarded” packets are printed to the serial output or not, when they are printed (ie not “quiet”) emonhub is then able to log those packets to emonhub.log, no other operation is effected.

So if there is a change in operation it is purely coincidental and maybe due to moving the unit or disturbing the batteries etc. Only time will tell, the logs you have posted do not show or explain any “missing data” as the emonTh continues to work, if and when it fails, at that time there maybe useful data in the logs.

The logs show 3 nodes 23, 24 and 25, with the default emonhub.conf I assume node 24 is the troublesome node and 25 is the one you have now positioned next to it, is that correct?

If there are no failures within a reasonable time, I would put it back to it’s original position and see how it goes. just remember to grab the log before altering anything if if/when it does occur again.

Never mind - it died again. But it did update consistently for an hour and a half. I only moved it about 2 meters (from one bookshelf to another in the same room) because there was space for the second node to be next to it there., the only change was after I noticed it started logging I put the other one next to it. And yes, sorry, I call 24 and 25 6 and 7 respectively because that’s how they show up in the feed/graphs. I have attached a new set of logs in the hope you can help. Really appreciate your assistance so far. emonhub.txt (211.3 KB)

What time was the node not working? Was it still at fault when you grabbed the log?

A quick look at the end of the log shows all 3 nodes working, is that where you have got it working again or in a fault condition?

Where are you noting that the node isn’t working?

Are you just going by the graph output? Do you check the “last updated” column on the inputs page to see when the last update was received?

[EDIT] looking closer at that last log node 24 did not miss a beat at all between 20:17 and 20:37 (UTC I assume, ie TZ not set the emonPi)

2017-07-01 20:17:18,497 DEBUG    RFM2Pi     2221 NEW FRAME : OK 24 238 0 0 0 43 2 25 0 1 0 0 0 (-63)
2017-07-01 20:18:18,571 DEBUG    RFM2Pi     2224 NEW FRAME : OK 24 238 0 0 0 43 2 25 0 1 0 0 0 (-64)
2017-07-01 20:19:18,577 DEBUG    RFM2Pi     2227 NEW FRAME : OK 24 238 0 0 0 44 2 25 0 1 0 0 0 (-64)
2017-07-01 20:20:18,672 DEBUG    RFM2Pi     2230 NEW FRAME : OK 24 238 0 0 0 44 2 25 0 1 0 0 0 (-63)
2017-07-01 20:21:18,734 DEBUG    RFM2Pi     2233 NEW FRAME : OK 24 238 0 0 0 44 2 25 0 1 0 0 0 (-63)
2017-07-01 20:22:18,848 DEBUG    RFM2Pi     2236 NEW FRAME : OK 24 238 0 0 0 44 2 25 0 1 0 0 0 (-63)
2017-07-01 20:23:18,952 DEBUG    RFM2Pi     2239 NEW FRAME : OK 24 238 0 0 0 44 2 24 0 1 0 0 0 (-63)
2017-07-01 20:24:19,004 DEBUG    RFM2Pi     2242 NEW FRAME : OK 24 238 0 0 0 44 2 25 0 1 0 0 0 (-62)
2017-07-01 20:25:19,065 DEBUG    RFM2Pi     2245 NEW FRAME : OK 24 238 0 0 0 44 2 25 0 1 0 0 0 (-63)
2017-07-01 20:26:19,239 DEBUG    RFM2Pi     2248 NEW FRAME : OK 24 238 0 0 0 44 2 25 0 1 0 0 0 (-63)
2017-07-01 20:27:19,254 DEBUG    RFM2Pi     2251 NEW FRAME : OK 24 238 0 0 0 44 2 25 0 1 0 0 0 (-65)
2017-07-01 20:28:19,419 DEBUG    RFM2Pi     2254 NEW FRAME : OK 24 238 0 0 0 45 2 25 0 1 0 0 0 (-63)
2017-07-01 20:29:19,425 DEBUG    RFM2Pi     2257 NEW FRAME : OK 24 238 0 0 0 45 2 25 0 1 0 0 0 (-62)
2017-07-01 20:30:19,532 DEBUG    RFM2Pi     2260 NEW FRAME : OK 24 238 0 0 0 45 2 25 0 1 0 0 0 (-63)
2017-07-01 20:31:19,655 DEBUG    RFM2Pi     2262 NEW FRAME : OK 24 238 0 0 0 44 2 25 0 1 0 0 0 (-63)
2017-07-01 20:32:19,694 DEBUG    RFM2Pi     2265 NEW FRAME : OK 24 238 0 0 0 44 2 25 0 1 0 0 0 (-63)
2017-07-01 20:33:19,758 DEBUG    RFM2Pi     2268 NEW FRAME : OK 24 238 0 0 0 44 2 25 0 1 0 0 0 (-63)
2017-07-01 20:34:19,930 DEBUG    RFM2Pi     2271 NEW FRAME : OK 24 238 0 0 0 45 2 24 0 1 0 0 0 (-62)
2017-07-01 20:35:19,988 DEBUG    RFM2Pi     2274 NEW FRAME : OK 24 238 0 0 0 45 2 25 0 1 0 0 0 (-62)
2017-07-01 20:36:20,109 DEBUG    RFM2Pi     2277 NEW FRAME : OK 24 238 0 0 0 44 2 25 0 1 0 0 0 (-61)
2017-07-01 20:37:20,186 DEBUG    RFM2Pi     2280 NEW FRAME : OK 24 238 0 0 0 44 2 25 0 1 0 0 0 (-62)

Could it be a display issue in the feed perhaps?

Example of RSSI feed:

And here is the the humidity feed for the same approximate period (temperature is not helpful because the other sensor is showing the same temp so it is not possible to see both lines)

Possibly? Is that the only place you are noticing the issue? Was that emonhub.log for a period of outage?

What feed type are you using? If phpfina, what fixed interval are you using?

It raises a question over why only this node is affected, are they identical feeds and processing?

OK it is almost certainly an issue with the graph:

If I zoom in for the period where a wide view shows no data I get a straight line where mousing over shows no datapoints. EG from the above sample:

This appears to show no packets received between 19:35 and 21:35
But If i zoom in again however:

The graph is produced by the Pi web interface (in this case accessed on my LAN 192.168.1.8/emoncms/graph).

For these graphs I have simply added all 3 nodes temp, humidity and rssi on log to feed from the input view, then ticked them on the graph view, assigned consistent colours to each node and set the duration to daily, then saved that graph for ease of use.

EDIT: view of settings for saved view:

So are you saying that these are all using the default feed settings of 10 second fixed interval phpfina feeds?

I haven’t intentionally changed the settings as far as I can tell, but the update time is rarely 10 seconds:

When first creating your input process lists to add feeds you clicked the spanner at the end of the input row on the inputs page and got this screen

did you then add a “log to feed” by just accepting the default settings as displayed?

I can see from your feeds page screenshot they are phpfina but it doesn’t tell us what the fixed interval is, If you do not recall, then you can use a api call from your browser to recall the settings eg

http://emonpi/feed/getmeta.json?id=3

where “3” is the phpfina feed id (from the feeds page) for the feed in question, this api call assumes you are already logged into emoncms.

it should return something like

{"interval":10,"start_time":1467633890,"npoints":6271843}

which confirms it is a 10s fixed interval, this api only works for phpfina feeds

Hi again,

Yes that’s correct. I have also checked all the feeds (1-9) are showing “interval”:10,

EG:
{“interval”:10,“start_time”:1498029300,“npoints”:96423}

OK, found something else new. If I change the interval in the graph view to 10 the graph populates correctly for this node. Changing it to other values, particularly 60 or 120 (the default) causes greater loss of data points. Could it be the internal clock is off so it gradually loses sync with the expected input time which is why it comes and goes in semi-regular intervals?

EG example 1 (interval 120) - lots of missing data points

example 2 (interval 40) - some missing data points

example 3 (interval 10) - very few missing data points

No, what is happening is the feed is set up to save datapoints on a 10 second interval, but your data is arriving at 60s intervals, so when a new value arrives it will save that value to the current 10s time slot and not knowing what might of occurred since the last 60s update it omits the 5 missing data points or rather sets them to null.

So when you are viewing the graphs, the graphing code will determine the optimum resolution to display your data depending on the size of the graph area and the time span, If you look at the “view of settings for saved view” you posted above, the top line of boxes shows the interval was determined to be 120s.

When the graphing code then retrieves the data to display by looking for the nearest datapoints to those 120s intervals it has only a 1 in 6 chance of hitting valid data as 5 of 6 are null datapoints.

You simply need to better match the feeds to your data by replacing those existing feeds with phpfina 60s feeds and all should be good.

There is no way to change the interval once set so you cannot reuse the feeds you have created, you must create new feeds.

I cannot explain why this appeared more apparent to one specific node, it must have been pure luck since the feeds seem identical and are updating equally. It was affecting the other 2 feeds, we can see that in the last graph you posted, just to a lesser degree. each time you open the viewer page and view a graph it will have different start and end times, so in theory it should randomly affect all the traces, not just the one.

Ok, so the nodes are transmitting every 60 seconds, got it thanks! I will remove them all and recreate new feeds.

It looks as if a change to the notes in emonCMS is called for, to point out that a 60 s interval is required for rmonTH nodes.

Just to follow up this has indeed solved the problem - though the default interval in the graphs view is not always helpful (it changed to something like 900? - I set it back to 60 to match the node updates).