One emonTH node fails for hours, then works for a few minutes

I have recently set up an emonPi with three battery-powered wireless emonTH v2 nodes. One of the nodes fails to produce usable results for hours at a time: the graph shows a straight line, then several successful readings where the values vary, before it stops again and returns to another straight line. For example, today I got usable results between about 6am and 6:45, then nothing until about 3pm, then a few more results, and nothing since (it is now 10pm).

I tried moving it closer to the Pi, but this does not help. The RSSI in this case is between -55 and -65, but I am getting more reliable results from the two other nodes that are further away (RSSIs of -60 to -70 and -75 to -85 respectively).

Any ideas?

For illustration please see results of RSSI graph below:

Any ideas anyone? Could it be a hardware issue or is it more likely the node is not configured correctly?

Not many, I’m afraid. The straight line just means no data, or no reliable data (incorrect checksum), was received. The logs should be able to tell you which.
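To line the logs up against the graph, it can help to list the flat-line periods explicitly. A minimal sketch, assuming you can get the arrival times of one node's packets (from a feed export or the log); the helper name `find_outages`, the one-reading-a-minute nominal interval and the 3× tolerance are all assumptions for illustration, not values taken from the emonTH firmware.

```python
from datetime import datetime, timedelta


def find_outages(timestamps, nominal=timedelta(seconds=60), tolerance=3):
    """Hypothetical helper: return (start, end) pairs where consecutive packets
    are further apart than tolerance * nominal, i.e. where the graph would
    show a straight line because nothing (reliable) arrived."""
    ordered = sorted(timestamps)
    limit = nominal * tolerance
    outages = []
    for prev, curr in zip(ordered, ordered[1:]):
        if curr - prev > limit:
            outages.append((prev, curr))
    return outages


if __name__ == "__main__":
    # Made-up arrival times: a few packets from 6:00, then silence until 15:00.
    t0 = datetime(2017, 1, 1, 6, 0)
    received = [t0 + timedelta(minutes=m) for m in (0, 1, 2, 3, 540, 541)]
    for start, end in find_outages(received):
        print(f"no data from {start:%H:%M} to {end:%H:%M}")
```

With the sample data this reports a single gap from 06:03 to 15:00; the timestamps of each gap are then the places to look in emonhub.log.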

You’ve said the others, further away and reporting poorer RSSIs, are reliable, so that seems to rule out (a) receiver problems and (b) external interference. That leaves hardware and configuration. I doubt configuration if you’re using the default sketch. (I assume you set the NodeID using the switches, and have not changed and reloaded the sketch?)
We did have problems synchronising transmitter and receiver when a message contained a long string of zeros, as was the case with temperatures and the emonTx, but the emonTH doesn’t do that: it sends a much shorter message, so that can be ruled out. I think it’s either the batteries or a genuine hardware fault.
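To make the “long string of zeros” point concrete: assuming the radio sends the bits without any line coding (no Manchester encoding or whitening), the receiver stays in step with the transmitter by watching for bit transitions, and a long run of identical bits gives it nothing to lock on to. The hypothetical helper below just measures the longest run of 0 bits in a payload; it illustrates the idea and is not something the emonTx or emonTH firmware actually computes.

```python
def longest_zero_run(payload: bytes) -> int:
    """Length of the longest run of consecutive 0 bits, reading each byte
    most-significant bit first (the bit order here is an assumption)."""
    longest = current = 0
    for byte in payload:
        for bit in range(7, -1, -1):
            if (byte >> bit) & 1:
                current = 0  # a transition to 1 ends the run
            else:
                current += 1
                longest = max(longest, current)
    return longest
```

For example, `longest_zero_run(bytes([0x80, 0x01]))` returns 14, and any 16-bit field equal to zero contributes at least 16 zero bits on its own, which is why messages packed with unused zero fields were the awkward case.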

Does vibration, temperature or anything like that seem to trigger any response? Have you looked inside, and is there anything obviously not right?

Hi Steve,

Can you check whether quiet = false is set in emonhub.conf for the RFM2Pi interfacer, and provide some emonhub.log output for both a good and a bad period? The good period will show us what the packet looks like when it’s working and might indicate the firmware vintage, whereas the bad period might give us some clues as to what’s up, if the failed packets are being logged.

Do you have a USB serial programmer? It would be useful for finding the firmware revision, and since the most likely cause of this issue is firmware, you may need to upload updated firmware to the emonTH.

But there was also an issue with the packet sent by the emonTH being corrupted when it was sent too soon after the RFM was woken from sleep.


I hadn’t noticed that issue. When was it corrected, and does Steve have a pre-fix or post-fix version?

This was way back, on the old forum and before the repos all got juggled around, so I have no quick links, I’m afraid; and without knowing what firmware revision is installed on @Steve_Lloyd’s emonTH, it’s not really of any help.

If I find time to kill I will try and seek out a date or a post or a commit or something useful.

As Steve said “recently”, it’s likely not that, unless he’s had them in store for a while (or an old one got found and sent out).
Was it this change:
   V2.6 - (24/10/15) Tweek RF transmission timmng to help reduce RF packet loss
which was before the SMT V2 hardware. So if Steve has the SMT Version 2 hardware, it’s almost certainly not that.

Fair point. That was my best guess at the time of writing, and perhaps I shouldn’t speculate; the point is that a USB serial programmer would be useful to establish the firmware revision, check the serial output or install new firmware if required.

The fact that two of the three work OK reduces the likelihood of an emonPi issue. The emonTH v1 has been around a long time with very few failures, so if there is a new issue on the horizon due to it being a “v2”, the chances are it can be fixed with a firmware edit.

Without any hard data, e.g. emonhub.log, we can only guess. My guess was that RF was an unlikely cause, as the signal was good, and that a battery issue would possibly cut out once or twice before stopping for good; but those are just generalisations and assumptions, hence my request for the logs to get a better picture.

Rather a stupid idea, but you never know: what battery voltage does that troublesome unit report? Are the batteries OK, or is one of them a bit sluggish, so that the voltage drops too low, resulting in a weak signal or none at all?

A bad or corroded battery connection? That is always a possibility. So is a bad joint somewhere on the pcb. That’s why I asked about vibration - vibration plus a bad joint could cure or cause a failure.

Thanks so much for the help, hopefully I can answer your questions:

  1. All settings are default. I literally plugged in the Pi, set it up on my wifi, stuck batteries in the nodes and set up three feeds to log the temperature, humidity and RSSI respectively from all three nodes.

  2. There is no evident reason why the node transmits successfully at some times and not at others. It sits on a bookshelf in the same room as the Pi, to make sure it has the best signal of all the nodes. The other nodes are on the other side of a wall and several rooms away respectively. There is little or no obvious vibration and no major environmental change. I tried putting a known good node next to it for a while, which continued to work just fine, and I have moved it around the room as well. (I noticed that if I put the nodes near my wireless thermostat, they all decreased in RSSI for much of the time, so I have moved the thermostat further away.)

  3. Quiet is currently set to True, I have changed it to false and saved now - is a reboot needed?

  4. I do not have a USB serial programmer unfortunately.

  5. As a new user I cannot upload files, but I have put the log in Dropbox, which hopefully you can access.

  6. This equipment was all bought new within the last 6 months, so unless it was old stock it should be a relatively recent version

  7. I was worried about the aerial - when they arrived they were curled up inside the plastic cases, I unfurled them slightly and squeezed them through the air vents at the top of the node. This improved the signal strength significantly, but there is no evidence of damage or loose connections in the node with the issues. The only thing inside the node that does not appear in perfect condition is a slight chip on the plastic of the dip switches to set the node ID. I have fiddled with it several times looking for an issue and this seems to have no impact on its performance. I have also taken out and put the same batteries back in in case they were not seated properly.

I’ve bumped your user level up a notch.

Great, thank you. I have uploaded the log here, renamed to .txt because of the restricted file types: emonhub.txt

3. No reboot is required. If you look at the log, you can see all the entries that include “Discarding RX frame ‘unreliable content’?”. These frames are normally discarded by the receiver and not passed to emonhub; by setting quiet = false we can now see them.

7. Pulling the aerial out should not cause a problem. It might be worth swapping the batteries with those from a node that works, to see if the issue moves to that device. It’s a long shot, but it would prove whether the batteries are OK.

When you notice it not working again, try to grab some emonhub.log output and post it here for us to look at.
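When comparing a good period with a bad one, a per-hour tally of good frames from the suspect node against discarded frames can show whether the outages coincide with bursts of “unreliable content”. A minimal sketch; the log line shapes in the comments, the “NEW FRAME : OK” marker and the helper name `tally` are assumptions, so check them against your own emonhub.log before trusting the counts.

```python
import re
from collections import Counter

# Assumed line shapes (verify against your own emonhub.log):
#   2017-02-01 22:01:42,123 DEBUG RFM2Pi 1234 NEW FRAME : OK 7 ...
#   2017-02-01 22:05:00,000 DEBUG RFM2Pi 1235 Discarding RX frame 'unreliable content'? ...
HOUR = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2})")  # date plus hour
GOOD = re.compile(r"NEW FRAME : OK (\d+)")         # captures the node ID
BAD = "Discarding RX frame"


def tally(lines, node_id):
    """Hypothetical helper: per-hour counts of good frames from node_id and of
    discarded frames from any node. A discarded frame cannot be attributed
    to a node reliably, since its node ID byte may itself be corrupted."""
    good, bad = Counter(), Counter()
    for line in lines:
        m = HOUR.match(line)
        if not m:
            continue  # skip continuation lines without a timestamp
        hour = m.group(1)
        g = GOOD.search(line)
        if g and int(g.group(1)) == node_id:
            good[hour] += 1
        elif BAD in line:
            bad[hour] += 1
    return good, bad
```

The function accepts any iterable of lines, so an open log file can be passed straight in. Hours with no good frames but plenty of discarded ones point towards corrupted transmissions; hours with neither suggest the node is not transmitting at all.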

There is a lot of noise being picked up, which might make sifting through the logs a bit of a pain, but I do not suspect the interference is greatly affecting actual performance unless its signal level increases significantly. However, the noise could be a factor in intermittent packet loss: while the receiver is handling and discarding a bad packet, it may miss a valid one. That shouldn’t cause long outages, though, just erratic drop-outs, I would have thought.

If the noise level is not constant, i.e. it spikes at random intervals, a spike could be interpreted as a data bit while the receiver is ingesting a good packet, in effect mangling the good packet.
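A sketch of why one noise-induced bit error gets a whole frame discarded rather than passed on with wrong values: the frame carries a checksum, and the receiver recomputes it. This uses a bitwise CRC-16 with the reflected polynomial 0xA001 and initial value 0xFFFF, the same parameters as AVR libc's `_crc16_update`; whether the emonTH firmware and the RFM2Pi compute exactly this over exactly these bytes is an assumption, and the payload bytes are made up.

```python
def crc16(data: bytes, crc: int = 0xFFFF) -> int:
    """Bitwise CRC-16, reflected polynomial 0xA001, initial value 0xFFFF
    (the CRC-16/MODBUS parameters)."""
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0xA001
            else:
                crc >>= 1
    return crc


def flip_bit(data: bytes, index: int) -> bytes:
    """Return a copy of data with one bit inverted, as a noise spike might do."""
    out = bytearray(data)
    out[index // 8] ^= 0x80 >> (index % 8)
    return bytes(out)


packet = bytes([0x07, 0x2C, 0x01, 0xE8, 0x03])  # hypothetical 5-byte payload
sent_crc = crc16(packet)
received = flip_bit(packet, 13)               # one bit mangled in transit
print(crc16(received) == sent_crc)            # False: the frame is discarded
```

A CRC like this catches every single-bit error, so the mangled frame is always rejected; the cost is that the good packet underneath is lost, which shows up as a missing reading rather than a wrong one.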

Indeed, as would always be the case for momentary high “spikes” of interference. The difference is that if the abundant low-level interference is blocking the device, it is possible to raise the noise-floor threshold in software so that the receiver ignores the general noise. Any spikes over that threshold will still be a threat, as they are on any system.
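The noise-floor idea can be illustrated with a toy filter in which anything below a configurable RSSI threshold is treated as noise. The helper name `detected`, the sample values and the thresholds are all hypothetical; a real radio applies its threshold in hardware, not in Python.

```python
def detected(samples_dbm, threshold_dbm):
    """Hypothetical helper: return the samples the receiver would treat as
    signal rather than noise, given an RSSI threshold in dBm."""
    return [s for s in samples_dbm if s >= threshold_dbm]


# Made-up readings: a steady noise floor around -95 dBm,
# one interference spike at -70 dBm and one wanted packet at -60 dBm.
samples = [-96, -95, -94, -70, -95, -60, -95]
print(detected(samples, -100))  # every sample, including the noise floor
print(detected(samples, -90))   # [-70, -60]: floor ignored, spike still passes
```

Raising the threshold from -100 to -90 dBm removes the steady floor but still lets the -70 dBm spike through, which is the point about spikes remaining a threat on any system.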

I really appreciate the help. Regarding the noise, why would it affect one node to a much greater extent than another? Unless part of the noise includes something else sending similar packets that look like that particular node?

Changing the settings seems to have started a new log, and since then the node seems to be working. As of now (17:50), I will move node 7 next to node 6 to see if the readings are reasonably similar.

I’m not sure I buy the interference idea, except in one very particular case. Do we know whether JeeLib will see the channel as ‘blocked’ and not try to transmit if there is a continuous source of local, high-strength interference (i.e. one that is present when it wants to transmit and for however long it’s prepared to wait, and that is not seen by the emonTHs further away)? It’s a long shot, and from what we can see I’d favour a hardware fault, but it’s something I think we don’t know.
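The case in question is a listen-before-talk scheme with a timeout. Whether JeeLib does anything like this, and with what timeout, is exactly the open question; the function `send_when_clear` and its parameters are hypothetical and only show how such a scheme would turn continuous local interference into silently skipped transmissions.

```python
import time


def send_when_clear(channel_busy, transmit, timeout_s=0.1, poll_s=0.001,
                    clock=time.monotonic, sleep=time.sleep):
    """Hypothetical listen-before-talk: wait until channel_busy() returns False,
    then call transmit(). If the channel stays busy for longer than timeout_s,
    give up and return False without sending anything."""
    deadline = clock() + timeout_s
    while channel_busy():
        if clock() >= deadline:
            return False  # packet never sent: the receiver just sees silence
        sleep(poll_s)
    transmit()
    return True


sent = []
print(send_when_clear(lambda: True, lambda: sent.append("frame"), timeout_s=0.01))  # False
print(send_when_clear(lambda: False, lambda: sent.append("frame")))                 # True
```

If something like this were happening, one would expect the log to show no frames at all from that node during an outage, rather than discarded ones, which is another reason the quiet = false logs are worth having.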