Missing data when two emonTxs are powered on simultaneously

pdath · 14 October 2021 19:56

I’ve been using Open Energy Monitor for a good year and it has been great. I have one outstanding issue and with lockdown happening, I thought I might as well try and fix it.

I have an emonbase and a pair of emonTx3s. Both are AC-AC powered, each from its own power supply. Each has four CTs attached. No temperature probes.

The issue I’m having is I tend to lose a lot of data from the emontx3.

Things I have tried:

If I power on only 1 emontx3 (doesn’t matter which one) it works perfectly.
If I manually power on one and then the other, it works perfectly.
If I power on both at the same time, the unit with the higher node number loses lots of data. The emonhub log shows big gaps in the data being received for that node.
I have got an FTDI cable and updated both units to the latest continuous monitoring jeelib software from the emonbase.

I think they must be clobbering each other on the RF spectrum, and either the backoff is not working or it is not long enough.

I’ve had a look at the code and am trying to decide the best way to address this.

github.com/openenergymonitor/EmonTxV3CM/blob/master/src/EmonTxV3CM.ino

/*
  emonTxV3.4 Continuous Sampling
  using EmonLibCM https://github.com/openenergymonitor/EmonLibCM
  Authors: Robin Emley, Robert Wall, Trystan Lea
  
  -----------------------------------------
  Part of the openenergymonitor.org project
  Licence: GNU GPL V3
*/

/*
Change Log:
v1.0: First release of EmonTxV3 Continuous Monitoring Firmware.
v1.1: First stable release, Set default node to 15
v1.2: Enable RF startup test sequence (factory testing), Enable DEBUG by default to support EmonESP
v1.3: Inclusion of watchdog
v1.4: Error checking to EEPROM config
v1.5: Faster RFM factory test
v1.6: Removed reliance on full jeelib for RFM, minimal rfm_send function implemented instead, thanks to Robert Wall
v1.7: Check radio channel is clear before transmit

This file has been truncated.

I saw another post suggesting a delay in the setup() function based on the node number, something like:
delay(nodeID*100)
But that post also mentioned some other issues. Besides, this would only mask the real problem: that the RF backoff is not working correctly.

I see busyTimeout is set to 15ms. I’m wondering whether this is just too tight, and whether it would be best to try adjusting it first, perhaps doubling it to 30ms.

Thoughts?

Robert.Wall · 14 October 2021 21:40

Welcome, Philip, to the OEM forum.

Thoughts? This problem is almost inevitable, because each emonTx operates for the most part autonomously.

The first question - or observation - is, the two emonTx’s must be able to hear each other for the transmit hold-off to be able to work. If they can’t, each will believe the band is clear, even when the other is transmitting, and the base will get both simultaneously and that’s something it cannot handle.

You can try adding a delay in the startup code of one; it will alleviate the problem initially. But unless the two emonTx’s run at exactly the same speed forever, one will eventually catch up with the other, and at some stage you’ll lose data.
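Robert’s point about the two units drifting can be put into rough numbers. The sketch below is a back-of-envelope model, not anything from the emonTx firmware: it assumes the two units’ clocks differ by a fixed relative drift in parts per million and that each reports once per fixed period. The 50 ppm drift and 10 s period in the usage note are hypothetical figures chosen for illustration.

```cpp
#include <cassert>

// Hypothetical model: the two units' clocks differ by relativeDriftPpm
// parts per million, and each sends one message every periodS seconds.

// Seconds until a start-up offset of offsetMs has been used up by drift,
// i.e. until the two transmissions (approximately) line up again.
double secondsUntilOffsetUsedUp(double offsetMs, double relativeDriftPpm) {
    assert(relativeDriftPpm > 0.0);
    double slipMsPerSecond = 1000.0 * relativeDriftPpm * 1e-6;  // ms gained per second
    return offsetMs / slipMsPerSecond;
}

// Seconds between successive clash episodes: the faster unit has to gain one
// whole reporting period on the slower one before they line up again.
double secondsBetweenClashEpisodes(double periodS, double relativeDriftPpm) {
    assert(relativeDriftPpm > 0.0);
    return periodS / (relativeDriftPpm * 1e-6);
}
```

With a 50 ms start-up offset and 50 ppm drift, secondsUntilOffsetUsedUp(50.0, 50.0) gives about 1000 s (under 17 minutes); with a 10 s period, secondsBetweenClashEpisodes(10.0, 50.0) gives about 200,000 s (roughly 56 hours). Under these assumptions a fixed start-up delay postpones the first clash rather than preventing it.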

The hold-off of 15 ms should be enough: the data is 40 bytes, and there’s an overhead of 9 bytes, sent at 49.2 kb/s, which I reckon takes under 8 ms. But feel free to change it and see what happens. There’s a small but real possibility that it won’t work even when both can hear each other, because there’s a window while each is listening and before it transmits. If both start to listen at exactly the same instant, neither will detect the other (of course, since neither is transmitting yet), and neither will know the other is about to transmit. I haven’t found it in the HopeRF data, so I don’t know how long the radio listens before deciding whether the band is clear.
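The airtime estimate can be checked with a few lines of arithmetic: (40 + 9) bytes × 8 = 392 bits, and 392 bits at 49.2 kb/s is just under 8 ms. The helper below is a minimal sketch of that calculation; the function name is hypothetical, and it ignores radio start-up and turnaround times, which the post does not give.

```cpp
#include <cassert>

// On-air time of one packet from the figures quoted above: payload bytes plus
// framing overhead bytes, sent at bitRate bits per second. Radio start-up and
// turnaround times are not included.
double airtimeMs(unsigned payloadBytes, unsigned overheadBytes, double bitRate) {
    assert(bitRate > 0.0);
    double totalBits = 8.0 * (payloadBytes + overheadBytes);
    return 1000.0 * totalBits / bitRate;
}
```

For example, airtimeMs(40, 9, 49200.0) returns about 7.97 ms.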

The hold-off doesn’t affect the rate at which the messages are sent, so to spread out the clashes, you could shorten the datalogging period of one emonTx by a mains cycle or two (or more, if you wish). You’ll then lose fewer messages consecutively, but more frequently.
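To see why shortening one unit’s period spreads the clashes out, here is a toy model, not taken from the firmware: it ignores the hold-off and any clock drift, starts both units in step at t = 0, and counts a clash whenever the two packets start less than one airtime apart. The 10 s period and 8 ms airtime are assumed figures; 20 ms is one cycle of 50 Hz mains.

```cpp
#include <cassert>
#include <cmath>

// Toy model (no hold-off, no clock drift): unit A reports every periodAMs,
// unit B every periodAMs - shortenByMs, and both first transmit at t = 0.
// Two packets are taken to overlap on air when their start times are less
// than airtimeMs apart. Returns the longest run of consecutive B messages
// that overlap an A message.
int longestClashRun(double periodAMs, double shortenByMs, double airtimeMs, int messages) {
    assert(shortenByMs >= 0.0 && shortenByMs < periodAMs);
    double periodBMs = periodAMs - shortenByMs;
    int run = 0, longest = 0;
    for (int n = 0; n < messages; ++n) {
        double phase = std::fmod(n * periodBMs, periodAMs);  // where B's start falls in A's cycle
        double gap = std::fmin(phase, periodAMs - phase);    // distance to the nearest A start
        if (gap < airtimeMs) {
            ++run;
            if (run > longest) longest = run;
        } else {
            run = 0;
        }
    }
    return longest;
}
```

With these assumptions, identical periods clash on every message (longestClashRun(10000.0, 0.0, 8.0, 100) is 100), a 1 ms difference loses 8 in a row, and a 20 ms difference loses only isolated messages. The catch is frequency: with a 20 ms difference the units come back into step every 500 messages, rather than every 10,000 with a 1 ms difference, which is the trade-off Robert describes.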

I’m not sure what you mean by “continuous monitoring jeelib software”: JeeLib isn’t used in that sketch. If you’ve replaced rfm.ino with JeeLib, you’ll have made the problem worse, because JeeLib has no mechanism to detect that the band is busy.

pdath · 14 October 2021 22:09

the two emonTx’s must be able to hear each other for the transmit hold-off to be able to work

They are within 300mm of each other. They can definitely hear each other.

What you describe, both of them listening at the same time, is what I think the problem is. It makes sense.

I’m planning on adding an additional emontx next year, and knowing that I will have three, I think I will try these changes:

Use delay(nodeID) in setup() to space them ever so slightly apart on power-up.
Use nodeID as the timeout instead of busyTimeout. Then, if there are multiple emonTxs, they will not all try to re-transmit at the same moment and clobber each other.
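The second proposed change can be checked with simple arithmetic. The sketch below is one hypothetical reading of it, treating the node ID in milliseconds as the wait before a retry; the function names are illustrative and not taken from EmonTxV3CM.ino, and how busyTimeout is actually used in the firmware is not shown in this thread.

```cpp
#include <cassert>

// Hypothetical reading of the proposal: the hold-off before a retry is the
// node ID in milliseconds, rather than one fixed busyTimeout shared by every
// unit. Names are illustrative only.
unsigned long holdOffFromNodeIdMs(unsigned nodeID) {
    return nodeID;  // e.g. node 15 waits 15 ms, node 16 waits 16 ms
}

// How far apart two nodes' retries start, if both found the band busy
// at the same instant.
unsigned long retrySeparationMs(unsigned nodeA, unsigned nodeB) {
    assert(nodeA != nodeB);  // distinct node IDs expected
    unsigned long a = holdOffFromNodeIdMs(nodeA);
    unsigned long b = holdOffFromNodeIdMs(nodeB);
    return a > b ? a - b : b - a;
}
```

For adjacent IDs such as 15 and 16, retrySeparationMs(15, 16) is 1 ms, much less than the roughly 8 ms a packet occupies the air by Robert’s estimate, so a per-node step this small may not keep the retries apart.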

Do you accept pull requests? If this works, I’m happy to push this back to make it more reliable for other users with multiple emontx units.

With regard to the FTDI cable, I meant that I applied this update using emoncms (which suggests it is using JeeLib). Do you think I should use RFM69 instead?

Robert.Wall · 14 October 2021 22:20

I don’t understand, therefore have nothing to do with, Github. If you have a comment or suggestion for me, post it on the forum.

No. The description says “JeeLib message format”, not JeeLib. Native RFM69 format is for use with the emonPiCM - see the post releasing that. You cannot mix the two.

pdath · 15 October 2021 02:49

I have something that works now.

I started with a delay of 1ms and kept increasing it, and it didn’t work reliably from a cold power-on until the delay reached 25ms (it took quite a while …).

I note the busyTimeout is 15ms, and by coincidence, if the packet transmission time is 8ms, those two come to 23ms, just a shade under 25ms.

In EmonTxV3CM.ino, line 143 for me was:

if (digitalRead(DIP_switch1)==LOW) nodeID++;   // If DIP switch 1 is switched on (LOW), add 1 to nodeID

I added after this:

// Add in a small delay in case multiple emonTx3 units are powering on at the same time
if (nodeID>15) delay((nodeID-15)*25);

Would you be able to add this to the main codebase to make it more reliable for others?

pdath · 15 October 2021 05:14

After running for a couple of hours with a 25ms delay I found the data loss issue was substantially reduced, but still occurred a little bit.

I have now increased the delay to 50ms, and after running for another couple of hours the loss is now zero, as in no lost data points at all.

// Add in a small delay in case multiple emonTx3 units are powering on at the same time
if (nodeID>15) delay((nodeID-15)*50);
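The change above can be expressed as a small function so that the resulting delays are easy to check off the device. This is a minimal sketch, not the committed firmware: startupDelayMs and BASE_NODE_ID are hypothetical names, and on the emonTx the result would be passed to delay() in setup(). It assumes, as the posted code does, that the lowest node ID in use is 15.

```cpp
#include <cassert>

// Host-testable version of the start-up stagger. On the emonTx the result
// would be passed to delay() in setup(), after the DIP-switch check.
// stepMs is the per-node spacing (25 ms was tried first, then 50 ms).
const unsigned BASE_NODE_ID = 15;  // default node ID (change log v1.1)

unsigned long startupDelayMs(unsigned nodeID, unsigned long stepMs) {
    assert(stepMs > 0);
    // Node 15 (or any lower ID) starts at once; each ID above it waits one more step.
    return nodeID > BASE_NODE_ID ? (nodeID - BASE_NODE_ID) * stepMs : 0UL;
}
```

With a 50 ms step, nodes 15, 16 and a hypothetical third unit on node 17 would wait 0, 50 and 100 ms respectively. A unit configured with a much higher ID waits proportionally longer, since the delay grows linearly with nodeID - 15.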

borpin · 15 October 2021 08:01

Can you hardwire one of the EmonTX to the EmonBase via the serial/UART?

pdath · 15 October 2021 08:12

It’s not impossible, but it would be quite hard.

borpin · 15 October 2021 08:40

The clashes between the 2 devices are, almost, inevitable using the RFM (without your changes). You could use WiFi (either the EmonESP or a Pi ZeroW) to transmit the data to the EmonBase for one of the EmonTX.

There are several threads on how to do so. In my experience, a Pi Zero W is very reliable.

That delay is quite a neat solution to a common issue @Robert.Wall @TrystanLea.

Not a guarantee, of course, but data transmission problems caused by two devices being powered on from cold at the same time have come up before.

[edit]

This of course is dependent on the message size. If additional sensors were fitted etc, the message size would increase and that might require an increased delay.

A user-configurable delay via the TX menu, @Robert.Wall?
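borpin’s point about message size can be made concrete by combining the 15 ms hold-off with Robert’s airtime figures (9 bytes overhead at 49.2 kb/s). The helper below is a hypothetical sketch of how a configurable delay might be derived; the name minimumStepMs and its defaults come from the figures quoted in this thread, not from the firmware.

```cpp
#include <cassert>

// Hypothetical helper: smallest per-node start-up step that covers one
// hold-off plus one packet on air, for a given payload size. Defaults are the
// figures quoted in this thread (9 bytes overhead, 49.2 kb/s, 15 ms hold-off).
double minimumStepMs(unsigned payloadBytes,
                     unsigned overheadBytes = 9,
                     double bitRate = 49200.0,
                     double holdOffMs = 15.0) {
    assert(bitRate > 0.0);
    double airtimeMs = 1000.0 * 8.0 * (payloadBytes + overheadBytes) / bitRate;
    return holdOffMs + airtimeMs;
}
```

For the current 40-byte payload this gives about 23 ms, matching pdath’s 15 ms + 8 ms estimate; a hypothetical 60-byte payload gives about 26.2 ms, which a 25 ms step would no longer cover. Treat the result as a lower bound: in pdath’s tests a 25 ms step still lost a little data, while 50 ms lost none.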

Robert.Wall · 15 October 2021 15:29

I thought the change to include the delay had been put forward ages ago - it seems it has not been picked up. It’s in all the “RFM69 Native” sketches for working to the emonPiCM that I published back in July, and I’m quite certain the discussion was a long time before that.

But I’ve been pondering whether it would be possible to run mains-powered emonTx’s using a “polled” method of getting the data. I’ll start a new thread.