Suggestion: adding a small pseudo-random delay to the emonTx transmit function to minimize collisions

heckler · 27 September 2019 14:59

Feature suggestion: it would be interesting to add a small random (or pseudo-random) delay of, say, 0-2 seconds, varied on every transmission, before the emonTx engages its wireless transmitter. This would minimize the chance of colliding with other devices that send data at the same cadence.
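A minimal simulation of the suggested jitter, in Python for illustration (a real emonTx sketch would be Arduino C++). It assumes a 10 s reporting interval, an ~8 ms packet, that two packets collide if their start times are closer than one packet length, and it ignores clock drift. The function names are hypothetical.

```python
import random

PACKET_S = 0.008  # assumed air time of one packet (~8 ms, the longer CM-sized packet)

def send_times(cycles, interval_s=10.0, max_jitter_s=0.0, rng=None):
    """Transmit times for one node over `cycles` reporting cycles.

    Each send sits at its nominal slot (k * interval_s) plus a fresh random
    offset in [0, max_jitter_s]. The offset is not carried into the next
    cycle, so the long-term cadence stays at interval_s.
    """
    if rng is None:
        rng = random.Random()
    return [k * interval_s + rng.uniform(0.0, max_jitter_s) for k in range(cycles)]

def count_collisions(a, b, packet_s=PACKET_S):
    """Number of cycles in which the two nodes' packets overlap on air."""
    return sum(1 for ta, tb in zip(a, b) if abs(ta - tb) < packet_s)

# Two nodes powered up at the same instant, as after the brief power cut.
rng = random.Random(1)
lockstep = count_collisions(send_times(1000), send_times(1000))
jittered = count_collisions(send_times(1000, max_jitter_s=2.0, rng=rng),
                            send_times(1000, max_jitter_s=2.0, rng=rng))
print(lockstep, jittered)
```

Without jitter the two nodes collide in every one of the 1000 cycles; with 0-2 s of jitter the expected collision rate under these assumptions is under 1% of cycles, and a collision in one cycle does not carry over to the next.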

Back story, or how I came upon this:

I just observed something interesting today: I have three emonTx V3.4 units pushing data to the same emonPi. We had a very brief power interruption (less than a second), and I didn’t pay much attention to it. However, when I looked at the feeds about an hour later, none had been updated since the power interruption.

When I looked at the three emonTx units, they were all working and flashing the red transmit LED. However, all three were flashing at exactly the same time, stepping on each other’s transmissions.

I unplugged two of the three and, sure enough, data from the one still plugged in started updating again. I plugged the other two back in, one at a time, and all was good.

Cheers!
Claudio

Edit - changed sudo to pseudo to avoid any possible confusion with the command sudo. BT, Moderator

pb66 · 27 September 2019 17:28

I agree it’s an issue when multiple RFM devices (e.g. emonTx) come back online at the same time. My own solution is a start-up delay equal to the node ID in seconds, so each unique node ID makes its device start sending on a staggered rota. This avoids having to edit each sketch with a unique delay, and it avoids introducing a random element, which can play havoc with your input processing. For example, if you are summing the power1 inputs of three emonTxs (e.g. for 3-phase monitoring), you would want the totalling to happen on the last of the three to post its data, so the total is based on all three latest values.
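A sketch of the rota this produces, assuming all nodes power up together at t = 0, a 10 s interval, and node IDs smaller than the interval in seconds (larger IDs would spill into the next cycle). The function name is illustrative only.

```python
def staggered_schedule(node_ids, interval_s=10, cycles=2):
    """(send_time_s, node_id) pairs, in time order, when each node waits
    node_id seconds after a common power-up and then sends every interval_s."""
    events = [(node + k * interval_s, node) for node in node_ids for k in range(cycles)]
    return sorted(events)

schedule = staggered_schedule([1, 2, 3])
print(schedule)  # [(1, 1), (2, 2), (3, 3), (11, 1), (12, 2), (13, 3)]
```

Within each cycle the highest node ID reports last, so that node's input processing is the natural place to total, for example, the three power1 inputs.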

Robert.Wall · 27 September 2019 21:36

Is a 1 second gap a little too generous?

The default DS sketch sends a 26-byte payload and the CM sketch a 40-byte payload; add 9 bytes of header and checksum and transmit at a little over 49 kBaud, and those take under 5.8 ms and under 8 ms respectively. There’s a clear advantage in starting off with a respectable gap between transmissions, but that needs to be weighed against the possible need to keep the sampling reasonably synchronous, as your (@pb66) example implies.
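The arithmetic behind those figures, as a small Python helper. The 49,261 bit/s rate is an assumed value for “a little over 49 kBaud”, and the 9-byte overhead is used as quoted; any additional preamble would add slightly to these times.

```python
HEADER_AND_CHECKSUM_BYTES = 9
BIT_RATE_BPS = 49_261  # assumed value for "a little over 49 kBaud"

def airtime_ms(payload_bytes: int,
               overhead_bytes: int = HEADER_AND_CHECKSUM_BYTES,
               bit_rate_bps: int = BIT_RATE_BPS) -> float:
    """Time on air for one packet in milliseconds, at 8 bits per byte."""
    return (payload_bytes + overhead_bytes) * 8 / bit_rate_bps * 1000

for name, payload in (("DS sketch", 26), ("CM sketch", 40)):
    print(f"{name}: {airtime_ms(payload):.2f} ms")
# DS sketch: 5.68 ms
# CM sketch: 7.96 ms
```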

I can also see the appeal of the idea of adding some randomness, as the intention is clearly to avoid a long period with two or more transmitters blocking each other until their slightly different clock rates allow them to drift apart.

I don’t see a viable “standard” solution; each approach would be the “best” under its own (different) conditions.

pb66 · 28 September 2019 12:32

Yes, the delay could be much less. I tend to use low node IDs, and my aim was to spread the first sends evenly(ish) across the 5-10 second interval. So with three emonTxs posting at 5 s intervals, nodes 1, 2 and 3 left a 2 second gap before the cycle restarted.

I see the appeal of adding a random element to avoid long periods of clashing after the devices have been running a while, since the “10 s intervals” are not always exactly 10 s on every device. However, that could currently cause an issue with the way fixed-interval feeds work, as the timestamp is the received time, not the pre-adjusted send time.

If there were a mechanism to recognise received packets as belonging to a particular timestamp, the random adjustment would be of great value. For example, if a packet counter were included in the payload and could be translated to a particular timestamp (e.g. (counter x 10 s) + start time = timestamp), then the actual send and receive times would matter less. A staggered and/or random rota would then work well and still allow “synchronised” data: all the emonTxs could be powered up at the same time and sample data for the same 10 s interval, but report at irregular or staggered times, whilst the receiver allocates the same timestamp to packets with the same count.
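A minimal sketch of the receiver side of this idea, assuming a hypothetical `counter` field that each node increments once per 10 s cycle and a common start time known to the receiver (for example, because all nodes were powered up together). Counter wraparound is not handled.

```python
from dataclasses import dataclass

INTERVAL_S = 10  # nominal reporting interval in seconds

@dataclass
class Packet:
    node_id: int
    counter: int         # hypothetical: reporting cycles completed since power-up
    received_at: float   # receiver clock, seconds (not used for the timestamp)

def slot_timestamp(packet: Packet, start_time: float, interval_s: float = INTERVAL_S) -> float:
    """timestamp = (counter x interval) + start time, independent of arrival time."""
    return start_time + packet.counter * interval_s

start = 1_569_600_000.0  # hypothetical common power-up time (Unix seconds)
# Three nodes report cycle 42 at staggered or random moments within the cycle.
packets = [Packet(1, 42, start + 421.3), Packet(2, 42, start + 422.9), Packet(3, 42, start + 427.6)]
stamps = {slot_timestamp(p, start) for p in packets}
print(stamps)  # a single shared timestamp: start + 420
```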

In my own sketches, although I described it above as a “delay”, the sketch actually keeps looping and only sends once the node ID x 1 s interval has passed. This has the advantage of letting the sampling bias-offset removal settle while there are no transmissions, so the first transmission is valid data. It eliminates “zero first values” for temperature etc., and power and voltage readings are accurate from the very first report.
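A rough simulation of that loop structure, with an explicit millisecond clock standing in for the Arduino `millis()`. The class and attribute names are hypothetical; the point is that sampling runs on every pass while the send is gated on node_id x 1000 ms and then advances by a fixed interval.

```python
class StaggeredSender:
    """Loop-driven send gate: sample on every pass, send only when due."""

    def __init__(self, node_id: int, interval_ms: int = 10_000):
        self.next_send_ms = node_id * 1000   # first send after node_id seconds
        self.interval_ms = interval_ms
        self.samples = 0
        self.sent_at = []

    def loop(self, now_ms: int) -> None:
        self.samples += 1                    # sampling (and offset settling) never pauses
        if now_ms >= self.next_send_ms:
            self.sent_at.append(now_ms)      # stand-in for the actual RF send
            self.next_send_ms += self.interval_ms

node = StaggeredSender(node_id=3)
for now_ms in range(0, 25_000, 100):         # one loop pass every 100 ms for 25 s
    node.loop(now_ms)
print(node.sent_at, node.samples)            # [3000, 13000, 23000] 250
```

Advancing `next_send_ms` by the interval, rather than setting it to `now_ms + interval_ms`, keeps the cadence from creeping by up to one loop period on each send.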

Robert.Wall · 28 September 2019 13:42

The ultimate answer is probably a ‘polled’ solution - but again that can’t be a universal standard because of the needs of battery-only users (though how many of those exist these days is a moot point).

However, since emonLibCM takes its transmission interval from mains time, it gets around the problem of RF collisions (provided, as you suggest, the sketches don’t all start at exactly the same instant) but does nothing for, and might even worsen, the long-term drift of mains time against emonCMS’s clock. Unless of course emonCMS can get its time from the same mains clock…

I’m not sure I understand how that is fundamentally different from clock drift between the sketch and emonCMS. You still have clock drift underlying the random jitter, so (considering just one transmitter) each sample gets its time stamp on arrival as before. The difference is that, rather than a regular pattern of one sample missed or overwritten (depending on the direction of drift) every n minutes (or hours), you’d have a scatter of correct or missed (or overwritten) samples spread randomly either side of the n-minute event. For a single transmitter, that’s an obvious degradation of the data for no benefit; with many transmitters, it would still make each individual channel’s data less consistent. The gain would be that long periods of missing data across pairs of channels should disappear, as suggested by @heckler.

That addresses the ‘scatter’ problem but sadly, it still leaves the issue of clock drift.
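To see the regular pattern of missed or overwritten samples, here is a small illustrative simulation of one sender whose interval runs slow or fast by an assumed 0.1% relative to the receiver's 10 s slots, with the receiver timestamping on arrival and keeping one value per slot. The drift figure is deliberately exaggerated to keep the numbers small.

```python
import math

def slot_effects(drift, sends=10_000, interval_s=10.0, first_send_s=0.5):
    """Bucket a drifting sender's packets into the receiver's fixed slots.

    The sender's real interval is interval_s * (1 + drift); the receiver
    stores one value per interval_s slot, keyed by arrival time.
    Returns (missed_slots, overwritten_values).
    """
    period = interval_s * (1 + drift)
    slots = [math.floor((first_send_s + k * period) / interval_s) for k in range(sends)]
    distinct = len(set(slots))
    span = slots[-1] - slots[0] + 1
    return span - distinct, sends - distinct

print(slot_effects(+0.001))  # slow sender: (10, 0)
print(slot_effects(-0.001))  # fast sender: (0, 10)
```

With drift ε, one slot is missed (slow sender) or one value overwritten (fast sender) roughly every 1/|ε| intervals; at 0.1% and 10 s that is once every ~1000 intervals, or about every 2.8 hours. Adding random jitter on top does not change this long-run count, only where the events fall.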

The other point about the length of the transmission is that with (say) 4 transmitters, the maximum time change needs to be only a few tens of milliseconds - if it was ±5 transmission periods (± 2 mains cycles with the longest data packet), there’s something like a 90% chance of no collision between any pair of transmitters. (Not being a statistician, I need to think about that for a long time, but that’s my best guess.)
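That estimate can be checked under explicit assumptions: each transmitter's offset in a cycle is independent and uniform over ±5 packet times (a window 10 packet lengths wide), all packets are the same length, and two packets collide if their start times are less than one packet length apart. For n uniform points in a window W, the probability that no two are closer than d is the standard spacing result (1 - (n-1)d/W)^n when (n-1)d ≤ W; the Monte Carlo function checks it numerically.

```python
import random

def p_clear_closed_form(n, window, d):
    """P(no two of n uniform start times in `window` are closer than d)."""
    return max(0.0, 1 - (n - 1) * d / window) ** n

def p_clear_monte_carlo(n, window, d, trials=100_000, seed=0):
    """Estimate the same probability by sampling sorted start times."""
    rng = random.Random(seed)
    clear = 0
    for _ in range(trials):
        t = sorted(rng.uniform(0, window) for _ in range(n))
        if all(b - a >= d for a, b in zip(t, t[1:])):
            clear += 1
    return clear / trials

# Window of 10 packet lengths (+/-5), collision if starts are < 1 packet apart.
pair = p_clear_closed_form(2, 10, 1)   # one given pair of transmitters, ~0.81
four = p_clear_closed_form(4, 10, 1)   # all six pairs among four, ~0.24
mc = p_clear_monte_carlo(4, 10, 1)
print(round(pair, 4), round(four, 4), round(mc, 3))
```

On these assumptions a given pair stays clear in about 81% of cycles, and all four transmitters stay clear together in about 24% of cycles. Because the offsets are redrawn every cycle, each collision costs one packet rather than causing the persistent blocking seen after the power cut.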

I’m afraid I still think there’s no easy solution. I’m also reasonably certain that there isn’t a solution.

nchaveiro · 29 September 2019 10:07

I’m using a different approach to this, but it breaks compatibility.
By using the LowPowerLab RFM69 library (GitHub: LowPowerLab/RFM69, for the RFM69W, RFM69HW, RFM69CW and RFM69HCW, i.e. Semtech SX1231/SX1231H), you get some nice features:

digital RSSI
configurable transmit power
packet acknowledgement

Each emonTH listens (measures RSSI) and only starts a transmission if nobody else is talking. A random delay is used to avoid collisions with other modules listening at the same time.
Each emonTH’s TX power is lowered until no acknowledgement is received, then raised a notch, so that each module only transmits at the power level required to reach the base.
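A simulation of the two mechanisms described, written as plain Python rather than against the LowPowerLab API (this is not Nuno's code). `channel_busy` and `acked` are hypothetical stand-ins for an RSSI check and a send-and-wait-for-ACK, and the 0-31 power range and 1-50 ms back-off are assumptions.

```python
import random

def listen_before_talk(channel_busy, rng, max_backoff_ms=50, max_tries=20):
    """Return how long (ms) the node waited before finding the channel clear.

    While channel_busy(t_ms) is True the node waits a random back-off, so two
    nodes that started listening together are unlikely to retry together.
    """
    waited_ms = 0
    for _ in range(max_tries):
        if not channel_busy(waited_ms):
            return waited_ms
        waited_ms += rng.randint(1, max_backoff_ms)
    return waited_ms  # design choice for this sketch: give up waiting and send anyway

def lowest_acked_power(acked, max_level=31):
    """Step the TX power down from max_level while the next level down is
    still acknowledged; the result is the lowest level that was acked,
    i.e. one notch above the first level that failed."""
    level = max_level
    while level > 0 and acked(level - 1):
        level -= 1
    return level

rng = random.Random(3)
print(listen_before_talk(lambda t_ms: t_ms < 30, rng))  # first clear moment at or after 30 ms
print(lowest_acked_power(lambda level: level >= 12))     # 12
```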

With this I was able to make the 2 AA cells last for more than 2 years on a network of 5 emonTHs posting every minute.
I can share my code for the emonTH and base if you want to take a look.

Robert.Wall · 29 September 2019 10:57

Thank you for that, Nuno.

I have looked at LPL and I did indeed see some of those features. I think @pb66 has looked too. As you say, adopting it would break compatibility, but now that the sketch in the emonPi is no longer automatically overwritten when the Pi’s emonCMS software is updated, there’s nothing to stop anyone changing to the LowPowerLab library. Of course, the two libraries will not operate together, as their message headers have some fundamental differences in structure.

I think if Claudio @heckler wanted to go down that route, your code would be a very good start for him. However, it still doesn’t solve the possible timing problem with emonCMS that @pb66 pointed out.