emonPi losing connection to a Tx4

Anyone have any idea why an emonPi (bought mid 2019; software up to date, moved to LPL firmware) would be losing connection to a Tx4 that is nearby (less than 2 feet away, though the emonPi is in a metal network cab; the side is currently off)?

It’s happened for the second time. The last time it happened, it’d happened while I was away, but the emonHP was still syncing data from it just fine (even though I’d never set it up to, autoconfig had). So I’m guessing it’s a sign the Tx4 is fine, though I had rebooted that to try.

The Tx4 is on the 3PH 12Ch v1.2.0 firmware due to nothing newer being available (boo).

The first time it happened, I tried updating the software on the emonPi, rebooting etc, nothing seemed to work. Eventually turning it off, disconnecting from the power for 30s and replugging it in seemed to help.

It’s happened again, well, according to the software ~38 hours ago. So I’ve just done a soft reboot and nothing, so will do another shutdown and power off…

I’m guessing this is a sign of a hardware issue, or something failing etc, potentially related to the board used for the device comms?

Trying to decide how much time is worth putting into this, vs just moving the monitoring to the emonHP and syncing that to HA afterwards.

I’m not using the 2 inbuilt CT ports on the emonPi, and as such, the voltage sampling it’s connected to is mostly redundant too.

Time to just retire the original emonPi?

The shutdown and removal of power before turning it on again seems to have helped this time again…

I’ve disconnected the voltage sampling connected to the emonPi too, for the reason it provides no value atm.

The emonHP is in the airing cupboard, so up two floors and across somewhat.

The emonPi (and Tx4) are in the garage.

I’ve seen that before. It needs the power removed to reset everything especially the RFM chip.

Is the emonPi doing anything other than recording the data? Does it do any measuring itself?

There will be a new Wi-Fi board soon that will just plug into the TX4 and uses ESPHome to communicate directly with HA and publish by HTTP or MQTT to emoncms. You could bypass RFM completely.

My ‘main’ emoncms is on a laptop with PVE and emoncms installed in an Ubuntu LXC. There is also a docker version of emoncms.

Good to know it’s not just me. I’m guessing there’s no apparent known (or easy?) fix then?

Looks like it’s happened again, ~7 hours ago. So maybe lasted ~4 days from the last reboot.

Guess I should go reboot it again.

Not really.

There are a few temperature probes attached, but nothing I really pay much attention to, or really care about, so they can be removed.

I had my house upgraded to 3PH last year, and eventually a 3PH inverter etc went in a few months later. So at that point, 2 CT clamps weren’t really very useful. Hence removing the voltage sampling recently too, as it served no purpose (as I have an emonVS for the Tx4).

It’s doing some summing etc of the data feeds for them to be consumed in HA, but nothing particularly strenuous otherwise.

Like I say, the emonHP is there doing some similar stuff, and I suspect isn’t exactly highly loaded either.

When it appears, the Wi-Fi board will probably be the best solution. You could point it to the emonHP then.

The emonHP can seemingly see the Tx4 just fine - it’s not far away, just through a couple of walls and the floor.

Originally I wondered if this was the problem, that it was trying to talk to two different devices.

It kept autoconfig-ing itself and showing up in the list, so I had to turn that off, and then still had to enable the “disable further input creation” feature to stop it appearing as inputs…

Ah yes it could be. I’d mistakenly thought the emonHP didn’t have RF. The two may well be conflicting.

Unfortunately, it’s still failing after turning off autodiscovery (at least in emoncms) on the emonHP, but that doesn’t mean the radio isn’t still doing things in the background.

Though as both are only reading… I would presume it shouldn’t matter too much in practice, especially when they’re not specifically paired etc.

The age of the emonPi (6.5+ years), and there seemingly being some known/seen problems in the wild, seem to tally reasonably and suggest it’s probably some issue with the wifi chip/board. Short of possibly replacing it (unless it’s fixable in firmware somehow; I think it’s been a while since any actual updates were available), there’s not a great deal that can be done there. Having to hard reboot the device every few days really isn’t sustainable, nor is it any use when I’m away from home etc.

I guess I might as well try turning autodiscovery back on on the emonHP, enable device creation again, and port over the (limited) feed logic I have on the emonHP… And start pulling that data into HA, and swap over the energy monitoring devices that it’s using.

Moved it over to the emonHP.

The old emonPi and emonHP seemed to be reading the data just fine at the same time.

Of course, as when I do things, I found a few bugs in emoncms etc, so bugs filed in some cases, PRs attached for others…

A possible explanation for the loss of data may be as follows.
Assuming that your system consists of an emonTx4, emonPi1 and emonHP, all communicating by radio using the LPL protocol, then:

I think the LPL protocol is designed for a many transmitter / one receiver system. When the Tx4 sends data to the receiver, it requests that the receiver acknowledge the receipt of the data by transmitting an ack packet. In a system with two receivers, both receivers will acknowledge the data and I think the LPL gets upset by this.

In my case I had a few transmitters and two receivers. When I switched them all from JeeLib to LPL protocol, I found the transmitters would occasionally lock up, until I removed one receiver. I don’t know if the two acks can lock up a receiver though.

So there may be nothing wrong with your emonPi …

The emonHP wasn’t specifically using the radio (because all the emonHP stuff is connected physically to the device), other than what it picked up via autoconf (there was a TH they were both listening to just fine).

I did turn off autoconf on the emonHP, but as I mentioned, that didn’t seem to fix it, certainly possible the radio was still doing stuff in the background though.

The TX seemed to continue working fine (and because 3PH 12CT, it is on an “older” firmware too). I didn’t need to reboot that to get it working (just the emonPi, and more than a reboot, it needed a full power off), and it continued transmitting to the emonHP just fine throughout.

On the emonPi1, the reset line on the RFM69 radio module isn’t connected to the Raspberry Pi - so it can’t reset it. It needs a full power down.

… and jam each other, so the emonTx transmits again, and again until it hits the limit for retries. Note I believe the emonTH does not request acks.
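A toy model of that failure mode, purely as a sketch. It assumes every receiver replies after a fixed turnaround time and that overlapping acks are always lost. The function names, the 1 ms ack duration and the retry limit of 8 are made-up illustration values, not taken from LPL or the emonTx firmware.

```python
def acks_collide(ack_starts, ack_duration):
    """Return True if any two ack transmissions overlap in time."""
    starts = sorted(ack_starts)
    return any(later - earlier < ack_duration
               for earlier, later in zip(starts, starts[1:]))


def send_with_retries(receiver_delays, ack_duration=1.0, max_retries=8):
    """Return the attempt number on which an ack got through, or None.

    receiver_delays: hypothetical fixed turnaround time (ms) of each receiver.
    Because the delays are fixed, receivers that collide once collide on
    every retry, so the transmitter runs into its retry limit.
    """
    for attempt in range(1, max_retries + 1):
        # An ack is only heard if at least one receiver replies
        # and no two replies overlap on air.
        if receiver_delays and not acks_collide(receiver_delays, ack_duration):
            return attempt
    return None


if __name__ == "__main__":
    print("one receiver:          ", send_with_retries([2.0]))
    print("two, same turnaround:  ", send_with_retries([2.0, 2.0]))
    print("two, spaced turnaround:", send_with_retries([2.0, 4.0]))
```

With one receiver the first attempt succeeds. With two receivers that answer at the same moment, every retry fails in this model, which matches the "again, and again until it hits the limit" behaviour described.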

To mitigate that, the Amateur Radio AX.25 Packet Radio protocol inserts a randomly variable delay in the transmitter key-up sequence.

Could something similar be incorporated into OEM devices that transmit acks?

I think so, because I believe returning the ack is done in the sketch, not inside LPL.

@Bill.Thomson - this does not, in general, mitigate the problem, as the clocks in the various devices are not synchronised, so a random delay just adds to the spread in time. The canonical study for unsynchronised senders is ALOHAnet.

There could be some improvements seen if the senders send nearly simultaneously, and the frequency difference would be negligible over the short time frame, so adding a large (relative to the packet time) random delay might help here.
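To put a rough number on "large relative to the packet time", here is a minimal sketch. It assumes two receivers that would otherwise ack at the same instant, and that each adds an independent delay drawn uniformly from 0 to `max_delay`. Two acks of length `packet_time` then overlap when the delays differ by less than `packet_time`, which gives a probability of 1 - (1 - packet_time/max_delay)^2. The names and the example figures are hypothetical.

```python
import random


def overlap_probability(packet_time, max_delay):
    """Closed-form chance that two acks overlap (uniform delays on [0, max_delay])."""
    if max_delay <= packet_time:
        return 1.0  # the delay window is too short to ever separate the acks
    return 1.0 - (1.0 - packet_time / max_delay) ** 2


def simulate(packet_time, max_delay, trials=100_000, seed=1):
    """Monte Carlo check of the formula above."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        d1 = rng.uniform(0, max_delay)
        d2 = rng.uniform(0, max_delay)
        if abs(d1 - d2) < packet_time:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    for window in (1, 2, 5, 10, 50):
        print(window, round(overlap_probability(1.0, window), 3),
              round(simulate(1.0, window), 3))
```

Under these assumptions a delay window of ten packet times still leaves roughly a one in five chance of a clash, so the window needs to be quite a bit longer than the ack itself to make collisions rare.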

We’re on the same sheet of music. :wink:

Which is exactly what the biggest problem with APRS was.
Many transmitters - with no clocks - were transmitting on the same channel.
Mobile stations, fixed stations, digipeaters. It made quite a mess of things.

So, Bob Bruninga, developer of APRS, implemented ALOHA.
It made a significant impact on channel throughput.

(For those not familiar with the acronym, APRS is the Automatic Packet Reporting System. Before the early 90s, APRS was known as the Automatic Position Reporting System)

From: AX.25 - Wikipedia

Media access control follows the carrier sense multiple access approach with collision recovery (CSMA/CR).


The mechanism I was referring to in my last post is:

Random Backoff: If a collision occurs, devices wait for a random period before attempting to retransmit. This randomization helps prevent repeated collisions by staggering the transmission attempts of different devices.
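As a rough illustration of random backoff (a simplified sketch, not AX.25 or anything in the OEM firmware): after a collision each station picks a random slot in a small window, and the round succeeds if exactly one station picked the earliest slot. The function name, the window of 8 slots and the round limit are hypothetical.

```python
import random


def resolve_collision(n_stations=2, window=8, max_rounds=16, rng=None):
    """Return how many backoff rounds pass before one station transmits alone.

    Returns None if the stations are still colliding after max_rounds.
    """
    rng = rng or random.Random()
    for round_no in range(1, max_rounds + 1):
        # Every station independently chooses how many slots to wait.
        slots = [rng.randrange(window) for _ in range(n_stations)]
        earliest = min(slots)
        # Success only if a single station holds the earliest slot;
        # a tie for the earliest slot is another collision.
        if slots.count(earliest) == 1:
            return round_no
    return None


if __name__ == "__main__":
    print(resolve_collision(rng=random.Random(0)))
    # A window of one slot gives no randomness, so the stations never separate.
    print(resolve_collision(window=1))
```

The second call shows why the randomness matters: with only one slot to choose from, both stations retry at the same moment every time.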


73,

KR6K

For non-ham readers, @Bill.Thomson is saying “love to all” “best regards”, with his call sign…:slightly_smiling_face:.

Hi @SarahH,

73 means best regards.

Hamspeak for love is 88.

:wink: :grin:
