EmonTx4's unstable behavior

Hi all,

This thread is a “copy” of an issue I wrote a couple of days ago. Here, it’ll be more visible, and maybe other users have experienced the same problem.

While playing around with an ESP32 expansion board and ESPHome, I noticed that the firmware sometimes does not run in a stable way.
It happened mostly after a power-down followed by a power-up.
It looks like, sporadically, things are not loaded properly during power-up and part of the running environment (RAM, …) is corrupt.
Unfortunately, I cannot describe a clean scenario to reproduce it.
These problems came together with some difficulty getting things reset using the web-serial page “Serial Monitor”.
I had to power down and power up multiple times until I finally got the correct UI, with all the parameters, such as the default Datalog period of 9.80.

During the unstable phase, the serial output was randomly corrupt: parameters sent multiple times, non-existent parameters, or even broken JSON. For example:

Received data: {"MSG":3085,"V1":232.89,"V2":232.52,"V3":232.80,"P1":1,"P2":0,"P3":0,"P4":0,"P5":0,"P6":0,"E1":0,"E2":0,"E3":0,"E4":0,"E:0,"P6":0,"E1":0,"E2":0,"E3":0,"E4":0,"E5":0,"E6":0,"pulse":0}

I managed to catch such a line in the log…
Not only are some sensors sent multiple times, but the JSON is broken too.

"E4":0,"E:0,"P6":0
This part… there’s a missing " AND the tag “E” should be indexed!

The very strange thing is that the JSON was not always corrupt and these wrongly duplicated parameters were not always there; some lines were fully okay!
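
To spot such lines automatically in a captured log, a small check along these lines could help. This is only a sketch in standard C++, meant to run on a PC rather than on the emonTx4. `looksLikeValidFrame` is a hypothetical name, and the sketch assumes every frame is a flat object of quoted alphanumeric keys with plain numeric values, as in the lines above. The caller is assumed to have stripped any trailing CR/LF first.

```cpp
#include <cctype>
#include <cstddef>
#include <set>
#include <string>

// Hypothetical helper (not part of the emonTx4 firmware or ESPHome).
// Returns true only if `line` has the form {"KEY":number,"KEY":number,...}
// and no key appears more than once.
bool looksLikeValidFrame(const std::string& line) {
    const std::size_t n = line.size();
    std::size_t i = 0;
    if (n < 2 || line[i++] != '{') return false;
    std::set<std::string> seen;
    while (true) {
        // Key: opening quote, non-empty run of letters/digits, closing quote.
        if (i >= n || line[i++] != '"') return false;
        std::string key;
        while (i < n && std::isalnum(static_cast<unsigned char>(line[i]))) {
            key += line[i++];
        }
        if (key.empty() || i >= n || line[i++] != '"') return false;
        if (i >= n || line[i++] != ':') return false;

        // Value: optional minus sign, digits, optional fractional part.
        if (i < n && line[i] == '-') ++i;
        std::size_t digits = 0;
        while (i < n && std::isdigit(static_cast<unsigned char>(line[i]))) {
            ++i;
            ++digits;
        }
        if (digits == 0) return false;
        if (i < n && line[i] == '.') {
            ++i;
            std::size_t frac = 0;
            while (i < n && std::isdigit(static_cast<unsigned char>(line[i]))) {
                ++i;
                ++frac;
            }
            if (frac == 0) return false;
        }

        // A repeated key (e.g. "P6" twice) marks the frame as corrupt.
        if (!seen.insert(key).second) return false;

        if (i >= n) return false;
        if (line[i] == ',') { ++i; continue; }
        // The closing brace must be the last character of the line.
        return line[i] == '}' && i + 1 == n;
    }
}
```

Fed with the corrupt line above, the check fails at the `"E:0` fragment because the closing quote of the key is missing. A line that merely repeats `"P6"` would be rejected by the duplicate-key test.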

After I got my emonTx4 fixed, it has been running perfectly since the 20th of July, 13:30! I couldn’t find any more broken lines.

It ran smoothly for some time and then…

A couple of minutes ago, I started to see errors and strange things in Home Assistant and on emoncms.org.
Leaving the emonTx4 powered on, I removed the expansion board.
Then I wired only GND and RX to an FTDI adapter, so that no reboot could be triggered.
Monitoring the serial port with VSCode, I now see that instead of getting an output every 9.80 seconds, I get one full line every 0.5 seconds!
Here is a piece of the output with timestamps:

20:51:11:580 -> {"MSG":51789,"V1":233.03,"V2":233.46,"V3":233.40,"P1":1,"P2":0,"P3":0,"P4":0,"P5":0,"P6":0,"E1":25,"E2":-17,"E3":-20,"E4":2,"E5":-8,"E6":-10,"pulse":0}
20:51:12:049 -> {"MSG":51790,"V1":233.43,"V2":233.72,"V3":233.51,"P1":0,"P2":0,"P3":0,"P4":0,"P5":0,"P6":0,"E1":25,"E2":-17,"E3":-20,"E4":2,"E5":-8,"E6":-10,"pulse":0}
20:51:12:539 -> {"MSG":51791,"V1":233.49,"V2":233.71,"V3":233.31,"P1":0,"P2":0,"P3":0,"P4":0,"P5":0,"P6":0,"E1":25,"E2":-17,"E3":-20,"E4":2,"E5":-8,"E6":-10,"pulse":0}
20:51:13:008 -> {"MSG":51792,"V1":233.47,"V2":233.09,"V3":233.79,"P1":0,"P2":0,"P3":0,"P4":0,"P5":0,"P6":0,"E1":25,"E2":-17,"E3":-20,"E4":2,"E5":-8,"E6":-10,"pulse":0}
20:51:13:497 -> {"MSG":51793,"V1":233.71,"V2":233.38,"V3":233.02,"P1":0,"P2":0,"P3":0,"P4":0,"P5":0,"P6":0,"E1":25,"E2":-17,"E3":-20,"E4":2,"E5":-8,"E6":-10,"pulse":0}
20:51:13:966 -> {"MSG":51794,"V1":233.54,"V2":233.66,"V3":233.71,"P1":0,"P2":0,"P3":0,"P4":0,"P5":0,"P6":0,"E1":25,"E2":-17,"E3":-20,"E4":2,"E5":-8,"E6":-10,"pulse":0}
20:51:14:456 -> {"MSG":51795,"V1":233.32,"V2":233.09,"V3":233.69,"P1":0,"P2":0,"P3":0,"P4":0,"P5":0,"P6":0,"E1":25,"E2":-17,"E3":-20,"E4":2,"E5":-8,"E6":-10,"pulse":0}
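
For reference, the spacing between those lines can be computed from the timestamps. Below is a minimal sketch in standard C++ (for a PC, not the emonTx4), assuming the monitor's `HH:MM:SS:mmm` layout shown above. `timestampToMs` and `intervalMs` are hypothetical helper names, and the sketch ignores a wrap-around at midnight.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: converts "HH:MM:SS:mmm" into milliseconds since
// midnight. Returns -1 if the text does not contain four numeric fields.
long timestampToMs(const std::string& ts) {
    int h = 0, m = 0, s = 0, ms = 0;
    // %d always parses decimal, so "049" is read as 49 (not as octal).
    if (std::sscanf(ts.c_str(), "%d:%d:%d:%d", &h, &m, &s, &ms) != 4) {
        return -1;
    }
    return ((h * 60L + m) * 60L + s) * 1000L + ms;
}

// Elapsed milliseconds between two timestamps taken on the same day.
long intervalMs(const std::string& earlier, const std::string& later) {
    return timestampToMs(later) - timestampToMs(earlier);
}
```

For the first two lines above, `intervalMs("20:51:11:580", "20:51:12:049")` gives 469 ms, and the following gaps alternate between roughly 470 and 490 ms. That is about twenty times faster than the configured 9.8 s period.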

The output itself does not seem to be corrupt, so the corruption probably happens on the ESP32 side, likely because too much data is being sent.
This “timing bug” happened without any reboot or power loss; see the historical data for MSG in the picture.

[Image: historical data for MSG]

At the time of writing, my emonTx4 has been running smoothly again since the 24th of July, 21:17.

I think this is the cause of the issue. The ESP draws a lot of power (relatively) so you’ll get a transient on the power supply when you remove it while powered on. Power supply glitches are a classic method for scrambling microcontroller execution.

Potential mitigation approaches:

  • Don’t remove or install the ESP when powered on :slight_smile:
  • Improve the power supply stability. Hard to do on an existing board, certainly not scalable.
  • Investigate the brown-out detector threshold. This will attempt to mitigate low voltage situations.
  • Add a watchdog and hope the problem manifests such that the watchdog resets into a good state.

Well, I did that to analyze what was wrong without rebooting the emonTx.
And I connected an FTDI with only GND and RX/TX, again to avoid any reboot.
That’s how I found that the serial output was in fact not corrupt, BUT the emonTx sent a full message every half second!

Hi all,

There’s a REAL problem with power-ON of the emonTx4.

Yesterday, I powered it down through the emonVs (I have a switch on the power cable to the emonVs).

And then I powered it ON again… since then, all the measurements are completely broken. This time, the rate at which the emonWifi module receives the messages is still correct, i.e. every 9.8 seconds.

I don’t know if only my emonTx4 is broken, or if the whole series is affected! I think it must be some firmware issue.

After that, it’s very, very hard to get it up and running again :frowning: Most of the time, the settings are reset to the defaults (RF enabled, serial format as “key:value pairs”, …).

Fred

I’m more and more convinced that brown out detection on the ATmega328 is either not well configured (this would be an Arduino setting) or not working robustly - the fall rate of the voltage from the emonVs when you power it off is slow, so the transition to a voltage below the working threshold is perhaps too slow to trigger it.

The emontx4 is not running on an atmega328p.

It’s an AVR128DB48.

To get it up and running again, I had to use a USB cable and reset the settings to the desired ones (full JSON, …) with the Serial Config Tool.

Leaving the USB cable connected, I plugged in the RJ11 cable from the emonVs.

Then I removed the USB cable, so there was no power interruption.

Surprisingly, this time, the saved energy values didn’t get lost!

Maybe the EEPROM is read too quickly during setup? That could at least explain why the settings were wrong.

Sorry, yes. Very similar.

My hypothesis is that the BoD is not very good and you’re getting multiple resets with the slowly falling voltage supply. This might be what is causing the EEPROM to be corrupted - it’s being accessed repeatedly at a very marginal supply level.

You’re the HW specialist :smiley:

If this could be fixed with some SW/FW tuning, I could give a try!

Hello @FredM67

In hindsight, I made a key error with the expansion board pinout options for the emonTx4, which was not routing 5V to the expansion board. Only the regulated 3.3V supply is routed through to the expansion board. I think this under-powered supply is causing your issues. It’s also difficult to pick up 5V from the PCB with e.g. a wire link for an easy fix.

I think the best solution is that I get you an emonTx6 / emonPi3 with the ESP expansion board that @awjlogan has developed!

Hi @TrystanLea,

It was absolutely no problem to wire the +5V to the corresponding pin used by the emonWifi module I got from @awjlogan. I’ve just soldered a wire from the RJ11 connector (back side of the PCB) to the pin for the expansion board.

But the problem I encounter is not related to that. At the very beginning, before Angus sent me an emonWifi module, I used a module I made myself, inspired by the ESP32 expansion board. This module was powered from the 3.3V supply and, from time to time, the emonTx4 went wrong and ran somewhat chaotically. We traced this issue to the power supply being too weak.

The problem that still exists is related also to the power supply, but in a different way.

It looks like something goes wrong during power off and/or power on. Angus leans towards a HW issue, but it could also be a FW issue.

My hypothesis is that the BoD is not very good and you’re getting multiple resets with the slowly falling voltage supply. This might be what is causing the EEPROM to be corrupted - it’s being accessed repeatedly at a very marginal supply level.

As I wrote to Angus, the settings were corrupt (for example, RF was enabled again, Serial was set to key/pair, …) BUT the energy counters were still there and correct.

Sure, I’m interested in the new emonTx6/emonPi3, but there’s no hurry (unless you need a tester); I can wait until it’s officially released.

Fred

There are a couple of BOD related options when compiling AVR-DB firmware e.g:

and:

but could the issue you are experiencing be a fault with your emonVs perhaps rather than the BOD?

I’m not aware of settings getting reset being a common issue, apart from the issue fixed in the December 2022 EmonTx4 firmware update, v1.5.4 (fix for calibration reset issue).

Hi @TrystanLea,

Well, you’re right, we only focused on an emonTx4 “fault”, but it could also be that my emonVs has some weak point, especially during power down and/or power up!
It’s a V1.1, in the 3-phase version.

Fred

I don’t think there’s a fault with the emonVs - it’s just a rather slow falling voltage. Looking at the AVR configuration, it appears the BOD is disabled (“Disabled/Disabled”) and also the BOD level is very close to VDDMIN (1.8 V). Given the Tx is always running from a regulated supply, I would enable the BOD (if it’s disabled) and raise the level to 2.85 V (although apparently that is the Arduino default). However, making those changes requires the bootloader to be updated, which requires an external programmer (another Arduino can do this). You can check what the current level is by reading the BOD’s CTRLB.LVL field, which is loaded from the fuses, and whether it’s active by reading the BOD’s CTRLA.ACTIVE field.

A software mitigation might be to check the last reset source and if it was done by the BOD (RSTFR.BORF == 1) then delay for a short time (say 100 ms) before doing anything. This would allow the power supply to settle before proceeding or, if it’s falling, to power off without doing any accesses.
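
As a rough sketch of that idea, the decision can be kept in two small functions that take the raw reset-flag value. The names `wasBrownOutReset` and `startupDelayMs` are hypothetical, and the bit positions (PORF in bit 0, BORF in bit 1) are what I recall from the AVR128DB datasheet, so please check them against the `RSTCTRL_BORF_bm` definition in the device headers. Two further assumptions: the BOD must be enabled for BORF to ever be set, and the bootloader must not have cleared RSTFR before the sketch runs (some cores stash the flags elsewhere).

```cpp
#include <cstdint>

// Assumed bit masks for RSTCTRL.RSTFR on the AVR128DB48 (verify against the
// datasheet or the RSTCTRL_PORF_bm / RSTCTRL_BORF_bm macros in <avr/io.h>).
constexpr std::uint8_t kPowerOnResetMask  = 0x01;  // PORF
constexpr std::uint8_t kBrownOutResetMask = 0x02;  // BORF

// True if the reset-flag value says the last reset came from the BOD.
bool wasBrownOutReset(std::uint8_t rstfr) {
    return (rstfr & kBrownOutResetMask) != 0;
}

// How long to wait (in ms) before touching EEPROM or peripherals.
// 100 ms is the value suggested above, not a measured figure.
std::uint16_t startupDelayMs(std::uint8_t rstfr) {
    return wasBrownOutReset(rstfr) ? 100 : 0;
}

// On the device, the first lines of setup() might then look like this
// (untested, shown as a comment because it needs the Arduino core):
//
//   const uint8_t flags = RSTCTRL.RSTFR;   // read the reset cause
//   RSTCTRL.RSTFR = flags;                 // writing 1s clears the flags
//   delay(startupDelayMs(flags));          // let the supply settle
```

Keeping the decision separate from the register access makes it easy to test on a PC and to change the delay later without touching the rest of setup().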

For the Tx6/Pi3, the SAMD’s BOD is more sophisticated. At startup, the BOD level is set high but does not assert reset - once the supply rises above the BOD level, the reset is made active and execution continues. This is needed to prevent corruption of the flash, particularly when using the fast clock. See the source for more context.

Well, I can try to enable the BOD and set the level to 2.85V… as long as I find out how to do that without “breaking” my Tx4, lol.

Will I need to remove the emonWifi module for these experiments?

Fred

I would check what the existing setting is first - I’m not too familiar with Arduino, but you might be able to do something like:

Serial.print(F("BODCFG = 0x"));  
Serial.println(FUSE.BODCFG, HEX);

Note, that fuse bits are inverted (i.e. 1 → not set) - best to feed it into one of the fuse decoders if you’re unsure. The value for the BOD config is described in section 8.8.2.2 of the AVR128DB’s datasheet.

Nope - the code above would just print over the normal serial connection.

Interested to hear the results :slight_smile:

First, here’s the log during upload:

Uploading .pio\build\Upload_USART\firmware.hex

avrdude: Version 7.1-arduino.1
Copyright the AVRDUDE authors;
see avrdude/AUTHORS at main · avrdudes/avrdude · GitHub

     System wide configuration file is C:\Users\metrichf\.platformio\packages\tool-avrdude\avrdude.conf

     Using Port                    : COM5
     Using Programmer              : arduino
     Overriding Baud Rate          : 115200
     AVR Part                      : AVR128DB48
     RESET disposition             : dedicated
     RETRY pulse                   : SCK
     Serial program mode           : yes
     Parallel program mode         : yes
     Memory Detail                 :

                                       Block Poll               Page                       Polled
       Memory Type Alias    Mode Delay Size  Indx Paged  Size   Size #Pages MinW  MaxW   ReadBack
       ----------- -------- ---- ----- ----- ---- ------ ------ ---- ------ ----- ----- ---------
       fuse0       wdtcfg      0     0     0    0 no          1    1      0     0     0 0x00 0x00
       fuse1       bodcfg      0     0     0    0 no          1    1      0     0     0 0x00 0x00
       fuse2       osccfg      0     0     0    0 no          1    1      0     0     0 0x00 0x00
       fuse4       tcd0cfg     0     0     0    0 no          1    1      0     0     0 0x00 0x00
       fuse5       syscfg0     0     0     0    0 no          1    1      0     0     0 0x00 0x00
       fuse6       syscfg1     0     0     0    0 no          1    1      0     0     0 0x00 0x00
       fuse7       codesize    0     0     0    0 no          1    1      0     0     0 0x00 0x00
       fuse8       bootsize    0     0     0    0 no          1    1      0     0     0 0x00 0x00
       fuses                   0     0     0    0 no          9   16      0     0     0 0x00 0x00
       lock                    0     0     0    0 no          4    1      0     0     0 0x00 0x00
       tempsense               0     0     0    0 no          2    1      0     0     0 0x00 0x00
       signature               0     0     0    0 no          3    1      0     0     0 0x00 0x00
       prodsig                 0     0     0    0 no        125  125      0     0     0 0x00 0x00
       sernum                  0     0     0    0 no         16    1      0     0     0 0x00 0x00
       userrow     usersig     0     0     0    0 no         32   32      0     0     0 0x00 0x00
       data                    0     0     0    0 no          0    1      0     0     0 0x00 0x00
       eeprom                  0     0     0    0 no        512    1      0     0     0 0x00 0x00
       flash                   0     0     0    0 no     131072  512      0     0     0 0x00 0x00

     Programmer Type : Arduino
     Description     : Arduino for bootloader using STK500 v1 protocol
     Hardware Version: 3
     Firmware Version: 25.1

avrdude: AVR device initialized and ready to accept instructions
avrdude: device signature = 0x1e970c (probably avr128db48)
avrdude: reading input file .pio\build\Upload_USART\firmware.hex for flash
with 25156 bytes in 1 section within [0x200, 0x6443]
using 50 pages and 444 pad bytes
avrdude: writing 25156 bytes flash …

Writing | ################################################## | 100% 3.79s

avrdude: 25156 bytes of flash written
avrdude: verifying flash memory against .pio\build\Upload_USART\firmware.hex

Reading | ################################################## | 100% 2.47s

avrdude: 25156 bytes of flash verified

avrdude done. Thank you.

Then the serial output during runtime (I’ve added both Serial.print at the beginning of setup):

BODCFG = 0x0
firmware = emon_DB_6CT
version = 2.1.1
hardware = emonTx5
voltage = 1phase
Loaded EEPROM config
vCal = 101.30
iCal1 = 100.00, iLead1 = 3.20
iCal2 = 50.00, iLead2 = 3.20
iCal3 = 50.00, iLead3 = 3.20
iCal4 = 20.00, iLead4 = 3.20
iCal5 = 20.00, iLead5 = 3.20
iCal6 = 20.00, iLead6 = 3.20
pulse = off
RF = off
datalog = 9.80
json = on
vrefa = 1.0262
{"MSG":1,"V1":232.10,"P1":0,"P2":0,"P3":0,"P4":0,"P5":0,"P6":0,"E1":0,"E2":0,"E3":0,"E4":0,"E5":0,"E6":0}

Thanks Fred. Mmm, unclear - the Arduino bootloader can’t program the fuses, so the first part is not relevant, but the FUSE read being 0x0 indicates it’s inactive. One more request: can you do something like this?

Serial.print(F("CTRLA = 0x"));  
Serial.println(BOD.CTRLA, HEX);
Serial.print(F("CTRLB = 0x"));  
Serial.println(BOD.CTRLB, HEX);

Here we go :smiley:

BODCFG = 0x0
CTRLA = 0x0
CTRLB = 0x0
firmware = emon_DB_6CT
version = 2.1.1
hardware = emonTx4
voltage = 3phase
Loaded EEPROM config
vCal = 101.30
iCal1 = 20.00, iLead1 = 3.20
iCal2 = 20.00, iLead2 = 3.20
iCal3 = 20.00, iLead3 = 3.20
iCal4 = 20.00, iLead4 = 3.20
iCal5 = 20.00, iLead5 = 3.20
iCal6 = 20.00, iLead6 = 3.20
pulse = off
RF = off
datalog = 9.80
json = on
vrefa = 1.0260
{"MSG":1,"V1":232.78,"V2":232.77,"V3":232.71,"P1":0,"P2":0,"P3":0,"P4":0,"P5":0,"P6":0,"E1":0,"E2":0,"E3":0,"E4":0,"E5":0,"E6":0}

Well, that suggests the BOD is indeed disabled. I would definitely recommend enabling it as we discussed above. A bit of a pain you can’t enable it through firmware.

To change the fuses, you’ll need to upload the Arduino bootloader again through the UPDI header.
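
As a cross-check of the 0x0 readings above, the BODCFG byte can be split into its fields on a PC. This is only a sketch in standard C++. The field layout (SLEEP in bits 1:0, ACTIVE in bits 3:2, SAMPFREQ in bit 4, LVL in bits 7:5) and the mode names are what I recall from the AVR128DB datasheet, and the sketch assumes the value is read non-inverted, so treat all of it as assumptions to verify against section 8.8.2.2. `BodConfig`, `decodeBodcfg` and `activeModeName` are hypothetical names.

```cpp
#include <cstdint>
#include <string>

// Fields of the BODCFG fuse byte (assumed layout, see the note above).
struct BodConfig {
    std::uint8_t sleep;   // bits 1:0, BOD mode while sleeping
    std::uint8_t active;  // bits 3:2, BOD mode while awake
    bool sampFreq;        // bit 4, sample frequency select
    std::uint8_t level;   // bits 7:5, index of the threshold voltage
};

// Splits the raw fuse byte into its fields.
BodConfig decodeBodcfg(std::uint8_t raw) {
    BodConfig cfg;
    cfg.sleep = raw & 0x03;
    cfg.active = (raw >> 2) & 0x03;
    cfg.sampFreq = ((raw >> 4) & 0x01) != 0;
    cfg.level = (raw >> 5) & 0x07;
    return cfg;
}

// Human-readable name for the ACTIVE field (assumed encoding).
std::string activeModeName(std::uint8_t active) {
    switch (active & 0x03) {
        case 0: return "disabled";
        case 1: return "enabled";
        case 2: return "sampled";
        default: return "enabled, wake-up held until BOD ready";
    }
}
```

With these assumptions, `decodeBodcfg(0x00)` gives ACTIVE = 0 ("disabled") and level index 0, which matches the conclusion above. A value such as 0x64 would decode to ACTIVE = 1 ("enabled") with level index 3; if I read the datasheet table correctly, that index is the 2.85 V threshold, but please confirm it before writing any fuse.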