EmonTx4's unstable behavior

That is, the numbers read “V1”:-180.00,“V2”:0.00,“V3”:0.00 or something like that?

Exactly!

Did you re-compile and re-load the sketch after you set the fuses?

Yes, I’ve recompiled everything and re-uploaded the sketch afterwards.

Have you changed the processor clock speed?

Nope. At least, I’ve just set up the parameters as in @TrystanLea’s screenshot and/or the screenshot from the doc you sent me. But remember, this strange behavior was already there before I made these fuse changes.

See new (wrong) log with frequency:

[08:53:23.164][I][emontx:147]: Received data: {"MSG":12,"F":50.31,"V1":-162.08,"V2":0.00,"V3":0.00,"P1":0,"P2":0,"P3":0,"P4":0,"P5":0,"P6":0,"E1":0,"E2":0,"E3":0,"E4":0,"E5":0,"E6":0}
[08:53:32.997][I][emontx:147]: Received data: {"MSG":13,"F":50.31,"V1":262.56,"V2":0.00,"V3":0.00,"P1":0,"P2":0,"P3":0,"P4":0,"P5":0,"P6":0,"E1":0,"E2":0,"E3":0,"E4":0,"E5":0,"E6":0}
[08:53:42.826][I][emontx:147]: Received data: {"MSG":14,"F":50.29,"V1":-62.88,"V2":0.00,"V3":0.00,"P1":0,"P2":0,"P3":0,"P4":0,"P5":0,"P6":0,"E1":0,"E2":0,"E3":0,"E4":0,"E5":0,"E6":0}
[08:53:52.627][I][emontx:147]: Received data: {"MSG":15,"F":50.30,"V1":-276.64,"V2":0.00,"V3":0.00,"P1":0,"P2":0,"P3":0,"P4":0,"P5":0,"P6":0,"E1":0,"E2":0,"E3":0,"E4":0,"E5":0,"E6":0}
[08:54:02.459][I][emontx:147]: Received data: {"MSG":16,"F":50.19,"V1":-126.24,"V2":0.00,"V3":0.00,"P1":0,"P2":0,"P3":0,"P4":0,"P5":0,"P6":0,"E1":0,"E2":0,"E3":0,"E4":0,"E5":0,"E6":0}
[08:54:12.282][I][emontx:147]: Received data: {"MSG":17,"F":50.39,"V1":51.68,"V2":0.00,"V3":0.00,"P1":0,"P2":0,"P3":0,"P4":0,"P5":0,"P6":0,"E1":0,"E2":0,"E3":0,"E4":0,"E5":0,"E6":0}
[08:54:22.105][I][emontx:147]: Received data: {"MSG":18,"F":50.19,"V1":197.76,"V2":0.00,"V3":0.00,"P1":0,"P2":0,"P3":0,"P4":0,"P5":0,"P6":0,"E1":0,"E2":0,"E3":0,"E4":0,"E5":0,"E6":0}

As you can see in the log, the voltage values are complete garbage!

I did not realise this.

Has your version of the sketch or the emonLibDB library got corrupted? Have you checked both against Github?

If all is OK, can you add more debugging statements?

I’ve extracted from the library and the sketch the lines where the rms voltage is calculated. The instantaneous voltage samples are squared and accumulated in longProcessing->sumSampleSquared, averaged to give the mean square, the residual offset is removed, then the square root is taken to give rms, and finally calibration is applied. You cannot of course have a square (of an integer) that is negative.

[emonLibDB]
vInput[i].Vrms = ((double)longProcessing->sumSampleSquared[i] / longProcessing->sampleSets);
vInput[i].Vrms -= ((double)longProcessing->cumSampleDeltas[i] / longProcessing->sampleSets 
                               * longProcessing->cumSampleDeltas[i] / longProcessing->sampleSets);
vInput[i].Vrms = sqrt(vInput[i].Vrms);
vInput[i].Vrms *= vInput[i].amplitudeCal;
vInput[i].Vrms *= VrefAdjust;

double EmonLibDB_getVrms(uint8_t input)
{
    return vInput[--input].Vrms;
}

[Sketch]
emon.V[ch] = EmonLibDB_getVrms(ch + 1) * 100;
Serial.print(emon.V[ch] * 0.01);

It is the last line there which prints the negative value.
You cannot take the square root of a negative value - it results in an error, therefore the problem lies after the line where this happens.
The first possibility is vInput[i].amplitudeCal. This appears to be correct - it is printed as correct before the sketch starts measuring, but it might have been corrupted. Unfortunately, there isn’t a “getter” function to return its value.
Next is VrefAdjust, so check it. The function EmonLibDB_getADCCal(void) returns it, so print this. Serial.print(EmonLibDB_getADCCal( ));
Finally, can you do Serial.print(EmonLibDB_getVrms(1)); This prints V1 directly from the library, ignoring the multiplication and division in the sketch.
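To make that reasoning concrete, here is a minimal host-side sketch of the same sequence of steps. The struct `Accumulators` and the function `rmsFrom` are hypothetical names of mine, not part of emonLibDB, and I am assuming standard IEEE floating point, where `sqrt` of a negative number gives NaN rather than a negative number.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical stand-in for the library's per-input accumulators.
struct Accumulators {
    int64_t  sumSampleSquared;  // sum of squared samples
    int64_t  cumSampleDeltas;   // sum of samples (residual offset)
    uint32_t sampleSets;        // number of sample sets accumulated
};

// Same order of operations as the emonLibDB extract above.
double rmsFrom(const Accumulators &a, double amplitudeCal, double vrefAdjust)
{
    assert(a.sampleSets > 0);
    double v = (double)a.sumSampleSquared / a.sampleSets;         // mean square
    double meanDelta = (double)a.cumSampleDeltas / a.sampleSets;  // mean offset
    v -= meanDelta * meanDelta;   // remove residual offset
    v = std::sqrt(v);             // NaN (never a negative number) if v < 0
    v *= amplitudeCal;
    v *= vrefAdjust;
    return v;
}
```

With sane accumulators and positive calibration this can only return a positive number. A negative mean square gives NaN, so a finite negative Vrms needs a negative multiplier after the square root, or the stored result being overwritten afterwards.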

I had not appreciated that the issue was that fundamental.

If you reset the EEPROM to default values so that it does not read settings from EEPROM are the voltage readings stable?

@TrystanLea
According to this post (EmonTx4's unstable behavior - #56 by FredM67), it is using the default values. Hence my dubiety regarding the integrity of the library and sketch that @FredM67 is using.

This is what happens when I feed what I believe is the same sketch with a negative voltage calibration constant for a minute or so. It’s certainly not the same as what @FredM67 is seeing.

This is another strange aspect of this issue!

Can you try uploading a pre-compiled version of the firmware, @FredM67, using EmonScripts/emonupload.py?

Indeed. It all smells of corruption somewhere along the line.
You (@TrystanLea) put the voltages in the sketch in an array, so those are contiguous. I put all the properties of a voltage into a struct, and the three structs into an array, so those are pretty closely bunched together in memory too.

So something is overflowing (repeatedly?) and corrupting one of those patches in memory? That is my working theory at present.

So… I think more and more that the issue is somehow related to some variables shared between the ISR and the loop.

Serial.print(F(",\"Vraw"));
Serial.print(ch + 1);
Serial.print("\":");
Serial.print(EmonLibDB_getVrms(ch + 1));

Serial.print(F(",\"VCal"));
Serial.print(ch + 1);
Serial.print("\":");
Serial.print(EmonLibDB_getVcal(ch + 1));

When one or both “blocks” are active in the code (I’ve added them just after the prints for the voltages), I could not reproduce the behavior, BUT then very often (mostly, in fact) all voltages were 0, with their matching vCal at zero too!
I don’t understand anything now!
I tried 20-30 times: as soon as one of these “Serial.print” calls is present, there’s no way I can see the “jumping” voltage behavior.
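For anyone following along, this is the kind of failure I have in mind with ISR-shared variables. It is only a host-side simulation under my own assumptions: on an 8-bit AVR a multi-byte variable is read one byte at a time, so an interrupt that changes it halfway through can leave the loop with a mix of old and new bytes. The function `tornRead` is a hypothetical name for illustration and is not taken from the library.

```cpp
#include <cassert>
#include <cstdint>

// Simulates a 4-byte read on an 8-bit CPU that is interrupted after
// `bytesReadBeforeIsr` bytes: those low bytes come from the old value,
// the remaining high bytes come from the value the ISR has just written.
uint32_t tornRead(uint32_t before, uint32_t after, unsigned bytesReadBeforeIsr)
{
    assert(bytesReadBeforeIsr <= 4);
    uint32_t result = 0;
    for (unsigned i = 0; i < 4; ++i) {
        uint32_t source = (i < bytesReadBeforeIsr) ? before : after;
        result |= source & (0xFFu << (8 * i));  // copy byte i only
    }
    return result;
}
```

For example, if the ISR changes the value from 0x0000FFFF to 0x00010000 after the loop has read two bytes, the loop ends up with 0x0001FFFF, which is neither the old nor the new value. On the real hardware the usual guard is to copy the variable with interrupts disabled (for instance with ATOMIC_BLOCK from avr-libc’s util/atomic.h); I have not checked here whether every shared variable is protected that way.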

So why are you the first to have the problem, when the library has been in use for 2 years now?

Have you done as I suggested and matched both the library’s and the sketch’s source code on Github against your copy? Or done as Trystan suggested and tried his pre-compiled version?
Trystan’s suggestion could prove your compilation was bad, mine could show where the problem has arisen.

@TrystanLea
Could this be linked to the changes you’ve made to the EEPROM library? I think you really need an AVR328P version and an AVR-DB version.

Sorry guys, I know it looks strange that I’m the “only” one with this problem. Maybe it’s related to the 3-phase sketch?

I’ve checked on 2 different laptops, with a freshly cloned repo… same issue.

I’ve uploaded a pre-compiled binary as @TrystanLea requested it => same issue (btw, the avrdude.conf from the EmonScripts repo is outdated!).

See log with the fresh pre-compiled binary:

[16:56:11.606][I][emontx:147]: Received data: {"MSG":3,"V1":175.52,"V2":0.00,"V3":0.00,"P1":0,"P2":0,"P3":0,"P4":0,"P5":0,"P6":0,"E1":0,"E2":0,"E3":0,"E4":0,"E5":0,"E6":0}
[16:56:21.606][I][emontx:147]: Received data: {"MSG":4,"V1":169.44,"V2":0.00,"V3":0.00,"P1":0,"P2":0,"P3":0,"P4":0,"P5":0,"P6":0,"E1":0,"E2":0,"E3":0,"E4":0,"E5":0,"E6":0}
[16:56:31.617][I][emontx:147]: Received data: {"MSG":5,"V1":-146.24,"V2":0.00,"V3":0.00,"P1":0,"P2":0,"P3":0,"P4":0,"P5":0,"P6":0,"E1":0,"E2":0,"E3":0,"E4":0,"E5":0,"E6":0}

I haven’t made any recent changes to the EEPROM library (the last change was 3 years ago). We also do have two different versions, one for 328 and another for avrdb:

Here’s data from an emonPi3 running the latest 3 phase sketch in our office building:

Could there be a hardware fault on @FredM67’s emonTx4 that’s causing an issue, perhaps with the clock, that’s then disrupting the ISR timing? I’ve experienced strange things with adding/removing serial prints before, and I think they were related to disrupting ISR timing.

I do have a spare EmonTx4 on my desk, perhaps I could send you that as well as the emonPi3, @FredM67?

That’s something I had considered, but it reports reasonably accurate values for the mains frequency, so it seems unlikely. But given that Fred seems to have ruled out corrupted software, I’d go along with a hardware fault in general. As I wrote earlier, it smells of overflow, which could be a damaged pointer (a stuck bit?) causing a write to the wrong place that happens to be where the voltages are stored.

@FredM67
Can you give me the exact download link for the version of emonLibDB that you are using, and for the sketch that you are using?
I’ll then compile and load those and see what I get.

Hi all,

I’ve found the issue. I combined my neurones with some from AI, and arrived at an explanation and a workaround.

So, it looks like it’s a mix of hardware and software issues.

I’ve added the following delay at the very beginning of setup:

  // Wait for power to stabilize when powered via RJ11
  delay(500);

Since then, the issue disappeared.
Here is the explanation from the AI:
I’ve added a 500ms delay at the very beginning of setup() to give the power supply time to fully stabilize before any initialization begins. This is especially important when powered via RJ11 where the power ramp-up may be slower.

If I’m the “only” one with this issue, that might point to a hardware issue with my emonVS (3 phases) and/or my emontx4 board, or simply mean that something is different for single-phase systems (consumption, timings, …). No idea how many customers are on 3-phase.
It may be related to the additional emonWifi module too (there’s “much” more consumption during start and especially during wifi-startup).

I don’t think blame can be attached to the 3-phase emonVs, because it contains exactly the same power supply as the single-phase version. After the initial outline, all the development work on emonLibDB and the demonstration sketches was done with a 3-phase version as a matter of necessity, and assumed a 12-channel emonTx4. You can look at the emonLibDB source to see how timings are arrived at, each sample is allowed the same processing time irrespective, but this depends on whether the c.t’s are line-line, and thus require two voltages to calculate the power, or line-neutral when the calculation is simpler and faster. The number of sample sets per mains cycle then depends on the total number of channels (voltage and current) being sampled - more with fewer active inputs but reduced if each input is allowed more time to be processed. (For best measurement fidelity, only include the inputs that you need.)

I think this will be proven to be the exact cause. I don’t have an emonWifi module so I am not able to test your theory.

Ok, culprit identified :smiley: @awjlogan
I did some “debug” wiring, so the emontx4 does not reboot when I open a serial monitor.
When the emonWifi module is NOT present, everything works fine, regardless of how I power the tx4.
As soon as the emonWifi module is present, things work ONLY with the delay at the beginning of setup.
So, that’s an issue with some basic voltage readings, probably Vref.
emonWifi is powered through a sub-power-supply, taking +5V from emonVS. For now, I don’t really understand how it could influence the 3.3V power line and/or the Vref part so much.
It also looks like powering through USB-C is “much more” powerful than through EmonVS. Maybe it’s simply a matter of the track thickness and/or length for +5V?

Hi @FredM67 - great you’ve got it working, and also confirmed power up suspicions! You could try checking the reference value for both situations. Potentially the AC/DC converter in the emonVs has longer rise time than USB, but you’d have to put it on a scope to be sure.

Track thickness/length won’t be enough to cause this here - you need surprisingly narrow PCB traces for quite high currents, particularly on the outer layers.

The emonVs 5 V power supply is Multicomp MP-LD10-23B05R2 (10 W @ 5 V = 2 A).

Yellow = 5 V, blue = 3.3 V, load is an emonTx4.0.4 3V+12I

Well, I’ve done some further “debugging” … it looks like the ADC or something is completely messed up.

I’ve added these lines in EmonLibDB_getReadings:

When the system is running fine, I get this every 9.8 seconds:

{"MSG":51,"V1":234.80,"V2":234.72,"V3":234.97,"P1":0,"P2":0,"P3":0,"P4":0,"P5":0,"P6":0}
,"sampleSets":22296":1339175.12,1339136.50,234.71":1339262.62,1339224.87,234.71":1340879.37,1340879.37,234.86vrefa = 1.0260
{"MSG":52,"V1":234.70,"V2":234.71,"V3":234.85,"P1":0,"P2":0,"P3":0,"P4":0,"P5":0,"P6":0}
,"sampleSets":22298":1339659.87,1339615.62,234.75":1340129.25,1340089.12,234.79":1343065.37,1343065.37,235.05vrefa = 1.0260
{"MSG":53,"V1":234.74,"V2":234.78,"V3":235.04,"P1":0,"P2":0,"P3":0,"P4":0,"P5":0,"P6":0}
,"sampleSets":22297":1342693.62,1342659.50,235.01":1342940.37,1342912.50,235.04":1342114.50,1342114.50,234.97vrefa = 1.0260
{"MSG":54,"V1":235.01,"V2":235.03,"V3":234.96,"P1":0,"P2":0,"P3":0,"P4":0,"P5":0,"P6":0}
,"sampleSets":22297":1339454.25,1339414.00,234.73":1339636.25,1339596.50,234.75":1341593.25,1341593.25,234.92vrefa = 1.0260
{"MSG":55,"V1":234.73,"V2":234.74,"V3":234.92,"P1":0,"P2":0,"P3":0,"P4":0,"P5":0,"P6":0}
,"sampleSets":22293":1340574.75,1340545.37,234.83":1339330.75,1339294.87,234.72":1341515.50,1341515.37,234.91vrefa = 1.0260
{"MSG":56,"V1":234.82,"V2":234.71,"V3":234.91,"P1":0,"P2":0,"P3":0,"P4":0,"P5":0,"P6":0}
,"sampleSets":22293":1338755.75,1338717.37,234.67":1338326.37,1338289.87,234.63":1339701.87,1339701.75,234.76vrefa = 1.0260
{"MSG":57,"V1":234.66,"V2":234.63,"V3":234.75,"P1":0,"P2":0,"P3":0,"P4":0,"P5":0,"P6":0}
,"sampleSets":22298":1343094.00,1343048.25,235.05":1342193.62,1342143.37,234.97":1342776.75,1342776.37,235.02vrefa = 1.0260
{"MSG":58,"V1":235.04,"V2":234.96,"V3":235.02,"P1":0,"P2":0,"P3":0,"P4":0,"P5":0,"P6":0}
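As a sanity check on the healthy output above, the numbers can be cross-checked with a little arithmetic. This is only a sketch under my own assumptions: I am reading each triple as (mean square before offset removal, mean square after offset removal, Vrms), and taking the mains frequency as a nominal 50 Hz. The helper names `impliedScale` and `setsPerCycle` are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Combined scale factor (calibration times reference adjustment) implied by
// one triple: Vrms = sqrt(mean square after offset removal) * scale.
double impliedScale(double meanSquareAfterOffset, double vrms)
{
    assert(meanSquareAfterOffset > 0.0);
    return vrms / std::sqrt(meanSquareAfterOffset);
}

// Sample sets per mains cycle implied by one report.
double setsPerCycle(double sampleSets, double reportSeconds, double mainsHz)
{
    assert(reportSeconds > 0.0 && mainsHz > 0.0);
    return sampleSets / (reportSeconds * mainsHz);
}
```

Using the first healthy line, impliedScale(1339136.50, 234.71) and impliedScale(1340879.37, 234.86) both come out at roughly 0.2028, so the triples are at least self-consistent under that reading. setsPerCycle(22296, 9.8, 50.0) gives about 45.5 sample sets per mains cycle.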

Now, when the system is broken, I get this:
,“sampleSets”:4534":1345881.75,ovf,62314.46":1345881.75,ovf,62314.46":1345881.75,ovf,62314.46":1345881.75,ovf,62314.46":1345881.75,ovf,62314.46":1345881.75,ovf,62314.46":1345881.75,ovf,62314.46":1345881.75,ovf,62314.46":1345881.75,ovf,62314.46":1345881.75,ovf,62314.46":1345881.75,ovf,62314.46":1345881.75,ovf,62314.46":1345881.75,ovf,62314.46":1345881.75,ovf,62314.46":1345881.75,ovf,62314.46":1345881.75,ovf,62314.46":1345881.75,ovf,62314.46":1345881.75,ovf,62314.46":1345881.75,ovf,62314.46":1345881.75

and then

1345881.75,ovf,nan":1339857.75,ovf,nan":1339857.75,ovf,nan":1339857.75,ovf,nan":1339857.75,ovf,nan":1339857.75,ovf,nan":1339857.75,ovf,nan":1339857.75,ovf,nan":1339857.75,ovf,nan"

The key “sampleSets” was printed only once, at the beginning. The “pattern” starting with 1345881.75 was printed 707 times, without a single line break, then it changed to 1345881.75,ovf,nan":1339857.75,ovf,nan":1339857.75,ovf,nan": (there was one last 1345881.75, then again ovf, then nan instead of 62314.46).

The reference (printed after the key vrefa) was the same for both.

If cumSampleDeltas overflows, then it would point to the ADC running too fast, am I right?

What I noticed late yesterday was a line on the data sheet for the emonVs power supply giving a maximum load capacitance. I did not look further, but I am wondering whether the Wi-Fi module you added breaks that limit and makes the 5 V unstable, in much the same way as too much capacitance on the precision reference made that unusable?
Where can I find details of your Wi-Fi module?

I need to think a bit more about the ADC running too fast, but certainly the “ovf” explains some peculiar numbers.
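On the “ovf” itself, a short sketch of what I believe is going on. As far as I recall, the stock Arduino Print code writes “nan” for a NaN and “ovf” for any float or double whose magnitude exceeds 4294967040.0; I am assuming the core used on the emonTx4 behaves the same way. So “ovf” in that output means the printed floating-point value was too large to print, not necessarily that an integer accumulator wrapped. The names `printLabel` and `impliedMeanSquare` are hypothetical, and the scale of about 0.20282 is back-calculated from a healthy log line (234.71 / sqrt(1339136.50)), assuming the second and third numbers of each triple are the offset-corrected mean square and Vrms.

```cpp
#include <cassert>
#include <cmath>
#include <string>

// Mimics the special cases described above; anything else prints as a number.
std::string printLabel(double x)
{
    if (std::isnan(x)) return "nan";
    if (std::isinf(x)) return "inf";
    if (x > 4294967040.0 || x < -4294967040.0) return "ovf";
    return "number";
}

// Mean square that a reported Vrms implies, given the combined scale factor.
double impliedMeanSquare(double vrms, double scale)
{
    assert(scale != 0.0);
    double raw = vrms / scale;
    return raw * raw;
}
```

With those assumptions, impliedMeanSquare(62314.46, 0.20282) is roughly 9.4e10, well beyond the printable limit, which matches the “ovf” shown next to 62314.46. When the corrected mean square goes negative instead, the square root gives NaN, which matches the later “ovf,nan” pattern.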

Hi Rob - the WiFi module is 1 uF, no impact on the AC/DC converter which has a maximum of 6800 uF (if I recall correctly, but it’s of that magnitude).