STM32 3CT Example

Not a coincidence, I think: I downloaded RW’s paper earlier today. I should take a look.

That’s a method that can only be used if you’re doing interpolation. Not knowing the granularity of the timing, it might or might not be possible to start the ADCs so as not to need to interpolate. Alternatively, if the sampling frequency is high enough, choosing which pair of V & I samples to use together also provides a means of adjustment after the samples have been collected.

It might of course be necessary to use both: timing or choice of sample for the coarse adjustment, and interpolation for the fine adjustment.

Using the usec offset from the TxShield example, it seems phase errors can be dealt with.
A function can be made to set the usec advance or delay upon receiving a command from the Pi.

I think I understand what you mean by interpolation: it relates to the staggered timing of the samples taken by the ATmega. Interpolation might not be necessary with synchronous timing on the STM32, i.e. there is no PHASECAL constant; instead the usec delay does the job.
When getting into three-phase monitoring, we can state a condition that the VTs are all the same.
The hardware as it stands puts all the VTs on ADC1 and the first 9 CTs on ADC3. The 6 planned, yet-to-be-designed expansion CTs will be on ADC2. These choices follow the mapping of the chip’s pins to the board layout, to minimise track lengths.

Having all the VTs on ADC1 means the usec delay applies equally to all transformers and, of course, makes the buffer indexes in the code easier to picture.

I should make a table. Trys and I sketched out something the other day illustrating a potential limitation with indexing, buffer indexes…

i  VT  CT   Phase
0  1   CT1  A
1  2   CT2  B
2  3   CT3  C
3  1   CT4  A
4  2   CT5  B
5  3   CT6  C
6  1   CT7  A
7  2   CT8  B
8  3   CT9  C
9  1   CT1  A
n  .   .    .

In-sync VT/CT pairs: CTs 1, 4, 7 with VT1; CTs 2, 5, 8 with VT2; CTs 3, 6, 9 with VT3.
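The round-robin pattern in the table is just modulo arithmetic on the buffer index. A minimal sketch, assuming ADC1 cycles through the 3 VTs and ADC3 through the 9 CTs in lock step, so index i in both DMA buffers refers to the same instant; the function names are hypothetical, not taken from the 3CT example.

```c
#include <stdint.h>

#define N_VT 3 /* VTs in the ADC1 sequence (assumed) */
#define N_CT 9 /* CTs in the ADC3 sequence (assumed) */

/* VT number (1..3) converted at buffer index i on ADC1. */
static int vt_for_index(uint32_t i) { return (int)(i % N_VT) + 1; }

/* CT number (1..9) converted at buffer index i on ADC3. */
static int ct_for_index(uint32_t i) { return (int)(i % N_CT) + 1; }

/* Phase letter, assuming VT1/VT2/VT3 sit on phases A/B/C. */
static char phase_for_index(uint32_t i) { return (char)('A' + (int)(i % N_VT)); }

/* VT that a given CT is always converted alongside. This pairing holds
 * at every index only because N_CT is a multiple of N_VT. */
static int vt_for_ct(int ct) { return ((ct - 1) % N_VT) + 1; }
```

If the CT count were not a multiple of 3, a CT would drift across all three VTs from one pass to the next, which is the kind of indexing limitation the table is probing.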

Looking at the expansion CTs now, just exploring options…

i  VT  CT   Phase  CTex
0  1   CT1  A      CT1
1  2   CT2  B      CT2
2  3   CT3  C      CT3
3  1   CT4  A      CT4
4  2   CT5  B      CT5
5  3   CT6  C      CT6
6  1   CT7  A      .
7  2   CT8  B      .
8  3   CT9  C      .
9  1   CT1  A      CT1
n  .   .    .      .

Or:

i  VT  CT   Phase  CTex
0  1   CT1  A      CT1
1  2   CT2  B      CT1
2  3   CT3  C      CT1
3  1   CT4  A      CT2
4  2   CT5  B      CT2
5  3   CT6  C      CT2
6  1   CT7  A      CT3
7  2   CT8  B      CT3
8  3   CT9  C      CT3
9  1   CT1  A      CT4
n  .   .    .      .

The pattern is set upon initialisation. Food for thought.
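The two expansion layouts can be written the same way. A hedged sketch with hypothetical names: option 1 assumes ADC2 follows the same 9-slot pattern as ADC3 and leaves slots 6 to 8 without an expansion CT (returned as 0); option 2 holds each expansion CT for 3 consecutive slots so it meets every VT once, which makes the pattern 18 slots long.

```c
#include <stdint.h>

#define N_CT   9 /* CTs on ADC3 */
#define N_CTEX 6 /* planned expansion CTs on ADC2 */
#define N_VT   3 /* VTs on ADC1 */

/* Option 1: same 9-slot period as ADC3; the last 3 slots carry no
 * expansion CT (0 = nothing useful converted in that slot). */
static int ctex_option1(uint32_t i)
{
    uint32_t slot = i % N_CT;
    return (slot < N_CTEX) ? (int)slot + 1 : 0;
}

/* Option 2: each expansion CT is held for N_VT consecutive slots,
 * so the pattern repeats every N_VT * N_CTEX = 18 slots. */
static int ctex_option2(uint32_t i)
{
    return (int)((i / N_VT) % N_CTEX) + 1;
}
```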

Edit: Darn. 16 is the limit on the number of conversions in the pattern (the ADC sequence), so sayeth CubeMX.

What about the CTs? If they’re all the same then the usec delay between starting the ADCs can work, but the primary objection to that approach when we first prototyped it was the need to support different model CTs on each input, each with quite different phase errors.

As I thought I’d implied, phase adjustment must be made on a per-channel basis at least, and for best accuracy, load and system voltage need to be taken into account too.

It’s back to interpolation then.
In the for loop iterating over the DMA buffer, we define which indexes to correlate. The crucial bit of knowledge in this case is the time per ADC conversion. I don’t know this value; I know it’s not as simple as the 601.5 cycles defined in CubeMX. It includes:
- Sampling time.
- Conversion time.
- And?

Anyway, that’s 0.000008354166667 s (about 8.35 usecs) for 601.5 cycles at 72MHz. Correct?..

- https://www.st.com/content/ccc/resource/technical/document/reference_manual/4a/19/6e/18/9d/92/43/32/DM00043574.pdf/files/DM00043574.pdf/jcr:content/translations/en.DM00043574.pdf
RM0316 pages 322 and 325.

This is it. We can calculate the time represented between buffer indexes.
601.5 cycles with an ADC prescaler of 2 is used in the 3CT example.
A better calculation example then is:

(601.5 + 12.5) x (1 / (72,000,000 / 2))

16.708333 usecs + 0.347222 usecs = 17.055556 usecs.

58.7kHz sampling rate.

978 samples per complete waveform at 60 Hz.

Can someone verify?
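The arithmetic above can be checked with a small helper. A sketch, assuming (per RM0316) that the total conversion time is the programmed sampling time plus 12.5 ADC clock cycles for a 12-bit result, and that the ADC clock is the 72 MHz clock divided by the prescaler of 2; the function names are illustrative.

```c
/* Time for one conversion, in microseconds:
 * (sampling cycles + SAR conversion cycles) / ADC clock. */
static double conversion_time_us(double sample_cycles, double sar_cycles,
                                 double sys_clk_hz, double prescaler)
{
    double adc_clk_hz = sys_clk_hz / prescaler;
    return (sample_cycles + sar_cycles) / adc_clk_hz * 1e6;
}

/* Conversions per second for one ADC running back to back. */
static double conversion_rate_hz(double conv_time_us)
{
    return 1e6 / conv_time_us;
}

/* Conversions that fit in one mains cycle. */
static double conversions_per_cycle(double rate_hz, double mains_hz)
{
    return rate_hz / mains_hz;
}
```

For 601.5 + 12.5 cycles at 72 MHz / 2 this gives about 17.056 usecs per conversion (36 MHz / 614 cycles, roughly 58.63 kHz), or about 977 conversions per 60 Hz cycle and about 1173 per 50 Hz cycle.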

How does this translate into phase correction?

I derive 5kHz as sufficient, based on 100 samples per 20 ms (1/50Hz) waveform. So 58kHz is great. It can be slowed down significantly if we need more CPU/memory resources.

Spotted this: ADS1115 and sampling speed - #2 by Robert.Wall

But do we need to define a strict sampling rate based on 50/60Hz if we have zero-crossing detection? Probably not.

A technique I use to verify that the ADC is sampling at the rate I think it is: drive a spare GPIO low at the start of the handler (both half and full complete) and high at the end. If you probe the resulting square wave with a scope, its frequency can be used to verify your sampling rate and its duty cycle will reveal how much CPU is being used processing the data.
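A minimal sketch of the arithmetic behind that scope check, assuming the pin is pulsed once per half-complete and once per full-complete callback, so each pulse period covers the conversions in half the DMA buffer; half_buffer_len and busy_time_us are hypothetical names for the half-buffer length and the measured time spent in the handler.

```c
#include <stdint.h>

/* Conversion rate of one ADC from the handler pulse frequency: the
 * handler runs once per half buffer, so each pulse period covers
 * half_buffer_len conversions. */
static double sample_rate_from_pulse(double pulse_hz, uint32_t half_buffer_len)
{
    return pulse_hz * (double)half_buffer_len;
}

/* CPU load: time spent in the handler as a fraction of the pulse period. */
static double cpu_load_from_pulse(double busy_time_us, double pulse_hz)
{
    double period_us = 1e6 / pulse_hz;
    return busy_time_us / period_us;
}
```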

Also remember that each ADC is typically set up to cycle through a series of inputs. So if it’s doing a new sample every 17 usecs (say) and you’ve programmed it to cycle through 3 inputs, then it’s only sampling each pin every 51 usecs so the sampling rate of any given signal is 1 / 51 usecs.

Clear. Let’s take it to 9 CTs then, 154usecs or 6,515Hz.
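Each input is sampled once per pass through the sequence, so the per-channel rate is the conversion rate divided by the sequence length. A sketch taking the 17.056 usec conversion time calculated above as an assumed input (function names hypothetical):

```c
/* Interval between successive samples of the same input, in
 * microseconds, when the ADC cycles through n_inputs channels. */
static double per_channel_interval_us(double conv_time_us, unsigned n_inputs)
{
    return conv_time_us * (double)n_inputs;
}

/* Sampling rate seen by any single input. */
static double per_channel_rate_hz(double conv_time_us, unsigned n_inputs)
{
    return 1e6 / per_channel_interval_us(conv_time_us, n_inputs);
}
```

With 3 inputs that is about 51.2 usecs (roughly 19.5 kHz per channel); with 9 it is about 153.5 usecs (roughly 6.5 kHz).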
Scoping, experimental data, that’s right.
Another method was found today after fixing a bug in Trys’ 3CT program.
125 cycles of 50Hz AC is 2.5 seconds. In that time a counter gave us a value of around 48,400 samples for three channels. 2.5*58.7kHz / 3ch = 48,917. This is out by around 10%. I wonder if the scope will display a lower sampling rate than calculated; I’ve made a note to check this. I’ll tag @TrystanLea here so we’re on the same page.

The bug was resolved by giving sprintf the correct format specifier for the counter variable (a uint64_t needs a 64-bit specifier such as %lld, or strictly %llu since it is unsigned, instead of %d),
and then by removing the linker flag -specs=nano.specs from the makefile.
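For reference, the portable way to print a uint64_t is the PRIu64 macro from <inttypes.h>, which expands to the right length modifier for the target. A sketch formatting an accumulator into a buffer like log_buffer (format_u64 is a hypothetical helper). Whether 64-bit conversions work also depends on the C library build; newlib-nano’s printf is commonly configured without long long support, which fits with dropping -specs=nano.specs having fixed the output.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Format a 64-bit unsigned accumulator as "label:value\r\n".
 * PRIu64 supplies the correct length modifier ("llu" on arm-none-eabi,
 * possibly "lu" on other targets). Returns snprintf's result. */
static int format_u64(char *buf, size_t len, const char *label, uint64_t value)
{
    return snprintf(buf, len, "%s:%" PRIu64 "\r\n", label, value);
}
```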

N.B. I’ve found before that the scope can be connected directly to the ADC input: as the sampling capacitor charges, the input voltage dips, and the dip can be picked up clearly.

Here perhaps?

Yep, like that.
I’d have to solder a 1Mohm resistor in place of a burden or something to mimic your test.
Looking at the circuit, the voltage input has a 10k and a 330R between the bias and the ADC input. This could be enough for something to show up. Failing that, I’ll init the bias pin as a GPIO input with pullup, which should do the trick.

The 10% out I mentioned earlier was in fact 1%, just noticed.

Couldn’t get the scope on the ADCs showing anything, must need more impedance.

Doing the LED pin toggle on interrupt method.

Edit: the numbers are now unscrambled from the markdown formatting.
2.5 x 58700 / 3 = 48,917 samples, calculated from the settings above.
2.5 x 58590 / 3 = 48,825 samples, using the sampling frequency derived by scoping the pin toggled in the interrupt.
Actual count = 48,520 samples, as printed on the serial output.
The scope on the incoming 50Hz shows it’s between 49.5Hz and 50Hz.

Seems like we’re okay.


This is really neat.

Regarding offset, I noticed the bias output channel can be routed to ADC4 in software. It seems the offset could be removed accurately by occasionally sampling the bias. This idea also seems to have been explored already in the TxShield demo. I don’t know whether one method performs better than the other; fewer float calcs in the latter, perhaps.

No, it’s standard maths.
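The offset removal used in the code (subtracting Vmean², Imean² and Vmean·Imean from the averaged accumulators) rests on the usual variance and covariance identities. A short derivation, writing v_n and i_n for the bias-subtracted samples and a bar for the mean over the N samples in one 2.5 s window:

```latex
\frac{1}{N}\sum_{n=1}^{N}(v_n-\bar v)^2
  = \frac{1}{N}\sum_{n=1}^{N} v_n^2 - 2\bar v\,\frac{1}{N}\sum_{n=1}^{N} v_n + \bar v^2
  = \overline{v^2} - \bar v^2

\frac{1}{N}\sum_{n=1}^{N}(v_n-\bar v)(i_n-\bar i) = \overline{v\,i} - \bar v\,\bar i
```

Because any constant offset drops out, MID_ADC_READING (2000) need not match the real bias exactly.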

I thought the interpolation mentioned earlier was purely a shift in the timing of a sample. I see now that the interpolation done in emonLib produces a new y-axis value as well as the time shift. Looking at it closely earlier, I noticed a linear function being applied to the wave, which is of course non-linear.
Then I saw that this linear/non-linear problem has already been posted about and resolved in a new library.
Regardless, I don’t intend to focus on this kind of interpolation, wave-like or linear, right now; too much on my plate.
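For comparison, the emonLib-style correction interpolates linearly between two consecutive voltage samples, using PHASECAL as the weight (emonLib computes phaseShiftedV = lastFilteredV + PHASECAL * (filteredV - lastFilteredV)). A standalone sketch of that idea, not emonLib’s code:

```c
/* Linear interpolation (or extrapolation) between the previous and the
 * current voltage sample. phasecal = 1.0 returns the current sample,
 * 0.0 the previous one; values in between pick a point on the straight
 * line joining them, which only approximates the sine wave. */
static float phase_shift_v(float last_v, float v, float phasecal)
{
    return last_v + phasecal * (v - last_v);
}
```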

The shifting of the array using a known sampling time seems too good a place to start to ignore. We have the adc buffer to use for this purpose. The shifting of the array is, in practice, more like choosing the correct index to match the appropriate V and I samples. The phase angle is easily converted into a time difference if needed for a phase calibration constant. Phase angle is time difference, at a certain frequency.
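Converting a phase error into a buffer-index shift follows directly from phase angle being a time difference at a given frequency. A sketch, assuming a mains frequency and a per-channel sample interval ts_us (e.g. the roughly 154 usecs for 9 CTs above); the function names are hypothetical. At 50 Hz one 153.5 usec step is about 2.76 degrees, so whole-sample shifting is a coarse adjustment and leaves a residual of up to half a step.

```c
/* Time difference (usecs) corresponding to a phase angle (degrees). */
static double phase_to_time_us(double phase_deg, double mains_hz)
{
    return phase_deg / 360.0 / mains_hz * 1e6;
}

/* Round half away from zero without needing libm. */
static long round_to_long(double x)
{
    return (long)(x >= 0.0 ? x + 0.5 : x - 0.5);
}

/* Nearest whole number of per-channel sample intervals for that shift;
 * the sign follows the sign of phase_deg. */
static long phase_to_index_shift(double phase_deg, double mains_hz, double ts_us)
{
    return round_to_long(phase_to_time_us(phase_deg, mains_hz) / ts_us);
}

/* Phase error (degrees) left over after shifting by whole samples. */
static double residual_phase_deg(double phase_deg, double mains_hz, double ts_us)
{
    double shifted_us = (double)phase_to_index_shift(phase_deg, mains_hz, ts_us) * ts_us;
    return phase_deg - shifted_us * 1e-6 * mains_hz * 360.0;
}
```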

The issue of the phase angle changing with the value being measured needs revisiting, with a better experimental setup in our lab to test it.

As for the Infs and NaNs cropping up, the problem seems resolved since finding and using some code from @dBC’s txShield example, something like:

if (f32sum_V_sq_avg < 0) // if offset removal causes a negative number,
  f32sum_V_sq_avg = 0;   // make it 0 to avoid a NaN at sqrt.
if (f32sum_I_sq_avg < 0)
  f32sum_I_sq_avg = 0;

...   
// calculate PF, while preventing dividing by zero error.
if (apparentPower != 0) {
  powerFactor = realPower / apparentPower;
}
else
  powerFactor = 0;

Yes, but if you read the comment associated with that first case, you’ll see it’s to make the f/w robust in the face of other problems…

  //
  // And remove its RMS from the accumulated RMS.  If the mid-rail is not stable
  // and the signal is hugging the mid-rail,
  // this subtraction can send things negative, so we nip that in the bud with 0
  // rather than generate a nan.
  //

That can happen pretty easily on the shield setup: you just run it without the shield plugged in and you measure random floating junk at the ADC inputs. I think the real issue in your post above is not why you got nans and infs (although it’s good that you’ve fixed them) but rather why you get random 0 readings under steady state load conditions.

You are right. The zeros were an aspect of the original problem.

I don’t know.

My best guess is that the sprintf %.2d format specifier and/or compiler flags were printing a float NaN as 0.000. What do you think? Why that should apply to Vrms while there’s definitely an AC adaptor connected confuses me.

An overflow of a variable? I could check again for this. I have already checked and didn’t find an obvious problem. I’ll make a note for a better check in the future of how close the variables are to overflowing… note made.

Thinking about it some more, nan is totally non-fatal and probably as good a way as any of indicating something is wrong. You could mount a good argument that letting the nan occur is the better choice. My code above to avoid the nan is a little like the old trick of putting masking tape over your car’s Oil Pressure warning light.

I’m not familiar with the code base you’re using. Do you accumulate in floating point or integer? When I was doing the stm32 shield stuff I found that single precision FP (handled by the FPU) was not high enough precision… the symptoms you get from that appear a lot like noise. Double precision FP (done entirely in s/w) solved that but was very slow. That’s why I settled on integer arithmetic.

The technique I use to determine how many bits I need is to look at the worst case. If you’re starting with 12 bit ADC readings, and you’re multiplying two of them together (either for V*I or for RMS calculations) then you’re up to 24 bits per pair. If you want to accumulate 4096 readings (say) then it’s like multiplying by a third 12-bit number, so you now need 36 bits for the accumulation. That’s all pretty conservative because it assumes every reading is going to be at maximum deflection and with an AC signal it won’t be. But the choice of variable is pretty chunky… uint32 or uint64. In general uint64 has heaps of headroom unless you’re doing massive accumulations. uint32 is faster, but potentially tight in dynamic range.
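That worst-case rule amounts to adding bit widths: a product needs the sum of its operand widths, and accumulating N values adds ceil(log2 N) bits. A minimal sketch with hypothetical function names, assuming unsigned readings at full scale every time:

```c
#include <stdint.h>

/* Smallest b with 2^b >= n, i.e. ceil(log2(n)) for n >= 1. */
static unsigned ceil_log2_u64(uint64_t n)
{
    unsigned b = 0;
    while (b < 64 && ((uint64_t)1 << b) < n)
        b++;
    return b;
}

/* Worst-case bits to accumulate n_samples products of two
 * adc_bits-wide readings. */
static unsigned accumulator_bits(unsigned adc_bits, uint64_t n_samples)
{
    return 2u * adc_bits + ceil_log2_u64(n_samples);
}
```

With 12-bit readings and 4096 samples this gives the 36 bits quoted above.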

Interesting shortcut.

A safe assumption would be to say the incoming value will swing at most half of 4096 (2048) either side, because everything revolves around our bias.

I guess this is the long way of working out how many bits are needed:

Around 16150 samples per CT channel are being made over 2.5 seconds.
16150 * 2048 = 33,075,200
squared accumulator 16150 * 2048 * 2048 = 67,738,009,600
uint32 2^32 = 4,294,967,296
uint64 2^64 = 1.844674407 x 10^19

I’ve looked at these numbers before, it’s sinking in a bit more now, I know what you mean by Massive.

It follows that the summing accumulators can be uint32 and the squaring accumulators uint64. The squaring accumulators are miles away from the maximum uint64.
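One thing the magnitudes don’t show is the sign: after subtracting MID_ADC_READING the samples, and the V*I products, can be negative, so the plain sums want signed types, and sum_P, being a sum of products like the squared sums, also wants 64 bits. A hedged illustration with a hypothetical helper, showing what happens when a negative running total is kept in a uint32_t and then cast to float and divided by the count: the mean comes out around 1.4e9 instead of -2, and after the mean-squared subtraction and the clamp to 0 a reading like that could plausibly surface as a zero.

```c
#include <stdint.h>

/* Accumulate the same signed samples into a signed and an unsigned
 * 32-bit total. The unsigned total wraps modulo 2^32 when negative. */
static void accumulate(const int32_t *samples, int n,
                       int32_t *signed_sum, uint32_t *unsigned_sum)
{
    *signed_sum = 0;
    *unsigned_sum = 0;
    for (int k = 0; k < n; k++) {
        *signed_sum += samples[k];
        *unsigned_sum += (uint32_t)samples[k];
    }
}
```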

Accumulation is being done in integer at the moment.

The accumulators are cast to floats every 2.5s. They were originally doubles in Trystan’s code; I switched to floats thinking I’d make use of the FPU, although I don’t know how to use it or whether the compiler makes use of it automatically. Is it automatic? I’ve looked into using arm_math.h and have had some advice on integrating it, although I haven’t tried it yet. I’m also using the sqrtf function now.

If the errors are that bad with floats, then switching back to doubles will be necessary. I’ve also noticed the performance dip with doubles: the floats were chomped through much faster.
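The precision warning bites hardest when accumulating in float: a float has a 24-bit significand, so once a running total passes 2^24 (16,777,216), adding small values stops changing it. A hedged illustration with hypothetical helper names. Since this code accumulates in integers and converts to float only every 2.5 s, each conversion costs at most about 6 parts in 10^8 (2^-24 relative), though the later subtraction of the squared mean can magnify that when the offset is large compared with the signal.

```c
#include <stdint.h>

/* Add `value` n times into a float. Beyond 2^24 the gap between
 * adjacent floats exceeds 1, so adding 1 is rounded away. */
static float accumulate_float(uint32_t value, uint32_t n)
{
    float total = 0.0f;
    for (uint32_t k = 0; k < n; k++)
        total += (float)value;
    return total;
}

/* The same accumulation in a 64-bit integer stays exact. */
static uint64_t accumulate_u64(uint32_t value, uint32_t n)
{
    uint64_t total = 0;
    for (uint32_t k = 0; k < n; k++)
        total += value;
    return total;
}
```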

Edit: corrected the squared accumulator calculation; my shortcut didn’t work, again.

/**
  ******************************************************************************
  * @file           : main.c
  * @brief          : Main program body
  ******************************************************************************
  ** This notice applies to any and all portions of this file
  * that are not between comment pairs USER CODE BEGIN and
  * USER CODE END. Other portions of this file, whether 
  * inserted by the user or by software development tools
  * are owned by their respective copyright owners.
  *
  * COPYRIGHT(c) 2019 STMicroelectronics
  *
  * Redistribution and use in source and binary forms, with or without modification,
  * are permitted provided that the following conditions are met:
  *   1. Redistributions of source code must retain the above copyright notice,
  *      this list of conditions and the following disclaimer.
  *   2. Redistributions in binary form must reproduce the above copyright notice,
  *      this list of conditions and the following disclaimer in the documentation
  *      and/or other materials provided with the distribution.
  *   3. Neither the name of STMicroelectronics nor the names of its contributors
  *      may be used to endorse or promote products derived from this software
  *      without specific prior written permission.
  *
  * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
  * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
  * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
  * DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
  * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
  * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
  * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
  * CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
  * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
  * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  *
  ******************************************************************************
  */
/* Includes ------------------------------------------------------------------*/
#include "main.h"
#include "stm32f3xx_hal.h"
#include "adc.h"
#include "dma.h"
#include "opamp.h"
#include "tim.h"
#include "usart.h"
#include "gpio.h"

/* USER CODE BEGIN Includes */
#define ARM_MATH_CM4
#include "arm_math.h"
#include "ds18b20.h"
#include <math.h>
#include <string.h>
typedef float float32_t;
/* USER CODE END Includes */

/* Private variables ---------------------------------------------------------*/

/* USER CODE BEGIN PV */
/* Private variables ---------------------------------------------------------*/

#define true 1
#define false 0
#define MID_ADC_READING 2000

// Serial output buffer
char log_buffer[150];

// Flag
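// volatile: set in process_frame() (assumed to run from the DMA callbacks) and polled in main()
volatile uint8_t readings_ready = false;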

// Calibration
const float VOLTS_PER_DIV = (3.3 / 4096.0);
float VCAL = 268.97;
float ICAL = 90.9;

// ISR accumulators
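typedef struct channel_
{
  uint64_t sum_V_sq;
  uint64_t sum_I_sq;
  int32_t sum_V;  // signed: bias-subtracted samples can be negative
  int32_t sum_I;
  int64_t sum_P;  // signed sum of V*I products; can exceed 32 bits over 2.5 s
  uint32_t count;

  uint8_t positive_V;
  uint8_t last_positive_V;
  uint8_t cycles;
} channel_t;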

uint32_t pulseCount = 0;

//#define CTn 9 // number of CT channels
static channel_t channels[CTn];
static channel_t channels_copy[CTn];

/* USER CODE END PV */

/* Private function prototypes -----------------------------------------------*/
void SystemClock_Config(void);

/* USER CODE BEGIN PFP */
/* Private function prototypes -----------------------------------------------*/
//#define ARM_MATH_CM4
//#include "arm_math.h"
/* USER CODE END PFP */

/* USER CODE BEGIN 0 */
void process_frame(uint16_t offset)
{
  int32_t sample_V, sample_I, signed_V, signed_I;

  HAL_GPIO_WritePin(GPIOD, GPIO_PIN_8, GPIO_PIN_SET);
  for (int i = 0; i < adc_buff_half_size; i += CTn)
  {
    // Cycle through channels
    for (int ch = 0; ch < CTn; ch++)
    {
      channel_t *channel = &channels[ch];

      // ----------------------------------------
      // Voltage
      //float32_t theresult;
      //arm_rms_f32(float* adc1_dma_buff, uint32_t* sizeof(adc1_dma_buff), float* theresult);
      sample_V = adc1_dma_buff[offset + i + ch];
      signed_V = sample_V - MID_ADC_READING;
      channel->sum_V += signed_V;
      channel->sum_V_sq += signed_V * signed_V;
      // ----------------------------------------
      // Current
      sample_I = adc3_dma_buff[offset + i + ch];
      signed_I = sample_I - MID_ADC_READING;
      channel->sum_I += signed_I;
      channel->sum_I_sq += signed_I * signed_I;
      // ----------------------------------------
      // Power
      channel->sum_P += signed_V * signed_I;

      channel->count++;

      // Zero crossing detection
      channel->last_positive_V = channel->positive_V;
      if (signed_V > 0)
        channel->positive_V = true;
      else
        channel->positive_V = false;
      if (!channel->last_positive_V && channel->positive_V)
        channel->cycles++;

      // 125 cycles or 2.5 seconds
      if (channel->cycles >= 125)
      {
        channel->cycles = 0;

        channel_t *chn = &channels_copy[ch];
        // Copy accumulators for use in main loop
        memcpy((void *)chn, (void *)channel, sizeof(channel_t));
        // Reset accumulators to zero ready for next set of measurements
        memset((void *)channel, 0, sizeof(channel_t));

        if (ch == (CTn - 1))
        {
          readings_ready = true;
          /*
          sprintf(log_buffer, "chn->sum_V_sq:%lld\r\n", chn->sum_V_sq);
          debug_printf(log_buffer);
          sprintf(log_buffer, "chn->sum_I_sq:%lld\r\n", chn->sum_I_sq);
          debug_printf(log_buffer);
          sprintf(log_buffer, "chn->sum_V:%lld\r\n", chn->sum_V);
          debug_printf(log_buffer);
          sprintf(log_buffer, "chn->sum_I:%lld\r\n", chn->sum_I);
          debug_printf(log_buffer);
          sprintf(log_buffer, "chn->sum_P:%lld\r\n", chn->sum_P);
          debug_printf(log_buffer);
          sprintf(log_buffer, "chn->count:%ld\r\n", chn->count);
          debug_printf(log_buffer);
          sprintf(log_buffer, "chn->positive_V:%d\r\n", chn->positive_V);
          debug_printf(log_buffer);
          sprintf(log_buffer, "chn->last_positive_V:%d\r\n", chn->last_positive_V);
          debug_printf(log_buffer);
          sprintf(log_buffer, "chn->cycles:%d\r\n", chn->cycles);
          debug_printf(log_buffer);
          
            uint32_t sum_V_sq;
            uint32_t sum_I_sq;
            int32_t sum_V;
            int32_t sum_I;
            int32_t sum_P;
            uint32_t count;

            uint8_t positive_V;
            uint8_t last_positive_V;
            uint8_t cycles;
            */
        }
      }
    }
  }
  HAL_GPIO_WritePin(GPIOD, GPIO_PIN_8, GPIO_PIN_RESET);
}

/* USER CODE END 0 */

/**
  * @brief  The application entry point.
  *
  * @retval None
  */
int main(void)
{
  /* USER CODE BEGIN 1 */
  float V_RATIO = VCAL * VOLTS_PER_DIV;
  float I_RATIO = ICAL * VOLTS_PER_DIV;
  /* USER CODE END 1 */

  /* MCU Configuration----------------------------------------------------------*/

  /* Reset of all peripherals, Initializes the Flash interface and the Systick. */
  HAL_Init();

  /* USER CODE BEGIN Init */

  /* USER CODE END Init */

  /* Configure the system clock */
  SystemClock_Config();

  /* USER CODE BEGIN SysInit */

  /* USER CODE END SysInit */

  /* Initialize all configured peripherals */
  MX_GPIO_Init();
  MX_DMA_Init();
  MX_ADC3_Init();
  MX_ADC1_Init();
  MX_TIM8_Init();
  MX_USART2_UART_Init();
  MX_OPAMP4_Init();
  /* USER CODE BEGIN 2 */
  debug_printf("start\r\n");

  HAL_ADCEx_Calibration_Start(&hadc1, ADC_SINGLE_ENDED);
  HAL_ADCEx_Calibration_Start(&hadc3, ADC_SINGLE_ENDED);

  HAL_OPAMP_Start(&hopamp4);

  start_ADCs();

  /* USER CODE END 2 */

  /* Infinite loop */
  /* USER CODE BEGIN WHILE */
  while (1)
  {
    /*
    if (adc1_half_conv_complete && !adc1_half_conv_overrun)
    {
      adc1_half_conv_complete = false;
      process_frame(0);
    }

    if (adc1_full_conv_complete && !adc1_full_conv_overrun)
    {
      adc1_full_conv_complete = false;
      process_frame(adc_buff_half_size);
    }
*/
    if (readings_ready)
    {
      readings_ready = false;
      //process_ds18b20s();

      for (int ch = 0; ch < CTn; ch++)
      {
        channel_t *chn = &channels_copy[ch];

        float Vmean = (float)chn->sum_V / (float)chn->count;
        float Imean = (float)chn->sum_I / (float)chn->count;

        float f32sum_V_sq_avg = (float)chn->sum_V_sq / (float)chn->count;
        f32sum_V_sq_avg -= (Vmean * Vmean); // offset removal

        float f32sum_I_sq_avg = (float)chn->sum_I_sq / (float)chn->count;
        f32sum_I_sq_avg -= (Imean * Imean); // offset removal

        if (f32sum_V_sq_avg < 0) // if offset removal causes a negative number,
          f32sum_V_sq_avg = 0;   // make it 0 to avoid a nan at sqrt.
        if (f32sum_I_sq_avg < 0)
          f32sum_I_sq_avg = 0;

        float Vrms = V_RATIO * sqrtf(f32sum_V_sq_avg);
        float Irms = I_RATIO * sqrtf(f32sum_I_sq_avg);

        //float32_t sqrtV_temp, sqrtI_temp;
        //arm_sqrt_f32(f32sum_V_sq_avg, &sqrtV_temp); // CMSIS-DSP: arm_sqrt_f32(in, &out)
        //arm_sqrt_f32(f32sum_I_sq_avg, &sqrtI_temp);
        //float Vrms = V_RATIO * sqrtV_temp;
        //float Irms = I_RATIO * sqrtI_temp;

        float f32_sum_P_avg = (float)chn->sum_P / (float)chn->count;
        float mean_P = f32_sum_P_avg - (Vmean * Imean); // offset removal
        float realPower = V_RATIO * I_RATIO * mean_P;

        float apparentPower = Vrms * Irms;

        float powerFactor;
        // calculate PF, preventing dividing by zero error.
        if (apparentPower != 0)
        {
          powerFactor = realPower / apparentPower;
        }
        else
          powerFactor = 0;

        sprintf(log_buffer, "V%d:%.2f,I%d:%.3f,RP%d:%.1f,AP%d:%.1f,PF%d:%.3f,C%d:%ld,", ch, Vrms, ch, Irms, ch, realPower, ch, apparentPower, ch, powerFactor, ch, chn->count);
        debug_printf(log_buffer);

        uint32_t current_millis = HAL_GetTick();
        sprintf(log_buffer, "millis:%ld\r\n", current_millis);
        debug_printf(log_buffer);

        //ref_rms_f32();
        //sprintf(log_buffer,"\r\n");
        //debug_printf(log_buffer);

        
      }
      sprintf(log_buffer, "PC:%ld\r\n", pulseCount);
      debug_printf(log_buffer);
    }

    /*
    if (readings_ready)
    {
      readings_ready = false;

      for (int n = 0; n < CTn; n++)
      {
        channel_t *chn = &channels_copy[n];

        float Vmean = chn->sum_V * (1.0 / chn->count);
        float Imean = chn->sum_I * (1.0 / chn->count);

        chn->sum_V_sq *= (1.0 / chn->count);
        chn->sum_V_sq -= (Vmean * Vmean);
        float Vrms = V_RATIO * sqrt((float)chn->sum_V_sq);

        chn->sum_I_sq *= (1.0 / chn->count);
        chn->sum_I_sq -= (Imean * Imean);
        float Irms = I_RATIO * sqrt((float)chn->sum_I_sq);

        float mean_P = (chn->sum_P * (1.0 / chn->count)) - (Vmean * Imean);
        float realPower = V_RATIO * I_RATIO * mean_P;

        float apparentPower = Vrms * Irms;
        float powerFactor = realPower / apparentPower;

        sprintf(log_buffer, "V%d:%.2f,I%d:%.3f,RP%d:%.1f,AP%d:%.1f,PF%d:%.3f,C%d:%ld,", n, Vrms, n, Irms, n, realPower, n, apparentPower, n, powerFactor, n, chn->count);
        debug_printf(log_buffer);

        uint32_t current_millis = HAL_GetTick();
        sprintf(log_buffer, "millis:%ld\r\n", current_millis);
        debug_printf(log_buffer);
        //debug_printf("\r\n");
      }

      //ref_rms_f32();
      //sprintf(log_buffer,"\r\n");
      //debug_printf(log_buffer);

      sprintf(log_buffer, "PC:%ld\r\n", pulseCount);
      debug_printf(log_buffer);
    }
    */
    /* USER CODE END WHILE */

    /* USER CODE BEGIN 3 */
  }
  /* USER CODE END 3 */
}

/**
  * @brief System Clock Configuration
  * @retval None
  */
void SystemClock_Config(void)
{

  RCC_OscInitTypeDef RCC_OscInitStruct;
  RCC_ClkInitTypeDef RCC_ClkInitStruct;
  RCC_PeriphCLKInitTypeDef PeriphClkInit;

  /**Initializes the CPU, AHB and APB busses clocks 
    */
  RCC_OscInitStruct.OscillatorType = RCC_OSCILLATORTYPE_HSE;
  RCC_OscInitStruct.HSEState = RCC_HSE_ON;
  RCC_OscInitStruct.HSEPredivValue = RCC_HSE_PREDIV_DIV1;
  RCC_OscInitStruct.HSIState = RCC_HSI_ON;
  RCC_OscInitStruct.PLL.PLLState = RCC_PLL_ON;
  RCC_OscInitStruct.PLL.PLLSource = RCC_PLLSOURCE_HSE;
  RCC_OscInitStruct.PLL.PLLMUL = RCC_PLL_MUL9;
  if (HAL_RCC_OscConfig(&RCC_OscInitStruct) != HAL_OK)
  {
    _Error_Handler(__FILE__, __LINE__);
  }

  /**Initializes the CPU, AHB and APB busses clocks 
    */
  RCC_ClkInitStruct.ClockType = RCC_CLOCKTYPE_HCLK | RCC_CLOCKTYPE_SYSCLK | RCC_CLOCKTYPE_PCLK1 | RCC_CLOCKTYPE_PCLK2;
  RCC_ClkInitStruct.SYSCLKSource = RCC_SYSCLKSOURCE_PLLCLK;
  RCC_ClkInitStruct.AHBCLKDivider = RCC_SYSCLK_DIV1;
  RCC_ClkInitStruct.APB1CLKDivider = RCC_HCLK_DIV2;
  RCC_ClkInitStruct.APB2CLKDivider = RCC_HCLK_DIV1;

  if (HAL_RCC_ClockConfig(&RCC_ClkInitStruct, FLASH_LATENCY_2) != HAL_OK)
  {
    _Error_Handler(__FILE__, __LINE__);
  }

  PeriphClkInit.PeriphClockSelection = RCC_PERIPHCLK_USART2 | RCC_PERIPHCLK_TIM8;
  PeriphClkInit.Usart2ClockSelection = RCC_USART2CLKSOURCE_SYSCLK;
  PeriphClkInit.Tim8ClockSelection = RCC_TIM8CLK_HCLK;
  if (HAL_RCCEx_PeriphCLKConfig(&PeriphClkInit) != HAL_OK)
  {
    _Error_Handler(__FILE__, __LINE__);
  }

  /**Configure the Systick interrupt time 
    */
  HAL_SYSTICK_Config(HAL_RCC_GetHCLKFreq() / 1000);

  /**Configure the Systick 
    */
  HAL_SYSTICK_CLKSourceConfig(SYSTICK_CLKSOURCE_HCLK);

  /* SysTick_IRQn interrupt configuration */
  HAL_NVIC_SetPriority(SysTick_IRQn, 0, 0);
}

/* USER CODE BEGIN 4 */

void onPulse()
{
  pulseCount++;
}

/* USER CODE END 4 */

/**
  * @brief  This function is executed in case of error occurrence.
  * @param  file: The file name as string.
  * @param  line: The line in file as a number.
  * @retval None
  */
void _Error_Handler(char *file, int line)
{
  /* USER CODE BEGIN Error_Handler_Debug */
  /* User can add his own implementation to report the HAL error return state */
  while (1)
  {
  }
  /* USER CODE END Error_Handler_Debug */
}

#ifdef USE_FULL_ASSERT
/**
  * @brief  Reports the name of the source file and the source line number
  *         where the assert_param error has occurred.
  * @param  file: pointer to the source file name
  * @param  line: assert_param error line source number
  * @retval None
  */
void assert_failed(uint8_t *file, uint32_t line)
{
  /* USER CODE BEGIN 6 */
  /* User can add his own implementation to report the file name and line number,
     tex: printf("Wrong parameters value: file %s on line %d\r\n", file, line) */
  /* USER CODE END 6 */
}
#endif /* USE_FULL_ASSERT */

/**
  * @}
  */

/**
  * @}
  */

/************************ (C) COPYRIGHT STMicroelectronics *****END OF FILE****/

Should do, if you’re using the standard setup. You should see something like…

 -mfpu=fpv4-sp-d16 -mfloat-abi=hard

in your builds. If you’re only doing it every 2.5 secs then it’s probably worth taking the hit of the higher precision.

I tried the DSP a while ago for ffts and the results were pretty good. I also played with using its dot product function for calculating Power. The V*I calculation is effectively the dot product of two big vectors. The DSP can also get you the RMS value of a vector. But given the need to do all of those, and some phase error adjustment, I didn’t pursue it very far. The current technique has the advantage of doing a single pass through the data whereas each of those DSP operations would require their own pass. But as I say, I didn’t pursue it very far, so don’t let me discourage you if you want to give it a shot.

With a steady load, you wouldn’t really expect overflow to come and go like that would you, unless you happen to be right on the edge? You could confirm that by just doubling the load and then you should be permanently in overflow land.