I’m guessing those analog muxes are external to the stm32? There are some big performance gains to be had if you can use the ADCs’ on-chip input muxes instead, i.e. choose an stm32 with enough analog-capable pins to wire every signal straight to the chip. With that approach you can configure the ADCs and DMA controllers so that they continuously step through all their analog inputs with no cpu involvement at all. Configured and started once at boot-time, the conversions then run continuously with the results dumped straight into your C array (by the thousands if you make the array big enough). The cpu gets an interrupt when the array is half full, while the conversions continue into the other half, and another when it is full, at which point the DMA wraps back to the start.
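To make the half-full / full hand-off concrete, here is a minimal sketch of it that runs on a desktop compiler, with a stand-in function playing the part of the DMA controller. All the names and sizes (`NUM_CHANNELS`, `SETS_PER_HALF`, `sim_dma_write`, `process_ready_half` and so on) are made up for illustration and are not taken from the linked thread. On real hardware using ST's HAL, I'd expect the two `on_*_complete` bodies to live in `HAL_ADC_ConvHalfCpltCallback` and `HAL_ADC_ConvCpltCallback`, with the buffer passed once to `HAL_ADC_Start_DMA` and the DMA channel set to circular mode, but treat that mapping as an assumption about your setup.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_CHANNELS  4   /* hypothetical: inputs in the ADC scan sequence */
#define SETS_PER_HALF 8   /* hypothetical: complete scans per half buffer  */
#define HALF_LEN      (NUM_CHANNELS * SETS_PER_HALF)
#define BUF_LEN       (2 * HALF_LEN)

/* Circular buffer; on real hardware the DMA controller fills this. */
static volatile uint16_t adc_buf[BUF_LEN];

/* Half that is ready for the main loop, or NULL if none is pending. */
static const volatile uint16_t * volatile ready_half = NULL;
static volatile uint32_t overruns = 0;

static void mark_ready(const volatile uint16_t *half)
{
    if (ready_half != NULL) {
        overruns++;            /* main loop had not finished the last half */
    }
    ready_half = half;
}

/* Interrupt-side hooks: one per DMA event. */
void on_half_complete(void) { mark_ready(&adc_buf[0]); }
void on_full_complete(void) { mark_ready(&adc_buf[HALF_LEN]); }

/* Desktop stand-in for the DMA: stores one sample, fires the hooks. */
void sim_dma_write(uint16_t sample)
{
    static size_t wr = 0;
    adc_buf[wr++] = sample;
    if (wr == HALF_LEN) {
        on_half_complete();
    } else if (wr == BUF_LEN) {
        wr = 0;                /* circular mode: wrap to the start */
        on_full_complete();
    }
}

/* Main-loop side: sums each channel over the ready half.
 * Samples are interleaved in scan order: ch0, ch1, ..., ch0, ch1, ...
 * Returns 1 if a half was processed, 0 if nothing was pending. */
int process_ready_half(uint32_t sums[NUM_CHANNELS])
{
    const volatile uint16_t *half = ready_half;
    if (half == NULL) {
        return 0;
    }
    assert(sums != NULL);
    for (size_t ch = 0; ch < NUM_CHANNELS; ch++) {
        sums[ch] = 0;
    }
    for (size_t k = 0; k < HALF_LEN; k++) {
        sums[k % NUM_CHANNELS] += half[k];
    }
    ready_half = NULL;         /* hand the half back */
    return 1;
}

uint32_t overrun_count(void) { return overruns; }
```

The point of the overrun counter is that the main loop has exactly one half-buffer's worth of time to finish its work; if it ever counts up, the array is too small or the maths is too slow.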
With that model, the ARM core is totally dedicated to doing the power maths on values that have been magically delivered into your array. It doesn’t have to spend any cycles dealing with ADC completions, mux configuration, SPI bus transactions and so on.
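As a rough illustration of what that "power maths" might look like on one half of the array, here is a sketch that assumes the samples arrive as interleaved voltage/current pairs (V, I, V, I, ...) from a single-phase setup. The function name, the struct and the two scale factors are hypothetical, and the layout is my assumption rather than anything from the linked code. It removes the DC bias by subtracting the block mean, so it doesn't need to know the exact mid-rail offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    double mean_sq_v;   /* volts squared; take the square root for Vrms */
    double mean_sq_i;   /* amps squared; take the square root for Irms  */
    double real_power;  /* watts: mean of instantaneous v * i           */
} power_result_t;

/* samples : interleaved raw ADC counts, V then I, n_pairs pairs in total
 * v_scale : hypothetical calibration factor, volts per ADC count
 * i_scale : hypothetical calibration factor, amps per ADC count          */
power_result_t compute_power(const uint16_t *samples, size_t n_pairs,
                             double v_scale, double i_scale)
{
    assert(samples != NULL && n_pairs > 0);

    int64_t sum_v = 0, sum_i = 0, sum_vv = 0, sum_ii = 0, sum_vi = 0;
    for (size_t k = 0; k < n_pairs; k++) {
        int64_t v = samples[2 * k];
        int64_t i = samples[2 * k + 1];
        sum_v  += v;
        sum_i  += i;
        sum_vv += v * v;
        sum_ii += i * i;
        sum_vi += v * i;
    }

    double n      = (double)n_pairs;
    double mean_v = (double)sum_v / n;   /* DC bias of the voltage input */
    double mean_i = (double)sum_i / n;   /* DC bias of the current input */

    power_result_t r;
    /* mean(x*y) - mean(x)*mean(y) gives the mean product with bias removed */
    r.mean_sq_v  = ((double)sum_vv / n - mean_v * mean_v) * v_scale * v_scale;
    r.mean_sq_i  = ((double)sum_ii / n - mean_i * mean_i) * i_scale * i_scale;
    r.real_power = ((double)sum_vi / n - mean_v * mean_i) * v_scale * i_scale;
    return r;
}
```

The mean-subtraction trick only behaves well if each block covers a whole number of mains cycles (or is long enough that the leftover fraction doesn't matter), so in practice you'd size the array with that in mind.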
There’s some sample stm32 code that uses that technique to implement a basic energy monitor in this thread: https://community.openenergymonitor.org/t/stm32-development/6815