SAMD21 ADC and emonLibCM

Ideally, yes. But reality means you’re unlikely to achieve that. I’m hoping the 13th harmonic (of 50 Hz) will be achievable (i.e. > 26 samples per cycle); with 12 power channels and the -DB48 processor, even that might be out of reach.

I make that 0.02÷(27×13) = 56.98us.
There is a calculator in “Getting the most out of the SAM D21's ADC” by Stargirl (Thea) Flowers; if I enter values for max ADC clock and a sample time of 57us, it shows:
Actual sample time of 0.33us
Propagation delay 5.33us
and Conversion time 14us

This leaves 19.66us free time between readings and is equivalent to 912 single cycle instructions (most instructions are single cycle even 32x32bit multiply) or 70 instructions / channel.

Will the calcs fit into that ?

3 V + 12 I = 15 according to my maths - if you have the equivalent of the emonTx V3 & emonVs,
otherwise it’s to suit your hardware (or maybe vice versa).

This depends on your processor, and what you do with the maths. You might spot some economies that I’ve missed…

With a 48MHz CPU and 27 x 13 samples in 20ms, the required sampling frequency is 17.55kHz, or one sample every 56.98us. After ADC overheads and actual sampling time, one is left with 17.32us or 831 ops (63.95 ops / channel).
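For anyone wanting to replay the arithmetic with their own numbers, here is a minimal sketch in plain C. The function names are mine (hypothetical), and the 17.32us of free time is taken straight from the figures above rather than derived from the ADC settings.

```c
#include <stdint.h>

/* Interval between consecutive ADC samples, in microseconds.
 * mains_hz: mains frequency, samples_per_cycle: samples per channel per
 * mains cycle, channels: number of channels sharing the ADC. */
static double sample_interval_us(double mains_hz, unsigned samples_per_cycle,
                                 unsigned channels)
{
    return 1e6 / (mains_hz * samples_per_cycle * channels);
}

/* Sampling frequency in kHz for a given interval in microseconds. */
static double sample_rate_khz(double interval_us)
{
    return 1e3 / interval_us;
}

/* Whole CPU cycles available in a time slot (truncated, not rounded). */
static uint32_t cpu_cycles_in(double time_us, double cpu_mhz)
{
    return (uint32_t)(time_us * cpu_mhz);
}
```

With 50 Hz, 27 samples and 13 channels, `sample_interval_us()` gives just under 57us, and `cpu_cycles_in(17.32, 48.0)` gives the 831 cycles quoted above.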

Unfortunately I do not have the CM energy calcs in my head right now so I have no feeling if 63 ops would be enough ?

There are real economies to be gained however by running the ADC in scan mode and using DMA.

I am a bit pressed at the moment as there is a mountain of rigid insulation arriving tomo.

Thanks @ozpos @Robert.Wall, it would be great to get an idea of how much processor time the SAMD21 might free up vs the AVR-DB; perhaps it will be very useful for getting higher performance at 12 channels. I’ve now got a couple of the Arduino Nano 33 and Feather M0 development boards. I won’t get a chance to explore them for a bit, but I’m interested in following along.

@TrystanLea
I’m trying to make progress with the -DB48, and I don’t want to split my time between two processors. At the moment, I have a sampling interval of ~78 µs, which I think is viable - there are no displaced interrupts to be seen. Until I get all the logic and maths proven, I can’t really make a sensible prediction.

We know that reverting to the way that emonLibCM works with the '328P will allow more channels/better sample rate, but it retains the high/low current inaccuracies introduced by the value-dependent phase errors of the transformers, which is now the main source of inaccuracy.

Can this be compensated for in a simple way or via a lookup table ?

Thanks Robert, yes understood

I don’t think so - the real power calculation is affected, but the effect of the error also heavily depends on the load power factor.

Is the maths in question mostly integer or floating point? And if integer, how many bits wide?

Hi @dBC, integer. Values are usually scaled up so as not to lose precision. The Cortex M0 does not have an FPU (you have to go to the M4 & M7 for that), but it does have a 32-bit by 32-bit multiply that runs in a single CPU cycle at 48MHz.

Data samples are 12 bits wide, so V x I yields at most a 24-bit result: full scale is 0xFFF x 0xFFF (decimal 4095 x 4095 = 16,769,025), and only 0x1000 x 0x1000 (4096 x 4096) would need a 25th bit.

The Arduino C/C++ compiler supports uint64_t data type.
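A quick way to check product widths is to count bits directly. This is only a sketch with helper names I have made up: it shows that the product of two full-scale 12-bit readings (4095 x 4095) needs 24 bits, that 4096 x 4096 (one count past full scale) needs 25, and how many worst-case products an unsigned 32-bit accumulator can take before it could overflow.

```c
#include <stdint.h>

/* Number of bits needed to hold an unsigned value (0 needs 0 bits). */
static unsigned bits_needed(uint64_t value)
{
    unsigned bits = 0;
    while (value != 0) {
        bits++;
        value >>= 1;
    }
    return bits;
}

/* How many products of size max_product can be summed into a uint32_t
 * without any possibility of overflow. max_product must be non-zero. */
static uint32_t products_before_overflow(uint32_t max_product)
{
    return UINT32_MAX / max_product;
}
```

For unsigned 12-bit operands that works out at 256 worst-case products per 32-bit accumulator, which is one argument for either a 64-bit total or flushing a 32-bit sub-total at regular intervals.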

Thanks. I was actually referring to Robert’s AVR code and his battle with the inevitable new data arrival tick, but I guess you’re both working from the same code base.

@Robert.Wall , are you familiar with AVR201: Using the AVR Hardware Multiplier?

If you don’t fancy slipping into AVR assembler, Atmel have provided some nice C inline functions, and you can find a copy of them here, and you can see how @cbmarkwardt used them here.

In that example…

mac16x16_32(vstats.val2_sum,vval,vval); // .. squared voltage

takes two 16-bit values, multiplies them and accumulates the result into a 32-bit variable. There’s also a mac16x16_24() that’s even faster, if you know the summation can fit in 24 bits.
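To make the behaviour concrete, here is a portable C reference for what that call computes. It is not the Atmel inline and will not be as fast on an AVR; `mac16x16_32_ref` and `sum_of_squares` are hypothetical names, and I have assumed signed operands (samples with the offset already removed). It is mainly useful for checking results on a PC.

```c
#include <stdint.h>

/* Reference multiply-accumulate: adds a * b to *acc.
 * Assumption: signed 16-bit operands, signed 32-bit accumulator.
 * The caller must make sure the running sum cannot overflow. */
static inline void mac16x16_32_ref(int32_t *acc, int16_t a, int16_t b)
{
    *acc += (int32_t)a * (int32_t)b;
}

/* Sum of squares over a block of samples, as used for an RMS value. */
static int32_t sum_of_squares(const int16_t *samples, unsigned n)
{
    int32_t sum = 0;
    for (unsigned k = 0; k < n; k++) {
        mac16x16_32_ref(&sum, samples[k], samples[k]);
    }
    return sum;
}
```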

No - not found that.

Ouch - that’s going to stir up the old - and I mean old - grey matter. It must be around the mid 1980s when I last had serious dealings with assembler.
One doubt I have - I’m not a career programmer obviously (I’m really a systems/applications/projects engineer who just happens to have done a fair bit of programming over the years), would my attempts at assembler be better than a decent compiler? Yes, I know a compiler might think it needs to cater for all eventualities whereas I know the bounds of every value, but even so, I wonder.

24 bit multipliers would suit me, then accumulate into 32 or 64-bit - the plan was 2 stages; 32-bit until it could be close to full, then accumulate the sub-total to 64-bit, which would be about every 100 ms.
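The two-stage idea could look something like the sketch below, in plain C. The type and function names are hypothetical, and the flush interval is my assumption: I have taken samples in the range -2048 to +2047 (12-bit with the offset removed), so each product is at most 2^22 in magnitude and 256 of them cannot overflow a signed 32-bit sub-total. A time-based flush of about every 100 ms, as described above, would need its own check against the sample rate.

```c
#include <stdint.h>

/* Assumption: |v|, |i| <= 2048, so 256 * 2^22 = 2^30 fits in an int32_t. */
#define FLUSH_EVERY 256u

typedef struct {
    int32_t  partial;  /* fast 32-bit sub-total, updated every sample */
    int64_t  total;    /* slow 64-bit total, updated only on a flush */
    uint16_t count;    /* products added to partial since last flush */
} two_stage_acc;

/* Add one v * i product; move the sub-total into the 64-bit total
 * once FLUSH_EVERY products have been accumulated. */
static void acc_add(two_stage_acc *a, int16_t v, int16_t i)
{
    a->partial += (int32_t)v * (int32_t)i;
    if (++a->count >= FLUSH_EVERY) {
        a->total += a->partial;
        a->partial = 0;
        a->count = 0;
    }
}

/* Current sum, including whatever is still in the sub-total. */
static int64_t acc_read(const two_stage_acc *a)
{
    return a->total + a->partial;
}
```

The expensive 64-bit addition then happens once per 256 samples rather than on every sample.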

There will be two f.p. operations per power per cycle when I calculate the real power from the ‘partial powers’ accumulated over that cycle.

Thanks for all those pointers, I’ll have to look at them.

Fear not, when you dig through those pointers you’ll see you don’t need to write any assembler. To you, it’ll look like you’re just calling a C function. There’ll be no overhead of a function call though; it’s all inlined.

It’s definitely better than anything you’ll get out of gcc. C will be doing 32-bit operations (on an 8-bit micro) so it’ll be constantly checking for carries etc. These inline functions do the bare minimum of instructions to do what they promise: multiply two 16-bit numbers together and add the result into a 24-bit accumulation, for example. You can’t do that in C. C will just promote everything to 32 bits and assume the worst with regard to dynamic range, so it has to do a lot of carry checks.

hmm… they may not work for you then. If you could take advantage of 16-bit multiplies, being accumulated into a 32-bit accumulator, they’ll be faster than C.

Have a look at the code generated when you multiply two 32-bit values together in C, on an 8-bit micro. It’s an awful lot of instructions… most of which can be done away with once you know your own dynamic range limits.

Once you’re on a 32-bit cpu like @ozpos that aspect goes away - it all just becomes single instruction. But 32-bit arithmetic is hard work for an 8-bit micro.

(N.B. The shorthand +=→ n means “accumulated into n bits”, etc.)

Yes, but 12 × 12 → 32, followed by 32 += → 32 a few times, then 32 +=→ 64 is surely faster than 16 × 16 +=→ 32 etc? And I’d guess (hopefully, without evidence) that would be faster than 16 × 16 +=→ 64 (because won’t it do the whole lot as 64-bits?).

We know that - but I’m stuck with it. It’s a great pity that the STM32 bit the dust after all the hard work you put into it. It’s a matter of great regret to me at least.

But how can you do that in C?

I don’t think there is a mac16x16_64() in those inlines, but you could potentially add one - just base it on mac16x16_32()

If you write it in C yes, it will all get promoted to 64 bits and be very slow. If you use these inlines then no, they’d do exactly what they say on the pack.

Exactly. From what you’re saying, even the inline macro for just the multiplication will be a significant improvement over (12 → 16) × (12 → 16) → 16
which in turn must be a significant improvement over (12 → 32) × (12 → 32) → 32

Looking at the assembler in

I don’t want to multiply two numbers, I want to square one number. So surely, rather than doing
(A + B) × (C + D) = A×C + A×D + B×C + B×D,

there’s a further economy to be had:

(A + B)² = A × A + B × B + (A × B) << 1

in other words, get rid of one multiply and replace it by a left-shift?
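As a sanity check on that identity, here is a sketch in portable C that squares a 16-bit value from 8-bit pieces, which is the size of multiply the AVR hardware provides. I am assuming A is the high byte weighted by 256 and B is the low byte, so the cross term is shifted by 9 in total (8 for the byte weight plus 1 for the doubling). The function name is made up, and this says nothing about what a compiler or hand-written assembler would actually emit.

```c
#include <stdint.h>

/* Square a 16-bit unsigned value using three 8x8 -> 16 multiplies
 * instead of the four needed for a general 16x16 product.
 * x = (hi << 8) + lo, so x^2 = (hi*hi << 16) + (hi*lo << 9) + lo*lo. */
static uint32_t square16_via_8x8(uint16_t x)
{
    uint8_t hi = (uint8_t)(x >> 8);
    uint8_t lo = (uint8_t)(x & 0xFFu);

    /* Casts keep each product unsigned, so it stays well defined even
     * where int is only 16 bits wide (as on AVR). */
    uint16_t hh = (uint16_t)((uint16_t)hi * (uint16_t)hi);
    uint16_t ll = (uint16_t)((uint16_t)lo * (uint16_t)lo);
    uint16_t hl = (uint16_t)((uint16_t)hi * (uint16_t)lo);

    return ((uint32_t)hh << 16) + ((uint32_t)hl << 9) + (uint32_t)ll;
}
```

For a 12-bit sample the high byte is at most 15, so the individual products are small; whether the saved multiply is worth it depends on how the additions and shifts are done.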

It seems like I’ve got to re-learn my assembler - Atmel-style.

Any more detail on A and B, like size, range constraints, signed vs unsigned etc? For example, if A was straight out of the ADC you could say it’s unsigned in the range of 0…4095.

In true RISC fashion, pretty much all the ALU instructions are single cycle. The hardware multiply is the exception at 2 cycles, so swapping one multiply for a left shift saves less than it might seem. And being an 8-bit machine, the multiply has to be 8-bit x 8-bit → 16-bit. So without knowing their sizes, it’s hard to compare.

Which got me wondering… do you just need approximate magnitudes of V and I for your dynamic phase adjustment (like high, medium or low) or do you need the same precision that you need for everything else?