It’s probably better described as the net phase shift of the CT, your anti-aliasing filters, and any s/w induced shifts (unless you synchronise two AtoDs to start sampling at exactly the same time?). Although I’m not sure how far any of those additional sources of phase shift will go towards explaining the discrepancy you see between your two methods.
The other thing to keep in mind is that the arccosine (or arctan in my case) maths assumes a pure sinewave. The calibration instructions for all the energy ICs call for “ideal inputs”, i.e. you have to feed it two pure sinewaves as generated by an AC lab supply or an AC power calibrator. Unfortunately they’re out of the reach of most home hobbyists.
The AC signal most of us get from the grid can be pretty distorted with flattops in the sinewave, reportedly from all the switch mode power supplies coming on high in the cycle. I’m struggling to get my head around what that will mean for your calculations. Since you’re feeding the grid to a purely resistive load, I would expect the harmonics in V to also appear in I, so the power contained in those frequencies will show up in your Real power calculation, as well as your Apparent power calculation. In which case, they probably wouldn’t drive you towards a non-unity PF in themselves (like harmonics appearing solely in I do). The bit I’m struggling with is what does it mean to do an arccos on those values.
How many linecycles do your 640 V,I pairs represent? Out of interest, do you get a different (better?) result if you make that calculation over a much larger sample? Do you tune the sample count to make sure they represent a whole number of line cycles?
One thing I try to avoid during calibration is the effects of quantization errors, especially if you’re dealing with very small numbers (or numbers that are very close to each other, or PFs that are very close to 1). To that end, for the purpose of phase error calibration, I collect V,I pairs for 20 seconds at 8kHz and make sure they represent a whole number of line cycles. That’s helped by the fact that I know the calibrator is going to feed me the exact same sinewaves cycle after cycle at an extremely stable line frequency. If a particular point on the sinewave sometimes reads as 100 and sometimes as 101, then by collecting it 8000x20 times I’m effectively oversampling it and determining that it really represents 100.3 (say) and the maths ends up locking that value into the calibration instead of the 101 it might have seen had I just sampled it once.