Now that you mention it, and have encouraged me to zoom in, I think I can see kinks in your sine where it’s “slipped”. At over 1 msec per sample, I can see why you’ve taken the approach you have… you’re presumably trying to overlap your CPU processing time with the conversion time? My gut feel is you want to be sampling the signal at a constant rate, and that can’t be faster than 860 SPS. I suspect that going faster than that and reusing previous conversion results will do more harm than good to your accuracy, although that’s more a hunch than a serious calculation. Assuming your processing time is less than your conversion time, perhaps you could get the overlap by:
for the required measurement time:
  1. wait for a conversion to complete
  2. do the processing
That way you’re being “clocked” by the ADC, and while it’s busy converting the next result, you’re processing the previous one and hopefully getting back to step 1 well before the next one is ready.
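The loop described above could be sketched roughly as follows. This is a simulation, not driver code: `SimulatedAdc`, `wait_for_conversion`, `spend` and `run_adc_clocked` are made-up names, the 1163 µs period is just 1/860 s rounded, and the 50 Hz sine and the RMS sum are placeholders for the real signal and the real processing.

```python
import math

CONVERSION_TIME_US = 1163   # assumption: roughly 1/860 s, one conversion at 860 SPS


class SimulatedAdc:
    """Hypothetical free-running converter on a virtual clock (no hardware)."""

    def __init__(self, signal_hz=50.0):
        self.now_us = 0          # virtual time in microseconds
        self.last_index = 0      # index of the last conversion handed out
        self.missed = 0          # results overwritten before they were read
        self.signal_hz = signal_hz

    def wait_for_conversion(self):
        """Return (time_s, value) for the next unread conversion result."""
        latest = self.now_us // CONVERSION_TIME_US   # newest finished conversion
        if latest > self.last_index:
            # A result is already waiting; any older unread ones are lost.
            self.missed += latest - self.last_index - 1
            self.last_index = latest
        else:
            # Nothing new yet: idle until the next conversion completes.
            self.last_index += 1
            self.now_us = self.last_index * CONVERSION_TIME_US
        t_s = self.last_index * CONVERSION_TIME_US / 1e6
        return t_s, math.sin(2 * math.pi * self.signal_hz * t_s)

    def spend(self, busy_us):
        """Advance the virtual clock by the CPU time the processing took."""
        self.now_us += busy_us


def run_adc_clocked(adc, n_samples, processing_us):
    """Collect n_samples, letting the ADC set the pace of the loop."""
    timestamps = []
    sum_sq = 0.0
    for _ in range(n_samples):
        t_s, value = adc.wait_for_conversion()   # step 1: clocked by the ADC
        timestamps.append(t_s)
        sum_sq += value * value                  # step 2: stand-in processing
        adc.spend(processing_us)                 # overlaps the next conversion
    return timestamps, math.sqrt(sum_sq / n_samples)
```

With `processing_us` below the conversion period, the loop idles briefly on each pass, the timestamps stay evenly spaced and `missed` stays at zero. Once processing takes longer than a conversion, results eventually get overwritten and `missed` starts to climb, which is the case the overlap only helps with if the processing really is the shorter of the two.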
A very quick scan of the datasheet made me think you can’t feed it voltages below GND, so it probably makes sense to stick with what you have. It looks like you’ll only get negative readings in differential mode, but I haven’t studied it in detail.
Pffff… the STM32 can deliver 104 12-bit conversions right to my C array in the time it takes the Uno to do one 10-bit conversion, and it’s a $2 part! OK, OK, that’s probably not what you want to hear right now… I’ll stop …soon ;-).
It’s about now I usually dust off this pic of my lights circuit (compact fluros). This circuit was drawing 893mA RMS at the time and you can see the peak current exceeds 3A, so the peak is about 3.36 times the RMS value and your input range needs that much headroom. While that is a real circuit, the numbers are still quite small. It’s unlikely anyone will have designed things so that they start clipping at 3A. With much bigger whole-house aggregate signals, you’re unlikely to see anything anywhere near that peaky (power factor regulations don’t allow it), but it illustrates the issue.
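For what it’s worth, the 3.36 figure is just peak divided by RMS (often called the crest factor). A small sketch of the arithmetic, assuming a peak of exactly 3.0A since the plot only shows that it exceeds 3A; `crest_factor` is a hypothetical helper, and the pure sine is there only for comparison:

```python
import math


def crest_factor(samples):
    """Peak absolute value divided by RMS for a list of samples."""
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return peak / rms


# Figures from the lights circuit; the 3.0 A peak is an assumed round number.
rms_a = 0.893
peak_a = 3.0
multiplier = peak_a / rms_a          # headroom needed relative to the RMS reading

# One full cycle of a clean sine, for comparison with the peaky lighting load.
sine = [math.sin(2 * math.pi * k / 1000) for k in range(1000)]
sine_crest = crest_factor(sine)      # close to sqrt(2) for an undistorted sine

print(f"lights circuit multiplier: {multiplier:.2f}")
print(f"pure sine crest factor:    {sine_crest:.3f}")
```

So a range sized for a clean sine at that RMS current would clip the lights circuit well before its peak, which is the point of the picture.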