Getting spikes in emonCMS graph

Little more info. Firstly the heading above the 10464.00 is ‘Latest’ and there’s a tooltip that says ‘Last recorded 2 value’. The number is still the same despite it being a day later.

Second I copied the data from /home/pi/data/phpfina/ and I wrote a perl program to read the .dat files. In 3.dat I can see the 12028.00 value sitting there:

12222.000000
12223.000000
12223.000000
12028.000000
12225.000000
12227.000000
12228.000000
12228.000000
12229.000000

So it’s definitely emon-whatever doing this, not my imagination!

Now the graph isn’t showing me the same data today, it’s obviously resampling or somesuch. Where is the explanation of how the graphs work? A link on the page would be useful.

OK. Here’s a screenshot showing the bad data point in the optical sensor output:

The energy feed is calculated from the CT sensor. My dump of the actual data files confirms that it’s a single bad value timed at 14:26:00 in the optical stream.

Is this a data collection error (bug) in the emonTx or a data recording error (bug) in the emonBase? Or is there some other possible explanation?

I gather from some googling that there is no user documentation for the graph app, so I’m still in the dark as to whether I’m using it right. Why are the colours shown as black in the ‘feeds in view’ for example?

I’ve got other oddities in the data that I’ll post later.

Here’s another screenshot:

This one concerns the electricity imported from the mains, measured by the other emonTx in my garage. There’s the optical sensor data, the raw power readings from the CT on the meter tail, and the integrated energy readings computed from the power. The readings have been scaled by different factors to make the graph show the problem.

The CT has a single missing data point at the critical time; the raw data in 4.dat confirms that:
2018-12-10 03:41:40 2160.000000
2018-12-10 03:41:50 2163.000000
2018-12-10 03:42:00 NaN
2018-12-10 03:42:10 2167.000000
2018-12-10 03:42:20 2184.000000

The computed energy has a single retrograde value at the same time and apparently catches up afterwards:
2018-12-10 03:41:40 113.359528
2018-12-10 03:41:50 113.365540
2018-12-10 03:42:00 112.734421
2018-12-10 03:42:10 113.377548
2018-12-10 03:42:20 113.389030
I don’t know what I expect the energy integration to do when there’s missing power data, but pretending that there was a sudden switch to energy outflow is definitely not it!

The optical sensor also has a retrograde measurement at the same time:
2018-12-10 03:41:40 115049.000000
2018-12-10 03:41:50 115055.000000
2018-12-10 03:42:00 114418.500000
2018-12-10 03:42:10 115067.000000
2018-12-10 03:42:20 115079.000000
This is very strange. How does a whacc feed produce retrograde data? And no, my meter doesn’t record exported power! And why does it occur at this precise time? What’s special about -636.5, approximately 100 times the magnitude of the actual import at that time?

Thanks for splitting the thread, Bill.

I believe the problem occurs as a result of NaN data, but I haven’t got to the bottom of it yet. (and apologies but the lack of comments in the code or overview documentation is one of the main things in slowing me down diagnosing the problem, IMHO)

Not a problem, Dave.
Made sense and is easy to do.

I’ve noticed the same behavior and could associate it with NaN values as well:

Thanks for noticing this thread and posting the cross-link. We do indeed seem to be experiencing the same problem. It’s clearly a bug that’s been around for a while but doesn’t seem to have been cleared. I raised a bug about it: incorrect data recorded in feeds · Issue #1137 · emoncms/emoncms · GitHub but it doesn’t seem to be attracting any attention.

I read the thread about your experiences and whilst a lot of it is still over my head I did notice some points that seem interesting:

(1) this problem seems to occur with the phpfina engine but not with the timeseries engine.

(2) I’ve also seen spikes in power_to_kwh data but they’re slightly different, in that they first go down when there’s null data underlying, then spike upwards by the same height above as the initial spike below, and then return to a sensible value. In each case, the underlying trend of the data is also accounted for.

(3) I haven’t got a clue what determines the size of the spikes, but I suspect it will prove to be significant. Here’s an interesting example with two spikes, one being three nulls wide:

(4) In your thread a year ago there was a suggestion that there is a better technique for recording the data (optical pulse counters on revenue meters, in my case) instead of wh_accumulator, but I chose wh_accumulator precisely because it is the technique recommended in the current documentation. So what’s the situation here?

I had an issue with nulls, you can test in the process list and skip the log to feed if the value is null.

image

A null means that, for whatever reason, the input value was not sent correctly. Unfortunately, by default, Emoncms logs nulls to feed. Seems odd, but that is what it does.

1 Like

@djh, have you tried to record it using both engines to demonstrate this? - just a thought.

The nulls are not the problem. The whole point of phpfina, as I understand it is to minimise writes and increase efficiency. It does that by writing nulls, which are essential for it to keep the time stamps accurate. So nulls are to be be expected in the underlying data.

The problem is what wh_accumulator (in conjunction with the phpfina engine) does with those nulls.

I expect in my case the nulls are due to some co-channel interference from some device or other.

No need. Others did that over a year ago.

Yes but if you skip the wh_accumulator process if the input is null, you may well eliminate the issue.

But you said ‘seems to’. By doing it yourself you can add to the weight of evidence.

You mean I can fudge the results and sweep the bug under the carpet?

Shame on you! I’d rather try to see it fixed.

How to do that is part of the puzzle that goes over my head. Plus I’m doing more than enough fiddling with my system as it is and getting myself into a muddle.

Bit harsh I feel; if you did, you can prove it solves the problem and that the process does not handle null values as expected and solve your data problems in the short term.

Just add another process line, with the wh_accumulator process to a new feed and select the other engine.

Personally, my next step would be to edit the code and put in a load of log messages with the various values at each stage and spot where the error (or problem with logic) occurs but YMMV.

I don’t think there’s any other hypothesis been proposed has there? It seems pretty clear to me. I don’t care about short term; I want a permanent fix.

Sorry, but I don’t understand what you mean.

Me too, but I don’t know how PHP works. Do I need to restart something, and if so what? And what if I cock up the edits and the thing won’t start; how do I recover? I’d rather not experiment on my own production system.

I had a thought. Is it possible to run an emoncms system in a simulation mode? I could run a copy on my PC and I already have a load of data that show the problem, if only I can play them back and feed them into a new wh_accumulator. Then I don’t care about the emoncms installation or all the data copies; I can just throw them away afterwards.

Been thinking about this. I had a similar query which was answered very well here (in terms of how a fixed interval feed worked). So in one way you are right about nulls.

I think though, the writing of nulls to the feed by the engine as it realises there are gaps, and the sending of nulls to the process, may have different results. I’ve tried to work through the code and actually cannot reconcile what you are seeing in the feed, with what I think should be written (I think that an input null should result in the last value in the feed being written back to the feed). Without inserting some debug log code it is impossible to do any more (and I do not have a pulse counter).

You could check my theory by skipping incoming nulls…

I’m not sure if the screenshot above is showing the actual data feed values or interpolated values. There is a way of extracting just the values actually written to the feed in the link above to be sure.

This is interesting, it took me until your screenshot yesterday to realise I’d seen similar, I have been meaning to look into it. I had not noticed the correlation with missing datapoints on the source input (null values).

I have an emontx which misses 4-5% of radio packets and you can see the same dips on its kWh feed:

The dips don’t appear to affect the cumulative total, the datapoint after the dip continues as expected.

Adding logging to the power_to_kwh process, highlighting here where a missing datapoint occurs and the interval jumps from 10s to 20s. The result again suggest that the cumulative total is correctly calculated (there is no sign of the dip at this point, the last kwh value carries over to the next kwh increment calculation after the missing datapoint).

2018-12-18 08:56:18.866|ERROR|process_processlist.php|165111 now: 1545123378 1417 last:{"time":1545123368,"value":473.47169667258}
2018-12-18 08:56:18.867|ERROR|process_processlist.php|time_elapsed: 10
2018-12-18 08:56:18.867|ERROR|process_processlist.php|kwh_inc: 0.0039361111111111
2018-12-18 08:56:18.867|ERROR|process_processlist.php|new_kwh: 473.47563278369
2018-12-18 08:56:18.868|ERROR|process_processlist.php|insert: 1545123378 473.47563278369
2018-12-18 08:56:38.894|ERROR|process_processlist.php|165111 now: 1545123398 1391 last:{"time":1545123378,"value":473.47563278369}
2018-12-18 08:56:38.894|ERROR|process_processlist.php|time_elapsed: 20
2018-12-18 08:56:38.895|ERROR|process_processlist.php|kwh_inc: 0.0077277777777778
2018-12-18 08:56:38.895|ERROR|process_processlist.php|new_kwh: 473.48336056147
2018-12-18 08:56:38.895|ERROR|process_processlist.php|insert: 1545123398 473.48336056147
2018-12-18 08:56:48.971|ERROR|process_processlist.php|165111 now: 1545123408 1377 last:{"time":1545123398,"value":473.48336056147}
2018-12-18 08:56:48.971|ERROR|process_processlist.php|time_elapsed: 10
2018-12-18 08:56:48.972|ERROR|process_processlist.php|kwh_inc: 0.003825
2018-12-18 08:56:48.972|ERROR|process_processlist.php|new_kwh: 473.48718556147
2018-12-18 08:56:48.972|ERROR|process_processlist.php|insert: 1545123408 473.48718556147

Which suggests that the problem is located in the way padding is written inside PHPFINA.
I may have now found the issue. Testing to see if that’s the case at the moment.

2 Likes

Yes, except for the detail that the entries in the file are actually floats, so four bytes long, not single bytes.

Sorry, I don’t understand what you mean here.

Yes, I haven’t managed to spot the error by reading the code either, and nor has anybody else apparently. So I agree about the need for debugging. Any thoughts about running a simulation?

I wrote a small perl program to read the data files directly, so I’m confident that the data we’re seeing is genuine. All the feeds are written at 10s intervals, so it’s esy to adjust for interpolation if necessary.