Another apparent graph error

djh · 21 January 2024 17:16

OK. Just noticed another strangeness in a graph. Here’s the graph:

It’s a graph I use to monitor my electricity consumption. I noticed a break in the graph around 12:52 so I zoomed in to investigate. Most of the data comes from emontx3, which is situated next to the emonbase that runs emoncms. So it seems that something interrupted its transmissions for 30-40 seconds.

But then I noticed that the orange line was interrupted whilst the purple one continued through the interruption. (Purple is shown as points here.) They both come from emontx4, which is in the garage 30 m or so away. And they’re both PHPFINA. What’s more they’re both derived from the same input:

The orange is the power, which is derived BEFORE the energy, which is purple. So how come there are purple values when there are no orange ones? And over the same period that emontx had its hiccup? What’s going on?

TrystanLea · 21 January 2024 18:04

Hello @djh, the power to kWh process fills in the missing data points with a straight-line join between the last valid data point and the next.

This is required for e.g. fetching daily data, where a kWh value is fetched at midnight every day. A missing value at midnight would break the calculation, so we interpolate the most likely value from the valid data points at 23:59:50 and 00:00:10.

You will also notice the ‘log to feed (join)’ input process; this does the same thing but without the integrated power to kWh process. It’s advisable to use this input process with cumulative energy data, from the emontx4 energy inputs as an example…

Hope that helps

djh · 22 January 2024 11:08

Ah, OK. So my problem was actually that the emonbase didn’t receive input from either emontx3 or emontx4 for 30-40 seconds?

I don’t understand that. The energy data is derived from the power inputs using power to kWh. How could I derive kWh otherwise?

TrystanLea · 22 January 2024 16:57

On an emonTx4 the kWh values are calculated on the emonTx4 itself and persisted in the EEPROM so that they don’t reset when the emonTx4 is power cycled.

djh · 22 January 2024 17:27

emontx4 is just the name of the second emontx! It only has power outputs.

TrystanLea · 22 January 2024 17:35

Ah I was referring to the emonTx4 hardware & firmware, not an emonTx3 that you have renamed to be called emonTx4.

That said, you can also upload the continuous monitoring firmware to the emonTx3 and have it report energy values as well. New firmware can be uploaded from the emonBase/emonPi via the Admin > Update > Firmware tool. This is what that firmware option looks like:

djh · 23 January 2024 20:36

I’m philosophically opposed to ‘faking’ data in the original logs. I believe it’s better to record what actually happened and then post process the data as required for particular applications. I want to be able to see when data is missing in particular.

TrystanLea · 23 January 2024 23:06

I do not agree that it’s faking data. In this case it’s basic interpolation between kWh data points, e.g. let’s say at 11pm you have consumed 10 kWh, then there is a data gap, and at 1am the next day you record 12 kWh. This approach just returns the best estimate for the value at midnight if you ask for it, e.g. 11 kWh.
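
To make the arithmetic concrete, here is a minimal Python sketch of that straight-line estimate. It is not the emoncms code; the function name `interpolate_kwh` and the choice of hours relative to midnight as the time unit are assumptions made for illustration.

```python
def interpolate_kwh(t0, kwh0, t1, kwh1, t):
    """Linearly interpolate a cumulative kWh reading at time t.

    (t0, kwh0) is the last valid point before the gap and
    (t1, kwh1) is the first valid point after it.
    """
    if t0 == t1 or not (t0 <= t <= t1):
        raise ValueError("t must lie between two distinct valid points")
    fraction = (t - t0) / (t1 - t0)
    return kwh0 + fraction * (kwh1 - kwh0)


# Times in hours relative to midnight: 11pm is -1, 1am is +1.
midnight_kwh = interpolate_kwh(-1, 10.0, 1, 12.0, 0)
print(midnight_kwh)  # 11.0
```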

It could return null of course, but then if you’re trying to subtract the value at midnight yesterday from the kWh reading at midnight today, it will not be able to calculate that, as the value returned for midnight is null. Hence in the case of cumulative kWh data it makes sense to have interpolation to find the most likely value at the desired request time.

With fixed interval data you need to fill in the data point slot in the feed itself. With PHPTimeseries, it could be done with interpolation during post processing. But then the whole reason for PHPFina is that it uses under half the disk space of PHPTimeseries when data is of a fixed interval, and has much faster data access as you don’t need to do a binary search to find each data point.

It’s a design decision on my part to focus on this way of doing things, using fixed interval data storage and building everything around that. I’m very happy with it to be honest and feel there is good reason for this approach.
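
The access-speed difference can be illustrated with a rough Python sketch. This is not the actual PHPFina or PHPTimeseries code; the function names and the in-memory lists standing in for feed files are assumptions. The point is only that a fixed-interval feed can compute a slot position directly, whereas a feed that stores explicit timestamps has to search for it.

```python
from bisect import bisect_left


def fixed_interval_index(start_time, interval, t):
    """Slot index for time t in a fixed-interval feed: one arithmetic step."""
    return (t - start_time) // interval


def timestamped_index(timestamps, t):
    """Index of the first stored timestamp >= t: requires a binary search."""
    return bisect_left(timestamps, t)


# Fixed interval: every 10 s slot exists, so position follows from the time.
print(fixed_interval_index(1000, 10, 1030))  # 3

# Timestamped: the reading at 1020 was never stored, so positions shift
# and the timestamp has to be searched for.
stored = [1000, 1010, 1030, 1040]
print(timestamped_index(stored, 1030))  # 2
```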

djh · 24 January 2024 17:42

For a counterexample, in the use case I’m considering at the moment I’m using the temperature data to build a thermal model of my house. If there’s missing data I need to know about it because I need to avoid feeding that period to the model building process.

To meet your requirement I would argue that you should indeed store NaN, and then a process requiring the best ‘estimated’ value at midnight should [call a library function to] find the good values just before and after midnight and interpolate between them. i.e. the interpolation should be done at read time, when required, and not at write time, forcing it on all consumers.
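
As a sketch of what such a read-time library function might look like (the name `value_at` and the list-of-floats feed are hypothetical, not an existing emoncms API), assuming missing slots are stored as NaN:

```python
import math


def value_at(values, index):
    """Return values[index], interpolating at read time if the slot is NaN.

    Searches outwards for the nearest valid neighbour on each side and
    joins them with a straight line. Returns None if either side has no
    valid value, so the caller still learns that no estimate is possible.
    """
    if not math.isnan(values[index]):
        return values[index]
    before = index - 1
    while before >= 0 and math.isnan(values[before]):
        before -= 1
    after = index + 1
    while after < len(values) and math.isnan(values[after]):
        after += 1
    if before < 0 or after >= len(values):
        return None
    fraction = (index - before) / (after - before)
    return values[before] + fraction * (values[after] - values[before])


nan = float("nan")
feed = [10.0, nan, nan, nan, 12.0]   # stored data keeps its gap
print(value_at(feed, 2))             # 11.0, estimated only on request
print(math.isnan(feed[2]))           # True, the gap is still visible
```

A consumer that needs to know about gaps reads the stored values directly; only a consumer that wants an estimate calls the helper.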

I don’t understand your point about “With fixed interval data you need to fill in the data point slot in the feed itself”. The suggestion I made works just as well for fixed intervals.

TrystanLea · 25 January 2024 14:43

Hello @djh, for temperature data I agree that it’s best not to use Log to feed (join). My example was specific to cumulative kWh data.

In the MyHeatpump app, electric power, heat output, flow, return, room and outside temperatures are all expected to be recorded with Log to feed and include null values when data is missing.

One of the things that the MyHeatpump app does is distinguish between gaps in data that are very short, e.g. the odd missing value, and long gaps in data of a few hours. There’s actually a threshold in the modelling code in the app: if there is a gap in data above 15 minutes it treats it as a full data outage, and if the gap is less than that it will use the last valid value from each relevant power or temperature feed.

There’s also a second check in the modelling code to ensure that, when doing COP calculations using power and heat data, both feeds have data present to make that calculation. E.g. it doesn’t just calculate the mean of the electric power and the mean of the heat data separately and then divide one by the other. It checks that the mean is only calculated for periods where both feeds have data present.
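
A rough Python sketch of that kind of threshold rule is below. It is not the MyHeatpump code; the function name `fill_short_gaps`, the use of `None` for missing values, and treating a gap of exactly 15 minutes as short are all assumptions.

```python
def fill_short_gaps(values, interval, max_gap_seconds=15 * 60):
    """Carry the last valid value across short gaps; leave long gaps as None.

    values   -- fixed-interval readings, None where data is missing
    interval -- seconds between consecutive readings
    """
    filled = list(values)
    n = len(values)
    i = 0
    while i < n:
        if values[i] is not None:
            i += 1
            continue
        start = i                      # first missing slot of this gap
        while i < n and values[i] is None:
            i += 1
        gap_seconds = (i - start) * interval
        # Only fill when a valid value precedes the gap and the gap is short.
        if start > 0 and gap_seconds <= max_gap_seconds:
            for j in range(start, i):
                filled[j] = values[start - 1]
    return filled


# 5 minute data: a single missing value is filled, a 20 minute gap is not.
temps = [20.0, None, 20.4, None, None, None, None, 21.0]
print(fill_short_gaps(temps, interval=300))
# [20.0, 20.0, 20.4, None, None, None, None, 21.0]
```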
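
The difference this check makes can be shown with a small Python sketch. Again this is not the app’s code; the function names and the sample numbers are made up for illustration, with `None` standing for a missing reading.

```python
def cop_paired(elec, heat):
    """COP using only the periods where both feeds have data."""
    pairs = [(e, h) for e, h in zip(elec, heat)
             if e is not None and h is not None]
    total_elec = sum(e for e, _ in pairs)
    total_heat = sum(h for _, h in pairs)
    if total_elec == 0:
        return None                    # no overlapping data to work with
    return total_heat / total_elec


def cop_naive(elec, heat):
    """Mean of each feed taken separately, then divided: can mislead."""
    e_valid = [e for e in elec if e is not None]
    h_valid = [h for h in heat if h is not None]
    return (sum(h_valid) / len(h_valid)) / (sum(e_valid) / len(e_valid))


elec = [1000.0, None, 1000.0]          # W, middle reading missing
heat = [3000.0, 9000.0, None]          # W, last reading missing
print(cop_paired(elec, heat))          # 3.0
print(cop_naive(elec, heat))           # 6.0
```

Here the naive version doubles the apparent COP because the heat reading of 9000 W has no matching electricity reading.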

So I agree that in the end use of data you need to be careful about how you are processing the data to ensure calculations are accurate / representative.

djh · 25 January 2024 16:36

I don’t see any difference between temperature or cumulative kWh or any other data. I’m using kWh data in my model as well, and I need to know when there’s a gap.

So I’m still firmly of the opinion that the original data should be stored, including gaps if they’re there, and any filling in of gaps should be done afterwards by applications that want to do it.