EmonPi Temperature Measurement

The recent post where this came to light is in the “emonPi hangs and stops logging” thread (post #12 by pb66).

While trying to rule out the user’s intermittent temp sensor issue as the cause of the emonPi ceasing to update, I found that the emonPi FW doesn’t follow the “300 Errors” convention. I then assumed that might be because it doesn’t send its data over RFM, so potentially it’s not an issue. But there remains the question of alerting the user to unused, missing and faulty sensors.

As a general rule, we should avoid using “0” as a default value where “0” is a valid value. When a user sees their emonPi returning 0°C for any of the 6 temperature sensors, what does it mean? Was the sensor ever connected properly? Has it disappeared since installation? Or is it actually exactly 0°C in here?

The original need for a non-zero value arose because emonTx packets were failing CRC checks due to “bitslip” at the receiver, caused by the long string of 0s sent when no temp sensors were used (or they were faulty).

I did check the emonTx repo for the commit, hoping for a link to the discussion. Although the bulk of the history for the emonTx sketch is lost/severed due to the reshuffles, I did find this commit.

Hoping for better luck in the emonTH2 repo, I was really quite surprised to find no sign of these changes, so it is less of a convention than I originally thought.

I originally suggested “300” as that was in line with the default temp value in MartinR’s PLL sketches. Although I did recommend using different 30x error codes, I do not see that having been adopted in the emonTx FW.

However, my recommendations were different to your (@Robert.Wall) definitions above. I recommended that variables were created with a value of “300” and that either a valid temp reading or an error code >300 would replace that value at runtime if a sensor was installed.
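As a rough illustration of that recommendation (not the actual emonTx or emonPi firmware), a minimal sketch might look like the following. The function names, the single `FAULT_TEMPERATURE` code and the use of whole degrees C are all assumptions made for the example; real sketches typically scale the value before sending it.

```c
#include <stdint.h>

#define MAX_SENSORS        6    /* the emonPi reports up to 6 temperature sensors */
#define UNUSED_TEMPERATURE 300  /* default: no sensor has ever been detected */
#define FAULT_TEMPERATURE  301  /* hypothetical single error code, anything > 300 */

/* Every slot starts at 300, never at 0, because 0 is a valid temperature. */
void init_temps(int16_t temps[], int count)
{
    for (int i = 0; i < count; i++) {
        temps[i] = UNUSED_TEMPERATURE;
    }
}

/* Hypothetical per-reading update.
 * detected: non-zero if a sensor was found for this slot at start-up.
 * read_ok:  non-zero if this particular reading succeeded.
 * value:    the reading in whole degrees C (assumed unit). */
void update_temp(int16_t *slot, int detected, int read_ok, int16_t value)
{
    if (!detected) {
        return;                 /* slot keeps 300: sensor not installed */
    }
    *slot = read_ok ? value : FAULT_TEMPERATURE;
}
```

With this arrangement a slot only ever leaves 300 if a sensor was detected, so a value above 300 always means a sensor that was once present has failed or gone missing.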

This was specifically done so that an event could be created in emoncms to trigger a user alert for values greater than 300. That way the user isn’t bombarded with warnings about sensors not being installed; the event is only triggered if a detected sensor subsequently disappears or gives an error. I’m not sure if that’s a typo in RW’s OP or if things got changed for the 3ph and emonLibCM.

I would like to see an escalating level of warning priority so that (for example) if a user knows they have a very intermittent connection that results in the occasional missed value, they can set the alert threshold to >301.
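The threshold idea can be sketched as a simple comparison. This is only an illustration of the logic, written in C for consistency with the firmware; it is not how emoncms implements events, and `should_alert` is a hypothetical name.

```c
#include <stdbool.h>

/* Returns true when the reported value is above the user's chosen threshold.
 * alert_above = 300 alerts on every error code (301 and up);
 * alert_above = 301 ignores the lowest-priority code and alerts on 302 and up. */
bool should_alert(int reported_value, int alert_above)
{
    return reported_value > alert_above;
}
```

An unused sensor (300) never triggers an alert at either threshold, and ordering the codes by severity lets a user raise the threshold to silence the less serious ones.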

I stand by my original recommendations that every sketch should at the very minimum avoid sending “0” for unused sensors as this will avoid “bitslip” and at least make the distinction between a used and unused sensor.

If this is going to be the “OEM convention”, which I thought was already the case, then yes there should be a place to reference this info, but where? Do we have a specific place that defines OEM conventions?

So reordering them would give us

#define UNUSED_TEMPERATURE 300
// this value (300C) is sent if no sensor has ever been detected
#define OUTOFRANGE_TEMPERATURE 301
// this value (301C) is sent if the sensor reports < -55C or > +125C
#define BAD_TEMPERATURE 302
// this value (302C) is sent if no sensor is present or the checksum is bad (corrupted data)
// NOTE: The sensor might report 85C if the temperature is retrieved but the sensor has not been commanded

However!
On numerous previous occasions when this has been discussed (incl. some PMs), the question of the “85°C” errors has arisen, and my opinion is that an exact reading of 85°C should perhaps be flagged as a low-level warning that users can choose to treat as an error or not, e.g. 85.000°C = 301. This way, where 85°C is outside the normal range, a user can set >300 as the alert level; where 85°C is within the normal operating range of the application, the user can set >301 as the alert level. The cost is simply that any reading of exactly 85°C would be lost, so (for example) an increasing temperature trace might go 84.8, 84.9, null, 85.1, 85.2 etc.

In which case the defined fault codes might be

#define UNUSED_TEMPERATURE 300
// this value (300C) is sent if no sensor has ever been detected
#define ERROR_TEMPERATURE 301
// this value (301C) is sent if the sensor reports exactly 85.00C (potential error)
#define OUTOFRANGE_TEMPERATURE 302
// this value (302C) is sent if the sensor reports < -55C or > +125C
#define BAD_TEMPERATURE 303
// this value (303C) is sent if no sensor is present or the checksum is bad (corrupted data)
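To make the proposed ordering concrete, here is a minimal sketch of how a reading might be mapped onto these codes. The function name, its parameters and the use of a `double` in degrees C are assumptions for illustration; none of this is taken from an existing OEM sketch.

```c
#define UNUSED_TEMPERATURE     300
#define ERROR_TEMPERATURE      301
#define OUTOFRANGE_TEMPERATURE 302
#define BAD_TEMPERATURE        303

/* Hypothetical classifier.
 * ever_detected: non-zero if a sensor has ever been seen in this slot.
 * read_ok:       non-zero if the sensor answered and its checksum was good.
 * celsius:       the reading in degrees C (ignored unless read_ok). */
double classify_temperature(int ever_detected, int read_ok, double celsius)
{
    if (!ever_detected) {
        return UNUSED_TEMPERATURE;      /* never installed: not an alert */
    }
    if (!read_ok) {
        return BAD_TEMPERATURE;         /* missing sensor or corrupted data */
    }
    if (celsius < -55.0 || celsius > 125.0) {
        return OUTOFRANGE_TEMPERATURE;  /* outside the sensor's rated range */
    }
    if (celsius == 85.0) {
        return ERROR_TEMPERATURE;       /* exactly 85C: potential error */
    }
    return celsius;                     /* a normal, valid reading */
}
```

The exact comparison with 85.0 assumes the sensor delivers values in binary fractions of a degree, so that 85.000°C is represented exactly; if the firmware works in scaled integers the same test would be done on the integer value instead.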

Either way it should be consistent and probably documented somewhere.

I think this was left open/unfinished: the proposition was made and the discussion faded without any formal adoption or agreement. @TrystanLea simply applied the minimal changes to the one FW that was the issue at that time, and no further progress has been made other than @Robert.Wall using it in the newer stuff.

I have now found the original threads on the old forum for ref. First the thread where the “bitslip” issue was found and a solution proposed in July 2015 (Data loss due to RF packets getting corrupted) and then in Oct 2015 where Trystan adds the fix to the emonTx FW (RF69 reliability, timing, temp sensors, mqtt & lowpowerlabs).

It would be good to make a formal decision on this since it’s been around for 2yrs 8mths now.