EmonPi Temperature Measurement

A little while ago, Paul Burnell (@pb66) noted that the temperature handling in the emonPi is not the same as our present “best practice”. Specifically, it did not return the “300” error codes. Worse, when I checked K&R, I found that the conversion function’s return value would be undefined if the temperature was out of range. Although a test revealed that the actual return value in that situation was zero, there is no guarantee of this.

I suggest therefore that the sketch should be altered.

The changes for the emonPi should be as per the attached (noting that I have not tested this). This set of files is one I retrieved some time ago, dated 10 April 2017, so the files might have been updated since then.

The codes used are:
#define BAD_TEMPERATURE 300
// this value (300C) is sent if no sensor is present or the checksum is bad (corrupted data)
#define UNUSED_TEMPERATURE 301
// this value (301C) is sent if no sensor has ever been detected
#define OUTOFRANGE_TEMPERATURE 302
// this value (302C) is sent if the sensor reports < -55C or > +125C
// NOTE: The sensor might report 85C if the temperature is retrieved but the sensor has not been commanded


In the interest of trying to do more on the public forum, would you mind if I made this PM public? That way @pb66 and others can join in.

Where is this best practice defined? Are the temperature error codes not generated from the Dallas library?

Where are these defined?

I think Paul (@pb66) originally defined most of them to get around the problem of receiver drift when transmitting a long string of zeros by RFM. Those are the codes I’m using in emonLibCM and the 3-phase PLL.

The Dallas library generates DEVICE_DISCONNECTED for a checksum error, but you don’t check it separately anyway. It equates to -127, so to the function as it stands, it will appear out of range and then the function return value is undefined. I don’t use the Dallas library at all in the PLL due to time constraints (which is also why it can have only one temperature sensor). I use parts of it in emonLibCM.

It was proposed in the forum some time ago.

As it stands, the function returns the temperature if it is within range. As noted above, a checksum error results in an out-of-range value. In this case, there is no specific “return” statement in the logic path, and the program flows off the end of the function. According to Kernighan & Ritchie (2nd ed, p225), “Flowing off the end of a function is equivalent to a return with no expression. In either case, the returned value is undefined.” K&R doesn’t say it, but the inference is that a return with no expression, or flowing off the end, is only acceptable if the function has been declared “void”, in which case it is expressly defined that the return value will never be used. If it is not a void function, then the return value could be anything. It happens to be zero, which is a valid in-range value, so knowledge of the fault has been discarded.
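To make the point concrete, here is a minimal sketch of a conversion function in which every path ends in an explicit return. It is not the emonPi sketch's actual code: the function name, the float input and the ×10 scaling are assumptions for illustration. The codes are the ones proposed at the top of this thread, and -127 is the DEVICE_DISCONNECTED value mentioned above.

```c
#include <stdint.h>

/* Codes as proposed in the first post of this thread. */
#define BAD_TEMPERATURE        300
#define OUTOFRANGE_TEMPERATURE 302
/* Value the Dallas library reports for a failed read, per this thread. */
#define DISCONNECTED_C         (-127.0f)

/* Hypothetical helper: converts degrees C to tenths of a degree.
   Every path has an explicit return, so the result is never undefined. */
int16_t temperature_to_tenths(float celsius)
{
    if (celsius == DISCONNECTED_C)
        return BAD_TEMPERATURE * 10;          /* missing sensor or bad checksum */
    if (celsius < -55.0f || celsius > 125.0f)
        return OUTOFRANGE_TEMPERATURE * 10;   /* outside the sensor's range */
    /* In range: round to the nearest tenth. */
    return (int16_t)(celsius * 10.0f + (celsius >= 0.0f ? 0.5f : -0.5f));
}
```

With this shape a fault can no longer be mistaken for a valid 0°C reading, because the caller always receives either a scaled temperature or one of the codes.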

The recent post where this is revealed is in the emonPi hangs and stops logging - #12 by pb66 thread.

Trying to rule out that the user’s intermittent temp sensor issue might be causing the emonPi to stop updating, I found the emonPi FW didn’t follow the “300 Errors” convention. I then assumed that might be because it doesn’t send its data over RFM, so potentially it’s not an issue. But there remains the question of alerting the user to unused, missing and faulty sensors.

As a general rule we should avoid using “0” as a default value where “0” is a valid value. When a user sees his emonPi returning 0°C for any of the 6 temperature sensors, what does it mean? Was the sensor ever connected properly? Has it disappeared since installation? Or is it actually exactly 0°C in here?

The original need for a non-zero value arose because the emonTx packets were failing CRC checks owing to “bitslip” at the receiver, caused by the long string of 0’s when no temp sensors were used (or they were faulty).

I did check the emonTx repo for the commit, hoping for a link to the discussion. Although the bulk of the history for the emonTx sketch is lost/severed due to the reshuffles, I did find this commit.

Hoping for better luck in the emonTH2 repo, I was really quite surprised to find no sign of these changes, so it is less of a convention than I originally thought.

I originally suggested “300” as that was in line with the default temp value in MartinR’s PLL sketches. Although I did make recommendations to use different 30x error codes, I do not see those having been adopted in the emonTx FW.

However, my recommendations were different to your (@Robert.Wall) definitions above. I recommended that variables were created with a value of “300” and then either a valid temp reading or an error code >300 would replace that value at runtime if a sensor was installed.

This was done specifically so that an event could be created in emoncms that triggered a user alert for values greater than 300. That way the user wasn’t bombarded with warnings about sensors not being installed; the event would be triggered only if a detected sensor subsequently disappeared or gave an error. I’m not sure if that’s a typo in RW’s OP or if things got changed for the 3ph and emonLibCM.
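The scheme described above can be sketched roughly as follows. This is an illustration only: the array, the function names and the use of whole degrees are assumptions, and the alert rule stands in for whatever event would be configured downstream in emoncms.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_SENSORS        6    /* the emonPi reports 6 sensors, per this thread */
#define UNUSED_TEMPERATURE 300  /* default: no sensor ever detected */

/* Hypothetical storage, in whole degrees C for simplicity. */
static int16_t temperatures[MAX_SENSORS];

/* Fill every slot with the "unused" default before any sensor is read. */
void init_temperatures(void)
{
    for (int i = 0; i < MAX_SENSORS; i++)
        temperatures[i] = UNUSED_TEMPERATURE;
}

/* At runtime, a detected sensor overwrites its slot with a valid
   reading or with an error code greater than 300. */
void store_reading(int sensor, int16_t value)
{
    if (sensor >= 0 && sensor < MAX_SENSORS)
        temperatures[sensor] = value;
}

/* Alert rule: exactly 300 (never installed) stays quiet, anything above fires. */
bool needs_alert(int sensor)
{
    return temperatures[sensor] > UNUSED_TEMPERATURE;
}
```

A slot that was never populated stays at 300 and raises no alert, while a sensor that was detected and later fails moves to a code above 300 and does.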

I would like to see an escalating level of warning priority so that (for example) if a user knows they have a very intermittent connection that results in the occasional missed value, they can set the alert warning to >301.

I stand by my original recommendations that every sketch should at the very minimum avoid sending “0” for unused sensors as this will avoid “bitslip” and at least make the distinction between a used and unused sensor.

If this is going to be the “OEM convention”, which I thought was already the case, then yes there should be a place to reference this info, but where? Do we have a specific place that defines OEM conventions?

So reordering them would give us

#define UNUSED_TEMPERATURE 300
// this value (300C) is sent if no sensor has ever been detected
#define OUTOFRANGE_TEMPERATURE 301
// this value (301C) is sent if the sensor reports < -55C or > +125C
#define BAD_TEMPERATURE 302
// this value (302C) is sent if no sensor is present or the checksum is bad (corrupted data)
// NOTE: The sensor might report 85C if the temperature is retrieved but the sensor has not been commanded

However!
On numerous previous occasions when this has been discussed (incl. some PMs), the question of the “85°C” errors has arisen more than once. My opinion is that perhaps a reading of exactly 85°C should be flagged as a low-level warning that users can choose to treat as an error or not, e.g. 85.000°C = 301. That way, where 85°C is outside the normal range, a user can set >300 as the alert level; where 85°C is within the normal operating range of the application, the user could set >301 instead. The cost is simply that any reading of exactly 85°C would be lost, so (for example) an increasing temperature trace might go 84.8, 84.9, null, 85.1, 85.2, etc.

In which case the defined fault codes might be

#define UNUSED_TEMPERATURE 300
// this value (300C) is sent if no sensor has ever been detected
#define ERROR_TEMPERATURE 301
// this value (301C) is sent if the sensor reports exactly 85.00C (potential error)
#define OUTOFRANGE_TEMPERATURE 302
// this value (302C) is sent if the sensor reports < -55C or > +125C
#define BAD_TEMPERATURE 303
// this value (303C) is sent if no sensor is present or the checksum is bad (corrupted data)

Either way it should be consistent and probably documented somewhere.

I think this was left open/unfinished where the proposition was made, and the discussion faded as there was no formal adoption or agreement. @TrystanLea simply applied the minimal changes to the one FW that was the issue at that time, and no further progress has been made other than @Robert.Wall using it in the newer stuff.

I have now found the original threads on the old forum for ref. First the thread where the “bitslip” issue was found and a solution proposed in July 2015 (Data loss due to RF packets getting corrupted) and then in Oct 2015 where Trystan adds the fix to the emonTx FW (RF69 reliability, timing, temp sensors, mqtt & lowpowerlabs).

It would be good to make a formal decision on this since it’s been around for 2yrs 8mths now.

I remember there was a big (but not unfriendly) argument over “85°C” and the possibility of deciding, depending on the values either side, whether it was likely to be an error or a valid reading. It is that value presumably because Dallas has it as the borderline where the accuracy spec. changes (“±0.5°C Accuracy from -10°C to +85°C”), and I’m guessing it was originally an absolute limiting value. The number is hard coded into the sensor; you’ll wait a long time before that changes.

My only concern with shuffling the codes around is the 3-phase PLL sketch (now apparently on GitHub - I never knew :astonished: ) and emonLibCM (fortunately unpublished) will need changing.

Indeed.

I say yes there should be, AFAIK there isn’t one.
No.1 is: Imported power is positive.

I understand fully, hence why it is so important to agree and document these things rather than just letting it free-wheel and hope it all comes good one day.

IMO it’s not really an issue as the codes are not yet utilized or documented elsewhere. It would be better to make the change now and suffer the consequence of a few early adopters getting slightly different results that can be fixed with a FW upgrade, IF they even elect to use the codes, rather than continue down a path that stifles development and widespread adoption.

There are far more users with standard emonTx’s that already send “300” for unused temp sensors and many forum threads that reference that fact already.

That’s still a possibility, but I didn’t bring it up as it sounds resource heavy (less of a concern if we use beefier MCUs). But yes, if the last value was within even 5°C of 85°C then the reading could be assumed to be valid, or it could be assumed to be an error only if it is static at 85. There are multiple approaches that could be tried, but it will be pointless if there is no low-priority error code (e.g. 301) reserved for it. Even if we ignore the “85°C” cases for now, just leaving 301 unallocated would be fine, as any of these checks might occur in FW, so it doesn’t need to be defined or even remain the same as long as the code is available.
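One of the approaches mentioned, accepting 85°C only when the previous reading was already close to it, might look something like this. It is a sketch only: the names, the hundredths-of-a-degree scaling and the idea of passing in the last valid reading are all assumptions, and nothing like this exists in the current firmware as far as this thread shows.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define READING_85C     8500  /* exactly 85.00 C, in hundredths of a degree */
#define PLAUSIBLE_BAND   500  /* 5 C in hundredths, the margin suggested above */

/* Hypothetical check: is a reading of exactly 85.00 C suspicious?
   It is treated as valid only when the previous valid reading
   was within 5 C of 85 C. Any other reading is never flagged here. */
bool is_suspect_85(int16_t current, int16_t previous_valid)
{
    if (current != READING_85C)
        return false;
    return abs(current - previous_valid) > PLAUSIBLE_BAND;
}
```

A caller could then send the reserved low-priority code (e.g. 301) instead of the reading whenever this returns true.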

And even if at some point the checking was to move out to emoncms or emonhub or ??? it still wouldn’t be an issue as the 301 code can just be made redundant, remain for legacy installs or continue to be used if appropriate.

Exactly. Hopefully it’s only @Simsala who’ll be significantly affected. I know he’s got a number in use (I’m not sure how many, but his business seems to be very like yours).

I got the impression that anything greater than 300 is OK for an error. But before I read this, I wondered whether there might be someone who wants to log furnace temperatures, for example. I know the DS18B20 won’t cut it, but 300°C might easily be within range for that, and we have the “85” problem again. If we are having a ‘universal’ error code that’s not specific to the DS18B20 and is future-proof, is there good reason not to go to -300 (or any number below -273.15°C)? Then at least we’d be sure we don’t invent a second problem where we don’t need one. :grinning:

The issue with going beyond 320°C is that it would cause a problem for those of us who use 2 decimal places for temperature readings, scaled by 100 to fit a signed int. Yes, we could accommodate “furnace monitors” that use temperatures in excess of 300, but will we then find a use case for -300?
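The arithmetic behind that limit can be checked with a short sketch. It assumes the “signed int” is 16 bits wide (as on the ATmega328P), and the function names are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define SCALE 100  /* two decimal places, as described above */

/* True if a value in whole degrees C survives x100 scaling in an int16_t. */
bool fits_scaled_int16(int32_t whole_degrees)
{
    int32_t scaled = whole_degrees * SCALE;
    return scaled >= INT16_MIN && scaled <= INT16_MAX;
}

/* Count the whole-degree codes from 300 upward that still fit. */
int count_usable_codes(void)
{
    int count = 0;
    for (int32_t code = 300; fits_scaled_int16(code); code++)
        count++;
    return count;
}
```

Since INT16_MAX is 32767, the largest whole-degree value that fits after scaling is 327, which leaves the 28 codes from 300 to 327. A value of -300 scales to -30000 and would also fit.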

MartinR’s PLL sketches have used “300” for many years and the current emonTx has used “300” for 2.5 yrs. Are there enough users out there wanting to measure the temperature in furnaces to warrant the upheaval and ongoing confusion of “300” prior to March 18 and “???” post October 2018, with 6 months of “either” in between, as that’s how long it might take to get rolled out?

If there are users wanting to record temps outside of -55 to +125, they need a different sensor and therefore most likely a different library. The warnings surrounding those sorts of temperatures will possibly be far more “mission critical” than in our use case, so a bespoke setup is going to be needed; they can just not use our “300” system and monitor the full range of temperature their chosen devices provide. We are trying to provide for the masses, not the odd case.

We run a risk here of looking for issues that will never occur and making this too complex, causing it to go unimplemented for several more years. In the interest of “KISS” and the existing user base, I vote for keeping it “300”.

Besides, telling someone the alert is triggered with a value < -300 is in itself confusing; take the RSSI discussions as a case in point. >300 is so much simpler.

Only as an error!
It’s quite possible to have both - it gets a bit messy, but I was only trying to throw spanners in - better now than cleaning out the swarf later!

Fair enough. I’ll go along with that. You’ve got 28 possible error codes - that should be enough.

[Edit]
It might be a good idea to decide now whether we use × 10 or × 100 to send the values. The PLL sketch, emonTx V3, emonTH and emonPi do × 10. EmonLibCM follows MartinR with × 100, but the sketch that uses it doesn’t need to do so; it can easily truncate or round and send × 10. I think there’s a good argument to keep the (apparent - at best steps of 0.0625°C) precision in the library, so that it can be put into use if required. The CM library won’t read the full precision anyway if the reporting rate is high and time is short.

Well, we could have 280 or 2800 if decimal precision is utilized. I was keeping that under my hat, but for future, as yet undefined, fault codes we can always use 301.1 or 300.5 etc. if more codes are needed.

I don’t think there is any reason to standardize this, it is defined in the payload and the emonhub definition. Some of us do prefer the 2 decimal places precision as it conveys movement better, you can wait a long time to see a whole 0.1 change and if the temp is regulated eg room temp, you may not see any change. I think it’s ok as it is to allow either and set “scales” to suit, the only reason I could see to force a single decimal precision is if we needed to move the range to include 3000°C which does seem a bit extreme.

Not on an ATmega328P it won’t, but on a faster MCU, e.g. STM32, it might.

No, the conversion time is a limitation of the DS18B20. If it’s worth the programming effort and you have the processing power (e.g. STM32), it would be possible to read the temperature at a lower rate, say every other, every fourth or every eighth ‘report’, but then that raises the ugly question of whether you send the ‘old’ temperature between updates, or an “unavailable” code so that the downstream processing can decide what to do, or try to extrapolate from previous values, or what? I chose to reduce the precision to 9 bits (0.5°C steps, and < 100 ms conversion time) when the reporting period was less than 1 s. I don’t specify the library at more than 10 reports per second.
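The trade-off between resolution and conversion time can be sketched as a simple lookup. The conversion times are the maximum figures from the DS18B20 datasheet. The picker function is hypothetical and is a more general rule than the one described above (9 bits whenever the reporting period is under 1 s), so it is not emonLibCM's actual logic.

```c
#include <stdint.h>

/* Maximum conversion times from the DS18B20 datasheet, in microseconds. */
static const uint32_t conversion_us[] = {
    93750,   /*  9-bit, 0.5 C steps    */
    187500,  /* 10-bit, 0.25 C steps   */
    375000,  /* 11-bit, 0.125 C steps  */
    750000   /* 12-bit, 0.0625 C steps */
};

/* Hypothetical picker: the highest resolution (9..12 bits) whose conversion
   completes within the reporting period; falls back to 9 bits if none does. */
int pick_resolution_bits(uint32_t period_us)
{
    int bits = 9;
    for (int i = 0; i < 4; i++) {
        if (conversion_us[i] <= period_us)
            bits = 9 + i;
    }
    return bits;
}
```

At 10 reports per second (a 100 ms period) only the 9-bit setting fits, which matches the choice described above for fast reporting.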

No I don’t mind. That’s totally fine. I’ve moved this thread to the emonPi forum.

Very true, ‘0’ is not a good error value for a temp sensor!

Yup, I remember that.

Maybe the learn page: Temperature Sensing Using DS18B20 Digital Sensors — OpenEnergyMonitor 0.0.1 documentation and or as a comment in the sketch.

That sounds good. I would be happy to use this as a convention.

Mmm, I would be tempted to ignore 85 deg C as this is very much within the possible reading range of a DS18B20 and may cause false negatives.

Are DS18B20 sensor errors a big issue? Once up and running, I can’t think of any systems that I’ve been involved in that throw errors. I agree it’s best to handle all errors in the best way we can. The most important thing, I think, is to better handle ‘no sensor connected’ by returning 300, as returning ‘0’ is not good.

I don’t have time this week to work on adding this into the FW. I would gladly accept a PR if anyone fancies.

Agree, over 300 deg C is a niche use case and not worth considering. DS18B20 is max 125 deg C usable.

We got our STM32 up and running today using an emontx shield. Very easy to upload to using PlatformIO or Arduino IDE. This is a topic for a different conversation!

As I’ve stated, it’s fine to ignore this for now, but once the decision is made to use the error code for a higher priority level there’s no going back. There is a valid discussion here and a possibility that the “85” errors can be better filtered in the future. There is nothing to be gained by making a premature decision to rule it out.

We should simply use

#define UNUSED_TEMPERATURE 300
// this value (300C) is sent if no sensor has ever been detected
#define OUTOFRANGE_TEMPERATURE 302
// this value (302C) is sent if the sensor reports < -55C or > +125C
#define BAD_TEMPERATURE 303
// this value (303C) is sent if no sensor is present or the checksum is bad (corrupted data)
// NOTE: The sensor might report 85C if the temperature is retrieved but the sensor has not been commanded

if defining the “301” code for possible future use with “85” errors is too distracting.

In fact why don’t we just leave ourselves a little more wriggle room and just use even numbers

#define UNUSED_TEMPERATURE 300
// this value (300C) is sent if no sensor has ever been detected
#define OUTOFRANGE_TEMPERATURE 302
// this value (302C) is sent if the sensor reports < -55C or > +125C
#define BAD_TEMPERATURE 304
// this value (304C) is sent if no sensor is present or the checksum is bad (corrupted data)
// NOTE: The sensor might report 85C if the temperature is retrieved but the sensor has not been commanded

That way we have the option to insert any possible future “85” errors as a higher or lower priority to “OUTOFRANGE” errors, Job done!
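As a rough illustration of how a sketch might apply the even-numbered codes above, consider the following. The function name, its parameters and the use of whole degrees are assumptions; how “ever detected” and “read OK” are determined depends on the firmware and library in use.

```c
#include <stdbool.h>

#define UNUSED_TEMPERATURE     300  /* no sensor has ever been detected */
#define OUTOFRANGE_TEMPERATURE 302  /* sensor reports < -55C or > +125C */
#define BAD_TEMPERATURE        304  /* sensor missing or checksum bad */

/* Hypothetical classifier: returns the reading in whole degrees C,
   or one of the codes above. Every path has an explicit return. */
int report_value(bool ever_detected, bool read_ok, int celsius)
{
    if (!ever_detected)
        return UNUSED_TEMPERATURE;
    if (!read_ok)
        return BAD_TEMPERATURE;
    if (celsius < -55 || celsius > 125)
        return OUTOFRANGE_TEMPERATURE;
    return celsius;  /* valid reading, including exactly 85 for now */
}
```

The odd values 301 and 303 stay unallocated, so a future “85” check could return either of them without renumbering the existing codes.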

Exactly, that is the important part, it needs adding to the emonTH and emonPi sketches, plus the derived 3ph sketch and emonLibCM will need updating to at least use “300” for unused sensors.

Having said that, we really should try to reach a decision on the rest soon rather than leaving it free-wheeling indefinitely again.


Sure sounds good.

Decision made? Let’s use the above.


I concur.


Error codes have been added to the DS18B20 learn page

In the ‘Software > Error codes’ section

https://learn.openenergymonitor.org/electricity-monitoring/temperature/DS18B20-temperature-sensing

You may wish to check the links in the “Addressing the sensors” section; they now give an HTTP 404 result.

Paul

PR submitted - Update git url's by Paul-Reed · Pull Request #20 · openenergymonitor/learn · GitHub to correctly update the git URLs as per my last post.

Paul