EmonHub Development

Note to readers: the following discussion was a PM between Paul Burnell @pb66 and me, attempting to work out how we could bring emonhub development efforts together after I split development by creating the emon-pi variant. Our discussion extends over 57 posts and contains a lot of information that may help prospective emonhub developers understand the future direction of emonhub and the design decisions behind it. We’ve decided to make the discussion public as it’s too lengthy to summarise and a summary would miss too much detail.

Hello Paul

Not quite there yet, but I’ve spent a bit of time this evening trying to get to the bottom of the emoncms HTTP interfacer buffer issue, and having studied your experimental branch again I now see where I went wrong, and what your intention was in terms of abstraction in the main emonhub_reporter class.

I’ve started the process of reverting back to your implementation in the experimental branch and so far it’s gone quite well; it’s still missing a couple of things I had added, and I haven’t touched the MQTT interfacer yet.

I’ve committed the changes to a separate development branch here:
https://github.com/openenergymonitor/emonhub/tree/emonhub_buffer

I’ve also been through the other interfacers to make a note of the differences between the experimental branch and the emon-pi variant; a great deal of it is the same, including:

  • serial interfacer
  • socket interfacer
  • packetgen interfacer
  • jee interfacer has a couple of differences in the way the RSSI is handled, plus a couple of extra exception catches.
  • the emoncms reporter is now pretty much the same

I’ve merged emonhub_interfacer with what you had in emonhub_reporter so that all interfacers have these base methods, with the capability to read data in or to report/send/publish data out.
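
Roughly what I mean, as a simplified sketch for illustration only (not the actual code; the buffer handling here is reduced to a plain list and the method names are only indicative):

class EmonHubInterfacer(object):
    """One base class providing both roles: a read method for the input side
    and add/flush methods for the output/report side."""

    def __init__(self, name):
        self.name = name
        self.buffer = []   # sketch only: the branch uses the emonhub_buffer classes

    def read(self):
        # input side: fetch new data from the device/service, return a Cargo
        return None

    def add(self, cargo):
        # output side: format the cargo and append it to the buffer
        self.buffer.append(cargo)

    def flush(self):
        # output side: take a batch from the buffer and send/publish it
        pass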

The big changes from the experimental branch are the:

  • merging of interfacers and reporters, including the base classes.
  • the separation of the resulting interfacers into separate files.
  • a slightly different message queue implementation, although I think it is not far off your original implementation.

I’ll hopefully get some more time to continue this soon, and I hope it’s a bit more familiar to you now.

A little more work on this, I’ve now adapted the MQTT interfacer to use the emonhub_interfacer class including the emonhub_buffer.

The MQTT interfacer requires the full cargo object, while the HTTP interfacer used the cut-down emoncms format without names etc. Rather than fill the databuffer with the emoncms format, I have modified it to buffer the full cargo object, allowing the interfacer to then choose the output format and whether to use the names etc. stored in the full cargo.

Here’s the latest commit
https://github.com/openenergymonitor/emonhub/commit/a1328be802c4df47cf92e8bb6c5f2ffba2c4c654

I appreciate it’s going to be hard following what I’m doing; I know how I like to understand every change line by line. I’m wondering how I could make it easier for you. One idea is that I try to break the emon-pi variant changes down into pull requests against your experimental branch and we work through it that way? I can give it a go and see if I can manage it.

This is the opposite of the intended operation. In the case of the http interfacer, the data should always be parsed into the final payload format before buffering; this way the parsing is only done once and the data is stored in a ready-to-send format, rather than the parsing happening on each and every send/resend attempt. If it takes 8640 resend attempts at 10s intervals because the network is down for a full day, that is 8640 times the data will get parsed for sending (unless there is another intermediary buffer), and when the network does come up, the whole day’s data (eg 4 emonTxs = 34560 payloads) needs parsing over a short period to “catch up”. With the original method the parsing is done once only and that workload is spread thinly over a longer period; when the network comes up there’s no parsing, just 139 requests of 250 payloads each. If the network outage happened to be emoncms.org down for a day, not all of those 250-payload requests will get through first time, as there could be long connection times and even timeouts; that already blocks the http interfacer without giving it the extra workload of parsing every payload on send.

Bear in mind the actual size of these buffers in memory is relatively small. I have many installs that produce and buffer (original emonhub) 100 bytes of data every 5 secs and can buffer for 7-10 days no problem. 1.7MB of data is not huge, but processing 691200 frames of data into 2770 requests and delivering them to a busy server after a 10 day outage is a huge task (an unlikely scenario I know, but possible and entirely doable in original emonhub).

As for the MQTT implementation, this is one of the many reasons an MQTT bulk upload is desirable; that way it can work the same way as the http interfacer above when being used in a “reporter” type way. An alternative MQTT interfacer would provide local, realtime, one-value-per-topic MQTT data on a more friendly and flexible topic tree. The former would have a buffer and QoS level 2 to ensure the historic data is never lost, and the latter a low QoS with no extended buffering as the data is realtime only; devices can subscribe to the data stream as and when they want, like an MQTT “broadcast” rather than the one-on-one confirmed delivery of the bulk emonhub-to-emoncms MQTT messages.
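
To make the two roles concrete, it could look something like this in emonhub.conf. This is purely illustrative: the mode/qos/basetopic settings shown here are hypothetical, not existing emonhub settings.

[[MQTT_bulk]]
    Type = EmonHubMqttInterfacer
    [[[runtimesettings]]]
        subchannels = ToEmonCMS,
        # buffered "reporter" style, confirmed delivery of historic data
        mode = bulk          # hypothetical setting
        qos = 2

[[MQTT_realtime]]
    Type = EmonHubMqttInterfacer
    [[[runtimesettings]]]
        subchannels = ToEmonCMS,
        # realtime one-value-per-topic broadcast, no extended buffering
        mode = realtime      # hypothetical setting
        qos = 0
        basetopic = emon/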

Now that the emoncms input methods are available to both the MQTT and the HTTP inputs, this should be easier to implement perhaps?

The use of 2 MQTT interfacers in the above example also provides a way for other interfacers to post data to one MQTT interfacer or the other, eg a new interfacer that polls emoncms for processed feed data such as “total energy consumed today” and “total PV generated today”; this data could be published via the “local realtime MQTT” but not looped back to emoncms via the “bulk upload MQTT”.

The above 2 “different” methods of delivering data over any one protocol are the core reason I believe both “interfacers” and “reporters” are needed. I know it was me that first suggested combining them, but it did not take long to realise this is both confusing to users and devs, and difficult to develop. The predominant difference is the buffering, which in turn brings differences like confirmation of delivery, efficient packaging etc. Both your http and mqtt implementations in emonhub are “interfacers”, and IMO adding the buffering doesn’t work very well; there need to be different “types” or “levels” of interfacer, or “reporters”.

Keeping memory usage and network bandwidth low is the reason I prefer the original bulk upload that used indexed CSV. I’m not opposed to data being sent as key:value paired JSON, but I think, much like the MQTT examples above, the key:value JSON requests should be available in an http “interfacer” and not replace the original indexed CSV bulk format of an http “reporter”.

Perhaps it would be possible to combine the different methods into one more complex interfacer so that a “bulk” mode setting could define the operation: buffered or not, confirmed or not, indexed CSV or key:value payloads etc. It’s possible, but I think it would be confusing that changing a setting might totally change the payload and behaviour.

I’m just thinking out loud at this point, so please do not start rewriting emonhub based on a couple of suggestions posed for discussion; I am not entirely sure of the best route yet.

But if we revamp the way the interfacers are used, this would give an opportunity to revise the naming. For example, the MQTT interfacer must be a generic MQTT interfacer and not emoncms specific; the generic MQTT interfacer should be available for subclassing to other MQTT services rather than reinventing the wheel if we want to send to a different target or, as in the post above, in different formats.

The possibility of renaming some of the interfacers might also offer a solution to 2 other issues. Firstly, the “emonhubemoncmshttpinterfacer” type names are getting a bit long even in camelCase; perhaps we should entertain a wider naming strategy so that emoncms-specific interfacers drop the leading “emonhub” and emonhub core/generic interfacers retain the emonhub prefix. The second issue is the way ALL interfacers are in separate files; IMO this is messy and it blocks the intended development of drop-in interfacers and a recommended development route and naming strategy.

My intention was that all generic/core interfacers are in one file, as that is the only way we can ensure that the core classes are available to all interfacers once we encourage users to edit and manage the interfacer files/folder. I wanted emonhub****interfacer to be a reserved name range that would be included in all git pushes, so a user can hold private interfacers under another name, eg myPrivate****interfacer, without fear of publishing them when contributing to emonhub development; if the user chooses to contribute one, then simply refactoring the name would result in its inclusion in a PR. emoncms****interfacer could be a reserved name for the interfacers included in the default emonhub repo/install as separate files rather than in the single core/generic interfacers file. This is also where the more application-specific emonhub**** interfacers would reside, eg emonhubwundergroundinterfacer and emonhubhiveinterfacer etc.

You are now more familiar with socket interfacers and, as I have said previously, these play a big part in dev’ing for emonhub. The official route would be for users to be able to “play” with a simple Python script to interface with their target device/project without any effect on emonhub. Once they are able to get a simple Python script working in a basic form, they can add a couple of lines to that script to post data via an existing emonhub socket interfacer (see the sketch below). Once they are familiar with the Python code needed to achieve their goal, they can edit a copy of the provided exampletemplateinterfacer to construct a properly structured emonhub interfacer using the experimental Python code they have written in the script, and once happy/discussed/fine-tuned they can create a PR just by renaming and pushing to the emonhub repo. This is an easy dev path to support, it ensures PRs are of a consistent quality and it makes it easier for users to get involved.
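
For anyone reading this later, that “couple of lines” is roughly the following. This is a minimal sketch assuming an EmonHubSocketInterfacer is defined in emonhub.conf listening on port 50011, and that it accepts newline-terminated frames of space-separated values with the node id first; adjust the port and frame to suit your config.

import socket

# frame format: node id followed by the readings, newline terminated
frame = "10 100 200 300 245.4\r\n"

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(("127.0.0.1", 50011))
s.sendall(frame.encode())
s.close()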

I’m not sure that I see the problem converting a 250-item chunk of the buffer every 30s as it works through 691200 frames. The conversion of the buffer into a JSON string is surely easy work for the Pi?
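
Taking “conversion” to mean building the bulk request string from the next chunk of buffered frames, the per-send work is roughly the following (a sketch only, assuming the frames have already been reduced to [timestamp, nodeid, values...] lists as in the databuffer format shown further down):

import json

# eg a few buffered frames in [timestamp, nodeid, values...] form
databuffer = [[1399980731, 10, 100, 200, 300, 245.4],
              [1399980741, 10, 101, 201, 301, 245.3],
              [1399980751, 10, 102, 202, 302, 245.2]]

# build one bulk request string from the next chunk of up to 250 frames
chunk = databuffer[:250]
payload = "data=" + json.dumps(chunk, separators=(",", ":"))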

So the question for me seems to be about memory: can the Pi or other platforms handle, say, 30 days of 10s cargo objects, and how much memory does that take? Maybe I need to put a test script together to better understand the memory requirements?

I need to think about this one a bit more, to try and get my head around it.

sure

yes sure

If a user modifies or deletes an interfacer from the interfacers folder that is inherited by another, then isn’t it the user’s fault for doing this? A user could just as easily decide to go into emonhub_interfacers and modify something there, removing a class’s availability?

The socket interfacer is great, I can see that being used as the main way for users to interface their own scripts with emonhub without creating a dedicated interfacer as you say to start with, great!

That would take over 24hrs (excluding any new data for that 24hrs) to catch up; I was thinking more like 500 or even 1000 every 5 (or 10) secs.

I’ve managed to work out a slightly cleaner module loader that avoids the need for a load of hardcoded entries in emonhub.py. It’s a work in progress; it would be good to add some error catching for where modules have not been defined in __init__.py.

https://github.com/openenergymonitor/emonhub/blob/emonhub_buffer/src/emonhub.py#L30
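
In outline, the loader builds the interfacer class from the “Type” setting at runtime instead of needing a hardcoded import per interfacer. The following is a simplified sketch of the idea, not line-for-line what is in the linked code:

import importlib

def load_interfacer(itype):
    # import interfacers.<Type> and return the class of the same name;
    # the real loader also needs error handling for modules that have
    # not been defined in interfacers/__init__.py
    module = importlib.import_module("interfacers." + itype)
    return getattr(module, itype)

# 'Type = EmonHubSerialInterfacer' in emonhub.conf would resolve to:
InterfacerClass = load_interfacer("EmonHubSerialInterfacer")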

A bit of test code for the memory question:

import Cargo
from guppy import hpy

h = hpy()
print h.heap()

buf = []

for i in range(0,691200):
    c = Cargo.new_cargo(realdata=[100,200,300,245.4])
    c.names = ["power1","power2","power3","vrms"]
    buf.append(c)

print h.heap()

result:

$ python testp.py 
Partition of a set of 25391 objects. Total size = 3271160 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0  11077  44   855624  26    855624  26 str
     1   5906  23   476504  15   1332128  41 tuple
     2    325   1   283384   9   1615512  49 dict (no owner)
     3     72   0   218304   7   1833816  56 dict of module
     4    201   1   212952   7   2046768  63 dict of type
     5   1636   6   209408   6   2256176  69 types.CodeType
     6   1600   6   192000   6   2448176  75 function
     7    201   1   178816   5   2626992  80 type
     8    124   0   135328   4   2762320  84 dict of class
     9   1045   4    83600   3   2845920  87 __builtin__.wrapper_descriptor
<90 more rows. Type e.g. '_.more' to view.>
Partition of a set of 4863666 objects. Total size = 1192371760 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0 691200  14 724377600  61 724377600  61 dict of Cargo.EmonHubCargo
     1 691531  14 193821064  16 918198664  77 dict (no owner)
     2 1382582  28 193799728  16 1111998392  93 list
     3 691200  14 44236800   4 1156235192  97 Cargo.EmonHubCargo
     4 691371  14 16592904   1 1172828096  98 int
     5 691206  14 16588944   1 1189417040 100 float
     6  11079   0   855752   0 1190272792 100 str
     7   5905   0   476440   0 1190749232 100 tuple
     8     72   0   218304   0 1190967536 100 dict of module
     9    201   0   212952   0 1191180488 100 dict of type
<92 more rows. Type e.g. '_.more' to view.>

I think that’s 1.1GB for 691200 frames.

or for the array buffer 95MB:

# import Cargo
from guppy import hpy

h = hpy()
print h.heap()

buf = []

for i in range(0,691200):
    buf.append([100,200,300,245.4])

print h.heap()

Only storing the names and values results in 369MB, which could be an option to reduce memory use:

# import Cargo
from guppy import hpy

h = hpy()
print h.heap()

buf = []

for i in range(0,691200):
    
    cargo = {}
    cargo['names'] = ["power1","power2","power3","vrms"]
    cargo['values'] = [100,200,300,245.4]
    
    buf.append(cargo)

print h.heap()

I’m struggling to understand this. I don’t see why there is any need to change what has worked very well in the original version for over 4 years, nor do I understand how this is “allowing the interfacer to then choose the output format” to any greater degree than the original code.

The databuffer is part of the interfacer. Originally it formats the payload before buffering, and your proposal just moves the formatting to after the databuffer; that doesn’t in any way affect how, or in what format, the data is ultimately sent. It changes 2 things. Firstly, it moves the formatting process in time, from a steady regular pace prior to buffering to after the buffering, when activity is already increased due to the increased sending of requests as the network becomes available. Secondly, it changes the format of the data whilst held in the buffer; going by your own tests this would at best effectively double the amount of RAM required to hold the same data compared to the existing method.

This seems very inefficient on both counts. Yes, a Pi 3 could probably get by, but why should it need to? I see no positives in these changes, no good reason to justify the extra RAM used to do the same job, nor the bottleneck it needlessly creates. If we are to promote the emonPi and/or emonhub to do more IoT and monitoring, we need to focus on conserving resources so that there is room for other projects. And what about other models of Pi? Are we suggesting users should not use older Pi A’s and B’s etc? That isn’t in keeping with the environmentally friendly aims of the project if we force users to abandon perfectly good hardware just so emonhub can do the same job inefficiently.

The Pi 3 might currently be the OEM favourite, but I see no reason why an emonSD image cannot run just as well on a Pi Zero W (more on this in another thread soon), although with a slower processor and half the RAM we cannot afford to be so frivolous with resources.

How each interfacer formats and buffers the data is essentially up to the interfacer, so there doesn’t need to be a common format, but IMO it’s good practice. However, the Cargo object is designed solely for the transportation of the data; in time more and more will be added to the cargo as more fields come on board and the data gets better refined. We already see an “rssi” field because of the RFM interfacer, I can see a possible need for a QoS field at some point, the target field will become more important as 2-way comms get utilised more, and perhaps authentication or encryption fields will be needed down the line. The point being, the Cargo object is ONLY for passing the data between interfacers, and its format must fit the minimum requirements of any and all interfacers, so it could grow in size; hence it is emptied as quickly as possible.
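
For context, the cargo is little more than a per-frame container of fields, something like the following (a trimmed-down sketch with the field list abbreviated, not the full class):

class EmonHubCargo(object):
    # a plain container passed between interfacers via the pub/sub channels,
    # emptied into each interfacer's own buffer format as quickly as possible
    timestamp = 0.0
    nodeid = 0
    names = []       # input names, eg ["power1", "power2", "power3", "vrms"]
    realdata = []    # decoded values
    rssi = 0         # added by the RFM/Jee interfacer
    target = 0       # for outgoing (2-way) frames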

By only buffering part of the cargo, you are no longer buffering “the cargo”, so I see no reason for it to resemble the discarded cargo rather than being formatted ready to send.

I also have several PoC methods of persisting the buffers, from maintaining a persisted copy of the whole buffer to chopping the contents up into smaller chunks, eg the http interfacer buffer could store many small files each containing 250, 500 or 1000 frames, and when the network comes back up it sends one persisted file at a time, already formatted to go of course. This is easier to manage and can work in tandem with the RAM buffer, and even beyond the defined RAM limit.
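
As a very rough illustration of the chunked persistence idea (the file location and naming here are made up for the example; this is not the PoC code):

import os
import json

PERSIST_DIR = "/home/pi/data/emonhub-buffer"    # hypothetical location

def persist_chunk(frames):
    # write one ready-to-send chunk (eg 250 frames) to its own small file,
    # named by the timestamp of its first frame so chunks replay in order
    fname = os.path.join(PERSIST_DIR, str(int(frames[0][0])) + ".json")
    with open(fname, "w") as f:
        json.dump(frames, f)

def oldest_chunk():
    # when the network comes back, send one persisted file at a time,
    # already formatted to go, deleting each file once delivery is confirmed
    files = sorted(os.listdir(PERSIST_DIR))
    if not files:
        return None
    path = os.path.join(PERSIST_DIR, files[0])
    with open(path) as f:
        return path, json.load(f)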

There were also alternative buffers considered from day one, eg mysql; the original first commit of emonhub had a mysql-based buffer as an alternative to RAM, but since the original release was intended as a read-only (RO) replacement for the Rock Solid Gateway sdcard, it was removed, to be re-introduced at a later date, as we didn’t want emonhub to force the installation of mysql (or any other db), both for simplicity and to reduce card wear. That too could be reintroduced (I still have the code we removed) for setups with a mysql db installed (eg emonSD); that buffer method also stores the frame as a text string.

The obvious benefit of persisting a text string is that debugging becomes a whole lot easier, and if things do go wrong the data that’s orphaned in a buffer file can be easily recovered and reintroduced. Plus, if the cargo-type formats use double the RAM, I think it’s safe to assume they will also use double the disc space to persist.

In addition to the above, I have previously tried to sell an idea on the forum (a discussion with Chaveiro) where the outgoing data could be persisted to disk as an “all time record” of data going to emoncms; this would allow new accounts/installs to be recreated with all historical data. I described this as an option to “replay time” at a much faster speed: new processing could be set up, the latest and greatest feed engines used. At the time it didn’t gather much interest, but now the IoTaWatt effectively does this and it’s considered a valuable feature.

If kept in text-string CSV format, it allows the buffers/records to be edited, sorted, searched or even manufactured. I suggested to Glyn, way back when he appealed for a method to test emoncms, that we could just introduce a year or 2 of “backlog” to emonhub and use the existing controls for batch size and send interval to adjust the assault on the emoncms instance being tested. Not to mention the value of being able to open the buffer’s CSV in a spreadsheet, easily locate and correct rogue entries that have contaminated emoncms feeds, and resend the data.

I had already 99% resolved this 2.5 years ago, hence my desire to revert the breaking out of all interfacers into separate files so I can resume my development in this area. My solution resided in the __init__.py file, not emonhub.py, so the original code in emonhub.py worked without change.

Your reasoning for wanting the interfacers in individual files, to make readability easier for you, is equally matched by my wish to keep all the “included” interfacers in one file as, personally, I find that more organised and easier to follow. To that I add a very good reason why this would help both the way emonhub operates and the way it is developed for, and on top of that I also offer a compromise in the posts above where we only keep the “core” interfacers in one file and further categorise and manage the separate interfacer files. You have not yet really offered any substantial reasoning or any compromise in this area.

I have also offered a good reason as to why there is a need for both interfacers and reporters, but I have also offered the compromise of, and good reasoning for, alternative “reporter like” interfacers for buffered/bulk upload instead. If the classes and inheritance are used correctly and efficiently, it would make implementing and maintaining 2 forms of interfacer for each of the http and mqtt routes much easier than trying to make one form do both jobs: realtime/verbose (value per topic or key:value http) and buffered/bulk (compact and fast, timestamped and indexed CSV, confirmed delivery or QoS 2 etc). I see no real reason to change the format the data is buffered in at this time, and I hope you can see the benefits of a dual approach to get the best of both scenarios, rather than getting sub-standard performance in both scenarios from one interfacer that isn’t best suited to either method.

If we are to re-unite the emonhub versions we really need to focus on finding the middle ground and not aim for a place even further beyond where you have already taken the emon-pi variant.

I had put a lot of time into developing the experimental version of emonhub, and much of this stuff I had already toiled over for many months when you took emonhub off in another direction, and I have constantly reviewed my thoughts and decisions for well over 2 years since then. I really do not want to dampen your enthusiasm, but much of what you are doing is akin to “redesigning the wheel”; can we please try to bring the versions together and then fix any remaining issues before trying to redesign it?

Can we also consider moving all these discussions to the public forum? I do value our one-on-one discussions, but it’s a lot of writing and a lot of information that would be better off with more participants. IMO it should be in a “dev’s corner” forum category, but I’m unsure that will ever happen, so it will have to be the public forum (or the staff category?).

The outcome I need is the ability to post or publish data from emonhub including a node name and input names. For MQTT that might mean a topic name of the form emon/emontx/power1 with value 100. I can’t see how the experimental version could support that? I realise the experimental branch does not support names per se, and this was one of my additions; perhaps we need to discuss that point in more detail?

Sure, there’s a memory overhead here and I’m open to alternatives, eg my proposal:

for i in range(0,691200):
    
    cargo = {}
    cargo['names'] = ["power1","power2","power3","vrms"]
    cargo['values'] = [100,200,300,245.4]
    
    buf.append(cargo)

Yes, it’s still about 4x the memory use of the values-only approach.

The other idea I had was that the _process_post method could access the ehc object and fetch the names at that point e.g:

names = ehc.nodelist[node]['rx']['names']

The main problem with that approach, as far as I can see, is that if you change the node names in the config while there are large amounts of buffered data, the names may no longer correspond to the values read in at an earlier time.

I notice there is a difference between your experimental branch here where it adds items to the buffer
https://github.com/emonhub/emonhub/blob/experimental/src/emonhub_reporter.py#L133

and the development branch here:
https://github.com/emonhub/emonhub/blob/development/src/emonhub_reporter.py#L124

but it looks like the development branch reporter still expects the buffer data in an array format judging by the log code here:

self._log.debug(str(data[-1]) + " Append to '" + self.name +
                    "' buffer => time: " + str(data[0])
                    + ", data: " + str(data[1:-1])
                    # TODO "ref" temporarily left on end of data string for info
                    + ", ref: " + str(data[-1]))
# TODO "ref" removed from end of data string here so not sent to emoncms
data = data[:-1]

Going back a step, is your proposal that we change the inherited add() method?

In the experimental branch it is:

def add(self, cargo):
    """Append data to buffer.
    data (list): node and values (eg: '[node,val1,val2,...]')
    """

    # Create a frame of data in "emonCMS format"
    f = []
    f.append(cargo.timestamp)
    f.append(cargo.nodeid)
    for i in cargo.realdata:
        f.append(i)
    if cargo.rssi:
        f.append(cargo.rssi)

    self._log.debug(str(cargo.uri) + " adding frame to buffer => "+ str(f))

    # self._log.debug(str(carg.ref) + " added to buffer =>"
    #                 + " time: " + str(carg.timestamp)
    #                 + ", node: " + str(carg.node)
    #                 + ", data: " + str(carg.data))

    # databuffer is of format:
    # [[timestamp, nodeid, datavalues][timestamp, nodeid, datavalues]]
    # [[1399980731, 10, 150, 3450 ...]]
    self.buffer.storeItem(f)

whereas in the emonhub_buffer branch it is currently:

def add(self, cargo):
    self.buffer.storeItem(cargo)

which could by default be as you implemented above, but then be overridden to another format on an interfacer-by-interfacer basis?

def add(self, cargo):
    data = {}
    data['nodeid'] = cargo.nodeid
    data['names'] = cargo.names
    data['values'] = cargo.realvalues
    self.buffer.storeItem(data)

OK, I’m pretty sure I get it. I’ve reverted the buffer format to be set in the add(self, cargo) method; it avoids the memory issue, especially for the bulk format data, and keeps emonhub_interfacer and the http emoncms interfacer closer to the experimental branch state. I will update my git comparison on the emonhub repo.

Edit: git comparison updated, but I’m not sure how useful it is given that the interfacers and reporters are merged.

To me, putting all the interfacers in a single file is a bit like putting all the emoncms models in one file: input, feed, users etc. It makes for a long and hard-to-read file. I did reply to your point about ensuring all core classes are inheritable; I’m not convinced that keeping them all in one file helps that much. That said, in the interest of moving forward, I would accept the compromise position you outline in the short term in order to merge the versions.

I’m still trying to get my head around this, but I think I understand at least part of your suggestion. I’ve been thinking through the scenario of an interfacer that has both the code to read and the code to post. I created a template interfacer to try and explore this (as well as what a guide on building a new interfacer would look like): https://github.com/openenergymonitor/emonhub/blob/emonhub_buffer/src/interfacers/EmonHubTemplateInterfacer.py

I realise that a single interfacer defined in emonhub.conf doing both things would not work, as the reading would block the posting of data since they are on the same thread, so I ended up creating two instances of the interfacer in emonhub.conf:

[[TemplateRead]]
    Type = EmonHubTemplateInterfacer
    [[[init_settings]]]
    [[[runtimesettings]]]
        pubchannels = ToEmonCMS,
        
[[TemplateSend]]
    Type = EmonHubTemplateInterfacer
    [[[init_settings]]]
    [[[runtimesettings]]]
        subchannels = ToEmonCMS,

If there are no pubchannels defined, the interfacer skips calling the read method altogether, so only the TemplateRead instance reads. I can see that there is something not quite right about this, as a user could initiate this interfacer in both input and output mode and have a whole load of blocking issues. The interfacer class code then has a whole load of redundant code when used as an output interfacer ‘reporter’, or the other way around as an input interfacer…

This is exactly what I’m trying to do; I’ve reverted a whole load of code back to the experimental branch state over the last few days. The question of splitting interfacers into separate files, and of interfacers vs reporters, is key to ultimately working out the right merge point.