Could emonESP also support CSV and post to a local emonHub?

pb66 · 3 September 2016 10:42

Can I make a suggestion about the interaction between the emonTx sketches and the emonESP?

Rather than a different payload as you are trialling now, could we look to standardize the way the data leaves the AVR as a string of values that is suitable for RFM, serial-direct and emonESP alike? That way we do not have to revise every sketch, and we do not narrow the use of this great development to those who want to post direct to emoncms via single API calls or those using MQTT.

The emonTx sketch could very easily during setup, pass a string of names to the emonESP that can be used to split the stream of data that follows off into pairs in the way you are currently focused on. That “name string” would also be very useful during serial debugging too, much like the existing sketch version etc that we currently see from an emonTX etc.
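A minimal sketch of that pairing step, written in Python for brevity (on the emonESP itself it would be Arduino C++). The function name `pair_names` and the sample names are made up for illustration; nothing here is existing emonESP code.

```python
def pair_names(name_string, value_string, sep=","):
    """Pair a one-off name string with a later string of values."""
    names = [n.strip() for n in name_string.split(sep)]
    values = [float(v) for v in value_string.split(sep)]
    if len(names) != len(values):
        # A mismatch most likely means the name string is stale or was missed.
        raise ValueError("expected %d values, got %d" % (len(names), len(values)))
    return list(zip(names, values))

# Names arrive once during setup, values arrive with every reading.
names = "ct1,ct2,vrms"
print(pair_names(names, "120,345,241.5"))
```

The same value string can still be forwarded untouched to anything that prefers indexed data, since the names travel separately.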

This would allow those of us that want (or sometimes have no choice but) to post to emonhub locally via socket to get on board too.

With the current arrangement each device will have to have its own reliable network connection. What happens to data produced during a network/server outage? Is it just lost? I would expect so. If a small buffer were to be introduced, the “string of numbers” is easier and more compact to buffer.

There is also the issue that keeps cropping up (not related directly to emonESP) about the processing of data at emoncms: when individual values are sent, eg via MQTT, or when JSON pairs are sent to emoncms, you have little or no control over the order that they are processed in, and that causes irregularities. In contrast, sending a single CSV string ensures the processing is done in the expected order and the results are reliable.

The “power1pluspower2” in the emonpi payload is an example of the only way of getting around it. Are we going to start putting all our processing in the sketches just so we can have the fixed variable names repeated in every (unseen by humans) payload?

Once the emonESP has a string of names, plus a stream of “stringed values” (maybe CSV rather than space separated is the way to go?) the handling could be really flexible. Everything you have now would still be possible PLUS users like me can post locally to emonHub via a socket (by adding an emonhub.ino?), and move towards using MQTT in the near future BUT by passing a single payload of values to emonhub and emonhub then doing its intended job to pass that intact string to emoncms(s) for accurate and predictable processing AND also (if desired, which there is strong evidence to suggest there is) break out the values to individual MQTT topics on the LAN for other purposes eg openHAB. Merging these 2 tasks into one is the bottleneck of the current OEM ecosystem.

Also what if a user wants to send to other locations, not just emoncms(s)? Yes, you could put “local” and “remote” emoncms settings but even that is very restrictive.

My suggestion isn’t asking those of you that embrace posting via named value pairs or individual topics to change your habits, or convince you that any particular way is the better way, just asking you to not exclude those of us that have a different opinion and include some scope for exploring other avenues.

jeremypoulter · 4 September 2016 14:07

The problem with that is if the EmonESP (or whatever the other end is) starts up after the EmonTX it will miss the name string.

I am not sure there is any reason why you could not use the same setup, it would just mean that the emonhub service would need to be updated to read the JSON-like output of the EmonTX.

The EmonESP posts all the values (to EmonCMS) in one request so I would assume this is no different than passing as CSV. If not, I would think it should not be too hard to ‘fix’ to ensure processing in the order passed. That being said, I have not looked at the EmonCMS code.

MQTT is a different case, I am not sure there is a way to post multiple values at the same time, EmonESP certainly posts one value at a time, but the order is very consistent and in the order they are in the string sent from the EmonTX.

You have however highlighted a potential issue in the processing of my inputs that I will need to look at.

While I do agree with you that there does need to be some consistency, I personally would like to see the move to something like (proper) JSON rather than the OEM specific strings that all the current transports use. This would make it extremely easy to, say, read the EmonTX output directly in to Node-RED for example.

pb66 · 4 September 2016 15:46

Fair comment, but not a show stopper, it could be repeated every so often or it could be asked for, I am talking about the output from the sketch here (not any input/api etc on the esp) so it could even be written to the emonESP’s eeprom during any updates to the firmware.

Once the order of the variables is scrambled by using JSON, they cannot be guaranteed to be put back in the same order. The ONLY way to have reliable control over the processing order in emoncms is to process the inputs in a known order, unless we sort the JSON before processing, alphabetically for example. I am referring to JSON and not just a JSON-like string, a string won’t change its order, but JSON pairs are just that, they are unordered pairs not an ordered list, there are no guarantees “power1:10, power2:20, power3:30” won’t end up as “power3:30, power2:20, power1:10” and if your “sum of all powers” is in power3’s processing then you will be adding the old power2 and old power1 to the new power3.
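The failure mode described above can be modelled in a few lines of Python. This is only an illustration of the concern, assuming each input is stored and processed one at a time as it is encountered; it is not emoncms code, and `process_interleaved` and the “sum” process are hypothetical.

```python
def process_interleaved(stored, frame, order):
    """Store and process one input at a time, in the given order.

    The hypothetical process attached to power3 sums all three stored
    inputs at the moment power3 is handled.
    """
    stored = dict(stored)  # work on a copy of the previous readings
    total = None
    for name in order:
        stored[name] = frame[name]
        if name == "power3":
            total = stored["power1"] + stored["power2"] + stored["power3"]
    return total

old = {"power1": 10, "power2": 20, "power3": 30}
new = {"power1": 11, "power2": 21, "power3": 31}

in_order = process_interleaved(old, new, ["power1", "power2", "power3"])
scrambled = process_interleaved(old, new, ["power3", "power2", "power1"])
print(in_order, scrambled)  # 63 61
```

With power3 handled first, the sum mixes the new power3 with the previous power1 and power2, which is exactly the kind of error that goes unnoticed.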

The CSV/JSON argument has been going on for years. The additional issue with JSON is there is no bulk upload either. When a network or server has been down for, let’s say, a day, just a single emonTx at 10s intervals will have created 8640 payloads, which can be uploaded via 35 bulk uploads in well under a minute without straining the server, whereas the same would cost 8640 individual requests with JSON or, assuming a 10 variable payload, 86400 MQTT publishes.
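The arithmetic behind those figures, as a quick back-of-envelope check in Python. The 35 uploads and 10 variables come from the post; the batch size per upload is simply what falls out of them.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60
interval_s = 10       # one payload every 10 seconds
variables = 10        # values per payload
bulk_uploads = 35     # figure quoted in the post

payloads = SECONDS_PER_DAY // interval_s            # payloads missed in a day
per_bulk = math.ceil(payloads / bulk_uploads)       # payloads per bulk upload
mqtt_publishes = payloads * variables               # one publish per value

print(payloads, per_bulk, mqtt_publishes)  # 8640 247 86400
```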

I too like the idea of named inputs and data being easily used by other devices/software, but I am not willing to trade it for reliable and accurate data, which is why I regularly offer a variety of solutions such as splitting the delivery of data to emoncms and the broadcast of data for other uses.

All this single value focus, eg JSON and MQTT, is fantastic for sharing a simple value, but as soon as you try doing any math with it you have no idea what you are likely to end up with. Sure, a lot of the time it’s right and often when it’s wrong it goes un-noticed (which IMO is an even bigger issue). For many users that just want some numbers to display or an approximation, fine. But if your application wants accurate data, timestamping is critical and so is using the correct data over “whatever came last”.

The MQTT packet can very easily be a string of values. An MQTT payload can be pretty much anything. That payload can then easily be broken out into individual topics anywhere along its route, eg emonhub.

Having now tried the emonESP, I have experimented with CSV too.

I have found it is passed ok and the usual indexes are given by emoncms as expected.

It looks a bit odd in the emonESP webserver page though, that’s due to the splitting by colon leaving the value in “namevalue[0]” not “namevalue[1]” as expected in config.js (using the loop index “z” could fix that though).

So I imagine the ability to post that CSV to a local emonhub socket is going to be fairly easy to add, and even a switch for publishing as a single topic or not could be based on whether it has a colon in the string.
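A minimal sketch of that colon-based switch, in Python for illustration only. The function name `mqtt_messages` and the base topic `emon/emonesp` are assumptions, and no MQTT library is involved: the function just returns the (topic, message) pairs that would be published.

```python
def mqtt_messages(payload, base_topic="emon/emonesp"):
    """Decide how to publish a payload based on whether it contains a colon.

    No colon: treat it as indexed CSV and publish the whole string intact.
    Colon: treat it as name:value pairs and publish one topic per name.
    """
    if ":" not in payload:
        return [(base_topic, payload)]
    messages = []
    for pair in payload.split(","):
        name, value = pair.split(":", 1)
        messages.append((base_topic + "/" + name.strip(), value.strip()))
    return messages

print(mqtt_messages("10,20,30"))
print(mqtt_messages("ct1:10,ct2:20"))
```

Keeping the CSV case as one intact message leaves the decision about breaking it out into topics to whatever sits downstream, eg emonhub.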

jeremypoulter · 4 September 2016 21:08

Kind of, certainly given the example of;

{"power1":10, "power2":20, "power3":30}

I would agree you should not assume parsing in any given order (unless you specifically write your JSON parser to process in the order passed) but that is not the only way you could express the data.

If the order is important then the values could be expressed as an array of objects;

[{"power1":10},{"power2":20},{"power3":30}]

or even a bit larger;

[{"name":"power1","value":10},{"name":"power2","value":20},{"name":"power3","value":30}]

and for cases where the name is not known (eg emonPi values received from RFM) you can revert to just an array of numbers;

[10,20,30]
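As an illustration that all three array forms keep their order, here is a small Python sketch using the standard `json` module. The helper name `ordered_pairs` is made up, and falling back to a 1-based index for bare numbers is an assumption borrowed from how emoncms labels CSV inputs.

```python
import json

def ordered_pairs(text):
    """Return (name, value) pairs in document order for any of the array forms."""
    pairs = []
    for position, item in enumerate(json.loads(text), start=1):
        if isinstance(item, dict) and "name" in item and "value" in item:
            pairs.append((item["name"], item["value"]))
        elif isinstance(item, dict):
            # Single-key object such as {"power1": 10}
            name, value = next(iter(item.items()))
            pairs.append((name, value))
        else:
            # Bare number: no name known, so use its 1-based position
            pairs.append((position, item))
    return pairs

print(ordered_pairs('[{"power1":10},{"power2":20},{"power3":30}]'))
print(ordered_pairs('[{"name":"power1","value":10},{"name":"power2","value":20}]'))
print(ordered_pairs('[10,20,30]'))
```

A JSON array is an ordered sequence, so the order here comes from the format itself rather than from any particular parser's behaviour.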

Yet ;-), there is nothing inherent in JSON that prevents this eg;

[
  {
    "date":"2016-09-04 21:09:00+0000",
    "data":[
      {"power1":10},{"power2":20},{"power3":30}
    ]
  },
  {
    "date":"2016-09-04 21:09:10+0000",
    "data":[
      {"power1":10},{"power2":20},{"power3":30}
    ]
  },
  {
    "date":"2016-09-04 21:09:20+0000",
    "data":[
      {"power1":10},{"power2":20},{"power3":30}
    ]
  }
]

All of this can be done using JSON, granted not with the current implementation, but I would certainly be willing to help design/implement a solution that can do all of this.

Wow, I am slightly worried that this worked as that is not even close to valid JSON but I think as you are still using the JSON endpoint you will still have the same issues as with name/values. That being said on reviewing the input module (https://github.com/emoncms/emoncms/blob/master/Modules/input/input_controller.php) there is no difference between the order of parsing a CSV vs JSON input, mainly as the JSON is not really JSON and is not parsed to an object, as far as I can see the ‘JSON’ name:value pairs follow the same code path and will be processed in exactly the same order as if they were passed as CSV.

jeremypoulter · 4 September 2016 21:37

Actually I am going to disagree with myself here. While it is true the JSON and CSV follow the exact same code path, the intermediate PHP array may mess with the order when names are used. It would however be a fairly simple change to ensure processing in the same order as passed in.

pb66 · 5 September 2016 11:50

The fact that the JSON works in emoncms because it isn’t really JSON is not the strongest argument for using JSON and you are not the first to discuss changing the way emoncms works so the fact it works by accident is not something we can rely on remaining “faulty”.

No there isn’t! But it is not currently an option, and size would perhaps still provide valid reason to prefer CSV, as this is the same 3 packets using the current CSV bulk upload:

[[1473023340,node,10,20,30],[1473023350,node,10,20,30],[1473023360,node,10,20,30]]

It also includes the node id, which would need something like "nodename":"node", added to each of the 3 entries in your example. Size might be important if using GSM, which is another area of interest for the OEM project, or if buffering “en-route” as in the solution offered on the “Data not sending after network loss” thread.
At 82 chars versus >400, that is a fifth of the size in terms of data allowance or disc space (or RAM?) used.
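The size comparison can be reproduced with a short Python sketch. It builds three frames ten seconds apart in the CSV bulk form and in a compact named form. The named layout here (integer `date`, a `nodename` key, whitespace stripped) is an assumption for illustration, so it comes out smaller than the pretty-printed example earlier in the thread.

```python
import json

# Three frames, ten seconds apart, for one node with three values.
frames = [(1473023340 + 10 * i, "node", [10, 20, 30]) for i in range(3)]

# Current CSV bulk form: [[time,node,v1,v2,v3],...]
csv_bulk = "[" + ",".join(
    "[%d,%s,%s]" % (t, node, ",".join(str(v) for v in values))
    for t, node, values in frames
) + "]"

# Hypothetical named form, serialised without any whitespace.
named = json.dumps(
    [{"date": t, "nodename": node,
      "data": [{"power%d" % (i + 1): v} for i, v in enumerate(values)]}
     for t, node, values in frames],
    separators=(",", ":"),
)

print(len(csv_bulk), len(named))
```

Even with all whitespace removed the named form is still roughly three times the size of the CSV bulk string, which matters for GSM data allowances and for buffering in RAM.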

By the way a remote “GSM only” location would also be better off posting via wifi AP locally and use one GSM connection to forward data rather than multiple GSM devices. (This too is an area I intend to explore rather than tackling the proxies and firewalls at my install sites).

My intention was not (as I pointed out up front) to try and convert you to my way of thinking, nor was it to promote the use of CSV, not even to highlight or discuss the potential pitfalls of any method. I only wanted to ask that users of CSV and/or local emonhub as a local “hub” not be excluded by developing only for JSON or MQTT in such a way that it cannot be used for a wider purpose.

I have learnt from the dozens and dozens of lengthy discussions on this that there is no winning solution, and that is why I avoid trying to force anyone to change against their will. I simply ask that I/we be given the same consideration. I cannot use the methods currently being implemented, for the reasons given, and ask that this be considered so we can develop a flexible solution rather than force users to make changes that might not work for them.

So far it has been suggested all the sketches need changing to suit emonESP, and now we are discussing changing emonhub to accept JSON and emoncms to parse JSON in a known order, which looks like it will have to include a complete revamp of the emoncms APIs and in turn the way emonESP formats its JSON. Maybe over time all these things can happen, but for now, since the CSV appears to be handled by the emonESP quite well unintentionally already, couldn’t we build on that to give a full complement of choice rather than abandoning methods that work well in favor of those that are not yet perfect?

The predominant feature of ISP over Wifi is to be able to develop, improve and deploy firmware to the devices, the main area of improvement in measuring energy use is accuracy, either through better code or by better calibration, there is little point in fine tuning a continuous sampling sketch if there is no guarantee all data will always arrive (no buffering), or that the data isn’t timestamped at source, but if/when it hits the server, or if that data is then randomly mixed with old and new data during processing.

jeremypoulter · 5 September 2016 13:56

I do not think it has ever been suggested that CSV or any other feature should be dropped from the current firmwares and if I have given that impression I apologise. Backwards compatibility is always important. The current API has to remain as is to avoid breaking all the legacy devices out there that will probably never be updated as they are quite happy sending data to EmonCMS year after year.

To be clear, my suggestion was to add an option (either build time or run time) to enable the output that the EmonESP expects from the default firmwares. This would not be the default and would be something that you would have to build a custom firmware for (if a build option) or that the EmonESP would enable (if run time), so everything you can currently do would not change unless you use the EmonESP.

With regard to the EmonESP there is a simple change on the server that could ensure that the JSON like name/value pairs are processed in the same order as they are given meaning you would have the best of both worlds, names and a fixed order. As long as that is a defined expectation of the input API there is no problem with it changing in the future.

My main points were that there seems to be a lot of historical worries about using JSON to transmit data but as long as the schema(s) are designed to handle all the use cases then it is a very good way of representing data, don’t forget the bulk upload is actually JSON and if anything I have suggested here was to be implemented it would have to be either a different endpoint or backward compatible.

borpin · 5 September 2016 18:08

[quote=“pb66, post:32, topic:795”]
I am referring to JSON and not just a JSON-like string, a string won’t change its order, but JSON pairs are just that, they are unordered pairs not an ordered list, there are no guarantees “power1:10, power2:20, power3:30” won’t end up as “power3:30, power2:20, power1:10” and if your “sum of all powers” is in power3’s processing then you will be adding the old power2 and old power1 to the new power3.[/quote]

That is interesting. I had seen this behaviour before but assumed that some processing somewhere was reordering it. So this is a ‘feature’ of a JSON string of pairs is it?

borpin · 5 September 2016 18:29

I’ve bumped my gums before about EmonCMS use of pseudo JSON rather than a valid JSON string. It would be really useful if the system did accept valid JSON, I have been using my EmonCMS server to accept some weather data via MQTT (having hacked the input process slightly to make it work). However, that hack means I have not updated the system for ages!

I’d support this as an addition so EmonCMS can pull in data from other sources more easily. The API could have an optional flag for a valid JSON string to process. MQTT may be a bit more difficult to make backward compatible.

[Edit] I meant to add that being able to actually add a ‘name’ to the input list from the JSON input would be a real bonus. Also, could the system correctly process a date value in the JSON string? Last time I tried it just seemed to ignore it.

jeremypoulter · 6 September 2016 12:21

And I am going to do it again: I think my original statement was correct. There is no difference between the order of parsing a CSV set of values and a ‘JSON’ set of values, as in PHP an array is just a map with numbers as the ‘name’, and in both cases the map is ordered, see this post for more info.

But just to be sure walk with me through the input processing in /input/post.json (and I am talking about the single input here not the bulk input) …

                // code below processes input regardless of json or csv type
                if (isset($_GET['json'])) $datain = get('json');
                else if (isset($_GET['csv'])) $datain = get('csv');
                else if (isset($_GET['data'])) $datain = get('data');
                else if (isset($_POST['data'])) $datain = post('data');

This bit pulls the input CSV/JSON/whatever in to $datain

                if ($datain!="")
                {
                    $json = preg_replace('/[^\p{N}\p{L}_\s-.:,]/u','',$datain);

If we actually have any data, strip out anything that is not (^) a number (\p{N}), letter (\p{L}), _, whitespace (\s), -, ., :, or ,, ie for JSON input this will strip out all the JSON notation and leave you with just a name:value comma separated list.

                    $datapairs = explode(',', $json);

Split the string into an array ($datapairs) on the , char.

                    $data = array();
                    $csvi = 0;
                    for ($i=0; $i<count($datapairs); $i++)

loop through each item

                    {
                        $keyvalue = explode(':', $datapairs[$i]);

split into an array ($keyvalue) on the : char

                        if (isset($keyvalue[1])) {

If there is a second value in the $keyvalue array, ie name:value, then

                            if ($keyvalue[0]=='') {$valid = false; $error = "Format error, json key missing or invalid character"; }
                            if (!is_numeric($keyvalue[1])) {$valid = false; $error = "Format error, json value is not numeric"; }

do some validation, $keyvalue[0] (the ‘name’) must not be empty and $keyvalue[1] (the value) must be a number.

                            $data[$keyvalue[0]] = (float) $keyvalue[1];

store in the $data array

                        } else {

else, ie just a number

                            if (!is_numeric($keyvalue[0])) {$valid = false; $error = "Format error: csv value is not numeric"; }

make sure $keyvalue[0] (the value) is a number.

                            $data[$csvi+1] = (float) $keyvalue[0];
                            $csvi ++;

store in the $data array with the name of $csvi + 1, which as a side note for the input 10,20,30 will give you;

$data[1] = 10
$data[2] = 20
$data[3] = 30

ie the indexes start at 1 rather than 0. This will not be a problem, but is worth noting

                        }
                    }

So now we have all our data parsed in to $data in the order added so for {p1:10,p2:20,p3:30} it will always be;

$data["p1"] = 10
$data["p2"] = 20
$data["p3"] = 30

and for 10,20,30 you will have

$data[1] = 10
$data[2] = 20
$data[3] = 30

                    $tmp = array();
                    foreach ($data as $name => $value)

enumerate the $data array;

                    {
                        if (!isset($dbinputs[$nodeid][$name])) {

If the input node does not exist then;

                            $inputid = $input->create_input($userid, $nodeid, $name);
                            $dbinputs[$nodeid][$name] = true;
                            $dbinputs[$nodeid][$name] = array('id'=>$inputid, 'processList'=>'');
                            $input->set_timevalue($dbinputs[$nodeid][$name]['id'],$time,$value);

Create the new input node and set the value, no need for processing as we have only just created it

                        } else {
                            $input->set_timevalue($dbinputs[$nodeid][$name]['id'],$time,$value);
                            if ($dbinputs[$nodeid][$name]['processList']) $tmp[] = array('value'=>$value,'processList'=>$dbinputs[$nodeid][$name]['processList'],'opt'=>array('sourcetype' => "INPUT",'sourceid'=>$dbinputs[$nodeid][$name]['id']));
                        }

else set the new value and if there is a process list add to the list of inputs to process ($tmp)

So at this point we have set all the values in the respective inputs and $tmp has a list of inputs that need processing but nothing has been actually processed.

                    foreach ($tmp as $i) $process->input($time,$i['value'],$i['processList'],$i['opt']);

The final step here is to actually go and process the inputs. Now the interesting thing here is that even though the inputs are kept in order it actually does not matter when summing or subtracting inputs as all the new values have already been updated.
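The whole walk-through can be condensed into a Python sketch. This is a loose re-implementation for illustration, not the emoncms code: Python's `re` has no `\p{N}`/`\p{L}`, so `\w` stands in for them; errors are raised instead of setting `$valid`; and the `post` helper, its arguments and the `sum_all` process are hypothetical simplifications of `$dbinputs` and `$process->input`.

```python
import re

# Approximation of the PHP pattern: keep word chars, whitespace, - . : ,
STRIP = re.compile(r"[^\w\s\-.:,]")

def parse_input(datain):
    """Return an insertion-ordered dict of name (or 1-based index) -> float."""
    data = {}
    csvi = 0
    for pair in STRIP.sub("", datain).split(","):
        keyvalue = pair.split(":")
        if len(keyvalue) > 1:               # name:value
            if keyvalue[0] == "":
                raise ValueError("json key missing or invalid character")
            data[keyvalue[0]] = float(keyvalue[1])
        else:                               # bare value, keyed 1, 2, 3...
            csvi += 1
            data[csvi] = float(keyvalue[0])
    return data

def post(stored, process_lists, datain):
    """Phase 1 stores every new value; phase 2 runs the queued processes."""
    queued = []
    for name, value in parse_input(datain).items():
        stored[name] = value
        if name in process_lists:
            queued.append(process_lists[name])
    return [process(stored) for process in queued]

def sum_all(stored):
    # Hypothetical process attached to p3: sum of all three inputs.
    return stored["p1"] + stored["p2"] + stored["p3"]

stored = {"p1": 10.0, "p2": 20.0, "p3": 30.0}
print(parse_input('{"p1":10,"p2":20,"p3":30}'))
print(parse_input("10,20,30"))
print(post(stored, {"p3": sum_all}, "{p3:31,p2:21,p1:11}"))  # [63.0]
```

Because every value is stored before any process runs, the sum comes out the same even when p3 arrives first in the string.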

I would also suggest modifying the input API documentation to change the examples to all use data=… instead of csv=… and json=… I think this may help show that it does not matter which format you use, and as a secondary point you can use the POST HTTP method which is a bit more RESTful

pb66 · 6 September 2016 13:42

I’m inclined to agree with you based on the fact the processing order has been more of an issue since MQTT was on the scene, previously there were not that many complaints despite the length of time that method has been in service and the number of users.

My comments regarding JSON being unordered were not based on the inner workings of emoncms but on the results I have had with “proper” JSON in other situations, hence my comment about emoncms working correctly due to it not being true JSON but more like CSV in JSON format (as per the bulk api).

It is good to hear confirmation that the api processes current “JSON” in order and that, if changes are made, proper JSON can also be configured to force an order.

So both the CSV and JSON (maybe we should just stick to indexed and named as neither is fully JSON or CSV) methods are currently functioning correctly, great!

However they are still not interchangeable, and the reason behind that is the CSV “index” and JSON “name” being the same field, see my screenclip from the inputs page. You could not put the values of “CT1” “CT2” “T1” etc into the bulk (CSV) upload and post to the same inputs, it creates new inputs “1” “2” & “3” etc as you can see.

The only way for the 2 methods (named vs indexed) to be made interchangeable without breaking backwards compatibility for one or the other, is to add another field to the input schema and have both an index and a name, which in turn would give you an easier and configurable processing order, negating the need for any special ordered JSON.

I have been suggesting/asking for the adoption of an “index” field for a very long time! Currently both the “name” and “description” fields are taken as different things in each case. When bulk uploading or using CSV, ie un-named data, the “name field” is actually holding the “key” and the “description field” holds the “name”, but with named inputs the “name field” holds the “name” and a “description” goes in the “description field”.

In the inputs page though, what you see labelled as “Key” is the “name field” and the “Name” is actually the “description field”.

The lack of a true index field means users must choose their method and stick to it, no interchanging.
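To make the proposal concrete, here is a purely hypothetical Python sketch of an input record carrying both an index and a name. None of these class or field names exist in emoncms; they only illustrate how indexed and named posts could reach the same input.

```python
from dataclasses import dataclass

@dataclass
class Input:
    index: int              # position in an un-named (CSV/bulk) frame, 1-based
    name: str               # name used by named ("JSON") posts
    description: str = ""
    value: float = 0.0

class Node:
    def __init__(self, inputs):
        self.by_index = {i.index: i for i in inputs}
        self.by_name = {i.name: i for i in inputs}

    def update(self, key, value):
        """Accept either an integer index or a string name for the same input."""
        table = self.by_index if isinstance(key, int) else self.by_name
        table[key].value = value

    def in_index_order(self):
        """Inputs sorted by index, giving a configurable processing order."""
        return [self.by_index[k] for k in sorted(self.by_index)]

node = Node([Input(1, "CT1"), Input(2, "CT2"), Input(3, "T1")])
node.update(1, 120.0)        # indexed post
node.update("CT2", 345.0)    # named post
print([(i.index, i.name, i.value) for i in node.in_index_order()])
```

With a separate index, a named post and a bulk CSV upload both land on the same inputs, and the index doubles as an explicit processing order.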

(I think the JSON discussion needs splitting out to another thread, I will do it later this evening)

jeremypoulter · 6 September 2016 16:56

It sounds to me like the only thing really missing is the ability to bulk upload named values.

If that was added is there any other use case that needs to switch between indexed and named values? I guess installing an EmonESP into an existing system could maybe benefit from this but this is not going to happen that often.

To get back on topic, the bulk upload of named values is going to be needed by the EmonESP if it is going to tolerate network failures. There is a good 10Kb of memory free so it should be able to buffer up a few hours’ worth of data, it just needs to get it to the server.
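A rough sketch of such a buffer, in Python for illustration (the real thing would be C++ on the ESP8266). The class name, the frame limit and the batch size are all assumptions; how many frames fit in the free memory depends on the payload size.

```python
from collections import deque

class FrameBuffer:
    """Keep the most recent frames while the server is unreachable."""

    def __init__(self, max_frames):
        # A bounded deque silently drops the oldest frame when full.
        self.frames = deque(maxlen=max_frames)

    def add(self, timestamp, payload):
        self.frames.append((timestamp, payload))

    def drain(self, batch_size):
        """Yield lists of up to batch_size frames, oldest first."""
        while self.frames:
            count = min(batch_size, len(self.frames))
            yield [self.frames.popleft() for _ in range(count)]

buf = FrameBuffer(max_frames=3)
for t, payload in [(0, "ct1:1"), (10, "ct1:2"), (20, "ct1:3"), (30, "ct1:4")]:
    buf.add(t, payload)

# The frame at t=0 was dropped; the rest go out in batches for bulk upload.
print(list(buf.drain(batch_size=2)))
```

Each frame keeps its own timestamp, so whatever bulk format is eventually used can preserve the time the reading was taken rather than the time it reached the server.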