Development: Indexed Inputs

TrystanLea · 25 August 2018 12:25

Following on from the discussion Device Module selecting wrong input and the original emonhub development discussion here: EmonHub Development - #44 by TrystanLea, I’ve put together a first attempt at implementing indexed inputs in emoncms for early review and testing on sandbox systems (I wouldn’t advise running this on a live system yet): First attempt at indexed inputs by TrystanLea · Pull Request #1003 · emoncms/emoncms · GitHub

Indexed inputs

Starting with CSV

Emoncms has historically supported posting of CSV data in the following format:

input/post?node=mynode&csv=100,200,300
input/bulk?data=[[0,"mynode",100,200,300]]

Emoncms automatically names each input in ascending order as received. Our example above results in:

--node:mynode
-----name:1,value:100
-----name:2,value:200
-----name:3,value:300

If the name is then changed in the inputs interface, further posted inputs in csv format will recreate the original input name (e.g 1,2,3).
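
To make the renaming problem concrete, here is a minimal Python sketch of the behaviour described above. It is an in-memory model for illustration only, not emoncms code, and the function name legacy_csv_post is hypothetical:

```python
def legacy_csv_post(inputs, csv):
    """Model of the non-indexed behaviour: every CSV value is stored under
    an automatic name "1", "2", "3", ... taken from its position."""
    for position, value in enumerate(csv.split(","), start=1):
        inputs[str(position)] = float(value)
    return inputs


inputs = legacy_csv_post({}, "100,200,300")

# The user renames input "1" to "power1" in the inputs interface.
inputs["power1"] = inputs.pop("1")

# The next CSV post recreates "1" instead of updating "power1".
legacy_csv_post(inputs, "110,210,310")
print(sorted(inputs))
```

After the second post the model holds four inputs: the renamed power1 is left with its old value, and a fresh input named 1 has appeared alongside it.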

We can also post data using key:value json format:

input/post?node=mynode&json={"power1":100,"power2":200,"power3":300}

which results in:

--node:mynode
-----name:"power1",value:100
-----name:"power2",value:200
-----name:"power3",value:300

The proposal of indexed inputs is to separate the naming of inputs from the posting/updating of values when using CSV format, so that if we change the input names in the inputs interface and post to the csv API again, the values will apply to the new names according to the index order. This makes it possible to combine the benefits of named inputs, a feature of the json key:value API, with the compact size of CSV.

Using the adapted implementation in the indexedinputs branch, try changing the input key for CSV posted data from 1,2,3 to power1,power2,power3 in the inputs interface, using the pencil edit icon.

Post again with:

input/post?node=mynode&csv=100,200,300

Notice how new inputs are not created and that the renamed inputs are updated.
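
The index-based matching can be sketched roughly as follows in Python. This is an in-memory illustration rather than the code in the indexedinputs branch; the function name indexed_csv_post is hypothetical, and the automatic name given to a brand-new input (position + 1) is an assumption taken from the 1,2,3 naming shown earlier:

```python
def indexed_csv_post(inputs, csv):
    """Update inputs by position. `inputs` is a list kept in index order;
    each entry has a user-editable "name" and a "value"."""
    for index, value in enumerate(csv.split(",")):
        if index < len(inputs):
            # Existing index: update the value, leave the name alone.
            inputs[index]["value"] = float(value)
        else:
            # New index: create the input with an automatic name (assumed).
            inputs.append({"name": str(index + 1), "value": float(value)})
    return inputs


inputs = indexed_csv_post([], "100,200,300")

# Rename 1,2,3 to power1,power2,power3, as with the pencil edit icon.
for entry in inputs:
    entry["name"] = "power" + entry["name"]

indexed_csv_post(inputs, "110,210,310")
print(inputs)
```

The second post updates the three renamed inputs in place and the list stays at three entries.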

It’s also possible to name the inputs directly in the API (note that posting names without values is not currently implemented):

 input/post?node=mynode&csv=100,200,300&names=P1,P2,P3

There’s also the option to call the csv parameter values, which matches the emonhub naming convention:

 input/post?node=mynode&values=100,200,300&names=P1,P2,P3

We can also update the same inputs using the JSON format, try:

 input/post?node=mynode&json={"P1":100}
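
For anyone scripting these requests, a small Python helper along these lines can assemble the csv and names query string. The base URL (a local install at http://localhost/emoncms) and the helper name build_post_url are assumptions, and any authentication the server needs is left out:

```python
from urllib.parse import urlencode

BASE = "http://localhost/emoncms/input/post"  # assumed local install


def build_post_url(node, values, names=None):
    """Build an input/post URL with csv values and optional input names."""
    params = {"node": node, "csv": ",".join(str(v) for v in values)}
    if names is not None:
        params["names"] = ",".join(names)
    # safe="," keeps the commas readable instead of encoding them as %2C.
    return BASE + "?" + urlencode(params, safe=",")


url = build_post_url("mynode", [100, 200, 300], ["P1", "P2", "P3"])
print(url)
```

The resulting URL can then be fetched with whichever HTTP client is to hand.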

Starting with JSON

What happens if we start with JSON?

Step 1: Create a set of inputs with JSON format:

input/post?node=emontx&json={"power1":100,"power2":200,"power3":300}

These are indexed in received order, e.g power1 is index:0, power3 is index:2. If you now post another key:value separately, this will be indexed to index:3

input/post?node=emontx&json={"A":123}

Step 2: Post to the same inputs with CSV:

input/post?node=emontx&csv=100,200,300,123
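
The two steps can be modelled in a few lines of Python to check which name each CSV position lands on. This is only a sketch of the described behaviour; assign_indexes and the index_of mapping are hypothetical names, not part of emoncms:

```python
def assign_indexes(index_of, payload):
    """Give each previously unseen key the next free index, in received order."""
    for name in payload:
        if name not in index_of:
            index_of[name] = len(index_of)
    return index_of


index_of = {}
assign_indexes(index_of, {"power1": 100, "power2": 200, "power3": 300})
assign_indexes(index_of, {"A": 123})
print(index_of)

# Step 2: a CSV post is matched to names purely by index.
names_by_index = sorted(index_of, key=index_of.get)
csv_values = [float(v) for v in "100,200,300,123".split(",")]
matched = dict(zip(names_by_index, csv_values))
print(matched)
```

Here the fourth CSV value ends up on A, because A was the fourth key ever received for the node.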

A separate nodeid and nodename is not currently supported in this implementation, which still makes it incompatible with the default emonhub http interfacer, which posts the nodeid rather than the nodename.

The input->get_inputs function has been modified, and use of this function in other parts of emoncms needs checking.

pb66 · 25 August 2018 13:09

That’s great news!

It’s unlikely I will get a chance to test for a while as I’m behind on a big project at the moment. I have looked through your PR notes and it sounds great, but I’m not able to comment on the code changes without testing as I’m not so hot with PHP. One thing that I did question was the use of index 0 when applying an index to JSON named inputs. I know it is usual to index from 0 in code, but the original CSV indexing starts from 1, so could that cause conflict or confusion? Are the original indexes (numerical names) taken at face value and 0 unused, or is 0 now a valid index?

For example, in your 1st example/explanation

-node:mynode
-----name:1,value:100
-----name:2,value:200
-----name:3,value:300

are the new “indexes”

-node:mynode
-----name:1,value:100,index:1
-----name:2,value:200,index:2
-----name:3,value:300,index:3

or

-node:mynode
-----name:1,value:100,index:0
-----name:2,value:200,index:1
-----name:3,value:300,index:2

I feel that due to historical use of OEM it needs to be the former to avoid confusion despite that not being syntactically correct in that with

csv=100,200,300

csv[1] is actually 200, but in OEM speak csv[1] is 100 not 200, as the node id was always csv[0]. Even that is a sketchy reason to continue, though, as a full frame of data nowadays would actually be time,node,val1,val2,val3, making csv[1] the node id OR nodename.

What are your thoughts? It was only this line that raised the question for me:

These are indexed in received order, e.g power1 is index:0, power3 is index:2. If you now post another key:value separately, this will be indexed to index:3

TrystanLea · 25 August 2018 14:06

Thanks @pb66. The index is really just behind-the-scenes ordering; I’d prefer to keep to standard array indexing from zero for that, I think.

-node:mynode
-----name:1,value:100,index:0
-----name:2,value:200,index:1
-----name:3,value:300,index:2

is correct. I’m not sure that there necessarily has to be much user visibility of the index, so it may not really matter.

TrystanLea · 25 August 2018 14:09

The automatic names of 1,2,3 given to csv are a separate field in the database from the ordering index.
Each input has an:

id: input id
nodeid: can be either nodeid or nodename in the emonhub sense
name: input name (e.g power1, power2, power3, 1,2,3...)
index: input index (0,1,2,3,..) used for ordering but not user visible

pb66 · 25 August 2018 15:17

So is there any reason the automatic names couldn’t start from 0 too? That would pull everything into alignment.

If I’m understanding correctly, after these changes the numerical names are just a temporary name, not an index as they were used previously.

after posting

input/post?node=mynode&csv=100,200,300

I would have inputs “1”,“2” and “3” so updating with

input/post?node=mynode&json={"1":100,"2":200,"3":300}

would work, but after changing the names to “A”, “B” and “C” posting

input/post?node=mynode&json={"1":100,"2":200,"3":300}

would create another 3 inputs where as

input/post?node=mynode&csv=100,200,300

would not.

OK, I’ll try and give it a whirl when I can. I see the numerical names are separate from the indexes in practice; I’m just not sure what side of the fence I’m on when the same input can be number “5” or input[4] when you start debugging, as I also see the appeal of “human” numbering, i.e. “1”, “2” & “3” rather than “0”, “1” & “2”.

What is the position with deleting and adding new inputs? Do old indexes get reused or shifted?

eg after deleting input1 and adding a new input1 with a new inputid, will it be index 0 or index 3 (assuming 3 original inputs).

I had envisaged the indexes being visible (and editable?) to the user on the inputs page as there MUST be at least a sort by index option as currently we can sort by inputid or by input name/number, but they will not necessarily be in index order after some deletion and editing etc so

input/post?node=mynode&csv=100,200,300

may not mean anything as the user doesn’t know what those values relate to, they might be

id 1236, name “C”, index: 0
id 1237, name “A”, index: 1
id 1235, name “B”, index: 2

sorted by name = indexes 1,2,0
sorted by id = indexes 2,0,1

so it is not possible to know that csv=100,200,300 is posting to “C”,“A”,“B” and/or 1236,1237,1235 unless the indexes are visible on the inputs page, and then editing them would be useful, much like we can edit the current “keys” (numerical names) to define the order of the CSV, this is very useful for posting partial CSV eg

input/post?node=mynode&csv=100,200

would update “C” and “A” but if I wanted to update “A” and “B” every 5s and “C” every 3rd update (15s) I cannot change the index in the same way I can currently change the key to order the CSV so a partial frame works.
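
The ordering arithmetic in this example can be checked with a short Python sketch. The three records are the ones listed above; the helper names indexes_sorted_by and csv_targets are made up for illustration, and the matching-by-index rule is the behaviour described in this thread rather than code from emoncms:

```python
inputs = [
    {"id": 1236, "name": "C", "index": 0},
    {"id": 1237, "name": "A", "index": 1},
    {"id": 1235, "name": "B", "index": 2},
]


def indexes_sorted_by(inputs, key):
    """Return the index column as it appears when the list is sorted by `key`."""
    return [entry["index"] for entry in sorted(inputs, key=lambda e: e[key])]


def csv_targets(inputs, csv):
    """Pair each CSV value with the input name it would update, by index."""
    ordered = sorted(inputs, key=lambda e: e["index"])
    return [(entry["name"], float(v)) for entry, v in zip(ordered, csv.split(","))]


print(indexes_sorted_by(inputs, "name"))  # [1, 2, 0]
print(indexes_sorted_by(inputs, "id"))    # [2, 0, 1]
print(csv_targets(inputs, "100,200"))     # [('C', 100.0), ('A', 200.0)]
```

Neither visible sort order matches the CSV order, and a two-value frame can only ever reach the inputs holding indexes 0 and 1.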

I’m just thinking out loud for discussion. I’m not picking fault or asking for change as I haven’t even tried it yet. I just knew it would be a bit of a minefield and already had some preconceived ideas on this, and I’m of the opinion that “hiding the indexes” will not necessarily make it simpler despite the reduction in information. But we’ll see, I’m open to either if it works well.

TrystanLea · 25 August 2018 18:41

I haven’t worked out yet how to apply indexes to existing inputs…

If you delete an input say input 2 (human numbering) then posting:

input/post?node=mynode&csv=100,200,300

does recreate input 2 again although the ordering in the input view is incorrect:

TrystanLea · 25 August 2018 18:51

I’ve made a couple of modifications for testing:

Included the index in the input view ui
auto naming of csv starting from 0… to see if this works

TrystanLea · 25 August 2018 19:28

A couple of further modifications: I needed a way of indicating that indexes have not yet been configured, e.g when upgrading from non-indexed inputs, so the default index value in the schema is now -1. If you’ve started testing @pb66, you will want to drop your existing column and recreate it via db update.

Then run:

http://localhost/emoncms/input/populateindexes.json

on the user account you’re testing with to apply indexes to existing inputs. I’m not sure how best to trigger this automatically without adding too much code bulk to existing functions. It may be better for the admin to run an emoncms-wide action as part of any future upgrade procedure.
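
As a rough illustration of what populating indexes could involve, here is a Python sketch that fills in every index still at the -1 default. It is not the emoncms implementation: numbering each node separately, in input id order, and continuing after any index that is already set are all assumptions, and the row layout is invented for the example:

```python
UNSET = -1  # schema default meaning "index not yet configured"


def populate_indexes(rows):
    """Assign an index to every row still at UNSET; names are left untouched."""
    next_free = {}
    # Start each node's counter after any index that is already set.
    for row in rows:
        if row["indx"] != UNSET:
            node = row["nodeid"]
            next_free[node] = max(next_free.get(node, 0), row["indx"] + 1)
    # Number the remaining rows per node, in input id order (assumed).
    for row in sorted(rows, key=lambda r: r["id"]):
        if row["indx"] == UNSET:
            node = row["nodeid"]
            row["indx"] = next_free.get(node, 0)
            next_free[node] = row["indx"] + 1
    return rows


rows = [
    {"id": 12, "nodeid": "emontx", "name": "3", "indx": UNSET},
    {"id": 10, "nodeid": "emontx", "name": "1", "indx": UNSET},
    {"id": 11, "nodeid": "emontx", "name": "2", "indx": UNSET},
    {"id": 13, "nodeid": "emonth", "name": "1", "indx": UNSET},
]
populate_indexes(rows)
print(rows)
```

In this model the input named 1 on emontx keeps its name and receives index 0, and running the function a second time changes nothing because no row is left at -1.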

pb66 · 26 August 2018 09:17

No, I haven’t tested yet. As I mentioned, I’m currently buried in another project for a while so cannot make any promises, but if/when I can find time I will definitely give it a spin.

TrystanLea · 21 September 2018 06:27

I’ve re-factored the code a bit more to reduce duplication and hopefully make for cleaner reading, latest version comparison with master here: Comparing master...indexedinputs · emoncms/emoncms · GitHub

pb66 · 21 September 2018 09:48

Hi @TrystanLea, coincidentally I started testing this yesterday and also started writing a post as I had some issues, I will pull in the changes and try it again before posting in case things have changed, thanks for letting us know.

TrystanLea · 21 September 2018 10:32

Thanks @pb66 I’ve just uploaded a few further changes, both refactoring and fixing the fulljson api format. There is also a new branch of the device module called indexedinputs that includes the required supporting changes.

It’s now possible to post csv values and then use a device template to apply the naming etc. I had to add additional entries into the templates for power2 and power3 so that the indexing works, see the first template in the git comparison here: indexedinputs support (see emoncms core branch) · emoncms/device@adb4a19 · GitHub

The indexing is defined by the indexed array in the template.

pb66 · 21 September 2018 14:30

But it seems I need to remain on the older inputs page (ie not the device module integration beta) to run the indexedinputs branch as there is no displaying of the indexes on the new inputs page, is that right?

The main issue I had yesterday is still at large.

When I switch to the indexedinputs branch, all my inputs get duplicated as fresh data arrives. For example, I have an emonTx v2 posting via emonhub (original) every 5s to this test server

and when I switch to the indexedinputs branch this happens

There are possibly 2 issues here. The first is obviously the duplication of the inputs. I’m guessing there may be an unhandled difference between an old string name “1” and a new numeric index 1 that results in a new input being created, but that is just a gut feeling, semi-confirmed by deleting some new inputs and them popping back up rather than starting to update the original inputs, like when the MQTT inputs get duplicated etc. (Note that when I switch back to master and delete the new inputs, the original inputs DO start updating as expected.)

Next is the numbering: old input “1” is now input 0. If this had not duplicated my inputs, what would have happened? Would it have accepted that input “1” already had a name and just assigned it an index without also changing the name to “0”? Or would it have changed the name to “0”, so that when I switched back to master the voltage (old input “1”, new input 0) would not get updated and the grid power (old input “2”, new input 1) would be updated with my line voltage (x100)?

I’m undecided which is the better way to handle this:

Forget about using index 0 across the board. This is possibly the easiest to do but not technically correct. I too like the idea that the data indexing starts at 0, but perhaps it’s just not practical here?
Do not overwrite pre-existing input names. This is not ideal as input “1” will be index 0, but I do not really see it as too much of a problem, as most users will change the names to meaningful ones once fully committed, at which point going back to a non-indexedinputs branch is not possible without editing all the names back anyway. In the interim, the numerical names being offset from the indexes provides an escape route back to the master branch without screwing up your data. New inputs could be created with an offset numerical name to match the existing inputs, or with matching names and indexes, or, to avoid confusion, a naming strategy like the feed creation could be used, i.e. rather than naming a new input 0 “0” or “1”, name it “index0” or “input0” or “10:0” (assuming a node name of “10”). That way, in the worst case scenario when going back to master, new inputs will not be updated until “input0” gets edited to “1” and the duplicate input is deleted. IMO it is better to have some missing data than corrupt feeds being updated with no obvious reason why.
Just allow the changes to happen to the names and assume there is no reason to go back once this is rolled out. This is not the friendliest approach and it might make testing harder. It might offer the most “correct” result, with indexes starting at 0 and names matching indexes on all inputs including those converted, but it is possibly my least favorite option (but the jury’s still out).

It may be that you have already implemented one of the above but I just didn’t experience it due to the duplicated inputs. I think it is very important to get this bit right before fixing the duplication, because once the input names are overwritten we might not be able to go back without lots of editing (each time I run input/clean I get a page telling me how many inputs were deleted, and that varies between 33 and 53 inputs deleted each time I switch; they would all need manually renaming if they weren’t duplicated), and that needs to be done with the data on hold rather than on live updates.

That sounds good to me.

TrystanLea · 21 September 2018 14:43

I forgot to mention that the indexedinputs is only implemented via the HTTP api so far and not phpmqtt_input. Which one are you testing with? I will get on to changing phpmqtt_input.

and yes only implemented in the old inputs list so far

pb66 · 21 September 2018 14:55

Not a problem there, all my testing so far is http only.

Cool, that fits with what I’ve found.

Any thoughts on the index/name offset and/or the duplication?

TrystanLea · 21 September 2018 15:01

Trying to solve the issue you’ve seen with new inputs first: after switching the branch, did you update the database to add the index field, and then run the populateindexes command as above?

http://localhost/emoncms/input/populateindexes.json

TrystanLea · 21 September 2018 15:07

The idea is that these indexedinput changes will work seamlessly with existing systems, i.e no new inputs appearing, everything working as expected, no broken configuration. I think once your indexes are populated and it’s set up correctly, most of your questions will, I hope, be answered.

TrystanLea · 21 September 2018 15:08

Existing inputs with input names “1”, “2” etc will continue to post to these names. But their index will be 0, 1

pb66 · 21 September 2018 15:09

Yes

No, I had missed that and now I’m a bit concerned about running it until I know what it’s likely to do with the input names, what is the intended behavior?

Maybe I’ll set up another test user so that I can just test with one set of inputs rather than risking the need to rename 50 inputs back.

TrystanLea · 21 September 2018 15:12

The populateindexes command only modifies the index field, so it won’t change anything else:

github.com

emoncms/emoncms/blob/indexedinputs/Modules/input/input_model.php#L264

    
      
              $dbinputs = array();
              if (!$result = $this->mysqli->query("SELECT id,nodeid,indx,name,processList FROM input WHERE `userid` = '$userid' ORDER BY nodeid,name asc")) {
                  $this->log->warn("mysql_get_inputs sql query error");
                  return false;
              }
              while ($row = (array)$result->fetch_object()) $dbinputs[] = $row;
              return $dbinputs;
          }
          
          
          public function populate_indexes($userid)
          {
              $userid = (int) $userid;
              $this->log->warn("populate_indexes u=$userid");
              
              $nodes = array();
              if (!$result = $this->mysqli->query("SELECT id,nodeid,indx,name,processList FROM input WHERE `userid` = '$userid' ORDER BY id")) {
                  $this->log->warn("populate_indexes, sql error, indx may be missing");
                  return false;
              }
              
              while ($row = $result->fetch_object()) {

You can delete the index column in the database, recreate and run the command again as many times as you like to test without affecting existing data there.