Multi-instance service-runner

Greebo · 8 February 2019 13:52

Following on from discussion in Emoncms V9.9.6 Stable merge (pre-merge check) - #31 by borpin, I’ve made some changes to service-runner to try and deal with different EmonCMS prefixes and multiple instances of EmonCMS running against the same redis server.

Before I create a PR I thought I’d post it here for those who may be able to test it further - i.e. those with either non-EmonPi installs, or multi-instance EmonCMS installs.

#!/usr/bin/python

# Used to update log viewer window in Emoncms admin
# Used in conjunction with: service-runner-update.sh and Emoncms admin module

import sys
import re
import redis
import subprocess
import time
import signal
import os

# Edit this list if you add additional EmonCMS instances or change the instance name from the
#  default of "" for an EmonPi, or "emoncms" for a self-install from GitHub.
# Ensure you add the trailing colon as shown below for "emoncms:"!
instancelist = ("", "emoncms:")

def handle_sigterm(sig, frame):
  print("Got Termination signal, exiting")
  sys.exit(0)

# Setup the signal handler to gracefully exit
signal.signal(signal.SIGTERM, handle_sigterm)
signal.signal(signal.SIGINT, handle_sigterm)

def connect_redis():
  while True:
    try:
      server = redis.Redis()
      if server.ping():
        print("Connected to redis-server")
        sys.stdout.flush()
        # need to wait until EmonCMS has started
        return server
    except redis.exceptions.ConnectionError:
      print("Unable to connect to redis-server, sleeping for 30s")
      sys.stdout.flush()
    time.sleep(30)

print("Starting service-runner")
sys.stdout.flush()

server = connect_redis()

while True:
  try:
    # Check for the existence of a redis 'service-runner' key
    for instance in instancelist:
      if server.exists(instance + 'service-runner'):
        # We've got one, now to turn it into a cmdline
        flag = server.lpop(instance + 'service-runner')
        print("Got flag: %s\n" % flag)
        sys.stdout.flush()
        script, logfile = flag.split('>')
        cmdstring = "{s} > {l} 2>&1".format(s=script, l=logfile)
        print("STARTING: " + cmdstring)
        sys.stdout.flush()
        # Got a cmdline, now run it.
        subprocess.call(cmdstring, shell=True)
        if not os.path.isfile(logfile):
          # Make sure the log file exists even if the command produced nothing
          open(logfile, 'a').close()
        print("COMPLETE: " + cmdstring)
        sys.stdout.flush()
  except redis.exceptions.ConnectionError:
    print("Connection to redis-server lost, attempting to reconnect")
    sys.stdout.flush()
    server = connect_redis()
  except SystemExit:
    # If the sys.exit(0) from the interrupt handler gets caught here,
    # just break from the while True: and let the script exit normally.
    break
  except Exception:
    # Any other error is unexpected: report it and stop
    print("Exception occurred: %s" % sys.exc_info()[0])
    sys.stdout.flush()
    sys.exit(1)
  time.sleep(0.2)

Save the above into a file somewhere called whatever you like… sr.py will do if you’re a lazy typer
Make sure the standard service-runner is stopped using:
sudo systemctl stop service-runner
From the folder you saved the above script into, run:
sudo python sr.py (replace sr.py with whatever name you chose)
Confirm it finds your instance(s) in the output printed to the console - something like:

pi@emonpi(ro):~$ sudo python sr.py
Starting service-runner
Connected to redis-server
Found unnamed EmonCMS instance []

Then try using the backup module or one of the other Web Admin tasks that utilise service-runner and confirm they get run (again, checking the output in the console). You should see something like this:

Got flag: /home/pi/backup/emoncms-export.sh /tmp/emoncms-flag-export>/home/pi/data/emoncms-export.log

STARTING: /home/pi/backup/emoncms-export.sh /tmp/emoncms-flag-export > /home/pi/data/emoncms-export.log 2>&1
COMPLETE: /home/pi/backup/emoncms-export.sh /tmp/emoncms-flag-export > /home/pi/data/emoncms-export.log 2>&1

Terminate service runner using Ctrl-C

After you’ve tested sufficiently, make sure you restart the actual service-runner with:
sudo systemctl start service-runner

I also realise that as written, service-runner will only run a single command at a time, so if a backup is initiated and then the user changes pages to the Admin page and initiates an update, the update will not start until the backup completes.
Likewise if an update is triggered from two instances of EmonCMS, they will execute serially, which may cause confusion in the user interface, particularly of the one initiated second.

There are certainly ways around this but they add complexity and I’m open to opinions on whether this complexity is actually necessary before diving in and adding it.
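One of those ways around it, purely as an illustration and not part of the script above: start each command with subprocess.Popen (which returns immediately) instead of subprocess.call (which blocks), and keep a list of running processes to tidy up on each pass of the loop. The helper names start_command and reap_finished are made up for this sketch.

```python
import subprocess

running = []  # Popen handles for commands that have not finished yet

def start_command(cmdstring):
    # Popen returns straight away, so the polling loop is not blocked
    proc = subprocess.Popen(cmdstring, shell=True)
    running.append(proc)
    return proc

def reap_finished():
    # poll() returns None while a process is still running
    finished = [p for p in running if p.poll() is not None]
    for p in finished:
        running.remove(p)
    return finished
```

The main loop would call reap_finished() once per pass to report COMPLETE for each finished command. The extra complexity is that two commands touching the same files (two updates, say) could then run at the same time, so some form of per-script locking would probably be needed as well.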

Greebo · 8 February 2019 13:58

I’ve just realised that if EmonCMS hasn’t started before service-runner, it will never find the “readapikey” key in redis and will not work at all… I’ll need to rejig that startup code a bit to keep retrying until it finds one (in much the same way it keeps retrying until it can connect to redis), so this will be a known bug in the above script…
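A rough sketch of that retry idea, modelled on the way connect_redis keeps retrying. The function name wait_for_prefixes and the find_prefixes callable are hypothetical; find_prefixes stands for whatever lookup ends up being used to discover instances.

```python
import time

def wait_for_prefixes(find_prefixes, delay=30, sleep=time.sleep):
    # find_prefixes is any callable returning a list of prefixes in use
    # (hypothetical: e.g. a lookup of known keys in redis)
    while True:
        prefixes = find_prefixes()
        if prefixes:
            return prefixes
        print("No EmonCMS instance found yet, sleeping for %ss" % delay)
        sleep(delay)
```

Note that a list containing only the empty prefix, [""], still counts as a result here, so an unprefixed install ends the wait.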

borpin · 8 February 2019 16:54

I think it would be worth doing a periodic check to see if the keys have changed. Perhaps a note in the docs to state that a restart is recommended if an emoncms instance is restarted.

Would an error message if no key is found be worthwhile?

Rather than “unnamed”, perhaps “without prefix”? “Unnamed” seems a bit alarming, but that situation is not unusual (it is how the defaults are set up).

pb66 · 8 February 2019 17:11

I think it would need to be able to continually re-assess the known prefixes, as instances can be added at any time; otherwise the SR would need to be restarted after each new instance was added (or removed?).

The apikey itself would be unimportant (correct?); it’s just the existence of at least one apikey that you are looking for (correct?).

Don’t forget (for example) even the emonSD doesn’t ship with a user created so SR will not work until a user is created and at THAT point it might need the SR restarted. The reason I say at least one apikey is because it is normal for a (non-single user) emoncms setup to have multiple apikeys in redis.

Is the apikey added back to redis after a flush? Just wondering: after a reboot (or flush), at what point is an apikey re-added to redis?

borpin · 8 February 2019 17:20

No, the issue is the redis prefix, not the Unix user. The defaults differ between the emonpi default settings and the ‘default’ settings (Issue #1168), which is where this all started!

Interesting question. The readapikey key is used simply because any emoncms instance always (we think) has this key, so you can search through and match against *readapikey to find whether a prefix is in use.
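As a minimal sketch of that matching step, assuming the prefix is simply whatever comes before the text readapikey in the key name (the exact key layout is not shown in this thread, so treat that as an assumption). The function name prefixes_from_keys is made up.

```python
def prefixes_from_keys(keys, marker="readapikey"):
    # Collect whatever precedes the marker in each matching key name.
    # Assumption: the prefix is everything before the marker.
    prefixes = set()
    for key in keys:
        if isinstance(key, bytes):
            key = key.decode("utf-8")  # redis-py returns bytes by default
        idx = key.find(marker)
        if idx >= 0:
            prefixes.add(key[:idx])
    return sorted(prefixes)
```

With redis-py, the key names could come from something like server.scan_iter(match="*readapikey*"). An empty string in the result means an instance running without a prefix.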

pb66 · 8 February 2019 17:43

This is not the case!

At first boot the emonSD will not have any users and therefore no apikeys. After the first user is created, and before the SR is used (import old data?), the SR must find that readapikey to register the emoncms instance so that it notices the flag when it is set.

That may be another issue, but regardless of whether you are watching for a flag with a specific prefix or no prefix, there is no read api key until the first user is created (as above).

The addition of a prefix in the emonpi default settings should be a trivial thing, no worse than a redis flush(), as from the point of editing/saving the new settings all redis addresses will use the new prefix. The redundant redis entries will get cleared on a reboot or flush (or probably by redis after they are dormant for a time?).

The potential problems will only occur with the whaccumulator, pulse counting or more complex processing, but should be no worse than a reboot situation (worth testing thoroughly before rolling out though).

borpin · 8 February 2019 18:45

Ok, that is good to know. As I said the fact it is the readapikey key is irrelevant; @Greebo just needed a key, generated by the emoncms instance, that he knew would always be there, so the prefix in use could be identified.

It is part of the same issue. As it stands, service-runner will not pick up the redis key unless there is no prefix because it is looking for a key of service-runner and not a key emoncms:service-runner (which was why my backup did not run).

Scanning continually is not thought to be a good idea from a load/efficiency perspective, so we need to find a mechanism for telling service-runner what prefixes might be in use, as that setting could be anything.

Assuming the prefix is here to stay, do you have any suggestions how to tell service-runner, what prefixes are in use? A regular scan as opposed to a continuous scan of all keys can do it (to find an ‘always there key’), but what key to scan for?

I’d noticed that in some cases the global $redis is used, and in others a new connection is made. I do not know enough about it to know if creating the new connections is legacy coding. There was a suggestion about encapsulating the redis settings. Maybe this is a way forward and when a new connection is made, it sets up a specific key that could be scanned for. If everything used the same connection that might help (or I might be talking rubbish).

Just thinking out loud really (which is sometimes a good idea, but not always).

pb66 · 8 February 2019 19:01

I get that, and @Greebo has chosen wisely (IMO) as it is probably the one that is most likely to be there. But to put it another way: there are unlikely to be any redis entries to establish a prefix (or a non-prefix) until a user is created, or some other action causes redis to load something.

Again, I understand that and was merely pointing out that it would be required. Unless the service is restarted after the first user is created, the first redis action is done, or the additional emoncms is installed etc., it must check regularly (at a processor cost) or it will need restarting whenever a new prefix (or non-prefix) is added.

No, not offhand. That is why I suggested elsewhere altering emoncms to not use a prefix for just the service runner. This would make a single server-wide queue for all instances to deposit their flags in, including the current emonSD default of no prefix, without any need to amend the settings.

@Greebo and yourself are doing a great job of implementing a prefix search method and I was simply offering further info to help progress that approach. But yes, IMO the no SR prefix route would be the better method. But I’m not doing the work and do not know definitively that it is possible. I’m sorta 99% sure it could be, but that is not an overly informed position.

pb66 · 8 February 2019 20:58

Or better still, a unique prefix specific to the SR eg “emoncms-service-runner” in all emoncms instances (even current emonSD “no prefix” default) by hardcoding the SR prefix into the SR handling code in emoncms.

Greebo · 8 February 2019 23:46

I know I shouldn’t have posted that and gone straight to bed but it was already stupid o’clock here

I’ll go through the posts above and provide responses/commentary in the next few hours… I still think this is going to be workable as-is. I had already started to add code to periodically check (hourly? daily?) for new instances when I was fixing the “EmonCMS hasn’t started yet” issue.
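For illustration only, a periodic check like that could be as small as a timer object consulted on each pass of the main loop. The class name PeriodicCheck and the interval are assumptions, not the code being written for the script.

```python
import time

class PeriodicCheck(object):
    """Reports True at most once per interval (in seconds)."""

    def __init__(self, interval, clock=time.time):
        self.interval = interval
        self.clock = clock
        self.last = None  # time of the last check, None before the first

    def due(self):
        now = self.clock()
        if self.last is None or now - self.last >= self.interval:
            self.last = now
            return True
        return False
```

In the main loop this would read something like: if rescan.due(), rebuild the instance list; otherwise carry on polling the existing list every 0.2s.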

Bill.Thomson · 9 February 2019 00:08

Man oh man. Can I ever identify with that one.
Been there and done that way too many times.

Good one, Greebo!

Greebo · 9 February 2019 00:23

I’ve just read through all the posts again, put a long response together commenting on each of the various points you’d both discussed and then I remembered this from Paul:

So I figured I should test hitting the “Flush” button on the admin web page… 5 minutes later there’s still no readapikey (or writeapikey) in redis.

So I deleted all of my other previous comments as they’re now largely irrelevant!

After further testing and this excellent discussion, I believe we’ll need changes made external to service-runner to make this work.
Either every place where the service-runner key is created has to connect to redis without a prefix, OR there’s a guaranteed key always loaded in redis which includes the prefix of that EmonCMS instance.

As it stands, readapikey sounded like a great idea but there are too many cases where it doesn’t exist in redis… (it’s now almost 10 minutes since I hit the flush button and there’s still no readapikey or writeapikey in my redis)

@TrystanLea, @glyn.hudson, what are your thoughts on the feasibility of either of these?

pb66 · 9 February 2019 00:44

However, you cannot queue a service-runner flag without logging in, or opening a web page with write permissions. Does logging in create a redis apikey entry? (perhaps looking for a write apikey or user would be better?).

Although I still think an instance-agnostic queue would be better. If an “add SR flag” function, which connects to redis using the fixed SR prefix and adds the SR command string and profile path, were added in /Lib and reused globally, wouldn’t that work?
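On the service-runner side, an instance-agnostic queue might be consumed roughly as below. This is a sketch under assumptions: the key name emoncms-service-runner is borrowed from the suggestion earlier in the thread and is not an existing key, next_command is a made-up helper, and the flag format (script>logfile) is the one the current script already uses.

```python
SR_QUEUE = "emoncms-service-runner"  # hypothetical fixed key shared by all instances

def next_command(client, queue=SR_QUEUE):
    # lpop returns None when the list is empty or missing,
    # so no separate exists() check is needed
    flag = client.lpop(queue)
    if flag is None:
        return None
    if isinstance(flag, bytes):
        flag = flag.decode("utf-8")  # redis-py returns bytes by default
    # Split on the first '>' only: script on the left, logfile on the right
    script, logfile = flag.split(">", 1)
    return script, logfile
```

Here client is anything with an lpop method, such as the redis.Redis() object the script already creates. With a single queue, the instancelist loop would no longer be needed.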

Greebo · 9 February 2019 01:07

I think in most cases, it will be perfectly fine as it stands, but one thing I know for sure after watching this forum for over a year now - the corner cases happen every week.

I agree completely, but I have no idea how much additional coding would be required for that within the bowels of EmonCMS and/or any of the other modules that might use service-runner.

If it was much simpler and less error-prone to add code to ensure there was always a well known key available in redis whenever an EmonCMS instance was running, I could work with that just as easily.
Something like [prefix:]emoncms-available would probably make sense

I don’t have the faintest idea how much effort either of those options would require.

pb66 · 9 February 2019 12:59

The obstacle there is that nothing happens in core emoncms unless a page is opened or an api is called. The feedwriter or MQTT input scripts are the only “active” parts and they are not always installed.

Greebo · 9 February 2019 22:44

Right… It’s clearly not enough to just update code behind the “Flush” button then… that doesn’t handle a restart of the entire system… So we’re back to option 1.

pb66 · 9 February 2019 22:58

The flush button is another point that needs visiting. Currently the flush will dump all data across all instances (ie all prefixes) which can cause errors to creep in on other instances as a result of using it in any instance. I currently have to use the command line to delete all data with a particular prefix instead.
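A prefix-scoped delete could look something like the sketch below, using redis-py's scan_iter and delete. It is an illustration rather than what the Flush button does today, and delete_prefix is a made-up name.

```python
def delete_prefix(client, prefix):
    # Refuse an empty prefix: the pattern "*" would match every key,
    # which is exactly the cross-instance flush we want to avoid
    if not prefix:
        raise ValueError("refusing to delete with an empty prefix")
    deleted = 0
    for key in client.scan_iter(match=prefix + "*"):
        deleted += client.delete(key)  # delete returns the number removed
    return deleted
```

Two caveats: an instance running with no prefix cannot be cleared selectively this way, and a prefix containing glob characters such as * or [ would need escaping before being used as a match pattern.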

borpin · 10 February 2019 12:03

Let’s assume the edge case is a rarity, but that we need to cover for it.

How about either a config file or environment variable that service-runner can evaluate at start? Real edge case would be an array of prefixes to scan for.
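A sketch of the environment variable option, assuming a comma-separated list in a variable called EMONCMS_REDIS_PREFIXES. That name, and the load_prefixes helper, are invented for this example; the fallback matches the two prefixes hard-coded in the script earlier in the thread.

```python
import os

def load_prefixes(environ=os.environ, name="EMONCMS_REDIS_PREFIXES"):
    # Hypothetical variable: a comma-separated list of redis prefixes.
    # An empty entry stands for an instance with no prefix.
    raw = environ.get(name)
    if raw is None:
        return ["", "emoncms:"]  # same defaults as the script's instancelist
    return [p.strip() for p in raw.split(",")]
```

For example, EMONCMS_REDIS_PREFIXES=",emoncms:,site2:" would cover an unprefixed instance plus two prefixed ones. Under systemd the variable could be set in the service unit, which keeps it in one place.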

Also add a note to the settings files telling users the impact of changing the defaults and what to do with service-runner if they do.

Greebo · 10 February 2019 12:54

These are all things I’ve been pondering but I keep coming back to the same questions that I don’t have a good answer for:
Where is it stored?
How does it get in there?
How do we ensure there’s only one copy?
How is it updated?

As soon as any of those things have to rely on a person doing something, we’ve done it wrong IMHO.

Instead of spending effort on working through those questions and getting sensible answers, we’d be better off spending that effort on working out how we force all calls to service-runner to use an agnostic redis key instead.

I might have a poke through the stuff that calls out to SR and see how hard it might be to change the source of the problem… Maybe it’s as simple as Paul suggests

borpin · 10 February 2019 13:10

I’m not sure if it is relevant, but when I did some digging, sometimes the global redis object is used and at other times a new one is created. My initial feeling was that it was too difficult to decipher what I suspect is legacy work.

With an environment variable, it doesn’t matter where it is. A config file would sit next to the script: add a default file to the repo that needs to be copied to be used, and put the copied filename in the gitignore.

To some extent you need to look at the audience. Anyone running 2 instances of emoncms on the same server is not your average user, so they are likely to expect and be looking for some ‘funnies’ to make it work.

Docs could include a ‘multi instance’ element.