Lightwave RF MQTT Daemon doesn't start properly after reboot

Hi,

I’ve needed to restart my Rpi a few times recently and noticed a repeating problem, that after a reboot, lwrfd doesn’t work. The following appears in daemon.log and syslog:

May 25 07:19:35 emonpi lwrfd[441]: Starting LightwaveRF MQTT daemon: lwrfd.
May 25 07:19:35 emonpi lwrfd[489]: Failed to connect, return code -1

If I attempt to send any commands by publishing to the lwrf topic, nothing happens, nothing appears in daemon or syslog. If I restart lwrfd, then it works just fine:

sudo service lwrfd restart
May 25 18:50:55 emonpi lwrfd[27091]: Stopping LightwaveRF MQTT daemon: lwrfd.
May 25 18:50:55 emonpi lwrfd[27100]: Starting LightwaveRF MQTT daemon: lwrfd.
May 25 18:50:55 emonpi lwrfd[27107]: Subscribing to topic lwrf for client lwrf using QoS 1
May 25 18:51:04 emonpi lwrfd[27107]: Sending command 0 on channel 1, level 0
May 25 18:51:05 emonpi lwrfd[27107]: Sending command 1 on channel 1, level 0

I think I have found the connect failure message and -1 return code comes from this section in https://github.com/openenergymonitor/lightwaverf-pi/blob/master/lwrfmqtt.c

if ((rc = MQTTClient_connect(client, &conn_opts)) != MQTTCLIENT_SUCCESS)
{
log_msg(“Failed to connect, return code %d\n”, rc);
exit(-1);

I guess that it is a timing issue, and that when lwrf starts something else hasn’t finished yet, hence restarting the service sometime later works, but I can’t see what the conflict is, and I haven’t manged to understand what the code is trying to do. Any ideas?

I am using emoncms v9.6 on a Rpi with RFM69Pi and the OOK transmitter from the OEM store

Thanks. Ian

Edited title (expanded acronym) - Moderator, BT

OK, so it’s happened again. No, not a problem with LightwaveRF, but a problem with my brain!
After I submitted the post last night, and went to bed, I then had an idea that had eluded me whilst I was staring at the screen.

I work in networking, and so when I see “Failed to connect” I think of network problems like cables and ports on switches, but that’s not relevent here as it’s all inside the Pi. So I also thought of tcp and udp ports, but I couldn’t work out why lwrfd could not bind to it’s own port at startup, but magically it would work later?

But I think I got this back to front. I now suspect lwrfd is trying to connect to a service, and specifically the MQTT broker service. And I believe that service is not active when lwrfd starts up, hence the failure to connect, but later it works as MQTT broker is running by then.

The logs show lwrfd does start before the mosquitto MQTT broker, although it all happens within 1 second, so I am relying on the order in which they appear in syslog:

May 25 07:19:35 emonpi systemd[1]: Starting LSB: LightwaveRF MQTT daemon startup…
May 25 07:19:35 emonpi systemd[1]: Starting LSB: mosquitto MQTT v3.1 message broker…

May 25 07:19:35 emonpi mosquitto[444]: Starting network daemon:: mosquitto.

May 25 07:19:35 emonpi lwrfd[441]: Starting LightwaveRF MQTT daemon: lwrfd.
May 25 07:19:35 emonpi lwrfd[489]: Failed to connect, return code -1

May 25 07:19:35 emonpi systemd[1]: Started LSB: LightwaveRF MQTT daemon startup.
May 25 07:19:35 emonpi systemd[1]: Started LSB: mosquitto MQTT v3.1 message broker.

So, what’s the best way of fixing this ? Add a delay before starting lwrfd? Change the sequence that daemons get started to move lwrfd towards the end? Or modify the lwrfd code so that if it fails when it tries to connect to the broker, it can pause and try again after a few seconds, perhaps repeat a few times before exiting?

And assuming this is the cause of the problem, I wonder if it’s worth tweaking the error message from lwrfmqtt.c to something like “Failed to connect to the MQTT broker service”, which would make it easier to find similar problems? @glyn.hudson it looks like you were involved in the lwrfmqtt code on Github, so interested in your thoughts?

Hi Ian,

Thanks again for your excellent debug output, a support request from you is a joy to work though.

I’ve checked my syslog (running stock emonSD-03May16 on a RasPi3) and lwrfd connects to local host MQTT Mosquitto server first time. I wonder is different about your system? As you say it sounds like the Mosquitto MQTT server has not finished booting before LWRF tries to connect

Out of interest have you made any changes to the standard image, installed applications etc?

What version mosquitto are you running? The standard emonSD-03May16 has got 1.4.8

$ mosquitto -v
mosquitto version 1.4.8 (build date Sun, 14 Feb 2016 15:06:55 +0000) starting

I have improved the error message to read Failed to connect to MQTT. To see the change you will need to git pull and recompile:

Update: just noticed lightwaverf-pi does not seem to be cloned on the latest image

git clone https://github.com/openenergymonitor/lightwaverf-pi
cd ~/lightwaverf-pi
git pull
make
sudo make install

If lwrf fails to connect it should wait 10s then try again:

Does lwrfd manage to connect 10s later for you?

Hi @glyn.hudson ,

I’m at work now and so can’t confirm the version of mosquitto, but I am running the emonSD-03May16 image. I did run the update from emonhub a couple of days ago, so I am on emoncms v9.6. I don’t think I have installed any additional applications - this was one of the nice parts about upgrading from the 2015 image to the May 2016 image that I didn’t need to install tweepy, lightwaveRF and mosquitto, I simply modified my python script to add MQTT authentication and it worked.

I am running this on a Pi - not a Pi2 or Pi3, but a Pi, model B I think, so maybe on a newer model lightwaverf starts up more quickly? Whilst investigating I did a search for the error message and it also appeared in this post

I also wonder how many people are using lightwaveRF with this copy of emonSD ? Maybe not enough to report it? It was only after I had had this problem several times that I decided it needed investigation - before that I just started the service and got on with something else :wink:

I’ll confirm the version tonight, and pull the updated code. I see you updated your post with a comment “just noticed lightwaverf-pi does not seem to be cloned on the latest image” which I don’t really understand (I really need to make time to learn git !) but I will follow the instructions you posted and let you know the results.

Thanks for your excellent support

Yup, I think this is what’s caused the issue. The Model B (understandably) takes longer to boot and loadup mosquito MQTT

Sadly not many I think, maybe you’re the first! I use it daily myself to control my lwrf plugs. It is a bit of a niche application.

‘Not cloned’ just means the lightwaverf-pi folder is not present, git cloning simply downloads a ‘clone’ of the repo from git.

Have you ever tried monit? It is excellent at restarting services under various circumstances. I have a weather recording system that sometimes just stops writing the data. Monit can detect a folder has not been written to for a set time and restart the service. Just an apt-get away :smile:

1 Like

Hi @glyn.hudson

I can now confirm I am using v1.4.8 of mosquitto. However I tried the updates to lightwaverf but it hasn’t fixed it yet.

I followed your instructions to clone, which worked:

git clone GitHub - openenergymonitor/lightwaverf-pi: a port of the arduino lightwaverf library to the raspberry pi
Cloning into ‘lightwaverf-pi’…
remote: Counting objects: 142, done.
remote: Compressing objects: 100% (3/3), done.
remote: Total 142 (delta 0), reused 0 (delta 0), pack-reused 139
Receiving objects: 100% (142/142), 30.81 KiB | 0 bytes/s, done.
Resolving deltas: 100% (67/67), done.
Checking connectivity… done.

although the compile threw out a few warning messages:

[email protected]:~/lightwaverf-pi $ make
gcc -fPIC -O3 -g -Wall -Werror -c -o lightwaverf.o lightwaverf.c
gcc -shared -Wl,-soname,liblightwaverf.so.0 lightwaverf.o -o liblightwaverf.so.0.1
ldconfig -v -n .
.:
liblightwaverf.so.0 → liblightwaverf.so.0.1 (changed)
ln -s liblightwaverf.so.0 liblightwaverf.so
gcc send.c -o send -L. -llightwaverf -lwiringPi
gcc receive.c -o receive -L. -llightwaverf -lwiringPi
gcc send2.c -o send2 -L. -llightwaverf -lwiringPi
gcc mqttsend.c -o mqttsend -L. -llightwaverf -lwiringPi -lpaho-mqtt3c
gcc lwrfmqtt.c -o lwrfmqtt -L. -llightwaverf -lwiringPi -lpaho-mqtt3c
lwrfmqtt.c:20:0: warning: “TRUE” redefined
#define TRUE 1
^
In file included from lightwaverf.h:9:0,
from lwrfmqtt.c:5:
/usr/include/wiringPi.h:31:0: note: this is the location of the previous definition

define TRUE (1==1)

^
lwrfmqtt.c:21:0: warning: “FALSE” redefined
#define FALSE 0
^
In file included from lightwaverf.h:9:0,
from lwrfmqtt.c:5:
/usr/include/wiringPi.h:32:0: note: this is the location of the previous definition

define FALSE (!TRUE)

^

[email protected]:~/lightwaverf-pi $ sudo make install
install -m 0644 liblightwaverf.so.0.1 /usr/local/lib
ldconfig
ln -f -s /usr/local/lib/liblightwaverf.so.0 /usr/local/lib/liblightwaverf.so
install -m 0644 *.h /usr/local/include
install -m 0755 lwrfmqtt /usr/local/bin
install -m 0755 lwrfd /etc/init.d

I then rebooted, and checked the log, but sorry to say it still looks like a problem:

grep -i mqtt /var/log/syslog
May 26 21:12:40 emonpi systemd[1]: Starting LSB: LightwaveRF MQTT daemon startup…
May 26 21:12:40 emonpi systemd[1]: Starting LSB: mosquitto MQTT v3.1 message broker…
May 26 21:12:40 emonpi lwrfd[439]: Starting LightwaveRF MQTT daemon: lwrfd.
May 26 21:12:40 emonpi lwrfd[488]: Failed to connect to MQTT, return code -1
May 26 21:12:40 emonpi systemd[1]: Started LSB: LightwaveRF MQTT daemon startup.
May 26 21:12:40 emonpi systemd[1]: Started LSB: mosquitto MQTT v3.1 message broker.

It looks as though the loop to check if it is connected and sleep for 10 isn’t working? Could this be related to the compiler warning messages about TRUE being redefined, and affecting the connected = TRUE; on line 186?

I restarted the lwrfd service manually and it worked correctly. Can you take another look?

Thanks for testing, I’ll take another look…

Warning messages do not look serious, I also had them.

Working with the original developer of lightwaverf-pi I have pushed a fix.

This should fix the reconnection issue and try and re-connect every 10s if MQTT connection is not established first time.

To update and recompile:

rpi-rw
sudo systemctl stop lwrfd.service
cd ~/lightwaverf-pi
git pull
make
sudo make install
sudo systemctl daemon-reload
sudo systemctl start lwrfd.service
rpi-ro
tail tail /var/log/daemon.log 

You should see connection message. Please test after a reboot.

Thanks .

Hi @glyn.hudson Well, what can I say, what amazing support !!! I wasn’t sure you would be able to spend any time on this as I might be the only person affected, and so I was trying to fix this myself, with a primitive 2 second delay in lwrfmqtt.c but was struggling to get even that to work. So this fix is fantastic, and I am pleased to say it has worked:

May 29 16:37:58 emonpi systemd[1]: Starting LSB: LightwaveRF MQTT daemon startup…
May 29 16:37:58 emonpi systemd[1]: Starting LSB: mosquitto MQTT v3.1 message broker…
May 29 16:37:58 emonpi mosquitto[443]: Starting network daemon:: mosquitto
May 29 16:37:58 emonpi lwrfd[489]: Connecting to MQTT
May 29 16:37:58 emonpi lwrfd[439]: Starting LightwaveRF MQTT daemon: lwrfd.
May 29 16:37:58 emonpi lwrfd[489]: Failed to connect to MQTT, return code -1
May 29 16:37:58 emonpi systemd[1]: Started LSB: LightwaveRF MQTT daemon startup.
May 29 16:37:58 emonpi systemd[1]: Started LSB: mosquitto MQTT v3.1 message broker.
May 29 16:38:07 emonpi lwrfd[489]: Reconnecting to MQTT …
May 29 16:38:07 emonpi lwrfd[489]: Subscribing to MQTT topic lwrf for client lwrf using QoS 1