emonTx data disappeared

I woke this morning to discover that data from my two emonTx stopped being logged at 21:02 last night. I happened to have an ssh session open to the pi and that was still active. Other data that is fed in via various scripts was still being collected and displayed by my emoncms website.

I rebooted the pi and the emonTx data came back onstream again. There’s nothing in any emon* log file - they all just start from the reboot point. The [persistent] journal doesn’t show anything at the relevant time, just cron logs from the various data-gathering scripts.

Is there any way to know what happened? Any way to investigate better if it happens again? Any way to stop it happening, or any way to get an alert when it happens?

Hello @djh do you have any other radio units e.g emonTH’s? or did everything on rfm69 radio just stop at that time? Do you know which firmware the emonTx units are running? what are the emonTx units monitoring? how many sensors, temperature, pulse etc?

Hi @TrystanLea - no I just have the two emonTx so I believe that therfm69 stopped working for some reason. The emonTx are running whatever firmware you shipped them to me with. They each have one temperature DS18B20 attached and one pulse counter. and voltage and RSSI. One emonTx just has one CT measuring mains import and the other has four CTs measuring various consumption figures and solar generation

How old is the emonSD install/image Dave? Assuming this is an emonSD when did you last update via emoncms admin page? If you have updated, did you update the rfm2pi FW too?

This concerns me as all the later images have log2ram installed and even if you did an ungraceful power cycle (pull the power plug) there should be old logs in ram. So is this an old pre-log2ram image or is there issues with log2ram I wonder?

We really need to look at the emonhub.log prior to and around the time of failure, even an uneventful log tells us something. If it happens again please copy the emonhub logs to disk before rebooting if you suspect log2ram isn’t installed or functioning.

The current log will be /var/log/emonhub/emonhub.log, In the case of log2ram, this file is rotated and saved to the /var/log.old/emonhub/ folder each hour so it may appear to be only “since boot” if it’s rotated since reboot. When you gracefully reboot the logs are saved to /var/log.bak during shutdown and reloaded to ram on startup, so there should always be some logs to be had, whether they go back long enough for the time in question depends on how soon you check and how fast the log messages are flowing, but for the timescale you mention, there really should be relevant logs if l2r is working as several logs are retained, possibly zipped.

The most common cause for rfm modules to stop receiving has been low voltage, the rfm modules are more sensitive than the Pi’s and can brownout without the Pi being affected if voltage is just a tad low. Have you changed PSU or added any ancillaries to this Pi recently (temp sensors pulse counters etc).

Another test you can try before rebooting/restarting if it happens again is to reset the rfm via ssh see

for a simple commandline utility to reset the rfm2pi.

Similar to this (the title doesn’t do it justice)? Hanging hardware, requiring a power cycle

If so, twice in a short space of time @TrystanLea.

As seen on the other thread, the log just seems to stop.

Might be worth looking at syslog around that time.

Do we know that for a fact? I agree there is a potential link between the 2 issues, but we have little or no info on either as yet so that is a big assumption. Either way, the more info we have the more likely we are to find and resolve it, the emonhub.logs are always a good place to start. Even if the logs do “just stop” do they stop at similar points in the code? what was emonhub doing just before it “just stopped”? As I said above

Also yet another test you can try is to change a setting in emonhub.conf whilst watching with tail - f /var/log/emonhub/emonhub.log in another ssh window.

In instances where all data stops flowing to emonhub it can be difficult to tell if emonhub is still alive. By changing a setting and saving, if emonhub logs the change in settings, emonhub is still alive. This is an additional check to the sudo systemctl status emonhub service check. (don’t forget to change the setting back whilst the conf is open).

Possibly 3 if you count this thread

But with one user opting for serial connections and the other tempted to automate resetting the rfm2pi when the data stops, there is little opportunity to debug.

Also @djh, if the emonhub.log isn’t moving could you also do a quick df -h before restarting just to confirm the log partition isn’t full, cheers.

Thanks for all the thoughts, everybody. The system is low-write 9.9.8 (yes, I know). The pi has a battery backup so shouldn’t be suffering from any brownouts. No other recent changes. /var/log is 3% full. I have logging set to warn & errors so most of the time emoncms.log sits there with nothing much happening (for months). I’ll try to check emonhub.log before rebooting if the problem recurs.

I don’t think I run log2ram (there’s no process with that name) because on my system /var/log is a tmpfs and log management is good enough that there’s never a problem (because I’ve controlled the emoncms.log bad boy).

I rebooted via ssh, rather than hardware, and as I reported the system was still working apart from the RF link. The emoncms admin display claimed emonhub was Active running.

I did look at the journal (which is persistent). Here’s the relevant bit for -u emonhub

Jan 16 16:17:44 emonpi sudo[1131]: pam_unix(sudo:session): session closed for user root
Jan 16 16:17:44 emonpi emonhub[1103]: Starting OpenEnergyMonitor emonHub: emonhub has been started ok.
Jan 16 16:17:44 emonpi systemd[1]: Started LSB: Start/stop emonHub.
Jan 18 09:56:07 emonpi systemd[1]: Stopping LSB: Start/stop emonHub...
Jan 18 09:56:10 emonpi sudo[27334]:     root : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/bin/mkdir -p /var/log/emonhub
Jan 18 09:56:10 emonpi sudo[27334]: pam_unix(sudo:session): session opened for user root by (uid=0)
Jan 18 09:56:10 emonpi sudo[27334]: pam_unix(sudo:session): session closed for user root

As I say, the data stopped at 21:02 yesterday so there’s nothing in the log except my sudo reboot and the system restarting. Here’s the full journal around the time of the failure:

Jan 17 21:00:03 emonpi CRON[9048]: pam_unix(cron:session): session opened for user root by (uid=0)
Jan 17 21:00:03 emonpi CRON[9046]: pam_unix(cron:session): session opened for user pi by (uid=0)
Jan 17 21:00:03 emonpi CRON[9047]: pam_unix(cron:session): session opened for user pi by (uid=0)
Jan 17 21:00:03 emonpi CRON[9060]: (pi) CMD (/home/pi/additional-scripts-djh/tplink-to-emoncms.pl)
Jan 17 21:00:03 emonpi CRON[9059]: (root) CMD (/home/pi/additional-scripts-djh/mk-var-log-djh-scripts)
Jan 17 21:00:03 emonpi CRON[9061]: (pi) CMD (/home/pi/additional-scripts-djh/enphase-to-emoncms.pl)
Jan 17 21:00:03 emonpi CRON[9048]: (CRON) info (No MTA installed, discarding output)
Jan 17 21:00:03 emonpi CRON[9048]: pam_unix(cron:session): session closed for user root
Jan 17 21:00:04 emonpi CRON[9047]: pam_unix(cron:session): session closed for user pi
Jan 17 21:00:05 emonpi CRON[9046]: pam_unix(cron:session): session closed for user pi
Jan 17 21:01:01 emonpi CRON[9093]: pam_unix(cron:session): session opened for user pi by (uid=0)
Jan 17 21:01:01 emonpi CRON[9097]: (pi) CMD (/home/pi/additional-scripts-djh/onehouse-weather-to-emoncms.pl)
Jan 17 21:01:35 emonpi CRON[9093]: pam_unix(cron:session): session closed for user pi
Jan 17 21:02:02 emonpi CRON[9115]: pam_unix(cron:session): session opened for user pi by (uid=0)
Jan 17 21:02:02 emonpi CRON[9119]: (pi) CMD (/home/pi/additional-scripts-djh/wattisham-weather-to-emoncms.pl)
Jan 17 21:02:07 emonpi CRON[9115]: pam_unix(cron:session): session closed for user pi
Jan 17 21:03:00 emonpi CRON[8993]: pam_unix(cron:session): session closed for user pi
Jan 17 21:03:31 emonpi CRON[8991]: pam_unix(cron:session): session closed for user pi
Jan 17 21:05:01 emonpi CRON[9170]: pam_unix(cron:session): session opened for user root by (uid=0)
Jan 17 21:05:01 emonpi CRON[9169]: pam_unix(cron:session): session opened for user pi by (uid=0)
Jan 17 21:05:01 emonpi CRON[9179]: (pi) CMD (/home/pi/additional-scripts-djh/tplink-to-emoncms.pl)
Jan 17 21:05:01 emonpi CRON[9180]: (root) CMD (/home/pi/additional-scripts-djh/mk-var-log-djh-scripts)
Jan 17 21:05:02 emonpi CRON[9170]: (CRON) info (No MTA installed, discarding output)
Jan 17 21:05:02 emonpi CRON[9170]: pam_unix(cron:session): session closed for user root
Jan 17 21:05:03 emonpi CRON[9169]: pam_unix(cron:session): session closed for user pi

I’m not sure this sounds much like the other problems people have had.

We’ve had issues like this reoccur over the years, as they are usually quite rare its hard to track down exactly what the cause is, an emonhub restart, or emontx restart seems to fix it. I think something crashes somehow in the rfm69 radios, I think @Robert.Wall or @pb66 may have a better grasp of what that could be than I do.

It sounds like you are running quite an old image @djh and perhaps old firmware, if you dont wish to update and test the latest at this point, perhaps setting up some kind of alert using nodered or similar might be the best solution for the time being?

Have you had this happen before on your system? and how long has it been running since the last time?

Reading your posts again, Im a little confused by these points, did the data come through again after the reboot, or was it the hardware reboot (power cycle or reset button?) that bought the data back online again?

I don’t think I can be of much help - I’ve never studied the sketch in the RFM69Pi. I assume it’s much the same as the emonPi, but even while working on the emonPiCM, I didn’t go into it in great detail. I’ve studied the transmit side of the “classic” JeeLib, but never the receive side.

If I had to make a guess, I’d say that by some mechanism, a character in the message got lost and it’s looking for data that’s never going to come and hasn’t timed out - or it has timed but doesn’t tell anybody.

1 Like

No, I’m fairly happy with the way I have the system set up. Actually I don’t need an alert because I already get one. I have a script that runs hourly and sends email if it spots any problems. So I had an email every hour from 22:00 last night - unfortunately I hadn’t looked at them :frowning:

No it’s a new problem for me, so no last time.

Yes the data came through after a sudo reboot entered via ssh. No power interruption. Last power interruption was a few days ago after a power cut, which the pi survived because of its battery, but unfortunately it lost its LAN connection stack, so needed a hard reboot since I couldn’t access it over the network.

1 Like

One thing I did notice were three new unknown nodes on the input page, but I’ve no idea when they were generated since I don’t normally look at that page. Incidentally, it’s very tedious deleting each input one at a time until eventually the node winks out of existence. Is there any way to delete the entire spurious node more simply?

Put a smart plug on the supply side and just do a hard reset :laughing:.

Phantom (and duplicate) inputs can be dealt with the same way, using the “clean” input api

the api can be called eg by daily cron and any inputs without a processlist will be deleted, so put a redundant process in any legitimate but unused inputs to keep things tidy eg “reset to original” will effective have no effect other than to save an input from deletion.

Ok, so that suggests an old image and explains the lack of logs following a reboot. I still wouldn’t rule out the rfm browning out just yet just because of a UPS, the rfm is far more sensitive to low voltage than a Pi, if the UPS (or any other power source) was operating at the lower end of an acceptable voltage range and dipped just momentarily, the rfm wouldn’t be able to recover. Even a good PSU/UPS can deteriorate with time and/or even temporarily with cold temps. I never rule out PSU issues until I can prove otherwise.

If it happens again please try and save any emonhub.logs, check the service with sudo systemctl status emonhub, try changing a setting in emonhub.conf and try the reset_rfm2pi script in that order and record any info, before resorting to a reboot or power cycle. Perhaps automate these steps with your alert until we get to the bottom of it. If you are near the device when it stops, can you see if the rfm2pi LED is on, off or flashes intermittently (not the pi led) the rfm2pi led is tiny so maybe familiarise yourself with it whilst things are working first.

If you pre-install the script in the thread I linked, then resetting the rfm2pi is just one command reset_rfm2pi, there’s nothing bad in there, the script’s content is in that thread so you can check it out before installing.

Can you also make sure the emonhub logging level is set to debug in emonhub.conf and also set quiet = false so we can see if there is other traffic that maybe interfering with the rfm2pi too.

I would also be keen to know what FW is installed on the rfm2pi, this is also logged to emonhub.log at startup eg

Jan 15 15:44:56 docker4 emonHub[15767]: INFO     MainThread EmonHub emonHub (emon-pi variant) v2.1.5
Jan 15 15:44:56 docker4 emonHub[15767]: INFO     MainThread Opening hub...
Jan 15 15:44:56 docker4 emonHub[15767]: INFO     MainThread Logging level set to DEBUG
Jan 15 15:44:56 docker4 emonHub[15767]: INFO     MainThread Creating EmonHubJeeInterfacer 'RFM2Pi'
Jan 15 15:44:56 docker4 emonHub[15767]: DEBUG    MainThread Opening serial port: /dev/ttyAMA0 @ 38400 bits/s
Jan 15 15:44:58 docker4 emonHub[15767]: INFO     MainThread RFM2Pi device firmware version: [RF12demo.13]
Jan 15 15:44:58 docker4 emonHub[15767]: INFO     MainThread RFM2Pi device current settings:  E i5 g210 @ 433 MHz q1

[borrowed from Hanging hardware, requiring a power cycle - #3 by Guff666]

Also Dave, there should/could be another rotated emonhub log /var/log/emonhub/emonhub.log.1 as emonhub rotates the logs at 5mb.

A smart plug won’t achieve anything since the pi has battery backup.

1 Like

It’s a relatively new unit that has the pi permanently powered from the battery and charges the battery in the background. (It’s an Adafruit PowerBoost 1000 Charger The green light (full charge) is normally on and was when I looked about two hours before the RF failure. The battery would last longer than that (has in previous tests). There was no power failure/glitch noticed by any other system or by the MK1 eyeball

I know which the rfm2pi LED is. I used to watch it but now I can’t because I finally got around to cutting a slot in the case lid so the antenna can poke up so I can fit the lid to keep the dust out. Whose bright idea was it to supply a black case so you can’t see what’s inside? I’m looking for a transparent replacement, ideally just a lid but I’ve no idea where to get that and Adafruit seem to be discontinuing their case but I haven’t got to the bottom of that yet. I don’t know what else would fit.

That’s what kills emon-systems - mine just records errors and warnings because otherwise it fills disks. So I’m extremely reluctant to do that. That’s why the firmware version isn’t logged.

Mmmmm, ok so to recap. We have no LED accessible, no emonhub logs available and you’re not willing to even consider the most common issue of low voltage to rfm. All I can say is I hope it doesn’t happen again and it was just a glitch, because hope is just about all we got in the toolbox.

I have no idea what would give you that impression, the emonhub.logs have been, by far, without doubt, the most useful debug tool this project has had for many years. It is the only piece of software on the emonsd images that rotates it’s own logs when needed and guarantees it cannot possibly fill the 40mb log partition with an absolute max of 2 files under 5mb each. (unless you mess with the logging to use syslog)

I understand running with warning level logs when all is well, but when trying to debug, you need debugging level info for a fighting chance, so even if only temporarily, even just to confirm there is nothing of use there, you really do need to set loglevel to debug. You can wait so see if it occur again or occurs more frequently first if you choose, but with the IT equivalent of shutting your eyes and sticking your fingers in your ears you’re unlikely to find many clues.

1 Like