RFM69Pi stops updating/freezes

This thread is a continuation of a thread started in the old forum.

A number of users have experienced their RFM69Pi module stopping or freezing, causing emoncms to stop updating its RF-derived feeds.
This can occur at any time, often after weeks or even months of running. My RFM69Pi, however, has chosen to stop more frequently, making it a good testbed for identifying the cause of the problem.

@pb66 has been the driving force behind solving this, and I’m grateful for his support & help.

NOTE - Before assuming that you have this ‘firmware’ issue, please check the symptoms described in the old forum thread first, AND also check this thread relating to physical connection / shorting issues that have been experienced with the RFM2Pi boards.

Paul

Thanks for moving this thread over. However, I’m slightly sad that it’s still running…

Has there been any progress on what the issue could be and how it could be solved? The RFM69Pis we have been running in the lab and at home have been stable.

Stitching together the ragged ends of the corresponding thread in the old forum.

I asked:

I gather that the serial comms path is two way - a command stream for changing various RF Module parameters (infrequent) and received packet chat (frequent) streaming back. A command gets a response, I can see that in the logs. My question was for the packet stream - does that get echoed back?

Reply from pb66:

The commands are echoed back from the RFM2Pi sketch as they are in the RF12 demo with a “>” prefix, emonhub then logs those as an “acknowledged command:”

Ok, got that direction thanks. How about a packet report travelling the other way?
i.e. does something such as a ? 10 147 0 0 0 0 0 120 95 245 7 (-68) get echoed back to the originator since the read is posted with echo enabled?

The reason I ask is that the RF12demo command processor is bare bones, injecting a string like that will force some sort of undesired response.

Not to my knowledge or by design in emonhub.

emonhub has the ability to write to the serial port => the rfm2pi, not only for the RFM settings, quiet etc. but also to send packets out onto the network (currently only really used for the emonGLCD time updates). In between settings and/or transmissions, the outgoing serial line from the Pi to the rfm2pi rx should not be in use.

Many users do not know the serial port can only be used by 2 devices, and there is software like node-red on the emonSD that could interrupt/interject, but I certainly do not knowingly “allow” anything else access to the serial port.

These “3’s a crowd” instances are normally reported as emonhub crashing or at least not getting any input, and usually stem from the Pi’s serial console access not being removed from /boot/cmdline or the use of avrdude without stopping emonhub.

I can see there could be something else getting involved, but in these latest FWs I removed the auto printing of the full help text when an unknown settings choice is received and replaced it with a “catch all else” response of “unknown option - type h for help”. So even an unexplained choice would hopefully get recorded in emonhub’s logs as “non-numerical data”, but I could look closer at the sketch and emonhub code to be sure it isn’t silently discarded.

Ok, understood. I did a quick check with the unmodified RF12demo and the parentheses from … 10 147 0 0 0 0 0 120 95 245 7 (-68) trigger a full command list response. Looks like you have already caught that one.

This is part of an effort to force a lockup with the current driver. If the comms path is very busy (but hopefully no character truncation/corruption/buffer overflow etc.), this would expose any tiny interrupt service glitch window faster.

An easily reproducible failure case is still proving elusive…

Hi,

Not sure if it’s related but I recently have been experiencing problems where I stop getting updates from emonhub. There’s nothing in the emonhub log file to indicate that it’s received any values from the rf module. I opened the RF module using screen (whilst emonhub was running). It seemed responsive to commands, e.g. ‘v’, etc. The modes seem correct (although I don’t remember if quiet mode was set - I think it wasn’t, and don’t have a copy of the screen session :(). However, there were no packets being displayed (neither good nor bad ones). The EmonTX was still transmitting (I had previously had this problem and fixed it by restarting emonhub).

Running the ‘q’ command immediately brought things to life: I started seeing packets (good and bad) again, and emonhub was also seeing them and updating emoncms.

I’m running the sketch that is put onto the device when it’s shipped as a standalone unit (the RFM2Pi):

[RF12demo.12] E i5 g210 @ 433 MHz q1

I don’t have time to look at the code at the moment but can maybe dig more later.

Thanks,
Andy.

Which RFM2Pi are you using? rfm69pi, rfm2pi (rfm12) or rfm2pi (rfm69)?

Which version of emonhub are you using? Original v1.2 or “emon-pi” v??

When you say you used “screen” I assume you started a serial console like minicom or similar? There can only be 2 parties involved with the serial port at any one time, so minicom could block emonhub and/or vice versa. The “q” function only tells the FW whether to print the data or not; it doesn’t do anything significant enough to nudge things back to life as far as we have seen previously. So that was probably coincidental, or the result of another action that was previously blocked by multiple parties on the line.

There is specific firmware in use in this thread so that we know we are all using the same version. The [RF12demo.12] label covers many releases, updates and edits, so there is no real way of knowing what you have beyond when you bought it and how long it could have been on the shelf or had the FW installed prior to that. Plus we cannot know what compiler and/or library versions were used, etc.

When the data stops, does the LED on the RFM2Pi continue to flash (assuming it does normally flash on receipt of a packet)? The issue on this thread has been narrowed to the point where we can nudge the device back to life by setting the node id, group or frequency. There is a utility script for this purpose in the original thread on the old forum (link in the post at the top of this thread), but you can do it via emonhub very easily. Since emonhub will only apply changes, you will need to change a setting, save the conf and then change it back to trick emonhub into resending the original setting.
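As a rough illustration of the settings "nudge" described above, here is a minimal Python sketch. It is not the utility script from the old forum thread. It assumes the RF12demo-style single-letter commands ("4b" for the 433 MHz band appears later in this thread; the "g" and "i" suffixes for group and node id, and the 8/9 band codes, are assumptions taken from the usual RF12demo conventions). The function names are hypothetical, and an in-memory buffer stands in for the serial port so the example runs anywhere.

```python
import io
import re

# Assumed RF12demo band codes: 4 = 433 MHz, 8 = 868 MHz, 9 = 915 MHz.
BAND_CODES = {433: 4, 868: 8, 915: 9}


def parse_banner(banner):
    """Pull node id, group and frequency out of a banner line such as
    '[RF12demo.12] E i5 g210 @ 433 MHz q1'."""
    match = re.search(r"i(\d+)\s+g(\d+)\s+@\s+(\d+)\s+MHz", banner)
    if match is None:
        raise ValueError("unrecognised banner: %r" % banner)
    node, group, freq = (int(x) for x in match.groups())
    return {"node": node, "group": group, "freq": freq}


def nudge_commands(settings):
    """Build commands that re-send the settings the module already has."""
    return [
        "%db" % BAND_CODES[settings["freq"]],  # frequency band
        "%dg" % settings["group"],             # network group
        "%di" % settings["node"],              # node id
    ]


def send_nudge(port, settings):
    """Write each command to any object that has a write(bytes) method."""
    for command in nudge_commands(settings):
        port.write(command.encode("ascii"))


settings = parse_banner("[RF12demo.12] E i5 g210 @ 433 MHz q1")
fake_port = io.BytesIO()  # stand-in for a real serial port object
send_nudge(fake_port, settings)
print(fake_port.getvalue())
```

On a real Pi the `fake_port` would be replaced by an opened serial port object (for example /dev/ttyAMA0 at 38400 baud, as used with screen elsewhere in this thread), with emonhub stopped first so that only two parties are on the line. A short pause between commands may also be sensible.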

What sketch is the emonTx running? What is a typical packet? Can you post a log excerpt?

Just for info, mine has stopped 3 times in the past 3 days, but has been successfully restarted each time after 5 minutes by a node-red watchdog/script.
I’ve temporarily reset the watchdog back to 30 seconds again, but can change it back again if any ‘fixes’ need testing.
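
The watchdog mentioned here is a node-red flow, which is not reproduced in this thread. Purely as a sketch of the same idea in Python, the hypothetical class below flags a stall when no packet has arrived within a timeout (300 seconds here, matching the 5 minutes above). The clock is injectable so the behaviour can be checked without waiting.

```python
import time


class PacketWatchdog:
    """Flags a stall when no packet has been seen for `timeout` seconds."""

    def __init__(self, timeout, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock
        self.last_seen = clock()

    def packet_received(self):
        # Call this every time a packet arrives from the RF module.
        self.last_seen = self.clock()

    def stalled(self):
        # True once the gap since the last packet exceeds the timeout.
        return self.clock() - self.last_seen > self.timeout


# Demonstration with a fake clock instead of real elapsed time.
now = [0.0]
watchdog = PacketWatchdog(timeout=300, clock=lambda: now[0])

now[0] = 120.0
watchdog.packet_received()   # last packet seen at t = 120 s

now[0] = 400.0
early = watchdog.stalled()   # 280 s of silence, still within the timeout

now[0] = 421.0
late = watchdog.stalled()    # 301 s of silence, now flagged as stalled

print(early, late)
```

What happens when `stalled()` returns true (re-sending settings, restarting a service, logging) is left open here, since the thread does not say what the node-red script actually does.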

Paul

Perhaps @emjay can expand on how we can extract some data from your RFM module. Although we cannot reproduce the issue on demand, yours does seem to fault frequently enough to make some progress.

Maybe once we can gather some more data on the “hung status” we can mimic that on-demand to speed things up while trying for a solution and then roll out a candidate for a somewhat slower “real world” test.

It’s the RFM69Pi (I believe - not sure what the latter two are?)

I’m using a modified version of the emon-pi branch. My changes are on github here: GitHub - adpeace/emonhub: emonHub is the link between your inputs and emonCMS

My changes are mostly around emonhub config handling (removing a lot of code) - the interfacers are unmodified.

I was surprised to be able to connect to the serial console too but it definitely works - I can also connect to it and see the packets whilst emonhub is running (and still posting to emoncms) using screen as root. Screen is GNU/Screen: Screen - GNU Project - Free Software Foundation - I connected using screen /dev/ttyAMA0 38400. Maybe the two are racing to read input and so some stuff gets through to emoncms and others to screen?

However, I’m pretty sure my actions did cause the module to come back to life - it had been dead for days previously and suddenly started working. It may have been one of the other commands, admittedly - I issued other commands prior to ‘q’ thinking they hadn’t worked.

Fair enough - my issue may also be different to the others’. It doesn’t seem to happen very often and I can’t reproduce it deterministically either. It was bought last November from the OEM shop but as you say it could have anything on it. When I get more time, if there is a sketch you prefer to test with, I can flash it with that.

Not sure about the LED. If it happens again I’ll check it.

Whatever it was shipped with, it’s a v3 device. I don’t think it’s the emontx that is problematic since in all cases restarting emonhub has fixed this.

Here’s an example packet:

OK 10 147 1 0 0 0 0 0 0 184 95 184 11 184 11 184 11 184 11 184 11 184 11 1 0 (-31)

My main interest is in helping to find/fix any issues - this isn’t in a critical installation so I’m not too worried about supporting it.

They are just earlier versions: the first ATmega328p “v2” boards had rfm12b modules, then these got swapped out for rfm69cw’s on the same board, then the v3 board was updated to break out the extra IO to a new GPIO header and called “RFM69Pi”.

Ok, so ignore my previous comment about using emonhub to “nudge” the settings: you have removed the “runtimesettings” code, so that won’t work. I see you have also removed the “thread is dead” check. This was added because the interfacers are threaded and the main thread has no way of knowing if the child process is still there or not. The interfacers cannot be relied upon to inform the main thread (or the user/log) once they have crashed, so IF the RFM2Pi thread was crashing you wouldn’t know unless you started checking and counting python processes.

It’s not common for the RFM2Pi thread to crash as it has more error handling and sanitation than the other types eg the serial interfacer. The fact it is unlikely and not an obvious thing to look for was the reason for the “thread is dead” check.
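
To illustrate the idea behind a “thread is dead” check, here is a minimal sketch using Python's standard `threading` module. It is not emonhub's actual code: the function names, the dictionary of interfacers and the worker functions are all hypothetical stand-ins. The point is only that the main thread has to poll `is_alive()` itself, because a worker thread that has exited will not announce it.

```python
import threading


def dead_threads(interfacers):
    """Return the names of interfacer threads that are no longer running."""
    return [name for name, thread in interfacers.items()
            if not thread.is_alive()]


def healthy_worker(stop_event):
    # Stands in for an interfacer that keeps running until told to stop.
    stop_event.wait()


def exited_worker():
    # Stands in for an interfacer whose run loop has ended unexpectedly.
    return


stop = threading.Event()
interfacers = {
    "RFM2Pi": threading.Thread(target=exited_worker, daemon=True),
    "other": threading.Thread(target=healthy_worker, args=(stop,), daemon=True),
}
for thread in interfacers.values():
    thread.start()

interfacers["RFM2Pi"].join()      # make sure the short-lived thread has ended
dead = dead_threads(interfacers)  # what a periodic check in the main loop would see
print(dead)

stop.set()                        # let the healthy worker finish
```

A main loop would typically call something like `dead_threads()` at intervals and log or restart whatever it finds. How often to poll is a trade-off against CPU use on a small Pi.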

I have just tested GNU screen and it does block emonhub from seeing any data whilst connected, so anything you try during any one session will only be seen to take effect in emonhub as you exit screen.

In future when you use screen, can you note if you see any data flow when you first open screen? If not, just type “4b”. If the data then starts in screen, exit screen and it should be active again in emonhub.
[quote=“adpeace, post:10, topic:137”]
Fair enough - my issue may also be different to the others’. It doesn’t seem to happen very often and I can’t reproduce it deterministically either. It was bought last November from the OEM shop but as you say it could have anything on it. When I get more time, if there is a sketch you prefer to test with, I can flash it with that.
[/quote]

The “4b” test above should confirm if it’s the same issue. You are welcome to use the FW from this thread to chip in with testing etc. The differences are very minimal and there isn’t anything in the changes expected to effect a change yet; it is just to level the playing field and give us a known start point.
[quote=“adpeace, post:10, topic:137”]
Not sure about the LED. If it happens again I’ll check it.
[/quote]

I recommend checking it whilst it is working to note the position, colour and intensity. It is not easy to see even when there is no fault, so thinking it is not flashing during a fault condition when it actually is can be an easy mistake to make.
[quote=“adpeace, post:10, topic:137”]
I don’t think it’s the emontx that is problematic since in all cases restarting emonhub has fixed this.
[/quote]

Fair comment. I wasn’t thinking the emonTx was failing, but that the packet it sends may have been problematic to the rfm2pi and a reset gave it another go, e.g. the “zero runs” mentioned in other threads. But I see the “300” (184 11) positive fault reporting is in play, so you must have the later FW.

There is a bit of a discrepancy with the packet, though, that could be relevant depending on where you fetched it from, your current emonhub.conf settings and whether this is a consistent format. The “data” part is 24 byte values representing 12 signed ints:

147 1 0 0 0 0 0 0 184 95 184 11 184 11 184 11 184 11 184 11 184 11 1 0

The payload for the current sketch is 26 bytes, the last value is an unsigned Long and should be 4 bytes not the two, " 1 0 " you see. If the conf matches the FW and the length is constant it should pose no problem which is most probably the case, but worth checking out.
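
The byte arithmetic above can be checked with a short Python sketch. It assumes the usual little-endian byte order of the ATmega and reads “11 signed ints plus one unsigned long” as the 26-byte layout, which is inferred from the sizes quoted rather than taken from the sketch source. The `parse_report` helper is hypothetical and is not emonhub's parser.

```python
import struct

LINE = ("OK 10 147 1 0 0 0 0 0 0 184 95 184 11 184 11 184 11 "
        "184 11 184 11 184 11 1 0 (-31)")


def parse_report(line):
    """Split an 'OK <node> <bytes...> (<rssi>)' line into its parts."""
    fields = line.split()
    if fields[0] != "OK":
        raise ValueError("not a good-packet report: %r" % line)
    rssi = None
    if fields[-1].startswith("("):
        rssi = int(fields.pop().strip("()"))  # trailing '(-31)' is the RSSI
    node = int(fields[1])
    payload = bytes(int(f) for f in fields[2:])
    return node, payload, rssi


node, payload, rssi = parse_report(LINE)

# Read the payload as consecutive little-endian signed 16-bit integers.
as_int16 = struct.unpack("<%dh" % (len(payload) // 2), payload)

# Size of 11 signed 16-bit ints followed by one unsigned 32-bit long.
expected_size = struct.calcsize("<11hL")

print(node, rssi, len(payload))
print(as_int16)
print(expected_size)
```

Read this way, the pair 184 11 decodes to 3000 (184 + 11 × 256) and 184 95 to 24504, while the final “1 0” is only two bytes, so the 24-byte payload is two bytes short of the 26 that the assumed layout needs.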

OK, thanks - it’s definitely the newer one with the RFM69CW.

I removed it in an effort to simplify the code and reduce CPU usage (that particular loop was consuming about 10% of CPU on the Pi 1 IIRC). I don’t think that was the cause of the issue though, since interacting with the module through screen brought emonhub back to life.

Will do - this time around there was no data flowing, but I haven’t run into the problem again yet. When I get a chance I will reflash to the firmware on this thread. Is it a newer version of the firmware the modules are shipping with?

Hmm ok; this was copy/pasted from screen directly.

Paul (Reed),

Great that this has been brought over from the previous thread but could you edit your top post to add in some of the things that have been seen to be causing this to make sure anyone posting has made sure that these aren’t the cause.

So things like making sure that the rfm module is seated correctly and not sitting skew due to the aerial wire pushing the module down, making sure that the underside of the aerial connector isn’t touching any of the pi connectors (found mine was doing that again a week or so ago after another move) and also making sure that the pi has adequate power.

If you make sure that your top post includes these then we may have fewer issues…

Simon

I’ve added some links to the first post @Bramco, feel free to contribute if you think anything’s missing.

Paul
