Having to restart Emonhub to receive data from emonTH

Hi,

A new user here and I seem to have an issue with the data transferring between the emonTH and emonPI in that it seems to work from anything from 3 to 6 hours and then stops. I had been pulling the batteries on the emonTH initially and then restarting the emonPI but worked out the issue was with the emonPI so moved to power cycling that when I saw this issue. I have since worked out if I restart Emonhub the data transfer starts again.

I’ve not found anything searching for this issue and at this point I’m not really sure what I need to be searching for.

Any one got any ideas on how I can resolve this?

Hi welcome.

Can you open the admin page and click on Copy as Markdown next to Server Information and paste it here - no further formatting required.

Hi, requested information below.

Server Information

Server Information

Services

  • emonhub :- Active Running
  • emoncms_mqtt :- Active Running
  • feedwriter :- Active Running - sleep 300s 43 feed points pending write
  • service-runner :- Active Running
  • emonPiLCD :- Active Running
  • redis-server :- Active Running
  • mosquitto :- Active Running
  • demandshaper :- Active Running

Emoncms

Server

  • OS :- Linux 4.19.75-v7+
  • Host :- emonpi | emonpi | (192.168.1.34)
  • Date :- 2020-07-22 08:48:17 BST
  • Uptime :- 08:48:17 up 12:41, 0 users, load average: 0.03, 0.09, 0.09

Memory

  • RAM :- Used: 19.47%
    • Total :- 975.62 MB
    • Used :- 189.98 MB
    • Free :- 785.64 MB
  • Swap :- Used: 0.00%
    • Total :- 100 MB
    • Used :- 0 B
    • Free :- 100 MB
      Write Load Period

Disk

  • / :- Used: 51.11%
    • Total :- 3.92 GB
    • Used :- 2.01 GB
    • Free :- 1.72 GB
    • Write Load :- 806.52 B/s (12 hours 11 mins)
  • /boot :- Used: 20.55%
    • Total :- 252.05 MB
    • Used :- 51.79 MB
    • Free :- 200.26 MB
    • Write Load :- 0 B/s (12 hours 11 mins)
  • /var/opt/emoncms :- Used: 0.05%
    • Total :- 9.98 GB
    • Used :- 4.95 MB
    • Free :- 9.47 GB
    • Write Load :- 68.91 B/s (12 hours 11 mins)
  • /var/log :- Used: 3.91%
    • Total :- 50 MB
    • Used :- 1.95 MB
    • Free :- 48.05 MB
    • Write Load :- n/a

HTTP

  • Server :- Apache/2.4.38 (Raspbian) HTTP/1.1 CGI/1.1 80

MySQL

  • Version :- 5.5.5-10.3.17-MariaDB-0+deb10u1
  • Host :- localhost:6379 (127.0.0.1)
  • Date :- 2020-07-22 08:48:17 (UTC 01:00‌​)
  • Stats :- Uptime: 46230 Threads: 9 Questions: 126359 Slow queries: 0 Opens: 51 Flush tables: 1 Open tables: 42 Queries per second avg: 2.733

Redis

  • Version :-
    • Redis Server :- 5.0.3
    • PHP Redis :- 5.0.2
  • Host :- localhost:6379
  • Size :- 129 keys (837.03K)
  • Uptime :- 0 days

MQTT Server

  • Version :- Mosquitto 1.5.7
  • Host :- localhost:1883 (127.0.0.1)

PHP

  • Version :- 7.3.9-1~deb10u1 (Zend Version 3.3.9)
  • Modules :- apache2handlercalendar Core ctype curl date dom v20031129exif fileinfo filter ftp gd gettext hash iconv json v1.7.0libxml mbstring mosquitto v0.4.0mysqli mysqlnd vmysqlnd 5.0.12-dev - 20150407 - $Id: 7cc7cc96e675f6d72e5cf0f267f48e167c2abb23 $openssl pcre PDO pdo_mysql Phar posix readline redis v5.0.2Reflection session shmop SimpleXML sockets sodium SPL standard sysvmsg sysvsem sysvshm tokenizer wddx xml xmlreader xmlwriter xsl Zend OPcache zlib

Pi

  • Model :- Raspberry Pi 3 Model B Rev 1.2 - 1GB (Embest)

  • Serial num. :- 539EE22D

  • CPU Temperature :- 44.55°C

  • GPU Temperature :- 44.0°C

  • emonpiRelease :- emonSD-17Oct19

  • File-system :- read-write

Client Information

Client Information

HTTP

  • Browser :- Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.89 Safari/537.36
  • Language :- en-US,en;q=0.9

Window

  • Size :- 1903 x 967

Screen

  • Resolution :- 1920 x 1200

Thanks,

On the emonhub page, Do you see DEBUG messages?

If not, please click on the Edit Config button and in the window that appears change the log level to DEBUG.

Click Save - no need to restart emonhub, go back to the log window and you should see DEBUG messages.

You are looking for messages that say NEW FRAME e.g.

2020-07-22 09:54:42,634 DEBUG    RFM2Pi     68399 NEW FRAME : OK 19 221 0 0 0 195 1 26 0 (-66)
2020-07-22 09:54:42,636 DEBUG    RFM2Pi     68399 Timestamp : 1595408082.6345186
2020-07-22 09:54:42,637 DEBUG    RFM2Pi     68399 From Node : 19
2020-07-22 09:54:42,637 DEBUG    RFM2Pi     68399    Values : [22.1, 0, 45.1, 2.6]
2020-07-22 09:54:42,638 DEBUG    RFM2Pi     68399      RSSI : -66

A message that starts NEW FRAME : OK 19 says that there is data from node 19 - that is usually the emonTH node. You will also see a OK 5 that is the emonPi energy monitoring node.

Can you post the equivalent to above please.

Hi, yes I do see those records in the logs although hadn’t realised that the logs are cleared on a restart so I need to get the logs next time it goes wrong before I restart anything.

2020-07-22 08:20:39,505 DEBUG    RFM2Pi     445 NEW FRAME : OK 23 214 0 0 0 10 2 30 0 1 0 0 0 (-41)
2020-07-22 08:20:39,507 DEBUG    RFM2Pi     445 Timestamp : 1595402439.5056157
2020-07-22 08:20:39,514 DEBUG    RFM2Pi     445 From Node : 23
2020-07-22 08:20:39,520 DEBUG    RFM2Pi     445    Values : [21.400000000000002, 0, 52.2, 3, 1]
2020-07-22 08:20:39,525 DEBUG    RFM2Pi     445      RSSI : -41
2020-07-22 08:20:39,531 DEBUG    RFM2Pi     445 Sent to channel(start)' : ToEmonCMS
2020-07-22 08:20:39,537 DEBUG    RFM2Pi     445 Sent to channel(end)' : ToEmonCMS
2020-07-22 08:20:39,549 DEBUG    MQTT       Publishing: emon/emonth5/temperature 21.400000000000002
2020-07-22 08:20:39,551 DEBUG    MQTT       Publishing: emon/emonth5/external temperature 0
2020-07-22 08:20:39,552 DEBUG    MQTT       Publishing: emon/emonth5/humidity 52.2
2020-07-22 08:20:39,553 DEBUG    MQTT       Publishing: emon/emonth5/battery 3
2020-07-22 08:20:39,555 DEBUG    MQTT       Publishing: emon/emonth5/pulsecount 1
2020-07-22 08:20:39,556 DEBUG    MQTT       Publishing: emon/emonth5/rssi -41
2020-07-22 08:20:39,558 INFO     MQTT       Publishing: emonhub/rx/23/values 21.400000000000002,0,52.2,3,1,-41

@dc1970,

When posting code or terminal output, please put 3 backticks (top left of keyboard) on a line before and on a line after, so it formats correctly.

The emonhub logs rotate at 500 Mb so if it continues to receive data from elsewhere, the logs will rotate.

Do you have more than one emonTH? If so, is it just that one that fails?

No this is my only one, only received the emonPI and emonTH on Friday and first spotted the issue on Saturday.

Have you enabled SSH & are you comfortable with using it?

No I haven’t enabled it but have used in previously with other pi’s.

Ok good. I suggest 2 sessions

tail -f /var/log/emonhub/emonhub.log | grep "NEW FRAME : OK 23"

And you will see when data is received (or not)

journalctl -f -u emonhub

should pick up any exceptions in the python.

OK, thanks for your help, I will give it a try later.

Hi,

I don’t know if this gives anymore insight into the issue I am seeing or not but I have seen the feed from the emonTH stop twice now with this message displayed.

2020-07-24 19:33:29,527 DEBUG    RFM2Pi     8157 NEW FRAME : OK 23 28 1 0 0 135 1 31 0 1 0 0 0 (-73)
2020-07-24 19:34:28,819 DEBUG    RFM2Pi     8170 NEW FRAME : OK 23 28 1 0 0 136 1 31 0 1 0 0 0 (-74)
2020-07-24 19:35:28,019 DEBUG    RFM2Pi     8183 NEW FRAME : OK 23 28 1 0 0 136 1 31 0 1 0 0 0 (-75)
2020-07-24 19:36:27,383 DEBUG    RFM2Pi     8196 NEW FRAME : OK 23 28 1 0 0 136 1 31 0 1 0 0 0 (-74)
tail: /var/log/emonhub/emonhub.log: file truncated

Did you confirm in emoncms that the emonTH stopped updating when that message came up?

The message indicates that the file you are “tailing” was truncated, truncating the log file is part of the new log rotation process and appears to be simply kicking you out of the tail, if you were to repeat the tail command you would continue to see the updates (if they are still happening). I think this is not linked to your initial issue as emonhub continues to function 100% when the logs are truncated (or at least it used to).

When it fails next, can you check the services on the admin page to see if the emonhub and MQTT services are running? Also can you confirm if any other data continues when the emonTH data stops?

Could this be related to the cause of the truncation?

With --follow (-f), tail defaults to following the file descriptor,
which means that even if a tail’ed file is renamed, tail will
continue to track its end. This default behavior is not desirable
when you really want to track the actual name of the file, not the
file descriptor (e.g., log rotation). Use --follow=name in that
case. That causes tail to track the named file in a way that
accommodates renaming, removal and creation.

Ref: tail(1) - Linux manual page

I ran it again all day yesterday without a failure until after I shut down the computer I was monitoring it on.

From the previous failures I can confirm that I have seen that the emonTH feed stopped updating at the same time as the last message was recorded in the SSH session. What I cant confirm at the moment is when the truncate message appeared but as I said previously I have seen the emonTH inputs stop updating twice with this message being the last thing in the SSH session watching the log file. I have seen the truncate message at other times in the SSH sessions I have been running and it has continued happily. I have checked the admin page when this happens and everything looks to be running normally but do bare in mind I have only been using this for just over a week.

I also noticed the last time this happened I had to restart emonHub twice before the feed started again.

This is as I expected, the “truncate” error is a red herring regards the issue you are seeking to cure. It’s more an unhelpful annoyance than anything.

This would be normal as you are specifically filtering only the emonTH messages so when the data stops the messages stop (or vice versa?).

Again, to be expected. When the emonTH stops the tail session will continue to be live, albeit with no activity due to the lack of emonTH messages, until the log file is rotated again causing the truncate error.

Not really, the reason the logrotation uses copytruncate is to preserve BOTH the file descriptor and the file name so that the program (emonhub) can continue to log to the exact same file with no change, it’s a rather crude way of getting around handling the rotations in a more robust way.

Although I’m no expert, I’m guessing the tail get orphaned by the truncate because one moment it is tailing the END of a file that suddenly is no longer part of that file as the file is reduced to 0 bytes.

The biggest hurdle here is the size of the logs, originally 2x 5M, they were replaced by 2x 1M which was found time and time again to be way too small and now they are apparently 0.5M.

I think that was supposed to be “500 Kb” as the whole log partition is only 50M for all current logs.

I think the “truncate” message will just need to be “put up with” when using tail. When ever that message pops up just end the tail with ctrl-c, then use the up arrow and enter. That will repeat the last command and the tail will continue, independent of whether the emonTH has failed or not.

Yes sorry (edited).

So after the truncate message, you go no further data from node 23? If so, very odd. However, you do not know whether the tail stopped before the truncate message. i.e. last message at 19:36, truncate message 30 minutes later.

It certainly tells me the RFM stops getting data from the TH which makes me think it is faulty. @Gwil in the shop may be able to help.

emonhub logs are rotated in code by emonhub, not by logrotate.

Not really I have explained above already that the truncate error happens independent of the actual error and therefore one must come first and it can be either. if the file truncates before the emonTH stops the tail output stops because of the truncate error or if the emonTH stops the tail session will continue until the file is truncated. It’s a red herring. Of course they can happen around the same time, but even if they don’t it could appear they had.

Whilst it is too early to rule out a rare faulty unit, it could also be the antenna touching the gpio pins, not seated correctly or even more likely a PSU isssue since the rfm will always be the weakest link when it comes to power stability.

The emonPi has a watchdog that reinitialises the rfm every so often, that isn’t in the rfm firmware, but there again the latest firmware has been pretty stable. Assuming this RFM69Pi has the latest FW and isn’t from the back of shelf.

If the RFM69Pi was physically faulty it is less likely to come to life after emonhub is restarted, that would suggest the device is being reset by emonhub restarting the serial connection. This is how the RFM issues used to manifest some years ago, there are many threads on it and at least one of them has a little test script that the user can execute instead of restarting emonhub to reset the RFM2Pi and if the data starts, we know it’s the age old issue and probably a PSU issue if running the latest FW.

When did that change? As I recall all the emonSD images have always rotated the emonhub logs way before the emonhub hardcoded rotation at 5M. Originally by logrotate only at 1M then at 3M by log2ram+logrotate more recently. I was surprised by your comment about only 0.5M but took it you knew something I didn’t.

The emonSD has a global rotation size 100k

however the logrotate config for emonhub has maxsixe 3M

and if no logrotation takes place then eventually emonhub will rotate itself at 5M

so if the rotation is happening at 0.5M I’m not sure where that is coming from.

Nope. I see these truncate errors regularly when tailing the emonhub log, and the tail just keeps going. It isn’t an error, it is an info message telling you the file has been truncated.

I’m certain the output of the tail has stopped before the truncate message (i.e. the receipt of data has stopped).

No idea when it changed, I just know it does. There was a long discussion because at one point the service file didn’t pass the log file parameter so the logs were not rotated. Just checked it is 5M.

Still a faulty unit IMHO as it is new.

@dc1970 - did you buy a DC 5V PSU with the emonPi?

[edit]
Just double checked the logrotate config and that has a maxsize of 3M. Actually, it should be larger than the hardcoded value else there is a risk of conflict. logrotate should be a backstop I’d suggest. It will depend on how much you are logging of course.

@TrystanLea comment?