Having to restart Emonhub to receive data from emonTH

No, this is my only one. I only received the emonPi and emonTH on Friday and first spotted the issue on Saturday.

Have you enabled SSH & are you comfortable with using it?

No, I haven’t enabled it, but I have used it previously with other Pis.

OK, good. I suggest two SSH sessions:

tail -f /var/log/emonhub/emonhub.log | grep "NEW FRAME : OK 23"

And you will see when data is received (or not)

journalctl -f -u emonhub

should pick up any exceptions in the Python.
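If you would rather trawl back through what has already been logged instead of watching live, something along these lines should work (the time window and search terms are just suggestions, adjust to suit):

journalctl -u emonhub --since "24 hours ago" | grep -iE "error|exception|traceback"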

OK, thanks for your help, I will give it a try later.

Hi,

I don’t know if this gives any more insight into the issue I am seeing, but I have now seen the feed from the emonTH stop twice with this message displayed:

2020-07-24 19:33:29,527 DEBUG    RFM2Pi     8157 NEW FRAME : OK 23 28 1 0 0 135 1 31 0 1 0 0 0 (-73)
2020-07-24 19:34:28,819 DEBUG    RFM2Pi     8170 NEW FRAME : OK 23 28 1 0 0 136 1 31 0 1 0 0 0 (-74)
2020-07-24 19:35:28,019 DEBUG    RFM2Pi     8183 NEW FRAME : OK 23 28 1 0 0 136 1 31 0 1 0 0 0 (-75)
2020-07-24 19:36:27,383 DEBUG    RFM2Pi     8196 NEW FRAME : OK 23 28 1 0 0 136 1 31 0 1 0 0 0 (-74)
tail: /var/log/emonhub/emonhub.log: file truncated

Did you confirm in emoncms that the emonTH stopped updating when that message came up?

The message indicates that the file you are “tailing” was truncated. Truncating the log file is part of the new log rotation process and appears simply to be kicking you out of the tail; if you repeat the tail command you will continue to see the updates (if they are still happening). I don’t think this is linked to your initial issue, as emonhub continues to function 100% when the logs are truncated (or at least it used to).

When it fails next, can you check the services on the admin page to see if the emonhub and MQTT services are running? Also can you confirm if any other data continues when the emonTH data stops?
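As an alternative to the admin page, the same services can be checked from an SSH session with something like this (unit names as they appear on the admin page; adjust if yours differ):

systemctl status emonhub mosquitto emoncms_mqtt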

Could this be related to the cause of the truncation?

With --follow (-f), tail defaults to following the file descriptor,
which means that even if a tail’ed file is renamed, tail will
continue to track its end. This default behavior is not desirable
when you really want to track the actual name of the file, not the
file descriptor (e.g., log rotation). Use --follow=name in that
case. That causes tail to track the named file in a way that
accommodates renaming, removal and creation.

Ref: tail(1) - Linux manual page
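As an aside, and purely a suggestion: GNU tail’s -F is shorthand for --follow=name --retry, so it tracks the file by name and re-opens it if it is ever removed or recreated. With copytruncate the name never changes, so it makes little practical difference here, but it does no harm to use it:

tail -F /var/log/emonhub/emonhub.log | grep "NEW FRAME : OK 23"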

I ran it again all day yesterday without a failure; it only failed after I shut down the computer I was monitoring it on.

From the previous failures I can confirm that the emonTH feed stopped updating at the same time as the last message was recorded in the SSH session. What I can’t confirm at the moment is when the truncate message appeared, but as I said previously, I have seen the emonTH inputs stop updating twice with this message being the last thing in the SSH session watching the log file. I have also seen the truncate message at other times in the SSH sessions I have been running, and everything carried on happily. I have checked the admin page when this happens and everything looks to be running normally, but do bear in mind I have only been using this for just over a week.

I also noticed the last time this happened I had to restart emonHub twice before the feed started again.

This is as I expected: the “truncate” error is a red herring as regards the issue you are seeking to cure. It’s more an unhelpful annoyance than anything.

This would be normal as you are specifically filtering only the emonTH messages so when the data stops the messages stop (or vice versa?).

Again, to be expected. When the emonTH stops the tail session will continue to be live, albeit with no activity due to the lack of emonTH messages, until the log file is rotated again causing the truncate error.

Not really. The reason the log rotation uses copytruncate is to preserve BOTH the file descriptor and the file name, so that the program (emonhub) can continue to log to the exact same file with no change; it’s a rather crude way of getting around handling the rotations more robustly.
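For illustration only (the actual emonSD config may well differ), a copytruncate stanza in a logrotate config looks something like this:

/var/log/emonhub/emonhub.log {
    maxsize 3M
    rotate 2
    copytruncate
    missingok
    notifempty
}

The copy is made first and the live file is then truncated in place, which is why the name and the file descriptor never change and why tail reports “file truncated”.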

Although I’m no expert, I’m guessing the tail gets orphaned by the truncate because one moment it is tailing the END of a file, and suddenly that position is no longer part of the file as it is reduced to 0 bytes.

The biggest hurdle here is the size of the logs: originally 2x 5M, they were replaced by 2x 1M, which was found time and time again to be way too small, and now they are apparently 0.5M.

I think that was supposed to be “500 Kb” as the whole log partition is only 50M for all current logs.

I think the “truncate” message will just need to be “put up with” when using tail. Whenever that message pops up, just end the tail with Ctrl-C, then use the up arrow and Enter. That will repeat the last command and the tail will continue, independent of whether the emonTH has failed or not.
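Another option, purely as a suggestion, is less in follow mode:

less +F /var/log/emonhub/emonhub.log

Ctrl-C drops you out of follow mode into normal less, Shift-F resumes following, and typing &OK 23 at the less prompt filters the display to just the matching lines.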

Yes sorry (edited).

So after the truncate message, you got no further data from node 23? If so, very odd. However, you do not know whether the tail output stopped before the truncate message, i.e. last message at 19:36, truncate message 30 minutes later.

It certainly tells me the RFM stops getting data from the TH, which makes me think it is faulty. @Gwil in the shop may be able to help.

emonhub logs are rotated in code by emonhub, not by logrotate.

Not really. I have already explained above that the truncate message happens independently of the actual error, so one must come first and it can be either: if the file truncates before the emonTH stops, the tail output stops because of the truncate message; if the emonTH stops first, the tail session will continue until the file is truncated. It’s a red herring. Of course they can happen around the same time, but even if they don’t, it could appear that they had.

Whilst it is too early to rule out a rare faulty unit, it could also be the antenna touching the GPIO pins or not being seated correctly, or, even more likely, a PSU issue, since the RFM will always be the weakest link when it comes to power stability.

The emonPi has a watchdog that reinitialises the RFM every so often; that isn’t in the RFM firmware, but then again the latest firmware has been pretty stable, assuming this RFM69Pi has the latest FW and isn’t from the back of the shelf.

If the RFM69Pi were physically faulty it would be less likely to come back to life after emonhub is restarted; that behaviour suggests the device is being reset by emonhub restarting the serial connection. This is how the RFM issues used to manifest some years ago. There are many threads on it, and at least one of them has a little test script that the user can execute instead of restarting emonhub to reset the RFM2Pi; if the data starts again, we know it’s the age-old issue, and probably a PSU issue if running the latest FW.
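I don’t have that script to hand, but as a rough manual check you can read the raw frames straight off the serial port yourself. Stop emonhub first (only one process can hold the port), and note that /dev/ttyAMA0 at 38400 baud are just the usual emonhub defaults, so check emonhub.conf if yours differ:

sudo systemctl stop emonhub
stty -F /dev/ttyAMA0 38400 raw -echo
timeout 120 cat /dev/ttyAMA0
sudo systemctl start emonhub

If OK 23 … lines appear there when emonhub had stopped seeing them, the radio side is fine and the problem is further up the chain; if nothing appears even here, that points back at the RFM board or its power supply.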

When did that change? As I recall, all the emonSD images have always rotated the emonhub logs well before the emonhub hardcoded rotation at 5M: originally by logrotate alone at 1M, then more recently at 3M by log2ram+logrotate. I was surprised by your comment about only 0.5M, but took it that you knew something I didn’t.

The emonSD has a global rotation size of 100k,

however the logrotate config for emonhub has maxsize 3M,

and if no log rotation takes place then eventually emonhub will rotate itself at 5M,

so if the rotation is happening at 0.5M I’m not sure where that is coming from.
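To pin down which limit is actually biting, it is easy enough to look on the Pi itself. These paths are the usual Debian/emonSD ones, so adjust if yours differ; the last command just shows how big the rotated file actually got:

grep -i size /etc/logrotate.conf
cat /etc/logrotate.d/emonhub
ls -lh /var/log/emonhub/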

Nope. I see these truncate errors regularly when tailing the emonhub log, and the tail just keeps going. It isn’t an error, it is an info message telling you the file has been truncated.

I’m certain the output of the tail has stopped before the truncate message (i.e. the receipt of data has stopped).

No idea when it changed; I just know it does. There was a long discussion because at one point the service file didn’t pass the log file parameter, so the logs were not rotated. Just checked: it is 5M.

Still a faulty unit IMHO as it is new.

@dc1970 - did you buy a DC 5V PSU with the emonPi?

[edit]
Just double-checked the logrotate config and that has a maxsize of 3M. Actually, it should be larger than the hardcoded value, else there is a risk of conflict; logrotate should be a backstop, I’d suggest. It will depend on how much you are logging, of course.

@TrystanLea comment?

@dc1970, did you try running this? Any exception in the Python would be seen there.

Hi,

I did purchase the power adapter with the emonPi, and I have been running the journalctl command mentioned above and have not seen any issues reported.

One observation I can add is that the comms between the emonPi and the emonTH seem a lot more reliable when I have an SSH session connected. I don’t know the intricacies of the comms between the units at this point, but it is starting to look more and more like either a power-saving feature that goes into sleep mode, or a comms failure that exceeds its max retries and then fails silently.

Can the logging output be changed through the config, or would I need to modify the code, which I believe is available on GitHub? If I modify the code, can a custom version be loaded onto the emonPi, and is there a guide for doing it? I haven’t done any Pi software development, but I am a software developer, so I would be capable of making changes with a bit of time/effort.

Thanks

David
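On the logging question, for what it’s worth: the verbosity at least is normally set in emonhub.conf rather than in the code, under the [hub] section, something like:

[hub]
    loglevel = DEBUG

Anything beyond changing the level would indeed mean editing the Python, which (on the standard image, I believe) lives under /opt/openenergymonitor/emonhub/.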

But unless you were actually watching it, you don’t know when the last RFM message was returned in relation to the log message. You could try this (I’m running it to see if it works):

tail -f /var/log/emonhub/emonhub.log | perl -pe 's/^/$_=qx(date +%T_); chomp; $_/e' | grep "NEW FRAME : OK 19"
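If the perl one-liner misbehaves, a plain shell equivalent that stamps each matching line (node 23 in your case; --line-buffered stops grep holding lines back when it is piped) would be:

tail -F /var/log/emonhub/emonhub.log | grep --line-buffered "NEW FRAME : OK 23" | while read -r line; do echo "$(date +%T) $line"; done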

I do appreciate what you are seeing, but really don’t think the logging or the SSH session being closed has anything to do with it as there are 100s of these devices out in the wild and you seem to be the only one with an issue.

If you just restart emonhub (i.e. from the Web UI) this restarts the data from the emonTH - correct?

Have you run an update?

If so, then you could try reflashing the SD card with a fresh image, on the off chance there is something not quite right.

Otherwise, I think you need to email the shop and ask them. You could perhaps try a second emonTH; if both do the same, I suggest it is probably a hardware fault on the emonPi. When you email, please reference this discussion.

Here is something I hadn’t noticed before: this is the top of the admin page, and all the logs are empty.

And here is the server info at the same time:


Server Information

Services

  • emonhub :- Active Running
  • emoncms_mqtt :- Active Running
  • feedwriter :- Active Running - sleep 300s 225 feed points pending write
  • service-runner :- Active Running
  • emonPiLCD :- Active Running
  • redis-server :- Active Running
  • mosquitto :- Active Running
  • demandshaper :- Active Running

Emoncms

Server

  • OS :- Linux 4.19.75-v7+
  • Host :- emonpi | emonpi | (192.168.1.34)
  • Date :- 2020-07-27 09:29:43 BST
  • Uptime :- 09:29:43 up 5 days, 13:23, 0 users, load average: 0.13, 0.11, 0.09

Memory

  • RAM :- Used: 20.74%
    • Total :- 975.62 MB
    • Used :- 202.31 MB
    • Free :- 773.31 MB
  • Swap :- Used: 0.00%
    • Total :- 100 MB
    • Used :- 0 B
    • Free :- 100 MB

Disk

  • / :- Used: 51.10%
    • Total :- 3.92 GB
    • Used :- 2.01 GB
    • Free :- 1.73 GB
    • Write Load :- 757.62 B/s (5 days 12 hours 52 mins)
  • /boot :- Used: 20.55%
    • Total :- 252.05 MB
    • Used :- 51.79 MB
    • Free :- 200.26 MB
    • Write Load :- 0 B/s (5 days 12 hours 52 mins)
  • /var/opt/emoncms :- Used: 0.05%
    • Total :- 9.98 GB
    • Used :- 5.24 MB
    • Free :- 9.47 GB
    • Write Load :- 58.38 B/s (5 days 12 hours 52 mins)
  • /var/log :- Used: 2.18%
    • Total :- 50 MB
    • Used :- 1.09 MB
    • Free :- 48.91 MB
    • Write Load :- n/a

HTTP

  • Server :- Apache/2.4.38 (Raspbian) HTTP/1.1 CGI/1.1 80

MySQL

  • Version :- 5.5.5-10.3.17-MariaDB-0+deb10u1
  • Host :- localhost:6379 (127.0.0.1)
  • Date :- 2020-07-27 09:29:42 (UTC 01:00)
  • Stats :- Uptime: 480716 Threads: 12 Questions: 369304 Slow queries: 0 Opens: 51 Flush tables: 1 Open tables: 42 Queries per second avg: 0.768

Redis

  • Version :-
    • Redis Server :- 5.0.3
    • PHP Redis :- 5.0.2
  • Host :- localhost:6379
  • Size :- 125 keys (869.79K)
  • Uptime :- 5 days

MQTT Server

  • Version :- Mosquitto 1.5.7
  • Host :- localhost:1883 (127.0.0.1)

PHP

  • Version :- 7.3.9-1~deb10u1 (Zend Version 3.3.9)
  • Modules :- apache2handler calendar Core ctype curl date dom v20031129 exif fileinfo filter ftp gd gettext hash iconv json v1.7.0 libxml mbstring mosquitto v0.4.0 mysqli mysqlnd vmysqlnd 5.0.12-dev - 20150407 - $Id: 7cc7cc96e675f6d72e5cf0f267f48e167c2abb23 $ openssl pcre PDO pdo_mysql Phar posix readline redis v5.0.2 Reflection session shmop SimpleXML sockets sodium SPL standard sysvmsg sysvsem sysvshm tokenizer wddx xml xmlreader xmlwriter xsl Zend OPcache zlib

Pi

  • Model :- Raspberry Pi 3 Model B Rev 1.2 - 1GB (Embest)

  • Serial num. :- 539EE22D

  • CPU Temperature :- 44.01°C

  • GPU Temperature :- 44.5°C

  • emonpiRelease :- emonSD-17Oct19

  • File-system :- read-write

Client Information

HTTP

  • Browser :- Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.89 Safari/537.36
  • Language :- en-US,en;q=0.9

Window

  • Size :- 1903 x 1057

Screen

  • Resolution :- 1920 x 1200

Have you tried running a ‘full update’?

Can you check the permissions on the folders /var/log and /var/log/emonhub/?
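For example, to list the directory entries themselves rather than their contents:

ls -ld /var/log /var/log/emonhub/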

Hi,

Yes, the feed restarts if I restart emonHub through the web UI.

Yes, I ran a full update on the day I initially set up the emonPi, and from memory nothing was updated as it was all up to date. I tried it again this morning after posting my previous post, and again there were no updates reported in the log.

I have been in contact with the store and they seem to think it’s an issue with the emonPi, so they are sending me out a replacement.

Thank you both for your help and time on this issue; hopefully the new unit will rectify the problem.

Another observation I have made is that the duration between failures seems to be increasing: when I first noticed the problem it was anywhere between 3 and 6 hours, and it has slowly increased over the last week to the point where it is now running at least 12 hours between failures.