Apache2 Problem?

johnbanks · 23 September 2019 23:45

I have 4 instances of emontx/RPi each with connected sensors and running ver 10.1.6 and connected wirelessly to the local network.

Each instance records sensor data to its SDHC but also sends the data to a fifth RPi (‘watchman’) running ver 10.1.6 which has no sensors and is wire connected to the local network.

The 4 instances with sensors send data to ‘watchman’ using the EmonHubEmoncmsHTTPInterfacer with …

url = http://192.168.1.100/emoncms # watchman’s FIXED IP address

I access all 5 instances remotely using dataplicity.

HOWEVER, periodically I cannot access (via dataplicity) one of the 4 emonTx/RPi instances – not because it’s crashed because I can see via ‘watchman’ that it’s still sending sensor data. But the sends only occur each several minutes whereas normally it’s every few seconds.

Any suggestions would be most welcome.

My attempts at finding a solution have focussed on Apache2 (http). When I check apache2 status on an accessible emonTx/RPi I get …

Sep 23 20:20:19 emonpi apachectl[10187]: AH00558: apache2: Could not reliably determine the server’s fully qualified domain name, using 127.0.1.1. Set the ‘ServerName’ directive globally to suppress this message

If I then add the following to /etc/apache2/apache2.conf …

ServerName localhost

Then restart apache2 and then check apache2 status, the above warning is gone.

Could this be the solution? It’s difficult to test when it’s an intermittent problem.

Guess I need to drive 40 miles and ‘get my hands on’ the problem emonTx/RPi. Using nmap remotely , ‘watchman’ cannot see it on the local network. Perhaps it will mean pulling the power and looking at the SDHC. I suspect the logs in /var will then no longer be there.

Comments most welcome.

borpin · 24 September 2019 06:59

That Apache error doesn’t make any difference.

Is the local instance of EmonCMS continuing to record the data at the normal rate?

If you look at the emoncms.log for watchman, do those sends have a backlog of data? Emonhub will buffer the data and send it as a bulk input when it gets through (one of the major advantages of the HTTP API over MQTT IIRC).

If it does then emonhub is working correctly, and picking up the serial data, just is not able to send it.

Actually, the more I write this, the more I think the problem is Network. An Apache problem would only affect the local instance of EmonCMS not the sending by HTTP API. You could shut Apache down and emonhub would still send the data to Watchman.

How is the network setup if the remote station is 40 miles away - not exactly a Local Area Network .

You could also check the process table with htop. Sometimes this happens when there is a process hogging the CPU.

johnbanks · 24 September 2019 12:10

@borpin

Many thx.

Just to clarify – the 4 emonTx/RPi Nodes and Watchman (RPi running emoncms) are all at my son’s place 40 miles away and are all connected to his local network. I can access each of them individually remotely via dataplicity.

However currently, I cannot access Node13 via dataplicity.

Currently Node13 seems to be operating in a partially failed state. It is sending data to Watchman via the HPPTInterfacer but …

Using Watchman’s web interface, I can see the feeds being received. Normally this happens every up to 30 secs. However I’ve just observed the Node 13 feed delay grow to 65 mins then it started to reduce – 55 mins, 23 mins, etc – until it reached less than 30 secs. At that point it seemed to behave normally.

I guess that suggests a data backlog built up at Node 13 and then was slowly sent to Watchman.

And I can see from Watchman’s web interface that no data has been lost as a result of this behaviour.

I ran htop on Watchman … CPU approx 5% and Memory approx 2%

I’ll make a site visit later this week. It is most unlikely that I’ll be able to access Node 13 and so will just have to pull the power and examine the SDHC. The logs in /var will be lost, of course.

Do you have further thoughts or better ideas?

PS – the capability of the latest emoncms software is fantastic – so thx to all you guys

borpin · 24 September 2019 12:37

So I would definitely say it is a network issue. Reboot the router / DHCP server.

johnbanks · 24 September 2019 22:49

@borpin
I had my son reboot the router waiting 10 secs before repowering.
The problem is still there - it must be at Node 13?
Any further thoughts?
I’ll make a site visit in the next couple of days …
Will probably have to resort to pulling the power on Node 13 and so will lose the logs in /var.
All very frustrating - just when you really need the logs, they ain’t there.
But, of course I’m mindful about constraints - low write, limited space on SDHC, etc

borpin · 25 September 2019 06:38

I still think it is a network issue as you cannot connect by Dataplicity. I will point out again that the data sent to Watchman does not actually have anything to do with Emoncms as it never goes through the local EmonCMS system.

The EmonTX sends the data via RFM to the RFM board on the EmonPi - any problems here and the data is lost.
The RFM board passes the data via a serial interface and it is picked up on the EmonPi by emonhub.
emonhub is then sending it to two places; Watchman via the HTTP API and the local EmonCMS via MQTT. emonhub will buffer the data on the HTTP API interface if the remote server (Watchman) does not respond with an ‘ok’.

So it seems, as EmonHub appears to be buffering the data as there is no data loss at Watchman, that Emonhub is unable to make contact with the Watchman server to send the data.

That is a Network issue. Where the problem is, Router, EmonPi or Watchman is more difficult to ascertain .

It is almost certainly nothing to do with Apache unless it is effectively overlaoding the network with rubbish.

The Emonhub logs might tell you what is happening, but currently these are rotated every hour. If you want a scheme that still stops /var/log filling up (the reason for the hourly rotate), but retains emonhub.log for a while have a look at this post emonSD next steps: filesystem & logrotate - #86 by borpin and adapt as necessary.

One other thing you can do (probably via Dataplicity), is install and run tshark on Watchman and see what HTTP messages are being received and also run it on the EmonPi. I’ve used tshark a bit so search here - it takes a bit of research to get the right parameters so you just get what you need.

borpin · 25 September 2019 06:54

Just a thought, can you access that EmonPi locally via the HTTP Web interface or else SSH in. I’m wondering if it might be a full SD Card.

For remote monitoring the machines health, try installing monit - I pull the data into node-red via a JSON output and then set triggers up from there. Node-RED and XML to JSON | The (Tech) Life of (a) Brian and Systemd Timer and monit | The (Tech) Life of (a) Brian.

johnbanks · 25 September 2019 23:01

@borpin

When I’m on site I’ll try accessing Node 13 via the web interface and via SSHas you suggest.

In the meanwhile I’ve tried it remotely from Watchman … ssh ‘[email protected]’ reports that the connection to port 22 has timed out … Node 13 is 192.168.1.113.

nmap -R 192.168.1.113 from Watchman is quite revealing …

Starting Nmap 7.40 ( https://nmap.org ) at 2019-09-25 11:52 BST
Nmap scan report for 192.168.1.113
Host is up (0.0025s latency).
Not shown: 983 filtered ports
PORT      STATE  SERVICE
80/tcp    closed http
1234/tcp  closed hotline
5000/tcp  open   upnp	
5001/tcp  closed commplex-link
6000/tcp  closed X11
6100/tcp  closed synchronet-db
8008/tcp  open   http
8080/tcp  open   http-proxy
8081/tcp  closed blackice-icecap
9080/tcp  closed glrpc
9091/tcp  open   xmltec-xmlmail
9998/tcp  closed distinct32
9999/tcp  closed abyss
49153/tcp open   unknown
49155/tcp closed unknown
49160/tcp open   unknown
49163/tcp closed unknown
MAC Address: 7A:1F:29:D4:F4:7A (Unknown)
Nmap done: 1 IP address (1 host up) scanned in 91.63 seconds

Note that ports 80, 1234, 9080 & 41955 are closed whereas they are open on Node 12 which is identical to Node 13 and is running OK.

Node 13 does have the following ports open … 8008 http and 8080 http-proxy.
Is there a way of using either of these in lieu of closed port 80? … googling suggests not.

Increasingly I’m becoming resigned to the fact that I need to go on site and pull the power on Node 13. My understanding is the logs will then be lost.

johnbanks · 26 September 2019 00:44

@borpin

Clutching at straws, I looked at Watchman’s logs.

Watchman has a USB HDD to supplement the SDHC as shown in /etc/fstab …

pi@watchman:/var/log/logrotate $ cat /etc/fstab
proc                  /proc           proc    defaults          0       0
##PARTUUID=50c2568e-01  /boot           vfat    defaults          0       2
##PARTUUID=50c2568e-02  /               ext4    defaults,noatime,nodiratime  0       1
##PARTUUID=50c2568e-03  /home/pi/data   ext2    defaults,noatime,nodiratime  0       2

# a swapfile is not a swap partition, no line here
#   use  dphys-swapfile swap[on|off]  for that

tmpfs           /tmp             tmpfs   nodev,nosuid,size=30M,mode=1777 0  0
tmpfs           /var/tmp         tmpfs   nodev,nosuid,size=1M,mode=1777  0  0
tmpfs           /var/log         tmpfs   nodev,nosuid,size=50M,mode=1777 0  0

/dev/mmcblk0p1  /boot           vfat    defaults,noatime,nodiratime 0   2
/dev/sda1      /         ext4    defaults,noatime,nodiratime  0       1
/dev/sda2      /home/pi/data   ext4   defaults,noatime,nodiratime  0    2

Current SDHC/USB HDD space utilization is revealed by df -H …

pi@watchman:/var/log/logrotate $ df -H
Filesystem      Size  Used Avail Use% Mounted on
/dev/root        21G  1.8G   19G   9% /
devtmpfs        508M     0  508M   0% /dev
tmpfs           513M     0  513M   0% /dev/shm
tmpfs           513M   52M  461M  11% /run
tmpfs           5.3M  4.1k  5.3M   1% /run/lock
tmpfs           513M     0  513M   0% /sys/fs/cgroup
tmpfs            53M   53M     0 100% /var/log
tmpfs           1.1M     0  1.1M   0% /var/tmp
tmpfs            32M   70k   32M   1% /tmp
/dev/mmcblk0p1   45M   24M   22M  52% /boot
/dev/sda2       293G  1.5G  277G   1% /home/pi/data
tmpfs           103M     0  103M   0% /run/user/1000

/var/log has zero available space which explains why the final entry in syslog is dated 14 Sep …

pi@watchman:/var/log $ tail -n 25 syslog
Sep 14 12:45:06 watchman dhcpcd[537]: eth0: soliciting a DHCPv6 lease
Sep 14 12:45:14 watchman dhcpcd[537]: eth0: Router Advertisement from fe80::e80:63ff:fe39:372c
Sep 14 12:45:14 watchman dhcpcd[537]: eth0: soliciting a DHCPv6 lease
Sep 14 12:45:22 watchman dhcpcd[537]: eth0: Router Advertisement from fe80::e80:63ff:fe39:372c
Sep 14 12:45:22 watchman dhcpcd[537]: eth0: soliciting a DHCPv6 lease
Sep 14 12:45:27 watchman dhcpcd[537]: eth0: Router Advertisement from fe80::e80:63ff:fe39:372c
Sep 14 12:45:27 watchman dhcpcd[537]: eth0: soliciting a DHCPv6 lease
Sep 14 12:45:27 watchman liblogging-stdlog: action 'action 3' resumed (module 'builtin:omfile') [v8.24.0 try http://www.rsyslog.com/e/2359 ]
Sep 14 12:45:27 watchman liblogging-stdlog: action 'action 3' resumed (module 'builtin:omfile') [v8.24.0 try http://www.rsyslog.com/e/2359 ]
Sep 14 12:45:27 watchman liblogging-stdlog: action 'action 3' resumed (module 'builtin:omfile') [v8.24.0 try http://www.rsyslog.com/e/2359 ]
Sep 14 12:45:27 watchman liblogging-stdlog: action 'action 3' resumed (module 'builtin:omfile') [v8.24.0 try http://www.rsyslog.com/e/2359 ]
Sep 14 12:45:27 watchman liblogging-stdlog: action 'action 3' resumed (module 'builtin:omfile') [v8.24.0 try http://www.rsyslog.com/e/2359 ]
Sep 14 12:45:27 watchman liblogging-stdlog: action 'action 3' resumed (module 'builtin:omfile') [v8.24.0 try http://www.rsyslog.com/e/2359 ]
Sep 14 12:45:27 watchman liblogging-stdlog: action 'action 3' resumed (module 'builtin:omfile') [v8.24.0 try http://www.rsyslog.com/e/2359 ]
Sep 14 12:45:27 watchman liblogging-stdlog: action 'action 3' resumed (module 'builtin:omfile') [v8.24.0 try http://www.rsyslog.com/e/2359 ]
Sep 14 12:45:27 watchman liblogging-stdlog: action 'action 3' resumed (module 'builtin:omfile') [v8.24.0 try http://www.rsyslog.com/e/2359 ]
Sep 14 12:45:27 watchman liblogging-stdlog: action 'action 3' suspended, next retry is Sat Sep 14 12:45:57 2019 [v8.24.0 try http://www.rsyslog.com/e/2007 ]
Sep 14 12:45:33 watchman dhcpcd[537]: eth0: Router Advertisement from fe80::e80:63ff:fe39:372c
Sep 14 12:45:33 watchman dhcpcd[537]: eth0: soliciting a DHCPv6 lease
Sep 14 12:45:37 watchman dhcpcd[537]: eth0: Router Advertisement from fe80::e80:63ff:fe39:372c
Sep 14 12:45:37 watchman dhcpcd[537]: eth0: soliciting a DHCPv6 lease
Sep 14 12:45:42 watchman dhcpcd[537]: eth0: Router Advertisement from fe80::e80:63ff:fe39:372c
Sep 14 12:45:42 watchman dhcpcd[537]: eth0: soliciting a DHCPv6 lease
Sep 14 12:45:52 watchman dhcpcd[537]: eth0: Router Advertisement from fe80::e80:63ff:fe39:372c

Which is undoubtedly explained by the logrotate.log …

pi@watchman:/var/log/logrotate $ tail -n 25 logrotate.log
log /var/log/daemon.log.2 doesn't exist -- won't try to dispose of it
copying /var/log/daemon.log to /var/log/daemon.log.1
error: error writing to /var/log/daemon.log.1: No space left on device
error: error copying /var/log/daemon.log to /var/log/daemon.log.1: No space left on device
rotating log /var/log/apache2/access.log, log->rotateCount is 1
dateext suffix '-20190914'
glob pattern '-[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
renaming /var/log/apache2/access.log.1 to /var/log/apache2/access.log.2 (rotatecount 1, logstart 1, i 1), 
old log /var/log/apache2/access.log.1 does not exist
renaming /var/log/apache2/access.log.0 to /var/log/apache2/access.log.1 (rotatecount 1, logstart 1, i 0), 
old log /var/log/apache2/access.log.0 does not exist
copying /var/log/apache2/access.log to /var/log/apache2/access.log.1
error: error writing to /var/log/apache2/access.log.1: No space left on device
error: error copying /var/log/apache2/access.log to /var/log/apache2/access.log.1: No space left on device
rotating log /var/log/syslog, log->rotateCount is 1
dateext suffix '-20190914'
glob pattern '-[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
renaming /var/log/syslog.1 to /var/log/syslog.2 (rotatecount 1, logstart 1, i 1), 
old log /var/log/syslog.1 does not exist
renaming /var/log/syslog.0 to /var/log/syslog.1 (rotatecount 1, logstart 1, i 0), 
old log /var/log/syslog.0 does not exist
log /var/log/syslog.2 doesn't exist -- won't try to dispose of it
copying /var/log/syslog to /var/log/syslog.1
error: error writing to /var/log/syslog.1: No space left on device
error: error copying /var/log/syslog to /var/log/syslog.1: No space left on device

Should I modify fstab and substantially increase the size of /var/log ? as / has 19G available.

Could this situation have any connection with my problems with Node 13? - whether cause or effect?

As an aside - most users will not have a USB HDD - is 53M an optimal limit for /var/log?

Sorry for all the hassle …

borpin · 26 September 2019 08:04

Need to clear out the /var/log/ first.

Do an ls -la and post to show the largest ones

Look for the larges files and

Ah, I have seen Dataplicity fill up logs as the Dataplicity log is not rotated - might be that.

Could be the issue on all EmonPis

johnbanks · 26 September 2019 21:30

@borpin

Thx for the input.

Watchman’s logs - apache2/access.log, daemon.log & sylog totalled approx 46M of the permitted 53M. Nothingelse came close.

I truncated these (thx for the tip) and increased /var/log to 200M (/ is on a USB HDD partition sized 21Gb)
And after a reboot things seem OK.

Question 1 - When should logrotation recommence? 4 hours after reboot, /var/log/logrotate/logrotate.log exists but has 0 bytes - I thought it happened hourly?

Question 2 - How to stop this periodic log ‘constipation’ problem in future?

However, a downside is that Node 13 seems to have stopped data transmission to Watchman - has not done so for the past 6 hours.

borpin · 27 September 2019 04:27

Is emonhub still running?

johnbanks · 28 September 2019 23:51

@borpin

By way of update …

My problem with logrotate on Watchman has been solved with help from @pb66

Node 13 is a different matter …
I made a site visit. Node 13 is located in a plastic enclosure with the grid supply meter on an outside wall of the house.
On close inspection there was a small fly running around inside the RPi plastic case. And in one corner of the case was a strange orange discolouration/residue which was also on the underside of the RPi board where the ethernet socket is - see pic …

I cleaned the RPi board as best I could but the small gap between the ethernet socket & the board was inaccessible.

I then fired up Node 13. I could not connect to the web interface - Emoncms error: not defined: line 48:column 9, etc). However I could connect via ssh but it was intermittent and the responses were jerky.

I switched the SDHC to a brand new RPi3 B+ and everything worked OK.
That fly or an errant rain drop screwed things up … an unusual problem for the forum??

Many thx for yr support

borpin · 29 September 2019 17:47

No bother.

A network problem but a ‘hard’ rather than a soft problem!

djh · 3 October 2019 15:44

Specifically, it sounds like it was a bug.