Data not logged locally

I need help with a critical issue:

Short story: On January 3, data collection at an Emonbase stopped. The pi is running since when without reboot, so I hope the data is still in the redis queue and I can save it to the disk.
What can I try to do without making it worse?
feedwriter, apache, redis, mosquitto and emonhub are all running. Nevertheless I restarted the feedwriter and the Mosquitto service, which didn’t help.

I’ve noticed two things: The last log entry in /var/log/emoncms.log is also from January 3, 2018 and the log partition seems to be full.

It’s critical to me since I am performing a measurement for a client with an Emontx that is directly connected to a Pi with an emonSD image. It’s being deployed remotely, but I have SSH access to it. The data is logged locally on the pi and also sent to a remote emoncms server. Due to some data gaps on the remote server I have to use the data stored on the pi.

Update 1:
This gives some more information where the problem might be:

[email protected](ro):~$ sudo systemctl status emonhub
● emonhub.service - LSB: Start/stop emonHub
   Loaded: loaded (/etc/init.d/emonhub)
   Active: active (running) since Do 2018-02-22 03:31:29 UTC; 1 months 21 days ago
  Process: 30918 ExecStop=/etc/init.d/emonhub stop (code=exited, status=0/SUCCESS)
  Process: 31188 ExecStart=/etc/init.d/emonhub start (code=exited, status=0/SUCCESS)
   CGroup: /system.slice/emonhub.service
           └─31235 python /usr/share/emonhub/emonhub.py --config-file /home/pi/data/emonhub.conf --logfile /var/...

Apr 14 15:30:21 rexometer emonhub[31188]: self.stream.flush()
Apr 14 15:30:21 rexometer emonhub[31188]: IOError: [Errno 28] No space left on device
Apr 14 15:30:21 rexometer emonhub[31188]: Logged from file EmonHubMqttInterfacer.py, line 142
Apr 14 15:30:22 rexometer emonhub[31188]: Traceback (most recent call last):
Apr 14 15:30:22 rexometer emonhub[31188]: File "/usr/lib/python2.7/logging/__init__.py", line 883, in emit
Apr 14 15:30:22 rexometer emonhub[31188]: self.flush()
Apr 14 15:30:22 rexometer emonhub[31188]: File "/usr/lib/python2.7/logging/__init__.py", line 843, in flush
Apr 14 15:30:22 rexometer emonhub[31188]: self.stream.flush()
Apr 14 15:30:22 rexometer emonhub[31188]: IOError: [Errno 28] No space left on device
Apr 14 15:30:22 rexometer emonhub[31188]: Logged from file EmonHubJeeInterfacer.py, line 96
[email protected](ro):~$ df -h
Dateisystem    Größe Benutzt Verf. Verw% Eingehängt auf
/dev/root       3,4G    2,0G  1,3G   61% /
devtmpfs        483M       0  483M    0% /dev
tmpfs           487M       0  487M    0% /dev/shm
tmpfs           487M     56M  432M   12% /run
tmpfs           5,0M    4,0K  5,0M    1% /run/lock
tmpfs           487M       0  487M    0% /sys/fs/cgroup
tmpfs            40M    6,1M   34M   16% /var/lib/openhab
tmpfs           1,0M    8,0K 1016K    1% /var/lib/dhcpcd5
tmpfs           1,0M    4,0K 1020K    1% /var/lib/dhcp
tmpfs            50M     50M     0  100% /var/log
tmpfs            30M     36K   30M    1% /tmp
/dev/mmcblk0p1   60M     21M   40M   35% /boot
/dev/mmcblk0p3  3,5G    159M  3,2G    5% /home/pi/data

So it’s seems emonhub is not working because the log partition is full. Can I just delete the log files and restart emonhub?

Update 2:
I did delete some old rotated logs. Is it safe to restart emonhub now without losing data in the queue (if there are any)?

Update 3:
I restarted emonhub, mqtt_input and feedwriter and I thought it had worked because the file size of the phpfina file climbed up from 6 MB to 10 MB. But exporting the values to csv using this tool: Access data files & convert to csv - #2 by Jon showed no data between january 3 and now. That seems quite odd, doesn’t it?

Update 4:
The mentioned export tool showed with the 6 MB file:

filesize: 6747752
npoints: 1686938

and with the new one:

filesize: 10237724
npoints: 2559431

Difference: 872493 points. So there is actually more data than the tool is exporting. Any idea how to get to the data?

Hi Elias, Sorry I wasn’t able to help last night, it seems you have progressed from the initial issue and got everything working again, although I think the data might be lost.

I have seen what you describe in the past, although this was when using the original emonhub not the emonpi variant so I’m not 100% the behavior would be exactly the same, but have no reason to suspect any difference at this stage.

What I’ve previously seen is that when the /var/log/ partition fills up, it prevents emonhub from recording it’s log, it does not stop emonhub from working. I have often found a remote emonhub device is still working but reporting a 100% full log partition, when I ssh in I find emonhub has stopped logging messages, but otherwise working just fine and by deleting stuff from the log partition emonhub has resumed logging messages.

If this reoccurs, it is useful to check the file sizes for clues as to why the log filled up, eg if you have a rotated abc.log.1 of 1Mb and the current abc.log is 10Mb, something has caused abc to log excessive messages since the last hourly rotation. If it is a current log that’s maxing you out, deleting the rotated logs might not help much and when the massive current log gets rotated, it’s still hanging around using valuable space.

I normally find the biggest culprits are the system logs like syslog. I simply replace the offending log file with an empty file using echo and delete any rotated logs whilst I’m there.

I have found I my cases the most common cause for this log partition overload is due to ssh brute force attacks, I have reduced this massively by blocking password log-ins (ssh key only) and changing the ssh port away from 22 so that it’s not the main focus of attack.

Although emonhub can continue whilst the log partition is full, there are services that cannot, back when I used to see this fairly frequently, the tail tell sign for a full log partition for me was that my cron based nodes stopped posting. I found that cron doesn’t run when the log partition is full, so although emonhub and all my other nodes still ran, my one cron node stopped.

This is significant for the emonSD because if this is still relevant (I’ve had no reason to check it is still happening recently so the OS may have changed) when the log partition is full, log rotate cannot be triggered to rotate out the bloated logs to make room for new logs as it is triggered by cron. I have brought this up before and was told it’s not an issue, but that was back when I knew it was definitely an issue.

So why did your data stop? I cannot answer that with any certainty, but since the dates align, I assume either one of the required softwares mosquitto, mqtt_input, apache, bufferwrite etc etc as less robust than emonhub and also get stuck when the log partition is full like cron OR perhaps one of these services failed causing the logs to fill up? (a bit of a chicken or egg situation).

Just as your own findings show, I suspect emonhub has been running throughout the problem (Since 22 Feb), when you restarted emonhub, that would have deleted any buffered data it couldn’t post at shut down, but since emonhub was probably running throughout the issue, I suspect there was no buffered data when you restarted.

Unfortunately no. In actual fact with phpfina all you need to make a datafile of any size is one data point. the way the phpfina works is it only records the value, not the timestamp. I puts the value in a position relative to the begining of the file that represents the timestamp. eg you create a 10s phpfina feed with a start time of 6am and post a single datapoint to it at 8am, it will result in a file of 2880bytes, 2hrs/10s=720 “slots” at 4bytes each.

So as soon as your system was back on it’s feet, the first new data would have been put in it’s correct position in the feed file and the “hole” is just null datapoints.

Based on what you’ve said, I think the data is lost. If you still have the online account there might be a copy of the missing data thre since emonhub would have continued to post via http so the local fails shouldn’t have affected that. How you might merge that data (if it’s there) I do not know off-hand.

You might get a insight as to what might of happened if you have some of the January logfiles. The order of events might be determinable from them, but beware that it will of all occured over the span of an hour or 2 and the logfile times may not align. emoncms always uses unixtime (not TZ’s) and emonhub always uses the local time set by the Pi’s TZ the same as all the other system logs.

Thank you very much for your very detailed answer! Thanks for the good explanation of how PHPFINA works. This explains why the file has increased by a few MB without containing the desired data.

I have also tracked the other linked conversations from you and understand that the problem could be related to the MQTT implementation. In the mosquitto.log are also many entries like this:

1514938644: mosquitto version 1.4.10 (build date Thu, 25 Aug 2016 10:12:09 +0100) starting
1514938644: Config loaded from /etc/mosquitto/mosquitto.conf.
1514938644: Opening ipv4 listen socket on port 1883.
1514938644: Opening ipv6 listen socket on port 1883.
1514938644: New connection from 127.0.0.1 on port 1883.
1514938644: New client connected from 127.0.0.1 as d31aa927-77ce-400a-8f5a-b3b77ef65c7d (c1, k60, u'emonpi').
1514938644: Socket error on client d31aa927-77ce-400a-8f5a-b3b77ef65c7d, disconnecting.
1514938644: New connection from 127.0.0.1 on port 1883.
1514938644: New client connected from 127.0.0.1 as 19f1c389-67f7-49ba-8d7a-e5233ffec19b (c1, k60, u'emonpi').
1514938645: Socket error on client 19f1c389-67f7-49ba-8d7a-e5233ffec19b, disconnecting.
1514938645: New connection from 127.0.0.1 on port 1883.
1514938645: New client connected from 127.0.0.1 as bf3a1c5a-7df0-43b7-bcce-587a2f84624e (c1, k60, u'emonpi').
1514938645: Socket error on client bf3a1c5a-7df0-43b7-bcce-587a2f84624e, disconnecting.
1514938645: New connection from 127.0.0.1 on port 1883.
1514938645: New client connected from 127.0.0.1 as 8eeebcce-5739-44ad-a183-7acc9d033cd7 (c1, k60, u'emonpi').
1514938645: Socket error on client 8eeebcce-5739-44ad-a183-7acc9d033cd7, disconnecting.
1514938645: New connection from 127.0.0.1 on port 1883.
1514938645: New client connected from 127.0.0.1 as 9d7e6303-2a98-4f30-8eb0-15e21cae7629 (c1, k60, u'emonpi').
1514938645: Socket error on client 9d7e6303-2a98-4f30-8eb0-15e21cae7629, disconnecting.
1514938645: New connection from 127.0.0.1 on port 1883.

Altogether there are three emontx / emonbase combinations used by the customer. With the other two devices which should be configured exactly the same, the problem did not occur. From all devices I have copied the corresponding PHPFINA files and exported them into CSV-files. Interestingly, the locally recorded data had significantly more gaps than the data on the server, which surprised me. I put both data tracks together with the Python/Pandas function dataframe1.fillna(dataframe2) which works very well. Unfortunately I could not close a gap of 2 days, because the mobile connection was broken during this time and therefore no data was sent to the server.

Update1:
Here is the entire mosquitto.log, perhaps it is helpful in finding the cause:mosquitto.log.txt (100.7 KB)

Update2:
I just noticed that at my emonPi home the log directory is also full. Here the feedwriter has also failed twice in the last time. The emoncms.log takes the most space with 32 MB. It’s full of these messages:

2018-03-24 03:17:01.703|ERROR|phpmqtt_input.php|exception 'Mosquitto\Exception' in /var/www/emoncms/scripts/phpmqtt_input.php:116
Stack trace:
#0 /var/www/emoncms/scripts/phpmqtt_input.php(116): Mosquitto\Client->loop()
#1 {main}
2018-03-24 03:17:01.704|ERROR|phpmqtt_input.php|exception 'Mosquitto\Exception' in /var/www/emoncms/scripts/phpmqtt_input.php:116
Stack trace:
#0 /var/www/emoncms/scripts/phpmqtt_input.php(116): Mosquitto\Client->loop()
#1 {main}
2018-03-24 03:17:01.706|ERROR|phpmqtt_input.php|exception 'Mosquitto\Exception' in /var/www/emoncms/scripts/phpmqtt_input.php:116

There are no new entries after this date, probably because logrotate no longer works.
Frankly, I do not believe that this is the same problem as originally described here.

Are you saying that some data got through on the failed setup? ie only one of 3 emonTx’s stopped? Or are you saying, of 3 separate and independent, emonTx/emonBase pairs, only one failed?

That doesn’t surprise me, The closer I look at the way MQTT is implemented on the emonSD, the less impressed I am.

There is no buffering for the local logging and it has just dawned on me that since there are no timestamps used on MQTT until the data lands at emoncms, so even if emonhub or the mosquito broker had buffered the data, when it was able to get through, the first data to land would get the timestamp of “now” and everything that follows would get a similar timestamp, since emoncms truncates timestamps to 1sec, every topic that lands in a 1sec period would get the same timestamp and (when using phpfina) everything that lands over the next 10s will be ignored, the first data to land 10s later will get the next 10s slot and so on until it catches up.

There is no way to save MQTT data, it’s probably better it was lost, otherwise it would have been very wrong and perhaps not obvious why. IMO if the MQTT data isn’t received in realtime, the best it can do is skew the data with delays and omissions.

Sounds interesting, how does that work?

Oh dear! But again, doesn’t surprise me as It doesn’t take a lot for a fault to run wild and fill the partition, it is rotated every hour, a lot can happen in that time and once it’s full, it’s stuck if cron can’t run.

The second is the case:

This is very interesting with MQTT implementation, looks as if the concatenation of emonhub, MQTT and then emoncms does not work optimally. The original emonhub version forwarded the data directly to emoncms? If no MQTT is required, it is probably more robust and therefore a better choice.

Here is an example that explains the use of pandas fillna:

import pandas as pd

# Import feed data exported from the feed page.
power1_remote = pd.read_csv("feed_power1_remote.csv", sep=',', skipinitialspace=True, delim_whitespace=False)			
power1_local = pd.read_csv("feed_power1_local.csv", sep=',', skipinitialspace=True, delim_whitespace=False)

# Set Index = datetime
power1_remote.index = pd.to_datetime(power1_remote.iloc[:,0], format='%d/%m/%Y %H:%M:%S') 
power1_local.index = pd.to_datetime(power1_local.iloc[:,0], format='%d/%m/%Y %H:%M:%S') 			

# Calculate minute averages. This step is important because it creates a continuous datetime index. Missing data is marked with NaN.
power1_remote_min = power1_remote.resample('1min').mean() 	
power1_local_min = power1_local.resample('1min').mean() 						

# Fill gaps: If an event in the series of power1_remote_min is marked with NaN, then the value from the other dataframe (power1_local_min) is used with the same timestamp.
power1 = power1_remote_min.fillna(power1_local_min)

# Reference for fillna: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.fillna.html

Just for information: The feedwriter of my own emonpi at home stopped again on May 18th. These are the last lines of the log:

2018-04-15 22:13:32.540|WARN|phpmqtt_input.php|Not connected, retrying connection
2018-04-15 22:13:32.565|WARN|phpmqtt_input.php|Connecting to MQTT server: Connection Accepted.: code: 0
2018-05-02 03:39:34.994|WARN|phpmqtt_input.php|Not connected, retrying connection
2018-05-02 03:39:53.526|WARN|phpmqtt_input.php|Not connected, retrying connection
2018-05-02 03:40:07.571|WARN|phpmqtt_input.php|Not connected, retrying connection
2018-05-02 03:42:08.260|WARN|phpmqtt_input.php|Not connected, retrying connection
2018-05-02 03:42:08.343|WARN|phpmqtt_input.php|Connecting to MQTT server: Connection Accepted.: code: 0
2018-05-18 00:08:40.327|WARN|phpmqtt_input.php|Not connected, retrying connection
2018-05-18 00:08:40.466|WARN|phpmqtt_input.php|Connecting to MQTT server: Connection Accepted.: code: 0
2018-05-19 13:47:52.822|ERROR|input_controller.php|{"success": false, "message": "Format error, json string supplied is not valid"}
Server Information
Emoncms Version low-write 9.8.28 : 2018.01.27
Modules Administration : App v1.1.0 : Backup v1.1.2 : EmonHub Config v1.0.0 : Dashboard v1.1.1 : EventProcesses : Feed : Graph v1.2.0 : Input : postprocess : CoreProcess : Schedule : setup : Time : User : Visualisation : WiFi v1.0.0
Buffer loading…
Writer Daemon is not running, start it at ~/scripts/feedwriter
Server OS Linux 4.4.26-v7+
Host emonpi emonpi (127.0.1.1)
Date 2018-05-21 22:12:59 UTC
Uptime 22:12:59 up 77 days, 13:18, 0 users, load average: 1.18, 1.15, 1.01
HTTP Server Apache/2.4.10 (Raspbian) HTTP/1.1 CGI/1.1 80
MySQL Version 5.5.52-0+deb8u1
Host localhost (127.0.0.1)
Date 2018-05-21 22:12:59 (UTC 00:00‌​)
Stats Uptime: 6700711 Threads: 2 Questions: 1001336 Slow queries: 13 Opens: 61 Flush tables: 1 Open tables: 52 Queries per second avg: 0.149
Redis Version 2.8.17
Host localhost:6379 (127.0.0.1)
Size 645 keys (176.76M)
Uptime 77 days
MQTT Version 1.4.10
Host localhost:1883 (127.0.0.1)
Pi CPU Temp 53.69°C
Release emonSD-07Nov16
File-system Set root file-system temporarily to read-write, (default read-only)
Memory RAM Used: 69.50% Total: 973.11 MB Used: 676.28 MB Free: 296.82 MB
Disk Mount Stats
/ Used: 67.01% Total: 3.33 GB Used: 2.23 GB Free: 959.95 MB
/boot Used: 34.72% Total: 59.95 MB Used: 20.81 MB Free: 39.13 MB
/home/pi/data Used: 13.82% Total: 3.46 GB Used: 489.81 MB Free: 2.8 GB
PHP Version 5.6.27-0+deb8u1 (Zend Version 2.6.0)
Modules apache2handler : bcmath : bz2 : calendar : Core v5.6.27-0+deb8u1 : ctype : curl : date v5.6.27-0+deb8u1 : dba : dio v0.0.4RC4 : dom v20031129 : ereg : exif v1.4 : fileinfo v1.0.5 : filter v0.11.0 : ftp : gettext : hash v1.0 : iconv : json v1.3.6 : libxml : mbstring : mcrypt : mhash : mosquitto v0.3.0 : mysql v1.0 : mysqli v0.1 : openssl : pcre : PDO v1.0.4dev : pdo_mysql v1.0.2 : Phar v2.0.2 : posix : readline v5.6.27-0+deb8u1 : redis v2.2.7 : Reflection : session : shmop : SimpleXML v0.1 : soap : sockets : SPL v0.2 : standard v5.6.27-0+deb8u1 : sysvmsg : sysvsem : sysvshm : tokenizer v0.1 : wddx : xml : xmlreader v0.1 : xmlwriter v0.1 : Zend OPcache v7.0.6-devFE : zip v1.12.5 : zlib v2.0 :

Client Information
HTTP Browser Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/604.4.7 (KHTML, like Gecko) Version/11.0.2 Safari/604.4.7
Screen Resolution 1280 x 800
Window Size 1706 x 848