danbates
(Daniel Bates)
19 August 2018 13:08
14
pb66:
What do you see if you try to restart it from the command line?
sudo systemctl restart redis-server
have you checked the redis logs and emoncms logs?
What are your Redis settings in emoncms/settings.php?
have you tried flushing redis (very long shot but worth ruling out)
I also notice the redis php extension module is 3.1.1, that’s quite dated. The current version is 4.1.1, but there were “potentially” breaking changes made in v4 (see Possible breaking api changes in php-redis · Issue #836 · emoncms/emoncms · GitHub ) so I’m still running v3.1.6. I doubt this is the cause (guesswork) but worth a mention.
[edit] Oh! And another long shot, are there any redis files in /var/lib ?
sudo ls -la /var/lib/redis/*
redis logs…
emonCMS logs…
redis settings in emoncms/settings.php…
logs and settings.zip (24.0 KB)
see 12:35pm on 19th August for seconds leading up to observed redis CPU use increase.
sudo systemctl restart redis-server
does nothing
Flushing redis.
nothing again.
redis version
is 3.2.6 after update emonPi update button, no change
redis files in /var/lib/redis/*
permission error, can’t view 'em.
sudo ls -la /var/log/redis
total 8
drwxr-xr-x 2 redis redis 60 Aug 19 12:34 .
drwxr-xr-x 11 root root 500 Aug 19 12:34 …
-rw-rw-rw- 1 redis redis 6384 Aug 19 13:04 redis-server.log
and 9. give it inputs to process.
no change.
Greebo
(Greebo)
19 August 2018 21:14
15
danbates:
redis files in /var/lib/redis/*
permission error, can’t view 'em.
Not even with sudo ls -la /var/lib/redis
?
Greebo
(Greebo)
19 August 2018 21:37
16
This is in syslog:
Aug 19 12:34:29 emonpi systemd[1]: [/lib/systemd/system/redis-server.service:14] Unknown lvalue 'RunTimeDirectory' in section 'Service'
Aug 19 12:34:29 emonpi systemd[1]: Starting Advanced key-value store...
RunTimeDirectory
sounds important, it looks like its value couldn’t be loaded from the config?
There’s no errors in the redis-server.log
indicating there’s an issue though, but probably worth checking anyway.
I just found a bug report on Debian bug reports which says:
Jan 06 16:50:13 cai systemd[1]: [/lib/systemd/system/redis-server.service:14] Unknown lvalue 'RunTimeDirectory' in section 'Service'
I think the lvalue should be ‘RuntimeDirectory’ according to man (5) systemd.exec. Note capitalization.
Followed by (in closing the bug report):
Source: redis
Source-Version: 3:3.2.6-2
We believe that the bug you reported is fixed in the latest version of redis, which is due to be installed in the Debian FTP archive. A summary of the changes between this version and the previous one is
attached.
Thank you for reporting the bug, which will now be closed. If you have further comments please address them to [email protected] , and the maintainer will reopen the bug report if appropriate.
And then only a few months ago on the same page, it was reopened again:
So, is it worth changing the case of “RunTimeDirectory” to “RuntimeDirectory” in /lib/systemd/system/redis-server.service
(should be on line 14) and see if there’s any change?
[Edit] worth noting that the actual fix in the bug report linked above is
[Edit #2 ] I just checked the man-page for systemd.exec and the RuntimeDirectory directive is used to create runtime folders under a root folder as defined in the following table, so likely there are runtime folders that are expected to exist but do not.
Table from man-page:
Locations
for system
for users
RuntimeDirectory=
/run
$XDG_RUNTIME_DIR
danbates
(Daniel Bates)
20 August 2018 06:26
17
Got it. I’d like to know how that command works.
pi@emonpi(ro):~$ sudo ls -la /var/lib/redis
total 12
drwxr-x--- 2 redis redis 4096 May 17 13:41 .
drwxr-xr-x 44 root root 4096 May 17 12:52 ..
-rw-rw---- 1 redis redis 76 May 17 13:41 dump.rdb
danbates
(Daniel Bates)
20 August 2018 06:35
18
. Correcting typo on line 14, reboot, no change with issue.
. Directory /run and /run/redis contents below… Not sure what I’m looking for.
pi@emonpi(rw):/$ cd run
pi@emonpi(rw):run$ ls
agetty.reload dbus dnsmasq log network rpcbind.lock sshd udev
apache2 dhclient.wlan0.pid emonhub.pid lwrfd.pid ntpd.pid rpcbind.sock sshd.pid user
avahi-daemon dhcpcd emonPiLCD.pid mosquitto.pid plymouth rpc_pipefs sysconfig utmp
console-setup dhcpcd.pid feedwriter.pid motd.dynamic redis rsyslogd.pid systemd wpa_supplicant
crond.pid dhcpcd.sock initctl mount resolvconf sendsigs.omit.d thd.socket wpa_supplicant.wlan0.pid
crond.reboot dhcpcd.unpriv.sock lock mysqld rpcbind shm tmpfiles.d xtables.lock
pi@emonpi(rw):run$
pi@emonpi(rw):run$ cd redis
pi@emonpi(rw):redis$ ls
redis-server.pid
pi@emonpi(rw):redis$
Greebo
(Greebo)
20 August 2018 06:55
19
More usefully is actually understanding why “sudo ls -la /var/lib/redis/*
” doesn’t work. See this post of mine here .
In previous distributions that file would have been deleted as part of the “making read-only”. It was recently discovered (in that thread I just linked to) that there was an error in the build guide which leaves that file behind causing other issues (as per that other thread). Try removing it using:
sudo rm /var/lib/redis/dump.rdb
Ok, looks like once the folder was there, all it created in that folder was the .pid file (which is normally used to track what PID the process is running under) so it probably didn’t have any consequences beyond logging an error in syslog on service startup.
See how you go removing the dump.rdb
file from /var/lib/redis
.
danbates
(Daniel Bates)
20 August 2018 07:11
20
Removing seemed to happen OK. No change with redis.
pi@emonpi(ro):~$ sudo ls -la /var/lib/redis
total 8
drwxr-x--- 2 redis redis 4096 Aug 20 06:56 .
drwxr-xr-x 44 root root 4096 May 17 12:52 ..
I notice on the redis website we’ve got a newer old version 3 to try out.
danbates
(Daniel Bates)
20 August 2018 08:08
21
I’ve compiled 3.2.12 and have a src folder with all these:
pi@emonpi(rw):redis-3.2.12$ cd src
pi@emonpi(rw):src$ ls
adlist.c bitops.c debug.o latency.o object.o redis-check-aof rio.o slowlog.o t_zset.o
adlist.h bitops.o dict.c lzf_c.c pqsort.c redis-check-aof.c scripting.c solarisfixes.h util.c
adlist.o blocked.c dict.h lzf_c.o pqsort.h redis-check-aof.o scripting.o sort.c util.h
ae.c blocked.o dict.o lzf_d.c pqsort.o redis-check-rdb sdsalloc.h sort.o util.o
ae_epoll.c cluster.c endianconv.c lzf_d.o pubsub.c redis-check-rdb.c sds.c sparkline.c valgrind.sup
ae_evport.c cluster.h endianconv.h lzf.h pubsub.o redis-check-rdb.o sds.h sparkline.h version.h
ae.h cluster.o endianconv.o lzfP.h quicklist.c redis-cli sds.o sparkline.o ziplist.c
ae_kqueue.c config.c fmacros.h Makefile quicklist.h redis-cli.c sentinel.c syncio.c ziplist.h
ae.o config.h geo.c Makefile.dep quicklist.o redis-cli.o sentinel.o syncio.o ziplist.o
ae_select.c config.o geo.h memtest.c rand.c redis-sentinel server.c testhelp.h zipmap.c
anet.c crc16.c geo.o memtest.o rand.h redis-server server.h t_hash.c zipmap.h
anet.h crc16.o help.h mkreleasehdr.sh rand.o redis-trib.rb server.o t_hash.o zipmap.o
anet.o crc64.c hyperloglog.c multi.c rdb.c release.c setproctitle.c t_list.c zmalloc.c
aof.c crc64.h hyperloglog.o multi.o rdb.h release.h setproctitle.o t_list.o zmalloc.h
aof.o crc64.o intset.c networking.c rdb.o release.o sha1.c t_set.c zmalloc.o
asciilogo.h db.c intset.h networking.o redisassert.h replication.c sha1.h t_set.o
bio.c db.o intset.o notify.c redis-benchmark replication.o sha1.o t_string.c
bio.h debug.c latency.c notify.o redis-benchmark.c rio.c slowlog.c t_string.o
bio.o debugmacro.h latency.h object.c redis-benchmark.o rio.h slowlog.h t_zset.c
Where do these go if I was to update my redis? Or is there a better way of compiling directly into the redis folder perhaps?
Greebo
(Greebo)
20 August 2018 10:53
22
Dan will be happy to hear that I’ve just downloaded the latest image from OEM and flashed it to an SD card and I get the same behaviour.
Just to reiterate, this is after:
download SD image from here
burn image to SD card using Etcher
plug SD card into RPi 3
plug RPi into LAN port of switch
power up Pi
find IP address of emonpi from Router interface
ssh to IP address and logon using pi
run top
top - 10:50:11 up 11 min, 1 user, load average: 0.53, 0.63, 0.45
Tasks: 112 total, 1 running, 69 sleeping, 0 stopped, 0 zombie
%Cpu(s): 7.3 us, 7.6 sy, 0.3 ni, 81.5 id, 1.2 wa, 0.0 hi, 2.0 si, 0.0 st
KiB Mem : 1000184 total, 357156 free, 264160 used, 378868 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 648336 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1288 redis 20 0 31312 3032 2260 S 36.8 0.3 2:26.50 redis-server
10122 pi 20 0 9868 3340 2792 R 10.5 0.3 0:00.05 top
1 root 20 0 27224 6208 4880 S 0.0 0.6 0:05.36 systemd
2 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kthreadd
I did an strace on the process and its doing a LOT of this:
[lots of unnecessary text deleted]
Does that mean anything to anyone?
[EDIT] Found the culprit… See the next post below.
Greebo
(Greebo)
20 August 2018 11:35
23
Looks like its service-runner…
using redis-cli
, and MONITOR
, I see this:
1534764290.902086 [0 127.0.0.1:40252] "COMMAND"
1534764290.940222 [0 127.0.0.1:40252] "lpop" "service-runner"
1534764291.049197 [0 127.0.0.1:40254] "COMMAND"
1534764291.092773 [0 127.0.0.1:40254] "lpop" "service-runner"
1534764291.209071 [0 127.0.0.1:40256] "COMMAND"
1534764291.284804 [0 127.0.0.1:40256] "lpop" "service-runner"
1534764291.394679 [0 127.0.0.1:40258] "COMMAND"
1534764291.433175 [0 127.0.0.1:40258] "lpop" "service-runner"
1534764291.542032 [0 127.0.0.1:40260] "COMMAND"
1534764291.580867 [0 127.0.0.1:40260] "lpop" "service-runner"
1534764291.689864 [0 127.0.0.1:40262] "COMMAND"
1534764291.728895 [0 127.0.0.1:40262] "lpop" "service-runner"
1534764291.837814 [0 127.0.0.1:40264] "COMMAND"
1534764291.875647 [0 127.0.0.1:40264] "lpop" "service-runner"
1534764291.984414 [0 127.0.0.1:40266] "COMMAND"
1534764292.051389 [0 127.0.0.1:40266] "lpop" "service-runner"
1534764292.162736 [0 127.0.0.1:40268] "COMMAND"
1534764292.200655 [0 127.0.0.1:40268] "lpop" "service-runner"
1534764292.309909 [0 127.0.0.1:40270] "COMMAND"
1534764292.347370 [0 127.0.0.1:40270] "lpop" "service-runner"
1534764292.456159 [0 127.0.0.1:40272] "COMMAND"
1534764292.494217 [0 127.0.0.1:40272] "lpop" "service-runner"
1534764292.603430 [0 127.0.0.1:40274] "COMMAND"
1534764292.641038 [0 127.0.0.1:40274] "lpop" "service-runner"
1534764292.750231 [0 127.0.0.1:40276] "COMMAND"
1534764292.787986 [0 127.0.0.1:40276] "lpop" "service-runner"
1534764292.901594 [0 127.0.0.1:40278] "COMMAND"
1534764292.977671 [0 127.0.0.1:40278] "lpop" "service-runner"
1534764293.089051 [0 127.0.0.1:40280] "COMMAND"
1534764293.127222 [0 127.0.0.1:40280] "lpop" "service-runner"
1534764293.236079 [0 127.0.0.1:40282] "COMMAND"
1534764293.274621 [0 127.0.0.1:40282] "lpop" "service-runner"
1534764293.383566 [0 127.0.0.1:40284] "COMMAND"
1534764293.422042 [0 127.0.0.1:40284] "lpop" "service-runner"
1534764293.531111 [0 127.0.0.1:40286] "COMMAND"
1534764293.569599 [0 127.0.0.1:40286] "lpop" "service-runner"
1534764293.678696 [0 127.0.0.1:40288] "COMMAND"
1534764293.722883 [0 127.0.0.1:40288] "lpop" "service-runner"
1534764293.839386 [0 127.0.0.1:40290] "COMMAND"
1534764293.914465 [0 127.0.0.1:40290] "lpop" "service-runner"
1534764294.024684 [0 127.0.0.1:40292] "COMMAND"
1534764294.062352 [0 127.0.0.1:40292] "lpop" "service-runner"
1534764294.171162 [0 127.0.0.1:40294] "COMMAND"
1534764294.209069 [0 127.0.0.1:40294] "lpop" "service-runner"
1534764294.318104 [0 127.0.0.1:40296] "COMMAND"
1534764294.355857 [0 127.0.0.1:40296] "lpop" "service-runner"
1534764294.464634 [0 127.0.0.1:40298] "COMMAND"
1534764294.502605 [0 127.0.0.1:40298] "lpop" "service-runner"
1534764294.611750 [0 127.0.0.1:40300] "COMMAND"
1534764294.676633 [0 127.0.0.1:40300] "lpop" "service-runner"
1534764294.788416 [0 127.0.0.1:40302] "COMMAND"
1534764294.826857 [0 127.0.0.1:40302] "lpop" "service-runner"
1534764294.935804 [0 127.0.0.1:40304] "COMMAND"
1534764294.974066 [0 127.0.0.1:40304] "lpop" "service-runner"
1534764295.082956 [0 127.0.0.1:40306] "COMMAND"
1534764295.121194 [0 127.0.0.1:40306] "lpop" "service-runner"
1534764295.230041 [0 127.0.0.1:40308] "COMMAND"
1534764295.268337 [0 127.0.0.1:40308] "lpop" "service-runner"
1534764295.377303 [0 127.0.0.1:40310] "COMMAND"
1534764295.416084 [0 127.0.0.1:40310] "lpop" "service-runner"
1534764295.529468 [0 127.0.0.1:40312] "COMMAND"
1534764295.605709 [0 127.0.0.1:40312] "lpop" "service-runner"
1534764295.716997 [0 127.0.0.1:40314] "COMMAND"
1534764295.755140 [0 127.0.0.1:40314] "lpop" "service-runner"
1534764295.864223 [0 127.0.0.1:40316] "COMMAND"
1534764295.902137 [0 127.0.0.1:40316] "lpop" "service-runner"
1534764296.011123 [0 127.0.0.1:40318] "COMMAND"
1534764296.048981 [0 127.0.0.1:40318] "lpop" "service-runner"
1534764296.157993 [0 127.0.0.1:40320] "COMMAND"
1534764296.195802 [0 127.0.0.1:40320] "lpop" "service-runner"
1534764296.304819 [0 127.0.0.1:40322] "COMMAND"
1534764296.342483 [0 127.0.0.1:40322] "lpop" "service-runner"
1534764296.458611 [0 127.0.0.1:40324] "COMMAND"
1534764296.534213 [0 127.0.0.1:40324] "lpop" "service-runner"
1534764296.645611 [0 127.0.0.1:40326] "COMMAND"
1534764296.683906 [0 127.0.0.1:40326] "lpop" "service-runner"
1534764296.792961 [0 127.0.0.1:40328] "COMMAND"
1534764296.831556 [0 127.0.0.1:40328] "lpop" "service-runner"
1534764296.940561 [0 127.0.0.1:40330] "COMMAND"
1534764296.978779 [0 127.0.0.1:40330] "lpop" "service-runner"
1534764297.087696 [0 127.0.0.1:40332] "COMMAND"
1534764297.123778 [0 127.0.0.1:40332] "lpop" "service-runner"
1534764297.232712 [0 127.0.0.1:40334] "COMMAND"
1534764297.289257 [0 127.0.0.1:40334] "lpop" "service-runner"
1534764297.405525 [0 127.0.0.1:40336] "COMMAND"
1534764297.469148 [0 127.0.0.1:40336] "lpop" "service-runner"
1534764297.577916 [0 127.0.0.1:40338] "COMMAND"
1534764297.615979 [0 127.0.0.1:40338] "lpop" "service-runner"
1534764297.724937 [0 127.0.0.1:40340] "COMMAND"
1534764297.763295 [0 127.0.0.1:40340] "lpop" "service-runner"
1534764297.872390 [0 127.0.0.1:40342] "COMMAND"
1534764297.910873 [0 127.0.0.1:40342] "lpop" "service-runner"
1534764298.019929 [0 127.0.0.1:40344] "COMMAND"
1534764298.058203 [0 127.0.0.1:40344] "lpop" "service-runner"
1534764298.171360 [0 127.0.0.1:40346] "COMMAND"
1534764298.247247 [0 127.0.0.1:40346] "lpop" "service-runner"
1534764298.358903 [0 127.0.0.1:40348] "COMMAND"
1534764298.397296 [0 127.0.0.1:40348] "lpop" "service-runner"
1534764298.506275 [0 127.0.0.1:40350] "COMMAND"
1534764298.544455 [0 127.0.0.1:40350] "lpop" "service-runner"
1534764298.653411 [0 127.0.0.1:40352] "COMMAND"
1534764298.691741 [0 127.0.0.1:40352] "lpop" "service-runner"
1534764298.800785 [0 127.0.0.1:40354] "COMMAND"
1534764298.839067 [0 127.0.0.1:40354] "lpop" "service-runner"
1534764298.947977 [0 127.0.0.1:40356] "COMMAND"
1534764299.002411 [0 127.0.0.1:40356] "lpop" "service-runner"
1534764299.118331 [0 127.0.0.1:40358] "COMMAND"
1534764299.185577 [0 127.0.0.1:40358] "lpop" "service-runner"
1534764299.294485 [0 127.0.0.1:40360] "COMMAND"
1534764299.332686 [0 127.0.0.1:40360] "lpop" "service-runner"
1534764299.441743 [0 127.0.0.1:40362] "COMMAND"
1534764299.479647 [0 127.0.0.1:40362] "lpop" "service-runner"
1534764299.588542 [0 127.0.0.1:40364] "COMMAND"
1534764299.626844 [0 127.0.0.1:40364] "lpop" "service-runner"
1534764299.735902 [0 127.0.0.1:40366] "COMMAND"
1534764299.775585 [0 127.0.0.1:40366] "lpop" "service-runner"
1534764299.884500 [0 127.0.0.1:40368] "COMMAND"
1534764299.959623 [0 127.0.0.1:40368] "lpop" "service-runner"
1534764300.070999 [0 127.0.0.1:40370] "COMMAND"
1534764300.109080 [0 127.0.0.1:40370] "lpop" "service-runner"
1534764300.218050 [0 127.0.0.1:40372] "COMMAND"
1534764300.256361 [0 127.0.0.1:40372] "lpop" "service-runner"
1534764300.365163 [0 127.0.0.1:40374] "COMMAND"
1534764300.402860 [0 127.0.0.1:40374] "lpop" "service-runner"
1534764300.511883 [0 127.0.0.1:40376] "COMMAND"
1534764300.550369 [0 127.0.0.1:40376] "lpop" "service-runner"
1534764300.659241 [0 127.0.0.1:40378] "COMMAND"
1534764300.701712 [0 127.0.0.1:40378] "lpop" "service-runner"
1534764300.817983 [0 127.0.0.1:40380] "COMMAND"
1534764300.895561 [0 127.0.0.1:40380] "lpop" "service-runner"
q11534764301.005468 [0 127.0.0.1:40382] "COMMAND"
1534764301.044846 [0 127.0.0.1:40382] "lpop" "service-runner"
1534764301.153728 [0 127.0.0.1:40384] "COMMAND"
1534764301.192484 [0 127.0.0.1:40384] "lpop" "service-runner"
1534764301.301287 [0 127.0.0.1:40386] "COMMAND"
Which appears to be what is in ~/emonpi/service-runner:
pi@emonpi(ro):~$ cat emonpi/service-runner
#!/bin/bash
## Used to update log viewer window in Emoncms admin
# Used in conjunction with: service-runner-update.sh and Emoncms admin module
scriptname=$(basename $0)
pidfile="/tmp/${scriptname}"
# lock it
exec 200>$pidfile
flock -n 200 || exit 1
pid=$$
echo $pid 1>&200
echo "starting service-runner"
while :
do
flag=$(redis-cli lpop service-runner)
if [ "$flag" ]; then
echo "flag: "$flag
IFS=\> read script logfile <<< "$flag"
echo "STARTING: $script > $logfile"
$script > $logfile
if [ ! -e $logfile ]; then
touch $logfile
fi
echo "COMPLETE: $script > $logfile"
fi
sleep 0.1
done
As soon as I killed service-runner, redis stopped going mental.
[EDIT]
As soon as it started again via crontab though, redis-server
is back to consuming an entire core on the Pi.
danbates
(Daniel Bates)
20 August 2018 12:14
24
We have the culprit! kind of…
Updating redis is unlikely to fix this.
danbates
(Daniel Bates)
20 August 2018 12:21
25
The standard method for following log files is tail -f logfilename
If it can be displayed in the admin page it’d be a better method.
danbates
(Daniel Bates)
20 August 2018 15:38
26
and thus: the magic of crontab -e and commenting-out the relavant line and a reboot did the trick!
Next, emonHub using 4% CPU
Greebo
(Greebo)
20 August 2018 20:56
27
Except that it hasn’t fixed service-runner
…
danbates
(Daniel Bates)
21 August 2018 06:50
28
Any thoughts on who could take responsibility for that?
Luckily it’s a non-essential program.
danbates
(Daniel Bates)
21 August 2018 07:00
29
Might be useful.
http://logio.org/
Greebo
(Greebo)
21 August 2018 07:52
30
Based on the code in github:
I’d suggest it’s @glyn.hudson that we need to let know.
danbates
(Daniel Bates)
21 August 2018 07:53
31
Cool. Apologies that I can’t / won’t go into this more deeply now, I don’t know how service-runner works. I’ve got to keep focussing on my install. The topic was focused on redis high CPU usage and that is fixed. I’ll leave it to you guys for now
Greebo
(Greebo)
21 August 2018 07:54
32
No problem at all, I only commented because you marked the thread as “Solved” when the problem isn’t actually fixed at all
danbates
(Daniel Bates)
21 August 2018 08:15
33
I see your point The underlying problem remains.