rPi 3B+ getting hot. Redis high CPU usage. (emonSD-13Jun18 dev. specific)

  1. redis logs…

  2. emonCMS logs…

  3. redis settings in emoncms/settings.php…
    logs and settings.zip (24.0 KB)
    see 12:35pm on 19th August for seconds leading up to observed redis CPU use increase.

  4. sudo systemctl restart redis-server
    does nothing

  5. Flushing redis.
    nothing again.

  6. redis version
    is 3.2.6 after update emonPi update button, no change

  7. redis files in /var/lib/redis/*
    permission error, can’t view 'em.

  8. sudo ls -la /var/log/redis
    total 8
    drwxr-xr-x 2 redis redis 60 Aug 19 12:34 .
    drwxr-xr-x 11 root root 500 Aug 19 12:34 …
    -rw-rw-rw- 1 redis redis 6384 Aug 19 13:04 redis-server.log

and 9. give it inputs to process.
no change.

Not even with sudo ls -la /var/lib/redis ?

This is in syslog:

Aug 19 12:34:29 emonpi systemd[1]: [/lib/systemd/system/redis-server.service:14] Unknown lvalue 'RunTimeDirectory' in section 'Service'
Aug 19 12:34:29 emonpi systemd[1]: Starting Advanced key-value store...

RunTimeDirectory sounds important, it looks like its value couldn’t be loaded from the config?
There’s no errors in the redis-server.log indicating there’s an issue though, but probably worth checking anyway.

I just found a bug report on Debian bug reports which says:

Followed by (in closing the bug report):

And then only a few months ago on the same page, it was reopened again:

So, is it worth changing the case of “RunTimeDirectory” to “RuntimeDirectory” in /lib/systemd/system/redis-server.service (should be on line 14) and see if there’s any change?

[Edit] worth noting that the actual fix in the bug report linked above is

[Edit #2] I just checked the man-page for systemd.exec and the RuntimeDirectory directive is used to create runtime folders under a root folder as defined in the following table, so likely there are runtime folders that are expected to exist but do not.
Table from man-page:

Locations for system for users
RuntimeDirectory= /run $XDG_RUNTIME_DIR

Got it. I’d like to know how that command works.

pi@emonpi(ro):~$ sudo ls -la /var/lib/redis
total 12
drwxr-x---  2 redis redis 4096 May 17 13:41 .
drwxr-xr-x 44 root  root  4096 May 17 12:52 ..
-rw-rw----  1 redis redis   76 May 17 13:41 dump.rdb

. Correcting typo on line 14, reboot, no change with issue.
. Directory /run and /run/redis contents below… Not sure what I’m looking for.

pi@emonpi(rw):/$ cd run
pi@emonpi(rw):run$ ls
agetty.reload  dbus                dnsmasq         log            network     rpcbind.lock     sshd        udev
apache2        dhclient.wlan0.pid  emonhub.pid     lwrfd.pid      ntpd.pid    rpcbind.sock     sshd.pid    user
avahi-daemon   dhcpcd              emonPiLCD.pid   mosquitto.pid  plymouth    rpc_pipefs       sysconfig   utmp
console-setup  dhcpcd.pid          feedwriter.pid  motd.dynamic   redis       rsyslogd.pid     systemd     wpa_supplicant
crond.pid      dhcpcd.sock         initctl         mount          resolvconf  sendsigs.omit.d  thd.socket  wpa_supplicant.wlan0.pid
crond.reboot   dhcpcd.unpriv.sock  lock            mysqld         rpcbind     shm              tmpfiles.d  xtables.lock
pi@emonpi(rw):run$ 
pi@emonpi(rw):run$ cd redis
pi@emonpi(rw):redis$ ls
redis-server.pid
pi@emonpi(rw):redis$

More usefully is actually understanding why “sudo ls -la /var/lib/redis/*doesn’t work. See this post of mine here.
In previous distributions that file would have been deleted as part of the “making read-only”. It was recently discovered (in that thread I just linked to) that there was an error in the build guide which leaves that file behind causing other issues (as per that other thread). Try removing it using:

sudo rm /var/lib/redis/dump.rdb

Ok, looks like once the folder was there, all it created in that folder was the .pid file (which is normally used to track what PID the process is running under) so it probably didn’t have any consequences beyond logging an error in syslog on service startup.

See how you go removing the dump.rdb file from /var/lib/redis.

Removing seemed to happen OK. No change with redis.

pi@emonpi(ro):~$ sudo ls -la /var/lib/redis
total 8
drwxr-x---  2 redis redis 4096 Aug 20 06:56 .
drwxr-xr-x 44 root  root  4096 May 17 12:52 ..

I notice on the redis website we’ve got a newer old version 3 to try out.

I’ve compiled 3.2.12 and have a src folder with all these:

pi@emonpi(rw):redis-3.2.12$ cd src
pi@emonpi(rw):src$ ls
adlist.c     bitops.c      debug.o        latency.o        object.o           redis-check-aof    rio.o           slowlog.o       t_zset.o
adlist.h     bitops.o      dict.c         lzf_c.c          pqsort.c           redis-check-aof.c  scripting.c     solarisfixes.h  util.c
adlist.o     blocked.c     dict.h         lzf_c.o          pqsort.h           redis-check-aof.o  scripting.o     sort.c          util.h
ae.c         blocked.o     dict.o         lzf_d.c          pqsort.o           redis-check-rdb    sdsalloc.h      sort.o          util.o
ae_epoll.c   cluster.c     endianconv.c   lzf_d.o          pubsub.c           redis-check-rdb.c  sds.c           sparkline.c     valgrind.sup
ae_evport.c  cluster.h     endianconv.h   lzf.h            pubsub.o           redis-check-rdb.o  sds.h           sparkline.h     version.h
ae.h         cluster.o     endianconv.o   lzfP.h           quicklist.c        redis-cli          sds.o           sparkline.o     ziplist.c
ae_kqueue.c  config.c      fmacros.h      Makefile         quicklist.h        redis-cli.c        sentinel.c      syncio.c        ziplist.h
ae.o         config.h      geo.c          Makefile.dep     quicklist.o        redis-cli.o        sentinel.o      syncio.o        ziplist.o
ae_select.c  config.o      geo.h          memtest.c        rand.c             redis-sentinel     server.c        testhelp.h      zipmap.c
anet.c       crc16.c       geo.o          memtest.o        rand.h             redis-server       server.h        t_hash.c        zipmap.h
anet.h       crc16.o       help.h         mkreleasehdr.sh  rand.o             redis-trib.rb      server.o        t_hash.o        zipmap.o
anet.o       crc64.c       hyperloglog.c  multi.c          rdb.c              release.c          setproctitle.c  t_list.c        zmalloc.c
aof.c        crc64.h       hyperloglog.o  multi.o          rdb.h              release.h          setproctitle.o  t_list.o        zmalloc.h
aof.o        crc64.o       intset.c       networking.c     rdb.o              release.o          sha1.c          t_set.c         zmalloc.o
asciilogo.h  db.c          intset.h       networking.o     redisassert.h      replication.c      sha1.h          t_set.o
bio.c        db.o          intset.o       notify.c         redis-benchmark    replication.o      sha1.o          t_string.c
bio.h        debug.c       latency.c      notify.o         redis-benchmark.c  rio.c              slowlog.c       t_string.o
bio.o        debugmacro.h  latency.h      object.c         redis-benchmark.o  rio.h              slowlog.h       t_zset.c

Where do these go if I was to update my redis? Or is there a better way of compiling directly into the redis folder perhaps?

Dan will be happy to hear that I’ve just downloaded the latest image from OEM and flashed it to an SD card and I get the same behaviour.

Just to reiterate, this is after:

  1. download SD image from here
  2. burn image to SD card using Etcher
  3. plug SD card into RPi 3
  4. plug RPi into LAN port of switch
  5. power up Pi
  6. find IP address of emonpi from Router interface
  7. ssh to IP address and logon using pi
  8. run top
top - 10:50:11 up 11 min,  1 user,  load average: 0.53, 0.63, 0.45
Tasks: 112 total,   1 running,  69 sleeping,   0 stopped,   0 zombie
%Cpu(s):  7.3 us,  7.6 sy,  0.3 ni, 81.5 id,  1.2 wa,  0.0 hi,  2.0 si,  0.0 st
KiB Mem :  1000184 total,   357156 free,   264160 used,   378868 buff/cache
KiB Swap:        0 total,        0 free,        0 used.   648336 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 1288 redis     20   0   31312   3032   2260 S  36.8  0.3   2:26.50 redis-server
10122 pi        20   0    9868   3340   2792 R  10.5  0.3   0:00.05 top
    1 root      20   0   27224   6208   4880 S   0.0  0.6   0:05.36 systemd
    2 root      20   0       0      0      0 S   0.0  0.0   0:00.00 kthreadd

I did an strace on the process and its doing a LOT of this:

[lots of unnecessary text deleted]

Does that mean anything to anyone?

[EDIT] Found the culprit… See the next post below.

Looks like its service-runner…

using redis-cli, and MONITOR, I see this:

1534764290.902086 [0 127.0.0.1:40252] "COMMAND"
1534764290.940222 [0 127.0.0.1:40252] "lpop" "service-runner"
1534764291.049197 [0 127.0.0.1:40254] "COMMAND"
1534764291.092773 [0 127.0.0.1:40254] "lpop" "service-runner"
1534764291.209071 [0 127.0.0.1:40256] "COMMAND"
1534764291.284804 [0 127.0.0.1:40256] "lpop" "service-runner"
1534764291.394679 [0 127.0.0.1:40258] "COMMAND"
1534764291.433175 [0 127.0.0.1:40258] "lpop" "service-runner"
1534764291.542032 [0 127.0.0.1:40260] "COMMAND"
1534764291.580867 [0 127.0.0.1:40260] "lpop" "service-runner"
1534764291.689864 [0 127.0.0.1:40262] "COMMAND"
1534764291.728895 [0 127.0.0.1:40262] "lpop" "service-runner"
1534764291.837814 [0 127.0.0.1:40264] "COMMAND"
1534764291.875647 [0 127.0.0.1:40264] "lpop" "service-runner"
1534764291.984414 [0 127.0.0.1:40266] "COMMAND"
1534764292.051389 [0 127.0.0.1:40266] "lpop" "service-runner"
1534764292.162736 [0 127.0.0.1:40268] "COMMAND"
1534764292.200655 [0 127.0.0.1:40268] "lpop" "service-runner"
1534764292.309909 [0 127.0.0.1:40270] "COMMAND"
1534764292.347370 [0 127.0.0.1:40270] "lpop" "service-runner"
1534764292.456159 [0 127.0.0.1:40272] "COMMAND"
1534764292.494217 [0 127.0.0.1:40272] "lpop" "service-runner"
1534764292.603430 [0 127.0.0.1:40274] "COMMAND"
1534764292.641038 [0 127.0.0.1:40274] "lpop" "service-runner"
1534764292.750231 [0 127.0.0.1:40276] "COMMAND"
1534764292.787986 [0 127.0.0.1:40276] "lpop" "service-runner"
1534764292.901594 [0 127.0.0.1:40278] "COMMAND"
1534764292.977671 [0 127.0.0.1:40278] "lpop" "service-runner"
1534764293.089051 [0 127.0.0.1:40280] "COMMAND"
1534764293.127222 [0 127.0.0.1:40280] "lpop" "service-runner"
1534764293.236079 [0 127.0.0.1:40282] "COMMAND"
1534764293.274621 [0 127.0.0.1:40282] "lpop" "service-runner"
1534764293.383566 [0 127.0.0.1:40284] "COMMAND"
1534764293.422042 [0 127.0.0.1:40284] "lpop" "service-runner"
1534764293.531111 [0 127.0.0.1:40286] "COMMAND"
1534764293.569599 [0 127.0.0.1:40286] "lpop" "service-runner"
1534764293.678696 [0 127.0.0.1:40288] "COMMAND"
1534764293.722883 [0 127.0.0.1:40288] "lpop" "service-runner"
1534764293.839386 [0 127.0.0.1:40290] "COMMAND"
1534764293.914465 [0 127.0.0.1:40290] "lpop" "service-runner"
1534764294.024684 [0 127.0.0.1:40292] "COMMAND"
1534764294.062352 [0 127.0.0.1:40292] "lpop" "service-runner"
1534764294.171162 [0 127.0.0.1:40294] "COMMAND"
1534764294.209069 [0 127.0.0.1:40294] "lpop" "service-runner"
1534764294.318104 [0 127.0.0.1:40296] "COMMAND"
1534764294.355857 [0 127.0.0.1:40296] "lpop" "service-runner"
1534764294.464634 [0 127.0.0.1:40298] "COMMAND"
1534764294.502605 [0 127.0.0.1:40298] "lpop" "service-runner"
1534764294.611750 [0 127.0.0.1:40300] "COMMAND"
1534764294.676633 [0 127.0.0.1:40300] "lpop" "service-runner"
1534764294.788416 [0 127.0.0.1:40302] "COMMAND"
1534764294.826857 [0 127.0.0.1:40302] "lpop" "service-runner"
1534764294.935804 [0 127.0.0.1:40304] "COMMAND"
1534764294.974066 [0 127.0.0.1:40304] "lpop" "service-runner"
1534764295.082956 [0 127.0.0.1:40306] "COMMAND"
1534764295.121194 [0 127.0.0.1:40306] "lpop" "service-runner"
1534764295.230041 [0 127.0.0.1:40308] "COMMAND"
1534764295.268337 [0 127.0.0.1:40308] "lpop" "service-runner"
1534764295.377303 [0 127.0.0.1:40310] "COMMAND"
1534764295.416084 [0 127.0.0.1:40310] "lpop" "service-runner"
1534764295.529468 [0 127.0.0.1:40312] "COMMAND"
1534764295.605709 [0 127.0.0.1:40312] "lpop" "service-runner"
1534764295.716997 [0 127.0.0.1:40314] "COMMAND"
1534764295.755140 [0 127.0.0.1:40314] "lpop" "service-runner"
1534764295.864223 [0 127.0.0.1:40316] "COMMAND"
1534764295.902137 [0 127.0.0.1:40316] "lpop" "service-runner"
1534764296.011123 [0 127.0.0.1:40318] "COMMAND"
1534764296.048981 [0 127.0.0.1:40318] "lpop" "service-runner"
1534764296.157993 [0 127.0.0.1:40320] "COMMAND"
1534764296.195802 [0 127.0.0.1:40320] "lpop" "service-runner"
1534764296.304819 [0 127.0.0.1:40322] "COMMAND"
1534764296.342483 [0 127.0.0.1:40322] "lpop" "service-runner"
1534764296.458611 [0 127.0.0.1:40324] "COMMAND"
1534764296.534213 [0 127.0.0.1:40324] "lpop" "service-runner"
1534764296.645611 [0 127.0.0.1:40326] "COMMAND"
1534764296.683906 [0 127.0.0.1:40326] "lpop" "service-runner"
1534764296.792961 [0 127.0.0.1:40328] "COMMAND"
1534764296.831556 [0 127.0.0.1:40328] "lpop" "service-runner"
1534764296.940561 [0 127.0.0.1:40330] "COMMAND"
1534764296.978779 [0 127.0.0.1:40330] "lpop" "service-runner"
1534764297.087696 [0 127.0.0.1:40332] "COMMAND"
1534764297.123778 [0 127.0.0.1:40332] "lpop" "service-runner"
1534764297.232712 [0 127.0.0.1:40334] "COMMAND"
1534764297.289257 [0 127.0.0.1:40334] "lpop" "service-runner"
1534764297.405525 [0 127.0.0.1:40336] "COMMAND"
1534764297.469148 [0 127.0.0.1:40336] "lpop" "service-runner"
1534764297.577916 [0 127.0.0.1:40338] "COMMAND"
1534764297.615979 [0 127.0.0.1:40338] "lpop" "service-runner"
1534764297.724937 [0 127.0.0.1:40340] "COMMAND"
1534764297.763295 [0 127.0.0.1:40340] "lpop" "service-runner"
1534764297.872390 [0 127.0.0.1:40342] "COMMAND"
1534764297.910873 [0 127.0.0.1:40342] "lpop" "service-runner"
1534764298.019929 [0 127.0.0.1:40344] "COMMAND"
1534764298.058203 [0 127.0.0.1:40344] "lpop" "service-runner"
1534764298.171360 [0 127.0.0.1:40346] "COMMAND"
1534764298.247247 [0 127.0.0.1:40346] "lpop" "service-runner"
1534764298.358903 [0 127.0.0.1:40348] "COMMAND"
1534764298.397296 [0 127.0.0.1:40348] "lpop" "service-runner"
1534764298.506275 [0 127.0.0.1:40350] "COMMAND"
1534764298.544455 [0 127.0.0.1:40350] "lpop" "service-runner"
1534764298.653411 [0 127.0.0.1:40352] "COMMAND"
1534764298.691741 [0 127.0.0.1:40352] "lpop" "service-runner"
1534764298.800785 [0 127.0.0.1:40354] "COMMAND"
1534764298.839067 [0 127.0.0.1:40354] "lpop" "service-runner"
1534764298.947977 [0 127.0.0.1:40356] "COMMAND"
1534764299.002411 [0 127.0.0.1:40356] "lpop" "service-runner"
1534764299.118331 [0 127.0.0.1:40358] "COMMAND"
1534764299.185577 [0 127.0.0.1:40358] "lpop" "service-runner"
1534764299.294485 [0 127.0.0.1:40360] "COMMAND"
1534764299.332686 [0 127.0.0.1:40360] "lpop" "service-runner"
1534764299.441743 [0 127.0.0.1:40362] "COMMAND"
1534764299.479647 [0 127.0.0.1:40362] "lpop" "service-runner"
1534764299.588542 [0 127.0.0.1:40364] "COMMAND"
1534764299.626844 [0 127.0.0.1:40364] "lpop" "service-runner"
1534764299.735902 [0 127.0.0.1:40366] "COMMAND"
1534764299.775585 [0 127.0.0.1:40366] "lpop" "service-runner"
1534764299.884500 [0 127.0.0.1:40368] "COMMAND"
1534764299.959623 [0 127.0.0.1:40368] "lpop" "service-runner"
1534764300.070999 [0 127.0.0.1:40370] "COMMAND"
1534764300.109080 [0 127.0.0.1:40370] "lpop" "service-runner"
1534764300.218050 [0 127.0.0.1:40372] "COMMAND"
1534764300.256361 [0 127.0.0.1:40372] "lpop" "service-runner"
1534764300.365163 [0 127.0.0.1:40374] "COMMAND"
1534764300.402860 [0 127.0.0.1:40374] "lpop" "service-runner"
1534764300.511883 [0 127.0.0.1:40376] "COMMAND"
1534764300.550369 [0 127.0.0.1:40376] "lpop" "service-runner"
1534764300.659241 [0 127.0.0.1:40378] "COMMAND"
1534764300.701712 [0 127.0.0.1:40378] "lpop" "service-runner"
1534764300.817983 [0 127.0.0.1:40380] "COMMAND"
1534764300.895561 [0 127.0.0.1:40380] "lpop" "service-runner"
q11534764301.005468 [0 127.0.0.1:40382] "COMMAND"
1534764301.044846 [0 127.0.0.1:40382] "lpop" "service-runner"
1534764301.153728 [0 127.0.0.1:40384] "COMMAND"
1534764301.192484 [0 127.0.0.1:40384] "lpop" "service-runner"
1534764301.301287 [0 127.0.0.1:40386] "COMMAND"

Which appears to be what is in ~/emonpi/service-runner:

pi@emonpi(ro):~$ cat emonpi/service-runner
#!/bin/bash

## Used to update log viewer window in Emoncms admin
# Used in conjunction with: service-runner-update.sh and Emoncms admin module

scriptname=$(basename $0)
pidfile="/tmp/${scriptname}"

# lock it
exec 200>$pidfile
flock -n 200 || exit 1
pid=$$
echo $pid 1>&200

echo "starting service-runner"

while :
do
    flag=$(redis-cli lpop service-runner)
    if [ "$flag" ]; then
        echo "flag: "$flag
        IFS=\> read script logfile <<< "$flag"
        echo "STARTING: $script > $logfile"
        $script > $logfile
        if [ ! -e $logfile ]; then
            touch $logfile
        fi
        echo "COMPLETE: $script > $logfile"
    fi

    sleep 0.1
done

As soon as I killed service-runner, redis stopped going mental.

[EDIT]
As soon as it started again via crontab though, redis-server is back to consuming an entire core on the Pi.

We have the culprit! kind of…
Updating redis is unlikely to fix this.

The standard method for following log files is tail -f logfilename
If it can be displayed in the admin page it’d be a better method.

and thus: the magic of crontab -e and commenting-out the relavant line and a reboot did the trick!

Next, emonHub using 4% CPU :laughing:

Except that it hasn’t fixed service-runner

Any thoughts on who could take responsibility for that?
Luckily it’s a non-essential program.

Might be useful.
http://logio.org/

Based on the code in github:

I’d suggest it’s @glyn.hudson that we need to let know.

Cool. Apologies that I can’t / won’t go into this more deeply now, I don’t know how service-runner works. I’ve got to keep focussing on my install. The topic was focused on redis high CPU usage and that is fixed. :slight_smile: I’ll leave it to you guys for now :slight_smile:

No problem at all, I only commented because you marked the thread as “Solved” when the problem isn’t actually fixed at all :slight_smile:

I see your point :slight_smile: The underlying problem remains.