RPi 3B+ getting hot. Redis high CPU usage. (emonSD-13Jun18 dev. specific)

I’ve compiled Redis 3.2.12 and have a src folder containing all of these:

pi@emonpi(rw):redis-3.2.12$ cd src
pi@emonpi(rw):src$ ls
adlist.c     bitops.c      debug.o        latency.o        object.o           redis-check-aof    rio.o           slowlog.o       t_zset.o
adlist.h     bitops.o      dict.c         lzf_c.c          pqsort.c           redis-check-aof.c  scripting.c     solarisfixes.h  util.c
adlist.o     blocked.c     dict.h         lzf_c.o          pqsort.h           redis-check-aof.o  scripting.o     sort.c          util.h
ae.c         blocked.o     dict.o         lzf_d.c          pqsort.o           redis-check-rdb    sdsalloc.h      sort.o          util.o
ae_epoll.c   cluster.c     endianconv.c   lzf_d.o          pubsub.c           redis-check-rdb.c  sds.c           sparkline.c     valgrind.sup
ae_evport.c  cluster.h     endianconv.h   lzf.h            pubsub.o           redis-check-rdb.o  sds.h           sparkline.h     version.h
ae.h         cluster.o     endianconv.o   lzfP.h           quicklist.c        redis-cli          sds.o           sparkline.o     ziplist.c
ae_kqueue.c  config.c      fmacros.h      Makefile         quicklist.h        redis-cli.c        sentinel.c      syncio.c        ziplist.h
ae.o         config.h      geo.c          Makefile.dep     quicklist.o        redis-cli.o        sentinel.o      syncio.o        ziplist.o
ae_select.c  config.o      geo.h          memtest.c        rand.c             redis-sentinel     server.c        testhelp.h      zipmap.c
anet.c       crc16.c       geo.o          memtest.o        rand.h             redis-server       server.h        t_hash.c        zipmap.h
anet.h       crc16.o       help.h         mkreleasehdr.sh  rand.o             redis-trib.rb      server.o        t_hash.o        zipmap.o
anet.o       crc64.c       hyperloglog.c  multi.c          rdb.c              release.c          setproctitle.c  t_list.c        zmalloc.c
aof.c        crc64.h       hyperloglog.o  multi.o          rdb.h              release.h          setproctitle.o  t_list.o        zmalloc.h
aof.o        crc64.o       intset.c       networking.c     rdb.o              release.o          sha1.c          t_set.c         zmalloc.o
asciilogo.h  db.c          intset.h       networking.o     redisassert.h      replication.c      sha1.h          t_set.o
bio.c        db.o          intset.o       notify.c         redis-benchmark    replication.o      sha1.o          t_string.c
bio.h        debug.c       latency.c      notify.o         redis-benchmark.c  rio.c              slowlog.c       t_string.o
bio.o        debugmacro.h  latency.h      object.c         redis-benchmark.o  rio.h              slowlog.h       t_zset.c

Where do these go if I were to update my Redis? Or is there perhaps a better way of compiling directly into the Redis folder?

Dan will be happy to hear that I’ve just downloaded the latest image from OEM, flashed it to an SD card, and I get the same behaviour.

Just to reiterate, this is after:

  1. download SD image from here
  2. burn image to SD card using Etcher
  3. plug SD card into RPi 3
  4. plug RPi into LAN port of switch
  5. power up Pi
  6. find IP address of emonpi from Router interface
  7. ssh to the IP address and log on as the pi user
  8. run top
top - 10:50:11 up 11 min,  1 user,  load average: 0.53, 0.63, 0.45
Tasks: 112 total,   1 running,  69 sleeping,   0 stopped,   0 zombie
%Cpu(s):  7.3 us,  7.6 sy,  0.3 ni, 81.5 id,  1.2 wa,  0.0 hi,  2.0 si,  0.0 st
KiB Mem :  1000184 total,   357156 free,   264160 used,   378868 buff/cache
KiB Swap:        0 total,        0 free,        0 used.   648336 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 1288 redis     20   0   31312   3032   2260 S  36.8  0.3   2:26.50 redis-server
10122 pi        20   0    9868   3340   2792 R  10.5  0.3   0:00.05 top
    1 root      20   0   27224   6208   4880 S   0.0  0.6   0:05.36 systemd
    2 root      20   0       0      0      0 S   0.0  0.0   0:00.00 kthreadd

I did an strace on the process and it’s doing a LOT of this:

[lots of unnecessary text deleted]

Does that mean anything to anyone?

[EDIT] Found the culprit… See the next post below.

Looks like it’s service-runner…

Using redis-cli and MONITOR, I see this:

1534764290.902086 [0 127.0.0.1:40252] "COMMAND"
1534764290.940222 [0 127.0.0.1:40252] "lpop" "service-runner"
1534764291.049197 [0 127.0.0.1:40254] "COMMAND"
1534764291.092773 [0 127.0.0.1:40254] "lpop" "service-runner"
1534764291.209071 [0 127.0.0.1:40256] "COMMAND"
1534764291.284804 [0 127.0.0.1:40256] "lpop" "service-runner"
1534764291.394679 [0 127.0.0.1:40258] "COMMAND"
1534764291.433175 [0 127.0.0.1:40258] "lpop" "service-runner"
1534764291.542032 [0 127.0.0.1:40260] "COMMAND"
1534764291.580867 [0 127.0.0.1:40260] "lpop" "service-runner"
1534764291.689864 [0 127.0.0.1:40262] "COMMAND"
1534764291.728895 [0 127.0.0.1:40262] "lpop" "service-runner"
1534764291.837814 [0 127.0.0.1:40264] "COMMAND"
1534764291.875647 [0 127.0.0.1:40264] "lpop" "service-runner"
1534764291.984414 [0 127.0.0.1:40266] "COMMAND"
1534764292.051389 [0 127.0.0.1:40266] "lpop" "service-runner"
1534764292.162736 [0 127.0.0.1:40268] "COMMAND"
1534764292.200655 [0 127.0.0.1:40268] "lpop" "service-runner"
1534764292.309909 [0 127.0.0.1:40270] "COMMAND"
1534764292.347370 [0 127.0.0.1:40270] "lpop" "service-runner"
1534764292.456159 [0 127.0.0.1:40272] "COMMAND"
1534764292.494217 [0 127.0.0.1:40272] "lpop" "service-runner"
1534764292.603430 [0 127.0.0.1:40274] "COMMAND"
1534764292.641038 [0 127.0.0.1:40274] "lpop" "service-runner"
1534764292.750231 [0 127.0.0.1:40276] "COMMAND"
1534764292.787986 [0 127.0.0.1:40276] "lpop" "service-runner"
1534764292.901594 [0 127.0.0.1:40278] "COMMAND"
1534764292.977671 [0 127.0.0.1:40278] "lpop" "service-runner"
1534764293.089051 [0 127.0.0.1:40280] "COMMAND"
1534764293.127222 [0 127.0.0.1:40280] "lpop" "service-runner"
1534764293.236079 [0 127.0.0.1:40282] "COMMAND"
1534764293.274621 [0 127.0.0.1:40282] "lpop" "service-runner"
1534764293.383566 [0 127.0.0.1:40284] "COMMAND"
1534764293.422042 [0 127.0.0.1:40284] "lpop" "service-runner"
1534764293.531111 [0 127.0.0.1:40286] "COMMAND"
1534764293.569599 [0 127.0.0.1:40286] "lpop" "service-runner"
1534764293.678696 [0 127.0.0.1:40288] "COMMAND"
1534764293.722883 [0 127.0.0.1:40288] "lpop" "service-runner"
1534764293.839386 [0 127.0.0.1:40290] "COMMAND"
1534764293.914465 [0 127.0.0.1:40290] "lpop" "service-runner"
1534764294.024684 [0 127.0.0.1:40292] "COMMAND"
1534764294.062352 [0 127.0.0.1:40292] "lpop" "service-runner"
1534764294.171162 [0 127.0.0.1:40294] "COMMAND"
1534764294.209069 [0 127.0.0.1:40294] "lpop" "service-runner"
1534764294.318104 [0 127.0.0.1:40296] "COMMAND"
1534764294.355857 [0 127.0.0.1:40296] "lpop" "service-runner"
1534764294.464634 [0 127.0.0.1:40298] "COMMAND"
1534764294.502605 [0 127.0.0.1:40298] "lpop" "service-runner"
1534764294.611750 [0 127.0.0.1:40300] "COMMAND"
1534764294.676633 [0 127.0.0.1:40300] "lpop" "service-runner"
1534764294.788416 [0 127.0.0.1:40302] "COMMAND"
1534764294.826857 [0 127.0.0.1:40302] "lpop" "service-runner"
1534764294.935804 [0 127.0.0.1:40304] "COMMAND"
1534764294.974066 [0 127.0.0.1:40304] "lpop" "service-runner"
1534764295.082956 [0 127.0.0.1:40306] "COMMAND"
1534764295.121194 [0 127.0.0.1:40306] "lpop" "service-runner"
1534764295.230041 [0 127.0.0.1:40308] "COMMAND"
1534764295.268337 [0 127.0.0.1:40308] "lpop" "service-runner"
1534764295.377303 [0 127.0.0.1:40310] "COMMAND"
1534764295.416084 [0 127.0.0.1:40310] "lpop" "service-runner"
1534764295.529468 [0 127.0.0.1:40312] "COMMAND"
1534764295.605709 [0 127.0.0.1:40312] "lpop" "service-runner"
1534764295.716997 [0 127.0.0.1:40314] "COMMAND"
1534764295.755140 [0 127.0.0.1:40314] "lpop" "service-runner"
1534764295.864223 [0 127.0.0.1:40316] "COMMAND"
1534764295.902137 [0 127.0.0.1:40316] "lpop" "service-runner"
1534764296.011123 [0 127.0.0.1:40318] "COMMAND"
1534764296.048981 [0 127.0.0.1:40318] "lpop" "service-runner"
1534764296.157993 [0 127.0.0.1:40320] "COMMAND"
1534764296.195802 [0 127.0.0.1:40320] "lpop" "service-runner"
1534764296.304819 [0 127.0.0.1:40322] "COMMAND"
1534764296.342483 [0 127.0.0.1:40322] "lpop" "service-runner"
1534764296.458611 [0 127.0.0.1:40324] "COMMAND"
1534764296.534213 [0 127.0.0.1:40324] "lpop" "service-runner"
1534764296.645611 [0 127.0.0.1:40326] "COMMAND"
1534764296.683906 [0 127.0.0.1:40326] "lpop" "service-runner"
1534764296.792961 [0 127.0.0.1:40328] "COMMAND"
1534764296.831556 [0 127.0.0.1:40328] "lpop" "service-runner"
1534764296.940561 [0 127.0.0.1:40330] "COMMAND"
1534764296.978779 [0 127.0.0.1:40330] "lpop" "service-runner"
1534764297.087696 [0 127.0.0.1:40332] "COMMAND"
1534764297.123778 [0 127.0.0.1:40332] "lpop" "service-runner"
1534764297.232712 [0 127.0.0.1:40334] "COMMAND"
1534764297.289257 [0 127.0.0.1:40334] "lpop" "service-runner"
1534764297.405525 [0 127.0.0.1:40336] "COMMAND"
1534764297.469148 [0 127.0.0.1:40336] "lpop" "service-runner"
1534764297.577916 [0 127.0.0.1:40338] "COMMAND"
1534764297.615979 [0 127.0.0.1:40338] "lpop" "service-runner"
1534764297.724937 [0 127.0.0.1:40340] "COMMAND"
1534764297.763295 [0 127.0.0.1:40340] "lpop" "service-runner"
1534764297.872390 [0 127.0.0.1:40342] "COMMAND"
1534764297.910873 [0 127.0.0.1:40342] "lpop" "service-runner"
1534764298.019929 [0 127.0.0.1:40344] "COMMAND"
1534764298.058203 [0 127.0.0.1:40344] "lpop" "service-runner"
1534764298.171360 [0 127.0.0.1:40346] "COMMAND"
1534764298.247247 [0 127.0.0.1:40346] "lpop" "service-runner"
1534764298.358903 [0 127.0.0.1:40348] "COMMAND"
1534764298.397296 [0 127.0.0.1:40348] "lpop" "service-runner"
1534764298.506275 [0 127.0.0.1:40350] "COMMAND"
1534764298.544455 [0 127.0.0.1:40350] "lpop" "service-runner"
1534764298.653411 [0 127.0.0.1:40352] "COMMAND"
1534764298.691741 [0 127.0.0.1:40352] "lpop" "service-runner"
1534764298.800785 [0 127.0.0.1:40354] "COMMAND"
1534764298.839067 [0 127.0.0.1:40354] "lpop" "service-runner"
1534764298.947977 [0 127.0.0.1:40356] "COMMAND"
1534764299.002411 [0 127.0.0.1:40356] "lpop" "service-runner"
1534764299.118331 [0 127.0.0.1:40358] "COMMAND"
1534764299.185577 [0 127.0.0.1:40358] "lpop" "service-runner"
1534764299.294485 [0 127.0.0.1:40360] "COMMAND"
1534764299.332686 [0 127.0.0.1:40360] "lpop" "service-runner"
1534764299.441743 [0 127.0.0.1:40362] "COMMAND"
1534764299.479647 [0 127.0.0.1:40362] "lpop" "service-runner"
1534764299.588542 [0 127.0.0.1:40364] "COMMAND"
1534764299.626844 [0 127.0.0.1:40364] "lpop" "service-runner"
1534764299.735902 [0 127.0.0.1:40366] "COMMAND"
1534764299.775585 [0 127.0.0.1:40366] "lpop" "service-runner"
1534764299.884500 [0 127.0.0.1:40368] "COMMAND"
1534764299.959623 [0 127.0.0.1:40368] "lpop" "service-runner"
1534764300.070999 [0 127.0.0.1:40370] "COMMAND"
1534764300.109080 [0 127.0.0.1:40370] "lpop" "service-runner"
1534764300.218050 [0 127.0.0.1:40372] "COMMAND"
1534764300.256361 [0 127.0.0.1:40372] "lpop" "service-runner"
1534764300.365163 [0 127.0.0.1:40374] "COMMAND"
1534764300.402860 [0 127.0.0.1:40374] "lpop" "service-runner"
1534764300.511883 [0 127.0.0.1:40376] "COMMAND"
1534764300.550369 [0 127.0.0.1:40376] "lpop" "service-runner"
1534764300.659241 [0 127.0.0.1:40378] "COMMAND"
1534764300.701712 [0 127.0.0.1:40378] "lpop" "service-runner"
1534764300.817983 [0 127.0.0.1:40380] "COMMAND"
1534764300.895561 [0 127.0.0.1:40380] "lpop" "service-runner"
1534764301.005468 [0 127.0.0.1:40382] "COMMAND"
1534764301.044846 [0 127.0.0.1:40382] "lpop" "service-runner"
1534764301.153728 [0 127.0.0.1:40384] "COMMAND"
1534764301.192484 [0 127.0.0.1:40384] "lpop" "service-runner"
1534764301.301287 [0 127.0.0.1:40386] "COMMAND"

Which appears to be what is in ~/emonpi/service-runner:

pi@emonpi(ro):~$ cat emonpi/service-runner
#!/bin/bash

## Used to update log viewer window in Emoncms admin
# Used in conjunction with: service-runner-update.sh and Emoncms admin module

scriptname=$(basename $0)
pidfile="/tmp/${scriptname}"

# lock it
exec 200>$pidfile
flock -n 200 || exit 1
pid=$$
echo $pid 1>&200

echo "starting service-runner"

while :
do
    flag=$(redis-cli lpop service-runner)
    if [ "$flag" ]; then
        echo "flag: "$flag
        IFS=\> read script logfile <<< "$flag"
        echo "STARTING: $script > $logfile"
        $script > $logfile
        if [ ! -e $logfile ]; then
            touch $logfile
        fi
        echo "COMPLETE: $script > $logfile"
    fi

    sleep 0.1
done

As soon as I killed service-runner, redis stopped going mental.

[EDIT]
As soon as it started again via crontab though, redis-server is back to consuming an entire core on the Pi.

We have the culprit! Kind of…
Updating redis is unlikely to fix this.

The standard method for following log files is tail -f logfilename.
If that output could be displayed in the admin page, it would be a better method.

And thus: the magic of crontab -e, commenting out the relevant line, and a reboot did the trick!

Next, emonHub using 4% CPU :laughing:

Except that it hasn’t fixed service-runner.

Any thoughts on who could take responsibility for that?
Luckily it’s a non-essential program.

Might be useful.
http://logio.org/

Based on the code on GitHub:

I’d suggest it’s @glyn.hudson whom we need to let know.

Cool. Apologies that I can’t / won’t go into this more deeply now, as I don’t know how service-runner works and I’ve got to keep focusing on my install. The topic was about Redis high CPU usage and that is fixed. :slight_smile: I’ll leave it to you guys for now :slight_smile:

No problem at all, I only commented because you marked the thread as “Solved” when the problem isn’t actually fixed at all :slight_smile:

I see your point :slight_smile: The underlying problem remains.

Thanks for bringing this to our attention. The issue was that service-runner was polling Redis every 100 ms. Since it’s a bash script it was not using the PHP Redis connection from emoncms; it was creating its own connection each time. This created an overhead, resulting in high load. Increasing the sleep delay to 1 s has now reduced the Redis load to under 5%.

@glyn.hudson That works well. On the RPi I have here, a sleep of 1 s results in about 7% CPU use and 2 s in about 4%, back to normal.
A further improvement would be to start the service runner when an update is initiated and terminate it afterwards, if the purpose of the runner is to work in conjunction with the updater.

The service runner is what triggers the update; it needs to be running continuously. We use the service runner as a bridge between the web application and the update bash scripts.

The process is as follows:

  1. The web application triggers an update by setting a flag in Redis (see the sketch below)
  2. The service runner continuously polls Redis for that flag
  3. The service runner starts the update and pipes the log back to the web application
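
In concrete terms, step 1 is just a list push and steps 2–3 are the lpop calls seen in the MONITOR output above. A rough Python sketch of the producer side (the real web application is PHP, the paths below are made up, and the redis-py package is assumed):

# Hypothetical illustration of step 1: queue a job for service-runner by
# pushing "script>logfile" onto the 'service-runner' list.
# Paths are made up; the real emoncms code is PHP and may use LPUSH or RPUSH.
import redis  # pip install redis

r = redis.Redis(host='127.0.0.1', port=6379)
r.lpush('service-runner', '/path/to/update.sh>/tmp/update.log')

service-runner then pops that entry, splits it on ‘>’, runs the script and redirects its output to the logfile, which is what the admin page log viewer reads.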

All is clear. Thanks.
Does polling redis normally take so much CPU? Or is it because of creating a new connection each time?

What triggered the change in approach from file system flags to a redis flag?
Having to use redis-cli from bash seems the least efficient way to achieve this functionality…

service-runner was able to poll the file system 10 times a second without causing significant load on the Pi, but polling Redis at that interval sends it into a tailspin.

[EDIT] I went further back in the history on GitHub and it looks like the filesystem flag was being polled every 1 second until the last commit before it was changed to use Redis…

It’s creating a new connection each time. Redis can handle being polled very quickly once the connection is open.
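
For anyone curious, here’s a rough way to see that difference on the Pi itself (just a sketch, assuming the redis-py package is installed; it only isolates the connection cost — the bash loop also forks a new redis-cli process every 100 ms on top of that, which makes it worse again):

# Rough comparison: new connection per poll vs. one persistent connection.
# Assumes redis-py (pip install redis) and a local Redis on the default port.
import time
import redis

N = 1000

# New client (and therefore a new TCP connection) for every poll.
start = time.time()
for _ in range(N):
    r = redis.Redis(host='127.0.0.1', port=6379)
    r.lpop('service-runner')
    r.connection_pool.disconnect()
print("new connection per poll: %.2f s" % (time.time() - start))

# One persistent connection reused for every poll.
r = redis.Redis(host='127.0.0.1', port=6379)
start = time.time()
for _ in range(N):
    r.lpop('service-runner')
print("persistent connection:   %.2f s" % (time.time() - start))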

I’ll have to revisit this with @TrystanLea. I think it was decided to be more reliable, easier to manage (e.g. file system permissions from a www user, etc.) and more secure.

Maybe using Redis is the right way then… How about getting rid of the bash script and using PHP or Python instead, so you can maintain a persistent connection to Redis?

[EDIT] I’ve chucked together a rough Python script today in between other things and will do some testing on it against the new image when I get home tonight to see how it behaves. I’ve left the sleep at 0.1 s to see how it compares with the previous bash script and will post results here.
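
Something along these lines, roughly (not the actual script — a minimal sketch assuming the redis-py package, with the 0.1 s poll kept for a like-for-like comparison with the bash version):

#!/usr/bin/env python
# Minimal sketch of a persistent-connection service-runner.
# Not the actual script; assumes redis-py (pip install redis).
import subprocess
import time

import redis

r = redis.Redis(host='127.0.0.1', port=6379)  # one connection, reused forever

while True:
    flag = r.lpop('service-runner')
    if flag:
        # Same "script>logfile" convention the bash version splits with IFS=\>
        script, logfile = flag.decode().split('>', 1)
        with open(logfile.strip(), 'w') as log:
            subprocess.call(script.strip().split(), stdout=log)
    time.sleep(0.1)  # kept at 0.1 s to compare with the bash script

It could also block on BLPOP with a timeout instead of polling, which would remove the sleep altogether, but that changes the behaviour more than a straight port, so I’ve kept it as a simple poll for now.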