emonPi stops working at midnight

Hey folks.

Over the past 3 weeks, my emonpi has stopped working at around midnight, 3 times.

When it stops working, the HTTP server is running, but serves up a 404 for everything. And the SSH server is down, so I can’t get in to debug.

I’m struggling to debug because once I restart and things are back up, the logs are gone because they’re in a RAM disk.

Anyone else seen this? And any tips on debugging? Like do we already have a way to get logs to go to the disk? I was thinking of just writing a cron to copy the main syslog to disk every 30 minutes or something.

No, but I think it is essential for successful debugging to retain the logs, I have previously suggested using rsync to maintain a copy on the writeable partition or maybe even using logrotate to rotate them out of RAM to disk.

You could set up a script to get temperture, cpu usage, ram usage, size of /var/log and other key stats to be sent to and persisted on disk by emoncms, for checking after the event. Some stats are available on the admin page of emoncms, but they are not much use for debugging if you cannot access the page and they only give current status, not a full picture over the time leading up to the fail.

If you believe it is time linked it might be worth changing the local time on the Pi for a test period, cron jobs etc are usually localtime and changing the timezone should shift the “daily pattern” eg if the fail is because a critical cron job tries to run at midnight but fails due to /var/log being full, the fail time should shift with the new timezone. emonHub and emonCMS data are not dependent on localtime, so your data should be safe, but any logfile timestamps will be affected.

Matt - For debugging I have a bash script that runs once per hour via cron. It grabs all of the log files in the /var/log directory and any log files in the /home/pi/data directory and copies them to a USB thumb drive. Each backup goes into it’s own timestamp directory. Run it with:

sudo bash backup-emonPi-logs-v0.8a.sh

The script will complete in a few seconds.


The code is a bit overkill. I was experimenting with rsync commands and its various parameters. Feel free to change to a simple copy command.

backup-emonPi-logs-v0.8a.sh.txt (2.9 KB)

1 Like

Oh awesome thanks @Jon! I’ve adapted your script and I’ll leave this running. Very useful.

Matt - here is another tool for debugging…

If you use node-red here is a flow that will monitor the various drives (e.g., /var/log, /home/pi/data), CPU load averages, temputure and memory.

Besides node-red you’ll need to install node-red-contrib-os for the drives and load averages.

My original flow saved the data via MQTT and sent alerts via Pushover. I modified (simplified?) this flow just to save the data to a log file. (feel free to ask for the original flow)

[
    {
        "id": "9930649a.efcee8",
        "type": "function",
        "z": "9692bb3e.836928",
        "name": "",
        "func": "if (!msg || !msg.payload) { return null; }\nvar now = new Date();\n/*if (typeof msg.payload === 'object')\n    { msg.payload = \"\\n\" + JSON.stringify(msg.payload, null, 4) + \"\\n\";}*/\n\nif (typeof msg === 'object') { \n    msg1 = msg;\n    delete msg1._msgid;     //  drop the msg ID to save some room\n    msg1.payload = \"\\nmsg : Object\\n\" + JSON.stringify(msg1, null, 4) + \"\\n\";\n    }\n\nmsg.payload = now.toString() + \"\\t\" + now.valueOf() + \"\\t\" + msg1.payload;\nreturn msg;",
        "outputs": 1,
        "noerr": 0,
        "x": 750,
        "y": 160,
        "wires": [
            [
                "e29a6c0f.5151c"
            ]
        ]
    },
    {
        "id": "b1f61693.58f498",
        "type": "comment",
        "z": "9692bb3e.836928",
        "name": "emonPi CPU Monitors",
        "info": "",
        "x": 120,
        "y": 40,
        "wires": []
    },
    {
        "id": "9e84ab4e.2ca66",
        "type": "inject",
        "z": "9692bb3e.836928",
        "name": "1 min",
        "topic": "",
        "payload": "",
        "payloadType": "str",
        "repeat": "60",
        "crontab": "",
        "once": false,
        "x": 110,
        "y": 140,
        "wires": [
            [
                "85b2e630.3d142",
                "b51b2b6c.596618",
                "d61b2f06.bc18a8"
            ]
        ]
    },
    {
        "id": "7ee2d8dd.dc3688",
        "type": "Drives",
        "z": "9692bb3e.836928",
        "name": "",
        "x": 270,
        "y": 100,
        "wires": [
            [
                "8b8d23d2.62d3"
            ]
        ]
    },
    {
        "id": "85b2e630.3d142",
        "type": "Loadavg",
        "z": "9692bb3e.836928",
        "name": "",
        "x": 280,
        "y": 140,
        "wires": [
            [
                "6898e49b.f7e394"
            ]
        ]
    },
    {
        "id": "6898e49b.f7e394",
        "type": "function",
        "z": "9692bb3e.836928",
        "name": "parse loadavg",
        "func": "loadavgTopic = [ \"loadavg01min\", \"loadavg05min\", \"loadavg15min\" ];\n\nfor (i = 0; i < msg.payload.loadavg.length; i++) { \n    msg2 = null;                // msg2 for high load avg Alert\n    var load = (msg.payload.loadavg[i]).toFixed(3);\n\n    if (load >= 4.0) {          // Send Alert?\n        msg2 = {\"topic\":\"CPU Load Average Alert\"};  // msg2 for over high load Alert\n        msg2.filename = \"/home/pi/data/emonCPU.log\";\n        msg2.payload = loadavgTopic[i] + \" is too high at \";\n        msg2.payload += load + \"\\n\";\n        msg2.payload += JSON.stringify(msg.payload, null, 4);\n        }\n\n    msg1 = { payload : load};    // msg1 for mqtt load average\n    msg1.topic = \"emon/emonpiCPU/\" + loadavgTopic[i];\n    node.send([msg1,msg2]);     // send for each load avg in for loop\n}\n\nvar pLoad = (msg.payload.loadavg[0] * 100 / 4).toFixed(1) ;                    // divide by 4 cores\nmsg1 = { payload : pLoad,  topic : \"emon/emonpiCPU/pLoad\" };\nnode.send([msg1,null]); \n\nreturn null;\n",
        "outputs": "2",
        "noerr": 0,
        "x": 520,
        "y": 140,
        "wires": [
            [
                "9930649a.efcee8"
            ],
            []
        ]
    },
    {
        "id": "b51b2b6c.596618",
        "type": "exec",
        "z": "9692bb3e.836928",
        "command": "cat /sys/class/thermal/thermal_zone0/temp",
        "addpay": false,
        "append": "",
        "useSpawn": "",
        "timer": "",
        "name": "Get CPU Temp",
        "x": 300,
        "y": 200,
        "wires": [
            [
                "5e7b01a4.ae101"
            ],
            [],
            []
        ]
    },
    {
        "id": "5e7b01a4.ae101",
        "type": "function",
        "z": "9692bb3e.836928",
        "name": "convert temp data",
        "func": "msg.payload = (msg.payload / 1000).toFixed(1);\nmsg.topic = \"emon/emonpiCPU/tempC\";\nreturn msg;\n",
        "outputs": 1,
        "noerr": 0,
        "x": 530,
        "y": 200,
        "wires": [
            [
                "9930649a.efcee8"
            ]
        ]
    },
    {
        "id": "8b8d23d2.62d3",
        "type": "function",
        "z": "9692bb3e.836928",
        "name": "parse drives",
        "func": "for (i = 0; i < msg.payload.length; i++) { \n    msg2 = null;            // msg2 for over high % capacity\n    var mnt = msg.payload[i].mount;\n    if (mnt == '/') { mnt = 'root'; }\n    var pcap = Math.round(msg.payload[i].capacity * 100.00);\n    if (pcap >= 80) {\n        msg2 = {topic : \"emonPi Drive Alert\"};    // msg2 for over high % capacity\n        msg2.filename = \"/home/pi/data/emonCPU.log\";\n        msg2.payload = mnt + \" directory is at \";\n        msg2.payload += pcap + \"% and is almost full (or full)\" + \"\\n\";\n        msg2.payload += JSON.stringify(msg.payload[i], null, 4);\n\n        msg2.priority = 0;                       //  Normal Priority (0)\n        if (pcap >= 90) { msg2.priority = 1;}    //  High Priority (1)\n        if (pcap >= 98) { msg2.priority = 2;}    //  Emergency Priority (2)\n        }\n    \n    var mnt1 = mnt.replace(/\\//, '');       // delete leading '/'\n    var mnt2 = mnt1.replace(/\\//g, '-');    // replace remaining '/'\n\n    msg1 = {payload : pcap};            // msg1 for mqtt drive % capacity\n    msg1.topic = \"emon/emonpiCPU/\" + mnt2;\n\n    node.send([msg1,msg2]);             // send % for each drive in for loop\n}\nreturn null;\n",
        "outputs": "2",
        "noerr": 0,
        "x": 510,
        "y": 100,
        "wires": [
            [
                "9930649a.efcee8"
            ],
            []
        ]
    },
    {
        "id": "60a8663a.576978",
        "type": "inject",
        "z": "9692bb3e.836928",
        "name": "15 min",
        "topic": "",
        "payload": "",
        "payloadType": "str",
        "repeat": "",
        "crontab": "*/15 0-23 * * *",
        "once": false,
        "x": 120,
        "y": 100,
        "wires": [
            [
                "7ee2d8dd.dc3688"
            ]
        ]
    },
    {
        "id": "d61b2f06.bc18a8",
        "type": "exec",
        "z": "9692bb3e.836928",
        "command": "free -m",
        "addpay": false,
        "append": "",
        "useSpawn": "",
        "timer": "",
        "name": "Get Mem Usage",
        "x": 300,
        "y": 260,
        "wires": [
            [
                "93a15f35.0574a"
            ],
            [],
            []
        ]
    },
    {
        "id": "93a15f35.0574a",
        "type": "function",
        "z": "9692bb3e.836928",
        "name": "get memory data",
        "func": "if ((msg === null) || (!msg.payload)) { return null; }\n\nvar re = /^-\\/\\+.buffers\\/cache.*$/mg;\nvar contextMatch = msg.payload.match(re);     //  find line that starts with \"-/+ buffers/cache:\"\n//  this equals  [ \"-/+ buffers/cache: 475 497\" ]\n\nvar re2 = /^-\\/\\+.buffers\\/cache:\\s*/m;     //  find line that starts with \"-/+ buffers/cache:\"\nvar contextReplace = contextMatch[0].replace(re2, \"\");  //  remove beginning \"-/+ buffers/cache:  \"\n//  this equals    475 497\n\nvar contextSplit = contextReplace.split(/\\s+/);\n\nvar used = contextSplit[0];\nvar available = contextSplit[1];\n\nmsg1 = { topic : \"emon/emonpiCPU/MEMfree\" };      // msg1 for mqtt memory data\nmsg1.payload = available;\nnode.send([msg1,null]);\n\nmsg1 = {};\nmsg1 = { topic : \"emon/emonpiCPU/MEMused\" };\nmsg1.payload = used;\nnode.send([msg1,null]);\n\nmsg1 = {};\nmsg1 = { topic : \"emon/emonpiCPU/MEMtotal\" };\nvar total = (+available) + (+used);\nmsg1.payload = (+total);\nnode.send([msg1,null]);\n\npUsed = ((+used) / (+total) * 100).toFixed(1);\n\nif (pUsed >= 70) {          // Percent mem too high?  Send alert\n    msg2 = {\"topic\":\"Memory Alert\"};    // msg2 for high memory usage alert\n    msg2.payload = \"Available \" + available + \" MB\\nUsed \" + used + \" MB\\n\";\n    msg2.payload += \"Total \" + total + \" MB\\n\" + pUsed + \" % used\";\n    msg2.filename = \"/home/pi/data/emonCPU.log\";\n    msg2.priority = 0;\n    if (pUsed >= 80) { msg2.priority = 1;}    //  High Priority (1)\n    if (pUsed >= 90) { msg2.priority = 2;}    //  Emergency Priority (2)\n\n    node.send([null,msg2]);\n}\n\nreturn;\n",
        "outputs": "2",
        "noerr": 0,
        "x": 530,
        "y": 260,
        "wires": [
            [
                "9930649a.efcee8"
            ],
            []
        ]
    },
    {
        "id": "e29a6c0f.5151c",
        "type": "file",
        "z": "9692bb3e.836928",
        "name": "",
        "filename": "/home/pi/data/emonCPU.log",
        "appendNewline": true,
        "createDir": true,
        "overwriteFile": "false",
        "x": 860,
        "y": 240,
        "wires": []
    }
]

Never heard of this before. Let me know how you get on debugging.

Can you remember any changes you may have made to the ‘stock’ emonPi SD card image?

The only changes are my dashboard development work.

I’ll definitely let you know how I get on debugging!

I realised that one thing it could be is the Dropbox backup script I’m running. I’m just using @Paul’s script and I’ve looked through it and can’t understand why that would be causing it. However it’s a hunch I have right now.

It hasn’t happened again since I’ve had logs being copied out, so I haven’t got any real proof just yet!

What model pi are you using Matt?
Have you tried running the dropbox script whilst monitoring performance.
Also are you running any other RAM hungry apps at the same time, ie any other cron/nodered tasks.

Paul

It’s the latest emonPi from the store here. So a Pi 3.

Not specifically. That’s a good call.

Nope nothing intensive should be running at the same time. And it’s only just started happening with no changes I can think of.

It’s certainly very odd! I’m hoping it happens again soon and then I can go and debug!