Emoncms dashboard crashes after x hours

I am not a very good Apache guy, so I need to clarify a few things.
(answered my own question about tail)
Is access.log created if it does not exist? I have run the above command and there is an access.log and access.log.1 so I am assuming they are created.
I use putty to enter the commands, but do I need to leave the terminal window open to continue to write to the file?
How do I copy the file to my local computer? Over wifi, or is it on the SD card, so that I could just remove the card and read the file on my computer?

The latest test did crash, but I wanted to catch it when it crashed, so I reloaded the page (dashboard) and it started up again… just waiting for the crash (1355 PST).

I CAUGHT THE CRASH!!!
I can copy/paste a few entries before and after the crash, but it should be in the log file.
I can copy the terminal buffer and save it as a text file.
The ip ending in .120 is my tablet running Chrome, the .113 is my OpenEVSE charger, and the .103 is the emonpi
Here are the few lines before/after the crash:

192.168.1.120 - - [20/Jun/2019:16:16:57 -0700] "GET /emoncms/feed/data.json?id=59&start=1560986217190&end=1561072617190&interval=108&skipmissing=1&limitinterval=1 HTTP/1.1" 200 6839 "http://192.168.1.103/emoncms/vis/multigraph?embed=1&mid=1" "Mozilla/5.0 (Linux; Android 8.1.0; Lenovo TB-X304F) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.157 Safari/537.36"
192.168.1.120 - - [20/Jun/2019:16:17:01 -0700] "GET /emoncms/feed/list.json?userid=1 HTTP/1.1" 200 832 "http://192.168.1.103/emoncms/dashboard/view?id=10&readapikey=1323878bcb618cec5f56b0da8293ec49" "Mozilla/5.0 (Linux; Android 8.1.0; Lenovo TB-X304F) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.157 Safari/537.36"
192.168.1.120 - - [20/Jun/2019:16:17:04 -0700] "GET /emoncms/feed/list.json?userid=1 HTTP/1.1" 200 833 "http://192.168.1.103/emoncms/dashboard/view?id=10&readapikey=1323878bcb618cec5f56b0da8293ec49" "Mozilla/5.0 (Linux; Android 8.1.0; Lenovo TB-X304F) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.157 Safari/537.36"
192.168.1.120 - - [20/Jun/2019:16:17:04 -0700] "GET /emoncms/feed/data.json?id=61&start=1560986223983&end=1561072623983&interval=108&skipmissing=1&limitinterval=1 HTTP/1.1" 200 7173 "http://192.168.1.103/emoncms/vis/multigraph?embed=1&mid=12" "Mozilla/5.0 (Linux; Android 8.1.0; Lenovo TB-X304F) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.157 Safari/537.36"
192.168.1.120 - - [20/Jun/2019:16:17:04 -0700] "GET /emoncms/feed/data.json?id=57&start=1561051024210&end=1561072624210&interval=27&skipmissing=0&limitinterval=1 HTTP/1.1" 200 7297 "http://192.168.1.103/emoncms/vis/multigraph?embed=1&mid=9" "Mozilla/5.0 (Linux; Android 8.1.0; Lenovo TB-X304F) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.157 Safari/537.36"
192.168.1.113 - - [20/Jun/2019:16:17:15 -0700] "GET /emoncms/input/post.json?node=openevse&json={\"amp\":0,\"wh\":804977,\"temp1\":345,\"temp2\":0,\"temp3\":230,\"pilot\":10,\"state\":1,\"freeram\":27624,\"divertmode\":1}&apikey=d57ceaff6596d72ba15f8229d845ffc7 HTTP/1.1" 200 170 "-" "ESP8266HTTPClient"
192.168.1.113 - - [20/Jun/2019:16:17:45 -0700] "GET /emoncms/input/post.json?node=openevse&json={\"amp\":0,\"wh\":804977,\"temp1\":345,\"temp2\":0,\"temp3\":230,\"pilot\":10,\"state\":1,\"freeram\":27624,\"divertmode\":1}&apikey=d57ceaff6596d72ba15f8229d845ffc7 HTTP/1.1" 200 170 "-" "ESP8266HTTPClient"
192.168.1.120 - - [20/Jun/2019:16:17:56 -0700] "-" 408 0 "-" "-"   

*******************tablet crashed

192.168.1.113 - - [20/Jun/2019:16:18:16 -0700] "GET /emoncms/input/post.json?node=openevse&json={\"amp\":0,\"wh\":804977,\"temp1\":345,\"temp2\":0,\"temp3\":230,\"pilot\":10,\"state\":1,\"freeram\":27624,\"divertmode\":1}&apikey=d57ceaff6596d72ba15f8229d845ffc7 HTTP/1.1" 200 170 "-" "ESP8266HTTPClient"
192.168.1.113 - - [20/Jun/2019:16:18:46 -0700] "GET /emoncms/input/post.json?node=openevse&json={\"amp\":0,\"wh\":804977,\"temp1\":345,\"temp2\":0,\"temp3\":230,\"pilot\":10,\"state\":1,\"freeram\":27624,\"divertmode\":1}&apikey=d57ceaff6596d72ba15f8229d845ffc7 HTTP/1.1" 200 170 "-" "ESP8266HTTPClient"
192.168.1.113 - - [20/Jun/2019:16:19:16 -0700] "GET /emoncms/input/post.json?node=openevse&json={\"amp\":0,\"wh\":804977,\"temp1\":345,\"temp2\":0,\"temp3\":230,\"pilot\":10,\"state\":1,\"freeram\":27624,\"divertmode\":1}&apikey=d57ceaff6596d72ba15f8229d845ffc7 HTTP/1.1" 200 170 "-" "ESP8266HTTPClient"
192.168.1.113 - - [20/Jun/2019:16:19:46 -0700] "GET /emoncms/input/post.json?node=openevse&json={\"amp\":0,\"wh\":804977,\"temp1\":345,\"temp2\":0,\"temp3\":230,\"pilot\":10,\"state\":1,\"freeram\":27624,\"divertmode\":1}&apikey=d57ceaff6596d72ba15f8229d845ffc7 HTTP/1.1" 200 170 "-" "ESP8266HTTPClient"
192.168.1.113 - - [20/Jun/2019:16:20:16 -0700] "GET /emoncms/input/post.json?node=openevse&json={\"amp\":0,\"wh\":804977,\"temp1\":345,\"temp2\":0,\"temp3\":230,\"pilot\":10,\"state\":1,\"freeram\":27624,\"divertmode\":1}&apikey=d57ceaff6596d72ba15f8229d845ffc7 HTTP/1.1" 200 170 "-" "ESP8266HTTPClient"
192.168.1.113 - - [20/Jun/2019:16:20:47 -0700]..........  

end of copy/paste

Notice that the emonpi is still talking to the charger, but the dashboard is not requesting data anymore, and ended with a 404
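A rough way to confirm this from a saved log, rather than by eye, is to record the last entry logged for each client IP. The sketch below is only an illustration: the function name `last_seen` is made up, the sample lines are shortened copies of the entries above, and it assumes the standard Apache "combined" log format shown in this thread.

```python
import re
from datetime import datetime

# Matches the leading fields of an Apache "combined" log line.
# The request field may contain backslash-escaped quotes (as in the
# OpenEVSE lines), so it is matched as "anything except an unescaped quote".
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<req>(?:[^"\\]|\\.)*)" (?P<status>\d{3}) (?P<size>\S+)'
)


def last_seen(lines):
    """Return {ip: (timestamp, status)} for the final entry logged per client."""
    seen = {}
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue  # skip notes pasted into the log, blank lines, etc.
        ts = datetime.strptime(m.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
        seen[m.group("ip")] = (ts, int(m.group("status")))
    return seen


# Shortened sample lines (query strings and user agents trimmed).
sample = [
    '192.168.1.120 - - [20/Jun/2019:16:17:04 -0700] "GET /emoncms/feed/list.json?userid=1 HTTP/1.1" 200 833 "-" "-"',
    '192.168.1.113 - - [20/Jun/2019:16:17:45 -0700] "GET /emoncms/input/post.json?node=openevse HTTP/1.1" 200 170 "-" "ESP8266HTTPClient"',
    '192.168.1.120 - - [20/Jun/2019:16:17:56 -0700] "-" 408 0 "-" "-"',
    '192.168.1.113 - - [20/Jun/2019:16:18:16 -0700] "GET /emoncms/input/post.json?node=openevse HTTP/1.1" 200 170 "-" "ESP8266HTTPClient"',
]

result = last_seen(sample)
for ip, (ts, status) in sorted(result.items()):
    print(ip, ts.isoformat(), status)
```

To run it against a real file, something like `last_seen(open("access.log"))` should work, since the function only needs an iterable of lines.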

That’s great, your perseverance has paid off.

Could you post a larger section of “before the crash” so that we can see a full cycle or two of all the different requests sent by your tablet? The section you posted shows requests to update 3 feeds in 3 multigraphs, but there isn’t enough data to work out what that failed request might have been.

The return code is actually a 408, not a 404; a 408 is a request timeout. What would be useful in this instance is another test with emoncms and the client browser on the same machine, to rule out network delays and interruptions, but I doubt that is easily achieved given the emonSD and emonPi are not geared up for a desktop or monitor.

Either way, even if this is caused by an external network issue, it should be able to handle a temporary network issue and a couple of timeouts without crashing (IMO). If we can narrow the search by working out which request failed (which widget), we can look closer at the code to see if/how it handles timeouts and retries.

It would also be useful if you could repeat the test a couple of times so that we might establish a pattern, eg is it always the same widget’s request that causes the timeout or is it any/all widgets?
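One possible way to compare runs is to pull out the last few requests a given client made before it went quiet, keeping just the path, the feed `id` and the status code. This is a sketch only: `tail_for_client` is a hypothetical helper, the sample lines are trimmed copies of the log above, and it assumes the feed is always identified by an `id` query parameter as it is in those lines.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Leading fields of an Apache "combined" log line; the request field may
# contain backslash-escaped quotes.
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<req>(?:[^"\\]|\\.)*)" (?P<status>\d{3}) (?P<size>\S+)'
)


def tail_for_client(lines, ip, n=5):
    """Return the last n (path, feed_id, status) tuples logged for one client."""
    rows = []
    for line in lines:
        m = LOG_RE.match(line)
        if not m or m.group("ip") != ip:
            continue
        parts = m.group("req").split(" ")
        # A timed-out connection is logged with "-" as the request,
        # so there may be no URL to parse.
        url = parts[1] if len(parts) >= 2 else ""
        split = urlsplit(url)
        feed_id = parse_qs(split.query).get("id", [None])[0]
        rows.append((split.path, feed_id, int(m.group("status"))))
    return rows[-n:]


# Trimmed sample lines from the log posted above.
sample = [
    '192.168.1.120 - - [20/Jun/2019:16:17:04 -0700] "GET /emoncms/feed/data.json?id=61&interval=108 HTTP/1.1" 200 7173 "-" "-"',
    '192.168.1.120 - - [20/Jun/2019:16:17:04 -0700] "GET /emoncms/feed/data.json?id=57&interval=27 HTTP/1.1" 200 7297 "-" "-"',
    '192.168.1.113 - - [20/Jun/2019:16:17:15 -0700] "GET /emoncms/input/post.json?node=openevse HTTP/1.1" 200 170 "-" "ESP8266HTTPClient"',
    '192.168.1.120 - - [20/Jun/2019:16:17:56 -0700] "-" 408 0 "-" "-"',
]

tail = tail_for_client(sample, "192.168.1.120", n=3)
for row in tail:
    print(row)
```

Running the same function over each crash log and comparing the tails side by side might show whether the same feed or widget is always the last one requested.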

Assuming these are on a simple LAN network, are they using wifi? Is there any chance of wifi dropout/sleep? Is it possible to use Ethernet cables (for a test) to rule out wifi issues?

OOPS sorry. I was so excited to capture the crash I confused the number with something else.

Here is the complete log file for your dining and dancing pleasure! This is my ID=10 dashboard.
At about line 659 is the crash. The time stamp is
[20/Jun/2019:16:17:56 -0700]
CRASH.txt (231.9 KB)

Since last night 2019.06.20 22:29:14, I have been running ID=5 which has realtime, multigraph, dial, gauge, battery on a simple dashboard, and it has not crashed as of 06.21 0900.
Here is that logfile also:
id5_nocrash.txt (6.7 MB)

I may start another dashboard to test. I will try ID=1 (my original problem child) and log that.

I would be more than happy to try wired ethernet rather than wireless.
Is the ethernet port active, or do I need to configure it?

Hi Bill,

If you connect the port to your router then reboot, your emonPi should pick up an IP address from your router’s DHCP server and you should be good to go.

Thanks Bill! I now have it wired ethernet, and I stopped the wireless so I only have wired comm.
Had to install an ethernet switch, but I have wanted to do that for a while now, so this was the “inspiration” to do it!

I would much rather go wired as I am beginning to have too much wireless crap, and it is starting to affect bandwidth. 4 thermostats, 2 Roku tv’s, 3 tablets, a sprinkler controller… I do not like wireless because of the bandwidth problem and possible interference and security.
Guess I am old school!

Don’t feel like the Lone Ranger. I much prefer twisted pairs to wi-fi myself.

Oddly enough, on the way home from work today,
I heard Ronnie & the Daytonas singin’ Little GTO.

When the song got to the part:

Gonna save all my money
And buy a GTO (turnin’ it on blowin’ it out)
Get a helmet and a roll bar
And I’ll be ready to go (turnin’ it on, blowin’ it out)
Take it out to Pomona
And let ‘em know, yeah yeah (turnin’ it on blowin’ it out)
That I’m the coolest thing around
Little buddy, gonna shut you down
When I turn it on, wind it up, blow it out, GTO

I thought, 'tis a small world indeed!

alrightythen!

Well even with ethernet connection, my tablet STILL crashed with ID=10 dashboard running. It ran for approx. 7 hours today.
id_10 crash.txt (5.0 MB)
I uploaded the putty log file, and made a note where the tablet stopped receiving data.
The string is “*********** last entry to tablet and crash” sans quote marks.
I am guessing that there is something with this dashboard, and the others that crash, that creates the problem, because not all my dashboards crash.

I will run one of the non-crashing dashboards again just to make sure it doesn’t crash.
The one I will run is ID=7. This has ONLY realtime graphs.
I will log the activity too.

I am creating a spreadsheet of each dashboard ID number, the elements in the dashboard, and if it crashed.
I plan on running more tests of the same dashboards just for redundancy purposes.
I will keep ya’ll updated!

I find it hard to believe that I am the only one with this problem though?!?!
hhmmmmmm :thinking::thinking::roll_eyes:

Looking at a couple of your crash logs, I see no obvious pattern to the crash. It doesn’t occur at any particular point of the update pattern, and there is no 408 in the last crash report; it just stops. So I suspect the 408 may be a result of the crash, not the cause.

Whilst I do think the dashboard you are testing with should function without problem, at this stage of the debugging, I have to wonder about the level of traffic. You are not just testing with one multigraph, you have 4. Those multigraphs are also updating at a fair old rate of every 10s, and then there are the 5s updates for the gauges. For me this raises so many other questions like

  • Is it too busy at 36 requests per minute?
  • is it because of multiple multigraphs?
  • is it the combination of multigraphs and dials?
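For reference, the 36 requests per minute figure can be reproduced with a little arithmetic. The sketch below assumes each multigraph makes one data request per refresh and that all the dials and gauges share a single request every 5s; both are assumptions made for illustration, and `requests_per_minute` is just an illustrative name.

```python
def requests_per_minute(widgets):
    """widgets: list of (count, interval_seconds, requests_per_update)."""
    return sum(count * reqs * 60 / interval for count, interval, reqs in widgets)


# Assumed traffic for the crashing dashboard.
dashboard = [
    (4, 10, 1),  # four multigraphs, each refreshing every 10 s
    (1, 5, 1),   # one shared request every 5 s for all dials/gauges
]

# Same dashboard with the multigraphs slowed to a 60 s refresh.
slowed = [
    (4, 60, 1),
    (1, 5, 1),
]

rate = requests_per_minute(dashboard)
slow_rate = requests_per_minute(slowed)
print(rate)       # 36.0
print(slow_rate)  # 16.0
```

Under those assumptions, slowing the multigraphs to 60s would more than halve the traffic without touching the gauges.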

I would try some stuff like removing all the dials (clone the dash rather than working on your original) and/or calming down the multigraph traffic by increasing the update interval and/or reducing the number of multigraphs (or just turn off auto-update rather than delete?). Since you have a dash that you know will crash every time, use that to work out why it crashes by editing it until it doesn’t crash.

Have you tried just a single multigraph dashboard yet?

Is your laptop on an Ethernet cable?

[edit] Actually, before anything else, I would try setting the multigraphs to only auto-update every 60s. If that works, try 30s; anything faster than 30s is wasted IMO. When you have 24hrs of data on display, the sampling means it’s pot luck whether the last 2 mins will contribute anything to the plot. Look at the “&interval=” parts of your data requests in the logs: 3 multigraphs are using a 108 second interval and one is using a 27 second interval. Even with the 27 second interval you only have about a 1 in 3 chance that the last 10s of data will be displayed and TBH, unless it is drastically different to the previous couple of minutes, you are not going to see the change on the single last pixel of the graph. In short, you are burning a lot of effort in both the browser and the emonPi to achieve a mostly undetectable level of updates. I have previously set up my multigraphs at 30s and they work very well; if they hadn’t, or if the load on my cloud VPS (far more powerful than a Pi) was too great, I would have happily changed that to 60 or 120s as I’m sure that would be almost undetectable.
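To put numbers on the interval point, the `start`, `end` and `interval` parameters can be read straight out of two of the data requests in the log posted earlier. This is a minimal sketch; `plot_points` is an illustrative name, and it assumes `start` and `end` are millisecond timestamps and `interval` is in seconds, which is what the values in the log suggest.

```python
from urllib.parse import urlsplit, parse_qs


def plot_points(url):
    """Return (window_hours, number_of_points, share_of_interval_per_10s)."""
    q = parse_qs(urlsplit(url).query)
    start_ms, end_ms = int(q["start"][0]), int(q["end"][0])
    interval_s = int(q["interval"][0])
    window_s = (end_ms - start_ms) / 1000
    # Fraction of one plotted interval covered by a single 10 s update.
    share = 10 / interval_s
    return window_s / 3600, window_s / interval_s, share


# Two request URLs copied from the access log earlier in the thread.
urls = [
    "/emoncms/feed/data.json?id=59&start=1560986217190&end=1561072617190&interval=108&skipmissing=1&limitinterval=1",
    "/emoncms/feed/data.json?id=57&start=1561051024210&end=1561072624210&interval=27&skipmissing=0&limitinterval=1",
]

results = [plot_points(u) for u in urls]
for hours, points, share in results:
    print(f"{hours:.0f} h window, {points:.0f} points, 10 s is {share:.0%} of one interval")
```

Both requests work out to 800 points across the graph (a 24h window at 108s and a 6h window at 27s), so a 10s refresh covers only about 9% and 37% of a single plotted point respectively.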

My last run had id 8 on the tablet (all raw data graphs), and id 7 on the laptop (real time data).
Strange thing is the raw data graphs do NOT update. I have attached the log from that run.

I will try a clone of my “default” dashboard and remove all the multigraphs and add several more dials and gauges.

Let ya know what I find out.
8 no update and 7laptop.txt (7.1 MB)

id 8 is raw data graphs and id 7 is realtime.
IP .11 is the laptop and IP .120 is the tablet. IP .113 is my OpenEVSE charge controller. IP .146 is the wired ethernet from the Emonpi.

I don’t think that is strange tbh. I think “realtime” graphs were so called because at the time they were the only ones that gave you real time data. All the other graphs at that time were static (unless refreshed). Then along came multigraph, which I believe aimed to replace all the different graphs. More recently the graph module had a similar ambition, but there are several features of the multigraph that haven’t been ported yet, such as auto-update. So IIRC only the realtime and multigraph (with auto-refresh configured) will update without refreshing the page.

The point I’m getting at is to eliminate potential wifi network interruptions by using a wired network from end to end. As I understand it, the dashboards that crash used to crash regardless of whether they were on the tablet or laptop? So do they still crash running on the laptop with an Ethernet cable, i.e. not using wifi?

As far as I can recall, the dashboards that you have had crash, are all pretty busy, either they have had a single realtime graph (updating every second) or 4 multigraphs with 10s intervals (updating every ~2.5s). Just try slowing that traffic down as slow as possible for some tests.

You could do! But the data for all the dial and gauge type widgets is fetched in a single request every 5s. I don’t think it will matter too much if you have 1 or 101 gauges; the requests will not differ, only the time taken to populate and redraw the widgets will differ. So whilst (IMO) unlikely, it’s not impossible for it to be the cause (stranger things have happened!)

If you open the browser console (F12 on Chrome) you can watch the requests being made in realtime, how long the replies take, etc. You will see a dial-only dash makes just one request every 5s, whereas 4 multigraphs and some dials make 10 across every 10s. That’s 5x as many requests, and I’d guess the multigraph replies might be larger and slower than the normal “feed list” replies too.
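The same comparison can be made from the server side by counting access-log entries per client per minute. This is a rough sketch, not a substitute for the browser console: `per_minute` is an illustrative name, the sample lines are trimmed copies of entries posted earlier, and it only looks at the IP and timestamp fields.

```python
import re
from collections import Counter

# Captures the client IP and the timestamp down to the minute,
# e.g. ("192.168.1.120", "20/Jun/2019:16:17").
STAMP_RE = re.compile(r'^(\S+) \S+ \S+ \[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}):\d{2} ')


def per_minute(lines):
    """Count log entries per (client IP, minute)."""
    counts = Counter()
    for line in lines:
        m = STAMP_RE.match(line)
        if m:
            counts[(m.group(1), m.group(2))] += 1
    return counts


# Trimmed sample lines from the log posted earlier.
sample = [
    '192.168.1.120 - - [20/Jun/2019:16:16:57 -0700] "GET /emoncms/feed/data.json?id=59 HTTP/1.1" 200 6839',
    '192.168.1.120 - - [20/Jun/2019:16:17:01 -0700] "GET /emoncms/feed/list.json?userid=1 HTTP/1.1" 200 832',
    '192.168.1.120 - - [20/Jun/2019:16:17:04 -0700] "GET /emoncms/feed/list.json?userid=1 HTTP/1.1" 200 833',
    '192.168.1.113 - - [20/Jun/2019:16:17:15 -0700] "GET /emoncms/input/post.json?node=openevse HTTP/1.1" 200 170',
]

counts = per_minute(sample)
for (ip, minute), n in sorted(counts.items()):
    print(ip, minute, n)
```

Run over a full log, a steady per-minute count that suddenly drops to nothing for one IP would mark the point where that client stopped asking for data.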

Yes, laptop or wifi still crash.
Just a question… why does it take 7 hours approx. for the crash to happen? I would think that if there was a timing problem, it should crash within the first few minutes of running the dashboard, or does the crash happen at “just the right time”. It seems like 7 hours is the “magic” number for a crash to happen.

So what I gather is to set the update times to something like 30-60 seconds, rather than only when data is received from the Emonpi?

If I allowed ONLY particular graphs and dials a short refresh, and set the others for a long time, that would solve the problem?

I have seen very detailed dashboards as examples, but there is no clue as to the update rate that is being used.

Bad news… id 11 crashed. this is the gauges and dial with a battery and led. This was on the laptop, the hard wired ethernet laptop.
Here is the log file, but nothing obvious… ip 11 just stops responding (yes the dashboard and IP were both 11).
Took about 7 hours to happen.
5 and 11 on laptop 2.txt (5.4 MB)

For the next test, I am going to clone my original dashboard (id=1), and reconfigure all my multigraphs to a 30 second update.

I wonder if there’s a browser memory issue? A memory leak causing the crash, I haven’t run a dashboard for extended periods like this for a while. Does anyone know of a good way of monitoring firefox’s memory use? I can see the memory snapshot but it takes ages

Hi Trystan. I am using chrome in incognito mode as was suggested.

Anyway, I went through all my multigraphs and set the refresh times to 60 seconds except the bargraph and that is coming from use_kwh and has an interval of “d” for day.
I ran this dashboard, id=13, last night and 7 hours later it had crashed AGAIN.

I attached the putty log file and requests from .120 (tablet) to Emonpi .146 just stopped.
The requests from my OpenEVSE .113 still continued.

To my very limited knowledge of json and html, all I can assume is that something is “overflowing” or just stopping because of the random times the crashes happen, and the length that a run will, well, run before crashing.

Since it has happened in Chrome and Firefox, I could try a different browser on the tablet?
I could try IE (I despise IE) on the laptop with the same dashboard that has reliably crashed?
Here is the log file of ID13:
13 slow update on tablet.txt (4.2 MB)

Search for " ************ tablet crashes " (sans quotes) string where the tablet (.120) stops.
No error, no clue as to why.

Also, how are the different elements in a dashboard identified? Do they have an ID number? like json?userid=1 HTTP/1.1" 200 836.
If so, what are the id’s referring to as far as the gauges/dials/graphs?

Is there anyone else seeing similar issues, with always-on displays running dashboards that crash?

I will set up a test here and see if I can replicate.

Hi Trystan. That would be great! Just remember my crashes took about 7-8 hours to occur.
One BIG (I think) discovery…
On my laptop, I ran the ID13 (a clone of ID1) with 60 second multigraph updates, and ran it in IE instead of chrome.
NO CRASH AFTER 24 HOURS!!!
I have since installed Opera on the tablet and am trying a run now.
So to recap, ID13 / Opera on the Android tablet / IE on the laptop (win 7 64 bit).

I am logging putty’s output, so I will see if this combination lasts 8 hours (hopefully A LOT more), and will post the progress and log file.

OOPS spoke too soon!
Just got an error on Win laptop running IE…
EmonCMS error
Message Out of memory
Route:
Line: 1
Column: 1
Error: {description “:“Out of memory”:”-2146828281}
And the Putty trace window shows that .11 (laptop) is not talking anymore.
Here is the log file:
id13 IE laptop opera tablet.txt (1.2 MB)
Hope this might help!