Mysqld table marked as crashed

johnbanks · 15 November 2019 17:20

I have 4 instances of emonTx/Rpi that send data to a fifth RPi (watchman) via HTTPInterfacers.

Working remotely (dataplicity) it seems one emonTx/RPi instance is not working. The other 3 are fine updating feed data on their respetive SDHC’s.

Also watchman has stopped updating. Here’s /var/log/mysql/error.log for watchman …

2019-11-14 16:15:49 1988837376 [Note] InnoDB: The InnoDB memory heap is disabled
2019-11-14 16:15:49 1988837376 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2019-11-14 16:15:49 1988837376 [Note] InnoDB: GCC builtin __atomic_thread_fence() is used for memory barrier
2019-11-14 16:15:49 1988837376 [Note] InnoDB: Compressed tables use zlib 1.2.8
2019-11-14 16:15:49 1988837376 [Note] InnoDB: Using Linux native AIO
2019-11-14 16:15:49 1988837376 [Note] InnoDB: Using generic crc32 instructions
2019-11-14 16:15:49 1988837376 [Note] InnoDB: Initializing buffer pool, size = 128.0M
2019-11-14 16:15:49 1988837376 [Note] InnoDB: Completed initialization of buffer pool
2019-11-14 16:15:49 1988837376 [Note] InnoDB: Highest supported file format is Barracuda.
2019-11-14 16:15:49 1988837376 [Note] InnoDB: The log sequence numbers 1618147 and 1618147 in ibdata files do not match the log sequence number 1618167 in the ib_logfiles!
2019-11-14 16:15:49 1988837376 [Note] InnoDB: Restoring possible half-written data pages from the doublewrite buffer...
2019-11-14 16:15:49 1988837376 [Note] InnoDB: 128 rollback segment(s) are active.
2019-11-14 16:15:49 1988837376 [Note] InnoDB: Waiting for purge to start
2019-11-14 16:15:49 1988837376 [Note] InnoDB:  Percona XtraDB (http://www.percona.com) 5.6.35-80.0 started; log sequence number 1618167
2019-11-14 16:15:50 1447031616 [Note] InnoDB: Dumping buffer pool(s) not yet started
2019-11-14 16:15:50 1988837376 [Note] Plugin 'FEEDBACK' is disabled.
2019-11-14 16:15:50 1988837376 [Note] Recovering after a crash using tc.log
2019-11-14 16:15:50 1988837376 [Note] Starting crash recovery...
2019-11-14 16:15:50 1988837376 [Note] Crash recovery finished.
2019-11-14 16:15:50 1988837376 [Note] Server socket created on IP: '127.0.0.1'.
2019-11-14 16:15:50 1988837376 [Note] /usr/sbin/mysqld: ready for connections.
Version: '10.1.23-MariaDB-9+deb9u1'  socket: '/home/pi/data/mysql/mysql.sock'  port: 3306  Raspbian 9.0
2019-11-15 13:34:11 1639236416 [ERROR] mysqld: Table './emoncms/feeds' is marked as crashed and should be repaired
2019-11-15 13:34:11 1639236416 [Warning] Checking table:   './emoncms/feeds'

Seems to be a problem at 16:15 on 14 Nov which is when watchman feeds stopped updating.

And another problem today at 13:43 re the mysqld table. This was after I restarted apache2.

I then rebooted watchman - the new error log was somewhat different …

pi@watchman:/ $ cat /var/log/mysql/error.log
2019-11-15 14:35:23 1989095424 [Note] InnoDB: Using mutexes to ref count buffer pool pages
2019-11-15 14:35:23 1989095424 [Note] InnoDB: The InnoDB memory heap is disabled
2019-11-15 14:35:23 1989095424 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2019-11-15 14:35:23 1989095424 [Note] InnoDB: GCC builtin __atomic_thread_fence() is used for memory barrier
2019-11-15 14:35:23 1989095424 [Note] InnoDB: Compressed tables use zlib 1.2.8
2019-11-15 14:35:23 1989095424 [Note] InnoDB: Using Linux native AIO
2019-11-15 14:35:23 1989095424 [Note] InnoDB: Using generic crc32 instructions
2019-11-15 14:35:23 1989095424 [Note] InnoDB: Initializing buffer pool, size = 128.0M
2019-11-15 14:35:23 1989095424 [Note] InnoDB: Completed initialization of buffer pool
2019-11-15 14:35:23 1989095424 [Note] InnoDB: Highest supported file format is Barracuda.
2019-11-15 14:35:23 1989095424 [Note] InnoDB: 128 rollback segment(s) are active.
2019-11-15 14:35:23 1989095424 [Note] InnoDB: Waiting for purge to start
2019-11-15 14:35:23 1989095424 [Note] InnoDB:  Percona XtraDB (http://www.percona.com) 5.6.35-80.0 started; log sequence number 1618177
2019-11-15 14:35:23 1447031616 [Note] InnoDB: Dumping buffer pool(s) not yet started
2019-11-15 14:35:23 1989095424 [Note] Plugin 'FEEDBACK' is disabled.
2019-11-15 14:35:23 1989095424 [Note] Server socket created on IP: '127.0.0.1'.
2019-11-15 14:35:24 1989095424 [Note] /usr/sbin/mysqld: ready for connections.
Version: '10.1.23-MariaDB-9+deb9u1'  socket: '/home/pi/data/mysql/mysql.sock'  port: 3306  Raspbian 9.0

I can export watchman’s feed data up to the point it stopped updating on 14 Nov but I suspect it will contain a corruption and so the export will not be usable.

Any help most welcome
Thx

borpin · 15 November 2019 20:54

How is watchman setup? It seems likely this is SD Card corruption.

johnbanks · 15 November 2019 21:35

@borpin

watchman runs from a USB HDD. I’ve rebooted it sucessfully, done a data export and Admin shows plenty of space on all partitions. I think that rules out card or HDD corruption?

Been googling …

Did sudo mysqlcheck --repair --all-databases which reported all was OK (except for 3 lines where repair is not supported)
Restarted mariadb and apache2.
The narrative when mariadb was shutting down referred to log sequence 1618177.
When starting, the log sequence was 1618187.
Is this difference significant?

I then looked at the emonHub log on one of the satellite emonTx/RPi’s.
It was running and had a large buffer of unsent data to watchman (expected)
Restarted emonHub which got rid of the large buffer.
In both cases, the HTTPInterfacer with watchman reported …
… wanted ‘ok’ but got ’
,
The single quote mark on the second line looks strange? Were a bunch of non printable characters sent?

I can nmap scan the network from watchman and see the 3 running satellite emonTx/RPi’s - so it’s not a network problem.

Any further thoughts?
Thx

borpin · 15 November 2019 22:03

For all the remote TXs?

So they are updating locally but not to watchman?

Just so I confirm I understand the setup; 4x TXs, each with a Pi and EmonSD directly connected by serial to the TX (or RF) at remote locations separated by ??? from each other, connected to watchman by WiFi? which is also running an emonsd. Each TX/Pi combo records locally by MQTT? but also sends data to watchman via HTTP. Is that right?

Is the HDD powered by the Pi or by the USB? Could be a power issue. The instances I have used a USB HDD I had it powered directly for just that reason. Is it a HDD or an SSD?

I’m not clear why the feed might have stopped. If you could get it to talk it might have been able to dump that data into watchman (it is part of the point of the emonhub - to buffer when network goes down).

You will have to do some investigation from the TX/Pi combos (I know it is remote) using the commands in the API from the command line to see if the remote Pi can talk to watchman using the API. Using tshark on watchman, you can also see the HTTP traffic coming in and going back out.

Is there anything in the emoncms log on watchman at the same time that the remote TX is getting that error? You may need to change the log level to 1 to see it.

Why 3 are reliable and one less so is a mystery. It does work sometimes though doesn’t it?

johnbanks · 15 November 2019 23:40

@borpin

Yr understanding is correct.

All 3 running remote TX/RPi’s are not updating to watchman.
The 4th is usually OK - will get my son to reboot tomorrow - perhaps a wifi power management issue.

watchman is hard wired to the network. Remotes are wifi. watchman can see (nmap) the 3 running remotes.

watchman uses a USB laptop HDD powered by the RPi. Not been a past power problem and I’ve done power measurements in the past which have left me feeling comfortable on this aspect.

Here’s the emoncms log for watchman …
Things don’t look AOK.
How do I set a more detailed log level 1?

2019-11-15 14:35:28.853|WARN|emoncms_mqtt.php|Not connected, retrying connection
2019-11-15 14:35:29.120|WARN|emoncms_mqtt.php|Connecting to MQTT server: Connection Accepted.: code: 0
2019-11-15 14:35:33.560|WARN|index.php|406 Not Acceptable|feed/list
2019-11-15 14:35:38.229|WARN|index.php|406 Not Acceptable|feed/list
2019-11-15 14:36:05.305|WARN|index.php|406 Not Acceptable|feed/list
2019-11-15 14:36:10.198|WARN|index.php|406 Not Acceptable|feed/list
2019-11-15 17:19:54.706|ERROR|index.php|Not Authenticated|feed/api
2019-11-15 18:32:17.047|WARN|emoncms_mqtt.php|Not connected, retrying connection
2019-11-15 18:32:17.185|WARN|emoncms_mqtt.php|Connecting to MQTT server: Connection Accepted.: code: 0
2019-11-15 19:18:24.350|WARN|index.php|406 Not Acceptable|feed/list
2019-11-15 19:18:29.242|WARN|index.php|406 Not Acceptable|feed/list
2019-11-15 19:18:34.231|WARN|index.php|406 Not Acceptable|feed/list
2019-11-15 19:18:39.244|WARN|index.php|406 Not Acceptable|feed/list
2019-11-15 19:18:44.235|WARN|index.php|406 Not Acceptable|feed/list
2019-11-15 19:18:49.271|WARN|index.php|406 Not Acceptable|feed/list
2019-11-15 19:18:54.318|WARN|index.php|406 Not Acceptable|feed/list
2019-11-15 19:18:59.245|WARN|index.php|406 Not Acceptable|feed/list
2019-11-15 19:19:03.883|ERROR|index.php|Not Authenticated|feed/list
2019-11-15 19:19:04.281|WARN|index.php|406 Not Acceptable|feed/list

How can authentication work OK for months and then suddenly not be OK?

I’ll research how to use tshark on watchman.

But pls give me a steer on using commands in the API sent from a remote TX/RPi.

You didn’t comment on the mariadb log sequence number difference. Is this a red herring?

Many thx for yr input

johnbanks · 16 November 2019 01:35

@borpin

I’ve installed tshark on watchman and run …
sudo tshark -f “host 192.168.1.100” … the IP address of watchman
Aso tried the IP address of one of the running remote TX/RPi’s.
In both cases - after waiting for several minutes, ‘0 packets captured’
Does tshark only record successful data transfers?

Pls give me a steer on using commands in the API sent from a remote TX/RPi.

It does seem that the remote TX/RPi’s are attempting to send data OK - very unlikely they would all fail simultaneously?
Hence the problem must be with watchman - seems to me.

Many thx

borpin · 16 November 2019 10:02

In settings.php or settings.ini - search for ‘log’

Have you searched here on this? There have been several threads recently. This would be the problem.

I have no idea.

In the Input API help (top right of Inputs page) you will see some example curl commands - you just need to fiddle with these. use a different node ID so it does not muck up real data.

No it sees anything that matches the rules you provide.

You need to play with tshark commands - there are quite a few I have used. First use a really general one and then go from there. If you access the emoncms on watchman from a browser on a different machine, you will see http requests and responses. start from what you know is working, and work back from there.

Just going out - so a bit of a quick response I’m afraid. the 406 are the root of the issue I think.

Had you done an update at all?

johnbanks · 16 November 2019 17:08

@borpin

Digging a bit further, I downloaded the emoncms log for one of the remote TX/RPi’s. It reveals more than if it it just viewed on the web screen.
192.168.1.100 is the IP address for watchman.

Clearly this is the problem and why the remotes are not updating watchman.

2019-11-16 16:19:45,148 WARNING  To_watchman send failure: wanted 'ok' but got '<body><script language="javaScript">window.parent.document.cookie="Authorization=;path=/";
window.parent.location.href = "http://192.168.1.100";
</script></body></html>
'
2019-11-16 16:19:45,334 DEBUG    MQTT       Publishing: emon/1phase_node12/power1 14

What it all means and what is the fix - beyond me
Any advice?
Thx

PS - for what it’s worth - an extract from emonhub.conf on one of the remote TX/RPi’s showing the HTTPInterfacer. It’s all been working OK for months until a couple of days ago.

[[To_watchman]]
    Type = EmonHubEmoncmsHTTPInterfacer
    [[[init_settings]]]
    [[[runtimesettings]]]
        pubchannels = ToRFM12,
        subchannels = ToEmonCMS,
        url = http://192.168.1.100/emoncms    # watchman's FIXED IP address
        apikey = 4448328134be7b9b3f044********e8b
        senddata = 1                   # To _watchman
        sendstatus = 0

[Edit - APIKey mangled - RW (Moderator) ]

borpin · 16 November 2019 17:54

Do you mean emoncms or emonhub log? That looks like the emonhub log.

As I said before,

Is there a corresponding error in the emoncms log of watchman, to the emonhub error on the TX? You can change the log level on the emoncms.log file in settings.php or settings.ini.
You need to try the curl commands from the remote RPi to see if they work.
You need to use tshark on watchman to see the actual incoming and outgoing HTTP traffic.

In addition, you could try one of the examples from the Input API when connected to the watchman web interface from a different computer. If that doesn’t work, it is a general error with the EmonCMS API on watchman.

To debug that, all you can do is use tshark to monitor it and see what is actually being sent.

Generally, to debug this you just have to be systematic - start with the general case, if that works try more specific test cases till you find the one that fails.

Have you done updates on all systems? What version are they at?
e.g (though I expect you are on stable branch)

johnbanks · 16 November 2019 20:34

@borpin

Yes - it was the emonhub log.

From a remote TX/RPi (node12) I’ve made an API call to determine a last feed value & time using the recommended format …

pi@emonpi-node-12:/ $ curl http://192.168.1.100/emoncms/feed/value.json?id=1 -H "Authorization: Bearer 4448328134be7bXXXXXXXXXXXXXX49f910e8b"
<body><script language="javaScript">window.parent.document.cookie="Authorization=;path=/";
window.parent.location.href = "http://192.168.1.100";
</script></body></html>
pi@emonpi-node-12:/ $

The response is similar to that shown in the emonhub log to a call by the HTTPInterfacer

I then checked the emoncms log but there was no report for that same time.
Pls give me more specific advice on how to change the logging level in the emoncms log.

I then logged into watchman’s web interface and clicked on the suggested API call for a feed last value & time. It worked fine - but, of course, was 2 days out of date … the problem we are engaged with.

I then logged out of watchman and from my remote laptop made the same API call. It worked OK.

My crude non IT summary …

When logged in, watchman can converse with itself
When not logged in to watchman, I can converse with watchman from 40 miles away via dataplicity
BUT - another server on watchman’s local network cannot gain access to converse - even tho’ (as demonstrated by nmap) watchman can see these other servers on that local network.

I’ve struggled with the complexity of tshark’s parameters. In blind faith I tried a suggestion of yours I found on another forum post - with the following result …

pi@watchman:/ $ sudo tshark -Y http.request -T fields -e http.request.full_uri
Running as user "root" and group "root". This could be dangerous.
Capturing on 'eth0'
http://239.255.255.250:1900*
http://239.255.255.250:1900*
http://239.255.255.250:1900*
http://239.255.255.250:1900*
http://239.255.255.250:1900*

… and so it went on over several minutes.
I’m not sure this is helpful?

At the time of the problem onset, all TX/RPi’s were running ver 10.1.10. watchman is now runing ver 10.1.13.

I’ve done CTL+R to clear the browser cache on my laptop. Does watchman have a browser cache? - if so, how to clear it? But watchman was rebooted yesterday.

Sorry to be a pain - and thx

Does the title of this forum post need changing - seems the problem is not about mysql

borpin · 16 November 2019 22:45

Ok, but we are not interested in the feed API, but the input API (Note I said input API). In the input API there is a line (and I’m not fussed about the API key);

Just copy that line, and paste it into a shell on the remote EmonTX/Rpi. Do you get ok back?

ls -la /var/www/emoncms

then either

nano /var/www/emoncms/settings.php

or

nano /var/www/emoncms/settings.ini

depending on which file is there.

Then find the line relating to ‘log’ and change the level to one (1).

I used that exact same command - I got the 1900* when I was not logged into the web interface. once I was I got the messages about me navigating around the web interface.

http://239.255.255.250:1900*
http://239.255.255.250:1900*
http://239.255.255.250:1900*
http://239.255.255.250:1900*
http://192.168.7.243/device/list.json
http://192.168.7.243/input/list.json
http://192.168.7.243/admin/view
http://239.255.255.250:1900*
http://239.255.255.250:1900*
http://239.255.255.250:1900*
http://192.168.7.243/user/logout?msg=Admin+Authentication+Required&ref=L2FkbWluL3ZpZXc%3D
http://192.168.7.243/?msg=Admin+Authentication+Required&ref=JTJGYWRtaW4lMkZ2aWV3
http://239.255.255.250:1900*
http://239.255.255.250:1900*
http://239.255.255.250:1900*
http://239.255.255.250:1900*
http://239.255.255.250:1900*
http://192.168.7.243/user/login.json
http://192.168.7.243/feed/buffersize.json
http://192.168.7.243/admin/getlog
http://192.168.7.243/admin/emonpi/getupdatelog
http://192.168.7.243/backup
http://192.168.7.243/backup/exportlog
http://192.168.7.243/backup/importlog
http://192.168.7.243/backup/exportlog
http://192.168.7.243/backup/importlog
http://192.168.7.243/backup/exportlog
http://192.168.7.243/backup/importlog
http://192.168.7.243/backup/exportlog
http://192.168.7.243/backup/importlog
http://192.168.7.243/admin/view

So using a slightly different command (I’m no expert, just google for examples and read the docs)

pi@emonpi:/var/www/emoncms $ sudo tshark -f "port 80"
Running as user "root" and group "root". This could be dangerous.
Capturing on 'eth0'
    7 4.643893968 192.168.7.50 → 192.168.7.243 TCP 74 40288 → 80 [SYN] Seq=0 Win=29200 Len=0 MSS=1460 SACK_PERM=1 TSval=280670634 TSecr=0 WS=128
    8 4.644041155 192.168.7.243 → 192.168.7.50 TCP 74 80 → 40288 [SYN, ACK] Seq=0 Ack=1 Win=28960 Len=0 MSS=1460 SACK_PERM=1 TSval=2551356597 TSecr=280670634 WS=128
    9 4.644380737 192.168.7.50 → 192.168.7.243 TCP 66 40288 → 80 [ACK] Seq=1 Ack=1 Win=29312 Len=0 TSval=280670635 TSecr=2551356597
   10 4.644487559 192.168.7.50 → 192.168.7.243 HTTP 315 POST /input/bulk HTTP/1.1  (application/x-www-form-urlencoded)
   11 4.644592559 192.168.7.243 → 192.168.7.50 TCP 66 80 → 40288 [ACK] Seq=1 Ack=250 Win=30080 Len=0 TSval=2551356597 TSecr=280670635
   12 4.681301171 192.168.7.243 → 192.168.7.50 HTTP 217 HTTP/1.1 200 OK  (text/plain)
   13 4.681776847 192.168.7.50 → 192.168.7.243 TCP 66 40288 → 80 [ACK] Seq=250 Ack=152 Win=30336 Len=0 TSval=280670644 TSecr=2551356634
   14 4.682138095 192.168.7.50 → 192.168.7.243 TCP 66 40288 → 80 [FIN, ACK] Seq=250 Ack=152 Win=30336 Len=0 TSval=280670644 TSecr=2551356634
   15 4.682244605 192.168.7.243 → 192.168.7.50 TCP 66 80 → 40288 [FIN, ACK] Seq=152 Ack=251 Win=30080 Len=0 TSval=2551356635 TSecr=280670644
   16 4.682476896 192.168.7.50 → 192.168.7.243 TCP 66 40288 → 80 [ACK] Seq=251 Ack=153 Win=30336 Len=0 TSval=280670644 TSecr=2551356635
   17 5.000654139 192.168.7.123 → 192.168.7.243 TCP 66 52929 → 80 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 WS=256 SACK_PERM=1
   18 5.000821898 192.168.7.243 → 192.168.7.123 TCP 66 80 → 52929 [SYN, ACK] Seq=0 Ack=1 Win=29200 Len=0 MSS=1460 SACK_PERM=1 WS=128
   19 5.001264605 192.168.7.123 → 192.168.7.243 TCP 60 52929 → 80 [ACK] Seq=1 Ack=1 Win=2102272 Len=0
   20 5.001584187 192.168.7.123 → 192.168.7.243 HTTP 752 GET /feed/list.json HTTP/1.1
   21 5.001691062 192.168.7.243 → 192.168.7.123 TCP 54 80 → 52929 [ACK] Seq=1 Ack=699 Win=32128 Len=0
   22 5.032394124 192.168.7.243 → 192.168.7.123 HTTP 735 HTTP/1.1 200 OK  (application/json)
   23 5.074267300 192.168.7.123 → 192.168.7.243 TCP 60 52929 → 80 [ACK] Seq=699 Ack=682 Win=2101504 Len=0

You can see this command (copied directly off the Input API help page) and executed on 192.168.7.50

curl --data "data=[[0,16,1137],[2,17,1437,3164],[4,19,1412,3077]]&apikey=0dbf7b37b252d30e8ccb54b063d8bdf7" "http://192.168.7.243/input/bulk"

being received and responded to. I don’t understand all the garbage, but if there was an error, it would be obvious.

Line 12 shows the response ok being sent back (and I see that in the command line on 50).

johnbanks · 18 November 2019 23:29

@borpin

Many thx for sticking with my problem.

I’ve now adjusted watchman’s logging level for emoncms log. I tried it by sending an API FEED request for a value and time to a feed that existed and to a feed that did not exist. The log clearly reported both instances.

I’ve now located the help section on API INPUTS - it’s under Inputs - obvious when you you know how

In blind faith I tried yr suggested Legacy format line in a remote TX/RPI. It reported - failed to connect to the IP address, timed out. But I expected this as the API key & specfic IP address are not relevant to my system.

I’m a bit concerned about sending random test data to watchman in case it screws things up - valuable data that I’ve exported as a precaution.
However remote TX/RPi Node 12 is at an outdoor pool now closed for the season so it’s data can be a sacrificial lamb.

I rebooted Node 12 to eliminate the 3 day buffer of unsent data. Then I copied the emonhub send from the log, added curl and the API key data and sent this from a Node 12 dataplicity terminal with the following results …

pi@emonpi-node-12:/ $ curl http://192.168.1.100/emoncms/input/bulk.json?apikey=4448328134be7b9b3f0447f49f910e8b&data=[[1574032954.700755,12,9,0,0,0,254.37,19.700000000000003,6.300000000000001,6.300000000000001,5.5,6.1000000000000005],[1574032964.600453,12,6,0,0,0,254.59,19.700000000000003,6.2,6.300000000000001,5.5,6.1000000000000005],[1574032982.282405,12,7,0,0,0,254.32,19.700000000000003,6.300000000000001,6.300000000000001,5.5,6.1000000000000005]]&sentat=1574032985
[1] 3553
[2] 3554
[2]+  Done                    data=[[1574032954.700755,12,9,0,0,0,254.37,19.700000000000003,6.300000000000001,6.300000000000001,5.5,6.1000000000000005],[1574032964.600453,12,6,0,0,0,254.59,19.700000000000003,6.2,6.300000000000001,5.5,6.1000000000000005],[1574032982.282405,12,7,0,0,0,254.32,19.700000000000003,6.300000000000001,6.300000000000001,5.5,6.1000000000000005]]
pi@emonpi-node-12:/ $ <body><script language="javaScript">window.parent.document.cookie="Authorization=;path=/";
window.parent.location.href = "http://192.168.1.100";
</script></body></html>

The response is familiar.
It appears that Node 12 data is sent to watchman but watchman has not well received it to the point of sending its standard ok response.
Could it be that the 3553 & 3554 are significant?

I appreciate that I have not tried sending the data in yr suggested curl --data data=[~~~~ format.
But is this likely to give a different result?

I installed tshark on remote TX/RPi (node 14) and had success with yr “port 80” command …

7 0.006220022 192.168.1.14 → 192.168.1.100 HTTP 439 POST /emoncms/input/bulk.json?apikey=4448328134be7b9b3f0447f49f910e8b HTTP/1.1  (application/x-www-form-urlencoded)
    8 0.033410773 192.168.1.100 → 192.168.1.14 TCP 66 80 → 45018 [ACK] Seq=1 Ack=1449 Win=8688 Len=0 TSval=1318087915 TSecr=1892923141
    9 0.033429627 192.168.1.100 → 192.168.1.14 TCP 66 80 → 45018 [ACK] Seq=1 Ack=2897 Win=11584 Len=0 TSval=1318087915 TSecr=1892923141
   10 0.033442387 192.168.1.100 → 192.168.1.14 TCP 66 80 → 45018 [ACK] Seq=1 Ack=4345 Win=14480 Len=0 TSval=1318087916 TSecr=1892923141
   11 0.033454731 192.168.1.100 → 192.168.1.14 TCP 66 80 → 45018 [ACK] Seq=1 Ack=4718 Win=17376 Len=0 TSval=1318087916 TSecr=1892923141
   12 0.033466710 192.168.1.100 → 192.168.1.14 TCP 75 80 → 45018 [PSH, ACK] Seq=1 Ack=4718 Win=17376 Len=9 TSval=1318087916 TSecr=1892923141 [TCP segment of a reassembled PDU]
   13 0.033663897 192.168.1.14 → 192.168.1.100 TCP 66 45018 → 80 [ACK] Seq=4718 Ack=10 Win=29312 Len=0 TSval=1892923168 TSecr=1318087916
   14 0.033479575 192.168.1.100 → 192.168.1.14 HTTP 388 HTTP/1.1 200 OK  (text/html)
   15 0.033505512 192.168.1.100 → 192.168.1.14 TCP 66 80 → 45018 [RST, ACK] Seq=333 Ack=4718 Win=17376 Len=0 TSval=1318087916 TSecr=1892923141
   16 0.037801741 192.168.1.100 → 192.168.1.14 TCP 60 80 → 45018 [RST] Seq=10 Win=0 Len=0
   17 30.128464238 192.168.1.14 → 192.168.1.100 TCP 74 45020 → 80 [SYN] Seq=0 Win=29200 Len=0 MSS=1460 SACK_PERM=1 TSval=1892953263 TSecr=0 WS=128
   18 30.135410245 192.168.1.100 → 192.168.1.14 TCP 74 80 → 45020 [SYN, ACK] Seq=0 Ack=1 Win=5792 Len=0 MSS=1460 SACK_PERM=1 TSval=1318095441 TSecr=1892953263 WS=2
   19 30.135551651 192.168.1.14 → 192.168.1.100 TCP 66 45020 → 80 [ACK] Seq=1 Ack=1 Win=29312 Len=0 TSval=1892953270 TSecr=1318095441
   20 30.135702900 192.168.1.14 → 192.168.1.100 TCP 1514 POST /emoncms/input/bulk.json?apikey=4448328134be7b9b3f0447f49f910e8b HTTP/1.1  [TCP segment of a reassembled PDU]
   21 30.135764254 192.168.1.14 → 192.168.1.100 TCP 1514 45020 → 80 [ACK] Seq=1449 Ack=1 Win=29312 Len=1448 TSval=1892953270 TSecr=1318095441 [TCP segment of a reassembled PDU]
   22 30.135813420 192.168.1.14 → 192.168.1.100 TCP 1514 45020 → 80 [PSH, ACK] Seq=2897 Ack=1 Win=29312 Len=1448 TSval=1892953270 TSecr=1318095441 [TCP segment of a reassembled PDU]

Line 14 shows watchman (192.168.1.100) sending OK (text/html) to the remote TX/RPi (192.168.1.14).
In yr post above, the OK’s are OK (text/plain) and OK (application/json).
These OK’s are different - is it sgnificant?

watchman’s emonCMS log at this same time revealed …

2019-11-18 19:56:56.729|INFO|rememberme_model.php|login
2019-11-18 19:56:56.730|INFO|rememberme_model.php|getCookieValues: not present
2019-11-18 19:56:56.740|INFO|user_model.php|get_preferences()|Empty User preferences
2019-11-18 19:56:56.757|INFO|user_model.php|get_preferences()|Empty User preferences
2019-11-18 19:57:12.977|INFO|index.php|user/login.json
2019-11-18 19:57:12.983|INFO|rememberme_model.php|login
2019-11-18 19:57:12.984|INFO|rememberme_model.php|getCookieValues: not present
2019-11-18 19:57:12.988|INFO|rememberme_model.php|clearCookie
2019-11-18 19:57:12.989|INFO|rememberme_model.php|getCookieValues: not present
2019-11-18 19:57:13.190|INFO|index.php|feed/list
2019-11-18 19:57:13.218|INFO|user_model.php|get_preferences()|bookmarks|String(85) = [{"path":"admin\/view","text":"Administration"},{"…
2019-11-18 19:57:13.240|INFO|user_model.php|get_preferences()|bookmarks|String(85) = [{"path":"admin\/view","text":"Administration"},{"…
2019-11-18 19:57:13.811|INFO|index.php|feed/list.json
2019-11-18 19:57:13.820|INFO|feed_model.php|EngineClass() Autoloaded new instance of 'PHPTimeSeries'.
2019-11-18 19:57:13.826|INFO|feed_model.php|EngineClass() Autoloaded new instance of 'PHPFina'.
2019-11-18 19:57:13.869|INFO|index.php|process/list.json
2019-11-18 19:57:13.874|INFO|index.php|schedule/list.json
2019-11-18 19:57:13.882|INFO|index.php|feed/list.json
2019-11-18 19:57:13.886|INFO|index.php|input/list.json
2019-11-18 19:57:14.192|INFO|feed_model.php|EngineClass() Autoloaded new instance of 'PHPTimeSeries'.
2019-11-18 19:57:14.199|INFO|feed_model.php|EngineClass() Autoloaded new instance of 'PHPFina'.
2019-11-18 19:57:18.797|INFO|index.php|feed/list.json
2019-11-18 19:57:18.812|INFO|feed_model.php|EngineClass() Autoloaded new instance of 'PHPTimeSeries'.
2019-11-18 19:57:18.820|INFO|feed_model.php|EngineClass() Autoloaded new instance of 'PHPFina'.
2019-11-18 19:57:18.865|INFO|index.php|admin/view
2019-11-18 19:57:19.776|INFO|user_model.php|get_preferences()|bookmarks|String(85) = [{"path":"admin\/view","text":"Administration"},{"…
2019-11-18 19:57:19.788|INFO|user_model.php|get_preferences()|bookmarks|String(85) = [{"path":"admin\/view","text":"Administration"},{"…
2019-11-18 19:57:20.181|INFO|index.php|admin/getlog
2019-11-18 19:57:20.186|INFO|index.php|feed/buffersize.json
2019-11-18 19:57:20.195|INFO|index.php|admin/emonpi/getupdatelog
2019-11-18 19:58:59.787|INFO|index.php|admin/downloadlog

Anything of significance?

I then started to look at the logs …
apache2 access log had no errors or warnings
The apache 2 error log was different … It had many similar errors to this at the time in question …

[Mon Nov 18 19:57:19.769510 2019] [:error] [pid 14474] [client 127.0.0.1:58146] PHP Notice: Undefined index: hw in /var/www/emoncms/Modules/admin/admin_main_view.php on line 397, referer: https://inXXXXXXXXXXXXXXXXaplicity.io/emoncms/feed/list

Here’s the relevant stanza in admin_main_view.php

<?php if (!empty(implode('',$rpi_info))) : ?>
        <h4 class="text-info text-uppercase border-top pt-2 mt-0 px-1"><?php echo _('Pi'); ?></h4>
        <dl class="row">
            <?php echo row(sprintf('<span class="align-self-center">%s</span>',_('Model')), $rpi_info['model'].'<div>'.RebootBtn().Shutdown$
            <!-- <?php echo row(_('SoC'), $rpi_info['hw']) ?> -->
            <?php echo row(_('Serial num.'), strtoupper(ltrim($rpi_info['sn'], '0'))) ?>
            <?php echo row(_('Temperature'), sprintf('%s - %s', $rpi_info['cputemp'], $rpi_info['gputemp'])) ?>
            <?php echo row(_('emonpiRelease'), $rpi_info['emonpiRelease']) ?>
            <?php echo row(_('File-system'), $rpi_info['currentfs']) ?>
        </dl>
        <?php endif; ?>

Does that throw any light on the problem?

I then looked at the mysql logs …

2019-11-15 18:28:10 1638232896 [Note] /usr/sbin/mysqld: Normal shutdown

2019-11-15 18:28:10 1638232896 [Note] Event Scheduler: Purging the queue. 0 events
2019-11-15 18:28:10 1426060096 [Note] InnoDB: FTS optimize thread exiting.
2019-11-15 18:28:10 1638232896 [Note] InnoDB: Starting shutdown...
2019-11-15 18:28:11 1638232896 [Note] InnoDB: Waiting for page_cleaner to finish flushing of buffer pool
2019-11-15 18:28:11 1638232896 [Note] InnoDB: Shutdown completed; log sequence number 1618187
2019-11-15 18:28:12 1638232896 [Note] /usr/sbin/mysqld: Shutdown complete

2019-11-15 18:28:13 1988194304 [Note] InnoDB: Using mutexes to ref count buffer pool pages
2019-11-15 18:28:13 1988194304 [Note] InnoDB: The InnoDB memory heap is disabled
2019-11-15 18:28:13 1988194304 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2019-11-15 18:28:13 1988194304 [Note] InnoDB: GCC builtin __atomic_thread_fence() is used for memory barrier
2019-11-15 18:28:13 1988194304 [Note] InnoDB: Compressed tables use zlib 1.2.8
2019-11-15 18:28:13 1988194304 [Note] InnoDB: Using Linux native AIO
2019-11-15 18:28:13 1988194304 [Note] InnoDB: Using generic crc32 instructions
2019-11-15 18:28:13 1988194304 [Note] InnoDB: Initializing buffer pool, size = 128.0M
2019-11-15 18:28:13 1988194304 [Note] InnoDB: Completed initialization of buffer pool
2019-11-15 18:28:14 1988194304 [Note] InnoDB: Highest supported file format is Barracuda.
2019-11-15 18:28:14 1988194304 [Note] InnoDB: 128 rollback segment(s) are active.
2019-11-15 18:28:14 1988194304 [Note] InnoDB: Waiting for purge to start
2019-11-15 18:28:14 1988194304 [Note] InnoDB:  Percona XtraDB (http://www.percona.com) 5.6.35-80.0 started; log sequence number 1618187
2019-11-15 18:28:14 1442837312 [Note] InnoDB: Dumping buffer pool(s) not yet started
2019-11-15 18:28:14 1988194304 [Note] Plugin 'FEEDBACK' is disabled.
2019-11-15 18:28:14 1988194304 [Note] Server socket created on IP: '127.0.0.1'.
2019-11-15 18:28:14 1988194304 [Note] /usr/sbin/mysqld: ready for connections.
Version: '10.1.23-MariaDB-9+deb9u1'  socket: '/home/pi/data/mysql/mysql.sock'  port: 3306  Raspbian 9.0

Nothing untoward I think - just documenting a reboot?

I feel to have gone full circle. The problem does seem to be with mariadb per my initial forum post …

2019-11-14 16:15:49 1988837376 [Note] InnoDB: The InnoDB memory heap is disabled
2019-11-14 16:15:49 1988837376 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2019-11-14 16:15:49 1988837376 [Note] InnoDB: GCC builtin __atomic_thread_fence() is used for memory barrier
2019-11-14 16:15:49 1988837376 [Note] InnoDB: Compressed tables use zlib 1.2.8
2019-11-14 16:15:49 1988837376 [Note] InnoDB: Using Linux native AIO
2019-11-14 16:15:49 1988837376 [Note] InnoDB: Using generic crc32 instructions
2019-11-14 16:15:49 1988837376 [Note] InnoDB: Initializing buffer pool, size = 128.0M
2019-11-14 16:15:49 1988837376 [Note] InnoDB: Completed initialization of buffer pool
2019-11-14 16:15:49 1988837376 [Note] InnoDB: Highest supported file format is Barracuda.
2019-11-14 16:15:49 1988837376 [Note] InnoDB: The log sequence numbers 1618147 and 1618147 in ibdata files do not match the log sequence number 1618167 in the ib_logfiles!
2019-11-14 16:15:49 1988837376 [Note] InnoDB: Restoring possible half-written data pages from the doublewrite buffer...
2019-11-14 16:15:49 1988837376 [Note] InnoDB: 128 rollback segment(s) are active.
2019-11-14 16:15:49 1988837376 [Note] InnoDB: Waiting for purge to start
2019-11-14 16:15:49 1988837376 [Note] InnoDB:  Percona XtraDB (http://www.percona.com) 5.6.35-80.0 started; log sequence number 1618167
2019-11-14 16:15:50 1447031616 [Note] InnoDB: Dumping buffer pool(s) not yet started
2019-11-14 16:15:50 1988837376 [Note] Plugin 'FEEDBACK' is disabled.
2019-11-14 16:15:50 1988837376 [Note] Recovering after a crash using tc.log
2019-11-14 16:15:50 1988837376 [Note] Starting crash recovery...
2019-11-14 16:15:50 1988837376 [Note] Crash recovery finished.
2019-11-14 16:15:50 1988837376 [Note] Server socket created on IP: '127.0.0.1'.
2019-11-14 16:15:50 1988837376 [Note] /usr/sbin/mysqld: ready for connections.
Version: '10.1.23-MariaDB-9+deb9u1'  socket: '/home/pi/data/mysql/mysql.sock'  port: 3306  Raspbian 9.0
2019-11-15 13:34:11 1639236416 [ERROR] mysqld: Table './emoncms/feeds' is marked as crashed and should be repaired
2019-11-15 13:34:11 1639236416 [Warning] Checking table:   './emoncms/feeds'

At the time, I repaired mysql and updated the database via the emon Admin page- but to no avail.

Is the key line …

2019-11-15 13:34:11 1639236416 [ERROR] mysqld: Table './emoncms/feeds' is marked as crashed and should be repaired

If so - how to do that repair?
Where is ./emoncms/feeds? - I cannot find it.

Again- many thx

borpin · 19 November 2019 07:58

I think we are deep into the inner workings of emoncms here and @TrystanLea will be your best bet.

On trying curl, open the Input API help page on watchman, copy one of the examples, and paste it into firstly an SSH shell connected to watchman and see if it works, and then on an SSH shell on the remote. It should n’t disrupt anything as long as the nodeID is not in use. It will just create an Input you can delete.

The key bit here is that if it is a mysql issue, it won’t work from either terminals.

That’s not relevant.

It seems you are right. How did you repair it (not that I know anything about repairing tables)?

On the basis it is still reporting it as ‘crashed’ the repair could not have been successful first time.

I searched on “is marked as crashed and should be repaired” and there was quite a lot of information out there.

johnbanks · 19 November 2019 17:32

@TrystanLea

Can you help pls.

The root cause problem seems to be with mariadb …

Version: '10.1.23-MariaDB-9+deb9u1'  socket: '/home/pi/data/mysql/mysql.sock'  port: 3306  Raspbian 9.0
2019-11-15 13:34:11 1639236416 [ERROR] mysqld: Table './emoncms/feeds' is marked as crashed and should be repaired
2019-11-15 13:34:11 1639236416 [Warning] Checking table:   './emoncms/feeds'

I’ve now managed to find the feeds file but sudo cat is not very helpful …

pi@watchman:~/data/mysql $ sudo cat /home/pi/data/mysql/emoncms/feeds.MYI
����d�- �$$�������+�@�5��������]��r\|^d]��r$nn{



?@

 @
)j,��
T
 ��`��@|��L��$\ �!�"@#�$�%8'�(�)� Xpi@watchman:~/data/mysql $

Thinking that a good copy might be zipped in a backup export made before the problem occurred, I tried to download the export using dataplicity & winSCP. Part way into the download, watchman went offline as well as 2 of the remote TX/RPi’s. Oh! the joys of working remotely.

Can you suggest what I might try once I get them up & running again?

Googling I can see what might be some relevant mysql things to try but I really don’t want to irretreivably screw things up. I noted warnings about turning off the mysql daemon before doing repairs. How do I do that? And then turn it back on again?

Thanks

johnbanks · 20 November 2019 12:48

@TrystanLea @borpin
Update …
Last evening, I called my son to reboot watchman by pulling & re-inserting the RPi power lead.
watchman came back on line but so surprisingly did the other 2 TX/RPi’s that had gone offline with watchman earlier in the day when I was trying to download a large backup export file.

The feeds web page for watchman showed days of delay in the updating columnn on the right.
Sometime later, I noticed these delays had reduced somewhat and this continued slowly.
And the emonHub log for a TX/RPi on the network now showed that watchman was sending the correct ‘ok’ response.

By this morning, watchman had completely caught up - updates were happening every few to 30 secs.

I cannot explain this Lazarus effect - last evening was certainly not the first time that watchman had been rebooted since the problem first started.

In summary I can now see …
The 4 TX/RPi nodes have suffered no breaks in the data on their individual SDHC’s
On watchman …
Data for 1 node has no break
Data for the other 3 nodes suffered breaks simultaneouly at 4:15pm on the 14th.
Data then resumed node by node - 8:45pm on 15th, 6:30pm on 16th and 11:30pm on 17th

The only actions I have taken re watchman - to run a mysql repair command, update the database and ver to 10.1.13 from the Admin page.

emonHub/CMS is quite amazing - hanks

borpin · 20 November 2019 12:54

Great description.