What do psent and psuccess indicate?

Hi

I have 4 emonTx units in use. Because of poor radio performance I fitted 3 of them with emonESP boards to use WiFi. The first two apparently gave excellent results, and the readings matched my meter in one case and my PV inverter in the other. The last one I installed seemed to give good results, but after a day or two I noticed big gaps in the data. I was not sure at first how to debug this, so I started to record psent and psuccess in a graph. It was pretty obvious that psent was being reset to zero very frequently. Sometimes it started counting up again immediately, but during the data gaps it only restarted after a long time.
I had already checked the wifi signal with an app and it was showing a very good signal.

Puzzled by this I looked at the emontx and I noticed the 5volt DC plug was not very well seated. On close examination I realised the aperture in the plastic was too small for the plug on the cable and this was stopping the plug going in properly. Some quick work with a file remedied this and I put the emontx back in service. I was delighted to see that the data gaps no longer occurred.

I had left the psent feed in place and could see that the number was now climbing to very high values with a very occasional reset to zero. I got curious about the resetting to zero and started monitoring one of the other emonESP units. To my surprise this one was resetting psent to zero very frequently.

Can anyone tell me what this means and, more importantly, do I need to do anything about it?

Regards

Ian

It sounds very much like a power supply problem. If your ‘first’ emonTx was (presumably) dropping out and resetting because the additional power required to transmit by Wi-Fi pulled the 3.3 V rail down, then it seems prudent to look for the same problem on the other two.

Would I be right to think that you haven’t removed the link (JP2) so they are still taking power from both the a.c. adapter and the 5 V USB? If that’s the case (and the internal emonTx V3.4 power supply wasn’t designed to support an ESP8266), I can well believe that when the incoming mains voltage drops a little, there isn’t enough power to carry it through the data transmission and it resets.

I don’t have an ESP8266, but I’d guess that psent is Packets SENT.

psent = packets sent and psuccess = successfully acknowledged requests. Every time an attempt is made to send a packet, psent is incremented by one; psuccess is incremented only if the reply is an “ok”.
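For anyone who wants that counting rule written down, here is a minimal sketch. The struct and its member names are made up for illustration and are not taken from the EmonESP source; the only thing assumed from the description is that a reply counts as a success when it equals the string "ok".

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for the two counters; names are illustrative only.
struct PacketCounters {
    std::uint32_t sent = 0;     // psent: every send attempt
    std::uint32_t success = 0;  // psuccess: attempts answered with "ok"

    // Record one send attempt together with the reply text that came back.
    void record(const std::string& reply) {
        ++sent;
        if (reply == "ok") {
            ++success;
        }
    }

    // Failed attempts so far; success can never exceed sent.
    std::uint32_t failures() const {
        assert(success <= sent);
        return sent - success;
    }
};
```

Graphing `failures()` (psent minus psuccess) rather than the two raw counts should make the gap between the lines easier to see once the numbers get large.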

Looking at your graph, just before 08:00 (for example) psent is a little higher than psuccess, meaning some requests didn’t get an “ok” response. That could be for a variety of reasons: network issues, WiFi issues, an emoncms timeout, etc. Looking closer at that same time window, the errors appear to have occurred around 07:00, where we can see a “dog-leg” in the psuccess plot.

These counts should AFAICT increase steadily forever and only reset at rollover (4,294,967,295 or every 1362 years at 10s intervals) or power interruptions (to the emonESP not the emonTx) or FW updates OR if the failure count (psent minus psuccess) reaches 30 (see line 84 of EmonESP/src/emoncms.cpp).
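As a quick check on the rollover figure, the arithmetic can be sketched like this. It takes the 4,294,967,295 limit to mean a 32-bit unsigned counter and assumes 365-day years; the function name is just for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Years until a 32-bit unsigned counter wraps if it is incremented once
// every interval_s seconds. Assumes 365-day years.
double years_to_rollover(double interval_s) {
    assert(interval_s > 0.0);
    const double max_count =
        static_cast<double>(std::numeric_limits<std::uint32_t>::max());  // 4,294,967,295
    const double seconds_per_year = 365.0 * 24.0 * 60.0 * 60.0;
    return max_count * interval_s / seconds_per_year;
}
```

Calling `years_to_rollover(10.0)` gives a little under 1362 years, so rollover can be ruled out as the cause of frequent resets.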

If your device is reaching that 30-fail limit and resetting, then (for example) the 9 resets between 02:00 and 04:00 would mean there were over 270 failed requests. Since only 720 packets are sent over 2 hours at 10s intervals, that would indicate a fail rate of 37.5%, which isn’t great!
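The same estimate as a small sketch, in case you want to plug in other time windows from the graph. The function and parameter names are hypothetical, and it assumes every reset in the window was caused by the fail limit being reached.

```cpp
#include <cassert>

// Fraction of packets that failed in a time window, estimated from the
// number of resets seen in that window.
double estimated_fail_rate(int resets, int fails_per_reset,
                           double window_s, double interval_s) {
    assert(window_s > 0.0 && interval_s > 0.0);
    const double failed = static_cast<double>(resets) * fails_per_reset;
    const double sent = window_s / interval_s;  // packets attempted in the window
    return failed / sent;
}
```

With 9 resets, 30 fails per reset, a 7200 s window and a 10 s interval this returns 0.375, matching the 37.5% above.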

I think a reset of emoncms_connection_error_count when there is a successful “ok” receipt might improve the resetting,

if (result == "ok") {
  packets_success++;
  emoncms_connected = true;
  emoncms_connection_error_count = 0;  // proposed: clear the count on every success
}
else {
  emoncms_connected = false;
  DEBUG.print("Emoncms error: ");
  DEBUG.println(result);
  emoncms_connection_error_count++;
  if (emoncms_connection_error_count > 30) {
    ESP.restart();
  }
}

as that would then only reset after more than 30 consecutive fails, but this may mask the underlying issue as the psent and psuccess numbers will get really big, dwarfing the <30 fails. I think you would then need to look at logging the difference, i.e. the fails rather than the successes, to make sense of it.
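To show the difference between the two behaviours, here is a rough off-device simulation. It is plain C++ rather than firmware code, the function name is made up, and it assumes a restart clears the error count because the count lives in RAM.

```cpp
#include <cassert>
#include <vector>

// Count how many restarts a sequence of send outcomes would trigger.
// outcomes[i] is true for an "ok" reply and false for anything else.
// zero_on_success = true models the proposed change; false models a
// count that only ever grows until the restart.
int count_restarts(const std::vector<bool>& outcomes, int limit,
                   bool zero_on_success) {
    assert(limit >= 0);
    int errors = 0;
    int restarts = 0;
    for (bool ok : outcomes) {
        if (ok) {
            if (zero_on_success) {
                errors = 0;
            }
        } else {
            ++errors;
            if (errors > limit) {  // same "> limit" test as the snippet
                ++restarts;
                errors = 0;        // assumed: counter starts again after a restart
            }
        }
    }
    return restarts;
}
```

Feeding it a log with 31 isolated failures, each followed by 9 successes, gives one restart with the cumulative count and none with the zero-on-success change. A run of 31 straight failures still restarts in both cases.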

I cannot see any “retry on fail code” but I haven’t studied all the separate files that make up the emonESP code.

[edit RW has posted whilst I was writing so, apologies if I have doubled up]

Thanks for your replies. All units are on external 5 volt DC supplies and not using the AC supply. They are all of the same make. I will try different DC supplies.

Ian

Hi

I have investigated this further. I changed the power supply on the emonTx around 8 o’clock on the day shown in the graph below. It made no difference. As this emonTx is on my local network I was able to see the WiFi strength, which was not good. I have 2 WiFi access points and I was using the nearest. I switched access points and reverted to the original power supply at around 15:30, and from that point on the problems disappeared.

My conclusion is that my problem is spotty wifi connection.

I then looked at another emontx/emonESP packet performance and that is excellent.

That leaves me with one emontx/emonESP that still has a problem.

If I check with my wifi signal strength app there are at least 10 other strong wifi signals from my close neighbours. Can a signal get lost in the crowd?

There are 2 things I have thought of. First, if I understand the code correctly, the resets occur when the lost packets reach 30, regardless of the time frame. If I implement the change suggested by Paul but make the count value 10, it will reset only on 10 consecutive fails, which might be more sensible. Second, I cannot move the emonTx, and there are thick 16th century stone walls between it and the router. However, I could attach the emonESP with a cable and move the WiFi antenna. What kind of distance limit would be sensible for the emonESP/emonTx serial connection?

Ian

Hi Ian,

I think the code change I suggested would be an improvement regardless of the root cause. For example, you could currently have a setup that encounters 29 issues during some network event(s) and then runs perfectly for ever and a day, but on the next single fail it will reset. I would assume any reset results in at least a couple of missed packets, so a single missed packet becomes several missed packets just because it is the 30th.

Reducing that total from 30 to 10 may improve things, but it could just as well make things worse. It all depends on whether resetting the device actually helps restore the connection. Looking at your data, it seems that most of the time the 30 faults build up gradually over a period of time, rather than the connection being lost until a reset. By lowering the “consecutive count” to 10 you may actually increase the downtime, e.g. if just 10 packets are lost and then a further 5 are lost during the reset and reconnect to WiFi, 15 are lost rather than 10. The ideal “auto reset” or watchdog is never hit during normal usage, so you would need to look at how long the average “bad patch” lasts and set this level to something at least above that.
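One way to put a number on a “bad patch” is to look for the longest run of consecutive failures in a log of send outcomes. The sketch below is only an illustration: the function names and the idea of adding a fixed margin are my own assumptions, not anything in the emonESP code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Longest run of consecutive failed sends in a log of outcomes
// (true = "ok", false = failed).
std::size_t longest_failure_run(const std::vector<bool>& outcomes) {
    std::size_t longest = 0;
    std::size_t current = 0;
    for (bool ok : outcomes) {
        current = ok ? 0 : current + 1;
        if (current > longest) {
            longest = current;
        }
    }
    return longest;
}

// Hypothetical helper: a consecutive-fail limit that sits a chosen margin
// above the worst bad patch seen so far.
std::size_t limit_above_longest_run(const std::vector<bool>& outcomes,
                                    std::size_t margin) {
    assert(margin > 0);
    return longest_failure_run(outcomes) + margin;
}
```

If the worst run in a few days of data were, say, 3 packets, a limit of 10 would never be hit in normal use, whereas if runs of 12 show up then 10 would cause extra resets.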

If you can improve the WiFi signal by moving the esp that would be a good step. I would expect you could easily extend by a good couple of meters, beyond that it would depend on the quality of the cable and noise levels etc. Even putting the esp on a meter flylead and moving it around on the blind side of the wall might find a usable “sweet spot”, eg a reflection etc.

Do you have multiple SSIDs, or have you extended the same WiFi network with duplicated network settings?
I had a scenario where one of my devices used to perform poorly over WiFi. After a while I traced it to parking the car in a different location (in front of the garage) for an hour or so. That cut the stronger WiFi signal to the device just long enough for it to reconnect to the distant, weaker AP. After the car was gone, the “distant” WiFi was patchy but never quite bad enough to make the device search for and find the stronger signal again. Raising the device above the height of the car resolved the issue.

Hi Paul

I am using multiple SSIDs so I can control which WiFi connection the emonESP is using. I have been carefully monitoring the one emonESP that is still resetting and I can confirm that the error count grows very slowly over about 12 hours until it reaches 30. I am now convinced that the dropped packets are occurring one at a time over a considerable period. I will add the mod to the code, leaving the error count at 30, and see what happens.

The other emonESP has been rock solid since the wifi change.

Ian

What if we decrement emoncms_connection_error_count by one every n OK packets? So if for a long time no error occurs the device will not reset.

I’m not sure what that brings to the party. If you get a successful connection, the count should zero so you get the full set of retries before resetting. You want to avoid a situation where a flaky connection is restored after a prolonged outage and the device resets soon after a valid connection, purely because not all connections are successful. A reset isn’t going to improve a weak signal; it will only remedy a terminal connection.
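To make the concern concrete, here is a rough simulation of the decrement-every-n idea. It is plain C++ with made-up names, it counts OK packets in total rather than consecutively, and it assumes a restart clears both counters; the real firmware may differ.

```cpp
#include <cassert>
#include <vector>

// Restarts triggered when the error count is decremented by one after
// every n "ok" replies instead of being zeroed. true = "ok".
int count_restarts_with_decay(const std::vector<bool>& outcomes,
                              int limit, int n) {
    assert(n > 0);
    int errors = 0;
    int oks_since_decrement = 0;
    int restarts = 0;
    for (bool ok : outcomes) {
        if (ok) {
            if (++oks_since_decrement == n) {
                oks_since_decrement = 0;
                if (errors > 0) {
                    --errors;
                }
            }
        } else if (++errors > limit) {
            ++restarts;
            errors = 0;               // assumed: counters cleared by the restart
            oks_since_decrement = 0;
        }
    }
    return restarts;
}
```

On a weak link where every other packet fails, with a limit of 30 and n = 5, this still ends up restarting even though half the packets are getting through. On a link with one failure in every ten packets it never restarts, so the outcome depends heavily on how n compares with the failure rate.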

I see your point and I agree. In any case the error count must be zeroed, one way or another, after a successful connection. The current implementation is not OK IMHO; your proposal to zero the error count after the first success is simple and solves the issue.

I have just created a PR for this.

Thanks a lot, PR has been merged and new release V2.3.0 has been created: Releases · openenergymonitor/EmonESP · GitHub
