Mystery:
As I mentioned recently, I have changed the monitoring of my 16s 2p system to two plates, each integrating 16 v4.4 modules, which allows the modules to be cooled efficiently and improves the balancing capacity of the system. I have removed one of the groups of modules and replaced it with a new one, and in the two groups that now monitor the house’s battery I have reprogrammed the modules to the 9k6 version. The system works extraordinarily well. However, I also wanted to program the group that I removed to 9k6, to have a spare that can be swapped in quickly if any of the 16 integrated modules breaks down (in almost 1.5 years of experience, only the current sensor has been damaged on one of the modules).
The thing is that I have not been able to make this group of 16 modules work at 9k6. I have tried to program them from the controller (there is no error message during the installation), and I have also reprogrammed them with avrdude using the firmware module_fw_V440_COMMS_9K6_attiny841_440_eF4_hD6_l6C.hex. I’ve been on this all Saturday and part of Sunday, thinking about the causes. It could be that one or more of the ATtiny chips are faulty or not genuine (I now always buy them from a trusted supplier, but it wasn’t always like that). As I said, with the standard speed version everything is perfect, but I can’t get the group to work at 9k6.
Any ideas what could be the cause?
Hi @Yar_Leo, it looks like you have been doing a lot of good debugging, well done.
First of all, as I mentioned in my comms video, the modules communicate in “banks” of 16. That means the controller cannot communicate with more than 16 modules in a single data packet/request. So the slight pause you see between requests/flashing LEDs is nothing to worry about, that’s normal.
The round trips you are getting are good based on the screenshots (less than 800ms).
Can I suggest you try the 5,000 baud speed firmware? (5K version) That was made to help reduce issues like the one you have here where you get inconsistent results.
Ultimately, there will be interference or a bad connection somewhere in the chain - the usual issue is the last cable back to the controller - see if you can keep this away from everything else.
Here you have probably misunderstood me a little. I do not mean the small pause between modules 16 and 17 in the chain. I mean the pause between comms LED flashes on the first module. So: the controller starts the communication → the LED on cell module 1 flashes and the running light moves down the chain → pause (nothing happens until the controller board sends a new request) → the controller starts the communication again. In other words, I mean the pause between 2 separate requests from the controller board (2 flashes of the comms LED on cell module 1).
When more than 16 modules are configured, no matter whether as 1S 2xP or 2S 1xP, this pause (the time during which there are no communication requests from the controller board) is sometimes so long that the cell modules go into independent (stand-alone) mode, and then all these chaotic errors appear.
Yes, the round trip is super: 750 ms for 14S, and 1500 ms for 2 strings.
I can try it if necessary, but only in the evening. However, I do not expect it to help, since I had the same issue previously at the standard speed; back then I thought it was because I was using 4.21 boards on string 1 and 4.4 boards on string 2. And since you had announced the new version with increased speed, I did not report the issue and waited for the new 4.4 boards to arrive so that I could update string 1 and check it on the new software.
That I have checked multiple times. For example, here is a screenshot from my test controller, which is located directly next to the cell modules and connected to them with short cables (15 cm on TX and 25 cm on RX). It has been running for around 20 hours since the last counter reset. With exactly the same cables and the same controller, but with at least 2 fewer cells, there are no errors at all (see the screenshots above for 16 and 14 cells).
If necessary, I can make a video to show my setup and the “strange” behavior of the running lights.
The code which transmits the requests from the controller lives in the file named “main.cpp” and is below. It uses a different delay based on the comms speed.
void transmit_task(void *param)
{
    for (;;)
    {
        // Delay based on comms speed, ensure the first module has time to process and clear the request
        // before sending another packet
        uint16_t delay_ms = 900;
        if (mysettings.baudRate == 9600)
        {
            delay_ms = 450;
        }
        else if (mysettings.baudRate == 5000)
        {
            delay_ms = 700;
        }
There is also an “enqueue” task which actually requests the scanning of the modules. This runs every 5 seconds when there are 16 modules or fewer, or every 10 seconds when there are more than 16.
It’s possible that the “10 seconds” is the delay you are seeing.
void enqueue_task(void *param)
{
    for (;;)
    {
        // Ensure we service the cell modules every 5 or 10 seconds, depending on number of cells being serviced
        // slower stops the queues from overflowing when a lot of cells are being monitored
        // TODO: SCALE THIS BASED ON COMMS BAUD RATES
        vTaskDelay(pdMS_TO_TICKS((TotalNumberOfCells() <= maximum_cell_modules_per_packet) ? 5000 : 10000));
If there are more than 16 cells, the controller executes transmit_task() 1, 2 or 3 times every 10 seconds. So there is enough time in between for the cell modules to go into stand-alone mode (8 seconds? Am I right?).
If there are <=16 cell modules, I see more running lights (up to 5-6 transmit_task() runs) and the pause is shorter (5 seconds).
In case 1 (>16) chaotic errors appear. From my observation, they come from the modules that are blinking rapidly (stand-alone indication) at the moment the running light from the next transmit request reaches them.
In case 2 (<=16) there are no errors.
Now the question is:
is there some issue with transmit_task(), since there are only 1-2, maximum 3 packets sent
or
the 10 seconds delay is too big
or
something is wrong with my controller boards (both of them?), because some packets are not processed correctly (too few packets are sent).
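The timing argument above can be written down as a small check. This is only a sketch: the 5000/10000 ms values and the bank size of 16 come from the enqueue_task() excerpt, while the 8 second stand-alone timeout is the unconfirmed figure guessed at above, and all names (`kModulesPerPacket`, `kAssumedStandaloneTimeoutMs`, `enqueue_interval_ms`, `may_reach_standalone`) are hypothetical helpers, not part of the diyBMS code.

```cpp
#include <cstdint>

// Modules are addressed in banks of 16 per packet (from the thread).
constexpr int kModulesPerPacket = 16;

// ASSUMPTION: a module falls back to stand-alone mode after roughly 8 s
// without a request. This figure is a guess from the thread, not confirmed.
constexpr std::uint32_t kAssumedStandaloneTimeoutMs = 8000;

// Mirrors the ternary in the enqueue_task() excerpt: 5 s for up to
// 16 cells, 10 s for more than 16.
constexpr std::uint32_t enqueue_interval_ms(int total_cells)
{
    return total_cells <= kModulesPerPacket ? 5000u : 10000u;
}

// True when the pause between two scans is long enough for a module to
// time out under the assumed stand-alone timeout.
constexpr bool may_reach_standalone(
    std::uint32_t interval_ms,
    std::uint32_t timeout_ms = kAssumedStandaloneTimeoutMs)
{
    return interval_ms >= timeout_ms;
}

// 16 cells: 5 s pause, below the assumed timeout.
static_assert(!may_reach_standalone(enqueue_interval_ms(16)),
              "5 s pause should stay below the assumed timeout");
// 17 cells: 10 s pause, above the assumed timeout.
static_assert(may_reach_standalone(enqueue_interval_ms(17)),
              "10 s pause should exceed the assumed timeout");
```

If the real timeout differs from 8 seconds, only `kAssumedStandaloneTimeoutMs` needs to change; the comparison itself stays the same.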
UPD:
How many sets of running lights should actually be sent by transmit_task() per “enqueue” task? Because for me it is not always the same. The most common number is 2, but often it is also 1 or 3, and with a small number of cells I saw more, like 5 or 6.
Sometimes I also see the field “Send Q length” on the web interface with values from 1 to 3. Could that be the reason the number of transmit requests varies from time to time?
Hello, I’m building a version of the 4.40 module, but I changed the load resistors to a different resistance. Do I need to specify the resistance of the new resistors in the “DLOAD_RESISTANCE” parameter? What is this setting for? Is it used in the calculation of the balancing mAh?
I can’t remove the modules from the board, they are soldered.
However, they can be programmed in place (I have programmed two other groups like this without problems). I disconnect the power and the RX/TX lines to the controller, and the modules apparently accept the programming without any upload errors. With the standard firmware everything works fine, but with the 9k6 firmware the modules do not sync. It is as if one or more of the modules had not written the firmware properly, or as if one of the 16 ATtiny chips had a fault that does not matter at standard speed but causes problems at 9k6. Does this sound reasonable to you: an ATtiny with limited performance that apparently accepts programming but then does not respond at the required speed, desynchronizing the chain? Example in standard mode:
In my opinion, this rules out module interconnection problems and narrows the problem down to the ATtiny. What do you think?
@stuart, genuine question:
Is there a reason to run the 9k6 baud code on an 8S LiFePO4 24V system? From what you’re saying I may be better off staying at 5k (if not stock!).
Am I right?
I have been running the test system at 9k6 for over a month now and get the occasional OOS or CRC error (and of course the occasional reboot with Influx at fast refresh rates), so I am thinking of stepping back to 5k.
Hi @stuart
I have tried it based on your old video with instructions for compiling and uploading to a D1 mini. But when I try to build/upload the project I get this message:
Executing task in folder ESPController: e:\Projects\000\DIY_BMS\diyBMSv4ESP32\ESPController\platformio.exe run --environment esp32-devkitc <
The terminal process failed to launch: Path to shell executable "e:\Projects\000\DIY_BMS\diyBMSv4ESP32\ESPController\platformio.exe" does not exist.
Terminal will be reused by tasks, press any key to close it.
I do not know what I can do about it.
Since there was a TODO comment, I have made the following update for testing:
void enqueue_task(void *param)
{
    for (;;)
    {
        // Ensure we service the cell modules every 5 or 10 seconds, depending on number of cells being serviced
        // slower stops the queues from overflowing when a lot of cells are being monitored
        // TODO: SCALE THIS BASED ON COMMS BAUD RATES
        vTaskDelay(pdMS_TO_TICKS((TotalNumberOfCells() <= maximum_cell_modules_per_packet) ? 5000 : 8000));
        if (mysettings.baudRate == 9600)
        {
            vTaskDelay(pdMS_TO_TICKS((TotalNumberOfCells() <= maximum_cell_modules_per_packet) ? 3000 : 5000));
        }
        else if (mysettings.baudRate == 5000)
        {
            vTaskDelay(pdMS_TO_TICKS((TotalNumberOfCells() <= maximum_cell_modules_per_packet) ? 4000 : 6500));
        }
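One thing worth noting about the snippet above: as posted, the first vTaskDelay() always runs, and the baud-specific vTaskDelay() then runs in addition to it, so the two pauses add up (for example 8000 + 5000 ms with more than 16 cells at 9600 baud). If the intent is a single pause per baud rate, a minimal sketch could select one value first and delay once. The millisecond values are the ones from the post; the names `kMaxCellModulesPerPacket`, `pick_ms` and `enqueue_delay_ms` are hypothetical and only stand in for the firmware's own identifiers.

```cpp
#include <cstdint>

// Stand-in for maximum_cell_modules_per_packet (banks of 16 in the thread).
constexpr int kMaxCellModulesPerPacket = 16;

// Picks the short delay for a single bank, the long one for several banks.
constexpr std::uint32_t pick_ms(int total_cells,
                                std::uint32_t single_bank_ms,
                                std::uint32_t multi_bank_ms)
{
    return total_cells <= kMaxCellModulesPerPacket ? single_bank_ms
                                                   : multi_bank_ms;
}

// Returns exactly one delay for the given baud rate and cell count,
// using the test values from the post above.
constexpr std::uint32_t enqueue_delay_ms(std::uint32_t baud_rate,
                                         int total_cells)
{
    return baud_rate == 9600 ? pick_ms(total_cells, 3000, 5000)
         : baud_rate == 5000 ? pick_ms(total_cells, 4000, 6500)
                             : pick_ms(total_cells, 5000, 8000);
}

// Compile-time spot checks of the table above.
static_assert(enqueue_delay_ms(9600, 32) == 5000, "9k6, more than 16 cells");
static_assert(enqueue_delay_ms(2400, 16) == 5000, "other baud, one bank");
```

Inside enqueue_task() this would then be a single call such as `vTaskDelay(pdMS_TO_TICKS(enqueue_delay_ms(mysettings.baudRate, TotalNumberOfCells())));`, assuming `mysettings.baudRate` holds the baud rate as a plain number as in the excerpt.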
Maybe you can send me a precompiled file and I will test it?
Did you by any chance put PlatformIO or the project folder on a network drive? Is E: a physical drive of the Windows computer or a network drive?
PlatformIO doesn’t work on network drives (99% of the time, or only with an awful lot of hassle), so copy the project over to C: and try again…
I am using it on a local drive. But I have found out that some people have had similar problems because of antivirus software. I will try to install VS Code in a virtual machine and try it there.
I will keep testing for a longer period of time and will check other values to find out with which settings it still runs stably, and let you know.
I have my first battery bank up and running but I am wanting a better way of mounting the 4.4 monitor modules to my batteries. I built them using the standard 5x20 method. I have seen folks with 3d printed mounts but I don’t have a 3d printer so I am looking for something I can do without a printer. Right now I just have them stuck on with double sided heat tape but that is only temporary. Any recommendations?
So, the experiment is finished.
More than 16 modules run stably with an enqueue_task interval of 7 seconds or less. With 8 or 9 seconds the situation was the same as before: because the pause is too long, some modules go into stand-alone mode and the communication becomes chaotic.
Since there was a TODO comment, I have updated this piece of code as follows: