Has anyone else ever been frustrated with an error that simply says, “bitmain_scanreg,crc5 error, should be 00, but check as 01…..”? I have, and without source code available to trace it down, I can only guess at the culprit. The fact that it seems to be looking for a blank value (00) makes me think it’s an ASIC check, or possibly something with the PIC.
Disclaimer – This may be hard to follow. It’s sometimes hard to write your thoughts down as you work through a problem, but it’s a great reference for me in the future, and hopefully for y’all as well.
Problem – I have a board that doesn’t find any ASICs and obviously won’t hash. The only real clue I had was a long series of:
bitmain_scanreg,crc5 error,should be 00, but check as 01…..
Looking in all the usual places (Reddit, Discord, and Google searches) only pointed me to people saying the obvious: your board’s bad, send it to Bitmain or a repair shop. Regardless, I’m determined to figure this out for all of us who want DIY fixes.
I’ll start by saying the test fixtures you can buy on eBay and other sites are basically garbage, at least if you spend $200 on one. I did a quick article on turning your control board into a DIY test fixture, and that’s all I ever use anymore. I always test with the data cable and power cable hooked up. Once I press the test button (IP_sig on the control board itself), the PIC initializes the board and I should get ~10V out of the buck converter, 14.2V from the boost, and 1.8V from the LDOs. A quick check showed all of these reading perfect. Knowing the proper voltages were present, I moved on to checking each LDO individually.
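If you like to log your readings as you go, here is a minimal Python sketch of that first rail check. The nominal values are the ones quoted above; the 5% tolerance and the names (`EXPECTED_RAILS`, `check_rails`) are my own assumptions for illustration, not anything from Bitmain.

```python
# Nominal rail voltages quoted in this post (volts).
EXPECTED_RAILS = {"buck": 10.0, "boost": 14.2, "ldo_out": 1.8}


def check_rails(measured, tolerance=0.05):
    """Return {rail: (measured, expected)} for every rail that is missing
    or more than `tolerance` (a fraction, assumed 5%) away from nominal."""
    bad = {}
    for name, expected in EXPECTED_RAILS.items():
        value = measured.get(name)
        if value is None or abs(value - expected) > tolerance * expected:
            bad[name] = (value, expected)
    return bad


# Multimeter readings typed in by hand after pressing the test button.
readings = {"buck": 10.1, "boost": 14.2, "ldo_out": 1.8}
print(check_rails(readings))  # prints {} when every rail is in range
```

An empty result just means the board-level rails look sane, which is where I was at this point: no easy answer yet.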
All LDOs tested out perfect: ~2.5V in and 1.8V out from each. So much for an easy fix. Now it’s time to get down and dirty: 72 individual checks of the basics, RI and CLK (all the ASICs have test points available on the bottom of the board). As I moved through each ASIC, starting at 72, I verified proper voltages at RI (1.8V) and CLK (0.8V) on every ASIC until I got to ASIC 28, which has CLK at 0.95V but 0V at RI. ASICs 1-27 also have 0V at RI. Aha! I’m thinking short at this point, so where is the short? I pulled the heatsinks off that entire row, inspected, and reflowed the joints, but I still had the “found 0 asics” error in the kernel log.
We now go back to the schematics to see how the chains are organized and whether there’s any reason this particular ASIC could take out the chain before it, or whether another one in the chain is doing the damage.
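The ASIC-by-ASIC walk is easy to lose track of across 72 chips, so here is a rough Python sketch of the same search: start at ASIC 72, work down, and stop at the first reading that is off. The nominal values are the ones I measured on healthy ASICs; the 0.1V margin and the function name `first_anomaly` are assumptions of mine, not a Bitmain spec.

```python
NOMINAL = {"RI": 1.8, "CLK": 0.8}  # volts, as measured on healthy ASICs in this post
TOLERANCE = 0.1                    # volts; an assumed margin, not a Bitmain spec


def first_anomaly(readings, start=72):
    """Walk from ASIC `start` down to ASIC 1 and return (asic, signal, volts)
    for the first reading outside NOMINAL +/- TOLERANCE, or None if all pass."""
    for asic in range(start, 0, -1):
        for signal, nominal in NOMINAL.items():
            volts = readings[asic][signal]
            if abs(volts - nominal) > TOLERANCE:
                return asic, signal, volts
    return None


# Readings shaped like the ones on this board: 72..29 healthy,
# ASIC 28 with CLK 0.95V and RI 0V, ASICs 1..27 with RI 0V.
# CLK for 1..27 is a placeholder; the post does not report those values.
readings = {n: {"RI": 1.8, "CLK": 0.8} for n in range(29, 73)}
readings[28] = {"RI": 0.0, "CLK": 0.95}
readings.update({n: {"RI": 0.0, "CLK": 0.8} for n in range(1, 28)})

print(first_anomaly(readings))  # (28, 'RI', 0.0)
```

The first bad reading only tells you where the chain breaks. It doesn't tell you whether that ASIC or its neighbor is the one at fault.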
I’ve also verified that CO (pin 4) and BO (pin 2) read 1.8V on ASIC 28, so if there’s a short to ground causing the 0V at RI (pin 3), it’s not on those pins. If we go back one ASIC in the chain, to ASIC 27, we also measure RI as 0V. ASIC 27 outputs RO (pin 26) into RI (pin 3) of ASIC 28. Of note, the two pins on either side of RO are CI (pin 25) and BI (pin 27). When we go to the schematics we see that CI (pin 25) is the transmit signal that goes to CO (pin 4) of ASIC 28. We also see that BI (pin 27) is shown as tied to ground and is tied to BO (pin 2) of ASIC 28.
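To keep the pin pairs straight while probing, the connections just described can be written down as a small lookup table. This is only a sketch in Python of what I read off the schematic for ASIC 27 and ASIC 28; I’m assuming the same pairing repeats between every pair of neighbors, and the names `LINKS` and `probe_pairs` are made up for illustration.

```python
# Pin pairs between ASIC n and ASIC n+1, as read off the schematic in this post.
# (signal, pin) on ASIC n  ->  (signal, pin) on ASIC n+1
LINKS = {
    ("RO", 26): ("RI", 3),
    ("CI", 25): ("CO", 4),
    ("BI", 27): ("BO", 2),
}


def probe_pairs(asic):
    """Return (asic, signal, pin, next_asic, signal, pin) tuples to check
    for continuity between `asic` and the next ASIC in the chain."""
    return [(asic, a, pa, asic + 1, b, pb) for (a, pa), (b, pb) in LINKS.items()]


for row in probe_pairs(27):
    print("ASIC %d %s (pin %d) <-> ASIC %d %s (pin %d)" % row)
```

Having the three pairs in one place makes it easier to spot that RO sits right between CI and BI on the package.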
OK, that’s a lot of data, but one thing stands out: if BI (pin 27) on ASIC 27 is tied to ground, could there be a short from BI (pin 27) to RO (pin 26), and would that cause us to see 0V on RI at ASIC 28?
I measured resistance between each of the pins on each ASIC, and verified that each ASIC had continuity from input to output of the next. No issues. Now I’m getting frustrated. I think it’s something with ASIC 28, but that’s a PITA to remove. I’ve tried reflowing to no avail, so I guess my next step is to attempt to remove it with a heat gun and then replace it. This could get interesting.
One hot air reflow knife, one pair of tweezers, a lot of flux, and even more patience paid off. It took me a good 2-3 minutes before I was able to pop ASIC 28 off, but I ultimately won. I’ve heard folks say it takes 30 seconds. Well, they must have better gear than me. Anyway, after popping ASIC 28 off, cleaning the pads, and cleaning the board, I was ready to throw it back into a system and see what we had.
Chain 0, 27 asics found
This was music to my ears. I flipped the board over and measured RI at ASIC 27 and several between that and ASIC 1: 1.8V! As my son would say, “We did it!!!” We found the issue. I looked at the kernel log and all CRC5 errors were gone, and on the miner status page we could clearly see 27 asics found. I felt the biggest sense of relief and victory.
But now the hard part: I’ve never soldered a new ASIC onto one of these. That’s a different story for a different time, but for now we’ve slain the dragon (the dragon being CRC5 errors).
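If you save the kernel log to a text file, a few lines of Python can do the before-and-after comparison for you. This is a minimal sketch that assumes the log lines look like the two quoted in this post; I don’t have the firmware source, so the exact wording of other lines is unknown, and `summarize_log` is a name I made up.

```python
import re

# Patterns based only on the two log lines quoted in this post.
CRC5_RE = re.compile(r"crc5 error")
FOUND_RE = re.compile(r"Chain\s*(\d+),\s*(\d+)\s+asics found", re.IGNORECASE)


def summarize_log(text):
    """Return (number of crc5 error lines, {chain: asics found})."""
    crc5_errors = 0
    asics_found = {}
    for line in text.splitlines():
        if CRC5_RE.search(line):
            crc5_errors += 1
        match = FOUND_RE.search(line)
        if match:
            asics_found[int(match.group(1))] = int(match.group(2))
    return crc5_errors, asics_found


sample = """bitmain_scanreg,crc5 error,should be 00, but check as 01
Chain 0, 27 asics found"""
print(summarize_log(sample))  # (1, {0: 27})
```

On a healthy log the first number should be 0 and each chain should report its full ASIC count.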
CRC5 errors seem to point to a bad ASIC. If you’re receiving these, I would move immediately to testing the voltages (RST, BO, RI, CO, CLK) with power and a data cable installed. Keep in mind that you may need to press the test (or IP_Sig) button to start the buck controller so the LDOs are operating. The board won’t output ~10V from the buck controller without this, and you’ll need to do it every minute or two while the board is in test mode.
If you get an anomaly on one of these voltages, it could be that particular ASIC or the ASIC up the chain. It’s shooter’s choice as to which it is, but you are much closer to solving this problem.