FIX-IT! – Bitmain Antminer S9


The following quick-fix guide and downloads may help you diagnose and repair miner problems.

NO HASHING AT ALL – MY MINER BOOTS UP AND STARTS TO HASH THEN GOES TO ZERO

I’m not hashing at all?

My miner boots up and starts to hash then goes to zero?

Check your pools.

Ensure your pools are alive and connected, as the unit will stop hashing if no pool is detected.

Is there a blinking red light on front? Fan errors.

Fans are a common source of errors. Check your fan speeds to ensure both are at least 3000 RPM. You can see this on the Miner Status page, or go into the kernel log under the System menu, scroll down, and look for fan0 or fan1 failures (errors). If a fan fails, the unit will stop hashing as a preventative measure to protect itself.
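
As a rough illustration of the fan speed check, the sketch below flags any fan reading under the 3000 RPM floor mentioned above. The function name `find_slow_fans` and the dictionary of readings are hypothetical; the values are assumed to be typed in by hand from the Miner Status page.

```python
MIN_FAN_RPM = 3000  # minimum healthy speed suggested in this guide


def find_slow_fans(fan_rpm, min_rpm=MIN_FAN_RPM):
    """Return the names of fans spinning below min_rpm, sorted by name.

    fan_rpm is a hypothetical mapping such as {"fan0": 4320, "fan1": 0},
    copied by hand from the Miner Status page.
    """
    return sorted(name for name, rpm in fan_rpm.items() if rpm < min_rpm)
```

Calling `find_slow_fans({"fan0": 4320, "fan1": 1200})` would report only `fan1`, which is the fan to inspect or swap first.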

Check the fan cables where they attach to the control board, as they wiggle loose at times. Also try swapping the cables to see if the fan error moves from fan0 to fan1.

Is there a blinking red light on front? Check your connection to the Internet.

If your unit can’t reach the Internet it will not receive work to process and therefore won’t hash. You can verify the connection to the Internet by going to Network -> Diagnostics and running a ping or traceroute.

Also ensure you don’t have an IP address conflict. It’s best to set your unit to DHCP, if it isn’t already, and reboot.
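
To complement the ping or traceroute check, the sketch below tests whether a TCP connection to a pool can be opened from a computer on the same network as the miner. The helper name `can_reach` is hypothetical, and the host and port in the comment are placeholders, not a real pool; use the values from your own pool settings.

```python
import socket


def can_reach(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS lookup failures.
        return False


# Example call (placeholder address): can_reach("pool.example.com", 3333)
```

If this succeeds from your computer but the miner still shows no pool, the problem is more likely on the miner side (IP conflict, cabling, or pool settings) than with the Internet connection itself.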

Is the I/O cable damaged or has it wiggled loose?

Sometimes I/O cables go bad. Try swapping cables with another chain to see if the problem moves with the cable.

Reload the firmware.

Bitmain suggests reloading the firmware as a solution. The firmware can become corrupted in some cases, so a fresh reload may solve the problem.

HASH BOARD MISSING OR FEWER THAN 63 CHIPS FOUND

Why is my hash board missing or showing fewer than 63 chips?

When I boot my miner up, a hash board is missing or it doesn’t show all 63 chips on the Miner Status page.
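
One way to confirm how many chips the status page actually shows is to count the marks in the chain's status column. The sketch below assumes the column is copied as a run of `o` (good) and `x` (failed) characters, possibly separated by spaces; the function name and the returned keys are hypothetical.

```python
EXPECTED_CHIPS = 63  # chips per S9 hash board, as stated in this guide


def summarize_chain(status):
    """Count good ('o') and failed ('x') chips in a pasted status string."""
    # Keep only the chip marks; spaces and other separators are ignored.
    marks = [c for c in status.lower() if c in ("o", "x")]
    return {
        "good": marks.count("o"),
        # 0-based positions along the chain, in the order shown on the page
        "bad_positions": [i for i, c in enumerate(marks) if c == "x"],
        "missing": max(EXPECTED_CHIPS - len(marks), 0),
    }
```

A non-zero `missing` value means the board reported fewer than 63 chips, while entries in `bad_positions` point at chips that were detected but marked as failed.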

Check your temps.

Ensure you are reading proper temperatures, as a bad reading could indicate a TMP451 temperature sensor or LDO failure.

Reflow those solder joints!

An intermittent connection can change with environmental conditions. Heat and cold can flex cold solder joints and ultimately lead to failures. I’ve found that reheating and reflowing the joints on the temp sensor, buck converter, and PIC has resolved problems I’ve had in the past with missing components.

Reflowing solves the problem (sometimes).

Check voltages at the 10V buck converter and 14.2V boost circuit.

More often than it should, the boost circuit on the hash board fails, and ASICs subsequently show as missing or ‘xxxxxx’. Also ensure the output of the 10V buck converter matches the voltage you’ve set in the firmware for that chain (e.g. common voltages like 9.5V or 9.8V, or the stock voltage of 10.11V).
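
The two measurements can be compared against their expected values with a small helper like the one below. This is only a sketch: the function name is hypothetical, and the 0.3V tolerance is an assumed allowance for meter and regulator error, not a Bitmain specification.

```python
BOOST_NOMINAL_V = 14.2  # boost rail voltage named in this guide


def check_rails(buck_measured, buck_configured, boost_measured, tolerance=0.3):
    """Return a list of problems found on the buck and boost rails.

    All values are in volts. tolerance is an assumption, not a Bitmain spec.
    """
    problems = []
    if abs(buck_measured - buck_configured) > tolerance:
        problems.append("buck output does not match the configured chain voltage")
    if abs(boost_measured - BOOST_NOMINAL_V) > tolerance:
        problems.append("boost output is not close to 14.2V")
    return problems
```

An empty list means both readings look plausible; a boost reading near zero with a healthy buck output points toward the boost circuit failure described above.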

Check voltages at your LDOs.

The S9 has 21 voltage domains. Each domain powers three BM1387s and is controlled by a single voltage regulator (LDO). Most recently Bitmain used an LN1134, but it has used an SPX5205 and SGM2202 in the past. These domains have failed when the 14.2V circuit failed, so it’s a good idea to check the voltage at each domain. To do this, measure from pin 2 (the middle pin on the LDO) to pin 1 (input, ~2.4V) and from pin 2 to pin 5 (output, ~1.8V). The first 16 domains are powered by 12V and the last 5 by the 14.2V boost circuit.
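
The domain arithmetic (21 domains of 3 chips each, 63 chips in total) can be sketched as a lookup from chip position to domain and LDO supply. The numbering here is an assumption: chips are taken to be counted consecutively along the chain, three per domain, with domain 1 first. Check this against your own board before probing, since the guide does not say which end of the chain is "first".

```python
CHIPS_PER_DOMAIN = 3
DOMAIN_COUNT = 21
TOTAL_CHIPS = CHIPS_PER_DOMAIN * DOMAIN_COUNT  # 63 chips per hash board
DOMAINS_ON_12V = 16  # the remaining 5 domains are fed by the boost circuit


def locate_chip(chip_index):
    """Map a 0-based chip index (0..62) to (domain number, LDO supply rail).

    Assumes consecutive numbering along the chain, which is not confirmed
    by the guide.
    """
    if not 0 <= chip_index < TOTAL_CHIPS:
        raise ValueError("chip_index must be between 0 and 62")
    domain = chip_index // CHIPS_PER_DOMAIN + 1
    rail = "12V" if domain <= DOMAINS_ON_12V else "boost"
    return domain, rail
```

Under that numbering, chips 0 to 47 fall in domains 1 to 16 and chips 48 to 62 fall in the five boost-fed domains, so a block of failures confined to the last 15 chips is a hint to check the boost circuit first.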

Reload the firmware.

Sometimes reloading the firmware, especially with one that allows autotune, can help isolate or even fix the problem if it’s with the PIC.

Cold restart the unit.

From time to time, intermittent problems like this can be solved by shutting the unit down for 30 seconds and then rebooting. This isn’t a long-term fix, but it may get your unit back up and running for the time being.

Lower the frequency and increase the voltage.

Go into your advanced settings for the problem hash board and try lowering the frequency and upping the voltage (if your firmware allows). Tuning a hash board to run at minimal speed and power can leave it operating at the edge of its ability to function. Resetting the PIC to a more normal operating condition may solve your problem. Likewise, operating at too high a frequency and power can shorten the life of components or leave the board on the edge of functionality.

HASH BOARD SHOWING “XXXXXX” ACROSS CHAINS

Why is my hash board showing “xxxxxx” across chains?

My miner shows several “x” on a hash board in the miner status page.

Check your HW error rate.

If you have an abnormally high HW error rate (more than about 1% of nonces), you most likely have ASICs that are going bad. A reboot can generally reset the state of the ASICs; however, the problem will most likely come back.

If using firmware such as HiveOS, go into Miner Configuration -> Manual Chips Freq and see if specific ASICs on the chain have an abnormally high HW error rate. If so, lower the frequency of those ASICs. If it’s several ASICs, consider lowering the frequency of the entire chain and increasing the voltage.
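
For a rough sense of where the 1% line sits, the sketch below computes hardware errors as a share of all nonces returned. The definition used here (errors divided by errors plus valid nonces) is an assumption; your firmware may count or display this differently, so treat the result as a guide only.

```python
def hw_error_percent(hw_errors, valid_nonces):
    """Hardware errors as a percentage of all nonces returned by the ASICs.

    Assumed definition: hw_errors / (hw_errors + valid_nonces) * 100.
    """
    total = hw_errors + valid_nonces
    if total == 0:
        return 0.0  # nothing returned yet, so no rate to report
    return 100.0 * hw_errors / total
```

With 300 hardware errors against 9,700 valid nonces the rate works out to 3%, well above the threshold mentioned above.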

Is the I/O cable damaged or has it wiggled loose?

Sometimes I/O cables go bad. Try swapping cables with another chain to see if the problem moves with the cable.

Check voltages at the 10V buck converter and 14.2V boost circuit.

More often than it should, the boost circuit on the hash board fails, and ASICs subsequently show as missing or ‘xxxxxx’. Also ensure the output of the 10V buck converter matches the voltage you’ve set in the firmware for that chain (e.g. common voltages like 9.5V or 9.8V, or the stock voltage of 10.11V).

Reload the firmware.

Sometimes reloading the firmware, especially with one that allows autotune, can help isolate or even fix the problem if it’s with the PIC. This is Bitmain’s go-to solution.

Cold restart the unit.

From time to time, intermittent problems like this can be solved by shutting the unit down for 30 seconds and then rebooting. This isn’t a long-term fix, but it may get your unit back up and running for the time being.

Lower the frequency and increase the voltage.

Go into your advanced settings for the problem hash board and try lowering the frequency and upping the voltage to at least 9.8V. Tuning a hash board to run at minimal speed and power can leave it operating at the edge of its ability to function. Resetting the PIC to a more normal operating condition may solve your problem. Likewise, operating at too high a frequency and power can shorten the life of components or leave the board on the edge of functionality.

Are your heatsinks attached?

If the ‘x’ appears after running for a few minutes, check that the heat sink is securely attached. Sometimes these come loose, and poor thermal transfer will lead to ASIC failure.

Check your temps.

If your hash board temperatures (PCB and chip) are abnormally high (around 80°C or above), you may have failures related to heat. Things to consider are:

Clean your hash board: blow compressed air through and under the heat sinks to clear out dust and improve air flow and cooling.

Lower your chain frequency.

Is the ambient temperature too high?

Are your fans clean and operating at auto RPM (i.e. you haven’t set a static fan speed too low in the firmware)?
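
The temperature check can be sketched as a simple filter over the readings from the status page. The chain labels and the mapping format are hypothetical, and the 80°C limit is the rough figure given above rather than a hard specification.

```python
HIGH_TEMP_C = 80  # approximate "abnormally high" threshold from this guide


def hot_chains(temps_c, limit_c=HIGH_TEMP_C):
    """Return chains whose PCB or chip temperature is at or above limit_c.

    temps_c maps a chain label to a (pcb_temp, chip_temp) pair in Celsius,
    typed in by hand from the status page.
    """
    return sorted(
        chain
        for chain, (pcb, chip) in temps_c.items()
        if max(pcb, chip) >= limit_c
    )
```

If only one chain shows up as hot, look at that board's heat sinks and dust first; if all of them do, ambient temperature or fan settings are the more likely cause.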

Check your power cables and power supply.

In some cases this can be due to insufficient power or the lack of a clean ground. Check the ATX cables going into your hash board for damage and corrosion, and make sure they are securely installed. Also check the power connectors on the hash board for any damage or discoloration, as this may be a sign of a power or connection issue.

In some cases your power supply may be going bad or not able to handle the full load. Unplug another hash board and reboot to see if that’s the issue. Also, if the power supply seems dirty, blow compressed air through the vents to clean it out.

MY MINER STARTS UP THEN KEEPS RESTARTING

Why does my miner reset/restart after a few minutes?

I’m not getting any failures before it resets.

Check your power supply.

If your power supply is going bad it may present symptoms like this. An overheating power supply will go into thermal shutdown. Try blowing any dust out of the supply and ensure the fans are operating properly.

Worst case, get yourself a new supply. The APW7 is an upgraded version of the APW3+ and supports 120V and 240V.

Check if you have a hash rate watchdog enabled.

If your hash rate dips below a specified value, the unit will reset. Turn the watchdog off so you can fully diagnose the problem, as it may be masking another issue by resetting the unit before you can troubleshoot it.

Check your fans.

Failing fans can cause the hash rate to drop to zero (a built-in protection), and the unit may restart. Furthermore, if the fans peg at 100% it may also cause a fan error. This may then lead to the hash rate watchdog rebooting the system.

Check your network connection.

If you have lost your connection to the Internet, this will lead to no hashing and may then lead to your hash rate watchdog rebooting the system.

Check your hash boards.

If you have a bad hash board this will drop your overall hash rate and may then lead to your hash rate watchdog rebooting the system.

Update your firmware.

Older Bitmain firmware has a tendency to randomly reboot.