Hello,
I’m having issues with a SATA add-in card and would like advice on further troubleshooting steps, or on whether I should exchange the card with the merchant.
I recently bought an 8P6G-PCIE-SATA-CARD and have it installed in a PCIe 3.0 x4 slot that’s downstream of an AMD X570 chipset, with six relatively new Western Digital WD8003FFBX drives connected to the card. This is all running under Debian 12.4 Bookworm. All of the drives have healthy SMART attributes and pass self-tests. As of yesterday, I’d been running a ZFS pool composed of all six drives connected to the card for about a week and a half with no issues, and had put everything from low to punishing loads on the drives and card. The drive cages and chassis have decent airflow, provided by three 140mm intake fans and one exhaust fan.
Yesterday, two of the drives faulted out of the ZFS pool while both they and the system were under low to moderate load. Inspecting the syslog, I found ATA errors for the ports they were on (wired to port 2 on the card, cables 1 and 2 on the SFF-8087 breakout). Soft and hard resets of the SATA links then failed, and the kernel gave up and disabled the devices, which is what caused the two attached drives to fault out. The other four drives, connected through port 1 on the card, were unaffected.
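In case it helps anyone compare against their own logs, here is a minimal Python sketch of how the ATA errors could be tallied per port. It assumes the usual libata prefixes ("ata7:" or "ata7.00:") in the kernel log. The keyword list and the function name tally_ata_events are my own picks for illustration, not anything official, so adjust them to match the messages you actually see.

```python
import re
from collections import Counter

# Matches libata-style prefixes such as "ata7:" or "ata7.00:" and captures
# the port name plus the rest of the message.
ATA_PREFIX = re.compile(r"\b(ata\d+)(?:\.\d+)?:\s*(.*)")

# Assumed list of substrings that mark a line as an error/reset event.
# This is a guess at what is interesting, not an exhaustive list.
KEYWORDS = ("exception", "failed", "hard resetting link", "disabled")


def tally_ata_events(lines):
    """Count error/reset-looking kernel log lines per ATA port."""
    counts = Counter()
    for line in lines:
        match = ATA_PREFIX.search(line)
        if match and any(key in match.group(2) for key in KEYWORDS):
            counts[match.group(1)] += 1
    return counts
```

To use it, save the kernel log to a file (for example the output of journalctl -k) and pass the open file, or any list of lines, to tally_ata_events. Ports with a high count are the ones to map back to the card's connectors and breakout cables.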
SMART data from the affected drives shows no mechanical errors with the drives themselves, and no UDMA CRC errors that would indicate a bad physical connection between the drive and the controller card. Self-tests run since then have passed. I haven’t had any issues since migrating the drives to the onboard SATA ports, which makes me more confident that the problem isn’t with the drives (or with other hardware or software). Checking the cables and connections revealed no discernible issues (nothing loose or unseated).
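For reference, the CRC check can be done along these lines. This is a rough Python sketch that pulls the raw UDMA_CRC_Error_Count value out of the attribute table printed by smartctl -A. It assumes the standard ten-column table layout with the raw value in the last column, and the function name udma_crc_count is just something I made up for the example.

```python
def udma_crc_count(smartctl_output):
    """Return the raw UDMA_CRC_Error_Count from `smartctl -A` text, or None.

    Assumes the usual attribute table layout:
    ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
    """
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows have at least ten whitespace-separated columns,
        # with the attribute name in the second one.
        if len(fields) >= 10 and fields[1] == "UDMA_CRC_Error_Count":
            return int(fields[9])
    return None
```

A non-zero value that keeps climbing would point toward cabling or connectors. A value that stays at zero, as on my drives, makes a bad drive-side cable connection less likely, though it does not rule out a problem on the card itself.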
One of the intake fans blows fresh air across the PCIe slot area, so I don’t think temperature was an issue, unless these cards are known to be particularly sensitive to heat.
Any thoughts on further troubleshooting steps? I’d really like to be able to use this card for the extra ports. Should I try to return the card to the merchant for an exchange, in case this one is a dud? Or could this be a Linux or driver issue?
Thanks for your time and consideration.