Hard disk and RAID troubleshooting

Description

Checking hard disk health


Checking file system integrity with fsck (Linux only)

Running a File System Check on a file system that is mounted in a writable state can cause severe data loss or corruption.

To perform a File System Check, you have three options:

Automatic fsck at boot time

In most cases, a file system check is run while the system boots, before the file systems are fully mounted. You can reboot your server to have this done automatically. If any major problems are found, the boot process will halt and the server will wait for user input on the console.
In such a scenario, you can create a ticket for LeaseWeb Support with all information on the specific server, and what commands were run before rebooting the system.

Run a File System Check through a Live Operating System or "Rescue Mode"

This can be requested by creating a LeaseWeb Support ticket.
On request, a different operating system will be booted on the server without making any changes to the hard drives in the server. A File System Check can then be run on the local hard drives through the Live OS.

Run a File System Check manually

If you have access to a KVM, you can do this during the system's boot process. Reboot the system into single-user mode and run fsck before the file systems are mounted, or after unmounting them manually.
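
As a rough sketch of the manual check, the helper below refuses to run fsck against a device that is still mounted read-write, which is the situation the warning at the top of this section describes. The names is_mounted_rw and safe_fsck and the MOUNTS_FILE variable are made up for this example, and /dev/sda1 is only a placeholder device. It also assumes that the -n option reaches a checker that treats it as "report only, change nothing", as e2fsck does.

```shell
#!/bin/sh
# Sketch only: guard against checking a file system that is mounted read-write.
# MOUNTS_FILE is a hypothetical override; it defaults to the kernel's mount table.
MOUNTS_FILE="${MOUNTS_FILE:-/proc/mounts}"

# Succeeds (exit 0) if the device given in $1 is listed as mounted with the
# "rw" option. Fields in the mount table: device, mount point, type, options.
is_mounted_rw() {
    awk -v dev="$1" '
        $1 == dev {
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++) if (opts[i] == "rw") found = 1
        }
        END { exit(found ? 0 : 1) }
    ' "$MOUNTS_FILE"
}

# Runs a report-only check (fsck -n) unless the device is mounted read-write.
safe_fsck() {
    if is_mounted_rw "$1"; then
        echo "refusing: $1 is mounted read-write; unmount it first" >&2
        return 1
    fi
    fsck -n "$1"
}

# Example with a placeholder device, run as root from single-user mode:
#   safe_fsck /dev/sda1
```

The guard compares device names literally, so a file system mounted under a different name (for example by UUID path) would not be detected. Treat it as an extra safety net, not as a replacement for checking the mount state yourself.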

Checking the health of a hard drive with SmartCTL (Linux only)

To check the health of a disk in the Linux operating system, we use a program called SmartCTL. This uses the hard drive's SMART (Self-Monitoring, Analysis and Reporting Technology) capability to check certain health parameters of the disk. 

SmartCTL results are not always reliable for newer SSDs.


To use SmartCTL, we need to install a package on the Linux OS that contains this program. This can be done using the following commands:

CentOS / Red Hat
yum install smartmontools


Debian / Ubuntu
apt-get install smartmontools
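
If you are not sure which of the two families a server belongs to, a small helper can pick the matching command. This is only a sketch: the function name smartmontools_install_cmd is hypothetical, and it assumes that exactly the package managers shown above (yum or apt-get) are the ones to look for.

```shell
#!/bin/sh
# Sketch only: print the smartmontools install command that matches the
# package manager found on PATH. The function name is made up for this example.
smartmontools_install_cmd() {
    if command -v yum >/dev/null 2>&1; then
        echo "yum install smartmontools"
    elif command -v apt-get >/dev/null 2>&1; then
        echo "apt-get install smartmontools"
    else
        echo "no supported package manager found" >&2
        return 1
    fi
}

# Example: show the command, then run it yourself as root.
#   smartmontools_install_cmd
```

The helper only prints the command, so nothing is installed until you run the printed line yourself.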

After installing the smartmontools package, the smartctl command is available. Use it as follows:

smartctl -a /dev/sda

sda is the first hard drive on the system, sdb would be the second, sdc would be the third, and so on. The above command produces extensive output on the health of the first hard drive in the system. The most important properties are listed below:

Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0

These properties give a clear indication of when a hard drive is failing. When you create a Support ticket to have the disk in your system replaced, always try to include the SMART output. This will give the Support department a clear overview of your situation.

A Reallocated_Sector_Ct raw value greater than 0 usually indicates that the hard drive may have started to fail.
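
The check on these two attributes can be scripted. The sketch below reads smartctl attribute output on standard input and prints a warning for each watched attribute whose raw value is above zero. The function names smart_raw_value and check_smart_attributes are invented for this example, and it assumes the ten-column attribute table layout shown above, with RAW_VALUE as a plain number in the last column.

```shell
#!/bin/sh
# Sketch only: flag non-zero raw values for the two attributes discussed above.

# Prints the RAW_VALUE (10th column) of the attribute named in $1,
# reading the smartctl attribute table from standard input.
smart_raw_value() {
    awk -v name="$1" '$2 == name { print $10 }'
}

# Reads the attribute table on stdin, prints one warning line per watched
# attribute with a raw value above zero, and returns 1 if any was found.
check_smart_attributes() {
    output="$(cat)"
    status=0
    for attr in Reallocated_Sector_Ct Current_Pending_Sector; do
        raw="$(printf '%s\n' "$output" | smart_raw_value "$attr")"
        # A missing or non-numeric value is skipped rather than reported.
        if [ -n "$raw" ] && [ "$raw" -gt 0 ] 2>/dev/null; then
            echo "WARNING: $attr raw value is $raw"
            status=1
        fi
    done
    return "$status"
}

# Example (needs root and the smartmontools package):
#   smartctl -A /dev/sda | check_smart_attributes
```

A clean result from this sketch does not prove the disk is healthy. It only covers the two attributes named here, so the full smartctl output is still the better attachment for a Support ticket.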
