RAID stands for Redundant Array of Inexpensive Disks. The main purpose of RAID is to increase the logical capacity of the storage devices used, improve read/write performance, and ensure redundancy in case of a hard disk failure. Linux software RAID devices are named with two letters ("md") and a number, e.g. md0, md1, md2.
RAID is mostly used in large file servers where high data availability is required. A RAID unit appears to the system as a single large-capacity disk drive. The remarkable benefit of a disk array is that if any single disk in the RAID fails, the system and the array continue to function without loss of data. This is possible because redundancy data is stored on separate disk drives, and the RAID can reconstruct the data that was stored on the failed disk drive.
Standard RAID levels
A number of standard schemes have evolved. These are called levels. Originally, there were five RAID levels, but many variations have evolved, notably several nested levels and many non-standard levels (mostly proprietary). RAID levels and their associated data formats are standardized by the Storage Networking Industry Association (SNIA) in the Common RAID Disk Drive Format (DDF) standard.
- RAID 0
RAID 0 comprises striping. This level provides no data redundancy or fault tolerance, but it improves performance through parallelism of read and write operations across multiple drives. RAID 0 has no error detection mechanism, so the failure of one disk causes the loss of all data on the array. RAID 0 requires a minimum of 2 disks.
- RAID 1
RAID 1 comprises mirroring. Data is written identically to two (or more) drives, thereby producing a “mirrored set”. A read request can be serviced by any of the drives containing the requested data, which can improve performance if data is read from the disk with the least seek and rotational latency. Conversely, write performance can be degraded because all drives must be updated; thus the write performance is determined by the slowest drive. The array continues to operate as long as at least one drive is functioning. RAID 1 requires a minimum of 2 disks.
- RAID 2
RAID 2 comprises bit-level striping with dedicated Hamming-code parity. All disk spindle rotation is synchronized and data is striped such that each sequential bit is on a different drive. Hamming-code parity is calculated across corresponding bits and stored on at least one parity drive. RAID 2 requires a minimum of 4 disks.
- RAID 3
RAID 3 comprises byte-level striping with dedicated parity. All disk spindle rotation is synchronized and data is striped such that each sequential byte is on a different drive. Parity is calculated across corresponding bytes and stored on a dedicated parity drive. RAID 3 requires a minimum of 4 disks.
- RAID 4
RAID 4 comprises block-level striping with dedicated parity. This level was previously used by NetApp, but has now been largely replaced by a proprietary implementation of RAID 4 with two parity disks, called RAID-DP. RAID 4 requires a minimum of 3 disks.
- RAID 5
RAID 5 comprises block-level striping with distributed parity. Unlike in RAID 4, parity information is distributed among the drives. It requires that all drives but one be present to operate. Upon failure of a single drive, subsequent reads can be calculated from the distributed parity so that no data is lost. RAID 5 is seriously affected by the general trend toward longer array rebuild times and the increased chance of a second failure during rebuild as drive capacities grow. RAID 5 requires a minimum of 3 disks.
- RAID 6
RAID 6 comprises block-level striping with double distributed parity. Double parity provides fault tolerance for up to two failed drives. This makes larger RAID groups more practical, especially for high-availability systems, as large-capacity drives take longer to restore. As with RAID 5, a single drive failure results in reduced performance of the entire array until the failed drive has been replaced. With a RAID 6 array, using drives from multiple sources and manufacturers, it is possible to mitigate most of the problems associated with RAID 5. The larger the drive capacities and the larger the array size, the more important it becomes to choose RAID 6 instead of RAID 5. RAID 6 requires a minimum of 4 disks.
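To make the capacity trade-offs of these levels concrete, here is a minimal shell sketch; the drive count and size (four 2 TB disks) are only assumptions for the example:
echo "RAID 0 : $(( 4 * 2 )) TB usable"       # striping only, no redundancy
echo "RAID 1 : 2 TB usable"                  # mirrored, capacity of a single disk
echo "RAID 5 : $(( (4 - 1) * 2 )) TB usable" # one disk's worth of parity
echo "RAID 6 : $(( (4 - 2) * 2 )) TB usable" # two disks' worth of parity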
/proc/mdstat :
/proc is a pseudo-filesystem on modern Linux operating systems. We can check the status of RAID devices by printing the contents of the mdstat file under /proc. For example, consider the case below:
#cat /proc/mdstat
Personalities : [linear] [raid0] [raid1]
md0 : active raid1 sdb4[1] sda4[0]
482359552 blocks [2/2] [UU]
unused devices: <none>
> Personalities : [linear] [raid0] [raid1]
This line tells us which RAID personalities (array types) the kernel on this system currently supports. In this case, linear, RAID 0 and RAID 1 are available.
> md0 : active raid1 sdb4[1] sda4[0]
Here md0 is a RAID 1 device spanning the sdb4 and sda4 partitions.
> 482359552 blocks [2/2] [UU]
This line gives us information about the device. [2/2] means two of the two member devices are active, and [UU] shows that both are up; a failed member would appear as an underscore, for example [2/1] [U_].
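For comparison, here is roughly what the same array would look like in a degraded state after sdb4 has failed (the output below is illustrative):
# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1]
md0 : active raid1 sdb4[1](F) sda4[0]
482359552 blocks [2/1] [U_]
unused devices: <none>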
Implementations
The distribution of data across multiple drives can be managed either by dedicated computer hardware or by software. A software solution may be part of the operating system, or it may be part of the firmware and drivers supplied with a hardware RAID controller.
Software-based
Software RAID implementations are now provided by many operating systems. Software RAID can be implemented as:
- A layer that abstracts multiple devices, thereby providing a single virtual device
- A more generic logical volume manager (provided with most server-class operating systems)
- A component of the file system.
- A layer that sits above any file system and provides parity protection to user data
- Some advanced file systems are designed to organize data across multiple storage devices directly.
- ZFS supports equivalents of RAID 0, RAID 1, RAID 5 (RAID-Z), RAID 6 (RAID-Z2) and a triple-parity version RAID-Z3. As it always stripes over top-level vdevs, it supports equivalents of the 1+0, 5+0, and 6+0 nested RAID levels (as well as striped triple-parity sets) but not other nested combinations.
- Btrfs supports RAID 0, RAID 1 and RAID 10 (example creation commands for ZFS and Btrfs are shown after this list)
- Many operating systems provide basic RAID functionality independently of volume management
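As a rough illustration of the file-system-level approach referenced above, a single-parity ZFS pool and a mirrored Btrfs file system can be created as follows; the device names and the pool name "tank" are placeholders:
# zpool create tank raidz /dev/sdb /dev/sdc /dev/sdd :: ZFS RAID-Z (single parity) pool
# mkfs.btrfs -d raid1 -m raid1 /dev/sde /dev/sdf :: Btrfs with mirrored data and metadata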
If a boot drive fails, the system has to be sophisticated enough to be able to boot off the remaining drive or drives.
Commands and configuration file for Software RAID
mdadm is the modern tool most Linux distributions use to manage software RAID arrays.
#/etc/mdadm.conf is the main configuration file for mdadm
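A common way to populate this file is to append the output of mdadm --detail --scan; the ARRAY line below is only a sketch of the format, with placeholder name and UUID values:
# mdadm --detail --scan >> /etc/mdadm.conf
# cat /etc/mdadm.conf
ARRAY /dev/md0 metadata=1.2 name=myhost:0 UUID=xxxxxxxx:xxxxxxxx:xxxxxxxx:xxxxxxxx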
# cat /proc/mdstat and mdadm --detail : used to view the RAID status.
Example : # cat /proc/mdstat :: displays the contents of /proc/mdstat
Personalities : [raid1] [raid10]
md2 : active raid10 sda3[0] sdd3[3] sdc3[2] sdb3[1]
959194880 blocks 64K chunks 2 near-copies [4/4] [UUUU]
md1 : active raid10 sda2[0] sdd2[3] sdc2[2] sdb2[1]
17385216 blocks 64K chunks 2 near-copies [4/4] [UUUU]
md0 : active raid1 sda1[0] sdb1[3] sdd1[2] sdc1[1]
96256 blocks [4/4] [UUUU]
unused devices: <none>
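The same md0 array can be inspected with mdadm --detail; the output below is abridged and its values are illustrative:
# mdadm --detail /dev/md0
/dev/md0:
        Raid Level : raid1
        Array Size : 96256
      Raid Devices : 4
     Total Devices : 4
             State : clean
    Active Devices : 4
    Failed Devices : 0

    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       1       8       33        1      active sync   /dev/sdc1
       2       8       49        2      active sync   /dev/sdd1
       3       8       17        3      active sync   /dev/sdb1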
# mdadm --create : used to create a new array
Example : # mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb2 :: creates a RAID device named /dev/md0 in a RAID 1 configuration using the partitions /dev/sda1 and /dev/sdb2
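Once created, the new array behaves like any other block device; a typical follow-up (the ext4 file system and the /mnt mount point are just examples) is:
# mkfs.ext4 /dev/md0 :: create a file system on the new array
# mount /dev/md0 /mnt :: mount it like any other block device
# cat /proc/mdstat :: watch the initial resync progress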
# mdadm --stop : used to stop a RAID array.
Example : # mdadm --stop /dev/md0 // the device is now stopped
# mdadm --remove /dev/md0 // the RAID device is removed
# mdadm --fail and mdadm --remove : used to mark a member disk as failed and to remove it from the array, respectively.
Note : a disk that is part of a running array can only be removed after it has been marked as failed.
Example : # mdadm --fail /dev/md0 /dev/sda1 // the disk is marked as failed so that it can be removed
# mdadm --remove /dev/md0 /dev/sda1 // the disk is removed from the array
# mdadm --add : used to add a disk to an existing array.
Example : # mdadm --add /dev/md0 /dev/sdb1 // a new disk is added to the RAID array
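Putting these commands together, a typical sequence for replacing a failed member of /dev/md0 might look like this (device names are illustrative, and the physical drive is assumed to be swapped before the --add step):
# mdadm --fail /dev/md0 /dev/sda1 :: mark the faulty disk as failed
# mdadm --remove /dev/md0 /dev/sda1 :: remove it from the array
# mdadm --add /dev/md0 /dev/sda1 :: add the replacement disk back into the array
# cat /proc/mdstat :: the rebuild appears as a recovery progress line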
Hardware/driver-based
Software-implemented RAID is not always compatible with the system’s boot process, and it is generally impractical on some operating systems. A RAID is considered hardware-based when it is implemented in hardware, either on the motherboard directly or on a separate RAID card. However, hardware RAID controllers are expensive and proprietary. So, cheap “RAID controllers” were introduced that do not contain a dedicated RAID controller chip, but simply a standard drive controller chip with proprietary firmware and drivers; during early-stage bootup, the RAID is implemented by the firmware, and once the operating system has been more completely loaded, the drivers take over control. Consequently, such controllers may not work when driver support is not available for the host operating system. As there is some minimal hardware support involved, this implementation approach is also called “hardware-assisted software RAID”, “hybrid model” RAID, or even “fake RAID”.
Commands for Hardware RAID
Commands for managing hardware RAID depend upon the hardware vendor. Various vendors exist, such as 3ware, Adaptec, HP and LSI.
# lspci | grep -i raid : used to identify the hardware vendor
- For 3ware, RAID status can be checked with the command tw_cli /c0 show
- For LSI, RAID status can be checked with the command megacli -PDList -Aall
- For Adaptec, RAID status can be checked with the command /usr/StorMan/arcconf getconfig 1
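These checks can be combined into a small dispatch script; this is only a sketch that reuses the commands above, and matching the vendor names against the lspci output is an assumption that may need adjusting for a particular controller:
#!/bin/sh
# Pick a hardware RAID status command based on the controller reported by lspci (sketch only)
ctrl=$(lspci | grep -i raid)
case "$ctrl" in
  *3ware*)   tw_cli /c0 show ;;
  *LSI*)     megacli -PDList -Aall ;;
  *Adaptec*) /usr/StorMan/arcconf getconfig 1 ;;
  *)         echo "No recognised hardware RAID controller found" ;;
esac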
Advantages of RAID
- Increases the performance and reliability of data storage.
- Ability to maintain redundant data which can be used to restore data in the event of a disk failure.
- If one of the drives fails, it can be swapped out for a new drive without turning the system off (known as hot swapping).
- Parity checks can be run regularly to detect errors before they lead to data loss.
- It offers fault tolerance and higher throughput levels than a single hard drive or group of independent hard drives.
- Reading and writing of data can be done simultaneously.
- Disk striping combines multiple smaller hard disks into a single large volume.
Disadvantages
- RAID cannot completely protect your data.
- RAID doesn’t always result in improved system performance.
- It is expensive: RAID controllers and dedicated hard drives must be purchased and maintained.
- The RAID controller’s firmware needs to be updated regularly.
- If the RAID controller fails, none of the disks behind it may be accessible, and the server may go down as a result.
RAID techniques
RAID uses different techniques for writing data to disks. These techniques enable RAID to provide data redundancy or better performance. These techniques include:
- Mirroring: Copying data from one physical disk to another physical disk. Mirroring provides data redundancy by maintaining two copies of the same data on different physical disks. If one of the disks in the mirror fails, the system can continue to operate using the unaffected (working) disk. Both sides of the mirror contain the same data at all times. Either side of the mirror can act as the operational side.
- Striping: Disk striping writes data across all physical disks in a virtual disk. Each stripe consists of consecutive virtual disk data addresses that are mapped in fixed-size units to each physical disk in the virtual disk using a sequential pattern. For example, if the virtual disk includes five physical disks, the stripe writes data to physical disks one through five without repeating any of the physical disks. The amount of space consumed by a stripe is the same on each physical disk. The portion of a stripe that resides on a physical disk is a stripe element. Striping by itself does not provide data redundancy. Striping in combination with parity does provide data redundancy.
- Parity: Parity refers to redundant data that is maintained using an algorithm in combination with striping. When one of the striped disks fails, the data can be reconstructed from the parity information using the algorithm (a minimal worked example follows this list).
- Span: A span is a RAID technique used to combine storage space from groups of physical disks into a RAID 10 or 50 virtual disk.
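To see how parity makes reconstruction possible, here is a minimal sketch using shell arithmetic; the byte values are arbitrary examples, and the point is simply that parity is the bitwise XOR of the data blocks in a stripe:
printf 'parity  = 0x%02X\n' $(( 0xA5 ^ 0x3C )) # parity of data bytes 0xA5 and 0x3C is 0x99
printf 'rebuilt = 0x%02X\n' $(( 0x99 ^ 0x3C )) # XOR of the surviving byte with the parity recovers 0xA5
This is the same stripe-by-stripe reconstruction a parity-based array performs when a member disk fails.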
If you require help, contact SupportPRO Server Admin