
How to recover from a drive failure in a RAID 5 configuration?

Solutions:


The system is running very slowly because it has to reconstruct the missing data from parity, which involves additional CPU and I/O.
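To make the reconstruction concrete, here is a minimal Python sketch. It assumes XOR parity (the scheme RAID 5 uses) and models each disk's share of one stripe as a small `bytes` block. The function names are hypothetical and chosen for illustration only. Recovering the dead disk's block means reading every surviving block in the stripe and XORing them together, which is where the extra I/O and CPU go.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def make_parity(data_blocks: list[bytes]) -> bytes:
    # The parity block is the XOR of all data blocks in the stripe.
    return xor_blocks(data_blocks)


def reconstruct_missing(surviving_blocks: list[bytes]) -> bytes:
    # XOR is its own inverse: P = D0 ^ D1 ^ D2 implies D1 = D0 ^ D2 ^ P,
    # so XORing all surviving blocks (data and parity) yields the missing one.
    return xor_blocks(surviving_blocks)


if __name__ == "__main__":
    data = [b"\x01\x02", b"\x10\x20", b"\xaa\x55"]
    parity = make_parity(data)
    # Pretend the disk holding data[1] has died.
    rebuilt = reconstruct_missing([data[0], data[2], parity])
    print(rebuilt == data[1])  # True
```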

If you have a missing disk in a RAID-5 configuration, you have no redundancy left: if another disk goes down, you will lose your data. Run, don’t walk, to the nearest vendor who can get you a compatible part, covered by the manufacturer’s warranty and shipped by same-day urgent courier. If the vendor you bought the array from is already in the process of getting the part, get both parts and stash the second one away as a spare.
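Why a second failure is fatal can be shown with XOR parity in a few lines. In this sketch (hypothetical function name, one-byte blocks for brevity) two data blocks in a stripe are gone. The parity then only tells you the XOR of the two missing values, and every one of the 256 possible values for the first block has a matching value for the second, so nothing identifies the real data.

```python
def candidate_pairs(parity: int, surviving_data: list[int]) -> list[tuple[int, int]]:
    """Return every (a, b) pair consistent with the parity byte
    when two one-byte data blocks in the stripe are missing."""
    known = 0
    for value in surviving_data:
        known ^= value
    # a ^ b ^ known == parity, so a ^ b must equal this target.
    target = parity ^ known
    return [(a, a ^ target) for a in range(256)]


if __name__ == "__main__":
    data = [0x01, 0x10, 0xAA]  # real contents of three data disks
    parity = data[0] ^ data[1] ^ data[2]
    # Lose data[1] and data[2]; only data[0] and the parity survive.
    pairs = candidate_pairs(parity, [data[0]])
    print(len(pairs))  # 256 equally valid answers
    print((data[1], data[2]) in pairs)  # True, but nothing singles it out
```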

If you are using RAID-5 for a production system, you should consider keeping a spare disk in the array as a hot spare, so that the rebuild can start as soon as a disk fails.

Added –
If your logs are not on a separate volume (physically separate disks), move them to a separate set of disks, even just a single mirrored pair. This will also be a performance win if your database has any significant load, as contention on log volumes has a disproportionately bad effect on performance.

If this is possible, you can also make your database more robust by doing the following:

  1. Shut down the database.
  2. Back up the database.
  3. Move the logs to a physically separate set of disks (make sure you reconfigure the database so it knows where the logs have been moved to).
  4. Restart the database and application.

With the logs on a separate volume, you can restore from the backup and roll forward from the logs, but only if the disk failure does not compromise the logs. Database logs should be on a separate disk volume for the following reasons, among others:

  • Log usage patterns are predominantly sequential: entries are appended to the end of the file (in effect, the file is a ring buffer). This means that a large number of log entries can be written out quickly, as there is little disk head seek activity.

  • If the logs share physical disks with a heavily random-access workload (e.g. transactional tables and indexes), they will be slowed down disproportionately, as the head seek activity disrupts the sequential writes.

  • Having the logs on a separate volume is almost always a performance win, and a single mirrored pair is enough to support quite a heavy log workload. The hardware is therefore cheap: a small cost for a big performance and reliability win.

  • If your data array goes down, the logs are not lost. With a proper backup strategy, you can restore from the backup and roll forward from the logs. This means that losing a whole array on the server is no longer a single point of failure: both the log and data arrays have to fail simultaneously to cause data loss.
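The restore-and-roll-forward idea can be sketched as a toy Python model. This is not how any particular database implements recovery: the "database" is a dict, each log record uses a made-up format with an increasing sequence number (`lsn`) plus a key/value write, and the log is a fixed-size ring buffer (a `collections.deque` with `maxlen`). Restoring means loading the backup and replaying only the records written after it was taken. The check at the top shows why, with a circular log, every record since the last backup must still be present.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class LogRecord:
    lsn: int  # made-up sequence number, increasing by 1 per write
    key: str
    value: str


def roll_forward(backup: dict[str, str], backup_lsn: int,
                 log: deque[LogRecord]) -> dict[str, str]:
    """Restore from a backup taken at backup_lsn, then replay newer log records."""
    # In a ring buffer old records get overwritten; if the record right after
    # the backup is gone, the gap cannot be replayed.
    if log and log[0].lsn > backup_lsn + 1:
        raise RuntimeError("log records since the backup have been overwritten")
    db = dict(backup)  # start from a copy of the backup
    for record in log:
        if record.lsn > backup_lsn:  # skip changes already in the backup
            db[record.key] = record.value
    return db


if __name__ == "__main__":
    log: deque[LogRecord] = deque(maxlen=4)  # tiny ring buffer for the demo
    log.append(LogRecord(1, "a", "1"))
    log.append(LogRecord(2, "b", "2"))
    backup, backup_lsn = {"a": "1", "b": "2"}, 2  # backup taken after LSN 2
    log.append(LogRecord(3, "a", "3"))
    # The data array dies; the backup and the separate log volume survive.
    print(roll_forward(backup, backup_lsn, log))  # {'a': '3', 'b': '2'}
```

In a real system the equivalent of that check is making sure the logs are archived or retained at least as far back as the most recent backup.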

1) Back up.

Right now no data has been lost. If your backups are not up to date, back up now.

2) Read the manual, call the vendor, etc.

Different RAID systems have different steps for replacing a disk, and if you get them wrong you risk destroying the whole array. Without knowing what sort of RAID hardware or software you have, we can only guess at the steps needed.

Also, the slow performance is because RAID 5 in a degraded state (i.e. with one disk dead) has horrible read performance. How horrible depends on how the parity is stored and which disk died, but the “good” news is that slow performance with one disk gone is a known issue and not a cause for panic.
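To put a rough number on "horrible", here is a back-of-the-envelope Python model, not a benchmark. It assumes a read of a block on a surviving disk costs one disk read, a read of a block on the dead disk costs one read of every other block in its stripe (which are then XORed), and each logical read is independent (no caching). The `rotating` flag switches between RAID 5 style rotating parity (one common layout, with parity moving one disk to the left per stripe) and a single dedicated parity disk as in RAID 4, included only to show why "which disk died" can matter.

```python
def parity_disk(stripe: int, n_disks: int, rotating: bool) -> int:
    """Disk that holds the parity block for a given stripe."""
    if rotating:
        # Parity moves one disk to the left on each successive stripe.
        return n_disks - 1 - stripe % n_disks
    # Dedicated parity disk: always the last one.
    return n_disks - 1


def average_reads_per_block(n_disks: int, failed: int, rotating: bool,
                            stripes: int) -> float:
    """Average physical reads per logical data-block read with one disk dead."""
    reads = blocks = 0
    for stripe in range(stripes):
        p = parity_disk(stripe, n_disks, rotating)
        for disk in range(n_disks):
            if disk == p:
                continue  # parity block, not user data
            blocks += 1
            if disk == failed:
                reads += n_disks - 1  # read every survivor, then XOR them
            else:
                reads += 1  # block is still readable directly
    return reads / blocks


if __name__ == "__main__":
    for failed in (0, 3):
        for rotating in (True, False):
            avg = average_reads_per_block(4, failed, rotating, stripes=400)
            print(f"failed disk {failed}, rotating={rotating}: {avg:.2f} reads/block")
```

With four disks, the rotating layout gives 1.50 reads per block whichever disk fails, while the dedicated layout gives about 1.67 if a data disk dies and 1.00 if the parity disk dies. Real arrays add seek time and the XOR work on top of this.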

First, I would read the manual for the hardware/software that you’re using, specifically the section on failure recovery 🙂

It should be a simple matter of replacing the disk and rebuilding the array, though.
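What "rebuilding the array" does under the hood can be illustrated with XOR parity. This sketch assumes each disk is a Python list of equal-sized blocks and that stripe i consists of block i on every disk; the function name is hypothetical, and a real controller or software RAID layer does this for you once you follow its replacement procedure. Note that the rebuild has to read every block of every surviving disk, and until it finishes the array is still running without redundancy.

```python
from functools import reduce


def rebuild_disk(surviving_disks: list[list[bytes]]) -> list[bytes]:
    """Regenerate every block of the failed disk from the survivors."""
    rebuilt = []
    # Stripe i is block i on every disk; walk the array stripe by stripe.
    for stripe in zip(*surviving_disks):
        # XOR the surviving blocks of this stripe, byte by byte.
        block = bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*stripe))
        rebuilt.append(block)
    return rebuilt


if __name__ == "__main__":
    # Three disks, two stripes each. On a real RAID 5 the parity rotates, but
    # XOR reconstruction is the same whichever block in a stripe is parity.
    disk0 = [b"\x01", b"\x02"]
    disk1 = [b"\x10", b"\x20"]
    disk2 = [bytes([a[0] ^ b[0]]) for a, b in zip(disk0, disk1)]  # parity
    # disk1 fails, is swapped for a blank disk, and is rebuilt.
    replacement = rebuild_disk([disk0, disk2])
    print(replacement == disk1)  # True
```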

The most important point in such cases is that the disk should be replaced as soon as possible, since if another disk fails you will probably lose data. You should also address the cause of the failure. Was it because the disk was getting old? Should you replace the other ones too? Or was it because of a power surge, heat or vibration?
