I know it’s the time spent by the CPU
waiting for IO operations to
complete, but what kind of IO
operations precisely? What I am also
not sure about is why it is so important.
Can’t the CPU just do something else
while the IO operation completes, and
then get back to processing data?
Yes, the operating system will schedule other processes to run while one is blocked on IO. However, inside that process, unless it’s using asynchronous IO, nothing will progress until the IO operation it is blocked on completes.
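To illustrate the difference between blocking and asynchronous IO inside a single process, here is a minimal sketch. The sleeps are only stand-ins for slow IO operations, and the names `fake_io`, `blocking`, `overlapped` and `timed` are made up for this example.

```python
import asyncio
import time


async def fake_io(delay):
    # Stand-in for a slow IO operation: awaiting hands control back to the
    # event loop, so other pending operations can proceed in the meantime.
    await asyncio.sleep(delay)
    return delay


async def overlapped(delays):
    # All simulated operations are in flight at the same time.
    return await asyncio.gather(*(fake_io(d) for d in delays))


def blocking(delays):
    # Each simulated operation must finish before the next one starts.
    for d in delays:
        time.sleep(d)
    return list(delays)


def timed(fn):
    # Wall-clock time taken by fn().
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


if __name__ == "__main__":
    delays = [0.2, 0.2, 0.2]
    print("blocking:   %.2fs" % timed(lambda: blocking(delays)))
    print("overlapped: %.2fs" % timed(lambda: asyncio.run(overlapped(delays))))
```

In the blocking version the waits add up; in the overlapped version the total is roughly the longest single wait.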
Also what are the right tools to
diagnose exactly which process(es)
waited for IO?
Some tools you might find useful:
- iostat, to monitor the service times of your disks
- iotop (if your kernel supports it), to monitor the breakdown of IO requests per process
- strace, to look at the actual operations issued by a process
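If none of these tools is installed, a rough per-process view is also available on Linux from `/proc/<pid>/io`, which holds cumulative counters such as `read_bytes` and `write_bytes`. The sketch below assumes a kernel with IO accounting enabled and permission to read the file; `parse_proc_io` and `read_proc_io` are hypothetical helper names.

```python
from pathlib import Path


def parse_proc_io(text):
    # Each line looks like "read_bytes: 4096"; keep only numeric counters.
    counters = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            counters[name.strip()] = int(value)
    return counters


def read_proc_io(pid="self"):
    # Assumes Linux; raises OSError if the file is missing or unreadable.
    return parse_proc_io(Path(f"/proc/{pid}/io").read_text())


if __name__ == "__main__":
    try:
        io_counters = read_proc_io()
        print("bytes read from storage:", io_counters.get("read_bytes"))
        print("bytes written to storage:", io_counters.get("write_bytes"))
    except OSError as exc:
        print("per-process IO counters not available:", exc)
```

The counters are cumulative, so sample them twice and subtract to get a rate.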
And what are the ways to minimize IO
- ensure you have free physical memory so the OS can cache disk blocks in memory
- keep your filesystem disk usage below 80% to avoid excessive fragmentation
- tune your filesystem
- use a battery backed array controller
- choose good buffer sizes when performing IO operations
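To illustrate the last point, here is a small sketch showing how the buffer size determines the number of read calls needed to move the same amount of data. `copy_stream` is a made-up helper, and the sizes are arbitrary examples rather than recommendations; the right value depends on your storage and workload.

```python
import io


def copy_stream(src, dst, buffer_size=1024 * 1024):
    # Copy src to dst in chunks of at most buffer_size bytes.
    # Returns the number of reads that actually returned data.
    reads = 0
    while True:
        chunk = src.read(buffer_size)
        if not chunk:
            break
        dst.write(chunk)
        reads += 1
    return reads


if __name__ == "__main__":
    payload = b"x" * (64 * 1024)
    for size in (512, 4096, 64 * 1024):
        n = copy_stream(io.BytesIO(payload), io.BytesIO(), buffer_size=size)
        print("buffer of %6d bytes: %d reads" % (size, n))
```

With real files, each read on an unbuffered file object is a system call, so fewer and larger reads usually mean less overhead per byte.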
Old question, recently bumped, but felt the existing answers were insufficient.
IOWait definition & properties
IOWait (usually labeled %wa in top) is a sub-category of idle (%idle is usually expressed as all idle except defined subcategories), meaning the CPU is not doing anything. Therefore, as long as there is another process that the CPU could be processing, it will do so. Additionally, idle, user, system, iowait, etc. are measurements with respect to the CPU. In other words, you can think of iowait as the idle time caused by waiting for IO.
Precisely, iowait is the percentage of processor ticks during which the CPU was idle while at least one IO request was still outstanding. Time spent handling hardware and software interrupts is usually labeled separately.
Importance & Potential misconception
IOWait is important because it is often a key metric for knowing whether you’re bottlenecked on IO. But the absence of iowait does not necessarily mean your application is not bottlenecked on IO. Consider two applications running on a system. If program 1 is heavily IO bottlenecked and program 2 is a heavy CPU user, the %user + %system of the CPU may still be something like ~100% and, correspondingly, iowait would show 0. But that’s just because program 2 keeps the CPU busy; the numbers say nothing about program 1, because all of this is measured from the CPU’s point of view.
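To make the "CPU's point of view" concrete, here is a sketch of how such percentages can be derived on Linux from two readings of the aggregate `cpu` line in `/proc/stat`, whose first fields are user, nice, system, idle, iowait, irq, softirq and steal, in that order. The helper names and the two sample lines in the demo are made up for illustration.

```python
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line):
    # "cpu  100 0 50 800 50 0 0 0 0 0" -> {"user": 100, "nice": 0, ...}
    values = [int(v) for v in line.split()[1:1 + len(FIELDS)]]
    return dict(zip(FIELDS, values))


def percentages(before, after):
    # Share of ticks spent in each state between two samples.
    deltas = {name: after[name] - before[name] for name in before}
    total = sum(deltas.values())
    if total == 0:
        return {name: 0.0 for name in deltas}
    return {name: 100.0 * ticks / total for name, ticks in deltas.items()}


def read_cpu_line(path="/proc/stat"):
    # Assumes Linux: the first line of /proc/stat is the aggregate "cpu" line.
    with open(path) as stat_file:
        return parse_cpu_line(stat_file.readline())


if __name__ == "__main__":
    # Made-up samples: the CPU is fully busy between the two readings.
    first = parse_cpu_line("cpu  100 0 50 800 50 0 0 0 0 0")
    second = parse_cpu_line("cpu  900 0 250 800 50 0 0 0 0 0")
    for name, share in percentages(first, second).items():
        print("%-8s %5.1f%%" % (name, share))
```

In the made-up samples every tick between the two readings went to user or system, so iowait comes out as 0 no matter how long any process was stuck waiting on a disk.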
Tools to Detect IOWait
See posts by Dave Cheney and Xerxes
But also a simple top will show it in %wa.
Also, as we are now almost entering 2013, in addition to what others said, simply awesome IO storage devices are now an affordable option, namely SSDs. SSDs are awesome!!!
I found the explanation and examples from this link very useful: What exactly is “iowait”?. BTW, for the sake of completeness, the I/O here refers to disk I/O, but could also include I/O on a network mounted disk (such as nfs), as explained in this other post.
I will quote a few important sections (in case the link goes dead), some of those would be repetitions of what others have said already, but to me at least these were clearer:
To summarize it in one sentence, ‘iowait’ is the percentage of time
the CPU is idle AND there is at least one I/O in progress.
Each CPU can be in one of four states: user, sys, idle, iowait.
I was wondering what happens when the system has other processes ready to run while one process is waiting for I/O. The below explains it:
If the CPU is idle, the kernel then determines if there is at least
one I/O currently in progress to either a local disk or a remotely
mounted disk (NFS) which had been initiated from that CPU. If there
is, then the ‘iowait’ counter is incremented by one. If there is no
I/O in progress that was initiated from that CPU, the ‘idle’ counter
is incremented by one.
And here is an example:
Let’s say that there are two programs running on a CPU. One is a ‘dd’
program reading from the disk. The other is a program that does no I/O
but is spending 100% of its time doing computational work. Now assume
that there is a problem with the I/O subsystem and that physical I/Os
are taking over a second to complete. Whenever the ‘dd’ program is
asleep while waiting for its I/Os to complete, the other program is
able to run on that CPU. When the clock interrupt occurs, there will
always be a program running in either user mode or system mode.
Therefore, the %idle and %iowait values will be 0. Even though iowait
is 0 now, that does not mean there is NOT a I/O problem because there
obviously is one if physical I/Os are taking over a second to complete.
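The accounting rule and the example quoted above can be sketched as a toy simulation. This is a simplified model written for illustration, not kernel code: it assumes one CPU, one counter bump per clock tick, and a 'dd' that stays asleep on the disk for the whole run. The function names are made up.

```python
def account_tick(counters, running_mode, io_in_progress):
    # running_mode is "user" or "sys" when a program is on the CPU,
    # or None when the CPU has nothing to run at this clock tick.
    if running_mode is not None:
        counters[running_mode] += 1
    elif io_in_progress:
        counters["iowait"] += 1
    else:
        counters["idle"] += 1


def simulate(ticks, cpu_hog_present, io_in_progress=True):
    counters = {"user": 0, "sys": 0, "idle": 0, "iowait": 0}
    for _ in range(ticks):
        # The compute-bound program, if present, is always ready to run.
        mode = "user" if cpu_hog_present else None
        account_tick(counters, mode, io_in_progress)
    return counters


if __name__ == "__main__":
    print("dd alone:           ", simulate(100, cpu_hog_present=False))
    print("dd plus compute job:", simulate(100, cpu_hog_present=True))
```

With 'dd' alone, every tick lands in iowait. Add the compute-bound program and the same slow disk produces an iowait of 0, which is exactly the trap the quoted example describes.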
The full text is worth reading. Here is a mirror of this page, in case it goes down.