How to estimate the performance of a computer?
One generic measure is MIPS (millions of instructions per second). It is meaningful only when
comparing machines with the same architecture, since some architectures require substantially
more instructions than others to run the same program.
MIPS also depends heavily on the instruction mix, and hence on the program used to measure it.
Some manufacturers report "peak MIPS" measured on carefully designed but useless programs.
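A small sketch of why MIPS can mislead when architectures differ. The instruction counts and run times below are invented for illustration and are not measurements of any real machine:

```python
def mips(instruction_count: int, seconds: float) -> float:
    """Millions of instructions executed per second."""
    return instruction_count / (seconds * 1e6)

# Hypothetical figures: the SAME program compiled for two architectures.
# Machine A needs many simple instructions, machine B fewer complex ones.
a_instructions, a_seconds = 200_000_000, 2.0
b_instructions, b_seconds = 90_000_000, 1.5

print(f"A: {mips(a_instructions, a_seconds):.0f} MIPS, finishes in {a_seconds} s")
print(f"B: {mips(b_instructions, b_seconds):.0f} MIPS, finishes in {b_seconds} s")
```

With these assumed numbers, machine A reports the higher MIPS rating, yet machine B finishes the program sooner.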
All major computer components (CPU, memory and I/O devices) together affect a computer's
performance. Slow RAM or a slow hard disk becomes a bottleneck for a fast CPU.
In practice, however, PC design is always a trade-off between high performance and low cost:
|Option |High performance |Low cost
|Data bus width |Wider means faster |Low pin count is cheaper
|Bus masters |Multiple (requires arbitration) |Single (no arbitration)
Let's take a look at the factors that influence computer performance in more detail:
- The CPU.
CPU architecture is important. In general, the newer the generation, the better.
For example, thanks to its new performance features, the Pentium 75 (a fifth-generation CPU with a 75 MHz clock rate)
outperforms the 80486DX100 (a fourth-generation CPU with a 100 MHz clock rate).
One technique for enhancing performance is parallel processing. For example, while
one instruction is being executed in the ALU (E), the next instruction can be fetched from memory (F) and
decoded (D).
The drawing illustrates the idea of instruction-level parallelism, or pipelining.
     __    __    __    __
    |  |  |  |  |  |  |  |
  __|  |__|  |__|  |__|  |__
    F1    F2    F3    F4
          D1    D2    D3
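The overlap in the drawing can be sketched in a few lines. This is an idealized model, assuming three stages (F, D, E), one clock cycle per stage and no stalls; the function names are made up for the illustration:

```python
def pipeline_schedule(n_instructions, stages=("F", "D", "E")):
    """Return {cycle: [stages active in that cycle]} for an ideal pipeline."""
    schedule = {}
    for i in range(n_instructions):          # instruction index, 0-based
        for s, name in enumerate(stages):    # stage s of instruction i runs in cycle i + s + 1
            schedule.setdefault(i + s + 1, []).append(f"{name}{i + 1}")
    return schedule

def cycles_sequential(n, n_stages=3):
    """Each instruction finishes all its stages before the next one starts."""
    return n * n_stages

def cycles_pipelined(n, n_stages=3):
    """The first instruction fills the pipeline, then one completes per cycle."""
    return n_stages + n - 1

for cycle, active in sorted(pipeline_schedule(4).items()):
    print(f"cycle {cycle}: {' '.join(active)}")
print("sequential:", cycles_sequential(4), "cycles")
print("pipelined: ", cycles_pipelined(4), "cycles")
```

Under these assumptions four instructions take 6 cycles instead of 12. Real pipelines lose some of that gain to jumps and to instructions that depend on each other's results.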
Instruction prefetching is another idea, used for example in the 80286 (6-byte prefetch queue).
It relies on the fact that the CPU normally fetches code sequentially. Only jump instructions alter
program flow, and they are statistically rare. Rather than wait for the execution unit to request
the next instruction fetch, the CPU prefetches the next instruction from memory during the next cycle
and puts it into the prefetch queue, so that it is ready when needed.
If a jump instruction is executed, the information in the prefetch queue is marked as invalid.
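A toy model of a prefetch queue and its invalidation on a jump. It is a sketch under simplifying assumptions: the queue holds whole instructions rather than bytes, every jump is unconditional, and the tuple-based "program" format is invented for the example:

```python
from collections import deque

def run(program, queue_size=6, max_steps=100):
    """program: list of ("op",) or ("jmp", target_index).
    Returns (addresses executed in order, number of queue flushes)."""
    queue = deque()
    fetch_pc = 0                 # next address the prefetcher will read
    trace, flushes = [], 0
    for _ in range(max_steps):   # bound the run in case the program loops
        # Prefetcher: keep reading sequential instructions while there is room.
        while len(queue) < queue_size and fetch_pc < len(program):
            queue.append((fetch_pc, program[fetch_pc]))
            fetch_pc += 1
        if not queue:
            break                # nothing left to execute
        pc, instr = queue.popleft()
        trace.append(pc)
        if instr[0] == "jmp":
            queue.clear()        # prefetched instructions are now invalid
            fetch_pc = instr[1]  # prefetching restarts at the jump target
            flushes += 1
    return trace, flushes

program = [("op",), ("jmp", 3), ("op",), ("op",), ("op",)]
print(run(program))
```

In this model the instruction at address 2 is prefetched but never executed: the jump at address 1 discards it together with the rest of the queue.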
- Data bus width.
80486 processors have a 32-bit data bus, whereas Pentiums have a
64-bit data bus, so a Pentium can transfer twice as much data at a time as a fourth-generation
processor.
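The effect of bus width can be put into numbers. This sketch assumes that every transfer uses the full width of the bus and ignores burst modes and wait states; the 4096-byte block size is an arbitrary example:

```python
def bus_transfers(n_bytes, bus_width_bits):
    """Number of bus transfers needed to move n_bytes over the data bus."""
    bytes_per_transfer = bus_width_bits // 8
    return -(-n_bytes // bytes_per_transfer)   # ceiling division

block = 4096  # bytes, arbitrary example size
print("32-bit bus:", bus_transfers(block, 32), "transfers")
print("64-bit bus:", bus_transfers(block, 64), "transfers")
```

The 64-bit bus needs half as many transfers for the same block, which is the "twice as much data at a time" mentioned above.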
- Clock rate.
Since each processing step can happen only on a "tick" of the clock, the faster the clock rate, the
quicker a CPU of a given architecture works.
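Clock rate is only one term in execution time. A common textbook estimate is: time = instructions x average cycles per instruction (CPI) / clock rate. The sketch below applies it to the two CPUs mentioned earlier; the CPI values and the instruction count are assumptions chosen for illustration, not measured figures:

```python
def cpu_time(instructions, cpi, clock_hz):
    """Estimated execution time in seconds."""
    return instructions * cpi / clock_hz

instructions = 100_000_000  # hypothetical program length

# Assumed CPI values: the newer design completes an instruction in fewer cycles.
t_486 = cpu_time(instructions, cpi=2.0, clock_hz=100e6)     # 80486DX100
t_pentium = cpu_time(instructions, cpi=1.2, clock_hz=75e6)  # Pentium 75

print(f"486 at 100 MHz:    {t_486:.2f} s")
print(f"Pentium at 75 MHz: {t_pentium:.2f} s")
```

With these assumed values the slower-clocked CPU finishes first, because it needs fewer clock ticks per instruction.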
The diagram illustrates the general memory hierarchy of a PC:
The amount of RAM you need really depends on your applications. Reasonable performance today calls for
128 MB. Adding more RAM will improve performance if you run several applications at the same time
or work with large files and documents.
L1 cache resides on-chip. The bigger the on-chip cache, the better, since more instructions
and data can be stored on the chip, reducing the number of times the processor has to access slower
off-chip memory to get data.
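The benefit of a bigger cache can be estimated with the usual average-access-time formula: hit time + miss rate x miss penalty. The timings and miss rates below are assumed round numbers, not data for any particular processor:

```python
def average_access_time(hit_time_ns, miss_rate, miss_penalty_ns):
    """Average memory access time in nanoseconds."""
    return hit_time_ns + miss_rate * miss_penalty_ns

HIT_NS = 2.0       # assumed on-chip cache access time
PENALTY_NS = 60.0  # assumed extra cost of going to off-chip memory

small_cache = average_access_time(HIT_NS, miss_rate=0.10, miss_penalty_ns=PENALTY_NS)
large_cache = average_access_time(HIT_NS, miss_rate=0.05, miss_penalty_ns=PENALTY_NS)

print(f"small cache (10% misses): {small_cache:.1f} ns")
print(f"large cache ( 5% misses): {large_cache:.1f} ns")
```

Halving the miss rate in this example cuts the average access time from about 8 ns to about 5 ns, even though the cache itself is no faster.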
- IO devices
When it comes to interfacing I/O devices to the CPU, a synchronous protocol (one that includes a clock
among the control lines) is more efficient than an asynchronous one. A synchronous interface means data
and address are transmitted relative to the clock. Since little or no logic is needed to decide what to do next,
a synchronous interface can be both fast and inexpensive. A disadvantage of this protocol is that the bus
cannot be long, because of the clock-skew problem.
An asynchronous interface does not need a clock. Instead, self-timed handshaking protocols are used between
sender and receiver.
Most I/O devices today are interrupt-driven, i.e. the CPU does nothing for the I/O device
until the device notifies the CPU by sending an interrupt request (IRQ). The first computers used polling, a simple
interface in which the CPU periodically checks status bits to see whether it is time for the next I/O operation.
Since the CPU is much faster than any I/O device, polling wastes CPU time.
In general-purpose applications, using interrupts
is the key to multitasking operating systems and good response time.
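How much time polling wastes depends on how often the device must be checked. A rough estimate is: overhead = polls per second x cycles per poll / clock rate. All figures below (clock rate, cost of one poll, polling rates) are assumptions picked for illustration:

```python
def polling_overhead(polls_per_second, cycles_per_poll, clock_hz):
    """Fraction of CPU time spent checking the device status bits."""
    return polls_per_second * cycles_per_poll / clock_hz

CLOCK_HZ = 500e6        # assumed CPU clock rate
CYCLES_PER_POLL = 400   # assumed cost of one status check

slow_device = polling_overhead(30, CYCLES_PER_POLL, CLOCK_HZ)       # e.g. a mouse
fast_device = polling_overhead(250_000, CYCLES_PER_POLL, CLOCK_HZ)  # e.g. a disk

print(f"slow device: {slow_device:.4%} of CPU time")
print(f"fast device: {fast_device:.1%} of CPU time")
```

For a slow device the cost is negligible, but for a fast one the CPU spends a large share of its time just asking whether data is ready. With interrupts that cost is paid only when the device actually has something to report.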
Since I/O events often involve block transfers, direct memory access (DMA) hardware is added to many
computer systems. With DMA, the I/O device acts as a bus master and transfers a large number of words to or from
memory without intervention by the CPU.
What is ahead?
- Greater instruction-level parallelism?
- Bigger caches?
- Multiple CPUs?
- Complete systems on a chip?
- High-performance LANs?