Computer Performance Factors

Alex P, National Semiconductor, 2003


How can the performance of a computer be estimated?
One generic measure is MIPS (millions of instructions per second). It is meaningful only when comparing machines with the same architecture, since some architectures require substantially more instructions than others for the same program. The figure also depends heavily on the mix of instructions, and hence on the program used to measure it. Some manufacturers report "peak MIPS" measured on carefully designed but useless programs.
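The caveat about comparing architectures can be made concrete with a small calculation. The sketch below uses made-up instruction counts and run times for two hypothetical machines running the same program; the function name and all figures are assumptions for illustration only.

```python
def mips(instruction_count: int, seconds: float) -> float:
    """Millions of instructions per second for one program run."""
    return instruction_count / (seconds * 1_000_000)


# Hypothetical figures: the same program on two different architectures.
# Machine A needs many simple instructions; machine B needs fewer.
machine_a = {"instructions": 200_000_000, "seconds": 2.0}
machine_b = {"instructions": 90_000_000, "seconds": 1.5}

mips_a = mips(machine_a["instructions"], machine_a["seconds"])
mips_b = mips(machine_b["instructions"], machine_b["seconds"])

print(f"A: {mips_a:.0f} MIPS, run time {machine_a['seconds']} s")
print(f"B: {mips_b:.0f} MIPS, run time {machine_b['seconds']} s")
```

With these numbers machine A reports the higher MIPS rating yet takes longer to finish the program, which is why the rating alone says little across architectures.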

Clearly, all major computer components, such as the CPU, memory and I/O devices, together affect a computer's performance. Slow RAM or a slow hard disk becomes a bottleneck for a fast CPU.
In practice, however, high performance in a PC is always traded off against low cost:

Option            High performance                 Low cost
Bus architecture  Separate address/data            Multiplex address/data
Data bus width    Wider means faster               Low pin count is cheaper
Bus masters       Multiple (requires arbitration)  Single (no arbitration)
Transfer size     Multiple words                   Single word
Clocking          Synchronous                      Asynchronous
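A rough model shows how the bus options in the table interact. The sketch below is an idealized estimate, not a description of any real bus: it assumes a multiplexed bus spends one extra cycle per transfer sending the address over the shared lines, that a separate address bus hides that cycle, and that each data word takes one cycle. The function name, the 33 MHz clock and the 32-bit width are illustrative assumptions.

```python
def bus_bandwidth(width_bits, clock_hz, words_per_transfer, multiplexed):
    """Idealized bytes per second for back-to-back transfers."""
    # Assumption: a multiplexed bus needs one address cycle per transfer;
    # with separate address lines the address overlaps the first data word.
    address_cycles = 1 if multiplexed else 0
    cycles = address_cycles + words_per_transfer
    bytes_moved = words_per_transfer * width_bits // 8
    return bytes_moved * clock_hz / cycles


CLOCK_HZ = 33e6  # hypothetical bus clock

separate_single = bus_bandwidth(32, CLOCK_HZ, 1, multiplexed=False)
muxed_single = bus_bandwidth(32, CLOCK_HZ, 1, multiplexed=True)
muxed_burst = bus_bandwidth(32, CLOCK_HZ, 4, multiplexed=True)

for name, value in [("separate, single word", separate_single),
                    ("multiplexed, single word", muxed_single),
                    ("multiplexed, 4-word burst", muxed_burst)]:
    print(f"{name}: {value / 1e6:.1f} MB/s")
```

Under these assumptions, multi-word transfers recover much of the bandwidth a multiplexed bus loses, because the address cycle is paid once per burst rather than once per word.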

Let's take a look at the factors that influence computer performance in more detail:

  1. The CPU.

    CPU architecture is important: in general, a newer generation performs better. For example, thanks to its new performance features, a Pentium 75 (a fifth-generation CPU clocked at 75 MHz) outperforms an 80486DX100 (a fourth-generation CPU clocked at 100 MHz).
    One technique that enhances performance is parallel processing within the CPU. For example, while one instruction is being executed in the ALU (E), the next instruction can be decoded (D) and the one after that fetched from memory (F).

    The drawing illustrates the idea of instruction-level parallelism, or pipelining.

         __    __    __    __
        |  |  |  |  |  |  |  |
      __|  |__|  |__|  |__|  |__
        F1    F2    F3    F4
              D1    D2    D3
                    E1    E2
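The overlap in the drawing can be reproduced with a short sketch. It assumes an ideal three-stage pipeline (F, D, E) in which every stage takes exactly one clock cycle and nothing ever stalls; the function names are illustrative.

```python
def sequential_cycles(n_instructions: int, n_stages: int = 3) -> int:
    """Cycles needed when each instruction finishes before the next starts."""
    return n_instructions * n_stages


def pipelined_cycles(n_instructions: int, n_stages: int = 3) -> int:
    """Cycles needed when stages overlap (ideal pipeline, no stalls)."""
    if n_instructions == 0:
        return 0
    return n_stages + n_instructions - 1


def schedule(n_instructions: int, stages=("F", "D", "E")):
    """Return one text row per stage showing which instruction it holds."""
    total = pipelined_cycles(n_instructions, len(stages))
    rows = []
    for depth, name in enumerate(stages):
        cells = []
        for cycle in range(total):
            index = cycle - depth  # instruction in this stage at this cycle
            busy = 0 <= index < n_instructions
            cells.append(f"{name}{index + 1}" if busy else "  ")
        rows.append(" ".join(cells))
    return rows


for row in schedule(4):
    print(row)
print(sequential_cycles(4), "cycles without overlap,",
      pipelined_cycles(4), "cycles with pipelining")
```

Real pipelines fall short of this ideal whenever a stage has to wait, for example after a jump.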

    Instruction prefetching is another idea; it first appeared in the 286, which had a 6-byte prefetch queue. It relies on the fact that the CPU normally fetches code sequentially: only jump instructions alter program flow, and they are statistically rare. Rather than waiting for the execution unit to request the next instruction, the CPU fetches it from memory during the next cycle and places it in the prefetch queue so that it is ready when needed. If a jump instruction is executed, the contents of the prefetch queue are marked as invalid.
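The behaviour of a prefetch queue can be sketched as a toy simulation. This is a simplified model, not the 286's actual mechanism: the queue here holds whole instructions rather than bytes, addresses are small integers, and the instruction encoding (`"op"`, `"jmp"`, `"halt"`) is invented for the example.

```python
from collections import deque


def run(program, queue_depth=6, max_steps=100):
    """Execute a toy program; return (executed addresses, queue flushes).

    program maps an address to ("op",), ("jmp", target) or ("halt",).
    """
    queue = deque()
    fetch_addr = 0
    executed, flushes = [], 0
    for _ in range(max_steps):
        # Prefetch unit: keep filling the queue with sequential addresses.
        while len(queue) < queue_depth and fetch_addr in program:
            queue.append(fetch_addr)
            fetch_addr += 1
        if not queue:
            break
        addr = queue.popleft()
        instruction = program[addr]
        executed.append(addr)
        if instruction[0] == "jmp":
            queue.clear()  # prefetched entries are no longer valid
            flushes += 1
            fetch_addr = instruction[1]
        elif instruction[0] == "halt":
            break
    return executed, flushes


program = {0: ("op",), 1: ("op",), 2: ("jmp", 5),
           3: ("op",), 4: ("op",), 5: ("op",), 6: ("halt",)}
executed, flushes = run(program)
print("executed:", executed, "flushes:", flushes)
```

Addresses 3 and 4 are prefetched but never executed, because the jump at address 2 invalidates the queue before they are reached.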

  2. Data bus width.
    80486 processors have a 32-bit data bus, whereas Pentiums have a 64-bit data bus; a Pentium can therefore transfer twice as much data at a time as a fourth-generation CPU.
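As a quick check of the "twice as much data" claim, the sketch below counts bus transfers for a block of data. It assumes aligned data and one full-width transfer per bus cycle; the 4096-byte block size is an arbitrary example.

```python
def bus_transfers(n_bytes: int, bus_width_bits: int) -> int:
    """Number of full-width bus transfers needed to move n_bytes."""
    bytes_per_transfer = bus_width_bits // 8
    # Integer ceiling division: a partial final word still costs a transfer.
    return -(-n_bytes // bytes_per_transfer)


block = 4096  # bytes, arbitrary example size
print("32-bit bus:", bus_transfers(block, 32), "transfers")
print("64-bit bus:", bus_transfers(block, 64), "transfers")
```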

  3. Clock rate.
    Since each processing step can happen only on a "tick" of the clock, a faster clock rate makes a CPU of a given architecture work faster.
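Clock rate and architecture combine in the usual estimate: run time = instructions × cycles per instruction / clock rate. The sketch below uses invented cycles-per-instruction (CPI) values, not measured figures for any real processor, to show both effects.

```python
def execution_time(instructions, cycles_per_instruction, clock_hz):
    """Estimated run time in seconds."""
    return instructions * cycles_per_instruction / clock_hz


N = 100_000_000  # instructions in a hypothetical program

# Same architecture (same assumed CPI): the faster clock wins.
slow_clock = execution_time(N, 2.0, 75e6)
fast_clock = execution_time(N, 2.0, 100e6)

# Different architectures: an assumed lower CPI can outweigh a lower clock.
newer_design = execution_time(N, 1.0, 75e6)
older_design = execution_time(N, 2.0, 100e6)

print(f"same CPI:  75 MHz {slow_clock:.2f} s, 100 MHz {fast_clock:.2f} s")
print(f"newer at 75 MHz {newer_design:.2f} s, "
      f"older at 100 MHz {older_design:.2f} s")
```

This is one way to see how a 75 MHz part can outperform a 100 MHz part of an earlier generation, provided it needs fewer cycles per instruction.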
  4. Memory.

    [Diagram: general memory hierarchy of a PC]

    The amount of RAM you need really depends on your applications. Reasonable performance today calls for 128 MB. Adding more RAM improves performance if you run several applications at the same time or work with large files and documents.

    The L1 cache resides on-chip. The bigger the on-chip cache, the better: more instructions and data can be kept on the chip, which reduces the number of times the processor has to access slower off-chip memory.
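The benefit of a larger cache is commonly estimated with the average memory access time: hit time + miss rate × miss penalty. The sketch below applies that formula with invented timings and miss rates; it only assumes that a bigger cache lowers the miss rate.

```python
def average_access_time(hit_time_ns, miss_rate, miss_penalty_ns):
    """Average memory access time in nanoseconds."""
    return hit_time_ns + miss_rate * miss_penalty_ns


# Hypothetical numbers: 2 ns on-chip hit, 60 ns off-chip access.
small_cache = average_access_time(2.0, 0.10, 60.0)
large_cache = average_access_time(2.0, 0.05, 60.0)

print(f"small cache: {small_cache:.1f} ns average")
print(f"large cache: {large_cache:.1f} ns average")
```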

  5. I/O devices.

    For interfacing I/O devices to the CPU, a synchronous protocol (one that includes a clock among the control lines) is more efficient than an asynchronous one. In a synchronous interface, data and addresses are transmitted relative to the clock. Since little or no logic is needed to decide what to do next, a synchronous interface can be both fast and inexpensive. Its disadvantage is that the bus cannot be long, because of the clock-skew problem.
    An asynchronous interface does not need a clock. Instead, self-timed handshaking protocols are used between sender and receiver.

    Most I/O devices today are interrupt-driven, i.e. the CPU does nothing for the I/O device until the device notifies it by sending an interrupt request (IRQ). The first computers used polling, a simple scheme in which the CPU periodically checks status bits to see whether it is time for the next I/O operation. Since the CPU is much faster than any I/O device, polling wastes CPU time. In general-purpose applications, interrupts are the key to multitasking operating systems and good response times.
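The cost of polling compared with interrupts can be estimated as the fraction of CPU cycles each one consumes. All numbers in the sketch below are assumptions chosen for illustration: a 500 MHz CPU, a device that must be polled 100,000 times per second but has data ready only 5% of the time, and per-event costs of 400 cycles for a poll and 500 cycles for an interrupt.

```python
def cpu_fraction(events_per_second, cycles_per_event, clock_hz):
    """Fraction of CPU cycles spent servicing the given event rate."""
    return events_per_second * cycles_per_event / clock_hz


CLOCK_HZ = 500_000_000   # hypothetical CPU clock
POLL_RATE = 100_000      # polls per second needed so no data is missed
ACTIVE_SHARE = 0.05      # share of polls that actually find data ready

# Polling pays on every check, whether or not the device has data.
polling = cpu_fraction(POLL_RATE, 400, CLOCK_HZ)

# Interrupts cost more per event but occur only when data is ready.
interrupts = cpu_fraction(POLL_RATE * ACTIVE_SHARE, 500, CLOCK_HZ)

print(f"polling:    {polling:.1%} of CPU time")
print(f"interrupts: {interrupts:.1%} of CPU time")
```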

    Since I/O events often involve block transfers, direct memory access (DMA) hardware is added to many computer systems. With DMA, the I/O device acts as a bus master and transfers a large number of words to or from memory without intervention by the CPU.
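A rough count of CPU cycles shows why DMA helps with block transfers. The sketch below is a simplified model with invented cycle costs: it assumes the CPU spends a fixed number of cycles per word when it moves the data itself, and only a fixed setup and completion cost when a DMA controller does the work. It ignores any slowdown from the CPU and the DMA device competing for the memory bus.

```python
def cpu_cycles_programmed_io(words: int, cycles_per_word: int) -> int:
    """CPU cycles when the CPU copies every word itself."""
    return words * cycles_per_word


def cpu_cycles_dma(setup_cycles: int, completion_cycles: int) -> int:
    """CPU cycles when a DMA controller moves the block (size-independent)."""
    return setup_cycles + completion_cycles


BLOCK_WORDS = 1024  # hypothetical block size

by_cpu = cpu_cycles_programmed_io(BLOCK_WORDS, cycles_per_word=20)
by_dma = cpu_cycles_dma(setup_cycles=500, completion_cycles=500)

print("CPU-driven transfer:", by_cpu, "CPU cycles")
print("DMA transfer:       ", by_dma, "CPU cycles")
```

In this model the CPU cost of a DMA transfer stays constant as the block grows, so the advantage increases with block size.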

What is ahead?
  • Greater instruction-level parallelism?
  • Bigger caches?
  • Multiple CPUs?
  • Complete systems on a chip?
  • High-performance LANs?

References and links:
  • Computer system architecture, Morris Mano, third edition, Prentice Hall
  • PC Platform Mindshare class (June 2001)
  • Understanding PCI-Bus subtleties optimizes system performance, Paul Schreier, EE, Feb 2000
  • Microcomputer architecture, Ivao Moricita, Tokyo, 1994
  • Quenching processor thirst, Electronics World, May 2001
  • Speed searching: Tweaking Windows for rich-media performance, EDN, Oct 30, 2003