Lesson 10 of 48 · Intermediate

BIU and EU: The Dual-Engine Design

How the 8086 fetches and executes at the same time — the birth of instruction pipelining

Open interactive version (quiz + challenge)

Real-world analogy

Imagine a restaurant with two workers: a waiter (BIU) and a chef (EU). The waiter goes to the market to fetch ingredients and places them in a prep queue (the instruction queue). Meanwhile, the chef grabs ingredients from the queue and cooks dishes. The waiter does not wait for the chef to finish — he keeps fetching ahead. The chef does not wait for the waiter — he grabs whatever is queued. This overlap means meals (instructions) come out faster than if one person did both jobs sequentially.

What is it?

The 8086 processor is internally split into two concurrent units: the Bus Interface Unit (BIU) and the Execution Unit (EU). The BIU handles all communication with external memory and I/O — it fetches instruction bytes into a 6-byte FIFO queue and services data read/write requests. The EU pulls instructions from this queue, decodes them, and executes them using the ALU and registers. By working in parallel, the BIU pre-fetches while the EU executes, creating a primitive instruction pipeline that significantly improves throughput compared to a single fetch-then-execute cycle.

Real-world relevance

The BIU/EU pipeline concept directly evolved into modern CPU pipelines. Your laptop's CPU pipeline has stages for fetch, decode, rename, schedule, execute, and retire — all descendants of the 8086's original two-stage idea. In embedded systems, engineers who program 8086-compatible chips must understand pipeline stalls to write efficient code, especially in real-time applications like motor controllers and sensor systems where every microsecond counts.

Key points

  - The BIU handles all bus activity: it fetches instruction bytes into a 6-byte FIFO queue and services the EU's data reads and writes
  - The EU pulls bytes from the queue, decodes them, and executes them using the ALU and registers; it never drives the bus directly
  - Register-only instructions let the BIU keep pre-fetching, so the queue stays full
  - Instructions with memory operands force the BIU to pause pre-fetching and run a data bus cycle
  - Jumps, calls, and returns flush the queue, discarding everything pre-fetched past the branch

Code example

; Demonstrating how BIU pre-fetching affects performance
; Sequential arithmetic — BIU stays ahead of EU
.MODEL SMALL
.STACK 100h
.DATA
  val1 DW 1000
  val2 DW 2000
  result DW ?

.CODE
MAIN PROC
  MOV AX, @DATA
  MOV DS, AX

  ; --- Fast: EU works from queue, BIU fetches ahead ---
  MOV AX, 5        ; EU executes, BIU pre-fetches
  ADD AX, 10       ; likely already in queue
  SUB AX, 3        ; likely already in queue
  MOV BX, AX       ; likely already in queue

  ; --- Slower: Memory access forces BIU to pause fetch ---
  MOV AX, [val1]   ; BIU must do data read (stall fetch)
  ADD AX, [val2]   ; another data read (stall again)
  MOV [result], AX  ; data write (stall again)

  ; --- Slowest: Jump flushes the entire queue ---
  JMP continue      ; queue flushed! up to 6 prefetched bytes discarded
  NOP               ; may be prefetched, but never executed
  NOP
continue:
  MOV AH, 4Ch
  INT 21h
MAIN ENDP
END MAIN

Line-by-line walkthrough

  1. MOV AX, @DATA / MOV DS, AX — standard data segment setup. The BIU fetches these instruction bytes while the EU is idle at startup
  2. MOV AX, 5 / ADD AX, 10 / SUB AX, 3 / MOV BX, AX — register-only operations. The EU executes each from the queue while the BIU freely pre-fetches ahead. No bus conflicts because these instructions only use internal registers
  3. MOV AX, [val1] — the EU asks the BIU to read val1 from memory. The BIU pauses pre-fetching, performs the data read, delivers the value to the EU, then resumes fetching
  4. ADD AX, [val2] — another memory read. The BIU again pauses pre-fetching to service this request. If the queue ran low during the previous stall, the EU may have to wait
  5. MOV [result], AX — a memory write. The BIU handles the write cycle, again pausing instruction pre-fetch
  6. JMP continue — the EU decodes a jump. The instruction queue is completely flushed because all pre-fetched bytes are for the wrong address. The BIU must start fresh from the 'continue' label
  7. NOP / NOP — these instructions were sequentially next in memory and may have been in the queue, but the JMP discarded them. They are never executed
  8. MOV AH, 4Ch / INT 21h — program termination via DOS. The BIU fetches the INT 21h vector from the interrupt vector table

Spot the bug

; Programmer expects this to be fast
; because all values fit in registers
MOV AX, [array]     ; read from memory
MOV BX, [array+2]   ; read from memory
ADD AX, BX          ; register add
MOV [result], AX     ; write to memory
JMP done             ; jump forward
done:
  MOV AH, 4Ch
  INT 21h
Need a hint?
Count how many times the BIU must pause pre-fetching. Is this really a 'register-fast' routine?
Show answer
Despite the programmer's comment, this code is NOT fast. It causes 3 BIU stalls (two memory reads and one memory write) plus a queue flush from JMP. Only the ADD AX, BX truly benefits from the pipeline. To make it faster, load all needed values into registers at the start, do all computation, then store results — minimizing interleaved memory accesses that stall the pipeline.
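One possible restructuring is sketched below (label names and the DOS exit sequence follow the buggy snippet above). The reads are already batched before the computation, so the real win is dropping the pointless forward jump, which removes the queue flush:

```asm
; Sketch: same work, fewer pipeline penalties.
; The two reads and one write still stall the BIU,
; but the useless JMP (and its queue flush) is gone.
MOV AX, [array]     ; memory read (stall 1)
MOV BX, [array+2]   ; memory read (stall 2)
ADD AX, BX          ; register-only; BIU refills the queue
MOV [result], AX    ; memory write (stall 3)
MOV AH, 4Ch         ; fall straight through to the exit
INT 21h
```

Three stalls remain because the data genuinely lives in memory, but the BIU never has to throw away pre-fetched bytes.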

Explain like I'm 5

Imagine you are building with LEGO bricks. If you had to walk to the shelf, grab a brick, walk back, place it, then walk back for the next brick — it would take forever. Now imagine your friend stands by the shelf and keeps passing you bricks while you build. You never stop building, and your friend never stops grabbing. That is what the BIU and EU do: one keeps fetching instructions while the other keeps executing them.

Fun fact

The 6-byte instruction queue of the 8086 was a carefully chosen size. Most 8086 instructions are 1-6 bytes long, so the queue can typically hold at least one complete instruction ahead. Intel engineers found that a larger queue would not significantly improve performance because memory-accessing instructions (which stall the BIU) occur frequently enough to keep the queue from ever filling far ahead.
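To make that 1-6 byte range concrete, here are a few representative instructions with their encoded sizes. The byte counts follow the standard 8086 encoding; treat the exact opcode values as a sketch to check against an opcode table:

```asm
INC AX                          ; 1 byte  (40h)
MOV AX, 5                       ; 3 bytes (B8h + 16-bit immediate)
MOV AX, [1234h]                 ; 3 bytes (A1h + 16-bit direct address)
MOV WORD PTR [BX+1234h], 5678h  ; 6 bytes (opcode + ModRM + disp16 + imm16)
```

So a full 6-byte queue might hold one large instruction, or several small ones.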

Hands-on challenge

Draw a timing diagram showing 6 clock cycles where the BIU and EU operate in parallel. Show a scenario with: (1) three register-only instructions where the queue stays full, (2) a MOV AX, [mem] instruction where the BIU stalls pre-fetching to service the data read, and (3) a JMP that flushes the queue. Label which unit is active in each cycle and what it is doing.
