Lesson 5 of 48 beginner

Fetch-Decode-Execute Cycle

The heartbeat of every processor — the three-step loop that runs millions of times per second

Real-world analogy

Imagine you work at a restaurant. Each order follows the same cycle: (1) Fetch — the waiter picks up the next order slip from the queue, (2) Decode — the chef reads the slip to understand what dish to make, (3) Execute — the chef actually cooks the dish. Then the waiter grabs the next slip and the cycle repeats endlessly, all day long. A microprocessor works exactly the same way with instructions instead of order slips.

What is it?

The Fetch-Decode-Execute cycle is the fundamental operating loop of every CPU. In the Fetch phase, the processor reads the next instruction from memory at the address held in the Instruction Pointer (on the 8086, the offset IP within the code segment CS). In the Decode phase, the control unit interprets the opcode to determine the operation and its operands. In the Execute phase, the CPU performs the operation. The cycle repeats for every instruction the processor runs. The 8086 improves on this basic cycle with a 6-byte prefetch queue that overlaps fetching with execution for better performance.
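
The loop can be sketched in a few lines of Python. This is an illustrative toy, not a model of real 8086 hardware: it understands only the four encodings used by the example program later in this lesson, and the names `fetch`, `decode`, `execute`, and `run` are invented for the sketch. For simplicity it advances IP right after decoding, once the instruction length is known.

```python
# Toy fetch-decode-execute loop (hypothetical sketch). It knows only four
# 8086 encodings: MOV AX,imm16 / MOV BX,imm16 / ADD AX,BX / HLT.

def fetch(memory, ip):
    """FETCH: read the opcode byte that IP points to."""
    return memory[ip]

def decode(memory, ip, opcode):
    """DECODE: work out the operation, its operands, and its length in bytes."""
    if opcode in (0xB8, 0xBB):                        # MOV reg16, imm16
        reg = "AX" if opcode == 0xB8 else "BX"
        imm = memory[ip + 1] | (memory[ip + 2] << 8)  # little-endian immediate
        return ("MOV", reg, imm, 3)
    if opcode == 0x01 and memory[ip + 1] == 0xD8:     # ADD AX, BX
        return ("ADD", "AX", "BX", 2)
    if opcode == 0xF4:                                # HLT
        return ("HLT", None, None, 1)
    raise ValueError(f"opcode {opcode:02X}h is not part of this toy model")

def execute(regs, op, dst, src):
    """EXECUTE: perform the operation; return False once the CPU halts."""
    if op == "MOV":
        regs[dst] = src
    elif op == "ADD":
        regs[dst] = (regs[dst] + regs[src]) & 0xFFFF  # keep the result 16-bit
    elif op == "HLT":
        return False
    return True

def run(memory, ip):
    """Repeat fetch, decode, execute until HLT; return the registers and IP."""
    regs = {"AX": 0, "BX": 0}
    running = True
    while running:
        opcode = fetch(memory, ip)
        op, dst, src, length = decode(memory, ip, opcode)
        ip += length                  # IP now points at the next instruction
        running = execute(regs, op, dst, src)
    return regs, ip

# The lesson's example program, loaded at offset 0100h.
program = bytes([0xB8, 0x05, 0x00, 0xBB, 0x03, 0x00, 0x01, 0xD8, 0xF4])
memory = bytearray(0x200)
memory[0x100:0x100 + len(program)] = program

regs, ip = run(memory, 0x100)
print(regs, hex(ip))
```

Keeping the three phases as separate functions mirrors the description above: `fetch` only reads, `decode` only interprets, and `execute` is the only step that changes register state.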

Real-world relevance

Every program you have ever run, from a web browser to a video game, was ultimately executed as billions of fetch-decode-execute cycles. When an 8086-based PC powers on, the very first cycle fetches the instruction at physical address FFFF0h (the reset vector). From that moment until shutdown the cycle keeps running, pausing only while the CPU is halted. Modern CPUs run this loop billions of times per second and add sophisticated features such as out-of-order execution, branch prediction, and speculative execution, but the fundamental three-step loop is the same one the 8086 used in 1978.
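
The address FFFF0h follows from 8086 segmented addressing: the physical address is the segment value shifted left four bits plus the offset, and the 8086 leaves reset with CS = FFFFh and IP = 0000h. A minimal sketch of that arithmetic (the function name `physical_address` is chosen just for this illustration):

```python
def physical_address(segment, offset):
    """8086 physical address: segment * 16 + offset, wrapped to 20 bits."""
    return ((segment << 4) + offset) & 0xFFFFF

reset_vector = physical_address(0xFFFF, 0x0000)
print(f"{reset_vector:05X}h")   # FFFF0h
```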

Key points

The Three-Step Cycle — Every instruction the CPU runs goes through three fundamental phases: Fetch (retrieve the instruction from memory), Decode (figure out what the instruction means), and Execute (perform the operation). This cycle repeats for every single instruction in your program, millions of times per second.
Fetch Phase — Getting the Instruction — During the fetch phase, the 8086 combines CS and IP into a 20-bit physical address (CS × 16 + IP) and places it on the address bus. Memory responds by placing the instruction bytes stored at that address onto the data bus. The CPU reads those bytes into its instruction register (on the 8086, by way of the prefetch queue) and advances IP to point to the next instruction.
Decode Phase — Understanding the Instruction — The control unit examines the opcode in the instruction register and determines: what operation to perform (ADD, MOV, JMP, etc.), which registers or memory locations are involved, and how many additional bytes (operands) need to be read. The decoder activates the correct internal data paths.
Execute Phase — Performing the Operation — The CPU carries out the decoded instruction. This could mean: transferring data between registers, sending values to the ALU for arithmetic, reading or writing memory, or modifying the instruction pointer for a jump. The result is stored in the destination specified by the instruction.
The Role of the Instruction Pointer — The IP (Instruction Pointer) is the CPU's bookmark. It holds the offset, within the current code segment, of the next instruction to fetch. As instruction bytes are fetched, IP advances automatically. Control-transfer instructions (JMP, JE, CALL) load a new value into IP, changing the flow of execution.
Instruction Pipelining in 8086 — The 8086 has a 6-byte prefetch queue managed by the Bus Interface Unit (BIU). While the Execution Unit (EU) is executing the current instruction, the BIU fetches the next instructions ahead of time. This overlap means the CPU rarely has to wait for a fetch — a primitive form of pipelining.
What Happens on a Branch — When a jump or conditional branch is executed, the prefetch queue becomes invalid because the CPU is going to a different address. The BIU must flush the queue and start fetching from the new target address. This causes a brief stall known as a pipeline flush or branch penalty.
Bus Interface Unit vs Execution Unit — The 8086 CPU is internally divided into two independent units. The BIU handles all communication with the outside world: fetching instructions, reading memory, writing memory. The EU handles decoding and executing instructions. They work in parallel, communicating through the instruction queue.
Cycle Timing and Clock Dependence — Every instruction takes a specific number of clock cycles, and bus activity is counted in T-states, each one clock period long. A register-to-register MOV takes 2 clocks, while an ADD with a memory source operand takes 9 clocks plus the time needed to compute the effective address. The CPU clock speed determines how fast these cycles complete. An 8086 at 5 MHz completes one T-state every 200 nanoseconds, so the 2-clock MOV takes 400 ns.
Interrupts and the Cycle — The CPU checks for pending interrupts at the end of each instruction's execute phase (not mid-instruction). If a maskable interrupt (INTR) is pending and IF=1, or a non-maskable interrupt (NMI) arrives, the CPU pushes the flags, CS, and IP onto the stack, clears IF and TF, and loads a new CS:IP from the interrupt vector table. The next fetch-decode-execute cycle then begins at the handler.
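
The interrupt sequence in the last key point can be sketched as follows. This is a simplified model, assuming a flat `bytearray` for memory and a plain dictionary for the registers (both invented for the sketch), and the handler address 1234:5678 is a made-up example. It uses two documented 8086 details: each vector table entry is four bytes at physical address type × 4 (offset word first, then segment word), and the CPU clears IF and TF after pushing the flags.

```python
IF_MASK = 0x0200  # interrupt-enable flag, bit 9 of FLAGS
TF_MASK = 0x0100  # trap flag, bit 8 of FLAGS

def read_word(mem, addr):
    """Read a little-endian 16-bit word."""
    return mem[addr] | (mem[addr + 1] << 8)

def push_word(mem, regs, value):
    """Push a 16-bit value: SP -= 2, then store it at SS:SP."""
    regs["SP"] = (regs["SP"] - 2) & 0xFFFF
    addr = ((regs["SS"] << 4) + regs["SP"]) & 0xFFFFF
    mem[addr] = value & 0xFF
    mem[addr + 1] = (value >> 8) & 0xFF

def enter_interrupt(mem, regs, int_type):
    """Model what happens between two instructions when an interrupt is accepted."""
    push_word(mem, regs, regs["FLAGS"])
    regs["FLAGS"] &= ~(IF_MASK | TF_MASK) & 0xFFFF   # clear IF and TF
    push_word(mem, regs, regs["CS"])
    push_word(mem, regs, regs["IP"])
    vector = int_type * 4                            # 4 bytes per table entry
    regs["IP"] = read_word(mem, vector)              # offset word comes first
    regs["CS"] = read_word(mem, vector + 2)          # then the segment word

mem = bytearray(1 << 20)                             # 1 MiB address space
mem[0x20:0x24] = bytes([0x78, 0x56, 0x34, 0x12])     # vector 08h -> 1234:5678
regs = {"CS": 0x0000, "IP": 0x0108, "SS": 0x0000, "SP": 0x1000, "FLAGS": 0x0202}

enter_interrupt(mem, regs, 0x08)
print(f"{regs['CS']:04X}:{regs['IP']:04X}")          # 1234:5678
```

The next fetch then uses the new CS:IP, so the handler runs through the same three-step cycle as any other code.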

Code example

; Complete fetch-decode-execute trace for a small program
; Running on 8086, starting at CS:IP = 0000:0100

; Program in memory:
; 0100h: B8 05 00    ; MOV AX, 0005h
; 0103h: BB 03 00    ; MOV BX, 0003h
; 0106h: 01 D8       ; ADD AX, BX
; 0108h: F4          ; HLT

; --- Cycle 1: MOV AX, 0005h ---
; FETCH:   IP=0100h, read bytes B8 05 00 from memory
; DECODE:  B8 = MOV AX, imm16 -> need 2 more bytes (05 00)
; EXECUTE: AX <- 0005h, IP advanced to 0103h

; --- Cycle 2: MOV BX, 0003h ---
; FETCH:   IP=0103h, read bytes BB 03 00 from memory
; DECODE:  BB = MOV BX, imm16 -> need 2 more bytes (03 00)
; EXECUTE: BX <- 0003h, IP advanced to 0106h

; --- Cycle 3: ADD AX, BX ---
; FETCH:   IP=0106h, read bytes 01 D8 from memory
; DECODE:  01 = ADD r/m16, r16; D8 = ModR/M for AX, BX
; EXECUTE: ALU computes 0005h + 0003h = 0008h
;          AX <- 0008h, flags updated (ZF=0, CF=0)
;          IP advanced to 0108h

; --- Cycle 4: HLT ---
; FETCH:   IP=0108h, read byte F4 from memory
; DECODE:  F4 = HLT (halt processor)
; EXECUTE: CPU enters halt state, cycle stops
;          (only RESET, NMI, or INTR with IF=1 can wake it)

Line-by-line walkthrough

1. Our program starts at address 0100h. The CPU's Instruction Pointer (IP) is set to 0100h, ready to begin the first fetch.
2. Cycle 1 FETCH: The BIU puts the physical address for CS:IP (00100h) on the address bus. Because the 8086 data bus is 16 bits wide, memory supplies the three bytes B8 05 00 over two bus cycles. These are loaded into the instruction register.
3. Cycle 1 DECODE: The decoder sees opcode B8h and recognizes it as 'MOV AX, immediate 16-bit value.' It knows to read the next 2 bytes (05 00) as the value 0005h (little-endian).
4. Cycle 1 EXECUTE: The value 0005h is loaded into register AX. IP is advanced by 3 bytes (instruction length) to 0103h.
5. Cycle 2 follows the same pattern for MOV BX, 0003h — BX gets loaded with 0003h, IP advances to 0106h.
6. Cycle 3 FETCH: Bytes 01 D8 are fetched from address 0106h. In practice, the BIU has probably already prefetched them into the 6-byte queue while the earlier instructions were executing.
7. Cycle 3 DECODE: Opcode 01h means 'ADD r/m16, r16'. The ModR/M byte D8h specifies the destination is AX and the source is BX.
8. Cycle 3 EXECUTE: The ALU adds AX (0005h) + BX (0003h) = 0008h. The result goes into AX. Flags are updated: ZF=0 (not zero), CF=0 (no carry).
9. Cycle 4: The HLT instruction (F4h) is fetched, decoded, and executed. The CPU enters halt state and stops the fetch-decode-execute cycle until an interrupt or reset occurs.
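
Step 7 depends on splitting the ModR/M byte into its three bit fields. A small sketch of that split for the register-to-register case (the helper name `decode_modrm` is made up here; the register numbering is the standard 8086 16-bit encoding):

```python
REG16 = ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"]  # register codes 0-7

def decode_modrm(byte):
    """Split a ModR/M byte into mod (2 bits), reg (3 bits), and r/m (3 bits)."""
    mod = (byte >> 6) & 0b11
    reg = (byte >> 3) & 0b111
    rm = byte & 0b111
    return mod, reg, rm

mod, reg, rm = decode_modrm(0xD8)   # D8h = 11 011 000 in binary
# mod = 11b means the r/m field names a register, not a memory operand.
# For opcode 01h (ADD r/m16, r16), r/m is the destination and reg is the source.
print(f"ADD {REG16[rm]}, {REG16[reg]}")   # ADD AX, BX
```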

Spot the bug

; Program to add numbers 1+2+3+4+5 = 15
; Expected: AX = 000Fh (15 decimal)

MOV AX, 0000h    ; sum = 0
MOV CX, 0005h    ; counter = 5

LOOP_START:
  ADD AX, CX     ; sum += counter
  DEC CX         ; counter--
  JNZ LOOP_START ; jump if counter != 0

; BUG: What if we replaced JNZ with JMP?
; JMP LOOP_START  ; unconditional jump

Hint

Think about what happens to the fetch-decode-execute cycle if the jump is unconditional. When would CX reach zero, and would the loop ever check?

Answer

Bug: If JNZ is replaced with JMP (unconditional jump), the loop never terminates. Even though DEC CX eventually makes CX = 0, JMP does not check any flags — it always jumps back to LOOP_START. CX would decrement past 0 to FFFFh, then keep going forever. The fetch-decode-execute cycle would endlessly repeat the loop body. JNZ works correctly because it checks the Zero Flag (ZF) set by DEC — when CX becomes 0, ZF=1 and JNZ does NOT jump, allowing execution to fall through and end.
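
The difference is easy to see by simulating the loop. The sketch below models only the three instructions in the loop body. The function name `run_loop` and the step limit are assumptions added for the sketch; the limit exists purely so the JMP variant returns instead of hanging.

```python
def run_loop(conditional, max_steps=1000):
    """Simulate ADD AX,CX / DEC CX / jump. Returns (AX, CX, terminated)."""
    ax, cx = 0x0000, 0x0005
    for _ in range(max_steps):
        ax = (ax + cx) & 0xFFFF        # ADD AX, CX
        cx = (cx - 1) & 0xFFFF         # DEC CX (wraps from 0000h to FFFFh)
        zf = 1 if cx == 0 else 0       # DEC sets ZF when the result is zero
        if conditional and zf == 1:    # JNZ falls through when ZF = 1
            return ax, cx, True
        # JMP ignores ZF and always goes back to LOOP_START
    return ax, cx, False               # step limit reached: the loop never exited

print(run_loop(conditional=True))      # (15, 0, True)
print(run_loop(conditional=False)[2])  # False
```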

Explain like I'm 5

Think of a robot following a recipe book. Step 1 (Fetch): The robot looks at the next line in the recipe. Step 2 (Decode): The robot reads and understands what that line says — like 'add 2 cups of flour.' Step 3 (Execute): The robot actually scoops and adds the flour. Then it moves its finger to the next line and starts over. The robot does this for every single line until the recipe is done. That's exactly what a CPU does with instructions!

Fun fact

The 8086's 6-byte prefetch queue was a notable feature in 1978: it made the 8086 one of the first commercial microprocessors to overlap instruction fetching with execution. This simple form of pipelining gave the 8086 a significant speed advantage over the 8085, which had to wait for each fetch to complete before executing. Modern CPUs have taken this idea much further, with pipelines more than a dozen stages deep!

Hands-on challenge

Write a small 8086 assembly program (5-8 instructions) and trace the complete fetch-decode-execute cycle for each instruction. For each cycle, write down: (1) the value of IP before the fetch, (2) the bytes fetched from memory, (3) what the decoder determines, and (4) what changes during execution (registers, flags, memory). Include at least one conditional jump to show what happens to the prefetch queue when a branch is taken vs not taken.

