Lesson 5 of 48 beginner

Fetch-Decode-Execute Cycle

The heartbeat of every processor — the three-step loop that runs millions of times per second

Real-world analogy

Imagine you work at a restaurant. Each order follows the same cycle: (1) Fetch — the waiter picks up the next order slip from the queue, (2) Decode — the chef reads the slip to understand what dish to make, (3) Execute — the chef actually cooks the dish. Then the waiter grabs the next slip and the cycle repeats endlessly, all day long. A microprocessor works exactly the same way with instructions instead of order slips.

What is it?

The Fetch-Decode-Execute cycle is the fundamental operating loop of every CPU. In the Fetch phase, the processor retrieves the next instruction from memory using the Instruction Pointer. In the Decode phase, the control unit interprets the opcode to determine what operation and operands are involved. In the Execute phase, the CPU performs the actual operation. This three-step cycle repeats for every instruction the processor runs. The 8086 enhances this basic cycle with a 6-byte prefetch queue that overlaps fetching and execution for better performance.

Real-world relevance

Every program you have ever run, from a web browser to a video game, was ultimately executed as billions of fetch-decode-execute cycles. When an 8086-based PC boots, the very first cycle fetches the instruction at physical address FFFF0h (the reset vector, CS:IP = FFFF:0000). From that moment on, the cycle keeps running until the processor is halted or powered off. Modern CPUs run this loop billions of times per second and add sophisticated features like out-of-order execution, branch prediction, and speculative execution, but the fundamental three-step loop is the same one the 8086 used in 1978.
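
The reset vector address follows from how the 8086 combines a segment and an offset. The sketch below is a minimal illustration in Python, not code that runs on the 8086; the function name physical_address is made up for this example.

```python
def physical_address(segment: int, offset: int) -> int:
    """Return the 20-bit physical address for an 8086 segment:offset pair."""
    # The segment is shifted left by 4 bits (multiplied by 16), the offset
    # is added, and the result wraps at 1 MB because there are 20 address lines.
    return ((segment << 4) + offset) & 0xFFFFF


# After RESET the 8086 starts with CS = FFFFh and IP = 0000h.
reset_vector = physical_address(0xFFFF, 0x0000)
print(f"{reset_vector:05X}h")  # prints FFFF0h
```

The same function gives 00100h for the 0000:0100 starting point used in the code example in this lesson.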

Code example

; Complete fetch-decode-execute trace for a small program
; Running on 8086, starting at CS:IP = 0000:0100

; Program in memory:
; 0100h: B8 05 00    ; MOV AX, 0005h
; 0103h: BB 03 00    ; MOV BX, 0003h
; 0106h: 01 D8       ; ADD AX, BX
; 0108h: F4          ; HLT

; --- Cycle 1: MOV AX, 0005h ---
; FETCH:   IP=0100h, read bytes B8 05 00 from memory
; DECODE:  B8 = MOV AX, imm16 -> need 2 more bytes (05 00)
; EXECUTE: AX <- 0005h, IP advanced to 0103h

; --- Cycle 2: MOV BX, 0003h ---
; FETCH:   IP=0103h, read bytes BB 03 00 from memory
; DECODE:  BB = MOV BX, imm16 -> need 2 more bytes (03 00)
; EXECUTE: BX <- 0003h, IP advanced to 0106h

; --- Cycle 3: ADD AX, BX ---
; FETCH:   IP=0106h, read bytes 01 D8 from memory
; DECODE:  01 = ADD r/m16, r16; D8 = ModR/M for AX, BX
; EXECUTE: ALU computes 0005h + 0003h = 0008h
;          AX <- 0008h, flags updated (ZF=0, CF=0)
;          IP advanced to 0108h

; --- Cycle 4: HLT ---
; FETCH:   IP=0108h, read byte F4 from memory
; DECODE:  F4 = HLT (halt processor)
; EXECUTE: CPU enters halt state, cycle stops
;          (only RESET, NMI, or INTR with IF=1 can wake it)
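
The trace above can be reproduced with a small simulation. This is a rough Python sketch that understands only the four encodings used in this program (B8, BB, 01 D8, F4); the function name run, the register dictionary, and the 512-byte memory are assumptions made for illustration, and the prefetch queue is not modeled.

```python
def run(memory, ip=0x0100, max_steps=16):
    """Trace a tiny subset of 8086 opcodes: B8, BB, 01 D8 and F4."""
    regs = {"AX": 0, "BX": 0}
    flags = {"ZF": 0, "CF": 0}
    trace = []
    for _ in range(max_steps):
        opcode = memory[ip]                        # FETCH the opcode byte at IP
        if opcode in (0xB8, 0xBB):                 # DECODE: MOV r16, imm16
            reg = "AX" if opcode == 0xB8 else "BX"
            imm = memory[ip + 1] | (memory[ip + 2] << 8)  # little-endian
            regs[reg] = imm                        # EXECUTE: load the register
            ip += 3
            trace.append(f"MOV {reg}, {imm:04X}h")
        elif opcode == 0x01 and memory[ip + 1] == 0xD8:   # DECODE: ADD AX, BX
            total = regs["AX"] + regs["BX"]        # EXECUTE: ALU addition
            regs["AX"] = total & 0xFFFF
            flags["CF"] = int(total > 0xFFFF)
            flags["ZF"] = int(regs["AX"] == 0)
            ip += 2
            trace.append("ADD AX, BX")
        elif opcode == 0xF4:                       # DECODE: HLT
            ip += 1
            trace.append("HLT")
            break                                  # EXECUTE: stop the loop
        else:
            raise ValueError(f"opcode {opcode:02X}h is outside this sketch")
    return regs, flags, ip, trace


program = bytes.fromhex("B8 05 00 BB 03 00 01 D8 F4")
memory = bytearray(0x0200)
memory[0x0100:0x0100 + len(program)] = program    # load at offset 0100h

regs, flags, ip, trace = run(memory)
for step, text in enumerate(trace, start=1):
    print(f"Cycle {step}: {text}")
print(f"AX={regs['AX']:04X}h BX={regs['BX']:04X}h IP={ip:04X}h {flags}")
```

Running it should leave AX = 0008h and BX = 0003h with ZF and CF both clear, matching the hand trace.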

Line-by-line walkthrough

  1. Our program starts at address 0100h. The CPU's Instruction Pointer (IP) is set to 0100h, ready to begin the first fetch.
  2. Cycle 1 FETCH: The BIU (Bus Interface Unit) puts the address formed from CS:IP (0000:0100) on the address bus. Memory responds with bytes B8 05 00, which enter the instruction queue on their way to the decoder.
  3. Cycle 1 DECODE: The decoder sees opcode B8h and recognizes it as 'MOV AX, immediate 16-bit value.' It knows to read the next 2 bytes (05 00) as the value 0005h (little-endian, low byte first).
  4. Cycle 1 EXECUTE: The value 0005h is loaded into register AX. IP is advanced by 3 bytes (the instruction length) to 0103h.
  5. Cycle 2 follows the same pattern for MOV BX, 0003h: BX is loaded with 0003h and IP advances to 0106h.
  6. Cycle 3 FETCH: Bytes 01 D8 are fetched from address 0106h. Because the BIU prefetches while earlier instructions execute, these bytes may already be waiting in the 6-byte queue.
  7. Cycle 3 DECODE: Opcode 01h means 'ADD r/m16, r16'. The ModR/M byte D8h specifies that the destination is AX and the source is BX.
  8. Cycle 3 EXECUTE: The ALU adds AX (0005h) + BX (0003h) = 0008h. The result goes into AX. Flags are updated: ZF=0 (not zero), CF=0 (no carry).
  9. Cycle 4: The HLT instruction (F4h) is fetched, decoded, and executed. The CPU enters the halt state and stops the fetch-decode-execute cycle until an interrupt or reset occurs.
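
The two decode steps in the walkthrough (reading a little-endian immediate and splitting the ModR/M byte) can be checked with a few lines of Python. This is an illustrative sketch: the helper names split_modrm and decode_add_rm16_r16 are invented here, and only the register-to-register form (mod = 11) is handled.

```python
# 16-bit register names in 8086 encoding order (000 = AX ... 111 = DI).
REG16 = ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"]


def split_modrm(byte):
    """Split a ModR/M byte into its mod (2 bits), reg (3), and r/m (3) fields."""
    return (byte >> 6) & 0b11, (byte >> 3) & 0b111, byte & 0b111


def decode_add_rm16_r16(modrm):
    """Decode the operand byte of opcode 01h (ADD r/m16, r16)."""
    mod, reg, rm = split_modrm(modrm)
    if mod != 0b11:
        raise ValueError("memory operands are outside this sketch")
    # For opcode 01h the r/m field is the destination and reg is the source.
    return f"ADD {REG16[rm]}, {REG16[reg]}"


# Bytes 05 00 in memory form the 16-bit value 0005h (low byte first).
imm16 = int.from_bytes(bytes([0x05, 0x00]), "little")

print(split_modrm(0xD8))          # prints (3, 3, 0)
print(decode_add_rm16_r16(0xD8))  # prints ADD AX, BX
print(f"{imm16:04X}h")            # prints 0005h
```

D8h is 11 011 000 in binary, so mod = 11 (register operand), reg = 011 (BX), and r/m = 000 (AX).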

Spot the bug

; Program to add numbers 1+2+3+4+5 = 15
; Expected: AX = 000Fh (15 decimal)

MOV AX, 0000h    ; sum = 0
MOV CX, 0005h    ; counter = 5

LOOP_START:
  ADD AX, CX     ; sum += counter
  DEC CX         ; counter--
  JNZ LOOP_START ; jump if counter != 0

; BUG: What if we replaced JNZ with JMP?
; JMP LOOP_START  ; unconditional jump

Hint: Think about what happens to the fetch-decode-execute cycle if the jump is unconditional. When would CX reach zero, and would the loop ever check?

Answer: If JNZ is replaced with JMP (unconditional jump), the loop never terminates. Even though DEC CX eventually makes CX = 0, JMP does not check any flags; it always jumps back to LOOP_START. CX would decrement past 0 to FFFFh, then keep going forever. The fetch-decode-execute cycle would endlessly repeat the loop body. JNZ works correctly because it checks the Zero Flag (ZF) set by DEC: when CX becomes 0, ZF=1 and JNZ does NOT jump, allowing execution to fall through and end.
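
The difference between the two jumps can be seen by modeling the loop at register level. The Python sketch below is only an illustration: run_loop and its max_iterations safety cap are made up for this example, since a real JMP loop would never stop on its own.

```python
def run_loop(use_jnz, max_iterations=20):
    """Model ADD AX, CX / DEC CX / (JNZ or JMP) with 16-bit registers."""
    ax, cx = 0x0000, 0x0005
    for iteration in range(1, max_iterations + 1):
        ax = (ax + cx) & 0xFFFF      # ADD AX, CX
        cx = (cx - 1) & 0xFFFF       # DEC CX (wraps from 0000h to FFFFh)
        zf = int(cx == 0)            # DEC sets ZF when the result is zero
        if use_jnz and zf:           # JNZ falls through when ZF = 1
            return ax, cx, iteration, True
        # JMP ignores ZF and always goes back to LOOP_START
    return ax, cx, max_iterations, False   # gave up: the loop did not exit


ax, cx, count, finished = run_loop(use_jnz=True)
print(f"JNZ: AX={ax:04X}h CX={cx:04X}h after {count} passes, finished={finished}")

ax, cx, count, finished = run_loop(use_jnz=False, max_iterations=6)
print(f"JMP: AX={ax:04X}h CX={cx:04X}h after {count} passes, finished={finished}")
```

With JNZ the loop exits after 5 passes with AX = 000Fh. With JMP, the sixth pass has already wrapped CX to FFFFh and the loop is still running when the cap is reached.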

Explain like I'm 5

Think of a robot following a recipe book. Step 1 (Fetch): The robot looks at the next line in the recipe. Step 2 (Decode): The robot reads and understands what that line says — like 'add 2 cups of flour.' Step 3 (Execute): The robot actually scoops and adds the flour. Then it moves its finger to the next line and starts over. The robot does this for every single line until the recipe is done. That's exactly what a CPU does with instructions!

Fun fact

The 8086's 6-byte prefetch queue was a notable innovation in 1978. The 8086 was one of the first microprocessors to overlap instruction fetching with execution. This simple form of pipelining gave it a significant speed advantage over the 8085, which had to wait for each fetch to complete before executing. Modern CPUs have taken this idea much further: high-performance x86 cores use pipelines well over ten stages deep, and the Pentium 4 (Prescott) reached 31 stages.
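
To get a feel for why overlapping helps, here is a toy timing model in Python. It is a deliberately simplified sketch, not a model of real 8086 bus timing: the per-instruction clock counts are illustrative values chosen for this example, and it assumes the next fetch can always run while the current instruction executes.

```python
def sequential_cycles(fetch, execute):
    """No overlap: every instruction is fetched, then executed, in turn."""
    return sum(fetch) + sum(execute)


def overlapped_cycles(fetch, execute):
    """Ideal two-stage overlap: fetch instruction i+1 while executing i."""
    total = fetch[0]                       # the first fetch cannot be hidden
    for i, exec_time in enumerate(execute):
        next_fetch = fetch[i + 1] if i + 1 < len(fetch) else 0
        total += max(exec_time, next_fetch)  # the slower of the two dominates
    return total


# Illustrative clock counts for four instructions (assumed values).
fetch_clocks = [4, 4, 4, 4]
execute_clocks = [4, 4, 3, 2]

print(sequential_cycles(fetch_clocks, execute_clocks))  # prints 29
print(overlapped_cycles(fetch_clocks, execute_clocks))  # prints 18
```

In this toy model the overlapped version finishes in 18 clocks instead of 29, because most fetches are hidden behind the execution of the previous instruction.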

Hands-on challenge

Write a small 8086 assembly program (5-8 instructions) and trace the complete fetch-decode-execute cycle for each instruction. For each cycle, write down: (1) the value of IP before the fetch, (2) the bytes fetched from memory, (3) what the decoder determines, and (4) what changes during execution (registers, flags, memory). Include at least one conditional jump to show what happens to the prefetch queue when a branch is taken vs not taken.
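
When tracing a conditional jump for this challenge, it may help to compute the new IP by hand. The Python sketch below shows the arithmetic for a short (8-bit displacement) jump. The helper names are invented, and the addresses assume the loop from 'Spot the bug' is assembled at 0100h, which would put LOOP_START at 0106h and the JNZ (bytes 75 FB) at 0109h.

```python
def sign_extend8(value):
    """Interpret an 8-bit displacement byte as a signed number (-128..127)."""
    return value - 0x100 if value & 0x80 else value


def ip_after_short_jump(ip, rel8, taken, length=2):
    """Return IP after a 2-byte conditional jump located at offset ip."""
    fall_through = (ip + length) & 0xFFFF   # IP already points past the jump
    if not taken:
        return fall_through                 # queue contents stay valid
    # Taken: the displacement is added to the fall-through IP, and the
    # 8086 discards the prefetch queue and refills it from the target.
    return (fall_through + sign_extend8(rel8)) & 0xFFFF


print(f"{ip_after_short_jump(0x0109, 0xFB, taken=True):04X}h")   # prints 0106h
print(f"{ip_after_short_jump(0x0109, 0xFB, taken=False):04X}h")  # prints 010Bh
```

The displacement FBh is -5, measured from the byte after the jump, which is why a taken JNZ lands back on 0106h.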
