Fetch-Decode-Execute Cycle
The heartbeat of every processor — the three-step loop that runs millions of times per second
What is it?
The Fetch-Decode-Execute cycle is the fundamental operating loop of every CPU. In the Fetch phase, the processor retrieves the next instruction from memory using the Instruction Pointer. In the Decode phase, the control unit interprets the opcode to determine what operation and operands are involved. In the Execute phase, the CPU performs the actual operation. This three-step cycle repeats for every instruction the processor runs. The 8086 enhances this basic cycle with a 6-byte prefetch queue that overlaps fetching and execution for better performance.
Real-world relevance
Every program you have ever run, from a web browser to a video game, was ultimately executed as billions of fetch-decode-execute cycles. When an 8086 comes out of reset, the very first cycle fetches the instruction at physical address FFFF0h (the reset vector, CS:IP = FFFF:0000). From that moment until power-off, the cycle keeps running, pausing only while the CPU is halted. Modern CPUs run this loop billions of times per second and add sophisticated features like out-of-order execution, branch prediction, and speculative execution, but the fundamental three-step loop is the same one the 8086 used in 1978.
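The reset vector address follows from how the 8086 forms a physical address out of a segment and an offset. The sketch below is a minimal illustration, not part of the original lesson; the function name `physical_address` is hypothetical.

```python
def physical_address(segment: int, offset: int) -> int:
    """8086 real-mode address: segment * 16 + offset, wrapped to 20 bits."""
    return ((segment << 4) + offset) & 0xFFFFF


# After reset the 8086 starts with CS = FFFFh and IP = 0000h.
reset_vector = physical_address(0xFFFF, 0x0000)
print(f"{reset_vector:05X}h")
```

The same calculation applied to CS:IP = 0000:0100 gives 00100h.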
Key points
- The Three-Step Cycle — Every instruction the CPU runs goes through three fundamental phases: Fetch (retrieve the instruction from memory), Decode (figure out what the instruction means), and Execute (perform the operation). This cycle repeats for every single instruction in your program, millions of times per second.
- Fetch Phase — Getting the Instruction — During the fetch phase, the CPU combines CS and the Instruction Pointer (IP) into a 20-bit physical address (CS × 16 + IP) and places it on the address bus. Memory responds by placing the instruction bytes stored at that address on the data bus. The CPU reads those bytes into its instruction register and advances IP to point to the next instruction.
- Decode Phase — Understanding the Instruction — The control unit examines the opcode in the instruction register and determines: what operation to perform (ADD, MOV, JMP, etc.), which registers or memory locations are involved, and how many additional bytes (operands) need to be read. The decoder activates the correct internal data paths.
- Execute Phase — Performing the Operation — The CPU carries out the decoded instruction. This could mean: transferring data between registers, sending values to the ALU for arithmetic, reading or writing memory, or modifying the instruction pointer for a jump. The result is stored in the destination specified by the instruction.
- The Role of the Instruction Pointer — The IP (Instruction Pointer) is the CPU's bookmark. It always holds the offset, within the current code segment, of the next instruction to fetch. After each fetch, IP is automatically advanced past the bytes just read. Control-transfer instructions (JMP, JE, CALL) modify IP directly, changing the flow of execution.
- Instruction Pipelining in 8086 — The 8086 has a 6-byte prefetch queue managed by the Bus Interface Unit (BIU). While the Execution Unit (EU) is executing the current instruction, the BIU fetches the next instructions ahead of time. This overlap means the CPU rarely has to wait for a fetch — a primitive form of pipelining.
- What Happens on a Branch — When a jump is executed or a conditional branch is taken, the prefetch queue becomes invalid because the bytes in it come from the fall-through path, not from the target. The BIU must flush the queue and start fetching from the new target address. This causes a brief stall known as a pipeline flush or branch penalty. A conditional branch that is not taken leaves the queue intact.
- Bus Interface Unit vs Execution Unit — The 8086 CPU is internally divided into two independent units. The BIU handles all communication with the outside world: fetching instructions, reading memory, writing memory. The EU handles decoding and executing instructions. They work in parallel, communicating through the instruction queue.
- Cycle Timing and Clock Dependence — Each phase of the fetch-decode-execute cycle takes a specific number of clock cycles (T-states). A simple register-to-register MOV takes 2 clocks, while an ADD from memory to a register takes 9 clocks plus the effective-address calculation time. The CPU clock speed determines how fast these cycles complete. An 8086 at 5 MHz completes one T-state every 200 nanoseconds.
- Interrupts and the Cycle — The CPU checks for pending interrupts at the end of each instruction's execute phase (not mid-instruction). If a maskable interrupt is pending and IF=1, or a non-maskable interrupt (NMI) is pending, the CPU pushes flags, CS, and IP onto the stack, clears IF and TF, and then loads CS:IP with the handler address from the interrupt vector table. The fetch-decode-execute cycle continues at the handler.
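Two of the points above come down to simple arithmetic: how long a T-state lasts at a given clock speed, and where in the interrupt vector table a handler address lives. The sketch below works both out. The function names are hypothetical, and the 9-clock figure is used as a lower bound because the effective-address time is left out.

```python
CLOCK_HZ = 5_000_000  # 5 MHz 8086, as in the timing point above


def t_state_ns(clock_hz: int) -> float:
    """Duration of one clock period (T-state) in nanoseconds."""
    return 1e9 / clock_hz


def instruction_time_ns(clocks: int, clock_hz: int = CLOCK_HZ) -> float:
    """Time taken by an instruction that needs `clocks` T-states."""
    return clocks * t_state_ns(clock_hz)


def vector_entry_address(interrupt_type: int) -> int:
    """Physical address of a vector table entry.

    The table starts at 00000h and each entry is 4 bytes:
    the handler's IP word first, then its CS word.
    """
    if not 0 <= interrupt_type <= 255:
        raise ValueError("interrupt type must be in the range 0..255")
    return interrupt_type * 4


print(f"T-state at 5 MHz: {t_state_ns(CLOCK_HZ):.0f} ns")
print(f"MOV reg, reg (2 clocks): {instruction_time_ns(2):.0f} ns")
print(f"ADD reg, mem (at least 9 clocks): {instruction_time_ns(9):.0f} ns")
print(f"Vector for INT 21h is stored at {vector_entry_address(0x21):05X}h")
```

Passing a different `clock_hz`, for example 8_000_000, shows how the same instruction gets faster purely because each T-state is shorter.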
Code example
; Complete fetch-decode-execute trace for a small program
; Running on 8086, starting at CS:IP = 0000:0100
; Program in memory:
; 0100h: B8 05 00 ; MOV AX, 0005h
; 0103h: BB 03 00 ; MOV BX, 0003h
; 0106h: 01 D8 ; ADD AX, BX
; 0108h: F4 ; HLT
; --- Cycle 1: MOV AX, 0005h ---
; FETCH: IP=0100h, read bytes B8 05 00 from memory
; DECODE: B8 = MOV AX, imm16 -> need 2 more bytes (05 00)
; EXECUTE: AX <- 0005h, IP advanced to 0103h
; --- Cycle 2: MOV BX, 0003h ---
; FETCH: IP=0103h, read bytes BB 03 00 from memory
; DECODE: BB = MOV BX, imm16 -> need 2 more bytes (03 00)
; EXECUTE: BX <- 0003h, IP advanced to 0106h
; --- Cycle 3: ADD AX, BX ---
; FETCH: IP=0106h, read bytes 01 D8 from memory
; DECODE: 01 = ADD r/m16, r16; D8 = ModR/M for AX, BX
; EXECUTE: ALU computes 0005h + 0003h = 0008h
; AX <- 0008h, flags updated (ZF=0, CF=0)
; IP advanced to 0108h
; --- Cycle 4: HLT ---
; FETCH: IP=0108h, read byte F4 from memory
; DECODE: F4 = HLT (halt processor)
; EXECUTE: CPU enters halt state, cycle stops
; (only RESET, NMI, or INTR can wake it)
Line-by-line walkthrough
- 1. Our program starts at address 0100h. The CPU's Instruction Pointer (IP) is set to 0100h, ready to begin the first fetch.
- 2. Cycle 1 FETCH: The BIU puts address 0100h on the address bus. Memory responds with bytes B8 05 00. These are loaded into the instruction register.
- 3. Cycle 1 DECODE: The decoder sees opcode B8h and recognizes it as 'MOV AX, immediate 16-bit value.' It knows to read the next 2 bytes (05 00) as the value 0005h (little-endian).
- 4. Cycle 1 EXECUTE: The value 0005h is loaded into register AX. IP is advanced by 3 bytes (instruction length) to 0103h.
- 5. Cycle 2 follows the same pattern for MOV BX, 0003h — BX gets loaded with 0003h, IP advances to 0106h.
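- 6. Cycle 3 FETCH: Bytes 01 D8 are read from address 0106h. In practice, the BIU has probably already prefetched them into the 6-byte queue while the earlier instructions were executing, so the EU takes them from the queue instead of waiting on memory.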
- 7. Cycle 3 DECODE: Opcode 01h means 'ADD r/m16, r16'. The ModR/M byte D8h specifies the destination is AX and the source is BX.
- 8. Cycle 3 EXECUTE: The ALU adds AX (0005h) + BX (0003h) = 0008h. The result goes into AX. Flags are updated: ZF=0 (not zero), CF=0 (no carry).
- 9. Cycle 4: The HLT instruction (F4h) is fetched, decoded, and executed. The CPU enters halt state and stops the fetch-decode-execute cycle until an interrupt or reset occurs.
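The walkthrough above can be replayed with a small software model of the loop. This is a teaching sketch only, assuming CS = 0000h and supporting just the four encodings used in the example (B8, BB, 01 D8, F4). It ignores the prefetch queue and timing, and the function name `run` is hypothetical.

```python
# Memory image of the example program, loaded at offset 0100h.
PROGRAM = bytes.fromhex("B80500" "BB0300" "01D8" "F4")
LOAD_OFFSET = 0x0100


def run(program: bytes = PROGRAM, start_ip: int = LOAD_OFFSET):
    memory = bytearray(0x10000)
    memory[start_ip:start_ip + len(program)] = program
    regs = {"AX": 0, "BX": 0}
    flags = {"ZF": 0, "CF": 0}
    ip = start_ip
    trace = []
    halted = False
    while not halted:
        # FETCH: read the opcode byte at IP and advance IP past it.
        fetch_ip = ip
        opcode = memory[ip]
        ip += 1
        # DECODE: the opcode decides how many more bytes belong to
        # the instruction. EXECUTE: update registers and flags.
        if opcode in (0xB8, 0xBB):  # MOV AX, imm16 / MOV BX, imm16
            dest = "AX" if opcode == 0xB8 else "BX"
            imm = memory[ip] | (memory[ip + 1] << 8)  # little-endian
            ip += 2
            regs[dest] = imm
            text = f"MOV {dest}, {imm:04X}h"
        elif opcode == 0x01 and memory[ip] == 0xD8:  # ADD AX, BX
            ip += 1
            total = regs["AX"] + regs["BX"]
            flags["CF"] = int(total > 0xFFFF)
            regs["AX"] = total & 0xFFFF
            flags["ZF"] = int(regs["AX"] == 0)
            text = "ADD AX, BX"
        elif opcode == 0xF4:  # HLT
            halted = True
            text = "HLT"
        else:
            raise ValueError(f"unsupported opcode {opcode:02X}h at {fetch_ip:04X}h")
        trace.append((fetch_ip, text))
    return regs, flags, ip, trace


regs, flags, ip, trace = run()
for addr, text in trace:
    print(f"{addr:04X}h  {text}")
print(f"AX={regs['AX']:04X}h BX={regs['BX']:04X}h ZF={flags['ZF']} CF={flags['CF']}")
```

Each pass through the `while` loop is one fetch-decode-execute cycle, and the recorded trace lists the same four fetch addresses as the walkthrough: 0100h, 0103h, 0106h, and 0108h.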
Spot the bug
; Program to add numbers 1+2+3+4+5 = 15
; Expected: AX = 000Fh (15 decimal)
MOV AX, 0000h ; sum = 0
MOV CX, 0005h ; counter = 5
LOOP_START:
ADD AX, CX ; sum += counter
DEC CX ; counter--
JNZ LOOP_START ; jump if counter != 0
; BUG: What if we replaced JNZ with JMP?
; JMP LOOP_START ; unconditional jump
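The effect of that substitution can be checked with a small model of the loop. The sketch below mirrors the three loop instructions with 16-bit wraparound. The function name `run_loop` and the `max_iterations` cap are assumptions added so the unconditional version can be observed without running forever.

```python
def run_loop(use_jnz: bool, max_iterations: int = 10):
    """Return (AX, CX, iterations, exited_normally)."""
    ax, cx = 0x0000, 0x0005          # MOV AX, 0000h / MOV CX, 0005h
    iterations = 0
    while iterations < max_iterations:
        ax = (ax + cx) & 0xFFFF      # ADD AX, CX
        cx = (cx - 1) & 0xFFFF       # DEC CX (sets ZF when the result is 0)
        zf = int(cx == 0)
        iterations += 1
        if use_jnz and zf:           # JNZ falls through once ZF=1
            return ax, cx, iterations, True
        # Otherwise control goes back to LOOP_START. JMP always does
        # this, whatever the value of ZF.
    return ax, cx, iterations, False


print(run_loop(use_jnz=True))
print(run_loop(use_jnz=False, max_iterations=6))
```

With JNZ the loop exits after five passes with AX = 000Fh. With JMP the zero flag is never consulted, so CX wraps from 0000h to FFFFh on the sixth pass and the loop keeps running while AX keeps accumulating.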