Lesson 7 of 48 intermediate

Clock, T-States, and Machine Cycles

How time is measured inside a processor — the rhythm that synchronizes every operation


Real-world analogy

Think of the CPU clock as a metronome in an orchestra. Every musician (register, ALU, bus) must act in sync with the beat. One tick of the metronome is a T-state — the smallest unit of time. Several ticks together form a measure (machine cycle), like reading a note from the sheet music or playing a chord. A complete musical phrase (instruction cycle) might need several measures. The faster the metronome, the faster the orchestra plays — that's clock speed.

What is it?

The CPU clock is a precise square wave signal that synchronizes all internal operations. A T-state is one clock cycle, the smallest unit of CPU time. A machine cycle (also called a bus cycle) is a group of at least four T-states that performs one bus operation (memory read, memory write, I/O read, I/O write, or interrupt acknowledge). An instruction cycle encompasses all the machine cycles needed to completely execute one instruction. The 8086 at 5 MHz has a T-state of 200 nanoseconds (1 / 5 MHz), and instruction timings range from 2 T-states to roughly 200, depending on complexity.
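
The link between clock frequency and T-state duration can be sketched in a few lines of Python. This is only an illustrative calculation; the function names `t_state_ns` and `instruction_time_ns` are made up for this example and do not come from any library.

```python
def t_state_ns(clock_hz):
    """Duration of one T-state (one clock period) in nanoseconds."""
    return 1e9 / clock_hz


def instruction_time_ns(t_states, clock_hz):
    """Time taken by an operation that needs `t_states` clock periods."""
    return t_states * t_state_ns(clock_hz)


if __name__ == "__main__":
    clock = 5_000_000  # 8086 at 5 MHz
    print("T-state:", t_state_ns(clock), "ns")
    # A minimal machine cycle is four T-states (T1-T4).
    print("4-state machine cycle:", instruction_time_ns(4, clock), "ns")
```

At 5 MHz this gives 200 ns per T-state, so a minimal four-state machine cycle lasts 800 ns.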

Real-world relevance

Clock speed is why you see 'GHz' in CPU specs. A modern 5 GHz processor has T-states of just 0.2 nanoseconds — a thousand times shorter than the 8086's 200 ns. However, clock speed alone does not determine performance — a modern CPU does far more work per T-state through pipelining, superscalar execution, and caching. Understanding T-states and machine cycles is essential for embedded systems where you must guarantee precise timing, such as controlling motors, generating signals, or meeting communication protocol deadlines.
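
To see why clock speed alone is a poor predictor of performance, the sketch below uses the standard relation: run time = instruction count x cycles per instruction (CPI) x clock period. The workload size and both CPI values are assumptions chosen purely for illustration, not measurements of any real chip, and `run_time_ns` is a hypothetical helper name.

```python
def run_time_ns(instructions, cycles_per_instruction, clock_hz):
    """Run time = instruction count * CPI * clock period (in ns)."""
    period_ns = 1e9 / clock_hz
    return instructions * cycles_per_instruction * period_ns


# Hypothetical workload of 1000 instructions.
# CPI figures are assumed for illustration only.
old_cpu = run_time_ns(1000, 10.0, 5_000_000)      # 5 MHz, 10 clocks per instruction
new_cpu = run_time_ns(1000, 0.25, 5_000_000_000)  # 5 GHz, 4 instructions per clock

if __name__ == "__main__":
    print("old:", old_cpu, "ns")
    print("new:", new_cpu, "ns")
    print("speed-up:", old_cpu / new_cpu)
```

With these assumed figures the clock accounts for a factor of 1000, and the lower CPI multiplies that by a further factor of 40.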

Key points

The Clock Signal: The CPU clock is a continuous rectangular wave oscillating between HIGH (1) and LOW (0). Every internal operation is synchronized to this signal. The 8086 uses an external 8284A clock generator that divides a crystal oscillator's frequency by three to produce the clock the CPU needs (a 15 MHz crystal gives a 5 MHz clock). The standard 8086 runs at 5 MHz (5 million cycles per second); the 8086-2 runs at 8 MHz.
T-States (the Atomic Unit of Time): A T-state (timing state) is one complete clock cycle, that is, one HIGH pulse plus one LOW pulse. It is the smallest measurable unit of CPU time. Every operation inside the 8086 takes a whole number of T-states. Nothing happens between T-states; all state changes occur at clock edges.
Machine Cycles: A machine cycle is a group of T-states that accomplishes one specific bus operation, such as reading a byte from memory, writing to an I/O port, or acknowledging an interrupt. The 8086 has several types of machine cycles: memory read, memory write, I/O read, I/O write, and interrupt acknowledge. Each takes a minimum of 4 T-states (T1-T4).
T1 (Address Phase): During T1, the CPU places the low 16 address bits on the multiplexed address/data bus (AD0-AD15) and the upper four bits on A16-A19. ALE pulses HIGH to tell external latches to capture this address. This is when the system learns which memory location or I/O port is being accessed.
T2 (Transition Phase): During T2, the bus changes direction. For a read, AD0-AD15 go to high-impedance (tri-state) so memory can drive data onto them. For a write, the CPU places data onto AD0-AD15. The appropriate control signal (RD or WR) is asserted during this state.
T3 (Data Transfer Phase): During T3, the actual data transfer occurs. For a read, the memory chip has had time to respond and valid data appears on the data bus. The CPU samples the READY pin at the beginning of T3. If READY is LOW, the CPU inserts wait states (Tw) between T3 and T4 until the memory signals that it is ready.
T4 (Completion Phase): During T4, the bus cycle wraps up. Control signals (RD or WR) are deasserted. For a read, the CPU has latched the data internally. The next machine cycle can begin at the following T1, or the bus may enter idle T-states (Ti) if no bus activity is needed.
Instruction Cycle = Multiple Machine Cycles: A complete instruction cycle consists of the machine cycles needed to fully execute an instruction. A simple MOV reg,reg needs no bus cycle beyond its own instruction fetch (the transfer is internal). A MOV AX,[mem] needs one memory read cycle. The 8086 has no MOV [mem],[mem] form; an instruction that both reads and writes memory, such as MOVSB or ADD [mem],AX, needs a read cycle and a write cycle. Complex instructions like MUL spend many additional T-states on internal work.
Calculating Execution Time: To find how long an instruction takes in real time, multiply its T-state count by the clock period. At 5 MHz, each T-state is 200 ns. If MOV BX,[2000h] takes 14 T-states (8 for the instruction plus 6 to calculate a direct address), its execution time is 14 * 200 ns = 2800 ns = 2.8 microseconds. This calculation is essential for writing time-critical code.
Wait States and System Performance: When slow memory or I/O devices cannot respond within the standard T1-T4 window, the READY signal forces wait states (Tw). Each Tw adds one full clock period of delay, so a single wait state stretches a 4-state bus cycle to 5 states (25% longer). Systems with zero wait states are fastest. Adding wait states is like adding speed bumps: the CPU still works correctly, but every memory access takes longer.
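
The T1-T4 sequence and the effect of wait states described in the key points can be modeled with a short Python sketch. It is a simplified illustration, not a signal-level simulation; `bus_cycle_states` and `bus_cycle_ns` are hypothetical helper names, and the 5 MHz default is simply the clock used throughout this lesson.

```python
def bus_cycle_states(wait_states=0):
    """Sequence of T-states for one bus cycle.

    Wait states (Tw) are inserted between T3 and T4.
    """
    return ["T1", "T2", "T3"] + ["Tw"] * wait_states + ["T4"]


def bus_cycle_ns(wait_states=0, clock_hz=5_000_000):
    """Duration of one bus cycle in nanoseconds."""
    period_ns = 1e9 / clock_hz
    return len(bus_cycle_states(wait_states)) * period_ns


if __name__ == "__main__":
    for tw in range(3):
        states = " ".join(bus_cycle_states(tw))
        print(f"{tw} wait state(s): {states} = {bus_cycle_ns(tw)} ns")
```

At 5 MHz a zero-wait bus cycle lasts 800 ns, and each Tw adds another 200 ns.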

Code example

; Timing analysis of a complete program on 8086 at 5 MHz
; Each T-state = 200 ns

; --- Program ---
; Instruction         T-states  Time (ns)  Machine Cycles
; -----------------------------------------------------------
MOV AX, 2000h      ;    4       800       opcode fetch only
MOV DS, AX         ;    2       400       internal
MOV CX, 0005h      ;    4       800       opcode fetch only
MOV BX, 0000h      ;    4       800       opcode fetch only

LOOP_START:
  MOV AX, [BX]     ;   13      2600       fetch + mem read
  ADD AX, 0001h    ;    4       800       fetch + internal
  MOV [BX], AX     ;   14      2800       fetch + mem write
  ADD BX, 0002h    ;    4       800       fetch + internal
  DEC CX           ;    2       400       internal
  JNZ LOOP_START   ;   16      3200       fetch (taken=16, not=4)
                    ;    4       800       (final iteration, not taken)

; Loop body per iteration (branch taken): 13+4+14+4+2+16 = 53 T-states
; Final iteration (branch not taken):     13+4+14+4+2+4  = 41 T-states
; Loop total: 4 * 53 + 1 * 41 = 253 T-states
; Setup: 4+2+4+4 = 14 T-states
;
; Total: 14 + 253 = 267 T-states
; Total time: 267 * 200 ns = 53,400 ns = 53.4 microseconds

Line-by-line walkthrough

1. Our timing analysis starts with setup instructions. MOV AX, 2000h takes 4 T-states because it needs to fetch the opcode and the immediate value. It is fast because no memory data read is needed beyond the instruction fetch.
2. MOV DS, AX takes only 2 T-states. It is an internal register-to-segment-register transfer, requiring no bus activity beyond the instruction fetch already in the queue.
3. Inside the loop, MOV AX, [BX] takes 13 T-states: 8 for the instruction itself, which includes one memory read machine cycle (T1-T2-T3-T4) to get the data, plus 5 for the effective-address calculation. The effective address is the offset in BX; the physical address placed on the bus is DS*16 + BX.
4. ADD AX, 0001h takes only 4 T-states because the immediate value is part of the instruction stream (already in the prefetch queue) and the addition happens internally in the ALU.
5. MOV [BX], AX takes 14 T-states (9 plus 5 for the effective-address calculation). It is similar to the read but includes a memory write machine cycle in which the data bus carries AX's value out to memory.
6. DEC CX is very fast at 2 T-states. It is entirely internal: the ALU decrements CX and updates FLAGS.
7. JNZ LOOP_START takes 16 T-states when the branch is taken (CX is not zero) because it must flush the prefetch queue and fetch from the new target address. On the final iteration when CX=0, the branch is not taken, costing only 4 T-states.
8. The total execution time of 53.4 microseconds demonstrates how even simple programs can be precisely timed when you know each instruction's T-state count. These counts assume the instruction bytes are already in the prefetch queue and that word operands sit at even addresses; a word access at an odd address costs 4 extra T-states.
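
The 13 T-states for MOV AX, [BX] follow the pattern used in Intel's timing tables for memory operands: a base count plus the effective-address (EA) calculation time. The sketch below encodes a subset of the EA clock values from those published tables. Treat the table as an assumption to check against the 8086 manual before relying on it; the names `EA_CLOCKS` and `mov_reg_mem_clocks` are hypothetical.

```python
# EA calculation clocks for some 8086 addressing modes.
# Values assumed from Intel's published timing tables; verify before use.
EA_CLOCKS = {
    "direct": 6,              # [2000h]
    "base_or_index": 5,       # [BX], [BP], [SI], [DI]
    "disp+base_or_index": 9,  # [BX+4]
    "bx+si": 7,               # [BX+SI] (also [BP+DI])
    "bx+di": 8,               # [BX+DI] (also [BP+SI])
}

MOV_REG_MEM_BASE = 8      # MOV reg, mem costs 8 + EA clocks
SEGMENT_OVERRIDE = 2      # extra clocks for a segment-override prefix


def mov_reg_mem_clocks(mode, segment_override=False):
    """T-states for MOV reg, mem using the given addressing mode."""
    clocks = MOV_REG_MEM_BASE + EA_CLOCKS[mode]
    if segment_override:
        clocks += SEGMENT_OVERRIDE
    return clocks


if __name__ == "__main__":
    for mode in EA_CLOCKS:
        print(f"MOV reg, mem ({mode}): {mov_reg_mem_clocks(mode)} T-states")
```

For the register-indirect mode used in the loop, this gives 8 + 5 = 13 T-states, matching the listing.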

Spot the bug

; Timing bug: Delay loop designed for 1 ms delay
; Clock: 5 MHz (T-state = 200 ns)
; Need: 1,000,000 ns / 200 ns = 5000 T-states

MOV CX, 1000     ; load counter
DELAY:
  NOP             ; 3 T-states
  NOP             ; 3 T-states
  DEC CX          ; 2 T-states
  JNZ DELAY       ; 16 T-states (taken)
                  ; Total per loop: 3+3+2+16 = 24 T-states
; Actual: 1000 * 24 = 24000 T-states = 4.8 ms
; BUG: The delay is 4.8 ms, not 1 ms!

Need a hint?

The loop body takes 24 T-states per iteration. If you need 5000 T-states total and each iteration costs 24, how many iterations do you actually need?

Show answer

Bug: The counter value is wrong. With 24 T-states per iteration and a target of 5000 T-states, you need 5000 / 24 ≈ 208 iterations, not 1000. With CX=1000, the delay is 1000 * 24 * 200 ns = 4,800,000 ns = 4.8 ms, almost 5 times too long. Fix: Change MOV CX, 1000 to MOV CX, 208. This gives 208 * 24 = 4992 T-states = 998,400 ns, which is approximately 1 ms. This estimate counts every JNZ as taken. In practice the final JNZ falls through (4 T-states instead of 16) and MOV CX, 208 itself costs 4 T-states, so the true total is 4984 T-states. For an exact 1 ms, pad the remaining T-states with a few extra instructions, such as NOP (3 T-states each), after the loop.
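
The arithmetic in the answer can be checked with a small Python sketch. It assumes the T-state counts quoted in the listing (24 per iteration, 16 for a taken JNZ, 4 for a not-taken JNZ); the function names are hypothetical.

```python
def iterations_for_delay(target_ns, per_iteration_t_states, clock_hz=5_000_000):
    """Nearest whole number of loop iterations for a target delay."""
    period_ns = 1e9 / clock_hz
    target_t_states = target_ns / period_ns
    return round(target_t_states / per_iteration_t_states)


def loop_t_states(iterations, per_iteration_t_states, taken=16, not_taken=4):
    """Total T-states of a DEC/JNZ loop.

    Every iteration is costed with a taken branch, then the saving from
    the final fall-through JNZ is subtracted.
    """
    return iterations * per_iteration_t_states - (taken - not_taken)


if __name__ == "__main__":
    n = iterations_for_delay(1_000_000, 24)   # 1 ms target, 24 T-states per pass
    total = loop_t_states(n, 24)
    print("iterations:", n)
    print("loop T-states:", total, "=", total * 200, "ns at 5 MHz")
```

For a 1 ms target this yields 208 iterations and 4980 T-states for the loop itself, before adding the 4 T-states for loading CX.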

Explain like I'm 5

Imagine a grandfather clock going tick-tock, tick-tock. Each tick-tock is one T-state — the smallest beat of time the CPU knows. To do something simple like picking up a toy, you need a few beats (a machine cycle): tick-put out your hand, tock-grab the toy, tick-pull it back, tock-done! To do something bigger like building a LEGO tower (an instruction), you need several of these grab-and-place actions, each taking a few beats. The faster the clock ticks, the faster you build!

Fun fact

The original 8086 ran at 5 MHz, meaning 5 million clock ticks per second. Today's fastest desktop processors reach around 5 GHz, roughly 1000 times the clock speed. But the real performance difference is much larger because modern CPUs execute multiple instructions per clock cycle (IPC > 1) on several cores at once, while the 8086 needed many clocks per instruction (IPC < 1). Taken together, a modern CPU can plausibly be on the order of 100,000 times faster than the 8086 in raw throughput!

Hands-on challenge

Write an 8086 assembly program that creates a precise time delay of approximately 100 microseconds, assuming a 5 MHz clock (200 ns per T-state). Calculate the exact number of T-states your delay loop takes per iteration, determine how many iterations are needed, and verify your math. Show the T-state count for each instruction in the loop. Hint: You need 100,000 ns / 200 ns = 500 T-states of delay.

