Clock, T-States, and Machine Cycles
How time is measured inside a processor — the rhythm that synchronizes every operation
What is it?
The CPU clock is a precise square wave signal that synchronizes all internal operations. A T-state is one clock cycle, the smallest unit of CPU time. A machine cycle is a group of four or more T-states that performs one bus operation (memory read, memory write, I/O read, I/O write, or interrupt acknowledge). An instruction cycle encompasses all the machine cycles needed to completely execute one instruction. The 8086 at 5 MHz has a T-state of 200 nanoseconds, and instructions range from 2 to 200+ T-states depending on complexity.
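The relationship between clock frequency and T-state length is a plain reciprocal. A minimal Python sketch, where the function name `t_state_ns` is only illustrative:

```python
def t_state_ns(clock_hz):
    """Length of one T-state (one clock period) in nanoseconds."""
    return 1e9 / clock_hz


# 8086 at 5 MHz: one T-state lasts 200 ns.
print(t_state_ns(5_000_000))

# The instruction range quoted above (2 to 200 T-states) in real time at 5 MHz.
print(2 * t_state_ns(5_000_000), "ns to", 200 * t_state_ns(5_000_000), "ns")
```

At 5 MHz the shortest instructions therefore finish in 400 ns, while a 200 T-state instruction takes 40 microseconds.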
Real-world relevance
Clock speed is why you see 'GHz' in CPU specs. A modern 5 GHz processor has T-states of just 0.2 nanoseconds — a thousand times shorter than the 8086's 200 ns. However, clock speed alone does not determine performance — a modern CPU does far more work per T-state through pipelining, superscalar execution, and caching. Understanding T-states and machine cycles is essential for embedded systems where you must guarantee precise timing, such as controlling motors, generating signals, or meeting communication protocol deadlines.
Key points
- The Clock Signal — The CPU clock is a continuous square wave oscillating between HIGH (1) and LOW (0). Every internal operation is synchronized to this signal. The 8086 uses an external 8284A clock generator that takes a crystal oscillator input and produces the precise clock signal the CPU needs. The 8086 runs at 5 MHz (5 million cycles per second) or 8 MHz for the 8086-2.
- T-States — The Atomic Unit of Time — A T-state (timing state) is one complete clock cycle — one HIGH pulse plus one LOW pulse. It is the smallest measurable unit of CPU time. Every operation inside the 8086 takes a whole number of T-states. Nothing happens between T-states; all state changes occur at clock edges.
- Machine Cycles — A machine cycle is a group of T-states that accomplishes one specific bus operation — like reading a byte from memory, writing to an I/O port, or acknowledging an interrupt. The 8086 has several types of machine cycles: memory read, memory write, I/O read, I/O write, and interrupt acknowledge. Each takes 4 T-states minimum (T1-T4).
- T1 — Address Phase — During T1, the CPU places the memory address on the address/data bus (AD0-AD15) and the high address on A16-A19. ALE pulses HIGH to tell external latches to capture this address. This is when the system learns which memory location or I/O port is being accessed.
- T2 — Transition Phase — During T2, the bus transitions. For a read, AD0-AD15 go to high-impedance (tri-state) so memory can drive data onto them. For a write, the CPU places data onto AD0-AD15. The appropriate control signal (RD or WR) is asserted during this state.
- T3 — Data Transfer Phase — During T3, actual data transfer occurs. For a read, the memory chip has had time to respond and valid data appears on the data bus. The CPU samples the READY pin at the beginning of T3 — if READY is LOW, the CPU inserts wait states (Tw) until the memory signals it is ready.
- T4 — Completion Phase — During T4, the bus cycle wraps up. Control signals (RD or WR) are deasserted. For a read, the CPU has latched the data internally. The bus returns to idle state, and the next machine cycle can begin at the following T1 — or the bus may enter idle T-states (Ti) if no bus activity is needed.
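- Instruction Cycle = Multiple Machine Cycles — A complete instruction cycle consists of all the machine cycles needed to fetch and fully execute an instruction. A simple MOV reg,reg needs no bus cycle beyond its instruction fetch (the transfer itself is internal). A MOV AX,[mem] adds one memory read cycle. The 8086 has no MOV [mem],[mem] form; an instruction that both reads and writes memory, such as ADD [mem],AX or MOVSB, needs a read cycle and a write cycle. Complex instructions like MUL take many T-states, most of them spent on internal execution rather than bus activity.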
- Calculating Execution Time — To find how long an instruction takes in real time, multiply its T-state count by the clock period. At 5 MHz, each T-state is 200 ns. If MOV AX,[2000h] takes 14 T-states, its execution time is 14 * 200 ns = 2800 ns = 2.8 microseconds. This calculation is essential for writing time-critical code.
- Wait States and System Performance — When slow memory or I/O devices cannot respond within the standard T1-T4 window, the READY signal forces wait states (Tw) between T3 and T4. Each Tw adds one full clock period, so a bus cycle with n wait states lasts 4 + n T-states. Systems with zero wait states are fastest. Adding wait states is like adding speed bumps: the CPU still works correctly, but every affected bus cycle takes longer and throughput drops accordingly.
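The bus-cycle structure and the timing arithmetic from the key points can be sketched in a few lines of Python. This is an illustrative model only: the function names are made up for this example, and the 5 MHz default simply mirrors the clock used throughout this page.

```python
def bus_cycle_states(wait_states=0):
    """Sequence of T-states in one bus cycle; Tw states sit between T3 and T4."""
    if wait_states < 0:
        raise ValueError("wait_states must be non-negative")
    return ["T1", "T2", "T3"] + ["Tw"] * wait_states + ["T4"]


def duration_ns(t_states, clock_hz=5_000_000):
    """Real time taken by a number of T-states at the given clock frequency."""
    return t_states * 1e9 / clock_hz


# Zero-wait-state bus cycle versus one with two wait states.
print(bus_cycle_states())
print(bus_cycle_states(2))
print(duration_ns(len(bus_cycle_states(2))), "ns per bus cycle with 2 wait states")

# The worked example from the text: 14 T-states at 5 MHz.
print(duration_ns(14), "ns")
```

With two wait states, every affected bus cycle grows from 800 ns to 1200 ns at 5 MHz, which is why slow memory has such a visible effect on memory-heavy code.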
Code example
; Timing analysis of a complete program on 8086 at 5 MHz
; Each T-state = 200 ns
; --- Program ---
; Instruction        T-states  Time (ns)  Machine cycles
; -----------------------------------------------------------
MOV AX, 2000h      ;  4         800       opcode fetch only
MOV DS, AX         ;  2         400       internal
MOV CX, 0005h      ;  4         800       opcode fetch only
MOV BX, 0000h      ;  4         800       opcode fetch only
LOOP_START:
MOV AX, [BX]       ; 13        2600       fetch + mem read  (8 + EA, EA = 5)
ADD AX, 0001h      ;  4         800       fetch + internal
MOV [BX], AX       ; 14        2800       fetch + mem write (9 + EA, EA = 5)
ADD BX, 0002h      ;  4         800       fetch + internal
DEC CX             ;  2         400       internal
JNZ LOOP_START     ; 16        3200       fetch (taken = 16, not taken = 4)
                   ;  4         800       (final iteration, not taken)
; Per iteration, branch taken:  13+4+14+4+2+16 = 53 T-states
; Final iteration, not taken:   13+4+14+4+2+4  = 41 T-states
; Loop total: 4 * 53 + 1 * 41 = 253 T-states
; Setup: 4+2+4+4 = 14 T-states
;
; Total: 14 + 253 = 267 T-states
; Total time: 267 * 200 ns = 53,400 ns = 53.4 microseconds
Line-by-line walkthrough
- 1. Our timing analysis starts with setup instructions. MOV AX, 2000h takes 4 T-states: the opcode and the immediate value both come from the instruction stream, so no memory data read is needed beyond the instruction fetch.
- 2. MOV DS, AX takes only 2 T-states. It is an internal register-to-segment-register transfer, requiring no bus activity beyond the instruction fetch already in the queue.
- 3. Inside the loop, MOV AX, [BX] takes 13 T-states. This covers decoding the instruction, calculating the address (effective address = BX, physical address = DS*16 + BX), and one memory read machine cycle (T1-T2-T3-T4) to get the actual data.
- 4. ADD AX, 0001h takes only 4 T-states because the immediate value is part of the instruction stream (already in the prefetch queue) and the addition happens internally in the ALU.
- 5. MOV [BX], AX takes 14 T-states, one more than the read. It includes a memory write machine cycle where the data bus carries AX's value out to memory.
- 6. DEC CX is very fast at 2 T-states. It is entirely internal: the ALU decrements CX and updates FLAGS.
- 7. JNZ LOOP_START takes 16 T-states when the branch is taken (CX is not zero) because it must flush the prefetch queue and fetch from the new target address. On the final iteration when CX=0, the branch is not taken, costing only 4 T-states.
- 8. The total execution time of 53.4 microseconds demonstrates how even simple programs can be precisely timed when you know each instruction's T-state count. These figures are the published clock counts, which assume instructions are already waiting in the prefetch queue, word operands at even addresses, and zero wait states.
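The 13 T-states quoted for MOV AX, [BX] can be read as a fixed base cost plus an effective-address (EA) penalty that depends on the addressing mode. The sketch below assumes the figures commonly listed in Intel's 8086 timing tables (register-from-memory MOV = 8 + EA, EA = 5 for a single base or index register, EA = 6 for a direct address); treat them as assumptions to verify against the manual. The dictionary keys and function name are invented for this example.

```python
# EA penalties in T-states for two addressing modes (assumed from Intel's tables).
EA_CLOCKS = {
    "base_or_index": 5,  # [BX], [BP], [SI], [DI]
    "direct": 6,         # [2000h]
}

MOV_REG_MEM_BASE = 8  # base cost of MOV reg, mem before the EA penalty


def mov_reg_mem_t_states(mode):
    """T-states for MOV reg, mem with the given addressing mode."""
    return MOV_REG_MEM_BASE + EA_CLOCKS[mode]


print(mov_reg_mem_t_states("base_or_index"))  # MOV AX, [BX]
print(mov_reg_mem_t_states("direct"))         # MOV AX, [2000h], general reg/mem encoding
```

The same split explains why a direct-address load is quoted at 14 T-states in the key points while the register-indirect load in the loop costs 13.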
Spot the bug
; Timing bug: Delay loop designed for 1 ms delay
; Clock: 5 MHz (T-state = 200 ns)
; Need: 1,000,000 ns / 200 ns = 5000 T-states
MOV CX, 1000 ; load counter
DELAY:
NOP ; 3 T-states
NOP ; 3 T-states
DEC CX ; 2 T-states
JNZ DELAY ; 16 T-states (taken)
; Total per loop: 3+3+2+16 = 24 T-states
; Expected: 1000 * 24 = 24000 T-states = 4.8 ms
; BUG: The delay is 4.8 ms, not 1 ms!
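One way to see where the delay goes wrong, and which counter value would land near 1 ms, is to model the loop directly. The Python sketch below uses the T-state counts from the comments above, plus two assumptions the comments leave out: 4 T-states for MOV CX, imm and 4 T-states for the final, not-taken JNZ. All names are illustrative.

```python
import math

CLOCK_HZ = 5_000_000
NOP, DEC_CX = 3, 2
JNZ_TAKEN, JNZ_NOT_TAKEN = 16, 4
MOV_CX_IMM = 4  # assumed cost of loading the counter

BODY = NOP + NOP + DEC_CX  # 8 T-states before the branch


def delay_t_states(count):
    """Total T-states for the delay loop with CX loaded with `count` (count >= 1)."""
    if count < 1:
        raise ValueError("count must be at least 1")
    return MOV_CX_IMM + (count - 1) * (BODY + JNZ_TAKEN) + (BODY + JNZ_NOT_TAKEN)


def min_count_for(target_t_states):
    """Smallest counter value whose delay is at least `target_t_states`."""
    fixed = MOV_CX_IMM + BODY + JNZ_NOT_TAKEN
    extra = max(0, target_t_states - fixed)
    return 1 + math.ceil(extra / (BODY + JNZ_TAKEN))


def to_microseconds(t_states):
    return t_states * 1_000_000 / CLOCK_HZ


print(delay_t_states(1000), "T-states with CX = 1000")
needed = min_count_for(5000)
print(needed, "iterations give", to_microseconds(delay_t_states(needed)), "microseconds")
```

Under these assumptions the counter should be around 209 rather than 1000. The exact model gives slightly fewer T-states than the 1000 * 24 estimate because the last JNZ falls through instead of branching.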