Data Sizes, Endianness, and Alignment
How the 8086 stores bytes and words in memory — little-endian order and alignment effects
What is it?
The 8086 handles data as bytes (8-bit) or words (16-bit) and stores multi-byte values in little-endian format — the least significant byte at the lowest address. Word-aligned accesses (even addresses) complete in one bus cycle; unaligned accesses take two. Assembler directives DB, DW, and DD define data in memory with proper sizing and initialization.
Real-world relevance
Little-endian is the byte order of every x86 processor and the default on most ARM systems (ARM cores can be configured for either order). When you inspect memory in a debugger on a PC, you see little-endian byte order. Network protocols use big-endian (called 'network byte order'), which is why socket programming provides functions like htons() and ntohs() to convert between host and network order. Alignment still matters: on modern x86 an unaligned access that crosses a cache-line boundary can be noticeably slower than an aligned one, and some instructions and architectures reject unaligned operands outright.
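The difference between host order and network order can be checked with a few lines of Python. This is a minimal sketch: the value 1234h is only an illustrative choice, and the result of htons() depends on the machine running the script, so that part is guarded by a byte-order check.

```python
import socket
import struct
import sys

value = 0x1234  # illustrative 16-bit value, e.g. a port number

# Explicit byte orders, independent of the machine running this script.
little = struct.pack("<H", value)   # little-endian, as x86 stores it
network = struct.pack("!H", value)  # network byte order (big-endian)

print(little.hex(" "))   # 34 12
print(network.hex(" "))  # 12 34

# htons() converts host order to network order, so its result depends on the host.
converted = socket.htons(value)
if sys.byteorder == "little":
    # On a little-endian host the two bytes are swapped.
    assert converted == 0x3412
else:
    # On a big-endian host htons() leaves the value unchanged.
    assert converted == value

# ntohs() undoes htons() on any host.
round_trip = socket.ntohs(converted)
print(hex(round_trip))   # 0x1234
```

Packing with an explicit "<" or "!" prefix is usually the safer habit, because the byte order is then visible in the code rather than implied by the host.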
Key points
- Byte (8-bit) vs Word (16-bit) — The 8086 is a 16-bit processor: its registers and data bus are 16 bits wide. A byte is 8 bits (00h to FFh). A word is 16 bits (0000h to FFFFh). Instructions can operate on either bytes or words, determined by the register used (AL=byte, AX=word).
- Little-Endian Byte Order — The 8086 uses little-endian storage: the least significant byte (LSB) is stored at the lower memory address, and the most significant byte (MSB) at the higher address. For word 1234h: byte 34h goes to address N, byte 12h goes to address N+1.
- Why Little-Endian? — Little-endian simplifies hardware for multi-precision arithmetic: adding two multi-byte numbers starts with the least significant byte, which is always at the base address. It also allows reading a byte value from the same address whether you treat it as a byte or word — the low byte is always at offset 0.
- Word Alignment — A word is 'aligned' if it is stored at an even address (divisible by 2). The 8086 can access unaligned words, but it needs two bus cycles instead of one, because the bus interface must transfer the two bytes separately and combine them. Each bus cycle takes at least four clock cycles, so the memory transfer for an unaligned word takes twice as long.
- DB Directive — Define Byte — DB (Define Byte) reserves and initializes one or more bytes in memory. Used for byte-sized variables, character strings, and lookup tables. Each value occupies exactly 1 byte. Strings are stored as consecutive ASCII bytes.
- DW Directive — Define Word — DW (Define Word) reserves and initializes 16-bit words. Each word occupies 2 bytes stored in little-endian order. Used for 16-bit variables, near pointers (offsets), and offset tables. A full segment:offset pair needs two words, or a single DD.
- DD Directive — Define Doubleword — DD (Define Doubleword) reserves 4 bytes (32 bits). Used for far pointers (segment:offset), 32-bit values, and floating-point data. Stored in little-endian: the least significant byte at the lowest address.
- DUP Operator — DUP repeats a value or pattern to initialize multiple bytes or words. '100 DUP(0)' creates 100 bytes of zeros. 'DUP(?)' leaves memory uninitialized. Essential for declaring arrays and buffers of known size.
- Byte Order in Instruction Encoding — Multi-byte immediate values and addresses inside instructions are also stored in little-endian order. MOV AX, 1234h encodes as B8 34 12: opcode B8 followed by the immediate value with its low byte first. Understanding this helps when reading memory dumps and debugging.
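The byte layouts described in the key points can be reproduced outside an assembler. The sketch below models DB, DW, DD, and DUP with Python's struct module; the helper names db, dw, dd, and dup are hypothetical stand-ins chosen for this example, not part of any assembler or library, and the far pointer 2000h:0100h is an arbitrary illustration.

```python
import struct

def db(*values):
    """Model DB: one byte per value; a str becomes its ASCII bytes."""
    out = bytearray()
    for v in values:
        if isinstance(v, str):
            out += v.encode("ascii")
        else:
            out += struct.pack("<B", v)
    return bytes(out)

def dw(*values):
    """Model DW: each value becomes 2 bytes, low byte first."""
    return b"".join(struct.pack("<H", v) for v in values)

def dd(*values):
    """Model DD: each value becomes 4 bytes, low byte first."""
    return b"".join(struct.pack("<I", v) for v in values)

def dup(count, item):
    """Model 'count DUP(item)': repeat an already-encoded item."""
    return item * count

my_byte = db(0x42)
my_word = dw(0x1234)
my_dword = dd(0x12345678)
my_buf = dup(100, db(0))

# A far pointer 2000h:0100h is laid out as offset word, then segment word.
far_ptr = dw(0x0100, 0x2000)

# MOV AX, 1234h: opcode B8h followed by the 16-bit immediate, low byte first.
mov_ax_imm = bytes([0xB8]) + dw(0x1234)

print(my_word.hex(" "))     # 34 12
print(my_dword.hex(" "))    # 78 56 34 12
print(far_ptr.hex(" "))     # 00 01 00 20
print(mov_ax_imm.hex(" "))  # b8 34 12
print(len(my_buf))          # 100
```

Comparing this output with a debugger's memory dump is a quick way to confirm that a data definition is laid out as intended.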
Code example
; Demonstrating endianness and alignment in 8086
; Setup
MOV AX, 2000h
MOV DS, AX ; DS = 2000h
; Store a word at an even (aligned) address
MOV AX, 0ABCDh
MOV [0100h], AX ; PA = 20100h
; Memory: 20100h = CDh (low byte)
; 20101h = ABh (high byte)
; Store a word at an odd (unaligned) address
; (0103h is used so this store does not overlap the word at 0100h)
MOV [0103h], AX ; PA = 20103h (UNALIGNED!)
; Memory: 20103h = CDh (low byte)
; 20104h = ABh (high byte)
; This takes 2 bus cycles instead of 1!
; Read back to verify endianness
MOV BX, [0100h] ; BX = ABCDh (reassembled)
MOV BL, [0100h] ; BL = CDh (just the low byte)
; Data definition examples
; (in data segment)
; my_byte DB 42h ; 1 byte
; my_word DW 1234h ; 2 bytes: 34h, 12h
; my_dword DD 12345678h ; 4 bytes: 78h, 56h, 34h, 12h
; my_buf DB 100 DUP(0) ; 100 zero bytes
; Verify with byte-by-byte read
; After MOV [0100h], AX where AX=ABCDh:
MOV AL, [0100h] ; AL = CDh (low byte at even addr)
MOV AH, [0101h] ; AH = ABh (high byte at odd addr)
; AX is now ABCDh again
Line-by-line walkthrough
- 1. MOV AX, 2000h / MOV DS, AX — sets up the data segment at 2000h. All memory accesses with DS will start from physical address 20000h.
- 2. MOV AX, 0ABCDh / MOV [0100h], AX — stores the word ABCDh at offset 0100h. Little-endian: CDh at 20100h, ABh at 20101h.
- 3. Address 0100h is even, so this word write completes in a single bus cycle — the full 16-bit data bus transfers both bytes at once.
- 4. MOV [0103h], AX — stores ABCDh at ODD address 0103h. This is unaligned! The 8086 must perform two separate byte transfers, doubling the bus time. Offset 0103h is used rather than 0101h so that the store does not overwrite the high byte of the word already at 0100h.
- 5. MOV BX, [0100h] — reads back the word from 0100h. The CPU reads CDh and ABh and reassembles them into BX = ABCDh automatically.
- 6. MOV BL, [0100h] — byte read. Only reads 1 byte: BL = CDh. Same address, but we get just the low byte of the word stored there.
- 7. DB, DW, DD directives tell the assembler how many bytes to reserve and how to arrange initial values in memory.
- 8. DUP(0) fills memory with zeros; DUP(?) reserves the space without giving it an initial value, which suits buffers that will be filled at runtime.
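The walkthrough can be replayed in a small memory model. The sketch below is a simplified illustration, not an emulator: the class name Memory8086 and its methods are hypothetical, bus cycles are counted as one per aligned word or byte and two per unaligned word, and the unaligned store uses offset 0103h so that it does not overlap the word at 0100h.

```python
class Memory8086:
    """Toy model of 8086 memory: little-endian words plus a bus-cycle counter."""

    def __init__(self):
        self.ram = bytearray(1 << 20)  # 1 MB, the 8086's 20-bit address space
        self.bus_cycles = 0

    @staticmethod
    def physical(segment, offset):
        # Physical address = segment * 16 + offset, wrapped to 20 bits.
        return ((segment << 4) + (offset & 0xFFFF)) & 0xFFFFF

    def write_word(self, segment, offset, value):
        lo = self.physical(segment, offset)
        hi = self.physical(segment, offset + 1)
        self.ram[lo] = value & 0xFF          # low byte at the lower address
        self.ram[hi] = (value >> 8) & 0xFF   # high byte at the next address
        self.bus_cycles += 1 if lo % 2 == 0 else 2

    def read_word(self, segment, offset):
        lo = self.physical(segment, offset)
        hi = self.physical(segment, offset + 1)
        self.bus_cycles += 1 if lo % 2 == 0 else 2
        return self.ram[lo] | (self.ram[hi] << 8)

    def read_byte(self, segment, offset):
        self.bus_cycles += 1
        return self.ram[self.physical(segment, offset)]


mem = Memory8086()
DS = 0x2000

mem.write_word(DS, 0x0100, 0xABCD)   # aligned store
aligned_cycles = mem.bus_cycles

mem.write_word(DS, 0x0103, 0xABCD)   # unaligned store, no overlap with 0100h
unaligned_cycles = mem.bus_cycles - aligned_cycles

print(hex(Memory8086.physical(DS, 0x0100)))  # 0x20100
print(mem.ram[0x20100:0x20102].hex(" "))     # cd ab
print(aligned_cycles, unaligned_cycles)      # 1 2
print(hex(mem.read_word(DS, 0x0100)))        # 0xabcd
print(hex(mem.read_byte(DS, 0x0100)))        # 0xcd
```

Changing the second store to an offset that overlaps the first word is a useful experiment: the read-back at 0100h then no longer returns the original value.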
Spot the bug
; Goal: Store and read back a 32-bit value
; DS = 3000h
;
MOV AX, 1234h
MOV [0100h], AX ; Store low word
MOV AX, 5678h
MOV [0101h], AX ; Store high word
;
; Programmer expects memory at 0100h-0103h:
; 34h 12h 78h 56h (representing 56781234h)
; Then reads back:
MOV AX, [0100h] ; Expects 1234h
MOV DX, [0102h] ; Expects 5678h
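One way to check your answer is to replay the two stores on a byte array and compare the result with what the programmer expects. The sketch below assumes a single flat 64 KB segment and uses hypothetical helper names (store_word, load_word); it prints the bytes and registers rather than stating the result, so run it after you have made your own guess.

```python
import struct

segment = bytearray(0x10000)  # one 64 KB segment, offsets 0000h-FFFFh

def store_word(offset, value):
    # Little-endian store: low byte at offset, high byte at offset + 1.
    struct.pack_into("<H", segment, offset, value)

def load_word(offset):
    # Little-endian load of the 16-bit word starting at offset.
    return struct.unpack_from("<H", segment, offset)[0]

# Replay the listing exactly as written.
store_word(0x0100, 0x1234)   # MOV [0100h], AX with AX = 1234h
store_word(0x0101, 0x5678)   # MOV [0101h], AX with AX = 5678h

actual = bytes(segment[0x0100:0x0104])
expected = bytes([0x34, 0x12, 0x78, 0x56])

print("expected:", expected.hex(" "))
print("actual:  ", actual.hex(" "))
print("AX = %04Xh, DX = %04Xh" % (load_word(0x0100), load_word(0x0102)))
print("layout matches expectation:", actual == expected)
```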