Lesson 18 of 48 intermediate

Data Sizes, Endianness, and Alignment

How the 8086 stores bytes and words in memory — little-endian order and alignment effects


Real-world analogy

Little-endian is like writing a number in two-digit chunks, starting with the last chunk: 1234 is written as '34 12', just as the 8086 stores the word 1234h as the byte 34h followed by the byte 12h. It looks backwards to humans, but it keeps the hardware simple. The least significant byte is always at the base address, whether the value is a byte, a word, or a doubleword, like always finding the ones digit at the front of the line.
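To see the chunk reversal concretely, the short Python sketch below lays out 1234h in both byte orders using the built-in `int.to_bytes`. It is an illustration added here, not 8086 code. It also checks the property the analogy relies on: the first byte of a little-endian word is its low byte.

```python
value = 0x1234

little = value.to_bytes(2, "little")  # 8086 order: low byte at the lower address
big = value.to_bytes(2, "big")        # big-endian order: high byte first

print(little.hex(" "))  # 34 12
print(big.hex(" "))     # 12 34

# The byte at the base address of a little-endian word is its low byte,
# just as MOV AL,[addr] after MOV [addr],AX picks up the low half of AX.
low_byte = little[0]
print(f"{low_byte:02X}h")  # 34h
```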

What is it?

The 8086 handles data as bytes (8 bits) or words (16 bits) and stores multi-byte values in little-endian format, with the least significant byte at the lowest address. On the 8086's 16-bit data bus, a word access at an even address completes in one bus cycle, while a word at an odd (unaligned) address needs two. The assembler directives DB (define byte), DW (define word), and DD (define doubleword) reserve and initialize data of the corresponding size.
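The one-versus-two bus cycle rule can be captured in a few lines. The model below is a simplification assumed for illustration, and `bus_cycles_8086` is a hypothetical helper name. It counts the even-aligned byte pairs that an access touches on the 8086's 16-bit bus, and it ignores wait states and instruction timing. It does not describe the 8088, whose 8-bit external bus needs two cycles for every word.

```python
def bus_cycles_8086(address: int, size: int) -> int:
    """Simplified count of 8086 bus cycles for a byte (size=1) or word (size=2) access.

    The 16-bit data bus moves one even-aligned pair of bytes per cycle, so the
    count is the number of such pairs the access touches. Wait states are ignored.
    """
    if size not in (1, 2):
        raise ValueError("this model covers byte and word accesses only")
    first_pair = address // 2
    last_pair = (address + size - 1) // 2
    return last_pair - first_pair + 1


for addr, size in [(0x0100, 2), (0x0101, 2), (0x0101, 1)]:
    kind = "word" if size == 2 else "byte"
    print(f"{kind} at {addr:04X}h: {bus_cycles_8086(addr, size)} bus cycle(s)")
```

A byte access always fits in one pair, so only word accesses at odd addresses pay the extra cycle.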

Real-world relevance

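All x86 processors are little-endian. ARM processors can run in either byte order but are almost always configured as little-endian today. As a result, whenever you inspect memory in a debugger on a PC, you see little-endian byte order. Internet protocols such as TCP/IP use big-endian order in their headers (called 'network byte order'). This is why socket APIs provide functions like htons() and ntohs() to convert between host and network order. Alignment still matters too. On modern x86, an unaligned load or store that crosses a cache-line boundary is slower than an aligned one, and some instructions (for example, SSE's MOVAPS) fault outright on misaligned operands.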
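Here is a minimal sketch of what htons() does, using Python's standard `struct` module. Format `<H` packs a 16-bit value little-endian (x86 host order), and `>H` packs it big-endian (network order). The helper `swap16` and the example port value are illustrative choices, not part of any socket API. Python's own `socket.htons` performs the same byte swap when the host is little-endian.

```python
import struct


def swap16(x: int) -> int:
    """Swap the two bytes of a 16-bit value (what htons/ntohs do on a little-endian host)."""
    return ((x & 0x00FF) << 8) | ((x >> 8) & 0x00FF)


port = 0x1F90  # 8080 decimal, used here as an example TCP port

host_bytes = struct.pack("<H", port)     # how an x86 stores the value in memory
network_bytes = struct.pack(">H", port)  # how it must appear in a TCP/IP header

print(host_bytes.hex(" "))     # 90 1f
print(network_bytes.hex(" "))  # 1f 90

# Reading the network-order bytes back as a little-endian word yields the swapped value.
assert struct.unpack("<H", network_bytes)[0] == swap16(port)
```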


Code example

; Demonstrating endianness and alignment in 8086

; Setup
MOV AX, 2000h
MOV DS, AX         ; DS = 2000h

; Store a word at an even (aligned) address
MOV AX, 0ABCDh
MOV [0100h], AX    ; PA = 20100h
; Memory: 20100h = CDh (low byte)
;         20101h = ABh (high byte)

; Store the same word at an odd (unaligned) address.
; 0103h keeps this copy clear of the word at 0100h-0101h.
MOV [0103h], AX    ; PA = 20103h (UNALIGNED!)
; Memory: 20103h = CDh (low byte)
;         20104h = ABh (high byte)
; This takes 2 bus cycles instead of 1!

; Read back to verify endianness
MOV BX, [0100h]    ; BX = ABCDh (reassembled)
MOV BL, [0100h]    ; BL = CDh (just the low byte)

; Data definition examples
; (in data segment)
; my_byte  DB  42h          ; 1 byte
; my_word  DW  1234h        ; 2 bytes: 34h, 12h
; my_dword DD  12345678h    ; 4 bytes: 78h, 56h, 34h, 12h
; my_buf   DB  100 DUP(0)   ; 100 zero bytes

; Verify with byte-by-byte read
; After MOV [0100h], AX where AX=ABCDh:
MOV AL, [0100h]    ; AL = CDh (low byte at even addr)
MOV AH, [0101h]    ; AH = ABh (high byte at odd addr)
; AX is now ABCDh again

Line-by-line walkthrough

  1. MOV AX, 2000h / MOV DS, AX sets up the data segment at 2000h. Every DS-relative access now starts from physical address 20000h (2000h × 16).
  2. MOV AX, 0ABCDh / MOV [0100h], AX stores the word ABCDh at offset 0100h. In little-endian order, CDh goes to 20100h and ABh goes to 20101h.
  3. Address 0100h is even, so this word write completes in a single bus cycle: the 16-bit data bus transfers both bytes at once. The segment base (DS × 16) is always a multiple of 16, so an even offset always gives an even physical address.
  4. MOV [0103h], AX stores ABCDh at the odd offset 0103h. This access is unaligned, so the 8086 performs two separate byte transfers, doubling the bus time. The offset is chosen so that this copy does not touch 20101h. Writing to 0101h instead would overwrite the ABh stored in step 2, which is the same overlap mistake explored in "Spot the bug".
  5. MOV BX, [0100h] reads the word back from 0100h. The CPU fetches CDh and ABh and automatically reassembles them into BX = ABCDh.
  6. MOV BL, [0100h] is a byte read that fetches only 1 byte, so BL = CDh. It uses the same address as step 5 but returns just the low byte of the word stored there.
  7. The DB, DW, and DD directives tell the assembler how many bytes to reserve per value (1, 2, or 4) and store initial values in little-endian order.
  8. DUP(0) fills the reserved memory with zeros. DUP(?) reserves the space without specifying initial contents, which suits buffers that the program fills at run time.
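The walkthrough can be replayed in a small Python model of real-mode memory. `Segment8086Memory` and `define` are hypothetical names used only for this sketch. The class computes physical addresses as DS * 16 + offset and stores words low byte first. The `define` function returns the bytes that a DB, DW, or DD directive (optionally with `count DUP(value)`) would emit. This is a simplified sketch: segment wraparound at offset FFFFh is deliberately not modeled.

```python
class Segment8086Memory:
    """Toy model of 8086 real-mode memory: 1 MB of bytes addressed as DS:offset."""

    def __init__(self, ds: int = 0) -> None:
        self.mem = bytearray(1 << 20)  # 1 MB, initially all zeros
        self.ds = ds

    def physical(self, offset: int) -> int:
        # PA = DS * 16 + offset, truncated to the 8086's 20 address bits.
        # DS * 16 is a multiple of 16, so PA is even exactly when offset is even.
        return ((self.ds << 4) + offset) & 0xFFFFF

    def store_word(self, offset: int, value: int) -> None:
        # Little-endian: low byte at the lower address, high byte right after it.
        self.mem[self.physical(offset)] = value & 0xFF
        self.mem[self.physical(offset + 1)] = (value >> 8) & 0xFF

    def load_word(self, offset: int) -> int:
        return self.mem[self.physical(offset)] | (self.mem[self.physical(offset + 1)] << 8)

    def load_byte(self, offset: int) -> int:
        return self.mem[self.physical(offset)]


def define(directive: str, value: int, count: int = 1) -> bytes:
    """Bytes emitted for `DB/DW/DD value`, or `DB/DW/DD count DUP(value)` when count > 1."""
    size = {"DB": 1, "DW": 2, "DD": 4}[directive]
    return value.to_bytes(size, "little") * count


mem = Segment8086Memory(ds=0x2000)        # MOV AX,2000h / MOV DS,AX
mem.store_word(0x0100, 0xABCD)            # MOV [0100h],AX with AX = ABCDh
print(f"{mem.physical(0x0100):05X}h")     # 20100h
print(f"{mem.load_word(0x0100):04X}h")    # ABCDh  (like MOV BX,[0100h])
print(f"{mem.load_byte(0x0100):02X}h")    # CDh    (like MOV BL,[0100h])
print(define("DD", 0x12345678).hex(" "))  # 78 56 34 12
```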

Spot the bug

; Goal: Store and read back a 32-bit value
; DS = 3000h
;
MOV AX, 1234h
MOV [0100h], AX     ; Store low word
MOV AX, 5678h
MOV [0101h], AX     ; Store high word
;
; Programmer expects memory at 0100h-0103h:
;   34h 12h 78h 56h (representing 56781234h)
; Then reads back:
MOV AX, [0100h]     ; Expects 1234h
MOV DX, [0102h]     ; Expects 5678h
Hint: What address does the second MOV write to, and does it overlap the first write?

Answer: The second write, MOV [0101h], AX, stores 5678h at offset 0101h, putting 78h at 0101h and 56h at 0102h. However, the first write had already put 12h at 0101h, so that byte is overwritten and lost. Memory ends up as 0100h = 34h, 0101h = 78h, and 0102h = 56h, while 0103h is never written (three bytes, not four). On read-back, MOV AX,[0100h] gives 7834h (wrong!). MOV DX,[0102h] reads 0102h-0103h, so DL = 56h and DH holds whatever byte happened to be at 0103h. Fix: write the high word to [0102h], not [0101h]. A word occupies 2 bytes, so two consecutive words belong at offsets 0100h and 0102h.
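The overlap is easy to replay in a few lines of Python. In this sketch, the function `replay` and its parameter are hypothetical. Memory is a dict from offset to byte, so a byte the snippet never writes shows up as missing rather than as a made-up value. A word load that touches such a byte returns None, standing in for "garbage".

```python
from typing import Dict, Optional, Tuple


def replay(high_word_offset: int) -> Tuple[Optional[int], Optional[int]]:
    """Run the snippet's two word stores, then return (AX, DX) loaded from 0100h and 0102h."""
    mem: Dict[int, int] = {}

    def store_word(offset: int, value: int) -> None:
        mem[offset] = value & 0xFF             # low byte first (little-endian)
        mem[offset + 1] = (value >> 8) & 0xFF  # high byte at the next offset

    def load_word(offset: int) -> Optional[int]:
        lo, hi = mem.get(offset), mem.get(offset + 1)
        if lo is None or hi is None:
            return None  # at least one byte was never written by this snippet
        return lo | (hi << 8)

    store_word(0x0100, 0x1234)            # low word
    store_word(high_word_offset, 0x5678)  # high word
    return load_word(0x0100), load_word(0x0102)


buggy_ax, buggy_dx = replay(0x0101)  # the original code
fixed_ax, fixed_dx = replay(0x0102)  # the corrected offset
print(f"buggy: AX={buggy_ax:04X}h, DX={buggy_dx}")         # buggy: AX=7834h, DX=None
print(f"fixed: AX={fixed_ax:04X}h, DX={fixed_dx:04X}h")    # fixed: AX=1234h, DX=5678h
```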

Explain like I'm 5

Imagine you want to store the number 1234 using two boxes. Little-endian means you put the '34' part (the small part) in the first box, and the '12' part (the big part) in the second box. When you want the number back, the computer knows to read both boxes and put them in the right order. It seems backwards, but it makes the computer's job easier because it always finds the small part first.

Fun fact

The term 'endianness' comes from Jonathan Swift's 'Gulliver's Travels' where wars were fought over which end of a boiled egg to crack open — the 'Big End' or the 'Little End.' Computer scientist Danny Cohen borrowed this analogy in 1980 to describe byte order disputes between different processor architectures, and the terms stuck!

Hands-on challenge

Given memory contents at DS:0100h = CDh, ABh, 34h, 12h, 78h, 56h, determine the value loaded by each instruction: (a) MOV AX,[0100h], (b) MOV AX,[0101h], (c) MOV AX,[0102h], (d) MOV AL,[0100h]. Then declare a data segment with the following variables: a byte variable 'status' initialized to 1, a word variable 'count' initialized to 1000, a doubleword 'far_addr' initialized to the far pointer 1234h:5678h (segment 1234h, offset 5678h), and a 64-byte buffer 'buf' filled with FFh. Show the exact memory layout in hex.
