Lesson 27 of 48 intermediate

String Instructions and REP Prefix

Process arrays and strings at hardware speed with the 8086's built-in bulk operations

Real-world analogy

String instructions are like a factory conveyor belt. The items (bytes/words) move past a workstation one at a time. MOVSB is a worker who picks an item from one belt (DS:SI) and places it on another (ES:DI). REP is the foreman who keeps the belt running until all items are processed (CX reaches 0). The Direction Flag determines whether the belt moves forward or backward.

What is it?

String instructions are a set of 8086 operations (MOVS, CMPS, SCAS, LODS, STOS) designed for efficient bulk processing of byte or word arrays. They automatically update SI and/or DI pointers based on the Direction Flag. Combined with REP/REPE/REPNE prefixes, they execute repeated operations entirely in hardware, providing high-speed memory copy, fill, compare, and search without explicit loop code.
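
To make the automatic pointer updates concrete, here is a minimal sketch of a hand-written loop that does roughly what REP MOVSB does when the Direction Flag is clear. The labels (src, dest, LEN, copy_loop, done) and the four-byte test string are assumptions made for illustration. Unlike the real instruction, this loop also overwrites AL and changes the arithmetic flags.

```asm
; Sketch: manual equivalent of CLD / REP MOVSB (illustrative names throughout)
.MODEL SMALL
.STACK 100h
.DATA
  src  DB 'ABCD'
  LEN  EQU $ - src       ; number of bytes in src (4)
  dest DB LEN DUP(0)

.CODE
MAIN PROC
  MOV AX, @DATA
  MOV DS, AX             ; source side is DS:SI
  MOV ES, AX             ; destination side is ES:DI

  LEA SI, src
  LEA DI, dest
  MOV CX, LEN
  JCXZ done              ; like REP, do nothing when CX = 0
copy_loop:
  MOV AL, [SI]           ; read one byte from DS:SI
  MOV ES:[DI], AL        ; write it to ES:DI
  INC SI                 ; DF = 0 means both pointers move forward
  INC DI
  LOOP copy_loop         ; CX = CX - 1, repeat while CX <> 0
done:
  MOV AH, 4Ch
  INT 21h
MAIN ENDP
END MAIN
```

The five instructions inside copy_loop collapse into the single instruction REP MOVSB, which is the whole point of the string instruction set.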

Real-world relevance

String instructions power the inner loops of operating systems. DOS uses REP MOVSB to copy file buffers, REP STOSB to zero memory during allocation, and REPNE SCASB to find string terminators. BIOS uses them to scroll the video buffer. Any time you need memcpy, memset, strcmp, or strchr in assembly, string instructions are the tool.
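
As a rough illustration of the memset and strcmp cases mentioned above, the sketch below fills a buffer with REP STOSB and then compares two short strings with REPE CMPSB. The names buf, strA, strB, same, and finished are hypothetical, and the compare uses a fixed length of 4 bytes rather than scanning for the terminator first.

```asm
; Sketch: memset-style fill and strcmp-style compare (illustrative names)
.MODEL SMALL
.STACK 100h
.DATA
  buf   DB 16 DUP(0FFh)
  strA  DB 'CAT', 0
  strB  DB 'CAR', 0
  same  DB ?             ; 1 = strings equal, 0 = strings differ

.CODE
MAIN PROC
  MOV AX, @DATA
  MOV DS, AX
  MOV ES, AX             ; STOSB and CMPSB both use ES:DI
  CLD                    ; forward direction

  ; memset(buf, 0, 16)
  LEA DI, buf
  MOV AL, 0              ; fill value
  MOV CX, 16             ; byte count
  REP STOSB              ; store AL at ES:DI, CX times

  ; compare 4 bytes of strA (DS:SI) with strB (ES:DI)
  LEA SI, strA
  LEA DI, strB
  MOV CX, 4
  REPE CMPSB             ; continue while bytes are equal and CX <> 0
  MOV same, 0            ; MOV leaves the flags untouched
  JNE finished           ; ZF = 0: a mismatch ended the compare
  MOV same, 1
finished:
  MOV AH, 4Ch
  INT 21h
MAIN ENDP
END MAIN
```

With these inputs the compare stops at the third byte ('T' versus 'R'), so same stays 0.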

Key points

Code example

; Program: Copy a string then find a character in it
.MODEL SMALL
.STACK 100h
.DATA
  src  DB 'Hello, 8086 World!', 0
  dest DB 20 DUP(0)
  pos  DW ?

.CODE
MAIN PROC
  MOV AX, @DATA
  MOV DS, AX
  MOV ES, AX         ; ES = DS (same segment)

  ; --- Part 1: Copy string ---
  LEA SI, src
  LEA DI, dest
  MOV CX, 19          ; 18 chars + null
  CLD                  ; forward direction
  REP MOVSB            ; copy src -> dest

  ; --- Part 2: Find '8' in dest ---
  LEA DI, dest
  MOV CX, 19
  MOV AL, '8'          ; search for '8'
  CLD
  REPNE SCASB          ; scan until match
  JNE not_found

  ; DI points one past '8', so subtract to get position
  LEA AX, dest
  SUB DI, AX
  DEC DI               ; DI = zero-based position
  MOV [pos], DI        ; pos = 7

not_found:
  MOV AH, 4Ch
  INT 21h
MAIN ENDP
END MAIN

Line-by-line walkthrough

  1. MOV ES, AX: set ES equal to DS because string destinations always use ES:DI
  2. LEA SI, src / LEA DI, dest: load the effective addresses of the source and destination strings into the index registers
  3. MOV CX, 19: set the repeat counter to 19 (18 characters plus the null terminator)
  4. CLD: clear the Direction Flag so string operations move forward (incrementing SI and DI)
  5. REP MOVSB: copy 19 bytes from DS:SI to ES:DI. Each iteration copies a byte, increments SI, increments DI, and decrements CX
  6. LEA DI, dest: reset DI to the start of dest for the search operation
  7. MOV AL, '8': load the character we want to find into AL (SCASB always compares with AL)
  8. REPNE SCASB: scan through dest, comparing each byte with AL. Stops when a match is found (ZF=1) or CX reaches 0
  9. JNE not_found: if ZF=0 after REPNE SCASB, the character was not found in the buffer
  10. SUB DI, AX / DEC DI: calculate the zero-based position. DI points one past the match, so we subtract the base address and then subtract 1
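
The position can also be derived from the leftover count in CX instead of from DI. The sketch below is one possible variant, not part of the lesson's program; the names text, LEN, pos2, and no_match are assumptions. When REPNE SCASB stops on a match, CX holds the number of bytes not yet examined, so the position is the initial count minus CX minus 1.

```asm
; Sketch: compute the match position from CX (illustrative names)
.MODEL SMALL
.STACK 100h
.DATA
  text  DB 'Hello, 8086 World!', 0
  LEN   EQU $ - text     ; 19 bytes including the null
  pos2  DW ?

.CODE
MAIN PROC
  MOV AX, @DATA
  MOV DS, AX
  MOV ES, AX             ; SCASB reads from ES:DI

  LEA DI, text
  MOV CX, LEN
  MOV AL, '8'
  CLD
  REPNE SCASB            ; stops with ZF = 1 on the first '8'
  JNE no_match

  MOV BX, LEN
  SUB BX, CX             ; bytes examined, including the match: 19 - 11 = 8
  DEC BX                 ; zero-based position: 7
  MOV pos2, BX
no_match:
  MOV AH, 4Ch
  INT 21h
MAIN ENDP
END MAIN
```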

Spot the bug

LEA SI, src
LEA DI, dst
MOV CX, 50
REP MOVSB
Hint: What is the Direction Flag set to? And is ES configured?

Answer: Two bugs. (1) DF is not explicitly set. If a previous routine left DF=1, MOVSB will go backward through memory, corrupting data. Add CLD before REP MOVSB. (2) ES may not point to the correct segment. Add MOV AX, @DATA / MOV ES, AX (or equivalent) to ensure ES:DI is valid.
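
One way the corrected snippet might look as a complete program is sketched below. It assumes src and dst both live in the default data segment and are at least 50 bytes long; the buffer definitions are invented for the example.

```asm
; Sketch: the buggy copy with both fixes applied (buffer contents are illustrative)
.MODEL SMALL
.STACK 100h
.DATA
  src  DB 50 DUP('A')
  dst  DB 50 DUP(?)

.CODE
MAIN PROC
  MOV AX, @DATA
  MOV DS, AX             ; DS:SI must address src
  MOV ES, AX             ; fix 2: ES:DI must address dst

  LEA SI, src
  LEA DI, dst
  MOV CX, 50
  CLD                    ; fix 1: force forward direction
  REP MOVSB

  MOV AH, 4Ch
  INT 21h
MAIN ENDP
END MAIN
```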

Explain like I'm 5

Imagine you have two long lines of toy blocks (arrays). MOVSB is like picking up one block from the left line and placing it in the right line, then stepping forward to the next block. REP is like telling a robot to keep doing this until all blocks are moved. SCASB is like looking at each block in a line and checking if it is the special red one you want. CMPSB is like comparing two lines block by block to see if they are identical.

Fun fact

On the original 8086, REP MOVSB was noticeably faster than an equivalent manual loop, because the instruction is fetched once rather than on every iteration. On later x86 processors, the microcode for REP MOVS was heavily optimized: modern CPUs can move data in wide internal transfers, which often makes REP MOVSB competitive with hand-tuned SIMD code. Intel even added the ERMSB (Enhanced REP MOVSB/STOSB) feature flag with Ivy Bridge.

Hands-on challenge

Write an 8086 program that: (1) defines two strings, (2) uses REPE CMPSB to compare them, (3) if they differ, uses SCASB to find the first occurrence of the differing character in the first string, and (4) stores the mismatch position in a variable. Test with 'ASSEMBLY' and 'ASSEMBLE!'.
