Lesson 58 of 77 · Advanced

System Design III: Offline-First Field App at Scale

Designing for 10K+ field workers in low-connectivity environments — sync queues, conflict resolution, and battery optimisation

Open interactive version (quiz + challenge)

Real-world analogy

An offline-first field app is like a clipboard that works anywhere — even underground, in rice fields, or in a tunnel. You write on it all day, then when you're back in range, it automatically syncs with the office filing cabinet. But 10,000 people might edit the same file, so you need rules about who wins when two people changed the same thing.

What is it?

An offline-first field app architecture enables full CRUD functionality without connectivity for thousands of concurrent field workers, with background synchronization, conflict resolution, and battery-optimized sync strategies.

Real-world relevance

BRAC's digital data collection tools and FieldBuzz (Bangladesh field operations platform) operate in areas where connectivity is intermittent. Field officers collect household data, visit records, and form submissions offline all day, with the app syncing when they return to areas with signal — maintaining data integrity across 10,000+ concurrent workers.

Code example

// Offline-first sync architecture with Drift + WorkManager

// Drift schema with sync metadata
class Visits extends Table {
  TextColumn get clientId => text()(); // UUID, set at creation
  TextColumn get serverId => text().nullable()(); // set after sync ACK
  TextColumn get householdId => text()();
  TextColumn get officerId => text()();
  TextColumn get notes => text().withDefault(const Constant(''))();
  RealColumn get gpsLat => real().nullable()();
  RealColumn get gpsLng => real().nullable()();
  TextColumn get syncState => text().withDefault(const Constant('pending'))();
  // pending | syncing | synced | conflict
  DateTimeColumn get updatedAt => dateTime()();
  DateTimeColumn get createdAt => dateTime()();
  DateTimeColumn get deletedAt => dateTime().nullable()(); // soft delete

  @override
  Set<Column> get primaryKey => {clientId};
}

class SyncQueueEntries extends Table {
  IntColumn get id => integer().autoIncrement()();
  TextColumn get entityType => text()(); // 'visit', 'household', etc.
  TextColumn get entityId => text()(); // clientId of the entity
  TextColumn get operation => text()(); // 'insert' | 'update' | 'delete'
  TextColumn get payload => text()(); // JSON of changed fields only
  DateTimeColumn get enqueuedAt => dateTime()();
  IntColumn get retryCount => integer().withDefault(const Constant(0))();
  TextColumn get status => text().withDefault(const Constant('pending'))();
}

// Visit DAO — every write enqueues a sync entry
extension VisitDao on AppDatabase {
  Future<void> createVisit(VisitsCompanion visit) async {
    await transaction(() async {
      await into(visits).insert(visit);
      await into(syncQueueEntries).insert(SyncQueueEntriesCompanion.insert(
        entityType: 'visit',
        entityId: visit.clientId.value,
        operation: 'insert',
        // toJson() is an assumed helper — Drift companions generate toColumns(), not toJson()
        payload: jsonEncode(visit.toJson()),
        enqueuedAt: DateTime.now(),
      ));
    });
  }
}

// WorkManager task (runs in background isolate)
@pragma('vm:entry-point')
void syncBackgroundTask() {
  Workmanager().executeTask((taskName, inputData) async {
    try {
      await SyncService.runSync();
      return Future.value(true);
    } catch (e) {
      return Future.value(false); // WorkManager will retry
    }
  });
}

// Sync service
class SyncService {
  static Future<void> runSync() async {
    final db = await DatabaseFactory.open();
    final api = ApiClient.create();

    // Push: send pending queue entries in one batched request
    final pending = await db.syncQueueDao.getPendingEntries(limit: 100);
    // Note: don't return early when the queue is empty — the pull phase
    // below must still run so the device receives other workers' changes.
    if (pending.isNotEmpty) {
      final response = await api.post('/sync/push', data: {
        'changes': pending.map((e) => e.toJson()).toList(),
      });

      for (final result in response.data['results']) {
        final clientId = result['clientId'] as String;
        if (result['status'] == 'accepted') {
          await db.syncQueueDao.markSent(clientId);
          await db.visitsDao.updateServerId(
            clientId: clientId,
            serverId: result['serverId'] as String,
            syncState: 'synced',
          );
        } else if (result['status'] == 'conflict') {
          await db.visitsDao.markConflict(clientId, result['serverRecord']);
        }
      }
    }

    // Pull: fetch changes since last cursor
    final cursor = await db.syncMetaDao.getCursor();
    final pullResponse = await api.get('/sync/pull', queryParameters: {
      'cursor': cursor,
      'limit': 500,
    });

    await db.transaction(() async {
      for (final record in pullResponse.data['records']) {
        await db.visitsDao.upsertFromServer(record);
      }
      await db.syncMetaDao.setCursor(pullResponse.data['newCursor']);
    });
  }
}

// WorkManager registration
Future<void> registerBackgroundSync() async {
  await Workmanager().registerPeriodicTask(
    'field_sync',
    'syncBackgroundTask',
    frequency: const Duration(minutes: 15),
    constraints: Constraints(networkType: NetworkType.connected),
    existingWorkPolicy: ExistingWorkPolicy.keep,
  );
}
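Battery optimisation (from the lesson's subtitle) is largely a matter of constraints and backoff at registration time. The variant below is a sketch using additional parameters the workmanager plugin exposes (requiresBatteryNotLow, backoffPolicy, backoffPolicyDelay); check the plugin version you use for exact availability:

```dart
import 'package:workmanager/workmanager.dart';

Future<void> registerBatterySafeSync() async {
  await Workmanager().registerPeriodicTask(
    'field_sync',
    'syncBackgroundTask',
    frequency: const Duration(minutes: 15), // Android enforces a 15-min minimum
    constraints: Constraints(
      networkType: NetworkType.connected,
      requiresBatteryNotLow: true, // skip sync when battery is critical
    ),
    // Failed runs retry with a growing delay instead of hammering the radio.
    backoffPolicy: BackoffPolicy.exponential,
    backoffPolicyDelay: const Duration(minutes: 1),
    existingWorkPolicy: ExistingWorkPolicy.keep,
  );
}
```

Keeping the radio off between batched syncs is the single biggest battery win on mobile: each sync wakes the cellular modem once for all queued work, rather than once per record.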

Line-by-line walkthrough

  1. The Visits table includes both clientId (UUID set at creation) and serverId (set after server ACK) — this two-ID pattern is fundamental to offline-first; the client uses clientId for all local references.
  2. The syncState column tracks the sync lifecycle: pending (awaiting sync), syncing (in-flight), synced (confirmed by server), conflict (server rejected with a newer version).
  3. createVisit uses a Drift transaction to insert the visit AND its sync queue entry atomically — if either write fails, both are rolled back, ensuring the queue is never missing an entry for a local change.
  4. SyncQueueEntry.operation captures 'insert' / 'update' / 'delete' so the server can apply the correct operation when processing the push.
  5. syncBackgroundTask is annotated with @pragma('vm:entry-point') — this prevents the Dart tree shaker from removing the function, which would cause WorkManager to fail to find it.
  6. Returning false from Workmanager().executeTask signals WorkManager to retry the task; returning true marks it complete. The retry policy is configured separately.
  7. In runSync(), the push batch limit of 100 prevents a single sync request from timing out when the queue is very large (e.g., after 7 days offline).
  8. The pull sync uses a transaction to atomically upsert all received records AND update the cursor — if the transaction rolls back, the cursor is not advanced and the same records are re-fetched on the next sync.
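The walkthrough marks a record conflict but leaves open how the conflict is resolved. Below is a minimal last-write-wins resolver; VisitRecord and resolveConflict are illustrative names, not part of the lesson's schema (a real app would use the Drift-generated data class):

```dart
// Illustrative in-memory record, standing in for the Drift data class.
class VisitRecord {
  final String clientId;
  final String notes;
  final DateTime updatedAt;
  const VisitRecord(this.clientId, this.notes, this.updatedAt);
}

/// Last-write-wins: the copy with the newer updatedAt survives.
/// Ties go to the server copy so every device converges on the same answer.
VisitRecord resolveConflict(VisitRecord local, VisitRecord server) {
  return local.updatedAt.isAfter(server.updatedAt) ? local : server;
}
```

Last-write-wins is the simplest policy but depends on device clocks, which drift in the field; production systems often prefer server-assigned version counters, or field-level merging when two officers edited different questions on the same form.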

Spot the bug

// Sync queue processing
Future<void> processQueue() async {
  final entries = await db.syncQueueDao.getAllPending();
  
  for (final entry in entries) {
    try {
      final result = await api.post('/sync/push-single', data: entry.toJson());
      if (result.data['status'] == 'accepted') {
        await db.syncQueueDao.delete(entry);
      }
    } catch (e) {
      // Will retry on next sync
    }
  }
}
Need a hint?
This approach causes major problems at scale with 10K workers, even when it works correctly. What are the two architectural problems?
Show answer
Bug 1: N HTTP requests for N queue entries — if a worker has 500 pending changes after a day offline, this makes 500 sequential API calls. At 100-500 ms per round-trip on mobile networks, the full sync takes 50-250 seconds. Fix: batch all entries into a single POST /sync/push request with an array payload — one round-trip regardless of queue size.

Bug 2: No failure bookkeeping — the catch block swallows every error without incrementing retryCount or recording a failure status, so a permanently rejected ("poison") entry is retried on every sync forever. Worse, because the loop continues past a failure, a failed 'insert' for an entity can be followed by a successful 'update' for the same entity, and the server receives operations out of order. Fix: track per-entry status and retry count, cap retries with backoff (or a dead-letter state), and hold back later operations on an entity whose earlier operation failed. With batch processing, the server returns per-entry results, allowing targeted retry of only failed or conflicted entries.
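A corrected processQueue might look like the sketch below, reusing the lesson's illustrative db/api handles; actionFor, deleteByEntityId, incrementRetryCount and setStatus are assumed helpers, not a real Drift API:

```dart
/// Pure decision helper: maps one per-entry server result (plus the entry's
/// current retry count) to a local action. Keeping it pure makes the retry
/// policy unit-testable.
String actionFor(Map<String, Object?> result, int retryCount,
    {int maxRetries = 5}) {
  switch (result['status']) {
    case 'accepted':
      return 'delete'; // server applied it; drop the queue entry
    case 'conflict':
      return 'markConflict'; // surface to the resolver / user
    default:
      // transient or unknown failure: retry with a cap, then dead-letter
      return retryCount >= maxRetries ? 'deadLetter' : 'retry';
  }
}

// Batched push: one round-trip for up to 100 entries instead of N calls.
Future<void> processQueue() async {
  final entries = await db.syncQueueDao.getPendingEntries(limit: 100);
  if (entries.isEmpty) return;

  final response = await api.post('/sync/push', data: {
    'changes': entries.map((e) => e.toJson()).toList(),
  });

  for (final result in response.data['results']) {
    final entityId = result['clientId'] as String;
    final entry = entries.firstWhere((e) => e.entityId == entityId);
    switch (actionFor(result, entry.retryCount)) {
      case 'delete':
        await db.syncQueueDao.deleteByEntityId(entityId);
        break;
      case 'markConflict':
        await db.visitsDao.markConflict(entityId, result['serverRecord']);
        break;
      case 'retry':
        await db.syncQueueDao.incrementRetryCount(entityId);
        break;
      case 'deadLetter':
        await db.syncQueueDao.setStatus(entityId, 'dead');
        break;
    }
  }
}
```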

Explain like I'm 5

Imagine 10,000 people all working in places with no phone signal, all writing notes in their personal notebooks. When they get signal again, all their notebooks need to update a single big shared notebook in the office. If two people wrote different things about the same family, someone (or a smart rule) has to decide what the shared notebook says. The app has to do this automatically, without losing anyone's work, and without draining their phone battery doing it.

Fun fact

The Open Data Kit (ODK) — used by WHO, CDC, and hundreds of NGOs for field data collection across 100+ countries — was one of the first mobile systems to prove offline-first data collection at massive scale, influencing the architecture of most modern field app platforms including BRAC's digital tools.

Hands-on challenge

Design the complete offline-first architecture for a field data collection app with 10,000 workers: (1) Drift schema for Visit and SyncQueueEntry tables with all sync metadata columns. (2) The sync queue write pattern (transaction ensures queue entry is always created with the data change). (3) WorkManager task registration with correct constraints. (4) Push sync flow — describe what happens when a conflict is detected. (5) Pull sync flow with cursor-based delta sync. (6) How would you handle the scenario where a worker's device was offline for 7 days?
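For point (6), one possible shape — a sketch, where the batch size of 100 and the helper names are choices made here, not the lesson's prescription — is to drain the 7-day backlog in fixed-size batches, so no single request grows unbounded and a mid-sync network drop loses at most one batch:

```dart
/// Generic chunking helper: splits a long list into fixed-size batches.
List<List<T>> chunk<T>(List<T> items, int size) {
  final batches = <List<T>>[];
  for (var i = 0; i < items.length; i += size) {
    final end = (i + size < items.length) ? i + size : items.length;
    batches.add(items.sublist(i, end));
  }
  return batches;
}

// Drain loop: a 7-day backlog of ~3,500 entries becomes 35 requests of 100.
// `db` and `api` are the same illustrative handles as in the lesson's code;
// applyResults is a hypothetical helper that marks acknowledged entries so
// the next getPendingEntries call advances past this batch.
Future<void> drainQueue() async {
  while (true) {
    final batch = await db.syncQueueDao.getPendingEntries(limit: 100);
    if (batch.isEmpty) break;
    final response = await api.post('/sync/push', data: {
      'changes': batch.map((e) => e.toJson()).toList(),
    });
    await db.syncQueueDao.applyResults(response.data['results']);
  }
}
```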

More resources

← Back to course: Flutter Interview Mastery