Files
ttrpg-initiative-tracker/docs/REWORK_PLAN.md
T

14 KiB

Initiative Tracker — Rework Plan

Status: DRAFT — pending approval to execute Owner: draistrick (fork → keen99/ttrpg-initiative-tracker, private) Upstream: code.draft13.com/robert/ttrpg-initiative-tracker (friend's Gitea)


Goals

  1. Replace Firebase with self-hosted backend. Browser cannot own a DB file (sandbox). Cross-device (DM + tablet + player view) requires a real backend. Backend is the foundation, built first.
  2. Automated test ecosystem as the baseline. Lock current behavior before changing it. Skip bug must become provably impossible to reintroduce.
  3. Remain mergeable upstream. Default behavior (Firebase) preserved behind flag. Upstream main stays clean. Friend keeps Firebase path.
  4. Self-hostable in local Docker (in-house network). Public exposure = future, only after auth + multiuser safety.

Non-Goals (this plan)

  • Changing user-visible functionality beyond the documented bug fixes (skip, manual turn override).
  • Ripping Firebase. Kept as default adapter upstream.
  • Public/multiuser deployment. Deferred.
  • Rewriting the entire 2935-line App.js. Only extract what testability demands.

Problem Statement

Why Firebase is wrong here (for this fork)

  • Requires Google account + network for a single-user tabletop tool.
  • Realtime value (DM view ↔ player display) is real but solvable locally.
  • API key baked into client bundle (CRA REACT_APP_* at build); security depends entirely on console rules not in repo.
  • Vendor lock + quota; onSnapshot on collections burns reads.
  • Friend keeps it; we fork off it.

Why a backend is mandatory

Browser sandbox cannot write the filesystem. No sqlite file, no /data/db.sqlite, nothing. Browser JS is blocked from disk by design. Therefore cross-device storage (DM ↔ tablet ↔ player view) requires a separate Node process owning the DB file and serving the browser over HTTP/WebSocket. There is no browser-only path. The backend is step one, not deferred.

Known bug: initiative skips / lost state

Two failure classes observed:

  1. Race / data loss. Every turn mutation = client reads snapshot → computes → writes whole doc back. Two interleaved actions → last-write-wins → state lost → skip. Firebase gives eventual snapshots, no transactions. Even single-user bites via optimistic UI vs server round-trip.
  2. Logic drift. turnOrderIds array vs participants array vs isActive filter drift across mid-combat add/remove/toggle. currentIndex === -1 fallback path is fragile. No invariant enforced. No way for DM to manually say "this participant's turn now."

Undo is fragile

Current undo = stale snapshot write-back. Interleaved undos = data loss. Suspected already bitten during live game.


Architecture

Stack (locked)

  • Node.js runtime
  • Express web framework
  • ws WebSocket lib (realtime push, replaces onSnapshot)
  • better-sqlite3 SQLite driver (synchronous, simple, fast)
  • SQLite DB (single file, docker volume, trivial backup)
  • Jest test runner (already in CRA deps)

Postgres deferred until public multiuser exposure is real. SQLite schema ports easily if that day comes.

Three storage impls, one interface (frontend)

The storage interface is the test seam and the upstream-compat layer.

Impl When used Automated-tested?
firebase.js default (STORAGE=firebase) — upstream path No — requires live Firebase project
ws.js STORAGE=ws — our fork, talks to backend Yes — against running backend
memory.js test-only, in-process Yes — fast, deterministic

Frontend interface contract (all three implement):

  • getDoc(path), setDoc(path, data, opts), updateDoc(path, patch)
  • deleteDoc(path), batch(ops)
  • subscribeDoc(path, cb) / subscribeCollection(path, cb) → real-time push

Firebase impl: existing onSnapshot + SDK calls, moved verbatim behind interface. WS impl: thin client; dispatches actions to backend, receives state updates via WS subscribe. Memory impl: in-memory Map + EventEmitter, for tests that don't need the backend.

Backend design

  • Owns SQLite file. Only writer.
  • Holds authoritative state. FSM (turn logic) runs server-side inside SQLite transaction.
  • Client sends action (e.g. NEXT_TURN, not the resulting state). Server computes result, persists, broadcasts diff.
  • Kills last-write-wins races by construction.
  • WS broadcast on every state change → all connected clients (DM view, player display, tablet) update instantly.

Repo layout (npm workspaces)

/
  package.json              # workspaces root
  src/                      # React frontend (existing, refactored behind storage interface)
    storage/
      index.js              # factory: pick impl from STORAGE env
      firebase.js           # extracted from current App.js (verbatim)
      ws.js                 # NEW — talks to backend
      memory.js             # NEW — test only
      types.js              # interface contract (JSDoc)
  server/                   # NEW
    index.js                # Express + ws bootstrap
    db.js                   # better-sqlite3, schema, migrations
    fsm/
      turn.js               # turn-order state machine (pure)
    handlers/               # action handlers (call fsm, persist, broadcast)
    server.test.js          # API + WS integration tests
  shared/                   # pure logic, no I/O, importable by client + server + tests
    turn.js                 # turn FSM re-export (single source)
    types.js
  shared.test/              # FSM unit tests (characterization + desired)
    turn.characterization.test.js
    turn.desired.test.js
  docker-compose.yml        # NEW — Milestone E
  docs/
    REWORK_PLAN.md          # this file

Auth

  • Now: AUTH_MODE=none. App gated by nginx HTTP basic auth (reuse friend's existing pattern). In-house only. Risk acceptable: someone sees your initiative counter.
  • Future: AUTH_MODE=token — real login, real users. Only if/when publicly exposed. Not built this plan.

Milestones

Each milestone = independently mergeable PR upstream (unless marked ).

Milestone 0 — Repo + branch setup

  • Fresh branch off main (not dsr-rework — avoid contamination). Name: rework-backend.
  • Add upstream remote (friend's Gitea, read-only fetch).
  • Push origin = keen99/ttrpg-initiative-tracker (private).
  • npm workspaces root config.
  • Commit this plan.
  • Exit criteria: clean branch, plan committed, remotes set.
  • Upstream-PRable: n/a (fork infra)

Milestone 1 — Turn FSM extraction + characterization tests

  • Extract pure turn-order logic from App.js (handleNextTurn, computeTurnOrder*, sort, add/remove mid-combat) into shared/turn.js.
  • Pure function: (state, action) → state. No I/O, no Firebase, no React.
  • Port verbatim — bugs included.
  • Write characterization tests capturing current behavior (including the skip bug). Lock reality.
  • FSM unit-testable in Node with zero infra.
  • Exit criteria: all characterization tests green. Behavior provably identical to current.
  • Upstream-PRable: pure refactor, zero behavior change, no Firebase dependency introduced.

Milestone 2 — Backend skeleton

  • server/: Express + ws + better-sqlite3.
  • Schema mirrors current Firestore doc tree (campaigns, encounters subcoll, activeDisplay, logs).
  • FSM (from Milestone 1) runs server-side inside SQLite transaction.
  • WS broadcast on every state change.
  • Backend integration tests: spin server on random port, assert WS pushes + SQLite persists.
  • Exit criteria: backend boots, serves state over WS, persists to SQLite, tests green.
  • Upstream-PRable: divergence (friend stays Firebase).

Milestone 3 — Frontend WS adapter

  • Define storage/types.js interface.
  • Move all ~30 Firestore call sites from App.js into storage/firebase.js behind interface (verbatim).
  • Implement storage/ws.js per interface, talking to backend. Dispatches actions, subscribes to WS.
  • Implement storage/memory.js for frontend unit tests.
  • storage/index.js factory: STORAGE env → pick impl. Default firebase (upstream unchanged).
  • App runs against backend with STORAGE=ws.
  • Cross-device verified manually: DM view + player display + tablet.
  • Exit criteria: app runs fully against local backend, no Firebase. Multi-device sync works.
  • Upstream-PRable: ⚠️ partial. Storage interface + firebase extract = . WS impl = .

Milestone 4 — Red tests + fix skip bug + manual turn override

  • Write desired-behavior tests (red):
    • Never-skip invariant: after NEXT_TURN, current participant is always a valid active participant, or encounter cleanly ends.
    • Mid-combat add enters turn order correctly.
    • Remove mid-combat doesn't skip next.
    • Pause/resume preserves order.
  • Fix FSM until red tests go green. Skip bug dies.
  • Add new action: JUMP_TURN_TO(participantId). DM clicks participant → cursor jumps → that participant's turn now → future NEXT_TURN continues from there. UI button label: "Make This Turn" (candidates: "Force Turn Here").
  • Regression-protected by Milestone 1 characterization + new desired tests.
  • Exit criteria: skip bug gone + provably cannot regress. Manual override works.
  • Upstream-PRable: logic fix + new feature, both beneficial.

Milestone 5 — Docker compose

  • docker-compose.yml:
    • backend service (Node + sqlite volume)
    • nginx service (static frontend + reverse proxy + http basic auth)
  • Profiles: firebase (frontend only, current behavior) vs backend (full stack).
  • Exit criteria: docker compose up runs full stack in-house.
  • Upstream-PRable: divergence.

Milestone 6 — Undo rework

  • Events table: every mutating action writes (type, payload, undo_payload, undone, ts).
  • Undo = apply undo_payload in same SQLite tx, flip undone. Transactional, no stale clobber.
  • Replaces current fragile /logs snapshot-write undo.
  • Migration: keep old undo working for existing entries until cleared; new format for new entries.
  • Exit criteria: undo works transactionally; interleaved undos don't corrupt.
  • Upstream-PRable: if logic kept storage-agnostic (FSM-level). Backend-specific events table = .

Milestone 7 — Playwright E2E (deferred)

  • Multi-window E2E: DM view + display + player view in separate browser contexts against running backend.
  • Verify realtime sync end-to-end.
  • Only build if sync regresses or we deviate significantly. FSM + backend integration tests cover most regression risk cheaper.
  • Exit criteria: e2e green for core combat flow across 3 windows.
  • Upstream-PRable: if test infra shared.

Milestone 8 — (Future) Public exposure

  • Real auth (AUTH_MODE=token).
  • Rate limiting, CSRF, hardening.
  • Postgres migration if load warrants.
  • Only if we decide to expose publicly + multiuser.

Testing strategy

Layers

  1. FSM unit tests (Jest, pure functions) — every turn transition, skip invariants, manual override. Cheap, essential. Covers skip bug permanently. (Milestones 1, 4)
  2. Backend integration tests (Jest) — spin server on random port, assert WS pushes + SQLite persists + transactional correctness. (Milestone 2)
  3. Frontend adapter contract tests (Jest, memory) — impl parity against interface. (Milestone 3)
  4. Playwright multi-window E2E — deferred. Only realtime sync glue FSM can't reach. (Milestone 7)

Two-pass on FSM (Milestones 1 → 4)

  1. Characterization — capture current behavior exactly (bugs included). Locks extraction as provably identical. Lets later refactor port safely.
  2. Desired-behavior (red) — write what should happen. Fail today. Fix → green. Bug dies, stays dead.

Manual smoke via config flags

  • STORAGE=firebase → current behavior (friend's path, upstream default).
  • STORAGE=ws → our path, local backend.
  • docker-compose profiles mirror the above.

Accepted test gap

  • Firebase adapter untested (requires live project). Accepted cost.
  • Mitigated by: interface contract; if firebase impl drifts, integration smoke only.

Mergability upstream

Milestone Upstream-PRable? Why
0 repo setup n/a fork infra
1 FSM extract + characterization pure refactor, identical behavior
2 backend divergence (friend stays Firebase)
3 WS adapter ⚠️ partial interface + firebase extract , WS
4 skip fix + manual override logic fix + beneficial feature
5 docker compose divergence
6 undo rework ⚠️ partial FSM logic , events table
7 playwright if test infra shared

Default STORAGE=firebase + AUTH_MODE=none (unset) = upstream sees literally zero change.


Risks

  • CRA + workspaces friction. Create React App may resist monorepo layout. Mitigation: keep src/ as CRA root, shared/ as separate workspace imported via alias. Eject/craco only if forced.
  • FSM extraction correctness. Current logic tangled; verbatim port risks subtle drift. Mitigation: characterization tests first, diff behavior before any fix.
  • Firebase drift untested. Mitigation: interface contract; friend's path his to maintain.
  • Undo history migration. Existing log entries use old snapshot format. Mitigation: keep old undo working until cleared, new format for new entries.
  • WS reconnect/state-sync edge cases. Transient drop mid-combat. Mitigation: client requests full state resync on (re)connect; server is source of truth.

Open decisions (need answers before Milestone 0)

  1. Branch name off main. Propose: rework-backend. Confirm or rename.
  2. Manual turn override. Action JUMP_TURN_TO(participantId). UI button label "Make This Turn" (alt: "Force Turn Here"). Pick.
  3. npm workspaces for shared/ + server/ alongside CRA src/. Prefer yes. CRA may fight → alias as fallback.

Next action (on approval)

Milestone 0: create branch rework-backend off main, set remotes, commit this plan. Then Milestone 1 kickoff: trace handleNextTurn full path + write first characterization test capturing the skip bug.