Files
ttrpg-initiative-tracker/docs/REWORK_PLAN.md
T
david raistrick 52866784b2 M2: replace shape-specific backend with generic KV doc store (firebase mirror)
Backend was endpoint-mapper (POST /api/campaigns ignores path id, etc). Adapter
translation layer brittle, untested, lost doc identity. Generic contract (Layer 2)
test caught 15 bugs immediately.

Rewrite to firebase-mirror KV model:
- server/db.js: docs table (path PK, parent, data JSON). getDoc/setDoc/updateDoc/
  deleteDoc/getCollection/batchWrite. parent = path prefix for collection queries.
- server/index.js: generic REST (GET/PUT/PATCH/DELETE /api/doc, GET /api/collection,
  POST /api/collection addDoc, POST /api/batch). WS subscribe by path (doc|collection),
  broadcast to doc subs at changed path + collection subs at parent path.
- src/storage/ws.js: thin passthrough adapter. norm() strips firebase prefix.
  initial value via REST (independent of WS connect), subsequent changes via WS.
- shared/turn.js kept (M4 use). server/handlers.js + server.test.js removed (logic
  now in App, backend is dumb KV).
- src/storage/contract.js: flush() bumped to 50ms (WS roundtrip > setTimeout(0)).

Layer 2 test (server/ws-contract.test.js): spins fresh backend per test, runs same
storage contract spec against createWsStorage. Catches adapter translation bugs
that firebase-mock Layer 1 tests cannot.

nanoid v5 ESM breaks jest CJS -> replaced with crypto.randomUUID (Node builtin).

Tests: 114 green (39 shared + 19 ws-contract + 56 frontend).
2026-06-29 13:00:24 -04:00

14 KiB

Initiative Tracker — Rework Plan

Status: APPROVED — executing Owner: draistrick (fork → keen99/ttrpg-initiative-tracker, private) Upstream: code.draft13.com/robert/ttrpg-initiative-tracker (friend's Gitea)


Goals

  1. Replace Firebase with self-hosted backend. Browser cannot own a DB file (sandbox). Cross-device (DM + tablet + player view) requires a real backend. Backend is the foundation, built first.
  2. Automated test ecosystem as the baseline. Lock current behavior before changing it. Skip bug must become provably impossible to reintroduce.
  3. Remain mergeable upstream. Default behavior (Firebase) preserved behind flag. Upstream main stays clean. Friend keeps Firebase path.
  4. Self-hostable in local Docker (in-house network). Public exposure = future, only after auth + multiuser safety.

Non-Goals (this plan)

  • Changing user-visible functionality beyond the documented bug fixes (skip, manual turn override).
  • Ripping Firebase. Kept as default adapter upstream.
  • Public/multiuser deployment. Deferred.
  • Rewriting the entire 2935-line App.js. Only extract what testability demands.

Problem Statement

Why Firebase is wrong here (for this fork)

  • Requires Google account + network for a single-user tabletop tool.
  • Realtime value (DM view ↔ player display) is real but solvable locally.
  • API key baked into client bundle (CRA REACT_APP_* at build); security depends entirely on console rules not in repo.
  • Vendor lock + quota; onSnapshot on collections burns reads.
  • Friend keeps it; we fork off it.

Why a backend is mandatory

Browser sandbox cannot write the filesystem. No sqlite file, no /data/db.sqlite, nothing. Browser JS is blocked from disk by design. Therefore cross-device storage (DM ↔ tablet ↔ player view) requires a separate Node process owning the DB file and serving the browser over HTTP/WebSocket. There is no browser-only path. The backend is step one, not deferred.

Known bug: initiative skips / lost state

Two failure classes observed:

  1. Race / data loss. Every turn mutation = client reads snapshot → computes → writes whole doc back. Two interleaved actions → last-write-wins → state lost → skip. Firebase gives eventual snapshots, no transactions. Even single-user bites via optimistic UI vs server round-trip.
  2. Logic drift. turnOrderIds array vs participants array vs isActive filter drift across mid-combat add/remove/toggle. currentIndex === -1 fallback path is fragile. No invariant enforced. No way for DM to manually say "this participant's turn now."

Undo is fragile

Current undo = stale snapshot write-back. Interleaved undos = data loss. Suspected already bitten during live game.


Architecture

Stack (locked)

  • Node.js runtime
  • Express web framework
  • ws WebSocket lib (realtime push, replaces onSnapshot)
  • better-sqlite3 SQLite driver (synchronous, simple, fast)
  • SQLite DB (single file, docker volume, trivial backup)
  • Jest test runner (already in CRA deps)

Postgres deferred until public multiuser exposure is real. SQLite schema ports easily if that day comes.

Backend design

  • Owns SQLite file. Only writer.
  • Holds authoritative state. Turn logic (initiative order, next-turn, add/remove mid-combat) runs server-side inside SQLite transaction.
  • Client sends action (e.g. NEXT_TURN, not the resulting state). Server computes result, persists, broadcasts diff.
  • Kills last-write-wins races by construction.
  • WS broadcast on every state change → all connected clients (DM view, player display, tablet) update instantly.

Three storage impls, one interface (frontend)

The storage interface is the test seam and the upstream-compat layer.

Impl When used Automated-tested?
firebase.js default (STORAGE=firebase) — upstream path No — requires live Firebase project
ws.js STORAGE=ws — our fork, talks to backend Yes — against running backend
memory.js test-only, in-process Yes — fast, deterministic

Frontend interface contract (all three implement):

  • getDoc(path), setDoc(path, data, opts), updateDoc(path, patch)
  • deleteDoc(path), batch(ops)
  • subscribeDoc(path, cb) / subscribeCollection(path, cb) → real-time push

Firebase impl: existing onSnapshot + SDK calls, moved verbatim behind interface (M2). WS impl: thin client; dispatches actions to backend, receives state updates via WS subscribe (M2). Memory impl: in-memory Map + EventEmitter, for tests (M3).

Repo layout (npm workspaces)

/
  package.json              # workspaces root
  src/                      # React frontend (existing, refactored behind storage interface)
    storage/
      index.js              # factory: pick impl from STORAGE env
      firebase.js           # extracted from current App.js (verbatim)
      ws.js                 # NEW — talks to backend
      memory.js             # NEW — test only
      types.js              # interface contract (JSDoc)
  server/                   # NEW
    index.js                # Express + ws bootstrap, generic KV REST
    db.js                   # better-sqlite3, docs table (KV), broadcast
    ws-contract.test.js     # adapter vs live backend (Layer 2 test)
  shared/                   # pure logic, no I/O, importable by client + server + tests
    turn.js                 # turn logic (single source; server imports, tests import)
    types.js
  shared.test/              # turn logic unit tests (characterization + desired)
    turn.characterization.test.js
    turn.desired.test.js
  docker-compose.yml        # NEW — M5
  docs/
    REWORK_PLAN.md          # this file

Auth

  • Now: AUTH_MODE=none. App gated by nginx HTTP basic auth (reuse friend's existing pattern). In-house only. Risk acceptable: someone sees your initiative counter.
  • Future: AUTH_MODE=token — real login, real users. Only if/when publicly exposed. Not built this plan.

Milestones

Each milestone = independently mergeable PR upstream (unless marked ).

M Does Tests?
0 repo, branch, remotes no
1 build backend (Node+Express+ws+better-sqlite3) unit tests as built
2 frontend WS adapter — app runs vs backend, cross-device works yes
3 characterization tests lock current behavior (skip bug included) yes
4 skip fix + manual override, regression-protected yes
5 docker compose in-house smoke
6 undo rework (tx events) unit
7 playwright multi-window e2e (deferred) e2e

Milestone 0 — Repo + branch setup

  • Fresh branch off main (not dsr-rework). Name: rework-backend.
  • upstream remote = friend's Gitea (read-only fetch).
  • Push origin = keen99/ttrpg-initiative-tracker (private).
  • npm workspaces root config.
  • Commit this plan.
  • Exit criteria: clean branch, plan committed, remotes set. DONE (commit ad7979d, then plan-restored).
  • Upstream-PRable: n/a (fork infra)

Milestone 1 — Build backend

  • server/: Express + ws + better-sqlite3.
  • Schema mirrors current Firestore doc tree (campaigns, encounters subcoll, activeDisplay, logs).
  • Turn logic (initiative order, next-turn, add/remove mid-combat) ported from App.js into server/turn.js (pure function, server-authoritative). Port verbatim — bugs included for now.
  • Actions dispatched to backend; server computes result, persists in SQLite tx, broadcasts via WS.
  • Unit tests as built: turn logic unit tests (characterization capturing current behavior), plus basic API/WS smoke.
  • Exit criteria: backend boots, serves state over WS, persists to SQLite, unit tests green.
  • Upstream-PRable: divergence (friend stays Firebase).

Milestone 2 — Frontend WS adapter

  • Define storage/types.js interface.
  • Move all ~30 Firestore call sites from App.js into storage/firebase.js behind interface (verbatim).
  • Implement storage/ws.js per interface, talking to backend. Dispatches actions, subscribes to WS.
  • Implement storage/memory.js for frontend unit tests.
  • storage/index.js factory: STORAGE env → pick impl. Default firebase (upstream unchanged).
  • App runs against backend with STORAGE=ws.
  • Cross-device verified manually: DM view + player display + tablet.
  • Exit criteria: app runs fully against local backend, no Firebase. Multi-device sync works.
  • Upstream-PRable: ⚠️ partial. Storage interface + firebase extract = . WS impl = .

Milestone 3 — Characterization tests lock current behavior

  • Lock current behavior end-to-end via integration tests against running backend (turn logic now server-side).
  • Capture the skip bug as a characterization test (whatever current does = locked, bugs included).
  • Cover: START, NEXT_TURN, PAUSE, RESUME, ADD_PARTICIPANT, REMOVE_PARTICIPANT, TOGGLE_ACTIVE, REORDER, APPLY_DAMAGE/HEAL, DEATH_SAVE, END.
  • Iterate until confident: baseline solid, regressions impossible to silently slip.
  • Exit criteria: characterization suite green against backend. Baseline locked.
  • Upstream-PRable: if kept storage-agnostic (tests target turn logic shape).

Milestone 4 — Skip fix + manual turn override

  • Write desired-behavior tests (red):
    • Never-skip invariant: after NEXT_TURN, current participant is always a valid active participant, or encounter cleanly ends.
    • Mid-combat add enters turn order correctly.
    • Remove mid-combat doesn't skip next.
    • Pause/resume preserves order.
  • Fix turn logic until red tests go green. Skip bug dies.
  • Add new action: JUMP_TURN_TO(participantId). DM clicks participant → cursor jumps → that participant's turn now → future NEXT_TURN continues from there. UI button label: "Make This Turn".
  • Regression-protected by M3 characterization + new desired tests.
  • Exit criteria: skip bug gone + provably cannot regress. Manual override works.
  • Upstream-PRable: logic fix + new feature, both beneficial.

Milestone 5 — Docker compose

  • docker-compose.yml:
    • backend service (Node + sqlite volume)
    • nginx service (static frontend + reverse proxy + http basic auth)
  • Profiles: firebase (frontend only, current behavior) vs backend (full stack).
  • Exit criteria: docker compose up runs full stack in-house.
  • Upstream-PRable: divergence.

Milestone 6 — Undo rework

  • Events table: every mutating action writes (type, payload, undo_payload, undone, ts).
  • Undo = apply undo_payload in same SQLite tx, flip undone. Transactional, no stale clobber.
  • Replaces current fragile /logs snapshot-write undo.
  • Migration: keep old undo working for existing entries until cleared; new format for new entries.
  • Exit criteria: undo works transactionally; interleaved undos don't corrupt.
  • Upstream-PRable: ⚠️ partial. Turn-logic-level undo = . Backend events table = .

Milestone 7 — Playwright E2E (deferred)

  • Multi-window E2E: DM view + display + player view in separate browser contexts against running backend.
  • Verify realtime sync end-to-end.
  • Only build if sync regresses or we deviate significantly. Turn-logic unit + backend integration tests cover most regression risk cheaper.
  • Exit criteria: e2e green for core combat flow across 3 windows.
  • Upstream-PRable: if test infra shared.

Milestone 8 — (Future) Public exposure

  • Real auth (AUTH_MODE=token).
  • Rate limiting, CSRF, hardening.
  • Postgres migration if load warrants.
  • Only if we decide to expose publicly + multiuser.

Testing strategy

Layers

  1. Turn logic unit tests (Jest, pure functions) — every turn transition, skip invariants, manual override. Built in M1 (characterization), extended in M4 (desired). Cheap, essential.
  2. Backend integration tests (Jest) — spin server on random port, assert WS pushes + SQLite persists + transactional correctness. (M1+)
  3. Frontend adapter contract tests (Jest, memory) — impl parity against interface. (M2)
  4. Playwright multi-window E2E — deferred. Only realtime sync glue turn logic can't reach. (M7)

Two-pass on turn logic (M1 → M4)

  1. Characterization (M1/M3) — capture current behavior exactly (bugs included). Locks extraction/port as provably identical. Lets later fix be provable.
  2. Desired-behavior (red) (M4) — write what should happen. Fail today. Fix → green. Bug dies, stays dead.

Manual smoke via config flags

  • STORAGE=firebase → current behavior (friend's path, upstream default).
  • STORAGE=ws → our path, local backend.
  • docker-compose profiles mirror the above.

Accepted test gap

  • Firebase adapter untested (requires live project). Accepted cost.
  • Mitigated by: interface contract; if firebase impl drifts, integration smoke only.

Mergability upstream

Milestone Upstream-PRable? Why
0 repo setup n/a fork infra
1 backend divergence (friend stays Firebase)
2 WS adapter ⚠️ partial interface + firebase extract , WS
3 characterization tests if storage-agnostic
4 skip fix + manual override logic fix + beneficial feature
5 docker compose divergence
6 undo rework ⚠️ partial turn-logic-level , events table
7 playwright if test infra shared

Default STORAGE=firebase + AUTH_MODE=none (unset) = upstream sees literally zero change.


Risks

  • CRA + workspaces friction. Create React App may resist monorepo layout. Mitigation: keep src/ as CRA root, server/ + shared/ as separate workspaces imported via alias. Eject/craco only if forced.
  • Turn logic port correctness. Current logic tangled; verbatim port risks subtle drift. Mitigation: characterization tests in M1/M3 lock behavior before any fix.
  • Firebase drift untested. Mitigation: interface contract; friend's path his to maintain.
  • Undo history migration. Existing log entries use old snapshot format. Mitigation: keep old undo working until cleared, new format for new entries.
  • WS reconnect/state-sync edge cases. Transient drop mid-combat. Mitigation: client requests full state resync on (re)connect; server is source of truth.

Decisions (locked)

  1. Branch: rework-backend off main.
  2. Manual turn override: action JUMP_TURN_TO(participantId). UI button "Make This Turn".
  3. npm workspaces for server/ + shared/ alongside CRA src/. Fallback alias if CRA fights.

Next action

M0 DONE. M1 kickoff: scaffold server/ workspace, set up better-sqlite3 + Express + ws, port turn logic from App.js into server/turn.js, write first unit tests.