diff --git a/docs/REWORK_PLAN.md b/docs/REWORK_PLAN.md new file mode 100644 index 0000000..a622edd --- /dev/null +++ b/docs/REWORK_PLAN.md @@ -0,0 +1,268 @@ +# Initiative Tracker — Rework Plan + +Status: **DRAFT — pending approval to execute** +Owner: draistrick (fork → `keen99/ttrpg-initiative-tracker`, private) +Upstream: `code.draft13.com/robert/ttrpg-initiative-tracker` (friend's Gitea) + +--- + +## Goals + +1. **Replace Firebase with self-hosted backend.** Browser cannot own a DB file (sandbox). Cross-device (DM + tablet + player view) requires a real backend. Backend is the foundation, built first. +2. **Automated test ecosystem as the baseline.** Lock current behavior before changing it. Skip bug must become provably impossible to reintroduce. +3. **Remain mergeable upstream.** Default behavior (Firebase) preserved behind flag. Upstream `main` stays clean. Friend keeps Firebase path. +4. **Self-hostable in local Docker** (in-house network). Public exposure = future, only after auth + multiuser safety. + +## Non-Goals (this plan) + +- Changing user-visible functionality beyond the documented bug fixes (skip, manual turn override). +- Ripping Firebase. Kept as default adapter upstream. +- Public/multiuser deployment. Deferred. +- Rewriting the entire 2935-line `App.js`. Only extract what testability demands. + +--- + +## Problem Statement + +### Why Firebase is wrong here (for this fork) +- Requires Google account + network for a single-user tabletop tool. +- Realtime value (DM view ↔ player display) is real but solvable locally. +- API key baked into client bundle (CRA `REACT_APP_*` at build); security depends entirely on console rules not in repo. +- Vendor lock + quota; `onSnapshot` on collections burns reads. +- Friend keeps it; we fork off it. + +### Why a backend is mandatory +Browser sandbox cannot write the filesystem. No sqlite file, no `/data/db.sqlite`, nothing. Browser JS is blocked from disk by design. Therefore cross-device storage (DM ↔ tablet ↔ player view) requires a separate Node process owning the DB file and serving the browser over HTTP/WebSocket. There is no browser-only path. The backend is step one, not deferred. + +### Known bug: initiative skips / lost state +Two failure classes observed: + +1. **Race / data loss.** Every turn mutation = client reads snapshot → computes → writes whole doc back. Two interleaved actions → last-write-wins → state lost → skip. Firebase gives eventual snapshots, no transactions. Even single-user bites via optimistic UI vs server round-trip. +2. **Logic drift.** `turnOrderIds` array vs `participants` array vs `isActive` filter drift across mid-combat add/remove/toggle. `currentIndex === -1` fallback path is fragile. No invariant enforced. No way for DM to manually say "this participant's turn now." + +### Undo is fragile +Current undo = stale snapshot write-back. Interleaved undos = data loss. Suspected already bitten during live game. + +--- + +## Architecture + +### Stack (locked) +- **Node.js** runtime +- **Express** web framework +- **ws** WebSocket lib (realtime push, replaces `onSnapshot`) +- **better-sqlite3** SQLite driver (synchronous, simple, fast) +- **SQLite** DB (single file, docker volume, trivial backup) +- **Jest** test runner (already in CRA deps) + +Postgres deferred until public multiuser exposure is real. SQLite schema ports easily if that day comes. + +### Three storage impls, one interface (frontend) + +The storage interface is the test seam and the upstream-compat layer. + +| Impl | When used | Automated-tested? | +|---|---|---| +| `firebase.js` | default (`STORAGE=firebase`) — upstream path | No — requires live Firebase project | +| `ws.js` | `STORAGE=ws` — our fork, talks to backend | Yes — against running backend | +| `memory.js` | test-only, in-process | Yes — fast, deterministic | + +**Frontend interface contract** (all three implement): +- `getDoc(path)`, `setDoc(path, data, opts)`, `updateDoc(path, patch)` +- `deleteDoc(path)`, `batch(ops)` +- `subscribeDoc(path, cb)` / `subscribeCollection(path, cb)` → real-time push + +Firebase impl: existing `onSnapshot` + SDK calls, moved verbatim behind interface. +WS impl: thin client; dispatches **actions** to backend, receives **state updates** via WS subscribe. +Memory impl: in-memory Map + EventEmitter, for tests that don't need the backend. + +### Backend design +- Owns SQLite file. Only writer. +- Holds authoritative state. FSM (turn logic) runs server-side inside SQLite transaction. +- Client sends **action** (e.g. `NEXT_TURN`, not the resulting state). Server computes result, persists, broadcasts diff. +- Kills last-write-wins races by construction. +- WS broadcast on every state change → all connected clients (DM view, player display, tablet) update instantly. + +### Repo layout (npm workspaces) + +``` +/ + package.json # workspaces root + src/ # React frontend (existing, refactored behind storage interface) + storage/ + index.js # factory: pick impl from STORAGE env + firebase.js # extracted from current App.js (verbatim) + ws.js # NEW — talks to backend + memory.js # NEW — test only + types.js # interface contract (JSDoc) + server/ # NEW + index.js # Express + ws bootstrap + db.js # better-sqlite3, schema, migrations + fsm/ + turn.js # turn-order state machine (pure) + handlers/ # action handlers (call fsm, persist, broadcast) + server.test.js # API + WS integration tests + shared/ # pure logic, no I/O, importable by client + server + tests + turn.js # turn FSM re-export (single source) + types.js + shared.test/ # FSM unit tests (characterization + desired) + turn.characterization.test.js + turn.desired.test.js + docker-compose.yml # NEW — Milestone E + docs/ + REWORK_PLAN.md # this file +``` + +### Auth +- **Now:** `AUTH_MODE=none`. App gated by nginx HTTP basic auth (reuse friend's existing pattern). In-house only. Risk acceptable: someone sees your initiative counter. +- **Future:** `AUTH_MODE=token` — real login, real users. Only if/when publicly exposed. Not built this plan. + +--- + +## Milestones + +Each milestone = independently mergeable PR upstream (unless marked ❌). + +### Milestone 0 — Repo + branch setup +- Fresh branch off `main` (not `dsr-rework` — avoid contamination). Name: `rework-backend`. +- Add `upstream` remote (friend's Gitea, read-only fetch). +- Push origin = `keen99/ttrpg-initiative-tracker` (private). +- npm workspaces root config. +- Commit this plan. +- **Exit criteria:** clean branch, plan committed, remotes set. +- **Upstream-PRable:** n/a (fork infra) + +### Milestone 1 — Turn FSM extraction + characterization tests +- Extract pure turn-order logic from `App.js` (`handleNextTurn`, `computeTurnOrder*`, sort, add/remove mid-combat) into `shared/turn.js`. +- Pure function: `(state, action) → state`. No I/O, no Firebase, no React. +- Port **verbatim** — bugs included. +- Write characterization tests capturing current behavior (including the skip bug). Lock reality. +- FSM unit-testable in Node with zero infra. +- **Exit criteria:** all characterization tests green. Behavior provably identical to current. +- **Upstream-PRable:** ✅ pure refactor, zero behavior change, no Firebase dependency introduced. + +### Milestone 2 — Backend skeleton +- `server/`: Express + ws + better-sqlite3. +- Schema mirrors current Firestore doc tree (campaigns, encounters subcoll, activeDisplay, logs). +- FSM (from Milestone 1) runs server-side inside SQLite transaction. +- WS broadcast on every state change. +- Backend integration tests: spin server on random port, assert WS pushes + SQLite persists. +- **Exit criteria:** backend boots, serves state over WS, persists to SQLite, tests green. +- **Upstream-PRable:** ❌ divergence (friend stays Firebase). + +### Milestone 3 — Frontend WS adapter +- Define `storage/types.js` interface. +- Move all ~30 Firestore call sites from `App.js` into `storage/firebase.js` behind interface (verbatim). +- Implement `storage/ws.js` per interface, talking to backend. Dispatches actions, subscribes to WS. +- Implement `storage/memory.js` for frontend unit tests. +- `storage/index.js` factory: `STORAGE` env → pick impl. Default `firebase` (upstream unchanged). +- App runs against backend with `STORAGE=ws`. +- Cross-device verified manually: DM view + player display + tablet. +- **Exit criteria:** app runs fully against local backend, no Firebase. Multi-device sync works. +- **Upstream-PRable:** ⚠️ partial. Storage interface + firebase extract = ✅. WS impl = ❌. + +### Milestone 4 — Red tests + fix skip bug + manual turn override +- Write desired-behavior tests (red): + - Never-skip invariant: after `NEXT_TURN`, current participant is always a valid active participant, or encounter cleanly ends. + - Mid-combat add enters turn order correctly. + - Remove mid-combat doesn't skip next. + - Pause/resume preserves order. +- Fix FSM until red tests go green. Skip bug dies. +- Add new action: `JUMP_TURN_TO(participantId)`. DM clicks participant → cursor jumps → that participant's turn now → future `NEXT_TURN` continues from there. UI button label: "Make This Turn" (candidates: "Force Turn Here"). +- Regression-protected by Milestone 1 characterization + new desired tests. +- **Exit criteria:** skip bug gone + provably cannot regress. Manual override works. +- **Upstream-PRable:** ✅ logic fix + new feature, both beneficial. + +### Milestone 5 — Docker compose +- `docker-compose.yml`: + - `backend` service (Node + sqlite volume) + - `nginx` service (static frontend + reverse proxy + http basic auth) +- Profiles: `firebase` (frontend only, current behavior) vs `backend` (full stack). +- **Exit criteria:** `docker compose up` runs full stack in-house. +- **Upstream-PRable:** ❌ divergence. + +### Milestone 6 — Undo rework +- Events table: every mutating action writes `(type, payload, undo_payload, undone, ts)`. +- Undo = apply `undo_payload` in same SQLite tx, flip `undone`. Transactional, no stale clobber. +- Replaces current fragile `/logs` snapshot-write undo. +- Migration: keep old undo working for existing entries until cleared; new format for new entries. +- **Exit criteria:** undo works transactionally; interleaved undos don't corrupt. +- **Upstream-PRable:** ✅ if logic kept storage-agnostic (FSM-level). Backend-specific events table = ❌. + +### Milestone 7 — Playwright E2E (deferred) +- Multi-window E2E: DM view + display + player view in separate browser contexts against running backend. +- Verify realtime sync end-to-end. +- **Only build if sync regresses or we deviate significantly.** FSM + backend integration tests cover most regression risk cheaper. +- **Exit criteria:** e2e green for core combat flow across 3 windows. +- **Upstream-PRable:** ✅ if test infra shared. + +### Milestone 8 — (Future) Public exposure +- Real auth (`AUTH_MODE=token`). +- Rate limiting, CSRF, hardening. +- Postgres migration if load warrants. +- Only if we decide to expose publicly + multiuser. + +--- + +## Testing strategy + +### Layers +1. **FSM unit tests** (Jest, pure functions) — every turn transition, skip invariants, manual override. Cheap, essential. Covers skip bug permanently. (Milestones 1, 4) +2. **Backend integration tests** (Jest) — spin server on random port, assert WS pushes + SQLite persists + transactional correctness. (Milestone 2) +3. **Frontend adapter contract tests** (Jest, `memory`) — impl parity against interface. (Milestone 3) +4. **Playwright multi-window E2E** — deferred. Only realtime sync glue FSM can't reach. (Milestone 7) + +### Two-pass on FSM (Milestones 1 → 4) +1. **Characterization** — capture current behavior exactly (bugs included). Locks extraction as provably identical. Lets later refactor port safely. +2. **Desired-behavior (red)** — write what *should* happen. Fail today. Fix → green. Bug dies, stays dead. + +### Manual smoke via config flags +- `STORAGE=firebase` → current behavior (friend's path, upstream default). +- `STORAGE=ws` → our path, local backend. +- docker-compose profiles mirror the above. + +### Accepted test gap +- Firebase adapter untested (requires live project). Accepted cost. +- Mitigated by: interface contract; if firebase impl drifts, integration smoke only. + +--- + +## Mergability upstream + +| Milestone | Upstream-PRable? | Why | +|---|---|---| +| 0 repo setup | n/a | fork infra | +| 1 FSM extract + characterization | ✅ | pure refactor, identical behavior | +| 2 backend | ❌ | divergence (friend stays Firebase) | +| 3 WS adapter | ⚠️ partial | interface + firebase extract ✅, WS ❌ | +| 4 skip fix + manual override | ✅ | logic fix + beneficial feature | +| 5 docker compose | ❌ | divergence | +| 6 undo rework | ⚠️ partial | FSM logic ✅, events table ❌ | +| 7 playwright | ✅ | if test infra shared | + +Default `STORAGE=firebase` + `AUTH_MODE=none` (unset) = upstream sees literally zero change. + +--- + +## Risks + +- **CRA + workspaces friction.** Create React App may resist monorepo layout. Mitigation: keep `src/` as CRA root, `shared/` as separate workspace imported via alias. Eject/craco only if forced. +- **FSM extraction correctness.** Current logic tangled; verbatim port risks subtle drift. Mitigation: characterization tests first, diff behavior before any fix. +- **Firebase drift untested.** Mitigation: interface contract; friend's path his to maintain. +- **Undo history migration.** Existing log entries use old snapshot format. Mitigation: keep old undo working until cleared, new format for new entries. +- **WS reconnect/state-sync edge cases.** Transient drop mid-combat. Mitigation: client requests full state resync on (re)connect; server is source of truth. + +--- + +## Open decisions (need answers before Milestone 0) + +1. **Branch name** off `main`. Propose: `rework-backend`. Confirm or rename. +2. **Manual turn override.** Action `JUMP_TURN_TO(participantId)`. UI button label "Make This Turn" (alt: "Force Turn Here"). Pick. +3. **npm workspaces** for `shared/` + `server/` alongside CRA `src/`. Prefer yes. CRA may fight → alias as fallback. + +--- + +## Next action (on approval) + +Milestone 0: create branch `rework-backend` off `main`, set remotes, commit this plan. +Then Milestone 1 kickoff: trace `handleNextTurn` full path + write first characterization test capturing the skip bug.