Rework backend #1

Merged
robert merged 86 commits from rework-backend into main 2026-07-01 19:29:34 -04:00
Showing only changes of commit d679c9d1e9 - Show all commits
+72 -63
View File
@@ -1,6 +1,6 @@
# Initiative Tracker — Rework Plan # Initiative Tracker — Rework Plan
Status: **DRAFT — pending approval to execute** Status: **APPROVED — executing**
Owner: draistrick (fork → `keen99/ttrpg-initiative-tracker`, private) Owner: draistrick (fork → `keen99/ttrpg-initiative-tracker`, private)
Upstream: `code.draft13.com/robert/ttrpg-initiative-tracker` (friend's Gitea) Upstream: `code.draft13.com/robert/ttrpg-initiative-tracker` (friend's Gitea)
@@ -32,7 +32,7 @@ Upstream: `code.draft13.com/robert/ttrpg-initiative-tracker` (friend's Gitea)
- Friend keeps it; we fork off it. - Friend keeps it; we fork off it.
### Why a backend is mandatory ### Why a backend is mandatory
Browser sandbox cannot write the filesystem. No sqlite file, no `/data/db.sqlite`, nothing. Browser JS is blocked from disk by design. Therefore cross-device storage (DM ↔ tablet ↔ player view) requires a separate Node process owning the DB file and serving the browser over HTTP/WebSocket. There is no browser-only path. The backend is step one, not deferred. Browser sandbox cannot write the filesystem. No sqlite file, no `/data/db.sqlite`, nothing. Browser JS is blocked from disk by design. Therefore cross-device storage (DM ↔ tablet ↔ player view) requires a separate Node process owning the DB file and serving the browser over HTTP/WebSocket. There is no browser-only path. **The backend is step one, not deferred.**
### Known bug: initiative skips / lost state ### Known bug: initiative skips / lost state
Two failure classes observed: Two failure classes observed:
@@ -57,6 +57,13 @@ Current undo = stale snapshot write-back. Interleaved undos = data loss. Suspect
Postgres deferred until public multiuser exposure is real. SQLite schema ports easily if that day comes. Postgres deferred until public multiuser exposure is real. SQLite schema ports easily if that day comes.
### Backend design
- Owns SQLite file. Only writer.
- Holds authoritative state. Turn logic (initiative order, next-turn, add/remove mid-combat) runs server-side inside SQLite transaction.
- Client sends **action** (e.g. `NEXT_TURN`, not the resulting state). Server computes result, persists, broadcasts diff.
- Kills last-write-wins races by construction.
- WS broadcast on every state change → all connected clients (DM view, player display, tablet) update instantly.
### Three storage impls, one interface (frontend) ### Three storage impls, one interface (frontend)
The storage interface is the test seam and the upstream-compat layer. The storage interface is the test seam and the upstream-compat layer.
@@ -72,16 +79,9 @@ The storage interface is the test seam and the upstream-compat layer.
- `deleteDoc(path)`, `batch(ops)` - `deleteDoc(path)`, `batch(ops)`
- `subscribeDoc(path, cb)` / `subscribeCollection(path, cb)` → real-time push - `subscribeDoc(path, cb)` / `subscribeCollection(path, cb)` → real-time push
Firebase impl: existing `onSnapshot` + SDK calls, moved verbatim behind interface. Firebase impl: existing `onSnapshot` + SDK calls, moved verbatim behind interface (M2).
WS impl: thin client; dispatches **actions** to backend, receives **state updates** via WS subscribe. WS impl: thin client; dispatches **actions** to backend, receives **state updates** via WS subscribe (M2).
Memory impl: in-memory Map + EventEmitter, for tests that don't need the backend. Memory impl: in-memory Map + EventEmitter, for tests (M3).
### Backend design
- Owns SQLite file. Only writer.
- Holds authoritative state. FSM (turn logic) runs server-side inside SQLite transaction.
- Client sends **action** (e.g. `NEXT_TURN`, not the resulting state). Server computes result, persists, broadcasts diff.
- Kills last-write-wins races by construction.
- WS broadcast on every state change → all connected clients (DM view, player display, tablet) update instantly.
### Repo layout (npm workspaces) ### Repo layout (npm workspaces)
@@ -98,17 +98,16 @@ Memory impl: in-memory Map + EventEmitter, for tests that don't need the backend
server/ # NEW server/ # NEW
index.js # Express + ws bootstrap index.js # Express + ws bootstrap
db.js # better-sqlite3, schema, migrations db.js # better-sqlite3, schema, migrations
fsm/ turn.js # turn-order logic (pure, server-authoritative)
turn.js # turn-order state machine (pure) handlers/ # action handlers (call turn logic, persist, broadcast)
handlers/ # action handlers (call fsm, persist, broadcast)
server.test.js # API + WS integration tests server.test.js # API + WS integration tests
shared/ # pure logic, no I/O, importable by client + server + tests shared/ # pure logic, no I/O, importable by client + server + tests
turn.js # turn FSM re-export (single source) turn.js # turn logic (single source; server imports, tests import)
types.js types.js
shared.test/ # FSM unit tests (characterization + desired) shared.test/ # turn logic unit tests (characterization + desired)
turn.characterization.test.js turn.characterization.test.js
turn.desired.test.js turn.desired.test.js
docker-compose.yml # NEW — Milestone E docker-compose.yml # NEW — M5
docs/ docs/
REWORK_PLAN.md # this file REWORK_PLAN.md # this file
``` ```
@@ -123,34 +122,36 @@ Memory impl: in-memory Map + EventEmitter, for tests that don't need the backend
Each milestone = independently mergeable PR upstream (unless marked ❌). Each milestone = independently mergeable PR upstream (unless marked ❌).
### Milestone 0 — Repo + branch setup | M | Does | Tests? |
- Fresh branch off `main` (not `dsr-rework` — avoid contamination). Name: `rework-backend`. |---|---|---|
- Add `upstream` remote (friend's Gitea, read-only fetch). | 0 | repo, branch, remotes | no |
| 1 | build backend (Node+Express+ws+better-sqlite3) | unit tests as built |
| 2 | frontend WS adapter — app runs vs backend, cross-device works | yes |
| 3 | characterization tests lock current behavior (skip bug included) | yes |
| 4 | skip fix + manual override, regression-protected | yes |
| 5 | docker compose in-house | smoke |
| 6 | undo rework (tx events) | unit |
| 7 | playwright multi-window e2e (deferred) | e2e |
### Milestone 0 — Repo + branch setup ✅
- Fresh branch off `main` (not `dsr-rework`). Name: `rework-backend`.
- `upstream` remote = friend's Gitea (read-only fetch).
- Push origin = `keen99/ttrpg-initiative-tracker` (private). - Push origin = `keen99/ttrpg-initiative-tracker` (private).
- npm workspaces root config. - npm workspaces root config.
- Commit this plan. - Commit this plan.
- **Exit criteria:** clean branch, plan committed, remotes set. - **Exit criteria:** clean branch, plan committed, remotes set. ✅ DONE (commit ad7979d, then plan-restored).
- **Upstream-PRable:** n/a (fork infra) - **Upstream-PRable:** n/a (fork infra)
### Milestone 1 — Turn FSM extraction + characterization tests ### Milestone 1 — Build backend
- Extract pure turn-order logic from `App.js` (`handleNextTurn`, `computeTurnOrder*`, sort, add/remove mid-combat) into `shared/turn.js`.
- Pure function: `(state, action) → state`. No I/O, no Firebase, no React.
- Port **verbatim** — bugs included.
- Write characterization tests capturing current behavior (including the skip bug). Lock reality.
- FSM unit-testable in Node with zero infra.
- **Exit criteria:** all characterization tests green. Behavior provably identical to current.
- **Upstream-PRable:** ✅ pure refactor, zero behavior change, no Firebase dependency introduced.
### Milestone 2 — Backend skeleton
- `server/`: Express + ws + better-sqlite3. - `server/`: Express + ws + better-sqlite3.
- Schema mirrors current Firestore doc tree (campaigns, encounters subcoll, activeDisplay, logs). - Schema mirrors current Firestore doc tree (campaigns, encounters subcoll, activeDisplay, logs).
- FSM (from Milestone 1) runs server-side inside SQLite transaction. - Turn logic (initiative order, next-turn, add/remove mid-combat) ported from `App.js` into `server/turn.js` (pure function, server-authoritative). Port verbatim — bugs included for now.
- WS broadcast on every state change. - Actions dispatched to backend; server computes result, persists in SQLite tx, broadcasts via WS.
- Backend integration tests: spin server on random port, assert WS pushes + SQLite persists. - Unit tests as built: turn logic unit tests (characterization capturing current behavior), plus basic API/WS smoke.
- **Exit criteria:** backend boots, serves state over WS, persists to SQLite, tests green. - **Exit criteria:** backend boots, serves state over WS, persists to SQLite, unit tests green.
- **Upstream-PRable:** ❌ divergence (friend stays Firebase). - **Upstream-PRable:** ❌ divergence (friend stays Firebase).
### Milestone 3 — Frontend WS adapter ### Milestone 2 — Frontend WS adapter
- Define `storage/types.js` interface. - Define `storage/types.js` interface.
- Move all ~30 Firestore call sites from `App.js` into `storage/firebase.js` behind interface (verbatim). - Move all ~30 Firestore call sites from `App.js` into `storage/firebase.js` behind interface (verbatim).
- Implement `storage/ws.js` per interface, talking to backend. Dispatches actions, subscribes to WS. - Implement `storage/ws.js` per interface, talking to backend. Dispatches actions, subscribes to WS.
@@ -161,15 +162,23 @@ Each milestone = independently mergeable PR upstream (unless marked ❌).
- **Exit criteria:** app runs fully against local backend, no Firebase. Multi-device sync works. - **Exit criteria:** app runs fully against local backend, no Firebase. Multi-device sync works.
- **Upstream-PRable:** ⚠️ partial. Storage interface + firebase extract = ✅. WS impl = ❌. - **Upstream-PRable:** ⚠️ partial. Storage interface + firebase extract = ✅. WS impl = ❌.
### Milestone 4Red tests + fix skip bug + manual turn override ### Milestone 3Characterization tests lock current behavior
- Lock current behavior end-to-end via integration tests against running backend (turn logic now server-side).
- Capture the skip bug as a characterization test (whatever current does = locked, bugs included).
- Cover: START, NEXT_TURN, PAUSE, RESUME, ADD_PARTICIPANT, REMOVE_PARTICIPANT, TOGGLE_ACTIVE, REORDER, APPLY_DAMAGE/HEAL, DEATH_SAVE, END.
- Iterate until confident: baseline solid, regressions impossible to silently slip.
- **Exit criteria:** characterization suite green against backend. Baseline locked.
- **Upstream-PRable:** ✅ if kept storage-agnostic (tests target turn logic shape).
### Milestone 4 — Skip fix + manual turn override
- Write desired-behavior tests (red): - Write desired-behavior tests (red):
- Never-skip invariant: after `NEXT_TURN`, current participant is always a valid active participant, or encounter cleanly ends. - Never-skip invariant: after `NEXT_TURN`, current participant is always a valid active participant, or encounter cleanly ends.
- Mid-combat add enters turn order correctly. - Mid-combat add enters turn order correctly.
- Remove mid-combat doesn't skip next. - Remove mid-combat doesn't skip next.
- Pause/resume preserves order. - Pause/resume preserves order.
- Fix FSM until red tests go green. Skip bug dies. - Fix turn logic until red tests go green. Skip bug dies.
- Add new action: `JUMP_TURN_TO(participantId)`. DM clicks participant → cursor jumps → that participant's turn now → future `NEXT_TURN` continues from there. UI button label: "Make This Turn" (candidates: "Force Turn Here"). - Add new action: `JUMP_TURN_TO(participantId)`. DM clicks participant → cursor jumps → that participant's turn now → future `NEXT_TURN` continues from there. UI button label: "Make This Turn".
- Regression-protected by Milestone 1 characterization + new desired tests. - Regression-protected by M3 characterization + new desired tests.
- **Exit criteria:** skip bug gone + provably cannot regress. Manual override works. - **Exit criteria:** skip bug gone + provably cannot regress. Manual override works.
- **Upstream-PRable:** ✅ logic fix + new feature, both beneficial. - **Upstream-PRable:** ✅ logic fix + new feature, both beneficial.
@@ -187,12 +196,12 @@ Each milestone = independently mergeable PR upstream (unless marked ❌).
- Replaces current fragile `/logs` snapshot-write undo. - Replaces current fragile `/logs` snapshot-write undo.
- Migration: keep old undo working for existing entries until cleared; new format for new entries. - Migration: keep old undo working for existing entries until cleared; new format for new entries.
- **Exit criteria:** undo works transactionally; interleaved undos don't corrupt. - **Exit criteria:** undo works transactionally; interleaved undos don't corrupt.
- **Upstream-PRable:** ✅ if logic kept storage-agnostic (FSM-level). Backend-specific events table = ❌. - **Upstream-PRable:** ⚠️ partial. Turn-logic-level undo = ✅. Backend events table = ❌.
### Milestone 7 — Playwright E2E (deferred) ### Milestone 7 — Playwright E2E (deferred)
- Multi-window E2E: DM view + display + player view in separate browser contexts against running backend. - Multi-window E2E: DM view + display + player view in separate browser contexts against running backend.
- Verify realtime sync end-to-end. - Verify realtime sync end-to-end.
- **Only build if sync regresses or we deviate significantly.** FSM + backend integration tests cover most regression risk cheaper. - **Only build if sync regresses or we deviate significantly.** Turn-logic unit + backend integration tests cover most regression risk cheaper.
- **Exit criteria:** e2e green for core combat flow across 3 windows. - **Exit criteria:** e2e green for core combat flow across 3 windows.
- **Upstream-PRable:** ✅ if test infra shared. - **Upstream-PRable:** ✅ if test infra shared.
@@ -207,14 +216,14 @@ Each milestone = independently mergeable PR upstream (unless marked ❌).
## Testing strategy ## Testing strategy
### Layers ### Layers
1. **FSM unit tests** (Jest, pure functions) — every turn transition, skip invariants, manual override. Cheap, essential. Covers skip bug permanently. (Milestones 1, 4) 1. **Turn logic unit tests** (Jest, pure functions) — every turn transition, skip invariants, manual override. Built in M1 (characterization), extended in M4 (desired). Cheap, essential.
2. **Backend integration tests** (Jest) — spin server on random port, assert WS pushes + SQLite persists + transactional correctness. (Milestone 2) 2. **Backend integration tests** (Jest) — spin server on random port, assert WS pushes + SQLite persists + transactional correctness. (M1+)
3. **Frontend adapter contract tests** (Jest, `memory`) — impl parity against interface. (Milestone 3) 3. **Frontend adapter contract tests** (Jest, `memory`) — impl parity against interface. (M2)
4. **Playwright multi-window E2E** — deferred. Only realtime sync glue FSM can't reach. (Milestone 7) 4. **Playwright multi-window E2E** — deferred. Only realtime sync glue turn logic can't reach. (M7)
### Two-pass on FSM (Milestones 1 → 4) ### Two-pass on turn logic (M1 → M4)
1. **Characterization** — capture current behavior exactly (bugs included). Locks extraction as provably identical. Lets later refactor port safely. 1. **Characterization** (M1/M3) — capture current behavior exactly (bugs included). Locks extraction/port as provably identical. Lets later fix be provable.
2. **Desired-behavior (red)** — write what *should* happen. Fail today. Fix → green. Bug dies, stays dead. 2. **Desired-behavior (red)** (M4) — write what *should* happen. Fail today. Fix → green. Bug dies, stays dead.
### Manual smoke via config flags ### Manual smoke via config flags
- `STORAGE=firebase` → current behavior (friend's path, upstream default). - `STORAGE=firebase` → current behavior (friend's path, upstream default).
@@ -232,12 +241,12 @@ Each milestone = independently mergeable PR upstream (unless marked ❌).
| Milestone | Upstream-PRable? | Why | | Milestone | Upstream-PRable? | Why |
|---|---|---| |---|---|---|
| 0 repo setup | n/a | fork infra | | 0 repo setup | n/a | fork infra |
| 1 FSM extract + characterization | | pure refactor, identical behavior | | 1 backend | | divergence (friend stays Firebase) |
| 2 backend | ❌ | divergence (friend stays Firebase) | | 2 WS adapter | ⚠️ partial | interface + firebase extract ✅, WS ❌ |
| 3 WS adapter | ⚠️ partial | interface + firebase extract ✅, WS ❌ | | 3 characterization tests | ✅ | if storage-agnostic |
| 4 skip fix + manual override | ✅ | logic fix + beneficial feature | | 4 skip fix + manual override | ✅ | logic fix + beneficial feature |
| 5 docker compose | ❌ | divergence | | 5 docker compose | ❌ | divergence |
| 6 undo rework | ⚠️ partial | FSM logic ✅, events table ❌ | | 6 undo rework | ⚠️ partial | turn-logic-level ✅, events table ❌ |
| 7 playwright | ✅ | if test infra shared | | 7 playwright | ✅ | if test infra shared |
Default `STORAGE=firebase` + `AUTH_MODE=none` (unset) = upstream sees literally zero change. Default `STORAGE=firebase` + `AUTH_MODE=none` (unset) = upstream sees literally zero change.
@@ -246,23 +255,23 @@ Default `STORAGE=firebase` + `AUTH_MODE=none` (unset) = upstream sees literally
## Risks ## Risks
- **CRA + workspaces friction.** Create React App may resist monorepo layout. Mitigation: keep `src/` as CRA root, `shared/` as separate workspace imported via alias. Eject/craco only if forced. - **CRA + workspaces friction.** Create React App may resist monorepo layout. Mitigation: keep `src/` as CRA root, `server/` + `shared/` as separate workspaces imported via alias. Eject/craco only if forced.
- **FSM extraction correctness.** Current logic tangled; verbatim port risks subtle drift. Mitigation: characterization tests first, diff behavior before any fix. - **Turn logic port correctness.** Current logic tangled; verbatim port risks subtle drift. Mitigation: characterization tests in M1/M3 lock behavior before any fix.
- **Firebase drift untested.** Mitigation: interface contract; friend's path his to maintain. - **Firebase drift untested.** Mitigation: interface contract; friend's path his to maintain.
- **Undo history migration.** Existing log entries use old snapshot format. Mitigation: keep old undo working until cleared, new format for new entries. - **Undo history migration.** Existing log entries use old snapshot format. Mitigation: keep old undo working until cleared, new format for new entries.
- **WS reconnect/state-sync edge cases.** Transient drop mid-combat. Mitigation: client requests full state resync on (re)connect; server is source of truth. - **WS reconnect/state-sync edge cases.** Transient drop mid-combat. Mitigation: client requests full state resync on (re)connect; server is source of truth.
--- ---
## Open decisions (need answers before Milestone 0) ## Decisions (locked)
1. **Branch name** off `main`. Propose: `rework-backend`. Confirm or rename. 1. **Branch:** `rework-backend` off `main`.
2. **Manual turn override.** Action `JUMP_TURN_TO(participantId)`. UI button label "Make This Turn" (alt: "Force Turn Here"). Pick. 2. **Manual turn override:** action `JUMP_TURN_TO(participantId)`. UI button "Make This Turn".
3. **npm workspaces** for `shared/` + `server/` alongside CRA `src/`. Prefer yes. CRA may fight → alias as fallback. 3. **npm workspaces** for `server/` + `shared/` alongside CRA `src/`. Fallback alias if CRA fights.
--- ---
## Next action (on approval) ## Next action
Milestone 0: create branch `rework-backend` off `main`, set remotes, commit this plan. M0 ✅ DONE.
Then Milestone 1 kickoff: trace `handleNextTurn` full path + write first characterization test capturing the skip bug. M1 kickoff: scaffold `server/` workspace, set up better-sqlite3 + Express + ws, port turn logic from `App.js` into `server/turn.js`, write first unit tests.