Initiative Tracker — Rework Plan
Status: DRAFT — pending approval to execute
Owner: draistrick (fork → keen99/ttrpg-initiative-tracker, private)
Upstream: code.draft13.com/robert/ttrpg-initiative-tracker (friend's Gitea)
Goals
- Replace Firebase with self-hosted backend. Browser cannot own a DB file (sandbox). Cross-device (DM + tablet + player view) requires a real backend. Backend is the foundation, built first.
- Automated test ecosystem as the baseline. Lock current behavior before changing it. Skip bug must become provably impossible to reintroduce.
- Remain mergeable upstream. Default behavior (Firebase) preserved behind flag. Upstream main stays clean. Friend keeps Firebase path.
- Self-hostable in local Docker (in-house network). Public exposure = future, only after auth + multiuser safety.
Non-Goals (this plan)
- Changing user-visible functionality beyond the documented bug fixes (skip, manual turn override).
- Ripping Firebase. Kept as default adapter upstream.
- Public/multiuser deployment. Deferred.
- Rewriting the entire 2935-line App.js. Only extract what testability demands.
Problem Statement
Why Firebase is wrong here (for this fork)
- Requires Google account + network for a single-user tabletop tool.
- Realtime value (DM view ↔ player display) is real but solvable locally.
- API key baked into client bundle (CRA REACT_APP_* at build); security depends entirely on console rules not in repo.
- Vendor lock + quota; onSnapshot on collections burns reads.
- Friend keeps it; we fork off it.
Why a backend is mandatory
Browser JS cannot open a shared file on disk: no sqlite file, no /data/db.sqlite. What the sandbox does offer (IndexedDB, localStorage) is local to one browser on one device. Therefore cross-device storage (DM ↔ tablet ↔ player view) requires a separate Node process owning the DB file and serving the browser over HTTP/WebSocket. There is no browser-only path. The backend is step one, not deferred.
Known bug: initiative skips / lost state
Two failure classes observed:
- Race / data loss. Every turn mutation = client reads snapshot → computes → writes whole doc back. Two interleaved actions → last-write-wins → state lost → skip. Snapshots are eventually consistent, and this read-compute-write path is not wrapped in a transaction. Even single-user bites via optimistic UI vs server round-trip.
- Logic drift. turnOrderIds array vs participants array vs isActive filter drift across mid-combat add/remove/toggle. currentIndex === -1 fallback path is fragile. No invariant enforced. No way for DM to manually say "this participant's turn now."
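The race can be reproduced in a few lines. This is a minimal sketch, not the code in App.js: the document shape (currentIndex, order) and the function name are assumptions made for illustration.

```javascript
// Hypothetical encounter doc; field names are illustrative, not taken from App.js.
let storedDoc = { currentIndex: 0, order: ['a', 'b', 'c'] };

// Whole-document write: the result is computed from whatever snapshot the caller holds.
function advanceFromSnapshot(snapshot) {
  return { ...snapshot, currentIndex: (snapshot.currentIndex + 1) % snapshot.order.length };
}

// Two "next turn" actions interleave: both read before either writes.
const snapshotA = { ...storedDoc };
const snapshotB = { ...storedDoc };
storedDoc = advanceFromSnapshot(snapshotA); // first write: currentIndex becomes 1
storedDoc = advanceFromSnapshot(snapshotB); // second write, from the stale snapshot: still 1

const expectedIndex = 2; // two advances were requested
const actualIndex = storedDoc.currentIndex; // only one survived
console.log({ expectedIndex, actualIndex });
```

Here the lost update shows up as a missing advance. With other interleavings (for example a toggle racing an advance) the same mechanism can surface as a skipped participant.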
Undo is fragile
Current undo = stale snapshot write-back. Interleaved undos = data loss. Suspected already bitten during live game.
Architecture
Stack (locked)
- Node.js runtime
- Express web framework
- ws WebSocket lib (realtime push, replaces onSnapshot)
- better-sqlite3 SQLite driver (synchronous, simple, fast)
- SQLite DB (single file, docker volume, trivial backup)
- Jest test runner (already in CRA deps)
Postgres deferred until public multiuser exposure is real. SQLite schema ports easily if that day comes.
Three storage impls, one interface (frontend)
The storage interface is the test seam and the upstream-compat layer.
| Impl | When used | Automated-tested? |
|---|---|---|
| firebase.js | default (STORAGE=firebase); upstream path | No (requires live Firebase project) |
| ws.js | STORAGE=ws; our fork, talks to backend | Yes (against running backend) |
| memory.js | test-only, in-process | Yes (fast, deterministic) |
Frontend interface contract (all three implement):
- getDoc(path), setDoc(path, data, opts), updateDoc(path, patch)
- deleteDoc(path), batch(ops)
- subscribeDoc(path, cb) / subscribeCollection(path, cb) → real-time push
Firebase impl: existing onSnapshot + SDK calls, moved verbatim behind interface.
WS impl: thin client; dispatches actions to backend, receives state updates via WS subscribe.
Memory impl: in-memory Map + EventEmitter, for tests that don't need the backend.
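A minimal sketch of what the memory impl could look like, using only a Map and Node's EventEmitter. The method names come from the contract above; the merge option on setDoc, the batch op shape, the unsubscribe return value, and the "initial value on subscribe" behavior are assumptions, not something the plan specifies.

```javascript
const { EventEmitter } = require('node:events');

// Sketch of storage/memory.js. Paths look like "campaigns/c1" (assumed).
function createMemoryStorage() {
  const docs = new Map();
  const events = new EventEmitter();

  const parentOf = (path) => path.split('/').slice(0, -1).join('/');
  const listCollection = (collPath) =>
    [...docs.entries()]
      .filter(([path]) => parentOf(path) === collPath)
      .map(([path, data]) => ({ path, data }));

  // Push the new value to doc subscribers and the parent collection's subscribers.
  function notify(path) {
    events.emit(`doc:${path}`, docs.get(path) ?? null);
    events.emit(`coll:${parentOf(path)}`, listCollection(parentOf(path)));
  }

  const api = {
    async getDoc(path) { return docs.get(path) ?? null; },
    async setDoc(path, data, opts = {}) {
      const previous = opts.merge ? docs.get(path) ?? {} : {};
      docs.set(path, { ...previous, ...data });
      notify(path);
    },
    async updateDoc(path, patch) {
      if (!docs.has(path)) throw new Error(`No document at ${path}`);
      docs.set(path, { ...docs.get(path), ...patch });
      notify(path);
    },
    async deleteDoc(path) { docs.delete(path); notify(path); },
    // ops: [{ op: 'set' | 'update' | 'delete', path, data }] (assumed shape).
    // Applied in order; this sketch does not roll back on failure.
    async batch(ops) {
      for (const { op, path, data } of ops) {
        if (op === 'set') await api.setDoc(path, data);
        else if (op === 'update') await api.updateDoc(path, data);
        else if (op === 'delete') await api.deleteDoc(path);
        else throw new Error(`Unknown batch op: ${op}`);
      }
    },
    subscribeDoc(path, cb) {
      events.on(`doc:${path}`, cb);
      cb(docs.get(path) ?? null); // deliver the current value immediately
      return () => events.off(`doc:${path}`, cb);
    },
    subscribeCollection(path, cb) {
      events.on(`coll:${path}`, cb);
      cb(listCollection(path));
      return () => events.off(`coll:${path}`, cb);
    },
  };
  return api;
}
```

Because nothing here touches the network or disk, a test can create a fresh instance per case and assert on exactly the values its subscribers received.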
Backend design
- Owns SQLite file. Only writer.
- Holds authoritative state. FSM (turn logic) runs server-side inside SQLite transaction.
- Client sends action (e.g. NEXT_TURN, not the resulting state). Server computes result, persists, broadcasts diff.
- Kills last-write-wins races by construction.
- WS broadcast on every state change → all connected clients (DM view, player display, tablet) update instantly.
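The action → compute → persist → broadcast loop described above could be sketched roughly as follows. Persistence and sockets are injected stand-ins so the example runs without dependencies: reduce, load, save, and runInTransaction are hypothetical names, and the sketch broadcasts the full state rather than a diff to stay short. With better-sqlite3, runInTransaction could plausibly be `(fn) => db.transaction(fn)()`.

```javascript
// Sketch of a server-side action handler; all injected names are assumptions.
function createDispatcher({ reduce, load, save, runInTransaction }) {
  const clients = new Set(); // each client is a function (message) => void

  function dispatch(encounterId, action) {
    // Read, compute, and write happen inside one transaction, so two
    // actions can never interleave between the read and the write.
    const next = runInTransaction(() => {
      const current = load(encounterId);
      const updated = reduce(current, action);
      save(encounterId, updated);
      return updated;
    });
    // Broadcast only after the transaction has returned successfully.
    const message = JSON.stringify({ type: 'STATE', encounterId, state: next });
    for (const send of clients) send(message);
    return next;
  }

  return {
    dispatch,
    subscribe(send) {
      clients.add(send);
      return () => clients.delete(send);
    },
  };
}
```

The client never sends a computed state, so a stale browser tab can at worst send a redundant action; it cannot overwrite newer state.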
Repo layout (npm workspaces)
/
  package.json              # workspaces root
  src/                      # React frontend (existing, refactored behind storage interface)
    storage/
      index.js              # factory: pick impl from STORAGE env
      firebase.js           # extracted from current App.js (verbatim)
      ws.js                 # NEW: talks to backend
      memory.js             # NEW: test only
      types.js              # interface contract (JSDoc)
  server/                   # NEW
    index.js                # Express + ws bootstrap
    db.js                   # better-sqlite3, schema, migrations
    fsm/
      turn.js               # turn-order state machine (pure)
    handlers/               # action handlers (call fsm, persist, broadcast)
    server.test.js          # API + WS integration tests
  shared/                   # pure logic, no I/O, importable by client + server + tests
    turn.js                 # turn FSM re-export (single source)
    types.js
  shared.test/              # FSM unit tests (characterization + desired)
    turn.characterization.test.js
    turn.desired.test.js
  docker-compose.yml        # NEW: Milestone 5
  docs/
    REWORK_PLAN.md          # this file
Auth
- Now: AUTH_MODE=none. App gated by nginx HTTP basic auth (reuse friend's existing pattern). In-house only. Risk acceptable: someone sees your initiative counter.
- Future: AUTH_MODE=token (real login, real users). Only if/when publicly exposed. Not built this plan.
Milestones
Each milestone = independently mergeable PR upstream (unless marked ❌).
Milestone 0 — Repo + branch setup
- Fresh branch off main (not dsr-rework; avoid contamination). Name: rework-backend.
- Add upstream remote (friend's Gitea, read-only fetch).
- Push origin = keen99/ttrpg-initiative-tracker (private).
- npm workspaces root config.
- Commit this plan.
- Exit criteria: clean branch, plan committed, remotes set.
- Upstream-PRable: n/a (fork infra)
Milestone 1 — Turn FSM extraction + characterization tests
- Extract pure turn-order logic from App.js (handleNextTurn, computeTurnOrder*, sort, add/remove mid-combat) into shared/turn.js.
- Pure function: (state, action) → state. No I/O, no Firebase, no React.
- Port verbatim, bugs included.
- Write characterization tests capturing current behavior (including the skip bug). Lock reality.
- FSM unit-testable in Node with zero infra.
- Exit criteria: all characterization tests green. Behavior provably identical to current.
- Upstream-PRable: ✅ pure refactor, zero behavior change, no Firebase dependency introduced.
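A characterization test for this milestone might take the following shape: replay a scripted action sequence through the extracted reducer and record what comes out, without judging whether it is right. The reducer body below is only a placeholder so the harness has something to run; the real one is the verbatim port. The currentTurnId field and the replay helper are assumptions for illustration.

```javascript
// Placeholder for the logic extracted from App.js (NOT the real port).
function legacyTurnReducer(state, action) {
  if (action.type !== 'NEXT_TURN') return state;
  const currentIndex = state.turnOrderIds.indexOf(state.currentTurnId);
  const nextIndex = (currentIndex + 1) % state.turnOrderIds.length;
  return { ...state, currentTurnId: state.turnOrderIds[nextIndex] };
}

// Replay a scripted action sequence and record the observable result of each step.
function replay(reducer, initialState, actions) {
  const trace = [];
  let state = Object.freeze(initialState); // a pure reducer must not mutate its input
  for (const action of actions) {
    state = Object.freeze(reducer(state, action));
    trace.push({ action: action.type, currentTurnId: state.currentTurnId });
  }
  return trace;
}

const trace = replay(
  legacyTurnReducer,
  { turnOrderIds: ['a', 'b', 'c'], currentTurnId: 'a' },
  [{ type: 'NEXT_TURN' }, { type: 'NEXT_TURN' }, { type: 'NEXT_TURN' }],
);
console.log(trace);
```

In Jest the recorded trace would typically be pinned with `expect(trace).toMatchSnapshot()`, so that any later change in behavior, intended or not, shows up as a snapshot diff.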
Milestone 2 — Backend skeleton
- server/: Express + ws + better-sqlite3.
- Schema mirrors current Firestore doc tree (campaigns, encounters subcoll, activeDisplay, logs).
- FSM (from Milestone 1) runs server-side inside SQLite transaction.
- WS broadcast on every state change.
- Backend integration tests: spin server on random port, assert WS pushes + SQLite persists.
- Exit criteria: backend boots, serves state over WS, persists to SQLite, tests green.
- Upstream-PRable: ❌ divergence (friend stays Firebase).
Milestone 3 — Frontend WS adapter
- Define storage/types.js interface.
- Move all ~30 Firestore call sites from App.js into storage/firebase.js behind interface (verbatim).
- Implement storage/ws.js per interface, talking to backend. Dispatches actions, subscribes to WS.
- Implement storage/memory.js for frontend unit tests.
- storage/index.js factory: STORAGE env → pick impl. Default firebase (upstream unchanged).
- App runs against backend with STORAGE=ws.
- Cross-device verified manually: DM view + player display + tablet.
- Exit criteria: app runs fully against local backend, no Firebase. Multi-device sync works.
- Upstream-PRable: ⚠️ partial. Storage interface + firebase extract = ✅. WS impl = ❌.
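The factory in storage/index.js could be as small as the sketch below. Impl constructors are passed in so the example runs standalone; the real file would import them. One assumption worth flagging: Create React App only exposes environment variables prefixed with REACT_APP_ to browser code, so the sketch reads REACT_APP_STORAGE first and falls back to STORAGE. The plan itself only names STORAGE.

```javascript
// Sketch of the storage factory; pickStorage and its arguments are hypothetical names.
function pickStorage(env, impls) {
  // Default to firebase so upstream behavior is unchanged when the flag is unset.
  const name = env.REACT_APP_STORAGE || env.STORAGE || 'firebase';
  if (!Object.prototype.hasOwnProperty.call(impls, name)) {
    throw new Error(`Unknown STORAGE "${name}"; expected one of: ${Object.keys(impls).join(', ')}`);
  }
  return impls[name]();
}

// Usage with stand-in constructors.
const impls = {
  firebase: () => ({ kind: 'firebase' }),
  ws: () => ({ kind: 'ws' }),
  memory: () => ({ kind: 'memory' }),
};
const defaultStorage = pickStorage({}, impls);
const forkStorage = pickStorage({ STORAGE: 'ws' }, impls);
console.log(defaultStorage.kind, forkStorage.kind);
```

Failing loudly on an unknown value avoids silently falling back to Firebase when the flag is mistyped.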
Milestone 4 — Red tests + fix skip bug + manual turn override
- Write desired-behavior tests (red):
  - Never-skip invariant: after NEXT_TURN, current participant is always a valid active participant, or encounter cleanly ends.
  - Mid-combat add enters turn order correctly.
  - Remove mid-combat doesn't skip next.
  - Pause/resume preserves order.
- Fix FSM until red tests go green. Skip bug dies.
- Add new action: JUMP_TURN_TO(participantId). DM clicks participant → cursor jumps → that participant's turn now → future NEXT_TURN continues from there. UI button label: "Make This Turn" (alt: "Force Turn Here").
- Regression-protected by Milestone 1 characterization + new desired tests.
- Exit criteria: skip bug gone + provably cannot regress. Manual override works.
- Upstream-PRable: ✅ logic fix + new feature, both beneficial.
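One way the never-skip invariant and JUMP_TURN_TO could fit together is sketched below. This is an illustration of the desired behavior, not the fixed FSM: the state shape (participants with isActive, turnOrderIds, currentTurnId, ended) and both function names are assumptions.

```javascript
// Ids in turn order, restricted to participants that are currently active.
function activeOrder(state) {
  const active = new Set(state.participants.filter((p) => p.isActive).map((p) => p.id));
  return state.turnOrderIds.filter((id) => active.has(id));
}

function reduceTurn(state, action) {
  const order = activeOrder(state);
  switch (action.type) {
    case 'NEXT_TURN': {
      // Nobody left to act: end cleanly instead of pointing at an invalid participant.
      if (order.length === 0) return { ...state, currentTurnId: null, ended: true };
      // Walk the full turn order from the cursor, so an inactive current
      // participant still anchors the search for the next active one.
      const all = state.turnOrderIds;
      const start = all.indexOf(state.currentTurnId); // -1 starts the walk at index 0
      for (let step = 1; step <= all.length; step += 1) {
        const candidate = all[(start + step) % all.length];
        if (order.includes(candidate)) return { ...state, currentTurnId: candidate };
      }
      return { ...state, currentTurnId: order[0] }; // defensive fallback
    }
    case 'JUMP_TURN_TO': {
      // Ignore targets that are not valid active participants.
      if (!order.includes(action.participantId)) return state;
      return { ...state, currentTurnId: action.participantId };
    }
    default:
      return state;
  }
}

// Invariant for the desired-behavior tests: throws if the cursor is invalid.
function assertNeverSkipInvariant(state) {
  if (state.ended) return;
  if (!activeOrder(state).includes(state.currentTurnId)) {
    throw new Error(`Current turn "${state.currentTurnId}" is not an active participant`);
  }
}
```

A desired-behavior test can then call assertNeverSkipInvariant after every action in a scripted sequence, including sequences that toggle or remove participants mid-combat.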
Milestone 5 — Docker compose
- docker-compose.yml:
  - backend service (Node + sqlite volume)
  - nginx service (static frontend + reverse proxy + http basic auth)
- Profiles: firebase (frontend only, current behavior) vs backend (full stack).
- Exit criteria: docker compose up runs full stack in-house.
- Upstream-PRable: ❌ divergence.
Milestone 6 — Undo rework
- Events table: every mutating action writes (type, payload, undo_payload, undone, ts).
- Undo = apply undo_payload in same SQLite tx, flip undone. Transactional, no stale clobber.
- Replaces current fragile /logs snapshot-write undo.
- Migration: keep old undo working for existing entries until cleared; new format for new entries.
- Exit criteria: undo works transactionally; interleaved undos don't corrupt.
- Upstream-PRable: ✅ if logic kept storage-agnostic (FSM-level). Backend-specific events table = ❌.
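The event-log undo can be sketched without a database. The array below stands in for the events table and uses the column names listed above; treating payload as a shallow patch and undo_payload as the prior values of the patched keys is an assumption, as are the function names.

```javascript
// In-memory stand-in for the events table.
function createEventLog(initialState) {
  let state = initialState;
  const events = [];

  function apply(type, patch) {
    // Record the prior values of exactly the keys being changed.
    // (Keys that did not exist before are restored as undefined in this sketch.)
    const undoPayload = {};
    for (const key of Object.keys(patch)) undoPayload[key] = state[key];
    state = { ...state, ...patch };
    events.push({ type, payload: patch, undo_payload: undoPayload, undone: false, ts: Date.now() });
    return state;
  }

  function undoLast() {
    // Most recent event that has not been undone yet (newest first).
    const event = [...events].reverse().find((e) => !e.undone);
    if (!event) return state; // nothing left to undo
    state = { ...state, ...event.undo_payload };
    event.undone = true;
    return state;
  }

  return { apply, undoLast, getState: () => state, events };
}
```

Because each undo restores only the keys its own event touched, and undos run newest-first, an undo never writes back a whole stale snapshot. In the backend, the state update and the undone flag would be written in the same SQLite transaction.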
Milestone 7 — Playwright E2E (deferred)
- Multi-window E2E: DM view + display + player view in separate browser contexts against running backend.
- Verify realtime sync end-to-end.
- Only build if sync regresses or we deviate significantly. FSM + backend integration tests cover most regression risk cheaper.
- Exit criteria: e2e green for core combat flow across 3 windows.
- Upstream-PRable: ✅ if test infra shared.
Milestone 8 — (Future) Public exposure
- Real auth (AUTH_MODE=token).
- Rate limiting, CSRF, hardening.
- Postgres migration if load warrants.
- Only if we decide to expose publicly + multiuser.
Testing strategy
Layers
- FSM unit tests (Jest, pure functions): every turn transition, skip invariants, manual override. Cheap, essential. Covers skip bug permanently. (Milestones 1, 4)
- Backend integration tests (Jest): spin server on random port, assert WS pushes + SQLite persists + transactional correctness. (Milestone 2)
- Frontend adapter contract tests (Jest, memory): impl parity against interface. (Milestone 3)
- Playwright multi-window E2E: deferred. Only realtime sync glue FSM can't reach. (Milestone 7)
Two-pass on FSM (Milestones 1 → 4)
- Characterization — capture current behavior exactly (bugs included). Locks extraction as provably identical. Lets later refactor port safely.
- Desired-behavior (red) — write what should happen. Fail today. Fix → green. Bug dies, stays dead.
Manual smoke via config flags
- STORAGE=firebase → current behavior (friend's path, upstream default).
- STORAGE=ws → our path, local backend.
- docker-compose profiles mirror the above.
Accepted test gap
- Firebase adapter untested (requires live project). Accepted cost.
- Mitigated by: interface contract. If the firebase impl drifts, only manual integration smoke testing will catch it.
Mergeability upstream
| Milestone | Upstream-PRable? | Why |
|---|---|---|
| 0 repo setup | n/a | fork infra |
| 1 FSM extract + characterization | ✅ | pure refactor, identical behavior |
| 2 backend | ❌ | divergence (friend stays Firebase) |
| 3 WS adapter | ⚠️ partial | interface + firebase extract ✅, WS ❌ |
| 4 skip fix + manual override | ✅ | logic fix + beneficial feature |
| 5 docker compose | ❌ | divergence |
| 6 undo rework | ⚠️ partial | FSM logic ✅, events table ❌ |
| 7 playwright | ✅ | if test infra shared |
Default STORAGE=firebase + AUTH_MODE=none (unset) = upstream sees literally zero change.
Risks
- CRA + workspaces friction. Create React App may resist monorepo layout. Mitigation: keep src/ as CRA root, shared/ as separate workspace imported via alias. Eject/craco only if forced.
- FSM extraction correctness. Current logic tangled; verbatim port risks subtle drift. Mitigation: characterization tests first, diff behavior before any fix.
- Firebase drift untested. Mitigation: interface contract; friend's path his to maintain.
- Undo history migration. Existing log entries use old snapshot format. Mitigation: keep old undo working until cleared, new format for new entries.
- WS reconnect/state-sync edge cases. Transient drop mid-combat. Mitigation: client requests full state resync on (re)connect; server is source of truth.
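For the reconnect risk, the resync-on-connect mitigation might look like the transport-agnostic sketch below. The message types (RESYNC, FULL_STATE, UPDATE) and the per-update version counter are assumptions; the plan only says the client requests a full state resync and the server is the source of truth.

```javascript
// `send` is whatever writes a message to the socket; injected so the sketch has no dependencies.
function createSyncClient(send) {
  let state = null;
  let version = -1;

  return {
    // Called on every (re)connect: never trust local state after a drop.
    onOpen() {
      send({ type: 'RESYNC' });
    },
    onMessage(msg) {
      if (msg.type === 'FULL_STATE') {
        state = msg.state;
        version = msg.version;
        return;
      }
      if (msg.type !== 'UPDATE' || state === null) return; // wait for full state first
      if (msg.version === version + 1) {
        state = { ...state, ...msg.patch };
        version = msg.version;
      } else if (msg.version > version + 1) {
        send({ type: 'RESYNC' }); // gap detected: at least one update was missed
      }
      // Older or duplicate versions are ignored.
    },
    getState: () => state,
  };
}
```

The version check also covers drops the socket layer never reports: a missed update is detected on the next message rather than silently leaving one display out of date.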
Open decisions (need answers before Milestone 0)
- Branch name off main. Propose: rework-backend. Confirm or rename.
- Manual turn override. Action JUMP_TURN_TO(participantId). UI button label "Make This Turn" (alt: "Force Turn Here"). Pick.
- npm workspaces for shared/ + server/ alongside CRA src/. Prefer yes. CRA may fight → alias as fallback.
Next action (on approval)
Milestone 0: create branch rework-backend off main, set remotes, commit this plan.
Then Milestone 1 kickoff: trace handleNextTurn full path + write first characterization test capturing the skip bug.