# Initiative Tracker — Rework Plan

Status: **APPROVED — executing**
Owner: draistrick (fork → `keen99/ttrpg-initiative-tracker`, private)
Upstream: `code.draft13.com/robert/ttrpg-initiative-tracker` (friend's Gitea)

---

## Goals

1. **Replace Firebase with self-hosted backend.** Browser cannot own a DB file (sandbox). Cross-device (DM + tablet + player view) requires a real backend. Backend is the foundation, built first.
2. **Automated test ecosystem as the baseline.** Lock current behavior before changing it. Skip bug must become provably impossible to reintroduce.
3. **Remain mergeable upstream.** Default behavior (Firebase) preserved behind flag. Upstream `main` stays clean. Friend keeps Firebase path.
4. **Self-hostable in local Docker** (in-house network). Public exposure = future, only after auth + multiuser safety.

## Non-Goals (this plan)

- Changing user-visible functionality beyond the documented bug fixes (skip, manual turn override).
- Ripping out Firebase. Kept as default adapter upstream.
- Public/multiuser deployment. Deferred.
- Rewriting the entire 2935-line `App.js`. Only extract what testability demands.

---

## Problem Statement

### Why Firebase is wrong here (for this fork)
- Requires Google account + network for a single-user tabletop tool.
- Realtime value (DM view ↔ player display) is real but solvable locally.
- API key baked into client bundle (CRA `REACT_APP_*` at build); security depends entirely on console rules not in repo.
- Vendor lock + quota; `onSnapshot` on collections burns reads.
- Friend keeps it; we fork off it.

### Why a backend is mandatory
Browser sandbox cannot write the filesystem. No sqlite file, no `/data/db.sqlite`, nothing. Browser JS is blocked from disk by design. Therefore cross-device storage (DM ↔ tablet ↔ player view) requires a separate Node process owning the DB file and serving the browser over HTTP/WebSocket. There is no browser-only path. **The backend is step one, not deferred.**

### Known bug: initiative skips / lost state
Two failure classes observed:

1. **Race / data loss.** Every turn mutation = client reads snapshot → computes → writes whole doc back. Two interleaved actions → last-write-wins → state lost → skip. Firestore does offer transactions, but this read-compute-write path does not use them, so nothing serializes concurrent writers. Even a single user gets bitten when optimistic UI races the server round-trip.
2. **Logic drift.** `turnOrderIds` array vs `participants` array vs `isActive` filter drift across mid-combat add/remove/toggle. `currentIndex === -1` fallback path is fragile. No invariant enforced. No way for DM to manually say "this participant's turn now."
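
The race in failure class 1 can be reproduced without Firebase at all. The sketch below is a minimal illustration, assuming a hypothetical in-memory `store` and simplified `advance` / `remove` helpers (none of these names come from `App.js`): two actions read the same snapshot, and the second whole-doc write silently discards the first.

```javascript
// Hypothetical in-memory stand-in for a remote document store.
const store = { doc: { turnOrderIds: ["a", "b", "c"], currentId: "a" } };

const clone = (value) => JSON.parse(JSON.stringify(value));

// Each client works on its own copy of the last snapshot it saw.
const readSnapshot = () => clone(store.doc);
const writeWholeDoc = (doc) => { store.doc = clone(doc); };

// Client-side "next turn": computed from a snapshot, returns a whole new doc.
function advance(snapshot) {
  const order = snapshot.turnOrderIds;
  const i = order.indexOf(snapshot.currentId);
  return { ...snapshot, currentId: order[(i + 1) % order.length] };
}

// Client-side "remove participant", also computed from a snapshot.
function remove(snapshot, id) {
  return { ...snapshot, turnOrderIds: snapshot.turnOrderIds.filter((p) => p !== id) };
}

// Two actions interleave: both read before either writes.
const snapA = readSnapshot();
const snapB = readSnapshot();
writeWholeDoc(advance(snapA));     // current turn becomes "b"
writeWholeDoc(remove(snapB, "c")); // written from the stale snapshot

// store.doc is now { turnOrderIds: ["a", "b"], currentId: "a" }:
// the turn advance was overwritten without any error.
```

Neither write fails, which is why the loss only shows up later as a turn that appears to repeat or skip.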

### Undo is fragile
Current undo = stale snapshot write-back. Interleaved undos = data loss. Suspected to have already caused data loss during a live game.

---

## Architecture

### Stack (locked)
- **Node.js** runtime
- **Express** web framework
- **ws** WebSocket lib (realtime push, replaces `onSnapshot`)
- **better-sqlite3** SQLite driver (synchronous, simple, fast)
- **SQLite** DB (single file, docker volume, trivial backup)
- **Jest** test runner (already in CRA deps)

Postgres deferred until public multiuser exposure is real. SQLite schema ports easily if that day comes.

### Backend design
- Owns SQLite file. Only writer.
- Holds authoritative state. Turn logic (initiative order, next-turn, add/remove mid-combat) runs server-side inside SQLite transaction.
- Client sends **action** (e.g. `NEXT_TURN`, not the resulting state). Server computes result, persists, broadcasts diff.
- Kills last-write-wins races by construction.
- WS broadcast on every state change → all connected clients (DM view, player display, tablet) update instantly.
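
A rough sketch of the action flow described above, under stated assumptions: the state shape, `reduce`, and `createServer` are hypothetical, `persist` stands in for the SQLite transaction, and `broadcast` stands in for the WS fan-out. It sends full state rather than a diff to stay short.

```javascript
// Pure reducer: (state, action) -> next state. The action name follows the
// plan; the state shape is an assumption for illustration.
function reduce(state, action) {
  switch (action.type) {
    case "NEXT_TURN": {
      const n = state.turnOrderIds.length;
      if (n === 0) return state;
      return { ...state, currentIndex: (state.currentIndex + 1) % n };
    }
    default:
      throw new Error(`unknown action: ${action.type}`);
  }
}

// Hypothetical server core. `persist` and `broadcast` are injected so the
// sketch runs without SQLite or a socket.
function createServer(initialState, { persist, broadcast }) {
  let state = initialState;
  return {
    // Synchronous, so actions are applied strictly one at a time.
    dispatch(action) {
      const next = reduce(state, action);
      persist(next); // if this throws, `state` is left untouched
      state = next;
      broadcast(next);
      return next;
    },
    getState: () => state,
  };
}

const sent = [];
const server = createServer(
  { turnOrderIds: ["a", "b", "c"], currentIndex: 0 },
  { persist: () => {}, broadcast: (s) => sent.push(s.currentIndex) }
);
server.dispatch({ type: "NEXT_TURN" });
server.dispatch({ type: "NEXT_TURN" });
// sent is [1, 2]: each action was computed from the state the previous one left.
```

Because clients never send resulting state, two clients pressing "next" at once produce two ordered advances instead of one overwrite.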

### Three storage impls, one interface (frontend)

The storage interface is the test seam and the upstream-compat layer.

| Impl | When used | Automated-tested? |
|---|---|---|
| `firebase.js` | default (`STORAGE=firebase`) — upstream path | No — requires live Firebase project |
| `ws.js` | `STORAGE=ws` — our fork, talks to backend | Yes — against running backend |
| `memory.js` | test-only, in-process | Yes — fast, deterministic |

**Frontend interface contract** (all three implement):
- `getDoc(path)`, `setDoc(path, data, opts)`, `updateDoc(path, patch)`
- `deleteDoc(path)`, `batch(ops)`
- `subscribeDoc(path, cb)` / `subscribeCollection(path, cb)` → real-time push

Firebase impl: existing `onSnapshot` + SDK calls, moved verbatim behind interface (M2).
WS impl: thin client; dispatches **actions** to backend, receives **state updates** via WS subscribe (M2).
Memory impl: in-memory Map + EventEmitter, for tests (M3).
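
To make the memory impl concrete, here is a minimal sketch covering part of the contract with a `Map` and Node's `EventEmitter`. Several things are assumptions: paths are treated as plain string keys, `batch` and `subscribeCollection` are left out, and the methods are synchronous for brevity, whereas an adapter matching the Firebase one would likely return promises.

```javascript
const { EventEmitter } = require("node:events");

// Hypothetical memory-backed store; not the final storage/memory.js.
function createMemoryStorage() {
  const docs = new Map(); // path -> plain object
  const events = new EventEmitter();
  const notify = (path) => events.emit(path, docs.get(path) ?? null);

  return {
    getDoc: (path) => docs.get(path) ?? null,
    setDoc(path, data) {
      docs.set(path, { ...data });
      notify(path);
    },
    updateDoc(path, patch) {
      if (!docs.has(path)) throw new Error(`no document at ${path}`);
      docs.set(path, { ...docs.get(path), ...patch });
      notify(path);
    },
    deleteDoc(path) {
      docs.delete(path);
      notify(path);
    },
    // Calls cb with the current value, then on every change.
    // Returns an unsubscribe function.
    subscribeDoc(path, cb) {
      events.on(path, cb);
      cb(docs.get(path) ?? null);
      return () => events.off(path, cb);
    },
  };
}

const storage = createMemoryStorage();
const seen = [];
const unsubscribe = storage.subscribeDoc("encounters/e1", (doc) => seen.push(doc));
storage.setDoc("encounters/e1", { round: 1 });
storage.updateDoc("encounters/e1", { round: 2 });
unsubscribe();
storage.deleteDoc("encounters/e1"); // not observed: already unsubscribed
// seen is [null, { round: 1 }, { round: 2 }]
```

Delivering the current value immediately on subscribe is a choice made here so tests need no separate initial read.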

### Repo layout (npm workspaces)

```
/
  package.json              # workspaces root
  src/                      # React frontend (existing, refactored behind storage interface)
    storage/
      index.js              # factory: pick impl from STORAGE env
      firebase.js           # extracted from current App.js (verbatim)
      ws.js                 # NEW — talks to backend
      memory.js             # NEW — test only
      types.js              # interface contract (JSDoc)
  server/                   # NEW
    index.js                # Express + ws bootstrap
    db.js                   # better-sqlite3, schema, migrations
    turn.js                 # turn-order logic (pure, server-authoritative)
    handlers/               # action handlers (call turn logic, persist, broadcast)
    server.test.js          # API + WS integration tests
  shared/                   # pure logic, no I/O, importable by client + server + tests
    turn.js                 # turn logic (single source; server imports, tests import)
    types.js
  shared.test/              # turn logic unit tests (characterization + desired)
    turn.characterization.test.js
    turn.desired.test.js
  docker-compose.yml        # NEW — M5
  docs/
    REWORK_PLAN.md          # this file
```

### Auth
- **Now:** `AUTH_MODE=none`. App gated by nginx HTTP basic auth (reuse friend's existing pattern). In-house only. Risk acceptable: someone sees your initiative counter.
- **Future:** `AUTH_MODE=token` — real login, real users. Only if/when publicly exposed. Not built this plan.

---

## Milestones

Each milestone = independently mergeable PR upstream (unless marked ❌).

| M | Does | Tests? |
|---|---|---|
| 0 | repo, branch, remotes | no |
| 1 | build backend (Node+Express+ws+better-sqlite3) | unit tests as built |
| 2 | frontend WS adapter — app runs vs backend, cross-device works | yes |
| 3 | characterization tests lock current behavior (skip bug included) | yes |
| 4 | skip fix + manual override, regression-protected | yes |
| 5 | docker compose in-house | smoke |
| 6 | undo rework (tx events) | unit |
| 7 | playwright multi-window e2e (deferred) | e2e |

### Milestone 0 — Repo + branch setup ✅
- Fresh branch off `main` (not `dsr-rework`). Name: `rework-backend`.
- `upstream` remote = friend's Gitea (read-only fetch).
- Push origin = `keen99/ttrpg-initiative-tracker` (private).
- npm workspaces root config.
- Commit this plan.
- **Exit criteria:** clean branch, plan committed, remotes set. ✅ DONE (commit ad7979d, then plan-restored).
- **Upstream-PRable:** n/a (fork infra)

### Milestone 1 — Build backend
- `server/`: Express + ws + better-sqlite3.
- Schema mirrors current Firestore doc tree (campaigns, encounters subcoll, activeDisplay, logs).
- Turn logic (initiative order, next-turn, add/remove mid-combat) ported from `App.js` into `server/turn.js` (pure function, server-authoritative). Port verbatim — bugs included for now.
- Actions dispatched to backend; server computes result, persists in SQLite tx, broadcasts via WS.
- Unit tests as built: turn logic unit tests (characterization capturing current behavior), plus basic API/WS smoke.
- **Exit criteria:** backend boots, serves state over WS, persists to SQLite, unit tests green.
- **Upstream-PRable:** ❌ divergence (friend stays Firebase).

### Milestone 2 — Frontend WS adapter
- Define `storage/types.js` interface.
- Move all ~30 Firestore call sites from `App.js` into `storage/firebase.js` behind interface (verbatim).
- Implement `storage/ws.js` per interface, talking to backend. Dispatches actions, subscribes to WS.
- Implement `storage/memory.js` for frontend unit tests.
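- `storage/index.js` factory: `STORAGE` env → pick impl. Default `firebase` (upstream unchanged). CRA only inlines `REACT_APP_*` variables into the bundle, so the flag has to reach the client under that prefix (for example `REACT_APP_STORAGE`).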
- App runs against backend with `STORAGE=ws`.
- Cross-device verified manually: DM view + player display + tablet.
- **Exit criteria:** app runs fully against local backend, no Firebase. Multi-device sync works.
- **Upstream-PRable:** ⚠️ partial. Storage interface + firebase extract = ✅. WS impl = ❌.
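
The factory step could look roughly like the sketch below. It is illustrative only: `createStorage` and the stub registry are hypothetical, the env object is passed in explicitly so the sketch does not depend on how the flag reaches the bundle, and each registry entry would wrap one of the real modules under `src/storage/`.

```javascript
// Hypothetical registry; the stubs stand in for the real adapter modules.
const implementations = {
  firebase: () => ({ name: "firebase" }),
  ws: () => ({ name: "ws" }),
  memory: () => ({ name: "memory" }),
};

// Pick an implementation from an env-like object.
function createStorage(env = {}) {
  const choice = env.STORAGE || "firebase"; // unset keeps the upstream default
  if (!Object.prototype.hasOwnProperty.call(implementations, choice)) {
    const known = Object.keys(implementations).join(", ");
    throw new Error(`Unknown STORAGE "${choice}"; expected one of: ${known}`);
  }
  return implementations[choice]();
}

const upstreamDefault = createStorage({});            // firebase
const forkStorage = createStorage({ STORAGE: "ws" }); // ws
```

Failing loudly on an unknown value avoids silently falling back to Firebase when the flag is mistyped.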

### Milestone 3 — Characterization tests lock current behavior
- Lock current behavior end-to-end via integration tests against running backend (turn logic now server-side).
- Capture the skip bug as a characterization test (whatever current does = locked, bugs included).
- Cover: START, NEXT_TURN, PAUSE, RESUME, ADD_PARTICIPANT, REMOVE_PARTICIPANT, TOGGLE_ACTIVE, REORDER, APPLY_DAMAGE/HEAL, DEATH_SAVE, END.
- Iterate until confident: baseline solid, regressions cannot slip in silently.
- **Exit criteria:** characterization suite green against backend. Baseline locked.
- **Upstream-PRable:** ✅ if kept storage-agnostic (tests target turn logic shape).

### Milestone 4 — Skip fix + manual turn override
- Write desired-behavior tests (red):
  - Never-skip invariant: after `NEXT_TURN`, current participant is always a valid active participant, or encounter cleanly ends.
  - Mid-combat add enters turn order correctly.
  - Remove mid-combat doesn't skip next.
  - Pause/resume preserves order.
- Fix turn logic until red tests go green. Skip bug dies.
- Add new action: `JUMP_TURN_TO(participantId)`. DM clicks participant → cursor jumps → that participant's turn now → future `NEXT_TURN` continues from there. UI button label: "Make This Turn".
- Regression-protected by M3 characterization + new desired tests.
- **Exit criteria:** skip bug gone + provably cannot regress. Manual override works.
- **Upstream-PRable:** ✅ logic fix + new feature, both beneficial.
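
One way the never-skip invariant and `JUMP_TURN_TO` might be expressed as pure functions is sketched below. The state shape (`participants`, `turnOrderIds`, `currentId`, `ended`) and every function name are assumptions for illustration, round counting is omitted, and this is not the ported `App.js` logic.

```javascript
// Ids that may legally hold the turn: present in turn order AND active.
function eligibleIds(state) {
  const active = new Set(state.participants.filter((p) => p.isActive).map((p) => p.id));
  return state.turnOrderIds.filter((id) => active.has(id));
}

// Advance to the next eligible participant, or end the encounter cleanly.
function nextTurn(state) {
  const order = state.turnOrderIds;
  const eligible = new Set(eligibleIds(state));
  if (eligible.size === 0) return { ...state, currentId: null, ended: true };
  // Scan from the slot after the current one, wrapping around. If the current
  // id is missing from the order (indexOf gives -1), the scan starts at 0.
  const start = order.indexOf(state.currentId);
  for (let step = 1; step <= order.length; step++) {
    const id = order[(start + step) % order.length];
    if (eligible.has(id)) return { ...state, currentId: id };
  }
  throw new Error("unreachable: an eligible id exists but was not found");
}

// DM override: hand the turn to a specific participant.
function jumpTurnTo(state, participantId) {
  if (!eligibleIds(state).includes(participantId)) {
    throw new Error(`cannot jump to ${participantId}: not active in turn order`);
  }
  return { ...state, currentId: participantId };
}

// The invariant the desired-behavior tests would assert after every action.
function holdsInvariant(state) {
  return state.ended === true || eligibleIds(state).includes(state.currentId);
}

let combat = {
  participants: [
    { id: "a", isActive: true },
    { id: "b", isActive: false },
    { id: "c", isActive: true },
  ],
  turnOrderIds: ["a", "b", "c"],
  currentId: "a",
};
combat = nextTurn(combat);        // passes over inactive "b", lands on "c"
combat = jumpTurnTo(combat, "a"); // manual override
combat = nextTurn(combat);        // continues from "a", lands on "c" again
```

Checking `holdsInvariant` after every action in a test loop is what would make a reintroduced skip fail immediately.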

### Milestone 5 — Docker compose
- `docker-compose.yml`:
  - `backend` service (Node + sqlite volume)
  - `nginx` service (static frontend + reverse proxy + http basic auth)
- Profiles: `firebase` (frontend only, current behavior) vs `backend` (full stack).
- **Exit criteria:** `docker compose up` runs full stack in-house.
- **Upstream-PRable:** ❌ divergence.

### Milestone 6 — Undo rework
- Events table: every mutating action writes `(type, payload, undo_payload, undone, ts)`.
- Undo = apply `undo_payload` in same SQLite tx, flip `undone`. Transactional, no stale clobber.
- Replaces current fragile `/logs` snapshot-write undo.
- Migration: keep old undo working for existing entries until cleared; new format for new entries.
- **Exit criteria:** undo works transactionally; interleaved undos don't corrupt.
- **Upstream-PRable:** ⚠️ partial. Turn-logic-level undo = ✅. Backend events table = ❌.
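
The difference from snapshot write-back is that each event records how to reverse itself, and undo applies that reversal to the current state. The sketch below is a minimal in-memory illustration, assuming a hypothetical `createEventLog` and an `hp` map; in the plan the rows would live in the events table and each method body would run inside one SQLite transaction.

```javascript
// Hypothetical in-memory stand-in for the events table plus encounter state.
function createEventLog(initialState) {
  let state = initialState;
  const events = []; // rows: { id, type, payload, undo_payload, undone, ts }

  return {
    getState: () => state,
    // Apply damage and record the delta needed to reverse it.
    applyDamage(participantId, amount) {
      const before = state.hp[participantId];
      const after = Math.max(0, before - amount);
      state = { ...state, hp: { ...state.hp, [participantId]: after } };
      events.push({
        id: events.length + 1,
        type: "APPLY_DAMAGE",
        payload: { participantId, amount },
        undo_payload: { participantId, delta: before - after },
        undone: false,
        ts: Date.now(),
      });
      return events.length; // id of the new event
    },
    // Reverse one event against the CURRENT state, then flag it as undone.
    undo(eventId) {
      const event = events.find((e) => e.id === eventId);
      if (!event || event.undone) return false;
      const { participantId, delta } = event.undo_payload;
      const restored = state.hp[participantId] + delta;
      state = { ...state, hp: { ...state.hp, [participantId]: restored } };
      event.undone = true;
      return true;
    },
  };
}

const log = createEventLog({ hp: { goblin: 10, orc: 20 } });
const first = log.applyDamage("goblin", 4); // goblin 6
log.applyDamage("orc", 5);                  // orc 15
const undoneOnce = log.undo(first);         // goblin back to 10, orc stays 15
const undoneTwice = log.undo(first);        // false: already undone
```

Undoing the first event leaves the later orc damage intact, which a stale snapshot write-back would have clobbered.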

### Milestone 7 — Playwright E2E (deferred)
- Multi-window E2E: DM view + display + player view in separate browser contexts against running backend.
- Verify realtime sync end-to-end.
- **Only build if sync regresses or we deviate significantly.** Turn-logic unit + backend integration tests cover most regression risk more cheaply.
- **Exit criteria:** e2e green for core combat flow across 3 windows.
- **Upstream-PRable:** ✅ if test infra shared.

### Milestone 8 — (Future) Public exposure
- Real auth (`AUTH_MODE=token`).
- Rate limiting, CSRF, hardening.
- Postgres migration if load warrants.
- Only if we decide to expose publicly + multiuser.

---

## Testing strategy

### Layers
1. **Turn logic unit tests** (Jest, pure functions) — every turn transition, skip invariants, manual override. Built in M1 (characterization), extended in M4 (desired). Cheap, essential.
2. **Backend integration tests** (Jest) — spin server on random port, assert WS pushes + SQLite persists + transactional correctness. (M1+)
3. **Frontend adapter contract tests** (Jest, `memory`) — impl parity against interface. (M2)
4. **Playwright multi-window E2E** — deferred. Only realtime sync glue turn logic can't reach. (M7)

### Two-pass on turn logic (M1 → M4)
1. **Characterization** (M1/M3) — capture current behavior exactly (bugs included). Locks extraction/port as provably identical. Lets later fix be provable.
2. **Desired-behavior (red)** (M4) — write what *should* happen. Fail today. Fix → green. Bug dies, stays dead.
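
A characterization check in the first pass can be as small as replaying steps and comparing against recorded output. The sketch below assumes a hypothetical `legacyNextTurn` standing in for the ported logic, and uses plain comparison instead of Jest matchers so it runs on its own.

```javascript
// Hypothetical stand-in for the ported logic; its quirks are what gets locked.
function legacyNextTurn(state) {
  const i = state.turnOrderIds.indexOf(state.currentId);
  // Quirk kept on purpose: an unknown currentId (i === -1) restarts at index 0.
  const next = state.turnOrderIds[(i + 1) % state.turnOrderIds.length];
  return { ...state, currentId: next };
}

// Replay a step function and record the observable result after each step.
function trace(step, initial, count) {
  const seen = [];
  let state = initial;
  for (let n = 0; n < count; n++) {
    state = step(state);
    seen.push(state.currentId);
  }
  return seen;
}

// Golden values are copied from a run of the current code, not from a spec.
const golden = ["b", "c", "a"];
const observed = trace(
  legacyNextTurn,
  { turnOrderIds: ["a", "b", "c"], currentId: "a" },
  3
);
const matchesBaseline = JSON.stringify(observed) === JSON.stringify(golden);
```

The second pass would reuse `trace` with expectations written from intent, so the same scenario fails until the fix lands.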

### Manual smoke via config flags
- `STORAGE=firebase` → current behavior (friend's path, upstream default).
- `STORAGE=ws` → our path, local backend.
- docker-compose profiles mirror the above.

### Accepted test gap
- Firebase adapter untested (requires live project). Accepted cost.
- Mitigated by the interface contract. If the firebase impl drifts anyway, only a manual integration smoke will catch it.

---

## Mergeability upstream

| Milestone | Upstream-PRable? | Why |
|---|---|---|
| 0 repo setup | n/a | fork infra |
| 1 backend | ❌ | divergence (friend stays Firebase) |
| 2 WS adapter | ⚠️ partial | interface + firebase extract ✅, WS ❌ |
| 3 characterization tests | ✅ | if storage-agnostic |
| 4 skip fix + manual override | ✅ | logic fix + beneficial feature |
| 5 docker compose | ❌ | divergence |
| 6 undo rework | ⚠️ partial | turn-logic-level ✅, events table ❌ |
| 7 playwright | ✅ | if test infra shared |

Default `STORAGE=firebase` + `AUTH_MODE=none` (unset) = upstream sees literally zero change.

---

## Risks

- **CRA + workspaces friction.** Create React App rejects relative imports from outside `src/` by default, and may resist a monorepo layout in other ways. Mitigation: keep `src/` as CRA root, `server/` + `shared/` as separate workspaces imported via alias. Eject/craco only if forced.
- **Turn logic port correctness.** Current logic tangled; verbatim port risks subtle drift. Mitigation: characterization tests in M1/M3 lock behavior before any fix.
- **Firebase drift untested.** Mitigation: interface contract; friend's path is his to maintain.
- **Undo history migration.** Existing log entries use old snapshot format. Mitigation: keep old undo working until cleared, new format for new entries.
- **WS reconnect/state-sync edge cases.** Transient drop mid-combat. Mitigation: client requests full state resync on (re)connect; server is source of truth.

---

## Decisions (locked)

1. **Branch:** `rework-backend` off `main`.
2. **Manual turn override:** action `JUMP_TURN_TO(participantId)`. UI button "Make This Turn".
3. **npm workspaces** for `server/` + `shared/` alongside CRA `src/`. Fallback alias if CRA fights.

---

## Next action

M0 ✅ DONE.
M1 kickoff: scaffold `server/` workspace, set up better-sqlite3 + Express + ws, port turn logic from `App.js` into `server/turn.js`, write first unit tests.