Project Design Doc
[Project name] - scope of work, architecture, and open questions. This document has three parts: the Scope of Work (below), what the client sees when the product ships, and how it works under the hood.
Phase 1 - V1 (current scope)
Rough shape: ~3 weeks end to end.
V1 delivers the full member-facing product on the full content library. It exists to prove that members will use it and that the anti-hallucination guarantee holds up under real queries.
Success criteria
- Members can ask questions and get grounded, video-cited answers on the full library.
- Members can browse the catalog by every important facet (practice area, sub-area, state approval, format, date, popularity) and find programs quickly.
- The product is embedded in the members' site behind proper membership gating; non-members see a teaser and get upsold.
- The evaluation harness reports retrieval hit-rate and grounding score, and both meet an agreed-upon bar.
- The client can sign off on a demo acceptance script before V2 starts.
Workstream 1.1 - Setup, infrastructure & migration
- Access, scope lock, and handoff - collect access to the current system (repo, app, existing vendors, content platform, video store, designs, content sample); lock the V1 win condition and deadline; run a knowledge-transfer session with the departing developer.
- Migration off the current RAG vendor - export the transcripts, chunks, and metadata before the current vendor goes offline; stand up our own catalog + vector store so nothing depends on the outgoing vendor after the cutover.
- Single secure cloud environment - Container Apps + Postgres (catalog + pgvector) + managed Redis + queue + blob storage + secret store + private networking + CI/CD + basic monitoring and budget alerts. Single environment, not multi-region.
- Content pipeline over the full library - extract, transcribe with diarization, resolve speakers to named attorneys, chunk with timecodes, tag, embed, and index every program.
- Retrieval + RAG backbone - the embedding + hybrid search + re-rank + grounded reasoning stack, plus SSE streaming for the member-facing answer.
- Auth + three-tier gating foundation - the token handshake between the members' site and the embedded app; entitlement checks; signed video URLs; the public / teaser / member boundary enforced server-side.
- Operator console - internal UI for the human-in-the-loop steps (speaker verification, tag corrections, content review).
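To make the "chunk with timecodes" step of the content pipeline concrete, here is a minimal sketch. It assumes the transcription step yields diarized segments with start/end times in seconds and a resolved speaker name. The `Segment` and `Chunk` shapes, the `chunk_segments` name, and the 60-second default are illustrative assumptions, not decisions made in this document.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Segment:
    """One diarized transcript segment (hypothetical shape)."""
    start: float   # seconds from the start of the program
    end: float
    speaker: str   # resolved attorney name
    text: str


@dataclass(frozen=True)
class Chunk:
    """A retrieval unit that keeps the timecodes needed for seek-and-play."""
    start: float
    end: float
    speaker: str
    text: str


def _merge(segments: List[Segment]) -> Chunk:
    # The chunk spans from the first segment's start to the last segment's end.
    return Chunk(
        start=segments[0].start,
        end=segments[-1].end,
        speaker=segments[0].speaker,
        text=" ".join(s.text for s in segments),
    )


def chunk_segments(segments: List[Segment], max_seconds: float = 60.0) -> List[Chunk]:
    """Group consecutive segments into chunks.

    A chunk is closed when the speaker changes or when adding the next
    segment would make the chunk longer than max_seconds. A single segment
    longer than max_seconds is kept whole as its own chunk.
    """
    chunks: List[Chunk] = []
    current: List[Segment] = []
    for seg in segments:
        if current and (
            seg.speaker != current[0].speaker
            or seg.end - current[0].start > max_seconds
        ):
            chunks.append(_merge(current))
            current = []
        current.append(seg)
    if current:
        chunks.append(_merge(current))
    return chunks
```

Because every chunk carries its own start and end times, a cited chunk can be turned directly into a playable clip range. The real chunking rule (by duration, by tokens, or by topic) is one of the things the refinement loop is expected to tune.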
Workstream 1.2 - Member-facing UI
- Embed + the 4-panel answer - iframe embed with the token handshake; the four-panel answer streaming token by token; single and multi-clip answer variants; core states (loading, streaming, teaser, no-source, error); link out to program materials.
- Hybrid search UI - faceted browse with counts and subcategorization; keyword and semantic search; live and replay content featured first; the narrowing UX (subtopic chips, "tell us more" flow).
- Video playback + upgrade moment - seek-and-play a range via signed URLs; the teaser-to-upgrade paywall UX.
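As an illustration of how seek-and-play via signed URLs could be enforced server-side, the sketch below signs a clip range with an HMAC and an expiry, and refuses to issue a URL to anyone who is not a member. Everything specific here is an assumption for illustration: the query parameter names (`start`, `end`, `exp`, `sig`), the `"member"` tier string, and the helper names. In practice the video store's own signed-URL mechanism would likely replace the hand-rolled signature.

```python
import hashlib
import hmac
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse


def _signature(path: str, params: dict, secret: bytes) -> str:
    # Sign the path together with the clip range and expiry, so that
    # none of them can be altered without invalidating the signature.
    payload = f"{path}?{urlencode(params)}"
    return hmac.new(secret, payload.encode(), hashlib.sha256).hexdigest()


def sign_clip_url(base_url: str, start: int, end: int, expires_at: int, secret: bytes) -> str:
    """Return a URL that authorizes playback of [start, end] until expires_at."""
    params = {"start": int(start), "end": int(end), "exp": int(expires_at)}
    sig = _signature(urlparse(base_url).path, params, secret)
    return f"{base_url}?{urlencode({**params, 'sig': sig})}"


def verify_clip_url(url: str, now: int, secret: bytes) -> bool:
    """Check the signature and expiry of a URL produced by sign_clip_url."""
    parsed = urlparse(url)
    query = parse_qs(parsed.query)
    try:
        params = {key: int(query[key][0]) for key in ("start", "end", "exp")}
        sig = query["sig"][0]
    except (KeyError, ValueError):
        return False
    expected = _signature(parsed.path, params, secret)
    return hmac.compare_digest(sig, expected) and now < params["exp"]


def clip_url_for(tier: str, base_url: str, start: int, end: int,
                 now: int, secret: bytes, ttl_seconds: int = 300) -> Optional[str]:
    """Issue a signed clip URL to members only (hypothetical tier names).

    Returning None signals the UI to show the teaser-to-upgrade paywall.
    """
    if tier != "member":
        return None
    return sign_clip_url(base_url, start, end, now + ttl_seconds, secret)
```

The key design point is that the entitlement decision happens before a URL exists at all: the client never holds a playable link it is not entitled to, so hiding the player in the UI is a convenience rather than the security boundary.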
Workstream 1.3 - Testing, quality & the refinement loop
- Initial eval set - build the first set of real attorney queries + expected results with client input; this becomes the ground truth we score against.
- Refinement loop - score retrieval hit-rate + grounding, tune prompts / chunking / tags / re-rank, repeat until we hit the agreed quality bar.
- Grounding + safety - citation check, no-source fallback, "not legal advice" disclaimer placement.
- QA + basic telemetry - QA the three-tier gating, the streaming and resume behavior, and the entire member journey; ship basic telemetry (queries, clip plays, teaser hits, upgrades).
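The two numbers the refinement loop scores against can be sketched as follows. This document does not define either metric precisely, so the definitions below are assumptions: hit-rate is taken as the share of eval queries with at least one expected chunk in the top k results, and the citation check is taken as the share of an answer's citations that point at chunks actually retrieved for it. The function names and the chunk-id representation are hypothetical.

```python
from typing import Dict, List


def hit_rate(results: Dict[str, List[str]], expected: Dict[str, List[str]], k: int = 5) -> float:
    """Fraction of eval queries with at least one expected chunk in the top k.

    results  maps query id -> ranked list of retrieved chunk ids
    expected maps query id -> chunk ids the client marked as correct
    """
    if not expected:
        return 0.0
    hits = sum(
        1
        for query_id, wanted in expected.items()
        if set(results.get(query_id, [])[:k]) & set(wanted)
    )
    return hits / len(expected)


def citation_check(cited_ids: List[str], retrieved_ids: List[str]) -> float:
    """Fraction of an answer's citations that point at retrieved chunks.

    An answer with no citations scores 0.0; under this sketch that is the
    case where the no-source fallback should have been shown instead.
    """
    if not cited_ids:
        return 0.0
    retrieved = set(retrieved_ids)
    return sum(1 for cid in cited_ids if cid in retrieved) / len(cited_ids)
```

For example, with two eval queries where only the first has an expected chunk in its top results, `hit_rate` returns 0.5. A citation check of this kind only confirms that citations refer to retrieved sources; it does not confirm that the cited clip supports the claim, which would need a separate review or scoring step.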