Project Design Doc

[Project name] - scope of work, architecture, and open questions. This document has three parts: the Scope of Work (below), what the client sees when the product ships, and how it works under the hood.

Phase 1 - V1 (current scope)

Rough shape: ~3 weeks end to end.

V1 delivers the full member-facing product on the full content library. It exists to prove that members will use it and that the anti-hallucination guarantee holds up under real queries.

Success criteria

Members can ask questions and get grounded, video-cited answers on the full library.
Members can browse the catalog by every important facet (practice area, sub-area, state approval, format, date, popularity) and find programs quickly.
The product is embedded in the members' site behind proper membership gating; non-members see a teaser and get upsold.
The evaluation harness reports the retrieval hit-rate and grounding score, and we have improved both to an agreed-upon bar.
The client can sign off on a demo acceptance script before V2 starts.

Workstream 1.1 - Setup, infrastructure & migration

Access, scope lock, and handoff - collect access to the current system (repo, app, existing vendors, content platform, video store, designs, content sample); lock the V1 win condition and deadline; run a knowledge-transfer session with the departing developer.
Migration off the current RAG vendor - export the transcripts, chunks, and metadata before the current vendor goes offline; stand up our own catalog + vector store so nothing depends on the outgoing vendor after the cutover.
Single secure cloud environment - Container Apps + Postgres (catalog + pgvector) + managed Redis + queue + blob storage + secret store + private networking + CI/CD + basic monitoring and budget alerts. Single environment, not multi-region.
Content pipeline over the full library - extract, transcribe with diarization, resolve speakers to named attorneys, chunk with timecodes, tag, embed, and index every program.
Retrieval + RAG backbone - the embedding + hybrid search + re-rank + grounded reasoning stack, plus SSE streaming for the member-facing answer.
Auth + three-tier gating foundation - the token handshake between the members' site and the embedded app; entitlement checks; signed video URLs; the public / teaser / member boundary enforced server-side.
Operator console - internal UI for the human-in-the-loop steps (speaker verification, tag corrections, content review).
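The server-side public / teaser / member boundary could be sketched as follows. This is a minimal illustration, not the project's actual handshake: the token format (payload plus HMAC signature), the `SHARED_SECRET` value, the names `sign_token` and `resolve_tier`, and the mapping of "no valid token" to public and "valid token without membership" to teaser are all assumptions made for the example.

```python
import hashlib
import hmac
import time
from enum import Enum
from typing import Optional

# Hypothetical shared secret; in practice it would come from the secret store.
SHARED_SECRET = b"replace-me"


class Tier(Enum):
    PUBLIC = "public"   # assumed: no token, or a token that fails verification
    TEASER = "teaser"   # assumed: valid token, no active membership
    MEMBER = "member"   # assumed: valid token, active membership


def _signature(payload: str) -> str:
    return hmac.new(SHARED_SECRET, payload.encode(), hashlib.sha256).hexdigest()


def sign_token(user_id: str, is_member: bool, expires_at: int) -> str:
    """What the members' site might issue (assumed format: payload.signature)."""
    payload = f"{user_id}|{int(is_member)}|{expires_at}"
    return f"{payload}.{_signature(payload)}"


def resolve_tier(token: Optional[str], now: Optional[int] = None) -> Tier:
    """Decide the tier on the server; never trust a tier claimed by the browser."""
    if not token:
        return Tier.PUBLIC
    payload, _, sig = token.rpartition(".")
    if not hmac.compare_digest(sig.encode(), _signature(payload).encode()):
        return Tier.PUBLIC  # tampered or malformed token
    try:
        _user_id, member_flag, expires_at = payload.split("|")
        expiry = int(expires_at)
    except ValueError:
        return Tier.PUBLIC
    current = now if now is not None else int(time.time())
    if expiry < current:
        return Tier.PUBLIC  # expired token is treated like no token
    return Tier.MEMBER if member_flag == "1" else Tier.TEASER
```

Every API route that returns answer text or a video URL would call something like `resolve_tier` first and shape the response from the result, so the iframe never decides what it is allowed to see.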

Workstream 1.2 - Member-facing UI

Embed + the 4-panel answer - iframe embed with the token handshake; the four-panel answer streaming token by token; single and multi-clip answer variants; core states (loading, streaming, teaser, no-source, error); link out to program materials.
Hybrid search UI - faceted browse with counts and subcategorization; keyword and semantic search; live and replay content featured first; the narrowing UX (subtopic chips, "tell us more" flow).
Video playback + upgrade moment - seek-and-play a range via signed URLs; the teaser-to-upgrade paywall UX.
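Faceted browse with counts can be illustrated with a small in-memory sketch. The field names (`practice_area`, `format`, `states`) and the sample rows are made up for the example; in the real system the same counts would more likely come from a grouped query against the Postgres catalog.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Made-up sample rows; field names are assumptions, not the real catalog schema.
PROGRAMS = [
    {"title": "A", "practice_area": "Tax", "format": "Replay", "states": ["NY", "CA"]},
    {"title": "B", "practice_area": "Tax", "format": "Live", "states": ["NY"]},
    {"title": "C", "practice_area": "Family", "format": "Replay", "states": ["CA"]},
]


def _values(program: dict, facet: str) -> list:
    """Normalize a facet field to a list so single- and multi-valued facets look alike."""
    value = program.get(facet)
    if value is None:
        return []
    return list(value) if isinstance(value, list) else [value]


def matches(program: dict, filters: Dict[str, str]) -> bool:
    """A program matches when every active filter value appears in its facet values."""
    return all(wanted in _values(program, facet) for facet, wanted in filters.items())


def facet_counts(
    programs: List[dict], filters: Dict[str, str], facets: List[str]
) -> Tuple[List[dict], Dict[str, Dict[str, int]]]:
    """Return the filtered programs plus per-facet value counts over that result set."""
    selected = [p for p in programs if matches(p, filters)]
    counts: Dict[str, Dict[str, int]] = {}
    for facet in facets:
        counter: Counter = Counter()
        for program in selected:
            counter.update(_values(program, facet))
        counts[facet] = dict(counter)
    return selected, counts
```

Calling `facet_counts(PROGRAMS, {"practice_area": "Tax"}, ["format", "states"])` returns the two Tax programs together with the counts the UI would show next to each remaining facet value.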

Workstream 1.3 - Testing, quality & the refinement loop

Initial eval set - build the first set of real attorney queries + expected results with client input; this becomes the ground truth we score against.
Refinement loop - score retrieval hit-rate + grounding, tune prompts / chunking / tags / re-rank, repeat until we hit the agreed quality bar.
Grounding + safety - citation check, no-source fallback, "not legal advice" disclaimer placement.
QA + basic telemetry - QA the three-tier gating, the streaming and resume behavior, and the entire member journey; ship basic telemetry (queries, clip plays, teaser hits, upgrades).
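The retrieval half of the scoring step might look like the sketch below. It assumes each eval case is a dict with a `query` and a list of `expected` program or chunk IDs, and that `retrieve` is whatever function returns ranked IDs for a query; both shapes are hypothetical. The grounding score is not covered here because the document does not define how it is computed.

```python
from typing import Callable, Dict, List, Sequence


def hit_rate_at_k(
    eval_set: Sequence[Dict],
    retrieve: Callable[[str], List[str]],
    k: int = 5,
) -> float:
    """Fraction of eval queries with at least one expected ID in the top-k results."""
    if not eval_set:
        return 0.0
    hits = 0
    for case in eval_set:
        top_k = retrieve(case["query"])[:k]
        if set(top_k) & set(case["expected"]):
            hits += 1
    return hits / len(eval_set)
```

Running this after every change to prompts, chunking, tags, or re-ranking gives a single number to compare across iterations of the refinement loop.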

How It Connects

System context


Query, Retrieval & the 4-Panel Answer (V1; V2: conversation)

Query sequence: member ask to streamed answer


Retrieval + grounding loop


Hybrid Search (V1: core; V2: autocomplete)

Hybrid search paths: facets + keyword + semantic
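One common way to merge a keyword ranking and a semantic ranking is reciprocal rank fusion (RRF). The document does not say which fusion method the stack uses, so the sketch below is an illustrative stand-in, not the project's chosen approach; the constant `k=60` is the conventional default for RRF, and the function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[List[str]], k: int = 60) -> List[str]:
    """Merge several ranked ID lists; each list contributes 1 / (k + rank) per ID."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
```

IDs that appear near the top of both lists rise to the top of the fused list, and the fused list can then be passed to the re-rank step.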


Grounding, Safety & the Quality Loop (V1; V2: advanced training)

Grounding pipeline: layered checks
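A minimal sketch of two of the layers, the citation check and the no-source fallback, is shown here. The `Draft` shape, the reason codes, and the fallback wording in `NO_SOURCE_MESSAGE` are hypothetical; the actual checks and copy are for the team and client to define.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

# Hypothetical fallback copy; the real wording would be agreed with the client.
NO_SOURCE_MESSAGE = "No program in the library answers this question."


@dataclass
class Draft:
    text: str
    cited_chunk_ids: List[str] = field(default_factory=list)


def check_grounding(draft: Draft, retrieved_ids: Set[str]) -> Tuple[bool, str]:
    """Return (ok, reason). Every citation must point at a chunk we actually retrieved."""
    if not retrieved_ids:
        return False, "no-source"
    if not draft.cited_chunk_ids:
        return False, "uncited"
    unknown = [cid for cid in draft.cited_chunk_ids if cid not in retrieved_ids]
    if unknown:
        return False, "unknown-citation"
    return True, "ok"


def finalize(draft: Draft, retrieved_ids: Set[str]) -> str:
    """Release the draft only when it passes; otherwise fall back to the no-source reply."""
    ok, _reason = check_grounding(draft, retrieved_ids)
    return draft.text if ok else NO_SOURCE_MESSAGE
```

Returning a reason code alongside the pass/fail flag makes it easy to log why an answer was withheld, which feeds the refinement loop.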


Training / refinement loop


Evaluation harness


Architecture Overview (V1)

Architecture overview: request plane + processing plane


System startup + health checks


Content Ingestion (V1)

Content ingestion pipeline
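The "chunk with timecodes" step could be sketched like this, assuming the transcription step yields diarized segments with start and end times in seconds. The `Segment` and `Chunk` shapes, the word budget, and the rule of breaking on speaker change are assumptions for illustration; the real chunking strategy is one of the things the refinement loop is expected to tune.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    start: float   # seconds from the start of the program
    end: float
    speaker: str   # resolved speaker name or diarization label
    text: str


@dataclass
class Chunk:
    start: float
    end: float
    speaker: str
    text: str


def chunk_segments(segments: List[Segment], max_words: int = 120) -> List[Chunk]:
    """Merge consecutive same-speaker segments up to a word budget, keeping timecodes."""
    chunks: List[Chunk] = []
    current: Optional[Chunk] = None
    for seg in segments:
        fits = (
            current is not None
            and current.speaker == seg.speaker
            and len(current.text.split()) + len(seg.text.split()) <= max_words
        )
        if fits:
            current.end = seg.end                      # extend the open chunk
            current.text = f"{current.text} {seg.text}"
        else:
            if current is not None:
                chunks.append(current)                 # close the previous chunk
            current = Chunk(seg.start, seg.end, seg.speaker, seg.text)
    if current is not None:
        chunks.append(current)
    return chunks
```

Because each chunk carries its own `start` and `end`, a citation to a chunk can be turned directly into a seek-and-play range in the video player.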


Video Delivery & Timecoded Playback (V1)

Video delivery: signed URL to seek-and-play
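As a rough illustration of the idea, a signed URL can be an ordinary URL carrying an expiry and an HMAC over the path and expiry. The sketch below is generic and hypothetical: the query parameter names (`exp`, `sig`), the `SIGNING_KEY`, and the example host are invented, and a real deployment would more likely rely on the blob store's own signed-URL mechanism. The `#t=start,end` suffix is the W3C Media Fragments syntax, shown as one possible way to express the seek-and-play range.

```python
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical signing key; in practice it would live in the secret store.
SIGNING_KEY = b"replace-me"


def _sign(path: str, expires_at: int) -> str:
    message = f"{path}|{expires_at}".encode()
    return hmac.new(SIGNING_KEY, message, hashlib.sha256).hexdigest()


def sign_video_url(base_url: str, path: str, expires_at: int) -> str:
    """Build a URL that is only valid for this path and until expires_at (epoch seconds)."""
    query = urlencode({"exp": expires_at, "sig": _sign(path, expires_at)})
    return f"{base_url}{path}?{query}"


def with_clip_range(url: str, start: float, end: float) -> str:
    """Append a media fragment; the fragment stays in the browser and is not signed."""
    return f"{url}#t={start},{end}"


def verify_video_url(url: str, now: int) -> bool:
    """Server-side check: signature matches the requested path and the link has not expired."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires_at = int(params["exp"][0])
        sig = params["sig"][0]
    except (KeyError, IndexError, ValueError):
        return False
    expected = _sign(parsed.path, expires_at)
    return now <= expires_at and hmac.compare_digest(sig.encode(), expected.encode())
```

Short expiries limit how long a copied link stays useful, and signing the path means a valid link for one program cannot be rewritten to fetch another.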


Clip fetch flow: citation to byte range


Streaming architecture (SSE + Redis)
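The streaming and resume behavior can be sketched with the standard SSE wire format and an event buffer. In this illustration an in-memory list stands in for Redis, and the `AnswerBuffer` class, the `token` event name, and the sequential integer IDs are assumptions; the only standard parts are the `id:` / `event:` / `data:` framing and the `Last-Event-ID` header a browser sends when it reconnects.

```python
from typing import List, Tuple


def format_sse(event_id: int, data: str, event: str = "token") -> str:
    """Serialize one event in SSE framing; multi-line data needs one data: line per line."""
    lines = [f"id: {event_id}", f"event: {event}"]
    lines += [f"data: {line}" for line in data.split("\n")]
    return "\n".join(lines) + "\n\n"


class AnswerBuffer:
    """In-memory stand-in for the Redis-backed buffer of one streamed answer."""

    def __init__(self) -> None:
        self._events: List[Tuple[int, str]] = []

    def append(self, data: str) -> int:
        """Store a token and return its sequential event ID."""
        event_id = len(self._events) + 1
        self._events.append((event_id, data))
        return event_id

    def replay_after(self, last_event_id: int) -> List[str]:
        """Frames the client has not seen, given the Last-Event-ID it sent on reconnect."""
        return [format_sse(i, d) for i, d in self._events if i > last_event_id]
```

On reconnect the server would read `Last-Event-ID`, send `replay_after(...)` first, and then continue with live tokens, so a dropped connection does not restart the answer.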
