A stateless FastAPI service that turns pharma field rep voice notes into structured intelligence reports — in under 45 seconds. Send a voice message on WhatsApp or Telegram describing a doctor visit; receive back a formatted report with doctor name, products discussed, receptivity assessment, objections, commitments, next steps, coaching notes, and a diarized transcript. No app to install. No forms to fill. Just talk.

See It Work

A medical rep sends a two-minute voice note after a doctor visit:

“Just met Dr. Patel at his clinic in Koramangala. Discussed Amoxicillin 500mg and the new Azithromycin formulation. He was positive about Amoxicillin, already prescribing it. On Azithromycin he pushed back — said he needs to see more clinical data before switching from his current brand. I left samples and promised to send the Phase 3 trial results by Friday. Follow-up next Wednesday.”

MR Voice returns in ~30 seconds:

Doctor: Dr. Patel · Koramangala clinic
Products: Amoxicillin 500mg, Azithromycin (new formulation)
Receptivity: Positive (Amoxicillin), Neutral (Azithromycin)
Objections: Needs clinical data before switching Azithromycin brands
Commitments: Send Phase 3 trial results by Friday (Tentative)
Next Steps: Follow-up Wednesday, share trial data
Follow-up Date: Next Wednesday
Key Insight: Dr. Patel is an existing Amoxicillin prescriber — focus on Azithromycin conversion with clinical evidence
Coaching: Rep spent 65% of the time talking — aim for 40% to let the doctor surface objections naturally
Transcript: Full diarized conversation (MR vs Doctor, speaker-labeled)

The Problem

Medical reps visit 8-12 doctors a day. Reporting is manual: paper forms or CRM apps filled in at the end of the day, after the details have faded. Managers lack real-time visibility into field performance, doctor sentiment, and competitive intelligence.

The industry norm: 2-3 hours a day spent on reporting that still captures less than half of what was discussed.

The Pipeline

A 7-step stateless pipeline — no database, no queue system, no state between requests:

1. Receive — A voice message arrives via a WhatsApp or Telegram webhook. HTTP 200 is returned immediately; processing runs in the background.
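Because the webhook must answer with HTTP 200 right away, the handler only needs to pull out where to reply and which media to fetch, then defer the rest (in FastAPI, for example, by passing the job to `background_tasks.add_task(...)`). The sketch below shows that extraction step using the payload shapes of the Telegram Bot API `Update` object and the WhatsApp Cloud API webhook. `VoiceJob` and both function names are hypothetical, not taken from the project.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class VoiceJob:
    """Hypothetical container for what the background task needs."""
    platform: str  # "telegram" or "whatsapp"
    chat_id: str   # where the finished report is sent back
    media_id: str  # Telegram file_id or WhatsApp media ID


def extract_telegram_job(update: dict[str, Any]) -> Optional[VoiceJob]:
    message = update.get("message") or {}
    # Recorded voice notes arrive under "voice"; uploaded audio files under "audio".
    media = message.get("voice") or message.get("audio")
    if not media:
        return None  # text, photos, etc. are ignored
    return VoiceJob("telegram", str(message["chat"]["id"]), media["file_id"])


def extract_whatsapp_job(payload: dict[str, Any]) -> Optional[VoiceJob]:
    # WhatsApp nests messages under entry[].changes[].value.messages[].
    for entry in payload.get("entry", []):
        for change in entry.get("changes", []):
            for msg in change.get("value", {}).get("messages", []):
                if msg.get("type") == "audio":
                    # "from" is the sender's WhatsApp ID, reused as the reply target.
                    return VoiceJob("whatsapp", msg["from"], msg["audio"]["id"])
    return None
```

Keeping this step free of I/O is what lets the 200 go out before any download or model call starts.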

2. Download — Audio bytes fetched from the platform CDN (Telegram) or the Meta Graph API (WhatsApp). Supports OGG, MP3, WAV, AAC, AMR, WebM.
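The two platforms expose media differently: Telegram serves a file at `https://api.telegram.org/file/bot<token>/<file_path>` once `getFile` has returned its `file_path`, while WhatsApp requires a Graph API lookup of the media ID that returns a short-lived download URL. A minimal sketch of the URL construction plus a MIME-type check for the listed formats follows; the extension table and function names are assumptions for illustration, and the actual HTTP calls (e.g. with `httpx`) are left out.

```python
TELEGRAM_API = "https://api.telegram.org"
GRAPH_API = "https://graph.facebook.com/v25.0"

# Assumed MIME-type -> extension table covering the supported formats.
AUDIO_EXTENSIONS = {
    "audio/ogg": "ogg",
    "audio/mpeg": "mp3",
    "audio/wav": "wav",
    "audio/x-wav": "wav",
    "audio/aac": "aac",
    "audio/amr": "amr",
    "audio/webm": "webm",
}


def extension_for(mime_type: str) -> str:
    # WhatsApp may send parameters such as "audio/ogg; codecs=opus"; drop them.
    base = mime_type.split(";", 1)[0].strip().lower()
    try:
        return AUDIO_EXTENSIONS[base]
    except KeyError:
        raise ValueError(f"unsupported audio type: {mime_type!r}") from None


def telegram_file_url(bot_token: str, file_path: str) -> str:
    # file_path comes from a prior getFile call for the voice note's file_id.
    return f"{TELEGRAM_API}/file/bot{bot_token}/{file_path}"


def whatsapp_media_info_url(media_id: str) -> str:
    # GET this with "Authorization: Bearer <token>"; the JSON reply contains a
    # short-lived "url" that is then fetched with the same header.
    return f"{GRAPH_API}/{media_id}"
```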

3. Store — Audio uploaded to Azure Blob Storage with UUID-based naming. Temporary — deleted after processing.
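The "upload, use, always delete" lifecycle can be expressed as a context manager so that audio is removed even when transcription fails. This is a sketch under stated assumptions: `BlobStore` is a hypothetical minimal interface and `InMemoryBlobStore` a test stand-in; with the `azure-storage-blob` SDK, `ContainerClient.upload_blob` and `ContainerClient.delete_blob` would fill these two roles. The pipeline itself describes deletion as fire-and-forget after delivery, so this shows the guarantee rather than the exact timing.

```python
import uuid
from contextlib import contextmanager
from typing import Iterator, Protocol


class BlobStore(Protocol):
    """Hypothetical minimal storage interface (not the Azure SDK API)."""
    def upload(self, name: str, data: bytes) -> None: ...
    def delete(self, name: str) -> None: ...


class InMemoryBlobStore:
    """Hypothetical stand-in for Azure Blob Storage, useful in tests."""
    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}

    def upload(self, name: str, data: bytes) -> None:
        self.blobs[name] = data

    def delete(self, name: str) -> None:
        self.blobs.pop(name, None)


@contextmanager
def temporary_blob(store: BlobStore, data: bytes, ext: str) -> Iterator[str]:
    # UUID-based name: no rep or doctor information ends up in the blob path.
    name = f"{uuid.uuid4()}.{ext}"
    store.upload(name, data)
    try:
        yield name
    finally:
        # Runs on success and on error, so no audio is retained.
        store.delete(name)
```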

4. Transcribe — Azure OpenAI gpt-4o-transcribe-diarize performs speech-to-text with automatic speaker diarization. Speakers mapped to MR (rep) and Doctor. No local audio processing — raw bytes sent directly.
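The text does not say how diarized speakers are assigned to MR and Doctor, so the sketch below uses a hypothetical heuristic: the first voice in the recording is the rep. It also assumes each diarized segment carries `speaker`, `start`, `end` (seconds) and `text` fields. As an optional deterministic cross-check on the model-estimated rep talk percentage, it computes the rep's share of speaking time from the timestamps; that is an alternative technique, not necessarily how the service does it.

```python
from typing import Any


def label_transcript(segments: list[dict[str, Any]]) -> list[str]:
    """Turn diarized segments into 'MR: ...' / 'Doctor: ...' lines."""
    # Hypothetical heuristic: the first speaker heard is the rep who is recording.
    rep = segments[0]["speaker"] if segments else None
    return [
        f"{'MR' if seg['speaker'] == rep else 'Doctor'}: {seg['text'].strip()}"
        for seg in segments
    ]


def rep_talk_percentage(segments: list[dict[str, Any]]) -> int:
    """Rep's share of total speaking time, rounded to a whole percent."""
    if not segments:
        return 0
    rep = segments[0]["speaker"]
    total = sum(seg["end"] - seg["start"] for seg in segments)
    rep_time = sum(seg["end"] - seg["start"] for seg in segments if seg["speaker"] == rep)
    return round(100 * rep_time / total) if total > 0 else 0
```

For two segments where speaker A talks from 0.0 to 6.5 s and speaker B from 6.5 to 10.0 s, the labels are `MR` then `Doctor` and the rep share is 65%.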

5. Summarize — GPT-5.4 analyzes the diarized transcript and extracts 15+ structured fields: doctor name, clinic, products discussed, receptivity (Positive/Neutral/Negative with reasoning), objections, commitments (with strength: Strong/Tentative/Polite deflection), next steps, follow-up date, sentiment score, rep talk percentage, key insight, and coaching notes. Output validated as JSON with automatic repair on parse failure.
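"Validated as JSON" can go beyond checking that the text parses: fields with a closed set of values, such as receptivity and commitment strength, can be checked against that set. A minimal sketch follows; the field names (`doctor_name`, `receptivity[].level`, `commitments[].strength`, and so on) are assumptions about the schema, while the allowed values come from the step description above.

```python
import json
from typing import Any

RECEPTIVITY_LEVELS = {"Positive", "Neutral", "Negative"}
COMMITMENT_STRENGTHS = {"Strong", "Tentative", "Polite deflection"}
# Assumed field names; the real schema has 15+ fields.
REQUIRED_FIELDS = ("doctor_name", "products_discussed", "receptivity", "commitments")


def validate_summary(raw: str) -> dict[str, Any]:
    """Parse model output and check enum-like fields; raise ValueError on problems."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("summary must be a JSON object")
    missing = [key for key in REQUIRED_FIELDS if key not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    for item in data["receptivity"]:
        if item.get("level") not in RECEPTIVITY_LEVELS:
            raise ValueError(f"bad receptivity level: {item!r}")
    for item in data["commitments"]:
        if item.get("strength") not in COMMITMENT_STRENGTHS:
            raise ValueError(f"bad commitment strength: {item!r}")
    return data
```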

6. Format — JSON summary converted to platform-specific format: HTML for Telegram (bold, italic, emojis), plain text for WhatsApp. Empty sections omitted. Transcript truncated to 20 lines with full count noted.
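A sketch of the formatting rules described above: Telegram gets HTML (its HTML parse mode requires `<`, `>` and `&` in text to be escaped), WhatsApp gets plain text, empty sections are skipped, and the transcript is capped at 20 lines with the full count noted. The section list and field names are hypothetical, and emojis are omitted for brevity.

```python
import html
from typing import Any

MAX_TRANSCRIPT_LINES = 20

# Hypothetical field names and order; the real report has more sections.
SECTIONS = [
    ("Doctor", "doctor_name"),
    ("Products", "products_discussed"),
    ("Objections", "objections"),
    ("Next Steps", "next_steps"),
]


def _as_text(value: Any) -> str:
    """Render a summary field as one line; lists become comma-separated."""
    if value is None:
        return ""
    if isinstance(value, list):
        return ", ".join(str(v) for v in value)
    return str(value)


def format_report(summary: dict[str, Any], transcript: list[str], platform: str) -> str:
    telegram = platform == "telegram"

    def esc(text: str) -> str:
        # Escape only for Telegram HTML; WhatsApp receives the text unchanged.
        return html.escape(text, quote=False) if telegram else text

    lines: list[str] = []
    for label, key in SECTIONS:
        text = _as_text(summary.get(key)).strip()
        if not text:
            continue  # empty sections are omitted entirely
        lines.append(f"<b>{label}:</b> {esc(text)}" if telegram else f"{label}: {text}")

    if transcript:
        shown = transcript[:MAX_TRANSCRIPT_LINES]
        header = "Transcript"
        if len(transcript) > len(shown):
            header += f" (first {len(shown)} of {len(transcript)} lines)"
        lines.append("")
        lines.append(f"<i>{header}</i>" if telegram else header)
        lines.extend(esc(line) for line in shown)
    return "\n".join(lines)
```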

7. Deliver — Formatted message sent back to the user. Auto-splits if exceeding platform character limits. Audio blob deleted (fire-and-forget).
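Both Telegram's `sendMessage` and WhatsApp text messages cap the body at 4096 characters. One way to auto-split, sketched below, is to break only on line boundaries: each report line then stays whole, which also keeps Telegram's per-line HTML tags intact. A single line longer than the limit is hard-wrapped as a last resort. The function name and default are illustrative.

```python
def split_message(text: str, limit: int = 4096) -> list[str]:
    """Split text into chunks of at most `limit` characters, preferring line breaks."""
    chunks: list[str] = []
    current: list[str] = []
    size = 0  # length of "\n".join(current)
    for line in text.split("\n"):
        # Hard-wrap a single line that is longer than the limit on its own.
        pieces = [line[i:i + limit] for i in range(0, len(line), limit)] or [""]
        for piece in pieces:
            extra = len(piece) + (1 if current else 0)  # +1 for the joining newline
            if current and size + extra > limit:
                chunks.append("\n".join(current))
                current, size = [], 0
                extra = len(piece)
            current.append(piece)
            size += extra
    if current:
        chunks.append("\n".join(current))
    return chunks
```

For ten 10-character lines and a limit of 25, this yields five chunks of two lines each (21 characters), and joining the chunks with newlines restores the original text.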

Key Technical Decisions

Stateless by design — No database. No queue. No user sessions. Every voice note is processed independently via FastAPI BackgroundTasks. This eliminates operational complexity for a service that’s inherently request-response.
Diarization at the transcription layer — gpt-4o-transcribe-diarize handles speaker separation natively. No separate speaker diarization model or audio preprocessing needed. Speakers labeled automatically.
JSON repair over retry — If GPT-5.4 returns malformed JSON, a text-only repair call extracts the fields from the malformed output, so the audio never has to be re-sent for transcription. Cheaper and faster than a full retry.
Dual-platform from day one — Same AI pipeline serves both Telegram and WhatsApp. Platform-specific code is limited to download/send/format — the core processing is shared.
Temporary blob storage — Audio uploaded to Azure Blob only for the transcription API call, then deleted. No audio retention — important for pharma compliance.
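The "JSON repair over retry" decision can be sketched as a parse step with a single fallback. Here `repair` is a hypothetical callable standing in for the text-only model call: it receives only the malformed output (no audio, no transcript re-run) and returns what should be valid JSON. If the repaired text still fails to parse, the error surfaces instead of looping.

```python
import json
from typing import Any, Callable


def parse_with_repair(raw: str, repair: Callable[[str], str]) -> dict[str, Any]:
    """Parse the summarizer's output; on failure, make one text-only repair call."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        # Hypothetical `repair`: a text-only model call given the malformed
        # output and asked to return the same fields as valid JSON.
        try:
            data = json.loads(repair(raw))
        except json.JSONDecodeError as exc:
            raise ValueError("summary JSON could not be repaired") from exc
    if not isinstance(data, dict):
        raise ValueError("summary must be a JSON object")
    return data
```

The repair call costs one short text completion, whereas a full retry would repeat transcription and summarization.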

Tech Stack

Backend: FastAPI (async) · Python 3.11+ · httpx (async HTTP) · uvicorn

AI: Azure OpenAI gpt-4o-transcribe-diarize (diarized transcription) · Azure OpenAI GPT-5.4 (structured summarization) · Azure OpenAI gpt-audio (test audio generation)

Messaging: WhatsApp Cloud API v25.0 · Telegram Bot API (webhook + long-polling)

Storage: Azure Blob Storage (temporary audio only)

Quality: Ruff · mypy strict · structured JSON logging with correlation IDs · invite-code authorization

Deployment: Docker · Azure App Service · uv (package manager)
