Voice-to-Insight / WhatsApp AI Live

MR Voice

Voice-to-insight for pharma field intelligence — 30-second structured reports from a voice note.

FastAPI · Azure OpenAI · GPT-5.4 · gpt-4o-transcribe · WhatsApp API · Telegram API · Azure Blob Storage

See It in Action

A pharma rep sends a voice note after a doctor visit — and gets a structured intelligence report in under 45 seconds.

1 · Send Voice Note: WhatsApp conversation in which the MR sends a voice recording and the bot starts processing
2 · AI Analysis: Structured report with objections raised, commitments, and next steps extracted
3 · Full Report: Report conclusion with sentiment score, rep talk ratio, key insight, and coaching

A stateless FastAPI service that turns pharma field rep voice notes into structured intelligence reports — in under 45 seconds. Send a voice message on WhatsApp or Telegram describing a doctor visit; receive back a formatted report with doctor name, products discussed, receptivity assessment, objections, commitments, next steps, coaching notes, and a diarized transcript. No app to install. No forms to fill. Just talk.

See It Work

Medical rep sends a 2-minute voice note after a doctor visit:

“Just met Dr. Patel at his clinic in Koramangala. Discussed Amoxicillin 500mg and the new Azithromycin formulation. He was positive about Amoxicillin, already prescribing it. On Azithromycin he pushed back — said he needs to see more clinical data before switching from his current brand. I left samples and promised to send the Phase 3 trial results by Friday. Follow-up next Wednesday.”

MR Voice returns in ~30 seconds:

  • Doctor: Dr. Patel · Koramangala clinic
  • Products: Amoxicillin 500mg, Azithromycin (new formulation)
  • Receptivity: Positive (Amoxicillin), Neutral (Azithromycin)
  • Objections: Needs clinical data before switching Azithromycin brands
  • Commitments: Send Phase 3 trial results by Friday (Tentative)
  • Next Steps: Follow-up Wednesday, share trial data
  • Follow-up Date: Next Wednesday
  • Key Insight: Dr. Patel is an existing Amoxicillin prescriber — focus on Azithromycin conversion with clinical evidence
  • Coaching: Rep spent 65% of the time talking — aim for 40% to let the doctor surface objections naturally
  • Transcript: Full diarized conversation (MR vs Doctor, speaker-labeled)

The Problem

Medical reps visit 8-12 doctors daily. Reporting is manual — paper forms or CRM apps filled at the end of the day, when details are forgotten. Managers lack real-time visibility into field performance, doctor sentiment, and competitive intelligence.

The industry standard: 2-3 hours daily on reporting, capturing less than half of what was discussed.

The Pipeline

A 7-step stateless pipeline — no database, no queue system, no state between requests:

1. Receive — Voice message arrives via WhatsApp or Telegram webhook. HTTP 200 returned immediately; processing runs in background.

2. Download — Audio bytes fetched from messaging platform CDN (Telegram) or Meta Graph API (WhatsApp). Supports OGG, MP3, WAV, AAC, AMR, WebM.

3. Store — Audio uploaded to Azure Blob Storage with UUID-based naming. Temporary — deleted after processing.

4. Transcribe — Azure OpenAI gpt-4o-transcribe-diarize performs speech-to-text with automatic speaker diarization. Speakers mapped to MR (rep) and Doctor. No local audio processing — raw bytes sent directly.

5. Summarize — GPT-5.4 analyzes the diarized transcript and extracts 15+ structured fields: doctor name, clinic, products discussed, receptivity (Positive/Neutral/Negative with reasoning), objections, commitments (with strength: Strong/Tentative/Polite deflection), next steps, follow-up date, sentiment score, rep talk percentage, key insight, and coaching notes. Output validated as JSON with automatic repair on parse failure.

6. Format — JSON summary converted to platform-specific format: HTML for Telegram (bold, italic, emojis), plain text for WhatsApp. Empty sections omitted. Transcript truncated to 20 lines with full count noted.

7. Deliver — Formatted message sent back to the user. Auto-splits if exceeding platform character limits. Audio blob deleted (fire-and-forget).
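
The Format and Deliver steps above can be sketched in plain Python. This is a minimal illustration, not the project's actual code: the field names in SECTIONS, the function names, and the 4096-character limit are assumptions made for the example, and only the 20-line transcript cap, the omission of empty sections, and the auto-split behavior come from the description above.

```python
from typing import Any

TRANSCRIPT_LINE_LIMIT = 20  # transcript cap described in the Format step
MAX_MESSAGE_CHARS = 4096    # assumed limit; check each platform's documentation

# Hypothetical field keys and labels, chosen for illustration only.
SECTIONS = [
    ("doctor", "Doctor"),
    ("products", "Products"),
    ("objections", "Objections"),
    ("commitments", "Commitments"),
    ("next_steps", "Next Steps"),
]


def format_plain_text(summary: dict[str, Any], transcript: list[str]) -> str:
    """Render a summary as plain text, skipping sections that are empty."""
    lines: list[str] = []
    for key, label in SECTIONS:
        value = summary.get(key)
        if not value:  # missing, empty string, or empty list: omit the section
            continue
        if isinstance(value, list):
            value = ", ".join(str(item) for item in value)
        lines.append(f"{label}: {value}")
    if transcript:
        lines += ["", "Transcript:"]
        lines += transcript[:TRANSCRIPT_LINE_LIMIT]
        if len(transcript) > TRANSCRIPT_LINE_LIMIT:
            lines.append(
                f"(showing {TRANSCRIPT_LINE_LIMIT} of {len(transcript)} lines)"
            )
    return "\n".join(lines)


def split_message(text: str, limit: int = MAX_MESSAGE_CHARS) -> list[str]:
    """Split text into chunks of at most `limit` characters, breaking on newlines."""
    chunks: list[str] = []
    current: list[str] = []
    size = 0  # length of "\n".join(current)
    for line in text.split("\n"):
        # A single line longer than the limit is cut into hard slices.
        while len(line) > limit:
            if current:
                chunks.append("\n".join(current))
                current, size = [], 0
            chunks.append(line[:limit])
            line = line[limit:]
        added = len(line) + (1 if current else 0)  # +1 for the joining newline
        if current and size + added > limit:
            chunks.append("\n".join(current))
            current, size = [line], len(line)
        else:
            current.append(line)
            size += added
    if current:
        chunks.append("\n".join(current))
    return chunks
```

Each chunk returned by split_message would be sent as its own message. Breaking on line boundaries keeps a report field from being cut in half unless a single line exceeds the limit by itself.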

Key Technical Decisions

  • Stateless by design — No database. No queue. No user sessions. Every voice note is processed independently via FastAPI BackgroundTasks. This eliminates operational complexity for a service that’s inherently request-response.

  • Diarization at the transcription layer — gpt-4o-transcribe-diarize handles speaker separation natively. No separate speaker diarization model or audio preprocessing needed. Speakers labeled automatically.

  • JSON repair over retry — If GPT-5.4 returns malformed JSON, a text-only repair call extracts the fields without resending audio. Cheaper and faster than a full retry.

  • Dual-platform from day one — Same AI pipeline serves both Telegram and WhatsApp. Platform-specific code is limited to download/send/format — the core processing is shared.

  • Temporary blob storage — Audio uploaded to Azure Blob only for the transcription API call, then deleted. No audio retention — important for pharma compliance.
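
The "JSON repair over retry" decision above can be sketched as a parse-then-repair fallback. This is a minimal sketch under stated assumptions: parse_summary and extract_json_object are hypothetical names, and the repair callable stands in for the text-only model call described above. A local string-based extractor is used here only so the example runs offline.

```python
import json
from typing import Any, Callable


def parse_summary(raw: str, repair: Callable[[str], str]) -> dict[str, Any]:
    """Parse model output as JSON; on failure, attempt one text-only repair pass."""
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        # `repair` is a placeholder for a text-only repair call (assumption).
        # It receives the malformed text only, so no audio is resent.
        parsed = json.loads(repair(raw))
    if not isinstance(parsed, dict):
        raise ValueError("expected a JSON object at the top level")
    return parsed


def extract_json_object(raw: str) -> str:
    """Cheap local stand-in for repair: keep the outermost {...} span."""
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model output")
    return raw[start : end + 1]


# Usage: output wrapped in stray prose still yields the structured fields.
noisy = 'Here is the report: {"doctor": "Dr. Patel", "sentiment": 0.7} Done.'
report = parse_summary(noisy, extract_json_object)
```

If the repaired text still fails to parse, json.JSONDecodeError propagates to the caller, which can then decide whether a full retry is worth the cost.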

Tech Stack

Backend: FastAPI (async) · Python 3.11+ · httpx (async HTTP) · uvicorn

AI: Azure OpenAI gpt-4o-transcribe-diarize (diarized transcription) · Azure OpenAI GPT-5.4 (structured summarization) · Azure OpenAI gpt-audio (test audio generation)

Messaging: WhatsApp Cloud API v25.0 · Telegram Bot API (webhook + long-polling)

Storage: Azure Blob Storage (temporary audio only)

Quality: Ruff · mypy strict · structured JSON logging with correlation IDs · invite-code authorization

Deployment: Docker · Azure App Service · uv (package manager)
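
The structured JSON logging with correlation IDs listed under Quality could look roughly like the following, using only the standard library. The names correlation_id, JsonFormatter, and new_correlation_id are hypothetical, and the project may well use a dedicated logging library instead. The sketch only shows the idea of tagging every log line for one voice note with the same ID.

```python
import json
import logging
import uuid
from contextvars import ContextVar

# Hypothetical context variable holding the ID of the request being processed.
correlation_id: ContextVar[str] = ContextVar("correlation_id", default="-")


class JsonFormatter(logging.Formatter):
    """Emit each log record as one JSON object that carries the current ID."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            "correlation_id": correlation_id.get(),
        }
        return json.dumps(payload)


def new_correlation_id() -> str:
    """Generate an ID for one incoming voice note and store it in the context."""
    cid = uuid.uuid4().hex
    correlation_id.set(cid)
    return cid


# Usage: attach the formatter once, then set a fresh ID per webhook call.
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("mr_voice")  # logger name is an assumption
logger.addHandler(handler)
logger.setLevel(logging.INFO)
```

Because the service keeps no state between requests, an ID like this is one way to trace a single voice note through download, transcription, summarization, and delivery in the logs.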
