IKS Health — Azure AI to Google Cloud AI Migration for the Stacks AI ML Engine
IKS Health partnered with Wohlig Transformations to optimize the Stacks AI ML Engine — its AI-powered healthcare document-processing platform — across the OCR, Search, and LLM layers on Google Cloud, achieving $0.00557 per page (5.4× better than the SOW’s <3¢/page target) and a 15-percentage-point gain in multi-page document matching accuracy in a 3-week sprint.
Project Overview
As part of a wider cloud modernization initiative, IKS Health moved its Stacks AI ML Engine — the document-intelligence platform that processes patient records, dates of service, providers, medical images, and multi-page clinical reports — from a multi-vendor Azure AI Search + Azure OCR + OpenAI GPT-4o stack to a unified Google Cloud AI stack built on Vertex AI Search, Document AI, and Gemini 2.5 Flash. IKS Health had rebuilt the codebase on a five-service Cloud Run architecture before engaging Wohlig; our mandate was to tune the OCR, Search, and LLM layers and hand back a production-ready cutover plan. The work ran as a three-week optimization sprint (Apr 6 – Apr 24, 2026) plus cutover preparation into early May, with every artefact delivered to IKS Health’s ML and Server teams.
The Challenge
Multi-Vendor Complexity. AI workflows were split across Azure AI Search, Azure OCR, and OpenAI GPT-4o — three vendors, three SLAs, three cost models sitting in front of a single clinical workflow.
Healthcare Document Complexity. Production traffic includes multi-page clinical reports, MRI / X-ray imaging, mixed document types per batch, and multiple patients per submission — healthcare-grade accuracy is required on both single-page and multi-page workflows.
Large-File Processing Failures. High-resolution or high-page-count documents failed inside the initial pipeline even when on-disk file size was modest — a system-capacity issue, not a simple size threshold.
Cost-per-Page Target. The SOW set a hard ceiling of under 3 cents per page across the combined OCR + Search + LLM pipeline — end-to-end optimization, not single-component tuning.
Production-Grade Cutover. First-time go-live with no prior production to roll back to — every Go/No-Go gate had to be defensible.
Key Objectives
Unified Cloud AI Stack: Migrate from Azure + OpenAI to Vertex AI Search + Document AI + Gemini 2.5 Flash without regressing on accuracy or relevance.
Sub-3¢-per-Page Pipeline: Hit the SOW’s cost target across OCR + Search + LLM combined, on real benchmarked workloads.
Multi-Page Accuracy Lift: Improve document grouping and field-level extraction on multi-page healthcare reports.
Reusable AI Agent: Package the optimized engine as a self-contained, modular service IKS Health can drop into future projects.
Production Cutover Plan: Go/No-Go gates, runbook, acceptance tests, and live-monitoring checks so first-time go-live behaves like a controlled deploy.
The Solution: Optimized Five-Service Cloud Run Pipeline
Five-Service Cloud Run Architecture. The engine runs as five independently-scaled Cloud Run services — chunk-coordinator → conversion-service (small + large) → ai-processing → status-notifier. CPU, memory, concurrency, timeout, and gunicorn workers are configured per service to match the workload.
The conv-large Split (Wohlig-introduced). We split conversion-service into two pools. Small jobs stay at 4 CPU / 16 GiB / concurrency 8. A new conversion-service-large runs at 8 CPU / 32 GiB, concurrency 1, timeout 3600s — a dedicated worker for high-resolution and high-page-count documents that isolates the long-tail without slowing the hot path.
OCRTEXT Pipeline Mode. We switched LLM_INPUT_TYPE from IMAGE to OCRTEXT — Gemini 2.5 Flash now consumes Document AI‘s clean OCR text instead of raw PNG buffers from every page. Smaller payloads, faster prompts, fewer hallucinations on tables and dates — the single change that lifted both accuracy and cost together.
Gemini 2.5 Flash (fine-tuned). Replaces OpenAI GPT-4o at the LLM stage. Cross-model benchmarking across GPT-4o, Gemini Pro, and Gemini 2.5 Flash drove the selection on unit cost, latency, and field-level accuracy; Gemini Pro stays available as a fallback for harder document classes.
Technology Stack. Vertex AI Search, Document AI, Gemini 2.5 Flash, Cloud Run, GKE, BigQuery, Cloud Storage, Cloud Build, Terraform, Cloud Monitoring, Cloud IAM, Secret Manager.
Key Benefits & Results
Previous: $0.00689 per page on the Azure + OpenAI baseline. Our Solution: OCRTEXT mode + fine-tuned Gemini 2.5 Flash + per-service Cloud Run sizing. Result: $0.00557 per page — 5.4× better than the SOW’s <3¢ target; total run cost $7.736 → $6.223 on the same 1,118-page benchmark (−19.56%).
Previous: Multi-page document matching at 65%. Our Solution: conv-large split + chunking + OCRTEXT pipeline. Result: 80% multi-page matching (+15 percentage points).
Previous: Files Fully Matched at 67.60%. Our Solution: Optimized end-to-end pipeline. Result: 78.10% (+10.50 pp).
Previous: Field-level accuracy on the Azure baseline. Our Solution: Field-specific prompts + OCRTEXT mode. Result: PatientDOB 82.09% → 89.94% (+7.85 pp), PatientName 95.06% → 97.69%, DateOfService 66.10% → 70.00%, Provider 55.75% → 58.78%.
Previous: Large-file processing failures on high-resolution / high-page-count documents. Our Solution: New conversion-service-large (8 CPU / 32 GiB / concurrency 1 / 3600s). Result: Mitigated and tracked through cutover gate G5.
Previous: No production cutover discipline. Our Solution: 11 Go/No-Go gates + 14-step runbook + 6 acceptance tests + 8 live-monitoring checks. Result: Production-grade first-time go-live readiness, handed to the ML and Server teams.
Technical Innovation
OCRTEXT Pipeline Mode. Switching Gemini’s input from raw PNG buffers to Document AI OCR text simultaneously lifted accuracy and dropped cost — a single change with two-axis impact.
conv-large Service Split. A dedicated Cloud Run service for high-resolution and high-page-count documents (8 CPU / 32 GiB / concurrency 1), without sacrificing throughput on the small / fast documents that route to conv-small.
Tuned Per-Service Sizing. Each of the five Cloud Run services is configured independently — CPU, memory, concurrency, timeout, gunicorn workers and threads — for the workload it actually handles.
Production Cutover Discipline. 11 Go/No-Go gates, a 14-step runbook, 6 acceptance tests, and 8 live-monitoring checks. Open production-readiness items (large-file, high-page-count, bulk-load) are tracked transparently as gates G5–G7.
Reusable AI Agent Packaging. The optimized stack ships as a self-contained, modular agent that IKS Health can drop into future document-intelligence workflows without re-architecting the OCR / Search / LLM layer.
Wohlig’s Approach
Discovery & migration assessment — audit existing Azure AI Search, Azure OCR, and OpenAI GPT-4o integrations; document baselines, token usage, and the per-page cost scoreboard.
Vertex AI Search optimization — tune data stores, schemas, and ingestion pipelines; validate content and embedding parity with the Azure baseline.
Document AI OCR integration — configure processors for IKS Health’s document types; switch the pipeline to OCRTEXT mode.
Gemini LLM optimization & prompt re-engineering — re-engineer prompts for Gemini’s instruction format and 1M-token context window; fine-tune Gemini 2.5 Flash for stable field-level extraction.
Code refactoring & reusable agent packaging — factor the orchestration into a modular, agent-based service IKS Health can reuse across future projects.
Cross-model benchmarking, system testing, cutover plan — GPT-4o vs Gemini Pro vs Gemini Flash on the same workload; an 11-scenario test pass (single-page, multi-page grouping, improper sequence, mixed document types, missing fields, fuzzy matching, exact-match DOB / DOS, case insensitivity, MRI / X-ray imaging, low-quality OCR, multi-patient batches); production cutover plan with 11 Go/No-Go gates, 14-step runbook, 6 acceptance tests, and 8 live-monitoring checks.
About IKS Health
IKS Health is a leading US-focused healthcare solutions company providing revenue cycle management, clinical documentation improvement, and IT services to hospitals and physician groups. By combining clinical expertise with intelligent automation, IKS Health reduces administrative burdens so clinicians can focus on delivering quality patient care.
About Wohlig Transformations Pvt. Ltd.
Founded in 2015, Wohlig Transformations specialises in GenAI and DevOps, with 160+ professionals across India and the UK.
Detailed Case Study Presentation : https://canva.link/jazml6x3dxtavc9


