Dr. Reddy's Laboratories: From a GenAI Workshop to an AI-Powered Patent Intelligence Platform
Project Overview
Dr. Reddy’s Laboratories (DRL) partnered with Wohlig Transformations in a two-phase engagement — starting with a 1-week Google Cloud GenAI workshop and continuing into an ongoing engineering engagement building Project Cognito, DRL’s AI-powered drug prioritization platform. The first production pillar shipped is the IP (Intellectual Property) Pillar — a multi-model patent-landscape analysis pipeline (Gemini extracts, Claude judges) running across four analytical dimensions.
DRL’s R&D, Manufacturing, Quality, and Biologics functions, with Nishit Mittal as Data Science Lead, engaged Wohlig to accelerate GenAI adoption across drug-development decision-making. The work followed a deliberate Workshop → Production arc: a 1-week capability demonstration first, then a continuous production-engineering relationship. In that second phase, Wohlig is building Project Cognito, DRL’s umbrella platform for drug prioritization and research automation, delivered as discrete production pillars. The IP Pillar runs on 10 parallel Cloud Run instances, retrieves from a ChromaDB vector store, and refreshes automatically on a bi-weekly schedule.
The Challenge
Capability Demonstration
Before committing to a long-term AI engineering engagement, DRL’s R&D leadership needed to see Wohlig build production-grade patterns end-to-end on real pharma use cases — not slideware.
Patent Landscape Complexity
A drug’s IP exposure spans composition-of-matter, formulation, device, and process patents — each with different inclusion logic, jurisdictional nuances, and litigation history.
Multi-Source Data Sprawl
Patent data lives across Espacenet, Google Patents, and the Indian Patent Database (IPD); clinical evidence spans 6+ international registries. Each source has its own schema, latency, and gaps.
LLM Fragility
Single-model patent analysis hallucinates classifications, misses contextual Tier-3 matches, and produces malformed JSON — none of which is acceptable in a system informing real drug-investment decisions.
Production Scale
A bi-weekly refresh across hundreds of drugs requires parallel compute, retry strategies, incremental processing, and cost discipline — far beyond what a notebook prototype provides.
Key Objectives
Workshop-First Demonstration: Build all three workshop modules end-to-end on Google Cloud (ADK + Vertex AI + Document AI + Vector Search + Cloud Run).
Multi-Model Verification: Use Gemini for extraction and Claude as judge to catch hallucinations on every field.
Tiered Patent Inclusion: Codify a Tier 1 / Tier 2 / Tier 3 taxonomy that surfaces every relevant patent, including non-obvious contextual matches.
Multi-Source Coverage: Index every relevant patent and clinical-trial source (Espacenet, Google Patents, IPD, ClinicalTrials.gov, PubMed, ChiCTR, EU CTR, CTRI India, JRCT Japan).
Production Compute: Parallel Cloud Run pipelines, Cloud Scheduler refresh, BigQuery storage, and an AlloyDB migration path.
Continuous Optimization: Cost monitoring, per-dimension evaluation metrics, and knowledge transfer to DRL.
The Solution: Two-Phase GenAI Engagement
Phase 1: 1-Week Workshop
A Google Cloud GenAI workshop delivered three hands-on modules end-to-end.
Module 1 was a multi-agent Intelligent Chatbot built on four ADK agents (Structured Data, Unstructured Data, Web Search, and a Response Aggregator) with custom RAG on Vertex AI Vector Search.
Module 2 was a Document Intelligence pipeline for FDA Complete Response Letter (CRL) analysis using Document AI plus four specialized agents (Checklist, Summary, Metadata, Cross-Reference).
Module 3 was an MCP (Model Context Protocol) server giving a natural-language interface to BigQuery and Cloud SQL, containerised on Cloud Run.
Phase 2: Project Cognito
In the ongoing engagement, Wohlig built Project Cognito’s IP Pillar end-to-end, scaling the workshop’s proven patterns into a production system.
IP Pillar Architecture
Gemini extracts and Claude judges, with 10 parallel Cloud Run instances per run. Retrieval uses a ChromaDB vector store with k-nearest-neighbour search (k = 12) on cosine similarity, section-aware chunking with sliding-window overlap, and a metadata pre-filter on year, jurisdiction, patent type, and assignee. Cloud Scheduler triggers the bi-weekly refresh, and all evaluation fields are stored in BigQuery.
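To make the retrieval step concrete, here is a minimal sketch of a metadata pre-filter followed by top-k cosine-similarity ranking. It is written in plain Python rather than against the ChromaDB client, and the `Chunk` type, the `where` filter argument, and the metadata keys are assumptions made for illustration, not the pipeline's actual schema.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Chunk:
    """Hypothetical record: one embedded patent chunk plus its metadata."""
    patent_id: str
    embedding: Sequence[float]
    metadata: dict


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_a * norm_b)


def retrieve(query_embedding: Sequence[float],
             chunks: Sequence[Chunk],
             where: Optional[dict] = None,
             k: int = 12) -> list:
    """Apply exact-match metadata filters first, then rank survivors by cosine similarity."""
    where = where or {}
    candidates = [
        c for c in chunks
        if all(c.metadata.get(key) == value for key, value in where.items())
    ]
    ranked = sorted(
        candidates,
        key=lambda c: cosine_similarity(query_embedding, c.embedding),
        reverse=True,
    )
    return ranked[:k]
```

Filtering before ranking keeps the similarity search confined to chunks that already match the requested jurisdiction or patent type, so the k slots are not spent on irrelevant neighbours.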
Tier 1 / Tier 2 / Tier 3 Patent Inclusion
Drug-name, brand, and Orange Book references resolve to Tier 1; chemical-structure matches to Tier 2; and assignee plus device, formulation, and process signals with a product-specific link to Tier 3.
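The tier rules above can be sketched as a small decision function. This is one reading of the taxonomy, not DRL's implementation: the `PatentSignals` fields are hypothetical names, Tier 3 is assumed to require the assignee match, a product-type signal, and the product-specific link together, and the lowest-numbered matching tier is assumed to win.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PatentSignals:
    """Hypothetical per-patent flags produced by upstream extraction."""
    mentions_drug_or_brand: bool = False   # drug-name or brand reference
    orange_book_listed: bool = False       # Orange Book reference
    structure_match: bool = False          # chemical-structure match
    assignee_match: bool = False           # assignee linked to the drug
    product_type_signal: bool = False      # device, formulation, or process signal
    product_specific_link: bool = False    # evidence tying the patent to this product


def assign_tier(signals: PatentSignals) -> Optional[int]:
    """Return 1, 2, or 3 for an included patent, or None if no tier applies."""
    if signals.mentions_drug_or_brand or signals.orange_book_listed:
        return 1
    if signals.structure_match:
        return 2
    if (signals.assignee_match
            and signals.product_type_signal
            and signals.product_specific_link):
        return 3
    return None
```

Checking the tiers in order means a patent that names the drug directly is never demoted to a contextual match, even when it also carries Tier 3 signals.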
Multi-Source Data Integration
Espacenet and Google Patents are pre-fetched in parallel, a reverse-engineered IPD fetcher fills the Indian Patent Database gap, and six clinical trial registries are indexed.
Technology Stack
Vertex AI, Gemini, Claude, Agent Development Kit (ADK), Document AI, Vertex AI Vector Search, BigQuery, Cloud Run, Cloud Scheduler, Firestore, Cloud Storage, Secret Manager, ChromaDB (→ AlloyDB planned), FastAPI, and Python.
Key Benefits & Results
Previous: One-shot single-model LLM patent analysis with high hallucination risk.
Our Solution: Gemini extracts and Claude judges with parallel verification.
Result: Every field is cross-checked; failed checks trigger correction with confidence recalculation.
Previous: Tier-1-only patent search that misses contextual matches.
Our Solution: Tier 1 / Tier 2 / Tier 3 taxonomy.
Result: Captures composition-of-matter, formulation, device, process, and dosing patents that assignee-only or direct-mention search misses.
Previous: Single patent source coverage gaps.
Our Solution: Parallel Espacenet + Google Patents pre-fetch plus a reverse-engineered IPD fetcher.
Result: Patent data that would otherwise require millions in third-party fees is now fetched in-house.
Previous: Tavily API cost for web search.
Our Solution: Migrated to the Vertex AI Google Search tool with domain restriction and keyword match.
Result: Lower cost, better coverage.
Previous: Sequential pipeline runs and slow refresh.
Our Solution: 10 parallel Cloud Run instances with CLOUD_RUN_TASK_INDEX work distribution and Cloud Scheduler automation.
Result: Production-ready bi-weekly refresh across hundreds of drugs.
Previous: Notebook prototypes only (Phase 1).
Our Solution: Production deployment on Cloud Run with BigQuery storage and monitoring.
Result: Workshop patterns scaled into a real production system in Phase 2.
Technical Innovation
Gemini + Claude Multi-Model Judging
Gemini extracts patent data; Claude evaluates and verifies every field in batches of 10 across 8 parallel API calls. Failed checks trigger correction with confidence recalculation, which reduces the hallucination risk of relying on a single model.
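A minimal sketch of the batching pattern follows, assuming "batches of 10 across 8 parallel API calls" means fields are grouped ten at a time and up to eight batches are judged concurrently. The `judge_batch` function is a hypothetical stand-in for the real judge-model call; here it only flags empty values so the example runs without network access.

```python
from concurrent.futures import ThreadPoolExecutor

BATCH_SIZE = 10          # fields per judge request (from the description above)
MAX_PARALLEL_CALLS = 8   # concurrent judge requests (from the description above)


def batched(items, size):
    """Split a list into consecutive slices of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def judge_batch(batch):
    """Hypothetical judge call: a real pipeline would send the batch to the judge model."""
    return [
        {"field": item["name"], "passed": item["value"] not in (None, "")}
        for item in batch
    ]


def judge_all(fields):
    """Judge every extracted field, keeping verdicts in the original field order."""
    batches = batched(fields, BATCH_SIZE)
    with ThreadPoolExecutor(max_workers=MAX_PARALLEL_CALLS) as pool:
        # Executor.map yields results in submission order, not completion order.
        results = list(pool.map(judge_batch, batches))
    return [verdict for batch_result in results for verdict in batch_result]


def failed_fields(verdicts):
    """Names of fields that need correction and confidence recalculation."""
    return [v["field"] for v in verdicts if not v["passed"]]
```

Threads suit this workload because each call spends most of its time waiting on a remote API rather than computing.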
Tier 1 / Tier 2 / Tier 3 Patent Inclusion Logic
An explicit, codified taxonomy: direct drug, brand, and code mentions (Tier 1); chemical-structure matches (Tier 2); and contextual assignee plus product-type matches (Tier 3). It catches the Tier 3 patents most pipelines miss.
Mandatory Blocking Category Classification
Every patent receives a non-empty classification (composition_of_matter, formulation, device, and more), read directly from the patent claims rather than the title or abstract. This drives downstream dimension routing.
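One way to enforce a mandatory, non-empty classification is to validate the model's JSON output before it reaches downstream routing. The sketch below is an assumption about how such a guard could look: the `blocking_category` key is a hypothetical field name, and `KNOWN_CATEGORIES` contains only the three categories named above, so it is deliberately incomplete.

```python
import json

# Only the categories named in this case study; the real list is longer.
KNOWN_CATEGORIES = {"composition_of_matter", "formulation", "device"}


def parse_classification(raw_response: str) -> str:
    """Return a validated category, or raise ValueError so the caller can retry."""
    try:
        payload = json.loads(raw_response)
    except json.JSONDecodeError as exc:
        raise ValueError("malformed JSON from model") from exc

    if not isinstance(payload, dict):
        raise ValueError("expected a JSON object")

    value = payload.get("blocking_category")  # hypothetical key name
    if not isinstance(value, str) or not value.strip():
        raise ValueError("classification is missing or empty")

    category = value.strip().lower()
    if category not in KNOWN_CATEGORIES:
        raise ValueError(f"unknown category: {category}")
    return category
```

Raising a single exception type for every failure mode lets the caller treat malformed JSON, an empty field, and an unknown label the same way, for example by re-prompting the model.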
Reverse-Engineered IPD Fetcher
The fetcher replaces a third-party service that charges millions for Indian Patent Database fields. Built in-house, it was cost-positive as soon as it went into operation.
Parallel Cloud Run Compute
10 instances per pipeline run with CLOUD_RUN_TASK_INDEX work distribution, exponential-backoff retries for Gemini and Claude rate limits, and incremental processing (insert new, skip unchanged) — production scale, not POC scale.
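The work distribution and retry behaviour described above could look roughly like the following sketch. `CLOUD_RUN_TASK_INDEX` and `CLOUD_RUN_TASK_COUNT` are the environment variables Cloud Run sets for job tasks; the round-robin split, the `RateLimitError` class, and the backoff parameters are assumptions for illustration rather than the pipeline's actual settings.

```python
import os
import random
import time


def shard_for_this_task(drugs, task_index=None, task_count=None):
    """Pick this task's share of the drug list using a round-robin split."""
    if task_index is None:
        task_index = int(os.environ.get("CLOUD_RUN_TASK_INDEX", "0"))
    if task_count is None:
        task_count = int(os.environ.get("CLOUD_RUN_TASK_COUNT", "1"))
    return [d for i, d in enumerate(drugs) if i % task_count == task_index]


class RateLimitError(Exception):
    """Hypothetical exception standing in for a provider's rate-limit error."""


def call_with_backoff(fn, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fn(), retrying on rate limits with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of retries: surface the error to the caller
            # Delay doubles each attempt; jitter spreads out retries across tasks.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)
```

With 10 tasks, task 0 would process items 0, 10, 20, and so on, so no coordination between instances is needed beyond the shared input list.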
Wohlig’s Approach
Workshop & capability demonstration — a 1-week hands-on build of three modules covering the agentic chatbot, document intelligence, and MCP-server data-lake access.
Kickoff & architectural design — defining the multi-pillar Project Cognito architecture, with the IP Pillar selected as Pillar 1.
Multi-model pipeline engineering — Gemini + Claude judge integration, ChromaDB retrieval, tiered inclusion logic, and Blocking Category Classification.
Multi-source data integration — Espacenet, Google Patents, and the reverse-engineered IPD fetcher; plus ClinicalTrials.gov, PubMed, ChiCTR, EU CTR, CTRI India, and JRCT Japan.
Production compute engineering — 10 parallel Cloud Run instances, Cloud Scheduler bi-weekly refresh, and BigQuery evaluation storage.
Evaluation framework & continuous iteration — per-dimension metrics (faithfulness, context precision, answer relevancy, cross-dimension coherence), knowledge transfer, the planned AlloyDB migration for production scale, and the upcoming Medical Potential, API Complexity, and Complexity Pillars.
About Dr. Reddy’s Laboratories
Dr. Reddy’s Laboratories Limited (DRL) is a Hyderabad-headquartered global pharmaceutical company specialising in generics, biosimilars, and proprietary products. Its R&D, Manufacturing, Quality, and Biologics functions are leading AI adoption across drug-development decision-making, regulatory document processing, and patent-landscape analysis.
About Wohlig Transformations Pvt. Ltd.
Founded in 2015, Wohlig Transformations specialises in GenAI and DevOps, with 160+ professionals across India and the UK.


