How PureHealth Built an AI-Powered CMS-1500 Form System with Supabase

The Medical Billing Challenge

Medical billing is one of healthcare's most complex and time-consuming processes. PureHealth faced a critical operational bottleneck: data entry specialists spending 2-3 hours per CMS-1500 form, manually cross-referencing doctor prescriptions with medical codes, finding associated charges, and navigating cumbersome desktop applications.

With over 8,000 CPT codes and 80,000+ procedure codes to manage, the traditional approach was unsustainable. The solution? An intelligent RAG (Retrieval-Augmented Generation) system powered by Supabase's vector database capabilities.

The Legacy Problem: Static Data in MongoDB

Previously, all medical codes and billing data lived in MongoDB as static JSON files. While functional, this approach had significant limitations:

  • No vector search capabilities in MongoDB Compass
  • Manual code lookup requiring extensive medical coding knowledge
  • Desktop application dependency for Tebra integration
  • Time-intensive form completion process
  • High error rates due to manual cross-referencing

The data migration looked daunting given the size of the datasets, but Supabase's import tools made the transition surprisingly smooth: the files were split strategically so that each one stayed under the 80MB upload limit.

System Architecture: Vector-Powered Medical Code Intelligence

Core Database Design

PureHealth's new system centers around two primary code tables optimized for both traditional search and vector similarity:

ICD-10 Codes Table: Houses diagnostic codes with their descriptions, embeddings, and searchable text. This table enables intelligent mapping between doctor's written diagnoses and standardized medical codes.

Procedure Codes Table: Contains over 80,000 procedure codes with official names, embeddings, and keyword arrays. The vector embeddings capture semantic relationships between medical procedures described in various ways.

Both tables leverage Supabase's dual indexing strategy: GIN indices for traditional text search and IVFFlat indices for lightning-fast vector similarity searches.
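
As a rough sketch of what this dual indexing could look like in PostgreSQL with the pgvector extension, the Python snippet below only builds the two CREATE INDEX statements as strings. The table name procedure_codes, the columns official_name and embedding, the index names, the cosine operator class, and the lists value are all assumptions for illustration; the article does not give the actual schema.

```python
from typing import List


def build_index_statements(
    table: str, text_column: str, vector_column: str, lists: int
) -> List[str]:
    """Return DDL for a GIN full-text index and an IVFFlat vector index.

    Every identifier passed in is hypothetical; adjust to the real schema.
    """
    if lists < 1:
        raise ValueError("lists must be a positive integer")
    # GIN index over a tsvector expression, for traditional text search.
    gin = (
        f"CREATE INDEX {table}_{text_column}_fts_idx ON {table} "
        f"USING gin (to_tsvector('english', {text_column}));"
    )
    # IVFFlat index from pgvector, here assuming cosine distance.
    ivfflat = (
        f"CREATE INDEX {table}_{vector_column}_ivf_idx ON {table} "
        f"USING ivfflat ({vector_column} vector_cosine_ops) WITH (lists = {lists});"
    )
    return [gin, ivfflat]


for statement in build_index_statements(
    "procedure_codes", "official_name", "embedding", 80
):
    print(statement)
```

The statements would be run once against the database, for example from a migration script; the snippet itself never connects to anything.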

Financial Integration Layer

The Procedure Code Charges Table creates the crucial link between medical procedures and billing amounts. This table connects procedure codes to their associated fees, modifiers, and descriptions, enabling automatic charge calculation based on identified procedures.
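
To illustrate how such a link can drive charge calculation, here is a minimal sketch that uses an in-memory SQLite database as a stand-in for the Postgres tables. The table and column names, the example codes, the procedure names, and the fee amounts are made up for the example and are not taken from PureHealth's schema. It also assumes one charge row per procedure code and stores fees in cents to avoid floating-point rounding.

```python
import sqlite3
from typing import List, Tuple


def total_charges(conn: sqlite3.Connection, codes: List[str]) -> Tuple[int, List[str]]:
    """Return (total fee in cents, codes that have no charge row)."""
    if not codes:
        return 0, []
    placeholders = ",".join("?" for _ in codes)
    rows = conn.execute(
        "SELECT p.code, c.fee_cents "
        "FROM procedure_codes p "
        "JOIN procedure_code_charges c ON c.procedure_code = p.code "
        f"WHERE p.code IN ({placeholders})",
        codes,
    ).fetchall()
    fees = dict(rows)  # assumes a single charge row per code
    missing = [code for code in codes if code not in fees]
    return sum(fees.values()), missing


# Hypothetical schema and made-up fees, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE procedure_codes (code TEXT PRIMARY KEY, official_name TEXT);
    CREATE TABLE procedure_code_charges (
        procedure_code TEXT REFERENCES procedure_codes(code),
        modifier TEXT,
        fee_cents INTEGER
    );
    INSERT INTO procedure_codes VALUES ('99213', 'Office visit');
    INSERT INTO procedure_codes VALUES ('93000', 'Electrocardiogram');
    INSERT INTO procedure_code_charges VALUES ('99213', NULL, 12000);
    INSERT INTO procedure_code_charges VALUES ('93000', NULL, 4500);
    """
)

total, missing = total_charges(conn, ["99213", "93000", "00000"])
print(total, missing)
```

Returning the unmatched codes alongside the total lets a reviewer see which procedures still need a manual charge lookup instead of silently billing zero for them.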

The system maintains comprehensive relationships across 34+ tables covering forms, users, form details, document types, prescriptions, and all CMS-1500 form field relationships—all managed seamlessly within Supabase's unified platform.

RAG-Powered Code Intelligence

Bidirectional Code Mapping

The system's intelligence lies in its ability to understand medical language in both directions:

Diagnosis to Code Mapping: When doctors write diagnoses in natural language, the RAG system identifies the corresponding ICD-10 codes through semantic similarity search. The system understands that "acute myocardial infarction" maps to specific diagnostic codes, even when written as "heart attack" or "MI."
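
The idea behind this lookup can be sketched with plain cosine similarity. The three-dimensional vectors below are hand-made toy values, not real embeddings, and the query vector standing in for "heart attack" is equally hypothetical; a production system would get both from an embedding model and run the search inside the database.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_codes(
    query: List[float], index: Dict[str, List[float]], k: int = 2
) -> List[Tuple[str, float]]:
    """Return the k codes whose vectors are most similar to the query."""
    scored = [(code, cosine_similarity(query, vec)) for code, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors only; real embeddings have hundreds of dimensions.
ICD10_INDEX = {
    "I21.9": [0.9, 0.1, 0.0],  # acute myocardial infarction, unspecified
    "I10": [0.3, 0.9, 0.1],    # essential (primary) hypertension
    "E11.9": [0.0, 0.2, 0.9],  # type 2 diabetes mellitus without complications
}
heart_attack_query = [0.8, 0.2, 0.1]  # hypothetical embedding of "heart attack"

for code, score in top_k_codes(heart_attack_query, ICD10_INDEX):
    print(code, round(score, 3))
```

Because the query vector points in nearly the same direction as the I21.9 vector, that code ranks first even though the words "heart attack" never appear in its description.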

Code to Diagnosis Translation: Conversely, when medical codes appear in prescriptions, the system translates them back to human-readable diagnoses, ensuring accuracy and completeness in form filling.

Procedure Recognition and Billing

The procedure code system goes beyond simple lookup. When analyzing doctor prescriptions, the RAG system:

  • Identifies medical procedures described in various terminologies
  • Maps to standardized CPT codes through vector similarity
  • Automatically calculates associated charges from the linked billing table
  • Determines visit types based on procedure combinations

This intelligent mapping eliminates the need for medical coding expertise among data entry staff.

The Transformation: From Hours to Minutes

Automated Workflow

The new PureHealth AI platform transforms the entire billing process:

  • Upload: Users simply upload doctor prescriptions and patient documents
  • Validate: AI automatically extracts medical information, maps codes, and calculates charges
  • Submit: Integrated Tebra submission streamlines insurance claim processing

What previously required 2-3 hours of specialized work now completes in under 10 minutes with minimal human intervention.

Real-Time Processing with Edge Functions

Supabase's edge functions provide real-time processing capabilities that were impossible with the previous MongoDB setup. The system processes documents instantly, providing immediate feedback and validation to users.

The edge function architecture ensures:

  • Sub-second response times for code lookups
  • Real-time form validation as users input data
  • Instant charge calculations when procedures are identified
  • Live document processing without page refreshes

Technical Advantages of the Migration

Vector Search Capabilities

The migration from MongoDB to Supabase unlocked powerful vector search capabilities that transformed code matching accuracy. Instead of exact text matching, the system now understands semantic relationships between medical terms.

Medical professionals describe the same condition in countless ways. The vector embeddings capture these nuances, ensuring accurate code mapping regardless of terminology variations.

Unified Data Platform

Supabase's PostgreSQL foundation eliminated the complexity of managing separate systems for relational data and search functionality. All 34+ tables work together seamlessly, enabling complex queries that combine traditional joins with vector similarity searches.

Simplified Data Migration

Despite initial concerns about migrating massive datasets, Supabase's import tools made the transition remarkably smooth. The 80MB upload limit was easily managed through strategic file splitting, and the one-click import process eliminated migration complexity.
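
The file splitting itself can be sketched in a few lines. This version assumes the export is newline-delimited (one record per line, such as JSON Lines), reads the 80MB limit as 80 * 1024 * 1024 bytes, and uses hypothetical function and constant names; the article does not describe the actual script or file format that was used.

```python
from typing import Iterable, Iterator, List

MAX_UPLOAD_BYTES = 80 * 1024 * 1024  # 80MB limit, interpreted here as MiB


def split_lines_by_size(
    lines: Iterable[str], max_bytes: int = MAX_UPLOAD_BYTES
) -> Iterator[List[str]]:
    """Group lines into chunks whose UTF-8 size stays within max_bytes."""
    chunk: List[str] = []
    size = 0
    for line in lines:
        line_bytes = len(line.encode("utf-8"))
        if line_bytes > max_bytes:
            raise ValueError("a single record exceeds the size limit")
        # Start a new chunk when this line would push the current one over.
        if chunk and size + line_bytes > max_bytes:
            yield chunk
            chunk, size = [], 0
        chunk.append(line)
        size += line_bytes
    if chunk:
        yield chunk


# Tiny demonstration with a 12-byte limit instead of 80MB.
records = ["aaaa\n"] * 5
for number, chunk in enumerate(split_lines_by_size(records, max_bytes=12), start=1):
    print(number, len(chunk))
```

Splitting on record boundaries rather than at a fixed byte offset keeps every output file independently importable, since no record is cut in half.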

Performance Impact and Results

Operational Efficiency

The transformation delivered dramatic improvements across all metrics:

  • Processing Time: Reduced from 2-3 hours to about 10 minutes (a 92-94% reduction, depending on the starting time)
  • Accuracy: Eliminated manual lookup errors through automated code mapping
  • User Experience: Simplified interface replaced complex desktop applications
  • Scalability: Cloud-based system handles concurrent users effortlessly
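
The processing-time figure in the list above can be checked with simple arithmetic. Assuming the 10-minute result applies at both ends of the 2-3 hour range, the reduction works out to roughly 92% for a 2-hour form and 94% for a 3-hour form:

```python
def percent_reduction(before_minutes: float, after_minutes: float) -> float:
    """Percentage by which processing time dropped."""
    if before_minutes <= 0:
        raise ValueError("before_minutes must be positive")
    return (before_minutes - after_minutes) / before_minutes * 100


low = percent_reduction(120, 10)   # a form that used to take 2 hours
high = percent_reduction(180, 10)  # a form that used to take 3 hours
print(f"{low:.1f}% to {high:.1f}%")  # prints "91.7% to 94.4%"
```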

Cost Reduction

By eliminating the need for specialized medical coding knowledge, PureHealth reduced training requirements and enabled general staff to handle complex billing tasks. The automated charge calculation prevents billing errors that could result in claim rejections.

Integration Benefits

The Tebra integration streamlines the entire insurance claim process. Forms automatically populate with accurate codes and charges, reducing submission errors and accelerating reimbursement cycles.

Key Technical Insights

Vector Embeddings for Medical Data

Medical terminology presents unique challenges for traditional search systems. Vector embeddings excel at capturing the semantic relationships between medical terms, abbreviations, and colloquial descriptions.

The system learned that "BP" relates to "blood pressure," "DM" connects to "diabetes mellitus," and countless other medical abbreviation patterns. This understanding enables accurate code mapping regardless of how doctors document conditions.

Hybrid Search Strategy

The most effective approach combined vector similarity with traditional text search. While embeddings capture semantic meaning, exact text matches remain important for specific code lookups. The dual indexing strategy ensures optimal performance for both search types.
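
The article does not say how the two result sets were merged. One common option is reciprocal rank fusion (RRF), sketched below as an assumption rather than as PureHealth's actual method. Each input is a list of codes ordered best-first, one from keyword search and one from vector search; the constant k = 60 is a conventional default, and the example rankings are invented.

```python
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Merge several best-first rankings into one list using RRF scores."""
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, code in enumerate(ranking, start=1):
            # Items near the top of any ranking contribute more.
            scores[code] = scores.get(code, 0.0) + 1.0 / (k + rank)
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda code: (-scores[code], code))


keyword_hits = ["I10", "I21.9"]          # hypothetical full-text results
vector_hits = ["I21.9", "E11.9", "I10"]  # hypothetical similarity results

print(reciprocal_rank_fusion([keyword_hits, vector_hits]))
```

RRF works on ranks alone, so the text-search score and the vector distance never need to be put on a common scale, which is the main reason it is a popular choice for hybrid search.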

Real-Time Data Processing

Edge functions proved crucial for user experience. Real-time processing eliminates the frustrating delays common in traditional batch processing systems. Users receive immediate feedback, enabling rapid form completion and validation.

Lessons Learned

Database Design for Medical Applications

Medical data requires careful consideration of relationships and search patterns. The separation of codes, charges, and metadata into distinct tables enabled optimized indexing strategies while maintaining data integrity.

Vector indices require tuning for medical terminology. The IVFFlat configuration with appropriate list parameters proved essential for maintaining search performance as the dataset grew.
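
The article does not state which list values were used. As a starting point, the pgvector documentation suggests lists of about rows / 1000 for tables up to one million rows and sqrt(rows) above that, with probes of about sqrt(lists) at query time. The helper below (a hypothetical name) simply encodes that rule of thumb; the resulting values still need to be validated against real recall and latency measurements.

```python
import math
from typing import Dict


def suggest_ivfflat_params(row_count: int) -> Dict[str, int]:
    """Starting values for IVFFlat lists and probes, per the pgvector rule of thumb."""
    if row_count < 1:
        raise ValueError("row_count must be positive")
    if row_count <= 1_000_000:
        lists = max(1, row_count // 1000)
    else:
        lists = int(math.sqrt(row_count))
    probes = max(1, round(math.sqrt(lists)))
    return {"lists": lists, "probes": probes}


# Roughly the size of the procedure codes table described in the article.
print(suggest_ivfflat_params(80_000))
```

For a table of about 80,000 rows this gives 80 lists and 9 probes, which would be revisited as the dataset grows.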

Migration Strategy

Large dataset migrations benefit from strategic planning. Breaking files into manageable chunks and leveraging Supabase's import tools made what seemed like a complex migration surprisingly straightforward.

Testing vector search accuracy with medical terminology required domain expertise. Collaborating with medical professionals during development ensured the embeddings captured clinically relevant relationships.

Future Enhancements

Predictive Analytics

The rich dataset enables predictive capabilities for claim approval likelihood, optimal billing strategies, and fraud detection. Machine learning models could analyze historical patterns to optimize billing success rates.

Multi-Modal Processing

Future versions could incorporate image processing for handwritten prescriptions, voice recognition for dictated notes, and structured data extraction from various document formats.

Advanced Integration

Deeper integration with electronic health records (EHR) systems and insurance platforms could create end-to-end automation from patient visit to payment processing.

Conclusion

PureHealth's transformation from a manual, error-prone billing process to an AI-powered automation platform demonstrates the revolutionary potential of modern vector databases in healthcare applications.

By leveraging Supabase's unified platform combining PostgreSQL reliability with vector search capabilities, PureHealth created a system that not only dramatically improves efficiency but also enhances accuracy and user experience.

The reduction in processing time of up to 94%, from hours to minutes, represents more than operational efficiency. It enables healthcare providers to focus on patient care rather than administrative burden, while ensuring accurate billing that supports sustainable healthcare delivery.

This implementation showcases how thoughtful application of vector database technology can solve complex real-world problems, transforming industries that rely on semantic understanding and relationship mapping. The success lies not just in the technical sophistication, but in the practical impact: making healthcare billing accessible, accurate, and efficient for everyone involved.

The future of medical billing is intelligent, automated, and built on the foundation of vector-powered semantic understanding—exactly what PureHealth delivered with Supabase.

Siddhesh Shirodkar
Tech Lead