GraphRAG · Cost Optimization · AWS Migration

    From Serverless to Self-Hosted: Migrating DevMate to Neo4j and Pgvector

    December 15, 2025 · 12 min read · Ram Pakanayev

    TL;DR: To optimize for cost and latency in DevMate, I migrated the GraphRAG backend from AWS Neptune and OpenSearch to a self-hosted Neo4j and Pgvector stack running on Dockerized EC2 instances. This reduced cloud overhead by 60% and improved retrieval latency by keeping the data orchestration layer closer to the compute.

    The Setup: Building DevMate

    DevMate started as a code intelligence platform for enterprise clients. The initial architecture was pure AWS serverless: Neptune for graph queries and OpenSearch for vector search. Clean, managed, and... expensive.

    After three months in production, the monthly bill was eye-watering. It was time to rethink the stack.

    The Original Stack: Serverless Dreams

    AWS Neptune + OpenSearch

    • ✅ Fully managed—no operational overhead
    • ✅ Auto-scaling built in
    • ✅ Tight AWS integration (IAM, VPC, etc.)
    • ❌ Neptune: $0.348/hr minimum (db.r5.large) = $250+/month idle
    • ❌ OpenSearch: $0.244/hr minimum = $175+/month idle
    • ❌ Data transfer between services adds up fast

    The Migration: Docker on EC2

    The insight was simple: for a B2B product with predictable traffic, serverless scaling wasn't worth the premium. I migrated to:

    • Neo4j Community Edition in Docker (graph database)
    • PostgreSQL + Pgvector in Docker (vector search)
    • Both running on a single m6i.xlarge EC2 Reserved Instance

    💡 The Math: Neptune + OpenSearch = ~$500/month minimum. EC2 Reserved m6i.xlarge + EBS = ~$180/month. That is a cost reduction of roughly 60%.
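The arithmetic behind these figures can be sketched in a few lines. This is a rough check, not a billing model: the 730 hours per month is an assumed average, and the hourly rates and the ~$500 and ~$180 totals are the ones quoted in this post. The instance minimums alone come to about $432, so the ~$500 figure presumably also covers storage and data transfer.

```python
HOURS_PER_MONTH = 730  # assumption: average month (8,760 hours / 12)


def monthly_cost(hourly_rate: float, hours: int = HOURS_PER_MONTH) -> float:
    """Convert an always-on hourly rate into an approximate monthly cost."""
    return hourly_rate * hours


def savings_pct(before: float, after: float) -> float:
    """Percentage saved when the monthly bill drops from `before` to `after`."""
    return (before - after) / before * 100


neptune = monthly_cost(0.348)          # db.r5.large rate quoted above
opensearch = monthly_cost(0.244)       # OpenSearch minimum quoted above
instance_floor = neptune + opensearch  # instance hours only, no storage or transfer

print(f"Neptune idle:        ${neptune:.2f}/month")
print(f"OpenSearch idle:     ${opensearch:.2f}/month")
print(f"Instance floor:      ${instance_floor:.2f}/month")
print(f"Savings vs. floor:   {savings_pct(instance_floor, 180):.0f}%")
print(f"Savings vs. ~$500:   {savings_pct(500, 180):.0f}%")
```

Depending on which baseline you use, the saving lands between roughly 58% and 64%, which is where the 60% headline number comes from.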

    The "Cloud vs. Docker" Trade-off

    Serverless databases (Neptune, OpenSearch) are excellent for scaling to millions of users. But for a B2B SaaS with predictable traffic, the high monthly minimums and "cold start" latency become bottlenecks, not features.

    Key insight: Running Neo4j and Pgvector as Docker containers on the same EC2 instance as the FastAPI backend eliminated the network hops between services. RAG queries that previously took 200-300ms now complete in 50-80ms.
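If you want to check a latency claim like this on your own stack, a small timing harness is enough. The sketch below is a generic helper, not the measurement setup used for DevMate; `measure_latency_ms` and the stand-in workload are hypothetical names chosen for illustration.

```python
import statistics
import time
from typing import Callable


def measure_latency_ms(fn: Callable[[], object], runs: int = 20) -> dict[str, float]:
    """Call `fn` repeatedly and report median and worst-case latency in ms."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000)
    return {"p50": statistics.median(samples), "max": max(samples)}


if __name__ == "__main__":
    # Stand-in workload; replace with a real retrieval call against your stack.
    stats = measure_latency_ms(lambda: time.sleep(0.01), runs=5)
    print(f"p50={stats['p50']:.1f}ms max={stats['max']:.1f}ms")
```

Run it against the same query before and after moving the databases next to the backend, and compare medians rather than single samples.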

    Why Neo4j Over Neptune?

    Neptune is built around Gremlin and SPARQL (openCypher support arrived later), while Neo4j is Cypher-native. For code analysis queries, Cypher is dramatically more readable:

    // Find all functions that call a specific API
    MATCH (caller:Function)-[:CALLS]->(api:ExternalAPI {name: "stripe"})
    RETURN caller.name, caller.file_path

    Plus: Neo4j's neo4j-graphrag library integrates vector search natively, reducing the "glue code" between systems.
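From the Python backend, the Cypher query above could be run roughly like this. It is a minimal sketch rather than DevMate's actual code: `find_callers` is a hypothetical helper, the API name is passed as a query parameter instead of a literal, and the connection URL and credentials in the comment are placeholders.

```python
from typing import Any

# Same query as above, with the API name passed as a parameter.
CALLERS_OF_API = """
MATCH (caller:Function)-[:CALLS]->(api:ExternalAPI {name: $api_name})
RETURN caller.name AS name, caller.file_path AS file_path
"""


def find_callers(session: Any, api_name: str) -> list[dict[str, str]]:
    """Return the name and file path of every function that calls `api_name`.

    `session` is expected to behave like a neo4j driver session: `run()` takes
    the query plus keyword parameters and yields records indexable by key.
    """
    result = session.run(CALLERS_OF_API, api_name=api_name)
    return [{"name": r["name"], "file_path": r["file_path"]} for r in result]


# With the official driver (assumed to be installed and reachable locally):
#
#   from neo4j import GraphDatabase
#   driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "<password>"))
#   with driver.session() as session:
#       print(find_callers(session, "stripe"))
```

Passing `api_name` as a parameter keeps the query text constant, which avoids string interpolation and lets Neo4j reuse the query plan.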

    Why Pgvector Over OpenSearch?

    OpenSearch is powerful but overkill for pure vector similarity. Pgvector:

    • ✅ Native PostgreSQL—no new query language
    • ✅ HNSW indexes that can reach 95%+ recall (tunable via hnsw.ef_search)
    • ✅ Combine vector search with SQL JOINs in one query
    • ✅ Fraction of the memory footprint
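To make the "vector search plus SQL JOINs" point concrete, here is a sketch of what such a query might look like. The table and column names (`code_chunks`, `repositories`, `tenant_id`) are hypothetical, and the `%(name)s` placeholders assume a psycopg-style driver. The `hnsw` index type, the `vector_cosine_ops` operator class, and the `<=>` cosine distance operator are pgvector's own.

```python
from typing import Sequence

# One-time setup: HNSW index for cosine distance (hypothetical table name).
CREATE_INDEX_SQL = """
CREATE INDEX IF NOT EXISTS code_chunks_embedding_hnsw
ON code_chunks USING hnsw (embedding vector_cosine_ops);
"""

# Vector similarity and an ordinary JOIN in a single statement.
SEARCH_SQL = """
SELECT c.id, c.content, r.name AS repo_name,
       c.embedding <=> %(query_vec)s::vector AS cosine_distance
FROM code_chunks AS c
JOIN repositories AS r ON r.id = c.repo_id
WHERE r.tenant_id = %(tenant_id)s
ORDER BY c.embedding <=> %(query_vec)s::vector
LIMIT %(k)s;
"""


def to_pgvector_literal(embedding: Sequence[float]) -> str:
    """Format a sequence of floats as pgvector's text form, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"


def build_search_params(embedding: Sequence[float], tenant_id: int, k: int = 5) -> dict:
    """Build the named parameters expected by SEARCH_SQL."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return {"query_vec": to_pgvector_literal(embedding), "tenant_id": tenant_id, "k": k}


# With a psycopg cursor (assumed): cur.execute(SEARCH_SQL, build_search_params(vec, 42))
```

Recall versus speed is adjusted per session with `SET hnsw.ef_search = <n>;` where higher values search more candidates at the cost of latency.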

    ⚠️ War Story: The migration took 2 weeks. The hardest part? Rewriting Neptune's traversal queries in Cypher. Pro tip: start with the most complex query first—if that works, the rest is easy.

    When to Stay Serverless

    This migration made sense because DevMate has predictable B2B traffic. If you have:

    • Highly variable traffic (0 to 10K users)
    • No DevOps capacity to manage containers
    • Strict compliance requirements (SOC2, HIPAA)

    ...then managed services are worth the premium. Know your trade-offs.

    Learn more about GraphRAG architecture in my AI Engineer Roadmap 2026.
