AI Agents
September 26, 2025

How to cut your AI costs by 90% while improving performance (according to Databricks)

Databricks research reveals optimized open-source models outperform premium AI while delivering 90% cost savings at enterprise scale.

Key Takeaways

  • Open-source models with optimization outperform Claude Opus 4.1 by 2.2% while costing 90x less to serve

  • At 100K requests, serving costs represent 95% of total AI expenditure, making optimization critical for ROI

  • GEPA automated prompt optimization consistently delivers 3-7% performance gains across all model types

  • Enterprise AI strategy should prioritize lifetime serving costs over upfront model pricing for maximum value

Every Fortune 500 CEO faces the same dilemma: AI promises transformation, but at what cost? While your competitors rush to adopt the latest proprietary models from OpenAI and Anthropic, they're missing a critical insight that could save millions annually.

The Databricks study reveals a startling reality about enterprise AI costs. Most organizations are dramatically overpaying—not because they chose the wrong vendor, but because they're optimizing for the wrong metrics. The real cost of AI isn't in the model selection; it's in serving millions of requests at scale.

Consider the actual price differentials uncovered in the research: Claude Opus 4.1 costs 90 times more to serve than optimized open-source alternatives. Claude Sonnet 4 runs at 20x the cost, while GPT-5 commands a 10x premium. These aren't marginal differences—they're order-of-magnitude disparities that compound with every API call.

But here's what most executives miss: at production scale of 100,000 requests, serving costs represent over 95% of your total AI expenditure. The one-time optimization investment becomes a rounding error—less than 1% of lifetime costs at enterprise volumes. Yet most procurement decisions focus exclusively on upfront model pricing, ignoring the exponential cost curve of scaling.

The Executive Takeaway: Your AI strategy shouldn't start with model selection. It should start with understanding your volume projections and optimizing for lifetime cost, not sticker price.

The Performance Paradox: When Cheaper Models Win

Traditional wisdom suggests you get what you pay for. Premium models should deliver premium results. The Databricks research shatters this assumption with hard data from their Information Extraction Benchmark—a comprehensive evaluation spanning real-world enterprise tasks across finance, legal, healthcare, and commerce sectors.

The headline finding challenges every assumption about AI economics: gpt-oss-120b, an open-source model enhanced with GEPA (automated prompt optimization), actually outperforms the baseline Claude Opus 4.1 by 2.2%. This isn't a marginal victory—it's a complete inversion of the expected price-performance relationship.

The study tested these models on genuinely complex enterprise tasks: documents exceeding 100 pages, extraction schemas with over 70 fields, and hierarchical data structures with multiple nested levels. These aren't toy problems—they're the exact challenges your teams face daily.

Performance improvements from optimization were consistent across all model types, with gains ranging from 3% to 7%. But the real insight lies in the optimization impact: the same technique that elevated open-source models to frontier performance also pushed proprietary models even higher. Claude Opus 4.1 with optimization achieved a 6.4% improvement over its baseline, setting new performance records.

Your Model Selection Framework

For executives making procurement decisions, here's your strategic framework:

High-Volume, Cost-Sensitive Operations: Deploy optimized open-source models (gpt-oss-120b with GEPA). You'll achieve frontier-level quality at 1/90th the cost of proprietary alternatives.

Quality-First, Lower-Volume Use Cases: Invest in optimized Claude Opus 4.1. The 6.4% performance boost justifies the premium for mission-critical applications with lower request volumes.

Balanced Enterprise Deployment: Optimized gpt-oss-120b delivers the optimal intersection of quality and cost—frontier performance with transformative savings that scale.

The Executive Takeaway: You don't need the most expensive model to get the best results. You need the right optimization strategy applied to the right model for your specific use case.

The Optimization Advantage: Your Secret Weapon

Automated prompt optimization represents a fundamental shift in how enterprises should approach AI deployment. Unlike traditional supervised fine-tuning (SFT), which requires weeks of effort and specialized expertise, automated optimization delivers superior results in hours.

The Databricks research compared three optimization techniques—MIPROv2, SIMBA, and GEPA—with GEPA emerging as the clear winner. But what matters for executives isn't the technical details; it's the business impact.

GEPA-optimized models deliver equal or better performance than supervised fine-tuning while reducing serving costs by 20%. When combined with SFT, the improvement jumps to 4.8% over baseline—but here's the crucial insight: optimization alone gets you most of the benefit without the complexity.

Why Optimization Beats Traditional Approaches

No Infrastructure Overhaul Required: Optimization works with your existing AI stack. There's no need to rebuild pipelines or retrain models.

Vendor Flexibility: The technique applies equally to open-source and proprietary models. You can optimize models you don't own and can't modify.

Rapid Deployment: Hours, not weeks. While competitors are still planning their fine-tuning strategy, you're already in production with optimized performance.

Continuous Improvement: Optimization isn't a one-time event. As your data and use cases evolve, re-optimization maintains peak performance without model retraining.

The study's lifetime cost analysis reveals the true power of this approach. At 1,000 requests, optimization costs are visible but manageable. At 100,000 requests, serving costs dominate and optimization becomes negligible. At 10 million requests—typical for enterprise deployment—optimization costs disappear entirely from the chart.
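The shift described above falls out of simple arithmetic. The sketch below illustrates it with hypothetical dollar figures (the one-time optimization cost and per-request serving cost are assumptions chosen for illustration, not numbers from the Databricks study):

```python
# Illustrative only: how a one-time optimization cost shrinks as a share
# of lifetime cost as request volume grows. The dollar amounts below are
# assumed for the example, not taken from the Databricks research.

OPTIMIZATION_COST = 50.0       # one-time optimization spend (assumed)
SERVING_COST_PER_REQ = 0.01    # serving cost per request (assumed)

shares = {}
for volume in (1_000, 100_000, 10_000_000):
    serving = volume * SERVING_COST_PER_REQ
    total = serving + OPTIMIZATION_COST
    shares[volume] = OPTIMIZATION_COST / total * 100
    print(f"{volume:>10,} requests: optimization = {shares[volume]:6.2f}% of lifetime cost")
```

Under these assumptions, optimization dominates at 1,000 requests, falls below 5% of lifetime cost at 100,000 requests, and is effectively invisible at 10 million.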

The Executive Takeaway: Optimization is the force multiplier that makes every dollar in your AI budget work harder. It's not an expense; it's an investment with immediate returns.

The ROI Calculator: Making the Business Case

Let's translate technical performance into financial reality. For an enterprise processing 100,000 requests monthly—a modest volume for most Fortune 500 applications—the economics are transformative.

The Financial Model

Using the Databricks cost data, here's your monthly breakdown:

Traditional Approach (Claude Opus 4.1 baseline):

  • Serving cost: $X per 1,000 requests
  • Monthly total: $100,000 (example baseline)
  • Annual run rate: $1.2 million

Optimized Open-Source (gpt-oss-120b with GEPA):

  • Serving cost: $X/90 per 1,000 requests
  • Monthly total: $1,111
  • Annual run rate: $13,333
  • Annual savings: $1,186,667
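The arithmetic behind these figures is straightforward to reproduce. In this sketch, the $100,000 monthly baseline is the illustrative figure from the example above, and 90 is the serving-cost ratio reported in the Databricks study:

```python
# Reproduce the example ROI arithmetic. BASELINE_MONTHLY is the
# illustrative example baseline from the text; COST_RATIO is the 90x
# serving-cost difference reported by Databricks.

BASELINE_MONTHLY = 100_000.0   # Claude Opus 4.1 baseline (example)
COST_RATIO = 90                # optimized open-source is 90x cheaper to serve

optimized_monthly = BASELINE_MONTHLY / COST_RATIO          # ~$1,111
optimized_annual = optimized_monthly * 12                  # ~$13,333
annual_savings = BASELINE_MONTHLY * 12 - optimized_annual  # ~$1,186,667
five_year_gap = annual_savings * 5                         # ~$5.93 million

print(f"Optimized monthly: ${optimized_monthly:,.0f}")
print(f"Annual savings:    ${annual_savings:,.0f}")
print(f"5-year TCO gap:    ${five_year_gap:,.0f}")
```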

The payback period? Less than one month at production scale. The five-year total cost of ownership difference? Over $5.9 million for this single use case.

But the real opportunity lies in what you can do with these savings. That $1.2 million annual budget for one AI application can now support 90 similar initiatives. Instead of choosing which AI projects to fund, you can fund them all.

Risk-Return Analysis

Risk Assessment:

  • Technical risk: Minimal (proven on enterprise benchmarks)
  • Implementation risk: Low (hours to deploy)
  • Performance risk: Low (documented improvements of 2-7% on enterprise benchmarks)
  • Vendor lock-in risk: Reduced (open-source optionality)

Return Profile:

  • Immediate cost reduction: 50-90%
  • Performance improvement: 3-7%
  • Scalability unlock: 10-90x more capacity at same budget
  • Strategic flexibility: Vendor independence

The Executive Takeaway: Every month you delay optimization, you're leaving six figures on the table. The question isn't whether to optimize, but how quickly you can capture these savings.

Success Metrics Dashboard

Track these KPIs weekly:

  • Cost per 1,000 requests (target: 50-90% reduction)
  • Response accuracy (target: maintain or improve)
  • Processing speed (target: maintain or improve)
  • Monthly savings (target: 6-7 figures)
  • ROI multiple (target: 10x+ within 90 days)

The Executive Takeaway: The path to AI cost optimization is clear, proven, and achievable in 90 days with minimal resource allocation.

Strategic Implications: Competitive Advantage Through Efficiency

AI economics will separate market leaders from laggards over the next decade. Companies that master cost-efficient AI deployment will dominate their industries—not through technology superiority, but through economic superiority.

The Competitive Edge

When your AI costs are 90% lower than competitors, everything changes:

Market Positioning: Offer AI-powered services at prices competitors can't match while maintaining margins.

Innovation Velocity: Run 90 experiments for the cost of one. Fail fast, learn faster, win fastest.

Scale Economics: Process entire datasets, not samples. Serve all customers, not segments. The advantage compounds.

Strategic Flexibility: With vendor independence through open-source options, you're not locked into any provider's roadmap or pricing model.

Resource Reallocation Opportunities

Those millions in savings don't disappear—they transform into strategic fuel:

  • R&D Acceleration: Fund next-generation AI research
  • Talent Investment: Hire top AI talent with saved budget
  • Market Expansion: Deploy AI in cost-prohibitive markets
  • Competitive Moats: Build proprietary datasets and models

Future-Proofing Your AI Strategy

The optimization advantage isn't static—it's a capability that grows stronger over time:

Continuous Optimization Culture: Make optimization standard practice, not special project. Every new model, every new use case gets optimized from day one.

Model Agnosticism: With proven optimization techniques, you're free to choose models based on capability, not cost constraints.

Economic Resilience: When the next AI winter comes—and budget scrutiny intensifies—your efficient operations become your competitive advantage.

The Executive Takeaway: Companies that master AI economics today will have the resources, flexibility, and scale to dominate tomorrow.

The Critical Question for Your Leadership Team

Ask your technology leaders this question: "If our competitors are achieving the same AI performance at 90% lower cost, how long can we afford not to optimize?"

The answer will drive urgency. The Databricks research shows that optimization isn't a future capability—it's available today. Every week of delay is a week your competitors could be building an insurmountable cost advantage.

Conclusion: The Future Belongs to the Efficient

The AI revolution promised transformation, but delivered staggering costs. The Databricks research rewrites that narrative. You can have both frontier performance and radical efficiency. You can deploy AI at scale without breaking budgets. You can compete on capability without competing on spending.

The math is simple: 90x cost reduction with performance improvement. The implementation is proven: 90 days from decision to ROI. The competitive advantage is clear: same AI power at a fraction of the cost.

But the window won't stay open. As more enterprises discover optimization, it will shift from competitive advantage to table stakes. The companies that move now—that optimize now—will lock in advantages that compound over time.

The question isn't whether to optimize your AI costs. It's whether you'll do it before your competitors do.

Your AI transformation doesn't require more budget. It requires better economics. The path is clear, the proof is documented, and the opportunity is massive.

About This Analysis: This guide is based on comprehensive research conducted by Databricks on enterprise AI deployment, focusing on the Information Extraction Benchmark across multiple industries. The cost and performance data represent real-world results from production deployments, not laboratory experiments. For detailed technical specifications and implementation support, consult with your technology leadership and optimization partners.

Sources

Databricks blog: "Building State-of-the-Art Enterprise Agents 90x Cheaper with Automated Prompt Optimization"

Frequently Asked Questions

How can open-source models outperform premium proprietary models?

Through automated optimization techniques like GEPA (prompt engineering), open-source models can achieve superior performance while maintaining dramatically lower serving costs at enterprise scale.

Why do serving costs matter more than upfront model pricing?

Serving costs dominate at scale, representing over 95% of total expenditure at 100K requests. The one-time optimization investment becomes less than 1% of lifetime costs.

Should enterprises abandon premium models entirely?

Not necessarily. The optimal choice depends on your volume projections, performance requirements, and cost tolerance. Premium models may justify their cost for specific high-value use cases.
