EVALUATION RUBRIC

Enterprise AI Agents
Evaluation Rubric

Summary

  • A structured framework for objectively evaluating AI agent platforms across nine critical dimensions, from no-code building to enterprise security, enabling confident, criteria-driven decisions for your customer experience technology investments.

9 Dimensions
36 Capabilities
Dimension 1 · 3 capabilities

Building Agents

No-code visual builder to build agents

A drag-and-drop interface that allows non-technical users to create and modify AI agent workflows or teams of agents without writing code.

Evaluation Criteria

Look for visual workflow builders, flowchart-style interfaces, or GUI-based agent configuration tools. Must be accessible to business users, not just developers.

Examples

  • Zapier-style workflow builders
  • Microsoft Power Platform-like interfaces
  • Visual conversation flow designers

Agents Configurable via Developer SDK

Comprehensive software development kits that provide pre-built functions, classes, and utilities for building AI agents programmatically. This excludes SDKs that merely talk to or invoke agents; it must be an SDK, typically in TypeScript or Python, that fully defines how an agent works and what it does in a declarative way.

Evaluation Criteria

Must include documentation, code examples, and framework support (like React, FastAPI, etc.). Look for official SDKs, not just API wrappers.

Examples

  • Official npm packages
  • PyPI packages
  • GitHub repositories with framework integrations
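
As a concrete illustration of what a declarative agent SDK can look like, here is a minimal TypeScript sketch. Every name in it (the agent fields, the tool shape, the endpoint) is hypothetical, not any specific vendor's API:

```typescript
import { z } from "zod";

// Hypothetical declarative agent definition: the SDK owns the agent's
// model, instructions, and tools, rather than ad-hoc glue code.
const orderAgent = {
  name: "order-status-agent",
  model: "gpt-4o",
  instructions: "Look up order status and answer customer questions.",
  tools: [
    {
      name: "getOrderStatus",
      description: "Fetch the current status of an order by ID.",
      parameters: z.object({ orderId: z.string() }), // schema-validated input
      execute: async ({ orderId }: { orderId: string }) =>
        (await fetch(`https://api.example.com/orders/${orderId}`)).json(),
    },
  ],
};
```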

2-way sync between code and UI

Changes made in the visual builder automatically update the underlying code, and code changes are reflected in the visual interface.

Evaluation Criteria

Must demonstrate bidirectional synchronization. Changes in either interface should be reflected in the other without data loss.

Examples

  • Export to code from visual builder
  • Import code changes back to visual interface
Dimension 2 · 6 capabilities

Developer Platform

Take actions on any MCP Server, App, or API

Support for Model Context Protocol (MCP) servers, enabling standardized tool and data source integrations.

Evaluation Criteria

Must explicitly support MCP protocol or demonstrate compatibility with MCP servers. Look for MCP-specific documentation or integrations.

Examples

  • MCP server integrations
  • MCP protocol support documentation
  • Standardized tool interfaces
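
A minimal sketch of what MCP compatibility looks like from the client side, using the official TypeScript SDK (@modelcontextprotocol/sdk); the server command and tool name are examples:

```typescript
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

// Launch an example MCP server over stdio (here, the reference
// filesystem server) and connect a client to it.
const transport = new StdioClientTransport({
  command: "npx",
  args: ["-y", "@modelcontextprotocol/server-filesystem", "/data"],
});
const client = new Client({ name: "rubric-check", version: "1.0.0" });
await client.connect(transport);

// Discover the server's tools, then invoke one.
const { tools } = await client.listTools();
console.log(tools.map((t) => t.name));
const result = await client.callTool({
  name: "read_file", // example tool exposed by the filesystem server
  arguments: { path: "/data/readme.md" },
});
```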

Multi-agent Architecture

Systems that coordinate multiple specialized agents using graph-based workflows or decision trees.

Evaluation Criteria

Must support multiple agents working together with defined relationships and handoff logic. Look for visual workflow representations or agent collaboration features.

Examples

  • Agent workflow diagrams
  • Specialist agent routing
  • Multi-agent conversations
  • Task delegation systems

Multi-agent Coordination

Support for both delegating tasks to sub-agents while maintaining control, and fully handing off conversations to specialized agents.

Evaluation Criteria

Must demonstrate both patterns: delegation (supervisor remains involved) and handoff (full transfer of control). Should show clear examples of each.

Examples

  • Supervisor agents delegating to specialists
  • Seamless handoffs between support tiers
  • Escalation workflows with different control patterns
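
The two patterns differ in who owns the conversation afterward. A minimal sketch, with runAgent standing in for a hypothetical platform call:

```typescript
type AgentReply = { text: string };

// Stub for illustration: a real platform would route this to an
// LLM-backed agent by name.
async function runAgent(name: string, input: string): Promise<AgentReply> {
  return { text: `[${name}] handled: ${input}` };
}

// Delegation: the supervisor consults a specialist, reviews the
// result, and stays in control of the conversation.
async function delegate(question: string): Promise<AgentReply> {
  const draft = await runAgent("billing-specialist", question);
  return runAgent("supervisor", `Review and finalize: ${draft.text}`);
}

// Handoff: control transfers entirely; the specialist owns every
// subsequent turn and the supervisor exits.
async function handoff(question: string): Promise<AgentReply> {
  return runAgent("billing-specialist", question);
}
```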

Talk to Agents via A2A, MCP, and Vercel AI SDK formats

Direct communication channels between agents, with no human intervention, using standard formats such as A2A, MCP, and the Vercel AI SDK protocol to enable collaborative problem-solving.

Evaluation Criteria

Must show agents communicating directly with each other, sharing context, or collaborating on tasks. Should be more than just sequential workflows.

Examples

  • Agents sharing findings
  • Collaborative problem-solving
  • Peer-to-peer agent communication
  • Agent consensus mechanisms

Agent Credential and Permissions Management

Individual authentication and authorization systems for each agent, allowing different access levels and API keys.

Evaluation Criteria

Must allow different agents to have different credentials, API keys, or access permissions. Should support credential isolation and management.

Examples

  • Agent-specific API key management
  • Individual service account assignments
  • Per-agent permission systems

Agent traces in UI + OpenTelemetry

Detailed logging and tracing of agent actions with visual interfaces and industry-standard telemetry.

Evaluation Criteria

Must provide visual trace interfaces showing agent decision-making and support OpenTelemetry standards for observability.

Examples

  • Agent decision trees in UI
  • OpenTelemetry integration
  • Distributed tracing
  • Agent performance monitoring
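
For the OpenTelemetry half of this capability, instrumenting agent steps is standard span work. A sketch using @opentelemetry/api, assuming an SDK and exporter are configured elsewhere:

```typescript
import { trace, SpanStatusCode } from "@opentelemetry/api";

const tracer = trace.getTracer("agent-runtime");

// Wrap each agent step in a span so decisions and latencies show up
// in any OTLP-compatible backend (Jaeger, Grafana Tempo, vendor UIs).
async function tracedStep<T>(name: string, fn: () => Promise<T>): Promise<T> {
  return tracer.startActiveSpan(name, async (span) => {
    try {
      span.setAttribute("agent.step", name);
      return await fn();
    } catch (err) {
      span.setStatus({ code: SpanStatusCode.ERROR });
      throw err;
    } finally {
      span.end();
    }
  });
}
```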
Dimension 3 · 5 capabilities

Data Connectors

Automated ingestion of public sources (docs, help center, etc.)

Systems that automatically discover, crawl, and index publicly available information sources.

Evaluation Criteria

Look for web crawling capabilities, RSS feed ingestion, public API integrations, or automated content discovery. Must be ongoing, not one-time imports.

Examples

  • Website crawling
  • Documentation site ingestion
  • Public forum monitoring
  • News feed integration

Automated ingestion of private sources (Notion/Confluence)

Direct integrations that automatically sync content from private knowledge management systems.

Evaluation Criteria

Must have native integrations (not just manual uploads) with popular enterprise tools. Should handle permissions and access controls.

Examples

  • Notion API integration
  • Confluence Cloud connector
  • SharePoint sync
  • Google Drive integration

Optimized RAG with managed retrieval

Advanced retrieval-augmented generation with intelligent chunking, embedding optimization, and relevance scoring.

Evaluation Criteria

Look for advanced RAG features like semantic chunking, hybrid search, relevance tuning, or retrieval optimization. Must be more sophisticated than basic vector search.

Examples

  • Hybrid search (semantic + keyword)
  • Relevance tuning interfaces
  • Chunk optimization
  • Retrieval analytics
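
One common way "hybrid search" fuses semantic and keyword results is reciprocal rank fusion. A self-contained sketch (k = 60 is the conventional smoothing constant):

```typescript
// Fuse several rankings (lists of document IDs, best first) into one.
function reciprocalRankFusion(rankings: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((docId, rank) => {
      scores.set(docId, (scores.get(docId) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([docId]) => docId);
}

// Usage: combine a vector-search ranking with a BM25 keyword ranking.
const fused = reciprocalRankFusion([
  ["doc3", "doc1", "doc7"], // semantic retriever
  ["doc1", "doc9", "doc3"], // keyword retriever
]);
```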

Real-time fetch from any database/API/web

Ability to query live data sources during conversations, not just pre-indexed static content.

Evaluation Criteria

Must demonstrate live API calls, database queries, or web scraping during agent interactions. Should handle authentication and rate limiting.

Examples

  • Live inventory lookups
  • Real-time pricing queries
  • Current weather data
  • Live database queries
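
A sketch of what such a live-data tool can look like, including the auth header and a crude rate limit the criteria call for; the endpoint and environment variable are hypothetical:

```typescript
let lastCall = 0;

// Query a live API at answer time instead of a pre-built index.
async function liveInventory(sku: string): Promise<unknown> {
  const now = Date.now();
  if (now - lastCall < 1_000) {
    throw new Error("Rate limited: at most one lookup per second");
  }
  lastCall = now;

  const res = await fetch(`https://api.example.com/inventory/${sku}`, {
    headers: { Authorization: `Bearer ${process.env.INVENTORY_API_KEY}` },
  });
  if (!res.ok) throw new Error(`Lookup failed: HTTP ${res.status}`);
  return res.json();
}
```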

Self-updating knowledge base

Automated systems that refresh and update the agent's knowledge (from internal and external sources like websites and docs) without manual intervention.

Evaluation Criteria

Look for scheduled updates, webhook-based updates, or real-time syncing with data sources. Must handle changes automatically.

Examples

  • Auto-sync with documentation sites
  • Scheduled database refreshes
  • Webhook integrations for content updates
Dimension 4 · 7 capabilities

Interact with your AI agents in...

Claude, ChatGPT, and Cursor

AI agents are callable inside Claude, ChatGPT, and Cursor via each platform's native tool/action interface and can execute at least one workflow end-to-end.

Evaluation Criteria

Evidence of a working, documented integration: official listing/docs + runnable setup + a successful end-to-end workflow in the target surface (no "theoretical support").

Examples

  • Claude Tool Use via Anthropic Messages API
  • ChatGPT Actions/Assistants action (manifest/OAuth)
  • Cursor editor extension or MCP that triggers the agent
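
For the Claude surface, the integration point is tool use in the Anthropic Messages API. A minimal sketch with @anthropic-ai/sdk; the tool name, schema, and model string are illustrative:

```typescript
import Anthropic from "@anthropic-ai/sdk";

const anthropic = new Anthropic(); // reads ANTHROPIC_API_KEY from env

const response = await anthropic.messages.create({
  model: "claude-3-5-sonnet-latest", // illustrative model name
  max_tokens: 1024,
  tools: [
    {
      name: "run_support_agent", // hypothetical: your agent exposed as a tool
      description: "Run the support agent on a customer question.",
      input_schema: {
        type: "object",
        properties: { question: { type: "string" } },
        required: ["question"],
      },
    },
  ],
  messages: [{ role: "user", content: "Why was order #123 delayed?" }],
});

// If Claude chose to call the tool, forward its arguments to your agent.
const toolUse = response.content.find((block) => block.type === "tool_use");
```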

Slack and Discord

Native bot integrations that let agents run tasks, respond, and interact within team chats (not just webhooks).

Evaluation Criteria

Must include native bot apps with rich, interactive features (slash commands, buttons, threads). One or more workflows must run fully inside Slack/Discord with proper auth and error handling.

Examples

  • Slack bot app with /command support
  • Interactive messages and channel triggers
  • Discord bot that responds to slash commands
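
A minimal sketch of the Slack side with @slack/bolt: acknowledge the slash command within Slack's 3-second window, run the agent, and post the answer back. runAgent is a hypothetical stand-in:

```typescript
import { App } from "@slack/bolt";

const app = new App({
  token: process.env.SLACK_BOT_TOKEN,
  signingSecret: process.env.SLACK_SIGNING_SECRET,
});

// Hypothetical agent call, stubbed for illustration.
async function runAgent(question: string): Promise<string> {
  return `Answer to: ${question}`;
}

// A workflow that runs fully inside Slack, with auth and error handling.
app.command("/ask", async ({ command, ack, respond }) => {
  await ack(); // must acknowledge within 3 seconds
  try {
    const answer = await runAgent(command.text);
    await respond({ text: answer, response_type: "in_channel" });
  } catch {
    await respond({ text: "Sorry, something went wrong." });
  }
});

await app.start(3000);
```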

Zendesk, Salesforce, and any Support Platform

Direct integrations with major CRM and customer service platforms for seamless workflow integration.

Evaluation Criteria

Must provide native integrations with ticket creation, customer data access, or workflow automation. Should be more than just API connections.

Examples

  • Zendesk ticket integration
  • Salesforce case management
  • CRM data synchronization
  • Workflow automation

Product Expert Chat Bubble ("Ask AI")

Dedicated conversational AI agent for customer support that knows everything about the product and company, and can search, cite sources, and hand off questions to other support channels when needed.

Evaluation Criteria

Must be grounded in indexed data from a company's internal and external docs. Must be fully configurable for control and customization.

Examples

  • Inkeep Ask AI support feature

Answers with Inline Citations

Responses that include specific references to source documents with clickable links or clear attribution.

Evaluation Criteria

Must provide traceable sources for generated content. Look for clickable links, document references, or clear source attribution in responses.

Examples

  • Footnote-style citations
  • Inline source links
  • "According to [document]" attributions
  • Source confidence scores
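
A hypothetical response shape showing what "traceable" means in practice: each [n] marker in the text indexes into a sources array:

```typescript
interface CitedAnswer {
  text: string; // e.g. "Webhooks are retried three times [1]."
  sources: {
    id: number;          // matches the [n] marker in text
    title: string;
    url: string;
    confidence?: number; // optional retrieval relevance score
  }[];
}

const example: CitedAnswer = {
  text: "Webhooks are retried three times with backoff [1].",
  sources: [
    {
      id: 1,
      title: "Webhooks guide",
      url: "https://docs.example.com/webhooks",
      confidence: 0.92,
    },
  ],
};
```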

Guardrails

Safety mechanisms that prevent inappropriate responses and confidence thresholds that trigger human escalation.

Evaluation Criteria

Must include content filtering, response confidence scoring, and automatic escalation when confidence is low. Should show safety mechanisms in action.

Examples

  • Content filtering systems
  • Confidence score displays
  • Automatic escalation triggers
  • Safety policy enforcement
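
A simplified sketch of the filter, score, escalate pattern; the thresholds, patterns, and action names are illustrative:

```typescript
interface Draft {
  text: string;
  confidence: number; // model or retrieval confidence, 0..1
}

const BLOCKED = [/\bssn\b/i, /credit card number/i]; // illustrative filters

function guard(draft: Draft): { action: "send" | "escalate" | "block"; text?: string } {
  // 1. Content filtering: never send responses matching blocked patterns.
  if (BLOCKED.some((p) => p.test(draft.text))) return { action: "block" };
  // 2. Confidence gating: low-confidence answers route to a human.
  if (draft.confidence < 0.6) return { action: "escalate" };
  // 3. Otherwise, send as-is.
  return { action: "send", text: draft.text };
}
```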

Enterprise Search (Semantic search, Algolia Replacement)

Advanced search capabilities that understand context and meaning, not just keyword matching.

Evaluation Criteria

Must demonstrate semantic search capabilities across enterprise data sources with relevance ranking and context understanding.

Examples

  • Natural language search interfaces
  • Semantic relevance scoring
  • Cross-platform search capabilities
  • Search analytics
Dimension 5 · 3 capabilities

Insights & Analytics

Automatic Content Updates (AI Content Writer)

Built-in capabilities for automated generation of documentation or marketing copy based on the product and knowledge gaps discovered by AI agents.

Evaluation Criteria

Must include AI content generation features specifically designed for creating new content automatically based on feature gaps and knowledge base gaps.

Examples

  • Auto-generated documentation drafts
  • Content suggestions based on gaps
  • AI-written FAQ entries

AI Reports on Knowledge Gaps

Analytics that identify what information is missing from the knowledge base.

Evaluation Criteria

Must provide insights into unanswered questions and missing information. Should include actionable recommendations.

Examples

  • "Unanswered questions" reports
  • Knowledge gap analytics
  • Content improvement suggestions

AI Reports on Product Feature Gaps

Analytics that identify which features users are requesting and where the product falls short of user needs.

Evaluation Criteria

Must provide insights into unanswered questions, missing information, or feature requests. Should include actionable recommendations.

Examples

  • Feature request tracking
  • User feedback aggregation
  • Product improvement suggestions
Dimension 6 · 4 capabilities

Building Agent UIs

Out-of-box Chat Components (JavaScript)

Pre-built, customizable JavaScript user interface components that can be embedded to build AI agent chat experiences.

Evaluation Criteria

Must provide actual JavaScript components built specifically for AI agent chat UIs, not just embeddable widgets. Should include customization options and documentation.

Examples

  • npm packages with JavaScript components
  • Embeddable chat widgets with customization APIs

Out-of-box Chat Components (React)

Pre-built, customizable React user interface components that can be embedded to build AI agent chat experiences.

Evaluation Criteria

Must provide actual React components built specifically for AI agent chat UIs, not just embeddable widgets. Should include customization options and documentation.

Examples

  • npm packages with React components
  • Embeddable chat widgets with customization APIs
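
What "actual React components" means in practice, sketched with a hypothetical package (@vendor/agent-chat) and props; the point is a composable component customized through props, not an iframe widget:

```tsx
import { AgentChat } from "@vendor/agent-chat"; // hypothetical package

export function SupportPage() {
  return (
    <AgentChat
      agentId="support"                   // which agent to talk to
      theme={{ primaryColor: "#4F46E5" }} // styling via props, not CSS hacks
      onMessage={(msg: string) => console.log("user said:", msg)}
    />
  );
}
```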

Interactive Components within Agent Messages (forms, cards, etc.)

UI elements that allow users to interact beyond simple text chat, including forms, buttons, cards, and other rich interactions.

Evaluation Criteria

Must support interactive elements within the chat interface. Look for form handling, button actions, card-based responses, and rich media support.

Examples

  • In-chat forms for data collection
  • Interactive buttons for quick responses
  • Carousel cards
  • File upload capabilities
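
A hypothetical message payload mixing text with interactive elements; the component types are illustrative, not any specific vendor's schema:

```typescript
const message = {
  role: "assistant",
  content: [
    { type: "text", text: "Which plan are you on?" },
    {
      type: "form", // in-chat form for structured data collection
      fields: [{ name: "plan", label: "Plan", options: ["Free", "Pro", "Enterprise"] }],
      submitLabel: "Continue",
    },
    {
      type: "buttons", // quick-reply actions rendered as buttons
      actions: [{ label: "Talk to a human", value: "escalate" }],
    },
  ],
};
```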

Custom UIs using Vercel AI SDK format

Compatibility with Vercel's AI SDK formats and streaming protocols for web applications.

Evaluation Criteria

Must support Vercel AI SDK formats, streaming responses, or demonstrate integration with Vercel ecosystem. Look for specific SDK compatibility.

Examples

  • Vercel AI SDK integration examples
  • Streaming response support
  • Next.js compatibility
  • Vercel deployment guides
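
A minimal sketch of the Vercel AI SDK streaming format in a Next.js route handler; method names vary slightly across AI SDK major versions (this follows the v4-era streamText / toDataStreamResponse shape):

```typescript
import { openai } from "@ai-sdk/openai";
import { streamText } from "ai";

// Streams agent output in the AI SDK's data-stream format, which the
// frontend useChat hook consumes directly.
export async function POST(req: Request) {
  const { messages } = await req.json();
  const result = streamText({
    model: openai("gpt-4o"),
    system: "You are a support agent.",
    messages,
  });
  return result.toDataStreamResponse();
}
```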
Dimension 7 · 3 capabilities

Authentication and Authorization

Single Sign-On (SSO)

Single Sign-On integration with automated user provisioning and deprovisioning through the SCIM protocol.

Evaluation Criteria

Must support major SSO providers (Okta, Azure AD, etc.) and demonstrate SCIM-based user lifecycle management.

Examples

  • Okta integration
  • Azure AD SCIM connector
  • Automated user provisioning
  • Group-based access management
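
What SCIM-based provisioning looks like on the wire: the identity provider POSTs a User resource to the platform's SCIM endpoint. The URL and token are placeholders; the schema URN is standard (RFC 7643):

```typescript
const res = await fetch("https://platform.example.com/scim/v2/Users", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.SCIM_TOKEN}`,
    "Content-Type": "application/scim+json",
  },
  body: JSON.stringify({
    schemas: ["urn:ietf:params:scim:schemas:core:2.0:User"],
    userName: "jdoe@example.com",
    name: { givenName: "Jane", familyName: "Doe" },
    active: true, // deprovisioning is a PATCH setting this to false
  }),
});
console.log(res.status); // 201 Created on success
```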

Role-Based Access Control

Granular permission systems that control what different user roles can access and modify.

Evaluation Criteria

Must provide role definition capabilities, permission matrices, and demonstrate granular access controls across platform features.

Examples

  • Admin/user/viewer roles
  • Feature-based permissions
  • Agent access controls
  • Resource-level permissions

Audit Logs

Comprehensive logging of all user actions, system changes, and agent interactions for compliance and security.

Evaluation Criteria

Must provide detailed audit trails with timestamps, user attribution, and action details. Should support log export and retention policies.

Examples

  • User action logs
  • System change tracking
  • Agent interaction logs
  • Compliance reporting features
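
A hypothetical shape for a single audit record, covering the timestamps, user attribution, and action details the criteria require:

```typescript
interface AuditEvent {
  timestamp: string;                 // ISO 8601, e.g. "2025-01-15T09:30:00Z"
  actor: { userId: string; ip?: string };
  action: string;                    // e.g. "agent.updated", "user.invited"
  target: string;                    // affected resource, e.g. "agent:support"
  details?: Record<string, unknown>; // diff, old/new values, request ID
}
```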
Dimension 8 · 1 capability

Deployment

Hosting types

Flexible deployment options including both managed cloud services and on-premises installations.

Evaluation Criteria

Must offer both deployment models with feature parity. Self-hosted options should include installation guides and support.

Examples

  • SaaS platform options
  • On-premises installation packages
  • Hybrid deployment guides
  • Deployment comparison charts
Dimension 9 · 4 capabilities

Security

PII removal capabilities

Automated systems that detect and remove personally identifiable information from conversations and data.

Evaluation Criteria

Must demonstrate active PII detection and removal, not just data masking. Should include multiple PII types and configurable policies.

Examples

  • PII detection algorithms
  • Automatic redaction features
  • Data anonymization tools
  • Privacy policy enforcement
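
A deliberately simple redaction sketch; production systems use ML-based entity detection, but the contract is the same: detect spans, then remove them before storage. The patterns here are illustrative and incomplete:

```typescript
const PII_PATTERNS: Record<string, RegExp> = {
  email: /[\w.+-]+@[\w-]+\.[\w.]+/g,
  phone: /\+?\d[\d\s().-]{8,}\d/g,
  ssn: /\b\d{3}-\d{2}-\d{4}\b/g,
};

// Replace each detected span with a typed redaction marker.
function redact(text: string): string {
  return Object.entries(PII_PATTERNS).reduce(
    (acc, [label, pattern]) =>
      acc.replace(pattern, `[REDACTED_${label.toUpperCase()}]`),
    text,
  );
}

// redact("Reach me at jane@acme.com or 555-123-4567")
//   -> "Reach me at [REDACTED_EMAIL] or [REDACTED_PHONE]"
```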

Uptime & support SLAs

Contractual commitments to system availability and support response times with measurable guarantees.

Evaluation Criteria

Must provide specific uptime percentages and support response time commitments. Look for SLA documentation and performance reporting.

Examples

  • 99.9% uptime guarantees
  • 24/7 support commitments
  • Response time SLAs
  • Performance dashboards

SOC2 Type II certified

Successfully completed SOC 2 Type II audit demonstrating operational effectiveness of security controls over time.

Evaluation Criteria

Must have valid SOC 2 Type II certification, not just SOC 2 Type I. Look for recent audit reports or compliance badges.

Examples

  • SOC 2 Type II certificates
  • Compliance page documentation
  • Third-party attestations

GDPR/HIPAA compliant options

Platform configurations and features that enable compliance with data privacy and healthcare regulations.

Evaluation Criteria

Must provide specific compliance features, not just general security. Look for data processing agreements, privacy controls, and compliance documentation.

Examples

  • Data processing agreements
  • Privacy control features
  • HIPAA business associate agreements
  • GDPR compliance guides
Enterprise Demo

See Inkeep Enterprise

Find a time with our Agent Solutions team to get an overview of Inkeep Enterprise and a demo of Inkeep Agents for your use case.
