Trinity by Ability.AI is a self-hosted, open-source agent runtime designed from the ground up for sovereign production deployments. Unlike framework libraries that require practitioners to add every production safety control, Trinity ships with governance as a core feature. This assessment evaluates Trinity against the eight PSF domains to determine where it covers practitioners natively, and where additional controls are still required.
trinity install <scaffold>
Trinity provides structural input governance through per-agent scope definitions and RBAC controls over which users and channels can trigger each agent. Native prompt injection resistance and PII detection require practitioner-added controls.
Trinity enforces input governance at the deployment and routing layer: each agent is scoped to specific channels (Slack, Telegram, WhatsApp, webhook), specific RBAC roles (admin, creator, operator, user), and specific task classes through its configuration. This means that an agent deployed to handle HR queries cannot be triggered by an external webhook intended for a customer support agent, and a user-role caller cannot invoke an operator-privileged workflow. This structural scoping is a meaningful input governance control that most frameworks lack entirely. Where Trinity does not yet provide native controls is at the semantic input layer: there is no built-in prompt injection detection, no PII classifier, and no mechanism to validate that the content of a user message conforms to an expected structure before it enters the agent's context. Practitioners deploying Trinity into environments where users submit free-text inputs must implement these controls at the application layer — before the message is routed to the Trinity agent.
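The application-layer screening described above could look something like the following minimal sketch. It is not part of Trinity: `screen_input`, `GateResult`, the regex lists, and the length limit are all hypothetical names and values chosen for illustration, and a short regex list is far weaker than a maintained injection or PII detector.

```python
import re
from dataclasses import dataclass, field

# Hypothetical patterns for illustration only. A real deployment would use a
# maintained detector rather than a short regex list.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior) instructions", re.I),
    re.compile(r"reveal (your )?system prompt", re.I),
]
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}
MAX_LENGTH = 4000  # assumed limit, tune per deployment


@dataclass
class GateResult:
    allowed: bool
    text: str
    reasons: list[str] = field(default_factory=list)


def screen_input(message: str) -> GateResult:
    """Screen a free-text message before it is routed to an agent."""
    if len(message) > MAX_LENGTH:
        return GateResult(False, "", ["message too long"])
    for pattern in INJECTION_PATTERNS:
        if pattern.search(message):
            return GateResult(False, "", ["possible prompt injection"])
    # Redact rather than reject when PII is found, and record what was done.
    redacted = message
    reasons = []
    for label, pattern in PII_PATTERNS.items():
        redacted, count = pattern.subn(f"[{label} redacted]", redacted)
        if count:
            reasons.append(f"redacted {count} {label} value(s)")
    return GateResult(True, redacted, reasons)
```

Only messages where `allowed` is true would be forwarded, and the forwarded text would be `GateResult.text` rather than the raw input, so the agent never sees the unredacted values.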
Trinity's built-in approval queue provides human-in-the-loop review of consequential outputs before they act on downstream systems. Automated schema validation of agent outputs is not natively enforced and requires practitioner implementation.
Trinity's approval queue is one of its most important production features. Any agent action can be configured to require human approval before execution — the agent pauses, presents the proposed action to a designated reviewer, and only proceeds once approved. This is a meaningful output control for irreversible or high-stakes actions: an agent about to send an email, update a customer record, or trigger a financial transaction can be held at the approval gate until a human has verified the output is correct. What the approval queue does not provide is automated output validation: there is no built-in mechanism to assert that an agent's response conforms to a defined JSON schema, that it does not contain PII that should be redacted before delivery, or that it falls within permitted content categories. These semantic output controls must be added by the practitioner. For multi-agent workflows within Trinity, intermediate outputs flow between agents without automatic schema enforcement — a corrupted intermediate output can propagate to subsequent agents.
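One way a practitioner might add the missing automated check is to validate each agent output against an explicit contract before it is delivered or handed to the next agent. The sketch below uses only the standard library and a deliberately simple field-to-type mapping rather than full JSON Schema; `REFUND_SCHEMA`, `ALLOWED_ACTIONS`, and `validate_output` are hypothetical names, not Trinity features.

```python
import json

# Hypothetical contract for one agent's output: field name -> accepted types.
REFUND_SCHEMA = {"ticket_id": str, "action": str, "amount": (int, float)}
ALLOWED_ACTIONS = {"refund", "escalate", "close"}


class OutputValidationError(ValueError):
    """Raised when an agent output does not match its contract."""


def validate_output(raw: str) -> dict:
    """Parse an agent's raw output and check it against REFUND_SCHEMA."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise OutputValidationError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise OutputValidationError("expected a JSON object")
    missing = REFUND_SCHEMA.keys() - data.keys()
    extra = data.keys() - REFUND_SCHEMA.keys()
    if missing or extra:
        raise OutputValidationError(
            f"missing={sorted(missing)} unexpected={sorted(extra)}"
        )
    for name, expected in REFUND_SCHEMA.items():
        value = data[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            raise OutputValidationError(f"{name} has the wrong type")
    if data["action"] not in ALLOWED_ACTIONS:
        raise OutputValidationError(f"action {data['action']!r} is not permitted")
    return data
```

Placing a check like this between agents in a multi-agent workflow stops a malformed intermediate output at the boundary instead of letting it propagate downstream.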
Trinity's self-hosted, sovereign architecture provides the strongest possible foundation for data protection. Data never leaves your perimeter. Agent state persists in your GitHub. No vendor has access to your operational data.
Trinity's data protection posture is fundamentally different from cloud-hosted agent platforms, and the difference is structural rather than configurable. Because Trinity runs inside your own infrastructure (Docker, on-premise or private cloud), every piece of data processed by an agent — user inputs, agent context, tool call payloads, outputs — stays within your network perimeter. There is no Trinity cloud that receives copies of your operational data. There is no telemetry pipeline sending agent conversation logs to Ability.AI's servers. Agent state is persisted in your own GitHub repository, which you control and can subject to your existing data governance controls. This architecture makes Trinity compliance-friendly by default for regulated data categories. An MSP deploying Trinity for a healthcare client can assert that no patient data leaves the client's environment. A financial services firm can assert that client financial data is processed only within their regulatory perimeter. These assertions are structurally true rather than dependent on vendor data processing agreements. The tamper-evident audit log (hash-chained, append-only, CSV/JSON export) provides the evidence trail needed for regulatory audit.
Trinity provides production-grade observability natively: OpenTelemetry tracing, per-agent cost tracking, execution replay, and a hash-chained audit log. This is the most complete native observability of any agent runtime we have assessed.
Observability is where Trinity most clearly differentiates from other agent runtimes. Most frameworks provide development-grade logging that requires significant additional tooling to become production-grade. Trinity ships with OpenTelemetry instrumentation, which means traces are emitted in a standard format compatible with any OTLP-compatible backend (Datadog, Grafana Tempo, Honeycomb, Jaeger, and others). Every agent execution is captured as a structured trace with per-step spans. In addition to OTEL tracing, Trinity provides execution replay — the ability to re-run a past agent execution against its original inputs for debugging or audit purposes. Per-agent cost tracking gives practitioners a clear view of LLM spend per workflow and per agent, which is essential for client billing in MSP deployments. The tamper-evident audit log is particularly significant for regulated environments: it is hash-chained and append-only, meaning any retrospective alteration is cryptographically detectable. This is the level of audit evidence that financial services, healthcare, and government clients require. For MSPs, this means you can provide clients with a verifiable record of every action their AI agents took.
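To see why a hash-chained, append-only log makes retrospective alteration detectable, consider the generic sketch below. It illustrates the technique only; the source does not describe Trinity's actual log format or hashing scheme, so the entry layout, the `GENESIS` value, and the function names here are assumptions.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed starting value for the first entry's prev_hash


def _digest(prev_hash: str, record: dict) -> str:
    """Hash the previous entry's hash together with a canonical record."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_entry(log: list, record: dict) -> None:
    """Append a record whose hash commits to everything before it."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append(
        {"record": record, "prev_hash": prev_hash, "hash": _digest(prev_hash, record)}
    )


def verify_chain(log: list) -> bool:
    """Recompute every hash; any edited or removed entry breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["record"]):
            return False
        prev_hash = entry["hash"]
    return True
```

Because each hash covers the previous hash, changing one historical record invalidates that entry and every entry after it. An auditor who holds only the latest hash can therefore detect tampering anywhere earlier in the log.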
Trinity provides per-agent guardrails, Docker-based isolation, channel scope enforcement, and approval queues as deployment safety controls. The self-hosted model eliminates shared infrastructure risks. Action budgets require explicit practitioner configuration.
Trinity's deployment safety model starts from a fundamentally safer baseline than cloud-hosted runtimes. Because each Trinity deployment is isolated within Docker on your own infrastructure, there is no shared-tenancy risk: your agents cannot be affected by other customers' workloads, and your credentials are not stored on a shared platform. The per-agent guardrail system constrains each agent's permitted tools, data access, and operational scope at configuration time — an agent scoped to read customer support tickets cannot be reconfigured at runtime to access financial records. Channel scope enforcement adds another layer: an agent deployed to a specific Slack channel cannot be invoked from a webhook or a different channel without explicit reconfiguration. The approval queue functions as a production circuit breaker: for any action type classified as consequential, the workflow pauses for human review rather than executing autonomously. Where practitioners must still add controls is at the resource budget layer: Trinity does not natively enforce per-run LLM token budgets or maximum execution time limits. For workflows that could theoretically run indefinitely (e.g., research agents with recursive tool use), practitioners should configure explicit timeout and cost-ceiling controls at the deployment layer.
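The timeout and cost-ceiling controls recommended above could be sketched as a small budget object that the practitioner's own orchestration code charges after every step. This is an illustrative pattern, not a Trinity API: `RunBudget`, `BudgetExceeded`, and `run_with_budget` are hypothetical names, and the assumption that each step reports its own token usage would need to be matched to however your deployment measures LLM spend.

```python
import time


class BudgetExceeded(RuntimeError):
    """Raised when a run crosses its token or wall-clock limit."""


class RunBudget:
    def __init__(self, max_tokens: int, max_seconds: float, clock=time.monotonic):
        self.max_tokens = max_tokens
        self.max_seconds = max_seconds
        self._clock = clock
        self._started = clock()
        self.tokens_used = 0

    def charge(self, tokens: int) -> None:
        """Record token usage for one step and fail once a limit is crossed."""
        self.tokens_used += tokens
        if self.tokens_used > self.max_tokens:
            raise BudgetExceeded(
                f"token budget exceeded: {self.tokens_used} > {self.max_tokens}"
            )
        elapsed = self._clock() - self._started
        if elapsed > self.max_seconds:
            raise BudgetExceeded(
                f"time budget exceeded: {elapsed:.1f}s > {self.max_seconds}s"
            )


def run_with_budget(steps, budget: RunBudget) -> list:
    """Run callables that each return (result, tokens_used), charging each one."""
    results = []
    for step in steps:
        result, tokens = step()
        budget.charge(tokens)
        results.append(result)
    return results
```

Note that this check runs between steps, so it bounds recursive or looping workflows but cannot interrupt a single step that hangs. A hard limit at the container or process level is still needed for that case.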
Trinity's approval queue is a first-class runtime primitive, not an afterthought. Human oversight can be enforced at any workflow step, for any action category, with reviewer assignment and audit trail. This is the most complete native oversight mechanism we have assessed.
Most agent frameworks treat human oversight as a design pattern — something the practitioner builds on top of the framework by structuring their workflow to include a pause. Trinity treats it as a runtime primitive. The approval queue is a built-in component of the agent runtime, not an external add-on. When an agent reaches an action requiring approval, it automatically creates an approval task, assigns it to the configured reviewer (by role or by identity), and holds execution until the reviewer approves or rejects. Rejection can trigger alternative workflow paths. Approvals are recorded in the tamper-evident audit log with the reviewer's identity and timestamp. This design makes it structurally easy to comply with the PSF Domain 6 requirement that consequential actions require human review before execution — the infrastructure for that requirement is already in place. Practitioners are not required to build a bespoke pause-and-notify mechanism; they configure which action types require approval, and Trinity handles the rest. For MSPs, this means you can credibly tell clients that their AI agents cannot take any irreversible action without a named human approving it — and provide the audit log entry as evidence.
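The pause, review, and resume flow described above can be modelled in a few lines. The sketch below is a toy illustration of the behaviour, not Trinity's implementation or configuration format: `ApprovalGate`, `ApprovalTask`, and their fields are hypothetical, and a real queue would persist tasks and notify reviewers.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Optional


@dataclass(eq=False)  # compare tasks by identity, not by field values
class ApprovalTask:
    action_type: str
    description: str
    execute: Callable[[], str]
    on_reject: Optional[Callable[[], str]] = None
    status: str = "pending"
    reviewer: Optional[str] = None
    decided_at: Optional[datetime] = None


class ApprovalGate:
    def __init__(self, requires_approval):
        self.requires_approval = set(requires_approval)
        self.pending: list[ApprovalTask] = []

    def request(self, task: ApprovalTask) -> Optional[str]:
        """Run the action immediately unless its type needs human review."""
        if task.action_type not in self.requires_approval:
            task.status = "auto-approved"
            return task.execute()
        self.pending.append(task)  # held: nothing executes yet
        return None

    def decide(self, task: ApprovalTask, reviewer: str, approve: bool) -> Optional[str]:
        """Record the reviewer's decision, then execute or take the reject path."""
        if task.status != "pending":
            raise ValueError("task has already been decided")
        task.reviewer = reviewer
        task.decided_at = datetime.now(timezone.utc)
        task.status = "approved" if approve else "rejected"
        self.pending.remove(task)
        if approve:
            return task.execute()
        return task.on_reject() if task.on_reject else None
```

The key property is that `execute` is never called for a gated action type until `decide` records a named reviewer and a timestamp, which is the evidence an audit log entry would carry.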
Trinity has been independently security-audited by UnderDefense. RBAC with four defined roles enforces least-privilege access. The self-hosted model eliminates the SaaS platform attack surface. Tamper-evident logging supports incident investigation.
Trinity's security posture rests on four pillars. First, independent audit: UnderDefense conducted a formal security assessment of Trinity, which distinguishes it from frameworks that are self-asserted as secure. Second, RBAC: the four-role model (admin, creator, operator, user) enforces least-privilege access with clear separation between those who configure agents (creators), those who operate them (operators), and those who use them (users). Third, self-hosted architecture: because Trinity runs in your infrastructure, there is no SaaS platform boundary to compromise. An attacker who wants to access your agents' data must compromise your infrastructure directly — there is no vendor platform to target. Fourth, tamper-evident audit log: the hash-chained append-only audit log makes it cryptographically difficult to cover up a security incident after the fact, which both deters insider threats and supports post-incident forensics. For MSPs serving clients with security-sensitive requirements (financial services, healthcare, government, legal), Trinity's security credentials are materially stronger than any cloud-hosted alternative and are evidenced by third-party audit rather than vendor attestation alone.
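As an illustration of how a four-role, least-privilege model behaves, the sketch below maps each role to an explicit permission set and denies anything not listed. The role names come from the assessment above; the permission names and the exact mapping are assumptions made for the example and are not taken from Trinity's documentation.

```python
from enum import Enum


class Role(Enum):
    ADMIN = "admin"
    CREATOR = "creator"
    OPERATOR = "operator"
    USER = "user"


# Hypothetical permission mapping: each role gets only what it needs.
PERMISSIONS = {
    Role.ADMIN: {"manage_users", "configure_agent", "operate_agent", "use_agent"},
    Role.CREATOR: {"configure_agent", "use_agent"},
    Role.OPERATOR: {"operate_agent", "use_agent"},
    Role.USER: {"use_agent"},
}


def is_allowed(role: Role, permission: str) -> bool:
    """Deny by default: a permission must be listed explicitly for the role."""
    return permission in PERMISSIONS.get(role, set())
```

Under this mapping a user-role caller can use an agent but cannot operate or reconfigure it, which mirrors the separation between creators, operators, and users described above.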
Trinity is open source and self-hosted. Agent state lives in your GitHub. Operations are not dependent on Ability.AI's uptime. You can fork, modify, and run the platform indefinitely without vendor involvement. This is the highest vendor resilience of any runtime we have assessed.
PSF Domain 8 addresses the risk that a vendor dependency becomes a single point of failure: that your AI agents stop working if a vendor changes pricing, discontinues a product, has an outage, or exits the market. Trinity's architecture eliminates this category of risk. Because Trinity is open source, the codebase is available for inspection, forking, and modification under its licence. Because it is self-hosted, your running instance does not depend on Ability.AI's infrastructure: a network partition between your environment and Ability.AI's website has no effect on your deployed agents. Because agent state is stored in your GitHub, your agents' operational memory is under your version control, not in a vendor database. If Ability.AI ceased to operate tomorrow, your deployed Trinity instances would continue running, your state would be intact, and you could maintain the platform from the open-source codebase indefinitely. For MSPs deploying AI infrastructure on behalf of clients, this resilience profile is commercially significant: you can credibly commit to multi-year managed service agreements without the caveat 'unless our vendor changes the product or pricing.'
Trinity's architecture maps directly to the MSP managed services model. You deploy and operate Trinity infrastructure on behalf of clients, within the client's own environment or a dedicated managed environment. The client never shares a multi-tenant cloud platform with other organisations. The client owns their data, their agent state, and their audit logs. You provide the operational expertise: deployment, configuration, certification, monitoring, and incident response.
This is a differentiated service that cloud-hosted AI platforms cannot replicate. When a client asks 'where does our data go?', the answer is 'nowhere — it stays in your environment.' When they ask 'what happens if the vendor changes pricing?', the answer is 'nothing — the platform runs independently of the vendor.' These are answers that only sovereign infrastructure makes possible.
The PSF certification path for Trinity MSP practitioners:
Trinity is the right runtime when any of the following requirements apply: regulated data that cannot leave the client's perimeter; compliance obligations requiring tamper-evident audit evidence; security posture that cannot accommodate a SaaS agent platform; multi-year service commitments that cannot tolerate vendor platform risk; or client size and workload that justify the operational overhead of self-hosted infrastructure.
It requires more initial setup than a hosted platform — you are deploying and operating infrastructure rather than clicking through a SaaS onboarding flow. For MSPs, this operational complexity is the service. The client is not paying for a SaaS subscription; they are paying for certified professionals who know how to deploy, configure, and operate sovereign AI infrastructure safely.
Trinity is less appropriate for exploratory deployments where quick iteration matters more than sovereignty, for small organisations without the IT infrastructure to run Docker-based systems, or for use cases with no regulated data and no compliance obligations — in those cases, a simpler hosted runtime may be adequate. The decision criteria should always start with the data and compliance requirements, not the technology preference.
The AIDA examination tests applied PSF knowledge across all eight domains — exactly the gaps and strengths covered in this assessment. 15 minutes. No charge. Ever.