A healthcare AI agent on AWS Bedrock AgentCore Runtime. The loop runs in AWS. Every credential, every authorization decision, every short-lived database lease still belongs to IBM Verify, HashiCorp Vault, and SPIFFE. Three identities, three trust boundaries, one clean request.
Self-host a LangGraph agent, and you own the loop. Your service holds the model client, parses tool-call blocks, runs the tools, and feeds results back until the model produces a final answer. The entire reasoning loop lives inside your infrastructure. It scales when you scale, crashes when you crash, and carries every credential the tools need because everything runs in your process.
AWS Bedrock AgentCore Runtime moves that loop out.
AgentCore Runtime is an AWS-managed container host purpose-built for agent loops. You hand it a model, a set of tools, and a system prompt. It runs the loop for you, in an isolated microVM per session, with eight-hour execution windows and per-second billing for the CPU time the loop actually uses. The time the loop spends waiting on the model is free. Your application stops being an agent host. It goes back to being an app.
But moving the loop changes a question. If AWS is now running my agent, where does identity live? Where do my secrets sit? Who decides whether this specific clinician, asking this specific question, gets to read this specific patient chart?
The loop can move. Trust does not have to.
AgentCore hosts the reasoning. IBM Verify, Vault, and SPIFFE host every identity decision.
This post walks through exactly how that separation works, in a real healthcare implementation we built end to end.
A healthcare AI assistant for clinicians. The scenario looks like this. A clinician signs into the application through IBM Verify, with multi-factor authentication. They open the chat. They ask "show me Sarah Brown's chart" and the assistant returns the patient record. They ask about a flagged sensitive patient and the assistant pauses, asks IBM Verify for a fresh step-up, gets explicit consent from the clinician, then returns the chart. They try to add a clinical note. The agent asks for write authorization. The clinician approves. The note lands in the database, attributed to the right clinician, against the right patient.
Three things make this implementation interesting, and they are the things this post is really about.
The agent holds no static credentials. There is no service account in a config file. There is no long-lived API key. There is no database password baked into a container. Every credential is short-lived, scoped to one tool call, and revoked the instant the call returns.
The clinician's identity reaches the database. When the SQL runs against the patient record, it runs under an ephemeral credential that was issued seconds earlier by HashiCorp Vault, on behalf of this clinician, for this exact operation. The audit log knows who asked. The query plan ran under their identity.
The model never sees the clinician's identity. The user's bearer token rides the transport, on the request headers. It never enters the conversation the model reads. The model reasons about the request. It does not get to know who is making it.
The agent loop runs in AWS. Everything that matters for security stays where it is auditable.
Each component has one job. Each can be tested in isolation. Each can be swapped without rewriting the others. That is the point.
| Component | Job |
|---|---|
| AWS Bedrock AgentCore Runtime | Hosts the agent loop in an isolated microVM per session. Streams responses. Scales from zero. AWS managed. |
| Strands Agent (Python) | The reasoning loop itself. Built fresh per invocation so each request carries its own user identity. |
| Claude Sonnet on Bedrock | The model. Stateless. Called per turn by the loop. Never sees user tokens. |
| MCP Server | Exposes the healthcare tools over Model Context Protocol. Owns every identity and credential decision. |
| IBM Verify SaaS | Identity provider. Issues access tokens. Performs Token Exchange (RFC 8693). Enforces RAR (RFC 9396) policies. Triggers step-up for sensitive operations. |
| HashiCorp Vault | Issues ephemeral database credentials. One credential, one tool call. Lease revoked when the call returns. |
| SPIFFE / SPIRE | Cryptographic workload identity for the MCP server. No Vault token on disk. No AppRole secret to pre-seed. |
| Calling application | Holds the user session. Forwards the bearer token. No model client, no loop logic, no tools. |
| Database | Patient records. Reached only with ephemeral credentials scoped to one operation at a time. |
Notice what is not on this list. There is no shared service account. No long-lived API key. No .env file with secrets in it. No vault token sitting in an environment variable. The whole stack runs without a single static credential.
If you take one thing from this post, take this. The single most important idea behind the architecture is that there are three separate identities flowing through one request, and they are not the same thing. Treat them as three, and the security model becomes clear. Treat them as one, and you accidentally hand the model a credential it should never see.
Three identities · three trust boundaries · one request
Application identity is your service's IAM identity in AWS. It is what gives your application permission to invoke the deployed AgentCore Runtime. The only IAM action it needs is bedrock-agentcore:InvokeAgentRuntime. On EC2 or ECS, this is an instance role discovered through the metadata service. On a developer laptop, it is an SSO profile. No static AWS keys on disk in production. Standard AWS pattern.
User identity is the clinician's IBM Verify access token. It rides the invoke call on the Authorization header. AgentCore Runtime does not consume it. It places the header on the agent's request context. The agent forwards it to the MCP server. The MCP server uses it as the subject token for RFC 8693 Token Exchange. The model never sees this token. It is on the transport, never in the conversation.
Workload identity is the MCP server's own cryptographic identity, issued by SPIFFE/SPIRE. The MCP server uses its SVID to authenticate to Vault when it needs an ephemeral database credential. There is no Vault token sitting on disk. There is no AppRole secret pre-seeded somewhere waiting to be exfiltrated.
AWS checks the first. IBM Verify checks the second. SPIFFE and Vault check the third. Each trust boundary is enforced by a different system. None of them substitutes for the others.
The principle
Wrap your application's IAM identity around the user's Verify identity, and let the workload's SPIFFE identity take it from there. Three identities, three verifiers, one clean request.
The cleanest way to reason about this architecture is to draw a line. On one side is what AgentCore Runtime is responsible for. On the other side is what stays in your hands.
In practice, this means your application stops doing five things it used to do. It no longer holds the model client. It no longer parses tool-call blocks. It no longer runs the tools. It no longer needs the agent's system prompt or tool definitions baked into it. It no longer scales with model traffic. AgentCore handles all of that.
What AgentCore does not own is identity. AgentCore Runtime does not consume the user's bearer token. It does not see the database. It does not know what RAR scopes were granted. AWS gives you an excellent hosting surface for the loop. You give yourself an excellent identity model behind the MCP server. The two systems are decoupled on purpose.
A note on AgentCore Identity
AgentCore ships an Identity service of its own, and for some workloads that is the right choice. We use IBM Verify instead because the agent's identity decisions shouldn't be isolated from the rest of your enterprise. They should be governed by the same system that already controls access everywhere else. The point is choice. AgentCore Runtime works with whichever identity provider you bring to it.
The tools are where every identity decision actually happens. They run wherever the agent can reach them, with access to IBM Verify, Vault, and the database. Whether they are exposed as an MCP server, embedded in the AgentCore agent runtime, or hosted as Lambdas, every tool call follows the same four-step rhythm: introspect the user token, exchange it with the agent's SPIFFE actor identity, mint the verify-rar credential, run the query and revoke the lease.
authorization_details object per RFC 9396. Not "give me read." "Give me permission to read the chart for this exact patient, on behalf of this exact clinician, for the next sixty seconds." IBM Verify grants those exact details or returns an error.This rhythm runs for every tool call. List patients. Read a chart. Write a note. Look up labs. Update a medication. There is no shared connection pool with a service account. There is no long-lived API key. There is the loop on AgentCore, the user token on the transport, and the MCP server's four-step dance.
If the patient is flagged as sensitive, the policy in step two requires a step-up. The clinician's phone buzzes. They approve. The exchange completes and the rhythm continues. If they do not approve, IBM Verify returns an error, the tool call fails cleanly, and the agent relays the rejection without retrying. The model gets "access denied," the clinician sees a clear message, the audit log records the attempt.
It helps to walk a single request through the stack from start to finish. The flow is short, but every hop matters.
A clinician opens a chart for a sensitive patient
The clinician opens the chat and asks for the chart of a sensitive patient on their afternoon roster. The application takes the prompt and the clinician's bearer token. It invokes the deployed AgentCore Runtime by ARN. The invoke call carries two things at once: the IAM identity of the application, which signs the AWS request, and the bearer token of the clinician, which rides on the Authorization header.
AgentCore Runtime wakes a microVM, places the user token on the agent's request context, and starts the Strands loop. The loop builds a fresh MCP client carrying the bearer token in its headers, builds a fresh Agent with the model and tools, and calls Claude.
Claude looks at the prompt, decides the right tool is read_patient_record, and returns a tool-call block. The loop catches it and calls the MCP server.
The MCP server runs Token Exchange against IBM Verify. The exchange request carries an RAR authorization detail for "read the chart for this exact record, on behalf of this clinician, for sixty seconds." IBM Verify sees the record is flagged sensitive and requires step-up. The clinician's phone buzzes. They approve. IBM Verify returns the scoped token.
The MCP server presents its SPIFFE SVID to Vault and asks for an ephemeral database credential. Vault generates the credential. The SQL runs. The chart comes back. The MCP server revokes the Vault lease in the same code path. The credential disappears.
The MCP server returns the chart to the agent loop. The loop feeds it to Claude. Claude composes a final answer. AgentCore Runtime streams it back to the application. The application streams it to the clinician's browser. The microVM goes idle. The next request starts the whole flow from scratch.
Most people assume managed infrastructure is expensive. For AgentCore Runtime at the scale most teams actually run, it is not.
AgentCore Runtime bills active vCPU-hours and GB-hours by the second. Idle time is free. The time the loop spends waiting on the model is free. That billing model means cost scales with actual usage, not with the number of sessions you could theoretically run.
To make that concrete across two very different deployments: a small practice with ten staff running fifty sessions a week, four turns each, three seconds of active CPU per turn, the AgentCore Runtime compute cost is a few cents a week. A larger enterprise running thousands of sessions a day will spend meaningfully more, but the math stays the same. You pay for the CPU seconds the loop actually consumed, not for headroom you provisioned and left idle overnight.
The model you choose on Bedrock prices per token in and per token out, and that is where the real spend lives at any scale. AgentCore Runtime compute adds a small fraction on top regardless of which model you select or how many users you have. The decision to use AgentCore is an architecture decision. The cost scales with your usage, and it does not add overhead you would not have paid running the loop yourself.
Operationally there are two things worth knowing. A runtime session that has been idle goes cold, and the next invoke pays a cold-start while a microVM spins up. Reusing a stable session id per conversation keeps a warm microVM and skips the cold-start on follow-up turns. A runtime session has a maximum lifetime of eight hours and an idle timeout of about fifteen minutes. For a chat experience that is plenty.
The honest summary
Moving the loop onto AgentCore Runtime is a portability and security-isolation decision, not a cost decision. The microVM-per-session model is genuinely good for agent workloads. The price tag is small enough that it disappears into the model bill.
How the system surfaces audit and operational logs for an enterprise SIEM.
The agent loop runs on AgentCore Runtime. The action authorization happens in IBM Verify. The credential is minted in HashiCorp Vault. The actual database read happens under a one-time role in your database. That is four systems, each writing its own audit trail, each landing on a surface your SIEM already ingests today. Nothing custom is required at the SIEM layer.
The agent's per-invocation log lands in /aws/bedrock-agentcore/runtimes/<runtime-id>-<endpoint>. Standard CloudWatch ingest: CloudWatch Logs to Kinesis Data Firehose to Splunk HEC, to Microsoft Sentinel Log Analytics, to Chronicle, to QRadar's AWS DSM, to your bucket of choice for batch ingest. Every major SIEM has a turnkey connector for this.
Every InvokeAgentRuntime call, every runtime create or update or delete, lands in CloudTrail with the IAM principal that made it. Already in your existing CloudTrail to SIEM pipeline if you have one.
Every tool call dispatched, every Token Exchange attempt, every Vault credential request, every error, all written structured and append-only. CloudWatch Agent on the EC2 host, container log driver, syslog, Fluentd, or Vector ship these to your SIEM. The MCP server's per-request log line includes the OBO jti, which is the join key.
Every Vault operation, including the verify-rar credential mint, the lease ID, the policy that allowed it, the actor entity that requested it, and the ephemeral database role that was created. Sensitive fields are HMAC'd. Vault's file audit device or socket audit device feeds your log forwarder. Vault Enterprise can stream directly via the audit broker.
The actual SQL the ephemeral role ran. CREATE ROLE, GRANT, the query, REASSIGN, DROP ROLE, all tied to the same per-request username. pgaudit or the equivalent for your backend, shipped via syslog to your SIEM.
Every Token Exchange (RFC 8693), every RAR evaluation (RFC 9396), every policy decision, every step-up MFA push, every approval or denial, every session revoke. Two delivery modes: pull from the /v1.0/events API on a schedule, or push via SSF (Shared Signals Framework) and CAEP (Continuous Access Evaluation Profile) to any SSF-compatible receiver. CAEP push is the standards-based path, and most modern SIEMs have a CAEP ingest module or can take it via webhook.
The join key · OBO jti
Every OBO token IBM Verify issues for a tool call carries a unique jti. That same jti appears in the Verify event log entry for the Token Exchange, in the Vault audit log entry for the credential mint, in the Vault audit log entry for the lease revoke, in the verify-rar plugin's cred_issued event, and in the MCP server's per-request structured log line.
One Splunk query, one KQL query, one AQL query, joining on jti, reconstructs the full chain end to end. The SOC analyst sees what the user requested, which policy decided it, what credential was minted, and what query ran under it, all in one view. No custom correlation logic, no schema mapping, no agent-side instrumentation.
Out of the box
No custom log aggregator. Each system writes to a SIEM-supported source today.
No correlation engine. The jti does the correlation in one query.
No agent-side telemetry pipeline. The agent never calls out to send a log. The act of calling the MCP server is the log.
No bespoke parser. CloudWatch Logs, Vault audit, pgaudit, and Verify CAEP events are all known SIEM sources with existing field mappings.
The takeaway is short. AgentCore Runtime is excellent at what it does. It hosts the loop. It scales the microVMs. It bills fairly. It stays out of the way of identity. That last part is the unlock.
The agent loop can live in AWS, and your identity model can stay where it was always supposed to live. In IBM Verify, behind your RAR policies, with ephemeral credentials issued by Vault, attested by SPIFFE, scoped to one operation at a time. The agent never holds a static credential. The model never sees the user's identity. The database never sees anything except an ephemeral user that exists for seconds. The SIEM sees all of it, correlated on one join key, with no custom plumbing.
That is what zero trust for agentic AI looks like in 2026. Not a dashboard. An implementation that runs the loop where it scales best, and keeps the trust where you can audit it.
AgentCore runs the loop.
Verify runs the trust.
Vault issues the lease.
SPIFFE proves the workload.
Four jobs. Four systems. One clean request.