I found 268 plaintext secrets in my AI stack
Six weeks ago a security audit told me I had 268 plaintext secrets scattered across my AI agent platform. This week I closed out the migration that fixed it. Here is what I learned about credentials, AI agents, and the gap between what we build and what we secure.

Key takeaways
- AI (Artificial Intelligence) agent platforms accumulate secrets faster than anyone admits. A modest setup with 25 specialised agents ended up with 71 distinct credentials across vendor APIs, bot tokens, OAuth secrets, and platform tokens. None of this is unusual.
- The right architecture treats every secret value as something the human handles personally, with the AI agent as a coordinator and verifier rather than a value-handler. Drawing that bright line is more important than any specific tool choice.
- The migration took longer than expected because security work always does. A six-phase plan stretched into seven phases, three transcript leak incidents informed three rules, and the runbook is now longer than the original plan. Every minute was worth it.
Six weeks ago an automated security audit told me my AI agent platform had 268 plaintext secrets scattered across configuration files, agent workspace directories, and .env files on the virtual machine that runs everything. API (Application Programming Interface) keys, bot tokens, and OAuth (Open Authorisation) client secrets were all sitting on disk, readable by every agent and every script that wandered through the workspace.
Two days ago I committed the closing artefacts of the migration that fixed it. A 71-secret manifest, two operational skills, a manual fallback runbook, an as-built architecture document, and a leak ledger that tracked every accidental disclosure I made along the way. Three of them. I will get to that.
This is the post-mortem for a piece of work that no one wanted to do, took longer than I planned, and mattered more than most of what I have shipped this year.
The Problem No One Talks About
Talk to anyone running production AI agents and you will hear about model selection, prompt engineering, evaluation harnesses, and tool-use reliability. You will hear less about secrets management. Almost nothing about credential rotation cadence. And nothing at all about the unglamorous question of what happens when your AI agent helpfully echoes part of an API key into a chat transcript at 4am because a regex pattern matched too greedily.
The reason is no mystery. This work is boring, fiddly, and exposes how unfinished most agent stacks really are.
When I started this in mid-March 2026, my platform had:
- 25 specialised AI agents, each with role-specific permissions and tool access
- A central orchestrator routing work between them
- Several supporting services (a custom CRM (Customer Relationship Management), an OKR (Objectives and Key Results) tracker, a wiki, a public web application, dashboards) all sharing the same VM (Virtual Machine)
- 18 Telegram bots used for internal communication between agents and me
- 17 Telegram webhook secrets
- API keys for nine large language model providers
- Tokens for GitHub, Vercel, a home automation system, and an entire identity stack
- Several keys whose original purpose I had to investigate to confirm they were still in use
The agents needed access to all of this to do their jobs. The system was working. It was also a textbook example of what security people call sprawl.
The audit was uncomfortable to read because it was correct.
Why This Is Worse Than Old-Style Secrets Sprawl
In a traditional application, plaintext credentials in config files are dangerous because a person or a process might exfiltrate them. The blast radius is bounded by who has shell access.
AI agent platforms add three new failure modes:
One. AI agents read those config files as part of their normal work. They are designed to ingest and reason over configuration. A secret in a config file is not just at rest. It is in active context with a probabilistic system that might surface it in unexpected ways.
Two. AI agents communicate over chat transcripts that get logged. When an agent debugs a configuration problem, it might helpfully read part of a .env file aloud. Once a value is in transcript, it is in the AI provider's logs. You cannot get it back.
Three. AI agents write to disk. The same agent troubleshooting a connection problem might decide to print debug output that includes environment variables. By the time you notice, the value is in three log files and a handoff document.
The combination is worse than the sum of the parts. A traditional secret leak requires malice or compromise. An AI agent leak requires only an unfortunate prompt and a literal-minded model trying to be helpful.
The Plan I Actually Followed
I drafted the plan in mid-March, right after the audit. Numbered from Phase 0, it ran:
- Foundation. Set up a centralised secrets vault. Create scoped service identities for the platform to read from it. Establish the bootstrap procedure for the very first credential.
- Provider spend caps. Before rotating anything, set per-vendor monthly spending caps so a leaked key cannot be drained for unbounded cost.
- Reference cleanup. Replace plaintext values in the main configuration file with environment-variable references.
- Vault migration. Replace those environment-variable references with calls to the vault, so the configuration file no longer contains secrets even by reference.
- Non-platform services. Migrate the supporting services to the same vault, each with its own scoped service identity.
- Rotation campaign. Rotate every long-lived credential at the vendor portal. New values land in the vault directly. Old values get revoked.
- Hardening. Destroy every plaintext copy still on disk. Cut over the runtime to read from a memory-only file that gets rebuilt on every restart.
Then a Phase 7 emerged that the original plan called ongoing operations. Cadence, audit reviews, lifecycle. That is the phase I onboarded this week.
The rotation campaign in Phase 5 alone took three working days. Phase 6 surfaced an architectural realisation that delayed Phase 7 by two days. Each phase had a story I wish I had captured in real time.
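The reference-cleanup step, where plaintext values in a configuration file give way to environment-variable references, can be sketched roughly as follows. This is an illustration only, not the platform's actual resolver: the `${NAME}` syntax, the `resolve_refs` function, and the sample key names are all assumptions made for the example.

```python
import os
import re
from typing import Any, Mapping

# Assumed reference syntax: a string that is exactly one ${NAME} placeholder.
_REF = re.compile(r"\$\{([A-Z_][A-Z0-9_]*)\}")


def resolve_refs(config: Any, env: Mapping[str, str] = os.environ) -> Any:
    """Return a copy of config with each "${NAME}" string replaced by env[NAME]."""
    if isinstance(config, dict):
        return {key: resolve_refs(value, env) for key, value in config.items()}
    if isinstance(config, list):
        return [resolve_refs(value, env) for value in config]
    if isinstance(config, str):
        match = _REF.fullmatch(config)
        if match:
            name = match.group(1)
            if name not in env:
                # Report the missing name only; never include any value.
                raise KeyError(f"unresolved secret reference: {name}")
            return env[name]
    # Numbers, booleans, and ordinary strings pass through unchanged.
    return config
```

With this shape, the file on disk holds only names such as `${EXAMPLE_LLM_KEY}`, and the values exist only in the process that loads the configuration. Swapping the `env` mapping for a vault client would be the next step, but that interface is deliberately left out here.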
What Made the Architecture Work
I made one decision early that I credit with most of what went right.
I drew a bright line between two classes of secret. Class A is values that a vendor portal issues: Anthropic API keys, GitHub tokens, Telegram bot tokens, Microsoft Entra app secrets. For these, the AI agent never sees the value. The vendor issues it. I copy it directly into the vault's web interface. The agent only ever sees a 12-character SHA-256 (Secure Hash Algorithm 256-bit) fingerprint to verify the new value reached every consumer correctly.
Class B is values the agent can generate itself with openssl rand or rotate via a vendor API call without any human-only step. For these, the agent rotates end-to-end during a temporary write-permission window that I open and close at the start and end of each rotation cycle.
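For a self-generated Class B value, the generation step might look like the sketch below. It uses Python's `secrets` module as a stand-in for `openssl rand`; the function name and the 32-byte default are assumptions, and the step that writes the value to the vault is intentionally omitted because it depends on the vault's own client.

```python
import hashlib
import secrets


def generate_class_b_secret(num_bytes: int = 32) -> tuple[str, str]:
    """Generate a random hex secret plus its 12-character SHA-256 fingerprint."""
    # Comparable in spirit to `openssl rand -hex 32`: 32 random bytes, hex-encoded.
    value = secrets.token_hex(num_bytes)
    fingerprint = hashlib.sha256(value.encode("utf-8")).hexdigest()[:12]
    return value, fingerprint
```

The caller would hand `value` straight to the vault and log or display only `fingerprint`.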
Of my 71 secrets, 34 are Class A, 31 are Class B, and 6 are static configuration that does not really rotate.
The bright line matters because it removes an entire category of failure. The AI agent has no path by which a vendor-issued credential can transit its memory or output. If the agent has a bad day and tries to be helpful by echoing a value, the only Class A values it could echo are 12-character fingerprints that are useless to an attacker. The full values are not in its context.
This was not the original architecture. The original architecture had me constructing vault-update commands in my shell, with the new value pasted into the command line. The agent then helpfully verified the update. That worked but it routed every secret through my terminal, where copy-paste mistakes and shell history both threaten exposure. The Class A/B split came from a passing comment I made to my AI assistant during planning: "I will only update the vault, nothing else." It sounded obvious. It was actually the security model.
The agent's role for Class A secrets shrank to three things: plan the rotation, prompt me with the right deep links to the right portals, and verify after the fact that the new value reached every consumer. No value-handling. No copy-paste assistance. No "let me help you with that command." The agent became a coordinator. I became the only path through which a Class A value could move.
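The verification step can be sketched as a fingerprint comparison. This is a minimal illustration, assuming each consumer can report the fingerprint of the value it currently holds; the `stale_consumers` function and the consumer names in the usage note are hypothetical.

```python
import hashlib


def fingerprint(value: str, length: int = 12) -> str:
    """First `length` hex characters of the SHA-256 digest of value."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:length]


def stale_consumers(expected: str, reported: dict[str, str]) -> list[str]:
    """Names of consumers whose reported fingerprint differs from the expected one."""
    return sorted(name for name, fp in reported.items() if fp != expected)
```

Given the fingerprint recorded at the vault and a mapping such as `{"gateway": "...", "crm": "..."}`, an empty result means every consumer picked up the new value. At no point does the comparison need the value itself.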
The Three Leaks
I logged three accidental secret disclosures during this work. Each one informed a new operational rule.
The first happened during the rotation campaign in Phase 5. I was verifying that a freshly rotated key had reached the vault correctly. I ran a command that piped the first 16 characters of the value to standard output for visual comparison. The first 16 characters of an API key are not the whole key, but combined with knowledge of the key's format they narrow an attacker's search space significantly. Those 16 characters were now in my AI provider's logs.
I rotated that key again. I added a rule: never echo any prefix of a secret value, ever. Use the 12-character SHA-256 fingerprint for verification. The fingerprint reveals nothing useful to an attacker but is unique enough for human comparison.
The second happened during Phase 6. I ran a command that searched my running processes for environment variables matching a pattern. The pattern was meant to match one specific variable name. It accidentally matched as a substring inside several others, and the resulting output dumped the values of every API key in the gateway's environment into the chat. Eight keys, all visible.
I rotated all eight. I added a rule: regex patterns for environment-variable names must use word boundaries or exact-name matching. I wrote a Python helper that takes a key name as a command-line argument and returns the 12-character fingerprint, never the value, with no regex matching anywhere in the path.
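A helper along those lines could look like the sketch below. It is not the author's script: the function names and exit codes are assumptions, and it reads the environment of the process it runs in, so pointing it at another process's environment would need extra work not shown here.

```python
import hashlib
import os
import sys
from typing import Mapping, Optional


def fingerprint_for(name: str, env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Look up one variable by exact name and return only its fingerprint."""
    # Exact mapping lookup: no regex and no substring matching anywhere.
    value = env.get(name)
    if value is None:
        return None
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:12]


def main(argv: list[str]) -> int:
    """Command-line entry point: print "NAME FINGERPRINT", never the value."""
    if len(argv) != 2:
        print("usage: fingerprint.py KEY_NAME", file=sys.stderr)
        return 2
    fp = fingerprint_for(argv[1])
    if fp is None:
        print(f"{argv[1]}: not set", file=sys.stderr)
        return 1
    print(f"{argv[1]} {fp}")
    return 0
```

Run as a script, it would be wired up with `sys.exit(main(sys.argv))`. Because the lookup is a dictionary access, a name like `API_KEY` can never accidentally match `OLD_API_KEY_BACKUP`.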
The third happened this week, two hours before I closed out Phase 7 onboarding. I was investigating whether two specific keys were still in use anywhere on the system. I ran a recursive grep across the configuration repository. The grep recursed into a directory that held emergency rollback files from a previous phase: files that were in .gitignore and not committed, but were still plaintext on the disk. The grep matched a key value and printed it to my AI session's transcript.
The directory was already scheduled for shredding the next day. The compromise window was about 18 hours. I logged the incident and decided to defer rotation of the leaked key to the first scheduled rotation cycle in May. The mitigation is bounded by time, the value is not in any persistent log, and the original consumer of that key was retired weeks ago. But the rule update was important: never recursive-grep into archived directories, even when investigating something legitimate.
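One way to make that rule mechanical is a search that prunes excluded directories and reports only where a key name appears, never the matching line. The sketch below is an assumption about how such a tool might look; the function name and the directory names passed to it are hypothetical.

```python
import os
from typing import Iterable, Iterator


def find_name_references(
    root: str, name: str, skip_dirs: Iterable[str]
) -> Iterator[tuple[str, int]]:
    """Yield (path, line_number) for lines that mention name, never the line text."""
    skip = set(skip_dirs)
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into excluded directories.
        dirnames[:] = [d for d in dirnames if d not in skip]
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            try:
                with open(path, "r", encoding="utf-8", errors="ignore") as handle:
                    for lineno, line in enumerate(handle, start=1):
                        if name in line:
                            yield path, lineno
            except OSError:
                # Unreadable file: skip it rather than abort the whole search.
                continue
```

Searching for the key's name and printing only `path:line` answers the question "is this key still referenced anywhere?" without putting any value into a transcript. GNU grep offers a similar effect with `-l` and `--exclude-dir`, though the safe default still has to be remembered each time.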
That third leak landed at exactly the moment the AI assistant was helping me build the operational rules to prevent leaks like that one. It was a useful piece of irony. It also became a worked example in the runbook, because nothing teaches a rule like the rule's own violation.
What I Built That Will Outlive the Migration
The migration itself was finite. The phases close. The plaintext gets shredded. The rotation campaign ends. The artefacts that came out of it are the durable wins.
I built two skills that an existing AI agent in my platform now executes on a schedule. The first plans and shepherds rotation cycles every quarter. The second runs a monthly audit review across the vault's audit log, the platform's journal, GitHub token scope changes, and a few other signals, paging me only when an anomaly fires.
Both skills carry a 71-secret manifest as their source of truth. The manifest tells the agent the rotation method for each key (vendor portal, self-issued, BotFather DM (Direct Message), Telegram setWebhook API), the rotation cadence (quarterly, annual, biennial), the consumers, and the smoke test that verifies the new value works. Every entry has the deep link to the vendor portal and the deep link to the right page in the vault. The 18 Telegram bot tokens are split into four quarterly batches of four or five bots each, so a once-a-year burst of 18 sequential BotFather conversations becomes four manageable batches that fold into the regular quarterly cycle.
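To make the manifest's shape concrete, here is a hedged sketch of one entry and of the quarterly batching. The field names, the `ManifestEntry` class, and the `quarterly_batches` function are illustrative assumptions; the real manifest format is not shown in this post.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ManifestEntry:
    """One secret's rotation metadata; the manifest never stores the value itself."""
    name: str
    method: str                 # e.g. "vendor_portal", "self_issued", "botfather_dm", "set_webhook"
    cadence: str                # "quarterly", "annual", or "biennial"
    consumers: tuple[str, ...]  # services that must pick up the new value
    smoke_test: str             # description or command that proves the new value works
    portal_url: str = ""        # deep link to the vendor portal, if any
    vault_url: str = ""         # deep link to the entry in the vault


def quarterly_batches(names: list[str], quarters: int = 4) -> list[list[str]]:
    """Split names into contiguous batches whose sizes differ by at most one."""
    base, extra = divmod(len(names), quarters)
    batches: list[list[str]] = []
    start = 0
    for quarter in range(quarters):
        size = base + (1 if quarter < extra else 0)
        batches.append(names[start:start + size])
        start += size
    return batches
```

Applied to 18 bot names, `quarterly_batches` produces batches of five, five, four, and four, which matches the "four or five bots each" split described above.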
I wrote a manual fallback runbook for the case when the skill is unavailable, the AI agent is unreachable, or the vault is having an outage. Step by step, second person, with verification gates between every step. The runbook is now longer than the original migration plan. That feels right.
I wrote an as-built architecture document: the file a future fresh AI session can read to understand the whole system without trawling through six weeks of work history. The architecture lives in five trust zones, two storage tiers, and one bright line that the agent does not cross. There is a section called "the why log" that captures the non-obvious decisions and why we made them, so a future me cannot accidentally undo them with a clever-looking refactor.
All of this is version-controlled. The manifest, the skills, the runbook, the architecture document, the migration history, the leak ledger. If everything I have on disk vanished tomorrow, I could rebuild from the repository. That is the point.
The Shape of the Work That Remains
Phase 7 is not closed. It is ongoing. The first rotation cycle runs in May. The first audit review runs in early May. The first 90-day bootstrap rotation comes due around late July. There is one open question I deferred to the May cycle: a pair of legacy OAuth credentials that an investigation showed have no live consumer in five of seven search scopes. Two scopes I cannot verify without help from a system I cannot reach. The decision is whether to retire them outright or rotate them as a precaution. Either is defensible. I want a fresh head for the call.
The skill scripts that automate the rotation cycle are stubs right now. The SKILL.md files describe what they should do. The Python implementations are Phase 7 follow-on work. The runbook covers the case where the scripts do not exist yet: manual rotation following exactly the same flow the scripts will eventually automate. The first cycle in May will run from the runbook. The second or third will run from the scripts.
I have not solved per-agent credential scoping. All 25 of my agents currently share the same Anthropic API key. The vault makes per-agent keys tractable, but it requires per-agent workspace setup at Anthropic that I have deferred to a quarterly planning session later in the year. This is a real limitation. A leaked key compromises all 25 agents, not one. It is on the list.
What I Would Tell Someone Starting This Tomorrow
If you are running an AI agent platform and you have not done this work yet, three things.
First, the architecture decision matters more than the tool choice. I picked Bitwarden Secrets Manager after a multi-day research phase that compared it against Vault, AWS Secrets Manager, 1Password, and LastPass. The choice was right for me, but not because Bitwarden is meaningfully better than the others on a feature comparison. It was right because it integrated cleanly with the platform's secret-resolution mechanism, the pricing made the migration affordable for a small operation, and the data-sovereignty fit matched my situation. Almost any of the alternatives would have worked. What mattered was the bright line between Class A and Class B secrets, and that decision is tool-independent.
Second, write the leak ledger before you have any leaks. Every incident I had during this work informed at least one rule. By the time I had three incidents, I had a clear rule set and I had not actually compromised anything operationally: the values leaked in the first two incidents were rotated within hours and the originals revoked, and the third is scheduled for rotation in the May cycle. The ledger is the document that turns "we had three near-misses" into "we have three rules and a system that gets stronger over time." Without the ledger, near-misses are forgotten and the rules never accrete.
Third, write the runbook for the case where everything is broken. The skill is the happy path. The runbook is the unhappy path. If your only credential-rotation procedure lives inside an AI agent, what happens when the agent is the thing that is broken? I now have step-by-step instructions for rotating any of the 71 secrets manually, recovering from a vault outage, onboarding a new service identity, and restoring my own AI agent connector when its API key changes. None of those procedures requires the AI agent to be running. Some of them are needed precisely because the AI agent is not running.
The Work No One Notices
There is a particular satisfaction in finishing a piece of work that nobody will notice if it goes well. No one is going to send me a screenshot of the rotation skill firing on the first of May. No one is going to congratulate me on a clean audit review report. The reward for this work is the absence of the future incident I would otherwise have had.
But the architecture document, the runbook, the skills, and the manifest are real. They are version-controlled. They will be read again. The next time someone, me or a fresh AI session or a future hire, needs to add a secret, rotate a key, or recover from a vault outage, they will not have to figure it out from first principles. That is the win.
Six weeks ago I had 268 plaintext secrets and no rotation cadence. Today I have 71 secrets in a vault, two skills that automate the whole lifecycle, a runbook for everything else, and three rules I learned the hard way. Every minute was worth it.
Even the leaks. Especially the leaks.
Mark Smith is Principal AI Strategist at Cloverbase. To discuss this article or work with me, contact me at Cloverbase.