april 2026 personal 8 min read

i promise it's not me

This is a story about what happens when platform trust-and-safety systems can't tell the difference between an AI agent and an attacker — and what happens when the only appeals path requires an employer.

On April 1, 2026, my GitHub account was flagged as malicious by an automated trust-and-safety system. I appealed. The appeal was denied. Two days later, after my employer escalated through enterprise support, my account was added to an allow-list.

I still don't know exactly why I was banned. Telling an AI agent apart from an attacker is not a solved problem, and today I am an anomaly.

This was the verdict on the appeal:

"For this reason, I'm afraid we will not be removing the restrictions from this account."

The reason given: my account "appears to have been taking part in activity which goes against our Terms of Service" — specifically the clause about using GitHub Actions for "3rd party websites, incentivized activities, or general computing purposes."

Translated out of policy language: GitHub's automated trust-and-safety system thought it caught a cryptominer.

It caught me in the middle of a software-supply-chain fire, deploying agentic infrastructure for — ironically — the open-source identity layer that exists to make exactly this kind of misclassification stop happening.

This is what it looked like from the inside.

the day, factually

April 1, 2026. I did my morning check of agentic runs and my PRs were returning 404 from the /issues page. Latency on git push looked off. I assumed GitHub was having a day. I have had GitHub-having-a-day many times. I went to bed.

April 2. The 404s weren't going away. I checked my account: blocked. Marked malicious.

I filed a reinstatement request. I told them upfront who I was — a developer at a software-supply-chain security company, a contributor to github.com/agentic-research — and asked a diagnostic question: had I tripped something the day before when I rotated my SSH key and cleared old secrets? The reply came back the same day: my activity violated the Terms of Service; the restrictions would stand. Then the ticket was archived. There was no comment box anymore. To say anything else, I'd have to file a new ticket against the same closed loop. When I tried, the support system started returning rate-limit errors. I laughed — a 10-year account, and no way to complain. The appeals layer was, at that point, rate-limiting the appellant.

That is when I asked my manager for help. The first useful motion came informally — the kind of help that exists when people at one company know people at another. The formal enterprise-support escalation followed.

April 3. My account was added to an allow-list.

I was not unbanned. I was not exonerated. The detection still stands. I am a thankful exception to it, not a refutation of it. As far as the classifier is concerned, I am still doing whatever it thought I was doing — I just have someone who can vouch for me.

For most developers in my position, that someone does not exist. It was scary.

what I was actually doing

I was building notme and signet — an open-source identity layer for AI agents. Signet started in September 2025 — a self-sovereign cert layer for humans and machines. Notme came later, prototyped the weekend after the Trivy compromise on March 19. By April 1 the OIDC-bridge authority was running on auth.notme.bot. Two days later the ban hit.

The thesis, lifted from the homepage:

▎ agents aren't users — they're machines. they need machine identity, not hacked human identity.

To build it, I was running rosary — my own agent orchestrator — across a public org, agentic-research. In the seven days before the block:

235 workflow runs

10 repositories

1 github identity

In March alone, my user made roughly 1,300 commits.

Some of it was long compute — vulnerability-database parity tests on a project called venturi. Some of it was credential-exchange surface area — workflows in the notme repo doing OIDC → octo-sts → ephemeral-cert exchanges, exactly the pattern the project is meant to prove out. Some of it was just retries on builds that were broken because I was iterating fast.

All of it ran under one GitHub identity: jamestexas. That is how GitHub identity works today. One human. All the work. One commit author. One actor.

Rosary had no per-agent identity because there is no commodity primitive in 2026 to give it one. That is the gap signet exists to fill.

why the classifier wasn't wrong, exactly

I don't have visibility into what GitHub's detection actually saw — what follows is reconstruction.

The two weeks before I got flagged, the supply chain was on fire.

Feb 21–Mar 2: hackerbot-claw, itself an autonomous AI agent powered by Claude Opus 4.5, scanned public repos for exploitable pull_request_target workflows. It hit Microsoft, DataDog, CNCF projects, and Trivy, stealing a PAT from the latter. Aqua's incomplete rotation of that PAT is what TeamPCP would ride 17 days later.

Mar 19: Aqua Security's Trivy scanner was compromised. 75 of 76 release tags in trivy-action were force-pushed to malicious commits, plus 7 setup-trivy tags. One stolen PAT moving laterally across an org.

Mar 31: The axios npm maintainer was compromised. A trojanized version was live 00:21–03:25 UTC, long enough that fresh `npm install -g @anthropic-ai/claude-code` runs during the window pulled the bad axios transitively.

Mar 31: Anthropic accidentally shipped a 60 MB source map of Claude Code (513,000 lines of TypeScript) in its npm tarball.

Apr 1: All three were public. The classifier that flagged me that day was shaped by exactly those silhouettes.

I am not arguing GitHub's detection should have been less paranoid that week. The week was, in fact, a paranoid week to be running an agent orchestrator whose behavior resembled compromised PAT activity.

The signals it lit on were real.

long-running compute jobs that retry after failing

The Grype-DB Parity Test in venturi (one of my private repos) ran for up to 78 minutes: a vulnerability-database sync (NVD/OSV/distro), a candidate DB build, and a Dice-coefficient comparison against Anchore's upstream Grype-DB. That last step kept failing with `Parity score 0% is below threshold 95.0%` and exiting 1. So the run came back. And came back. And came back. To a classifier operating on behavioral silhouettes, that profile is indistinguishable from a miner: long, repetitive, GPU-or-CPU-bound, fails-and-retries. It was an awful design: my harness didn't catch the failure and just… looped on it. A valuable lesson.
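A minimal sketch of that parity gate and of a retry policy that would not have looped, assuming the comparison is a Sørensen-Dice coefficient over sets of vulnerability IDs. The function names, the sample IDs, and the stop-on-repeated-score rule are hypothetical illustrations, not venturi's actual code.

```python
def dice_coefficient(candidate: set[str], upstream: set[str]) -> float:
    """Sørensen-Dice similarity: 2|A∩B| / (|A| + |B|), a value in [0, 1]."""
    if not candidate and not upstream:
        return 1.0  # two empty sets are treated as identical
    return 2 * len(candidate & upstream) / (len(candidate) + len(upstream))


def parity_gate(candidate: set[str], upstream: set[str], threshold: float = 0.95):
    """Return (score, passed). The 0.95 default mirrors the 95.0% threshold."""
    score = dice_coefficient(candidate, upstream)
    return score, score >= threshold


def run_with_bounded_retries(build, upstream: set[str], max_attempts: int = 3):
    """Retry a build, but stop early when the score repeats.

    An identical score on consecutive attempts suggests a deterministic
    failure, which another 78-minute run will not fix.
    Returns (attempts_used, last_score, passed).
    """
    previous = None
    for attempt in range(1, max_attempts + 1):
        score, passed = parity_gate(build(), upstream)
        if passed:
            return attempt, score, True
        if score == previous:
            return attempt, score, False  # same failure twice: give up
        previous = score
    return max_attempts, previous, False


# Hypothetical data: a candidate build that always comes back empty.
upstream_ids = {"CVE-2024-0001", "CVE-2024-0002"}
result = run_with_bounded_retries(lambda: set(), upstream_ids)
```

With an always-empty candidate the score is 0% on every attempt, so the loop stops after the second run instead of coming back indefinitely.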

`curl | jq | base64 -w0 | ::add-mask::` pipelines

The gha-identity.yml reusable workflow does an OIDC token exchange, hits an external authority (auth.notme.bot), receives a short-lived cert, base64-encodes it for transport, masks it from logs, and hands it to the next job. It is a credential exchange. It is also the exact shape of a stager pipeline: fetch from external endpoint, base64, mask, run. The Trivy attackers' payload was structurally similar — credential dumps, AES-encrypted, exfiltrated to attacker C2. Different intent; same silhouette.
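The steps in that pipeline can be sketched as follows. This is an illustration under stated assumptions, not the contents of gha-identity.yml: the two `ACTIONS_ID_TOKEN_REQUEST_*` environment variables and the `::add-mask::` workflow command are standard GitHub Actions features, while `exchange_for_cert`, its `exchange_url` parameter, and the `cert` response field are hypothetical stand-ins for whatever the authority actually exposes.

```python
import base64
import json
import os
import urllib.parse
import urllib.request


def fetch_actions_oidc_token(audience: str) -> str:
    """Ask the Actions runtime for this run's OIDC token.

    Only works inside a job that has the `id-token: write` permission.
    """
    url = os.environ["ACTIONS_ID_TOKEN_REQUEST_URL"]
    bearer = os.environ["ACTIONS_ID_TOKEN_REQUEST_TOKEN"]
    request = urllib.request.Request(
        f"{url}&audience={urllib.parse.quote(audience)}",
        headers={"Authorization": f"bearer {bearer}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["value"]


def exchange_for_cert(exchange_url: str, oidc_token: str) -> str:
    """POST the token to an external authority (hypothetical API shape)."""
    request = urllib.request.Request(
        exchange_url,
        data=b"",
        headers={"Authorization": f"Bearer {oidc_token}"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["cert"]  # assumed field name


def encode_for_transport(cert_pem: str) -> str:
    """Single-line base64, the equivalent of `base64 -w0`."""
    return base64.b64encode(cert_pem.encode()).decode()


def mask_command(value: str) -> str:
    """Workflow command telling the runner to redact `value` from logs."""
    return f"::add-mask::{value}"
```

In a workflow step, the result of `mask_command(encoded)` would be printed to stdout before the encoded value is written to a job output. Read top to bottom, nothing in the code marks it as a credential exchange rather than a stager: it fetches from an external endpoint, encodes the result, and hides it from the logs.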

cross-repo cardinality from a single actor

Ten repos, 235 runs, single actor, public org with crypto-adjacent naming (notme, signet, notme.bot, ley-line-open). That is exactly the shape a cross-actor aggregator lights up on. It is also indistinguishable from a compromised PAT moving laterally across an org. The Trivy attack used exactly that pattern — one stolen PAT, then a bridging service account, then access to a half-dozen related repos, then npm, then Docker Hub. One actor; many surfaces.
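A rough sketch of what an actor-level aggregator could look like, to make the shape concrete. GitHub's real detection logic is not public; the `Run` record, the thresholds, and the repo names here are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    actor: str  # the account the workflow run is attributed to
    repo: str   # the repository the run executed in


def actor_profile(runs):
    """Map each actor to (total runs, distinct repositories touched)."""
    counts = defaultdict(int)
    repos = defaultdict(set)
    for run in runs:
        counts[run.actor] += 1
        repos[run.actor].add(run.repo)
    return {actor: (counts[actor], len(repos[actor])) for actor in counts}


def flagged(profile, max_runs=200, max_repos=8):
    """Actors exceeding both (made-up) thresholds in the window."""
    return sorted(
        actor
        for actor, (run_count, repo_count) in profile.items()
        if run_count > max_runs and repo_count > max_repos
    )


# The seven-day window from this post: 235 runs, 10 repos, one actor.
window = [Run("jamestexas", f"agentic-research/repo-{i % 10}") for i in range(235)]
profile = actor_profile(window)
suspects = flagged(profile)
```

The aggregator only sees the `actor` key. An orchestrator fanning out across an org and a stolen PAT moving laterally produce the same tuple.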

The signals were real. The cause was different. The classifier had no way to tell.

the thesis

This is the bug notme.bot exists to fix. The incident made the gap concrete.

▎ every commit, every API call, every push your AI agent makes carries your identity. no separation. no scope. no revocation.

If rosary had been issuing per-agent ephemeral identities — which is what signet does — the actor-aggregate signal would have had nothing to aggregate. Not because there were fewer agents, but because each one would have presented a distinct, scoped, short-TTL workload identity. The classifier would have seen what it actually was: many machines, doing different work, none of them me.
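To illustrate the difference, here is a minimal sketch of per-agent, scoped, short-TTL identities. It is not signet's data model or API: the `WorkloadIdentity` fields, the subject format, and the 15-minute TTL are assumptions chosen for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class WorkloadIdentity:
    subject: str          # one per agent and repo (hypothetical format)
    on_behalf_of: str     # the delegating human, kept as a separate field
    scope: str            # the single repository this identity may touch
    expires_at: datetime  # short TTL, so a leaked identity ages out quickly

    def is_valid(self, now: datetime) -> bool:
        return now < self.expires_at


def issue(agent: str, human: str, repo: str, now: datetime,
          ttl_minutes: int = 15) -> WorkloadIdentity:
    """Mint a scoped identity for one agent working in one repository."""
    return WorkloadIdentity(
        subject=f"agent:{agent}@{repo}",
        on_behalf_of=human,
        scope=repo,
        expires_at=now + timedelta(minutes=ttl_minutes),
    )


now = datetime(2026, 4, 1, tzinfo=timezone.utc)
identities = [
    issue("builder", "jamestexas", f"agentic-research/repo-{i}", now)
    for i in range(10)
]
distinct_subjects = {identity.subject for identity in identities}
```

Ten repositories now yield ten subjects, each confined to one scope and each expiring on its own. The human stays recoverable through `on_behalf_of`, so accountability is preserved without collapsing every action into one actor.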

This isn't a corner case. OpenClaw — one of the most-deployed personal AI agents in the wild — runs as you, every action it takes carrying your identity, because there is no other identity available. Its skills marketplace has been under sustained attack for months — thousands of malicious skills, data exfiltration baked into innocuous-looking helpers, full-agent-takeover vulnerabilities in the gateway. When any of that fires, the human is the actor of record. The pattern is everywhere now.

The platforms gating engineering work today don't have a good way to ingest that signal even when it exists. Workload OIDC is already a primitive — GitHub Actions issues OIDC tokens to every workflow run — but the trust-and-safety layer doesn't model "this is a delegated machine identity acting on behalf of a human" as a separate category from "this is the human." Everything funnels back to one actor account.

That assumption is going to keep producing false positives shaped exactly like mine. Probably more of them, faster, as agentic AI builds out. The same week the Trivy attackers were using one stolen PAT to thread through five orgs and 47 npm packages, I was using one personal GitHub identity to coordinate ten repos of an open-source identity-layer project. From a behavior-shape distance, those two profiles overlap.

signet ships the cert layer; APAS adds the attestation chain. both open source.


what needs to change

I know these systems aren't free. Distinguishing actor identity from delegated identity increases false-negative risk during incidents. Attestation trust chains require platforms to ingest signals from third-party authorities. Appeals paths that scale to AI builders mean human reviewers, paid by someone.

I'm also building part of what I'm advocating for, and I want this approach to land.

Three things:

1. distinguish actor identities from delegated machine identities

Workload OIDC carries audience claims, ephemeral expiry, and scoped permissions. None of that information appears to make it to the classifier. It should.
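For a sense of what that information looks like, the sketch below reads the payload of a JWT shaped like a GitHub Actions OIDC token. The claim names (`iss`, `sub`, `aud`, `exp`, `actor`, `repository`) are ones GitHub documents; the sample values are made up, the token is unsigned, and `delegation_summary` is a hypothetical helper. The decoder skips signature verification, so it is suitable for inspection only.

```python
import base64
import json


def b64url_decode(segment: str) -> bytes:
    """Decode a base64url segment, restoring the padding JWTs strip."""
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)


def unverified_claims(jwt: str) -> dict:
    """Return the payload claims WITHOUT verifying the signature."""
    _header, payload, _signature = jwt.split(".")
    return json.loads(b64url_decode(payload))


def delegation_summary(claims: dict) -> dict:
    """Pull out the fields that separate the workload from the human."""
    return {
        "workload": claims.get("sub"),
        "audience": claims.get("aud"),
        "expires_at": claims.get("exp"),
        "human_actor": claims.get("actor"),
    }


def _b64url(data: dict) -> str:
    raw = json.dumps(data).encode()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


# Made-up, unsigned sample token for illustration.
sample_claims = {
    "iss": "https://token.actions.githubusercontent.com",
    "sub": "repo:agentic-research/notme:ref:refs/heads/main",
    "aud": "auth.notme.bot",
    "exp": 1775001600,
    "actor": "jamestexas",
    "repository": "agentic-research/notme",
}
sample_token = ".".join([_b64url({"alg": "none"}), _b64url(sample_claims), ""])
summary = delegation_summary(unverified_claims(sample_token))
```

The `sub` claim names the workload and `actor` names the human, so the token already carries the distinction as two separate fields.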

2. treat agentic compute as a category worth modeling

Today it's a residual bucket of "automation", indistinguishable from cron jobs and Dependabot. As AI-coding tooling ships, that bucket will dominate CI volume. Heuristics that can't distinguish "agent-driven build iteration" from "cryptominer iteration" will keep overshooting.

3. an appeals path that knows agentic AI work exists

I had a manager. The manager had an enterprise-support contact. The contact had the authority to allow-list. If any of those three links had been missing, I would still be banned today. The default outcome for an agentic-AI builder who doesn't have a manager with a path to enterprise support is the closed ticket.

I am not asking the classifier to be wrong less often. I am asking the appeals layer to be right more often.

"I promise it's not me" is what I had to say, because it was all I had to say. The primitive I'm building is the one that lets a machine prove it instead.

This incident is what its absence costs.

— notme.bot

references

hackerbot-claw — StepSecurity, Orca, Cybernews
Trivy / TeamPCP — Wiz, Microsoft, The Hacker News
axios npm compromise — Microsoft (Sapphire Sleet attribution), Wiz, Snyk, axios#10636 post-mortem
Claude Code source leak — InfoQ, The Hacker News, Alex Kim's writeup
OpenClaw security research — Cisco, Bitsight, Oasis (ClawJacked), 1Password, VentureBeat
me
