How Aegis Works: Privacy-First PII Redaction with Go WASM and Chrome Extensions
By Avik Mukherjee | Apr 12, 2026 · 22 min read · Updated Apr 12, 2026
I use ChatGPT, Claude, and Gemini daily. And every time I paste a log file, an error message, or a config snippet, I wonder — did I just send an API key, an email, an IP address to a third-party server?
The "solution" is to be careful. But being careful doesn't scale. You will forget. Everyone does.
So I built Aegis — a browser extension that intercepts what you type into AI chatbots and replaces sensitive data with tokens before it leaves your browser. alice@startup.io becomes [EMAIL_1]. 192.168.1.42 becomes [IP_ADDR_1]. The AI sees tokens. You see the truth.
Everything runs locally. No servers, no APIs, no telemetry. A Go WASM vault does the heavy lifting inside a Chrome offscreen document. The extension is built with WXT and React. The whole thing compiles to a single .wasm file.
This is the full breakdown of how it works, why each piece exists, and what I learned building it.
The source is not publicly available. It is an internal tool for my personal use. This article is a technical post-mortem of the architecture and implementation details. If you want to build something similar, I hope this serves as a guide and inspiration.
The Problem#
When you send a message to ChatGPT, here's what happens:
- You type text into a textarea or contenteditable div
- You press Enter
- The site's JavaScript reads the input, constructs a request body, and sends it via fetch or XMLHttpRequest
- The request travels over HTTPS to OpenAI's servers
- The server processes it and returns a response
If your message contains sk-abc123def456ghi789 (an API key) or john@company.com or 10.0.0.5, that data is now on someone else's server. You can't unsend it.
The browser won't warn you. The AI site won't redact it. It's on you.
What Aegis Does#
Aegis inserts itself between step 2 and step 3:
- You type text into the AI input
- You press Enter
- Aegis intercepts the Enter keypress before the site's handler runs
- Aegis sends the text to a Go WASM vault that finds and replaces all PII
- Aegis writes the redacted text back into the input
- Aegis clicks the submit button
- The site's JavaScript reads the redacted input and sends it
The AI receives [EMAIL_1] instead of the real email. The mapping is stored locally. When the AI responds with [EMAIL_1] in its output, Aegis can hydrate those tokens back to the original values.
No data leaves your machine. The WASM vault runs inside your browser.
The Architecture#
Here's the high-level picture:
Three contexts, one vault.
Content Script: Runs on ChatGPT, Claude, Gemini, etc. Renders the floating shield button, intercepts input, communicates with the vault.
Offscreen Document: A hidden HTML page that exists solely to host the Go WASM instance. Content scripts and popups each run in their own isolated context, so a WASM instance loaded in one of them can't be shared with the others. A dedicated offscreen page gives the vault a single home. All vault operations happen here.
Options Page: The dashboard. Shows analytics (how many emails redacted, how many IPs), the token map (which token maps to which value), custom rules, and filter toggles. Reads from the same vault via the same message bus.
Popup: Quick on/off toggle and basic settings.
The key insight: one WASM instance, shared across all contexts. If the content script redacts an email on ChatGPT, the dashboard sees the updated analytics immediately. They're talking to the same Go process.
Part 1: The Go WASM Vault#
The core of Aegis is a Go program compiled to WebAssembly. It has two jobs:
- Find PII in text using regex patterns
- Replace matches with tokens, storing the mapping for later reversal
The Data Structure#
The vault is a struct with three key pieces of state:
- storage: A map from token to real value ([EMAIL_1] → alice@startup.io)
- lookup: A map from real value to token (reverse of storage, for deduplication)
- counters: Per-category counters tracking how many of each type have been redacted
There's also a mutex. WASM is single-threaded, but Go's scheduler can still interleave goroutines, so access to the maps and counters is locked.
Tokenization#
When the vault encounters a sensitive value it hasn't seen before, it mints a token:
func tokenize(value, prefix, counter):
if value already in lookup:
return existing token
increment counter
token = "[prefix + counter]"
store token → value
store value → token
return token
So the first email becomes [EMAIL_1], the second [EMAIL_2], and so on. If the same email appears twice, it gets the same token both times — deduplication via the lookup map.
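A minimal Go sketch of that minting logic, assuming a stripped-down vault that holds only the three maps described above. The Vault type, NewVault, and the idea of keying counters by prefix are illustrative choices for this example, not the actual Aegis source:

```go
package main

import "fmt"

// Vault is a hypothetical, simplified version of the state described above.
type Vault struct {
	storage  map[string]string // token -> real value
	lookup   map[string]string // real value -> token (for deduplication)
	counters map[string]int    // prefix -> number of tokens minted so far
}

func NewVault() *Vault {
	return &Vault{
		storage:  map[string]string{},
		lookup:   map[string]string{},
		counters: map[string]int{},
	}
}

// tokenize returns the existing token for value, or mints a new one.
func (v *Vault) tokenize(value, prefix string) string {
	if tok, ok := v.lookup[value]; ok {
		return tok // already seen: reuse the token
	}
	v.counters[prefix]++
	tok := fmt.Sprintf("[%s%d]", prefix, v.counters[prefix])
	v.storage[tok] = value
	v.lookup[value] = tok
	return tok
}

func main() {
	v := NewVault()
	fmt.Println(v.tokenize("alice@startup.io", "EMAIL_")) // [EMAIL_1]
	fmt.Println(v.tokenize("bob@startup.io", "EMAIL_"))   // [EMAIL_2]
	fmt.Println(v.tokenize("alice@startup.io", "EMAIL_")) // [EMAIL_1] again
}
```

The counter only advances when a genuinely new value shows up, so token numbers stay dense even when the same email is pasted many times.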
The Protect Function#
This is the main entry point. It takes a string and returns a redacted string:
func Protect(input):
lock mutex
result = input
// Apply custom rules first
for each custom rule:
if rule.pattern exists in result:
replace all occurrences with rule.replacement
// Contact category (if enabled)
if config.contact:
result = replaceAll(result, emailRegex, "EMAIL_", emailCounter)
result = replaceAll(result, phoneRegex, "PHONE_", phoneCounter)
result = replaceAll(result, moneyRegex, "MONEY_", moneyCounter)
// Technical category (if enabled)
if config.technical:
result = replaceAll(result, ipRegex, "IP_ADDR_", ipCounter)
result = replaceAll(result, pathRegex, "PATH_", pathCounter) // only paths > 5 chars
result = replaceAll(result, secretRegex, "SECRET_", secretCounter)
// Identity category (if enabled)
if config.identity:
result = replaceAll(result, dateRegex, "DATE_", dateCounter)
unlock mutex
return result
The categories are configurable — you can turn off contact redaction, technical redaction, or date redaction independently from the dashboard.
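To make the flow concrete, here is a small runnable sketch of Protect in Go. It covers only two of the patterns (email and IP, copied from the regex list in this article) and two of the category flags. Config, Vault, NewVault, and the redact helper are names I'm assuming for the example, so treat it as an outline of the technique and not the real implementation:

```go
package main

import (
	"fmt"
	"regexp"
	"sync"
)

// Patterns copied from the regex list in this article.
var (
	emailRe = regexp.MustCompile(`(?i)[a-z0-9._%+\-]+@[a-z0-9.\-]+\.[a-z]{2,}`)
	ipRe    = regexp.MustCompile(`\b(?:\d{1,3}\.){3}\d{1,3}\b`)
)

// Config and Vault are hypothetical, trimmed-down types for illustration.
type Config struct{ Contact, Technical bool }

type Vault struct {
	mu       sync.Mutex
	cfg      Config
	storage  map[string]string // token -> real value
	lookup   map[string]string // real value -> token
	counters map[string]int    // prefix -> count
}

func NewVault(cfg Config) *Vault {
	return &Vault{
		cfg:      cfg,
		storage:  map[string]string{},
		lookup:   map[string]string{},
		counters: map[string]int{},
	}
}

// tokenize must be called with v.mu held.
func (v *Vault) tokenize(value, prefix string) string {
	if tok, ok := v.lookup[value]; ok {
		return tok
	}
	v.counters[prefix]++
	tok := fmt.Sprintf("[%s%d]", prefix, v.counters[prefix])
	v.storage[tok] = value
	v.lookup[value] = tok
	return tok
}

// redact swaps every match of re for its token.
func (v *Vault) redact(text string, re *regexp.Regexp, prefix string) string {
	return re.ReplaceAllStringFunc(text, func(m string) string {
		return v.tokenize(m, prefix)
	})
}

// Protect applies each enabled category in a fixed order.
func (v *Vault) Protect(input string) string {
	v.mu.Lock()
	defer v.mu.Unlock()
	result := input
	if v.cfg.Contact {
		result = v.redact(result, emailRe, "EMAIL_")
	}
	if v.cfg.Technical {
		result = v.redact(result, ipRe, "IP_ADDR_")
	}
	return result
}

func main() {
	v := NewVault(Config{Contact: true, Technical: true})
	fmt.Println(v.Protect("alice@startup.io logged in from 192.168.1.42"))
	// [EMAIL_1] logged in from [IP_ADDR_1]
}
```

Using defer for the unlock means the mutex is released even if a later step panics, which the lock/unlock pair in the pseudocode would not guarantee.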
The Regex Patterns#
Each PII category has a compiled regex. These are the actual patterns:
- Email: (?i)[a-z0-9._%+\-]+@[a-z0-9.\-]+\.[a-z]{2,}
- Phone: (\+\d{1,3}[.\-\s]?)?\(?\d{3}\)?[.\-\s]?\d{3}[.\-\s]?\d{4}
- IP: \b(?:\d{1,3}\.){3}\d{1,3}\b
- Secret: \b(?:sk|pk|api|key|token|secret|aws|gcp)[-_]?[a-zA-Z0-9_]{20,}\b
- Path: (/[a-zA-Z0-9._\-]+)+/? (filtered to only matches longer than 5 characters)
- Money: (?i)[$€£¥]\s?\d[\d,]*\.?\d*(?:\s?(?:million|billion|mil|k|m|b))?\b plus the inverse \b\d[\d,]*\.?\d*\s?(?:dollars?|euros?|pounds?|usd|eur|gbp)\b
- Date: Covers ISO dates (2026-05-30), relative dates (next Friday, yesterday), and month-day-year formats (April 15, 2026)
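The length filter on paths is the one rule here that isn't expressed in the regex itself. A sketch of how it could be applied in Go, using the path pattern above; redactPaths and its plain counter are hypothetical simplifications (no deduplication, no shared vault state):

```go
package main

import (
	"fmt"
	"regexp"
)

// Path pattern copied from the list above.
var pathRe = regexp.MustCompile(`(/[a-zA-Z0-9._\-]+)+/?`)

// redactPaths tokenizes path-like matches longer than minLen characters
// and leaves shorter matches untouched.
func redactPaths(text string, minLen int) string {
	n := 0
	return pathRe.ReplaceAllStringFunc(text, func(m string) string {
		if len(m) <= minLen {
			return m // too short to be a real path, e.g. the "/b" in "a/b"
		}
		n++
		return fmt.Sprintf("[PATH_%d]", n)
	})
}

func main() {
	fmt.Println(redactPaths("see /etc/nginx/nginx.conf and a/b", 5))
	// see [PATH_1] and a/b
}
```

Without the filter, everyday text such as "and/or" or "3/4" would match the pattern and get tokenized.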
The Underscore Bug#
This one took me embarrassingly long to catch.
The original secret regex was \b(?:sk|pk|api|key|token|secret|aws|gcp)[-_]?[a-zA-Z0-9]{20,}\b. Look at the character class: [a-zA-Z0-9]. No underscore.
A test input like pk_live_xYz123AbC456DeF789GhI012 would fail. Here's why: after matching the prefix pk_, the engine starts consuming the token body at live. It reads l, i, v, e — then hits _. Underscore isn't in [a-zA-Z0-9], so the match stops at 4 characters. The regex requires 20+. Match fails. The API key goes through unredacted.
The fix was one character: [a-zA-Z0-9_]. But the symptom was invisible — nothing crashed, nothing warned. The key just... didn't get tokenized. I only noticed when the dashboard's "Secrets / keys" counter stayed at zero after pasting a Stripe key.
This is the trap with regex-based detection. The pattern either matches or it doesn't, and there's no middle ground where it says "I'm not sure about this one." Silent failure is the default.
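Because the failure is silent, the cheapest defence is a regression check that runs both patterns against known inputs. A small Go sketch, using the original and fixed patterns quoted above; the second test key is made up for the example:

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// Original pattern: no underscore in the body character class.
	oldSecretRe = regexp.MustCompile(`\b(?:sk|pk|api|key|token|secret|aws|gcp)[-_]?[a-zA-Z0-9]{20,}\b`)
	// Fixed pattern: underscore allowed in the body.
	newSecretRe = regexp.MustCompile(`\b(?:sk|pk|api|key|token|secret|aws|gcp)[-_]?[a-zA-Z0-9_]{20,}\b`)
)

func main() {
	cases := []string{
		"pk_live_xYz123AbC456DeF789GhI012", // underscores inside the body
		"sk-abc123def456ghi789jkl012",      // made-up key, no inner underscore
	}
	for _, c := range cases {
		fmt.Printf("%-34s old=%v new=%v\n",
			c, oldSecretRe.MatchString(c), newSecretRe.MatchString(c))
	}
	// pk_live_... : old=false new=true
	// sk-abc...   : old=true  new=true
}
```

A table like this is easy to grow: every key format that slips through in practice becomes one more line in cases.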
The Reveal Function#
Reversal is straightforward — iterate over the token map and replace each token with its original value:
func Reveal(input):
lock mutex
result = input
for each (token, realValue) in storage:
result = replaceAll(result, token, realValue)
unlock mutex
return result
This lets you take AI output containing [EMAIL_1] [PHONE_2] and recover the original values.
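In Go this is a loop over the map with strings.ReplaceAll. The sketch below assumes a vault reduced to just the storage map (the Vault type here is illustrative):

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// Vault is a hypothetical minimal type holding only what Reveal needs.
type Vault struct {
	mu      sync.Mutex
	storage map[string]string // token -> real value
}

// Reveal replaces every known token in input with its original value.
func (v *Vault) Reveal(input string) string {
	v.mu.Lock()
	defer v.mu.Unlock()
	result := input
	for token, value := range v.storage {
		result = strings.ReplaceAll(result, token, value)
	}
	return result
}

func main() {
	v := &Vault{storage: map[string]string{
		"[EMAIL_1]":  "alice@startup.io",
		"[EMAIL_10]": "zed@startup.io",
	}}
	fmt.Println(v.Reveal("cc [EMAIL_1] and [EMAIL_10]"))
	// cc alice@startup.io and zed@startup.io
}
```

Go's map iteration order is random, but the closing bracket keeps that harmless here: [EMAIL_1] is not a substring of [EMAIL_10], so one token can never be partially replaced by another.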
Part 2: Compiling Go to WASM#
This is where it gets interesting. Go has first-class WASM support via GOOS=js GOARCH=wasm.
The Build Command#
GOOS=js GOARCH=wasm go build -o aegis.wasm .
This cross-compiles Go to a WebAssembly binary targeting the js runtime. The output is a .wasm file that expects a JavaScript environment, specifically the glue code provided by wasm_exec.js (shipped with every Go installation).
The WASM Bridge#
Go needs to expose its functions to JavaScript. The bridge file (wasm.go) is guarded by a build tag:
//go:build js && wasm
This means it only compiles when targeting WASM. The desktop entry point (main.go) has the inverse tag:
//go:build !js || !wasm
In the WASM build, the bridge does this:
func main():
vault = NewAegis()
// Expose functions on the global window object
js.Global().Set("__aegisVault", {
"protect": func(args) → vault.Protect(args[0].String()),
"reveal": func(args) → vault.Reveal(args[0].String()),
"configure": func(args) → vault.Configure(args[0].Bool(), args[1].Bool(), args[2].Bool()),
"getAnalytics": func(args) → JSON.stringify(vault.GetAnalytics()),
"getIdentityMap": func(args) → JSON.stringify(vault.GetIdentityMap()),
"getRules": func(args) → JSON.stringify(vault.GetRules()),
"addRule": func(args) → vault.AddRule(args[0].String(), args[1].String()),
"removeRule": func(args) → vault.RemoveRule(args[0].String()),
"clearVault": func(args) → vault.ClearVault(),
})
// Signal readiness
js.Global().Set("__aegisVaultReady", resolvedPromise)
// Keep the Go process alive forever
select {} // blocks indefinitely
js.FuncOf wraps Go functions as callable JavaScript functions. js.Value.String() and js.Value.Bool() unwrap JS values into Go types. Return values go the other direction — Go strings become JS strings.
The select {} at the end is critical. Without it, main() returns and the Go runtime exits, destroying the WASM instance. The empty select blocks forever, keeping the Go scheduler alive and responsive to future JavaScript calls.
JSON Serialization Gotcha#
Go's json.Marshal uses field names as-is. So this:
type Analytics struct {
Total int
Email int
IPAddr int
}
This serializes to {"Total":0,"Email":0,"IPAddr":0}, with PascalCase keys. But the JavaScript side expects camelCase: {"total":0,"email":0,"ipAddr":0}.
Without json tags, every field on the JS side is undefined because the keys don't match. The fix is explicit tags:
type Analytics struct {
Total int `json:"total"`
Email int `json:"email"`
IPAddr int `json:"ipAddr"`
}
This was an actual bug I shipped. The dashboard showed zeros everywhere because Go was sending {"Total":5} and JavaScript was reading data.total (undefined).
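The difference is easy to see by marshaling both versions side by side. In this sketch the untagged and tagged type names and the mustJSON helper are made up for the comparison:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// untagged mirrors the struct without json tags.
type untagged struct {
	Total  int
	Email  int
	IPAddr int
}

// tagged mirrors the struct with explicit camelCase tags.
type tagged struct {
	Total  int `json:"total"`
	Email  int `json:"email"`
	IPAddr int `json:"ipAddr"`
}

// mustJSON marshals v and panics on error (fine for a demo).
func mustJSON(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	fmt.Println(mustJSON(untagged{Total: 5, Email: 3, IPAddr: 2}))
	// {"Total":5,"Email":3,"IPAddr":2}
	fmt.Println(mustJSON(tagged{Total: 5, Email: 3, IPAddr: 2}))
	// {"total":5,"email":3,"ipAddr":2}
}
```

A check like this on the Go side would have caught the mismatch before it reached the dashboard.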
Loading WASM in the Browser#
On the JavaScript side, loading looks like this:
// 1. Load Go's runtime glue
inject <script src="wasm_exec.js"> into the page
wait for it to load
// 2. Create a Go instance and fetch the WASM binary
go = new Go()
response = fetch("aegis.wasm")
// 3. Compile and instantiate
result = await WebAssembly.instantiateStreaming(response, go.importObject)
// 4. Run — executes the Go main(), which sets window.__aegisVault
go.run(result.instance)
// 5. Wait for readiness signal
await window.__aegisVaultReady
// 6. Vault is ready
window.__aegisVault.protect("my email is test@example.com")
// → "my email is [EMAIL_1]"
go.importObject provides the JS↔WASM bridge — things like console.log support, random number generation, and the syscall interface that Go expects. go.run() starts the Go scheduler inside WASM. From that point on, Go code runs in the browser, and JavaScript can call the exposed functions.
Part 3: The Offscreen Document Pattern#
This was the hardest architectural problem. Here's why it exists.
The Problem with Multiple WASM Instances#
Chrome extensions have multiple contexts:
- Content scripts — Run in the page context (ChatGPT, Claude, etc.)
- Extension pages — The popup and options page
- Service worker — The background script
Each context is isolated. If the content script loads WASM, it gets its own instance. If the options page loads WASM, it gets another instance. Two separate Go processes, two separate token maps, two separate counter states.
Result: you redact text on ChatGPT, but the dashboard shows zero redactions. The data is in the wrong WASM instance.
The Solution: Chrome Offscreen Documents#
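Chrome MV3 introduced offscreen documents: hidden extension pages with a full DOM. Content scripts are scoped to a single page and service workers get shut down when idle, so neither is a good home for a single long-lived WASM instance. An offscreen document is. The pattern is: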
- Background service worker creates an offscreen document on startup
- Offscreen document loads the WASM vault and listens for messages
- Content script and options page call chrome.runtime.sendMessage() to communicate with the vault
- Offscreen document processes the message, calls the WASM function, and sends back the result
This means there's exactly one WASM instance. One token map. One set of counters. Everything is consistent.
The Background Script#
The service worker's only job is to ensure the offscreen document exists:
func backgroundMain():
// Check if offscreen document already exists
contexts = chrome.runtime.getContexts({
contextTypes: ["OFFSCREEN_DOCUMENT"],
documentUrls: [chrome.runtime.getURL("offscreen.html")]
})
if contexts.length > 0:
return // Already running
// Create it
chrome.offscreen.createDocument({
url: "offscreen.html",
reasons: ["WORKERS"],
justification: "Aegis Vault WASM runtime"
})
This runs once when the extension starts. If the offscreen document is already alive (e.g., after a page refresh), it skips creation.
The Offscreen Document#
A minimal HTML page that loads the WASM and becomes a message router:
// offscreen.js
init():
// Load Go runtime
inject wasm_exec.js script
// Instantiate WASM
go = new Go()
response = fetch("aegis.wasm")
result = await WebAssembly.instantiateStreaming(response, go.importObject)
go.run(result.instance)
await window.__aegisVaultReady
vault = window.__aegisVault
// Listen for messages from content script / options page
chrome.runtime.onMessage.addListener((msg, sender, sendResponse):
switch msg.type:
case "vault:protect":
data = vault.protect(msg.text)
case "vault:reveal":
data = vault.reveal(msg.text)
case "vault:configure":
vault.configure(msg.identity, msg.contact, msg.technical)
case "vault:getAnalytics":
data = JSON.parse(vault.getAnalytics())
case "vault:getIdentityMap":
data = JSON.parse(vault.getIdentityMap())
// ... other operations
sendResponse({ ok: true, data: data })
return true // keep the message channel open for async
)
The return true is critical. By default, Chrome closes the message channel as soon as the listener returns. Returning true tells Chrome "I'll call sendResponse later", which keeps the channel open for asynchronous operations.
The Message Bridge#
Both the content script and options page use a shared helper to send messages:
function sendVaultMessage(message):
return chrome.runtime.sendMessage(message).then(response:
if !response.ok:
throw Error(response.error)
return response.data
)
This abstracts away the message passing. From the consumer's perspective, it's just an async function call:
redactedText = await sendVaultMessage({ type: "vault:protect", text: input })
analytics = await sendVaultMessage({ type: "vault:getAnalytics" })
Part 4: Intercepting Input on AI Sites#
This is where the user experience lives. The content script needs to:
- Find the AI chat input element
- Intercept the Enter key before the site submits
- Redact the text
- Submit the redacted text
Finding the Input#
Different AI sites use different input elements. ChatGPT uses a textarea. Claude uses a contenteditable div. Gemini uses a custom rich-text editor. The detection tries multiple selectors in order:
function findAIInput():
selectors = [
"#prompt-textarea", // ChatGPT
"rich-textarea div[contenteditable='true']", // Claude
"div[role='textbox']", // Generic
"div[contenteditable='true']", // Fallback
"textarea" // Last resort
]
for each selector:
element = document.querySelector(selector)
if element:
return element
return null
This runs on an interval (every second) because AI sites dynamically render their UI. The input element might not exist when the page first loads.
The Protection Hook#
This is the most delicate part. When the user presses Enter:
- The site's JavaScript listens for Enter to submit the message
- Aegis also listens for Enter to redact the message
- If the site's handler runs first, the original text is sent before redaction happens
The solution: listen in the capture phase.
DOM events have two phases:
- Capture phase — Events travel from the document root down to the target element
- Bubble phase — Events travel from the target element up to the document root
Most event listeners (including AI sites' submit handlers) use the bubble phase. If Aegis listens in the capture phase, it runs first:
document.addEventListener("keydown", handler, { capture: true })
The handler:
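async function handleKeydownCapture(event):
if event.key != "Enter": return
if event.shiftKey: return // Shift+Enter = newline, don't intercept
if !event.isTrusted: return // ignore the synthetic Enter that Aegis dispatches in the fallback below
if !isProtected or !isReady: return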
// Block the event from reaching any other handler
event.preventDefault()
event.stopPropagation()
event.stopImmediatePropagation()
// Redact the text (async — calls the offscreen vault)
redactedText = await protect(currentInputText)
// Write the redacted text back into the input
setInputValue(targetElement, redactedText)
// Find and click the submit button (or re-dispatch Enter)
submitButton = findSubmitButton()
if submitButton:
submitButton.click()
else:
targetElement.dispatchEvent(new KeyboardEvent("keydown", { key: "Enter", bubbles: true }))
stopImmediatePropagation() is the key call. It prevents any other handler — including the AI site's submit logic — from seeing the Enter keypress. Aegis has full control.
After redaction, it clicks the submit button directly. This bypasses the keyboard event entirely and triggers the site's submission logic with the already-redacted text in the input.
Setting Native Values#
For <textarea> and <input> elements, you can't just set .value and expect the site's framework (React, Vue, etc.) to notice. React, for example, installs its own value tracker on the element, so a plain assignment can look like no change at all by the time the input event fires. The workaround:
function setNativeValue(element, value):
// Get the native value setter from the prototype chain
prototypeSetter = Object.getOwnPropertyDescriptor(
HTMLTextAreaElement.prototype, "value"
).set
// Call it directly on the element
prototypeSetter.call(element, value)
// Notify React/Vue that the value changed
element.dispatchEvent(new Event("input", { bubbles: true }))
Calling the prototype's setter sidesteps the framework's own value tracking, and the input event tells the framework "hey, this input changed." Without the input event dispatch, the site's state wouldn't update and the submit button would send stale data.
Part 5: Token Hydration in AI Responses#
When the AI responds with [EMAIL_1] [PHONE_2], you want to see the original values. Aegis provides a "hydrate" toggle that walks the page's DOM and replaces tokens with real values.
How It Works#
function hydratePage(identityMap):
// Walk all text nodes in the document
walker = document.createTreeWalker(document.body, NodeFilter.SHOW_TEXT)
while (current = walker.nextNode()):
text = current.textContent
if "[" not in text: continue
matches = text.match(tokenRegex) // e.g., [EMAIL_1], [IP_ADDR_2]
if no matches: continue
// Replace tokens with real values
newText = text.replace(tokenRegex, (match) =>
identityMap[match] || match
)
current.textContent = newText
A MutationObserver watches for new content (AI responses are streamed in chunks). As the AI generates text containing tokens, each chunk is processed and tokens are replaced in real-time.
The user can toggle hydration on and off. When off, they see the AI's raw response with tokens. When on, they see their original data.
Getting the Identity Map#
The identity map ([EMAIL_1] → alice@startup.io) lives in the WASM vault. The content script fetches it via messaging:
identityMap = await sendVaultMessage({ type: "vault:getIdentityMap" })
// Returns: { "[EMAIL_1]": "alice@startup.io", "[IP_ADDR_1]": "10.0.0.55", ... }
Since all contexts share the same vault, the map includes tokens from every redaction — whether it happened on ChatGPT, Claude, or the dashboard's test input.
Part 6: The Dashboard#
The options page is a full dashboard with three tabs: Analytics, Vault, and Config.
Analytics Tab#
Shows:
- Total redacted — Sum of all PII matches since the vault started
- Tokens active — Number of entries in the identity map
- Rules active — Number of custom find-and-replace rules
- Vault status — Whether the WASM vault is ready
- Category breakdown — Bar charts for emails, IPs, phones, secrets, paths, dates, currency
- Distribution — Percentage breakdown by category
- Protect/Reveal tool — Paste text, click Protect to see the redacted output, click Reveal to reverse it
Vault Tab#
Shows the full token map as a table, pairing each token with its original value.
A "Clear vault" button resets all tokens and counters (but keeps configuration and rules).
Config Tab#
- Protection filters — Toggle contact, technical, and identity categories on/off
- Custom rules — Add literal find-and-replace rules applied before PII scanning
- Vault engine — Shows the WASM status and runtime info
Auto-Refresh#
The dashboard polls the vault every 2 seconds. This means if you're redacting text on ChatGPT and have the dashboard open in another tab, the analytics update in near real-time.
All of this works because of the shared offscreen document. The dashboard sends vault:getAnalytics, the offscreen document reads the current state from the WASM instance, and returns the data. No storage sync issues, no stale caches.
What I Actually Learned#
Offscreen documents are the only sane way to share WASM state in MV3. Service workers cannot host long-lived shared DOM contexts, and content scripts are page-scoped. The offscreen document is the one place where the vault can stay alive and be shared across extension contexts.
Capture-phase event interception is essential. The naive approach — listen for Enter in the bubble phase — doesn't work. The AI site's submit handler runs first. You need addEventListener(..., true) and stopImmediatePropagation() to guarantee you control the flow.
Go's json.Marshal does not automatically emit camelCase keys. Without explicit
json:"fieldName" tags, JavaScript reads undefined from expected camelCase fields.
In Chrome runtime messaging listeners, always return true when replying asynchronously.
Otherwise the channel closes before sendResponse runs, and vault replies disappear.
WASM is surprisingly fast for text processing. Regex matching on strings is CPU-bound, and Go's regexp package guarantees matching time linear in the input size. Even with the message-passing overhead (content script → offscreen document → WASM → back), redaction adds less than 50ms to each submission, since it runs once per Enter press and not on every keystroke. Imperceptible.
What's Missing#
The biggest gap: regex can't catch "My name is Alice" or "I live in Springfield." That requires NLP — named entity recognition, context-aware classification, probably a small model running locally. That's a fundamentally different project. Aegis catches the structured stuff (emails, IPs, keys) well enough that the unstructured gap is tolerable, but it's not a complete solution.
The second problem is token persistence. Right now everything lives in the WASM instance's memory. When the offscreen document is destroyed — browser restart, extension update, Chrome's memory management — every token map is gone. The AI response still says [EMAIL_1] but Aegis no longer knows what that means. Fixing this means serializing state to chrome.storage and rehydrating on startup, which is straightforward engineering but adds complexity to the offscreen document's lifecycle.
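On the Go side, the serialization half of that fix could look roughly like the sketch below. The snapshot type and the Export and Import functions are hypothetical names, not part of the current vault. The JavaScript half (writing the string to chrome.storage and passing it back on startup) is not shown:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// snapshot is a hypothetical serialized form of the vault state.
type snapshot struct {
	Storage  map[string]string `json:"storage"`
	Counters map[string]int    `json:"counters"`
}

// Vault is trimmed to the three pieces of state from Part 1.
type Vault struct {
	storage  map[string]string // token -> real value
	lookup   map[string]string // real value -> token
	counters map[string]int    // prefix -> count
}

// Export serializes the token map and counters to a JSON string.
// The lookup map is skipped because it can be rebuilt from storage.
func (v *Vault) Export() (string, error) {
	b, err := json.Marshal(snapshot{Storage: v.storage, Counters: v.counters})
	return string(b), err
}

// Import restores a vault from Export's output and rebuilds lookup.
func Import(data string) (*Vault, error) {
	var s snapshot
	if err := json.Unmarshal([]byte(data), &s); err != nil {
		return nil, err
	}
	v := &Vault{storage: s.Storage, lookup: map[string]string{}, counters: s.Counters}
	if v.storage == nil {
		v.storage = map[string]string{}
	}
	if v.counters == nil {
		v.counters = map[string]int{}
	}
	for token, value := range v.storage {
		v.lookup[value] = token
	}
	return v, nil
}

func main() {
	v := &Vault{
		storage:  map[string]string{"[EMAIL_1]": "alice@startup.io"},
		lookup:   map[string]string{"alice@startup.io": "[EMAIL_1]"},
		counters: map[string]int{"EMAIL_": 1},
	}
	data, err := v.Export()
	if err != nil {
		panic(err)
	}
	restored, err := Import(data)
	if err != nil {
		panic(err)
	}
	fmt.Println(restored.storage["[EMAIL_1]"], restored.counters["EMAIL_"])
	// alice@startup.io 1
}
```

Restoring the counters along with the map matters: without them, the first email after a restart would be minted as [EMAIL_1] again and collide with the restored entry. Note that this also means the real values end up written to disk, which is a trade-off to weigh for a privacy tool.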
If You Want to Build Something Similar#
I can't share the full source for Aegis, but if you're building your own version, this is the checklist I would follow:
- Start with one redaction engine and one source of truth for token state
- Use an offscreen document in MV3 so every context talks to the same WASM instance
- Intercept submit events in the capture phase, not bubble phase
- Build a message contract early (vault:protect, vault:reveal, analytics, rules) and keep it stable
- Add exhaustive regex tests for real-world formats before shipping
- Persist vault state to extension storage if you need token continuity across browser restarts
If this post helps you ship your own local-first redaction tool, it has done its job.
Availability#
Aegis is an internal project for my personal workflows. The source code, extension package, and production configuration are not publicly available.
This article is intentionally written as a technical teardown so you can reproduce the architecture independently.
Have you built a browser extension with WASM before? What surprised you most about the offscreen document pattern?
Do you think local-first privacy tools like this are viable, or will AI companies eventually offer built-in redaction?
If you spot a technical mistake here, I'd rather fix it than leave it wrong.
Find me on GitHub, X, Peerlist, or LinkedIn.
Glossary#
- PII: Personally Identifiable Information. Data that can identify an individual — emails, phone numbers, SSNs, IP addresses, etc.
- WASM: WebAssembly. A binary format that runs at near-native speed in the browser. Go, Rust, and C++ can compile to it.
- Offscreen Document: A Chrome MV3 API for running code in a hidden document. Used for tasks that need DOM access but don't fit in content scripts or service workers.
- Content Script: JavaScript that runs in the context of a web page. Can access the page's DOM but is isolated from the page's JavaScript variables.
- Capture Phase: The first phase of DOM event propagation. Events travel from the document root down to the target element before bubbling back up.
- Token Map: A bidirectional mapping between PII values and their token replacements (e.g., alice@startup.io ↔ [EMAIL_1]).
- Hydration: Replacing tokens in AI responses with their original values. Named after the concept of "rehydrating" dried data.
- MV3: Manifest V3. The latest Chrome extension platform. More restrictive than MV2 — no persistent background pages, limited API access.
- WXT: A framework for building browser extensions using web technologies. Handles bundling, manifest generation, and development workflow.
- wasm_exec.js: Go's JavaScript runtime glue for WASM. Provides the bridge between Go's syscall interface and browser APIs.
- Deduplication: Ensuring the same PII value always maps to the same token. If alice@startup.io appears twice, both occurrences become [EMAIL_1].