I Built a Mini Vercel-like Clone in Rust in One Day. Here's Every Mistake I Made.
What I Wanted at the End#
Before I started, I wrote down exactly what "done" looked like:
- Push a commit to GitHub
- A webhook fires, a build starts in isolation
- Build logs stream to the browser in real time
- When the build finishes, a preview URL like abc12345-preview.localhost is live
- Hit that URL and get my Next.js app: HTML, CSS, JavaScript, everything
No cheating. Real webhooks. Real builds. Real artifacts. Real preview URLs through a reverse proxy.
The Stack I Chose#
| Layer | Technology |
|---|---|
| API | Rust + Axum |
| Database | PostgreSQL + sqlx |
| Message queue | NATS JetStream |
| Artifact storage | MinIO (S3-compatible) |
| Reverse proxy | Traefik v3 |
| Build runtime | Docker-in-Docker |
| Serve runtime | node:22-alpine containers |
I chose Rust because I wanted to, and because the async story is genuinely good for SSE streaming. I chose NATS JetStream over something simpler because I wanted at-least-once delivery — if the build worker crashes mid-build, the job should not be lost.
Both decisions proved worth it.
The Architecture#
GitHub Webhook
      │
      ▼
   Axum API ──── publishes job ────► NATS JetStream
      │                                   │
      │                                   ▼
      │                              Build Worker
      │                                docker run node:22-alpine
      │                                git clone → npm install → next build
      │                                docker cp artifacts out
      │                                upload to MinIO
      │                                publish result to NATS
      │                                   │
      │◄─── state updates ────────────────┘
      │
      ▼
Traefik (port 80)
  *.localhost → API → serve_artifact()
      ├── Static file in MinIO? → stream it directly
      └── SSR route?            → spin up node container
                                  proxy request to it
Two processes. One message queue between them. The API never touches the build. The worker never touches HTTP. Clean from the start.
9:42 PM — The First Commit#
01d4a17 Added Initial Project Scaffold and SQL File.
A main.rs with three lines and a 92-line SQL migration. The migration is the real work — it defines the shape of everything: users, projects, deployments, api_keys, env_vars.
I spent longer on the schema than I expected. Deployments need state (queued → building → uploading → ready → error), artifact_key for the MinIO path, build_log, build_started_at, build_finished_at. Getting this right upfront saves rewriting migrations later.
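The state column is effectively a small state machine, and it helps to write the legal transitions down once. The following is a minimal sketch, not the project's actual model: the enum and method names are hypothetical, and it assumes any non-terminal state may fail into error.

```rust
/// Deployment lifecycle states, mirroring the `state` column described above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DeploymentState {
    Queued,
    Building,
    Uploading,
    Ready,
    Error,
}

impl DeploymentState {
    /// Terminal states never change again.
    pub fn is_terminal(self) -> bool {
        matches!(self, DeploymentState::Ready | DeploymentState::Error)
    }

    /// Returns true if moving from `self` to `next` follows the pipeline order.
    /// Assumption: any non-terminal state may fail into `Error`.
    pub fn can_transition_to(self, next: DeploymentState) -> bool {
        use DeploymentState::*;
        match (self, next) {
            (Queued, Building) | (Building, Uploading) | (Uploading, Ready) => true,
            (from, Error) => !from.is_terminal(),
            _ => false,
        }
    }
}
```

A check like this in the state-update handler would reject out-of-order messages, for example a late Building update arriving after Ready.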
By midnight I had auth middleware, JWT, error types, and a database connection. No routes. No business logic. Just the plumbing.
11 PM — Fighting GitHub OAuth for an Hour#
fb83b99 Implement GitHub OAuth flow using octocrab
9ae2a63 Fix octocrab OAuth: use reqwest for token exchange
7a18532 Fix Docker build: replace compile-time sqlx macros with runtime-checked queries
af4399a Fix Dockerfile: use rust:slim to support newer dependency MSRV
a35144f Fix model imports and Dockerfile Rust version for edition2024
fb5ff1c Fix &str/&String mismatch and deprecated TimeoutLayer
be0edce Fix TimeoutLayer arg order and &str/&String mismatch
680150b Fix reqwest form type mismatch using HashMap
ad56a4d Fix GitHub OAuth form data type inference
17bf1a9 Fix Axum 0.8 path parameter syntax
Ten commits to get OAuth working. That is not a typo.
The octocrab crate's OAuth token exchange did not work the way the docs implied. Swapped it for a raw reqwest call. Then sqlx's compile-time macros required a live database at compile time, which Docker's build stage does not have. Switched to runtime-checked queries. Then the Dockerfile's Rust version was too old for one of the dependencies. Then Axum 0.8 changed path parameter syntax from :id to {id}.
Each fix was small. All of them together took an hour.
The lesson: when you stack a new web framework, a new OAuth library, a new database library, and a new Rust edition in the same project, budget an hour just for the initial friction.
10:25 AM — NATS JetStream and Real-Time Log Streaming#
852b20f Phase 1: NATS JetStream build queue + SSE log streaming
This commit added 523 lines and is where the project became interesting.
Before explaining what I built, let me explain what NATS JetStream actually is — because it is not obvious from the name.
NATS is a lightweight messaging system. You publish a message to a subject like build.jobs. Anyone subscribed to that subject receives it. That is the core. It is fast, simple, and has almost no operational overhead.
JetStream is NATS's persistence layer. Plain NATS is fire-and-forget — if nobody is subscribed when you publish, the message is gone. JetStream stores messages in a stream and delivers them to consumers even if the consumer was offline when the message arrived. Consumers are durable: they remember their position and resume from where they left off after a restart.
For a build queue, this matters a lot. Consider what happens without persistence: the API publishes a build job. The worker is in the middle of restarting. The message arrives during that window. Nobody is home. The deployment is stuck in queued forever.
With JetStream: the message is stored in the stream. When the worker comes back up, it picks up the job. No lost builds. No manual intervention.
The other thing JetStream gives you is at-least-once delivery. The consumer acknowledges a message only after processing it. If the worker crashes mid-build without acknowledging, JetStream re-delivers. This is the difference between a queue that feels reliable and one that randomly drops work under failure.
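To make the ack-and-redeliver idea concrete, here is a toy in-memory model of it. This is not the NATS client API: AckQueue and all of its methods are hypothetical, and a real JetStream consumer redelivers based on an ack wait timer rather than an explicit call.

```rust
use std::collections::{HashMap, VecDeque};

/// Toy in-memory queue that models ack-based redelivery.
/// This is NOT the NATS API; it only illustrates the delivery semantics.
pub struct AckQueue {
    next_id: u64,
    ready: VecDeque<(u64, String)>,
    in_flight: HashMap<u64, String>,
}

impl AckQueue {
    pub fn new() -> Self {
        AckQueue { next_id: 0, ready: VecDeque::new(), in_flight: HashMap::new() }
    }

    /// Store a job; it stays in the queue until a consumer acks it.
    pub fn publish(&mut self, job: &str) {
        self.ready.push_back((self.next_id, job.to_string()));
        self.next_id += 1;
    }

    /// Hand the next job to a consumer and remember it as unacknowledged.
    pub fn deliver(&mut self) -> Option<(u64, String)> {
        let (id, job) = self.ready.pop_front()?;
        self.in_flight.insert(id, job.clone());
        Some((id, job))
    }

    /// The consumer finished the job: forget it for good.
    pub fn ack(&mut self, id: u64) {
        self.in_flight.remove(&id);
    }

    /// The consumer died (or the ack wait expired): make every
    /// unacknowledged job deliverable again, oldest first.
    pub fn redeliver_unacked(&mut self) {
        let mut ids: Vec<u64> = self.in_flight.keys().copied().collect();
        ids.sort();
        for id in ids {
            if let Some(job) = self.in_flight.remove(&id) {
                self.ready.push_back((id, job));
            }
        }
    }
}
```

The consequence of at-least-once delivery is visible in the model: a job that was delivered but never acked comes back, so the build handler has to tolerate running the same job twice.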
For log streaming I used plain NATS (no persistence needed — live logs are ephemeral) and for build jobs I used JetStream. Right tool for each job.
The design: the API maintains one tokio::sync::broadcast channel per active deployment. When a client opens the SSE endpoint for a deployment, it subscribes to that channel. When the worker publishes a log line to NATS, the API's background subscriber receives it and fans it out to all SSE clients for that deployment.
// In the NATS subscriber background task
while let Some(msg) = subscriber.next().await {
    if let Ok(log) = serde_json::from_slice::<LogLine>(&msg.payload) {
        let sender = nats.get_log_sender(log.deployment_id).await;
        // send() only errors when no SSE client is subscribed; safe to ignore
        let _ = sender.send(log);
    }
}

// In the SSE handler
let stream = async_stream::stream! {
    loop {
        match receiver.recv().await {
            Ok(log_line) => yield Ok(Event::default().data(log_line.line)),
            Err(RecvError::Closed) => {
                yield Ok(Event::default().event("done").data(""));
                break;
            }
            // RecvError::Lagged: this client fell behind; skip the dropped lines
            _ => {}
        }
    }
};

JetStream gave me something I did not realize I needed until later: the build job survives a worker restart. The consumer is durable. If the worker goes down mid-build and comes back up, the unacknowledged job is redelivered and the build runs again instead of being silently dropped.
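The per-deployment fan-out can be sketched without any async runtime. This is a simplified stand-in, not the project's code: LogHub is a hypothetical name, std::sync::mpsc replaces tokio::sync::broadcast, and deployment IDs are plain u64 values instead of Uuid to keep the example dependency-free.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};

/// Per-deployment fan-out of log lines to every connected listener.
/// `std::sync::mpsc` stands in for `tokio::sync::broadcast` here.
#[derive(Default)]
pub struct LogHub {
    listeners: HashMap<u64, Vec<Sender<String>>>,
}

impl LogHub {
    /// Register a new listener (one SSE client) for a deployment.
    pub fn subscribe(&mut self, deployment_id: u64) -> Receiver<String> {
        let (tx, rx) = channel();
        self.listeners.entry(deployment_id).or_default().push(tx);
        rx
    }

    /// Send one log line to every live listener of that deployment,
    /// dropping listeners whose receiving end has gone away.
    /// Returns how many listeners received the line.
    pub fn publish(&mut self, deployment_id: u64, line: &str) -> usize {
        match self.listeners.get_mut(&deployment_id) {
            Some(senders) => {
                senders.retain(|tx| tx.send(line.to_string()).is_ok());
                senders.len()
            }
            None => 0,
        }
    }

    /// Close the channel on a terminal state: dropping the senders makes
    /// every receiver observe a disconnect, like `RecvError::Closed`.
    pub fn close(&mut self, deployment_id: u64) {
        self.listeners.remove(&deployment_id);
    }
}
```

Closing the hub entry plays the same role as the Closed branch in the SSE handler: receivers drain whatever is still buffered, then see the disconnect and can emit the done event.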
11:50 AM — The Build Worker#
c2ee25b Convert to Cargo workspace with crates/api/
0b44119 Add NATS, MinIO, and worker services to docker-compose
5b8fcf7 Add build worker crate with git clone, Docker build, and MinIO upload
Three commits that restructured everything. The project became a Cargo workspace: crates/api/ and crates/build-worker/. Two binaries. One queue between them.
The worker's job:
- Subscribe to build.jobs on NATS JetStream
- For each job: docker run node:22-alpine sh -c "git clone {repo} && npm install && npm run build"
- Stream stdout/stderr as log lines back to NATS
- When the build exits, docker cp artifacts out of the container
- Upload to MinIO under {deployment_id}/
- Publish a BuildResult back to NATS
The build runs inside a fresh Docker container every time. No shared state between builds. No leftover node_modules from a previous commit. Clean isolation by default.
12:00 PM — Linking GitHub Repos and Env Vars#
43c6f7b Implement POST /v1/projects/{id}/link to connect GitHub repo
e70a422 Add project env vars CRUD API and pass env vars to build jobs
03407be Fix GitHub OAuth callback to redirect to frontend with JWT token
Projects can now link to a GitHub repo. When a webhook fires for that repo's default branch, a deployment is automatically triggered.
Env vars were more interesting than I expected. They live in the database encrypted at rest, tagged with a target (build, runtime, or all). At build time, only build and all vars are passed to the worker — injected as -e KEY=VALUE flags on the docker run command. Runtime vars are a future problem.
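The target filtering is easy to get subtly wrong, so here is a minimal sketch of it. The Target and EnvVar types and the build_env_args function are hypothetical shapes for illustration, and the values are assumed to be already decrypted.

```rust
/// Where an env var applies, mirroring the build/runtime/all tag above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Target {
    Build,
    Runtime,
    All,
}

/// A decrypted env var row (hypothetical shape for illustration).
pub struct EnvVar {
    pub key: String,
    pub value: String,
    pub target: Target,
}

/// Build the `-e KEY=VALUE` arguments for the build container,
/// keeping only vars tagged `Build` or `All`.
pub fn build_env_args(vars: &[EnvVar]) -> Vec<String> {
    vars.iter()
        .filter(|v| matches!(v.target, Target::Build | Target::All))
        .flat_map(|v| ["-e".to_string(), format!("{}={}", v.key, v.value)])
        .collect()
}
```

Returning separate argument strings, rather than one concatenated command line, means they can be passed straight to a process builder's args, so values containing spaces or quotes never go through shell splitting.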
2:00 PM — The Frontend#
37dc0df feat: create Next.js frontend for Vercel clone
e50e6c8 Add frontend features and configurations for Vercel clone
43e7357 feat: add GitHub repo fetching and new project page integration
60c00dc fix: resolve frontend API client issues and add /me endpoint
A Next.js dashboard. Projects list, deployments list, deployment detail with live log streaming, deploy button. Standard stuff. The log streaming was the interesting part — the frontend opens an EventSource to /v1/deployments/{id}/logs and appends lines as they arrive. When the done event fires, it polls the deployment state and updates the UI.
4:26 PM — MinIO Integration#
d666338 feat: integrate AWS SDK for S3 and add MinIO configuration
d81b0aa feat: add artifact_key to deployments
be40bb2 feat: implement deployment state update for uploading
MinIO is S3-compatible so I used the AWS SDK with a custom endpoint. The artifact_key column stores the MinIO prefix for each deployment's files — {deployment_id}/. Everything under that prefix belongs to that deployment.
The worker gained an uploading state: before uploading to MinIO, it publishes state: Uploading so the frontend can show progress. After a successful upload, it publishes state: Ready with the artifact_key.
4:53 PM — cargo-chef and Faster Docker Builds#
bcae9bd feat: update Dockerfiles to use cargo-chef for optimized builds
Without cargo-chef, every Docker build recompiled all dependencies from scratch. A clean build took several minutes. With cargo-chef, dependencies are compiled in a cached layer that only invalidates when Cargo.toml or Cargo.lock changes. Subsequent builds that only touch source files take seconds.
FROM lukemathwalker/cargo-chef:latest-rust-bookworm AS chef
WORKDIR /app
FROM chef AS planner
COPY . .
RUN cargo chef prepare --recipe-path recipe.json
FROM chef AS builder
COPY --from=planner /app/recipe.json recipe.json
RUN cargo chef cook --release -p vercel-clone-api --recipe-path recipe.json
COPY . .
RUN cargo build --release --locked -p vercel-clone-api

Small change. Meaningful quality-of-life improvement when you are rebuilding every ten minutes to test a bug.
5:37 PM — The Big Serving Refactor#
a667a12 feat: implement deployment server management and artifact handling
This is the commit where serving went from "a rough idea" to "an actual architecture".
Before this, I had no plan for how to actually serve a Next.js app from MinIO artifacts. After thinking about it, I settled on two tiers:
Tier 1: Static files directly from MinIO. The API looks up the deployment by the HOST header, maps the request path to an S3 key, and streams the object body. No Node involved.
Tier 2: Dynamic routes via a Node container. If the file is not in MinIO (it's a server-rendered route), the API downloads the standalone build and starts a node:22-alpine container running server.js. It then proxies the request to that container.
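The tier decision itself is a single lookup. A minimal sketch, assuming the set of stored keys is available for an existence check: Route and resolve_route are hypothetical names, and a real handler would ask the object store instead of consulting an in-memory set.

```rust
use std::collections::HashSet;

/// How a request should be served under the two-tier scheme.
#[derive(Debug, PartialEq, Eq)]
pub enum Route {
    /// Tier 1: the object exists in storage; stream it from this key.
    Static(String),
    /// Tier 2: no matching object; hand the request to the Node container.
    Dynamic,
}

/// Decide the tier for a request path.
/// `artifact_key` is the deployment prefix (e.g. "{deployment_id}/") and
/// `stored_keys` stands in for an existence check against the bucket.
pub fn resolve_route(artifact_key: &str, path: &str, stored_keys: &HashSet<String>) -> Route {
    let key = format!("{}{}", artifact_key, path.trim_start_matches('/'));
    if stored_keys.contains(&key) {
        Route::Static(key)
    } else {
        Route::Dynamic
    }
}
```

Anything that misses in storage falls through to the container, which is what keeps the static tier cheap and the Node tier as the catch-all.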
The DeploymentServers struct manages the lifecycle of running containers — a HashMap<Uuid, RunningContainer> behind a Mutex, with an idle cleanup task that removes containers that have not served a request in five minutes.
pub struct DeploymentServers {
    containers: Arc<Mutex<HashMap<Uuid, RunningContainer>>>,
    work_dir: PathBuf,
    docker_network: String,
    idle_timeout_secs: u64,
}

On the first request to a deployment URL, the container cold-starts. Subsequent requests hit it immediately. Five minutes of silence and it gets cleaned up. Simple and effective for local use.
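The idle cleanup boils down to tracking a last-used timestamp per deployment and sweeping periodically. A minimal sketch under simplifying assumptions: IdleTracker is a hypothetical name, IDs are u64 instead of Uuid, and the current time is passed in so the logic can be exercised without waiting.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Tracks when each deployment's container last served a request.
/// Deployment ids are `u64` here to stay std-only; the post uses `Uuid`.
pub struct IdleTracker {
    last_used: HashMap<u64, Instant>,
    idle_timeout: Duration,
}

impl IdleTracker {
    pub fn new(idle_timeout: Duration) -> Self {
        IdleTracker { last_used: HashMap::new(), idle_timeout }
    }

    /// Record a request; call this on every proxied hit.
    pub fn touch(&mut self, id: u64, now: Instant) {
        self.last_used.insert(id, now);
    }

    /// Remove and return every deployment idle for at least `idle_timeout`.
    /// The caller would stop the matching containers.
    pub fn evict_idle(&mut self, now: Instant) -> Vec<u64> {
        let timeout = self.idle_timeout;
        let mut expired: Vec<u64> = self
            .last_used
            .iter()
            .filter(|(_, last)| now.duration_since(**last) >= timeout)
            .map(|(id, _)| *id)
            .collect();
        expired.sort();
        for id in &expired {
            self.last_used.remove(id);
        }
        expired
    }
}
```

In a setup like the one described, a background task would call evict_idle on a timer while holding the same lock that guards the container map, then stop each returned container.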
Now the Bugs Start#
Everything above took about eight hours. The next four hours were debugging. Here is every bug, in order.
Bug 1: Build Time Always "Pending"#
build_started_at was always null. The build time column showed "Pending" even for completed deployments.
The worker published state: Building to NATS. The API subscriber was supposed to handle Building by running a database update, but I never wrote a match arm for it. The message hit the catch-all _ => continue and the update never ran.
// Added this arm
crate::models::DeploymentState::Building => {
    sqlx::query(
        "UPDATE deployments SET state = 'building', build_started_at = NOW()
         WHERE id = $1 AND state IN ('queued', 'building')",
    )
    .bind(result.deployment_id)
    .execute(&*db)
    .await
}

One match arm. Twenty minutes of confusion.
Bug 2: Build Logs Not Persisted#
Logs streamed fine during the build. After the build finished, they were gone. The deployment detail page showed an empty log.
The pipeline was: worker publishes log lines → NATS → API subscriber → broadcast channel. On terminal state (ready/error), I closed the broadcast channel. The problem was that log lines and build results travel on different NATS subjects. A result can arrive before the last log lines have been processed.
The fix: buffer every log line in memory, keyed by deployment ID. On terminal state, sleep 500ms to drain in-flight messages, then flush the buffer to the database.
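The buffering half of that fix can be sketched on its own. This is an illustration rather than the project's implementation: LogBuffers and its methods are hypothetical, IDs are u64 instead of Uuid, and joining lines with newlines is an assumption about how the log is stored.

```rust
use std::collections::HashMap;

/// In-memory log buffer keyed by deployment id (`u64` here; the post uses `Uuid`).
#[derive(Default)]
pub struct LogBuffers {
    lines: HashMap<u64, Vec<String>>,
}

impl LogBuffers {
    /// Append one log line as it arrives from the queue.
    pub fn push(&mut self, deployment_id: u64, line: &str) {
        self.lines.entry(deployment_id).or_default().push(line.to_string());
    }

    /// Remove the deployment's buffer and return it as one newline-joined
    /// string, ready to be written to the `build_log` column.
    /// Returns an empty string if nothing was buffered.
    pub fn take(&mut self, deployment_id: u64) -> String {
        self.lines
            .remove(&deployment_id)
            .map(|lines| lines.join("\n"))
            .unwrap_or_default()
    }
}
```

Because take removes the entry, memory is released once a deployment reaches a terminal state, and a second flush for the same deployment writes nothing.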
tokio::time::sleep(tokio::time::Duration::from_millis(500)).await;
let buffered_log = nats.take_log_buffer(result.deployment_id).await;
if !buffered_log.is_empty() {
    sqlx::query("UPDATE deployments SET build_log = $1 WHERE id = $2")
        .bind(&buffered_log)
        .bind(result.deployment_id)
        .execute(&*db)
        .await?;
}

Bug 3: artifact not found — Wrong Output Directory#
First actual preview URL test. Build completed. State was ready. Hit the URL. artifact not found.
The worker was auto-detecting output directories by checking for out/, build/, dist/ in order. My Next.js app used output: 'standalone', which produces .next/standalone/. None of my candidates matched. The artifact was uploaded under the wrong key structure.
I needed proper detection. The worker probes the stopped build container for specific files using docker cp:
let standalone_exists = tokio::process::Command::new("docker")
    .args([
        "cp",
        &format!("{}:/app/repo/.next/standalone/server.js", container_name),
        "-",
    ])
    .stdout(std::process::Stdio::null())
    .stderr(std::process::Stdio::null())
    .status()
    .await?
    .success();
if standalone_exists {
    return Ok(OutputType::Standalone);
}

The critical detail: docker cp works on stopped containers. I initially used docker exec. The build container exits when the build completes. docker exec silently failed, stdout was empty, and detection always fell through to Static. This bug was invisible — no error, just wrong behavior.
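The probe order matters as much as the probe mechanism. A minimal sketch of priority-ordered detection, with the file check abstracted behind a closure so it can be tested without Docker: detect_output and this variant of OutputType (which carries the static directory and returns None when nothing matches) are assumptions for illustration, not the worker's actual types.

```rust
/// Kind of build output found in the build container.
#[derive(Debug, PartialEq, Eq)]
pub enum OutputType {
    /// Next.js `output: 'standalone'` build with a `server.js`.
    Standalone,
    /// Static export; holds the directory that was found.
    Static(String),
}

/// Probe for known outputs in priority order. `exists` abstracts the
/// file check (the post does it with `docker cp` on the stopped container).
pub fn detect_output(exists: impl Fn(&str) -> bool) -> Option<OutputType> {
    // Standalone must be checked first: such a project may also have other dirs.
    if exists(".next/standalone/server.js") {
        return Some(OutputType::Standalone);
    }
    for dir in ["out", "build", "dist"] {
        if exists(dir) {
            return Some(OutputType::Static(dir.to_string()));
        }
    }
    None
}
```

Returning None instead of defaulting to a static type turns "nothing matched" into an explicit failure, which would have surfaced this bug as an error rather than as wrong behavior.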
Bug 4: MinIO Pagination Truncating Artifacts#
Fixed the detection. Hit the URL. Got: no standalone build found (n=0 files).
list_objects_v2 returns at most 1000 objects per call. A Next.js standalone build with node_modules has several thousand files. I was not paginating. The download stopped after the first page, server.js was in the objects I never fetched, and the check failed.
// No token on the first request; each response supplies the next one
let mut continuation_token: Option<String> = None;
loop {
    let mut req = s3_client.list_objects_v2().bucket(bucket).prefix(s3_prefix);
    if let Some(ref token) = continuation_token {
        req = req.continuation_token(token);
    }
    let resp = req.send().await?;
    let truncated = resp.is_truncated.unwrap_or(false);
    continuation_token = resp.next_continuation_token.clone();
    // ... download objects ...
    if !truncated { break; }
}

Standard pagination loop. Should have been there from the start.
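One way to make sure the loop is never skipped again is to put it behind a helper that takes the page fetch as a closure. This is a sketch, not an AWS SDK API: Page and list_all are hypothetical names, and fetch_page stands in for the real list call.

```rust
/// One page of a listing: the keys plus the token for the next page, if any.
pub struct Page {
    pub keys: Vec<String>,
    pub next_token: Option<String>,
}

/// Collect every key by following continuation tokens until none is returned.
/// `fetch_page` is a stand-in for the S3 list call.
pub fn list_all<E>(
    mut fetch_page: impl FnMut(Option<&str>) -> Result<Page, E>,
) -> Result<Vec<String>, E> {
    let mut all = Vec::new();
    let mut token: Option<String> = None;
    loop {
        // First call passes None; later calls pass the previous page's token.
        let page = fetch_page(token.as_deref())?;
        all.extend(page.keys);
        match page.next_token {
            Some(next) => token = Some(next),
            None => return Ok(all),
        }
    }
}
```

With a 1000-key page size, a listing of 2500 keys takes three calls; a caller that stops after the first call sees only the first 1000, which is exactly the truncation described above.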
Bug 5: ERR_EMPTY_RESPONSE on First Visit#
Fixed pagination. Downloaded 2653 files. Started the container. Hit the URL. The browser returned ERR_EMPTY_RESPONSE.
Two problems compounded each other.
First: I was treating a successful TCP connection as "server is ready". Node accepts TCP connections slightly before it is ready to serve HTTP. I was proxying into that window and getting nothing back.
Second: I was forwarding Transfer-Encoding: chunked from Node's response to a client that had already received the full buffered body. The browser waited forever for a chunk terminator that never came.
Fix one: retry the proxy up to five times with a one-second gap.
for attempt in 0..5u32 {
    if attempt > 0 { tokio::time::sleep(Duration::from_secs(1)).await; }
    match client.get(&url).send().await {
        Ok(resp) => return Ok(resp),
        Err(e) => tracing::warn!(attempt, error = %e, "proxy attempt failed"),
    }
}

Fix two: strip hop-by-hop headers before forwarding the response.
// Lowercase names; the header is "Trailer" (singular), not "Trailers"
const HOP_BY_HOP: &[&str] = &[
    "transfer-encoding", "connection", "keep-alive",
    "proxy-authenticate", "proxy-authorization",
    "te", "trailer", "upgrade",
];

Bug 6: CSS and JavaScript Returning 404#
HTML loaded. Page was a raw unstyled document. All /_next/static/ requests returned 404.
Next.js serves static assets at /_next/static/ in URLs but stores them as .next/static/ on disk. My MinIO key lookup used the URL path directly. _next/ mapped to nothing because the key was .next/.
let storage_path = if let Some(rest) = path.strip_prefix("_next/") {
    format!(".next/{}", rest)
} else {
    path.to_string()
};

Fixed the lookup. Still 404. The files were not in MinIO at all.
The worker was uploading standalone/ correctly and uploading .next/static/ separately. The separate upload produced zero files. The docker_cp call for .next/static/ succeeded (the directory existed but I was copying from the wrong path inside the container) and upload_dir_with_prefix silently uploaded nothing.
The real fix: stop treating static files as a separate concern. In Next.js standalone deployments, the docs say to copy .next/static/ into standalone/.next/static/ so server.js serves them directly. Do that instead.
// Copy .next/static into standalone/ so server.js serves it
docker_cp(
    &container_name,
    "/app/repo/.next/static/.",
    &local_standalone.join(".next").join("static"),
).await?;

One directory. No separate upload. No separate download. The node container gets everything in one artifact.
Bug 7: The Docker Bind Mount Path Problem#
The API runs inside Docker. It downloads the standalone build to /tmp/vercel-clone-deployments/{id}/standalone/. Then it runs:
docker run -v /tmp/vercel-clone-deployments/{id}/standalone:/app node:22-alpine
The volume mount path is interpreted by the Docker daemon, which runs on the host, so the daemon looks for that path on the host's filesystem, not inside the API container. The API container's deployments directory is itself a bind mount from the host's /tmp/vercel-clone-deployments/.
If the path inside the container differs from the host path, the daemon mounts an empty directory. server.js does not exist, and the container starts with nothing.
Fix: make both sides of the bind mount the same path.
# docker-compose.yml — was:
- /tmp/vercel-clone-deployments:/tmp/deployments
# Now:
- /tmp/vercel-clone-deployments:/tmp/vercel-clone-deployments

The daemon and the container agree on what the path means. Volume resolves correctly.
Bug 8: Preview URLs Stored With the Port Number#
BASE_DOMAIN in .env was localhost:8080. Every preview URL was stored as hash-preview.localhost:8080 in the database. Traefik routes on port 80 so the HOST header arriving at the API was hash-preview.localhost. Lookup failed every time.
# .env — was:
BASE_DOMAIN=localhost:8080
# Fixed:
BASE_DOMAIN=localhost

Plus a one-line database fix for the 14 existing rows with the wrong value, and a port-stripping guard in the lookup handler.
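A port-stripping guard of that kind can be very small. A minimal sketch, assuming plain hostnames with no bracketed IPv6 literals; strip_port is a hypothetical name, not the handler's actual function.

```rust
/// Drop an optional `:port` suffix from a Host header value.
/// Assumes plain hostnames (no bracketed IPv6 literals).
pub fn strip_port(host: &str) -> &str {
    match host.rsplit_once(':') {
        // Only strip when everything after the last ':' is a non-empty digit run.
        Some((name, port)) if !port.is_empty() && port.bytes().all(|b| b.is_ascii_digit()) => name,
        _ => host,
    }
}
```

Normalizing both the stored preview URL and the incoming Host header through the same function means the lookup no longer depends on which port the proxy happens to listen on.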
Bug 9: docker exec on a Stopped Container#
After fixing the static files, I rebuilt the worker and triggered a new deployment. Output type was detected as static again despite the project having output: 'standalone' in next.config.js.
The build container runs foreground, not detached. When next build finishes, the container exits. My detection code called docker exec to probe for server.js. docker exec only works on running containers. The container had already stopped. docker exec failed silently. Detection fell through to Static every time.
The fix was already above — use docker cp instead, which works on stopped containers.
What It Looks Like Working#
A push to GitHub. Webhook fires. Build starts. Logs stream line by line in the dashboard. Build finishes in about two minutes. State goes to ready. A URL like http://e21aaddd-preview.localhost/ is live. HTML, CSS, JavaScript, fonts — everything loads.
That is the thing I wanted at the start of the day.
What I Would Do Differently#
Use docker cp for any file probe on build containers from the start. The build runs foreground. The container exits. docker exec will never work. I knew this and forgot it.
Read the Next.js standalone docs before writing the serving code. The official deployment guide explicitly says to copy .next/static/ into standalone/.next/static/. That one paragraph would have saved two bug sessions.
Paginate every S3 list call on day one. The 1000-object limit is in every S3 SDK's documentation. Skipping pagination works until you have a real node_modules directory. Then it silently breaks.
Make bind mount paths identical on host and container. Docker-in-Docker volume mounts are resolved by the daemon on the host. If the paths differ, you get a silent empty mount. Use the same path on both sides.
Test with a real project immediately. Trivial test projects do not have 2000-file node_modules, standalone output directories, or real CSS. Every bug above was invisible until I used an actual Next.js portfolio.
What I Actually Learned#
Building a simplified version of a tool you use every day is one of the better ways to understand it.
The Vercel deploy button is backed by surprisingly few moving parts: a webhook receiver, a job queue, a build runner, object storage, and a reverse proxy. Each one is individually straightforward. The complexity is in the coordination — async state machines, message ordering, file path conventions that have to match across three different processes.
Rust was the right choice for this. The async runtime handled SSE streaming and concurrent build monitoring cleanly. sqlx's compile-time query checking caught schema mismatches early on, until the Docker build forced the switch to runtime-checked queries. The borrow checker enforced that shared state (the NATS client, the deployment server map) was accessed safely across async tasks, with no data races.
The rough parts were expected: compile times, verbose error type boilerplate, fighting the type system occasionally. Nothing that was an actual blocker.
NATS JetStream earned its place. At-least-once delivery meant no jobs were lost during restarts. The durable consumer meant I could restart the worker mid-test without re-triggering builds. Message replay let me debug stuck deployments. Worth the extra setup.
The one thing that surprised me: how much of the difficulty was file path conventions. Which directory does next build produce? Where does docker cp put the files? Which path does the Docker daemon resolve for a volume mount? These are not algorithmic problems. They are plumbing problems. And they account for roughly half the bug list above.
The Bottom Line#
57 commits. One day. A working local Vercel clone — GitHub webhooks, NATS build queue, MinIO artifact storage, Traefik routing, isolated Node containers serving real preview URLs.
The deploy button is not magic. It is a webhook, a queue, a build container, some object storage, and a reverse proxy. Surprisingly few moving parts once you trace all of them.
I understand what happens when I click deploy now.
All code is on GitHub. Questions or corrections — find me there.