
April 16, 2026 · 8 min read · Engineering / DevOps

Building My Portfolio Site: Next.js, Live Prometheus, and GitOps on a Homelab k3s Cluster

How I turned a job-search portfolio into a production-grade observability showcase

A deep dive into how I built anthonypaulruiz.com — Next.js 16 + React 19, live Prometheus metrics, React Flow infra diagrams, OpenTelemetry tracing, and a GitOps deployment pipeline via GitHub Actions → Argo CD → k3s.

TL;DR

I built my portfolio site to be a live demonstration of the observability and DevOps skills I describe on it — real Prometheus metrics, OTEL span waterfalls, an xterm.js shell, and a full GitOps pipeline deploying to my homelab k3s cluster via Argo CD.

Most portfolio sites are static pages with a skills list. I wanted mine to *be* the demo — a working system that shows, not tells, what I do every day. If you can click a button and watch a real OpenTelemetry span waterfall appear, you don't need to take my word for it.


Tech Stack

  • Framework: Next.js 16 + React 19 (App Router)
  • Styling: Tailwind CSS v4 (no config file — all tokens in globals.css)
  • Components: shadcn/ui (new-york style, zinc base)
  • Charts: Recharts via shadcn chart wrapper
  • Infra diagram: React Flow (@xyflow/react v12)
  • Terminal: xterm.js (@xterm/xterm + fit + web-links addons)
  • Animations: Motion (Framer Motion v12, import from motion/react)
  • Logging: Pino + LogLayer — JSON to stdout in prod, pino-pretty in dev. OTEL mixin injects traceId/spanId on every record.
  • Server tracing: OpenTelemetry NodeSDK + auto-instrumentations. RingBufferExporter (in-process, for the live trace demo) + BatchSpanProcessor → OTLPTraceExporter → Tempo (OTLP/HTTP port 4318).
  • Browser RUM: Grafana Faro (@grafana/faro-react + faro-web-tracing). Captures Web Vitals, JS errors, console, and browser OTEL traces. Browser spans propagate W3C traceparent → Tempo, creating end-to-end traces from click to server.
  • Tracing backend: Grafana Tempo — receives spans from the server (direct OTLP/HTTP) and from the browser (via Grafana Alloy Faro receiver → Tempo).
  • Runtime: Node.js 22 (Alpine), Docker, k3s on Proxmox homelab
  • CI/CD: GitHub Actions → ghcr.io → Argo CD → k3s (GitOps)

Why Deploy to a Homelab Instead of Vercel?

Vercel would have taken about fifteen minutes. That's exactly why I didn't use it. A portfolio site for a DevOps engineer should demonstrate DevOps — not outsource it. Deploying to a self-managed k3s cluster on Proxmox hardware, exposed via Cloudflare Tunnel, kept in sync by Argo CD, means every visitor is exercising the same stack I manage in production environments every day.

The hosting decision also creates something a Vercel deploy can't: a real observability target. There's a Prometheus scrape job, a Loki log stream, and a Tempo trace pipeline because there's a real cluster behind the site. The dashboards aren't mocked-up screenshots — they're pulling from the same exporters that watch the nodes this site runs on.

The deployment is the showcase

Every visitor to the site is hitting a pod on a k3s cluster I manage myself — including the Cloudflare Tunnel, Argo CD sync, and Prometheus scrape job. That's the point.

Explore the Infra Diagram — the real Proxmox topology that hosts every request to this page.

Live Prometheus Metrics in the Observability Section

lib/metrics.ts fires 13 parallel Prometheus queries on every server render — instant queries for node CPU/mem, pod counts, restart rates, and 30-day service uptime, plus two range queries for the 24-hour CPU history and 7-day restart trend. Every query gets its own 5-second `AbortController` timeout so a slow or unreachable Prometheus instance can't hang the page render. `Promise.allSettled` collects them all; failures fall through to a deterministic mock that uses the same data shape, so the UI is never broken — it just shows mock data with an indicator.
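The per-query timeout and fallback pattern can be sketched as follows; `promQuery`, `fetchAllWithFallback`, the Prometheus URL, and the mock payloads here are illustrative stand-ins, not the site's actual module:

```typescript
// Sketch of the per-query abort timeout + Promise.allSettled fallback.
// PROM_URL and the mock shapes are placeholders.

const PROM_URL = process.env.PROM_URL ?? "http://prometheus.monitoring:9090";

async function promQuery(query: string, baseUrl: string = PROM_URL): Promise<unknown> {
  // Each query gets its own 5-second abort timeout so one slow or
  // unreachable Prometheus instance can never hang the server render.
  const res = await fetch(
    `${baseUrl}/api/v1/query?query=${encodeURIComponent(query)}`,
    { signal: AbortSignal.timeout(5000) }
  );
  if (!res.ok) throw new Error(`Prometheus returned ${res.status}`);
  return res.json();
}

type MetricResult =
  | { source: "prometheus"; data: unknown }
  | { source: "mock"; data: unknown };

async function fetchAllWithFallback(
  queries: string[],
  mocks: unknown[],
  baseUrl?: string
): Promise<MetricResult[]> {
  // allSettled never rejects: each failure falls through to a mock of
  // the same shape, so the UI always has something to render.
  const settled = await Promise.allSettled(queries.map((q) => promQuery(q, baseUrl)));
  return settled.map((s, i) =>
    s.status === "fulfilled"
      ? { source: "prometheus", data: s.value }
      : { source: "mock", data: mocks[i] }
  );
}
```

The key property: a total Prometheus outage costs at most five seconds of render latency and degrades to mock data, never to an error page.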

The whole function is wrapped in `React.cache()` for render-level deduplication — if two Server Components on the same request both call `fetchMetrics()`, Prometheus only sees one burst of queries. `next: { revalidate: 60 }` on each fetch tells Next.js to cache the response for 60 seconds at the ISR layer, which means the metrics are fresh without hammering Prometheus on every single page load.

One query worth calling out: buildNodeMetrics uses kube_node_info to map node-exporter instance IPs to real Kubernetes node hostnames. Without this, the Node Resources chart would label bars by IP address. A small query, a big readability difference.
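A sketch of that join, assuming the standard kube-state-metrics labels on `kube_node_info` (`node` and `internal_ip`); the function names are illustrative, not the site's actual code:

```typescript
// Map node-exporter instance addresses ("10.0.0.4:9100") to Kubernetes
// node names using kube_node_info's internal_ip / node labels.

interface PromSample {
  metric: Record<string, string>;
}

function buildInstanceToNodeName(nodeInfo: PromSample[]): Map<string, string> {
  const map = new Map<string, string>();
  for (const sample of nodeInfo) {
    // kube_node_info carries both the node name and its internal IP.
    const ip = sample.metric["internal_ip"];
    const node = sample.metric["node"];
    if (ip && node) map.set(ip, node);
  }
  return map;
}

function labelForInstance(instance: string, map: Map<string, string>): string {
  // node-exporter instances look like "10.0.0.4:9100": strip the port
  // and fall back to the raw instance if the node is unknown.
  const ip = instance.split(":")[0];
  return map.get(ip) ?? instance;
}
```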

Live Prometheus data: stat cards, 24h CPU history, per-node resources, and service health — all from real cluster exporters.

Explore the Observability section — live charts and stat cards updated from Prometheus every 60 seconds.

lib/metrics.ts (excerpt)
// 13 parallel Prometheus queries, each with a 5s abort timeout
const [upResult, cpuRangeResult, nodeInfoResult /* ... */] = await Promise.allSettled([
  promQuery("up"),
  promQueryRange("avg(rate(node_cpu_seconds_total{mode!='idle'}[5m])) * 100", "24h", "3600"),
  // ...
]);

OpenTelemetry: Turning a Portfolio Into a Trace Demo

The Live Request Trace button in the Observability section is the most technically involved piece of the site. Clicking it fires GET /api/live-trace, which creates a real OTEL trace with four child spans: validate_request, fetch_cluster_snapshot (a live Prometheus query), query_loki (a reachability probe), and serialize_response. Pino logs are emitted at each step with the active traceId and spanId automatically injected via the OTEL mixin — no call-site changes needed.

After the trace completes, the UI waits four seconds then fetches GET /api/live-trace/logs?traceId=xxx, which queries Loki for log lines matching that exact traceId. If Loki is reachable and has ingested the logs, the source badge shows 'loki' (green, pulsing). If Loki is unreachable or the logs haven't landed yet, the route falls back to a ring buffer of CapturedLog objects that were attached to the trace in-process — the badge shows 'captured' (amber). Either way, the UI has something to show.
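The source selection amounts to a simple preference order. This is a hedged sketch with simplified types, not the actual route handler:

```typescript
// Prefer Loki-ingested logs; fall back to the in-process ring buffer
// of logs captured while the trace ran.

interface CapturedLog {
  traceId: string;
  spanId: string;
  msg: string;
}

function pickLogSource(
  lokiLogs: CapturedLog[] | null, // null = Loki unreachable
  ringBuffer: CapturedLog[],
  traceId: string
): { source: "loki" | "captured"; logs: CapturedLog[] } {
  if (lokiLogs && lokiLogs.length > 0) {
    return { source: "loki", logs: lokiLogs }; // green, pulsing badge
  }
  // Loki down or logs not yet ingested: serve the in-process capture
  // for this trace instead (amber badge).
  return {
    source: "captured",
    logs: ringBuffer.filter((l) => l.traceId === traceId),
  };
}
```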

Spans and logs are linked by a shared color palette keyed by spanId. Hovering a span in the waterfall highlights the correlated log lines, and vice versa. The visual connection makes the trace↔log relationship tangible in a way that text descriptions don't.
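The shared palette is just a deterministic spanId → color assignment; the hex values below are placeholders for the site's actual theme tokens:

```typescript
// Assign palette colors in span order, wrapping if there are more
// spans than colors; the same spanId always maps to the same color.
const PALETTE = ["#60a5fa", "#34d399", "#fbbf24", "#f472b6", "#a78bfa"];

function buildSpanColorMap(spanIds: string[]): Map<string, string> {
  const map = new Map<string, string>();
  for (const id of spanIds) {
    if (!map.has(id)) map.set(id, PALETTE[map.size % PALETTE.length]);
  }
  return map;
}
```

Both the waterfall and the log panel read from the same map, so the hover highlight needs no extra state beyond the hovered spanId.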

Span waterfall (left) and correlated Loki log lines (right). Each span and its logs share a color — hovering either side highlights the other.

Try the Live Request Trace — click the button and watch a real OTEL span waterfall appear, with correlated Loki logs alongside it.

Turbopack module isolation

Next.js dev mode with Turbopack can load modules in separate worker contexts. A globalThis-based ring buffer won't work — the exporter and route handler may not share the same global. Direct span collection in the route handler is the reliable path.

Infra Diagram: Rendering a Real Proxmox Topology with React Flow

The Infra Diagram section is a React Flow graph of the actual Proxmox homelab — not a sanitized diagram, but the real topology: Internet → Cloudflare Tunnel → Proxmox host → VMs (four k3s nodes, Home Assistant, an Ubuntu dev box, and a Windows workstation with GPU and SSD passthrough) → k3s cluster → running workloads.

Each k3s VM node receives live CPU% and memory% badges injected from Prometheus node_cpu_seconds_total and node_memory_* metrics. The trick is VM_TO_NODE_NAME — a static map from React Flow node IDs to Prometheus node-exporter hostnames. Without it, a useMemo lookup over the raw metric results would have no way to know which node-exporter instance corresponds to which diagram node. Color-coding makes the status scannable: blue/yellow/red for CPU, purple/yellow/red for memory.
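A hedged sketch of that mapping and the badge logic; the node IDs, hostnames, and threshold cutoffs are illustrative, since the post specifies only the color scheme:

```typescript
// Illustrative VM_TO_NODE_NAME map: React Flow node id → Prometheus
// node-exporter hostname. Real IDs and hostnames differ.
const VM_TO_NODE_NAME: Record<string, string> = {
  "vm-k3s-1": "k3s-node-1",
  "vm-k3s-2": "k3s-node-2",
};

function cpuBadgeColor(cpuPercent: number): "blue" | "yellow" | "red" {
  // Cutoffs are assumptions, not the site's actual values.
  if (cpuPercent >= 85) return "red";
  if (cpuPercent >= 60) return "yellow";
  return "blue";
}

function cpuForVm(
  vmId: string,
  cpuByNode: Record<string, number>
): { percent: number; color: string } | null {
  const nodeName = VM_TO_NODE_NAME[vmId];
  if (!nodeName || !(nodeName in cpuByNode)) return null; // no badge
  const percent = cpuByNode[nodeName];
  return { percent, color: cpuBadgeColor(percent) };
}
```

Non-VM diagram nodes (router, tunnel, workloads) simply have no entry in the map, so they render without a badge rather than with a bogus one.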

Every node has a hover tooltip that shows the full spec sheet: hardware (Proxmox), k3s version for VM nodes, GPU and SSD passthrough details for the Windows workstation, and sensor/alert details for Home Assistant. The specs live in lib/homelab-data.ts alongside the cluster config, so they're easy to keep in sync as the homelab evolves.

The real Proxmox topology rendered in React Flow. Live CPU% and MEM% badges are injected from Prometheus — hover any node to see the full spec sheet.

Explore the Infra Diagram — hover any node for the full spec sheet, with live CPU and memory pulled from Prometheus.

An Interactive xterm.js Terminal in the Browser

xterm.js is a browser-only library — importing it at module level in a Next.js Server Component or even a 'use client' file that renders on the server will break the build. The fix is a dynamic import inside useEffect: the terminal, its CSS, FitAddon, and WebLinksAddon are all loaded client-side after mount. `ResizeObserver` watches the container and calls fitAddon.fit() on size changes, keeping the terminal columns and rows in sync with the layout. Everything is torn down and disposed on unmount to avoid WebGL context leaks.

The COMMANDS dictionary maps command strings to handler functions. Most commands return static output — kubectl get pods, uptime, skills — but curl /api/metrics is live: it hits the real /api/metrics endpoint and formats the response as cluster stats and service health rows with a source tag indicating whether the data came from Prometheus or the deterministic mock fallback.
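A minimal sketch of the dispatch table; the commands and outputs are illustrative stand-ins for the site's real handlers:

```typescript
// Command string → handler. Most handlers return static output; the
// live one would hit /api/metrics (stubbed here for illustration).
type CommandHandler = () => string | Promise<string>;

const COMMANDS: Record<string, CommandHandler> = {
  help: () => "Available: help, uptime, skills, curl /api/metrics",
  uptime: () => "up 42 days (illustrative static output)",
  "curl /api/metrics": async () => {
    // The real handler fetches /api/metrics and formats cluster stats
    // with a source tag (prometheus vs mock); stubbed here.
    return JSON.stringify({ source: "mock", pods: 12 });
  },
};

async function runCommand(input: string): Promise<string> {
  const handler = COMMANDS[input.trim()];
  if (!handler) return `command not found: ${input}`;
  return handler();
}
```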

Try it on the home page — type help to see available commands, or curl /api/metrics for live cluster data.

Grafana Faro: Browser-to-Server Distributed Tracing

Server-side OTEL only tells half the story. `@grafana/faro-react` and @grafana/faro-web-tracing add browser RUM — capturing Web Vitals (LCP, CLS, FCP, TTFB), unhandled JS errors, and console output. More importantly, TracingInstrumentation injects a W3C `traceparent` header on every fetch and XHR call the browser makes. That header becomes the parent context for the server-side OTEL span — so a single trace in Tempo spans from button click in the browser all the way through the Next.js route handler.
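The header itself follows the W3C Trace Context format, `version-traceid-spanid-flags`. A minimal parser (a sketch, not the site's code) shows exactly what the server receives as parent context:

```typescript
// Parse a W3C traceparent header per the Trace Context spec:
// e.g. "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
function parseTraceparent(
  header: string
): { traceId: string; spanId: string; sampled: boolean } | null {
  const m = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/.exec(header);
  if (!m) return null;
  const [, , traceId, spanId, flags] = m;
  // All-zero trace or span ids are invalid per the spec.
  if (/^0+$/.test(traceId) || /^0+$/.test(spanId)) return null;
  return { traceId, spanId, sampled: (parseInt(flags, 16) & 1) === 1 };
}
```

The server-side OTEL SDK does this parsing automatically via its propagator; the sketch is just to make the wire format concrete.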

Grafana Alloy as the Faro collector

The browser SDK posts RUM payloads to a faro.receiver running in Grafana Alloy on port 8027. Alloy fans these out: traces → Tempo (OTLP/HTTP, port 4318), logs → Loki gateway. Alloy's own HTTP server lives on port 12345 — the Faro receiver runs on a separate ClusterIP Service (alloy-faro) to avoid the bind conflict. The Cloudflare Tunnel exposes this ClusterIP publicly so the browser SDK can POST from any device.

NEXT_PUBLIC_FARO_URL must be set at build time

NEXT_PUBLIC_* env vars are inlined into the client bundle at next build. If NEXT_PUBLIC_FARO_URL is only set in the pod's runtime env, Faro silently no-ops in the browser — the variable was never inlined. The Dockerfile declares ARG NEXT_PUBLIC_FARO_URL and exports it as ENV before npm run build, and the GitHub Actions workflow passes it via build-args: from a repo-level variable.
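A sketch of the Dockerfile pattern described above; the surrounding stages and ordering are illustrative:

```dockerfile
# Build-time inlining: ARG arrives from CI build-args, ENV makes it
# visible to `npm run build`, which bakes NEXT_PUBLIC_* values into
# the client bundle. Runtime env alone is too late.
ARG NEXT_PUBLIC_FARO_URL
ARG NEXT_PUBLIC_APP_VERSION
ENV NEXT_PUBLIC_FARO_URL=$NEXT_PUBLIC_FARO_URL \
    NEXT_PUBLIC_APP_VERSION=$NEXT_PUBLIC_APP_VERSION
RUN npm run build
```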

GitOps Deployment: GitHub Actions → Argo CD → k3s

Every push to main triggers a GitHub Actions workflow: build a Docker image with buildx, push it to ghcr.io/anthonypruiz/anthonypaulruiz tagged with the short SHA, then commit the new image tag directly into k8s/deployment.yaml. Argo CD watches the repo, detects the manifest change, and syncs the Deployment. Rolling update, zero downtime.

There's no KUBECONFIG in CI and no kubectl set image — the cluster never has to be reachable from the outside. Argo CD pulls from GitHub, not the other way around. automated: prune + selfHeal keeps the cluster in sync even if someone manually edits a resource. The ArgoCD Application manifest also excludes itself from sync to prevent the sync loop problem.

One CI detail worth noting: NEXT_PUBLIC_FARO_URL and NEXT_PUBLIC_APP_VERSION are passed as Docker build-args from repo-level Actions variables, because Next.js inlines NEXT_PUBLIC_* vars into the client bundle at build time — runtime env alone leaves the browser SDK with undefined. The Dockerfile declares both as ARG and re-exports them as ENV before npm run build to make this work.

.github/workflows/deploy.yml (excerpt)
- name: Update image in deployment manifest
  run: |
    sed -i "s|image: ghcr.io/anthonypruiz/anthonypaulruiz:.*|image: ghcr.io/anthonypruiz/anthonypaulruiz:sha-${{ github.sha }}|" k8s/deployment.yaml
    git commit -am "chore(deploy): bump image to sha-${{ github.sha }}"
    git push
Four steps, zero KUBECONFIG: buildx → ghcr.io push → manifest commit → Argo CD auto-sync. The cluster pulls; CI never pushes.

Lessons Learned

  • Tailwind v4 has no config file. All theme tokens live in globals.css under @theme inline. The import is @import "tailwindcss", not the old @tailwind base/components/utilities directives. It took adjustment, but the result is cleaner — one file owns the design system.
  • ssr: false dynamic imports are illegal in Server Components. Next.js 16 will throw at build time. below-fold.tsx exists as a 'use client' wrapper solely to own the next/dynamic calls for Observability, InfraDiagram, and TerminalSection — the three heaviest below-fold sections. The Server Component passes data down as props; the client wrapper handles the lazy loading.
  • next-themes 0.4.x logs a React 19 console warning about an inline <script> tag in a component. The script is the FOUC prevention mechanism. It's harmless, it's upstream, and it cannot be suppressed from this codebase. Learn to ignore it.
  • Pino + OTEL mixin gives you free correlation. The mixin() function in lib/logger.ts reads the active OTEL span context on every log call and injects traceId, spanId, dd.trace_id, and dd.span_id. Every log line is automatically correlated to its trace — no call-site changes, no manual ID threading.
  • React Flow custom node data must extend Record<string, unknown>. TypeScript's structural typing means any interface that doesn't satisfy the index signature will produce an error when passed to Node<YourDataType>[]. Extend the interface and move on.
  • Turbopack can isolate modules into separate worker contexts. A globalThis-based ring buffer will appear empty to a route handler running in a different worker than the span exporter. Direct span collection inside the route handler is the only reliable path for the live trace demo.

This site isn't a finished artifact — it's a running system. When I add a new node to the cluster, the infra diagram gets a new card. When I wire up a new exporter, the Observability section gets a new chart. When I solve an interesting infrastructure problem, it becomes a blog post. The stack evolves the same way production systems do: incrementally, with real constraints, and with observability turned on the whole time.

If you're a hiring manager evaluating whether I can build and operate distributed systems: everything you've seen on this site — the traces, the metrics, the logs, the deployment pipeline — is the same work I'd bring to your platform. It's not a recreation. It's the real thing.

#Next.js #React #Prometheus #OpenTelemetry #Kubernetes #ArgoCD #Tailwind #GitOps

Source code on GitHub →

Anthony Paul Ruiz