Anthony Paul Ruiz

$ whoami

About

Observability Engineer · DevOps · Platform Engineering

I got into technology through curiosity and necessity — starting as a self-taught IT tech at a Manhattan law firm and growing into owning everything from network infrastructure to digital advertising to custom software. That scrappiness never left. Today I design and operate observability platforms that give engineering teams real signal in production: the kind of dashboards and alerting pipelines that mean your on-call engineer wakes up to a clear picture, not a wall of noise. If something needs building, I build it — I've replaced vendor products with purpose-built internal tools, made the case for better solutions and then delivered them solo, and integrated data sources across stacks that were never meant to talk to each other. I care about systems that are honest — metrics that reflect reality, runbooks that actually get used, and infrastructure that can explain itself. When I'm not building for clients, I'm running a self-hosted k3s homelab on Proxmox, shipping this portfolio site through a GitOps pipeline, and looking for ways to push observability deeper into every layer of the stack.

United States - MI - (Remote)

anthonypruiz@outlook.com

anthonypaulruiz.com

$ ls -la tools/

Grafana

Datadog

OpenTelemetry

Prometheus

Kubernetes

Docker

Argo CD

Helm

OpsGenie

Terraform

GitHub Actions

Proxmox

AWS

Cloudflare

NGINX

PostgreSQL

Redis

Home Assistant

Linux

Python

Bash

VMware

ClickHouse

InfluxDB

Ubuntu

Slack

Next.js

Zendesk

Postman

10+

years in production infrastructure

~80%

of support tickets at StrongArm created automatically via telemetry and Grafana OnCall

Built twice

made the case for better tooling, then delivered it solo — at two separate companies

$ cat resume.yaml

Experience

Observability Administrator

TekStream Solutions

Feb 2023 — Mar 2026

Observability SRE embedded within the digital enablement org at a top-5 U.S. restaurant chain, enabling telemetry adoption (logs, traces, metrics) across ~30 engineering teams and hundreds of Kubernetes microservices. A primary POC for all Grafana, Datadog, OpenTelemetry, and OpsGenie troubleshooting across the digital enablement sub-org. Sole engineer behind a custom Next.js Health Status Dashboard and Slack Incident Command bot, integrating OpsGenie, Okta, and the Slack API, that replaced Statuspage.io with a purpose-built incident coordination platform.

Top 5

US restaurant chain client

~30

teams enabled on Datadog, Grafana & OTel

Solo Engineer

custom incident platform (web app dashboard + bot)

› Expert in Datadog, Grafana, and OpenTelemetry; served as a primary implementation and troubleshooting POC for the entire digital enablement sub-org across ~30 teams and hundreds of Kubernetes microservices
› Single-handedly built a custom Next.js Health Status Dashboard on Vercel, integrating Okta SSO, OpsGenie, and the Slack API, after being asked to improve on Statuspage.io's limitations
› Built a Slack Incident Command bot from scratch: auto-routes responders per component via OpsGenie, manages dedicated incident channels, and posts live status updates to the health dashboard
› Established and enforced telemetry collection best practices across ~30 teams: tagging standards, cardinality-driven cost controls, and sampling compliance
› Rotated on-call as Managed Incident Response coordinator, leading cross-functional stakeholder communications according to structured runbooks

Datadog · Grafana · OpenTelemetry · OpsGenie · Kubernetes · Argo CD · Next.js · Vercel · Okta · Slack API · Statuspage.io · Incident Management · SRE · AWS

Senior Engineer

StrongArm Technologies

Jun 2021 — Feb 2023

Lead support engineer at an IoT workplace safety startup, responsible for all client troubleshooting, onboarding, and serving as the technical voice of the product on sales calls. Made the case for Grafana and single-handedly built the company's entire observability stack from scratch, integrating ClickHouse SQL, in-house APIs, and inventory databases into a unified operational picture. Wired Grafana OnCall to auto-generate Zendesk tickets with full client context, eliminating black-box troubleshooting and automating 80%+ of all support ticket creation.

~80%

of tickets automated via Grafana OnCall

0 → prod

Grafana stack built solo from scratch

Solo Engineer

analytics that rivaled the full data team

› Single-handedly built the Grafana observability stack from scratch, integrating ClickHouse SQL, Prometheus, InfluxDB, BigQuery, Databricks, in-house APIs, and inventory databases to surface real-time warehouse and device health in a single pane of glass
› Built Grafana OnCall routing that auto-generated Zendesk tickets with full context (warehouse name, dock SAT number, contact info mapped from payload), automating 80%+ of all ticket creation with no alert fatigue
› Built analytics that rivaled the data analytics team's visibility into device and worksite usage patterns by merging data sources in ways the org had never done before
› Served as technical voice of the product on sales calls and primary troubleshooting POC for all client issues; deployed on-site to critical client locations to resolve issues directly
› Mentored a growing support team and authored virtually all internal and client-facing documentation for device setup, troubleshooting, and platform integration

Grafana · Grafana OnCall · Datadog · Prometheus · InfluxDB · ClickHouse · BigQuery · Databricks · SQL · AWS · Zendesk · WorkspaceOne · Looker · IoT

Jr. System Admin / Full Stack Developer

PFR IT Consulting Co

Apr 2014 — Jun 2021

Grew from part-time IT technician to the sole technical, web, and marketing owner of a prominent 50-person Manhattan law firm. Shortly after joining, identified $10K/month in wasted Google Ads spend and took over full management of the firm's $25K/month account. Rebuilt the website from scratch (page load cut from 10s+ to under 2 seconds), administered the Windows AD network, and independently built a suite of custom C# tools and a Grafana-backed monitoring platform covering security, productivity, and system health.

<2s

page load (was 10s+)

$10K/mo

wasted ad spend identified and eliminated

Solo

IT, web, marketing, and custom tooling

› Identified $10K/month in wasted Google Ads spend immediately upon analysis; took over full $25K/month account management and significantly improved ROI
› Rebuilt the firm's website on a self-managed Ionos VM (PHP/HTML/JS), cutting page load from 10s+ to under 2 seconds; configured Cloudflare for DNS, image optimization, and email security (DMARC, SPF, DKIM)
› Devised a guerrilla marketing campaign (branded gear on construction sites plus a social media hashtag) that generated client leads and still drives visibility years later
› Built a custom C# client-arrival notification system (SQL Express backend, multi-user UI) that alerted paralegals in real time and gave the office manager a live wait-time view, eliminating missed client visits
› Built a Grafana-backed security and productivity monitoring suite from scratch: Zabbix metrics, keystroke logging, delta-based screenshot capture (15s interval, no duplicates), idle time tracking, login/logout events, and server room temperature, all surfaced in a single dashboard with scheduled daily PDF reports

Grafana · Zabbix · C# / .NET · SQL Express · Cloudflare · Active Directory · PHP · Google Ads · Adobe Suite · SEO

$ ls -la ~/projects

Projects

Enterprise health status dashboard showing component health grid with inline 30/60/90-day uptime metrics, admin controls, and recent incident history

8 min read

Engineering / Observability

Enterprise Health Status Dashboard

Custom Next.js web app built as an internal replacement for Statuspage.io after the client needed customizations the vendor couldn't provide. Integrates Okta SSO, OpsGenie, and the Slack API for fully configurable component health views and role-based stakeholder access. Deployed on Vercel.

Next.js · Vercel · Okta · OpsGenie · +1

Auto-created incident channel showing bot messages, responder invites, and 30-minute update reminders

7 min read

Engineering / Incident Management

Slack Incident Command Bot

Solo-built Slack bot that automates incident response coordination via the OpsGenie API. Auto-routes responders into dedicated incident channels by component, manages stakeholder communications, and posts live status updates directly to the Health Status Dashboard, eliminating manual triage and enforcing a consistent incident process.

Slack API · OpsGenie · AWS

Grafana dashboard showing IoT device health, warehouse sync status, and automated OnCall ticket routing

9 min read

Observability

Enterprise IoT Observability with Grafana

Made the case for Grafana at an IoT workplace safety startup and built the entire observability stack solo, integrating 7 data sources: ClickHouse SQL, Prometheus, InfluxDB, BigQuery, Databricks, an in-house API, and an inventory database. Wired Grafana OnCall to auto-generate fully contextualized Zendesk tickets, automating 80%+ of all support ticket creation.

Grafana · Grafana OnCall · ClickHouse · InfluxDB · +3

Grafana office overview dashboard showing per-workstation keystroke counters, active window titles, idle time, and screenshot thumbnails

9 min read

Engineering / IT

Staff Monitoring & Productivity Platform

Built a comprehensive internal monitoring platform from scratch for a 50-person law firm. Integrated Zabbix metrics, a custom keystroke logger (daily counts + application/website tracking), a delta-based screenshot service (15s interval, skips duplicates), login/logout event tracking, idle time analysis, and server room temperature monitoring via web scrape — all surfaced in a Grafana dashboard with scheduled daily PDF reports to management.

Grafana · Zabbix · C# · SQL Express · +1

Before and after split view: left shows a 10-second WebPageTest waterfall on shared hosting, right shows sub-2-second load with Cloudflare CDN and dedicated VM

6 min read

Engineering / Web

Law Firm Website Rebuild

Rebuilt the website for Gorayeb & Associates — a prominent Manhattan personal injury law firm — from a slow, unoptimized site to a fast, well-ranked one. Migrated to a self-managed Ionos VM running PHP/HTML/JS, cutting page load from 10+ seconds to under 2 seconds. Configured Cloudflare for DNS, image compression, and email security (DMARC, SPF, DKIM). Built conversion-focused landing pages that supported a $25K/month Google Ads account.

PHP · Cloudflare · Google Ads · SEO · +1

Proxmox web UI showing all seven VMs — four k3s nodes, Windows workstation, Home Assistant, and Ubuntu dev — each with live CPU and memory stats

14 min read

Infrastructure / Homelab

Self-Hosted k3s Homelab

4-node k3s cluster on Proxmox with a production-grade observability stack: Prometheus, Loki, Tempo, and Grafana. Browser RUM via Grafana Faro → Grafana Alloy → Tempo, enabling end-to-end distributed traces from browser click to server span. Exposed via Cloudflare Tunnel with zero open inbound ports. Full GitOps: Argo CD syncs from GitHub.

k3s · Proxmox · Argo CD · Cloudflare · +3

anthonypaulruiz.com — portfolio site hero screenshot

8 min read

Engineering / DevOps

Portfolio Site

Next.js 16 + React 19 portfolio with live Prometheus health checks, React Flow infra diagram, and an xterm.js interactive terminal. Deployed via GitOps: GitHub Actions → ghcr.io → Argo CD → k3s.

Next.js · React · Prometheus · Docker · +1
