anthonypaulruiz.com

Anthony Paul Ruiz

Observability Engineer · DevOps · Platform Engineering

Summary

Observability and DevOps engineer with 10+ years of experience enabling engineering teams with production-grade telemetry, building observability platforms from scratch, and engineering custom incident response tooling. Expert in Grafana, Datadog, and OpenTelemetry — with a consistent track record of turning black-box troubleshooting into automated, proactive operations and building the purpose-built tools that make it possible.


Experience

Observability Administrator · TekStream Solutions
Feb 2023 — Mar 2026 · Remote
  • Expert in Datadog, Grafana, and OpenTelemetry — served as a primary implementation and troubleshooting POC for the entire digital enablement sub-org across ~30 teams and hundreds of Kubernetes microservices
  • Single-handedly built a custom Next.js Health Status Dashboard on Vercel — integrating Okta SSO, OpsGenie, and the Slack API — after being asked to improve on Statuspage.io's limitations
  • Built a Slack Incident Command bot from scratch: auto-routes responders per component via OpsGenie, manages dedicated incident channels, and posts live status updates to the health dashboard
  • Established and enforced telemetry collection best practices across 30+ teams: tagging standards, cardinality-driven cost controls, and sampling compliance enforcement
  • Rotated on-call as Managed Incident Response coordinator — led cross-functional stakeholder communications following structured runbooks

Datadog · Grafana · OpenTelemetry · OpsGenie · Kubernetes · Argo CD · Next.js · Vercel · Okta · Slack API · Statuspage.io · Incident Management · SRE · AWS

Senior Engineer · StrongArm Technologies
Jun 2021 — Feb 2023 · New York, NY (Hybrid)
  • Single-handedly built the Grafana observability stack from scratch, integrating ClickHouse SQL, Prometheus, InfluxDB, BigQuery, Databricks, in-house APIs, and inventory databases to surface real-time warehouse and device health in a single pane of glass
  • Built Grafana OnCall routing that auto-generated Zendesk tickets with full context (warehouse name, dock SAT number, contact info mapped from payload) — driving 80%+ of all ticket creation automatically with no alert fatigue
  • Built analytics that rivaled the data analytics team's visibility into device and worksite usage patterns by merging data sources in ways the org had never done before
  • Served as technical voice of the product on sales calls and primary troubleshooting POC for all client issues; deployed on-site to critical client locations to resolve issues directly
  • Mentored growing support team and authored virtually all internal and client-facing documentation for device setup, troubleshooting, and platform integration

Grafana · Grafana OnCall · Datadog · Prometheus · InfluxDB · ClickHouse · BigQuery · Databricks · SQL · AWS · Zendesk · Workspace ONE · Looker · IoT

Jr. System Admin / Full Stack Developer · PFR IT Consulting Co
Apr 2014 — Jun 2021 · Manhattan, NY
  • Identified $10K/month in wasted Google Ads spend immediately upon analysis; took over full $25K/month account management and significantly improved ROI
  • Rebuilt the firm's website on a self-managed Ionos VM (PHP/HTML/JS) — cut page load from 10s+ to under 2 seconds; configured Cloudflare for DNS, image optimization, and email security (DMARC, SPF, DKIM)
  • Devised a guerrilla marketing campaign — branded gear on construction sites + social media hashtag — that generated client leads and still drives visibility years later
  • Built a custom C# client-arrival notification system (SQL Express backend, multi-user UI) — alerted paralegals in real time and gave the office manager a live wait-time view, eliminating missed client visits
  • Built a Grafana-backed security and productivity monitoring suite from scratch: Zabbix metrics, keystroke logging, delta-based screenshot capture (15s interval, no duplicates), idle time tracking, login/logout events, and server room temperature — all surfaced in a single dashboard with scheduled daily PDF reports

Grafana · Zabbix · C# / .NET · SQL Express · Cloudflare · Active Directory · PHP · Google Ads · Adobe Suite · SEO


Projects

Enterprise Health Status Dashboard

Custom Next.js web app built as an internal replacement for Statuspage.io after the client needed customizations the vendor couldn't provide. Integrates Okta SSO, OpsGenie, and the Slack API for fully configurable component health views and role-based stakeholder access. Deployed on Vercel.

Next.js · Vercel · Okta · OpsGenie · Slack API

Slack Incident Command Bot

Solo-built Slack bot that automates incident response coordination via the OpsGenie API. Auto-routes responders into dedicated incident channels by component, manages stakeholder communications, and posts live status updates directly to the Health Status Dashboard — eliminating manual triage and driving consistent incident process.

Slack API · OpsGenie · AWS

Enterprise IoT Observability with Grafana

Made the case for Grafana at an IoT workplace safety startup and built the entire observability stack solo, integrating 7 data sources: ClickHouse SQL, Prometheus, InfluxDB, BigQuery, Databricks, an in-house API, and an inventory database. Wired Grafana OnCall to auto-generate fully contextualized Zendesk tickets, driving 80%+ of all support ticket creation automatically.

Grafana · Grafana OnCall · ClickHouse · InfluxDB · BigQuery · Prometheus · IoT

Staff Monitoring & Productivity Platform

Built a comprehensive internal monitoring platform from scratch for a 50-person law firm. Integrated Zabbix metrics, a custom keystroke logger (daily counts + application/website tracking), a delta-based screenshot service (15s interval, skips duplicates), login/logout event tracking, idle time analysis, and server room temperature monitoring via web scrape — all surfaced in a Grafana dashboard with scheduled daily PDF reports to management.

Grafana · Zabbix · C# · SQL Express · Windows

Law Firm Website Rebuild

Rebuilt the website for Gorayeb & Associates — a prominent Manhattan personal injury law firm — from a slow, unoptimized site to a fast, well-ranked one. Migrated to a self-managed Ionos VM running PHP/HTML/JS, cutting page load from 10+ seconds to under 2 seconds. Configured Cloudflare for DNS, image compression, and email security (DMARC, SPF, DKIM). Built conversion-focused landing pages that supported a $25K/month Google Ads account.

PHP · Cloudflare · Google Ads · SEO · Linux

Self-Hosted k3s Homelab · https://anthonypaulruiz.com

4-node k3s cluster on Proxmox with a production-grade observability stack: Prometheus, Loki, Tempo, and Grafana. Browser RUM via Grafana Faro → Grafana Alloy → Tempo, enabling end-to-end distributed traces from browser click to server span. Exposed via Cloudflare Tunnel with zero open inbound ports. Full GitOps: Argo CD syncs from GitHub.

k3s · Proxmox · Argo CD · Cloudflare · OpenTelemetry · Tempo · Grafana Faro

Portfolio Site · https://anthonypaulruiz.com

Next.js 16 + React 19 portfolio with live Prometheus health checks, React Flow infra diagram, and an xterm.js interactive terminal. Deployed via GitOps: GitHub Actions → ghcr.io → Argo CD → k3s.

Next.js · React · Prometheus · Docker · Kubernetes


Skills

Observability & Monitoring: Grafana, Grafana OnCall, Grafana Faro, Grafana Alloy, Datadog, Prometheus, OpenTelemetry, Tempo, Loki, OpsGenie, Statuspage.io, Zabbix
Orchestration & GitOps: Kubernetes (k3s), Docker, Argo CD, Helm, GitHub Actions
Infrastructure & Cloud: AWS, Proxmox, Cloudflare, Terraform, NGINX, Vault, Linux
Languages & Scripting: Python, Next.js, Rust, Bash, SQL, TypeScript, C#, PowerShell
Data & Storage: PostgreSQL, Redis, MinIO, InfluxDB, ClickHouse, BigQuery, Databricks

Education

Ocala, Florida · GED, High School Equivalent
Acquired 2004
In-Person Web Development Course · 4-Week Intensive, HTML · CSS · JavaScript
New York, NY
cPanel · Course, Linux System Administration
anthonypaulruiz.com · linkedin.com/in/anthonypaulruiz · github.com/anthonypaulruiz