Writing

Blog

Articles on performance testing, SRE, observability, and AI systems performance.

AI Performance Jun 23, 2026

Measuring LLM Inference Performance: Latency, Throughput, and Cost

The metrics that actually matter for LLM serving — TTFT, TPOT, tokens/sec, and cost per request — how they trade off, and how to load-test an inference endpoint.

SRE Jun 20, 2026

SLOs and Error Budgets: A Practical Guide for Performance Engineers

How to turn vague reliability goals into measurable SLIs, SLOs, and error budgets — and how that math directly governs release velocity and on-call load.

SRE Jun 6, 2026

Chaos Engineering: Testing Reliability by Breaking Things on Purpose

What chaos engineering is, how to run a safe first experiment, and how it connects to error budgets and SLOs.

SRE Jun 5, 2026

Capacity Planning with the Universal Scalability Law

How the Universal Scalability Law models contention and coherency penalties to predict where a system's throughput will actually peak and decline.

SRE Jun 5, 2026

Writing Incident Response Runbooks That Actually Get Used

What makes an incident runbook useful under real pressure versus one that gets ignored, with a practical structure to follow.

SRE Jun 5, 2026

On-Call Best Practices That Prevent Burnout

Practical on-call practices — rotation design, alert quality, and post-incident follow-up — that keep on-call sustainable rather than dreaded.

SRE Jun 4, 2026

Building a Genuine Blameless Postmortem Culture

What separates a blameless postmortem culture that actually works from one that's blameless only in name, and how to build the former.

SRE Jun 4, 2026

SRE vs DevOps vs Platform Engineering: What Actually Differs

A clear-eyed comparison of SRE, DevOps, and platform engineering as organizational approaches, and where the real differences (and overlaps) lie.

SRE Jun 4, 2026

Toil Reduction: Identifying and Eliminating Operational Toil

What SRE means by 'toil,' how to identify it systematically, and a practical framework for deciding what to automate first.

SRE Jun 3, 2026

Monitoring vs Observability: A Practical Distinction

What actually separates monitoring from observability beyond the buzzword, and why the distinction matters for debugging unknown failure modes.

SRE Jun 3, 2026

Runbooks vs Playbooks: A Useful Distinction for Incident Response

The practical difference between an incident runbook and a playbook, and when each is the right tool to write and maintain.

SRE Jun 3, 2026

SRE Team Topologies: Embedded, Centralized, and Hybrid Models

How SRE teams are typically organized — embedded, centralized, and hybrid models — and the trade-offs each makes between context and consistency.

AI Performance Jun 2, 2026

Continuous Batching: How Modern LLM Servers Achieve High Throughput

How continuous batching differs from static batching, why it's central to vLLM and TGI's throughput advantage, and what it costs individual requests.

AI Performance Jun 2, 2026

Prompt Caching and KV Cache: Why Repeated Context Gets Cheaper

How prompt/KV caching reduces cost and latency for repeated context in LLM applications, and when it actually helps versus doesn't.

AI Performance Jun 2, 2026

Benchmarking Vector Database Performance for RAG Systems

What actually matters when benchmarking a vector database for retrieval-augmented generation — recall, latency, and indexing trade-offs.

AI Performance Jun 1, 2026

GPU Utilization for LLM Model Serving: What to Actually Measure

Why GPU utilization percentage alone is a misleading metric for LLM serving, and what to measure instead to understand real efficiency.

AI Performance Jun 1, 2026

Quantization and Performance Trade-offs in LLM Serving

How model quantization (INT8, INT4, and similar) trades accuracy for latency, throughput, and memory savings, and how to evaluate the trade-off.

AI Performance Jun 1, 2026

Optimizing RAG Pipeline Latency: Where the Time Actually Goes

A breakdown of where latency accumulates in a retrieval-augmented generation pipeline, and the highest-leverage places to optimize it.

AI Performance May 31, 2026

Benchmarking Open-Source LLM Inference Servers: vLLM, TGI, and Ollama

A practical comparison framework for benchmarking vLLM, TGI, and Ollama, and what each is actually optimized for.

AI Performance May 31, 2026

Load Testing LLM APIs: A Practical Guide

How to design a load test specifically for LLM APIs, covering realistic prompt distributions, streaming measurement, and concurrency sweeps.

AI Performance May 31, 2026

Token Economics 101: Understanding LLM API Cost Structure

How LLM API pricing actually works — input vs output token pricing, why output costs more, and the practical levers for controlling cost.

Observability May 30, 2026

OpenTelemetry for Performance Engineers: A Practical Start

A practical introduction to OpenTelemetry's traces, metrics, and logs, and how to instrument a service for meaningful performance analysis.

Observability May 30, 2026

Prometheus and Grafana Basics for Performance Monitoring

How Prometheus's pull-based metrics model and PromQL work, and how to build Grafana dashboards that actually answer performance questions.

Observability May 30, 2026

The RED Method: Rate, Errors, Duration for Service Monitoring

How the RED method gives a simple, consistent framework for monitoring any request-driven service, and how it complements the USE method.

Observability May 29, 2026

Distributed Tracing Explained: Spans, Context, and Sampling

How distributed tracing actually works under the hood — spans, trace context propagation, and sampling strategies — explained from first principles.

Observability May 29, 2026

Structured Logging Best Practices for Debuggable Systems

Why structured logging (key-value fields, not free text) matters for debugging at scale, and practical conventions worth adopting.

Observability May 29, 2026

The USE Method: Utilization, Saturation, Errors for Resource Monitoring

How Brendan Gregg's USE method systematically checks system resources for performance bottlenecks, and how it pairs with the RED method.

Observability May 28, 2026

APM Tool Comparison: Datadog, Dynatrace, and New Relic

A practical comparison of how Datadog, Dynatrace, and New Relic approach instrumentation, AI-assisted root-cause analysis, and pricing.

Observability May 28, 2026

Building SLO Dashboards That Drive Real Decisions

How to design an SLO dashboard that actually informs the ship/freeze decisions error budgets are meant to enable, not just display pretty graphs.

Concepts May 28, 2026

Little's Law for Performance Engineers, with Worked Examples

An intuitive explanation of Little's Law (L = λW), how to derive concurrency, throughput, or latency from the other two, and common misuses.

Concepts May 27, 2026

Amdahl's Law for Performance Engineers

How Amdahl's Law quantifies the maximum speedup parallelization can achieve when part of a workload is inherently serial, with practical examples.

Concepts May 27, 2026

Queueing Theory Basics for Performance Engineers

An accessible introduction to queueing theory concepts — utilization, queue length, and waiting time — and why systems get dramatically slower near full utilization.

Concepts May 27, 2026

Why p99 Matters: Understanding Latency Percentiles

What latency percentiles actually mean, why averages systematically mislead, and the pitfalls of averaging or combining percentiles incorrectly.

Concepts May 26, 2026

Concurrency vs Parallelism: A Clear Distinction

The genuine technical distinction between concurrency and parallelism, why it matters for performance reasoning, and common confusions.

Concepts May 26, 2026

Garbage Collection Tuning Fundamentals

The core concepts behind garbage collector tuning — generational collection, pause times, and throughput trade-offs — applicable across JVM, .NET, and Go.

Concepts May 26, 2026

Throughput vs Latency: Why You Usually Can't Maximize Both

Why throughput and latency often trade off against each other through batching, and how to decide where to sit on that trade-off curve.

Performance Testing May 25, 2026

Setting Performance Budgets for Web Applications

How to set practical performance budgets (page weight, load time, Core Web Vitals) and enforce them in CI before they regress in production.

Performance Testing May 25, 2026

Spike, Stress, and Soak Testing: Three Different Questions

How spike testing, stress testing, and soak testing each answer a different reliability question, and why a single load test can't cover all three.

Observability May 25, 2026

Synthetic Monitoring vs Real User Monitoring (RUM)

How synthetic monitoring and real user monitoring complement each other for understanding production performance, and when to rely on each.

Performance Testing May 24, 2026

How to Write a Performance Test Plan That Answers a Real Question

A practical template for a performance test plan that starts from a specific question, not a generic checklist of tools and metrics.

Performance Testing May 24, 2026

A Pre-Launch Performance Testing Checklist

A practical checklist to run through before considering a performance testing effort complete and ready to inform a launch decision.

Performance Testing May 24, 2026

Top Performance Testing Mistakes (and How to Avoid Them)

A roundup of the most common, costly performance testing mistakes across tools and teams, distilled into a practical avoidance guide.

Concepts May 23, 2026

Understanding Apdex: Translating Latency into User Satisfaction

What the Apdex score actually measures, how to set its thresholds meaningfully, and its limitations as a single summary metric.

SRE May 23, 2026

How to Calculate an Error Budget, Step by Step

A step-by-step walkthrough of calculating an error budget from an SLO, with worked examples at different reliability targets.

Concepts Dec 28, 2023

What is DevPerfOps? Performance as a First-Class Citizen

DevPerfOps extends DevOps by embedding performance engineering across the entire delivery pipeline — shifting it left from a pre-release gate to a continuous, shared responsibility.
