SRE & Platform Engineering Books

Books by Marcel Koert on building repeatable reliability, platform capabilities, and engineering culture.

SRE • DevOps • Platform Engineering

Featured Books

Essential SRE Articles

By Marcel Koert

202+ Articles • E-book

A comprehensive collection of 200+ articles covering Site Reliability Engineering, DevOps, Platform Engineering, Cloud Computing, and Artificial Intelligence. This e-book compiles essential knowledge for SREs, DevOps engineers, and IT professionals seeking to master modern infrastructure and operations.

Topics covered:

  • Site Reliability Engineering (SRE) fundamentals
  • DevOps practices and methodologies
  • Platform Engineering best practices
  • Cloud architecture and operations
  • Artificial Intelligence in IT operations
  • Monitoring, alerting, and observability
  • Incident management and postmortems

Essential SRE: Way of Working

By Marcel Koert

2026 • Hardcover/E-book

You do not build a real SRE team with alerts, dashboards, and good intentions.

You build it with clear ownership, practical process, operational discipline, and enough humanity to stop the work from turning into chaos, blame, and burnout.

Essential SRE: Way of Working is a practical Site Reliability Engineering book for SRE engineers who want to build something real. Not a slide deck version of SRE. Not a title change with no substance. A real team with clear ways of working, strong reliability habits, and processes that help instead of getting in the way.

This book goes straight at the reality of the job. Incidents are messy. Priorities collide. Toil grows quietly. Teams drift into firefighting. Communication breaks down under pressure. Reliability suffers long before the dashboards admit it.

That is why this book focuses on the part that matters most: how an SRE team actually works.

Inside this book, you will learn how to:

  • Build a real SRE team with clear roles and ownership
  • Create processes that improve reliability instead of adding bureaucracy
  • Use SLOs, SLIs, and error budgets in a practical way
  • Reduce toil, firefighting, and operational noise
  • Improve incident response, communication, and decision-making
  • Create accountability without losing trust and common sense
  • Balance strong engineering standards with the human reality of the work

This is not theory for perfect organisations with unlimited time and budget. It is for SRE engineers working in real environments, with real pressure, real systems, and real people.

If you want to help build an SRE team with good process, sharp operational thinking, and humanity at its core, this book will help you do it.

Topics Covered

SRE Fundamentals

Books on SLOs, SLIs, SLAs, error budgets, and the core principles of Site Reliability Engineering.

DevOps Practices

Culture, practices, and tools for bridging development and operations teams.

Cloud Architecture

Design patterns, best practices, and lessons from cloud-native transformations.

AI in IT Operations

Leveraging AI, ML, and automation to improve system reliability and operational efficiency.

Need practical SRE consulting for your engineering organisation?

MeloMar IT helps teams define meaningful SLOs, reduce toil, and build platform capabilities that actually support engineering teams.