SRE

Role

The SRE (Site Reliability Engineer) is an engineer dedicated to the reliability of production systems: availability, performance, latency, capacity, incident management and operations automation.

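The practice was formalised at Google and popularised by the book Site Reliability Engineering (O'Reilly, 2016). It introduced concepts now considered standard: SLIs (service level indicators, the measured quantities such as request success rate or latency), SLOs (service level objectives, the target values set for those indicators), the error budget (the amount of unreliability an SLO tolerates, e.g. 0.1% of requests for a 99.9% target, which arbitrates between shipping new features and stabilisation work), toil (repetitive manual work that should be automated), and blameless post-mortems.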
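To make the error budget concrete, here is a minimal sketch in Python. It assumes an availability SLO measured as the share of successful requests over a fixed window. The function names, the 99.9% target, the request counts and the release rule are illustrative assumptions, not taken from the book or from any particular monitoring tool.

```python
# Minimal sketch of error budget arithmetic for an availability SLO.
# All names, targets and counts are hypothetical, for illustration only.

def allowed_downtime_minutes(slo: float, window_days: int) -> float:
    """Minutes of total unavailability the SLO tolerates over the window."""
    return (1.0 - slo) * window_days * 24 * 60

def budget_remaining(slo: float, total_requests: int, failed_requests: int) -> float:
    """Fraction of the request-based error budget still unspent (negative if overspent)."""
    allowed_failures = (1.0 - slo) * total_requests
    return 1.0 - failed_requests / allowed_failures

def can_ship_features(remaining: float) -> bool:
    """Hypothetical, deliberately simple policy: ship features while budget remains,
    otherwise prioritise stabilisation work."""
    return remaining > 0

# A 99.9% target over 30 days tolerates about 43.2 minutes of downtime.
print(round(allowed_downtime_minutes(0.999, 30), 1))
# 1,000,000 requests at 99.9% allow about 1,000 failures; 250 failures spend a quarter.
remaining = budget_remaining(0.999, 1_000_000, 250)
print(round(remaining, 2), can_ship_features(remaining))
```

Real teams usually express the release rule as a written error budget policy (for example, a feature freeze once the budget is exhausted) rather than a single boolean; the function above only shows how the budget turns an SLO into a concrete trade-off.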

The SRE role is close to DevOps, but with a stronger emphasis on applying software engineering to operations (Google caps operational work at 50% of an SRE's time, leaving the rest for engineering and automation) and on quantified, measured reliability objectives.

Related terms

DevOps

Product

ORM

AI Engineer

CTO

Cybersecurity Analyst
