this post was submitted on 13 Jun 2023
Site Reliability Engineering
As someone who's done both jobs: there is a difference between Site Reliability Engineering and DevOps, but in day-to-day practice it's mostly a distinction without a difference. Site Reliability Engineering focuses on the health of a site: monitoring it, and improving its uptime and its resilience to incidents. DevOps is a cultural movement that focuses on how the code gets up and online, and how the site or app goes through its entire lifecycle.
The net effect is that you basically do the same actions as both a Site Reliability Engineer and a DevOps Engineer, but you focus on different aspects of the problem. For DevOps, you're focusing on the overall efficiency of the deployment pipeline, ensuring that the whole thing has healthchecks, consistent monitoring and control mechanisms, version-controlled tracking of code, and automation facilitating all of that throughout. There should be cooperation between Dev and Ops (and Security in DevSecOps) throughout the entire lifecycle of the application from end to end, and the "success metric" for DevOps is how seamless that cooperation is -- it's a cultural shift, not a directly mechanical one.
SRE takes the approach that site reliability is paramount, and that the best way to achieve it is to implement the same practices described above, but only the parts that make sense, as the need arises. If poor deployment processes hurt site reliability, the SRE tackles the deployment processes. If the site can't roll back a change because there's no version control, then it's time to put your source files under version control. You only really implement what you need as it comes up, or when you foresee it causing problems in the near-ish future.
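To make the rollback point a bit more concrete, here's a rough sketch of a health-check-gated deploy. Everything in it is made up for illustration: the `Deployer` class, the `health_check` callable, and the version strings are hypothetical, not any real tool's API. The only idea it's meant to show is that you can fall back to a known-good version only if you've been tracking versions in the first place.

```python
from typing import Callable, List, Optional


class Deployer:
    """Toy in-memory deployer (hypothetical, for illustration only)."""

    def __init__(self, health_check: Callable[[str], bool]) -> None:
        # health_check is an assumed callable: version -> True if healthy.
        self.health_check = health_check
        # Versions that passed their health check, oldest first.
        self.history: List[str] = []

    @property
    def current(self) -> Optional[str]:
        """The last known-good version, or None if nothing is deployed."""
        return self.history[-1] if self.history else None

    def deploy(self, version: str) -> bool:
        """Try `version`; keep it only if the health check passes."""
        if self.health_check(version):
            self.history.append(version)
            return True
        # Health check failed: history is untouched, so `current` still
        # points at the last known-good version. That is the rollback.
        return False


if __name__ == "__main__":
    # Pretend any version ending in "-broken" fails its health check.
    deployer = Deployer(health_check=lambda v: not v.endswith("-broken"))
    deployer.deploy("v1.0")
    deployer.deploy("v1.1-broken")
    print(deployer.current)  # prints v1.0
```

In a real shop the "history" would be your version control and artifact store rather than a Python list, but the shape of the problem is the same: no record of previous versions, no rollback.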
SRE is, in my experience, a much more engineering-centric approach to solving the same problems that DevOps arose to address, but both wind up making very similar design and tooling choices. I would not say either is necessarily "wrong" beyond whether it works in the shop's culture or not.
Managed poorly, SRE can become a cowboy fun-park, where only the cool and shiny things get worked on while things like safety get deprioritized. DevOps can become an ivory tower, devoid of connection to the real world, and that's assuming the shop isn't just offloading all of the app maintenance from the Devs onto the Ops or SRE folks in a "Your problem now" situation. Both are bad, but both can be recovered from, and neither is an engineering problem that engineers can fix with tooling.
I've said this for years, but in my 16 years of doing IT professionally, I've never encountered an actual engineering problem. Every problem I've ever encountered has been a cultural or organizational problem presenting itself as an engineering problem, and fixing cultural problems with engineering solutions is like trying to solve hardware problems with software: expensive, and prone to weird failure-states.
Tl;dr: They start from different places, but they end up at pretty much the same healthy design patterns. Close enough that you should be able to swap from one to the other fairly readily.