loudwhisper

joined 2 years ago
[–] loudwhisper 1 points 1 year ago (7 children)

Of course the problem is solved, but that doesn't mean that the solution is easy. Also, distributed protocols still need to work on top of a complicated network and under real-life constraints such as performance (to list just one). One bug, misconfiguration, or oversight and you have a problem.

To give an example, I remember a Kafka cluster with 5 replicas completely shitting its pants for 6h to rebalance data during a planned maintenance where one node was brought offline. It caused one of the longest outages to date, with the websites that relied on it offline. Was it our fault? Was it a misconfiguration? A bug? It doesn't matter: it's a complex system that was implemented by people, and probably something was missed.

Technology is implemented by people, and complexity increases the chances of mistakes. I'm not sure this can be argued against.

Making it harder to identify SPOFs means you might miss one, and that means having liabilities and still having scenarios where your system can crash, in addition to paying quite a lot to build a resilience that you don't achieve.

A single instance with 2 failure scenarios (disk failure and network failure, to give an example) is not more fragile than a distributed system with 20 failure scenarios. Failure scenarios and SPOFs can have compensating controls and be mitigated successfully. A complex system where these can't be fully identified can't have compensating controls, and the residual risk might be much higher. So yes, a single disk is more likely to fail than 3 disks at once, but this doesn't give the whole picture.
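
To put rough numbers on this, here is a minimal Python sketch. Every probability in it is a made-up, illustrative assumption (not a measurement), the scenarios are assumed to be independent, and the helper name `p_any_failure` is hypothetical. It only shows how the arithmetic works: 3 disks failing at once is far less likely than 1 disk failing, yet the chance that at least one failure scenario occurs depends on how many scenarios exist and how likely each one is.

```python
# All probabilities here are invented for illustration only.
# Scenarios are assumed independent, which real systems rarely are.

def p_any_failure(probabilities):
    """Probability that at least one of the given independent scenarios occurs."""
    p_none = 1.0
    for p in probabilities:
        p_none *= 1.0 - p
    return 1.0 - p_none

P_DISK = 0.03      # assumed yearly chance that one disk fails
P_NETWORK = 0.01   # assumed yearly chance of a network failure

# The narrow comparison: one disk failing vs. three disks failing together.
one_disk = P_DISK
three_disks_at_once = P_DISK ** 3

# The wider comparison: every failure scenario each setup is exposed to.
single_instance = p_any_failure([P_DISK, P_NETWORK])   # 2 scenarios
distributed = p_any_failure([0.002] * 20)              # 20 assumed scenarios

print(f"one disk fails:           {one_disk:.6f}")
print(f"three disks fail at once: {three_disks_at_once:.6f}")
print(f"single instance, any of 2 scenarios: {single_instance:.4f}")
print(f"distributed, any of 20 scenarios:    {distributed:.4f}")
```

With different assumed values the comparison can go either way, which is the point: the disk-only comparison says little until all the failure scenarios are identified and counted.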

[–] loudwhisper 11 points 1 year ago* (last edited 1 year ago)

Thanks, that is a very good observation! I will try to sneak an edit later today where I can add some appendix about acronyms and abbreviations.

Edit:

While it might not look great, I have added at the bottom an Appendix with all (hopefully, I might have missed some) acronyms and abbreviations. Thanks for the suggestion!

[–] loudwhisper 1 points 1 year ago (9 children)

I wish it worked like that, but I don't think it does. Connecting clouds means introducing many complex problems: data synchronization and avoiding split-brain scenarios, a far more complex network setup, stateful storage that needs to take into account all the quirks and peculiarities of all services across all clouds, service accounts and permissions that need to be granted and segregated for all of them, and much more. You may gain resilience in some areas, but you introduce a lot more things that can fail, be misconfigured or compromised.

Plus, a complex setup makes it harder by definition to identify SPOFs, especially considering it's very likely nobody in the workforce is going to be an expert in all the clouds in use.

To keep using your simile of the disks, a single disk with a backup might be a better solution for many people, considering you otherwise might need a RAID controller that can fail and all the knowledge to handle and manage a RAID array properly, in addition to paying for 4 or 5 times the storage. Obviously this is just to make a point; I don't actually think that RAID 5 vs JBOD introduces complexity comparable to what multi-cloud architecture adds over single-cloud.

[–] loudwhisper 1 points 1 year ago (11 children)

Complexity brings fragility. It's not about doing the job right; it's that "right" means having to deal with such a level of complexity, such a high number of moving parts and configuration options, that the bar is set very high.

Also, I would argue that a large number of organizations don't actually need the resilience that they pay a very high price for.

[–] loudwhisper 1 points 1 year ago

Yeah, in general you can't mess up the building blocks from the PoV of availability or internal design. That is true, since you are outsourcing it. You can still mess them up from other points of view (think about how many companies got breached due to misconfigured S3 buckets).

[–] loudwhisper 15 points 1 year ago (4 children)

Thanks! I went and tried on my phone, and setting Firefox to light mode indeed causes that horrendous and unreadable result. I will need to figure out a way, eventually, to provide an alternative light scheme.

[–] loudwhisper 2 points 1 year ago (1 children)

cognito auth

But then at that point you are already vendor-locked, right? Running on bare EC2 instances and taking more control into your own hands (vs using even more AWS-specific services) is going to help very little when your whole user management is tied to a specific provider.

[–] loudwhisper 11 points 1 year ago (7 children)

How do you get this? Anything that tries to force a light mode?

This is how the site is supposed to look (there is no light/dark theme selection):

[–] loudwhisper 3 points 1 year ago (3 children)

But then I would ask, what's the point of paying 10-20x per computing unit at that point? If you just use EC2 instances, all AWS offers you is an API to manage them. Is it worth the premium? Besides, you will still need to mess with a lot of other services (VPCs, SGs, etc.) anyway.

What's the selling point in your opinion?

[–] loudwhisper 3 points 1 year ago (2 children)

Is that what you get with Cloud? Because there are still a million ways to shoot yourself in the foot. The main difference is that the single genius doesn't need to implement things him/herself, but decisions still need to be made and fragile setups can still be built.

Imagine an EC2 instance in a satellite account performing some business-critical function with an instance role whose custom IAM policy allows it to act in another account. Clouds are not giving you good engineering; they are giving you premade building blocks, and you can absolutely still make a mess with those. Even more, the complexity and the immense portfolio of features allow very creative ways to build very low-quality systems.

I think you can have good, boring, simple systems built by engineers. With or without Cloud services.

[–] loudwhisper 5 points 1 year ago

I feel you very much. Security work is also somewhat similar.

I think this takes away basically the component that made it interesting: understanding what you are doing to the point that you can build stuff.

it's about learning specific applets and features to click on and running down daily and weekly checklists.

Well said.

[–] loudwhisper 2 points 1 year ago (1 children)

This post must be fun with that one... 150+ instances in various contexts of "cloud".
