This is a discussion between John Ousterhout and Robert Martin, who in "Clean Code" advocated omitting comments and splitting code into extremely small functions. Ousterhout takes Martin to task by asking him to explain an algorithm that Martin presented in his book "Clean Code", an algorithm that generates a list of prime numbers. It turns out that Martin essentially does not understand his own code because of the way it is written - and his rewrite even introduces a performance regression!

Ousterhout: Do you agree that there should be comments to explain each of these two issues?

Martin: I agree that the algorithm is subtle. Setting the first prime multiple as the square of the prime was deeply mysterious at first. I had to go on an hour-long bike ride to understand it.

[...] The next comment cost me a good 20 minutes of puzzling things out.

[...] I refactored that old algorithm 18 years ago, and I thought all those method and variable names would make my intent clear -- because I understood that algorithm.

[Martin presents a rewrite of the algorithm]

Ousterhout: Unfortunately, this revision of the code creates a serious performance regression: I measured a factor of 3-4x slowdown compared to either of the earlier revisions. The problem is that you changed the processing of a particular candidate from a single loop to two loops (the increaseEach... and candidateIsNot... methods). In the loop from earlier revisions, and in the candidateIsNot method, the loop aborts once the candidate is disqualified (and most candidates are quickly eliminated). However, increaseEach... must examine every entry in primeMultiples. This results in 5-10x as many loop iterations and a 3-4x overall slowdown.
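
To make the structural point concrete, here is a minimal, hypothetical sketch of such an incremental sieve (this is neither Martin's nor Ousterhout's actual code, and all names below are invented). Version A disqualifies a candidate inside a single loop that can stop immediately; version B splits the same work into two passes, the first of which has to walk the entire list of prime multiples for every candidate.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the code from "Clean Code"; all names are invented.
public class PrimeSieveSketch {

    // Version A: a single loop per candidate. It stops as soon as the candidate
    // is disqualified and never looks at primes whose square exceeds the candidate.
    static List<Integer> primesSingleLoop(int count) {
        List<Integer> primes = new ArrayList<>(List.of(2));
        List<Integer> multiples = new ArrayList<>(List.of(4)); // multiples.get(i) is a multiple of primes.get(i)
        for (int candidate = 3; primes.size() < count; candidate++) {
            boolean prime = true;
            for (int i = 0; i < primes.size(); i++) {
                int p = primes.get(i);
                if (p * p > candidate) break;          // no remaining prime can disqualify the candidate
                int m = multiples.get(i);
                while (m < candidate) m += p;          // advance this prime's multiple lazily
                multiples.set(i, m);
                if (m == candidate) { prime = false; break; } // early abort: candidate is composite
            }
            if (prime) { primes.add(candidate); multiples.add(candidate * candidate); }
        }
        return primes;
    }

    // Version B: the same work split into two full passes per candidate. The first
    // pass must examine every entry in the multiples list, even when the candidate
    // could already have been rejected after the very first entry.
    static List<Integer> primesTwoLoops(int count) {
        List<Integer> primes = new ArrayList<>(List.of(2));
        List<Integer> multiples = new ArrayList<>(List.of(4));
        for (int candidate = 3; primes.size() < count; candidate++) {
            raiseMultiplesToAtLeast(candidate, primes, multiples);  // always touches every entry
            if (!equalsAnyMultiple(candidate, multiples)) {
                primes.add(candidate);
                multiples.add(candidate * candidate);
            }
        }
        return primes;
    }

    private static void raiseMultiplesToAtLeast(int candidate, List<Integer> primes, List<Integer> multiples) {
        for (int i = 0; i < multiples.size(); i++) {
            int m = multiples.get(i);
            while (m < candidate) m += primes.get(i);
            multiples.set(i, m);
        }
    }

    private static boolean equalsAnyMultiple(int candidate, List<Integer> multiples) {
        for (int m : multiples) {
            if (m == candidate) return true;           // this loop can abort early...
        }
        return false;                                  // ...but the pass above already paid the full cost
    }

    public static void main(String[] args) {
        List<Integer> a = primesSingleLoop(1000);
        List<Integer> b = primesTwoLoops(1000);
        System.out.println("results agree: " + a.equals(b) + ", 1000th prime: " + a.get(999));
    }
}
```

The sketch only mirrors the structural difference described above, not the measured numbers. The initial multiples also record the "square of the prime" invariant that Martin found mysterious: multiples of a prime p below p*p are already covered by smaller primes, so p only matters from p*p onward.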

It gets even more hilarious when one considers where Martin took the algorithm from, and who designed it originally:

Martin took it from Donald E. Knuth's seminal 1984 article on Literate Programming:

http://www.literateprogramming.com/knuthweb.pdf

In this article, Knuth argues that the source code of a program should ideally be understood as a by-product of an explanation directed at humans, one that lays out the reasoning, the design, the invariants and so on. He presents a system which can automatically extract and assemble the program source code from such a text.

Even more interesting, the algorithm was not invented by Knuth himself. It was published in 1970 by Edsger Dijkstra in his "Notes on Structured Programming" (with a second edition in 1972).

In this truly fascinating and timeless text, Dijkstra writes about software design by top-down problem decomposition, about proving properties of program modules by analysis, about using invariants to compose larger programs from smaller algorithms and to design new data types, and about how all of this makes software maintainable. He uses the prime number generation algorithm as an extended example, and he stresses multiple times that both the architecture and the invariants need to be documented in their own right to make the code understandable. (If you want that feeling of standing on the shoulders of giants, read what Dijkstra, Knuth, and also Tony Hoare and Niklaus Wirth wrote.)

So, Robert Martin is proven wrong here. He does not even understand, and could not properly maintain, the code from his own book. Nor did he realize how hard his code is for others to understand.

(I would highly recommend Ousterhout's book.)

top 23 comments
[–] IanTwenty@lemmy.world 4 points 6 hours ago

Another thank you for posting this, made my day.

I have read and followed a fair amount of Uncle Bob's work but was not aware of Ousterhout till now. Bob says that around the time the Clean Code book was written there was an anti-comment sentiment about, and this matches my own experience. I agree with Ousterhout that it is taken too far in Bob's book, though.

I wonder if there is another factor at play: some people/cultures prefer high-context communication and some less. Bob seems clearly to prefer low context, i.e. the burden is on the (code) reader to educate themselves, whereas John makes it a matter of professional behaviour that we make the next reader's work as simple as possible by commenting code as appropriate.

Surely it's better to assume high context is needed and provide it (within reason) than to cater only for low context. As Bob discovered, he himself became a low-context person when he returned to his own code after some time had passed.

[–] melsaskca@lemmy.ca 8 points 10 hours ago (2 children)

The real truth is in the code. Comments can become unmaintained or outdated, especially if several people are amending the code over time. As far as coding goes: if you ask 10 programmers to write a specific function, you will see 10 different ways to achieve the exact same result. I'm exaggerating a bit, but this is more or less true. Coding style, like beauty, is subjective.

Comments can become outdated, but so can variable and function names. "Self-documenting" code often relies on appropriate naming, yet this is also subject to rot as the code develops.

[–] HaraldvonBlauzahn@feddit.org 3 points 9 hours ago (1 children)

Comments can become unmaintained or outdated, especially if several people are amending the code over time.

In that case, they can be fixed like any other bug. For me, incorrect or incomplete documentation - especially user-facing documentation - is just a bug, too.

[–] BatmanAoD@programming.dev 1 points 7 hours ago

Fair, but it's one that the typical tools for finding bugs (tests and static analysis) cannot actually help with.

[–] squaresinger@lemmy.world 14 points 13 hours ago (1 children)

I'm totally with Ousterhout here! Thanks for posting this great discussion!

The problem with the "Clean Code" approach of overdecomposition is that it doesn't abstract the code away in meaningful ways. The code is still there, and if the methods are entangled you still need to know all of it to debug it or to avoid bugs. So I still need to keep 500 lines of code in mind, but now they aren't all in one file where I can easily follow them; instead they are spread over 40 files, each containing just 1-2 line methods.
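
For illustration, a deliberately exaggerated, hypothetical sketch (all names made up) of the kind of entanglement meant here: the "shredded" version consists only of 1-2 line methods, but they communicate through shared mutable fields, so you still have to read all of them to understand any one of them, while the plain version keeps the same logic in a single method with only local state.

```java
// Hypothetical, deliberately exaggerated example: tiny methods entangled
// through shared mutable fields, so none can be understood in isolation.
class ShreddedInvoice {
    private double[] prices;
    private double total;
    private double discounted;

    double totalWithDiscount(double[] prices) {
        this.prices = prices;
        sumPrices();
        applyDiscount();
        return discounted;
    }

    private void sumPrices() { total = 0; for (double p : prices) addPrice(p); }
    private void addPrice(double p) { total += p; }
    private void applyDiscount() { discounted = isLargeOrder() ? reduce() : total; }
    private boolean isLargeOrder() { return total > 100.0; }
    private double reduce() { return total * 0.9; }
}

// The same behaviour as one self-contained method: all state is local and the
// control flow can be read top to bottom.
class PlainInvoice {
    static double totalWithDiscount(double[] prices) {
        double total = 0;
        for (double p : prices) total += p;
        return total > 100.0 ? total * 0.9 : total; // 10% discount on large orders
    }

    public static void main(String[] args) {
        double[] prices = {60.0, 70.0};
        System.out.println(new ShreddedInvoice().totalWithDiscount(prices)); // 117.0
        System.out.println(PlainInvoice.totalWithDiscount(prices));          // 117.0
    }
}
```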

I'm also very much against "Clean Code"'s recommendations on comments. In the end it leads either to no documentation at all, or to documentation lost somewhere in Confluence that nobody ever reads or updates because it's not where it's needed.

Getting developers to read and update documentation is not an easy task, so the easier it is to find and update the documentation the more likely it is that the documentation is actually used. And there is no easier-to-access place for documentation than in comments right in the code. I really like Javadoc-style documentation since it easily explains the interface right where it's needed and neatly integrates with IDEs.
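
As a minimal, hypothetical example of what I mean (the interface and all names are invented for this sketch), a Javadoc comment can state the parts of the contract that the signature alone cannot express, right where a caller will look, and IDEs will show it on hover:

```java
import java.time.Instant;
import java.util.List;

interface BookingRepository {

    /** Minimal placeholder type for this sketch. */
    record Booking(String id, Instant start, Instant end) {}

    /**
     * Returns the bookings that overlap the given interval.
     *
     * @param from start of the interval, inclusive
     * @param to   end of the interval, exclusive; must be after {@code from}
     * @return bookings ordered by start time; never {@code null}, possibly empty
     * @throws IllegalArgumentException if {@code to} is not after {@code from}
     */
    List<Booking> findBookingsOverlapping(Instant from, Instant to);
}
```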

[–] Thorry@feddit.org 3 points 11 hours ago* (last edited 11 hours ago) (2 children)

There are a couple of things I do agree with in regards to comments in code. They aren't meant as a replacement for documentation. Documentation is still required to explain the more abstract, overview kind of stuff, known limitations, etc. If your class has 3 pages of text in comments at the top, that text would probably be better off in the documentation. When working in large teams there are often people who need to understand what the code can and can't do, how edge cases are handled and so on, but who can't read actual code. By writing proper documentation, a lot of questions can be avoided, and it often helps the coders as well to gain a better understanding of the system. Writing doc blocks in a manner that can be extracted into documentation helps a lot as well, but I feel it also provides an easy way out of writing actual documentation. Of course, depending on the situation this might not matter or one might not care; it's something that comes up more when working in large teams.

Just like writing code, writing proper comments is a bit of an art. I've very often seen developers be way too verbose, commenting almost every line with the literal thing the next line does. Anyone who can read the code can see what it does. What we can't see is why it does this, or why it doesn't do it in some other obvious way. This is something you see a lot with AI-generated code, probably because a lot of the training material consisted of tutorials where every line is explained so that people learning can follow along.
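
A small, hypothetical side-by-side of the two kinds of comment (the retry scenario is made up for this sketch):

```java
import java.util.function.Supplier;

// Hypothetical sketch contrasting a "what" comment with a "why" comment.
class RetryExample {

    // "Why" comment (worth keeping): the upstream gateway is known to fail on the
    // first request right after a deployment (a made-up reason for this sketch),
    // so the single retry below is deliberate, not an accident.
    private static final int MAX_ATTEMPTS = 2;

    static boolean fetchWithRetry(Supplier<Boolean> call) {
        for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
            // "What" comment (adds nothing the code doesn't already say):
            // invoke the call and return true if it succeeded.
            if (call.get()) {
                return true;
            }
        }
        return false;
    }
}
```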

This also ties in with keeping comments updated and accurate when changing code. If the comment and the code don't match, which one is true? I've worked on legacy codebases in the past where the comments were almost always traps. The code didn't match the comments at all, sometimes obviously so, but most of the time only very subtly. We were always left guessing: was the implementation meant to do what the comment said, and the difference just a mistake? The codebase was riddled with bugs, so that's likely. Or was the code changed later on purpose and the comment neglected?

Luckily, these days we have good tooling around source control: feature branches, and pull requests with support for discussion and annotation. That way the origin of a change is usually traceable. And code review can be applied before the change is merged, so mistakes like neglected comments can be caught.

Now, I don't agree with the principle of no comments at all. Just because a tool has some issues and limitations doesn't mean it gets banned from our toolbox. But writing actually useful comments is very hard and can be just as hard as writing good code. Comments also aren't a cheat card for writing bad code; the code needs to stand on its own and be enhanced by the comments.

It's one of those things we've been arguing about over my entire 40 year career. I don't think there is a right way. Whatever is best depends on the person, the team, the system etc. And like with many things, there are people who are good and people who suck. That's just the way the cookie crumbles.

[–] HaraldvonBlauzahn@feddit.org 4 points 9 hours ago* (last edited 9 hours ago)

Anyone who can read the code can see what it does. What we can't see is why it does this, or why it doesn't do it in some other obvious way. This is something you see a lot with AI-generated code, probably because a lot of the training material consisted of tutorials where every line is explained so that people learning can follow along.

This is also an important difference between C++ and Rust: Rust enforces correct ownership semantics, mutation xor sharing of values, absence of data races, and so on. In that sense, Rust has a stricter syntax: it puts things into the code itself which in C++ remain "meta-code".

Because of that, many or perhaps most correct Rust programs could be rewritten almost one-to-one as correct C++ programs, with the invariants that ensure correctness put into comments. But in practice it is extremely hard to write correct multi-threaded C++ programs that way from the start. One reason for this is that many C++ programmers lack both the means and the culture to annotate the correctness of their code: invariants and preconditions can of course be written down in comments, but in reality this is rarely done.

[–] squaresinger@lemmy.world 4 points 11 hours ago (1 children)

You are obviously right about the things you are saying. I was specifically talking about code documentation on a class/method level. User documentation, architecture documentation or other high-level documentation doesn't make sense in the code, of course.

I have seen similar levels of documentation to what you describe (every line, every call documented), but in flow charts in Confluence. That has the same issues as documenting every line of code in comments, only worse.

Just because a tool has some issues and limitations doesn't mean it gets banned from our toolbox.

This is very much it. Every tool can be abused and no tool is perfect. Code can have bugs and can be bad (and often both things happen). Should we now ban writing code?

If the comment and the code don't match, which one is true?

This can be true even with code alone. A while ago I found a bug in an old piece of code written by someone who left the company years ago.

The method causing the bug was named something like isNotX(), but inside, the function returned isX. In about half the places where the function was called, the returned value was assigned to a variable named isX, and in the other half the variable was named isNotX. So which is true?

A Javadoc-style comment could have acted as a parity check. Since comments are simpler to write than code, it's easier to correctly explain the purpose of a function there than in the code itself.

While in the example I referenced it was quite clear that something was wrong, this might not always be the case. Often the code looks consistent while actually being wrong. A comment can help to discern what's going on there.
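
A hypothetical reconstruction of that situation (all names made up), together with the kind of doc comment that would have pinned the intent down:

```java
// Hypothetical reconstruction of the naming bug described above; all names are made up.
class Availability {
    private final boolean isX;

    Availability(boolean isX) { this.isX = isX; }

    /**
     * A doc comment like this pins down the intended meaning, so a mismatch
     * between the name, the body and the callers becomes detectable as a bug.
     *
     * @return {@code true} if this item is NOT X (the negation of {@code isX})
     */
    boolean isNotX() {
        return isX; // bug: contradicts both the method name and the doc comment
    }
}
```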

Another example of that, from the same project:

In the project there were bookings and prebookings. We had a customer-facing REST endpoint called "getSomeSpecialBookings" (it wasn't actually called that, but the important thing is that this endpoint returned a special subset of bookings). The other "get...Bookings" endpoints would only return bookings and not prebookings, but this special endpoint returned both bookings and prebookings. A customer complained about that, so we fixed the "bug", and from then on the endpoint only returned bookings.

(There was no comment anywhere and we couldn't find anything relevant in Confluence.)

Directly after the release, another customer created a highest-priority escalation because this change broke their workflow.

Turns out, that endpoint only existed because that customer had asked for it, and the dev who implemented it had simply done what the customer requested without documenting it anywhere.

A comment would have been enough to explain that what this endpoint was doing was on purpose.
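
Something as small as this would have been enough (hypothetical wording and names, of course, not the real code):

```java
import java.util.List;

class SpecialBookingsEndpoint {

    /**
     * Returns bookings AND prebookings, unlike the other get...Bookings
     * endpoints. This is intentional: the endpoint was added on request of a
     * specific customer whose workflow depends on seeing prebookings here.
     * (Hypothetical wording for this sketch.)
     */
    List<String> getSomeSpecialBookings() {
        return List.of(); // placeholder body for the sketch
    }
}
```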

We all know that code tends to be bad, especially after the project has been running for a few years and has been through a few hands.

Why would anyone think that code is good enough to be the documentation?

Luckily, these days we have good tooling around source control: feature branches, and pull requests with support for discussion and annotation. That way the origin of a change is usually traceable.

Sadly, we also have non-technical people running procurement, so we keep switching tools because one is marginally cheaper, or because cloud is cool right now (or not cool anymore right now). Migrations suck, and then we end up with lost history.

[–] Thorry@feddit.org 3 points 11 hours ago (1 children)

we end up with lost history

Oof, I felt this in my soul

[–] squaresinger@lemmy.world 1 points 10 hours ago

A year or so before I started my current job, the team working on the project got split. Someone then decided that both teams should use different jira prefixes for tickets processed by each team. So they took all issues and automatically split them into two prefixes based on the people who implemented the ticket and renumbered everything. But they didn't do the same in Gitlab merge requests, and they didn't do it in git commit messages either.

So now git and gitlab reference all old tickets by their old numbering system, but there's no trace of these old numbers in Jira. It's close to impossible to find the Jira ticket mentioned in a git commit message.

Oh, and of course, nobody ever managed to properly link Jira and Gitlab (so that jira tickets contain the gitlab MRs, branches and commits) because for that you need a free Jira plugin and procurement wants a multi-page long description why this is needed, and it needs to be signed off by 5 people including the department lead and has to go through the whole procurement process before we can install that plugin.

[–] cr1cket@sopuli.xyz 11 points 13 hours ago (1 children)

Ousterhout's book was very much an eye-opener for me when I stumbled over it some years ago.

All those recommendations from the "Clean Code" corner always felt a tad off to me, but I couldn't put my finger on it. Then I read this small and nicely written book and suddenly the fog cleared up.

[–] HaraldvonBlauzahn@feddit.org 6 points 13 hours ago

For anyone interested, here is the link to his page about the book and where to get it:

https://web.stanford.edu/~ouster/cgi-bin/aposd.php

[–] Eddyzh@lemmy.world 7 points 13 hours ago (1 children)

Thank you for sharing this!

[–] Endmaker@ani.social 5 points 13 hours ago* (last edited 13 hours ago) (1 children)

My take / how I code:

Method length - when in doubt, and there's no time to do much thinking due to a tight deadline, shorter is better

(Method length shouldn't be the determining factor that goes into the design IMO. It should be other principles like cohesion. Shorter methods - on average - just happen to be side-effects of good design)

Comments - I generally leave no comment where the code is capable of expressing itself; I do leave comments where they seem helpful / necessary

Bundling vs TDD - no strong preference; both can be helpful depending on the situation

Bonus: the code for the prime number generator is atrocious. I did not bother reading the sections on it.

[–] HaraldvonBlauzahn@feddit.org 8 points 13 hours ago (1 children)

The whole point of software design is that any time invested into it pays back multiple times. "Not having time for it" is usually faulty thinking.

[–] Endmaker@ani.social 8 points 12 hours ago* (last edited 12 hours ago) (1 children)

"Not having time for it" is usually faulty thinking.

It absolutely is.

The whole point of software design is that any time invested into it pays back multiple times.

Try telling an unreasonable boss this.

[–] HaraldvonBlauzahn@feddit.org 6 points 12 hours ago* (last edited 12 hours ago) (1 children)

Possibly the reason why Bob Martin seems popular with managers: "Omitting comments makes code shorter. Shorter is simpler and therefore better. Also, typing in all those comments costs so much time, given that developers in large projects produce perhaps 50 lines of code per day!!".

One can see how that style of thinking leads to auto-generation of source code by LLMs ....

My advice in such situations is to do whatever increases your respect for yourself. You will see that it does not slow you down.

And one more thing: In 35 years, I have never been fired because I tried to write good code. It's all posturing because in the end, they need you.

[–] sukhmel@programming.dev 3 points 11 hours ago (1 children)

I think my manager is strongly Clean Code inclined. More than once they removed comments, because they will become outdated anyway (so there's no use explaining what is going on at all, right? Right‽)

[–] dandi8@fedia.io 4 points 10 hours ago (2 children)

If you need comments to explain what is happening (and not why it is happening), then you've got some bad code that needs refactoring.

[–] HaraldvonBlauzahn@feddit.org 3 points 9 hours ago

However: Because code can have many layers of abstraction, there can be many layers of "what", too.

[–] sukhmel@programming.dev 1 points 9 hours ago

“Why” comments are of course included in the “we don't need that” category.