This is a discussion between John Ousterhout and Robert Martin, who in "Clean Code" advocates omitting comments and splitting code into extremely small functions. Ousterhout takes him to task by asking Martin to explain an algorithm that Martin presented in "Clean Code", an algorithm that generates a list of prime numbers. It turns out that Martin essentially does not understand his own code because of the way it is written, and his rewrite even introduces a performance regression!

Ousterhout: Do you agree that there should be comments to explain each of these two issues?

Martin: I agree that the algorithm is subtle. Setting the first prime multiple as the square of the prime was deeply mysterious at first. I had to go on an hour-long bike ride to understand it.

[...] The next comment cost me a good 20 minutes of puzzling things out.

[...] I refactored that old algorithm 18 years ago, and I thought all those method and variable names would make my intent clear -- because I understood that algorithm.

[Martin presents a rewrite of the algorithm]

Ousterhout: Unfortunately, this revision of the code creates a serious performance regression: I measured a factor of 3-4x slowdown compared to either of the earlier revisions. The problem is that you changed the processing of a particular candidate from a single loop to two loops (the increaseEach... and candidateIsNot... methods). In the loop from earlier revisions, and in the candidateIsNot method, the loop aborts once the candidate is disqualified (and most candidates are quickly eliminated). However, increaseEach... must examine every entry in primeMultiples. This results in 5-10x as many loop iterations and a 3-4x overall slowdown.
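
To make the structural difference concrete, here is a deliberately simplified sketch. It is not the code from the book (class, method, and field names are invented, and the real algorithm additionally skips even numbers and starts each multiple at the prime's square); it only contrasts a single loop with an early exit against the two-pass shape Ousterhout describes, which must touch every tracked multiple before it can reject a candidate.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified structural sketch, not the code from the book.
public class MultiplesSketch {

    private final List<Integer> primes = new ArrayList<>();    // primes found so far
    private final List<Integer> multiples = new ArrayList<>(); // current multiple of each prime

    // Single pass: bump each tracked multiple up to the candidate and bail
    // out as soon as one of them hits it (candidate proven composite).
    boolean isPrimeSinglePass(int candidate) {
        for (int i = 0; i < primes.size(); i++) {
            int m = multiples.get(i);
            while (m < candidate) m += primes.get(i);
            multiples.set(i, m);
            if (m == candidate) return false; // early exit: most candidates die here
        }
        return true;
    }

    // Two passes: the first loop always walks the whole list before the
    // second loop gets a chance to reject the candidate.
    boolean isPrimeTwoPass(int candidate) {
        increaseEachMultipleToOrBeyond(candidate);   // no early exit possible
        return candidateEqualsNoMultiple(candidate); // early exit, but too late to help
    }

    private void increaseEachMultipleToOrBeyond(int candidate) {
        for (int i = 0; i < primes.size(); i++) {
            int m = multiples.get(i);
            while (m < candidate) m += primes.get(i);
            multiples.set(i, m);
        }
    }

    private boolean candidateEqualsNoMultiple(int candidate) {
        for (int m : multiples)
            if (m == candidate) return false;
        return true;
    }

    public static void main(String[] args) {
        MultiplesSketch sketch = new MultiplesSketch();
        for (int candidate = 2; candidate <= 50; candidate++) {
            boolean singlePass = sketch.isPrimeSinglePass(candidate);
            boolean twoPass = sketch.isPrimeTwoPass(candidate);
            if (singlePass != twoPass)
                throw new AssertionError("variants disagree at " + candidate);
            if (singlePass) {
                System.out.print(candidate + " ");
                sketch.primes.add(candidate);
                sketch.multiples.add(candidate);
            }
        }
        System.out.println(); // prints 2 3 5 7 ... 47
    }
}
```

Both variants produce the same primes; they differ only in how many list entries they touch per candidate, which is exactly where the measured slowdown comes from.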

It gets even more hilarious when one considers where Martin took the algorithm from, and who originally designed it:

Martin took it from Donald E. Knuth's seminal 1984 article on Literate Programming:

http://www.literateprogramming.com/knuthweb.pdf

In this article, Knuth explains that the source code of a program should ideally be understood as a by-product of an explanation directed at humans, one that lays out the reasoning, the design, the invariants, and so on. He presents a system that can automatically extract and assemble the program's source code from such a text.

Even more interesting, the algorithm was not invented by Knuth himself. It was published in 1970 by Edsger Dijkstra in his "Notes on Structured Programming" (with a second edition in 1972).

In this truly fascinating and timeless text, Dijkstra writes about software design by top-down problem decomposition, about proving properties of program modules by analysis, about using invariants to compose larger programs from smaller algorithms and to design new data types, and about how all of this makes software maintainable. He uses the prime number generation algorithm as an extended example, and he stresses multiple times that both the architecture and the invariants need to be documented in their own right to make the code understandable. (If you want that feeling of standing on the shoulders of giants, read what Dijkstra, Knuth, and also Tony Hoare and Niklaus Wirth wrote.)
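
To make that last point concrete, here is a tiny illustration in the spirit of Dijkstra's text (not a quote from it): when the invariant is written down next to the loop that maintains it, a reader can check each step against the stated property instead of reverse-engineering it from variable names.

```java
public class InvariantSketch {
    // Illustration only: the loop invariant is documented where it is maintained.
    // Computes the integer square root of n (the largest r with r*r <= n), n >= 0.
    static int integerSquareRoot(int n) {
        int r = 0;
        // Invariant: r*r <= n. The guard only lets r grow while the next value
        // still satisfies the invariant, so the invariant holds on every iteration.
        while ((long) (r + 1) * (r + 1) <= n) {
            r++;
        }
        // Invariant plus negated guard: r*r <= n < (r+1)*(r+1), so r is the answer.
        return r;
    }

    public static void main(String[] args) {
        System.out.println(integerSquareRoot(17)); // prints 4
    }
}
```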

So, Robert Martin is proven wrong here. He does not even understand, and could not properly maintain, the code from his own book. Nor did he realize that his code is hard for others to understand.

(I would highly recommend Ousterhout's book, "A Philosophy of Software Design".)

[–] squaresinger@lemmy.world 4 points 14 hours ago (1 children)

You are obviously right about the things you are saying. I was specifically talking about code documentation on a class/method level. User documentation, architecture documentation or other high-level documentation doesn't make sense in the code, of course.

I have seen documentation at the level you describe (every line, every call documented), but as flow charts in Confluence. That has the same issues as documenting every line of code in comments, only worse.

Just because a tool has some issues and limitations doesn't mean it gets banned from our toolbox.

This is very much it. Every tool can be abused and no tool is perfect. Code can have bugs and can be bad (and often both things happen). Should we now ban writing code?

If the comment and the code doesn't match with each other, which one is true?

This problem exists even with code alone. A while ago I found a bug in an old piece of code written by someone who left the company years ago.

The method causing the bug was named something like isNotX(), but inside, it returned isX. In about half the places where the method was called, the returned value was assigned to a variable named isX, and in the other half the variable was named isNotX. So which is true?

A javadoc-style comment could have acted as a parity check. Since comments are simpler to write than code, it's easier to correctly state the purpose of a function there than in the code itself.
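
A minimal sketch of what that could look like, with invented names (the real ones aren't in this post): the javadoc states the intended contract, so when the body contradicts it, the mismatch is at least detectable.

```java
public class ParitySketch {

    static class Booking {
        private final boolean cancellable;
        Booking(boolean cancellable) { this.cancellable = cancellable; }
        boolean isCancellable() { return cancellable; }

        /**
         * True if this booking must be treated as locked, i.e. NOT cancellable.
         * The comment is the redundant copy of the intent: if the body below
         * disagrees with it, you know something is wrong and which way it was
         * meant to go.
         */
        boolean isNotCancellable() {
            return isCancellable(); // bug: should be !isCancellable()
        }
    }

    public static void main(String[] args) {
        Booking b = new Booking(true);
        boolean isCancellable = b.isNotCancellable();    // how half of the callers read it
        boolean isNotCancellable = b.isNotCancellable(); // how the other half read it
        System.out.println(isCancellable + " / " + isNotCancellable);
    }
}
```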

While in the example I referenced it was quite clear that something was wrong, this might not always be the case. Often the code looks consistent while actually being wrong. A comment can help to discern what's going on there.

Another example of that, from the same project:

In the project there were bookings and prebookings. We had a customer-facing REST endpoint called something like "getSomeSpecialBookings" (it wasn't actually called that; the important thing is that it returned a special subset of bookings). The other "get...Bookings" endpoints returned only bookings, not prebookings, but this special endpoint returned both bookings and prebookings. A customer complained about that, so we fixed the "bug", and the endpoint then returned only bookings.

(There was no comment anywhere and we couldn't find anything relevant in Confluence.)

Directly after the release, another customer created a highest-priority escalation because this change broke their workflow.

Turns out that endpoint only existed because that customer had asked for it, and the dev who built it implemented exactly what the customer requested without documenting it anywhere.

A comment would have been enough to explain that what this endpoint was doing was on purpose.
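
A minimal sketch of the kind of comment that would have prevented the escalation (the names are invented, not the project's real code):

```java
import java.util.ArrayList;
import java.util.List;

public class BookingEndpointSketch {

    /**
     * Returns bookings INCLUDING prebookings. This intentionally differs from
     * the other get...Bookings endpoints: the endpoint was added for a specific
     * customer whose workflow depends on seeing prebookings here. Do not
     * "fix" this without checking with that customer first.
     */
    List<String> getSomeSpecialBookings(String customerId) {
        List<String> result = new ArrayList<>(findBookings(customerId));
        result.addAll(findPrebookings(customerId)); // deliberate, not a bug
        return result;
    }

    // Stubs so the sketch compiles; the real lookups don't matter for the point.
    private List<String> findBookings(String customerId) { return new ArrayList<>(); }
    private List<String> findPrebookings(String customerId) { return new ArrayList<>(); }
}
```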

We all know that code tends to be bad, especially after the project has been running for a few years and has been through a few hands.

Why would anyone think that code is good enough to be the documentation?

Luckily, these days we have good tooling around source control: feature branches, and pull requests with built-in discussion and annotation. That way the origin of a change is at least usually traceable.

Sadly, we also have non-technical people running procurement, so we keep switching tools because one is marginally cheaper or because cloud is cool right now (or not cool anymore right now), and migrations suck, and then we end up with lost history.

[–] Thorry@feddit.org 3 points 14 hours ago (1 children)

we end up with lost history

Oof, I felt this in my soul

[–] squaresinger@lemmy.world 1 points 13 hours ago

A year or so before I started my current job, the team working on the project got split. Someone then decided that the two teams should use different Jira prefixes for their tickets. So they took all the issues, automatically split them between two prefixes based on who had implemented each ticket, and renumbered everything. But they didn't update the GitLab merge requests, and they didn't update the git commit messages either.

So now git and GitLab reference all the old tickets by the old numbering scheme, but there's no trace of those old numbers in Jira. It's close to impossible to find the Jira ticket mentioned in a given git commit message.

Oh, and of course nobody ever managed to properly link Jira and GitLab (so that Jira tickets would show the GitLab MRs, branches, and commits), because that requires a free Jira plugin, and procurement wants a multi-page description of why it's needed, signed off by five people including the department lead, and the whole procurement process has to run before we can install that plugin.