I wrote a moderate-size project (about 10K LOC) in literate programming with a self-made tool. I also read a couple of books written in this manner, such as "C Interfaces and Implementations" by D. Hanson.
I would not call it a good approach. It is true that code cannot be fully self-documenting, so documentation is necessary. But literate documentation is not the right way to do it.
First, it is too crafty. Good documentation should be formal and have a definite, standard structure that is repeated across all similar projects. It should also take an encyclopedia-like form: short standalone pieces interlinked into a bigger whole, so you can start anywhere instead of reading from start to end.
Second, it is too programmatic. It has real code. But real code is not good to describe what is important. Try porting METAFONT to something else. METAFONT code is fully documented. In Pascal. And you want to draw it with JavaScript. It would be way more helpful to describe these algorithms without tying them to Pascal or any other specific notation. And then, once they are described in this form, add a document that maps them to a Pascal implementation.
rixed 5 hours ago [-]
Similarly, I have written several smaller programs using my own literate programming tool, and although the experience is interesting, as it forces you to think more deeply at every step, I came to the same conclusion: the end result feels hard to return to, and literate programming is no replacement for higher-level languages.
Propelloni 1 hours ago [-]
My first contact with literate programming was through Haskell, which has built-in support for it. It was great for learning a new language and a new domain. I abandoned the practice for day to day work, but occasionally return to it if I have to unlock a new domain. To me, it is a lot like rubber-ducking.
bilalq 11 hours ago [-]
I actually did some small hobby projects using Literate CoffeeScript a long time ago. Looking at the source code today, I can't help but feel like the proponents of literate programming were really onto something. I'm coming back a decade later, but I can easily see what's going on and why at a glance. Compared to many other projects that I've written in the past without documentation, it's a completely different vibe. The Gulpfile in particular is such a treat to read.
Yeah, it can look a bit repetitive if the code is already clear, but the context of why a thing is being done is still valuable. In the modern era with LLM tools, I'm sure it could be even more powerful.
ai-christianson 9 hours ago [-]
> Yeah, it can look a bit repetitive if the code is already clear, but the context of why a thing is being done is still valuable. In the modern era with LLM tools, I'm sure it could be even more powerful.
Is that because of literate programming, or is that because practicing literate programming made you focus more on writing high quality code and docs?
bilalq 7 hours ago [-]
I'd argue it's the same thing. When doing literate programming, I started by first writing a description of what I was going to do and why. Then I wrote the implementation. When I finished, I went back and updated the description to match what I'd done. Maybe I'd get the idea to improve the approach and repeat this for a few cycles.
But the specifics of the flow aside, it's the mindset difference that makes it all feel special. The docs are the primary artifact. The code is secondary.
In an era of Copilot-style inline suggestions, taking the time to write a lengthy description effectively feeds the prompt to get a better output.
naruhodo 5 hours ago [-]
I tend to write doc-comments before the functions they document, because it helps me think more clearly about what I want to happen - and sometimes causes me to entirely re-think my approach and abandon the function entirely.
I can definitely see such a practice improving LLM output.
Meanwhile, there are programmers that think comments are a "code smell".
txgvnn 11 hours ago [-]
If you are interested in literate programming, you should try Emacs. Packages such as org-mode and eev, and even Elisp itself, are well suited to literate programming. Example: https://www.youtube.com/watch?v=dljNabciEGg
xenodium 5 hours ago [-]
Emacs org babel is super handy and fun. Sometime ago, I took a peek at one of the supported languages to see if I could get Objective-C working. It was fairly straightforward. Since then, I added SwiftUI, LLMs, and Dall-E babel support.
Tangle: Extract the source code blocks and generate real working code files for further compilation or execution, eventually outside of Emacs.
Weave: Export the whole Org file as literate, human-readable documentation (generally in HTML or LaTeX)."
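To make the tangle operation concrete, here is a minimal sketch. It assumes Markdown-style fenced blocks rather than Org syntax, and the `tangle` helper is hypothetical; it is not how org-babel implements the step, only an illustration of "extract the code blocks and produce runnable source".

```python
import re

TICKS = "`" * 3  # spelled this way so the sketch can itself live inside a fence

# Matches an opening fence with a language tag, then the body up to the closing fence.
FENCE = re.compile(rf"^{TICKS}(\w+)\n(.*?)^{TICKS}", re.DOTALL | re.MULTILINE)


def tangle(document: str, lang: str) -> str:
    """Return the code from every fenced block tagged `lang`, in document order."""
    return "".join(body for tag, body in FENCE.findall(document) if tag == lang)


# A tiny literate document: prose interleaved with two code blocks.
doc = (
    "First we define a greeting.\n"
    f"{TICKS}python\n"
    "greeting = 'hello'\n"
    f"{TICKS}\n"
    "Then we shout it.\n"
    f"{TICKS}python\n"
    "shout = greeting.upper()\n"
    f"{TICKS}\n"
)

source = tangle(doc, "python")
namespace = {}
exec(source, namespace)  # the tangled output is ordinary runnable code
```

Weaving would be the opposite direction: keep the prose and render the blocks as formatted listings. Real tools also support named blocks that can be reordered, which this sketch deliberately leaves out.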
Another interesting implementation of literate programming with _bidirectional sync_ between documentation and source code is Entangled (https://entangled.github.io/). This allows you to use all your normal tooling on the normal source code files, and the changes are reflected back to your Markdown documentation files.
svieira 12 hours ago [-]
The fact that the actual implementation is in `lit` too is really helpful - getting to see how one would actually use this on a larger program does make it much more intriguing than the simple examples (and much more approachable than TeX itself).
Been around for a long time indeed. I first learned literate programming in college at Tufts, from Norman Ramsey. He wrote noweb[1], an early implementation of Knuth's ideas.
One of my first software contributions was indexing and cross-references for noweb in plain TeX. I had thought it was completely lost to time and hard drives but it seems someone has actually kept it around! Bonkers.
(The username being `partingr` suggests it was some time late 92 to mid 95 whilst I was at cs.man.ac.uk)
I used something called nuweb for a class project back in the late 90's (an implementation of the "bully algorithm" in Java).
I don't remember why I selected nuweb, other than it worked with any language, but it looks like it was inspired by noweb. I had learned about literate programming from studying TeX.
onair4you 12 hours ago [-]
Oh you beat me to it!
w10-1 11 hours ago [-]
This doesn't seem to provide any context for literate programming, or the core literate operations?
cf leo editor for literate programming in python [0]
Yes, markdown has code blocks, and notebooks have embedded code in documentation since Mathematica in the 1980's. It is possible to get IDE support in such blocks.
But for literate programming, weaving/tangling sources is needed to escape the file structure, particularly when the build system imposes its own logic, and sometimes one needs to navigate into the code. Leo shows how complicated the semantics of weaving can get.
Eclipse as an IDE was great because their editor component made it easy to manage the trick of one editor for many sources, and their markers provided landmarks for cross-source navigation and summaries.
> [...] and notebooks have embedded code in documentation since Mathematica in the 1980's.
Late 80's, very late ... but the concept of "notebooks" predates Mathematica by at least a decade (it was very common to embed structure in source code files with markup).
jostylr 12 hours ago [-]
I've been wondering if the AI coding agent world makes literate programming valuable again. I got into it with JavaScript being a mess prior to the modern changes. Needed a lot of workarounds. Then they improved the language and it felt like coding could be efficient with them. But if the programmer switched from coding to reviewing, maybe it would be good to be able to have various snippets, each with an explanation preceding it, and then verify them. Haven't tried it yet. But I do wonder.
seanwilson 11 hours ago [-]
Maybe I'm missing something but how often is the English in literate programming repeating what's already written in the code? Does it work for large projects where it's often hard to explain all the parts in a linear way in the style of an essay?
I avoid code comments where I can because English is way less precise than code, it's an extra chore to keep the comments and code in sync, and when the comments and code inevitably get out of sync it's confusing which one is the source of truth. Does literate programming sidestep this somehow? Or have benefits that outweigh this?
juliangmp 11 hours ago [-]
I'm not sure who first coined this idea or put it in a book or where I've read it, but for code comments I generally like the "explain why, not what" philosophy.
The "what" is answered by the code itself and should be easy enough to comprehend if your design is simple and your names meaningful.
The "why" is much more important. Why does this parser check for some magic numbers at this specific offset and change some parameters if it finds them? If you don't explain that it's because of e.g. compatibility with some legacy format, it's gonna be a mystery to the reader.
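A small sketch of what such a "why" comment might look like. Everything here is invented for illustration: the `LEGACY_MAGIC` marker, its offset, and the off-by-two length quirk are hypothetical and do not describe any real file format.

```python
import struct

# Hypothetical legacy marker and position, made up for this example.
LEGACY_MAGIC = b"LG"
LEGACY_MAGIC_OFFSET = 4


def read_record_length(header: bytes) -> int:
    """Read the little-endian record length stored in the first four bytes."""
    (length,) = struct.unpack_from("<I", header, 0)
    # WHY: files written by the (hypothetical) legacy exporter counted the
    # 2-byte marker as part of the record, so their stored length is 2 too
    # large. Without this note the check below looks like an arbitrary hack.
    if header[LEGACY_MAGIC_OFFSET:LEGACY_MAGIC_OFFSET + 2] == LEGACY_MAGIC:
        length -= 2
    return length
```

The code already says what happens (compare two bytes, subtract two); only the comment can say why.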
kstrauser 8 hours ago [-]
This is dead on. Similarly, please document the "why not". As in, "here's some code that implements what should be an easy idea in an unexpected way. Why didn't I do it the normal way? Because this one special business case would break, and our customer got mad and threatened to cancel."
billfruit 10 hours ago [-]
I prefer to add huge amounts of comments, explaining in as much detail as I can, sometimes it will be a mini-essay in there. I write most of it before I write the code. It helps me formulate the code better. Later it serves as explanatory text.
Usually the problem with comments is that there are too few of them.
seanwilson 9 hours ago [-]
> Usually the problem with comments is that there are too few of them.
I've worked in a few code bases where many of the comments could be removed by using better function names, better variables names, and breaking complex conditionals into named subexpressions/variables.
And there was a fair chance comments were misleading or noise e.g. `/* send the record for team A */ teamB.send(...`, `/* if logged in and on home page */ if (!auth.user && router.name === 'home') ...`, `/* connect to database */ db.connect()`. I'd much rather comments were used as a last resort as they're imprecise, can be bandaids for code that's hard to read, and they easily get out of sync with the code because they're not executed/tested.
A block of comments to explain high-level details of complex/important code, or comments to explain the why or gotchas behind non-obvious code are useful though.
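As a sketch of the "named subexpressions" idea, here is the logged-in/home-page check written so that the names carry what the comment used to say. The `Request` class is a hypothetical stand-in for the `auth` and `router` objects in the snippet above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Request:
    """Hypothetical stand-in for the auth/router state."""
    user: Optional[str]
    route_name: str


def should_show_dashboard(request: Request) -> bool:
    # Each condition gets a name, so no comment is needed to restate it,
    # and a flipped check (like the stray `!` above) is easier to spot.
    is_logged_in = request.user is not None
    is_on_home_page = request.route_name == "home"
    return is_logged_in and is_on_home_page
```

Unlike a comment, the names are part of the code that runs, so a test on `should_show_dashboard` exercises them too.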
monkeyelite 9 hours ago [-]
The primary benefit of literate programming is being able to change the order of presentation.
But even then my experience doesn’t match yours. So you have some code. Who decided it would be that way? Do you have a picture of how it should look? Can you share a link to where you got this information? What problem led you to do this non-obvious thing?
taeric 11 hours ago [-]
I think this certainly happens a fair bit. Not at all uncommon to have a section that largely says what is going to happen next, which, fair that what is going to happen is what happens.
I think where it shines, is where it helps you break the code up, without having to break it up in a way that makes sense for the computer. Show an outline, but then drill into a section. The overall function can then be kept as a single unit, and you can sort of punt on sub sections. I tried this just recently in https://taeric.github.io/many_sums.html. I don't know that I succeeded, necessarily. Indeed, I think I probably should have broken things into more sections. That said, I did find that this helped me write the code more than I expected it to. (I also was very surprised at how effective the goto style of thinking was... Much to my chagrin.)
I will have to look again at some of the code I've read this way.
To directly answer the question of if it helped keep the documentation in sync, as it were, that is tough. I think it helps keep the code in a section directly related to the documentation for that section. All too often, the majority of code around something is not related to what you were wanting to do. Even the general use of common code constructs gets in the way of reading what you were doing. Literate programming seems the best way I have seen to give the narrative the ability to say "here is the outline necessary for a function" and then "this particular code is to do ..." Obviously, though, it is no panacea.
Literate programming seems fine for heavily algorithmic stuff when there's a lot of explaining to do compared to the amount of code and the code is linear, but I was more thinking about how it works for common web apps where it's lots of mundane code that criss-crosses between files.
WillAdams 9 hours ago [-]
My current project for a while was in a state where it was necessary to keep code in 3 separate files:
- gcodepreview.py (gcpy) --- the Python functions and variables
- pygcodepreview.scad (pyscad) --- the Python functions wrapped in OpenSCAD
- gcodepreview.scad (gcpscad) --- OpenSCAD modules and variables
as explained in: https://github.com/WillAdams/gcodepreview/blob/main/gcodepre... and it worked quite well (far better than the tiled set of three text editor windows which I was using at first) and I find the ability to sequence the code for the separate files in a single master file very helpful.
taeric 8 hours ago [-]
My gut is that this is where it should shine, oddly. The appeal of literate programming is that you can reflow the code to how you want to explain it. As such, if you had to add a bit of code in several places for a reason that you can explain in a narrative, then literate programming should help.
I say oddly, as I don't think I've seen it done for common web apps. I suspect that is largely because frameworks have not been a stable foundation to build on in a long time?
I can't help but think the templates of old were a hint at how it would have worked fine? Have a section of the literate code that outlines the general template of a file, and where the old "your code goes here" comments used to denote where you add your logic, there is instead another section that you can discuss on its own. (Anyone else remember those templates? They were common in app builders, if I recall correctly.)
mapcars 13 hours ago [-]
Literate programming is an intriguing concept, but it's hard to compete with modern IDEs. Having a build system is good, but can you get proper syntax highlighting for the code segments? Or goto-symbol, or real-time typechecking?
I feel like it needs its own IDE, because now apart from the coding abstractions you also have named snippets.
codebje 11 hours ago [-]
I write a bit of literate Haskell, sometimes. It's one of the most well supported literate programming systems out there: the compiler supports it, the language server supports it, using VSCode as an "IDE" means full support for all the things you mentioned. Haskell code formatters don't seem to support literate Haskell, though, and GitHub Copilot, at least, gets confused between prose and code (but that's fine, if I'm taking the time to make my code extra readable and understandable the last thing I want is for an AI to get involved).
Maybe a tool like the one presented here could work as a language server proxy to the underlying language's server. The presence of literate text alone doesn't seem to be the main issue, it's getting the code portions parsed, checked, and annotated with references that matters.
The notable downsides are that the .sty and .tex files have to be customized for the filenames which one can output, and I haven't been able to get auto-line numbering working between code blocks, so one has to manually manage the counters.
Code-as-a-Database is something lots of people would like to have, but not much effort has been put into implementation since... Smalltalk? Could still be a pile of loose text files with markup. Like how https://obsidian.md/ is an informal graph database of loose markdown files.
lmm 6 hours ago [-]
> Code-as-a-Database is something lots of people would like to have, but not much effort has been put into implementation since... Smalltalk?
It happens in some forms of Bank Python, but there's not much of it going on in the public/open-source world. I think because the advantages for a lone developer are small, and it's hard to maintain for an internet-based project since globally distributed databases are still expensive, bad, or both.
I mean... somewhat yes to all of those? Emacs can even do most of what you are asking for. When I export an org buffer, it even has the syntax highlighting in the html that I was looking at. :D
Obviously, the type checking will be a bit more limited for code snippets you haven't finished. But especially for image based environments, it should have everything that you have in the image just fine.
CWEB, which is the one that Knuth prefers, even supports step debugging. Has supported it for decades, at this point.
literate programming is particularly well suited for shell scripts. Sh and bash scripts often have high documentation / loc needs, often involving external links, and they lack good composition primitives. literate programming fits it like a glove. I used this for a project (plug: "https://github.com/hraban/tomono", in org-mode w babel and noweb). it remains my favorite public snippet of code to this day.
One often overlooked cute aspect of lp is how a digression on code you tried, but chose not to incorporate, is first class with the same highlighting etc as active code. It isn’t relegated to a monochrome, non syntax highlighted and awkwardly indented block comment. I find this very appropriate, and it encourages documenting “tried but failed” experiments , which can be incredibly useful.
Edit: another really cool benefit of lp: the “examples” chapter = tests. You can tangle the examples into a test script and run them in CI. Very satisfying.
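One way to sketch the "examples chapter = tests" idea in Python is with the standard `doctest` module, which can pull `>>>` examples out of arbitrary prose and run them. The `slugify` function and the chapter text are hypothetical; this is not the org-mode/noweb setup used in the project above.

```python
import doctest


def slugify(title: str) -> str:
    """Hypothetical function that the examples chapter documents."""
    return "-".join(title.lower().split())


# The "examples" chapter: prose for the reader, with runnable examples inline.
EXAMPLES_CHAPTER = """
Titles are lower-cased and joined with dashes:

>>> slugify("Literate Programming")
'literate-programming'

Runs of whitespace collapse to a single dash:

>>> slugify("tangle   and weave")
'tangle-and-weave'
"""


def run_examples(text: str, globs: dict) -> doctest.TestResults:
    """Extract every >>> example from the prose and execute it."""
    test = doctest.DocTestParser().get_doctest(text, globs, "examples", None, 0)
    return doctest.DocTestRunner(verbose=False).run(test)


results = run_examples(EXAMPLES_CHAPTER, {"slugify": slugify})
```

In CI, a non-zero `results.failed` would fail the build, so the examples cannot silently drift away from the code.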
kstrauser 8 hours ago [-]
This hasn't been updated in 4 years. Does that mean it's "done", or did the author get bored and drop it? Are there other common alternatives people who are into this kind of thing are using today?
I don't mean that to sound dismissive. This might be the most popular tool out there for all I know, and so well done that it hasn't needed any updates in ages.
stevelini 6 hours ago [-]
The flagship example is 300 lines to implement word count
nico 11 hours ago [-]
Great concept and very relevant today[1]
It’s interesting that using LLMs is making very explicit that “someone” needs to read the code and understand it. So having good comments and making code readable is great both for AI and humans
If your colleagues just don't feel the benefit of the extra .lit file, is there a way to pull their changes to the derived files into your own .lit files and to keep the .lit files in a parallel version control repo or branch?
taeric 11 hours ago [-]
Sorta? Noweb and org mode's support of it, at least, has a "detangle" and it worked surprisingly well last time I tried it. You can't edit the comments it puts in the source, for obvious reasons. And I'm sure it has trouble if you tried to get too fancy. But it did allow me to edit the generated source directly and pull those edits back into my literate source. I imagine if this was something people were more often doing, you could make it more reliable, even.
BeetleB 9 hours ago [-]
Leo probably has the best implementation of bringing others' changes into your literate project.
asimpletune 4 hours ago [-]
Reading code is more important than writing code.
groos 12 hours ago [-]
Ignore the naysayers here. Good job!
make3 13 hours ago [-]
I feel like being able to import notebooks like `import notebook_name` and run jupyter notebooks (more easily) like `python notebook.ipynb` and the analogue in different languages would already get us 99% of the way there
The most immediate benefits for me are easier inspection and searching of the code in any text editor, and infinitely nicer version control. But it does also let you run and import the Notebook as if it was a Python script!
While writing my thesis I have also been experimenting with a Spyder-like workflow in VS Code, where you put in "# %%" to separate code blocks and get to run them in an IPython console. It had its perks, like the better Intellisense, and also resulted in this mix of interactivity and runnable file. Not as good on the markup front though.
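For anyone who hasn't seen the "# %%" convention: the file stays a plain Python script, and the markers only tell the editor where one cell ends and the next begins. A rough sketch of that splitting, with a hypothetical `split_cells` helper (editors do this internally; this is not their actual code):

```python
def split_cells(script: str) -> list[str]:
    """Split a script into cells at lines that start with '# %%'."""
    cells, current = [], []
    for line in script.splitlines(keepends=True):
        # A marker closes the cell collected so far and starts a new one.
        if line.startswith("# %%") and current:
            cells.append("".join(current))
            current = []
        current.append(line)
    if current:
        cells.append("".join(current))
    return cells


script = (
    "# %% load data\n"
    "values = [1, 2, 3]\n"
    "# %% summarise\n"
    "total = sum(values)\n"
)

cells = split_cells(script)
namespace = {}
for cell in cells:
    exec(cell, namespace)  # cells share one namespace, like a notebook kernel
```

Because the markers are ordinary comments, the same file also runs unchanged with `python script.py`.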
jedimastert 12 hours ago [-]
There's a pretty direct line between the concept of notebooks and literate programming.
audiodude 12 hours ago [-]
Is there any intrinsic reason why Jupyter Notebooks can't be imported? You don't know which code blocks to run?
make3 8 hours ago [-]
not really, you can just run the blocks one by one like `nbconvert --to script` does. Looks like https://pypi.org/project/importnb/ exists
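A minimal sketch of "run the blocks one by one", assuming only the standard notebook JSON layout (a top-level `cells` list whose entries have `cell_type` and `source`). The `load_notebook_as_module` helper is hypothetical and is not how importnb works; IPython magics and shell escapes in cells would not run this way.

```python
import json
import types


def load_notebook_as_module(notebook_json: str, name: str) -> types.ModuleType:
    """Execute a notebook's code cells in order inside a fresh module."""
    notebook = json.loads(notebook_json)
    module = types.ModuleType(name)
    for cell in notebook["cells"]:
        if cell["cell_type"] != "code":
            continue  # markdown and raw cells are documentation only
        source = cell["source"]
        if isinstance(source, list):  # source may be a list of lines
            source = "".join(source)
        exec(source, module.__dict__)
    return module


# An in-memory notebook, so the sketch needs no .ipynb file on disk.
notebook_json = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Area helpers\n"]},
        {"cell_type": "code", "metadata": {}, "outputs": [], "execution_count": None,
         "source": ["def area(w, h):\n", "    return w * h\n"]},
        {"cell_type": "code", "metadata": {}, "outputs": [], "execution_count": None,
         "source": ["DEFAULT = area(2, 3)\n"]},
    ],
})

demo = load_notebook_as_module(notebook_json, "demo_notebook")
```

The hard part in practice is less the mechanics than the side effects: importing a notebook this way runs every cell, including plots and long computations.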
- https://xenodium.com/ob-swiftui-updates
- https://github.com/xenodium/ob-dall-e-shell
- https://github.com/xenodium/ob-chatgpt-shell
- https://xenodium.com/org-babel-objective-c-support
"Literate programming (LP) offers 2 classical operations:
[1] https://org-babel.readthedocs.io/en/latest/
https://github.com/zyedidia/Literate/tree/master/lit
[1]: https://en.wikipedia.org/wiki/Noweb
https://github.com/nrnrnr/noweb/tree/master/contrib/partingr
[0] https://leo-editor.github.io/leo-editor
https://github.com/WillAdams/gcodepreview/blob/main/literati...
which allows me to have an ordinary .tex file:
https://github.com/WillAdams/gcodepreview/blob/main/gcodepre...
which outputs multiple .py and .scad files and generates a .pdf with nice listings-based code blocks, ToC, index, hyperlinks, &c.:
https://github.com/WillAdams/gcodepreview/blob/main/gcodepre...
The notable downsides are that the .sty and .tex files have to be customized for the filenames which one can output, and I haven't been able to get auto-line numbering working between code blocks, so one has to manually manage the counters.
1: “Writing documentation for AI: best practices” https://news.ycombinator.com/item?id=44311217