Canonising The Apocrypha Of [COMPANY NAME] Engineering

Published on 2023-03-15, 12 min read

This is another repost of an RFC I wrote for work. The same "rules" from last time to protect "company property" apply here as well, with some new ones added.

Links to company documents have been replaced with fun videos.
Some examples of documentation have been removed.

To "protect" my privacy, I have also redacted references to [COMPANY NAME], although it shouldn't be that hard to figure out what it is supposed to be with a little digging into my public profile, so I don't really know why I bothered.

A normal post should be coming in a month (or longer) if you were looking forward to that… for some reason.

Summary

This is an RFC for the implementation of a documentation system in engineering.

Motivation

Documentation in [COMPANY NAME] is lacking, possibly in no small part due to how we structure the writing of it. The current process gives no guidance on the purpose, structure, or audience of the document being written, which makes it harder to write.

User intuition is not reliable enough when it comes to learning software. Complex systems have a wide range of applications. This leaves considerable room for human error. Having clear, easy-to-understand documentation will mean fewer problems to solve and less firefighting.

General reference

Documentation is context-specific and depends on what your audience is looking to get out of it. Docs should be thought of in terms of desired goals and outcomes: learning-oriented, problem-oriented, etc. From my experience, I go to documentation to try to find an answer to one of these three questions:

How do I X?
What is X supposed to do?
Why is X doing Y?

Each category of question follows a distinct user journey, and our documentation should be divided along these lines.

How-Tos (How do I X?)

These focus on achieving a specific outcome, be it how to run a migration in AWS DMS or add a new profile scraper. They are for when you aren’t interested in the specifics and just need to get something done. It is also usually easier to learn how to do something before learning how it works.

Ideally, there should be an end goal to reach (completing the exercise), and the how-to should let your reader get hands-on while reading instead of after. To help readers achieve this, how-tos should be clear, accessible and focused on a single task.

Overviews (What is X supposed to do?)

These explain how the feature/code works, give background and provide context. This type of documentation is for those who want to take a closer peek under the hood. They want to make connections across the software and understand the nuts and bolts. Information here is theoretical, rather than practical or descriptive.

It’s documentation that approaches a topic from a higher perspective, and from different angles. The main goal should be comprehensiveness. The reader ought to come away from a read feeling very comfortable with the topic in question. They should feel that they know the vast majority of the possible options, and more importantly they should understand how all the concepts fit together.

References (Why is X doing Y?)

A complete reference for all the APIs your code provides, or a very detailed outline about how a particular thing works. These should be designed for those who already know how to use some API, but need to look up the exact arguments some function takes, or how a particular setting influences behaviour, etc.

It’s important to point out that reference material is not in any way a substitute for good how-tos/overviews! Great reference material in the intercom service does readers no good whatsoever if they don’t know the intercom service even exists.

It was actually quite hard to find a good example for each category from our existing library of documentation. We need to be a lot stricter on the quality of documentation, because a lot of our docs - not trying to be rude here, but - they’re just not very good.

Guiding principles for writing

Once you know the purpose of the documentation you are authoring and who it is for, here are some best practices to keep in mind:

Use a clear structure with an outline.
Use diagrams and illustrations to support your points. And don't forget to caption these.
Ensure the information is up-to-date, complete and correct. Remember, accuracy is critical.
Don't be lazy. Show respect for the people taking the time to read your work and put in the effort (grammar, structure, quality of examples, etc).
Consider writing the documentation, specifically the overview, while or before implementing features instead of after. This can help even with writing the code, as it can give you a clearer picture of what you’re implementing.
- Taken from GitLab’s handbook usage approach:
Documenting in the handbook before taking an action may require more time initially because you have to think about where to make the change, integrate it with the existing content, and then possibly add to or refactor the handbook to have a proper foundation. But, it saves time in the long run, and this communication is essential to our ability to continue scaling and adapting our organization.
Have someone proofread, or edit for you.
The best way to get better at writing is to write. The second best way is to read someone else’s writing and copy them (this coincidentally also involves writing).
- It’s pretty easy to spot the influence James Mickens has had on me. Specifically, the 2013 masterpiece “The Night Watch.”

Devil’s advocate

I don’t really want to play devil’s advocate against writing documentation, but to be fair, sometimes just figuring it out as you go is more fun :3 Really gives you that sense of pride and accomplishment.

Rationale and alternatives

Nearly everyone understands this. Nearly everyone knows that they need good documentation, and most people try to create good documentation. And most people fail.

As mentioned before, the idea with this approach is to categorise documentation by goals and outcomes, and to adopt a systematic approach to understanding the needs of documentation readers in their cycle of interaction with what is being documented.

This approach is light-weight, easy to understand and straightforward to apply. It doesn’t impose implementation constraints, and authors should be free to write documentation in the way they feel best gets the reader to their desired goal/outcome.

It also makes it more obvious which docs are missing. If there was just a page called “profile scrapers”, I wouldn’t know what that covered - if it told me how to add one, or how they worked - and if this page didn’t have one of those, is it missing, or just somewhere else? Whereas if it were divided by outcome and there was only a page in overviews called “Profile scrapers architecture”, I would know what that’s about and could see that “How to add profile scrapers” is missing in how-to.
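To make the "spot the gap" argument concrete, here is a minimal Rust sketch of how dividing pages by outcome turns missing docs into something you can list mechanically. Everything in it (`DocKind`, `Page`, `missing_kinds`) is a hypothetical name made up for illustration and is not part of Notion or any existing tooling.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// The three documentation categories proposed in this RFC.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum DocKind {
    HowTo,     // "How do I X?"
    Overview,  // "What is X supposed to do?"
    Reference, // "Why is X doing Y?"
}

const ALL_KINDS: [DocKind; 3] = [DocKind::HowTo, DocKind::Overview, DocKind::Reference];

/// A single page in a (hypothetical) documentation library.
struct Page {
    topic: &'static str,
    kind: DocKind,
    title: &'static str,
}

/// For every topic that has at least one page, lists the categories with no page yet.
fn missing_kinds(pages: &[Page]) -> BTreeMap<&'static str, Vec<DocKind>> {
    // Group the categories that already exist under each topic.
    let mut seen: BTreeMap<&'static str, BTreeSet<DocKind>> = BTreeMap::new();
    for page in pages {
        seen.entry(page.topic).or_default().insert(page.kind);
    }
    // Whatever is not in the "seen" set for a topic is a gap.
    seen.into_iter()
        .map(|(topic, kinds)| {
            let missing: Vec<DocKind> = ALL_KINDS
                .iter()
                .copied()
                .filter(|kind| !kinds.contains(kind))
                .collect();
            (topic, missing)
        })
        .collect()
}

fn main() {
    let pages = [Page {
        topic: "profile scrapers",
        kind: DocKind::Overview,
        title: "Profile scrapers architecture",
    }];
    for page in &pages {
        println!("have: [{:?}] {}", page.kind, page.title);
    }
    for (topic, missing) in missing_kinds(&pages) {
        println!("{topic}: missing {missing:?}");
    }
}
```

With only the overview page present, the sketch reports the how-to and the reference as missing for "profile scrapers". A single untyped page called "profile scrapers" would give it nothing to work with.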

I’ve seen other approaches that are more structured, with a taller and stricter hierarchy of categories. I don’t believe this is the correct fit for us (or anyone tbh) because it lessens the discoverability of documentation. A flatter structure can make better use of Notion’s database searching and filtering capabilities to enhance discoverability, while still not overloading the reader with pages and keeping the “neatness” of a nested structure.

Having to navigate through several pages (especially on a website as slow as Notion) is also annoying (yes, most libraries/handbooks have a search, but sometimes I don’t know what I’m searching for). Reducing friction is important in this case of UX (or maybe it’s DX here?) so that people will actually want to read the documentation and won’t be put off by having to put in any amount of effort. Some people already think documentation is boring ⁽¹⁾, so don’t make it any worse.

⁽¹⁾ They’re wrong, but some people think pineapple doesn’t belong on pizza (HINT: IT SHOULD BE THE ONLY PIZZA TOPPING), so maybe it should be expected at this point.

If I’m casually browsing the documentation library, as I often do, and see a page called Data model, I’m going to be like “oh that seems neat and kinda important, I’ll go check it out.” But then when it turns out you have to download some attachment and open it in another web app, I’m going to be like “nah cba, maybe next time.”

This is bad, because who knows, maybe “Data model” is the most important document I could have ever read. Maybe my life will be forever incomplete without the wonders it holds. Maybe it would’ve provided me with the knowledge of eldritch horrors beyond my understanding, but then right after also provide me with the knowledge to understand said eldritch horrors to ensure I don’t go insane.

IF ONLY “DATA MODEL” WAS STRUCTURED IN SOME WAY THAT DIDN’T TRIGGER MY LAZY SLOTH-LIKE OTHER SELF THAT HATES ALL FORMS OF EFFORT.

NOW I’LL NEVER MEET THE GREAT TENTACLE MONSTER FROM THE SKY BECAUSE I HAD TO CLICK MORE THAN 2 TIMES TO SEE A DIAGRAM ABOUT WHAT I’M ASSUMING (BECAUSE I DIDN’T LOOK AT IT) FROM THE TITLE AND THE CONTEXT IN WHICH THIS PAGE HAS BEEN PLACED, IS A DATA MODEL OF OUR DATABASE.

Prior art

This entire RFC basically takes ideas from Diataxis née Divio's Documentation System, which is generally considered to be the cardinal rules of technical writing (source: ~~Hacker News~~ just trust me bro). Most, if not all, professional technical writers divide their documentation into content types along those lines.

Diataxis puts documentation in 4 categories instead of the 3 I wrote about here. I removed the 4th (tutorials) because, in a purely engineering context, I couldn’t find enough of a difference between tutorials and how-tos, and so I merged them together (though I am open to having 4 if people really want it).

Although I feel I’ve mentioned all the important parts, I would encourage you to read through Diataxis to get a better understanding of this approach beyond the brief summary here.

Others I ~~stole from~~ was inspired by include:

Jacob Kaplan-Moss (Co-creator of Django) “Writing Great Documentation: What to write” (2009) - also divides docs into 3 categories that are pretty much the same as mine.
Federal Information Processing Standards Publication “Guideline for Software Documentation Management” (1986) - didn’t really use this, but was interesting to read.

Obligatory Rust evangelism from a Rust cult member

Please allow me this opportunity to tell you about the good word of the Rust programming language.

PLEASE, I have to do this - if I don’t shill Rust at EVERY opportunity, much like how the Duolingo owl murders people who forget to do their daily Spanish - Ferris the crab will BREAK into my apartment and SNAP all my LIMBS in HALF with their MUSCULAR CRAB CLAWS (we’re a much kinder and compassionate community here at the Rust cult, so no one dies).

Even the worst documented Rust projects like https://docs.rs/roux/2.2.1/roux/ (don’t mind me, just “shilling” my own library) have decent docs compared to other languages because Rust can generate all of the docs from doc strings in code.

Notice how an API reference is automatically generated at the bottom.

There is also a compiler lint (`missing_docs`) that can make compilation fail if a public method/struct/etc. doesn’t have a doc string.

Writing code examples in your docs is also extra helpful in Rust because you can actually run them as unit tests.
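Here is a minimal sketch of both features together, assuming the "compiler option" in question is rustc's `missing_docs` lint set to `deny`. The `handles` module, the `normalise` function and the `my_crate` crate name are placeholders invented for this example. The lint is scoped to one module here to keep the snippet self-contained; real projects more commonly put `#![deny(missing_docs)]` at the top of the crate root.

```rust
/// Helpers for profile handles (a hypothetical module, for illustration only).
#[deny(missing_docs)] // a public item in this module without a doc comment becomes a compile error
pub mod handles {
    /// Normalises a profile handle by trimming whitespace, dropping any leading `@`,
    /// and lower-casing the rest.
    ///
    /// # Examples
    ///
    /// ~~~
    /// use my_crate::handles::normalise; // `my_crate` is a placeholder crate name
    /// assert_eq!(normalise("  @Ferris "), "ferris");
    /// ~~~
    pub fn normalise(raw: &str) -> String {
        raw.trim().trim_start_matches('@').to_lowercase()
    }
}

fn main() {
    println!("{}", handles::normalise("  @Ferris "));
}
```

In a library crate, `cargo test` should compile and run the example inside the doc comment as a doctest, so the docs fail loudly when they drift from the code. The example uses a `~~~` fence, which rustdoc treats the same as a backtick fence.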

On a standard qwerty keyboard, 'rust' is offset by exactly one key-width left from 'tidy'. 🤔
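For the sceptics, a small Rust sketch that checks the claim by shifting every letter one key to the right along its row. The row strings assume a standard US QWERTY layout, and `key_to_the_right` and `shift_right` are names made up for this example.

```rust
/// Returns the key immediately to the right of `c` on its QWERTY letter row,
/// or `None` if `c` is not a lowercase letter or is the last key in its row.
fn key_to_the_right(c: char) -> Option<char> {
    // Assumption: the three letter rows of a standard US QWERTY layout.
    const ROWS: [&str; 3] = ["qwertyuiop", "asdfghjkl", "zxcvbnm"];
    for row in ROWS {
        if let Some(i) = row.find(c) {
            // All keys are ASCII, so the byte index is also the key position.
            return row[i + 1..].chars().next();
        }
    }
    None
}

/// Shifts every letter of `word` one key to the right, failing if any letter cannot move.
fn shift_right(word: &str) -> Option<String> {
    word.chars().map(key_to_the_right).collect()
}

fn main() {
    assert_eq!(shift_right("rust").as_deref(), Some("tidy"));
    println!("{:?}", shift_right("rust"));
}
```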

Open questions

I think having a glossary like: https://glossary.infil.net/index.html would also be useful for engineers to be able to look up AWS services/political terminology/GCX engineering terms.
- Definitely not suggesting this because I still don’t know what MP stands for…
I’m assuming the categories would be their own Notion database, but am a little undecided if that’s right for the reference docs, and if they should be in doc strings in code instead (LIKE RUST!!).
- This would make it closer to the code, which is probably where you already are if looking at reference documentation. And it might make updating it easier after a code change.
- It makes LSP better. Neovim (I’m assuming the lesser “IDEs” do this as well) displays the doc strings of everything in the autocomplete menu, which is pretty helpful most of the time.
- But it “spreads” the docs out, which is something I want to avoid. Docs should be as centralised as possible to make discovery easier. It would be easier to find method references if there wasn’t a bunch of code to skip through as well.
- It’s also possible we could have reference docs for things that aren’t code, or aren’t code that we wrote. In which case, reference docs would be in two different places, which feels yuck to me. Although, having reference docs for our code - in our code, and reference docs for everything else in Notion is probably an acceptable division if we were forced to have one.
- Having something that could publish the doc strings to Notion would be an ideal middle ground (KINDA LIKE HOW RUST PUBLISHES THE DOCS TO DOCS.RS).
Should we use a style guide for writing? It could help with making everything consistent, but would add more friction to writing.
- Most likely would use something like APA.
- Or make our own simple one that is easier to learn.
A system to ensure docs are up to date would be good, but I can’t think of anything that wouldn’t be annoying and useless most of the time.

Future possibilities

How can future be real if our eyes aren’t real?