When docs and a dinosaur Git along: enabling versioning in Docusaurus

This blog was originally published on Spectro Cloud’s official blog site. Click here for a direct link to the original blog.

Rapid innovation — great for the product, a challenge for docs!

Palette is a powerful, versatile product and it changes fast. We issue major releases multiple times per year and there are always minor changes to things like our supported packs and environments. 

This is a good thing: it means we’re keeping pace with the needs of customers like you, and with the innovation of the cloud-native ecosystem.

But it creates a challenge for our documentation, too. Users depend on documentation to navigate the many features of Palette, and to get up to speed with what’s changed in each new version. At the time of this writing, our docs code base has over 435 Markdown pages containing product information, and it continues to grow.

We love a challenge, and our docs team is always looking at ways to improve not only the depth and quality of the content on our docs site, but also how easy it is to consume. For example, you may have already read about our Mendable chatbot integration in a previous blog.

Setting out on the content versioning mission

In late July 2023, we started the groundwork for a much more fundamental innovation: enabling content versioning on our documentation site. 

Before that, we had to manage all of our documentation data together in a single git branch, clarifying what content applied to what product version in various parts of the documentation. 

This was difficult for us to manage, but more importantly it made it challenging for our customers to identify what information pertained to specific product versions. This matters, because we give Palette customers using dedicated or self-hosted instances the freedom to stick with older versions of the product until they’re ready to upgrade.

Our goal was to implement a way for users to select the Palette version they’re using, and have the docs site show only the information relevant to that version. 

Versioning is an ‘advanced level’ practice, but we knew it was the right approach for us.

In this blog, we’re going to show you how we implemented content versioning with Docusaurus, using git as the source of truth for all versioned content. We’ll also share our solution and explain it in detail.

If you’re a Palette customer, we hope you’ll find this an interesting behind-the-scenes tour of a feature that’s valuable to you. 

And if your team is looking at content versioning for its own documentation, we hope this article helps you consider using git as the source of truth for all versioned content, rather than the default Docusaurus behavior of duplicating folders.

To enhance the value of this article to the open source community, and facilitate your understanding of the concepts, we have included a public GitHub repository with all the content code. This repository serves as a practical tool for you to explore and try out our solution. Simply head over to GitHub, click on the green ‘Use this template’ button, and embark on the technical adventure we are about to share.

PS: Review the README in the repository for additional information, such as the FAQ.

The foundation: docs as code

At Spectro Cloud, we follow a Docs as Code philosophy. This means we use the same tooling to create and maintain our documentation as you would use to create, test, deploy, and maintain an application. By following engineering best practices, we are able to deliver complex documentation reliably. 

We already used the best open-source documentation framework, Docusaurus, with our codebase in a git repository. Our vision for enabling content versioning was to use a separate git branch for each minor product release.

Like any engineering effort, we started our versioning project by defining our requirements, both for our users and for our technical authors:

User requirements

  • It’s easy to find documentation that applies to a specific version of Palette.
  • I can easily identify and change my version context.
  • If I need to, I can always access information about prior Palette versions (updates don’t erase old information).

Author requirements

  • Enabling versioning doesn’t impact the ability of docs content to scale in the future.
  • Technical authors can target specific versions when updating, creating, and removing content, with minimal overhead.
  • The capabilities of the Docusaurus framework should be leveraged rather than creating a custom solution.
  • Enabling versioning does not introduce a user-breaking change, such as broken URLs.

The challenge: default Docusaurus behavior

Docusaurus supports content versioning out of the box. The official documentation does a great job of explaining how to enable the feature and what changes you need to make. 

The default behavior is simple and lends itself well to small documentation sites. Basically, whenever you create a new version, Docusaurus creates a copy of your entire content folder and increments the version number. Docusaurus exposes a command that handles this for you. 

npm run docusaurus docs:version 1.1.0

It looks something like this: 
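To make the result concrete, here is a minimal sketch that lists the files and folders Docusaurus writes when you cut a version. The function name `versionedArtifacts` is our own illustration and not part of Docusaurus. The naming pattern follows the documented default behavior, where additional docs plugin instances prefix the names with their plugin ID.

```javascript
// Sketch: the artifacts written by `docusaurus docs:version <version>`.
// `versionedArtifacts` is a hypothetical helper, used here for illustration only.
function versionedArtifacts(version, pluginId = "default") {
  // The default docs plugin instance uses unprefixed names;
  // other instances prefix every artifact with "<pluginId>_".
  const prefix = pluginId === "default" ? "" : `${pluginId}_`;
  return {
    docs: `${prefix}versioned_docs/version-${version}/`,
    sidebars: `${prefix}versioned_sidebars/version-${version}-sidebars.json`,
    versions: `${prefix}versions.json`,
  };
}

console.log(versionedArtifacts("1.1.0"));
```

Every new version adds another full copy of the docs folder under versioned_docs, which is why the repository grows so quickly on a large site.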

The challenge with this approach is that in large documentation sites (such as Spectro Cloud’s), it can lead to large code bases that consume a lot of storage space. 

The large code base impacts interactions with the version control system, especially fresh clones. Slow clones are a particular problem for CI/CD pipeline jobs, which often clone the code base from scratch, so each job spends longer downloading it.

The other challenge is that when there are multiple content folders (one for each version), the folder tree structure becomes more complex. It’s more likely that an author accidentally works on the wrong version folder.

The other factor to consider when implementing content versioning is how to handle future backport changes. Backporting is when you have to change a previous version, either by inserting or removing content.

Keep in mind that Docusaurus does not handle backports: each team is responsible for creating and maintaining their own implementation.

Overall, the default Docusaurus versioning behavior was not ideal for our team as we felt it did not address our requirements.

Git around the challenges

We quickly identified git as the solution to address both our scaling concerns and the ability to better control what information and content belonged to a specific version of Palette. 

The main challenge was that Docusaurus does not integrate with git, at least not out of the box. 

However, Docusaurus’s simple design for creating versioned content allowed us to create custom logic that built on the out-of-the-box behavior.

The next issue was orchestrating when and how to invoke the Docusaurus version command. 

We decided to develop an orchestration mechanism to enable our vision for using git as the source of truth.

After evaluating a range of options, we settled on a bash script, which we named versions.sh, to act as the orchestrator for this logic and the overall versioning solution. We built out a set of actions that the orchestrator steps through every time we run the script. We used the following decision matrix to help us determine the ideal solution for our team. Your team might select a different option depending on your team composition, environment, and so on.

If you’re following along using our sample GitHub repo, you can find the orchestrator logic in the scripts/ folder, in the file versions.sh.

Orchestrator bashes into action

  1. The orchestrator’s first step is to fetch all version branches from GitHub. It needs the latest changes locally to generate up-to-date versioned content, and because it loops through each version branch, it must know what all the version branches are. The script contains some filtering logic to ensure that only version branches are worked on.
  2. The second step, and the beginning of the loop logic, is to switch git context into each branch.
  3. Once inside a version branch, the orchestrator issues the `npm run docusaurus docs:version` command to generate the versioned content for the current branch.
  4. That versioned content is moved to a staging area…
  5. …along with the respective versions.json file auto-generated by the Docusaurus command. A quick note about versioned content: for each plugin you create a version for, a versioned sidebars folder and a versioned docs folder are generated. The versions.json file tells Docusaurus which versions of content to expect.
  6. Before exiting the loop (step seven), the orchestrator removes all the versioned content generated. Otherwise, git will complain about uncommitted changes and prevent the orchestrator from…
  7. …switching to a different git branch.
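The loop above can be sketched as a small planner that returns the shell commands the orchestrator would run. This is an illustration in JavaScript, not the contents of versions.sh. The branch naming convention (`version-4-0`), the derived version string (`4.0.x`), the staging folder name, and all function names are assumptions made for the example.

```javascript
// Assumption: version branches are named like "version-4-0".
const VERSION_BRANCH = /^version-(\d+)-(\d+)$/;

// Step 1: keep only version branches from a list of remote branch names.
function filterVersionBranches(branchNames) {
  return branchNames
    .map((name) => name.trim().replace(/^origin\//, ""))
    .filter((name) => VERSION_BRANCH.test(name));
}

// Assumption: "version-4-0" is published as Docusaurus version "4.0.x".
function versionLabel(branch) {
  const [, major, minor] = branch.match(VERSION_BRANCH);
  return `${major}.${minor}.x`;
}

// Steps 2-6 for one branch; step 7 is the checkout at the top of the next pass.
function planBranch(branch, stagingDir) {
  const version = versionLabel(branch);
  return [
    `git checkout ${branch}`,
    `npm run docusaurus docs:version ${version}`,
    `mv versioned_docs/version-${version} ${stagingDir}/versioned_docs/`,
    `mv versioned_sidebars/version-${version}-sidebars.json ${stagingDir}/versioned_sidebars/`,
    `mv versions.json ${stagingDir}/versions-${version}.json`,
    // Clean up so git does not block the next branch switch.
    `rm -rf versioned_docs versioned_sidebars`,
  ];
}

function planRun(branchNames, stagingDir = "staging") {
  const setup = [
    "git fetch --all",
    `mkdir -p ${stagingDir}/versioned_docs ${stagingDir}/versioned_sidebars`,
  ];
  const perBranch = filterVersionBranches(branchNames).flatMap((branch) =>
    planBranch(branch, stagingDir)
  );
  return [...setup, ...perBranch];
}

console.log(planRun(["origin/main", "origin/version-4-0", "origin/version-3-4"]).join("\n"));
```

Returning the commands as data keeps the sketch easy to inspect before anything touches the working tree. A real orchestrator would execute each step and stop on the first failure.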

Once the loop logic iterates through each version branch and the content has been gathered in the staging area, additional logic can be applied, such as merging all generated versions.json files into a single file. The versions.json file is required and expected by Docusaurus when versioning is enabled.  
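As a minimal sketch of that merge step, the function below combines several versions.json arrays into one, removes duplicates, and orders the result newest first. Sorting newest first is our assumption, chosen because Docusaurus treats the first entry in versions.json as the latest version unless `lastVersion` is set. The function names are hypothetical.

```javascript
// Turn "3.10.x" into [3, 10, 0]; non-numeric parts such as "x" count as 0.
const parts = (version) =>
  version.split(".").map((part) => Number.parseInt(part, 10) || 0);

// Comparator that orders version strings from newest to oldest.
function newestFirst(a, b) {
  const pa = parts(a);
  const pb = parts(b);
  for (let i = 0; i < Math.max(pa.length, pb.length); i++) {
    const diff = (pb[i] ?? 0) - (pa[i] ?? 0);
    if (diff !== 0) return diff;
  }
  return 0;
}

// Each input is the parsed content of one staged versions.json file.
function mergeVersionFiles(files) {
  const unique = new Set(files.flat());
  return [...unique].sort(newestFirst);
}

console.log(mergeVersionFiles([["3.4.x"], ["4.0.x"], ["3.10.x"], ["4.0.x"]]));
// → [ '4.0.x', '3.10.x', '3.4.x' ]
```

Comparing the numeric parts avoids the trap of plain string sorting, which would place 3.4.x ahead of 3.10.x.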

  8. Next we move all the versioned content from the staging area back to the root folder of the Docusaurus project. Moving the collective versioned content back to the project root folder along with the versions.json file allows Docusaurus to actually use the versioned content for a build or preview with a local development server.
  9. The last step of the orchestration flow is to modify the docusaurus.config.js file. This file expects a configuration definition for each version found in the versions.json file. Docusaurus will refuse to start or compile if a versions.json file is available and no version definitions are provided in docusaurus.config.js.

Streamlining local editing

An important note on that final step. The behavior we describe is a problem for local development environments, because Docusaurus expects versioned content to be present. But we needed our technical authors to be able to work on the site locally, including when no versioned content was available.

Our solution was to configure docusaurus.config.js with the following default entry and ensure no versions.json is available locally. The content in the current branch, either the default branch or a version branch, is simply displayed as the latest content.
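A minimal sketch of what such a default entry could look like is shown below. The option names `lastVersion`, `versions`, and `label` are standard Docusaurus docs plugin options. The variable name and the exact values are assumptions, not a copy of our production configuration.

```javascript
// Hypothetical excerpt of the docs plugin options in docusaurus.config.js.
const defaultDocsOptions = {
  // Treat the content in the checked-out branch as the latest version.
  lastVersion: "current",
  versions: {
    // "current" always exists, even when no versions.json is present.
    current: { label: "latest" },
  },
};

console.log(JSON.stringify(defaultDocsOptions, null, 2));
```

Because only `current` is defined, nothing in this entry refers to a version that would need generated content on disk.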

This configuration allows our authors to start the local development server in all branches, including our main branch, without having to generate versioned content. This latter point is important, as working with versioned content can be taxing on the local development environment (especially on large sites like ours) and slow things down.

However, the configuration must look like the following image when versioned content is generated, and a versions.json must be available at the root of the project.
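As a rough sketch of the versioned form, the function below builds a `versions` definition with one entry per name found in versions.json. The options `label`, `path`, and `banner` are standard Docusaurus options. The function name, the label format, and the choice of banner are assumptions for illustration.

```javascript
// Build docs plugin options from the parsed content of versions.json.
// `buildVersionedOptions` is a hypothetical helper name.
function buildVersionedOptions(versionNames) {
  const versions = { current: { label: "latest" } };
  for (const name of versionNames) {
    versions[name] = {
      label: `v${name}`,
      path: name,
      // Other accepted banner values are "none" and "unreleased".
      banner: "unmaintained",
    };
  }
  return { lastVersion: "current", versions };
}

console.log(JSON.stringify(buildVersionedOptions(["4.0.x", "3.4.x"]), null, 2));
```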

Scripting versus manual updates

We wanted to avoid manually updating the docusaurus.config.js configuration file every time a new version is introduced or an old version is archived. We also wanted the ability to modify the version label and change the version banner behavior if needed.

To do this, we created a Node.js script that traverses and updates the docusaurus.config.js file. The script finds the entry for a specific plugin ID, and inserts the values for each version branch.

You can find the Node.js script under scripts/update_docusarus_config.js.

We chose a Node.js script because it was the most straightforward approach to modifying a JavaScript file, and our CI environment already had Node.js installed and available, which removed the need to install new tooling or dependencies.
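The core idea of finding a plugin by ID and inserting version entries can be sketched as follows. This simplified version works on an in-memory config object rather than rewriting the file on disk, and it assumes the docs plugin is declared in the `plugins` array as a `[name, options]` pair. The function name `setPluginVersions` is hypothetical.

```javascript
// Insert version definitions into the docs plugin entry with the given ID.
function setPluginVersions(config, pluginId, versions) {
  const entry = (config.plugins ?? []).find(
    (plugin) => Array.isArray(plugin) && plugin[1]?.id === pluginId
  );
  if (!entry) {
    throw new Error(`No plugin with id "${pluginId}" found`);
  }
  // Keep existing definitions (such as "current") and add the new ones.
  entry[1].versions = { ...entry[1].versions, ...versions };
  return config;
}

// Usage with a made-up config and a made-up plugin ID ("api").
const config = {
  plugins: [
    [
      "@docusaurus/plugin-content-docs",
      { id: "api", path: "api", versions: { current: { label: "latest" } } },
    ],
  ],
};

setPluginVersions(config, "api", { "4.0.x": { label: "v4.0.x", path: "4.0.x" } });
console.log(Object.keys(config.plugins[0][1].versions));
// → [ 'current', '4.0.x' ]
```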

To manage the displayed version label and banner, a file named versionsOverride.json holds the label and banner configuration. The file contains a small JSON configuration that the update_docusarus_config.js script consumes and takes into account when creating the new docusaurus.config.js.
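To illustrate how such overrides could be applied, here is a minimal sketch. The shape of the override entries (an array of objects with `version`, `label`, and `banner` fields) is an assumption made for this example, so check versionsOverride.json in the example repository for the real format. The function name is hypothetical.

```javascript
// Merge per-version overrides into an existing versions definition.
function applyOverrides(versions, overrides) {
  const result = { ...versions };
  for (const { version, ...fields } of overrides) {
    // Ignore overrides that refer to a version that was not generated.
    if (result[version]) {
      result[version] = { ...result[version], ...fields };
    }
  }
  return result;
}

const generated = {
  current: { label: "latest" },
  "3.4.x": { label: "v3.4.x", path: "3.4.x", banner: "unmaintained" },
};

// Assumed override format, for illustration only.
const overrides = [{ version: "3.4.x", label: "v3.4.x and prior", banner: "none" }];

console.log(applyOverrides(generated, overrides));
```

The function returns a new object and leaves the generated definitions untouched, so the same generated input can be reused with different override files.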

Archiving old versions

For a better user experience, and as recommended by the official Docusaurus documentation, we minimize the number of active documentation versions visible to users, and offload older versions from the main build to our archive domain.

To do this, we use the archiveVersions.json file. It contains a reference to a specific version branch and the external URL hosting the archived content — we use Netlify to host all our archived versions under a different domain. 

We leverage Netlify’s branch deploy feature to deliver this experience. From a docusaurus.config.js perspective, we simply instruct Docusaurus to read from archiveVersions.json when defining the version dropdown menu for the docs plugin.

With this approach, we can ensure we only support three active versions at any time. All other previous versions become external URLs pointing to the Netlify archive domain. 
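One way to wire archived versions into the dropdown is sketched below. The navbar item type `docsVersionDropdown` and its `dropdownItemsAfter` option are standard Docusaurus features. The shape assumed for archiveVersions.json (a map from label to external URL), the example.com URLs, and the function name are all hypothetical.

```javascript
// Convert an archive map into external links for the version dropdown.
function archivedDropdownItems(archive) {
  return Object.entries(archive).map(([label, href]) => ({ label, href }));
}

// Assumed file content, for illustration only.
const archiveVersions = {
  "v3.3.x": "https://version-3-3.archive.example.com",
  "v3.2.x": "https://version-3-2.archive.example.com",
};

// Navbar item that lists active versions first, then the archived links.
const versionDropdown = {
  type: "docsVersionDropdown",
  position: "left",
  dropdownItemsAfter: archivedDropdownItems(archiveVersions),
};

console.log(JSON.stringify(versionDropdown, null, 2));
```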

Abracadabra… watch the magic happen

Once all the orchestration steps are complete, we can either start a build with the command `npm run build` or preview the versioned content with the command `npm run start`. 

Below is a short GIF of the end-to-end flow. Take note of how the orchestration logic switches into different git branches and generates the versioned content. The folders versioned_docs and versioned_sidebars are generated throughout the process for each version branch and moved to the staging area.

Handling backports

The last step of our technical journey covers how we decided to tackle backporting changes. We wanted an experience that was relatively straightforward for our team to understand and use. We envisioned a flow where an author created a Pull Request (PR) and could choose where to backport the changes from the PR.

We investigated a few approaches, including developing our own backport solution. Luckily, we stumbled upon the open-source project The Backport Tool. We use this tool in a GitHub Action, and through it, we can offer our authors the experience of controlling backports through GitHub labels in a PR.

By selecting a label corresponding to a specific version branch, the author can trigger a subsequent PR for the particular version branch after merging the PR. The original PR receives a comment containing the status of each backport PR. 

In a merge conflict scenario, the generated status comment will display the reasons for the conflict. These scenarios are handled by git cherry-picking the merged commit into each respective version branch and manually resolving the merge conflict.
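The manual fallback can be sketched as a helper that turns PR labels into the git commands an author would run. This does not show how The Backport Tool works internally. The label convention (`backport-version-4-0`), the function names, and the placeholder commit SHA are assumptions for illustration, and the sketch assumes the PR was merged as a single commit.

```javascript
// Assumption: a label such as "backport-version-4-0" targets branch "version-4-0".
const BACKPORT_LABEL = /^backport-(version-\d+-\d+)$/;

// Pick out the target branches named by a PR's labels.
function targetBranches(labels) {
  return labels
    .map((label) => label.match(BACKPORT_LABEL))
    .filter(Boolean)
    .map((match) => match[1]);
}

// Commands for cherry-picking the merged commit into each target branch.
function manualBackportCommands(labels, commitSha) {
  return targetBranches(labels).flatMap((branch) => [
    `git checkout ${branch}`,
    // -x records the original commit in the new commit message.
    `git cherry-pick -x ${commitSha}`,
  ]);
}

console.log(
  manualBackportCommands(["documentation", "backport-version-4-0"], "abc1234").join("\n")
);
```

If the cherry-pick stops on a conflict, the author resolves it by hand and then runs `git cherry-pick --continue`.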

If you start the local development server in the example repository, you can find more information about this backport workflow by reviewing the tips and tricks page.

Over to you!

So there you have it: a scripted solution to implement git-based versioned documentation complete with archiving and backporting, already deployed and running great here at Spectro Cloud. 

We’re pretty proud of what we’ve built, but we know we’re standing on the shoulders of giants. At Spectro Cloud we’re big fans of open source, and over the past five years, we’ve been honored to contribute to projects like LocalAI, Kairos, Validator and SpectroMate. We dedicate this article as a token of appreciation for the Docusaurus project and its hard-working maintainers. Everything you just read builds on their amazing work.

Now it’s your turn. We hope this article and the associated example repository help spark ideas for how you and your team can tackle your own documentation versions. If you have questions, take a look at the example repository and review the README’s FAQ section.

Do you have experience with Docusaurus and git and want to chat about it? Do you have any ideas or suggestions? We would love to hear from you. Join our Slack community, where we discuss everything from Kubernetes to open source.

And of course, do take a look at how the version functionality works in practice on our docs site – we hope it makes your experience as a Palette user that bit easier!