October 27, 2022

Visualizing The Book Writing Process: And Help I’m Looking For (Vega and Vega-Lite, SVG, and ideas?)

By Gene Kim

Back in 2019, after The Unicorn Project was published, I published a GitHub repo of code (written in Clojure, of course) that generates a graph by analyzing the git repository where I committed the daily updates to the book manuscript. The graph below shows the word count over time, and by parsing the diffs, it also shows the region of the manuscript that was modified.
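To make the "parsing the diffs" step concrete, here is a minimal Python sketch. It is not the Clojure code from the repo. It assumes the diffs are in the unified format that `git diff` prints, and the function names `changed_regions` and `word_count` are made up for illustration.

```python
import re

# Matches unified-diff hunk headers such as "@@ -10,2 +10,3 @@ optional context".
# The line count is optional in the format and defaults to 1 when omitted.
HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def changed_regions(diff_text):
    """Return (start_line, line_count) in the NEW file for each hunk.

    A count of 0 means the hunk only deleted lines at that position.
    """
    regions = []
    for line in diff_text.splitlines():
        match = HUNK_HEADER.match(line)
        if match:
            start = int(match.group(3))
            count = int(match.group(4)) if match.group(4) is not None else 1
            regions.append((start, count))
    return regions


def word_count(text):
    """Count whitespace-separated words in one version of the manuscript."""
    return len(text.split())
```

Calling `changed_regions` on the diff of each commit gives the line ranges to plot against the commit date, and `word_count` on each committed version gives the height of the curve.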

What I loved about it was that it shows the sequential process of working through a manuscript, from top to bottom, several times, restructuring and refining the words. It helped me understand how one spends the three years between the first words written and handing off the finished manuscript.

Generating this graph was a first stab at doing something I’ve wanted to do for nearly a decade, which is to visualize the way a book manuscript changes over time.

But what I really wanted to do was something like Gource. The video below shows a representation of the evolution of the Python code base (from Aug 1990 to Jun 2012).

Earlier this year, three years after I wrote the code above, I dug it back out. My goal was to see if I could use it to help track progress and visualize work on the current book I’m working on with my mentor, Dr. Steven Spear, which we have until April to get done. 

Here’s a modification of that graph for that manuscript, showing where all the manuscript adds/modifies/deletes are.

But this was a far cry from what the Gource visualization shows.  

I was truly inspired by one of the Glamorous Toolkit / Smalltalk pairing sessions with Tudor Girba and Eric Normand. I decided to write a recursive descent Markdown parser to generate a Vega tree diagram and try animating it over time / commits.
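Here is a rough Python sketch of the idea, not the actual Clojure parser. It assumes the manuscript structure is carried by ATX headings (`#`, `##`, and so on) and ignores everything else, so it is far less than a full Markdown parser. It descends recursively through the heading levels and emits flat rows with `id` and `parent` fields, which is the shape Vega's `stratify` transform accepts. The function names and the `name` field are illustrative choices.

```python
import re

# ATX headings only: 1 to 6 '#' characters, whitespace, then the title.
# Assumption: no '#' lines inside fenced code blocks; a real parser must skip those.
HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def tokenize(markdown):
    """Return a list of (level, title) pairs, one per heading line."""
    tokens = []
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            tokens.append((len(match.group(1)), match.group(2)))
    return tokens


def parse_section(tokens, pos, level, parent_id, rows):
    """Consume every heading deeper than `level`, starting at `pos`.

    Each heading becomes a child of `parent_id`; its own sub-headings are
    handled by the recursive call. Returns the position of the first
    token that was not consumed.
    """
    while pos < len(tokens) and tokens[pos][0] > level:
        child_level, title = tokens[pos]
        node_id = len(rows) + 1
        rows.append({"id": node_id, "name": title, "parent": parent_id})
        pos = parse_section(tokens, pos + 1, child_level, node_id, rows)
    return pos


def markdown_to_rows(markdown, root_name="manuscript"):
    """Flatten the heading tree into rows with id / name / parent fields."""
    rows = [{"id": 1, "name": root_name, "parent": None}]
    parse_section(tokenize(markdown), 0, 0, 1, rows)
    return rows


def vega_tree_data(rows):
    """Wrap the rows in a Vega data entry that rebuilds the hierarchy."""
    return {
        "name": "tree",
        "values": rows,
        "transform": [{"type": "stratify", "key": "id", "parentKey": "parent"}],
    }
```

Running `markdown_to_rows` once per commit gives one table per animation frame. A layout transform (tree or radial) would still have to be added to the data entry before drawing.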

This is an early prototype, but I was pleased with the output. Shown below are 103 frames of animation showing the structure of the manuscript, in both a radial tree view and tree view, showing the evolution of The Unicorn Project manuscript.

I’m hoping to combine it with the word count graph shown above.

Help I’m Looking For From Anyone!

(Jack Rusher: I’ve been thinking of you for the last year as I’ve been working on this, thinking you’d have some awesome insights and ideas — I’d love any impressions or advice!)

Questions from the top of my head, for anyone with opinions and ideas!

  • I’d love to show the Vega radial tree and tree diagrams side by side, and have them be about the same size — but using hconcat or similar operations requires them to share the same data set…  So that won’t work?  
  • I convert the SVG diagrams into PNG files, but it causes them to get super fuzzy — are there better ways to do this?  (Jack, I’m using your awesome darkstar library — and I have a proposed doc change to disclaim that the call into GraalJS has to be single-threaded. I spent a couple of hours yesterday trying to make a multi-threaded version — any chance anyone wants to pair on that together to get that done?) (PS: GraalJS is amazing. And I’d love to try running this inside of GraalVM to see how much JIT helps performance.)
  • Maybe the better approach is to take the parsed Markdown and split it into directories and files so that Gource can generate the visualization?
  • But the more useful approach would be a JS app, which could take all 103 frames, and use a slider bar to show each frame.


The processing pipeline so far:

  • get all commits from the repo
  • identify the Markdown file to extract
  • parse the Markdown into a tree
  • overlay git add/modify/delete onto the tree
  • convert into a Vega tree or radial tree
  • convert the Vega diagram into an SVG diagram (using darkstar)
  • convert SVG into PNG (using macOS RSVG program)
  • convert PNG files into GIF (using ImageMagick)
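The last two steps can be sketched as command lines. This Python sketch only builds the argument lists and does not run anything. It assumes the "RSVG program" is `rsvg-convert` from librsvg and that ImageMagick is available as `convert`; the file names and the 2000-pixel width are made up. Rasterizing at an explicit, larger pixel width is one thing worth trying against the fuzzy PNGs, though whether it helps here is untested.

```python
def frame_name(index, ext):
    """Zero-padded names keep the frames in order when sorted as text."""
    return f"frame-{index:04d}.{ext}"


def svg_to_png_cmd(svg_path, png_path, width_px=2000):
    """rsvg-convert: -w sets the output width in pixels, -o the output file."""
    return ["rsvg-convert", "-w", str(width_px), "-o", png_path, svg_path]


def pngs_to_gif_cmd(png_paths, gif_path, delay_ticks=20):
    """ImageMagick: -delay is in 1/100 s per frame, -loop 0 repeats forever."""
    return ["convert", "-delay", str(delay_ticks), "-loop", "0", *png_paths, gif_path]


def build_commands(frame_count, gif_path="manuscript.gif"):
    """One rasterize command per frame, then a single command to assemble the GIF."""
    commands = []
    pngs = []
    for index in range(frame_count):
        png = frame_name(index, "png")
        commands.append(svg_to_png_cmd(frame_name(index, "svg"), png))
        pngs.append(png)
    commands.append(pngs_to_gif_cmd(pngs, gif_path))
    return commands
```

Each list can be handed to `subprocess.run(cmd, check=True)` once the SVG frames exist on disk.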

I couldn’t have done any of this without the Cursive / IntelliJ IDE, the Clojure REPL, the fantastic Clerk notebooks (which enable “moldable development”, a term coined by Tudor Girba), and Portal (which can render Vega diagrams).

I plan on making a video of how these tools make for such a fantastic development environment and workflow.

- About The Authors
Gene Kim

Award winning CTO, researcher, and author.
