Quick abstracts of YAML or JSON documents

When I work with unfamiliar YAML files (deployment manifests, product metadata, serialized records, etc.), I want to quickly get a sense of a few things:

  • What is the set of keys in this data structure?
  • If the structure (nested keys) of the document changed over time, what is a quick summary of the changes?


Given the following long YAML file, I don’t really want to read through all of it to learn what keys and paths are available in it:

{
  "receipt": "Oz-Ware Purchase Invoice",
  "date": "2012-08-06",
  "customer": {
    "given": "Dorothy",
    "family": "Gale"
  },
  "items": [
    {
      "part_no": "A4786",
      "descrip": "Water Bucket (Filled)",
      "price": 1.47,
      "quantity": 4
    },
    {
      "part_no": "E1628",
      "descrip": "High Heeled \"Ruby\" Slippers",
      "size": 8,
      "price": 100.27,
    …(many, many more items)

Let’s strip out the values and focus on structure, summarizing all array entries as one:

>structure_digest order1.yml order2.yml ...

This summary hints at the basic structure of the file, particularly removing the noise of many items having very similar content and keys.
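To make the idea concrete, here is a minimal Python sketch of this kind of digest. It is not the tool's actual implementation: the function name `digest`, the choice to report only leaf paths, and the `[]` marker for array entries are all assumptions made for illustration. The sample data is a shortened version of the invoice above.

```python
import json


def digest(node, prefix=""):
    """Return the set of leaf key paths in a parsed JSON/YAML value.

    Hypothetical helper: the path notation (leading dot, "[]" for
    array entries) is an assumption, not the tool's documented format.
    """
    if isinstance(node, dict) and node:
        paths = set()
        for key, value in node.items():
            paths |= digest(value, f"{prefix}.{key}")
        return paths
    if isinstance(node, list) and node:
        paths = set()
        # Every array entry is folded into the same "[]" path, so many
        # similar items contribute the same few paths.
        for item in node:
            paths |= digest(item, f"{prefix}[]")
        return paths
    # Scalars (and empty containers) end the path.
    return {prefix}


order = json.loads("""
{
  "receipt": "Oz-Ware Purchase Invoice",
  "customer": {"given": "Dorothy", "family": "Gale"},
  "items": [
    {"part_no": "A4786", "price": 1.47, "quantity": 4},
    {"part_no": "E1628", "price": 100.27, "size": 8}
  ]
}
""")

for path in sorted(digest(order)):
    print(path)
```

Because paths are collected into a set, the two items collapse into one group of `.items[]` paths, and a key that appears in only some entries (such as `size`) still shows up once.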


Usage: structure_digest [options] File1[, File2, ...]
    -t, --tree                       replace repeated suffixes with indents
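
One plausible reading of the `--tree` option is that path segments shared with the previous line are printed once and everything below them is indented. The Python sketch below illustrates that reading only; the function name `render_tree`, the two-space indent, and the sample paths are assumptions, and the tool's real output may look different.

```python
def render_tree(paths):
    """Render dotted paths as an indented tree (hypothetical layout)."""
    lines = []
    previous = []
    # Sort by segment list so that siblings stay grouped together.
    split_paths = sorted(p.removeprefix(".").split(".") for p in paths)
    for segments in split_paths:
        # Count leading segments shared with the previous path.
        shared = 0
        while (
            shared < len(segments) - 1
            and shared < len(previous)
            and segments[shared] == previous[shared]
        ):
            shared += 1
        # Print only the segments that were not already printed.
        for depth in range(shared, len(segments)):
            lines.append("  " * depth + "." + segments[depth])
        previous = segments
    return lines


# Made-up sample paths for illustration.
sample_paths = [
    ".pagination.items",
    ".pagination.page",
    ".pagination.urls.last",
    ".pagination.urls.next",
    ".releases[].artist",
    ".releases[].title",
]

print("\n".join(render_tree(sample_paths)))
```

This sketch splits paths on `.`, so it would mis-handle keys that themselves contain dots.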

Usecases: Web APIs

Some web services provide a rich API of music records. Fetching a page of Pink Floyd’s releases from one of them returns a hefty 15K of minified JSON:

curl -s > pink-floyd.json
>wc -c pink-floyd.json
    15186 pink-floyd.json
>head pink-floyd.json
{"pagination": {"per_page": 50, "items": 1330, "page": 1, "urls": {"last":
"", "next":
""}, "pages":
27}, "releases": [{"thumb":
"", "artist": "Pink
Floyd, The*", "main_release": 1090924, "title": "Apples And Oran…

That is hundreds of lines in my terminal. But we can now quickly understand this document:

>structure_digest --tree pink-floyd.json

Usecases: Configuration files

A BOSH manifest specifies a cloud deployment. It’s used by Cloud Foundry and its configuration is rich. Let’s abstract its example manifest and find the fields configuring a BOSH “job”:

>structure_digest bosh_example.yml | grep -E "^\.jobs"

Pretty neat.

Finding structure changes with diff

If you have two versions of some data format and an example of each, here’s a quick way to see what changed:

>diff <(structure_digest old.json) <(structure_digest new.json)
< .pagination.pages
< .pagination.per_page
> .pagination.limit
> .pagination.offset

This is great: we can tell that the API switched from page-based pagination to offsets and limits.
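
The same comparison can be sketched without the shell as a set difference over two lists of paths. This is only an illustration in Python: the name `structure_diff` is hypothetical, and the path sets are made-up sample data built around the four paths in the diff output above.

```python
def structure_diff(old_paths, new_paths):
    """Return (removed, added) paths between two structure digests."""
    old_set, new_set = set(old_paths), set(new_paths)
    removed = sorted(old_set - new_set)  # only in the old document
    added = sorted(new_set - old_set)    # only in the new document
    return removed, added


# Made-up digests; the unchanged paths are illustrative filler.
old_digest = [
    ".pagination.items",
    ".pagination.pages",
    ".pagination.per_page",
    ".releases[].title",
]
new_digest = [
    ".pagination.items",
    ".pagination.limit",
    ".pagination.offset",
    ".releases[].title",
]

removed, added = structure_diff(old_digest, new_digest)
for path in removed:
    print("<", path)
for path in added:
    print(">", path)
```

Unlike `diff`, this ignores line order entirely, which is usually what you want when comparing sets of paths.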

Learn more & respond

The project is on GitHub. Please follow it there for new features and changes.

What do you think of this tool? Do you love it? Do you hate it? Let me know in the comments.

  1. Neat. This gives a very interesting view into the massive BOSH deployment manifests used to deploy Cloud Foundry:

    A long YAML file will still be a long structure_digest :)

  2. Ken Mayer says:

    Is this reverse engineering on the road to an XML schema?

  3. Serguei Filimonov says:

    BOSH deployment manifests are interesting because the “properties” key is just a freeform namespace. I usually remove it via ‘grep’, and the rest is more about the manifest itself.

    I try to avoid the word ‘schema’ when thinking of this tool, because schema to me implies validation (rejecting or accepting something), whereas this just learns of the structure from multiple files and passively reports available paths.
