XML

In a previous guide, I wrote a short overview of how to document JSON. If you haven't read it yet, I highly recommend that you do so, because the following content builds on it.

Note: this guide makes references to HTML. You do not need to know HTML to continue with this guide but it does help to understand how HTML elements work.

In this guide, we will revisit our delivery robot API as mentioned in previous technical writing guides that I have written.


XML stands for eXtensible Markup Language. It looks very much like HTML but is meant to describe data. Unlike HTML, XML tags are not predefined, so you can define your own tags as you see fit. Because of this customizable nature, XML can be used for any kind of structured data. Also, unlike JSON, XML has only one type: string.

With that, let's jump into the structure of XML and learn a little about XML in general.

Structured Data

XML allows for the transfer of data in a structured format. Think of this structure as a tree: from the root, you have several branches. Each branch in turn has its own branches, and so on, until you finally reach the end with a leaf.

XML can contain a dictionary of lists, lists of dictionaries, and a dictionary of dictionaries. A dictionary, in terms of XML, is similar in nature to JSON's key/value pair. But, unlike JSON, an XML file usually begins with a line that declares a few things about the file itself. This line is the XML declaration, often referred to as the header.

This header information usually includes the XML version, the character encoding, and so forth. From an API writer's standpoint, you can basically ignore it.

Example of a declaration:

<?xml version="1.0" encoding="UTF-8" standalone="no" ?>

Tags and Content

Tags work like HTML tags and are enclosed in angle brackets (< >). The start and end tags must match exactly, including upper and lower case. Tag names can contain letters, numbers, underscores, hyphens, and periods, and they must start with a letter or an underscore.

Example:

<type>roller</type>

Empty tags can use self-closing tags like this one:

<space_travel/>

Content is what is found between the opening and closing tags. If the content has no tags, it's treated like a string. In the type (tag) example above, roller is the content.
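
If you want to check these rules for yourself, here is a minimal sketch using Python's standard xml.etree.ElementTree module. Python is only my choice for illustration (the robot API does not require it), and the helper name is_well_formed is made up for this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Start and end tags match, so this parses.
good = "<type>roller</type>"
# Tag names are case sensitive, so <type> and </Type> do not match.
bad = "<type>roller</Type>"

print(is_well_formed(good))
print(is_well_formed(bad))

# Content without nested tags comes back as a plain string.
content = ET.fromstring(good).text
print(content)

# A self-closing tag has no content at all, so its text is None.
empty = ET.fromstring("<space_travel/>")
print(empty.text)
```

A check like this is handy before you paste a sample into your documentation, since a mismatched tag is easy to miss by eye.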

Nesting

Just like JSON objects, nested data can sit inside a set of tags.

Example of nested tags:

<robot>
  <type>roller</type>
  <weight>20</weight>
  <sensors>optical</sensors>
</robot>

In the example above, the robot tag has three tags nested within it (type, weight, and sensors).

Here is an example of a full XML file for a robot object with type, weight, and sensors data. This time, sensors holds a list of sensor tags:

<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<robot>
  <type>roller</type>
  <weight>20</weight>
  <sensors>
    <sensor>optical</sensor>
    <sensor>distance</sensor>
    <sensor>vibration</sensor>
    <sensor>fuel</sensor>
  </sensors>
</robot>
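
To make the tree idea concrete, the following sketch parses a robot file like the one above and walks it from the root down to the leaves. It assumes Python's standard xml.etree.ElementTree module, and the walk function is a hypothetical helper written for this guide.

```python
import xml.etree.ElementTree as ET

ROBOT_XML = """<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<robot>
  <type>roller</type>
  <weight>20</weight>
  <sensors>
    <sensor>optical</sensor>
    <sensor>distance</sensor>
    <sensor>vibration</sensor>
    <sensor>fuel</sensor>
  </sensors>
</robot>"""

# Parse from bytes so the parser honors the declared encoding.
root = ET.fromstring(ROBOT_XML.encode("utf-8"))


def walk(element, depth=0):
    """Yield (depth, tag, text) for an element and all of its descendants."""
    text = (element.text or "").strip()
    yield depth, element.tag, text
    for child in element:
        yield from walk(child, depth + 1)


# Print the tree with two spaces of indentation per level.
for depth, tag, text in walk(root):
    print("  " * depth + tag + (": " + text if text else ""))

# The repeated sensor tags behave like a list.
sensor_names = [sensor.text for sensor in root.find("sensors")]
print(sensor_names)
```

The branches (robot and sensors) carry no text of their own, while the leaves (type, weight, and each sensor) carry the actual data.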

Attributes

In addition to content, tags can have attributes that contain simple data (such as a string or number). Attributes act like key/value pairs when accessing the data within them.

The key is written as a name within the tag, and you do not put quotes around it. The value must be in a set of quotation marks. Key names must start with a letter or an underscore and can use any combination of letters, numbers, and underscores. Spaces and most punctuation characters are not allowed in key names.

The most common design in XML files is to use attributes for properties of the data (such as metadata).

Example of attributes:

<robot>
  <weight unit="kilograms">20</weight>
  <velocity decimals="2" unit="km/h">32.22</velocity>
  <battery_life unit="hours" active="true">10</battery_life>
</robot>

In this particular case, the robot has a weight of 20kg, is moving at 32.22km/h, and has 10 hours of active battery life left.

Here is an example of attributes and an array:

<robot type="roller" weight="20" weight_unit="kilograms">
  <sensors>
    <sensor name="optical"/>
    <sensor name="distance"/>
    <sensor name="vibration"/>
  </sensors>
</robot>

From this example, we can tell that the robot is a roller type, weighs 20kg, and has three sensors: optical, distance, and vibration.
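
Here is a short sketch of how a program might read those attributes, assuming Python's standard xml.etree.ElementTree module. Note that every attribute value comes back as a string, even the weight.

```python
import xml.etree.ElementTree as ET

ROBOT_XML = """
<robot type="roller" weight="20" weight_unit="kilograms">
  <sensors>
    <sensor name="optical"/>
    <sensor name="distance"/>
    <sensor name="vibration"/>
  </sensors>
</robot>
"""

robot = ET.fromstring(ROBOT_XML)

# Attributes are exposed as a dictionary of key/value pairs.
print(robot.attrib)

# get() returns the value as a string, or None if the key is missing.
weight = robot.get("weight")
missing = robot.get("color")
print(weight, missing)

# Collect the name attribute from every sensor tag.
sensor_names = [sensor.get("name") for sensor in robot.iter("sensor")]
print(sensor_names)
```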

Comments

You can use comments in XML files just like HTML files by using the opening and closing comment tags of <!-- and -->. Everything in between these comment tags will be ignored.

Consider the following:

<robot type="roller" weight="20" weight_unit="kilograms">
  <sensors>
    <!-- This element needs at least one sensor to be listed in order to work properly. If this is empty, please review the source file. -->
    <sensor name="optical"/>
  </sensors>
</robot>

As noted, the comment nested in the sensors tag provides some information about the data, but parsers and auto-documentation systems normally ignore it, so it won't show up in generated output.
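
As a quick illustration, Python's standard xml.etree.ElementTree parser drops comments by default. This is a sketch of one parser's behavior, not a rule for every tool, so check how your own toolchain treats comments.

```python
import xml.etree.ElementTree as ET

xml_with_comment = """
<sensors>
  <!-- This element needs at least one sensor. -->
  <sensor name="optical"/>
</sensors>
"""

sensors = ET.fromstring(xml_with_comment)

# Only the sensor tag survives parsing; the comment is not a child.
children = [child.tag for child in sensors]
print(children)

# Writing the tree back out shows that the comment is gone.
serialized = ET.tostring(sensors, encoding="unicode")
print("<!--" in serialized)
```

The takeaway for writers: anything a reader must know belongs in the documentation itself, not only in an XML comment.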

Namespaces

Namespaces are commonly used in structured hierarchies, like XML documents, to allow the reuse of names in different contexts. A namespace usually appears as a prefix on a tag name, separated from it by a colon (:). The prefix must be declared with an xmlns attribute that binds it to a URI, either on the tag that uses it or on one of its parent tags.

For example, we have a tag called wheels and another called front:wheels. The difference here is that wheels handles data about wheels in a general sense, while front:wheels handles the specifics of the wheels located on the front side. The URI in this example is only a placeholder.

<robot xmlns:front="http://example.com/robot/front">
  <wheels>
    <tire_count count="4"/>
  </wheels>
  <front:wheels>
    <tire_pressure_left unit="pounds" value="20"/>
    <tire_pressure_right unit="pounds" value="20"/>
  </front:wheels>
</robot>
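
The sketch below shows how a parser keeps the two wheels tags apart. It assumes Python's standard xml.etree.ElementTree module, and the namespace URI is a made-up placeholder, not part of any real robot API.

```python
import xml.etree.ElementTree as ET

FRONT_NS = "http://example.com/robot/front"  # hypothetical placeholder URI

xml_text = f"""
<robot xmlns:front="{FRONT_NS}">
  <wheels>
    <tire_count count="4"/>
  </wheels>
  <front:wheels>
    <tire_pressure_left unit="pounds" value="20"/>
    <tire_pressure_right unit="pounds" value="20"/>
  </front:wheels>
</robot>
"""

robot = ET.fromstring(xml_text)

# Map the prefix used in searches to the namespace URI.
prefixes = {"front": FRONT_NS}

general = robot.find("wheels")
front = robot.find("front:wheels", prefixes)

# The parser replaces the prefix with the full URI in braces.
print(general.tag)
print(front.tag)

# A search for plain wheels does not match the namespaced tag.
print(len(robot.findall("wheels")))
```

Because the parser works with the URI rather than the prefix, it is worth documenting the URI alongside the prefix in your tables.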

Indentation

Indentation is typically used to indicate the nesting of information. It is handled with whitespace, which means spaces, new lines, and so on. You may have noticed the indentation in all the previous examples and not thought much about it, but each example used a very rigid spacing policy of two spaces per level.

While some think that spacing doesn't matter in an XML file unless it is inside quotation marks, I tend to disagree. I believe that one should have clear and consistent usage of whitespace throughout the entire documentation set. Consistent spacing makes the XML document easier to read and just looks more professional.

Properly formatted XML includes:

  • An indent for every new level of nesting
    • 2 to 4 white spaces (depending on your team's/company's coding policy)
  • Tags that do not contain other tags can have start and end tags on the same line.
  • Tags that are nested should be on their own lines.

There is an ongoing argument about white spaces versus tabbed spaces. Depending on the options of your favorite XML editor, a tabbed space can be configured to use white spaces instead or use any other spacing configuration that the development team has blessed.
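
If your samples arrive as one long line, you do not have to indent them by hand. Here is a minimal sketch that applies the two-space policy automatically, assuming Python 3.9 or later, where the standard xml.etree.ElementTree module provides an indent function.

```python
import xml.etree.ElementTree as ET

# A compact, single-line sample with no whitespace between tags.
compact = (
    "<robot><type>roller</type>"
    "<sensors><sensor>optical</sensor></sensors></robot>"
)

root = ET.fromstring(compact)

# Insert line breaks and two spaces per nesting level, in place.
ET.indent(root, space="  ")

pretty = ET.tostring(root, encoding="unicode")
print(pretty)
```

Changing the space argument to four spaces or a tab character is all it takes to follow a different team policy.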

Documenting XML

Overall Approach to Documenting XML

Short caveat: There is no one perfect way to document XML files. You should review your company and/or team's policies on the particulars of this subject.

API requests often use either JSON or XML, and the request may specify which format is used. Because of this, it works to your advantage to document both JSON and XML files using similar (if not the same) table columns. If the API is designed well, the key names in JSON and the tag names in XML should be identical. Also, the types should be identical.
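
To show what "identical names" means in practice, the sketch below compares a JSON payload and an XML payload for the same robot. Both payloads are hypothetical samples I made up for this guide, and the code assumes Python's standard json and xml.etree.ElementTree modules.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical responses from the same endpoint in two formats.
json_text = '{"robot": {"type": "roller", "weight": 20}}'
xml_text = "<robot><type>roller</type><weight>20</weight></robot>"

json_fields = json.loads(json_text)["robot"]
xml_root = ET.fromstring(xml_text)
xml_fields = {child.tag: child.text for child in xml_root}

# In a well-designed API, the key names and tag names line up.
same_names = sorted(json_fields) == sorted(xml_fields)
print(same_names)

# JSON carries a real number, while XML carries the string "20".
print(type(json_fields["weight"]).__name__)
print(type(xml_fields["weight"]).__name__)
```

When the names line up like this, one table with shared columns can describe both formats, with the documented type telling developers how to read the XML string.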

Documenting Attributes

Both JSON and XML files can deliver pretty much the same data, but there is a structural difference between the two: XML has attributes and JSON does not. If an API uses both JSON and XML, often the XML does not make use of attributes in its tags.

If the XML has only a few attributes, you may want to document the attributes in the Notes column of your table. If the XML has many attributes, try adding an Attributes column to your table.

How to Document Types

There are two kinds of types in XML: simple and complex. Simple types are strings whereas complex contain other tags.

Technically, all simple types are strings but the software parsing the data may convert the string into another type. Because of this, you will need to document the kind of type so the developers know what to use for both content and attribute values.
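
Here is a small sketch of that conversion step. The DOCUMENTED_TYPES mapping and the convert helper are hypothetical; they stand in for whatever your documentation tells developers to expect. The code assumes Python's standard xml.etree.ElementTree module.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from tag name to the documented type.
DOCUMENTED_TYPES = {"type": str, "weight": int, "velocity": float}


def convert(element):
    """Convert an element's string content to its documented type."""
    caster = DOCUMENTED_TYPES.get(element.tag, str)
    return caster(element.text)


robot = ET.fromstring(
    "<robot><type>roller</type><weight>20</weight>"
    "<velocity>32.22</velocity></robot>"
)

# Every value starts out as a string in the parsed tree.
raw = {child.tag: child.text for child in robot}
print(raw)

# After conversion, each value has the documented type.
values = {child.tag: convert(child) for child in robot}
print(values)
```

Without the documented type, a developer has no way to know whether "20" should be treated as a number or left as text.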

Complex types are elements that contain other tags, and you should list the element name in the Type column. If you are using multiple tables, then you should have a table for each element type. In this setup, each complex type listed in the Type column should link to the table that describes its nested tags.

Consider the following tables:

Object level of robot

Element     Description                                   Type
----------  --------------------------------------------  ------------
id          Unique id of robot                            id element

Id: fields for robot's id.

Element     Description                                   Type
----------  --------------------------------------------  ------------
id          Unique id of robot                            number
init_date   Robot's initialization date                   dateString
created_by  Name of person/machine who built this robot   user element

User: fields for user's id.

Element     Description                                   Type
----------  --------------------------------------------  ------------
id          Unique id of user                             number
name        Name of person/machine                        string
location    Location of user                              number
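
A first draft of tables like these can be generated from a sample file. The sketch below is only a starting point under my own assumptions: the sample values are made up, the table_rows helper is hypothetical, and it can only tell simple from complex. You would still fill in descriptions and refine the types (for example, changing "created_by element" to "user element") by hand.

```python
import xml.etree.ElementTree as ET

# Made-up sample data that follows the layout of the tables above.
SAMPLE = """
<robot>
  <id>
    <id>7</id>
    <init_date>2020-01-01</init_date>
    <created_by>
      <id>3</id>
      <name>Ada</name>
      <location>12</location>
    </created_by>
  </id>
</robot>
"""


def table_rows(element):
    """Return (element name, type) rows for the direct children."""
    rows = []
    for child in element:
        # A tag that contains other tags is complex; otherwise it is simple.
        kind = f"{child.tag} element" if len(child) else "string"
        rows.append((child.tag, kind))
    return rows


robot = ET.fromstring(SAMPLE)
robot_rows = table_rows(robot)
id_rows = table_rows(robot.find("id"))

for name, kind in robot_rows + id_rows:
    print(f"{name:<12}{kind}")
```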

One additional note to consider when creating these tables: as with JSON, your request and response tables will need a Required column. Again, mark each element as "Required" or "Optional" and put the default values of optional elements in the Notes column, as noted in the Documenting API Via JSON guide.

Tools for Writing Structured Data Documentation

There is a wide variety of tools at your disposal for creating structured data documentation. Whichever you choose, make sure it meets these bare minimum requirements:

  • The ability to create tables
  • The ability to make links, both internal and external to the document
  • The ability to assign a monospace font to code samples

Format control and templates are great to have, but they are a desire, not necessarily a requirement, for your documentation.

Old-school tools include programs like Microsoft Word (with the result saved as a PDF for distribution on the web), but in my experience you get a lot more bang for your buck with a wiki-based system like MediaWiki, Atlassian's Confluence, or MindTouch. You can go so far as to use a standard HTML editor and write the document in either HTML or Markdown, but you may get bogged down in the finer details of consistent formatting. There are also many commercial tools that are up to the challenge of documenting your API, such as MadCap Flare and RoboHelp. These tools tend to provide more functionality for technical documentation but may fall short in features for general documentation.

Why Use Structured Data for Documentation

A key reason to write structured documentation properly is so that the document is readable by both humans and machines. This works to your advantage because the documentation can be included in automated testing and interactive documentation. For example, check out Swagger's Pet Store. This site delivers well-formatted, interactive documentation that is easy to use and includes a great search function.

I don't know about you but I use search and filter functions on documentation whenever it's available. And if I may stand on a soapbox for a moment, I'd like to say this: Please, please, please include a search feature! Technical documentation can get very long (and "wordy"). Wading through these documents can be painful but if the document includes a search filter, your quest for your API field can be cut down to seconds instead of minutes.

Using Structured Data for Documentation

Schema

XSD (XML Schema Definition) files describe the structure of XML files. This structure is also known as the schema. The schema file describes which tags, attributes, and types are used in the XML files.

XSD files are written in XML markup. So if you know XML, you'll have no problem reading XSD files.

Using XML to Describe XML (XSD)

The following example lists all the valid types and values of sensors a robot may use in our documentation sample:

<xs:simpleType name="sensors">
  <xs:annotation>
    <xs:documentation>Types of sensors available for robots</xs:documentation>
  </xs:annotation>
  <xs:restriction base="xs:string">
    <xs:enumeration value="optical">
      <xs:annotation>
        <xs:documentation>Visual sensor to detect light</xs:documentation>
      </xs:annotation>
    </xs:enumeration>
    <xs:enumeration value="distance">
      <xs:annotation>
        <xs:documentation>Laser based sensor to detect distance of nearby objects</xs:documentation>
      </xs:annotation>
    </xs:enumeration>
    <xs:enumeration value="vibration">
      <xs:annotation>
        <xs:documentation>Seismic based sensor to detect vibrations of payload and traveling surface</xs:documentation>
      </xs:annotation>
    </xs:enumeration>
  </xs:restriction>
</xs:simpleType>
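
Because an XSD file is itself XML, the descriptions inside it can be pulled out by a script and dropped into a documentation table. The sketch below is one way to do that under my own assumptions: it uses Python's standard xml.etree.ElementTree module, wraps a shortened copy of the sample in an xs:schema root so that the xs prefix is declared, and only reads the schema rather than validating anything against it.

```python
import xml.etree.ElementTree as ET

# The standard XML Schema namespace that the xs prefix refers to.
XS = "http://www.w3.org/2001/XMLSchema"
NS = {"xs": XS}

xsd_text = f"""
<xs:schema xmlns:xs="{XS}">
  <xs:simpleType name="sensors">
    <xs:restriction base="xs:string">
      <xs:enumeration value="optical">
        <xs:annotation>
          <xs:documentation>Visual sensor to detect light</xs:documentation>
        </xs:annotation>
      </xs:enumeration>
      <xs:enumeration value="distance">
        <xs:annotation>
          <xs:documentation>Laser based sensor to detect distance of nearby objects</xs:documentation>
        </xs:annotation>
      </xs:enumeration>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
"""

schema = ET.fromstring(xsd_text)

# Collect one (value, description) row per allowed sensor value.
rows = []
for enum in schema.iterfind(".//xs:enumeration", NS):
    doc = enum.find("xs:annotation/xs:documentation", NS)
    rows.append((enum.get("value"), doc.text if doc is not None else ""))

for value, description in rows:
    print(f"{value}: {description}")
```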

Disadvantages

While XSD is pretty powerful, it does have a few shortcomings:

  1. XSD can handle simple descriptions but if your descriptions get too wordy, the description could wrap oddly or just take up too much cell space in the table.
  2. XSD can't use images (diagrams, screenshots, and so forth would have to be handled in another fashion).
  3. Finally, XSD cannot use links.