XSL, plists and a bit of JSON

Apple’s Cocoa APIs can save an object graph to a file using the plist format. In its modern form it’s an XML dialect, which should be a good thing, but it’s not.

Apple’s engineers had probably never heard of XSL when they dreamed up the format, because they made it needlessly difficult to transform.

In this post I’ll show you how to transform a plist file to a format that is easier to work with, and I’ll also show you how to transform that intermediate format into JSON, thus making a complete plist to JSON pipeline.

The problem with the plist format

In plists, nodes are identified by their siblings. This is a headache for tools like XSL, which work with one node at a time rather than many. The dict element defines a serialized object in the plist format, and instead of doing the common-sense thing and giving each child of a dict an attribute that identifies its name, Apple’s engineers decided to have each value node preceded by a key node, which contains the key.

The problem here is that the name of a property is meta-data, and should, in my opinion, be an attribute of the value node (attribute and meta-data are synonyms, but that is not the only reason).

By having a value node depend on its preceding sibling to identify it, the plist format feels fragile. If a key node is accidentally removed, or the nodes are rearranged, there is no way of identifying values anymore. Granted, that shouldn’t happen, but it might happen as a side effect of something else. For example, if you couldn’t use the preceding-sibling axis in XPath, you would have a hard time reaching the key node defining the name of the current node.

It’s perfectly possible to use XSL to work with plists, but it’s not as simple as with other, “proper” XML formats, and there are two problems. To know the name of a node you have to do a silly and messy lookup (more on this below), and you have to decide whether you are going to work with the key nodes or the value nodes (dict, array, string, integer, etc.). My suggestion is the latter, because in the context of an array node there won’t be any key, so you can write different templates for the different modes.

For the first problem described above, consider this plist excerpt:

<key>name</key>
<string>Theo</string>

The most common use case would be to transform this into some kind of key-value format (let’s say “key: value”). This can be done with this template:

<xsl:template match="string">
    <xsl:value-of select="preceding-sibling::key[1]/text()"/>
    <xsl:text>: </xsl:text>
    <xsl:value-of select="text()"/>
</xsl:template>

If the plist format had been better designed, the first value-of would select “@key”, “@name” or perhaps “@id”, which would be nicer and cleaner.
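To make the comparison concrete, here is a minimal sketch of the same lookup against a hypothetical plist variant in which each value carries its key in a “key” attribute. Neither that attribute nor that input format exists in Apple’s plist; both are assumptions made purely for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical input (NOT real plist): <string key="name">Theo</string> -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="text"/>

    <!-- The name is read from the node itself; no sibling lookup is needed -->
    <xsl:template match="string">
        <xsl:value-of select="@key"/>
        <xsl:text>: </xsl:text>
        <xsl:value-of select="."/>
    </xsl:template>
</xsl:stylesheet>
```

The template only ever looks at the current node, so it keeps working no matter how the siblings are ordered.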

Converting from plist

More than once I have found myself wanting to transform a plist into another format, and have written some XSL that does the trick. The next time I look at it I find that it is absolutely horrible to read, and it’s always the plist format that is the cause. The last time I needed to convert a plist, I decided that this has to stop, and this is the reason for this article.

A better solution to my problem is to create a stylesheet that transforms from the plist format to an intermediate format, for which I then can write another stylesheet that transforms to the destination format. The first stylesheet can be reused independently of the destination format, and it makes the second stylesheet much easier to write and to read.

In this case I wanted to transform a plist document into JSON. It can be done in other ways, and another day I might show you how to do it in a similar way using PHP.

From plist to an intermediate format

The plist format consists of dict, array and primitive value types such as string and integer. I believe that a dict node almost always represents an object, so I chose to rename that element to object. The children of the object element are called string, integer, etc., just as before.

In my intermediate format there are no key nodes (no surprise there, I hope); they have been moved to a name attribute on the child nodes of the object element.

Without any further ado, here is a simple plist, and the equivalent in my new intermediate format:

plist:

<plist version="1.0">
<dict>
    <key>name</key>
    <string>Theo</string>
    <key>birthday</key>
    <date>1981-12-01T00:00:00Z</date>
    <key>randomNumbers</key>
    <array>
        <integer>3</integer>
        <integer>5</integer>
        <integer>1</integer>
    </array>
</dict>
</plist>

the intermediate format:

<propertylist>
    <object>
        <string name="name">Theo</string>
        <date name="birthday">1981-12-01T00:00:00Z</date>
        <array name="randomNumbers">
            <integer>3</integer>
            <integer>5</integer>
            <integer>1</integer>
        </array>
    </object>
</propertylist>

The two examples are very similar, which shouldn’t be a surprise. It’s not that the plist format is bad in every way; it’s just the key elements that ruin it.

The plist to intermediate format stylesheet

When I wrote the stylesheet that transforms a plist document to the intermediate format I realised that there were two modes in the plist format: default mode and dictionary mode. In default mode a node does not have a preceding key sibling; this is the case in arrays and at the start of the document (the initial dict does not have a name). In dictionary mode, however, value nodes have preceding key nodes, which have to be found in order to know the name of the current node.

I deliberately used the word “mode” in the explanation above because I used the “mode” feature in XSL to write different templates for the same elements depending on whether they were children of a dict or an array (or top level). This means that a string or integer node will be treated differently depending on the mode, and that this is done without ugly and fragile choose/when/otherwise constructs.

You can find the complete stylesheet at the bottom of this article, but here are some excerpts:

In default mode, a string, integer, real or date node will be output exactly as it was, but in “dict” mode it will also have a “name” attribute, whose value is extracted from the key node preceding it.

<xsl:template match="string|integer|real|date">
    <xsl:element name="{local-name()}">
        <xsl:value-of select="text()"/>
    </xsl:element>
</xsl:template>

<xsl:template match="string|integer|real|date" mode="dict">
    <xsl:variable name="name">
        <xsl:value-of select="preceding-sibling::key[1]/text()"/>
    </xsl:variable>
    <xsl:element name="{local-name()}">
        <xsl:attribute name="name">
            <xsl:value-of select="$name"/>
        </xsl:attribute>

        <xsl:value-of select="text()"/>
    </xsl:element>
</xsl:template>

Here is another excerpt, which takes care of dictionaries:

<xsl:template match="dict">
    <object>
        <xsl:apply-templates mode="dict"/>
    </object>
</xsl:template>

As you can see, the template for dictionaries just inserts an object element and continues, but also switches to the “dict” mode. There is actually an identical copy of this template for the “dict” mode, since XSLT 1.0 offers no way for a template to match in more than one mode (XSLT 2.0 allows several modes to be listed in the mode attribute).

The template that matches array is similar, but instead switches to default mode.
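Put together, the excerpts and the description of the array template can be sketched as one small XSLT 1.0 stylesheet. This is a minimal sketch under stated assumptions, not the stylesheet offered for download: the strip-space declaration, the empty key template and the name attribute on nested objects (the fix suggested in the comments below) are additions, and boolean and data values are not handled.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" indent="yes"/>
    <!-- Assumption: drop whitespace-only text between elements in containers -->
    <xsl:strip-space elements="plist dict array"/>

    <xsl:template match="/plist">
        <propertylist>
            <xsl:apply-templates/>
        </propertylist>
    </xsl:template>

    <!-- Default mode: top level and inside arrays, so no names -->
    <xsl:template match="dict">
        <object>
            <xsl:apply-templates mode="dict"/>
        </object>
    </xsl:template>

    <xsl:template match="array">
        <array>
            <xsl:apply-templates/>
        </array>
    </xsl:template>

    <xsl:template match="string|integer|real|date">
        <xsl:element name="{local-name()}">
            <xsl:value-of select="."/>
        </xsl:element>
    </xsl:template>

    <!-- Dict mode: key elements produce no output of their own -->
    <xsl:template match="key" mode="dict"/>

    <!-- A nested dict stays in dict mode and takes its name from its key -->
    <xsl:template match="dict" mode="dict">
        <object name="{preceding-sibling::key[1]}">
            <xsl:apply-templates mode="dict"/>
        </object>
    </xsl:template>

    <!-- An array takes its name from its key, then switches to default mode -->
    <xsl:template match="array" mode="dict">
        <array name="{preceding-sibling::key[1]}">
            <xsl:apply-templates/>
        </array>
    </xsl:template>

    <xsl:template match="string|integer|real|date" mode="dict">
        <xsl:element name="{local-name()}">
            <xsl:attribute name="name">
                <xsl:value-of select="preceding-sibling::key[1]"/>
            </xsl:attribute>
            <xsl:value-of select="."/>
        </xsl:element>
    </xsl:template>
</xsl:stylesheet>
```

The empty template for key matters: without it, the built-in rules would copy the text of every key into the output next to the value it names.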

From the intermediate format to JSON

The plist format is a serialized object graph, and so is JSON, so it’s not so strange that you would want to convert between the two. Using the plist to intermediate format stylesheet I defined above we can now add a stylesheet to the pipeline to transform to JSON (which isn’t an XML format, but that doesn’t matter, XSL handles that very well).

A simplified version of the needed XSL looks like this, with one template for object elements, one for array and one for the primitives:

<xsl:template match="object">
    <xsl:text>{</xsl:text>
    <xsl:apply-templates />
    <xsl:text>}</xsl:text>
</xsl:template>

<xsl:template match="array">
    <xsl:text>"</xsl:text>
    <xsl:value-of select="@name"/>
    <xsl:text>":</xsl:text>
    <xsl:text>[</xsl:text>
    <xsl:apply-templates />
    <xsl:text>]</xsl:text>
</xsl:template>

<xsl:template match="string|integer|real|date">
    <xsl:text>"</xsl:text>
    <xsl:value-of select="@name"/>
    <xsl:text>":"</xsl:text>
    <xsl:value-of select="text()"/>
    <xsl:text>",</xsl:text>
</xsl:template>

However, this is not entirely correct: you need to add a check so that the last item in a list doesn’t get a comma after it, and the code assumes that an object will never occur as a child of an object (this actually seems to be rare in plist documents), nor arrays as children of arrays (an odd, but conceivable possibility). The actual stylesheet, which can be found below, takes care of these exceptions; it also escapes any quotes in the strings and replaces newlines with “\n”.
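One way to add the comma check and allow nesting is sketched below. It is an illustration under stated assumptions rather than the downloadable stylesheet: the named templates “name-prefix” and “separator” are made up for this example, all primitives are quoted just as in the simplified version, and escaping of quotes and newlines is left out.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="text"/>
    <!-- Assumption: ignore whitespace-only text between elements in containers -->
    <xsl:strip-space elements="propertylist object array"/>

    <!-- Hypothetical helper: emit "name": only when the node has a name -->
    <xsl:template name="name-prefix">
        <xsl:if test="@name">
            <xsl:text>"</xsl:text>
            <xsl:value-of select="@name"/>
            <xsl:text>":</xsl:text>
        </xsl:if>
    </xsl:template>

    <!-- Hypothetical helper: emit a comma unless this is the last sibling -->
    <xsl:template name="separator">
        <xsl:if test="following-sibling::*">
            <xsl:text>,</xsl:text>
        </xsl:if>
    </xsl:template>

    <xsl:template match="object">
        <xsl:call-template name="name-prefix"/>
        <xsl:text>{</xsl:text>
        <xsl:apply-templates/>
        <xsl:text>}</xsl:text>
        <xsl:call-template name="separator"/>
    </xsl:template>

    <xsl:template match="array">
        <xsl:call-template name="name-prefix"/>
        <xsl:text>[</xsl:text>
        <xsl:apply-templates/>
        <xsl:text>]</xsl:text>
        <xsl:call-template name="separator"/>
    </xsl:template>

    <!-- Primitives are quoted as in the simplified version; no escaping here -->
    <xsl:template match="string|integer|real|date">
        <xsl:call-template name="name-prefix"/>
        <xsl:text>"</xsl:text>
        <xsl:value-of select="."/>
        <xsl:text>"</xsl:text>
        <xsl:call-template name="separator"/>
    </xsl:template>
</xsl:stylesheet>
```

Applied to the intermediate example earlier in this post, this should produce {"name":"Theo","birthday":"1981-12-01T00:00:00Z","randomNumbers":["3","5","1"]}. Testing for a following sibling element, rather than comparing position() with last(), keeps the check independent of any whitespace text nodes among the children.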

Conclusion

Using the plist to intermediate format stylesheet you can concentrate on the destination format when you write a stylesheet instead of how to get around the problems of the plist format. The intermediate format is reusable regardless of the destination format and can be handy to have lying around.

Resources

You are welcome to download the stylesheets using the links below, and keep them for personal use, but please contact me if you want to use them in any context where you will distribute them to a third party. I can’t promise that they won’t blow up your computer or destroy your data. If you have any suggestions, comments or questions, I’ll happily answer them.

I have provided the stylesheets as downloadable files instead of pasting the code because I can’t figure out how to make WordPress (or is it the code highlighter?) display XSL properly. Moreover, WordPress didn’t allow me to upload XSL files, so I had to gzip them.

6 Responses to “XSL, plists and a bit of JSON”

  1. elmimmo Says:

    Hi!

    Do you still have those files? Or else, do you know of another website that provides an XSLT stylesheet to parse Apple’s PLIST files?

  2. Olaf Tietze Says:

    Hi,

    Great article. Today I encountered the plist horrors Apple created with those keys. It seems they don’t get how XML works.

    Would you be so kind as to reupload your files or drop me a mail, as the links are no longer valid?

    Thanks a lot, regards,

    Olaf

    (Hamburg/Germany)

  3. Theo Says:

    I’ve updated the links to the files and fixed some code formatting issues in the post. If the links to the files open directly in the browser, use view source to see the code (that’s how Chrome treats them at least).

  4. elmimmo Says:

    The stylesheet for transforming from plist format to intermediate format has a bug. dict elements other than /plist/dict are preceded by a key, and the stylesheet should be adding a name attribute to the object element with the value of that key.

    In other words

    <xsl:template match="dict">
        <object>
            <xsl:apply-templates mode="dict"/>
        </object>
    </xsl:template>
    
    <xsl:template match="dict" mode="dict">
        <object>
            <xsl:apply-templates mode="dict"/>
        </object>
    </xsl:template>
    

    should be:

    <xsl:template match="dict">
        <object>
            <xsl:apply-templates mode="dict"/>
        </object>
    </xsl:template>
    
    <xsl:template match="dict" mode="dict">
        <xsl:variable name="name">
            <xsl:value-of select="preceding-sibling::key[1]/text()"/>
        </xsl:variable>
    
        <object name="{$name}">
            <xsl:apply-templates mode="dict"/>
        </object>
    </xsl:template>
    

    I think.

  5. elmimmo Says:

    Intermediate to JSON also seems to be broken. It does not correctly handle string|integer|real|date inside an array, and ignores boolean values.

    I put versions of the files with my fixes (which I think are final) at https://gist.github.com/2851115 and https://gist.github.com/2851536 (I hope that is fine with Theo; I’ve kept the © message since I just tweaked some lines and credited this website)

  6. Theo Says:

    You’re probably right that it doesn’t take care of nested dicts, just the top level dict. Thanks for the gists!
