PHP plist parsing

As promised, I’m going to look further into parsing plist files, this time in PHP. I’ll apply a few functional programming concepts to PHP and use it in much the same way you would program in XSL.

By similar to XSL I mean that there will be functions which transform the nodes of the plist, and which function gets to transform which node is decided by pattern matching, just like templates in XSL (and function definitions in Haskell, for example).

The code

The goal of parsing the plist is to transform it into a PHP value structure (I’m deliberately not saying object structure). dict nodes will be transformed into associative arrays and arrays into, surprise, surprise, arrays.

First let’s look at the function which finds the appropriate function to handle a specific node. It’s called parseValue and takes a DOMElement object as input, finds a function to handle it and returns what that function returns (a PHP value).

The way it decides which function to call is simple: it looks at the node’s name and tests whether there is a function with that name prefixed with “parse_”. If the node is an integer node, the function parse_integer is used to transform it. This logic could be made much more complex, for example to support something similar to XPath paths, but this simple version works for most cases. The prefix “parse_” could be anything; it is completely arbitrary.

function parseValue( $valueNode ) {
  $valueType = $valueNode->nodeName;

  $transformerName = "parse_$valueType";

  if ( is_callable($transformerName) ) {
    // there is a transformer function for this node type
    return call_user_func($transformerName, $valueNode);
  }

  // if no transformer was found
  return null;
}

As we will see below, parseValue is similar to apply-templates in XSL.
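The remark above that the lookup could be made more specific can be sketched roughly as follows. This is only an illustration, assuming a convention where a function named after the parent and the node is tried before the plain one; findTransformer and the sketch_ prefix are hypothetical names, not part of the code in this post.

```php
<?php
// Sketch of a more specific lookup: try "sketch_<parent>_<name>" first, then
// fall back to "sketch_<name>". All function names here are hypothetical.
function findTransformer( DOMElement $node ) {
  $candidates = array();

  $parent = $node->parentNode;
  if ( $parent instanceof DOMElement ) {
    $candidates[] = "sketch_{$parent->nodeName}_{$node->nodeName}";
  }
  $candidates[] = "sketch_{$node->nodeName}";

  // the first candidate that names an existing function wins
  foreach ( $candidates as $name ) {
    if ( is_callable($name) ) {
      return $name;
    }
  }

  return null;
}

function sketch_string( $node )       { return $node->textContent; }
function sketch_array_string( $node ) { return strtoupper($node->textContent); }

$doc = new DOMDocument();
$doc->loadXML('<dict><key>a</key><string>x</string><array><string>y</string></array></dict>');

$strings = $doc->getElementsByTagName('string');
echo findTransformer($strings->item(0)), "\n"; // sketch_string
echo findTransformer($strings->item(1)), "\n"; // sketch_array_string
```

The more specific name wins when it exists, which is loosely how a more specific match pattern beats a general one in XSL.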

Then there are the transformer functions, equivalent to templates in XSL, which handle the different node types. First let’s look at the simple cases: integer, string, date and the booleans. Save for the booleans it’s only a matter of returning the text content of the node (we could parse the date, or return a proper int instead of a string representation of one, but with PHP’s loose typing it rarely matters). Since the booleans come as <true/> and <false/>, there needs to be a parse_true and a parse_false to handle each.

function parse_integer( $integerNode ) {
  return $integerNode->textContent;
}

function parse_string( $stringNode ) {
  return $stringNode->textContent;  
}

function parse_date( $dateNode ) {
  return $dateNode->textContent;
}

function parse_true( $trueNode ) {
  return true;
}

function parse_false( $falseNode ) {
  return false;
}
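If native PHP types are wanted after all, the simple transformers could be tightened along these lines. This is a sketch under assumptions: the typed_ names are hypothetical (chosen so they do not clash with the functions above), and the real and data node types are not handled by the functions in this post.

```php
<?php
// Hypothetical typed variants of the simple transformers.

function typed_integer( DOMElement $node ) {
  return (int) trim($node->textContent);
}

function typed_real( DOMElement $node ) {
  return (float) trim($node->textContent);
}

function typed_date( DOMElement $node ) {
  // plist dates are ISO 8601 strings such as 2008-03-01T12:00:00Z
  return new DateTimeImmutable(trim($node->textContent));
}

function typed_data( DOMElement $node ) {
  // <data> holds base64 text, often wrapped over several indented lines
  return base64_decode(preg_replace('/\s+/', '', $node->textContent));
}

$doc = new DOMDocument();
$doc->loadXML(
  '<array><integer>42</integer><real>1.5</real>' .
  '<date>2008-03-01T12:00:00Z</date><data>aGVsbG8=</data></array>'
);
$items = $doc->documentElement->childNodes;

var_dump(typed_integer($items->item(0)));            // int(42)
var_dump(typed_real($items->item(1)));               // float(1.5)
echo typed_date($items->item(2))->format('Y-m-d'), "\n"; // 2008-03-01
echo typed_data($items->item(3)), "\n";              // hello
```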

Handling dictionaries and arrays is a bit more complex, since they have to deal with child values:

function parse_dict( $dictNode ) {
  $dict = array();

  // for each child of this node
  for (
    $node = $dictNode->firstChild;
    $node != null;
    $node = $node->nextSibling
  ) {
    if ( $node->nodeName == "key" ) {
      $key = $node->textContent;

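      $valueNode = $node->nextSibling;

      // skip text nodes, comments and anything else that isn't an element
      while ( $valueNode !== null && $valueNode->nodeType != XML_ELEMENT_NODE ) {
        $valueNode = $valueNode->nextSibling;
      }

      // recursively parse the value (null if the key has no value node)
      $value = $valueNode !== null ? parseValue($valueNode) : null;

      $dict[$key] = $value;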
    }
  }

  return $dict;
}

function parse_array( $arrayNode ) {
  $array = array();

  for (
    $node = $arrayNode->firstChild;
    $node != null;
    $node = $node->nextSibling
  ) {
    if ( $node->nodeType == XML_ELEMENT_NODE ) {
      array_push($array, parseValue($node));
    }
  }

  return $array;
}

Wrapping it up

To get some data to transform, load your plist file (for example the iTunes music library file) with this code:

$plistDocument = new DOMDocument();
$plistDocument->load($path);
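Loading can fail quietly, for example when the file is missing, is not well-formed XML, or PHP lacks DOM support. A defensive version might look like the sketch below; loadPlistDocument is a hypothetical helper, not part of the code in this post, and it assumes that returning null on failure is acceptable.

```php
<?php
// Hypothetical helper: returns a DOMDocument, or null if loading failed.
function loadPlistDocument( $path ) {
  if ( !extension_loaded('dom') ) {
    return null; // this PHP installation has no DOM support
  }

  // collect parse errors instead of emitting warnings
  $previous = libxml_use_internal_errors(true);

  $document = new DOMDocument();
  $loaded = is_readable($path) && $document->load($path);

  // report whatever libxml complained about
  foreach ( libxml_get_errors() as $error ) {
    error_log(trim($error->message) . " (line {$error->line})");
  }
  libxml_clear_errors();
  libxml_use_internal_errors($previous);

  return $loaded ? $document : null;
}

// usage with a throwaway file
$path = tempnam(sys_get_temp_dir(), 'plist');
file_put_contents($path, '<plist version="1.0"><dict/></plist>');

$document = loadPlistDocument($path);
echo $document === null ? "could not load\n" : "loaded\n";

unlink($path);
```

Note that load only understands XML, so a binary plist would end up in the failure branch here.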

And then start off the parsing with this snippet:

function parsePlist( $document ) {
  $plistNode = $document->documentElement;

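  $root = $plistNode->firstChild;

  // skip text nodes and comments before the first value node
  while ( $root !== null && $root->nodeType != XML_ELEMENT_NODE ) {
    $root = $root->nextSibling;
  }

  // an empty <plist/> has no value node to parse
  return $root !== null ? parseValue($root) : null;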
}
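To get a feel for the shape of the result, here is a condensed, self-contained sketch run on a tiny inline plist. It follows the same traversal as the functions above but uses a switch instead of name-based lookup so that it fits in one block; sketchPlistValue and the sample data are made up for illustration.

```php
<?php
// Condensed sketch: one recursive function with a switch. sketchPlistValue
// is a hypothetical name, not part of the article's code.
function sketchPlistValue( DOMElement $node ) {
  switch ( $node->nodeName ) {
    case 'true':  return true;
    case 'false': return false;
    case 'array':
      $array = array();
      foreach ( $node->childNodes as $child ) {
        if ( $child->nodeType == XML_ELEMENT_NODE ) {
          $array[] = sketchPlistValue($child);
        }
      }
      return $array;
    case 'dict':
      $dict = array();
      $key = null;
      foreach ( $node->childNodes as $child ) {
        if ( $child->nodeType != XML_ELEMENT_NODE ) {
          continue; // whitespace between elements
        }
        if ( $child->nodeName == 'key' ) {
          $key = $child->textContent;
        } elseif ( $key !== null ) {
          $dict[$key] = sketchPlistValue($child);
          $key = null;
        }
      }
      return $dict;
    default:
      // integer, real, string, date: keep the text as it is
      return $node->textContent;
  }
}

$xml = <<<XML
<plist version="1.0">
<dict>
  <key>Name</key><string>Example Track</string>
  <key>Play Count</key><integer>3</integer>
  <key>Compilation</key><true/>
  <key>Tags</key>
  <array><string>a</string><string>b</string></array>
</dict>
</plist>
XML;

$document = new DOMDocument();
$document->loadXML($xml);

// the first element child of <plist> is the root value
$result = null;
foreach ( $document->documentElement->childNodes as $child ) {
  if ( $child->nodeType == XML_ELEMENT_NODE ) {
    $result = sketchPlistValue($child);
    break;
  }
}

print_r($result);
```

Each dict becomes an associative array keyed by the text of its key nodes, and each array becomes a numerically indexed array, so $result['Tags'][1] is the string "b".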

Other uses

The implementation above transforms a plist file into a PHP value structure, but it could just as well transform it into a JSON string or just about any other data interchange format.
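For JSON in particular, the shortest route is probably to encode the PHP value structure after parsing rather than to write JSON-producing transformers. A minimal sketch, assuming the parsed value looks like the hand-written array below:

```php
<?php
// Assumed input: the kind of value the parser returns for a small dict.
$parsed = array(
  'Name'        => 'Example Track',
  'Play Count'  => '3',
  'Compilation' => true,
  'Tags'        => array('a', 'b'),
);

// associative arrays become JSON objects, list-style arrays become JSON arrays
$json = json_encode($parsed);

echo $json, "\n";
// {"Name":"Example Track","Play Count":"3","Compilation":true,"Tags":["a","b"]}
```

One caveat with this shortcut: an empty dict is an empty PHP array, which json_encode writes as [] rather than {}.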

16 Responses to “PHP plist parsing”

  1. Nathan Ziarek Says:

    Perfect!

    As someone completely unfamiliar with these functions (and a pretty bad programmer to boot), I had a little bit of trouble understanding how to kick everything off. For anyone else in my boat, here it is:

    Get every function on this page into a document and then

    $path = "/Path/to/your/plist/file/";
    
    $plistDocument = new DOMDocument();
    $plistDocument->load($path);
    
    $array = parsePlist($plistDocument);
    

    $array is now a multidimensional array of your plist (in my case the iTunes Library). Since arrays are super easy to work with, you can transform the data however you like. All I need now is the inverse function to take my changed arrays and rewrite a nice and neat plist file.

    Thanks for the great scripts!

  2. Theo Says:

    Thank you for that summary!

  3. Thomas Says:

    What will the array look like ? How can I traverse it ?

    Are there only numeral keys for dicts and arrays and array keys for “key”s and siblings ?

    If I were to process the songs in the Itunes plist, would I assume that first dimension is , second , and so forth… ?

  4. Thomas Says:

    (“..first dimension is xml node [plist], second [dict], and so forth…?”)

  5. Parsing ITunes Library.xml And Plist Files | David Bisset: Web Designer, Coder, Wordpress Guru Says:

    [...] get them in my bookmarks for my fellow coders. Parsing ITunes Library.xml with ActionScript 3.0 and plist parsing in PHP. Tags: Actionscript, Flex, iTunes, [...]

  6. Patrick Says:

    Does this code only work in PHP5? Are there any updates since this article was published? Like others, I am trying to figure out how to parse the iTunes library xml file. Thanks.

  7. Theo Says:

    It would work fine in PHP5. I’m pretty sure I tested it on a PHP5 system (can’t remember when I last worked on a PHP4 system).

    You can even stick the functions inside a class, but then you would have to change this part:

    if ( is_callable($transformerName) ) {
      return call_user_func($transformerName, $valueNode);
    }

    into something like this:

    if ( is_callable(array($this, $transformerName)) ) {
      return $this->$transformerName($valueNode);
    }

  8. Philip Brechler Says:

    Hi!

    I would like to use your script to parse through a system profile of OS X clients. But after this:

    $plistDocument = new DOMDocument();
    $plistDocument->load($path);

    the script stops without an error notification. What is the meaning of this?

  9. Theo Says:

    I have no idea. Check your error log. Any number of things could have gone wrong. For one, it’s possible that your PHP installation doesn’t have DOM support.

  10. Philip Brechler Says:

    Thanks!

    I tried it on another server and it works fine now. But what can I search for? If I do an echo count ( $array ); it’s always 0. So if I parse for “serialNumber” or such it should find something, or is there an error in my thoughts?

  11. Theo Says:

    print_r($array);

  12. CFPropertyList Says:

    “The PHP implementation of Apple’s PropertyList can handle XML PropertyLists as well as binary PropertyLists. It offers functionality to easily convert data between worlds, e.g. recalculating timestamps from unix epoch to apple epoch and vice versa. A feature to automagically create (guess) the plist structure from a normal PHP data structure will help you dump your data to plist in no time.”

  13. john freeman Says:

    hi,

    This almost useful post is missing a crucial piece of information (for me anyway) – namely, how to do the iteration of the file to get the DOMElement objects.

    I’ve been messing with reading RSS files, and they have a nice hierarchical structure, but Mac PLists (AlbumData) have this “key-value” pair structure that I came to your page to the hope of finding a good way to deal with. I will keep looking!

    thanks!

  14. john freeman Says:

    the comment above mine, from CFPropertyList, was EXACTLY what I was looking for, so thanks for getting me to where I want to be!

    jf

  15. Theo Says:

    @john: what you’re asking for is exactly what’s under the “Wrapping it up” header.

  16. KT Says:

    Theo,

    Thanks for the excellent example. I worked your code, with a few minor tweaks, into a standalone class in case anyone else finds it useful.

    read_file($inPath);

    // Class definition
    class PListFile
    {
        function read_file($inPath)
        {
            $document = new DOMDocument();
            $document->load($inPath);

            return $this->parse_plist($document);
        }

        function parse_plist($inDocument)
        {
            $docNode = $inDocument->documentElement;
            $root    = $docNode->firstChild;

            // skip any text nodes before the first value node
            while ( $root->nodeName == "#text" )
            {
                $root = $root->nextSibling;
            }

            return $this->parse_node($root);
        }

        function parse_node( $inNode )
        {
            $valueType  = strtolower($inNode->nodeName);
            $selector   = 'parse_'.$valueType;

            return $this->$selector($inNode);
        }

        function parse_array( $inNode )
        {
            $array = array();

            for ( $node = $inNode->firstChild; $node != null; $node = $node->nextSibling )
            {
                if ( $node->nodeType == XML_ELEMENT_NODE )
                {
                    $array[] = $this->parse_node($node);
                }
            }

            return $array;
        }

        function parse_dict( $inNode )
        {
            $dict = array();

            // for each child of this node
            for ( $node = $inNode->firstChild; $node != null; $node = $node->nextSibling )
            {
                if ( $node->nodeName == "key" )
                {
                    $key        = $node->textContent;
                    $valueNode  = $node->nextSibling;

                    // skip text nodes
                    while ( $valueNode->nodeType == XML_TEXT_NODE )
                    {
                        $valueNode = $valueNode->nextSibling;
                    }

                    // recursively parse the children
                    $value = $this->parse_node($valueNode);

                    $dict[$key] = $value;
                }
            }

            return $dict;
        }

        function parse_integer( $inNode )
        {
            return $inNode->textContent;
        }

        function parse_real( $inNode )
        {
            return $inNode->textContent;
        }

        function parse_string( $inNode )
        {
            return $inNode->textContent;
        }

        function parse_date( $inNode )
        {
            return $inNode->textContent;
        }

        function parse_true( $inNode )
        {
            return true;
        }

        function parse_false( $inNode )
        {
            return false;
        }
    }
