PHP plist parsing
As promised I’m going to look further into parsing plist files, this time in PHP. I’m going to show some functional programming concepts applied to PHP, using it similarly to how you program in XSL.
When I say similar to XSL I mean that there will be functions which transform the nodes of the plist, and which function gets to transform which node will be decided using pattern matching, just like how templates work in XSL (and how function calling works in Haskell, for example).
The code
The goal of parsing the plist is to transform it into a PHP value structure (I’m deliberately not saying object structure). dict nodes will be transformed into associative arrays and arrays into, surprise, surprise, arrays.
First let’s look at the function which finds the appropriate function to handle a specific node. It’s called parseValue and takes a DOMElement object as input, finds a function to handle it and returns what that function returns (a PHP value).
The way it decides which function to call is simple: it looks at the node’s name and tests to see if there is a function with that name prefixed with “parse_”. If the node is an integer node the function parse_integer is used to transform it. This logic could be made much more complex, for example to support something similar to XPath paths, but this simple version works for most cases. The prefix “parse_” could be anything; it is completely arbitrary.
function parseValue( $valueNode ) {
    $valueType = $valueNode->nodeName;
    $transformerName = "parse_$valueType";
    if ( is_callable($transformerName) ) {
        // there is a transformer function for this node type
        return call_user_func($transformerName, $valueNode);
    }
    // if no transformer was found
    return null;
}
As we will see below, parseValue is similar to apply-templates in XSL.
Then there are the transformer functions, equivalent to templates in XSL, which handle the different node types. First let’s look at the simple cases: integer, string, date and booleans. Save for booleans it’s only a matter of returning the text content of the nodes (we could do some kind of date parsing on the date perhaps, or return a proper int instead of a string representation of an int, but it doesn’t really matter in PHP anyway). Since the booleans come as <true/> and <false/> there needs to be a parse_true and a parse_false, one handling each.
function parse_integer( $integerNode ) {
    return $integerNode->textContent;
}
function parse_string( $stringNode ) {
    return $stringNode->textContent;
}
function parse_date( $dateNode ) {
    return $dateNode->textContent;
}
function parse_true( $trueNode ) {
    return true;
}
function parse_false( $falseNode ) {
    return false;
}
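If you do want typed values rather than strings, the transformers could convert the text content themselves. The following is only a sketch: the "_typed" function names are made up so that they don't clash with the functions above, the real node is included on the assumption that your plist contains floating point values, and the date format is assumed to be ISO 8601.

```php
<?php
// Hypothetical typed variants of the simple transformers.
function parse_integer_typed( DOMElement $integerNode ) {
    // return a real PHP int instead of its string representation
    return (int) trim($integerNode->textContent);
}
function parse_real_typed( DOMElement $realNode ) {
    return (float) trim($realNode->textContent);
}
function parse_date_typed( DOMElement $dateNode ) {
    // assumes an ISO 8601 date such as 2007-07-19T15:37:00Z
    return new DateTimeImmutable(trim($dateNode->textContent));
}

// A tiny inline document to try the functions on.
$doc = new DOMDocument();
$doc->loadXML('<array><integer>42</integer><real>1.5</real><date>2007-07-19T15:37:00Z</date></array>');
$nodes = $doc->documentElement->childNodes;

$int  = parse_integer_typed($nodes->item(0));
$real = parse_real_typed($nodes->item(1));
$date = parse_date_typed($nodes->item(2));

var_dump($int, $real, $date->format('Y-m-d'));
```

To use these with parseValue you would rename them to parse_integer, parse_real and parse_date, replacing the string-returning versions.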
Handling dictionaries and arrays is a bit more complex, since they have to deal with child values:
function parse_dict( $dictNode ) {
    $dict = array();
    // for each child of this node
    for (
        $node = $dictNode->firstChild;
        $node != null;
        $node = $node->nextSibling
    ) {
        if ( $node->nodeName == "key" ) {
            $key = $node->textContent;
            $valueNode = $node->nextSibling;
            // skip text nodes and comments until the value element
            while ( $valueNode !== null && $valueNode->nodeType != XML_ELEMENT_NODE ) {
                $valueNode = $valueNode->nextSibling;
            }
            // recursively parse the value; a key without a value maps to null
            $value = $valueNode !== null ? parseValue($valueNode) : null;
            $dict[$key] = $value;
        }
    }
    return $dict;
}
function parse_array( $arrayNode ) {
    $array = array();
    for (
        $node = $arrayNode->firstChild;
        $node != null;
        $node = $node->nextSibling
    ) {
        if ( $node->nodeType == XML_ELEMENT_NODE ) {
            array_push($array, parseValue($node));
        }
    }
    return $array;
}
Wrapping it up
To get some data to transform, load your plist file (for example the iTunes music library file) with this code:
$plistDocument = new DOMDocument();
$plistDocument->load($path);
And then start off the parsing with this snippet:
function parsePlist( $document ) {
    $plistNode = $document->documentElement;
    $root = $plistNode->firstChild;
    // skip any text nodes or comments before the first value node
    while ( $root !== null && $root->nodeType != XML_ELEMENT_NODE ) {
        $root = $root->nextSibling;
    }
    // an empty plist element has no value node
    return $root !== null ? parseValue($root) : null;
}
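To give an idea of what comes out of parsePlist, here is a hand-written sketch of the structure for a tiny iTunes-style library, and one way to walk through it. The data is made up, and the key names ("Tracks", "Name", "Artist" and so on) are assumptions about what the file contains; dicts become associative arrays keyed by the text of their key nodes, and arrays become ordinary numerically indexed arrays.

```php
<?php
// Hypothetical result of parsePlist() for a very small library file.
// Integers are strings here because parse_integer returns the text content.
$library = array(
    "Major Version" => "1",
    "Tracks" => array(
        "1001" => array("Track ID" => "1001", "Name" => "Song A", "Artist" => "Artist X"),
        "1002" => array("Track ID" => "1002", "Name" => "Song B", "Artist" => "Artist Y"),
    ),
    "Playlists" => array(
        array("Name" => "Library", "Playlist ID" => "1"),
    ),
);

// Walk the tracks dict: each entry is itself an associative array.
$titles = array();
foreach ( $library["Tracks"] as $trackId => $track ) {
    $titles[] = $track["Artist"] . " - " . $track["Name"];
}

print_r($titles);
```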
Other uses
The implementation above transforms a plist file into a PHP value structure, but it could just as well transform it into a JSON string or just about any other data interchange format.
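One cheap way to get JSON, assuming the json extension is available, is to run the parsed structure through json_encode rather than writing a second set of transformers. The array below is a made-up stand-in for what parsePlist returns.

```php
<?php
// Hypothetical parsed value: a dict with a string, a boolean and an array.
$value = array(
    "Name" => "Song A",
    "Rating" => "80",
    "Compilation" => true,
    "Genres" => array("Rock", "Pop"),
);

// Associative arrays become JSON objects, list-like arrays become JSON arrays.
$json = json_encode($value);

echo $json, "\n";
// prints {"Name":"Song A","Rating":"80","Compilation":true,"Genres":["Rock","Pop"]}
```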
2007-07-19 at 15:37
Perfect!
As someone completely unfamiliar with these functions (and a pretty bad programmer to boot), I had a little bit of trouble understanding how to kick everything off. For anyone else in my boat, here it is:
Get every function on this page into a document and then
$path = "/Path/to/your/plist/file/";
$plistDocument = new DOMDocument();
$plistDocument->load($path);
$array = parsePlist($plistDocument);
$array is now a multidimensional array of your plist (in my case the iTunes Library). Since arrays are super easy to work with, you can transform the data however you like. All I need now is the inverse function to take my changed arrays and rewrite a nice and neat plist file.
Thanks for the great scripts!
2007-07-22 at 10:05
Thank you for that summary!
2007-11-01 at 10:07
What will the array look like? How can I traverse it?
Are there only numeric keys for dicts and arrays, and array keys for “key”s and siblings?
If I were to process the songs in the iTunes plist, would I assume that the first dimension is <plist>, the second <dict>, and so forth?
2007-11-01 at 10:09
(“...first dimension is xml node [plist], second [dict], and so forth...?”)
2008-04-01 at 16:53
Does this code only work in PHP5? Are there any updates since this article was published? Like others, I am trying to figure out how to parse the iTunes library xml file. Thanks.
2008-04-01 at 17:02
It would work fine in PHP5. I’m pretty sure I tested it on a PHP5 system (can’t remember when I last worked on a PHP4 system).
You can even stick the functions inside a class, but then you would have to change this part:
if ( is_callable($transformerName) ) {
return call_user_func($transformerName, $valueNode);
}
into something like this:
if ( is_callable(array($this, $transformerName)) ) {
return $this->$transformerName($valueNode);
}
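A minimal sketch of what such a class could look like. The class name PlistParser is made up, and only a handful of node types are handled to keep it short:

```php
<?php
// Hypothetical class-based variant; dispatch goes through $this instead of
// through global function names.
class PlistParser {
    public function parseValue( DOMNode $valueNode ) {
        $transformerName = "parse_" . $valueNode->nodeName;
        if ( is_callable(array($this, $transformerName)) ) {
            return $this->$transformerName($valueNode);
        }
        // no transformer method for this node type
        return null;
    }
    public function parse_string( DOMNode $node ) {
        return $node->textContent;
    }
    public function parse_integer( DOMNode $node ) {
        return $node->textContent;
    }
    public function parse_true( DOMNode $node ) {
        return true;
    }
    public function parse_false( DOMNode $node ) {
        return false;
    }
    public function parse_array( DOMNode $arrayNode ) {
        $array = array();
        foreach ( $arrayNode->childNodes as $node ) {
            if ( $node->nodeType == XML_ELEMENT_NODE ) {
                $array[] = $this->parseValue($node);
            }
        }
        return $array;
    }
}

$doc = new DOMDocument();
$doc->loadXML('<array><string>a</string><integer>2</integer><true/></array>');

$parser = new PlistParser();
$result = $parser->parseValue($doc->documentElement);

var_dump($result);
```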
2008-06-03 at 23:42
Hi!
I would like to use your script to parse through a system profile of OS X clients. But after this:
$plistDocument = new DOMDocument();
$plistDocument->load($path);
the script stops without an error notification. What is the meaning of this?
2008-06-04 at 07:12
I have no idea. Check your error log. Any number of things could have gone wrong. For one, it’s possible that your PHP installation doesn’t have DOM support.
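A rough way to narrow it down is to check for the DOM extension and collect the libxml errors yourself instead of relying on the error log. This is only a sketch, and loadPlistDocument is a made-up helper name:

```php
<?php
// Hypothetical helper: returns a DOMDocument or throws with a description
// of what went wrong.
function loadPlistDocument( $path ) {
    if ( !extension_loaded("dom") ) {
        throw new RuntimeException("The DOM extension is not available");
    }
    // collect XML errors instead of emitting warnings
    libxml_use_internal_errors(true);
    $document = new DOMDocument();
    if ( !is_readable($path) || !$document->load($path) ) {
        $messages = array();
        foreach ( libxml_get_errors() as $error ) {
            $messages[] = trim($error->message);
        }
        libxml_clear_errors();
        throw new RuntimeException("Could not load $path: " . implode("; ", $messages));
    }
    return $document;
}

// Trying it with a path that is assumed not to exist.
$failed = false;
try {
    loadPlistDocument("/no/such/file.plist");
} catch ( RuntimeException $e ) {
    $failed = true;
    echo $e->getMessage(), "\n";
}
```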
2008-06-04 at 09:36
Thanks!
I tried it on another server, and it works fine now. But what can I search for? If I do an echo count($array); it’s always 0. So if I parse for “serialNumber” or such it should find something, or is there an error in my thoughts?
2008-06-04 at 12:27
print_r($array);