Ichabod explained

The Adobe MAX sessions are online, and among them is a presentation by Jim Corbett about how Adobe and Google’s new Flash indexer works. I think it’s great to get such a thorough explanation from Adobe, but it’s a shame that they didn’t do this at the same time their PR people were busy hyping it a few months ago. Since its release it has become obvious that it’s very limited, and there’s still no evidence that it works any better than the old swf2html method. Jim’s presentation makes it clear why this is, and I maintain my previous assessment that we’re better off using progressive enhancement/graceful degradation until there are significant changes in the Flash Player to overcome the problems.

Jim shows the headless player (called “Ichabod”) running through the Flex Store example application: it clicks some buttons and extracts the text in each state (see the presentation around the 19-minute mark). The problem should be obvious when you see the extracted text:

Learn more >>
$59.99
minutes for just
1000
Featured products
Nokia 9300 Communicator
Nokia 9500 Communicator
Nokia 6800

[...]

Compare Items
Your Shopping Cart
$0.00
Total:
$0.00
Shipping:
$0.00
Grand Total:
Submit Order
Nokia 6010
$99.99
Nokia 3100 Blue
Tri-band
$139.00
Nokia 3100 Pink
Tri-band
$139.00

And so on.

There’s no context whatsoever, just the pure text. There’s no way to tell which texts belong together (or if any even do), and there is no indication as to which parts are headers and which parts are texts on buttons. Visual relations are lost as well. Nothing here makes it possible to go beyond naïve indexing (“this page mentions the word X”).
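To make that concrete, here is a minimal Python sketch of what naïve indexing over such a flat list of strings amounts to. The strings are copied from the extract above; the function names are made up for illustration and say nothing about how Ichabod or Google work internally.

```python
from collections import defaultdict

# A few strings from the extract above, in the order they were emitted.
extracted = [
    "Nokia 6010",
    "$99.99",
    "Nokia 3100 Blue",
    "Tri-band",
    "$139.00",
    "Nokia 3100 Pink",
    "Tri-band",
    "$139.00",
]


def build_naive_index(strings):
    """Map each lowercased word to the positions of the strings containing it."""
    index = defaultdict(set)
    for position, text in enumerate(strings):
        for word in text.lower().split():
            index[word].add(position)
    return index


def mentions(index, word):
    """The only question a flat index can answer: is the word on the page?"""
    return word.lower() in index


index = build_naive_index(extracted)
print(mentions(index, "nokia"))    # True
print(mentions(index, "samsung"))  # False
```

Nothing in that index says that $99.99 is the price of the Nokia 6010. You could guess that neighbouring strings belong together, but the first lines of the extract (“$59.99”, “minutes for just”, “1000”) show that the order in which the text comes out isn’t even the reading order.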

The fundamental problem is that none of this can be solved given the current state of Flash Player. There is nothing but text, and there is no semantic structure. The semantic content of a Flash site is encoded entirely in the presentation; the state of things is even worse than the state of HTML ten years ago.

I’ve worked on search engine optimization for Flash-based web shops, and the content extracted by Ichabod just doesn’t cut it. On top of that, Google hasn’t implemented the network interface on their end yet, so dynamically loaded content isn’t even considered, which makes every other problem moot. Until there’s a way to mark up the content so that the output of Ichabod matches an HTML page in terms of semantic richness, progressive enhancement/graceful degradation is the only working way to make Flash sites indexable.
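For comparison, here is a rough sketch of what even very simple HTML gives an indexer to work with. The markup is hypothetical (I’ve made up the structure and the “price” class name); it stands in for the kind of alternative content a progressively enhanced page would carry underneath the Flash. The parser uses only Python’s standard html.parser module.

```python
from html.parser import HTMLParser

# Hypothetical fallback markup for two products from the extract above.
FALLBACK_HTML = """
<div id="content">
  <h2>Nokia 6010</h2>
  <p class="price">$99.99</p>
  <h2>Nokia 3100 Blue</h2>
  <p class="price">$139.00</p>
</div>
"""


class ProductParser(HTMLParser):
    """Pair each <h2> product name with the price paragraph that follows it."""

    def __init__(self):
        super().__init__()
        self.products = []  # list of (name, price) tuples
        self._field = None  # "name", "price", or None
        self._name = None   # most recent heading without a price yet

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self._field = "name"
        elif tag == "p" and dict(attrs).get("class") == "price":
            self._field = "price"

    def handle_endtag(self, tag):
        self._field = None

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return  # whitespace between tags
        if self._field == "name":
            self._name = text
        elif self._field == "price" and self._name is not None:
            self.products.append((self._name, text))
            self._name = None


parser = ProductParser()
parser.feed(FALLBACK_HTML)
print(parser.products)
# [('Nokia 6010', '$99.99'), ('Nokia 3100 Blue', '$139.00')]
```

The headings and the class attribute are what make the pairing possible. The flat text from Ichabod has no equivalent of either.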

4 Responses to “Ichabod explained”

  1. Martin Says:

    I’m really hoping that one good thing that could come out of this is for them to actually release a version of the headless player so we can run unit tests without requiring a display.

  2. Theo Says:

    True, that would be great.

  3. Marcus Stade Says:

    I’ll second that for sure. A headless player would be perfect for testing.

  4. Why I’m not working for 2008-12-12 / 2009-01-01 | CISNKY Says:

    [...] Ichabod explained. Looks like we are still at square one when it comes to indexing Flash content. [...]
