The Adobe MAX sessions are online, and among them is a presentation by Jim Corbett about how Adobe and Google’s new Flash indexer works. I think it’s great to get such a thorough explanation from Adobe, but it’s a shame they didn’t do this at the same time their PR people were busy hyping it a few months ago. Since its release it’s become obvious that it’s very limited, and there’s still no evidence that it works any better than the old swf2html method. Jim’s presentation makes it clear why this is — and I maintain my previous assessment that we’re better off using progressive enhancement/graceful degradation until there are significant changes in the Flash Player to overcome the problems.
Jim shows the headless player (called “Ichabod”) running through the Flex Store example application: it clicks some buttons and extracts the text in each state (see the presentation around the 19-minute mark). The problem becomes obvious when you see the extracted text:
Learn more >> $59.99 minutes for just 1000 Featured products Nokia 9300 Communicator Nokia 9500 Communicator Nokia 6800
Compare Items Your Shopping Cart $0.00 Total: $0.00 Shipping: $0.00 Grand Total: Submit Order Nokia 6010 $99.99 Nokia 3100 Blue Tri-band $139.00 Nokia 3100 Pink Tri-band $139.00
And so on.
There’s no context whatsoever, just the pure text. There’s no way to tell which texts belong together (or if any even do), no indication as to which parts are headers and which are button labels, visual relations are lost, and so on. There is nothing that makes it possible to go beyond naïve indexing (“this page mentions the word X”).
The fundamental problem is that this cannot be solved given the current state of the Flash Player. There is nothing but text, and no semantic structure. The semantic content of a Flash site is encoded entirely in its presentation; the state of things is even worse than the state of HTML ten years ago.
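To illustrate what “semantic structure” buys you, here is a hypothetical sketch (not taken from the presentation) of the same Flex Store product data expressed as plain HTML. Even without any extra effort, the element structure itself tells an indexer what is a heading, which name belongs to which price, and which items form a group — exactly the relationships that are absent from Ichabod’s flat text dump:

```html
<!-- Hypothetical markup for part of the Flex Store catalogue.
     The nesting groups each product with its price; the heading
     levels distinguish section titles from product names. -->
<section>
  <h2>Featured products</h2>
  <ul>
    <li>
      <h3>Nokia 6010</h3>
      <p class="price">$99.99</p>
    </li>
    <li>
      <h3>Nokia 3100 Blue Tri-band</h3>
      <p class="price">$139.00</p>
    </li>
  </ul>
</section>
```

A search engine can do far more with this — ranking the heading higher, associating a price with a product, extracting structured listings — than with an undifferentiated run of words.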
I’ve worked on search engine optimization for Flash-based web shops, and the content extracted by Ichabod just doesn’t cut it (not to mention that Google, on their end, don’t implement the network interface yet, so dynamically loaded content isn’t even considered — which makes every other problem moot). Until there’s a way to mark up the content so that the output of Ichabod matches an HTML page in terms of semantic richness, progressive enhancement/graceful degradation is the only working solution for making Flash sites indexable.
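In practice, progressive enhancement for a Flash site usually means serving the real content as plain HTML and swapping in the Flash version with script. A minimal sketch using SWFObject (a real, widely used library — the file names and element id below are made up for the example):

```html
<!-- Progressive enhancement sketch: search engines and visitors
     without Flash see the plain HTML list; SWFObject replaces it
     with the Flash UI only when Flash Player 9+ is available. -->
<html>
<head>
  <script type="text/javascript" src="swfobject.js"></script>
  <script type="text/javascript">
    // Replace the element with id "store" with store.swf,
    // but only if a sufficiently recent player is installed.
    swfobject.embedSWF("store.swf", "store", "800", "600", "9.0.0");
  </script>
</head>
<body>
  <div id="store">
    <h1>Featured products</h1>
    <ul>
      <li>Nokia 6010, $99.99</li>
      <li>Nokia 3100 Blue Tri-band, $139.00</li>
    </ul>
  </div>
</body>
</html>
```

The point is that the indexable version is the default, not a bolted-on afterthought: the HTML degrades gracefully for everyone, and Flash is the enhancement.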