It’s probably the number one topic on Flash blogs today: O’Reilly launches InsideRIA and has an article about how Google indexes SWF files. As usual when it comes to SEO and Flash, almost no one understands what it means, but almost everyone talks about it as the New Big Thing. Guess what? It’s not. I’ve said quite a few things on the subject (try the Search Engine Optimization category in the list to the right), but I think Geoff Stearns has put it best:
It doesn’t matter.
That is the bottom line. It doesn’t matter if Google can index the SWF, because it won’t do you any good.
It is interesting that Google now uses Adobe’s Search Engine SDK. Not because it will change anything, but because of why they have switched. I have commented on Google’s approach to SWF indexing before (Google and Flash) and found that they don’t understand the problem. This is just further proof of that.
The article that started it all has this to say about Google’s new approach:
This is great news for RIAs and especially Flash/Flex folks who care about search engine ranking.
This is also good because we can avoid techniques that could look like cloaking to Googlebot which will hurt your search engine listings.
Both quotes from Google’s Indexing Flash Text with Adobe’s SDK on InsideRIA.
These two statements are false. Firstly, there’s no good news here: Google has changed from their home-brewed mistargeted effort to Adobe’s mistargeted effort. It will do almost nothing for Flash site rankings, and absolutely nothing for Flex applications. Secondly, it will have no effect on the techniques currently employed to achieve indexable Flash sites, because the cloaking technique is the only technique that works. More to the point: the SWF indexing technique does not work, and cannot work except in the simplest SWF files.
The reasons are very simple:
- SWF files have no semantic structure. Even if Google can extract text from SWF files, how could it ever know in which order the text appears? How does it know that the text is not only a placeholder that will get parts of it replaced when the application runs? How can it associate a header with a body? These are only a few issues that make the data in a SWF unusable without actually running the application.
- SWF files contain very little content. Only the simplest Flash sites contain static text. Most load their content from XML files on the server, or at least dynamically generate or change text fields. Unless the SWF is actually run there is no way of knowing how it will appear to the user.
- SWF files load other SWF files. A not uncommon technique is to have a bootstrap SWF that loads a bigger SWF containing the actual site. Once again: without actually running the application there is no way to know how the application appears to the user.
- Any application that requires the user to log in cannot be indexed. Most Flex applications are just that: applications. They let users work with their data, data which is not public and should never ever be indexed. It’s not like Google is allowed past the login screen on Facebook. Asking that Google index a Flex application is like asking Google Desktop to index Flex Builder.
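To make the first two points concrete, here is a toy model in Python. It is only a sketch: `ToySwf`, `SERVER`, `static_extract` and `run` are names I made up for illustration, and none of this reflects the real SWF format or how Adobe's SDK actually works. It simply contrasts what is stored in the file with what a user would see once the movie has loaded its content.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToySwf:
    """Hypothetical stand-in for a SWF file; not the real SWF format."""
    static_text: List[str]             # text fields stored in the file itself
    content_url: Optional[str] = None  # data the movie would load at runtime


# Hypothetical server-side content, keyed by URL (assumed for this sketch).
SERVER = {
    "/content.xml": ["Welcome to our shop", "Opening hours: 9 to 17"],
}


def static_extract(swf: ToySwf) -> List[str]:
    """What an indexer gets if it only reads the file without running it."""
    return list(swf.static_text)


def run(swf: ToySwf) -> List[str]:
    """What the user sees once the movie runs and fills in its text fields."""
    if swf.content_url is None:
        return list(swf.static_text)
    return list(SERVER[swf.content_url])


site = ToySwf(static_text=["Loading...", "{title}"], content_url="/content.xml")
indexed = static_extract(site)   # placeholders only
displayed = run(site)            # the actual content
```

In this model the indexed text and the displayed text have nothing in common, which is the situation for any site that loads its content at runtime.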
In short: without running the SWF file there is no way of knowing how it will appear to the user. It cannot be known which text appears (or if it appears at all), nor in what order or in what context. But since Google can’t find the SWF files to start with, it shouldn’t bother with those questions and should instead work on something useful, like how to make progressive enhancement/graceful degradation easier and more Google-compatible. For more discussion about these issues, read my previous article on the subject: Google and Flash.
I really hope that the article is not a measure of the future quality of InsideRIA, because this just shows that the author understands as little as Google about the subject.