Google has been indexing SWFs using its new techniques for a couple of weeks now, so it should be possible to see what that really means. I was very critical in my last post on the subject, and I have been proven wrong about some things, but so far it seems that I was mostly right: nothing has really changed.
What I was wrong about
Google does indeed find content embedded using SWFObject, something I seriously doubted that it would. I’ve seen reports that embedding using object tags works better, and that Adobe’s AC_FL_RunContent is worse.
Another good thing is that when you search for something that was found inside a SWF, it is the embedding HTML page that appears in the search results. If it had instead been the SWF itself, we would have had to actively hide our Flash-based sites from Google, but luckily it seems to work as it should.
What I was right about, so far at least
I haven’t found any compelling evidence that Google has indexed any dynamically loaded content, or at least not that that content counts towards the rank of the SWF or embedding page.
We’ve known for a long time that Google can pick apart a SWF and find any static text inside it, but since they now claim to explore “Flash files in the same way that a person would”, you would expect dynamically loaded content to weigh in towards the rank of the SWF or embedding page. Otherwise they aren’t actually doing anything they didn’t do before.
Using Ryan Stewart’s “fleximagically searchable” competition it should be easy to measure the competence of Google’s SWF indexing. Ryan’s competition basically works like this: create a Flex-based application that loads content containing the words “fleximagically searchable” dynamically, but only on a user-activated event (a button click, for example). The site that ranks highest on Google at the beginning of September wins. The competition has some major flaws, but it is suitable for finding sites that get indexed for their dynamic content, rather than for the contents of the actual SWF.
Searching for those two words yields many hits, but most are blogs talking about the competition rather than competition entries. Of the few hits that are Flash-based, all have the words as static text in the SWF; none of the top 20 is a Flash-based site that only loads the words dynamically (there is one that looks like it does, but it uses cloaking, serving the content text-only when the user agent is “googlebot”). This means that in the weeks since Ryan announced the competition, no entry that fulfills the rules has actually been indexed by Google, as far as I can tell.
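The cloaking mentioned above can be checked for in a rough way by requesting the same page twice, once claiming to be Googlebot and once claiming to be an ordinary browser, and comparing what comes back. This is only a minimal sketch: the function names, the browser user-agent string and the example URL are my own placeholders, and it will not catch cloaking that keys on the crawler's IP address rather than its user agent.

```python
import urllib.request

# Illustrative user-agent strings; the browser one is made up for this sketch.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
BROWSER_UA = "Mozilla/5.0 (compatible; ordinary-browser-probe)"


def fetch_as(url, user_agent):
    """Fetch a URL while presenting the given User-Agent header."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def looks_cloaked(url, phrase, fetch=fetch_as):
    """True if the phrase is served to the crawler UA but not to the browser UA."""
    needle = phrase.lower()
    for_crawler = needle in fetch(url, GOOGLEBOT_UA).lower()
    for_browser = needle in fetch(url, BROWSER_UA).lower()
    return for_crawler and not for_browser
```

Calling `looks_cloaked("http://example.invalid/entry.html", "fleximagically searchable")` against a real entry would perform two HTTP requests; the `fetch` parameter exists so the comparison logic can be tried out without touching the network.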
There can be a number of reasons as to why this is. One is that the competition entries have been ranked so low that I don’t find them, but I’ve tried searching for the exact phrases without luck, so I lean towards one of the two other possible reasons: Google can’t do what it claims, or Google hasn’t actually started doing it yet, despite saying so.
If it is the case that Google just hasn’t turned on all the features of its Flash indexer yet, we might yet see some results, and when we do I will strike out the parts of this post that are wrong.
If someone has an example of Google indexing dynamically loaded content that can be shown not to be down to some other factor, like the text of links to that page, the page title or other page content, please let me know and I’ll link to it here as proof that I was wrong.
Update (July 23): The owner of the fleximagically-searchable.com domain has done some extensive testing and come to more or less the same conclusion as I have: Google doesn’t (yet) index dynamically loaded content. Even worse is that it also seems like Google isn’t indexing anything that is dynamically set using ActionScript, only static text, which is more or less what they’ve done for a long time.
Update (July 26): In an interview, Justin Everett-Church, the Senior Product Manager for Flash Player, said:
Flash Player does not actually implement the network API, we actually hand that off to our host, so in the case of a browser the browser will make a network request and that’s what add cookies. A similar process is happening on the search server, where we will actually say “well, I need this XML file or I need this other SWF” and it’s up to the Google host application to return that content. My understanding right now is that that part of it has not been implemented by Google even though our search player allow that capability.
TheFlashBlog: Flash Player FAQs Video with Justin Everett-Church (the quote is at 8:20), emphasis mine.
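To make the division of labour in that quote concrete, here is a toy model of it. Nothing in it is real Flash Player or Google code; `ToyPlayer`, `browser_like_host` and `unimplemented_host` are names I made up. The only point is that a player which hands every network request to its host can see nothing beyond its static text when the host doesn't answer.

```python
class ToyPlayer:
    """Hypothetical stand-in for a player with no network stack of its own."""

    def __init__(self, static_text, external_urls, host_fetch):
        self.static_text = static_text
        self.external_urls = external_urls
        self.host_fetch = host_fetch  # supplied by whatever embeds the player

    def visible_text(self):
        """Static text plus whatever the host was willing to load."""
        parts = [self.static_text]
        for url in self.external_urls:
            body = self.host_fetch(url)
            if body is not None:
                parts.append(body)
        return " ".join(parts)


def browser_like_host(files):
    """A host that serves requests, here from an in-memory dict."""
    return lambda url: files.get(url)


def unimplemented_host(url):
    """A host that has not implemented loading: every request comes back empty."""
    return None
```

With `files = {"data.xml": "fleximagically searchable"}`, a `ToyPlayer("Welcome", ["data.xml"], browser_like_host(files))` sees both the static and the loaded text, while the same player given `unimplemented_host` sees only "Welcome", which matches what the search results look like so far.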
Just to make sure that there are no misunderstandings, this is what I mean when I say that Google doesn’t seem to index dynamically loaded content: I have seen no examples where a page appeared as a result for a query only because of content that was not on the embedding HTML page and not static text in the SWF file, but existed only in data from an external source (e.g. an XML or text file) loaded by the SWF. For example, if you search for “fleximagically searchable” I expect to find an example where those two words can only be found in a file loaded by a running SWF. Yet another way of explaining it: if you put some nonsensical text in a file and created a small Flash site that did nothing but load that file and display the text, then you should be able to search for that text, or portions of it, and find the page that embeds the Flash site. And you should find the page only because the Flash site loaded the text, not because the same text appears in the HTML, the URL, any links to the site, etc.
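The nonsense-text test described above only proves something if the text really is absent from everything except the loaded file. A minimal sketch of that sanity check could look like this; the function names and the `zq` prefix are my own inventions, and the HTML, URL and link texts are assumed to have been collected by hand.

```python
import re
import secrets


def make_probe_phrase(words=3):
    """Return a nonsense phrase that is very unlikely to exist anywhere else."""
    return " ".join("zq" + secrets.token_hex(4) for _ in range(words))


def leaks(phrase, page_html, page_url, inbound_link_texts):
    """Name every place, other than the loaded file, where a probe word shows up."""
    def normalize(text):
        return re.sub(r"\s+", " ", text).lower()

    tokens = normalize(phrase).split(" ")
    sources = {
        "html": page_html,
        "url": page_url,
        "links": " ".join(inbound_link_texts),
    }
    return [
        name
        for name, text in sources.items()
        if any(token in normalize(text) for token in tokens)
    ]
```

If `leaks(...)` returns an empty list and the page still turns up when searching for the phrase, the hit can reasonably be credited to the dynamically loaded file. If it returns anything else, the experiment is contaminated and says nothing about the Flash indexer.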