The Red Herring revisited

Google has been indexing SWFs using its new techniques for a couple of weeks now, so it should be possible to see what the change really means. I was very critical in my last post on the subject, and I have been proven wrong about some things, but so far it seems I have been mostly right: nothing has really changed.

What I was wrong about

Google does indeed find content embedded using SWFObject, something I seriously doubted it would. I’ve seen reports that embedding using plain object tags works better, and that Adobe’s AC_FL_RunContent works worse.

Another good thing is that when you search for something that was found inside an SWF, it is the embedding HTML page that appears in the search results. If the SWF itself had appeared instead, we would have had to actively hide our Flash-based sites from Google, but luckily it seems to work as it should.

What I was right about, so far at least

I haven’t found any compelling evidence that Google has indexed any dynamically loaded content, or at least no evidence that such content counts towards the rank of the SWF or the embedding page.

We’ve known for a long time that Google can pick apart an SWF and find any static text inside it, but since Google now claims that its indexer “explores Flash files in the same way that a person would”, you would expect dynamically loaded content to weigh in towards the rank of the SWF or the embedding page; otherwise they aren’t actually doing anything they didn’t do before.

Ryan’s competition basically works like this: create a Flex-based application that dynamically loads content containing the words “fleximagically searchable”, but only on a user-activated event (a button click, for example). The site that ranks highest on Google at the beginning of September wins. The competition has some major flaws, but it is suitable for finding sites that get indexed for their dynamically loaded content rather than for the static contents of the SWF.

Using Ryan Stewart‘s “fleximagically searchable” competition it should be easy to measure the competence of Google’s SWF indexing. Searching for those two words yields many hits, but most are blogs talking about the competition rather than competition entries. Of the few hits that are Flash-based, all have the words as static text in the SWF; none of the top 20 is a Flash-based site that only loads the words dynamically (there is one that looks like it does, but it uses cloaking, serving the content as plain text when the user agent is “googlebot”). This means that in the weeks since Ryan announced the competition, no entry that fulfills the rules has actually been indexed by Google, as far as I can tell.
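The cloaking trick mentioned above is easy to sketch. This is an illustration of the general technique (which Google’s guidelines explicitly penalize), not that entry’s actual code; the function and file names are made up:

```javascript
// Sketch of user-agent cloaking: serve a crawler a text-only page and
// everyone else the Flash site. All names here are illustrative.
function isGooglebot(userAgent) {
  // Googlebot identifies itself in the User-Agent header.
  return /googlebot/i.test(userAgent || "");
}

function pickResponse(userAgent) {
  return isGooglebot(userAgent) ? "text-only.html" : "flash-site.html";
}
```

A server would branch on something like pickResponse when handling each request. Because the crawler and a human visitor see different content, this is exactly the kind of cloaking that can get a site removed from the index, which is why such an entry proves nothing about SWF indexing.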

There can be a number of reasons for this. One is that the competition entries rank so low that I simply don’t find them, but I’ve tried searching for the exact phrases without luck, so I lean towards one of the two other possible reasons: Google can’t do what it claims, or Google hasn’t actually started doing it yet, despite saying so.

If it is the case that Google just hasn’t turned on all the features of its Flash indexer yet, we might still see some results, and when we do I will strike out the parts of this post that are wrong.

If someone has an example of Google indexing dynamically loaded content that can be shown not to be down to some other factor, like the text of links to the page, the page title or other page content, please let me know and I’ll link to it here as proof that I was wrong.

Update (July 23): The owner of the fleximagically-searchable.com domain has done some extensive testing and come to more or less the same conclusion as I have: Google doesn’t (yet) index dynamically loaded content. Even worse, it also seems that Google isn’t indexing anything that is set dynamically using ActionScript, only static text, which is more or less what it has done for a long time.

Update (July 26): In an interview, Justin Everett-Church, the Senior Product Manager for Flash Player, said:

Flash Player does not actually implement the network API, we actually hand that off to our host, so in the case of a browser the browser will make a network request and that’s what adds cookies. A similar process is happening on the search server, where we will actually say “well, I need this XML file or I need this other SWF” and it’s up to the Google host application to return that content. My understanding right now is that that part of it has not been implemented by Google even though our search player allows that capability.

TheFlashBlog: Flash Player FAQs Video with Justin Everett-Church (the quote is at 8:20), emphasis mine.

Just to make sure that there are no misunderstandings, this is what I mean when I say that Google doesn’t seem to index dynamically loaded content: I have seen no examples where a page appeared as a result for a query only because of content that was not on the embedding HTML page, was not present as static text in the SWF file, and was found only in data from an external source (e.g. an XML or text file) loaded by the SWF. For example, if you search for “fleximagically searchable” I expect to find a result where those two words can only be found in a file loaded by a running SWF. Yet another way of explaining it: if you put some nonsensical text in a file and created a small Flash site that did nothing but load that file and display the text, then you should be able to search for that text, or portions of it, and find the page that embeds the Flash site. And you should not find the page because the text appeared there by any means other than the Flash site loading it, such as being included in the HTML, the URL, or any links to the site.
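The criterion above amounts to a trivial filter over candidate search hits: if the phrase occurs verbatim in the embedding page’s HTML source, the hit could have ranked without any SWF indexing at all, so it proves nothing. A rough sketch (the function name is mine, and matching on the raw source is a deliberate simplification):

```javascript
// Rule out non-SWF explanations for a search hit: returns true when the
// phrase is present as static text in the embedding page's HTML source,
// in which case the hit is uninteresting as evidence. Only pages where
// this returns false are candidates for "Google indexed dynamically
// loaded SWF content". Inbound link text and the URL would still have
// to be checked separately.
function phraseInStaticHtml(html, phrase) {
  return html.toLowerCase().indexOf(phrase.toLowerCase()) !== -1;
}
```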

4 Responses to “The Red Herring revisited”

  1. Theo Says:

    Perhaps I should explain why I think that Ryan’s competition has major flaws:

    Firstly, SEO is not only about content; it is just as much about links to the content, an appropriate URL, the page title, and so on. The content only counts for so much. Moreover, each entry will contain the exact same relevant content, namely the search string (any other content can be discounted because it has no relation to it), which means that with all other variables removed from the calculation, all entries should rank the same. Thus we can conclude that the winner will win only because of factors outside the goal of the competition, so neither Flex nor Google’s new indexing techniques have anything to do with it.

    Secondly, Flex search engine optimization is not very interesting. There are a limited number of use cases where it would even make sense to index a Flex application, such as shops and showcases, but most applications require the user to log in, or contain only user-generated content that is transient or private and can’t be indexed.

    I think that Ryan only has good intentions. Competitions like this are great, it’s just that perhaps this one wasn’t so well thought through. Unfortunately I don’t think it has done much more than spread misunderstandings about the “why” of Flex SEO.

  2. John Dowdell Says:

    Info about Google following SWFObject JavaScript, and not yet doing external data loads, was in an update to Google’s original group post on the subject: http://googlewebmastercentral.blogspot.com/2008/06/improved-flash-indexing.html

    Google did say that they had already been indexing webpage application states discovered via Ichabod, but they did not offer a public estimate of how long it would take to process all sites.

    (That question of “What should a search engine do with data which webpages dynamically load?” is a very complex one, applying to Ajax as well as Flash. Many SEO debates are about each and every word of text, when it’s a near-certainty that sites would not be significantly ranked for incidental text. For instance, if multiple sites access the same external datasource, which one gets ranked higher?)

    (I had initial doubts what that “fleximagically searchable” test would prove, particularly after I saw a lot of webpages discussing it…. ;-)

    jd/adobe

  3. Theo Says:

    Yes, the issue of how dynamically loaded content should be weighed is a tricky one, especially in the context of Flash where there is nothing like the semantics of HTML. A blob of text is loaded, cut up and stuck in different places of the application, but how do you make any sense out of it?

    When it comes to Ajax, the way you do that is to look for changes in the DOM and reindex. You capture the semantics of the new state just as for any web page: h1 means header, p means body text, and so on.

    When it comes to Flash there are no semantics at all; all you see is plain text in text fields. How do you know what is important and what is not, what relates to what, which fields are headers and which are just junk?

    I’ve argued again and again for progressive enhancement instead of SWF indexing: it’s better that we deliver the content to Google in a structured and rich format than that Google tries to make sense of the content in an environment where it has been stripped of its structure and semantics.

    SWF indexing is a red herring: it diverts attention away from more useful solutions.

  4. John Dowdell Says:

    I’d be surprised if Google or Yahoo or Microsoft did much processing of the DOM at all.

    A lot of SEO writing is just eyewash… first you figure out how people might plausibly search for your service, and whether you might plausibly rank highly on those terms (e.g., you’ve got too much competition on the query “flowers”, but might place for “‘san francisco’ flower delivery orchids”), and then work on your usual title, URL, metadata and particularly inbound anchor text.

    With the state of search engines today, expecting any type of meaningful real-world result for every single word just seems strange.

    jd/adobe
