Why Google isn’t indexing dynamic content (yet)

In an interview with Lee Brimelow, Justin Everett-Church, the Senior Product Manager for Flash Player, clears up some of the questions surrounding Adobe and Google’s new SWF indexing capabilities:

Flash Player does not actually implement the network API, we actually hand that off to our host, so in the case of a browser the browser will make a network request and that’s what adds cookies. A similar process is happening on the search server, where we will actually say “well, I need this XML file or I need this other SWF” and it’s up to the Google host application to return that content. My understanding right now is that that part of it has not been implemented by Google even though our search player allow that capability.

TheFlashBlog: Flash Player FAQs Video with Justin Everett-Church (the quote is at 8:20), emphasis mine.
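
The division of labour Justin describes can be sketched roughly as follows. This is a toy model in Python, not Adobe's or Google's code: the names `Host`, `BrowserHost`, `SearchHost`, and `Player` are invented for illustration, and the only thing taken from the interview is that the player hands every network request to its host.

```python
from typing import Dict, List, Optional


class Host:
    """Hypothetical host interface: the player asks it for every external resource."""

    def fetch(self, url: str) -> Optional[str]:
        raise NotImplementedError


class BrowserHost(Host):
    """Stands in for a browser, which performs the request (and would add cookies)."""

    def __init__(self, resources: Dict[str, str]) -> None:
        self.resources = resources

    def fetch(self, url: str) -> Optional[str]:
        return self.resources.get(url)


class SearchHost(Host):
    """Stands in for an indexing host that has not implemented loading yet."""

    def fetch(self, url: str) -> Optional[str]:
        return None  # every request goes unanswered


class Player:
    """Toy player: it has some static text and asks the host for the rest."""

    def __init__(self, host: Host) -> None:
        self.host = host

    def visible_text(self, static_text: str, external_urls: List[str]) -> List[str]:
        texts = [static_text]
        for url in external_urls:
            loaded = self.host.fetch(url)  # the player never touches the network itself
            if loaded is not None:
                texts.append(loaded)
        return texts


resources = {"content.xml": "Text loaded from XML"}
in_browser = Player(BrowserHost(resources)).visible_text("Static text", ["content.xml"])
in_indexer = Player(SearchHost()).visible_text("Static text", ["content.xml"])
print(in_browser)
print(in_indexer)
```

The player and the SWF are the same in both cases. Only the host differs, which is why static text can end up in the index while dynamically loaded text cannot.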

Justin also mentions that he and Adobe have been very clear with Google on the point that content embedded with SWFObject must be indexable (which seems to have made an impact).

I think this interview clears up a lot of issues I’ve been wondering about lately, and kudos to Lee for posting it. It’s great to hear the people who work directly with these things explain them themselves, and I hope we will see more interviews like this in the future.

So to sum it up, this is the state of Google’s SWF indexing capabilities so far:

  • Content embedded with SWFObject will be indexed
  • Static text will be indexed
  • Dynamically loaded content will not be indexed
  • Any site that uses a bootstrap SWF to load the main content will not get indexed
  • Any calls to ExternalInterface will fail, so sites that depend on being able to communicate with the wrapper will have problems (this wasn’t explicitly mentioned, but I think it’s very likely judging from Justin’s description of how the indexing environment works)

There have also been reports that nothing that is done with ActionScript is indexed, but it’s possible that this is down to Google not using the new method on all its indexing servers yet.

New questions

Now that we’ve had some of our questions answered, I would like to pose some new ones. What I’ve been thinking about the last few days is exactly how the indexer interacts with the Flash site. This is how Google describes their new method:

We’ve developed an algorithm that explores Flash files in the same way that a person would, by clicking buttons, entering input, and so on.

Official Google Webmaster Central Blog: Improved Flash indexing, emphasis mine

But what is a button? There’s no such thing as a button in Flash, only display objects with event handlers. Since anything is potentially clickable, how will the indexer be able to discern which display objects act as buttons? Will it only work with some special cases (for example if buttonMode is true or with subclasses of SimpleButton)? What about interaction that isn’t clicking? Take dontclick.it as an (extreme) example: will that site get indexed properly?
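
To make the question concrete, here is one heuristic an indexer could conceivably use. It is purely a guess, sketched in Python rather than ActionScript, and nothing here is known to match what Google does. `DisplayObject`, `is_simple_button`, and `find_click_targets` are invented stand-ins for the real display list.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Set


@dataclass
class DisplayObject:
    """Hypothetical stand-in for a node in the Flash display list."""
    name: str
    button_mode: bool = False        # mirrors Sprite.buttonMode
    is_simple_button: bool = False   # stands in for "is a SimpleButton subclass"
    listeners: Set[str] = field(default_factory=set)  # event types with handlers
    children: List["DisplayObject"] = field(default_factory=list)


def walk(node: DisplayObject) -> Iterator[DisplayObject]:
    """Depth-first traversal of the display list."""
    yield node
    for child in node.children:
        yield from walk(child)


def find_click_targets(root: DisplayObject) -> List[str]:
    """Guess which objects are buttons: special cases first, then click listeners."""
    return [
        node.name
        for node in walk(root)
        if node.button_mode or node.is_simple_button or "click" in node.listeners
    ]


stage = DisplayObject("stage", children=[
    DisplayObject("menu", children=[
        DisplayObject("home", button_mode=True),
        DisplayObject("about", listeners={"click"}),
    ]),
    DisplayObject("logo"),
    DisplayObject("hoverArea", listeners={"mouseOver"}),  # dontclick.it-style interaction
])

targets = find_click_targets(stage)
print(targets)
```

Even this simple version shows the problem: the object that only reacts to mouse-over is never exercised, so whatever it would reveal stays out of reach of the indexer.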

This is a question for Google, rather than Adobe. What Adobe has provided is very low-level and Google probably have a lot of work to do to get things working. I have a feeling that last month’s announcement was a little premature, but that we will see improvements during the fall.

However, I must add that I still don’t think that this will ever get better than progressive enhancement, and that SWF indexing is a red herring diverting the attention away from more appropriate approaches to Flash search engine optimization.

6 Responses to “Why Google isn’t indexing dynamic content (yet)”

  1. Nuno Rosa Says:

    “Any calls to ExternalInterface will fail, (…)” I hope not, because even if Google indexes some content, most of it will use EI for deep linking.

    “(…)how will the indexer be able to discern which display objects that act as buttons?(…)” My guess is that their approach might be to crawl the display list looking for InteractiveObject descendants and, if they have any listeners subscribed to mouse events, trigger them.

    Adobe should release more info on how the “player” interacts with content, for now, that’s more important than what is or what isn’t indexed. The announcement was more to create the hype inside the community, let’s see what comes next.

  2. Theo Says:

    @Nuno

    Unless the deep linking implementation assumes that EI will always be available there should be no problem, I’m thinking more of applications that are integrated with Ajax-based applications on the same page, or for other reasons don’t work if EI is not available. If a site works in the standalone Flash Player it should be OK.

  3. Jensa Says:

    Hi, I interviewed Justin about this some time ago. You can find details here:

    http://www.flashmagazine.com/News/detail/swfs_to_become_fully_searchable/

    J

  4. Arul Says:

    Google does not index dynamic content, because dynamic content is invisible to the Google crawler’s eyes.

  5. Tom H Says:

    What I’m worried about now is whether the Flash content of a progressively enhanced site will be used instead of the good ol’ HTML. I hope Google have a mechanism in place for deciding whether it’s best to crawl the Flash, the HTML, or both.

  6. How to make Flex RIA contents accessible to search engines like Google? - Programmers Goodies Says:

    [...] network interface to be able to load any referenced resources, like XML data, other SWFs, etc., and this is currently not implemented by Google. This means that for an application that loads all it’s data dynamically, like say, all that [...]
