So, here we go again: Google has announced that they will index SWF files with a new algorithm, and the whole Flash blogosphere echo chamber is ringing with the words of the clueless. The announcement shows how little Google understands about Flash websites, and it needlessly diverts attention away from developing a real solution to search engine optimization for Flash websites. The reaction to Google’s announcement also shows how little the Flash bloggers understand about the problem. I’m not sure which of the two is more annoying.
The bottom line is that SWF indexing is a lost cause: it will not make a difference, and the only thing that has changed is that Google is now even better at finding nothing.
To illustrate this, let’s analyse the caveat at the end of the new announcement:
- We currently do not attach content from external resources that are loaded by your Flash files. If your Flash file loads an HTML file, an XML file, another SWF file, etc., Google will separately index that resource, but it will not yet be considered to be part of the content in your Flash file.
- It will not work if your content is loaded dynamically as XML or if you use a bootstrap SWF.
Even if it’s true, as they say, that they will discover links to external content when they scan a SWF, they will miss most things unless they actually execute the code. Just consider this code:
var url : String = baseUrl + "/content.xml";
There is no way that Google will be able to figure out what the URL to the content is, or even understand that the variable actually contains a URL; that would require very serious intelligence on the part of their spider.
As has been pointed out in the comments, executing the code is exactly what Google claims to do. Google doesn’t have to understand that the variable above contains a URL; they can just run the code and see that it does indeed load something. However, this doesn’t change the fact that figuring out what the load really means is really, really hard. Why was the file loaded? What significance does it have? Was it loaded in response to something, or was it just preloaded to be used later? Does one part of the loaded data relate to any other part, or is it a random collection of stuff?
And even if Google got hold of that XML file, how would the spider know how to parse it correctly? Unlike HTML, an XML document has no given semantic structure. There is no way for Google to make any sense of it, or to know where it would appear in the actual application, if it appears at all; it might just be configuration.
Moreover, if you use a bootstrap SWF, your content will not be indexed correctly, since the relation between the main SWF and the bootstrap will not be maintained. The point of a bootstrap file is that it has to be loaded first, but since Google will not find any content in it, it will not be ranked and will probably never be found. Even if the main SWF is indexed and possible to find, you will not be able to visit the site, since it has to be loaded through the bootstrap, which will not be found… It’s a Catch-22: you need the bootstrap to be what people find, but it can’t be, since the nature of a bootstrap is that it is devoid of content.
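To make the one-line example above concrete, here is a minimal sketch of the gap between reading a file and running it. It is written in TypeScript rather than ActionScript so that it can be run outside Flash Player, and it is only an illustration, not a description of how Google's spider works. The helper names `scanStringLiterals` and `resolveAtRuntime` are hypothetical, and the sketch assumes that `baseUrl` is supplied at run time (for example by the embedding page), since the original line does not say where it comes from.

```typescript
// Hypothetical static scan: collect every double-quoted string literal
// found in a piece of source text, without running anything.
function scanStringLiterals(sourceText: string): string[] {
  const literals: string[] = [];
  const pattern = /"([^"]*)"/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(sourceText)) !== null) {
    literals.push(match[1]);
  }
  return literals;
}

// The line from the example, treated as plain text the way a scanner sees it.
const exampleLine = 'var url : String = baseUrl + "/content.xml";';
const foundByScan = scanStringLiterals(exampleLine);

// What the running application actually requests.
// ASSUMPTION: baseUrl is only known at run time; the example does not define it.
function resolveAtRuntime(baseUrl: string): string {
  return baseUrl + "/content.xml";
}

console.log(foundByScan);
console.log(resolveAtRuntime("http://example.com/site"));
```

The scan only ever sees the fragment `/content.xml`, while the real request depends on a value that exists only while the application runs. Executing the code closes that particular gap, but only if the spider happens to run it with the same `baseUrl` a real visitor would get.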
As you can see, Google will achieve very little with their new indexing algorithms, and they must know it; I cannot believe that the Google engineers are not aware of these issues. It’s also surprising how few people in the Flash blogosphere are aware of the problems.
Indexing Flash websites isn’t easy, and it will never be achieved by indexing SWF files: the problem is in the very nature of the format. SWF files are executable applications, not semantically structured data like HTML.
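The contrast with HTML can be sketched in a few lines. This is a deliberately naive illustration in TypeScript (chosen so it runs outside Flash Player), not how any real search engine ranks pages. The weights, the sample markup, and the names `extractLeaves` and `weigh` are all made up for the example. The point is only that a generic indexer can attach meaning to `title` or `h1` because HTML defines them, while the element names in an application's own XML tell it nothing.

```typescript
type Leaf = { tag: string; text: string };
type Weighted = { text: string; weight: number | null };

// Illustrative weights for tags whose meaning HTML defines (made-up values).
const htmlTagWeights: Record<string, number> = { title: 10, h1: 5, p: 1 };

// Pull out leaf elements (tag name plus text) with a deliberately naive regex.
function extractLeaves(markup: string): Leaf[] {
  const leaves: Leaf[] = [];
  const pattern = /<([a-zA-Z][\w-]*)[^>]*>([^<]+)<\/\1>/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(markup)) !== null) {
    leaves.push({ tag: match[1], text: match[2] });
  }
  return leaves;
}

// Attach a weight when the tag has a known meaning, otherwise null.
function weigh(markup: string): Weighted[] {
  return extractLeaves(markup).map((leaf) => ({
    text: leaf.text,
    weight: Object.prototype.hasOwnProperty.call(htmlTagWeights, leaf.tag)
      ? htmlTagWeights[leaf.tag]
      : null,
  }));
}

const htmlPage =
  "<html><head><title>Acme Portfolio</title></head>" +
  "<body><h1>Our work</h1><p>Selected projects.</p></body></html>";

// Hypothetical data file for a Flash site: is "Our work" a headline,
// a tooltip, or unused? Are the other values content or configuration?
const xmlData =
  "<settings><txt>Our work</txt><tint>0x336699</tint><delay>250</delay></settings>";

console.log(weigh(htmlPage));
console.log(weigh(xmlData));
```

Every piece of text in the HTML page gets a weight, while every value in the XML file comes back with `null`: the same words, `Our work`, are a heading in one document and an unknown in the other.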
What we need is a constructive dialogue with Google about how to solve the problem for real Flash sites, but first Google has to get rid of their extremely naive idea of what Flash sites are. The last few announcements from Google about SWF indexing have not been helpful and have only served to divert attention away from solving the real problem.