Author Archive
So, Google is factoring web site speed into ranking algorithms. Just how are they figuring web site speed? What does it mean for a site engineer? We’ve been obsessing over page performance for some years now, so how does this really change anything?
Well, first off, how does it impact a site engineer? According to the official post:
it doesn’t carry as much weight as the relevance of a page. Currently, fewer than 1% of search queries are affected by the site speed signal in our implementation
It's not clear to me which 1% are affected or how that’s decided. Is it just a test affecting queries for 1% of users, or is it uniformly applied to all users but only to 1% of queries? Which queries and why?
Anyway, given that it's only one of “more than 200 signals”, clearly it's not the major factor in determining relevancy. But still, Google can’t throw out a challenge like this and not expect people to obsess over it. Which is part of the point, I think.
When working on optimizing site performance, engineers typically consider a variety of KPIs:
So what is Google actually measuring as “web site speed”?
The official post displays a chart indicating
Labs > Site Performance shows the speed of your website as experienced by users around the world as in the chart below
Furthermore, they link to an earlier post describing site performance in webmaster tools which says
The performance overview shows a graph of the aggregated speed numbers for the website, based on the pages that were most frequently accessed by visitors who use the Google Toolbar with the PageRank feature activated
Matt Cutts’s post on site speed links to the blog post containing the above information, indicating:
Google’s webmaster console provides information very close to the information that we’re actually using in our ranking
So there it is. Google are measuring web site speed as Full Page Download, including all resources, across ALL pages on your site. All pages. They confirm this:
As the page load times are based on actual accesses made by your users, it’s possible that it includes pages which are disallowed from crawling. While Googlebot will not be able to crawl disallowed pages, they may be a significant part of your site’s user experience
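If you want a rough, do-it-yourself approximation of "full page download" time for your own monitoring, one option is to read timing marks in the browser. The sketch below is only an illustration and is not how the Google Toolbar measures speed; the helper name `fullPageLoadMillis` is hypothetical, and it assumes a Navigation Timing-style object with `navigationStart` and `loadEventEnd` fields, which not every browser provides.

```javascript
// Hypothetical helper: full page load time in milliseconds, computed from a
// Navigation Timing-style object. Returns null when the marks are missing or
// the load event has not finished yet (loadEventEnd is 0 until then).
function fullPageLoadMillis(timing) {
  if (!timing || !timing.navigationStart || !timing.loadEventEnd) {
    return null;
  }
  return timing.loadEventEnd - timing.navigationStart;
}
```

In a supporting browser you could call `fullPageLoadMillis(window.performance.timing)` shortly after the load event has completed, for example from a `setTimeout` scheduled inside the load handler, and report the value to your own monitoring.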
So to recap:
Clearly, most of the well-documented best practices for speeding up your website still apply. So is there anything else to consider?
So to conclude… If you’re already focused on site performance, you don’t really have much to worry about. Keep optimizing pages for your real end users on real bandwidth and continuously monitor your site's performance.
I’ll be speaking at Velocity 2010 on the topic of “The Measurable Value of Performance By Design“. Last year Shopzilla talked about “You get what you measure“. This will be a followup covering the additional work we’ve done to try and maintain performance while rapidly adding features and experimenting on our site. In a nutshell, we took our eye off the ball, something we claimed was a bad idea. And it was. I’m going to talk about:
There are a lot of interesting-looking talks this year:
If you work on building or operating web applications (or both!) you really should consider attending Velocity.

I had a great time at TSSJS 2010. I posted the slides from my short 20 minute presentation:
Related to this there’s a great presentation from Bitcurrent that provides a concise overview of the Impact of web latency on conversion rates which includes some earlier data from Shopzilla.
Finally, Matt Raible provides pretty in-depth coverage of some of the talks at TSSJS 2010. One of the most interesting ones to me was Eben Hewitt’s Creating an Event-Driven SOA.
There are a number of techniques out there for cross-domain iframe hiding using URL fragments or proxy iframes. I’m presenting a technique that has worked well for hiding iframes that host 3rd party ad content but where occasionally there is no ad content to display. Specifically, it works even when you may have the same ad slot hosted on multiple sub-domains on your site. A cross-domain technique is necessary because the 3rd party ad content is served from an ad network such as Doubleclick.
Of course, it's preferable to fill empty ad slots; but this has worked in a pinch where we simply did not have sufficient ad content and we preferred not to display an internal ad. We wanted the entire slot to disappear.
Not serving ads in iframes obviously avoids this entire problem, but we’ve preferred to sandbox 3rd party ads in iframes to prevent issues; we’ve found defects with ads especially when the same ad content could appear in more than one slot on the same page – where those ads jacked up the entire page display. Iframes are also naturally asynchronous, but of course there are Javascript techniques to gain this same feature.
* You are an Ad publisher who can control the “creative” that is returned by the ad network in the scenario where there is no real ad to display.
* You can host an HTML file on your site which helps the iframe creative to “bust out” and hide itself; that resource is accessible from the same path on each sub-domain of your site.
We’re exploiting the fact that an iframe’s referrer URL is the URL of the containing page. The iframe can find the sub-domain of the parent page, then request and execute some Javascript from a resource on a known path on the sub-domain. This Javascript will run in the “security context” of the sub-domain. This Javascript code can then do whatever it wants, in our case hiding the parent element of the iframe.
Provide the special creative that should be served when there is no ad content. Have your ad operations people schedule this:
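<script type="text/javascript">
// Only act when this creative is running inside an iframe.
if (window != top) {
  // Extract the scheme, host and optional port of the containing page from the referrer.
  var referrerMatch = document.referrer.match(/[A-Za-z]+:\/\/[A-Za-z0-9.-]+(:[0-9]+)?/);
  if (referrerMatch != null) {
    // Load the helper from the parent's own sub-domain so it runs in that security context.
    location.replace(referrerMatch[0] + '/iframe-hide.html');
  }
}
</script>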
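To see what the referrer pattern in the creative captures, the parsing step can be pulled out into a small function and tried on its own. This is only a sketch: the function name `helperUrlFor` and the example.com hostnames are hypothetical, while the regular expression and the `/iframe-hide.html` path are the ones used above.

```javascript
// Same pattern as the creative: scheme, host, and optional port.
var ORIGIN_PATTERN = /[A-Za-z]+:\/\/[A-Za-z0-9.-]+(:[0-9]+)?/;

// Hypothetical helper: returns the URL of the iframe-hiding page for a given
// referrer, or null when the referrer is empty or does not look like a URL.
function helperUrlFor(referrer) {
  var match = (referrer || '').match(ORIGIN_PATTERN);
  return match === null ? null : match[0] + '/iframe-hide.html';
}

// Path and query string of the containing page are dropped.
console.log(helperUrlFor('http://www.example.com/products?id=1'));
// A non-default port is preserved.
console.log(helperUrlFor('https://shop.example.com:8080/a/b'));
```

Because the path and query string are discarded, the helper is always requested from the same known path on whichever sub-domain is hosting the ad slot.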
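Create the iframe-hiding helper /iframe-hide.html, making sure it's accessible off every sub-domain on which the ad slot might be located:
<script type="text/javascript">
<!--
try {
  // The helper shares the parent page's origin, so parent.document is accessible here.
  var iframes = parent.document.getElementsByTagName('iframe');
  for (var i = 0; i < iframes.length; i++) {
    // Find the iframe that hosts this document.
    if (document == iframes[i].contentDocument || self == iframes[i].contentWindow) {
      // Hide the element that wraps the ad slot.
      iframes[i].parentNode.style.display = 'none';
      break;
    }
  }
} catch (e) { /* Deliberately ignored: if the parent is inaccessible, leave the slot visible. */ }
// -->
</script>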
That’s pretty much it.
When the special creative is served it locates the /iframe-hide.html resource off the enclosing page’s sub-domain and it executes, hiding the parent node of the iframe.
We’ve had this solution in place for over a year; it works for the scenarios outlined above. However, this technique is not totally transparent to the end user as it requires an extra round-trip to occur before the iframe is hidden. It can cause obvious shifts in the page content for above-the-fold ad slots, especially ones that occupy a significant horizontal position. For less obvious ad slots, especially those below-the-fold, the user may be oblivious.
I was fortunate to get the opportunity to lead a team that re-engineered our bizrate.com and shopzilla.com websites from scratch on a new technology stack. One of the driving forces was to improve the performance of our sites. We knew that in order to make performance a first-order priority we had to design it into the architecture of the site.
We first began speaking about the bottom line benefits at Velocity 2009. Since then, we've wanted to share some of the technical details about how we built our new site infrastructure and some of the techniques we used to measure and then improve performance.
I got the chance to deliver a presentation at a number of Southern California Java Users Groups. We’ve chosen to share it here: Shopzilla – Performance By Design
Some other useful references for high performance sites:
Something else we’ve learned is that if you take your eye off the ball, performance will regress. As we’ve spent the last year adding more and more features to our sites, we’ve given back some of our performance gains. We recently embarked on a project to get some of that back and add more automated performance measurement. Stay tuned to our Shopzilla Tech Blog for more info.
This article is a cross-post from Shopzilla Tech Blog
During a recent tech workshop, Phil challenged some of us to think about our roles from a different perspective; to give our “job descriptions” a bit of a different spin — focusing on job expectations. One of these exercises was to finish the thought, “I am a …”. I see a lot of job candidates with Architect titles on their resumes with a huge variety of skill sets and experience. Looking beyond technical skills and trying to distill the qualities of an Architect was certainly an interesting exercise.
Here is my take on the expectations of an Architect:
I am an Architect
I am an Architect and above all, I am relentless in my drive for continuous improvement.
Finally, what differentiates an architect on a smaller team from an enterprise architect or from a Chief Architect? I found an interesting paper - Role of the Chief Architect – that suggests there are many dimensions, but organizational scope could be the primary factor.
