Why Page Speed Isn’t Enough

Ajax Architecture – Market Navigation

Archive feature on Outlook for Mac

I’ve always liked the workflow Gmail encourages: an Archive “folder” (it’s really only a label in Gmail) and a clean inbox. I’m always surprised when I see people keep thousands of emails in their inbox. When working with Outlook 2011 on the Mac I want the same effect, so I created an Archive folder and move every email I’ve read and dealt with into it.

Unfortunately, Outlook doesn’t have a built-in way to move an email into a named folder automatically. When you invoke the Move To Folder… key combination, it prompts for a folder name.

It’s possible, however, to write a script and attach it to a key combination. Here’s the script:

tell application "Microsoft Outlook"
	activate
	-- Messages currently selected in the message list
	set msgSet to current messages
	if msgSet = {} then
		-- Raising an error stops the script here, so nothing further runs
		error "No messages selected. Select at least one message."
	end if
	-- Find the Archive folder in the account that owns the first selected message
	set theMsg to item 1 of msgSet
	set theAccount to account of theMsg
	set archiveFolder to folder "Archive" of folder "Inbox" of theAccount
	repeat with aMessage in msgSet
		move aMessage to archiveFolder
	end repeat
end tell

The only thing you may need to customize is the path to, and name of, your Archive folder, which is set on this line:

set archiveFolder to folder "Archive" of folder "Inbox" of theAccount

To use the script:

  1. From Outlook, click the little Script symbol to the right of the Help menu.
  2. Select About This Menu…
  3. Click the Open Folder button.
  4. Save the above script to this folder in a file called “Archive\cA”. Yes, that name is correct: the “\cA” suffix associates the script with the keyboard shortcut CTRL+A. You may find it easiest to save the script to a file on your Desktop, then drag it into the Finder window that was opened.

Now, highlight an email message and press CTRL+A. It will move the highlighted email into the Archive folder.

The only caveat is that the script is sometimes a little “slow”. If you’re quick enough, you can press CTRL+A and then move your focus onto another email before the script runs, in which case that other email is archived instead.

Google Instant

Google have, of course, changed the game of search again with Google Instant. Their official blog suggests “Search: now faster than the speed of type”. The speed is certainly impressive, but probably the most interesting questions are around how it will affect user behavior: impressions, clicks on both organic and paid results, and how all of this will affect SEO.

In the opinion of Joshua Bixby, “Eventually, page 2 search results are going to be irrelevant”.

They also describe how it’ll affect impressions.

From a web application perspective, I’m excited that they raised the bar yet again. It’s actually pretty fun to play catch-up to Google… when you’re not a direct competitor, of course.

Application Resiliency

Came across an interesting presentation from Chris Westin at Yahoo.  It piqued my interest because it documented a lot of the same concepts we exploited at Shopzilla.  Namely:
  • Loosely coupled, independent application functions
  • Redundancy at every tier
  • UI handles failed or timed-out web service calls and degrades gracefully
  • Implement page fragments to render portions of data returned from HTTP service calls
  • Multiple parallel asynchronous HTTP calls to fetch content
  • Employ caching at the service level

One difference at Shopzilla is that we didn’t wait for all service calls to complete before rendering the page. We began rendering from the top, using Futures from the Java concurrency API: the rendering thread would block automatically if it needed data that wasn’t yet available, and would carry on rendering if the data was already there. Judicious use of response flushing allowed us to send portions of the page back to the browser for rendering even while the server was still busy.
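The top-down rendering idea can be sketched in a few lines. This is only an illustration, written with JavaScript Promises rather than the Java Futures we actually used, and every name in it (`fakeServiceCall`, `settle`, `renderPage`, the fragment objects) is made up for the example. All calls are started up front; rendering then walks the page in order and waits only where data has not yet arrived. A failed call produces an empty fragment rather than a broken page.

```javascript
// Hypothetical stand-in for an HTTP service call:
// resolves with `data` after `ms` milliseconds.
function fakeServiceCall(data, ms) {
  return new Promise(function (resolve) {
    setTimeout(function () { resolve(data); }, ms);
  });
}

// Turn success or failure into a plain value, so a failed call can be
// handled at render time instead of surfacing as an unhandled rejection.
function settle(promise) {
  return promise.then(
    function (value) { return { ok: true, value: value }; },
    function (error) { return { ok: false, error: error }; }
  );
}

// Render fragments from the top of the page down. Every call is already in
// flight, so each `await` only waits when that fragment's data is not ready.
async function renderPage(fragments, flush) {
  for (const fragment of fragments) {
    const result = await fragment.data;
    // Degrade gracefully: a failed call yields an empty fragment.
    flush(result.ok ? fragment.render(result.value) : '');
  }
}

// Usage: start all calls up front, then render in page order.
const demoFragments = [
  { data: settle(fakeServiceCall('Header', 30)), render: function (d) { return '<h1>' + d + '</h1>'; } },
  { data: settle(fakeServiceCall('Results', 10)), render: function (d) { return '<ul>' + d + '</ul>'; } },
  { data: settle(Promise.reject(new Error('timed out'))), render: function (d) { return d; } }
];
renderPage(demoFragments, function (html) { console.log(html); });
```

In a real server the `flush` callback would write the fragment to the response and flush it, so the browser can start rendering the top of the page while later calls are still running.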

Betfair

I recently joined Betfair, based out of London. Betfair is the world’s largest betting community. I’ll be working on the sports site products. It’s a really interesting platform: exchange betting allows you to place bets with other people rather than with Betfair itself, so it is much more like trading on a stock market.

It’s a pretty high-volume business, executing millions of transactions per day on over a billion HTTP requests, so it is sure to be exciting.

Oh, and Betfair is hiring.

Velocity 2010 Performance by Design video

How does Google measure site speed?

So, Google is factoring web site speed into ranking algorithms.  Just how are they figuring web site speed?  What does it mean for a site engineer?  We’ve been obsessing over page performance for some years now, so how does this really change anything?

Well, first off, how does it impact a site engineer?  According to the official post:

it doesn’t carry as much weight as the relevance of a page. Currently, fewer than 1% of search queries are affected by the site speed signal in our implementation

It’s not clear to me which 1% are affected or how that’s decided. Is it just a test affecting queries for 1% of users, or is it uniformly applied to all users but only to 1% of queries? Which queries and why?

Anyway, given that it’s only one of “more than 200 signals”, clearly it’s not the major factor in determining relevancy. But still, Google can’t throw out a challenge like this and not expect people to obsess over it. Which is part of the point, I think.

When working on optimizing site performance, engineers typically consider a variety of KPIs:

  • Time to first byte
  • Base page download
  • Progressively rendered elements:  headers, above-the-fold
  • Full page download, including all resources

So what is Google actually measuring as “web site speed”?

The official post displays a chart and says:

Labs > Site Performance shows the speed of your website as experienced by users around the world as in the chart below

Furthermore, they link to an earlier post describing site performance in webmaster tools, which says:

The performance overview shows a graph of the aggregated speed numbers for the website, based on the pages that were most frequently accessed by visitors who use the Google Toolbar with the PageRank feature activated

Matt Cutts’s post on site speed links to the blog post containing the above information and says:

Google’s webmaster console provides information very close to the information that we’re actually using in our ranking

So there it is. Google are measuring web site speed as Full Page Download, including all resources, across ALL pages on your site. All pages. They confirm this:

As the page load times are based on actual accesses made by your users, it’s possible that it includes pages which are disallowed from crawling. While Googlebot will not be able to crawl disallowed pages, they may be a significant part of your site’s user experience

So to recap:

  • They’re measuring full page load including all resources.  Your scripts, your images, third party display ads, third party scripts etc.
  • They’re measuring all pages visited by users on your site, not just crawlable pages.
  • They’re measuring from users’ actual web browsers.  No simulations.  From real bandwidths.

Clearly, most of the well-documented best practices for speeding up your website still apply.  So is there anything else to consider?

  • Post-loading content is looking pretty interesting to us, if it can be done in such a way that it is not factored into page load time.
  • Minimizing 3rd party content, such as display ads, could have a huge impact.  We have little control over 3rd party creative.  I’ve seen ads make up to 7 additional HTTP requests for XML, Flash, images etc.  Steve Souders has a complete initiative around Performance of 3rd Party Content.
  • Focus on pages that might be contributing to longer load times even if they aren’t your primary experience.
  • Beware of links on your page that are served from your domain but redirect to other sites.  E.g. http://your-site.com/redirector?target=<some_other_url> that 302s to some_other_url.  I believe Google is counting the foreign site load times as part of the linking domain’s performance.  I’ll have concrete numbers on that in a few weeks.
  • Continuously monitor and measure your web site’s performance over typical user bandwidths, for example using Keynote KITE.  Optimizing for your office LAN and optimizing for a low-end DSL connection are two different propositions.

So to conclude… If you’re already focused on site performance, you don’t really have much to worry about.  Keep optimizing pages for your real end users on real bandwidth and continuously monitor your site’s performance.

Velocity 2010 – The Measurable Value of Performance By Design

I’ll be speaking at Velocity 2010 on the topic of “The Measurable Value of Performance By Design”. Last year Shopzilla talked about “You get what you measure”. This will be a follow-up covering the additional work we’ve done to try to maintain performance while rapidly adding features and experimenting on our site. In a nutshell, we took our eye off the ball, something we had claimed was a bad idea. And it was. I’m going to talk about:

  • Why we took our eye off the ball
  • The real financial costs of slowing down
  • The specific techniques we applied to make our sites fast again
  • The process and technology framework we put in place to ensure this never happens again

There are a lot of interesting-looking talks this year.

If you work on building or operating web applications (or both!) you really should consider attending Velocity.

Slides from Performance By Design TSSJS 2010

I had a great time at TSSJS 2010. I posted the slides from my short 20-minute presentation.

Related to this, there’s a great presentation from Bitcurrent that provides a concise overview of the Impact of web latency on conversion rates, which includes some earlier data from Shopzilla.

Finally, Matt Raible provides pretty in-depth coverage of some of the talks at TSSJS 2010.  One of the most interesting ones to me was Eben Hewitt’s Creating an Event-Driven SOA.

Hiding empty ad slots with cross-domain iframe scripting

There are a number of techniques out there for cross-domain iframe hiding using URL fragments or proxy iframes. I’m presenting a technique that has worked well for hiding iframes that host 3rd party ad content but where occasionally there is no ad content to display. Specifically, it works even when you may have the same ad slot hosted on multiple sub-domains of your site. It’s necessary because the 3rd party ad content is served from a different domain, that of an ad network such as DoubleClick.

Of course, it’s preferable to fill empty ad slots, but this has worked in a pinch where we simply did not have sufficient ad content and we preferred not to display an internal ad. We wanted the entire slot to disappear.

Not serving ads in iframes obviously avoids this entire problem, but we’ve preferred to sandbox 3rd party ads in iframes to prevent issues. We’ve found defects with ads, especially when the same ad content could appear in more than one slot on the same page, where those ads jacked up the entire page display. Iframes are also naturally asynchronous, but of course there are JavaScript techniques to gain this same feature.

Pre-conditions of this solution

* You are an Ad publisher who can control the “creative” that is returned by the ad network in the scenario where there is no real ad to display.

* You can host an HTML file on your site which helps the iframe creative to “bust out” and hide itself; that resource is accessible from the same path on each sub-domain of your site.

How it works

We’re exploiting the fact that an iframe’s referrer URL is the URL of the containing page. The iframe can find the sub-domain of the parent page, then request and execute some JavaScript from a resource on a known path on that sub-domain. This JavaScript will run in the “security context” of the sub-domain, so it can do whatever it wants with the parent page; in our case, it hides the parent element of the iframe.

Steps

Provide the special creative that should be served when there is no ad content.  Have your ad operations people schedule this:

<script type="text/javascript">
  if (window != top) {
    var referrerMatch = document.referrer.match(/[A-Za-z]+:\/\/[A-Za-z0-9.-]+(:[0-9]+)?/);
    if (referrerMatch != null) {
      location.replace(referrerMatch[0] + '/iframe-hide.html');
    }
  }
</script>
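To see what the regular expression in the creative above extracts, here is the same pattern wrapped in a small standalone function. The function name `helperUrlFor` and the example URLs are made up for illustration; the pattern and the `/iframe-hide.html` path are the ones used in the creative.

```javascript
// Same pattern as in the creative: scheme, host and optional port.
var ORIGIN_PATTERN = /[A-Za-z]+:\/\/[A-Za-z0-9.-]+(:[0-9]+)?/;

// Given the iframe's referrer (the URL of the containing page), build the
// URL of the hiding helper on that page's own sub-domain. Returns null when
// no scheme and host can be found, e.g. when the referrer is empty.
function helperUrlFor(referrer) {
  var match = (referrer || '').match(ORIGIN_PATTERN);
  return match ? match[0] + '/iframe-hide.html' : null;
}

console.log(helperUrlFor('http://shop.example.com/products?id=7'));
// http://shop.example.com/iframe-hide.html
console.log(helperUrlFor('https://www.example.com:8080/a/b'));
// https://www.example.com:8080/iframe-hide.html
console.log(helperUrlFor(''));
// null
```

Because only the scheme, host and port are kept, the helper is always requested from the same sub-domain as the page that contains the ad slot, whatever path that page is on.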

Create the iframe-hiding helper /iframe-hide.html, making sure it’s accessible from every sub-domain on which the ad slot might be located:

<script type="text/javascript">
<!--
try {
  var iframes = parent.document.getElementsByTagName('iframe');
  for (var i=0; i < iframes.length; i++) {
    if (document == iframes[i].contentDocument || self == iframes[i].contentWindow) {
      iframes[i].parentNode.style.display = 'none';
      break;
    }
  }
} catch (e) { }
-->
</script>

That’s pretty much it.

When the special creative is served, it redirects the iframe to the /iframe-hide.html resource on the enclosing page’s sub-domain. That page’s script then executes and hides the parent node of the iframe.

Conclusion

We’ve had this solution in place for over a year; it works for the scenarios outlined above.  However, this technique is not totally transparent to the end user as it requires an extra round-trip to occur before the iframe is hidden.  It can cause obvious shifts in the page content for above-the-fold ad slots, especially ones that occupy a significant horizontal position.  For less obvious ad slots, especially those below-the-fold, the user may be oblivious.