On GitHub yoavweiss / preloader-velocity-nyc-talk
Preloaders!!!
I've been working on Web performance for the last 15 years, and on the responsive images implementation in Blink & WebKit for the last 2 years.
The reason I'm talking about this is that when I spoke about responsive images at last year's Velocity and asked the audience "who knows what the browser's preloader is?", only Steve raised his hand. That made me think that we're not talking about this enough.
Also as part of my responsive images work, I had to fiddle quite a lot with the preloader on the one hand, and manage mailing list flames on the other, from developers who thought that it's a "mindless optimization" that's "holding us back".
For me as a Web performance engineer, the arrival of the preloader really changed the way we had to think about Web performance, and it significantly improved the load times of not-so-optimized web sites.
So I'm here to talk a little about what the preloader is, what it does, and why we need it. Badly.
But, not all subresources are created equal.
Some of them have super powers!
When the parser created a script element, it would halt and wait for all the resources that may impact the running of that JavaScript (including the script itself) to download, get evaluated and executed, before it would continue the parsing work.
Users got frustrated by the waiting.
Web developers got frustrated by their poor performing sites.
Books were written with workaround techniques & best practices:
Minimizing the number of CSS and JS requests, putting scripts at the bottom, etc.
Those are still good practices in an HTTP/1.1 world, but in pre-preloader days, their impact was HUGE.
And eventually, browsers decided to do something about it.
Around 2007-2008, browsers added, each on its own, a mechanism called the preloader.
Well, no one actually called it the preloader. IE called it the look ahead parser, Firefox called it the speculative parser, and in WebKit it was called the preloadScanner.
I like to call it the preloader, since it's a vendor-neutral term and it describes what it does: preload resources.
- Steve Souders
Even though implementations were different, the basic idea behind them was the same.
Peek into the HTML coming in from the network and
start an early fetch of the resources that will most probably be requested later on.
Despite rumors, the preloader is not some weird regex engine, it doesn't look into the raw bytes, etc, etc.
In order to explain the preloader, a quick detour into how HTML parsing works - what happens between the time that the browser gets the HTML as bytes on the wire(less) and the time it has a complete DOM tree.
Before parsing can start, the browser turns the incoming bytes into characters, and turns these characters into tokens in a process that's called tokenization.
A token represents an HTML tag, its name, its attributes and their values
But a token doesn't really know anything about HTML's rules (which tags auto close, which tags can be nested inside other tags, etc)
The parser then takes these tokens and creates a DOM tree from them, while applying the specced HTML parsing rules.
Unlike tokenization, since the parser is building the DOM tree, it must run on the main thread
The tokenizer may or may not run on a background thread
The preloader steps in between the tokenization phase and the parsing phase.
It feeds off serialized tokens (so basically complete HTML tags), and concludes from them which resources are likely to be needed later on, and the type of such resources
So the preloader has no notion of a DOM tree, or nesting, but it can keep track of tags, ignore comments, etc
After it's done with the tokens, it just passes them on to the parser
Like the tokenizer, the preloader may be off the main thread
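To make that step concrete, here is a minimal sketch of a preload scan over tokens. It is a toy model, not any browser's implementation: the token shape (`type`, `name`, `attributes`) and the function name `scanForPreloads` are assumptions made up for this illustration.

```javascript
// Toy model of a preload scanner. The token objects used here are an
// assumed shape; real browser tokens look different.
function scanForPreloads(tokens) {
  const requests = [];
  for (const token of tokens) {
    // Only start tags can reference resources; comments and text are skipped.
    if (token.type !== "startTag") continue;
    const attrs = token.attributes || {};
    if (token.name === "script" && attrs.src) {
      requests.push({ url: attrs.src, type: "script" });
    } else if (token.name === "img" && attrs.src) {
      requests.push({ url: attrs.src, type: "image" });
    } else if (token.name === "link" && attrs.rel === "stylesheet" && attrs.href) {
      requests.push({ url: attrs.href, type: "style" });
    }
  }
  return requests;
}

// Hypothetical token stream for a small page.
const tokens = [
  { type: "startTag", name: "link", attributes: { rel: "stylesheet", href: "main.css" } },
  { type: "comment", data: "<img src='ignored.png'>" },
  { type: "startTag", name: "script", attributes: { src: "app.js" } },
  { type: "startTag", name: "img", attributes: { "data-src": "lazy.png" } },
  { type: "startTag", name: "img", attributes: { src: "hero.png" } },
];

const preloads = scanForPreloads(tokens);
```

Note that the scan needs no DOM tree or nesting information: it looks at each tag on its own, skips the comment, and never sees the image that only has a `data-src` attribute.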
var p = document.createElement("p"); document.body.appendChild(p);
Synchronous scripts may expect the DOM to be in a certain state
JavaScript can alter the HTML, which means that the parser cannot simply continue parsing, add stuff to the DOM, and only then run the JavaScript.
Doing that may cause the JavaScript to break (since it rightfully expects the DOM to be in a certain state when it runs).
<script>document.write("<!--");</script> <div>...
Scripts can alter the HTML, which means that creating the "future" DOM may become irrelevant
<script> var width = document.getElementById("thing").offsetWidth; </script>
Scripts (both internal and external) can look into style information on the current DOM.
That means that if the external CSS wasn't fully downloaded, the browser has to wait for it to arrive, and apply it, before running scripts that come later in the page.
Some browsers optimized that to do that only when scripts actually include such directives, while others didn't bother with it (citation needed).
All of them fetch scripts, CSS and images
XXXXXXXXXX - add some stuff that aren't prefetched in some browsers
XXXXXXXXXX - add some stuff that aren't prefetched anywhere
This can change at any time
Once the browser has dug up all the resources, it needs to prioritize them if it wants to download the most important ones first
Different browsers do that in different ways
It used to be that each content type had its priority and things would download by that
Nowadays it's more complicated than that
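The older per-content-type scheme can be sketched roughly as follows. The priority table is hypothetical: the numbers and the ordering of types are assumptions chosen for illustration, not the values any browser uses.

```javascript
// Hypothetical priority table: a lower number means "fetch earlier".
const TYPE_PRIORITY = { style: 0, script: 1, font: 2, image: 3 };

function priorityOf(type) {
  // Unknown types go last in this sketch.
  return type in TYPE_PRIORITY ? TYPE_PRIORITY[type] : 99;
}

function prioritizeByType(requests) {
  // Sort a copy so the discovery order stays available to the caller.
  // Array.prototype.sort is stable, so requests of the same type keep
  // the order in which they were discovered.
  return requests.slice().sort(function (a, b) {
    return priorityOf(a.type) - priorityOf(b.type);
  });
}

// Requests in the order they were discovered in the markup.
const discovered = [
  { url: "hero.png", type: "image" },
  { url: "app.js", type: "script" },
  { url: "main.css", type: "style" },
  { url: "logo.png", type: "image" },
];

const ordered = prioritizeByType(discovered);
```

A fixed table like this is easy to reason about, but it knows nothing about where a resource sits in the page, which is one reason real schemes have grown more complicated.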
Now you may ask yourselves "where's the spec to all of this wonder?"
There is no preloader spec since it's an optimization that does not have any compatibility implications
The fact that it isn't specced allows browsers to rapidly innovate in that area, and come up with faster priority schemes, etc
Preloaders don't see into JS, and currently, not even into CSS
Resources that you want the preloader to pick up and download, should be in markup
If, on the other hand, you have resources that are not necessary for the initial page load, they must not be in markup
e.g. lazy loading, responsive image solutions, etc.
document.getElementsByTagName("base")[0].href = "http://DontDoThat.com";
The network download order of a resource can and will change between browsers and browser versions.
Make your peace with it
If you're setting cookies in JS, you cannot expect them to be present in the same page load on all the images.
They may be set for some but not for others.
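A small simulation may help show why. It assumes a simplified timeline in which each image request carries whatever cookie exists at the moment the request starts; the event names and the `simulatePageLoad` helper are invented for this sketch.

```javascript
// Replays a list of page-load events in order and records which cookie
// each image request was sent with. Purely illustrative.
function simulatePageLoad(events) {
  let cookie = null;
  const imageRequests = [];
  for (const event of events) {
    if (event.kind === "setCookie") {
      cookie = event.value;
    } else if (event.kind === "fetchImage") {
      imageRequests.push({ url: event.url, cookie: cookie });
    }
  }
  return imageRequests;
}

// Assumed timeline: the preloader requests a.png before the script runs,
// while b.png is only requested after the script has set the cookie.
const imageRequests = simulatePageLoad([
  { kind: "fetchImage", url: "a.png" },
  { kind: "setCookie", value: "session=abc" },
  { kind: "fetchImage", url: "b.png" },
]);
```

In this timeline `a.png` goes out without the cookie and `b.png` goes out with it. Which images end up on which side depends on timing that the page does not control.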
var f = new FontFace("newfont", "url(newfont.woff)", {});
f.load().then(function (fontFace) {
  // Do something once the font is loaded.
});
<link rel="preload" href="/assets/font.woff" as="font"> <link rel="preconnect" href="http://thirdparty.com">
At least in Chromium, to avoid contention, the download happens in two phases.
The first phase downloads one low priority resource and the render blocking resources
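The first phase described above can be sketched as a simple split of the request list. This is not Chromium's code: the `renderBlocking` flag, the request shape, and the idea that everything else simply waits for a second phase are assumptions made for this illustration.

```javascript
// Splits requests into two phases: render-blocking resources plus a
// single low-priority resource first, everything else afterwards.
function splitIntoPhases(requests) {
  const phaseOne = [];
  const phaseTwo = [];
  let lowPriorityAdmitted = false;
  for (const request of requests) {
    if (request.renderBlocking) {
      phaseOne.push(request);
    } else if (!lowPriorityAdmitted) {
      // Let one low-priority resource through alongside the blockers.
      phaseOne.push(request);
      lowPriorityAdmitted = true;
    } else {
      phaseTwo.push(request);
    }
  }
  return { phaseOne: phaseOne, phaseTwo: phaseTwo };
}

// Hypothetical set of discovered requests.
const phases = splitIntoPhases([
  { url: "main.css", renderBlocking: true },
  { url: "hero.png", renderBlocking: false },
  { url: "app.js", renderBlocking: true },
  { url: "logo.png", renderBlocking: false },
]);
```

Here `main.css`, `hero.png` and `app.js` land in the first phase, and `logo.png` waits for the second, so the render-blocking resources are not competing with every image on the page for bandwidth.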
Now let's talk about what can be improved in the preloader.
So it's more of a "possible future" or "future wherein someone will give me time to work on that"
Blink & WebKit currently support @import fetching for internal style tags.
Other browsers should also support that
@import support in external stylesheets is underway in Blink, thanks to Johny Rein from Opera
There are currently 2 proposals for specs that manage the priorities of resources declared in markup
If either gets adopted, it would enable the preloader to be smarter, and would enable devs to opt out of the preloader
You could claim that detecting resources (and their priorities) as fast as possible becomes even more important
No contention avoidance
The preloader is an extremely important browser optimization
Developers should be aware of it, and work with it, rather than against it
I believe that there's still some more optimization juice that can be squeezed out of it
@yoavweiss on Twitter & GitHub