Who's afraid of the big bad preloader?





On Github yoavweiss / preloader-velocity-nyc-talk


Hi, I'm Yoav Weiss, and I won't be talking about Responsive Images for the next 45 minutes. Well, almost.

What will we be talking about?

Preloaders!!!

I've been working on Web performance stuff for the last 15 years, and on the responsive images implementation in Blink & WebKit for the last 2 years.

The reason I'm talking about it is that when I spoke about responsive images at last year's Velocity and asked the audience "who knows what the browser's preloader is?", only Steve raised his hand. That made me think that we're not talking about this enough.

Also as part of my responsive images work, I had to fiddle quite a lot with the preloader on the one hand, and manage mailing list flames on the other, from developers who thought that it's a "mindless optimization" that's "holding us back".

As a web performance engineer, the arrival of the preloader really changed the way we had to think about Web performance, and improved the load times of not-so-optimized web sites significantly.

So I'm here to talk a little about what the preloader is, what it does, and why we need it. Badly.

Once upon a time

https://www.flickr.com/photos/soldiersmediacenter/3351707140

So, once upon a time in a land far far away, browsers were fetching Web pages.

https://www.flickr.com/photos/tacker/7008650081

The way they did that was that a user would type in a URL or follow a link. The browser would then fetch the HTML, start parsing it, and create a DOM tree.

DOM

Subresources

https://www.flickr.com/photos/crazymandi/8165507495

Once a DOM element that requires a subresource was created, that subresource would be added to the download queue, and the parser would continue parsing and discovering more resources.

SUPER subresources!

http://commons.wikimedia.org/wiki/File:Long_Beach_Comic_%26_Horror_Con_2011_-_Krypto,_the_Super_Dog_(6301707368).jpg

But, not all subresources are created equal.

Some of them have super powers!

When the parser created a script element, it would halt and wait for all the resources that might affect the running of this JavaScript (including the script itself) to download, and for the script to be evaluated and executed, before it would continue parsing.

SLOW!

That meant that for pages with multiple external scripts and CSS files (which is most of them), the resource downloading process was extremely inefficient.

Frustration

http://chriscoven.deviantart.com/art/Grumpy-cat-344073323

Users got frustrated by the waiting.

Web developers got frustrated by their poor performing sites.

Workarounds

Books were written with workaround techniques & best practices:

Minimizing the number of CSS and JS requests, putting scripts at the bottom, etc

Those are still good practices in an HTTP/1.1 world, but in pre-preloader days, their impact was HUGE

A native solution

And eventually, browsers decided to do something about it.

Around 2007-2008, browsers added, each on its own, a mechanism called the preloader.

Well, no one actually called it the preloader. IE called it the look ahead parser, Firefox called it the speculative parser, and in WebKit it was called the preloadScanner.

I like to call it the preloader, since it's a vendor-neutral term and it describes what it does: preload resources.

"The greatest browser optimization of all times"

- Steve Souders

What does it do?

Even though implementations were different, the basic idea behind them was the same.

Peek into HTML

Peek into the HTML coming in from the network and

Speculatively download resources

start an early fetch of the resources that will most probably be requested later on.

How does it work?

Despite rumors, the preloader is not some weird regex engine, it doesn't look into the raw bytes, etc, etc.

In order to explain the preloader, a quick detour into how HTML parsing works - what happens between the time that the browser gets the HTML as bytes on the wire(less) and the time it has a complete DOM tree.

Preprocessing

Tokenization

"<img src><div>some text</div></img>"
↓
StartTag: img (src attribute), StartTag: div, Text: "some text", EndTag: div, EndTag: img

Before parsing can start, the browser turns the incoming bytes into characters, and turns these characters into tokens in a process that's called tokenization.

A token represents an HTML tag, its name, its attributes and their values

But a token doesn't really know anything about HTML's rules (which tags auto close, which tags can be nested inside other tags, etc)

Parsing

HTMLBodyElement HTMLImageElement HTMLDivElement Text

The parser then takes these tokens and creates a DOM tree from them, while applying the specced HTML parsing rules.

Unlike tokenization, since the parser is building the DOM tree, it must run on the main thread

The tokenizer may or may not run on a background thread

Where does the preloader fit in?

The preloader steps in between the tokenization phase and the parsing phase.

It feeds off serialized tokens (so basically complete HTML tags), and concludes from them which resources are likely to be needed later on, and the type of such resources

So the preloader has no notion of a DOM tree, or nesting, but it can keep track of tags, ignore comments, etc

After it's done with the tokens, it just passes them on to the parser

Like the tokenizer, the preloader may be off the main thread
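
A rough sketch of that idea in JavaScript follows. It is not the code of any real browser: the `scanForPreloads` name, the token shape and the `type` labels are assumptions made for illustration, and a real preloader handles many more cases (the base URL, srcset, media attributes and so on).

```javascript
// Sketch of a preload scanner. It only looks at start-tag tokens, one at a
// time, and never builds a tree.
// Hypothetical token shape: { type: "StartTag", name: "img", attrs: { src: "a.png" } }
function scanForPreloads(tokens) {
  const requests = [];
  for (const token of tokens) {
    // Comments, text and end tags carry no resources, so they are skipped.
    if (token.type !== "StartTag") continue;
    const attrs = token.attrs || {};
    if (token.name === "script" && attrs.src) {
      requests.push({ url: attrs.src, type: "script" });
    } else if (token.name === "img" && attrs.src) {
      requests.push({ url: attrs.src, type: "image" });
    } else if (
      token.name === "link" &&
      attrs.href &&
      /(^|\s)stylesheet(\s|$)/i.test(attrs.rel || "")
    ) {
      requests.push({ url: attrs.href, type: "style" });
    }
  }
  return requests;
}

// Example input, as a tokenizer might produce it.
const pageTokens = [
  { type: "StartTag", name: "link", attrs: { rel: "stylesheet", href: "main.css" } },
  { type: "StartTag", name: "script", attrs: { src: "app.js" } },
  { type: "Comment", data: "<img src='ignored.png'>" },
  { type: "StartTag", name: "img", attrs: { src: "hero.png" } },
];
console.log(scanForPreloads(pageTokens));
```

The scanner returns a list of likely requests and leaves the tokens untouched, so the parser still receives exactly the same input.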

Why are we blocking???

So why is the parser blocked on scripts anyway?

DOM state

var p = document.createElement("p");
document.body.appendChild(p); 

Synchronous scripts may expect the DOM to be in a certain state

JavaScript can read and alter the DOM, which means that the parser cannot just continue parsing, adding stuff to the DOM, and only then run the JavaScript.

That may cause the JavaScript to break (since it rightfully expects the DOM to be in a certain state when it runs).

document.write

<script>document.write("<!--");</script>
<div>...

Scripts can alter the HTML itself, which means that a "future" DOM created ahead of time may become irrelevant

Style queries

<script>
// Reading layout information requires up-to-date styles.
var width =
    document.getElementById("thing").offsetWidth;
</script>

Scripts (both inline and external) can look into style information on the current DOM.

That means that if the external CSS wasn't fully downloaded, the browser has to wait for it to arrive, and apply it, before running scripts that come later in the page.

Some browsers optimized that to block only when scripts actually include such queries, while others didn't bother with it (citation needed).

Which resources?

All browsers' preloaders fetch scripts, CSS and images


This can change at any time

<link> external CSS

External JS

<img>

@import

<video poster>

<picture><img>

Not preloaded

  • <input>
  • <object>
  • <video><source>
  • <audio><source>
  • <iframe>
  • <link rel=import>

CSS based resources

JS based resources

Priorities

Once the browser has dug up all the resources, it needs to prioritize them if it wants to download the most important ones first

Different browsers do that in different ways

It used to be that each content type had its priority and things would download by that

Nowadays it's more complicated than that

Based on context

  • CSS
  • Scripts
  • Visible images
  • Non-visible images

<head> first

Render blocking resources in the head are downloaded first, at least in Chromium.
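
As a rough illustration of context-based prioritization, here is a JavaScript sketch that orders a list of discovered requests. Everything in it is an assumption made for the example: the `prioritize` name, the `inHead` and `renderBlocking` flags, and the numeric ranks. It is not Chromium's actual scheme, which is more involved and changes between versions.

```javascript
// Hypothetical ranks, following the order on the slide: lower downloads first.
const BASE_PRIORITY = { style: 0, script: 1, visibleImage: 2, hiddenImage: 3 };

// Returns a new array: render blocking resources from the <head> first,
// then everything else by type, keeping document order between equals.
function prioritize(requests) {
  return requests
    .map((request, index) => ({ request, index }))
    .sort((a, b) => {
      const aBlocking = a.request.inHead && a.request.renderBlocking ? 0 : 1;
      const bBlocking = b.request.inHead && b.request.renderBlocking ? 0 : 1;
      return (
        aBlocking - bBlocking ||
        BASE_PRIORITY[a.request.type] - BASE_PRIORITY[b.request.type] ||
        a.index - b.index
      );
    })
    .map((entry) => entry.request);
}

const discovered = [
  { url: "hero.png", type: "visibleImage", inHead: false, renderBlocking: false },
  { url: "app.js", type: "script", inHead: false, renderBlocking: false },
  { url: "main.css", type: "style", inHead: true, renderBlocking: true },
  { url: "footer.png", type: "hiddenImage", inHead: false, renderBlocking: false },
  { url: "head.js", type: "script", inHead: true, renderBlocking: true },
];
console.log(prioritize(discovered).map((r) => r.url));
```

The point of the sketch is only that the same resource type can end up at a different place in the queue depending on where and how it appears in the page.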

Spec???

Now you may ask yourselves "where's the spec to all of this wonder?"

There is no preloader spec since it's an optimization that does not have any compatibility implications

The fact that it isn't specced allows browsers to rapidly innovate in that area, and come up with faster priority schemes, etc

What has the preloader ever done for us?

Before

After

20% on average

Source: https://plus.google.com/+IlyaGrigorik/posts/8AwRUE7wqAE

20% improvements across the board: load time, domContentLoaded and speedIndex

<advice>

Critical resources must be in markup

Preloaders don't see into JS, and currently, not even into CSS

Resources that you want the preloader to pick up and download, should be in markup

Non-critical resources... Maybe not

If, on the other hand, you have resources that are not necessary for the initial page load, keep them out of markup

e.g. lazy loading, responsive image solutions, etc.
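
A minimal lazy loading sketch in JavaScript, to show what "not in markup" means in practice. It assumes the image URL is kept in a `data-src` attribute (a common convention, not a standard), which the preloader does not treat as a resource, and that the caller supplies an `isVisible` check. The `loadVisibleImages` name is hypothetical.

```javascript
// Starts loading images whose real URL is parked in data-src, once they
// become visible. Returns the URLs that were started in this pass.
// `images` is any list of <img>-like objects; `isVisible` is supplied by the
// caller (for example, a bounding-box check against the viewport).
function loadVisibleImages(images, isVisible) {
  const started = [];
  for (const img of images) {
    if (!img.src && img.dataset.src && isVisible(img)) {
      img.src = img.dataset.src; // only now does the download begin
      started.push(img.dataset.src);
    }
  }
  return started;
}

// In a page this might be called on scroll with
// document.querySelectorAll("img[data-src]"). Plain objects are used here so
// that the sketch also runs outside a browser.
const fakeImages = [
  { src: "", dataset: { src: "above-the-fold.png" }, top: 100 },
  { src: "", dataset: { src: "way-down.png" }, top: 5000 },
];
console.log(loadVisibleImages(fakeImages, (img) => img.top < 800));
```

The flip side is that such images are invisible to the preloader, so this only makes sense for resources that really are non-critical.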

Don't invalidate the DOM

document.getElementsByTagName("base")[0].href =
    "http://DontDoThat.com";

Assume nothing about loading order

The network download order of a resource can and will change between browsers and browser versions.

Make your peace with it

Assumption fail #1

Setting cookies with JS

If you're setting cookies in JS, you cannot expect them to be present in the same page load on all the images.

They may be set for some but not for others.

Assumption fail #2

Scripts at the bottom

Scripts at the bottom may get downloaded before images

Use APIs that expose critical hidden resources

Font Loading API

var f = new FontFace("newfont", 
                     "url(newfont.woff)", {});
f.load().then(function (fontFace) {
    // Do something once font is loaded.
});
                    

Resource hints

<link rel="preload" href="/assets/font.woff" 
      as="font" crossorigin>
<link rel="preconnect" href="http://thirdparty.com">
                    

Render blocking resources are bad, mmmmmkay?

At least in Chromium, to avoid contention, the download happens in two phases.

The first phase downloads one low priority resource and the render blocking resources
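
Here is a sketch of what that first phase looks like, in JavaScript. The `splitIntoPhases` name and the `renderBlocking` flag are made up for the example, and treating "everything else" as the second phase is my assumption. Chromium's real scheduler is more nuanced than this.

```javascript
// Splits requests (already in priority order) into two phases:
// phase one gets every render blocking resource plus a single low priority
// one, and the rest waits for phase two.
function splitIntoPhases(requests) {
  const first = [];
  const second = [];
  let lowPrioritySlots = 1; // one low priority resource is let through early
  for (const request of requests) {
    if (request.renderBlocking) {
      first.push(request);
    } else if (lowPrioritySlots > 0) {
      first.push(request);
      lowPrioritySlots -= 1;
    } else {
      second.push(request);
    }
  }
  return { first, second };
}

const phases = splitIntoPhases([
  { url: "main.css", renderBlocking: true },
  { url: "hero.png", renderBlocking: false },
  { url: "app.js", renderBlocking: true },
  { url: "footer.png", renderBlocking: false },
]);
console.log(phases.first.map((r) => r.url), phases.second.map((r) => r.url));
```

The more render blocking resources a page has, the longer everything else sits in the second list.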

</advice>

<future???>

Now let's talk about what can be improved in the preloader.

So it's more of a "possible future" or "future wherein someone will give me time to work on that"

More resources

<iframe> <link rel=import>

Better CSS support

Blink & WebKit currently support @import fetching for inline style tags.

Other browsers should also support that

@import support in external stylesheets is underway in Blink, thanks to Johny Rein from Opera

Fonts

Being able to start download of Web fonts earlier can be a huge win for speed index and UX.

Background images

Background images can also be helpful if we could start them earlier, but to a lesser extent.

Resource priorities

There are currently 2 proposals for specs that manage the priorities of resources declared in markup

If either gets adopted, it would enable the preloader to be smarter, and would enable devs to opt out of the preloader

</future>

Speaking of the future

HTTP2?

No load queue

Priorities sent to the server

You could claim that detecting resources (and their priorities) as fast as possible becomes even more important

No contention avoidance

Server push tho

For first party resources, you could also claim that server push makes all that obsolete

Time will tell

To sum it up

The preloader is an extremely important browser optimization

Developers should be aware of it, and work with it, rather than against it

I believe that there's still some more optimization juice that can be squeezed out of it

https://www.flickr.com/photos/tambako/13498520775

Dealing with the preloader can be scary at times

https://www.flickr.com/photos/etersigni/6787786882

But when it comes to performance, the preloader is your best friend. Be kind to it, and it will be kind to you.

Thanks!

@yoavweiss on Twitter & GitHub

Slides: yoavweiss.github.io/preloader-velocity-nyc-talk