Web Analytics & Tracking 101




[OUTDATED AS OF 08/2016] Slides of my "Web Analytics & Tracking 101" lecture.

On GitHub: willdurand-edu / web-tracking-101-slides

Web Analytics & Tracking 101

William Durand — May 30, 2016

Agenda

  • Web Analytics
  • Google Analytics
  • Web Tracking
  • The NSA

Web Analytics

Definition

Web analytics is the measurement, collection, analysis, and reporting of web data for purposes of understanding and optimizing web usage.

  • It can be used as a tool for business and market research
  • There are two categories of web analytics:

    • off-site web analytics
    • on-site web analytics

Off-Site WA

Off-site web analytics refers to web measurement and analysis regardless of whether you own/maintain a website.

On-Site WA

On-site web analytics measures a visitor's behavior once on your website. There are two main technical ways of collecting the data: server log file analysis, and page tagging.

Google Analytics is the most widely used on-site web analytics service. Piwik is its open-source alternative.

Server Log File Analysis (1/2)

Web servers record some of their transactions in a log file.

In the early 1990s, web site statistics consisted of counting the number of client requests (hits) made to the web server.

This was a reasonable method initially, since each web site often consisted of a single HTML file. However, with the introduction of images in HTML, and web sites that spanned multiple HTML files, this count became less useful.

AWStats and Webalizer are the best-known web server log analysis tools.

Server Log File Analysis (2/2)

Two units of measure were introduced in the mid-1990s to gauge more accurately the amount of human activity on web servers:

  • page views: a request made to the web server for a page,
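  • visits or sessions: a sequence of requests from a uniquely identified client that expires after a certain amount of inactivity.

Because of caching, this method is not accurate enough: pages served from a browser or proxy cache never reach the web server, so they are missing from its log.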
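
As a rough sketch of these two units of measure, the snippet below counts page views and sessions from log entries that are assumed to be already parsed. The 30-minute inactivity window, the use of an IP address as client identifier, and the rule for telling pages from other assets are all assumptions made for illustration.

```python
from datetime import datetime, timedelta

# Assumed inactivity window after which a new session starts.
SESSION_TIMEOUT = timedelta(minutes=30)


def count_page_views_and_sessions(entries, timeout=SESSION_TIMEOUT):
    """Return (page_views, sessions) for (client, timestamp, path) entries."""
    page_views = 0
    sessions = 0
    last_seen = {}  # client id -> timestamp of its previous page view
    for client, timestamp, path in sorted(entries, key=lambda e: e[1]):
        # Count only pages, not images or other assets (plain "hits").
        if not path.endswith((".html", "/")):
            continue
        page_views += 1
        previous = last_seen.get(client)
        # A session starts on the first page view or after the timeout.
        if previous is None or timestamp - previous > timeout:
            sessions += 1
        last_seen[client] = timestamp
    return page_views, sessions


# Hypothetical, already-parsed log entries.
entries = [
    ("10.0.0.1", datetime(2016, 5, 30, 9, 0), "/index.html"),
    ("10.0.0.1", datetime(2016, 5, 30, 9, 1), "/logo.png"),
    ("10.0.0.1", datetime(2016, 5, 30, 9, 5), "/about.html"),
    ("10.0.0.1", datetime(2016, 5, 30, 11, 0), "/index.html"),
    ("10.0.0.2", datetime(2016, 5, 30, 9, 2), "/"),
]
print(count_page_views_and_sessions(entries))  # (4, 3)
```

The image request is ignored, and the first client's return at 11:00 opens a new session because more than 30 minutes have passed.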

Page Tagging (1/3)

In the mid-1990s, Web counters were commonly seen:

U Can't Touch This!

Page Tagging (2/3)

In the late 1990s this concept evolved to include a small invisible image (called a web bug) rather than a visible one, and, by using JavaScript, to pass along with the image request certain information about the page and the visitor.

This information can then be processed remotely by a web analytics company, and extensive statistics generated.

Page Tagging (3/3)

Historically, vendors of page-tagging analytics solutions have used third-party cookies sent from the vendor's domain instead of the domain of the website being browsed.

Third-party cookies can handle visitors who cross multiple unrelated domains within the company's site, since the cookie is always handled by the vendor's servers. However, there are two problems: privacy concerns and cookie deletion.

Most vendors of page tagging solutions have now moved to provide at least the option of using first-party cookies (i.e. cookies assigned from the client subdomain).

Google Analytics

Overview

Google Analytics works by the inclusion of a block of JavaScript code on pages in your website. When visitors to your website view a page, this JavaScript code references a JavaScript file which then executes the tracking operation.

The tracking operation retrieves data about the page request through various means and sends this information to the Google Analytics server via a list of parameters attached to a single-pixel image request.

How Does The Tracking Code Work? (1/2)

  1. A browser requests a web page that contains the tracking code.
  2. A JavaScript Array named _gaq is created and tracking commands are pushed onto the array.
  3. A <script> element is created and enabled for asynchronous loading.
  4. The ga.js tracking code is fetched. Once fetched and loaded, the commands on the _gaq array are executed and the array is transformed into a tracking object.
  5. The script element is loaded into the DOM.
  6. After the tracking code collects data, the GIF request is sent to the Analytics database for logging and post-processing.

How Does The Tracking Code Work? (2/2)

How Does Google Analytics Collect Data? (1/2)

The data used to provide all the information in Google Analytics reports comes from three different sources:

  • The HTTP request of the user (hostname, the browser type, referrer, language)
  • Browser/system information (Java and Flash support, screen resolution thanks to the DOM)
  • First-party cookies (to identify the user session)

How Does Google Analytics Collect Data? (2/2)

When all this information is collected, it is sent to the Google Analytics servers in the form of a long list of parameters attached to a single-pixel GIF image request:

http://www.google-analytics.com/__utm.gif?
  utmac=UA-30138-1&                      // Account
  utmcc=__utma%3D97315849.1774621898...& // Cookie values
  utmdt=analytics%20page%20test&         // Page title
  utmfl=9.0%20r48&                       // Flash version
  utmhn=example.com&                     // Hostname
  utmn=769876874&                        // Unique ID to prevent caching
  utmcs=ISO-8859-1&                      // Language enc.
  utmsr=1280x1024&                       // Screen resolution
  utmsc=32-bit&                          // Screen color depth
  utmul=en-us&                           // Browser language
  ...

The Google Analytics GIF Request Parameters
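
To see how such a request can be read back, here is a small sketch that decodes the parameters listed above with Python's standard library. The URL is rebuilt from those parameters, leaving out utmcc because its value is elided in the original; the function name parse_utm_request is made up for this example.

```python
from urllib.parse import parse_qs, urlsplit

# Request rebuilt from the parameters shown above (utmcc omitted on purpose).
url = (
    "http://www.google-analytics.com/__utm.gif?"
    "utmac=UA-30138-1&utmdt=analytics%20page%20test&utmfl=9.0%20r48"
    "&utmhn=example.com&utmn=769876874&utmcs=ISO-8859-1"
    "&utmsr=1280x1024&utmsc=32-bit&utmul=en-us"
)


def parse_utm_request(url):
    """Return the decoded query parameters of a tracking-pixel URL."""
    query = urlsplit(url).query
    # parse_qs maps each name to a list of values; keep the first one.
    return {name: values[0] for name, values in parse_qs(query).items()}


params = parse_utm_request(url)
print(params["utmdt"])  # analytics page test
print(params["utmsr"])  # 1280x1024
```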

Web Tracking

Definition

Web tracking or Website visitor tracking is the analysis of visitor behaviour on a website. Analysis of an individual visitor's behaviour may be used to provide that visitor with options or content that relates to their implied preferences.

It Is All About Cookies!!!

(Well, most of the time)

Cookies (1/2)

According to Wikipedia, a cookie is a small piece of data sent from a website and stored in a user's web browser while the user is browsing that website.

Every time the user loads the website, the browser sends the cookies back to the server to notify the website of the user's previous activity.
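
This round trip can be sketched with Python's http.cookies module: the server builds a Set-Cookie header, and on a later request it parses the Cookie header that the browser sends back. The cookie name visitor_id, its value, and the one-hour lifetime are invented for illustration.

```python
from http.cookies import SimpleCookie

# Server side: build a Set-Cookie header for a first visit.
response_cookie = SimpleCookie()
response_cookie["visitor_id"] = "abc123"
response_cookie["visitor_id"]["path"] = "/"
response_cookie["visitor_id"]["max-age"] = 3600
set_cookie_header = response_cookie.output(header="Set-Cookie:")
print(set_cookie_header)

# Browser side (simulated): on the next request the stored cookie comes
# back in a Cookie header, which the server parses to recognise the visitor.
request_cookie = SimpleCookie()
request_cookie.load("visitor_id=abc123")
print(request_cookie["visitor_id"].value)  # abc123
```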

Cookies (2/2)

Cookies have some important implications for the privacy and anonymity of web users.

While cookies are sent only to the server setting them or a server in the same Internet domain, a web page may contain images or other components stored on servers in other domains.

Cookies that are set during retrieval of these components are called third-party cookies. Advertising companies use third-party cookies to track a user across multiple sites.

EU Cookie Directive

The EU cookie directive requires websites to gain permission from users before planting cookies, with two exceptions:

  • Some cookies can be exempted from informed consent under certain conditions if they are not used for additional purposes (e.g., cookies used to keep track of a user's input when filling online forms or as a shopping cart);
  • First-party analytics cookies are not likely to create a privacy risk if websites provide clear information about the cookies to users and privacy safeguards.

How Do Third-Party Tracking Cookies work? (1/4)

Let's say you have four websites:

  • gizmodo.com
  • slashdot.org
  • wired.com
  • facebook.com

You are a user, and you visit each of these websites. Each of these websites also works with a third-party company named badboys.com, whose job is to serve ads.

On every page of these websites, there is a web bug that points to badboys.com's adserver and requests an ad.

How Do Third-Party Tracking Cookies work? (2/4)

Assuming gizmodo.com knows your age and gender, when visiting it, the following request is made:

http://badboys.com/adrequest?site=gizmodo.com&age=25&gender=m

  1. Browser requests content
  2. Server requests cookies (if any)
  3. Browser sends cookies
  4. Server does some black magic, and picks an ad
  5. Server returns new cookie data, and the ad
  6. Browser displays this ad

How Do Third-Party Tracking Cookies work? (3/4)

Even though you are visiting the site gizmodo.com, the cookie is under the domain badboys.com. When the server returns new cookie data, it records when you last visited gizmodo.com and how many times you’ve been there today.

Now let's say you spend the morning reading tech news on gizmodo.com, slashdot.org, and wired.com, all the while receiving ads from badboys.com. Each page you view gives them a little bit more information about you.
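
A toy simulation may help picture how the visits add up. It deliberately simplifies: the slides describe the data as living in the cookie itself, whereas this sketch keeps the counts on the ad server and stores only a random ID in the cookie. The AdServer and Browser classes and the visit counts are assumptions for illustration.

```python
import uuid
from collections import Counter, defaultdict


class AdServer:
    """Toy stand-in for badboys.com: one cookie ID per browser, many sites."""

    def __init__(self):
        self.visits = defaultdict(Counter)  # cookie ID -> site -> page views

    def ad_request(self, site, cookie_id=None):
        # No cookie yet: assign a fresh ID (sent back as a new cookie).
        if cookie_id is None:
            cookie_id = uuid.uuid4().hex
        self.visits[cookie_id][site] += 1
        return cookie_id


class Browser:
    """Keeps the third-party cookie and sends it with every ad request."""

    def __init__(self):
        self.badboys_cookie = None

    def view_page(self, site, ad_server):
        self.badboys_cookie = ad_server.ad_request(site, self.badboys_cookie)


ad_server = AdServer()
browser = Browser()
for site in ["gizmodo.com"] * 12 + ["slashdot.org"] * 3 + ["wired.com"]:
    browser.view_page(site, ad_server)

profile = ad_server.visits[browser.badboys_cookie]
print(dict(profile))  # {'gizmodo.com': 12, 'slashdot.org': 3, 'wired.com': 1}
```

Because the same cookie accompanies every ad request, three unrelated sites end up in a single profile.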

How Do Third-Party Tracking Cookies work? (4/4)

Now, you go to facebook.com. Serving an ad for such a website is complicated as it depends on how you use Facebook. Maybe you didn't fill in enough information for it to know your age, your gender or things you like.

But, let's walk through the steps of a Facebook ad call to badboys.com in more detail!

1. Browser Requests Content

http://badboys.com/adrequest?site=facebook.com

2. Server Requests Cookies (If Any)

3. Browser Sends Cookies

  • Cookie contains your age & gender (25 & male) (encrypted)
  • Cookie shows you visited gizmodo.com 12 times this morning
  • Cookie shows you visited slashdot.org 3 times this morning
  • Cookie shows you visited wired.com once this morning

4. Server Does Some Black Magic, And Picks An Ad

  • Server puts user data (25, male, 16 visits to tech sites) into a profile engine
  • Profile engine maps the data to categories, say male and tech
  • Server looks for ad campaigns targeted to male and tech
  • Server picks the highest-paying male and tech targeted ad campaign

That's the black magic!
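
A minimal sketch of this selection step follows, under invented rules: the threshold of 10 tech visits, the campaign list, and the bid values are all made up, and a real ad server is certainly more elaborate.

```python
def categorize(profile):
    """Map raw user data to targeting categories (rules are illustrative)."""
    categories = set()
    if profile.get("gender") == "m":
        categories.add("male")
    if profile.get("tech_visits", 0) >= 10:  # assumed threshold
        categories.add("tech")
    return categories


def pick_ad(profile, campaigns):
    """Return the highest-paying campaign whose targets all match."""
    categories = categorize(profile)
    # A campaign is eligible when its target set is a subset of the categories.
    eligible = [c for c in campaigns if c["targets"] <= categories]
    return max(eligible, key=lambda c: c["bid"], default=None)


# Hypothetical campaigns with the price each advertiser is willing to pay.
campaigns = [
    {"name": "perfume", "targets": {"female"}, "bid": 2.50},
    {"name": "razor", "targets": {"male"}, "bid": 1.20},
    {"name": "gadget", "targets": {"male", "tech"}, "bid": 1.80},
]
profile = {"age": 25, "gender": "m", "tech_visits": 16}
print(pick_ad(profile, campaigns)["name"])  # gadget
```

The perfume campaign pays more but does not match the profile, so the best-paying matching campaign wins.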

5. Server Returns New Cookie Data, And The Ad

6. Browser Displays This Ad

Conclusion

Instead of showing you a random ad, badboys.com is able to show you a highly relevant and targeted advertisement that you are far more likely to click on.

And...

Given how much tracking companies know about your browsing history, it is worth asking whether these companies also know who you are. The answer, unfortunately, appears to be yes, at least for those of you who use social networking sites.

Paths For Data Leakage From Social Networks To Third-Party Tracking Companies

The most obvious way that a third-party tracker (remember badboys.com?) might learn which account on a social networking site is yours is via the HTTP Referer header.

A typical URL on a social networking site includes a username or user ID number, and any third party will be able to see that.
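
To illustrate how little work this takes, the sketch below pulls a user ID out of a Referer header as a third-party server might receive it. The host socialnetwork.example, the /profile/ URL scheme, and the user name are hypothetical; real sites use their own URL layouts.

```python
import re
from urllib.parse import urlsplit

# Hypothetical profile URL pattern, assumed for this example only.
PROFILE_PATH = re.compile(r"^/profile/(?P<user>[A-Za-z0-9_.]+)$")


def user_from_referer(headers):
    """Return the user ID leaked by the Referer header, or None."""
    referer = headers.get("Referer")
    if not referer:
        return None
    match = PROFILE_PATH.match(urlsplit(referer).path)
    return match.group("user") if match else None


# Headers as a third-party tracker might receive them (values are made up).
headers = {
    "Host": "badboys.com",
    "Referer": "http://socialnetwork.example/profile/jdoe42",
}
print(user_from_referer(headers))  # jdoe42
```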

What Can You Do?

  • Pick a good cookie policy for your browser, like "only keep cookies until I close my browser", or "manual approval of all cookies"
  • Disable Flash cookies and all the other kinds of super cookies
  • Use browser extensions to control when third-party sites can include content in your pages or run code in your browser
  • Use uBlock (ad blocker)
  • Privacy Badger (Ghostery is controversial)
  • Tor? Tails?

Also, check out panopticlick.eff.org, and Lightbeam!

Is your browser safe against tracking? (1/3)

Browser fingerprinting is a method of tracking web browsers by the configuration and settings information they make visible to websites.

If your browser is unique, then it's possible that an online tracker can identify you even without setting tracking cookies.

Paper: How Unique Is Your Web Browser?
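
As a rough illustration, a fingerprint can be thought of as a hash over whatever attributes the browser exposes, and the identifying power of one attribute value can be expressed in bits as -log2 of the share of browsers that have it. The attribute names and values below are made up, and real fingerprinting scripts collect many more signals.

```python
import hashlib
import math


def fingerprint(attributes):
    """Hash browser attributes into a stable identifier."""
    # Sort keys so the same configuration always yields the same hash.
    canonical = "|".join(f"{key}={attributes[key]}" for key in sorted(attributes))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def surprisal_bits(share_of_browsers):
    """Bits of identifying information for a value seen in that share."""
    return -math.log2(share_of_browsers)


# Made-up attribute values of the kind a website can read.
browser = {
    "user_agent": "Mozilla/5.0 (X11; Linux x86_64) Firefox/46.0",
    "screen": "1280x1024x32",
    "timezone": "UTC+2",
    "fonts": "Arial,DejaVu Sans,Liberation Mono",
}
print(fingerprint(browser)[:16])
print(surprisal_bits(1 / 1024))  # 10.0
```

A value shared by one browser in 1024 carries 10 bits; a handful of such attributes is enough to single out one browser among millions, with no cookie involved.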

Is your browser safe against tracking? (2/3)

Canvas fingerprinting is a more sophisticated type of browser fingerprinting. The entropy comes from the operating system, browser, GPU, and graphics driver. In short:

  • Relies on a hidden HTML5 <canvas>
  • JavaScript renders text and drawing
  • The final bitmap is then converted into a unique token

Paper: The Web Never Forgets: Persistent Tracking Mechanisms in the Wild

Canvas Fingerprinting

Source: https://securehomes.esat.kuleuven.be/~gacar/persistent/index.html

Is your browser safe against tracking? (3/3)

History stealing with CSS :visited

Plugging the CSS History Leak / Extra Lecture: Privacy on the Web

jenairienacacher.fr

Are We Safe?

The NSA

A slide from an internal NSA presentation indicating that the agency uses at least one Google cookie as a way to identify targets for exploitation. Thank you Edward Snowden!

Google's PREF Cookie

Google assigns a unique PREF cookie anytime someone's browser makes a connection to any of the company's Web properties or services.

This can occur when consumers directly use Google services (Search, Maps, etc.), or when they visit websites that contain embedded widgets (Google Plus).

That cookie contains a code that allows Google to uniquely track users to personalize ads.

What The F**k?

The NSA and its British counterpart, GCHQ, are using cookies that advertising networks place on computers to identify people browsing the Internet.

The intelligence agencies have found particular use for a part of a Google-specific tracking mechanism known as the PREF cookie.

These cookies typically don't contain personal information, but they do contain numeric codes that enable websites to uniquely identify a person's browser.

Thank You.

Questions?
