
Media Access

by Alexander “Surma” Surma, October 2014

surma-slides.github.io/mediaaccess

Included you will find…

A bundle of standards (mostly drafts) about different kinds of access to different kinds of media.

Media means video, audio, web pages, anything. Two completed standards (plus an ontology), one working candidate, eight drafts. Use cases et al. not included. Different sizes, extensions and a lot of overlap.

Buzzwords!

Who has not heard of getUserMedia() or the Web Audio API?

Arguably the most hyped and well known standards out of the media access collection.

Buzzwords!

February 3, 2013

WebRTC = Web Real-Time Communication. A lot of synergy and mutual motivation.

Buzzwords!

October 9, 2014

Jan Moschke. Extensive user of the web APIs. Collaborative music editor with WebRTC. DevFest 2013 & 2014.

getUserMedia()

Demo

No Flash! We have access to the stream, frames, audio, etc. Details later. Lots of potential.

Web Audio API

Demo

Access to decoders, generators, input and output devices, etc.

Media Access?

  • Completed: access media files (fragments)
  • Completed: access media metadata
  • access A/V devices
  • generate and output audio
  • process and record audio streams
  • accessibility

The first two are “completed”; the rest are drafts. Access to A/V devices has one tiny candidate standard.

Media Fragments URI

Access fragments of media

“Completed work”, tightly coupled with a draft on “Fragment Resolution”. What are fragments? Fragments can lie in the temporal and spatial dimensions. Also tracks.

Media Fragments URI

Goal

Media Fragments URI

Dimensions

  • Spatial
  • Temporal
  • Tracks
  • ID
ID is a convenience layer for the temporal dimension. Let’s start with the raw syntax; where to apply it will be covered later.

Media Fragments URI

Temporal Fragmentation

Syntax:

t=<start>,<end>
              

Example:

t=20,1:24:30
t=npt:50
t=,22:10
              

npt = Normal Play Time; could also be SMPTE timecodes (Society of Motion Picture and Television Engineers).
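The NPT forms above can be resolved to plain seconds mechanically. A minimal sketch; `parseTemporal` and `parseNPT` are hypothetical helper names, not part of the standard:

```javascript
// Illustrative only: resolve NPT values like "20", "1:24:30" or "22:10"
// from a temporal fragment into seconds.
function parseNPT(value) {
  // Accepted forms: ss(.fff), mm:ss, hh:mm:ss
  const parts = value.split(":").map(Number);
  return parts.reduce((total, part) => total * 60 + part, 0);
}

function parseTemporal(fragment) {
  // Strip the dimension name and the optional "npt:" prefix
  const range = fragment.replace(/^t=/, "").replace(/^npt:/, "");
  const [start, end] = range.split(",");
  return {
    start: start ? parseNPT(start) : 0,   // missing start: beginning of media
    end: end ? parseNPT(end) : Infinity,  // missing end: play to the end
  };
}

console.log(parseTemporal("t=20,1:24:30")); // { start: 20, end: 5070 }
```

`t=npt:50` resolves to `{ start: 50, end: Infinity }`, and `t=,22:10` to `{ start: 0, end: 1330 }`.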

Media Fragments URI

Spatial Fragmentation

Syntax:

xywh=<x>,<y>,<w>,<h>
              

Example:

xywh=160,120,320,240
xywh=pixel:160,120,320,240
xywh=percent:25,25,50,50
              

Both temporal and spatial fragmentation have exact definitions of how to validate and compute the fragmented media. Where to apply the syntax? Let's talk about URIs.
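The percent form only becomes a concrete rectangle once the media dimensions are known. A sketch of that resolution step; `resolveXYWH` is a hypothetical helper, not part of the standard:

```javascript
// Illustrative only: resolve an xywh fragment against known media dimensions.
function resolveXYWH(fragment, mediaWidth, mediaHeight) {
  // The unit prefix is optional and defaults to "pixel"
  const [, unit = "pixel", values] =
    fragment.match(/^xywh=(?:(pixel|percent):)?(.+)$/);
  let [x, y, w, h] = values.split(",").map(Number);
  if (unit === "percent") {
    x = (x / 100) * mediaWidth;
    y = (y / 100) * mediaHeight;
    w = (w / 100) * mediaWidth;
    h = (h / 100) * mediaHeight;
  }
  return { x, y, w, h };
}

console.log(resolveXYWH("xywh=percent:25,25,50,50", 640, 480));
// { x: 160, y: 120, w: 320, h: 240 } -- same rectangle as the pixel example
```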

URI

Intermezzo

  foo://example.com:8042/over/there?name=ferret#nose
  \_/   \______________/\_________/ \_________/ \__/
   |           |            |            |        |
scheme     authority       path        query   fragment
              

URIs are an abstraction of URLs. URLs were introduced by Tim Berners-Lee. A URL specifies a location and is supposed to be persistent over time; changing content is handled via redirection.

The fragment is technically not part of the resource locator: the user agent handles it, and it is not transferred to the server.
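The query/fragment distinction above can be seen directly with the standard URL API (available in browsers and Node):

```javascript
// The fragment (hash) stays on the client: the user agent handles it,
// and it is never sent to the server.
const clientSide = new URL("http://example.com/video.ogv#t=20,30");
console.log(clientSide.hash); // "#t=20,30"

// The query, in contrast, is part of the request sent to the server.
const serverSide = new URL("http://example.com/video.ogv?t=20,30");
console.log(serverSide.search);                // "?t=20,30"
console.log(serverSide.searchParams.get("t")); // "20,30"
```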

Media Fragments URI

Usage

Individual resource (server-side implementation)

http://yt.com/FL3MqSKLNHY?t=20,30&xywh=160,120,320,240
              
Modifier on a resource (client-side implementation)
http://yt.com/FL3MqSKLNHY#t=20,30&xywh=160,120,320,240
              

Semantics! The fragment form technically has no relation to a server-side implementation. Re-encoding. The standard defines how to validate and compute fragment descriptions. The video tag supports the temporal dimension only. How is the server side implemented? How is the client side implemented?

Media Fragments URI

Resolution

The original picture from the standard is ugly (no SVG), and I was fed up, so: summary! When cached: easy. When not cached: range request. Can the UA solve the fragment-to-byte-range problem itself? If not: special header or query parameter.

Media Fragments URI

Augmented HTTP Range Header

GET /video.ogv HTTP/1.1
Host: www.example.com
Accept: video/*
Range: t:npt=10-20
            
HTTP/1.1 206 Partial Content
Accept-Ranges: bytes, t, id
Content-Length: 3743
Content-Type: video/ogg
Content-Range: bytes 19147-22880/35614993
Content-Range-Mapping: { t:npt 9.85-21.16/0.0-653.79 }
  = { bytes 19147-22880/35614993 }
Etag: "b7a60-21f7111-46f3219476580"

{binary data}
            
What if the client doesn’t have the headers (resolution, color space, etc.)?
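The Content-Range-Mapping header in the response above relates a time range to the byte range that was actually served. A sketch of parsing it; `parseRangeMapping` is a hypothetical helper name, and the regex only covers the single-range npt case shown here:

```javascript
// Illustrative only: map "{ t:npt a-b/c-d } = { bytes x-y/z }" into numbers.
function parseRangeMapping(header) {
  const m = header.match(
    /\{\s*t:npt\s+([\d.]+)-([\d.]+)\/([\d.]+)-([\d.]+)\s*\}\s*=\s*\{\s*bytes\s+(\d+)-(\d+)\/(\d+)\s*\}/
  );
  if (!m) throw new Error("unrecognized mapping");
  const [, tStart, tEnd, , duration, bStart, bEnd, total] = m;
  return {
    time:  { start: +tStart, end: +tEnd, duration: +duration },
    bytes: { start: +bStart, end: +bEnd, total: +total },
  };
}

const mapping = parseRangeMapping(
  "{ t:npt 9.85-21.16/0.0-653.79 } = { bytes 19147-22880/35614993 }"
);
console.log(mapping.time.start, mapping.bytes.start); // 9.85 19147
```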

Media Fragments URI

Augmented HTTP Range Header

GET /video.ogv HTTP/1.1
Host: www.example.com
Accept: video/*
Range: t:npt=10-20;include-setup
            
HTTP/1.1 206 Partial Content
Accept-Ranges: bytes, t
Content-Length: 804020
Content-Type: multipart/byteranges;boundary=End
Content-Range-Mapping:
  { t:npt 10.0-20.0/0-38.3;include-setup } =
  { bytes 0-1650,1264525-2066894/4055466 }
  --End
Content-Type: video/webm
Content-Range: bytes 0-1650/4055466
{binary data}
--End
Content-Type: video/webm
Content-Range: bytes 1264525-2066894/4055466
{binary data}
            

Media Fragments URI

Support

caniuse.com/#search=fragments: non-existent. caniuse.com is by no means authoritative, but still.

Metadata API

Metadata of media

Examples of metadata which currently can’t be read by web apps. Going to keep this short; it is boring and unusable. Think of PDF, MP4, ...

Metadata API

Sources of metadata

  • User-Agent
  • Media-Resource Web Service
Media is identified by URI. The user agent might inspect headers; a web service can provide more data. Think IMDb.

Metadata API

Example

mediaResource = new MediaResource();
aSyncObject = mediaResource.createMediaResource(
   "http://www.w3.org/.../MAWG-Stockholm-20090626.JPG",
   metadataSources, 1);
            
Some JPG. What are those metadata sources?

Metadata API

Example

metadataSources = new MetadataSource[2];
metadataSources[0] = new MetadataSource(
    "http://www.w3.org/.../DC_example1.xml",
    "dc"
  );
metadataSources[1] = new MetadataSource(
    "http://www.w3.org/.../MAWG-Stockholm-20090626.JPG",
    "exif"
  );
            
Unifying EXIF and Dublin Core (“dc”) metadata.
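The point of the API is that the consumer sees one unified set of properties regardless of the source format. A plain-JS sketch of that normalization step, not the actual draft API; the field names and the `unify` helper are made up for illustration:

```javascript
// Illustrative only: normalize Dublin Core and EXIF fields into one shape.
function unify(source) {
  if (source.format === "dc") {
    // Dublin Core element names
    return {
      title: source.data["dc:title"],
      creator: source.data["dc:creator"],
    };
  }
  if (source.format === "exif") {
    // EXIF tag names
    return {
      title: source.data.ImageDescription,
      creator: source.data.Artist,
    };
  }
  throw new Error("unknown metadata format");
}

const fromDC = unify({
  format: "dc",
  data: { "dc:title": "Stockholm", "dc:creator": "MAWG" },
});
console.log(fromDC.title); // "Stockholm"
```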

Metadata API

Support

This is Chrome Canary! Notice something? Both "completed works" are not supported at all.

HTML Media Capture

Capture media in forms with the <input> tag.

A short standard. A good standard. An unsupported standard. “Candidate Recommendation”.

HTML Media Capture

<input type="file" accept="image/*" capture>
<input type="file" accept="video/*" capture>
<input type="file" accept="audio/*" capture>
                

That’s it! In forms it will be submitted as multipart form data, but the file can also be accessed via JS.

HTML Media Capture

Support

Nope. No support.

Media Capture and Streams

Access to multimedia streams from local devices

Danger Zone! Draft! Typically, but not limited to, audio, video or both. Freshly engineered infrastructure for media streams.

Media Capture and Streams

This is the design. The MediaStreamTracks of a stream are synchronized. To specify the implementation guidelines, the W3C uses WebIDL!

WEBIDL

Intermezzo

Web Interface Definition Language. Self-explanatory. The irony of an interface definition language for an untyped language.

Media Capture and Streams

interface MediaStream : EventTarget {
    sequence<MediaStreamTrack> getAudioTracks ();
    sequence<MediaStreamTrack> getVideoTracks ();
    MediaStreamTrack?          getTrackById (DOMString trackId);
    void                       addTrack (MediaStreamTrack track);
    void                       removeTrack (MediaStreamTrack track);
    MediaStream                clone ();

    readonly    attribute DOMString    id;
    readonly    attribute boolean      ended;
                attribute EventHandler onended;
                attribute EventHandler onaddtrack;
                attribute EventHandler onremovetrack;
};
            
clone() -> independent consumption, allows pipelines. Cut detector.

Media Capture and Streams

[Constructor,
  Constructor (MediaStream stream),
  Constructor (MediaStreamTrackSequence tracks)]
            
Repackaging. Acts as a source. Consider WebRTC.

Media Capture and Streams

interface NavigatorUserMedia {
    void getUserMedia (
      MediaStreamConstraints? constraints,
      NavigatorUserMediaSuccessCallback successCallback,
      NavigatorUserMediaErrorCallback errorCallback
    );
};
            
Under discussion!

Media Capture and Streams

{
  mandatory: {
    width: { min: 640 },
    height: { min: 480 }
  },
  optional: [
    { width: 650 },
    { width: { min: 650 }},
    { frameRate: 60 },
    { width: { max: 800 }},
    { facingMode: "user" }
  ]
}
            
Under discussion!

Media Capture and Streams

navigator.getUserMedia = (
  navigator.getUserMedia ||
  navigator.webkitGetUserMedia ||
  navigator.mozGetUserMedia ||
  navigator.msGetUserMedia
);
              

The usual cross-browser trickery

Media Capture and Streams

navigator.getUserMedia({
  video: true,
  audio: false
},
function(localMediaStream) {
  var video = document.querySelector('video');
  video.src = window.URL.createObjectURL(localMediaStream);
  video.play();
  setTimeout(function() {
    localMediaStream.stop();
    video.src = "";
  }, 5000);
},
function(err) {
  console.log("The following error occured: " + err);
});
              

This is the demo code.

Media Capture and Streams

Support

caniuse.com/#feat=stream

Media Recording

[Constructor (MediaStream stream)]
interface MediaRecorder : EventTarget {
    readonly attribute MediaStream        stream;
    readonly attribute RecordingStateEnum state;
    void              record (optional long? timeslice);
    void              stop ();
    void              pause ();
    void              resume ();

    attribute EventHandler       ondataavailable;
    // ...
    // It’s a huge interface
    // ...
};
            
Encodes into a container/muxes (WebM, maybe others). Returns a Blob (as in the HTML5 Blob).

Media Recording

Support

Nope

Image Capture

[Constructor(VideoStreamTrack track)]
interface ImageCapture : EventTarget {
    readonly    attribute PhotoSettingsOptions photoSettingsOptions;
    readonly    attribute VideoStreamTrack     videoStreamTrack;
                attribute EventHandler         onphoto;
                attribute EventHandler         onerror;
                attribute EventHandler         onphotosettingschange;
                attribute EventHandler         onframegrab;
    void setOptions (PhotoSettings? photoSettings);
    void takePhoto ();
    void getFrame ();
};
            
getFrame() -> returns an RGBA canvas thingy. takePhoto() -> applies settings (restart?), filters, and returns a Blob.

Image Capture

Support

Depth Stream

Arcane technology. Depth is just pixel data as well, but the technicalities of the standard are interesting.

Depth Stream

partial dictionary MediaStreamConstraints {
    (boolean or MediaTrackConstraints) depth = false;
};

partial interface MediaStream {
    sequence<MediaStreamTrack> getDepthTracks ();
};
            

Depth Stream

Support

Nope. Attention: most depth cameras (like the Kinect) identify as normal cameras with a b/w image stream.

Web Audio API

Goals

  • Processing
  • Synthesizing
  • Audio Routing Graph Paradigm
  • Sample-accurate sound playback with low latency
  • LFOs, Envelopes, FFT, biquad filter, ...
  • 3D support
Think: Max/MSP, Pure Data. A pretty long standard, but only because of the many node types.

Web Audio API

Bird’s view

“the right thing just happens”

Web Audio API

Audio Context

[Constructor]
interface AudioContext : EventTarget {
    readonly attribute AudioDestinationNode destination;
    readonly attribute AudioListener listener;

    ScriptProcessorNode createScriptProcessor(...);

    AnalyserNode createAnalyser();
    GainNode createGain();
    DelayNode createDelay(optional double maxDelayTime = 1.0);
    BiquadFilterNode createBiquadFilter();
    // ...
};
            
Creation of new nodes goes via the context because of initialization and the low-level C++ engine. OfflineAudioContext exists for faster-than-real-time rendering.

Web Audio API

Audio Node

interface AudioNode : EventTarget {
    void connect(
      AudioNode destination,
      optional unsigned long output = 0,
      optional unsigned long input = 0
    );
    void disconnect(optional unsigned long output = 0);

    readonly attribute AudioContext context;
    readonly attribute unsigned long numberOfInputs;
    readonly attribute unsigned long numberOfOutputs;

    attribute unsigned long channelCount;
    attribute ChannelCountMode channelCountMode;
    attribute ChannelInterpretation channelInterpretation;
};
            

Web Audio API

Audio Listener

interface AudioListener {

    attribute double dopplerFactor;
    attribute double speedOfSound;

    // Uses a 3D cartesian coordinate system
    void setPosition(double x, double y, double z);
    void setOrientation(double x, double y, double z, double xUp, double yUp, double zUp);
    void setVelocity(double x, double y, double z);

};
            

Web Audio API

Oscillator Node

interface OscillatorNode : AudioNode {

    attribute OscillatorType type;

    readonly attribute AudioParam frequency; // in Hertz
    readonly attribute AudioParam detune; // in Cents

    void start(double when);
    void stop(double when);
    void setPeriodicWave(PeriodicWave periodicWave);
};
            
Supported types: "sine", "square", "sawtooth", "triangle", "custom". A cent is a frequency ratio of 2^(1/1200).
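The cent-to-frequency relationship behind the detune parameter is just exponentiation. A quick sketch; `centsToRatio` and `detunedFrequency` are made-up helper names, not part of the API:

```javascript
// A cent is 1/100 of a semitone, i.e. a frequency ratio of 2^(1/1200).
const centsToRatio = (cents) => 2 ** (cents / 1200);

// Detuning a base frequency by a number of cents scales it by that ratio.
const detunedFrequency = (hz, cents) => hz * centsToRatio(cents);

console.log(centsToRatio(1200));          // 2 (1200 cents = one octave)
console.log(detunedFrequency(440, 100));  // ~466.16 Hz (A4 up one semitone)
```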

Web Audio API

Support

Links

  • http://www.w3.org/TR/2012/REC-media-frags-20120925/
  • http://www.w3.org/TR/2011/WD-media-frags-recipes-20111201/
  • http://www.w3.org/TR/WebIDL/
  • http://www.w3.org/TR/2013/WD-mediacapture-streams-20130903/
  • http://www.w3.org/TR/2013/WD-mediastream-recording-20130205/
  • http://www.w3.org/TR/2014/WD-mediacapture-depth-20141007/
  • http://www.w3.org/TR/2013/WD-image-capture-20130709/
  • http://www.w3.org/TR/2013/WD-webaudio-20131010/

Media Access

by Alexander “Surma” Surma, October 2014

surma-slides.github.io/mediaaccess