type_erasure



type_erasure

2 58


type_erasure


On Github tzlaine / type_erasure

Type Erasure

Solving Classic OOP Problems with an Elegant Design Pattern

Zach Laine

Outline

  • The Importance of Values
  • Why Polymorphism?
  • Those Other Solutions
  • Type Erasure

A Quick Aside

All the code you will see in this presentation compiles and works as advertized.

Part 1

The Importance of Values

Dealing with values instead of references has a couple of very nice benefits

  • Clear ownership/lifetime semantics
  • Equational reasoning

Clear Semantics

Who owns foo?

foo_t * foo_factory ();

foo_t * foo = foo_factory();

Is this better?

std::shared_ptr<foo_t> foo_factory ();

std::shared_ptr<foo_t> foo = foo_factory();

That depends on whether you're partial to global state....

Equational Reasoning

foo_t * foo_factory ();
void foo_user (foo_t * f);

void some_function () {
    foo_t * foo = foo_factory();
    foo_user(foo);

    // What can I say about foo here?

    /* more code ... */
}

With value types, you only have to reason about the values in the code front of you.

With reference types, you must simultaneously reason about code that mutates the values elsewhere.

Part 2

Why Polymorphism?

Code reuse of course!

Q: How does the author of this function intend its users to use it?

bool my_predicate (some_t obj);

A: Non-polymorphically. You have to give me a some_t object, or get sliced. Not terribly reusable with other types.

Q: How about this one?

bool my_predicate (const base_t & obj);

A: Runtime-polymorphically. You can give me any base_t-derived object by reference. The function is now more reusable, but you must use inheritance. Yuck!

Q: Or this?

template <typename T>
bool my_predicate (T obj);

A: Compile-time-polymorphically. You can give me any object whose type can be used to instantiated the template. The function is now even more reusable, but you must use metaprogramming. Gross!

Q: Is this an example of polymorphism?

struct base { virtual ~base () {} };
struct derived : base { virtual int foo () { return 42; } };

void some_function () {
    base * b_pointer = new derived;
    requires_foo(static_cast<derived*>(b_pointer)); // <-- this here
}

A: No! Polymorphism means using one type as if it were another type. Here, we have to cast to make the type we have look like the type expected in the interface to foo().

Part 3

Those Other Solutions

  • Inheritance-Based Runtime polymorphism
  • Template-based Compile-time polymorphism

(Problems with) Inheritance as Runtime Polymorphism

The inheritance mechanism gives us polymorphism, but we must either:

  • Limit ourselves to a single interface found in the base class:
struct base
{ virtual int foo () const = 0; };

struct derived : base
{
    virtual int foo () const
    { return 42; }

    float bar () const
    { return 3.0f; }
};

void some_function () {
    base * b_pointer = new derived;
    uses_foo(b_pointer);    // Yay!
    // uses_bar(b_pointer); // Aww...
}

Note that the "base" could use multiple inheritance, as long as the symbols in all the multiply inherited bases do not collide.

OR

  • Use multiple inheritance to bolt on multiple interfaces:
struct base { virtual ~base() {} };
struct has_foo : virtual base { virtual int foo () { return 42; } };
struct has_bar : virtual base { virtual float bar () { return 3.0f; } };
struct derived : has_foo, has_bar {};

void some_function () {
    base * b_pointer = new derived;
    uses_foo(dynamic_cast<has_foo*>(b_pointer)); // Wait, but...
    uses_bar(dynamic_cast<has_bar*>(b_pointer)); // Aww...
}

Which leads to the "diamond of death",

which leads to virtual inheritance,

which leads to fear,

which leads to anger,

... you get it.

More importantly, we have actually given up polymorphism by doing this; we must carry around the information that b_pointer contains interfaces not found in base.

We cannot make classes from different class hierarchies conform to a common interface, because they have no common base to use when passing them.

struct int_foo
{
    virtual int foo () { return 42; }
    virtual void log () const;
};
struct float_foo
{
    virtual float foo () { return 3.0f; }
    virtual void log () const;
};

Note that int_foo and float_foo both have a member

void log() const.

So, we ditch code reuse in the case of logging:

void log_to_terminal (const int_foo & loggee);
void log_to_terminal (const float_foo & loggee);

Or we resort to ugly hacks:

struct log_base { virtual void log () const; };
struct int_foo : log_base {/*...*/}; 
struct float_foo : log_base {/*...*/}; 
void log_to_terminal (const log_base & loggee);

We cannot easily separate interface and implementation without using multiple inheritance.

Virtual functions can be tricky to get right, especially in large class heirarchies.

  • We've all seen lots of these problems in real code.
  • C++11's override and final help.
  • There is frequently a question of whether or not a given type's virtual function implementation should call its base class's version of the same function.

In addition to all the above limitations, we are limited in which interfaces we give to which types by our choice of base class(es).

Inheritance imposes very tight coupling; it is not possible to have unrelated types with the same interfaces used interchangably.

We must always take parameters by reference to use runtime polymorphism. There go our nice benefits from value semantics.

A Quick Case Study: Widgets and Layouts

Widgets are UI elements (buttons, text boxes, etc.).

Layouts place widgets in the UI.

A layout can contain widgets.

A layout can contain sublayouts.

Using inheritance, you are all but locked in to giving layouts and widgets a common base,

... even though that makes no sense.

(Problems with) Templates as Compile-time Polymorphism

The Classic Problems

  • Metaprogramming requires a large body of knowledge about a large number of obscure language rules, and even more obscure TMP-specific tricks.

  • Metaprogramming heavy code is hard to maintain. This is true for experts, but is moreso in a team of varying skill levels.

  • Metaprogramming might be simply impossible to use where you work (even if you wanted to).

  • Compile times and object code size can get away from you if you're not careful.

Does not play well with runtime variation.

Compile-time is easy:

template <typename TrueType, typename FalseType, bool Selection>
typename std::conditional<Selection, TrueType, FalseType>::type
factory_function () {
    return
        typename std::conditional<Selection, TrueType, FalseType>::type();
}

int an_int = factory_function<int, float, true>();
float a_float = factory_function<int, float, false>();

Runtime is impossible (without type erasure):

template <typename TrueType, typename FalseType>
auto factory_function (bool selection) -> /* ??? */
{ return /* ??? */; }

Because of the lack of easy runtime interoperability, once you decide to use TMP, you're almost always stuck doing more TMP.

There must be a better way!

Part 4

Type Erasure

Based on everything we've seen so far, we want an interface that works like this:

// No coupling!
// No virtual functions, kind of!
struct foo { int value () const; }; 
struct bar { int value () const; };

// No templates!
int value_of (magic_type obj)
{ return obj.value(); }

void some_function () {
    // Value semantics!
    if (value_of(foo()) == value_of(bar())) {
        /*...*/
    }
}

Making magic happen

struct anything
{
    anything () = default;
    anything (const anything & rhs);
    anything & operator= (const anything & rhs);
    template <typename T> anything (T t);
    template <typename T> anything & operator= (T t);

    struct handle_base
    {
        virtual ~handle_base () {}
        virtual handle_base * clone () const = 0;
    };

    template <typename T>
    struct handle : handle_base
    {
        handle (T value);
        virtual handle_base * clone () const;
        T value_;
    };

    std::unique_ptr<handle_base> handle_;
};

anything definitions

template <typename T>
anything::anything (T t) :
    handle_ (new handle<typename std::remove_reference<T>::type>(
        std::forward<T>(t)
    ))
{}

anything::anything (const anything & rhs) :
    handle_ (rhs.handle_->clone())
{}

template <typename T>
anything & anything::operator= (T t)
{
    anything temp(std::forward<T>(t));
    std::swap(temp, *this);
    return *this;
}

anything & anything::operator= (const anything & rhs)
{
    anything temp(rhs);
    std::swap(temp, *this);
    return *this;
}

anything::handle definitions

template <typename T> 
anything::handle<T>::handle (T value) :
    value_ (std::move(value))
{}

template <typename T> 
anything::handle_base * anything::handle<T>::clone () const
{ return new handle(value_); }

anything in action

int i = 1; 
int * i_ptr = &i;

anything a;
a = i;
a = i_ptr;

a = 2.0;
a = std::string("3");

struct foo {};
a = foo();

Dymamic typing ("duck typing"), similar to that in Python. Consider dumping scripting languages for this.

That's not a joke.

Q: Can anything really hold anything?

A: No, but it can hold anything with a copy constructor. It should really be called copyable.

Ok, so how do we get back to this?

// No coupling!
// No virtual functions, kind of!
struct foo { int value () const; }; 
struct bar { int value () const; };

// No templates!
int value_of (magic_type obj)
{ return obj.value(); }

void some_function () {
    // Value semantics!
    if (value_of(foo()) == value_of(bar())) {
        /*...*/
    }
}

Easy -- just forward the calls to value():

    // anything gets:
    int value () const
    { return handle_->value(); }
    // anything::handle_base gets:
    virtual int value () const = 0;
    // anything::handle<T> gets:
    virtual int value () const
    { return value_.value(); }

We can add any arbitrary API in the same fashion. It's easy but repetitive to do so. More on that in a bit.

Say we have two disjoint APIs we wish to support.

One erased type for all widgets:

struct widget
{
    // some boilerplate ...
    void render () const;
    // more boilerplate ...
};

And one for all objects that can be used in our layout system:

struct layoutable
{
    // boilerplate ...
    layout_geometry geometry () const;
    // boilerplate ...
};

Here it is in use:

struct button
{
    void render () const;
    layout_geometry geometry () const;
    void set_label ();
    void set_image ();
    // etc.
};

void do_layout (layoutable l) {
    layout_geometry geometry = l.geometry();
    /* use geometry... */
}
void render_widget (widget w)
{ w.render(); }

void some_function () {
    button b;
    do_layout(b);
    render_widget(b);
}

Q: What about performance?

A: It's complicated.

Pointer-based Inheritance vs. Erased Types

Function call overhead is exactly the same.

How about heap allocations?

Operation Inheritance Simple Type Erasure Construct Yes Yes Copy No Yes Assign No Yes Get Alternate Interface No* Yes

* dynamic_cast<> is not free, and is not consistent with polymorphism.

This chart also applies to copies of the underlying object.

Optimizing the Type Erasure Technique

Step 1: Accept references

Instead of always copying the given value, accept std::reference_wrappers.

button b;
widget ref_widget(std::ref(b));   // Underlying object
widget cref_widget(std::cref(b)); // not copied.

Replace the handle constructor with these:

        template <typename U = T>
        handle (T value,
                typename std::enable_if<
                    std::is_reference<U>::value
                >::type * = 0) :
            value_ (value)
        {}

        template <typename U = T>
        handle (T value,
                typename std::enable_if<
                    !std::is_reference<U>::value,
                    int
                >::type * = 0) noexcept :
            value_ (std::move(value))
        {}

Add this specialization:

    template <typename T>
    struct handle<std::reference_wrapper<T>> :
        handle<T &>
    {
        handle (std::reference_wrapper<T> ref) :
            handle<T &> (ref.get())
        {}
    };

Pointer-based Inheritance vs. Erased Types

Allocations did not change.

How about copies of the underlying object?

Operation Inh. Simple TE TE + ref Construct Yes Yes No Copy No Yes No Assign No Yes No Alt. Interface No* Yes No

All versions discussed hereafter include support for std::reference_wrapper.

Step 2: Use a Copy-On-Write Wrapper

Instead of always copying our type erased objects, only copy them when they are mutated.

copy_on_write<widget> w_1(widget{button()});
copy_on_write<widget> w_2 = w_1;    // No copy.
widget & mutable_w_2 = w_2.write(); // Copy happens here.

A nice benefit of using copy-on-write is thread safety.

Pointer-based Inheritance vs. Erased Types

Allocations:

Operation Inh. Simple TE TE + COW Construct 1 1 2 Copy 0 1 0 Assign 0 1 0 Alt. Interface 0* 1 2

Step 3: Integrate Copy-On-Write into Erased Types

Remove one set of allocations on construction by applying copy-on-write directly to the handle_ member.

widget w_1 = button(); // Only 1 allocation here.
widget w_2 = w_1;      // No copy.

Copies only happen when a non-const member function is called.

Pointer-based Inheritance vs. Erased Types

Allocations:

Operation Inh. Simple TE TE w/COW Construct 1 1 1 Copy 0 1 0 Assign 0 1 0 Alt. Interface 0* 1 1

Step 4: Apply the Small Buffer Optimization

std::array<int, 1024> big_array;
anything small = 1;                 // No allocation to store an int.
anything ref = std::ref(big_array); // No allocation to store a std::ref().
anything large = big_array;         // Allocation required here.

Pointer-based Inheritance vs. Erased Types

Allocations for small†/large objects:

Operation Inh. Simple TE TE w/SBO Construct 1/1 1/1 0/1 Copy 0/0 1/1 0/1 Assign 0/0 1/1 0/1 Alt. Interface 0/0* 1/1 0/1

†std::referece_wrapper is always small.

Step 5: Apply SBO and Integrated COW

std::array<int, 1024> big_array;
anything small = 1;                 // No allocation to store an int.
anything ref = std::ref(big_array); // No allocation to store a std::ref().
anything large = big_array;         // Allocation required here.
anything copied = large;            // No allocations for copies.

Pointer-based Inheritance vs. Erased Types

Allocations for small†/large objects:

Operation Inh. Simple TE TE w/SBO+COW Construct 1/1 1/1 0/1 Copy 0/0 1/1 0/0 Assign 0/0 1/1 0/0 Alt. Interface 0/0* 1/1 0/1

† Again, std::referece_wrapper is always small.

Copies only happen when a non-const member function is called.

The final implementation is a bit long to show here.

A Quick Review

What have we gained versus using inheritance?

What have we lost?

Gains

  • Value semantics

  • Never writing new or delete

  • Ability to bind types to interfaces never seen at the time of the types' writing, including multiple interfaces

  • Thread safety via copy-on-write

  • Elision of heap allocations for small types and references

Losses

  • Simplicity of implementation

  • Thread safety come at the cost of some atomic operations and copies when values are mutated

Another Option: Boost.TypeErasure

Boost.TypeErasure is the most robust library-based type erasure solution to date.

  • Uses metapgrogramming-based explicit v-table construction.
  • Supports casts directly to the held type, à la Boost.Any.
  • Supports free-function requirements.
  • Supports operator requirements.
  • Supports associated type requirements.
  • Supports concept maps.

An example from the online docs:

any<
    mpl::vector<
        copy_constructible<>,
        typeid_<>,
        incrementable<>,
        ostreamable<>
    >
> x(10);
++x;
std::cout << x << std::endl; // prints 11

An example of an erased type with a member function:

BOOST_TYPE_ERASURE_MEMBER((has_push_back), push_back, 1)

void append_many(any<has_push_back<void(int)>, _self&> container) {
    for(int i = 0; i < 10; ++i)
        container.push_back(i);
}

Caveats

  • The per-object v-table can create large objects.
  • All the metaprogramming can contribute to long compile times.
  • The error messages will make your eyes bleed.
  • The small buffer optimization is not supported.
  • Writing const member function requirements is not straightforward.

Inheritance vs. Hand-Rolled Type Erasure vs. Boost.TypeErasure

Allocations for small/large/ref held values:

Operation Inh. Optimized TE Boost TE Construct 1/1/- 0/1/0 1/1/0 Copy 0/0/- 0/0/0 1/1/0 Assign 0/0/- 0/0/0 1/1/0 Alt. Interface 0/0/-* 0/1/0 1/1/0

All values are the same for copies of the underlying value, except for the optimized hand-rolled version's binding to an alternate interface.

Space requirements, including held-object storage:

Technique Held Object Size Space Requirement Inheritance All P + O Hand-rolled TE Small P + B Hand-rolled TE Large P + B + O Boost TE All FM + O

P is the size of a pointer to a heap-allocated object, F is the size of a pointer-to-function, O is the size of the stored object, M is the number of members in an erased type's interface, and B is small buffer size.

One More Thing ...

The biggest advantage of using Boost.TypeErasure over a hand-rolled implementation is that your hands get a rest.

Q: How do we avoid large amounts of cut-and-pasted code?

A: We generate it, using emtypen.

emtypen

emtypen is an MIT-licensed tool that takes a real C++ header with one or more archetypal interfaces and produces the hand-rolled version of the type erasure techniques presented in this talk. It uses libclang. For example:

$ emtypen archetypes.hpp > generated.hpp

generated.hpp now contains the generated erased types, one for each archetypal struct/class in archetypes.hpp.

All the erased types in the code in this talk were generated using this tool.

Your input can be a real-boy C++ header, that you can syntax check with a real compiler.

As for the output,

  • Include guards are preserved.
  • Included headers are preserved.
  • Namespaces are preserved.
  • All struct/class definitions are converted to erased types.
  • Templated archetypes are supported.
  • Everything else is lost (including macros and comments, because libclang).

Archetype input:

#ifndef LOGGABLE_INTERFACE_INCLUDED__
#define LOGGABLE_INTERFACE_INCLUDED__

#include <iostream>


struct loggable
{
    std::ostream & log (std::ostream & os) const;
};

#endif

Erased type output:

#ifndef LOGGABLE_INTERFACE_INCLUDED__
#define LOGGABLE_INTERFACE_INCLUDED__

# include <iostream>

#include <algorithm>
#include <cassert>
#include <functional>
#include <memory>
#include <type_traits>
#include <utility>

#if defined(_MSC_VER) && _MSC_VER == 1800
#define noexcept
#endif

struct loggable {
public:
    // Lots of boilerplate code here.
    std :: ostream & log ( std :: ostream & os) const
    { assert(handle_); return handle_->log(os ); }

private:
    // Even more boilerplate code here.
};

#endif

emtypen generates your erased types based on a "form" file.

The form file contains the boilerplate code for generating one erased type.

emtypen takes optional command-line parameters that can be used to change the form file used, or the block of headers it includes near the top of the file.

You can completely change the boilerplate if you like. Forms are provided for all the hand-rolled versions of type erasure presented in this talk.

Go get it:

http://github.com/tzlaine/type_erasure

 

Questions?

0