hbase-datatypes-wip



Slides from the Hadoop Summit 2013 HBase Birds of a Feather Meetup

On GitHub: ndimiduk/hbase-datatypes-wip

HBase Data Types (WIP)

Nick Dimiduk, Member of Technical Staff, HBase
HBase Birds of a Feather, Hadoop Summit 2013

motivation

  • byte[] is necessary but not sufficient for building applications
  • every application must (re-)solve its own subset of the "serialization problem"
  • ecosystem tools must agree in order to interoperate
  • application code full of pre/post client API-call serialization litter

design

  • think holistically -- not just Java
  • encoding format that preserves non-encoded order
  • user-extensible API for defining types
  • provide basic data types out of the box
  • extend client API with type support
  • extend MapReduce API with type support
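
To make "preserves non-encoded order" concrete: HBase compares keys as unsigned bytes, lexicographically, so a naive two's-complement int sorts negatives after positives. The sketch below shows one common trick (flip the sign bit, write big-endian) for a single 32-bit int. `OrderedIntCodec` and its methods are hypothetical names for illustration, not the format proposed in HBASE-8201.

```java
// Hypothetical helper, not part of HBase: one way to make a signed int
// sort correctly under unsigned, lexicographic byte comparison.
final class OrderedIntCodec {
    private OrderedIntCodec() {}

    /** Encode a signed int as 4 big-endian bytes with the sign bit flipped. */
    static byte[] encode(int value) {
        int flipped = value ^ Integer.MIN_VALUE; // negatives now sort before positives
        return new byte[] {
            (byte) (flipped >>> 24),
            (byte) (flipped >>> 16),
            (byte) (flipped >>> 8),
            (byte) flipped
        };
    }

    /** Reverse of encode: reassemble the big-endian int, then restore the sign bit. */
    static int decode(byte[] bytes) {
        int flipped = ((bytes[0] & 0xff) << 24)
                    | ((bytes[1] & 0xff) << 16)
                    | ((bytes[2] & 0xff) << 8)
                    | (bytes[3] & 0xff);
        return flipped ^ Integer.MIN_VALUE;
    }

    /** Unsigned lexicographic comparison, the order in which HBase sorts keys. */
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = Integer.compare(a[i] & 0xff, b[i] & 0xff);
            if (cmp != 0) {
                return cmp;
            }
        }
        return Integer.compare(a.length, b.length);
    }
}
```

With this encoding, `compareUnsigned(encode(-1), encode(1))` is negative, matching the natural order of the ints. Variable-length values, floats, and descending order need more care and are outside this sketch.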

tricky bits

  • encoding format => data on disk => painful to change
  • respect encoding performance overhead in tight loops
  • user-extensible, user-usable API
    • realistic API that real apps can build against
    • don't impose on POJOs (no forced interfaces, &c)
    • avoid magic (no ASM, no AOP, avoid ORM scorn)
  • community agreement on what data types to ship out of the box
  • more stuff I haven't {thought of,encountered} yet
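
One possible reading of "don't impose on POJOs" is to keep the codec outside the application class, so the class implements nothing and carries no annotations. The sketch below assumes a made-up `TypeCodec<T>` interface and a made-up `Point` class. It is not the API under development in HBASE-8693.

```java
import java.nio.ByteBuffer;

// Hypothetical interface for illustration only; not the HBASE-8693 API.
interface TypeCodec<T> {
    int encodedLength(T value);
    void encode(ByteBuffer dst, T value);
    T decode(ByteBuffer src);
}

// A plain application object: no HBase interfaces, no annotations.
final class Point {
    final int x;
    final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }
}

// The codec lives beside the POJO rather than inside it.
final class PointCodec implements TypeCodec<Point> {
    @Override
    public int encodedLength(Point p) {
        return 2 * Integer.BYTES; // two fixed-width ints
    }

    @Override
    public void encode(ByteBuffer dst, Point p) {
        dst.putInt(p.x).putInt(p.y); // plain big-endian ints
    }

    @Override
    public Point decode(ByteBuffer src) {
        // Java evaluates arguments left to right, so x is read before y.
        return new Point(src.getInt(), src.getInt());
    }
}
```

An application would allocate a `ByteBuffer` of `encodedLength(p)` bytes, call `encode`, and hand the resulting bytes to the client API. Note that plain `putInt` does not sort negative values before positive ones under unsigned byte comparison, so a real type would pair this API shape with an order-preserving encoding.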

inspiration

implementation

  • HBASE-8089: Add type support (parent ticket)
  • HBASE-8201: Implement serialization strategies (patch available)
  • HBASE-8694: Performance evaluation of serialization strategies (unassigned)
  • HBASE-8693: Implement extensible type API based on serialization primitives (WIP)
  • HBASE-7941: Provide client API with support for primitive types (unassigned)
  • HBASE-8593: Type support in ImportTSV tool (WIP)

future directions

  • consider data types anywhere else users touch HBase
    • bundled MR jobs
    • Filters
  • extend RegionServer API with type support (?!?)
    • Coprocessors?
    • StoreFile formats?
  • non-Java implementation for non-Java clients

thanks!