motivation
- byte[] is necessary but not sufficient for building applications
- every application must (re-)solve its own subset of the "serialization problem"
- ecosystem tools must agree in order to interoperate
- application code full of pre/post client API-call serialization litter
design
- think holistically -- not just Java
- encoding format that preserves the sort order of the non-encoded values
- user-extensible API for defining types
- provide basic data types out of the box
- extend client API with type support
- extend MapReduce API with type support
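
An order-preserving encoding can be illustrated with a small sketch. HBase sorts row keys as unsigned bytes in lexicographic order, so a plain two's-complement int would sort negative values after positive ones. The sketch below flips the sign bit and writes big-endian so that byte order matches numeric order. The class and method names (OrderedInt32, encode, decode, compareUnsigned) are hypothetical and chosen for illustration; this is not the encoding proposed in the tickets.

```java
/** Hypothetical illustration of an order-preserving int encoding; not HBase code. */
final class OrderedInt32 {
    private OrderedInt32() {}

    /** Encode so that unsigned lexicographic byte order matches numeric order. */
    static byte[] encode(int value) {
        // Flipping the sign bit maps Integer.MIN_VALUE to 0x00000000
        // and Integer.MAX_VALUE to 0xFFFFFFFF.
        int flipped = value ^ Integer.MIN_VALUE;
        return new byte[] {
            (byte) (flipped >>> 24),
            (byte) (flipped >>> 16),
            (byte) (flipped >>> 8),
            (byte) flipped
        };
    }

    /** Reverse of encode: reassemble big-endian, then flip the sign bit back. */
    static int decode(byte[] bytes) {
        int flipped = ((bytes[0] & 0xff) << 24)
                    | ((bytes[1] & 0xff) << 16)
                    | ((bytes[2] & 0xff) << 8)
                    | (bytes[3] & 0xff);
        return flipped ^ Integer.MIN_VALUE;
    }

    /** Unsigned lexicographic comparison of two byte arrays. */
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }
}
```

With this scheme, encode(-1) compares lower than encode(0), which compares lower than encode(1), so a scan over encoded keys returns values in numeric order.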
tricky bits
- encoding format => data on disk => painful to change
- respect encoding performance overhead in tight loops
- user-extensible, user-usable API
- realistic API that real apps can build against
- don't impose on POJOs (no forced interfaces, &c)
- avoid magic (no ASM, no AOP, avoid ORM scorn)
- community agreement on what data types to ship out of the box
- more stuff I haven't {thought of,encountered} yet
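
One way to keep the API user-extensible without imposing on POJOs is to put serialization in a separate codec object rather than in the domain class. The sketch below assumes a hypothetical TypeCodec interface, a primitive Int32Codec, and a user-defined PointCodec built from it; none of these names come from HBase or from the tickets. There is no bytecode generation or reflection involved, and encoding writes into a caller-supplied buffer so a tight loop can reuse a single allocation.

```java
import java.nio.ByteBuffer;

/** Hypothetical codec contract; an assumption for illustration, not an HBase interface. */
interface TypeCodec<T> {
    /** Number of bytes encode() will write for this value. */
    int encodedLength(T value);

    /** Write the value at the buffer's current position. */
    void encode(ByteBuffer dst, T value);

    /** Read a value from the buffer's current position. */
    T decode(ByteBuffer src);
}

/** A plain object: no base class, no marker interface, no annotations. */
final class Point {
    final int x;
    final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }
}

/** A primitive type of the kind that could ship out of the box. */
final class Int32Codec implements TypeCodec<Integer> {
    @Override
    public int encodedLength(Integer value) {
        return Integer.BYTES;
    }

    @Override
    public void encode(ByteBuffer dst, Integer value) {
        dst.putInt(value);
    }

    @Override
    public Integer decode(ByteBuffer src) {
        return src.getInt();
    }
}

/** A user-defined type composed from the primitive codec. */
final class PointCodec implements TypeCodec<Point> {
    private final TypeCodec<Integer> int32 = new Int32Codec();

    @Override
    public int encodedLength(Point value) {
        return int32.encodedLength(value.x) + int32.encodedLength(value.y);
    }

    @Override
    public void encode(ByteBuffer dst, Point value) {
        int32.encode(dst, value.x);
        int32.encode(dst, value.y);
    }

    @Override
    public Point decode(ByteBuffer src) {
        int x = int32.decode(src);
        int y = int32.decode(src);
        return new Point(x, y);
    }
}
```

Point stays unaware of how it is stored, so changing the encoding later means swapping the codec rather than touching application classes.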
implementation
- HBASE-8089: Add type support (parent ticket)
- HBASE-8201: Implement serialization strategies (patch available)
- HBASE-8694: Performance evaluation of serialization strategies (unassigned)
- HBASE-8693: Implement extensible type API based on serialization primitives (WIP)
- HBASE-7941: Provide client API with support for primitive types (unassigned)
- HBASE-8593: Type support in ImportTSV tool (WIP)
future directions
- consider data types anywhere else users touch HBase
- extend RegionServer API with type support (?!?)
  - Coprocessors?
  - StoreFile formats?
- non-Java implementation for non-Java clients