On GitHub: wdullaer / data-unification-slides
Created by Wouter Dullaert / @wouterdullaert
1 Retailer
A few retailers
A lot of retailers and 1 distributor
A lot of retailers and a few distributors
TME to centralise operations across Europe
All of the customer data lives close to the customer
But relevant data increasingly arrives centrally
How do we link all this data together and feed it back?
Mapping 2 sources
Linking 2 sources
Mapping 3 sources
Linking 3 sources
After some more time and sources
Which values do you retain?
What is the origin of the data?
What if multiple consumers of the data have different requirements for the merged entity?
How do you handle data updates in source systems?
How do you handle data updates in consuming systems?
Ingest in schema of source system (Removes friction)
Map fields from source systems into the target schema.
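A minimal sketch of what mapping source fields into a target schema can look like. The source names, field names and the `map_to_target` helper are hypothetical, chosen only for illustration; they are not taken from any real system or from Tamr.

```python
from typing import Any

# Hypothetical mappings: source system -> {source field: target field}.
FIELD_MAPPINGS = {
    "retailer_crm": {"cust_name": "name", "mail": "email"},
    "distributor_erp": {"FullName": "name", "EmailAddress": "email"},
}


def map_to_target(source: str, record: dict[str, Any]) -> dict[str, Any]:
    """Rename the mapped fields of a source record to the target schema."""
    mapping = FIELD_MAPPINGS[source]
    # Unmapped source fields are simply left out of the target record.
    mapped = {target: record[src] for src, target in mapping.items() if src in record}
    # Keep track of where the record came from.
    mapped["_source"] = source
    return mapped
```

The record is still ingested in its source schema; the mapping is applied afterwards, so adding a source means adding one more entry to the mapping table.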
Machine learning assisted
Based on your mappings and the statistical profile of the attributes Tamr will offer mapping recommendations
Effort of mapping the data goes down as more sources are integrated
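To make "statistical profile of the attributes" concrete, here is a toy heuristic that suggests a target attribute for a new source column. It is an assumption made for illustration only: the slides do not describe how Tamr computes its recommendations, and the `profile` and `recommend` functions and the distance formula are hypothetical.

```python
def profile(values: list[str]) -> dict[str, float]:
    """A tiny statistical profile: average length and share of digit characters."""
    total_chars = sum(len(v) for v in values) or 1
    return {
        "avg_len": sum(len(v) for v in values) / max(len(values), 1),
        "digit_share": sum(ch.isdigit() for v in values for ch in v) / total_chars,
    }


def recommend(source_values: list[str], target_profiles: dict[str, dict[str, float]]) -> str:
    """Return the target attribute whose profile is closest to the source column."""
    p = profile(source_values)

    def distance(q: dict[str, float]) -> float:
        # Arbitrary illustrative weighting of the two profile features.
        return abs(p["avg_len"] - q["avg_len"]) / 10 + abs(p["digit_share"] - q["digit_share"])

    return min(target_profiles, key=lambda name: distance(target_profiles[name]))
```

Every confirmed mapping adds values to the target profiles, which is one way to see why the mapping effort could go down as more sources are integrated.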
Use ML model to link entities across all the records
Train model by evaluating record pairs
A business user can do this!
Regularly train new pairs to keep the model in sync with changes in the sources
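Training on evaluated record pairs can be sketched with a deliberately simple stand-in for the ML model: a string-similarity score plus a threshold fitted to the labelled pairs. The field names and the `similarity` and `fit_threshold` helpers are hypothetical; a real system would use a richer model.

```python
from difflib import SequenceMatcher

FIELDS = ("name", "email")  # hypothetical target-schema fields


def similarity(a: dict, b: dict) -> float:
    """Average case-insensitive string similarity over the compared fields."""
    scores = [SequenceMatcher(None, a[f].lower(), b[f].lower()).ratio() for f in FIELDS]
    return sum(scores) / len(scores)


def fit_threshold(labelled_pairs: list[tuple[dict, dict, bool]]) -> float:
    """Pick the similarity threshold that classifies the most labelled pairs correctly.

    Each labelled pair is (record_a, record_b, is_match), as judged by a user.
    """
    scored = [(similarity(a, b), match) for a, b, match in labelled_pairs]
    candidates = sorted({score for score, _ in scored})

    def correct(threshold: float) -> int:
        return sum((score >= threshold) == match for score, match in scored)

    return max(candidates, key=correct)
```

The labels are plain yes/no answers to "are these two records the same entity?", which is why a business user can provide them. Refitting on new labelled pairs corresponds to the regular retraining step.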
Multiple views that essentially describe how individual fields are merged
Views are functions over the data -> flexibility
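A view can be sketched as a plain function from a cluster of linked records to one merged record. The two views below and the record fields (`name`, `email`, `updated`) are hypothetical examples of different consumer requirements.

```python
from typing import Callable

Record = dict[str, str]
View = Callable[[list[Record]], Record]


def most_recent_view(cluster: list[Record]) -> Record:
    """Take every field from the most recently updated record."""
    latest = max(cluster, key=lambda r: r["updated"])  # ISO dates sort as strings
    return {"name": latest["name"], "email": latest["email"]}


def most_complete_view(cluster: list[Record]) -> Record:
    """Per field, take the longest non-empty value found in the cluster."""
    merged = {}
    for field in ("name", "email"):
        values = [r[field] for r in cluster if r.get(field)]
        merged[field] = max(values, key=len) if values else ""
    return merged
```

Because the linked records are never overwritten, each consumer can apply its own view to the same cluster and get a different merged entity.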
Save all updates as immutable events
Create "pseudo" sources by provenance
ML model will group events into clusters
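A minimal sketch of the event idea, assuming each update is stored as an immutable record tagged with where it came from. The `UpdateEvent` fields and the `pseudo_sources` helper are hypothetical, and the clustering step itself is not shown.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: an event cannot be modified after creation
class UpdateEvent:
    provenance: str   # system in which the update was made
    entity_key: str   # identifier of the record in that system
    field: str
    value: str


def pseudo_sources(events: list[UpdateEvent]) -> dict[str, list[UpdateEvent]]:
    """Group the event log by provenance, one pseudo source per origin."""
    grouped: dict[str, list[UpdateEvent]] = defaultdict(list)
    for event in events:
        grouped[event.provenance].append(event)
    return dict(grouped)
```

Each pseudo source can then be treated like any other source, so the same linking model can place its events into clusters.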