Data Enrichment

Data enrichment is the first step toward gaining valuable insights that can benefit a company from data collected through analytics or machine learning. It involves merging third-party data from an authoritative external source with an existing database of first-party customer data.

Why is it important?

Data enrichment ideally improves the final ML model. But how?

Fill missing information:

  • Covers cases where fields do not contain any data
  • It is sometimes worth keeping these records, because the absence of information can itself be informative (e.g. in fraud detection)
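The two points above can be sketched with pandas. This is a minimal illustration, not a prescribed method: the table, its column names, and the choice of a median fill are all assumptions made for the example.

```python
import pandas as pd

# Hypothetical first-party records; None marks a field with no data.
customers = pd.DataFrame({
    "customer_id": [1, 2, 3, 4],
    "phone": ["555-0101", None, "555-0103", None],
    "income": [52000.0, None, 61000.0, 48000.0],
})

# Keep every record and store the absence itself as a feature,
# since a missing field can be informative (e.g. for fraud).
customers["phone_missing"] = customers["phone"].isna()
customers["income_missing"] = customers["income"].isna()

# Fill the numeric gap with the median of the observed values
# (an assumed strategy; other fills may suit other data).
customers["income"] = customers["income"].fillna(customers["income"].median())
```

Creating the indicator columns before filling matters: once the gap is filled, the fact that the value was originally missing can no longer be recovered from the data.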

Enrichment:

  • Use other databases, often to add new fields while keeping the same number of records
  • Connect data sets, sometimes heterogeneous ones, with each other
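Adding fields while keeping the same number of records corresponds to a left join. The sketch below assumes two hypothetical tables that share a zip_code key; the names and values are illustrative only.

```python
import pandas as pd

# Hypothetical first-party customer table.
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "zip_code": ["10001", "94105", "60601"],
})

# Hypothetical third-party table with one row per zip code.
regional = pd.DataFrame({
    "zip_code": ["10001", "94105"],
    "median_income": [72000, 96000],
})

# A left join keeps every customer row. validate="many_to_one" raises
# an error if the external table has duplicate keys, which would
# otherwise multiply customer rows.
enriched = customers.merge(
    regional, on="zip_code", how="left", validate="many_to_one"
)

# Customers with no match keep their row; the new field is simply empty.
unmatched = int(enriched["median_income"].isna().sum())
```

Here the third customer has no match in the external table, so the enriched table still has three rows and the new field is empty for that record.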

Transformation stage (coding and standardization):

  • This step depends heavily on the choice of data mining algorithm
  • Groupings: cases where an attribute takes a very large number of discrete values (e.g. addresses that can be grouped into 2 regions)
  • Discrete attributes: these take their values (often textual) from a given finite set
  • Two representations are possible for them: a vertical representation, or a horizontal (fragmented) representation, which is better suited to data mining
  • Type changes: conversions that allow certain manipulations such as distance or mean calculations (e.g. on a date of birth)
  • Scaling: bringing attributes onto a uniform scale
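The transformations listed above can be sketched in a few lines of pandas. Everything here is an assumption made for illustration: the table, the city-to-region mapping, the fixed reference date, and the use of min-max scaling. The sketch also reads the "vertical" representation as a single column of integer codes and the "horizontal or fragmented" one as one indicator column per value (one-hot encoding), which is an interpretation rather than something the text states.

```python
import pandas as pd

# Hypothetical records to transform.
df = pd.DataFrame({
    "city": ["Lyon", "Paris", "Nice", "Lille"],
    "plan": ["basic", "premium", "basic", "free"],
    "date_of_birth": ["1990-05-17", "1984-11-02", "2001-01-30", "1975-07-09"],
    "income": [30000.0, 60000.0, 45000.0, 90000.0],
})

# Grouping: collapse a many-valued attribute into 2 regions
# (the mapping is made up for this example).
region_of = {"Lyon": "south", "Nice": "south", "Paris": "north", "Lille": "north"}
df["region"] = df["city"].map(region_of)

# Discrete attribute, vertical form: one column of integer codes.
df["plan_code"] = df["plan"].astype("category").cat.codes

# Discrete attribute, horizontal form: one indicator column per value.
plan_dummies = pd.get_dummies(df["plan"], prefix="plan")

# Type change: parse the text dates, then derive an age in whole years
# at a fixed reference date so that means and distances are computable.
reference = pd.Timestamp("2024-01-01")
dob = pd.to_datetime(df["date_of_birth"])
birthday_pending = (dob.dt.month > reference.month) | (
    (dob.dt.month == reference.month) & (dob.dt.day > reference.day)
)
df["age"] = reference.year - dob.dt.year - birthday_pending.astype(int)

# Scaling: min-max scaling puts the attribute on a uniform [0, 1] range.
income = df["income"]
df["income_scaled"] = (income - income.min()) / (income.max() - income.min())
```

Which of these steps apply depends on the algorithm chosen: distance-based methods generally need scaled, numeric inputs, while tree-based methods are typically less sensitive to scale.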

 
