Data enrichment is the first step in turning collected data into valuable insights for a company, whether through analytics or machine learning. It involves merging third-party data from an authoritative external source with an existing database of first-party customer data.
Why is it important?
Data enrichment ideally improves the final ML model. But how?
Fill missing information:
- Applies to cases where fields contain no data
- It is sometimes worth keeping these records, because the missing information can itself be informative (e.g. in fraud detection)
- Other databases are often used to add new fields while keeping the same number of records
- Data sources, which are sometimes heterogeneous, are linked to one another
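The points above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the `customers` records, the `postcode_info` reference table, and the helper `enrich` are all hypothetical names invented for the example. The sketch adds fields from an external table without changing the number of records, and flags missing values instead of dropping the rows that contain them.

```python
# Hypothetical first-party records; None marks a field with no data.
customers = [
    {"customer_id": 1, "postcode": "75001", "income": 42000},
    {"customer_id": 2, "postcode": "69002", "income": None},
    {"customer_id": 3, "postcode": None, "income": 31000},
]

# Hypothetical third-party reference table, keyed by postcode.
postcode_info = {
    "75001": {"city": "Paris"},
    "69002": {"city": "Lyon"},
}


def enrich(records, reference, key, new_fields, flag_fields):
    """Left-join style enrichment: exactly one output row per input row."""
    enriched = []
    for row in records:
        out = dict(row)
        # Keep the record and flag the gap: absence may itself be informative.
        for field in flag_fields:
            out[field + "_missing"] = row.get(field) is None
        # Look up the external data; unmatched rows get None for the new fields.
        extra = reference.get(row.get(key), {})
        for field in new_fields:
            out[field] = extra.get(field)
        enriched.append(out)
    return enriched


enriched_customers = enrich(
    customers, postcode_info, key="postcode",
    new_fields=["city"], flag_fields=["income", "postcode"],
)
```

The record with no postcode is kept, with `city` set to `None` and `postcode_missing` set to `True`, so a downstream model can still use the fact that the value was absent.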
Transformation stage (coding and standardization):
- This step depends heavily on the data mining algorithm chosen
- Groupings: for attributes that take a very large number of discrete values (e.g. addresses that can be grouped into 2 regions)
- Discrete attributes: these take their values (often textual) from a given finite set
- Two possible representations: vertical, or horizontal (also called fragmented), which is better suited to data mining
- Type changes: to allow operations such as distance or mean calculations (e.g. converting a date of birth to a number)
- Uniform scaling: bringing numeric attributes onto a common scale
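The transformation steps listed above could look roughly like the following sketch in plain Python. Several things here are assumptions made for illustration: the sample `records`, the `CITY_TO_REGION` mapping, and all helper names are hypothetical. The horizontal representation is interpreted as one binary column per value (one-hot encoding), and min-max scaling is just one possible way to put attributes on a common scale.

```python
from datetime import date

# Hypothetical records; field names and values are illustrative only.
records = [
    {"city": "Lille", "plan": "basic", "birth_date": date(1990, 5, 17), "income": 30000.0},
    {"city": "Marseille", "plan": "premium", "birth_date": date(1975, 11, 2), "income": 60000.0},
    {"city": "Paris", "plan": "basic", "birth_date": date(2000, 1, 30), "income": 45000.0},
]

# Grouping: collapse a high-cardinality attribute into 2 regions (assumed mapping).
CITY_TO_REGION = {"Lille": "north", "Paris": "north", "Marseille": "south"}


def one_hot(name, value, categories):
    """Horizontal representation: one binary column per possible value."""
    return {f"{name}_{c}": int(value == c) for c in categories}


def age_in_years(birth, today):
    """Type change: turn a date into a number usable in distances and means."""
    before_birthday = (today.month, today.day) < (birth.month, birth.day)
    return today.year - birth.year - int(before_birthday)


def min_max_scale(values):
    """Rescale numbers to [0, 1] so that attributes share a common scale."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def transform(rows, today):
    plans = sorted({r["plan"] for r in rows})
    incomes = min_max_scale([r["income"] for r in rows])
    ages = min_max_scale([age_in_years(r["birth_date"], today) for r in rows])
    out = []
    for r, income, age in zip(rows, incomes, ages):
        row = {"region": CITY_TO_REGION.get(r["city"], "unknown")}
        row.update(one_hot("plan", r["plan"], plans))
        row["income_scaled"] = income
        row["age_scaled"] = age
        out.append(row)
    return out


transformed = transform(records, today=date(2024, 1, 1))
```

Which of these transformations is actually needed depends on the algorithm used downstream: distance-based methods are sensitive to scale, for example, while tree-based methods generally are not.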