Data modeling is the process of defining and analyzing the data requirements needed to support business processes. It essentially consists of identifying logical entities and the logical dependencies between them. Data modeling is abstract in the sense that the values of individual data items are ignored in favor of the structure, relationships, names, and formats of the relevant data, although a list of valid values is often recorded. A data model must define not only the data structure but also what the data really means (its semantics).
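As a minimal illustration of these ideas, here is a hedged sketch of a logical model for a tiny, hypothetical sales domain (the entity and attribute names are assumptions, not from the original). It captures structure, a relationship between entities, and a recorded list of valid values, while the concrete data values remain incidental:

```python
from dataclasses import dataclass
from enum import Enum


# A recorded list of valid values, as the text notes is common practice.
class OrderStatus(Enum):
    PLACED = "placed"
    SHIPPED = "shipped"
    CANCELLED = "cancelled"


@dataclass
class Customer:
    customer_id: int  # identifying attribute
    name: str


@dataclass
class Order:
    order_id: int
    customer_id: int  # logical dependency: each Order belongs to a Customer
    status: OrderStatus


# The model describes structure and semantics; individual values are incidental.
alice = Customer(customer_id=1, name="Alice")
order = Order(order_id=100, customer_id=alice.customer_id,
              status=OrderStatus.PLACED)
```

The point of the sketch is that the relationship (`Order.customer_id` referencing a `Customer`) and the value domain (`OrderStatus`) are part of the model itself, independent of any particular customer or order.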
When we talk about Big Data, data modeling is a key concept. The common belief is that everything is solved once the data is stored. However, this is not enough:
Modeling helps data understanding:
Data modeling is required to understand the information we manipulate. With this in mind, data architects have set up information structures and rediscovered concepts such as views, object models, and star models. Data governance has again become a priority, along with the maintenance of well-documented data catalogues.
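To make the star-model idea concrete, here is a minimal sketch (table and column names are hypothetical): a central fact table holds measures plus foreign keys into descriptive dimension tables, and an analytical read joins each fact row to its dimensions:

```python
# Dimension tables: descriptive attributes, keyed by a surrogate id.
dim_product = {1: {"name": "laptop", "category": "electronics"}}
dim_date = {20240101: {"year": 2024, "month": 1}}

# Fact table: measures plus foreign keys into each dimension (the "star").
fact_sales = [
    {"product_id": 1, "date_id": 20240101, "amount": 999.0},
]


def enrich(fact_row):
    """Join a fact row to its dimension rows, as a typical analytical query would."""
    return {
        **fact_row,
        "product": dim_product[fact_row["product_id"]],
        "date": dim_date[fact_row["date_id"]],
    }


row = enrich(fact_sales[0])
```

In a real warehouse the same shape would be expressed as SQL tables; the dictionaries here only mimic the structure for readability.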
Advanced companies like Google, Facebook, and Amazon establish metamodels and ontologies to understand, interpret, and navigate their data. Graph theory has come out of the closet and now supports social-network data analysts. Architects and data engineers have finally set up layered data lakes capable of replacing decision-support warehouses while providing the same level of readability as the older systems.
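As an illustrative sketch of graph thinking applied to social-network data (the user names are hypothetical), even a plain adjacency structure already supports simple analyses such as finding mutual connections:

```python
# Social graph as an adjacency mapping: user -> set of direct connections.
graph = {
    "ana": {"bob", "carl"},
    "bob": {"ana", "carl", "dina"},
    "carl": {"ana", "bob"},
    "dina": {"bob"},
}


def mutual_connections(g, a, b):
    """People connected to both a and b (excluding a and b themselves)."""
    return (g[a] & g[b]) - {a, b}


common = mutual_connections(graph, "ana", "dina")
```

Dedicated graph databases and libraries add traversal, centrality, and community-detection algorithms on top of exactly this kind of model.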
Data modeling, by allowing a simple reading of the company's data, is therefore essential: it gives data science and data analytics teams the same basis for analysis. It is also important for factoring out work and improving efficiency.