Training and Testing

When preparing data for a machine learning model, the best approach is to separate the dataset into two distinct parts from the beginning:

  1. The training set, which the learning algorithm uses to train the model
  2. The testing set, which is used to measure the error of the final model on data it has never seen

Once the model is trained, we pass it the testing set as if it were new data that it has never seen before, and measure its performance on that data. This is the testing step. The testing set is also called held-out data, to emphasize that it must not be touched until the end of the process, when we check that the model works.
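To make the train-then-test sequence concrete, here is a minimal sketch in plain Python. The toy data, the one-parameter model (`fit_slope`), and the choice of mean absolute error as the metric are all assumptions made for illustration; they are not taken from the article.

```python
from statistics import mean

# Hypothetical toy data: (feature, target) pairs, already split in two.
train = [(1, 2.0), (2, 4.1), (3, 5.9), (4, 8.2)]
test = [(5, 10.0), (6, 12.1)]  # held-out: not used while fitting


def fit_slope(pairs):
    """Least-squares slope w for the model y = w * x (no intercept)."""
    return sum(x * y for x, y in pairs) / sum(x * x for x, _ in pairs)


def mean_absolute_error(w, pairs):
    """Average absolute gap between the targets and the predictions w * x."""
    return mean(abs(y - w * x) for x, y in pairs)


# Training step: only the training set is seen by the learning algorithm.
w = fit_slope(train)

# Testing step: the held-out set is used once, at the end, to measure error.
train_error = mean_absolute_error(w, train)
test_error = mean_absolute_error(w, test)

print(f"slope={w:.3f} train_error={train_error:.3f} test_error={test_error:.3f}")
```

The error on the testing set is the figure to report, because the model was fitted without ever seeing those points.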

It's up to you to define the proportion of the dataset that you want to allocate to each part. A common choice is 80% for the training set and 20% for the testing set.
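An 80/20 split can be sketched in a few lines of plain Python. The helper name `split_dataset`, its `test_fraction` and `seed` parameters, and the placeholder data are hypothetical, chosen only for this example. Shuffling before the split is an added assumption that the rows have no order that must be preserved (it would not be appropriate for time series, for instance).

```python
import random


def split_dataset(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of rows and return (train, test) lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(rows)                  # leave the caller's data untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed for a repeatable split
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Placeholder dataset of 100 rows.
data = list(range(100))
train_set, test_set = split_dataset(data, test_fraction=0.2)

print(len(train_set), len(test_set))
```

Changing `test_fraction` adjusts the proportion; every row ends up in exactly one of the two sets.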

 
