Machine learning (a subset of artificial intelligence) involves developing computer algorithms that improve automatically through experience. Because these systems learn through repetition, their models are built on training data, that is, sample sets of data. This training gives a machine the ability to perform a task without being explicitly programmed for it.
The problem is that the data used to build these models can be expensive, and the models are only effective if they're fed the right quantity and quality of data, neither of which is easy or cheap to obtain. Training itself is also costly: the compute resources required to train a single AI model alone can amount to a significant expense. Industry analysts predict that worldwide spending on AI will double from $50 billion this year to more than $100 billion by 2024.
To underscore the issue, a study by Dimensional Research found that 96% of organizations run into problems with training data quality and quantity. The same study reports that most machine learning projects require more than 100,000 data samples to perform effectively. Data acquisition and data labeling are often expensive in their own right, which helps explain why machine learning's potential and reach have yet to be fully realized.
Adding to this complexity, a machine learning model is designed for only one specific task, such as identifying a cat. With the right quantity and quality of data, a model can perform that task with remarkable speed and precision, but that is the limit of its usefulness, which constrains the return on the investment. Suffice it to say, building models comes with many hurdles and stumbling blocks.
Transfer learning is how machine learning becomes ubiquitous
Transfer learning is a deep learning technique that takes the knowledge gained by a model pretrained for one task and applies it to a different but related problem within the same domain.
This usually requires far less data because the initial model's baseline knowledge of shapes, colors, gradients, temperatures and so on transfers readily. For example, a model trained on 1,000 images to recognize cats could be fine-tuned with a small number of additional labeled examples to recognize a new target domain of dogs.
With transfer learning, data scientists and engineers don't need to train models from scratch to benefit from the knowledge captured by these large pretrained models; a minimal sketch of the approach appears below. Essentially, transfer learning is about not reinventing the wheel, which makes machine learning more accessible. It also eliminates much of the development work, drastically reducing the cost of building models and moving AI toward economies of scale.
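To make the idea concrete, here is a minimal sketch of what fine-tuning a pretrained image classifier might look like, assuming PyTorch and torchvision as the toolkit; the dataset path, class count and hyperparameters are placeholders, not a prescription.

```python
# Minimal transfer-learning sketch: reuse a pretrained image model and
# fine-tune only its final layer on a small labeled dataset.
# The dataset path and class count below are placeholders.
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

# Load a model pretrained on ImageNet; its learned filters for shapes,
# colors and gradients are the "knowledge" being transferred.
model = models.resnet18(pretrained=True)

# Freeze the pretrained layers so only the new head is trained.
for param in model.parameters():
    param.requires_grad = False

# Replace the final classification layer for the new, related task
# (e.g., cats vs. dogs -> 2 classes).
num_classes = 2
model.fc = nn.Linear(model.fc.in_features, num_classes)

# A small labeled dataset is often enough to fine-tune the new head.
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])
train_data = datasets.ImageFolder("path/to/small_dataset", transform=preprocess)
loader = torch.utils.data.DataLoader(train_data, batch_size=32, shuffle=True)

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

model.train()
for images, labels in loader:
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
```

Because only the small replacement layer is trained, a modest labeled dataset and a fraction of the original compute budget are typically enough.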
A prime example in this field is the largest natural language machine learning model yet built: GPT-3. This powerful program, offered as an API by research lab OpenAI, uses machine learning, deep learning and transfer learning to produce humanlike predictive text. From authoring a blog to developing websites to composing music, GPT-3's ability to transfer what it has learned to new tasks is powerful and impressive. The full version of the autoregressive language model has 175 billion parameters, and its performance on a new task improves markedly when it is given even a few examples. But these exciting capabilities come at a steep cost: training GPT-3 cost almost $12 million, a prohibitive investment for most companies and organizations.
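GPT-3 itself is reached through OpenAI's hosted API, but the underlying pattern of reusing a pretrained language model for predictive text can be illustrated with an openly downloadable stand-in; the sketch below assumes the Hugging Face transformers library and GPT-2, a smaller predecessor, rather than GPT-3.

```python
# Sketch of reusing a pretrained autoregressive language model for
# predictive text. GPT-2 stands in here for GPT-3, which is accessed
# through OpenAI's hosted API rather than downloaded weights.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompt = "Transfer learning lets businesses build AI applications by"
completions = generator(prompt, max_length=40, num_return_sequences=1)
print(completions[0]["generated_text"])
```

No task-specific training happens here at all; the pretrained model's general knowledge of language is simply reused, which is the essence of the transfer-learning economics discussed above.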
The advantages of these deep learning neural networks are immense, but there are challenges: they demand large clusters of compute servers, massive amounts of training data and significant training time. Additional challenges with transfer learning include data collection, data shift (where a model underperforms because its training data fails to match the cases it encounters in the real world) and underspecification (where the training process can produce many models that score equally well on test data yet behave very differently once deployed, a consequence of how models are trained and tested).
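Data shift, for example, can sometimes be surfaced by comparing the distribution of a feature in the training data with what the model sees in production. The sketch below is an illustration only, using synthetic data, SciPy's two-sample Kolmogorov-Smirnov test and an assumed 0.05 cutoff.

```python
# Illustrative check for data shift: compare a feature's distribution in
# training data with the same feature observed in production.
# The synthetic data and 0.05 threshold are assumptions for illustration.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
train_feature = rng.normal(loc=0.0, scale=1.0, size=5000)  # what the model saw
prod_feature = rng.normal(loc=0.5, scale=1.2, size=5000)   # what it sees now

result = ks_2samp(train_feature, prod_feature)
if result.pvalue < 0.05:
    print(f"Possible data shift (KS statistic={result.statistic:.3f}, p={result.pvalue:.3g})")
else:
    print("No significant shift detected")
```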
Fortunately, large, well-known companies, including AWS, Microsoft and NVIDIA, have taken on the lofty task of developing prebuilt, powerful transfer learning toolkits to remove the burden of building models from scratch, address most of these challenges and expedite production machine learning. (Full disclosure: Author's company has partnerships with NVIDIA and AWS.) These toolkits have the potential to change the development ecosystem by allowing the less tech-savvy to build ML models. Previously, building these models required specialized engineers who were in high demand; with transfer learning, demand for domain expertise will soon outgrow the demand for engineers.
The ability to transfer learning from one machine to another within the same domain is how machine learning becomes accessible and affordable for most businesses. This same idea leads back to a previous article I wrote about reuse as the engine that drives a healthy economy. With transfer learning, many of the obstacles to production machine learning fall away, and more machine learning applications can be developed at a rapid pace.