There are three key factors behind the success of machine learning applications: the algorithm, the data, and the computational resources. Prof. Zhi-Hua Zhou of Nanjing University pointed out that classical machine learning theories concern the algorithm and the data while ignoring the influence of computational resources. Indeed, machine learning generalization error bounds generally contain terms on hypothesis class complexity and sample complexity, corresponding to the algorithm and data aspects respectively, but no term about the amount of computational resources.
To enable the influence of computational resources to be considered in learning theory, Prof. Zhou proposed the CoRE (COmputational Resource Efficient)-learning framework. He defined the concept of "machine learning throughput", with which the influence of computational resources on machine learning performance can be formulated and studied at an abstract level. Throughput is a classical concept in computer systems and database systems, but this is the first time it has been formally introduced into machine learning theory.
A key component of CoRE-learning is "time-sharing". Time-sharing is a concept with a long history. Early computer systems, in the 1950s for instance, worked in an "exclusive" style, in which a computer serves only one user task at a time. With the great contributions of Turing laureates Edgar F. Codd and Fernando J. Corbató, the exclusive style gave way to the "time-sharing" style, which allows a computer to serve multiple user tasks. Prof. Zhou observed that current "intelligent supercomputing" facilities, however, work largely in the exclusive style: a pre-determined amount of computational resources is purchased/allocated for a user task, such as training a large language model, regardless of whether resources are wasted because more than enough are purchased/allocated, or the training fails because less than enough are purchased/allocated.
CoRE-learning considers a task bundle, i.e., a set of task threads within a concerned time period, and CoRE-learnability depends not only on the algorithm and the data but also on the supply of computational resources. A "scheduling" mechanism plays a vital role, smartly allocating resources to different task threads at runtime so as to achieve both hardware efficiency and user efficiency.
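To make the contrast between the two resource-usage styles concrete, here is a minimal toy sketch (not from the paper): a bundle of task threads shares a fixed compute budget, and a simple scheduler reassigns freed-up capacity to the remaining threads at each step. The thread names, the "work remaining" progress model, and the even-split scheduling rule are all hypothetical choices made only for illustration.

```python
from dataclasses import dataclass


@dataclass
class TaskThread:
    """A hypothetical task thread, measured in abstract units of compute."""
    name: str
    work_remaining: float  # compute still needed before the task finishes
    done: float = 0.0


def time_sharing_schedule(threads, budget_per_step, n_steps):
    """At each step, split the compute budget evenly among unfinished threads."""
    for step in range(n_steps):
        active = [t for t in threads if t.work_remaining > 0]
        if not active:
            break
        share = budget_per_step / len(active)
        for t in active:
            used = min(share, t.work_remaining)
            t.work_remaining -= used
            t.done += used
        status = ", ".join(f"{t.name}={t.work_remaining:.1f} left" for t in threads)
        print(f"step {step}: {status}")


if __name__ == "__main__":
    bundle = [
        TaskThread("train-large-model", 30.0),
        TaskThread("finetune", 8.0),
        TaskThread("evaluate", 4.0),
    ]
    # Under an exclusive allocation, each task would hold a fixed slice whether or
    # not it still needs it; here, capacity released by finished threads is
    # reassigned to the remaining ones at runtime.
    time_sharing_schedule(bundle, budget_per_step=6.0, n_steps=10)
```

This is only a caricature of runtime scheduling; the actual CoRE-learning framework formulates resource supply and scheduling abstractly through the notion of machine learning throughput rather than through any particular scheduling rule.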
In summary, the CoRE-learning theoretical framework enables, for the first time, the influence of the supply of computational resources on machine learning performance to be considered in learning theory. It also points the way toward changing the current "exclusive" resource-usage style of intelligent supercomputing facilities to the more efficient "time-sharing" style, shedding light on reducing the power consumption of machine learning model training, which is an increasingly pressing concern for sustainable development.
Journal
National Science Review