The world of artificial intelligence (AI) mostly exists in cloud-computing facilities and rarely touches your smartphone. When you use a tool like ChatGPT to answer a prompt, the hard work of training the program, so that it functions properly, has been done days, weeks, and months before, behind the scenes, in the enormous AI data centers built by Microsoft and others.
However, 2024 could be the year the divide is crossed -- and it could be when AI starts to learn in your pocket. Efforts are underway to make it possible to train a neural net -- even a large language model (LLM) -- on your personal device, with little or no connection to the cloud.
The most obvious benefits of on-device training include: avoiding the delay incurred by having to connect to the cloud; learning from local information in a constant and personalized manner; and preserving privacy that would be violated by sending personal data to a cloud data center.
The impact of on-device training could be a transformation in the capabilities of neural networks. AI could be personalized to your own actions as you walk around, tapping, scrolling, and dragging. AI could learn from the environments you pass through during your daily routine, gathering signs about the world.
Recent work by Apple engineers suggests the company is looking to bring larger neural networks, the "generative" kind represented by OpenAI's ChatGPT, to run locally on the iPhone.
More broadly, Google introduced a radically scaled-down AI approach called TinyML several years ago. TinyML can run neural nets in devices with as little as a milliwatt of power, such as smart sensors placed on machinery.
The greater challenge for technology companies is to make those kinds of neural networks not just perform predictions on a phone, but also learn new things on a phone -- to carry out training locally.
Training is the harder job: it takes far more processing power, far more memory, and far more bandwidth for any computer to train a neural net than to use the finished neural net to make predictions.
Efforts have been underway to conquer that computing mountain by doing things such as selectively updating only portions of the neural net's "weights" or "parameters." A signature effort there is MIT's TinyTL, which uses what's called transfer learning as a way to refine a neural net that is already mostly trained.
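To make the idea of selective updating concrete, here is a minimal sketch in Python. It is not TinyTL itself: the toy linear model, the `trainable` mask, and the sample data are all hypothetical, chosen only to show how a mostly trained model can be refined by updating a small subset of its parameters while the rest stay frozen.

```python
# Toy illustration (hypothetical model and data, not TinyTL's actual method):
# fine-tune only the parameters marked trainable and leave the rest frozen.

def predict(params, x):
    """Linear model: the last parameter is the bias."""
    return sum(w * xi for w, xi in zip(params[:-1], x)) + params[-1]

def fine_tune(params, trainable, samples, lr=0.1, steps=100):
    """Gradient descent on mean squared error over the trainable subset only."""
    params = list(params)
    for _ in range(steps):
        grads = [0.0] * len(params)
        for x, y in samples:
            err = predict(params, x) - y
            inputs = list(x) + [1.0]  # the bias sees a constant input of 1
            for i, xi in enumerate(inputs):
                if trainable[i]:  # frozen parameters need no gradient at all
                    grads[i] += 2.0 * err * xi / len(samples)
        params = [p - lr * g if t else p
                  for p, g, t in zip(params, grads, trainable)]
    return params

# "Pretrained" weights are frozen; only the bias adapts to the local data.
pretrained = [2.0, -1.0, 0.0]
trainable = [False, False, True]
local_samples = [((0.0, 0.0), 0.5), ((1.0, 0.0), 2.5), ((0.0, 1.0), -0.5)]

tuned = fine_tune(pretrained, trainable, local_samples)
print(tuned)
```

Because gradients are computed and stored only for the trainable entries, the memory needed during the update scales with the size of that subset rather than with the whole model, which is the property that makes this family of techniques attractive on small devices.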
TinyTL has so far been used for small things, such as facial recognition. But the state of the art is now moving to tackling the LLMs of generative AI, including OpenAI's GPT-4. The LLMs have hundreds of billions of neural weights that need to be kept in memory, and then passed to the processor to be updated as new information comes in. This training challenge takes place on a scale never before attempted.
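A rough back-of-the-envelope calculation shows why the scale is daunting. The sketch below rests on stated assumptions that do not come from any of the research cited here: a hypothetical model of 100 billion weights, 32-bit floating-point values, and an Adam-style optimizer that keeps a gradient plus two running statistics for every weight. It ignores the memory for intermediate activations, so real training would need even more.

```python
# Back-of-the-envelope memory estimate (all figures are illustrative assumptions).

def memory_gigabytes(num_weights, bytes_per_value, copies_per_weight):
    """Memory needed to hold `copies_per_weight` values for every weight."""
    return num_weights * bytes_per_value * copies_per_weight / 1e9

NUM_WEIGHTS = 100_000_000_000  # hypothetical 100-billion-weight model
BYTES_FP32 = 4                 # one 32-bit floating-point value

# Inference keeps just the weights.
inference_gb = memory_gigabytes(NUM_WEIGHTS, BYTES_FP32, 1)

# Training with an Adam-style optimizer keeps, per weight:
# the weight, its gradient, and two running moment estimates.
training_gb = memory_gigabytes(NUM_WEIGHTS, BYTES_FP32, 4)

print(f"inference: {inference_gb:.0f} GB, training: {training_gb:.0f} GB")
```

Under these assumptions, training needs roughly four times the memory of inference before activations are even counted, and both figures are far beyond what a phone holds today.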
A research report this month by staff at European chip-making giant STMicroelectronics makes the case that performing inference on mobile devices is not enough -- the client device must also train the neural network to keep it fresh.
"Enabling only model's inference on the device is not enough," write Danilo Pietro Pau and Fabrizio Maria Aymone. "The performance of the AI models, in fact, deteriorates as time passes since the last training cycle; phenomenon known as concept drift," for which the solution is to update the program with new training data.
The authors suggest slimming down a neural net, so it's easier to train a model on a memory-constrained device. Specifically, they experiment with removing what's called "back-propagation", the mathematical method that is the most compute-intensive part of training LLMs.
Pau and Aymone found that replacing back-propagation with simpler math could reduce the amount of on-device memory needed for the neural weights by as much as 94%.
Some scientists advocate for splitting up the training task among many client devices, which is called "federated learning".
Researchers Chu Myaet Thwal and team at Kyung Hee University this month adapted a form of LLM used for image recognition across as many as 50 workstation computers, each running a single Nvidia GPU gaming card. Their code took less memory on the device to train than the standard version of the neural net without losing accuracy.
Some experts, meanwhile, argue network communications will have to be adjusted, so mobile devices can communicate better when performing federated learning.
Scholars at the Institute of Electrical and Electronics Engineers this month hypothesized a communications network using the forthcoming 6G standard, where the bulk of LLM training is completed first in a data center. Then, the cloud coordinates a bunch of client devices that "fine-tune" the LLM with local data.
Such "federated fine-tuning", where each device learns some portion of an LLM, without starting from scratch, can be done with a lot less processing power on the battery-powered device than in full training.
Many approaches aim to reduce the memory and processing required for each neural weight. The ultimate approach is what's called "binary neural networks", where instead of each weight being stored as a multi-bit number, each weight is only a one or a zero, which vastly reduces the amount of on-device storage required.
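The storage saving is easy to see in a small sketch. The code below is an illustration under assumptions, not any particular research group's method: it turns each weight into a single bit by a simple sign threshold (one common choice, assumed here), packs eight weights into each byte, and compares the result with storing the same weights as 32-bit floats. The weight values are made up.

```python
import struct

# Illustrative sketch: the sign-threshold rule and the toy weights are assumptions.

def binarize(weights):
    """Map each real-valued weight to a single bit: 1 if non-negative, else 0."""
    return [1 if w >= 0 else 0 for w in weights]

def pack_bits(bits):
    """Pack a list of 0/1 values into bytes, eight weights per byte."""
    packed = bytearray((len(bits) + 7) // 8)
    for i, bit in enumerate(bits):
        if bit:
            packed[i // 8] |= 1 << (i % 8)
    return bytes(packed)

# 1,024 made-up weights.
weights = [0.73, -0.12, 0.05, -0.98, 0.40, -0.33, 0.81, -0.07] * 128

float_bytes = len(struct.pack(f"{len(weights)}f", *weights))  # 32-bit floats
binary_bytes = len(pack_bits(binarize(weights)))              # 1 bit per weight

print(float_bytes, binary_bytes, float_bytes // binary_bytes)
```

Going from 32 bits to 1 bit per weight cuts storage by a factor of 32. The trade-off is that each weight carries far less information, which is why keeping accuracy high with binary weights is an active research problem.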
A lot of the technical concerns mentioned above sound abstract, but consider some of the use cases of training a neural net locally.
A team at Nanyang Technological University in Singapore this month used on-device learning to counter cyber threats by having each individual device train its own local version of an AI-based "intrusion-detection system" or IDS, which is a common cybersecurity program.
Instead of the client devices having to interact with a central server, the team was able to download an initial draft of the IDS code and then fine-tune it for local security conditions. Not only is such training more specific to a local security threat, it also prevents the passing of sensitive security information back and forth over the network, where it could be intercepted by malicious parties.
Apple is rumored to be eyeing greater on-board AI functionality for iOS devices and has offered clues to what could be completed in a mobile context.
In a paper in August, Apple scientists described a way to automatically learn the qualities of mobile apps, called the Never-ending UI Learner. The program runs on a smartphone and automatically presses buttons and undertakes other interactions to determine which kinds of controls a user interface requires.
The aim is to use each device to automatically learn, rather than relying on a bunch of human workers who spend their time pressing buttons and annotating app functions.
The experiment was undertaken in a controlled setting by Apple staff. If the trial were attempted in the wild using real customers' iPhones, then "a privacy-preserving approach would be needed (e.g., on-device training)," the authors write.
Another mobile-based concept was described by Apple scientists in 2022 in a paper titled "Training Large-Vocabulary Neural Language Models by Private Federated Learning for Resource-Constrained Devices".
Their goal was to train speech-recognition AI on mobile devices using the federated learning approach.
Each person's device uses samples of interactions with a "voice assistant" (probably Siri) to train the neural net. Then, the neural network parameters developed by each phone are sent to the network, where they're aggregated to make one improved neural net.
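The aggregation step can be sketched in a few lines. What follows is a generic weighted average of client parameters, the basic idea behind the widely used "federated averaging" recipe; it is an assumption made for illustration, not Apple's actual procedure, and it omits the privacy protections that the paper's title refers to. The function name and the sample numbers are hypothetical.

```python
# Generic federated-averaging sketch (hypothetical names and data).

def federated_average(client_updates):
    """Combine client parameters, weighting each client by its sample count.

    client_updates: list of (parameter_list, num_local_samples) pairs.
    """
    total_samples = sum(n for _, n in client_updates)
    num_params = len(client_updates[0][0])
    return [
        sum(params[i] * n for params, n in client_updates) / total_samples
        for i in range(num_params)
    ]

# Two phones report locally trained parameters; the second saw more data,
# so its parameters count for more in the combined model.
updates = [
    ([1.0, 2.0], 10),
    ([3.0, 4.0], 30),
]
global_params = federated_average(updates)
print(global_params)
```

Only the parameters travel over the network in this scheme. The voice samples used to produce them stay on each phone.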
The big takeaway from all these research efforts is that scientists are hard at work trying to find ways of compressing and dividing the work of training to make it feasible on battery-operated devices with less memory and less processing power than workstations and servers.
Whether this research effort breaks through in 2024 remains to be seen. However, what's already clear is that the training of neural networks is going to move out of the cloud and, quite possibly, into the palm of your hand.