Mirroring Microsoft’s efforts to develop compact yet capable AI language models, Apple has unveiled a collection of eight diminutive source-available AI language models collectively dubbed OpenELM, an acronym for “Open-source Efficient Language Models.” These models are specifically designed to operate directly on smartphones, a departure from the traditional approach of relying on cloud-based data centers for AI processing.

While these models are currently positioned as proof-of-concept research projects, they could form the foundation for future on-device AI offerings from the tech giant. Apple has made the source code for OpenELM available on the Hugging Face platform under an Apple Sample Code License. Because of restrictions in that license, it may not meet the commonly accepted definition of “open source,” though the source code itself remains publicly available.

Akin to Microsoft’s recently introduced Phi-3 models, which aim to deliver a practical level of language comprehension and processing performance in a compact package, Apple’s OpenELM models are engineered to run locally on devices. While Microsoft’s Phi-3-mini has 3.8 billion parameters (parameter count serves as a rough measure of an AI model’s capability and complexity), Apple’s OpenELM models are smaller still, ranging from 270 million to 3 billion parameters across the eight variants.

For context, the largest model in Meta’s Llama 3 family currently has a staggering 70 billion parameters, with a 400 billion-parameter version on the horizon. OpenAI’s groundbreaking GPT-3, released in 2020, shipped with an impressive 175 billion parameters. However, recent research efforts have focused on developing smaller AI language models that can match the capabilities of their larger counterparts from just a few years ago.

The eight OpenELM models come in two distinct variations: four “pretrained” versions (essentially raw, next-token-prediction models) and four instruction-tuned variants (fine-tuned for instruction following, which makes them better suited to powering AI assistants and chatbots). The models are:

OpenELM-270M
OpenELM-450M
OpenELM-1_1B
OpenELM-3B
OpenELM-270M-Instruct
OpenELM-450M-Instruct
OpenELM-1_1B-Instruct
OpenELM-3B-Instruct
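
For anyone who wants to experiment, below is a minimal sketch of loading one of the smaller instruction-tuned checkpoints with the Hugging Face transformers library. The repo ID, the need for trust_remote_code, and the pairing with a Llama-family tokenizer are assumptions based on the model names above and common Hugging Face conventions; the actual model cards may specify different details.

```python
# Minimal sketch: loading an OpenELM checkpoint via Hugging Face transformers.
# The repo id below is assumed from the model names listed above; the tokenizer
# pairing and the trust_remote_code flag are also assumptions and may need
# adjusting to match the actual model cards.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "apple/OpenELM-270M-Instruct"   # assumed Hugging Face repo id
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)

# Assumption: the release pairs OpenELM with a Llama-family tokenizer.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

prompt = "Write one sentence explaining on-device AI."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```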

OpenELM features a maximum context window of 2048 tokens, tokens being the word and character fragments that AI language models break text into for processing. The models were trained on publicly available datasets, including RefinedWeb, a deduplicated version of the Pile, a subset of RedPajama, and a subset of Dolma v1.6, totaling approximately 1.8 trillion tokens of data.
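
As a quick illustration of what that 2048-token window means in practice, the snippet below truncates an oversized prompt before it ever reaches the model; it reuses the (assumed) Llama tokenizer from the loading sketch above, and any Hugging Face tokenizer would behave the same way.

```python
# Sketch: keeping a prompt inside OpenELM's 2048-token context window.
from transformers import AutoTokenizer

MAX_CONTEXT = 2048                      # OpenELM's maximum context length
# Assumed tokenizer pairing; see the loading sketch above.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

long_document = "on-device AI " * 5000  # stand-in for an oversized input
inputs = tokenizer(
    long_document,
    truncation=True,
    max_length=MAX_CONTEXT - 64,        # leave headroom for generated tokens
    return_tensors="pt",
)
print(inputs["input_ids"].shape)        # no more than 1,984 tokens survive
```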

Apple’s approach with OpenELM incorporates a “layer-wise scaling strategy” that purportedly allocates parameters more efficiently across the model’s layers, saving computational resources and improving performance while requiring fewer pre-training tokens. According to Apple’s white paper, this strategy enabled OpenELM to achieve a 2.36 percent improvement in accuracy over Allen AI’s OLMo 1B, another small language model, while requiring only half as many pre-training tokens.
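
To make the concept concrete, here is a rough, purely illustrative sketch of the general idea behind layer-wise scaling: rather than giving every transformer layer an identical width, per-layer attention-head counts and feed-forward multipliers grow from smaller values in early layers to larger ones in later layers. The specific numbers and the linear schedule below are illustrative assumptions, not Apple’s exact configuration.

```python
# Illustrative sketch of layer-wise scaling: per-layer widths are interpolated
# across the depth of the network instead of being kept uniform. The ranges
# and the linear schedule are illustrative, not Apple's published settings.
def layerwise_widths(num_layers: int,
                     min_heads: int = 4, max_heads: int = 16,
                     min_ffn_mult: float = 1.0, max_ffn_mult: float = 4.0):
    configs = []
    for i in range(num_layers):
        t = i / max(num_layers - 1, 1)      # 0.0 at the first layer, 1.0 at the last
        heads = round(min_heads + t * (max_heads - min_heads))
        ffn_mult = min_ffn_mult + t * (max_ffn_mult - min_ffn_mult)
        configs.append({"layer": i,
                        "attention_heads": heads,
                        "ffn_multiplier": round(ffn_mult, 2)})
    return configs

# Print an example 8-layer configuration to see how the widths ramp up.
for cfg in layerwise_widths(num_layers=8):
    print(cfg)
```

The point of the exercise is simply to show how a fixed parameter budget can be spread unevenly across a network’s depth rather than duplicated identically in every layer.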

In addition to the OpenELM models themselves, Apple has also released the code for CoreNet, the library used to train OpenELM, along with reproducible training recipes that allow the neural network weights to be replicated. This level of transparency is relatively uncommon for a major tech company.

As stated in its OpenELM paper abstract, transparency is a key objective for the company: “The reproducibility and transparency of large language models are crucial for advancing open research, ensuring the trustworthiness of results, and enabling investigations into data and model biases, as well as potential risks.”

By releasing the source code, model weights, and training materials, Apple states that it aims to “empower and enrich the open research community.” However, the company also cautions that since the models were trained on publicly sourced datasets, “there exists the possibility of these models producing outputs that are inaccurate, harmful, biased, or objectionable in response to user prompts.”

While Apple has not yet integrated these new AI language model capabilities into its consumer devices, rumors suggest that the upcoming iOS 18 update, expected to be revealed at WWDC in June, may include new AI features that rely on on-device processing to preserve user privacy. The company may also partner with Google or OpenAI to handle more complex, off-device AI processing, giving Siri a long-overdue boost.

Despite the potential benefits of on-device AI processing, Apple’s caveat about inaccurate, harmful, biased, or objectionable outputs is a candid admission, one that underscores the ongoing challenges and ethical considerations surrounding the development and deployment of AI technologies.

As the race to develop smaller, more efficient, and more capable AI models intensifies, Apple’s OpenELM initiative positions the company as a formidable contender in this rapidly evolving landscape. By embracing transparency, reproducibility, and on-device processing, Apple aims to strike a balance between advancing AI capabilities and addressing privacy and ethical concerns, setting the stage for a potential paradigm shift in how AI is integrated into consumer devices.