
Despite the anticipation surrounding Personalized Siri, Apple announced that the feature's debut would be postponed until next year, following the roll-out of iOS 18.4. Even so, employee feedback suggests optimism within the company that it might still launch later this year. A recent report, along with a post from Apple's Machine Learning Research team, now reveals how the company is refining the training process behind Apple Intelligence.
Innovative Training Methods: How Apple Utilizes Synthetic Data While Maintaining User Privacy
Against the backdrop of the Personalized Siri delay, a report from Bloomberg sheds light on Apple's strategy for training its AI systems. The report points to a post on Apple's Machine Learning Research blog that explains how the company uses synthetic data to train its AI models.
Critics have long noted that Apple trails its competitors in AI, and its unconventional reliance on synthetic data is part of the reason. Because purely synthetic data does not reflect how people actually write, it struggles to capture the usage trends needed for features that demand thorough summarization or articulate writing, such as drafting lengthy emails.
To address these challenges, Apple has introduced an approach that compares its synthetic data against real user emails without ever exposing those emails, with the goal of making its models more effective at communication features. Apple describes the process as follows:
To improve our models we need to generate a set of many emails that cover topics that are most common in messages. To curate a representative set of synthetic emails, we start by creating a large set of synthetic messages on a variety of topics. For example, we might create a synthetic message, “Would you like to play tennis tomorrow at 11:30AM?”
This is done without any knowledge of individual user emails. We then derive a representation, called an embedding, of each synthetic message that captures some of the key dimensions of the message like language, topic, and length. These embeddings are then sent to a small number of user devices that have opted in to Device Analytics.
Participating devices then select a small sample of recent user emails and compute their embeddings. Each device then decides which of the synthetic embeddings is closest to these samples. Using differential privacy, Apple can then learn the most-frequently selected synthetic embeddings across all devices, without learning which synthetic embedding was selected on any given device.
These most-frequently selected synthetic embeddings can then be used to generate training or testing data, or we can run additional curation steps to further refine the dataset. For example, if the message about playing tennis is one of the top embeddings, a similar message replacing “tennis” with “soccer” or another sport could be generated and added to the set for the next round of curation (see Figure 1). This process allows us to improve the topics and language of our synthetic emails, which helps us train our models to create better text outputs in features like email summaries, while protecting privacy.
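To make the pipeline described above more concrete, here is a minimal, self-contained sketch in Python. It is not Apple's implementation: the embeddings are hash-seeded toy vectors standing in for a real sentence-embedding model, the scoring over a device's local sample is one plausible interpretation, and the privacy step uses k-ary randomized response as an illustrative local differential-privacy mechanism, since Apple's post does not specify the exact mechanism. All names (`embed`, `device_report`, `estimate_counts`) are hypothetical.

```python
import math
import random
import zlib
from collections import Counter

# Synthetic messages generated server-side, with no knowledge of user emails.
SYNTHETIC_MESSAGES = [
    "Would you like to play tennis tomorrow at 11:30AM?",
    "Here are the notes from today's project meeting.",
    "Your package has shipped and should arrive on Friday.",
    "Happy birthday! Hope you have a wonderful day.",
]

DIM = 16        # toy embedding dimension
EPSILON = 1.0   # illustrative local differential-privacy budget


def embed(text: str) -> list[float]:
    """Toy embedding: a deterministic pseudo-random vector per text.
    A real system would use a learned sentence-embedding model."""
    rng = random.Random(zlib.crc32(text.encode("utf-8")))
    return [rng.gauss(0.0, 1.0) for _ in range(DIM)]


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))


# ---- Device side -------------------------------------------------------------
def device_report(local_emails: list[str], synthetic_embeddings: list[list[float]],
                  epsilon: float = EPSILON) -> int:
    """Pick the synthetic embedding closest to this device's recent emails,
    then privatize the chosen index with k-ary randomized response so the
    server never learns any individual device's true selection."""
    scores = [sum(cosine(embed(e), s) for e in local_emails)
              for s in synthetic_embeddings]
    true_choice = max(range(len(scores)), key=scores.__getitem__)

    k = len(synthetic_embeddings)
    p_true = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if random.random() < p_true:
        return true_choice
    return random.choice([i for i in range(k) if i != true_choice])


# ---- Server side --------------------------------------------------------------
def estimate_counts(reports: list[int], k: int, epsilon: float = EPSILON) -> list[float]:
    """Debias the noisy reports to estimate how often each synthetic message
    was actually the closest one, without knowing any single device's answer."""
    n = len(reports)
    p = math.exp(epsilon) / (math.exp(epsilon) + k - 1)  # P(report = true index)
    q = (1.0 - p) / (k - 1)                              # P(report = any other index)
    observed = Counter(reports)
    return [(observed.get(j, 0) - n * q) / (p - q) for j in range(k)]


if __name__ == "__main__":
    synthetic_embeddings = [embed(m) for m in SYNTHETIC_MESSAGES]

    # Simulated opted-in devices; their emails never leave the device.
    devices = [
        ["Anyone up for tennis on Saturday morning?", "The court is booked for 10AM."],
        ["Tennis practice is moved to 9AM tomorrow."],
        ["Your order #1234 is out for delivery."],
    ] * 300  # many devices, so the differentially private estimate is usable

    reports = [device_report(emails, synthetic_embeddings) for emails in devices]
    estimates = estimate_counts(reports, k=len(SYNTHETIC_MESSAGES))

    for msg, est in sorted(zip(SYNTHETIC_MESSAGES, estimates), key=lambda t: -t[1]):
        print(f"{est:8.1f}  {msg}")
```

In the real pipeline, the top-ranked synthetic messages would then seed the next round of curation, for example by generating variants of the tennis message with other sports, as the excerpt above describes.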
Although Apple acknowledges the limitations of its current approach, the new technique promises a better understanding of user trends without compromising privacy or collecting sensitive information. According to Bloomberg, the improved functionality is expected to appear in the upcoming beta versions of iOS 18.5 and macOS 15.5. For further details, see Apple's full post on the topic.