Google’s Gemini AI Training Raises Eyebrows
A recent report from TechCrunch has shed light on how Google trains its Gemini AI. As part of the development process, Google enlists contractors to rate the model’s outputs on criteria such as accuracy, clarity, and safety. What makes the report noteworthy is that these contractors have been comparing Gemini’s responses against outputs from Anthropic’s large language model (LLM), Claude, as a benchmark.
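To give a sense of how such side-by-side ratings are typically structured, here is a short Python sketch. It is a hypothetical illustration, not Google’s actual tooling: the Comparison class, the 1–5 scale, and the rate/preferred helpers are assumptions; only the three criteria (accuracy, clarity, safety) come from the report.

```python
from dataclasses import dataclass, field

# Criteria drawn from the report: accuracy, clarity, and safety.
CRITERIA = ("accuracy", "clarity", "safety")

@dataclass
class Comparison:
    """One side-by-side rating task: the same prompt, two model outputs."""
    prompt: str
    output_a: str  # e.g. a response under evaluation
    output_b: str  # e.g. a benchmark response from another model
    scores: dict = field(default_factory=dict)

    def rate(self, criterion: str, score_a: int, score_b: int) -> None:
        """Record a rater's 1-5 scores for both outputs on one criterion."""
        if criterion not in CRITERIA:
            raise ValueError(f"unknown criterion: {criterion}")
        self.scores[criterion] = (score_a, score_b)

    def preferred(self) -> str:
        """Return which output won more criteria: 'A', 'B', or 'tie'."""
        wins_a = sum(a > b for a, b in self.scores.values())
        wins_b = sum(b > a for a, b in self.scores.values())
        if wins_a == wins_b:
            return "tie"
        return "A" if wins_a > wins_b else "B"

# A rater scores both outputs on each criterion, then the harness
# records which side the rater preferred overall.
task = Comparison(
    prompt="Summarize the causes of the 2008 financial crisis.",
    output_a="(model A's answer)",
    output_b="(model B's answer)",
)
task.rate("accuracy", 4, 5)
task.rate("clarity", 4, 3)
task.rate("safety", 5, 5)
print(task.preferred())  # -> "tie": one criterion each, safety even
```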
Unusual Findings During Comparisons
Contractors working in Google’s internal analysis platform recently noticed an unexpected issue: one output that appeared to come from Claude was presented as if it were Gemini’s response. Further scrutiny revealed that several responses bore Claude’s hallmarks, particularly its emphasis on safety. In one notable example, the model declined to role-play as another AI assistant, reflecting Claude’s characteristically cautious approach.
Implications of Anthropic’s Terms of Service
Complicating the situation further are Anthropic’s terms of service, which expressly prohibit customers from using Claude to build competing products or train competing AI models without Anthropic’s approval:
Use Restrictions. Customer may not and must not attempt to (a) access the Services to build a competing product or service, including to train competing AI models except as expressly approved by Anthropic; (b) reverse engineer or duplicate the Services; or (c) support any third party’s attempt at any of the conduct restricted in this sentence. Customer and its Users may only use the Services in the countries and regions Anthropic currently supports.
Google’s Non-committal Response
When questioned by TechCrunch about whether it had obtained permission from Anthropic to use Claude’s outputs, Google gave a less-than-definitive answer. Shira McNamara, a spokesperson for DeepMind, commented:
Of course, in line with standard industry practice, in some cases we compare model outputs as part of our evaluation process.
Navigating the AI Landscape
Cross-model comparison is a prevalent practice in AI development: companies routinely benchmark their models against rivals to gauge performance and guide improvements. Directly using or imitating another model’s outputs without explicit consent, however, raises significant ethical and contractual questions. The situation unfolds as the AI industry grows increasingly competitive. Google recently introduced an experimental version of Gemini that reportedly surpasses OpenAI’s GPT-4o in several evaluation categories. Meanwhile, Anthropic continues to advance Claude, with recent enhancements that enable more versatile conversational styles and an integrated tool for writing and executing JavaScript code directly within its interface.
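Claims that one model “surpasses” another in certain evaluation categories are typically backed by aggregating many pairwise judgments, like the hypothetical ratings sketched earlier, into per-side win rates. Here is a minimal sketch of that aggregation step; the win_rates function and the 'A'/'B'/'tie' labels are illustrative assumptions, not any vendor’s published methodology.

```python
from collections import Counter

def win_rates(preferences: list[str]) -> dict[str, float]:
    """Convert per-task preferences ('A', 'B', or 'tie') into the
    share of tasks each side won."""
    counts = Counter(preferences)
    total = len(preferences) or 1  # avoid division by zero on empty input
    return {side: counts[side] / total for side in ("A", "B", "tie")}

# Aggregate pairwise judgments into a leaderboard-style summary.
prefs = ["A", "A", "B", "tie", "A", "B"]
print(win_rates(prefs))  # {'A': 0.5, 'B': 0.333..., 'tie': 0.166...}
```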