Adaptive ML trains Gemma 3 for exceptional multilingual results
Adaptive ML helps SK Telecom create a version of Gemma that moderates customer support conversations at a fraction of the size, latency, and cost of larger models.
Adaptive ML is the team behind Adaptive Engine, a reinforcement learning platform that enables enterprises to fine-tune, evaluate, and serve small, specialized models. South Korean telecom giant SK Telecom chose Adaptive ML to train a multilingual customer service moderation LLM to support SKT's more than 23 million subscribers, who speak a mix of English and Korean.
After training with Adaptive Engine, Adaptive ML and SK Telecom found that Gemma 3 4B could meet or exceed the performance of both proprietary and open models at significantly lower cost, making it a compelling choice for their customer service use case.
The challenge
LLMs often struggle to stay compliant with a business’s unique content policies. In online customer service, those policies are what keep the environment safe and respectful for customers and employees alike. Models are used to identify and respond to harmful written content across customer service chats and emails, and every company has its own criteria and policies for such language. Teaching those nuances to off-the-shelf proprietary models is hard, so they often miss content that should be marked as adult, harmful, biased, or illegal.
That challenge is compounded for businesses like SK Telecom that need a multilingual solution, because many of the leading open and proprietary LLMs focus on Western languages like English and don’t perform as well in languages like Korean.
While some larger models might meet the bar for identifying harmful content, their parameter count drives up inference cost and latency. This led the Adaptive ML team to create Adaptive Engine, which gives enterprises more control and flexibility when training smaller models while also lowering infrastructure costs and latency.
SK Telecom’s evaluation, which demonstrates Gemma 3 4B’s strong Korean performance relative to its size in identifying harmful content.
For each model, the ability to recognize toxic content is measured across seven dimensions of “unsafe language,” and an aggregate score is calculated (0-1 scale, higher is better).
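To make the chart’s aggregate metric concrete, here is a minimal sketch of averaging per-dimension scores into a single 0-1 number. The seven dimension names and the values are illustrative placeholders, not SK Telecom’s published evaluation, and the averaging itself is an assumption about how such an aggregate could be computed.

```python
# Illustrative only: aggregate a model's moderation quality as the mean of
# seven per-dimension scores (0-1 scale, higher is better). Dimension names
# and values below are placeholders, not SK Telecom's actual evaluation.
from statistics import mean

dimension_scores = {
    "adult": 0.82,
    "harmful": 0.79,
    "biased": 0.77,
    "illegal": 0.81,
    "harassment": 0.80,
    "self_harm": 0.78,
    "profanity": 0.83,
}

aggregate = mean(dimension_scores.values())
print(f"Aggregate score: {aggregate:.2f}")  # e.g. 0.80
```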
The solution
Adaptive ML selected Gemma 3 4B for its size and performance, along with multiple other open models, to fine-tune for SK Telecom with Adaptive Engine. The models were first fine-tuned with supervised fine-tuning (SFT) on 8K Korean samples, then trained further with proximal policy optimization (PPO). A similarly sized English dataset for harmful content detection was used to train and assess the models’ ability to identify toxicity across both languages.
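The broad shape of that two-stage recipe can be sketched with the open-source TRL library; this is not Adaptive Engine’s implementation, and the dataset path below is a placeholder for the Korean moderation samples.

```python
# Minimal sketch of the two-stage recipe described above, using the open-source
# TRL library -- not Adaptive Engine's implementation. The dataset path is a
# placeholder for ~8K Korean moderation samples in chat ("messages") format.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Each record: {"messages": [{"role": "user", ...}, {"role": "assistant", ...}]}
dataset = load_dataset("json", data_files="korean_moderation_sft.jsonl", split="train")

# Stage 1: supervised fine-tuning (SFT) of Gemma 3 4B on the labeled samples.
trainer = SFTTrainer(
    model="google/gemma-3-4b-it",
    train_dataset=dataset,
    args=SFTConfig(output_dir="gemma3-4b-moderation-sft"),
)
trainer.train()

# Stage 2 (not shown): reinforcement learning with proximal policy optimization,
# e.g. trl.PPOTrainer, using a reward signal that scores moderation decisions.
```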
Working with Gemma 3 was a positive experience for the team. “We downloaded the model directly from Hugging Face and it was easy to convert to Adaptive ML’s internal format thanks to Gemma’s detailed documentation,” said Alessandro Cappelli, Co-Founder and Research Scientist at Adaptive ML. “The game-changer for our use case was the model's strong multilingual capabilities and long context windows.”
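For readers who want to start from the same checkpoint, here is a minimal sketch of pulling the instruction-tuned Gemma 3 4B model from Hugging Face and running a single text-only prompt with the transformers library. The moderation instruction and the Korean example message are illustrative placeholders, not SK Telecom’s policy or data, and this does not reflect Adaptive ML’s internal format.

```python
# Minimal sketch (assumes a recent transformers release with Gemma 3 support):
# load google/gemma-3-4b-it from Hugging Face and run one text-only prompt.
# The instruction and example message below are illustrative placeholders.
import torch
from transformers import pipeline

pipe = pipeline(
    "image-text-to-text",              # Gemma 3 4B checkpoints are multimodal
    model="google/gemma-3-4b-it",
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

messages = [
    {
        "role": "system",
        "content": [{"type": "text", "text": "Label the customer message as SAFE or UNSAFE "
                                             "(adult, harmful, biased, or illegal)."}],
    },
    {
        "role": "user",
        # Placeholder Korean customer message: "Hello, I'd like to ask about a refund."
        "content": [{"type": "text", "text": "안녕하세요, 환불 문의 드립니다."}],
    },
]

output = pipe(text=messages, max_new_tokens=16)
print(output[0]["generated_text"][-1]["content"])
```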
The impact
After training with Adaptive Engine, Adaptive ML evaluated the models using content moderation tests conducted in both Korean and English. And thanks to the SFT and PPO training, Gemma 3 4B now performs in Korean as well as it does in English, delivering the multilingual performance required to meet SK Telecom’s customer service needs.
Effective content moderation in Korean demands an understanding of cultural nuance where standard APIs fail. We were impressed that by using advanced reinforcement learning, a small, open 4B model could achieve precision that outperforms even large proprietary systems in both Korean and English. This provides a highly accurate, low-latency solution, giving us greater strategic control while keeping our customer data secure.
Eric Davis, Vice President of the AI Tech Collaboration Group at SK Telecom
In both English and Korean, Gemma 3 4B demonstrated the best performance relative to its size, beating open and proprietary models twice its size. For instance, in the Korean evaluation, Gemma 3 4B trained with PPO achieved a 0.80 aggregate score, exceeding GPT-4o and Claude 3.7 Sonnet (0.76 and 0.77, respectively) and almost matching a Llama 8B model trained with SFT alone.
“Gemma 3 4B was able to exceed frontier performance at a fraction of the size, offering a lower-latency, lower-cost alternative for customer support moderation,” said Cappelli.
What’s next
Reflecting on his experience with Gemma, Cappelli feels the team spent “too much time debating between the 12B and 27B versions, when, in the end, Gemma 4B worked surprisingly well for our use case,” and recommends that other developers “give small LLMs the opportunity; they’re likely to surprise you.”
Going forward, the team plans to keep working with Gemma. “These results are very exciting and validate the use of Gemma 4B for more use cases, particularly other customer support workflows,” concludes Cappelli.