KT Unveils Korea-Specific AI Benchmark Covering Rental Fraud and Dokdo Dispute
Developed with Korea University, Immediately Applicable to Other Cultures
Comprises About 14,000 Evaluation Samples, the Largest Scale in Korea
On June 16, KT released 'KSAFE-MM,' a benchmark that evaluates how safely multimodal large language models (MLLMs) respond to prompts grounded in Korean social issues and cultural context.
Co-developed with Korea University, the benchmark consists of two parts: 'KSAFE-MM-G,' which adapts globally common risks to the Korean cultural context, and 'KSAFE-MM-C,' which covers Korea-specific social issues such as rental fraud and the Dokdo dispute. Using a total of 14,135 evaluation samples, the largest number in Korea, KT evaluated 12 global MLLMs, including Google Gemma and Naver HyperCLOVA X.
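As a rough illustration of how results on the two parts described above might be summarized, the following Python sketch computes the share of responses judged safe for each part. The article does not name the metric KT used, so the "safe rate" here, the function name, and the sample judgments are all assumptions made up for illustration, not figures from KSAFE-MM.

```python
from collections import defaultdict


def safe_rate_by_subset(records):
    """Return {subset: fraction of responses judged safe}.

    `records` is an iterable of (subset, is_safe) pairs. The metric is a
    hypothetical stand-in; the article does not specify KT's actual metrics.
    """
    totals = defaultdict(int)
    safe = defaultdict(int)
    for subset, is_safe in records:
        totals[subset] += 1
        safe[subset] += int(is_safe)
    return {subset: safe[subset] / totals[subset] for subset in totals}


# Made-up judgments for one hypothetical model; not benchmark results.
records = [("G", True), ("G", True), ("G", False), ("C", True), ("C", False)]
rates = safe_rate_by_subset(records)
```

Reporting the two parts separately, rather than as one pooled score, would make it easier to see whether a model that handles globally common risks well still struggles with Korea-specific topics.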
A key feature of the benchmark is its automated, general-purpose pipeline. Existing benchmarks have relied on manual review, which is costly and inefficient. KSAFE-MM, by contrast, automates the entire process: collecting sensitive topics from local communities, generating template-based queries (the questions users enter into an AI model), synthesizing images, and generating jailbreak queries designed to circumvent an AI model's safety measures or ethical restrictions.
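The template-based query step in the pipeline described above can be pictured with a minimal Python sketch. It fills a small set of question templates with collected topics and keeps a label for which part of the benchmark each query belongs to. The topics, templates, and names such as `Query` and `generate_queries` are hypothetical and are not taken from KT's pipeline; the image synthesis and jailbreak stages are deliberately left out.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Query:
    topic: str
    subset: str  # "G" (global risks) or "C" (Korea-specific issues)
    text: str


# Hypothetical topics and templates, for illustration only.
TOPICS = {
    "C": ["rental fraud", "the Dokdo dispute"],
    "G": ["online scams"],
}
TEMPLATES = [
    "What does this image suggest about {topic}?",
    "Explain the situation shown here in relation to {topic}.",
]


def generate_queries(topics, templates):
    """Fill every template with every topic, keeping the subset label."""
    queries = []
    for subset, names in sorted(topics.items()):
        for topic, template in product(names, templates):
            queries.append(Query(topic, subset, template.format(topic=topic)))
    return queries


queries = generate_queries(TOPICS, TEMPLATES)
```

Because the templates are independent of the topic list, swapping in topics collected from another country's communities would produce a new query set without rewriting the generation code, which is the kind of reuse the article attributes to the pipeline.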
The result is a standardized framework for rapidly building safety benchmarks that reflect local characteristics, even without cultural experts, which reduces costs and increases efficiency. The joint KT–Korea University research team tested the pipeline's applicability beyond Korea through a pilot experiment (JSAFE-MM-C) adapted to Japanese, indicating that it can be applied readily to other cultural contexts.
KT expects the research results to be used for a range of purposes, including safety verification in live AI service environments, red-team testing, and guardrail model evaluation. The research findings and the benchmark are available on arXiv and Hugging Face, where anyone can access them.
KT played a central role throughout the research, with its dedicated RAI organization designing the safety risk classification system and implementing the evaluation metrics and logic. The same organization also recently released the multilingual text benchmark 'XL-SafetyBench.'
Jae-Hyung Park, Executive Director and Head of KT AX Frontier AI Lab, said, "The release of this safety benchmark is more than a distribution of data; it is about laying a foundation for the overall advancement of the AI safety research ecosystem. We hope KSAFE-MM will become the common standard for verifying AI safety in the Korean language and cultural context, both in academia and industry."
© The Asia Business Daily(www.asiae.co.kr). All rights reserved.