NCSoft Unveils VARCO-VISION, a Korean-Specialized Vision Language Model, and Five Multimodal Benchmarks

  • Author: Website Administrator
  • Date: 2024.12.05
  • Views: 658

Logo of VARCO, the foundation for NCSoft’s VLM VARCO-VISION | Image Courtesy of NCSoft


NCSoft (Co-CEOs Taekjin Kim and Byung-Moo Park) introduced VARCO-VISION, a medium-scale open-source Vision Language Model (VLM) optimized for Korean, along with five Korean multimodal benchmarks, on December 4.

A VLM is a language model that processes text and image inputs simultaneously. With growing interest in multimodal research, VLMs are gaining prominence. However, most open-source VLMs are designed for English and Chinese, offering limited Korean language support and forcing domestic enterprises to rely on APIs from global tech giants, such as GPT or Claude.

The newly unveiled VARCO-VISION supports Korean and English prompts and processes image inputs. Its language capabilities rival those of Large Language Models (LLMs), allowing developers to handle both image-text and text-only tasks with a single model. NCSoft emphasized that VARCO-VISION demonstrates top-tier performance among similarly scaled models, particularly in Korean. It excels at OCR, grounding, and referring expressions, delivering accurate image recognition and reasoning results.

AI service developers can use VARCO-VISION to create multimodal AI applications with features such as image recognition, Q&A, image description, OCR, and object location detection. Content creators can benefit by automating detailed image descriptions, speeding up data collection, and improving planning efficiency. The model is also expected to be integrated into NCSoft’s upcoming VARCO Studio.

NCSoft has also introduced five evaluation benchmarks to advance research on Korean AI models. These benchmarks address the lack of Korean-language multimodal benchmarks, which has made accurate performance assessment challenging.

The new benchmarks include Korean adaptations of three widely used English multiple-choice benchmarks (MMBench, SEED-Bench, and MMStar) and one subjective benchmark (LLaVA-in-the-Wild). Additionally, NCSoft developed its own benchmark, “K-DTCBench,” designed to assess comprehension of Korean documents, tables, and charts.

Yeonsu Lee, Head of NC Research, stated, “The release of VARCO-VISION and five Korean benchmarks underscores NCSoft’s leadership in the multimodal AI field. We plan to expand VLM applications to audio and video while enhancing content creation capabilities to support diverse industries.”
 
