Infofla: “VLM-based automation can overcome the limitations of existing AI that only answers”

Posted August 26, 2024 13:57

Updated September 03, 2024 13:35

Amazon Web Services (AWS), a leading player in the cloud market, held 'AWS Summit Seoul 2024' at COEX in Seoul on May 16 and 17. The event, which reviews the current state of the Korean cloud ecosystem and conveys AWS's vision, began in 2015 and marks its 10th year this year.

AWS Summit Seoul 2024 was the largest cloud-related event in Korea, with some 60 partner companies setting up booths and 29,000 people attending. The event introduced a number of technologies, products and services driving innovation in generative AI, which has recently emerged as the biggest topic in the market.

Choi In-mook, CEO of Infofla, presenting at AWS Summit (source: itdonga)

Success stories of AWS customers developing AI-related products and services in the AWS cloud ecosystem also stood out, and cases from startups and small and medium-sized enterprises drew attention as well. One of them was Infofla (CEO Choi In-mook), which presented on the theme 'Generative AI-Based IT Service Automation Technology and Cloud Service Development Journey' at the EXPO session on the 17th.

Choi In-mook, CEO of Infofla, began by outlining the potential and the limitations of RPA (Robotic Process Automation) solutions, which automate repetitive tasks. In particular, he noted that non-specialists need professional help to use existing script-based RPA smoothly. In addition, when RPA automates repetitive tasks on the web or in an app, it may stop because it cannot respond to problems such as a pop-up window suddenly appearing or a screen error.

VLM makes it possible to overcome the limitations of existing RPA. (source: infofla)

Infofla proposed a 'VLM (Vision Language Model)' as a way to overcome these limitations of existing RPA. A VLM adds image processing capabilities to an LLM (Large Language Model). It can overcome the limitations of existing RPA by handling every task through screen recognition rather than scripts and by supporting remote environments.

In addition, by recognizing the screen it can respond actively, as a person would, even in unexpected situations, and above all its capability improves as learning continues. For example, when a VLM-based system automates the task of repeatedly entering text into a web page, it can keep analyzing the screen and the situation even if a pop-up window appears mid-task, close the pop-up, and then enter the text into the text box. Existing RPA can also be scripted to handle such situations, but as the scripts grow, so does the likelihood of errors.

Infofla presented 'VLAgent (VLM + Agent)', which it developed in-house. This is an agent model that recognizes the screen and carries out commands through a VLM. No manual script writing is needed; the AI understands the screen and automatically creates and executes work plans and action plans.
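The observe, plan and act cycle described above can be illustrated with a minimal Python sketch. This is not Infofla's implementation: the screen is a toy object rather than real pixels, the VLM is replaced by simple stand-in functions, and every name here (Screen, describe, plan_next_action, run_agent, popup_at) is hypothetical and chosen only for illustration. It shows how an agent that re-reads the screen at every step can close an unexpected pop-up and resume text entry without a dedicated script branch for that case.

```python
from dataclasses import dataclass, field


@dataclass
class Screen:
    """Toy stand-in for a desktop screen; a real agent would capture pixels."""
    popup_open: bool = False
    text_box: list = field(default_factory=list)


def describe(screen):
    """Hypothetical stand-in for the VLM's screen recognition step."""
    return {"popup_open": screen.popup_open, "entered": len(screen.text_box)}


def plan_next_action(observation, pending):
    """Hypothetical stand-in for planning: pick one action per observation."""
    if observation["popup_open"]:
        return ("close_popup", None)
    if pending:
        return ("type_text", pending[0])
    return ("done", None)


def run_agent(screen, texts, popup_at=(), max_steps=50):
    """Observe, plan, act until all texts are entered or max_steps is reached.

    popup_at simulates the environment: a pop-up appears once when the
    number of entered texts equals one of these values.
    """
    pending = list(texts)
    triggers = set(popup_at)
    log = []
    for _ in range(max_steps):
        # Environment side: an unexpected pop-up may appear.
        entered = len(screen.text_box)
        if entered in triggers:
            screen.popup_open = True
            triggers.discard(entered)
        # Agent side: look at the screen, decide, then act.
        action, arg = plan_next_action(describe(screen), pending)
        log.append(action)
        if action == "close_popup":
            screen.popup_open = False
        elif action == "type_text":
            screen.text_box.append(pending.pop(0))
        else:
            break
    return log
```

Calling run_agent(Screen(), ["a", "b", "c"], popup_at={1}) makes a pop-up appear after the first entry; the returned log shows the agent closing it and then entering the remaining text.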

This brings together recent trends such as LLMs, AI agents, and business automation. Unlike existing AI services that simply answer user questions, it can solve problems by taking action. Existing models, however, required excessive processing power to recognize and respond to high-resolution images, and they were inconvenient for Korean users because they did not support Hangul.

Configuration of 'VLAgent' (source: infofla)

Choi emphasized that, to solve these problems, Infofla developed its own solution that supports 4K high resolution and Hangul, runs on ordinary PCs, and includes data learning functions. VLAgent has been deployed on AWS together with 'RPACA', Infofla's real-time object recognition RPA, and can be used through 'ITOMS', Infofla's AI-based integrated management system.

Choi In-mook, CEO of Infofla, introducing the demo video of VLAgent (source: itdonga)

During the presentation, Choi showed a demonstration video. When he entered the command "Tell me the route from Konkuk University Station to Gangnam Station" into VLAgent on a Windows PC, the AI took control of the mouse and keyboard, launched the Chrome web browser on the desktop, opened Google Maps, and automatically entered the starting point (Konkuk University Station) and destination (Gangnam Station) to search for the route.

Choi said, "Our solution can be used in various fields such as service and business automation, support for the blind, entertainment business, education and manufacturing, medical care, and customer service," adding, "In fact, it is the first attempt of its kind in Korea, and it is hard to find a similar solution overseas."

BY Kim Yong-Woo (pengo@itdonga.com)
