Artificial intelligence can produce answers that seem accurate but rely on outdated information. A South Korean research team has developed a method to identify and filter out these persistent timing-related errors.
On Tuesday, KAIST said a team led by professor Hwang Eui-jong of the School of Electrical Engineering, working with Microsoft Research, has created an evaluation system that automatically detects time-related errors in large language models.
AI systems are often effective at determining what is correct, but they struggle to judge whether an answer is accurate at a specific point in time. For example, when asked who was appointed minister last month, a chatbot may name someone from a year earlier. When asked for the current won-dollar exchange rate, it may return figures from months ago.
As large language models are used more widely in fields such as medicine and law, ensuring the reliability of their responses has become increasingly important. Existing evaluation methods, however, have focused mainly on factual accuracy and have not adequately addressed whether information is up to date.
To address the issue, the research team introduced a “temporal database” into AI evaluation, allowing systems to track how information evolves over time. The approach goes beyond verifying whether an answer is factually correct, also assessing whether the dates and timeframes it cites are accurate.
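The core idea can be sketched in a few lines: each fact in the database carries a validity interval, and an answer counts as temporally correct only if it matches the fact that was valid at the time the question refers to. The record layout, names, and data below are illustrative assumptions, not details of the KAIST-Microsoft framework:

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class FactRecord:
    """One entry in a hypothetical temporal database."""
    subject: str
    value: str
    valid_from: date
    valid_to: date  # exclusive upper bound; date.max marks a still-current fact

def is_temporally_correct(answer: str, subject: str, as_of: date,
                          db: list[FactRecord]) -> bool:
    """True only if `answer` matches the fact valid on the `as_of` date."""
    for rec in db:
        if rec.subject == subject and rec.valid_from <= as_of < rec.valid_to:
            return rec.value == answer
    return False  # no record covers that date

# Illustrative data: the ministerial post changed hands in mid-2024.
db = [
    FactRecord("minister", "Kim", date(2023, 1, 1), date(2024, 6, 1)),
    FactRecord("minister", "Lee", date(2024, 6, 1), date.max),
]

# "Kim" is correct for a 2023 question but a temporal hallucination for 2025.
print(is_temporally_correct("Kim", "minister", date(2023, 5, 1), db))  # True
print(is_temporally_correct("Kim", "minister", date(2025, 1, 1), db))  # False
```

A detector built this way flags answers like the exchange-rate example above: the figure may once have been right, but its validity interval does not cover the date the question asks about.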
Using this method, the team improved detection of so-called temporal hallucinations, cases where responses appear correct but rely on faulty time references, by an average of 21.7 percent compared with conventional approaches.
Hwang said the framework could serve as a practical basis for evaluating AI performance in specialized fields. “By turning large-scale expert data into evaluation resources, we expect this approach to support more reliable validation of AI systems in areas such as healthcare and law,” he said.
Han Chae-yeon chaezip@donga.com