In recent years, advancements in Natural Language Generation (NLG) using deep learning have greatly improved fluency and coherence in tasks like summarization and dialogue generation. However, these models can generate hallucinated text: output that reads fluently but is unfaithful to the source or factually wrong.
Hallucinations fall into two categories: intrinsic hallucination, where the generated text contradicts the source content, and extrinsic hallucination, where the text adds claims that cannot be verified from the source. The two need to be treated differently, with distinct mitigation strategies.
Several studies have discussed metrics, mitigation methods, and task-specific progress in avoiding hallucinated text. Most methods for mitigating hallucination in machine translation aim either to reduce dataset noise or to alleviate exposure bias. Vision-language models suffer from the object hallucination problem, and researchers are still working toward more effective evaluation metrics.
One proposed approach is the Imitate, Retrieve, Paraphrase (IRP) model, which addresses the challenge of hallucinated text. Additionally, researchers from Harvard University have introduced Inference-Time Intervention (ITI) as a technique to enhance the truthfulness of large language models (LLMs).
ITI works by modifying the model's activations during inference: it shifts the outputs of a small number of attention heads along directions that have been found to correlate with truthful answers. Having identified those truth-correlated heads, the researchers nudge the model's activations along these directions at every decoding step, repeating the intervention until the full response is generated.
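To make the mechanism concrete, the sketch below shows one way such an intervention could be wired into a PyTorch decoder-only model, using a forward pre-hook on an attention output projection. The layer path, the head_dirs and alpha names, and the hook point are assumptions for illustration, not the authors' released implementation.

```python
# A minimal sketch of an ITI-style intervention, assuming a decoder-only transformer
# whose attention output projection receives the concatenated per-head activations.
# All names (make_iti_hook, head_dirs, alpha) are illustrative, not the authors' code.
import torch

def make_iti_hook(head_dirs, alpha, num_heads, head_dim):
    """Build a forward pre-hook that shifts selected attention-head activations.

    head_dirs: dict mapping head index -> (unit direction [head_dim], std scalar sigma)
    alpha: intervention strength; 0 leaves the model untouched.
    """
    def hook(module, inputs):
        hidden = inputs[0]                                  # [batch, seq, num_heads * head_dim]
        b, s, _ = hidden.shape
        heads = hidden.reshape(b, s, num_heads, head_dim).clone()
        for h, (direction, sigma) in head_dirs.items():
            # Push this head's activation along its truth-correlated direction,
            # scaled by the activation's standard deviation and the strength alpha.
            shift = alpha * sigma * direction.to(device=heads.device, dtype=heads.dtype)
            heads[:, :, h, :] += shift
        return (heads.reshape(b, s, -1),) + inputs[1:]
    return hook

# Illustrative wiring: register the hook on each selected layer's attention output
# projection before generation; it then fires at every decoding step.
# handle = model.layers[12].self_attn.o_proj.register_forward_pre_hook(
#     make_iti_hook(head_dirs, alpha=15.0, num_heads=32, head_dim=128))
```

Because the hook runs on every forward pass, the same shift is applied token by token during autoregressive generation, which is exactly what "repeating the intervention until the full response is generated" means in practice.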
Applying ITI substantially improves the truthfulness of LLMs. The researchers tested an instruction-finetuned LLM, Alpaca, on the TruthfulQA benchmark, which measures how truthfully language models answer questions designed to elicit common misconceptions. Without ITI, Alpaca achieved a truthfulness score of 32.5%; with ITI, its score rose to 65.1%.
ITI differs from existing techniques such as Reinforcement Learning from Human Feedback (RLHF) in that it is far less computationally demanding and does not require extensive training or annotation resources. RLHF fine-tunes pretrained language models with reinforcement learning against human (or AI) preference judgments; because it optimizes for pleasing annotators, it raises concerns about incentivizing deceptive behavior.
The researchers also identified a trade-off between helpfulness and honesty in LLMs: pushing the intervention too hard improves truthfulness but can degrade how useful the responses are. By tuning the intervention strength, they struck a balance, achieving the desired level of truthfulness without sacrificing overall utility.
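As a rough illustration of how that balance might be tuned, the snippet below sweeps the strength parameter (the alpha in the earlier sketch) and keeps the value with the best combined score. The scoring functions are hypothetical placeholders for whatever truthfulness and helpfulness evaluations a practitioner runs, e.g. a TruthfulQA-style judge and a fluency or utility metric on held-out prompts.

```python
# A hedged sketch of tuning the intervention strength; score_truthfulness and
# score_helpfulness are hypothetical callables supplied by the practitioner.

def pick_alpha(alphas, score_truthfulness, score_helpfulness):
    best_alpha, best_score = None, float("-inf")
    for alpha in alphas:
        truth = score_truthfulness(alpha)   # higher = more truthful answers
        utility = score_helpfulness(alpha)  # higher = answers stay useful and fluent
        combined = truth + utility          # equal weighting, purely for illustration
        if combined > best_score:
            best_alpha, best_score = alpha, combined
    return best_alpha

# Example call: pick_alpha([0.0, 5.0, 10.0, 15.0, 20.0], judge_fn, utility_fn)
```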
ITI offers several advantages: it requires minimal adjustments to the model's architecture or training process, making it minimally invasive; it is computationally inexpensive, enabling practical use in real-world applications; and it is data efficient, needing only a few hundred examples to identify truthful directions.
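The data-efficiency point can be illustrated with the probing step: given head activations collected on a few hundred labeled QA examples, a simple classifier per head is enough to rank heads and extract a candidate "truthful" direction. The sketch below is an assumption-laden simplification (training-set accuracy instead of a held-out split, mean-difference directions), not the paper's exact recipe.

```python
# Sketch of probing attention heads for truth-correlated directions.
# acts: precomputed activations, array [num_examples, num_layers, num_heads, head_dim];
# labels: array with 1 for truthful answers, 0 for untruthful. Names/shapes are assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression

def find_truthful_heads(acts, labels, top_k=48):
    n, L, H, D = acts.shape
    ranked = []
    for layer in range(L):
        for head in range(H):
            X = acts[:, layer, head, :]
            probe = LogisticRegression(max_iter=1000).fit(X, labels)
            acc = probe.score(X, labels)          # in practice, score on a held-out split
            # Direction separating truthful from untruthful activations
            # (mean difference, normalized to unit length).
            direction = X[labels == 1].mean(axis=0) - X[labels == 0].mean(axis=0)
            direction = direction / (np.linalg.norm(direction) + 1e-8)
            ranked.append((acc, (layer, head), direction))
    ranked.sort(key=lambda r: r[0], reverse=True)  # keep the heads whose probes separate best
    return ranked[:top_k]
```

The returned (layer, head) pairs and directions are what an intervention hook like the one sketched earlier would consume.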
A comparison between the unmodified LLM and the ITI-steered model shows their contrasting responses. For example, when asked what scholars believed about the Earth's shape during the Middle Ages, the baseline LLM answered "flat" (echoing a popular misconception), while the ITI-steered model answered that scholars thought it was spherical. Similarly, when asked about disagreements with friends, the baseline model offered no comment, whereas ITI provided a substantive answer.
Overall, ITI is a promising, lightweight technique for improving the truthfulness of LLMs, offering the potential for more accurate and trustworthy outputs.
REFERENCES
Balepur N. Aligning language models with factuality and truthfulness. Bachelor of Science thesis, University of Illinois at Urbana-Champaign; 2023.
Ji Z, Lee N, Frieske R, Yu T, Su D, Xu Y, Ishii E, Bang YJ, Madotto A, Fung P. Survey of hallucination in natural language generation. ACM Computing Surveys. 2023 Mar 3;55(12):1-38.
Li K, Patel O, Viégas F, Pfister H, Wattenberg M. Inference-time intervention: Eliciting truthful answers from a language model. arXiv preprint arXiv:2306.03341. 2023 Jun 6.