Towards a Better Understanding of In-Context Learning in Large Language Models
- Subject: Large Language Models
- Supervisor: Danni Liu / Maike Züfle
- Person in Charge: Florian Raith
Keywords:
large language models; in-context learning; interpretability; few-shot learning; multimodality
Description:
One of the intriguing properties of large language models (LLMs) is their ability to perform few-shot learning “in-context”: LLMs can learn from a few input-output pairs (“few-shot” examples) provided at inference time, without any updates to the model weights. Despite well-documented performance gains, the mechanism behind in-context learning (ICL) is not yet fully understood. Existing research focuses primarily on classification tasks, with limited exploration of generation tasks, of the influence of domain and text style, and of multimodal inputs (e.g., speech or images).
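For illustration, the minimal sketch below shows what ICL looks like in practice: the demonstrations live entirely in the prompt, and no gradient update is performed. The setup is an assumption for illustration only (Hugging Face transformers with GPT-2 as a stand-in model; the sentiment task and demonstrations are hypothetical).

```python
# Minimal few-shot (in-context) prompting sketch. The model "learns" the task
# only from the demonstrations in the prompt; its weights are never updated.
# Model, task, and demonstrations are illustrative assumptions.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

# Two input-output demonstrations followed by the query to be completed.
prompt = (
    "Review: The plot was predictable and dull.\nSentiment: negative\n\n"
    "Review: A moving, beautifully acted film.\nSentiment: positive\n\n"
    "Review: I would happily watch it again.\nSentiment:"
)

output = generator(prompt, max_new_tokens=2, do_sample=False)
print(output[0]["generated_text"])
```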
On classification tasks, Min et al. (2022) argue that in-context examples primarily help align LLMs to the task structure: they found that randomly replacing the labels of the in-context examples had minimal impact on classification performance. For generation tasks such as translation and summarization, on the other hand, prior work is limited (Zhang et al., 2023; Sia et al., 2024), and the impact of domain and text style has not been fully explored. Existing analyses also do not consider other modalities (e.g., speech or images), which are becoming increasingly common in current LLMs.
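To make the Min et al. (2022) ablation concrete, here is a hedged sketch; the demonstrations, label set, and prompt template are illustrative assumptions, not the paper's exact setup. Each demonstration keeps its input but receives a label drawn uniformly at random, and the prompt is rebuilt from these corrupted pairs.

```python
import random

# Hypothetical sentiment demonstrations; data and label set are illustrative.
demos = [
    ("The plot was predictable and dull.", "negative"),
    ("A moving, beautifully acted film.", "positive"),
    ("The pacing dragged in the second half.", "negative"),
]
label_set = ["positive", "negative"]

def build_prompt(pairs, query):
    """Format demonstration pairs and the query in a simple few-shot template."""
    parts = [f"Review: {x}\nSentiment: {y}" for x, y in pairs]
    parts.append(f"Review: {query}\nSentiment:")
    return "\n\n".join(parts)

# Min et al.-style ablation: keep inputs, replace gold labels with random ones.
corrupted = [(x, random.choice(label_set)) for x, _ in demos]
print(build_prompt(corrupted, "I would happily watch it again."))
```

Comparing accuracy with gold versus randomized labels then indicates how much the model relies on the demonstrated input-label mapping, as opposed to the task format alone.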
The goal of this thesis is to contribute to a more comprehensive understanding of ICL by delving into these underexplored areas.
Please feel free to contact us for an initial meeting to discuss detailed approaches.
Requirements:
- Strong programming and debugging skills;
- Knowledge of Python and PyTorch;
- Knowledge of machine learning.
Literature:
Min et al., 2022. Rethinking the Role of Demonstrations: What Makes In-Context Learning Work? EMNLP.
Zhang et al., 2023. Prompting Large Language Model for Machine Translation: A Case Study. ICML.
Sia et al., 2024. Where does In-context Translation Happen in Large Language Models? Preprint.