State-of-the-art NLP technologies such as neural question answering or information retrieval systems have enabled many people to access information efficiently. However, these advances have been made in an English-first way, leaving other languages behind. Large-scale multilingual pre-trained models have achieved significant performance improvements on many multilingual NLP tasks where input text is provided. Yet, on knowledge-intensive tasks that require retrieving knowledge and generating output, we observe limited progress. Moreover, in many languages, existing knowledge sources are critically limited. This workshop addresses challenges for building information access systems in many languages. In particular, we attempt to discuss several core challenges in this field, e.g.:
We cover diverse cross-lingual knowledge-intensive NLP tasks, such as cross-lingual question answering, information retrieval, fact verification, and information extraction. By grouping these tasks under the common theme of cross-lingual information access, we encourage the communities to work together towards building a general framework that supports multilingual information access.
Our first track seeks submissions in broad areas relevant to multilingual information access. Our main focus areas include:
We will also feature system descriptions from our shared task, which we expect to highlight the primary challenges in cross-lingual, knowledge-dependent NLP. We also encourage submissions on related topics, including:
Our second track provides a venue for well-curated multilingual datasets, even when they focus on one or two languages. We are considering releasing the collected datasets to the research community after carefully examining their quality. Submission guidelines follow those of the NeurIPS 2021 Datasets and Benchmarks Track.
We will hold two sessions. Both will likely be virtual-only. Session 1: Building Multilingual Resources will feature 3-4 expert speakers, who will each deliver a 15-minute talk, followed by a panel discussion. Session 2: Modeling for Low and Medium Resource Languages will feature 3-4 expert speakers, who will each deliver a 15-minute talk, followed by a panel discussion.