Introduction: Benchmarking Cross-Task Generalization
The goal of the Natural-Instructions project is to provide a high-quality benchmark for measuring generalization to unseen tasks. This generalization hinges upon (and benefits from) understanding and reasoning with natural language instructions that plainly and completely describe a task (traditionally defined as mapping an input string to an output string). A model that truly "understands" language instructions should be able to solve any unseen task, provided it is given that task's instructions.
Explore the data
You can explore the content of each task using the following interface:
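Each task also ships as a standalone JSON file, so you can inspect it programmatically. Below is a minimal sketch, assuming the v2 field names ("Definition", "Positive Examples", "Instances") and an example task file:

```python
import json

# Load one task file from the dataset's "tasks/" directory
# (the file name below is just an example task).
with open("tasks/task001_quoref_question_generation.json") as f:
    task = json.load(f)

# A v2 task bundles a natural language instruction ("Definition"),
# worked demonstrations, and the labeled instances themselves.
print(task["Definition"][0])                  # the instruction
print(task["Positive Examples"][0]["input"])  # a demonstration input
instance = task["Instances"][0]
print(instance["input"], "->", instance["output"])
```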
Download the data
We have built two datasets for studying this goal. Our v1.x dataset leveraged the crowdsourcing templates of existing NLP datasets and consists of 61 tasks. The v2.x dataset builds upon this earlier work, has a simpler schema, and contains over 1.5k tasks. We collected these instructions with the help of many generous community contributors. The figure below compares the schemas of the two datasets:
You can download the data (the instructions for each task and their instances) from the following link:
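You can also fetch individual task files programmatically. Here is a minimal sketch, assuming the layout of the public allenai/natural-instructions GitHub repository (one JSON file per task under tasks/; branch and file names may change):

```python
import json
from urllib.request import urlopen

# Fetch a single task file from the GitHub repository (layout assumed
# from the public allenai/natural-instructions repo).
url = ("https://raw.githubusercontent.com/allenai/natural-instructions/"
       "master/tasks/task001_quoref_question_generation.json")
with urlopen(url) as resp:
    task = json.load(resp)

print(task["Categories"], len(task["Instances"]))
```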
Besides size, the two datasets also differ in the complexity of their schemas. The first version has a more granular schema (figure below; left), whereas the second version has a simpler representation (figure below; right). The purpose of this follow-up work was to study the effect of scale, and we therefore sacrificed schema granularity in favor of dataset scale.
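To make the contrast concrete, here is a minimal sketch of how a v2 task can be flattened into a single in-context prompt: the definition, a few demonstrations, then the unseen instance. The field names follow the v2 task files; the exact prompt template here is an illustrative assumption, not one fixed by the benchmark:

```python
def build_prompt(task: dict, instance: dict, k: int = 2) -> str:
    """Flatten a v2 task into one prompt: definition, k demonstrations,
    then the unseen instance (template is illustrative, not canonical)."""
    parts = [task["Definition"][0]]
    for ex in task["Positive Examples"][:k]:
        parts.append(f"Input: {ex['input']}\nOutput: {ex['output']}")
    parts.append(f"Input: {instance['input']}\nOutput:")
    return "\n\n".join(parts)
```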
Relevant Papers
Here are the relevant papers:
- Natural Instructions V1.x: Swaroop Mishra, Daniel Khashabi, Chitta Baral, Hannaneh Hajishirzi (2022). Cross-task generalization via natural language crowdsourcing instructions. ACL 2022.

    @inproceedings{naturalinstructions,
      title={Cross-task generalization via natural language crowdsourcing instructions},
      author={Mishra, Swaroop and Khashabi, Daniel and Baral, Chitta and Hajishirzi, Hannaneh},
      booktitle={ACL},
      year={2022}
    }
- Natural Instructions V2.x: Yizhong Wang, Swaroop Mishra, Pegah Alipoormolabashi, Yeganeh Kordi et al. (2022). Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ Tasks. EMNLP 2022.

    @inproceedings{supernaturalinstructions,
      title={Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ Tasks},
      author={Wang, Yizhong and Mishra, Swaroop and Alipoormolabashi, Pegah and Kordi, Yeganeh and Mirzaei, Amirreza and Arunkumar, Anjana and Ashok, Arjun and Dhanasekaran, Arut Selvan and Naik, Atharva and Stap, David and others},
      booktitle={EMNLP},
      year={2022}
    }