Public Accessibility
Finally, in this GitHub repository, we showcase scripts for performing inference with T0 on one or multiple GPUs, along with instructions for reproducing the training and evaluation reported in our paper (Sanh et al., 2021).
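For a sense of what basic usage looks like, here is a minimal sketch of single-GPU (or CPU) zero-shot inference with a public T0 checkpoint through the Hugging Face transformers library. The checkpoint name refers to the 3-billion-parameter variant on the Hugging Face Hub, and the prompt is only an illustrative example; the repository scripts cover multi-GPU inference and evaluation in full.

```python
# Minimal sketch: zero-shot inference with a public T0 checkpoint.
# The repository scripts additionally handle multi-GPU setups and evaluation.
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

checkpoint = "bigscience/T0_3B"  # 3B-parameter variant; "bigscience/T0pp" is the 11B model

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)
model.eval()

# An illustrative zero-shot prompt; any natural-language task description works.
prompt = "Review: The pasta was bland and the service was slow. Is this review positive or negative?"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=20)

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

For the 11B T0pp model, the multi-GPU scripts in the repository are the more practical route, since the full checkpoint may not fit on a single consumer GPU.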
Conclusion
The ability to generalize to new tasks is the cornerstone of a general AI model. We are excited about T0 because we show that it is possible to train a much smaller language model whose zero-shot generalization is comparable to that of models with hundreds of billions of parameters. We showcase how T0 can be applied to cooking recommendations and answering questions about world knowledge, and we look forward to seeing more novel applications and further research on zero-shot learning.
Acknowledgments
We would like to acknowledge the co-authors of this blog post: Yong Zheng-Xin, Victor Sanh, and Steven Liu.
Thanks to those who provided ideas for applications of T0: Colin Raffel, Victor Sanh,
Lintang Sutawika, Zaid Alyafeai, M Saiful Bari, Yong Zheng-Xin, and Albert Webson.
Thanks to those who contributed prompts and figures: Eliza Szczechla, Stella Biderman, and
Colin Raffel.
Thanks to the prompt-engineering subgroup at BigScience for creating T0 and providing
feedback on the blog post.
References
[1] BIG-bench collaboration. “Beyond the imitation game: Measuring and extrapolating the
capabilities of language models.” In preparation, 2021.
[2] Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. “Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer.” In Journal of Machine Learning Research, 2020.
[3] Dan Hendrycks, Collin Burns, Steven Basart, Andy Zou, Mantas Mazeika, Dawn Song, and
Jacob Steinhardt. “Measuring Massive Multitask Language Understanding.” In Proceedings of the
International Conference on Learning Representations (ICLR), 2021.
[4] Michał Bień, Michał Gilski, Martyna Maciejewska, Wojciech Taisner, Dawid Wisniewski,
and Agnieszka Lawrynowicz. “RecipeNLG: A cooking recipes dataset for semi-structured text generation.” In
Proceedings of the 13th International Conference on Natural Language Generation, 2020.
[5] Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla
Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel
Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel Ziegler, Jeffrey Wu,
Clemens Winter, Chris Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark,
Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, and Dario Amodei. “Language Models are
Few-Shot Learners.” In Advances in Neural Information Processing Systems, 2020.
[6] Victor Sanh, Albert Webson, Colin Raffel, Stephen H. Bach, Lintang Sutawika, Zaid
Alyafeai, Antoine Chaffin, Arnaud Stiegler, Teven Le Scao, Arun Raja, Manan Dey, M Saiful Bari, Canwen Xu,
Urmish Thakker, Shanya Sharma Sharma, Eliza Szczechla, Taewoon Kim, Gunjan Chhablani, Nihal Nayak, Debajyoti
Datta, Jonathan Chang, Mike Tian-Jian Jiang, Han Wang, Matteo Manica, Sheng Shen, Zheng Xin Yong, Harshit
Pandey, Rachel Bawden, Thomas Wang, Trishala Neeraj, Jos Rozen, Abheesht Sharma, Andrea Santilli, Thibault
Fevry, Jason Alan Fries, Ryan Teehan, Stella Biderman, Leo Gao, Tali Bers, Thomas Wolf, and Alexander M.
Rush. “Multitask Prompted Training Enables Zero-Shot Task Generalization.” Preprint
(arXiv:2110.08207), 2021.