This page lists all of my publications. Alternatively, you can visit my profiles on Google Scholar, Semantic Scholar, and Tilburg University. Since the field of computational linguistics is conference-driven, most of my work appears in conference proceedings, which are available through the ACL Anthology.
For my recent book on content analysis, see this page.
2025
van Miltenburg, Emiel
Werkboek Inhoudsanalyse Book
Open Press Tilburg University, 2025, ISBN: 9789403769509.
@book{werkboek_inhoudsanalyse,
title = {Werkboek Inhoudsanalyse},
author = {Emiel van Miltenburg},
url = {https://tiu.trialanderror.org/projects/werkboek-inhoudsanalyse},
isbn = {9789403769509},
year = {2025},
date = {2025-05-15},
publisher = {Open Press Tilburg University},
abstract = {Werkboek Inhoudsanalyse is een praktische handleiding om te leren hoe je op een betrouwbare en verantwoordelijke manier kunt analyseren hoe mensen met elkaar communiceren. Dit boek richt zich op het ontwerpen en uitvoeren van studies die gebruik maken van kwantitatieve inhoudsanalyse als onderzoeksmethode. Deze eerste editie is ontwikkeld in het kader van de opleiding Communicatie- en Informatiewetenschappen (CIW) van Tilburg University. Binnen één semester schrijven studenten een onderzoeksvoorstel en voeren ze het voorstel vervolgens ook uit. Deze groepsopdracht is volledig uitgewerkt in de bijlagen van het boek.},
keywords = {},
pubstate = {published},
tppubtype = {book}
}
Miltenburg, Emiel
Wat zou een grotere prioriteit moeten krijgen om Large Language Models (LLM’s) zoals ChatGPT veilig en betrouwbaar verder te ontwikkelen? Miscellaneous
2025, (This article contains my response to a question submitted by a member of the public. The AI helpdesk contacted me to answer this question.).
@misc{2b10dce55be94ce2855e2c908d119e27,
title = {Wat zou een grotere prioriteit moeten krijgen om Large Language Models (LLM’s) zoals ChatGPT veilig en betrouwbaar verder te ontwikkelen?},
author = {Emiel Miltenburg},
url = {https://ikhebeenvraagoverai.nl/answers/wat-zou-een-grotere-prioriteit-moeten-krijgen-om-large-language-models-llms-zoals-chatgpt-veilig-en-betrouwbaar-verder-te-ontwikkelen/},
year = {2025},
date = {2025-02-11},
urldate = {2025-02-11},
publisher = {Utrecht University},
abstract = {De volledige vraag is: Wat zou een grotere prioriteit moeten krijgen om Large Language Models (LLM’s) zoals ChatGPT veilig en betrouwbaar verder te ontwikkelen: Meer onderzoek naar het ontstaan en de betekenis van hallucinaties en confabulaties door LLM’s? Of meer onderzoek naar methoden en nieuwe technologie om het ontstaan van confabulaties en hallucinaties in LLM’s volledig tegen te gaan? Deze vraag is gebaseerd op vijf onzekere aannames: 1. Dat hallucinaties schadelijk zijn. Dit is context-afhankelijk. 2. Dat hallucinaties door grote taalmodellen volledig te vermijden zijn. Dat is een onhaalbaar ideaal. 3. Dat de twee oplossingen (beter begrijpen hoe hallucinaties ontstaan en het ontwikkelen van technologie om dit tegen te gaan) onafhankelijk van elkaar zijn. Maar zonder begrip van het probleem los je niets op. 4. Dat er een objectieve keuze te maken valt tussen de twee genoemde oplossingen. 5. Dat grote taalmodellen een onvermijdelijke technologische ontwikkeling zijn. Verschillende onderzoekers stellen dat deze taalmodellen onethisch, fundamenteel beperkt, of allebei zijn.},
note = {This article contains my response to a question submitted by a member of the public. The AI helpdesk contacted me to answer this question.},
keywords = {},
pubstate = {published},
tppubtype = {misc}
}
Miltenburg, Emiel
Dual use issues in the field of Natural Language Generation Miscellaneous
2025.
@misc{c2d7a18caa5448c997e8011c00b124d4,
title = {Dual use issues in the field of Natural Language Generation},
author = {Emiel Miltenburg},
url = {https://arxiv.org/abs/2501.06636},
year = {2025},
date = {2025-01-01},
urldate = {2025-01-01},
abstract = {This report documents the results of a recent survey in the SIGGEN community, focusing on Dual Use issues in Natural Language Generation (NLG). SIGGEN is the Special Interest Group (SIG) of the Association for Computational Linguistics (ACL) for researchers working on NLG. The survey was prompted by the ACL executive board, which asked all SIGs to provide an overview of dual use issues within their respective subfields. The survey was sent out in October 2024 and the results were processed in January 2025. With 23 respondents, the survey is presumably not representative of all SIGGEN members, but at least this document offers a helpful resource for future discussions.},
keywords = {},
pubstate = {published},
tppubtype = {misc}
}
2024
Liebrecht, Christine; Miltenburg, Emiel; Hooijdonk, Charlotte; Kunneman, Florian; Merckens, Anouk; Niessen, Nik
Hoe halen chatbots de kink uit de kabel?: Reparatiestrategieën bij onbegrip in een chatbotgesprek Journal Article
In: Tijdschrift voor Communicatiewetenschap, vol. 52, no. 3, pp. 288–325, 2024, ISSN: 1384-6930, (Publisher Copyright: © Christine Liebrecht, Emiel van Miltenburg, Charlotte van Hooijdonk, Florian Kunneman, Anouk Merckens & Nik Niessen.).
@article{f24a72452638457c8d4234be1d6ff2e8,
title = {Hoe halen chatbots de kink uit de kabel?: Reparatiestrategieën bij onbegrip in een chatbotgesprek},
author = {Christine Liebrecht and Emiel Miltenburg and Charlotte Hooijdonk and Florian Kunneman and Anouk Merckens and Nik Niessen},
doi = {10.5117/TCW2024.3.003.LIEB},
issn = {1384-6930},
year = {2024},
date = {2024-07-01},
journal = {Tijdschrift voor Communicatiewetenschap},
volume = {52},
number = {3},
pages = {288–325},
publisher = {Uitgeverij Boom},
abstract = {Chatbots worden steeds vaker ingezet in de klantenservice, maar zijn verre van foutloos. Wanneer chatbots fouten maken, zijn er verschillende reparatiestrategieën om het onbegrip te communiceren. Dit artikel geeft een overzicht van de literatuur over dit onderwerp, en presenteert twee experimentele studies waaruit blijkt dat chatbots onbegrip beter met een tegemoetkomende reparatiestrategie kunnen communiceren dan met een defensieve strategie.},
note = {Publisher Copyright: © Christine Liebrecht, Emiel van Miltenburg, Charlotte van Hooijdonk, Florian Kunneman, Anouk Merckens & Nik Niessen.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Miltenburg, Emiel; Braggaar, Anouck; Braun, Nadine; Goudbeek, Martijn; Krahmer, Emiel; Lee, Chris; Pauws, Steffen; Tomas, Frédéric
ReproHum: 0033-03: How Reproducible Are Fluency Ratings of Generated Text? A Reproduction of August et al. 2022 Proceedings Article
In: Balloccu, Simone; Belz, Anya; Huidrom, Rudali; Reiter, Ehud; Sedoc, Joao; Thomson, Craig (Ed.): Proceedings of the Fourth Workshop on Human Evaluation of NLP Systems (HumEval) @ LREC-COLING 2024, pp. 132–144, ELRA and ICCL, 2024.
@inproceedings{9101c41e10c246f58319d02c5828e511,
title = {ReproHum: 0033-03: How Reproducible Are Fluency Ratings of Generated Text? A Reproduction of August et al. 2022},
author = {Emiel Miltenburg and Anouck Braggaar and Nadine Braun and Martijn Goudbeek and Emiel Krahmer and Chris Lee and Steffen Pauws and Frédéric Tomas},
editor = {Simone Balloccu and Anya Belz and Rudali Huidrom and Ehud Reiter and Joao Sedoc and Craig Thomson},
url = {https://aclanthology.org/2024.humeval-1.13/},
year = {2024},
date = {2024-05-00},
urldate = {2024-05-00},
booktitle = {Proceedings of the Fourth Workshop on Human Evaluation of NLP Systems (HumEval) @ LREC-COLING 2024},
pages = {132–144},
publisher = {ELRA and ICCL},
abstract = {In earlier work, August et al. (2022) evaluated three different Natural Language Generation systems on their ability to generate fluent, relevant, and factual scientific definitions. As part of the ReproHum project (Belz et al., 2023), we carried out a partial reproduction study of their human evaluation procedure, focusing on human fluency ratings. Following the standardised ReproHum procedure, our reproduction study follows the original study as closely as possible, with two raters providing 300 ratings each. In addition to this, we carried out a second study where we collected ratings from eight additional raters and analysed the variability of the ratings. We successfully reproduced the inferential statistics from the original study (i.e. the same hypotheses were supported), albeit with a lower inter-annotator agreement. The remainder of our paper shows significant variation between different raters, raising questions about what it really means to reproduce human evaluation studies.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Miltenburg, Emiel
Wat als Greta Monach Amerikaans was geweest?: Over keuzes, toeval, en de reproduceerbaarheid van digitale poëzie Journal Article
In: Neerlandistiek.nl, 2024, ISSN: 1567-6633.
@article{be541d7424b242d89af625fa48a17a11,
title = {Wat als Greta Monach Amerikaans was geweest?: Over keuzes, toeval, en de reproduceerbaarheid van digitale poëzie},
author = {Emiel Miltenburg},
url = {https://neerlandistiek.nl/2024/02/wat-als-greta-monach-amerikaans-was-geweest/},
issn = {1567-6633},
year = {2024},
date = {2024-02-19},
urldate = {2024-02-19},
journal = {Neerlandistiek.nl},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Miltenburg, Emiel
Willekeurige gedichten Journal Article
In: Neerlandistiek.nl, 2024, ISSN: 1567-6633.
@article{8d692b69ae83473193492749d25c885e,
title = {Willekeurige gedichten},
author = {Emiel Miltenburg},
url = {https://neerlandistiek.nl/2024/01/willekeurige-gedichten/},
issn = {1567-6633},
year = {2024},
date = {2024-01-03},
urldate = {2024-01-03},
journal = {Neerlandistiek.nl},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Braggaar, Anouck; Kunneman, Florian; Miltenburg, Emiel
Analyzing Patterns of Conversational Breakdown in Human-Chatbot Customer Service Conversations Proceedings Article
In: 2024.
@inproceedings{3534f3298ece4d998d8bf41d150fd1ba,
title = {Analyzing Patterns of Conversational Breakdown in Human-Chatbot Customer Service Conversations},
author = {Anouck Braggaar and Florian Kunneman and Emiel Miltenburg},
url = {https://2024.conversations.ws/wp-content/uploads/2024/11/conv24_fp_25_braggaar.pdf},
year = {2024},
date = {2024-01-01},
urldate = {2024-01-01},
abstract = {Many chatbots still struggle with correctly interpreting and responding to user enquiries. Therefore, it is important to figure out how and why chatbot-human conversations break down. In this study we analyzed features in user-utterances directly before a bot-initiated repair to determine their presence and prominence as possible predictors of conversational breakdowns. For this study we used data from a real-life public transport customer service chatbot, showing the errors that occur in actual deployed systems. The analysis shows that there are some features (such as commonness, outdated words, and unexpected words) that occur more often in utterances directly before a repair. Some features also correlate with each other and occur together, such as outdated words and subjectivity. By using feature analysis, many opportunities for improvement can be found either live (during the interaction) or afterwards},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Miltenburg, Emiel
Image captioning in different languages Miscellaneous
2024.
@misc{a9e29fbf7a384771a8782b8de51353e4,
title = {Image captioning in different languages},
author = {Emiel Miltenburg},
url = {https://arxiv.org/abs/2407.09495},
year = {2024},
date = {2024-01-01},
urldate = {2024-01-01},
abstract = {This short position paper provides a manually curated list of non-English image captioning datasets (as of May 2024). Through this list, we can observe the dearth of datasets in different languages: only 23 different languages are represented. With the addition of the Crossmodal-3600 dataset (Thapliyal et al., 2022, 36 languages) this number increases somewhat, but still this number is small compared to the +/-500 institutional languages that are out there. This paper closes with some open questions for the field of Vision & Language.},
keywords = {},
pubstate = {published},
tppubtype = {misc}
}
2023
Backus, Ad; Cohen, Michael; Cohn, Neil; Faber, Myrthe; Krahmer, Emiel; Laparle, Schuyler; Maier, Emar; Miltenburg, Emiel; Roelofsen, Floris; Sciubba, Eleonora; Scholman, Merel; Shterionov, Dimitar; Sie, Maureen; Tomas, Frédéric; Vanmassenhove, Eva; Venhuizen, Noortje; Vos, Connie
Minds: Big questions for linguistics in the age of AI Journal Article
In: Linguistics in the Netherlands, vol. 40, no. 1, pp. 301–308, 2023, ISSN: 0929-7332.
@article{4b75df35be0b496bae9b63543a3909b8,
title = {Minds: Big questions for linguistics in the age of AI},
author = {Ad Backus and Michael Cohen and Neil Cohn and Myrthe Faber and Emiel Krahmer and Schuyler Laparle and Emar Maier and Emiel Miltenburg and Floris Roelofsen and Eleonora Sciubba and Merel Scholman and Dimitar Shterionov and Maureen Sie and Frédéric Tomas and Eva Vanmassenhove and Noortje Venhuizen and Connie Vos},
doi = {10.1075/avt.00094.bac},
issn = {0929-7332},
year = {2023},
date = {2023-11-03},
journal = {Linguistics in the Netherlands},
volume = {40},
number = {1},
pages = {301–308},
publisher = {John Benjamins Publishing Company},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Rasenberg, Marlou; Amha, Azeb; Coler, Matt; Koppen, Marjo; Miltenburg, Emiel; Rijk, Lynn; Stommel, Wyke; Dingemanse, Mark
Reimagining language: Towards a better understanding of language by including our interactions with non-humans Journal Article
In: Linguistics in the Netherlands, vol. 40, no. 1, pp. 309–317, 2023, ISSN: 0929-7332, (Publisher Copyright: © 2023 Algemene Vereniging voor Taalwetenschap.).
@article{5ede145510f34305b056fd26a3f4def8,
title = {Reimagining language: Towards a better understanding of language by including our interactions with non-humans},
author = {Marlou Rasenberg and Azeb Amha and Matt Coler and Marjo Koppen and Emiel Miltenburg and Lynn Rijk and Wyke Stommel and Mark Dingemanse},
doi = {10.1075/avt.00095.ras},
issn = {0929-7332},
year = {2023},
date = {2023-11-03},
journal = {Linguistics in the Netherlands},
volume = {40},
number = {1},
pages = {309–317},
publisher = {John Benjamins Publishing Company},
abstract = {What is language and who or what can be said to have it? In this essay we consider this question in the context of interactions with non-humans, specifically: animals and computers. While perhaps an odd pairing at first glance, here we argue that these domains can offer contrasting perspectives through which we can explore and reimagine language. The interactions between humans and animals, as well as between humans and computers, reveal both the essence and the boundaries of language: from examining the role of sequence and contingency in human-animal interaction, to unravelling the challenges of natural interactions with "smart" speakers and language models. By bringing together disparate fields around foundational questions, we push the boundaries of linguistic inquiry and uncover new insights into what language is and how it functions in diverse non-human-exclusive contexts.},
note = {Publisher Copyright: © 2023 Algemene Vereniging voor Taalwetenschap.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Miltenburg, Emiel; Braggaar, Anouck; Braun, Nadine; Damen, Debby; Goudbeek, Martijn; Lee, Chris; Tomas, Frédéric; Krahmer, Emiel
How reproducible is best-worst scaling for human evaluation? A reproduction of `Data-to-text Generation with Macro Planning' Proceedings Article
In: Belz, Anya; Popović, Maja; Reiter, Ehud; Thomson, Craig; Sedoc, João (Ed.): Proceedings of the 3rd Workshop on Human Evaluation of NLP Systems, pp. 75–88, Incoma Ltd., Shoumen, Bulgaria, 2023.
@inproceedings{5a7ed974c0ce491ea6824138a00530f3,
title = {How reproducible is best-worst scaling for human evaluation? A reproduction of `Data-to-text Generation with Macro Planning'},
author = {Emiel Miltenburg and Anouck Braggaar and Nadine Braun and Debby Damen and Martijn Goudbeek and Chris Lee and Frédéric Tomas and Emiel Krahmer},
editor = {Anya Belz and Maja Popović and Ehud Reiter and Craig Thomson and João Sedoc},
url = {https://aclanthology.org/2023.humeval-1.7/},
year = {2023},
date = {2023-09-00},
urldate = {2023-09-00},
booktitle = {Proceedings of the 3rd Workshop on Human Evaluation of NLP Systems},
pages = {75–88},
publisher = {Incoma Ltd., Shoumen, Bulgaria},
abstract = {This paper is part of the larger ReproHum project, where different teams of researchers aim to reproduce published experiments from the NLP literature. Specifically, ReproHum focuses on the reproducibility of human evaluation studies, where participants indicate the quality of different outputs of Natural Language Generation (NLG) systems. This is necessary because without reproduction studies, we do not know how reliable earlier results are. This paper aims to reproduce the second human evaluation study of Puduppully and Lapata (2021), while another lab is attempting to do the same. This experiment uses best-worst scaling to determine the relative performance of different NLG systems. We found that the worst performing system in the original study is now in fact the best performing system across the board. This means that we cannot fully reproduce the original results. We also carry out alternative analyses of the data, and discuss how our results may be combined with the other reproduction study that is carried out in parallel with this paper.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Miltenburg, Emiel
Resource papers as registered reports: a proposal Journal Article
In: Northern European Journal of Language Technology, vol. 9, no. 1, pp. 1–6, 2023, ISSN: 2000-1533.
@article{3fa1bb2e662b410cb9cb9a6ec1a5bbe7,
title = {Resource papers as registered reports: a proposal},
author = {Emiel Miltenburg},
doi = {10.3384/nejlt.2000-1533.2023.4884},
issn = {2000-1533},
year = {2023},
date = {2023-07-14},
journal = {Northern European Journal of Language Technology},
volume = {9},
number = {1},
pages = {1–6},
abstract = {This is a proposal for publishing resource papers as registered reports in the Northern European Journal of Language Technology. The idea is that authors write a data collection plan with a full data statement, to the extent that it can be written before data collection starts. Once the proposal is approved, publication of the final resource paper is guaranteed, as long as the data collection plan is followed (modulo reasonable changes due to unforeseen circumstances). This proposal changes the reviewing process from an antagonistic to a collaborative enterprise, and hopefully encourages NLP resources to develop and publish more high-quality datasets. The key advantage of this proposal is that it helps to promote responsible resource development (through constructive peer review) and to avoid research waste.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Huynh, Minh Hien; Lentz, Tomas; Miltenburg, Emiel
Implicit causality in GPT-2: a case study Proceedings Article
In: Amblard, Maxime; Breitholtz, Ellen (Ed.): Proceedings of the 15th International Conference on Computational Semantics, pp. 67–77, Association for Computational Linguistics, 2023.
@inproceedings{378aa60f485a448cb58a9aaf9f56cc0c,
title = {Implicit causality in GPT-2: a case study},
author = {Minh Hien Huynh and Tomas Lentz and Emiel Miltenburg},
editor = {Maxime Amblard and Ellen Breitholtz},
url = {https://aclanthology.org/2023.iwcs-1.7/},
year = {2023},
date = {2023-06-00},
urldate = {2023-06-00},
booktitle = {Proceedings of the 15th International Conference on Computational Semantics},
pages = {67–77},
publisher = {Association for Computational Linguistics},
abstract = {This case study investigates the extent to which a language model (GPT-2) is able to capture native speakers' intuitions about implicit causality in a sentence completion task. Study 1 reproduces earlier results (showing that the model's surprisal values correlate with the implicit causality bias of the verb; Davis and van Schijndel 2021), and then examine the effects of gender and verb frequency on model performance. Study 2 examines the reasoning ability of GPT-2: Is the model able to produce more sensible motivations for why the subject VERBed the object if the verbs have stronger causality biases? For this study we took care to avoid human raters being biased by obscenities and disfluencies generated by the model.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Belz, Anya; Thomson, Craig; Reiter, Ehud; Abercrombie, Gavin; Alonso-Moral, Jose M.; Arvan, Mohammad; Braggaar, Anouck; Cieliebak, Mark; Clark, Elizabeth; Deemter, Kees; Dinkar, Tanvi; Dušek, Ondřej; Eger, Steffen; Fang, Qixiang; Gao, Mingqi; Gatt, Albert; Gkatzia, Dimitra; González-Corbelle, Javier; Hovy, Dirk; Hürlimann, Manuela; Ito, Takumi; Kelleher, John D.; Klubicka, Filip; Krahmer, Emiel; Lai, Huiyuan; Lee, Chris; Li, Yiru; Mahamood, Saad; Mieskes, Margot; Miltenburg, Emiel; Mosteiro, Pablo; Nissim, Malvina; Parde, Natalie; Plátek, Ondřej; Rieser, Verena; Ruan, Jie; Tetreault, Joel; Toral, Antonio; Wan, Xiaojun; Wanner, Leo; Watson, Lewis; Yang, Diyi
Missing Information, Unresponsive Authors, Experimental Flaws: The Impossibility of Assessing the Reproducibility of Previous Human Evaluations in NLP Proceedings Article
In: Tafreshi, Shabnam; Akula, Arjun; Sedoc, João; Drozd, Aleksandr; Rogers, Anna; Rumshisky, Anna (Ed.): The Fourth Workshop on Insights from Negative Results in NLP, pp. 1–10, Association for Computational Linguistics, 2023.
@inproceedings{168ea12dfee34500b890cff6d859e673,
title = {Missing Information, Unresponsive Authors, Experimental Flaws: The Impossibility of Assessing the Reproducibility of Previous Human Evaluations in NLP},
author = {Anya Belz and Craig Thomson and Ehud Reiter and Gavin Abercrombie and Jose M. Alonso-Moral and Mohammad Arvan and Anouck Braggaar and Mark Cieliebak and Elizabeth Clark and Kees Deemter and Tanvi Dinkar and Ondřej Dušek and Steffen Eger and Qixiang Fang and Mingqi Gao and Albert Gatt and Dimitra Gkatzia and Javier González-Corbelle and Dirk Hovy and Manuela Hürlimann and Takumi Ito and John D. Kelleher and Filip Klubicka and Emiel Krahmer and Huiyuan Lai and Chris Lee and Yiru Li and Saad Mahamood and Margot Mieskes and Emiel Miltenburg and Pablo Mosteiro and Malvina Nissim and Natalie Parde and Ondřej Plátek and Verena Rieser and Jie Ruan and Joel Tetreault and Antonio Toral and Xiaojun Wan and Leo Wanner and Lewis Watson and Diyi Yang},
editor = {Shabnam Tafreshi and Arjun Akula and João Sedoc and Aleksandr Drozd and Anna Rogers and Anna Rumshisky},
doi = {10.18653/v1/2023.insights-1.1},
year = {2023},
date = {2023-05-00},
booktitle = {The Fourth Workshop on Insights from Negative Results in NLP},
pages = {1–10},
publisher = {Association for Computational Linguistics},
abstract = {We report our efforts in identifying a set of previous human evaluations in NLP that would be suitable for a coordinated study examining what makes human evaluations in NLP more/less reproducible. We present our results and findings, which include that just 13% of papers had (i) sufficiently low barriers to reproduction, and (ii) enough obtainable information, to be considered for reproduction, and that all but one of the experiments we selected for reproduction was discovered to have flaws that made the meaningfulness of conducting a reproduction questionable. As a result, we had to change our coordinated study design from a reproduce approach to a standardise-then-reproduce-twice approach. Our overall (negative) finding that the great majority of human evaluations in NLP is not repeatable and/or not reproducible and/or too flawed to justify reproduction, paints a dire picture, but presents an opportunity for a rethink about how to design and report human evaluations in NLP.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Miltenburg, Emiel; Clinciu, Miruna; Dušek, Ondřej; Gkatzia, Dimitra; Inglis, Stephanie; Leppänen, Leo; Mahamood, Saad; Schoch, Stephanie; Thomson, Craig; Wen, Luou
Barriers and enabling factors for error analysis in NLG research Journal Article
In: Northern European Journal of Language Technology, vol. 9, no. 1, pp. 1–22, 2023, ISSN: 2000-1533.
@article{ed6855082d5e43789a5a28c1b2646a27,
title = {Barriers and enabling factors for error analysis in NLG research},
author = {Emiel Miltenburg and Miruna Clinciu and Ondřej Dušek and Dimitra Gkatzia and Stephanie Inglis and Leo Leppänen and Saad Mahamood and Stephanie Schoch and Craig Thomson and Luou Wen},
doi = {10.3384/nejlt.2000-1533.2023.4529},
issn = {2000-1533},
year = {2023},
date = {2023-02-21},
journal = {Northern European Journal of Language Technology},
volume = {9},
number = {1},
pages = {1–22},
abstract = {Earlier research has shown that few studies in Natural Language Generation (NLG) evaluate their system outputs using an error analysis, despite known limitations of automatic evaluation metrics and human ratings. This position paper takes the stance that error analyses should be encouraged, and discusses several ways to do so. This paper is based on our shared experience as authors as well as a survey we distributed as a means of public consultation. We provide an overview of existing barriers to carrying out error analyses, and propose changes to improve error reporting in the NLG literature.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Miltenburg, Emiel
Evaluating NLG systems: A brief introduction Miscellaneous
2023, (Originally published on the website of the International Conference on Natural Language Generation (INLG) 2023: https://inlg2023.github.io/eval_blog.html).
@misc{069e90bae42e40388d3955d537c63d6e,
title = {Evaluating NLG systems: A brief introduction},
author = {Emiel Miltenburg},
url = {https://inlg2023.github.io/eval_blog.html},
year = {2023},
date = {2023-01-01},
urldate = {2023-01-01},
abstract = {This year the International Conference on Natural Language Generation (INLG) will feature an award for the paper with the best evaluation. The purpose of this award is to provide an incentive for NLG researchers to pay more attention to the way they assess the output of their systems. This essay provides a short introduction to evaluation in NLG, explaining key terms and distinctions.},
note = {Originally published on the website of the International Conference on Natural Language Generation (INLG) 2023: https://inlg2023.github.io/eval_blog.html},
keywords = {},
pubstate = {published},
tppubtype = {misc}
}
Coretta, Stefano; Casillas, Joseph V.; ...; Miltenburg, Emiel; ...; Roettger, Timo B.
Multidimensional Signals and Analytic Flexibility: Estimating Degrees of Freedom in Human-Speech Analyses Journal Article
In: Advances in Methods and Practices in Psychological Science, vol. 6, no. 3, pp. 25152459231162567, 2023.
@article{doi:10.1177/25152459231162567,
title = {Multidimensional Signals and Analytic Flexibility: Estimating Degrees of Freedom in Human-Speech Analyses},
author = {Stefano Coretta and Joseph V. Casillas and ... and Emiel Miltenburg and ... and Timo B. Roettger},
url = {https://doi.org/10.1177/25152459231162567},
doi = {10.1177/25152459231162567},
year = {2023},
date = {2023-01-01},
journal = {Advances in Methods and Practices in Psychological Science},
volume = {6},
number = {3},
pages = {25152459231162567},
abstract = {Recent empirical studies have highlighted the large degree of analytic flexibility in data analysis that can lead to substantially different conclusions based on the same data set. Thus, researchers have expressed their concerns that these researcher degrees of freedom might facilitate bias and can lead to claims that do not stand the test of time. Even greater flexibility is to be expected in fields in which the primary data lend themselves to a variety of possible operationalizations. The multidimensional, temporally extended nature of speech constitutes an ideal testing ground for assessing the variability in analytic approaches, which derives not only from aspects of statistical modeling but also from decisions regarding the quantification of the measured behavior. In this study, we gave the same speech-production data set to 46 teams of researchers and asked them to answer the same research question, resulting in substantial variability in reported effect sizes and their interpretation. Using Bayesian meta-analytic tools, we further found little to no evidence that the observed variability can be explained by analysts’ prior beliefs, expertise, or the perceived quality of their analyses. In light of this idiosyncratic variability, we recommend that researchers more transparently share details of their analysis, strengthen the link between theoretical construct and quantitative system, and calibrate their (un)certainty in their conclusions.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
2022
Braggaar, Anouck; Tomas, Frédéric; Blomsma, Peter; Hommes, Saar; Braun, Nadine; Miltenburg, Emiel; Lee, Chris; Goudbeek, Martijn; Krahmer, Emiel
A reproduction study of methods for evaluating dialogue system output: Replicating Santhanam and Shaikh (2019) Proceedings Article
In: Proceedings of the 15th International Conference on Natural Language Generation: Generation Challenges, pp. 86–93, Association for Computational Linguistics, 2022, ISBN: 978-1-955917-60-5.
@inproceedings{0a74ace0f2e74d07b6d4ac8bafbaa9df,
title = {A reproduction study of methods for evaluating dialogue system output: Replicating Santhanam and Shaikh (2019)},
author = {Anouck Braggaar and Frédéric Tomas and Peter Blomsma and Saar Hommes and Nadine Braun and Emiel Miltenburg and Chris Lee and Martijn Goudbeek and Emiel Krahmer},
url = {https://aclanthology.org/2022.inlg-genchal.13/},
isbn = {978-1-955917-60-5},
year = {2022},
date = {2022-07-01},
urldate = {2022-07-01},
booktitle = {Proceedings of the 15th International Conference on Natural Language Generation: Generation Challenges},
pages = {86–93},
publisher = {Association for Computational Linguistics},
abstract = {In this paper, we describe our reproduction effort of the paper: Towards Best Experiment Design for Evaluating Dialogue System Output by Santhanam and Shaikh (2019) for the 2022 ReproGen shared task. We aim to produce the same results, using different human evaluators, and a different implementation of the automatic metrics used in the original paper. Although overall the study posed some challenges to reproduce (e.g. difficulties with reproduction of automatic metrics and statistics), in the end we did find that the results generally replicate the findings of Santhanam and Shaikh (2019) and seem to follow similar trends.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Braggaar, Anouck; Martijn, Gabriella; Liebrecht, C.; Hooijdonk, Charlotte; Miltenburg, Emiel; Kunneman, Florian; Krahmer, Emiel; Hoeken, Hans; Molder, Hedwig
Smooth operators. Development and effects of personalized conversational AI Conference
2022, (The 32nd Meeting of Computational Linguistics in The Netherlands, CLIN ; Conference date: 17-06-2022 Through 17-06-2022).
@conference{b341c2d529e540a0bf734fbccfd68be2,
title = {Smooth operators. Development and effects of personalized conversational AI},
author = {Anouck Braggaar and Gabriella Martijn and C. Liebrecht and Charlotte Hooijdonk and Emiel Miltenburg and Florian Kunneman and Emiel Krahmer and Hans Hoeken and Hedwig Molder},
url = {https://clin2022.uvt.nl/},
year = {2022},
date = {2022-06-17},
abstract = {Organizations are increasingly implementing chatbots to provide customer service as chatbots are always available and can help customers quickly. However, there are still improvements to be made for chatbots to reach their full potential since 1) chatbot technology still faces some limitations, 2) customers perceive chatbot communication as unnatural and impersonal, and 3) customer service employees are still trying to find their way in collaborating with their new ‘colleague.’ In this 4-year NWO-funded project, we aim to develop and evaluate chatbots with a human touch to improve customers’ and employees’ collaboration and experience within a customer service context. In the first year of the project, we focused on the evaluation of customer service chatbots on the one hand, and the evaluation of the multifaceted collaboration between customer service employees and chatbots on the other hand. A systematic literature review was conducted to investigate how chatbots as task-based dialogue systems are evaluated within different fields of study. While the more technical fields (such as NLP) seem to focus to a great extent on automatic metrics, the more business-oriented fields (such as communication science) often make use of human evaluations. By conducting a search in four databases (ACL, ACM, IEEE and Web of Science) 3,800 records were retrieved that contained an evaluation of task-oriented dialogue systems/chatbots or discussed evaluation techniques. After screening, 146 studies were included in the literature review. These papers were assessed on what evaluation techniques were used, how they were used and in what context. The final goal of the study is to make an overview of metrics that are used in the technical fields and make them understandable and usable for the business-oriented fields. The perceptions of managers, conversational designers, and human agents regarding their criteria for evaluating human chatbot collaboration were examined by means of an interview study. Our study found that all parties used their own criteria to evaluate the collaboration and that the evaluation criteria used varied according to the job positions interviewees held. Managers evaluate the chatbot collaboration in terms of cost reduction. Conversational designers perceive both customers as well as human agents as their ‘customers’, focusing on customer satisfaction as their main evaluation criteria. Human agents evaluate the collaboration by looking at the extent to which collaborating with the chatbot has positively affected their job satisfaction and has resulted in traffic improvements. Finally, in terms of improvements, our results showed that both human agents and conversational designers advocate back-end integration of the chatbot to improve collaboration. However, it also became clear that with this collaboration, new dilemmas arise, such as team alignment and privacy issues related to the processing of personal data. Such insights could be considered in future chatbot design to make the collaboration within human chatbot teams run as smoothly as possible and in that respect benefit organizations, human agents, and customers.},
note = {The 32nd Meeting of Computational Linguistics in The Netherlands, CLIN ; Conference date: 17-06-2022 Through 17-06-2022},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
2021
Liebrecht, C.; Hooijdonk, Charlotte; Kunneman, Florian; Miltenburg, Emiel
“Hallo, ik ben Anna, uw virtuele assistent”: Talige kenmerken in customer service chatbots Journal Article
In: DIXIT: tijdschrift over toegepaste taal- en spraaktechnologie, vol. 18, pp. 10–12, 2021, ISSN: 1572-6037.
@article{6738a263482f41eda068af6dec479d8a,
title = {“Hallo, ik ben Anna, uw virtuele assistent”: Talige kenmerken in customer service chatbots},
author = {C. Liebrecht and Charlotte Hooijdonk and Florian Kunneman and Emiel Miltenburg},
issn = {1572-6037},
year = {2021},
date = {2021-12-01},
journal = {DIXIT: tijdschrift over toegepaste taal- en spraaktechnologie},
volume = {18},
pages = {10–12},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Corone, Anna; Nanne, Annemarie; Miltenburg, Emiel
Controlling Social Media Data: a Case Study of the Effect of Social Presence on Consumers’ Engagement with Brand-generated Instagram Posts Proceedings Article
In: Hendrickx, Iris; Verheijen, Lieke; Wijngaert, Lidwien (Ed.): Proceedings of the 8th Conference on Computer-Mediated Communication CMC and Social Media Corpora (CMC-Corpora2021), pp. 25–29, Radboud University, 2021, (Conference on Computer-Mediated Communication CMC and Social Media Corpora, CMC-Corpora ; Conference date: 28-10-2021 Through 29-10-2021).
@inproceedings{5dce684aeba548c0bd1550c82f98593e,
title = {Controlling Social Media Data: a Case Study of the Effect of Social Presence on Consumers’ Engagement with Brand-generated Instagram Posts},
author = {Anna Corone and Annemarie Nanne and Emiel Miltenburg},
editor = {Iris Hendrickx and Lieke Verheijen and Lidwien Wijngaert},
url = {https://cmc-corpora.org/conferences/cmc-corpora2021/},
year = {2021},
date = {2021-10-01},
booktitle = {Proceedings of the 8th Conference on Computer-Mediated Communication CMC and Social Media Corpora (CMC-Corpora2021)},
pages = {25–29},
publisher = {Radboud University},
abstract = {Research in social media marketing studies ways to increase customers’ engagement with brand-generated social media posts. This can either be done through experiments, or corpus studies of existing social media posts. Experiments have the advantage that they are controlled, but they often lack ecological validity, while for corpus studies the reverse is often true. As a case study, we construct a corpus of 1761 brand-generated Instagram posts, looking at the effect of social presence (the perception of human contact) on different engagement metrics (likes and comments), taking the effect of possible confounds (theme of slogans, funniness, time) into account. We show how social media posts can be analyzed at different levels of granularity, to establish the strength of the effect of social presence. We hope that our work will help others to isolate the impact of different variables on post engagement on social media.},
note = {Conference on Computer-Mediated Communication CMC and Social Media Corpora, CMC-Corpora ; Conference date: 28-10-2021 Through 29-10-2021},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Gehrmann, Sebastian; Adewumi, Tosin; Aggarwal, Karmanya; Ammanamanchi, Pawan Sasanka; Aremu, Anuoluwapo; Bosselut, Antoine; Chandu, Khyathi Raghavi; Clinciu, Miruna-Adriana; Das, Dipanjan; Dhole, Kaustubh; Du, Wanyu; Durmus, Esin; Dušek, Ondřej; Emezue, Chris Chinenye; Gangal, Varun; Garbacea, Cristina; Hashimoto, Tatsunori; Hou, Yufang; Jernite, Yacine; Jhamtani, Harsh; Ji, Yangfeng; Jolly, Shailza; Kale, Mihir; Kumar, Dhruv; Ladhak, Faisal; Madaan, Aman; Maddela, Mounica; Mahajan, Khyati; Mahamood, Saad; Majumder, Bodhisattwa Prasad; Martins, Pedro Henrique; McMillan-Major, Angelina; Mille, Simon; Miltenburg, Emiel; Nadeem, Moin; Narayan, Shashi; Nikolaev, Vitaly; Rubungo, Andre Niyongabo; Osei, Salomey; Parikh, Ankur; Perez-Beltrachini, Laura; Rao, Niranjan Ramesh; Raunak, Vikas; Rodriguez, Juan Diego; Santhanam, Sashank; Sedoc, João; Sellam, Thibault; Shaikh, Samira; Shimorina, Anastasia; Cabezudo, Marco Antonio Sobrevilla; Strobelt, Hendrik; Subramani, Nishant; Xu, Wei; Yang, Diyi; Yerukola, Akhila; Zhou, Jiawei
The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics Proceedings Article
In: Proceedings of the 1st Workshop on Natural Language Generation, Evaluation, and Metrics (GEM 2021), pp. 96–120, Association for Computational Linguistics, 2021, (Workshop on Natural Language Generation, Evaluation, and Metrics , GEM2021 ; Conference date: 05-08-2021 Through 06-08-2021).
@inproceedings{c42a48c622ca4ea49e886a43cdf0b3f9,
title = {The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics},
author = {Sebastian Gehrmann and Tosin Adewumi and Karmanya Aggarwal and Pawan Sasanka Ammanamanchi and Anuoluwapo Aremu and Antoine Bosselut and Khyathi Raghavi Chandu and Miruna-Adriana Clinciu and Dipanjan Das and Kaustubh Dhole and Wanyu Du and Esin Durmus and Ondřej Dušek and Chris Chinenye Emezue and Varun Gangal and Cristina Garbacea and Tatsunori Hashimoto and Yufang Hou and Yacine Jernite and Harsh Jhamtani and Yangfeng Ji and Shailza Jolly and Mihir Kale and Dhruv Kumar and Faisal Ladhak and Aman Madaan and Mounica Maddela and Khyati Mahajan and Saad Mahamood and Bodhisattwa Prasad Majumder and Pedro Henrique Martins and Angelina McMillan-Major and Simon Mille and Emiel Miltenburg and Moin Nadeem and Shashi Narayan and Vitaly Nikolaev and Andre Niyongabo Rubungo and Salomey Osei and Ankur Parikh and Laura Perez-Beltrachini and Niranjan Ramesh Rao and Vikas Raunak and Juan Diego Rodriguez and Sashank Santhanam and João Sedoc and Thibault Sellam and Samira Shaikh and Anastasia Shimorina and Marco Antonio Sobrevilla Cabezudo and Hendrik Strobelt and Nishant Subramani and Wei Xu and Diyi Yang and Akhila Yerukola and Jiawei Zhou},
url = {https://www.aclweb.org/portal/content/first-workshop-generation-evaluation-and-metrics-acl-2021},
year = {2021},
date = {2021-08-00},
booktitle = {Proceedings of the 1st Workshop on Natural Language Generation, Evaluation, and Metrics (GEM 2021)},
pages = {96–120},
publisher = {Association for Computational Linguistics},
abstract = {We introduce GEM, a living benchmark for natural language Generation (NLG), its Evaluation, and Metrics. Measuring progress in NLG relies on a constantly evolving ecosystem of automated metrics, datasets, and human evaluation standards. Due to this moving target, new models often still evaluate on divergent anglo-centric corpora with well-established, but flawed, metrics. This disconnect makes it challenging to identify the limitations of current models and opportunities for progress. Addressing this limitation, GEM provides an environment in which models can easily be applied to a wide set of tasks and in which evaluation strategies can be tested. Regular updates to the benchmark will help NLG research become more multilingual and evolve the challenge alongside models. This paper serves as the description of the data for the 2021 shared task at the associated GEM Workshop.},
note = {Workshop on Natural Language Generation, Evaluation, and Metrics , GEM2021 ; Conference date: 05-08-2021 Through 06-08-2021},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Miltenburg, Emiel; Clinciu, Miruna; Dušek, Ondřej; Gkatzia, Dimitra; Inglis, Stephanie; Leppänen, Leo; Mahamood, Saad; Manning, Emma; Schoch, Stephanie; Thomson, Craig; Wen, Luou
Underreporting of errors in NLG output, and what to do about it Proceedings Article
In: Proceedings of the 14th International Conference on Natural Language Generation, pp. 140–153, Association for Computational Linguistics, 2021, (The 14th International Conference on Natural Language Generation, INLG ; Conference date: 20-09-2021 Through 24-09-2021).
@inproceedings{cb5ec7f25f5a44ee950fd1f20209b93b,
title = {Underreporting of errors in NLG output, and what to do about it},
author = {Emiel Miltenburg and Miruna Clinciu and Ondřej Dušek and Dimitra Gkatzia and Stephanie Inglis and Leo Leppänen and Saad Mahamood and Emma Manning and Stephanie Schoch and Craig Thomson and Luou Wen},
url = {https://aclanthology.org/2021.inlg-1.14/},
doi = {10.18653/v1/2021.inlg-1.14},
year = {2021},
date = {2021-08-00},
urldate = {2021-08-00},
booktitle = {Proceedings of the 14th International Conference on Natural Language Generation},
pages = {140–153},
publisher = {Association for Computational Linguistics},
abstract = {We observe a severe under-reporting of the different kinds of errors that Natural Language Generation systems make. This is a problem, because mistakes are an important indicator of where systems should still be improved. If authors only report overall performance metrics, the research community is left in the dark about the specific weaknesses that are exhibited by `state-of-the-art' research. Next to quantifying the extent of error under-reporting, this position paper provides recommendations for error identification, analysis and reporting.},
note = {The 14th International Conference on Natural Language Generation, INLG ; Conference date: 20-09-2021 Through 24-09-2021},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Miltenburg, Emiel; Lee, Chris; Krahmer, Emiel
Preregistering NLP research Proceedings Article
In: Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 613–623, Association for Computational Linguistics, 2021, (Human Language Technology Conference 2021, HLTCon ; Conference date: 16-03-2021 Through 18-03-2021).
@inproceedings{e5cce62278324800853fc8365a002ab8,
title = {Preregistering NLP research},
author = {Emiel Miltenburg and Chris Lee and Emiel Krahmer},
url = {https://www.hltcon.org/},
year = {2021},
date = {2021-06-00},
urldate = {2021-06-00},
booktitle = {Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies},
pages = {613–623},
publisher = {Association for Computational Linguistics},
abstract = {Preregistration refers to the practice of specifying what you are going to do, and what you expect to find in your study, before carrying out the study. This practice is increasingly common in medicine and psychology, but is rarely discussed in NLP. This paper discusses preregistration in more detail, explores how NLP researchers could preregister their work, and presents several preregistration questions for different kinds of studies. Finally, we argue in favour of registered reports, which could provide firmer grounds for slow science in NLP research. The goal of this paper is to elicit a discussion in the NLP community, which we hope to synthesise into a general NLP preregistration form in future research.},
note = {Human Language Technology Conference 2021, HLTCon ; Conference date: 16-03-2021 Through 18-03-2021},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Lee, Chris; Gatt, Albert; Miltenburg, Emiel; Krahmer, Emiel
Human evaluation of automatically generated text: Current trends and best practice guidelines Journal Article
In: Computer Speech and Language: An official publication of the International Speech Communication Association (ISCA), vol. 67, pp. 1–24, 2021, ISSN: 0885-2308, (Funding Information: We received support from RAAK-PRO SIA (2014-01-51PRO) and The Netherlands Organization for Scientific Research (NWO 360-89-050), which is gratefully acknowledged. Furthermore, we want to extend our gratitude towards the anonymous reviewers and also towards Leshem Choshen, Ondřej Dušek, Kees van Deemter, Dimitra Gkatzia, David Howcroft, Ehud Reiter, and Sander Wubben for their valuable comments on the paper. Publisher Copyright: © 2020 The Authors).
@article{d7e145ce52934367931192384e305b11,
title = {Human evaluation of automatically generated text: Current trends and best practice guidelines},
author = {Chris Lee and Albert Gatt and Emiel Miltenburg and Emiel Krahmer},
doi = {10.1016/j.csl.2020.101151},
issn = {0885-2308},
year = {2021},
date = {2021-05-21},
journal = {Computer Speech and Language: An official publication of the International Speech Communication Association (ISCA)},
volume = {67},
pages = {1–24},
publisher = {Academic Press},
abstract = {Currently, there is little agreement as to how Natural Language Generation (NLG) systems should be evaluated, with a particularly high degree of variation in the way that human evaluation is carried out. This paper provides an overview of how (mostly intrinsic) human evaluation is currently conducted and presents a set of best practices, grounded in the literature. These best practices are also linked to the stages that researchers go through when conducting an evaluation research (planning stage; execution and release stage), and the specific steps in these stages. With this paper, we hope to contribute to the quality and consistency of human evaluations in NLG. (C) 2020 The Authors. Published by Elsevier Ltd.},
note = {Funding Information: We received support from RAAK-PRO SIA (2014-01-51PRO) and The Netherlands Organization for Scientific Research (NWO 360-89-050), which is gratefully acknowledged. Furthermore, we want to extend our gratitude towards the anonymous reviewers and also towards Leshem Choshen, Ondřej Dušek, Kees van Deemter, Dimitra Gkatzia, David Howcroft, Ehud Reiter, and Sander Wubben for their valuable comments on the paper. Publisher Copyright: © 2020 The Authors},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Mille, Simon; Dhole, Kaustubh; Mahamood, Saad; Perez-Beltrachini, Laura; Gangal, Varun Prashant; Kale, Mihir; Miltenburg, Emiel; Gehrmann, Sebastian
Automatic Construction of Evaluation Suites for Natural Language Generation Datasets Proceedings Article
In: Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 1 (NeurIPS Datasets and Benchmarks 2021), 2021, (Conference on Neural Information Processing Systems 2021 : Datasets and Benchmarks , NeurIPS 2021 ; Conference date: 28-11-2021 Through 09-12-2021).
@inproceedings{c19e8a972ed84f198bedb221b12f5ba4,
title = {Automatic Construction of Evaluation Suites for Natural Language Generation Datasets},
author = {Simon Mille and Kaustubh Dhole and Saad Mahamood and Laura Perez-Beltrachini and Varun Prashant Gangal and Mihir Kale and Emiel Miltenburg and Sebastian Gehrmann},
url = {https://neurips.cc/},
year = {2021},
date = {2021-01-01},
booktitle = {Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 1 (NeurIPS Datasets and Benchmarks 2021)},
abstract = {Machine learning approaches applied to NLP are often evaluated by summarizing their performance in a single number, for example accuracy. Since most test sets are constructed as an i.i.d. sample from the overall data, this approach overly simplifies the complexity of language and encourages overfitting to the head of the data distribution. As such, rare language phenomena or text about underrepresented groups are not equally included in the evaluation. To encourage more in-depth model analyses, researchers have proposed the use of multiple test sets, also called challenge sets, that assess specific capabilities of a model. In this paper, we develop a framework based on this idea which is able to generate controlled perturbations and identify subsets in text-to-scalar, text-to-text, or data-to-text settings. By applying this framework to the GEM generation benchmark, we develop evaluation suites made of 80 challenge sets, demonstrate the kinds of analyses that it enables, and shed light onto the limits of current generation models.},
note = {Conference on Neural Information Processing Systems 2021 : Datasets and Benchmarks , NeurIPS 2021 ; Conference date: 28-11-2021 Through 09-12-2021},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
2020
Miltenburg, Emiel; Lee, Chris; Castro-Ferreira, Thiago; Krahmer, Emiel
Evaluation rules! On the use of grammars and rule-based systems for NLG evaluation Proceedings Article
In: Proceedings of the 1st Workshop on Evaluating NLG Evaluation, pp. 17–27, Association for Computational Linguistics, 2020, (Workshop on Evaluating NLG Evaluation ; Conference date: 18-12-2020).
@inproceedings{ea98daa9081d4a58bbd0ba12912ec9aa,
title = {Evaluation rules! On the use of grammars and rule-based systems for NLG evaluation},
author = {Emiel Miltenburg and Chris Lee and Thiago Castro-Ferreira and Emiel Krahmer},
url = {https://evalnlg-workshop.github.io/},
year = {2020},
date = {2020-12-00},
booktitle = {Proceedings of the 1st Workshop on Evaluating NLG Evaluation},
pages = {17–27},
publisher = {Association for Computational Linguistics},
abstract = {NLG researchers often use uncontrolled corpora to train and evaluate their systems, using textual similarity metrics, such as BLEU. This position paper argues in favour of two alternative evaluation strategies, using grammars or rule-based systems. These strategies are particularly useful to identify the strengths and weaknesses of different systems. We contrast our proposals with the (extended) WebNLG dataset, which is revealed to have a skewed distribution of predicates. We predict that this distribution affects the quality of the predictions for systems trained on this data. However, this hypothesis can only be thoroughly tested (without any confounds) once we are able to systematically manipulate the skewness of the data, using a rule-based approach.},
note = {Workshop on Evaluating NLG Evaluation ; Conference date: 18-12-2020},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Miltenburg, Emiel; Lu, Wei-Ting; Krahmer, Emiel; Gatt, Albert; Chen, Guanyi; Li, Lin; Deemter, Kees
Gradations of Error Severity in Automatic Image Descriptions Proceedings Article
In: Proceedings of the 13th International Conference on Natural Language Generation, pp. 398–411, Association for Computational Linguistics, 2020.
@inproceedings{3cef1ba55633417f8dfab9d5f571583a,
title = {Gradations of Error Severity in Automatic Image Descriptions},
author = {Emiel Miltenburg and Wei-Ting Lu and Emiel Krahmer and Albert Gatt and Guanyi Chen and Lin Li and Kees Deemter},
url = {https://aclanthology.org/2020.inlg-1.45/},
doi = {10.18653/v1/2020.inlg-1.45},
year = {2020},
date = {2020-12-00},
urldate = {2020-12-00},
booktitle = {Proceedings of the 13th International Conference on Natural Language Generation},
pages = {398–411},
publisher = {Association for Computational Linguistics},
abstract = {Earlier research has shown that evaluation metrics based on textual similarity (e.g., BLEU, CIDEr, Meteor) do not correlate well with human evaluation scores for automatically generated text. We carried out an experiment with Chinese speakers, where we systematically manipulated image descriptions to contain different kinds of errors. Because our manipulated descriptions form minimal pairs with the reference descriptions, we are able to assess the impact of different kinds of errors on the perceived quality of the descriptions. Our results show that different kinds of errors elicit significantly different evaluation scores, even though all erroneous descriptions differ in only one character from the reference descriptions. Evaluation metrics based solely on textual similarity are unable to capture these differences, which (at least partially) explains their poor correlation with human judgments. Our work provides the foundations for future work, where we aim to understand why different errors are seen as more or less severe.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Miltenburg, Emiel
How Do Image Description Systems Describe People? A Targeted Assessment of System Competence in the PEOPLE-domain Proceedings Article
In: Proceedings of the Second Workshop on Beyond Vision and LANguage: inTEgrating Real-world kNowledge (LANTERN), pp. 30–36, Association for Computational Linguistics, 2020, (Workshop on Beyond Vision and LANguage: inTEgrating Real-world kNowledge, LANTERN ; Conference date: 01-12-2020).
@inproceedings{506e4c4e9af348679884039a4437630a,
title = {How Do Image Description Systems Describe People? A Targeted Assessment of System Competence in the PEOPLE-domain},
author = {Emiel Miltenburg},
url = {https://www.lantern.uni-saarland.de/2020/},
year = {2020},
date = {2020-12-00},
booktitle = {Proceedings of the Second Workshop on Beyond Vision and LANguage: inTEgrating Real-world kNowledge (LANTERN)},
pages = {30–36},
publisher = {Association for Computational Linguistics},
abstract = {Evaluations of image description systems are typically domain-general: generated descriptions for the held-out test images are either compared to a set of reference descriptions (using automated metrics), or rated by human judges on one or more Likert scales (for fluency, overall quality, and other quality criteria). While useful, these evaluations do not tell us anything about the kinds of image descriptions that systems are able to produce. Or, phrased differently, these evaluations do not tell us anything about the cognitive capabilities of image description systems. This paper proposes a different kind of assessment, that is able to quantify the extent to which these systems are able to describe humans. This assessment is based on a manual characterisation (a context-free grammar) of English entity labels in the PEOPLE domain, to determine the range of possible outputs. We examined 9 systems to see what kinds of labels they actually use. We found that these systems only use a small subset of at most 13 different kinds of modifiers (e.g. tall and short modify HEIGHT, sad and happy modify MOOD), but 27 kinds of modifiers are never used. Future research could study these semantic dimensions in more detail.},
note = {Workshop on Beyond Vision and LANguage: inTEgrating Real-world kNowledge, LANTERN ; Conference date: 01-12-2020},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Howcroft, David M.; Belz, Anya; Clinciu, Miruna-Adriana; Gkatzia, Dimitra; Hasan, Sadid A.; Mahamood, Saad; Mille, Simon; Miltenburg, Emiel; Santhanam, Sashank; Rieser, Verena
Twenty Years of Confusion in Human Evaluation: NLG Needs Evaluation Sheets and Standardised Definitions Proceedings Article
In: Proceedings of the 13th International Conference on Natural Language Generation, pp. 169–182, Association for Computational Linguistics, 2020, (International Conference on Natural Language Generation, INLG 2020 ; Conference date: 15-12-2020 Through 18-12-2020).
@inproceedings{f73a13db89a042309e8a25c3fb845419,
title = {Twenty Years of Confusion in Human Evaluation: NLG Needs Evaluation Sheets and Standardised Definitions},
author = {David M. Howcroft and Anya Belz and Miruna-Adriana Clinciu and Dimitra Gkatzia and Sadid A. Hasan and Saad Mahamood and Simon Mille and Emiel Miltenburg and Sashank Santhanam and Verena Rieser},
url = {https://www.inlg2020.org/},
year = {2020},
date = {2020-12-00},
booktitle = {Proceedings of the 13th International Conference on Natural Language Generation},
pages = {169–182},
publisher = {Association for Computational Linguistics},
abstract = {Human assessment remains the most trusted form of evaluation in NLG, but highly diverse approaches and a proliferation of different quality criteria used by researchers make it difficult to compare results and draw conclusions across papers, with adverse implications for meta-evaluation and reproducibility. In this paper, we present (i) our dataset of 165 NLG papers with human evaluations, (ii) the annotation scheme we developed to label the papers for different aspects of evaluations, (iii) quantitative analyses of the annotations, and (iv) a set of recommendations for improving standards in evaluation reporting. We use the annotations as a basis for examining information included in evaluation reports, and levels of consistency in approaches, experimental design and terminology, focusing in particular on the 200+ different terms that have been used for evaluated aspects of quality. We conclude that due to a pervasive lack of clarity in reports and extreme diversity in approaches, human evaluation in NLG presents as extremely confused in 2020, and that the field is in urgent need of standard methods and terminology.},
note = {International Conference on Natural Language Generation, INLG 2020 ; Conference date: 15-12-2020 Through 18-12-2020},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Lee, Chris; Gatt, Albert; Miltenburg, Emiel; Wubben, Sander; Krahmer, Emiel
Best practices for the human evaluation of automatically generated text Proceedings Article
In: Proceedings of the 12th International Conference on Natural Language Generation, pp. 355–368, Association for Computational Linguistics, 2020, (12th International conference on Natural Language Generation (INLG 2019) ; Conference date: 29-10-2019 Through 01-11-2019).
@inproceedings{0c962280b7244ada878649fed4228c8a,
title = {Best practices for the human evaluation of automatically generated text},
author = {Chris Lee and Albert Gatt and Emiel Miltenburg and Sander Wubben and Emiel Krahmer},
url = {https://www.inlg2019.com},
year = {2020},
date = {2020-12-00},
urldate = {2020-12-00},
booktitle = {Proceedings of the 12th International Conference on Natural Language Generation},
pages = {355–368},
publisher = {Association for Computational Linguistics},
abstract = {Currently, there is little agreement as to how Natural Language Generation (NLG) systems should be evaluated. While there is some agreement regarding automatic metrics, there is a high degree of variation in the way that human evaluation is carried out. This paper provides an overview of how human evaluation is currently conducted, and presents a set of best practices, grounded in the literature. With this paper, we hope to contribute to the quality and consistency of human evaluations in NLG.},
note = {12th International conference on Natural Language Generation (INLG 2019) ; Conference date: 29-10-2019 Through 01-11-2019},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Miltenburg, Emiel
On the use of human reference data for evaluating automatic image descriptions Conference
2020, (2020 VizWiz Grand Challenge Workshop, VizWiz 2020 ; Conference date: 14-06-2020 Through 14-06-2020).
@conference{002bdf085b5a46878b1b8c1d470f85df,
title = {On the use of human reference data for evaluating automatic image descriptions},
author = {Emiel Miltenburg},
url = {https://vizwiz.org/workshops/2020-workshop/},
year = {2020},
date = {2020-06-14},
abstract = {Automatic image description systems are commonly trained and evaluated using crowdsourced, human-generated image descriptions. The best-performing system is then determined using some measure of similarity to the reference data (BLEU, Meteor, CIDEr, etc.). Thus, both the quality of the systems as well as the quality of the evaluation depend on the quality of the descriptions. As Section 2 will show, the quality of current image description datasets is insufficient. I argue that there is a need for more detailed guidelines that take into account the needs of visually impaired users, but also the feasibility of generating suitable descriptions. With high-quality data, evaluation of image description systems could use reference descriptions, but we should also look for alternatives.},
note = {2020 VizWiz Grand Challenge Workshop, VizWiz 2020 ; Conference date: 14-06-2020 Through 14-06-2020},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Miltenburg, Emiel
Toevallige Haiku's Miscellaneous
2020.
@misc{1b3e11d1d60841889dcf3279e54f0fbf,
title = {Toevallige Haiku's},
author = {Emiel Miltenburg},
url = {https://neerlandistiek.nl/2020/04/toevallige-haikus/},
year = {2020},
date = {2020-04-18},
urldate = {2020-04-18},
keywords = {},
pubstate = {published},
tppubtype = {misc}
}
2019
Ferreira, Thiago Castro; Lee, Chris; Miltenburg, Emiel; Krahmer, Emiel
Neural data-to-text generation: A comparison between pipeline and end-to-end architectures Proceedings Article
In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pp. 552–562, Association for Computational Linguistics, 2019, (2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) ; Conference date: 03-11-2019 Through 07-11-2019).
@inproceedings{b0ed5e4c4c1e40d5b9369472dc05a3d1,
title = {Neural data-to-text generation: A comparison between pipeline and end-to-end architectures},
author = {Thiago Castro Ferreira and Chris Lee and Emiel Miltenburg and Emiel Krahmer},
url = {https://www.emnlp-ijcnlp2019.org/},
year = {2019},
date = {2019-11-00},
booktitle = {Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)},
pages = {552–562},
publisher = {Association for Computational Linguistics},
abstract = {Traditionally, most data-to-text applications have been designed using a modular pipeline architecture, in which non-linguistic input data is converted into natural language through several intermediate transformations. By contrast, recent neural models for data-to-text generation have been proposed as end-to-end approaches, where the non-linguistic input is rendered in natural language with much less explicit intermediate representations in between. This study introduces a systematic comparison between neural pipeline and end-to-end data-to-text approaches for the generation of text from RDF triples. Both architectures were implemented making use of the encoder-decoder Gated-Recurrent Units (GRU) and Transformer, two state-of-the-art deep learning methods. Automatic and human evaluations together with a qualitative analysis suggest that having explicit intermediate steps in the generation process results in better texts than the ones generated by end-to-end approaches. Moreover, the pipeline models generalize better to unseen inputs. Data and code are publicly available.},
note = {2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) ; Conference date: 03-11-2019 Through 07-11-2019},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Miltenburg, Emiel
Pragmatic factors in (automatic) image description PhD Thesis
Vrije Universiteit Amsterdam, 2019.
@phdthesis{38c45b74b30942f89412e6008ad3db1b,
title = {Pragmatic factors in (automatic) image description},
author = {Emiel Miltenburg},
url = {https://hdl.handle.net/1871.1/a0acdca0-0122-466f-9daa-3507d298fcd2},
year = {2019},
date = {2019-10-14},
urldate = {2019-10-14},
school = {Vrije Universiteit Amsterdam},
keywords = {},
pubstate = {published},
tppubtype = {phdthesis}
}
Ibrahimi, Sarah; Chen, Shuo; Arya, Devanshu; Câmara, Arthur; Chen, Yunlu; Crijns, Tanja; Goes, Maurits; Mensink, Thomas; Miltenburg, Emiel; Odijk, Daan; Thong, William; Zhao, Jiaojiao; Mettes, Pascal
Interactive Exploration of Journalistic Video Footage Through Multimodal Semantic Matching Proceedings Article
In: Proceedings of the 27th ACM International Conference on Multimedia, pp. 2196–2198, ACM, 2019, ISBN: 9781450368896.
@inproceedings{e0b71f42b7474c34bacd91076a6b3ada,
title = {Interactive Exploration of Journalistic Video Footage Through Multimodal Semantic Matching},
author = {Sarah Ibrahimi and Shuo Chen and Devanshu Arya and Arthur Câmara and Yunlu Chen and Tanja Crijns and Maurits Goes and Thomas Mensink and Emiel Miltenburg and Daan Odijk and William Thong and Jiaojiao Zhao and Pascal Mettes},
doi = {10.1145/3343031.3350597},
isbn = {9781450368896},
year = {2019},
date = {2019-01-01},
booktitle = {Proceedings of the 27th ACM International Conference on Multimedia},
pages = {2196–2198},
publisher = {ACM},
series = {MM '19},
abstract = {This demo presents a system for journalists to explore video footage for broadcasts. Daily news broadcasts contain multiple news items that consist of many video shots and searching for relevant footage is a labor intensive task. Without the need for annotated video shots, our system extracts semantics from footage and automatically matches these semantics to query terms from the journalist. The journalist can then indicate which aspects of the query term need to be emphasized, e.g. the title or its thematic meaning. The goal of this system is to support the journalists in their search process by encouraging interaction and exploration with the system.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Miltenburg, Emiel; Kerkhof, Merel; Koolen, Ruud; Goudbeek, Martijn; Krahmer, Emiel
On task effects in NLG corpus elicitation: A replication study using mixed effects modeling Proceedings Article
In: Deemter, Kees; Lin, Chenghua; Takamura, Hiroya (Ed.): Proceedings of the 12th International Conference on Natural Language Generation (INLG 2019), 2019, (12th International conference on Natural Language Generation (INLG 2019) ; Conference date: 29-10-2019 Through 01-11-2019).
@inproceedings{122b84548e3e442d9b16df840b36cc02,
title = {On task effects in NLG corpus elicitation: A replication study using mixed effects modeling},
author = {Emiel Miltenburg and Merel Kerkhof and Ruud Koolen and Martijn Goudbeek and Emiel Krahmer},
editor = {Kees Deemter and Chenghua Lin and Hiroya Takamura},
url = {https://www.inlg2019.com},
year = {2019},
date = {2019-01-01},
booktitle = {Proceedings of the 12th International Conference on Natural Language Generation (INLG 2019)},
abstract = {Task effects in NLG corpus elicitation recently started to receive more attention, but are usually not modeled statistically. We present a controlled replication of the study by Van Miltenburg et al. (2018b), contrasting spoken with written descriptions. We collected additional written Dutch descriptions to supplement the spoken data from the DIDEC corpus, and analyzed the descriptions using mixed effects modeling to account for variation between participants and items. Our results show that the effects of modality largely disappear in a controlled setting.},
note = {12th International conference on Natural Language Generation (INLG 2019) ; Conference date: 29-10-2019 Through 01-11-2019},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
2018
Miltenburg, Emiel; Elliott, Desmond; Vossen, Piek
Talking about other people: an endless range of possibilities Proceedings Article
In: Proceedings of the 11th International Conference on Natural Language Generation, pp. 415–420, Association for Computational Linguistics, 2018, (11th International Conference on Natural Language Generation, INLG 2018 ; Conference date: 05-11-2018 Through 08-11-2018).
@inproceedings{fb6af63b666145c8a11c73ab7608d304,
title = {Talking about other people: an endless range of possibilities},
author = {Emiel Miltenburg and Desmond Elliott and Piek Vossen},
url = {https://inlg2018.uvt.nl/},
year = {2018},
date = {2018-11-00},
booktitle = {Proceedings of the 11th International Conference on Natural Language Generation},
pages = {415–420},
publisher = {Association for Computational Linguistics},
abstract = {Image description datasets, such as Flickr30K and MS COCO, show a high degree of variation in the ways that crowd-workers talk about the world. Although this gives us a rich and diverse collection of data to work with, it also introduces uncertainty about how the world should be described. This paper shows the extent of this uncertainty in the PEOPLE-domain. We present a taxonomy of different ways to talk about other people. This taxonomy serves as a reference point to think about how other people should be described, and can be used to classify and compute statistics about labels applied to people.},
note = {11th International Conference on Natural Language Generation, INLG 2018 ; Conference date: 05-11-2018 Through 08-11-2018},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Miltenburg, Emiel; Elliott, Desmond; Vossen, Piek
Measuring the Diversity of Automatic Image Descriptions Proceedings Article
In: Proceedings of the 27th International Conference on Computational Linguistics, pp. 1730–1741, Association for Computational Linguistics, 2018, (International Conference on Computational Linguistics 2018, COLING 2018 ; Conference date: 20-08-2018 Through 26-08-2018).
@inproceedings{9266717f64c748ecb68e9beb1ac6b7e0,
title = {Measuring the Diversity of Automatic Image Descriptions},
author = {Emiel Miltenburg and Desmond Elliott and Piek Vossen},
url = {http://coling2018.org/},
year = {2018},
date = {2018-08-01},
urldate = {2018-08-01},
booktitle = {Proceedings of the 27th International Conference on Computational Linguistics},
pages = {1730–1741},
publisher = {Association for Computational Linguistics},
abstract = {Automatic image description systems typically produce generic sentences that only make use of a small subset of the vocabulary available to them. In this paper, we consider the production of generic descriptions as a lack of diversity in the output, which we quantify using established metrics and two new metrics that frame image description as a word recall task. This framing allows us to evaluate system performance on the head of the vocabulary, as well as on the long tail, where system performance degrades. We use these metrics to examine the diversity of the sentences generated by nine state-of-the-art systems on the MS COCO data set. We find that the systems trained with maximum likelihood objectives produce less diverse output than those trained with additional adversarial objectives. However, the adversarially-trained models only produce more types from the head of the vocabulary and not the tail. Besides vocabulary-based methods, we also look at the compositional capacity of the systems, specifically their ability to create compound nouns and prepositional phrases of different lengths. We conclude that there is still much room for improvement, and offer a toolkit to measure progress towards the goal of generating more diverse image descriptions.},
note = {International Conference on Computational Linguistics 2018, COLING 2018 ; Conference date: 20-08-2018 Through 26-08-2018},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Miltenburg, Emiel; Kadar, Akos; Koolen, Ruud; Krahmer, Emiel
DIDEC: The Dutch Image Description and Eye-tracking Corpus Proceedings Article
In: Proceedings of the 27th International Conference on Computational Linguistics, pp. 3658–3669, 2018, (International Conference on Computational Linguistics 2018, COLING 2018 ; Conference date: 20-08-2018 Through 26-08-2018).
@inproceedings{2910d9ba0ee9494386316bd424809701,
title = {DIDEC: The Dutch Image Description and Eye-tracking Corpus},
author = {Emiel Miltenburg and Akos Kadar and Ruud Koolen and Emiel Krahmer},
url = {http://coling2018.org/},
year = {2018},
date = {2018-01-01},
booktitle = {Proceedings of the 27th International Conference on Computational Linguistics},
pages = {3658–3669},
abstract = {We present a corpus of spoken Dutch image descriptions, paired with two sets of eye-tracking data: free viewing, where participants look at images without any particular purpose, and description viewing, where we track eye movements while participants produce spoken descriptions of the images they are viewing. This paper describes the data collection procedure and the corpus itself, and provides an initial analysis of self-corrections in image descriptions. We also present two studies showing the potential of this data. Though these studies mainly serve as an example, we do find two interesting results: (1) the eye-tracking data for the description viewing task is more coherent than for the free-viewing task; (2) variation in image descriptions (also called image specificity; Jas and Parikh, 2015) is only moderately correlated across different languages. Our corpus can be used to gain a deeper understanding of the image description task, particularly how visual attention is correlated with the image description process.},
note = {International Conference on Computational Linguistics 2018, COLING 2018 ; Conference date: 20-08-2018 Through 26-08-2018},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Miltenburg, Emiel; Koolen, Ruud; Krahmer, Emiel
Varying image description tasks: spoken versus written descriptions Proceedings Article
In: Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects, pp. 88–100, 2018, (5th Workshop on NLP for Similar Languages, Varieties and Dialects, VarDial ; Conference date: 20-08-2018).
@inproceedings{5e6d7632be294e518c75c94c352d5ec7,
title = {Varying image description tasks: spoken versus written descriptions},
author = {Emiel Miltenburg and Ruud Koolen and Emiel Krahmer},
url = {https://aclanthology.org/W18-3910/},
year = {2018},
date = {2018-01-01},
urldate = {2018-01-01},
booktitle = {Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects},
pages = {88–100},
abstract = {Automatic image description systems are commonly trained and evaluated on written image descriptions. At the same time, these systems are often used to provide spoken descriptions (e.g., for visually impaired users) through apps like TapTapSee or Seeing AI. This is not a problem, as long as spoken and written descriptions are very similar. However, linguistic research suggests that spoken language often differs from written language. These differences are not regular and vary from context to context. Therefore, this paper investigates whether there are differences between written and spoken image descriptions, even if they are elicited through similar tasks. We compared descriptions produced in two languages (English and Dutch) and found substantial differences between spoken and written descriptions in both languages. Future research should examine if users prefer the spoken over the written style and, if so, aim to emulate spoken descriptions.},
note = {5th Workshop on NLP for Similar Languages, Varieties and Dialects, VarDial ; Conference date: 20-08-2018},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
2017
Miltenburg, Emiel; Elliott, Desmond; Vossen, Piek
Cross-linguistic differences and similarities in image descriptions Proceedings Article
In: Proceedings of the 10th International Conference on Natural Language Generation, pp. 21–30, Association for Computational Linguistics, 2017.
@inproceedings{606b60f3166d4e55a07979d9c73232cc,
title = {Cross-linguistic differences and similarities in image descriptions},
author = {Emiel Miltenburg and Desmond Elliott and Piek Vossen},
url = {https://aclanthology.org/W17-3503/},
year = {2017},
date = {2017-12-00},
urldate = {2017-12-00},
booktitle = {Proceedings of the 10th International Conference on Natural Language Generation},
pages = {21–30},
publisher = {Association for Computational Linguistics},
abstract = {Automatic image description systems are commonly trained and evaluated on large image description datasets. Recently, researchers have started to collect such datasets for languages other than English. An unexplored question is how different these datasets are from English and, if there are any differences, what causes them to differ. This paper provides a cross-linguistic comparison of Dutch, English, and German image descriptions. We find that these descriptions are similar in many respects, but the familiarity of crowd workers with the subjects of the images has a noticeable influence on the specificity of the descriptions.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Miltenburg, Emiel
Pragmatic descriptions of perceptual stimuli Proceedings Article
In: Proceedings of the Student Research Workshop at the 15th Conference of the European Chapter of the Association for Computational Linguistics, pp. 1–10, Association for Computational Linguistics, 2017.
@inproceedings{4040b4067c8e4a408f4a01e2b5411836,
title = {Pragmatic descriptions of perceptual stimuli},
author = {Emiel Miltenburg},
url = {https://aclanthology.org/E17-4001/},
year = {2017},
date = {2017-04-00},
urldate = {2017-04-00},
booktitle = {Proceedings of the Student Research Workshop at the 15th Conference of the European Chapter of the Association for Computational Linguistics},
pages = {1–10},
publisher = {Association for Computational Linguistics},
abstract = {This research proposal discusses pragmatic factors in image description, arguing that current automatic image description systems do not take these factors into account. I present a general model of the human image description process, and propose to study this process using corpus analysis, experiments, and computational modeling. This will lead to a better characterization of human image description behavior, providing a road map for future research in automatic image description, and the automatic description of perceptual stimuli in general.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
2016
Son, Chantal; Miltenburg, Emiel; Morante, Roser
Building a Dictionary of Affixal Negations Proceedings Article
In: Proceedings of the Workshop on Extra-Propositional Aspects of Meaning in Computational Linguistics (ExProM), pp. 49–56, The COLING 2016 Organizing Committee, 2016.
@inproceedings{72012e280e494b8382ca778315ff11c8,
title = {Building a Dictionary of Affixal Negations},
author = {Chantal Son and Emiel Miltenburg and Roser Morante},
url = {https://aclanthology.org/W16-5007/},
year = {2016},
date = {2016-12-00},
urldate = {2016-12-00},
booktitle = {Proceedings of the Workshop on Extra-Propositional Aspects of Meaning in Computational Linguistics (ExProM)},
pages = {49–56},
publisher = {The COLING 2016 Organizing Committee},
abstract = {This paper discusses the need for a dictionary of affixal negations and regular antonyms to facilitate their automatic detection in text. Without such a dictionary, affixal negations are very difficult to detect. In addition, we show that the set of affixal negations is not homogeneous, and that different NLP tasks may require different subsets. A dictionary can store the subtypes of affixal negations, making it possible to select a certain subset or to make inferences on the basis of these subtypes. We take a first step towards creating a negation dictionary by annotating all direct antonym pairs in WordNet using an existing typology of affixal negations. By highlighting some of the issues that were encountered in this annotation experiment, we hope to provide some insights into the necessary steps of building a negation dictionary.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Miltenburg, Emiel; Morante, Roser; Elliott, Desmond
Pragmatic Factors in Image Description: The Case of Negations Proceedings Article
In: Proceedings of the 5th Workshop on Vision and Language, pp. 54–59, Association for Computational Linguistics, 2016.
@inproceedings{bfc23ef34ee44ea0a687cc19ad84d687,
title = {Pragmatic Factors in Image Description: The Case of Negations},
author = {Emiel Miltenburg and Roser Morante and Desmond Elliott},
url = {https://aclanthology.org/W16-3207/},
year = {2016},
date = {2016-08-00},
urldate = {2016-08-00},
booktitle = {Proceedings of the 5th Workshop on Vision and Language},
pages = {54–59},
publisher = {Association for Computational Linguistics},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Postma, Marten; Miltenburg, Emiel; Segers, Roxane; Schoen, Anneleen; Vossen, Piek
Open Dutch WordNet Proceedings Article
In: Proceedings of the Eighth Global Wordnet Conference, 2016.
@inproceedings{3eeeaa00748845c88c7c18115bfbe3b2,
title = {Open Dutch WordNet},
author = {Marten Postma and Emiel Miltenburg and Roxane Segers and Anneleen Schoen and Piek Vossen},
url = {https://aclanthology.org/2016.gwc-1.43/},
year = {2016},
date = {2016-01-30},
urldate = {2016-01-30},
booktitle = {Proceedings of the Eighth Global Wordnet Conference},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Miltenburg, Emiel
Review of Semantic Similarity from Natural Language and Ontology Analysis Journal Article
In: Linguist List Issues, 2016, ISSN: 1068-4875.
@article{3073e3ddde3c4c47a18bd56493ba2c73,
title = {Review of Semantic Similarity from Natural Language and Ontology Analysis},
author = {Emiel Miltenburg},
url = {https://linguistlist.org/issues/27/2006/},
issn = {1068-4875},
year = {2016},
date = {2016-01-01},
urldate = {2016-01-01},
journal = {Linguist List Issues},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
van Tiel, Bob; van Miltenburg, Emiel; Zevakhina, Natalia; Geurts, Bart
Scalar Diversity Journal Article
In: Journal of Semantics, vol. 33, no. 1, pp. 137–175, 2016, ISSN: 0167-5133.
@article{dc18cbe8813844f6ac36fd8a69eb953f,
title = {Scalar Diversity},
author = {Bob van Tiel and Emiel van Miltenburg and Natalia Zevakhina and Bart Geurts},
doi = {10.1093/jos/ffu017},
issn = {0167-5133},
year = {2016},
date = {2016-01-01},
journal = {Journal of Semantics},
volume = {33},
number = {1},
pages = {137–175},
publisher = {Oxford University Press},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Miltenburg, Emiel
Stereotyping and Bias in the Flickr30k Dataset Proceedings Article
In: Edlund, Jens; Heylen, Dirk; Paggio, Patrizia (Ed.): Proceedings of Multimodal Corpora, pp. 1–4, 2016.
@inproceedings{1a69ca4cb5f64cb9a6bf10719f831f7e,
title = {Stereotyping and Bias in the Flickr30k Dataset},
author = {Emiel Miltenburg},
editor = {Jens Edlund and Dirk Heylen and Patrizia Paggio},
url = {https://arxiv.org/abs/1605.06083},
year = {2016},
date = {2016-01-01},
urldate = {2016-01-01},
booktitle = {Proceedings of Multimodal Corpora},
pages = {1–4},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}