Runavekbol AI Chatbot Mentorship
Scientific Publications Review

AI Chatbot Research Digest

Peer-reviewed studies on conversational AI — curated, summarised, and attributed to the researchers behind the work.

Each entry traces back to its original publication. We review the methodology and highlight what the findings actually say — not what headlines made of them.

Topics span intent recognition, dialogue management, transformer fine-tuning, and real-world deployment challenges in production chatbot systems.


01 Recent Findings

Six studies reviewed by our mentors. Each summary links the finding to its source paper and lead author.

Intent Recognition

Few-Shot Intent Detection via Contrastive Pre-Training and Fine-Tuning

Ting-Yun Chang and Richard Xu (University of New South Wales, 2022) tested contrastive pre-training on intent classification tasks where labelled examples are sparse. Their approach achieved strong performance on CLINC150 with as few as five labelled examples per class — a practical scenario for most production chatbot projects. The key takeaway is that pre-training on unsupervised utterance pairs before fine-tuning consistently outperformed standard transfer from BERT alone. For practitioners, the paper provides a replicable pipeline that requires no domain-specific annotation beyond a small seed set.

"Contrastive objectives align utterance representations before any task labels are introduced — reducing the label dependency that bottlenecks most intent classifiers." — AAAI 2022 Workshop on Conversational AI
Ting-Yun Chang, UNSW Sydney · NLP Research
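
As a rough illustration of the contrastive objective described in the intent-detection summary above, the sketch below computes an InfoNCE-style loss over toy utterance embeddings. It is not the paper's implementation: the three-dimensional vectors, the temperature value, and the function names `cosine` and `info_nce` are assumptions made for this example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss: small when the anchor sits closer to its
    positive than to any of the negatives."""
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


# Toy "embeddings" (hypothetical values, not outputs of a real encoder).
book_flight = [0.9, 0.1, 0.0]
reserve_plane_ticket = [0.8, 0.2, 0.1]  # paraphrase of the anchor
play_music = [0.0, 0.1, 0.9]            # unrelated intent

# Pairing the anchor with its paraphrase should cost less than
# pairing it with an unrelated utterance.
aligned = info_nce(book_flight, reserve_plane_ticket, [play_music])
misaligned = info_nce(book_flight, play_music, [reserve_plane_ticket])
print(f"aligned pair loss:    {aligned:.4f}")
print(f"misaligned pair loss: {misaligned:.4f}")
```

Minimising a loss of this shape pulls paraphrases together before any intent labels are used, which is the intuition behind needing only a handful of labelled examples afterwards.
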
Dialogue State Tracking

TripPy: A Triple Copy Strategy for Value Independent Neural Dialogue State Tracking

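Michael Heck et al. (Heinrich Heine University Düsseldorf, 2020) addressed the fragility of slot-value ontologies by copying values directly from the dialogue context. TripPy fills each slot from one of three sources: a span in the user's utterance, a value the system offered earlier, or a slot already present in the dialogue state. This reduces dependency on closed vocabularies, a real limitation when deploying chatbots in domains where terminology shifts frequently.

Michael Heck, Heinrich Heine Univ. Düsseldorf · Dialogue Systems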
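
The copy idea behind TripPy can be sketched in a few lines. This is a simplified illustration, not the authors' code: the function name `apply_copy`, the `decision` dictionaries, and the slot names are hypothetical, and the decisions are hand-written stand-ins for what a trained model would predict for each slot.

```python
def apply_copy(state, slot, decision, user_tokens, system_informed):
    """Fill `slot` by copying a value rather than choosing from a fixed ontology."""
    kind = decision["kind"]
    if kind == "span":
        # Copy a token span from the user's utterance (inclusive indices).
        start, end = decision["span"]
        state[slot] = " ".join(user_tokens[start:end + 1])
    elif kind == "inform":
        # Copy a value the system itself offered in an earlier turn.
        state[slot] = system_informed[slot]
    elif kind == "refer":
        # Copy the value of another slot already in the dialogue state.
        state[slot] = state[decision["source_slot"]]
    # Any other kind (for example "none") leaves the slot untouched.
    return state


state = {"restaurant-area": "centre"}
user_tokens = "i need a taxi to pizza hut fen ditton".split()
system_informed = {"hotel-name": "acorn guest house"}

apply_copy(state, "taxi-destination", {"kind": "span", "span": (5, 8)},
           user_tokens, system_informed)
apply_copy(state, "hotel-name", {"kind": "inform"},
           user_tokens, system_informed)
apply_copy(state, "hotel-area", {"kind": "refer", "source_slot": "restaurant-area"},
           user_tokens, system_informed)
print(state)
```

Because every value is copied from the conversation itself, a venue name the system has never seen before can still be tracked, which is the property that matters when terminology shifts.
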
Response Generation

DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation

Yizhe Zhang et al. (Microsoft Research, 2020) trained a GPT-2 variant on 147M conversation-like exchanges extracted from Reddit comment chains. Human evaluations rated DialoGPT responses as more relevant and context-consistent than the baseline systems it was compared against. The study remains a useful reference point for developers choosing between retrieval and generative pipelines.

Yizhe Zhang, Microsoft Research · Generation
Evaluation Metrics

Towards a Unified Automatic Evaluation of Open-Domain Dialogue Generation

Shikib Mehri and Maxine Eskenazi (CMU, 2020) challenged the adequacy of word-overlap metrics such as BLEU for dialogue evaluation. Their USR metric, which is unsupervised and needs no reference responses, correlates substantially better with human judgement, which matters if your chatbot QA process relies on automated scoring alone.

Shikib Mehri, Carnegie Mellon · Evaluation
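
To see why word-overlap scoring can mislead in dialogue, consider the toy check below. It uses clipped unigram precision, which is only one ingredient of BLEU, and it does not implement USR. The function name and the example sentences are assumptions made for illustration.

```python
from collections import Counter


def clipped_unigram_precision(candidate, reference):
    """Share of candidate tokens that also appear in the reference,
    with each reference token usable at most as often as it occurs."""
    cand_tokens = candidate.lower().split()
    if not cand_tokens:
        return 0.0
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(cand_tokens)
    overlap = sum(min(n, ref_counts[tok]) for tok, n in cand_counts.items())
    return overlap / len(cand_tokens)


# Context (hypothetical): the user asked "How are you today?"
reference = "i am doing well thanks"
valid_reply = "pretty good how about you"  # sensible, but shares no words
near_copy = "i am doing thanks"            # scores well only by copying

valid_score = clipped_unigram_precision(valid_reply, reference)
copy_score = clipped_unigram_precision(near_copy, reference)
print(f"valid reply: {valid_score:.2f}")
print(f"near copy:   {copy_score:.2f}")
```

A perfectly acceptable reply is scored as a failure because it happens to use different words from the single reference, which is the kind of mismatch with human judgement the summary above refers to.
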
Safety & Alignment

Recipes for Safety in Open-Domain Chatbots

Jing Xu et al. (Meta AI Research, 2021) documented systematic failure modes in deployed open-domain bots and proposed classifier-gated generation as a mitigation. The paper is candid about what does not work — a useful read before shipping any consumer-facing bot.

Jing Xu, Meta AI Research · Safety
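
Classifier-gated generation, as mentioned in the safety summary above, can be sketched as a thin wrapper around any response generator. Everything here is a stand-in: `toy_safety_score` is a keyword check used in place of a trained classifier, and the fallback text, threshold, and function names are assumptions for this example, not details from the paper.

```python
SAFE_FALLBACK = "I'd rather not talk about that. Is there something else I can help with?"


def toy_safety_score(text):
    """Stand-in for a trained safety classifier.
    Returns an 'unsafe' score of 1.0 or 0.0 based on a tiny word list."""
    blocked = {"idiot", "stupid"}
    words = {w.strip(".,!?").lower() for w in text.split()}
    return 1.0 if words & blocked else 0.0


def gated_reply(user_message, generate, classify=toy_safety_score, threshold=0.5):
    """Return the generated reply only if both the incoming message and
    the candidate reply score below the unsafe threshold."""
    if classify(user_message) >= threshold:
        return SAFE_FALLBACK
    candidate = generate(user_message)
    if classify(candidate) >= threshold:
        return SAFE_FALLBACK
    return candidate


def polite_bot(message):
    return "Hi there! How can I help?"


def rude_bot(message):
    return "You are stupid."


print(gated_reply("hello", polite_bot))
print(gated_reply("hello", rude_bot))
print(gated_reply("you idiot", polite_bot))
```

Keeping the gate separate from the generator means the classifier can be retrained or its threshold tuned without touching the underlying model.
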