Sample Efficient On-Line Learning of Optimal Dialogue Policies with Kalman Temporal Differences

Olivier Pietquin, Matthieu Geist, Senthilkumar Chandramohan
Designing dialog policies for voice-enabled interfaces is a tailoring job that is most often left to natural language processing experts. This job is generally redone for every new dialog task because cross-domain transfer is not possible. For this reason, machine learning methods for dialog policy optimization have been investigated during the last 15 years. In particular, reinforcement learning (RL) is now part of the state of the art in this domain. Standard RL methods require to test more or…
This paper has 44 citations.




