The advent and ubiquity of mass-market portable computational devices has opened up new opportunities for the development of assistive technologies for disabilities, especially within the domain of augmentative and alternative communications (AAC) devices. Word prediction can facilitate everyday communication on mobile devices by reducing the physical interactions required to produce dialogue with them. To support personalized word prediction, a text prediction system should learn from the user’s own data to update the initial learned likelihoods that provide high quality “out of the box” performance. Within this lies an inherent trade-off: a larger corpus of initial training data can yield better default performance, but may also increase the amount of user data required for personalization of the system to be effective. We investigate a learning approach employing hierarchical modeling of phrases expected to offer sufficient “out of the box” performance relative to other learning approaches, while reducing the amount of initial training data required to facilitate on-line personalization of the text prediction system. The key insight of the proposed approach is the separation of stopwords, which primarily play syntactical roles in phrases, from keywords, which provide context and meaning in the phrase. This allows the abstraction of a phrase from an ordered list of all words to an ordered list of keywords. Thus the proposed hierarchical modeling of phrases employs two layers: keywords and stopwords. A third level abstracting the keywords to a single topic is also considered, combining the power of both topic modeling and trigrams to make predictions within and between layers. Empirically relaxed versions of the developed models are evaluated on training data composed of a mixture of slightly modified dialogues from the Santa Barbara Corpus of Spoken American English. Performance is measured in terms of the number of user interactions (keystroke or touch screen event) required to complete a phrase. We compare their performance against a system employing no prediction.