Language represents the shared conventionalization of concepts by all of its speakers. Hence language documentation preserves information that goes far beyond a collection of sound shapes, lexical forms, and grammatical structures. The preservation of linguistically conventionalized conceptual structure is even more crucial for endangered languages, since this information is very often not available elsewhere. However, this task is rarely carried out, because the tedious and time-consuming lexical semantic research it requires is either not available or not feasible. In this paper, I discuss two recent developments towards a common conceptual infrastructure for multilingual language documentation. First, a Global Wordnet Grid is proposed as the common infrastructure for linguistically motivated conceptual representations for all languages. The wordnet framework, built upon lexical meaning and lexical semantic relations, maintains cross-lingual interoperability yet allows the rich idiosyncrasies of each language to be encoded. Most crucially, the design of the Global Wordnet Grid allows less computerized languages to bootstrap from the resources of highly computerized languages via bilingual lexical mapping. There are, however, still two critical issues that need to be solved before the Global Wordnet Grid can be a shared (ontological) conceptual framework for all languages: the scarcity of lexical semantic information (especially from endangered languages), and the lack of a shared conceptual core as the basis of multilingual conceptual representation. Second, we proposed a shared core ontology based on the Swadesh list as a solution to these two critical issues. Comparing Swadesh lists from six different languages allowed us to build a small shared ontology that reflects direct human experience and can serve as the cross-lingual conceptual core.
In addition, these micro-ontologized lexica can be used as seeds for developing a full-grown, more comprehensive documentation of the linguistically motivated ontology of each language.

