From Tags to Topic Maps: Using Marked-up Hebrew Text to Discover Linguistic Patterns
Copyright © 2006, Idea Group Inc. Distributing in print or electronic forms without written permission of IGI is prohibited.

INTRODUCTION

A very large percentage of business and academic data is stored in textual format. With the exception of metadata such as author, date, title and publisher, this data is not overtly structured in the way that the standard, mainly numerical, data in relational databases is. Just as data mining finds new patterns and trends in numerical data, text mining aims to discover previously unknown patterns in free text. Owing to the importance of the competitive and scientific knowledge that can be extracted from these texts, “text mining has become an increasingly popular and essential theme in data mining” (Han & Kamber, 2001, p. 428). Text mining has a relatively short history: “Unlike search engines and data mining that have a longer history and are better understood, text mining is an emerging technical area that is relatively unknown to IT professions” (Chen, 2001, p. vi).