Argumentative Zoning: Information Extraction from Scientific Text

Abstract

ing service in physics and the producer of the INSPEC database, indexed 174,000 items in 1996 alone, of which about 146,500 were journal articles. Even these impressive numbers exclude less important journals, workshop proceedings, conference papers and non-English material. Moreover, the growth is probably exponential: Maron and Kuhns (1960) estimated that the volume of indexed scientific material doubles every 12 years. The masses of information the researcher is exposed to make it hard for her to find the needle in the haystack, because it is impossible to skim-read even a fraction of the potentially relevant material.

The information access and search problem is particularly acute for researchers in interdisciplinary subject areas such as computational linguistics or cognitive science. In principle, they must be aware of articles in a whole range of neighbouring fields, such as computer science, theoretical linguistics, psychology, philosophy and formal logic.

Apart from keeping abreast of developments in general, more practical requirements emerge when researchers who are experienced in one scientific field become interested in a new field in which they have no prior knowledge. Their information needs suddenly change: Kircz (1991) states that such readers seek understanding rather than a firm, formal answer. The exact information need is not known beforehand, and the questions they pose are imprecise (Kircz’s example is the question “what are they doing in high-temperature super-conductivity?” (p. 357)). Belkin (1980) refers to their situation as an “anomalous knowledge state”. We think that researchers in a new field initially need answers to the following questions.

What are the main problems and main approaches? Knowledge of a number of important concepts in the field needs to be acquired: the current problems and the standard methodologies in the field.
For the main approaches, the researcher needs to know their strengths and weaknesses. The searcher also needs to gain an overview of the evaluation methodology and typical numerical results in the field.

Which researchers and groups are connected with which concepts? Researchers’ names, and the institutions where they work, must be associated with seminal approaches and seminal papers. The searcher must identify schools of thought: clusters of people working together, sharing premises and building on each other’s work.

If researchers read a paper in a new field, they are particularly interested in the general approaches described, the relation to other work and the paper’s conclusions, rather than in specialist details (Kircz, 1991). Oddy et al. (1992) and Shum (1998) argue that what such readers particularly need is an embedding of the particular piece of work within a broader context and in relation to other work. The preferred information source at this stage of knowledge is an experienced colleague. Another standard technique for gaining a deeper overview of a field is to find a recent review article, follow up its bibliographic links and read as many of the cited papers as time permits. But sometimes neither of these aids is available, and a full-blown bibliographic search using an electronic document retrieval system such as BIDS, FirstSearch or MEDLINE is necessary. This is typically done by keyword search, where the keywords can be combined with Boolean operators. In most commercial bibliographic databases, keyword search is still performed on document surrogates rather than on the full text of the document, as the full text is not always available in electronic form. Typical document surrogates used in document retrieval environments are bibliographic information (i.e. title, authors, date of publication, journal name), a list of
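Boolean keyword search over document surrogates, as described above, can be illustrated with a minimal Python sketch. It is not the query interface of BIDS, FirstSearch or MEDLINE. The `Surrogate` record and its fields, the helpers `term`, `and_`, `or_`, `not_`, `year_from` and `search`, and the three toy records are all hypothetical and serve only to show the mechanism. Queries match title words and assigned keywords exactly, with no stemming and no ranking, and the full text of a paper is never consulted.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class Surrogate:
    """Hypothetical document surrogate: bibliographic fields plus index terms.

    The full text is deliberately absent, as in surrogate-based retrieval.
    """
    title: str
    authors: List[str]
    year: int
    journal: str
    keywords: List[str] = field(default_factory=list)

    def terms(self) -> Set[str]:
        # Searchable vocabulary: lower-cased title words plus assigned keywords.
        title_words = set(re.findall(r"\w+", self.title.lower()))
        return title_words | {k.lower() for k in self.keywords}


# A query is a predicate over surrogates; Boolean operators combine predicates.
Query = Callable[[Surrogate], bool]


def term(word: str) -> Query:
    # Exact (case-insensitive) term match: no stemming, no synonyms.
    w = word.lower()
    return lambda doc: w in doc.terms()


def and_(*queries: Query) -> Query:
    return lambda doc: all(q(doc) for q in queries)


def or_(*queries: Query) -> Query:
    return lambda doc: any(q(doc) for q in queries)


def not_(query: Query) -> Query:
    return lambda doc: not query(doc)


def year_from(year: int) -> Query:
    # Restricts on a bibliographic field of the surrogate rather than on content.
    return lambda doc: doc.year >= year


def search(collection: List[Surrogate], query: Query) -> List[Surrogate]:
    # Boolean retrieval: an unranked list of matches, in collection order.
    return [doc for doc in collection if query(doc)]


# Invented toy records, for illustration only.
docs = [
    Surrogate("Summarising scientific articles", ["A. Author"], 1997,
              "Journal X", ["summarisation", "discourse"]),
    Surrogate("Citation analysis in physics", ["B. Author"], 1995,
              "Journal Y", ["citation", "bibliometrics"]),
    Surrogate("Discourse structure of research papers", ["C. Author"], 1998,
              "Journal X", ["discourse", "rhetoric"]),
]

if __name__ == "__main__":
    query = and_(term("discourse"), not_(term("summarisation")))
    print([d.title for d in search(docs, query)])
    # prints ['Discourse structure of research papers']
```

Because matching is exact, a query such as `term("summary")` returns nothing, even though the first record is about summarisation. The result is also an unranked set: the surrogate fields record who wrote a paper and where it appeared, but not how it relates to other work.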

11 Figures and Tables

Cite this paper

@inproceedings{Teufel1999ArgumentativeZI, title={Argumentative Zoning: Information Extraction from Scientific Text}, author={Simone Teufel and Vasilis Karaiskos and Anne Wilson and David McKelvie}, year={1999} }