Keinosuke Matsumoto

Layout is the physical arrangement of document components such as text blocks, text lines, figures, and tables on a page. The difficulty of layout analysis depends on the class of layout to be analyzed. An important class, and the most investigated one, is the rectangular layout. A layout is rectangular if all document components are circumscribed by non-overlapping…
Retrieval of electronic documents is a fundamental component of intelligent access to document contents. Although it has a long research history, it remains a nontrivial task, particularly when retrieving long documents with short queries. For the retrieval of long documents, a method called passage-based document retrieval has proven to…
Document image retrieval is the task of retrieving document images relevant to a user’s query. Most existing methods based on word-level indexing rely on a representation called “bag of words,” which originated in the field of information retrieval. This paper presents a new representation of documents that utilizes additional information about the location…