Data Set Used
This paper proposes a platform for portal and local repositories. Our methodology aims not only at constructing a portal site but also at supporting the capture of digital content converted from video interviews with intellectuals.
In this paper, we propose replication methods for effective document sharing on a peer-to-peer (P2P) system where peers frequently join and leave. The proposed methods use the relevancy and usefulness of peers to determine how many replicas should be made and where to place them. This paper shows empirically that the…
This paper introduces an ongoing project that aims to develop a mobile sensing framework for collecting sensor data that reflect personal-scale, or microscopic, roadside phenomena through crowdsourcing, together with social big data such as traffic, climate, and the contents of social network services like Twitter. To collect them, smartphone applications are provided.…
When constructing a large document archive, an important element is the digitizing of printed documents. Although various techniques for document image analysis such as Optical Character Recognition (OCR) have been developed, error handling is required in constructing real document archive systems. This paper discusses the problem from the quality…
Text categorization is one of the key functions for utilizing vast numbers of documents. It can be seen as a classification problem, which has long been studied in the pattern recognition and machine learning fields, and several classification methods have been developed, such as statistical classification, decision trees, support vector machines and so…
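To make the framing of text categorization as a classification problem concrete, the following is a minimal sketch of one statistical classifier, a multinomial naive Bayes model with Laplace smoothing. It is not the method of the abstract above, whose details are truncated; the function names, the whitespace tokenization, and the four training documents are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(labeled_docs):
    """Count documents per class and words per class from (text, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in labeled_docs:
        class_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return class_counts, word_counts, vocab


def classify(model, text):
    """Return the label with the highest smoothed log-posterior for the text."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        # Log prior of the class.
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # words never seen in training are ignored
            # Laplace (add-one) smoothed word likelihood.
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy training set (illustrative only).
TRAINING_DOCS = [
    ("stock market shares fall", "finance"),
    ("bank raises interest rates", "finance"),
    ("team wins championship match", "sports"),
    ("player scores winning goal", "sports"),
]
MODEL = train_naive_bayes(TRAINING_DOCS)
```

Calling `classify(MODEL, "market interest rates")` selects the class whose training vocabulary best explains the query words. A real system would need a far larger corpus and proper tokenization.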
We propose a stochastic context-free grammar for extracting information from scanned document images. The grammar is designed to disambiguate layout analysis and utilize both layout and text features. We applied this grammar to the problem of extracting bibliographic information from scanned academic papers and found that it can accurately extract…
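As a rough illustration of how a stochastic context-free grammar can segment bibliographic fields, the sketch below runs Viterbi CYK parsing over a one-dimensional sequence of token features. It is a simplification under stated assumptions: the grammar described above also uses layout features, whereas this toy grammar, its symbols (`HEADER`, `TITLE`, `AUTHORS`), the token features (`cap`, `lower`, `initial`), and all probabilities are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form. All symbols and
# probabilities are illustrative assumptions, not taken from the paper.
BINARY_RULES = {
    ("HEADER", "TITLE", "AUTHORS"): 1.0,
    ("TITLE", "TITLE", "TITLE"): 0.4,
    ("AUTHORS", "AUTHORS", "AUTHORS"): 0.3,
}
LEXICAL_RULES = {
    ("TITLE", "cap"): 0.3,
    ("TITLE", "lower"): 0.3,
    ("AUTHORS", "cap"): 0.4,
    ("AUTHORS", "initial"): 0.3,
}


def viterbi_cyk(tokens):
    """Fill the CYK chart with the best log-probability per (start, end, symbol)."""
    n = len(tokens)
    best = defaultdict(lambda: -math.inf)
    back = {}  # (start, end, symbol) -> best split point
    for i, tok in enumerate(tokens):
        for (sym, term), p in LEXICAL_RULES.items():
            if term == tok:
                best[(i, i + 1, sym)] = math.log(p)
    for width in range(2, n + 1):
        for i in range(0, n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for (parent, left, right), p in BINARY_RULES.items():
                    score = math.log(p) + best[(i, k, left)] + best[(k, j, right)]
                    if score > best[(i, j, parent)]:
                        best[(i, j, parent)] = score
                        back[(i, j, parent)] = k
    return best, back


def split_header(tokens):
    """Return (title_tokens, author_tokens) for the best parse, or None."""
    _, back = viterbi_cyk(tokens)
    key = (0, len(tokens), "HEADER")
    if key not in back:
        return None  # no parse covers the whole sequence
    k = back[key]
    return tokens[:k], tokens[k:]
```

For the feature sequence `["cap", "lower", "lower", "initial", "cap"]`, the only parse with nonzero probability places the title/author boundary after the third token, because `lower` can only be generated by `TITLE` and `initial` only by `AUTHORS`.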