Benjamin Nguyen

Learn More
We consider the monitoring of a flow of incoming documents. More precisely, we present here the monitoring used in a very large warehouse built from XML documents found on the web. The flow of documents consists in XML pages (that are warehoused) and HTML pages (that are not). Our contributions are the following:<ul><li>a subscription language which(More)
The requirements for effective search and management of the WWW are stronger than ever. Currently Web documents are classified based on their content not taking into account the fact that these documents are connected to each other by links. We claim that a page’s classification is enriched by the detection of its incoming links’ semantics. This would(More)
The current abundance and complexity of P2P architectures makes it extremely difficult to assess their performance. P2PTester is the first tool devised to interface with, and measure the performance of, existing P2P data management platforms. We isolate basic components present in current P2P platforms, and insert "hooks" for P2PTester to capture, analyze(More)
Current applications, from complex sensor systems (e.g. quantified self) to online e-markets acquire vast quantities of personal information which usually ends-up on central servers. Decentralized architectures, devised to help individuals keep full control of their data, hinder global treatments and queries, impeding the development of services of great(More)
One of the promises of the Semantic Web is to support applications that easily and seamlessly deal with heterogeneous data. Most data on the Web, however, is in the Extensible Markup Language (XML) format, but using XML requires applications to understand the format of each data source that they access. To achieve the benefits of the Semantic Web involves(More)
In recent years the growth of internet applications has highlighted the limit of traditional data model representation like UML. Ontologies are means of knowledge sharing and reuse and promise a more suitable knowledge representation for building more “intelligent” applications. In this paper we define the requirements that an ontology must meet in order to(More)
In this paper, we present a method and a tool for deriving a skeleton of an ontology from XML schema files. We first recall what an is ontology and its relationships with XML schemas. Next, we focus on ontology building methodology and associated tool requirements. Then, we introduce Janus, a tool for building an ontology from various XML schemas in a given(More)
With the unstoppable growth of the world wide Web, the great success of Web search engines, such as Google and AltaVista, users now turn to the Web whenever looking for information. However, many users are neophytes when it comes to computer science, yet they are often specialists of a certain domain. These users would like to add more semantics to guide(More)
Text analytics, an increasingly important application domain, is hampered by the high barrier to entry due to the many conceptual difficulties novice developers encounter. This work addresses the problem by developing a tool to guide novice developers to adopt the best practices employed by expert developers in text analytics and to quickly harness the full(More)