Shudang Diao

1 Data Preprocessing

1.1 XML parsing

The official datasets are in XML format, so we have to parse them before indexing. We chose Lucene as our tool for indexing and searching, and we selected Jakarta Commons Digester (hereafter referred to as Digester) to parse the XML documents. Digester converts each XML document into a Java object and then ...
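
The XML-to-object step can be illustrated with a minimal sketch. It is not the authors' code: to stay self-contained it uses the JDK's built-in DOM parser as a stand-in for Digester, and the element names (`docs`, `doc`, `title`, `body`), the `id` attribute, and the `Doc` class are hypothetical, since the dataset schema is not given here.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XmlDocParser {

    /** Hypothetical value object holding the fields we would later index. */
    public static final class Doc {
        public final String id;
        public final String title;
        public final String body;

        public Doc(String id, String title, String body) {
            this.id = id;
            this.title = title;
            this.body = body;
        }
    }

    /** Parses an XML string and maps every <doc> element to a Doc object. */
    public static List<Doc> parse(String xml) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document dom = builder.parse(new InputSource(new StringReader(xml)));

        NodeList nodes = dom.getElementsByTagName("doc");
        List<Doc> docs = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            Element e = (Element) nodes.item(i);
            docs.add(new Doc(e.getAttribute("id"), text(e, "title"), text(e, "body")));
        }
        return docs;
    }

    /** Returns the trimmed text of the first child element named tag, or "" if absent. */
    private static String text(Element parent, String tag) {
        NodeList list = parent.getElementsByTagName(tag);
        return list.getLength() == 0 ? "" : list.item(0).getTextContent().trim();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<docs><doc id=\"d1\"><title>Hello</title>"
                + "<body>World</body></doc></docs>";
        for (Doc d : parse(xml)) {
            System.out.println(d.id + ": " + d.title);
        }
    }
}
```

Each resulting `Doc` object carries plain string fields, which is the form an indexing step would typically consume. With Digester, the same mapping would be declared as rules rather than written as an explicit loop.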