Le Phong Bao Vuong

This paper introduces an approach that automates data extraction from semi-structured Web pages by clustering text tokens and data tuples. The approach uses both the HTML features and the text features of text tokens to detect similarities between them. After clustering, similar text tokens are expected to be in the same text…
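As a rough illustration of the idea in the abstract, the sketch below groups text tokens by combining an HTML feature with a text feature. It is not the paper's algorithm: the `TextToken` class, the tag-path and character-shape features, the Jaccard similarity, the greedy grouping, and the 0.6 threshold are all assumptions chosen for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextToken:
    text: str
    tag_path: str  # hypothetical HTML feature, e.g. "html/body/table/tr/td"


def features(token):
    """Combine HTML and text features into one feature set (assumed features)."""
    # HTML features: each tag on the path, plus the full path itself.
    feats = {"tag:" + part for part in token.tag_path.split("/")}
    feats.add("path:" + token.tag_path)
    # Text feature: a coarse character "shape" of the token.
    if token.text.isdigit():
        shape = "digits"
    elif token.text.isalpha():
        shape = "letters"
    else:
        shape = "mixed"
    feats.add("shape:" + shape)
    return feats


def jaccard(a, b):
    """Jaccard similarity of two feature sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(tokens, threshold=0.6):
    """Greedy grouping: a token joins the first cluster whose first member is similar enough."""
    clusters = []
    for token in tokens:
        for group in clusters:
            if jaccard(features(token), features(group[0])) >= threshold:
                group.append(token)
                break
        else:
            # No existing cluster was similar enough, so start a new one.
            clusters.append([token])
    return clusters
```

Under these assumptions, two tokens taken from cells of the same table column share every feature and land in one cluster, while a heading token with a different tag path starts its own cluster.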