Finding semantic needles in haystacks of Web text and links


Content and links are used to search, rank, cluster and classify Web pages. Here I analyze and visualize similarity relationships in massive Web datasets to identify how content and link analysis should be integrated for relevance approximation. Human-generated metadata from Web directories is used to estimate semantic similarity. Highly heterogeneous…


