Mining a search engine's corpus without a query pool

Many websites (e.g., WedMD.com, CNN.com) provide keyword search interfaces over a large corpus of documents. Meanwhile, many third parties (e.g., investors, analysts) are interested in learning big-picture analytical information over such a document corpus, but have no direct way of accessing it other than using the highly restrictive web search interface… CONTINUE READING