Numerous real-world applications produce networked data such as web data (hypertext documents connected via hyperlinks) and communication networks (people connected via communication links). A recent focus in machine learning research has been to extend traditional machine learning classification techniques to classify nodes in such data. In this report, we …
A large portion of real-world data is stored in commercial relational database systems. In contrast, most statistical learning methods work only with "flat" data representations. Thus, to apply these methods, we are forced to convert our data into a flat form, thereby losing much of the relational structure present in our database. This paper builds on …
A key challenge for machine learning is mining richly structured data sets, where the objects are linked by explicit or implicit relationships. Links among the objects demonstrate certain patterns, which can be helpful for many machine learning tasks and are usually hard to …
Many databases contain uncertain and imprecise references to real-world entities. The absence of identifiers for the underlying entities often results in a database that contains multiple references to the same entity. This can lead not only to data redundancy, but also to inaccuracies in query processing and knowledge extraction. These problems can be …
Most real-world data is heterogeneous and richly interconnected. Examples include the Web, hypertext, bibliometric data and social networks. In contrast, most statistical learning methods work with "flat" data representations, forcing us to convert our data into a form that loses much of the link structure. The recently introduced framework of …
The dynamic nature of citation networks makes the task of ranking scientific articles hard. Citation networks are continually evolving because articles obtain new citations every day. For ranking scientific articles, we can define the popularity or prestige of a paper based on the number of past citations at the user query time; however, we argue that what …