Eirinaios Michelakis

Several real-world applications need to effectively manage and reason about large amounts of data that are inherently uncertain. For instance, pervasive computing applications must constantly reason about volumes of noisy sensory readings for a variety of reasons, including motion prediction and human behavior modeling. Such probabilistic data analyses ...
We present Filtron, a prototype anti-spam filter that integrates the main empirical conclusions of our comprehensive analysis of using machine learning to construct effective personalized anti-spam filters. Filtron is based on the experimental results over several design parameters on four publicly available benchmark corpora. After describing Filtron's ...
Rule-based information extraction is a process by which structured objects are extracted from text based on user-defined rules. The compositional nature of rule-based information extraction also allows rules to be expressed over previously extracted objects. Such extraction is inherently uncertain, due to the varying precision associated with the rules used ...
The wide deployment of wireless sensor and RFID (Radio Frequency IDentification) devices is one of the key enablers for next-generation pervasive computing applications, including large-scale environmental monitoring and control, context-aware computing, and "smart digital homes". Sensory readings are inherently unreliable and typically exhibit strong ...
Unstructured text represents a large fraction of the world's data. It often contains snippets of structured information (e.g., people's names and zip codes). Information Extraction (IE) techniques identify such structured information in text. In recent years, database research has pursued IE on two fronts: declarative languages and systems for ...
Triangle counting is an important problem in graph mining. The clustering coefficient and the transitivity ratio, two commonly used measures, quantify triangle density to capture the fact that friends of friends tend to be friends themselves. Furthermore, several successful graph mining applications rely on the number of triangles. ...
The convergence of embedded sensor systems and stream query processing suggests an important role for database techniques in managing data that only partially – and often inaccurately – capture the state of the world. Reasoning about uncertainty as a first-class citizen inside a database system becomes an increasingly important operation for processing ...