Csaba István Sidló

Location prediction over mobility traces may find applications in navigation, traffic optimization, city planning and smart cities. Due to the scale of mobility in a metropolis, real-time processing is one of the major Big Data challenges. In this paper we deploy distributed streaming algorithms and infrastructures to process large-scale mobility data ...
We introduce an experimental web log mining architecture with advanced storage and data mining components. The aim of the system is to provide a flexible base for web usage mining of large-scale Internet sites. We present experiments over logs of the largest Hungarian Web portal [origo] (www.origo.hu), which, among other things, provides online news and magazines, ...
" Big Data " (BD) problems require handling extremely large or complex datasets that would be difficult and expensive using traditional relational databases. Software solutions with distributed processing, weakened consistency requirements and well-designed data models help overcoming scalability issues. Wind energy systems produce extremely large datasets.(More)
Entity resolution (ER), or deduplication, is a computationally hard problem with O(n^2) time complexity. We reformulate ER as a search problem, and develop algorithms using efficient indices. Indices can enhance algorithm scalability and facilitate distributed processing, but require additional storage space. We study the performance and trade-offs between ...
STREAMLINE aims to improve the overall workflow of big data analytics systems. To this end, it combines research in different areas to reduce the complexity of working with data at rest and data in motion in a unified fashion. As a foundation, STREAMLINE offers a uniform programming model on top of Apache Flink, for which it drives innovations in a ...
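
The entity resolution abstract above contrasts all-pairs comparison, which is O(n^2), with index-based search. The following is a minimal sketch of that general idea, not the authors' algorithm: the token-based inverted index, the Jaccard similarity measure, the threshold, and all function names are assumptions made purely for illustration. Each record is compared only against earlier records that share at least one token with it, and the index itself is the extra storage the abstract mentions.

```python
from collections import defaultdict
from itertools import combinations


def tokens(record: str) -> set[str]:
    """Lower-cased whitespace tokens of a record (illustrative normalisation)."""
    return set(record.lower().split())


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity; defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def naive_pairs(records: list[str], threshold: float) -> set[tuple[int, int]]:
    """Baseline: compare every pair of records, O(n^2) comparisons."""
    toks = [tokens(r) for r in records]
    return {
        (i, j)
        for i, j in combinations(range(len(records)), 2)
        if jaccard(toks[i], toks[j]) >= threshold
    }


def indexed_pairs(records: list[str], threshold: float) -> set[tuple[int, int]]:
    """ER as search: look up candidates in an inverted index, then verify.

    For threshold > 0 this returns the same pairs as naive_pairs, because
    records that share no token have similarity 0.
    """
    toks = [tokens(r) for r in records]
    index: dict[str, list[int]] = defaultdict(list)  # token -> earlier record ids
    matches: set[tuple[int, int]] = set()
    for j, current in enumerate(toks):
        # Candidate set: earlier records sharing at least one token.
        candidates: set[int] = set()
        for tok in current:
            candidates.update(index[tok])
        for i in candidates:
            if jaccard(toks[i], current) >= threshold:
                matches.add((i, j))
        # Register the current record so later records can find it.
        for tok in current:
            index[tok].append(j)
    return matches


if __name__ == "__main__":
    sample = [
        "John Smith Budapest",
        "john smith budapest hungary",
        "Anna Kovacs Szeged",
        "Kovacs Anna Szeged",
        "Peter Nagy",
    ]
    print(sorted(indexed_pairs(sample, threshold=0.6)))
```

The index lookup does not lower the worst-case cost (a token shared by every record still yields all pairs), but on data where most records share no tokens it avoids most comparisons.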