Csaba István Sidló

Location prediction over mobility traces may find applications in navigation, traffic optimization, city planning and smart cities. Due to the scale of mobility in a metropolis, real-time processing is one of the major Big Data challenges. In this paper we deploy distributed streaming algorithms and infrastructures to process large-scale mobility data …
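The abstract does not name the prediction model. As a rough illustration of location prediction over a stream, here is a minimal sketch assuming locations are already discretized into cell IDs and that a simple first-order Markov model (a swapped-in technique, not necessarily the paper's method) is updated one event at a time. The class and method names are hypothetical.

```python
from collections import Counter, defaultdict
from collections.abc import Hashable
from typing import Dict, Optional


class StreamingMarkovPredictor:
    """Hypothetical first-order Markov next-location predictor, updated per event."""

    def __init__(self) -> None:
        # transitions[a][b] = number of observed moves from cell a to cell b
        self.transitions: Dict[Hashable, Counter] = defaultdict(Counter)
        # last_seen[user] = most recent cell reported by that user
        self.last_seen: Dict[Hashable, Hashable] = {}

    def update(self, user: Hashable, cell: Hashable) -> None:
        """Consume one (user, cell) event from the stream."""
        prev = self.last_seen.get(user)
        if prev is not None and prev != cell:
            self.transitions[prev][cell] += 1  # count only actual moves
        self.last_seen[user] = cell

    def predict(self, user: Hashable) -> Optional[Hashable]:
        """Return the most frequent successor of the user's current cell, or None."""
        current = self.last_seen.get(user)
        counts = self.transitions.get(current) if current is not None else None
        if not counts:
            return None
        return counts.most_common(1)[0][0]


# Usage: feed a small interleaved stream of (user, cell) events.
p = StreamingMarkovPredictor()
events = [("u1", "A"), ("u1", "B"), ("u2", "A"), ("u2", "B"),
          ("u1", "A"), ("u1", "C"), ("u2", "A")]
for user, cell in events:
    p.update(user, cell)
print(p.predict("u2"), p.predict("u1"))  # B None
```

This single-process sketch keeps all state in memory; in a distributed streaming deployment the per-user state and the transition counts would have to be partitioned across workers, which is where the infrastructure questions the abstract raises come in.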
We introduce an experimental web log mining architecture with advanced storage and data mining components. The aim of the system is to provide a flexible foundation for web usage mining of large-scale Internet sites. We present experiments over logs of the largest Hungarian Web portal, [origo] (www.origo.hu), which among others provides online news and magazines, …
"Big Data" (BD) problems require handling extremely large or complex datasets that would be difficult and expensive to manage using traditional relational databases. Software solutions with distributed processing, weakened consistency requirements and well-designed data models help overcome scalability issues. Wind energy systems produce extremely large datasets. …
Entity resolution (ER), or deduplication, is a computationally hard problem with O(n²) time complexity. We reformulate ER as a search problem and develop algorithms using efficient indices. Indices can enhance algorithm scalability and facilitate distributed processing, but they require additional storage space. We study the performance and trade-offs between …
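To make the contrast between all-pairs ER and index-based search concrete, here is a minimal sketch assuming a simple inverted token index (a generic blocking technique, not necessarily the indices studied in the paper): records are compared only if they share at least one token. The tokenizer, the Jaccard threshold and all function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Set, Tuple


def tokens(text: str) -> Set[str]:
    """Hypothetical key function: lower-cased whitespace tokens."""
    return set(text.lower().split())


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def naive_pairs(n: int) -> int:
    """Comparisons made by an all-pairs ER pass: n(n-1)/2, i.e. O(n^2)."""
    return n * (n - 1) // 2


def indexed_matches(records: List[str], threshold: float = 0.5) -> Tuple[Set[Tuple[int, int]], int]:
    """Return (matching pairs, number of candidate pairs compared)."""
    toks = [tokens(r) for r in records]
    # Inverted index: token -> ids of records containing it (extra storage).
    index: Dict[str, List[int]] = defaultdict(list)
    for i, ts in enumerate(toks):
        for t in ts:
            index[t].append(i)
    # Candidate pairs are records sharing at least one token; ids are ascending,
    # so combinations() yields each pair as (i, j) with i < j.
    candidates: Set[Tuple[int, int]] = set()
    for ids in index.values():
        candidates.update(combinations(ids, 2))
    matches = {(i, j) for i, j in candidates if jaccard(toks[i], toks[j]) >= threshold}
    return matches, len(candidates)


records = ["John Smith", "Smith John", "Jane Doe", "J Doe", "Alice Brown", "Bob Green"]
matches, compared = indexed_matches(records)
print(matches, compared, naive_pairs(len(records)))  # {(0, 1)} 2 15
```

Here only 2 of the 15 possible pairs are compared. The index is the additional storage the abstract mentions, and a very common token (for example a frequent surname) still produces many candidates, so this illustrates the idea rather than guaranteeing sub-quadratic cost.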
Data quality is crucial in all information systems. As a key step in obtaining clean data, record linkage, or entity resolution (ER), groups database records by the underlying real-world entities. In this paper we give practical motivating examples and review the available ER formal models. The formal model for matching and merging records determines not just …
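The abstract does not commit to one model here. One well-known generic formulation treats match and merge as black-box functions over records; the sketch below follows the R-Swoosh procedure from the Stanford Swoosh work under that formulation. The record representation (a frozenset of (attribute, value) pairs) and the email/phone match rule are hypothetical choices for illustration. The procedure relies on match and merge behaving well (for instance, a merged record still matches anything its parts matched), which union-based merging satisfies.

```python
from typing import Callable, FrozenSet, List, Tuple

# Hypothetical record representation: a set of (attribute, value) pairs.
Record = FrozenSet[Tuple[str, str]]


def match(r1: Record, r2: Record) -> bool:
    """Hypothetical rule: records match if they share an email or phone value."""
    return any(pair in r2 for pair in r1 if pair[0] in ("email", "phone"))


def merge(r1: Record, r2: Record) -> Record:
    """Merge two matching records by taking the union of their values."""
    return r1 | r2


def r_swoosh(
    records: List[Record],
    match_fn: Callable[[Record, Record], bool],
    merge_fn: Callable[[Record, Record], Record],
) -> List[Record]:
    """R-Swoosh-style ER: returns records that no longer match each other."""
    pending = list(records)      # records still to be processed
    resolved: List[Record] = []  # invariant: no two records here match
    while pending:
        r = pending.pop()
        # Look for an already resolved record that matches r.
        buddy = next((s for s in resolved if match_fn(r, s)), None)
        if buddy is None:
            resolved.append(r)
        else:
            # Replace the buddy by the merged record and reprocess it,
            # because the merged record may now match further records.
            resolved.remove(buddy)
            pending.append(merge_fn(r, buddy))
    return resolved


a = frozenset({("name", "J. Smith"), ("email", "js@example.com")})
b = frozenset({("name", "John Smith"), ("email", "js@example.com"), ("phone", "555-0100")})
c = frozenset({("name", "Johnny Smith"), ("phone", "555-0100")})
d = frozenset({("name", "Jane Doe"), ("email", "jd@example.com")})
entities = r_swoosh([a, b, c, d], match, merge)
print(len(entities))  # 2
```

Records a and c share no value directly, but both match b, so all three collapse into one entity; this chaining is one way the choice of formal model for matching and merging shapes the final result.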