Haricharan Ramachandra

Internet companies like LinkedIn handle a large amount of incoming web traffic. Events generated in response to user input or actions are stored in a source database. These database events feature the typical characteristics of Big Data: high volume, high velocity and high variability. Database events are replicated to isolate the source database and form a…
Data quality is essential in the big data paradigm, as poor data can have serious consequences when dealing with large volumes of data. While it is trivial to spot poor data in small-scale, offline use cases, it is challenging to detect and fix data inconsistency in a large-scale, online (real-time or near-real-time) big data context. An example of such…
Cloud computing promises a cost-effective and administration-effective solution to the traditional needs of computing resources. While shared hardware and software bring efficiency to users, the multi-tenancy characteristics also bring unique challenges to the backend cloud platforms. In particular, the JVM mechanisms used by Java…
Large-scale web services like LinkedIn serve millions of users across the globe. The user experience depends on high service availability and performance of the services. In such a scenario, capacity measurement is critical for these cloud services. Resources should be provisioned such that the service can easily handle peak traffic without experiencing…
The Linux kernel feature cgroups (control groups) is increasingly being adopted for running applications in multi-tenant environments. Many projects (e.g., Docker) rely on cgroups to isolate resources such as CPU and memory. It is critical to ensure high performance for such deployments. At LinkedIn, we have been using cgroups and investigated its…
Increasing adoption of Big Data in business environments has driven the need for real-time stream joining. Multi-stream joining is an important type of stream processing at today's Internet companies, and it has been used to generate higher-quality data in business pipelines. Multi-stream joining can be performed in two models: (1) All-In-One (AIO)…
For enterprise applications that deal with large volumes of data, storage IO is often the performance bottleneck. SSDs (Solid State Drives) are increasingly being adopted by companies and applications to alleviate IO bottlenecks. However, not every application or product can justify migrating from HDD (Hard Disk Drive) to SSD, as such migration will…
Today's applications increasingly use memory-mapped files to manage large volumes of data, hoping to enjoy the performance benefits of memory mapping over traditional file IO. Memory-mapped files use the OS page caching mechanism to save expensive system calls and copying. However, as we found, a naive usage of memory-mapped files…
SSD (Solid State Drive) is being increasingly adopted to alleviate the IO performance bottlenecks of applications. Numerous measurement results have been published to showcase the performance improvement brought by SSD as compared to HDD (Hard Disk Drive). However, in most deployment scenarios, SSD is simply treated as a “faster HDD”. Hence…