Haricharan Ramachandra

Internet companies like LinkedIn handle a large amount of incoming web traffic. Events generated in response to user input or actions are stored in a source database. These database events feature the typical characteristics of Big Data: high volume, high velocity, and high variability. Database events are replicated to isolate the source database and form a …
Data quality is essential in the big data paradigm, as poor data can have serious consequences when dealing with large volumes of data. While it is trivial to spot poor data in small-scale and offline use cases, it is challenging to detect and fix data inconsistency in a large-scale and online (real-time or near-real-time) big data context. An example of such …
Cloud Computing promises a cost-effective and administration-effective solution to the traditional needs of computing resources. While bringing efficiency to users thanks to shared hardware and software, the multi-tenancy characteristics also bring unique challenges to the backend cloud platforms. In particular, the JVM mechanisms used by Java …
Increasing adoption of Big Data in business environments has driven the need for stream joining in real-time fashion. Multi-stream joining is an important stream processing type in today's Internet companies, and it has been used to generate higher-quality data in business pipelines. Multi-stream joining can be performed in two models: (1) All-In-One (AIO) …
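The All-In-One model named in the abstract can be illustrated with a minimal sketch: a single joiner buffers partial records from every stream, keyed by the join key, and emits a joined record once all streams have contributed. The class name, method names, and buffering policy below are illustrative assumptions, not the paper's actual design.

```python
from collections import defaultdict


class AllInOneJoiner:
    """Toy All-In-One (AIO) style multi-stream join: one node holds
    per-key state for all streams and joins in place. (Illustrative
    sketch; not the design described in the paper.)"""

    def __init__(self, stream_names):
        self.stream_names = set(stream_names)
        # join_key -> {stream_name: event}
        self.buffer = defaultdict(dict)

    def on_event(self, stream_name, join_key, event):
        """Buffer the event; emit the joined record once every
        stream has produced an event for this join key."""
        self.buffer[join_key][stream_name] = event
        if set(self.buffer[join_key]) == self.stream_names:
            # All streams arrived for this key: emit and clear state.
            return dict(self.buffer.pop(join_key))
        return None
```

A real joiner would also expire stale keys (windowing) and partition keys across nodes; this sketch only shows the per-key join logic.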
For enterprise applications that deal with data at large scale, storage IO is oftentimes the performance bottleneck. SSD (Solid State Drive) is increasingly being adopted to alleviate applications' IO bottlenecks. However, not every application or product is justified in migrating from HDD (Hard Disk Drive) to SSD, as such migration will …
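One way such a migration decision can be framed is as a cost comparison: when an IOPS requirement forces many HDD spindles, fewer SSDs may be cheaper despite the higher per-GB price. The function and all numbers below are illustrative assumptions, not figures from the work.

```python
import math


def ssd_migration_justified(required_iops, hdd_iops, ssd_iops,
                            hdd_cost_per_gb, ssd_cost_per_gb,
                            capacity_gb):
    """Toy cost model (assumed, for illustration): compare how many
    drives each medium needs to satisfy an IOPS requirement, then
    compare total cost at a fixed per-drive capacity."""
    hdd_drives = math.ceil(required_iops / hdd_iops)
    ssd_drives = math.ceil(required_iops / ssd_iops)
    hdd_cost = hdd_drives * capacity_gb * hdd_cost_per_gb
    ssd_cost = ssd_drives * capacity_gb * ssd_cost_per_gb
    return ssd_cost < hdd_cost, hdd_cost, ssd_cost
```

For an IOPS-bound workload the HDD spindle count (and hence cost) explodes, justifying SSD; for a capacity-bound, low-IOPS workload it does not.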
Large-scale web services like LinkedIn serve millions of users across the globe. The user experience depends on high service availability and performance of the services. In such a scenario, capacity measurement is critical for these cloud services. Resources should be provisioned such that the service can easily handle peak traffic without experiencing …
Modern cloud computing platforms (e.g. Linux on Intel CPUs) feature an ACPI-based (Advanced Configuration and Power Interface) mechanism that dynamically scales CPU frequencies and voltages based on workload intensity. With this feature, CPU frequency is reduced when the workload is relatively light in order to save energy, …
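The scaling behavior described here can be sketched as a simple governor policy, loosely modeled on Linux's "ondemand" governor: jump to the highest frequency under heavy load, and otherwise pick the lowest frequency that can still serve the observed load. The threshold, frequency list, and scaling assumption are all illustrative.

```python
def select_frequency(utilization, available_freqs_mhz, up_threshold=0.80):
    """Sketch of an ondemand-style DVFS policy (assumed behavior,
    not the kernel's actual implementation). `utilization` is the
    fraction of CPU busy time observed at the highest frequency."""
    freqs = sorted(available_freqs_mhz)
    if utilization >= up_threshold:
        # Heavy load: scale straight to the maximum frequency.
        return freqs[-1]
    # Estimate the clock rate the current work requires, assuming
    # work scales inversely with clock speed, and pick the lowest
    # frequency that covers it.
    needed = utilization * freqs[-1]
    for f in freqs:
        if f >= needed:
            return f
    return freqs[-1]
```

This also illustrates the trade-off the abstract points at: light workloads land on low frequencies (saving energy) at the cost of higher per-request latency when load suddenly rises.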
Accurate capacity measurement of Internet services is critical to ensure high-performing production computing environments. In this work, we present our solution for performing accurate capacity measurement. Referred to as “LiveRedliner”, it uses live traffic in production environments to drive the measurement, thereby avoiding many pitfalls that …
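The general idea behind redlining-style capacity measurement can be sketched as a ramp loop: increase the traffic directed at one instance in steps and report the highest rate that still meets a latency SLA. This is a generic sketch under assumed parameters, not the LiveRedliner mechanism itself (which shifts live production traffic rather than synthetic load).

```python
def redline_capacity(latency_at_qps, max_latency_ms,
                     start_qps=100, step_qps=100, max_qps=10000):
    """Ramp load in steps of `step_qps` and return the highest QPS
    whose measured latency stays within the SLA. `latency_at_qps`
    stands in for a real measurement of the instance under test."""
    capacity = 0
    qps = start_qps
    while qps <= max_qps:
        if latency_at_qps(qps) > max_latency_ms:
            break  # SLA breached: the previous step is the redline.
        capacity = qps
        qps += step_qps
    return capacity
```

In production, the "measurement" step would sample real request latencies while a load balancer concentrates traffic on the instance under test.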