Cristina L. Abad

Placing data as close as possible to computation is a common practice in data-intensive systems, often referred to as the data locality problem. By analyzing existing production systems, we confirm the benefit of data locality and find that data have different popularity and varying correlation of accesses. We propose DARE, a distributed adaptive data ...
A huge increase in data storage and processing requirements has led to Big Data, for which next-generation storage systems are being designed and implemented. However, we have a limited understanding of the workloads of Big Data storage systems. We consider the case of one common type of Big Data storage cluster: a cluster dedicated to supporting a mix of ...
This paper presents Natjam, a system that supports arbitrary job priorities, hard real-time scheduling, and efficient preemption for MapReduce clusters that are resource-constrained. Our contributions include: i) exploration and evaluation of smart eviction policies for jobs and for tasks, based on resource usage, task runtime, and job deadlines; and ii) a ...
Intrusion detection is an important part of networked systems security protection. Although commercial products exist, finding intrusions has proven to be a difficult task with limitations under current techniques. Therefore, improved techniques are needed. We argue the need for correlating data among different logs to improve intrusion detection systems ...
The address resolution protocol (ARP) is used by computers to map network addresses (IP) to physical addresses (MAC). The protocol has proved to work well under regular circumstances, but it was not designed to cope with malicious hosts. By performing ARP cache poisoning or ARP spoofing attacks, an intruder can impersonate another host (man-in-the-middle ...
We present the design and implementation of VisFlowConnect, a powerful new tool for visualizing network traffic flow dynamics for situational awareness. The visualization capability provided by VisFlowConnect allows an operator to assess the state of a large and complex network given an overall view of the entire network and filter/drill-down features with ...
Efficient namespace metadata management is increasingly important as next-generation file systems are designed for peta- and exascales. New schemes have been proposed; however, their evaluation has been insufficient due to a lack of appropriate namespace metadata traces. Specifically, no Big Data storage system metadata trace is publicly available and ...
This paper presents Natjam, a system that supports arbitrary job priorities, hard real-time scheduling, and efficient preemption for MapReduce clusters that are resource-constrained. Our contributions include: i) smart eviction policies for jobs and for tasks, based on resource usage, task runtime, and job deadlines; and ii) a work-conserving task ...
An impaired innate inflammatory response plays a key role in Crohn's disease (CD) pathogenesis. The aim of this study was to investigate the possible role of the TLR10–TLR1–TLR6 gene cluster in CD susceptibility. A total of 508 CD patients (284 in cohort 1 and 224 in cohort 2) and 576 controls were included. TLR10–TLR1–TLR6 cluster single-nucleotide ...
Multicasting at the IP layer has not been widely adopted due to a combination of technical and non-technical issues. End-system multicast (also called application-layer multicast) is an attractive alternative to IP layer multicast for reasons of user management (set-up and control) and attack avoidance. Sessions can be established on demand such that there ...