Tore Larsen

In an earlier paper, we observed that PastSet (our experimental tuple space system) was 1.83 times faster on global reductions than LAM-MPI. Our hypothesis was that this resulted from the better resource usage of the PATHS framework (an extension to PastSet that supports orchestration and configuration) due to a mapping of the communication and operations …
Using TCP/IP or M-VIA, the performance of the structured distributed shared memory system PastSet is measured and compared to a reference single-node implementation (excluding all intra-node communication). The latencies of PastSet operations are measured using several micro-benchmarks. For the experimental setup used, M-VIA latencies are shown to be between …
We identify two ways of increasing the performance of allreduce-style collective operations in a multi-cluster with large WAN latencies: (i) hiding latency in system noise, and (ii) conditional allreduce, where knowledge about the application is used to reduce the number of WAN messages. In our multi-cluster, system noise was not large enough to hide the …
Microarray experiments can provide molecular-level insight into a variety of biological processes, from the yeast cell cycle to tumorigenesis. However, analysis of both genomic and protein microarray data requires interactive collaborative investigation by biology and bioinformatics researchers. To assist collaborative analysis, remote collaboration tools for …
Parallel programs running on clusters are typically decomposed and mapped to run with one thread per processor, each working on its own disjoint subset of the data. We evaluate performance improvements and limitations for a micro-benchmark and the NAS benchmarks by using overdecomposition to map multiple threads to each processor to overlap computation with …
For collaboration, cross-platform sharing of display content among desktop, laptop, and handheld computers and smartphones is needed. Due to architectural and performance differences, support for sharing display content is complex and performance is low. By using standard media players and video stream formats, we reduce or avoid several of these …
Using a cluster of eight four-way computers, PastSet, an experimental tuple-space-based shared memory system, has been measured to be 1.83 times faster on global reduction than the Allreduce operation of LAM-MPI. Our hypothesis is that this is due to PastSet using a better mapping of processes to computers, resulting in fewer messages and more use of …