Corpus ID: 21675214

An O(N) Sorting Algorithm: Machine Learning Sorting

@article{Zhao2018AnOS,
  title={An O(N) Sorting Algorithm: Machine Learning Sorting},
  author={Hanqing Zhao and Yuehan Luo},
  journal={ArXiv},
  year={2018},
  volume={abs/1805.04272}
}
We propose an $O(N\cdot M)$ sorting algorithm based on a machine learning method, which shows great potential for sorting big data. This sorting algorithm can be applied to parallel sorting and is well suited to GPU or TPU acceleration. Furthermore, we discuss the application of this algorithm to sparse hash tables.
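The idea sketched in the abstract can be illustrated with a minimal Python sketch: estimate the data distribution from a small sample (the paper trains a neural network for this; an empirical CDF is used here as a hypothetical stand-in), predict each element's approximate rank in a single O(N) pass, and repair the nearly-sorted result with small local corrections. The function name `ml_sort` and the parameter `sample_size` are illustrative, not from the paper.

```python
import random
from bisect import bisect_right

def ml_sort(data, sample_size=64):
    # Sketch of distribution-predicting sort, assuming an empirical CDF
    # built from a random sample stands in for the paper's learned model.
    n = len(data)
    if n < 2:
        return list(data)
    sample = sorted(random.sample(data, min(sample_size, n)))
    buckets = [[] for _ in range(n)]
    for x in data:
        # rank estimate via the empirical CDF: fraction of sample <= x,
        # scaled to an index in [0, n-1]
        pos = bisect_right(sample, x) * (n - 1) // len(sample)
        buckets[pos].append(x)
    out = []
    for b in buckets:
        out.extend(sorted(b))  # local correction; cheap when buckets are small
    return out
```

If the rank predictions are accurate, each bucket holds only a few elements, so the correction pass stays near-linear; a poor distribution estimate degrades toward an ordinary comparison sort within buckets.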


Polyjuice: High-Performance Transactions via Learned Concurrency Control

This work proposes a learning-based framework that explicitly optimizes concurrency control via offline training to maximize performance, and builds Polyjuice, a novel algorithm that can outperform existing algorithms by specializing to a given workload.

Learned Indexes for Dynamic Workloads

Doraemon caches previously-trained models and incrementally fine-tunes them for similar access patterns and data distributions, improving query latency by 45.1% and reducing model re-training time to 1/20.

References

Showing 1-10 of 30 references

Parallel Sorting by Regular Sampling

Sorting and searching

Analysis of Fast Parallel Sorting Algorithms for GPU Architectures

This paper presents an analysis of parallel and sequential bitonic, odd-even, and rank-sort algorithms on different GPU and CPU architectures, written to exploit the task-parallelism model available on multi-core GPUs using the OpenCL specification.
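Of the algorithms this reference benchmarks, odd-even transposition sort is the simplest to sketch; each phase's compare-exchange steps touch disjoint pairs, which is what makes the algorithm map naturally onto parallel work-items. A minimal sequential sketch (illustrative only, not the cited OpenCL implementation):

```python
def odd_even_sort(values):
    # Odd-even transposition sort: alternate phases compare-exchange
    # pairs at even and odd offsets; all pairs within a phase are
    # independent, so a GPU can process them concurrently.
    a = list(values)
    n = len(a)
    for phase in range(n):
        for i in range(phase % 2, n - 1, 2):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
    return a
```

With one work-item per pair, the n phases give O(n) parallel time on n/2 processors, versus O(n^2) comparisons sequentially.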

A Comparison of Parallel Sorting Algorithms on Different Architectures

A comparative performance evaluation of three different parallel sorting algorithms: bitonic sort, sample sort, and parallel radix sort shows that the relative performance of the algorithms differed on the various machines.

A Randomized Parallel Sorting Algorithm with an Experimental Study

A novel variation on sample sort that uses only two rounds of regular all-to-all personalized communication, in a scheme that yields very good load balancing with virtually no overhead; unlike previous efficient algorithms, its performance is invariant over the set of input distributions.
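The sample-sort scheme with regular sampling can be sketched sequentially: each simulated "processor" sorts a local block, regularly spaced samples choose splitters, elements are routed to splitter-bounded partitions (standing in for the all-to-all exchange), and each partition is sorted locally. This is a generic regular-sampling sketch, not the specific algorithm of the cited paper; the names `sample_sort` and `p` are illustrative.

```python
from bisect import bisect_right

def sample_sort(data, p=4):
    # Single-process sketch of sample sort with regular sampling,
    # assuming len(data) >= p*p so every block yields p samples.
    n = len(data)
    size = -(-n // p)  # ceil(n / p)
    # step 1: each "processor" sorts a contiguous local block
    blocks = [sorted(data[i * size:(i + 1) * size]) for i in range(p)]
    # step 2: p regularly spaced samples per block, sorted globally
    samples = sorted(b[len(b) * j // p] for b in blocks for j in range(p))
    # step 3: p-1 splitters taken at regular positions among the samples
    splitters = [samples[p * j] for j in range(1, p)]
    # step 4: route every element to its splitter-bounded partition
    # (this is the all-to-all personalized communication step)
    parts = [[] for _ in range(p)]
    for b in blocks:
        for x in b:
            parts[bisect_right(splitters, x)].append(x)
    # step 5: each "processor" sorts its partition; concatenation is sorted
    out = []
    for part in parts:
        out.extend(sorted(part))
    return out
```

Regular sampling bounds the size of the largest partition, which is what gives the good load balancing the summary describes.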

Tight Bounds on the Complexity of Parallel Sorting

  • F. Leighton
  • Computer Science, Mathematics
    IEEE Transactions on Computers
  • 1985
Tight upper and lower bounds are proved on the number of processors, information transfer, wire area, and time needed to sort N numbers in a bounded-degree fixed-connection network.

Parallel FPGA-Based Implementation of Recursive Sorting Algorithms

The hardware implementation and optimization of parallel recursive algorithms that sort data using binary trees are described, based on a hierarchical finite state machine; the performance of the sorting operations is increased compared to previous implementations.

In-datacenter performance analysis of a tensor processing unit

  • N. Jouppi, C. Young, D. Yoon
  • Computer Science
    2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA)
  • 2017
This paper evaluates a custom ASIC, called a Tensor Processing Unit (TPU), deployed in datacenters since 2015 that accelerates the inference phase of neural networks (NN), and compares it to a server-class Intel Haswell CPU and an Nvidia K80 GPU, which are contemporaries deployed in the same datacenters.

Journal of Parallel and Distributed Computing

The toolkit leverages the principles of SECDA, a hardware/software co-design methodology, to reduce the design time of optimized DNN inference accelerators on edge devices with FPGAs and includes modules for cost-effective SystemC simulation, profiling, and AXI-based data communication.

The Art in Computer Programming

Here the authors haven’t even started the project yet, and already they’re forced to answer many questions: what will this thing be named, what directory will it be in, what type of module is it, how should it be compiled, and so on.