Interaction with external-memory devices, such as hard disk drives (HDDs), is a significant challenge for modern applications due to the large delays incurred by random access. In this project, we develop and study a new model for external-memory applications called bowtie streaming. Our optimized I/O scheduling algorithms are able to maintain throughputs several times higher than the fastest prior methods.
Tuxedo is a software suite containing the reference implementation of our bowtie streaming algorithms. It contains the following components:
The software package also implements an efficient external-memory distribution sort, TuxedoSort, which has been tested on single-machine sorts up to 40TB using only 20GB of memory. On this challenging task, TuxedoSort was over 8x faster than the next-fastest system, finishing the sort in only 24 hours. Note that the RAID system on our test machine has 4 GB/s sequential throughput, meaning that the theoretical minimum time just to read and write 40TB of data is roughly six hours.
Tuxedo's source code can be requested via email.
This is a simulation of the bucket game introduced in our paper on bowtie streaming. It demonstrates the distribute-from-file case, or D1 in the paper's notation.
Gabriel Stella, Dmitri Loguinov. IEEE BigData, December 2022.