Low-Latency MapReduce

Author:Xiaoran Li Advisor:Zhiying Wang

June 5, 2019


Material

Abstract

We investigate the problem of MapReduce and coded MapReduce. MapReduce is a programming model for the website search engine. It has three phases: Map, Shuffle, and Reduce. Even though it is an extensively used tool for distributed computing, the communication time involved in the shuffle phase can be a bottleneck for the overall system delay. Coded MapReduce was recently proposed that trades replicated computation for shorter communication. In this thesis, we formally define the model and algorithm of MapReduce and coded MapReduce. Moreover, we implement them with identical Map and Reduce functions. The communication time and the overall system time is analyzed theoretically and measured experimentally. Finally, as an example, reverse indexing is implemented under MapReduce and coded MapReduce framework. We analyze and measure the delay performance for this case as well. We find that when each Map computation task is replicated twice, the coded MapReduce scheme is almost twice as fast as the naive MapReduce scheme.