MapReduce

Programming technique for analyzing data sets that do not fit in memory

mapreduce is a programming technique for analyzing data sets that are too large to fit in your computer’s memory. It uses a datastore to process the data in small chunks and consists of two phases: a Map phase, which formats the data or performs a preliminary calculation on each chunk, and a Reduce phase, which aggregates the results from the Map phase. For more information, see Getting Started with MapReduce.

For information about using other products with mapreduce, see Speed Up and Deploy MapReduce Using Other Products.
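
The Map and Reduce phases described above can be illustrated with a minimal sketch. It assumes MATLAB R2016b or later (so that a script file can contain local functions) and runs serially in the client session. The sample file name, the Value variable, the 'PartialSum' and 'Total' keys, and the function names sumMapper and sumReducer are all hypothetical names chosen for this illustration; they are not part of the product.

```matlab
% Write a tiny sample file so the sketch is self-contained (hypothetical data).
sampleFile = fullfile(tempdir, 'mr_sample.csv');
writetable(table([1;2;3;4;5;6], 'VariableNames', {'Value'}), sampleFile);

% A datastore reads the file in chunks instead of loading it all at once.
ds = tabularTextDatastore(sampleFile);
ds.ReadSize = 2;    % deliberately small so the map function runs several times

mapreducer(0);      % run mapreduce serially in the local MATLAB session

outds = mapreduce(ds, @sumMapper, @sumReducer);
result = readall(outds);    % table with the variables Key and Value

function sumMapper(data, ~, intermKVStore)
    % Map phase: compute a partial sum for the current chunk and store it
    % under a single intermediate key.
    add(intermKVStore, 'PartialSum', sum(data.Value));
end

function sumReducer(~, intermValIter, outKVStore)
    % Reduce phase: iterate over all partial sums for the key and
    % aggregate them into one total.
    total = 0;
    while hasnext(intermValIter)
        total = total + getnext(intermValIter);
    end
    add(outKVStore, 'Total', total);
end
```

For this sample data, result contains one row whose key is 'Total' and whose value is 21, regardless of how the datastore splits the rows into chunks. The map function never sees more than one chunk at a time, which is what allows the same pattern to scale to files that do not fit in memory.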

Functions

mapreduce     Programming technique for analyzing data sets that do not fit in memory
datastore     Create datastore for large collections of data
add           Add single key-value pair to KeyValueStore
addmulti      Add multiple key-value pairs to KeyValueStore
hasnext       Determine if ValueIterator has one or more values available
getnext       Get next value from ValueIterator
mapreducer    Define execution environment for mapreduce or tall arrays
gcmr          Get current mapreducer configuration

Objects

KeyValueStore    Store key-value pairs for use with mapreduce
ValueIterator    Iterator over intermediate values for use with mapreduce

Topics

Troubleshooting

Debug MapReduce Algorithms

This example shows how to debug mapreduce algorithms in MATLAB®. Debugging lets you follow the movement of data between the phases of mapreduce execution and inspect the state of all intermediate variables.