What toolboxes do I need to integrate MATLAB with Hadoop?

Adam Neuf on 10 Aug 2015
Edited: Adam Neuf on 18 Nov 2015
Hi, I am currently looking into integrating MATLAB with a Hadoop cluster. I have looked all over the website, but it isn't clear which toolboxes are actually necessary to do this. I know that MATLAB Compiler, Parallel Computing Toolbox, and MATLAB Distributed Computing Server (MDCS) are related, but the website does not make it clear whether all, some, or none of these are actually required. Thanks

Accepted Answer

Esther on 18 Nov 2015
Hi Adam,
To integrate MATLAB with a cluster (whether a Hadoop cluster or some other generic cluster), you need MATLAB Distributed Computing Server (MDCS).
Then to send mapreduce jobs to that Hadoop cluster from MATLAB, you'll need at minimum Parallel Computing Toolbox.
MATLAB Compiler is only required if you wish to package MapReduce-based algorithms for deployment to production Hadoop systems.
Required:
  • MATLAB, MDCS, Parallel Computing Toolbox
Optional:
  • MATLAB Compiler
1 Comment
Adam Neufeldt on 18 Nov 2015
I actually ended up contacting MathWorks and had a phone call with one of their engineers. Here are my notes from that call:
There are two methods:
  • Method 1: Parallel Computing Toolbox (installed locally on each of our machines) together with MATLAB Distributed Computing Server (installed on the Hadoop cluster)
- This runs interactively in a live session. You can write and test code and run it right away. It is almost identical to how you normally use MATLAB, except that you have the additional computing power of all of the cluster's cores and you write your code as MapReduce algorithms.
  • Method 2: MATLAB Compiler
- This compiles analytics into a Hadoop-specific executable that then runs on the cluster, so it is not interactive. With no toolboxes at all you can still download data from the Hadoop cluster, and write and test MapReduce algorithms on a small section of the cluster.
You can of course combine these two methods: test and debug your code interactively on the entire cluster using MDCS and Parallel Computing Toolbox, and then compile the code.
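
For anyone reading later, here is a minimal sketch of what the "write and test locally, then point at the cluster" workflow might look like. It is an illustration only, not something from the call: the file name sample.csv, the column names Key and Value, the function names sumMapper and sumReducer, and the Hadoop install path are all made up. The local part should need only base MATLAB (local functions in a script need R2016b or later); the commented-out part assumes Parallel Computing Toolbox locally and MDCS on the cluster.

```matlab
% Sketch: sum values per key with mapreduce, first in the local session.
% All file names, column names, and paths here are hypothetical.

% Build a small sample file so the script is self-contained.
T = table({'a';'b';'a';'c';'a'}, [1;2;3;4;5], ...
          'VariableNames', {'Key','Value'});
writetable(T, 'sample.csv');

ds = datastore('sample.csv');   % tabular text datastore over the sample file

% Run in the local MATLAB session (no cluster involved).
mapreducer(0);
outds  = mapreduce(ds, @sumMapper, @sumReducer);
result = readall(outds);        % table with Key and Value columns
disp(result)

% To target a Hadoop cluster instead (assumes Parallel Computing Toolbox
% and MDCS on the cluster; the install path is a placeholder). On Hadoop
% the datastore would normally point at data in HDFS, and mapreduce would
% typically also be given an 'OutputFolder'.
% cluster = parallel.cluster.Hadoop('HadoopInstallFolder', '/path/to/hadoop');
% mr      = mapreducer(cluster);
% outds   = mapreduce(ds, @sumMapper, @sumReducer, mr);

function sumMapper(data, ~, intermKVStore)
    % data is one chunk of the datastore as a table.
    % Emit one partial sum per distinct key found in this chunk.
    [keys, ~, idx] = unique(data.Key);
    partialSums = accumarray(idx, data.Value);
    addmulti(intermKVStore, keys, num2cell(partialSums));
end

function sumReducer(key, intermValIter, outKVStore)
    % Add up all partial sums that were emitted for this key.
    total = 0;
    while hasnext(intermValIter)
        total = total + getnext(intermValIter);
    end
    add(outKVStore, key, total);
end
```

The point of this shape is that the mapper and reducer stay the same in both cases; only the mapreducer (execution environment) and the data location change when moving from a local test to the cluster.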
