D4M: Dynamic Distributed Dimensional Data Model

D4M provides a simple interface to a powerful mathematical representation that unifies spreadsheets, database tables, matrices, and graphs/networks.

What is D4M?

D4M is a breakthrough in computer programming that combines the advantages of five distinct processing technologies (sparse linear algebra, associative arrays, fuzzy algebra, distributed arrays, and triple-store/NoSQL databases such as Hadoop HBase and Apache Accumulo) to provide a database and computation system that addresses the problems associated with Big Data. D4M significantly improves search, retrieval, and analysis for any business or service that relies on accessing and exploiting massive amounts of digital data. Evaluations have shown that D4M can simultaneously increase computing performance and decrease the effort required to build applications by as much as 100x. Improved performance translates into faster, more comprehensive services from companies involved in healthcare, Internet search, network security, and more. Shorter, simpler code reduces development time and cost. Moreover, the D4M layered architecture provides a robust environment that is adaptable to various databases, data types, and platforms.
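To make the associative-array idea concrete, here is a minimal sketch in plain Python. It is not the D4M API: the class name AssocSketch and its methods are hypothetical and exist only for illustration. It assumes an associative array can be modeled as a mapping from (row key, column key) string pairs to numeric values, so that the same object can be read as a table row, a sparse matrix, or a graph edge list.

```python
from collections import defaultdict


class AssocSketch:
    """Toy associative array (hypothetical, not the D4M API).

    Stores (row_key, col_key) -> value, like the triples in a triple store.
    """

    def __init__(self, triples=()):
        self.data = {}
        for row, col, val in triples:
            self.data[(row, col)] = val

    def row(self, row_key):
        """Table view: every column entry stored for one row key."""
        return {c: v for (r, c), v in self.data.items() if r == row_key}

    def transpose(self):
        """Matrix view: swap row and column keys."""
        return AssocSketch((c, r, v) for (r, c), v in self.data.items())

    def __matmul__(self, other):
        """Sparse matrix multiply over string keys.

        Only stored entries are visited, so missing entries act as zeros.
        """
        by_row = defaultdict(list)
        for (k, c), v in other.data.items():
            by_row[k].append((c, v))
        out = AssocSketch()
        for (r, k), v in self.data.items():
            for c, w in by_row.get(k, ()):
                out.data[(r, c)] = out.data.get((r, c), 0) + v * w
        return out


# Graph view: each triple is a directed edge with weight 1.
edges = AssocSketch([
    ("alice", "bob", 1),
    ("bob", "carol", 1),
    ("alice", "carol", 1),
])

neighbors = edges.row("alice")                  # out-edges of alice
incoming = edges.transpose().row("carol")       # in-edges of carol
two_hop = edges @ edges                         # counts of two-step paths
```

In this sketch, multiplying the edge array by itself counts two-step paths, which is one way a graph query can be expressed as linear algebra on a table of triples.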

The D4M software is intended for Data Scientists and Algorithm Developers.

Currently Supported Environments and Databases

MATLAB/GNU Octave, Julia, and Python. Standard SQL databases (via jTDS bindings). Triple stores (Accumulo and potentially HBase).

MIT OpenCourseWare (OCW) online class

Mathematics of Big Data and Machine Learning

MIT Press Textbook