The next generation of data integration


Simple business questions can be surprisingly hard to answer using today's IT systems. For a large company, a question like “how many employees are tax-exempt?” may require querying hundreds of databases that use multiple data models and possibly inconsistent definitions: for example, is a “contractor” also an “employee”? Over the past five years, our team has developed a new technology for performing data-integration tasks (such as querying, combining, and evolving databases) based on category theory, a branch of mathematics that has already revolutionized several areas of computer science. Category theory gives us theoretical guidance missing from the widely used relational model of data, and we have used it to build a prototype software tool, FQL, for integrating databases more quickly and more accurately than tools that use the relational model.


Categorical Informatics was spun out of the MIT Mathematics Department in 2015. We are supported by an SBIR grant from the National Institute of Standards and Technology and an I-Corps Teams Grant from the National Science Foundation.