Truthful and Secure Multimodal Agents

However useful Large Language Models (LLMs) may be, their deployment in the DoW will remain limited as long as they have certain fatal weaknesses: hallucinations, security leaks, and systematic failures on many inputs. The goal of this project is to remove these roadblocks by creating models that are truthful, that resist attack and do not leak sensitive information, and that correctly process the many demanding types of data the DoW uses. We aim to disseminate these new techniques and tools to the wider community, enabling a broad array of models to transition seamlessly to DoW, academic, and commercial use cases. At the same time, the project team is focused on integrating all these technologies into a single useful, truthful, and secure multimodal model.

As a key component of our strategy, we are pioneering a map understanding initiative that establishes a world-leading benchmark for vision-enabled LLMs. We are developing a first-of-its-kind dataset of map-based question-answer pairs that challenges AI systems to interpret spatial, geographic, and symbolic information as adeptly as humans. Combined with our AI testbench, which currently benchmarks the world’s best Vision-LLMs, this dataset sets a new standard for evaluating and validating AI-driven map interpretation. Complementing these advancements, we are extending our SecureLLM method to handle sensitive information, further ensuring that our models are both secure and reliable for DoW applications.

Team Glass

– Jim Glass (MIT PI)

– Boris Katz (MIT Co-PI)

– Paul Gibby (MIT Lincoln Laboratory Lead)

– Nathaniel Maidel (DAF Liaison)

Published Research

To learn more about Truthful and Secure Multimodal Agents research and other AI Accelerator projects, view our published research here.

Are you up for a Challenge?

Learn more about AI Accelerator challenges here.