Remote sensing imagery interpretation is crucial to ISR (Intelligence, Surveillance, and Reconnaissance) decision making. Traditional machine learning models for remote sensing are typically highly specialized, limited to a single modality and task. This specialization forces models to be redeveloped whenever tasks or modalities change, wasting time and resources. In contrast, recent foundation models in computer vision exhibit a wide range of capabilities, hinting at the possibility of similarly versatile models for remote sensing. These foundation models, trained on massive web-scale datasets, have proven effective for creative tasks such as generating photorealistic images. However, for scientific and ISR applications that demand both faithful representations of reality and quantitative estimates of uncertainty, current models may fall short.
Truly intelligent AI for ISR decision making must accurately represent a scene and identify relevant objects from a variety of intelligence signals, while remaining adaptable to diverse downstream tasks. MultiFM aims to develop an AI agent that functions as an “autonomous decision assistant,” capable of learning scene representations and reasoning over them to support human decision-makers. The resulting AI will draw on a range of intelligence signals, from textual annotations and physical constraints to multimodal and multiview remote sensing data (e.g., SAR, EO/IR, LiDAR, and ground-level sensors) together with their sensor parameters, to build and refine scene representations. These refined representations will enable ISR decision making at speed and scale.
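To make the intended data flow concrete, the sketch below shows one plausible way such a multimodal scene encoder could be organized: per-modality features are projected into a shared embedding space, conditioned on sensor parameters, fused with a shallow transformer, and read out as a scene representation plus a per-scene uncertainty estimate. This is a minimal illustration under assumed design choices, not MultiFM's actual architecture; every module name, dimension, and the heteroscedastic uncertainty head are hypothetical.

```python
import torch
import torch.nn as nn


class SceneEncoder(nn.Module):
    """Hypothetical sketch: fuse per-modality features and sensor
    parameters into a shared scene representation with an uncertainty
    estimate. All names and sizes are illustrative, not MultiFM's."""

    def __init__(self, d_model=256, modality_dims=None, n_sensor_params=8):
        super().__init__()
        # Assumed per-modality feature sizes from upstream encoders.
        modality_dims = modality_dims or {
            "sar": 512, "eo_ir": 768, "lidar": 384, "text": 768,
        }
        # One projection per modality into the shared embedding space.
        self.proj = nn.ModuleDict(
            {m: nn.Linear(d, d_model) for m, d in modality_dims.items()}
        )
        # Sensor parameters (e.g., look angle, resolution) enter as an
        # extra token via a small MLP.
        self.sensor_mlp = nn.Sequential(
            nn.Linear(n_sensor_params, d_model),
            nn.ReLU(),
            nn.Linear(d_model, d_model),
        )
        # Shallow transformer for cross-modal fusion.
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=4, batch_first=True
        )
        self.fusion = nn.TransformerEncoder(layer, num_layers=2)
        # Heads: scene representation and a per-scene log-variance
        # (an assumed heteroscedastic uncertainty design).
        self.head = nn.Linear(d_model, d_model)
        self.log_var = nn.Linear(d_model, 1)

    def forward(self, feats, sensor_params):
        # Project whichever modalities are present; absent ones are skipped,
        # so the encoder degrades gracefully with partial sensor coverage.
        tokens = [self.proj[m](x) for m, x in feats.items() if m in self.proj]
        tokens.append(self.sensor_mlp(sensor_params))
        seq = torch.stack(tokens, dim=1)        # (batch, n_tokens, d_model)
        fused = self.fusion(seq).mean(dim=1)    # pool across modality tokens
        return self.head(fused), self.log_var(fused)


if __name__ == "__main__":
    enc = SceneEncoder()
    feats = {"sar": torch.randn(2, 512), "eo_ir": torch.randn(2, 768)}
    rep, log_var = enc(feats, torch.randn(2, 8))
    print(rep.shape, log_var.shape)  # torch.Size([2, 256]) torch.Size([2, 1])
```

Treating sensor parameters as just another fused token is one simple way to condition the representation on acquisition geometry; the log-variance head reflects the paragraph's emphasis on quantitative uncertainty, though the actual mechanism MultiFM adopts may differ.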