image: Overview of the proposed segmentation pipeline results using LangRS. Image created by M. Diab, P. Kolokoussis, and M.A. Brovelli, Politecnico di Milano and NTUA.
Credit: Mohanad Diab et al.
The amount of aerial and satellite imagery captured worldwide continues to grow. Yet efficiently identifying and labeling features in these images, such as roofs, cars, or trees, remains challenging. To that end, researchers at Politecnico di Milano and the National Technical University of Athens developed a new pipeline that combines advanced AI models with smart data-handling strategies.
The study was published in the KeAi journal Artificial Intelligence in Geosciences.
“General-purpose AI models are powerful, but they often struggle when asked to locate unfamiliar objects without explicit training,” says corresponding author Professor Maria Antonia Brovelli from Politecnico di Milano. “By using a sliding window hyper inference approach to cut large images into smaller, more manageable patches, and by applying an outlier-rejection step to remove erroneous detections, we greatly reduce the computational burden of the models and improve the accuracy in identifying specific features.”
The new pipeline uses open-source foundation models, namely the Segment Anything Model (SAM) and Grounding DINO, in a two-step process. First, it intentionally over-detects objects to ensure that even the smallest details are captured. This is achieved through a sliding window approach, which applies the detection model to smaller image patches rather than to the full image at once. This method not only reduces the computational burden, which is critical for large-scale remote sensing imagery, but also enhances detection accuracy.
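The sliding window step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the patch size, and the 20% overlap default are assumptions chosen for illustration. It only shows how a large image can be covered by overlapping patches and how a box detected inside a patch can be mapped back to full-image coordinates.

```python
def sliding_windows(width, height, patch, overlap=0.2):
    """Yield (x0, y0, x1, y1) windows that together cover a width x height image.

    `patch` and `overlap` are illustrative parameters, not values from the study.
    """
    step = max(1, int(patch * (1 - overlap)))
    xs = list(range(0, max(width - patch, 0) + 1, step))
    ys = list(range(0, max(height - patch, 0) + 1, step))
    # Add a final window flush with the right/bottom edge if the grid falls short.
    if xs[-1] + patch < width:
        xs.append(width - patch)
    if ys[-1] + patch < height:
        ys.append(height - patch)
    for y in ys:
        for x in xs:
            yield (x, y, min(x + patch, width), min(y + patch, height))


def to_image_coords(box, window):
    """Shift a box given in patch coordinates into full-image coordinates."""
    bx0, by0, bx1, by1 = box
    wx0, wy0, _, _ = window
    return (bx0 + wx0, by0 + wy0, bx1 + wx0, by1 + wy0)
```

In such a setup, a detector would be run once per window, and each resulting box would be passed through `to_image_coords` before the boxes from all windows are pooled.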
Next, the system refines the results by filtering out irrelevant bounding boxes, such as those that are excessively large or poorly positioned, using statistical and data-driven techniques. The remaining high-quality bounding boxes are then passed to SAM, which generates precise segmentation masks.
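The article does not specify which statistical techniques are used for this filtering. As one hedged illustration, the Python sketch below rejects excessively large boxes with a median absolute deviation (MAD) rule applied to box areas. The function names and the threshold `k` are hypothetical, and the authors' actual criteria may differ.

```python
from statistics import median


def box_area(box):
    """Area of an (x0, y0, x1, y1) box; degenerate boxes count as zero."""
    x0, y0, x1, y1 = box
    return max(0, x1 - x0) * max(0, y1 - y0)


def reject_large_boxes(boxes, k=3.0):
    """Drop boxes whose area lies far above the typical area.

    Uses a MAD rule as a stand-in for the unspecified outlier-rejection
    step; `k` is an illustrative threshold, not a value from the study.
    """
    if len(boxes) < 3:
        return list(boxes)  # too few boxes to estimate a typical size
    areas = [box_area(b) for b in boxes]
    med = median(areas)
    mad = median(abs(a - med) for a in areas)
    if mad == 0:
        return list(boxes)  # no spread in the majority; keep everything
    # 1.4826 scales the MAD to match a standard deviation under normality.
    upper = med + k * 1.4826 * mad
    return [b for b, a in zip(boxes, areas) if a <= upper]
```

Only the boxes that survive a filter of this kind would then be handed to the segmentation model as prompts.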
The pipeline operates in a zero-shot manner, meaning the models were used in an off-the-shelf fashion, retaining their original training parameters without any additional fine-tuning or retraining on external data. In aerial images with a spatial resolution of less than one meter, the developed pipeline achieved outstanding segmentation results, reaching up to 99% accuracy.
“Essentially, we’re taking advantage of the versatility of off-the-shelf, large-scale AI models, by building a robust processing pipeline developed by Mohanad Diab to achieve the best results,” co-author P. Kolokoussis adds. “We hope this pipeline will make automated remote sensing imagery analysis more accessible, speeding up everything from environmental surveys to urban planning.”
###
Contact author name, affiliation, email address: Mohanad Diab, Politecnico di Milano, mohanadyousef.diab@mail.polimi.it
The publisher KeAi was established by Elsevier and China Science Publishing & Media Ltd to unfold quality research globally. In 2013, our focus shifted to open access publishing. We now proudly publish more than 200 world-class, open access, English language journals, spanning all scientific disciplines. Many of these are titles we publish in partnership with prestigious societies and academic institutions, such as the National Natural Science Foundation of China (NSFC).
Journal
Artificial Intelligence in Geosciences
Method of Research
Imaging analysis
Subject of Research
Not applicable
Article Title
Optimizing zero-shot text-based segmentation of remote sensing imagery using SAM and Grounding DINO
COI Statement
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper. The author is an Editorial Board Member/Editor-in-Chief/Associate Editor/Guest Editor for [ISPRS - International Journal of Geoinformation] and [Taylor & Francis - International Journal of Digital Earth].