Prompting Foundation Models for Zero-Shot Ship Instance Segmentation in SAR Imagery

Researchers combined YOLOv11 ship detection with Segment Anything Model 2 to perform zero-shot instance segmentation on SAR maritime imagery without mask annotations. The approach uses spatial constraints from a SAR-trained detector to regularize foundation model predictions, sidestepping the need for fine-tuning or adapters.
Modelwire context
Explainer
The real trick here is not the combination of two models but the constraint mechanism: the SAR-trained detector's bounding boxes act as spatial guardrails that stop SAM2 from hallucinating segment boundaries in a domain it was never trained on. That constraint design is the contribution, not the pipeline architecture itself.
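To make the guardrail idea concrete, here is a minimal sketch of one way a detector box could bound a predicted mask. It is an illustration only: the summary does not say how the authors enforce the constraint (box prompts alone may be all they use), and the names clip_mask_to_box and fraction_outside_box are hypothetical helpers, not part of YOLOv11 or SAM2.

```python
from typing import List, Tuple

Mask = List[List[int]]            # binary mask, mask[row][col] is 0 or 1
Box = Tuple[int, int, int, int]   # (x1, y1, x2, y2) in pixels, x2 and y2 exclusive


def clip_mask_to_box(mask: Mask, box: Box) -> Mask:
    """Zero out every mask pixel that falls outside the detector box.

    Assumption: post-hoc clipping is one possible reading of "spatial
    guardrails"; the source does not confirm this is the authors' method.
    """
    x1, y1, x2, y2 = box
    return [
        [v if (y1 <= r < y2 and x1 <= c < x2) else 0 for c, v in enumerate(row)]
        for r, row in enumerate(mask)
    ]


def fraction_outside_box(mask: Mask, box: Box) -> float:
    """Share of foreground pixels the segmenter placed outside the box."""
    total = sum(sum(row) for row in mask)
    if total == 0:
        return 0.0
    inside = sum(sum(row) for row in clip_mask_to_box(mask, box))
    return (total - inside) / total


# Usage: a 4x4 all-foreground mask against a 2x2 box in the middle.
mask = [[1, 1, 1, 1] for _ in range(4)]
box = (1, 1, 3, 3)
clipped = clip_mask_to_box(mask, box)
leak = fraction_outside_box(mask, box)
```

A high leak value for a given detection would suggest the segmenter drifted beyond the detector's box, which is the failure mode the constraint is meant to suppress.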
This connects to the segmentation uncertainty work covered in 'SegWithU' from April 16, which also grappled with making foundation-model-style predictions trustworthy without retraining. Both papers are responding to the same underlying problem: general-purpose vision models produce plausible-looking outputs in specialized domains, but plausible is not the same as correct, and neither paper fully solves the calibration question. The SAR maritime context here is genuinely narrow, and the broader archive on this site skews toward language models and enterprise deployment, so direct comparisons are limited.
The meaningful test is whether this approach holds when ship density increases, specifically in congested port scenes where overlapping bounding boxes would leave neighboring SAM2 prompts competing for the same pixels. If the authors or a follow-up group publish results on high-density SAR benchmarks like HRSID within the next six months, that will clarify whether the spatial constraint design scales or breaks.
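As a rough way to quantify that congestion concern, the sketch below flags pairs of detector boxes whose intersection-over-union exceeds a threshold. The function names and the 0.1 threshold are assumptions chosen for illustration; nothing here comes from the paper or from a HRSID evaluation protocol.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def overlapping_pairs(boxes: List[Box], threshold: float = 0.1) -> List[Tuple[int, int]]:
    """Index pairs (i, j), i < j, whose IoU is above the threshold.

    The threshold is a hypothetical value, not one taken from the source.
    """
    return [
        (i, j)
        for i in range(len(boxes))
        for j in range(i + 1, len(boxes))
        if box_iou(boxes[i], boxes[j]) > threshold
    ]


# Usage: two ships moored side by side and one isolated ship.
boxes = [(0, 0, 10, 10), (5, 0, 15, 10), (100, 100, 110, 110)]
pairs = overlapping_pairs(boxes)
```

Counting such pairs per image gives a simple density signal: scenes with many flagged pairs are the ones where box prompts are most likely to contend for the same pixels.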
This analysis is generated by Modelwire’s editorial layer from our archive and the summary above. It is not a substitute for the original reporting.
Mentions: YOLOv11 · Segment Anything Model 2 · SAM2
Modelwire Editorial
Modelwire summarizes, we don’t republish. The full content lives on arxiv.org.